Bin Yu
Statistics Department
UC Berkeley
Monday, 26 April 2004
4:10 PM
E202 Engineering Building
ABSTRACT
Our professional and personal lives now depend on the Internet. The
heterogeneous and largely unregulated structure of the Internet renders
tasks such as dynamic routing, optimized service provision, service level
verification, and detection of anomalous/malicious behavior extremely
challenging. The problem is compounded by the fact that one cannot rely on
the cooperation of individual servers and routers to aid in the collection
of network traffic measurements vital for these tasks. In many ways,
network monitoring and inference problems bear a strong resemblance to other
``inverse problems'' in which key aspects of a system are not directly
observable. This emerging field is called {\em Internet Tomography}.
In this talk, I will first review the general problem of linear Internet
tomography (cf. Coates, Hero, Nowak, and Yu, 2002, IEEE Signal Processing
Magazine) and then cover in depth a special case: the estimation of the
origin-destination (OD) traffic matrix from link counts. OD traffic
information is important for dynamically updating network routing tables.
Our approach
to the OD estimation problem relies on a Gaussian model with a power
relationship between the mean and variance of OD traffic over a fixed small
time interval (e.g. 5 or 10 min) (cf. Cao, Davis, Vander Wiel and Yu, 2000,
J. Amer. Statist. Assoc.). Recognizing that maximum likelihood estimation
(MLE) for inverse problems in Internet tomography is usually
computationally intractable for large networks, we use a maximum
pseudolikelihood estimation (MPLE) approach (Liang and Yu, 2003, IEEE
Trans. Signal Processing) to solve a group of Internet tomography
problems, including the OD problem.
MPLE strikes a good balance between computational complexity and the
statistical efficiency of the parameter estimation. A
pseudo-expectation-maximization (pseudo-EM) algorithm is developed to
maximize the pseudo-log-likelihood function. Finally, we will present some
recent work
(Liang, Yu and Taft, 2003) using a Sprint network data set with validation
to compare our approach with that of the AT&T group.
Refreshments will be served at 3:45 p.m. in Room 008 of the Statistics
Building.
