Muthukrishnan
Estimating Simple Statistical Parameters on High Speed Data Streams
Muthu Muthukrishnan
Rutgers University
In a number of applications, input arrives very rapidly and there is limited memory in which to store it. One needs to monitor simple statistical quantities of such "streams". In the past few years, researchers in Theoretical Computer Science have developed new estimation algorithms that work within these space and time constraints. The methods rely on metric embeddings, pseudo-random computations, and sparse approximation theory. Applications include IP network traffic analysis, mining text message streams for Homeland Security, and processing massive data sets in general.
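
To make the space constraint concrete, here is a minimal sketch of one well-known small-space estimator, the Count-Min sketch, which approximates per-item frequencies in a stream using a fixed-size table and pseudo-random hash functions. The abstract does not name a specific algorithm, so the choice of Count-Min is an assumption made for illustration, as are the class name `CountMinSketch`, its parameters, and the restriction to non-negative integer keys.

```python
import random


class CountMinSketch:
    """Approximate frequency counts over a stream in fixed memory.

    Illustrative only: keys are assumed to be non-negative integers
    smaller than PRIME (for example, IPv4 addresses stored as ints).
    """

    PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # depth rows of width counters: memory is independent of stream length
        self.table = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row defines a hash h(x) = ((a*x + b) mod p) mod width
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]

    def _bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        """Process one stream item; touches exactly one counter per row."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        """Return an estimate that never undercounts (for non-negative updates)."""
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=4, seed=1)
    for item in [7, 7, 7, 42, 42, 1001]:
        sketch.update(item)
    print(sketch.estimate(7), sketch.estimate(42))
```

Each update costs `depth` hash evaluations, and the table size is fixed in advance, which is the kind of trade-off the abstract describes. Collisions can only inflate a counter, so taking the minimum across rows gives an overestimate whose error shrinks as `width` grows.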

I will present an overview of the algorithmic principles and discuss issues in building data stream systems that work at IP line speeds. I will also discuss open problems, particularly those that arise in performing more sophisticated statistical analyses on data streams.

This talk is based on an updated version of the survey at http://www.cs.rutgers.edu/~muthu/stream-1-1.ps  

Short Course: Information Theory & Statistics
Bin Yu & Mark Hansen
June 1, 2005
Colorado State University Campus
Fort Collins, CO 80523

Graybill Conference
June 2-3, 2005
Hilton Fort Collins

(Formerly: University Park Holiday Inn)
Fort Collins, CO 80526

www.stat.colostate.edu/graybillconference

Last Updated: Friday, May 24, 2005