Data depth based procedures that outperform classical t
(or T² in high dimensions) procedures
Yijun Zuo, Ph.D.
Department of Statistics and Probability, Michigan State University
March 10, 2008
Because the real line has a natural order, trimming in one dimension is straightforward. One-dimensional trimmed means are among the most popular estimators
of the center of a data set and are used in daily life and in various
applied fields of Statistics such as Machine Learning, Statistical Genetics, and
Bioinformatics. Trimmed means can overcome the high sensitivity of the mean
to outliers and heavy-tailed data and the low efficiency of the median for light-tailed data. Hence they can serve as compromises between the mean and the
median, striking a very good balance between robustness and efficiency.
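
As a minimal illustration of the one-dimensional case, the sketch below sorts the data and drops a fraction alpha from each end before averaging. The function name `trimmed_mean`, the parameter `alpha`, and the sample data are assumptions made for this example only; they do not come from the talk.

```python
def trimmed_mean(values, alpha=0.1):
    """Mean of the data after dropping the int(alpha * n) smallest
    and the int(alpha * n) largest observations (illustrative sketch)."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    k = int(alpha * len(xs))          # number trimmed from each tail
    kept = xs[k:len(xs) - k]          # non-empty because alpha < 0.5
    return sum(kept) / len(kept)


# Hypothetical sample: nine values near 10 and one gross outlier.
data = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 10.0, 9.9, 55.0]

plain_mean = sum(data) / len(data)    # pulled far upward by the outlier
robust_mean = trimmed_mean(data, 0.1) # drops 9.7 and 55.0, stays near 10
```

With alpha = 0 this reduces to the ordinary mean, and as alpha approaches 0.5 it approaches the median, which is the compromise described above.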
Multi-dimensional data often contain outliers, which typically are far more
difficult to detect than in one dimension. A robust procedure such as multi-dimensional trimming that can automatically detect outliers or "heavy tails"
is thus desirable. The task of trimming in high dimensions, however, becomes
non-trivial, for there is no natural order principle in high dimensions. In this
talk, multi-dimensional trimming based on "data depth" is discussed. It is found
that multi-dimensional depth-trimmed means can possess very desirable properties such as high efficiency and high robustness. Further, inference procedures
based on the depth-trimmed means can outperform the classical t procedure in
one dimension and Hotelling's T² procedure in high dimensions.
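
The idea of depth-based trimming can be sketched in two dimensions as follows: assign each point a depth (a measure of how central it is), discard the fraction alpha of points with the lowest depth, and average the rest. The abstract does not say which depth function the talk uses, so this sketch assumes the Mahalanobis depth, 1 / (1 + d²), purely because it is easy to compute. That depth relies on the sample mean and covariance and is therefore not itself robust; a robust depth would be substituted in practice. The names `mahalanobis_depth_2d` and `depth_trimmed_mean_2d` and the sample points are hypothetical.

```python
def mahalanobis_depth_2d(points):
    """Depth 1 / (1 + d^2) of each 2-D point, where d^2 is the squared
    Mahalanobis distance to the sample mean (an assumed depth choice)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    depths = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(covariance) * (dx, dy)^T, written out for 2x2.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        depths.append(1.0 / (1.0 + d2))
    return depths


def depth_trimmed_mean_2d(points, alpha=0.1):
    """Average of the points left after dropping the int(alpha * n)
    points of lowest depth."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(points)
    depths = mahalanobis_depth_2d(points)
    k = int(alpha * n)
    deepest_first = sorted(range(n), key=lambda i: depths[i], reverse=True)
    kept = [points[i] for i in deepest_first[:n - k]]
    return (sum(p[0] for p in kept) / len(kept),
            sum(p[1] for p in kept) / len(kept))


# Hypothetical sample: a symmetric cluster around the origin plus one outlier.
cloud = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
         (1, 1), (-1, -1), (1, -1), (-1, 1), (10, 10)]

center = depth_trimmed_mean_2d(cloud, alpha=0.1)  # outlier has lowest depth
```

Here depth plays the role that ordering plays in one dimension: points are ranked from the center outward, so trimming removes the outermost points without needing a coordinate-wise order.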