Rebecka Jornsten

Clustering, Classification and Validation via the L1 Data Depth

Rebecka Jornsten, Department of Statistics, Rutgers University

Among the many important tasks in the analysis of microarray data are (i) the clustering of samples or genes, and (ii) the classification of samples. In addition, it is useful to have tools for validating the clustering or classification results. We present two new methods for clustering and classification based on the intuitively simple concept of data depth. We demonstrate on real and simulated data that our clustering method, DDclust, can substantially improve clustering accuracy compared with the popular PAM algorithm. The data-depth-based classifier, DDclass, is highly competitive with the best reported methods. We also discuss a validation tool, the Relative Data Depth (ReD), for clustering and classification. The ReD statistic is an excellent tool for identifying outliers in clustering and for selecting the number of clusters. In addition, the ReD statistic is shown to be a reliable indicator of classification confidence.
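
The abstract names the L1 data depth and the ReD statistic without defining them. As a rough illustration only, the sketch below assumes the spatial (L1) depth in the form commonly attributed to Vardi and Zhang, and assumes that ReD compares a point's depth within its own cluster to its depth relative to a competing cluster. The function names l1_depth and relative_depth are hypothetical, and the code does not reproduce DDclust or DDclass.

```python
import math


def l1_depth(z, points):
    """Assumed L1 (spatial) depth of z relative to a list of points.

    Depth is 1 minus the length of the average unit vector pointing
    from z toward the observations, with a correction for observations
    that coincide with z. Values near 1 mean z is central; values near
    0 mean z lies far outside the cloud.
    """
    dim = len(z)
    total = [0.0] * dim  # running sum of unit vectors from z to each point
    ties = 0             # number of observations equal to z
    for x in points:
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            ties += 1    # no direction defined; handled by the correction
            continue
        for k in range(dim):
            total[k] += diff[k] / norm
    n = len(points)
    mean_norm = math.sqrt(sum(t * t for t in total)) / n
    return 1.0 - max(0.0, mean_norm - ties / n)


def relative_depth(z, own_cluster, other_cluster):
    """Assumed ReD: depth in own cluster minus depth in a competing one."""
    return l1_depth(z, own_cluster) - l1_depth(z, other_cluster)


# Two well-separated toy clusters (made-up coordinates, not real data).
cluster_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
cluster_b = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]

center_depth = l1_depth((1.0, 1.0), cluster_a)       # central point
far_depth = l1_depth((100.0, 100.0), cluster_a)      # distant point
red_center = relative_depth((1.0, 1.0), cluster_a, cluster_b)
```

Under these assumptions, a point at the center of its cluster has depth close to 1 and a large positive relative depth, while a point far from the cluster has depth close to 0. A small or negative relative depth would flag a point that sits poorly in its assigned cluster.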

Graybill Conference
June 18-20, 2003
University Park Holiday Inn
Fort Collins, CO 80526
www.stat.colostate.edu/graybillconference
email: hari@stat.colostate.edu Fax: (970)491-7895 Phone: (970)491-5269
Last Updated: Tuesday, June 10, 2003