Clustering, Classification and Validation via the L1 Data Depth
Rebecka Jornsten, Department of Statistics, Rutgers University

Among the many important tasks in the analysis of microarray data are (i) the clustering of samples or genes, and (ii) the classification of samples. In addition, it is useful to have tools for validating the clustering or classification results. We present two new methods for clustering and classification based on the intuitively simple concept of data depth. We demonstrate on real and simulated data that our clustering method, DDclust, can substantially improve clustering accuracy compared with the popular PAM algorithm. The data depth based classifier, DDclass, is highly competitive with the best reported methods. We also discuss a validation tool, the Relative Data Depth (ReD), for clustering and classification. The ReD statistic is an excellent tool for identifying outliers in clustering, and selecting the number of clusters. In addition, the ReD statistic is shown to be a reliable indicator of classification confidence.
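
To make the notion of data depth concrete, the following is a minimal sketch of an L1 depth computation. The abstract does not state the formula, so this assumes the commonly used definition: one minus the norm of the average unit vector pointing from the observations to the query point, with a correction for observations that coincide with the query point. The function name `l1_depth`, the equal weights 1/n, and the tolerance `eps` are illustrative assumptions, not taken from the paper, and the sketch is not the DDclust or DDclass procedure itself.

```python
import numpy as np

def l1_depth(y, X, eps=1e-12):
    """L1 depth of point y relative to the rows of X (assumed definition).

    Assumes equal weights 1/n on the observations. Values lie in [0, 1]:
    near 1 for a point central to the cloud, near 0 for a point far outside.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    diffs = y - X                          # vectors from each observation to y
    norms = np.linalg.norm(diffs, axis=1)
    coincide = norms < eps                 # observations sitting exactly at y

    # Average (over all n observations) of the unit vectors from the
    # non-coinciding observations toward y.
    units = diffs[~coincide] / norms[~coincide, None]
    e_bar_norm = np.linalg.norm(units.sum(axis=0) / n)

    # Fraction of observations located at y; this offsets the mean direction.
    f = coincide.sum() / n
    return 1.0 - max(0.0, e_bar_norm - f)


if __name__ == "__main__":
    # Four points placed symmetrically around the origin (toy data).
    cloud = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    print(l1_depth([0.0, 0.0], cloud))     # central point: high depth
    print(l1_depth([100.0, 0.0], cloud))   # distant point: low depth
```

Under this assumed definition, a point surrounded evenly by the data has unit vectors that cancel, giving a depth close to 1, whereas for a distant point the unit vectors align and the depth approaches 0. This is the intuition behind using depth to flag outliers and to judge how well an observation fits a cluster or class.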