Annette Molinaro

Prediction of Survival with Regression Trees and Cross-Validation

Annette Molinaro, Mark van der Laan, Sandrine Dudoit
Division of Biostatistics, University of California, Berkeley

Clinicians now collect a tremendous amount of data on patients in the hope of finding significant prognostic factors. In a common scenario in medical studies, hundreds or even thousands of covariates are collected on each patient along with the time to an event of interest. These covariates can include histological, epidemiological, and microarray measurements. Over the past several decades there have been numerous attempts to use nonparametric methods to model this type of data. A common approach is to modify classification and regression trees (CART), as described by Breiman et al., specifically for right-censored data. In contrast to these modified approaches, we implement methods presented by Robins and Rotnitzky and by van der Laan and Robins, which are based on linking the observed censored data to the full data world. Cross-validation-based model selection is used to choose among the candidate predictors, i.e., levels of a tree, proposed by CART. Our method treats the risk of a predictor built on the training sample as a full data parameter. We use the inverse probability of censoring weighting method to estimate this conditional risk parameter from the validation sample. The proposed method is shown to be asymptotically optimal under appropriate conditions. Results from a simulation study and a data analysis of Comparative Genomic Hybridization arrays are presented.
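
The inverse probability of censoring weighting step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error loss on the survival time, censoring that is independent of the covariates, and a Kaplan-Meier estimate of the censoring survival function G computed from the validation sample itself; the function names censoring_survival and ipcw_risk are hypothetical. Each uncensored validation observation is weighted by 1 / G(T-), and censored observations receive weight zero.

```python
import numpy as np


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), the censoring survival
    function, treating censoring (events == 0) as the outcome of interest.

    Assumption: tied event/censoring times get no special handling.
    Returns a function that evaluates G just before each time in an array.
    """
    times = np.asarray(times, dtype=float)
    censored = np.asarray(events, dtype=int) == 0
    cens_times = np.unique(times[censored])  # sorted distinct censoring times

    factors = []
    for t in cens_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        n_cens = np.sum((times == t) & censored)  # censored exactly at t
        factors.append(1.0 - n_cens / at_risk)

    # surv[k] = product of the first k factors; surv[0] = 1 (no censoring yet)
    surv = np.concatenate([[1.0], np.cumprod(factors)])

    def g_minus(t):
        # Count censoring times strictly less than t, giving the left limit G(t-)
        k = np.searchsorted(cens_times, np.asarray(t, dtype=float), side="left")
        return surv[k]

    return g_minus


def ipcw_risk(times, events, predictions, g_minus):
    """IPCW estimate of the squared-error risk on a validation sample.

    Uncensored observations are weighted by 1 / G(T-); censored ones by 0.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    weights = events / g_minus(times)
    return np.mean(weights * (times - predictions) ** 2)


# Hypothetical validation sample: observed times, event indicators
# (1 = event observed, 0 = right-censored), and a candidate tree's predictions.
val_times = [1.0, 2.0, 3.0, 4.0]
val_events = [1, 0, 1, 1]
val_preds = [1.5, 2.5, 2.5, 3.5]

g_hat = censoring_survival(val_times, val_events)
risk = ipcw_risk(val_times, val_events, val_preds, g_hat)
```

In a cross-validation loop, a quantity like risk would be computed for each candidate tree size on each validation fold, and the size with the smallest average estimated risk would be selected. When no observation is censored, every weight equals one and the estimate reduces to the ordinary mean squared error.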

Graybill Conference
June 18-20, 2003
University Park Holiday Inn
Fort Collins, CO 80526
www.stat.colostate.edu/graybillconference
email: hari@stat.colostate.edu Fax: (970)491-7895 Phone: (970)491-5269
Last Updated: Wednesday, April 16, 2003