Prediction of Survival with Regression Trees and Cross-Validation
Annette Molinaro, Mark van der Laan, Sandrine Dudoit

Clinicians now collect a tremendous amount of data on patients in the hope of finding significant prognostic factors. A common scenario in medical studies is one in which hundreds, possibly thousands, of covariates are collected on each patient along with the time to an event of interest. These covariates can include histological, epidemiological, and microarray measurements. Over the past several decades there have been numerous attempts to model this type of data with nonparametric methods. A common approach is to modify classification and regression trees (CART), as described by Breiman et al., specifically for right-censored data. In contrast to these modified approaches, we implement methods presented by Robins and Rotnitzky and by van der Laan and Robins that are based on linking the observed censored data to the full data world. Cross-validation-based model selection is used to choose among the candidate predictors, i.e., levels of a tree, proposed by CART. Our method treats the risk of a predictor built on the training sample as a full data parameter. We use the inverse probability of censoring weighting method to estimate this conditional risk parameter from the validation sample. The proposed method is shown to be asymptotically optimal under appropriate conditions. Results from a simulation study and from an analysis of Comparative Genomic Hybridization array data are presented.
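
To make the inverse probability of censoring weighting step concrete, the following is a minimal sketch of how a validation-sample risk could be computed. It is an illustration under stated assumptions, not the authors' implementation: the loss is taken to be squared error, the censoring distribution is estimated by a Kaplan-Meier estimator that ignores covariates, and the function names `censoring_survival` and `ipcw_risk` are hypothetical.

```python
import numpy as np


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    times  : observed follow-up times, min(event time, censoring time)
    events : 1 if the event was observed, 0 if the observation was censored
    Assumption: censoring does not depend on the covariates.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)
    g_minus = np.ones_like(times)
    for i, t in enumerate(times):
        # Multiply over distinct censoring times s strictly before t.
        for s in np.unique(times[(censored == 1) & (times < t)]):
            d = np.sum((times == s) & (censored == 1))
            # Events tied with s are treated as occurring before censoring,
            # so they are no longer at risk of being censored at s.
            at_risk = np.sum(times > s) + d
            g_minus[i] *= 1.0 - d / at_risk
    return g_minus


def ipcw_risk(times, events, predictions):
    """IPCW estimate of the squared-error risk on a validation sample.

    Uncensored observations get weight 1 / G(T-); censored ones get weight 0.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    predictions = np.asarray(predictions, dtype=float)
    g_minus = censoring_survival(times, events)
    weights = np.where(events == 1, 1.0 / g_minus, 0.0)
    loss = (times - predictions) ** 2
    return float(np.mean(weights * loss))
```

In a cross-validation scheme, each candidate tree would be grown on the training sample, its predictions evaluated on the validation sample with a function like `ipcw_risk`, and the candidate with the smallest average risk across splits retained. With no censoring, every weight equals 1 and the estimate reduces to the ordinary mean squared error.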
Graybill Conference