Multiple Imputation of Missing Values in Logistic Regression: A Case Study
Jeffrey Mergler, M.S. Candidate, Department of Statistics, Colorado State University.
Monday, April 6, 2009
1:30 pm, Weber 223
In this study we examine Air Force Academy Preparatory School admissions data, matched with school records, and use multiple logistic regression to model the probability that a given student can achieve minimum Air Force Academy standards. The applicants are high school graduates who are underqualified for direct admission to the Air Force Academy but who are members of a desired demographic group. The Preparatory School is designed to take these students and help make them fully qualified for entrance. This study focuses on how admissions data from high school can be used to model success at the Preparatory School. The available data contain missing and unreliable values. A preliminary multiple logistic regression model is fitted and then used with outlier detection methods to determine which data are unreliable. The unreliable data are then classified as missing, and multiple imputation is used to handle missing data in variables that may be used in the final model. The assumptions underlying multiple logistic regression and multiple imputation are explored, and the value of multiple imputation as it applies to this case is discussed.
Adviser: Phil Chapman
Member: Haonan Wang
Outside: Ken Berry (Sociology)