Comparing Classification Techniques for Predicting Injury-Risk Status of Thoroughbred Race Horses
Kalanthe Luepshen-Meek
Master's Candidate, Department of Statistics, Colorado State University
Monday, June 18, 2007
In the horse racing industry, it is of the utmost importance that a race horse be in excellent health, and great care is taken to ensure that horses are fit for racing. Owners and jockeys would prefer to take preventive measures with an at-risk horse rather than race it and risk a costly injury. However, it has been difficult to discern which horses are susceptible to injury before an injury occurs. Equine scientists are currently attempting to use levels of biomarkers in the blood serum of race horses to explain changes in their physical health. In this paper, we consider the problem of classifying race horses into two groups -- healthy horses and at-risk horses -- using eight biomarkers. We discuss the fundamental principles of three commonly used classification techniques and provide simple illustrations of their application. We consider discriminant analysis, logistic regression, and support vector machines, and apply each of these techniques to a prospective equine case study in which injury status is known. A local alignment kernel is used to transform the measurements into features. The three classification methods are applied to the transformed data, and the corresponding classifiers are computed. The methods are compared by examining cross-validated classification errors for predicting injury. Support vector machines are shown to have the lowest classification error rate and show promise for future prediction of injury status.
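As a rough illustration of the comparison described above, the following sketch estimates cross-validated classification error for the three kinds of classifier using scikit-learn. It is not the analysis from the paper. The data are synthetic (generated with `make_classification`, with eight features standing in for the eight biomarkers). The local alignment kernel is not reproduced: the raw features are used directly, and the support vector machine uses a standard RBF kernel instead. The sample size, the choice of 10 folds, and all tuning parameters are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study data (assumption): 8 features play the
# role of the eight biomarkers; labels 0/1 play the role of healthy/at-risk.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_redundant=1, random_state=0
)

# The three classifier families named in the abstract. The RBF kernel is a
# substitute for the paper's local alignment kernel, which is not shown here.
classifiers = {
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "support vector machine": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", C=1.0)
    ),
}

# Stratified 10-fold cross-validation keeps the class proportions roughly
# equal in every fold; the same folds are reused for all three classifiers.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

cv_error = {}
for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    # Classification error is one minus the mean accuracy across folds.
    cv_error[name] = 1.0 - accuracy.mean()
    print(f"{name}: cross-validated error = {cv_error[name]:.3f}")
```

Because the data here are simulated, the ranking of the three methods in this sketch says nothing about their ranking on the equine data. Scaling is placed inside each pipeline so that it is refit on the training folds only, which avoids leaking information from the held-out fold into the error estimate.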