Kernel Methods for a Discrete-State/Continuous-Time Process with Auxiliary Information
Todd Ashley Iverson
Ph.D. Candidate, Preliminary Examination, Department of Statistics, Colorado State University
Thursday, December 7, 2006
2:00 p.m. to 4:00 p.m.
006 Statistics
ABSTRACT
The purpose of this document is to outline the work needed to complete the doctoral thesis of Todd Iverson. The document gives an outline of the proposed chapters, and the chapters and sections of this proposal mirror those planned for the dissertation.
The main focus of the proposed dissertation is the development of machine learning methods for a distinctive form of data. Suppose that a finite number of events is observed at certain points in time, and that several measurements, many of them categorical, are taken for each event. Several methods are developed to learn from data of this form.
The topics addressed were inspired by working with data generated from medical insurance claims, so many of the questions raised spring naturally from the utility of health insurance claims data. First, unsupervised methods for finding interesting association rules were adapted to the present situation. We wish to find interesting rules of the form A implies B, where A and B are sets of observed categories. Specifically, techniques are introduced for two new problems.
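As a minimal sketch of the rule-mining task (the transactions, items, and thresholds below are hypothetical, and the dissertation's own mining techniques are more sophisticated), rules of the form A implies B can be ranked by support and confidence:

```python
from itertools import combinations

# Toy transactions: each is a set of observed categories (hypothetical codes).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(min_support=0.4, min_confidence=0.6):
    """Enumerate rules A -> B over single-item antecedents/consequents."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in (({a}, {b}), ({b}, {a})):
            supp = support(lhs | rhs)
            if supp >= min_support and supp / support(lhs) >= min_confidence:
                found.append((lhs, rhs, supp, supp / support(lhs)))
    return found

for lhs, rhs, supp, conf in rules():
    print(sorted(lhs), "->", sorted(rhs),
          f"support={supp:.2f}", f"confidence={conf:.2f}")
```

"Interesting" rules are then those that clear both thresholds; real implementations prune the search rather than enumerate exhaustively.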
First, we investigated finding rules that lead from one set of categories to another. We adapted a method for mining closed itemsets and proved a corollary that could yield substantial savings in computation time; this result remains to be tested empirically.
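The closed-itemset idea can be illustrated with a brute-force sketch on hypothetical transactions (the adapted mining method is far more efficient than this enumeration): an itemset is closed when no proper superset has the same support.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
]

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

def closed_itemsets():
    """Brute force: keep an itemset only if no proper superset
    occurs in exactly the same transactions (equal support)."""
    items = sorted(set().union(*transactions))
    all_sets = [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]
    closed = []
    for s in all_sets:
        supp = support(s)
        if supp == 0:
            continue
        if not any(s < t and support(t) == supp for t in all_sets):
            closed.append((set(s), supp))
    return closed

print(closed_itemsets())
```

Mining only closed itemsets loses no support information while typically shrinking the search space dramatically.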
Second, we found interesting rules that take an associated cost variable into account. Simple techniques, built on existing database management software, were developed to mine this new type of rule efficiently. The newly developed rules led to the discovery of a number of very interesting families of association rules in the health claims database.
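One way cost-aware rules can be surfaced is sketched below on invented claim records: group records by co-occurring categories (a GROUP BY in standard database software) and rank the groups by average cost. This is an illustration of the flavor of query involved, not the dissertation's actual technique.

```python
from collections import defaultdict
from itertools import combinations

# Toy claim records: (set of diagnosis categories, claim cost) -- hypothetical.
claims = [
    ({"A", "B"}, 500.0),
    ({"A"}, 100.0),
    ({"A", "B"}, 700.0),
    ({"B"}, 50.0),
    ({"A", "B", "C"}, 900.0),
]

def costly_pairs(min_count=2):
    """Group claims by co-occurring category pairs and rank by average cost."""
    totals = defaultdict(lambda: [0, 0.0])  # pair -> [count, total cost]
    for cats, cost in claims:
        for pair in combinations(sorted(cats), 2):
            totals[pair][0] += 1
            totals[pair][1] += cost
    ranked = [(pair, n, total / n)
              for pair, (n, total) in totals.items() if n >= min_count]
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked

print(costly_pairs())
```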
Supervised classification is also considered. Suppose that the sampling units all belong to one of a finite number of (unobserved) classes. Given a sampling unit's path of events through time, how does one correctly classify that unit? The specific example of predicting the onset of a medical condition such as Type II Diabetes motivated the exploration of this topic. Specifically, support vector machines were tested with various adapted kernels and data sets. A technique that classifies a patient by the log-likelihood ratio of two hidden Markov models was successful. One of these two generative models was then converted to a Fisher kernel. By adding the Fisher kernel to a kernel based on demographic information, we were able to correctly predict the onset of diabetes 76% of the time, the most successful classification method to date. In ongoing work, we investigate adapting string kernels, such as the local alignment kernel, to this situation. Other methods for developing kernels will be tried if time permits.
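The log-likelihood-ratio step can be sketched with two small discrete hidden Markov models and the standard scaled forward algorithm. All model parameters and the binary event alphabet below are invented for illustration; the dissertation's models are fit to claims data.

```python
import math

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM: log P(obs | model).
    pi: initial-state probabilities, A: transitions, B: emissions."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        s = sum(alpha)              # rescale to avoid underflow
        alpha = [a / s for a in alpha]
        log_prob += math.log(s)
    return log_prob + math.log(sum(alpha))

def classify(obs, model_pos, model_neg):
    """Label a sample path by the sign of the log-likelihood ratio."""
    return log_likelihood(obs, *model_pos) - log_likelihood(obs, *model_neg) > 0

# Two hypothetical 2-state generative models over a binary event alphabet:
# the "positive" model favors symbol 0, the "negative" model favors symbol 1.
model_pos = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]])
model_neg = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]])

print(classify([0, 0, 1, 0, 0], model_pos, model_neg))  # prints True
```

Converting one of the generative models to a Fisher kernel amounts to representing each sequence by the gradient of its log-likelihood with respect to the model parameters and feeding that vector to the SVM.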
While working with this database, we encountered some issues related to classification with support vector machines that warranted further investigation. The large size of these data reinforced the need for methods to find an optimal set of tuning parameters. We are currently studying the use of sampling together with the Nelder-Mead simplex method to search the response surface for the best combination of parameters. Planned work will make use of incremental and decremental support vector machines and hypothesis testing to speed the search. The data also contain many variables possibly unrelated to the condition being predicted, so we investigate new methods for variable, or feature, selection. We are currently studying fractional factorial designs as a method for selecting a good model that uses a smaller subset of the overall features.
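The fractional-factorial idea can be sketched as follows: a half-fraction covers the feature-in/feature-out settings in 2^(k-1) runs instead of 2^k, with each run a candidate feature subset to fit and score. The factor count, defining relation, and feature names are illustrative only.

```python
from itertools import product

def half_fraction(k=4):
    """2^(k-1) fractional factorial: a full factorial on the first k-1
    factors, with the last factor fixed by the defining relation
    (D = ABC when k = 4). Level +1 includes the feature, -1 excludes it."""
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for level in levels:
            last *= level
        runs.append(levels + (last,))
    return runs

# Each run maps to a candidate feature subset to fit and score.
for run in half_fraction():
    print(run, "->", [f"x{i}" for i, level in enumerate(run) if level == 1])
```

Eight model fits instead of sixteen, at the cost of aliasing the highest-order interactions, which is usually acceptable for screening features.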
Finally, we examine an optimality condition under which the SVM is the best linear classifier. We also run simulations to investigate the support vector machine's convergence to the Bayes classifier when an appropriate kernel is used.
The dissertation concludes with a discussion of some of the strengths and weaknesses of my approach, and some consideration of potential future research.
