

Adaptive Regression by Mixing: an alternative to model selection demonstrated on a capture-recapture study.
Lihua Chen, Panayotis Giannakouros & Yuhong Yang
http://lchen.org/colostate/LihuaTalk.pdf

Assessing how many clusters (an MDL criterion)
Arta Doçi & Peter Bryant
One of the major challenges in cluster analysis is the estimation of the
appropriate number of clusters in a dataset. Many approaches have been
proposed, including, but not limited to, the elbow phenomenon, within-cluster
dispersion, and viewing the estimation of the number of clusters as a model
selection problem. In this presentation, we have adopted the third approach,
i.e. we view the estimation of the number of clusters in the K-Means algorithm
as a model selection problem. We use a criterion based on Rissanen's Minimum
Description Length (MDL) Principle to assess the number of clusters in a
dataset. We present the criterion, provide results from the analysis of a
number of data sets, and suggest some possible avenues for future
development.

Persistence of Plug-in Rule in Classification of High Dimensional Binary Data
Junyong Park
We consider classification when the predictors are multivariate binary random variables. Within a given class, each variable is modeled as an independent Bernoulli random variable, for 1≤i≤d and j=1,2, with a separate parameter for the i-th variable in the j-th class. A triangular array of parameters is assumed, which allows the parameters to change and the number of variables, d, to increase, so that more flexible models can be adopted as the sample size, n, increases. Difficulties that arise under the triangular array are pointed out, and so moderate conditions are assumed. We use maximum likelihood estimators for the parameters and plug them into the Bayes classifier; we call the result the plug-in rule. Under these conditions, using the linearity of the plug-in rule, we show persistence of the plug-in rule when the variance of the plug-in rule is divergent; otherwise we show that there exists an example of nonpersistence of the plug-in rule. We consider sparsity of the variables in the case of nonpersistence and, under the sparsity condition, we overcome the nonpersistence by subset selection of the variables. This shows that the plug-in rule with selected variables may achieve better performance than the full classifier, especially in high dimensional data. We briefly discuss the convergence rate.

Improving Polar Cloud Detection by fusing MISR and MODIS information
Tao Shi, Bin Yu, Eugene E.
Clothiaux, & Amy J. Braverman
Clouds play a major role in controlling Earth's climate. A key to predicting climate change is to observe and understand the global distribution of clouds, their physical properties, and their relationship to regional and global climate. NASA's Earth Observing System (EOS) is designed for studying the Earth from space using a multiple-instrument, multiple-satellite approach. However, clouds above snow- and ice-covered surfaces in polar regions are especially difficult to detect from satellite data because their temperature and reflectivity are similar to those of the surface. The Multi-angle Imaging SpectroRadiometer (MISR) and the Moderate Resolution Imaging Spectroradiometer (MODIS), two instruments on the first EOS satellite TERRA, were launched in 1999 to provide scientists with data for global cloud study. Fusing the information from different instruments is one of NASA's high priorities in the multiple-instrument, multiple-satellite EOS system, since the combined information from two or more sources of complementary data can validate and improve the results from each part. In this paper, we made an effort to improve polar cloud detection by fusing MISR and MODIS data. Compared with expert labels, the pixels on which the MISR ELCMC cloud detection algorithms (Shi et al. 2004) and the MODIS operational cloud mask (Ackerman et al. 2002) agree are highly accurate. Therefore, those pixels may serve as a good source of accurate labels for training other classifiers. A Quadratic Discriminant Analysis (QDA) classifier is trained on all MISR and MODIS features using the agreed pixels, and the classifier is then applied to the full data. The QDA classifier yields an error rate much lower than that obtained using either MISR or MODIS data alone.

Multitarget Tracking with Application to Convective Systems
Curtis Storlie
A statistical approach to multiple target tracking is presented which allows for birth, death, splitting and merging of targets.
Targets are also allowed to go undetected for several frames. The splitting and merging of targets is a novel addition for a statistically based tracking algorithm. This addition is essential for the tracking of storms, which is the motivation for this work. The utility of this tracker extends well beyond the tracking of storms, however. It can be valuable in other tracking applications that have splitting or merging, such as vortices, radar/sonar signals, or groups of people. The method assumes that the location of a target behaves like a Gaussian Process when it is observable. A Markov State Model decides when the birth, death, splitting, or merging of targets takes place. The tracking estimate is obtained by an algorithm that finds the paths that maximize the likelihood of the assumed model. Some theoretical properties of tracking estimates are also developed, such as sufficient conditions for consistency. The problem of how to quantify the confidence in a tracking estimate is addressed as well. The properties of the proposed method will be demonstrated on simulated data. Finally, the method is applied to the problem for which it was designed, tracking storms from radar reflectivity data.

Human-Guided, Ultra-Adaptive Learning for Local Regression
Benjamin Tyner, Purdue University
Our goal is to enhance model selection in local regression by providing an effective way to study the space of model, or tuning, parameters. These parameters include the local polynomial degree (including mixtures of consecutive degrees), the smoothing parameter (span), the depth of an interpolating kd-tree (nc), and the relative scaling of the independent variables. Using ideas and methods from the study of response surfaces in designed experiments, we study a model fit criterion, a model complexity metric, and an estimate of the variance of the noise, all as functions of the tuning parameters. A loess surface is fitted for each of a set of values of the tuning parameters.
The tuning set is the design space of our experiment, and the fit, complexity, and variance estimate are the experimental responses. These metadata are then studied using response surface methods. Here we will take the model selection criterion to be the cross-validation sum of squares divided by a strawman noise variance estimate, and take the complexity measure to be a way of measuring the equivalent number of parameters of the fit. To effectively study the metadata we use about a dozen tools of data visualization, many of which exploit the framework of trellis display. It can happen that the analysis indicates that no points in the design space provide a good fit. In this case we move to a new region of the tuning parameter space, in effect carrying out evolutionary operation. We have explored our ideas initially with 15 test data sets: five univariate functions suggested by L. Brown, across three levels of noise.

Discrimination and classification based on nonparametric hypothesis testing
Haiyan Wang
In this paper we are concerned with the clustering and classification
problem for high-dimension, low-sample-size data. This is an increasingly
important topic that can be applied in a wide range of practical
contexts, including gene microarray analysis, chemometrics, medical
image analysis, etc. Classical multivariate methods, which often need to
sphere the data by multiplying by the root inverse of the covariance
matrix, cannot be applied in such cases because there are
more parameters than observations. Support vector machines and
distance-weighted discrimination (Marron and Todd, 2002) work well when the
dimension is moderately large, but the misclassification rate increases
significantly as the dimension increases. Here we will present a
clustering and classification method based on nonparametric hypothesis
testing developed specially for such settings. This is an appealing
alternative to techniques like inverse regression since it does not
require constant variance or normality of the predictors. A simulation
study will be given to evaluate its performance. An application to a
microarray dataset will be illustrated.

Simulating red flour beetle movement by an agent-based model
Kurt Zhang, Kansas State University
Susan Romero, Kansas State University
Paul Flinn, Grain Marketing and Production Research Center, USDA
Jim Campbell, Grain Marketing and Production Research Center, USDA
We developed a spatially explicit, agent-based model to better understand and predict beetle population dynamics in spatially complex landscapes. Because agent-based models allow for differences in agent behavior and competition between agents, they can be more realistic than deterministic models, which have a number of limitations, such as oversimplification of the system and an inability to portray stochastic factors and behavioral interactions. An object-oriented language was used to simulate the behavior of beetles. The model allows the movement of individual beetles to be traced. After analyzing the circular data from biological experiments, a von Mises distribution was found to fit the turning angles of walking adult beetles well. A wrapped normal distribution also fit the data well, and was faster to compute than the von Mises distribution. Moving speed was shown to be negatively correlated with turning angles. The autocorrelation and cross-correlation within and between beetle paths were investigated.
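As a rough illustration of the turning-angle idea in the abstract above, the following Python sketch traces one walker whose turning angle at each step is drawn from either a von Mises or a wrapped normal distribution. It is not the authors' model: the function names, the parameter values (kappa, sigma, step_length) and the constant step length are all assumptions made for this example, and the constant step length in particular ignores the reported negative correlation between moving speed and turning angle.

```python
import math
import random


def wrap_angle(theta):
    """Wrap an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def draw_turn(rng, dist, mu=0.0, kappa=2.0, sigma=0.8):
    """Draw one turning angle (radians) from the named circular distribution.

    kappa (von Mises concentration) and sigma (standard deviation of the
    underlying normal) are illustrative values, not fitted to any data.
    """
    if dist == "von_mises":
        # random.vonmisesvariate returns an angle in [0, 2*pi); re-centre it.
        return wrap_angle(rng.vonmisesvariate(mu, kappa))
    if dist == "wrapped_normal":
        # A wrapped normal draw is an ordinary normal draw wrapped on the circle.
        return wrap_angle(rng.gauss(mu, sigma))
    raise ValueError("unknown turning-angle distribution: " + dist)


def simulate_path(n_steps, dist, step_length=1.0, seed=0):
    """Trace one hypothetical walker; returns a list of (x, y) positions."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # The new heading is the old heading plus a random turning angle.
        heading = wrap_angle(heading + draw_turn(rng, dist))
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        path.append((x, y))
    return path


if __name__ == "__main__":
    for dist in ("von_mises", "wrapped_normal"):
        end_x, end_y = simulate_path(200, dist, seed=42)[-1]
        print(dist, "net displacement:", round(math.hypot(end_x, end_y), 2))
```

Calling simulate_path with the same seed reproduces the same path, which makes it easy to compare the two turning-angle distributions on otherwise identical runs.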

www.stat.colostate.edu/graybillconference 