Penalized Estimation for Sample Surveys in the Presence of Auxiliary Variables
Mark Delorey , Ph.D. Candidate, Department of Statistics
Colorado State University
May 30, 2008
In conducting sample surveys, time and financial resources can be limited but research questions are wide and varied. Thus, methods for analysis must make the best use of whatever data are available and produce results that address a variety of needs. Motivation for this research comes from surveys of aquatic resources, in which sample sizes are small to moderate, but auxiliary information is available to supplement measured survey responses. Problems of survey estimation are considered, tied together in their use of constrained or penalized estimation techniques for combining information from auxiliary variables and the responses of interest.
We study a small area problem with the goal of obtaining a good ensemble estimate, that is, a collection of estimates for individual small areas that collectively give a good estimate of the overall distribution function across small areas. Estimation of the distribution function itself (Cordy and Thomas, 1997) can address questions of variability and extremes but does not provide individual estimators, nor is it appropriate when auxiliary information can be made of use. Bayes estimators are good individual estimators in terms of mean squared error but are not variable enough to represent ensemble traits (Ghosh, 1992).
An algorithm that extends the constrained Bayes methods of Louis (1984) and Ghosh (1992) for use in a model with a general covariance matrix is presented. We refer to this method as general constrained Bayes (GCB). The ensemble GCB estimator is asymptotically unbiased for the posterior mean of the empirical distribution function (edf). The ensemble properties of transformed GCB estimates are investigated to determine if the desirable ensemble characteristics displayed by the GCB estimator are preserved under such transformations. The GCB algorithm is then applied to complex models such as conditional autoregressive spatial models and to penalized spline models. Illustrative examples include the estimation of lip cancer risk, mean water acidity, and rates of change in water acidity.
We also study a moderate area problem in which the goal is to derive a set of survey weights that can be applied to each study variable with reasonable predictive results. Zheng and Little (2003) use penalized spline regression in a model-based approach for finite population estimation in a two-stage sample when predictor variables are available. Breidt et al. (2005) propose a class of model-assisted estimators based on penalized spline regression in single stage sampling. Because unbiasedness of the model-based estimator requires that the model be correctly specified, we look at extending model-assisted estimation using penalized splines to the two-stage case. By calibrating the degrees of freedom of the smooth to the most important study variables, a set of weights can be obtained that produce design consistent estimators for all study variables. The model-assisted estimator is compared to other estimators in a simulation study. Results from the simulation study show that the model-assisted estimator is comparable to other estimators when the model is correctly specified and generally superior when the model is incorrectly specified.