Shape Restricted Regression in a Bayesian Framework
Amber Hackstadt, Preliminary Ph.D., Department of Statistics, Colorado State University.
Monday, May 11, 2009
9:30 pm, 223 Weber
We propose a Bayesian paradigm along with shape restricted regression to fit data and make inference about the functional relationship between variables. Shape restricted regression splines (SRRS) are flexible models that require minimal assumptions. They are more flexible than linear regression and more robust to knot selection than unrestricted regression splines. Shape restrictions, including monotonicity and convexity, are often reasonable assumptions. For example, when examining the relationship between heart rate and body temperature, we could expect that body temperature remains constant or increases as heart rate increases. Similarly, fisheries biologists may want to model the relationship between otolith size (otoliths are structures found just behind the eyes or in the semicircular canals in fish) and fish length. This relationship might be assumed to be increasing, so monotone regression splines can be used to estimate it without the strong assumption of a parametric form.
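To make the idea of a monotone regression spline concrete, here is a minimal sketch of a constrained least-squares fit. It is a simplification and not the method of the talk: it uses piecewise-linear ramp basis functions rather than any particular spline basis, it is a frequentist fit with no priors, and the names `ramp_basis` and `fit_monotone` are hypothetical. The only point illustrated is that a nonnegative combination of nondecreasing basis functions is itself nondecreasing.

```python
import numpy as np
from scipy.optimize import lsq_linear


def ramp_basis(x, knots):
    """Return one nondecreasing piecewise-linear ramp per knot interval.

    Each column rises from 0 to 1 across its interval and is flat elsewhere.
    (Assumed basis for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    cols = [
        np.clip((x - lo) / (hi - lo), 0.0, 1.0)
        for lo, hi in zip(knots[:-1], knots[1:])
    ]
    return np.column_stack(cols)


def fit_monotone(x, y, knots):
    """Least-squares fit of intercept + ramps with ramp coefficients >= 0."""
    B = ramp_basis(x, knots)
    A = np.hstack([np.ones((B.shape[0], 1)), B])
    # The intercept is unconstrained; every ramp coefficient must be >= 0,
    # which forces the fitted function to be nondecreasing.
    lower = np.concatenate([[-np.inf], np.zeros(B.shape[1])])
    upper = np.full(A.shape[1], np.inf)
    result = lsq_linear(A, np.asarray(y, dtype=float), bounds=(lower, upper))
    intercept, slopes = result.x[0], result.x[1:]

    def predict(x_new):
        return intercept + ramp_basis(x_new, knots) @ slopes

    return predict, intercept, slopes
```

In a Bayesian version of this sketch, the nonnegativity constraint would instead be encoded through priors supported on nonnegative coefficients, and the posterior would replace the single point estimate.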
For SRRS, the Bayesian framework along with vague priors gives function estimates that are similar to maximum likelihood estimates while providing posterior distributions that facilitate inference. In some cases, the Bayesian framework allows for inference that is much more difficult to perform in the frequentist framework. For models involving a continuous predictor variable as well as a dichotomous predictor variable, Bayesian credible intervals can be used to determine whether to fit one or two parallel curves to the data. For example, one may wonder whether separate regression curves should be fit for females and males when modeling the relationship between heart rate and body temperature, or whether one curve can be fit for both sexes. If the functional relationship between the continuous predictor and the response variable is truly linear, then our simulation studies show that the Bayesian credible interval for SRRS performs just as well as linear regression. However, if the functional relationship between the continuous predictor and the response variable is increasing but not linear, then our simulation studies show that the Bayesian credible interval performs better than a basic t-test. Similarly, a χ² test can be used to determine whether to fit one or multiple parallel curves when the model involves a continuous predictor variable and a categorical variable with more than two levels. Simulation studies show that SRRS outperforms the linear regression model when the relationship between the continuous predictor and the response variable is monotone increasing but not linear, and performs just as well when there is a linear relationship between the variables.
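One way to write down the one-curve versus two-parallel-curves question is sketched here. The additive form, the normal errors, and the symbols f, δ, z and σ² are assumptions made for illustration; the abstract does not state the model explicitly.

```latex
% Assumed model: a shared nondecreasing curve f plus a vertical shift \delta
% for the second group, indicated by z_i \in \{0, 1\}.
y_i = f(x_i) + \delta z_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2),
\qquad f \text{ nondecreasing}.

% Equal-tailed (1 - \alpha) credible interval for the shift, where q_p is the
% p-th quantile of the posterior p(\delta \mid y):
\left[\, q_{\alpha/2},\; q_{1-\alpha/2} \,\right].

% Decision rule: fit a single curve if the interval contains 0,
% and two parallel curves otherwise.
```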
Likewise, shape restricted regression splines in a Bayesian framework along with Bayes factors can be used to determine whether a parametric model or a less restrictive shape restricted regression model is appropriate for a given data set. For example, Bayes factors can be used to determine whether a simple parametric model, such as a constant or linear model, is appropriate for the relationship between otolith size and fish length, or whether the flexible monotone increasing model is more appropriate. For data sets involving a categorical variable, Bayes factors can be used to determine whether a model with interaction between the levels of the categorical variable is appropriate. For instance, a Bayes factor could be used to determine whether there is an interaction effect between sexes when considering the relationship between heart rate and body temperature. We conducted simulation studies to determine the effectiveness of using Bayes factors to select the model that generated the data. Bayes factors selected the correct model more often than tests based on linear regression models, except, as expected, when the assumptions of linear regression were met.
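For reference, the standard definition of the Bayes factor comparing two candidate models is given here. The model labels M_1 (for example, a linear model) and M_2 (for example, a monotone SRRS model) and the parameter symbols are illustrative; the abstract does not specify the priors used or how the marginal likelihoods are computed.

```latex
% Bayes factor of model M_1 against model M_2 for data y:
% the ratio of the two marginal likelihoods.
BF_{12}
  = \frac{p(y \mid M_1)}{p(y \mid M_2)}
  = \frac{\int p(y \mid \theta_1, M_1)\, \pi(\theta_1 \mid M_1)\, d\theta_1}
         {\int p(y \mid \theta_2, M_2)\, \pi(\theta_2 \mid M_2)\, d\theta_2}.

% BF_{12} > 1 favors M_1; BF_{12} < 1 favors M_2.
```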
Performing shape restricted regression in a Bayesian paradigm also has many possible extensions, including shape restricted regression splines for generalized linear regression, weighted least squares, and spatial statistics. Additionally, properties of estimates obtained from shape restricted regression can be further examined, as can the performance of other model selection methods relative to Bayes factors.
Dr. Jennifer Hoeting, Advisor
Dr. Mary Meyer, Co-advisor
Dr. Jean Opsomer, Committee Member
Dr. Kathryn Huyvaert, Department of Fish, Wildlife, and Conservation Biology