Keynote Speakers
  • Ray Chambers
  • Danny Pfeffermann
  • Jon Rao
  • Chris Skinner
  • Steve Thompson

Invited Speakers
  • Yves Berger
  • Hervé Cardot
  • Guillaume Chauvet
  • Alan Dorfman
  • Malay Ghosh
  • Camelia Goga
  • David Haziza
  • Jae-Kwang Kim
  • Frauke Kreuter
  • Partha Lahiri
  • Rod Little
  • Isabel Molina
  • Domingo Morales
  • Anne Ruiz-Gazen
  • Alastair Scott
  • Nikos Tzavidis
  • Rick Valliant
  • Lily Wang
  • Suojin Wang
  • Changbao Wu

Keynote Speakers

Ray Chambers

Using Social Network Information for Survey Estimation


Thomas Suesse and Ray Chambers, Centre for Statistical and Survey Methodology, University of Wollongong, Wollongong, Australia

Model-based and model-assisted methods of survey estimation aim to improve the precision of estimators of the population total or mean relative to methods based on the nonparametric Horvitz-Thompson estimator. These methods often use a linear regression model defined in terms of auxiliary variables whose values are assumed known for all population units. Information on networks represents another form of auxiliary information that might increase the precision of these estimators, particularly if it is reasonable to assume that networked population units have similar values of the survey variable. Linear models that use networks as a source of auxiliary information include autocorrelation, disturbance and contextual models. In this paper we focus on social networks, and investigate how much of the population structure of the network needs to be known for estimation methods based on these models to be useful. In particular, we use simulation to compare the performance of the best linear unbiased predictor under a model that ignores the network with model-based estimators that incorporate network information. Our results show that incorporating network information via a contextual model seems to be the most appropriate approach. We also show that one does not need to know the full population network, but that knowledge of the partial network linking the sampled population units to the non-sampled population units is necessary. Finally, we use friendship network data collected in the British Household Panel Study to illustrate the gains from applying the contextual model to estimation in this survey.
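
The Horvitz-Thompson estimator that serves as the baseline above weights each sampled value by the inverse of its inclusion probability. A minimal Python sketch with purely illustrative numbers (not data from the study):

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimator of a population total: each sampled
    value y_i is weighted by 1/pi_i, the inverse of its inclusion
    probability, so low-probability units count for more."""
    return sum(yi / pii for yi, pii in zip(y, pi))

# Hypothetical sample of 3 units (values and inclusion probabilities).
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.25]
print(horvitz_thompson_total(y, pi))  # 10/0.5 + 20/0.25 + 30/0.25 = 220.0
```

The model-based and model-assisted estimators discussed in the abstract aim to beat this baseline by exploiting auxiliary (here, network) information.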

Danny Pfeffermann

Cross-Sectional vs Time Series Benchmarking: Which One Should We Use?


Danny Pfeffermann, University of Southampton, UK and Hebrew University of Jerusalem, Israel, Anna Sikov, Hebrew University of Jerusalem, Israel, and Richard Tiller, Bureau of Labor Statistics, Washington DC, USA

This presentation is divided into two parts. In the first part I shall review and study the properties of single-stage cross-sectional and time series benchmarking procedures that have been proposed in the literature in the context of small area estimation. I shall compare cross-sectional and time series benchmarking empirically, using data generated from a time series model which complies with the familiar Fay-Herriot model at any given time point. The comparisons will focus on two important issues: efficiency under correct model specification and robustness to model misspecification.
In the second part I shall review and discuss cross-sectional and time series methods proposed in the literature for benchmarking hierarchical small area data. The time series method is applied to monthly unemployment estimates in Census Divisions (CD) and States of the U.S.A. The CD estimates are benchmarked to the national estimate and the State estimates are benchmarked to the benchmarked estimate of the CD to which they belong.

J.N.K. Rao

Weighted estimating equations approach to inference from complex survey data: overview and new developments

J. N. K. Rao, Carleton University

The traditional unweighted estimating equations approach requires modifications when analysing complex survey data because of informative sampling due to unequal selection probabilities and other design features. I will first review some work on design-weighted estimating equations and some refinements to make inference on regression parameters. I will then present some new work on design-weighted composite score equations to handle inference for variance and covariance components associated with two-level models that are widely used in practice.

Chris Skinner

The Use of Survey Weights in Regression Modelling


Chris Skinner, London School of Economics and Political Science

This talk will review the use of weights in regression modelling and provide an account of some new work with Jae Kim on weight smoothing. The starting point will be the traditional use of design weights to achieve consistent estimation under informative sampling. The main focus will be on approaches to modifying these weights to improve efficiency. It is expected that reference will also be made to calibration, Bayesian inference and nonresponse weighting.   

Steve Thompson

Dynamic Network Sampling


Steve Thompson, Simon Fraser University

In this talk I describe a range of designs for selecting samples in spatial and network populations that change over time.  Examples of such situations arise in studies of hard-to-reach populations of people at risk for HIV transmission and infection, monitoring of airborne microorganisms, assessment and management of forest insect pests, and surveys in many other types of continually changing populations.  An important use of sampling designs in such situations, in addition to providing inferences about population characteristics, is to find units to which to make interventions or apply treatments.  A number of sampling designs for dynamic populations will be described.  Some interesting properties of the sampling strategies emerge in the dynamic setting that do not arise in static sampling environments.  The application of this approach to the HIV epidemic and approaches to alleviating it will be illustrated.

Invited Speakers

Yves G. Berger

Empirical Likelihood Confidence Intervals under Unequal Probability Sampling


Yves G. Berger and Omar De La Riva Torres, University of Southampton, UK

We propose a novel empirical likelihood approach, which can be used to construct design-based confidence intervals under unequal probability sampling. The proposed approach gives confidence intervals that may have better coverages than standard confidence intervals and pseudo empirical likelihood confidence intervals, which rely on variance estimates and design-effects. The proposed approach does not rely on variance estimates, design-effects, re-sampling or linearisation, even when the parameter of interest is not linear. It can also be used to construct confidence intervals for means, regression coefficients, quantiles, totals or counts even when the population size is unknown. It also gives suitable confidence intervals when the point estimator is biased. We show that the proposed maximum empirical likelihood point estimator is asymptotically optimal. We also propose an approach that deals with large sampling fractions. We compare the proposed approach with the pseudo empirical likelihood approach, which needs to be adjusted by an estimated factor (the design effect); this may affect the coverages of the pseudo empirical likelihood confidence intervals. We also apply the proposed approach to a measure of poverty based upon the European Union Statistics on Income and Living Conditions (EU-SILC) surveys.

Hervé Cardot

Confidence bands for estimators of the mean of functional data for model assisted techniques and high entropy sampling designs


H. Cardot (Univ. Bourgogne, France), C. Goga (Univ. Bourgogne, France) and P. Lardin (EDF & La Poste, France)


When the study variable is functional (a function of time) and storage capacities are limited or transmission costs are high, selecting with survey sampling techniques a small fraction of the observations is an interesting alternative to signal compression techniques, particularly when the goal is the estimation of a simple quantity such as the mean trajectory.  We extend, in this functional framework, model-assisted estimators with linear regression models that can take account of auxiliary variables whose totals over the population are known. We first show, under weak hypotheses on the sampling design and the regularity of the trajectories, that the estimator of the mean function is uniformly consistent. Then, under additional assumptions, we prove a functional central limit theorem and we assess rigorously a fast technique based on simulations of Gaussian processes, which is employed to build asymptotic confidence bands.  We also consider a different approach based on πps sampling designs. Assuming the entropy of the sampling design is high, the variance function of the Horvitz-Thompson estimator can be approximated via the Hajek formula. We show, under hypotheses on the trajectories and the sampling design, that we get a uniformly consistent estimator of the variance function and that we are able to build confidence bands whose asymptotic coverage is the desired one.  A comparison of these two different approaches is made on a real dataset of sampled electricity consumption curves measured every half an hour over a period of one week.

Guillaume Chauvet

Doubly robust inference for complex parameters in the presence of missing survey data


Hélène Boistard, Toulouse School of Economics, Guillaume Chauvet, ENSAI (CREST), and David Haziza, Université de Montréal


Missing data are frequently encountered in surveys, when some units are unwilling to answer or cannot be contacted. We consider the case when a missing value is replaced with an artificial value through single imputation. To study the properties of the imputed estimators, we consider two distinct approaches for inference: the Nonresponse Model (NM) approach, in which inference is made with respect to the joint distribution induced by the sampling design and the assumed nonresponse model; and the Imputation Model (IM) approach, in which inference is made with respect to the joint distribution induced by the imputation model, the sampling design, and the nonresponse model.
So far, the literature has focused on estimating simple parameters such as population totals (or means). To the best of our knowledge, doubly robust estimation of more complex parameters has not been fully addressed in the literature. In this work, we examine the problem of doubly robust inference for the distribution function. We consider an imputation approach, where missing values are replaced with values of respondents selected at random with probabilities that ensure that the resulting estimator of the distribution function is doubly robust.
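
For intuition about the double robustness the abstract targets, the classical doubly robust estimator of a population total (not the authors' random imputation construction) combines an outcome-model prediction with an inverse-response-probability residual correction. A hedged sketch with hypothetical data and models:

```python
def dr_total(d, x, y, r, p, m):
    """Classical doubly robust estimator of a population total:
    model prediction m(x) summed over the whole sample, plus an
    inverse-response-probability-weighted residual correction over
    respondents (r_i = 1).  It is consistent if either the outcome
    model m or the response-probability model p is correct."""
    pred = sum(di * m(xi) for di, xi in zip(d, x))
    corr = sum(di * (yi - m(xi)) / pi
               for di, xi, yi, ri, pi in zip(d, x, y, r, p) if ri)
    return pred + corr

# Two hypothetical outcome models give the same answer here, hinting
# at the protection against misspecifying one of the two models.
print(dr_total([1, 1], [1, 2], [3, 5], [1, 1], [1, 1], lambda v: 0))      # 8.0
print(dr_total([1, 1], [1, 2], [3, 5], [1, 1], [1, 1], lambda v: 2 * v))  # 8.0
```

The work described above extends this kind of protection from totals to the distribution function.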

Alan H. Dorfman

On the Bona Fides of Cutoff Sampling


Alan H. Dorfman, U.S. Bureau of Labor Statistics


In its most general sense, cutoff sampling is a mode of sampling which deliberately embraces the ineligibility for sampling of some defined portion E of the population of interest U.   Thus the subset E is not represented by any of its members in the sample selected.  The resulting data gap is akin to what we also see arising "naturally" in survey non-response, where the characteristics of non-respondents cannot be known to be the same as those who do respond, as well as in small area estimation, where budgetary considerations and the need to cover broad swaths compel us to curtail units in small domains for which we want estimates.  Cutoff sampling, however, is in less repute than these procedures, because of its deliberate choice to omit that which is of interest, which in principle could have readily been sampled.  Despite its shadowy reputation, cutoff sampling is widely practiced, for example in certain U.S. federal establishment surveys.  It can, in fact, be viewed as a species of small area estimation.  Can we make valid inference under the restrictions it imposes and, if so, how?  

Malay Ghosh

A Likelihood Based Approach to Small Area Estimation under Measurement Error Models

Malay Ghosh, University of Florida

We propose an adjusted profile likelihood approach for small area estimation when covariates are subject to measurement error. We remove the bias in the profile score functions and obtain consistent estimators of the parameters. “Empirical” predictors of the random effects are then derived. They are shown to be first order asymptotically optimal in the sense of Robbins. Second order approximation of the mean squared error of these predictors is also provided.

Camelia Goga

Efficient Estimation of Nonlinear Finite Population Parameters Using Nonparametrics


Camelia Goga (IMB, Universite de Bourgogne, Dijon, France) and Anne Ruiz-Gazen (TSE, Universite Toulouse 1 Capitole, Toulouse, France)


The high-precision estimation of nonlinear parameters such as Gini indices, low-income proportions or other measures of inequality is currently particularly crucial. In the present paper, we propose a general class of estimators for such parameters that take into account univariate auxiliary information assumed to be known for every unit in the population. Through a nonparametric model-assisted approach, we construct a unique system of survey weights that can be used to estimate any nonlinear parameter associated with any study variable of the survey, using a plug-in principle. Based on a rigorous functional approach and a linearization principle, the asymptotic variance of the proposed estimators is derived, and variance estimators are shown to be consistent under mild assumptions. The theory is fully detailed for penalized B-spline estimators, and the relationship with nonparametric model-calibration is highlighted. The validity of the method is demonstrated on data extracted from the French Labor Force Survey, together with suggestions for practical implementation and guidelines for choosing the smoothing parameters. Point and confidence interval estimation for the Gini index and the low-income proportion are derived. Theoretical and empirical results highlight the benefits of using a nonparametric approach rather than a parametric one when estimating nonlinear parameters in the presence of auxiliary information.
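
To illustrate the plug-in principle, once a system of survey weights is available (however it was constructed), the Gini index can be estimated by applying those weights to the usual rank-based formula. A sketch, not the authors' implementation:

```python
def weighted_gini(y, w):
    """Plug-in (weighted) estimator of the Gini index: sort values and
    apply the weighted midpoint-rank form of the usual Gini formula.
    The weights w are whatever survey weights the design produced."""
    pairs = sorted(zip(y, w))
    total_w = sum(wi for _, wi in pairs)
    mean = sum(wi * yi for yi, wi in pairs) / total_w
    cum = 0.0      # weight accumulated before the current unit
    num = 0.0
    for yi, wi in pairs:
        rank = cum + wi / 2.0          # weighted midpoint rank
        num += wi * yi * rank
        cum += wi
    return 2.0 * num / (total_w ** 2 * mean) - 1.0

# Equal incomes give Gini 0; concentration pushes the index toward 1.
print(round(weighted_gini([0.0, 0.0, 10.0], [1.0, 1.0, 1.0]), 4))  # 0.6667
```

The contribution of the paper lies in how the weights themselves are built (penalized B-splines) and in the variance theory, not in this final plug-in step.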

David Haziza

On the problem of bias amplification in the context of instrument vector calibration for missing survey data

David Haziza (Université de Montréal & CREST/ENSAI) and Éric Lesage (CREST/ENSAI)


In recent years, instrument vector calibration has received a lot of attention in the literature in the context of unit nonresponse.  In this presentation, we discuss the so-called single-step approach to weighting, which consists of using calibration with three simultaneous goals in mind: reduce the nonresponse bias, ensure consistency between survey estimates and known population totals and, possibly, improve the efficiency of point estimates.  We examine the properties of instrument vector calibration estimators, where the instrumental variables (assumed to be related to the response propensity) are available for the responding units only. We illustrate the problem of bias amplification, which has been described in the epidemiological literature by Pearl (2010) and Myers et al. (2011).  Results of a simulation study will be presented.

Jae-Kwang Kim

Propensity-score-adjustment method for nonignorable nonresponse


Jae-Kwang Kim, Iowa State University and Minsun Riddles, Westat


The propensity-score-adjustment method is a popular technique for handling unit nonresponse in sample surveys. If the response probability depends on the study variable that is subject to missingness, estimating the response probability requires additional distributional assumptions about the study variable. Instead of making fully parametric assumptions about the population distribution and the response mechanism, we propose a new likelihood-based approach that is based on distributional assumptions about the observed part of the sample. Since the model for the observed part of the sample can be verified from the data, the proposed method is less sensitive to failure of the assumed model. Variance estimation is discussed and results from limited simulation studies are presented to compare the performance of the proposed method with existing methods.
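
The basic adjustment step that this work builds on divides each respondent's design weight by an estimated response probability (here taken as given rather than estimated via the authors' likelihood approach). A hedged sketch with hypothetical numbers:

```python
def psa_mean(y_resp, d_resp, p_resp):
    """Propensity-score-adjusted (Hajek-type) mean: each respondent's
    design weight d_i is divided by its response probability p_i,
    so respondents stand in for similar nonrespondents."""
    w = [di / pi for di, pi in zip(d_resp, p_resp)]
    return sum(wi * yi for wi, yi in zip(w, y_resp)) / sum(w)

# Hypothetical respondents: the unit with y=2 had only a 50% chance of
# responding, so it is up-weighted relative to the sure responder.
print(psa_mean([2.0, 4.0], [1.0, 1.0], [0.5, 1.0]))  # (2*2 + 1*4)/3 ≈ 2.667
```

The hard part addressed in the talk is obtaining the p_i when response depends on the study variable itself.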

Frauke Kreuter

Advancements in Sample Data Augmentation


Frauke Kreuter, University of Maryland, College Park, MD


In recent years large survey organizations have made considerable efforts to enhance information on all sample cases with paradata, data from commercial vendors, and through linkage to administrative data to allow for improved field operations or nonresponse adjustments. This presentation will review such efforts from several statistical agencies, discuss problems of data quality, and point to current best practices.

Partha Lahiri

Small Area Interval Estimation 


Partha Lahiri, University of Maryland, College Park, USA


In this talk, I will revisit the problem of constructing confidence intervals for small area means in the context of the Fay-Herriot model.  The naive normality-based empirical Bayes confidence interval cuts down the length of the corresponding confidence interval based on the direct method at the expense of coverage error.  In order to reduce the coverage error of the naive empirical Bayes confidence intervals, parametric bootstrap methods have been suggested in the literature.  However, parametric bootstrap methods are generally computer intensive, especially if the dataset contains a large number of small areas.  Moreover, determination of the number of bootstrap samples is often not straightforward.  In this talk, I will discuss an alternative method, based on my recent work with Masayo Yoshimori, that maintains the asymptotic coverage property of the parametric bootstrap by using area specific adjustment factors for the maximum likelihood method in estimating the model variance.  The method has a tremendous advantage over the parametric bootstrap in terms of computer time.  Moreover, in our simulation, the proposed method seems to perform better than the parametric bootstrap for areas with high leverages.

Roderick J.A. Little

Missing At Random And Ignorability For Survey Inferences With Missing Data


Roderick J.A. Little, University of Michigan and Sahar Z Zangeneh, University of Washington


In a landmark paper, Rubin (1976) showed that the missing data mechanism can be ignored for likelihood-based inference about parameters when (a) the missing data are missing at random (MAR), in the sense that missingness does not depend on the missing values after conditioning on the observed data, and (b) the parameters of the data model and the missing-data mechanism are distinct, that is, there are no a priori ties, via parameter space restrictions or prior distributions, between the parameters of the data model and the parameters of the model for the mechanism. Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data". However, it is important to note that these conditions are not necessary for ignoring the mechanism in all situations. We propose conditions for ignoring the missing-data mechanism for likelihood inferences about subsets of the parameters of the data model. We present examples where the missing data are ignorable for some parameters, but the missing data mechanism is missing not at random (MNAR), thus extending the range of circumstances where the missing data mechanism can be ignored. We apply these ideas to survey inference with missing data and poststratification information.

Isabel Molina

Small area estimation under a Fay-Herriot model with preliminary testing for the presence of random area effects


Isabel Molina, Department of Statistics, Universidad Carlos III de Madrid, Madrid, Spain, J.N.K. Rao, School of Mathematics and Statistics, Carleton University,
Ottawa, Canada, and Gauri S. Datta, Department of Statistics, University of Georgia, Athens, USA


The empirical best linear unbiased predictor (EBLUP) under a Fay-Herriot model is often used for estimation of a small area mean when the available auxiliary information is aggregated at the area level. The Fay-Herriot model involves unobservable random effects for the areas, which represent the area variation that is not explained by the auxiliary variables. Datta, Hall and Mandal (2011) proposed an alternative estimator to the EBLUP based on a preliminary test (PT) for the significance of the random effects variance. When the null hypothesis of no area effects is not rejected, a synthetic estimator based on the same model without the area effects is used. Otherwise, the EBLUP is used. The properties of this new estimator in terms of bias and mean squared error are studied for different values of the random effects variance and different significance levels of the testing procedure. The PT estimator is compared with the EBLUP, with the adjusted maximum likelihood (AML) estimator introduced by Li and Lahiri (2010) and with two more combined estimators that, like the AML, always give a nonzero weight to the direct estimator for all areas. Mean squared error estimators based on the preliminary testing procedure are also proposed and studied in simulations.
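
The EBLUP referred to above is a shrinkage average of the direct and synthetic estimators; setting the random-effects variance A to zero recovers exactly the synthetic estimator used after a non-rejected preliminary test. A single-covariate sketch with illustrative numbers only:

```python
def fh_eblup(y, x, beta, A, D):
    """EBLUP under the Fay-Herriot model (single covariate sketch):
    a shrinkage average of the direct estimate y_i and the synthetic
    estimate x_i * beta, with weight gamma_i = A / (A + D_i), where A
    is the random-effect variance and D_i the known sampling variance."""
    out = []
    for yi, xi, Di in zip(y, x, D):
        gamma = A / (A + Di) if (A + Di) > 0 else 0.0
        out.append(gamma * yi + (1.0 - gamma) * xi * beta)
    return out

# With A = 0 (no area effects) the EBLUP collapses to the synthetic
# estimate, the situation the preliminary test probes for.
print(fh_eblup([10.0], [2.0], 3.0, 1.0, [1.0]))  # [8.0]
print(fh_eblup([10.0], [2.0], 3.0, 0.0, [1.0]))  # [6.0]
```

In practice A and beta are estimated from the data, which is where the PT and AML variants discussed in the abstract differ.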

Domingo Morales

Small area estimation of labour force indicators under a multinomial mixed model with correlated time and area effects

Esther López-Vizcaíno, Instituto Galego de Estatística, Spain, María José Lombardía, Universidade da Coruña, Spain, and Domingo Morales, Universidad Miguel Hernández de Elche, Spain

The aim of this work is the estimation of small area labour force indicators like totals of employed and unemployed people and unemployment rates. Small area estimators of these quantities are derived from four multinomial logit mixed models, including a model with correlated time and area random effects. Mean squared errors are used to measure the accuracy of the proposed estimators and they are estimated by analytic and bootstrap methods. The introduced methodology is applied to real data from the Spanish Labour Force Survey of Galicia.

Anne Ruiz-Gazen

Approximation of rejective sampling inclusion probabilities and application to high order correlations


Anne Ruiz-Gazen, Toulouse School of Economics, France


In the finite population context, asymptotic properties of estimators, such as consistency and asymptotic normality, are usually derived under assumptions on high order inclusion probabilities of the sampling design.  The purpose of this presentation is to generalize the approximation result obtained by Hajek for the first and second order inclusion probabilities of rejective sampling to inclusion probabilities of any order, and also to provide a more precise remainder term in the expansion.  This result is applied to illustrate that rejective sampling satisfies conditions on higher order correlations imposed in the recent literature to derive asymptotic results. A comparison with some other existing results concerning high order correlations of the rejective sampling will also be presented.

Alastair Scott

Information Criteria under Complex Sampling


Alastair Scott and Thomas Lumley
Department of Statistics, University of Auckland


Model selection criteria such as AIC and BIC are widely used in applied statistics.  In recent years there has been a huge increase in regression modelling of data from large complex surveys, and a resulting demand for versions of AIC and BIC that are valid under complex sampling. We show how to extend both criteria to complex samples. Following the approach of Takeuchi (1976) for possibly-misspecified models, AIC can be extended by replacing the penalty term by the Rao–Scott formula for the null expectation of the log likelihood ratio. BIC can be extended by a Bayesian coarsening argument, where the point estimates under complex sampling are treated as the data available for Bayesian modelling. The Laplace approximation argument used to construct BIC then gives a penalty term involving the trace and determinant of the Rao–Scott design-effect matrix.

Nikos Tzavidis

Using M-quantile Regression for Small Area Estimation of Binary and Count Outcomes


Ray Chambers (University of Wollongong), Emanuela Dreassi (University of Florence), M. Giovanna Ranalli (University of Perugia), Nicola Salvati (University of Pisa), Nikos Tzavidis (University of Southampton)


The increasing demand for reliable small area statistics has led to the development of a number of efficient model-based small area estimation (SAE) methods. For example, the empirical best linear unbiased predictor based on a linear mixed model (LMM) is often recommended when the target of inference is the small area average of a continuous response variable.  Using a mixed model, however, requires strong distributional assumptions.  An alternative approach to small area estimation that automatically allows for robust inference is to use M-quantile models (Chambers & Tzavidis, 2006).  In reality, many survey variables are categorical in nature and are therefore not suited to standard SAE methods based on LMMs. In this presentation we discuss recent work on a new approach to SAE for discrete outcomes based on M-quantile modelling. This is based on extending the existing M-quantile approach for continuous outcomes to the case where the response is binary or a count. As with M-quantile modelling of a continuous response, random effects are avoided and between area variation in the response is characterised by variation in area-specific values of quantile-like coefficients. After reviewing M-quantile small area estimation for a continuous response, we show how the approach for robust inference for generalised linear models (GLMs) proposed by Cantoni & Ronchetti (2001) can be extended for fitting an M-quantile GLM. Approaches for defining the M-quantile coefficients, which play the role of pseudo-random effects in this framework, are discussed, alongside the definition of small area predictors and corresponding MSE estimators. Results from model-based and design-based simulation studies aimed at empirically assessing the performance of the proposed small area predictors are presented.
The presentation is concluded by presenting results from the application of the proposed methods for deriving (a) unemployment estimates for Local Authority Districts in the UK and (b) estimates of the number of visits in primary health care outlets for Health Authority Districts in Italy.

Richard Valliant

Effects on Sample Design of Varying Unit Sizes in Two- and Three-stage Sampling


Richard Valliant, Universities of Michigan & Maryland


Two- and three-stage sampling is sometimes necessary in household or establishment surveys for operational or cost reasons.  Accompanying each stage of sampling is a variance component that depends on the type of sample design used at each stage and on the estimator.  The relative sizes of the variance components determine measures of homogeneity that are used in determining an efficient sample design.  Although textbooks usually make the simplifying assumption that the sampling units at each stage contain the same number of subunits, this assumption is typically violated in real populations.  Not accounting for varying sizes of units can lead to unexpectedly inefficient sample allocations.  These points are illustrated using a data set based on the 2000 Census for a county in Maryland.

Lily Wang

Estimation of Small Area Means under Semi-Parametric Measurement Error Models


Gauri S. Datta, University of Georgia, Peter Hall, University of Melbourne, Aurore Delaigle, University of Melbourne, and Lily Wang, University of Georgia


In recent years, demand for reliable estimates for characteristics of small domains (small areas) has considerably increased worldwide due to the growing use of such estimates in formulating policies and programs, allocating government funds, planning regional development, and making marketing decisions at the local level. However, due to cost and operational considerations, it is rarely possible to get a large enough sample at the small area level to support direct estimates with adequate precision for all domains of interest. Model-based inference has gained immense popularity in producing indirect but reliable small area estimates. These indirect estimates borrow strength from related areas and other data sources by linking them through appropriate models. Existing methods in small area estimation are mostly parametric, and they usually treat the explanatory variables as if they are measured without error. However, explanatory variables are often subject to measurement error. A few authors have addressed the measurement error problem in small area estimation through a parametric approach based on the normality assumption. The resulting estimates are usually sensitive to the distributional assumptions. In this talk, we consider structural measurement error models and a semi-parametric approach to produce reliable point estimates and prediction intervals for small area means. Specifically, we consider an adaptation of the Fay-Herriot model for the area-level data where one of the covariates is measured with error. We replace the normality assumption of the sampling error and the normality assumption of the measurement error of a covariate by heavy-tailed distributions. Estimating the unknown measurement error density nonparametrically, we develop both point estimates and prediction intervals of small area means. We have obtained an expansion of the coverage error of the proposed prediction intervals.

Suojin Wang

Variable selection and estimation for longitudinal survey data


Lily Wang, University of Georgia, and Suojin Wang, Texas A&M University


There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. In this talk we discuss a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure, as if the correct submodel were known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Numerical illustrations are given to show the usefulness of the proposed methodology under various model settings and sampling designs.

Changbao Wu

Calibration Weighting Methods for Complex Surveys


Changbao Wu, Department of Statistics and Actuarial Science, University of Waterloo


This paper provides an overview of three popular calibration weighting methods for complex surveys: (i) the regression weighting method; (ii) the exponential tilting method; and (iii) the pseudo empirical likelihood method. Computational algorithms for each of the methods are discussed, and finite sample configurations of the three types of weights are examined through simulation studies. The pseudo empirical likelihood approach to calibration is shown to have several advantages, including stable weights, efficient and reliable computational procedures, and easy handling of generalized raking, a special calibration problem where auxiliary population information is in the form of known marginal totals for a contingency table.
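
For method (i), the regression (GREG) calibration weights have a closed form under a chi-square distance. A minimal single-covariate sketch with hypothetical data:

```python
def regression_calibration(d, x, tx):
    """Regression (GREG) calibration with one auxiliary variable:
    adjust design weights d so the weighted total of x equals the
    known population total tx, minimising a chi-square distance to d."""
    ht_x = sum(di * xi for di, xi in zip(d, x))        # HT estimate of the x-total
    sxx = sum(di * xi * xi for di, xi in zip(d, x))
    lam = (tx - ht_x) / sxx                            # Lagrange multiplier
    return [di * (1.0 + lam * xi) for di, xi in zip(d, x)]

# Hypothetical sample: the calibrated weights reproduce tx exactly.
w = regression_calibration([1.0, 1.0, 1.0], [1.0, 2.0, 3.0], 12.0)
print(round(sum(wi * xi for wi, xi in zip(w, [1.0, 2.0, 3.0])), 6))  # 12.0
```

Methods (ii) and (iii) solve the same calibration constraints under different distance measures, which is what keeps the pseudo empirical likelihood weights positive and stable.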