


Keynote Speakers



Ray Chambers 
Using Social Network Information for Survey Estimation 
Thomas Suesse and Ray Chambers, Centre for Statistical and Survey Methodology, University of Wollongong, Wollongong, Australia 

Model-based and model-assisted methods of survey estimation aim to improve the precision of estimators of the population total or mean relative to methods based on the nonparametric Horvitz-Thompson estimator. These methods often use a linear regression model defined in terms of auxiliary variables whose values are assumed known for all population units. Information on networks represents another form of auxiliary information that might increase the precision of these estimators, particularly if it is reasonable to assume that networked population units have similar values of the survey variable. Linear models that use networks as a source of auxiliary information include autocorrelation, disturbance and contextual models. In this paper we focus on social networks, and investigate how much of the population structure of the network needs to be known for estimation methods based on these models to be useful. In particular, we use simulation to compare the performance of the best linear unbiased predictor under a model that ignores the network with model-based estimators that incorporate network information. Our results show that incorporating network information via a contextual model seems to be the most appropriate approach. We also show that one does not need to know the full population network, but that knowledge of the partial network linking the sampled population units to the non-sampled population units is necessary. Finally, we use friendship network data collected in the British Household Panel Study to illustrate the gains from applying the contextual model to estimation in this survey. 
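For reference, the Horvitz-Thompson estimator mentioned above weights each sampled value by its inverse inclusion probability. A minimal sketch (function and variable names are ours, for illustration only):

```python
import numpy as np

def horvitz_thompson_total(y_sample, pi_sample):
    """Horvitz-Thompson estimator of a population total:
    sum of y_i / pi_i over the sample, where pi_i is the
    inclusion probability of sampled unit i."""
    y = np.asarray(y_sample, dtype=float)
    pi = np.asarray(pi_sample, dtype=float)
    return np.sum(y / pi)
```

Under equal-probability sampling with pi_i = n/N, this reduces to N times the sample mean; model-based and model-assisted estimators aim to improve on it by exploiting auxiliary information.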

Danny Pfeffermann 
Cross-Sectional vs Time Series Benchmarking: Which One Should We Use? 
Danny Pfeffermann, University of Southampton, UK and Hebrew University of Jerusalem, Israel, Anna Sikov, Hebrew University of Jerusalem, Israel, and Richard Tiller, Bureau of Labor Statistics, Washington DC, USA 

This presentation is divided into two parts. In the first part I shall review and study the properties of single-stage cross-sectional and time series benchmarking procedures that have been proposed in the literature in the context of small area estimation. I shall compare cross-sectional and time series benchmarking empirically, using data generated from a time series model which complies with the familiar Fay-Herriot model at any given time point. The comparisons will focus on two important issues: efficiency under correct model specification and robustness to model misspecification. 

J.N.K. Rao 
Weighted estimating equations approach to inference from complex survey data: overview and new developments 
J. N. K. Rao, Carleton University 

The traditional unweighted estimating equations approach requires modifications when analysing complex survey data because of informative sampling due to unequal selection probabilities and other design features. I will first review some work on design-weighted estimating equations and some refinements to make inference on regression parameters. I will then present some new work on design-weighted composite score equations to handle inference for variance and covariance components associated with two-level models that are widely used in practice. 

Chris Skinner 
The Use of Survey Weights in Regression Modelling 
Chris Skinner, London School of Economics and Political Science 

This talk will review the use of weights in regression modelling and provide an account of some new work with Jae Kim on weight smoothing. The starting point will be the traditional use of design weights to achieve consistent estimation under informative sampling. The main focus will be on approaches to modifying these weights to improve efficiency. It is expected that reference will also be made to calibration, Bayesian inference and nonresponse weighting. 

Steve Thompson 
Dynamic Network Sampling 
Steve Thompson, Simon Fraser University 

In this talk I describe a range of designs for selecting samples in spatial and network populations that change over time. Examples of such situations arise in studies of hard-to-reach populations of people at risk for HIV transmission and infection, monitoring of airborne microorganisms, assessment and management of forest insect pests, and surveys in many other types of continually changing populations. An important use of sampling designs in such situations, in addition to providing inferences about population characteristics, is to find units to which to make interventions or apply treatments. A number of sampling designs for dynamic populations will be described. Some interesting properties of the sampling strategies emerge in the dynamic setting that do not arise in static sampling environments. The application of this approach to the HIV epidemic and approaches to alleviating it will be illustrated. 
Invited Speakers



Yves G. Berger 
Empirical Likelihood Confidence Intervals under Unequal Probability Sampling 

Yves G. Berger and Omar De La Riva Torres, University of Southampton, UK 
We propose a novel empirical likelihood approach, which can be used to construct design-based confidence intervals under unequal probability sampling. The proposed approach gives confidence intervals that may have better coverage than standard confidence intervals and pseudo empirical likelihood confidence intervals, which rely on variance estimates and design effects. The proposed approach does not rely on variance estimates, design effects, resampling or linearisation, even when the parameter of interest is not linear. It can also be used to construct confidence intervals for means, regression coefficients, quantiles, totals or counts, even when the population size is unknown. It also gives suitable confidence intervals when the point estimator is biased. We show that the proposed maximum empirical likelihood point estimator is asymptotically optimal. We also propose an approach that deals with large sampling fractions. We compare the proposed approach with the pseudo empirical likelihood approach, which needs to be adjusted by a factor (the design effect) that is estimated; this estimation may affect the coverage of the pseudo empirical likelihood confidence intervals. We also apply the proposed approach to a measure of poverty based upon the European Union Statistics on Income and Living Conditions (EU-SILC) surveys. 

Hervé Cardot 
Confidence bands for estimators of the mean of functional data for model-assisted techniques and high entropy sampling designs 
H. Cardot (Univ. Bourgogne, France), C. Goga (Univ. Bourgogne, France) and P. Lardin (EDF & La Poste, France) 

When the study variable is functional (a function of time) and storage capacities are limited or transmission costs are high, selecting a small fraction of the observations with survey sampling techniques is an interesting alternative to signal compression techniques, particularly when the goal is the estimation of a simple quantity such as the mean trajectory. We extend, in this functional framework, model-assisted estimators with linear regression models that can take account of auxiliary variables whose totals over the population are known. We first show, under weak hypotheses on the sampling design and the regularity of the trajectories, that the estimator of the mean function is uniformly consistent. Then, under additional assumptions, we prove a functional central limit theorem and we rigorously assess a fast technique based on simulations of Gaussian processes, which is employed to build asymptotic confidence bands. We also consider a different approach based on πps sampling designs. Assuming the entropy of the sampling design is high, the variance function of the Horvitz-Thompson estimator can be approximated via the Hájek formula. We show, under hypotheses on the trajectories and the sampling design, that we get a uniformly consistent estimator of the variance function and that we are able to build confidence bands whose asymptotic coverage is the desired one. A comparison of these two approaches is made on a real dataset of sampled electricity consumption curves measured every half hour over a period of one week. 

Guillaume Chauvet 
Doubly robust inference for complex parameters in the presence of missing survey data 
Hélène Boistard, Toulouse School of Economics, Guillaume Chauvet, ENSAI (CREST), and David Haziza, Université de Montréal 

Missing data are frequently encountered in surveys, when some units are unwilling to answer or cannot be reached. We consider the case when a missing value is replaced with an artificial value through simple imputation. To study the properties of the imputed estimators, we consider two distinct approaches for inference: the Nonresponse Model (NM) approach, in which inference is made with respect to the joint distribution induced by the sampling design and the assumed nonresponse model; and the Imputation Model (IM) approach, in which inference is made with respect to the joint distribution induced by the imputation model, the sampling design, and the nonresponse model. 

Alan H. Dorfman 
On the Bona Fides of Cutoff Sampling 
Alan H. Dorfman, U.S. Bureau of Labor Statistics 

In its most general sense, cutoff sampling is a mode of sampling which deliberately embraces the ineligibility for sampling of some defined portion E of the population of interest U. Thus the subset E is not represented by any of its members in the sample selected. The resulting data gap is akin to what we also see arising "naturally" in survey nonresponse, where the characteristics of nonrespondents cannot be known to be the same as those who do respond, as well as in small area estimation, where budgetary considerations and the need to cover broad swaths compel us to curtail units in small domains for which we want estimates. Cutoff sampling, however, is in less repute than these procedures, because of its deliberate choice to omit that which is of interest, which in principle could have readily been sampled. Despite its shadowy reputation, cutoff sampling is widely practiced, for example in certain U.S. federal establishment surveys. It can, in fact, be viewed as a species of small area estimation. Can we make valid inference under the restrictions it imposes and, if so, how? 

Malay Ghosh 
A Likelihood Based Approach to Small Area Estimation under Measurement Error Models 
Malay Ghosh, University of Florida  
We propose an adjusted profile likelihood approach for small area estimation when covariates are subject to measurement error. We remove the bias in the profile score functions and obtain consistent estimators of the parameters. “Empirical” predictors of the random effects are then derived. They are shown to be first order asymptotically optimal in the sense of Robbins. Second order approximation of the mean squared error of these predictors is also provided. 

Camelia Goga 
Efficient Estimation of Nonlinear Finite Population Parameters Using Nonparametrics 
Camelia Goga (IMB, Université de Bourgogne, Dijon, France) and Anne Ruiz-Gazen (TSE, Université Toulouse 1 Capitole, Toulouse, France) 

Currently, the high-precision estimation of nonlinear parameters such as Gini indices, low-income proportions or other measures of inequality is particularly crucial. In the present paper, we propose a general class of estimators for such parameters that take into account univariate auxiliary information assumed to be known for every unit in the population. Through a nonparametric model-assisted approach, we construct a unique system of survey weights that can be used to estimate any nonlinear parameter associated with any study variable of the survey, using a plug-in principle. Based on a rigorous functional approach and a linearization principle, the asymptotic variance of the proposed estimators is derived, and variance estimators are shown to be consistent under mild assumptions. The theory is fully detailed for penalized B-spline estimators, and the relationship with nonparametric model-calibration is highlighted. The validity of the method is demonstrated on data extracted from the French Labour Force Survey, together with suggestions for practical implementation and guidelines for choosing the smoothing parameters. Point and confidence interval estimation for the Gini index and the low-income proportion are derived. Theoretical and empirical results highlight the interest of using a nonparametric approach versus a parametric one when estimating nonlinear parameters in the presence of auxiliary information. 

David Haziza 
On the problem of bias amplification in the context of instrument vector calibration for missing survey data 
David Haziza (Université de Montréal & CREST/ENSAI) and Éric Lesage (CREST/ENSAI) 

In recent years, instrument vector calibration has received a lot of attention in the literature in the context of unit nonresponse. In this presentation, we discuss the so-called single-step approach to weighting, which consists of using calibration with three simultaneous goals in mind: reduce the nonresponse bias, ensure consistency between survey estimates and known population totals and, possibly, improve the efficiency of point estimates. We examine the properties of instrument vector calibration estimators, where the instrumental variables (assumed to be related to the response propensity) are available for the responding units only. We illustrate the problem of bias amplification, which has been described in the epidemiological literature by Pearl (2010) and Myers et al. (2011). Results of a simulation study will be presented. 

Jae-Kwang Kim 
Propensity-score-adjustment method for nonignorable nonresponse 
Jae-Kwang Kim, Iowa State University, and Minsun Riddles, Westat 

The propensity-score-adjustment method is a popular technique for handling unit nonresponse in sample surveys. If the response probability depends on the study variable that is subject to missingness, estimating the response probability requires additional distributional assumptions about the study variable. Instead of making fully parametric assumptions about the population distribution and the response mechanism, we propose a new likelihood-based approach that relies only on distributional assumptions about the observed part of the sample. Since the model for the observed part of the sample can be verified from the data, the proposed method is less sensitive to failure of the assumed model. Variance estimation is discussed, and results from limited simulation studies are presented to compare the performance of the proposed method with existing methods. 

Frauke Kreuter 
Advancements in Sample Data Augmentation 
Frauke Kreuter, University of Maryland, College Park, MD 

In recent years large survey organizations have made considerable efforts to enhance information on all sample cases with paradata, data from commercial vendors, and through linkage to administrative data to allow for improved field operations or nonresponse adjustments. This presentation will review such efforts from several statistical agencies, discuss problems of data quality, and point to current best practices. 

Partha Lahiri 
Small Area Interval Estimation 
Partha Lahiri, University of Maryland, College Park, USA 

In this talk, I will revisit the problem of constructing confidence intervals for small area means in the context of the Fay-Herriot model. The naive normality-based empirical Bayes confidence interval cuts down the length of the corresponding confidence interval based on the direct method, at the expense of coverage error. In order to reduce the coverage error of the naive empirical Bayes confidence intervals, parametric bootstrap methods have been suggested in the literature. However, parametric bootstrap methods are generally computer intensive, especially if the dataset contains a large number of small areas. Moreover, determination of the number of bootstrap samples is often not straightforward. In this talk, I will discuss an alternative method, based on my recent work with Masayo Yoshimori, that maintains the asymptotic coverage property of the parametric bootstrap by using area-specific adjustment factors for the maximum likelihood method in estimating the model variance. The method has a tremendous advantage over the parametric bootstrap in terms of computing time. Moreover, in our simulations, the proposed method seems to perform better than the parametric bootstrap for areas with high leverages. 
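For readers less familiar with the setting, the Fay-Herriot model underlying this talk has the standard two-level area-level form, and the naive interval referred to above is usually written as follows (notation is ours, not taken from the talk):

```latex
% Fay-Herriot area-level model: true area means theta_i with
% sampling model y_i | theta_i and linking model for theta_i.
\theta_i = x_i^{\top}\beta + v_i, \qquad v_i \sim N(0, A), \qquad i = 1,\dots,m,
\qquad y_i = \theta_i + e_i, \qquad e_i \sim N(0, D_i), \ D_i \ \text{known}.
% With shrinkage factor gamma_i = A / (A + D_i), the naive
% normality-based empirical Bayes interval for theta_i is
\hat{\theta}_i^{\mathrm{EB}} \;\pm\; z_{\alpha/2}\,\sqrt{\hat{\gamma}_i D_i},
```

which is shorter than the direct interval $y_i \pm z_{\alpha/2}\sqrt{D_i}$ because $\hat{\gamma}_i \le 1$, but ignores the uncertainty in estimating $\beta$ and $A$; this is the source of the coverage error that the bootstrap and adjusted likelihood methods address.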

Roderick J.A. Little 
Missing At Random And Ignorability For Survey Inferences With Missing Data 
Roderick J.A. Little, University of Michigan and Sahar Z Zangeneh, University of Washington 

In a landmark paper, Rubin (1976) showed that the missing-data mechanism can be ignored for likelihood-based inference about parameters when (a) the missing data are missing at random (MAR), in the sense that missingness does not depend on the missing values after conditioning on the observed data, and (b) the parameters of the data model and the missing-data mechanism are distinct, that is, there are no a priori ties, via parameter space restrictions or prior distributions, between the parameters of the data model and the parameters of the model for the mechanism. Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data". However, it is important to note that these conditions are not necessary for ignoring the mechanism in all situations. We propose conditions for ignoring the missing-data mechanism for likelihood inferences about subsets of the parameters of the data model. We present examples where the missing data are ignorable for some parameters, but the missing-data mechanism is missing not at random (MNAR), thus extending the range of circumstances where the missing-data mechanism can be ignored. We apply these ideas to survey inference with missing data and poststratification information. 

Isabel Molina 
Small area estimation under a Fay-Herriot model with preliminary testing for the presence of random area effects 
Isabel Molina, Department of Statistics, Universidad Carlos III de Madrid, Madrid, Spain, and J.N.K. Rao, School of Mathematics and Statistics, Carleton University 

The empirical best linear unbiased predictor (EBLUP) under a Fay-Herriot model is often used for estimation of a small area mean when the available auxiliary information is aggregated at the area level. The Fay-Herriot model involves unobservable random effects for the areas, which represent the area variation that is not explained by the auxiliary variables. Datta, Hall and Mandal (2011) proposed an alternative estimator to the EBLUP based on a preliminary test (PT) for the significance of the random effects variance. When the null hypothesis of no area effects is not rejected, a synthetic estimator based on the same model without the area effects is used; otherwise, the EBLUP is used. The properties of this new estimator in terms of bias and mean squared error are studied for different values of the random effects variance and different significance levels of the testing procedure. The PT estimator is compared with the EBLUP, with the adjusted maximum likelihood (AML) estimator introduced by Li and Lahiri (2010), and with two more combined estimators that, like the AML estimator, always give a nonzero weight to the direct estimator for all areas. Mean squared error estimators based on the preliminary testing procedure are also proposed and studied in simulations. 
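The preliminary-test logic can be sketched as follows. This is an illustrative simplification, not the authors' exact procedure: the chi-squared test of zero area-effects variance follows the idea in Datta, Hall and Mandal (2011), while the variance estimator used on the rejection branch is a deliberately crude moment estimator of our own choosing.

```python
import numpy as np
from scipy import stats

def pt_small_area_estimates(y, X, D, alpha=0.05):
    """Illustrative preliminary-test (PT) estimator under a Fay-Herriot
    model.  y: direct area estimates; X: area-level covariate matrix;
    D: known sampling variances.  Sketch only, not the exact procedure."""
    y, D = np.asarray(y, dtype=float), np.asarray(D, dtype=float)
    m, p = X.shape
    # Weighted least squares fit under H0: no random area effects (A = 0)
    W = 1.0 / D
    beta0 = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    resid0 = y - X @ beta0
    # Test statistic, approximately chi-squared(m - p) under H0
    T = np.sum(resid0 ** 2 / D)
    if T <= stats.chi2.ppf(1 - alpha, m - p):
        return X @ beta0                      # synthetic estimator
    # H0 rejected: EBLUP with a crude moment estimate of the variance A
    A = max(0.0, np.mean(resid0 ** 2) - np.mean(D))
    V = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (V[:, None] * X), X.T @ (V * y))
    gamma = A / (A + D)                       # shrinkage weights
    return gamma * y + (1.0 - gamma) * (X @ beta)
```

The two branches make the bias/variance trade-off studied in the talk explicit: the synthetic branch has small variance but is biased when area effects exist, while the EBLUP branch shrinks each direct estimate towards the regression fit.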

Domingo Morales 
Small area estimation of labour force indicators under a multinomial mixed model with correlated time and area effects 
Esther LópezVizcaíno, Instituto Galego de Estatística, Spain, María José Lombardía, Universidade da Coruña, Spain, and Domingo Morales, Universidad Miguel Hernández de Elche, Spain  
The aim of this work is the estimation of small area labour force indicators like totals of employed and unemployed people and unemployment rates. Small area estimators of these quantities are derived from four multinomial logit mixed models, including a model with correlated time and area random effects. Mean squared errors are used to measure the accuracy of the proposed estimators and they are estimated by analytic and bootstrap methods. The introduced methodology is applied to real data from the Spanish Labour Force Survey of Galicia. 

Anne Ruiz-Gazen 
Approximation of rejective sampling inclusion probabilities and application to high order correlations 
Anne Ruiz-Gazen, Toulouse School of Economics, France 

In the finite population context, asymptotic properties of estimators, such as consistency and asymptotic normality, are usually derived under assumptions on high order inclusion probabilities of the sampling design. The purpose of this presentation is to generalize the approximation result obtained by Hájek for the first and second order inclusion probabilities of rejective sampling to inclusion probabilities of any order, and also to provide a more precise remainder term in the expansion. This result is applied to illustrate that rejective sampling satisfies conditions on higher order correlations imposed in the recent literature to derive asymptotic results. A comparison with some other existing results concerning high order correlations of rejective sampling will also be presented. 
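Hájek's classical second-order result, which this talk generalises to higher orders, yields the well-known variance approximation for the Horvitz-Thompson total under a high-entropy fixed-size design. A minimal sketch of that approximation (our notation and function name, for illustration):

```python
import numpy as np

def hajek_variance_approx(y, pi):
    """Hajek approximation to Var of the Horvitz-Thompson total under a
    high-entropy (e.g. rejective) fixed-size design:
        sum_U pi_i (1 - pi_i) * (y_i / pi_i - A)^2,
    where A is the pi(1-pi)-weighted average of the check values y_i/pi_i."""
    y, pi = np.asarray(y, dtype=float), np.asarray(pi, dtype=float)
    c = pi * (1.0 - pi)
    check = y / pi                      # "check" values y_i / pi_i
    A = np.sum(c * check) / np.sum(c)   # weighted centering constant
    return np.sum(c * (check - A) ** 2)
```

The approximation avoids the second-order inclusion probabilities entirely; the talk's higher-order expansions play the analogous role for conditions on third- and fourth-order correlations.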

Alastair Scott 
Information Criteria under Complex Sampling 
Alastair Scott and Thomas Lumley 

Model selection criteria such as AIC and BIC are widely used in applied statistics. In recent years there has been a huge increase in regression modelling of data from large complex surveys, and a resulting demand for versions of AIC and BIC that are valid under complex sampling. We show how to extend both criteria to complex samples. Following the approach of Takeuchi (1976) for possibly misspecified models, AIC can be extended by replacing the penalty term with the Rao–Scott formula for the null expectation of the log likelihood ratio. BIC can be extended by a Bayesian coarsening argument, where the point estimates under complex sampling are treated as the data available for Bayesian modelling. The Laplace approximation argument used to construct BIC then gives a penalty term involving the trace and determinant of the Rao–Scott design-effect matrix. 
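One way to render the design-based AIC described above in symbols (our notation, inferred from the abstract rather than quoted from the talk) is:

```latex
% Design-based AIC: the usual penalty 2p is replaced by the
% Rao-Scott null expectation of the log likelihood ratio, i.e.
% twice the trace of the estimated design-effect matrix Delta-hat:
\mathrm{dAIC} \;=\; -2\,\hat{\ell}(\hat{\theta}) \;+\; 2\,\operatorname{tr}(\hat{\Delta}),
% which reduces to the ordinary AIC penalty 2p when
% Delta-hat = I_p, i.e. under simple random sampling with a
% correctly specified p-parameter model.
```

The reduction to the familiar $2p$ penalty in the equal-design-effect case is the sanity check: clustering and weighting inflate (or deflate) the effective penalty per parameter.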

Nikos Tzavidis 
Using M-quantile Regression for Small Area Estimation of Binary and Count Outcomes 
Ray Chambers (University of Wollongong), Emanuela Dreassi (University of Florence), M. Giovanna Ranalli (University of Perugia), Nicola Salvati (University of Pisa), Nikos Tzavidis (University of Southampton) 

The increasing demand for reliable small area statistics has led to the development of a number of efficient model-based small area estimation (SAE) methods. For example, the empirical best linear unbiased predictor based on a linear mixed model (LMM) is often recommended when the target of inference is the small area average of a continuous response variable. Using a mixed model, however, requires strong distributional assumptions. An alternative approach to small area estimation that automatically allows for robust inference is to use M-quantile models (Chambers & Tzavidis, 2006). In reality, many survey variables are categorical in nature and are therefore not suited to standard SAE methods based on LMMs. In this presentation we discuss recent work on a new approach to SAE for discrete outcomes based on M-quantile modelling. This is based on extending the existing M-quantile approach for continuous outcomes to the case where the response is binary or a count. As with M-quantile modelling of a continuous response, random effects are avoided and between-area variation in the response is characterised by variation in area-specific values of quantile-like coefficients. After reviewing M-quantile small area estimation for a continuous response, we show how the approach for robust inference for generalised linear models (GLMs) proposed by Cantoni & Ronchetti (2001) can be extended for fitting an M-quantile GLM. Approaches for defining the M-quantile coefficients, which play the role of pseudo-random effects in this framework, are discussed, alongside the definition of small area predictors and corresponding MSE estimators. Results from model-based and design-based simulation studies aimed at empirically assessing the performance of the proposed small area predictors are presented. 
The presentation concludes with results from the application of the proposed methods for deriving (a) unemployment estimates for Local Authority Districts in the UK and (b) estimates of the number of visits to primary health care outlets for Health Authority Districts in Italy. 

Richard Valliant 
Effects on Sample Design of Varying Unit Sizes in Two- and Three-stage Sampling 
Richard Valliant, Universities of Michigan & Maryland 

Two- and three-stage sampling is sometimes necessary in household or establishment surveys for operational or cost reasons. Accompanying each stage of sampling is a variance component that depends on the type of sample design used at that stage and on the estimator. The relative sizes of the variance components determine measures of homogeneity that are used in determining an efficient sample design. Although textbooks usually make the simplifying assumption that the units used at each stage of sampling contain the same number of subunits, this assumption is typically violated in real populations. Not accounting for varying unit sizes can lead to unexpectedly inefficient sample allocations. These points are illustrated using a data set based on the 2000 Census for a county in Maryland. 

Lily Wang 
Estimation of Small Area Means under Semiparametric Measurement Error Models 
Gauri S. Datta, University of Georgia, Peter Hall, University of Melbourne, Aurore Delaigle, University of Melbourne, and Lily Wang, University of Georgia 

In recent years, demand for reliable estimates of characteristics of small domains (small areas) has increased considerably worldwide, due to the growing use of such estimates in formulating policies and programs, allocating government funds, planning regional development, and making marketing decisions at the local level. However, due to cost and operational considerations, it is rarely possible to get a large enough sample at the small area level to support direct estimates with adequate precision for all domains of interest. Model-based inference has gained immense popularity for producing indirect but reliable small area estimates. These indirect estimates borrow strength from related areas and other data sources by linking them through appropriate models. Existing methods in small area estimation are mostly parametric, and they usually treat the explanatory variables as if they were measured without error. However, explanatory variables are often subject to measurement error. A few authors have addressed the measurement error problem in small area estimation through a parametric approach based on the normality assumption; the resulting estimates are usually sensitive to the distributional assumptions. In this talk, we consider structural measurement error models and a semiparametric approach to produce reliable point estimates and prediction intervals for small area means. Specifically, we consider an adaptation of the Fay-Herriot model for area-level data where one of the covariates is measured with error. We replace the normality assumptions on the sampling error and on the measurement error of the covariate with heavy-tailed distributions. Estimating the unknown measurement error density nonparametrically, we develop both point estimates and prediction intervals for small area means. We have obtained an expansion of the coverage error of the proposed prediction intervals. 

Suojin Wang 
Variable selection and estimation for longitudinal survey data 
Lily Wang, University of Georgia, and Suojin Wang, Texas A&M University 

There is wide interest in studying longitudinal surveys, in which sample subjects are observed successively over time. Longitudinal surveys are used in many areas today, for example in the health and social sciences, to explore relationships or to identify significant variables in regression settings. In this talk we discuss a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure would if the correct submodel were known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables in complex longitudinal survey data. Numerical illustrations are given to show the usefulness of the proposed methodology under various model settings and sampling designs. 

Changbao Wu 
Calibration Weighting Methods for Complex Surveys 
Changbao Wu, Department of Statistics and Actuarial Science, University of Waterloo 

This paper provides an overview of three popular calibration weighting methods for complex surveys: (i) the regression weighting method; (ii) the exponential tilting method; and (iii) the pseudo empirical likelihood method. Computational algorithms for each of the methods are discussed, and finite sample configurations of the three types of weights are examined through simulation studies. The pseudo empirical likelihood approach to calibration is shown to have several advantages, including stable weights, efficient and reliable computational procedures, and easy adaptation to generalized raking, a special calibration problem where the auxiliary population information is in the form of known marginal totals for a contingency table. 
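Of the three methods, the regression weighting method (i) admits a closed form. A minimal sketch, with design weights d_i, auxiliary vectors x_i and known population totals Tx (function and variable names are ours, for illustration):

```python
import numpy as np

def regression_calibration_weights(d, X, Tx):
    """GREG/regression calibration: adjust design weights d so that the
    weighted sample auxiliary totals match the known population totals
    Tx exactly.  Closed form:
        w_i = d_i * (1 + (Tx - X_ht)' M^{-1} x_i),
    with M = sum_i d_i x_i x_i' and X_ht the design-weighted totals."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    Tx = np.asarray(Tx, dtype=float)
    M = X.T @ (d[:, None] * X)           # weighted cross-product matrix
    Xht = X.T @ d                        # Horvitz-Thompson auxiliary totals
    lam = np.linalg.solve(M, Tx - Xht)   # Lagrange multipliers
    return d * (1.0 + X @ lam)           # calibrated weights
```

By construction the calibrated weights reproduce Tx exactly (sum of w_i x_i equals Tx); the exponential tilting and pseudo empirical likelihood methods satisfy the same constraints but keep the weights positive by using different distance measures from the design weights.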