General Contact:
Naitee Ting




Short Courses

Sunday Full Day: 8am - 5pm (includes lunch)

1. Measurement Error
   John Buonaccorsi
   University of Massachusetts

2. Prevention and Treatment of Missing Data: Turning Guidance into Practice
   Craig Mallinckrodt, Geert Molenberghs, Bohdana Ratitch, Lei Xu, et al.
   Lilly, Hasselt University, inVentiv Health, Biogen

3. Practical Bayesian Computation
   Fang Chen, Bob Lucas
   SAS Institute

Sunday Half Day, Morning: 8am - 12pm (lunch not included)

1. Graphical Approaches to Multiple Test Problems
   Dong Xi, Frank Bretz
   Novartis

2. Network-Based Analysis of Big Data
   Shuangge (Steven) Ma
   Yale University

Sunday Half Day, Afternoon: 1pm - 5pm (lunch not included)

1. Classification and Regression Trees and Forests
   Wei-Yin Loh
   University of Wisconsin

2. Patient-Reported Outcomes: Measurement, Implementation and Interpretation
   Joseph C. Cappelleri
   Pfizer Inc

Short Course Fees
                          Full Day   Half Day
Member, Non-Student       $350       $200
Member, Student           $75        $45
Non-Member, Non-Student   $400       $225
Non-Member, Student       $80        $50


Short Course Abstracts

Measurement Error

John Buonaccorsi, University of Massachusetts


Measurement error is ubiquitous, and it is well known that the inability to measure predictors exactly in regression problems often leads to biased estimators and invalid inferences. The problem has a long history in linear models but has seen an explosion of interest over the last twenty years as methods were extended to handle more complex models and to address a number of problems that arise in practice. The methodology has been used successfully across a wide range of disciplines, most notably (but certainly not limited to) epidemiology.
This course presents an introductory and relatively applied look at measurement error in regression settings, covering linear and nonlinear models, the latter including generalized linear models and, more explicitly, logistic regression. The goals of the course are to introduce attendees to models used for measurement error, to describe the impacts of measurement error on so-called naive analyses that ignore it, and to provide an extensive overview of the many techniques available to correct for it, along with the associated inferences. We deal both with additive error, in which case the measurement error parameters are usually estimated through replication, and with non-additive error, where validation data (either internal or external) are exploited to correct for measurement error. Detailed examples are drawn from a variety of disciplines and, although the course has no computer component, an overview of available software and its use will be presented. Time permitting, we will briefly discuss measurement error in mixed/longitudinal models and in time series.
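As a hedged illustration (not part of the course materials), the bias of a naive analysis under additive measurement error can be seen in a short simulation: the least-squares slope based on the error-prone predictor shrinks toward zero by the reliability ratio var(x)/(var(x) + var(u)).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0                       # true slope
x = rng.normal(0.0, 1.0, n)      # true predictor, var = 1
u = rng.normal(0.0, 1.0, n)      # additive measurement error, var = 1
w = x + u                        # observed, error-prone predictor
y = beta * x + rng.normal(0.0, 1.0, n)

# OLS slope using the true predictor vs. the naive slope using w
b_true = np.cov(x, y)[0, 1] / np.var(x)
b_naive = np.cov(w, y)[0, 1] / np.var(w)

# Classical attenuation: E[b_naive] = beta * var(x) / (var(x) + var(u))
print(round(b_true, 2), round(b_naive, 2))
```

With unit variances for both the predictor and the error, the reliability ratio is 1/2, so the naive slope estimate is roughly half the true slope.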
Students should have some prior exposure to basic mathematical statistics and have familiarity with regression models, including seeing models and methods expressed in matrix-vector form.
Reference: Buonaccorsi (2010), "Measurement Error: Models, Methods and Applications", Chapman & Hall.


Prevention and treatment of missing data: turning guidance into practice

Craig Mallinckrodt, Geert Molenberghs, Bohdana Ratitch, Lei Xu, et al.


Recent research has fostered new guidance on preventing and treating missing data in clinical trials. This short course is based on work from the Drug Information Association’s Scientific Working Group (DIASWG) on Missing Data. The first half-day begins with an overview of the research and other background that fostered the new guidance, including a brief history of the work by the National Research Council expert panel on missing data that provided detailed advice to the FDA on the prevention and treatment of missing data. The first half-day also distills common elements from recent guidance into three pillars: 1) setting clear objectives; 2) minimizing missing data; and 3) pre-specifying a sensible primary analysis and appropriate sensitivity analyses. Specific means of putting the guidance into action are proposed, including detailed coverage of developing an overall analytic road map. In the second half-day, software tools developed by the DIASWG to implement the analytic road map will be demonstrated on an example data set. Attendees will receive these programs at no cost and are encouraged to run them concurrently with the demonstration. Several DIASWG members will be available to assist attendees in running the programs.

Learning objectives
1) Understand the three pillars of preventing and treating missing data, with emphasis on developing a complete analytic road map that includes a sensible primary analysis and appropriate sensitivity analyses.
2) Be able to apply the three-pillar principles to their own research.
3) Understand the theory behind key sensitivity analyses and be able to run the macros developed by the DIASWG, which will be provided free of charge to attendees.


Practical Bayesian Computation

Fang Chen, Bob Lucas, SAS Institute


This one-day course reviews the basic concepts of Bayesian inference and focuses on the practical use of Bayesian computational methods. The objectives are to familiarize statistical programmers and practitioners with the essentials of Bayesian computing, and to equip them with computational tools through a series of worked-out examples that demonstrate sound practices for a variety of statistical models and Bayesian concepts.

The first part of the course will review differences between classical and Bayesian approaches to inference, fundamentals of prior distributions, and concepts in estimation. The course will also cover MCMC methods and related simulation techniques, emphasizing the interpretation of convergence diagnostics in practice.
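The course itself uses SAS (PROC MCMC), but as a hedged, minimal illustration of the MCMC machinery this part reviews, a random-walk Metropolis sampler for a normal mean can be written in a few lines (the data, proposal scale, and burn-in choice here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data: y_i ~ N(mu, 1); with a flat prior on mu the posterior is N(ybar, 1/n)
y = rng.normal(3.0, 1.0, 50)

def log_post(mu):
    return -0.5 * np.sum((y - mu) ** 2)   # log-likelihood up to a constant

# Random-walk Metropolis
draws, mu, lp = [], 0.0, log_post(0.0)
for _ in range(20_000):
    prop = mu + rng.normal(0.0, 0.5)       # symmetric proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
        mu, lp = prop, lp_prop
    draws.append(mu)

post = np.array(draws[5_000:])             # discard burn-in
print(post.mean(), y.mean())               # posterior mean ~ sample mean
```

Comparing the sampler's posterior mean and standard deviation against the known closed-form posterior is exactly the kind of convergence check the course's diagnostics formalize.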

The rest of the course takes a topic-driven approach that introduces Bayesian simulation and analysis and illustrates the Bayesian treatment of a wide range of statistical models, with software code explained in detail. The course presents major application areas and case studies, including multi-level hierarchical models, multivariate analysis, nonlinear models, meta-analysis, latent variable models, and survival models. Special topics include Monte Carlo simulation, sensitivity analysis, missing data, model assessment and selection, variable subset selection, and prediction. The examples will be done using SAS (PROC MCMC), with a strong focus on technical details.

Attendees should have a background equivalent to an M.S. in applied statistics. Previous exposure to Bayesian methods is useful but not required. Familiarity with material at the level of DeGroot and Schervish, Probability and Statistics (Addison-Wesley), is appropriate.



Graphical approaches to multiple test problems

Dong Xi, Frank Bretz (Novartis)


Methods for addressing multiplicity are becoming increasingly important in clinical trials and other applications. Examples of such study objectives include the investigation of multiple doses or regimens of a new treatment, multiple endpoints, subgroup analyses, or any combination of these. This short course will provide practical guidance on how to construct multiple testing procedures (MTPs) for such hypotheses, with an emphasis on graphical approaches.

Course outline:
1. Introduction to multiple testing procedures
In the first part of this course, we will introduce the concept of multiplicity and its impact on scientific research. To deal with multiplicity issues, we will discuss basic concepts of MTPs including the error rate, adjusted p-values and single-step and stepwise procedures. Common MTPs such as Bonferroni, Holm, Hochberg and Dunnett will be introduced and compared. We will describe the closure principle and closed testing procedures as an important way to construct MTPs.
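To make the comparison of procedures concrete, here is a small, self-contained sketch of the Bonferroni and Holm adjustments (the raw p-values below are hypothetical, not from the course):

```python
# Hypothetical raw p-values for m = 4 hypotheses
pvals = [0.001, 0.020, 0.030, 0.200]
m = len(pvals)

# Bonferroni (single-step): multiply each p-value by m, capped at 1
bonf = [min(1.0, p * m) for p in pvals]

# Holm (step-down): sort, multiply p_(i) by (m - i), enforce monotonicity
order = sorted(range(m), key=lambda i: pvals[i])
holm = [0.0] * m
running = 0.0
for rank, i in enumerate(order):
    running = max(running, min(1.0, pvals[i] * (m - rank)))
    holm[i] = running

print(bonf)   # [0.004, 0.08, 0.12, 0.8]
print(holm)   # [0.004, 0.06, 0.06, 0.2]
```

Holm's adjusted p-values are never larger than Bonferroni's, which is why the step-down procedure is uniformly more powerful while controlling the same familywise error rate.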

2. Graphical approaches to multiple testing
In the second part of the course, we will focus on graphical approaches that can be applied to common multiple test problems. Using graphical approaches, one can easily construct and explore different test strategies and thus tailor the test procedure to the given study objectives. The resulting multiple test procedures are represented by directed, weighted graphs in which each node corresponds to an elementary hypothesis, together with a simple algorithm that updates the graph as individual hypotheses are sequentially tested. We also present a case study to illustrate how the approach can be used in clinical practice. The presented methods will be illustrated using the graphical user interface of the gMCP package in R, which is freely available on CRAN.


Network based analysis of big data

Shuangge (Steven) Ma, Yale University


With the fast development in data collection and storage techniques, big data are now routinely encountered in biomedicine, engineering, social science, and many other scientific fields. In many of the existing analyses, the interconnections among functional units have not been sufficiently accounted for, leading to a loss of efficiency or even failures of many statistical models. Recently, network-based analysis has emerged as an effective analysis tool for modeling big data.

In this short course, we will survey the newly developed network-based analysis methods for big data, with an emphasis on methodological development and applications. Topics to be covered will include:

  1. Background of network analysis, including motivating examples from biomedicine and social science.
  2. A brief survey of network construction methods. Experiment-based, statistical, and hybrid methods will be introduced. We will introduce network construction algorithms, their rationale, and software implementation.
  3. Incorporating network information in statistical modeling. With big data, the two main analysis paradigms are marginal analysis and joint analysis. For each paradigm, we will introduce multiple recently developed statistical methods, their rationale, and software implementation. Illustrative examples from biomedicine will be provided, showing the practical impact of network analysis.
  4. Network analysis of samples. A representative example is the social network; another is the recently proposed concept of the “human disease network”. We will introduce concepts and analysis methods and show data analysis examples.

After taking the course, attendees are expected to have a good understanding of (a) the “big picture” of analyzing big data using network-based methods, (b) a set of recently proposed methods, and (c) their software implementation. The illustrative examples will be drawn from multiple scientific fields and are expected to relate closely to attendees’ daily practice.

The intended audience includes researchers from academia, pharmaceutical companies, consulting firms, and government agencies, as well as advanced graduate students. Prerequisites: master’s-level training in statistics or a related field and general knowledge of big data; knowledge of statistical software, especially R, is a plus but not required.


Classification and regression trees and forests

Wei-Yin Loh, University of Wisconsin


It has been more than fifty years since AID (Morgan and Sonquist 1963) appeared, and more than thirty since CART (Breiman et al. 1984). Rapidly increasing use of trees among practitioners has led to great advances in algorithmic research over the last two decades. Modern tree models have higher prediction accuracy and are free of selection bias. They can fit linear models in the nodes using GLM, quantile, and other loss functions; response variables may be multivariate, longitudinal, or censored; and classification trees can employ linear splits and fit kernel and nearest-neighbor node models.

The course begins with examples to compare tree and traditional models.  Then it reviews the major algorithms, including AID, CART, C4.5, CHAID, CRUISE, CTREE, GUIDE, M5, MOB, and QUEST. Real data are used to illustrate the features of each, and results on prediction accuracy and model complexity versus forests and some machine learning methods are presented.  Examples are drawn from business, science, and industry, and include applications to subgroup identification for personalized medicine, missing value imputation in surveys, and differential item functioning in educational testing. Relevant software is mentioned where appropriate.  Attendees should be familiar with multivariate analysis at the level of Johnson and Wichern's "Applied Multivariate Statistical Analysis."
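As a hedged sketch (not part of the course materials), the impurity-based split search at the heart of CART-style classification trees can be written in a few lines of pure Python; the data and threshold below are illustrative only.

```python
# Toy illustration of CART-style splitting: choose the split point on a
# single numeric feature that minimizes weighted Gini impurity.
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n          # fraction of class 1 (labels are 0/1)
    return 2.0 * p1 * (1.0 - p1)  # Gini impurity for two classes

def best_split(x, y):
    best = (None, float("inf"))
    for cut in sorted(set(x))[:-1]:       # candidate thresholds
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (cut, score)
    return best

# Perfectly separable toy data: the class flips after x = 3
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y))   # split at 3 gives zero impurity
```

Real implementations add stopping rules, pruning, and, in the modern algorithms the course covers, bias-corrected variable selection; this sketch shows only the greedy split search they all build on.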


The target audience is statistical researchers and practitioners from academia, business, government, and industry.  The course is particularly useful for people who routinely analyze large and complex datasets and who want to know the latest advances in algorithms and software for classification and regression tree methods.


Patient-Reported Outcomes: Measurement, Implementation and Interpretation

Joseph C. Cappelleri, Pfizer Inc


This half-day short course provides an exposition on health measurement scales – specifically, on patient-reported outcomes – based on the instructor’s co-authored book. Some key elements in the development of a patient-reported outcome (PRO) instrument are noted. Highlighted here is the importance of the conceptual framework used to depict the relationship between items in a PRO instrument and the concepts measured by it. The core topics of validity and reliability are discussed. Validity, which is assessed in several ways, provides the evidence and extent to which the PRO taps into the concept it purports to measure in a particular setting. Reliability of a PRO instrument involves its consistency or reproducibility, as assessed by internal consistency and test-retest reliability. Exploratory factor analysis and confirmatory factor analysis are described as techniques for understanding the underlying structure of a PRO measure with multiple items.
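As a small, hedged illustration (not from the course materials), internal consistency is commonly summarized by Cronbach's alpha, which can be computed directly from an item-score matrix; the respondent scores below are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Hypothetical scores: 5 respondents answering 3 positively correlated items
scores = [[4, 5, 4],
          [2, 2, 3],
          [5, 5, 5],
          [3, 3, 2],
          [4, 4, 5]]
print(cronbach_alpha(scores))
```

Higher values indicate that the items move together and plausibly measure a single underlying concept, which is the internal-consistency notion of reliability described above.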

While most of the presentation centers on psychometrics from a classical test theory perspective, attention is also given to item response theory as an approach to scale development and evaluation. Cross-sectional analysis and longitudinal analysis of PRO scores are covered. Also covered is the topic of mediation modeling as a way to identify and explain the mechanism that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Variations of missing data for PRO measures are highlighted, as is the topic of multiple testing. Finally, approaches to interpret PRO results are elucidated in order to make these results useful and meaningful. Illustrations are provided mainly through real-life examples and also through simulated examples using SAS.

Reference: Cappelleri JC, Zou KH, Bushmakin AG, Alvir JMJ, Alemayehu D, Symonds T. Patient-Reported Outcomes: Measurement, Implementation and Interpretation. Boca Raton, Florida: Chapman & Hall/CRC Press. December 2013.