Quantifying Relative Incomplete Information for
Hypothesis Testing in Statistical and Genetic Studies
Xiao-Li Meng
Department of Statistics, Harvard University
This paper attempts to establish a general framework for
quantifying the relative amount of missing information in the
context of hypothesis testing with incomplete data. The work is
motivated by applications to studies, such as linkage analyses and
haplotype-based association projects, designed to identify genetic
contributions to complex diseases. In such genetic studies, the
information measures are used for experimental design, technology
comparison, interpretation of the data, and understanding the
behavior of some of the inference tools. The central difficulties in
constructing such information measures arise from the multiple,
and often conflicting, aims in practice. For large samples, we
show that a satisfactory, likelihood-based general solution exists
by using appropriate forms of the relative Kullback-Leibler
information, and that the proposed measures are computationally
inexpensive given the maximized likelihoods with the observed
data. We illustrate the measures with data from mapping
studies of inflammatory bowel disease and stroke. For
small-sample problems, which appear rather frequently in practice
and sometimes in disguised forms (e.g., measuring individual
contribution to a large study), the robust Bayesian approach holds
great promise, though the choice of a general-purpose "default
prior" is still a very challenging problem. We also report several
intriguing connections we encountered in our investigation, such
as the connection with the fundamental identity for the EM
algorithm, the connection with the second CR (Chapman-Robbins)
lower information bound, and connections between likelihood ratios
and Bayes factors.