Bias in trials comparing paired continuous tests can cause researchers
to choose the wrong screening modality
Deborah Glueck, University of Colorado Health Sciences Center.
Monday, October 27
4:00 p.m. 223 Weber
To compare the diagnostic accuracy of two continuous screening tests, a
common approach is to test the difference between the areas under the
receiver operating characteristic (ROC) curves. After study
participants are screened with both screening tests, the disease status
is determined as accurately as possible, either by an invasive, yet
sensitive and specific secondary test, or by a less invasive, but less
sensitive approach. For most participants, disease status is
approximated through the less sensitive approach. The invasive test
must be limited to the fraction of the participants whose results on
either or both screening tests exceed a threshold of suspicion, or who
develop signs and symptoms of the disease after the initial screening.
The limitations of this study design lead to a bias in the ROC
curves we call paired screening trial bias. This bias reflects the
synergistic effects of inappropriate reference standard bias,
differential verification bias, and partial verification bias. The
absence of a gold reference standard leads to inappropriate reference
standard bias. Using different reference standards to ascertain
disease status creates differential verification bias. When only
suspicious screening test scores trigger a sensitive and specific
secondary test, the result is a form of partial verification bias.
For paired screening tests with bivariate normally distributed scores,
we give formulas and programs to quantify the effect of paired screening
trial bias on a paired comparison of areas under the curves. We fix the
prevalence of disease and the chance that a diseased subject manifests
signs and symptoms. We derive the formulas for true sensitivity and
specificity, and the quite different formulas for the sensitivity and
specificity observed by the study investigator.
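
The gap between true and observed accuracy can be illustrated with a small Monte Carlo sketch. This is not the speaker's formulas or programs. It is a toy simulation under assumptions that are not in the abstract: unit-variance scores with a common mean shift for diseased subjects, a perfect invasive test, and a less sensitive approach that labels every unverified subject as disease-free. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_paired_screening(n=200_000, prevalence=0.05, mean_shift=1.0,
                              rho=0.5, suspicion=1.5, p_symptoms=0.10,
                              cutoff=0.5, seed=0):
    """Toy paired screening trial; all defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    diseased = rng.random(n) < prevalence

    # Bivariate normal scores for the two tests; diseased subjects are shifted up.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    scores[diseased] += mean_shift

    # The invasive test is given if either score exceeds the threshold of
    # suspicion, or if a diseased subject later shows signs and symptoms.
    suspicious = (scores > suspicion).any(axis=1)
    symptomatic = diseased & (rng.random(n) < p_symptoms)
    verified = suspicious | symptomatic

    # Assumption: the invasive test is perfect, and everyone who is not
    # verified is recorded as disease-free.
    observed = diseased & verified

    positive = scores[:, 0] > cutoff  # test 1 called positive at this cutoff

    def sens_spec(status):
        sens = positive[status].mean()
        spec = (~positive[~status]).mean()
        return float(sens), float(spec)

    return {"true": sens_spec(diseased), "observed": sens_spec(observed)}

if __name__ == "__main__":
    result = simulate_paired_screening()
    print("true     (sens, spec):", result["true"])
    print("observed (sens, spec):", result["observed"])
```

With these illustrative settings, diseased subjects who score low and never develop symptoms are recorded as disease-free, so they drop out of the observed sensitivity calculation and enter the observed specificity calculation instead. Other parameter choices may behave differently.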
The observed areas under the ROC curves are quite different from the true
areas under the ROC curves. The typical direction of the bias is a
strong inflation in sensitivity, paired with a concomitant slight
deflation of specificity.
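
As a rough illustration of how true and observed areas under the ROC curve can diverge, the sketch below estimates both with the Mann-Whitney statistic in a simulated paired trial. It rests on assumptions that are not in the abstract: unit-variance bivariate normal scores, hypothetical mean shifts of 1.0 and 1.2 for the two tests, a perfect invasive test, and unverified subjects recorded as disease-free. The function names are made up for this example.

```python
import numpy as np

def empirical_auc(scores, status):
    """Mann-Whitney estimate of the area under the ROC curve (assumes no ties)."""
    n = len(scores)
    ranks = np.empty(n)
    ranks[np.argsort(scores)] = np.arange(1, n + 1)
    n_pos = int(status.sum())
    n_neg = n - n_pos
    u = ranks[status].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

def compare_aucs(n=200_000, prevalence=0.05, mean_shifts=(1.0, 1.2), rho=0.5,
                 suspicion=1.5, p_symptoms=0.10, seed=0):
    """True vs. observed AUC for two paired tests; defaults are illustrative."""
    rng = np.random.default_rng(seed)
    diseased = rng.random(n) < prevalence

    # Correlated scores for the two tests, shifted upward for diseased subjects.
    cov = [[1.0, rho], [rho, 1.0]]
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    scores[diseased] += np.asarray(mean_shifts)

    # Verification by the invasive test: suspicious score on either test,
    # or signs and symptoms in a diseased subject.
    suspicious = (scores > suspicion).any(axis=1)
    symptomatic = diseased & (rng.random(n) < p_symptoms)

    # Assumption: unverified subjects are recorded as disease-free.
    observed = diseased & (suspicious | symptomatic)

    return {
        "true": [empirical_auc(scores[:, j], diseased) for j in range(2)],
        "observed": [empirical_auc(scores[:, j], observed) for j in range(2)],
    }

if __name__ == "__main__":
    out = compare_aucs()
    for j in range(2):
        print(f"test {j + 1}: true AUC = {out['true'][j]:.3f}, "
              f"observed AUC = {out['observed'][j]:.3f}")
```

Comparing the difference between the two observed values with the difference between the two true values gives a feel for how the paired comparison can shift, though this toy setup is not meant to reproduce the results described in the talk.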
In paired trials of screening tests, when the area under the ROC curve is
used as the metric, this bias may lead researchers to make the wrong decision
as to which screening test is better.