Suboptimality of MDL and Bayes in Classification under Misspecification
Peter Grünwald
CWI Amsterdam/EURANDOM Eindhoven
We show that forms of Bayesian and MDL learning that are often applied
to classification problems can be *statistically inconsistent*.
We present a classification model (a large family of classifiers) and a
data generating distribution such that the best classifier within the
model has generalization error (expected 0/1-prediction loss) almost 0.
Nevertheless, no matter how many data are observed, both the classifier
inferred by MDL and the classifier based on the Bayesian posterior will
behave much worse than this best classifier in the sense that their
expected 0/1-prediction loss is substantially larger. The reason is
that, in order to apply Bayes and MDL to models consisting of
classifiers, these classifiers first have to be converted into
probability distributions or, equivalently, codes. The standard method
for doing this is the logistic transformation. However, the resulting
probability models cannot be expected to contain the true distribution,
and this causes the problem: our result can be re-interpreted as showing
that under misspecification, Bayes and MDL do not always converge to the
distribution in the model that is closest in KL divergence to the data
generating distribution. We compare this result with earlier results on
Bayesian inconsistency by Diaconis, Freedman and Barron.
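The logistic transformation and the notion of the KL-closest model element can be illustrated with a small sketch. This is a toy illustration only, with made-up names (`logistic_model`, `eta`, and so on) and an invented two-point distribution; it is not the construction from the paper:

```python
import math

# Illustrative sketch (not the paper's exact construction): a 0/1-valued
# classifier c is turned into a conditional probability model via the
# logistic transformation. For a weight eta >= 0, the model puts
# probability e^eta / (1 + e^eta) on the label c predicts.

def logistic_model(classifier, eta):
    """P(Y=1 | X=x) under the logistic transformation of `classifier`."""
    def p_y1_given_x(x):
        # margin is +eta if the classifier predicts 1, -eta if it predicts 0
        margin = eta if classifier(x) == 1 else -eta
        return 1.0 / (1.0 + math.exp(-margin))
    return p_y1_given_x

def expected_01_loss(classifier, xs, p_x, p_y1):
    """Generalization error: probability that classifier(x) != y."""
    return sum(p_x[x] * (p_y1[x] if classifier(x) == 0 else 1.0 - p_y1[x])
               for x in xs)

def conditional_kl(xs, p_x, p_y1, model):
    """Expected KL divergence from the true P(Y|X) to the model's P(Y|X)."""
    kl = 0.0
    for x in xs:
        q1 = model(x)
        for p, q in ((p_y1[x], q1), (1.0 - p_y1[x], 1.0 - q1)):
            if p > 0.0:
                kl += p_x[x] * p * math.log(p / q)
    return kl

# Toy example: two x-values, and a classifier that predicts y = x.
xs = [0, 1]
p_x = {0: 0.5, 1: 0.5}            # marginal distribution of X
p_y1 = {0: 0.1, 1: 0.9}           # true P(Y=1 | X=x)
c = lambda x: x

print(expected_01_loss(c, xs, p_x, p_y1))                          # ≈ 0.1
# KL to the truth shrinks as eta approaches the true log-odds ln 9:
print(conditional_kl(xs, p_x, p_y1, logistic_model(c, 0.5)))
print(conditional_kl(xs, p_x, p_y1, logistic_model(c, math.log(9))))
```

In this toy case the KL-optimal weight recovers the true log-odds, because the two-point model happens to be well specified; the paper's point is precisely that when the logistic model is misspecified, the distribution Bayes and MDL converge to need not be the KL-closest one, nor correspond to the classifier with smallest 0/1 loss.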
Joint work with John Langford of the Toyota Technological Institute,
Chicago.