WAIC and WBIC
Widely Applicable Information Criterion and Widely Applicable Bayesian Information Criterion
Sumio Watanabe Homepage
WAIC and WBIC are information criteria that go beyond the Laplace approximation and Fisher asymptotic theory.
(A) If you want to estimate the predictive loss, you should use WAIC.
(B) If you want to identify the true model, you should use WBIC.
(C) Both WAIC and WBIC are applicable even if the posterior distribution
is far from any normal distribution. They can be used even if the Fisher
information matrix is singular, or even if the true distribution is unrealizable by
the statistical model. In particular,
they are applicable even when the posterior is not log-concave.
(D) Both WAIC and WBIC have completely new and mathematically rigorous theoretical support,
based on algebraic geometry and empirical process theory. Neither Fisher asymptotic theory nor
the Laplace approximation is necessary.
(E) If you have MCMC software, it is very easy to implement WAIC and WBIC.
(F) Both WAIC and WBIC are useful in practical applications.
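Point (E) can be illustrated in a few lines. The following is a minimal Python sketch, assuming you already have a matrix log_lik[s, i] = log p(x_i | w_s) of pointwise log likelihoods evaluated at posterior draws w_s; the function names and input format are illustrative, not from the original page. Note that WBIC requires draws from the tempered posterior at inverse temperature 1/log n, not from the ordinary posterior.

```python
import numpy as np

def waic(log_lik):
    """WAIC from log_lik[s, i] = log p(x_i | w_s), where w_s are draws
    from the ordinary (inverse temperature 1) posterior."""
    S, n = log_lik.shape
    # Training loss: minus mean log posterior-predictive density,
    # computed stably with log-sum-exp over the S draws.
    lppd = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
    train_loss = -np.mean(lppd)
    # Functional variance V_n: posterior variance of log p(x_i | w),
    # summed over data points (sample variance over draws used here).
    func_var = np.sum(np.var(log_lik, axis=0, ddof=1))
    return train_loss + func_var / n

def wbic(log_lik_tempered):
    """WBIC from log_lik_tempered[s, i] = log p(x_i | w_s), where w_s are
    draws from the tempered posterior at inverse temperature 1/log(n):
    the tempered-posterior mean of the total minus log likelihood."""
    return np.mean(-np.sum(log_lik_tempered, axis=1))
```

Both quantities are averages over existing MCMC output, so no extra sampling beyond the (possibly tempered) posterior run is needed.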
(Special Remark) Bayesian inference is much more useful than maximum likelihood (ML)
when the posterior is far from any normal distribution.
Both WAIC and WBIC work even in such cases.
(Special Remark) Bayesian inference is much more useful than ML
when a statistical model is built by integrating complex, deep, and hierarchical
inferences. Both WAIC and WBIC work even in such cases.
This theorem holds
(1) even if the Fisher information matrix is not positive definite,
(2) or even if the asymptotic normality of the MLE fails,
(3) or even if the Laplace approximation does not hold.
WAIC and WBIC are supported by
singular learning theory.
(Remark 1) Identifying the true model is different from estimating the predictive loss.
(Remark 2) If the posterior distribution cannot be approximated by any normal distribution,
then neither BIC nor DIC can be used in statistical model evaluation.
Both WAIC and WBIC can be employed in any circumstance, and both have theoretical support:
asymptotically, WAIC has the same expectation value as the predictive log loss, and WBIC is the same
random variable as the minus log Bayes marginal likelihood (the Bayes free energy).
(Remark 3) If you want to check your MCMC software, you should compare
the theoretical RLCT (real log canonical threshold) with the empirical one.
* One method to find the RLCT is to use Theorem 2 in the
WAIC paper.
* Another is to use equations (19) and (20) on p.877 of the
WBIC paper.
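As a rough illustration of the second method, the two-temperature estimator of the RLCT can be sketched as follows. The helper name `rlct_estimate` is hypothetical, and the inputs are assumed to be MCMC samples of n L_n(w) (the total minus log likelihood) drawn from tempered posteriors at two inverse temperatures, e.g. 1/log n and a nearby value:

```python
import numpy as np

def rlct_estimate(nloglik_b1, nloglik_b2, beta1, beta2):
    """Empirical RLCT from samples of n*L_n(w) taken at two inverse
    temperatures beta1 and beta2. Since the tempered-posterior mean of
    n*L_n behaves asymptotically like (const) + lambda/beta, the RLCT
    lambda can be estimated from a difference of the two means:
        lambda_hat = (E_b1[n L_n] - E_b2[n L_n]) / (1/beta1 - 1/beta2).
    """
    return (np.mean(nloglik_b1) - np.mean(nloglik_b2)) / (1.0 / beta1 - 1.0 / beta2)
```

Comparing this empirical estimate with the theoretically known RLCT of the model gives a consistency check on the sampler.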
Let's look at the true posterior distribution.
(1) Even if the Fisher information matrix at the true parameter is positive definite,
the posterior distribution might not be approximated by a normal distribution.
In this example, the number of parameters is only two, whereas the number of
empirical samples is 10000. Fisher asymptotic theory cannot be applied to
this case.
(2) Neither AIC, BIC, TIC, DIC, nor MDL is applicable, even if n=10000.
(3) WAIC and WBIC are applicable for all cases, n=100, 1000, and 10000.
(4) If you are a mathematician, you will find algebraic geometry in this problem.
(5) If you are a statistician, you should have the courage to look at
the true posterior distribution.
(Remark) One might think that, in a real-world problem, a true
distribution seldom coincides with a singularity of a statistical model,
and thus Fisher asymptotic theory holds if the number of empirical samples is sufficiently large.
However, "the sufficiently large number" is often far larger than one might expect.
Related Articles
Evaluation reports of WAIC have been given by expert statisticians.
-
A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis,
3rd Edition, Chapman and Hall/CRC, 2013.
-
Andrew Gelman, Jessica Hwang, and Aki Vehtari,
"Understanding predictive information criteria for Bayesian models," Statistics and Computing,
DOI 10.1007/s11222-013-9416-2, 2013.
-
Aki Vehtari and Janne Ojanen, "A survey of Bayesian predictive methods for model assessment, selection and comparison,"
Statistics Surveys, Vol. 6, pp. 142-228, 2012.
(Remark 1) The original cross-validation in Bayesian inference could be used,
but it incurs a quite heavy computational cost, because the posterior distribution must be constructed
n times (where n is the number of training samples).
(Remark 2) Importance sampling leave-one-out cross-validation (ISLOOCV) can be calculated at
the same computational cost as WAIC. In the above Vehtari-Ojanen paper (pp. 189-190),
it is explained, in a case of linear regression, that ISLOOCV may not satisfy the central limit theorem in
posterior parameter sampling when a leverage sample is contained in the training samples
(Peruggia, 1997; Epifani, MacEachern and Peruggia, 2008). Here a "leverage sample" is sometimes the same as
an "outlier". This phenomenon is caused by the large importance weight 1/p(x_i|w), or equivalently by the
small p(x_i|w). In WAIC, on the other hand, such a problem does not occur, because importance weights are not
used.
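The importance-weight mechanism described above can be made concrete. Below is a minimal sketch (the function name and input format are illustrative): ISLOOCV estimates each held-out density as a harmonic mean of p(x_i | w) over posterior draws, so its importance weights are 1/p(x_i | w), and a single draw with very small p(x_i | w) can dominate the estimate.

```python
import numpy as np

def isloocv(log_lik):
    """ISLOOCV loss from log_lik[s, i] = log p(x_i | w_s), w_s drawn
    from the posterior. The held-out density for x_i is estimated by
    the harmonic mean S / sum_s 1/p(x_i | w_s); one draw with very
    small p(x_i | w_s) (a leverage sample / outlier) dominates the sum
    of importance weights 1/p(x_i | w_s)."""
    S, n = log_lik.shape
    # log harmonic mean per data point, computed with log-sum-exp of
    # the log importance weights -log p(x_i | w_s).
    log_loo = -(np.logaddexp.reduce(-log_lik, axis=0) - np.log(S))
    return -np.mean(log_loo)
```

WAIC avoids this instability because it only averages log p(x_i | w) and its square over draws, never the reciprocal 1/p(x_i | w).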
References
The mathematical foundations of WAIC and WBIC can be found in the following
references.
1. Mathematical background of algebraic geometry, empirical process theory, and
singular learning theory:
S. Watanabe, "Algebraic Geometry and Statistical Learning Theory,"
Cambridge University Press, Cambridge, UK, September 2009.
2. Proof of WAIC in singular cases:
Sumio Watanabe, "Equations of states in singular statistical estimation",
Neural Networks, Vol.23, No.1, pp.20-34, 2010, January.
arXiv:0712.0653
3. Proof of WAIC in unrealizable cases:
Sumio Watanabe, "Equations of states in statistical learning for an unrealizable and regular case,"
IEICE Transactions, Vol.E93-A, No.3, pp.617-626, 2010, March.
arXiv:0906.0211
4. Proof of asymptotic equivalence of WAIC and Leaving-one-out Cross-Validation:
Sumio Watanabe, "Asymptotic Equivalence of Bayes Cross Validation and
Widely Applicable Information Criterion in Singular Learning Theory,"
Journal of Machine Learning Research, Vol.11, (DEC), pp.3571-3591, 2010.
5. Proof of asymptotic expansion of Bayes Marginal:
S. Watanabe, "Algebraic analysis for nonidentifiable learning machines,"
Neural Computation, Vol. 13, No. 4, pp. 899-933, 2001.
6. Proof of WBIC:
Sumio Watanabe, "A widely applicable Bayesian information criterion,"
Journal of Machine Learning Research, Vol.14, (Mar), pp.867-897, 2013.