«Abstract. Bayesian model averaging has increasingly witnessed applications across an array of empirical contexts. However, the dearth of available ...»
BAYESIAN MODEL AVERAGING IN R
SHAHRAM M. AMINI AND CHRISTOPHER F. PARMETER
Abstract. Bayesian model averaging has increasingly witnessed applications across an array of
empirical contexts. However, the dearth of available statistical software which allows one to engage
in a model averaging exercise is limited. It is common for consumers of these methods to develop their own code, which has obvious appeal. However, canned statistical software can ameliorate one’s own analysis if they are not intimately familiar with the nuances of computer coding. Moreover, many researchers would prefer user ready software to mitigate the inevitable time costs that arise when hard coding an econometric estimator. To that end, this paper describes the relative merits and attractiveness of several competing packages in the statistical environment R to implement a Bayesian model averaging exercise.
1. Introduction Bayesian model averaging (BMA) is an empirical tool to deal with model uncertainty in various milieus of applied science. In general, BMA is employed when there exist a variety of models which may all be statistically reasonable but most likely result in diﬀerent conclusions about the key questions of interest to the researcher. As Raftery (1995, pg. 113) notes “In this situation, the standard approach of selecting a single model and basing inference on it underestimates uncertainty about quantities of interest because it ignores uncertainty about model form.” Typically, though not always, BMA focuses on which regressors to include in the analysis. The allure of BMA is that one can quickly determine models, or more speciﬁcally sets of explanatory variables, which possess high likelihoods. By averaging across a large set of models one can determine those variables which are relevant to the data generating process for a given set of priors used in the analysis. Each model (a set of variables) receives a weight and the ﬁnal estimates are constructed as a weighted average of the parameter estimates from each of the models. BMA includes all of the variables within the analysis, but shrinks the impact of certain variables towards zero through the model weights. These weights are the key feature for estimation via BMA and will depend upon a number of key features of the averaging exercise including the choice of prior speciﬁed.
The implementation of BMA, which was ﬁrst proposed by Leamer (1978, Sections 4.4-4.6), for linear regression models is as follows. Consider a linear regression model with a constant term, β0, Date: May 17, 2011.
Key words and phrases. Model Averaging, Zellner’s g Prior.
We would like to thank James MacKinnon, Achim Zeileis, Martin Feldkircher, and Stefan Zeugner for constructive and insightful comments.
Shahram M. Amini, Department of Economics, Virginia Polytechnic Institute and State University, email:
email@example.com. Christopher F. Parmeter, Corresponding Author, Department of Economics, University of Miami, 305-284-4397, e-mail: firstname.lastname@example.org.
Given the number of regressors, we will have 2k diﬀerent combinations of right hand side variables indexed by Mj for j = 1, 2, 3,..., 2k. Once the model space has been constructed, the posterior distribution for any coeﬃcient of interest, say βh, given the data D is
For further discussions on BMA, including its limitations and implementation, we refer the reader to the comprehensive review of Bayesian model averaging by Hoeting, Madigan, Raftery & Volinsky (1999).
The following sections provide an overview of three currently available packages in the statistical computing language of R (R Development Core Team 2010) that can implement a BMA empirical exercise. The main features under the user’s control for each of the packages, including the set of prior probabilities and model sampling algorithms as well as the plot diagnostics available to visualize the results, are described. Several detailed examples to compare the performance of these diﬀerent packages are also provided along with functioning R code in an appendix.
To our knowledge, R is the only mainstream statistical platform which oﬀers a suite of routines to conduct a BMA analysis. The availability of BMA routines in other statistical software is limited.
BMA IN R 3 Neither Gauss nor Stata possess built-in packages which allow the user to implement a genuine, linear regression BMA.1,2 Matlab, while lacking a comprehensive BMA toolbox,3 supplies users with the core functionality of the BMS package (discussed below) via installation of the BMS toolbox for Matlab. Fortran users have access to a ready-to-use BMA toolbox stemming from Fernandez, Ley & Steel’s (2001b) publicly available code. And ﬁnally, while SAS provides some functionality for implementing BMA it is incapable of handling a large-scale BMA analysis.
Beyond our review of the functionality of the three available packages, we contrast estimates and posterior inclusion probabilities across the three packages with a mock empirical example. This is done both with a set of covariates that allows for full enumeration of the model space as well as requiring the implementation of a model space search mechanism which is what truly distinguishes the three packages. The time performance of the three packages as both the sample size and the covariate space increase is also supplied. Finally, we examine whether these packages can replicate the results of recently published econometric research that employs BMA techniques. Overall, all three packages share relative advantages against their peers, yet we advocate for the BMS package given its versatility with user deﬁned priors as well as the numerous options to customize one’s BMA analysis.
2. Available Packages
2.1. The BMS Package. The BMS (an acronym for Bayesian Model Selection) package employs standard Bayesian normal-conjugate linear model as the base model and “Zellner’s g prior” as the choice of prior structures for the regression coeﬃcients (Feldkircher & Zeugner 2009). Since the form of the hyperparameter g is crucial in BMA analyses, the BMS package sets g equal to the sample size, usually known as the unit information prior (UIP). BMS also provides alternative formulations regarding the choice of g. The main function in the BMS package to implement a BMA regression analysis is bms().
2.1.1. Model Sampling. Since enumerating all potential variable combinations becomes infeasible quickly for a large number of covariates, the BMS package uses a Markov Chain Monte Carlo (MCMC) samplers to gather results on the most important part of the posterior distribution when more than 14 covariates exist. The MCMC sampler walks through the model space using the Metropolis-Hastings algorithm4, which works as follows: Suppose that the current model at step i is Mi with posterior model probability p(Mi |y, X). The MCMC sampler for the BMS package 1Millar (2011) has recently published a Stata module that uses the Bayesian Information Criterion (BIC) for estimating the probability that a variable is a part of the ﬁnal model. This module, available at http://fmwww.bc.edu/repec/bocode/b/bic.ado, calculates the BIC statistic for all possible combinations of the independent variables.
2Gauss users can ﬁnd the code used in Sala-i-Martin, Doppelhofer & Miller (2004) to implement the BACE technique at http://www.nhh.no/Default.aspx?ID=3075.
3Matlab’s Econometrics Toolbox comes with a function called bma g that provides very basic BMA functionality.
4See Metropolis, Rosenbluth, Rosenbluth, Teller & Teller (1953), Hastings (1970), Chib & Greenberg (1995), and Liu (2008).
4 SHAHRAM M. AMINI AND CHRISTOPHER F. PARMETERrandomly draws a candidate model and then moves to this model if its marginal likelihood is superior to the marginal likelihood of the current model. In this algorithm, the number of times each model is kept will converge to the distribution of posterior model probabilities p(Mi |y, X). The BMS package oﬀers two diﬀerent MCMC samplers to look at models within the model space. These two methods diﬀer in the way they propose candidate models. The ﬁrst method is called the birth-death sampler (mcmc=bd). In this case, one of the potential regressors is randomly chosen; if the chosen variable is already in the current model Mi, then the candidate model Mj will have the same set of covariates as Mi but drop the chosen variable. If the chosen covariate is not contained in Mi, then the candidate model will contain all the variables from Mi plus the chosen covariate; hence the appearance (birth) or disappearance (death) of the chosen variable depends if it already appears in the model. The second approach is called the reversible-jump sampler (mcmc=rev.jump). This sampler draws a candidate model by the birth-death method with 50% probability and with 50% probability the candidate model randomly drops one covariate with respect to Mi and randomly adds one random variable from the potential covariates that were not included in model Mi.
The precision of any MCMC sampling mechanism depends on the number of draws the procedure runs through. Given that the MCMC algorithms used in the BMS package may begin using models which might not necessarily be classiﬁed as ‘good’ models, the ﬁrst set of iterations do not usually draw models with high posterior model probabilities (PMP). This indicates that the sampler will only converge to spheres of models with the largest marginal likelihoods after some initial set of draws (known as the burn-in) from the candidate space. Therefore, this ﬁrst set of iterations will be omitted from the computation of results. In the BMS package the argument (burn) speciﬁes the number of burn-ins (models omitted), and the argument (iter) the number of subsequent iterations to be retained. The default number of burn-in draws for either MCMC sampler is 1000 and the default number of iteration draws (excluding burn-ins) 3000.
2.1.2. Model Priors. The BMS package oﬀers considerable freedom in the choice of model prior.
One can employ the uniform model prior as the choice of prior model size (mprior="uniform"), the Binomial model priors where the prior probability of a model is the product of inclusion and exclusion probabilities (mprior="fixed"), the Beta-Binomial model prior (mprior="random") that puts a hyperprior on the inclusion probabilities drawn from a Beta distribution, or a custom model size and prior inclusion probabilities (mprior="customk"). Of the three packages currently available in R this is the only package that allows for custom priors.
2.1.3. Alternative Zellner’s g Priors. Diﬀerent mechanisms have been proposed in the literature for specifying g priors. The options in the BMS package are as follows (1) g="UIP"; Unit Information Prior (UIP), that corresponds to g = N, the sample size.5 (2) g="RIC"; Sets g = K 2 and conforms to the risk inﬂation criterion.6 5See Fernandez, Ley & Steel (2001a) 6See ? for more details BMA IN R 5 (3) g="BRIC"; A mechanism that asymptotically converges to the unit information prior (g =
N ) or the risk inﬂation criterion (g = K 2 ). That is, the g prior is set to g = max(N, K 2 ).7 (4) g="HQ"; Follows the Hannan-Quinn criterion asymptotically and sets g = log(N 3 ).
(5) g="EBL"; Estimates a local empirical Bayes g-parameter.8 (6) g="hyper"; Takes the “hyper-g” prior distribution.9 2.1.4. Outputs. The main objects returned by bms() are the posterior inclusion probabilities (PIP) and posterior means and standard deviations. In addition, a call to this function returns a list of aggregate statistics including the number of draws, burn-ins, models visited, the top models, and the size of the model space. It also, returns the correlation between iteration counts and analytical PMPs for the best models10, those with the highest posterior model probabilities (PMP).
2.1.5. Plot Diagnostics. BMS package users have access to plots of the prior and posterior model size distributions, a plot of posterior model probabilities based on the corresponding marginal likelihoods and MCMC frequencies for the best models that visualize how well the sampler has converged. A grid with signs and inclusion of coeﬃcients vs. posterior model probabilities for the best models and plots of predictive densities for conditional forecasts are also produced.