# Abstract. Bayesian model averaging has increasingly witnessed applications across an array of empirical contexts. However, the dearth of available ...

2.2. The BAS Package. The BAS package (an acronym for Bayesian Adaptive Sampling) performs BMA in linear models using stochastic or deterministic sampling without replacement from posterior distributions (Clyde 2010). Prior distributions on coefficients are drawn from Zellner's g prior or from mixtures of g priors corresponding to the Zellner-Siow Cauchy priors or the Liang hyper-g priors (see Liang et al. 2008). Other model selection criteria include AIC and BIC. The main function in the BAS package for implementing a BMA regression analysis is bas.lm().

2.2.1. Model Sampling. If the number of covariates is less than 25, the BAS package enumerates all models; otherwise it implements three different search algorithms to find the models with the highest posterior probability. The first algorithm is the Bayesian Adaptive Sampling algorithm11 (method="BAS"), which samples without replacement using random or deterministic sampling (random="TRUE/FALSE"). The Bayesian Adaptive Sampling algorithm draws models from the model space without replacement using the initial sampling probabilities, and then updates those probabilities using the estimated marginal inclusion probabilities. If the explanatory variables are orthogonal, the deterministic sampler provides a list of the top models in order of their approximate posterior probability, and it remains an effective search when the correlations among variables are small to modest. The second algorithm is the adaptive MCMC sampler (method="AMCMC"). This sampler requires various parameters to be tuned appropriately for the algorithm to converge reasonably well; the package currently handles this tuning automatically, updating the tuning parameters and other choices during the course of sampling from the model space. The last method (method="MCMC+BAS") runs an initial MCMC to calculate marginal inclusion probabilities and then samples without replacement as in method="BAS". The default number of models to sample is NULL, meaning that a call to bas.lm() enumerates all combinations. The user can also indicate a model with which to initialize the sampling by passing a binary vector to the BAS functions (bestmodel). In default mode, sampling starts with the full model.

7See Fernandez et al. (2001a)

8See George & Foster (2000) and Liang, Paulo, Molina, Clyde & Berger (2008)

9See Liang et al. (2008) and Feldkircher & Zeugner (2009)

10Note that here and throughout the remainder of the document, the terminology "best models" or "best" corresponds to best as judged by the analytical likelihood.

11See Clyde, Ghosh & Littman (2010)

## 6 SHAHRAM M. AMINI AND CHRISTOPHER F. PARMETER

2.2.2. Model Priors. The family of prior distributions on the models is nearly identical to that of the BMS package, allowing uniform, Bernoulli, or Beta-Binomial distributions as priors (modelprior="uniform", "Bernoulli", "beta.binomial").
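As a minimal sketch, the sampling and model-prior options above combine in a single call to bas.lm(). The data frame df and response y below are hypothetical, and in released versions of BAS the modelprior argument takes prior objects such as uniform() rather than the bare strings quoted in the text; exact argument names may differ across package versions.

```r
# Illustrative sketch only: combining the model-sampling and
# model-prior options described above (hypothetical data 'df').
library(BAS)

fit <- bas.lm(y ~ ., data = df,
              method     = "BAS",        # or "AMCMC", "MCMC+BAS"
              modelprior = uniform(),    # or Bernoulli(), beta.binomial()
              n.models   = NULL)         # NULL: enumerate when feasible
```

With fewer than 25 covariates and n.models = NULL, the call enumerates the full model space, so the choice of method is then immaterial.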

2.3. Alternative g Priors. To determine the prior distribution for the regression coefficients, the user has the following choices:12

(1) g="AIC"; Akaike information criterion.

(2) g="BIC"; Bayesian information criterion.

(3) g="g-prior"; Takes g = N, the sample size, corresponding to the UIP.

(4) g="ZS-null"; Employs Zellner & Siow's (1980) suggestion. If the two models under comparison are nested, the Zellner-Siow strategy is to place a flat prior on common coefficients and a Cauchy prior on the remaining parameters. This option utilizes the null model as the base model and compares each model with the null model.

(5) g="ZS-full"; The distinction between this method and ZS-null is the choice of base model. The ZS-full method utilizes the full model (i.e., all covariates included) as the base model and compares all other models with the full model.

(6) g="hyper-g"; This option uses a family of priors on g, introduced by Strawderman (1971), that provides improved mean squared risk over ordinary maximum likelihood estimates in the normal means problem. An advantage of the hyper-g prior is that the posterior distribution of g given a model is available in closed form.

(7) g="hyper-g-laplace"; This method is the same as hyper-g; the only difference is that it uses a Laplace approximation in the calculations involving the prior on g. This avoids issues with modes on the boundary and leads to improved solutions.

(8) g="EB-local"; This procedure estimates a separate g for each model. The local empirical Bayes (EB) estimate of g is the maximum marginal likelihood estimate constrained to be nonnegative.

(9) g="EB-global"; The global EB technique sets one common g for all models and borrows strength from all models by estimating g from the marginal likelihood of the data, averaged over all models.

12See Liang et al. (2008) for a thorough review of the g priors used in the BAS package.
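The choice among the nine options above is made through a single argument to bas.lm(). A hedged sketch follows: the text labels the argument g=, while recent releases of BAS name the corresponding argument prior=; the data frame df is hypothetical.

```r
# Sketch: selecting among the coefficient priors listed above.
# Argument naming follows current BAS releases (prior=); the text
# refers to the same argument as g=.
fit_uip <- bas.lm(y ~ ., data = df, prior = "g-prior")   # g = N (UIP)
fit_zs  <- bas.lm(y ~ ., data = df, prior = "ZS-null")   # Zellner-Siow, null base
fit_hg  <- bas.lm(y ~ ., data = df, prior = "hyper-g")   # Strawderman-type family
fit_eb  <- bas.lm(y ~ ., data = df, prior = "EB-local")  # model-specific EB g
```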

## BMA IN R 7

2.3.1. Outputs. The main objects returned from a call to bas.lm() are the posterior inclusion probabilities and the posterior mean and standard deviation of each coefficient, as in a call to bms(). Moreover, bas.lm() also returns the prior inclusion probability for each variable along with the posterior probability that each variable is non-zero.

2.3.2. Diagnostic Plots. The main plotting command in this package produces a panel of four graphs. The first is a plot of the residuals versus fitted values using the Bayesian model averaged coefficient estimates. The second plots the cumulative marginal likelihoods of all visited models. The third plots the log marginal likelihood against model dimension, and the fourth shows the posterior marginal inclusion probabilities.
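The four-panel display described above is produced by the plot method for a bas.lm() fit; a sketch, assuming fit is the object returned by bas.lm() (the which and ask arguments follow the conventions of base R plot methods and may vary slightly by package version):

```r
# Sketch: the four diagnostic panels for a bas.lm() fit --
# residuals vs. fitted, cumulative model probability,
# log marginal likelihood vs. dimension, inclusion probabilities.
plot(fit, which = 1:4, ask = FALSE)
```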

2.4. The BMA Package. The BMA package (an acronym for Bayesian model averaging) performs a BMA analysis assuming a uniform distribution on model priors and using a simple BIC (Bayesian information criterion) approximation to construct the prior probabilities on the regression coefficients (Raftery, Hoeting, Volinsky, Painter & Yeung 2010). This package is built upon Raftery's (1995) algorithm. The main functions in the BMA package for implementing a BMA regression analysis are bicreg() and iBMA.bicreg().

2.4.1. Model Sampling. If the number of covariates is less than 30, the BMA package enumerates all models. Otherwise, BMA first uses a backward elimination procedure, in which variables are eliminated one at a time starting from the full model, to reduce the initial set of variables to 30: the package screens the t-statistics of the estimated parameters and removes the variables whose statistics are small. It then utilizes a specific BIC difference, according to Table 1, to compare model Mj to Mi in order to find models that are more likely to be part of the final set of good models. The models that remain are said to belong to Occam's window, a generalization of the famous Occam's razor, or principle of parsimony in scientific explanation. If the initial set of models in Occam's window is large and all models are linear, BMA uses the leaps and bounds algorithm of Furnival & Wilson Jr (1974) to select a reduced set of good models. The user can set a number specifying the maximum ratio for excluding models from Occam's window (OR); the default value of this ratio is 20. The maximum number of columns in the design matrix (maxCol), including the intercept, to be kept defaults to 31.
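The workflow above reduces to a single call to bicreg(), which takes a design matrix and a response vector rather than a formula. A sketch, with X and y hypothetical and the OR and maxCol arguments set to their stated defaults:

```r
# Sketch: BMA via Occam's window and leaps-and-bounds, as described
# above (hypothetical design matrix X and response vector y).
library(BMA)

fit <- bicreg(x = X, y = y,
              OR     = 20,   # max ratio for excluding models (default 20)
              maxCol = 31)   # max design-matrix columns, incl. intercept
summary(fit)
```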

[Table 1 about here.]

2.4.2. Model Priors. The BMA package is inflexible regarding model priors, assuming all models are on equal footing a priori. The model prior is set to 1/2^K, where K is the total number of variables included in the analysis.

2.4.3. Regression Coefficient Priors. Unlike the BMS and BAS packages, BMA does not employ Zellner's g priors as its choice of prior distribution for the regression coefficients. Instead, the package employs the BIC approximation, which corresponds fairly closely to the UIP.
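The formula itself appears to have been lost to the page break in the source. For reference, the standard BIC approximation from Raftery (1995) on which the package rests approximates the marginal likelihood of model $M_k$ and, under the uniform model prior, the posterior model probabilities by

$$
p(y \mid M_k) \approx \exp(-\mathrm{BIC}_k/2),
\qquad
p(M_k \mid y) = \frac{\exp(-\mathrm{BIC}_k/2)}{\sum_j \exp(-\mathrm{BIC}_j/2)},
$$

where, for a linear model with $p_k$ regressors, $\mathrm{BIC}_k = n \log(1 - R_k^2) + p_k \log n$.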


2.4.4. Outputs. The call to bicreg() returns the posterior inclusion probabilities and the posterior mean and standard deviation of each coefficient (as with bms() and bas.lm()), along with the BIC values of the models selected, the posterior mean of each coefficient conditional on the variable being included in the model, and the R² for the models deemed to be the most important.

2.4.5. Diagnostic Plots. BMA generates a plot of the posterior distribution for each of the estimated coefficients.

Having briefly described each of the BMS, BAS, and BMA packages, we present the main function calls available to the user in Table 2 for ease of reference. The remainder of this section draws out similarities among the key features and provides a detailed comparison of the available options for model sampling and search, construction of model priors, call outputs, and diagnostics, so that the reader has enough information to judge the merits of each package.

3.1. Model Sampling. When the covariate space is small, all three packages enumerate the model space and estimate all models to conduct the averaging. However, when the space of candidate covariates is large, the packages engage in markedly different search behavior across the space of candidate models. Both the BAS and BMS packages use adaptive search algorithms to determine a feasible list of models with which to construct the posterior distribution of model probabilities.

This lends added credibility to these packages relative to the BMA package, which uses a much simpler mechanism to adjudicate across candidate models. As we will see in our empirical examples, when we enumerate the model space, BMS and BAS provide roughly identical results (posterior inclusion probabilities, means, and standard deviations), whereas when we resort to model sampling, the interpretations of the effects and the inclusion of specific variables can differ. The BMA package produces roughly similar estimates of the posterior means and standard deviations, but deviates substantially with respect to the inclusion probabilities.

[Table 2 about here.]

3.3. Alternative Zellner's g Priors. The BAS package provides additional options, including BIC and AIC, relative to the BMS package. Recall that the BMA package does not provide options for coefficient priors consistent with either of the other packages.

3.4. Outputs. All three packages return the posterior probability of inclusion as well as the posterior mean and standard deviation of each coefficient. However, only the BMS package provides a suite of diagnostic information to help the user gauge whether further adjustment of the BMA setup is needed, or to assess the likely robustness of the results to the various options within the model averaging exercise.

3.5. Diagnostic Plots. All three packages provide diagnostic output for the user to understand the results of the model averaging exercise. However, both the BMS and BAS packages provide plots of the dimensionality of the model whereas the BMA package does not.

This section compares the results of a basic BMA exercise using the UScrime dataset available in the MASS package (Venables & Ripley 2002) in R. This is a dataset on per capita crime rates in 47 U.S. states in 1960 (see Ehrlich 1973). There are 15 potential independent variables, all perceived to be associated with crime rates. The last two, the probability of imprisonment and the average time spent in state prisons, were the predictor variables of interest in Ehrlich's (1973) original study, while the other 13 were control variables. This small sample allows us to compare the results from the three packages when we can enumerate the model space. However, because any difference in the performance of the three packages, and in the results gleaned from them, will likely depend upon a large model space, we also consider the same dataset with 35 fictitious variables (generated as independent standard normal draws) included, so that each package must engage in model sampling.
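The augmented dataset can be constructed as follows; the seed and the column names for the fictitious regressors are arbitrary choices of ours, not part of the original exercise:

```r
# Sketch: UScrime from MASS, augmented with 35 fictitious
# standard-normal regressors so that each package must engage in
# model sampling rather than enumerating the model space.
library(MASS)

data(UScrime)
set.seed(42)                                    # arbitrary seed
fake <- matrix(rnorm(nrow(UScrime) * 35), ncol = 35)
colnames(fake) <- paste0("noise", 1:35)
crime_big <- cbind(UScrime, fake)               # 15 real + 35 fake covariates
```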