
# treatment effects

The term ‘treatment effect’ refers to the causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest. Economics examples include the effects of government programmes and policies, such as those that subsidize training for disadvantaged workers, and the effects of individual choices like college attendance. The principal econometric problem in the estimation of treatment effects is selection bias, which arises from the fact that treated individuals differ from the non-treated for reasons other than treatment status per se. Treatment effects can be estimated using social experiments, regression models, matching estimators, and instrumental variables.

A ‘treatment effect’ is the average causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest. The term ‘treatment effect’ originates in a medical literature concerned with the causal effects of binary, yes-or-no ‘treatments’, such as an experimental drug or a new surgical procedure. But the term is now used much more generally.

The causal effect of a subsidized training programme is probably the most widely analysed treatment effect in economics (see, for example, Ashenfelter, 1978, for one of the first examples, or Heckman and Robb, 1985, for an early survey). Given a data-set describing the labour market circumstances of trainees and a non-trainee comparison group, we can compare the earnings of those who did participate in the programme and those who did not. Any empirical study of treatment effects would typically start with such simple comparisons. We might also use regression methods or matching to control for demographic or background characteristics.

In practice, simple comparisons or even regression-adjusted comparisons may provide misleading estimates of causal effects. For example, participants in subsidized training programmes are often observed to earn less than ostensibly comparable controls, even after adjusting for observed differences (see, for example, Ashenfelter and Card, 1985). This may reflect some sort of omitted variables bias, that is, a bias arising from unobserved and uncontrolled differences in earnings potential between the two groups being compared. In general, omitted variables bias (also known as selection bias) is the most serious econometric concern that arises in the estimation of treatment effects. The link between omitted variables bias, causality, and treatment effects can be seen most clearly using the potential-outcomes framework.

**Causality and potential outcomes**

The notion of a causal effect can be made more precise using a conceptual framework that postulates a set of potential outcomes that could be observed in alternative states of the world.

Originally introduced by statisticians in the 1920s as a way to discuss treatment effects in randomized experiments, the potential outcomes framework has become the conceptual workhorse for non-experimental as well as experimental studies in many fields (see Holland, 1986, for a survey, and Rubin, 1974; 1977, for influential early contributions). Potential outcomes models are essentially the same as the econometric switching regressions model (Quandt, 1958), though the latter is usually tied to a linear regression framework. Heckman (1976; 1979) developed simple two-step estimators for this model.

**Average causal effects**

Except in the realm of science fiction, where parallel universes are sometimes imagined to be observable, it is impossible to measure causal effects at the individual level. Researchers therefore focus on average causal effects. To make the idea of an average causal effect concrete, suppose again that we are interested in the effects of a training programme on the post-training earnings of trainees. Let Y1i denote the potential earnings of individual i if he were to receive training and let Y0i denote the potential earnings of individual i if not. Denote training status by a dummy variable, Di. For each individual, we observe Yi = Y0i + Di(Y1i − Y0i), that is, we observe Y1i for trainees and Y0i for everyone else.

Let E[·] denote the mathematical expectation operator, i.e., the population average of a random variable. For continuous random variables, E[Yi] = ∫yf(y)dy, where f(y) is the density of Yi. By the law of large numbers, sample averages converge to population averages, so we can think of E[·] as giving the sample average in very large samples. The two most widely studied average causal effects in the treatment effects context are the average treatment effect (ATE), E[Y1i − Y0i], and the average treatment effect on the treated (ATET), E[Y1i − Y0i | Di = 1]. Note that the ATET can be rewritten

E[Y1i − Y0i | Di = 1] = E[Y1i | Di = 1] − E[Y0i | Di = 1].

This expression highlights the counterfactual nature of a causal effect. The first term is the average earnings in the population of trainees, a potentially observable quantity. The second term is the average earnings of trainees had they not been trained. This cannot be observed, though we may have a control group or econometric modelling strategy that provides a consistent estimate.
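These definitions can be made concrete with a small simulation. The sketch below (Python with NumPy; all numbers are invented for illustration) draws both potential outcomes for each person, lets workers with larger gains select into training, and computes ATE and ATET directly from the potential outcomes, something only a simulation permits, since real data reveal only one potential outcome per person:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential earnings with and without training (all numbers illustrative)
y0 = rng.normal(20.0, 5.0, n)          # Y0: earnings if not trained
gain = rng.normal(3.0, 2.0, n)         # heterogeneous individual gains Y1 - Y0
y1 = y0 + gain                         # Y1: earnings if trained

# People with larger gains are more likely to train (Roy-style selection)
d = (gain + rng.normal(0.0, 2.0, n) > 4.0).astype(int)

# We observe only one potential outcome per person:
y = y0 + d * (y1 - y0)

ate = (y1 - y0).mean()                 # E[Y1 - Y0]; knowable only in a simulation
atet = (y1 - y0)[d == 1].mean()        # E[Y1 - Y0 | D = 1]
```

Because selection here favours those who gain most, the ATET exceeds the ATE, illustrating that the two parameters generally differ.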

**Selection bias and social experiments**

As noted above, simply comparing those who are and are not treated may provide a misleading estimate of a treatment effect. Since the omitted variables problem concerns population quantities rather than sampling variance or statistical inference, it too can be described efficiently using mathematical expectation notation to denote population averages.

The contrast in average outcomes by observed treatment status is

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i | Di = 1] − E[Y0i | Di = 0]
= E[Y1i − Y0i | Di = 1] + {E[Y0i | Di = 1] − E[Y0i | Di = 0]}.

Thus, the naive contrast can be written as the sum of two components: the ATET, plus selection bias due to the fact that the average earnings of non-trainees, E[Y0i | Di = 0], need not be a good stand-in for the earnings of trainees had they not been trained, E[Y0i | Di = 1].

The problem of selection bias motivates the use of random assignment to estimate treatment effects in social experiments. Random assignment ensures that the potential earnings of trainees had they not been trained – an unobservable quantity – are well-represented by the randomly selected control group. Formally, when Di is randomly assigned, E[Yi|Di = 1]−E[Yi|Di = 0] = E[Y1i−Y0i|Di = 1] = E[Y1i−Y0i]. Replacing E[Yi|Di = 1] and E[Yi|Di = 0] with the corresponding sample analog provides a consistent estimate of ATE.
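A simulation in the same spirit (NumPy; invented numbers) makes the decomposition tangible: when low-Y0 workers self-select into training, the naive contrast equals the ATET plus a negative selection-bias term, while random assignment recovers the treatment effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Earnings potential differs across people; low-Y0 workers self-select
# into training (all numbers invented for illustration)
y0 = rng.normal(20.0, 5.0, n)                       # earnings if not trained
y1 = y0 + 3.0                                       # constant effect of 3
d_self = (y0 + rng.normal(0.0, 5.0, n) < 18.0).astype(int)
y_self = y0 + d_self * (y1 - y0)

naive = y_self[d_self == 1].mean() - y_self[d_self == 0].mean()
bias = y0[d_self == 1].mean() - y0[d_self == 0].mean()   # E[Y0|D=1] - E[Y0|D=0]
# naive = ATET + bias holds exactly in-sample; bias < 0 makes training look harmful

# Random assignment removes the selection term:
d_rand = rng.binomial(1, 0.5, n)
y_rand = y0 + d_rand * (y1 - y0)
naive_rand = y_rand[d_rand == 1].mean() - y_rand[d_rand == 0].mean()
```

Here the self-selected comparison understates the true effect badly, while the randomized contrast is close to 3.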

**Regression and matching**

Although it is increasingly common for randomized trials to be used to estimate treatment effects, most economic research still uses observational data. In the absence of an experiment, researchers rely on a variety of statistical control strategies and/or natural experiments to reduce omitted variables bias. The most commonly used statistical techniques in this context are regression, matching, and instrumental variables.

Regression estimates of causal effects can be motivated most easily by postulating a constant-effects model, where Y1i − Y0i = α (a constant). The constant-effects assumption is not strictly necessary for regression to estimate an average causal effect, but it simplifies things to postpone a discussion of this point. More importantly, the only source of omitted-variables bias is assumed to come from a vector of observed covariates, Xi, that may be correlated with Di. The key assumption that facilitates causal inference (sometimes called an identifying assumption) is that

E[Y0i | Xi, Di] = Xi′β,   (1)

where β is a vector of regression coefficients. This assumption has two parts. First, Y0i (and hence Y1i, given the constant-effects assumption) is mean-independent of Di conditional on Xi.

Second, the conditional mean function for Y0i given Xi is linear. Given eq. (1), it is straightforward to show that

E{Yi(Di − E[Di | Xi])} / E{Di(Di − E[Di | Xi])} = α.   (2)

This is the coefficient on Di from the population regression of Yi on Di and Xi (that is, the regression coefficient in an infinite sample). Again, the law of large numbers ensures that sample regression coefficients estimate this population regression coefficient consistently.
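The population formula (2) can be checked on simulated data (NumPy; all parameters invented): with a discrete covariate, estimate E[Di | Xi] by cell means and form the sample analog of the ratio of moments:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha = 2.0                                   # true constant treatment effect

x = rng.integers(0, 4, n)                     # a discrete covariate with 4 cells
p_x = np.array([0.1, 0.3, 0.5, 0.7])[x]       # E[D|X] varies with X
d = (rng.random(n) < p_x).astype(int)
y0 = 1.0 * x + rng.normal(0.0, 1.0, n)        # E[Y0|X] linear in X, so eq. (1) holds
y = y0 + alpha * d

# Sample analog of eq. (2): residualize D on X, then take the ratio of moments
e_d_x = np.array([d[x == k].mean() for k in range(4)])[x]   # estimate of E[D|X]
resid = d - e_d_x
alpha_hat = (y * resid).mean() / (d * resid).mean()
```

Even though D is strongly correlated with X, the residual-based ratio recovers the constant effect α, because the only confounding runs through the observed covariate.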

Matching is similar to regression in that it is motivated by the assumption that the only source of omitted variables or selection bias is the set of observed covariates, Xi. Unlike regression, however, treatment effects are constructed by matching individuals with the same covariates instead of through a linear model for the effect of covariates. The key identifying assumption is also weaker, in that the effect of covariates on Y0i need not be linear. Instead of (1), the conditional independence assumption becomes

E[Y0i | Xi, Di] = E[Y0i | Xi] and E[Y1i | Xi, Di] = E[Y1i | Xi].

In other words, we can construct ATET or ATE by averaging X-specific treatment-control contrasts, and then reweighting these X-specific contrasts using the distribution of Xi for the treated (for ATET) or using the marginal distribution of Xi (for ATE). Since these expressions involve observable quantities, it is straightforward to construct consistent estimators from their sample analogs.
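A minimal sketch of this cell-matching logic (NumPy; invented numbers), with a discrete covariate, X-specific effects, and a conditional mean for Y0i that is deliberately nonlinear in Xi:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.integers(0, 3, n)                          # discrete covariate, 3 cells
p_x = np.array([0.2, 0.5, 0.8])[x]                 # treatment probability by cell
d = (rng.random(n) < p_x).astype(int)
y0 = np.array([10.0, 15.0, 25.0])[x] + rng.normal(0.0, 2.0, n)  # nonlinear in x
effect = np.array([1.0, 2.0, 3.0])[x]              # X-specific treatment effects
y = y0 + d * effect

# X-specific treatment-control contrasts ...
contrast = np.array([y[(x == k) & (d == 1)].mean() -
                     y[(x == k) & (d == 0)].mean() for k in range(3)])

# ... reweighted by the distribution of X among the treated (ATET)
# or by the marginal distribution of X (ATE)
w_treated = np.array([(x[d == 1] == k).mean() for k in range(3)])
w_marginal = np.array([(x == k).mean() for k in range(3)])
atet_hat = (contrast * w_treated).sum()
ate_hat = (contrast * w_marginal).sum()
```

In this design the true ATE is 2 and the true ATET is larger, since cells with bigger effects also have higher treatment rates; the two weighting schemes pick up exactly that difference.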

The conditional independence assumption that motivates the use of regression and matching is most plausible when researchers have extensive knowledge of the process determining treatment status. An example in this spirit is the Angrist (1998) study of the effect of voluntary military service on the civilian earnings of soldiers after discharge, discussed further below.

**Regression and matching details**

In practice, regression estimates can be understood as a type of weighted matching estimator. If, for example, E[Di | Xi] is a linear function of Xi (as it might be if the covariates are all discrete), then it is possible to show that eq. (2) is equivalent to a matching estimator that weights cell-by-cell treatment-control contrasts by the conditional variance of treatment in each cell (Angrist, 1998). This equivalence highlights the fact that the most important econometric issue in a study that relies on conditional independence assumptions to identify causal effects is the validity of these conditional independence assumptions, not whether regression or matching is used to implement them.
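This equivalence can be verified numerically (NumPy; invented numbers): with a saturated set of dummies for a discrete covariate, the OLS coefficient on Di matches the cell-by-cell contrasts weighted by each cell's population share times its conditional variance of treatment:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

x = rng.integers(0, 3, n)                          # discrete covariate, 3 cells
p_x = np.array([0.1, 0.5, 0.9])[x]                 # treatment probability by cell
d = (rng.random(n) < p_x).astype(int)
effect = np.array([1.0, 2.0, 3.0])[x]              # heterogeneous effects by cell
y = np.array([5.0, 8.0, 12.0])[x] + d * effect + rng.normal(0.0, 1.0, n)

# OLS of Y on D and a full set of X dummies (saturated in X, no interactions)
X_mat = np.column_stack([d, np.eye(3)[x]])
ols_coef = np.linalg.lstsq(X_mat, y, rcond=None)[0][0]

# Cell-by-cell contrasts weighted by share * Var(D|X), as in Angrist (1998)
contrast = np.array([y[(x == k) & (d == 1)].mean() -
                     y[(x == k) & (d == 0)].mean() for k in range(3)])
var_d = np.array([d[x == k].var() for k in range(3)])    # p(1-p) in each cell
share = np.array([(x == k).mean() for k in range(3)])
w = share * var_d / (share * var_d).sum()
weighted_contrast = (contrast * w).sum()
# ols_coef and weighted_contrast agree up to floating-point error
```

The agreement is an algebraic identity, not an approximation: the regression residual of D on the dummies is D minus its cell mean, so the OLS numerator and denominator sum exactly to the variance-weighted contrasts.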

A computational difficulty that sometimes arises in matching models is how to find good matches for each possible value of the covariates when the covariates take on many values. For example, beginning with Ashenfelter (1978), many studies of the effect of training programmes have shown that trainees typically experience a period of declining earnings before they go into training. Because lagged earnings is both continuous and multidimensional (since more than one period's earnings seem to matter), it may be hard to match trainees and controls with exactly the same pattern of lagged earnings. A possible solution in this case is to match trainees and controls on the propensity score, the conditional probability of treatment given covariates. Propensity-score matching relies on the fact that, if conditioning on Xi eliminates selection bias, then so does conditioning on P[Di = 1 | Xi], as first noted by Rosenbaum and Rubin (1983). Use of the propensity score reduces the dimensionality of the matching problem since the propensity score is a scalar, though in practice it must still be estimated. See Dehejia and Wahba (1999) for an illustration.
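A sketch of propensity-score stratification (NumPy; invented numbers). To keep the example self-contained, the true score is used in place of an estimated one; in practice it would be estimated, for example by a logit of Di on the covariates:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Two periods of lagged earnings drive both selection into training and earnings
x1 = rng.normal(20.0, 4.0, n)
x2 = rng.normal(20.0, 4.0, n)

# True propensity score: low earners are more likely to train
score = 1.0 / (1.0 + np.exp(0.25 * (x1 - 20.0) + 0.15 * (x2 - 20.0)))
d = (rng.random(n) < score).astype(int)
y = 0.8 * x1 + 0.5 * x2 + 2.0 * d + rng.normal(0.0, 2.0, n)  # constant effect of 2

naive = y[d == 1].mean() - y[d == 0].mean()    # badly biased downward

# Stratify on the scalar score instead of matching on (x1, x2) jointly
edges = np.quantile(score, np.linspace(0.0, 1.0, 51))
stratum = np.clip(np.digitize(score, edges[1:-1]), 0, 49)

num, den = 0.0, 0.0
for s in range(50):
    m = stratum == s
    n1, n0 = d[m].sum(), m.sum() - d[m].sum()
    if n1 == 0 or n0 == 0:
        continue                               # skip strata with no overlap
    contrast = y[m & (d == 1)].mean() - y[m & (d == 0)].mean()
    num += n1 * contrast                       # weight by treated count -> ATET
    den += n1
atet_hat = num / den
```

The naive contrast is far from the true effect of 2 because trainees have systematically lower earnings potential, while the score-stratified estimate is close, even though no two-dimensional match on (x1, x2) was ever attempted.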

**Regression and matching example**

Between 1989 and 1992, the size of the military declined sharply because of increasing enlistment standards. Policymakers would like to know whether the people – many of them black men – who would have served under the old rules but were unable to enlist under the new rules were hurt by the lost opportunity for service. The Angrist (1998) study was meant to answer this question. The conditional independence assumption seems plausible in this context because soldiers are selected on the basis of a few well-documented criteria related to age, schooling, and test scores, and because the control group also applied to enter the military.