Development, Security, and Cooperation; Policy and Global Affairs. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001
While the field team in Peru described how a past project might have been designed in a way that permitted rigorous evaluation, the Uganda team focused on a multifaceted set of projects that were just getting started. Working with mission staff, the committee’s experts identified a series of planned interventions, each of which could be assessed using tools of randomized evaluation. Although these evaluation models do not cover every planned intervention currently under consideration by the Uganda mission, if implemented, they would provide substantial new evidence about the efficacy of USAID DG programming in Uganda.
CHALLENGES IN APPLYING RANDOMIZED EVALUATIONS
This section considers several challenges to applying randomized evaluations in DG programming, as well as the incentives (or disincentives) that DG staff and implementers have to conduct impact evaluations and their current capabilities to do so.
Randomly selecting units for treatment is simply not workable. Adopting the principle of random assignment runs the risk that certain units that project designers would very much like to include in the treatment group will wind up being excluded from the program. For some USAID staff and implementers with whom the committee spoke, this was a major reason to resist adoption of randomized evaluations. It was pointed out, for example, that in many situations USAID and its implementers can only work with local authorities that accept their help. Moreover, it was suggested that units (municipalities, ministries, groups) that lacked the “political will” to work with USAID to fully implement the programs in question would not be likely to achieve successful outcomes and thus did not merit an investment of resources. It was also suggested that units with exemplary past performance sometimes appeared to be such sure bets for program success that excluding them from participation in the new project appeared wasteful.
These are reasonable objections; however, accepting their merit need not imply jettisoning a randomized design. One option that satisfies the need for randomized selection of treatment units while also recognizing that rolling out a program in some units may not be feasible would be to select the set of units that are eligible for treatment on the basis of political will and other criteria that USAID believes maximize the chances for success and then to assign units randomly to treatment and control groups within this group of eligible units. This approach is also useful for situations where USAID seeks to limit programs to needy or conflict-affected areas, as long as there are more units than USAID can possibly treat.
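The eligible-pool design is straightforward to operationalize. The Python sketch below uses invented municipality names and an invented screening set; the essential point is simply that the lottery runs only among units that have already passed USAID's eligibility criteria.

```python
import random

def assign_within_eligible(units, eligible, n_treat, seed=0):
    """Run the lottery only among units that passed the eligibility
    screen; ineligible units are simply outside the study."""
    pool = [u for u in units if u in eligible]
    rng = random.Random(seed)          # fixed seed keeps the draw auditable
    treatment = rng.sample(pool, n_treat)
    control = [u for u in pool if u not in treatment]
    return treatment, control

# Hypothetical data: 8 of 12 municipalities are judged to have the
# political will (and meet other program criteria) to participate.
all_units = [f"muni_{i:02d}" for i in range(12)]
eligible = set(all_units[:8])
treat, ctrl = assign_within_eligible(all_units, eligible, n_treat=4)
```

Because the screening happens before the draw, the comparison between `treat` and `ctrl` is internally valid for the kinds of units USAID would actually work with, which is the population of interest anyway.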
Another option, suitable for situations where, for political or other reasons, allocating treatment to one or several units may be nonnegotiable (i.e., the consensus among project designers is that a particular unit or units simply must be included in the treatment group), is to go ahead with random selection of units for treatment but leave aside a certain percentage of the project budget (e.g., 10 to 15 percent) to pay for the implementation of program activities in units that were not selected but that organizers feel must be included. In such a case the evaluation would be based on a comparison of the regular treated group (not including the added units) with the control group. Of course, one can always look as well at outcomes in the non-randomly selected—the “must have”—units.
Yet comparing outcomes in such units to nontreated units would be less informative about the causal impact of the USAID intervention than comparing outcomes across the units that were randomly assigned to the treatment group and the control group.
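The budget set-aside design can be sketched in the same style. In this hypothetical fragment, the "must have" units receive the program from the reserved funds but are kept out of the treatment-versus-control comparison; all names and counts are illustrative.

```python
import random

def setaside_design(units, must_have, n_treat, seed=1):
    """Randomize among units whose inclusion is negotiable; 'must have'
    units are funded from the budget set-aside and excluded from the
    evaluation comparison."""
    negotiable = [u for u in units if u not in must_have]
    rng = random.Random(seed)
    randomized_treat = rng.sample(negotiable, n_treat)
    control = [u for u in negotiable if u not in randomized_treat]
    # Every must-have unit still receives the program...
    all_treated = randomized_treat + sorted(must_have)
    # ...but only randomized_treat vs. control supports causal inference.
    return randomized_treat, control, all_treated

units = [f"unit_{i}" for i in range(10)]           # hypothetical units
rt, ctrl, all_treated = setaside_design(units, must_have={"unit_0"}, n_treat=4)
```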
IMPROVING DEMOCRACY ASSISTANCE

It is unethical or impossible to preserve a control group. Is it ethical to deny treatment to control groups? This issue arises frequently in public health programs but may also be relevant in projects where, as with interventions in the area of DG, the assistance is welfare improving even if not, strictly speaking, life saving. As with public health studies, the standard defense applies: Without an experiment, how do we know whether or not the intervention helps? USAID intervenes to assist DG all over the world. As in the public health field, it behooves us to know with as much confidence as possible what works and what does not. Continuing to channel scarce resources to projects that, once properly evaluated, turn out to have no positive impact is wasteful, particularly when properly executed randomized evaluations could put USAID in a position to identify projects that do work and whose reach and impact could usefully be expanded with a shift in resources from those that have been found to be underperforming.
A second defense of randomized assignments against the criticism that some units will go untreated is that, in any project being implemented across a large number of potential units, there will virtually always be untreated units. In the context of a decentralization project involving dozens of municipalities, it is simply not feasible for USAID to work with all of them; in the context of a project designed to support CSO development, it is simply not possible for USAID to work with every group. Given the impossibility of treating every unit, the only question is how untreated units will be chosen. In many contexts it may be fairest, and most ethically defensible, to choose untreated units by lottery, as would be the case in a randomized evaluation.
Finally, even if every unit is to be treated, it may be reasonable to delay treatment for a portion of the units by a randomized rollout. In this case, while some units (chosen by lottery) will get assistance first, others will have a delay before they receive assistance. Yet for the group that faces delay, this may be more than compensated by the possibility that the delayed group will either be spared an ineffective treatment or will receive improved assistance, since the initial phase of the rollout provides the basis for learning from a randomized impact study of the treatment’s effects.
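A randomized rollout amounts to assigning each unit a phase by lottery. The following sketch (hypothetical unit names and phase counts) shows how later-phase units serve as the comparison group until their turn arrives, even though every unit is eventually treated.

```python
import random

def randomized_rollout(units, n_phases, seed=2):
    """Assign every unit a rollout phase by lottery: all units are
    eventually treated, but later phases wait their turn."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    return {u: i % n_phases + 1 for i, u in enumerate(shuffled)}

def comparison_groups(phases, current_phase):
    """Units already treated by this phase vs. units still waiting,
    which serve as the comparison group in the meantime."""
    treated = [u for u, p in phases.items() if p <= current_phase]
    waiting = [u for u, p in phases.items() if p > current_phase]
    return treated, waiting

# Hypothetical example: nine CSOs rolled out in three phases.
phases = randomized_rollout([f"cso_{i}" for i in range(9)], n_phases=3)
treated, waiting = comparison_groups(phases, current_phase=1)
```

Once the final phase begins, the comparison group is exhausted, so the evaluation window is the rollout period itself.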
IMPLEMENTING IMPACT EVALUATIONS IN THE FIELD

Isolating control from treatment groups is not feasible in practice. A third objection involves the great difficulty in preventing the effects of treated units from “spilling over” and affecting control units. For example, a project that provides support for CSOs to advocate improved service delivery may impact not only the area in which the CSOs are based but also neighboring areas (either because local governments fear similar mobilization and act to forestall it or because CSOs in neighboring areas become emboldened by the example of what their colleagues are doing next door and step up their own advocacy). Another example of spillover is when grassroots party activities in one locale yield benefits in other places, either because party contacts extend across administrative boundaries or because changing attitudes are transmitted across familial and social networks. Whenever there are spillover effects (and there often are), the difference between the control and treatment groups is attenuated, and this will bias the evaluation toward a finding of no effect.
Sometimes, design modifications can help minimize the likelihood of spillover. For example, in the context of the Peruvian decentralization project discussed earlier, randomizing at the provincial level might decrease the probability that district mayors are aware of treatments administered to other units. In this case all municipalities in a province would be in either the treatment group or the control group, thereby minimizing the likelihood of spillover from municipality to municipality (except insofar as they happen to be located adjacent to a provincial boundary).
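Randomizing at a higher administrative level is a form of cluster randomization: the lottery runs over provinces, and every municipality inherits its province's assignment. The mapping below is invented purely for illustration.

```python
import random

def cluster_randomize(province_of, n_treat_provinces, seed=3):
    """Run the lottery over provinces; every municipality inherits its
    province's assignment, so neighbors in one province never differ."""
    provinces = sorted(set(province_of.values()))
    rng = random.Random(seed)
    treated = set(rng.sample(provinces, n_treat_provinces))
    return {m: (p in treated) for m, p in province_of.items()}

# Hypothetical mapping: 12 municipalities in 3 provinces of 4 each.
province_of = {f"muni_{i}": f"prov_{i // 4}" for i in range(12)}
assignment = cluster_randomize(province_of, n_treat_provinces=1)
```

The trade-off is statistical: clustering reduces spillover but also reduces the number of independent units, so more provinces are needed to detect an effect of a given size.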
But while problematic for inference, spillover effects may be important to measure in their own right. In their study of deworming programs in Western Kenya, for example, Miguel and Kremer (2004) found that deworming interventions are not cost-effective unless the positive externalities of the program that spill over into neighboring untreated communities are accounted for. Taking advantage of the fact that the treatment is randomly assigned across space, they estimate the size of these spillover effects and then use the estimates to calculate the true effects of the deworming program, which they find to be positive once the spillover effects are accounted for. Their study underscores that not just minimizing but also measuring contamination must be a core aspect of any well-designed randomized evaluation.
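The logic of measuring rather than merely minimizing spillover can be illustrated with synthetic data. This is not Miguel and Kremer's estimator, only the accounting identity behind it: comparing treated units with nearby, contaminated controls understates the program's total effect by exactly the size of the spillover.

```python
import random
import statistics

rng = random.Random(4)

# Synthetic data, not real program results: the program raises treated
# outcomes by about 5 points over pure controls, and about 2 of those
# points also spill over into untreated neighbors of treated units.
treated_units = [10 + rng.gauss(0, 1) for _ in range(30)]
near_controls = [7 + rng.gauss(0, 1) for _ in range(30)]   # contaminated
far_controls = [5 + rng.gauss(0, 1) for _ in range(30)]    # pure control

naive_effect = statistics.mean(treated_units) - statistics.mean(near_controls)
spillover = statistics.mean(near_controls) - statistics.mean(far_controls)
total_effect = statistics.mean(treated_units) - statistics.mean(far_controls)
# By construction, naive_effect + spillover == total_effect: ignoring a
# positive spillover biases the estimate toward a finding of no effect.
```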
A related problem is the possibility that donors from other countries might concentrate their programs in areas in which USAID is not undertaking program activities, thereby, as one program officer put it, “flooding the controls.” This may happen intentionally, when donors coordinate and divide up areas of focus to avoid duplication of efforts. Or projects not intended to directly influence democracy, such as programs to create entrepreneurs or regional cooperative associations, may in fact help the spread of democracy in the area being observed. If this occurs, the other donors’ interventions become a confounding factor associated with treatment, and this will almost certainly bias inferences about the effect of USAID interventions.23 One possible response to this issue is not to advertise the existence of control units. For example, in the context of a decentralization project it may be known that USAID is working in seven regions, but it need not be made publicly known which particular municipalities it is working with in each region. A second solution is to commit in advance to implement the project in all units (and to make this publicly known) but to roll it out gradually, using untreated units as a comparison group for treated units in the years before they are added to the intervention (as in the second design for the Peru decentralization program described earlier). Another option is to randomize different treatments across all municipalities. In other words, USAID would work with all municipalities in the seven regions (thereby leaving no municipalities to be flooded) but randomly assign different treatments to different municipalities (again, as discussed earlier for Peru). One final possibility is to engage other donors in conceptualizing the evaluation exercise. If multiple donors are implementing similar interventions, all would benefit from an impact evaluation of their projects. In such circumstances it may be possible to coordinate USAID’s activities with theirs to preserve a control group.

23 However, it might be pointed out that, if anything, this is likely to dilute the (it is hoped positive) effect of treatment. If other donors flood the controls and there is still a difference between groups, a causal effect of USAID’s intervention can be inferred. (At least, the effect of USAID relative to other donors can be evaluated.)
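The option of randomizing different treatments across all municipalities, a multi-arm design with no pure control group, might be sketched as follows; the arm labels and unit names are hypothetical.

```python
import random

def assign_treatment_arms(units, arms, seed=5):
    """Shuffle units and deal them into treatment arms round-robin, so
    every unit gets some version of the program and no pure control
    group is left for other donors to 'flood'."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    return {u: arms[i % len(arms)] for i, u in enumerate(shuffled)}

# Hypothetical arms for a decentralization program.
munis = [f"muni_{i}" for i in range(12)]
arms = ["training_only", "training_plus_grants", "grants_only"]
assignment = assign_treatment_arms(munis, arms)
```

The resulting evaluation compares arms against one another rather than against an untreated group, so it identifies the relative effectiveness of program variants, not the absolute effect of assistance.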