«Development, Security, and Cooperation Policy and Global Affairs THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001 NOTICE: The ...»
One problem, however, is that finding appropriate measures of the outcomes that the activities are designed to affect is frequently far from straightforward. For example, the goals of the technical assistance to the Inspectorates of the High Council of Justice and the Ministry of Justice are to improve the transparency and accountability of the judiciary and to increase public confidence in judicial integrity. The latter can be measured fairly easily by administering public opinion polls before and after the period during which technical assistance was offered and comparing the results. However, measuring the degree to which the judiciary is transparent and accountable is much more difficult. Part of the problem is that the true level of transparency and accountability in the judiciary can be judged only relative to an (unknown) set of activities that should be brought to light and an (unknown) level of malfeasance that needs to be addressed. For example, suppose that, following implementation of a program designed to support the Inspectorate of the High Council of Justice, three judges are charged with corruption. Should this be taken as a sign that the activities succeeded in generating greater accountability? Compared to a baseline of no prosecutions, the answer is probably yes, at least to some degree, although one would also want to know whether the prosecutions were selective and politically motivated.
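The before-and-after poll comparison can be illustrated with a standard two-proportion z-test, which asks whether the share of respondents expressing confidence in judicial integrity changed by more than sampling error would explain. This is a minimal sketch under stated assumptions: the function name and all figures are hypothetical, the two polls are assumed to be independent random samples, and a significant change shows only that confidence shifted, not that the technical assistance caused the shift.

```python
import math


def two_proportion_z_test(confident_before, n_before, confident_after, n_after):
    """Compare the share of respondents expressing confidence in judicial
    integrity in a poll taken before the assistance with one taken after it.

    Hypothetical helper; assumes two independent random samples."""
    p_before = confident_before / n_before
    p_after = confident_after / n_after
    # Pooled share under the null hypothesis that confidence did not change.
    pooled = (confident_before + confident_after) / (n_before + n_after)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_after - p_before, z, p_value


# Hypothetical polls of 1,000 respondents each.
change, z, p = two_proportion_z_test(310, 1000, 380, 1000)
print(f"change = {change:.3f}, z = {z:.2f}, p = {p:.4f}")
```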
A slightly different evaluation problem arises with respect to activities designed to support the drafting of various pieces of legislation. One fairly straightforward measure of success in this area is simply whether or not the law was actually drafted and, if so, whether it included language that will demonstrably strengthen the rule of law. But assessing whether USAID’s support had any impact requires weighing a counterfactual question: Would the legislation have been drafted without USAID’s support, and what would it have looked like? If the answers are that the legislation would not have been drafted or that the language in the resulting law would not have been optimal, USAID’s support can be judged successful to the extent that the observed result is better than this counterfactual outcome. The broader problem, however, is that achieving the overarching strategic objective of strengthening the rule of law involves more than just getting legislation drafted; it involves getting legislation passed and then enforced. The point, echoing a theme from Chapter 3, is that the measurable outcome of the USAID-sponsored activity is several steps removed from the true goals of the intervention, and any assessment of “success” in these areas must be interpreted in this light. Proper measurement of project impact must move beyond proximate questions (were the institutions created?) to more distant and policy-relevant ones (have the outcomes that the new institutions were hypothesized to affect changed for the better?). Answering the second question requires high-quality baseline data, preferably stretching as far back in time as possible, so that general trends can be distinguished from project effects.
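The value of a long baseline can be illustrated with a simple trend projection, sometimes called an interrupted time-series comparison. The sketch below fits a straight line to the pre-project years of a hypothetical rule-of-law indicator, projects that line into the project period, and reports how far the observed values depart from the projection. The linear-trend assumption, the function names, and the data are all illustrative; a real indicator may need a different model of its pre-existing trend.

```python
from statistics import mean


def fit_linear_trend(years, values):
    """Ordinary least-squares fit of value = intercept + slope * year."""
    x_bar, y_bar = mean(years), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def gaps_from_trend(baseline_years, baseline_values, post_years, post_values):
    """Observed minus trend-projected values for each post-project year.

    Positive gaps mean the indicator rose faster than the (assumed linear)
    pre-existing trend alone would predict."""
    intercept, slope = fit_linear_trend(baseline_years, baseline_values)
    projected = [intercept + slope * year for year in post_years]
    return [obs - proj for obs, proj in zip(post_values, projected)]


# Hypothetical indicator: steady one-point annual gains before the project.
baseline_years = [2000, 2001, 2002, 2003, 2004, 2005]
baseline_values = [40, 41, 42, 43, 44, 45]
gaps = gaps_from_trend(baseline_years, baseline_values, [2006, 2007], [50, 52])
print(gaps)  # → [4.0, 5.0]
```

A naive before-and-after comparison would credit the project with the full rise from 45 to 52 points. The projection instead isolates the 4 to 5 point departure from trend, and the project is only one candidate explanation for that departure.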
Additional Techniques to Aid Project Evaluation When N = 1

In addition to collecting high-quality baseline and follow-up data, two other techniques can aid project evaluators in making sound judgments about project efficacy. The first is to explicitly attempt to identify and rule out alternative explanations. If what looks like a project effect is identified, evaluators must ask what other factors outside the scope of the project might have caused the observed outcome. Can they be ruled out?
For example, suppose it is found that the passage of a new anticorruption law whose drafting was sponsored by USAID coincides with a drop in corruption, as measured in national surveys. It would be important to think carefully about other factors that might have occurred at the same time and might also account for the drop in measured corruption. Perhaps a crusading anticorruption minister was appointed right after the new legislation was passed. Might her presence at the helm of a key ministry have caused the change? One way to test this possibility would be to see whether perceived corruption fell more in her ministry than in others, or whether it rose again after she left office; either pattern would be consistent with the argument that her appointment, not the new law, was responsible for the drop in corruption measured in the surveys. The more such competing explanations can be identified and ruled out, the more confidence there can be in the conclusion that the legislation was responsible for the positive outcome.
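The two checks described for the anticorruption minister reduce to simple comparisons. The sketch below assumes hypothetical ministry-level survey scores of perceived corruption (higher meaning more corruption) at three points in time: before the law, during her tenure, and after her departure. The first figure it returns compares the drop in her ministry with the average drop elsewhere; the second measures any rebound after she left. The ministry names, scores, and function name are illustrative only.

```python
from statistics import mean


def minister_checks(scores, minister_ministry):
    """scores maps ministry -> (before_law, during_tenure, after_departure).

    Returns (excess_drop, rebound) for the minister's ministry:
    - excess_drop: drop in her ministry minus the average drop elsewhere
    - rebound: rise in perceived corruption after she left office
    Large positive values on either measure point toward the minister,
    rather than the law, as the source of the change."""
    drops = {name: before - during for name, (before, during, _) in scores.items()}
    other_drop = mean(d for name, d in drops.items() if name != minister_ministry)
    _, during, after = scores[minister_ministry]
    return drops[minister_ministry] - other_drop, after - during


# Hypothetical survey scores; the minister headed the Ministry of Justice.
scores = {
    "Justice": (60, 50, 51),
    "Interior": (62, 52, 52),
    "Finance": (58, 48, 49),
}
print(minister_checks(scores, "Justice"))  # → (0, 1)
```

In this hypothetical case perceived corruption fell by the same amount everywhere and barely moved after her departure, a pattern that does not single out the minister. How large a difference would count as evidence against the law remains a judgment for the evaluator.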
Evaluators are in a better position to rule out alternative explanations to the extent that USAID or its implementing partners can manipulate the timing of the intervention. Even before a program begins, an effort can be made to identify other planned interventions or major events that could affect the outcome of interest and make it hard to disentangle the effect of USAID’s program from other factors. A decision could then be made to delay or speed up implementation so as to minimize the likelihood that changes over time in measured program outcomes reflect something other than USAID’s program. To make this idea more concrete, imagine an intervention designed to increase the quantity and quality of debate in a parliament. The intervention might involve a series of training sessions on parliamentary business, a change in the rules that ties salary to attendance and participation, or an accountability mechanism that reports to the public on the activities of members of parliament. Regardless of the intervention, the outcome of interest is clear: whether members attend more often and participate more actively in parliament after the project is complete. The problem is that many other factors might be responsible for an increase in attendance or participation. For example, if preparations for the budget begin soon after the program is initiated, they may drive up attendance and participation. If such factors can be anticipated and avoided when planning the timing of the intervention, stronger inferences can be drawn from temporal trends in the outcome variables.
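Once the potentially confounding events have been listed, such as the budget preparation period in the parliamentary example, screening candidate start dates against them can be done mechanically. The sketch below returns the earliest candidate start whose measurement window does not overlap any listed event. The dates, the 90-day window, and the function names are hypothetical assumptions for illustration.

```python
from datetime import date, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if the closed date intervals [start_a, end_a] and [start_b, end_b] intersect."""
    return start_a <= end_b and start_b <= end_a


def earliest_clean_start(candidates, window_days, events):
    """Return the earliest candidate start date whose measurement window
    [start, start + window_days] avoids every known event, or None."""
    for start in sorted(candidates):
        end = start + timedelta(days=window_days)
        if not any(overlaps(start, end, ev_start, ev_end) for ev_start, ev_end in events):
            return start
    return None


# Hypothetical budget-preparation period that would inflate attendance.
events = [(date(2009, 9, 1), date(2009, 11, 30))]
candidates = [date(2009, 7, 1), date(2009, 12, 15), date(2010, 1, 15)]
print(earliest_clean_start(candidates, 90, events))  # → 2009-12-15
```

The hard part remains the advance scanning that produces the list of events; the code only enforces the resulting constraint.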
A second strategy for improving causal inference in an N = 1 design is to look beyond the narrow outcome that the project was designed to affect and try to identify other outcomes that would be consistent with positive project impact. The example provided earlier from Uganda of using the success of projects targeting the disabled to verify the effectiveness of completely separate projects designed to promote the empowerment of marginalized citizens illustrates this technique. With regard to evaluating the effectiveness of the anticorruption legislation, an example of such a strategy would be to look at changes in applications for business licenses, which might be expected to rise as the requirement that applicants pay bribes diminishes. Again, the greater the number of outcomes consistent with project success that can be identified, the more confidence there can be in inferring that the project was, in fact, successful.
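Checking several outcomes against the directions that project success would predict can be organized as a simple tally. The sketch below uses hypothetical before-and-after values for perceived corruption, for the business license applications mentioned above, and for a third, invented indicator, and reports which outcomes moved as predicted. A tally of this kind strengthens an inference only if the outcomes are not all driven by the same alternative cause, which the code cannot check.

```python
def consistency_tally(expected_direction, before, after):
    """expected_direction maps outcome -> +1 (success predicts a rise)
    or -1 (success predicts a fall). Returns the outcomes that moved as
    predicted and those that did not (no change counts as inconsistent)."""
    consistent, inconsistent = [], []
    for outcome, sign in expected_direction.items():
        change = after[outcome] - before[outcome]
        if change * sign > 0:
            consistent.append(outcome)
        else:
            inconsistent.append(outcome)
    return consistent, inconsistent


# Hypothetical indicators for the anticorruption-law example.
expected = {
    "perceived_corruption": -1,
    "business_license_applications": +1,
    "reported_bribe_requests": -1,
}
before = {"perceived_corruption": 55, "business_license_applications": 1200, "reported_bribe_requests": 30}
after = {"perceived_corruption": 48, "business_license_applications": 1350, "reported_bribe_requests": 31}
print(consistency_tally(expected, before, after))
# → (['perceived_corruption', 'business_license_applications'], ['reported_bribe_requests'])
```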
Designing impact evaluations is relatively straightforward when a large number of units are available and USAID has control over where or with whom it will work, although the actual design still requires substantial skill. In principle, all that is needed to assign units to treatment or control groups is a random number generator, or even just a coin to toss. Once the project is implemented, evaluators need only compare average outcomes in the treatment and control groups and test whether the differences are statistically significant. The higher art of impact evaluation comes in situations where randomized evaluations are not possible.
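The procedure described above, random assignment followed by a comparison of group means, can be sketched in a few lines. Shuffling the units and splitting the list is used here in place of independent coin tosses so that the two groups are of equal size. Significance is assessed with a permutation test, one standard choice among several (a two-sample t-test is a common alternative). The district names, outcome values, and function names are hypothetical.

```python
import random


def assign(units, seed=None):
    """Randomly split units into treatment and control groups of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_test(treated, control, n_permutations=10_000, seed=None):
    """Difference in mean outcomes and its two-sided permutation p-value."""
    rng = random.Random(seed)
    n_t, n_c = len(treated), len(control)
    observed = sum(treated) / n_t - sum(control) / n_c
    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_permutations):
        # Re-randomize group labels as if the treatment had no effect.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c
        # Small tolerance so floating-point noise does not miss exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


districts = [f"district_{i}" for i in range(20)]
treatment, control = assign(districts, seed=1)

# Hypothetical outcome scores measured after the project.
treated_scores = [12, 14, 15, 13, 16, 14, 15, 13, 14, 15]
control_scores = [10, 11, 9, 12, 10, 11, 10, 9, 11, 10]
diff, p = permutation_test(treated_scores, control_scores, seed=2)
print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

Because the permutation test reshuffles the same labels that randomization created, it relies on no distributional assumption beyond the random assignment itself.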
Under such circumstances, identifying sound project designs requires flexibility, creativity, an understanding of the facts on the ground, and a good sense of how various design decisions affect the interpretation of program evaluations. Such evaluations are therefore difficult to design and, because the methodology must be tailored to the details of the particular project in question, difficult to specify ex ante. However, they are not impossible. As the many examples in this chapter suggest, there are opportunities to move beyond the current M&E approach to impact evaluations that provide key information for determining program effects, even in the most difficult, and quite common, situation in which only a single unit is treated. Good designs require skilled, well-trained program designers, and cultivating such designers should be a priority for USAID. Good designs also require an organization with the resources and capacity to do the work, issues discussed in Chapters 8 and 9.
CONCLUSIONS

For every DG-promoting activity that USAID undertakes, particularly those that are central to its mission or that involve the expenditure of large sums of money, USAID wants to be able to answer two questions: Was doing the activity better than doing nothing at all? If so, how much better? Although they may serve other management purposes well, the M&E designs that USAID currently requires are generally insufficient to answer these questions. Answering them requires impact evaluations, which in turn require somewhat different designs. The committee found that the vast majority of USAID staff whom it encountered were deeply committed to improving democratic governance around the world and to being able to evaluate the progress they were, or were not, making. The committee also found that many USAID staff members were frustrated by their inability to better answer the basic question: Are we having a positive impact?
The impact evaluation designs described in this report, and the examples presented in the previous two chapters, suggest that, in principle, there is considerable scope for USAID to improve its ability to answer this question. The committee would neither expect nor recommend that the agency undertake impact evaluations of all of its activities. The committee’s specific recommendation is that USAID begin with a modest and focused initiative to examine the feasibility of applying such impact evaluation designs, including those using randomized assignment, to a small number of projects.
At the same time, the committee realizes that undertaking more impact evaluations alone will not provide the broadly based and context-sensitive information that USAID needs to plan its DG programs.
Process evaluations, the kinds of case studies discussed in Chapter 4, and more informal lessons from the field obtained by DG staff, implementers, nongovernmental organizations, and independent researchers provide important insights, valuable hypotheses, and illustrations of how programs are received and how they respond to changing conditions. The committee believes that USAID needs to develop organizational characteristics that provide both incentives for more varied evaluations of its projects and mechanisms to help agency staff absorb, discuss, and continually learn from a variety of sources about the factors that affect the impact of DG programs.