Development, Security, and Cooperation. Policy and Global Affairs. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001.
It is hard to plan an evaluation (or stick to one) because mission objectives and programs change all the time. A common concern the field teams heard was that randomized evaluations are insufficiently flexible to be practical. As a political officer at the U.S. Embassy in Peru commented, the embassy is sometimes compelled to “put out fires.” For example, in an experimental evaluation of the impact of municipal-level interventions in mining towns, the embassy might have to intervene if a conflict broke out in a community. This may or may not pose an issue for causal inference.
Some “fires” may be independent of treatment assignment—that is, they may be equally likely to occur in treated units as in control units. However, other “fires” may be products of the treatment itself. They may reflect, for example, reactions to the absence of a desired treatment among control communities, which may feel left out. This raises more serious issues. Unanticipated events that require additional interventions in either treatment or control communities must be recorded so that they can be taken into account in the final evaluation. Such events may complicate interpretation of the results, but the possibility that they might arise is not in itself an argument to forgo randomized evaluations.
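The distinction can be made concrete with a small simulation. All numbers below (a true project impact of 1.0, a 0.5-unit boost from an ad hoc “fire-fighting” intervention, a 40 percent fire rate among controls) are purely hypothetical and chosen only to illustrate the point: fires that strike independently of assignment leave the estimated treatment effect unbiased, while fires concentrated in control communities bias the estimate downward.

```python
import random

random.seed(42)
TRUE_EFFECT = 1.0   # hypothetical true project impact
FIRE_BOOST = 0.5    # hypothetical effect of an ad hoc "fire-fighting" intervention

def estimated_effect(fires_track_assignment: bool, n: int = 20_000) -> float:
    """Difference in mean outcomes between treated and control units."""
    treated_ys, control_ys = [], []
    for _ in range(n):
        treated = random.random() < 0.5
        y = random.gauss(0.0, 1.0) + (TRUE_EFFECT if treated else 0.0)
        if fires_track_assignment:
            # "Fires" break out only in control communities that feel left out.
            fire = (not treated) and random.random() < 0.4
        else:
            # "Fires" strike treated and control units with equal probability.
            fire = random.random() < 0.2
        if fire:
            y += FIRE_BOOST
        (treated_ys if treated else control_ys).append(y)
    return sum(treated_ys) / len(treated_ys) - sum(control_ys) / len(control_ys)

print(round(estimated_effect(False), 2))  # close to the true effect of 1.0
print(round(estimated_effect(True), 2))   # biased toward 1.0 - 0.4 * 0.5 = 0.8
```

The first case illustrates why assignment-independent fires are tolerable; the second illustrates why fires triggered by the treatment itself must at least be recorded for the final analysis.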
In addition, missions may wish to adjust programming midstream, either by learning lessons from an early assessment of outcomes or by responding to new developments on the ground. Sometimes this is quite consistent with the purposes of a good evaluation. For example, if there is powerful evidence part way through that a project is working, USAID may wish to extend its reach into communities that were previously in the control group (medical trials are often abandoned early if there is robust
evidence of the benefits, or dangerous consequences, of a treatment). The phrase “if there is powerful evidence” is crucial here. Since the whole purpose of the randomized evaluation is to generate evidence of a project’s success or failure, there is no trade-off in abandoning or tweaking it midstream if “powerful evidence” of the project’s efficacy has already emerged. A real trade-off presents itself only if the evidence of the project’s success or failure is still tentative. In such a situation a judgment call would have to be made about the relative importance of confirming what the initial evidence seems to suggest (which would require not altering the design of the randomized evaluation) versus moving ahead with the change in course (which might have the benefit of maximizing impact but risks acting on a hunch that may turn out to be ill founded).
The more difficult issue is when, as frequently happens, unforeseen challenges arise in project implementation that USAID thinks require slight adjustments in the interventions or sometimes the replacement of implementers. Changing the treatment part of the way through the process is, of course, not ideal. As long as the adjustments are consistent across the treatment group, however, there is no threat to causal inference (although it should be kept in mind that the ultimate evaluation measures a more complicated treatment). Whatever the source of the midstream correction, responsible officials will need to remember that the benefits of continuing with the rigorous evaluation design accrue agency-wide and are not limited to the particular mission or project. So the advantages of a midcourse correction for a project or mission will need to be balanced against the potential loss of valuable evaluation information that could be usefully applied to programs in other countries.
Randomized evaluations are too complex; USAID does not have the expertise to design and oversee them. Staff both in the field and in Washington consistently raised the objection that USAID is not well equipped to design and implement, or even simply oversee, randomized evaluations. This is a valid concern. While the idea of randomized evaluation is intuitive and easy to understand, the design of high-quality randomized evaluations requires additional academic training, specialized expertise, and good instincts for research design. It is likely that many (or most) USAID DG staff do not have training in research methods and causal inference, thus making it difficult for them to evaluate the quality of proposed impact evaluations or to play a role in their design and implementation.
The committee wants to emphasize that the guidance provided in this report should not be seen as a “cookbook” of ready-made evaluation designs for DG officers. It would be a mistake for USAID to endorse the typology of evaluation designs outlined in Chapter 5 and then require DG officers to put these new designs into practice without additional training or support. Because the issue of competence and capacity is so central to the prospect of improving evaluation in USAID DG programs, Chapter 9 is dedicated to recommendations about how USAID could make the necessary investments and provide appropriate incentives to encourage the use of impact evaluations of its projects where appropriate and feasible.
It will cost too much to conduct randomized evaluations. Perhaps the most important objection the committee encountered in the field is that randomized evaluation will cost too much. In part, this is a question of USAID’s priorities. If the agency is committed to knowing whether important projects achieve an impact, it will need to commit the necessary resources to the task. But aside from whether the agency commits to higher quality evaluations, it is legitimate to ask how much more randomized evaluation will cost than the procedures currently employed.
The committee’s field teams were tasked with some detective work in an effort to answer this question. As discussed in Chapter 8, the committee discovered that USAID could not provide concrete information about how much it spends on monitoring and evaluation (M&E) every year, even for a subsample of DG programs. The committee therefore encouraged the field teams to explore the cost of current approaches by reviewing project documents and through discussions with mission staff. They, too, encountered insurmountable obstacles; project documents almost never provided line items for M&E, and what was reported was not consistent from one project to another. Based on interviews with implementers, the field teams reported that nontrivial amounts of time were dedicated to the collection of output and outcome indicators and the monitoring of performance, but no team could arrive at any hard numbers on current expenditures. The committee thus cannot answer the question of how much more it will cost to introduce baseline measures, data collection for comparison groups, or random assignment, relative to current expenditures on M&E. At best it can be said that in a number of the cases the field teams examined, substantial improvements in all of these areas could seemingly be obtained for little or no additional cost, while in other cases the costs could be substantial. Much depends on whether data are collected from third parties or local governments rather than generated by surveys or other primary data collection by implementers, on whether surveys are already in use for the project or would need to be developed specifically for it, and on the specific outcomes that must be measured in the treatment and control groups.
As noted, in some cases—such as reducing the initial number of units treated in order to preserve a control group—an impact evaluation could
actually save money compared to providing all groups with assistance immediately, before the effects of the project have been tested.
But how much will a randomized evaluation cost? Answering this question requires two different calculations. The first is the straightforward calculation of how much more it will cost to collect the necessary data. This will depend on the number of control and treatment units required for a useful random assignment; the more subtle the expected effects, the larger the number of units that will be required, with a corresponding increase in the cost of data collection. The factor to keep in mind is that, even if data collection is more costly in a randomized evaluation design, the potential benefit is that it would put USAID in a position to assess the impact of the project with much more confidence and to detect subtle improvements that might not be visible without a randomized design.
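The link between the subtlety of the expected effect and the number of units required can be sketched with the standard normal-approximation formula for a two-arm comparison of means. The effect sizes and the 5 percent significance / 80 percent power settings below are conventional illustrative defaults, not figures drawn from USAID programs:

```python
import math
from statistics import NormalDist

def units_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of units needed in each arm to detect a given
    standardized effect (difference in means divided by the outcome's
    standard deviation), using n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(n)

# A subtle effect demands far more units (and more data collection) than a large one:
print(units_per_arm(0.2))  # 393 units per arm for a 0.2 sd effect
print(units_per_arm(0.8))  # 25 units per arm for a 0.8 sd effect
```

Because the required sample grows with the inverse square of the detectable effect, halving the effect a design must detect roughly quadruples the number of units, which is why subtler expected impacts drive up data-collection costs.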
The second, much trickier, calculation lies in assessing (1) the cost of selecting units at random, which may entail not implementing project activities in units where USAID has reason to believe the project would have a large positive impact, and/or (2) the cost of implementing project activities in units where USAID has reason to believe the project will fail. Here the cost is less a direct expense than an opportunity cost. Again, these costs must be weighed against the potential benefit of being able to conclude whether or not the project worked. Note, however, that this opportunity cost (of directing program funds to places where staff are convinced the project will not work, or away from places where staff are convinced it will) is greater the more confident staff are that they can accurately predict where the project will succeed and where it will not. If it is already known whether (or where) a project will work, then a randomized evaluation is not needed to answer that question. The real peril lies in wrongly believing that the consequences of a program are known, and allocating resources on that basis, when the hypotheses behind the program have never been tested by impact evaluations.
CONCLUSIONS

The committee’s consultants believed they had demonstrated that at least some of the types of projects USAID is now undertaking could be subject to the most powerful impact evaluation designs—large-N randomized evaluations—within the normal parameters of the project design.
For a majority of committee members, this provided a “proof of concept” that the designs would also be feasible in the sense that they would work in practice as well as in theory. However, one committee member with experience in actually managing DG programs remained skeptical as to whether the complexity and dynamic nature of DG programming would allow random assignment evaluation designs to be implemented successfully. The committee also notes that doing random assignment evaluations in the highly politicized field of democracy assistance will likely be controversial. Chapter 9 therefore recommends, as part of a broader effort to improve evaluation and learning regarding DG programs at USAID, that USAID begin with a limited but high-visibility initiative to test the feasibility and value of applying impact evaluation methods to a select number of its DG projects.