Additional Impact Evaluation Designs and Essential Tools for Better Project Evaluations
The committee recognizes, however, that randomized designs are not always possible, whether because of the costs, complexity, timing, or other details of the DG project, and that alternatives therefore need to be considered.
Thus this chapter focuses on other methods of impact evaluation for those cases where randomization is not feasible. Examples are given of ways that USAID can develop sound impact evaluations simply by giving more attention to baseline, outcome, and comparison group measurements. The chapter begins by addressing two questions regarding choices between the use of randomized designs or the other (comparison-based) impact evaluation designs described in Chapter 5. First, how many of USAID's current projects appear suitable for randomized impact designs? Second, when projects are not suitable for randomized evaluations, what options are available and how should the other methods described in Chapter 5 be chosen and applied?
HOW OFTEN ARE RANDOMIZED EVALUATIONS FEASIBLE?

To help answer this question, project staff collected information about the DG activities that the USAID mission in Uganda had undertaken in recent years (see Appendix E for a list of these projects as well as those in Albania and Peru). The projects in Uganda included efforts designed to provide support for the Ugandan Parliament, strengthen political pluralism and the electoral process, and promote political participation—a fairly typical roster of projects and one that parallels those implemented by USAID missions in many countries. A team member then divided these projects into 10 major activities and scored them for (1) amenability of each activity to randomized impact evaluation and (2) where randomized evaluation was not deemed possible, the benefits of adding other impact evaluation techniques (better baseline, outcome, or comparison group measures) to existing monitoring and evaluation (M&E) designs.1

In doing so, the committee recognizes that current USAID project monitoring plans are largely designed to track an implementer's progress in achieving agreed-upon outputs and outcomes. Our approach, therefore, is not to assess the quality of current monitoring plans but rather to assess and illustrate instances where additional information that could reveal the impact of DG projects is currently not being collected but could readily be acquired.
The first finding of the analysis was that all 10 of the activities examined used M&E plans that omitted collection of crucial information that would be needed if USAID sought to make impact evaluations of those activities.2 The committee does not mean to criticize current M&E plans, which focus on acquiring important information for program management and resource allocation. The committee wants to draw attention to the marked difference between the content of the currently mandated and universal M&E components of most DG projects and the information that would need to be acquired to conduct a sound and credible impact evaluation of project effects. The latter is a different task and, as noted, may require different expertise in designs for project implementation and data collection than are currently part of USAID's routine activities.

1 This section is based on the work of Mame-Fatou Diagne, University of California, Berkeley.
2 See Chapter 2 for a discussion of the difference between current USAID project M&E
For example, unless collection of data from a nontreatment comparison group is an explicit part of the project design, there is no need to monitor whether contractors are collecting such data, and it will not normally be part of M&E activities. But without such data (including good baseline data) and a set of policy-relevant outcome measures, a project’s actual effects, as opposed to the accomplishment of project tasks, such as the number of judges trained or improved municipal accounting systems established, cannot be determined.
On a scale of 1 to 10, with 10 being the most complete and credible plan for collecting data for impact evaluation, 9 of the 10 activities received a score of 1 and one received a score of 2. Again, this underlines the difference between the character of currently mandated M&E designs and impact evaluation designs. Nonetheless, on the positive side, 5 of the 10 activities were found to be, in principle, amenable to using randomized evaluation designs to determine project impacts; 4 other activities were found to be amenable to collection of baseline or nonrandom comparison group data that would significantly improve USAID’s ability to know whether or not the activity in question had a positive impact. Seven of the 10 activities were found to be amenable to changes in how outcomes were measured that by themselves would markedly strengthen the monitoring they were already doing.3 The measurement changes alone were judged to be capable of bringing the average ability to provide inferences about project outcomes from 1 to 3 on the 10-point scale, while the shift to collecting data for impact evaluation designs was found to be capable of raising the average score for making sound inferences of project effects to over 7. These are dramatic changes, and they underscore the team’s conclusions about the large potential for USAID to more accurately and credibly assess the effects of its DG projects by adding efforts to collect impact evaluation data to its M&E designs, in at least this subset of its ongoing projects.
While the scoring of these monitoring efforts is necessarily subjective and the ability to generalize from the efforts being implemented by a single mission is obviously limited, analysis of the Uganda mission's DG activities nonetheless offers some useful lessons. First, it suggests that a number of avenues to improve knowledge of project effects are possible, ranging from simple changes in how outcomes are measured to more substantial yet feasible changes in evaluation design. Second, it suggests an answer to the question posed earlier about the frequency with which randomized evaluations are likely to be feasible. In Uganda at least, randomized evaluation was judged to be a feasible evaluation design strategy for 5 of the 10 activities being undertaken, and an additional 4 out of the 10 were judged to be amenable to nonrandomized yet systematic baseline/control group designs. In effect, then, 9 out of 10 programs in Uganda could have potentially benefited from the approaches presented in this report. This is a much larger share than is commonly assumed by the USAID staff with whom the committee consulted in the course of its investigations.

3 The team's conversations with both the mission and the implementers in Uganda in
Critics are right that randomization is often not possible, however, and the team judged that for evaluating the impact of one-half of the activities examined, only other forms of evaluation designs (i.e., the large N nonrandom comparison or small N and single-case comparisons discussed in Chapter 5) would be feasible. Yet the team's finding that one-half of the DG activities it examined were amenable to randomized design is a higher proportion than most critics would expect. This would indicate that claims that randomized impact evaluations are only "rarely" or "hardly ever" possible may be too pessimistic. Perhaps even more important, fully 9 out of 10 of these activities were found to be suitable for some form of the impact evaluation designs described in Chapter 5.
Given that none of these activities in Uganda are currently collecting the kind of information needed for such impact evaluations, but 9 out of 10 could potentially do so, USAID appears to have a great deal of choice and flexibility in deciding how much, and whether, to increase the number of programs and the amount of information it collects to determine the effects of its DG activities.
As noted in Chapter 5, randomized evaluations require that there be a very large number of units across which the projects in question might, at least in principle, be implemented, as well as that program designers be able to choose these units randomly. Many high-priority USAID DG projects—for example, those that focus on strengthening individual ministries, professional associations, or institutions; those that support the creation of vital new legislation or constitutions; or those that build capacity to achieve national-level goals such as more effective election administration—do not meet these criteria. Such projects are critical to achieving the larger goal of improving democratic governance. Precisely because they are important, improving USAID’s ability to evaluate the impact of the millions of dollars that it spends each year on implementing such projects should be accorded a high priority.
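The statistical logic behind this requirement can be sketched briefly. The following is a minimal illustration only, with entirely hypothetical units, outcomes, and effect sizes (none drawn from any actual USAID project): when many units are randomly assigned to treatment and comparison groups, the difference in mean outcomes between the two groups estimates the project's impact, because randomization balances baseline differences on average.

```python
import random
import statistics

def randomized_impact_estimate(units, outcome, seed=0):
    """Randomly assign units to treatment and comparison groups,
    then estimate impact as the difference in mean outcomes.

    `outcome(unit, treated)` stands in for the outcome measured
    for a unit after the project period.
    """
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)          # random assignment
    half = len(shuffled) // 2
    treated, comparison = shuffled[:half], shuffled[half:]
    treated_mean = statistics.mean(outcome(u, True) for u in treated)
    comparison_mean = statistics.mean(outcome(u, False) for u in comparison)
    return treated_mean - comparison_mean

# Hypothetical example: 100 units (say, municipalities) whose baseline
# scores on some governance index vary, and a "true" project effect of
# +5 points for treated units.
units = list(range(100))

def outcome(unit, treated):
    rng = random.Random(unit)              # unit-specific baseline noise
    baseline = 50 + rng.gauss(0, 2)
    return baseline + (5 if treated else 0)

estimate = randomized_impact_estimate(units, outcome, seed=42)
print(round(estimate, 1))  # close to the true effect of 5
```

With only a handful of units (a single ministry or parliament), no such averaging is possible, which is why the projects listed above fall outside the randomized approach.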
The next section addresses the question of how to carry out impact evaluations in situations of this kind. First, the general issue is discussed; then the other evaluation techniques highlighted in Chapter 5 are taken up, with specific examples from the field. Finally, the discussion