Using participatory evaluations to determine how much a DG activity contributed to democratic progress, or even to more modest and specific goals such as reducing corruption or increasing legislative competence, can pose problems. Participants’ views of a project’s value may rest on their individual perceptions of personal rewards. This may bias their perception of how much the program has actually changed, as they may be inclined to overestimate the impact of an activity if they benefited from it personally and hope to have it repeated or extended. Thus participatory evaluations should be combined with collection of data on additional indicators of project outcomes to provide a full understanding of project impacts.
Another type of evaluation is an output evaluation (generally equivalent to “project monitoring” within USAID). These evaluations consist of efforts to document the degree to which a program has achieved certain targets in its activities. Targets may include spending specific sums on various activities, giving financial support or training to a certain number of nongovernmental organizations (NGOs) or media outlets, training a certain number of judges or legislators, or carrying out activities involving a certain number of villagers or citizens. Output evaluations or monitoring are important for ensuring that activities are carried out as planned and that money is spent for the intended purposes. USAID thus currently spends a great deal of effort on such monitoring, and under the new “F Process,” missions report large numbers of output measures to USAID headquarters (more on this below).
Finally, impact evaluation is the term generally used for those evaluations that aim to establish, with maximum credibility, the effects of policy interventions relative to what would be observed in the absence of such interventions. These require the three parts noted above: collection of baseline data; collection of appropriate outcome data; and collection of the same data for comparable individuals, groups, or communities that, whether by assignment or for other reasons, did and did not receive the intervention.
The most credible and accurate form of impact evaluation uses randomized assignments to create a comparison group; where feasible this is the best procedure to gain knowledge regarding the effects of assistance projects. However, a number of additional designs for impact evaluations exist, and while they offer somewhat less confidence in inferences about program effects than randomized designs, they have the virtue of being applicable in conditions when randomization cannot be applied (e.g., when aid goes to a single group or institution or to a small number of units where the donor has little or no control over selecting who will receive assistance).
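The logic of a randomized impact evaluation described above can be sketched in a few lines of code. The following is an illustrative simulation, not anything from the report: units are randomly assigned to treatment or control, so the simple difference in mean outcomes estimates the program's average effect. All numbers (the baseline score of 50, the true effect of +2.0) are invented for demonstration.

```python
import random
import statistics

random.seed(42)

def simulate_outcome(treated: bool) -> float:
    """A hypothetical outcome score: baseline plus noise,
    with treated units receiving a true +2.0 program effect."""
    baseline = random.gauss(50.0, 5.0)
    return baseline + (2.0 if treated else 0.0)

# Random assignment: shuffle the units, then split them into
# a treatment group and a control group of equal size.
units = list(range(1000))
random.shuffle(units)
treatment_group = units[:500]
control_group = units[500:]

treat_outcomes = [simulate_outcome(True) for _ in treatment_group]
ctrl_outcomes = [simulate_outcome(False) for _ in control_group]

# Because assignment was random, the groups are comparable on
# average, and the mean difference estimates the program effect.
effect = statistics.mean(treat_outcomes) - statistics.mean(ctrl_outcomes)
print(f"Estimated program effect: {effect:.2f}")
```

The estimate lands close to the true +2.0 effect precisely because randomization makes the two groups alike in every respect except the program, which is the point the text above makes about credibility.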
Impact evaluations pose challenges to design, requiring skill and not merely science to identify and collect data from an appropriate comparison group and match the best possible design to the conditions of the particular assistance program. The need for baseline data on both the group receiving the policy intervention and the comparison group usually means that the evaluation procedures must be designed before the project is begun and carried out as the project itself is implemented. Finally, the need to collect baseline data and comparison group data may increase the costs of evaluation.
For these reasons, among others, impact evaluations of DG programs are at present the most rarely carried out of the various kinds of evaluations described here. Indeed, many individuals throughout the community of democracy assistance donors and scholars have doubts about the feasibility and utility of conducting rigorous impact evaluations of DG projects. Within the committee, Larry Garber has strongly expressed concerns in this regard, and the committee as a whole has given a great deal of attention to these worries. However, as discussed in Chapters 6 and 7, there are a number of practical ways to deal with these issues, and these were explored in the field by the committee’s consultants in partnership with several missions. In addition, a good evaluation design is not necessarily more expensive or time-consuming than routine monitoring or a detailed process evaluation.
The differences among these distinct kinds of evaluations are often obscured by the way in which the term “evaluation” is used in DG and foreign assistance discussions. “Evaluation” is often used to imply any estimate or appraisal of the effects of donor activities, ranging from detailed counts of participants in specific programs to efforts to model the aggregate impact of all DG activities in a country on that country’s overall level of democracy. This catch-all use of the term “evaluation” undermines consideration of whether there is a proper balance among various kinds of evaluations, how various types of evaluations are being used, and whether specific types of evaluations are being done or are
needed. As another CGD report notes:
Part of the difficulty in debating the evaluation function in donor institutions is that a number of different tasks are implicitly simultaneously assigned to evaluation: building knowledge on processes and situations in receiving countries, promoting and monitoring quality, informing judgment on performance, and, increasingly, measuring actual impacts.
Agencies still need their own evaluation teams, as important knowledge providers from their own perspective and as contributors to quality management. But these teams provide little insight into our actual impacts and, although crucial, their contribution to knowledge essentially focuses on a better understanding of operational constraints and local institutional and social contexts. All these dimensions of evaluations are complementary. For effectiveness and efficiency reasons, they should be carefully identified and organized separately: some need to be conducted in house, some outside in a cooperative, peer review, or independent manner. In short, evaluation units are supposed to kill all these birds with one stone, while all of them deserve specific approaches and methods. (Jacquet 2006)

Efforts to Improve Assessments and Evaluations by Donor Agencies

There are encouraging signs of efforts to put greater emphasis on impact evaluations for improving democracy and governance programs.
The basic questions motivating USAID’s Strategic and Operational Research Agenda (SORA) project are also motivating other international assistance agencies and organizations. The desire to understand “what works and what doesn’t and why” in an effort to make more effective policy decisions and to be more accountable to taxpayers and stakeholders has led a host of agencies to consider new ways to determine the effects of foreign assistance projects.
This focus on impact evaluations in particular has increased since the creation of the Millennium Challenge Corporation (MCC) and the 2005 Paris Declaration on Aid Effectiveness. Yet while there is wide agreement that donors need more knowledge of the effects of their assistance projects, and there are increased efforts to coordinate and harmonize the approaches and criteria employed in pursuit of that knowledge, donors are far from consensus on how best to answer the fundamental questions at issue.
As the Organization for Economic Cooperation and Development (OECD) has stated:
There is strong interest among donors, NGOs and research institutions in deepening understanding of the political and institutional factors that shape development outcomes. All donors are feeling their way on how to proceed. (OECD 2005:1)

Several donors have focused on the first question posed above, the question of where to intervene in the process of democratization to help further that process. In the committee's view this is a question that the current state of knowledge on democratic development cannot answer. It is an essential question, however, and Chapters 3 and 4 suggest specific research programs that might help bring us closer to answers. These issues are more a matter of strategic assessment of a country's condition and potential for democratic development, rather than evaluation, a term the committee thinks is better reserved for studying the effects of specific DG programs. Nonetheless, several national development assistance agencies have, under the general rubric of improving evaluation, sought to improve their strategic assessment tools. What all of the following donor programs have in common is an increased effort at acquiring and disseminating knowledge about how development aid works in varied contexts.
The broad range of current efforts by national and international assistance agencies to revise and improve evaluation procedures, described below, is aimed at better understanding the fundamental question of interest to all: "what works and what doesn't and why." At present, however, only some of these efforts involve the use of impact evaluations.
Perhaps the most visible leader in efforts to increase the use of impact evaluations is MCC, which has set a high standard for the integration of impact evaluation principles into the design of programs at the earliest
stages and for the effective use of baseline data and control groups:
There are several methods for conducting impact evaluations, with the use of random assignment to create treatment and control groups producing the most rigorous results. Using random assignment, the control group will have—on average—the same characteristics as the treatment group. Thus, the only difference between the two groups is the program, which allows evaluators to measure program impact and attribute the results to the MCC program. For this reason, random assignment is a preferred impact evaluation methodology. Because random assignment is not always feasible, MCC may also use other methods that try to estimate results using a credible comparison group, such as double difference, regression discontinuity, propensity score matching, or other type of regression analysis. (MCC 2007:19)

The World Bank has also embarked on the use of impact evaluations for aid programs through its Development Impact Evaluation (DIME) project. Many of the DIME studies involve randomized-experimental evaluations; moreover, "rather than drawing policy conclusions from one-time experiments, DIME evaluates portfolios of similar programs in multiple countries to allow more robust assessments of what works" (Banerjee 2007:30).2

A major symposium on economic development aid also recently explored the pros and cons of conducting impact evaluations of specific programs (Banerjee 2007). While there were numerous objections to the unrestrained use of such methods (which are explored in more detail in Chapters 6 and 7 below), many eminent contributors urged that foreign aid cannot become more effective if we are unwilling to subject our assumptions about how well various assistance programs work to credible tests. The lead author argued that ignorance of general principles to guide successful economic development (a situation that applies as much or more to our knowledge of democratization) is a powerful reason to take the more humble step of simply trying to determine which aid projects in fact work best in attaining their specific goals.

2 The CGD has also created the International Initiative for Impact Evaluation to encourage greater use of impact evaluations.
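Of the non-randomized methods MCC names, the "double difference" (difference-in-differences) estimator is the simplest to illustrate. The sketch below is a hypothetical worked example, not drawn from any MCC program: the before-to-after change in a comparison group stands in for what would have happened to the treated group without the program, and subtracting the two changes yields the effect estimate. All figures are invented.

```python
# Mean outcomes (e.g., a hypothetical governance score) measured
# before and after a program, for treated and comparison groups.
treated_before, treated_after = 40.0, 48.0
comparison_before, comparison_after = 41.0, 44.0

# First differences: each group's change over time.
treated_change = treated_after - treated_before            # 8.0
comparison_change = comparison_after - comparison_before   # 3.0

# Second difference: the comparison group's change proxies for the
# trend the treated group would have followed anyway, so the
# difference of the two differences estimates the program effect.
did_estimate = treated_change - comparison_change
print(f"Double-difference effect estimate: {did_estimate:.1f}")  # 5.0
```

The design's key assumption is that both groups would have followed parallel trends absent the program; when that holds, baseline data on both groups (as the report stresses above) is what makes the estimate credible.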
The Department for International Development (DfID) of the United Kingdom has developed the “Drivers of Change” approach because “donors are good at identifying what needs to be done to improve the lives of the poor in developing countries. But they are not always clear about how to make this happen most effectively” (DfID 2004:1). By focusing on the incorporation of “underlying political systems and the mechanics of pro-poor change... in particular the role of institutions—both formal and informal” into their analysis, this approach attempts to uncover more clearly what fosters change and reduces poverty. This approach is currently being widely applied to multiple development contexts and is being taught to numerous DfID country offices (OECD 2005:1).