Development, Security, and Cooperation; Policy and Global Affairs. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001.
State of California. 2007. Legislative Analyst's Office (LAO), Analysis of the 2007-2008 Budget Bill, Transportation Chapter. Available at: http://www.lao.ca.gov/analysis_00/transportation/trans_anl0.pdf. Accessed on August 18, 2007.
Tilly, C. 2004. Contention and Democracy in Europe, 1650-2000. Cambridge: Cambridge University Press.
Tilly, C. 2007. Democracy. Cambridge: Cambridge University Press.
de Tocqueville, A. 1969. Democracy in America. 2 vols. Garden City, NY: Anchor Books.
USAID (U.S. Agency for International Development). 2006. USAID Primer: What We Do and How We Do It. Washington, DC: USAID, p. 31. Available at: http://www.usaid.gov/about_usaid/PDACG00.pdf. Accessed on August 7, 2007.
USAID (U.S. Agency for International Development). 2007. Our Work: Democracy and Governance: Technical Areas. Available at: http://www.usaid.gov/our_work/democracy_and_governance/technical_areas. Accessed on July 10, 2007.
U.S. Department of State, Bureau of Public Affairs. 2007. Provincial Reconstruction Teams: Building Iraqi Capacity and Accelerating the Transition to Iraqi Self-Reliance. Available at: http://www.state.gov/documents/organization/.pdf.
White, H. 2006. Impact Evaluation—The Experience of the Independent Evaluation Group of the World Bank. Washington, DC: World Bank.
White, H. 2007. Technical Rigor Must Not Take Precedence Over Other Kinds of Valuable Lessons. Pp. 81-89 in Making Aid Work. A.V. Banerjee, ed. Cambridge, MA: MIT Press.
Wholey, J.S., Hatry, H.P., and Newcomer, K.E. 2004. Handbook of Practical Program Evaluation, 2nd ed. San Francisco: Jossey-Bass.
de Zeeuw, J., and Kumar, K. 2006. Promoting Democracy in Postconflict Societies. Boulder: Lynne Rienner Publishers.
Evaluation in USAID DG Programs:
Current Practices and Problems
To make decisions about the best ways to assist the spread of democracy and governance (DG), the U.S. Agency for International Development (USAID) must address at least two broad questions:
1. Where to intervene. In what countries and in what sectors within countries? Selecting the targets for DG programming requires a theory, or at least a hypothesis, about the relationships among different institutions and processes and how they contribute to shaping overall trajectories toward democracy and governance. It also requires strategic assessment, that is, the ability to identify the current quality of democratic institutions and processes in various countries and set reasonable goals for their future development.
2. How to intervene. Which DG projects will work best in a given country under current conditions? Learning how well various projects work in specific conditions requires well-designed impact evaluations that can determine how much specific activities contribute to desired outcomes in those conditions.
The two questions are clearly connected. To decide where to intervene (Question 1), one wants to know which interventions can work (Question 2) in the conditions facing particular countries. Indeed, in the current state of scientific knowledge, answers to Question 2 may provide the most helpful guidance to answering Question 1.
This chapter therefore focuses on USAID's policies and practices for monitoring and evaluation (M&E) of its DG projects. To provide context, we begin with a brief description of the current state of evaluations of development assistance programs in general. Then existing USAID assessment, monitoring, and evaluation practices for DG programs are described. Since such programs are called into existence and bounded by U.S. laws and policies, the key laws and policies that shape current USAID DG assessment and evaluation practices are examined, to lay the foundation for the changes recommended later in the report. The chapter concludes with a discussion of three key problems that USAID encounters in its efforts to decide where and how to intervene.
CURRENT EVALUATION PRACTICES IN DEVELOPMENT ASSISTANCE: GENERAL OBSERVATIONS

As Chapter 5 discusses in detail, there is a widely recognized set of practices for making sound and credible determinations of how well specific programs have worked in a particular place and time (see, e.g., Shadish et al. 2001; Wholey et al. 2004). The goal of these practices is to determine not merely what happened following a given assistance program, but how much what happened differs from what would be observed in the absence of that program. The final phrase is critical, because many factors other than the given policy intervention—including ongoing long-term trends and influences from other sources—are generally involved in shaping observed outcomes. Without attention to these other factors and some attempt to account for their impact, it is easy to be misled about how much an aid program is really contributing to an observed outcome, whether positive or negative.
The practices used to make this determination generally have three parts: (1) collection of baseline data before a program begins, to determine the starting point of the individuals, groups, or communities who will be receiving assistance; (2) collection of data on the relevant desired outcome indicators, to determine conditions after the program has begun or operated for a certain time; and (3) collection of these same "before and after" data for a comparison set of appropriately selected or assigned individuals, groups, or communities that will not receive assistance, to estimate what would have happened in the absence of such aid.[1]

[1] The ideal comparison group is achieved by random assignment, and if full randomization is achieved, a "before" measurement may not be required, as randomization effectively sets the control and intervention groups at the same starting point. However, both because randomization is often not achievable, requiring the use of matched or baseline-adjusted comparison groups, and because baseline data collection itself often yields valuable information about the conditions that policymakers desire to change, we generally keep to the three-part model of sound evaluation design.
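The counterfactual logic behind this three-part design can be illustrated with a small numerical sketch. The figures below are invented purely for illustration (they do not come from any USAID evaluation): a hypothetical governance-quality score is measured before and after a program in both assisted and comparison communities, and the comparison group's change is used to net out background trends.

```python
# Illustrative sketch of the three-part evaluation design.
# All scores are hypothetical, invented for demonstration only.

treated_before = [52, 48, 55, 50]   # assisted communities, baseline
treated_after  = [60, 58, 63, 59]   # assisted communities, follow-up
control_before = [51, 49, 54, 50]   # comparison communities, baseline
control_after  = [56, 54, 58, 55]   # comparison communities, follow-up

def mean(xs):
    return sum(xs) / len(xs)

# Naive "before vs. after" change among recipients only:
naive_change = mean(treated_after) - mean(treated_before)

# Change in the comparison group captures background trends
# (economic growth, national reforms, and so on):
background_change = mean(control_after) - mean(control_before)

# Difference-in-differences: the program effect net of the background trend.
program_effect = naive_change - background_change

print(f"Naive before/after change:           {naive_change:.2f}")
print(f"Background trend (comparison group): {background_change:.2f}")
print(f"Estimated program effect:            {program_effect:.2f}")
```

In this invented example the naive before/after comparison (8.75 points) would overstate the program's contribution, because the comparison communities also improved (4.75 points); the estimate attributable to the program is the difference, 4 points. This is precisely why part (3) of the design matters.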
Wide recognition of these practices for determining project impacts does not mean that they are widely or consistently applied, however.
Nor does it mean that policy professionals or evaluation specialists agree that the three elements are feasible or appropriate in all circumstances, especially for highly diverse and politically sensitive programs such as democracy assistance or other social programs. Thus, while some areas of development assistance, such as public health, have a long history of using impact evaluation designs to assess whether policy interventions have their intended impact, social programs are generally much less likely to employ such methods.
In 2006 the Center for Global Development (CGD), a think tank devoted to improving the effectiveness of foreign assistance in reducing global poverty and inequality, released the report of an "Evaluation Gap Working Group" convened to focus on the problem of improving evaluations in development projects. Their report concludes:
Successful programs to improve health, literacy and learning, and household economic conditions are an essential part of global progress. Yet... it is deeply disappointing to recognize that we know relatively little about the net impact of most of these social programs.... [This is because] governments, official donors, and other funders do not demand or produce enough impact evaluations and because those that are conducted are often methodologically flawed.
Too few impact evaluations are being carried out. Documentation shows that UN agencies, multilateral development banks, and developing country governments spend substantial sums on evaluations that are useful for monitoring and operational assessments, but do not put sufficient resources into the kinds of studies needed to judge which interventions work under given conditions, what difference they make, and at what cost. (Savedoff et al. 2006:1-2)

Although not a focus of the CGD analysis, democracy assistance reflects this general weakness. As a recent survey of evaluations in democracy programming noted: "Lagging behind our programming, however, is research focusing on the impact of our assistance, knowledge of what types of programming is (most) effective, and how programming design and effectiveness vary with differing conditions" (Green and Kohl 2007:152). The Canadian House of Commons recently investigated Canada's DG programs and came to similar conclusions:
[W]eaknesses... have been identified in evaluating the effectiveness of Canada's existing democracy assistance funding.... Canada should invest more in practical knowledge generation and research on effective democratic development assistance. (House of Commons 2007)

As discussed in more detail below, there are many reasons why DG projects—and social development programs more generally—are not routinely subject to the highest standards of impact evaluation. One reason is that "evaluation" is a broad concept, of which impact evaluations are but one type (see, e.g., World Bank 2004). On more than one occasion committee members found themselves talking past USAID staff and implementers because they lacked a shared vocabulary and understanding of what was meant by "evaluation."

Diverse Types of Evaluations

Because the term "evaluation" is used so broadly, it may be useful to review the various types of evaluations that may be undertaken to review aid projects.
The type of evaluation most commonly called for in current USAID procedures is the process evaluation. In these evaluations investigators are chosen after the project has been implemented and spend several weeks visiting the program site to study how the project was implemented, how people reacted, and what outcomes can be observed. Such an evaluation often provides vital information to DG missions, such as whether there were problems with carrying out program plans due to unexpected obstacles, "spoilers," unanticipated events, or other actors who became involved. Process evaluations are the primary source of "lessons learned" and "best practices" intended to inform and assist project managers and implementers. They may reveal factors about the context that were not originally taken into account but that turned out to be vital for program success.
Process evaluations focus on “how” and “why” a program unfolded in a particular fashion, and if there were problems, why things did not go as originally planned.
However, such evaluations have a difficult time determining precisely how much any observed changes in key outcomes can be attributed to a foreign assistance project. This is because they often are unable to re-create appropriate baseline data if such data were not gathered before the program started and because they generally do not collect data on appropriate comparison groups, focusing instead on how a given DG project was carried out for its intended participants.
A second type of evaluation is participatory evaluation. In these evaluations the individuals, groups, or communities who will receive assistance are involved in the development of project goals, and investigators interview or survey participants after a project has been carried out to determine how valuable the activity was to them and whether they were satisfied with the project's results. Participatory evaluation is an increasingly important part of both process and impact evaluations. In regard to all evaluations, aid agencies have come to recognize that input from participants is vital in defining project goals and understanding what constitutes success for activities that are intended to affect them. This focus on building relationships and engaging people as a project goal means this type of evaluation may also be considered part of regular project activity and not just a tool to assess its effects.