As the committee uses the term, what distinguishes an impact evaluation is the effort to determine what would have happened in the absence of the project by using comparison or control groups, or random assignment of assistance across groups or individuals, to provide a reference against which to assess the observed outcomes for groups or individuals who received assistance. Randomized designs offer the most accuracy and credibility in determining program impacts and therefore should be the first choice, where feasible, for impact evaluation designs. However, such designs are not always feasible or appropriate, and a number of other designs also provide useful information to determine the impact of many different kinds of assistance projects. For example, when there is only one group or institution receiving assistance, comparisons may be made across time by using a set of carefully timed measures before and after the project while controlling statistically for long-term trends or key events. Impact evaluations are designed according to standard protocols of evaluation research; yet the choice of a particular design and decisions about how to adapt the design to a particular project require skilled craftsmanship as much as science.
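The logic of assessing outcomes against a reference group can be illustrated with a minimal sketch in Python. It is an illustration only, not a method prescribed by the committee: the function name `difference_in_differences` and all figures are hypothetical, and the calculation assumes that, in the absence of the project, the assisted and comparison groups would have followed the same trend.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              comparison_before, comparison_after):
    """Hypothetical helper: change in the assisted group minus the
    change in the comparison group over the same period."""
    treated_change = mean(treated_after) - mean(treated_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return treated_change - comparison_change


# Made-up outcome scores, for illustration only (not data from the report).
estimate = difference_in_differences(
    treated_before=[40, 42, 44],
    treated_after=[52, 54, 56],
    comparison_before=[41, 43, 45],
    comparison_after=[45, 47, 49],
)
print(estimate)
```

Subtracting the comparison group's change removes whatever shift affected both groups alike, which is why a simple before-and-after measure on recipients alone is not enough to attribute an outcome to the project.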

Current Approaches to Evaluation in USAID

The committee’s review of current approaches to the evaluation of development assistance in general, and USAID DG programs in particular, found that:

• Very few of the evaluations undertaken by international or multilateral development and democracy donors are designed as impact evaluations. There are signs that this is changing as some donors and international agencies are beginning to implement new approaches to evaluation. The Millennium Challenge Corporation and the World Bank in particular have undertaken efforts to increase the use of randomized designs in evaluations of their economic assistance and anticorruption projects. A few NGOs also have undertaken randomized impact evaluations of their democracy assistance efforts.

• Within USAID the number of evaluations has declined for all types of assistance programs. The evaluations undertaken for DG programs generally focus on implementation and management concerns and have not collected the data needed for sound impact evaluations. For example, most past evaluations of DG projects have not made comparable baseline and postproject data measurements on key outcomes, and almost all past evaluations lacked data on comparison groups that did not receive assistance. This makes it nearly impossible to develop a retrospective analysis from the data in those evaluations to accurately determine the effects of DG programs.

• There is a tendency, at one and the same time, to evaluate democracy projects mainly in terms of very proximate outcome measures that mainly assess how well the project was implemented and yet to judge the ultimate success of DG projects by whether they coincide with changes in country-level measures of national democracy such as Freedom House scores. Neither course best serves USAID’s interests in determining the effects of its DG programs. Those effects are best judged by focusing on policy-relevant objectives at the local or sectoral level that are plausible outcomes of those projects.

• Once research and evaluation are completed, there are few organizational mechanisms for broad discussion among DG officers or for integration of research and evaluation findings with the large range of analysis being carried on outside the agency.

• DG officials are genuinely interested in procedures that will help them better learn and demonstrate the impact of their projects. Yet there is considerable concern among many at USAID regarding whether missions would gain from designing or implementing rigorous impact evaluations, especially those using randomized assignments. This is mostly due to deep skepticism as to the applicability of this methodology to DG programs but also to the overall decline in support for evaluations within USAID, to a lack of specific expertise on impact evaluation design, and to issues in contracting timetables and procedures that discourage adoption of what is perceived as a more complicated approach to evaluation.

• More generally, while there are many calls from policymakers, USAID officials, and other international and national agencies and donors to better determine the effects of DG programs, there is also widespread skepticism regarding whether impact evaluations will, in fact, provide that information. One member of the committee, Larry Garber, emphatically shares these concerns. Among both scholars and policy professionals, skeptics worry that the designs for impact evaluations will prove too cumbersome or inflexible to work in fluid and politically sensitive conditions in the field; that such evaluations will be too costly or time-consuming; or that such studies, in particular randomized designs, are either unethical for or ill suited to the actual projects being carried out in DG programs.

Feasibility of Impact Evaluations for DG Projects

Recognizing the need to take such concerns seriously, the committee examined a wide range of impact evaluation designs and worked with DG officers at several missions to assess the feasibility of such designs for their current or planned activities. The committee’s field studies found that a much larger portion of USAID’s DG programs than expected, forming roughly half of the projects that were examined in Uganda and several projects in Peru and Albania, appear to be amenable, in the view of the committee’s consultants, to randomized assignment designs. Nor did these designs necessarily require major departures from current program procedures. Often just more attention to how programs were rolled out or allocated among groups scheduled to receive assistance, combined with measurements on both the groups currently receiving assistance and those scheduled to receive it in the future, would create a reasonable randomized assignment design. In cases where randomized assignment designs were not feasible, the field teams were able to develop other designs that could offer a significant improvement in the ability to assess project effectiveness.
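The rollout-based design described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the procedure used by the field teams: the function `assign_rollout_phases`, the district names, and the seed value are all hypothetical, and the sketch assumes every listed unit is already scheduled to receive assistance so that only the timing is randomized.

```python
import random


def assign_rollout_phases(units, seed=None):
    """Hypothetical helper: randomly split units already scheduled for
    assistance into an early phase (assisted now) and a later phase
    (serving as the comparison group until its own rollout)."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"early": sorted(shuffled[:half]), "later": sorted(shuffled[half:])}


# Illustrative unit names only; not drawn from any USAID project.
districts = [f"district_{i}" for i in range(1, 9)]
phases = assign_rollout_phases(districts, seed=2008)
print(phases)
```

Measuring outcomes in both phases before the later group is assisted then provides the reference the paragraph describes, without withholding assistance from any unit that was scheduled to receive it.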

In addition, the committee found that many of the surveys that USAID is already carrying out provide excellent baseline and comparison data for DG projects; thus the data for impact evaluations that use matched or adjusted comparison groups (rather than randomization) are in some cases already being collected and could be utilized for little additional cost.

The field teams thus concluded that it was quite feasible, at least in theory, to conduct high-quality impact evaluations of varied designs that will help USAID better discern the impacts of its DG programs.

However, the committee knows that there is much skepticism regarding these procedures and, in particular, concerns—noted by Mr. Garber and by others in the democracy assistance donor community—about whether the complexity and sensitivity of DG programs will permit sound impact evaluations, especially those using randomized assignments, to be carried out. Therefore the full committee agreed that the value of such impact evaluations will have to be demonstrated in USAID’s own experience.

Strategies for Implementation

• The committee unanimously recommends that USAID move cautiously but deliberately to implement pilot impact evaluations of several carefully selected projects, including a portion with randomized designs, and expand the use of such impact evaluations as warranted by the results of those pilot evaluations and the needs expressed by USAID mission directors.

• Moreover, the committee recommends that these pilot evaluations be undertaken as part of a DG evaluation initiative with senior leadership that will also focus on improving USAID’s capacity to undertake impact evaluations and make resources and expertise available to mission directors seeking to learn about and apply impact evaluations to their projects. This DG evaluation initiative is described in more detail below.

–  –  –

• The concept of democracy cannot, in the present state of scientific knowledge of democracies and democratization, be defined in an authoritative (nonarbitrary) and operational fashion. It is an inherently multidimensional concept, and there is little consensus over its attributes. Definitions range from minimal—a country must choose its leaders through contested elections—to maximal—a country must have universal suffrage, accountable and limited government, sound and fair justice and extensive protection of human rights and political liberties, and economic and social policies that meet popular needs. Moreover, the definition of democracy is itself a moving target; definitions that would have seemed reasonable at one time (such as describing the United States as a democracy in 1900 despite no suffrage for women and major discrimination and little office-holding among minorities) are no longer considered reasonable today.

• Existing empirical indicators of overall democracy in a country suffer from flaws that include problems of definition and aggregation, imprecision, measurement errors, poor data coverage, and a lack of agreement among scales intended to measure the same qualities. There is thus no way to utilize existing macro-level indicators in a way that provides sound policy guidance or reliably tracks modest or short-term changes in a country’s democratic status. Existing indicators work best simply to roughly categorize countries as “fully democratic,” “authoritarian,” or “mixed or in between” and to identify large-scale or long-term movements in levels of democracy. They are particularly weak in assessing differences among the nondemocratic and mixed regimes that are the most important settings for USAID’s DG work.

• By contrast, indicators focused on specific sectors of democracy in a country (the sectoral level) would help USAID (1) track trends across various dimensions of democracy through time, (2) make precise comparisons across countries and regions, (3) understand the components and possible sequences of democratic transition, (4) analyze causal relationships (e.g., between particular facets of democracy and economic growth), and (5) assess the democratic profile (i.e., strengths and weaknesses across various dimensions of democracy) of countries where USAID operates.

• While the United States, other donor governments, and international agencies that are making policy in the areas of health or economic assistance are able to draw on databases that are compiled and updated at substantial cost by government or multilateral agencies mandated to collect such data, no comparable source of data on democracy at either the macro or sectoral level currently exists. Data on democracy are instead currently compiled by various individual academics on irregular and shoestring budgets, or by NGOs or commercial publishers, using different definitions and indicators of democracy.

Strategies for Implementation

These findings have led the committee to make a recommendation that committee members believe would significantly improve USAID’s (and others’) ability to track countries’ progress and make the type of strategic assessments that will be most helpful for DG programming.

• USAID and other policymakers should explore making a substantial investment in the systematic collection of democracy indicators at a disaggregated sectoral level—focused on the components of democracy rather than (or in addition to) the overall concept. If they wish to have access to data on democracy and democratization comparable to the data relied on by policymakers and foreign assistance agencies in the areas of public health or trade and finance, a substantial government or multilateral effort to improve, develop, and maintain international data on levels and detailed aspects of democracy would be needed. This should not only involve multiple agencies and actors in efforts to initially develop a widely accepted set of sectoral data on democracy and democratic development but should also seek to institutionalize the collection and updating of democracy data for a broad clientele, along the lines of the economic, demographic, and trade data collected by the World Bank, the United Nations, and the International Monetary Fund.

