Surveys are thought to be expensive, but the costs often appear larger than they really are. In many of the countries in which USAID has democratization programs, the cost of a well-administered survey can be quite reasonable.5

A second objection readers may have to the DD approach is that the target (or project) areas are indeed different from the national samples in many of the ways mentioned above. Often they are poorer and more rural and are therefore expected not only to begin at levels below those of the nation as a whole but also, perhaps, to exhibit slower progress. One of the strengths of this design is that such differences can be detected and noted when the baseline survey data are collected. To correct for those differences, the survey analysis can then use an analysis-of-variance design, in which the national sample becomes merely one of the groups being compared to the various treatment regions or municipalities. Covariates can be, and have been, used to statistically remove the impact of the differences between the national sample and the treatment groups. Hence, if the targeted areas are, on average, poorer or exhibit lower average levels of education, those variables can be included as covariates to “remove” their impact, after which the nation and the treatment areas can be more effectively compared.

5 Costs vary directly with hourly wages in any given country. In low-wage countries, surveys can be quite inexpensive. For example, surveys in many Latin American and African countries can be conducted for $15 to $25 per interview (sometimes less) as an all-inclusive cost (sample design, pretests, training, fieldwork, and data entry). For a typical sample of 1,200 respondents (which would provide sample confidence intervals of ±3.0 percent), total costs to obtain the data would be about $30,000 at the upper end of that range. Of course, that is for one round of interviews; if the typical project involves a baseline survey followed by an end-of-project survey to measure impact, those costs would double.

Gathering the data is one cost, but analysis is another. The cost of analysis depends entirely on the price of contracting with individuals qualified to analyze such data. At a minimum, such individuals should hold a master’s degree in the social sciences, with several courses in statistics. Individuals with such qualifications are often available in target countries, and an extensive analysis of the data could be obtained in many of those countries for $20,000 or even less. Unfortunately, many of the studies the committee has seen conducted for USAID limit themselves to reporting percentages and summary statistics. Analysis of that type is rarely useful, since indices of variables normally need to be created, logistic and OLS regression techniques must be applied, and reporting of significance levels and confidence intervals is required. For example, if the consultant’s report states that the baseline study finds 10 percent of respondents attending municipal meetings in both the control and experimental areas, and the end-of-project survey finds that the treatment area has risen to 15 percent but the control group has also risen to 12 percent, it would be important to know if the change in the treatment group is statistically significant and if the increase in the control group was also significant. Thus USAID needs to be certain it has hired qualified individuals and obtained an appropriate level of statistical analysis to make the analysis useful for determining project impact.
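To make the significance question raised in footnote 5 concrete, the following sketch works through the municipal-meeting example (10 percent attendance at baseline in both areas, rising to 15 percent in the treatment area and 12 percent in the control area). The text does not give sample sizes, so the figure of 600 respondents per group per survey round is purely an assumption, as is the treatment of the four samples as independent simple random samples. The function names are illustrative only.

```python
import math

def two_proportion_z(p_before, n_before, p_after, n_after):
    """z statistic for the change between two independent sample proportions."""
    pooled = (p_before * n_before + p_after * n_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    return (p_after - p_before) / se

def difference_in_differences(t_pre, t_post, c_pre, c_post, n):
    """DD of proportions and its standard error.

    Assumes four independent samples of n respondents each
    (treatment/control by baseline/end-of-project).
    """
    dd = (t_post - t_pre) - (c_post - c_pre)
    se = math.sqrt(sum(p * (1 - p) / n for p in (t_pre, t_post, c_pre, c_post)))
    return dd, se

N = 600  # hypothetical respondents per group per round (not from the text)
t_pre, t_post = 0.10, 0.15  # treatment area, from the example
c_pre, c_post = 0.10, 0.12  # control area, from the example

z_treatment = two_proportion_z(t_pre, N, t_post, N)
z_control = two_proportion_z(c_pre, N, c_post, N)
dd, dd_se = difference_in_differences(t_pre, t_post, c_pre, c_post, N)

print(f"treatment change z = {z_treatment:.2f}")
print(f"control change z   = {z_control:.2f}")
print(f"DD = {100 * dd:.1f} points, z = {dd / dd_se:.2f}")
```

Under these assumed sample sizes, the rise in the treatment area exceeds the conventional 1.96 threshold, while the rise in the control area and the 3-point difference-in-differences do not. Reporting the percentages alone would hide exactly this distinction.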

There are certainly possible flaws in this sort of analysis; for example, if there are unmeasured differences that are unknown or cannot be controlled for statistically, the findings could be deceptive. But when randomized assignment cannot be used, this method can provide a good alternative. Since in many cases missions will not be able to select their treatment areas randomly, the “national control” sample offers a reasonable way of measuring project impact.6

Finally, it is important to add that survey samples should not be used when little is known about the expected project impact. Surveys are best used when researchers already have a good idea of how to measure the expected impact. For example, in the illustration mentioned above, it should be relatively easy to specify what increased participation means, by devising questions on frequency of attendance at town meetings, municipal council meetings, district meetings, and the like. But when a project involves less well-researched areas, focus groups should be the instrument of choice until researchers more fully understand what is going on. Focus groups can then lead to more systematic evaluation via surveys.

Strengthening Parties: An Example from Peru7

Another example of an impact evaluation design when randomization is not possible comes from Peru, where one of USAID’s programmatic goals is to strengthen political parties. An idea that has been considered by the Peru USAID mission that would serve this goal and reinforce the parallel goal of promoting decentralization is to provide assistance to the major national-level parties in opening or strengthening local offices.

6 Another factor to consider with respect to the use of surveys is the size and nature of the samples for both the treatment and the control groups. The key factor here is the change that the project is expected to make on the key variables being studied. For example, if, again, the goal of the project is to increase participation in local government, what is the target increase that has been set? If the increase is 3 percent, a sample of 500 respondents will be too small, since a sampling error of about ±4.5 percent would emerge from a sample of that size. This means that the project evaluation would be subject to a Type II error, in which the expected impact did indeed occur, but the sample size was too small to detect it. Ideally, the control group(s) should be of the same size as the treatment group in order to maintain similar confidence intervals for the measurement of project impact/nonimpact.

7 This discussion is drawn from the report of a field team led by Thad Dunning, Yale University.
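The sampling-error figures quoted in footnotes 5 and 6 can be checked with the standard formula for the margin of error of a proportion. The sketch below assumes simple random sampling, a 95 percent confidence level, and the most conservative case of a 50 percent proportion. Real surveys with clustered designs would typically have somewhat larger errors. The function names are illustrative.

```python
import math

Z_95 = 1.96  # critical value for a 95 percent confidence level

def margin_of_error(n, p=0.5):
    """Half-width of the 95% confidence interval for a proportion,
    assuming simple random sampling (no design effect)."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

def sample_size_for_margin(margin, p=0.5):
    """Smallest sample size whose margin of error does not exceed `margin`."""
    return math.ceil(Z_95 ** 2 * p * (1 - p) / margin ** 2)

for n in (500, 1200):
    print(f"n = {n}: about ±{100 * margin_of_error(n):.1f} percentage points")

print("n needed for ±3 points:", sample_size_for_margin(0.03))
```

Under these assumptions the formula gives roughly ±4.4 points for 500 respondents and ±2.8 points for 1,200, close to the rounded figures in the footnotes, and about 1,068 respondents for a ±3-point margin. When two independent samples are compared, the sampling error of the difference between them is larger than that of either sample alone, which strengthens the footnote's warning about Type II errors.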

Because of the large number of municipalities in which such offices might, in principle, be opened or strengthened, such a program might seem like a good candidate for a randomized evaluation. To set up the ideal conditions for an impact evaluation, USAID or the local implementer would randomly select municipalities in which to establish or strengthen local parties from a set of acceptable municipalities. Local parties would have to accept that USAID or the contractor would select the municipalities.

However, when and where a political party chooses to open (or allocate resources to strengthen the operations of) a municipal office is purely the business of the political party. For USAID to make such decisions would be to go well beyond its mandate of supporting good governance more generally. From a project evaluation standpoint, however, the problem is that if the parties themselves choose where to open (or allocate resources to strengthen the operations of) local offices, the design would be nonrandom. If several years into the project USAID finds political parties to be stronger in the treatment municipalities, was this due to the project or to the fact that the parties selected those local branches that were already in the process of strengthening themselves? Unless the project also provided for some local branches that the national parties did not select for funding, which likely is not feasible, it would not be possible to answer this question.

Moreover, if outcomes are not tracked in municipalities in which USAID partners do not support local party offices (i.e., controls), any inferences may be especially misleading. Suppose measures of local party strength are taken today and again in five years and an increase is found. Is this due to the effect of party-strengthening activities supported by USAID? Or is it due to some other factor, such as a change from an electoral system with preferential voting to closed party lists, which would tend to strengthen party discipline, including, perhaps, that of local parties?8 With a control group of municipalities, it could be tested whether they too had experienced a growth in party strength (in which case the cause was most likely the law, which affects all municipalities in the country, not the USAID program, which was present only in some). The point is that without data on any comparison group to provide controls, it will be impossible to separate the effect of USAID local activities from the effect of the law. So at a minimum, collecting data in a set of control municipalities would be highly advantageous. Thus, even if USAID gives political parties full control over which municipalities they choose for party strengthening with USAID assistance, USAID would benefit from seeking a list of those municipalities and also gathering data from a sample of municipalities not on the list, to serve as a (nonrandom) comparison group.

8 Such a change is currently being considered in Peru. In the current electoral system, there is proportional representation at the department level, and voters vote for party lists but can indicate which candidate on the list they prefer. According to a range of research on the topic, this can create incentives for candidates to cultivate personal reputations and also makes the party label less important to candidates. Under a closed-list system, voters simply vote for the party ticket, and party leaders may decide the order of candidates on the list. This may tend to increase party discipline and cohesion (as well as the internal power of party elites).

When units cannot be randomly assigned to assistance or control groups, the challenge for an evaluator is to identify an appropriate control group—one that approximates what the treatment group would have looked like in the absence of the intervention. In this context this would mean identifying municipalities that the parties do not select that are in all other ways similar to the municipalities in which the parties elect to work. Statistical procedures—in particular, propensity score matching estimators—have been developed to assist in the process of carefully matching units to approximate a randomized design. Alternatively, evaluators can exploit the discontinuities that exist when treatment is assigned based on a unit’s value on a single continuous measure. For example, if parties elected to work in the top 20 percent of municipalities in terms of their base of support, a comparison could be constructed that exploited the fact that those just above the 20 percent threshold are quite similar to those just below.
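The discontinuity idea in the preceding paragraph can be illustrated with a deliberately simple sketch. All of the data are invented: 200 hypothetical municipalities carry an integer support index from 0 to 199, parties are assumed to work in the top 20 percent, and party strength is assumed to rise smoothly with support plus a fixed jump of 3 points for treated units. The comparison of means within a window around the cutoff is a simplification; applied work would usually fit a local regression on each side of the threshold.

```python
from statistics import mean

TRUE_EFFECT = 3.0  # assumed program effect built into the invented data

# Hypothetical support index for 200 municipalities (0 = weakest base).
supports = list(range(200))
# Parties are assumed to select the top 20 percent by support.
cutoff = sorted(supports)[len(supports) * 80 // 100]

# Each unit: (support index, treated?, party-strength outcome).
units = []
for s in supports:
    treated = s >= cutoff
    outcome = 10 + 0.1 * s + (TRUE_EFFECT if treated else 0.0)
    units.append((s, treated, outcome))

def naive_difference(units):
    """Mean outcome of all treated units minus all untreated units."""
    treated = [y for _, t, y in units if t]
    untreated = [y for _, t, y in units if not t]
    return mean(treated) - mean(untreated)

def local_difference(units, cutoff, bandwidth):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [y for s, _, y in units if cutoff <= s < cutoff + bandwidth]
    below = [y for s, _, y in units if cutoff - bandwidth <= s < cutoff]
    return mean(above) - mean(below)

naive = naive_difference(units)
local = local_difference(units, cutoff, bandwidth=10)

print(f"naive comparison: {naive:.1f}")
print(f"comparison near the cutoff: {local:.1f} (assumed true effect {TRUE_EFFECT})")
```

In this invented example the naive comparison yields about 13 points, because treated municipalities started with far more support, while the comparison restricted to units near the threshold yields about 4 points, much closer to the assumed effect of 3. The remaining gap reflects the trend in support within the window and shrinks as the window narrows.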

These procedures require high-quality data on the characteristics of units that were and were not selected, as well as an understanding of the factors that contributed to the selection process. But as discussed in Chapter 5, these approaches have already been employed with impressive results in other settings not too dissimilar from some DG activities.
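The matching approach mentioned above depends on exactly this kind of data. The sketch below shows nearest-neighbor matching on a propensity score for a handful of invented municipalities. The scores are supplied by hand as an assumption; in practice they would be estimated from the characteristics of selected and unselected units (for example, with a logistic regression), and the names, scores, and outcomes here are hypothetical.

```python
from statistics import mean

# (name, assumed propensity score, party-strength outcome); all values invented.
selected = [("A", 0.80, 62), ("B", 0.65, 55), ("C", 0.50, 50)]
not_selected = [("D", 0.78, 57), ("E", 0.60, 52), ("F", 0.52, 48),
                ("G", 0.20, 30), ("H", 0.10, 28)]

def nearest_match(unit, pool):
    """Return the pool member whose score is closest to the unit's score.
    Matching is done with replacement."""
    _, score, _ = unit
    return min(pool, key=lambda other: abs(other[1] - score))

# Pair every selected municipality with its closest unselected counterpart.
pairs = [(unit, nearest_match(unit, not_selected)) for unit in selected]

# Average outcome gap across matched pairs (effect on the treated units).
matched_effect = mean(t[2] - c[2] for t, c in pairs)

# Unmatched comparison of all selected versus all unselected municipalities.
naive_effect = mean(y for _, _, y in selected) - mean(y for _, _, y in not_selected)

for treated_unit, control_unit in pairs:
    print(f"{treated_unit[0]} matched with {control_unit[0]}")
print(f"matched estimate: {matched_effect:.2f}")
print(f"unmatched estimate: {naive_effect:.2f}")
```

With these invented figures the unmatched comparison suggests a gap of nearly 13 points, largely because municipalities G and H resemble none of the selected ones, while the matched comparison yields a little over 3 points. The estimate is only as credible as the scores, which is why an understanding of the selection process matters.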

The larger point is that creativity can help overcome some of the potential obstacles to stronger research designs. And as long as they include a control group and sound pre- and postmeasurements, even nonrandomized designs can provide the basis for credible impact evaluations; in principle they can offer considerably more information for assessing project effects than is usually obtained in current DG M&E activities.

Supporting an Inclusive Political System in Uganda9

Another example is the project sponsored by USAID’s Uganda mission to promote the development of an inclusive political system. A key objective of this effort is to empower women and other marginalized citizens to lobby district and political party leaders on issues of importance to them, such as activities for the disabled. To achieve this objective, small grants are to be provided to a small number of civil society organizations (CSOs) to allow them to carry out programs in this area. The objective is certainly worthy, but it is not amenable to randomized evaluation without a substantial increase in the number of funded CSOs (see Chapter 6).

9 This discussion and the following one draw on work by a team led by Devra Moehler,

How, then, can it be determined whether the money spent on the small grant program is having the desired effect?

