A slightly different evaluation problem arises with respect to the activities designed to support the drafting of various pieces of legislation. One fairly straightforward measure of success in this area is simply whether or not the law was actually drafted and, if so, whether it included language that will demonstrably strengthen the rule of law. But assessing whether USAID’s support had any impact requires weighing the counterfactual question: Would the legislation have been drafted without USAID’s support, and what would it have looked like? If the answers to these questions are that the legislation would not have been drafted or that the language in the resulting law would not have been optimal, then we can judge USAID’s support to have been successful to the extent that the result we observe is better than this counterfactual outcome. The broader problem, however, is that achieving the overarching strategic objective of strengthening the rule of law involves not just getting legislation drafted but also getting it passed and then enforced. The point is that the measurable outcome of the USAID-sponsored activity is several steps removed from the true goals of the intervention, and any assessment of “success” in these areas must be interpreted in this light. The same is true of other activities, such as technical assistance to help the Albanian government establish a copyright office or an office of patents and trademarks. Whether these institutions, once created, will have any impact on protecting intellectual property will depend on much more than whether a formal office designed to do so has been established.

The larger point that this discussion hints at is that many of the activities in the rule of law area involve the creation of laws or the strengthening of institutions whose existence is a prerequisite for a legal system that works, and that supports democracy and market reform. Whether or not these laws and institutions actually have a positive impact on these outcomes can only be ascertained after they have been created or made sufficiently strong to work properly. In this context, evaluating the efficacy of the resources spent on such activities may not make much sense, since the impact will only be meaningful after this initial, necessary foundation-building stage. Supporting the writing of laws and the setting up of institutions such as inspectorates, citizens’ advocacy offices, and attorneys’ associations may simply be necessary investments, even if it is very difficult to know whether or not they have had, or will have, an impact on the ultimate outcomes that USAID wants to affect.

The one activity area within rule of law that might be amenable to randomized evaluation, at least in principle, is the support for rule of law–oriented nongovernmental organizations (NGOs). The problem here is that the preferred method of selecting NGOs for support is through a small grants competition, whereas a truly rigorous evaluation of the impact of support would require randomly choosing NGOs for funding.

One possible solution would be to hold a small-grants competition and, having ranked the applications from best to worst, work down the list funding every other one. Data would then need to be collected on the performance of every NGO on the list, or on its impact in its area of focus, for both the funded and the unfunded NGOs, and the two groups could then be compared. The problem, again, is to figure out what, precisely, to measure (which will depend, in any case, on the particular goals that each NGO sets for itself).
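The ranked-list procedure can be sketched in a few lines of Python. This is a minimal illustration, not a USAID procedure: the `Application` record, the review-panel `score`, and the optional coin flip within each adjacent pair are assumptions added for the example. With `rng=None` the function funds ranks 1, 3, 5, and so on, exactly as described; passing a random generator instead randomizes which member of each ranked pair is funded, so that assignment no longer depends systematically on rank.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Application:
    ngo: str      # hypothetical NGO identifier
    score: float  # hypothetical review-panel score; higher is better


def assign_by_rank(
    applications: List[Application], rng: Optional[random.Random] = None
) -> Tuple[List[str], List[str]]:
    """Rank applications best to worst and fund one member of each adjacent pair.

    Without an rng, the better-ranked member of each pair is funded (ranks 1, 3, 5, ...).
    With an rng, a coin flip decides which member of each pair is funded.
    An unpaired last application (odd count) is funded.
    """
    ranked = sorted(applications, key=lambda a: a.score, reverse=True)
    funded: List[str] = []
    unfunded: List[str] = []
    for i in range(0, len(ranked), 2):
        pair = ranked[i : i + 2]  # a new list, so reversing it leaves `ranked` intact
        if rng is not None and len(pair) == 2 and rng.random() < 0.5:
            pair.reverse()  # the coin flip swaps which member is funded
        funded.append(pair[0].ngo)
        unfunded.extend(a.ngo for a in pair[1:])
    return funded, unfunded
```

For five applications scored 90, 80, 70, 60, and 50, the deterministic version funds the first, third, and fifth; the randomized version still funds exactly one NGO from each of the first two pairs, which keeps the funded and unfunded groups comparable in average rank.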

Also, unless the small-grants competition generates a very large number of high-quality applications, this method is not likely to generate very useful results. The need for a large number of funded and unfunded NGOs is compounded by the likelihood that NGOs will propose different sets of activities. “Success” will then have two possible sources, the difficulty of the tasks that the NGO sets out to accomplish and the benefit of having received the small grant, so the sample of NGOs analyzed will need to be large enough for the impact of funding to be detected through the “noise” of random variation in task difficulty.
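A back-of-the-envelope power calculation makes the sample-size concern concrete. The sketch below uses the standard normal-approximation formula for the minimum detectable effect (MDE) of a two-group comparison of means, MDE = (z(1 - alpha/2) + z(power)) * sigma * sqrt(1/n_funded + 1/n_unfunded). It assumes a common outcome standard deviation sigma in both groups; sigma absorbs the variation in task difficulty across NGOs, so the more heterogeneous the proposed activities, the larger sigma and the larger the true effect must be before it can be detected. The outcome scale and sample sizes are illustrative only.

```python
from statistics import NormalDist


def minimum_detectable_effect(
    sigma: float,
    n_funded: int,
    n_unfunded: int,
    alpha: float = 0.05,
    power: float = 0.80,
) -> float:
    """Smallest true difference in mean outcomes detectable with the given power.

    Uses a two-sided test at level alpha and the normal approximation,
    assuming both groups share the outcome standard deviation sigma.
    """
    z = NormalDist()  # standard normal distribution
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.80 for the defaults
    return multiplier * sigma * (1 / n_funded + 1 / n_unfunded) ** 0.5


# Illustrative values: outcome measured in standard-deviation units (sigma = 1).
for n in (10, 20, 50, 200):
    print(n, round(minimum_detectable_effect(1.0, n, n), 2))
```

With sigma = 1, twenty NGOs per group can only reliably detect an effect of about 0.89 standard deviations, while two hundred per group bring that down to about 0.28. Because the MDE scales linearly with sigma, any extra spread introduced by differences in task difficulty raises it proportionally.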

Decentralization

USAID/Peru launched a program in 2002 to support national decentralization policies initiated by the Peruvian government. Over a five-year period, the Pro-Decentralization (PRODES) program was intended to

• support the implementation of mechanisms for citizen participation with subnational governments (such as “participatory budgeting”);

• strengthen the management skills of subnational governments in selected regions of Peru; and

• increase the capacity of nongovernmental organizations in these same regions to interact with their local government.

With the exception of some activities relating to national-level policies, all interventions under the program took place in seven selected subnational regions (also called departments): Ayacucho, Cusco, Huanuco, Junin, Pasco, San Martin, and Ucayali.4 These seven regions contain 61 provinces, which in turn contain 536 districts.5 Workshops on participatory budgeting, training of civil-society organizations, and other interventions took place at the regional, provincial, and district levels.6 The ultimate goal of the program was to promote “increased responsiveness of sub-national elected governments to citizens at the local level in selected regions.” This outcome could be measured at several units of observation. For example, government capacity and responsiveness could be measured at the district or provincial level (through expert appraisals or other means), while citizens’ perceptions of government responsiveness could be measured at the individual level (through surveys). Experimental designs could be used to study the impact of the decentralization program, and the cost of appropriately designed experimental evaluations could in fact be far below the amount actually spent on monitoring and evaluation.

4 As discussed elsewhere, the regions were nonrandomly selected for programs because they share high poverty rates, significant indigenous populations, and narcotics-related activities, and because a number of the departments were strongholds of the Shining Path movement in the 1980s.

5 Peru has 24 departments plus one “constitutional province”; the 24 departments in turn

Best-possible designs. We discuss best-possible designs from the perspective of program evaluation. First, we discuss what an ideal ex ante design for the decentralization program might have been in 2002, when the program began. Second, we discuss how an experimental design might be employed in a second phase of the program, given that all the municipalities in the seven regions had already been treated in the first phase.

A “tabula rasa” design. We assume that the decentralization program will be implemented in the seven nonrandomly chosen regions in which USAID commonly works; inferences about the effect of the intervention will then be made to the districts and provinces that comprise these regions. The simplest design would involve randomization of treatment at the district level. Districts in the treatment group would be invited to receive the full bundle of interventions associated with the decentralization program (e.g., training in participatory budgeting, assistance for civil society groups, and so on); control districts would receive no interventions.

There are two disadvantages to randomizing at the district level, however. One is that some of the relevant interventions in fact take place at the provincial level.7 Another is that district mayors and other actors may more easily learn of treatments in neighboring districts, which could contaminate the control group. For both of these reasons, it may be useful to randomize instead at the provincial level. Then, all districts in a province that was randomly selected for treatment would be invited to receive the bundle of interventions.

6 Relevant subnational authorities include members of regional councils, provincial mayors, and mayors of districts.

7 Some interventions also occurred at the regional level, particularly toward the end of the
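Randomizing at the provincial level means drawing provinces, not districts, and letting every district inherit its province’s assignment. The sketch below assumes a hypothetical mapping from district to province and a 50 percent treatment share; neither the identifiers nor the share come from the PRODES program. A fixed seed makes the draw reproducible and auditable.

```python
import random
from typing import Dict, Optional


def randomize_provinces(
    district_to_province: Dict[str, str],
    share_treated: float = 0.5,
    seed: Optional[int] = None,
) -> Dict[str, bool]:
    """Randomly select provinces for treatment; each district inherits its province's status."""
    rng = random.Random(seed)
    provinces = sorted(set(district_to_province.values()))  # sorted so the seed fully determines the draw
    n_treated = round(len(provinces) * share_treated)
    treated = set(rng.sample(provinces, n_treated))
    return {d: p in treated for d, p in district_to_province.items()}


# Hypothetical example: four provinces with three districts each.
districts = {f"{p}-d{i}": p for p in ("P1", "P2", "P3", "P4") for i in range(3)}
assignment = randomize_provinces(districts, seed=2002)
```

One natural refinement, not described in the text, would be to run the draw separately within each of the seven regions so that every region contributes both treated and control provinces.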

Several different kinds of outcome measures can be gathered. Survey evidence on citizens’ perceptions of local government responsiveness will be useful; so may evaluations of municipal governance capacity taken across all municipalities in the seven regions (both treated and untreated). A difference in average outcomes across groups at the end of the program (for example, differences in the percentage of residents who say government services are “good” or “very good,” or the percentage who say the government responds “almost always” or “on the majority of occasions” to what the people want) can then be reliably attributed to the effect of the bundle of interventions, if the difference is bigger than might reasonably arise by chance.8

One feature of this design that may be perceived as a disadvantage is that treated municipalities are subject to a bundle of interventions; thus, if we observe a difference across treated and untreated groups, we may not know which particular intervention was responsible (or most responsible) for the difference. Did training in participatory budgeting matter most? Assistance to civil society groups? Or some other aspect of the bundle? This problem arises as well in some medical trials and other experiments involving complex treatments, where it may not be clear exactly what aspect of treatment is responsible for differences in average outcomes across treatment and control groups.
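One way to check whether a difference is “bigger than might reasonably arise by chance” is randomization inference: re-draw the provincial assignment many times, recompute the difference in means each time, and see how often a difference at least as large as the observed one appears. Because whole provinces are re-drawn, the test respects the clustering of districts within provinces. The sketch below is illustrative; it assumes one outcome per district (for example, the share of surveyed residents rating services “good” or “very good”) and that the original draw selected a fixed number of provinces.

```python
import random
from statistics import mean
from typing import Dict, Set, Tuple


def diff_in_means(outcomes: Dict[str, float], treated: Dict[str, bool]) -> float:
    """Average outcome of treated districts minus average outcome of control districts."""
    t = [y for d, y in outcomes.items() if treated[d]]
    c = [y for d, y in outcomes.items() if not treated[d]]
    return mean(t) - mean(c)


def province_permutation_test(
    outcomes: Dict[str, float],
    district_to_province: Dict[str, str],
    treated_provinces: Set[str],
    n_draws: int = 5000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed difference, two-sided randomization p-value)."""
    rng = random.Random(seed)
    provinces = sorted(set(district_to_province.values()))
    k = len(treated_provinces)  # assumes 0 < k < number of provinces

    def assign(chosen: Set[str]) -> Dict[str, bool]:
        return {d: district_to_province[d] in chosen for d in outcomes}

    observed = diff_in_means(outcomes, assign(treated_provinces))
    as_extreme = 0
    for _ in range(n_draws):
        placebo = set(rng.sample(provinces, k))  # a re-drawn provincial assignment
        if abs(diff_in_means(outcomes, assign(placebo))) >= abs(observed):
            as_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_draws + 1)
```

A small p-value says that few re-drawn assignments produce a gap as large as the real one, which is the sense in which the observed difference is unlikely to be due to chance alone.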

It seems preferable at this stage to design an evaluation plan that would allow USAID to know with some confidence whether a program it finances makes any difference, and bundling the interventions may provide the best chance to estimate a causal effect of treatment. Once this question is answered, one might then use further experimental designs to ask which aspect of the bundle made a difference. Another possibility, discussed below, is to implement a more complex design in which different municipalities would be randomized to receive different bundles of interventions.
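One hypothetical way to randomize municipalities to different bundles is a factorial design, in which each component of the program is switched on or off independently. The sketch below is an assumption for illustration and not necessarily the design the authors have in mind: it crosses two components named in the text (participatory-budgeting training and civil-society assistance) to form four arms, and spreads units across the arms as evenly as possible. The unit identifiers and component labels are invented.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence


def factorial_assignment(
    units: Sequence[str], components: List[str], seed: Optional[int] = None
) -> Dict[str, Dict[str, bool]]:
    """Assign each unit to one arm of a full factorial design over the given components.

    Every on/off combination of the components is an arm; after shuffling the units,
    arms are dealt out in rotation so their sizes differ by at most one.
    """
    arms = list(itertools.product([False, True], repeat=len(components)))
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {
        unit: dict(zip(components, arms[i % len(arms)]))
        for i, unit in enumerate(shuffled)
    }


# Hypothetical example: eight provinces, two program components, four arms of two provinces each.
plan = factorial_assignment(
    [f"P{i}" for i in range(8)],
    ["budgeting_training", "civil_society_support"],
    seed=7,
)
```

Because one arm receives neither component, it doubles as a pure control; the price of separating the components is that each arm holds only a fraction of the units, so the design needs more provinces than a simple treatment-versus-control comparison.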

The intention-to-treat principle can be used to analyze the results of the experiment: municipalities are compared according to the group to which they were assigned, regardless of whether they actually participated. Some municipalities assigned to treatment may refuse to sign participation agreements or otherwise fail to cooperate with the local contractor; these municipalities are akin to noncompliers in a medical trial. In this context, estimating the “effect of treatment on the treated” may also be of interest.
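The two quantities can be computed side by side. The intention-to-treat (ITT) effect compares groups as randomized; the effect of treatment on the treated can then be recovered by dividing the ITT effect by the difference in participation rates between the assigned and control groups (the Bloom, or Wald, estimator). That step assumes that assignment affects outcomes only through actual participation and that no municipality would participate only if assigned to control. The record layout below is hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical record: (assigned_to_treatment, actually_participated, outcome) for one municipality.
Record = Tuple[bool, bool, float]


def itt_and_tot(records: List[Record]) -> Tuple[float, float]:
    """Return (intention-to-treat effect, effect of treatment on the treated)."""
    assigned = [r for r in records if r[0]]
    control = [r for r in records if not r[0]]
    # ITT: compare groups exactly as randomized, ignoring who actually took part.
    itt = mean(y for _, _, y in assigned) - mean(y for _, _, y in control)
    # Participation rates in each arm (control take-up is often zero).
    take_up_assigned = mean(1.0 if d else 0.0 for _, d, _ in assigned)
    take_up_control = mean(1.0 if d else 0.0 for _, d, _ in control)
    # Bloom/Wald estimator: rescale the ITT effect by the difference in participation.
    tot = itt / (take_up_assigned - take_up_control)
    return itt, tot
```

For example, if three of four assigned municipalities participate and each gains one point while the refuser and all four controls gain nothing, the ITT effect is 0.75 and the effect on the treated is 1.0: the ITT figure is diluted by the municipality that declined.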

It may be worth choosing pilot districts at random as well. In the first phase of the decentralization program as implemented, only 145 municipalities were incorporated in the first year, out of the 536 that were eventually incorporated. Comparing municipal capacity across incorporated and unincorporated municipalities at the end of the pilot period may not yield useful results, because the incorporated municipalities were chosen for their high capacity. It would be much more meaningful to assign municipalities to the pilot phase at random. To the extent that it is necessary to include some municipalities with high ex ante management capacity and resources, this may be accomplished through stratified sampling of municipalities.

8 Standard errors may need to be adjusted to account for the clustering of treated districts
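Stratified random selection of pilot municipalities can be sketched as follows. The capacity labels, the proportional allocation across strata, and the largest-remainder rounding rule are illustrative assumptions; only the totals (145 pilot municipalities out of 536) come from the text, and the split of the 536 into 100 “high” and 436 “low” capacity municipalities in the example is invented.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Optional, Set


def stratified_pilot(
    stratum_of: Dict[str, str], n_pilot: int, seed: Optional[int] = None
) -> Set[str]:
    """Randomly choose n_pilot municipalities, allocating slots to strata in proportion to size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for muni in sorted(stratum_of):  # sorted so the seed fully determines the draw
        strata[stratum_of[muni]].append(muni)
    total = len(stratum_of)
    quotas = {s: n_pilot * len(m) / total for s, m in strata.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    # Largest-remainder rounding so the allocations add up exactly to n_pilot.
    leftover = n_pilot - sum(alloc.values())
    for s in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1
    pilot: Set[str] = set()
    for s, members in strata.items():
        pilot.update(rng.sample(members, alloc[s]))  # simple random sample within each stratum
    return pilot


# Hypothetical capacity strata for 536 municipalities.
munis = {f"m{i:03d}": ("high" if i < 100 else "low") for i in range(536)}
pilot = stratified_pilot(munis, 145, seed=2002)
```

Because selection is random within each stratum, pilot and non-pilot municipalities of the same capacity level can be compared directly, which avoids the bias that arises when the pilot is filled with the most capable municipalities.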

Second-phase design. USAID/Peru is preparing to roll out a second five-year phase of the decentralization program, again in the seven regions in which it typically works. At this point, all municipalities in the seven regions have already been treated (or at least targeted for treatment) in the first phase, which may raise special considerations for the second-phase design. Our understanding is that there are at least two possibilities for the actual implementation of the second phase; which option is chosen will depend on the available budget and other factors.

One is that all 536 municipalities are again targeted for treatment. As in the first phase, this would not allow municipalities in the seven regions to be partitioned into treatment and control groups.

