Development, Security, and Cooperation. Policy and Global Affairs. THE NATIONAL ACADEMIES PRESS, 500 Fifth Street, N.W., Washington, DC 20001
SELECTION OF FIELD VISIT SITES

Three countries were selected as the sites of the field visits conducted by teams of consultants and staff: Albania, Peru, and Uganda. The selection was based primarily on the stage of program development within a country's DG portfolio, the breadth of USAID programming, and the depth of USAID programming (as determined by long-term funding in multiple program areas of interest; see "Current and Recent USAID Projects at the Time of Field Visits" at the end of this appendix for a list of the major DG projects in each country). In each country selected, the DG staff were at the stage of developing new projects, offering an optimal opportunity to explore options for program design that may be more or less suited to various research methodologies. The NA field team members (see "Consultant Biographies" at the end of this appendix) were thus able to examine a variety of projects at their inception, the point at which new methodologies can most effectively be designed to maximize confidence about whether projects have an impact and under what conditions. These considerations guided the selection of cases across geographically and politically distinct regions of the world (Central Europe/Post-Communist, Latin America/Post-Military Rule, Africa/Post-Conflict).
While there is no single point at which DG programs can be most effectively designed, implemented, or evaluated, the initial stages of development and design provide the most fruitful points at which innovative yet feasible options may be considered. Each field team therefore selected one or more projects and worked closely with USAID Mission DG officers, project implementers, and local partners through a series of in-depth conversations to understand the various opportunities and challenges presented by newly proposed program designs, data collection, and more rigorous evaluation techniques. A fuller discussion of these proposed program designs in each country visited follows.
KEY OBSERVATIONS AND FINDINGS FROM FIELD VISITS²

There are ample opportunities for improving the methodology of program monitoring and evaluation within the DG sector. This is in large part due to the well-developed existing USAID evaluation procedures.
To maximize these opportunities, various approaches to evaluation must be selected based on program goals and program designs. This should involve the provision of assistance (e.g., visits by specialists in program monitoring and evaluation (M&E) from USAID/Washington to missions during the project conceptualization stage as well as during subsequent stages of M&E development).
Improvements in program evaluation need not be expensive. Maximizing existing mechanisms (surveys and other data collection systems) and strategically targeting sample populations and control groups can yield more robust findings at an overall cost savings.
By improving program evaluation, the impact of USAID programs can be more accurately assessed and documented. Creating knowledge of program impacts through rigorous evaluation is the best way to identify and take advantage of lessons learned.
Institutional knowledge gained through these experiences should be shared within and beyond the mission to affect learning on a broader, agency-wide basis.
Building on Current Tools and Approaches

Several current practices of mission staff demonstrate the necessary willingness to maximize reasonable opportunities for learning and provide the basis for more solid inferences over time. Currently, as a part of ongoing DG programs, mission staff collect regular and systematic information about those who receive training through USAID-funded programs. This approach to data collection should be encouraged and expanded to complement other more rigorous methodologies described below.
Similarly, implementers working with USAID have developed elaborate mechanisms for quarterly data collections pertinent to their programs.
To maximize the potential represented by these mechanisms, the data collected should be directed toward understanding outcomes and impacts rather than outputs. Similarly, data from mechanisms created by local implementers should be strategically collected and analyzed to maximize cost benefits and efficiencies. For example, collecting local government data in the form of smaller, cost-effective samples from municipalities would be beneficial.

² This text is drawn from memos prepared for the committee by three of its field consultants—Thad Dunning, Yale University (Peru); Devra Cohen Moehler, Cornell University (Uganda); and Dan Posner, University of California at Los Angeles (Albania)—and reflects their judgments and assessments.
Furthermore, this information should be fully transferable to USAID for learning purposes. Most important, these mechanisms should be consistent with key program design elements requiring consideration at the initial stages of program development.
Measurement of Outcome Indicators

Indicators gathered in connection with past programs tend to be measures of "outputs" or very proximate outcomes. Examples of these output indicators include, in the context of a decentralization program, the number of relevant municipal officials trained by the implementer or the percentage of target municipalities that agree to an assistance plan. Although these output measures may be useful and necessary for monitoring the performance of local implementers or for assessing short-term progress in implementing a program, they are less helpful for measuring the outcomes that the programs hope to promote. To improve assessment of the impact of USAID programs on their ultimate objectives, it is important to gather data, to the extent possible, on outcome variables. One example gathered in connection with the decentralization program was the percentage of local citizens who rate the quality of local government services as "good" or "very good."

Controls

Most program evaluations involve indicators gathered only or mostly on "treated" units (those groups, individuals, or organizations assisted by USAID). Sometimes this is unavoidable, as when a program works with only one unit or actor (e.g., the Congress). At other times, however, it is possible to find comparison units that would be useful for assessing the impact of U.S. interventions.
Using control groups is invaluable for attributing impact to a USAID program. For example, without a control group it is impossible to know if the change in local party development is a result of a USAID intervention or another factor such as change in national party law, economic growth, or better media coverage.
Gathering outcome measurements on control units need not be prohibitively costly. The cost of modifying the 2003 and 2005 national surveys in Peru conducted by the Latin American Public Opinion Project (LAPOP) to include a sample of residents in control group municipalities would likely have run around $15,000 per survey, a small investment when compared to the $20 million cost of the program over five years.
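The cost comparison above can be made concrete with simple arithmetic. A minimal sketch, using the dollar figures quoted in the text (two survey rounds, 2003 and 2005, at roughly $15,000 each against a $20 million program):

```python
# Sketch of the cost comparison described in the text; all figures are
# the approximate ones quoted there, not exact budget data.
survey_cost = 15_000          # added control-group sample, per survey round
survey_rounds = 2             # the 2003 and 2005 LAPOP national surveys
program_cost = 20_000_000     # decentralization program cost over five years

evaluation_cost = survey_cost * survey_rounds
share = evaluation_cost / program_cost

print(f"Added evaluation cost: ${evaluation_cost:,}")
print(f"Share of program budget: {share:.2%}")
```

On these figures, the added data collection amounts to $30,000, or 0.15 percent of the program budget, which is the sense in which the text calls it a small investment.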
Opportunities for Randomization

Comparisons across units or groups with which USAID partners worked and those with which they did not are only partially informative about the impact of USAID interventions. For example, differences across these groups could reflect preexisting differences and unobserved confounders, rather than the impact of the intervention. Similarly, selection bias could account for the variation in performance between the treatment and control groups.
One of the ways that social scientists sometimes approach this difficulty is through random assignment of units to treatment. In the context of decentralization, for example, the municipalities with which USAID implementers work could be determined by lottery. Subsequent differences between treated and untreated municipalities are likely to be due to the intervention, since other factors will be roughly balanced across the two groups of municipalities.
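The lottery described above can be sketched in a few lines. The municipality names below are hypothetical placeholders, not sites from the report, and the fixed seed is an added assumption so that the draw can be audited and reproduced:

```python
import random

# Hypothetical sketch of assignment-by-lottery: the municipality names
# are invented for illustration, not drawn from the report.
municipalities = [f"Municipality {i}" for i in range(1, 41)]

rng = random.Random(42)                  # fixed seed keeps the lottery auditable
order = rng.sample(municipalities, k=len(municipalities))
treated = set(order[:20])                # USAID implementers work with these
control = set(order[20:])                # left untreated as a comparison group

# Because assignment is random, preexisting differences (wealth, geography,
# party control) are balanced across the two groups in expectation, so
# post-program differences in outcomes can be attributed to the intervention.
```

Publishing the seed (or conducting the draw publicly) is one way such a lottery can be made transparent to local partners as well as evaluators.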
Randomization is not feasible for many kinds of programs, and there can be a range of practical obstacles; yet these are also often surmountable. In addition, experimental designs need not be expensive; additional costs can be offset by savings introduced by appropriate designs.
SAMPLE PROPOSED PROGRAM EVALUATION DESIGNS FROM THREE FIELD VISITS³

³ In addition to the group of selected projects discussed here, several others were analyzed.

Selected Designs from Albania: Rule of Law Programs

A major part of USAID's DG-related activities in Albania involved increasing the effectiveness and fairness of legal sector institutions. With one possible exception, none of these rule of law activities is amenable to randomized evaluation. This is because each deals with either (a) technical assistance to a single unit (e.g., the Inspectorate of the High Council of Justice, the Inspectorate of the Ministry of Justice, the High Inspectorate for the Declaration and Audit of Assets, the Citizen's Advocacy Office, and the National Chamber of Advocates), (b) support for the preparation of a particular piece of legislation (e.g., the Freedom of Information Act and Administrative Procedures Code, a new conflict of interest law, and a new press law), or (c) support for a single activity, such as the implementation of an annual corruption survey. For a randomized evaluation of the efficacy of these activities to be possible, they would have to be, in principle, implementable across a large number of units, which these are not. There is only one Inspectorate of the High Council of Justice, only one conflict of interest law being prepared, and only one National Chamber of Advocates being supported, so it is not possible to compare the impact of support for these activities both where they are and are not being supported, and certainly not across multiple units. The best—indeed, only—way to evaluate the success of these activities is to identify the outcomes they are designed to affect, measure these outcomes both before and after the activities have been undertaken, and compare these measures.
The trick, however, is to find appropriate measures of the outcomes that the activities are designed to affect, and this is frequently far from straightforward. For example, the goal of the technical assistance to the Inspectorates of the High Council of Justice and the Ministry of Justice is to improve the transparency and accountability of the judiciary and to increase public confidence in judicial integrity. The latter can be measured fairly easily using public opinion polls that probe respondents’ trust in the judiciary and perceptions of its integrity (these would be administered before and after the period during which technical assistance was offered, and the results of the polls compared). However, measuring the degree to which the judiciary is transparent and accountable is much more difficult.
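The polling side of such a before-and-after comparison might be tabulated as follows. The question labels and percentages here are invented for illustration; they are not results from any actual Albanian survey:

```python
# Hypothetical before/after comparison of public opinion poll results on the
# judiciary; all percentages are invented placeholders, not real survey data.
# Each value is the share of respondents answering favorably.
before = {"trust_judiciary": 0.31, "judiciary_has_integrity": 0.28}
after = {"trust_judiciary": 0.39, "judiciary_has_integrity": 0.33}

for question in before:
    change = after[question] - before[question]
    print(f"{question}: {before[question]:.0%} -> {after[question]:.0%} "
          f"({change:+.0%} over the assistance period)")
```

Even with clean measurement, attributing any such change to the technical assistance rather than to other contemporaneous events requires the kind of comparison-group reasoning discussed earlier; a simple before-after difference on the treated unit alone is suggestive, not conclusive.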
Part of the problem stems from the fact that transparency and accountability can only be ascertained vis-à-vis an (unknown) set of activities that should be brought to light and an (unknown) level of malfeasance that needs to be addressed. For example, suppose that, following the implementation of the programs designed to support the Inspectorate of the High Council of Justice, we observe that three judges are brought up on charges of corruption. Should this be taken as a sign that the activities worked in generating greater accountability? Compared to a baseline of no prosecutions, the answer is probably yes, to at least some degree.
But knowing just how effective the activities were depends on further unknowns: whether there were only three corrupt judges who should have been prosecuted, or whether there were in fact twenty, in which case prosecuting three only scratched the surface of the problem, or whether the prosecutions were selective, with targets chosen for political reasons. Parallel problems affect other rule of law initiatives, such as efforts to improve the ability of lawyers to police themselves.