Development, Security, and Cooperation; Policy and Global Affairs. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001.
The current M&E plan for the project involves a participatory evaluation, primarily an analysis of survey data on whether respondents thought the projects “were helpful or very helpful,” supplemented by discussions with recipient organizations. A major limitation of this approach is the lack of a comparison group; data were collected only from groups or citizens who received USAID support (i.e., that were “treated”), and no effort was made to collect additional data from groups or citizens who did not receive USAID support (i.e., that could serve as a “control”). Any changes identified in the data and attributed to the project might just as easily have been caused by confounding trends that happened to be taking place at the same time and that affected all communities (the project was implemented during an election period, so the more general effects of electoral mobilization cannot be ruled out as an alternative explanation for the observed changes in lobbying activism). Even with a small N, an impact evaluation (as opposed to the current M&E approach) that tracks trends both before and after a program is implemented, and that explicitly identifies untreated units for which comparable outcomes can be measured, would provide much greater confidence in any inferences about the project’s actual effects.
If there are large amounts of data, the techniques described earlier (propensity score matching, regression discontinuity) can be employed.
In this context, however, there is no substitute for careful, qualitatively matched comparisons. For example, if three districts were selected in which to implement the program, the evaluator would need to identify three additional districts that are similar on a set of variables believed to be associated with the targeted outcomes (e.g., income, government capacity, infrastructure). More qualitative approaches mirror the logic underlying the quantitative techniques—the goal is to identify a relevant counterfactual in order to distinguish the impact of the program from spatial or temporal trends that, while outside the ambit of the DG assistance program, could influence outcomes in the areas being observed.
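The matching logic just described can be sketched in code. The following is a minimal illustration, with hypothetical district names and covariate values (not data from any actual program): each treated district is paired with the untreated district closest to it in standardized covariate space.

```python
# Sketch: selecting comparison districts by nearest-neighbor matching on
# covariates believed to be associated with the targeted outcomes. All
# district names and covariate values below are hypothetical.

def standardize(values):
    """Scale a list of numbers to mean 0, sd 1 so no covariate dominates."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

def match_controls(districts, covariates, treated):
    """For each treated district, return the closest untreated district
    in standardized covariate space (squared Euclidean distance)."""
    cols = list(zip(*[standardize([d[c] for d in districts.values()])
                      for c in covariates]))
    z = dict(zip(districts, cols))          # name -> standardized covariates
    pool = [name for name in districts if name not in treated]
    matches = {}
    for t in treated:
        matches[t] = min(pool,
                         key=lambda u: sum((a - b) ** 2
                                           for a, b in zip(z[t], z[u])))
    return matches

# Hypothetical data: income index, government capacity score, infrastructure.
districts = {
    "A": {"income": 40, "capacity": 3, "infra": 2},
    "B": {"income": 55, "capacity": 5, "infra": 4},
    "C": {"income": 70, "capacity": 7, "infra": 6},
    "D": {"income": 42, "capacity": 3, "infra": 2},
    "E": {"income": 57, "capacity": 5, "infra": 4},
    "F": {"income": 72, "capacity": 6, "infra": 6},
}
print(match_controls(districts, ["income", "capacity", "infra"], ["A", "B", "C"]))
# → {'A': 'D', 'B': 'E', 'C': 'F'}
```

In practice the evaluator would add qualitative judgment on top of such a distance calculation, but the sketch captures the underlying counterfactual logic.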
The measurement strategy in the existing M&E plan could also be significantly improved. The use of subjective assessments of activities by their participants raises two concerns: (1) such assessments are subjective rather than objective, and (2) the satisfaction of participants (parties, CSOs, etc.) is not necessarily the same thing as project success and thus cannot provide reliable information about a project’s impact. So one major area where improvement would be possible is to provide additional external or objective measurements of program success (e.g., how much more funding for help for the disabled was actually granted to districts where CSOs received USAID assistance than was granted to otherwise comparable districts?).
Building the Capacity of the Parliament in Uganda

Another example is the bundle of USAID-sponsored activities designed to build the capacity of the Ugandan parliament through the sponsorship of field visits, public dialogues, and consultative workshops for members of parliament and parliamentary committee staff on specific issues such as corruption, family planning/reproductive health, and people with disabilities. The project sponsored fact-finding, monitoring, and supervisory field visits to 35 districts, including a number in Northern Uganda, where many members of parliament and parliamentary staff rarely venture. Again, the goals of the project are worthy and the activities appear to be well conceived; however, the project is not amenable to randomized evaluation. How can it be known whether the money spent on project activities had any demonstrable positive effect? Did members of parliament who participated in these activities behave differently from those who did not?
As is often the case with such projects, the principal monitoring method for these activities involved the collection of quarterly data on “outputs” (i.e., the number of public meetings attended by parliamentary committee members at the local level, the number of CSOs submitting written comments to parliamentary committee hearings, etc.) rather than “outcomes” (such as the impact that workshop attendance had on information acquisition, job performance, or other aspects of future behavior).
The reports submitted by the implementing contractor also provide little information on how the locations for the various public meetings, or the participants invited to attend, were selected, both of which are crucial for ruling out selection effects. The indicators measured by the contractor as part of the project’s performance measurement plan were simply taken as measures of project success.
However, because of the absence of a control group, it is impossible to disentangle time-varying unobserved trends from the impact of the project. For example, it is difficult to conclude that an increase in the number of parliamentary committees responding to CSOs with briefings and dialogues is an indication of project success. Such a change could reflect other (local) dynamics, the impact of other donor programs, the impact of the
project of interest, or a combination of these. Similarly, in the absence of a counterfactual, the fact that the Persons with Disabilities Act was passed and enacted without executive initiative or support cannot be assumed to reflect the impact of the project.
As with the projects described previously, an evaluation design that furnishes more information for assessing impact than the current M&E approach is possible. First, assessing the impact of these initiatives would require some measurement of outcomes among a control group of members of parliament who were not exposed to the field visits, public dialogues, and consultative workshops. With the intervention defined so broadly, identifying a control group may be too difficult. By focusing on a narrower set of activities, such as the opportunity for members of parliament to participate in field visits or in facilitated consultative meetings between parliamentary committees and their constituencies, envisioning a reasonable control group becomes more feasible. For example, if not all members of parliament will participate in field visits, one simply needs to understand the selection process (and the differences between participants and nonparticipants) in order to rule out characteristics correlated with participation that might account for any observed differences after the field visits (e.g., members of parliament already engaged with the conflict electing to take part in a field visit to Northern Uganda). It might also be possible to facilitate a series of consultative meetings for one committee at a time and to compare behavioral changes in that committee with those in similar committees that had not yet benefited from the program.
In terms of the measurement of impact, one simple improvement would be to interview members of parliament about their actions and opinions rather than their perceptions of the usefulness of program activities. For example, instead of (or in addition to) asking, “If you participated or were aware of these activities, how useful were they in helping to generate government action on the problems in Northern Uganda?” (the current questionnaire item), a better approach would be to ask members of parliament, both before and after the program, about their opinions on the conflict in Northern Uganda, what they thought should be done, and any action they had taken or intended to take. Questions aimed at measuring precisely what actions, if any, members of parliament or parliamentary committees took following the field visits would give a better sense of the effects on behavior. If these questions were asked of both participants and nonparticipants, analysis of the differences between “treatment” and “control” members of parliament would be possible.
Even if these questions were asked only of participants but both before the intervention and afterward, analysis of the changes in the opinions and actions of “treatment” members of parliament would be possible.
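Taken together, the participant/nonparticipant and before/after comparisons support a simple difference-in-differences calculation. Here is a minimal sketch; the outcome (constituency actions taken per member of parliament) and all figures are hypothetical.

```python
# Sketch: difference-in-differences on a behavioral outcome, e.g., the number
# of constituency actions taken by each member of parliament. "Treatment"
# members participated in the field visits; all numbers are hypothetical.

def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change among participants minus change among nonparticipants."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(control_post) - mean(control_pre))

treat_pre = [1, 2, 1, 0]     # actions per participating MP before the visits
treat_post = [3, 4, 2, 3]    # actions per participating MP after
control_pre = [1, 1, 2, 0]   # nonparticipants, same two periods
control_post = [1, 2, 2, 1]

print(diff_in_diff(treat_pre, treat_post, control_pre, control_post))
# → 1.5
```

Subtracting the control group’s change nets out trends (such as electoral mobilization) that affect participants and nonparticipants alike, which is precisely what the before/after-only comparison cannot do.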
The advantage of this type of evaluation design is that it permits analysis of changes or differences in members’ actual opinions and actions rather than their subjective assessment of the “usefulness” of programs. In addition, collecting information on the basic characteristics of those members who participated and those who did not would allow some statistical matching of the two groups to better determine how much the USAID DG program, as opposed to other prior characteristics of the members, contributed to any observed differences between the two groups in their subsequent actions and opinions.
As with the two other projects described earlier, implementing the proposed changes involves trade-offs, but the team concluded that, if USAID wished to learn more about the precise effectiveness of these programs, there is substantial opportunity to develop impact evaluations on these activities, even without using randomized designs.
WHAT TO DO WHEN THERE IS ONLY ONE UNIT OF ANALYSIS

Many USAID projects involve interventions designed to affect a single unit of analysis. Such interventions are among the most important DG-promoting activities that the agency underwrites. But for the reasons explained in Chapter 5, they are also among the most difficult to evaluate.
For example, a major part of USAID’s DG-related activities in Albania involves increasing the effectiveness and fairness of legal-sector institutions. While critically important to the mission’s goals, almost none of the rule-of-law activities are amenable to randomized evaluation or other methods that exploit comparisons with untreated units. This is because each involves (1) technical assistance to a single bureaucracy (e.g., the Inspectorate of the High Council of Justice, the Inspectorate of the Ministry of Justice, the High Inspectorate for the Declaration and Audit of Assets, the Citizens Advocacy Office, and the National Chamber of Advocates); (2) support for the preparation of a particular piece of legislation (e.g., the Freedom of Information Act and Administrative Procedures Code, a new conflict-of-interest law, and a new press law); or (3) support for a single activity, such as implementation of an annual corruption survey. For a randomized evaluation of these activities to be possible, they would have to be implementable, at least in principle, across a large number of units, which these are not. There is only one Inspectorate of the High Council of Justice, only one conflict-of-interest law being prepared, and only one National Chamber of Advocates being supported, so it is not possible to compare the impact of support for these activities where they are and are not being supported, and certainly not across multiple units. The best way to evaluate the success of such activities is to identify the outcomes they are designed to affect, measure those outcomes both before and after the activities have been undertaken, and compare the measures. Collecting high-quality baseline and follow-up data, the former stretching back as far in time as possible, is the primary tool for impact evaluation in such a situation. When outcome data show a marked shift after an intervention, and an examination of other possible events or trends turns up nothing that coincides with the shift, a credible case can be made for the intervention’s impact.

10. This section and the next one draw on the work of a team led by Dan Posner, University
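The before-and-after logic for a single unit can be made concrete as a simple interrupted time series: fit the pre-intervention trend, project it forward, and measure how far post-intervention observations depart from that projection. The following is a minimal sketch; the outcome series (say, an inspectorate’s annual case-processing rate) is entirely hypothetical.

```python
# Sketch: single-unit evaluation via an interrupted time series. Fit the
# pre-intervention trend by ordinary least squares, project it forward, and
# report the average gap between observed post-intervention values and the
# projection. All outcome figures below are hypothetical.

def fit_trend(ys):
    """OLS slope and intercept for y observed at t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, y_mean - slope * t_mean

def post_shift(baseline, post):
    """Average gap between observed post-intervention outcomes and the
    forward projection of the baseline trend."""
    slope, intercept = fit_trend(baseline)
    gaps = [y - (intercept + slope * (len(baseline) + i))
            for i, y in enumerate(post)]
    return sum(gaps) / len(gaps)

baseline = [10, 12, 14, 16]   # outcome in the years before the intervention
post = [24, 26]               # observed outcome in the years afterward

print(round(post_shift(baseline, post), 1))
# → 6.0
```

A large gap is only suggestive, of course; as the text notes, the case becomes credible when no other contemporaneous event or trend coincides with the shift.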