If there is little agreement on the quality and direction of democracy in countries that lie in between the extremes, it must be concluded that there is relatively little convergent validity across the most widely used democracy indicators. That is, whatever their intent, they are not in fact capturing the same concept.

By way of conclusion to this very short review of extant indicators, the committee quotes from another recent review by Jim Vermillion, current executive vice president of the International Foundation for Election


Initial work in the measurement of democracy has provided some excellent insights into specific measures and has helped enlighten our view of where underlying concepts related to democracy stand. However, we are far from coming up with a uniform, theoretically cohesive definition of the construct of democracy and its evolution that lends itself easily to statistical estimation/manipulation and meaningful hypothesis testing.

(Vermillion 2006:30)

The need for a new approach to this ongoing, and very troublesome, problem of conceptualization and measurement is apparent.

Average versus Country-Specific Results

It is reasonable to ask, if the existing indicators of democracy have so many problems, how can the committee have any confidence in the findings mentioned in Chapter 1, such as that the number of democracies in the world is rising and that USAID DG assistance has, on average, made a significant positive difference in democracy levels? For that matter, how is it possible for scholars to have undertaken more than two decades of quantitative research on democracy and democratization, correlating various causal factors with shifts in these democracy indicators, with any belief in the validity of their research?

The answer to both questions lies in the very different purposes that democracy indicators must serve for scholarly analysis of average or overall global trends, as against the purposes they must serve to support policy analysis of trends in specific countries. For the former purpose it is acceptable for democracy data to have substantial errors regarding levels of democracy in particular states, as long as the errors are not systematically biased. That is, even a democracy scale that makes substantial errors will be useful for looking at average trends as long as its score for any given country is equally likely to be “too high” or “too low.” Such a scale will state the level of democracy as too high in about half the world’s countries and too low in the other half, but the average level of global democracy overall will be approximately correct, and scholars can use statistical methods to “separate out” the random errors from the overall trends.
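The claim that unbiased country-level errors wash out of a global average can be checked with a small simulation. The following is a minimal sketch, not the method used in any study cited here: the 0-10 scale, the number of countries, the size of the error, and the function name `simulate_unbiased_errors` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_unbiased_errors(n_countries=150, noise_sd=2.0, seed=0):
    """Compare per-country error with error in the global average.

    All values are hypothetical: a made-up 0-10 democracy scale and
    zero-mean Gaussian measurement error.
    """
    rng = random.Random(seed)
    true_levels = [rng.uniform(0, 10) for _ in range(n_countries)]
    # Zero-mean error: each score is equally likely to be too high or too low.
    errors = [rng.gauss(0, noise_sd) for _ in range(n_countries)]
    measured = [t + e for t, e in zip(true_levels, errors)]

    return {
        "error_in_global_average": statistics.mean(measured)
        - statistics.mean(true_levels),
        "typical_country_error": statistics.mean([abs(e) for e in errors]),
        "share_scored_too_high": sum(e > 0 for e in errors) / n_countries,
    }


if __name__ == "__main__":
    for name, value in simulate_unbiased_errors().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the error in the global average should come out much smaller than the typical error for an individual country. That is the sense in which such a scale can support analysis of average trends without supporting judgments about any one country. If the errors had a nonzero mean (systematic bias), the global average would shift by roughly that amount.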

Statistical analyses of democracy that use extant indicators such as Polity or Freedom House are looking for the overall or average effects of various factors (such as economic growth, democracy assistance, or regime types) on democracy. Thus the Finkel et al studies (2007, 2008) described above, which demonstrate a positive impact of various forms of democracy assistance on average levels of democracy while statistically controlling for a host of background, trend, and other causal variables, also controlled for measurement errors in the democracy indices that were assumed to be evenly distributed across countries. What their results tell us is something like the following: In any four-year period, if three countries are examined in which USAID invested an average of $10 million per country per year in DG assistance, those countries’ Freedom House scores will show an overall increase of three points (an average increase of one point per country) at the end of those four years relative to what would have been expected in the absence of USAID DG assistance.12 Let us accept this finding as the best available estimate of the truth (this study has been subjected to careful peer criticism and its results stand up well): on average, DG programs do achieve positive results.

Yet such measures are not helpful, indeed can even be misleading, if used to evaluate the effects of DG programming in particular countries.

For example, say that USAID spends $10 million on various DG programs in each of three countries. Say also that a valid and accurate democracy scale (assuming we are able to set aside the effects of any other factors on levels of democracy) would show that such programs led country 1 to increase by two points on this democracy scale and country 2 to increase by one point, while country 3 saw no change. USAID assistance programs thus achieved substantial success in one case, modest success in another, and no effect in the last.

However, the flawed indicator we have instead records that country 1 increased by three points and country 2 decreased by one point, while country 3 increased by one point. On average, this is exactly the same result: overall scores in these countries increased by a total of three points (or an average of one point per country) over four years. Yet if USAID relies on this flawed indicator to estimate the impact of its efforts in specific countries, it will be considerably off. It will greatly overestimate the success of its programs in countries 1 and 3 and wrongly conclude that its programs were associated with a decline in democracy in country 2, all of this just because of random errors in the way that current democracy indicators track small movements or middle-range levels of democracy in particular countries. If USAID were then to ramp up and spread the program in country 1, thinking it an overwhelming (rather than modest) success, and also spread the programs in country 3 that “seemed” to produce a success, while halting the programs that apparently failed to stem democracy decline in country 2, it could be making severe mistakes. Thus the errors found in current widely used democracy indicators, while still allowing them to serve well enough for purposes of scholarly research on average effects of various factors on democracy or for charting overall democracy trends, do not serve USAID at all well for the policy purposes of determining the effects of specific programs in particular countries.13 For this reason the rest of this chapter lays out an approach that the committee believes will be more fruitful for developing useful indicators of democratic change. Also for this reason, throughout this report methods are stressed for helping USAID determine the effects of its programs using more concrete indicators of the immediate policy outcomes of those programs, rather than macrolevel indicators of national levels of democracy.

12 Finkel et al (2007, 2008) found essentially the same results with Polity scores as Freedom
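The arithmetic of the three-country example can be laid out explicitly. This is only a restatement of the numbers given in the text; the variable names and the helper `average` are illustrative assumptions.

```python
# True and recorded four-year changes from the three-country example in the text.
true_change = {"country 1": 2, "country 2": 1, "country 3": 0}
recorded_change = {"country 1": 3, "country 2": -1, "country 3": 1}


def average(changes):
    """Mean change in democracy score across countries."""
    return sum(changes.values()) / len(changes)


# Per-country measurement error: recorded change minus true change.
errors = {c: recorded_change[c] - true_change[c] for c in true_change}

if __name__ == "__main__":
    print("average true change:", average(true_change))
    print("average recorded change:", average(recorded_change))
    for country, err in errors.items():
        print(f"{country}: recorded change is off by {err:+d}")
```

Both averages equal one point per country, because the per-country errors (+1, -2, +1) sum to zero. The same errors are what make the indicator misleading when each country is read on its own.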



Given the multiple difficulties encountered by Freedom House, Polity, ACLP, EIU, and other extant indicators of democracy, one might reasonably conclude that the stated task simply cannot be accomplished. That is, one cannot assign a single point score to a particular country at a particular point in time, expecting that this score will accurately capture all the nuances of democracy and be empirically valid through time and across space. The goal of precise numerical comparison is impossible.

While this conclusion may seem compelling, at least initially, one should also consider the costs of not comparing in a systematic fashion.

Without some way of analyzing the quality of democracy through time and across countries, there is no way to mark progress or regress on this vital factor, to explain it, or to affect its future course. To gain knowledge of the world, and hence to make effective policy interventions, comparisons must be made. And to compare with precision, numerical scores must be assigned to countries according to the quality of democracy they supposedly possess.14 How, then, given the shortcomings of extant democracy indices, might this difficult task be handled more effectively?

13 As discussed in Chapter 4, when scholars undertake case studies of democratization in a particular country, they generally do not bother with indicators such as Polity or Freedom House to describe trends in that country, but instead focus on institutional or behavioral changes that they document in detail and seek the causes or consequences of those observed changes.

The committee proposes that the key to developing a more accurate and useful empirical approach to democracy—as to other large and unwieldy subjects (e.g., “governance”)—is to be found in greater disaggregation (Coppedge, forthcoming). Rather than focusing on how, precisely, to define democracy and attempting to arrive at a summary score (à la Freedom House or Polity), the committee proposes to focus on developing the most transparent, independent, and valid measures for the underlying dimensions of this concept. The key point is that this approach to data gathering takes place at a much lower level of abstraction than Big D democracy.

Previous Efforts at Disaggregation

The idea of disaggregating measures of democracy and governance is of course not entirely new. As mentioned, the Polity IV dataset includes six component variables, each measured separately. Other precedents include the Handbook of Democracy and Governance Program Indicators (USAID 1998), the Bertelsmann Transformation Index (Bertelsmann Foundation 2003), the Database of Political Institutions (Beck et al 2000), the EIU index (Kekic 2007), and the World Bank governance indicators (Kaufmann et al 2006).

In some areas, for example free press (Freedom House 2006) or elections (Munck 2006), disaggregated topics have been successfully measured on a global scale. In these and other instances, the committee suggests building on, or simply incorporating, previous efforts. However, the usual approach to disaggregation is flawed for one or both of two reasons: the resulting indicators are still highly abstract and hence difficult to operationalize (e.g., Polity IV), or the underlying components, while conceptually distinct, are gathered in such a way as to compromise their independence.

Consider the six World Bank governance indicators (government effectiveness, voice and accountability, control of corruption, rule of law, regulatory burden, and political instability), which involve very similar underlying components (Landman 2003, Kurtz and Schrank 2007, Thomas 2007). Issues of corruption, for example, figure in several of the six dimensions. It seems likely that overall perceptions on the part of survey respondents (whether expert or civilian) as to “how country A is doing” color many of the survey responses on which these indicators depend, insofar as survey questions tend to be quite broad. This sort of disaggregation does not achieve the intended purpose. Indeed, it is often argued that the six Kaufmann variables are best regarded as measures of the same thing, and they are therefore often combined in empirical analyses.

14 To some the assignment of a point score may seem a prime example of misplaced preci

A similar problem besets other efforts at disaggregation, such as the recently released Freedom House measures of civil liberties and political rights, which are broken down into seven components: electoral process, political pluralism and participation, functioning of government, freedom of expression and belief, associational and organizational rights, rule of law, and personal autonomy and individual rights (Freedom House 2007).

Again, the extremely high correlations among these components (.87 on all paired comparisons; see Appendix C), along with the vagueness of the questions and coding procedures, prompt us to wonder whether these are truly independent measures of democracy, or simply different ways of accessing a country’s overall gestalt.
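The worry about independence can be illustrated with synthetic data. The sketch below is purely hypothetical and uses no Freedom House data: it assumes each country has a single overall impression score and that each of seven components is that score plus a small independent coding error. The function names, scale, and noise level are assumptions made for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_component_correlations(n_countries=100, n_components=7,
                                    noise_sd=0.5, seed=0):
    """Pairwise correlations when every component mostly reflects one impression.

    Hypothetical setup: one overall impression score per country, and each
    component equals that score plus a small independent coding error.
    """
    rng = random.Random(seed)
    impression = [rng.uniform(0, 10) for _ in range(n_countries)]
    components = [
        [score + rng.gauss(0, noise_sd) for score in impression]
        for _ in range(n_components)
    ]
    # One correlation for each unordered pair of components.
    return [pearson(a, b) for a, b in itertools.combinations(components, 2)]


if __name__ == "__main__":
    correlations = simulate_component_correlations()
    print(f"pairs: {len(correlations)}, lowest: {min(correlations):.2f}")
```

In this setup every pair of components correlates very highly even though the components carry different labels. The sketch does not show that the actual components are generated this way. It shows only that uniformly high pairwise correlations are what one would expect if coders were largely recording a single overall impression.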

