Polity provides a total of 21 points if the Democracy and Autocracy scales are merged into the Polity2 variable, which gives the impression of considerable sensitivity. In practice, however, country scores stack up at a few places (notably, −7 for autocracies and +10 for full democracies, the highest possible score), suggesting that the scale is not as sensitive as it purports to be. The EIU index is by far the most sensitive and does not appear to be arbitrarily “bunched.”8 Note that all extant indicators are bounded to some degree and therefore constrained. This means that there is no way to distinguish the quality of democracy among countries that have perfect negative or positive scores. This is fine as long as there really is no difference in the quality of democracy among these countries. Yet the latter assumption is highly questionable. Consider that in 2004, Freedom House assigned the highest score (1) on its Political Rights Index to the following 58 countries: Andorra, Australia, Austria, Bahamas, Barbados, Belgium, Belize, Bulgaria, Canada, Cape Verde, Chile, Costa Rica, Cyprus (Greek), Czech Republic, Denmark, Dominica, Estonia, Finland, France, Germany, Greece, Grenada, Hungary, Iceland, Ireland, Israel, Italy, Japan, Kiribati, Latvia, Liechtenstein, Luxembourg, Malta, Marshall Islands, Mauritius, Micronesia, Nauru, Netherlands, New Zealand, Norway, Palau, Panama, Poland, Portugal, San Marino, Slovakia, Slovenia, South Africa, South Korea, Spain, St. Kitts and Nevis, St. Lucia, Suriname, Sweden, Switzerland, Tuvalu, United Kingdom, United States, and Uruguay.9 Are we really willing to believe that there are no substantial differences in the quality of democracy among these diverse polities?
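The ceiling effect described above can be illustrated with a minimal sketch. It assumes a hypothetical latent "quality of democracy" value that is rounded and clipped to a bounded scale running from −10 to +10 (the range of Polity2). The function name, the latent values, and the clipping rule are illustrative assumptions, not any index's actual coding procedure.

```python
def bounded_score(latent, low=-10, high=10):
    """Round a hypothetical latent value and clip it to a bounded scale."""
    return max(low, min(high, round(latent)))


# Hypothetical latent values for four polities (illustrative only).
latent_quality = {"A": 9.6, "B": 11.2, "C": 14.8, "D": 6.3}
scores = {name: bounded_score(value) for name, value in latent_quality.items()}

# Polities that hit the ceiling receive the same score, so the scale
# cannot rank them against one another.
top_scorers = sorted(name for name, s in scores.items() if s == 10)

print(scores)
print(top_scorers)
```

Under these assumptions, polities A, B, and C differ in their latent values yet are indistinguishable once scored.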

Measurement Errors and Data Coverage

Democracy indicators often suffer from measurement errors and/or missing data.10 Some (e.g., Freedom House) are based largely on expert judgments, which may or may not reflect facts on the ground.11 Some (e.g., Freedom House in the 1970s and 1980s) rely heavily on secondary accounts from a few newspapers such as the New York Times. These accounts may or may not be trustworthy and almost assuredly do not provide comprehensive coverage of the world. Moreover, newspaper accounts suffer from extreme selection bias, depending almost entirely on the location of the newspaper’s reporters. Thus, if the New York Times has a reporter in Mexico but none in Central America, coverage of the latter is going to be much spottier than coverage of the former. In an attempt to improve coverage and sophistication, some indices (e.g., EIU) impute a large quantity of missing data. This is a dubious procedure wherever data coverage is limited, as it seems to be for many of the EIU variables. Note that many of the EIU variables rely on polling data, which are available on a highly irregular basis for 100 or so nation states.

The quality of many of the surveys on which the EIU draws has not been clearly established. This means that data for these questions must be estimated by country experts for all other cases, roughly half of the sample. (The procedures employed for this estimation are not known.) Wherever human judgments are required for coding, one must be concerned about the basis of the respondent’s decisions. In particular, one wonders whether coding decisions about particular topics (e.g., press freedom) may reflect an outside expert’s overall sense of how democratic country A is, rather than an independent evaluation of the question at hand. The committee also worries about the problem of endogeneity of the evaluations, that is, with experts looking more at what other experts and indicators are doing rather than making their own independent evaluation of the country. The intercoder “reliability” may be little more than an artifact of experts accepting other experts’ judgments. In this respect, “disaggregated” indicators are often considerably less disaggregated than they appear. Note that it is the ambiguity of the questionnaires underlying these surveys that fosters this sort of premature aggregation.

9 The precise period in question stretches from December 1, 2003, to November 30, 2004; obtained from http://www.freedomhouse.org/template.cfm?page=&year=00 (accessed on September 21, 2006).

10 For general treatments of the problem of conceptualization and measurement, see Adcock and Collier (2001).

11 With respect to the general problem of expert judgments, see Tetlock (2005), who found that expert opinions tended to reflect more the consensus of the expert community than an objective “truth,” inasmuch as his surveys of experts produced answers that were often, in retrospect, no more accurate than a coin toss.

The committee undertook a limited statistical examination of the Freedom House scores for 2007 on their key components—for political rights this included electoral process, pluralism and participation, and functioning of government; for civil liberties these were freedom of expression, association and organizational rights, rule of law, and personal autonomy and individual rights (see Appendix C). Across all countries, two-way correlations among the seven components were never less than 0.86 and in several cases were 0.95 or greater. This high correlation could imply that democracy is indeed a far “smoother” condition than the “lumpy” view expressed in this study. That is, the high correlation among the items suggests that picking any one is just about as good as picking any other.

Yet the committee doubts the independence of the judgments on each of the components of the scale.
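The committee's doubt can be made concrete with a small simulation. The sketch below assumes, purely for illustration, that seven underlying components are statistically independent and that the score a coder records for each component is mostly an overall impression of the country (weight `halo`) plus a small share of the component itself. All names and parameter values are hypothetical; this is not the committee's analysis or Freedom House's coding procedure.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_countries=200, n_components=7, halo=0.9, seed=0):
    """Return (true, coded) score tables, one row per country."""
    rng = random.Random(seed)
    true_rows = [[rng.uniform(0, 10) for _ in range(n_components)]
                 for _ in range(n_countries)]
    coded_rows = []
    for row in true_rows:
        impression = sum(row) / len(row)  # coder's overall sense of the country
        coded_rows.append([halo * impression + (1 - halo) * v for v in row])
    return true_rows, coded_rows


def pairwise_correlations(rows):
    """All two-way correlations among the columns of a score table."""
    columns = list(zip(*rows))
    return [pearson(a, b) for a, b in combinations(columns, 2)]


true_rows, coded_rows = simulate()
true_r = pairwise_correlations(true_rows)
coded_r = pairwise_correlations(coded_rows)
print("true components:  min %.2f, max %.2f" % (min(true_r), max(true_r)))
print("coded components: min %.2f, max %.2f" % (min(coded_r), max(coded_r)))
```

Under these assumptions the recorded scores are highly intercorrelated even though the underlying components are not, which is the pattern the committee suspects. High correlations alone therefore cannot show whether the components truly move together.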

The EIU democracy scale also is divided into components: civil rights, elections, functioning of government, participation, and culture. Taking the Freedom House and EIU components together, a factor analysis reveals that a single factor explains 83 percent of the variance across all 12 components, and the two principal factors explain 90 percent of the variance (Coppedge 2007). This, by itself, is not problematic; it could be that good/bad things go together; that is, countries that are democratic on one dimension are also democratic on another. However, it raises concern about the actual independence of the various components in these indices.

It could be, in other words, that respondents (either experts or citizens) who are asked about different dimensions of a polity are, in fact, simply reflecting their overall sense of a country’s democratic culture. It also suggests that the various independent components in fact contain no more useful information than the principal one or two factors.

Adding to worries about measurement error is the general absence of intercoder reliability tests as part of the coding procedure. Freedom House does not conduct such tests (or at least does not make them public). Polity does so, but it requires a good deal of hands-on training before coders reach an acceptable level of coding accuracy. This suggests that other coders would not reach the same decisions simply by reading Polity’s coding manual or that artificial uniformity is imposed. And this, in turn, points to a potential problem of conceptual validity: Key concepts are not well matched to the empirical data.

Aggregation

Since democracy is a multifaceted concept, all composite indicators must wrestle with the aggregation problem: how to weight the components of an index and which components to include. For aggregation to be successful, the rules must be clear, operational, and consistent with common notions of what democracy is; that is, the resulting concept must be valid. It goes almost without saying that different solutions to the aggregation problem lead to quite different results (Munck and Verkuilen 2002; for a possible exception to this dictum, see Coppedge and Reinicke 1990).

Although most indicators have fairly explicit aggregation rules, they are often difficult to comprehend, and consequently to apply. They may also include “wild card” elements, allowing the coder free rein to assign a final score that accords with his or her overall impression of a country (e.g., Freedom House). In some cases (e.g., Polity), components are listed separately, which helps clarify the final score a country receives. However, in Polity’s case the components of the index are themselves highly aggregated, so the overall clarity of the indicator is not improved.

Even when aggregation rules are clear and unambiguous, because they bundle a host of diverse dimensions into a single score, it is often unclear which of the dimensions is driving a country’s score in a particular year. It is often difficult to articulate what an index value of “4” means within the context of any single indicator.

Moreover, even if an aggregation rule is explicit and operational, it is never above challenge. The Polity index, in Munck and Verkuilen’s estimation, “is based on an explicit but nonetheless quite convoluted aggregation rule” (2002:26). Indeed, a large number of possible aggregation rules fit, more or less, with everyday concepts of democracy and thus meet the minimum requirements of conceptual validity. For this reason the committee regards the aggregation problem as the only problem that is unsolvable in principle. There will always be disagreement over how to aggregate the various components of “Big D democracy” (i.e., the one central concept that is assumed to summarize a country’s regime status).
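A toy example shows how much can hinge on the choice of rule. The sketch below applies three aggregation rules, each arguably consistent with everyday notions of democracy, to the same invented component scores. The countries, components, scores, and weights are all hypothetical and are not drawn from Polity, Freedom House, or the EIU.

```python
# Hypothetical component scores on a 0-10 scale (illustrative only).
components = {
    "A": {"elections": 9, "civil_liberties": 4, "rule_of_law": 5},
    "B": {"elections": 6, "civil_liberties": 7, "rule_of_law": 6},
    "C": {"elections": 5, "civil_liberties": 6, "rule_of_law": 9},
}


def equal_weights(c):
    """Simple average of all components."""
    return sum(c.values()) / len(c)


def election_heavy(c):
    """Weighted average that privileges elections (weights are assumptions)."""
    return (0.6 * c["elections"]
            + 0.2 * c["civil_liberties"]
            + 0.2 * c["rule_of_law"])


def weakest_link(c):
    """A country is only as democratic as its weakest component."""
    return min(c.values())


def ranking(rule):
    """Country names ordered from highest to lowest aggregate score."""
    return sorted(components, key=lambda name: rule(components[name]),
                  reverse=True)


rankings = {rule.__name__: ranking(rule)
            for rule in (equal_weights, election_heavy, weakest_link)}
for rule_name, order in rankings.items():
    print(rule_name, order)
```

With these invented inputs, each rule places a different country first, even though none of the underlying component scores has changed.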

the leading democracy indices. Granted, intercorrelations among various democracy indicators are moderately high, suggesting some basic agreement over what constitutes a democratic state. As shown in the analysis undertaken for the committee that is summarized in Appendix C, the Polity2 variable (combining Democracy and Autocracy) drawn from the Polity dataset and the Freedom House Political Rights Index are correlated at 0.88 (Pearson’s r). Yet when countries with perfect democracy scores (e.g., the United Kingdom and the United States) are excluded from the samples, this intercorrelation drops to 0.78. And when countries with scores of 1 and 2 on the Freedom House Political Rights scale (the two top scores) are eliminated, the correlation drops further, to 0.63, implying that roughly 60 percent of the variance in one scale (1 − 0.63² ≈ 0.60) is unrelated to changes in the other scale for countries outside the upper tier of democracies.
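How much of the variance one scale shares with the other follows directly from the correlation coefficient: the shared portion is r² and the remainder is 1 − r². A minimal sketch applying that identity to the three correlations reported above (the sample labels are paraphrases, not names taken from Appendix C):

```python
def unexplained_share(r):
    """Share of variance in one scale not accounted for by the other."""
    return 1 - r ** 2


# Correlations between Polity2 and the Freedom House Political Rights
# Index as reported in the text; the labels are shorthand.
reported = {
    "all countries": 0.88,
    "perfect scores excluded": 0.78,
    "Freedom House scores of 1 and 2 excluded": 0.63,
}

for sample, r in reported.items():
    print(f"{sample}: r = {r:.2f}, "
          f"unexplained share = {unexplained_share(r):.2f}")
```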

The committee similarly finds that correlations between the Freedom House and EIU scores are low when the highest-scoring countries are set aside. For a substantial number of countries (Ghana, Niger, Guinea-Bissau, the Central African Republic, Chad, Russia, Cambodia, Haiti, Cuba, and India), the Freedom House and EIU scores differ so widely that they would be considered democratic by one scale but undemocratic by the other. Indeed, country specialists often take issue with the scoring of countries they are familiar with (e.g., Bowman et al. 2005; for more extensive cross-country tests, see Hadenius and Teorell 2005).

Since tracking progress in democracy assistance often depends on accurately measuring modest improvements in democracy, it is particularly distressing that the convergence between different scales is so low in this regard. While the upper “tails” of the distributions on the major indicators (the fully democratic regimes) are highly correlated, the democracy scores for countries in the upper middle to the bottom ranges are not.

The analysis commissioned by the committee (see Appendix C) found that the average correlation between the annual Freedom House and Polity scores for autocratic countries (those with Polity scores less than −6) during 1972-2002 was only 0.274. Among the partially free countries of the former Soviet Union, the correlation between annual Freedom House and Polity scores for the years 1991-2002 was 0.295; for the partially free countries in the Middle East, it was 0.40. In many cases the correlations for specific countries were negative, meaning that the two scales gave opposite measures of whether democracy levels were improving or not.

This is a serious problem for USAID and other donors, since they are generally most concerned with identifying the level of democracy, and degrees of improvement, precisely for those countries lying in the middle and bottom of the distribution—countries that are mainly undemocratic or imperfectly democratic—rather than for countries already at the upper end of the democracy scale.


