Development, Security, and Cooperation. Policy and Global Affairs. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001.
The benefits of easy data collection thus must be balanced against the benefits of data efficiency, coverage, and conceptual validity.
APPENDIX C

Survey Research

A major question is whether to include dimensions that require public opinion surveys. The EIU index includes many questions of this nature, for example, about how legitimate the general public considers the election process to be. ("Democracy assessments" also rely centrally on surveys, though their purpose is usually not comparative [Beetham 2004].) We have opted to include relatively few questions of this nature because (a) it is very expensive to conduct this sort of public opinion polling on a regular basis and across all countries, (b) it is less useful if polling is conducted only in "problem" countries (for then there is no basis for comparison), (c) no such historical information is available, (d) polling questions tend to vary in form or format from country to country and year to year and hence may convey misleading information if used as a cross-national indicator, (e) in nondemocratic countries citizens may not feel free to speak openly, and (f) public perceptions are not the most valid test of a country's level of democracy, even where civil liberties are ensured. (On the latter point, one might consider Mexico's recent election, which many members of the public thought was highly flawed, but which outside observers seem to think was conducted with considerable fairness.)

Data Sources

For contemporary years, obtaining sufficient information to code each new component ought to be fairly easy. Sources such as the Chronicle of Parliamentary Elections and Developments, Keesing's Contemporary Archives, the Journal of Democracy ("Election Watch"), El País (www.elpais.es), the Statesman's Yearbook, the Europa Yearbook, the Political Handbook of the World, reports of the Inter-Parliamentary Union, the ACE Electoral Knowledge Network, Elections Around the World (www.electionworld.org), the International Foundation for Election Systems (www.IFES.org), the Commonwealth Election Law and Observer Group (www.thecommonwealth.org), the OSCE Office for Democratic Institutions and Human Rights (www.osce.org/odihr), the Carter Center (www.cartercenter.org), the International Republican Institute (www.iri.org), the National Democratic Institute (www.ndi.org), the Organization of American States (www.oas.org), country narratives from the annual Freedom House surveys, newspaper reports, and secondary accounts (according to subject and time period) will be invaluable. Given the project's broad theoretical scope and empirical reach, evidence-gathering approaches must be eclectic. Multiple sources will be employed wherever possible in order to cross-validate the accuracy of the underlying data.
Uncertainty

It is vital to include not only an estimate of a country's level of democracy across various dimensions and components but also a level of uncertainty associated with each estimate. This may be arrived at by combining two features of the analysis: (a) intercoder reliability (if available) and (b) subjective uncertainty (the coder's estimate of how accurate a given score might be). Uncertainty estimates serve several functions: scholars may include these estimates as a formal component of their analyses; they provide a signal to policymakers of where the democracy index is most (and least) assured; and they focus attention on ways in which future iterations of the index may be improved.
Finally, uncertainty estimates allow for the inclusion of countries and time periods with vastly different quantities and qualities of data—without compromising the legitimacy of the overall project. As noted, contemporary codings are likely to be associated with lower levels of uncertainty than the analogous historical codings, and countries about which much is known (e.g., France) will be associated with lower levels of uncertainty than countries about which very little is known (e.g., the Central African Republic). Without corresponding estimates of uncertainty, an index becomes hostage to its weakest links; critics gravitate quickly to countries and time periods that are highly suspect, and the validity of the index comes under harsh assault—even if the quality of other data points is more secure. With the systematic use of uncertainty estimates, these very real difficulties are brought directly into view by granting them a formal status. In so doing, the legitimacy of the larger enterprise is enhanced, and misuses are discouraged.
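The proposed combination of intercoder disagreement and subjective uncertainty can be sketched in code. This is a minimal illustration, not part of the proposal itself: the function name, the use of a standard deviation to express each coder's subjective uncertainty, and the root-sum-of-squares combination rule are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev

def combined_estimate(scores, subjective_sds):
    """Combine several coders' scores for one country-year into a
    point estimate plus an overall uncertainty.

    scores: list of scores assigned by different coders.
    subjective_sds: each coder's self-reported standard deviation,
    i.e., how far off they believe their score might be.
    """
    point = mean(scores)
    between = pstdev(scores)  # intercoder disagreement
    within = mean(sd ** 2 for sd in subjective_sds)  # mean subjective variance
    # Hypothetical rule: treat the two sources of error as independent.
    return point, sqrt(between ** 2 + within)

# Three coders rate one country-year on a 1-10 dimension scale.
est, unc = combined_estimate([6, 7, 5], [0.5, 1.0, 1.5])
```

A country-year coded identically by all coders who each report low subjective uncertainty would receive a small `unc`; disagreement or self-doubt inflates it.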
Time

The dataset is assumed to be annual, though it might be coded at longer intervals in earlier historical periods. (One minor question to consider is whether codings should refer to the state of affairs pertaining at the end of the designated period [December 31] or to a mean value across the period of observation [January 1–December 31].) It is strongly urged that the index—or at least some elements of it—be extended back in time, preferably to 1800. There are several reasons for this. First, if one wishes to judge trends, a trend line is necessary. And the longer the trend line, the more information will be available for analysis.
Consider the question of how Ukraine is doing now—for example, in 2008. If a new index provides data only for that year, or several years prior, the meaning of a "5" (on some imagined scale) is difficult to assess.
Similarly, a purely contemporary index is unable to evaluate the question of democratic "waves" occurring at distinct points in historical time (Huntington 1991) or of distinctive "sequences" in the transition process (McFaul 2005). If we wish to judge the accuracy of these hypotheses (and many others), we must have at our disposal a substantial slice of historical time.
Second, insofar as we wish to understand causal relations—what causes democracy and what democracy causes—it is vital to have a long time series so that causes and effects can be effectively disentangled.
(Of course, this does not assure that they will be disentangled; but with observational data it is virtually a prerequisite.) Third, recent work has raised the possibility that democracy's effects are long term, rather than (or in addition to) short term (Gerring et al. 2005, Converse and Kapstein 2006, Persson and Tabellini 2006). Indeed, it is quite possible that the short-term and long-term effects of democracy are quite different (plausibly, long-term effects are more consistent, and more positive across various developmental outcomes, than short-term effects). Consideration of these questions demands a historical coding of the key variable.
For all these reasons, we think it unlikely that any new index would displace Freedom House, Polity, and ACLP unless it can match the historical coverage of these well-established indices.
Summary Scores

For each dimension, a summary score will be suggested. Evidently, this task of aggregation is devilish, for all the reasons just reviewed. Yet it should be considerably easier to solve at this level than at the level of Big-D democracy. Thus, we propose to aggregate the results for each component so as to arrive at a single score for each of the 13 dimensions.
This score will be expressed on a scale from 1 to 10, providing a snapshot view of how each country, in a given year, performs on that dimension.
We feel confident that, with the aid of the underlying components listed in the index below, it will be possible for those knowledgeable about a country to reach agreement on the (approximate) level of national sovereignty, popular sovereignty, and so on enjoyed by that country in a given year. A country's scores along these 13 dimensions constitute its Democracy Profile.
This level of aggregation seems feasible and should be easy to compare across countries and through time. We also believe that this is a useful level of aggregation. It says something meaningful, something that should be understandable to all observers. It will give USAID and other international actors a way of gauging progress and regress; it may even provide a way of gauging the relative success of different programs, though problems of causal attribution are inevitably knotty.
We are considerably less confident that it will be possible to reach agreement in aggregating across the 13 dimensions to reach a single, summary score for each country in a given year—"Big-D" democracy.
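To make the aggregation step concrete, here is one hypothetical way a dimension score might be computed from its component scores. The 0–1 component scale, the optional weighting scheme, and the linear rescaling to the 1–10 range are assumptions made for illustration only; the proposal itself leaves the aggregation rule open.

```python
def dimension_score(component_scores, weights=None):
    """Aggregate component scores (each assumed to lie on a 0-1 scale)
    into a single 1-10 dimension score via a weighted mean.

    weights: optional per-component weights; equal weights by default.
    """
    if weights is None:
        weights = [1.0] * len(component_scores)
    weighted_mean = (
        sum(w * s for w, s in zip(weights, component_scores)) / sum(weights)
    )
    # Linear rescaling from [0, 1] to the 1-10 scale used in the text.
    return 1 + 9 * weighted_mean

# e.g., four components of one dimension (values hypothetical).
score = dimension_score([0.8, 0.6, 0.9, 0.7])
```

Running the 13 dimension scores for a country-year through such a function would yield its Democracy Profile; deliberately, no further aggregation to a single "Big-D" number is performed.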
Logistics

In order to manage a project of this scope without losing touch with the particularities of each case, it is necessary to marry the virtues of cross-national data with the virtues of regional expertise. As currently envisioned, the project relies primarily upon country experts to do the case-by-case coding. Student assistants may be employed in a supporting role (e.g., to fetch data). These coding decisions will be supervised by several regional experts who are permanently attached to the project and who will work to ensure that coding procedures across countries, regions, and time periods are consistent. Extensive discussion and cross-validation will be conducted at all levels, including intercoder reliability tests.
We strongly advise an open and transparent system of commentary on the scores that are proposed for each country, after initial questionnaires are completed by country experts but before results are finalized. This might include a Web-based Wikipedia-style discussion in which interested individuals are encouraged to comment on the scores provisionally assigned to the country or countries that they know well. This commentary might take the form of additional information—perhaps unknown to the country expert—that speaks to the viability of the coding. Or it might take the form of extended discussions about how a particular question applies to the circumstances of that country. Naturally, some cranky participants may be anticipated in such a process. However, the Wikipedia experience suggests that there are many civic-minded individuals, some of them quite sophisticated, who may be interested in engaging in this process and may have a lot to add. At the very least, it may provide further information upon which to base estimates of uncertainty (as discussed above). Final decisions, in any case, would be left to a larger committee.
Evidently, different components will involve different sorts of judgments and different levels of difficulty. Some issues are harder than others, and will require more codings and recodings. As a general principle, wherever low intercoder reliability persists for a given question, that question should be reexamined and, if possible, reformulated.
It is important that the process of revision be continual. Even after the completed dataset is posted, users should be encouraged to contribute suggestions for revision and these suggestions should be systematically reviewed.
Pilot Tests

Before USAID, or any agency, undertakes a commitment to develop and maintain a new democracy index, it is important that it be confident of the yield. Thus, we recommend several interim tests of a "pilot" nature.
One of the principal claims of this index is that greater intercoder reliability will be achieved when the concept of democracy is disaggregated. This claim may be probed through intercoder reliability tests across the leading democracy indices. A pilot test of this nature might be conducted in the following manner: Train the same set of coders to code all countries (or a subset of countries) in a given year according to guidelines provided by Freedom House, Polity, and the present index.
Each country-year would receive several codings by different coders, thus providing the basis for an intercoder reliability test. These would then be compared across indices. Since the coders would remain the same, varying levels of intercoder reliability should be illustrative of basic differences in the performance of the indices. Of course, there are certain methodological obstacles to any study of this sort. One must decide how much training to provide to the coders, and how much time to give them.
One must decide whether to employ a few coders to cover all countries, or have separate coders for each country. One must decide whether to hire “naïve” coders (e.g., students) or coders well versed in the countries and regions they are assigned to code (the “country expert” model). In any case, we think the exercise worthwhile, not only because it provides an initial test of the present index but also because it may bring a level of rigor to a topic—political indicators—that has languished for many years in a highly unsatisfactory state.
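The pilot comparison described above turns on measuring intercoder reliability. As a rough illustration (and only that), a simple pairwise-agreement rate could be computed per index from the same coders' scores; the data structure and the agreement criterion here are assumptions, and a real study would likely prefer a formal statistic such as Krippendorff's alpha.

```python
from itertools import combinations

def mean_pairwise_agreement(codings, tolerance=0):
    """Share of coder pairs whose scores for the same country-year agree
    within `tolerance` points, averaged over all country-years.

    codings: {case_id: [score by coder 1, score by coder 2, ...]}
    """
    rates = []
    for scores in codings.values():
        pairs = list(combinations(scores, 2))
        agree = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
        rates.append(agree / len(pairs))
    return sum(rates) / len(rates)

# Hypothetical pilot: three coders score two country-years under one index.
r = mean_pairwise_agreement({"Ukraine-2008": [5, 5, 6], "France-2008": [9, 9, 9]})
```

Computing the same statistic from identical coders working under Freedom House, Polity, and the present index's guidelines would make the cross-index comparison the text proposes.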