COMPARABILITY OF DATA: BRFSS 2003
The BRFSS is a cross-sectional surveillance survey currently involving 54 reporting areas (1). BRFSS
questionnaires, data, and reports are available on the Internet at www.cdc.gov/brfss. Any survey has natural variation across sample sites, so some variation between states is to be expected. The complex sample design and the multiple reporting areas complicate the analysis of
the BRFSS. Although CDC works with the states to minimize deviations, in 2003 there were some deviations in sampling and weighting protocols, sample size, response rates, and collection or processing procedures. In addition, California’s questionnaire had a few minor differences in wording of questions.
The following section identifies other known variations for the 2003 data year.
A. 2003 Data Anomalies and Deviations from Sampling Frame and Weighting Protocols
In 75% of the states, a portion of sample records intended for use during one month took more than one month to complete. In several instances, states used their monthly sample over a period of several months. This deviation will disproportionately affect analyses based on monthly, rather than annual, data.
Additionally, Michigan received its sample quarterly rather than monthly.
Several states did not collect data for all 12 months of the year. New Jersey did not report any interviews in July. The District of Columbia did not complete any interviews in May, June, July, and August. New Mexico did not complete any interviews in October and Ohio did not complete interviews in July and August.
Several states were unable to close out the December sample in 2003, and data collection continued into early 2004. Illinois, Kentucky, Nevada, New Mexico, Ohio, Oklahoma, Utah, and Wisconsin had some completed interviews in January 2004. Hawaii completed some interviews in January and February 2004.
More information about the quality of the survey data can be found in the 2003 BRFSS Summary Data Quality Report.
B. Other Limitations of the 2003 Data
Telephone coverage varies by state and by subpopulation. Telephone coverage averages 97.6% for U.S. states as a whole, but noncoverage ranges from 1.1% in Connecticut and New Hampshire to 6.6% in Mississippi. An estimated 23.8% of households in Puerto Rico are without telephone service. Data on telephone coverage in U.S. households are available at http://factfinder.census.gov.
Illinois used a dual questionnaire and collected data on core items addressing health status, health care access, exercise, diabetes, hypertension and cholesterol awareness, asthma, immunization, tobacco use, alcohol consumption, physical activity, and demographics from all eligible respondents. Questions about fruit and vegetable consumption, weight control, excess sun exposure, arthritis, falls, disability, veteran’s status, and HIV/AIDS were asked of about half of eligible respondents.
California modified the wording of core questions addressing health plans, diabetes, frequency of alcohol consumption, Hispanic ethnicity, level of education, and household income. The data from these questions may therefore have limited comparability to those of other reporting areas.
Data from an optional module are included if the module was asked of all eligible respondents within a state for the entire data collection year. A state may have indicated use of an optional module in 2003, but if the module was not asked of all eligible respondents, its data may have been moved into the state-added questions.
A change in 2002 to the final disposition codes has continued to present some inconsistencies in closing out the questionnaire. Prior to 2002, interviews that were terminated during or after the demographics section were coded as complete interviews, and any remaining unanswered questions were coded as refused by the interviewer. In 2002, a revised procedure was implemented for handling partial completes.
The revised procedure for partial completes is to stop coding questions at the point of interview termination to assign the appropriate disposition code. However, states have not consistently followed the procedure. During 2003, states generally handled partial complete interviews in one of three ways: they 1) coded the remaining questions as refused and coded the record a 110 Complete, 2) coded the remaining questions as refused and coded the record a 120 Partial Complete, or 3) did not ask the remaining questions (answers left as missing) and coded the record a 120 Partial Complete. The variability in how the interviews are dispositioned and where in the survey the interview was terminated will have an impact on refusal rates for certain questions and modules. These inconsistencies should be taken into account when determining which records to include in an analysis. Records with a termination in the questionnaire followed by coded refusals for the remainder of the eligible responses have been dispositioned as 120 Partial Completes.
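The inclusion decision described above can be sketched in a few lines of Python. This is an illustrative sketch only: the field name "dispcode" and the record layout are assumptions, not the official BRFSS file layout, though 110 (Complete) and 120 (Partial Complete) are the disposition codes named in the text.

```python
# Hypothetical sketch: separating complete and partial-complete records by
# final disposition code before analysis. 110 = Complete, 120 = Partial
# Complete. The "dispcode" field name and record structure are illustrative.

records = [
    {"id": 1, "dispcode": 110},  # full interview
    {"id": 2, "dispcode": 120},  # terminated partway through the survey
    {"id": 3, "dispcode": 110},
]

completes = [r for r in records if r["dispcode"] == 110]
partials = [r for r in records if r["dispcode"] == 120]

# An analyst must decide whether partials belong in the denominator for a
# given question, since their later sections may be refused or missing.
analysis_set = completes + partials
```

Whether to include the 120s depends on where in the instrument the question of interest falls relative to the termination point, which is exactly the inconsistency the text warns about.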
Another issue regarding partial completes is the inappropriate coding of the remaining questions as “refused” (i.e., ‘9’) when some of these questions may have valid response codes of greater than ‘9.’ For example, some questions allow responses of 01-76, 77, 88, and 99 (with 99 as the refusal code). The problem occurs when an interviewer incorrectly codes the remaining questions as refused and enters a ‘9’ instead of a ‘99’ for these question response types. Nine (9) is a valid response for these particular questions and should not have been used to indicate refusal; doing so may have altered which questions were coded as refused for the remainder of a core section or module. When reviewing responses to a partial complete, data users should therefore be aware that a core section or module that follows the demographics section may contain questions incorrectly coded as refused (‘9 filled’).
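A data user can screen for this miscoding pattern mechanically. The sketch below is a hedged illustration: the question names, the set of wide-range questions, and the termination index are invented for the example, but the logic follows the text, namely that a literal 9 is a valid substantive answer for questions whose refusal code is 99, so a 9 recorded after the termination point is suspect.

```python
# Hypothetical sketch: flagging possibly miscoded refusals in a partial
# complete. For questions whose valid codes run 01-76 (77 = don't know,
# 88 = none, 99 = refused), a value of 9 is a valid substantive answer,
# so a 9 appearing after the interview's termination point may be an
# interviewer-entered refusal. All names here are illustrative.

WIDE_RANGE_QUESTIONS = {"q_days_drank", "q_days_exercised"}

def flag_suspect_refusals(answers, terminated_at):
    """Return names of wide-range questions at or after the termination
    point whose recorded answer is 9 (possibly a miscoded refusal)."""
    suspects = []
    for i, (question, value) in enumerate(answers):
        if i >= terminated_at and question in WIDE_RANGE_QUESTIONS and value == 9:
            suspects.append(question)
    return suspects

answers = [("q_genhealth", 2), ("q_days_drank", 9), ("q_days_exercised", 9)]
print(flag_suspect_refusals(answers, terminated_at=1))
# → ['q_days_drank', 'q_days_exercised']
```

Flagged values cannot be automatically corrected, since a genuine answer of 9 is indistinguishable from a miscoded refusal; the flag simply identifies records needing case-by-case review.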
Several states continue to ask the Diabetes module questions directly after the diabetes questions in the core of the survey. In addition, several states ask the Adult Asthma module questions after the asthma questions in the core. Some states have also asked the Childhood Asthma module questions in the demographics section of the core survey, after question 6 (CHILDREN), the number of children under age 18 in the household.
More information about survey item nonresponse can be found in the 2003 BRFSS Summary Data Quality Report and in the respective states’ Data Quality Reports.
STATISTICAL AND ANALYTIC ISSUES
Estimation Procedures
Unweighted data on the BRFSS represent the actual responses of each respondent, before any adjustment is made for variation in respondents’ probability of selection, disproportionate selection of population subgroups relative to the state’s population distribution, or nonresponse. Weighted BRFSS data represent results that have been adjusted to compensate for these issues. Irrespective of state sample design, use of the final weight in analysis is necessary if generalizations are to be made from the sample to the population.
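The difference between unweighted and weighted estimates can be illustrated with a minimal Python sketch. The field names ("weight", "smoker") and the toy values are invented for the example and are not BRFSS variable names.

```python
# Minimal sketch of weighted vs. unweighted prevalence. Each record carries
# a final weight (the number of adults in the population the respondent
# represents) and a 0/1 risk-factor indicator. Field names are illustrative.

records = [
    {"weight": 250.0, "smoker": 1},
    {"weight": 1200.0, "smoker": 0},
    {"weight": 800.0, "smoker": 0},
    {"weight": 400.0, "smoker": 1},
]

# Unweighted: the raw proportion of respondents.
unweighted = sum(r["smoker"] for r in records) / len(records)

# Weighted: each respondent contributes in proportion to the population
# the record represents, compensating for unequal selection probabilities.
total_weight = sum(r["weight"] for r in records)
weighted = sum(r["weight"] * r["smoker"] for r in records) / total_weight

print(round(unweighted, 3), round(weighted, 3))  # → 0.5 0.245
```

Here the two respondents who smoke carry small weights, so the weighted prevalence (about 25%) is well below the unweighted proportion (50%), which is why the final weight must be applied before generalizing to the population.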
The procedures for estimating variances described in most statistical texts and used in most statistical software packages are based on the assumption of simple random sampling (SRS). However, the data collected in the BRFSS are obtained through a complex sample design; therefore, the direct application of standard statistical analysis methods for variance estimation and hypothesis testing may yield misleading results. There are computer programs available that take such complex sample designs into account.
SAS Version 8’s SURVEYMEANS and SURVEYREG procedures, SUDAAN, and Epi Info’s C-Sample are among those suitable for analyzing BRFSS data (2,3,4). SAS and SUDAAN can be used for tabular and regression analyses (2,3); SUDAAN has these and additional options (3). Epi Info’s C-Sample can be used to calculate simple frequencies and two-way cross-tabulations (4). When using these software products, users must know the stratum, the primary sampling units, and the record weight—all of which are on the master data file. For more information on calculating variance estimations using SAS, see the SAS/STAT Users Guide, Version 8 (2). For information about SUDAAN, see the SUDAAN Users Manual, Release 7.5 (3). For information about Epi Info, see Epi Info, Version 6.0 (4).
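To show concretely why the stratum, primary sampling unit, and weight are all needed, the sketch below implements a Taylor-linearized variance for a weighted prevalence under a stratified, with-replacement PSU design, which is the general approach such packages take. This is a hedged illustration in Python, not the exact algorithm of SAS, SUDAAN, or Epi Info, and the toy strata, PSU identifiers, weights, and outcomes are invented.

```python
import numpy as np

# Illustrative sketch: Taylor-linearized standard error of a weighted
# prevalence under a stratified design with PSUs treated as sampled with
# replacement. All data values below are invented for the example.

strata = np.array([1, 1, 1, 1, 2, 2, 2, 2])
psu    = np.array([1, 1, 2, 2, 1, 1, 2, 2])   # PSU id within stratum
w      = np.array([300., 150., 200., 250., 400., 100., 350., 300.])
y      = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 0/1 risk-factor indicator

n_hat = w.sum()
p_hat = (w * y).sum() / n_hat                 # weighted prevalence

# Linearized residuals: each unit's contribution to the ratio estimator.
z = w * (y - p_hat) / n_hat

var = 0.0
for h in np.unique(strata):
    in_h = strata == h
    # Collapse residuals to PSU totals within the stratum.
    psu_tot = np.array([z[in_h & (psu == j)].sum()
                        for j in np.unique(psu[in_h])])
    n_h = len(psu_tot)
    # Between-PSU variability within the stratum drives the variance.
    var += n_h / (n_h - 1) * ((psu_tot - psu_tot.mean()) ** 2).sum()

se = var ** 0.5
print(round(p_hat, 4), round(se, 4))
```

An SRS formula applied to the same data would ignore the clustering of respondents within PSUs and typically understate the standard error, which is the misleading result the text warns about.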
Although the overall number of respondents in the BRFSS is more than sufficient for statistical inference, subgroup analyses can lead to unreliable estimates. Consequently, users need to pay particular attention to the subgroup sample size when analyzing subgroup data, especially within a single data year or geographic area. Small sample sizes may produce unstable estimates. The reliability of an estimate depends on the actual unweighted number of respondents in a category, not on the weighted number. Interpreting and reporting weighted numbers that are based on a small, unweighted number of respondents can mislead the reader into believing that a given finding is much more precise than it actually is. The BRFSS follows a rule of not reporting or interpreting percentages based on a denominator of fewer than 50 respondents (unweighted sample). For this reason, the FIPS county code is removed from the data file for any county with fewer than 50 respondents.
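The suppression rule above is simple to apply mechanically, as the following sketch shows. The subgroup labels and counts are invented for illustration; the threshold of 50 unweighted respondents is the rule stated in the text.

```python
# Sketch of the BRFSS small-cell rule: suppress any percentage whose
# unweighted denominator is below 50 respondents. Subgroup names and
# values are invented for illustration.

MIN_DENOMINATOR = 50

subgroups = {
    "age 18-24": {"unweighted_n": 38, "weighted_pct": 27.1},
    "age 25-44": {"unweighted_n": 412, "weighted_pct": 22.4},
}

reportable = {
    name: g["weighted_pct"]
    for name, g in subgroups.items()
    if g["unweighted_n"] >= MIN_DENOMINATOR
}
print(reportable)  # → {'age 25-44': 22.4}
```

Note that the filter uses the unweighted count: a weighted estimate built from 38 interviews is suppressed no matter how large the weighted population it nominally represents.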
Analytic Issues
Advantages and Disadvantages of Telephone Surveys
Compared with face-to-face interviewing techniques, telephone interviews are easy to conduct and monitor and are cost efficient. However, telephone interviews have limitations. Telephone surveys may have higher levels of noncoverage than face-to-face interviews because some U.S. households cannot be reached by telephone. As mentioned earlier, approximately 98% of households in the United States have telephones. A number of studies have shown that the telephone and non-telephone populations are different with respect to demographic, economic, and health characteristics (5, 6, 7). Although the estimates of characteristics for the total population are unlikely to be substantially affected by the omission of the households without telephones, some of the subpopulation estimates could be biased. Telephone coverage is lower for population subgroups such as blacks in the South, people with low incomes, people in rural areas, people with less than 12 years of education, people in poor health, and heads of households under 25 years of age (8). However, poststratification adjustments for age, race, and sex, and other weighting adjustments used for the BRFSS data minimize the impact of differences in noncoverage, undercoverage, and nonresponse at the state level.
Despite the above limitations, prevalence estimates from the BRFSS correspond well with findings from surveys based on face-to-face interviews, including studies conducted by the National Institute on Alcohol Abuse and Alcoholism, CDC’s National Center for Health Statistics, and the American Heart Association (9,10). A summary of methodologic studies of BRFSS is provided in the publication section at www.cdc.gov/brfss.
Surveys based on self-reported information may be less accurate than those based on physical measurements. For example, respondents are known to underreport weight. Although this type of potential bias is an element of both telephone and face-to-face interviews, the underreporting should be taken into consideration when interpreting self-reported data. However, when measuring change over time, this type of bias is likely to be constant, and is therefore not a factor in trend analysis.
With ongoing changes in telephone technology, a growing number of households have cellular telephones and no traditional telephone lines. These households are presently not in the sampling frame for the BRFSS, which may bias the survey results, especially if the percentage of cellular-telephone-only households increases in the coming years. The BRFSS has plans to study the impact of cellular phones on survey response and the feasibility of various methods for data collection to complement present survey methods (1).
Aggregating Data Over Time
When data from one time period are insufficient for estimating the prevalence of a risk factor, data from multiple periods can be combined, provided the prevalence of the risk factor of interest did not change substantially across the periods. One method that can be used to assess the stability of the prevalence estimates is as follows (9):