COMPARABILITY OF DATA: BRFSS 2003

The BRFSS is a cross-sectional surveillance survey currently involving 54 reporting areas (1). BRFSS ...
1. Compute the prevalence for the risk factor for each period.
2. Identify the lowest and the highest of these prevalence estimates.
3. Identify a statistical test appropriate for comparing the lowest and the highest estimates at the 5% level of significance. For example, depending on the type of data, a t-test or the sign test might be appropriate.
4. Test the hypothesis that prevalence is not changing by using a two-sided test in which the null hypothesis is that the prevalences are equal.
5. Determine whether the resulting difference could be expected to occur by chance alone less than 5% of the time (i.e., test at the 95% confidence level).
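The comparison described in these steps can be sketched as a two-sided two-proportion z-test. The prevalences (0.22 and 0.25) and sample sizes below are illustrative, not BRFSS figures, and a design-adjusted test in SUDAAN or a similar package would be preferred in practice; this is only a minimal sketch of the logic.

```python
import math

def two_prop_z_test(p1, n1, p2, n2):
    """Two-sided z-test of the null hypothesis that two prevalences are equal.

    p1, p2: prevalence estimates (proportions) for the two periods
    n1, n2: effective sample sizes (actual n divided by the design effect)
    Returns (z, p_value).
    """
    # Pooled proportion under the null hypothesis of equal prevalences
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative comparison of the lowest and highest annual estimates
z, p = two_prop_z_test(0.22, 2000, 0.25, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")  # conclude a change only if p < 0.05
```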
Analyzing Subgroups

Provided that the prevalence of risk factors did not change rapidly over time, data combined for two or more years may provide a sufficient number of respondents for additional prevalence estimates for population groups (such as age/sex/race subgroups or county populations). Before combining data for subgroups, it is necessary to determine whether the total number of respondents will yield the precision needed, which depends upon the intended use of the estimate. For example, greater precision would be required to justify implementing expensive programs than that needed for general information only.
The table below shows the sample size required for each of several levels of precision, based on a calculation in which the estimated risk factor prevalence is 50% and the design effect is 1.5.
Precision is indicated by the width of the 95% confidence interval around the prevalence estimate. For example, precision of 2% indicates that the 95% confidence interval is plus (+) or minus (-) 2% of 50%, or 48% to 52%. As shown in the table, to yield this high a level of precision, the sample size required is about 3,600 persons. When a lower level of precision is acceptable, the sample size can be considerably smaller.
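The underlying calculation can be sketched with the standard formula for the sample size of a proportion, inflated by the design effect. With the stated inputs (prevalence 50%, design effect 1.5, precision 2%), it reproduces the roughly 3,600-person figure quoted above.

```python
def required_n(precision, p=0.50, deff=1.5, z=1.96):
    """Sample size needed for a 95% confidence interval of half-width
    `precision` around an estimated proportion p, inflated by the design
    effect (the variance ratio relative to simple random sampling)."""
    return deff * (z ** 2) * p * (1 - p) / precision ** 2

# +/- 2% precision at p = 50% with deff = 1.5: about 3,600 persons
print(round(required_n(0.02)))
```

Because the formula scales with 1/precision squared, relaxing the precision requirement shrinks the required sample quickly: accepting a 5% half-width instead of 2% cuts the sample to a few hundred.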
The design effect is a measure of the complexity of the sampling design that indicates how the design differs from simple random sampling. It is defined as the variance for the actual sampling design divided by the variance for a simple random sample of the same size (9,11). For most risk factors in most states, the design effect is less than 1.5. If it is more than 1.5, however, sample sizes may need to be larger than those shown in the table above.
The standard error of a percentage is largest at 50% and decreases as a percentage approaches 0% or 100%. From this perspective, the required sample sizes listed in the table above are conservative estimates. They should be reasonably valid for percentages between 20% and 80%, but may significantly overstate the required sample sizes for smaller or larger percentages.
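The behavior described here follows directly from the standard error formula for a percentage, sqrt(p(100 - p)/n), which peaks at p = 50%. A short numeric check (the n = 1,000 figure is illustrative):

```python
import math

def se_pct(p, n):
    """Standard error, in percentage points, of an estimated percentage
    p (on a 0-100 scale) from a simple random sample of size n."""
    return math.sqrt(p * (100 - p) / n)

# For n = 1,000 the SE peaks at p = 50% and shrinks toward the extremes,
# which is why sample sizes computed at p = 50% are conservative.
for p in (50, 20, 80, 5, 95):
    print(f"p = {p:2d}%  SE = {se_pct(p, 1000):.2f} points")
```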
Creating Synthetic Estimates
Even after combining data for several years, sample sizes may still be inadequate for risk factor estimates for some geographic areas (e.g., counties) or subpopulations (e.g., people with diabetes). In such situations, the analyst may wish to derive synthetic estimates by extrapolating from BRFSS data collected at the state level.
Synthetic estimates can be calculated using the population estimates for the subgroup of interest and the statewide BRFSS risk factor prevalences for that subgroup. This approach assumes that the risk factor prevalences for specific subgroups in each area are the same as the statewide risk factor prevalences for the same subgroups. For example, it assumes that the risk factor prevalences for black women in every county of a state are the same as those for black women in the entire state. The accuracy of the estimate depends on the validity of this assumption, which is often impossible to judge. However, a “ballpark” estimate may be sufficient for establishing broad goals and objectives for prevention strategies. For a discussion of the precision of such estimates, see Levy and Lemeshow, 1991 (12).
An example of estimating the number of people with hypertension in a hypothetical county, as well as the overall prevalence of hypertension in that county, is shown below. The sex and race distribution of the county’s population differs from the statewide population, and these differences need to be taken into account. Developing a table like the one below makes it possible to derive a synthetic estimate of the overall county prevalence of hypertension.
The statewide prevalence values, given as rates per 100 persons, are computed from the BRFSS data.
The estimated number of persons with hypertension for each race-sex group in the county was obtained by multiplying the statewide prevalence for that group by the county population for the group. To determine the total county prevalence, the numbers of people with hypertension in the county’s race-sex groups were summed, and this sum (18,070) was divided by the county’s total population (75,000) to yield an overall prevalence of 24.1 per 100 persons.
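The arithmetic of the worked example can be sketched as follows. Only the totals (18,070 estimated cases, a county population of 75,000, and an overall prevalence of 24.1 per 100) come from the text; the group populations and statewide prevalences below are illustrative stand-ins chosen to reproduce those totals, not the actual figures from the example table.

```python
# Hypothetical county population and statewide BRFSS prevalences by
# race-sex group (illustrative values only; see the note above).
groups = {
    # group: (county population, statewide prevalence per 100 persons)
    "white men":   (30_000, 20.0),
    "white women": (25_000, 25.0),
    "black men":   (10_000, 30.0),
    "black women": (10_000, 28.2),
}

# Multiply each group's statewide prevalence by its county population,
# then sum across groups and divide by the total county population.
cases = sum(pop * rate / 100 for pop, rate in groups.values())
total_pop = sum(pop for pop, _ in groups.values())
prevalence = cases / total_pop * 100

print(f"estimated cases: {cases:,.0f}")
print(f"county prevalence: {prevalence:.1f} per 100 persons")
```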
Creating Direct Estimates
Provided that the subpopulation sample size is sufficient, analysts may choose to produce direct estimates. SUDAAN or a similar program is needed for direct estimates. If possible, it is desirable to re-adjust the poststratification weight (_POSTSTR) to the age-by-race-by-gender population distribution of the subarea (e.g., county). To locally poststratify the CDC BRFSS weights used for the direct estimate, poststratify _WT2 to the population of interest. The equivalent local final weight is the product of _WT2 and the local poststratification factor.
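This local poststratification step can be sketched as follows, assuming respondent records carrying the CDC weight _WT2 and an age-by-race-by-sex stratum label. The stratum labels, population counts, and weights below are illustrative; the variable name _WT2 and the structure of the adjustment follow the text.

```python
def local_weights(respondents, county_pop_by_stratum):
    """Locally poststratified weights: _WT2 times the factor
    (county population of the stratum) / (sum of _WT2 over county
    respondents in that stratum)."""
    # Sum _WT2 within each poststratification stratum
    wt2_sums = {}
    for r in respondents:
        wt2_sums[r["stratum"]] = wt2_sums.get(r["stratum"], 0.0) + r["_WT2"]
    # Equivalent local final weight = _WT2 * local poststratification factor
    return [
        r["_WT2"] * county_pop_by_stratum[r["stratum"]] / wt2_sums[r["stratum"]]
        for r in respondents
    ]

# Illustrative county respondents and stratum population totals
resp = [
    {"stratum": "F/white/45-54", "_WT2": 120.0},
    {"stratum": "F/white/45-54", "_WT2": 80.0},
    {"stratum": "M/black/45-54", "_WT2": 150.0},
]
pops = {"F/white/45-54": 4_000, "M/black/45-54": 2_500}
w = local_weights(resp, pops)
print(w)  # weights within each stratum now sum to that stratum's population
```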
New Calculated Variables and Risk Factors
Not all of the variables that appear on the public use data set are taken directly from the state files. CDC prepares a set of SAS programs that are used for end-of-year processing. These programs prepare the data for analysis and add weighting, sample-design, intermediate, and calculated variables, as well as risk factors, to the data set. The following calculated variables and risk factors, created for the user’s convenience, are examples of the results of this procedure:
MODCAT_, VIGCAT_, PACAT, _RFHLTH, _RFNOPA, _RFHYPE4
The procedures for the variables vary in complexity; some only combine codes, while others require sorting and combining selected codes from multiple variables, which may result in the calculation of an intermediate variable. For further details regarding the calculated variables and risk factors, refer to the document entitled “Calculated Variables and Risk Factors for the 2003 Behavioral Risk Factor Surveillance System,” located at http://www.cdc.gov/brfss/technical_infodata/surveydata/2003.htm.
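As a sketch of the simplest kind of calculated variable, one that only combines codes, the logic below collapses self-reported general health into a dichotomous risk factor in the spirit of _RFHLTH. The specific coding shown (GENHLTH 1–3 as good or better, 4–5 as fair or poor, 9 for don’t know/refused/missing) is an assumption for illustration; the CDC document cited above gives the authoritative definitions.

```python
def rf_hlth(genhlth):
    """Illustrative calculated risk-factor variable in the spirit of
    _RFHLTH: collapse self-reported general health (GENHLTH) into
    good-or-better vs. fair-or-poor. The coding is an assumption; see
    the cited CDC document for the authoritative definition."""
    if genhlth in (1, 2, 3):   # excellent / very good / good
        return 1               # good or better health
    if genhlth in (4, 5):      # fair / poor
        return 2               # fair or poor health
    return 9                   # don't know / refused / missing

print([rf_hlth(g) for g in (1, 3, 4, 5, 9)])  # -> [1, 1, 2, 2, 9]
```

More complex calculated variables, such as the physical-activity categories, combine codes from several questions and may pass through an intermediate variable first, but the pattern of recoding shown here is the same.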
References

1. Mokdad AH, Stroup DF, Giles WH. Public health surveillance for behavioral risk factors in a changing environment: recommendations from the Behavioral Risk Factor Surveillance Team. MMWR Recomm Rep 2003; 52:1-12.
2. SAS Institute Inc., SAS/STAT User’s Guide, Version 8. Cary, NC: SAS Institute, Inc., 1999; 3181– 3272.
3. Shah BV, Barnwell BG, Bieler GS. SUDAAN User’s Manual, Release 7.5. Research Triangle Park, NC: Research Triangle Institute, 1997.
4. Dean AG, Dean JA, Coulombier D, Brendel KA, Smith DC, Burton AH, Dicker RC, Sullivan K, Fagan RF, Arner TG. Epi Info, Version 6.0: A word processing, database, and statistics program for public health on IBM-compatible microcomputers. Centers for Disease Control and Prevention. 1995.
5. Groves RM, Kahn RL. Surveys by Telephone: A national comparison with personal interviews, New York, Academic Press, 1979.
6. Banks MJ. Comparing health and medical care estimates of the phone and nonphone populations. Proceedings of the Section on Survey Research Methods, American Statistical Association, 1983, pp 569–574.
7. Thornberry OT, Massey JT. Trends in United States telephone coverage across time and subgroups. In: Groves RM, et al., eds. Telephone Survey Methodology, pp 25–49. New York: John Wiley & Sons, 1988.
8. Massey JT, Botman SL. Weighting adjustments for random digit dialed surveys. In: Groves RM, et al., eds. Telephone Survey Methodology, pp 143–160. New York: John Wiley & Sons, 1988.
9. Frazier EL, Franks AL, Sanderson LM. Behavioral risk factor data. In: Using chronic disease data: a handbook for public health practitioners, pp 4.1–1.17. Centers for Disease Control and Prevention.
10. Nelson DE, Powell-Griner E, Town M, Kovar MG. A comparison of national estimates from the National Health Interview Survey and the Behavioral Risk Factor Surveillance System. Am J Public Health 2003; 93:1335-1341.
11. Groves RM. Survey Errors and Survey Costs. New York: John Wiley and Sons, 1989; 265, 271–272.
12. Levy PS, Lemeshow S. Sampling of Populations: Methods and Applications. New York: John Wiley and Sons, 1991; 347–350.