Berkeley Earth Temperature Averaging
Robert Rohde1, Judith Curry2, Donald Groom3,
Robert Jacobsen3,4, Richard A. Muller1,3,4, Saul Perlmutter3,4,
Arthur Rosenfeld3,4, Charlotte Wickham5, Jonathan Wurtele3,4
1Novim Group, Berkeley Earth Surface Temperature Project; 2Georgia Institute of Technology; 3Lawrence Berkeley National Laboratory; 4University of California, Berkeley; 5Now at Oregon State
A new mathematical framework is presented for producing maps and large-scale averages of temperature changes from weather station data for the purposes of climate analysis.
This allows one to include short and discontinuous temperature records, so that nearly all temperature data can be used. The framework contains a weighting process that assesses the quality and consistency of a spatial network of temperature stations as an integral part of the averaging process. This permits data with varying levels of quality to be used without compromising the accuracy of the resulting reconstructions. Lastly, the process presented here is extensible to spatial networks of arbitrary density (or locally varying density) while maintaining the expected spatial relationships. In this paper, this framework is applied to the Global Historical Climatology Network land temperature dataset to present a new global land temperature reconstruction from 1800 to present with error uncertainties that include many key effects. In so doing, we find that the global land mean temperature has increased by 0.911 ± 0.042 C since the 1950s (95% confidence for statistical and spatial uncertainties). This change is consistent with global land-surface warming results previously reported, but with reduced uncertainty.
1. Introduction

While there are many indicators of climate change, the long-term evolution of global surface temperatures is perhaps the metric that is both the easiest to understand and the most closely linked to the quantitative predictions of climate models. It is also backed by the largest collection of raw data. According to the summary provided by the Intergovernmental Panel on Climate Change (IPCC), the mean global surface temperature (both land and oceans) has increased by 0.64 ± 0.13 C from 1956 to 2005 at 95% confidence (Trenberth et al. 2007).
During the latter half of the twentieth century weather monitoring instruments of good quality were widely deployed, yet the quoted uncertainty on global temperature change during this time period is still ± 20%. Reducing this uncertainty is a major goal of this paper. Longer records may provide more precise indicators of change; however, according to the IPCC, temperature increases prior to 1950 were caused by a combination of anthropogenic factors and natural factors (e.g. changes in solar activity), and it is only since about 1950 that man-made emissions have come to dominate over natural factors. Hence constraining the post-1950 period is of particular importance in understanding the impact of greenhouse gases.
The Berkeley Earth Surface Temperature project was created to help refine our estimates of the rate of recent global warming. This is being approached through several parallel efforts to
A) increase the size of the data set used to study global climate change, B) bring additional statistical techniques to bear on the problem that will help reduce the uncertainty in the resulting averages, and C) produce new analysis of systematic effects, including data selection bias, urban heat island effects, and the limitations of poor station siting. The current paper focuses on refinements in the averaging process itself and does not introduce any new data. The analysis framework described here includes a number of features to identify and handle unreliable data;
however, discussion of specific biases such as those associated with station siting and/or urban heat islands will also be published separately.
2. Averaging Methods of Prior Studies

Presently there are three major research groups that routinely produce a global average time series of instrumental temperatures for the purposes of studying climate change. These groups are located at the National Aeronautics and Space Administration Goddard Institute for Space Studies (NASA GISS), the National Oceanic and Atmospheric Administration (NOAA), and a collaboration of the Hadley Centre of the UK Meteorological Office with the Climatic Research Unit of the University of East Anglia (HadCRU). They have developed their analysis frameworks over a period of about 25 years and share many common features (Hansen and Lebedeff 1987; Hansen et al. 1999; Hansen et al. 2010; Jones et al. 1986; Jones and Moberg 2003; Brohan et al. 2006;
Smith and Reynolds 2005; Smith et al. 2008). The global average time series for the three groups are presented in Figure 1 and their relative similarities are immediately apparent. Each group combines measurements from fixed-position weather stations on land with transient ships / buoys in water to reconstruct changes in the global average temperature during the instrumental era, roughly 1850 to present. Two of the three groups (GISS and HadCRU) treat the land-based and ocean problems as essentially independent reconstructions with global results only formed after constructing separate land and ocean time series. The present paper will present improvements and innovations for the processing of the land-based measurements. Though much of the work presented can be modified for use in an ocean context, we will not discuss that application at this time due to the added complexities and systematics involved in monitoring from mobile ships / buoys.
Figure 1. (Upper panel) Comparison of the global annual averages of the three major research groups, plotted relative to the 1951-1980 average.
(Lower panel) The annual average uncertainty at 95% confidence reported by each of the three groups. NASA reports an uncertainty at only three discrete times, shown as solid dots, while the other two groups provide continuous estimates of the uncertainty.
In broad terms each land-based temperature analysis can be broken down into several overlapping pieces: A) the compilation of a basic dataset, B) the application of a quality control and “correction” framework to deal with erroneous, biased, and questionable data, and C) a process by which the resulting data is mapped and averaged to produce useful climate indices.
The existing research groups use different but heavily overlapping data sets consisting of between 4400 and 7500 weather monitoring stations (Brohan et al. 2006, Hansen et al. 2010;
Peterson and Vose 1997). Our ongoing work to build a climate database suggests that over 40000 weather station records have been digitized. All three temperature analysis groups derive a global average time series starting from monthly average temperatures, though daily data and records of maximum and minimum temperatures (as well as other variables such as precipitation) are increasingly used in other forms of climate analysis (Easterling et al. 1997, Klein and Können 2003, Alexander et al. 2006, Zhang et al. 2007). The selection of stations to include in climate analyses has been heavily influenced by algorithms that require the use of long, nearly-continuous records. Secondarily, the algorithms often require that all or most of a reference “baseline” period be represented from which a station’s “normal” temperature is defined. Each group differs in how it approaches these problems and the degree of flexibility they have in their execution, but these requirements have served to exclude many temperature records shorter than 15 years from existing analyses (only 5% of NOAA records are shorter than 15 years).
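The consequences of a baseline requirement can be made concrete with a short sketch. The 1951-1980 window and the 80% coverage threshold below are illustrative assumptions, not the exact rule used by any of the three groups, and `station_anomaly` is a hypothetical helper:

```python
import numpy as np

def station_anomaly(years, temps, base=(1951, 1980), min_frac=0.8):
    """Convert a station's annual temperatures to anomalies relative to a
    baseline window. Returns None when the station does not cover enough
    of the baseline -- the mechanism by which short records get excluded.
    (Window and threshold are illustrative, not any group's exact rule.)"""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    needed = base[1] - base[0] + 1
    if in_base.sum() < min_frac * needed:
        return None  # insufficient baseline coverage: station dropped
    normal = temps[in_base].mean()  # the station "normal"
    return temps - normal

# A record starting in 1990 cannot define a 1951-1980 normal:
short = station_anomaly(np.arange(1990, 2011), np.zeros(21))
print(short is None)  # True
```

Any record lacking baseline coverage is discarded outright under this scheme, regardless of its quality during the years it does cover.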
The focus on methods that require long records may arise in part from the way previous authors have thought about the climate. The World Meteorological Organization (WMO) gives an operational definition of climate as the average weather over a period of 30 years (Arguez and Vose 2011). From this perspective, it is trivially true that individual weather stations must have very long records in order to perceive multi-decadal climate changes from a single site.
However, as we will show, the focus on long record lengths is unnecessary when one can compare many station records with overlapping spatial and temporal coverage.
Additionally, though the focus of existing work has been on long records, it is unclear that such records are ultimately more accurate for any given time interval than are shorter records covering the same interval. The consistency of long records is affected by changes in instrumentation, station location, measurement procedures, local vegetation, and many other factors that can introduce artificial biases in a temperature record (Folland et al. 2001, Peterson and Vose 1997, Brohan et al. 2006, Menne et al. 2009, Hansen et al. 2001). A previous analysis of the 1218 stations in the US Historical Climatology Network found that on average each record has one spurious shift in mean level greater than about 0.5 C for every 15-20 years of record (Menne et al. 2009). Existing detection algorithms are inefficient at detecting biases smaller than 0.5 C, suggesting that the typical interval over which a record remains reliable is likely even shorter. All three groups have developed procedures to detect and “correct” for such biases by introducing adjustments to individual time series. Though procedures vary, the goal is generally to detect spurious changes in a record and use neighboring series to derive an appropriate adjustment. This process is generally known as “homogenization”, and it has the effect of making the temperature network more spatially homogeneous, at the cost that neighboring series are no longer independent. For all of the existing groups, this process of bias adjustment is a separate step conducted prior to constructing a global average.
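A minimal sketch of the homogenization idea, assuming a single breakpoint and a simple composite of neighboring series (real algorithms such as the pairwise method of Menne et al. 2009 are considerably more elaborate; the `homogenize` helper and its 0.5 C threshold are illustrative only):

```python
import numpy as np

def homogenize(target, neighbors, threshold=0.5):
    """Illustrative single-breakpoint homogenization: a step change in
    the target that is absent from the composite of its neighbors is
    detected and subtracted out. Not any group's actual algorithm."""
    composite = np.mean(neighbors, axis=0)
    diff = target - composite             # shared weather cancels here
    n = len(diff)
    best_k, best_shift = None, 0.0
    # find the breakpoint maximizing the before/after contrast
    for k in range(2, n - 2):
        shift = diff[k:].mean() - diff[:k].mean()
        if abs(shift) > abs(best_shift):
            best_k, best_shift = k, shift
    adjusted = target.copy()
    if best_k is not None and abs(best_shift) > threshold:
        adjusted[best_k:] -= best_shift   # remove the spurious jump
    return adjusted

# synthetic check: shared signal plus a +1.0 C jump at month 30
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.1, 60)
neighbors = np.stack([signal + rng.normal(0.0, 0.05, 60) for _ in range(5)])
target = signal + np.where(np.arange(60) >= 30, 1.0, 0.0)
adjusted = homogenize(target, neighbors)
```

The non-independence mentioned above is visible here: the adjusted series inherits information from every neighbor through the composite.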
After homogenization (and other quality control steps), the existing groups place each “corrected” time series in its spatial context and construct a global average. The simplest process, conducted by HadCRU, divides the Earth into 5° x 5° latitude-longitude grid cells and associates the data from each station time series with a single cell. Because the size of the cells varies with latitude, the number of records per cell and the weight per record are affected by this gridding process in a way that has nothing to do with the nature of the underlying climate. In contrast, GISS uses an 8000-element equal-area grid, and associates each station time series with multiple grid cells by defining the grid cell average as a distance-weighted function of temperatures at many nearby station locations. This captures some of the spatial structure and is resistant to many of the gridding artifacts that can affect HadCRU. Lastly, NOAA has the most sophisticated treatment of spatial structure. NOAA’s process, in part, decomposes an estimated spatial covariance matrix into a collection of empirical modes of spatial variability on a 5° x 5° grid. These modes are then used to map station data onto the grid according to the degree of covariance expected between the weather at a station location and the weather at a grid cell center. (For additional details, and an explanation of how low-frequency and high-frequency modes are handled differently, see Smith and Reynolds 2005.) In principle, NOAA’s method should be the best at capturing and exploiting spatial patterns of weather variability. However, their process relies on defining spatial modes during a relatively short modern reference period (1982 for land records, Smith and Reynolds 2005), and they must assume that the patterns of spatial variation observed during that interval are adequately representative of the entire history.
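The latitude dependence of 5° x 5° cell areas can be quantified directly from spherical geometry; the helper below is an illustrative sketch, not HadCRU's code:

```python
import numpy as np

def latlon_cell_area(lat_south, dlat=5.0, dlon=5.0, radius=6371.0):
    """Area (km^2) of a dlat x dlon latitude-longitude cell whose southern
    edge is at lat_south, from the exact spherical formula
    A = R^2 * dlon_rad * (sin(lat_north) - sin(lat_south))."""
    lat1 = np.radians(lat_south)
    lat2 = np.radians(lat_south + dlat)
    return radius**2 * np.radians(dlon) * (np.sin(lat2) - np.sin(lat1))

equator = latlon_cell_area(0.0)   # cell with southern edge on the equator
polar = latlon_cell_area(85.0)    # cell touching the pole
# the polar cell is more than 20 times smaller, so averaging cells with
# equal weight would badly over-represent high latitudes
```

This size disparity is why a latitude-longitude grid must be area-weighted (or replaced by an equal-area grid, as GISS does) before cells are combined into a global mean.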
Further, if the goal is to understand climate change then the assumption that spatial patterns of weather variability are time-invariant is potentially confounding.
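The empirical-mode idea behind NOAA's approach can be sketched on toy data: spatial modes are extracted from a densely sampled reference period via a singular value decomposition of the anomaly field. This is only a schematic of EOF analysis, not NOAA's actual procedure; all names and data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
n_times, n_grid = 120, 50                           # reference months x grid cells
pattern = np.sin(np.linspace(0.0, np.pi, n_grid))   # one imposed spatial mode
amplitude = rng.normal(0.0, 1.0, n_times)           # its time series
field = np.outer(amplitude, pattern) + rng.normal(0.0, 0.1, (n_times, n_grid))

# EOFs are the right singular vectors of the (time x space) anomaly matrix
anoms = field - field.mean(axis=0)
_, s, vt = np.linalg.svd(anoms, full_matrices=False)
leading_mode = vt[0]
if leading_mode @ pattern < 0:   # SVD sign is arbitrary
    leading_mode = -leading_mode
# leading_mode now recovers the imposed spatial pattern
```

The method's vulnerability is also visible in this sketch: if `pattern` were to change outside the reference period, projections onto the fixed `leading_mode` would misrepresent the field, which is the time-invariance concern raised above.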