«Robert Rohde1, Judith Curry2, Donald Groom3, Robert Jacobsen3,4, Richard A. Muller1,3,4, Saul Perlmutter3,4, Arthur Rosenfeld3,4, Charlotte Wickham5, ...»
In all three of these prior approaches, every record used in gridded averaging is assumed to be equally reliable. More precisely, they make the assumption that their quality control and homogenization processes address erroneous and biased data prior to the gridding and averaging step in such a way that each resulting time series is deserving of equal weight. (GISS makes a partial exception in that a corrective model for urban heat island biases is applied after gridding.) This has the effect that records subject to many bias “corrections” can be given the same weight in an average as a record where no bias adjustments were found to be necessary. In such cases, the differences in data quality may play a role in how the uncertainty is assessed, but not in the construction of the global average.
All three of the averaging processes currently being used rely on the concept of a “baseline” parameter to define the “normal” weather. The baseline can either be introduced for each record before gridding (e.g. HadCRU) or it can be introduced after gridding and defined at the level of the grid cell average (e.g. NASA). The intent of the baseline temperature parameter is to capture the “normal” climate at that location by reference to the average weather over some specific reference period (e.g. 1960-1980). Each time series is then replaced by an “anomaly” time series consisting of the differences from the baseline. This approach is motivated by the observation that temperatures change rapidly with latitude (about 1 C per 150 km poleward) and altitude (about 1 C for every 220 m of surface elevation), and that these changes are quite large compared to the approximately 1 C / century of global warming that one wants to investigate. In effect, the baseline parameters are meant to capture most of the spatial variability between sites.
In particular, the average of anomaly series should be much less sensitive to biases due to the start and stop of individual records. Without some adjustment for such spatial variability, an excess of high (or low) latitude stations could erroneously pull the corresponding global average to lower (or higher) values.
The use of an individual baseline parameter per station (or grid cell) makes no assumptions about the underlying spatial structure. This means the maximum spatial information can, in principle, be removed from each record; however, several trade-offs are incurred in doing so. First, the use of predefined reference intervals will limit the usability of stations that were not active during the corresponding period (though other compensating approaches are often used). Secondly, by defining all stations to have zero anomaly during the reference period, one may suppress true structure in the temperature field at that time.
Specifically, reconstructions using this method will have lower spatial variability during the reference interval than at other times due to the artificial constraint that all regions have the same mean value during the reference period.
Lastly, after gridding the data and creating anomaly series, each existing group creates a large-scale average using an area-weighted average of non-empty grid cells. HadCRU and GISS add an additional nuance, as they apply a post-stratification procedure prior to their final average.
Specifically, they create averages of specific latitude bands (or hemispheres in HadCRU’s case), and then combine those averages to create the final global average. This has the effect that each missing cell in a latitude band is essentially replaced by the average of the valid cells in the band before constructing the ultimate global average. To a degree this approach also compensates for the fact that certain areas (e.g. the Northern Hemisphere) tend to have much greater historical coverage than others. Monte Carlo tests we conducted generally confirm that latitudinal banding improves the accuracy of the overall average given the techniques employed by HadCRU and GISS; however, we observe that such approaches are largely an indirect means of incorporating information about the spatial structure of the temperature field that could be modeled more directly.
3. New Averaging Model The global average temperature is a simple descriptive statistic that aims to characterize the Earth. Operationally, the global average may be defined as the integral average of the temperatures over the surface of the Earth as would be measured by an ideal weather station sampling the air at every location. As the true Earth has neither ideal temperature stations nor infinitely dense spatial coverage, we can never capture the ideal global average temperature completely; however, we can use the data we do have to constrain its value.
As described in the preceding section, the existing global temperature analysis groups use a variety of well-motivated algorithms to generate a history of global temperature change.
However, none of their approaches would generally correspond to a statistical model in the more formal sense.
Let ( ⃑ ) be the global temperature field in space and time. We define the
(⃑ ) () ( ⃑) (⃑ ) 
Uniqueness can be guaranteed by applying the constraints:
( ⃑) ⃑ ∫,
Given this decomposition, we see that ( ) corresponds to the global mean temperature as a function of time. ( ⃑) captures the time-invariant spatial structure of the temperature field, and hence can be seen as a form of spatial “climatology”, though it differs from the normal definition of a climatology by a simple additive factor corresponding to the long-term average of ( ). The last term, ( ⃑ ), is meant to capture the “weather”, i.e. those fluctuations in temperature over space and time that are neither part of the long-term evolution of the average nor part of the stable spatial structure. In this paper, we show how it is possible to estimate the global temperature field by simultaneously constraining all three pieces of ( ⃑ ) using the available data. (Because we are introducing a large number of symbols, we summarize all the key symbols in the Appendix.) As our study is based solely on the use of land-based temperature data, we choose to restrict the spatial integrals in equation  to only the Earth’s land surface. As a result, our study will identify ( ) with the land temperature average only. Rather than defining a specific base interval (e.g. 1950-1980) as has been common in prior work, we will show below how it is possible to reconcile all time periods simultaneously. As a result, the time integral in equation
Where SSD denotes the sum of square deviations and such a minimization would attempt to minimize the error terms. Though appealing,  is ultimately misguided as ( ) is distributed highly non-uniformly in both space and time, and the temperature histories at neighboring stations are highly correlated. A naïve application of  would result in ̂( ) biased towards the most densely sampled regions of the globe.
However,  does inspire our first natural set of constraint equations, namely ∑ ( ( ) ̂( ) ̂ ( ⃑ ))  ̂ ∑ Since ̂ is specific to a single station, there is no disadvantage to simply stating that it be chosen to minimize the error at that specific station.
To determine the other fields, it is instructive to consider the properties that we expect ̂ ( ⃑ ) to have. To begin, it should have (at least approximately) zero mean over space and time in accordance with equation . Next, we expect that weather fluctuations should be highly correlated over short distances in space. These considerations are very similar to the fundamental assumptions of the spatial statistical analysis technique known as Kriging (Krige
In most practical uses of Kriging it is necessary to estimate or approximate the covariance matrix in equation  based on the available data (Krige 1951, Cressie 1990, Journel 1989). NOAA also requires the covariance matrix for their optimal interpolation method; they estimate it by first constructing a variogram during a time interval with dense temperature sampling and then decomposing it into empirical spatial modes that are used to model the spatial structure of the data (Smith and Reynolds 2005). Their approach is nearly ideal for capturing the spatial structure of the data during the modern era, but has several weaknesses. Specifically this method assumes that the spatial structures are adequately constrained during a brief calibration period and that such relationships remain stable even over an extended period of climate change.
We present an alternative that preserves many of the natural spatial considerations provided by Kriging, but also shares characteristics with the local averaging approach adopted by GISS (Hansen et al 1999, Hansen and Lebedeff 1987). If the variance of the underlying field changes slowly as a function of location, then the covariance function can be replaced with the
correlation function, ( ⃑ ⃑⃑), which leads to the formulation that:
(⃑ ⃑) (⃑ ⃑ ) (⃑ ) (⃑ ⃑ )  (⃑ ⃑) (⃑ ⃑ ) ( ) ( ) (⃑ ) (⃑ ⃑ ) ( (⃑ ⃑) (⃑ ⃑) )
The Kriging formulation is most efficient at capturing fluctuations that have a scale length comparable to the correlation length; however, it also permits the user to find finer structure if more densely positioned data is provided. In particular, in the limit of infinitely dense data, the Kriging estimate of the field will necessarily match the field exactly. This is in direct contrast to the GISS and HadCRU averaging approaches that will always smooth over fine structure.
A further modification is made by assuming that ( ⃑ ⃑⃑) | ⃑ ⃑⃑| ( ), where denotes the distance between ⃑ and ⃑⃑. This allows us to parameterize the correlation field as a simple function of one variable, though it admittedly neglects differences in correlation that might be related to factors such as latitude, altitude, and local vegetation, etc. The correlation
function is parameterized using:
 ( ) () This is compared to a reference data set based on randomly selecting 500,000 pairs of stations, and measuring the correlation of their non-seasonal temperature fluctuations provided they have at least ten years of overlapping data. The resulting data set and fit are presented in Figure 2. Pair selection was accomplished by choosing random locations on the globe and locating the nearest temperature records, subject to a requirement that it be no more than 100 km from the chosen random location. The small constant term measures the correlation over the very largest distance scale; however, for the purposes of equation  it is computationally which we did while scaling the rest of equation  by ( ) to advantageous to set compensate near. This allows us to treat stations at distances greater than ~4000 km as completely uncorrelated, which greatly simplifying the matrix inversion in equation  since a majority of the matrix elements are now zeros. Figure 2 shows that the correlation structure is substantial out to a distance of ~1000 km, and non-trivial to ~2000 km from each site.
Figure 2. Mean correlation versus distance curve constructed from 500,000 pair-wise comparisons of station temperature records.
Each station pair was selected at random, and the measured correlation was calculated after removing seasonality and with the requirement that they have at least 10 years of overlapping data. Red, green, and yellow curves show a moving range corresponding to the inner 80, 50, and 20% of data respectively. The black curve corresponds to the modeled correlation vs. distance reported in the text. This correlation versus distance model is used as the foundation of the Kriging process used in the Berkeley Average.
Based on the data, the best fit values in equation  were = 0.1276, = 2.4541 x 10-4 / km, = 5.3881 x 10-7 / km2, -2.7452 x 10-11 / km3, = 8.3007 x 10-14 / km4 and = 0.0272. These were the values we used in the Berkeley Earth temperature reconstruction method.