# «Robert Rohde1, Judith Curry2, Donald Groom3, Robert Jacobsen3,4, Richard A. Muller1,3,4, Saul Perlmutter3,4, Arthur Rosenfeld3,4, Charlotte Wickham5, ...»

In Figure 3 we show similar fits using station pairs restricted by either latitude or longitude. In the case of longitude, we divide the Earth into 8 longitude bands and find that the correlation structure is very similar across each. The largest deviation occurs in the band centered at 23 W with reduced correlation at short distances. This band is one of several that include relatively few temperature stations as it spans much of the Atlantic Ocean, and so this deviation might be primarily a statistical fluctuation. The deviations observed in Figure 3 for latitude bands are more meaningful however. We note that latitude bands show decreasing short-range correlation as one approaches the equator and a corresponding increase in long-range correlation. Both of these effects are consistent with decreased weather variability in most tropical areas. These variations, though non-trivial, are relatively modest for most regions. For the current presentation we shall restrict ourselves to the simple correlation function given by equation [14], though further refinements of the correlation function are likely to be a topic of future research.

**Figure 3. Correlation versus distance fits, similar to Figure 2, but using only stations selected from portions of the Earth.**

The Earth is divided into eight longitudinal slices (Left) or seven latitudinal slices (Right), with the slice centered at the latitude or longitude appearing in the legend. In each panel, the global average curve (Figure 2) is plotted in black. All eight longitudinal slices are found to be similar to the global average. For the latitudinal slices, we find that the correlation is systematically reduced at low latitudes. This feature is discussed in the text.
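The band-restricted comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the station layout (dicts with `lat`, `lon`, `anomalies`), the function names, and the use of a plain Pearson correlation on equal-length anomaly series are all hypothetical choices, not the procedure used to produce Figure 3.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def band_pairs(stations, lat_min, lat_max):
    """Return (distance_km, correlation) for every station pair whose
    members both lie in the latitude band [lat_min, lat_max)."""
    inside = [s for s in stations if lat_min <= s["lat"] < lat_max]
    return [
        (great_circle_km(a["lat"], a["lon"], b["lat"], b["lon"]),
         pearson(a["anomalies"], b["anomalies"]))
        for a, b in combinations(inside, 2)
    ]
```

The resulting (distance, correlation) pairs for each band could then be fitted with whatever correlation function is in use and compared against the global curve.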

We note that the correlation in the limit of zero distance, $R(0)$, has a natural and important physical interpretation. It is an estimate of the correlation that one expects to see between two typical weather monitors placed at the same location. By extension, if we assume such stations would report the same temperature except that each is subject to random and uncorrelated error, then it follows that a fraction $1 - R(0)$, roughly 12%, of the non-seasonal variation in the typical station record is caused by measurement noise that is unrelated to the variation in the underlying temperature field. Since the average root-mean-square non-seasonal variability is ~2.0 C, it follows that an estimate of the short-term instrumental noise for the typical month at a typical station is ~0.47 C at 95% confidence. This estimate is much larger than the approximately 0.06 C typically used for the random monthly measurement error (Folland et al. 2001). Our correlation analysis suggests that such estimates may understate the amount of random noise introduced by local and instrumental effects. However, we note that the same authors assign an uncertainty of 0.8 C to the homogenization process they use to remove longer-term biases. We suspect that the difficulty they associate with homogenization is partially caused by the same short-term noise that we observe. However, our correlation estimate would not generally include long-term biases that cause a station to be persistently too hot or too cold, and so the estimates are not entirely comparable.

The impact of short-term local noise on the ultimate temperature reconstruction can be reduced in regions where stations are densely located and thus provide overlapping coverage. The simple correlation function described above would imply that each temperature station captures a fraction $\iint R(\vec{x})\,d\vec{x} \,/ \iint d\vec{x}$ of the Earth’s temperature field; equivalently, 180 ideally distributed weather stations would be sufficient to capture nearly all of the expected structure in the Earth’s monthly mean anomaly field. This is similar to the estimate of 110 to 180 stations provided by Jones (1994). We note that the estimate of 180 stations includes the effect of measurement noise. Removing this consideration, we would find that the underlying monthly mean temperature field has approximately 115 independent degrees of freedom. In practice though, quality control and bias correction procedures will substantially increase the number of records required.
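As a rough numerical illustration of the area-fraction argument, the sketch below averages a correlation function over a sphere and inverts the result to get an equivalent station count. The function `toy_corr` is an assumed exponential form with made-up parameters; it is not equation [14], so the count it produces should not be expected to reproduce the figure of 180 quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def captured_fraction(corr, n_steps=20000):
    """Area-weighted mean of corr(distance) over a sphere, measured from one point.

    With gamma the angular separation, the mean over the sphere is
    (1/2) * integral_0^pi corr(R * gamma) * sin(gamma) dgamma,
    evaluated here with the midpoint rule.
    """
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        gamma = (k + 0.5) * h
        total += corr(EARTH_RADIUS_KM * gamma) * math.sin(gamma)
    return 0.5 * total * h


def toy_corr(d_km, r0=0.88, length_km=1000.0):
    """Hypothetical correlation function: r0 at zero distance, exponential decay.
    Both r0 and length_km are illustrative assumptions."""
    return r0 * math.exp(-d_km / length_km)


fraction = captured_fraction(toy_corr)
equivalent_stations = 1.0 / fraction  # stations needed if ideally distributed
```

Setting `r0=1.0` in such a sketch corresponds to removing the measurement-noise contribution, which lowers the equivalent station count in the same direction as the reduction from 180 to about 115 described in the text.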

The new Kriging coefficients $K_i(\vec{x}, t_j)$ defined by equation [12] also have several natural interpretations. First, the average of $K_i(\vec{x}, t_j)$ over land:

[15] $\dfrac{\int K_i(\vec{x}, t_j)\,d\vec{x}}{\int d\vec{x}}$

can be interpreted as the total weight in the global land-surface average attributed to the i-th station at time $t_j$. Second, the use of correlation rather than covariance in our construction gives rise to a natural interpretation of the sum of $K_i(\vec{x}, t_j)$ over all stations. Because Kriging is linear and our construction of R is positive definite, it follows that:

[16] $0 \le F(\vec{x}, t_j) = \sum_i K_i(\vec{x}, t_j) \le 1$

where $F(\vec{x}, t_j)$ has the qualitative interpretation as the fraction of the $W(\vec{x}, t_j)$ field that has been effectively constrained by the data. The above is true even though individual terms $K_i(\vec{x}, t_j)$ may in general be negative. Since the true temperature anomaly is

We see that in the limit $F(\vec{x}, t_j) \to 1$, the temperature estimate at $\vec{x}$ depends only on the local data $d_i(t_j)$, while in the limit $F(\vec{x}, t_j) \to 0$ the temperature field at $\vec{x}$ is estimated to have the same value as the global average of the data. For diagnostic purposes it is also useful to define:

which provides a measure of total field completeness as a function of time.
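To make the behaviour of the summed Kriging weights concrete, here is a minimal sketch of simple Kriging on a correlation matrix for stations placed along a line. Everything in it is an assumption for illustration: the exponential `toy_corr`, the unit diagonal, the one-dimensional geometry, and the function names. It is not the construction of equation [12].

```python
import math


def toy_corr(d_km, r0=0.88, length_km=1000.0):
    """Hypothetical correlation between distinct records: r0 * exp(-d / length)."""
    return r0 * math.exp(-d_km / length_km)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kriging_weights(station_km, target_km):
    """Simple-Kriging weights at target_km for stations located on a line."""
    n = len(station_km)
    # Station-to-station correlations, with each record perfectly
    # correlated with itself (unit diagonal).
    r_mat = [[1.0 if i == j else toy_corr(abs(station_km[i] - station_km[j]))
              for j in range(n)] for i in range(n)]
    # Correlations between each station and the target location.
    r_vec = [toy_corr(abs(s - target_km)) for s in station_km]
    return solve(r_mat, r_vec)


def completeness(station_km, target_km):
    """Sum of the Kriging weights at the target location."""
    return sum(kriging_weights(station_km, target_km))


stations = [0.0, 500.0, 1500.0]
f_near = completeness(stations, 100.0)    # target close to the stations
f_far = completeness(stations, 50000.0)   # target far from every station
```

In this toy setup the sum of weights stays between 0 and 1, approaching 0 far from all stations and remaining below 1 nearby because the off-diagonal correlations are capped at `r0`.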

Under the ordinary Kriging formulation, we would expect to find the parameters $\hat{\theta}(t_j)$ and $\hat{b}_i$ by minimizing a quality of fit metric:

This is nearly identical to the constraint in equation [2] that:

[21] $\int W(\vec{x}, t_j)\,d\vec{x} = 0$

This latter criterion is identical to equation [20] in both the limit $F(\vec{x}, t_j) \to 1$, indicating dense sampling, and the limit $F(\vec{x}, t_j) \to 0$, indicating an absence of sampling, since $W(\vec{x}, t_j)$ also becomes 0 in this limit. We choose to accept equation [2] as our fundamental constraint equation rather than equation [20]. This implies that our solution is only an approximation to the ordinary Kriging solution in the spatial mode; however, making this approximation confers several advantages. First, it ensures that $\hat{\theta}(t_j)$ and $\hat{b}_i$ retain their natural physical interpretation. Second, computational advantages are provided by isolating the $K_i(\vec{x}, t_j)$ so that the integrals might be performed independently for each station.

Given equations [7] and [13], imposing criterion [2] actually constrains the global average temperature $\hat{\theta}(t_j)$ nearly completely. Though not immediately obvious, constraints [7], [13] and [21] leave a single unaccounted-for degree of freedom. Specifically, one can adjust all $\hat{\theta}(t_j)$ by any arbitrary additive factor provided one makes a compensating adjustment to all $\hat{b}_i$. This last degree of freedom can be removed by specifying the climatology $C(\vec{x})$, applying the zero mean criterion from equation [2] and assuming that the local anomaly distribution (equation [5]) will also have mean 0. This implies:

where is the number of months of data for the i-th station. The modified diagonal terms on the correlation matrix are the natural effect of treating the value $\hat{b}_i$ as if it were entered into the Kriging process, which appropriately gives greater weight to values of $\hat{b}_i$ that are more precisely constrained. As noted previously, the factors associated with latitude and altitude collectively capture ~95% of the variance in the stationary climatology field. Most of the remaining structure is driven by dynamical processes (e.g. ocean and atmospheric circulation) or by boundary conditions such as the nearness to an ocean.

This final normalization described here has the effect of placing the $\hat{\theta}(t_j)$ on an absolute scale such that these values are a true measure of mean temperature and not merely a measure of a temperature anomaly. In practice, we find that the normalization to an absolute scale is considerably more uncertain than the determination of relative changes in temperature. This occurs due to the large range of variations in $\hat{b}_i$, from nearly 30 C at the tropics to about -50 C in Antarctica. This large variability makes it relatively difficult to measure the spatial average temperature, and as a result there is more measurement uncertainty in the estimate of the absolute temperature normalization than there is in the measurement of changes over time.
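The remaining additive degree of freedom can be illustrated with a small sketch. It assumes a simplified model in which each station record is approximated by a global term per month plus a station baseline, ignoring the local weather term; the function names and toy numbers are hypothetical.

```python
def residuals(data, theta, baseline):
    """Misfit data[i][j] - theta[j] - baseline[i] for station i and month j."""
    return [[data[i][j] - theta[j] - baseline[i] for j in range(len(theta))]
            for i in range(len(baseline))]


def shift(theta, baseline, offset):
    """Add offset to every global value and subtract it from every baseline."""
    return [t + offset for t in theta], [b - offset for b in baseline]


# Toy inputs: two stations, three months (values are made up).
data = [[10.2, 10.5, 10.1], [-3.9, -3.4, -4.0]]
theta = [0.1, 0.5, 0.0]
baseline = [10.0, -4.0]

theta_shifted, baseline_shifted = shift(theta, baseline, 7.0)
before = residuals(data, theta, baseline)
after = residuals(data, theta_shifted, baseline_shifted)
```

Because `before` and `after` agree to rounding error, the fit alone cannot fix the absolute level; an additional condition, such as the zero-mean criteria described above, is needed to remove the ambiguity.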

The preceding outline explains the core of our analysis process. However, we make other modifications to address issues of bias correction and station reliability. Whereas other groups use a procedure they refer to as homogenization, our approach is different; we call it the scalpel.

## 4. Homogenization and the Scalpel

Temperature time series may be subject to many measurement artifacts and microclimate effects (Folland et al. 2001, Peterson and Vose 1997, Brohan et al. 2006, Menne et al. 2009, Hansen et al. 2001). Measurement biases often manifest as abrupt discontinuities arising from changes in instrumentation, site location, nearby environmental changes (e.g. construction), and similar artifacts. They can also derive from gradual changes in instrument quality or calibration; for example, fouling of a station due to accumulated dirt or leaves can change the station’s thermal or air flow characteristics. In addition to measurement problems, even an accurately recorded temperature history may not provide a useful depiction of regional scale temperature changes due to microclimate effects at the station site that are not representative of large-scale climate patterns. The most widely discussed microclimate effect is the potential for “urban heat islands” to cause spuriously large temperature trends at sites in regions that have undergone urban development (Hansen et al. 2010, Oke 1982, Jones et al. 1990). As noted in the prior section, we estimate that on average 12% of the non-seasonal variance in a typical monthly temperature time series is caused by short-term local noise of one kind or another.

All of the existing temperature analysis groups use processes designed to detect various discontinuities in a temperature time series and “correct” them by introducing adjustments that make the presumptively biased time series look more like neighboring time series and/or regional averages (Menne and Williams 2009, Jones and Moberg 2003, Hansen et al. 1999). This data correction process is called “homogenization.” Rather than correcting data, we rely on a philosophically different approach.
Our method has two components: 1) Break time series into independent fragments at times when there is evidence of abrupt discontinuities, and 2) Adjust the weights within the fitting equations to account for differences in reliability. The first step, cutting records at times of apparent discontinuities, is a natural extension of our fitting procedure that determines the relative offsets between stations, encapsulated by $\hat{b}_i$, as an intrinsic part of our analysis. We call this cutting procedure the scalpel. Provided that we can identify appropriate breakpoints, the necessary adjustment will be made automatically as part of the fitting process. We are able to use the scalpel approach because our analysis method can use very short records, whereas the methods employed by other groups generally require their time series be long enough to contain a reference interval.
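The cutting step can be sketched as follows. This is a minimal illustration, assuming the breakpoints have already been identified by some other means; the function names are hypothetical, and the per-fragment mean merely stands in for the baseline that the full fitting procedure would determine.

```python
def scalpel(times, values, breakpoints):
    """Cut one record into independent fragments at the given breakpoints.

    A breakpoint t starts a new fragment at the first sample whose time is >= t.
    Empty fragments are dropped.
    """
    cuts = sorted(breakpoints)
    fragments = [[] for _ in range(len(cuts) + 1)]
    for t, v in zip(times, values):
        idx = sum(1 for c in cuts if t >= c)  # number of breakpoints already passed
        fragments[idx].append((t, v))
    return [f for f in fragments if f]


def fragment_offsets(fragments):
    """Mean of each fragment, used here as a stand-in for a fitted baseline."""
    return [sum(v for _, v in f) / len(f) for f in fragments]


# Toy record with a 1.5 C step at month 6 (for example, a station move).
times = list(range(12))
values = [10.0] * 6 + [11.5] * 6

fragments = scalpel(times, values, breakpoints=[6])
offsets = fragment_offsets(fragments)
```

Each fragment is then treated as if it were a separate station record, so the step is absorbed into the two separate offsets rather than being corrected in the data.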

The addition of breakpoints will generally improve the quality of fit provided they occur at times of actual discontinuities in the record. The addition of unnecessary breakpoints (i.e. adding breaks at time points which lack any real discontinuity) should be trend neutral in the fit, as both halves of the record would then be expected to tend towards the same $\hat{b}_i$ value; however, unnecessary breakpoints can amplify noise and increase the resulting uncertainty in the record (discussed below).
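One simple way to see why an unnecessary breakpoint adds uncertainty without adding bias is to compare the standard error of a mean offset estimated from a whole record with that from half of it. The sketch assumes independent monthly noise with a common standard deviation, which is a simplification; the numbers are illustrative only.

```python
import math


def baseline_std_error(noise_sd, n_months):
    """Standard error of a mean offset estimated from n_months of
    independent noise with standard deviation noise_sd."""
    return noise_sd / math.sqrt(n_months)


noise_sd = 0.5   # assumed monthly noise, in C
n_months = 240   # assumed record length

whole = baseline_std_error(noise_sd, n_months)      # no breakpoint
half = baseline_std_error(noise_sd, n_months // 2)  # each fragment after one cut
ratio = half / whole
```

Under these assumptions each fragment's offset has the same expected value as the uncut record's offset, but its standard error is larger by a factor of the square root of 2.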