Technical Comments

Response to Comment on "The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years"

Science  29 Jun 2007:
Vol. 316, Issue 5833, pp. 1844
DOI: 10.1126/science.1141446

Abstract

Reconsidering the basis for selecting proxy records according to their correlation with local temperature has no substantive influence on the statistical significance of 20th-century warming that we reported, provided that the degree of selectivity is correctly estimated. The conclusion that recent warming is unusually widespread compared with the past 1200 years therefore remains valid.

Bürger (1) raises two important and closely related issues that are relevant to many reconstructions of past temperature variability, including our analysis (2). These issues concern the selection of which proxy records to use as predictors of past temperatures and the consideration of the selection process when building a statistical model with which to evaluate the significance of the results. A critical element in both issues is the degree of selectivity being used. In a case where the pool of potential temperature-sensitive proxy records is large but only a small number of these records actually exhibit evidence for temperature sensitivity in practice, the arguments of Bürger that (i) the selection criterion should be stricter than that used in (2), and (ii) the selection criterion can significantly influence the statistical testing, may be valid. But this was not the case for our study. The pool from which we made our selection was actually rather small, and thus the number of positive correlations and the strength of those correlations are both far greater than would have been obtained by chance.

It is difficult to quantify exactly the size of the pool of potential records from which the 14 series used in (2) were selected, because there is implicit and explicit selection at various stages, from the decision to publish original data to the decision to include data in large-scale climate reconstructions. The 14 series used in (2) were selected from three previous studies (3–5), although this set also encompasses almost all the proxies with high temporal resolution used in the other Northern Hemisphere temperature reconstructions cited in (2). The only selection choices relevant to the concerns of Bürger (1) are those that were made on the basis of the correlation between proxy record and local temperature. We did not exclude any of the records from (3) or (5) on the basis of these correlations, but we did exclude two tree-ring records (Mackenzie and Gotland) from those used by (4) because their correlations with the instrumental temperature record were negative. The authors of these previous studies had, of course, already made additional selection decisions, and it is possible that some of these were made on the basis of temperature correlations. Given that (4) did not report any local temperature correlations, it is most likely that their selection was instead based on their a priori expectation of sensitivity to temperature (e.g., by selecting tree-ring records derived from high-elevation or high-latitude sites). Although (5) presented correlations between proxy records and their local temperature series, no records from the Northern Hemisphere were excluded on that basis. The series presented by (3) were chosen to represent a variety of proxy types; there was also a clear expectation that they were temperature sensitive, although this expectation was based on an a priori assessment of probable temperature sensitivity rather than on actual correlation values.

Hence, in our study (2), only two series were excluded on the basis of negative correlations with their local temperature, and no further series had been explicitly excluded by the three studies from which we obtained our data. We cannot be certain that prior knowledge of temperature correlations did not influence previous selection decisions, and there are more levels in the hierarchy of work upon which our study depends at which some selection decisions may have been made on the basis of correlations between proxy records and their local temperature. However, the degree of selectivity is unlikely to be much greater than that for which we have explicit information. Simply, there is not a large number of records of millennial length that have relatively high temporal resolution and an a priori expectation of a dominant temperature signal. Our selection criterion was a relatively weak one: simply that the correlation with local temperature had to be positive for each proxy record (resulting in the selection of 14 from a pool of only 16 series). Bürger argues that the likelihood of incorrectly selecting a temperature-insensitive proxy record (a “type I error”) under this criterion is then 0.5. This may be the case when selecting from a large pool of records with no a priori justification for temperature sensitivity. It is an inappropriate argument in our case, because the probability that at least 14 out of 16 series have positive correlations with temperature simply by chance is less than 0.01 (using the binomial distribution under the assumption that each test is independent). Bürger uses a stricter criterion (that the positive correlation must be significantly different from zero with a confidence of 99%) to select only 8 series. The chance that a temperature-insensitive record passes this test is 0.01, and the probability that 8 out of our pool of 16 records did so by chance is less than 10⁻¹¹.
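The two chance probabilities quoted above can be checked with a short binomial calculation. The sketch below is illustrative only: the function name `binomial_tail` is our own, and it assumes, as the text does, that the 16 tests are independent. It also reads "8 out of 16" as "at least 8 out of 16", which is an assumption on our part.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Weak criterion: a temperature-insensitive record has a positive
# correlation by chance with probability 0.5.
p_weak = binomial_tail(16, 14, 0.5)

# Strict criterion: a temperature-insensitive record passes the 99%
# significance test by chance with probability 0.01.
p_strict = binomial_tail(16, 8, 0.01)

print(f"P(at least 14 of 16 positive by chance)    = {p_weak:.4f}")
print(f"P(at least 8 of 16 pass 99% test by chance) = {p_strict:.2e}")
```

Under these assumptions the first probability is 137/65536, or about 0.002, and the second is of order 10⁻¹², consistent with the bounds of 0.01 and 10⁻¹¹ stated above.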

As a combined sample, therefore, it is very likely that the proxy records we used contain highly significant sensitivity to their local temperatures. Nevertheless, it is possible that some of the series that we selected are poor records of temperature variability. If such series have been included, they will have degraded the record throughout its length and might hamper the recognition of the 20th century as an exceptional period rather than enhance it. The stricter criterion proposed by Bürger would reduce this possibility but would greatly increase the chance that genuinely temperature-sensitive proxies are excluded (a “type II error”). Given the small pool of proxies from which to choose, together with the short period of overlap with instrumental temperature over which to assess the data, the impact of type II errors is quite large. Bürger notes that almost all proxies would fail his criterion if applied to decadal means of proxy data and temperature, yet this is not because the correlations are weaker overall (in fact, they are mostly stronger) but because the degrees of freedom are much lower and the probability of erroneous exclusion of good proxies is, therefore, much greater. It is possible for some proxies to exhibit a much closer match to local temperature variations on decadal time scales than on subdecadal time scales, yet this is very difficult to evaluate statistically (6). Our decision to use a weak criterion for selecting proxy records was intended to reduce the probability of erroneous exclusion of good proxies.
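The loss of degrees of freedom when moving from annual to decadal means can be illustrated with a rough calculation of the correlation needed to pass a one-sided 99% test. This is a minimal sketch, not the test used in (1) or (2): it uses the Fisher z-transform approximation, ignores autocorrelation, and assumes a hypothetical 100-year overlap with instrumental temperature. The function name `critical_r` is ours.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n: int, confidence: float = 0.99) -> float:
    """Approximate one-sided critical correlation for n independent
    samples, using the Fisher z-transform (standard error 1/sqrt(n - 3))."""
    z = NormalDist().inv_cdf(confidence)
    return tanh(z / sqrt(n - 3))


overlap_years = 100  # hypothetical length of the instrumental overlap

r_annual = critical_r(overlap_years)         # 100 annual values
r_decadal = critical_r(overlap_years // 10)  # 10 decadal means

print(f"critical r, annual values: {r_annual:.2f}")
print(f"critical r, decadal means: {r_decadal:.2f}")
```

With these assumed numbers the threshold rises from roughly 0.2 for annual values to roughly 0.7 for decadal means, so a proxy with a stronger decadal correlation can still fail the stricter test.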

Bürger illustrates the effect of the selection stage on the statistical test used to assess the significance of deviations in inferred temperature from the millennial-mean level. The significance levels determined in (1) are based on a selection of eight random series that happened by chance to correlate positively (with 99% confidence) with a local temperature record. These eight would, therefore, have been selected as having the strongest temperature correlations out of a pool of 800 (because, in general, only 1% would exhibit strong correlations by chance). The 14 proxies we used were selected from a pool of only 16 proxies, which is clearly a much lower degree of selectivity than that simulated in (1). Therefore, we address Bürger's second concern by undertaking a more appropriate simulation of the procedure in (2). The assessment of the statistical significance of the results of (2) is modified so that, rather than comparing the real proxy results with a similar analysis based on 14 random synthetic proxy series, we now generate 16 synthetic series and select for analysis the 14 that exhibit the strongest correlations with their local temperature records (regardless of whether these correlations are statistically significant, or even positive, although positive ones are selected before negative ones). When this procedure is repeated 10,000 times, 95% of the results lie between the thick lines shown in Fig. 1. Although these do deviate from the significance levels estimated in (2) (thin lines in Fig. 1) during the final centuries of the analysis (principally by a negative shift around 1900 and a positive shift from 1950 onward), the difference is small enough that our original conclusions remain unaltered.
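The modified procedure can be sketched in a few lines. The code below is a simplified illustration and not the code used for Fig. 1: it assumes white-noise synthetic proxies and white-noise local temperatures (the analysis in Fig. 1 used proxy series shifted randomly in time), a hypothetical 200-year record with a 50-year calibration window, no smoothing, and only 200 repetitions instead of 10,000. All function names and parameters are ours.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def normalize(x):
    """Scale a series to zero mean and unit variance over its full length."""
    n = len(x)
    m = sum(x) / n
    s = (sum((a - m) ** 2 for a in x) / n) ** 0.5
    return [(a - m) / s for a in x]


def one_trial(rng, pool=16, keep=14, years=200, calib=50):
    """Generate `pool` synthetic proxies, keep the `keep` series with the
    highest correlation against their own local temperature series, and
    return the yearly index (fraction > 1 minus fraction < -1)."""
    scored = []
    for _ in range(pool):
        proxy = [rng.gauss(0, 1) for _ in range(years)]
        temp = [rng.gauss(0, 1) for _ in range(calib)]  # final `calib` years
        scored.append((pearson(proxy[-calib:], temp), proxy))
    # Highest correlations first, so positive ones are chosen before negative.
    scored.sort(key=lambda item: item[0], reverse=True)
    kept = [normalize(p) for _, p in scored[:keep]]
    index = [
        (sum(s[t] > 1 for s in kept) - sum(s[t] < -1 for s in kept)) / keep
        for t in range(years)
    ]
    return index, [r for r, _ in scored]


def percentile_bands(trials=200, seed=0, **kwargs):
    """Approximate 5th and 95th percentiles of the index for each year."""
    rng = random.Random(seed)
    runs = [one_trial(rng, **kwargs)[0] for _ in range(trials)]
    lo, hi = [], []
    for t in range(len(runs[0])):
        vals = sorted(run[t] for run in runs)
        lo.append(vals[int(0.05 * (trials - 1))])
        hi.append(vals[int(0.95 * (trials - 1))])
    return lo, hi


lo, hi = percentile_bands()
```

Setting `pool=800` and `keep=8` in the same sketch would mimic the much higher degree of selectivity attributed to (1) above, which is the contrast the paragraph draws.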

Fig. 1.

Difference between the fraction of proxy records in each year with normalized values >1 and the fraction with values <–1 (double line) compared with the 5th and 95th percentiles of randomized results using the procedure described in (2) (thin lines) and a modified version in which 14 series are selected from 16 proxy series shifted randomly in time, with the selection made by excluding the two series with the lowest or most negative correlations against the 16 local temperature records (thick lines).

The larger impact of the selection process on the significance levels estimated by Bürger is the result of inappropriate modeling of the degree of selectivity used in (2). Nevertheless, we agree with Bürger that the selection process should be simulated as part of the significance testing process in this and related work and that this is an interesting new avenue that has not been given sufficient attention until now.

References and Notes
