Open Mind

Breaking Records

June 26, 2009 · 23 Comments

According to temperature data from GISS the hottest year on record is 2005, but according to data from HadCRU (the HadCRUT3v data set), the hottest year is 1998. You might wonder whether there’s any significance to the fact that, eleven years later, the HadCRU data set hasn’t yet set a new record. HadCRU data shows a much stronger influence from the very strong 1998 El Niño than does GISS data; hence the HadCRU 1998 record is considerably more extreme than the GISS 1998 record (which was the GISS record at the time). How long should we expect it to take to break a record, anyway? How long does a record have to remain unbroken before we have statistically significant evidence that global warming might have peaked in 1998?

The gory mathematical details are given at the end of this post. But for the HadCRUT3v data set, the 1998 record was a whopping 2.6 standard deviations above the trend line. That’s a lot! So we should already expect it to take a while to break that record.

Using the formulae outlined at the end of this post, we can compute the probability that the record won’t be broken until a given later year n, assuming a steady warming rate of 0.017 deg.C/yr. That probability is shown in the left-hand graph; the right-hand graph shows the “survival function” (strictly speaking not the survival function, but the probability that the record is not broken until year n or later):


We see that the most likely single year in which to break the record is year 10 (2008), although there’s still considerable probability that the record will last longer than that. In fact, there’s a 6.9% chance the record will last 14 years — until 2012 — even assuming, as we have done, that global temperature is a steady increase plus random noise. Hence the “95% confidence limit” (the standard in scientific research) is 14 years; only if the record lasts beyond 2012 do we have statistically significant evidence of any change in the global warming pattern.

The HadCRU record lasts so long because temperature in 1998 was so far above the trend line. What about GISS? In this case, the 1998 value is only 2.2 standard deviations above the trend, so it’s easier to break the record. Still, the 1998 record shouldn’t last beyond 2010; it should be broken by then. And in fact the GISS 1998 record WAS broken, in 2005. It was also tied in 2007.


The new GISS record is 2005, but that’s only 1.2 standard deviations above the trend, so it shouldn’t take as long to break. In fact it shouldn’t last beyond 7 years, so if we don’t break it by 2012, only THEN should we wonder why the record hasn’t been exceeded. Of course, that assumes we don’t have some unforeseen event like a massive volcanic eruption, which would cool the planet and alter the underlying trend.
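For a sense of scale, here is a minimal Python sketch (an illustration added here, not part of the original analysis) that converts each deviation quoted above into the one-sided chance that normal noise alone would put a single year that far above the trend line. It assumes normally distributed noise, as the model at the end of this post does; the helper name `upper_tail` is hypothetical. This measures how unusual the record year itself was, not how long the record should last, which also depends on the warming rate.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Deviations above trend quoted in this post, in units of the noise standard deviation.
records = {
    "HadCRUT3v 1998": 2.6,
    "GISS 1998": 2.2,
    "GISS 2005": 1.2,
}

for name, z in records.items():
    print(f"{name}: {z} sd above trend, P(noise at least this large) = {upper_tail(z):.4f}")
```

The larger the deviation, the smaller this tail probability, which is the intuition behind why the HadCRU 1998 record is expected to stand longer than either GISS record.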


Lots of denialists (lots of them) use the 1998 record in the HadCRU data to claim that “global warming stopped in 1998” or “the globe has cooled since 1998.” Lots of other analyses show how foolish such claims are, but this particular one shows it with crystal clarity: the fact that HadCRU data hasn’t yet exceeded its 1998 value is nothing more than what is to be expected. Anyone who tells you different is selling something.

But a cleverly crafted yet fundamentally flawed “sales pitch” is all they’ve got.

Probability for breaking the record in year n

Let’s suppose that annual average global temperature is the combination of a steady, linear trend at a rate of 0.017 deg.C/yr, and normally distributed white noise. This is actually a pretty good approximation; we know the noise isn’t white noise but for annual averages it is at least approximately so, and in fact the noise approximately follows the normal distribution. Then our simple model of annual average temperature is

x_t = \alpha + \beta t + \varepsilon_t,

where \alpha is the intercept of the trend line, \beta is its slope (about 0.017 deg.C/yr), and \varepsilon_t is random (normally distributed white) noise with mean value zero and standard deviation \sigma.
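One way to sanity-check the formulae derived below is a Monte Carlo simulation of this model. The sketch is an illustrative technique added here, not the method used for the graphs above. It sets \alpha = 0 (whether a record is broken depends only on differences, so the intercept drops out) and measures everything in units of \sigma, so the hypothetical parameter `z0` stands for \varepsilon_0/\sigma and `b` for \beta/\sigma. Function names, the trial count, and the seed are assumptions.

```python
import random


def first_break_year(z0, b, rng, max_years=1000):
    """Simulate one realization of x_t = alpha + beta*t + eps_t with alpha = 0 and
    sigma = 1, starting from a record x_0 = eps_0 = z0.

    z0: record-year noise eps_0, in units of sigma
    b:  trend beta, in units of sigma per year
    Returns the first year t >= 1 with x_t > x_0, or None if the record
    survives max_years (vanishingly unlikely when b > 0).
    """
    x0 = z0
    for t in range(1, max_years + 1):
        xt = b * t + rng.gauss(0.0, 1.0)  # trend plus fresh, independent normal noise
        if xt > x0:
            return t
    return None


def simulate(z0, b, trials=100_000, seed=1):
    """Estimate the probability that the record is first broken in each year n."""
    rng = random.Random(seed)  # seeded so results are reproducible
    counts = {}
    for _ in range(trials):
        n = first_break_year(z0, b, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / trials for n, c in counts.items()}
```

For example, `simulate(z0=2.6, b=0.2)` estimates the break-year distribution for a record 2.6 standard deviations above trend. The value `b=0.2` is only a placeholder: the post does not state \sigma, and hence not \beta/\sigma.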

Now suppose that in some particular year (call it “year zero”) the noise term \varepsilon_0 is big enough to set a new record for global annual average temperature. Since t=0 in that year, the record temperature is

x_0 = \alpha + \varepsilon_0.

What’s the chance of breaking the record the following year? To break the record we require x_1 > x_0, or

x_1 = \alpha + \beta + \varepsilon_1 > \alpha + \varepsilon_0 = x_0.

This is the same as requiring

\varepsilon_1 > \varepsilon_0 - \beta.

If the noise follows the probability density function f(\varepsilon), with cumulative distribution function F(\varepsilon), then that probability is just

Probability = 1 - F(\varepsilon_0-\beta).

Since the noise is normal, we can use the standard normal cdf \Phi(z) to compute the noise cdf as

F(\varepsilon) = \Phi(\varepsilon/\sigma).
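The one-year probability can be computed with Python’s standard library alone, using \Phi(z) = (1 + erf(z/\sqrt{2}))/2. This is a minimal sketch; the function names are hypothetical, and you must supply \sigma yourself, since the post does not state the value it used.

```python
import math


def normal_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def noise_cdf(eps, sigma):
    """F(eps) = Phi(eps / sigma) for zero-mean normal noise with standard deviation sigma."""
    return normal_cdf(eps / sigma)


def prob_break_year1(eps0, beta, sigma):
    """Probability that x_1 > x_0, i.e. eps_1 > eps_0 - beta, which is 1 - F(eps_0 - beta)."""
    return 1.0 - noise_cdf(eps0 - beta, sigma)
```

For instance, with a hypothetical \sigma = 0.1 deg.C, a record 2.6\sigma above trend, and \beta = 0.017 deg.C/yr, the call is `prob_break_year1(2.6 * 0.1, 0.017, 0.1)`. A positive \beta always raises the chance of breaking the record relative to \beta = 0.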

What’s the chance we don’t break the record until the 2nd year after it’s set? For that to happen, first we have to NOT break the record the following year, which has probability

Probability(not 1st year) = F(\varepsilon_0 - \beta).

Then we have to break the record the 2nd-following year. This means x_2 > x_0, or

x_2 = \alpha + 2 \beta + \varepsilon_2 > \alpha + \varepsilon_0 = x_0.

This is the same as

\varepsilon_2 > \varepsilon_0 - 2 \beta,

and the probability of that happening is

 1 - F(\varepsilon_0 - 2 \beta).

Since the noise is white, the noise values in different years are independent, so the probability of not breaking the record in year 1 and breaking it in year 2 is the product of these probabilities, namely

Probability(year 2) = F(\varepsilon_0 - \beta) [ 1 - F(\varepsilon_0 - 2 \beta) ].
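A short sketch of the year-2 case, multiplying the two factors exactly as in the formula above. The function names are hypothetical, and normal noise is assumed as throughout.

```python
import math


def noise_cdf(eps, sigma):
    """F(eps) = Phi(eps / sigma) for zero-mean normal noise with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(eps / (sigma * math.sqrt(2.0))))


def prob_break_year2(eps0, beta, sigma):
    """Probability the record survives year 1 and is broken in year 2."""
    not_year1 = noise_cdf(eps0 - beta, sigma)              # eps_1 <= eps_0 - beta
    break_year2 = 1.0 - noise_cdf(eps0 - 2.0 * beta, sigma)  # eps_2 > eps_0 - 2*beta
    return not_year1 * break_year2
```

As a check, with no record margin and no trend (\varepsilon_0 = 0, \beta = 0) each factor is 1/2, so the year-2 probability is 1/4.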

By similar reasoning, the chance we won’t break the record until year 3 is the probability of NOT breaking it in year 1, times the probability of NOT breaking it in year 2, times the probability of breaking it in year 3, which is

Probability (year 3) = F(\varepsilon_0 - \beta) F(\varepsilon_0 - 2 \beta) [ 1 - F(\varepsilon_0 - 3 \beta) ].

You can probably see a pattern developing; the chance that we won’t break the record until year n is

Probability (year n) = F(\varepsilon_0 - \beta) F(\varepsilon_0 - 2\beta) F(\varepsilon_0 - 3\beta) ... F(\varepsilon_0 - (n-1)\beta) [1 - F(\varepsilon_0 - n\beta)].
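The general formula translates into a running product: keep track of the probability that the record is still standing, and multiply it by 1 - F(\varepsilon_0 - n\beta) to get the chance the record falls in year n. The sketch below (hypothetical names, normal noise assumed) also finds a 95% limit under one reading of the “95% confidence limit” used earlier in this post: the smallest n for which the probability that the record survives through year n drops below 5%. The post does not state its \sigma, so this code is not claimed to reproduce the specific numbers quoted above.

```python
import math


def noise_cdf(eps, sigma):
    """F(eps) = Phi(eps / sigma) for zero-mean normal noise with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(eps / (sigma * math.sqrt(2.0))))


def record_break_pmf(eps0, beta, sigma, max_years):
    """Return [P(year 1), ..., P(year max_years)], where P(year n) is
    F(eps0 - beta) * ... * F(eps0 - (n-1)*beta) * [1 - F(eps0 - n*beta)]."""
    pmf = []
    survive = 1.0  # probability the record is still standing after year n-1
    for n in range(1, max_years + 1):
        f_n = noise_cdf(eps0 - n * beta, sigma)  # probability of NOT breaking it in year n
        pmf.append(survive * (1.0 - f_n))
        survive *= f_n
    return pmf


def confidence_limit(eps0, beta, sigma, level=0.95, max_years=1000):
    """Smallest n such that the record survives through year n with probability
    below 1 - level; None if that does not happen within max_years."""
    survive = 1.0
    for n in range(1, max_years + 1):
        survive *= noise_cdf(eps0 - n * beta, sigma)
        if survive < 1.0 - level:
            return n
    return None
```

With \beta = 0 the product collapses to the geometric form p^{n-1}(1-p), where p = F(\varepsilon_0). A positive \beta makes each successive factor F(\varepsilon_0 - k\beta) smaller, so the record becomes ever more likely to fall and the probabilities sum to 1.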

Categories: Global Warming

23 responses so far

  • Deep Climate // June 26, 2009 at 9:20 pm

    Great post and very understandable, thanks. I hope Edward Wegman reads it.

    In the last equation for probability (year n), is there any simplification possible of the product of F1 through Fn-1 that represents the cumulative probability of not breaking the record for years 1 through n-1?

  • Timothy Chase // June 27, 2009 at 12:42 am

    It has been a while for me, but what I would expect is a modified form of the law of exponential decay. With exponential decay, the survival probability for year n would be expressed as a power of the probability of “surviving” the initial year (p): for year n, the probability of survival to and including that year is p^n, and the probability of decaying in a particular year would be (p^(n-1))(1-p). Exponential decay would likewise involve white noise, as there would be no correlation between a given year and the succeeding year.

    So as I see it, the difference between that pure exponential decay and the formula you have given is the nβ terms (n = 1, 2, 3, 4, …), where β is simply the slope of the trend line and represents the constant march year after year along that trend line.

    And looking back over your explanation it appears that this is exactly what you have done. Is that it? My apologies. It has been a while for me.

    [Response: Essentially yes, bearing in mind that the \beta term affects the argument of the cumulative distribution function.]

  • Bob Tisdale // June 27, 2009 at 1:14 am

    Tamino: FYI, the Hadley Centre changed SST data sources in 1998. The following quote is from the Hadley Centre:

    “Brief description of the data
    “The SST data are taken from the International Comprehensive Ocean-Atmosphere Data Set, ICOADS, from 1850 to 1997 and from the NCEP-GTS from 1998 to the present.”

    And now a quote from ICOADS:

    “ICOADS Data
    “The total period of record is currently 1784-May 2007 (Release 2.4), such that the observations and products are drawn from two separate archives (Project Status). ICOADS is supplemented by NCEP Real-time data (1991-date; limited products, NOT FULLY CONSISTENT WITH ICOADS).” [Emphasis added.]

    This change in data suppliers created an upward step change in their data with respect to the SST datasets that did not swap suppliers at that time (ERSST.v2, ERSST.v3b, OI.v2).

    And GISS has used OI.v2 SST data since December 1981.


  • michel lecar // June 27, 2009 at 8:50 am

    Problem with HADCRU is, they will not reveal either their raw data or their algorithms.

    So I can’t really see the sense of debating their stuff. Still less using it in any public policy debate. It is not reproducible; it’s not subject to external scrutiny. It could be right or wrong, who knows? It’s not science. At the moment it is at the level of ‘trust me, I’m a climate science expert’. No, if you want to be taken seriously, show us the workings.

    Unlike GISS, to Hansen’s great credit. There may be things wrong with the GISS algorithms and raw data, but GISS is setting an example of reproducibility and verifiability which HADCRU needs to follow. If they don’t, get their stuff out of IPCC and get it out of all policy discussions.

  • Barton Paul Levenson // June 27, 2009 at 12:35 pm


    Could you express an equation like

    e1 > e0 - beta

    as

    e1 > (e0 - beta)

    to make it clearer for us computer science types? Remember that in many programming languages,

    (e1 > e0) - beta

    would be evaluated differently from

    e1 > (e0 - beta)

    and might give a different answer depending on the implementation. Remove all ambiguity!

    [Response: I sympathize with your dilemma, but including the parentheses would be bad form mathematically although clearer for programmers. It's not incorrect, but mathematicians would wonder why I included the unnecessary parentheses. Nothing personal, really! -- but I think I'll conform to standard mathematical style.]

  • george // June 27, 2009 at 3:03 pm

    I wonder about the value of the whole “record” thing.

    As we saw recently when NASA’s small error (and adjustment) “changed” the rankings for the continental US (though not with statistical significance), some people actually misuse/abuse the rankings.

    Fox News and others were reporting that 1934 had suddenly become the “hottest year on record,” with the implication that it was for the entire globe, when in fact the result was for the continental US AND the difference between 1998 and 1934 was STILL not significant (neither before nor after the adjustment).

    Also, as pointed out above, the fact that the hadCRUT temp for 1998 is 2.6 std deviations above the trend may not be entirely due to nature (unless you consider the data set switch or possible errors in the calculation of the global temp “natural”).

    So, scientifically speaking, it’s a little hard to gauge what a “record” temperature actually means.

    Unfortunately, the general (unscientific) public has no such problem assessing records. A record is a record and only steroids can change that.

    Perhaps worst of all, if you set up a record to be broken within a certain time period, with the idea that if it is NOT broken global warming becomes suspect, I think you may be asking for trouble. If the record is not broken and the HadCRUT 1998 temperature was actually in error, it will be very hard to convince people that the fact that the record has not been broken in 14 years (or whatever) is really meaningless.

    I think this may be another case where the public gets confused by a descriptive tool that is less than optimal and may actually be counterproductive.

    I think statements like the following are a better indicator (to both the public and to scientists) of what is happening than the “record.”

    “The ten warmest years [of the instrumental record since 1880] all occur within the 12-year period 1997-2008.” (NASA GISS)

  • MikeN // June 27, 2009 at 3:14 pm

    >the fact that HadCRU data hasn’t yet exceeded its 1998 value is nothing more than what is to be expected.

    That doesn’t look true. 95% confidence isn’t the same as 50% confidence. ‘What’s expected’ is the 50% confidence level.

    [Response: Your statistical naivete is showing.]

  • dhogaza // June 27, 2009 at 3:35 pm

    BPL, as someone who made his living writing high-end compilers for a variety of languages and processors during my 20s and early 30s, offhand I can’t think of any mainstream language which gives comparison operators like “>” equal or higher precedence than arithmetic operators like “-”.

  • Timothy Chase // June 27, 2009 at 4:29 pm

    dhogaza wrote:

    … I can’t think of any mainstream language which gives comparison operators like “>” equal or higher precedence than arithmetic operators like “-”.

    Another point: the (a>b) will be a boolean, and as such its treatment from one computer language to another will be ambiguous since true will be 1 in some languages but -1 in others — that is assuming the language isn’t strongly typed to begin with, in which case subtracting a numerical value from a boolean would be strictly verboten anyway.

  • Timothy Chase // June 27, 2009 at 4:51 pm

    RE a>b-c

    Anyway I am glad this came up.

    For five years I was doing VB6, and although (a>b) was looked down on a bit due to its ambiguity it was a nice shorthand that simplified code. So I could definitely see where BPL was coming from. At the same time I had vaguely noticed the notational convention employed in math.

    Nice to think about — as it involved some connections.

  • MikeN // June 27, 2009 at 5:07 pm

    >Response: Your statistical naivete is showing.]

    Oh, you want to use expected value instead? Looking at your chart, that still doesn’t give you a number higher than 10.

    [Response: The expected value in statistics isn't what we "expect" to get, it's the average value of repeated identical experiments as the number of repetitions grows unboundedly. It's not even the single most likely value (that's the mode). And the likelihood of getting that value is often quite small -- including in this case, for which the most likely value has less than 13% probability of occurring. In fact for a continuous (rather than discrete) random variable, the probability of getting exactly the "expected value" is equal to zero.

    As for the idea that what we "expect" is the 50% confidence limits, that's utter nonsense -- we expect the result to be outside 50% confidence limits as often as it's within them.

    What we "expect" is that most of the time (95% of the time being the de facto scientific standard) it will be within a given set of confidence limits (95% confidence limits). Only when that fails to happen do we have any statistical evidence that our hypothesis is mistaken. Even that's not proof; we "expect" to be outside the 95% confidence limits for no other reason than random fluctuation, 5% of the time.

    The level of naivete you've exhibited about statistics is astounding, but hardly surprising. It's the obstinacy with which you cling to your ignorance that's truly embarrassing. If you simply admit it, we'll respect your wisdom; if not...]

  • george // June 27, 2009 at 6:45 pm

    There are broken records and then there are broken records.

    The former might mean something but the latter almost never do.

  • michel lecar // June 27, 2009 at 7:22 pm

    But you are not answering the question.

    If the HADCRU originating data and algorithm have not been revealed, how do we know that the various trends and levels are not an artifact of the way it’s been compiled? So, why do we think any tests of significance are testing movements in temperature, as opposed to movements in the index?

    What you are showing is that there are significant movements in the HADCRU index. How do you know this corresponds to movements in temperature?

    One is sure they have done their best. But unless we can verify what that best amounted to, it’s a waste of time thinking much about what their work, taken at face value, shows.

    [Response: You're just flapping your lips in an attempt to smear HadCRU. The close match of HadCRU, GISS, NCDC, and other data sets is plenty of confirmation that they're on the right track, and the results of HadCRU data are independently recoverable from other data sets. Do yourself a favor and give it up.]

  • Deep Climate // June 27, 2009 at 8:20 pm

    But let’s not think about:
    x + 2 = 2x

    (x + 2) == (2 * x)


  • Hank Roberts // June 27, 2009 at 8:46 pm

    MikeN, Tamino is sincere about admitting ignorance and is a very good teacher.

    Don’t be thin-skinned; he’s much less caustic than either of my statistics teachers tended to be with me; it goes with the territory.

    “… Often, the person telling you to do a search … thinks (a) the information you need is easy to find, and (b) you will learn more if you seek out the information than if you have it spoon-fed to you.

    You shouldn’t be offended by this; by hacker standards, your respondent is showing you a rough kind of respect simply by not ignoring you. You should instead be thankful for this grandmotherly kindness…. the direct, cut-through-the-bullshit communications style that is natural to people who are more concerned about solving problems than making others feel warm and fuzzy…. Get over it. It’s normal. In fact, it’s healthy and appropriate.

    Community standards do not maintain themselves: They’re maintained by people actively applying them, visibly, in public. …
    Remember: When that hacker tells you that you’ve screwed up, and (no matter how gruffly) tells you not to do it again, he’s acting out of concern for (1) you and (2) his community. It would be much easier for him to ignore you …
    (end excerpt)

  • Riccardo // June 27, 2009 at 10:04 pm

    In the claim that 1998 is 2.6 standard deviations (SD) above the trend line, is the SD calculated from the stated error of the measurements or from the residuals over a given period of time?

    [Response: You can't use the stated measurement error because that includes only measurement error, not the natural variation which we're really interested in. I estimated the deviation in two ways: first, by fitting a lowess smooth to the entire data set and basing it on the residuals from that fit, and second, by fitting a line to the 1975-present data and basing it on the residuals from that fit. Both estimates put 1998 2.6 standard deviations above the trend.]

  • Lazar // June 28, 2009 at 12:04 am


    they will not reveal either their raw data or their algorithms

    the algorithms are described in relevant papers, the main one for hadcrut3 is available free at the hadley or cru websites, there’s a list of surface stations used in crutem3 at cru and their data can be obtained free from ghcn or the relevant national weather service, and finally sst measurements are available free from noaa icoads

    Its not science

    … and you’re in a better position to judge that than the referees and the probably hundreds of scientists who use hadcrut?… and you think people here will believe you?
    it takes helluvalota work to ask the right questions… let alone make substantial criticism…
    have you read those references re climate sensitivity?

  • Ray Ladbury // June 28, 2009 at 12:08 am

    Michel, an old saying: A man with one watch always knows what time it is, even if the watch is broken. A man with two watches is never sure, but at least he’ll know if one of them is broken. HADCRU is not the only watch we have.

  • Ray Ladbury // June 28, 2009 at 12:12 am

    Mike N.,
    First, confidence and probability are different entities. Second:
    mean=expected value=1st moment
    mode=most probable value
    median=point where the cumulative probability is 0.5

    Dude, go learn some probability. You’ll get a lot more out of Tamino’s posts.

  • dhogaza // June 28, 2009 at 12:24 am

    Dude, go learn some probability. You’ll get a lot more out of Tamino’s posts.

    Let’s have him over for poker, first …

  • Glen Raphael // June 28, 2009 at 3:31 am

    So: when HadCRU still hasn’t exceeded 1998’s level as of the end of 2012, *then* you will be convinced there’s something wrong with your model? Good to know!

    (Actually, if Bob is correct that 1998 was bumped to a higher level due to a one-time change in data sources, then the jump wasn’t really 2.6 standard deviations after all. In which case the numbers are heavily padded in your favor. Still, easily falsifiable short-term predictions are pretty rare in the climate debate, and this one seems pretty likely to bite you, so good on you for making it!)

    [Response: You're mistaken. If the 1998 figure is too high, there's still the same probability of exceeding the given *numerical* value whether it's a genuine temperature record or not -- except that the inflated 1998 value makes the estimated trend rate too high, so it causes an overestimate of the likelihood of exceeding the given numerical value. Hence the numbers are "padded" *against* breaking the record.

    As for "pretty likely to bite me," you offer exactly the evidence for that I expected: none.

    If the HadCRU data don't exceed the 1998 value by 2012, that's evidence but not proof of a difference between model and reality. If there's a known cause for such an observation (Pinatubo-scale or larger volcano next month), all bets are off. The GISS data have ALREADY exceeded the 1998 record.

    It's generally only denialists who are desperately seeking some single event or measure that justifies saying "global warming is wrong." Sane and honest climate researchers acknowledge that climate is a lot more complicated than that, we have to aggregate all the evidence; it's not a single record-setting year, it's the combination of hundreds, even thousands, of evidences that combine to make an overwhelming case for global warming. Perhaps that's just too subtle for you; it certainly is for Bob Carter.

    The POINT of this post is that not only is the 1998 HadCRU record not "proof" against global warming, it isn't even evidence.]

  • michel lecar // June 28, 2009 at 6:46 am

    Simple question, and maybe I am wrong about this, if so, I’ll own up to it. Where exactly does one find the data and the algorithm in a form that one can run it, and generate the series?

    Steven Haxby quotes the following reply from Defra on this subject

    Although I accept that you are understandably concerned over this issue relating to scientific practice, the CRU is an independent organisation which receives no DECC funding for developing the CRU land dataset and therefore DECC does not have any proprietary rights to this data. It is up to Professor Jones, as the dataset’s owner, to release this data. So far, in response to various freedom of information requests, he has released only the names of the meteorological stations used to compile his dataset, but the station data for many of these (though admittedly not all) can in fact be obtained at the Goddard Institute for Space Studies (GISS) website at [my emphasis]

    So it really does not sound, does it, as though Defra thinks all the data is available in a form which will let one first verify that the algorithm applied to it will generate the time series, and then after that move to asking whether the algorithm is appropriate? Is this wrong, and is there a full data set and a code listing someplace where we can get it and look at it?

    The reply (by Andrew Glyn) goes on to say:

    the HadCRU global temperature graph was one of four that were cited by the Intergovernmental Panel on Climate Change (IPCC) in their 2007 Assessment as evidence of the warming that has occurred since the end of the nineteenth century. One of these graphs was produced by GISS who do make available on their website all the station names and the associated temperature data along with their calculation computer code. The graph produced by GISS is very similar to HadCRU (and the other two independently produced graphs) but is calculated by a different method to that developed by Professor Jones. The close similarity of these graphs indicates that there are no concerns over the integrity of the HadCRU global temperature graph.

    Which is actually not far from the suggestion I made, if you look at it in a different light. It amounts to saying don’t use the stuff. Use other series where the underpinnings have been placed in the public domain. Because this one adds nothing and has to be verified by referring to them.

    But that of course is not the way it’s publicized.

  • Ray Ladbury // June 28, 2009 at 11:03 am

    I’ll bring chips. Mike N., bring lots of money.
