Open Mind

Analyze This

November 5, 2007 · 38 Comments

No, this isn’t about a movie starring Robert de Niro and Billy Crystal (although I do like that movie — especially the cameo appearance by Tony Bennett).

Sooner or later, just about every scientist has to deal with numerical data. That’s one of the good things about being a mathematician; everybody needs us. A lot of scientists (especially in the physical sciences) are talented mathematicians themselves, and even make contributions to the study of statistics, devising ingenious new methods of analysis. But a lot of scientists go astray, either because they don’t appreciate the subtleties of the analysis method they apply, or because they don’t choose the right method in the first place.

Because global warming is a prominent environmental and political issue, it has brought scientific data under the scrutiny of the general public. Those who viewed Al Gore’s An Inconvenient Truth probably never saw a movie with so many graphs in it! But one of the lessons of mathematics is that if we’re not careful, the numbers can lead us to the wrong conclusion. It was Mark Twain who popularized the saying (attributing it to Benjamin Disraeli) that there are three kinds of lies: lies, damned lies, and statistics.

Let’s take a look at some numerical data that just happen to be temperature data from the beginning of 1975 to the end of 2001, for a particular location (it doesn’t matter where). The data were not chosen to establish any particular result or behavior. I took data from ECA (the European Climate Assessment and Dataset Project) because they provide daily measurements, and selected location number 1. Without further ado, here’s a graph of the data:

The first thing we can see is a regular pattern of ups and downs. We can see this better if we zoom in on a smaller time frame; here’s the first five years of this data set:

We see that there are wide swings in temperature each year, so the seasons are strong in this location; hence it’s probably not near the equator. We also see that temperature tends to be coldest shortly after the start of the year, hottest shortly after the middle of the year, so this must be a northern-hemisphere location (which you already figured out because it comes from the European Climate Assessment and Dataset Project).

Cycles, Period, Frequency

If we didn’t already know this was temperature data, we might not be sure that the ups and downs really are an annual cycle; perhaps they’re not as “regular” as they appear. Maybe the cycle is something close to a year, but not quite; maybe the length of the cycle is changing. We can get an idea about that by applying period analysis. Methods of this kind look for periodic behavior in our data, and the most popular by far is Fourier analysis (named for the brilliant French mathematician Jean Baptiste Joseph Fourier). If there is periodic fluctuation, it will have a period (in this case, one year). For each period there’s a corresponding quantity called the frequency, which is the number of cycles per unit of time. If each cycle is one year, the frequency is one cycle per year. If each cycle is one tenth of a year, the frequency is ten cycles per year. In fact the relationship between period $P$ and frequency $\nu$ is quite simple:

$\nu = 1/P$.

Fourier analysis generally produces a plot called a Fourier spectrum, or power spectrum, or sometimes a periodogram, which plots, as a function of frequency, a quantity called the power. The power is an indicator of the chance that there’s real periodic behavior at that frequency. Here’s the periodogram for our data:

At most frequencies, the power level is very low, barely noticeable on the graph. But at frequency $\nu = 1$ cycle/year, there’s a very tall peak. This tells us that these data show a period of one year, and no other detectable periods. That doesn’t mean no other periods exist, just that they’re not detectable with this analysis of these data! But this result means it’s safe to assume that the up-and-down fluctuations are truly periodic, with an unchanging period of one year: the cycle of the seasons.
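The calculation behind such a periodogram can be sketched in a few lines. This is a toy illustration using synthetic data (an annual cycle plus white noise), since the ECA series itself isn’t reproduced here:

```python
import numpy as np

# Synthetic stand-in for the daily series: an annual cycle plus noise
# (the actual ECA data aren't reproduced here).
rng = np.random.default_rng(0)
t = np.arange(0, 27, 1 / 365.25)               # time in years, daily sampling
temp = 10 * np.sin(2 * np.pi * t) + rng.normal(0, 2, t.size)

# Classical periodogram: power at each Fourier frequency
power = np.abs(np.fft.rfft(temp - temp.mean())) ** 2 / t.size
freq = np.fft.rfftfreq(t.size, d=1 / 365.25)   # in cycles per year

peak = freq[np.argmax(power)]                  # should land near 1 cycle/year
```

For a series with a genuine annual cycle, the tallest peak sits at (very nearly) 1 cycle/year, just as in the plot above.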

The point is that if it looks cyclic, with a single constant period, it might be subtly different; in particular, your period may vary. For temperature data, it’s generally safe to assume a consistent period of one year because we know the physics behind it. But for data in general, the appearance may be an illusion; the methods of period analysis enable us to confirm or deny more complicated behavior.

Removing the Seasons with Averages

The seasonal variation is so big that it swamps whatever long-term trend may be present. If we really want to know whether there’s a trend in temperature change here, we need to remove that seasonal variation. One way to do this is to take annual averages. Each year has its winter, spring, summer, and fall, so if we average the data throughout the entire year, we’ll get the average over all seasons, and that will remove the seasonal changes.

There’s one thing to be careful of. For this procedure to work, we have to have data for every day of the year. If we’re missing some, that could bias our results and introduce a false seasonal effect. Suppose, for instance, that in one particular year we’re missing ten days in the middle of summer. When we average the year’s data, we’ll be leaving out ten of the hottest days — so our average will be unnaturally cold. It’s like estimating average height, but leaving out the ten tallest people in the sample.
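A minimal sketch of the annual-averaging step, with a guard against the incomplete-year bias just described (this is a hypothetical helper, not the code used for the plots):

```python
import numpy as np

def annual_means(years, temps, min_coverage=1.0):
    """Average daily temperatures within each calendar year.

    Years with fewer days than `min_coverage` of a full year are
    dropped, to avoid the seasonal bias described above.
    """
    years = np.asarray(years)
    temps = np.asarray(temps)
    out = {}
    for y in np.unique(years):
        sel = years == y
        if sel.sum() >= min_coverage * 365:
            out[int(y)] = temps[sel].mean()
    return out

# Toy check: two complete years of constant data
yrs = np.repeat([1975, 1976], 365)
vals = np.concatenate([np.full(365, 5.0), np.full(365, 7.0)])
means = annual_means(yrs, vals)
```

With complete coverage, each year’s mean is just the average of its 365 daily values.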

The ECA data do indeed cover every day of each year, so we can safely remove the seasonal influence by computing annual averages. And here they are:

We see a lot of fluctuation from year to year; this is annual variation.

Looking for a Trend

But does it show a trend? The instinct for a lot of people is to take the difference between the values of the first and last data points, and use that to estimate a trend:

From 1975 to 2001, the change is -1.17 deg.C over 26 years, which indicates cooling at 0.045 deg.C/yr. But is this right? No! It ignores the fact that two single years can differ wildly because of all that annual variation. Worse yet, this process essentially ignores all the data from 1976 to 2000, and that’s a poor strategy for estimating the trend from 1975 to the end of 2001.

We can also see that the difference-between-single-years method gives wildly varying results depending on which years you choose to start and end with, by taking the difference between 1976 and 2000:

For this choice of start/end years, the change is +2.14 deg.C over 24 years, for a warming rate of +0.089 deg.C/yr. That’s very different!

There’s a much better way of looking for a trend, called linear regression: find the straight line which gives the best “fit” to the data, in the sense that the sum of the squares of the errors (the differences between the data values and the straight-line values) is minimized. Here’s the linear regression fit to the data:

This indicates warming, at a rate of 0.037 deg.C/yr.

Is it Real?

Warming at 0.037 deg.C/yr is about twice the global rate. But is this result meaningful? After all, the data show a lot of annual variation; isn’t it possible that we’d get some trend rate from linear regression just because of random fluctuations in the data? Yes, it is, and the more variation in the data, the bigger the trend we can get even when it’s just random. So we need to apply a statistical significance test. There’s a standard way to test linear regression: we assume that the data are just noise, i.e., purely random, then we estimate the odds of getting the result we got from random chance alone. We can do this for the annual average temperatures, and the result is that there’s about an 8% chance of getting a trend that large from random data with as much variation as these data. This is the false-alarm probability, so we’d say there’s an 8% false-alarm probability for the trend test. We could also say that the result has 92% confidence.

In science, the “standard” threshold for calling a result significant is a 5% false-alarm probability, or “95% confidence.” Since our result is more likely than that to arise by chance, we can’t say it’s significant with 95% confidence. We can say it’s significant with 92% confidence, but since that doesn’t meet the standard confidence level for scientific analysis, most scientists would regard warming as probable but by no means established.

We can also determine a probable error for the trend rate we get from linear regression. The probable error for the warming rate computed from annual averages is 0.022 deg.C/yr. There’s about a 95% chance that the actual value lies within 1.96 “probable error units” of our estimate, so the error range in our estimated warming rate is about $1.96 \times 0.022 = 0.043$. So we can say, with 95% confidence, that the true warming rate lies between 0.037 - 0.043, and 0.037 + 0.043, or simply that it lies between -0.006 (slight cooling) and +0.080 (major warming) deg.C/yr.
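For the curious, here’s how the slope and its standard error might be computed from scratch under the white-noise assumption. The data here are synthetic, with a built-in trend of 0.037 deg.C/yr and comparable scatter, purely for illustration:

```python
import numpy as np

# Synthetic annual averages with a built-in trend of 0.037 deg.C/yr
rng = np.random.default_rng(2)
x = np.arange(1975, 2002, dtype=float)
y = 6.5 + 0.037 * (x - 1975) + rng.normal(0, 0.8, x.size)

# Least-squares slope and its standard error (white-noise assumption)
xm, ym = x.mean(), y.mean()
sxx = np.sum((x - xm) ** 2)
slope = np.sum((x - xm) * (y - ym)) / sxx
resid = y - (ym + slope * (x - xm))
s2 = np.sum(resid ** 2) / (x.size - 2)   # residual variance
stderr = np.sqrt(s2 / sxx)

# 95% range via the 1.96-standard-error shortcut discussed in the text
lo, hi = slope - 1.96 * stderr, slope + 1.96 * stderr
```

With 27 points and this much scatter, the standard error comes out near 0.02 deg.C/yr, consistent with the error range quoted above.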

A quick note: the value 1.96 for probable-error-units (1.96 “standard deviations”) is pretty close to 2, so we often take a shortcut and just use the value 2. This overestimates the error range (only slightly), and overestimates the false-alarm probability (underestimates the confidence level), so it’s a conservative choice; we might miss a trend but we won’t identify one falsely. Hence we often say “two standard deviations” for 95% confidence, but if we’re being precise, we should say “1.96 standard deviations.”

For our test, we computed the false-alarm probability as the chance of getting the given result, if the data are really just random noise with no trend. The supposition behind this test, that the data are only random, is the null hypothesis. If the probability of getting our result when the null hypothesis is true is so small that it’s just not credible, then we reject the null hypothesis.

Rejecting the null hypothesis means that we can say with confidence that the data are not just noise, there’s some signal there. But it does not establish that the data follow a straight line. Maybe it’s an upward-curving line, or downward-curving line, or up-then-down-then-up-then-up-even-faster curve. But if we get a result that’s too unlikely to be believable for random noise, we can only say that the data aren’t just random noise. There’s some signal there, but we may not have the right idea of what the signal is.

This is important to bear in mind. A statistically significant test result doesn’t establish that the signal follows the pattern we’ve tested for (in this case, a straight line). It merely rejects the null hypothesis, that there’s no signal at all. For the annual average temperatures we’re looking at, we haven’t even succeeded in doing that; it’s still believable (but not very likely) that the data are just noise, no signal.

Longer Averages

When we averaged over each year, we did more than just remove the cycle of the seasons. We also removed the fast changes in temperature that are shown in the data. Temperature changes on many time scales. There are very fast changes; the difference between night and day can be quite large and happen in less than 12 hours. But those are already removed from our data, because they’re daily mean temperature. We’ll also see (sometimes sizeable) changes from day to day, from week to week, from month to month. From the perspective of climate, this is very fast variation. By taking annual averages, we smooth out the variations that happen on a time scale less than one year.

We still see year-to-year changes, and we can call this fast variation. Climate is the average weather over many years, so this fast variation doesn’t tell us a whole lot about it. We’re really interested in the changes that last longer. It often helps to slow the changes down even more, by taking averages over yet longer time periods. Here, for instance, is the above data averaged every five years instead of every year:

Now we have a clearer picture of long-term (slow) changes. We also have greater visual indication of a trend: the last three averages are all higher than the first three. It also suggests that perhaps temperature is changing, but not steadily; another possibility is that it shifted from one value to another around 1990.

Comparing Averages: More than One Way to Skin a Cat

We could carry the averaging idea to the limit by comparing the average temperature for the 1st half of the data to the average for the 2nd half. Let’s take our annual average data and compare the average over the 1975-1988 period to the average over the 1988-2002 period.

For 1975-1988, the average of the 13 data points is 6.3198 with a standard deviation of 0.8737; for 1988-2002, the average of the 14 data points is 7.2524 with a standard deviation of 0.7002. Are the averages significantly different?

There’s a standard way to test significance between the difference of two averages. Given averages $\mu_1$ and $\mu_2$ based on $N_1$ and $N_2$ data points, with standard deviations $\sigma_1$ and $\sigma_2$, we compute the quantity

$z = (\mu_2 - \mu_1) / \sqrt{\sigma_1^2 / N_1 + \sigma_2^2 / N_2}$.

Under the null hypothesis that the data are random and share the same mean, the quantity $z$ will follow the normal distribution with mean 0 and standard deviation 1.

For annual average temperature from this location, the quantity $z$ turns out to be 3.05. For a standard normal variable (mean 0, standard deviation 1), the probability of getting a result that big or bigger is less than 1% (in fact, about 0.2%). So we can reject the null hypothesis with less than 1% false-alarm probability, or greater than 99% confidence.
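We can plug the quoted numbers straight into the formula above to reproduce this result:

```python
import math

# The numbers quoted above for the two halves of the record
mu1, sigma1, n1 = 6.3198, 0.8737, 13
mu2, sigma2, n2 = 7.2524, 0.7002, 14

z = (mu2 - mu1) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
# Two-sided tail probability under the standard normal
p = math.erfc(abs(z) / math.sqrt(2))
# z comes out near 3.05, p near 0.002 (about 0.2%)
```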

This is a much stronger result than we got from linear regression. We’re also using a different model of temperature change. For linear regression, the model is one of steady temperature change at a constant rate. For the before-after average comparison, the model is one of “step change” — one average applies before 1988, another after 1988. We can graph this model, and the fit that results from it, here:

The fact that we got notably stronger statistical significance from this model, indicates that from a purely statistical point of view the step-change model gives a better match to the data than the steady-change model. But I’ll emphasize that this doesn’t prove that either model is correct; the only thing we can claim to “prove” (in the statistical sense) is that the null hypothesis is wrong: the data are not just noise, there’s some signal there. We can say with confidence that this location has gotten warmer, but how it has warmed is still an open question.

Using All the Data

The trend analysis for these data so far is based on 27 annual averages. Frankly, that’s not a lot of data! The original data consist of 9,862 measurements; it’d be nice if we could use them all. There’s more information content in the 9,862 data points than in just the 27. But if we just use the raw data as is, we still have to contend with the annual cycle, which really does swamp the long-term changes (which is what we’re after). Is there a way to remove the cycle without averaging an entire year’s measurements into a single number? Fortunately, there is.

Removing the Cycle with Anomalies

We can determine what the average annual cycle looks like, by averaging not entire years, but times of year over many successive years. For example, if we take the data for the first day of the year (or the first week, or the first month), from every year in the sample, then we’ll have a good idea of the average behavior for the first day (or week or month) of the year.

Then we can take each individual measurement, and subtract from it the average value for that time of year. This will tell us, not the temperature on Jan. 1, but the difference between a given year’s Jan. 1 temperature and the Jan. 1 average. This is called the temperature anomaly. Positive numbers indicate warmer-than-average data, negative numbers indicate cooler-than-average data, whatever the season.

The more data we put into each day’s (or week’s, or month’s) average, the more precise our average behavior will be. Using daily averages gives us only 27 points in each average, and that’s not very precise. So, I’ll compute averages for each week of the year (actually, for each 1/50th of a year, which is very close to a week):

I’ve plotted two full cycles (which are identical to each other), so we’ll get a clear picture at all times of year (rather than having wintertime “split” between the start of the graph and its end).

The shape of the average annual cycle is evident. We also see a little bit of “jiggling about” in addition to the annual cycle, mainly because of random fluctuations. We can get a smoother estimate of the annual cycle by computing a smooth version of the data which we require to be exactly the same for every year. Fourier analysis is an ideal tool for this, because any periodic function can be expressed as a Fourier series, i.e., a sum of sine/cosine waves all with frequencies that are integer multiples of the fundamental frequency. The fundamental frequency is 1 cycle/year, so we’ll find the best fit for a combination of sinusoidal cycles with frequencies 1 cycle/year, 2 cycles/year, 3, 4, etc.; these are the harmonics of the fundamental frequency. Using six harmonics, we get a smooth picture of the annual cycle which looks like this:
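The harmonic fit can be sketched as an ordinary least-squares fit of sines and cosines. Here’s a toy check with a pure annual sinusoid rather than the actual ECA data; the function name is hypothetical:

```python
import numpy as np

def fit_annual_cycle(t, y, n_harmonics=6):
    """Least-squares fit of sinusoids at 1..n_harmonics cycles/year.

    t is time in years; returns the fitted smooth cycle at each t.
    """
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Toy check: a pure annual sinusoid should be recovered almost exactly,
# so the anomalies (residuals) should be essentially zero
t = np.arange(0, 5, 1 / 365.25)
y = 8.0 * np.cos(2 * np.pi * t)
anomalies = y - fit_annual_cycle(t, y)
```

Subtracting the fitted cycle from the data gives the anomalies, exactly as described in the next paragraph.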

Now we can subtract this behavior from the observed behavior, leaving residuals which represent the departure of each data point from the average for that time of year: the anomalies. They look like this:

We can also view with more time detail by looking at just the first five years of anomalies:

Now we can apply the same analyses as before. We’ll have more information due to more data, but we’ll also note that there’s more “scatter” (higher standard deviation) in the daily anomalies than in our annual averages, so we can’t expect the significance of our statistical tests to scale directly with the number of data points. For linear regression, the estimated warming rate is 0.0378 deg.C/year, and it’s significant with false-alarm probability less than one in a billion, greater than 99.9999999% confidence. Likewise, comparing the pre-1988 average to the post-1988 average gives significant warming with far greater than 99.9999999% confidence. It looks like we’re getting huge results!! But …

There’s Noise, and then there’s Noise

The statistical tests we’ve applied so far assume that the random part of the data (the noise) is white noise. This means that the noise part of every data point is completely independent of the noise part of every other data point. So, the noise on Jan. 2nd 1975 is completely independent of the noise on Jan 1st 1975, and every other day as well. This is a very common property of noise, and applies to a great many situations.

But it doesn’t apply to temperature data. If Jan. 1st is exceptionally warm because we’re in a wintertime heat wave, then Jan. 2nd is likely to be unusually warm too. For temperature, the random fluctuations (the “weather”) show strong autocorrelation. In particular, nearby (in time) values tend to show positive correlation; random fluctuations don’t apply to a single instant, they tend to persist, albeit briefly. This kind of noise is called red noise.

Red noise can affect the significance of statistical tests dramatically. To compensate, we need to estimate how much autocorrelation exists in the noise, and how long it persists. There are many ways to do this. Probably the most common is to assume that the noise follows what is called an AR(1) process. In an AR(1) process, we assume that each noise value $\epsilon_n$ (not the signal, but the noise) is related to the previous noise value $\epsilon_{n-1}$ according to

$\epsilon_n = \alpha \epsilon_{n-1} + \beta_n$,

where the values $\beta_n$ are a white-noise process. We can estimate the value of the constant $\alpha$ by analyzing the data values.

For an AR(1) process, we can correct the statistical test by reducing the number of data points $N$ to an “effective number” $N_{eff}$, given by

$N_{eff} = N (1 - \alpha) / (1 + \alpha)$.
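A rough sketch of this adjustment, estimating $\alpha$ as the lag-1 autocorrelation of the residuals (the helper name is mine, not the author’s):

```python
import numpy as np

def effective_n(noise):
    """Effective number of data points under an AR(1) noise model,
    with alpha estimated as the lag-1 autocorrelation."""
    e = np.asarray(noise, dtype=float)
    e = e - e.mean()
    alpha = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
    return e.size * (1 - alpha) / (1 + alpha)

# Strongly autocorrelated noise (alpha = 0.8): far fewer effective points
rng = np.random.default_rng(3)
x = np.zeros(5000)
for i in range(1, 5000):
    x[i] = 0.8 * x[i - 1] + rng.normal()
n_eff = effective_n(x)   # roughly 5000 * 0.2 / 1.8, i.e. around 550
```

With $\alpha = 0.8$, five thousand correlated measurements carry only about as much information as five or six hundred independent ones, which is exactly why the naive significance levels above were so inflated.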

Making this adjustment, we find that the linear regression rate is significant at the 99.6% confidence level. That’s quite strong, but not 99.9999999% confidence.

However, just because noise is red noise doesn’t mean it’s AR(1) noise. I like to estimate the red-noise effect by taking into account the autocorrelation at all lags, in the most conservative possible way (the way that gives us the least chance of establishing significance). Using the stricter significance test, we find that the linear regression result is only significant at the 94% confidence level. That’s right on the edge of the “standard” 95% confidence level, and a rather strong result — but nowhere near 99.9999999% confidence! For the before-and-after average comparison, the result is significant at 99.8% confidence.

In fact, we should have corrected for autocovariance when analyzing the annual averages; the effect in that case is not very large, but not zero either. It turns out that the red-noise-corrected significance test results from the daily anomalies are about the same as the uncorrected results from the annual averages. In this particular case, the information gain from using all the data is about equal to the false information gain from failure to correct for red noise. That’s just a coincidence.

If at First You Don’t Succeed …

There’s another factor which is often overlooked, even by professionals. When we tested this data for temperature change, we didn’t just apply one test, we applied two. This means that we had not one, but two chances to get an apparently significant result just by accident. Suppose the data are just noise, no signal, and we require 95% confidence. With one test, we have a probability of 0.95 of failing the test. With two tests, the chance of double failure is only $0.95 \times 0.95 = 0.9025$. With three tests, it’s $0.95 \times 0.95 \times 0.95 = 0.8574$. So with multiple tests, we have more chance of getting a “hit” just by chance. After all, you don’t have much chance to win the lottery just by luck, but if you buy more tickets you have more chance.
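The arithmetic above is easy to package:

```python
# With k independent tests, each at the 95% level, the chance of at
# least one accidental "significant" result grows with k
def false_alarm_chance(k, confidence=0.95):
    return 1 - confidence ** k
```

Two tests give a combined false-alarm chance of 1 − 0.9025 = 0.0975, three give 1 − 0.8574 = 0.1426, and so on: the more tickets, the better the odds of a spurious “win.”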

Actually, it’s not quite so simple because different tests are usually not independent. But the point remains that if we apply a vast number of tests, the chance of getting an apparently significant result just by accident increases. So with multiple tests, we need to correct the critical value (i.e., the value we must exceed to call the result “significant”) to compensate. For the two tests we’ve applied (linear regression and average comparison), the effect is slight, and doesn’t materially affect our result: linear regression is almost significant at 95% confidence, average comparison is definitely so.

For comparing averages, I split the data into before- and after-1988 subsets because it was a year boundary which split the data roughly in half. But I could have compared before and after 1987, or 1989, or 2000, or 1992.148. In fact I could have tested all possible boundary moments. Doing so constitutes one form of the SCUSUM test (strongly related to “change point analysis”). But since SCUSUM tests so many different boundary moments, we have to compensate for applying multiple tests. However, the tests at different boundaries are so far from independent, that we don’t have to compensate nearly as much as we would for that many independent tests. Test statistics for SCUSUM have been computed using computer simulations to determine the likelihood of given results from truly random data. It turns out that SCUSUM indicates the strongest difference between before and after averages occurs when the split is at 1987.962, which is very close to 1988. So, the initial choice of 1988 was very fortuitous. That too is a coincidence.
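A crude sketch of the scan over all possible boundaries. This is only the scanning step; as noted above, proper significance thresholds for the best split must come from simulation, since so many splits are tried:

```python
import numpy as np

def best_split(y):
    """Scan every interior split point; return the index and z-score
    of the split with the largest before/after mean difference.
    (Scanning only -- significance for the winning split needs
    simulated thresholds, because many splits are tested.)"""
    best_k, best_z = None, 0.0
    for k in range(2, y.size - 2):
        a, b = y[:k], y[k:]
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        z = abs(b.mean() - a.mean()) / se
        if z > best_z:
            best_k, best_z = k, z
    return best_k, best_z

# Toy series with a step up of 2 degrees at index 20
rng = np.random.default_rng(4)
y = np.concatenate([np.full(20, 5.0), np.full(20, 7.0)])
y = y + rng.normal(0, 0.5, 40)
k, z = best_split(y)   # the best split should land near index 20
```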

Smoother than Smooth

It’s often quite revealing to remove the random fluctuations, at least approximately, by smoothing the data. I sometimes think there are as many ways to do this as people who do it! Such methods can be broadly divided into two categories: global and local methods. Global methods analyze all the data in one fell swoop, to produce a smooth curve approximating the data. Local methods estimate the value at any given moment by using data in a small region of time near the moment for which we’re trying to estimate (often with nearer data given more weight).

We’ve already seen one smoothing method: taking averages. The graph of averages showed a lot of scatter, but far less than the original data (even if we take anomalies); we removed a lot of the random variation by taking averages. The longer the averaging period, the more random variation we remove; the 5-year averages removed more of it than the 1-year averages. One-year averages smooth the data on a roughly 1-year timescale, 5-year averages on a 5-year timescale, etc.

I’ll show the results of three smoothing methods. One is a low-pass filter; it’s a global method based on Fourier analysis. Any function at all can be expressed as a Fourier series, and if the time coverage is finite it can be expressed with a finite number of Fourier terms (a finite number of frequencies). The low frequencies correspond to slow variations, the high frequencies to fast variations. If we take the Fourier series expression of our data, and ignore the high-frequency terms, we’ll rid ourselves of the fast variations, leaving only the slow. Since we expect that the noise changes faster than the signal, we’ll eliminate most of the noise but only a little signal. How much smoothing we do (the timescale of the smoothing) depends on our choice of cutoff frequency. If we keep only the lowest frequencies, we keep only the slowest variation and we have a long timescale; the higher our cutoff frequency, the faster variations that can get through, and the shorter our smoothing timescale.
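A minimal version of such a low-pass filter, assuming the series is evenly sampled (real data with gaps need more care):

```python
import numpy as np

def lowpass(y, dt, cutoff):
    """Fourier low-pass filter: zero every coefficient above `cutoff`
    (in cycles per unit time), keeping only the slow variations."""
    Y = np.fft.rfft(y)
    freq = np.fft.rfftfreq(y.size, d=dt)
    Y[freq > cutoff] = 0.0
    return np.fft.irfft(Y, n=y.size)

# A slow sinusoid plus fast jitter; a cutoff of 2 cycles per unit time
# keeps the slow part and discards the fast part
t = np.arange(0, 10, 0.01)
slow = np.sin(2 * np.pi * 0.3 * t)
fast = 0.5 * np.sin(2 * np.pi * 25 * t)
smooth = lowpass(slow + fast, dt=0.01, cutoff=2.0)
```

Raising or lowering `cutoff` directly controls the smoothing timescale, as described above.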

Another is wavelet analysis. This is a local analysis, which fits a single-frequency sinusoid to the data near any point at which we wish to make an estimate, weighting each datum by its distance from the point in question according to Gaussian decay (so the “weight function” looks just like the normal probability distribution). By varying the width of our weighting function, we can control the timescale of the smoothing.

The third is the method of moving polynomials. Again it’s a local method, often Gaussian-weighted, and the width of the weighting function again controls the timescale of smoothing.
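A bare-bones Gaussian-weighted moving-polynomial smoother might look like this (a sketch, not the author’s actual implementation):

```python
import numpy as np

def moving_poly(t, y, width, degree=2):
    """Local polynomial smoother: at each time, fit a polynomial with
    Gaussian weights of the given width and take its value there."""
    out = np.empty(y.size)
    for i, t0 in enumerate(t):
        w = np.exp(-0.5 * ((t - t0) / width) ** 2)
        # np.polyfit squares its weights, so pass sqrt(w)
        coef = np.polyfit(t - t0, y, degree, w=np.sqrt(w))
        out[i] = coef[-1]          # polynomial evaluated at t = t0
    return out

# Noisy sinusoid: the smoother should recover the underlying curve
t = np.linspace(0, 10, 200)
y = np.sin(t) + np.random.default_rng(5).normal(0, 0.2, t.size)
smooth = moving_poly(t, y, width=0.5)
```

As with the other two methods, the `width` parameter plays the role of the smoothing timescale.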

Here are the results of the three different methods:

Plainly, all three methods give similar results. This is no surprise; after all, they should all give results which are close to the true signal, so they should all give results which are close to each other. The biggest differences are at the extremes of the seasons, in the dead of winter and height of summer. For these data, I consider the moving-polynomial method to give the best approximation, mainly because there’s so much data.

We can already see one thing from this: the winter troughs vary more than the summer peaks. It looks as though there’s more change occurring during winter than summer. Perhaps we could study different seasons separately to confirm or deny this.

To Everything There is a Season

This is relatively simple (at least, by one method). We can isolate the data from each of the four seasons and analyze it separately. I’ll adopt the climatological definition of seasons: winter is Dec-Jan-Feb, spring is Mar-Apr-May, summer is Jun-Jul-Aug, fall is Sep-Oct-Nov. Then we can apply the same analyses to the anomalies from each season separately.

Using linear regression, we estimate the summer trend as a mere 0.007 deg.C/yr, in autumn it’s only 0.013 deg.C/yr, in spring it’s as much as 0.048 deg.C/yr, while winter gives a rate of 0.084 deg.C/yr. The fall and summer rates are not statistically significant, even using the much-too-lenient white noise assumption. The spring warming passes the white-noise test but not the AR(1) or full-red-noise test. The winter warming is significant using the AR(1) model for red noise, but not the full red noise test. This emphasizes that we really need all the data to establish significant warming. Even so, the analysis is a strong indicator of what’s probably happening here: only mild warming if at all during summer and fall, more warming in spring, and the most warming during winter.

The SCUSUM test gives somewhat different results. Summer shows the most significant difference between pre- and post-1977.53 (earlier than the data as a whole), slightly cooling by a mere 0.05 deg.C, although the result is definitely not significant. Spring shows probable warming with its biggest difference before and after 1988.33, warming by 0.55 deg.C. Fall now shows more warming, and more significance, than spring, rising by 1.36 deg.C around 1999.63 (quite a bit later than the data as a whole). The most significant warming (but not the largest) is in winter, changing about 1987.96 (almost exactly the same time as the data as a whole) by about 1.19 deg.C.

The conclusion? This location has warmed. The warming is closer to a step-change in late 1987 than a steady warming, although neither model is a perfect match to the data. Warming is most significant in winter; fall warming appears to be large as well, but seems to come quite a bit later than winter warming. Warming during summer is not strongly indicated at all.

There’s more data from this location, but I limited the analysis because just the post-1975 data had nearly 10,000 data points, and much more than that makes Excel (which I used to create the plots) slow down to a crawl (even the 10,000 made it painfully slow). Perhaps some day I’ll look into the entire record and report the results.

I have taken a few shortcuts in all of this (for the sake of clarity) and left out some details, and to be perfectly honest, I can’t promise that at the moment I’m not a little delirious. Also, this little exposition is far from complete; there are many more aspects to the problem and tools to apply. Still, I hope it gives some insight into the many complications that can arise when trying to answer as simple a question as whether a single temperature time series shows warming, cooling, or neither. Certainly the most naive analyses can go astray in a number of ways. And it may well be that some brilliant mathematician will, at some future date, find fault with our present understanding, or find much more powerful new methods to extend it, in either case shining new light on the best way to answer such a simple question.

Finally, the location to which this data applies is Vaexjoe, Sweden.

Categories: Global Warming · climate change

38 responses so far

• John Mashey

Wonderful!

1) My cognitive psychologist colleagues always said:

“Humans are way better than computers at finding patterns, and often find ones that aren’t even there.”,

which is why this kind of analysis is so worthwhile.

2) “If at first you don’t succeed” is a good reminder that if:

a) You datamine large numbers of studies, you will always find false positives.
OR
b) You take several sets of time-series data,
allow lag-times to vary, and
pick convenient endpoints

you can find almost anything … which then evaporates when new data arrives. As I recall, there have been a few papers like that in this turf, as people are always discovering cycles of various lengths.

• Excellent, T. Statistics for pleasure? Never entertained that idea before…

So, how do you test for lengthening of seasons? That is to say, the length of time that “summer” or “winter” lasts in a given location? You have to choose an arbitrary (but relevant) temp and look for change in the length of time that it is exceeded.

So, are summers in Sweden now longer, and winters shorter?
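One way to sketch that test (hedged: the 10 deg C threshold and the sinusoidal toy series below are my own illustrative choices, not anything from the post):

```python
import numpy as np

def season_length(daily_temps, threshold):
    """Count the days in one year's daily series that exceed a threshold.

    The threshold is arbitrary but should be climatologically relevant;
    tracking this count year by year would reveal lengthening summers.
    """
    return int((np.asarray(daily_temps) > threshold).sum())

# Toy annual cycle: mean 7 deg C, amplitude 10, warmest around day 200.
days = np.arange(365)
temps = 7 + 10 * np.cos(2 * np.pi * (days - 200) / 365)
print(season_length(temps, 10.0))  # length of the "warm season" in days
```

Applying `season_length` to each year of the record, then regressing the yearly counts on time, would address the "are summers longer" question directly.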

• cody

Thanks, very nice, very clear, very well explained. Have you ever considered R, though? Spoken as one wearily wondering whether it's worth the effort, but also tired of waiting for a spreadsheet to get through it all.

[Response: R squared gives the fraction of total variance which is explained by whatever model we fit. This has its uses, but it doesn't really tell us the statistical significance; that also depends on the (effective) number of data points and on the number of degrees of freedom in the model we choose. So R alone doesn't enable you to compute statistical significance. For me, it's usually not worth the wait (but there are cases in which it's useful information).]

• henry

Wow.

Took a while to read, still looking through.

Just a quick question, though:

As you said:

"We see that there are wide swings in temperature each year, so the seasons are strong in this location; hence it’s probably not near the equator. We also see that temperature tends to be coldest shortly after the start of the year, hottest shortly after the middle of the year, so this must be a northern-hemisphere location (which you already figured out because it comes from the European Climate Assessment and Dataset Network)."

This chart shows us the trends in one location.

I can imagine that a SH location (same relative position), would show a cycle 180 degrees out.

Would these two cycles cancel out?

If possible, could you find a SH/NH location pair and show how a global average would be determined?

[Response: The northern-southern seasonal cycles cancel each other almost, but not exactly. Earth is closest to the sun (perihelion) during NH winter/SH summer, so the SH has a somewhat stronger seasonal cycle than the NH, and the planet as a whole is a bit warmer in January (when perihelion occurs).

Global average temperature calculations (for tracking global warming) are all based on *anomalies* rather than raw temperature, so all the seasonal cycles are removed before the global average is computed.]
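A minimal sketch of the anomaly calculation (the two synthetic "hemisphere" series below are invented for illustration, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
years = 30
# Hypothetical monthly climatologies: NH warm mid-year, SH the opposite phase.
months = np.arange(12)
seasonal_nh = 10 * np.cos(2 * np.pi * (months - 6) / 12)
nh = 8 + seasonal_nh + rng.normal(0, 1, (years, 12))
sh = 12 - seasonal_nh + rng.normal(0, 1, (years, 12))

def anomalies(t):
    """Subtract each calendar month's long-term mean (the climatology)."""
    return t - t.mean(axis=0)

# With the seasonal cycles removed, hemispheric averaging is unproblematic.
global_anom = 0.5 * (anomalies(nh) + anomalies(sh))
print(float(np.abs(global_anom.mean(axis=0)).max()))  # ~0 by construction
```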

• Bananus

Sorry, but Tamino's response to henry's comment is wrong. If I recall correctly, earth is cooler in January than in July because of the larger thermal inertia of the Southern Hemisphere compared to the Northern Hemisphere (ocean versus land). However, total energy content is larger in January than in July.

The SH also has a smaller seasonal cycle than the NH, due to the thermal inertia of its oceans.

[Response: I stand corrected. Do you have a reference where I could find more details?]

• Don Fontaine

Cody is probably referring to the stats platform R, not the correlation coefficient R. It is free, at
http://cran.r-project.org

[Response: I've always insisted on writing my own programs to implement my own twist on things. I realize it's just stubborn on my part, but I resist using stats platforms or code recipe books, and at this point I'm too old a dog (or is that dinosaur?) to learn that many new tricks. Advice to the aspiring statistician: don't make the same mistake. Get in the habit of making good use of all that work that other people do for you.

And what's your opinion of R?]

• John Cross

Tamino: Excellent, interesting and informative. I think this is one of your best posts ever!

Regards,
John

• henry

“If I recall correctly, earth is cooler in January than in July because of the larger thermal inertia of the Southern Hemisphere compared to the Northern Hemisphere (ocean versus land). However, total energy content is larger in January than in July.

The SH also has a smaller seasonal cycle than the NH, due to the thermal inertia of its oceans.”

Will wait to see the references that Bananus comes up with, but does this mean that SH oceans will get more solar energy than NH land?

[Global average temperature calculations (for tracking global warming) are all based on *anomalies* rather than raw temperature, so all the seasonal cycles are removed before the global average is computed.]

Would still like to see a NH/SH pair worked out using raw data:

1). To refute the argument that AGW is a “regional” thing

and

2). To see what kind of a signal is left after the “seasonal cycle cancellation effect” is removed.

Should still show a trend, shouldn’t it?

Just curious.

[Response: I don't doubt that Bananus is right; I'm just curious about more details.

When I looked into Milankovitch cycles, I discovered that even though one hemisphere can get more intense solar irradiance during summer because of closer approach to the sun, the *annual total* is unaffected because that hemisphere's summer season is *shorter*. The 1/r^2 factor from earth-sun distance exactly cancels the r^2 factor due to faster angular movement when closer to the sun. See e.g. Huybers 2006, Science, 313, 508-511.]
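The cancellation can be written down in two lines. Kepler's second law says the orbital angular momentum per unit mass, $L = r^2 \, d\theta/dt$, is constant, while instantaneous insolation scales as $1/r^2$; so the solar energy received per unit of orbital angle $\theta$ is

$$\frac{dE}{d\theta} \propto \frac{1}{r^2}\,\frac{dt}{d\theta} = \frac{1}{r^2}\cdot\frac{r^2}{L} = \frac{1}{L},$$

independent of $r$. A season spanning a fixed range of orbital angle therefore receives the same total energy whether it occurs near perihelion or aphelion, just compressed into less time.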

• Aaron Lewis

Good! However, why do you consider daily temperature data to be more instructive than say, daily sea ice area or sea ice extent data? It may be that a similar analysis of sea ice data should be submitted to a journal, or better yet, the Atlantic Monthly or New Yorker.

• tamino: excellent post ! You wrote: “For linear regression, the estimated warming rate is 0.0378 deg.C/year, and it’s significant with false-alarm probability less than one in a billion, greater than 99.9999999% confidence. Likewise, comparing the pre-1988 average to post-1998 gives a significant warming with far greater than 99.9999999% confidence. It looks like we’re getting huge results!! But …”

Over in ‘Uncertain sensitivity’ I have mentioned the famous paper by Lockwood et al. [on which so much research on solar long-term changes in the 20th century and even before hinges]:

“A Doubling of the Sun’s Coronal Magnetic Field during the Last 100 Years
M. Lockwood, R. Stamper, and M.N. Wild
NATURE Vol. 399, 3 June 1999. Pages 437-439

You’ll find this remarkable statement in their paper:
“Recently, an unprecedentedly high and significant correlation coefficient of 0.97 has been obtained [...] giving a significance level of 99.999,999,999,999,87 %”

I think they even beat out your number :-)
When I tried to quote that number in one of my papers, the reviewer at first wouldn't let me quote 'such a ridiculous number'. Lockwood et al. must have felt this themselves, because they wrote it as (100 - 1.3×10^(-13)) %, which should fool nobody, but apparently has fooled a lot of people, since their paper has been cited by very many without anybody [except one, me] commenting on this.
The reason for this ridiculous nonsense is, as you point out, that many time series show persistence: successive data points are not independent.

Such nonsense must be widespread. Just last week I was asked to review a paper on sunspot activity based on daily values of the sunspot number. The author claimed that he had ~4000 daily values per solar cycle and calculated the statistical significance based on that many independent data points. It has been known for at least 80 years that the number of truly independent data points in a solar cycle is not 4000 [or 5,760,000 if you count the sunspots every minute instead of once a day], but only about 20 [twenty].

• Chirstopher

“I like to estimate the red-noise effect by taking into account the autocorrelation at all lags, in the most conservative possible way (the way that gives us the least chance of establishing significance). ”

Could you offer some details as to how? In the same style as the overall article? For AR(1) you estimate alpha from the data and use that to get at effective N, correct? How would this work across all lags?

[Response: For the AR(1) model, the constant $\alpha$ is the autocorrelation at lag 1 (often called $\rho_1$). If it's AR(1), that then determines all the autocorrelations $\rho_n$, at all lags n. The ratio of the number of data points to the effective number is

$1 + 2[\rho_1 + \rho_2 + \rho_3 + \cdots]$.

When the $\rho_n$ are determined by an AR(1) process, it works out to the expression given above.

When the $\rho_n$ are not AR(1), it’s necessary to estimate all of them to compute the sum (or at least, enough of them that we can safely assume the remaining terms are small). This is not easy because the usual estimators are biased (meaning, what we expect to get from the estimate is not the true value). So first, I use my own estimator, which is less biased (but has more variance) than the standard formulae (but it’s by no means perfect). Then, when computing the sum, I adopt the ultra-conservative approach of taking the largest of all the “partial sums,” thereby giving the smallest effective number. There are other complications; the ratio as I’ve stated in this comment applies to an infinite time series, but finite time series will show a slightly smaller ratio. I deliberately ignore that in order to be even more conservative.

The effect of autocorrelation on time series analysis can be quite complex. Unfortunately, it’s too often ignored, leading to high confidence in false conclusions.]
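A rough sketch of the conservative recipe described above (hedged: Tamino's own less-biased autocorrelation estimator isn't spelled out here, so the standard estimator stands in for it):

```python
import numpy as np

def autocorr(x, max_lag):
    """Standard sample autocorrelations rho_1..rho_max_lag (biased)."""
    x = np.asarray(x, float) - np.mean(x)
    denom = float(np.dot(x, x))
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def effective_n(x, max_lag=50):
    """n divided by 1 + 2*(largest partial sum of the rho_k): taking the
    largest partial sum gives the smallest effective n, i.e. the least
    chance of falsely claiming significance."""
    partial_sums = np.cumsum(autocorr(x, max_lag))
    ratio = 1 + 2 * max(float(partial_sums.max()), 0.0)
    return len(x) / ratio

# AR(1) red noise with alpha = 0.6; theory gives ratio (1+alpha)/(1-alpha) = 4,
# so roughly 5000/4 = 1250 effectively independent points.
rng = np.random.default_rng(1)
alpha, n = 0.6, 5000
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = alpha * x[i - 1] + rng.normal()
print(round(effective_n(x)))
```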

The standard way in geophysics to estimate the 'effective' number of independent points goes something like this:

Let the standard deviation of a variable y[] be m. Define m(h) as the standard deviation of the series of averages of h successive values of y, i.e. of y(h) = (y[0] + y[1] + y[2] + … + y[h-1])/h.

For any series y[] we can construct another series y'[] by repeating each value r times. The average of h of the new values (if h = r h', where h' is an integer) equals the average of h' values of y[]. Hence the standard deviation m'(h) for these averages of the series y'[] is:

m'(h) = m(h') = m/sqrt(h') = m/sqrt(h/r) = sqrt(r) (m/sqrt(h)),

and therefore

r = (m'(h)/(m/sqrt(h)))^2 = h (m'(h))^2/m^2.

Using these considerations, we can get a measure of the degree of persistence (or conservation, as it is often called) by computing the standard deviation m(h) for the original values of y[] (i.e. for h = 1), and also for the averages of every h successive values of y[] (h = 2, 3, …). We write this as:

e(h) = (m(h)/(m/sqrt(h)))^2 = h m^2(h)/m^2.

The effective number of independent values in a set of h successive values of y[] is then h/e(h).

For the sunspot series since 1749, Chapman and Bartels reported the following values:

h (days)   32    64    128   256   512   1024  2048  4096
e(h)       14.0  26.8  51.7  99    190   325   323   201
h/e(h)     2.3   2.4   2.5   2.6   2.7   3.1   6.3   20.4
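A sketch of the block-averaging recipe (the white-noise series is a made-up test case; for it e(h) should stay near 1, unlike the strongly persistent sunspot numbers):

```python
import numpy as np

def persistence(y, h):
    """e(h) = h * m(h)^2 / m^2: m(h) is the std of non-overlapping h-point
    block averages, m the std of the raw series. h/e(h) is then the
    effective number of independent values per h successive points."""
    y = np.asarray(y, float)
    nblocks = len(y) // h
    block_means = y[: nblocks * h].reshape(nblocks, h).mean(axis=1)
    return h * block_means.std() ** 2 / y.std() ** 2

rng = np.random.default_rng(2)
white = rng.normal(size=100_000)
for h in (32, 64, 128):
    e = persistence(white, h)
    print(h, round(e, 2), round(h / e, 1))  # e stays near 1 for white noise
```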

• Don Fontaine

I'm new to R, having spent a couple of weeks playing with it. First impression is favorable. They have time series objects, and periodograms are quick. I wasn't able to get it to automatically deal with missing data in the time series; I had to manually locate and fix gaps by putting in an estimate. Default graphics are good. It does quick and easy lag plots; monthly sea ice extent and area was my test data set. The 1- to 13-month lag plots make a nice picture. I haven't found a lag correlation function calculation, but it probably is in there. I am not a statistician, so I am inclined to accept a tool if it appears to be behaving as advertised.

• wattsupwiththat

Nicely done.


• cody

tamino, if you have the energy, maybe a companion piece on pca? Not trying to make work for you, just a thought!

• Bananus

Here I am. Data? For the smaller variance between NH and SH, take a look for instance at the NCEP/NCAR reanalysis. (I hope these links work, otherwise, you can remake the maps on climexp.knmi.nl). Tamino, if you would like more information, send me an email.

• Bananus

Extra graph to be able to compare the NH and the SH:

• tristram shandy

Tamino,

Learning R would be a good thing for you to do. It's nice to write your own analysis code, but alas, unless you test it in a suitable fashion you run the risk of making unforced errors. Just a word to the wise. Also, it would be nice if you posted your data and code. That way we don't have to reinvent the wheel, but can build on your knowledge and contribution.

The other thing to have a look at is this nice little paper:

http://ocean.mit.edu/~cwunsch/papersonline/wunschaha2007.pdf

I note that on a few occasions you select periods from a time series and then do linear fits (for example, looking at 1975 to present in land surface records). The human visual system is very well adapted to picking out "linear regimes".

Sadly, this wetware system can be misleading. The paper above has some delightful examples of misleading stats in climate science. I think with your understanding you could make a good contribution to draining the swamp. Given that you subscribe to AGW, perhaps you would be in a good position to question questionable practices.
FWIW

[Response: Wunsch's paper was a joy to read, thanks for the reference. Clearly he and I are of like mind, and I see that he's trying to do for geophysics what I tried to do in astronomy: advocate better sense (and more healthy skepticism) in statistical analysis. I especially like his illustration of the danger of over-interpreting Fourier analysis (this is a pet peeve of mine, and astronomers are as guilty as anybody), and I swear I've only seen a few papers using wavelet analysis in time series that didn't make me wince. Also, we're both trying to get people actually to *look* at the data; the eye/brain combination surely has its weaknesses, but it's still one of the best analysis tools around (and it gets better, the more experience you gain).]

• Tamino:

Another great post.

I am busy trying to reproduce your analysis in Excel, only to see Tristan Shandy suggest that …

“… It would be nice if you posted your data and code. That way we don't have to reinvent the wheel, but can build on your knowledge and contribution.”

I agree with Tristan!!! You indicated in a previous post that you do your work in Excel. I would be happy to host copies of your workbooks on my website so that your readers could take a crack at the same data and see what they come up with.

Since I'm interested in the "how to" aspects of environmental data analysis with Excel, and find myself often using your data analysis examples to expand my Excel toolkit, this could be a great fit. I know that at least Tristan and I would like to get our hands on your workbooks.

E-mail me if you’d like to pursue.

• John Mashey

1) Some of this is reminiscent of John Tukey: see
http://cm.bell-labs.com/cm/ms/departments/sia/tukey/tributes.html#chambers1
for John Chambers (S –> R) comments; the Neyman lecture he mentions has additional useful thoughts about statistics and software.
[And if I were doing serious statistical work, I'd certainly learn R.]

2) Some other related favorites:

The first batch are more about detecting or avoiding errors, the second about showing data well.

John Allen Paulos, “Innumeracy: Mathematical Illiteracy and its Consequences”, 1988.

Darrell Huff “How to Lie With Statistics”, 1954.

Joel Best, “Damned Lies and Statistics”, 2001.

Gregory Kimble, “How to use (and Mis-use) Statistics”, 1978.

Gerald Everette Jones, “How to Lie With Charts”, 1995.

Mark Monmonier & H. J. de Blij, "How to Lie with Maps" (2nd Ed.), 1996.

Philip Good & James Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003.

and some classics:
======
William S. Cleveland, “Visualizing Data”, 1993.
and “The Elements of Graphing Data”, 1994.

and of course:
Edward Tufte
“The Visual Display of Quantitative Information”,
“Envisioning Information”, “Visual Explanations”, and “Beautiful Evidence.”

Those are truly beautiful books, and if you don't have them, Tufte often gives a $380 one-day course that includes all 4 books (which would be ~$200), and from personal experience, it is well worth hearing him.
http://www.edwardtufte.com/tufte/

• Tamino:
Interesting post. But I'm a bit put off by niggling errors in the simpler cases.

For example, you are doing statistics using small sample sizes, but I notice you are using "z" values appropriate for large samples in your discussion. That '1.96' you cite for the 95% confidence interval rises to 2.07 when you have only 22 degrees of freedom (which I think is what you have left after you do your linear regression and fit two parameters). In many other examples where you average, you have fewer degrees of freedom, which means the t-distribution requires larger values of "z".

This also has some direct bearing on your narrative. If you take the short cut and round 2.07 to 2, you are likely to accept as real trends that occur due to random chance. (Your text suggests the opposite, because you say the rounding is up from 1.96 to 2, rather than down from 2.07 to 2.)

This may sound nit-picky, but of course scientists generally wish to be cautious, and really don't want to appear to be claiming they have shown things with a high degree of confidence when that confidence is unwarranted. I'm fairly sure you need to use small-sample statistics in all cases where you discuss the annual temperatures and the longer time averages.
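Lucia's point is easy to check numerically (a sketch using scipy; the degrees-of-freedom values are just examples):

```python
from scipy import stats

# Two-sided 95% critical value: the familiar 1.96 is the large-sample
# (normal) limit; with few degrees of freedom the t value is larger.
print(round(stats.norm.ppf(0.975), 3))        # large-sample z
for dof in (5, 10, 22, 100):
    print(dof, round(stats.t.ppf(0.975, dof), 3))
```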

I was wondering also, to be complete, why don’t you de-trend and apply the correction for auto-correlation to the step-function example? (The one discussed in your section “Comparing Averages: More than One Way to Skin a Cat”)

After all, eyeballing that data suggests correlation between average annual temperatures in adjacent years. (Plus, I believe things like El Niño cycles persist more than one year.) So the correlation between adjacent temperatures in the time series should be tested and dealt with in some way.

Both dealing with the small sample issue and the autocorrelation will reduce the degree of confidence in your conclusion that the later years are warmer than the former years — but at least you will have dealt with the issues.

Still, good article.

[Response: Note that after introducing red noise I stated, "In fact, we should have corrected for autocovariance when analyzing the annual averages; the effect in that case is not very large, but not zero either." And when using all the data, I did correct for autocorrelation in the step-function model; that's why the significance estimate dropped from more than 99.9999999% confidence to 99.8%. Other than failure to apply a t-test to compensate for small sample size when using annual averages, I don't see that any of your criticisms are valid. The claim that this location warmed is based on comparison of averages for the entire data set, including the effect of autocorrelation, and is valid. And the entire post is intended as an introduction for the non-analyst, not a peer-reviewed analysis of this data set. So yes, I'd say it's a bit nit-picky.]

• John Mashey

Maybe a bit nit-picky, but certainly the small-sample issues were worth mentioning - more people mess that up than one might expect.

Anyway, still a fine exposition - if you had a section at the right for particularly important posts, this should go there.

Ahh, I see where you mentioned the issue of autocorrelation applied to the step function case in that single sentence! Sorry, I missed that. (That's what I get for not having a yellow highlighter to use on the web.)

I would like to comment on this though:

“And the entire post is intended as an introduction for the non-analyst, not a peer-reviewed analysis of this data set. So yes, I’d say it’s a bit nit-picky.”

My reason for thinking it's nit-picky is different from yours. (Mine was that, eyeballing the data and being aware of the likely magnitude of the autocorrelation, I thought the t of 3.05 wasn't going to get knocked down enough to change the conclusion.)

But, with all due respect Tamino, if your goal is to present this to the non-analyst, I think it is a grave mistake not to specifically discuss the correction for autocorrelation when using the 27 annual data points. It’s also a grave error not to mention the issue of small sample size.

After all, the 27 data point example is precisely the one that the non-analyst is likely to be able to understand, try to repeat or ask their friends to explain.

For these reasons, appearing to make errors — like ignoring small sample size or correlation– that would result in markdowns of undergraduate reports is unwise.

Moreover, the fact that the temperature change result can be shown significant while doing the simpler, more tractable problem, will tend to both:

a) emphasize the strength of the conclusion that temperatures are higher in the latter set of years and

b) show that the conclusion doesn’t require high-falutin’ steps that non-analysts will imagine amount to statistical massaging of data. (That is, the Fourier transform, etc.)

For what it’s worth, a clear, comprehensive illustration of the treatment using with the 27 annual temperature data points would make a nice classroom exercise too.

• Chirstopher

OK, I'm confused now. Could you elaborate or, as it were, supplement the post with your thoughts on these considerations? Using t for small n makes sense; the rest did not.

[Response: I think that the introduction to the impact of autocorrelation was logically done. It's appropriate to deal with it later in the exposition, and the unbelievable statistical significance which results when we ignore it with a large number of closely-spaced data is a natural starting point. I also clearly stated, "In fact, we should have corrected for autocovariance when analyzing the annual averages." Going back to re-do the analysis of annual averages wouldn't have added any new ideas.

This is not an undergrad stats lecture, it's a "public lecture." I also stated clearly that "I have taken a few shortcuts in all of this (for the sake of clarity) and left out some details." My opinion: everybody is a critic. They're entitled to their opinion, so am I, and I don't think the monday-morning quarterbacks have really added any clarity. It's especially revealing that only *after* their comments did anyone say, "OK, I'm confused now."]

• Tamino:
After I posted yesterday, I realized that the four untested assumptions you used when illustrating the hypothesis test of the “step function” model for the 27 annual average temperatures might be enough to lead to truly spurious results.

So, I took a little time today and:
a) guessed a value of alpha = 1/3 for annual average temperatures. (I'd have calculated it from your data, but you don't post a URL and I'm not quite sure where to get it based on the text-only description. So instead I downloaded some available global temperature data and calculated some serial correlation values for annual average temperatures. Since it's a guess, I didn't think it was worth being overly precise, and used 1/3.)
b) applied your formula to determine the effective sample size (Neff) for both the 13 and 14 year periods.
c) accounted for the possibility of different variances for the 13- and 14-year sample groups (that is, I used the more general formulation as described here), and
d) accounted for small sample size (rather than using t=1.96 for the 95% confidence interval).

I did this in haste, and may well have done it incorrectly, but it appears that when we correct for all the assumptions you made, we accept the null hypothesis when testing the "step function" model for change in the annual average temperature. That is: the apparent change is not statistically significant.

That’s the opposite conclusion from the one you provide in your narrative.
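Lucia's steps a)–d) can be sketched as follows. The numbers below are invented placeholders (the thread never gives the actual sample means and variances), and alpha = 1/3 is her guess, so this shows only the mechanics, not the verdict:

```python
import numpy as np
from scipy import stats

def step_test(x1, x2, alpha=1/3):
    """Welch-style two-sample test with an AR(1) effective-sample-size
    correction, n_eff = n * (1 - alpha) / (1 + alpha)."""
    n1 = len(x1) * (1 - alpha) / (1 + alpha)
    n2 = len(x2) * (1 - alpha) / (1 + alpha)
    v1 = np.var(x1, ddof=1) / n1
    v2 = np.var(x2, ddof=1) / n2
    t = (np.mean(x2) - np.mean(x1)) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p

# Placeholder "annual averages": 13 early years vs 14 later years.
rng = np.random.default_rng(3)
early = 6.5 + rng.normal(0, 0.8, 13)
late = 7.3 + rng.normal(0, 0.8, 14)
t, dof, p = step_test(early, late)
print(round(t, 2), round(dof, 1), round(p, 4))
```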

So, out of curiosity, what did you find for the value of “alpha” for the annual average temperatures?

Or, since I'm sure you already calculated the value, maybe you could make your computation public, as one of your readers above suggested. That would probably save you time, since I could just open the file and read off the value of alpha.

• David B. Benson

Tamino — This is quite good. I’d be interested in a Bayes factor comparison of two hypotheses, say, flat trend versus linear or step trend.

To keep it short, just the yearly averages will suffice. Assume the deviations from the trend line are due to Gaussian random noise and that all years are independent. Then the joint log-probability (base 10) for either trend line is the sum of the log-probabilities for each year. The difference in the joint log-probabilities, times ten, gives the decibans by which one hypothesis is to be preferred over the other. The Wikipedia page on Bayes factors gives the Harold Jeffreys translation of decibans into strength terms. What is interesting in this little test is whether or not the linear trend does better than the flat trend by at least 5 decibans (substantial) or even by at least 10 decibans (strong).
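A crude sketch of that comparison on synthetic yearly averages (hedged: it estimates sigma from the residuals, ignores priors, and doesn't penalize the linear model's extra parameter, so it's a maximum-likelihood stand-in for a real Bayes factor):

```python
import numpy as np

def log10_like(y, fitted):
    """Gaussian joint log10-probability, with sigma taken from the residuals."""
    resid = y - fitted
    sigma = resid.std()
    ll = np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))
    return float(ll / np.log(10))  # convert nats to base 10

# Synthetic stand-in for the 27 yearly averages (trend and noise invented).
rng = np.random.default_rng(4)
t = np.arange(27)
y = 6.5 + 0.038 * t + rng.normal(0, 0.7, t.size)

flat = np.full_like(y, y.mean())
slope, intercept = np.polyfit(t, y, 1)
linear = intercept + slope * t

# Ten times the difference in joint log10-probability = decibans.
# Jeffreys' scale: >5 dB substantial, >10 dB strong.
decibans = 10 * (log10_like(y, linear) - log10_like(y, flat))
print(round(decibans, 1))
```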

• sirlurksalot

Lucia what the hell is your problem?

Tamino gave us an excellent essay. He said “In fact, we should have corrected for autocovariance when analyzing the annual averages” but you act like he never mentioned it or even tried to cover it up. He gave all kinds of caveats. We all get it except you. And T is so right when he points out that until you came along nobody had to say “OK, I’m confused now”.

Here’s what I think. Your claim of four untested assumptions is bull****. You haven’t contributed to clarity or knowledge one little bit you only muddied the waters. You’re not nit-picky you’re bending over backwards to find fault.

I guess you did this in an attempt to make yourself look smart. You failed.

• dino

Very informative, especially for the statistically challenged. I have a simple query if someone would care to advise me. I have temperature data for 21 years, but only monthly with several gaps. There appears to be a small warming trend, but is there any way of testing significance with such an incomplete data set? Thanks for the blog.

[Response: The presence of gaps has little effect on most trend analyses (but it can have a profound effect on, e.g., Fourier analysis).

Are these data raw temperature, or temperature anomaly? What are you using for analysis?]

• fragment

Lucia, using the info in the post it took a minute on google to find this, which looks like where Tamino sourced the data from. Given that this post is an analysis of temp data from a single location, I don’t see how you can claim an opposite conclusion when you’ve gone and analysed something else.

• dino

The data are raw temperature. I only have access to excel at the moment. I have tried imputing figures for the missing data with a simple average by month, which I considered would provide a conservative result. This still showed a warming trend, albeit weaker than before. Not sure how to test the significance of the trend.
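For dino's situation, a sketch of one alternative to imputation: remove the seasonal cycle with each calendar month's mean, then fit the trend using only the months that exist (the 21 years of synthetic data and random gaps below are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
months = np.arange(21 * 12)
# Synthetic monthly record: seasonal cycle + small trend + noise, with gaps.
temps = (10 + 8 * np.cos(2 * np.pi * months / 12) + 0.002 * months
         + rng.normal(0, 1, months.size))
temps[rng.choice(months.size, 30, replace=False)] = np.nan

# Anomalies: subtract each calendar month's mean (computed over non-gaps).
clim = np.array([np.nanmean(temps[m::12]) for m in range(12)])
anom = temps - clim[months % 12]

# Fit the trend on the months that exist; no imputation required.
ok = ~np.isnan(anom)
slope, intercept = np.polyfit(months[ok], anom[ok], 1)
print(round(slope * 120, 3))  # trend in degrees per decade
```

Significance would still need the small-sample and autocorrelation corrections discussed above; the gaps themselves mostly just reduce the sample size.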

• tristram shandy

Sirlurksalot,

Try to keep things civil

• tristram shandy

It would be cool if you could post your R code. I've encouraged Tamino to do the same. Plus, some folks might not appreciate the nits you pick, but some do. And you've been pleasant. Took me a while to learn that.

• Charles Whitney

I agree with sirlurksalot. Lucia’s first comment was pleasant enough, but her last one was just an attempt to make tamino look bad by proving he was wrong. But it makes no sense since he was the first to say he should have applied the autocorrelation correction to the annual averages.

You compliment Lucia for being pleasant. I don’t think so. Pleasant language doesn’t make unpleasant behavior any less reprehensible.

• Molnar

To address a minor point, Twain said it, but he attributed it to Disraeli. The true origin is unclear, but it’s pretty certain that Twain neither claimed, nor should be credited with, authorship.

Back when the 1934 thing was hot, I tried my hand at creating a temperature estimate for the lower 48 for the month of January.
My question is: how close did I get?
Here's my data: http://www.geocities.com/jacob_inmt47/d50.txt
Thanks for your time and effort.

fragment, I didn't exactly analyze "something else". What I was trying to do is see whether Tamino's result of "reject Ho" is likely to remain stable if we lift all of the several assumptions he made. It looks like it may not. (I didn't say it certainly would not; I said it might not.)

What I did was take the information Tamino provided and try to see what happens if I 1) account for the finite size of the data set, 2) remove the assumption of equal variances and 3) account for correlation using a plausible value of ‘alpha’. I could do 1 & 2 based on the information Tamino provided in his post. (Those corrections require known sample size and sample standard deviations.)

To do (3) I need to estimate the magnitude of “alpha”, which Tamino does not provide in his article.

If he had posted that value, I could just finish the problem and figure out if, after lifting Tamino’s assumptions, we still “reject Ho”.

Since I didn't have it, and other similar data are available, I estimated alpha = 1/3 based on a set of global average temperatures and used that value.

Using that value I get a different result for the hypothesis test than that reported by Tamino.

What does my getting a different result mean?

It means that it's necessary to get the correct value of alpha to do the full problem correctly. That's why I asked Tamino for the value, or failing that, to precisely specify what data he used. As motivation for my request, I explained why I wanted the value.

So, in that vein, I thank you, Fragment, for the link to what might be the data.

However, I still am not sure which of the various data sets for Temperature match the one Tamino used.

I know Tamino said: “I took data from ECA (the European Climate Assessment and Dataset Project) because they provide daily measurements, and selected location number 1.”

The link you, Fragment, provided has things like "All elements", "Daily maximum temperature TX", "Daily minimum temperature TN", "Daily mean temperature TG", etc. So which is the appropriate temperature for location 1?

If we either learn the magnitude of “alpha” or identify the correct series of data points, the issue of correctness could be resolved.

Of course, it's also fine if this remains unresolved. It's not as if the fate of the world, the consensus on global climate change, or even non-analysts' understanding of the thought process involved in hypothesis testing really hangs on this issue or blog post, but I'm curious about it.

• Phil.

“Aaron Lewis // November 5, 2007 at 5:32 pm
Good! However, why do you consider daily temperature data to be more instructive than say, daily sea ice area or sea ice extent data? It may be that a similar analysis of sea ice data should be submitted to a journal, or better yet, the Atlantic Monthly or New Yorker.”

The data in ‘Cryosphere Today’ would be a good source for this, both NH and SH and combined data (illustrates the non cancellation).