In a previous post we looked at a model for random processes, the “AR” (autoregressive) model. Let’s take a look at another random process, the “MA” (moving average) model.

Many of us are familiar with *moving averages* as a way of smoothing data. In its simplest form, given a time series $x_t$ we take the average of a number of consecutive values, centered at a particular point, to generate the moving-average value at that point. For simplicity, let’s take the average of $2k+1$ consecutive values: the central value, plus $k$ to the left and $k$ to the right, to create the smoothed series $y_t$:

$$y_t = \frac{1}{2k+1} \sum_{j=-k}^{k} x_{t+j}.$$

Moving averages are a fine way to smooth a time series. But it’s important to keep in mind that two consecutive smoothed values have a lot of data points in common; the values $y_t$ and $y_{t+1}$ are based on entirely the same set of original values, except for $x_{t-k}$ which forms part of the average for $y_t$ but not for $y_{t+1}$, and $x_{t+k+1}$ which forms part of the average for $y_{t+1}$ but not for $y_t$. Therefore the process introduces *autocorrelation* in the smoothed series.
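As a quick illustration (a minimal Python/NumPy sketch of my own, not from the original post; the function name is hypothetical), a centered moving average can be computed with a convolution:

```python
import numpy as np

def moving_average(x, k):
    """Centered (2k+1)-point moving average; drops k points at each end."""
    window = 2 * k + 1
    return np.convolve(x, np.ones(window) / window, mode="valid")

# A 3-point (k = 1) moving average of a short series:
x = np.array([1.0, 2.0, 6.0, 4.0, 5.0])
y = moving_average(x, k=1)  # averages (1,2,6), (2,6,4), (6,4,5) -> [3, 4, 5]
```

Note that `mode="valid"` simply discards the $k$ points at each end where the full window doesn’t fit.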

Suppose the original time series is a random process, and the expected value of any given datum is zero, i.e., it’s a zero-mean random time series. Then the *autocovariance* at lag $j$ is the expected value of the product of a data point with its lag-$j$ ($j$ time steps preceding) value (the angle brackets denote the expected value of the enclosed quantity)

$$\gamma_j = \langle x_t\, x_{t-j} \rangle.$$

The autocovariance at lag zero is just the data variance, $\gamma_0 = \langle x_t^2 \rangle$.

Now suppose that our original series is *white noise*, which means that different values are uncorrelated. In this case all the autocovariances are zero, except the lag-zero autocovariance which is the data variance

$$\gamma_j = 0, \quad \text{for } j \ne 0.$$

If $j = 0$, then the autocovariance at lag zero is just the variance of the white-noise series

$$\gamma_0 = \langle x_t^2 \rangle = \sigma^2.$$

Therefore the autocorrelations are also all zero, except the lag-zero autocorrelation which is always equal to 1:

$$\rho_j = \gamma_j / \gamma_0 = 0, \quad \text{for } j \ne 0.$$
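We can see this empirically with a quick sketch (my own, using a simple biased estimator): the estimated autocorrelations of simulated white noise come out near zero at every nonzero lag.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (simple biased estimate)."""
    x = x - x.mean()
    n = len(x)
    c0 = np.dot(x, x) / n
    return np.array([np.dot(x[:-j], x[j:]) / n / c0 for j in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
white = rng.normal(size=100_000)
r = sample_acf(white, max_lag=5)
# Every entry of r should be near zero (within roughly 2/sqrt(n) of it).
```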

Now let’s consider our moving-average time series. We have

$$\gamma_j = \langle y_t\, y_{t-j} \rangle.$$

We can substitute the expression for $y_t$ to get

$$\gamma_j = \frac{1}{(2k+1)^2} \sum_{i=-k}^{k} \sum_{m=-k}^{k} \langle x_{t+i}\, x_{t-j+m} \rangle.$$

When we evaluate the double sum, we’ll have a large number of terms which are products of two values of $x$. If the $x$-values are *different*, then the expected value of their product will be zero because all the values are uncorrelated (it’s white noise!). But if the term is a product of two $x$-values which are the *same*, then the expected value of the product is just the variance of the white noise, $\sigma^2$. If $j$ is less than $2k+1$, then there will be exactly $2k+1-j$ such nonzero terms, but if $j$ is greater than or equal to $2k+1$ then there won’t be any terms at all with the same values. The total autocovariance is therefore the number of nonzero terms times the expected value of each nonzero term, or

$$\gamma_j = \frac{(2k+1-j)\,\sigma^2}{(2k+1)^2}, \quad \text{for } j < 2k+1, \text{ otherwise zero}.$$

These are the auto*covariances*; the auto*correlations* are therefore

$$\rho_j = \frac{\gamma_j}{\gamma_0} = \frac{2k+1-j}{2k+1}, \quad \text{for } j < 2k+1, \text{ otherwise zero}.$$
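These theoretical autocorrelations are simple enough to encode directly (a sketch of my own; the function name is hypothetical):

```python
def smoothed_rho(j, k):
    """Theoretical lag-j autocorrelation of a (2k+1)-point moving average
    of white noise: (2k+1-j)/(2k+1) for j < 2k+1, otherwise zero."""
    width = 2 * k + 1
    return (width - j) / width if j < width else 0.0

# For a 3-point average (k = 1): 1, 2/3, 1/3, then zero from lag 3 on.
values = [smoothed_rho(j, 1) for j in range(5)]
```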

From this we see that the autocorrelation decreases steadily until lag $2k+1$, when it becomes zero, and is zero for all higher lags. I generated 1000 data points of a white-noise series, then computed a series of 3-point moving averages (which includes the central value and one point before and one after), and computed the *sample* autocorrelation of this series, which is graphed here:

The dashed lines give the 95% confidence limits of no autocorrelation. Clearly the autocorrelation decreases steadily, reaching zero (within its error limits) at lag 3, and is zero (within its error limits) thereafter.
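That experiment is easy to reproduce; here’s a sketch of my own (random seed and helper names arbitrary) that generates 1000 white-noise values, smooths them with a 3-point moving average, and estimates the first few autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(42)
noise = rng.normal(size=1000)
smooth = np.convolve(noise, np.ones(3) / 3, mode="valid")  # 3-point moving average

def acf(x, j):
    """Sample lag-j autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-j], x[j:]) / np.dot(x, x)

r1, r2, r3 = acf(smooth, 1), acf(smooth, 2), acf(smooth, 3)
# Theory says 2/3, 1/3, and 0; the sample estimates scatter around those values.
```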

This is the usual definition of a moving average. It computes an average for the time index at the center of the range of values being averaged. One can also, if one wishes, assign the averaged value to the *end* time of the range of values being averaged (this is what Excel does, and I find it a bit annoying but so be it). One can also compute a *weighted* moving average, in which the values being averaged aren’t all weighted equally. In fact it’s not uncommon to use a Gaussian weight function, so values at the center of the range are given greater weight and values far from the center less weight. All these are examples of moving averages.
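A weighted version is a small change to the sketch above (function name my own); this example uses Gaussian weights over a 5-point window:

```python
import numpy as np

def weighted_moving_average(x, weights):
    """Moving average with arbitrary weights, normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return np.convolve(x, w[::-1] / w.sum(), mode="valid")

# Gaussian weights over a 5-point window: the center gets the most weight.
offsets = np.arange(-2, 3)
gauss = np.exp(-0.5 * offsets**2)
x = np.arange(10.0)
y = weighted_moving_average(x, gauss)  # for a linear series this returns the center values
```

(The `w[::-1]` undoes the kernel flip that `np.convolve` performs, so `weights[i]` is applied to offset $i-2$; for symmetric weights it makes no difference.)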

If one computes a weighted moving average of a white-noise process, with a finite number of values being averaged at each step, the average assigned the time index of the final point being averaged, and the final point given a weight of 1, then we have an example of a *moving average process*. We can write it as

$$x_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}.$$

The process $\epsilon_t$ is a white-noise process. Note that the coefficient of $\epsilon_t$ is one, that there are a finite number ($q+1$) of terms, and that the time index of the average is the same as that of the “most recent” value entering into the average. Since there are $q+1$ terms in the average (from time $t$ back to time $t-q$ in reverse order), this is a moving-average process of order $q$, or an MA($q$) process.
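A minimal MA($q$) simulator, as a sketch of my own (the $\theta$ convention follows the equation above):

```python
import numpy as np

def simulate_ma(theta, n, rng):
    """Simulate an MA(q) process x_t = eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q},
    driven by unit-variance white noise."""
    coeffs = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    eps = rng.normal(size=n + len(theta))
    return np.convolve(eps, coeffs, mode="valid")

rng = np.random.default_rng(1)
x = simulate_ma([0.5], 200_000, rng)
# MA(1) with theta_1 = 0.5: variance should be near 1 + 0.5**2 = 1.25,
# and the lag-1 autocorrelation near 0.5/1.25 = 0.4.
```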

This is not too dissimilar from the AR($p$) process we looked at before. Each new value was a white-noise term ($\epsilon_t$) plus a linear combination of preceding values, as much as $p$ (the order of the process) time steps preceding the new value. Now we have that each new value is a white-noise term ($\epsilon_t$) plus a linear combination of preceding white-noise terms, as much as $q$ (the order of the process) time steps preceding the new value.

We can analyze the autocorrelation structure of the MA($q$) process, in much the same way we did for simple moving averages. If we do so, we’ll find that just as before, when we get beyond a certain lag value the autocorrelation is zero. In fact, for an MA($q$) process the autocorrelation is zero for all lags greater than $q$.

In fact the moving average actually *is* also a moving-average process. Consider the 3-point moving average shown above. This was

$$y_t = \tfrac{1}{3}\left( x_{t-1} + x_t + x_{t+1} \right).$$

Now define a *new* white-noise process

$$\epsilon_t = x_{t+1}/3.$$

Then we see that

$$y_t = \epsilon_t + \epsilon_{t-1} + \epsilon_{t-2}.$$

Since $\epsilon_t$ is a white-noise process, there are a finite number of terms, and the coefficient of the most recent term is one, we have a moving-average process. It goes as far as 2 time steps prior to the current time index, so in fact it’s an MA(2) process. I’ll take this opportunity to correct a mistake I made in recent comments, where I said that a $(2k+1)$-point moving average has MA($2k+1$) structure; in fact it actually has MA($2k$) structure.
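We can check the equivalence numerically; in this sketch (my own) the two constructions give identical series:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)

# 3-point centered moving average: y_t = (x_{t-1} + x_t + x_{t+1}) / 3.
y = np.convolve(x, np.ones(3) / 3, mode="valid")

# Define eps_t = x_{t+1}/3; the one-step shift is absorbed into the
# array alignment below, so y_t = eps_t + eps_{t-1} + eps_{t-2}.
eps = x / 3.0
y_ma2 = eps[2:] + eps[1:-1] + eps[:-2]
```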

The fact that the autocorrelation function (ACF) is zero for lags greater than the order of the process is a *sign* that the process is MA. We noted something similar for AR processes: the *partial* autocorrelation function (PACF) is zero for lags greater than the order of the AR process. If you study the autocorrelation and partial autocorrelation functions, and one of them doesn’t just “tail off” gradually, but ends (by which I mean, goes to zero) abruptly, that’s a sign of a pure AR or pure MA process. If it’s the ACF that ends abruptly, the process is MA and the lag for the final nonzero autocorrelation is the order of the MA process. If it’s the PACF that ends abruptly, the process is AR and the lag for the final nonzero partial autocorrelation is the order of the AR process.

A random process doesn’t have to be just AR or MA. It can be both, consisting of a white-noise term ($\epsilon_t$) plus a linear combination of preceding values up to order $p$, *and* of preceding white-noise terms up to order $q$. Then we have

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}.$$

This is the general form of an ARMA(p,q) process. For an ARMA process, neither the ACF nor the PACF will end abruptly; their values decay gradually over all lags. So we can’t tell just from the ACF or PACF what the orders of the process are. Instead we can form hypotheses, estimate models for each hypothesis, and test their suitability using some criterion (like the Akaike or Bayesian information criterion).
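To make the ACF behavior concrete, here’s a sketch of my own that simulates an ARMA(1,1) process and checks that its autocorrelations decay geometrically (by the factor $\phi$) from lag 1 onward:

```python
import numpy as np

def simulate_arma11(phi, theta, n, rng, burn=500):
    """Simulate ARMA(1,1): x_t = phi*x_{t-1} + eps_t + theta*eps_{t-1}."""
    eps = rng.normal(size=n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]
    return x[burn:]  # discard the start-up transient

def acf(series, j):
    """Sample lag-j autocorrelation."""
    s = series - series.mean()
    return np.dot(s[:-j], s[j:]) / np.dot(s, s)

rng = np.random.default_rng(7)
x = simulate_arma11(phi=0.8, theta=0.3, n=200_000, rng=rng)
r1, r2, r3 = acf(x, 1), acf(x, 2), acf(x, 3)
# Beyond lag 1 each autocorrelation is phi times the previous one:
# r2 ~ 0.8*r1 and r3 ~ 0.8*r2, never cutting off abruptly.
```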

The residuals from fitting a trend to global average temperature data, at monthly time resolution, are reasonably well approximated as an ARMA(1,1) process. The match isn’t perfect, but it’s pretty good. In an upcoming post, I’ll generate some artificial ARMA(1,1) data, add a trend to it, and we’ll look at how it behaves. In doing so, we’ll gain some insight into how monthly average global temperature can be expected to behave.

And of course we haven’t exhausted the possibilities of alphabet soup either! So there’s more to come on other varieties of time-series processes.

## 3 responses so far

**swade016** // September 12, 2008 at 1:53 pm

Is the reason the monthly approximations are only “pretty good” that temperature is non-stationary? If you were to switch to yearly data, would the ARMA(1,1) fit become better, since the “seasonal” component would be minimized?

Still trying to catch up on time-series stuff, so forgive me if this is a stupid question.

[Response: The non-stationarity of the temperature is included in the trend; presumably the difference between the trend and the data (the residuals) is stationary. The ARMA(1,1) model applies to the residuals only. It’s possible that the residuals themselves are non-stationary, but I don’t see any evidence of that.

I believe that the residuals are not exactly an ARMA(1,1) process primarily because there’s some very elaborate physics behind global climate, and such processes tend to induce a complicated autocorrelation structure, so the autocorrelations don’t exactly follow the form $\alpha\rho^j$ (the signature of an ARMA(1,1) process). However, they’re very close to following that formula. Yet for most models, it’s impossible to confirm or deny them with certainty because we can only *estimate* the autocorrelations, and as the lag gets very large they all become statistically indistinguishable from zero. Also, there’s no “perfect” estimator of autocorrelation from observed data; the most common (Yule-Walker) estimate is biased, and even bias-corrected estimates are only unbiased to 1st order. We can say with certainty that it’s not AR(1).]

**Gavin's Pussycat** // September 12, 2008 at 5:57 pm

This is fun.

Two questions:

1) is the reason that monthly mean temp data is ARMA(1,1) that it is actually constructed by one-month averaging (the MA(1) part) on “point data” (daily means) that are close to AR(1)?

[Response: No. If you take monthly averages of AR(1) daily data, then you will get autocorrelations of the form $\alpha\rho^j$, but the factor $\alpha$ will be greater than 1. For GISS monthly data the factor is less than 1.]

2) In an earlier post (“To AR(1) or not to AR(1)” IIRC) you used an autocovariance expression of $\alpha\rho^j$. Am I right in recognizing your above-mentioned “signature” for ARMA(1,1)?

[Response: Yes indeed.]

**Gavin's Pussycat** // September 13, 2008 at 6:58 am

OK, thanks. That means then that daily global mean temp data are *not* AR(1), right? So… what *is* it?