Open Mind

Alphabet Soup, part 1: AR

August 22, 2008 · 8 Comments

If a time series is the sum of a signal and noise, then the noise part itself constitutes a time series. If it’s white noise, then all the values are uncorrelated with each other. But we’ve often considered that the noise in geophysical time series (like temperature data) isn’t white. What other kinds of noise processes are there?

In this post we’ll take a look at a kind of process known as an autoregressive, or AR process. There’s a lot of math in this post! Also, we’ll only scratch the surface of the subject of alphabet soup because there are a lot more types of random time series than just AR series. But in future posts we’ll take a look at MA, ARMA, ARIMA, SARIMA, and FARIMA series, and consider their behaviors as well. Fortunately, this post lays the groundwork for much of the notation and conventions involved.

And for those of you who aren’t into the math thing … don’t worry, I’ll continue posting about climate (and occasionally other) stuff as well. This series will appear interspersed among those other topics.

Consider a random time series x_n, where n is the “time index.” Assume that the expected value of each datum is zero, i.e., that it’s a zero-mean series, and that the expected variance (the expected value of the square) of any data value is \sigma^2. Let’s further assume that the time series is stationary, which basically means that any given time frame can be expected to behave just like any other given time frame. If this noise process is white, then we further have that any two different values are uncorrelated, i.e., the expected value of their product is zero. Using angle brackets to indicate the expected value of the enclosed quantity, we can write these conditions as

\langle x_j \rangle = 0,

for any data point, and that

\langle x_j x_k \rangle = 0,

unless j=k, in which case

\langle (x_j)^2 \rangle = \sigma^2.
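For readers who like to experiment, these three defining conditions are easy to check numerically on simulated white noise. Here's a sketch (my own illustration, not from the post; the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(0.0, sigma, size=100_000)   # simulated white noise

mean_est = x.mean()                 # estimate of <x_j>, should be near 0
var_est = (x ** 2).mean()           # estimate of <(x_j)^2>, should be near sigma^2
lag1_est = (x[1:] * x[:-1]).mean()  # estimate of <x_j x_k> for k = j-1, near 0
```

All three sample estimates land within a few thousandths of their expected values.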

The white-noise model is the most common assumption in naive data analysis. However, real time series often behave differently: we’ve seen in many instances that the noise in temperature (and other) time series isn’t white, because it shows autocorrelation.

To understand this, we can begin by defining the autocovariance for our zero-mean noise data: the expected value of the product of two data values which have some lag between them. If the lag is zero, then we have simply the expected value of the square of a given value, which is just the variance of the series. This can also be called the autocovariance at lag zero, which we can write as

\gamma_0 = \langle (x_n)^2 \rangle = \sigma^2.

The autocovariance at some generic lag h is

\gamma_h = \langle x_n x_{n-h} \rangle.

The first thing to note is that since the process is stationary, the expected value of the product of two values where the 2nd one leads rather than lags the 1st (by the same number of time steps) will be the same:

\langle x_n x_{n-h} \rangle = \langle x_n x_{n+h} \rangle.

Another way to say this is

\gamma_h = \gamma_{-h}.

Essentially, the expected value of the product of two data points doesn’t depend on which one leads and which one lags, only on the difference between their time indexes. So if we know the values of \gamma_h for all nonnegative values of h, we know them for negative values also, hence we know the entire autocovariance structure of our time series of random variables.
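As a quick numerical sketch (mine, not part of the original derivation), here's a sample autocovariance estimator that accepts negative lags directly, so the symmetry \gamma_h = \gamma_{-h} can be seen at work:

```python
import numpy as np

def autocov(x, h):
    """Sample estimate of gamma_h = <x_n x_{n-h}> for a zero-mean series.

    Accepts negative lags, in which case each x_n is paired with the
    later value x_{n-h}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if h >= 0:
        return np.mean(x[h:] * x[:n - h])
    return np.mean(x[:n + h] * x[-h:])

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)   # white noise, just for demonstration
g0 = autocov(x, 0)            # sample variance, near 1
g3 = autocov(x, 3)
g3_neg = autocov(x, -3)       # identical to g3: lead vs. lag doesn't matter
```

The two lag-3 estimates agree exactly, since they multiply the very same pairs of values.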

We’re not usually as interested in the size of the autocovariances as we are in their relative sizes; these define the autocorrelation at arbitrary lag

\rho_h = \gamma_h / \gamma_0.

We see immediately that the autocorrelation at lag 0 will be

\rho_0 = 1.

It’s also not too hard to determine that the other autocorrelations can’t exceed 1 in absolute value,

|\rho_h| \le 1,

and in fact equality with 1 holds only when the lag h is equal to 0. If any of the autocorrelations for nonzero lag are different from zero, then we say that the time series exhibits autocorrelation. As we’ve seen before, knowledge of the autocorrelation structure enables us to determine how the noise affects things like the probable error of a trend estimate.

AR models

How can we generate a random series that shows autocorrelation? One way is through an autoregressive process. The simplest example is a 1st-order autoregressive, or AR(1) process, in which each value is some constant \phi (which is not zero) times the previous value, plus a white-noise term:

x_n = \phi x_{n-1} + w_n.

The white-noise terms w_n are assumed to form a stationary, zero-mean, uncorrelated random process with its own variance \langle (w_n)^2 \rangle = \sigma_w^2, and each w_n is uncorrelated with all the already-existing data values. This equation holds for all values of the time index n. To get the variance of the data series itself, we can square both sides of this equation to get

x_n^2 = \phi^2 x_{n-1}^2 + 2 \phi x_{n-1} w_n + w_n^2.

Now let’s take the expected value of both sides. On the left, we have the expected value of the square of a data value, which is just \sigma^2. The first term on the right is \phi^2 times the expected square of a data value, which gives \phi^2 \sigma^2. The 2nd term on the right is zero, because w_n is uncorrelated with x_{n-1}. The final term on the right is just the variance of the white-noise process, which is \sigma_w^2. So we have

\sigma^2 = \phi^2 \sigma^2 + \sigma_w^2,

which we can rearrange as

\sigma^2 = \sigma_w^2 / (1 - \phi^2).

This of course is the same as \gamma_0, the autocovariance at lag 0.
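We can check this variance formula by simulation. A sketch (my own, with the arbitrary parameter choices \phi = 0.7 and \sigma_w = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma_w, n = 0.7, 1.0, 200_000

# Build the AR(1) series x_n = phi * x_{n-1} + w_n recursively
w = rng.normal(0.0, sigma_w, size=n)
x = np.empty(n)
x[0] = w[0]
for i in range(1, n):
    x[i] = phi * x[i - 1] + w[i]

x = x[1000:]                                 # discard the start-up transient
var_est = (x ** 2).mean()
var_theory = sigma_w ** 2 / (1 - phi ** 2)   # about 1.96 for these choices
```

The sample variance comes out within a couple of percent of \sigma_w^2 / (1 - \phi^2).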

If we multiply the basic AR(1) equation by some lagged value x_{n-h}, and take expected values as before, we get

\gamma_h = \phi \gamma_{h-1}.

This tells us that \gamma_1 = \phi \gamma_0, \gamma_2 = \phi \gamma_1, \gamma_3 = \phi \gamma_2, etc. From this we can deduce that

\gamma_h = \phi^h \gamma_0,

or simply

\rho_h = \phi^h.

Hence we know the entire autocorrelation structure of an AR(1) process. It’s worth noting that all of the autocorrelations, for all lags, are nonzero.
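A simulated AR(1) series bears this out. Here's a sketch (again my own illustration, with arbitrary parameters) comparing sample autocorrelations to \phi^h:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n = 0.6, 500_000

# Build an AR(1) series recursively
w = rng.normal(size=n)
x = np.empty(n)
x[0] = w[0]
for i in range(1, n):
    x[i] = phi * x[i - 1] + w[i]
x = x[1000:]   # discard the start-up transient

def rho(x, h):
    """Sample autocorrelation at lag h >= 0 for a zero-mean series."""
    return np.mean(x[h:] * x[:len(x) - h]) / np.mean(x ** 2)

# rho_h should track phi**h at every lag
errors = [abs(rho(x, h) - phi ** h) for h in range(1, 6)]
```

The sample autocorrelations match \phi^h to within a percent or so at every lag tested.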

AR models aren’t limited to first order. We can have a 2nd-order process in which each value is a linear combination of the last two terms, plus a white-noise factor, giving an AR(2) process:

x_n = \phi_1 x_{n-1} + \phi_2 x_{n-2} + w_n.

In fact we can define an AR(p) process of arbitrary order p as

x_n = \phi_1 x_{n-1} + \phi_2 x_{n-2} + ... + \phi_p x_{n-p} + w_n.

For each order, if we know the autoregressive coefficients \phi_1, \phi_2, ..., \phi_p we can compute the autocorrelations, so we can again determine the autocorrelation structure of the process.
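Generating an AR(p) series follows the same recursive recipe as AR(1). Here's a sketch of a general simulator (the function name and parameter choices are my own):

```python
import numpy as np

def simulate_ar(phis, n, sigma_w=1.0, seed=0):
    """Simulate an AR(p) process x_n = phi_1 x_{n-1} + ... + phi_p x_{n-p} + w_n."""
    rng = np.random.default_rng(seed)
    phis = np.asarray(phis, dtype=float)
    p = len(phis)
    w = rng.normal(0.0, sigma_w, size=n + p)
    x = np.zeros(n + p)
    for i in range(p, n + p):
        # x[i-p:i][::-1] is (x_{i-1}, x_{i-2}, ..., x_{i-p})
        x[i] = phis @ x[i - p:i][::-1] + w[i]
    return x[p:]

# An AR(2) example with phi_1 = 0.5, phi_2 = 0.3 (a causal choice)
x = simulate_ar([0.5, 0.3], 100_000, seed=4)
var_est = (x[1000:] ** 2).mean()   # Yule-Walker algebra gives about 2.24 here
```

The sample variance agrees with the value obtained by solving the Yule–Walker equations for these coefficients.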

Recursive Representation

Consider again the AR(1) process

x_n = \phi x_{n-1} + w_n.

This holds true not just for time index n, but for time index n-1, n-2, etc. Therefore we can recurse this formula to get

x_n = \phi x_{n-1} + w_n = \phi (\phi x_{n-2} + w_{n-1}) + w_n = \phi (\phi (\phi x_{n-3} + w_{n-2}) + w_{n-1}) + w_n = ...

This is equivalent to

x_n = \phi^2 x_{n-2} + \phi w_{n-1} + w_n = \phi^3 x_{n-3} + \phi^2 w_{n-2} + \phi w_{n-1} + w_n = ...

= \phi^k x_{n-k} + \phi^{k-1} w_{n-k+1} + \phi^{k-2} w_{n-k+2} + ... + \phi w_{n-1} + w_n.

Now suppose that the autoregressive factor \phi is between -1 and +1. Then the quantities \phi^k get smaller and smaller in magnitude as k gets bigger, and if we take k big enough they’ll be negligibly small. We can even let k go to infinity, in which case the term \phi^k x_{n-k} goes to zero and the dependence on a preceding x term disappears altogether

x_n = w_n + \phi w_{n-1} + \phi^2 w_{n-2} + \phi^3 w_{n-3} + ... + \phi^k w_{n-k} + ...

= \sum_{k=0}^\infty \phi^k w_{n-k}.

This process, in which each value is a weighted sum of white-noise terms, is a type of moving-average, or MA process. In this case there are infinitely many white-noise values involved, so we might call it an MA(\infty) process. But MA processes are the subject of the next post in this series, so let’s not get too far ahead of ourselves. Of course we’d usually rather express it as an AR(1) process, which involves only finitely many terms; in practice AR and MA processes generally have finite order, although, as we’ve just seen, processes of infinite order are perfectly conceivable.
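We can verify numerically that the truncated MA(\infty) sum reproduces an AR(1) series built recursively from the same white noise. A sketch (my own; the truncation lag K = 100 is an arbitrary but ample choice, since \phi^{100} is astronomically small for |\phi| < 1):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.8, 5_000
w = rng.normal(size=n)

# Recursive AR(1) build, starting from x_0 = w_0
x_ar = np.empty(n)
x_ar[0] = w[0]
for i in range(1, n):
    x_ar[i] = phi * x_ar[i - 1] + w[i]

# Truncated MA(infinity) build: x_n ~ sum_{k=0}^{K} phi^k w_{n-k}
K = 100                                # phi**100 is about 2e-10, negligible
weights = phi ** np.arange(K + 1)
x_ma = np.array([np.dot(weights[:i + 1], w[i::-1][:K + 1])
                 for i in range(n)])

max_diff = np.max(np.abs(x_ar - x_ma))   # tiny: the two builds agree
```

The two constructions agree to within rounding plus the negligible truncated tail.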


If the AR(1) multiplier \phi is greater than 1 or less than -1, then the quantities \phi^k get bigger and bigger in magnitude as k increases. Then, when we recurse the AR(1) equation, the term involving a past value of x gets bigger and bigger, and certainly won’t shrink to 0 as k goes to infinity; in fact it’ll explode to infinity itself. In this case we say that the AR(1) process is not causal; rather, it’s explosive. But we can handle such a case by reversing the direction of time, re-writing the AR(1) equation as

x_{n-1} = x_n / \phi - w_n / \phi.

We’ve rescaled the white-noise term (and changed its sign), but that’s OK: it’s still white noise, just with variance \sigma_w^2 / \phi^2. This series behaves just like a causal AR(1) process going backwards in time, because now the multiplier is 1/\phi, and if \phi is bigger than 1 or less than -1, then 1/\phi is between -1 and +1.

Operator Representation

We can rewrite the basic AR(1) equation by moving all the terms involving x to the left-hand side:

x_n - \phi x_{n-1} = w_n.

In fact we can do so for the general AR(p) equation:

x_n - \phi_1 x_{n-1} - \phi_2 x_{n-2} - ... - \phi_p x_{n-p} = w_n.

Now let’s introduce a quantity B, the backshift operator. It’s not a number, it’s an operator, which transforms any quantity like x_n to the same quantity for the preceding time index. In other words, the backshift operator, when operating on any time-indexed quantity, gives the lag-1 value of that quantity. Specifically, operating on the time series x we have

B x_n = x_{n-1}.

If we operate with the backshift operator twice we get the lag-2 value:

B^2 x_n = B (B x_n) = B x_{n-1} = x_{n-2},

and so on and so on, etc. etc. We can therefore write our AR(1) model in operator form as

(1 - \phi B) x_n = w_n,

and the general AR(p) model as

(1 - \phi_1 B - \phi_2 B^2 - ... - \phi_p B^p) x_n = w_n.

This is the operator form of the AR(p) model. We can also define an operator \phi(B) as

\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - ... - \phi_p B^p.

Then we can write the operator form of the AR(p) equation as

\phi(B) x_n = w_n.

This notation is very compact, and turns out to have many advantages.

Now we can determine whether or not the general AR(p) model is causal. We use the operator form \phi(B), but substitute a complex number z for the operator B, i.e.,

\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - ... - \phi_p z^p.

This is just a polynomial in z, of order p. Now we find the roots of the polynomial, i.e., all the values z for which

\phi(z) = 0.

In general, these roots can be complex numbers. But for any root z = a + bi, where a is the real part and b the imaginary part, we can compute the squared norm of the number as

|z|^2 = a^2 + b^2.

This squared norm is always a real number, and always positive. We can now state that if all the roots of the polynomial have squared norm greater than 1, then the AR(p) process is causal.
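This root-finding recipe is easy to mechanize with numpy's polynomial root finder. A sketch (the function name is my own):

```python
import numpy as np

def is_causal(phis):
    """Test causality of an AR(p) process with coefficients [phi_1, ..., phi_p].

    Forms phi(z) = 1 - phi_1 z - ... - phi_p z^p and checks that every root
    lies strictly outside the unit circle (squared norm greater than 1)."""
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = [-c for c in reversed(phis)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) ** 2 > 1.0))

causal_ar1 = is_causal([0.5])        # root z = 2: causal
explosive_ar1 = is_causal([1.2])     # root z = 1/1.2, inside the circle: explosive
causal_ar2 = is_causal([0.5, 0.3])   # both roots outside the unit circle: causal
```

The AR(1) cases reproduce the earlier rule of thumb: \phi between -1 and +1 puts the single root 1/\phi outside the unit circle.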

We mentioned, in passing, the moving-average, or MA process. And that’s the next alphabetical topic in this series…

Categories: mathematics

8 responses so far

  • Gavin's Pussycat // August 22, 2008 at 8:09 pm

    There’s a \phi missing from the second term in the RHS of the second equation under “AR models”.

    Fortunately it drops out :-)

    [Response: Fixed.]

  • Gavin's Pussycat // August 22, 2008 at 8:11 pm


    \gamma_2 = \phi \gamma_3

    is bass-ackwards…

    [Response: And fixed. Thanks.]

  • Phil B. // August 22, 2008 at 9:42 pm

    Tamino, I am an Electrical Engineer and use z-transforms for linear difference equations and discrete time transfer functions in my signal processing and control system work. The standard definition for z in my field is the reciprocal of yours i.e. B = (1/z)=z^-1 and a filter is stable if the roots are within the unit circle. You are consistent in your definition but there are literally hundreds of textbooks that use the other definition.
    Phil B.

    [Response: But I suspect those aren't time series analysis textbooks. See e.g. Shumway & Stoffer, Time Series Analysis and its Applications. It's really not unusual for different fields to adopt different notational conventions. I actually use a lot of notation that isn't standard for my own analyses; one of my guiding principles is that notation should be your servant, not your master. But in this case, I've stuck with textbook standards.]

  • TCO // August 23, 2008 at 1:52 am

    point of pedagogy: I advise not to add new notation when also adding new concepts. (For us both the notation and concept may be new.) there are also some treatements on the web (using google) that give more intuitive simple descriptions. Rather than pure math ones.

  • Arch Stanton // August 23, 2008 at 5:08 pm

    My head blew up.

    But I’ll stop by again later for more.

  • Lazar // August 24, 2008 at 10:14 am

    Beautifully clear.
    Thank you, Tamino.

  • Phil Scadden // August 24, 2008 at 9:51 pm

    I’m out of my field with time series but increasingly having to look at them. This is wonderfully clear stuff.
    Curious though about case where you have magnitude of noise correlated with magnitude of value. Ie the noise is coming in part from the instrumentation and furthermore this noise is highly autocorrelated because of slow instrumentation response. I dont suppose you can point me to an analysis of such a timeseries? Estimation of error is my problem.

  • george // August 25, 2008 at 12:26 am

    For each order, if we know the autoregressive coefficients \phi_1, \phi_2, …, \phi_p we can compute the autocorrelations, so we can again determine the autocorrelation structure of the process.

    Perhaps these are stupid questions, but I’m a total novice in this area, so here goes:

    If you don’t know the coefficients ahead of time (or even the number of coefficients), how can you determine what kind of process it is?

    Alternatively, if you don’t know what kind of process it is ahead of time, how can you determine the coefficients or even how many there are?

    [Response: It's an excellent question, and there's no single definitive answer. If the process is purely MA (moving-average, which will be a future topic), then the autocorrelations will be non-zero only up to a finite lag -- that's a clue that the process is MA and the highest order of nonzero autocorrelation will be the order of the MA process. If the process is purely AR (like in this post), there's a complement to the autocorrelation called the "partial autocorrelation" which will be non-zero only up to a finite lag, and the highest order of nonzero partial autocorrelation will be the order of the AR process.

    If neither the autocorrelation series, or the partial autocorrelation series, seems to be definitely zero beyond a certain order, then the process may be ARMA (both AR and MA) but we still have to determine the order of each. One way to approximate it is actually to build models using observed data, and apply tests to determine which model is "best." They can be tested for quality using AIC (the Akaike information criterion), or the Bayesian information criterion, or other tests. But it basically comes down to educated guesses about what models to test, combined with statistical tests to determine which model is best.

    And then there are the ARIMA, SARIMA, etc. processes. In part at least, determining the best model is an art as well as a science.]
