If a time series is the sum of a signal and noise, then the noise part itself constitutes a time series. If it’s *white noise*, then all the values are uncorrelated with each other. But we’ve often considered that the noise in geophysical time series (like temperature data) isn’t white. What other kinds of noise processes are there?

In this post we’ll take a look at a kind of process known as an *autoregressive*, or AR process. There’s a lot of math in this post! Also, we’ll only scratch the surface of the subject of alphabet soup because there are a lot more types of random time series than just AR series. But in future posts, we’ll take a look at MA series, ARMA series, ARIMA, SARIMA, FARIMA series, and we’ll consider their behaviors as well. Fortunately, this post lays the groundwork for much of the notation and conventions involved.

And for those of you who aren’t into the math thing … don’t worry, I’ll continue posting about climate (and occasionally other) stuff as well. This series will appear interspersed among those other topics.

Consider a random time series $x_t$, where $t$ is the “time index.” Assume that the expected value of each datum is zero, i.e., that it’s a zero-mean series, and that the expected variance (the expected value of the square) of any data value is $\sigma^2$. Let’s further assume that the time series is *stationary*, which basically means that any given time frame can be expected to behave just like any other given time frame. If this noise process is white, then we further have that any two different values are uncorrelated, i.e., the expected value of their product is zero. Using angle brackets to indicate the expected value of the enclosed quantity, we can write these conditions as

$$\langle x_t \rangle = 0,$$

for any data point, and that

$$\langle x_t x_s \rangle = 0,$$

unless $s = t$, in which case

$$\langle x_t^2 \rangle = \sigma^2.$$

The white-noise model is the most common one when analyzing data naively. However, it’s commonplace for time series to behave differently. We’ve seen in many instances that the noise in temperature (and other) time series isn’t white, because it shows *autocorrelation*.

To understand this, we can begin by defining the *autocovariance* for our zero-mean noise data: the expected value of the product of two data values which have some *lag* between them. If the lag is zero, then we have simply the expected value of the square of a given value, which is just the variance of the series. This can also be called the autocovariance at lag zero, which we can write as

$$\gamma_0 = \langle x_t^2 \rangle = \sigma^2.$$

The autocovariance at some generic lag $j$ is

$$\gamma_j = \langle x_t x_{t-j} \rangle.$$

The first thing to note is that since the process is stationary, the expected value of the product of two values where the 2nd one *leads* rather than lags the 1st (by the same number of time steps) will be the same:

$$\langle x_t x_{t+j} \rangle = \langle x_t x_{t-j} \rangle.$$

Another way to say this is

$$\gamma_{-j} = \gamma_j.$$

Essentially, the expected value of the product of two data points doesn’t depend on which one leads and which one lags, only on the difference between their time indexes. So if we know the values of $\gamma_j$ for all nonnegative values of $j$, we know them for negative values also, hence we know the entire autocovariance structure of our time series of random variables.

We’re not usually as interested in the *size* of the autocovariances as we are in their *relative* sizes; these define the *autocorrelation* at arbitrary lag $j$,

$$\rho_j = \frac{\gamma_j}{\gamma_0}.$$

We see immediately that the autocorrelation at lag 0 will be

$$\rho_0 = \frac{\gamma_0}{\gamma_0} = 1.$$

It’s also not too hard to determine that the other autocorrelations can’t be greater than 1 in magnitude,

$$|\rho_j| \le 1,$$

and in fact will only be equal to 1 when the lag is equal to 0. If *any* of the autocorrelations for nonzero lag are different from zero, then we say that the time series exhibits autocorrelation. As we’ve seen before, knowledge of the autocorrelation structure enables us to determine how the noise affects things like the probable error of a trend estimate.
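The definitions above can be tried out numerically. What follows is a minimal sketch, assuming NumPy is available; the function name `sample_autocorrelation` and the choice to divide by the series length are illustrative assumptions, not anything prescribed in this post. It estimates the autocorrelations of a simulated white-noise series, which should come out close to zero at every nonzero lag.

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Estimate the autocorrelations at lags 0..max_lag from one series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()              # enforce the zero-mean assumption
    n = len(x)
    # Sample autocovariance at lag j: average product of values j steps apart.
    gamma = np.array([np.dot(x[j:], x[:n - j]) / n for j in range(max_lag + 1)])
    return gamma / gamma[0]       # divide by the lag-0 autocovariance

rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
white = rng.normal(size=5000)     # simulated white noise
rho = sample_autocorrelation(white, max_lag=5)
print(np.round(rho, 3))
```

The lag-0 entry is exactly 1 by construction; the remaining entries are small but not exactly zero, because they are estimates from a finite sample.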

**AR models**

How can we generate a random series that shows autocorrelation? One way is through an *autoregressive* process. The simplest example is a 1st-order autoregressive, or AR(1) process, in which each value is some constant $\phi$ (which is *not* zero) times the previous value, plus a white-noise term:

$$x_t = \phi x_{t-1} + w_t.$$

The white noise terms $w_t$ are assumed to be a stationary, zero-mean, uncorrelated random process with its own variance $\sigma_w^2$, which has no correlation with any of the already-existing data values. This equation holds for all values of the time index $t$. To get the variance of the data series itself, we can square both sides of this equation to get

$$x_t^2 = \phi^2 x_{t-1}^2 + 2 \phi x_{t-1} w_t + w_t^2.$$

Now let’s take the expected value of both sides. On the left, we have the expected value of the square of a data value, which is just $\sigma^2$. The first term on the right is $\phi^2$ times the expected square of a data value, which gives $\phi^2 \sigma^2$. The 2nd term on the right is zero, because $w_t$ is uncorrelated with $x_{t-1}$. The final term on the right is just the variance of the white-noise process, which is $\sigma_w^2$. So we have

$$\sigma^2 = \phi^2 \sigma^2 + \sigma_w^2,$$

which we can rearrange as

$$\sigma^2 = \frac{\sigma_w^2}{1 - \phi^2}.$$

This of course is the same as $\gamma_0$, the autocovariance at lag 0.

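If we multiply the basic AR(1) equation by some lagged value $x_{t-j}$, and take expected values as before, we get

$$\gamma_j = \phi \gamma_{j-1}.$$

This tells us that $\gamma_1 = \phi \gamma_0$, $\gamma_2 = \phi \gamma_1$, $\gamma_3 = \phi \gamma_2$, etc. From this we can deduce that

$$\gamma_j = \phi^j \gamma_0,$$

or simply

$$\rho_j = \phi^j.$$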

Hence we know the entire autocorrelation structure of an AR(1) process. It’s worth noting that *all* of the autocorrelations, for all lags, are nonzero.
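As a rough check on these results, here is a sketch that simulates an AR(1) series and compares its sample variance and autocorrelations with the theoretical values (the white-noise variance divided by one minus the squared multiplier, and the multiplier raised to the power of the lag). The multiplier 0.6, the unit noise variance, the Gaussian noise, and the burn-in length are all illustrative assumptions, and `simulate_ar1` is a hypothetical helper name.

```python
import numpy as np

def simulate_ar1(phi, sigma_w, n, rng, burn_in=500):
    """Generate n values of x_t = phi * x_{t-1} + w_t, discarding a burn-in."""
    w = rng.normal(scale=sigma_w, size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + w[t]
    return x[burn_in:]            # drop start-up values affected by x_0 = 0

phi, sigma_w = 0.6, 1.0           # illustrative values, not taken from the post
rng = np.random.default_rng(1)
x = simulate_ar1(phi, sigma_w, n=200_000, rng=rng)

theory_var = sigma_w**2 / (1 - phi**2)
sample_var = np.mean(x**2)
print("variance: sample", round(sample_var, 3), "theory", round(theory_var, 3))

# Sample autocorrelation at lags 1..4 next to the theoretical phi**j.
sample_rho = [np.mean(x[j:] * x[:-j]) / sample_var for j in range(1, 5)]
for j, r in zip(range(1, 5), sample_rho):
    print("lag", j, "sample", round(r, 3), "theory", round(phi**j, 3))
```

With a series this long the sample values should sit close to the theoretical ones, though they will never match exactly.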

AR models aren’t limited to first order. We can have a 2nd-order process in which each value is a linear combination of the previous *two* values, plus a white-noise term, giving an AR(2) process:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t.$$

In fact we can define an AR(p) process of arbitrary order $p$ as

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t.$$

For each order, if we know the autoregressive coefficients we can compute the autocorrelations, so we can again determine the autocorrelation structure of the process.
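One way to carry out that computation, sketched here under stated assumptions: multiplying the AR(p) equation by a lagged value and taking expected values (the same step used for AR(1)) gives a set of linear equations for the first p autocorrelations, usually called the Yule-Walker equations, which this post does not name. The function below solves them with NumPy and then extends to higher lags by recursion. The name `ar_autocorrelations` is hypothetical, and the process is assumed to be stationary.

```python
import numpy as np

def ar_autocorrelations(phis, max_lag):
    """Autocorrelations at lags 0..max_lag for a stationary AR(p) process."""
    phis = np.asarray(phis, dtype=float)
    p = len(phis)
    # For k = 1..p: rho_k = sum_i phi_i * rho_|k-i|, with rho_0 = 1.
    # Collect the unknowns rho_1..rho_p on the left as A @ rho = b.
    A = np.eye(p)
    b = np.zeros(p)
    for k in range(1, p + 1):
        for i in range(1, p + 1):
            m = abs(k - i)
            if m == 0:
                b[k - 1] += phis[i - 1]        # term multiplying rho_0 = 1
            else:
                A[k - 1, m - 1] -= phis[i - 1]
    rho = [1.0] + list(np.linalg.solve(A, b))
    # Beyond lag p the same relation is a plain recursion.
    for k in range(p + 1, max_lag + 1):
        rho.append(sum(phis[i - 1] * rho[k - i] for i in range(1, p + 1)))
    return np.array(rho[:max_lag + 1])

print(ar_autocorrelations([0.6], 4))        # AR(1): should follow 0.6**j
print(ar_autocorrelations([0.5, 0.3], 4))   # an illustrative AR(2)
```

For a single coefficient this reduces to the AR(1) result, which makes a convenient sanity check.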

**Recursive Representation**

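Consider again the AR(1) process

$$x_t = \phi x_{t-1} + w_t.$$

This holds true not just for time index $t$, but for time index $t-1$, $t-2$, etc. Therefore we can recurse this formula to get

$$x_t = \phi (\phi x_{t-2} + w_{t-1}) + w_t = \phi^2 x_{t-2} + \phi w_{t-1} + w_t = \phi^3 x_{t-3} + \phi^2 w_{t-2} + \phi w_{t-1} + w_t = \cdots$$

This is equivalent to

$$x_t = \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j}.$$

Now suppose that the autoregressive factor $\phi$ is between -1 and +1. Then the quantities $\phi^k$ get smaller and smaller as $k$ gets bigger, and if we take $k$ big enough they’ll get negligibly small. We can even let $k$ go to infinity, in which case the term $\phi^k x_{t-k}$ goes to zero and the dependence on a preceding term disappears altogether:

$$x_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}.$$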

This process, in which each value is a weighted sum of white-noise terms, is a type of *moving-average*, or MA process. In this case, there are infinitely many white-noise values involved, so we might call it an MA($\infty$) process. But MA processes are the subject of the next post in this series, so let’s not get too far ahead of ourselves. Of course we would prefer to express it as an AR(1) process, since that involves only a finite number of terms; in fact an AR or MA process generally has finite order, although we can certainly *imagine* such processes being of infinite order.
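To see how quickly the dependence on the distant past fades, the sketch below computes one value of an AR(1) series two ways: by the direct recursion, and by a weighted sum of only the most recent 50 white-noise terms. The multiplier 0.7, the truncation point, and the starting value are illustrative assumptions.

```python
import numpy as np

phi = 0.7                         # illustrative multiplier between -1 and +1
k = 50                            # number of white-noise terms kept in the sum
rng = np.random.default_rng(2)
w = rng.normal(size=1000)

# Direct recursion x_t = phi * x_{t-1} + w_t, started from x_0 = w_0.
x = np.zeros_like(w)
x[0] = w[0]
for t in range(1, len(w)):
    x[t] = phi * x[t - 1] + w[t]

# Truncated moving-average form: sum over j = 0..k-1 of phi**j * w_{t-j}.
t = len(w) - 1
weights = phi ** np.arange(k)
x_ma = np.dot(weights, w[t - np.arange(k)])

print(x[t], x_ma, abs(x[t] - x_ma))
```

The two numbers differ only by the neglected term, the multiplier raised to the 50th power times the value 50 steps earlier, which is tiny for a multiplier of 0.7.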

**Causality**

If the AR(1) multiplier $\phi$ is greater than 1 or less than -1, then the quantities $\phi^k$ get bigger and bigger as $k$ increases. Then, when we recurse the AR(1) equation the term involving a past value of $x$ gets bigger and bigger, and certainly won’t shrink to 0 as $k$ goes to infinity; in fact it’ll explode to infinity itself. In this case we say that the AR(1) process is not *causal*, rather that it’s *explosive*. But we can handle such a case by reversing the direction of time, rewriting the AR(1) equation as

$$x_{t-1} = \frac{1}{\phi} x_t - \frac{1}{\phi} w_t.$$

We’ve changed the sign of the white-noise term (and rescaled it by $1/\phi$), but that’s OK because it’s just a white-noise random term anyway. This series behaves just like a causal AR(1) process going backwards in time, because now the multiplier is $1/\phi$, and if $\phi$ is bigger than 1 or less than -1, then $1/\phi$ is between -1 and +1.
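A small numerical illustration of the time-reversal step, offered as a sketch with an assumed multiplier of 2: the forward recursion produces values that blow up, yet the very same numbers satisfy the reversed relation, in which each earlier value is the later value divided by the multiplier minus the white-noise term divided by the multiplier.

```python
import numpy as np

phi = 2.0                          # illustrative non-causal multiplier
rng = np.random.default_rng(3)
w = rng.normal(size=40)

x = np.zeros(40)
x[0] = w[0]
for t in range(1, 40):
    x[t] = phi * x[t - 1] + w[t]   # forward recursion: magnitudes roughly double

# Reversed form: x_{t-1} = x_t / phi - w_t / phi, with multiplier 1/phi = 0.5.
x_prev = x[1:] / phi - w[1:] / phi
print(np.allclose(x_prev, x[:-1]))   # the reversed relation reproduces the series
print(abs(x[10]), abs(x[-1]))        # early versus late magnitude
```

Nothing about the data changes; only the direction in which the recursion is read.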

**Operator Representation**

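We can rewrite the basic AR(1) equation by moving all the terms involving $x$ to the left-hand side:

$$x_t - \phi x_{t-1} = w_t.$$

In fact we can do so for the general AR(p) equation:

$$x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \cdots - \phi_p x_{t-p} = w_t.$$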

Now let’s introduce a quantity $B$, the *backshift operator*. It’s not a number, it’s an operator, which transforms any quantity like $x_t$ to the same quantity for the *preceding time index*. In other words, the backshift operator, when operating on any time-indexed quantity, gives the lag-1 value of that quantity. Specifically, operating on the time series $x_t$ we have

$$B x_t = x_{t-1}.$$

If we operate with the backshift operator *twice* we get the lag-2 value:

$$B^2 x_t = B (B x_t) = B x_{t-1} = x_{t-2},$$

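and so on for higher powers. We can therefore write our AR(1) model in operator form as

$$(1 - \phi B) x_t = w_t,$$

and the general AR(p) model as

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) x_t = w_t.$$

This is the *operator form* of the AR(p) model. We can also define an operator $\phi(B)$ as

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p.$$

Then we can write the operator form of the AR(p) equation as

$$\phi(B) x_t = w_t.$$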

This notation is very compact, and turns out to have many advantages.
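The operator form can be mimicked on an array of data. In this sketch (the function names `backshift` and `apply_ar_operator` are hypothetical, and marking the undefined leading entries with NaN is one possible convention), an AR(2) series is built from known white noise with illustrative coefficients 0.5 and 0.3, and applying the autoregressive operator to the series hands the white noise back.

```python
import numpy as np

def backshift(x, k=1):
    """Apply the backshift operator k times: entry t becomes the value at t - k."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return x.copy()
    out = np.full_like(x, np.nan)  # the first k entries have no lagged value
    out[k:] = x[:-k]
    return out

def apply_ar_operator(phis, x):
    """Compute x_t - phi_1 x_{t-1} - ... - phi_p x_{t-p} for every t."""
    result = np.asarray(x, dtype=float).copy()
    for i, phi_i in enumerate(phis, start=1):
        result = result - phi_i * backshift(x, i)
    return result

# Build an AR(2) series from known white noise, then recover that noise.
phis = [0.5, 0.3]                  # illustrative coefficients
rng = np.random.default_rng(4)
w = rng.normal(size=200)
x = np.zeros(200)
for t in range(2, 200):
    x[t] = phis[0] * x[t - 1] + phis[1] * x[t - 2] + w[t]

recovered = apply_ar_operator(phis, x)
print(np.allclose(recovered[2:], w[2:]))
```

The first two entries of `recovered` are NaN because the lag-1 and lag-2 values do not exist there.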

Now we can determine whether or not the general AR(p) model is causal. We use the operator form $\phi(B)$, but substitute a *complex number* $z$ for the operator $B$, i.e.,

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p.$$

This is just a polynomial in $z$, of order $p$. Now we find the roots of the polynomial, i.e., all the values $z$ for which

$$\phi(z) = 0.$$

In general, these roots can be complex numbers. But for any root $z = a + ib$, where $a$ is the real part and $b$ the imaginary part, we can compute the squared norm of the number as

$$|z|^2 = a^2 + b^2.$$

This squared norm is always a real number, and always positive. We can now state that **if all the roots of the polynomial have squared norm greater than 1, then the AR(p) process is causal**.
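The root test is easy to run numerically. The sketch below uses `numpy.roots`, which expects polynomial coefficients ordered from the highest power down to the constant term; the function name `is_causal` and the example coefficients are illustrative assumptions.

```python
import numpy as np

def is_causal(phis):
    """True if every root of 1 - phi_1 z - ... - phi_p z^p has squared norm > 1."""
    phis = np.asarray(phis, dtype=float)
    # Coefficients from the highest power down: -phi_p, ..., -phi_1, then 1.
    coeffs = np.concatenate((-phis[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) ** 2 > 1.0))

print(is_causal([0.6]))        # AR(1) with multiplier 0.6: single root at 1/0.6
print(is_causal([2.0]))        # AR(1) with multiplier 2: single root at 0.5
print(is_causal([0.5, 0.3]))   # an illustrative AR(2)
```

For AR(1) the single root is the reciprocal of the multiplier, so the test agrees with the earlier requirement that the multiplier lie between -1 and +1.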

We mentioned, in passing, the moving-average, or MA process. And that’s the next alphabetical topic in this series…

## 8 responses so far ↓

Gavin's Pussycat // August 22, 2008 at 8:09 pm

There’s a \phi missing from the second term in the RHS of the second equation under “AR models”.

Fortunately it drops out :-)

[Response: Fixed.]

Gavin's Pussycat // August 22, 2008 at 8:11 pm

and

\gamma_2 = \phi \gamma_3

is bass-ackwards…

[Response: And fixed. Thanks.]

Phil B. // August 22, 2008 at 9:42 pm

Tamino, I am an Electrical Engineer and use z-transforms for linear difference equations and discrete time transfer functions in my signal processing and control system work. The standard definition for z in my field is the reciprocal of yours i.e. B = (1/z)=z^-1 and a filter is stable if the roots are within the unit circle. You are consistent in your definition but there are literally hundreds of textbooks that use the other definition.

Phil B.

[Response: But I suspect those aren't time series analysis textbooks. See e.g. Shumway & Stoffer, Time Series Analysis and its Applications. It's really not unusual for different fields to adopt different notational conventions. I actually use a lot of notation that isn't standard for my own analyses; one of my guiding principles is that notation should be your servant, not your master. But in this case, I've stuck with textbook standards.]

TCO // August 23, 2008 at 1:52 am

point of pedagogy: I advise not to add new notation when also adding new concepts. (For us both the notation and concept may be new.) there are also some treatements on the web (using google) that give more intuitive simple descriptions. Rather than pure math ones.

Arch Stanton // August 23, 2008 at 5:08 pm

My head blew up.

But I’ll stop by again later for more.

Lazar // August 24, 2008 at 10:14 am

Beautifully clear.

Thank you, Tamino.

Phil Scadden // August 24, 2008 at 9:51 pm

I’m out of my field with time series but increasingly having to look at them. This is wonderfully clear stuff.

Curious though about case where you have magnitude of noise correlated with magnitude of value. Ie the noise is coming in part from the instrumentation and furthermore this noise is highly autocorrelated because of slow instrumentation response. I dont suppose you can point me to an analysis of such a timeseries? Estimation of error is my problem.

george // August 25, 2008 at 12:26 am

Perhaps these are stupid questions, but I’m a total novice in this area, so here goes:

If you don’t know the coefficients ahead of time (or even the number of coefficients), how can you determine what kind of process it is?

Alternatively, if you don’t know what kind of process it is ahead of time, how can you determine the coefficients or even how many there are?

[Response: It's an excellent question, and there's no single definitive answer. If the process is purely MA (moving-average, which will be a future topic), then the autocorrelations will be non-zero only up to a finite lag -- that's a clue that the process is MA and the highest order of nonzero autocorrelation will be the order of the MA process. If the process is purely AR (like in this post), there's a complement to the autocorrelation called the "partial autocorrelation" which will be non-zero only up to a finite lag, and the highest order of nonzero partial autocorrelation will be the order of the AR process.

If neither the autocorrelation series nor the partial autocorrelation series seems to be definitely zero beyond a certain order, then the process may be ARMA (both AR and MA) but we still have to determine the order of each. One way to approximate it is actually to build models using observed data, and apply tests to determine which model is "best." They can be tested for quality using AIC (the Akaike information criterion), or the Bayesian information criterion, or other tests. But it basically comes down to educated guesses about what models to test, combined with statistical tests to determine which model is best.

And then there are the ARIMA, SARIMA, etc. processes. In part at least, determining the best model is an art as well as a science.]