# Open Mind ## Alphabet Soup, part 3a: Parameters for ARMA(1,1)

#### September 16, 2008 · 4 Comments

There seems to be some interest in how one estimates the parameters $\phi,~\theta,~\sigma_w$ for an ARMA(1,1) model, based on estimates of the standard deviation and autocorrelation of observed data. Therefore I’ll illustrate how I arrived at the estimates used in the last post, and give some estimates for other data sets as well.

I’d like those readers who are genuinely interested in mathematics to have some understanding of how the formulae arise. So I’m going to go through the steps to derive them. Because of that, I’ll split this into two posts. In this one, we’ll derive the basic formulae relating the ARMA(1,1) parameters to the observed quantities (the data variance and its autocorrelations). In the next, we’ll invert those equations and apply them to actual data from GISS and HadCRU, to get the ARMA(1,1) parameters for those data sets. That’s why this post is called “3a”; the next will be “3b”. In any case, this post will have lots of equations. I love that; those of you who hate it will find this post tedious. But you can just wait for the next post to get formulae you can use to reproduce what I’ve done.

First, a reminder of what the ARMA(1,1) model is. The noise is modeled as an ARMA(1,1) process, i.e., $x_n = \phi x_{n-1} + w_n + \theta w_{n-1}$.

Here $x_n$ is the noise value, and $w_n$ is a white noise process, i.e., each value has the same expected value (zero) and standard deviation ( $\sigma_w$), and all the white-noise values are uncorrelated. The parameters $\phi,~\theta$ define the particular ARMA(1,1) process we’re considering, there’s only 1 $\phi$ value so we have AR order 1, there’s only 1 $\theta$ value so we have MA order 1, so it’s order (1,1), or simply ARMA(1,1).

In an earlier post we introduced the backshift operator B. It changes any variable from its present value to the value which is one time step earlier. Therefore $B x_n = x_{n-1}$, and $B w_n = w_{n-1}$, etc.

Hence we can rewrite the ARMA(1,1) model as $x_n = \phi B x_n + w_n + \theta B w_n$.

Bringing all the $x$ terms to the left hand side, we have $(1 - \phi B) x_n = (1 + \theta B) w_n$.

We can think of the quantity $1 - \phi B$ as an operator, which changes any variable to its present value minus $\phi$ times the preceding value. One might wonder, does this operator have an inverse? It turns out that if $|\phi| < 1$, then its inverse can be written as $(1 - \phi B)^{-1} = 1 + \phi B + \phi^2 B^2 + \phi^3 B^3 + ...$

The sum goes on to infinity. You can confirm this formula by actually muliplying it by $1 - \phi B$.

Multiplying both sides of our ARMA(1,1) model equation by this inverse, we get $x_n = (1 - \phi B)^{-1} (1 - \phi B) x_n$ $= (1 + \phi B + \phi^2 B^2 + \phi^3 B^3 + ...) (1 + \theta B) w_n$.

Working out the algebra, we get $x_n = [1 + (\phi + \theta)B + \phi(\phi+\theta)B^2 + \phi^2(\phi+\theta)B^3) + ...] w_n$.

We can write this as $x_n = (\psi_0 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + ...) w_n$,

which serves to define the quantities $\psi_n$. For the ARMA(1,1) case in question, we have $\psi_0 = 1$, (it always does), $\psi_n = \phi^{n-1} (\phi + \theta)$, for n greater than 0.

What good is it to express the model in this fashion? Because we know that (angle brackets denote the expecation value of the enclosed quantity) $\langle w_n^2 \rangle = \sigma_w^2$,

and $\langle w_j w_k \rangle = 0$, for $j \ne k$,

we can deduce that the autocovariances are $\gamma_j = \sum_{n=0}^\infty \psi_n \psi_{n+j}$.

Hence this formulation allows for a straightforward evaluation of the autocovariances of any ARMA(p,q) model. I’m guessing that many readers will be able to confirm this last equation for themselves.

Let’s apply it to our ARMA(1,1) model. For $j=0$ we have $\gamma_0 = \sigma_w^2 [ \psi_0^2 + \psi_1^2 + \psi_2^2 + ...] = \sigma_w^2 [1 + \sum_{n=1}^\infty \phi^{2n-2} (\phi+\theta)^2] = \sigma^2$,

where the last part comes from the fact that the autocovariance at lag 0 is, by definition, equal to the data variance. We can use the fact that for any z, $1 + z + z^2 + z^3 + ... = 1/(1-z)$ to reformulate this as $\gamma_0 = \sigma_w^2 [ 1 + (\phi + \theta)^2 / (1 - \phi^2)]$.

For $j \ge 1$ we have $\gamma_j = \sigma_w^2 [ \phi^{j-1}(\phi+\theta) + \sum_{n=1^\infty} \phi^{2n-2+j} (\phi+\theta)^2]$.

Again we can use the formula for an infinite sum of powers to formulate this as $\gamma_j = \sigma_w^2 [\phi^{j-1} (\phi+\theta) + \phi^j (\phi+\theta)^2 / (1-\phi^2)$ $= \sigma_w^2 \phi^j [1 + \theta/\phi + (\phi+\theta)^2/(1-\phi^2)]$.

The autocorrelations are the $\rho_j = \gamma_j / \gamma_0$, which makes the factors $\sigma_w^2$ cancel out, giving $\rho_j = \phi^j [{1 + \theta/\phi + (\phi+\theta)^2/(1-\phi^2) \over 1 + (\phi+\theta)^2/(1-\phi^2)}]$.

I can also write this as $\rho_1 = \phi^j [ 1 + {\theta/\phi \over 1 + (\phi+\theta)^2/(1-\phi^2)}]$,

and this formula holds only for $j \ge 1$; for $j=0$ we have (of course!) $\rho_0=1$ (as it always does).

It’s worth noting that in this formula, only the term $\phi^j$ has any dependence on the lag $j$. Therefore for an ARMA(1,1) model, the autocorrelations decay exponentially, with decay factor $\phi$, although the first autocorrelation is not equal to $\phi$ (it would be for an AR(1) model).

These are the formulae giving the observed quantities, the data variance $\sigma^2$ and the autocorrelations $\rho_j$, in terms of the ARMA(1,1) parameters, the white-noise variance $\sigma_w^2$ and $\phi,~\theta$. In the next post we’ll invert them to get the parameters from the observed quantities. We’ll also apply it to GISS and HadCRU data, and see what pops out.

Categories: mathematics

### 4 responses so far ↓

• So out of curiosity (and pure laziness), can you solve for an ARMA(n,m) process as well?

[Response: Yes you can, given the orders (n,m) and estimates of the variance and all the autocorrelations. It's quite a bit more complicated, and the uncertainties become larger as the orders increase.]

• apolytongp

Can you please critically (by which I mean incisively, not “negatively”) evaluate the red noise scheme that Steve McI used, which had the sample set as an input? WA discuss it and say that it overmodels the sample within the noise (iow, it has some non-randomness properties in that it emulates the sample itself). But they don’t do math. Will you please do so? Thanks.

[Response: That's a lot of work. I have work to do for my employer, a number of research projects of my own, and this blog takes time ... I'll put it in the queue as another possibility, but I can't make any guarantees.]

• george

Not sure if it is ever of any practical interest (since it represents very special cases), but the above equation

x_n = [1 + (phi + theta)B + phi(phi+theta)B^2 + phi^2(phi+theta)B^3) + ...] w_n

indicates that ” x_n = w_n ” for the cases in which theta = -phi.

ie, “x” at any given time is determined only by the white noise at that time (zero autocorrelation at lags >= 1 )

So, apparently, for an ARMA(1,1) process, it is possible (at least in principle, for special cases) to get some sort of “cancellation” between the AR and MA parts of the process.

[Response: Actually the case you give is an example of "redundancy." For $\theta = -\phi$, the equation can be rearranged as $(1 - \phi B) x_n = (1 - \phi B) w_n$, and the operator factors cancel each other out so the "correct" equation is $x_n = w_n$. Hence in this case it's not really proper to call it an ARMA(1,1) model, it's just a white-noise model.]

• george

You note above that for

ARMA(1,1) process,

x_n = phi*x_{n-1} + w_n + theta*w_{n-1}

I’m trying to understand what this means physically

It makes sense that the temperature (”x”) at each given time would be some fraction of its immediately preceding value (it would decay a bit due to heat loss, for example) and it also makes sense that it might include extra (new) white noise at each stage.

But I am trying to understand what the third part means:

+ theta*w_{n-1}

In particular, for the case you gave for GISS, in which “theta” was actually negative, what does it mean (physically) to say that (at each stage) we need to “subtract out” a fraction of the white noise that was just added at the previous stage?

What would be the underlying physical reason for doing so?

[Response: As I commented earlier, in many (but not all) cases an ARMA(1,1) process is equivalent to an AR(1) process plus an independent white-noise process. One could think of it as being an AR(1) process to mimic the "weather noise" plus an independent white-noise process to model some other kind of noise superimposed on it. I think Lucia has used the AR(1) + WN model with that in mind. Mathematically, it may be simpler to treat it as a single noise process than a sum of two noise processes. The single-noise-processes model has no clear physical interpretation that I'm aware of, but the sum-of-two-noise-processes model does, although I don't know whether or not that actually applies.]

Also, I understand that it is simply the limiting case for a more general process, but isn’t it really a bit of a misnomer to call ARMA(1,1) a “Moving average” when it only includes a single previous value?!

[Response: All MA processes involve a finite number of previous values in addition to the newest white-noise value. The fact that it involves only 1 doesn't disqualify it from meeting the definition.]

Actually, it surprises me a bit that global temperature data can be modeled as ARMA(1,1).

Somehow, I would expect the temperature “noise” of the recent past to more strongly influence the current temperature than is the case with ARMA(1,1).

Doesn’t that assume that all climate noise is very short lived? (less than or equal to 1 year)

[Response: You should keep in mind that for an ARMA(1,1) process, it's ALL noise. The purpose of ARMA(1,1) is to model the noise in global temperature, after we've already subtracted out the "signal" (which for the last 30 years or so is well approximated as a straight line).

So the noise has "memory" by virtue of the AR coefficient $\phi$ as well as the MA coefficient $\theta$. Since the effect of $\phi$ is recursive, this causes the "memory" to persist much longer than one time step. In fact it decays exponentially, and for $\phi=0.8493$ it decays with "characteristic" or "e-folding" time scale a little over 6 months. But it persists even longer than that, albeit the strength of the persistence continues to decay. The impact is still there many years into the future, but the level is so low it eventually becomes negligible.]

But as this indicates that major El Ninos have duration from 0.5 year to 1.5 year.

Why not use a higher order — eg, ARMA(2,2) — process?

Wouldn’t it be likely (based on purely physcial considerations) to be a better fit than ARMA(1,1)?

Perhaps the math is too messy? :)

[Response: It may be a higher-order process, but the higher-order coefficients are probably much smaller, and there's not enough data to determine them with accuracy. My guess would be (although I haven't run the numbers) that if we applied an ARMA(2,2) process, then the 2nd-order coefficients would have uncertainty larger than their estimates, in which case we couldn't even be sure that we got the sign of the coefficients correct. In short, I don't see evidence of higher orders and I suspect that if they're there, we can't yet quantify them with sufficient precision to be useful.

And the fact is, the ARMA(1,1) model is pretty good.]