Open Mind

Take it to the Limit: Part 1, Moments

June 13, 2009 · 7 Comments

Probably the most important theorem in all of statistics is the central limit theorem. I’d like to give an explanation of what it is, and show that it is indeed true (not rigorously, but at least quantitatively), so I’ll take a couple of posts to illustrate it. In this, the 1st installment, I’ll introduce the concepts of the moments of a probability function and of the moment-generating function.


Most of us are familiar with the mean value. For a population of data it’s the average value, and for a distribution (probability distribution, that is) it’s the expected value.

“Expected value” is one of the most important and useful concepts in statistics. Suppose there are a finite number of possibilities from a measurement or experiment. For flipping a coin, e.g., there is a finite number of possible outcomes: heads or tails (I’ll ignore the possibility of landing on edge, or rolling down a drain). If we can specify the probability of each possibility then we’ve given the probability mass function, or pmf, for the experiment.

We can even list the possibilities and give them labels from 1st to last, with the possible outcomes being x_1,~x_2,~x_3,.... Then, to compute the expected value of some function A(x) of the outcome, we take the sum, over all possible outcomes, of the probability of each outcome times the value of the function at that outcome. Therefore

{\bf E}(A(x)) = \sum_j P(x_j) A(x_j),

where the sum is over all possible values of x_j. The bold-face E is the expected value operator, so {\bf E}(A(x)) indicates the expected value of A(x). Another notation is to use angle brackets, as in

\langle A(x) \rangle = \sum_j P(x_j) A(x_j).

Some of the most important expected values for a probability mass function are the expected values of powers of x, the random variable itself. The most fundamental is the 1st power, which is just the expected value of x, also known as the mean

mean = {\bf E}(x) = \sum_j P(x_j) x_j.
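
For example, take a fair six-sided die (not discussed above, but it makes a simple illustration): each of the outcomes 1 through 6 has probability 1/6, so

mean = {\bf E}(x) = \frac{1}{6}(1+2+3+4+5+6) = 3.5.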

A trivial case is the zeroth power: since the zeroth power of anything is equal to 1, and the probabilities of all possible outcomes sum to 1, the expected value of the zeroth power of x (or of anything else) is 1

{\bf E}(x^0) = {\bf E}(1) = 1.

The expected value of x^2 is also known as the 2nd moment of the probability mass function

2nd moment = {\bf E}(x^2) = \sum_j P(x_j) x_j^2.

In fact the expected value of the k^{th} power of x is the k^{th} moment of the pmf.

k^{th} moment = {\bf E}(x^k).
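
If you’d like to see these formulas in action, here’s a quick Python sketch (my own illustration, not part of the argument; the name kth_moment is just a convenient label) that computes moments of the fair-die pmf directly from the definition:

from fractions import Fraction

# illustrative sketch: the k-th moment of a discrete pmf,
# E(x^k) = sum over j of P(x_j) * x_j**k
def kth_moment(pmf, k):
    return sum(p * x**k for x, p in pmf.items())

# fair six-sided die: outcomes 1..6, each with probability 1/6 (exact fractions)
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(kth_moment(die, 0))   # 1     -- the zeroth moment
print(kth_moment(die, 1))   # 7/2   -- the first moment, i.e. the mean 3.5
print(kth_moment(die, 2))   # 91/6  -- the second moment

Using exact fractions avoids floating-point fuzziness; the first moment reproduces the mean of 3.5 computed above, and the zeroth moment is 1, as it must be.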

If we have a continuous range of possible outcomes (like any number between 0 and 1) rather than a finite number (like heads or tails), then instead of giving a probability mass function (pmf) P(x) we give a probability density function (pdf) f(x). Now the expected value is an integral rather than a sum

{\bf E}(A(x)) = \int_{-\infty}^{\infty} f(x) A(x) ~dx.

All the preceding still applies, i.e., the expected value of the k^{th} power of x is still called the k^{th} moment of the distribution.

We can even give the moments symbols; let \mu_k be the k^{th} moment. The first moment, \mu_1, is our old friend the mean, also known as \mu.
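
As a continuous example (I’ll use the uniform distribution on the interval from 0 to 1, which isn’t part of the main argument but makes the integrals easy), the pdf is f(x) = 1 for 0 \le x \le 1 and zero otherwise, and the moments are

\mu_k = \int_0^1 x^k ~dx = \frac{1}{k+1}.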

Moment-generating function

Consider the function defined by the power series

m(t) = 1 + \mu_1 t + \frac{1}{2} \mu_2 t^2 + \frac{1}{6} \mu_3 t^3 + ... = \sum_{k=0}^\infty \frac{1}{k!} \mu_k t^k.

This is called the moment-generating function for the probability function (whether it’s a pmf or pdf). We can write it as

m(t) = {\bf E}(1) + {\bf E}(x) t + \frac{1}{2}{\bf E}(x^2) t^2 + \frac{1}{6}{\bf E}(x^3) t^3 + ... = \sum_{k=0}^\infty \frac{1}{k!} {\bf E}(x^k) t^k.

A most useful property of the expected value operator is that the sum of expected values is the expected value of a sum, i.e.,

{\bf E}(A) + {\bf E}(B) = {\bf E}(A+B).

Therefore our moment-generating function is

m(t) = {\bf E}(1 + xt + \frac{1}{2} x^2 t^2 + \frac{1}{6} x^3 t^3 + ...) = {\bf E}(\sum_{k=0}^\infty \frac{1}{k!} (xt)^k).

You may recognize the final sum as the series expansion for e^{xt}, so we finally have that the moment-generating function is

m(t) = {\bf E}(e^{xt}).
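
To see this in action, take the uniform distribution on 0 to 1 again (still just an illustration). Its moment-generating function is

m(t) = {\bf E}(e^{xt}) = \int_0^1 e^{xt} ~dx = \frac{e^t - 1}{t}

(for t \ne 0; at t = 0 it’s just 1). Expanding the exponential in its power series gives

m(t) = \sum_{k=0}^\infty \frac{t^k}{(k+1)!} = \sum_{k=0}^\infty \frac{1}{k!} \cdot \frac{1}{k+1} \cdot t^k,

so \mu_k = 1/(k+1), exactly the moments we computed before.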

An interesting fact is that if we know all the moments of a probability function (at least when the moments don’t grow too fast, which is guaranteed whenever the moment-generating function exists), then we know the probability function itself. If we know the moment-generating function, we can find all the moments. So the moment-generating function gives us complete information about the probability function.
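
How do we get the moments from the moment-generating function? (This step isn’t spelled out above, but it follows directly from the power series.) Differentiate k times and set t=0: every lower-order term vanishes after k differentiations, and every higher-order term still carries a factor of t, so

\mu_k = \frac{d^k m}{dt^k}\bigg|_{t=0}.

For instance, m'(t) = \mu_1 + \mu_2 t + \frac{1}{2}\mu_3 t^2 + ..., so m'(0) = \mu_1, the mean.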

It should be mentioned that for some probability functions, some of the moments don’t exist. Take the pdf given by

f(x) = 1/x^2 for x \ge 1, while f(x)=0 otherwise.

It’s a perfectly good pdf, since it’s nowhere negative and the total probability for all possible values is equal to 1

\int_1^{\infty} f(x)~dx = 1.

But when we try to compute the first moment

{\bf E}(x) = \int_1^\infty x f(x) ~dx = \int_1^\infty (1/x) ~dx,

we find that the integral is not finite; therefore this distribution doesn’t have a 1st moment (or 2nd moment, or any at all except the zeroth moment). Hence it has no moment-generating function.
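
Explicitly (just evaluating the two integrals above),

\int_1^\infty \frac{1}{x^2} ~dx = \Big[ -\frac{1}{x} \Big]_1^\infty = 1, \qquad \int_1^\infty \frac{1}{x} ~dx = \Big[ \ln x \Big]_1^\infty = \infty,

so the total probability is fine, but the mean diverges.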

However, most of the distributions we commonly work with do have moments; in fact they have all moments from zeroth to infinitieth (is that a word?), so they have a moment-generating function. One of the most important distributions is the normal distribution, given by

f(x) = {1 \over \sigma \sqrt{2\pi}} e^{-\frac{1}{2} (x-\mu)^2/\sigma^2}.

All its moments are finite, and the moment-generating function has the especially simple form

m(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2} (normal distribution).
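
As a quick check (my own calculation, using the differentiate-and-set-t-to-zero trick from above), the first two derivatives of this m(t) are

m'(t) = (\mu + \sigma^2 t) ~m(t), \qquad m''(t) = \sigma^2 m(t) + (\mu + \sigma^2 t)^2 m(t),

so m'(0) = \mu and m''(0) = \sigma^2 + \mu^2: the mean of the normal distribution is \mu, and its second moment is \sigma^2 + \mu^2.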

In the next installment, we’ll use the moment-generating function to characterize the moments of the average of a set of observations.

Categories: mathematics

7 responses so far

  • Ray Ladbury // June 13, 2009 at 5:54 pm

    Which of course begs for my favorite nerdy statistics joke:
    Q: What’s the Cauchy distribution’s least favorite question?
    A: Got a moment?

    A question: Do you know of a good way to estimate the amount of data required to determine a particular moment of a PDF to a given confidence level?

  • David B. Benson // June 13, 2009 at 9:44 pm

    Tamino — The eleventh equation

    m(t) = {\bf E}(1 + xt + \frac{1}{2} x^2 t^2 + \frac{1}{6} x^3 t^3 + …) = {\bf E}(\sum_{k=0}^\infty \frac{1}{k!} (xt)^k

    is missing a closing parenthesis at the end.
    A nit I picked in a clear exposition.

    [Response: Without nitpickers, errors would remain when not needed.]

  • John Mashey // June 13, 2009 at 10:37 pm

    Reading the post, I started thinking “Cauchy”, but Ray was too quick. Distributions with no moments are pretty nasty.

    I suspect this thread will be a bit deeper mathematically than most :-)

  • michel lecar // June 15, 2009 at 12:15 pm

    If you are looking for something which is more of a howto and a whyto, try this:

    http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/stat_workshp/cnt_lim_therm/cnt_lim_therm_01.html

    It might be more profitable for the involved layman to spend his/her time on (i) feedbacks, their nature, strength and evidence (ii) emissions, their scale, needed reductions, measures to deliver these, rather than mastering basic statistics.

    But to each his own.

  • Neven // June 15, 2009 at 1:45 pm

    I’m not getting any of this! But hey, that’s probably why I speak six languages.

    Nevertheless, it’s nice to see you post, Tamino.

  • Deep Climate // June 15, 2009 at 7:25 pm

    Michel said:

    It might be more profitable for the involved layman to spend his/her time on (i) feedbacks, their nature, strength and evidence (ii) emissions, their scale, needed reductions, measures to deliver these, rather than mastering basic statistics.

    On the other hand, a knowledge of basic statistics is very useful in the evaluation of the “evidence” and other issues.

  • Don fontaine // June 16, 2009 at 7:32 pm

    Shouldn’t it be the expected value of A, not f, in the equation where you introduce the integral form?

    [Response: Yes indeed. Fixed.]
