Minimum Mean Squared Error Estimators

Module by: Don Johnson

In terms of the densities involved in scalar random-parameter problems, the mean-squared error is given by

$$E[\varepsilon^2] = \iint \big(\theta - \hat{\theta}\big)^2\, p_{r,\theta}(r,\theta)\, dr\, d\theta \tag{1}$$
where $p_{r,\theta}(r,\theta)$ is the joint density of the observations and the parameter. To minimize this integral with respect to $\hat{\theta}$, we rewrite using the laws of conditional probability as
$$E[\varepsilon^2] = \int p_r(r) \left( \int \big[\theta - \hat{\theta}(r)\big]^2\, p_{\theta|r}(\theta|r)\, d\theta \right) dr \tag{2}$$
The density $p_r(\cdot)$ is nonnegative. To minimize the mean-squared error, we must minimize the inner integral for each value of $r$ because the integral is weighted by a positive quantity. We focus attention on the inner integral, which is the conditional expected value of the squared estimation error. The condition, a fixed value of $r$, implies that we seek that constant $\hat{\theta}(r)$ derived from $r$ that minimizes the second moment of the random parameter $\theta$. A well-known result from probability theory states that the minimum of $E[(x-c)^2]$ occurs when the constant $c$ equals the expected value of the random variable $x$ (see Expected Values of Probability Functions). The inner integral and thereby the mean-squared error is minimized by choosing the estimator to be the conditional expected value of the parameter given the observations.
$$\hat{\theta}_{\mathrm{MMSE}}(r) = E[\theta \mid r] \tag{3}$$
Thus, a parameter's minimum mean-squared error (MMSE) estimate is the parameter's a posteriori (after the observations have been obtained) expected value.
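
The probability result invoked above can be verified in a few lines. As a brief sketch, for any random variable $x$ with finite variance and any constant $c$, expand the square about $E[x]$:

```latex
\begin{align*}
E\big[(x-c)^2\big]
  &= E\big[(x - E[x] + E[x] - c)^2\big] \\
  &= E\big[(x-E[x])^2\big] + 2\,(E[x]-c)\,E\big[x - E[x]\big] + (E[x]-c)^2 \\
  &= \operatorname{var}(x) + (E[x]-c)^2 .
\end{align*}
```

The cross term vanishes because $E[x - E[x]] = 0$, so the minimum value is $\operatorname{var}(x)$ and it is attained only at $c = E[x]$. Applying this with the expectation taken over $\theta$ conditioned on $r$ gives the conditional-mean estimate, and the minimized inner integral is the a posteriori variance.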

The associated conditional probability density $p_{\theta|r}(\theta|r)$ is not often directly stated in a problem definition and must somehow be derived. In many applications, the likelihood function $p_{r|\theta}(r|\theta)$ and the a priori density of the parameter are a direct consequence of the problem statement. These densities can be used to find the joint density of the observations and the parameter, enabling us to use Bayes's Rule to find the a posteriori density if we knew the unconditional probability density of the observations.

$$p_{\theta|r}(\theta|r) = \frac{p_{r|\theta}(r|\theta)\, p_\theta(\theta)}{p_r(r)} \tag{4}$$
This density $p_r(r)$ is often difficult to determine. Be that as it may, it need not be known to find the a posteriori conditional expected value. The numerator entirely expresses the a posteriori density's dependence on $\theta$; the denominator only serves as the scaling factor that yields a unit-area quantity. The expected value is the center of mass of the probability density and does not depend directly on the "weight" of the density, so when the center of mass can be identified from the numerator alone, the scaling factor never has to be calculated. When it cannot, the MMSE estimate can be exceedingly difficult to compute.
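
The claim that the scaling factor is not needed can be illustrated numerically. The following is a minimal sketch, not a general-purpose method: it assumes a scalar parameter, a uniform grid wide enough to cover the a posteriori density, and NumPy. The function name `posterior_mean_on_grid` and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def posterior_mean_on_grid(log_numerator, theta_grid):
    """Approximate E[theta | r] from log[p(r|theta) p(theta)] on a uniform grid.

    Any additive constant in the log (any scale factor on the numerator,
    including 1/p(r)) cancels in the ratio of the two sums.
    """
    log_w = log_numerator(theta_grid)
    w = np.exp(log_w - np.max(log_w))  # subtract the max for numerical stability
    return float(np.sum(theta_grid * w) / np.sum(w))

# Hypothetical setup: Gaussian prior on theta, one observation r = theta + n.
m_theta, var_theta = 1.0, 4.0   # assumed prior mean and variance
r, var_n = 3.0, 1.0             # assumed observation and noise variance

def log_numerator(theta):
    log_likelihood = -0.5 * (r - theta) ** 2 / var_n
    log_prior = -0.5 * (theta - m_theta) ** 2 / var_theta
    return log_likelihood + log_prior  # normalizing constants deliberately omitted

theta_grid = np.linspace(-20.0, 20.0, 40001)
estimate = posterior_mean_on_grid(log_numerator, theta_grid)

# Rescaling the numerator by an arbitrary constant leaves the estimate unchanged.
rescaled = posterior_mean_on_grid(lambda t: log_numerator(t) + 7.0, theta_grid)
print(estimate, rescaled)
```

For these assumed numbers the closed-form a posteriori mean is 2.6, and the grid estimate should agree with it closely. The grid approach only works here because the parameter is scalar; it is meant to show the cancellation, not to replace an analytic solution.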

Example 1

Let $L$ statistically independent observations be obtained, each of which is expressed by $r_l = \theta + n_l$. Each $n_l$ is a Gaussian random variable having zero mean and variance $\sigma_n^2$. Thus, the unknown parameter in this problem is the mean of the observations. Assume it to be a Gaussian random variable a priori (mean $m_\theta$ and variance $\sigma_\theta^2$). The likelihood function is easily found to be

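$$p_{r|\theta}(r|\theta) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - \theta}{\sigma_n} \right)^2 \right\} \tag{5}$$
so that the a posteriori density is given by
$$p_{\theta|r}(\theta|r) = \frac{ \dfrac{1}{\sqrt{2\pi\sigma_\theta^2}} \exp\left\{ -\dfrac{1}{2} \left( \dfrac{\theta - m_\theta}{\sigma_\theta} \right)^2 \right\} \displaystyle\prod_{l=0}^{L-1} \dfrac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left\{ -\dfrac{1}{2} \left( \dfrac{r_l - \theta}{\sigma_n} \right)^2 \right\} }{ p_r(r) } \tag{6}$$
In an attempt to find the expected value of this distribution, lump all terms that do not depend explicitly on the quantity $\theta$ into a proportionality term.
$$p_{\theta|r}(\theta|r) \propto \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{l=0}^{L-1} (r_l - \theta)^2}{\sigma_n^2} + \frac{(\theta - m_\theta)^2}{\sigma_\theta^2} \right] \right\} \tag{7}$$
After some manipulation, this expression can be written as
$$p_{\theta|r}(\theta|r) \propto \exp\left\{ -\frac{1}{2\sigma^2} \left[ \theta - \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_{l=0}^{L-1} r_l}{\sigma_n^2} \right) \right]^2 \right\} \tag{8}$$
where $\sigma^2$ is a quantity that succinctly expresses the ratio $\dfrac{\sigma_n^2 \sigma_\theta^2}{\sigma_n^2 + L\sigma_\theta^2}$. The form of the a posteriori density suggests that it too is Gaussian; its mean, and therefore the MMSE estimate of $\theta$, is given by
$$\hat{\theta}_{\mathrm{MMSE}}(r) = \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_{l=0}^{L-1} r_l}{\sigma_n^2} \right) \tag{9}$$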
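
The manipulation leading to the Gaussian form is a completion of the square in $\theta$. As a sketch of the step, collect powers of $\theta$ in the bracketed exponent and absorb everything that does not depend on $\theta$ into the proportionality constant:

```latex
\begin{align*}
\frac{\sum_{l=0}^{L-1} (r_l-\theta)^2}{\sigma_n^2} + \frac{(\theta-m_\theta)^2}{\sigma_\theta^2}
  &= \theta^2 \left( \frac{L}{\sigma_n^2} + \frac{1}{\sigma_\theta^2} \right)
     - 2\theta \left( \frac{\sum_{l=0}^{L-1} r_l}{\sigma_n^2} + \frac{m_\theta}{\sigma_\theta^2} \right)
     + \text{(terms without } \theta\text{)} \\
  &= \frac{1}{\sigma^2} \left[ \theta - \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2}
     + \frac{\sum_{l=0}^{L-1} r_l}{\sigma_n^2} \right) \right]^2
     + \text{(terms without } \theta\text{)},
\end{align*}
\qquad \text{with} \qquad
\frac{1}{\sigma^2} = \frac{L}{\sigma_n^2} + \frac{1}{\sigma_\theta^2}
\;\Longleftrightarrow\;
\sigma^2 = \frac{\sigma_n^2 \sigma_\theta^2}{\sigma_n^2 + L\sigma_\theta^2}.
```

Reading off the center and the scale of the completed square gives the a posteriori mean and variance directly, without ever evaluating $p_r(r)$.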

More insight into the nature of this estimate is gained by rewriting it as

$$\hat{\theta}_{\mathrm{MMSE}}(r) = \frac{\sigma_n^2/L}{\sigma_\theta^2 + \sigma_n^2/L}\, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/L} \cdot \frac{1}{L} \sum_{l=0}^{L-1} r_l \tag{10}$$
The term $\sigma_n^2/L$ is the variance of the averaged observations for a given value of $\theta$; it expresses the squared error encountered in estimating the mean by simple averaging. If this error is much greater than the a priori variance of $\theta$ ($\sigma_n^2/L \gg \sigma_\theta^2$), implying that the observations are noisier than the variation of the parameter, the MMSE estimate ignores the observations and tends to yield the a priori mean $m_\theta$ as its value. If the averaged observations are less variable than the parameter, the second term dominates, and the average of the observations is the estimate's value. The estimate's behavior between these extremes is intuitive: the a priori mean and the observation average are each weighted by the other's variance, so the more reliable quantity receives the larger weight. The detailed form of the estimate indicates how the squared error can be minimized by a linear combination of these extreme estimates.
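
The two expressions for the estimate, and the two limiting cases just described, can be checked with a short script. This is an illustrative sketch assuming NumPy; the function names `mmse_estimate` and `mmse_estimate_weighted` and the sample values are hypothetical and not part of the original module.

```python
import numpy as np

def mmse_estimate(r, m_theta, var_theta, var_n):
    """A posteriori mean form: sigma^2 * (m_theta/var_theta + sum(r)/var_n)."""
    r = np.asarray(r, dtype=float)
    L = r.size
    var_post = var_n * var_theta / (var_n + L * var_theta)  # sigma^2
    return var_post * (m_theta / var_theta + r.sum() / var_n)

def mmse_estimate_weighted(r, m_theta, var_theta, var_n):
    """Weighted form: combination of the prior mean and the sample average."""
    r = np.asarray(r, dtype=float)
    var_avg = var_n / r.size                   # variance of the average given theta
    w_prior = var_avg / (var_theta + var_avg)  # weight on the a priori mean
    w_data = var_theta / (var_theta + var_avg) # weight on the sample average
    return w_prior * m_theta + w_data * r.mean()

r = np.array([2.1, 1.7, 2.4, 1.9])  # hypothetical observations
m_theta = 0.0                       # assumed a priori mean

a = mmse_estimate(r, m_theta, var_theta=1.0, var_n=1.0)
b = mmse_estimate_weighted(r, m_theta, var_theta=1.0, var_n=1.0)

# Very noisy observations: the estimate falls back to the a priori mean.
noisy = mmse_estimate_weighted(r, m_theta, var_theta=1.0, var_n=1e6)
# Nearly noiseless observations: the estimate follows the sample average.
clean = mmse_estimate_weighted(r, m_theta, var_theta=1.0, var_n=1e-6)

print(a, b, noisy, clean, r.mean())
```

The two weights always sum to one, so the estimate lies between the a priori mean and the sample average, moving toward whichever has the smaller variance.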

The conditional expected value of the estimate equals

$$E\big[\hat{\theta}_{\mathrm{MMSE}} \mid \theta\big] = \frac{\sigma_n^2/L}{\sigma_\theta^2 + \sigma_n^2/L}\, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/L}\, \theta \tag{11}$$
This estimate is biased because its expected value does not equal the value of the sought-after parameter. It is asymptotically unbiased because the squared measurement error $\sigma_n^2/L$ tends to zero as $L$ becomes large. The consistency of the estimator is determined by investigating the expected value of the squared error. Note that the variance of the a posteriori density is the quantity $\sigma^2$; as this quantity does not depend on $r$, it also equals the unconditional variance. As the number of observations increases, this variance tends to zero. In concert with the estimate being asymptotically unbiased, the expected value of the squared estimation error thus tends to zero, implying that we have a consistent estimate.
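
The bias and consistency statements can be examined by simulation. The sketch below assumes NumPy and uses hypothetical parameter values, trial counts, and function names; it compares the empirical mean-squared error with $\sigma^2 = \sigma_n^2\sigma_\theta^2/(\sigma_n^2 + L\sigma_\theta^2)$ and evaluates the conditional bias $E[\hat{\theta}_{\mathrm{MMSE}} \mid \theta] - \theta$ for one fixed $\theta$.

```python
import numpy as np

def posterior_variance(L, var_theta, var_n):
    """sigma^2 = var_n * var_theta / (var_n + L * var_theta)."""
    return var_n * var_theta / (var_n + L * var_theta)

def conditional_bias(theta, L, m_theta, var_theta, var_n):
    """E[estimate | theta] - theta, which reduces to w_prior * (m_theta - theta)."""
    var_avg = var_n / L
    return var_avg / (var_theta + var_avg) * (m_theta - theta)

def empirical_mse(L, m_theta, var_theta, var_n, trials=20000, seed=0):
    """Monte Carlo estimate of E[(estimate - theta)^2] over theta and the noise."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(m_theta, np.sqrt(var_theta), size=trials)
    noise = rng.normal(0.0, np.sqrt(var_n), size=(trials, L))
    r = theta[:, None] + noise                 # r_l = theta + n_l for each trial
    w_data = var_theta / (var_theta + var_n / L)
    estimate = (1.0 - w_data) * m_theta + w_data * r.mean(axis=1)
    return float(np.mean((estimate - theta) ** 2))

m_theta, var_theta, var_n = 1.0, 4.0, 9.0  # assumed values for illustration
results = {}
for L in (1, 10, 100):
    results[L] = (
        conditional_bias(3.0, L, m_theta, var_theta, var_n),
        posterior_variance(L, var_theta, var_n),
        empirical_mse(L, m_theta, var_theta, var_n),
    )
    print(L, results[L])
```

Under these assumptions the conditional bias shrinks in magnitude as $L$ grows, and the empirical mean-squared error tracks $\sigma^2$ toward zero, which is the behavior the consistency argument describes.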
