In terms of the densities involved in scalar random-parameter
problems, the mean-squared error is given by

$$E[\epsilon^2] = \iint \bigl(\theta - \hat\theta\bigr)^2\, p_{r,\theta}(r,\theta)\,dr\,d\theta \qquad (1)$$

where

$p_{r,\theta}(r,\theta)$
is the joint density of the observations and the parameter. To
minimize this integral with respect to

$\hat\theta$, we rewrite it using the laws of conditional
probability as

$$E[\epsilon^2] = \int p_r(r)\left[\int \bigl(\theta - \hat\theta(r)\bigr)^2\, p_{\theta|r}(\theta|r)\,d\theta\right]dr \qquad (2)$$

The density

$p_r(\cdot)$
is nonnegative. To minimize the mean-squared error, we must
minimize the inner integral for each value of

$r$ because the integral is weighted
by a positive quantity. We focus attention on the inner
integral, which is the conditional expected value of the squared
estimation error. The condition, a fixed value of

$r$, implies that we seek the
constant

$\hat\theta(r)$
derived from

$r$ that minimizes the second moment
of the random parameter

$\theta$. A
well-known result from probability theory states that the
minimum of

$E[(x-c)^2]$ occurs when the constant $c$
equals the expected value of the random variable $x$
(see

Expected Values of Probability
Functions). The inner integral, and thereby the
mean-squared error, is minimized by choosing the estimator to be
the conditional expected value of the parameter given the
observations.

$$\hat\theta_{\mathrm{MMSE}}(r) = E[\theta | r] \qquad (3)$$

Thus, a parameter's minimum mean-squared error (MMSE) estimate
is the parameter's

a posteriori (after the
observations have been obtained) expected value.
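This minimizing property of the mean can be checked numerically. The following sketch (the distribution and sample size are arbitrary choices, not part of the original development) evaluates the mean-squared error over a grid of candidate constants and confirms that the minimizer sits at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # samples of a random variable x

# Empirical mean-squared error E[(x - c)^2] over a grid of candidate constants c
candidates = np.linspace(0.0, 6.0, 601)
mse = [np.mean((x - c) ** 2) for c in candidates]

best_c = candidates[np.argmin(mse)]
print(best_c, x.mean())  # the minimizing constant matches the sample mean
```

Because the mean-squared error is a quadratic in the constant, the minimum is exact at the mean; the grid search only limits resolution to the grid spacing.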

The associated conditional probability density
$p_{\theta|r}(\theta|r)$
is not often directly stated in a problem definition and must
somehow be derived. In many applications, the likelihood
function
$p_{r|\theta}(r|\theta)$
and the a priori density of the parameter are
a direct consequence of the problem statement. These densities
can be used to find the joint density of the observations and
the parameter, enabling us to use Bayes's Rule to find the
a posteriori density *if*
we knew the unconditional probability density of the
observations.

$$p_{\theta|r}(\theta|r) = \frac{p_{r|\theta}(r|\theta)\,p_\theta(\theta)}{p_r(r)} \qquad (4)$$

This density

$p_r(r)$
is often difficult to determine. However, to find the

a posteriori conditional expected value, it
need not be known. The numerator entirely expresses the

a
posteriori density's dependence on

$\theta$; the denominator only serves
as the scaling factor to yield a unit-area quantity. The expected
value is the center-of-mass of the probability density and does

*not* depend directly on the "weight" of the
density, bypassing calculation of the scaling factor. Without this
property, the MMSE estimate could be exceedingly difficult to compute.
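The irrelevance of the scaling factor can be demonstrated directly. This sketch (the Gaussian densities and numerical values are illustrative assumptions) computes the posterior mean from the unnormalized numerator of Bayes's Rule on a grid, never forming $p_r(r)$:

```python
import numpy as np

rng = np.random.default_rng(1)
true_theta = 1.5
r = true_theta + rng.normal(scale=1.0, size=5)   # noisy observations of theta

theta = np.linspace(-5.0, 8.0, 4001)             # grid over the parameter
prior = np.exp(-0.5 * theta**2 / 4.0)            # unnormalized N(0, 4) prior
like = np.exp(-0.5 * ((r[:, None] - theta) ** 2).sum(axis=0))  # unnormalized likelihood

unnorm = prior * like                            # numerator of Bayes's Rule only
post_mean = (theta * unnorm).sum() / unnorm.sum()  # any overall scale cancels here
print(post_mean)
```

Multiplying `unnorm` by any positive constant leaves `post_mean` unchanged, which is precisely why $p_r(r)$ need not be computed.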

Let $L$ statistically independent
observations be obtained, each of which is expressed by
$r_l = \theta + n_l$.
Each
$n_l$
is a Gaussian random variable having zero mean and variance
$\sigma_n^2$
. Thus, the unknown parameter in this problem is the
mean of the observations. Assume it to be a Gaussian random
variable a priori (mean
$m_\theta$
and variance
$\sigma_\theta^2$).
The likelihood function is easily found to be

$$p_{r|\theta}(r|\theta) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-\frac{1}{2}\frac{(r_l - \theta)^2}{\sigma_n^2}} \qquad (5)$$

so that the

a posteriori density is given by

$$p_{\theta|r}(\theta|r) = \frac{\frac{1}{\sqrt{2\pi\sigma_\theta^2}}\, e^{-\frac{1}{2}\frac{(\theta - m_\theta)^2}{\sigma_\theta^2}} \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-\frac{1}{2}\frac{(r_l - \theta)^2}{\sigma_n^2}}}{p_r(r)} \qquad (6)$$

In an attempt to find the expected value of this distribution,
lump all terms that do not depend

*explicitly* on the quantity

$\theta$
into a proportionality term.

$$p_{\theta|r}(\theta|r) \propto e^{-\frac{1}{2}\left(\frac{\sum_l (r_l - \theta)^2}{\sigma_n^2} + \frac{(\theta - m_\theta)^2}{\sigma_\theta^2}\right)} \qquad (7)$$
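The manipulation about to be invoked is a completing-the-square step in the exponent. Collecting the terms that are quadratic and linear in $\theta$ gives

$$\frac{\sum_l (r_l - \theta)^2}{\sigma_n^2} + \frac{(\theta - m_\theta)^2}{\sigma_\theta^2} = \frac{1}{\sigma^2}\left(\theta - \sigma^2\left(\frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_l r_l}{\sigma_n^2}\right)\right)^2 + \text{(terms independent of } \theta)$$

with $\sigma^2 = \sigma_n^2\sigma_\theta^2/(\sigma_n^2 + L\sigma_\theta^2)$; the $\theta$-independent terms are absorbed into the proportionality constant.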

After some manipulation, this expression can be written as

$$p_{\theta|r}(\theta|r) \propto e^{-\frac{1}{2\sigma^2}\left(\theta - \sigma^2\left(\frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_l r_l}{\sigma_n^2}\right)\right)^2} \qquad (8)$$

where

$\sigma^2$
is a quantity that succinctly expresses the ratio
$\sigma_n^2\sigma_\theta^2/(\sigma_n^2 + L\sigma_\theta^2)$. The form of the

a posteriori
density suggests that it too is Gaussian; its mean, and
therefore the MMSE estimate of

$\theta$, is given by

$$\hat\theta_{\mathrm{MMSE}}(r) = \sigma^2\left(\frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_l r_l}{\sigma_n^2}\right) \qquad (9)$$

More insight into the nature of this estimate is gained by
rewriting it as

$$\hat\theta_{\mathrm{MMSE}}(r) = \frac{\sigma_n^2/L}{\sigma_\theta^2 + \sigma_n^2/L}\, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/L}\cdot\frac{1}{L}\sum_{l=0}^{L-1} r_l \qquad (10)$$

The term

$\sigma_n^2/L$
is the variance of the averaged observations for a given value
of

$\theta$; it expresses the
squared error encountered in estimating the mean by simple
averaging. If this error is much greater than the

a
priori variance of

$\theta$
($\sigma_n^2/L \gg \sigma_\theta^2$), implying that the observations are noisier than
the variation of the parameter, the MMSE estimate ignores the
observations and tends to yield the

a
priori mean

m
θ
m
θ
as its value. If the averaged observations are less variable
than the parameter, the second term dominates, and the average
of the observations is the estimate's value. The estimate's
behavior between these extremes is intuitively reasonable. The
detailed form of the estimate indicates how the squared error
can be minimized by a linear combination of these extreme
estimates.
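The weighting in this estimate can also be checked by simulation. The following sketch (the parameter values are arbitrary assumptions) draws the parameter from its prior, forms the observations, and compares the empirical squared error of the weighted combination in equation (10) against the simple sample average:

```python
import numpy as np

rng = np.random.default_rng(2)
m_theta, var_theta = 0.0, 1.0     # prior mean and variance of theta
var_n, L = 4.0, 5                 # noise variance and number of observations
trials = 100_000

theta = rng.normal(m_theta, np.sqrt(var_theta), size=trials)
r = theta[:, None] + rng.normal(0.0, np.sqrt(var_n), size=(trials, L))
sample_mean = r.mean(axis=1)

# Equation (10): weighted combination of the prior mean and the sample average
w_prior = (var_n / L) / (var_theta + var_n / L)
mmse = w_prior * m_theta + (1.0 - w_prior) * sample_mean

mse_mmse = np.mean((mmse - theta) ** 2)
mse_avg = np.mean((sample_mean - theta) ** 2)
print(mse_mmse, mse_avg)  # the MMSE estimate has the smaller average squared error
```

With these values the MMSE estimate's average squared error approaches $\sigma^2 = \sigma_n^2\sigma_\theta^2/(\sigma_n^2 + L\sigma_\theta^2) = 4/9$, while simple averaging yields $\sigma_n^2/L = 0.8$.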

The conditional expected value of the estimate equals

$$E[\hat\theta_{\mathrm{MMSE}} | \theta] = \frac{\sigma_n^2/L}{\sigma_\theta^2 + \sigma_n^2/L}\, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/L}\, \theta \qquad (11)$$

This estimate is biased because its expected value does not
equal the value of the sought-after parameter. It is
asymptotically unbiased because the squared measurement error

$\sigma_n^2/L$
tends to zero as

$L$ becomes
large. The consistency of the estimator is determined by
investigating the expected value of the squared error. Note
that the variance of the

a posteriori
density is the quantity

$\sigma^2$
; as this quantity does not depend on

$r$, it also equals the
unconditional variance. As the number of observations
increases, this variance tends to zero. In concert with the
estimate being asymptotically unbiased, the expected value of
the squared estimation error thus tends to zero, implying that
the estimate is consistent.
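This consistency argument can be seen numerically: the posterior variance $\sigma^2 = \sigma_n^2\sigma_\theta^2/(\sigma_n^2 + L\sigma_\theta^2)$ shrinks roughly like $\sigma_n^2/L$ as the number of observations grows. A small sketch (the variance values are illustrative):

```python
# Posterior variance of the Gaussian example as the number of observations grows.
def posterior_variance(L, var_n=1.0, var_theta=1.0):
    return var_n * var_theta / (var_n + L * var_theta)

for L in (1, 10, 100, 1000):
    print(L, posterior_variance(L))  # tends to zero roughly like var_n / L
```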