
Statistical Inference for Engineers and Data Scientists


Solutions Manual
PIERRE MOULIN and VENUGOPAL V. VEERAVALLI
1 Introduction

1.1 Binary Decision Making. Consider the following decision problem with
X = A = {0, 1}. Suppose the cost function is given by

C(a, x) = 0,   if a = x
          1,   if x = 0, a = 1
          10,  if x = 1, a = 0

The observation Y takes values in the set Y = {1, 2, 3} and the conditional
p.m.f.’s of Y are:

p0 (1) = 0.4, p0 (2) = 0.6, p0 (3) = 0, p1 (1) = p1 (2) = 0.25, p1 (3) = 0.5

a) Is there a best decision rule based on conditional risks?


Solution. We may compute the conditional risks for the eight possible deci-
sion rules as follows.

δ     δ(1)  δ(2)  δ(3)   R0    R1    0.5(R0 + R1)   max(R0, R1)

δ1     0     0     0     0     10        5              10
δ2     0     0     1     0      5        2.5             5
δ3     0     1     0     0.6    7.5      4.05            7.5
δ4     0     1     1     0.6    2.5      1.55            2.5
δ5     1     0     0     0.4    7.5      3.95            7.5
δ6     1     0     1     0.4    2.5      1.45            2.5
δ7     1     1     0     1      5        3               5
δ8     1     1     1     1      0        0.5             1

It is clear from the table that no rule is better than both δ2 and δ8 (δ2 has the smaller R0, δ8 the smaller R1), and hence there is no best rule based on conditional risks alone.
b) Find Bayes (for equal priors) and minimax rules within the set of deterministic
decision rules.
Solution. It follows from the table that δ8 (always choose 1) is the best Bayes
as well as minimax rule in D.
c) Now consider the set of randomized decision rules. Find a Bayes rule (for equal
priors). Also find a randomized rule whose maximum risk is smaller than that
of the minimax rule of part (b).

Solution. Randomization does not improve the minimum Bayes risk, hence δ8 is still the best Bayes rule in D̃.
However it is easy to check that the rule

δ̃m = δ2  with prob. 1/6
      δ8  with prob. 5/6

has conditional risks R0 = R1 = 5/6, hence a maximum risk of 5/6, which is better than the maximum risk of 1 achieved by the nonrandomized minimax rule δ8 from part (b). Since R0(δi) is smaller than R0(δ8) for any i ∈ {3, 4, 5, 6}, one can also randomize δ8 with δi, i ∈ {3, 4, 5, 6}, to achieve a maximum risk smaller than 1.
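As a numerical check, the following Python sketch (the variable names and layout are ours, not part of the text) enumerates the eight deterministic rules, recomputes the conditional risks in the table, and verifies the randomized rule above:

    # Recompute the table for Problem 1.1 and check the randomized rule.
    from itertools import product

    p0 = {1: 0.4, 2: 0.6, 3: 0.0}
    p1 = {1: 0.25, 2: 0.25, 3: 0.5}

    def cost(a, x):
        # C(a, x): 0 on a correct decision, 1 for a = 1 when x = 0,
        # 10 for a = 0 when x = 1.
        if a == x:
            return 0
        return 1 if x == 0 else 10

    for d in product([0, 1], repeat=3):      # d[y-1] is the decision on observing y
        R0 = sum(p0[y] * cost(d[y - 1], 0) for y in (1, 2, 3))
        R1 = sum(p1[y] * cost(d[y - 1], 1) for y in (1, 2, 3))
        print(d, R0, R1, 0.5 * (R0 + R1), max(R0, R1))

    # Randomizing delta_2 (w.p. 1/6) with delta_8 (w.p. 5/6):
    R0_mix = (1/6) * 0 + (5/6) * 1           # = 5/6
    R1_mix = (1/6) * 5 + (5/6) * 0           # = 5/6
    print(max(R0_mix, R1_mix))               # 5/6 < 1, beating part (b)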

1.2 Health Insurance. Find the minimax decision rule for the health insurance
problem given in Example 1.3.
Solution. Since A = {buy, not} and Y = {0, 1}, there are four deterministic
decision rules.

δ     δ(0)  δ(1)   R0(δ)   R1(δ)   0.2R0(δ) + 0.8R1(δ)   max{R0(δ), R1(δ)}

δ1    buy   buy     1       1              1                    1
δ2    buy   not     5.5     0             1.1                   5.5
δ3    not   buy     5.5     1             1.9                   5.5
δ4    not   not     10      0              2                    10

where

R0(δ) = E0[C(δ(Y), 0)] = Σ_{y=0,1} p0(y) C(δ(y), 0) = (1/2)[C(δ(0), 0) + C(δ(1), 0)],
R1(δ) = E1[C(δ(Y), 1)] = Σ_{y=0,1} p1(y) C(δ(y), 1) = C(δ(1), 1).

Listing all four decision rules in the table above, we observe that δ1(y) = buy for all y is the minimax rule among deterministic decision rules. Since R0(δ1) < R0(δi) for i = 2, 3, 4, there is no randomized rule δ̃ with R0(δ̃) < 1, and hence none with maximum risk below max{R0(δ1), R1(δ1)} = 1. As a result, δ1 is also the minimax rule among all randomized rules.
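The table is easy to reproduce in Python. The cost values and observation pmfs below are the ones implied by the table (Example 1.3 itself is not restated in this manual), so treat them as assumptions:

    # Assumed from the table: C(buy, x) = 1 for both x, C(not, 0) = 10,
    # C(not, 1) = 0, p0 uniform on {0, 1}, and p1(1) = 1.
    p = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.0, 1: 1.0}}   # p[x][y]
    C = {('buy', 0): 1, ('buy', 1): 1, ('not', 0): 10, ('not', 1): 0}

    for d0 in ('buy', 'not'):
        for d1 in ('buy', 'not'):
            rule = {0: d0, 1: d1}
            R0 = sum(p[0][y] * C[(rule[y], 0)] for y in (0, 1))
            R1 = sum(p[1][y] * C[(rule[y], 1)] for y in (0, 1))
            print(d0, d1, R0, R1, 0.2 * R0 + 0.8 * R1, max(R0, R1))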

1.3 Binary Communication with Erasures. Let X = {0, 1}, and A = {0, 1, e}.
This example models communication of a bit x ∈ X with an erasure option at
the receiver. Now suppose

pj(y) = (1/√(2πσ²)) exp( −(y − (−1)^(j+1))² / (2σ²) ),   j ∈ X, y ∈ R.

That is, Y has distribution N(−1, σ²) when the state is 0, and Y has distribution N(1, σ²) when the state is 1. Assume a cost structure



C(a, j) = 0,  if a = 0, j = 0 or a = 1, j = 1
          1,  if a = 1, j = 0 or a = 0, j = 1
          c,  if a = e
Furthermore, assume that the two states are equally likely.
a) First assume that c < 0.5. Show that the Bayes rule for this problem has the
form:

δB(y) = 0,  y ≤ −t
        e,  −t < y < t
        1,  y ≥ t.

Also give an expression for t in terms of the parameters of the problem.


Solution. The Bayes rule chooses the decision with smallest a posteriori cost
given Y = y. Now,
C(d|y) = Σ_{j=0,1} C(d, j) π(j|y).

Thus C(0|y) = π(1|y), C(1|y) = π(0|y), and C(e|y) = c[π(0|y) + π(1|y)] = c,


with

π(0|y) = p0(y)π0 / p(y) = p0(y)π0 / (p0(y)π0 + p1(y)π1) = 1 / (1 + exp(2y/σ²))
and

π(1|y) = 1 − π(0|y) = 1 / (1 + exp(−2y/σ²)).
We erase (i.e., choose d = e) if c < min{π(0|y), π(1|y)}. Since min{π(0|y), π(1|y)} is always less than 0.5, we never erase if c ≥ 0.5.
If c < 0.5, we consider two cases:
(i) if y ≥ 0, then min{π(0|y), π(1|y)} = π(0|y), and hence we erase if

c < 1 / (1 + exp(2y/σ²)),   i.e., if   y < (σ²/2) ln(1/c − 1) =: t;
(ii) if y < 0, then min{π(0|y), π(1|y)} = π(1|y), and hence we erase if

c < 1 / (1 + exp(−2y/σ²)),   i.e., if   y > −(σ²/2) ln(1/c − 1) = −t.
Thus
Ye = (−t, t).
We choose 0 if π(1|y) ≤ min{c, π(0|y)} (recall that C(0|y) = π(1|y)). From the above it is clear that

Y0 = (−∞, −t].

Finally, we choose 1 if π(0|y) ≤ min{c, π(1|y)}. Thus

Y1 = [t, ∞).

To summarize, if c < 0.5,

δB(y) = 0,  y ≤ −t
        e,  −t < y < t       where t = (σ²/2) ln(1/c − 1).
        1,  y ≥ t

b) Now find δB (y) when c ≥ 0.5.


Solution. Recall that we erase if c < min{π(0|y), π(1|y)}. Since min{π(0|y), π(1|y)} is always less than 0.5, we never erase if c ≥ 0.5. Hence, for c ≥ 0.5,

δB(y) = 0,  y ≤ 0
        1,  y > 0.
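A quick numerical sanity check of parts (a) and (b), with σ and c chosen by us for illustration (a sketch, not part of the original solution):

    import math

    sigma, c = 1.0, 0.2                        # example values with c < 0.5
    t = 0.5 * sigma**2 * math.log(1.0 / c - 1.0)

    def post0(y):                              # pi(0 | y) for equal priors
        return 1.0 / (1.0 + math.exp(2.0 * y / sigma**2))

    def bayes(y):
        # Posterior costs: choosing 0 costs pi(1|y), choosing 1 costs pi(0|y),
        # erasing costs c.
        costs = {0: 1.0 - post0(y), 1: post0(y), 'e': c}
        return min(costs, key=costs.get)

    # The decision should switch 0 -> e at y = -t and e -> 1 at y = +t:
    for y in (-t - 1e-6, -t + 1e-6, t - 1e-6, t + 1e-6):
        print(round(y, 4), bayes(y))
    # With c >= 0.5 the erase region is empty and bayes() reduces to a
    # sign test on y, matching part (b).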

1.4 Medical Diagnosis. Let X be the health state of a patient, who is healthy
(x = 0) with probability 0.8 and sick (x = 1) with probability 0.2. A medical test
produces a measurement Y following a Poisson law with parameter x + 1. The
physician may take one of three actions: declare the patient healthy (a = 0), sick
(a = 1), or order a new medical test (a = t). The cost structure is as follows:


C(a, x) = 0,    if a = x
          1,    if a = 1, x = 0
          e,    if a = 0, x = 1
          e/4,  if a = t.
Derive the Bayes decision rule for the physician.

Solution: The state variable is the patient's health, which satisfies

x = 0  w.p. 0.8  (healthy)
    1  w.p. 0.2  (sick).

The observation Y ∼ Poi(x + 1), that is,

Px(y) = (x + 1)^y e^(−(x+1)) / y!,   y = 0, 1, 2, . . .
We commence by deriving the posterior distribution

π(x|y) = π(x)Px(y) / P(y) = π(x)(x + 1)^y e^(−(x+1)) / (y! P(y)).
Now the posterior cost of action a given observation y is

C(a|y) = E[C(a, X) | Y = y] = Σ_{x=0,1} π(x|y) C(a, x).

Let fa(y) be defined as

fa(y) := y! P(y) C(a|y) = Σ_{x=0,1} π(x)(x + 1)^y e^(−(x+1)) C(a, x).

Now evaluate this expression for a ∈ {0, 1, t}:

f0(y) = 0.2 · 2^y · e^(−2) · e = (0.2/e) 2^y
f1(y) = 0.8/e
ft(y) = (e/4)(0.8 e^(−1) + 0.2 · 2^y · e^(−2)) = 0.2 + (0.05/e) 2^y.
For each y ∈ {0, 1, 2, . . .} the Bayes rule selects the action achieving the minimum of these three expressions. Comparing

f0(y) = (0.2/e, 0.4/e, 0.8/e, 1.6/e, . . .)
f1(y) = (0.8/e, 0.8/e, 0.8/e, 0.8/e, . . .)
ft(y) = (0.2 + 0.05/e, 0.2 + 0.1/e, 0.2 + 0.2/e, 0.2 + 0.4/e, . . .),
we obtain

δB (y) = {0, 0, t, 1, 1, 1, . . .}.
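The comparison is easy to reproduce in Python (a sketch; the function names are ours):

    import math

    e = math.e

    def f0(y): return 0.2 * 2**y * math.exp(-2) * e      # = (0.2/e) * 2**y
    def f1(y): return 0.8 / e
    def ft(y): return (e / 4) * (0.8 / e + 0.2 * 2**y * math.exp(-2))

    for y in range(6):
        vals = {'0': f0(y), '1': f1(y), 't': ft(y)}
        print(y, min(vals, key=vals.get))                # 0 0 t 1 1 1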

1.5 Optimal Routing. A driver needs to decide between taking the highway (h)
or side streets (s), based on prior information and input from a map application on
her phone about congestion on the highway. If the highway is congested, it would
be worse than taking the side streets, but if the highway has normal traffic it
would be better than taking the side streets. Let X denote the state of the highway,
with value 0 denoting normal traffic, and value 1 denoting congestion. The prior
information that the driver has tells her that at the time she is planning to drive

π(0) = P{X = 0} = 0.3, π(1) = P{X = 1} = 0.7.

The map application displays a recommendation (using colors such as blue and
red), which is modeled as a binary observation Y ∈ {0, 1}, with

p0 (0) = 0.8, p0 (1) = 0.2, p1 (0) = 0.4, p1 (1) = 0.6.

The cost function that penalizes the driver’s actions is given by:

C(a, x) = 10,  if a = h, x = 1
          1,   if a = h, x = 0
          5,   if a = s, x = 0
          2,   if a = s, x = 1.

Derive the Bayes decision rule for the driver.



Solution: The Bayes decision rule minimizes the a posteriori cost:

δB(y) = arg min_{a∈{h,s}} C(a|y) = arg min_{a∈{h,s}} Σ_{x=0,1} π(x) px(y) C(a, x) = arg min_{a∈{h,s}} C̃(a, y),

where

C̃(a, y) = Σ_{x=0,1} π(x) px(y) C(a, x).

Note that
C̃(h, 0) = π(0)p0 (0)C(h, 0) + π(1)p1 (0)C(h, 1)
= 0.3 × 0.8 × 1 + 0.7 × 0.4 × 10 = 3.04
C̃(s, 0) = π(0)p0 (0)C(s, 0) + π(1)p1 (0)C(s, 1)
= 0.3 × 0.8 × 5 + 0.7 × 0.4 × 2 = 1.76
C̃(h, 1) = π(0)p0 (1)C(h, 0) + π(1)p1 (1)C(h, 1)
= 0.3 × 0.2 × 1 + 0.7 × 0.6 × 10 = 4.26
C̃(s, 1) = π(0)p0 (1)C(s, 0) + π(1)p1 (1)C(s, 1)
= 0.3 × 0.2 × 5 + 0.7 × 0.6 × 2 = 1.14
and therefore
δB (0) = δB (1) = s.
That is, the driver should take the side streets in both cases!
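The computation above is short enough to transcribe directly into Python (a sketch; the dictionary names are ours):

    pi = {0: 0.3, 1: 0.7}                              # prior on the state
    p = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}     # p[x][y]
    C = {('h', 0): 1, ('h', 1): 10, ('s', 0): 5, ('s', 1): 2}

    for y in (0, 1):
        score = {a: sum(pi[x] * p[x][y] * C[(a, x)] for x in (0, 1))
                 for a in ('h', 's')}
        print(y, score, 'choose', min(score, key=score.get))   # 's' both times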
1.6 Job Application. Don, Marco and Ted are CEO candidates for their com-
pany. Each candidate incurs a cost if another candidate is selected for the job.
If Don is selected, the cost to Marco and Ted is 10. If Marco or Ted is selected,
the cost to the other two candidates is 2. The cost to a candidate is 0 if he is
selected.
a) The selection will be made by a committee; it is believed that the committee
will select Don with probability 0.8 and Marco with probability 0.1. Compute
the expected cost for each candidate.
Solution:
Expected cost for Don: (0.8 × 0) + (0.1 × 2) + (0.1 × 2) = 0.4.
Expected cost for Marco: (0.8 × 10) + (0.1 × 0) + (0.1 × 2) = 8.2.
Expected cost for Ted: (0.8 × 10) + (0.1 × 2) + (0.1 × 0) = 8.2.
b) While Marco and Ted are underdogs, if either one of them drops out of con-
tention, Don’s probability of being selected will drop to 0.6. Marco and Ted
discuss that possibility and have three possible actions (Marco drops out, Ted
drops out, or both stay in the race). What decision minimizes their expected
cost?
Introduction 7

Solution:
• If Marco drops out then
Expected cost for Marco: (0.6 × 10) + (0.4 × 2) = 6.8
Expected cost for Ted: (0.6 × 10) + (0.4 × 0) = 6.
• If Ted drops out then
Expected cost for Marco = 6
Expected cost for Ted = 6.8.
• If Marco and Ted randomize their decision, with the one who drops out determined by a fair coin toss, then
Expected cost for each = (6.8 + 6)/2 = 6.4.
• So randomization is the optimal strategy here: it brings each candidate's expected cost down from 8.2 to 6.4, and neither has to accept the worse side (6.8) of a deterministic agreement.
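A few lines of Python confirm the arithmetic (a sketch; the dictionaries are our encoding of the cost description):

    # cost[c][w]: cost to candidate c when w is selected
    cost = {'Don':   {'Don': 0,  'Marco': 2, 'Ted': 2},
            'Marco': {'Don': 10, 'Marco': 0, 'Ted': 2},
            'Ted':   {'Don': 10, 'Marco': 2, 'Ted': 0}}

    # Part (a): committee selects Don/Marco/Ted w.p. 0.8/0.1/0.1
    sel = {'Don': 0.8, 'Marco': 0.1, 'Ted': 0.1}
    print({c: sum(sel[w] * cost[c][w] for w in sel) for c in cost})

    # Part (b): if one underdog drops out, Don wins w.p. 0.6, the survivor w.p. 0.4
    dropper = 0.6 * 10 + 0.4 * 2      # 6.8, cost to whoever drops out
    stayer  = 0.6 * 10 + 0.4 * 0      # 6.0, cost to the one who stays
    print((dropper + stayer) / 2)     # 6.4 expected cost each under a fair coin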

1.7 Wise Investor. Ann has received $1000 and must decide whether to invest
it in the stock market or to safely put it in the bank where it collects no interest.
Assume the economy is either in recession or in expansion, and will remain in the
same state for the next year. During that period, the stock market will go up 10%
if the economy expands, and down 5% otherwise. The economy has a 40% prior
probability to be in expansion. An investment guru claims to know the state of
the economy and reveals it to Ann. Whatever the state is, there is a probability
λ ∈ [0, 1] that the guru is incorrect. Ann will act on the binary information
provided by the guru but is well aware of the underlying probabilistic model.
Give the Bayesian and minimax decision rules and Ann’s expected financial
gains in terms of λ. Perform a sanity check to verify your answer is correct in
the extreme cases λ = 0 and λ = 1.
Solution: The state in this problem is

x = 0 : economic recession,  w.p. π(0) = 0.6
    1 : economic expansion,  w.p. π(1) = 0.4.
The observation y ∈ {0, 1} is the guru’s claim about the state of the economy.
The conditional pmfs are given by P0 (1) = P1 (0) = λ. The action space is

a = 0 : bank
    1 : stock
and the cost matrix is

C(a, x) = 50,    x = 0, a = 1
          −100,  x = 1, a = 1
          0,     a = 0.
There are four deterministic rules. We will determine the Bayes and maximum risks associated with each of them.
1 δ1 (y) = y (Trust guru). The conditional risks are
R0 (δ1 ) = (1 − λ)C(0, 0) + λC(1, 0) = 50λ
R1 (δ1 ) = λC(0, 1) + (1 − λ)C(1, 1) = −100(1 − λ).

Therefore the Bayes risk is


r(δ1 ) = π(0)R0 (δ1 ) + π(1)R1 (δ1 )
= 0.6 × 50λ − 0.4 × 100(1 − λ) = 70λ − 40.
On the other hand the maximum risk for δ1 is
Rmax (δ1 ) = max(R0 (δ1 ), R1 (δ1 )) = 50 max(λ, 2(λ − 1)) = 50λ.
2 δ2 (y) = 1 (Invest). The conditional risks are
R0 (δ2 ) = C(1, 0) = 50
R1 (δ2 ) = C(1, 1) = −100
Therefore the Bayes risk is
r(δ2 ) = π(0)50 − π(1)100 = −10.
On the other hand the maximum risk for δ2 is
Rmax (δ2 ) = max(R0 (δ2 ), R1 (δ2 )) = 50.
3 δ3 (y) = 0 (Don’t invest). The conditional risks are
R0 (δ3 ) = 0
R1 (δ3 ) = 0.
Therefore the Bayes risk is
r(δ3 ) = 0
which is the same as the maximum risk
Rmax (δ3 ) = 0.
4 δ4 (y) = y + 1 (mod 2) (Do opposite of guru). The conditional risks are
R0 (δ4 ) = (1 − λ)C(1, 0) + λC(0, 0) = 50(1 − λ)
R1 (δ4 ) = (1 − λ)C(0, 1) + λC(1, 1) = −100λ
Therefore the Bayes risk is
r(δ4) = π(0)R0(δ4) + π(1)R1(δ4) = 30 − 70λ.
On the other hand the maximum risk for δ4 is
Rmax (δ4 ) = max(R0 (δ4 ), R1 (δ4 )) = 50 max(1 − λ, −2λ) = 50(1 − λ).
Thus the Bayes rule, as a function of λ, is

δB = δ1,  λ ≤ 3/7
     δ2,  3/7 ≤ λ ≤ 4/7
     δ4,  λ ≥ 4/7.

Figure 1.1 Ann’s expected financial gain as a function of λ.

The expected gain is given in Fig. 1.1.


For all λ ∈ [0, 1] the minimax rule is δm = δ3 (don't invest). Additionally, δ1 and δ4 are also minimax rules for λ = 0 and λ = 1, respectively. As a sanity check: when λ = 0 the guru is always right, and the Bayes rule δ1 (trust the guru) invests exactly when the economy expands, for an expected gain of 0.4 × 100 = 40 = −r(δ1); when λ = 1 the guru is always wrong, and δ4 (do the opposite) achieves the same expected gain.
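A grid search in Python reproduces the Bayes regions (a sketch using numpy; the breakpoints 3/7 ≈ 0.429 and 4/7 ≈ 0.571 should emerge):

    import numpy as np

    lam = np.linspace(0, 1, 701)
    r = {1: 70 * lam - 40,               # trust the guru
         2: -10 * np.ones_like(lam),     # always invest
         3: np.zeros_like(lam),          # never invest
         4: 30 - 70 * lam}               # oppose the guru

    best = [min(r, key=lambda i: r[i][k]) for k in range(len(lam))]
    for thr in (3/7, 4/7):
        k = int(np.searchsorted(lam, thr))
        print(round(float(lam[k]), 3), best[k - 1], '->', best[k + 1])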
Problem 4 After evaluating the normalizing constants, the pdfs are:

p0(y) = (3/2) y² 1{|y| ≤ 1}
p1(y) = ((3 − |y|)/9) 1{|y| ≤ 3}
Hence, the likelihood ratio is:

L(y) = 2(3 − |y|)/(27y²),  |y| ≤ 1
       ∞,                  |y| > 1
With uniform costs and equal priors, the Bayes rule compares the likelihood ratio to 1: decide H1 if L(y) > 1 and H0 if L(y) < 1, or equivalently, decide H1 if p1(y) > p0(y) and H0 if p1(y) < p0(y).
After drawing a sketch of the pdfs, we get

δB(y) = 1,  |y| < t
        0,  t < |y| ≤ 1
        1,  |y| > 1



t is the solution to p0(t) = p1(t), i.e., of 27t² + 2t − 6 = 0 on (0, 1), which gives t = (−1 + √163)/27 ≈ 0.436.
Here are the error probabilities:

P(error|H0) = P0[|y| ≤ t] + P0[|y| > 1] = 2 ∫_0^t (3/2) y² dy + 0 = t³ ≈ 0.083
and

P(error|H1) = P1[t < |y| ≤ 1] = 2 ∫_t^1 ((3 − y)/9) dy ≈ 0.286.
Therefore, the average error probability is

P(error) = (1/2) P(error|H0) + (1/2) P(error|H1) ≈ 0.184.
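Numerically (a sketch; scipy is assumed available for the quadrature, though both integrals are elementary):

    import math
    from scipy.integrate import quad

    t = (-1 + math.sqrt(163)) / 27
    print(round(t, 4))                                # ~0.436

    p0 = lambda y: 1.5 * y**2                         # on |y| <= 1
    p1 = lambda y: (3 - abs(y)) / 9                   # on |y| <= 3

    err0 = 2 * quad(p0, 0, t)[0]                      # decide 1 when |y| < t
    err1 = 2 * quad(p1, t, 1)[0]                      # decide 0 when t < |y| <= 1
    print(round(err0, 3), round(err1, 3), round(0.5 * (err0 + err1), 3))
    # prints approximately 0.083 0.286 0.184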
1.8 Randomized Bayes Rules. Prove Theorem 1.2.
Solution. We need to show that

min_{δ∈D} r(δ, π) = min_{δ̃∈D̃} r(δ̃, π).

Since D ⊂ D̃, it is clear that the right side is less than or equal to the left side.
To prove the reverse inequality, suppose δ̃ chooses δℓ with probability γℓ, ℓ = 1, . . . , L. Then

r(δ̃, π) = Σ_{ℓ=1}^{L} γℓ r(δℓ, π) ≥ Σ_{ℓ=1}^{L} γℓ min_{δ∈D} r(δ, π) = min_{δ∈D} r(δ, π).

Taking the minimum over δ̃ ∈ D̃ on the left side yields the desired inequality.
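The inequality is also easy to see numerically; for instance, using the Bayes risks already computed in Problem 1.1 (a sketch, with mixing weights drawn at random), no mixture does better than the best deterministic rule:

    import random

    bayes_risks = [5, 2.5, 4.05, 1.55, 3.95, 1.45, 3, 0.5]   # 0.5*(R0+R1), Problem 1.1
    best = min(bayes_risks)                                   # 0.5, attained by delta_8

    random.seed(0)
    for _ in range(5):
        g = [random.random() for _ in bayes_risks]
        s = sum(g)
        g = [x / s for x in g]                                # random mixing weights
        mixed = sum(gi * ri for gi, ri in zip(g, bayes_risks))
        print(round(mixed, 3), mixed >= best)                 # always True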
1.9 Equalizer Rule. Prove Proposition 1.4.
Solution: TBD.
