Chapter 5
Yifan Wang
May 2019
Exercise 5.1
1. The jump comes from the policy: the player keeps hitting until the
sum reaches 20 or 21. With any sum below 20 the player must hit and
risks going bust, which produces the low values just before 20 and 21.
At 20 or 21, however, the player sticks and has a very good chance of
winning, especially since the dealer sticks on any sum of 17 or higher.
2. The value drops off for the whole last row on the left because that
row corresponds to the dealer showing an ace. An ace can be counted as
11, which gives the dealer a high probability of finishing with a higher
score than the player. The value for a dealer's ace therefore already
reflects the dealer's option of counting it as 1 or 11. No other card
has this flexibility, so the ace is a special case, which creates the gap.
Exercise 5.2
No. A blackjack episode never visits the same state twice, so the
first-visit and every-visit methods are essentially the same thing here.
Exercise 5.3
This is a drawing problem, so I will not draw it here. The diagram has
the max of $q_\pi$ at the bottom.
Exercise 5.4
Exercise 5.5
Because
\[
|\mathcal{J}(s)| = \sum_{t \in \mathcal{J}(s)} \rho_{t:T(t)-1} = 10,
\qquad \rho_{t:T(t)-1} = 1,
\]
first-visit:
\[
v_s = 10
\]
every-visit:
\[
v_s = \frac{1}{10}\,(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) = 5.5
\]
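The two estimates can be checked with a few lines of Python. This is only a sketch under the assumptions of the exercise (one nonterminal state, a reward of +1 on every step, no discounting, a single 10-step episode, and every importance-sampling ratio equal to 1); the helper name `returns_from_rewards` is made up for illustration.

```python
def returns_from_rewards(rewards, gamma=1.0):
    """Return [G_0, G_1, ..., G_{T-1}] for one episode's reward sequence."""
    returns = []
    g = 0.0
    # Work backwards: G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# One episode of 10 steps, reward +1 per step (assumed from the exercise).
rewards = [1.0] * 10
returns = returns_from_rewards(rewards)

# The single state is visited at every time step, so the first-visit
# estimate uses only G_0 and the every-visit estimate averages all G_t.
first_visit = returns[0]
every_visit = sum(returns) / len(returns)

print(first_visit, every_visit)  # prints: 10.0 5.5
```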
Exercise 5.6
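eq (5.6):
\[
V(s) \doteq \frac{\sum_{t \in \mathcal{J}(s)} \rho_{t:T(t)-1}\, G_t}
                 {\sum_{t \in \mathcal{J}(s)} \rho_{t:T(t)-1}}
\]
For Q:
\[
Q(s, a) \doteq \frac{\sum_{t \in \mathcal{J}(s,a)} \rho_{t+1:T(t)-1}\, G_t}
                    {\sum_{t \in \mathcal{J}(s,a)} \rho_{t+1:T(t)-1}}
\]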
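As a rough illustration of the formula for Q, the sketch below computes the weighted importance-sampling estimate from a list of visits to one pair (s, a). The data layout (a list of `(ratios, G)` pairs) and the function name `weighted_is_q` are assumptions made for this example. The key point is that the ratio for the action taken at time t itself is left out, because Q(s, a) is conditioned on that action.

```python
from math import prod


def weighted_is_q(visits):
    """Weighted importance-sampling estimate of Q(s, a).

    visits: list of (ratios, G) pairs, one per visit to (s, a) at some time t.
    ratios holds pi(A_k|S_k) / b(A_k|S_k) for k = t+1, ..., T(t)-1, so the
    ratio for the action taken at time t is deliberately excluded.
    """
    numerator = 0.0
    denominator = 0.0
    for ratios, g in visits:
        rho = prod(ratios)  # empty product is 1 (visit on the last step)
        numerator += rho * g
        denominator += rho
    # Convention: the estimate is 0 when the denominator is 0.
    return numerator / denominator if denominator != 0 else 0.0


# Hypothetical data: three visits with their later-step ratios and returns.
visits = [([2.0, 0.5], 1.0), ([2.0], 4.0), ([], 3.0)]
print(weighted_is_q(visits))  # (1*1 + 2*4 + 1*3) / (1 + 2 + 1) = 3.0
```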
Exercise 5.7
The weighted average algorithm needs a few episodes before its bias
starts to decrease. In the early episodes, especially when
$\rho_{t:T(t)-1}$ is large, the weighted average is pulled toward those
few samples. Once enough episodes have been collected, the average
stabilizes and the bias decreases.
Exercise 5.8
...
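\[
\begin{aligned}
&= 0.1 \sum_{k=1}^{\infty} \frac{1}{k} \sum_{l=0}^{k-1} 0.9^{l} \cdot 2^{l} \cdot 2 \\
&= 0.2 \sum_{k=1}^{\infty} \frac{1}{k} \sum_{l=0}^{k-1} 1.8^{l} = \infty.
\end{aligned}
\]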
On the other hand, considering the weighted average method:
\[
\mathbb{E}_b\!\left[ \left( \frac{1}{|\mathcal{J}(s)|} \sum_{k=1}^{T-1} \prod_{t=0}^{k}
\frac{\pi(A_t|S_t)}{b(A_t|S_t)}\, G_0 \right)^{2} \right]
\]
...
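\[
\begin{aligned}
&\le 0.1 \sum_{k=1}^{\infty} \frac{1}{k}\, \frac{1}{\left( \sum_{i=0}^{k} 2^{i} \right)^{2}}
      \sum_{l=0}^{k-1} 0.9^{l} \cdot 2^{l} \cdot 2 \\
&\le 0.2 \sum_{k=1}^{\infty} \frac{1}{k}\, \frac{1}{2^{2k}} \sum_{l=0}^{k-1} 1.8^{l} \\
&\le 0.2 \sum_{k=1}^{\infty} \frac{1}{k}\, 2^{-2k}\, k\, 1.8^{k} \\
&\le \sum_{k=1}^{\infty} \frac{1}{k}\, 2^{-2k}\, k\, 2^{k} \\
&= \sum_{k=1}^{\infty} 2^{-k} = 1
\end{aligned}
\]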
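The two series in this exercise can also be examined numerically. The sketch below only evaluates partial sums of the expressions as written in the derivations; it is not a simulation of the underlying MDP. The function names are made up for illustration. The partial sums of the first series keep growing without bound, while those of the geometric series approach 1.

```python
def ordinary_partial_sum(K):
    """Partial sum of 0.2 * sum_{k=1}^{K} (1/k) * sum_{l=0}^{k-1} 1.8**l."""
    total = 0.0
    for k in range(1, K + 1):
        inner = sum(1.8 ** l for l in range(k))
        total += inner / k
    return 0.2 * total


def geometric_partial_sum(K):
    """Partial sum of sum_{k=1}^{K} 2**(-k), which converges to 1."""
    return sum(2.0 ** -k for k in range(1, K + 1))


for K in (10, 20, 40):
    # The first column grows rapidly with K; the second stays below 1.
    print(K, ordinary_partial_sum(K), geometric_partial_sum(K))
```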
Exercise 5.9
= ...
\[
= V_{n-1}(S_t) + \frac{1}{n} \left[ G_n(t) - V_{n-1}(S_t) \right]
\]
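The final update rule is the usual incremental sample average. A minimal sketch, assuming the returns observed for one state arrive one at a time (the function name `incremental_mean` is illustrative):

```python
def incremental_mean(returns):
    """Sample average via V_n = V_{n-1} + (1/n) * (G_n - V_{n-1})."""
    v = 0.0
    for n, g in enumerate(returns, start=1):
        v += (g - v) / n
    return v


# Hypothetical returns observed for a single state.
sample_returns = [4.0, 2.0, 6.0]
print(incremental_mean(sample_returns))  # 4.0, the plain average
```

The update needs only the current estimate and the visit count, so no list of past returns has to be stored.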
Exercise 5.10
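With $C_n \doteq \sum_{k=1}^{n} W_k$, so that $C_n - C_{n-1} = W_n$:
\[
\begin{aligned}
V_{n+1} &\doteq \frac{\sum_{k=1}^{n} W_k G_k}{\sum_{k=1}^{n} W_k} \\
&= \frac{W_n G_n + \sum_{k=1}^{n-1} W_k G_k}{\sum_{k=1}^{n-1} W_k}
   \cdot \frac{\sum_{k=1}^{n-1} W_k}{\sum_{k=1}^{n} W_k} \\
&= \left( \frac{W_n G_n}{C_{n-1}} + V_n \right) \frac{C_{n-1}}{C_n} \\
&= \frac{W_n G_n}{C_n} + \frac{V_n C_{n-1}}{C_n} \\
&= V_n + \frac{W_n G_n}{C_n} + \frac{V_n C_{n-1}}{C_n} - V_n \\
&= V_n + \frac{W_n G_n}{C_n} + \frac{V_n C_{n-1} - V_n C_n}{C_n} \\
&= V_n + \frac{W_n G_n}{C_n} + \frac{-V_n W_n}{C_n} \\
&= V_n + \frac{W_n}{C_n} \left[ G_n - V_n \right]
\end{aligned}
\]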
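The result can be sanity-checked by comparing the incremental rule against the direct weighted average. The sketch below assumes nonnegative weights with at least one positive weight; the function names and the sample data are hypothetical.

```python
def direct_weighted_average(returns, weights):
    """V = sum_k W_k G_k / sum_k W_k, computed in one pass over all data."""
    return sum(w * g for g, w in zip(returns, weights)) / sum(weights)


def incremental_weighted_average(returns, weights):
    """Same quantity via V <- V + (W_n / C_n) * (G_n - V), with C_n = C_{n-1} + W_n."""
    v = 0.0
    c = 0.0
    for g, w in zip(returns, weights):
        if w == 0:
            continue  # a zero weight changes neither C nor V
        c += w
        v += (w / c) * (g - v)
    return v


# Hypothetical returns and importance-sampling weights.
sample_returns = [1.0, 3.0, 5.0]
sample_weights = [1.0, 1.0, 2.0]
print(direct_weighted_average(sample_returns, sample_weights))       # 3.5
print(incremental_weighted_average(sample_returns, sample_weights))  # 3.5
```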
Exercise 5.11
Exercise 5.12
Exercise 5.13
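From (5.12):
\[
\rho_{t:T-1} R_{t+1} =
\frac{\pi(A_t|S_t)}{b(A_t|S_t)}\,
\frac{\pi(A_{t+1}|S_{t+1})}{b(A_{t+1}|S_{t+1})}\,
\frac{\pi(A_{t+2}|S_{t+2})}{b(A_{t+2}|S_{t+2})} \cdots
\frac{\pi(A_{T-1}|S_{T-1})}{b(A_{T-1}|S_{T-1})}\, R_{t+1}
\]
For independent random variables, $\mathbb{E}(XY) = \mathbb{E}(X)\,\mathbb{E}(Y)$.
Each ratio after the first has expected value 1 under $b$, since
$\sum_a b(a|S_k)\, \frac{\pi(a|S_k)}{b(a|S_k)} = \sum_a \pi(a|S_k) = 1$, so
\[
\mathbb{E}\left[ \rho_{t:T-1} R_{t+1} \right] = \mathbb{E}\left[ \rho_{t:t} R_{t+1} \right].
\]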
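The effect of the later ratios can be verified exactly on a toy case. The sketch below is a heavily simplified illustration, not the general proof: it assumes state-independent policies over two actions, a fixed horizon of three steps, and a reward that depends only on the first action. All numbers and names are hypothetical. Exact rational arithmetic is used so the two expectations can be compared without rounding error.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, state-independent policies over actions {0, 1}.
pi = {0: Fraction(3, 4), 1: Fraction(1, 4)}  # target policy
b = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # behavior policy
reward = {0: Fraction(1), 1: Fraction(5)}    # R_{t+1} as a function of A_t


def expected_under_b(f, horizon=3):
    """Exact expectation of f(actions) when every action is drawn from b."""
    total = Fraction(0)
    for actions in product(sorted(b), repeat=horizon):
        prob = Fraction(1)
        for a in actions:
            prob *= b[a]
        total += prob * f(actions)
    return total


def full_ratio_times_reward(actions):
    """rho_{t:T-1} * R_{t+1}: product of the ratios for every step."""
    rho = Fraction(1)
    for a in actions:
        rho *= pi[a] / b[a]
    return rho * reward[actions[0]]


def first_ratio_times_reward(actions):
    """rho_{t:t} * R_{t+1}: only the ratio for the first step."""
    a0 = actions[0]
    return (pi[a0] / b[a0]) * reward[a0]


lhs = expected_under_b(full_ratio_times_reward)
rhs = expected_under_b(first_ratio_times_reward)
print(lhs, rhs)  # both equal the expected reward under pi: 3/4 * 1 + 1/4 * 5 = 2
```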
Exercise 5.14
Skip