
Question 1:

a) A screen shot of your Bayesian network and a concise description explaining the overall purpose of this expert system and the meaning of nodes.

The overall purpose of this expert system is to identify the factors that give rise to flight delays, how strongly each of them influences the outcome, and, when a delay does occur, how long it is likely to last.

Node explanation:

airport_weather: the weather condition in the airport area. (Bad, Good)

airport_birds: the number of birds appearing in the airport area. (Lot, Few, Null)

airport_airstrip: the state of the airport's airstrips. (Busy, Void)

plane_malfunction: the mechanical condition of the plane. (Break Down, Little Trouble, Work Well)

plane_dispatch: the state of the airline's dispatching. (Busy, Casual)

plane_pilot: the attendance of the plane's pilot. (Present, Absent)

passengers_absence: whether any passengers of the plane are absent. (Yes, No)

plane_eligibility: the eligibility of the plane. (Yes, No)

airport_eligibility: the eligibility of the airport. (Yes, No)

accidents: whether any other incident happens. (Yes, No)

takeoff_delay: the takeoff delay of the flight. (No, Half Hour, More Than Hour)
b) Design three queries that can be answered using your network,
e.g. will you make it to class on time given certain weather
conditions. Describe how to execute each query, i.e. the evidence
you used, the node(s) queried and the a posteriori probabilities at
each node. Also, explain the results you obtain. Your work should
demonstrate that your network indeed encodes your expertise
effectively and the resulting probabilities make sense.
1. On a day with good weather, would a lot of birds appearing in the airport area give rise to the absence of the pilot?

Set the evidence of node airport_weather to Good and the evidence of node airport_birds to Lot, then update.

The node plane_pilot is queried.

The probability that the pilot is absent is 0.28.

A lot of birds keep the airport's airstrips busy because fewer of them are available, which makes the airline's dispatching busy; the birds can also cause accidents that lead to plane malfunctions. Both effects make it more likely that the pilot takes a break or is dispatched to another airport, i.e. that the pilot is absent.

2. When no accidents happen, does the ineligibility of the airport influence the eligibility of the plane?

Set the evidence of node accidents to No and the evidence of node airport_eligibility to No, then update.

The node plane_eligibility is queried.

The probability that the plane is ineligible is 0.635.

An ineligible airport causes flights to be postponed, which makes the airline's dispatching busy, and this in turn can make the plane ineligible.

3. On a day with bad weather and a few birds flying around the airport area, would accidents occur and would the plane take off on time; if not, how long would the delay last?

Set the evidence of node airport_weather to Bad and the evidence of node airport_birds to Few, then update.

The nodes takeoff_delay and accidents are queried.

The probability of a delay of more than an hour is 0.408, the probability of a half-hour delay is 0.475, and the probability of accidents is 0.5.

The weather condition and the birds are the main causes of accidents. They can make the airport ineligible, make the airline's dispatching busy, and even cause malfunctions of the plane. Bad weather can also cause passengers to be absent. How long the delay lasts depends on the eligibility of the airport and the plane, the odds of accidents, and whether all passengers have checked in. In general these adverse factors lead to a half-hour delay, but in worse situations the delay lasts more than an hour.
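All three queries follow the same pattern: set the evidence, update the network, and read the posterior at the queried node. The sketch below shows that pattern, assuming the model were rebuilt in pgmpy. It is not the network from the screenshot: it uses a deliberately collapsed three-node stand-in (the real chain runs through airport_airstrip, plane_dispatch, accidents, and so on), and the CPT numbers are illustrative placeholders, chosen only so that the Good/Lot column reproduces the 0.28 reported in query 1.

    # minimal sketch of query execution in pgmpy (placeholder CPTs, not my real network)
    from pgmpy.models import BayesianNetwork   # called DiscreteBayesianNetwork in newer releases
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("airport_weather", "plane_pilot"),
                             ("airport_birds", "plane_pilot")])

    weather = TabularCPD("airport_weather", 2, [[0.3], [0.7]],
                         state_names={"airport_weather": ["Bad", "Good"]})
    birds = TabularCPD("airport_birds", 3, [[0.2], [0.3], [0.5]],
                       state_names={"airport_birds": ["Lot", "Few", "Null"]})
    # P(plane_pilot | weather, birds); columns:
    # (Bad,Lot) (Bad,Few) (Bad,Null) (Good,Lot) (Good,Few) (Good,Null)
    pilot = TabularCPD(
        "plane_pilot", 2,
        [[0.55, 0.75, 0.85, 0.72, 0.90, 0.95],   # Present
         [0.45, 0.25, 0.15, 0.28, 0.10, 0.05]],  # Absent
        evidence=["airport_weather", "airport_birds"], evidence_card=[2, 3],
        state_names={"plane_pilot": ["Present", "Absent"],
                     "airport_weather": ["Bad", "Good"],
                     "airport_birds": ["Lot", "Few", "Null"]})
    model.add_cpds(weather, birds, pilot)
    model.check_model()

    # query 1: set the evidence, "update", and read the posterior of the queried node
    infer = VariableElimination(model)
    print(infer.query(["plane_pilot"],
                      evidence={"airport_weather": "Good", "airport_birds": "Lot"}))

Queries 2 and 3 are run in exactly the same way, changing only the evidence dictionary and the queried variables.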

c) Show an example of independence in your network. This could be done by taking a few related nodes and showing how their probabilities are impacted by setting some evidence.

The nodes airport_weather and airport_birds are common parents of the nodes airport_eligibility and accidents.

First, we set no evidence on airport_weather or airport_birds. With the evidence of accidents set to Yes, the probability of airport_eligibility = Yes is 0.275; with accidents set to No, it is 0.568. With no other evidence set, the outcomes show that accidents and airport_eligibility are not guaranteed to be independent.

Second, we set the evidence of airport_weather to Good and airport_birds to Null. With accidents set to Yes, the probability of airport_eligibility = Yes is 0.78; with accidents set to No, it is also 0.78. Given the evidence on the other two nodes, the outcomes show that accidents and airport_eligibility are conditionally independent.

Not knowing airport_weather and airport_birds:

accidents              Yes             No
airport_eligibility    Yes     No      Yes     No
                       0.275   0.725   0.568   0.432

These two nodes are not guaranteed to be independent.

Knowing airport_weather = Good and airport_birds = Null:

accidents              Yes             No
airport_eligibility    Yes     No      Yes     No
                       0.78    0.22    0.78    0.22

These two nodes are conditionally independent.
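The same check can be reproduced programmatically. The sketch below assumes only the fragment of the network that this part relies on, with airport_weather and airport_birds as common parents of accidents and airport_eligibility and no direct edge between the latter two, rebuilt in pgmpy with placeholder CPTs (the Good/Null column of airport_eligibility is set to 0.78 so that the conditional case reproduces the number above; the other entries are made up, so the unconditional posteriors will not reproduce 0.275 and 0.568 exactly). Without evidence on the parents the two posteriors differ; once both parents are observed they coincide, which is exactly the conditional independence shown in the tables.

    # minimal sketch of the independence check (placeholder CPTs, fragment of the network only)
    from pgmpy.models import BayesianNetwork   # called DiscreteBayesianNetwork in newer releases
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("airport_weather", "accidents"),
                             ("airport_birds", "accidents"),
                             ("airport_weather", "airport_eligibility"),
                             ("airport_birds", "airport_eligibility")])
    model.add_cpds(
        TabularCPD("airport_weather", 2, [[0.3], [0.7]],
                   state_names={"airport_weather": ["Bad", "Good"]}),
        TabularCPD("airport_birds", 3, [[0.2], [0.3], [0.5]],
                   state_names={"airport_birds": ["Lot", "Few", "Null"]}),
        # columns: (Bad,Lot) (Bad,Few) (Bad,Null) (Good,Lot) (Good,Few) (Good,Null)
        TabularCPD("accidents", 2,
                   [[0.60, 0.40, 0.30, 0.40, 0.20, 0.05],   # Yes
                    [0.40, 0.60, 0.70, 0.60, 0.80, 0.95]],  # No
                   evidence=["airport_weather", "airport_birds"], evidence_card=[2, 3],
                   state_names={"accidents": ["Yes", "No"],
                                "airport_weather": ["Bad", "Good"],
                                "airport_birds": ["Lot", "Few", "Null"]}),
        TabularCPD("airport_eligibility", 2,
                   [[0.20, 0.35, 0.50, 0.55, 0.70, 0.78],   # Yes
                    [0.80, 0.65, 0.50, 0.45, 0.30, 0.22]],  # No
                   evidence=["airport_weather", "airport_birds"], evidence_card=[2, 3],
                   state_names={"airport_eligibility": ["Yes", "No"],
                                "airport_weather": ["Bad", "Good"],
                                "airport_birds": ["Lot", "Few", "Null"]}))
    model.check_model()

    infer = VariableElimination(model)
    # first without weather/birds evidence, then with both parents observed
    for extra in ({}, {"airport_weather": "Good", "airport_birds": "Null"}):
        for acc in ("Yes", "No"):
            evidence = {"accidents": acc, **extra}
            print("evidence:", evidence)
            print(infer.query(["airport_eligibility"], evidence=evidence))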


Question 2:
a) Draw the Markov decision problem corresponding to the map in
Figure 1. Make sure to label the states, actions, associated costs
and probabilities clearly.
[Figure: hand-drawn MDP for the map in Figure 1, showing states S (start), 1-7, the goal G and the obstacle X, with four actions per state (a1-a28); every move costs 1, succeeds with probability 0.8 and leaves the state unchanged with probability 0.2.]

b) Perform value iteration on the MDP from a) by hand using a discount factor of 1 and present the results as shown on page 15 “example of Value Iteration (1)” of the MDP lecture notes.
(The S rows are the state values, i.e. the minimum over that state's four actions.)

i      0   1   2    3      4      5      6      7      8      9      10
a1     0   1   2    3      3.36   3.464  3.491  3.498  3.499  3.5    3.5
a2     0   1   2    3      3.36   3.464  3.491  3.498  3.499  3.5    3.5
S1     0   1   2    2.36   2.464  2.491  2.498  2.499  2.5    2.5    2.5
a3     0   1   2    3      3.872  4.28   4.433  4.481  4.495  4.499  4.5
a4     0   1   2    2.36   2.464  2.491  2.498  2.499  2.5    2.5    2.5
a5     0   1   2    2.2    2.24   2.248  2.249  2.25   2.25   2.25   2.25
a6     0   1   2    2.84   3.136  3.221  3.243  3.248  3.25   3.25   3.25
S2     0   1   1.2  1.24   1.248  1.249  1.25   1.25   1.25   1.25   1.25
a7     0   1   2    2.2    2.24   2.248  2.249  2.25   2.25   2.25   2.25
a8     0   1   1.2  1.24   1.248  1.249  1.25   1.25   1.25   1.25   1.25
a9     0   1   2    3      3.488  3.669  3.727  3.744  3.748  3.749  3.749
a10    0   1   2    3      4      4.488  4.669  4.727  4.744  4.748  4.749
S3     0   1   2    3      3.488  3.669  3.727  3.744  3.748  3.749  3.749
a11    0   1   2    3      4      4.898  5.406  5.628  5.71   5.738  5.747
a12    0   1   2    3      4      4.488  4.669  4.727  4.744  4.748  4.749
a13    0   1   1.2  1.24   1.248  1.249  1.25   1.25   1.25   1.25   1.25
a14    0   1   2    2.2    2.24   2.248  2.249  2.25   2.25   2.25   2.25
S4     0   1   1.2  1.24   1.248  1.249  1.25   1.25   1.25   1.25   1.25
a15    0   1   2    2.84   3.136  3.221  3.243  3.248  3.25   3.25   3.25
a16    0   1   2    2.2    2.24   2.248  2.249  2.25   2.25   2.25   2.25
a17    0   1   2    3      4      4.59   4.853  4.952  4.985  4.996  4.999
a18    0   1   2    3      4      5      5.59   5.853  5.952  5.985  5.996
S5     0   1   2    3      4      4.59   4.853  4.952  4.985  4.996  4.999
a19    0   1   2    3      4      5      5.59   5.853  5.952  5.985  5.996
a20    0   1   2    3      4      4.59   4.853  4.952  4.985  4.996  4.999
a21    0   1   2    3      4      4.488  4.669  4.727  4.744  4.748  4.749
a22    0   1   2    3      4      4.898  5.406  5.628  5.71   5.738  5.747
S6     0   1   2    3      3.488  3.669  3.727  3.744  3.748  3.749  3.749
a23    0   1   2    3      4      4.488  4.669  4.727  4.744  4.748  4.749
a24    0   1   2    3      3.488  3.669  3.727  3.744  3.748  3.749  3.749
a25    0   1   2    2.36   2.464  2.491  2.498  2.499  2.5    2.5    2.5
a26    0   1   2    3      3.872  4.28   4.433  4.481  4.495  4.499  4.5
S7     0   1   2    2.36   2.464  2.491  2.498  2.499  2.5    2.5    2.5
a27    0   1   2    3      3.36   3.464  3.491  3.498  3.499  3.5    3.5
a28    0   1   2    3      3.36   3.464  3.491  3.498  3.499  3.5    3.5
goal   0   0   0    0      0      0      0      0      0      0      0
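The same numbers can be generated mechanically. The sketch below is a minimal value-iteration implementation for this MDP, assuming the successor structure implied by the action values in part c): every move costs 1, succeeds with probability 0.8, stays put with probability 0.2, wall/obstacle actions stay put with certainty, and the goal is absorbing. The SUCC map is my reading of the map in Figure 1, not something stated explicitly in the text.

    # value-iteration sketch (assumed successor structure)
    SUCC = {
        "S1": {"a1": None, "a2": None, "a3": "S3", "a4": "S2"},
        "S2": {"a5": None, "a6": "S1", "a7": None, "a8": "G"},
        "S3": {"a9": "S1", "a10": None, "a11": "S5", "a12": None},
        "S4": {"a13": "G", "a14": None, "a15": "S7", "a16": None},
        "S5": {"a17": "S3", "a18": None, "a19": None, "a20": "S6"},
        "S6": {"a21": None, "a22": "S5", "a23": None, "a24": "S7"},
        "S7": {"a25": "S4", "a26": "S6", "a27": None, "a28": None},
    }
    COST, P_OK, GAMMA = 1.0, 0.8, 1.0

    def q_value(V, s, succ):
        """Expected cost of one action from s: cost + gamma * E[V(next state)]."""
        if succ is None:                      # wall/obstacle: stay put for sure
            return COST + GAMMA * V[s]
        return COST + GAMMA * (P_OK * V[succ] + (1 - P_OK) * V[s])

    V = {s: 0.0 for s in SUCC}
    V["G"] = 0.0                              # absorbing goal state
    for i in range(1, 11):
        V_new = {"G": 0.0}
        for s in SUCC:
            V_new[s] = min(q_value(V, s, succ) for succ in SUCC[s].values())
        V = V_new
        print(f"i={i:2d}", {s: round(V[s], 3) for s in SUCC})

Printing the per-state minima reproduces the S1-S7 rows of the table above, converging to 2.5, 1.25, 3.75, 1.25, 5, 3.75 and 2.5.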

c) Briefly explain how policy iteration can be applied to solve the MDP from a). You may start with a random policy, in which states are assigned random actions. Describe the process and show how it works for two iterations.

Policy i = 0

1. First, we assign a random policy to every node except the goal node and the X node:

a0(S1) = a4, a0(S2) = a8, a0(S3) = a9, a0(S4) = a13, a0(S5) = a17, a0(S6) = a22, a0(S7) = a26.

2. According to this policy, we write down the evaluation equation of every node and solve for the values (a worked example follows the equations):

gd0(S1) = 1 + gd0(S1) * 0.2 + gd0(S2) * 0.8 = 2.5,

gd0(S2) = 1 + gd0(S2) * 0.2 + gd0(G) * 0.8 = 1.25,

gd0(S3) = 1 + gd0(S3) * 0.2 + gd0(S1) * 0.8 = 3.75,

gd0(S4) = 1 + gd0(S4) * 0.2 + gd0(G) * 0.8 = 1.25,

gd0(S5) = 1 + gd0(S5) * 0.2 + gd0(S3) * 0.8 = 5,

gd0(S6) = 1 + gd0(S6) * 0.2 + gd0(S5) * 0.8 = 6.25,

gd0(S7) = 1 + gd0(S7) * 0.2 + gd0(S6) * 0.8 = 7.5;
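For example, the second equation contains only gd0(S2): moving the 0.2 * gd0(S2) term to the left gives 0.8 * gd0(S2) = 1, so gd0(S2) = 1.25; substituting this into the first equation gives 0.8 * gd0(S1) = 1 + 0.8 * 1.25 = 2, so gd0(S1) = 2.5, and the remaining values follow in the same way.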

3. Using these node values, we compute the value of every action at every node.

S1: a1 = 1 + 1 * gd0(S1) = 3.5, a2 = 3.5, a3 = 1 + 0.8 * gd0(S3) + 0.2 * gd0(S1) = 4.5, a4 = 2.5;

S2: a5 = 2.25, a6 = 3.25, a7 = 2.25, a8 = 1.25;

S3: a9 = 3.75, a10 = 4.75, a11 = 5.75, a12 = 4.75;

S4: a13 = 1.25, a14 = 2.25, a15 = 7.25, a16 = 2.25;

S5: a17 = 5, a18 = 6, a19 = 6, a20 = 7;

S6: a21 = 7.25, a22 = 6.25, a23 = 7.25, a24 = 8.25;

S7: a25 = 3.5, a26 = 7.5, a27 = 8.5, a28 = 8.5.

Policy i = i + 1

4. For every node we choose the action with the minimal value as the new policy. In this case the minimal-value action at node S7 was a25, so we switched a26 to a25 and went back to step 2:

gd1(S1) = 1 + gd1(S1) * 0.2 + gd1(S2) * 0.8 = 2.5,

gd1(S2) = 1 + gd1(S2) * 0.2 + gd1(G) * 0.8 = 1.25,

gd1(S3) = 1 + gd1(S3) * 0.2 + gd1(S1) * 0.8 = 3.75,

gd1(S4) = 1 + gd1(S4) * 0.2 + gd1(G) * 0.8 = 1.25,

gd1(S5) = 1 + gd1(S5) * 0.2 + gd1(S3) * 0.8 = 5,

gd1(S6) = 1 + gd1(S6) * 0.2 + gd1(S5) * 0.8 = 6.25,

gd1(S7) = 1 + gd1(S7) * 0.2 + gd1(S4) * 0.8 = 2.5;

5. We repeat step 3 with the new values.

S1: a1 = 1 + 1 * gd1(S1) = 3.5, a2 = 3.5, a3 = 1 + 0.8 * gd1(S3) + 0.2 * gd1(S1) = 4.5, a4 = 2.5;

S2: a5 = 2.25, a6 = 3.25, a7 = 2.25, a8 = 1.25;

S3: a9 = 3.75, a10 = 4.75, a11 = 5.75, a12 = 4.75;

S4: a13 = 1.25, a14 = 2.25, a15 = 7.25, a16 = 2.25;

S5: a17 = 5, a18 = 6, a19 = 6, a20 = 7;

S6: a21 = 7.25, a22 = 6.25, a23 = 7.25, a24 = 4.25;

S7: a25 = 2.5, a26 = 6.5, a27 = 3.5, a28 = 3.5.

6. This time the minimal-value action at node S6 was a24, so we switched a22 to a24 and repeated step 2. The loop continues until, for every node, the minimal-value action in the current iteration is the same as in the previous iteration, i.e. the policy no longer changes.
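As with value iteration, this procedure is easy to mechanise. The following sketch runs the two iterations described above, using the same assumed successor structure as the value-iteration sketch in part b) (my reading of the map, not something given in the text): policy evaluation solves the linear equations gd(s) = 1 + 0.8 * gd(succ(s)) + 0.2 * gd(s) with gd(G) = 0, and policy improvement picks the minimal-value action at every node.

    # policy-iteration sketch (assumed successor structure)
    import numpy as np

    SUCC = {
        "S1": {"a1": None, "a2": None, "a3": "S3", "a4": "S2"},
        "S2": {"a5": None, "a6": "S1", "a7": None, "a8": "G"},
        "S3": {"a9": "S1", "a10": None, "a11": "S5", "a12": None},
        "S4": {"a13": "G", "a14": None, "a15": "S7", "a16": None},
        "S5": {"a17": "S3", "a18": None, "a19": None, "a20": "S6"},
        "S6": {"a21": None, "a22": "S5", "a23": None, "a24": "S7"},
        "S7": {"a25": "S4", "a26": "S6", "a27": None, "a28": None},
    }
    STATES = list(SUCC)
    P_OK = 0.8

    def evaluate(policy):
        """Solve gd(s) = 1 + 0.8*gd(succ(s)) + 0.2*gd(s) for all states at once."""
        idx = {s: k for k, s in enumerate(STATES)}
        A, b = np.eye(len(STATES)), np.ones(len(STATES))
        for s in STATES:
            succ = SUCC[s][policy[s]]         # policies produced here never pick a wall
            A[idx[s], idx[s]] -= 1 - P_OK
            if succ != "G":
                A[idx[s], idx[succ]] -= P_OK
        gd = np.linalg.solve(A, b)
        return {s: gd[idx[s]] for s in STATES}

    def q_value(gd, s, succ):
        """Value of one action under the current gd (the goal counts as 0)."""
        val = lambda t: 0.0 if t == "G" else gd[t]
        if succ is None:                      # wall: stay in place with certainty
            return 1 + val(s)
        return 1 + P_OK * val(succ) + (1 - P_OK) * val(s)

    # step 1: the same random starting policy as above
    policy = {"S1": "a4", "S2": "a8", "S3": "a9", "S4": "a13",
              "S5": "a17", "S6": "a22", "S7": "a26"}
    for it in range(2):                       # the two iterations shown above
        gd = evaluate(policy)                 # step 2
        print("iteration", it, {s: round(v, 2) for s, v in gd.items()})
        policy = {s: min(SUCC[s], key=lambda a: q_value(gd, s, SUCC[s][a]))
                  for s in SUCC}              # steps 3-4: pick the minimal-value action
        print("new policy", policy)

Its output matches the hand calculation: after the first evaluation only S7 switches (a26 to a25), and after the second only S6 switches (a22 to a24).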
d) For maps 1, 2, & 3, show the number of iterations taken by
value iteration to reach the terminating criterion, your cost map
and policy, using a discount factor of 1.

map1:

map2
map3
Question 3:
a) Draw the Markov decision problem corresponding to the map in
Figure 2. Make sure to label the states, actions, associated costs
and probabilities clearly.
[Figure: hand-drawn MDP for the map in Figure 2, showing states S (start), 1-8 and the obstacle X, with four actions per state (a1-a32); moves succeed with probability 0.8 and leave the state unchanged with probability 0.2, the normal step cost is 1, and some cells have costs of -1 and -2.]

b) Perform value iteration on the MDP from a) by hand using a discount factor of 1 and present the first 5 steps of the results as shown on page 15 “example of Value Iteration (1)” of the MDP lecture notes. State whether value iteration would terminate based on the termination criterion described earlier. Provide a brief explanation.
i     0    1       2       3       4       5
a1    0    1       2       3       2.08    0.392
a2    0    1       2       3       2.08    0.392
S1    0    1       2       1.08   -0.608  -2.526
a3    0    1       2       3       2.08    0.392
a4    0    1       2       1.08   -0.608  -2.526
a5    0    1       2       0.6    -1.27   -3.256
a6    0    1       2       2.52    1.408  -0.337
S2    0    1      -0.399  -2.28   -4.26   -6.251
a7    0    1       2       2.52    1.408  -0.337
a8    0    1      -0.399  -2.28   -4.26   -6.251
a9    0   -2      -4      -6      -8     -10
a10   0   -2      -1.6    -3.12   -5.024  -7.005
S3    0   -2      -4      -6      -8     -10
a11   0   -2      -1.6    -3.12   -5.024  -7.005
a12   0   -2      -4      -6      -8     -10
a13   0    1       2       1.08   -0.608  -2.526
a14   0    1       2       3       2.08    0.392
S4    0    1       2       1.08   -0.608  -2.256
a15   0    1       2       1.72    0.8    -0.325
a16   0    1       2       1.08   -0.608  -2.256
a17   0    1      -0.399  -2.28   -4.256  -6.251
a18   0    1       2       2.52    1.408  -0.338
S5    0    1      -0.399  -2.28   -4.256  -6.251
a19   0    1       2       2.52    1.408  -0.338
a20   0    1       2       0.6    -1.28   -3.256
a21   0   -1      -2      -3      -4      -5
a22   0   -1      -2      -3      -4      -5
S6    0   -1      -2      -3      -4      -5
a23   0   -1      -2      -3      -4      -5
a24   0   -1      -0.399  -1.08   -2.016  -3.003
a25   0    1       2       2.68    1.76    0.213
a26   0    1       0.399  -0.52   -1.504  -2.501
S7    0    1       0.399  -0.52   -1.504  -2.501
a27   0    1       2       1.4     0.48   -0.504
a28   0    1       2       2.68    1.76    0.213
a29   0    1       2       1.08   -0.608  -2.526
a30   0    1       2       1.72    0.8    -0.325
S8    0    1       2       1.08   -0.608  -2.526
a31   0    1       2       3       2.08    0.392
a32   0    1       2       3       2.08    0.392

Explanation:

Value iteration would not terminate under this termination criterion.

There is no goal node in this map, so the node values never settle: the cost of every node keeps growing in magnitude, and the change between two successive iterations stays much larger than the 0.01 the criterion requires. The direction of the drift is determined by the reward ('C') cells: because their values are negative, the cost of every node decreases without bound, and the difference between two iterations never becomes small.

However, if we apply a discount factor smaller than 1 to these equations, the difference between two successive iterations becomes smaller and smaller, because each update multiplies the previous values by the factor, so the changes shrink geometrically and the criterion is eventually reached.
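The effect of the discount factor can be seen on a made-up two-state example (purely illustrative, not the map from Figure 2): with a negative step cost and no goal, the per-sweep change never shrinks when the discount factor is 1, so the |V_new - V_old| < 0.01 criterion is never met, while with a discount factor below 1 the change decays geometrically and the criterion is soon satisfied.

    # toy demonstration: per-sweep change with gamma = 1 versus gamma = 0.5
    def sweep(V, gamma):
        # each state: cost -1, moves to the other state w.p. 0.8, stays w.p. 0.2
        return [-1 + gamma * (0.8 * V[1 - s] + 0.2 * V[s]) for s in (0, 1)]

    for gamma in (1.0, 0.5):
        V = [0.0, 0.0]
        for i in range(1, 11):
            V_new = sweep(V, gamma)
            diff = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            print(f"gamma={gamma} sweep={i:2d} max change={diff:.4f}")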

c) Perform value iteration on the MDP from a) by hand using a discount factor of 0.5 and present the results as shown on page 15 “example of Value Iteration (1)” of the MDP lecture notes.
(The S rows are the state values, i.e. the minimum over that state's four actions.)

i     0   1    2        3        4        5        6        7        8        9
a1    0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
a2    0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
S1    0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a3    0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
a4    0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a5    0   1    1.5      1.15     0.915    0.791    0.729    0.698    0.682    0.674
a6    0   1    1.5      1.63     1.491    1.382    1.321    1.29     1.275    1.267
S2    0   1    0.3     -0.17    -0.417   -0.542   -0.604   -0.635   -0.651   -0.659
a7    0   1    1.5      1.63     1.491    1.382    1.321    1.29     1.275    1.267
a8    0   1    0.3     -0.17    -0.417   -0.542   -0.604   -0.635   -0.651   -0.659
a9    0  -2   -3       -3.5     -3.75    -3.875   -3.937   -3.969   -3.984   -3.992
a10   0  -2   -1.8     -2.18    -2.418   -2.542   -2.604   -2.635   -2.651   -2.659
S3    0  -2   -3       -3.5     -3.75    -3.875   -3.937   -3.969   -3.984   -3.992
a11   0  -2   -1.8     -2.18    -2.418   -2.542   -2.604   -2.635   -2.651   -2.659
a12   0  -2   -3       -3.5     -3.75    -3.875   -3.937   -3.969   -3.984   -3.992
a13   0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a14   0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
S4    0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a15   0   1    1.5      1.43     1.315    1.245    1.208    1.189    1.18     1.175
a16   0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a17   0   1    0.3     -0.17    -0.417   -0.542   -0.604   -0.635   -0.651   -0.659
a18   0   1    1.5      1.63     1.491    1.382    1.321    1.29     1.275    1.267
S5    0   1    0.3     -0.17    -0.417   -0.542   -0.604   -0.635   -0.651   -0.659
a19   0   1    1.5      1.63     1.491    1.382    1.321    1.29     1.275    1.267
a20   0   1    1.5      1.15     0.915    0.791    0.729    0.698    0.682    0.674
a21   0  -1   -1.5     -1.75    -1.875   -1.9375  -1.969   -1.984   -1.992   -1.996
a22   0  -1   -1.5     -1.75    -1.875   -1.9375  -1.969   -1.984   -1.992   -1.996
S6    0  -1   -1.5     -1.75    -1.875   -1.937   -1.969   -1.984   -1.992   -1.996
a23   0  -1   -1.5     -1.75    -1.875   -1.9375  -1.969   -1.984   -1.992   -1.996
a24   0  -1   -0.7     -0.87    -0.987   -1.049   -1.079   -1.095   -1.103   -1.107
a25   0   1    1.5      1.67     1.555    1.458    1.404    1.376    1.362    1.355
a26   0   1    0.7      0.47     0.347    0.285    0.253    0.238    0.23     0.226
S7    0   1    0.7      0.47     0.347    0.285    0.253    0.238    0.23     0.226
a27   0   1    1.5      1.35     1.235    1.173    1.142    1.127    1.119    1.115
a28   0   1    1.5      1.67     1.555    1.458    1.404    1.376    1.362    1.355
a29   0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a30   0   1    1.5      1.43     1.315    1.245    1.208    1.189    1.18     1.175
S8    0   1    1.5      1.27     1.059    0.939    0.877    0.846    0.83     0.823
a31   0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
a32   0   1    1.5      1.75     1.635    1.529    1.469    1.439    1.423    1.415
d) For maps 4, 5 & 6, show the number of iterations taken by
value iteration to reach the terminating criterion, your cost map
and policy using a discount factor of 0.9.

map4

map5
map6
e) For maps 4, 5 & 6, show the number of iterations taken by
value iteration to reach the terminating criterion, your cost map
and policy using a discount factor of 0.3.

map4

map5
map6
