
6.825 – Project 2

11/4/2004

After executing our variable elimination procedure, we obtained the following results for

each of the queries below.

To simplify analysis of the PropCost probability distributions obtained from the insurance network throughout this project, we define the function f as a weighted average of cost over the discrete PropCost domain, yielding a single scalar value representative of the overall expected cost.
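Concretely, assuming the PropCost domain values Thousand, TenThou, HundredThou, and Million correspond to the dollar amounts 10^3, 10^4, 10^5, and 10^6 (an interpretation that reproduces the f values reported below), the definition is:

```latex
f = \sum_{v \in \mathrm{dom}(\mathrm{PropCost})} \mathrm{cost}(v)\,\Pr(\mathrm{PropCost} = v \mid \mathbf{e}),
\qquad \mathrm{cost}(v) \in \{10^3,\, 10^4,\, 10^5,\, 10^6\}
```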

<[Burglary] = [true]> = 0.284171835364393

<[Earthquake] = [true]> = 0.17606683840507917

<[PropCost] = [Million]> = 0.02709352198178344

<[PropCost] = [TenThou]> = 0.3427002442093675

<[PropCost] = [Thousand]> = 0.45722754191243536

(f = 48275.62)

These results are consistent with those obtained by executing the given enumeration procedure, and with those given in Table 1 of the project handout.
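As a sanity check, f can be recomputed from the reported marginals in a few lines of Python. This is a sketch: it assumes the dollar interpretation of the domain values above, and that the unreported HundredThou value carries the residual probability mass.

```python
# Reported marginals for P(PropCost | evidence); HundredThou is assumed
# to carry the residual mass so the distribution sums to one.
p = {
    1_000:     0.45722754191243536,   # Thousand
    10_000:    0.3427002442093675,    # TenThou
    1_000_000: 0.02709352198178344,   # Million
}
p[100_000] = 1.0 - sum(p.values())    # HundredThou (residual, assumed)

# f is the probability-weighted average dollar cost.
f = sum(value * prob for value, prob in p.items())
print(round(f, 2))  # -> 48275.62, matching the reported f
```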

A. Insurance Network Queries

Liu, Smith 2

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

Based on the network as illustrated in Figure 1 of the handout, we expect that the driver would be less risk averse, would have more money, and would drive a more valuable car. All of these things should cause the cost of insurance to go up relative to our previous query, which did not include any evidence about the MakeModel of the car. In terms of the PropCost domain, this means the probability distribution should shift toward the higher-cost elements (e.g., Million should gain probability relative to Thousand).

The results below bear this out: f is nearly four thousand dollars greater in this case relative to that from Section 1.3.

<[PropCost] = [Million]> = 0.03093877334365239

<[PropCost] = [TenThou]> = 0.34593039737969233

<[PropCost] = [Thousand]> = 0.45133749255661565

(f = 52028.74)

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

We expect that if the adolescent driver is a GoodStudent, the overall cost of insurance goes up. This follows from the network as shown in Figure 1 of the project handout: GoodStudent is connected to the network only through its two parents, Age and SocioEcon. Since Age is already an evidence variable, SocioEcon is the only node affected by adding GoodStudent to the evidence. More specifically, if the adolescent driver is a good student, they are likely to have more money, and thus drive fancier cars, be less risk averse, et cetera.

The results below bear this out given the proper evidence. More specifically, f is a little less than four thousand dollars greater in this case relative to that from Section 1.3.

<[PropCost] = [Million]> = 0.029748793596801583

<[PropCost] = [TenThou]> = 0.32771416728772235

<[PropCost] = [Thousand]> = 0.4587902473538701

(f = 51859.40)


<[N112] = [1]> = 0.01195999957730707

<[N143] = [1]> = 0.10000000303882783

A. Histograms

Random Elimination Ordering: Problem 1

[Figure 1. Running time in seconds (y-axis 0–6000) over ten trials of random-ordering variable elimination applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar).]


Random Elimination Ordering: Problem 2

[Figure 2. Running time in seconds (y-axis 0–6000) over ten trials of random-ordering variable elimination applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).]

Random Elimination Ordering: Problem 3

[Figure 3. Running time in seconds (y-axis 0–6000) over ten trials of random-ordering variable elimination applied to P(N112 | N64 = "3", N113 = "1", N116 = "0").]


Random Elimination Ordering: Problem 4

[Figure 4. Running time in seconds (y-axis 0–6000) over ten trials of random-ordering variable elimination applied to P(N143 | N146 = "1", N116 = "0", N121 = "1").]

B. Discussion

Figure 1 through Figure 4 illustrate the running time of a random-order variable elimination algorithm for each of the problems in Task 2 of the project handout. We ran the algorithm ten times for each problem. If a bar is stacked with a purple bar on top of it, the heap ran out of memory during that execution; in that case, we know the execution would have taken at least the amount of time illustrated by the blue bar, the time it ran before exhausting memory. We suppose that each execution in which the computer ran out of memory would have taken at least 5000 seconds to complete.

It is worth noting that the time taken on the successful runs (the samples without a purple bar) is much lower than the time the unsuccessful runs took before they crashed; i.e., the successful blue bars tend to be shorter than the unsuccessful blue bars. This indicates that random ordering tends to get it either very right or very wrong.


A. Histograms

[Figure 5. Average execution time in seconds (y-axis 0–1.4) of greedy-ordering variable elimination for each of the four problems.]

Problem        Average time (s)
Insurance – 1  0.629
Insurance – 2  1.086
Carpo – 1      0.088
Carpo – 2      0.087

Table 1. Average time of execution for variable elimination on the problems from Task 2. Averages are taken across ten independent runs each, which are illustrated in Figure 5.

B. Discussion

As can be seen from Table 1, the time needed for variable elimination is much smaller with a greedy elimination ordering than with a random ordering. This makes sense: a random ordering can happen to eliminate a parent of many children, creating a huge factor that slows the algorithm down and eats up memory. Greedy-ordering variable elimination, by contrast, works very well. Even in the cases from Section 3 in which we did not run out of memory, the greedy algorithm tends to be about 100-200 times faster.
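A greedy elimination ordering of this kind can be sketched as a min-degree heuristic on the (undirected) moral graph: repeatedly eliminate the variable with the fewest neighbors, adding fill-in edges among its neighbors. This is a simplified sketch with an arbitrary tie-break, not necessarily our exact implementation:

```python
def min_degree_ordering(adj):
    """Greedy elimination ordering: repeatedly eliminate the variable with
    the fewest neighbors, connecting its remaining neighbors (fill-in)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # defensive copy
    order = []
    while adj:
        # Pick the variable whose elimination creates the smallest factor.
        v = min(adj, key=lambda u: len(adj[u]))
        neighbors = adj.pop(v)
        for u in neighbors:
            adj[u].discard(v)
        # Fill-in: neighbors of v become pairwise connected.
        for u in neighbors:
            for w in neighbors:
                if u != w:
                    adj[u].add(w)
        order.append(v)
    return order

# Toy moral graph: B and C both depend on A; D depends on B and C.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(min_degree_ordering(graph))
```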


Functionality

Each of our results below looks like it is in the right neighborhood. We give more explicit quality results in the problems that follow.

<[Burglary] = [true]> = 0.4551612029260302

<[Earthquake] = [true]> = 2.8417163967036946E-4

<[PropCost] = [Million]> = 0.021563876240368398

<[PropCost] = [TenThou]> = 0.35877461270610517

<[PropCost] = [Thousand]> = 0.44861060067149516

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

<[PropCost] = [Million]> = 0.030620517617711222

<[PropCost] = [TenThou]> = 0.35048331774243846

<[PropCost] = [Thousand]> = 0.4555035859058312

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

<[PropCost] = [Million]> = 0.032866049889275516

<[PropCost] = [TenThou]> = 0.30414914618811645

<[PropCost] = [Thousand]> = 0.46121321229624807

<[N112] = [1]> = 0.00898716978823346


<[N143] = [1]> = 0.08275054367376986

<[Burglary] = [true]> = 0.29

<[Earthquake] = [true]> = 0.158

<[PropCost] = [Million]> = 0.01

<[PropCost] = [TenThou]> = 0.355

<[PropCost] = [Thousand]> = 0.5750000000000001

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

<[PropCost] = [Million]> = 0.011

<[PropCost] = [TenThou]> = 0.34

<[PropCost] = [Thousand]> = 0.559

P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

<[PropCost] = [Million]> = 0.038

<[PropCost] = [TenThou]> = 0.372

<[PropCost] = [Thousand]> = 0.377

<[N112] = [1]> = 0.03


<[N143] = [1]> = 0.078

A. Results

[Figure: KL divergence (y-axis 0–4.0E-03) versus size of prefix thrown away (x-axis 0–1000), individual Gibbs-sampling runs. Each run used 2000 samples and threw away the first x samples, the independent variable expressed on the x-axis.]

[Figure: Average KL divergence (y-axis 0–1.2E-03) across runs versus size of prefix thrown away (x-axis 0–1000).]


B. Discussion

In this analysis, we ran the Gibbs sampler with 2000 samples on the same problem (Carpo – 1). For each iteration, we threw away a variable number of the initial samples. The idea is that since Gibbs sampling is a Markov-chain algorithm, each sample depends heavily on the samples before it. Since we choose a random initialization vector for the variables, it can take some "burn in" time before the algorithm begins to settle into the right global solution.

The results have a fairly nice characteristic curve, as can be seen in the average graph, with the only exception at the point where we threw away the first 600 samples. Looking at the individual runs, however, at x = 600 there was a single outlier with an extremely high KL divergence; given the many runs we did, we can ignore it. It seems that the ideal "burn in" time, a trade-off between good initialization and diversity of counted samples, is 800 samples.
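The burn-in procedure itself can be illustrated with a toy sketch. The two-state chain, its 0.3 flip probability, and the sample counts here are made up for illustration (the project used the Carpo network); the point is simply that discarding a prefix removes samples biased by the arbitrary starting state.

```python
import random

def estimate_with_burn_in(chain, burn_in):
    """Estimate P(X = 1) from chain samples, discarding the first
    `burn_in` samples so the arbitrary initialization is not counted."""
    kept = chain[burn_in:]
    return sum(kept) / len(kept)

# Toy two-state Markov chain: starts at 0, flips with probability 0.3,
# so its stationary distribution has P(X = 1) = 0.5.
random.seed(0)
x, chain = 0, []
for _ in range(2000):
    if random.random() < 0.3:
        x = 1 - x
    chain.append(x)

# Early samples are biased toward the starting state; burn-in reduces that.
print(estimate_with_burn_in(chain, burn_in=200))
```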

A. Results

We present results indexed first by algorithm (Likelihood Weighting, then Gibbs Sampling) and then by problem. Within each problem we display two graphs: the first showing the results of ten iterations, and the second showing the average KL divergence across those iterations.
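The KL divergence plotted in these graphs is the standard discrete divergence between the exact posterior (from variable elimination) and the sampled estimate. A minimal sketch, with made-up example distributions:

```python
import math

def kl_divergence(p_exact, q_estimate):
    """D_KL(P || Q) = sum over values v of P(v) * log(P(v) / Q(v))."""
    return sum(p * math.log(p / q_estimate[v])
               for v, p in p_exact.items() if p > 0)

exact = {"Thousand": 0.5, "TenThou": 0.4, "Million": 0.1}
approx = {"Thousand": 0.45, "TenThou": 0.45, "Million": 0.1}

print(kl_divergence(exact, exact))   # identical distributions diverge by 0
print(kl_divergence(exact, approx))  # small positive divergence
```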


1. Likelihood Weighting

Likelihood Weighting - Problem Insurance1

[Figure 6. Divergences (KL, y-axis 0–7.0E-02) resulting from Likelihood Weighting applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 20000.]

[Figure 7. Average KL divergence (y-axis 0–0.035) resulting from applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 100 and 2000.]


[Figure 8. Divergences (KL, y-axis 0–7.0E-02) resulting from Likelihood Weighting applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 100 and 2000.]

[Figure 9. Average KL divergence (y-axis 0–0.02) resulting from applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 100 and 2000.]


[Figure 10. Divergences (KL, y-axis 0–8.0E-02) resulting from Likelihood Weighting applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 100 and 2000.]

[Figure 11. Average KL divergence (y-axis 0–0.05) resulting from applying Likelihood Weighting to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 100 and 2000.]


[Figure 12. Divergences (KL, y-axis 0–2.5E-02) resulting from Likelihood Weighting applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 100 and 2000.]

[Figure 13. Average KL divergence (y-axis 0–0.007) resulting from applying Likelihood Weighting to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 100 and 2000.]


2. Gibbs Sampling

[Figure 14. Divergences (KL, y-axis 0–1.2) resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.]


[Figure 15. Average divergence (y-axis 0–0.4) resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.]

[Figure 16. Divergences (KL, y-axis 0–1.2) resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.]


[Figure 17. Average divergence (y-axis 0–0.25) resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.]

[Figure 18. Divergences (KL, y-axis 0–4.0E-03) resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.]


[Figure 19. Average divergence (y-axis 0–1.2E-03) resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.]

[Figure 20. Divergences (KL, y-axis 0–0.035) resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.]


[Figure 21. Average divergence (y-axis 0–0.014) resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.]

B. Discussion of Results

Four interesting things:

As seen from the figures in Section 7.A.1, likelihood weighting tends to converge after about 500 samples, and always by 1000 samples in our problems and analyses.

One might expect Gibbs sampling to converge in about the same time, if not better. It turns out that Gibbs takes much longer: it typically converges by 5000 samples, a full order of magnitude higher, as can be seen from the figures in Section 7.A.2. This is likely because of the Markov-chain approach used: since each sample depends on the ones before it, it can take many iterations before the algorithm settles into the global optimum, whereas likelihood weighting by definition draws samples with the appropriate probabilities (i.e. weights).

The convergence of Likelihood Weighting in Problem 3, as illustrated

in Figure 10 and Figure 11, exhibits very interesting properties. In the

other problems, likelihood weighting runs tended to exhibit relatively

low variance in time to convergence. However, here we see some runs which converged very quickly, and others that took abnormally long. This behavior appears specific to this problem, and thus is likely induced by some characteristic of it; one likely explanation is that our query variable is a leaf node in a very poly-tree-like network.

3. Convergence is logarithmic

This is an evident feature of all of the graphs, and it has enormous implications for the choice of algorithm: each additional increment of accuracy costs many more samples, so a sampler only gradually approaches the right answer. In the case of the sampling methods that we surveyed, it unfortunately takes infinite time to arrive at exactly the right answer. However, it is important to note that variable elimination always arrives at the exact answer. Thus, if a user needs completeness (i.e. the right answer), they should probably use variable elimination.

Even if a user is content to be only x% right, they still cannot rely on sampling methods outright, since individual runs vary. This gives rise to the "x% correct y% of the time" metric, and we certainly see this in our graphs.

This is a very interesting point: in both problems 3 and 4 from Task 2 under Gibbs sampling, one of the runs from each problem does not converge to zero. Instead, it seems to converge to a local optimum (which is not the global optimum). This can be seen in the pink line in Figure 18 and the jungle-green line in Figure 20. One could probably construct a very simple network that would not provoke this behavior.
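The contrast between the two samplers in points 1 and 2 can be made concrete with a sketch of likelihood weighting on a hypothetical two-node network A -> B with evidence B = 1. The probabilities here are made up and this is not one of the project networks; it only illustrates that each sample is independent and weighted by the likelihood of the evidence.

```python
import random

# Toy network: P(A=1) = 0.3; P(B=1 | A=1) = 0.9, P(B=1 | A=0) = 0.2.
P_A = 0.3
P_B1_GIVEN_A = {1: 0.9, 0: 0.2}

def likelihood_weighting(n_samples, rng):
    """Estimate P(A=1 | B=1): sample A from its prior, fix the evidence
    B=1, and weight the sample by P(B=1 | A)."""
    num = den = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < P_A else 0
        w = P_B1_GIVEN_A[a]   # evidence is not sampled, just weighted
        num += w * a
        den += w
    return num / den

exact = (0.3 * 0.9) / (0.3 * 0.9 + 0.7 * 0.2)   # Bayes' rule: 0.27 / 0.41
estimate = likelihood_weighting(20_000, random.Random(0))
print(exact, estimate)
```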

In comparing the computation time of sampling methods to variable elimination, we limit ourselves to greedy-ordering variable elimination, since random ordering is very sub-optimal (see Section 4). It turns out that for the networks and queries we considered, variable elimination is the champ on both accuracy and speed. As can be seen from Table 2, variable elimination completed in about a second or less on each problem, while Gibbs took about 15 seconds and Likelihood Weighting took around 5 seconds. This is with 1000 samples for the sampling algorithms, and effectively infinite samples for variable elimination.

Our results might have been different if the networks involved were much denser (i.e. more connected) or much larger.


Algorithm              Insurance 1   Insurance 2   Carpo 1   Carpo 2
Variable Elimination   0.741         1.142         0.120     0.090
Gibbs Sampling         12.778        13.530        19.228    18.045
Likelihood Weighting   4.377         4.687         5.608     5.317

Table 2. Execution time in seconds of the various algorithms on the four problems from Task 2.
