Homework Day 2 - Mahaputra
Homework 1
Table 1. Probability Comparison
No.   p      Info(p)       Entropy(p)   q      Info(q)       Entropy(q)
1     1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
2     1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
3     1/12   3.584962501   0.298747     1/6    2.584962501   0.430827
4     1/12   3.584962501   0.298747     1/6    2.584962501   0.430827
5     1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
6     1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
7     1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
8     1/12   3.584962501   0.298747     1/24   4.584962501   0.191040
9     1/12   3.584962501   0.298747     1/24   4.584962501   0.191040
10    1/12   3.584962501   0.298747     1/24   4.584962501   0.191040
11    1/12   3.584962501   0.298747     1/24   4.584962501   0.191040
12    1/12   3.584962501   0.298747     1/12   3.584962501   0.298747
Sum(p) = 1    Sum(Entropy-p) = 3.584963
Sum(q) = 1    Sum(Entropy-q) = 3.418296
Sum(Entropy-p||q) = 3.751629
Sum(Entropy-q||p) = 3.584963
KL Divergence (p||q) = -0.33333
Name : Mahaputra Madani Senen
Risk Management & System Safety
Student No. : M10801850 Homework Day 2
Due date : 16/12/2019
[Figure: Probability Comparison, series p and q]
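The quantities in Table 1 can be recomputed with a short script. The following is a minimal sketch, assuming base-2 logarithms (which matches Info(1/12) = 3.584962501) and the standard definition KL(p||q) = H(p, q) - H(p); the function names are illustrative and not taken from the original spreadsheet. Under that definition the divergence is never negative, and for these two distributions it comes out at about 0.1667, whereas the value -0.33333 in the table equals Sum(Entropy-q) minus Sum(Entropy-p||q).

```python
import math

# Distributions from Table 1 (12 events each).
p = [1 / 12] * 12
q = [1 / 12] * 2 + [1 / 6] * 2 + [1 / 12] * 3 + [1 / 24] * 4 + [1 / 12]


def info(prob):
    """Information content in bits: -log2(prob)."""
    return -math.log2(prob)


def entropy(dist):
    """Shannon entropy H(dist) in bits."""
    return sum(x * info(x) for x in dist)


def cross_entropy(a, b):
    """Cross entropy H(a, b): events weighted by a, information taken from b."""
    return sum(x * info(y) for x, y in zip(a, b))


def kl_divergence(a, b):
    """KL(a||b) = H(a, b) - H(a), assuming the standard definition."""
    return cross_entropy(a, b) - entropy(a)


if __name__ == "__main__":
    print("H(p)     =", round(entropy(p), 6))
    print("H(q)     =", round(entropy(q), 6))
    print("H(p, q)  =", round(cross_entropy(p, q), 6))
    print("H(q, p)  =", round(cross_entropy(q, p), 6))
    print("KL(p||q) =", round(kl_divergence(p, q), 6))
```

The four entropy sums agree with the sums in Table 1; only the final subtraction differs.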
Homework 2
Table 2. The Entropy Calculation Result
Head Tail Info(H) Info(T) Entropy
0.0001 0.9999 13.28771 0.000144 0.001473
0.01 0.99 6.643856 0.0145 0.080793
0.02 0.98 5.643856 0.029146 0.141441
0.03 0.97 5.058894 0.043943 0.194392
0.04 0.96 4.643856 0.058894 0.242292
0.05 0.95 4.321928 0.074001 0.286397
……….. Continue until…
0.48 0.52 1.058894 0.943416 0.998846
0.49 0.51 1.029146 0.971431 0.999711
0.50 0.5 1 1 1
0.51 0.49 0.971431 1.029146 0.999711
0.52 0.48 0.943416 1.058894 0.998846
0.53 0.47 0.915936 1.089267 0.997402
0.54 0.46 0.888969 1.120294 0.995378
0.55 0.45 0.862496 1.152003 0.992774
0.56 0.44 0.836501 1.184425 0.989588
0.57 0.43 0.810966 1.217591 0.985815
Based on Table 2, a probability of 0.5 gives the highest uncertainty, and the entropy
reaches its maximum value of 1 at that point.
[Figure 2. Entropy]
As shown in Figure 2, the entropy is highest when Head and Tail have equal probability
(0.5). This can be interpreted as follows: when there are several potential events and each
event has the same probability (a uniform distribution), the risk taker will find it difficult
to decide, since they cannot guess which event is more likely to happen.
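The entropy column of Table 2 can be reproduced as follows. This is a minimal sketch, assuming base-2 logarithms and a grid of Head probabilities made of 0.0001 followed by 0.01, 0.02, ..., 0.99; the grid is an assumption, because the table elides most of its rows, and the function name is illustrative.

```python
import math


def binary_entropy(head):
    """Entropy in bits of a coin with P(Head) = head and P(Tail) = 1 - head."""
    total = 0.0
    for prob in (head, 1.0 - head):
        if prob > 0.0:  # a zero-probability outcome contributes nothing
            total += -prob * math.log2(prob)
    return total


# Assumed grid: the first row of Table 2, then steps of 0.01.
probs = [0.0001] + [i / 100 for i in range(1, 100)]
rows = [(head, binary_entropy(head)) for head in probs]

# The row with the largest entropy.
best = max(rows, key=lambda row: row[1])

if __name__ == "__main__":
    for head, ent in rows[:6]:
        print(f"{head:.4f}  {1 - head:.4f}  {ent:.6f}")
    print("maximum entropy at Head =", best[0])
```

The maximum falls at Head = 0.5, and the curve is symmetric around that point, as the rows for 0.48 and 0.52 in Table 2 show.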
Homework 3
In this case, we set a = 2 and b = 4. We generated 10000 random numbers p and
calculated the inverse CDF, F^-1, using the following formula:
F^-1(p) = a + p (b - a)
We then obtained a histogram from the calculated values. From the histogram we calculated
the sample PDF, the theoretical PDF, and the KL-Divergence, as shown in Table 3. The result
for the KL-Divergence is -0.07761672. Figure 3 shows the histogram of F^-1(p) for the 10000
random numbers.
Table 3. The F^-1 calculation result
Bin           Frequency   PDF (Sample)   PDF (Theory)   Info (Sample)   Info (Theory)
2.000200506   1           0.0001         0.00990099     13.28771238     6.658211483
2.02019745    96          0.0096         0.00990099     6.702749879     6.658211483
2.040194395   95          0.0095         0.00990099     6.717856771     6.658211483
2.060191339   98          0.0098         0.00990099     6.673002535     6.658211483
2.080188284   94          0.0094         0.00990099     6.733123528     6.658211483
2.100185228   106         0.0106         0.00990099     6.559791925     6.658211483
2.120182173   109         0.0109         0.00990099     6.519528055     6.658211483
2.140179117   99          0.0099         0.00990099     6.658355759     6.658211483
2.160176062   90          0.009          0.00990099     6.795859283     6.658211483
……. Continue …..
2.640102731   100         0.01           0.00990099     6.64385619      6.658211483
2.660099676   108         0.0108         0.00990099     6.532824877     6.658211483
2.68009662    117         0.0117         0.00990099     6.41734766      6.658211483
2.700093565   104         0.0104         0.00990099     6.587272661     6.658211483
2.720090509   115         0.0115         0.00990099     6.442222329     6.658211483
2.740087454   101         0.0101         0.00990099     6.629500897     6.658211483
2.760084398   99          0.0099         0.00990099     6.658355759     6.658211483
2.780081343   87          0.0087         0.00990099     6.844768884     6.658211483
2.800078287   100         0.01           0.00990099     6.64385619      6.658211483
2.820075232   108         0.0108         0.00990099     6.532824877     6.658211483
2.840072176   115         0.0115         0.00990099     6.442222329     6.658211483
2.860069121   112         0.0112         0.00990099     6.480357457     6.658211483
2.880066066   102         0.0102         0.00990099     6.615287038     6.658211483
2.90006301    104         0.0104         0.00990099     6.587272661     6.658211483
2.920059955   95          0.0095         0.00990099     6.717856771     6.658211483
…. Continue …
3.939904126   99          0.0099         0.00990099     6.658355759     6.658211483
3.959901071   88          0.0088         0.00990099     6.828280761     6.658211483
3.979898015   99          0.0099         0.00990099     6.658355759     6.658211483
More          101         0.0101         0.00990099     6.629500897     6.658211483
Sum (Freq) = 10000
Sum (Ent-Sample) = 6.638387169
Sum (Ent-Theory) = 6.658211
Sum (Entropy-Sample||Theory) = 6.658211483
Sum (Entropy-Theory||Sample) = 6.716004
KL Divergence (Sample||Theory) = -0.07761672
[Figure 3. Histogram of F^-1(p) for 10000 random numbers]
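The procedure in Homework 3 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the original spreadsheet calculation: it uses 100 equal-width bins (Table 3 uses 101 bins, including a "More" bin), a fixed random seed, and the standard definition KL(Sample||Theory) = sum of s * log2(s / t), which is never negative. All function names are illustrative.

```python
import math
import random


def inverse_uniform_cdf(p, a=2.0, b=4.0):
    """Inverse CDF of Uniform(a, b): F^-1(p) = a + p (b - a)."""
    return a + p * (b - a)


def sample_histogram(n=10_000, bins=100, a=2.0, b=4.0, seed=0):
    """Draw n values by inverse transform sampling and count them per bin."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    counts = [0] * bins
    width = (b - a) / bins
    for _ in range(n):
        x = inverse_uniform_cdf(rng.random(), a, b)
        index = min(int((x - a) / width), bins - 1)  # guard the upper edge
        counts[index] += 1
    return counts


def kl_to_uniform(counts):
    """KL(Sample||Theory) in bits, where Theory gives every bin equal mass."""
    n = sum(counts)
    theory = 1.0 / len(counts)
    return sum((c / n) * math.log2((c / n) / theory) for c in counts if c > 0)


if __name__ == "__main__":
    counts = sample_histogram()
    print("total frequency:", sum(counts))
    print("KL(Sample||Theory):", kl_to_uniform(counts))
```

With 10000 samples the bin frequencies scatter around n / bins, as in Table 3, so the divergence is small but positive.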
Homework 4
Table 4 shows the calculation used to determine the event times. Time advances in steps of
0.01. At each step, an event occurs if the random number is greater than 0.7, and the event
time is the time multiplied by the event indicator. The event times were then sorted from
smallest to largest.
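The rule described above can be sketched as follows. This is a minimal sketch, not the original spreadsheet: the number of steps (1000) and the random seed are assumptions, and steps without an event, whose event time would be 0, are dropped before sorting. The function name is illustrative.

```python
import random


def simulate_event_times(steps=1000, dt=0.01, threshold=0.7, seed=0):
    """Return the sorted times at which an event occurs.

    At each time step a uniform random number is drawn; an event occurs
    when that number is greater than the threshold.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    event_times = []
    for k in range(1, steps + 1):
        time = k * dt
        event = 1 if rng.random() > threshold else 0
        event_time = time * event  # 0 when no event occurs at this step
        if event:
            event_times.append(event_time)
    return sorted(event_times)  # smallest to largest


if __name__ == "__main__":
    times = simulate_event_times()
    print("number of events:", len(times))
    print("first five event times:", [round(t, 2) for t in times[:5]])
```

With a threshold of 0.7, roughly 30% of the steps produce an event.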
Homework 5
1. Cohort definition
In statistics, marketing, and demographics, a cohort is a group of subjects who share a
defining characteristic (usually subjects who experience a common event in a certain
period of time, such as birth or graduation). In other words, a cohort is a group of people
who have something in common and who can represent the source population, that is, the
population from which cases of disease arise. Examples include all employees in an office
building, everyone who attended a football game, or all the residents of a neighborhood.
2. Prospective definition
A prospective cohort study is a type of research in which exposure data (baseline) are
collected from subjects recruited before the outcome of interest has developed. The subjects
are then followed through time (into the future) to record when they develop the outcome.
Ways to follow up on research subjects include telephone interviews, face-to-face
interviews, physical exams, medical or laboratory tests, and mailed questionnaires (source:
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_CohortStudies/EP713_CohortStudies_print.html).
3. Retrospective definition
Retrospective studies begin with subjects who are at risk of the outcome or illness of
interest. They identify the outcome from where the subject was when the study began, and
then look back into the subject's past to identify exposures. Retrospective studies use
records: clinical, educational, birth certificates, death certificates, etc. This can be
difficult, because the data needed for the study may not have been recorded. These studies
may also involve many exposures, which can make them difficult to carry out (source:
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_CohortStudies/EP713_CohortStudies_print.html).
An example of a retrospective cohort study is a demographer examining a group of people
born in 1970 who have type 1 diabetes. The demographer starts by looking at historical data.
However, if the historical data are inadequate for inferring the source of type 1 diabetes,
the demographer's results will not be accurate (source:
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_CohortStudies/EP713_CohortStudies5.html).
Figure 6 illustrates a retrospective cohort study.
Homework 6
In this homework, we want to see how statistics can be manipulated. We can change the
correlation of a data set by changing just two of its data points, as shown in the tables below.
Table 6. Data sets and ranks with their correlation values
x             y          rank(x)   rank(y)
0.436553614   0.527864   113       84
0.529963464   0.149845   92        164
0.104397213   0.060304   176       185
0.226603218   0.79876    147       38
0.219025289   0.603033   147       69
0.155444842   0.628113   163       64
0.111213715   0.147148   170       162
0.460870931   0.985491   104       4
0.33675166    0.953943   128       10
0.811181187   0.546267   37        74
….. continue ….
0.358941984   0.971258   8         3
0.540433456   0.873975   7         3
0.577168944   0.572892   5         5
0.769720532   0.580178   4         4
0.783577894   0.103055   3         8
0.79122632    0.971289   2         2
0.571797046   0.182547   2         6
0.261096791   0.443769   3         3
0.894212032   0.653391   1         2
0.185947459   0.308999   2         2
0.263400431   0.997184   1         1
0.146175717   0.295671   1         1
Correlation (x, y) = -0.0697
Correlation (rank) = 0.393542
Table 7. Data sets and ranks with their new correlation values when two data points are
changed
x             y          rank(x)   rank(y)
0.436553614   0.527864   113       84
0.529963464   0.149845   92        163
0.104397213   0.060304   175       185
0.226603218   0.79876    146       40
0.219025289   0.603033   146       70
…. Continue ….
0.783577894   0.103055   3         8
0.79122632    0.971289   2         4
0.571797046   0.182547   2         6
50            50         1         1
-50           -50        4         5
0.185947459   0.308999   2         3
0.263400431   0.997184   1         1
0.146175717   0.295671   1         2
Tables 6 and 7 show that when two data points are replaced with one very large and one very
small value, using the same value for x and y in each row, the correlation of the random
numbers rises sharply, from -0.0697 to 0.996321, while the correlation of the ranks does not
change significantly.
Table 8. Data sets and ranks with their new correlation values when two data points are
changed
x             y          rank(x)   rank(y)
0.436553614   0.527864   113       84
0.529963464   0.149845   92        163
0.104397213   0.060304   175       184
0.226603218   0.79876    146       39
0.460870931   0.985491   104       5
….. continue …
0.783577894   0.103055   3         7
0.79122632    0.971289   2         3
0.571797046   0.182547   2         5
50            -50        1         6
-50           50         4         1
0.185947459   0.308999   2         2
0.263400431   0.997184   1         1
0.146175717   0.295671   1         1
Correlation (x, y) = -0.99665
Correlation (rank) = 0.401699
Since we changed two data points but gave x and y opposite signs (positive or negative) in
each row, the correlation value becomes significantly smaller than before, from -0.0697 to
-0.99665, while the correlation of the ranks does not change significantly. From this case,
we can conclude that even though the statistic can be manipulated, in such cases we can still
find that something is wrong with the data.
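The effect described in Homework 6 can be reproduced with synthetic data. This is a minimal sketch under stated assumptions: the sample size of 200, the random seed, and the choice to overwrite the last two points are all assumptions, since the original data set is only partly shown. The rank correlation here is the Pearson correlation of the ranks, with rank 1 given to the largest value, and all names are illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Rank 1 for the largest value; ties are not expected for random floats."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(xs, ys):
    """Correlation of the ranks of xs and ys."""
    return pearson(ranks(xs), ranks(ys))


def with_two_points(xs, ys, first, second):
    """Copy the data with the last two (x, y) points overwritten."""
    return xs[:-2] + [first[0], second[0]], ys[:-2] + [first[1], second[1]]


rng = random.Random(0)  # fixed seed (assumption) for repeatability
x = [rng.random() for _ in range(200)]
y = [rng.random() for _ in range(200)]

# Same sign in each row, as in Table 7; opposite signs, as in Table 8.
x_same, y_same = with_two_points(x, y, (50, 50), (-50, -50))
x_opp, y_opp = with_two_points(x, y, (50, -50), (-50, 50))

if __name__ == "__main__":
    for label, xs, ys in [("original", x, y),
                          ("same sign", x_same, y_same),
                          ("opposite sign", x_opp, y_opp)]:
        print(label, round(pearson(xs, ys), 4), round(rank_correlation(xs, ys), 4))
```

Two extreme points are enough to push the ordinary correlation close to +1 or -1, while the rank correlation moves only slightly, because each outlier can occupy only one rank position no matter how large its value is.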