Tutorial Sheet EN Solution

Tutorial Statistics and Probability
Summer Term 2022
Solutions
Contents
1 Statistical attributes and variables 3
2 Measures to describe statistical distributions 10
3 Two dimensional distributions 14
4 Linear regression 20
5 Combinatorics and counting principles 25
6 Fundamentals of probability theory 27
7 Random variables in one dimension 31
8 Multidimensional random variables 37
9 Stochastic models and special distributions 40
10 Limit theorems 46
11 Point estimation of population parameters 48
12 Interval estimation 50
13 Statistical hypotheses testing 56
1 Statistical attributes and variables
Comprehension questions
Explain in your own words :
• What is statistics about and what does descriptive statistics mean?
• What is a frequency density function and what a distribution function, and which (mathematical)
features do they have to fulfill? How are they linked to each other?
• What kind of information is easily given by a frequency density function and by a distribution
function?
Solution of Exercise 1
Potential examples:
Attribute                  Statistical units            Characteristic values            Type                      Scale
hair color                 males aged 60 to 65          black, brown, blond, gray, ...   qualitative               nominal
income                     tutors                       15 €/h to 20 €/h                 quantitative discrete     cardinal
school grades              class of 2010                0-15 points                      quantitative discrete     ordinal
gender                     students at FS               male, female                     qualitative               nominal
bank account transfers     accounts at a savings bank   N0 or e.g. (0-1000)              quantitative discrete     cardinal
per month
body height                players of the NBA           1.60 m - 2.30 m                  quantitative continuous   cardinal
...
(a) The statistical units are the days on which the kiosk is open. Possible attribute values are zero
and all natural numbers; more precisely, they are bounded above by the number of newspapers
delivered to the kiosk per day. The attribute is quantitative and discrete, and it is measurable
on a cardinal scale.
(b) The total number of days is n = 200. Dividing each absolute frequency ni by n yields the
relative frequencies hi . Accumulating these yields the relative cumulative frequencies Hi ,
which are the values of the distribution function at its steps.
i    newspapers sold xi   number of days ni   rel. frequency hi   rel. cum. frequency Hi
1    0                    21                  0.105               0.105
2    1                    46                  0.230               0.335
3    2                    54                  0.270               0.605
4    3                    40                  0.200               0.805
5    4                    24                  0.120               0.925
6    5                    10                  0.050               0.975
7    6                    5                   0.025               1.000
Σ                         200                 1.000
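Distribution function $H : \mathbb{R} \to [0, 1]$:
\[
H(x) =
\begin{cases}
0     & \text{for } x < 0 \\
0.105 & \text{for } 0 \le x < 1 \\
0.335 & \text{for } 1 \le x < 2 \\
0.605 & \text{for } 2 \le x < 3 \\
0.805 & \text{for } 3 \le x < 4 \\
0.925 & \text{for } 4 \le x < 5 \\
0.975 & \text{for } 5 \le x < 6 \\
1     & \text{for } 6 \le x
\end{cases}
\]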
[Figure: step graph of the distribution function H(x) for x from 0 to 7, with steps at 0.105, 0.335, 0.605, 0.805, 0.925, 0.975 and 1.000]
(c) H(2) = (21 + 46 + 54)/200 = 0.605

On 60.5% of all days the kiosk owner sells two or fewer newspapers of this kind.
[Figure: distribution function H̄(x) in % over rent expenses from 250 to 650, with the readings for (a) 30%, (b) x[80%] at 80% and (c) xMed at 50% marked]
(a) $\bar{H}(450) - \bar{H}(350) = 40\% - 10\% = 30\%$
(b) $\bar{H}(x) \overset{!}{=} 80\% \implies x = 550$
(c) $\bar{H}(x_{\mathrm{Med}}) \overset{!}{=} 50\% \implies x_{\mathrm{Med}} = 500$
(d) [Figure: histogram h̄(x) in % over rent expenses x from 250 to 650, vertical axis from 0 to 0.6]
(a) We use a working table:
i    Points (from ... to below ...)   ni     hi      Hi
1    0 − 25                           50     0.125   0.125
2    25 − 50                          90     0.225   0.350
3    50 − 75                          170    0.425   0.775
4    75 − ...                         90     0.225   1.000
Σ                                     400    1.000
[Figure: histogram h̄(x) in % and distribution function H̄(x) in %, each over x from 0 to 100 points, with the reading for (b) marked]
(b) The number of students with at most 90 points cannot be read from the table directly, so it
has to be approximated from the given data. There are several ways to do this:
• Since 90 points lies in the last class, we have H(90) = H3 + x. The unknown x is the part
of h4 that covers the range from 75 to 90 points. Assuming that the distribution inside
this class is uniform and continuous, we obtain:
\[ x = h_4 \cdot \frac{90 - 75}{100 - 75} = 0.225 \cdot 0.6 = 0.135 \]
So we get H(90) = H3 + 0.135 = 0.775 + 0.135 = 91%, or in absolute values 364 (= 91% · 400)
students with at most 90 points.
• Alternatively, start with H(90) = H(100) − y, where y is the fraction of students who
achieved more than 90 points:
\[ y = h_4 \cdot \frac{100 - 90}{100 - 75} = 0.225 \cdot 0.4 = 0.09 \]
Thus again H(90) = H4 − 0.09 = 1 − 0.09 = 91%.
For this approximation we assumed that points are continuously and uniformly distributed within
the class from 75 to 100 points. If you assume only discrete numbers of points, you get slightly
deviating results. Under appropriate assumptions, those are also valid.
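The interpolation in (b) can be sketched in a few lines of Python. This is only an illustration under the same assumptions as above (uniform distribution within each class, last class ending at 100 points); the function name `grouped_cdf` is made up for this sketch.

```python
# Class limits and cumulative relative frequencies from the working table in (a).
# Assumption: the open last class "75 - ..." ends at 100 points.
limits = [0, 25, 50, 75, 100]
cum_freq = [0.0, 0.125, 0.350, 0.775, 1.000]

def grouped_cdf(x, limits, cum_freq):
    """Interpolate the cumulative frequency linearly inside the class containing x."""
    if x <= limits[0]:
        return 0.0
    if x >= limits[-1]:
        return 1.0
    for k in range(1, len(limits)):
        if x <= limits[k]:
            lower, upper = limits[k - 1], limits[k]
            share = (x - lower) / (upper - lower)  # portion of the class below x
            return cum_freq[k - 1] + share * (cum_freq[k] - cum_freq[k - 1])

h90 = grouped_cdf(90, limits, cum_freq)
students = h90 * 400  # absolute number of students with at most 90 points
print(round(h90, 3), round(students))
```

Both routes in the solution (adding x to H3, or subtracting y from H4) correspond to the same straight line between the class limits, which is why they agree.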
(a)
i     class                           absolute frequency ni   relative frequency hi   rel. cumulative frequency HK(ξi)
1     [100, 200] = [ξ0 , ξ1 ]         2                       0.04                    0.04
2     (200, 300] = (ξ1 , ξ2 ]         7                       0.14                    0.18
3     (300, 400]                      9                       0.18                    0.36
4     (400, 500]                      8                       0.16                    0.52
5     (500, 600]                      6                       0.12                    0.64
6     (600, 700]                      9                       0.18                    0.82
7     (700, 800]                      2                       0.04                    0.86
8     (800, 900]                      1                       0.02                    0.88
9     (900, 1000]                     3                       0.06                    0.94
10    (1000, 1100]                    1                       0.02                    0.96
11    (1100, 1200] = (ξ10 , ξ11 ]     2                       0.04                    1.00
Σ                                     50                      1.00
(b) Histogram:
[Figure: histogram h̄(x) over x from 0 to 1200, vertical axis from 0 to 0.002]
[Figure: distribution function of classes H̄(x) over x from 0 to 1200, vertical axis from 0 to 1]
(c) Median
i. using the given data set:
\[ x_{\mathrm{med}} = \frac{x_{25} + x_{26}}{2} = \frac{489.9 + 498.0}{2} = 493.95 \]
ii. using the distribution function of classes:
\[ x_{\mathrm{med}} = \bar{H}^{-1}(50\%) = 400 + \frac{50\% - 36\%}{52\% - 36\%} \cdot (500 - 400) = 400 + 87.5 = 487.5 \]
or
\[ x_{\mathrm{med}} = \bar{H}^{-1}(50\%) = 500 - \frac{52\% - 50\%}{52\% - 36\%} \cdot (500 - 400) = 500 - 12.5 = 487.5 \]
(d) i. Within the ordered data set we have
\[ x_{35} = 642.7 \le 650 < 650.6 = x_{36} \]
Therefore on 35 days the turnover is 650 EUR or less.
ii. Using the distribution function of classes H̄ we obtain the relative frequency of days with
a turnover of 650 EUR or less by:
\[ \bar{H}(650) = \bar{H}(600) + \frac{650 - 600}{700 - 600} \cdot \left( \bar{H}(700) - \bar{H}(600) \right) = 0.64 + \frac{1}{2} \cdot (0.82 - 0.64) = 0.73 = 73\% \]
So, according to the distribution function of classes, on 73% · 50 = 36.5 (or rounded to 37)
days the turnover was 650 EUR or less.
Using the polygon approximation H̄ as distribution function presumes a uniform distribution
of the observations within the classes. In class (600, 700], however, only 3 out of 9 values
lie below the class midpoint, whereas 6 observations lie above it:
3 values (≤ 650): 612.6, 638.6, 642.7
6 values (> 650): 650.6, 651.1, 651.3, 687.8, 689.6, 690.7
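The polygon approximation used in (c) ii. and (d) ii. can be written as a small Python sketch. It is an illustration only; the class limits and cumulative frequencies are taken from the table in (a), and the helper names `polygon_cdf` and `polygon_quantile` are made up here.

```python
# Class limits and cumulative relative frequencies from the table in (a).
limits = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
cum = [0.00, 0.04, 0.18, 0.36, 0.52, 0.64, 0.82, 0.86, 0.88, 0.94, 0.96, 1.00]

def polygon_cdf(x):
    """Distribution function of classes: linear between neighbouring class limits."""
    if x <= limits[0]:
        return 0.0
    if x >= limits[-1]:
        return 1.0
    for k in range(1, len(limits)):
        if x <= limits[k]:
            share = (x - limits[k - 1]) / (limits[k] - limits[k - 1])
            return cum[k - 1] + share * (cum[k] - cum[k - 1])

def polygon_quantile(p):
    """Inverse of polygon_cdf for 0 < p <= 1."""
    for k in range(1, len(limits)):
        if cum[k] >= p:
            share = (p - cum[k - 1]) / (cum[k] - cum[k - 1])
            return limits[k - 1] + share * (limits[k] - limits[k - 1])

median = polygon_quantile(0.5)   # about 487.5
share_650 = polygon_cdf(650)     # about 0.73
print(median, share_650, share_650 * 50)
```

The difference between 36.5 days from the polygon and 35 days from the raw data reflects the uneven spread of observations inside the class (600, 700].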
2 Measures to describe statistical distributions
• One of the most important measures is the arithmetic mean. Which alternatives do you know
and when would you prefer one measure over another?
• What is the difference between a quartile and a quantile?
• Why is the variance defined by a square operator, why not only as the sum of differences to the
arithmetic mean, or why not only as the sum of absolute differences?
(a) The arithmetic mean of the 9,114 numbers results as
\[ \bar{x} = \frac{1}{9114} \sum_{j=1}^{49} n_j \cdot j = \frac{187 \cdot 1 + 194 \cdot 2 + 194 \cdot 3 + \dots + 207 \cdot 49}{9114} = 25.22 . \]
(b) The uniformly weighted arithmetic mean of the 49 lotto numbers is given by
\[ \bar{x}_{\mathrm{uniform}} = \frac{1}{49} \sum_{j=1}^{49} j = \frac{1}{49} \cdot \frac{49 \cdot 50}{2} = 25 . \]
The uniformly weighted arithmetic mean is assuming that all numbers appear with the same
frequency. As the calculation shows, this is not true for the given data. However, since the
calculated number is pretty close to 25, it is likely that for very large sample sizes the calculation
approaches the uniformly weighted mean.
If 40% of the total income is increased by 12% and the remaining 60% stays constant, the average
changes by the factor
40% · 1.12 + 60% · 1 = 1.048 .
So, after the increase, the arithmetic mean is 1.048 · 3400 € = 3563.20 €.
Since only the top 20% of incomes are increased, the median (50%-quantile) is unaffected and
remains 3100 €.
(a) The arithmetic mean is $\bar{p} = \frac{1}{3} \cdot (6 + 3 + 2) = \frac{11}{3} \approx 3.6667$ .
(b) From $\frac{1}{3} \cdot \underbrace{\left( \frac{1}{6} + \frac{1}{3} + \frac{1}{2} \right)}_{=1} = \frac{1}{3}$ we get the harmonic mean as the reciprocal value: $H_p = \left( \frac{1}{3} \right)^{-1} = 3$ .
(c) The quantity-weighted arithmetic mean results as: $\frac{1}{1+2+3} \cdot (1 \cdot 6 + 2 \cdot 3 + 3 \cdot 2) = 3$
The correct answer is given by (c). This is the total payment (18) divided by the number of dozens of
oranges bought (6).
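The three candidate means can be compared with exact fractions. The sketch below assumes, as the calculation in (c) suggests, prices of 6, 3 and 2 per dozen and purchased quantities of 1, 2 and 3 dozen; the variable names are chosen for this illustration only.

```python
from fractions import Fraction

prices = [6, 3, 2]   # price per dozen (assumed from the sums above)
dozens = [1, 2, 3]   # dozens bought at each price (assumed from the weights in (c))

# (a) plain arithmetic mean of the prices
arithmetic = Fraction(sum(prices), len(prices))

# (b) harmonic mean: reciprocal of the mean of reciprocals
harmonic = len(prices) / sum(Fraction(1, p) for p in prices)

# (c) quantity-weighted mean: total payment divided by total quantity
weighted = Fraction(sum(q * p for q, p in zip(dozens, prices)), sum(dozens))

print(arithmetic, harmonic, weighted)
```

With these numbers the harmonic mean coincides with the weighted mean because the same amount (6) is spent at each price; in general only the quantity-weighted mean answers the question.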
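(a) Using
i               1      2
vi [km/h]       60     50
hi              4/9    5/9
the average speed (overall distance by total time) is obtained as weighted arithmetic average v̄:
\[
\text{average speed}
= \frac{1\,\mathrm{h} \cdot 60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \tfrac{5}{4}\,\mathrm{h} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}{1\,\mathrm{h} + \tfrac{5}{4}\,\mathrm{h}}
= \frac{60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \tfrac{5}{4} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}{\tfrac{9}{4}}
= \underbrace{\frac{4}{9} \cdot 60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \frac{5}{9} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}_{= h_1 \cdot v_1 + h_2 \cdot v_2 = \bar{v}}
= 54.4\,\tfrac{\mathrm{km}}{\mathrm{h}}
\]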
(b) The average growth rate is determined by the geometric mean G1+r of the growth factors 1 + r:
i         1       2       3
ri        1.8%    2.5%    2.0%
1 + ri    1.018   1.025   1.020
\[ G_{1+r} = \sqrt[3]{1.018 \cdot 1.025 \cdot 1.020} = \sqrt[3]{1.0643} \approx 1.0210 \]
Thus the average growth rate is about 2.1%.
(c) Using
district i                       1      2     3      4      5
rate of unemployment xi in %     4      3     5      9      6
number of unemployed ni          1600   750   1000   3600   1500
we obtain the average rate of unemployment (ratio of total number of unemployed to total
number of individuals in the examination) as the harmonic mean HX of the xi :
\[
\text{average rate of unemployment}
= \frac{\overbrace{1600 + 750 + 1000 + 3600 + 1500}^{\text{total number } n \text{ of unemployed}}}{\underbrace{\frac{1600}{4\%} + \frac{750}{3\%} + \frac{1000}{5\%} + \frac{3600}{9\%} + \frac{1500}{6\%}}_{\text{total number of individuals in the examination}}}
= \frac{8450}{150000} = 5.63\%
= \frac{n}{\frac{n_1}{x_1} + \frac{n_2}{x_2} + \dots + \frac{n_5}{x_5}} = H_X
\]
(d) The solutions $x_1, x_2$ of a quadratic equation $x^2 + p \cdot x + q = 0$ are related to the coefficients p, q by
(Vieta's formula):
\[ x_1 + x_2 = -p \quad \text{and} \quad x_1 \cdot x_2 = q \]
Thus for the solutions of $x^2 - 15x + 49 = 0$ we have
i.
\[ \bar{x} = \frac{1}{2}(x_1 + x_2) = \frac{-p}{2} = \frac{15}{2} = 7.5 \]
ii. Because of $x_1 + x_2 = 15 > 0$ and $x_1 \cdot x_2 = 49 > 0$ both solutions are positive, so we can
build the geometric (as well as the harmonic) mean.
\[ G_X = \sqrt{x_1 \cdot x_2} = \sqrt{q} = \sqrt{49} = 7 \]
iii. The reciprocals $\frac{1}{x_1}, \frac{1}{x_2}$ are solutions of the equation
\[ \frac{1}{x^2} - 15 \cdot \frac{1}{x} + 49 = 0 \iff x^2 - \frac{15}{49} x + \frac{1}{49} = 0 \quad (\text{for } x \neq 0) . \]
Thus
\[ H_X = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2}{\frac{15}{49}} = \frac{98}{15} = 6.53 . \]
(a) Working table
IQ(w)   Deviation from mean   Deviation squared   IQ(sch)   Deviation from mean   Deviation squared
90      −10                   100                 95        0                     0
90      −10                   100                 99        4                     16
97      −3                    9                   90        −5                    25
99      −1                    1                   105       10                    100
98      −2                    4                   85        −10                   100
145     45                    2025                98        3                     9
114     14                    196                 110       15                    225
80      −20                   400                 96        1                     1
85      −15                   225                 69        −26                   676
102     2                     4                   103       8                     64
Σ = 1000   0                  3064                950       0                     1216
Mean of IQ(w) = 100, s²w = 306.4; mean of IQ(sch) = 95, s²sch = 121.6
(b) The variance of the w-group is larger. The same is true for the coefficient of variation:
\[ VK_w = \frac{\sqrt{306.4}}{100} = 0.175 \qquad VK_{sch} = \frac{\sqrt{121.6}}{95} = 0.116 \]
X, Y : The sequences X and Y do not vary at all. Thus $s_X^2 = s_Y^2 = 0$.
U : $\bar{u} = 0.5$ ; $s_U^2 = \frac{0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2}{8} - 0.5^2 = 0.5 - 0.25 = 0.25$
V : $\bar{v} = \frac{5}{9}$ ; $s_V^2 = \frac{1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2}{9} - \frac{25}{81} = \frac{20}{81} \approx 0.2469$
W : $\bar{w} = \frac{33}{8}$ ; $s_W^2 = \frac{9 + 9 + 9 + 49 + 25 + 9 + 25 + 16}{8} - \frac{1089}{64} = \frac{119}{64} \approx 1.8594$
T : $\bar{t} = \frac{45}{9} = 5$ ; $s_T^2 = \frac{1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81}{9} - 25 = \frac{285}{9} - 25 = \frac{95}{3} - 25 = \frac{20}{3} \approx 6.6667$
For the calculation of variances the formula $s_X^2 = \overline{x^2} - \bar{x}^2$ was used.
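The shortcut formula (mean of squares minus squared mean) is easy to check with exact fractions. The sequences below are an assumption: they are reconstructed from the sums of squares shown above, since the exercise text itself is not part of this solution sheet.

```python
from fractions import Fraction

def variance(values):
    """Population variance via the shortcut: mean of squares minus squared mean."""
    n = len(values)
    mean = Fraction(sum(values), n)
    mean_of_squares = Fraction(sum(v * v for v in values), n)
    return mean_of_squares - mean ** 2

# Sequences reconstructed from the terms in the solution (assumption).
U = [0, 1, 0, 1, 0, 1, 0, 1]
V = [1, 0, 1, 0, 1, 0, 1, 0, 1]
W = [3, 3, 3, 7, 5, 3, 5, 4]
T = list(range(1, 10))

for name, seq in [("U", U), ("V", V), ("W", W), ("T", T)]:
    print(name, variance(seq), float(variance(seq)))
```

Using `Fraction` avoids rounding noise, so the printed values can be compared directly with 1/4, 20/81, 119/64 and 20/3.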

(a) Working table:
Europe
expenses [EUR]   midpoint xi [EUR]   hi     hi · xi [EUR]   hi · xi² [EUR²]
0 − 50           25                  0.1    2.5             62.5
50 − 100         75                  0.2    15              1125
100 − 150        125                 0.3    37.5            4687.5
150 − 200        175                 0.2    35              6125
200 − 250        225                 0.1    22.5            5062.5
250 −            275                 0.1    27.5            7562.5
Σ                                    1      140 (= x̄)       24 625 (= mean of x²)
U.S.
expenses [USD]   midpoint yi [USD]   li     li · yi [USD]   li · yi² [USD²]
0 − 50           25                  0.2    5               125
50 − 100         75                  0.25   18.75           1406.25
100 − 150        125                 0.25   31.25           3906.25
150 − 200        175                 0.2    35              6125
200 − 250        225                 0.05   11.25           2531.25
250 −            275                 0.05   13.75           3781.25
Σ                                    1      115 (= ȳ)       17 875 (= mean of y²)
Thus for the arithmetic means x̄, ȳ, variances s²X , s²Y , standard deviations sX , sY and coefficients
of variation VKX , VKY we have:
Europe: $\bar{x} = 140$ EUR, $s_X^2 = \overline{x^2} - \bar{x}^2 = 24625 - 140^2 = 5025$ EUR², $s_X = 70.89$ EUR, $VK_X = \frac{s_X}{|\bar{x}|} = \frac{70.89}{140} = 0.5063$
U.S.: $\bar{y} = 115$ USD, $s_Y^2 = \overline{y^2} - \bar{y}^2 = 17875 - 115^2 = 4650$ USD², $s_Y = 68.19$ USD, $VK_Y = \frac{s_Y}{|\bar{y}|} = \frac{68.19}{115} = 0.5930$
(b) Since the variances and standard deviations are measured with different scales (EUR vs. USD),
we can’t compare them directly. It would be possible to make them comparable using a foreign
exchange rate. Another possibility, that simultaneously refers the dispersion to the absolute level
of the values in the data set is the (dimensionless) coefficient of variation. Using this measure
the dispersion of expenses is larger for the students from the U.S.
Y is a linear transformation of X, namely Y = a + bX with a = 4 and b = −2.5. Using $\bar{x} = 12$ and
$s_X^2 = 25$ we obtain:
\[ \bar{y} = a + b \bar{x} = 4 - 2.5 \cdot 12 = -26 \]
\[ s_Y^2 = b^2 s_X^2 = 6.25 \cdot 25 = 156.25 \]
\[ s_Y = |b| \cdot s_X = 2.5 \cdot 5 = 12.5 = \sqrt{s_Y^2} \]
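From the given data we obtain:
\[ \sum_{i=1}^{12} x_i = n \cdot \bar{x} = 12 \cdot 9 = 108 \;\Rightarrow\; \sum_{i=1}^{15} x_i = 108 + 8 + 12 + 13 = 141 \]
\[ \sum_{i=1}^{12} x_i^2 = n \cdot \left( s_X^2 + \bar{x}^2 \right) = 12 \cdot \left( 2.5^2 + 9^2 \right) = 1047 \;\Rightarrow\; \sum_{i=1}^{15} x_i^2 = 1047 + 8^2 + 12^2 + 13^2 = 1424 \]
Now we can calculate x̄new and sXnew as follows:
\[ \bar{x}_{\mathrm{new}} = \frac{141}{15} = 9.4 \]
\[ s_{X\mathrm{new}}^2 = \overline{x^2}_{\mathrm{new}} - (\bar{x}_{\mathrm{new}})^2 = \frac{1424}{15} - 9.4^2 = 6.5733 \]
\[ s_{X\mathrm{new}} = 2.564 \]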
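The update above only needs the sample size, the mean and the standard deviation of the old data. A minimal Python sketch of the same steps, with variable names chosen for this illustration:

```python
import math

# Known summary of the first 12 observations (from the exercise).
n_old, mean_old, sd_old = 12, 9.0, 2.5
new_values = [8, 12, 13]

# Recover the sums from the summary statistics.
sum_old = n_old * mean_old                        # 108
sum_sq_old = n_old * (sd_old**2 + mean_old**2)    # 1047

# Add the new observations and recompute mean and variance.
n_new = n_old + len(new_values)
mean_new = (sum_old + sum(new_values)) / n_new
mean_sq_new = (sum_sq_old + sum(v * v for v in new_values)) / n_new
var_new = mean_sq_new - mean_new**2
sd_new = math.sqrt(var_new)

print(mean_new, round(var_new, 4), round(sd_new, 3))
```

The variance here is the population form (division by n), matching the formula used throughout this sheet.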
3 Two dimensional distributions
• What is the intention of distributions with two or more dimensions?
• How do average and variance differ (in their calculations) from the case of univariate statistics?
• What does "statistical independence" mean and why is it important?
• What is the difference between "statistical dependence" and "correlation"?
(a) Scatter plot:
(b) Calculate covariance and correlation using a working table:
i      xi     yi     xi − x̄   (xi − x̄)²   yi − ȳ   (yi − ȳ)²   (xi − x̄) · (yi − ȳ)
1      98     94     −4       16          −6       36          24
2      100    94     −2       4           −6       36          12
3      104    103    2        4           3        9           6
4      104    105    2        4           5        25          10
5      102    99     0        0           −1       1           0
6      102    102    0        0           2        4           0
7      104    103    2        4           3        9           6
Σ =    714    700             32                   120         58
x̄ = 102, ȳ = 100, s²X = 4.5714, s²Y = 17.1429, cXY = 8.2857
sX = 2.1381, sY = 4.1404
Coefficient of correlation:
\[ r_{XY} = \frac{c_{XY}}{s_X \cdot s_Y} = \frac{8.2857}{2.1381 \cdot 4.1404} = 0.9360 . \]
There is a strong positive relationship between the IQs of twins within this sample. However,
from the sample we cannot deduce that this is a general rule.
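The working table translates directly into a few lines of Python. This sketch uses the population versions of variance and covariance (division by n), as in the table; the variable names are illustrative.

```python
import math

# IQ values of the twin pairs from the working table.
x = [98, 100, 104, 104, 102, 102, 104]
y = [94, 94, 103, 105, 99, 102, 103]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Population variance and covariance (divide by n, as in the table).
var_x = sum((xi - mean_x) ** 2 for xi in x) / n
var_y = sum((yi - mean_y) ** 2 for yi in y) / n
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

r_xy = cov_xy / math.sqrt(var_x * var_y)
print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4), round(r_xy, 4))
```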
(See also solution 11 on page 12.)
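X, Y : The sequences X and Y do not vary. Thus $(x_i - \bar{x}) = (y_i - \bar{y}) = 0$ for all i. This implies that
all covariances involving the sequences X or Y vanish as well. Therefore $c_{XU} = c_{YV} = 0$.
cV T : $\overline{v \cdot t} = \frac{1 \cdot 1 + 0 \cdot 2 + 1 \cdot 3 + 0 \cdot 4 + 1 \cdot 5 + 0 \cdot 6 + 1 \cdot 7 + 0 \cdot 8 + 1 \cdot 9}{9} = \frac{25}{9}$
\[ c_{VT} = \overline{v \cdot t} - \bar{v} \cdot \bar{t} = \frac{25}{9} - \frac{5}{9} \cdot \frac{45}{9} = 0 \]
cU V : Since the sequences U and V are of different lengths, cU V is undefined.
rU W : At first we calculate the covariance: $\overline{u \cdot w} = \frac{0 + 3 + 0 + 7 + 0 + 3 + 0 + 4}{8} = \frac{17}{8}$,
\[ c_{UW} = \overline{u \cdot w} - \bar{u} \cdot \bar{w} = \frac{17}{8} - \frac{1}{2} \cdot \frac{33}{8} = \frac{1}{16} = 0.0625 \]
and then the correlation coefficient:
\[ r_{UW} = \frac{c_{UW}}{s_U \cdot s_W} = \frac{0.0625}{0.5 \cdot 1.3636} = 0.09167 . \]
rY V : Since $s_Y = 0$, rY V is undefined. (Division by zero)
rV T : Because $c_{VT} = 0$ and $s_V, s_T \neq 0$, we have $r_{VT} = \frac{c_{VT}}{s_V \cdot s_T} = 0$.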
(a) Two characteristics are gathered: age and type of employment. Age is cardinal but clustered
in intervals, whereas type of employment is nominal and has only two possible values (It’s a so
called dichotomous attribute).
(b) Joint distribution and marginal distributions:
Age classes                         Self-employed   Employees (dep.)   Marginal distribution Age
15 – 25                             0.0028          0.1116             0.1144
25 – 35                             0.0165          0.1804             0.1969
35 – 45                             0.0397          0.2645             0.3042
45 – 55                             0.0348          0.2141             0.2489
55 – 65                             0.0218          0.0996             0.1214
65 – 95                             0.0075          0.0067             0.0142
Marginal distrib. Type of employm.  0.1231          0.8769             1
(c) Conditional distribution for age and its marginal distribution:

Conditional distributions
Age classes Self-employed Employees (dep.) Marginal distribution
15 – 25 0.0227 0.1272 0.1144
25 – 35 0.1342 0.2057 0.1969
35 – 45 0.3227 0.3016 0.3042
45 – 55 0.2825 0.2441 0.2489
55 – 65 0.1772 0.1136 0.1214
65 – 95 0.0606 0.0077 0.0142
1 1 1
Histogram of the attribute age class conditional on dependently employed people:
(d) For the determination of the conditional medians we consider the distribution functions of the
conditional distributions:
Age classes Self-employed Employees (dep.)
15 – 25 0.0227 0.1272
25 – 35 0.1569 0.3329
35 – 45 0.4796 0.6345
45 – 55 0.7621 0.8786
55 – 65 0.9393 0.9922
65 – 95 1 1
For the self-employed, the median is located in the age interval of 45-55 years (since the value
1/2 lies therein). Using a polygon to approximate the distribution function, the median is given by
(see sketch below):
\[ \frac{x_{\mathrm{med}} - 45}{55 - 45} = \frac{0.5 - 0.4796}{0.7621 - 0.4796} . \]
Solving for xmed yields
\[ x_{\mathrm{med}} = 45 + (55 - 45) \cdot \frac{0.5 - 0.4796}{0.7621 - 0.4796} \approx 45.72 . \]
In the same manner we calculate the median ymed of employees:
\[ y_{\mathrm{med}} = 35 + (45 - 35) \cdot \frac{0.5 - 0.3329}{0.6345 - 0.3329} \approx 40.54 . \]
(e) For the calculation of the conditional means it is assumed that the midpoint of a class represents
the average of the observations within a class. This is unlikely to be exactly the case; however,
it is a reasonable approximation and the errors should be small.
Relative frequencies
Avg. of class   Self-employed   Employees (dep.)
20              0.0227          0.1272
30              0.1342          0.2057
40              0.3227          0.3016
50              0.2825          0.2441
60              0.1772          0.1136
80              0.0606          0.0077
This yields a conditional arithmetic mean x̄ for the self-employed of
x̄ = 20 · 0.0227 + 30 · 0.1342 + · · · + 80 · 0.0606 ≈ 47.01
and a conditional arithmetic mean ȳ for the employees of
ȳ = 20 · 0.1272 + 30 · 0.2057 + · · · + 80 · 0.0077 ≈ 40.42 .
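As a cross-check, the conditional distributions and conditional means can be derived from the joint table in (b). The sketch below works with unrounded conditional frequencies, so the last digit may differ slightly from a calculation with the rounded table values; the dictionary keys and helper name are made up for this illustration.

```python
# Class midpoints and joint relative frequencies from the table in (b).
midpoints = [20, 30, 40, 50, 60, 80]
joint = {
    "self-employed": [0.0028, 0.0165, 0.0397, 0.0348, 0.0218, 0.0075],
    "employees":     [0.1116, 0.1804, 0.2645, 0.2141, 0.0996, 0.0067],
}

def conditional(column):
    """Divide a column of the joint table by its marginal total."""
    total = sum(column)
    return [value / total for value in column]

cond = {group: conditional(col) for group, col in joint.items()}
cond_mean = {
    group: sum(m * h for m, h in zip(midpoints, dist))
    for group, dist in cond.items()
}

for group in joint:
    print(group, [round(h, 4) for h in cond[group]], round(cond_mean[group], 2))
```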
(f) Proportion of self-employed, being 55 years or older:
0.1772 + 0.0606 = 0.2378 = 23.78%
Proportion of employees, being 55 years or older:
0.1136 + 0.0077 = 0.1213 = 12.13%
0.0142 of people are older than 65.
0.0075 of people are older than 65 and are self-employed.
Therefore, $\frac{0.0075}{0.0142} \approx 53\%$ of people older than 65 are self-employed.
(a) We complete the contingency table:
y1 y2 y3
x1 10 5 3 18
x2 6 5 1 12
16 10 4 30
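i.
\[ h_{1|X=x_2} = \frac{6}{12} = \frac{1}{2} \qquad h_{2|Y=y_3} = \frac{1}{4} \]
ii. X and Y are not independent; for instance it holds that
\[ h_{1\bullet} \cdot h_{\bullet 1} = \frac{18}{30} \cdot \frac{16}{30} = 0.32 , \quad \text{but} \quad h_{11} = \frac{10}{30} \approx 0.333 . \]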
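(b) For the absolute frequencies nij of two independent characteristics it holds:
\[ n_{ij} = n \cdot h_{ij} = n \cdot h_{i\bullet} \cdot h_{\bullet j} = n \cdot \frac{1}{n} \cdot n_{i\bullet} \cdot \frac{1}{n} \cdot n_{\bullet j} = \frac{1}{n} \cdot n_{i\bullet} \cdot n_{\bullet j} \]
Thus: $n_{11} = \frac{1}{200} \cdot 20 \cdot 50 = 5$ , . . .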
y1 y2 y3 y4 y5
x1 5 3 1 7 4 20
x2 10 6 2 14 8 40
x3 15 9 3 21 12 60
x4 10 6 2 14 8 40
x5 10 6 2 14 8 40
50 30 10 70 40 200
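The completed table can be reproduced from its margins alone. A short sketch, assuming the row and column totals shown above:

```python
# Marginal absolute frequencies from the table (row totals, column totals).
row_totals = [20, 40, 60, 40, 40]
col_totals = [50, 30, 10, 70, 40]
n = sum(row_totals)  # 200

# Under independence each cell equals (row total * column total) / n.
table = [[r * c / n for c in col_totals] for r in row_totals]

for row, total in zip(table, row_totals):
    print([int(cell) for cell in row], total)
```

Every cell comes out as a whole number here, which is a property of the chosen margins and not something independence guarantees in general.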
(a) At first, we have absH(Y = y1 ) = 80 ⇒ relH(Y = y1 ) = 80/200 = 0.4 . From this, by
completion of the first column and the last row we find the two values 0.2 . Now we complete
the second column by 0.1 .
Now relH(Y = y3 |X = x1 ) = 0.5 means, that conditional on X = x1 the relative frequency
of y3 is as large as for y1 and y2 altogether. Thus we have h(x1 , y3 ) = h(x1 , y1 ) + h(x1 , y2 ) =
0.2 + 0.1 = 0.3 .
Finally, we find the missing values 0.1 , 0.4 and 0.6 by completion of rows or columns.
h(xi , yj ) y1 = 0 y2 = 1 y3 = 4 Σ
x1 = 1 0.2 0.1 0.3 0.6
x2 = 2 0.2 0.1 0.1 0.4
Σ 0.4 0.2 0.4 1
(b) X and Y are statistically dependent; for instance we have:
h1• · h•1 = 0.6 · 0.4 = 0.24 ≠ 0.2 = h11 = h(x1 , y1 )
(c) The condition x + y = 2 is met by pairs (x2 = 2, y1 = 0) and (x1 = 1, y2 = 1) . Their quantity
is given by:
n · (h(2, 0) + h(1, 1)) = 200 · (0.2 + 0.1) = 60
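(d)
\[ \bar{x} = 1 \cdot 0.6 + 2 \cdot 0.4 = 1.4 \]
\[ \bar{y} = 0 \cdot 0.4 + 1 \cdot 0.2 + 4 \cdot 0.4 = 1.8 \]
\[ \overline{xy} = 1 \cdot 1 \cdot 0.1 + 1 \cdot 4 \cdot 0.3 + 2 \cdot 1 \cdot 0.1 + 2 \cdot 4 \cdot 0.1 = 2.3 \]
\[ c_{XY} = \overline{xy} - \bar{x} \cdot \bar{y} = 2.3 - 1.4 \cdot 1.8 = -0.22 \]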
4 Linear regression
• What is the purpose of linear regression analysis?
• Discuss how a regression line and causality are related to each other!
• Interpret the coefficient of determination R2 !
Working table:
Month       xi       yi     xi − x̄   yi − ȳ   (xi − x̄)²   (yi − ȳ)²   (xi − x̄) · (yi − ȳ)
January     3000     200    200      50       40 000      2500        10 000
February    3200     250    400      100      160 000     10 000      40 000
March       2900     200    100      50       10 000      2500        5000
April       2700     150    −100     0        10 000      0           0
May         2700     150    −100     0        10 000      0           0
June        2800     150    0        0        0           0           0
July        2600     100    −200     −50      40 000      2500        10 000
August      (no entry)
September   2500     50     −300     −100     90 000      10 000      30 000
October     2600     70     −200     −80      40 000      6400        16 000
November    2800     150    0        0        0           0           0
December    3000     180    200      30       40 000      900         6000
Σ =         30 800   1650                     440 000     34 800      117 000
x̄ = 2800, ȳ = 150, s²X = 40000, s²Y = 3163.64, cXY = 10 636.36
(a) The correlation coefficient rXY is given by:
\[ r_{XY} = \frac{c_{XY}}{s_X \cdot s_Y} = \frac{10636.36}{200 \cdot 56.246} = 0.9455 \]
So we have a strong linear dependency between X and Y .
(b) i. Regress Y on X:
\[ b = \frac{c_{XY}}{s_X^2} = \frac{10636.36}{40000} = 0.2659 , \quad a = \bar{y} - b \cdot \bar{x} = 150 - 0.2659 \cdot 2800 = -594.52 \]
regression line: $y(x) = a + b \cdot x = -594.52 + 0.2659\,x$
ii. Regress X on Y :
\[ b' = \frac{c_{XY}}{s_Y^2} = \frac{10636.36}{3163.64} = 3.3621 , \quad a' = \bar{x} - b' \cdot \bar{y} = 2800 - 3.3621 \cdot 150 = 2295.6897 \]
regression line: $x(y) = a' + b' \cdot y = 2295.6897 + 3.3621\,y$
(c) [Figure: scatter plot of overtime (50 to 250) against production (2400 to 3300) with both regression lines, Y on X and X on Y]
(d) Estimation for the quantity produced during a “usual” month, where no overtime is done:
(use x(y) and set y = 0)
x(0) = 2295.6897
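Both regression lines follow from the same three quantities (the two variances and the covariance). The sketch below recomputes them from the eleven months with data; `fit_line` is a helper name invented here. Because it carries the unrounded slope, the intercept of the first line comes out near −594.55 instead of the −594.52 obtained with the rounded slope 0.2659.

```python
# Production x and overtime y for the eleven months with data (August is missing).
x = [3000, 3200, 2900, 2700, 2700, 2800, 2600, 2500, 2600, 2800, 3000]
y = [200, 250, 200, 150, 150, 150, 100, 50, 70, 150, 180]

def fit_line(u, v):
    """Least-squares line v = intercept + slope * u (regress v on u)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    var_u = sum((ui - mean_u) ** 2 for ui in u) / n
    cov_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / n
    slope = cov_uv / var_u
    return mean_v - slope * mean_u, slope

a, b = fit_line(x, y)              # regress Y on X
a_prime, b_prime = fit_line(y, x)  # regress X on Y

print(round(a, 2), round(b, 4))
print(round(a_prime, 4), round(b_prime, 4))
```

The intercept `a_prime` is the estimate x(0) used in (d): the production predicted for a month without overtime.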
(a) Working table:
Household xi yi xi − x yi − y (xi − x)2 (yi − y)2 (xi − x) · (yi − y)
1 40 80 −20 −18 400 324 360
2 45 80 −15 −18 225 324 270
3 60 90 0 −8 0 64 0
4 80 140 20 42 400 1764 840
5 75 100 15 2 225 4 30
Σ= 300 490 1250 2480 1500
x = 60 y = 98 s2X = 250 s2Y = 496 cXY = 300
(b) For the correlation coefficient rXY we obtain:
\[ r_{XY} = \frac{c_{XY}}{s_X \cdot s_Y} = \frac{300}{\sqrt{250} \cdot \sqrt{496}} = 0.8519 \]
There is a strong positive correlation between X and Y .
(c) Regress Y on X:
\[ b = \frac{c_{XY}}{s_X^2} = \frac{300}{250} = 1.2 , \quad a = \bar{y} - b \cdot \bar{x} = 98 - 1.2 \cdot 60 = 26 \]
regression line: $y(x) = a + b \cdot x = 26 + 1.2\,x$
Interpretation: An additional income of 1000 € per year would increase the average monthly
expenditures for dairy products by 1.20 €.
(d) Regress X on Y :
\[ b' = \frac{c_{XY}}{s_Y^2} = \frac{300}{496} = 0.6048 , \quad a' = \bar{x} - b' \cdot \bar{y} = 60 - 0.6048 \cdot 98 = 0.7296 \]
regression line: $x(y) = a' + b' \cdot y = 0.7296 + 0.6048\,y$
(e) “More income means, you can afford more expensive food.”
“More and expensive dairy products support you such that your income increases.”
In principle, the given information is sufficient to calculate a linear regression. However, the
calculation reveals that something is wrong with the numbers:
\[ s_X^2 = 418 - 20^2 = 18 \]
\[ s_Y^2 = 24 - 3^2 = 15 \]
\[ c_{XY} = 77 - 20 \cdot 3 = 17 \]
From this we deduce values for the correlation coefficient and the coefficient of determination:
\[ r_{XY} = \frac{17}{\sqrt{18 \cdot 15}} = \frac{17}{16.4317} > 1 \]
\[ R^2 = r_{XY}^2 = \frac{289}{270} > 1 \]
However, a correlation coefficient or a coefficient of determination larger than 1 is impossible.
Thus the given values can't be correct.
(a) Working table:
month i   index return xi [%]   stock return yi [%]   xi²      yi²      xi yi
1         3.6                   5.0                   12.96    25.00    18.00
2         2.6                   3.4                   6.76     11.56    8.84
3         2.8                   1.0                   7.84     1.00     2.80
4         0.0                   −0.8                  0.00     0.64     0.00
5         −0.4                  1.6                   0.16     2.56     −0.64
n = 6     2.2                   3.0                   4.84     9.00     6.60
Σ         10.8                  13.2                  32.56    49.76    35.60
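From the sums calculated above we obtain:
\[ \bar{x} = \frac{10.8\%}{6} = 1.8\% \]
\[ \bar{y} = \frac{13.2\%}{6} = 2.2\% \]
\[ s_X^2 = \frac{32.56}{6} - 1.8^2 = 2.1867 \]
\[ s_Y^2 = \frac{49.76}{6} - 2.2^2 = 3.4533 \]
\[ c_{XY} = \frac{35.60}{6} - 1.8 \cdot 2.2 = 1.9733 \]
\[ r_{XY} = \frac{1.9733}{\sqrt{2.1867} \cdot \sqrt{3.4533}} = 0.7181 \]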
(b)
\[ b = \frac{c_{XY}}{s_X^2} = \frac{1.9733}{2.1867} = 0.9024 \]
\[ a = \bar{y} - b \bar{x} = 2.2\% - 0.9024 \cdot 1.8\% = 0.5757\% \]
(c) [Figure not recoverable from the source]
(d) The total variance $s_Y^2$ of Y can be split as follows:
\[ s_Y^2 = s_{\hat{Y}}^2 + s_E^2 \]
For the part $s_{\hat{Y}}^2$ that is explained by the regression, we obtain
\[ s_{\hat{Y}}^2 = s_{a+bX}^2 = b^2 \cdot s_X^2 = 0.9024^2 \cdot 2.1867 = 1.7807 . \]
Accordingly, the residual part $s_E^2$ of the variance is
\[ s_E^2 = s_Y^2 - s_{\hat{Y}}^2 = 3.4533 - 1.7807 = 1.6726 . \]
Now the coefficient of determination $R^2$ is given by
\[ R^2 = \frac{s_{\hat{Y}}^2}{s_Y^2} = \frac{1.7807}{3.4533} = 0.5157 \]
or alternatively
\[ R^2 = r_{XY}^2 = 0.7181^2 = 0.5157 . \]
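The variance decomposition can be verified numerically from the six monthly returns. This is a sketch with illustrative variable names; it computes the fitted values and residuals explicitly instead of using the shortcut b² · s²X.

```python
# Monthly index returns x and stock returns y in percent, from the working table.
x = [3.6, 2.6, 2.8, 0.0, -0.4, 2.2]
y = [5.0, 3.4, 1.0, -0.8, 1.6, 3.0]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
var_x = sum(xi * xi for xi in x) / n - mean_x**2
var_y = sum(yi * yi for yi in y) / n - mean_y**2
cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - mean_x * mean_y

# Regression of Y on X.
b = cov_xy / var_x
a = mean_y - b * mean_x

# Split the total variance into explained and residual parts.
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
var_fitted = sum((fi - mean_y) ** 2 for fi in fitted) / n  # fitted values have mean mean_y
var_resid = sum(e * e for e in residuals) / n              # residuals have mean 0

r_squared = var_fitted / var_y
print(round(b, 4), round(a, 4), round(var_fitted, 4), round(var_resid, 4), round(r_squared, 4))
```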
5 Combinatorics and counting principles
• What is combinatorics and what purpose does it serve?
• Which problems/questions are typically asked in this subject?
• Are you able to transfer (the methods of) combinatorics to other issues? Give examples for
problems beyond your statistics and math classes.
Assuming that the English language uses all 26 Latin characters, and given that 5 vowels exist, the
number of combinations with a vowel in the middle is
21 · 5 · 21 = 2205.
However, sometimes the "y" is counted as a vowel. Then, the result becomes
20 · 6 · 20 = 2400.
In either case, this theoretical calculation does not mean that all possible combinations are plausible
from a linguistic point of view.
6 correct numbers: $\binom{6}{6} \binom{43}{0} = 1 \cdot 1 = 1$
5 correct numbers with bonus: $\binom{6}{5} \binom{1}{1} \binom{42}{0} = \binom{6}{5} = 6$
5 correct numbers without bonus: $\binom{6}{5} \binom{1}{0} \binom{42}{1} = 6 \cdot 1 \cdot 42 = 252$
4 correct numbers: $\binom{6}{4} \binom{43}{2} = 15 \cdot 903 = 13\,545$
3 correct numbers: $\binom{6}{3} \binom{43}{3} = 20 \cdot 12\,341 = 246\,820$
In total there are 260 624 (different) winning combinations.
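The counts above can be reproduced with `math.comb` from the Python standard library. The sketch follows the same split of the 49 numbers into 6 drawn numbers, 1 bonus number and 42 others; the dictionary keys are labels chosen for this illustration.

```python
from math import comb

# Number of tickets per prize class: choose from the 6 drawn numbers,
# the 1 bonus number and the remaining 42 (or 43 where the bonus plays no role).
winning = {
    "6 correct": comb(6, 6) * comb(43, 0),
    "5 correct with bonus": comb(6, 5) * comb(1, 1) * comb(42, 0),
    "5 correct without bonus": comb(6, 5) * comb(1, 0) * comb(42, 1),
    "4 correct": comb(6, 4) * comb(43, 2),
    "3 correct": comb(6, 3) * comb(43, 3),
}
total = sum(winning.values())

for label, count in winning.items():
    print(label, count)
print("total", total, "out of", comb(49, 6))
```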

For the supervisor, there are obviously 12 possibilities, since 12 permanently employed consultants
are present. From the remaining 11 permanently employed consultants another 2 have to be chosen.
Finally, for the professor there are 5 possible choices. According to the basic principle of counting the
total number of possibilities is:
\[ \binom{12}{1} \cdot \binom{11}{2} \cdot \binom{5}{1} = 12 \cdot \frac{11 \cdot 10}{2} \cdot 5 = 12 \cdot 55 \cdot 5 = 3300 \]
Each area of the team can be set up as an ordered selection without repetition, i.e. a selection
in which order matters:
2 strikers: $\frac{5!}{(5 - 2)!} = 5 \cdot 4 = 20$ possibilities
5 players for mid-field: $\frac{7!}{(7 - 5)!} = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 = 2520$ possibilities
3 players for defense: $\frac{6!}{(6 - 3)!} = 6 \cdot 5 \cdot 4 = 120$ possibilities
1 goal keeper: $\frac{3!}{(3 - 1)!} = 3$ possibilities
According to the basic principle of counting there are
20 · 2 520 · 120 · 3 = 18 144 000
possible setups for the team.
If the order within an area does not matter, then we only have
\[ \binom{5}{2} \cdot \binom{7}{5} \cdot \binom{6}{3} \cdot \binom{3}{1} = 10 \cdot 21 \cdot 20 \cdot 3 = 12\,600 \]
different setups for the team.

(a) We distinguish 3 cases:
(i) Alan goes to the party, Bob doesn't: $\binom{1}{1} \binom{1}{0} \binom{10}{6} = 210$
(ii) Bob goes to the party, Alan doesn't: $\binom{1}{1} \binom{1}{0} \binom{10}{6} = 210$
(iii) Both, Alan and Bob do not go to the party: $\binom{2}{0} \binom{10}{7} = 120$
Sum: 210 + 210 + 120 = 540
In total there are 540 possibilities.
(b) We distinguish 2 cases:
(i) Alan and Betty go to the party: $\binom{2}{2} \binom{10}{5} = 252$
(ii) Both, Alan and Betty do not go to the party: $\binom{2}{0} \binom{10}{7} = 120$
Sum: 252 + 120 = 372
In total there are 372 possibilities.
6 Fundamentals of probability theory
• How did Laplace define probability? What is statistical probability and what does subjective
probability mean?
• Explain Bayes’ theorem and give an example for its application.
(a) Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
(b) The event B of having a 5 on the first throw has the following 6 elements (elementary events)
B = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)} ,
thus $P(B) = \frac{1}{6}$.
The event A, that the sum is 10 or larger, has again 6 elements (elementary events)
A = {(4, 6), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)} ,
thus $P(A) = \frac{1}{6}$.
The joint event A ∩ B = {(5, 5), (5, 6)} has two elementary events, so $P(A \cap B) = \frac{1}{18}$.
In order to determine the conditional probability P(A|B) we calculate
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/18}{1/6} = \frac{1}{3} . \]
(c) The modified condition B′, that at least one of the dice shows a 5, consists of 11 elements:
B′ = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (1, 5), (2, 5), (3, 5), (4, 5), (6, 5)}
Thus $P(B') = \frac{11}{36}$. The event A ∩ B′ has 3 elements
A ∩ B′ = {(5, 5), (5, 6), (6, 5)} ,
thus $P(A \cap B') = \frac{1}{12}$. For the conditional probability P(A|B′) we now have
\[ P(A|B') = \frac{P(A \cap B')}{P(B')} = \frac{1/12}{11/36} = \frac{3}{11} . \]
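Because all 36 outcomes are equally likely, the conditional probabilities can be checked by brute-force enumeration. A small sketch with exact fractions; the helper names `prob` and `cond_prob` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of two dice throws; all 36 outcomes are equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Laplace probability: favourable outcomes over possible outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

A = lambda w: w[0] + w[1] >= 10      # sum is 10 or larger
B = lambda w: w[0] == 5              # first throw shows a 5
B_prime = lambda w: 5 in w           # at least one die shows a 5

p_a_given_b = cond_prob(A, B)
p_a_given_b_prime = cond_prob(A, B_prime)
print(prob(A), prob(B), p_a_given_b, prob(B_prime), p_a_given_b_prime)
```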
There are $2^6$ different six-digit patterns consisting of heads and tails, e.g. (H, H, T, T, T, H). All of
them appear with an equal probability of $\frac{1}{2^6}$. We are interested in patterns with exactly three H. The
number of such patterns is $\binom{6}{3}$. Thus, the probability we are looking for is given by
\[ \frac{\binom{6}{3}}{2^6} = \frac{20}{64} = \frac{5}{16} = 31.25\% . \]
Ω = {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6} , |Ω| = 36

A = {(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)} , |A| = 12
B = {(1, 6),
(2, 5), (2, 6),
(3, 4), (3, 5), (3, 6),
(4, 3), (4, 4), (4, 5), (4, 6),
(5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)} , |B| = 21
A ∩ B = {(2, 5), (2, 6), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)} , |A ∩ B| = 7
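Thus:
\[ P(A) = \frac{12}{36} = \frac{1}{3} \]
\[ P(B) = \frac{21}{36} = \frac{7}{12} \]
\[ P(A \cap B) = \frac{7}{36} \]
\[ P(A) \cdot P(B) = \frac{1}{3} \cdot \frac{7}{12} = \frac{7}{36} = P(A \cap B) \]
Therefore A and B are independent.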
With two children the sample space Ω consists of four elements that are all equally probable
(B ≙ boy; G ≙ girl):
Ω = {(B; B), (B; G), (G; B), (G; G)}
We determine the probabilities a), b), c) by the ratio (# of favorable outcomes) / (# of possible outcomes).
(a) $\frac{|\{(B; B)\}|}{|\Omega|} = \frac{1}{4}$
(b) $\frac{|\{(B; B)\}|}{|\{(B; B), (B; G), (G; B)\}|} = \frac{1}{3}$
(c) $\frac{|\{(B; B)\}|}{|\{(B; B), (B; G)\}|} = \frac{1}{2}$
arrival from   to         minutes   train to A   train to B   train to C
2:00 pm        2:10 pm    10        10
2:10 pm        2:30 pm    20                     20
2:30 pm        3:00 pm    30                                  30
3:00 pm        3:10 pm    10        10
3:10 pm        3:30 pm    20                     20
3:30 pm        4:00 pm    30                                  30
Σ                         120       20           40           60
\[ P(A) = \frac{20}{120} = \frac{1}{6} , \quad P(B) = \frac{40}{120} = \frac{1}{3} , \quad P(C) = \frac{60}{120} = \frac{1}{2} \]
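(a) $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.3 - 0.2 = 0.6$
(b) $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.3} = \frac{2}{3}$
(c) $P(\overline{A \cap B}) = 1 - P(A \cap B) = 1 - 0.2 = 0.8$
(f) $P(\bar{A} \cap \bar{B}) = 1 - P(\overline{\bar{A} \cap \bar{B}}) = 1 - P(A \cup B) = 1 - 0.6 = 0.4$
(d) $P(\bar{A} \cup \bar{B}) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B}) = 0.5 + 0.7 - 0.4 = 0.8$
(e) $P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.6 = 0.4$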
(a) If (A implies B) and P(A) > 0, then it follows:
(x) P(A|B) ≥ P(A)

( ) P(A|B) < P(A)
( ) P(A|B) ≥ P(B)
(x) P(B|A) > 0
(b) If (A and B are disjoint) and P(A) > 0, then it follows:
( ) P(A ∩ B) = P(A) · P(B)

(x) P(A ∪ B) = P(A) + P(B)
( ) P(A ∪ B) = 0
( ) P(A|B) = P(A)
Let F be the event that a randomly chosen good is faulty. Let A and B be the events that the good
was produced by the respective machine. Then we know:
\[ P(A) = 0.7 , \quad P(B) = P(\bar{A}) = 0.3 , \quad P(F|A) = 0.08 , \quad P(F|B) = 0.06 . \]
From this we deduce the total probability of F
\[ P(F) = P(F|A) \cdot P(A) + P(F|B) \cdot P(B) = 0.08 \cdot 0.7 + 0.06 \cdot 0.3 = 0.074 \]
and finally the conditional probability
\[ P(A|F) = \frac{P(F|A) \cdot P(A)}{P(F)} = \frac{0.08 \cdot 0.7}{0.074} \approx 0.7568 = 75.68\% . \]
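The two steps (law of total probability, then Bayes' theorem) can be sketched in Python as follows. The dictionary layout is only one possible way to organise the numbers from the exercise.

```python
# Share of production per machine and fault rate per machine (from the exercise).
prior = {"A": 0.7, "B": 0.3}
fault_rate = {"A": 0.08, "B": 0.06}

# Law of total probability: P(F) = sum over machines of P(F | machine) * P(machine).
p_faulty = sum(fault_rate[m] * prior[m] for m in prior)

# Bayes' theorem: P(machine | F) for each machine.
posterior = {m: fault_rate[m] * prior[m] / p_faulty for m in prior}

print(round(p_faulty, 3), {m: round(p, 4) for m, p in posterior.items()})
```

The posterior probabilities sum to 1, so the probability that a faulty good comes from machine B is the complement, about 24.32%.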
Drawing without replacement is like randomly ordering the parts. For an arbitrary position in the
ordered sequence of the 25 parts, the probability that a faulty part is placed there is
\[ \frac{5}{25} = \frac{1}{5} = 20\% . \]
This is the correct answer for a) as well as for b).
(a) Let M be the event that a person randomly chosen from the group is a man, W the event that
a person is a woman and let T be the event that a person is taller than 1.90 m. Then we know:
\[ P(T|M) = 0.04 , \quad P(T|W) = 0.01 , \quad P(W) = 0.6 \;\Rightarrow\; P(M) = 0.4 \]
Applying Bayes' theorem we compute the desired probability P(W |T ):
\[ P(W|T) = \frac{P(T|W) \cdot P(W)}{P(T|W) \cdot P(W) + P(T|M) \cdot P(M)} = \frac{0.01 \cdot 0.6}{0.01 \cdot 0.6 + 0.04 \cdot 0.4} = \frac{3}{11} = 0.\overline{27} \]
(b) Let $P_1, P_2, P_3$ be the events that the pond containing 1, 2 or 3 fish is chosen randomly.
\[ P(P_1) = P(P_2) = P(P_3) = \frac{1}{3} \]
Let M be the event that a fish is caught and marked and that on the next day an unmarked
fish is caught from the same pond. Then we have:
\[ P(M|P_1) = 0 , \quad P(M|P_2) = \frac{1}{2} , \quad P(M|P_3) = \frac{2}{3} \]
According to the law of total probability we get:
\[ P(M) = P(M|P_1) \cdot P(P_1) + P(M|P_2) \cdot P(P_2) + P(M|P_3) \cdot P(P_3) = 0 + \frac{1}{2} \cdot \frac{1}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{7}{18} \]
Finally, applying Bayes' theorem yields the probabilities that the chosen pond contains (i) one,
(ii) two or (iii) three fish:
i. $P(P_1|M) = \frac{P(M|P_1) \cdot P(P_1)}{P(M)} = 0$
ii. $P(P_2|M) = \frac{P(M|P_2) \cdot P(P_2)}{P(M)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{7}{18}} = \frac{3}{7}$
iii. $P(P_3|M) = \frac{P(M|P_3) \cdot P(P_3)}{P(M)} = \frac{\frac{2}{3} \cdot \frac{1}{3}}{\frac{7}{18}} = \frac{4}{7}$
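(c) Let $W_i$, i = 1, 2 be the events 'White ball with the ith draw'.
i. $P(W_2) = P(W_2|W_1) \cdot P(W_1) + P(W_2|W_1^C) \cdot P(W_1^C) = \frac{3}{8} \cdot \frac{4}{9} + \frac{4}{8} \cdot \frac{5}{9} = \frac{32}{8 \cdot 9} = \frac{4}{9}$
ii. $P(W_1|W_2) = \frac{P(W_2|W_1) \cdot P(W_1)}{P(W_2)} = \frac{\frac{3}{8} \cdot \frac{4}{9}}{\frac{4}{9}} = \frac{3}{8}$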
7 Random variables in one dimension
• What is a random variable? How is the sample space Ω linked to the set of events E?
• What is the domain and co-domain of every distribution function?
• What is the relationship between probability mass functions/probability density functions of
random variables and frequency functions/histograms of a characteristic in a sample?
• Explain the difference of expectation value and average value or mean.
A real valued function F defined on R is a distribution function, if and only if the following conditions
hold:
(a) F is continuous from the right.
(b) F is monotonically increasing.
(c) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$ .
In case of a continuous random variable it additionally holds that

(d) F is continuous.
Table of properties a) to d) for the functions with graphs 1) to 6)
1) 2) 3) 4) 5) 6)
a) X X X X ? X
b) X X X − X X
c) − X X X X −
d) X X X X − X
So only the functions with graphs 2) and 3) fulfill all the required properties of a distribution function.
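(a) For the mass function $f_X : \mathbb{R} \to \mathbb{R}_0^+$ of $X$ it holds
\[ f_X(x) = \binom{4}{x} \cdot \left(\frac{1}{2}\right)^4 = \begin{cases} \frac{1}{16} & \text{for } x = 0 \\ \frac{4}{16} = \frac{1}{4} & \text{for } x = 1 \\ \frac{6}{16} = \frac{3}{8} & \text{for } x = 2 \\ \frac{4}{16} = \frac{1}{4} & \text{for } x = 3 \\ \frac{1}{16} & \text{for } x = 4 \\ 0 & \text{otherwise} \end{cases} \]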
(b) Mode: the value with the highest probability: $x_{Mod} = 2$
Median: $x_{Med} = x[0.5] = 2$
Expected value: with a symmetry argument we have $E(X) = 2$
(or we calculate $E(X) = \frac{1}{4} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{4} + 4 \cdot \frac{1}{16} = 2$).
Variance: first we calculate $E(X^2) = \frac{1}{4} + 4 \cdot \frac{3}{8} + 9 \cdot \frac{1}{4} + 16 \cdot \frac{1}{16} = 5$.
This gives us $V(X) = E(X^2) - E(X)^2 = 5 - 4 = 1$.
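(c) [Figure: graph of the distribution function F(x), a step function with jumps at x = 0, 1, 2, 3, 4 that takes the values 1/16, 5/16, 11/16, 15/16 and 1.]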
Remark: $X$ is binomially distributed with parameters $n = 4$ and $p = \frac{1}{2}$, $X \sim Bi(4, \frac{1}{2})$. Thus the results can be obtained directly from properties of binomially distributed random variables: for the expectation value it holds that $E(X) = n \cdot p = 4 \cdot \frac{1}{2} = 2$ and for the variance $V(X) = n \cdot p \cdot (1 - p) = 4 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1$.
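The remark can be verified numerically. The sketch below builds the probability mass function of Bi(4, 1/2) with exact fractions and recomputes mean and variance from it; the function name `binomial_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Probability mass function of Bi(n, p) as a dict {x: P(X = x)}."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


pmf = binomial_pmf(4, Fraction(1, 2))
mean = sum(x * f for x, f in pmf.items())
var = sum(x**2 * f for x, f in pmf.items()) - mean**2
print(pmf, mean, var)
```

Both routes, summing over the mass function and using the formulas n·p and n·p·(1 − p), should agree.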
(a) The (triangular) area under the graph of the density function must be of size 1:
\[ \frac{1}{2} \cdot 3 \cdot 2a = 3a \overset{!}{=} 1 \]
Thus $a = \frac{1}{3}$.
As an alternative one sets the integral $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 2ax\,dx + \int_1^3 (3a - ax)\,dx$ equal to 1.
(b) [Figure: graph of the density function f(x), a triangle over the interval [0, 3] with its peak f(1) = 2/3.]
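(c)
\[ P(X = 1) = 0 \]
\[ P(0.5 < X < 2) = \int_{0.5}^{1} \frac{2}{3}x\,dx + \int_{1}^{2} \left(1 - \frac{1}{3}x\right) dx = \left[\frac{1}{3}x^2\right]_{0.5}^{1} + \left[x - \frac{1}{6}x^2\right]_{1}^{2} = \frac{1}{3} - \frac{1}{12} + 2 - \frac{2}{3} - 1 + \frac{1}{6} = \frac{3}{4} \]
\[ P(X < 2) = 1 - P(X \ge 2) = 1 - \int_{2}^{3} \left(1 - \frac{1}{3}x\right) dx = 1 - \left[x - \frac{1}{6}x^2\right]_{2}^{3} = 1 - 3 + \frac{3}{2} + 2 - \frac{2}{3} = \frac{5}{6} \]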
(d) The event $\{X < 0.5\}$ is a subset of $\{X < 1\}$, thus $P(X < 1 \mid X < 0.5) = 1$.
(e) No. It is apparent from the graph in b) that the mode lies at $x_{Mod} = 1$. However, the expectation value for this positively skewed distribution is larger than that.
(a) [Figure: graph of the density function f(x) over the interval [0, 3] with maximum value 0.5.]
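(b)
\[ P(X \le 2) = \int_{-\infty}^{2} f(x)\,dx = \frac{2}{9} \int_{0}^{2} (3x - x^2)\,dx = \frac{2}{9} \left[\frac{3}{2}x^2 - \frac{1}{3}x^3\right]_{0}^{2} = \frac{2}{9} \left(6 - \frac{8}{3}\right) = \frac{20}{27} \]
(c) Using a symmetry argument we have $E(X) = 1.5$. Alternatively we calculate:
\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{3} x \cdot \frac{2}{9}(3x - x^2)\,dx = \frac{2}{9} \int_{0}^{3} (3x^2 - x^3)\,dx = \frac{2}{9} \left[x^3 - \frac{1}{4}x^4\right]_{0}^{3} = \frac{2}{9} \left(27 - \frac{81}{4}\right) = \frac{3}{2} \]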
(a) Since every distribution function is continuous from the right, we have:
\[ F(3) = \lim_{x \to 3+} F(x) = 1 \;\Longrightarrow\; \left(\frac{3 - 1}{a}\right)^2 = 1 \;\Longrightarrow\; a^2 = 4 \;\Longrightarrow\; a = 2 \text{ or } a = -2 \]
(b)
(c) Density function $f : \mathbb{R} \to \mathbb{R}_0^+$
\[ f(x) = F'(x) = \begin{cases} \frac{x - 1}{2} & \text{for } 1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases} \]
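(d)
\[ E(X) = \int_{-\infty}^{+\infty} x \cdot f(x)\,dx = \frac{1}{2} \int_{1}^{3} (x^2 - x)\,dx = \frac{1}{2} \left[\frac{x^3}{3} - \frac{x^2}{2}\right]_{1}^{3} = \frac{1}{2} \left(9 - \frac{9}{2} - \frac{1}{3} + \frac{1}{2}\right) = \frac{7}{3} \]
\[ E(X^2) = \int_{-\infty}^{+\infty} x^2 \cdot f(x)\,dx = \frac{1}{2} \int_{1}^{3} (x^3 - x^2)\,dx = \frac{1}{2} \left[\frac{x^4}{4} - \frac{x^3}{3}\right]_{1}^{3} = \frac{1}{2} \left(\frac{81}{4} - 9 - \frac{1}{4} + \frac{1}{3}\right) = \frac{17}{3} \]
\[ V(X) = E(X^2) - [E(X)]^2 = \frac{17}{3} - \frac{49}{9} = \frac{2}{9} \]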
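(e) The quartiles $Q_k$ fulfil the condition $F(Q_k) = \frac{k}{4}$ for $k = 1, 2, 3$. Therefore $1 \le Q_k \le 3$ and
\[ F(Q_k) = \frac{k}{4} \iff \left(\frac{Q_k - 1}{2}\right)^2 = \frac{k}{4} \iff Q_k - 1 = \sqrt{k} \iff Q_k = 1 + \sqrt{k}, \]
thus
\[ Q_1 = 1 + \sqrt{1} = 2, \quad Q_2 = 1 + \sqrt{2}, \quad Q_3 = 1 + \sqrt{3}. \]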
For a) and c) we use Chebyshev's inequality in the form
\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
(a) Let $k\sigma = 2 \Rightarrow k^2 = \frac{4}{\sigma^2} = \frac{4}{2} = 2$.
Then we have $P(8 < X < 12) = P(|X - 10| < 2) \ge 1 - \frac{1}{2} = \frac{1}{2}$.
(c) Let $k\sigma = 5 \Rightarrow k^2 = \frac{25}{\sigma^2} = \frac{25}{2}$.
Then we have $P(5 < X < 15) = P(|X - 10| < 5) \ge 1 - \frac{2}{25} = \frac{23}{25}$.
For b) and d) we use Chebyshev's inequality in the form
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
(b) Let $k\sigma = 3 \Rightarrow k^2 = \frac{9}{\sigma^2} = \frac{9}{2}$.
Then we have $P(\{X < 7\} \cup \{13 < X\}) = P(|X - 10| \ge 3) \le \frac{2}{9}$. (Note that $X$ is continuous.)
(d) Let $k\sigma = 8 \Rightarrow k^2 = \frac{64}{\sigma^2} = \frac{64}{2} = 32$.
Then we have $P(\{X < 2\} \cup \{18 < X\}) = P(|X - 10| \ge 8) \le \frac{1}{32}$.
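The four bounds follow the same pattern, so they can be reproduced with one small helper. This is a sketch only; the function name `chebyshev_tail_bound` is hypothetical, and the variance 2 is the value used in the solution above.

```python
from fractions import Fraction


def chebyshev_tail_bound(variance, distance):
    """Upper bound for P(|X - mu| >= distance), i.e. 1/k^2 with k*sigma = distance."""
    return min(Fraction(1), Fraction(variance) / Fraction(distance) ** 2)


variance = 2
for distance in (2, 3, 5, 8):
    tail = chebyshev_tail_bound(variance, distance)
    # tail bounds the probability outside the interval, 1 - tail the probability inside
    print(distance, tail, 1 - tail)
```

The bound is capped at 1 because Chebyshev's inequality is uninformative for k < 1.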
(a) The area of the dartboard is $(2d)^2 = 64$, the area of the bull's eye is $\pi \approx 3.1416$. Therefore the probability to score a 2, 4, 6 or 8 is $\frac{1}{4}\left(1 - \frac{\pi}{64}\right) \approx 23.77\%$ each. The probability to hit the bull's eye is $\frac{\pi}{64} \approx 4.91\%$.
For the probability mass function of $X$ we have approximately:
\[ f_X(x) = \begin{cases} 23.77\% & \text{for } x \in \{2, 4, 6, 8\} \\ 4.91\% & \text{for } x = 20 \\ 0 & \text{otherwise} \end{cases} \]
(b) and for the distribution function approximately:
\[ F_X(x) = \begin{cases} 0 & \text{for } x < 2 \\ 23.77\% & \text{for } 2 \le x < 4 \\ 47.54\% & \text{for } 4 \le x < 6 \\ 71.31\% & \text{for } 6 \le x < 8 \\ 95.08\% & \text{for } 8 \le x < 20 \\ 1 & \text{for } 20 \le x \end{cases} \]
(c)
\[ E(X) = 23.77\% \cdot (2 + 4 + 6 + 8) + 4.91\% \cdot 20 = 5.74 \]
\[ E(X^2) = 23.77\% \cdot (2^2 + 4^2 + 6^2 + 8^2) + 4.91\% \cdot 20^2 = 48.16 \]
\[ V(X) = E(X^2) - E(X)^2 = 48.16 - 5.74^2 = 15.21 \]
\[ \sqrt{V(X)} = 3.90 \]
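The dartboard moments can be recomputed without intermediate rounding. The sketch below assumes, as in the solution, a square board of area 64 with a bull's eye of area π and four equally likely remaining fields; the variable names are illustrative. Because the hand calculation rounds E(X) to 5.74 before squaring, the unrounded variance comes out slightly larger (about 15.26 instead of 15.21).

```python
from math import pi, sqrt

board_area = 64.0                   # (2d)^2 from the solution
p_bull = pi / board_area            # bull's eye with area pi
p_field = (1 - p_bull) / 4          # four equally likely fields 2, 4, 6, 8
pmf = {2: p_field, 4: p_field, 6: p_field, 8: p_field, 20: p_bull}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())
var = second_moment - mean**2
print(round(mean, 2), round(second_moment, 2), round(var, 2), round(sqrt(var), 2))
```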
8 Multidimensional random variables
• What do joint probability distribution and marginal probability distribution mean?
• What does "stochastically independent" mean and how can you examine the stochastic independence of two random variables?
• Give examples of a multidimensional random variable.
(a) We obtain the marginal distributions by calculating the sum of every column and every row:

            Y = 2    Y = 3    Y = 4    f_X(x)
X = 10      1/9      1/9      1/9      1/3
X = 20      1/6      0        1/6      1/3
X = 30      1/18     2/9      1/18     1/3
f_Y(y)      1/3      1/3      1/3

The marginal distributions are

x_i         10     20     30
f_X(x_i)    1/3    1/3    1/3

and

y_i         2      3      4
f_Y(y_i)    1/3    1/3    1/3
(b) Using $f_1(x \mid Y = 4) = \frac{f(x, 4)}{f_Y(4)} = 3 \cdot f(x, 4)$ we deduce the conditional distribution:

x_i                  10     20     30
f_1(x_i | Y = 4)     1/3    1/2    1/6
(c)
\[ E(X) = \frac{1}{3} \cdot (10 + 20 + 30) = 20 \]
\[ E(X^2) = \frac{1}{3} \cdot (10^2 + 20^2 + 30^2) = \frac{1400}{3} \]
\[ V(X) = E(X^2) - E(X)^2 = \frac{1400}{3} - 400 = \frac{200}{3} = 66.6667 \]
\[ E(Y) = \frac{1}{3} \cdot (2 + 3 + 4) = 3 \]
\[ E(Y^2) = \frac{1}{3} \cdot (2^2 + 3^2 + 4^2) = \frac{29}{3} \]
\[ V(Y) = E(Y^2) - E(Y)^2 = \frac{29}{3} - 9 = \frac{2}{3} = 0.6667 \]
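(d)
\[ E(X \cdot Y) = 10 \cdot 2 \cdot \frac{1}{9} + 10 \cdot 3 \cdot \frac{1}{9} + 10 \cdot 4 \cdot \frac{1}{9} + 20 \cdot 2 \cdot \frac{1}{6} + 20 \cdot 3 \cdot 0 + 20 \cdot 4 \cdot \frac{1}{6} + 30 \cdot 2 \cdot \frac{1}{18} + 30 \cdot 3 \cdot \frac{2}{9} + 30 \cdot 4 \cdot \frac{1}{18} = 60 \]
\[ Cov(X, Y) = E(X \cdot Y) - E(X) \cdot E(Y) = 60 - 20 \cdot 3 = 0 \]
\[ \rho_{XY} = 0 \]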
(e) X and Y are uncorrelated since the covariance vanishes, but they are not independent, because
marginal distributions and conditional distributions differ. [Compare a) and b)]
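The statement in (e), uncorrelated but not independent, can be checked directly from the joint table. The following sketch stores the joint probabilities in a dictionary (an illustrative data layout, not taken from the tutorial) and tests the product condition f(x, y) = f_X(x) · f_Y(y) cell by cell.

```python
from fractions import Fraction as F

# Joint probabilities f(x, y) from the table in (a).
joint = {
    (10, 2): F(1, 9),  (10, 3): F(1, 9), (10, 4): F(1, 9),
    (20, 2): F(1, 6),  (20, 3): F(0),    (20, 4): F(1, 6),
    (30, 2): F(1, 18), (30, 3): F(2, 9), (30, 4): F(1, 18),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal distributions: row sums and column sums.
f_x = {x: sum(joint[x, y] for y in ys) for x in xs}
f_y = {y: sum(joint[x, y] for x in xs) for y in ys}

e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y

# Independence requires the product rule to hold in every cell.
independent = all(joint[x, y] == f_x[x] * f_y[y] for x in xs for y in ys)
print(cov, independent)
```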
Let the random variable $X$ describe profit & loss of the bet with a probability $p$ of winning:

x           −1       3
P(X = x)    1 − p    p

An expectation value of 0 gives a condition for $p$:
\[ E(X) = p - 1 + 3p = 4p - 1 \overset{!}{=} 0 \;\Longrightarrow\; p = \frac{1}{4} = 25\% \]
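For the gambling game the random variable $Y$ shall describe profit & loss:
\[ P(Y = -1) = \left(\frac{5}{6}\right)^3 = \frac{125}{216}, \quad P(Y = 1) = \binom{3}{1} \cdot \frac{1}{6} \cdot \left(\frac{5}{6}\right)^2 = \frac{75}{216}, \]
\[ P(Y = 2) = \binom{3}{2} \cdot \left(\frac{1}{6}\right)^2 \cdot \frac{5}{6} = \frac{15}{216}, \quad P(Y = 3) = \left(\frac{1}{6}\right)^3 = \frac{1}{216} \]
Then we have the expectation value:
\[ E(Y) = \frac{1}{216} \cdot (-125 + 75 + 30 + 3) = -\frac{17}{216} \approx -0.0787 \]
Interpretation: On average you lose 7.87 cents per game.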
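The expected loss can be reproduced with exact arithmetic. This sketch assumes the payoff structure read off from the table above: three independent trials with success probability 1/6, a payoff equal to the number of successes, and a loss of 1 if there is no success.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)                       # success probability per trial
pmf = {-1: (1 - p) ** 3}                 # no success: lose the stake of 1
for k in (1, 2, 3):                      # k successes: win k
    pmf[k] = comb(3, k) * p**k * (1 - p) ** (3 - k)

expected = sum(y * prob for y, prob in pmf.items())
print(pmf, expected, float(expected))
```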
We complete the table:

f(x_m, y_n)    y_1 = −5    y_2 = 0    y_3 = 5     sum
x_1 = 3        0.4 − p     0.15       p − 0.25    0.3
x_2 = 4        p           0.1        0.6 − p     0.7
sum            0.4         0.25       0.35        1
(a) Determine $p$ such that $Cov(X, Y) = 0$ holds.
\[ E(X) = 3 \cdot 0.3 + 4 \cdot 0.7 = 3.7 \]
\[ E(Y) = -5 \cdot 0.4 + 5 \cdot 0.35 = -0.25 \]
\[ E(X \cdot Y) = 3 \cdot (-5) \cdot (0.4 - p) + 3 \cdot 5 \cdot (p - 0.25) + 4 \cdot (-5) \cdot p + 4 \cdot 5 \cdot (0.6 - p) = 2.25 - 10p \]
\[ Cov(X, Y) = E(X \cdot Y) - E(X) \cdot E(Y) = 2.25 - 10p - 3.7 \cdot (-0.25) = 3.175 - 10p \overset{!}{=} 0 \;\Longrightarrow\; p = 0.3175 \]
(b)
\[ E(X \cdot Y) = 2.25 - 3.175 = -0.925 \]
\[ E(X^2) = 9 \cdot 0.3 + 16 \cdot 0.7 = 13.9 \]
\[ V(X) = E(X^2) - E(X)^2 = 13.9 - 3.7^2 = 0.21 \]
\[ E[(X + 3)^2] = E(X^2) + 6E(X) + 9 = 13.9 + 6 \cdot 3.7 + 9 = 45.1 \]
9 Stochastic models and special distributions
• How is an arbitrary normal distribution linked to the standard normal distribution? Can a normally distributed random variable be transformed into a standard normally distributed variable?
• Name some well-known distributions.
• Is it possible to give an exact density function for a real random variable like a stock return?
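(a) Density function: $f : \mathbb{R} \to \mathbb{R}_0^+$,
\[ f(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \]
Distribution function: $F : \mathbb{R} \to [0, 1]$,
\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x \end{cases} \]
\[ E(X) = \frac{1}{2} \quad \text{(symmetry)} \]
\[ E(X^2) = \int_{0}^{1} x^2\,dx = \left[\frac{1}{3}x^3\right]_{0}^{1} = \frac{1}{3} \]
\[ V(X) = E(X^2) - E(X)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \]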
(b) Probability mass function:
\[ f_Y(y) = \begin{cases} \frac{1}{100\,000} & \text{for } y \in \{0.000\,00;\; 0.000\,01;\; 0.000\,02;\; \ldots;\; 0.999\,99\} \\ 0 & \text{otherwise} \end{cases} \]
Transforming $Y$ by $U = m \cdot Y + 1$ with $m = 100\,000$ yields a uniformly distributed random variable $U$ with range $\{1, 2, \ldots, 99\,999, m = 100\,000\}$. According to the lecture notes it holds that
\[ E(U) = \frac{m + 1}{2} = \frac{100\,001}{2}, \qquad V(U) = \frac{m^2 - 1}{12} = \frac{10^{10} - 1}{12}. \]
The reverse transformation $Y = \frac{1}{m}(U - 1)$ eventually reveals
\[ E(Y) = \frac{E(U) - 1}{m} = \frac{\frac{m + 1}{2} - 1}{m} = \frac{m - 1}{2m} = 0.499\,995, \]
\[ V(Y) = \frac{1}{m^2} \cdot \frac{m^2 - 1}{12} = \frac{1 - m^{-2}}{12} = \frac{1 - 10^{-10}}{12} \approx 0.083\,33. \]
(c) Through the transformation $Z = \frac{X - 0.5}{\sqrt{1/12}} = \sqrt{12}\,(X - 0.5) = \sqrt{3}\,(2X - 1)$ the random variable $Z$ becomes uniformly distributed with expectation 0 and variance 1. For $X \in [0, 1)$ the values of $Z$ fall into the interval $[-\sqrt{3}, \sqrt{3})$. The density function is given by:
\[ f_Z(z) = \begin{cases} \frac{1}{2\sqrt{3}} & \text{for } -\sqrt{3} \le z < \sqrt{3} \\ 0 & \text{otherwise} \end{cases} \]
The number of life insurances sold can be described by a binomially distributed random variable $X$ with parameters $n = 16$ and $p = \frac{2}{10} = 0.2$: $X \sim Bi(16; 0.2)$
(a)
E(X) = n · p = 16 · 0.2 = 3.2

V(X) = n · p · (1 − p) = 16 · 0.2 · 0.8 = 2.56
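(b)
\[ P(X \le 3) = \sum_{x=0}^{3} P(X = x) = \sum_{x=0}^{3} \binom{16}{x} \cdot 0.2^x \cdot 0.8^{16 - x} \]
\[ = 0.8^{16} + 16 \cdot 0.2 \cdot 0.8^{15} + \frac{16 \cdot 15}{2} \cdot 0.2^2 \cdot 0.8^{14} + \frac{16 \cdot 15 \cdot 14}{6} \cdot 0.2^3 \cdot 0.8^{13} = 59.81\% \]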
(c) Now it holds $X \sim Bi(12; 0.2)$.
\[ P(X > 10) = P(X = 11) + P(X = 12) = \binom{12}{11} \cdot 0.2^{11} \cdot 0.8^{1} + \binom{12}{12} \cdot 0.2^{12} \cdot 0.8^{0} = 12 \cdot 0.2^{11} \cdot 0.8 + 0.2^{12} \approx 2 \cdot 10^{-7} \quad \text{(almost zero)} \]
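Binomial probabilities like those in (b) and (c) are tedious by hand but short in code. The sketch below uses only `math.comb`; the helper names `binom_pmf` and `binom_cdf` are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bi(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


p_at_most_3 = binom_cdf(3, 16, 0.2)                            # part (b)
p_more_than_10 = binom_pmf(11, 12, 0.2) + binom_pmf(12, 12, 0.2)  # part (c)
print(p_at_most_3, p_more_than_10)
```

For the tiny tail in (c) the two terms are summed directly instead of computing 1 − P(X ≤ 10), which avoids cancellation in floating-point arithmetic.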
(a) The number $X$ of correct answers is binomially distributed with parameters $n = 20$ and $p = \frac{1}{5}$; $X \sim Bi(20, \frac{1}{5})$. On average the student will answer $n \cdot p = 4$ questions correctly.
(b) The probability to pass the exam is:
P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.9974 = 0.0026 = 0.26%
(Hint: Use tables or software with statistic functions to determine the values of probabilities
P(X ≤ 9), etc.)
Now we are looking for the largest x such that P(X ≥ x) > 5% holds:
P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1%
P(X ≥ 8) = 1 − P(X ≤ 7) ≈ 3%
P(X ≥ 7) = 1 − P(X ≤ 6) ≈ 8.7%
The limit for passing the exam would be lowered to 7 correct answers.
We are dealing with a binomial distribution with parameters n = 10 and p = 0.5 ; X ∼ Bi(10; 0.5) .
(a)
\[ P(\{X \le 1\} \cup \{X \ge 9\}) = P(|X - 5| \ge \underbrace{4}_{= k\sigma}) \le \frac{1}{k^2} \]
With $k\sigma = 4$ we get $k^2 = \frac{16}{\sigma^2} = \frac{16}{np(1 - p)} = \frac{32}{5} = 6.4$. This leads to the bound:
\[ P(\{X \le 1\} \cup \{X \ge 9\}) \le \frac{1}{k^2} = \frac{5}{32} = 0.156\,25 = 15.625\% \]
(b) The exact value is much smaller:
\[ P(\{X \le 1\} \cup \{X \ge 9\}) = P(X = 0) + P(X = 1) + P(X = 9) + P(X = 10) = (1 + 10 + 10 + 1) \cdot 0.5^{10} = \frac{11}{512} = 2.148\% \]
The annual savings of an account are described by a normally distributed random variable $X$ with parameters $\mu = 400$ and $\sigma^2 = 200^2$; $X \sim N(400, 200^2)$. Then $Z = \frac{X - 400}{200}$ is standard normally distributed, $Z \sim N(0, 1)$. We now calculate the proportions of the probability mass that fall into the respective classes and determine the number of accounts within each class:
Class   Standardisation of class limits    Value of distribution function   Difference: portion for class   Number of accounts
(1)     z_1 = (0 − 400)/200 = −2           F_St(−2) = 0.0228                0.0228                          28 044
(2)     z_2 = (200 − 400)/200 = −1         F_St(−1) = 0.1587                0.1359                          167 157
(3)     z_3 = (300 − 400)/200 = −0.5       F_St(−0.5) = 0.3085              0.1498                          184 254
(4)     z_4 = (400 − 400)/200 = 0          F_St(0) = 0.5000                 0.1915                          235 545
(5)     z_5 = (600 − 400)/200 = 1          F_St(1) = 0.8413                 0.3413                          419 799
(6)     z_6 = (800 − 400)/200 = 2          F_St(2) = 0.9772                 0.1359                          167 157
(7)     z_7 = (∞ − 400)/200 = ∞            F_St(∞) = 1.0000                 0.0228                          28 044
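The class proportions in the table can be recomputed with `statistics.NormalDist` from the Python standard library. This is a sketch under the parameters stated above (µ = 400, σ = 200); the list `limits` simply repeats the upper class limits from the table.

```python
from statistics import NormalDist

savings = NormalDist(mu=400, sigma=200)
limits = [0, 200, 300, 400, 600, 800, float("inf")]   # upper class limits

shares = []
previous = 0.0
for limit in limits:
    # Treat the open last class explicitly: F(infinity) = 1.
    cumulative = 1.0 if limit == float("inf") else savings.cdf(limit)
    shares.append(cumulative - previous)
    previous = cumulative

for number, share in enumerate(shares, start=1):
    print(number, round(share, 4))
```

Multiplying each share by the total number of accounts gives the last column of the table.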
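(a)
\[ P(-1 \le Z \le 1) = F_{St}(1) - F_{St}(-1) = 2 \cdot F_{St}(1) - 1 = 2 \cdot 0.8413 - 1 = 68.26\% \]
(b)
\[ P(-1 \le Z \le 1) = P\left(-1 \le \frac{X - 18}{4} \le 1\right) = P(-4 \le X - 18 \le 4) = P(14 \le X \le 22) \]
The corresponding interval for $X$ is given by $[14, 22]$.
(c)
\[ P(10 < X \le 26) = P\left(\frac{10 - 18}{4} < \frac{X - 18}{4} \le \frac{26 - 18}{4}\right) = P(-2 < Z \le 2) = F_{St}(2) - F_{St}(-2) = 2 \cdot F_{St}(2) - 1 = 2 \cdot 0.9772 - 1 = 95.44\% \]
\[ P(X < a) = 0.6 \iff P\Big(\underbrace{\tfrac{X - 50}{5}}_{\sim N(0,1)} < \tfrac{a - 50}{5}\Big) = 0.6 \iff F_{St}\left(\frac{a - 50}{5}\right) = 0.6 \iff \frac{a - 50}{5} = 0.25 \iff a \approx 51.25. \]
\[ P(X \ge b) = 0.75 \iff P(X < b) = 0.25 \iff F_{St}\left(\frac{b - 50}{5}\right) = 0.25 \iff F_{St}\left(\frac{50 - b}{5}\right) = 0.75 \iff \frac{50 - b}{5} = 0.67 \iff b \approx 46.65. \]
\[ P(|X - 50| < c) = 0.6 \iff P\left(\left|\frac{X - 50}{5}\right| < \frac{c}{5}\right) = 0.6 \iff D\left(\frac{c}{5}\right) = 2F_{St}\left(\frac{c}{5}\right) - 1 = 0.6 \iff F_{St}\left(\frac{c}{5}\right) = 0.8 \iff \frac{c}{5} \approx 0.84 \iff c \approx 4.2. \]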
(a)
\[ P(10 \le X \le 12) = \binom{45}{10} \cdot 0.4^{10} \cdot 0.6^{35} + \binom{45}{11} \cdot 0.4^{11} \cdot 0.6^{34} + \binom{45}{12} \cdot 0.4^{12} \cdot 0.6^{33} \]
\[ = 0.005\,750\,56 + 0.012\,198\,1 + 0.023\,040\,9 = 0.040\,99 \approx 4.1\% \]
(b) We calculate expected value and standard deviation of $X$:
\[ \mu = E(X) = 45 \cdot 0.4 = 18, \qquad V(X) = 45 \cdot 0.4 \cdot 0.6 = 10.8, \qquad \sigma = \sqrt{10.8} = 3.286 \]
and approximate $X$ by a $N(\mu, \sigma^2)$ normally distributed random variable:
\[ P(10 \le X \le 12) \approx F_{St}\left(\frac{12.5 - 18}{\sqrt{10.8}}\right) - F_{St}\left(\frac{9.5 - 18}{\sqrt{10.8}}\right) = F_{St}(-1.674) - F_{St}(-2.586) = F_{St}(2.586) - F_{St}(1.674) = 0.9952 - 0.9529 \approx 4.23\% \]
(c) Without continuity correction we obtain a larger error:
\[ P(10 \le X \le 12) = P(9 < X \le 12) \approx F_{St}\left(\frac{12 - 18}{\sqrt{10.8}}\right) - F_{St}\left(\frac{9 - 18}{\sqrt{10.8}}\right) = F_{St}(-1.826) - F_{St}(-2.739) = F_{St}(2.739) - F_{St}(1.826) = 0.9969 - 0.9661 \approx 3.1\% \]
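The three results (exact, with and without continuity correction) can be compared side by side. The sketch below uses `math.comb` and `statistics.NormalDist`; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 45, 0.4

# (a) exact binomial probability P(10 <= X <= 12)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 13))

# Normal approximation with the same mean and variance
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# (b) with continuity correction: widen the interval by 0.5 on both sides
with_correction = approx.cdf(12.5) - approx.cdf(9.5)

# (c) without continuity correction: P(9 < X <= 12)
without_correction = approx.cdf(12) - approx.cdf(9)

print(exact, with_correction, without_correction)
```

The corrected approximation lies much closer to the exact value, which matches the conclusion drawn in (c).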
For a given probability $1 - p$ the limits of the intervals for $U \sim N(0, 1)$ are given by the $(1 - \frac{p}{2})$-quantiles $z[1 - p/2]$ of the standard normal distribution:
\[ 1 - p = P(-u \le U \le u) = F_{St}(u) - F_{St}(-u) = 2 \cdot F_{St}(u) - 1 \;\Rightarrow\; F_{St}(u) = 1 - \frac{p}{2} \;\Rightarrow\; u = z[1 - p/2] \]
Accordingly for $W$ it holds:
\[ 1 - p = P(-w - 5 \le W \le w - 5) = P(-2w \le \underbrace{2(W + 5)}_{= U} \le 2w) = F_{St}(2w) - F_{St}(-2w) = 2 \cdot F_{St}(2w) - 1 \]
\[ \Rightarrow\; F_{St}(2w) = 1 - \frac{p}{2} \;\Rightarrow\; w = \frac{1}{2} \cdot z[1 - p/2] \]
So for the given values of $1 - p$ we have:
1 − p    Quantile              a) Range for U            b) Range for W
80%      z[0.90] = 1.282       −1.282 < U < 1.282        −5.641 < W < −4.359
90%      z[0.95] = 1.645       −1.645 < U < 1.645        −5.823 < W < −4.178
95%      z[0.975] = 1.960      −1.960 < U < 1.960        −5.980 < W < −4.020
100%     z[1.0] = ∞            −∞ < U < ∞                −∞ < W < ∞
Let $X$ be the normally distributed approximation of the random variable describing the points achieved.
(a) $P(X \le 49.5) \approx P\left(Z \le \frac{49.5 - 60}{10}\right) = F_{St}(-1.05) = 1 - F_{St}(1.05) = 1 - 0.8531 = 14.69\%$
(b) $P(79.5 < X \le 95.5) \approx P(1.95 < Z \le 3.55) = F_{St}(3.55) - F_{St}(1.95) = 0.9998 - 0.9744 = 2.54\%$
(c) $10\% = F_{St}(-1.28) = P(Z \le -1.28) \approx P(X \le \underbrace{-1.28 \cdot 10 + 60}_{= 47.2})$
The limit for passing the exam should be set to 47 points.
Then $P(X < 47) = P(X \le 46.5) \approx 1 - F_{St}(1.35) = 8.85\% < 10\%$,
but $P(X < 48) = P(X \le 47.5) \approx 1 - F_{St}(1.25) = 10.56\% > 10\%$.
10 Limit theorems
• Does the central limit theorem put requirements on the original distribution of random variables?
• What does convergence with probability 1 mean? For which law do we need this definition?
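(a) The expectation value of the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ corresponds to the expectation value of the basic population.
\[ E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} \underbrace{E(X_i)}_{= \mu} = \frac{1}{n} \cdot n \cdot \mu = \mu = 1700 \]
For the standard deviation $\sigma_{\bar{X}}$ we obtain the so-called $\sqrt{n}$-law:
\[ \sigma_{\bar{X}}^2 = V(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \underbrace{V(X_i)}_{= \sigma^2} = \frac{1}{n^2} \cdot n \cdot \sigma^2 = \frac{\sigma^2}{n} \;\Rightarrow\; \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{144}{\sqrt{200}} = 10.1824 \]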
Consider the 120 applications as a random sample of a binomially distributed random variable $X$ with parameters $p = 0.4$ and $n = 120$; $X \sim Bi(120; 0.4)$. For $X$ we have $E(X) = n \cdot p = 48$ and $V(X) = n \cdot p \cdot (1 - p) = 28.8$. A proportion of female applicants of 35% corresponds to 42 women and a proportion of 45% corresponds to 54 women among all applicants.
(a) We approximate $X$ by a normally distributed random variable and calculate:
\[ P(41 < X \le 54) = P\Big(\frac{41 - 48}{\sqrt{28.8}} < \underbrace{\frac{X - 48}{\sqrt{28.8}}}_{\approx\, \sim N(0,1)} \le \frac{54 - 48}{\sqrt{28.8}}\Big) \approx F_{St}\left(\frac{54 - 48}{\sqrt{28.8}}\right) - F_{St}\left(\frac{41 - 48}{\sqrt{28.8}}\right) \]
\[ = F_{St}(1.118) - F_{St}(-1.3044) = 0.8682 - 0.0961 = 77.21\% \]
(b) Taking into account the correction for continuity we obtain more or less the same result:
\[ P(41 < X \le 54) \approx F_{St}\left(\frac{54.5 - 48}{\sqrt{28.8}}\right) - F_{St}\left(\frac{41.5 - 48}{\sqrt{28.8}}\right) = F_{St}(1.2112) - F_{St}(-1.2112) = 2 \cdot 0.8871 - 1 = 77.42\% \]
The number of claims $X$ is binomially distributed with $n = 1\,000\,000$ and $p = \frac{1}{100}$.
(a)
µ = E(X) = n · p = 10 000
σ 2 = V(X) = n · p · (1 − p) = 9900
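(b) $X$ can be approximated appropriately by a normally distributed random variable $Y$ with parameters $\mu$ and $\sigma^2$:
\[ P(|Y - \mu| \le y) = 95\% \iff P\Big(\Big|\underbrace{\frac{Y - \mu}{\sigma}}_{= Z \sim N(0,1)}\Big| \le \frac{y}{\sigma}\Big) = 95\% \iff P\left(|Z| \le \frac{y}{\sigma}\right) = 95\% \]
\[ \iff 2F_{St}\left(\frac{y}{\sigma}\right) - 1 = 95\% \iff F_{St}\left(\frac{y}{\sigma}\right) = 0.975 \iff \frac{y}{\sigma} = 1.96 \iff y = 1.96 \cdot \sqrt{9900} = 195.02. \]
With probability 95% the number of claims is in the interval
[10 000 − 195; 10 000 + 195] = [9805; 10 195].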
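(c)
\[ P(Y > y) = 5\% \iff P(Y \le y) = 95\% \iff P\left(\frac{Y - \mu}{\sigma} \le \frac{y - \mu}{\sigma}\right) = 95\% \iff F_{St}\left(\frac{y - \mu}{\sigma}\right) = 95\% \iff \frac{y - \mu}{\sigma} = 1.645 \]
\[ \iff y = 1.645 \cdot \sigma + \mu = 1.645 \cdot \sqrt{9900} + 10\,000 = 10\,163.7 \]
The number of claims exceeds 10 164 with a probability of (at most) 5%.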
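Instead of reading quantiles from a table, `statistics.NormalDist().inv_cdf` can supply them. The sketch below repeats parts (b) and (c) under the same normal approximation; small differences from the hand calculation come from the rounded table quantiles 1.96 and 1.645.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1_000_000, 0.01
mu = n * p                               # expected number of claims
sigma = sqrt(n * p * (1 - p))            # standard deviation

standard_normal = NormalDist()           # N(0, 1)

# (b) symmetric 95% interval around mu
half_width = standard_normal.inv_cdf(0.975) * sigma
interval = (mu - half_width, mu + half_width)

# (c) value exceeded with probability 5%
upper_limit = mu + standard_normal.inv_cdf(0.95) * sigma

print(interval, upper_limit)
```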
11 Point estimation of population parameters
• What does unbiasedness mean? What is consistency? What is efficiency?
• How do you compare two different estimators? Which criterion do you use?
• You can define many different estimators. Is there one estimator which is always “correct”?
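(a) Estimator for the mean:
\[ \hat{\mu}_G = \bar{g} = 600\,000\ € \]
Estimator for the standard deviation:
\[ \hat{\sigma}_G = \sqrt{\frac{n}{n - 1}} \cdot s_G = \sqrt{\frac{225}{224}} \cdot 90\,000\ € \approx 90\,201\ € \]
(b) According to the $\sqrt{n}$-law it holds:
\[ \hat{\sigma}_{\bar{G}} = \frac{1}{\sqrt{n}} \cdot \hat{\sigma}_G = \frac{90\,201\ €}{15} = 6013\ € \]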
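(a) Estimation of the standard deviation in the basic population:
\[ \hat{\sigma}_G = \sqrt{\frac{n}{n - 1}} \cdot s_G = \sqrt{\frac{225}{224}} \cdot 0.005\ \text{kg} \approx 0.005\,011\,1\ \text{kg} \]
(more or less $s_G$)
Estimation of the standard deviation of the sample mean:
\[ \hat{\sigma}_{\bar{G}} = \frac{1}{\sqrt{n}} \cdot \hat{\sigma}_G = \frac{0.005\,011\,1\ \text{kg}}{15} = 0.000\,334\ \text{kg} \]
(b) Point estimator of the mean: $\hat{\mu} = \bar{g} = 0.824$ kg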
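(a) $\hat{\sigma}_A^2$ and $\hat{\sigma}_C^2$ are unbiased:
\[ E(\hat{\sigma}_A^2) = \frac{1}{n} \sum_{j=1}^{n} \underbrace{E\left[(X_j - \mu)^2\right]}_{= \sigma^2} = \frac{1}{n} \cdot n \cdot \sigma^2 = \sigma^2 \]
\[ E(\hat{\sigma}_C^2) = \frac{1}{3} \cdot \underbrace{\left(E(X_1^2) + E(X_2^2) + E(X_n^2)\right)}_{= 3 \cdot E(X_1^2)} - \mu^2 = E(X_1^2) - \mu^2 = \sigma^2 \]
However, $\hat{\sigma}_B^2$ is not unbiased but only asymptotically unbiased:
\[ E(\hat{\sigma}_B^2) = \frac{1}{n - 1} \sum_{j=1}^{n} \underbrace{E\left[(X_j - \mu)^2\right]}_{= \sigma^2} = \frac{n}{n - 1} \cdot \sigma^2 \xrightarrow{\,n \to \infty\,} \sigma^2 \]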
(b) The value of the variance of estimator $\hat{\sigma}_C^2$ is independent of the number $n$. Therefore it will not converge to zero with growing $n$. Thus $\hat{\sigma}_C^2$ is not a consistent estimator.
(c) $\hat{\sigma}_B^2$ is the only biased estimator. Its bias (for $n = 9$) is:
\[ \text{bias} = E(\hat{\sigma}_B^2) - \sigma^2 = \left(\frac{n}{n - 1} - 1\right) \sigma^2 = \left(\frac{9}{8} - 1\right) \sigma^2 = \frac{1}{8} \sigma^2 \]
(d) Probably, one would prefer the slightly biased but consistent estimator $\hat{\sigma}_B^2$ because of its smaller variance. (Please note: Here we are talking about the variance of the estimator!)
12 Interval estimation
• What is the connection between the probability of error and the probability of confidence?
• What is a confidence interval?
• What is the connection between the normal distribution and the chi-square-distribution?
(a)
\[ \mu = \frac{1}{5} \cdot (0 + 1 + 2 + 3 + 4) = 2 \]
\[ \sigma^2 = \frac{1}{5} \cdot (1 + 4 + 9 + 16) - 2^2 = 6 - 4 = 2 \]
(b) Values of $\bar{X}^{(2)} = \frac{1}{2}(X_1 + X_2)$:

             X_1 = 0    X_1 = 1    X_1 = 2    X_1 = 3    X_1 = 4
X_2 = 0      0          0.5        1          1.5        2
X_2 = 1      0.5        1          1.5        2          2.5
X_2 = 2      1          1.5        2          2.5        3
X_2 = 3      1.5        2          2.5        3          3.5
X_2 = 4      2          2.5        3          3.5        4

Probability mass function of $\bar{X}^{(2)}$:

x̄(2)       0       0.5     1       1.5     2       2.5     3       3.5     4
f(x̄(2))    1/25    2/25    3/25    4/25    5/25    4/25    3/25    2/25    1/25
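(c)
\[ E(\bar{X}^{(2)}) = \frac{0 \cdot 1 + 0.5 \cdot 2 + 1 \cdot 3 + 1.5 \cdot 4 + 2 \cdot 5 + 2.5 \cdot 4 + 3 \cdot 3 + 3.5 \cdot 2 + 4 \cdot 1}{25} = 2 \]
\[ E\left((\bar{X}^{(2)})^2\right) = \frac{0.5^2 \cdot 2 + 1^2 \cdot 3 + 1.5^2 \cdot 4 + 2^2 \cdot 5 + 2.5^2 \cdot 4 + 3^2 \cdot 3 + 3.5^2 \cdot 2 + 4^2 \cdot 1}{25} = 5 \]
\[ V(\bar{X}^{(2)}) = 5 - 2^2 = 1 \]
(d)
\[ V(\bar{X}^{(2)}) = \frac{\sigma^2}{2} = \frac{2}{2} = 1 \]
(e) Using the distribution found in b) we obtain:
\[ P(2.5 < \bar{X}^{(2)} \le 3.5) = f(3) + f(3.5) = \frac{3}{25} + \frac{2}{25} = \frac{1}{5} = 20\% \]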
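The distribution of the sample mean in (b) to (e) can be generated by enumerating all 25 equally likely pairs. This sketch assumes two independent draws from {0, 1, 2, 3, 4}, as the 5 × 5 table implies; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(5)                                   # population 0, 1, 2, 3, 4

# Count how often each sample mean occurs among the 25 ordered pairs.
counts = Counter(Fraction(a + b, 2) for a, b in product(values, repeat=2))
pmf = {m: Fraction(c, 25) for m, c in sorted(counts.items())}

mean = sum(m * p for m, p in pmf.items())
var = sum(m**2 * p for m, p in pmf.items()) - mean**2
prob = sum(p for m, p in pmf.items() if Fraction(5, 2) < m <= Fraction(7, 2))

print(pmf, mean, var, prob)
```

The variance of the sample mean equals σ²/2, in line with (d).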
(f) The random variable $\bar{X}^{(50)}$ has the expectation value $E(\bar{X}^{(50)}) = 2$ and the variance $V(\bar{X}^{(50)}) = \frac{\sigma^2}{50} = \frac{2}{50} = \frac{1}{25}$. Applying the central limit theorem, the distribution of $\bar{X}^{(50)}$ can be approximated well by a normal distribution:
\[ P(2.5 < \bar{X}^{(50)} \le 3.5) = P\Big(\frac{2.5 - 2}{1/5} < \underbrace{\frac{\bar{X}^{(50)} - 2}{1/5}}_{\approx Z \sim N(0,1)} \le \frac{3.5 - 2}{1/5}\Big) = P(2.5 < Z \le 7.5) = F_{St}(7.5) - F_{St}(2.5) \approx 1 - 0.9938 = 0.62\% \]
From solution 62 we know:
\[ \hat{\mu}_G = \bar{g} = 600\,000\ € \]
\[ \hat{\sigma}_G = \sqrt{\frac{n}{n - 1}} \cdot s_G = \sqrt{\frac{225}{224}} \cdot 90\,000\ € \approx 90\,201\ € \]
\[ \hat{\sigma}_{\bar{G}} = \frac{1}{\sqrt{n}} \cdot \hat{\sigma}_G = \frac{90\,201\ €}{15} = 6013\ € \]
Considering a sample size of $n = 225$ we assume a normal distribution for the sample mean. A probability of error of $\alpha = 4.55\%$ corresponds to the quantile $z[1 - \alpha/2] = z[0.977\,25] \approx 2.00$.
Therefore we get the confidence interval
\[ CI(\mu, 1 - \alpha) = \left[\bar{g} - z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{G}},\; \bar{g} + z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{G}}\right] \]
\[ CI(\mu; 0.9545) = [600\,000 - 2 \cdot 6013;\; 600\,000 + 2 \cdot 6013] = [587\,974;\; 612\,026]. \]
For the extrapolation we multiply the mean by the number of companies $N = 12\,100$:
CI(Profit of industry sector; 0.9545) = [7.114 billion; 7.406 billion]
The variance in the basic population is known and the characteristic is normally distributed.
(a)
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{100}} = \frac{3}{10} = 0.3 \]
\[ 1 - \alpha = 95\% \;\Rightarrow\; z[1 - \alpha/2] = z[0.975] = 1.96 \]
\[ CI(\mu, 95\%) = [53.97 - 1.96 \cdot 0.3;\; 53.97 + 1.96 \cdot 0.3] = [53.382;\; 54.558] \]
(b) The length of the interval is given by:
\[ 2 \cdot z[1 - \alpha/2] \cdot \sigma_{\bar{X}} = 2 \cdot z[1 - \alpha/2] \cdot \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{3}{\sqrt{n}} = \frac{11.76}{\sqrt{n}} \overset{!}{=} 0.4 \]
Solving for $n$ reveals $n = \left(\frac{11.76}{0.4}\right)^2 = 864.36$. For sample sizes of 865 or larger the length of the confidence intervals is 0.4 or below.
(c) For a symmetric 99% confidence interval we have $z[1 - \alpha/2] = z[0.995] = 2.575$ and therefore:
\[ 0.4 \overset{!}{=} 2 \cdot 2.575 \cdot \frac{3}{\sqrt{n}} \;\Rightarrow\; n = \left(\frac{15.45}{0.4}\right)^2 \approx 1491.9. \]
The minimum sample size is 1492.
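The sample-size calculations in (b) and (c) share one formula, n ≥ (2 · z · σ / L)², where L is the maximum interval length. The sketch below wraps it in a hypothetical helper `min_sample_size` and takes the quantile z as an input, using the rounded table values from the solution.

```python
from math import ceil


def min_sample_size(z, sigma, max_length):
    """Smallest n for which the interval length 2*z*sigma/sqrt(n) is at most max_length."""
    return ceil((2 * z * sigma / max_length) ** 2)


n_95 = min_sample_size(z=1.96, sigma=3, max_length=0.4)    # part (b)
n_99 = min_sample_size(z=2.575, sigma=3, max_length=0.4)   # part (c)
print(n_95, n_99)
```

Note that a more precise quantile (for example 2.5758 instead of 2.575) can shift the result by one unit, so the choice of table value matters near the boundary.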
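(a) The characteristic IQ is assumed to be normally distributed with parameters $\mu = 100$ and $\sigma = \sqrt{225} = 15$. Then $Z = \frac{IQ - 100}{15}$ is standard normally distributed. Thus:
\[ P(IQ > 130) = P\left(\frac{IQ - 100}{15} > \frac{130 - 100}{15}\right) = P(Z > 2) = 1 - F_{St}(2) = 1 - 0.9772 = 2.28\%. \]
(b) For the sample mean $\overline{IQ}$ we find:
\[ E(\overline{IQ}) = 100, \qquad V(\overline{IQ}) = \frac{225}{100} = 2.25, \qquad \sigma_{\overline{IQ}} = 1.5 \]
Again by standardisation:
\[ P(98 < \overline{IQ} < 103) = P\left(\frac{98 - 100}{1.5} < \frac{\overline{IQ} - 100}{1.5} < \frac{103 - 100}{1.5}\right) = F_{St}(2) - F_{St}(-4/3) = 0.9772 - 0.0912 = 88.6\%. \]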
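(a)
\[ E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_{1}^{\infty} x \cdot \frac{1}{x^2}\,dx = \int_{1}^{\infty} \frac{1}{x}\,dx = [\ln x]_{1}^{\infty} = \infty \]
The expectation value does not exist!
(b)
\[ \int_{1}^{x} f(t)\,dt = \int_{1}^{x} \frac{1}{t^2}\,dt = \left[-\frac{1}{t}\right]_{1}^{x} = 1 - \frac{1}{x} \]
Distribution function:
\[ F(x) = \begin{cases} 0 & \text{for } x \le 1 \\ 1 - \frac{1}{x} & \text{for } x > 1 \end{cases} \]
Requirement for median:
\[ F(x_{Med}) \overset{!}{=} \frac{1}{2} \iff 1 - \frac{1}{x_{Med}} = \frac{1}{2} \iff x_{Med} = 2 \]
(c) Requirement for $a$:
\[ P(1 \le X \le a) \overset{!}{=} 95\% \iff F(a) - \underbrace{F(1)}_{= 0} = 95\% \iff F(a) = 95\% \iff 1 - \frac{1}{a} = 0.95 \iff a = 20 \]
Thus $P(1 \le X \le 20) = 95\%$.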
Interval estimation for $\mu$ with unknown variance $\sigma^2$, small sample size ($n = 5$) and normally distributed characteristic in the basic population:
\[ CI(\mu, 1 - \alpha) = \left[\bar{x} - t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}},\; \bar{x} + t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right] \]
Calculation of sample statistics:
\[ \bar{x} = \frac{2\% - 4\% + 3\% - 2\% - 1\%}{5} = -0.4\% \]
\[ s^2 = \frac{(2\%)^2 + (4\%)^2 + (3\%)^2 + (2\%)^2 + (1\%)^2}{5} - (0.4\%)^2 = 6.64\ (\%)^2 \]
\[ \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n - 1}} = \frac{\sqrt{6.64}\,\%}{\sqrt{4}} = 1.288\% \]
\[ t_{n-1}[1 - \alpha/2] = t_4[0.99] = 3.747 \]
Confidence interval:
\[ CI(\mu, 98\%) = [-0.4\% - 3.747 \cdot 1.288\%;\; -0.4\% + 3.747 \cdot 1.288\%] = [-5.226\%;\; 4.426\%] \]
Assessment: Based on the given sample a useful confidence interval at a confidence level of 98% can hardly be indicated.
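The interval can be recomputed from the raw returns. The standard library has no t-distribution, so the quantile t_4[0.99] = 3.747 is taken from the solution as a table value; the rest of the sketch follows the formulas above, with illustrative variable names.

```python
from math import sqrt

returns = [2, -4, 3, -2, -1]               # observed returns in percent
n = len(returns)

mean = sum(returns) / n
s2 = sum(r**2 for r in returns) / n - mean**2   # sample variance with divisor n
se_mean = sqrt(s2 / (n - 1))                    # estimated std. dev. of the sample mean

t_quantile = 3.747                              # t_4[0.99], table value from the solution
ci = (mean - t_quantile * se_mean, mean + t_quantile * se_mean)
print(mean, s2, se_mean, ci)
```

The interval covers both clearly negative and clearly positive mean returns, which is why the solution calls it hardly useful.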
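(a) Interval estimation for $\mu$ with known variance $\sigma^2$:
\[ CI(\mu, 1 - \alpha) = \left[\bar{x} - z[1 - \alpha/2] \cdot \sigma_{\bar{X}},\; \bar{x} + z[1 - \alpha/2] \cdot \sigma_{\bar{X}}\right] \]
\[ \bar{x} = \frac{560 + 532 + \cdots + 515}{7}\ \text{GE} = 536\ \text{GE} \]
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{7}} = 3.694\ \text{GE} \]
\[ z[1 - \alpha/2] = z[0.95] = 1.645 \]
\[ CI(\mu, 90\%) = [536 - 1.645 \cdot 3.694;\; 536 + 1.645 \cdot 3.694] = [529.92;\; 542.08]\ \text{(GE)} \]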
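(b) Interval estimation for the unknown variance $\sigma^2$:
\[ CI(\sigma^2, 1 - \alpha) = \left[\frac{n \cdot s^2}{\chi^2_{n-1}[1 - \alpha/2]},\; \frac{n \cdot s^2}{\chi^2_{n-1}[\alpha/2]}\right] \]
\[ s^2 = \overline{x^2} - \bar{x}^2 = 200.286\ \text{GE}^2 \]
\[ \chi^2_6[0.95] = 12.59, \qquad \chi^2_6[0.05] = 1.635 \]
\[ CI(\sigma^2, 90\%) = \left[\frac{7 \cdot 200.286}{12.59};\; \frac{7 \cdot 200.286}{1.635}\right] = [111.4;\; 857.5]\ (\text{GE})^2 \]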
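(c) Length of interval:
\[ 2 \cdot z[1 - \alpha/2] \cdot \sigma_{\bar{X}} = 2 \cdot z[1 - \alpha/2] \cdot \frac{\sigma}{\sqrt{n}} \overset{!}{=} 10\ \text{GE} \]
\[ \Longrightarrow\; n = \left(\frac{2 \cdot z[1 - \alpha/2] \cdot \sigma}{10\ \text{GE}}\right)^2 = \left(\frac{1.645 \cdot \sqrt{95.5}}{5}\right)^2 = 10.34 \]
The sample size should at least be 11.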
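(a) Sample statistics and required estimators:
\[ \bar{x} = \frac{100 + 103 + 104 + 106 + 112}{5} = 105 \]
\[ \overline{x^2} = \frac{100^2 + 103^2 + 104^2 + 106^2 + 112^2}{5} = 11041 \]
\[ s_X^2 = \overline{x^2} - \bar{x}^2 = 16 \]
\[ \hat{\sigma}_X^2 = \frac{5}{4} \cdot s_X^2 = \frac{5}{4} \cdot 16 = 20 \]
\[ \hat{\sigma}_{\bar{X}}^2 = \frac{1}{5} \cdot \hat{\sigma}_X^2 = \frac{1}{5} \cdot 20 = 4 \]
or directly
\[ \hat{\sigma}_{\bar{X}}^2 = \frac{1}{4} \cdot s_X^2 = \frac{1}{4} \cdot 16 = 4 \]
\[ \hat{\sigma}_{\bar{X}} = 2 \]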
i. Interval estimation for $\mu$ with unknown variance $\sigma^2$, small sample size ($n = 5$) and normally distributed characteristic in the basic population:
\[ CI(\mu, 1 - \alpha) = \left[\bar{x} - t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}},\; \bar{x} + t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right] \]
99.5% quantile of the t-distribution with 4 degrees of freedom:
\[ t_{n-1}[1 - \alpha/2] = t_4[0.995] = 4.604 \]
Confidence interval:
\[ CI(\mu, 99\%) = [105 - 4.604 \cdot 2;\; 105 + 4.604 \cdot 2] = [95.792;\; 114.208] \]
ii. Interval estimation for the unknown variance $\sigma^2$:
\[ CI(\sigma^2, 1 - \alpha) = \left[\frac{n \cdot s_X^2}{\chi^2_{n-1}[1 - \alpha/2]},\; \frac{n \cdot s_X^2}{\chi^2_{n-1}[\alpha/2]}\right] \]
97.5% and 2.5% quantiles of the chi-squared distribution with 4 degrees of freedom:
\[ \chi^2_4[0.975] = 11.14, \qquad \chi^2_4[0.025] = 0.484 \]
Confidence interval:
\[ CI(\sigma^2, 95\%) = \left[\frac{5 \cdot 16}{11.14};\; \frac{5 \cdot 16}{0.484}\right] = [7.18;\; 165.29] \]
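(b) Confidence interval for $\mu$ with unknown variance $\sigma^2$ and large sample size:
\[ CI(\mu, 1 - \alpha) = \left[\bar{x} - z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}},\; \bar{x} + z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right] \]
Sample statistics:
\[ \bar{x} = 150\ \text{g}, \qquad s = 28\ \text{g}, \qquad \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n - 1}} = \frac{28\ \text{g}}{\sqrt{n - 1}}, \qquad z[1 - \alpha/2] = z[0.95] \approx 1.65 \]
Comparing the length of the confidence interval with the difference of the given boundaries yields the sample size:
\[ 2 \cdot z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}} = 156.6\ \text{g} - 143.4\ \text{g} \iff 2 \cdot 1.65 \cdot \frac{28\ \text{g}}{\sqrt{n - 1}} = 13.2\ \text{g} \iff 7 = \sqrt{n - 1} \;\Longrightarrow\; n = 50 \]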
13 Statistical hypotheses testing
• What do type 1 and type 2 error mean?
• When will the normal distribution be used and when the t-distribution?
(a)
\[ \sigma_{\bar{X}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{0.45\ \text{mm}^2}{5}} = 0.3\ \text{mm} \]
(b) No, this is not an estimation, because the variance in the basic population was assumed to be
known.
(c) For α = 3% and a two sided rejection region we find the quantile z[1 − α/2] = z[0.985] = 2.17.
The machine has to be stopped, if the value of the sample mean deviates more than
z[1 − α/2] · σX = 2.17 · 0.3 mm = 0.651 mm
from the target value µ0 .
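Sample statistics and required estimators:
\[ \bar{x} = \frac{12.7 + 13.3 + 13.0 + 12.9 + 13.1}{5} = 13.0 \]
\[ \overline{x^2} = \frac{12.7^2 + 13.3^2 + 13.0^2 + 12.9^2 + 13.1^2}{5} = 169.04 \]
\[ s_X^2 = \overline{x^2} - \bar{x}^2 = 0.04 \]
\[ \hat{\sigma}_X^2 = \frac{5}{4} \cdot s_X^2 = \frac{5}{4} \cdot 0.04 = 0.05 \]
\[ \hat{\sigma}_{\bar{X}}^2 = \frac{1}{5} \cdot \hat{\sigma}_X^2 = \frac{1}{5} \cdot 0.05 = 0.01 \]
or directly
\[ \hat{\sigma}_{\bar{X}}^2 = \frac{1}{4} \cdot s_X^2 = \frac{1}{4} \cdot 0.04 = 0.01 \]
\[ \hat{\sigma}_{\bar{X}} = 0.1 \]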
(a) (1) Formulate hypotheses:
\[ H_0: \mu = \mu_0 = 12.83 \quad \text{vs.} \quad H_1: \mu \ne 12.83 \]
(2) Calculate test quantity:
\[ \frac{|\bar{x} - \mu_0|}{\hat{\sigma}_{\bar{X}}} = \frac{|13 - 12.83|}{0.1} = 1.7 \]
(3) Critical t-value for $\alpha = 5\%$ and 4 degrees of freedom:
\[ t_4[1 - \alpha/2] = t_4[0.975] = 2.776 \]
(4) Test decision:
\[ 1.7 < 2.776 \;\Longrightarrow\; \text{retain } H_0! \]
(b) (1) Formulate hypotheses:
\[ H_0: \mu = \mu_0 = 12.83 \quad \text{vs.} \quad H_1: \mu \ne 12.83 \]
(2) Calculate test quantity; now the test quantity is determined using the known standard deviation of the sample mean $\sigma_{\bar{X}} = \sigma/\sqrt{n} = \sqrt{0.036/5} = 0.084\,85$:
\[ \frac{|\bar{x} - \mu_0|}{\sigma_{\bar{X}}} = \frac{|13 - 12.83|}{0.084\,85} = 2.003 \]
(3) Critical z-value for $\alpha = 5\%$:
\[ z[1 - \alpha/2] = z[0.975] = 1.96 \]
(4) Test decision:
\[ 2.003 > 1.96 \;\Longrightarrow\; \text{reject } H_0! \]
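The two test decisions differ only in which standard deviation enters the test quantity. The sketch below recomputes both statistics from the five measurements; the critical values 2.776 and 1.96 are the table values quoted above, and the variable names are illustrative.

```python
from math import sqrt

sample = [12.7, 13.3, 13.0, 12.9, 13.1]
mu_0 = 12.83
n = len(sample)

mean = sum(sample) / n
s2 = sum((x - mean) ** 2 for x in sample) / n     # sample variance with divisor n

# (a) variance unknown: estimate the std. dev. of the mean, compare with t_4[0.975]
t_stat = abs(mean - mu_0) / sqrt(s2 / (n - 1))
reject_t = t_stat > 2.776

# (b) variance known (sigma^2 = 0.036): compare with z[0.975]
z_stat = abs(mean - mu_0) / sqrt(0.036 / n)
reject_z = z_stat > 1.96

print(t_stat, reject_t, z_stat, reject_z)
```

With the same data, knowing the population variance makes the test more decisive here, because the t-distribution's wider critical region reflects the extra uncertainty from estimating σ.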
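(a) (1) Formulate hypotheses:
\[ H_0: \mu \le \mu_0 = 600\,000\ € \quad \text{vs.} \quad H_1: \mu > 600\,000\ € \]
(2) Calculate test quantity:
\[ \frac{\bar{g} - \mu_0}{\sigma_{\bar{G}}} = \frac{636\,000 - 600\,000}{90\,000/\sqrt{36}} = \frac{36}{15} = 2.4 \]
(3) Critical z-value for $\alpha = 1\%$:
\[ z[1 - \alpha] = z[0.99] = 2.33 \]
(4) Test decision:
\[ 2.4 > 2.33 \;\Longrightarrow\; \text{reject } H_0! \]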
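(b) (1) Formulate hypotheses:
\[ H_0: \mu \le \mu_0 = 600\,000\ € \quad \text{vs.} \quad H_1: \mu > 600\,000\ € \]
(2) Calculate test quantity:
\[ \frac{\bar{g} - \mu_0}{\hat{\sigma}_{\bar{G}}} = \frac{636\,000 - 600\,000}{90\,000/\sqrt{35}} = \frac{36\,000}{15\,212.8} = 2.366 \]
(3) Critical t-value for $\alpha = 1\%$ and 35 degrees of freedom:
\[ t_{35}[1 - \alpha] = t_{35}[0.99] = 2.440 \]
(4) Test decision:
\[ 2.366 < 2.440 \;\Longrightarrow\; \text{retain } H_0! \]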
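(a) One sided Gauß-Test for $\mu$ with known variance $\sigma^2$
Sample statistics:
\[ \bar{x} = \frac{1.512\% + \cdots + 1.396\%}{8} = 1.445\% \]
\[ \sigma_{\bar{X}} = \sqrt{\frac{0.01}{8}}\,\% = 0.0354\% \]
(1) Formulate hypotheses:
\[ H_0: \mu \le \mu_0 = 1.40\% \quad \text{vs.} \quad H_1: \mu > 1.40\% \]
(2) Calculate test quantity:
\[ \frac{\bar{x} - \mu_0}{\sigma_{\bar{X}}} = \frac{1.445 - 1.40}{0.0354} = 1.271 \]
(3) Critical z-value for $\alpha = 1\%$:
\[ z[1 - \alpha] = z[0.99] = 2.33 \]
(4) Test decision:
\[ 1.271 < 2.33 \;\Longrightarrow\; \text{retain } H_0! \]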
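(b) Two sided Gauß-Test for $\mu$ with known variance $\sigma^2$
Sample statistics:
\[ \bar{x} = \frac{8.4\% + \cdots + 8.3\%}{7} = 8.6\% \]
\[ \sigma_{\bar{X}} = \sqrt{\frac{0.5}{7}}\,\% = 0.267\% \]
(1) Formulate hypotheses:
\[ H_0: \mu = \mu_0 = 8\% \quad \text{vs.} \quad H_1: \mu \ne 8\% \]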
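(2) Calculate test quantity:
\[ \frac{|\bar{x} - \mu_0|}{\sigma_{\bar{X}}} = \frac{|8.6\% - 8\%|}{0.267\%} = 2.247 \]
(3) Critical z-value for $\alpha = 5\%$:
\[ z[1 - \alpha/2] = z[0.975] = 1.96 \]
(4) Test decision:
\[ 2.247 > 1.96 \;\Longrightarrow\; \text{reject } H_0! \]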
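Comparison of two means:
\[ \bar{x}_A = 5.9\%, \qquad \bar{x}_B = 7.7\%, \qquad s_A^2 = 0.597, \qquad s_B^2 = 0.328 \]
\[ \hat{\sigma}^2 = \frac{n_A s_A^2 + n_B s_B^2}{n_A + n_B - 2} = \frac{7 \cdot 0.597 + 5 \cdot 0.328}{7 + 5 - 2} = 0.582 \]
\[ \hat{\sigma}_\Delta = \hat{\sigma} \cdot \sqrt{\frac{n_A + n_B}{n_A \cdot n_B}} = \sqrt{0.582}\,\% \cdot \sqrt{\frac{7 + 5}{7 \cdot 5}} = 0.447\% \]
(1) Formulate hypotheses:
\[ H_0: \mu_A = \mu_B \quad \text{vs.} \quad H_1: \mu_A \ne \mu_B \]
(2) Calculate test quantity:
\[ \left|\frac{\bar{x}_A - \bar{x}_B}{\hat{\sigma}_\Delta}\right| = \left|\frac{5.9\% - 7.7\%}{0.447\%}\right| = 4.03 \]
(3) Critical t-value for $\alpha = 5\%$ and $n_A + n_B - 2$ degrees of freedom:
\[ t_{n_A + n_B - 2}[1 - \alpha/2] = t_{10}[0.975] = 2.228 \]
(4) Test decision:
\[ 4.03 > 2.228 \;\Longrightarrow\; \text{reject } H_0! \]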
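The pooled two-sample test quantity can be recomputed from the summary statistics. This sketch uses the sample sizes, means and variances given above; the critical value 2.228 is the table value t_10[0.975], and the variable names are illustrative.

```python
from math import sqrt

n_a, n_b = 7, 5
mean_a, mean_b = 5.9, 7.7          # sample means in percent
s2_a, s2_b = 0.597, 0.328          # sample variances (divisor n)

# Pooled variance estimate with n_a + n_b - 2 degrees of freedom
pooled_var = (n_a * s2_a + n_b * s2_b) / (n_a + n_b - 2)

# Estimated standard deviation of the difference of the two means
se_diff = sqrt(pooled_var * (n_a + n_b) / (n_a * n_b))

t_stat = abs(mean_a - mean_b) / se_diff
reject = t_stat > 2.228            # t_10[0.975], table value from the solution
print(pooled_var, se_diff, t_stat, reject)
```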
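(a) Two sided test for variance:
Sample statistics:
\[ \bar{x} = \frac{5 + 0 + 1 + 3 - 4}{5} = 1\ \text{MU} \]
\[ s_X^2 = \frac{4^2 + 1^2 + 0^2 + 2^2 + 5^2}{5} = 9.2\ \text{MU}^2 \]
(1) Formulate hypotheses:
\[ H_0: \sigma^2 = \sigma_0^2 = 10\ \text{MU}^2 \quad \text{vs.} \quad H_1: \sigma^2 \ne 10\ \text{MU}^2 \]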
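(2) Calculate test quantity:
\[ \frac{n \cdot s_X^2}{\sigma_0^2} = \frac{5 \cdot 9.2}{10} = 4.6 \]
(3) Critical $\chi^2$-values for $\alpha = 5\%$ and $n - 1$ degrees of freedom:
\[ \chi^2_{lower} = \chi^2_{n-1}[\alpha/2] = \chi^2_4[0.025] = 0.484 \]
\[ \chi^2_{upper} = \chi^2_{n-1}[1 - \alpha/2] = \chi^2_4[0.975] = 11.14 \]
(4) Test decision:
\[ 0.484 \le 4.6 \le 11.14 \;\Longrightarrow\; \text{retain } H_0! \]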
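(b) One sided (from above) test for variance:
Sample statistics:
\[ \bar{x} = \frac{3 - 5 + 0 + 2 + 7 - 4}{6} = \frac{1}{2}\ \text{MU} \]
\[ s_X^2 = \frac{2.5^2 + 5.5^2 + 0.5^2 + 1.5^2 + 6.5^2 + 4.5^2}{6} = 16.917\ \text{MU}^2 \]
(1) Formulate hypotheses:
\[ H_0: \sigma^2 \le \sigma_0^2 = 10\ \text{MU}^2 \quad \text{vs.} \quad H_1: \sigma^2 > 10\ \text{MU}^2 \]
(2) Calculate test quantity:
\[ \frac{n \cdot s_X^2}{\sigma_0^2} = \frac{6 \cdot 16.917}{10} = 10.15 \]
(3) Critical $\chi^2$-value for $\alpha = 5\%$ and $n - 1$ degrees of freedom:
\[ \chi^2_{upper} = \chi^2_{n-1}[1 - \alpha] = \chi^2_5[0.95] = 11.07 \]
(4) Test decision:
\[ 10.15 < 11.07 \;\Longrightarrow\; \text{retain } H_0! \]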
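One sided test from below: Check whether the deficiencies have become significantly fewer.
(1) Formulate hypotheses:
\[ H_0: p \ge p_0 = 0.21 \quad \text{vs.} \quad H_1: p < 0.21 \]
(2) Calculate test quantity:
\[ \sigma = \sqrt{\frac{0.21 \cdot (1 - 0.21)}{400}} = \frac{0.407}{20} = 0.0204 \]
\[ \frac{h - p_0}{\sigma} = \frac{0.174 - 0.21}{0.0204} = -1.76 \]
(3) Critical z-value for $\alpha = 5\%$:
\[ z[0.05] = -z[0.95] = -1.645 \]
(4) Test decision:
\[ -1.76 < -1.645 \;\Longrightarrow\; \text{reject } H_0! \]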