Solutions
Contents
1 Statistical attributes and variables
4 Linear regression
10 Limit theorems
12 Interval estimation
1 Statistical attributes and variables
Comprehension questions
Explain in your own words :
• What is a frequency density function and what a distribution function, and which (mathematical)
features do they have to fulfill? How are they linked to each other?
• What kind of information is easily given by a frequency density function and by a distribution
function?
Solution of Exercise 1
Potential examples:
Attribute                          Statistical units           Characteristic values            Type                      Scale
hair color                         males aged 60 to 65         black, brown, blond, gray, ...   qualitative               nominal
income                             tutor                       15 €/h to 20 €/h                 quantitative discrete     cardinal
school grades                      class of 2010               0-15 points                      quantitative discrete     ordinal
gender                             students at FS              male, female                     qualitative               nominal
bank account transfers per month   account at a savings bank   $\mathbb{N}_0$ or e.g. (0-1000)  quantitative discrete     cardinal
body height                        players of the NBA          1.60 m - 2.30 m                  quantitative continuous   cardinal
...
Solution of Exercise 2
(a) The statistical units are the days on which the kiosk is open. Possible attribute values are zero and all
natural numbers. More precisely, the values are bounded above by the maximum number of newspapers
the kiosk gets delivered per day. The attribute is quantitative and discrete, and it is measurable
on a cardinal scale.
(b) The total number of days is n = 200. Dividing each absolute frequency $n_i$ by n gives the
relative frequencies $h_i$. Cumulating these numbers yields the relative cumulative frequencies
$H_i$. These are the values of the distribution function at its steps.
i    newspapers sold x_i    number of days n_i    rel. frequency h_i    rel. cum. frequency H_i
1    0                      21                    0.105                 0.105
2    1                      46                    0.230                 0.335
3    2                      54                    0.270                 0.605
4    3                      40                    0.200                 0.805
5    4                      24                    0.120                 0.925
6    5                      10                    0.050                 0.975
7    6                      5                     0.025                 1.000
Σ                           200                   1.000
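Distribution function $H : \mathbb{R} \to [0, 1]$:
\[
H(x) = \begin{cases}
0 & \text{for } x < 0 \\
0.105 & \text{for } 0 \le x < 1 \\
0.335 & \text{for } 1 \le x < 2 \\
0.605 & \text{for } 2 \le x < 3 \\
0.805 & \text{for } 3 \le x < 4 \\
0.925 & \text{for } 4 \le x < 5 \\
0.975 & \text{for } 5 \le x < 6 \\
1 & \text{for } 6 \le x
\end{cases}
\]
[Figure: step plot of H(x) for x from 0 to 7, with steps at 0.105, 0.335, 0.605, 0.805, 0.925, 0.975 and 1.000.]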
Solution of Exercise 3
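[Figure: polygon of the cumulative frequencies H̄(x) in % (upper panel) and frequency density h̄(x) in % (lower panel) for x from 250 to 650. Markings in the plot: a) the 30 % level, b) the 80 % level with the quantile x[80%], c) the 50 % level with the median x_Med, d) the density panel.]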
Solution of Exercise 4
(a) We use a working table:
i    Points (from ... to below ...)    n_i    h_i      H_i
1    0 - 25                            50     0.125    0.125
2    25 - 50                           90     0.225    0.350
3    50 - 75                           170    0.425    0.775
4    75 - ...                          90     0.225    1.000
Σ                                      400    1.000

[Figure for b): histogram of the frequency density h̄(x) in % and polygon of the cumulative frequencies H̄(x) in %, each plotted for x from 0 to 100.]
(b) The number of students with at most 90 points cannot be read from the table directly, so it has
to be approximated from the given data. There are several ways to do this:
• Since 90 points lies in the last class, $H(90) = H_3 + x$. The unknown $x$ is the portion of
$h_4$ covering the range from 75 to 90 points. Assuming that the distribution inside this class
is uniform and continuous, we obtain:
\[ x = h_4 \cdot \frac{90 - 75}{100 - 75} = 0.225 \cdot 0.6 = 0.135 \]
So we get $H(90) = H_3 + 0.135 = 0.775 + 0.135 = 91\%$, or in absolute terms $91\% \cdot 400 = 364$
students with at most 90 points.
• An alternative is to start with $H(90) = H(100) - y$, where $y$ is the fraction of students who
achieved more than 90 points:
\[ y = h_4 \cdot \frac{100 - 90}{100 - 75} = 0.225 \cdot 0.4 = 0.09 \]
Thus again $H(90) = H_4 - 0.09 = 1 - 0.09 = 91\%$.
For this approximation we assumed that points are continuously and uniformly distributed within
the class from 75 to 100 points. If you assume that only whole numbers of points occur, you get
slightly different results. With appropriate assumptions, those are valid as well.
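The interpolation within a class can be sketched in a few lines of Python. This is only an illustration of the uniform-distribution assumption stated above; the function name `cumulative_at` and its parameters are chosen here and do not come from the exercise.

```python
def cumulative_at(x, lower, upper, H_lower, h_class):
    """Cumulative relative frequency at x, assuming a uniform
    distribution of the observations inside the class [lower, upper)."""
    share_of_class = (x - lower) / (upper - lower)
    return H_lower + h_class * share_of_class


# Class 4 of the working table: 75 to 100 points, H_3 = 0.775, h_4 = 0.225
H_90 = cumulative_at(90, lower=75, upper=100, H_lower=0.775, h_class=0.225)
students = H_90 * 400  # n = 400 students in total

print(f"H(90) = {H_90:.2%}, i.e. about {round(students)} students")
```

The same function gives the value at any other point of the class, for example `cumulative_at(80, 75, 100, 0.775, 0.225)`.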
Solution of Exercise 5
(a)
(b) Histogram:
[Figure: histogram of the frequency density h̄(x), vertical axis from 0 to 0.002, for x from 0 to 1200.]
[Figure: polygon of the cumulative class frequencies H̄(x), vertical axis from 0 to 1, for x from 0 to 1200.]
(c) Median
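i. Using the given data set:
\[ x_{med} = \frac{x_{25} + x_{26}}{2} = \frac{489.9 + 498.0}{2} = 493.95 \]
Using the distribution function of classes $\bar H$ instead (linear interpolation within the class from 400 to 500):
\[ x_{med} = \bar H^{-1}(50\%) = 400 + \frac{50\% - 36\%}{52\% - 36\%} \cdot (500 - 400) = 400 + 87.5 = 487.5 \]
or
\[ x_{med} = \bar H^{-1}(50\%) = 500 - \frac{52\% - 50\%}{52\% - 36\%} \cdot (500 - 400) = 500 - 12.5 = 487.5 \]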
ii. Using the distribution function of classes H̄ we obtain the relative frequency of days with
a turnover of 650 EUR or less by:
\[ \bar H(650) = \bar H(600) + \frac{650 - 600}{700 - 600} \cdot \bigl( \bar H(700) - \bar H(600) \bigr) = 0.64 + \frac{1}{2} \cdot (0.82 - 0.64) = 0.73 = 73\% \]
So, according to the distribution function of classes, the turnover was 650 EUR or less on
73% · 50 = 36.5 (rounded: 37) days.
Using the polygon approximation H̄ as distribution function, we presume a uniform distribution
of the observations within the classes. However, in the class (600, 700] only 3 of the 9 values
lie below the class midpoint, whereas 6 observations lie above it:
612.6  638.6  642.7  |  650.6  651.1  651.3  687.8  689.6  690.7
3 values (≤ 650)     |  6 values (> 650)
2 Measures to describe statistical distributions
• One of the most important measures is the arithmetic mean. Which alternatives do you know
and when would you prefer one measure over another?
• Why is the variance defined by a square operator, why not only as the sum of differences to the
arithmetic mean, or why not only as the sum of absolute differences?
Solution of Exercise 6
(a) The arithmetic mean of the 9,114 numbers results as
\[ \bar x = \frac{1}{9114} \sum_{j=1}^{49} n_j \cdot j = \frac{187 \cdot 1 + 194 \cdot 2 + 194 \cdot 3 + \dots + 207 \cdot 49}{9114} = 25.22 . \]
(b) The uniformly weighted arithmetic mean of the 49 lotto numbers is given by
\[ \bar x_{uniform} = \frac{1}{49} \sum_{j=1}^{49} j = \frac{1}{49} \cdot \frac{49 \cdot 50}{2} = 25 . \]
The uniformly weighted arithmetic mean assumes that all numbers appear with the same
frequency. As the calculation in (a) shows, this is not exactly true for the given data. However,
since 25.22 is close to 25, it is plausible that for very large sample sizes the mean
approaches the uniformly weighted mean.
Solution of Exercise 7
If 40% of the overall income is increased by 12% and the remaining 60% stays constant, the average
changes by the factor
40% · 1.12 + 60% · 1 = 1.048 .
So, after the increase, the arithmetic mean is 1.048 · 3400 € = 3563.20 €.
Since only the top 20% of incomes are increased, the median (50% quantile) is unaffected and
remains 3100 €.
Solution of Exercise 8
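(a) The arithmetic mean is $\bar p = \frac{1}{3} \cdot (6 + 3 + 2) = \frac{11}{3} \approx 3.6667$.
(b) From $\frac{1}{3} \cdot \underbrace{\left( \frac{1}{6} + \frac{1}{3} + \frac{1}{2} \right)}_{=1} = \frac{1}{3}$ we get the harmonic mean as reciprocal value: $H_p = \left( \frac{1}{3} \right)^{-1} = 3$.
(c) The quantity weighted arithmetic mean results as: $\frac{1}{1 + 2 + 3} \cdot (1 \cdot 6 + 2 \cdot 3 + 3 \cdot 2) = 3$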
The correct answer is given by c). This is the total payment (18) divided by the number of dozens of
oranges bought (6).
Solution of Exercise 9
(a) Using
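i             1      2
v_i [km/h]    60     50
h_i           4/9    5/9
the average speed (overall distance by total time) is obtained as weighted arithmetic average $\bar v$:
\[
\text{average speed}
= \frac{1\,\mathrm{h} \cdot 60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \tfrac{5}{4}\,\mathrm{h} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}{1\,\mathrm{h} + \tfrac{5}{4}\,\mathrm{h}}
= \frac{60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \tfrac{5}{4} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}{\tfrac{9}{4}}
= \underbrace{\frac{4}{9} \cdot 60\,\tfrac{\mathrm{km}}{\mathrm{h}} + \frac{5}{9} \cdot 50\,\tfrac{\mathrm{km}}{\mathrm{h}}}_{= h_1 \cdot v_1 + h_2 \cdot v_2 = \bar v}
= 54.4\,\tfrac{\mathrm{km}}{\mathrm{h}}
\]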
(b) The average growth rate is determined by the geometric mean G1+r of the growth factors 1 + r:
i          1        2        3
r_i        1.8%     2.5%     2.0%
1 + r_i    1.018    1.025    1.020
\[ G_{1+r} = \sqrt[3]{1.018 \cdot 1.025 \cdot 1.020} = \sqrt[3]{1.0643} \approx 1.0210 \]
Thus the average growth rate is about 2.1%.
(c) Using
district i                         1       2      3       4       5
rate of unemployment x_i in %      4       3      5       9       6
number of unemployed n_i           1600    750    1000    3600    1500
we obtain the average rate of unemployment (ratio of total number of unemployed to total
number of individuals in the examination) as the harmonic mean $H_X$ of the $x_i$:
\[
\text{average rate of unemployment}
= \frac{\overbrace{1600 + 750 + 1000 + 3600 + 1500}^{\text{total number } n \text{ of unemployed}}}{\underbrace{\frac{1600}{4\%} + \frac{750}{3\%} + \frac{1000}{5\%} + \frac{3600}{9\%} + \frac{1500}{6\%}}_{\text{total number of individuals in the examination}}}
= \frac{8450}{150000} = 5.63\%
\]
\[ = \frac{n}{\frac{n_1}{x_1} + \frac{n_2}{x_2} + \dots + \frac{n_5}{x_5}} = H_X \]
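The three averages of this exercise can be checked numerically. The following is a minimal Python sketch that uses only the numbers given in parts (a) to (c); the variable names are chosen here for illustration.

```python
from statistics import geometric_mean

# (a) weighted arithmetic mean: speeds weighted by the time driven
hours = [1, 5 / 4]
speeds = [60, 50]  # km/h
avg_speed = sum(t * v for t, v in zip(hours, speeds)) / sum(hours)

# (b) geometric mean of the growth factors 1 + r_i
avg_growth = geometric_mean([1.018, 1.025, 1.020]) - 1

# (c) harmonic mean of the rates, weighted by the number of unemployed
rates = [0.04, 0.03, 0.05, 0.09, 0.06]
unemployed = [1600, 750, 1000, 3600, 1500]
avg_rate = sum(unemployed) / sum(n / x for n, x in zip(unemployed, rates))

print(f"average speed:       {avg_speed:.1f} km/h")
print(f"average growth rate: {avg_growth:.2%}")
print(f"average unemployment rate: {avg_rate:.2%}")
```

Which mean is appropriate depends on what is held fixed: time for the speeds, compounding for the growth factors, and the number of unemployed per district for the rates.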
Solution of Exercise 10
(a) Working table
IQ(w)       Deviation           Deviation    IQ(sch)     Deviation               Deviation
            IQ(w) − mean(w)     squared                  IQ(sch) − mean(sch)     squared
90          −10                 100          95          0                       0
90          −10                 100          99          4                       16
97          −3                  9            90          −5                      25
99          −1                  1            105         10                      100
98          −2                  4            85          −10                     100
145         45                  2025         98          3                       9
114         14                  196          110         15                      225
80          −20                 400          96          1                       1
85          −15                 225          69          −26                     676
102         2                   4            103         8                       64
Σ = 1000    0                   3064         950         0                       1216
Mean of IQ(w) = 100, $s_w^2 = 306.4$; mean of IQ(sch) = 95, $s_{sch}^2 = 121.6$
(b) The variance of the w-group is larger. The same is true for the coefficient of variation:
\[ VK_w = \frac{\sqrt{306.4}}{100} = 0.175 , \qquad VK_{sch} = \frac{\sqrt{121.6}}{95} = 0.116 \]
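The working table can be reproduced with Python's standard library. The sketch below uses the population variance (division by n), which is the convention of the table; the helper name `coefficient_of_variation` is chosen here.

```python
from math import sqrt
from statistics import fmean, pvariance

iq_w = [90, 90, 97, 99, 98, 145, 114, 80, 85, 102]
iq_sch = [95, 99, 90, 105, 85, 98, 110, 96, 69, 103]


def coefficient_of_variation(values):
    """Standard deviation (population form) relative to the absolute mean."""
    return sqrt(pvariance(values)) / abs(fmean(values))


for name, data in (("w", iq_w), ("sch", iq_sch)):
    print(name, fmean(data), pvariance(data),
          round(coefficient_of_variation(data), 3))
```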
Solution of Exercise 11
X, Y: The sequences X and Y do not vary at all. Thus $s_X^2 = s_Y^2 = 0$.
U: \[ \bar u = 0.5 ; \quad s_U^2 = \frac{0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2}{8} - 0.5^2 = 0.5 - 0.25 = 0.25 \]
V: \[ \bar v = \frac{5}{9} ; \quad s_V^2 = \frac{1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2}{9} - \frac{25}{81} = \frac{20}{81} \approx 0.2469 \]
W: \[ \bar w = \frac{33}{8} ; \quad s_W^2 = \frac{9 + 9 + 9 + 49 + 25 + 9 + 25 + 16}{8} - \frac{1089}{64} = \frac{119}{64} \approx 1.8594 \]
T: \[ \bar t = \frac{45}{9} = 5 ; \quad s_T^2 = \frac{1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81}{9} - 25 = \frac{285}{9} - 25 = \frac{95}{3} - 25 = \frac{20}{3} \approx 6.6667 \]
Thus for the arithmetic means $\bar x$, $\bar y$, variances $s_X^2$, $s_Y^2$, standard deviations $s_X$, $s_Y$ and coefficients
of variation $VK_X$, $VK_Y$ we have:
Europe                                                            U.S.
$\bar x = 140$ EUR                                                $\bar y = 115$ USD
$s_X^2 = \overline{x^2} - \bar x^2 = 5025$ EUR²                   $s_Y^2 = \overline{y^2} - \bar y^2 = 4650$ USD²
$s_X = 70.88$ EUR                                                 $s_Y = 68.19$ USD
$VK_X = \frac{s_X}{|\bar x|} = \frac{70.88}{140} = 0.5063$        $VK_Y = \frac{s_Y}{|\bar y|} = \frac{68.19}{115} = 0.5930$
(b) Since the variances and standard deviations are measured on different scales (EUR vs. USD),
we cannot compare them directly. It would be possible to make them comparable using a foreign
exchange rate. Another possibility is the (dimensionless) coefficient of variation, which at the
same time relates the dispersion to the absolute level of the values in the data set. By this
measure, the dispersion of expenses is larger for the students from the U.S.
Solution of Exercise 13
Y is a linear transformation of X, namely Y = a + bX with a = 4 and b = −2.5. Using $\bar x = 12$ and
$s_X^2 = 25$ we obtain:
\[ \bar y = a + b \bar x = 4 - 2.5 \cdot 12 = -26 \]
\[ s_Y^2 = b^2 s_X^2 = 6.25 \cdot 25 = 156.25 \]
\[ s_Y = |b| \cdot s_X = 2.5 \cdot 5 = 12.5 = \sqrt{s_Y^2} \]
Solution of Exercise 14
From the given data we obtain:
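\[ \sum_{i=1}^{12} x_i = n \cdot \bar x = 12 \cdot 9 = 108 \quad \Rightarrow \quad \sum_{i=1}^{15} x_i = 108 + 8 + 12 + 13 = 141 \]
\[ \sum_{i=1}^{12} x_i^2 = n \cdot \left( s_X^2 + \bar x^2 \right) = 12 \cdot \left( 2.5^2 + 9^2 \right) = 1047 \quad \Rightarrow \quad \sum_{i=1}^{15} x_i^2 = 1047 + 8^2 + 12^2 + 13^2 = 1424 \]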
3 Two dimensional distributions
Comprehension questions
Explain in your own words :
• How do average and variance differ (in their calculations) from the case of univariate statistics?
Solution of Exercise 15
(a) Scatter plot:
Coefficient of correlation: $r_{XY} = \frac{c_{XY}}{s_X \cdot s_Y} = \frac{8.2857}{2.1381 \cdot 4.1404} = 0.9360$.
There is a strong positive relationship between the IQs of twins within this sample. However,
from the sample we cannot deduce that this is a general rule.
Solution of Exercise 16
(See also solution 11 on page 12.)
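X, Y: The sequences X and Y do not vary. Thus $(x_i - \bar x) = (y_i - \bar y) = 0$ for all i. This implies that
all covariances involving the sequences X or Y vanish. Therefore $c_{XU} = c_{YV} = 0$.
$c_{VT}$: \[ \overline{v \cdot t} = \frac{1 \cdot 1 + 0 \cdot 2 + 1 \cdot 3 + 0 \cdot 4 + 1 \cdot 5 + 0 \cdot 6 + 1 \cdot 7 + 0 \cdot 8 + 1 \cdot 9}{9} = \frac{25}{9} \]
\[ c_{VT} = \overline{v \cdot t} - \bar v \cdot \bar t = \frac{25}{9} - \frac{5}{9} \cdot \frac{45}{9} = 0 \]
$c_{UV}$: Since the sequences U and V are of different lengths, $c_{UV}$ is undefined.
$r_{UW}$: First we calculate the covariance:
\[ \overline{u \cdot w} = \frac{0 + 3 + 0 + 7 + 0 + 3 + 0 + 4}{8} = \frac{17}{8} , \qquad c_{UW} = \overline{u \cdot w} - \bar u \cdot \bar w = \frac{17}{8} - \frac{1}{2} \cdot \frac{33}{8} = \frac{1}{16} = 0.0625 \]
and then the correlation coefficient:
\[ r_{UW} = \frac{c_{UW}}{s_U \cdot s_W} = \frac{0.0625}{0.5 \cdot 1.3636} = 0.09167 . \]
$r_{YV}$: Since $s_Y = 0$, $r_{YV}$ is undefined (division by zero).
$r_{VT}$: Because $c_{VT} = 0$ and $s_V, s_T \neq 0$, we have $r_{VT} = \frac{c_{VT}}{s_V \cdot s_T} = 0$.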
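These covariances can be verified exactly with fractions. In the sketch below the sequences U, V, W and T are reconstructed from the product terms and squares shown in solutions 11 and 16, so they are an assumption about the exercise data; the function names are chosen here.

```python
from fractions import Fraction
from math import sqrt


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


def covariance(xs, ys):
    """Population covariance: mean of the products minus product of the means."""
    if len(xs) != len(ys):
        raise ValueError("covariance is undefined for sequences of different length")
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)


def correlation(xs, ys):
    var_x, var_y = covariance(xs, xs), covariance(ys, ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined: division by zero
    return float(covariance(xs, ys)) / sqrt(var_x * var_y)


U = [0, 1, 0, 1, 0, 1, 0, 1]
W = [3, 3, 3, 7, 5, 3, 5, 4]
V = [1, 0, 1, 0, 1, 0, 1, 0, 1]
T = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(covariance(U, W), correlation(U, W))
print(covariance(V, T), correlation(V, T))
```

Working with `Fraction` avoids rounding noise, so a covariance of exactly zero is reported as zero.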
Solution of Exercise 17
(a) Two characteristics are gathered: age and type of employment. Age is cardinal but clustered
in intervals, whereas type of employment is nominal and has only two possible values (It’s a so
called dichotomous attribute).
Age classes                            Self-employed    Employees (dep.)    Marginal distribution Age
15 – 25                                0.0028           0.1116              0.1144
25 – 35                                0.0165           0.1804              0.1969
35 – 45                                0.0397           0.2645              0.3042
45 – 55                                0.0348           0.2141              0.2489
55 – 65                                0.0218           0.0996              0.1214
65 – 95                                0.0075           0.0067              0.0142
Marginal distrib. Type of employm.     0.1231           0.8769              1
Histogram of the attribute age class conditional on dependently employed people:
(d) For the determination of the conditional medians we consider the distribution functions of the
conditional distributions:
Age classes Self-employed Employees (dep.)
15 – 25 0.0227 0.1272
25 – 35 0.1569 0.3329
35 – 45 0.4796 0.6345
45 – 55 0.7621 0.8786
55 – 65 0.9393 0.9922
65 – 95 1 1
For the self-employed, the median is located in the age interval of 45-55 years (since the value
1/2 is reached therein). Using a polygon to approximate the distribution function, the median is given by
(see sketch below):
\[ x_{med} = 45 + (55 - 45) \cdot \frac{0.5 - 0.4796}{0.7621 - 0.4796} \approx 45.72 . \]
For the employees, the median lies in the age interval of 35-45 years, and the same interpolation gives:
\[ y_{med} = 35 + (45 - 35) \cdot \frac{0.5 - 0.3329}{0.6345 - 0.3329} \approx 40.54 . \]
(e) For the calculation of the conditional means it is assumed that the midpoint of a class represents
the average of the observations within that class. This is unlikely to be exactly the case. However,
it is a reasonable approximation and the errors should be small.
Relative frequencies
Avg. of class Self-employed Employees (dep.)
20 0.0227 0.1272
30 0.1342 0.2057
40 0.3227 0.3016
50 0.2825 0.2441
60 0.1772 0.1136
80 0.0606 0.0077
This reveals a conditional arithmetic mean x for the self-employed of
Solution of Exercise 19
(a) At first, we have absH(Y = y1 ) = 80 ⇒ relH(Y = y1 ) = 80/200 = 0.4 . From this, by
completion of the first column and the last row we find the two values 0.2 . Now we complete
the second column by 0.1 .
Now relH(Y = y3 |X = x1 ) = 0.5 means, that conditional on X = x1 the relative frequency
of y3 is as large as for y1 and y2 altogether. Thus we have h(x1 , y3 ) = h(x1 , y1 ) + h(x1 , y2 ) =
0.2 + 0.1 = 0.3 .
Finally, we find the missing values 0.1 , 0.4 and 0.6 by completion of rows or columns.
h(xi , yj ) y1 = 0 y2 = 1 y3 = 4 Σ
x1 = 1 0.2 0.1 0.3 0.6
x2 = 2 0.2 0.1 0.1 0.4
Σ 0.4 0.2 0.4 1
(b) X and Y are statistically dependent, for instance we have:
(c) The condition x + y = 2 is met by pairs (x2 = 2, y1 = 0) and (x1 = 1, y2 = 1) . Their quantity
is given by:
n · (h(2, 0) + h(1, 1)) = 200 · (0.2 + 0.1) = 60
(d)
4 Linear regression
Comprehension questions
Explain in your own words :
• Discuss how a regression line and causality are related to each other!
Solution of Exercise 20
Working table:
Month xi yi xi − x yi − y (xi − x)2 (yi − y)2 (xi − x) · (yi − y)
January 3000 200 200 50 40 000 2500 10 000
February 3200 250 400 100 160 000 10 000 40 000
March 2900 200 100 50 10 000 2500 5000
April 2700 150 −100 0 10 000 0 0
May 2700 150 −100 0 10 000 0 0
June 2800 150 0 0 0 0 0
July 2600 100 −200 −50 40 000 2500 10 000
August
September 2500 50 −300 −100 90 000 10 000 30 000
October 2600 70 −200 −80 40 000 6400 16 000
November 2800 150 0 0 0 0 0
December 3000 180 200 30 40 000 900 6000
Σ = 30 800 1650 440 000 34 800 117 000
x = 2800 y = 150 s2X = 40000 s2Y = 3163.64 cXY = 10 636.36
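\[ r_{XY} = \frac{c_{XY}}{s_X \cdot s_Y} = \frac{10\,636.36}{200 \cdot 56.246} = 0.9455 \]
(b) i. Regress Y on X:
\[ b = \frac{c_{XY}}{s_X^2} = \frac{10\,636.36}{40000} = 0.2659 , \qquad a = \bar y - b \cdot \bar x = 150 - 0.2659 \cdot 2800 = -594.52 \]
ii. Regress X on Y:
\[ b' = \frac{c_{XY}}{s_Y^2} = \frac{10\,636.36}{3163.64} = 3.3621 , \qquad a' = \bar x - b' \cdot \bar y = 2800 - 3.3621 \cdot 150 = 2295.6897 \]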
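Both regression lines follow from the same three moments, which the following Python sketch recomputes from the eleven months of the working table (August has no data). The variable names are chosen here. The intercept of the first line differs slightly from the value above because no rounded slope is used.

```python
from math import sqrt
from statistics import fmean

production = [3000, 3200, 2900, 2700, 2700, 2800, 2600, 2500, 2600, 2800, 3000]
overtime = [200, 250, 200, 150, 150, 150, 100, 50, 70, 150, 180]

mx, my = fmean(production), fmean(overtime)
# Population moments (division by n), as in the working table
s2_x = fmean([(x - mx) ** 2 for x in production])
s2_y = fmean([(y - my) ** 2 for y in overtime])
c_xy = fmean([(x - mx) * (y - my) for x, y in zip(production, overtime)])

r = c_xy / sqrt(s2_x * s2_y)

# Regression of Y on X: y = a + b * x
b = c_xy / s2_x
a = my - b * mx

# Regression of X on Y: x = a' + b' * y
b_prime = c_xy / s2_y
a_prime = mx - b_prime * my

print(f"r = {r:.4f}")
print(f"Y on X: y = {a:.2f} + {b:.4f} x")
print(f"X on Y: x = {a_prime:.2f} + {b_prime:.4f} y")
```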
(c)
[Figure: scatter plot of overtime (vertical axis, 50 to 250) against production (horizontal axis, 2400 to 3300) with both regression lines, "Regression Y on X" and "Regression X on Y".]
(d) Estimation for the quantity produced during a “usual” month, where no overtime is done:
(use x(y) and set y = 0)
x(0) = 2295.6897
Solution of Exercise 21
(a) Working table:
Household xi yi xi − x yi − y (xi − x)2 (yi − y)2 (xi − x) · (yi − y)
1 40 80 −20 −18 400 324 360
2 45 80 −15 −18 225 324 270
3 60 90 0 −8 0 64 0
4 80 140 20 42 400 1764 840
5 75 100 15 2 225 4 30
Σ= 300 490 1250 2480 1500
x = 60 y = 98 s2X = 250 s2Y = 496 cXY = 300
(c) Regress Y on X:
\[ b = \frac{c_{XY}}{s_X^2} = \frac{300}{250} = 1.2 , \qquad a = \bar y - b \cdot \bar x = 98 - 1.2 \cdot 60 = 26 \]
(d) Regress X on Y:
\[ b' = \frac{c_{XY}}{s_Y^2} = \frac{300}{496} = 0.6048 , \qquad a' = \bar x - b' \cdot \bar y = 60 - 0.6048 \cdot 98 = 0.7296 \]
(e) “More income means, you can afford more expensive food.”
“More and expensive dairy products support you such that your income increases.”
Solution of Exercise 22
In general the given information is sufficient to calculate a linear regression. However, in doing so, the
calculation reveals, that there is something wrong with the numbers:
From this we deduce values for the correlation coefficient and the coefficient of determination:
\[ r_{XY} = \frac{17}{\sqrt{18 \cdot 15}} = \frac{17}{16.4316} > 1 \]
\[ R^2 = r_{XY}^2 = \frac{289}{270} > 1 \]
However, a correlation coefficient or a coefficient of determination larger than 1 can never happen.
Thus the values given can’t be correct.
Solution of Exercise 23
(a) Working table:
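From the sums calculated above we obtain (x and y measured in %):
\[ \bar x = \frac{10.8\%}{6} = 1.8\% \]
\[ \bar y = \frac{13.2\%}{6} = 2.2\% \]
\[ s_X^2 = \frac{32.56}{6} - 1.8^2 = 2.1867 \]
\[ s_Y^2 = \frac{49.76}{6} - 2.2^2 = 3.4533 \]
\[ c_{XY} = \frac{35.60}{6} - 1.8 \cdot 2.2 = 1.9733 \]
\[ r_{XY} = \frac{1.9733}{\sqrt{2.1867} \cdot \sqrt{3.4533}} = 0.7181 \]
(b)
\[ b = \frac{c_{XY}}{s_X^2} = \frac{1.9733}{2.1867} = 0.9024 \]
\[ a = \bar y - b \bar x = 2.2\% - 0.9024 \cdot 1.8\% = 0.5757\% \]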
(c)
\[ R^2 = \frac{s_{\hat Y}^2}{s_Y^2} = \frac{1.7807}{3.4533} = 0.5157 \]
or alternatively
\[ R^2 = r_{XY}^2 = 0.7181^2 = 0.5157 . \]
5 Combinatorics and counting principles
Comprehension questions
Explain in your own words :
• Are you able to transfer (the methods of) combinatorics to other issues? Give examples for
problems beyond your statistics and math classes.
Solution of Exercise 24
Assuming that the English language uses all 26 Latin letters, and given that 5 of them are vowels, the
number of combinations with a vowel in the middle is
21 · 5 · 21 = 2205.
However, sometimes the “y” is counted as a vowel. Then, the result becomes
20 · 6 · 20 = 2400.
In either case, this theoretical calculation does not mean that all possible combinations are plausible
from a linguistic point of view.
Solution of Exercise 25
6 correct numbers: $\binom{6}{6} \binom{43}{0} = 1 \cdot 1 = 1$
5 correct numbers with bonus: $\binom{6}{5} \binom{1}{1} \binom{42}{0} = \binom{6}{5} = 6$
5 correct numbers without bonus: $\binom{6}{5} \binom{1}{0} \binom{42}{1} = 6 \cdot 1 \cdot 42 = 252$
4 correct numbers: $\binom{6}{4} \binom{43}{2} = 15 \cdot 903 = 13\,545$
3 correct numbers: $\binom{6}{3} \binom{43}{3} = 20 \cdot 12\,341 = 246\,820$
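The counts can be reproduced with `math.comb`. The sketch below assumes a 6-out-of-49 lottery with one bonus number among the 43 numbers not drawn, which is the setting the binomial coefficients above suggest; the dictionary keys are labels chosen here.

```python
from math import comb

counts = {
    "6 correct": comb(6, 6) * comb(43, 0),
    "5 correct with bonus": comb(6, 5) * comb(1, 1) * comb(42, 0),
    "5 correct without bonus": comb(6, 5) * comb(1, 0) * comb(42, 1),
    "4 correct": comb(6, 4) * comb(43, 2),
    "3 correct": comb(6, 3) * comb(43, 3),
}
total_tickets = comb(49, 6)  # number of possible tickets

for label, ways in counts.items():
    print(f"{label}: {ways} ways, probability {ways / total_tickets:.6%}")
```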
Solution of Exercise 27
Each area of the team can be set up using an ordered selection without repetition (a combination where order matters):
2 strikers: $\frac{5!}{(5 - 2)!} = 5 \cdot 4 = 20$ possibilities
5 players for mid-field: $\frac{7!}{(7 - 5)!} = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 = 2520$ possibilities
3 players for defense: $\frac{6!}{(6 - 3)!} = 6 \cdot 5 \cdot 4 = 120$ possibilities
1 goal keeper: $\frac{3!}{(3 - 1)!} = 3$ possibilities
6 Fundamentals of probability theory
Comprehension questions
Explain in your own words :
• How did Laplace define probability? What is statistical probability and what does subjective
probability mean?
Solution of Exercise 29
(a) Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
(b) The event B, a 5 on the first throw, has the following 6 elements (elementary events)
B = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)} ,
thus $P(B) = \frac{1}{6}$.
The event A, that the sum is 10 or larger, again has 6 elements (elementary events)
A = {(4, 6), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)} ,
thus $P(A) = \frac{1}{6}$.
The joint event A ∩ B = {(5, 5), (5, 6)} has two elementary events, so $P(A \cap B) = \frac{1}{18}$.
In order to determine the conditional probability P(A|B) we calculate
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/18}{1/6} = \frac{1}{3} . \]
(c) The modified condition B′, that at least one of the dice shows a 5, consists of 11 elements:
B′ = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (1, 5), (2, 5), (3, 5), (4, 5), (6, 5)}
Thus $P(B') = \frac{11}{36}$. The event A ∩ B′ = {(5, 5), (5, 6), (6, 5)} has 3 elements,
thus $P(A \cap B') = \frac{1}{12}$. For the conditional probability P(A|B′) we now have
\[ P(A|B') = \frac{P(A \cap B')}{P(B')} = \frac{1/12}{11/36} = \frac{3}{11} . \]
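Because the sample space has only 36 equally likely elements, the conditional probabilities can also be found by plain enumeration. A minimal Python sketch, with event functions named here for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space of two throws of a die, all 36 outcomes equally likely
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """Laplace probability: favorable outcomes divided by possible outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def sum_at_least_10(w):       # event A
    return w[0] + w[1] >= 10


def first_is_5(w):            # event B
    return w[0] == 5


def at_least_one_5(w):        # event B'
    return 5 in w


print(conditional(sum_at_least_10, first_is_5))
print(conditional(sum_at_least_10, at_least_one_5))
```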
Solution of Exercise 30
There are $2^6$ different six-digit patterns consisting of heads and tails, e.g. (H, H, T, T, T, H). All of
them appear with an equal probability of $\frac{1}{2^6}$. We are interested in patterns with exactly three H. The
number of such patterns is $\binom{6}{3}$. Thus, the probability we are looking for is given by
\[ \frac{\binom{6}{3}}{2^6} = \frac{20}{64} = \frac{5}{16} = 31.25\% . \]
Solution of Exercise 31
Thus:
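\[ P(A) = \frac{12}{36} = \frac{1}{3} \]
\[ P(B) = \frac{21}{36} = \frac{7}{12} \]
\[ P(A \cap B) = \frac{7}{36} \]
\[ P(A) \cdot P(B) = \frac{1}{3} \cdot \frac{7}{12} = \frac{7}{36} = P(A \cap B) \]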
Therefore A and B are independent.
Solution of Exercise 32
With two children the sample space Ω consists of four elements, all with the same probability (B ≙ boy;
G ≙ girl):
Ω = {(B; B), (B; G), (G; B), (G; G)}
We determine the probabilities a), b), c) by the ratio (# of favorable outcomes) / (# of possible outcomes).
(a) $\frac{|\{(B; B)\}|}{|\Omega|} = \frac{1}{4}$
(b) $\frac{|\{(B; B)\}|}{|\{(B; B), (B; G), (G; B)\}|} = \frac{1}{3}$
(c) $\frac{|\{(B; B)\}|}{|\{(B; B), (B; G)\}|} = \frac{1}{2}$
Solution of Exercise 33
Arrival (from – to)      Minutes    Train to A    Train to B    Train to C
2:00 pm – 2:10 pm        10         10
2:10 pm – 2:30 pm        20                       20
2:30 pm – 3:00 pm        30                                     30
3:00 pm – 3:10 pm        10         10
3:10 pm – 3:30 pm        20                       20
3:30 pm – 4:00 pm        30                                     30
Σ                        120        20            40            60
\[ P(A) = \frac{20}{120} = \frac{1}{6} , \qquad P(B) = \frac{40}{120} = \frac{1}{3} , \qquad P(C) = \frac{60}{120} = \frac{1}{2} \]
Solution of Exercise 34
(b) $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.3} = \frac{2}{3}$
Solution of Exercise 35
Solution of Exercise 36
Let F be the event that a randomly chosen good is faulty. Let A and B be the events that the good
was produced by the respective machine. Then we know:
$P(A) = 0.7$ , $P(B) = P(\bar A) = 0.3$ , $P(F|A) = 0.08$ , $P(F|B) = 0.06$ .
By the law of total probability,
\[ P(F) = P(F|A) \cdot P(A) + P(F|B) \cdot P(B) = 0.08 \cdot 0.7 + 0.06 \cdot 0.3 = 0.074 \]
and finally the conditional probability
\[ P(A|F) = \frac{P(F|A) \cdot P(A)}{P(F)} = \frac{0.08 \cdot 0.7}{0.074} \approx 0.7568 = 75.68\% . \]
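The two steps, total probability and then Bayes' theorem, can be written as one small function. This Python sketch is an illustration with a function name chosen here; it works for any partition of the sample space, not only for two machines.

```python
def posterior(priors, likelihoods):
    """Return P(F) and the posteriors P(cause | F) for a partition of causes.

    priors[i]      = P(cause_i)
    likelihoods[i] = P(F | cause_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # law of total probability
    return total, [j / total for j in joint]


# Machines A and B with their shares of production and fault rates
p_faulty, (p_a_given_f, p_b_given_f) = posterior([0.7, 0.3], [0.08, 0.06])

print(f"P(F) = {p_faulty:.3f}")
print(f"P(A|F) = {p_a_given_f:.2%}, P(B|F) = {p_b_given_f:.2%}")
```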
Solution of Exercise 37
Drawing without replacement is like randomly ordering the parts. For an arbitrary position in the
ordered sequence of the 25 parts, the probability that a faulty part ends up there is
\[ \frac{5}{25} = \frac{1}{5} = 20\% . \]
This is the correct answer for a) as well as for b).
Solution of Exercise 38
(a) Let M be the event that a person randomly chosen from the group is a man, W the event that
a person is a woman, and let T be the event that a person is taller than 1.90 m. Then we know:
$P(T|M) = 0.04$, $P(T|W) = 0.01$, $P(W) = 0.6 \Rightarrow P(M) = 0.4$
Applying Bayes’ theorem we compute the desired probability P(W|T):
\[ P(W|T) = \frac{P(T|W) \cdot P(W)}{P(T|W) \cdot P(W) + P(T|M) \cdot P(M)} = \frac{0.01 \cdot 0.6}{0.01 \cdot 0.6 + 0.04 \cdot 0.4} = \frac{3}{11} \approx 0.27 \]
(b) Let $P_1$, $P_2$, $P_3$ be the events that the pond containing 1, 2 or 3 fish is chosen randomly.
\[ P(P_1) = P(P_2) = P(P_3) = \frac{1}{3} \]
Let M be the event that a fish is caught and marked and that on the next day an unmarked
fish is caught from the same pond. Then we have:
\[ P(M|P_1) = 0 , \qquad P(M|P_2) = \frac{1}{2} , \qquad P(M|P_3) = \frac{2}{3} \]
According to the law of total probability we get:
\[ P(M) = P(M|P_1) \cdot P(P_1) + P(M|P_2) \cdot P(P_2) + P(M|P_3) \cdot P(P_3) = 0 + \frac{1}{2} \cdot \frac{1}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{7}{18} \]
Finally, applying Bayes’ theorem yields the probabilities that the chosen pond contains (i) one,
(ii) two or (iii) three fish:
i. $P(P_1|M) = \frac{P(M|P_1) \cdot P(P_1)}{P(M)} = 0$
ii. $P(P_2|M) = \frac{P(M|P_2) \cdot P(P_2)}{P(M)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{7}{18}} = \frac{3}{7}$
iii. $P(P_3|M) = \frac{P(M|P_3) \cdot P(P_3)}{P(M)} = \frac{\frac{2}{3} \cdot \frac{1}{3}}{\frac{7}{18}} = \frac{4}{7}$
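(c) Let $W_i$, i = 1, 2 be the events ‘White ball with the ith draw’.
i. \[ P(W_2) = P(W_2|W_1) \cdot P(W_1) + P(W_2|W_1^C) \cdot P(W_1^C) = \frac{3}{8} \cdot \frac{4}{9} + \frac{4}{8} \cdot \frac{5}{9} = \frac{32}{8 \cdot 9} = \frac{4}{9} \]
ii. \[ P(W_1|W_2) = \frac{P(W_2|W_1) \cdot P(W_1)}{P(W_2)} = \frac{\frac{3}{8} \cdot \frac{4}{9}}{\frac{4}{9}} = \frac{3}{8} \]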
7 Random variables in one dimension
Comprehension questions
Explain in your own words :
• What is a random variable? How is the sample space Ω linked to the set of events E?
Solution of Exercise 39
A real valued function F defined on R is a distribution function, if and only if the following conditions
hold:
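(a) For the mass function $f_X : \mathbb{R} \to \mathbb{R}_0^+$ of X it holds
\[
f_X(x) = \binom{4}{x} \cdot \left( \frac{1}{2} \right)^4 =
\begin{cases}
\frac{1}{16} & \text{for } x = 0 \\
\frac{4}{16} = \frac{1}{4} & \text{for } x = 1 \\
\frac{6}{16} = \frac{3}{8} & \text{for } x = 2 \\
\frac{4}{16} = \frac{1}{4} & \text{for } x = 3 \\
\frac{1}{16} & \text{for } x = 4
\end{cases}
\qquad \text{and } f_X(x) = 0 \text{ otherwise.}
\]
[Figure: step plot over x from 0 to 4 with the levels 0, 1/16, 5/16 and 11/16 marked on the vertical axis.]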
Remark: X is binomially distributed with parameters n = 4 and $p = \frac{1}{2}$, $X \sim Bi(4, \frac{1}{2})$. Thus the
results can be obtained directly from properties of binomially distributed random variables: For the
expectation value it holds that $E(X) = n \cdot p = 4 \cdot \frac{1}{2} = 2$, and for the variance
$V(X) = n \cdot p \cdot (1 - p) = 4 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1$.
Solution of Exercise 41
(a) The (triangular) area under the graph of the density function must be of size 1:
\[ \frac{1}{2} \cdot 3 \cdot 2a = 3a \overset{!}{=} 1 \]
Thus $a = \frac{1}{3}$.
As an alternative one sets the integral $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 2ax\,dx + \int_1^3 (3a - ax)\,dx$ equal to 1.
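(b) [Figure: graph of the triangular density f(x) for x from 0 to 3, with its peak of height 2/3 at x = 1.]
(c)
\[ P(X = 1) = 0 \]
\[
P(0.5 < X < 2) = \int_{0.5}^{1} \frac{2}{3} x \, dx + \int_{1}^{2} \left( 1 - \frac{1}{3} x \right) dx
= \left[ \frac{1}{3} x^2 \right]_{0.5}^{1} + \left[ x - \frac{1}{6} x^2 \right]_{1}^{2}
= \frac{1}{3} - \frac{1}{12} + 2 - \frac{2}{3} - 1 + \frac{1}{6} = \frac{3}{4}
\]
(d) The event (X < 0.5) is a subset of (X < 1), thus P(X < 1|X < 0.5) = 1 .
(e) No. It is apparent from the graph in b) that the mode lies at $X_{Mod} = 1$. However, the
expectation value for this positively skewed distribution is larger than that.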
Solution of Exercise 42
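(a) [Figure: graph of the density f(x) for x from 0 to 3, with the level 0.5 marked on the vertical axis.]
(b)
\[
P(X \le 2) = \int_{-\infty}^{2} f(x)\,dx = \frac{2}{9} \int_{0}^{2} (3x - x^2)\,dx
= \frac{2}{9} \left[ \frac{3}{2} x^2 - \frac{1}{3} x^3 \right]_0^2
= \frac{2}{9} \left( 6 - \frac{8}{3} \right) = \frac{20}{27}
\]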
Solution of Exercise 43
(a) Since every distribution function is continuous from the right, we have:
(b)
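\[
f(x) = F'(x) = \begin{cases} \frac{x - 1}{2} & \text{for } 1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}
\]
(d)
\[
E(X) = \int_{-\infty}^{+\infty} x \cdot f(x)\,dx = \frac{1}{2} \int_1^3 \left( x^2 - x \right) dx
= \frac{1}{2} \left[ \frac{x^3}{3} - \frac{x^2}{2} \right]_1^3
= \frac{1}{2} \left( 9 - \frac{9}{2} - \frac{1}{3} + \frac{1}{2} \right) = \frac{7}{3}
\]
\[
E(X^2) = \int_{-\infty}^{+\infty} x^2 \cdot f(x)\,dx = \frac{1}{2} \int_1^3 \left( x^3 - x^2 \right) dx
= \frac{1}{2} \left[ \frac{x^4}{4} - \frac{x^3}{3} \right]_1^3
= \frac{1}{2} \left( \frac{81}{4} - 9 - \frac{1}{4} + \frac{1}{3} \right) = \frac{17}{3}
\]
\[ V(X) = E(X^2) - [E(X)]^2 = \frac{17}{3} - \frac{49}{9} = \frac{2}{9} \]
(e) The quartiles $Q_k$ fulfil the condition $F(Q_k) = \frac{k}{4}$ for k = 1, 2, 3. Therefore $1 \le Q_k \le 3$ and
\[
F(Q_k) = \frac{k}{4}
\iff \left( \frac{Q_k - 1}{2} \right)^2 = \frac{k}{4}
\iff Q_k - 1 = \sqrt{k}
\iff Q_k = 1 + \sqrt{k} ,
\]
thus
\[ Q_1 = 1 + \sqrt{1} = 2 , \qquad Q_2 = 1 + \sqrt{2} , \qquad Q_3 = 1 + \sqrt{3} . \]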
Solution of Exercise 44
For a) and c) we use Chebyshev’s inequality in the form
\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
(a) Let $k\sigma = 2 \Rightarrow k^2 = \frac{4}{\sigma^2} = \frac{4}{2} = 2$.
Then we have $P(8 < X < 12) = P(|X - 10| < 2) \ge 1 - \frac{1}{2} = \frac{1}{2}$.
(c) Let $k\sigma = 5 \Rightarrow k^2 = \frac{25}{\sigma^2} = \frac{25}{2}$.
Then we have $P(5 < X < 15) = P(|X - 10| < 5) \ge 1 - \frac{2}{25} = \frac{23}{25}$.
For b) and d) we use Chebyshev’s inequality in the form
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
(b) Let $k\sigma = 3 \Rightarrow k^2 = \frac{9}{\sigma^2} = \frac{9}{2}$.
Then we have $P(\{X < 7\} \cup \{13 < X\}) = P(|X - 10| \ge 3) \le \frac{2}{9}$. (Note that X is continuous.)
(d) Let $k\sigma = 8 \Rightarrow k^2 = \frac{64}{\sigma^2} = \frac{64}{2} = 32$.
Then we have $P(\{X < 2\} \cup \{18 < X\}) = P(|X - 10| \ge 8) \le \frac{1}{32}$.
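All four bounds follow from the same expression $\sigma^2 / (k\sigma)^2$. A short Python sketch, using the variance $\sigma^2 = 2$ from the calculations above and a function name chosen here:

```python
from fractions import Fraction


def chebyshev_tail_bound(deviation, variance):
    """Upper bound for P(|X - mu| >= deviation) from Chebyshev's inequality."""
    return min(Fraction(1), Fraction(variance) / Fraction(deviation) ** 2)


variance = 2  # sigma^2, as used in the solution above

for deviation in (2, 3, 5, 8):
    tail = chebyshev_tail_bound(deviation, variance)
    print(f"|X - 10| >= {deviation}: at most {tail}; "
          f"|X - 10| < {deviation}: at least {1 - tail}")
```

The bounds hold for every distribution with this mean and variance, which is why they are usually far from sharp.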
Solution of Exercise 45
(a) The area of the dartboard is $(2d)^2 = 64$, the area of the bull’s eye is $\pi \approx 3.1416$. Therefore the
probability to score a 2, 4, 6 or 8 is $\frac{1}{4} \left( 1 - \frac{\pi}{64} \right) \approx 23.77\%$ in each case. The probability to hit the bull’s eye
is $\frac{\pi}{64} \approx 4.91\%$.
For the probability mass function of X we have approximately:
\[
f_X(x) = \begin{cases}
23.77\% & \text{for } x \in \{2, 4, 6, 8\} \\
4.91\% & \text{for } x = 20 \\
0 & \text{otherwise}
\end{cases}
\]
(c)
8 Multidimensional random variables
Comprehension questions
Explain in your own words :
• What does „stochastically independent“ mean and how can you examine the stochastic indepen-
dence of two random variables?
Solution of Exercise 46
(a) We obtain the marginal distributions by calculating the sum of every column and every row:
            Y = 2    Y = 3    Y = 4    f_X
X = 10      1/9      1/9      1/9      1/3
X = 20      1/6      0        1/6      1/3
X = 30      1/18     2/9      1/18     1/3
f_Y         1/3      1/3      1/3
The marginal distributions are
x_i         10     20     30             y_i         2      3      4
f_X(x_i)    1/3    1/3    1/3     and    f_Y(y_i)    1/3    1/3    1/3
(b) Using $f_1(x|Y = 4) = \frac{f(x, 4)}{f_Y(4)} = 3 \cdot f(x, 4)$ we deduce the conditional distribution:
x_i               10     20     30
f_1(x_i|Y = 4)    1/3    1/2    1/6
(c)
\[ E(X) = \frac{1}{3} \cdot (10 + 20 + 30) = 20 \]
\[ E(X^2) = \frac{1}{3} \cdot (10^2 + 20^2 + 30^2) = \frac{1400}{3} \]
\[ V(X) = E(X^2) - E(X)^2 = \frac{1400}{3} - 400 = \frac{200}{3} = 66.6667 \]
\[ E(Y) = \frac{1}{3} \cdot (2 + 3 + 4) = 3 \]
\[ E(Y^2) = \frac{1}{3} \cdot (2^2 + 3^2 + 4^2) = \frac{29}{3} \]
\[ V(Y) = E(Y^2) - E(Y)^2 = \frac{29}{3} - 9 = \frac{2}{3} = 0.6667 \]
(d)
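\[
\begin{aligned}
E(X \cdot Y) &= 10 \cdot 2 \cdot \frac{1}{9} + 10 \cdot 3 \cdot \frac{1}{9} + 10 \cdot 4 \cdot \frac{1}{9} \\
&\quad + 20 \cdot 2 \cdot \frac{1}{6} + 20 \cdot 3 \cdot 0 + 20 \cdot 4 \cdot \frac{1}{6} \\
&\quad + 30 \cdot 2 \cdot \frac{1}{18} + 30 \cdot 3 \cdot \frac{2}{9} + 30 \cdot 4 \cdot \frac{1}{18} = 60
\end{aligned}
\]
\[ Cov(X, Y) = E(X \cdot Y) - E(X) \cdot E(Y) = 60 - 20 \cdot 3 = 0 \]
\[ \rho_{XY} = 0 \]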
(e) X and Y are uncorrelated since the covariance vanishes, but they are not independent, because
marginal distributions and conditional distributions differ. [Compare a) and b)]
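That zero covariance does not imply independence can be checked directly on the joint table. The following Python sketch uses exact fractions; the helper `marginal` and the variable names are chosen here for illustration.

```python
from fractions import Fraction as F

# Joint probability function f(x, y) from the table in part (a)
joint = {
    (10, 2): F(1, 9),  (10, 3): F(1, 9), (10, 4): F(1, 9),
    (20, 2): F(1, 6),  (20, 3): F(0),    (20, 4): F(1, 6),
    (30, 2): F(1, 18), (30, 3): F(2, 9), (30, 4): F(1, 18),
}


def marginal(axis):
    """Sum the joint probabilities over the other variable."""
    result = {}
    for key, p in joint.items():
        result[key[axis]] = result.get(key[axis], F(0)) + p
    return result


f_x, f_y = marginal(0), marginal(1)
e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# Independence requires f(x, y) = f_X(x) * f_Y(y) in every cell
independent = all(p == f_x[x] * f_y[y] for (x, y), p in joint.items())

print("Cov(X, Y) =", cov)
print("independent:", independent)
```

The cell (20, 3) already settles the question: its joint probability is 0, while the product of the marginals is 1/9.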
Solution of Exercise 47
Let the random variable X describe profit & loss of the bet with a probability p of winning:
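x           −1       3
P(X = x)    1 − p    p
An expectation value of 0 gives a condition for p:
\[ E(X) = p - 1 + 3p = 4p - 1 \overset{!}{=} 0 \quad \Longrightarrow \quad p = \frac{1}{4} = 25\% \]
For the gambling game the random variable Y shall describe profit & loss:
y     P(Y = y)
−1    $\left( \frac{5}{6} \right)^3 = \frac{125}{216}$
1     $\binom{3}{1} \left( \frac{1}{6} \right)^1 \left( \frac{5}{6} \right)^2 = \frac{75}{216}$
2     $\binom{3}{2} \left( \frac{1}{6} \right)^2 \left( \frac{5}{6} \right)^1 = \frac{15}{216}$
3     $\left( \frac{1}{6} \right)^3 = \frac{1}{216}$
(b)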
9 Stochastic models and special distributions
Comprehension questions
Explain in your own words :
• How is an arbitrary normal distribution linked to the standard normal distribution? Can a nor-
mally distributed random variable be transformed into a standard normally distributed variable?
• Is it possible to give an exact density function for a real random variable like a stock return?
Solution of Exercise 49
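(a) Density function: $f : \mathbb{R} \to \mathbb{R}_0^+$,
\[ f(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \]
Distribution function: $F : \mathbb{R} \to [0, 1]$,
\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x \end{cases} \]
\[ E(X) = \frac{1}{2} \quad \text{(symmetry)} \]
\[ E(X^2) = \int_0^1 x^2\,dx = \left[ \frac{1}{3} x^3 \right]_0^1 = \frac{1}{3} \]
\[ V(X) = E(X^2) - E(X)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \]
(b) Probability mass function:
\[
f_Y(y) = \begin{cases}
\frac{1}{100\,000} & \text{for } y \in \{0.000\,00;\ 0.000\,01;\ 0.000\,02;\ \dots;\ 0.999\,99\} \\
0 & \text{otherwise}
\end{cases}
\]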
The number of life insurances sold can be described by a binomially distributed random variable X
with parameters n = 16 and $p = \frac{2}{10} = 0.2$: X ∼ Bi(16; 0.2)
(a)
(b)
\[
\begin{aligned}
P(X \le 3) &= \sum_{x=0}^{3} P(X = x) = \sum_{x=0}^{3} \binom{16}{x} \cdot 0.2^x \cdot 0.8^{16 - x} \\
&= 0.8^{16} + 16 \cdot 0.2 \cdot 0.8^{15} + \frac{16 \cdot 15}{2} \cdot 0.2^2 \cdot 0.8^{14} + \frac{16 \cdot 15 \cdot 14}{6} \cdot 0.2^3 \cdot 0.8^{13} \\
&= 59.81\%
\end{aligned}
\]
Solution of Exercise 51
(a) The number X of correct answers is binomially distributed with parameters n = 20 and $p = \frac{1}{5}$;
$X \sim Bi(20, \frac{1}{5})$. On average the student will answer n · p = 4 questions correctly.
(Hint: Use tables or software with statistic functions to determine the values of probabilities
P(X ≤ 9), etc.)
Now we are looking for the largest x such that P(X ≥ x) > 5% holds:
P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1%
P(X ≥ 8) = 1 − P(X ≤ 7) ≈ 3%
P(X ≥ 7) = 1 − P(X ≤ 6) ≈ 8.7%
The limit for passing the exam would be lowered to 7 correct answers.
Solution of Exercise 52
We are dealing with a binomial distribution with parameters n = 10 and p = 0.5 ; X ∼ Bi(10; 0.5) .
(a)
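\[ P(\{X \le 1\} \cup \{X \ge 9\}) = P(|X - 5| \ge \underbrace{4}_{= k\sigma}) \le \frac{1}{k^2} \]
With $k\sigma = 4$ we get $k^2 = \frac{16}{\sigma^2} = \frac{16}{np(1 - p)} = \frac{32}{5} = 6.4$. This leads to the estimate:
\[ P(\{X \le 1\} \cup \{X \ge 9\}) \le \frac{1}{k^2} = \frac{5}{32} = 0.156\,25 = 15.625\% \]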
Solution of Exercise 53
The annual savings of an account are described by a normally distributed random variable X with
parameters µ = 400 and σ² = 200²; $X \sim N(400, 200^2)$. Then $Z = \frac{X - 400}{200}$ is standard normally
distributed, Z ∼ N(0, 1). We now calculate the proportions of the probability mass that falls into the
respective classes and determine the number of accounts within each class:
       Standardisation of class limits            Values of distribution function    Difference: portion for class    Number of accounts
(1)    $z_1 = \frac{0 - 400}{200} = -2$           $F_{St}(-2) = 0.0228$              0.0228                           28 044
(2)    $z_2 = \frac{200 - 400}{200} = -1$         $F_{St}(-1) = 0.1587$              0.1359                           167 157
(3)    $z_3 = \frac{300 - 400}{200} = -0.5$       $F_{St}(-0.5) = 0.3085$            0.1498                           184 254
(4)    $z_4 = \frac{400 - 400}{200} = 0$          $F_{St}(0) = 0.5000$               0.1915                           235 545
(5)    $z_5 = \frac{600 - 400}{200} = 1$          $F_{St}(1) = 0.8413$               0.3413                           419 799
(6)    $z_6 = \frac{800 - 400}{200} = 2$          $F_{St}(2) = 0.9772$               0.1359                           167 157
(7)    $z_7 = \frac{\infty - 400}{200} = \infty$  $F_{St}(\infty) = 1.0000$          0.0228                           28 044
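The table can be recomputed without a printed normal table by using `statistics.NormalDist`. In this sketch the total of 1 230 000 accounts is not stated in the solution; it is inferred here from 28 044 / 0.0228 and should be treated as an assumption. Small differences from the table arise because the table rounds the distribution function to four decimals.

```python
from statistics import NormalDist

savings = NormalDist(mu=400, sigma=200)
upper_limits = [0, 200, 300, 400, 600, 800, float("inf")]
total_accounts = 1_230_000  # assumption: inferred from 28 044 / 0.0228

proportions = []
previous = 0.0
for limit in upper_limits:
    cumulative = savings.cdf(limit)            # F(limit) = F_St((limit - 400) / 200)
    proportions.append(cumulative - previous)  # probability mass of the class
    previous = cumulative

for limit, share in zip(upper_limits, proportions):
    print(f"up to {limit}: {share:.4f} -> about {share * total_accounts:,.0f} accounts")
```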
Solution of Exercise 54
(a)
\[ P(-1 \le Z \le 1) = F_{St}(1) - F_{St}(-1) = 2 \cdot F_{St}(1) - 1 = 2 \cdot 0.8413 - 1 = 68.26\% \]
(b)
\[ P(-1 \le Z \le 1) = P\left( -1 \le \frac{X - 18}{4} \le 1 \right) = P(-4 \le X - 18 \le 4) = P(14 \le X \le 22) \]
The corresponding interval for X is given by [14, 22].
(c)
\[
\begin{aligned}
P(10 < X \le 26) &= P\left( \frac{10 - 18}{4} < \frac{X - 18}{4} \le \frac{26 - 18}{4} \right) \\
&= P(-2 < Z \le 2) = F_{St}(2) - F_{St}(-2) \\
&= 2 \cdot F_{St}(2) - 1 = 2 \cdot 0.9772 - 1 = 95.44\%
\end{aligned}
\]
Solution of Exercise 55
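\[
P(X < a) = 0.6 \iff P\Bigl( \underbrace{\tfrac{X - 50}{5}}_{\sim N(0,1)} < \tfrac{a - 50}{5} \Bigr) = 0.6 \iff F_{St}\left( \tfrac{a - 50}{5} \right) = 0.6
\iff \frac{a - 50}{5} = 0.25 \iff a \approx 51.25 .
\]
\[
P(X \ge b) = 0.75 \iff P(X < b) = 0.25 \iff F_{St}\left( \tfrac{b - 50}{5} \right) = 0.25 \iff F_{St}\left( \tfrac{50 - b}{5} \right) = 0.75
\iff \frac{50 - b}{5} = 0.67 \iff b \approx 46.65 .
\]
\[
P(|X - 50| < c) = 0.6 \iff P\left( \left| \tfrac{X - 50}{5} \right| < \tfrac{c}{5} \right) = 0.6 \iff D\left( \tfrac{c}{5} \right) = 2 F_{St}\left( \tfrac{c}{5} \right) - 1 = 0.6
\iff F_{St}\left( \tfrac{c}{5} \right) = 0.8 \iff \frac{c}{5} \approx 0.84 \iff c \approx 4.2 .
\]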
Solution of Exercise 56
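(a)
$$\begin{aligned}
P(10 \le X \le 12) &= \binom{45}{10} \cdot 0.4^{10} \cdot 0.6^{35} + \binom{45}{11} \cdot 0.4^{11} \cdot 0.6^{34} + \binom{45}{12} \cdot 0.4^{12} \cdot 0.6^{33} \\
&= 0.005\,750\,56 + 0.012\,198\,1 + 0.023\,040\,9 \\
&= 0.040\,99 \approx 4.1\%
\end{aligned}$$
$$\mu = E(X) = 45 \cdot 0.4 = 18$$
$$V(X) = 45 \cdot 0.4 \cdot 0.6 = 10.8$$
$$\sigma = \sqrt{10.8} = 3.286$$
$$\begin{aligned}
P(10 \le X \le 12) &\approx F_{\mathrm{St}}\left(\frac{12.5 - 18}{\sqrt{10.8}}\right) - F_{\mathrm{St}}\left(\frac{9.5 - 18}{\sqrt{10.8}}\right) \\
&= F_{\mathrm{St}}(-1.674) - F_{\mathrm{St}}(-2.586) \\
&= F_{\mathrm{St}}(2.586) - F_{\mathrm{St}}(1.674) \\
&= 0.9952 - 0.9529 \approx 4.23\%
\end{aligned}$$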
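The two calculations above can be compared numerically. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 45, 0.4

# Exact binomial probability P(10 <= X <= 12)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 13))

# Normal approximation with continuity correction (limits 9.5 and 12.5)
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx_dist = NormalDist(mu, sigma)
approx = approx_dist.cdf(12.5) - approx_dist.cdf(9.5)

print(f"exact  = {exact:.5f}")
print(f"approx = {approx:.5f}")
```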
Solution of Exercise 57
For a given probability $1-p$ the limits of the intervals for $U \sim N(0, 1)$ are given by the $(1 - \frac{p}{2})$-quantiles
$z[1 - p/2]$ of the standard normal distribution:
                                a)                         b)
1−p     Quantiles               Range for U                Range for W
80%     z[0.90] = 1.282         −1.282 < U < 1.282         −5.641 < W < −4.359
90%     z[0.95] = 1.645         −1.645 < U < 1.645         −5.823 < W < −4.178
95%     z[0.975] = 1.960        −1.960 < U < 1.960         −5.980 < W < −4.020
100%    z[1.0] = ∞              −∞ < U < ∞                 −∞ < W < ∞
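The table can be reproduced with the inverse distribution function of the standard normal distribution. The parameters of W (mean −5, standard deviation 0.5) are an assumption inferred from the tabulated ranges; they are not stated in this solution.

```python
from statistics import NormalDist

std_normal = NormalDist()
mu_w, sigma_w = -5.0, 0.5  # assumption: inferred from the tabulated W ranges

ranges = {}
for level in (0.80, 0.90, 0.95):
    p = 1 - level
    z = std_normal.inv_cdf(1 - p / 2)  # (1 - p/2)-quantile
    ranges[level] = (z, mu_w - z * sigma_w, mu_w + z * sigma_w)
    print(f"{level:.0%}: z = {z:.3f}, "
          f"{mu_w - z * sigma_w:.3f} < W < {mu_w + z * sigma_w:.3f}")
```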
Solution of Exercise 58
Let X be the normally distributed approximation of the random variable describing the points
achieved.
(a) $P(X \le 49.5) \approx P\left(Z \le \frac{49.5 - 60}{10}\right) = F_{\mathrm{St}}(-1.05) = 1 - F_{\mathrm{St}}(1.05) = 1 - 0.8531 = 14.69\%$
(b) $P(79.5 < X \le 95.5) \approx P(1.95 < Z \le 3.55) = F_{\mathrm{St}}(3.55) - F_{\mathrm{St}}(1.95) = 0.9998 - 0.9744 = 2.54\%$
(c) $10\% = F_{\mathrm{St}}(-1.28) = P(Z \le -1.28) \approx P(X \le \underbrace{-1.28 \cdot 10 + 60}_{=47.2})$
The limit for passing the exam should be set to 47 points.
Then $P(X < 47) = P(X \le 46.5) \approx 1 - F_{\mathrm{St}}(1.35) = 8.85\% < 10\%$,
but $P(X < 48) = P(X \le 47.5) \approx 1 - F_{\mathrm{St}}(1.25) = 10.56\% > 10\%$.
10 Limit theorems
Comprehension questions
Explain in your own words :
• Does the central limit theorem put requirements on the original distribution of random variables?
• What does convergence with probability 1 mean? For which law do we need this definition?
Solution of Exercise 59
(a) The expectation value of the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ corresponds to the expectation value of
Solution of Exercise 60
Consider the 120 applications as a random sample of a binomially distributed random variable X
with parameters p = 0.4 and n = 120; X ∼ Bi(120; 0.4). For X we have $E(X) = n \cdot p = 48$ and
$V(X) = n \cdot p \cdot (1 - p) = 28.8$. A proportion of female applicants of 35% corresponds to 42 women and
a proportion of 45% corresponds to 54 women among all applicants.
$$P(41 < X \le 54) = P\Big(\frac{41 - 48}{\sqrt{28.8}} < \underbrace{\frac{X - 48}{\sqrt{28.8}}}_{\text{approx. } \sim N(0,1)} \le \frac{54 - 48}{\sqrt{28.8}}\Big) \approx F_{\mathrm{St}}\left(\frac{54 - 48}{\sqrt{28.8}}\right) - F_{\mathrm{St}}\left(\frac{41 - 48}{\sqrt{28.8}}\right)$$
(b) Taking into account the correction for continuity we obtain more or less the same result:
$$P(41 < X \le 54) \approx F_{\mathrm{St}}\left(\frac{54.5 - 48}{\sqrt{28.8}}\right) - F_{\mathrm{St}}\left(\frac{41.5 - 48}{\sqrt{28.8}}\right)$$
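To see how close the two approximations are, the following sketch evaluates both expressions and, for comparison, the exact binomial probability. The exact sum is an addition for illustration and is not part of the printed solution.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 0.4
mu = n * p
sigma = sqrt(n * p * (1 - p))
normal = NormalDist(mu, sigma)

# Normal approximation without and with continuity correction
plain = normal.cdf(54) - normal.cdf(41)
corrected = normal.cdf(54.5) - normal.cdf(41.5)

# Exact probability P(41 < X <= 54) = P(42 <= X <= 54)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(42, 55))

print(f"plain     = {plain:.4f}")
print(f"corrected = {corrected:.4f}")
print(f"exact     = {exact:.4f}")
```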
Solution of Exercise 61
The number of claims X is binomially distributed with $n = 1\,000\,000$ and $p = \frac{1}{100}$.
(a)
$$\mu = E(X) = n \cdot p = 10\,000$$
$$\sigma^2 = V(X) = n \cdot p \cdot (1 - p) = 9900$$
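(b) X can be approximated appropriately by a normally distributed random variable Y with parameters $\mu$ and $\sigma^2$:
$$\begin{aligned}
P(|Y - \mu| \le y) = 95\% &\iff P\Big(\Big|\underbrace{\frac{Y - \mu}{\sigma}}_{=Z \sim N(0,1)}\Big| \le \frac{y}{\sigma}\Big) = 95\% \\
&\iff P\left(|Z| \le \frac{y}{\sigma}\right) = 95\% \\
&\iff 2 F_{\mathrm{St}}\left(\frac{y}{\sigma}\right) - 1 = 95\% \iff F_{\mathrm{St}}\left(\frac{y}{\sigma}\right) = 0.975 \\
&\iff \frac{y}{\sigma} = 1.96 \iff y = 1.96 \cdot \sqrt{9900} = 195.02 .
\end{aligned}$$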
With probability 95% the number of claims is in the interval
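(c)
$$\begin{aligned}
P(Y > y) = 5\% &\iff P(Y \le y) = 95\% \iff P\left(\frac{Y - \mu}{\sigma} \le \frac{y - \mu}{\sigma}\right) = 95\% \\
&\iff F_{\mathrm{St}}\left(\frac{y - \mu}{\sigma}\right) = 95\% \iff \frac{y - \mu}{\sigma} = 1.645 \\
&\iff y = 1.645 \cdot \sigma + \mu = 1.645 \cdot \sqrt{9900} + 10\,000 = 10\,163.7
\end{aligned}$$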
The number of claims exceeds 10 164 with a probability of (at most) 5%.
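Parts (b) and (c) can be checked with the quantile function of the standard normal distribution. This is a minimal sketch; the variable names are illustrative, and the quantiles are computed rather than read from a table, so the last digits may differ slightly from the values above.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1_000_000, 0.01
mu = n * p
sigma = sqrt(n * p * (1 - p))
std_normal = NormalDist()

# (b) half-width y of the symmetric 95% interval around mu
half_width = std_normal.inv_cdf(0.975) * sigma

# (c) limit that is exceeded with probability 5%
upper_limit = mu + std_normal.inv_cdf(0.95) * sigma

print(f"(b) y = {half_width:.2f}")
print(f"(c) y = {upper_limit:.1f}")
```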
11 Point estimation of population parameters
Comprehension questions
Explain in your own words :
• How do you compare two different estimators? Which criterion do you use?
• You can define many different estimators. Is there one estimator which is always “correct”?
Solution of Exercise 62
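(a) Estimator for the mean:
$$\hat{\mu}_G = \bar{g} = 600\,000\ \text{€}$$
Estimator for the standard deviation:
$$\hat{\sigma}_G = \sqrt{\frac{n}{n-1}} \cdot s_G = \sqrt{\frac{225}{224}} \cdot 90\,000\ \text{€} \approx 90\,201\ \text{€}$$
(b) According to the $\sqrt{n}$-law it holds:
$$\hat{\sigma}_{\bar{G}} = \frac{1}{\sqrt{n}} \cdot \hat{\sigma}_G = \frac{90\,201\ \text{€}}{15} = 6013\ \text{€}$$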
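Both estimators follow the same pattern in Exercises 62 and 63, so they can be sketched as one small helper. The function name `estimate_std` is hypothetical and chosen only for this illustration.

```python
from math import sqrt


def estimate_std(sample_std, n):
    """Return (estimated population std, estimated std of the sample mean).

    sample_std is the empirical standard deviation s with divisor n.
    """
    sigma_hat = sqrt(n / (n - 1)) * sample_std   # corrects the divisor n to n - 1
    sigma_hat_mean = sigma_hat / sqrt(n)         # sqrt(n)-law
    return sigma_hat, sigma_hat_mean


sigma_hat, sigma_hat_mean = estimate_std(90_000, 225)
print(f"{sigma_hat:.0f} EUR, {sigma_hat_mean:.0f} EUR")
```

Calling `estimate_std(0.005, 225)` gives the corresponding values in kg for Exercise 63.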
Solution of Exercise 63
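(a) Estimation of the standard deviation in the basic population:
$$\hat{\sigma}_G = \sqrt{\frac{n}{n-1}} \cdot s_G = \sqrt{\frac{225}{224}} \cdot 0.005\ \text{kg} \approx 0.005\,011\,1\ \text{kg}$$
(more or less $s_G$)
Estimation of the standard deviation of the sample mean:
$$\hat{\sigma}_{\bar{G}} = \frac{1}{\sqrt{n}} \cdot \hat{\sigma}_G = \frac{0.005\,011\,1\ \text{kg}}{15} = 0.000\,334\ \text{kg}$$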
Solution of Exercise 64
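(a) $\hat{\sigma}_A^2$ and $\hat{\sigma}_C^2$ are unbiased:
$$E(\hat{\sigma}_A^2) = \frac{1}{n} \sum_{j=1}^{n} \underbrace{E\left[(X_j - \mu)^2\right]}_{=\sigma^2} = \frac{1}{n} \cdot n \cdot \sigma^2 = \sigma^2$$
$$E(\hat{\sigma}_C^2) = \frac{1}{3} \cdot \underbrace{\left(E(X_1^2) + E(X_2^2) + E(X_n^2)\right)}_{=3 \cdot E(X_1^2)} - \mu^2 = E(X_1^2) - \mu^2 = \sigma^2$$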
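(b) The value of the variance of estimator $\hat{\sigma}_C^2$ is independent of the number n. Therefore it will not
converge to zero with growing n. Thus $\hat{\sigma}_C^2$ is not a consistent estimator.
$$\text{bias} = E(\hat{\sigma}_B^2) - \sigma^2 = \left(\frac{n}{n-1} - 1\right) \sigma^2 = \left(\frac{9}{8} - 1\right) \sigma^2 = \frac{1}{8} \sigma^2$$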
12 Interval estimation
Comprehension questions
Explain in your own words :
• What is the connection between the probability of error and the probability of confidence?
• What is the connection between the normal distribution and the chi-square-distribution?
Solution of Exercise 65
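(a)
$$\mu = \frac{1}{5} \cdot (0 + 1 + 2 + 3 + 4) = 2$$
$$\sigma^2 = \frac{1}{5} \cdot (1 + 4 + 9 + 16) - 2^2 = 6 - 4 = 2$$
Values of the sample mean $\bar{X}^{(2)}$ for all combinations of $X_1$ and $X_2$:
              X1
         0     1     2     3     4
     0   0     0.5   1     1.5   2
     1   0.5   1     1.5   2     2.5
X2   2   1     1.5   2     2.5   3
     3   1.5   2     2.5   3     3.5
     4   2     2.5   3     3.5   4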
(c)
(d)
$$V(\bar{X}^{(2)}) = \frac{\sigma^2}{2} = \frac{2}{2} = 1$$
$$P(2.5 < \bar{X}^{(2)} \le 3.5) = f(3) + f(3.5) = \frac{3}{25} + \frac{2}{25} = \frac{1}{5} = 20\%$$
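The sampling distribution of the mean of two draws can be enumerated directly. The sketch assumes two independent draws, each taking the values 0 to 4 with probability 1/5, which matches the 5 × 5 table and the denominators of 25 above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(5)  # population values 0..4, each with probability 1/5

# Count how often each sample mean occurs among the 25 equally likely pairs
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in product(values, repeat=2))
pmf = {mean: Fraction(count, 25) for mean, count in counts.items()}

expectation = sum(m * p for m, p in pmf.items())
variance = sum((m - expectation) ** 2 * p for m, p in pmf.items())
prob = sum(p for m, p in pmf.items() if Fraction(5, 2) < m <= Fraction(7, 2))

print(expectation, variance, prob)
```

Using `Fraction` keeps all probabilities exact, so the results can be compared with the hand calculation without rounding.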
(f) The random variable $\bar{X}^{(50)}$ has the expectation value $E(\bar{X}^{(50)}) = 2$ and the variance
$V(\bar{X}^{(50)}) = \frac{\sigma^2}{50} = \frac{2}{50} = \frac{1}{25}$. Applying the central limit theorem, the distribution of $\bar{X}^{(50)}$ can be approximated
well by a normal distribution:
Solution of Exercise 66
From solution 62 we know:
Considering a sample size of n = 225 we assume a normal distribution for the sample mean. A
probability of error of α = 4.55 % corresponds to the quantile z[1 − α/2] = z[0.977 25] ≈ 2.00.
Therefore we get the confidence interval
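$$CI(\mu, 1 - \alpha) = \left[\bar{g} - z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{G}}\ ,\ \bar{g} + z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{G}}\right]$$
$$CI(\mu; 0.9545) = [600\,000 - 2 \cdot 6013\ ;\ 600\,000 + 2 \cdot 6013] = [587\,974\ ;\ 612\,026] .$$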
For the extrapolation we multiply the mean by the number of companies N = 12 100.
Solution of Exercise 67
The variance in the basic population is known and the characteristic is normally distributed.
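(a)
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{100}} = \frac{3}{10} = 0.3$$
$$1 - \alpha = 95\% \Rightarrow z[1 - \alpha/2] = z[0.975] = 1.96$$
$$CI(\mu, 95\%) = [53.97 - 1.96 \cdot 0.3\ ;\ 53.97 + 1.96 \cdot 0.3] = [53.382\ ;\ 54.558]$$
$$2 \cdot z[1 - \alpha/2] \cdot \sigma_{\bar{X}} = 2 \cdot z[1 - \alpha/2] \cdot \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{3}{\sqrt{n}} = \frac{11.76}{\sqrt{n}} \overset{!}{=} 0.4$$
Solving for n reveals $n = \left(\frac{11.76}{0.4}\right)^2 = 864.36$. For sample sizes of 865 or larger the length of the
confidence intervals is 0.4 or below.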
(c) For a symmetric 99% confidence interval we have z[1 − α/2] = z[0.995] = 2.575 and therefore:
$$0.4 \overset{!}{=} 2 \cdot 2.575 \cdot \frac{3}{\sqrt{n}} \quad \Rightarrow \quad n = \left(\frac{15.45}{0.4}\right)^2 \approx 1491.9 .$$
The minimum sample size is 1492.
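The sample size calculation for a given maximum interval length can be sketched as a small function. The name `min_sample_size` is hypothetical; the quantiles are the table values used above.

```python
from math import ceil


def min_sample_size(z, sigma, max_length):
    """Solve 2 * z * sigma / sqrt(n) = max_length for n (not yet rounded)."""
    return (2 * z * sigma / max_length) ** 2


n_95 = min_sample_size(1.96, 3, 0.4)    # 95% confidence level
n_99 = min_sample_size(2.575, 3, 0.4)   # 99% confidence level

# Round up, because n must be an integer and the interval must not be longer
print(ceil(n_95), ceil(n_99))
```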
Solution of Exercise 68
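(a) The characteristic IQ is assumed to be normally distributed with parameters $\mu = 100$ and
$\sigma = \sqrt{225} = 15$. Then $Z = \frac{IQ - 100}{15}$ is standard normally distributed. Thus:
$$P(IQ > 130) = P\left(\frac{IQ - 100}{15} > \frac{130 - 100}{15}\right) = P(Z > 2) = 1 - F_{\mathrm{St}}(2) = 1 - 0.9772 = 2.28\% .$$
$$E(\overline{IQ}) = 100$$
$$V(\overline{IQ}) = \frac{225}{100} = 2.25$$
$$\sigma_{\overline{IQ}} = 1.5$$
Again by standardisation:
$$P(98 < \overline{IQ} < 103) = P\left(\frac{98 - 100}{1.5} < \frac{\overline{IQ} - 100}{1.5} < \frac{103 - 100}{1.5}\right)$$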
Solution of Exercise 69
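(a)
$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx = \int_{1}^{\infty} x \cdot \frac{1}{x^2}\, dx = \int_{1}^{\infty} \frac{1}{x}\, dx = [\ln x]_{1}^{\infty} = \infty$$
The expectation value does not exist!
(b)
$$\int_{1}^{x} f(t)\, dt = \int_{1}^{x} \frac{1}{t^2}\, dt = \left[-\frac{1}{t}\right]_{1}^{x} = 1 - \frac{1}{x}$$
Distribution function:
$$F(x) = \begin{cases} 0 & \text{for } x \le 1 \\ 1 - \frac{1}{x} & \text{for } x > 1 \end{cases}$$
Requirement for median:
$$F(x_{\mathrm{Med}}) \overset{!}{=} \frac{1}{2} \iff 1 - \frac{1}{x_{\mathrm{Med}}} = \frac{1}{2} \iff x_{\mathrm{Med}} = 2$$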
Solution of Exercise 70
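Interval estimation for µ with unknown variance $\sigma^2$, small sample size (n = 5) and normally distributed
characteristic in the basic population:
$$CI(\mu, 1 - \alpha) = \left[\bar{x} - t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\ ,\ \bar{x} + t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right]$$
$$\bar{x} = \frac{2\% - 4\% + 3\% - 2\% - 1\%}{5} = -0.4\%$$
$$s^2 = \frac{(2\%)^2 + (4\%)^2 + (3\%)^2 + (2\%)^2 + (1\%)^2}{5} - (0.4\%)^2 = 6.64\ (\%)^2$$
$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n-1}} = \frac{\sqrt{6.64}\,\%}{\sqrt{4}} = 1.288\%$$
$$t_{n-1}[1 - \alpha/2] = t_4[0.99] = 3.747$$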
Confidence Interval:
Assessment: Based on the given sample a useful confidence interval at a confidence level of 98% can
hardly be indicated.
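The quantities above can be recomputed in a few lines. The interval limits themselves are not printed in this solution; the sketch derives them from the stated formula and the table value $t_4[0.99] = 3.747$, so they should be read as an illustration rather than as the original result.

```python
from math import sqrt

returns = [2.0, -4.0, 3.0, -2.0, -1.0]  # sample values in percent
n = len(returns)

mean = sum(returns) / n
s2 = sum(r**2 for r in returns) / n - mean**2   # empirical variance (divisor n)
std_error = sqrt(s2) / sqrt(n - 1)              # estimated std of the sample mean

t_quantile = 3.747  # t_4[0.99], table value quoted above
lower = mean - t_quantile * std_error
upper = mean + t_quantile * std_error

print(f"mean = {mean:.1f}%, s^2 = {s2:.2f}, std error = {std_error:.3f}%")
print(f"98% interval: [{lower:.2f}%, {upper:.2f}%]")
```

The interval spans several percentage points on both sides of zero, which is in line with the assessment above.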
Solution of Exercise 71
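(a) Interval estimation for µ with known variance $\sigma^2$:
$$CI(\mu, 1 - \alpha) = \left[\bar{x} - z[1 - \alpha/2] \cdot \sigma_{\bar{X}}\ ,\ \bar{x} + z[1 - \alpha/2] \cdot \sigma_{\bar{X}}\right]$$
$$s^2 = \overline{x^2} - \bar{x}^2 = 200.286\ \text{GE}^2$$
$$\chi^2_6[0.95] = 12.59$$
$$\chi^2_6[0.05] = 1.635$$
$$CI(\sigma^2, 90\%) = \left[\frac{7 \cdot 200.286}{12.59}\ ;\ \frac{7 \cdot 200.286}{1.635}\right] = [111.4\ ;\ 857.5]\ (\text{GE})^2$$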
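(c) Length of interval:
$$2 \cdot z[1 - \alpha/2] \cdot \sigma_{\bar{X}} = 2 \cdot z[1 - \alpha/2] \cdot \frac{\sigma}{\sqrt{n}} \overset{!}{=} 10\ \text{GE}$$
$$\Longrightarrow\ n = \left(\frac{2 \cdot z[1 - \alpha/2] \cdot \sigma}{10\ \text{GE}}\right)^2 = \left(\frac{1.645 \cdot \sqrt{95.5}}{5}\right)^2 = 10.34$$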
The sample size should at least be 11.
Solution of Exercise 72
(a) Sample statistics and required estimators:
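$$\bar{x} = \frac{100 + 103 + 104 + 106 + 112}{5} = 105$$
$$\overline{x^2} = \frac{100^2 + 103^2 + 104^2 + 106^2 + 112^2}{5} = 11041$$
$$s_X^2 = \overline{x^2} - \bar{x}^2 = 16$$
$$\hat{\sigma}_X^2 = \frac{5}{4} \cdot s_X^2 = \frac{5}{4} \cdot 16 = 20$$
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{5} \cdot \hat{\sigma}_X^2 = \frac{1}{5} \cdot 20 = 4$$
or directly
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{4} \cdot s_X^2 = \frac{1}{4} \cdot 16 = 4$$
$$\hat{\sigma}_{\bar{X}} = 2$$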
i. Interval estimation for µ with unknown variance $\sigma^2$, small sample size (n = 5) and normally
distributed characteristic in the basic population:
$$CI(\mu, 1 - \alpha) = \left[\bar{x} - t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\ ,\ \bar{x} + t_{n-1}[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right]$$
Confidence interval:
Confidence interval:
$$CI(\sigma^2, 95\%) = \left[\frac{5 \cdot 16}{11.14}\ ;\ \frac{5 \cdot 16}{0.484}\right] = [7.18\ ;\ 165.29]$$
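The sample statistics and the interval for the variance can be recomputed as follows. The values 11.14 and 0.484 are taken from the interval above and correspond to the chi-square quantiles with 4 degrees of freedom at 0.975 and 0.025; they are hard-coded here because the Python standard library has no chi-square quantile function.

```python
from math import sqrt

sample = [100, 103, 104, 106, 112]
n = len(sample)

mean = sum(sample) / n
mean_of_squares = sum(x**2 for x in sample) / n
s2 = mean_of_squares - mean**2        # empirical variance (divisor n)
var_hat = n / (n - 1) * s2            # estimated population variance
var_hat_mean = var_hat / n            # estimated variance of the sample mean
std_error = sqrt(var_hat_mean)

chi2_upper, chi2_lower = 11.14, 0.484  # table values used in the interval above
ci_variance = (n * s2 / chi2_upper, n * s2 / chi2_lower)

print(mean, s2, var_hat, var_hat_mean, std_error)
print(f"[{ci_variance[0]:.2f} ; {ci_variance[1]:.2f}]")
```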
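(b) Confidence interval for µ with unknown variance $\sigma^2$ and large sample size:
$$CI(\mu, 1 - \alpha) = \left[\bar{x} - z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\ ,\ \bar{x} + z[1 - \alpha/2] \cdot \hat{\sigma}_{\bar{X}}\right]$$
Sample statistics:
$$\bar{x} = 150\ \text{g}$$
$$s = 28\ \text{g}$$
$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n-1}} = \frac{28\ \text{g}}{\sqrt{n-1}}$$
$$z[1 - \alpha/2] = z[0.95] \approx 1.65$$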
Comparing the length of the confidence interval with the difference of the given boundaries yields
the sample size:
13 Statistical hypotheses testing
Comprehension questions
Explain in your own words :
• When will the normal distribution be used and when the t-distribution?
Solution of Exercise 73
(a)
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{0.45\ \text{mm}^2}{5}} = 0.3\ \text{mm}$$
(b) No, this is not an estimation, because the variance in the basic population was assumed to be
known.
(c) For α = 3% and a two sided rejection region we find the quantile z[1 − α/2] = z[0.985] = 2.17.
The machine has to be stopped, if the value of the sample mean deviates more than
Solution of Exercise 74
Sample statistics and required estimators:
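$$\bar{x} = \frac{12.7 + 13.3 + 13.0 + 12.9 + 13.1}{5} = 13.0$$
$$\overline{x^2} = \frac{12.7^2 + 13.3^2 + 13.0^2 + 12.9^2 + 13.1^2}{5} = 169.04$$
$$s_X^2 = \overline{x^2} - \bar{x}^2 = 0.04$$
$$\hat{\sigma}_X^2 = \frac{5}{4} \cdot s_X^2 = \frac{5}{4} \cdot 0.04 = 0.05$$
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{5} \cdot \hat{\sigma}_X^2 = \frac{1}{5} \cdot 0.05 = 0.01$$
or directly
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{4} \cdot s_X^2 = \frac{1}{4} \cdot 0.04 = 0.01$$
$$\hat{\sigma}_{\bar{X}} = 0.1$$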
(4) Test decision:
$$1.7 < 2.776 \ \Longrightarrow\ \text{retain } H_0 !$$
$$\frac{|\bar{x} - \mu_0|}{\sigma_{\bar{X}}} = \frac{|13 - 12.83|}{0.084\,84} = 2.003$$
Solution of Exercise 75
Sample statistics and required estimators:
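$$\bar{x} = \frac{12.7 + 13.3 + 13.0 + 12.9 + 13.1}{5} = 13.0$$
$$\overline{x^2} = \frac{12.7^2 + 13.3^2 + 13.0^2 + 12.9^2 + 13.1^2}{5} = 169.04$$
$$s_X^2 = \overline{x^2} - \bar{x}^2 = 0.04$$
$$\hat{\sigma}_X^2 = \frac{5}{4} \cdot s_X^2 = \frac{5}{4} \cdot 0.04 = 0.05$$
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{5} \cdot \hat{\sigma}_X^2 = \frac{1}{5} \cdot 0.05 = 0.01$$
or directly
$$\hat{\sigma}_{\bar{X}}^2 = \frac{1}{4} \cdot s_X^2 = \frac{1}{4} \cdot 0.04 = 0.01$$
$$\hat{\sigma}_{\bar{X}} = 0.1$$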
(b) (1) Formulate hypotheses:
Solution of Exercise 76
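(a) One-sided Gauß test for µ with known variance $\sigma^2$
Sample statistics:
$$\bar{x} = \frac{1.512\% + \dots + 1.396\%}{8} = 1.445\%$$
$$\sigma_{\bar{X}} = \sqrt{\frac{0.01}{8}}\ \% = 0.0354\%$$
$$\bar{x} = \frac{8.4\% + \dots + 8.3\%}{7} = 8.6\%$$
$$\sigma_{\bar{X}} = \sqrt{\frac{0.5}{7}}\ \% = 0.267\%$$
$$H_0: \mu = \mu_0 = 8\% \quad \text{vs.} \quad H_1: \mu \ne 8\%$$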
(2) Calculate test quantity:
$$\frac{\bar{x} - \mu_0}{\sigma_{\bar{X}}} = \frac{8.6\% - 8\%}{0.267\%} = 2.247$$
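The test quantity of the two-sided Gauß test can be sketched as follows. The function name is hypothetical, and the significance level α = 5% used for the decision is an assumption for illustration only, because the level is not visible in this part of the solution.

```python
from math import sqrt
from statistics import NormalDist


def gauss_test_statistic(sample_mean, mu0, variance, n):
    """Standardised deviation of the sample mean from mu0 for known variance."""
    return (sample_mean - mu0) / sqrt(variance / n)


z = gauss_test_statistic(8.6, 8.0, 0.5, 7)

alpha = 0.05  # assumption: not given in this part of the solution
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > critical_value

print(f"z = {z:.3f}, critical value = {critical_value:.3f}, reject H0: {reject_h0}")
```

The sketch does not round the standard error to 0.267, so the statistic differs from 2.247 in the third decimal.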
Solution of Exercise 77
Comparison of two means:
Sample statistics and required estimators:
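$$\bar{x}_A = 5.9\%$$
$$\bar{x}_B = 7.7\%$$
$$s_A^2 = 0.597$$
$$s_B^2 = 0.328$$
$$\hat{\sigma}^2 = \frac{n_A s_A^2 + n_B s_B^2}{n_A + n_B - 2} = \frac{7 \cdot 0.597 + 5 \cdot 0.328}{7 + 5 - 2} = 0.582$$
$$\hat{\sigma}_{\Delta} = \hat{\sigma} \cdot \sqrt{\frac{n_A + n_B}{n_A \cdot n_B}} = \sqrt{0.582}\ \% \cdot \sqrt{\frac{7 + 5}{7 \cdot 5}} = 0.447\%$$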
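The pooled variance and the standard error of the difference of the two means can be sketched as one function. The name `pooled_std_error` is hypothetical and used only for this illustration.

```python
from math import sqrt


def pooled_std_error(n_a, s2_a, n_b, s2_b):
    """Return (pooled variance estimate, standard error of the mean difference).

    s2_a and s2_b are the empirical variances with divisors n_a and n_b.
    """
    pooled_var = (n_a * s2_a + n_b * s2_b) / (n_a + n_b - 2)
    se_diff = sqrt(pooled_var) * sqrt((n_a + n_b) / (n_a * n_b))
    return pooled_var, se_diff


pooled_var, se_diff = pooled_std_error(7, 0.597, 5, 0.328)
print(f"pooled variance = {pooled_var:.3f}, std error of difference = {se_diff:.3f}%")
```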
Solution of Exercise 78
(a) Two sided test for variance:
Sample statistics:
$$\bar{x} = \frac{5 + 0 + 1 + 3 - 4}{5} = 1\ \text{MU}$$
$$s_X^2 = \frac{4^2 + 1^2 + 0^2 + 2^2 + 5^2}{5} = 9.2\ \text{MU}^2$$
(2) Calculate test quantity:
$$\frac{n \cdot s_X^2}{\sigma_0^2} = \frac{5 \cdot 9.2}{10} = 4.6$$
Solution of Exercise 79
One sided test from below: Check whether the deficiencies have become significantly less.
(1) Formulate hypotheses: