Course of Study Exercises Statistics


Bachelor Computer Science WS 2020/21

Sheet VII - Solutions


Discrete Random Variables and Distributions
1. A random experiment consists of tossing n (distinct) coins and recording
the sequence of scores X1, X2, ..., Xn, where 1 denotes head and 0
denotes tail. Let Y denote the number of heads.

(a) What is a suitable sample space Ω? How many elements does Ω contain?
(b) Express Y as a function on the sample space Ω.
(c) Show that $|\{Y = k\}| = \binom{n}{k}$ for k ∈ {0, 1, ..., n}.
(d) With n = 5, explicitly list the elements in the event {Y = 3}.

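Answer: n distinct coins are tossed and $X_i$ is the score of coin i.
Furthermore, $Y = \sum_{i=1}^{n} X_i$ is the number of heads.

(a) $\Omega = \{(x_1, x_2, \ldots, x_n) \mid x_i \in \{0, 1\},\ i = 1, 2, \ldots, n\}$, $|\Omega| = 2^n$
(b) $Y = X_1 + X_2 + \cdots + X_n$, $Y : \Omega \to \{0, 1, \ldots, n\}$
(c) Each outcome in {Y = k} corresponds to exactly one subset of {1, 2, ..., n} with k elements (the positions of the heads), and there are $\binom{n}{k}$ such subsets.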
(d) {Y = 3} =
{(1, 1, 1, 0, 0), (1, 1, 0, 1, 0), (1, 1, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 1, 1, 0),
(1, 0, 0, 1, 1), (0, 1, 1, 1, 0), (0, 1, 1, 0, 1), (0, 1, 0, 1, 1), (0, 0, 1, 1, 1)}
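
The counting argument in (c) and the list in (d) can be checked by brute force. The following is a minimal base-R sketch added for illustration, not part of the original solution; the names `omega` and `y` are chosen here.

```r
# Enumerate the sample space for n = 5 coins: all 0/1 sequences of length 5
n <- 5
omega <- expand.grid(rep(list(0:1), n))

# Y maps each outcome to its number of heads
y <- rowSums(omega)

nrow(omega)                       # size of the sample space, 2^n
table(y)                          # |{Y = k}| for k = 0, ..., n
all(table(y) == choose(n, 0:n))   # compare with the binomial coefficients

# the outcomes in the event {Y = 3}
omega[y == 3, ]
```

The rows printed last should match the ten sequences listed in (d), possibly in a different order.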

2. Suppose that two fair, standard dice are tossed and the sequence of
scores (X1, X2) is recorded. Let Y = X1 + X2 denote the sum of the
scores, U = min(X1, X2) the minimum score, and V = max(X1, X2)
the maximum score.

(a) Find the probability density function of (X1 , X2 ).


(b) Find the probability density function of Y .
(c) Find the probability density function of U.
(d) Find the probability density function of V .
(e) Find the probability density function of (U, V ).

Prof. Dr. Falkenberg, Faculty 2 Statistics WS 20/21


Answer: Given: $Y = X_1 + X_2$, $U = \min(X_1, X_2)$, $V = \max(X_1, X_2)$.

(a) $P((X_1, X_2) = (i, j)) = \frac{1}{36}$, $i, j \in \{1, 2, 3, 4, 5, 6\}$

(b)
k         2     3     4     5     6     7     8     9     10    11    12
P(Y=k)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

(c)
k         1      2     3     4     5     6
P(U=k)    11/36  9/36  7/36  5/36  3/36  1/36

(d)
k         1     2     3     4     5     6
P(V=k)    1/36  3/36  5/36  7/36  9/36  11/36

(e) $P((U, V) = (i, i)) = \frac{1}{36}$ for $i = 1, 2, \ldots, 6$ and $P((U, V) = (i, j)) = \frac{2}{36}$ for $1 \le i < j \le 6$
###################################################################
# Suppose that two fair, standard dice are tossed and the sequence
# of scores (X1, X2) is recorded. Let Y = X1 + X2 denote the sum
# of the scores, U = min(X1, X2) the minimum score, and
# V = max(X1, X2) the maximum score.
#
# file: prob_rv_rolling_dice.R
###################################################################

library(gtools)
library(tidyverse)

# a) Find the probability density function of (X1, X2).
# permutations() from gtools lists all ordered pairs of scores (with
# repetition); the two columns are referred to as V1 and V2 below.
density.X1.X2 <-
  permutations(n = 6, r = 2, v = 1:6, repeats.allowed = TRUE) %>%
  as_tibble() %>%
  mutate(prob = 1 / 36)

# b) Find the probability density function of Y.
density.Y <- density.X1.X2 %>%
  mutate(Y = V1 + V2) %>%
  group_by(Y) %>%
  mutate(prob.Y = sum(prob)) %>%
  select(Y, prob.Y) %>%
  unique()

# c) Find the probability density function of U.
density.U <- density.X1.X2 %>%
  rowwise() %>%
  mutate(U = min(V1, V2)) %>%
  group_by(U) %>%
  mutate(prob.U = sum(prob)) %>%
  select(U, prob.U) %>%
  unique()

# d) Find the probability density function of V.
density.V <- density.X1.X2 %>%
  rowwise() %>%
  mutate(V = max(V1, V2)) %>%
  group_by(V) %>%
  mutate(prob.V = sum(prob)) %>%
  select(V, prob.V) %>%
  unique()

# e) Find the probability density function of (U, V).
density.UV <- density.X1.X2 %>%
  rowwise() %>%
  mutate(UV = paste(min(V1, V2), max(V1, V2))) %>%
  group_by(UV) %>%
  mutate(prob.UV = sum(prob)) %>%
  select(UV, prob.UV) %>%
  unique()

#################################################################
# alternative solution
#################################################################
# a) Find the probability density function of (X1, X2).
# expand.grid() creates a data frame from all combinations of the
# supplied vectors or factors.
x1 <- seq(1, 6, 1)
x2 <- seq(1, 6, 1)
x1_x2 <- expand.grid(x = x1, y = x2)
x1_x2_dens <- cbind(x1_x2, rep(1 / 36, 36))
x1_x2_dens

# b) Find the probability density function of Y.
y <- x1_x2[, 1] + x1_x2[, 2]
y_dens <- table(y) / 36
y_dens

# c) Find the probability density function of U.
# apply() returns a vector or array or list of values obtained by
# applying a function to margins of an array or matrix.
u <- apply(x1_x2, 1, min)
u_dens <- table(u) / 36
u_dens

# d) Find the probability density function of V.
v <- apply(x1_x2, 1, max)
v_dens <- table(v) / 36
v_dens

# e) Find the probability density function of (U, V).
uv <- cbind(x1_x2, u, v)
uv
#
uv_dens <- table(uv[, 3], uv[, 4]) / 36
uv_dens

3. R offers functions for a large number of probability distributions. The
commands for each distribution are prefixed with a letter that indicates
the functionality:

• “d” returns the height of the probability density function
• “p” returns the cumulative distribution function
• “q” returns the inverse cumulative distribution function (quantiles)
• “r” returns randomly generated numbers
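
As an illustration of how the four prefixes relate to each other, here is a short sketch using the binomial distribution B(20, 0.3) as an example. The choice of distribution, the seed and the variable names are assumptions made here.

```r
size <- 20
prob <- 0.3
k <- 0:size

dens <- dbinom(k, size, prob)   # "d": P(X = k)
cdf  <- pbinom(k, size, prob)   # "p": P(X <= k)

# the distribution function is the running sum of the density
all.equal(cdf, cumsum(dens))

# "q": smallest k with P(X <= k) >= 0.5, i.e. the median
med <- qbinom(0.5, size, prob)
pbinom(med - 1, size, prob) < 0.5 && pbinom(med, size, prob) >= 0.5

# "r": random numbers; their relative frequencies approximate the density
set.seed(1)
x <- rbinom(10000, size, prob)
freq <- as.vector(table(factor(x, levels = k))) / length(x)
round(rbind(simulated = freq, exact = dens), 3)
```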

Consider an urn with 100 balls, of which 30 are red. 20 balls are drawn
at random; let X be the number of red balls drawn.

(a) Determine the distribution of X if the balls are drawn with and
without replacement.
(b) Plot the density of X.
(c) Generate a sample of size 20 of values of X.
(d) Compute P (5 < X < 15).
(e) Determine the 25% quantile, the median and the 75% quantile of
X.
Answer:

#####################################################################
# R offers functions for a large number of probability distributions.
# The commands for each distribution are prefixed with a letter to
# indicate the functionality:
# "d" returns the height of the probability density function
# "p" returns the cumulative distribution function
# "q" returns the inverse cumulative distribution function (quantiles)
# "r" returns randomly generated numbers
# Consider an urn with 100 balls, of which 30 are red.
# 20 balls are randomly drawn and let X be the number of red balls drawn.
# a) Determine the distribution of X if the balls are drawn with and
#    without replacement.
# b) Plot the density of X.
# c) Generate a sample of size 20 of values of X.
# d) Compute P(5 < X < 15).
# e) Determine the 25% quantile, the median and the 75% quantile of X.
#
# file: prob_rv_r_functions.R
########################################################################

# with replacement: X ~ B(n=20, p=0.3)

# plot of the density
k <- 0:20
plot(k, dbinom(k, 20, 0.3), type = "h", main = "B(n=20,p=0.3)", xlab = "x",
     ylab = "density")
# sample of size 20
rbinom(n = 20, size = 20, prob = 0.3)
# P(5 < X < 15)
sum(dbinom(6:14, size = 20, prob = 0.3))
pbinom(14, size = 20, prob = 0.3) - pbinom(5, size = 20, prob = 0.3)
# quantiles
qbinom(c(0.25, 0.5, 0.75), size = 20, prob = 0.3)

# without replacement: X ~ H(n=20, M=30, N=100)

# plot of the density
k <- 0:20
plot(k, dhyper(k, m = 30, n = 70, k = 20), type = "h", main = "H(n=20,M=30,N=100)",
     xlab = "x", ylab = "density")
# sample of size 20
rhyper(20, m = 30, n = 70, k = 20)
# P(5 < X < 15)
sum(dhyper(6:14, m = 30, n = 70, k = 20))
phyper(14, m = 30, n = 70, k = 20) - phyper(5, m = 30, n = 70, k = 20)
# quantiles
qhyper(c(0.25, 0.5, 0.75), m = 30, n = 70, k = 20)

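4. In a game a player can bet 1$ on any of the numbers 1, 2, 3, 4, 5 and
6. Three dice are rolled. If the player's number appears k times, where
k ≥ 1, the player gets k$ back plus the original stake of 1$. Over the
long run, how many cents per game can a player expect to win or lose
playing this game?
Answer: X = net win per game, wanted: E(X). The number of dice showing the player's number is binomially distributed with n = 3 and p = 1/6, i.e.

$p_1 = P(\text{player's number occurs once}) = \binom{3}{1} \cdot \left(\frac{1}{6}\right)^1 \cdot \left(\frac{5}{6}\right)^2$,

$p_2 = P(\text{player's number occurs twice}) = \binom{3}{2} \cdot \left(\frac{1}{6}\right)^2 \cdot \left(\frac{5}{6}\right)^1$,

$p_3 = P(\text{player's number occurs three times}) = \binom{3}{3} \cdot \left(\frac{1}{6}\right)^3 \cdot \left(\frac{5}{6}\right)^0$,

$p_0 = P(\text{player's number occurs 0 times}) = \binom{3}{0} \cdot \left(\frac{1}{6}\right)^0 \cdot \left(\frac{5}{6}\right)^3$

Thus, we get
$E(X) = p_0 \cdot (0 - 1) + p_1 \cdot (2 - 1) + p_2 \cdot (3 - 1) + p_3 \cdot (4 - 1) = n \cdot p - p_0 = 3 \cdot \frac{1}{6} - p_0 = \frac{1}{2} - \left(\frac{5}{6}\right)^3 \approx -0.0787$,
i.e. in the long run the player loses about 7.9 cents per game.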
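
The value E(X) ≈ −0.0787 can be cross-checked by simulation. The sketch below is an illustration added here; it assumes, without loss of generality, that the player bets on the number 1, and the seed and the number of simulated games are arbitrary choices.

```r
set.seed(42)
games <- 100000

# each row holds the three dice of one game
rolls <- matrix(sample(1:6, 3 * games, replace = TRUE), ncol = 3)

# k = how often the player's number (here: 1) appears
k <- rowSums(rolls == 1)

# net win: lose the 1$ stake if k = 0, otherwise win k$
win <- ifelse(k == 0, -1, k)

mean(win)           # simulated expected net win per game
3 / 6 - (5 / 6)^3   # exact value n * p - p0
```

The simulated mean fluctuates from run to run, so it should only be expected to agree with the exact value to about two decimal places.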
######################################################
# In a game a player can bet 1$ on any of the numbers
# 1, 2, 3, 4, 5 and 6. Three dice are rolled. If the
# player's number appears k times, where k >= 1, the player
# gets k$ back plus the original stake of 1$. Over the
# long run, how many cents per game can a player expect to
# win or lose playing this game?
#
# file: prob_rv_exp_value.R
#######################################################

library(tidyverse)

# create a tibble with the different outcomes and their
# probabilities
R <-
  tibble(
    k = 0:3
  )
R <-
  R %>%
  mutate(prob = choose(3, k) * (1 / 6)^k * (5 / 6)^(3 - k)) %>%
  mutate(r = ifelse(k == 0, -1, -1 + 1 + k))
R
# expected value
exp_r <- sum(R$prob * R$r)
exp_r

5. Consider a lottery of 20 tickets. Among the tickets there are 1 first
prize, 4 second prizes and 15 rivets (blanks, i.e. tickets that win
nothing). 5 tickets are drawn from the lottery drum. Determine the
probability that

(a) 2 rivets have been drawn.
(b) 2 rivets, 2 second prizes and the first prize were drawn.
(c) the 5th ticket drawn is the first ticket which is not a rivet.

Also calculate the probabilities if the tickets are drawn with replacement.
What results will be obtained if the number of tickets in the drum
is increased (with equal proportions of first prizes, second prizes and
rivets)?
Answer: Let N be the number of tickets, n the number of drawn
tickets, FP the number of first prizes, SP the number of second prizes
and R the number of rivets.

(a) X = number of rivets drawn: the random variable is hypergeometrically distributed (H(n=5, R=15, N=20)):
$P(X = 2) = \frac{\binom{15}{2} \cdot \binom{5}{3}}{\binom{20}{5}} \approx 0.0677$

(b) If Y = number of second prizes and Z = number of first prizes we have
$P(X = 2, Y = 2, Z = 1) = \frac{\binom{15}{2} \cdot \binom{4}{2} \cdot \binom{1}{1}}{\binom{20}{5}} \approx 0.041$

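(c) U = number of tickets drawn until the first non-rivet ticket appears. If U
= 5, the first 4 tickets are rivets and the 5th ticket is not a rivet.
$P(U = 5) = \frac{\binom{15}{4}}{\binom{20}{4}} \cdot \frac{5}{16} \approx 0.088$

If the tickets are drawn with replacement, X is binomially distributed
(B(n=5, p=15/20)), (X, Y, Z) is multinomially distributed and U is
geometrically distributed. We get
$P(X = 2) = \binom{5}{2} \left(\frac{15}{20}\right)^2 \cdot \left(\frac{5}{20}\right)^3 \approx 0.088$
$P(X = 2, Y = 2, Z = 1) = \frac{5!}{2!\,2!\,1!} \cdot \left(\frac{15}{20}\right)^2 \cdot \left(\frac{4}{20}\right)^2 \cdot \left(\frac{1}{20}\right)^1 \approx 0.03375$
$P(U = 5) = \left(\frac{15}{20}\right)^4 \cdot \frac{5}{20} \approx 0.0791$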
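
The three with-replacement probabilities correspond directly to R's built-in `dbinom()`, `dmultinom()` and `dgeom()`. A minimal sketch for checking the values (an added illustration; the variable names are chosen here):

```r
p_riv <- 15 / 20   # probability of drawing a rivet
p_sp  <- 4 / 20    # probability of drawing a second prize
p_fp  <- 1 / 20    # probability of drawing the first prize

# (a) exactly 2 rivets among 5 draws
pa <- dbinom(2, size = 5, prob = p_riv)

# (b) 2 rivets, 2 second prizes and 1 first prize
pb <- dmultinom(c(2, 2, 1), prob = c(p_riv, p_sp, p_fp))

# (c) dgeom() counts the failures (rivets) before the first success (non-rivet)
pc <- dgeom(4, prob = p_sp + p_fp)

round(c(pa = pa, pb = pb, pc = pc), 4)
```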

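Increasing the number of tickets while keeping the same ratio of rivets, second
and first prizes, we get the following table (pa, pb, pc: probabilities from
parts (a), (b), (c); worepl: without replacement; wrepl: with replacement):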
fp sp riv pa-worepl pa-wrepl pb-worepl pb-wrepl pc-worepl pc-wrepl
1 4 15 0.0677 0.0879 0.0406 0.0338 0.0880 0.0791
6 24 90 0.0853 0.0879 0.0348 0.0338 0.0804 0.0791
11 44 165 0.0865 0.0879 0.0343 0.0338 0.0798 0.0791
16 64 240 0.0870 0.0879 0.0341 0.0338 0.0796 0.0791
21 84 315 0.0872 0.0879 0.0340 0.0338 0.0795 0.0791
26 104 390 0.0873 0.0879 0.0340 0.0338 0.0794 0.0791
31 124 465 0.0874 0.0879 0.0340 0.0338 0.0794 0.0791
36 144 540 0.0875 0.0879 0.0339 0.0338 0.0793 0.0791
41 164 615 0.0875 0.0879 0.0339 0.0338 0.0793 0.0791
46 184 690 0.0876 0.0879 0.0339 0.0338 0.0793 0.0791
51 204 765 0.0876 0.0879 0.0339 0.0338 0.0793 0.0791
Note that the results for drawing tickets with replacement do
not depend on the total number of tickets, since the proportions of rivets,
second prizes and first prizes are identical. Furthermore, note that the
results without replacement tend to the results with replacement as the
number of tickets grows.
####################################################################
# Consider a lottery of 20 tickets. Among the tickets there
# are a first prize, 4 second prizes and 15 rivets. 5 tickets
# are drawn from the lottery drum. Determine the probability that
# a) 2 rivets have been drawn.
# b) 2 rivets, 2 second prizes and the first prize were drawn.
# c) The 5th ticket drawn is the first ticket which is not a rivet.
# Calculate the probabilities if the tickets are drawn with replacement.
# What results will be obtained if the number of tickets in the drum is
# increased (with equal proportions of first prizes, second prizes
# and rivets)?
#
# file: prob_rv_rivet.R
#####################################################################

library(tidyverse)

fp <- 1
sp <- 4
riv <- 15
n <- 5

# without replacement
# a) hypergeometric distribution
# H(k, M, N+M) density:
# dhyper(x = number of white balls drawn,
#        m = number of white balls in the urn,
#        n = number of black balls in the urn,
#        k = the number of balls drawn from the urn)
pa <- dhyper(2, riv, fp + sp, n)
pa # = choose(15, 2) * choose(5, 3) / choose(20, 5)

# b) generalised hypergeometric distribution
pb <- choose(riv, 2) * choose(sp, 2) * choose(fp, 1) /
  choose(riv + fp + sp, n)
pb
# c) geometric distribution
pc <- (choose(riv, n - 1) / choose(fp + sp + riv, n - 1)) *
  (fp + sp) / (fp + sp + riv - (n - 1))
pc

# results with and without replacement for increasing
# number of tickets but constant proportion of the
# prizes
results <-
  tibble(
    m = seq(from = 1, to = 55, by = 5),
    fp = m,
    sp = 4 * m,
    riv = 15 * m,
    pa_worepl = dhyper(2, riv, fp + sp, 5),
    pa_wrepl = dbinom(2, 5, riv / (riv + fp + sp)),
    pb_worepl = choose(riv, 2) * choose(sp, 2) * choose(fp, 1) /
      choose(riv + fp + sp, 1 + 2 + 2),
    pb_wrepl = factorial(1 + 2 + 2) /
      (factorial(2) * factorial(2) * factorial(1)) *
      (riv / (riv + fp + sp))^2 * (sp / (riv + fp + sp))^2 *
      (fp / (riv + fp + sp))^1,
    pc_worepl = (choose(riv, 5 - 1) / choose(fp + sp + riv, 5 - 1)) *
      (fp + sp) / (fp + sp + riv - (5 - 1)),
    pc_wrepl = dgeom(5 - 1, (fp + sp) / (fp + sp + riv))
  )
results

6. Consider a network of 3 computers. A network fails if at least one
computer is isolated. The events that a connection fails are independent
and the probability that a connection fails is 1 − α. Compare the following
topologies with respect to their reliabilities.

[Figure: the three topologies A, B and C, each drawn on the nodes 1, 2, 3 and 4]

Answer:

A: With
$v_{ij} = \begin{cases} 1 & \text{connection between node } i \text{ and node } j \text{ is ok} \\ 0 & \text{connection between node } i \text{ and node } j \text{ is defective} \end{cases}$
and $P(v_{ij} = 1) = \alpha$ for every i, j we define the probability space

$(\Omega = \{(v_{12}, v_{13}, v_{14}) \mid v_{ij} \in \{0, 1\}\}, \mathcal{P}(\Omega), P)$

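with $P((v_{12}, v_{13}, v_{14})) = \alpha^{v_{12} + v_{13} + v_{14}} \cdot (1 - \alpha)^{3 - (v_{12} + v_{13} + v_{14})}$

$\Rightarrow P(\text{network A is ok}) = P((1, 1, 1)) = \alpha^3$

B: With $\Omega = \{(v_{12}, v'_{12}, v_{13}, v'_{13}, v_{14}, v'_{14}) \mid v_{ij}, v'_{ij} \in \{0, 1\}\}$ we define the
probability space analogously.
$P(\text{network B is ok}) = P(\{v_{12} + v'_{12} \ge 1\} \cap \{v_{13} + v'_{13} \ge 1\} \cap \{v_{14} + v'_{14} \ge 1\})$
$= \left(P(v_{12} + v'_{12} \ge 1)\right)^3 = \left(1 - (1 - \alpha)^2\right)^3$
$= \alpha^3 (2 - \alpha)^3$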
C: With $\Omega = \{(v_{12}, v_{13}, v_{14}, v_{23}, v_{24}, v_{34}) \mid v_{ij} \in \{0, 1\}\}$ we define the
probability space analogously. Due to the independence and the
identical failure rate of the connections we get
$P(k \text{ connections are ok}) = \binom{6}{k} \alpha^k (1 - \alpha)^{6-k}$
The network is ok if the graph with the working connections as
edges is connected.
The graph is not connected if and only if only 0, 1 or 2
connections are ok, or exactly 3 connections are ok and they form a
triangle (4 cases, each of which leaves one node isolated).
If there are 4 or more working connections, the graph is connected.
$P(\text{Network C is ok}) = \sum_{k=4}^{6} \binom{6}{k} \alpha^k (1 - \alpha)^{6-k} + \left(\binom{6}{3} - 4\right) \alpha^3 (1 - \alpha)^3$
$= 16\alpha^3 (1 - \alpha)^3 + 15\alpha^4 (1 - \alpha)^2 + 6\alpha^5 (1 - \alpha) + \alpha^6$
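
The coefficients 16, 15, 6 and 1 can be confirmed by enumerating all 2^6 states of the six connections and checking connectivity directly. The following base-R sketch is an added illustration; `is_connected()` and `PC_enum()` are helpers defined here, not part of the original solution.

```r
edges <- t(combn(4, 2))   # the 6 possible connections between nodes 1..4

# TRUE if the working connections (logical vector `keep`) link all 4 nodes
is_connected <- function(keep) {
  e <- edges[which(keep), , drop = FALSE]
  reached <- 1            # start the search at node 1
  repeat {
    neighbours <- c(e[e[, 1] %in% reached, 2], e[e[, 2] %in% reached, 1])
    new <- setdiff(neighbours, reached)
    if (length(new) == 0) break
    reached <- c(reached, new)
  }
  length(reached) == 4
}

# all 64 combinations of working (TRUE) and failed (FALSE) connections
states <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 6)))
ok <- apply(states, 1, is_connected)
n_working <- rowSums(states)

# number of connected states by number of working connections (0, ..., 6)
counts <- as.vector(table(factor(n_working[ok], levels = 0:6)))
counts

# reliability of topology C computed from the enumeration
PC_enum <- function(alpha) sum(counts * alpha^(0:6) * (1 - alpha)^(6:0))
PC_enum(0.9)
```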


The diagram shows that P (network A is ok) < P (network B is ok) <
P (network C is ok) for every α ∈ (0, 1).
###########################################################
# Consider a network of 3 computers. A network fails if at
# least one computer is isolated. The events that a connection
# fails are independent and the probability to fail is 1 - alpha.
# Compare the topologies A, B and C with respect to their
# reliabilities.
# Let PA, PB and PC be the probabilities that the network is ok
# and alpha the probability that one connection is ok.
#
# file: prob_rv_network_reliability.R
############################################################

library(tidyverse)

alpha <- 0.9
# reliabilities of the topologies
PA <- function(alpha) {
  return(alpha^3)
}
PB <- function(alpha) {
  return(alpha^3 * (2 - alpha)^3)
}
PC <- function(alpha) {
  return(16 * alpha^3 * (1 - alpha)^3 + 15 * alpha^4 * (1 - alpha)^2 +
    6 * alpha^5 * (1 - alpha) + alpha^6)
}

# tibble containing the reliabilities for different alpha
p_data <-
  tibble(
    alpha = seq(0, 1, by = 0.01),
    A = PA(alpha),
    B = PB(alpha),
    C = PC(alpha)
  ) %>%
  # gather the columns A, B, C into one variable
  gather("A", "B", "C", key = "topology",
         value = "reliability")
#
p_data
top.col <- c("red", "blue", "black")
plot(x = p_data$alpha, y = p_data$reliability,
     xlab = "alpha", ylab = "reliability",
     type = "b", pch = 20,
     col = top.col[as.factor(p_data$topology)])
legend("right", unique(p_data$topology),
       col = top.col, pch = 20, cex = 0.75)
abline(a = 0, b = 1)

7. Compare Heumann, Schomaker, p. 176, Exercise 8.6

A company organizes a raffle at an end-of-year function. There are
4000 raffle tickets to be sold, of which 500 win a prize. The price of
each ticket is 1.5 Euro. The value of the prizes varies between 80 Euro
and 250 Euro with an average of 142 Euro.

(a) An employee wants to have a 99% guarantee of receiving three
prizes. How much money does he need to spend? Use R to solve
the question.
(b) Use R to plot the function which describes the relationship between
the number of tickets bought and the probability of winning
at least three prizes.
(c) Given the value of the prizes and the costs of the tickets, is it
worth taking part in the raffle?

Answer: Let $X_n$ be the number of prize tickets among n randomly drawn
tickets. $X_n$ is hypergeometrically distributed ($\sim H(n, 500, 4000)$).
The probability of getting at least 3 prize tickets is $p_n = P(X_n \ge 3) =
1 - P(X_n \le 2)$. Thus we are looking for the smallest n with $p_n \ge 0.99$.

(a) Generate a data frame with the columns n and $p_n$ and find the
minimum n with $p_n \ge 0.99$:
dis <-
  tibble(n = seq(1, 4000), p = 1 - phyper(2, 500, 3500, n))
min(which(dis$p >= 0.99))
You will get n = 64. If you buy 64 tickets you must pay 64 · 1.5 = 96
Euro, but with a probability of 0.99 you will win at least three prizes,
worth 3 · 142 = 426 Euro on average.
The total value of all prizes is 500 · 142 = 71000 Euro. If all tickets
are sold the company will get 4000 · 1.5 = 6000 Euro. Certainly the
company has not paid the regular price for all prizes.
(b) dis %>%
  filter(n <= 75) %>%
  ggplot(mapping = aes(x = n, y = p)) +
  geom_point() +
  geom_abline(slope = 0, intercept = 0.99)
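
Part (c) can be approached through the expected value of a single ticket. The sketch below is an added illustration in base R, under the assumption that every prize is worth the stated average of 142 Euro; it also recomputes the minimal n from (a) without the tidyverse. All variable names are chosen here.

```r
tickets <- 4000
prizes <- 500
price <- 1.5
avg_value <- 142   # assumption: use the average prize value for every prize

# (a) smallest n with P(X_n >= 3) >= 0.99
n <- 1:tickets
p <- 1 - phyper(2, prizes, tickets - prizes, n)
n_min <- min(which(p >= 0.99))
n_min
n_min * price      # cost of n_min tickets in Euro

# (c) expected net gain per ticket in Euro
expected_gain <- prizes / tickets * avg_value - price
expected_gain
```

A positive expected gain per ticket suggests that taking part is worthwhile, which fits the remark that the prizes are worth far more than the total ticket revenue.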

[Figure: p plotted against n for n ≤ 75, with a horizontal line at p = 0.99]

Continuous Random Variables and Distributions


1. The time T (in minutes) required to perform a certain job is uniformly
distributed over the interval [15, 60].

(a) Find the probability that the job requires more than 30 minutes.
(b) Given that the job is not finished after 30 minutes, find the prob-
ability that the job will require more than 15 additional minutes.

Answer:
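(a) $f(x) = \frac{1}{b - a} = \frac{1}{60 - 15} = \frac{1}{45}$ for $15 \le x \le 60$, i.e. $P(T > 30) = \int_{30}^{60} \frac{1}{45}\,dx = \frac{1}{45} \cdot [x]_{30}^{60} = \frac{30}{45} = \frac{2}{3}$
(b) X = total working time and we get
$P(X > 45 \mid X > 30) = \frac{P(X > 45)}{P(X > 30)} = \frac{\int_{45}^{60} \frac{1}{45}\,dx}{\int_{30}^{60} \frac{1}{45}\,dx} = \frac{\frac{1}{45} \cdot [x]_{45}^{60}}{\frac{1}{45} \cdot [x]_{30}^{60}} = \frac{15/45}{30/45} = 0.5$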
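
Both probabilities can also be obtained from R's uniform distribution function `punif()`. A minimal sketch added for illustration (variable names chosen here):

```r
a <- 15
b <- 60

# (a) P(T > 30)
p_a <- punif(30, min = a, max = b, lower.tail = FALSE)

# (b) P(T > 45 | T > 30) = P(T > 45) / P(T > 30)
p_b <- punif(45, min = a, max = b, lower.tail = FALSE) / p_a

c(p_a, p_b)
```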

2. Compare Heumann, Schomaker, p. 150, Exercise 7.4

A quality index summarizes different features of a product by means of
a score. Different experts may assign different quality scores depending
on their experience with the product. Let X be the quality index for a
tablet. Suppose the respective probability density function is given as
follows:
$f(x) = \begin{cases} c\,x(2 - x) & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$

(a) Determine c such that f is a proper probability density function.
(b) Determine the cumulative distribution function.
(c) What is the probability that the score is greater than 0.5 and at most 1.5?
(d) Calculate the expectation and the variance.

Answer:

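(a) From
$1 = \int_0^2 c\,x(2 - x)\,dx = c \int_0^2 (2x - x^2)\,dx = c \left[x^2 - \frac{x^3}{3}\right]_0^2 = \frac{4c}{3}$
we get c = 0.75.
(b) If 0 ≤ x ≤ 2 we have
$F(x) = 0.75 \int_0^x u(2 - u)\,du = 0.75\,x^2 \left(1 - \frac{x}{3}\right) \Rightarrow$
$F(x) = \begin{cases} 0 & x \le 0 \\ 0.75\,x^2 \left(1 - \frac{x}{3}\right) & 0 < x \le 2 \\ 1 & 2 < x \end{cases}$

(c)
$P(0.5 < X \le 1.5) = F(1.5) - F(0.5) = 0.75 \left(\frac{9}{4}\left(1 - \frac{1}{2}\right) - \frac{1}{4}\left(1 - \frac{1}{6}\right)\right) = \frac{11}{16}$

(d)
$E(X) = 0.75 \int_0^2 x^2 (2 - x)\,dx = 0.75 \left[\frac{2x^3}{3} - \frac{x^4}{4}\right]_0^2 = 1$
$E(X^2) = 0.75 \int_0^2 x^3 (2 - x)\,dx = 0.75 \left[\frac{x^4}{2} - \frac{x^5}{5}\right]_0^2 = 1.2 \Rightarrow$
$Var(X) = E(X^2) - (E(X))^2 = 1.2 - 1 = 0.2$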
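
The results for c, the probability in (c), the expectation and the variance can be verified numerically with `integrate()`. This is a minimal sketch added for illustration; the function and variable names are chosen here.

```r
g <- function(x) x * (2 - x)   # density without the constant c

# (a) c is the reciprocal of the integral of g over [0, 2]
c_const <- 1 / integrate(g, 0, 2)$value
f <- function(x) c_const * g(x)

# (c) P(0.5 < X <= 1.5)
p_c <- integrate(f, 0.5, 1.5)$value

# (d) expectation and variance
ex   <- integrate(function(x) x * f(x), 0, 2)$value
ex2  <- integrate(function(x) x^2 * f(x), 0, 2)$value
varx <- ex2 - ex^2

c(c = c_const, p = p_c, E = ex, Var = varx)
```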

Normal Distributions
1. R offers a number of functions for calculating with normal distributions.
Call them up in the RStudio Help area with the keyword Normal and
familiarize yourself with them.
R has four built-in functions for the normal distribution.

(a) The function dnorm() gives the height of the probability density
at each point for a given mean and standard deviation. Apply this
function to create a plot of the density of the normal distribution
with mean 2.5 and standard deviation 1.5.

(b) pnorm() gives the probability that a normally distributed random
number is less than a given value (cumulative distribution function).
Apply this function to create a plot of the normal distribution
function with mean 2.5 and standard deviation 1.5.
(c) qnorm() takes the probability value and gives a number whose
cumulative value matches the probability value (quantile). Apply
this function to plot the quantiles of the normal distribution with
mean 2.5 and standard deviation 1.5.
(d) rnorm() is used to generate random numbers whose distribution is
normal. It takes the sample size as input and generates that many
random numbers. Draw a histogram to show the distribution of
the generated numbers which are normally distributed with mean
2.5 and standard deviation 1.5 .
Answer:
############################################################
# R offers a number of functions for calculating with normal
# distributions. Call them up in the RStudio Help area with
# the keyword Normal and familiarize yourself with them.
# R has four built-in functions for the normal distribution.
#
# file: prob_nd_r_functions.R
#############################################################

# The function dnorm() gives the height of the probability density
# at each point for a given mean and standard deviation. Apply
# this function to create a plot of the density of the normal
# distribution with mean 2.5 and standard deviation 1.5.
x <- seq(-10, 10, by = .1)
y <- dnorm(x, mean = 2.5, sd = 1.5)
plot(x, y, type = "l")

# pnorm() gives the probability that a normally distributed random
# variable is less than a given value (cumulative distribution
# function). Apply this function to create a plot of the normal
# distribution function with mean 2.5 and standard deviation 1.5.
x <- seq(-10, 10, by = .1)
y <- pnorm(x, mean = 2.5, sd = 1.5)
plot(x, y, type = "l")

# qnorm() takes the probability value and gives a number whose
# cumulative value matches the probability value (quantile).
# Apply this function to plot the quantiles of the normal distribution
# with mean 2.5 and standard deviation 1.5.
x <- seq(0, 1, by = 0.02)
y <- qnorm(x, mean = 2.5, sd = 1.5)
plot(x, y, type = "l")

# rnorm() is used to generate random numbers whose distribution is
# normal. It takes the sample size as input and generates that many
# random numbers. Draw a histogram to show the distribution of the
# generated numbers which are normally distributed with mean 2.5 and
# standard deviation 1.5.
y <- rnorm(100, mean = 2.5, sd = 1.5)
hist(y, main = "Normal Distribution (mean = 2.5, sd = 1.5)")

2. If scores are normally distributed with a mean of 35 and a standard
deviation of 10, what percent of the scores is:
(a) greater than 34?
(b) smaller than 42?
(c) between 28 and 34?
Answer: Score $X \sim N(\mu, \sigma^2)$ with µ = 35 and σ = 10

(a) $P(X > 34) = 1 - P(X \le 34) = 1 - P\left(\frac{X - 35}{10} \le \frac{34 - 35}{10}\right) = 1 - \Phi\left(\frac{34 - 35}{10}\right) = 1 - \Phi(-0.1) = 1 - (1 - \Phi(0.1)) = \Phi(0.1) = 0.53983$
R-Code: 1-pnorm(34,35,10)
(b) $P(X < 42) = P(X \le 42) = \Phi\left(\frac{42 - 35}{10}\right) = \Phi(0.7) = 0.758036$
R-Code: pnorm(42,35,10)
(c) $P(28 \le X \le 34) = P(X \le 34) - P(X \le 28) = \Phi\left(\frac{34 - 35}{10}\right) - \Phi\left(\frac{28 - 35}{10}\right) = \Phi(-0.1) - \Phi(-0.7) = 1 - \Phi(0.1) - (1 - \Phi(0.7)) = \Phi(0.7) - \Phi(0.1) = 0.758036 - 0.539828 = 0.2182$
R-Code: pnorm(34,35,10) - pnorm(28,35,10)

3. Assume a normal distribution with a mean of 70 and a standard deviation
of 12. What limits would include the middle 65% of the cases?
Answer: $X \sim N(70, 12^2)$ and $P(u \le X \le o) = 0.65$, i.e. $P(X \le u) =
0.175$ and $P(X \le o) = 0.175 + 0.65 = 0.825$, i.e. $\Phi\left(\frac{o - 70}{12}\right) = 0.825$, i.e.
$\frac{o - 70}{12} = u_{0.825}$ (the 82.5% quantile of the N(0, 1) distribution).
With $u_{0.80} = 0.8416$ and $u_{0.85} = 1.0364$, linear interpolation gives
$u_{0.825} \approx u_{0.80} + \frac{1}{2}(u_{0.85} - u_{0.80}) = 0.939$
In total, $o = 70 + 12 \cdot u_{0.825} = 81.268$ and $u = 70 - 12 \cdot u_{0.825} = 58.732$
R-Code: qnorm(0.175,70,12) qnorm(0.825,70,12)
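
The interpolated quantile 0.939 is only an approximation; `qnorm()` returns the limits without interpolation. A short sketch added for illustration (variable names chosen here):

```r
mu <- 70
sigma <- 12

# lower and upper limit of the middle 65%
limits <- qnorm(c(0.175, 0.825), mean = mu, sd = sigma)
limits

# check: the interval contains 65% of the probability mass
diff(pnorm(limits, mean = mu, sd = sigma))
```

The limits computed this way lie slightly closer to the mean than 58.732 and 81.268, because linear interpolation between the tabulated quantiles overestimates $u_{0.825}$.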
4. Suppose that weights of bags of potato chips coming from a factory
follow a normal distribution with mean 12.8 ounces and standard deviation
0.6 ounces. If the manufacturer wants to keep the mean at
12.8 ounces but adjust the standard deviation so that only 1% of the
bags weigh less than 12 ounces, how small does he need to make that
standard deviation?
Answer: $X \sim N(12.8, 0.6^2)$
We keep the expectation 12.8 but adjust the standard deviation s such
that P(X < 12) = 0.01. We get $P\left(\frac{X - 12.8}{s} < \frac{12 - 12.8}{s}\right) = 0.01$, i.e.
with the standard normal distribution we get $\Phi\left(\frac{-0.8}{s}\right) = 0.01$.
With $\Phi(-x) = 1 - \Phi(x)$: $1 - \Phi\left(\frac{0.8}{s}\right) = 0.01$, i.e.
$\Phi\left(\frac{0.8}{s}\right) = 0.99 \Rightarrow \frac{0.8}{s} = 2.3263$, i.e. $s = \frac{0.8}{2.3263} \approx 0.3439$
R-Code: (12-12.8)/qnorm(0.01,0,1)
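
The same calculation can be wrapped into a small helper and verified with `pnorm()`. The function name `sd_for_tail()` is hypothetical and introduced here for illustration only.

```r
# standard deviation s such that P(X < limit) = p for X ~ N(mu, s^2)
sd_for_tail <- function(mu, limit, p) {
  (limit - mu) / qnorm(p)
}

s <- sd_for_tail(mu = 12.8, limit = 12, p = 0.01)
s

# check: with this standard deviation 1% of the bags weigh less than 12 ounces
pnorm(12, mean = 12.8, sd = s)
```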

5. Peter and Paul agree to meet at a restaurant at noon. Peter arrives at
a time normally distributed with mean 12:00 and standard deviation
5 minutes. Paul arrives at a time normally distributed with mean
12:02 and standard deviation 2 minutes. Assuming the two arrivals are
independent, find the probability that

(a) Peter arrives before Paul
(b) both men arrive within 3 minutes of noon
(c) the two men arrive within 3 minutes of each other
 
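Answer (all times in hours): Arrival time Peter: $X \sim N\left(12, \left(\frac{1}{12}\right)^2\right)$
Arrival time Paul: $Y \sim N\left(12\frac{1}{30}, \left(\frac{1}{30}\right)^2\right)$

(a) $P(X < Y) = P(X - Y < 0)$ and
$X - Y \sim N\left(12 - 12\frac{1}{30}, \left(\frac{1}{12}\right)^2 + \left(\frac{1}{30}\right)^2\right) = N\left(-\frac{1}{30}, \left(\frac{1}{12}\right)^2 + \left(\frac{1}{30}\right)^2\right)$
$P(X - Y < 0) = \Phi\left(\frac{0 + \frac{1}{30}}{\sqrt{\left(\frac{1}{12}\right)^2 + \left(\frac{1}{30}\right)^2}}\right) \approx \Phi(0.37) = 0.644309$
(b) $P(11.95 \le X \le 12.05,\ 11.95 \le Y \le 12.05) = P(11.95 \le X \le 12.05) \cdot P(11.95 \le Y \le 12.05)$
$= \left[\Phi\left(\frac{12.05 - 12}{1/12}\right) - \Phi\left(\frac{11.95 - 12}{1/12}\right)\right] \cdot \left[\Phi\left(\frac{12.05 - 12\frac{1}{30}}{1/30}\right) - \Phi\left(\frac{11.95 - 12\frac{1}{30}}{1/30}\right)\right]$
$= (0.72575 - (1 - 0.72575)) \cdot (0.69146 - (1 - 0.99379)) = 0.3094$
(c) $P\left(|X - Y| \le \frac{1}{20}\right) = P(-0.05 \le X - Y \le 0.05)$
$= \Phi\left(\frac{0.05 + \frac{1}{30}}{\sqrt{\left(\frac{1}{12}\right)^2 + \left(\frac{1}{30}\right)^2}}\right) - \Phi\left(\frac{-0.05 + \frac{1}{30}}{\sqrt{\left(\frac{1}{12}\right)^2 + \left(\frac{1}{30}\right)^2}}\right) = \Phi(0.93) - \Phi(-0.19) = 0.82 - (1 - 0.58) = 0.40$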
##############################################################
# Peter and Paul agree to meet at a restaurant at noon. Peter
# arrives at a time normally distributed with mean 12:00 and
# standard deviation 5 minutes. Paul arrives at a time normally
# distributed with mean 12:02 and standard deviation 2 minutes.
# Assuming the two arrivals are independent, find the following
# probabilities.
#
# file: prob_nd_peter_paul.R
###############################################################
m1 <- 12; m2 <- 12 + 2 / 60
s1 <- 5 / 60; s2 <- 2 / 60

# a) Peter arrives before Paul
p <- pnorm(0, m1 - m2, sqrt(s1^2 + s2^2))
p

# b) both men arrive within 3 minutes of noon
p <- (pnorm(12 + 3 / 60, m1, s1) - pnorm(12 - 3 / 60, m1, s1)) *
  (pnorm(12 + 3 / 60, m2, s2) - pnorm(12 - 3 / 60, m2, s2))
p

# c) the two men arrive within 3 minutes of each other
p <- pnorm(3 / 60, m1 - m2, sqrt(s1^2 + s2^2)) -
  pnorm(-3 / 60, m1 - m2, sqrt(s1^2 + s2^2))
p

6. The weight of a melon, X, in kg is $N(\mu = 1.2, \sigma^2 = 0.3^2)$, i.e. normally
distributed with expectation 1.2 kg and standard deviation 0.3 kg. The
weight Y of a pineapple in kg is $N(\mu = 0.6, \sigma^2 = 0.2^2)$. We assume
that a melon and a pineapple are chosen independently of each other.
(a) What is the distribution of the total weight of the two fruits?
(b) Calculate the probability that the total weight of the two fruits
does not exceed 2.0 kg.
(c) The melon costs 2 euro per kg and the pineapple 4 euro per kg.
Give an expression for the total price Z using X and Y . What is
the distribution of Z?
(d) Calculate the probability that the price Z is higher than 4 euro.
Answer:
(a) $X + Y$ is normally distributed with expectation $E[X] + E[Y] = 1.8$
and variance $\mathrm{Var}(X) + \mathrm{Var}(Y) = 0.3^2 + 0.2^2 = 0.13$.
(b) $P(X + Y \le 2.0) = \Phi\left(\frac{2.0 - 1.8}{\sqrt{0.13}}\right) \approx \Phi(0.55) \approx 0.71$
(c) $Z = 2X + 4Y$. The price is a linear combination of independent normally
distributed random variables and thus also normally distributed.
The expectation is $2E[X] + 4E[Y] = 2 \cdot 1.2 + 4 \cdot 0.6 = 4.8$ and the
variance is $2^2\,\mathrm{Var}(X) + 4^2\,\mathrm{Var}(Y) = 4 \cdot 0.3^2 + 16 \cdot 0.2^2 = 1$.
(d) $P(Z > 4) = 1 - P(Z \le 4) = 1 - \Phi\left((4 - 4.8)/\sqrt{1}\right) = 1 - \Phi(-0.8) = \Phi(0.8) = 0.7881$
#################################################################
# The weight of a melon, X, in kg is N(mu = 1.2, sigma^2 = 0.3^2),
# i.e. normally distributed with expectation 1.2 kg and standard
# deviation 0.3 kg. The weight Y for a pineapple is in kg
# N(mu = 0.6, sigma^2 = 0.2^2). We assume that a melon and a pineapple
# are chosen independently of each other.
#
# file: prob_nd_fruits.R
#################################################################

mu_m <- 1.2
sigma_m <- 0.3
mu_p <- 0.6
sigma_p <- 0.2

# a) Which distribution has the total weight of the two fruits?
# S = X + Y
mu_s <- mu_m + mu_p
sigma_s <- (sigma_m^2 + sigma_p^2)^0.5

# b) Calculate the probability that the total weight of the two fruits
# does not exceed 2.0 kg.
pnorm(2, mu_s, sigma_s)

# c) The melon costs 2 euro per kg and the pineapple 4 euro per kg.
# Give an expression for the total price Z using X and Y. What is the
# distribution of Z?
mu_z <- 2 * mu_m + 4 * mu_p
sigma_z <- (2^2 * sigma_m^2 + 4^2 * sigma_p^2)^0.5

# d) Calculate the probability that the price Z is higher than 4 euro.
1 - pnorm(4, mu_z, sigma_z)
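
Parts (a) and (c) use the same rule: for independent normal X and Y, aX + bY is normal with mean a·E[X] + b·E[Y] and variance a²·Var(X) + b²·Var(Y). A small sketch of that rule follows; the helper `lin_comb_normal` and the result names are hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not from the original sheet): parameters of a*X + b*Y
# for independent X ~ N(mu_x, sd_x^2) and Y ~ N(mu_y, sd_y^2).
lin_comb_normal <- function(a, mu_x, sd_x, b, mu_y, sd_y) {
  list(
    mean = a * mu_x + b * mu_y,
    sd   = sqrt(a^2 * sd_x^2 + b^2 * sd_y^2)
  )
}

# total weight S = X + Y and total price Z = 2X + 4Y
s <- lin_comb_normal(1, 1.2, 0.3, 1, 0.6, 0.2)
z <- lin_comb_normal(2, 1.2, 0.3, 4, 0.6, 0.2)

p_b <- pnorm(2, s$mean, s$sd)       # P(S <= 2.0)
p_d <- 1 - pnorm(4, z$mean, z$sd)   # P(Z > 4)
c(b = p_b, d = p_d)
```

Note that the rule requires independence; with correlated weights a covariance term would enter the variance.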

Central Limit Theorem
1.

2. A machine consists of the three modules A, B and C. The machine
works only if all three modules are working and if no error occurred
during the construction phase. The probabilities that the modules
A, B and C are defective are 1%, 1% and 5%. The probability of an
error during the construction phase is 2%. The four kinds of errors occur
independently of each other.

(a) Calculate the expectation and the variance of the number of
defective machines in a lot of 1000 randomly chosen machines.
(b) The producer is thinking about guaranteeing that not more than
110 machines are defective in such a lot. With which approximate
probability can this guarantee be kept?
(c) Each defective machine causes an extra cost of 100 euro. The
producer considers buying a better module C (at a higher price)
with an error rate of 1%.
How high can the additional cost per machine for module C be
so that buying the more expensive module C is still profitable
(in terms of expected cost)?

Answer:

(a) P(A and B and C ok and no construction error)
= P(A ok) · P(B ok) · P(C ok) · P(no construction error)
= 0.99 · 0.99 · 0.95 · 0.98 = 0.91247
Let X = number of defective machines in a set of 1000 machines. Then
$X \sim B(n = 1000, p = 0.08753)$. We obtain $E(X) = n \cdot p = 87.53$ and
$\mathrm{Var}(X) = n \cdot p \cdot (1 - p) = 87.53 \cdot (1 - 0.08753) \approx 79.87$
(b) Approximation with a normal distribution (with continuity correction):
$P(X \le 110) \approx \Phi\left(\frac{110 + 0.5 - 87.53}{\sqrt{79.87}}\right) \approx \Phi(2.57) \approx 0.995$
(c) Expected extra cost per machine: 100 · 0.08753 = 8.753
The better module C: $p_c = 0.01$ and $\hat{p}$ = probability that a machine
is defective. We obtain:
$\hat{p} = 1 - 0.99 \cdot 0.99 \cdot 0.99 \cdot 0.98 \approx 0.0491$
Expected cost with this better module: $100\hat{p} = 4.91$, i.e. the maximum
additional cost for module C is 8.753 - 4.91 = 3.843, thus 3.84 euro.

##################################################################
# A machine consists of the three modules A, B and C. The machine
# works only if all three modules are working and if no error
# occurred during the construction phase. The probabilities that
# the modules A, B and C are defective are 1%, 1% and 5%. The
# probability of an error during the construction phase is 2%. The
# four kinds of errors occur independently of each other.
#
# file: prob_cl_modules.R
##################################################################

# a) Calculate the expectation and the variance of the number of
# defective machines in a lot of 1000 randomly chosen machines.
n <- 1000
pa <- 0.01
pb <- 0.01
pc <- 0.05
pcp <- 0.02
# prob. of no error
p_no_error <- (1 - pa) * (1 - pb) * (1 - pc) * (1 - pcp)
# expected value
exp_n_def <- n * (1 - p_no_error)
# variance
var_n_def <- n * (1 - p_no_error) * p_no_error
p_no_error; exp_n_def; var_n_def

# b) The producer is thinking about guaranteeing that not more
# than 110 machines are defective in such a lot. With which approximate
# probability can this guarantee be kept?
pbinom(110, n, 1 - p_no_error) # 0.9936782
# approximation by a normal distribution
# without continuity correction
pnorm(110, mean = exp_n_def, sd = (var_n_def)^0.5) # 0.9940429
# with continuity correction
pnorm(110.5, mean = exp_n_def, sd = (var_n_def)^0.5) # 0.9949242

# c) Each defective machine causes an extra cost of 100 euro.
# The producer considers buying a better module C (at a higher
# price) with an error rate of 1%. How high can the
# additional cost per machine for module C be so that
# buying the more expensive module C is still profitable
# (in terms of expected cost)?
p_no_error_new <- (1 - pa) * (1 - pb) * (1 - 0.01) * (1 - pcp)
100 * (1 - p_no_error) - 100 * (1 - p_no_error_new)
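
The break-even reasoning in part (c) can be wrapped into a small function so that other error rates are easy to try. This is only a sketch under the independence assumption of the exercise; the function `p_defect` and the variable names are hypothetical and not taken from the original listing.

```r
# Hypothetical helper: probability that a machine is defective when the
# module failures and the construction error are independent.
p_defect <- function(p_modules, p_construction) {
  1 - prod(1 - p_modules) * (1 - p_construction)
}

cost_per_defect <- 100                          # euro per defective machine

p_old <- p_defect(c(0.01, 0.01, 0.05), 0.02)    # current module C (5%)
p_new <- p_defect(c(0.01, 0.01, 0.01), 0.02)    # better module C (1%)

# largest extra price per machine for which the better module C
# still pays off in expectation
break_even <- cost_per_defect * (p_old - p_new)
c(p_old = p_old, p_new = p_new, break_even = break_even)
```

The comparison is purely in terms of expected cost; it says nothing about the variability of the number of defective machines.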

3. An airline knows that over the long run, 90% of passengers who reserve
seats show up for their flight. On a particular flight with 300 seats, the
airline accepts 324 reservations.

(a) Assuming that passengers show up independently of each other,
what is the chance that a passenger with a reservation does not get
a seat?
(b) How many reservations can be accepted if the airline tolerates an
overbooking probability of 1%?

Answer: Let n be the number of flight tickets sold, and
let $Y_i = 1$ if the person holding flight ticket i shows up, and
$Y_i = 0$ otherwise, $1 \le i \le n$.
The $Y_i$ are Bernoulli-distributed random variables with $P(Y_i = 0) = 0.1 =
1 - P(Y_i = 1)$ for all $1 \le i \le n$.
Let $S_n = \sum_{i=1}^{n} Y_i$ be the number of passengers showing up. We assume
that the passengers show up independently of each other. In this case
we have $E[S_n] = n \cdot E[Y_1]$ and $\mathrm{Var}(S_n) = n \cdot \mathrm{Var}(Y_1)$ with
$E[Y_1] = P(Y_1 = 1) = 0.9$ and $\mathrm{Var}(Y_1) = P(Y_1 = 1) \cdot P(Y_1 = 0) = 0.9 \cdot 0.1$.
All passengers must get a seat with a probability higher than 0.99.
The plane has 300 seats in total. Thus, the following inequality must be
fulfilled: $P(S_n > 300) < 0.01$
Using the central limit theorem, we obtain $W = \frac{S_n - E[S_n]}{\sqrt{\mathrm{Var}(S_n)}} \sim N(0, 1)$
approximately, i.e.
$P(S_n > 300) = P\left(\frac{S_n - E[S_n]}{\sqrt{\mathrm{Var}(S_n)}} > \frac{300 - E[S_n]}{\sqrt{\mathrm{Var}(S_n)}}\right) < 0.01$
With the continuity correction (300 is replaced by 300.5) this can be expressed as:
$P\left(W > \frac{300.5 - 0.9 \cdot n}{\sqrt{n \cdot 0.9 \cdot 0.1}}\right) < 0.01$ and $P\left(W \le \frac{300.5 - 0.9 \cdot n}{\sqrt{n \cdot 0.9 \cdot 0.1}}\right) \ge 0.99$
We obtain $\frac{300.5 - 0.9 \cdot n}{\sqrt{n \cdot 0.9 \cdot 0.1}} \ge \Phi^{-1}(0.99) = u_{0.99}$. The boundary case gives the equation

$u_{0.99} = \frac{300.5 - n \cdot 0.9}{\sqrt{0.09 \cdot n}}$
$0.09 \cdot u_{0.99}^2 \cdot n = 300.5^2 - 2 \cdot 0.9 \cdot 300.5 \cdot n + 0.9^2 \cdot n^2$
$0 = n^2 - \frac{0.09 \cdot u_{0.99}^2 + 2 \cdot 0.9 \cdot 300.5}{0.9^2} \cdot n + \frac{300.5^2}{0.9^2}$
$\Rightarrow n_1 = 320.0169,\ n_2 = 348.3622$

Since $300.5 - 0.9 \cdot n$ must be positive, $n_2$ is not a solution. Thus, the airline
should sell at most 320 flight tickets.
##############################################################
# An airline knows that over the long run, 90% of passengers
# who reserve seats show up for their flight. On a particular
# flight with 300 seats, the airline accepts 324 reservations.
#
# file: prob_cl_airline.R
##############################################################

# a) Assuming that passengers show up independently of each other,
# what is the chance that a passenger with a reservation does not
# get a seat?
n <- 324; p <- 0.9
# exact value
p_ex <- 1 - pbinom(300, n, p)
p_ex
# approx. value
m <- n * p; s <- sqrt(n * p * (1 - p))
p_app <- 1 - pnorm(300.5, m, s)
p_app

# b) How many reservations can be accepted if the airline tolerates
# an overbooking probability of 1%?

# exact bound
n <- seq(301, 350, 1)
p_ex <- pbinom(300, n, p)
o_ex <- n[max(which(p_ex >= 0.99))]
o_ex

# approx. bound
m <- n * p; s <- sqrt(n * p * (1 - p))
p_app <- pnorm(300.5, m, s)
o_app <- n[max(which(p_app >= 0.99))]
o_app

# alternative solution: solving the equation
# (300.5 - p * n)^2 = p * (1 - p) * u_0.99^2 * n
# quadratic equation: n^2 + a * n + b = 0
u_099 <- qnorm(0.99, 0, 1)
# u_099
# p
a <- -(2 * p * 300.5 + p * (1 - p) * u_099^2) / (p^2)
b <- (300.5 / p)^2
# a; b
# note: here n1 is the larger and n2 the smaller root
n1 <- -a/2 + sqrt((a/2)^2 - b)
n2 <- -a/2 - sqrt((a/2)^2 - b)
n1; n2
# only the root with 300.5 - n * p > 0 is admissible
300.5 - n1 * p
300.5 - n2 * p

# solution without solving the above equation
library(tidyverse)
tibble(
  n = 300:324,
  p = 1 - pbinom(300, size = n, prob = 0.9),
  p.app = 1 - pnorm(300.5, mean = n * 0.9, sd = sqrt(n * 0.9 * 0.1))
) %>% filter(p >= 0.01) %>% filter(n == min(n)) # n < 321 !!
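
Squaring the equation introduces a second root that then has to be discarded. As an alternative sketch, the unsquared boundary condition can be solved numerically with `uniroot`, which avoids the spurious root. The function `g`, the search interval `c(300, 340)` and the names `root` and `n_max` are illustrative assumptions.

```r
# Numerical solution of (300.5 - n * p) / sqrt(n * p * (1 - p)) = u_0.99
# without squaring. The search interval is an assumption chosen so that
# g changes sign on it.
p <- 0.9
seats <- 300
u_099 <- qnorm(0.99)

# g(n) >= 0 means the normal-approximated probability that more than
# 300 passengers show up is at most 1%
g <- function(n) (seats + 0.5 - n * p) / sqrt(n * p * (1 - p)) - u_099

root <- uniroot(g, interval = c(300, 340), tol = 1e-9)$root
n_max <- floor(root)   # largest admissible integer number of reservations
c(root = root, n_max = n_max)
```

Because `g` is decreasing in n on this interval, every integer up to `n_max` satisfies the condition under the normal approximation.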

4. As a new residential area with 1000 domestic homes is going to be
built, the number of required parking lots is calculated in the following
way: We assume that there is no relation between the number of cars
in different homes. Furthermore, we assume that a domestic home has
no car with probability 0.2, one car with probability 0.7 and two cars
with probability 0.1. The number of parking lots should be planned in
such a way that the probability that each car gets a parking lot is 0.99.
How many parking lots should be built?
Answer: Let $X_i$, $1 \le i \le 1000$, be the number of cars in household i.
Then the $X_i$ are independent random variables with the
same discrete distribution:
$P(X_i = 0) = 0.2$, $P(X_i = 1) = 0.7$ and $P(X_i = 2) = 0.1$, $i = 1, \ldots, 1000$
Let Y be the total number of cars in the 1000 households:
$Y = \sum_{i=1}^{1000} X_i$. The central limit theorem gives that
$Z = \frac{\left(\sum_{i=1}^{1000} X_i\right) - 1000 \cdot E[X_1]}{\sqrt{1000 \cdot \mathrm{Var}(X_1)}}$
is approximately normally distributed, $Z \sim N(0, 1)$.
We have $E[X_1] = \sum_{x=0}^{2} x\,P(X_1 = x) = 1 \cdot 0.7 + 2 \cdot 0.1 = 0.9$ and
$\mathrm{Var}(X_1) = E(X_1^2) - (E(X_1))^2 = 1^2 \cdot 0.7 + 2^2 \cdot 0.1 - 0.9^2 = 0.29$
The smallest number y of parking lots to build, so that all cars get a
parking lot with probability 0.99, is given by $P(Y \le y) = 0.99$. We
obtain the equation
$0.99 = P(Y \le y) = P\left(\sum_{i=1}^{1000} X_i \le y\right) = P\left(\frac{\left(\sum_{i=1}^{1000} X_i\right) - 1000 \cdot 0.9}{\sqrt{1000 \cdot 0.29}} \le \frac{y - 1000 \cdot 0.9}{\sqrt{1000 \cdot 0.29}}\right) \approx P\left(Z < \frac{y - 900}{17.03}\right)$
This gives $\frac{y - 900}{17.03} = \Phi^{-1}(0.99) = 2.33$
Thus $y = 900 + 2.33 \cdot 17.03 \approx 939.7$, so 940 parking lots must be
built.

###################################################################
# As a new residential area with 1000 domestic homes is going
# to be built, the number of required parking lots is calculated
# in the following way: We assume that there is no relation between
# the number of cars in different homes. Furthermore, we assume
# that a domestic home has no car with probability 0.2, one car
# with probability 0.7 and two cars with probability 0.1. The number
# of parking lots should be planned in such a way that the probability
# that each car gets a parking lot is 0.99. How many parking lots
# should be built?
#
# file: prob_cl_resarea.R
###################################################################

p0 <- 0.2
p1 <- 0.7
p2 <- 0.1
n <- 1000
# expected values
EX <- 0 * p0 + 1 * p1 + 2 * p2
EX2 <- 0^2 * p0 + 1^2 * p1 + 2^2 * p2
# variance
VarX <- EX2 - (EX)^2
EX
VarX
# 99% quantile
qnorm(0.99, n * EX, (n * VarX)^0.5) # 939.6163
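
The normal approximation can be compared with the exact distribution of the total number of cars, which is the 1000-fold convolution of the single-household distribution. The sketch below computes it by repeated convolution; it is not part of the original solution, and the names `pmf_one`, `pmf_sum` and `y_exact` are chosen for illustration.

```r
# Exact distribution of Y = X_1 + ... + X_1000 by repeated convolution.
pmf_one <- c(0.2, 0.7, 0.1)   # P(X = 0), P(X = 1), P(X = 2)
n <- 1000

pmf_sum <- 1                  # empty sum: P(Y = 0) = 1
for (i in seq_len(n)) {
  k <- length(pmf_sum)        # current support is 0, ..., k - 1
  new_pmf <- numeric(k + 2)   # support after adding one household
  for (j in 0:2) {
    idx <- (1:k) + j          # shift by j cars
    new_pmf[idx] <- new_pmf[idx] + pmf_sum * pmf_one[j + 1]
  }
  pmf_sum <- new_pmf
}

cdf_sum <- cumsum(pmf_sum)
# smallest y with P(Y <= y) >= 0.99; vector index 1 corresponds to Y = 0
y_exact <- min(which(cdf_sum >= 0.99)) - 1
y_exact
```

If the central limit theorem approximation is adequate here, `y_exact` should agree with the rounded normal quantile to within about one parking lot.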

5. Starting at the origin, a particle moves along the integer axis in
this way:
At each time point 1, 2, 3, . . ., the particle moves either one step to
the left or one step to the right with the same probability. There is
also a third alternative: the particle stays where it is, without moving.
This alternative has probability p with 0 < p < 1. The movement
of the particle is independent of earlier movements.
How should p be chosen if we want the probability that the
particle is located to the right of point 15 at time point 100 to be 1%?
Answer: Let $S_n$ be the location of the particle at time n.
Then we have $S_n = \sum_{i=1}^{n} X_i$, with $X_i$ independent, identically
distributed random variables and
$P(X_i = 0) = p$ and $P(X_i = 1) = P(X_i = -1) = (1 - p)/2$.
We look for p such that $P(S_{100} > 15) = 0.01$. We have $E[X_i] = 0$ and
$\mathrm{Var}(X_i) = E[X_i^2] = 1 - p$.
The central limit theorem gives that $S_n / \sqrt{n(1 - p)}$ converges towards
a $N(0, 1)$-distribution. Thus:
$P(S_{100} > 15) = 1 - P\left(\frac{S_{100}}{\sqrt{100(1 - p)}} \le \frac{15}{\sqrt{100(1 - p)}}\right) \approx 1 - \Phi\left(\frac{3}{2\sqrt{1 - p}}\right)$
This must be equal to 0.01. Thus, $\frac{3}{2\sqrt{1 - p}}$ must be equal to the
99%-quantile of the $N(0, 1)$-distribution, i.e. $\frac{3}{2\sqrt{1 - p}} = 2.3263$,
and this gives $p \approx 0.58$.
R-Code: 1-(1.5/qnorm(0.99,0,1))^2
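
The value of p can be checked against the normal approximation and, roughly, against a simulation of the walk. The following is a minimal sketch; the seed, the number of simulated walks `n_sim` and the variable names are arbitrary choices for illustration. Since $S_{100}$ is integer-valued and no continuity correction was used, the simulated frequency may differ somewhat from 1%.

```r
# Solve 3 / (2 * sqrt(1 - p)) = u_0.99 for p
u_099 <- qnorm(0.99)
p <- 1 - (1.5 / u_099)^2

# normal approximation of P(S_100 > 15) for this p (should give 0.01)
p_tail <- 1 - pnorm(15, mean = 0, sd = sqrt(100 * (1 - p)))

# rough simulation check; seed and n_sim are illustrative assumptions
set.seed(1)
n_sim <- 20000
steps <- matrix(
  sample(c(-1L, 0L, 1L), size = 100 * n_sim, replace = TRUE,
         prob = c((1 - p) / 2, p, (1 - p) / 2)),
  nrow = n_sim                      # one simulated walk per row
)
s100 <- rowSums(steps)              # position after 100 steps
p_sim <- mean(s100 > 15)

c(p = p, p_tail = p_tail, p_sim = p_sim)
```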
