
1 Combinatorial analysis 1

1.1 Fundamental Principles Of Counting: Tree Diagram . . . . . . . . . . . . . 1

1.2 Factorial Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Elements of probability theory 1

2.1 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2.1.1 Venn Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1.2 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1.3 Set Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.4 Principle Of Duality: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Sample Space And Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 The Concept Of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.4 Addition Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.5 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.6 The Multiplication Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.7 Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Elements of mathematical statistics 1

3.1 Stochastic Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


3.2 Discrete Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . 2

3.3 Continuous Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . 3

3.4 Joint Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.5 Expected Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.6 Some Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.6.1 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.6.2 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.6.3 Other Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . 12

3.7 Some Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.7.1 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.7.2 Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.7.3 Erlang-k Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Theory of sampling 1

4.1 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

4.2 Sampling Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

4.3 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

4.4 Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

4.4.1 Population mean µ and variance σ² are known . . . . . . . . . . . . . 3

4.4.2 Population mean µ and variance σ² are both unknown . . . . . . . . . 4

4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Author index 7

Index 7

Chapter 1

Combinatorial analysis

In some cases the number of possible outcomes for a particular event is not very large, and

so direct counting of possible outcomes is not diﬃcult. However, problems often arise where

direct counting becomes a practical impossibility. In these cases use is made of

combinatorial analysis, which could be called a sophisticated way of counting.

1.1 Fundamental Principles Of Counting: Tree

Diagram

We first introduce two rules which are employed in many proofs throughout combinatorial analysis:

Rule of Sum: If object A may be chosen in m ways, and object B in n other ways, "either A or B" may be chosen in m + n ways.

Rule of Product: If object A may be chosen in m ways, and thereafter object B in n ways, "both A and B" may be chosen in this order in m · n ways (multiplication principle).

It should be noticed that in the Rule of Sum the choices of A and B are mutually exclusive, that is, one cannot choose both A and B, only either A or B.


Figure 1.1: Tree diagram for Example 1.1, with k = 2, l = 3, and m = 2.

The Rule of Product is often used in cases where the order of choosing is immaterial, that is, where the choices are independent. But in many practical situations the possibility of dependence should not be ignored.

If one thing can be accomplished in n_1 different ways, and if after this a second thing can be accomplished in n_2 different ways, · · ·, and finally a k'th thing can be accomplished in n_k different ways, then all k things (which are assumed to be independent of each other) can be accomplished in the specified order in n = n_1 · n_2 · · · n_k different ways.

A diagram, called a tree diagram because of its appearance (ﬁg.1.1), is often used in

connection with these rules.

Example 1.1 (see Fig. 1.1)

Let the setting-up procedure of a call involve the following devices:

k local circuits : L_1, L_2, · · · , L_k

l registers : R_1, R_2, · · · , R_l

m trunk circuits : T_1, T_2, · · · , T_m

Under the assumption of independence a call can then be set up in

n = k · l · m

diﬀerent ways.

For a group of 1000 subscribers typical figures are k = 80, l = 15, m = 50, i.e. n = 60,000. If a malfunction only occurs for a specific combination of devices, it can be very difficult to trace the fault, as it only appears in one out of 60,000 calls (assuming random hunting).

2
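The multiplication principle of Example 1.1 is easy to reproduce by brute force; the Python sketch below uses the device counts quoted above, and the Cartesian-product enumeration is purely illustrative:

```python
from itertools import product

# Device counts from Example 1.1: k local circuits, l registers,
# m trunk circuits.
k, l, m = 80, 15, 50

# Rule of product: a call path picks one device of each kind.
n = k * l * m
assert n == 60000

# Enumerating the Cartesian product of device choices gives the same count.
assert sum(1 for _ in product(range(k), range(l), range(m))) == n

# A fault tied to one specific device combination appears in 1 of n calls.
fault_fraction = 1 / n
```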

1.2 Factorial Function

Factorial n (denoted by n!, n integer) is deﬁned as

n! = n · (n −1) · (n −2) · · · 2 · 1 (1.1)

It is convenient to deﬁne

0! = 1 (1.2)

Example 2.1

For many calculators the upper range of numbers is 9.999 · 10^99. Thus the factorial function only exists for n < 70:

69! = 1.7112 · 10^98

70! = 1.1979 · 10^100


If n is large a direct evaluation of n! is impractical. In such cases Stirling's approximation is often applied:

n! ≈ √(2πn) · n^n · e^(−n)    (1.3)

where e is the base of natural logarithms (e ≈ 2.718281828...). The symbol ≈ in (1.3) means that the ratio of the left side to the right side approaches 1 as n → ∞. For this reason we often call the right side an asymptotic expansion of the left side.

2
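A small check of how accurate (1.3) already is at n = 69, the value from Example 2.1:

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation (1.3): n! ≈ sqrt(2*pi*n) * n^n * e^(-n)."""
    return math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)

n = 69
ratio = stirling(n) / math.factorial(n)
# The ratio of the two sides approaches 1 as n grows; at n = 69 it is
# already within about 0.2 percent.
assert 0.998 < ratio < 1.0
```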

The gamma function, denoted by Γ(n), is defined for any real value of n > 0:

Γ(n) = ∫_0^∞ t^(n−1) · e^(−t) dt , n > 0    (1.4)

A recurrence formula is

Γ(n + 1) = n · Γ(n) (1.5)

We can easily ﬁnd Γ(1) = 1. If n is a positive integer, then we get (1.1):

Γ(n + 1) = n!
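Both properties are easy to verify with Python's standard-library implementation of Γ, math.gamma:

```python
import math

# For positive integers the gamma function reduces to the factorial:
# Gamma(n + 1) = n!, combining the recurrence (1.5) with Gamma(1) = 1.
for n in range(1, 10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# The recurrence (1.5), Gamma(n + 1) = n * Gamma(n), also holds for
# non-integer arguments; x = 2.5 is an arbitrary test value.
x = 2.5
assert math.isclose(math.gamma(x + 1), x * math.gamma(x))
```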

1.3 Permutations

The basic deﬁnition of a permutation is given below:

An r-permutation of n different objects is an ordered selection or arrangement of r (r ≤ n) of the objects.

Actually it is sampling without replacement. Suppose that we are given n distinguishable objects and wish to arrange r (≤ n) of these objects on a line. Since there are n ways of choosing the 1st object and, after this is done, n − 1 ways of choosing the 2nd object, · · ·, and finally (n − r + 1) ways of choosing the r'th object, it follows by repeated application of the rule of product that the number of different arrangements, or permutations as they are called, is given by

P(n, r) = n · (n − 1) · · · (n − r + 1)    (1.6)

We call P(n, r) the number of permutations of n objects taken r at a time.


For the particular case where r = n, (1.6) becomes

P(n, n) = n!    (1.7)

We can write (1.6) in terms of factorials as

P(n, r) = n! / (n − r)!    (1.8)

If r = n we see that (1.6) and (1.8) agree only if 0! = 1 (1.2).

Example 3.1

We consider the case where some objects are identical. The number of permutations of n objects consisting of groups of which n_1 are identical, n_2 are identical, · · ·, and n_k are identical, where

n = n_1 + n_2 + · · · + n_k

is

n! / (n_1! · n_2! · · · n_k!) = C(n; n_1, n_2, · · ·, n_k)

The term on the right-hand side is called the polynomial (multinomial) coefficient.
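The polynomial coefficient can be checked by brute force; the helper below and the test word "aabbc" are illustrative, not from the text:

```python
import math
from itertools import permutations

def multiset_permutations(counts):
    """n! / (n_1! * n_2! * ... * n_k!) for group sizes in counts."""
    n = sum(counts)
    result = math.factorial(n)
    for c in counts:
        result //= math.factorial(c)  # exact integer division
    return result

# "aabbc" has groups of identical letters of sizes 2, 2, 1:
# 5! / (2! * 2! * 1!) = 30 distinguishable orderings.
assert multiset_permutations([2, 2, 1]) == 30

# Cross-check by enumerating the distinct orderings directly.
assert len(set(permutations("aabbc"))) == 30
```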

Example 3.2

Let us consider a group of n circuits. We can look (hunt) for idle circuits in

P(n, n) = n!

different ways.

2

1.4 Combinations

In a permutation we are interested in the order of arrangement of the objects. Thus (a b c) is a different permutation from (b c a). In many problems, however, we are only interested in selecting objects without regard to the order. Such selections are called combinations. For example (a b c) and (b c a) are the same combination.


The total number of combinations of r objects selected from n (also called the combinations of n items taken r at a time) is denoted by C(n, r) or, in binomial-coefficient notation, \binom{n}{r}.

We have

C(n, r) = n! / (r! (n − r)!)    (1.9)

It can also be written

C(n, r) = P(n, r) / r!    (1.10)

We can easily derive the expression by noticing that each combination of r diﬀerent objects

may be ordered in r! ways, and so ordered it is an r-permutation.

Thus we have

r! · C(n, r) = P(n, r) = n(n − 1) · · · (n − r + 1), n ≥ r

It is easy to see that

C(n, r) = C(n, n − r)    (1.11)

The numbers C(n, r) are often called binomial coefficients because they arise in the binomial expansion:

(x + y)^n = Σ_{r=0}^{n} C(n, r) · x^r · y^(n−r)    (1.12)

They can be generalized in several ways. Thus we define:

C(−n, r) = (−1)^r · C(n + r − 1, r) , n > 0    (1.13)

This appears in the Negative Binomial Distribution.
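A quick numerical check of these identities using Python's math.comb; the helper gen_binom is not from the text, it simply evaluates the generalized coefficient in (1.13) from its falling-factorial definition:

```python
import math
from math import comb

def gen_binom(a: int, r: int) -> int:
    """Generalized binomial coefficient a(a-1)...(a-r+1) / r! for integer a."""
    num = 1
    for i in range(r):
        num *= a - i
    return num // math.factorial(r)  # the division is always exact

n, r = 10, 4
assert comb(n, r) == 210                              # (1.9)
assert comb(n, r) == comb(n, n - r)                   # (1.11)
# (1.13): C(-n, r) = (-1)^r * C(n + r - 1, r)
assert gen_binom(-n, r) == (-1) ** r * comb(n + r - 1, r)
```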

Example 4.1


The number of combinations of n objects taken 1, 2, · · ·, or n at a time is:

C(n, 1) + C(n, 2) + · · · + C(n, n) = 2^n − 1    (1.14)

i.e.

Σ_{r=0}^{n} C(n, r) = 2^n

This is readily seen from (1.12) by letting x = y = 1.

2

Example 4.2

The combinations of the letters a, b, c and d taken 3 at a time are

abc, abd, acd, bcd.

To form all the permutations of 4 letters taken 3 at a time it is necessary to take each

combination and write out all possible permutations of the given combination:

Combinations   Permutations
abc            abc, acb, bac, bca, cab, cba
abd            abd, adb, bad, bda, dab, dba
acd            acd, adc, cad, cda, dac, dca
bcd            bcd, bdc, cbd, cdb, dcb, dbc

If repetitions are allowed, the combinations and the corresponding permutations are:

Combinations          Permutations
aaa, aab, aac, aad    aaa, aab, baa, aba, aac, caa, aca, aad, daa, ada
bbb, bba, bbc, bbd    bbb, bba, abb, bab, bbc, cbb, bcb, bbd, dbb, bdb
ccc, cca, ccb, ccd    ccc, cca, acc, cac, ccb, bcc, cbc, ccd, dcc, cdc
ddd, dda, ddb, ddc    ddd, dda, add, dad, ddb, bdd, dbd, ddc, cdd, dcd
abc                   abc, acb, bac, bca, cab, cba
abd                   abd, adb, bad, bda, dab, dba
acd                   acd, adc, cad, cda, dac, dca
bcd                   bcd, bdc, cbd, cdb, dcb, dbc

2
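Example 4.2 can be reproduced with Python's itertools:

```python
from itertools import combinations, permutations

letters = "abcd"

# Combinations of a, b, c, d taken 3 at a time (order ignored).
combs = ["".join(c) for c in combinations(letters, 3)]
assert combs == ["abc", "abd", "acd", "bcd"]

# Each combination of 3 letters yields 3! = 6 permutations,
# giving 4 * 6 = 24 = P(4, 3) permutations in total.
perms = list(permutations(letters, 3))
assert len(perms) == 24
```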

Example 4.3


Let us consider a set of n objects consisting of n_1 different objects of type 1, n_2 different objects of type 2, etc., where

n = n_1 + n_2 + · · · + n_k

We consider combinations of r objects containing r_1 objects of type 1, r_2 objects of type 2, · · ·, r_k objects of type k, where

r = r_1 + r_2 + · · · + r_k , r_i ≤ n_i

The number of these combinations is by the fundamental principle of counting:

C(n_1, r_1) · C(n_2, r_2) · · · C(n_k, r_k)

The total number of combinations with r elements is C(n, r).

2

Many combinatorial problems can be reduced to the following form. For a group of n

circuits, p of them are busy and (n −p) of them are idle. A group of k circuits is chosen at

random. We seek the number of combinations which contain exactly x busy circuits. Here x

can be any integer between zero and p or k, whichever is the smaller.

The chosen group contains x busy and k − x idle circuits. Since any choice of busy circuits may be combined with any choice of idle ones, the busy ones can be chosen in C(p, x) different ways and the idle ones in C(n − p, k − x) different ways. Thus the total number of combinations containing x busy circuits is

C(p, x) · C(n − p, k − x)    (1.15)

The total number of combinations containing k circuits (idle or busy) is C(n, k). So the relative number of "favourable" combinations is

q_k = C(p, x) · C(n − p, k − x) / C(n, k)    (1.16)

This can be rewritten in the form

q_k = C(k, x) · C(n − k, p − x) / C(n, p)    (1.17)

In terms of probabilities this is called the hypergeometric distribution.

Example 4.4

For the special case x = k we get from (1.16)

q_k = C(p, k) · C(n − p, 0) / C(n, k) = C(p, k) / C(n, k)    (1.18)

or from (1.17)

q_k = C(k, k) · C(n − k, p − k) / C(n, p) = C(n − k, p − k) / C(n, p)    (1.19)

These expressions are useful when deriving Palm-Jacobæus’ formula and Erlang’s

interconnection formula (Erlang’s ideal grading) in teletraﬃc theory.

2
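Formulas (1.16) and (1.17) can be spot-checked numerically; the sketch below uses n = 10, p = 7, k = 4 purely as convenient test values:

```python
from math import comb

def q(n: int, p: int, k: int, x: int) -> float:
    """Relative number of k-circuit groups containing exactly x busy
    circuits, out of n circuits of which p are busy (eq. 1.16)."""
    return comb(p, x) * comb(n - p, k - x) / comb(n, k)

n, p, k = 10, 7, 4
for x in range(min(p, k) + 1):
    # (1.17) gives the same value for every x (math.comb returns 0
    # when the lower index exceeds the upper, matching eq. 1.21).
    alt = comb(k, x) * comb(n - k, p - x) / comb(n, p)
    assert abs(q(n, p, k, x) - alt) < 1e-12

# Summed over all x the relative numbers give 1: in terms of
# probabilities this is the hypergeometric distribution.
assert abs(sum(q(n, p, k, x) for x in range(min(p, k) + 1)) - 1) < 1e-12
```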

Useful Relations And Results

• Equalities:

C(n, r) = C(n, n − r)    (1.20)

C(n, r) = 0, for r > n and for r < 0    (1.21)

C(n, 0) = 1    (1.22)


• Recurrence formula (Pascal's triangle):

C(n, r − 1) + C(n, r) = C(n + 1, r)    (1.23)

Σ_{r=0}^{n} C(n, r) = 2^n    (1.24)

Σ_{i=r}^{n} C(i, r) = C(n + 1, r + 1)    (1.25)

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
· · · · · · · · ·

Fig. 1.2 Pascal's triangle

• Summary:

Permutation:
  n! / (n − r)!    if repetitions are not allowed
  n^r              if repetitions are allowed

Combination:
  n! / (r! (n − r)!)            if repetitions are not allowed
  (n + r − 1)! / (r! (n − 1)!)  if repetitions are allowed
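The four counting formulas in the summary correspond directly to functions in Python's math module (with the repetition cases written out explicitly); n = 5, r = 3 below are arbitrary test values:

```python
from math import comb, factorial, perm

n, r = 5, 3

# Permutations without repetition: n! / (n - r)!
assert perm(n, r) == factorial(n) // factorial(n - r) == 60
# Permutations with repetition: n^r
assert n**r == 125
# Combinations without repetition: n! / (r! (n - r)!)
assert comb(n, r) == 10
# Combinations with repetition: (n + r - 1)! / (r! (n - 1)!)
assert comb(n + r - 1, r) == 35
```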

1.5 Exercises

1. Evaluate 69! by using Stirling's approximation to n! (use logarithms to the base 10). Compare the result with the value given in Example 2.1.

2. In how many ways can 4 calls occupy 10 diﬀerent circuits?

3. How many 6-digit telephone numbers can be formed with the digits 0,1,2,· · ·, 9 (0 is

not allowed in the ﬁrst digit) if

(a) repetitions are allowed?

(b) repetitions are not allowed?

(c) the last digit must be 0 and repetitions are not allowed?


4. We consider a 4-digit binary number. Every digit may be ”0” or ”1”. How many

diﬀerent numbers have two ”0” (and two ”1”)?

5. Prove that (1.16) and (1.17) are equal.

6. Prove formula (1.23).

7. A trunk group contains 10 circuits. 7 circuits are busy. What is the number of

combinations which contain X(X = 0, 1, 2, 3, 4) busy circuits?

8. The number of combinations of n digits taken x (x = 0, 1, 2, · · ·, n) at a time, where every digit has one of k different values 0, 1, 2, · · ·, k − 1, can be shown to be

C(n + k, n)

Verify this for n = 2 and k = 10.

Updated: 2001.01.10


Chapter 2

Elements of probability theory

Probability theory deals with the study of events whose occurrence cannot be predicted in advance. These kinds of events are termed random events. For example, when throwing a single die, the result may be one of the six numbers 1, 2, 3, 4, 5, 6. We cannot predict the result. So the outcome of throwing a die is a random event. When observing the number of telephone calls arriving at a telephone exchange during a certain time interval, we are of course unable to predict the actual number of arriving calls. This is also a random event.

Probability theory is usually discussed in terms of experiments and possible outcomes of the experiments. Set theory plays an important role in the study of probability theory.

2.1 Set Theory

A set is a collection of objects called elements of the set. In general we shall denote a set by a capital letter such as A, B, C, etc., and an element by a lower case letter such as a, b, c, etc. If an element c belongs to a set A, we write c ∈ A. If c does not belong to A, we write c ∉ A. If both a and c belong to A, we write a, c ∈ A. A set is well-defined if we are able to determine whether a particular element does or does not belong to the set.

A set can be deﬁned by listing its elements. If the set A consists of the elements a, b, c, then

we write

A = {a, b, c}

A set can also be defined by describing some property held by all elements of the set and by no elements outside it. We call a set a finite (or infinite) one if it contains a finite (or infinite) number of elements.


Example 1.1

A = {abc, acd, abd, bcd}

= {x | x is a combination of the letters a, b, c

and d taken three at a time }

2

Example 1.2

B = {x | x is the number of telephone call attempts

between 9 a.m. and 10 a.m. }

2

If every element of a set A also belongs to a set B, then we call A a subset of B, written:

A ⊆ B ("A is contained in B") or

B ⊇ A ("B contains A")

For all sets we have A ⊆ A. If both A ⊆ B and B ⊆ A, then A and B are said to be equal, and we write A = B. In this case A and B have exactly the same elements.

If A and B do not have the same elements, we write A ≠ B.

If A ⊆ B, but A ≠ B, then we call A a proper subset of B, denoted by A ⊂ B.

Example 1.3 (cf. Example 1.1)

The set consisting of the combinations of the letters a, b, c, and d taken three at a time is

a proper subset of the set consisting of the permutations of the same four letters taken three

at a time.

2

Example 1.4 (cf. Example 1.2)

The set consisting of successful call attempts between 9 a.m. and 10 a.m. is a subset of all

call attempts during the same period.


2

The following theorem is true for any sets A, B, C:

if A ⊆ B and B ⊆ C then A ⊆ C

All sets considered are in general assumed to be subsets of some ﬁxed set called the universe

or the universal set, and denoted by U. It is also useful to deﬁne a set having no elements

at all. This is called the null set and is denoted by ∅.

2.1.1 Venn Diagram

A universe U can be shown graphically by the set of points inside a rectangle (ﬁg. 2.1).

Subsets of U (such as A and B shown in ﬁg. 2.1) can be represented by sets of points inside

circles. Such a diagram is called a Venn Diagram. It often serves to provide geometric

intuition regarding possible relationships between sets.

Fig. 2.1 A Venn Diagram. U is the universe. A and B are subsets.

2.1.2 Operators

In set theory we deﬁne a number of operators. We assign symbols to them, just as

arithmetic operators for addition, subtraction, multiplication and division have symbols like

+, −, ×,÷. The set operators are:


Union (symbol ∪).

The union of A and B, denoted by A ∪ B, is the set of elements which belong to either A or B or both A and B (fig. 2.2).

Intersection (symbol ∩).

The intersection of A and B, denoted by A∩ B, is the set of elements which belong to

both A and B (ﬁg. 2.3).

If A∩B = ∅, then the two sets are called disjoint or mutually exclusive. They have no

elements in common.

Diﬀerence (symbol \).

The diﬀerence of A and B, denoted by A\ B, is the set consisting of all elements of A

which do not belong to B (ﬁg. 2.4).

Complement (symbol C).

The absolute complement or, simply, the complement of A, denoted by CA, is the set of elements which do not belong to A.

The complement of A relative to B, denoted by C_B A, is the set of elements in B which do not belong to A (i.e. B \ A) (fig. 2.5).
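Python's built-in set type implements these operators directly; a small sketch, where the sets U, A, B are arbitrary illustrations and the complement is taken relative to U:

```python
U = set(range(10))      # a small universe (illustrative)
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

assert A | B == {1, 2, 3, 4, 5, 6}      # union A ∪ B
assert A & B == {3, 4}                  # intersection A ∩ B
assert A - B == {1, 2}                  # difference A \ B
assert U - A == {0, 5, 6, 7, 8, 9}      # complement CA relative to U
assert B - A == {5, 6}                  # complement of A relative to B
```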

Fig. 2.2 The union A ∪ B.    Fig. 2.3 The intersection A ∩ B.

Fig. 2.4 The difference A \ B.    Fig. 2.5 The complement CA.


2.1.3 Set Theorems

Set operations are similar to those of Boolean algebra as seen from the following theorems.

1. Idempotent laws:

A ∪ A = A    A ∩ A = A

2. Commutative laws:

A ∪ B = B ∪ A    A ∩ B = B ∩ A

3. Associative laws:

A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C

A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C

4. Distributive laws:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

5. Identity laws:

A ∪ ∅ = A    A ∩ ∅ = ∅

A ∪ U = U    A ∩ U = A

6. De Morgan's laws:

C(A ∪ B) = CA ∩ CB

C(A ∩ B) = CA ∪ CB

7. Complement laws:

A ∪ CA = U    A ∩ CA = ∅

C(CA) = A    CU = ∅    C∅ = U

8. For any sets A and B:

A = (A ∩ B) ∪ (A ∩ CB)
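These theorems are easy to spot-check on concrete sets; in the sketch below U, A, B are arbitrary illustrations and C(X) denotes the complement relative to U:

```python
U = set(range(8))       # illustrative universe
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def C(X):
    """Complement of X relative to the universe U."""
    return U - X

# De Morgan's laws
assert C(A | B) == C(A) & C(B)
assert C(A & B) == C(A) | C(B)
# Complement laws
assert A | C(A) == U and A & C(A) == set()
assert C(C(A)) == A
# Theorem 8: A = (A ∩ B) ∪ (A ∩ CB)
assert A == (A & B) | (A & C(B))
```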

2.1.4 Principle Of Duality:

Any true result involving sets remains true if we replace unions by intersections, intersections by unions, sets by their complements, and if we reverse the inclusion symbols ⊂ and ⊃.


2.2 Sample Space And Events

Experiments are of great importance in science and engineering. Experiments in which the

outcome will not be the same, even though the conditions are nearly identical, are called

random experiments and are subject to study by probability theory.

The set of all possible outcomes of an experiment is called a sample space, denoted by S. An individual outcome is called a sample point, which is an element of the set S.

Example 2.1

Experiment : Throw a die

Sample space: S = {1, 2, 3, 4, 5, 6}

Example 2.2

Experiment : Observation of the number of busy

lines in a group of n lines.

Sample space: S = {0, 1, 2, · · · , n}

2

Example 2.3

Experiment : Observation of the number of calls

arriving at a telephone station

during a certain time interval.

Sample space: S = {0, 1, 2, · · ·}

2

Example 2.4

Experiment : Make a telephone call between 9 a.m.

and 10 a.m. and observe the time

of the call.

Sample space: S = {t | 9 ≤ t ≤ 10}

2


If the sample space has a finite number of points it is called a finite sample space (Examples 2.1 and 2.2). If it has as many points as there are natural numbers it is called a countably infinite sample space (Example 2.3). In both cases it is called a discrete sample space. If it has as many points as there are points in some interval it is called a non-countable infinite sample space or a continuous sample space (Example 2.4).

An event is a subset A of the sample space, i.e. it is a set of possible outcomes. If the outcome of an experiment belongs to A, we say that the event A has occurred, or that A is a realization of the experiment.

An event which consists of one sample point of S is called a simple event. It can not be

broken down to other events. A compound event is the aggregate of a number of simple

events. A sure event is an event that will deﬁnitely occur. Naturally an impossible event

never occurs.

Example 2.5

For a group of 12 lines the compound event that exactly 2 circuits are busy consists of $\binom{12}{2} = 66$ simple events.

□

Since events are sets, statements concerning events can be translated into the language of

set theory and conversely. We can represent events graphically on a Venn Diagram, and we

also have an algebra of events corresponding to the algebra of sets given in section 2.1. By

using the operators of section 2.1 on events in S we can obtain other events in S. If A and

B are events, we have

A ∪ B : the event "either A or B or both"
A ∩ B : the event "both A and B"
CA : the event "not A"
A \ B : the event "A but not B"
∅ : the event that never occurs
S : the event that surely occurs

If A ∩ B = ∅, that is, the sets corresponding to events A and B are disjoint, then the two events cannot occur simultaneously: they are mutually exclusive.

A set of events is termed exhaustive if their union is the entire sample space S.


Example 2.6

Consider a trunk group having four trunk circuits. The experiment is to observe the state of the trunks: busy or idle. A sample space S for this experiment of observing trunk circuits 1 through 4 may be the set of four-tuples (a₁, a₂, a₃, a₄), where aᵢ is either 1 or 0, indicating that the i'th trunk circuit is busy or idle. Thus the sample point (1, 0, 1, 0) corresponds to the outcome that the first and the third circuits are busy while the second and the fourth circuits are idle. The sample space consists of 2⁴ = 16 sample points.

Let A be the event that at least two circuits are idle, and let B be the event that no more than two circuits are idle. Then A ∪ B is the whole space S, and A ∩ B is the collection of elements of S in which exactly two trunks are idle. If C is the event that exactly one circuit is idle, then A ∩ C = ∅, that is to say, A and C have no sample points in common.

□

2.3 The Concept Of Probability

Probability is a measure between 0 and 1 associated with each simple event, the total of all simple-event probabilities being 1. From a strict mathematical point of view it is difficult to define the concept of probability. We shall use a relative frequency approach (the a posteriori approach).

In early or classical probability theory, all sample spaces were assumed to be finite, and each sample point was considered to occur with equal frequency. The probability P of an event A was defined as the relative frequency with which A occurs:

P(A) = h / n

where h is the number of sample points in A and n is the total number of sample points. This definition is applicable in some cases, as in the following example.

Example 3.1 (ref. Example 2.6)

According to Example 2.6, there are 2⁴ = 16 sample points. A is the event that at least two trunk circuits are idle, B is the event that at most two circuits are idle, and C is the event that exactly one circuit is idle. We use combinatorial analysis to get the probabilities:

h_A = $\binom{4}{2} + \binom{4}{3} + \binom{4}{4}$ = 11  and  P(A) = h_A / n = 11/16 = 0.6875

h_B = $\binom{4}{0} + \binom{4}{1} + \binom{4}{2}$ = 11  and  P(B) = h_B / n = 11/16 = 0.6875

h_{A∪B} = $\sum_{i=0}^{4} \binom{4}{i}$ = 16  and  P(A ∪ B) = 16/16 = 1.

We find that a sure event has the probability 1.

h_{A∩B} = $\binom{4}{2}$ = 6  and  P(A ∩ B) = 6/16 = 0.375

h_C = $\binom{4}{1}$ = 4  and  P(C) = 4/16 = 0.25

h_{A∩C} = 0  and  P(A ∩ C) = 0/16 = 0.

We find that an impossible event has the probability 0.

□
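The counts above can be checked by brute-force enumeration of the 16 equally likely trunk states; a minimal Python sketch (the event definitions follow Example 2.6, the helper names are ours):

```python
from itertools import product

# All 2^4 = 16 equally likely states of 4 trunk circuits (1 = busy, 0 = idle).
states = list(product((0, 1), repeat=4))

def idle(s):
    """Number of idle circuits in a state."""
    return s.count(0)

A = [s for s in states if idle(s) >= 2]   # at least two circuits idle
B = [s for s in states if idle(s) <= 2]   # no more than two circuits idle
C = [s for s in states if idle(s) == 1]   # exactly one circuit idle

n = len(states)
print(len(A) / n)                   # P(A) = 11/16 = 0.6875
print(len(B) / n)                   # P(B) = 11/16 = 0.6875
print(len(set(A) | set(B)) / n)     # P(A ∪ B) = 1.0 (sure event)
print(len(set(A) & set(B)) / n)     # P(A ∩ B) = 6/16 = 0.375
print(len(set(A) & set(C)) / n)     # P(A ∩ C) = 0.0 (impossible event)
```

The enumeration reproduces exactly the binomial-coefficient counts used in the example.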

Let us consider an experiment with sample space S. Let h be the number of times that the event A occurs in n repetitions of the experiment. Then we define the probability of A by

P(A) = lim_{n→∞} h/n    (2.1)

Thus the probability of an event is the proportion of all experiments in which this event

occurs when we make a very large number of experiments. From the deﬁnition we obtain a

number of basic properties:

1. For every event A we have:

0 ≤ P(A) ≤ 1 (2.2)


2. An impossible event has zero probability:

P(∅) = 0 (2.3)

3. A sure event has probability of unity:

P(S) = 1 (2.4)

4. For any number of disjoint events A₁, A₂, · · ·, A_k we have:

P(A₁ ∪ A₂ ∪ · · · ∪ A_k) = $\sum_{i=1}^{k}$ P(Aᵢ)    (2.5)

In particular for two disjoint events:

P(A₁ ∪ A₂) = P(A₁) + P(A₂)    (2.6)

If S is a continuous sample space, then the probability of any particular sample point is zero. We therefore need the concept of a probability density:

P(A) = $\int_A$ p(s) ds    (2.7)

This is similar to the discrete case (2.5), and all laws of probability still apply if we replace

summation by integration.

Example 3.2

In Example 2.4 we have a continuous sample space {9 ≤ t ≤ 10}. The probability of a call at 9:30 sharp is zero. If we assume the call is equally likely to occur anywhere between 9 and 10, then the density function becomes (1 hour)⁻¹ = (60 minutes)⁻¹. The probability of a call between 9:29 and 9:31 then becomes 2/60 = 1/30.

□

Example 3.3

A single die is thrown. The sample space is S = {1, 2, 3, 4, 5, 6}. If we assume the die is fair, then we assign equal probabilities to the sample points:

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

□

In some cases enough is known about the experiment to enumerate all possible outcomes

and to state that these are equally likely. The probability of A is then equal to the ratio of

the number of outcomes in which A is realized to the total number of outcomes.

Combinatorial analysis is useful to ﬁnd the relevant number of outcomes.

Example 3.4 (cf. Chapter 1, Example 4.4)

If all possible combinations are equally likely, then the probability of finding k busy circuits is given by (1.18) or (1.19).

□

We now consider an important theorem in probability theory.

2.4 Addition Rule

If A and B are two events, then

P(A∪ B) = P(A) +P(B) −P(A∩ B) (2.8)

This formula might be used to simplify the evaluations of Example 3.1.

If the events are mutually exclusive (disjoint), then

P(A∪ B) = P(A) + P(B) (2.9)

Generalizations to 3 or more events can be made.

2.5 Conditional Probability

We now investigate a more complicated version of the case in Example 2.6. Given that the first trunk circuit is known to be idle, we are interested in the probability P that at least two other circuits are idle. Clearly "the first circuit is idle" is an event, and "at least two other circuits are idle" is also an event. So P is the probability that one event occurs under the condition that another event has occurred. This kind of probability is naturally called conditional probability.


We consider an experiment. If it is known that an event B has already occurred, then the probability that the event A has also occurred is known as the conditional probability. This is denoted by P(A | B), the conditional probability of A given B, and it is defined by

P(A | B) = P(A ∩ B) / P(B)    (2.10)

i.e. the proportion of the experiments in which B is realized where A is also realized.

Example 3.5

Let A denote the event "a group of 5 circuits (a, b, c, d, e) contains 2 calls only, which occupy adjacent circuits". Let B denote the event "a group of 5 circuits contains 2 calls only, one of which occupies the circuit a".

From combinatorial analysis we know that 2 lines can be busy in $\binom{5}{2} = 10$ different ways. Therefore we get P(B) = 4/10 (ab, ac, ad, or ae occupied), and P(A ∩ B) = 1/10 (ab occupied).

P(A | B) = (1/10) / (4/10) = 1/4

□
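The ten possible pairs of busy circuits can also be listed programmatically; a small sketch along the lines of the example (variable names are ours):

```python
from itertools import combinations

circuits = "abcde"
pairs = list(combinations(circuits, 2))        # the 10 ways 2 calls can occupy 5 circuits
adjacent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}

B = [p for p in pairs if "a" in p]             # one of the 2 calls occupies circuit a
A_and_B = [p for p in B if p in adjacent]      # adjacent pair containing a (only ab)

p_B = len(B) / len(pairs)                      # P(B) = 4/10
p_AB = len(A_and_B) / len(pairs)               # P(A ∩ B) = 1/10
print(p_AB / p_B)                              # P(A | B) = 0.25
```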

2.6 The Multiplication Theorem

From (2.10) we get the following formula:

P(A∩ B) = P(A) · P(B | A) = P(B) · P(A | B) (2.11)

If the events are independent we get

P(A ∩ B) = P(A) · P(B)    (2.12)

A and B are said to be (statistically) independent events when the probability of event A does not depend on whether B has occurred or not:

P(B | A) = P(B)  or  P(A | B) = P(A)


Example 3.6

If the probability of getting B-busy (called party busy) at a random point of time is 0.10,

then the probability of getting B-busy in two call-attempts at two diﬀerent days is

0.10 · 0.10 = 0.01.

□

2.7 Bayes’ Rule

Let A₁, A₂, · · ·, A_k be mutually exclusive events whose union is the sample space S (exhaustive). Then for any event A we have

P(Aᵢ | A) = P(Aᵢ) · P(A | Aᵢ) / $\sum_{j=1}^{k}$ P(Aⱼ) · P(A | Aⱼ)    (2.13)

From this we can find the probabilities of the events A₁, A₂, · · ·, A_k which can cause A to occur. In this case we can thus obtain P(A | B) from P(B | A), which in general is not possible.

Example 3.7

A and B are two persons in two different places calling a third person C. It has been found that A talks with C on the average 9 times during the same time as B talks with C 10 times. On the average, out of 100 call attempts, A succeeds 80 times and B 70 times. During this trial, C's telephone is ringing, but it is unknown which person is calling successfully. What is the probability that C's incoming call is from B?

Let T_a and T_b be the events that the call is from A and B respectively. Considering the ratio of the average number of calls from A to the average number of calls from B, we have P(T_a) = 0.9 · P(T_b). Let D be the event that a call reaches C. According to the data, we have P(D | T_a) = 0.8 and P(D | T_b) = 0.7. Using Bayes' formula:

P(T_b | D) = P(T_b) · P(D | T_b) / (P(T_a) · P(D | T_a) + P(T_b) · P(D | T_b))
           = 0.7 · P(T_b) / (0.9 · P(T_b) · 0.8 + 0.7 · P(T_b))
           = 0.493

□
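Since only the ratio P(T_a) : P(T_b) = 0.9 : 1 matters, the computation can be sketched with unnormalized prior weights; `bayes_posterior` is a hypothetical helper of ours, not from the text:

```python
def bayes_posterior(priors, likelihoods):
    """Posterior probabilities from (possibly unnormalized) prior weights
    and likelihoods P(D | hypothesis), via Bayes' rule (2.13)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Weights 0.9 : 1 encode P(T_a) = 0.9 * P(T_b); likelihoods are
# P(D | T_a) = 0.8 and P(D | T_b) = 0.7 as in Example 3.7.
post_a, post_b = bayes_posterior([0.9, 1.0], [0.8, 0.7])
print(round(post_b, 3))   # 0.493
```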


2.8 Exercises

1. Telephones returned to a workshop for repair are subject to three kinds of defects, namely A, B and C. A sample of 1000 pieces was inspected with the following results:

400 had type A defect (and possibly other defects).
500 had type B defect (and possibly other defects).
300 had type C defect (and possibly other defects).
60 had both type A and B defects (and possibly C).
100 had both type A and C defects (and possibly B).
80 had both type B and C defects (and possibly A).
20 had all type A, B and C defects.

Let A, B, C be subsets of the universe consisting of all 1000 telephones.

(a) Make a Venn Diagram representing the above subsets. Find from this diagram:
(b) The number of telephones which had none of these defects.
(c) The number of telephones which had at least one of these defects.
(d) The number of telephones which were free of type A and B defects.
(e) The number of telephones which had no more than one of these defects.

2. A sample space consists of the following sample points:

U = {E₁, E₂, E₃, E₄, E₅, E₆}

We define the subsets:

A = {E₂, E₃, E₅}
B = {E₃, E₅, E₆}

(a) Find the intersection of A and B.


(b) Find the union of A and B.

(c) Find the diﬀerence of A and B.

(d) Find the complement of A.

(e) Find the complement of A relative to B.

3. A ball is drawn at random from a box containing 6 red balls, 4 white balls and 5 blue

balls. Determine the probability that it is

(a) red

(b) white

(c) blue

(d) not red

(e) red or white

4. A die is thrown twice.

(a) Deﬁne the sample space for each throw

and for the total experiment.

(b) Find the probability of the events:
A = two sixes
B = at least one six

(c) Find the probability of not getting

a total of 7 or 11 in the two throws.

5. Let the sample space in exercise 2 correspond to the throw of a die. We assign equal

probability to the sample points. Find the probability P(B | A).

6. Determine the probability of three sixes in five throws of a fair die.

7. A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random

without replacement, determine the probability that:

(a) all 3 are red

(b) all 3 are white

(c) 2 are red and 1 is blue

(d) at least 1 is white

(e) 1 of each color is drawn

(f) balls are drawn in the order of red, white, blue.

8. Suppose six dice are thrown simultaneously. What is the probability of getting

(a) all faces alike

(b) no two faces alike

(c) only ﬁve diﬀerent faces


9. Try to generalize formula (2.8) to 3 events and to prove it.

Updated: 2001.01.10

Chapter 3

Elements of mathematical statistics

3.1 Stochastic Variables

Let us assign a real number to each point of a sample space, i.e. each sample point has a single real value. This defines a function on the sample space, called a stochastic function, and the result of a given experiment which generates sample points is called a stochastic variable (a random variable). Actually this variable is a stochastic function defined on the sample space.

A stochastic variable which is defined on a discrete sample space is called a discrete stochastic variable, and a stochastic variable which is defined on a continuous sample space and takes on an uncountably infinite number of values is called a continuous stochastic variable.

A stochastic variable is in general denoted by a capital letter (e.g. X, Y ) whereas the

possible values are denoted by lower case letters (e.g. x, y).

Example 1.1

Teletraﬃc measurements: a traﬃc recorder scans a group of circuits at regular intervals.

The number of circuits found busy is a discrete stochastic variable. The observed call

holding time is a continuous stochastic variable.

□

A stochastic variable is characterized by its (cumulative) distribution function F(x):

F(x) = P{X ≤ x},  −∞ < x < ∞    (3.1)

Here "X ≤ x" is a shorthand notation for the event corresponding to the set of all points s in the sample space S for which X(s) ≤ x. F(x) is a non-decreasing function of x, and F(−∞) = 0, F(∞) = 1.

3.2 Discrete Probability Distributions

Let X be a discrete stochastic variable which can take the values x₁, x₂, · · ·, x_n (a finite number or countably many values). If these values are assumed with probabilities given by

P{X = x_k} = f(x_k)    (3.2)

then we introduce the (probability) density function (frequency function) denoted by

P{X = x} = f(x)    (3.3)

For x = x_k this reduces to (3.2), while for other values of x we have f(x) = 0.

In general a function f(x) is a density function if

f(x) ≥ 0    (3.4)

and

$\sum_x$ f(x) = 1    (3.5)

where the sum is to be taken over all possible values of x. The distribution function is obtained from the density function by noting that

F(x) = P{X ≤ x} = $\sum_{u \le x}$ f(u)    (3.6)

Example 1.2 (ref. Example 2.6 of Chapter 2)

Let us define a discrete random variable X that counts the number of busy circuits in a trunk group of 4 trunks. X takes only the values 1, 2, 3 and 4. Let the probabilities be given by

p(1) = 0.40, p(2) = 0.35, p(3) = 0.15, p(4) = 0.10

Then we get the distribution function F(x) of the discrete random variable X as follows:

F(0) = 0
F(1) = p(1) = 0.40
F(2) = p(1) + p(2) = 0.75
F(3) = p(1) + p(2) + p(3) = 0.90
F(4) = p(1) + p(2) + p(3) + p(4) = 1.00

Thus for example F(3.4) = 0.90. (ref. Fig. 3.1)

[Step plot of the distribution function: y-axis "probability" from 0.0 to 1.0, x-axis from 0 to 5, with steps at 0.40, 0.75, 0.90 and 1.00]

Fig. 3.1 Distribution function of Example 1.2
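The step function F(x) of Example 1.2 can be tabulated directly from the density; a minimal sketch:

```python
# Density of Example 1.2: number of busy circuits in a 4-trunk group.
p = {1: 0.40, 2: 0.35, 3: 0.15, 4: 0.10}

def F(x):
    """Cumulative distribution function F(x) = P{X <= x}."""
    return sum(prob for value, prob in p.items() if value <= x)

print(F(0))             # 0 (no probability mass at or below 0)
print(round(F(2), 2))   # 0.75
print(round(F(3.4), 2)) # 0.9 (F is constant between the jump points)
print(round(F(4), 2))   # 1.0
```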

3.3 Continuous Probability Distributions

When X is a continuous stochastic variable, the probability that X takes any one particular

value is in general zero. We noticed, however, in Chapter 2 (2.7) that the probability that

X is in between two diﬀerent values is meaningful. In fact ”a < X ≤ b” is the event

corresponding to the set ]a, b].

The concept of probability density leads us to the introduction of a (probability) density function f(x) where

f(x) ≥ 0    (3.7)

$\int_{-\infty}^{\infty}$ f(u) du = 1    (3.8)

We denote the probability that X lies between a and b by

P{a < X ≤ b} = $\int_a^b$ f(u) du    (3.9)


Any function satisfying (3.7) and (3.8) will be a density function. By analogy to (3.6) we define the distribution function F(x) for a continuous stochastic variable by

F(x) = P{X ≤ x} = P{−∞ < X ≤ x} = $\int_{-\infty}^{x}$ f(u) du    (3.10)

In general, we have

f(x) = F′(x)    (3.11)

P{x < X ≤ x + dx} ≈ f(x) dx    (3.12)

is called the probability element of the distribution and expresses the probability that X belongs to the interval ]x, x + dx].

Example 3.1

A random variable X is called exponentially distributed with parameter λ (λ > 0) if X has a density function f(x) defined as follows:

f(x) = 0 for x < 0,  f(x) = λ · e^{−λx} for x ≥ 0

We get the distribution function F(x) by

F(x) = $\int_{-\infty}^{x}$ f(t) dt = 0 for x < 0,  F(x) = 1 − e^{−λx} for x ≥ 0

[Plot of F(x) rising from 0.0 towards 1.0 for x from 0.0 to 3.0]

Fig. 3.2 Distribution function for Example 3.1 with parameter λ = 3.
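The closed-form F(x) can be checked against a numerical integral of the density f(x); a sketch with λ = 3 as in Fig. 3.2:

```python
import math

lam = 3.0  # the parameter λ of Fig. 3.2

def f(x):
    """Exponential density: 0 for x < 0, λe^{-λx} for x >= 0."""
    return 0.0 if x < 0 else lam * math.exp(-lam * x)

def F(x):
    """Exponential distribution function: 1 - e^{-λx} for x >= 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

# F is the integral of f: compare a crude Riemann sum over [0, 1) with the closed form.
dx = 1e-4
riemann = sum(f(i * dx) * dx for i in range(int(1.0 / dx)))
print(abs(riemann - F(1.0)) < 1e-3)   # True
print(round(F(1.0), 4))               # 0.9502
```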


3.4 Joint Distributions

We are often interested in two stochastic variables X and Y at the same time. These may

be the outcomes of 2 experiments, or they may be a pair of ﬁgures emerging from a single

experiment.

(X, Y ) can be regarded as taking values in the product space (S ×T) consisting of all pairs

(s, t) with s ∈ S and t ∈ T.

We first consider the case of two discrete stochastic variables X, Y. The joint (probability) density function is defined by

f(x, y) = P{X = x, Y = y}    (3.13)

where

f(x, y) ≥ 0    (3.14)

$\sum_{u,v}$ f(u, v) = 1    (3.15)

The probability that X = x_j and Y = y_k is given by

f(x_j, y_k) = P{X = x_j, Y = y_k}    (3.16)

The total probability P{X = x_j} is obtained by adding over all possible values y_k:

P{X = x_j} = f₁(x_j) = $\sum_{\nu}$ f(x_j, ν)    (3.17)

This is called the marginal density function of X. The joint distribution function of X and Y is defined by

F(x, y) = P{X ≤ x, Y ≤ y} = $\sum_{u \le x}\sum_{v \le y}$ f(u, v)    (3.18)

The continuous case is easily obtained by analogy by replacing sums with integrals. It is also obvious how the mixed case (discrete-continuous) should be dealt with.

If the events X = x and Y = y are independent for all x and y, then we say that X and Y are independent stochastic variables. In this case

P{X = x, Y = y} = P{X = x} · P{Y = y}    (3.19)

or equivalently

f(x, y) = f₁(x) · f₂(y)    (3.20)

where f₁ and f₂ are the marginal density functions of X and Y.


Generalizations to more than two variables can also be made.

Example 4.1

Consider again two consecutive throws of a die. Let X and Y correspond to the results of the first and second throw. We can easily see that X and Y are independent, and x, y ∈ {1, 2, 3, 4, 5, 6}, each with probability 1/6. Hence the two-dimensional variable (X, Y) takes on the pairs of values (i, j) where i, j ∈ {1, 2, 3, 4, 5, 6}.

Let Z = X + Y, the sum of the random variables X and Y. It is a one-dimensional random variable which takes on the values of the sum i + j : 2, 3, 4, · · ·, 11, 12. Because of the independence of X and Y, we have

P(Z = 2) = P(X = 1, Y = 1) = P(X = 1) · P(Y = 1) = 1/36
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 1/18
· · ·
P(Z = 12) = P(X = 6, Y = 6) = P(X = 6) · P(Y = 6) = 1/36

It is easy to verify that $\sum_{i=2}^{12}$ P(Z = i) = 1.

□
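The distribution of Z = X + Y can be verified by enumerating all 36 equally likely pairs; a minimal sketch using exact fractions:

```python
from itertools import product
from fractions import Fraction

# Joint distribution of two independent fair dice: all 36 pairs equally likely.
pairs = list(product(range(1, 7), repeat=2))

dist = {}
for i, j in pairs:
    z = i + j
    dist[z] = dist.get(z, Fraction(0)) + Fraction(1, 36)

print(dist[2])             # 1/36
print(dist[3])             # 1/18
print(dist[12])            # 1/36
print(sum(dist.values()))  # 1
```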

3.5 Expected Values

For a discrete stochastic variable X taking the possible values x₁, x₂, · · ·, x_n we define the expectation of X, or the mean of X, as follows:

E(X) = $\sum_{j=1}^{n}$ x_j · P{X = x_j} = $\sum_{j=1}^{n}$ x_j · f(x_j)    (3.21)

For the continuous case the expectation of X with density function f(x) is defined in a similar way:

E(X) = $\int_{-\infty}^{\infty}$ x · f(x) dx    (3.22)

Let X be a stochastic variable. Consider a single-valued function g(t); then Y = g(X) is also a stochastic variable, and in analogy with (3.21) and (3.22) we define the expectation of g(X) by:

Discrete case:

E(g(X)) = $\sum_{j=1}^{n}$ g(x_j) · f(x_j)    (3.23)

Continuous case:

E(g(X)) = $\int_{-\infty}^{+\infty}$ g(x) · f(x) dx    (3.24)

E(X) is the mean value of X and is a measure of the location of X.

Example 5.1

Assume that the random variable X can take on two values: x₁ = −1 with probability p₁ = 0.2 and x₂ = 1 with probability p₂ = 0.8. The expectation of X equals

E(X) = 0.2 · (−1) + 0.8 · 1 = 0.6

□

Example 5.2

Let the stochastic variable X take on the values x = 0, 1, 2, · · · and let

P(X = x) = (λ^x / x!) · e^{−λ}

where λ (> 0) is constant. We find the mean value of X:

E(X) = $\sum_{x=0}^{\infty}$ x · (λ^x / x!) · e^{−λ}
     = λ e^{−λ} $\sum_{x=1}^{\infty}$ λ^{x−1} / (x − 1)!
     = λ e^{−λ} $\sum_{x=0}^{\infty}$ λ^x / x!
     = λ e^{−λ} e^{λ} = λ

In this case, we call X Poisson distributed.

□

Example 5.3

A random variable X is called Normal distributed if

f(x) = (1/√(2π)) e^{−x²/2}

We derive the mean value of X:

E(X) = (1/√(2π)) $\int_{-\infty}^{\infty}$ x e^{−x²/2} dx = −(1/√(2π)) $\int_{-\infty}^{\infty}$ (−x) e^{−x²/2} dx
     = −(1/√(2π)) [e^{−x²/2}]_{−∞}^{+∞} = 0

□

Some properties of the expectation are shown below:

1. E(c) = c
2. E(c · X) = c · E(X)
3. E(X₁ + X₂) = E(X₁) + E(X₂)
4. E(X₁ · X₂) = E(X₁) · E(X₂), if X₁, X₂ are independent

Here c is a constant and X, X₁, X₂ are stochastic variables whose expectations exist.

Of particular interest is the expectation of g(X) when g(X) = X^r, where r is a positive integer. Then α_r = E(X^r) is called the r'th moment of X:

Discrete case:

α_r = $\sum_j$ x_j^r · f(x_j)    (3.25)

Continuous case:

α_r = $\int_{-\infty}^{+\infty}$ x^r · f(x) dx    (3.26)

We notice that:

α₁ = E(X),  α₂ = E(X²)

The r'th moment of a stochastic variable X about a is defined by E((X − a)^r). Moments about the mean of X are denoted by μ_r:

Discrete case:

μ_r = $\sum_j$ (x_j − E(X))^r · f(x_j)    (3.27)


Continuous case:

μ_r = $\int_{-\infty}^{+\infty}$ (x − E(X))^r · f(x) dx    (3.28)

Of particular interest is the 2nd moment about the mean. This is called the variance:

Var(X) = E((X − E(X))²)    (3.29)

This is a non-negative number. The square root of the variance is called the standard deviation. We can easily get:

Var(X) = μ₂ = α₂ − α₁²    (3.30)

Some properties of the variance are:

1. Var(c) = 0
2. Var(cX) = c² · Var(X)
3. Var(X₁ + X₂) = Var(X₁) + Var(X₂), if X₁, X₂ are independent

Example 5.4

We calculate the variances

1. in Example 5.1:

α₁ = E(X) = 0.6
α₂ = E(X²) = (−1)² · 0.2 + 1² · 0.8 = 1
μ₂ = Var(X) = α₂ − α₁² = 0.64

2. in Example 5.2:

α₁ = λ
α₂ = E(X²) = $\sum_{x=0}^{\infty}$ x² · (λ^x / x!) · e^{−λ} = e^{−λ}(λ² e^{λ} + λ e^{λ}) = λ² + λ
μ₂ = Var(X) = α₂ − α₁² = λ

3. in Example 5.3:

α₁ = E(X) = 0
α₂ = E(X²) = (1/√(2π)) $\int_{-\infty}^{+\infty}$ x² e^{−x²/2} dx = 1
μ₂ = Var(X) = α₂ − α₁² = 1

□
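Relation (3.30) can be checked numerically for the Poisson case of Example 5.2; a sketch with an arbitrarily chosen λ and a truncated sum:

```python
lam = 2.5    # an arbitrary value of λ for the check
N = 100      # truncation point; the Poisson tail beyond N is negligible for this λ

# Poisson probabilities p(x) = λ^x / x! · e^{−λ}, built up iteratively
# to avoid huge factorials.
import math
p = [math.exp(-lam)]
for x in range(1, N):
    p.append(p[-1] * lam / x)

alpha1 = sum(x * px for x, px in enumerate(p))        # first moment E(X)
alpha2 = sum(x * x * px for x, px in enumerate(p))    # second moment E(X^2)
var = alpha2 - alpha1 ** 2                            # variance via (3.30)

print(round(alpha1, 6))   # 2.5  -> E(X) = λ
print(round(var, 6))      # 2.5  -> Var(X) = λ
```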

For a discrete stochastic variable assuming non-negative integer values, the Binomial moments are very useful in teletraffic theory. The r'th Binomial moment is defined by

β_r = $\sum_{i=r}^{\infty} \binom{i}{r}$ p(i)    (3.31)

The results given above can be extended to two or more variables having joint density functions, e.g. f(x, y):

E(X) = $\sum_u \sum_v$ u · f(u, v)    (3.32)

An interesting quantity arising in the case of two variables is the covariance, defined by

Cov(X, Y) = E((X − E(X))(Y − E(Y)))    (3.33)

If X and Y are independent, then Cov(X, Y) = 0. On the other hand, if X and Y are identical (X = Y) then

Cov(X, Y) = (Var(X) · Var(Y))^{1/2} = Var(X)    (3.34)

Thus we are led to a measure of the dependence of the variables X and Y given by

ρ = Cov(X, Y) / (Var(X) · Var(Y))^{1/2}    (3.35)

This is a dimensionless quantity called the correlation coeﬃcient or coeﬃcient of correlation.
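For a finite set of equally likely outcomes, (3.33) and (3.35) can be computed directly; a small sketch using the two independent dice of Example 4.1 (the helper names are ours):

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Cov(X, Y) = E((X - E(X))(Y - E(Y))) over equally likely outcomes."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    """Correlation coefficient ρ = Cov(X, Y) / sqrt(Var(X) · Var(Y))."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

# All 36 equally likely outcomes of two independent dice throws.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
X = [i for i, j in outcomes]
Y = [j for i, j in outcomes]

print(round(cov(X, Y), 10))    # 0.0 (independent => zero covariance)
print(round(corr(X, X), 10))   # 1.0 (identical variables: ρ = 1)
```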

3.6 Some Discrete Distributions

We shall now describe some important distributions of the discrete type.


3.6.1 Binomial Distribution

Later on we will realize that many important discrete variables arise from the concept of a Bernoulli sequence of experiments. A Bernoulli experiment is a trial with only two possible outcomes. We normally call the two outcomes "success" and "failure", with respective probabilities p and 1 − p. A sequence of such experiments is called a Bernoulli sequence if all the experiments have the same probability of "success" or "failure".

We now consider the Binomial distribution. This distribution is given by (ref. Table 3.1)

P{X = x} = $\binom{n}{x}$ · p^x (1 − p)^{n−x},  x = 0, 1, 2, · · ·, n    (3.36)

We find

E(X) = n · p    (3.37)

Var(X) = n p (1 − p)    (3.38)

This distribution applies if one makes n independent Bernoulli experiments in which the probability of "success" in each experiment is p. P{X = x} is the probability of there being exactly x "successes". This probability is derived by combinatorial analysis.

Example 6.1

This model is usable when we make n test-calls and observe how many of them are unsuccessful.

□
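Formula (3.36) and the moments (3.37)–(3.38) can be checked directly; a minimal sketch with illustrative values of n and p:

```python
from math import comb

def binom_pmf(x, n, p):
    """P{X = x} for the Binomial distribution (3.36)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.2   # e.g. 10 test-calls, each unsuccessful with probability 0.2
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum(x * x * pmf[x] for x in range(n + 1)) - mean**2

print(round(sum(pmf), 10))   # 1.0  (a valid density)
print(round(mean, 10))       # 2.0  = n·p
print(round(var, 10))        # 1.6  = n·p·(1−p)
```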

3.6.2 Poisson Distribution

This distribution is given by (ref. Table 3.1)

P{X = x} = (λ^x / x!) e^{−λ},  x = 0, 1, 2, · · ·    (3.39)

We find (cf. Examples 5.2 & 5.4):

E(X) = λ    (3.40)

Var(X) = λ    (3.41)

This distribution is obtained as the limit of the Binomial distribution when we increase n and at the same time reduce p, keeping n · p constant and equal to λ.
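This limit can be illustrated numerically: holding n·p = λ fixed, the Binomial probabilities approach the Poisson ones as n grows. A sketch (λ chosen arbitrarily):

```python
from math import comb, exp, factorial

lam = 4.0

def poisson(x):
    """Poisson probability (3.39)."""
    return lam**x / factorial(x) * exp(-lam)

def binomial(x, n):
    """Binomial probability (3.36) with n·p held equal to λ."""
    p = lam / n
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The largest pointwise difference shrinks as n grows.
for n in (10, 100, 1000):
    max_diff = max(abs(binomial(x, n) - poisson(x)) for x in range(10))
    print(n, round(max_diff, 5))
```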


The Poisson distribution, however, is not only an approximation of the Binomial distribution; it is a distribution in its own right, as we shall see in teletraffic theory.

Example 6.2

If we examine a large number of subscribers, each of which has a small probability of being busy, then the number found busy will follow a Poisson distribution.

□

Example 6.3

The number of calls incoming to an exchange during one hour will also follow a Poisson distribution.

□

3.6.3 Other Discrete Distributions

The random variable which in a Bernoulli sequence counts the number of trials to get the first success is called a geometric random variable. It is described by the Geometric distribution. In some cases we don't include the trial for success, so that the values assumed are k = 0, 1, · · ·. The geometric distribution is shown in Table 3.1; notice that this distribution includes the success (k = 1, 2, · · ·). By adding k geometric distributions we get the Negative Binomial distribution (Pascal distribution), which is also shown in Table 3.1. In Chapter 1 we indicated the Hypergeometric distribution (formula (1.17)). From Table 3.1 we notice the close relationship between the Binomial, the Geometric and the Negative Binomial distributions.

3.7 Some Continuous Distributions

3.7.1 Normal Distribution

This is a continuous distribution with the density function

f(t) = (1/(σ√(2π))) exp(−½ ((t − µ)/σ)²),  −∞ < t < +∞    (3.42)

with (Examples 5.3 & 5.4):

E(T) = µ    (3.43)

Var(T) = σ²    (3.44)

One usually writes T = N(µ, σ), which means "T is normally distributed with mean value µ and standard deviation σ".

The standard Normal distribution has a mean of 0 and variance of 1, and forms the basis for tables of the Normal distribution. The properties of other Normal distributions are obtained from these tables by working in terms of the quantity (t − µ)/σ.

3.7.2 Exponential Distribution

This distribution is called the negative exponential distribution in teletraffic theory. The density and distribution functions are

f(t) = λ e^{−λt},  t ≥ 0, λ > 0    (3.45)

and

F(t) = 1 − e^{−λt},  t ≥ 0, λ > 0    (3.46)

respectively. We have:

E(T) = 1/λ    (3.47)

Var(T) = 1/λ²    (3.48)

This is one of the most important distributions in teletraffic theory. The well-known Markov or "memoryless" property is inherent in this distribution, as we have:

P{X > t + h | X > t} = P{X > h}

The stochastic variable forgets its age t.
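The memoryless property follows from P{X > t} = e^{−λt}, and can be checked numerically; a small sketch (the values of λ, t and h are arbitrary):

```python
import math

lam = 0.5

def tail(t):
    """P{X > t} = 1 - F(t) = e^{-λt} for the exponential distribution."""
    return math.exp(-lam * t)

t, h = 3.0, 1.5
lhs = tail(t + h) / tail(t)    # P{X > t+h | X > t} = P{X > t+h} / P{X > t}
rhs = tail(h)                  # P{X > h}
print(abs(lhs - rhs) < 1e-12)  # True: the variable "forgets" its age t
```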

3.7.3 Erlang-k Distribution

By adding k exponentially distributed stochastic variables we get a new stochastic variable which is Erlang-k distributed (Table 3.1). By allowing k to be non-integral this can be generalized to the Gamma distribution:

f(t) = ((λt)^{k−1} / Γ(k)) λ e^{−λt}    (3.49)

By replacing λ by (kλ) the mean value becomes independent of k:

f(t) = (λk / (k − 1)!) (λkt)^{k−1} · e^{−λkt},  λ > 0    (3.50)

For integral k we have the distribution function:

F(t) = 1 − e^{−λkt} · $\sum_{j=0}^{k-1}$ (λkt)^j / j!    (3.51)

E(T) = 1/λ,  Var(T) = 1/(kλ²)    (3.52)

When k = 1, the Erlang-k distribution is identical with the exponential distribution. When k → ∞, since Var(T) → 0, the random variable becomes constant.

From Table 3.1 we notice the close relationship between the Exponential, the Poisson and

the Erlang-k distributions. We also notice the relationship to the discrete cases.

Example 7.1

The holding time of a control device has been found to be Erlang-5 distributed with an average value of 500 milliseconds. What is the probability that the holding time does not exceed 750 milliseconds?

Let X be the random variable; we have E(X) = 500 ms, so that λ = 1/500 (ms)⁻¹. The probability that X does not exceed 750 milliseconds is given by F_X(750):

F_X(750) = 1 − e^{−λkx} $\sum_{j=0}^{k-1}$ (λkx)^j / j! = 1 − e^{−7.5} $\sum_{j=0}^{4}$ (7.5)^j / j! = 0.8679

□
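The number 0.8679 can be reproduced from (3.51); a minimal sketch:

```python
from math import exp, factorial

def erlang_cdf(t, k, lam):
    """F(t) = 1 - e^{-λkt} · Σ_{j=0}^{k-1} (λkt)^j / j!  (3.51), mean 1/λ."""
    a = lam * k * t
    return 1.0 - exp(-a) * sum(a**j / factorial(j) for j in range(k))

# Erlang-5 holding time with mean 500 ms, evaluated at 750 ms (λkt = 7.5).
print(round(erlang_cdf(750, 5, 1 / 500), 4))   # 0.8679
```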


BINOMIAL PROCESS (discrete time; probability of event = α, 0 < α < 1) — POISSON PROCESS (continuous time; rate (intensity) of event = λ, λ > 0)

Time interval between two events (= time interval from a random point of time to the next event):
  Binomial process: GEOMETRIC DISTRIBUTION, p(n) = α · (1 − α)^{n−1}, n = 1, 2, 3, · · ·; E = 1/α, V = (1 − α)/α²
  Poisson process: EXPONENTIAL DISTRIBUTION, f(t) = λ · e^{−λt}, t ≥ 0; E = 1/λ, V = 1/λ²

Time interval until the occurrence of event number k:
  Binomial process: NEGATIVE BINOMIAL (PASCAL) DISTRIBUTION, p(n | k) = $\binom{n-1}{k-1}$ · α^k · (1 − α)^{n−k}, n = k, k + 1, · · ·; E = k/α, V = k(1 − α)/α²
  Poisson process: ERLANG-k DISTRIBUTION, f(t | k) = ((λt)^{k−1}/(k − 1)!) · λ · e^{−λt}, t ≥ 0; E = k/λ, V = k/λ²

Number of events in a fixed time interval:
  Binomial process: BINOMIAL DISTRIBUTION, p(x | n) = $\binom{n}{x}$ · α^x · (1 − α)^{n−x}, x = 0, 1, 2, · · ·, n; E = α · n, V = α · n · (1 − α)
  Poisson process: POISSON DISTRIBUTION, f(x | t) = ((λt)^x / x!) · e^{−λt}, t ≥ 0, x = 0, 1, 2, · · ·; E = λ · t, V = λ · t

Table 3.1 Correspondence between the distributions of the Binomial process and the Poisson process. E = mean value, V = variance.

3.8 Exercises

1. The number of calls arriving on a group of devices in a telephone system was recorded on a counter. The counter was read off every 3 minutes. The following values xᵢ were obtained during a period from 8 a.m. to 10 a.m.:

06 08 08 07 07 06 09 07 06 03

09 07 11 07 12 09 13 08 09 05

08 15 09 09 19 16 10 11 11 15

17 12 16 14 15 14 09 10 14 14

(a) Make a diagram of the frequency of the different numbers of arriving calls.
(b) Make a table of the (empirical) density function, i.e. of the relative frequency of the different numbers of arriving calls.
(c) Make a table of the (empirical) distribution function.
(d) Calculate the mean value x̄ of the number of arriving calls for this measurement (1) from (b), (2) from the formula x̄ = (1/40) $\sum_{i=1}^{40}$ xᵢ
(e) Calculate the variance of the number of arriving calls for this measurement (1) from (b), (2) from the formula Var = (1/n) $\sum_{i=1}^{n}$ (xᵢ − x̄)²

(In practice we divide the sum by n − 1 instead of n when we calculate the variance of "observations", because this gives a better result. We use n only in theoretical analysis.)

2. Prove (3.30).

3. Show that if X and Y are independent, then Cov(X, Y ) = 0.
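The "divide by n versus n − 1" point in the note to Exercise 1 can be seen directly on that exercise's own data:

```python
import statistics

# The 40 counter readings x_i from Exercise 1, and the two variance
# conventions from the note: dividing by n versus by n - 1.
x = [6, 8, 8, 7, 7, 6, 9, 7, 6, 3,
     9, 7, 11, 7, 12, 9, 13, 8, 9, 5,
     8, 15, 9, 9, 19, 16, 10, 11, 11, 15,
     17, 12, 16, 14, 15, 14, 9, 10, 14, 14]

mean = sum(x) / len(x)                                     # x_bar
var_n = sum((v - mean) ** 2 for v in x) / len(x)           # divide by n
var_n1 = sum((v - mean) ** 2 for v in x) / (len(x) - 1)    # divide by n - 1

print(mean, var_n, var_n1)
# statistics.pvariance / statistics.variance implement exactly these two:
assert abs(statistics.pvariance(x) - var_n) < 1e-9
assert abs(statistics.variance(x) - var_n1) < 1e-9
```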

Updated: 2001.01.10

Chapter 4

Theory of sampling

4.1 Sampling

We are often interested in drawing conclusions about a large set of objects, which we shall
call a population. The population size N can be finite or infinite. Instead of examining the
entire population (which is often impossible in practice) we observe only a sample of size n,
which is a subset of the population. The process of obtaining samples is called sampling.
The purpose is to obtain some knowledge about the population from results found in the
sample.

Sampling where each element of a population may be chosen more than once is called
sampling with replacement. In sampling without replacement each element cannot be
chosen more than once.

A population is characterized by a stochastic variable X, which is defined by a distribution
function F(x) having population parameters such as the mean value µ, the variance σ², etc.
If we know F(x), then we have full information about the population. However, in
real-world problems one often has little or no knowledge about the distribution underlying
samples. Finding such knowledge, when little or nothing is known of the underlying
distribution, is the main topic of this chapter. In general we shall only try to get some
knowledge (estimates) of some population parameters by sampling.

4.2 Sampling Statistics

Random samples taken from the population may be used to obtain estimates of the
population parameters. An important problem in sampling theory is to decide how to

form the sample statistics which will best estimate a given population parameter.

Let us pick n members from the population at random:

    observations : x_1, x_2, · · · , x_n
    sample size  : n

Then we calculate the following sample statistics:

    sample mean     : x̄ = (1/n) · Σ_{i=1}^{n} x_i                    (4.1)

    sample variance : s² = (1/n) · Σ_{i=1}^{n} x_i² − x̄²             (4.2)

These statistics are functions of stochastic variables and are therefore stochastic variables
themselves.

The unknown population mean and variance are estimated by the following unbiased
estimators:

    µ  = E{x̄}                                                        (4.3)

    σ² = E{ŝ²} = E{ (n/(n − 1)) · s² }                                (4.4)

(these important results are proven in mathematical statistics)

We now want to know how accurate these results are.
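A minimal sketch of the sample statistics (4.1)–(4.4); the observation values below are made up for illustration:

```python
# A sketch of the sample statistics (4.1)-(4.4). The observations are made up
# for illustration; any list of numbers works.
x = [4.2, 5.1, 6.3, 5.0, 4.8, 5.6, 5.9, 4.5]
n = len(x)

x_bar = sum(x) / n                               # sample mean, (4.1)
s2 = sum(v * v for v in x) / n - x_bar ** 2      # sample variance, (4.2)
s2_hat = n / (n - 1) * s2                        # unbiased estimator of sigma^2, (4.4)

# (4.2) is algebraically the usual mean of squared deviations:
s2_alt = sum((v - x_bar) ** 2 for v in x) / n
print(x_bar, s2, s2_hat)
```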

4.3 The Central Limit Theorem

This is a fundamental theorem from mathematical statistics:

If a sample of size n is taken from a population with finite mean µ and finite variance σ²
(and otherwise any statistical distribution), then as n increases the distribution of the
sample mean x̄ is asymptotically Normal distributed (cf. section 3.7) with mean value µ and
variance σ²/n. Or equivalently, the distribution of

    Z = (x̄ − µ) / (σ/√n)                                             (4.5)

tends towards the standard Normal distribution N(0, 1) as n increases.
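The theorem is easy to see in a small simulation sketch; the uniform population, sample size and repetition count below are arbitrary choices:

```python
import random
import statistics

# A simulation sketch of the Central Limit Theorem: sample means of a
# decidedly non-Normal population (uniform on [0, 1], with mu = 0.5 and
# sigma^2 = 1/12) pile up around mu with standard deviation close to
# sigma/sqrt(n).
random.seed(42)          # fixed seed so the run is reproducible
n = 100                  # sample size
reps = 2000              # number of independent samples

means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))   # close to mu = 0.5
print(statistics.stdev(means))   # close to sigma/sqrt(n) = 0.0289
```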


Example 4.1
Consider a sequence of Bernoulli random variables X_1, X_2, · · · , X_n that are independent,
each with ”success” probability p. We then have E(X_i) = p and Var(X_i) = p(1 − p). By the
central limit theorem, we get:

    (x̄ − p) / (√(p(1 − p))/√n) → N(0, 1)    (n → ∞)

or

    (Σ_{i=1}^{n} X_i − np) / √(np(1 − p)) → N(0, 1)    (n → ∞)

Since we know from section 6 of Chapter 3 that S_n = Σ_{i=1}^{n} X_i is binomially
distributed, the above expression shows that for large n an approximation for binomial
probabilities can be obtained by using the Normal probabilities of N(np, np(1 − p)).

2
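The approximation of Example 4.1 can be tried out numerically; the values n = 100, p = 0.5, k = 55 and the continuity correction of 1/2 are our own choices for the sketch:

```python
import math

# Sketch of the Normal approximation N(np, np(1-p)) to the Binomial
# distribution, with the usual continuity correction of 1/2. The numbers
# n = 100, p = 0.5, k = 55 are chosen for illustration.
n, p, k = 100, 0.5, 55

# Exact: P(S_n <= k) summed from the Binomial probabilities.
exact = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# Approximate: Phi((k + 0.5 - n*p)/sqrt(n*p*(1-p))), Phi via the error function.
z = (k + 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(exact, 4), round(approx, 4))   # the two values agree closely
```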

4.4 Sampling Distribution

A sample statistic, which is calculated from a sample, is a function of random variables and
is therefore itself a random variable. The probability distribution of a sample statistic is
called the sampling distribution of the statistic. We shall only consider two sampling
distributions for the sample mean.

4.4.1 Population mean µ and variance σ² are known

Suppose that the population from which samples are taken has a probability distribution
with mean value µ and variance σ² (not necessarily a Normal distribution). Then it can be
shown that the sampling distribution of x̄ is asymptotically Normal distributed N(µ, σ²/n),
i.e.:

    Z = (x̄ − µ) / (σ/√n) → N(0, 1)    for n → ∞                      (4.6)

This is a consequence of the Central Limit Theorem in section 4.3.


If we choose a so-called confidence level 1 − α, then we can expect to find x̄ lying between
the confidence limits

    µ ± z_{1−α/2} · σ/√n                                              (4.7)

(1 − α) · 100% of the time.

The interval (µ − z_{1−α/2} · σ/√n , µ + z_{1−α/2} · σ/√n) is called the confidence
interval. z_{1−α/2} is obtained from the standard Normal distribution:

    P{−∞ < Z ≤ z_{1−α/2}} = 1 − α/2                                   (4.8)

Example 4.2
For some values of the significance level α we have the following values of z:

    α      z_{1−α/2}
    10%    1.6449
    5%     1.9600
    1%     2.5758
2

Thus (section 3.7) 2.5% of the probability (area under the density function) is above
z = 1.9600, and (because of symmetry) 2.5% is below z = −1.9600.
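The tabulated z-values of Example 4.2 can be recomputed from the standard Normal distribution in the standard library:

```python
from statistics import NormalDist

# The z-values of Example 4.2 computed from the standard Normal N(0, 1):
# z_{1-alpha/2} is the (1 - alpha/2)-quantile.
for alpha in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f}: z = {z:.4f}")
# prints 1.6449, 1.9600 and 2.5758, the values tabulated above
```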

4.4.2 Population mean µ and variance σ² are both unknown

In most practical applications we do not know the population parameters µ and/or σ².
Then we estimate these parameters by the sample mean x̄ and the sample variance ŝ²,
respectively. It can be shown that the sample mean has a so-called (Student)
t-distribution. The confidence interval becomes:

    x̄ ± t_{1−α/2, n−1} · ŝ/√n                                        (4.9)

where the t-value is obtained from a table of the t-distribution, which has an additional
parameter: degrees of freedom = n − 1. For increasing n this distribution is asymptotically
Normal distributed:

    lim_{n→∞} t_{1−α/2, n−1} = z_{1−α/2}                              (4.10)

The t-value yields a larger confidence interval than the z-value (we have less information
because we don't know the population mean and variance), but for large values of n and for
most practical purposes we often use the z-value.

Example 4.3
For α = 5% we had z_{97.5%} = 1.96. From the t-distribution we get:

    n     t_{97.5%}
    1     12.71
    2     4.30
    5     2.57
    10    2.23
    20    2.09
    50    2.01

We notice that for increasing n the t-value tends to 1.96.

2

For a given confidence level we have a relation between the confidence limits (confidence
interval) and the sample size. If we want to reduce the confidence interval by a factor c,
then we must increase the sample size by a factor c².

Example 4.4
The average holding time of calls during a certain period in a telephone system is to be
estimated. Based on a random sample of 100 holding times of calls, the sample mean and
sample variance are calculated as x̄ = 5.74 time units and ŝ² = 2.65 square time units.
Find a 95% confidence interval for the true average holding time of calls in that period.

Let µ denote the true average holding time of calls. The confidence interval for µ, based on
formula (4.9), is

    ( x̄ − ŝ/√n · t_{1−α/2, n−1} ,  x̄ + ŝ/√n · t_{1−α/2, n−1} )

where n = 100, 1 − α = 0.95, x̄ = 5.74, ŝ = √2.65. We have t_{1−α/2, n−1} = 1.984.
Therefore the confidence interval for µ is (5.4170, 6.0630).

2
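Example 4.4 can be recomputed as follows; the t-value 1.984 is taken from the text, since the standard library has no t-distribution (for n = 100 the Normal z-value is already close, as section 4.4.2 notes):

```python
import math
from statistics import NormalDist

# Recomputing Example 4.4. The t-value 1.984 (99 degrees of freedom) is taken
# from the text; the standard library has no t-distribution.
n, x_bar, s2_hat = 100, 5.74, 2.65
t = 1.984

half_width = t * math.sqrt(s2_hat) / math.sqrt(n)
lo, hi = x_bar - half_width, x_bar + half_width
print(round(lo, 4), round(hi, 4))    # matches the interval (5.4170, 6.0630)

z = NormalDist().inv_cdf(0.975)      # the z-based interval is slightly narrower
print(round(x_bar - z * math.sqrt(s2_hat / n), 4))
```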

4.5 Exercises

1. A bottle is supposed to contain 250 ml of wine, with a standard deviation of 3 ml. If
   we sample 200 such bottles at random, find the probability that the average amount of
   wine contained in a bottle will be

   (a) At most 248 ml.
   (b) At least 252 ml.
   (c) Between 249 and 251 ml.

2. If X is a Poisson random variable with mean 81, find the approximate probability
   P(X ≥ 75).

3. Let X_1, X_2, · · · , X_n (n large enough to justify applying the Central Limit Theorem)
   be independent random variables, each Poisson with mean λ. Find an (approximate)
   1 − α confidence interval for λ (α = 5%).

4. Suppose that it is observed that the average lifetime of a certain kind of machine
   part is 5 years, with a standard deviation of 1.2 years. From a sample of 100 such
   parts we obtain x̄ = 4.75. Construct a confidence interval for µ with confidence level

   (a) 99% (b) 95% (c) 90% (d) 80%

   Does the length of the intervals increase or decrease as the confidence level decreases?

5. A certain kind of instrument is labeled ”1.5 kg weight”. A random sample of 50
   instruments is measured to check the standard weight. We calculate x̄ = 1.47 kg,
   ŝ² = 0.09 kg². Construct a confidence interval for µ with confidence level

   (a) 80% (b) 95% (c) 99%.



Chapter 1

Combinatorial analysis

In some cases the number of possible outcomes for a particular event is not very large, and so direct counting of possible outcomes is not diﬃcult. However, problems often arise where direct counting becomes a practical impossibility. In these cases use is made of combinatorial analysis, which could be called a sophisticated way of counting.

1.1 Fundamental Principles Of Counting: Tree Diagram

We ﬁrst introduce two rules which are employed in many proofs through the combinatorial analysis:

Rule of Sum: If object A may be chosen in m ways, and object B in n other ways, ”either A or B” may be chosen in m + n ways.

Rule of Product: If object A may be chosen in m ways, and thereafter object B in n ways, ”both A and B” may be chosen in this order in m · n ways (multiplication principle).

It should be noticed that in the Rule of Sum, the choices of A and B are mutually exclusive; that is, one cannot choose both A and B, but either A or B.

[Figure 1.1: Tree diagram for Example 1.1, where k = 2, l = 3 and m = 2.]

The Rule of Product is often used in cases where the order of choosing is immaterial, that is, where the choices are independent. But in many practical situations the possibility of dependence should not be ignored.

If one thing can be accomplished in n_1 different ways, and if after this a second thing can be accomplished in n_2 different ways, · · ·, and finally a k'th thing can be accomplished in n_k different ways, then all k things (which are assumed to be independent of each other) can be accomplished in the specified order in n = n_1 · n_2 · · · n_k different ways.

A diagram, called a tree diagram because of its appearance (Fig. 1.1), is often used in connection with these rules.

Example 1.1 (see Fig. 1.1)
Let the setting-up procedure of a call involve the following devices:

    k local circuits : L_1, L_2, · · · , L_k

    l registers      : R_1, R_2, · · · , R_l
    m trunk circuits : T_1, T_2, · · · , T_m

Under the assumption of independence a call can then be set up in n = k · l · m different ways. For a group of 1000 subscribers typical figures are k = 80, l = 15, m = 50, i.e. n = 60,000.

If a malfunction occurs only for a specific combination of devices, it can be very difficult to trace the fault, as it appears in only one out of 60,000 calls (assuming random hunting). 2
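The multiplication principle of Example 1.1 can be made concrete by enumerating all device combinations (the tree diagram made literal):

```python
from itertools import product

# The multiplication principle of Example 1.1: every combination of one local
# circuit, one register and one trunk circuit is a distinct way of setting up
# a call.
k, l, m = 80, 15, 50
print(k * l * m)   # 60000

# The same count by explicit enumeration:
setups = list(product(range(k), range(l), range(m)))
assert len(setups) == 60000
```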

1.2 Factorial Function

Factorial n (denoted by n!, n integer) is defined as

    n! = n · (n − 1) · (n − 2) · · · 2 · 1                            (1.1)

It is convenient to define

    0! = 1                                                            (1.2)

Example 2.1
For many calculators the upper range of numbers is 9.999 · 10^99. Thus the factorial function only ”exists” for n < 70:

    69! = 1.7112 · 10^98 ,    70! = 1.1979 · 10^100
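The values of Example 2.1 can be checked directly, and at the same time we can try Stirling's approximation n! ≈ √(2πn) · nⁿ · e⁻ⁿ discussed next (Exercise 1 of this chapter asks for exactly this):

```python
import math

# Checking Example 2.1, and previewing Stirling's approximation
# n! ~ sqrt(2*pi*n) * n**n * e**(-n).
f69 = math.factorial(69)
print(f"69! = {f69:.4e}")                 # 1.7112e+98, still below 9.999e+99

stirling = math.sqrt(2 * math.pi * 69) * 69.0**69 * math.exp(-69)
print(stirling / f69)                     # ratio close to 1 (Stirling is slightly low)
```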

If n is large a direct evaluation of n! is impractical. In such cases Stirling's approximation is often applied:

    n! ≈ √(2πn) · n^n · e^(−n)                                        (1.3)

where e is the base of natural logarithms (e ≈ 2.718281828). The symbol ≈ means that the ratio of the left side to the right side approaches 1 as n → ∞. For this reason we often call the right side an asymptotic expansion of the left side. 2

The gamma function, denoted by Γ(n), is defined for any real value of n > 0:

    Γ(n) = ∫₀^∞ t^(n−1) · e^(−t) dt ,    n > 0                        (1.4)

A recurrence formula is

    Γ(n + 1) = n · Γ(n)

We can easily find Γ(1) = 1. If n is a positive integer, then we get from (1.4):

    Γ(n + 1) = n!                                                     (1.5)

1.3 Permutations

The basic definition of a permutation is given below:

An r-permutation of n different objects is an ordered selection or arrangement of r (r ≤ n) of the objects. Actually it is sampling without replacement.

Suppose that we are given n distinguishable objects and wish to arrange r (≤ n) of these objects on a line. There are n ways of choosing the 1st object and, after this is done, n − 1 ways of choosing the 2nd object, · · ·, and finally (n − r + 1) ways of choosing the r'th object. By repeated application of the rule of product it follows from the fundamental principle of counting that the number of different arrangements, or permutations as they are called, is given by

    P(n, r) = n · (n − 1) · · · (n − r + 1)                           (1.6)

We call P(n, r) the number of permutations of n objects taken r at a time.

For the particular case where r = n, (1.6) becomes

    P(n, n) = n!                                                      (1.7)

We can write (1.6) in terms of factorials as

    P(n, r) = n!/(n − r)!                                             (1.8)

If r = n we see that (1.7) and (1.8) agree only if 0! = 1.

Example 3.1
We consider the case where some objects are identical. The number of permutations of n objects consisting of groups of which n_1 are identical, n_2 are identical, · · ·, and n_k are identical, where n = n_1 + n_2 + · · · + n_k, is

    n!/(n_1! · n_2! · · · n_k!)

The term on the right-hand side is called the polynomial (multinomial) coefficient. 2

Example 3.2
Let us consider a group of n circuits. We can look (hunt) for idle circuits in P(n, n) = n! different ways. 2

1.4 Combinations

In a permutation we are interested in the order of arrangement of the objects. Thus (a b c) is a different permutation from (b c a). In many problems, however, we are only interested in selecting objects without regard to the order. Such selections are called combinations. For example (a b c) and (b c a) are the same combination.

The total number of combinations of r objects selected from n (also called the combinations of n items taken r at a time) is denoted by C(n, r). We have

    C(n, r) = n!/(r!(n − r)!)                                         (1.9)

It can also be written

    C(n, r) = P(n, r)/r!                                              (1.10)

We can easily derive this expression by noticing that each combination of r different objects may be ordered in r! ways, and so ordered it is an r-permutation. Thus we have

    r! · C(n, r) = P(n, r) = n(n − 1) · · · (n − r + 1) ,    n ≥ r

By moving r! to the other side we get the expression (1.10). It is easy to see that

    C(n, r) = C(n, n − r)                                             (1.11)

The numbers C(n, r) are often called binomial coefficients because they arise in the binomial expansion:

    (x + y)^n = Σ_{r=0}^{n} C(n, r) · x^r · y^(n−r)                   (1.12)

They can be generalized in several ways. Thus we define:

    C(−n, r) = (−1)^r · C(n + r − 1, r)                               (1.13)

This appears in the Negative Binomial Distribution.
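The permutation and combination formulas above can be spot-checked with the standard-library counting functions; n = 10, r = 4 is an arbitrary choice:

```python
import math

# Spot checks of the permutation and combination formulas (1.6)-(1.12).
n, r = 10, 4

assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)    # (1.8)
assert math.comb(n, r) == math.perm(n, r) // math.factorial(r)          # (1.10)
assert math.comb(n, r) == math.comb(n, n - r)                           # (1.11)
assert sum(math.comb(n, k) for k in range(n + 1)) == 2**n               # (1.12), x = y = 1
print(math.perm(n, r), math.comb(n, r))   # 5040 210
```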

Example 4.1
The combinations of the letters a, b, c and d taken 3 at a time are abc, abd, acd, bcd. 2

Example 4.2
To form all the permutations of 4 letters taken 3 at a time it is necessary to take each combination of Example 4.1 and write out all possible permutations of the given combination:

    Combination     Permutations
    abc             abc, acb, bac, bca, cab, cba
    abd             abd, adb, bad, bda, dab, dba
    acd             acd, adc, cad, cda, dac, dca
    bcd             bcd, bdc, cbd, cdb, dbc, dcb
2

Example 4.3
If repetitions are allowed, the selections of a, b, c and d taken 3 at a time also include combinations such as aaa, aab, aac, aad, abb, · · ·, ddd, each with its own (shorter) list of permutations, e.g. aab, aba, baa for the combination aab. 2

The number of combinations of n objects taken 1, 2, · · ·, or n at a time is

    C(n, 1) + C(n, 2) + · · · + C(n, n) = 2^n − 1                     (1.14)

This is readily seen from (1.12) by letting x = y = 1, i.e.

    Σ_{r=0}^{n} C(n, r) = 2^n

Let us consider a set of n objects consisting of n_1 different objects of type 1, n_2 different objects of type 2, · · ·, where n = n_1 + n_2 + · · · + n_k. We consider combinations of r objects containing r_1 objects of type 1, r_2 objects of type 2, · · ·, r_k objects of type k, where r = r_1 + r_2 + · · · + r_k and r_i ≤ n_i. The number of these combinations is, by the fundamental principle of counting,

    C(n_1, r_1) · C(n_2, r_2) · · · C(n_k, r_k)

The total number of combinations with r elements is C(n, r).

Many combinatorial problems can be reduced to the following form. For a group of n circuits, p of them are busy and (n − p) of them are idle. A group of k circuits is chosen at random; the chosen group contains x busy and k − x idle circuits. We seek the number of combinations which contain exactly x busy circuits. Here x can be any integer between zero and p or k, whichever is the smaller. The busy circuits can be chosen in C(p, x) different ways and the idle ones in C(n − p, k − x) different ways. Since any choice of busy circuits may be combined with any choice of idle ones, the total number of combinations containing x busy circuits is

    C(p, x) · C(n − p, k − x)                                         (1.15)

The total number of combinations containing k circuits (idle or busy) is C(n, k). So the relative number of ”favourable” combinations is

    q_k = C(p, x) · C(n − p, k − x) / C(n, k)                         (1.16)

In terms of probabilities this is called the hypergeometric distribution.

Example 4.4
For the special case x = k we get from (1.16)

    q_k = C(p, k) · C(n − p, 0) / C(n, k) = C(n − k, p − k) / C(n, p) (1.17)

These expressions are useful when deriving Palm-Jacobæus' formula and Erlang's interconnection formula (Erlang's ideal grading) in teletraffic theory. 2

Useful Relations And Results

• Equalities:

    C(n, r) = C(n, n − r)
    C(n, r) = 0    for r > n and for r < 0
    C(n, 0) = 1
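The ”favourable combinations” ratio (1.16) can be computed directly; the numbers n = 10, p = 7, k = 4 below are a hypothetical trunk group (the same figures as Exercise 5 of this chapter):

```python
import math

# The ratio (1.16) as a distribution over x, for a hypothetical trunk group:
# n = 10 circuits, p = 7 busy, k = 4 chosen at random.
n, p, k = 10, 7, 4

def q(x):
    return math.comb(p, x) * math.comb(n - p, k - x) / math.comb(n, k)

probs = [q(x) for x in range(k + 1)]
print([round(v, 4) for v in probs])        # [0.0, 0.0333, 0.3, 0.5, 0.1667]
assert abs(sum(probs) - 1.0) < 1e-12       # hypergeometric probabilities sum to 1
```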

• Recurrence formula (Pascal's triangle):

    C(n, r − 1) + C(n, r) = C(n + 1, r)                               (1.23)

    Σ_{r=0}^{n} C(n, r) = 2^n                                         (1.24)

    Σ_{i=r}^{n} C(i, r) = C(n + 1, r + 1)                             (1.25)

            1
           1 1
          1 2 1
         1 3 3 1
        1 4 6 4 1
        · · · · ·

    Fig. 1.2 Pascal's triangle

• Summary:

    Permutation, repetitions not allowed : n!/(n − r)!
    Permutation, repetitions allowed     : n^r
    Combination, repetitions not allowed : n!/(r!(n − r)!)
    Combination, repetitions allowed     : (n + r − 1)!/(r!(n − 1)!)

1.5 Exercises

1. Evaluate 69! by using Stirling's approximation to n! (use logarithms to the base 10). Compare the result with the value given in Example 2.1.

2. In how many ways can 4 calls occupy 10 different circuits?

3. How many 6-digit telephone numbers can be formed with the digits 0, 1, 2, · · ·, 9 (0 is not allowed in the first digit) if
   (a) repetitions are allowed?
   (b) repetitions are not allowed?
   (c) the last digit must be 0 and repetitions are not allowed?

4. We consider a 4-digit binary number. Every digit may be ”0” or ”1”. How many different numbers have two ”0” (and two ”1”)?

5. A trunk group contains 10 circuits, of which 7 circuits are busy. A group of 4 circuits is chosen at random. What is the number of combinations which contain X (X = 0, 1, 2, 3, 4) busy circuits?

6. Prove that (1.16) and (1.17) are equal.

7. Prove formula (1.23).

8. The number of combinations of n digits taken x (x = 0, 1, · · ·, n) at a time, where every digit has one of k different values 0, 1, 2, · · ·, k − 1, can be shown to be C(n + k, n). Verify this for n = 2 and k = 10.


Chapter 2

Elements of probability theory

Probability theory deals with the study of events whose occurrence cannot be predicted in advance. For example, when we throw a single die, the result may be one of the six numbers 1, 2, 3, 4, 5, 6. We cannot predict the result, so the outcome of throwing a die is a random event. When observing the number of telephone calls arriving at a telephone exchange during a certain time interval, we are of course unable to predict the actual number of arriving calls. This is also a random event. These kinds of events are termed random events. Probability theory is usually discussed in terms of experiments and possible outcomes of the experiments.

2.1 Set Theory

Set theory plays an important role in the study of probability theory. A set is a collection of objects called elements of the set. A set is well-defined if we are able to determine whether a particular element does or does not belong to the set. We call a set finite (or infinite) if it contains a finite (or infinite) number of elements. In general we shall denote a set by a capital letter such as A, B, C, etc., and an element by a lower case letter such as a, b, c, etc. If an element c belongs to a set A, we write c ∈ A. If c does not belong to A, we write c ∉ A. If both a and c belong to A, we write a, c ∈ A.

A set can be defined by listing its elements. If the set A consists of the elements a, b, c, then we write

    A = {a, b, c}

A set can also be defined by describing some properties held by all elements and by non-elements.

Example 1.1
The set consisting of the combinations of the letters a, b, c and d taken three at a time is

    A = {abc, abd, acd, bcd}
      = {x | x is a combination of the letters a, b, c and d taken three at a time} 2

Example 1.2
    B = {x | x is the number of telephone call attempts between 9 a.m. and 10 a.m.} 2

If any element of a set A does belong to a set B, then we call A a subset of B, written

    A ⊆ B (”A is contained in B”)    or    B ⊇ A (”B contains A”)

For all sets we have A ⊆ A. If A ⊆ B, but A ≠ B, then we call A a proper subset of B, denoted by A ⊂ B. If both A ⊆ B and B ⊆ A, then A and B are said to be equal, and we write A = B. In this case A and B have exactly the same elements. If A and B do not have the same elements, we write A ≠ B.

Example 1.3 (cf. Example 1.1)
The set consisting of the combinations of the letters a, b, c and d taken three at a time is a proper subset of the set consisting of the permutations of the same four letters taken three at a time. 2

Example 1.4 (cf. Example 1.2)
The set consisting of successful call attempts between 9 a.m. and 10 a.m. is a subset of all call attempts during the same period. 2

The following theorem is true for any sets A, B, C:

    if A ⊆ B and B ⊆ C then A ⊆ C

All sets considered are in general assumed to be subsets of some fixed set called the universe or the universal set, denoted by U. It is also useful to define a set having no elements at all. This is called the null set and is denoted by ∅.

2.1.1 Venn Diagram

A universe U can be shown graphically by the set of points inside a rectangle (Fig. 2.1). Subsets of U (such as A and B shown in Fig. 2.1) can be represented by sets of points inside circles. Such a diagram is called a Venn Diagram. It often serves to provide geometric intuition regarding possible relationships between sets.

[Fig. 2.1 A Venn Diagram. U is the universe; A and B are subsets.]

2.1.2 Operators

In set theory we define a number of operators, just as arithmetic operators for addition, subtraction, multiplication and division have symbols like +, −, ×, ÷. The set operators are:

Union (symbol ∪). The union of A and B, denoted by A ∪ B, is the set of elements which belong to either A or B or both A and B (Fig. 2.2).

Intersection (symbol ∩). The intersection of A and B, denoted by A ∩ B, is the set of elements which belong to both A and B (Fig. 2.3). If A ∩ B = ∅, the two sets are called disjoint or mutually exclusive; they have no elements in common.

Difference (symbol \). The difference of A and B, denoted by A \ B, is the set consisting of all elements of A which do not belong to B (Fig. 2.4).

Complement (symbol C). The absolute complement or, simply, the complement of A, denoted by CA, is the set of elements which do not belong to A (Fig. 2.5). The complement of A relative to B, denoted by C_B A, is the set of elements in B which do not belong to A (i.e. B \ A).

[Fig. 2.2 The union A ∪ B.   Fig. 2.3 The intersection A ∩ B.   Fig. 2.4 The difference A \ B.   Fig. 2.5 The complement CA.]

2.1.3 Set Theorems

Set operations are similar to those of Boolean algebra, as seen from the following theorems.

1. Idempotent laws:
       A ∪ A = A                          A ∩ A = A
2. Commutative laws:
       A ∪ B = B ∪ A                      A ∩ B = B ∩ A
3. Associative laws:
       A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C
       A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C
4. Distributive laws:
       A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
       A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
5. Identity laws:
       A ∪ ∅ = A                          A ∩ ∅ = ∅
       A ∪ U = U                          A ∩ U = A
6. De Morgan's laws:
       C(A ∪ B) = CA ∩ CB
       C(A ∩ B) = CA ∪ CB
7. Complement laws:
       A ∪ CA = U                         A ∩ CA = ∅
       C(CA) = A                          CU = ∅ ,  C∅ = U
8. For any sets A and B:
       A = (A ∩ B) ∪ (A ∩ CB)

2.1.4 Principle Of Duality

Any true result involving sets is also true if we replace unions by intersections, intersections by unions, sets by their complements, and if we reverse the inclusion symbols ⊂ and ⊃.
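A few of these theorems can be checked with Python's built-in set type; the universe U below is just a small example set chosen for the sketch:

```python
# Checking some set theorems of section 2.1.3 with Python's built-in set type.
# The universe U is a small example set chosen here.
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
D = {5, 6, 7}

def comp(S):
    """Complement CS relative to the universe U."""
    return U - S

assert A | (B & D) == (A | B) & (A | D)     # distributive law
assert comp(A | B) == comp(A) & comp(B)     # De Morgan's law
assert comp(A & B) == comp(A) | comp(B)     # De Morgan's law
assert A == (A & B) | (A & comp(B))         # "For any sets A and B"
print("all set identities hold")
```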

2.2 Sample Space And Events

Experiments are of great importance in science and engineering. Experiments in which the outcome will not be the same, even though the conditions are nearly identical, are called random experiments and are subject to study by probability theory.

A set or a list of all possible outcomes of an experiment is called a sample space, denoted by S. The individual outcome is called a sample point, which is an element of the set S.

Example 2.1
Experiment: Throw a die.
Sample space: S = {1, 2, 3, 4, 5, 6} 2

Example 2.2
Experiment: Observation of the number of busy lines in a group of n lines.
Sample space: S = {0, 1, 2, · · · , n} 2

Example 2.3
Experiment: Observation of the number of calls arriving at a telephone station during a certain time interval.
Sample space: S = {0, 1, 2, · · ·} 2

Example 2.4
Experiment: Make a telephone call between 9 a.m. and 10 a.m. and observe the time of the call.
Sample space: S = {t | 9 ≤ t ≤ 10} 2

If the sample space has a finite number of points it is called a finite sample space (Examples 2.1 and 2.2). If it has as many points as there are natural numbers it is called a countably infinite sample space (Example 2.3). In both cases it is called a discrete sample space. If it has as many points as there are points in some interval it is called a non-countable infinite sample space or a continuous sample space (Example 2.4).

An event is a subset A of the sample space, i.e. it is a set of possible outcomes. If the outcome of an experiment is in A we say that the event A has occurred, or that A is a realization of the experiment. An event which consists of one sample point of S is called a simple event; it cannot be broken down into other events. A compound event is the aggregate of a number of simple events. A sure event is an event that will definitely occur; naturally, an impossible event never occurs.

Example 2.5
For a group of 12 lines the compound event that exactly 2 circuits are busy consists of C(12, 2) = 66 simple events. 2

We can represent events graphically on a Venn Diagram. Since events are sets, statements concerning events can be translated into the language of set theory and conversely, and we also have an algebra of events corresponding to the algebra of sets given in section 2.1. By using the operators of section 2.1 on events in S we can obtain other events in S. If A and B are events, we have

    A ∪ B : the event ”either A or B or both”
    A ∩ B : the event ”both A and B”
    CA    : the event ”not A”
    A \ B : the event ”A but not B”
    ∅     : the event ”never occurs”
    S     : the event ”surely occurs”

If A ∩ B = ∅, then both events cannot occur simultaneously; they are mutually exclusive, that is, the sets corresponding to events A and B are disjoint. A set of events is termed exhaustive if their union is the entire sample space S.

Example 2.6 Consider a trunk group having four trunk circuits. The experiment is to observe the state of the trunks: busy or idle. A sample space S for this experiment of observing trunk circuits 1 through 4 may be the set of four-tuples (a1, a2, a3, a4), where ai is either 1 or 0, indicating that the i'th trunk circuit is busy or idle. The sample space consists of 2^4 = 16 sample points; thus the sample point (1, 0, 1, 0) corresponds to the outcome that the first and the third circuits are busy while the second and the fourth circuits are idle.

Let A be the event that at least two circuits are idle, and let B be the event that at most two circuits are idle. Then A ∪ B is the whole space S, and A ∩ B is the collection of elements of S in which exactly two trunks are idle. If C is the event that exactly one circuit is idle, then A ∩ C = ∅; that is to say, A and C have no sample points in common.

2.3 The Concept Of Probability

From a strict mathematical point of view it is difficult to define the concept of probability. Probability is a positive measure between 0 and 1 associated with each simple event, the total of all simple event probabilities being 1. In early, or classical, probability theory all sample spaces were assumed to be finite, and each sample point was considered to occur with equal frequency. The probability P of an event A was then defined by the relative frequency with which A occurs:

P(A) = h/n

where h is the number of sample points in A and n is the total number of sample points. This definition is applicable in some cases, as in the following example; in general we shall use a relative frequency approach (the a posteriori approach).

Example 3.1 (ref. Example 2.6) According to Example 2.6 there are 2^4 = 16 sample points. A is the event that at least two trunk circuits are idle, B is the event that at most two circuits are idle, and C is the event that exactly one circuit is idle.

We use combinatorial analysis to get the probabilities:

hA = C(4,2) + C(4,3) + C(4,4) = 11   and   P(A) = hA/n = 11/16 = 0.6875

hA∪B = Σ_{i=0}^{4} C(4,i) = 16   and   P(A ∪ B) = 16/16 = 1

hA∩B = C(4,2) = 6   and   P(A ∩ B) = 6/16 = 0.375

hC = C(4,1) = 4   and   P(C) = 4/16 = 0.25

hA∩C = 0   and   P(A ∩ C) = 0/16 = 0

hB = C(4,0) + C(4,1) + C(4,2) = 11   and   P(B) = hB/n = 11/16 = 0.6875

Let us now consider an experiment with sample space S. Let h be the number of times that the event A occurs in n repetitions of the experiment. Then we define the probability of A by

P(A) = lim_{n→∞} h/n   (2.1)

Thus the probability of an event is the proportion of all experiments in which this event occurs when we make a very large number of experiments. From the definition we obtain a number of basic properties:

1. For every event A we have:

0 ≤ P(A) ≤ 1   (2.2)
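The counts hA, hB and hA∩B above can be verified by brute-force enumeration of the 16 sample points. A Python sketch (illustrative, not part of the original text):

```python
from itertools import product

# All 2^4 = 16 states of four trunk circuits: 1 = busy, 0 = idle (Example 2.6).
space = list(product((0, 1), repeat=4))
n = len(space)

def idle(state):
    return state.count(0)

A = [s for s in space if idle(s) >= 2]   # at least two circuits idle
B = [s for s in space if idle(s) <= 2]   # at most two circuits idle
C = [s for s in space if idle(s) == 1]   # exactly one circuit idle

print(len(A), len(A) / n)                       # 11 0.6875
print(len([s for s in A if s in B]))            # 6  (A and B: exactly two idle)
print(len(C) / n)                               # 0.25
print([s for s in A if s in C])                 # []  (A and C are disjoint)
```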

2. An impossible event has zero probability:

P(∅) = 0   (2.3)

3. A sure event has probability of unity:

P(S) = 1   (2.4)

4. For any number of disjoint events A1, A2, ..., Ak we have:

P(A1 ∪ A2 ∪ ··· ∪ Ak) = Σ_{i=1}^{k} P(Ai)   (2.5)

In particular, for two disjoint events:

P(A1 ∪ A2) = P(A1) + P(A2)   (2.6)

If S is a continuous sample space, then the probability that X takes any one particular sample point is zero. We therefore need the concept of a probability density:

P(A) = ∫_A p(s) ds   (2.7)

This is similar to the discrete case (2.5), and all laws of probability still apply if we replace summation by integration.

Example 3.2 In Example 2.4 we have the continuous sample space {9 ≤ t ≤ 10}. If we assume the call is equally likely to occur anywhere between 9 and 10, then the density function becomes (1 hour)^(−1) = (60 minutes)^(−1). The probability of a call between 9:29 and 9:31 then becomes 2/60 = 1/30, while the probability of a call at 9:30 sharp is zero.

Example 3.3 A single die is thrown. The sample space is S = {1, 2, 3, 4, 5, 6}. If we assume the die is fair, then we assign equal probabilities to the sample points:

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

2.4 Addition Rule

If A and B are two events, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (2.8)

This formula might be used to simplify the evaluations of Example 3.1. If the events are mutually exclusive (disjoint), then

P(A ∪ B) = P(A) + P(B)   (2.9)

Generalizations to 3 or more events can be made.

In some cases enough is known about the experiment to enumerate all possible outcomes and to state that these are equally likely. The probability of A is then equal to the ratio of the number of outcomes in which A is realized to the total number of outcomes. Combinatorial analysis is useful to find the relevant number of outcomes.

Example 3.4 (cf. Chapter 1, Example 4.4) If all possible combinations are equally likely, then the probability of finding k busy circuits is given by (1.18) or (1.19).

2.5 Conditional Probability

We now investigate a more complicated version of the case in Example 2.6. Given that the first trunk circuit is known to be idle, we are interested in the probability P that at least two other circuits are idle. It is obvious that "the first circuit is idle" is an event, and that "at least two other circuits are idle" is also an event. So P is the probability that one event occurs under the condition that another event has occurred. This kind of probability is naturally called conditional probability.

If it is known that an event B has already occurred, then the probability that the event A has also occurred is known as the conditional probability of A given B. This is denoted by P(A | B), and it is defined by

P(A | B) = P(A ∩ B) / P(B)   (2.10)

i.e. the proportion, among the experiments in which B is realized, of those in which A and B (at least) are realized. If the probability of the event A does not depend on whether B has occurred or not:

P(B | A) = P(B)   or   P(A | B) = P(A)   (2.11)

then A and B are said to be (statistically) independent events.

Example 3.5 Let A denote the event "a group of 5 circuits (a, b, c, d, e) contains 2 calls only, which occupy adjacent circuits", and let B denote the event "a group of 5 circuits contains 2 calls only, one of which occupies the circuit a". From combinatorial analysis we know that 2 lines can be busy in C(5,2) = 10 different ways. Therefore we get P(B) = 4/10 (ab, ac, ad, or ae occupied) and P(A ∩ B) = 1/10 (ab occupied), hence

P(A | B) = (1/10) / (4/10) = 1/4

2.6 The Multiplication Theorem

From (2.10) we get the following formula:

P(A ∩ B) = P(A) · P(B | A) = P(B) · P(A | B)   (2.12)

If the events are independent we get

P(A ∩ B) = P(A) · P(B)   (2.13)
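Example 3.5 can be checked by listing the C(5,2) = 10 equally likely pairs of busy circuits. A Python sketch (illustrative, not part of the original text):

```python
from itertools import combinations

circuits = "abcde"
pairs = list(combinations(circuits, 2))   # the 10 equally likely busy pairs

# Event A: the two calls occupy adjacent circuits; event B: one call occupies circuit a.
A = {p for p in pairs if circuits.index(p[1]) == circuits.index(p[0]) + 1}
B = {p for p in pairs if "a" in p}

P_B  = len(B) / len(pairs)        # 4/10
P_AB = len(A & B) / len(pairs)    # 1/10, only the pair ('a', 'b')
print(P_AB / P_B)                 # P(A | B) = 0.25
```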

2.7 Bayes' Rule

We now consider an important theorem in probability theory. Let A1, A2, ..., Ak be mutually exclusive events whose union is the sample space S (exhaustive). Then for any event A we have

P(Ai | A) = P(Ai) · P(A | Ai) / Σ_{j=1}^{k} P(Aj) · P(A | Aj)   (2.14)

From this we can find the probabilities of the events A1, A2, ..., Ak which can cause A to occur. In this way we can obtain P(A | B) from P(B | A), which in general is not possible.

Example 3.6 If the probability of getting B-busy (called party busy) at a random point of time is 0.10, then the probability of getting B-busy in two call attempts on two different days is 0.10 · 0.10 = 0.01.

Example 3.7 A and B are two persons in two different places calling a third person C. Let D be the event that a call reaches C, and let Ta and Tb be the events that the call is from A and from B respectively. On the average, out of 100 call attempts A succeeds 80 times and B succeeds 70 times, so we have P(D | Ta) = 0.8 and P(D | Tb) = 0.7. It has also been found that, on the average, A calls C 9 times during the same time as B calls C 10 times, so P(Ta) = 0.9 · P(Tb). Now C's telephone is ringing, but it is unknown which person is calling. What is the probability that C's incoming call is from B? Using Bayes' formula:

P(Tb | D) = P(Tb) · P(D | Tb) / (P(Ta) · P(D | Ta) + P(Tb) · P(D | Tb))
          = 0.7 · P(Tb) / (0.9 · P(Tb) · 0.8 + 0.7 · P(Tb)) ≈ 0.493
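The arithmetic of Example 3.7 can be reproduced directly. In the Python sketch below (illustrative, not part of the original text) the unknown P(Tb) is set to 1, since it cancels in Bayes' formula:

```python
p_tb = 1.0                  # P(Tb): cancels out, so any positive value works
p_ta = 0.9 * p_tb           # calls from A and B in the ratio 9 : 10
p_d_given_ta = 0.8          # A succeeds in 80 of 100 attempts
p_d_given_tb = 0.7          # B succeeds in 70 of 100 attempts

p_tb_given_d = (p_tb * p_d_given_tb) / (p_ta * p_d_given_ta + p_tb * p_d_given_tb)
print(round(p_tb_given_d, 3))   # 0.493
```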

2.8 Exercises

1. Telephones returned to a workshop for repair are subject to three kinds of defects, namely A, B and C. A sample of 1000 pieces was inspected with the following results:

400 had type A defect (and possibly other defects);
500 had type B defect (and possibly other defects);
300 had type C defect (and possibly other defects);
60 had both type A and B defects (and possibly C);
100 had both type A and C defects (and possibly B);
80 had both type B and C defects (and possibly A);
20 had all of type A, B and C defects.

Let A, B, C be subsets of the universe consisting of all 1000 telephones.

(a) Make a Venn diagram representing the above subsets. Find from this diagram:
(b) The number of telephones which had none of these defects.
(c) The number of telephones which had at least one of these defects.
(d) The number of telephones which were free of type A and B defects.
(e) The number of telephones which had no more than one of these defects.

2. A sample space consists of the following sample points: U = (E1, E2, E3, E4, E5, E6). We define the subsets A = (E2, E3, E5) and B = (E3, E5, E6).

(a) Find the intersection of A and B.

(b) Find the union of A and B.
(c) Find the difference of A and B.
(d) Find the complement of A.
(e) Find the complement of A relative to B.

3. Let the sample space in exercise 2 correspond to the throw of a die. We assign equal probability to the sample points. Find the probability P(B | A).

4. A ball is drawn at random from a box containing 6 red balls, 4 white balls and 5 blue balls. Determine the probability that it is (a) red, (b) white, (c) blue, (d) not red, (e) red or white.

5. Determine the probability of three sixes in five throws of a fair die.

6. A die is thrown twice.

(a) Define the sample space for each throw and for the total experiment.
(b) Find the probability of the events A = two times six, and A = at least one six.
(c) Find the probability of not getting a total of 7 or 11 in the two throws.

7. A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random without replacement, determine the probability that: (a) all 3 are red, (b) all 3 are white, (c) 2 are red and 1 is blue, (d) at least 1 is white, (e) 1 of each color is drawn, (f) the balls are drawn in the order red, white, blue.

8. Suppose six dice are thrown simultaneously. What is the probability of getting (a) all faces alike, (b) no two faces alike, (c) only five different faces?

9. Try to generalize formula (2.8) to 3 events and to prove it.

Chapter 3 Elements of mathematical statistics

3.1 Stochastic Variables

Let us assign a real number to each point of a sample space, i.e. each sample point gets a single real value. This defines a function on the sample space, called a stochastic function, and the result of a given experiment which generates sample points is called a stochastic variable (a random variable). A stochastic variable is in general denoted by a capital letter (e.g. X, Y), whereas its possible values are denoted by lower-case letters (e.g. x, y).

A stochastic variable which is defined on a discrete sample space is called a discrete stochastic variable, and a stochastic variable which is defined on a continuous sample space, taking on an uncountably infinite number of values, is called a continuous stochastic variable.

Example 1.1 Teletraffic measurements: a traffic recorder scans a group of circuits at regular intervals. The number of circuits found busy is a discrete stochastic variable; the observed call holding time is a continuous stochastic variable. Actually each of these variables is a stochastic function defined on the sample space.

A stochastic variable is characterized by its (cumulative) distribution function F(x):

F(x) = P{X ≤ x},   −∞ < x < ∞   (3.1)

Here "X ≤ x" is a shorthand notation for the event corresponding to the set of all points s in the sample space S for which X(s) ≤ x. F(x) is a never decreasing function of x, with F(−∞) = 0 and F(∞) = 1.

3.2 Discrete Probability Distributions

Let X be a discrete stochastic variable which can take the values x1, x2, ..., xn (a finite number or countably many values). If these values are assumed with probabilities given by

P{X = xk} = f(xk)   (3.2)

then we introduce the (probability) density function (frequency function) denoted by

P{X = x} = f(x)   (3.3)

For x = xk this reduces to (3.2), while for other values of x we have f(x) = 0. In general a function f(x) is a density function if

f(x) ≥ 0   (3.4)

Σ_x f(x) = 1   (3.5)

where the sum is to be taken over all possible values of x. The distribution function is obtained from the density function by noting that

F(x) = P{X ≤ x} = Σ_{u≤x} f(u)   (3.6)

Example 2.2 (ref. Example 2.6 of Chapter 2) Let us define a discrete random variable X that counts the number of busy circuits in a trunk group of 4 trunks, so that X takes only the values 1, 2, 3 and 4. Let the probabilities be given by

p(1) = 0.40,  p(2) = 0.35,  p(3) = 0.15,  p(4) = 0.10

Then we can get the distribution function F(x) of the discrete random variable X as follows:

F(0) = 0
F(1) = p(1) = 0.40
F(2) = p(1) + p(2) = 0.75
F(3) = p(1) + p(2) + p(3) = 0.90
F(4) = p(1) + p(2) + p(3) + p(4) = 1.00

Thus, for example, F(3.4) = 0.90.

Fig. 3.1 Distribution function of Example 2.2 (a step function rising from 0 to 1 with jumps at x = 1, 2, 3, 4).

3.3 Continuous Probability Distributions

When X is a continuous stochastic variable, the probability that X takes any one particular value is in general zero. We noticed, however, in Chapter 2 (2.7) that the probability that X lies between two different values is meaningful. The concept of probability density leads us to the introduction of a (probability) density function f(x) where

f(x) ≥ 0   (3.7)

∫_{−∞}^{∞} f(u) du = 1   (3.8)

We denote the probability that X lies between a and b by

P{a < X ≤ b} = ∫_a^b f(u) du   (3.9)

In fact "a < X ≤ b" is the event corresponding to the set ]a, b].

Any function satisfying (3.7) and (3.8) will be a density function. By analogy to (3.6) we define the distribution function F(x) for a continuous stochastic variable by

F(x) = P{X ≤ x} = P{−∞ < X ≤ x} = ∫_{−∞}^{x} f(u) du   (3.10)

In general we have

f(x) = F′(x)   (3.11)

P{x < X ≤ x + dx} ≈ f(x) dx   (3.12)

(3.12) is called the probability element of the distribution and expresses the probability that X belongs to the interval ]x, x + dx].

Example 3.1 A random variable X is called exponentially distributed with parameter λ (λ > 0) if X has a density function f(x) defined as follows:

f(x) = 0 if x < 0,   λ · e^(−λx) if x ≥ 0

We get the distribution function F(x) by

F(x) = ∫_{−∞}^{x} f(t) dt = 0 if x < 0,   1 − e^(−λx) if x ≥ 0

Fig. 3.2 Distribution function for Example 3.1 with parameter λ = 3.
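The distribution function of Example 3.1 is easy to evaluate numerically. The Python sketch below (illustrative, not part of the original text) also checks the "memoryless" property P{X > t + h | X > t} = P{X > h} that section 3.7.2 attributes to this distribution:

```python
import math

lam = 3.0   # the parameter lambda used in Fig. 3.2

def F(x):
    """Exponential distribution function: 0 for x < 0, 1 - exp(-lam*x) for x >= 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

def survival(x):
    return 1.0 - F(x)   # P{X > x}

print(F(-1.0), F(0.0))   # 0.0 0.0
t, h = 0.4, 0.25
print(abs(survival(t + h) / survival(t) - survival(h)) < 1e-12)   # True
```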

3.4 Joint Distributions

We are often interested in two stochastic variables X and Y at the same time. These may be the outcomes of 2 experiments, or they may be a pair of figures emerging from a single experiment. (X, Y) can be regarded as taking values in the product space S × T consisting of all pairs (s, t) with s ∈ S and t ∈ T.

We first consider the case of two discrete stochastic variables X, Y. The joint (probability) density function is defined by

f(x, y) = P{X = x, Y = y}   (3.13)

where

f(x, y) ≥ 0   (3.14)

Σ_u Σ_v f(u, v) = 1   (3.15)

The probability that X = xj and Y = yk is given by

f(xj, yk) = P{X = xj, Y = yk}   (3.16)

The total probability P{X = xj} is obtained by adding over all possible values of yk:

P{X = xj} = f1(xj) = Σ_ν f(xj, ν)   (3.17)

This is called the marginal density function of X. The joint distribution function of X and Y is defined by

F(x, y) = P{X ≤ x, Y ≤ y} = Σ_{u≤x} Σ_{v≤y} f(u, v)   (3.18)

The continuous case is easily obtained by analogy, replacing sums by integrals, and it is also obvious how the mixed case (discrete-continuous) should be dealt with.

If the events X = x and Y = y are independent for all x and y, then we say that X and Y are independent stochastic variables. In this case

P{X = x, Y = y} = P{X = x} · P{Y = y}   (3.19)

or equivalently

f(x, y) = f(x) · f(y)   (3.20)

Generalizations to more than two variables can also be made.

Example 4.1 Consider again two consecutive throws of a die, and let X and Y correspond to the result of the first and second throw. The two-dimensional variable (X, Y) takes on the pairs of values (i, j), where i, j ∈ {1, 2, 3, 4, 5, 6}, each with probability 1/36. We can easily see that X and Y are independent.

Let Z = X + Y, the sum of the random variables X and Y. Z is a one-dimensional random variable which takes on the values i + j: 2, 3, 4, ..., 11, 12. Because of the independence of X and Y, we have

P(Z = 2) = P(X = 1, Y = 1) = P(X = 1) · P(Y = 1) = 1/36
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 1/18
············
P(Z = 12) = P(X = 6, Y = 6) = P(X = 6) · P(Y = 6) = 1/36

It is easy to verify that Σ_{i=2}^{12} P(Z = i) = 1.

3.5 Expected Values

For a discrete stochastic variable X taking the possible values x1, x2, ..., xn we define the expectation of X, or the mean of X, as follows:

E(X) = Σ_{j=1}^{n} xj · P{X = xj} = Σ_{j=1}^{n} xj · f(xj)   (3.21)

For the continuous case, the expectation of X with density function f(x) is defined in a similar way:

E(X) = ∫_{−∞}^{∞} x · f(x) dx   (3.22)

Let X be a stochastic variable and consider a single-valued function g(t). Then Y = g(X) is also a stochastic variable, and in analogy with (3.21) and (3.22) we define the expectation of g(X) by:
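The distribution of Z = X + Y in Example 4.1 can be tabulated exactly with fractions. A Python sketch (illustrative, not part of the original text):

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair dice; Z = X + Y (Example 4.1).
dist = {}
for i, j in product(range(1, 7), repeat=2):
    dist[i + j] = dist.get(i + j, 0) + Fraction(1, 36)

print(dist[2], dist[3], dist[7])   # 1/36 1/18 1/6
print(sum(dist.values()))          # 1
```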

Discrete case:  E(g(X)) = Σ_{j=1}^{n} g(xj) · f(xj)   (3.23)

Continuous case:  E(g(X)) = ∫_{−∞}^{+∞} g(x) · f(x) dx   (3.24)

E(X) is the mean value of X and is a measure of the location of X.

Example 5.1 Assume that the random variable X can take on two values: x1 = −1 with probability p1 = 0.2, and x2 = 1 with probability p2 = 0.8. The expectation of X equals

E(X) = 0.2 · (−1) + 0.8 · 1 = 0.6

Example 5.2 Let the stochastic variable X take on the values x = 0, 1, 2, ··· and let

P(X = x) = (λ^x / x!) e^(−λ),  where λ (> 0) is constant.

We find the mean value of X:

E(X) = Σ_{x=0}^{∞} x · (λ^x / x!) e^(−λ) = λ e^(−λ) Σ_{x=1}^{∞} λ^(x−1)/(x−1)! = λ e^(−λ) Σ_{x=0}^{∞} λ^x/x! = λ e^(−λ) e^λ = λ

In this case we call X Poisson distributed.

Example 5.3 A random variable X is called Normal distributed if

f(x) = (1/√(2π)) e^(−x²/2)

We derive the mean value of X:

E(X) = (1/√(2π)) ∫_{−∞}^{∞} x e^(−x²/2) dx = −(1/√(2π)) [e^(−x²/2)]_{−∞}^{+∞} = 0

Some properties of the expectation are shown below:

1. E(c) = c
2. E(c · X) = c · E(X)
3. E(X1 + X2) = E(X1) + E(X2)
4. E(X1 · X2) = E(X1) · E(X2),  if X1, X2 are independent

Here c is a constant and X, X1, X2 are stochastic variables whose expectations exist.

Of particular interest is the expectation of g(X) when g(X) = X^r, where r is a positive integer. Then αr = E(X^r) is called the r'th moment of X:

Discrete case:  αr = Σ_j xj^r · f(xj)   (3.25)

Continuous case:  αr = ∫_{−∞}^{+∞} x^r · f(x) dx   (3.26)

We notice that α1 = E(X) and α2 = E(X²).

The r'th moment of a stochastic variable X about a is defined by E((X − a)^r). Moments about the mean of X are denoted by µr:

Discrete case:  µr = Σ_j (xj − E(X))^r · f(xj)   (3.27)

Continuous case:  µr = ∫_{−∞}^{+∞} (x − E(X))^r · f(x) dx   (3.28)

Of particular interest is the 2nd moment about the mean. This is called the variance:

Var(X) = E((X − E(X))²)   (3.29)

This is a non-negative number, and the square root of the variance is called the standard deviation. We can easily get:

Var(X) = µ2 = α2 − α1²   (3.30)

Some properties of the variance are:

1. Var(c) = 0
2. Var(cX) = c² · Var(X)
3. Var(X1 + X2) = Var(X1) + Var(X2),  if X1, X2 are independent

We calculate the variances in the previous examples:

1. In Example 5.1:

α1 = E(X) = 0.6
α2 = E(X²) = (−1)² · 0.2 + 1² · 0.8 = 1
µ2 = Var(X) = α2 − α1² = 1 − 0.36 = 0.64

2. In Example 5.2:

α1 = λ
α2 = E(X²) = Σ_{x=0}^{∞} x² · (λ^x / x!) · e^(−λ) = e^(−λ)(λ² e^λ + λ e^λ) = λ² + λ
µ2 = Var(X) = α2 − α1² = λ
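The moment formulas (3.25) and (3.30) can be applied mechanically to the two-point variable of Example 5.1. A Python sketch (illustrative, not part of the original text):

```python
# Density of Example 5.1: P(X = -1) = 0.2, P(X = 1) = 0.8.
density = {-1: 0.2, 1: 0.8}

def alpha(r):
    """r'th moment alpha_r = E(X^r), formula (3.25)."""
    return sum((x ** r) * p for x, p in density.items())

variance = alpha(2) - alpha(1) ** 2   # formula (3.30)
print(round(alpha(1), 10), round(variance, 10))   # 0.6 0.64
```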

3. In Example 5.3:

α1 = E(X) = 0
α2 = E(X²) = (1/√(2π)) ∫_{−∞}^{+∞} x² e^(−x²/2) dx = 1
µ2 = Var(X) = α2 − α1² = 1

For a discrete stochastic variable assuming non-negative integers as values, the Binomial moments are very useful in teletraffic theory. The r'th Binomial moment is defined by

βr = Σ_{i=r}^{∞} C(i, r) p(i)   (3.31)

The results given above can be extended to two or more variables having joint density functions, e.g. f(x, y):

E(X) = Σ_u Σ_v u · f(u, v)   (3.32)

An interesting quantity arising in the case of two variables is the covariance, defined by

Cov(X, Y) = E((X − E(X)) (Y − E(Y)))   (3.33)

If X and Y are independent, then Cov(X, Y) = 0. On the other hand, if X and Y are identical (X = Y) then

Cov(X, Y) = (Var(X) · Var(Y))^(1/2) = Var(X)   (3.34)

Thus we are led to a measure of the dependence of the variables X and Y, given by

ρ = Cov(X, Y) / (Var(X) · Var(Y))^(1/2)   (3.35)

This is a dimensionless quantity called the correlation coefficient, or coefficient of correlation.

3.6 Some Discrete Distributions

We shall now describe some important distributions of the discrete type.

3.6.1 Binomial Distribution

Later on we will realize that many important discrete variables arise from the concept of a Bernoulli sequence of experiments. A Bernoulli experiment is a trial with only two possible outcomes; we normally call the two outcomes "success" and "failure", with respective probabilities p and 1 − p. A sequence of such experiments is called a Bernoulli sequence if all the experiments have the same probability of "success" or "failure".

The Binomial distribution is given by (ref. Table 3.1)

P{X = x} = C(n, x) · p^x (1 − p)^(n−x),  x = 0, 1, 2, ..., n   (3.36)

This distribution applies if one makes n independent Bernoulli experiments in which the probability of "success" in each experiment is p; P{X = x} is then the probability of exactly x "successes". This probability is derived by combinatorial analysis. We find

E(X) = n · p   (3.37)

Var(X) = np(1 − p)   (3.38)

Example 6.1 This model is usable when we make n test-calls and observe how many of them are unsuccessful.

3.6.2 Poisson Distribution

This distribution is given by (ref. Table 3.1)

P{X = x} = (λ^x / x!) e^(−λ),  x = 0, 1, 2, ···   (3.39)

This distribution is obtained as the limit of the Binomial distribution when we increase n and at the same time reduce p, keeping the product n · p constant and equal to λ. We find (cf. Example 5.2):

E(X) = λ   (3.40)

Var(X) = λ   (3.41)
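The limit statement — Binomial(n, p) → Poisson(λ) as n grows with np = λ fixed — can be observed numerically. A Python sketch (illustrative, not part of the original text; λ and x are chosen arbitrarily):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam ** x / factorial(x) * exp(-lam)

lam, x = 4.0, 2
for n in (10, 100, 10000):          # n grows while n*p = lam stays fixed
    print(n, round(binom_pmf(x, n, lam / n), 6))
print(round(poisson_pmf(x, lam), 6))   # 0.146525
```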

The Poisson distribution, however, is not only an approximation of the Binomial distribution; it is a distribution in its own right, as we shall see in the teletraffic theory.

Example 6.2 If we examine a large number of subscribers, each of which has a small probability of being busy, then the number found busy will follow a Poisson distribution.

Example 6.3 The number of calls incoming to an exchange during one hour will also follow a Poisson distribution.

3.6.3 Other Discrete Distributions

The random variable which in a Bernoulli sequence counts the number of trials needed to get the first success is called a geometric random variable. It is described by the Geometric distribution, shown in Table 3.1. Notice that this distribution includes the trial for success (k = 1, 2, ···); in some cases we don't include the trial for success, so that the values assumed are k = 0, 1, 2, ···.

By adding k geometric distributions we get the Negative Binomial distribution (Pascal distribution), which is also shown in Table 3.1. From Table 3.1 we notice the close relationship between the Binomial, the Geometric and the Negative Binomial distributions. In Chapter 1 we indicated the Hypergeometric distribution (formula (1.17)).

3.7 Some Continuous Distributions

3.7.1 Normal Distribution

This is a continuous distribution with the density function

f(t) = (1/(σ√(2π))) exp(−(1/2)((t − µ)/σ)²),   −∞ < t < +∞   (3.42)

with (cf. Example 5.3):

E(T) = µ   (3.43)

Var(T) = σ²   (3.44)

One usually writes T = N(µ, σ), which means "T is normally distributed with mean value µ and standard deviation σ". The standard Normal distribution has a mean of 0 and a variance of 1, and forms the basis for tables of the Normal distribution. The properties of other Normal distributions are obtained from these tables by working in terms of the quantity (t − µ)/σ.

3.7.2 Exponential Distribution

This distribution is called the negative exponential distribution in teletraffic theory. The density and distribution functions are, respectively,

f(t) = λ e^(−λt),   t ≥ 0, λ > 0   (3.45)

F(t) = 1 − e^(−λt),   t ≥ 0, λ > 0   (3.46)

We have:

E(T) = 1/λ   (3.47)

Var(T) = 1/λ²   (3.48)

This is one of the most important distributions in teletraffic theory. The well-known Markov or "memoryless" property is inherent in this distribution, as we have:

P{X > t + h | X > t} = P{X > h}

The stochastic variable forgets its age t.

3.7.3 Erlang-k Distribution

By adding k exponentially distributed stochastic variables we get a new stochastic variable which is Erlang-k distributed (Table 3.1):

f(t) = ((λt)^(k−1)/(k−1)!) λ e^(−λt)   (3.49)

By allowing k to be non-integral this can be generalized to the Gamma distribution:

f(t) = ((λt)^(k−1)/Γ(k)) λ e^(−λt)

By replacing λ by kλ the mean value becomes independent of k:

f(t) = (λk (λkt)^(k−1)/(k−1)!) e^(−λkt),   λ > 0   (3.50)

For integral k we have the distribution function:

F(t) = 1 − e^(−λkt) Σ_{j=0}^{k−1} (λkt)^j / j!   (3.51)

with

E(T) = 1/λ   and   Var(T) = 1/(kλ²)   (3.52)

When k = 1, the Erlang-k distribution is identical with the exponential distribution. When k = ∞, the random variable becomes constant, since Var(T) → 0. From Table 3.1 we notice the close relationship between the Exponential, the Poisson and the Erlang-k distributions. We also notice the relationship to the discrete cases.

Example 7.1 The holding time of a control device has been found to be Erlang-5 distributed with an average value of 500 milliseconds. What is the probability that the holding time does not exceed 750 milliseconds?

Let X be the random variable. We have E(X) = 500 ms, so that λ = (500 ms)^(−1). The probability that X does not exceed 750 milliseconds is given by F_X(750), where λkx = (1/500) · 5 · 750 = 7.5:

F_X(750) = 1 − e^(−λkx) Σ_{j=0}^{k−1} (λkx)^j / j! = 1 − e^(−7.5) Σ_{j=0}^{4} (7.5)^j / j! = 0.8679
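Example 7.1 can be reproduced with a direct implementation of (3.51). A Python sketch (illustrative, not part of the original text):

```python
from math import exp, factorial

def erlang_cdf(x, k, lam):
    """Erlang-k distribution function (3.51), parametrized so the mean is 1/lam."""
    lkx = lam * k * x
    return 1.0 - exp(-lkx) * sum(lkx ** j / factorial(j) for j in range(k))

# Erlang-5 holding time with mean 500 ms, evaluated at 750 ms (so lam*k*x = 7.5).
print(round(erlang_cdf(750, k=5, lam=1 / 500), 4))   # 0.8679
```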

                          BINOMIAL PROCESS                         POISSON PROCESS
                          Discrete time                            Continuous time
                          Probability of event = α (0 < α < 1)     Rate (intensity) of events = λ (λ > 0)

Time interval between     GEOMETRIC DISTRIBUTION                   EXPONENTIAL DISTRIBUTION
two events (= interval    p(n) = α(1−α)^(n−1),  n = 1, 2, ...      f(t) = λ e^(−λt),  t ≥ 0
from a random point of    E = 1/α,  V = (1−α)/α²                   E = 1/λ,  V = 1/λ²
time to the next event)

Time interval until       NEGATIVE BINOMIAL (PASCAL)               ERLANG-k DISTRIBUTION
event number k            p(n | k) = C(n−1, k−1) α^k (1−α)^(n−k)   f(t | k) = ((λt)^(k−1)/(k−1)!) λ e^(−λt)
                          n = k, k+1, ...                          t ≥ 0
                          E = k/α,  V = k(1−α)/α²                  E = k/λ,  V = k/λ²

Number of events in a     BINOMIAL DISTRIBUTION                    POISSON DISTRIBUTION
fixed time interval       p(x | n) = C(n, x) α^x (1−α)^(n−x)       f(x | t) = ((λt)^x/x!) e^(−λt)
                          x = 0, 1, ..., n                         t ≥ 0
                          E = α·n,  V = α·n·(1−α)                  E = λ·t,  V = λ·t

Table 3.1 Correspondence between the distributions of the Binomial process and the Poisson process (E = mean value, V = variance).

3.8 Exercises

1. The number of calls arriving on a group of devices in a telephone system was recorded on a counter. The counter was read off every 3 minutes. The following values xi were

obtained during a period from 8 a.m. to 10 a.m.:

06 09 08 17 08 07 15 12 08 11
09 16 07 07 09 14 07 12 19 15
06 09 16 14 09 13 10 09 07 08
11 10 06 09 11 14 03 05 15 14

(a) Make a diagram of the frequency of the different numbers of arriving calls.

(b) Make a table of the (empirical) density function, i.e. of the relative frequency of the different numbers of arriving calls.

(c) Make a table of the (empirical) distribution function.

(d) Calculate the mean value x̄ of the number of arriving calls for this measurement, (1) from (b), and (2) from the formula

x̄ = (1/40) Σ_{i=1}^{40} xi

(e) Calculate the variance of the number of arriving calls for this measurement, (1) from (b), and (2) from the formula

Var = (1/n) Σ_{i=1}^{n} (xi − x̄)²

(In practice we divide the sum by n − 1 instead of n when we calculate the variance of "observations", because this gives a better result; we use n only in theoretical analysis.)

2. Prove (3.30).

3. Show that if X and Y are independent, then Cov(X, Y) = 0.

Chapter 4 Theory of sampling

4.1 Sampling

We are often interested in drawing conclusions about a large set of objects, which we shall call a population. The population size N can be finite or infinite. Instead of examining the entire population (doing this is often impossible in practice) we observe only a sample of size n, which is a subset of the population. The process of obtaining samples is called sampling; the purpose is to obtain some knowledge about the population from results found in the sample.

Sampling where each element of a population may be chosen more than once is called sampling with replacement. In sampling without replacement each element cannot be chosen more than once.

A population is characterized by a stochastic variable X, which is defined by a distribution function F(x) having population parameters such as the mean value µ, the variance σ², etc. If we know F(x), then we have full information about the population. However, in real-world problems one often has little or no knowledge about the distribution underlying the samples. In general we shall therefore only try to get some knowledge (estimates) of some population parameters by sampling. Obtaining such knowledge, when little or nothing is known of the underlying distribution, is the main topic of this chapter.

4.2 Sampling Statistics

By taking random samples from the population, these may be used to obtain estimates of the population parameters. An important problem in sampling theory is to decide how to

form the sample statistics which will best estimate a given population parameter. Let us pick n members from the population at random:

observations: x1, x2, ..., xn
sample size: n

Then we calculate the following sample statistics:

sample mean:  x̄ = (1/n) Σ_{i=1}^{n} xi   (4.1)

sample variance:  s² = (1/n) Σ_{i=1}^{n} xi² − x̄²   (4.2)

These statistics are functions of stochastic variables and are therefore stochastic variables themselves. The unknown population mean and variance are estimated by the following unbiased estimators:

µ = E{x̄}   (4.3)

σ² = E{(n/(n−1)) · s²}   (4.4)

(these important results are proven in mathematical statistics). We now want to know how accurate these estimates are.

4.3 The Central Limit Theorem

This is a fundamental theorem of mathematical statistics: if a sample of size n is taken from a population with finite mean µ and finite variance σ² (and otherwise any statistical distribution), then as n increases the distribution of the sample mean x̄ is asymptotically Normal distributed (cf. section 3.7) with mean value µ and variance σ²/n. Equivalently, the distribution of

Z = (x̄ − µ) / (σ/√n)   (4.5)

tends towards the standard Normal distribution N(0, 1) as n increases:

Z = (x̄ − µ) / (σ/√n) → N(0, 1)  for n → ∞   (4.6)

Example 4.1 Consider a sequence of Bernoulli random variables X1, X2, ..., Xn that are independent, each with "success" probability p. We have E(Xi) = p and Var(Xi) = p(1 − p). By the central limit theorem we get:

(x̄ − p) / √(p(1−p)/n) → N(0, 1)  (n → ∞)

or

(Σ_{i=1}^{n} Xi − np) / √(np(1−p)) → N(0, 1)  (n → ∞)

Since we know from section 6 of Chapter 3 that Sn = Σ_{i=1}^{n} Xi is Binomial distributed, the above expression shows that for large n an approximation of Binomial probabilities can be obtained by using the Normal probabilities of N(np, np(1−p)).

4.4 Sampling Distribution

A sample statistic, which is calculated from a sample, is a function of random variables and is therefore itself a random variable. The probability distribution of a sample statistic is called the sampling distribution of the statistic. We shall only consider two sampling distributions for the sample mean.

4.4.1 Population mean µ and variance σ² are known

Suppose that the population from which samples are taken has a probability distribution with mean value µ and variance σ² (not necessarily a Normal distribution). Then it can be shown that the sampling distribution of x̄ is asymptotically Normal distributed with mean µ and standard deviation σ/√n. If we choose a so-called confidence level 1 − α, then we can expect to find x̄ lying between the confidence limits

µ ± z_{1−α/2} · σ/√n   (4.7)

(1 − α)·100% of the time. The interval (µ − z_{1−α/2}·σ/√n, µ + z_{1−α/2}·σ/√n) is called the confidence interval; z_{1−α/2} is obtained from the standard Normal distribution:

P{−∞ < T ≤ z_{1−α/2}} = 1 − α/2   (4.8)

Example 4.2 For some values of the confidence level α we have the following values of z:

α      z_{1−α/2}
10%    1.6449
 5%    1.9600
 1%    2.5758

Thus (section 3.7) 2.5% of the probability (the area under the density function) lies above t = 1.9600, and (because of symmetry) 2.5% lies below t = −1.9600.

4.4.2 Population mean µ and variance σ² are both unknown

In most practical applications we do not know the population parameters µ and/or σ². We then estimate these parameters by the sample mean x̄ and the sample variance s². It can be shown that the sample mean has a so-called (Student) t-distribution, which has an additional parameter: degrees of freedom = n − 1. The confidence interval becomes:

x̄ ± t_{1−α/2, n−1} · s/√n   (4.9)

where the t-value is obtained from a table of the t-distribution. For increasing n this distribution is asymptotically Normal distributed:

lim_{n→∞} t_{1−α/2, n−1} = z_{1−α/2}   (4.10)
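The estimators (4.1)-(4.4) and the z-interval (4.7) can be combined into one small computation. The Python sketch below (illustrative, not part of the original text) uses made-up sample data and the 95% value z = 1.9600:

```python
from math import sqrt

sample = [10, 9, 12, 8, 11, 10, 9, 13, 10, 8]   # hypothetical observations
n = len(sample)

mean = sum(sample) / n                            # sample mean, (4.1)
s2 = sum(x * x for x in sample) / n - mean ** 2   # sample variance, (4.2)
sigma2_hat = n / (n - 1) * s2                     # unbiased estimate, (4.4)

z = 1.9600                                        # 95% confidence level
half_width = z * sqrt(sigma2_hat / n)
print(mean)                                       # 10.0
print(mean - half_width, mean + half_width)
```

The printed interval is the approximate 95% confidence interval for the population mean under the Normal approximation of section 4.3.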

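The Central Limit Theorem can be observed empirically. The sketch below (an illustration, not from the notes) repeatedly draws samples of size n = 30 from a Uniform(0, 1) population, standardizes each sample mean as in (4.5), and checks how often the statistic falls in the interval (−1.96, 1.96), which for N(0, 1) has probability about 0.95:

```python
import random
from statistics import NormalDist

# Population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
mu, sigma = 0.5, (1 / 12) ** 0.5
n, reps = 30, 20000

random.seed(42)
within = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = (xbar - mu) / (sigma / n ** 0.5)   # the statistic (4.5)
    if abs(z) <= 1.96:
        within += 1

coverage = within / reps
print(coverage)                        # close to P(|Z| <= 1.96) ~ 0.95
print(2 * NormalDist().cdf(1.96) - 1)  # the exact N(0,1) probability, ~0.95
```

Even though the population here is far from Normal, the standardized sample mean already behaves almost like N(0, 1) for n = 30.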
4.4 Sampling Distribution

A sample statistic, which is calculated from a sample, is a function of random variables and is therefore itself a random variable. The probability distribution of a sample statistic is called the sampling distribution of the statistic. We shall only consider two sampling distributions for the sample mean.

4.4.1 Population mean µ and variance σ² are known

Suppose that the population from which samples are taken has a probability distribution with mean value µ and variance σ² (not necessarily a Normal distribution). Then it can be shown that the sampling distribution of x̄ is asymptotically normal, N(µ, σ²/n). This is a consequence of the Central Limit Theorem in section 4.3.

Example 4.1 Consider a sequence of Bernoulli random variables X1, X2, ..., Xn that are independent, each with "success" probability p. We then have E(X_i) = p and Var(X_i) = p(1 − p). By the central limit theorem, we get:

  (x̄ − p) / √(p(1 − p)/n) → N(0, 1)   (n → ∞)

or

  (Σ_{i=1}^{n} X_i − np) / √(np(1 − p)) → N(0, 1)   (n → ∞)

Since we know from section 6 of Chapter 3 that S_n = Σ_{i=1}^{n} X_i is binomially distributed, the above expression shows that for large n an approximation for binomial probabilities can be obtained by using the Normal probabilities of N(np, np(1 − p)).
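The normal approximation in Example 4.1 can be checked numerically. The sketch below (Python; the numbers n = 100, p = 0.5, k = 55 are illustrative, not from the notes) compares the exact binomial probability with the N(np, np(1 − p)) approximation, using the usual continuity correction:

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.5, 55

# Exact binomial probability P(S_n <= k).
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation N(np, np(1-p)), evaluated at k + 0.5
# (continuity correction for a discrete distribution).
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(exact, approx)  # both close to 0.86
```

For these values the two probabilities agree to about three decimal places, which is why the Normal approximation is so convenient for large n.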

If we choose a so-called confidence level 1 − α, then we can expect to find x̄ lying between the confidence limits

  µ ± z_{1−α/2} · σ/√n                                          (4.7)

(1 − α) · 100% of the time. The interval (µ − z_{1−α/2} · σ/√n, µ + z_{1−α/2} · σ/√n) is called the confidence interval. The value z_{1−α/2} is obtained from the standard Normal distribution:

  P{−∞ < Z ≤ z_{1−α/2}} = 1 − α/2                               (4.8)

For some values of the confidence level α we have the following values of z:

  α           10%      5%       1%
  z_{1−α/2}   1.6449   1.9600   2.5758

Example 4.2 For α = 5% we have z_{1−α/2} = 1.9600. Thus (cf. section 3.7) 2.5% of the probability (area under the density function) is above z = 1.9600, and (because of symmetry) 2.5% is below z = −1.9600.

4.4.2 Population mean µ and variance σ² are both unknown

In most practical applications we do not know the population parameters µ and/or σ². We then estimate these parameters by the sample mean x̄ and the sample variance s², respectively. It can be shown that the sample mean has a so-called (Student) t-distribution, which has an additional parameter: degrees of freedom = n − 1. The confidence interval becomes:

  x̄ ± t_{1−α/2, n−1} · s/√n                                     (4.9)

where the t-value is obtained from a table of the t-distribution. For increasing n this distribution is asymptotically Normal:

  lim_{n→∞} t_{1−α/2, n−1} = z_{1−α/2}                          (4.10)
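A confidence interval of the form (4.7) is easy to compute when σ is known. The following Python sketch (illustrative numbers x̄ = 50, σ = 10, n = 100; not from the notes) obtains z_{1−α/2} from the standard Normal distribution instead of a table:

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    # Interval (4.7): xbar +/- z_{1-alpha/2} * sigma / sqrt(n),
    # valid when the population variance sigma^2 is known.
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.9600 for alpha = 5%
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# Illustrative data: xbar = 50, sigma = 10, n = 100, 95% confidence level.
lo, hi = z_confidence_interval(50, 10, 100, alpha=0.05)
print(round(lo, 2), round(hi, 2))  # 48.04 51.96
```

Note how the width of the interval scales as 1/√n: quadrupling the sample size halves the interval, in line with the remark about the factor c² below.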

The t-value yields a larger confidence interval than the z-value (we have less information, because we do not know the population mean and variance).

Example 4.3 For α = 5% we had z_{97.5%} = 1.96. From the t-distribution we get:

  n           1       2      5      10     20     50
  t_{97.5%}   12.71   4.30   2.57   2.23   2.09   2.01

We notice that for increasing n the t-value tends to 1.96; for large values of n, and for most practical purposes, we therefore often use the z-value.

For a given confidence level we have a relation between the confidence limits (the confidence interval) and the sample size: if we want to reduce the confidence interval by a factor c, then we must increase the sample size by a factor c².

Example 4.4 The average holding time of calls during a certain period in a telephone system is to be estimated. Based on a random sample of 100 holding times of calls, the sample mean and sample variance are calculated as x̄ = 5.74 time units and s² = 2.65 square time units. Find a 95% confidence interval for the true average holding time of calls in that period.

Let µ denote the true average holding time of calls. The confidence interval for µ based on formula (4.9) is

  (x̄ − (s/√n) · t_{1−α/2, n−1}, x̄ + (s/√n) · t_{1−α/2, n−1})

where n = 100, x̄ = 5.74, s² = 2.65 and 1 − α = 0.95. We have t_{1−α/2, n−1} = t_{97.5%, 99} = 1.984. Therefore the 95% confidence interval for µ is (5.4170, 6.0630).

4.5 Exercises

1. If X is a Poisson random variable with mean 81, find the approximate probability P(X ≥ 75).

2. Let X1, X2, ..., Xn (n large enough to justify applying the Central Limit Theorem) be independent random variables, each Poisson with mean λ. Find an (approximate) 1 − α confidence interval for µ.

3. Suppose that it is observed that the average span of using one kind of parts of a machine is 5 years, with a standard deviation of 1.2 years. By sampling 100 of this kind of parts, we obtain x̄ = 4.75. Construct a confidence interval for µ with confidence level (a) 99% (b) 95% (c) 90% (d) 80%. Does the length of the intervals increase or decrease as the confidence level decreases?

4. A bottle is supposed to contain 250 ml of wine, with a standard deviation of 3 ml. If we sample 200 such bottles at random, find the probability that the average of wine contained in a bottle will be (a) At most 248 ml. (b) At least 252 ml. (c) Between 249 and 251 ml.

5. A certain kind of instrument is labeled "1.5 kg weight". A random sample of 50 instruments is measured to check that they are at the standard weight, and we calculate x̄ = 1.47 kg and s² = 0.09 kg². Construct a confidence interval for µ with confidence level (a) 80% (b) 95% (c) 99%. Is the sample consistent with the labeled weight (α = 5%)?

Updated: 2001.10
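The interval in Example 4.4 can be reproduced with a few lines of Python (a sketch; the t-value 1.984 for 99 degrees of freedom is taken from a t-table, since the standard library provides no t-distribution):

```python
import math

# Data from Example 4.4: n = 100 holding times, sample mean and variance.
n, xbar, s2 = 100, 5.74, 2.65
t = 1.984  # t_{97.5%, n-1 = 99}, from a t-table

half = t * math.sqrt(s2) / math.sqrt(n)   # t * s / sqrt(n), formula (4.9)
lo, hi = xbar - half, xbar + half
print(round(lo, 4), round(hi, 4))  # 5.417 6.063, matching (5.4170, 6.0630)
```

The same pattern applies to the confidence-interval exercises above: substitute the given x̄, s² (or σ²) and n, and take the t- or z-value for the requested confidence level.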
