
Data

Data are very important for scientific study, and statistics is a discipline that deals with the

collection, presentation and analysis of data. In this chapter we are going to study how we can

summarize and describe a set of data. When we study a set of data we need to identify the

following important characteristics of the dataset.

• Primary and secondary data. Data that we collect ourselves are called primary data; in this case we always have the individual values. Data collected by others are called secondary data. Sometimes the data are grouped into a table, and are then called grouped data.

• Population and sample data. The population is the totality of elements in which we are interested. If we want to study the salaries of Hong Kong people, the population includes everyone who works in Hong Kong. Because the population is so large, it is neither practical nor economical to collect salary data from all working people, so we usually select a random subset of the population. The data collected from this subset are called sample data.

• Discrete and continuous data. It is important to identify whether the data is continuous or

discrete. For example data on the number of persons in a household is discrete, and data

on salary is continuous. Different statistical techniques are used for handling discrete or

continuous data.

Frequency Distribution

Statistical data obtained by means of census, sample surveys or experiments usually consist of

raw, unorganized sets of numerical values. Before these data can be used as a basis for inferences

about the phenomenon under investigation or as a basis for decision, they must be summarized

and the pertinent information must be extracted.

Example 1

A random sample of 100 households in a town was selected and their town gas consumption (in cubic metres) for the last month was recorded as follows:

55 82 83 109 78 87 95 94 85 67

80 109 83 89 91 104 90 103 67 52

107 78 86 29 72 66 92 99 60 75

88 112 97 88 49 62 70 66 88 62

72 85 81 78 77 41 105 92 94 74

78 75 87 83 71 99 56 69 78 60

1197 39 104 86 67 79 98 102 82 91

46 120 73 125 132 86 48 55 112 28

42 24 130 100 46 57 31 129 137 59

102 51 135 53 105 110 107 46 108 117


A useful method for summarizing a set of data is the construction of a frequency table, or a

frequency distribution. That is, we divide the overall range of values into a number of classes and

count the number of observations that fall into each of these classes or intervals.

The general rules for constructing a frequency distribution are:

(i) There should not be too few or too many classes.

(ii) Insofar as possible, equal class intervals are preferred. But the first and last classes can

be open-ended to cater for extreme values.

In example 1, the sample size is 100 and the range for the data is 113 (137 - 24). A frequency

distribution with six classes is appropriate and it is shown below.

Frequency distribution of household town gas consumption

| Monthly town gas consumption (cubic metres) | Number of households |
|---|---|
| 20 - 39 | 5 |
| 40 - 59 | 15 |
| 60 - 79 | 25 |
| 80 - 99 | 30 |
| 100 - 119 | 18 |
| 120 - 139 | 7 |
| Total | 100 |

Class limits: the numbers that identify the classes in a listing of a frequency distribution. In the above frequency distribution, the class whose frequency is 30 has lower class limit 80 and upper class limit 99.

As contrasted to a class limit, a class boundary is the precise point that separates one class from

another, rather than being a value indicated in one of the classes. A class boundary is typically

located midway between the upper limit of a class and the lower limit of the next higher class

adjoining it. Therefore the class boundary separating the class 60-79 and the class 80-99 is

halfway between 79 and 80, that is, at the point 79.5.

Class interval: the width of a class. It is computed by subtracting the lower limit (boundary) of the class from the lower limit (boundary) of the next class.

Class midpoint or class mark: the point dividing the class into two equal halves. It is obtained by adding the lower and upper limits (boundaries) of a class and dividing by 2.

Relative frequency of a class: the frequency of the class divided by the total frequency of the distribution.

Cumulative frequency distribution: shows the number of items in a series that are less than (or more than) certain specified values.
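The definitions above can be checked against the town gas table with a short script. This is only a sketch: the function name `describe` and the dictionary keys are illustrative choices, and the half-unit boundary adjustment assumes the data are recorded as whole numbers, as in the example.

```python
# Class limits and frequencies copied from the town gas frequency distribution.
classes = [(20, 39, 5), (40, 59, 15), (60, 79, 25),
           (80, 99, 30), (100, 119, 18), (120, 139, 7)]


def describe(classes):
    """Return one dict per class: boundaries, interval, midpoint,
    relative frequency and 'less than' cumulative frequency."""
    total = sum(freq for _, _, freq in classes)
    rows, running = [], 0
    for lower, upper, freq in classes:
        running += freq
        rows.append({
            "limits": (lower, upper),
            # Boundaries lie midway between adjacent limits (assumes whole-number data).
            "boundaries": (lower - 0.5, upper + 0.5),
            "interval": (upper + 0.5) - (lower - 0.5),
            "midpoint": (lower + upper) / 2,
            "relative": freq / total,
            "cumulative": running,
        })
    return rows


table = describe(classes)
for row in table:
    print(row)
```

For the class 80 - 99 this gives boundaries 79.5 and 99.5, a class interval of 20 and a class mark of 89.5, in line with the definitions above.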


Measure of Central Tendency

A value that would describe the 'centre' of a distribution would be visually located near the spot

where most of the data seem to be concentrated. Consequently, values that fulfil this role are

called measures of central tendency.

The most common measures of the central tendency of a data set are the arithmetic mean (or simply the mean), the median and the mode.

The mean of a set of numerical data is the sum of the set divided by the number of observations,

that is, their average.

The median of a distribution is the value which divides the distribution so that an equal number of

values lie on either side of it, i.e., half of the items have values smaller or equal to it and half of

the items have values larger or equal to it.

The mode of a set of numerical data is the value which occurs most frequently.

Example 1 (calculating mean, median and mode for individual data)

The following table shows the hourly wage rates of eight sampled construction workers.

| Worker i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Hourly wage rate x_i ($) | 35 | 38 | 46 | 60 | 65 | 69 | 72 | 78 |

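$\text{Mean}\ \bar{x} = \dfrac{1}{8}\sum_{i=1}^{8} x_i = \dfrac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{8} = \dfrac{463}{8} = 57.875\ (\$)$

$\text{Location of the median: } \dfrac{n + 1}{2} = \dfrac{9}{2} = 4.5\text{th}$

$\text{Median} = \dfrac{x_4 + x_5}{2} = \dfrac{60 + 65}{2} = 62.5\ (\$)$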

Mode: the sample is small and every value occurs only once, so no mode can be identified.
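As a quick check of the figures above, the same three measures can be computed with Python's standard `statistics` module. The variable names are illustrative; `multimode` returns every value that ties for the highest count, so a result containing all eight wages confirms that no single mode stands out.

```python
import statistics

# Hourly wage rates of the eight sampled construction workers.
wages = [35, 38, 46, 60, 65, 69, 72, 78]

mean = statistics.mean(wages)        # sum of the values divided by n
median = statistics.median(wages)    # average of the 4th and 5th ordered values
modes = statistics.multimode(wages)  # all values that occur most often

print(mean, median, modes)
```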

Example 2 (calculating mean, median and mode for grouped data)

The following table shows the daily wages of a random sample of construction workers.

Calculate its mean, median and mode.


| Daily wages ($) | Number of workers |
|---|---|
| 200 - 399 | 5 |
| 400 - 599 | 15 |
| 600 - 799 | 25 |
| 800 - 999 | 30 |
| 1000 - 1199 | 18 |
| 1200 - 1399 | 7 |
| Total | 100 |

Solution

| Daily wages ($) | Number of workers f_i | Class mark x_i | f_i x_i |
|---|---|---|---|
| 200 - 399 | 5 | 299.5 | 1,497.5 |
| 400 - 599 | 15 | 499.5 | 7,492.5 |
| 600 - 799 | 25 | 699.5 | 17,487.5 |
| 800 - 999 | 30 | 899.5 | 26,985.0 |
| 1000 - 1199 | 18 | 1,099.5 | 19,791.0 |
| 1200 - 1399 | 7 | 1,299.5 | 9,096.5 |
| Total | 100 | | 82,350.0 |

$\text{Mean}\ \bar{x} = \dfrac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \dfrac{82{,}350.0}{100} = 823.5\ (\$)$

| Daily wages ($) | Number of workers f_i | Cumulative frequency F_i |
|---|---|---|
| 200 - 399 | 5 | 5 |
| 400 - 599 | 15 | 20 |
| 600 - 799 | 25 | 45 |
| 800 - 999 | 30 | 75 |
| 1000 - 1199 | 18 | 93 |
| 1200 - 1399 | 7 | 100 |
| Total | 100 | |

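Since 0.5n = 0.5(100) = 50 and the cumulative frequency first reaches 50 in the 4th class (800 - 999), the median lies in the 4th class.

$\text{Median} = L_4 + \dfrac{0.5n - F_3}{f_4}\,c_4$

where $L_4$ is the lower class boundary and $c_4$ is the class interval of the median class.

$\text{Median} = 799.5 + \dfrac{0.5(100) - 45}{30}\,(200) = 832.8\ (\$)$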

| Daily wages ($) | Number of workers f_i | Class interval c_i | Relative density f'_i = (f_i / c_i)(200) |
|---|---|---|---|
| 200 - 399 | 5 | 200 | 5 |
| 400 - 599 | 15 | 200 | 15 |
| 600 - 799 | 25 | 200 | 25 |
| 800 - 999 | 30 | 200 | 30 |
| 1000 - 1199 | 18 | 200 | 18 |
| 1200 - 1399 | 7 | 200 | 7 |
| Total | 100 | | |

Since $f'_4 = 30$ is the largest relative density, the mode lies in the 4th class.

$\text{Mode} = L_4 + \dfrac{f'_4 - f'_3}{(f'_4 - f'_3) + (f'_4 - f'_5)}\,c_4$

$= 799.5 + \dfrac{30 - 25}{(30 - 25) + (30 - 18)}\,(200) = 858.3\ (\$)$
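The grouped-data calculations in Example 2 can be reproduced with a few small functions. This is a sketch under stated assumptions: the function names are illustrative, class boundaries are taken half a unit outside the class limits, and the mode function assumes equal class intervals so that frequencies can stand in for relative densities.

```python
# (lower limit, upper limit, frequency) for the daily wage table in Example 2.
classes = [(200, 399, 5), (400, 599, 15), (600, 799, 25),
           (800, 999, 30), (1000, 1199, 18), (1200, 1399, 7)]


def grouped_mean(classes):
    """Sum of frequency times class mark, divided by total frequency."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes):
    """L + (0.5n - F_prev) / f * c for the class containing the median."""
    n = sum(f for _, _, f in classes)
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= 0.5 * n:
            lower_boundary = lo - 0.5
            interval = hi - lo + 1
            return lower_boundary + (0.5 * n - cumulative) / f * interval
        cumulative += f


def grouped_mode(classes):
    """L + d1 / (d1 + d2) * c for the modal class (equal intervals assumed)."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))
    lo, hi, f = classes[k]
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    return (lo - 0.5) + (f - before) / ((f - before) + (f - after)) * (hi - lo + 1)


print(grouped_mean(classes), grouped_median(classes), grouped_mode(classes))
```

Rounded to one decimal place, the three results agree with the worked values 823.5, 832.8 and 858.3.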

Advantages and disadvantages of each measure

Mean

Advantages: (i) All values in the distribution are used in its calculation, so it can be

regarded as more representative than the other two measures.

(ii) Its method of calculation is simple and most people understand the

meaning of its result.

(iii) Its result can easily be used in further analysis.


Disadvantages: (i) Its result can be easily distorted by extreme values. As such, its result

may be rather lower or higher than the bulk of the values and

becomes unrepresentative.

(ii) In case of open end classes, mean can be calculated only if their class

marks are determined. If such classes contain a large proportion of

the values, then the mean may be subjected to substantial error.

Median

Advantage: Its result will not be affected by extreme values and open end

classes.

Disadvantage: It has to be supplemented by other statistics because it does not

reflect the distribution in the way that the mean does, that is,

including all values.

Mode

Advantages: (i) Its result will not be affected by extreme values and open end

classes.

(ii) If data are not grouped, it can be determined easily.

Disadvantages: (i) It has to be supplemented by other statistics.

(ii) It is difficult to obtain an accurate estimate of the mode if the values

are classified into a frequency distribution.

How to select a suitable measure

(i) Always select the mean whenever there is no special reason for choosing the other two

measures.

(ii) Select the median if the distribution contains a substantial number of extremely large or small values.

(iii) Select the mode if an integral result is preferred, as when the data are measured on an ordinal scale.


Measure of data variation (variability)

A measure of central tendency is almost never, by itself, sufficient to provide an adequate

summary of the characteristics of a set of data. We will usually require, in addition, a measure of

the amount of variation in the data.

Example 1

Consider the following measurements, in grams, for two samples of strawberry jam bottled by

companies A and B:

Sample for Company A: 31 32 32 33 32

Sample for Company B: 28 29 32 35 36

Both samples have the same mean, 32 grams. It is obvious that company A, in comparison with

company B, bottles strawberry jam with a more consistent content. We say that the variability of

the observations is smaller for company A. Therefore in buying strawberry jam we would feel

more confident that the bottle we select will be closer to the advertised average content if we buy

from company A.

The most important measures of variability or dispersion are the range, mean deviation,

standard deviation and variance.

(There are some other measures like quartile deviation and percentiles. We shall not study these

measures. Read our textbook if interested)

The range of a set of numbers is the difference between the largest and the smallest number in

the set.

Example 2 (For individual data)

The following table shows the hourly wage rates of eight sampled construction workers.

| Worker i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Hourly wage rate x_i ($) | 35 | 38 | 46 | 60 | 65 | 69 | 72 | 78 |

The range is $78 - $35 = $43.

Though range is simple and can be obtained easily, its result is unstable. This is particularly true

if the sample size is large. So whenever the sample size is over 10, we seldom choose to use

range to indicate variability of the data.

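Mean deviation is the average of the absolute deviations of the numerical data from their mean.

| Worker i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Hourly wage rate x_i ($) | 35 | 38 | 46 | 60 | 65 | 69 | 72 | 78 |
| Absolute deviation of x_i from the mean 57.875 | 22.875 | 19.875 | 11.875 | 2.125 | 7.125 | 11.125 | 14.125 | 20.125 |

$\text{Mean deviation} = \dfrac{\sum_{i=1}^{8} \lvert x_i - 57.875 \rvert}{8} = \dfrac{109.25}{8} = 13.656\ (\$)$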

The mean deviation is a good measure of the extent of variation of the data in a distribution. However, the absolute value in its definition makes it mathematically awkward to use in further analysis. To avoid this difficulty, we can use the standard deviation instead.

The standard deviation of a population ($\sigma$) is the square root of the average of the squared distances of the observations from the mean.

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, where $\mu$ is the population mean

To compute the sample standard deviation ($s$) we use the above formula, replacing $\mu$ by $\bar{x}$ and $N$ by $n - 1$.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

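| Worker i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Hourly wage rate x_i ($) | 35 | 38 | 46 | 60 | 65 | 69 | 72 | 78 | 463 |

$s = \sqrt{\dfrac{\sum_{i=1}^{8} (x_i - 57.875)^2}{7}} = 16.226\ (\$)$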

Variance is the square of the standard deviation.

$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
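The measures of variation for the eight hourly wage rates can be verified with the standard `statistics` module. This is a minimal sketch with illustrative variable names; `stdev` and `variance` use the sample divisor n - 1, while `pstdev` uses the population divisor N and is shown only for contrast.

```python
import statistics

wages = [35, 38, 46, 60, 65, 69, 72, 78]

data_range = max(wages) - min(wages)
mean = statistics.mean(wages)

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - mean) for x in wages) / len(wages)

sample_sd = statistics.stdev(wages)         # divisor n - 1
sample_var = statistics.variance(wages)     # square of the sample standard deviation
population_sd = statistics.pstdev(wages)    # divisor N, for comparison only

print(data_range, round(mean_deviation, 3), round(sample_sd, 3), round(sample_var, 3))
```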

Example 3 (for grouped data)

The following table shows the daily wages of a random sample of construction workers.

Calculate its mean deviation, variance, and standard deviation.

| Daily wages ($) | Number of workers |
|---|---|
| 200 - 399 | 5 |
| 400 - 599 | 15 |
| 600 - 799 | 25 |
| 800 - 999 | 30 |
| 1000 - 1199 | 18 |
| 1200 - 1399 | 7 |
| Total | 100 |

Solution

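| Daily wages ($) | Number of workers f_i | Class mark x_i | f_i times the absolute deviation of x_i from the mean 823.5 |
|---|---|---|---|
| 200 - 399 | 5 | 299.5 | 2,620 |
| 400 - 599 | 15 | 499.5 | 4,860 |
| 600 - 799 | 25 | 699.5 | 3,100 |
| 800 - 999 | 30 | 899.5 | 2,280 |
| 1000 - 1199 | 18 | 1,099.5 | 4,968 |
| 1200 - 1399 | 7 | 1,299.5 | 3,332 |
| Total | 100 | | 21,160 |

$\text{Mean deviation} = \dfrac{\sum_{i=1}^{6} f_i \lvert x_i - \bar{x} \rvert}{\sum_{i=1}^{6} f_i} = \dfrac{21{,}160}{100} = 211.60\ (\$)$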

| Daily wages ($) | Number of workers f_i | Class mark x_i | f_i (x_i - mean)^2 |
|---|---|---|---|
| 200 - 399 | 5 | 299.5 | 1,372,880 |
| 400 - 599 | 15 | 499.5 | 1,574,640 |
| 600 - 799 | 25 | 699.5 | 384,400 |
| 800 - 999 | 30 | 899.5 | 173,280 |
| 1000 - 1199 | 18 | 1,099.5 | 1,371,168 |
| 1200 - 1399 | 7 | 1,299.5 | 1,586,032 |
| Total | 100 | | 6,462,400 |

$\text{Variance}\ (s^2) = \dfrac{6{,}462{,}400}{99} = 65{,}276.77$

$\text{Standard deviation} = \sqrt{65{,}276.77} = 255.49$
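The grouped-data figures in Example 3 can be reproduced as follows. This is a sketch, assuming (as the worked solution does) that every observation in a class sits at the class mark and that the variance uses the divisor n - 1 = 99; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for the daily wage table in Example 3.
classes = [(200, 399, 5), (400, 599, 15), (600, 799, 25),
           (800, 999, 30), (1000, 1199, 18), (1200, 1399, 7)]

n = sum(f for _, _, f in classes)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

mean = sum(f * x for f, x in zip(freqs, marks)) / n
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
std_dev = math.sqrt(variance)

print(mean, mean_deviation, round(variance, 2), round(std_dev, 2))
```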

Comparison of the variation of two distributions

The standard deviations themselves cannot be used as the basis of the comparison because:

(a) the units of measurement of the two distributions may be different, and

(b) the average values of the two distributions may be widely dissimilar.

The correct measure to use is the coefficient of variation ($CV$).

$CV = \dfrac{s}{\bar{x}} \times 100\%$

Example 4

The following table shows the summary statistics for the daily wages of two types of workers.

| Worker's type | Mean daily wage | Standard deviation of daily wage |
|---|---|---|
| I | $100 | $20 |
| II | $150 | $24 |

Compare these two daily wages distributions.

Solution

| In comparison | Distribution | Reason |
|---|---|---|
| Average magnitude | II > I | mean of II = 150 > mean of I = 100 |
| Variation | I > II | CV of I = (20/100)(100%) = 20% > CV of II = (24/150)(100%) = 16% |
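The comparison in Example 4 amounts to two divisions, sketched below. The function name `coefficient_of_variation` is an illustrative choice, not something defined in these notes.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean


cv_type_1 = coefficient_of_variation(mean=100, sd=20)
cv_type_2 = coefficient_of_variation(mean=150, sd=24)

# Type II has the larger standard deviation but the smaller relative variation.
print(cv_type_1, cv_type_2, cv_type_1 > cv_type_2)
```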

Chapter Two - Probability

Introduction and concepts

“Perhaps it was man’s unquenchable thirst for gambling that led to the early development of probability theory. In an effort to increase their winnings, gamblers called upon the mathematicians to provide optimum strategies for various games of chance.” ---- from Walpole R.E. Introduction to Statistics

Probability is the basis upon which the discipline of statistics has been developed and applied in

many fields associated with chance occurrences such as politics, business, weather forecasting,

and scientific research. Probability may be taken as a tool with which we may solve problems

involving uncertainties. In fact uncertainty is a basic element of human experiences. To cite some

examples: travelling time, number of customers, rainfall, temperature, share price movement,

length of our life, etc.

There are three approaches to understanding probability. In the empirical approach, probability is taken as a relative frequency. For example, the probability of an aeroplane arriving at its destination on time may be taken as the proportion of times the aeroplane has been on time in the past, say, in one thousand flights.

In the classical approach, we suppose that a trial of an experiment has k possible outcomes which are equally likely. The probability of the occurrence of any one outcome is therefore 1/k. Thus in throwing a coin, the probability of having a head is ½. In our course, we shall adopt this approach, but the empirical approach is always useful in giving us some intuition to understand the problem.

The third approach is axiomatic and very mathematical. A number of axioms are set up, and from these some theorems of probability are developed. This approach is rather abstract and is usually used by mathematicians.

Some Basic Concepts

Sample space: is a set of all possible outcomes of an experiment.

Event: is a subset of a sample space.

To find the probability of an event we need to count the number of outcomes of the event and the

number of all possible outcomes of the experiment, and then to divide the former by the latter. Hence

the following counting rules may be helpful.

Some counting rules

Example 1

Three items are selected at random from a manufacturing process. Each item is inspected and

classified defective (D) or non-defective (N).

Its sample space is S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD}

Example 2

Consider the event that the number of defectives in the above example is greater than 1.

The event is {DDD, DDN, DND, NDD}.

The probability of the event is 4/8 or ½.
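The sample space and event in Examples 1 and 2 are small enough to enumerate directly. The sketch below assumes, as the classical approach does, that all eight outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each of the three inspected items is non-defective (N) or defective (D).
sample_space = ["".join(outcome) for outcome in product("ND", repeat=3)]

# Event: more than one defective item among the three.
event = [outcome for outcome in sample_space if outcome.count("D") > 1]

probability = Fraction(len(event), len(sample_space))
print(sample_space, event, probability)
```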

Example 3

Suppose a licence plate contains two letters followed by three digits, with the first digit not zero. How many different licence plates can be printed?

| | 1st letter | 2nd letter | 1st digit | 2nd digit | 3rd digit |
|---|---|---|---|---|---|
| Choices | A - Z | A - Z | 1 - 9 | 0 - 9 | 0 - 9 |
| Number of choices | 26 | 26 | 9 | 10 | 10 |

The number of different licence plates that can be printed is

(26)(26)(9)(10)(10) = 608,400

Example 4

Find the possible permutations (the number of ways where sequence of the letters is counted)

from 3 letters A, B, C.

Consider the following tree diagram:

• The number of permutations of n distinct objects is

$_nP_n = n(n-1)(n-2)\cdots(2)(1) = n!$

• The number of permutations of n distinct objects taken r at a time is

$_nP_r = n(n-1)\cdots(n-r+2)(n-r+1)$

$= \dfrac{n(n-1)\cdots(n-r+2)(n-r+1)\,(n-r)(n-r-1)\cdots(2)(1)}{(n-r)(n-r-1)\cdots(2)(1)} = \dfrac{n!}{(n-r)!}$

e.g. The number of 3-letter words formed from 5 distinct letters is

$_5P_3 = \dfrac{5!}{(5-3)!} = 60$

• The number of distinct permutations of n objects of which $n_1$ are alike of the first kind, $n_2$ are alike of the second kind, ..., $n_k$ are alike of the kth kind, and $n_1 + n_2 + \cdots + n_k = n$, is

$\dfrac{n!}{(n_1!)(n_2!)\cdots(n_k!)}$

Find the number of possible permutations of the following 5 letters: A A A B C

There are five objects of which three are alike.

∴ The answer is $\dfrac{_5P_5}{3!} = \dfrac{5!}{3!} = 20$

Example 5

How many 7-letter words can be formed using the letters of the word 'BENZENE'?

(There are 1 B, 3 E, 2 N and 1 Z.)

The number of 7-letter words that can be formed is

$\dfrac{7!}{(1!)(3!)(2!)(1!)} = 420$

• The number of combinations (the number of ways where sequence is not counted) of n distinct objects taken r at a time is

$_nC_r = \dfrac{n!}{r!\,(n-r)!}$

Find the number of possible combinations of 5 distinct objects taken 3 at a time.

The answer is $\dfrac{5!}{3!\,(5-3)!} = 10$

Example 6

The number of 3-person committees that can be formed from a group of 4 persons is

$_4C_3 = \dfrac{4!}{3!\,(4-3)!} = 4$
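The counting rules above map directly onto functions in Python's standard `math` module (`factorial`, `perm` and `comb`, the last two available from Python 3.8). The sketch below recomputes the worked answers and, for the small cases, cross-checks them by brute-force enumeration; the variable names are illustrative.

```python
import math
from itertools import permutations

# Permutations of n distinct objects, and of n objects taken r at a time.
perms_abc = math.factorial(3)           # A, B, C in every order
words_3_of_5 = math.perm(5, 3)          # 5! / (5 - 3)!

# Permutations with repeated objects: n! / (n1! n2! ... nk!).
perms_aaabc = math.factorial(5) // math.factorial(3)
perms_benzene = math.factorial(7) // (
    math.factorial(1) * math.factorial(3) * math.factorial(2) * math.factorial(1))

# Combinations: order is not counted.
choose_3_of_5 = math.comb(5, 3)
committees = math.comb(4, 3)

# Brute-force checks by listing distinct arrangements.
assert len(set(permutations("AAABC"))) == perms_aaabc
assert len(set(permutations("BENZENE"))) == perms_benzene

print(perms_abc, words_3_of_5, perms_aaabc, perms_benzene, choose_3_of_5, committees)
```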

Example 7

A box contains 8 eggs, 3 of which are rotten. Three eggs are picked at random. Find the

probabilities of the following events.

(a) Exactly two eggs are rotten.

(b) All eggs are rotten.

(c) No egg is rotten.

Solution:

(a) The 8 eggs can be divided into 2 groups, namely, the 3 rotten eggs as the first group and the 5 good eggs as the second group.

Getting 2 rotten eggs among 3 randomly selected eggs occurs if we select 2 eggs from the first group and 1 egg from the second group.

The number of such outcomes is $(_3C_2)(_5C_1) = 15$.

The total number of possible outcomes of selecting 3 eggs randomly from the 8 eggs is $_8C_3 = 56$.

Thus the probability of having exactly two rotten eggs among the 3 randomly selected eggs is

$\dfrac{(_3C_2)(_5C_1)}{_8C_3} = \dfrac{15}{56}$

(b) Similarly, the probability of having all 3 rotten eggs is

$\dfrac{(_3C_3)(_5C_0)}{_8C_3} = \dfrac{1}{56}$

(c) The probability of having no rotten egg is

$\dfrac{(_3C_0)(_5C_3)}{_8C_3} = \dfrac{10}{56} = \dfrac{5}{28}$
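All three parts of Example 7 follow the same pattern: choose k eggs from the rotten group and the rest from the good group, then divide by the number of ways to choose 3 eggs from 8. A sketch with an illustrative helper `p_rotten`, using exact fractions:

```python
from fractions import Fraction
from math import comb


def p_rotten(k, rotten=3, good=5, picked=3):
    """Probability that exactly k of the picked eggs are rotten."""
    favourable = comb(rotten, k) * comb(good, picked - k)
    return Fraction(favourable, comb(rotten + good, picked))


print(p_rotten(2), p_rotten(3), p_rotten(0))

# The probabilities for k = 0, 1, 2, 3 cover every possibility, so they sum to 1.
total = sum(p_rotten(k) for k in range(4))
```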


Rules of probability

The following rules may help us to find the probability of an event.

Addition Rule: For any two events A and B (whether or not they are mutually exclusive),

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

where $A \cup B$ is the union of the two sets A and B, that is, the set of elements that belong to A or to B or to both, and

$A \cap B$ is the intersection of the two sets A and B, that is, the set of elements that are common to A and B.

Illustrative example

180 students took examinations in English and Mathematics. Their results were as follows:

Number of students passing English = 80

Number of students passing Mathematics = 120

Number of students passing at least one subject = 144

Then we can rewrite the above results as:

Probability that a randomly selected student passed English $= \dfrac{80}{180} = \dfrac{4}{9}$

Probability that a randomly selected student passed Mathematics $= \dfrac{120}{180} = \dfrac{2}{3}$

Probability that a randomly selected student passed at least one subject $= \dfrac{144}{180} = \dfrac{4}{5}$

Find the probability that a randomly selected student passed both subjects.

Solution

Let E be the event of passing English, and M be the event of passing Mathematics.

It is given that: $P(E) = \dfrac{4}{9}$; $P(M) = \dfrac{2}{3}$; $P(M \cup E) = \dfrac{4}{5}$

As $P(M \cup E) = P(E) + P(M) - P(M \cap E)$,

∴ $P(M \cap E) = P(E) + P(M) - P(M \cup E) = \dfrac{4}{9} + \dfrac{2}{3} - \dfrac{4}{5} = \dfrac{14}{45} = 0.31$

Example 8

A card is drawn from a complete deck of playing cards. What is the probability that the card is a heart or an ace?

Solution

Let A be the event of getting a heart, and B be the event of getting an ace.

The probability that the card is a heart or an ace is $P(A \cup B)$.

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \dfrac{13}{52} + \dfrac{4}{52} - \dfrac{1}{52} = \dfrac{16}{52} = \dfrac{4}{13}$

For mutually exclusive events,

$P(A \cup B) = P(A) + P(B)$

What is the probability of getting a total of '7' or '11' when a pair of dice is tossed?

Solution

Total number of possible outcomes = (6)(6) = 36

Possible outcomes of getting a total of '7': {1,6; 2,5; 3,4; 4,3; 5,2; 6,1}

Possible outcomes of getting a total of '11': {5,6; 6,5}

Let A be the event of getting a total of '7', and B be the event of getting a total of '11'.

The probability of getting a total of '7' or '11' is $P(A \cup B)$.

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B)$, since A and B are mutually exclusive,

$= \dfrac{6}{36} + \dfrac{2}{36} = \dfrac{2}{9}$

If A and A' are complementary events then

$P(A) = 1 - P(A')$

Example 9

A coin is tossed six times in succession. What is the probability that at least one head occurs?

Let A be the number of heads that occur in six successive tosses.

$P(A \ge 1) = 1 - P(A = 0) = 1 - \left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right) = \dfrac{63}{64}$
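The addition rule, its mutually exclusive special case and the complement rule can all be checked by listing equally likely outcomes. The following is a sketch only: the helper `prob` and the card and dice representations are illustrative choices, and a fair deck, fair dice and a fair coin are assumed.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


# Example 8: a heart or an ace from a 52-card deck.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def is_heart(card):
    return card[1] == "hearts"


def is_ace(card):
    return card[0] == "A"


p_heart_or_ace = prob(lambda c: is_heart(c) or is_ace(c), deck)
by_addition_rule = (prob(is_heart, deck) + prob(is_ace, deck)
                    - prob(lambda c: is_heart(c) and is_ace(c), deck))

# A total of 7 or 11 with two dice: the two events cannot occur together.
dice = list(product(range(1, 7), repeat=2))
p_7_or_11 = prob(lambda d: sum(d) in (7, 11), dice)

# Example 9: at least one head in six tosses, via the complement.
tosses = list(product("HT", repeat=6))
p_at_least_one_head = 1 - prob(lambda t: "H" not in t, tosses)

print(p_heart_or_ace, by_addition_rule, p_7_or_11, p_at_least_one_head)
```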

Conditional Probability

Let A and B be two events. The conditional probability of event A given that event B has occurred, denoted by $P(A/B)$, is defined as

$P(A/B) = \dfrac{P(A \cap B)}{P(B)}$, provided that $P(B) > 0$.

Similarly, the conditional probability of B given that event A has occurred is defined as

$P(B/A) = \dfrac{P(A \cap B)}{P(A)}$, provided that $P(A) > 0$.

Example 10

A hamburger chain found that 75% of all customers use mustard, 80% use ketchup, and 65% use

both, when ordering a hamburger. What are the probabilities that:

(a) a ketchup-user uses mustard?

(b) a mustard-user uses ketchup?

Solution

Let A be the event of using mustard, and B be the event of using ketchup.

It is given that: $P(A) = 0.75$; $P(B) = 0.80$; $P(A \cap B) = 0.65$

(a) P(a ketchup-user uses mustard) $= P(A/B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.65}{0.80} = 0.8125$

(b) P(a mustard-user uses ketchup) $= P(B/A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.65}{0.75} = 0.8667$

Multiplicative Rule

$P(A \cap B) = P(A)\,P(B/A) = P(B)\,P(A/B)$

Statistical independence: the occurrence or non-occurrence of one event has no effect on the probability of occurrence of the other event.

Two events A and B are independent if and only if

$P(A \cap B) = P(A)\,P(B)$

Example 11

A pair of fair dice is thrown twice. What is the probability of getting totals of 7 and 11?

Solution

Let $A_i$ be the event of getting '7' in the i-th throw and $B_j$ be the event of getting '11' in the j-th throw.

P(Getting totals of 7 and 11) $= P(A_1 \cap B_2) + P(B_1 \cap A_2)$

$= P(A_1)\,P(B_2/A_1) + P(B_1)\,P(A_2/B_1)$

$= P(A_1)\,P(B_2) + P(B_1)\,P(A_2)$, since $A_i$ and $B_j$ are independent,

$= \left(\dfrac{6}{36}\right)\left(\dfrac{2}{36}\right) + \left(\dfrac{2}{36}\right)\left(\dfrac{6}{36}\right) = \dfrac{1}{54}$
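Examples 10 and 11 can be verified with exact fractions. The sketch below uses illustrative variable names; for Example 11 it lists all 36 x 36 equally likely pairs of throws instead of relying on the independence argument, so the two routes can be compared.

```python
from fractions import Fraction
from itertools import product

# Example 10: conditional probabilities from the given percentages.
p_mustard = Fraction(75, 100)    # P(A)
p_ketchup = Fraction(80, 100)    # P(B)
p_both = Fraction(65, 100)       # P(A and B)

p_mustard_given_ketchup = p_both / p_ketchup    # P(A/B)
p_ketchup_given_mustard = p_both / p_mustard    # P(B/A)

# Example 11: totals of the first and second throw of a pair of dice.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]   # 36 equally likely results
two_throws = list(product(totals, repeat=2))                   # 1296 equally likely pairs
favourable = sum(1 for first, second in two_throws if {first, second} == {7, 11})
p_7_and_11 = Fraction(favourable, len(two_throws))

# The same answer via independence: P(A1)P(B2) + P(B1)P(A2).
by_independence = Fraction(6, 36) * Fraction(2, 36) + Fraction(2, 36) * Fraction(6, 36)

print(float(p_mustard_given_ketchup), round(float(p_ketchup_given_mustard), 4), p_7_and_11)
```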

Theorem of Total Probability

If the events $B_1, B_2, \ldots, B_k$ constitute a partition of the sample space S such that $P(B_i) \ne 0$ for i = 1, 2, ... , k, then for any event A of S

$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k)$

$= P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + \cdots + P(B_k)\,P(A/B_k)$

Example 12

Suppose 50% of the cars are manufactured in the United States and 15% of these are compact;

30% of the cars are manufactured in Europe and 40% of these are compact; and finally, 20% are

manufactured in Japan and 60% of these are compact. If a car is picked at random from the lot,

find the probability that it is a compact.

Let A be the event that the car is compact,

$B_1$ be the event that the car is manufactured in the United States,

$B_2$ be the event that the car is manufactured in Europe, and

$B_3$ be the event that the car is manufactured in Japan.

$P(A) = P(A \cap B_1) + P(A \cap B_2) + P(A \cap B_3)$

$= P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)$

$= (0.50)(0.15) + (0.30)(0.40) + (0.20)(0.60) = 0.315$

Bayes' Theorem

If $E_1, E_2, \ldots, E_k$ are mutually exclusive events such that $E_1 \cup E_2 \cup \cdots \cup E_k$ contains all sample points of S, then for any event D of S with $P(D) \ne 0$,

$P(E_i/D) = \dfrac{P(E_i \cap D)}{P(D)} = \dfrac{P(E_i \cap D)}{\sum_{j=1}^{k} P(E_j \cap D)}$

$= \dfrac{P(E_i)\,P(D/E_i)}{P(E_1)\,P(D/E_1) + P(E_2)\,P(D/E_2) + \cdots + P(E_k)\,P(D/E_k)}$

Example 13

Suppose a box contains 2 red balls and 1 white ball and a second box contains 2 red balls and 2 white balls. One of the boxes is selected by chance and a ball is drawn from it. If the drawn ball is red, what is the probability that it came from the 1st box?

Solution

Let A be the event of drawing a red ball and B be the event of choosing the 1st box.

Given: $P(B) = P(B') = \dfrac{1}{2}$; $P(A/B) = \dfrac{2}{3}$; $P(A/B') = \dfrac{2}{4}$

P(Coming from the 1st box/the drawn ball is red)

$= P(B/A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{P(A \cap B)}{P(A \cap B) + P(A \cap B')}$

$= \dfrac{P(B)\,P(A/B)}{P(B)\,P(A/B) + P(B')\,P(A/B')} = \dfrac{(\frac{1}{2})(\frac{2}{3})}{(\frac{1}{2})(\frac{2}{3}) + (\frac{1}{2})(\frac{2}{4})} = \dfrac{4}{7}$
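The theorem of total probability and Bayes' theorem differ only in what is done with the products P(B_i)P(A/B_i): the first adds them up, the second divides one of them by that sum. A sketch with two illustrative helper functions, applied to Examples 12 and 13:

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Sum of P(B_i) * P(A/B_i) over a partition B_1, ..., B_k."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes(priors, likelihoods, i):
    """P(B_i/A): the i-th product divided by the total probability of A."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)


# Example 12: share of cars by origin, and share of compacts within each origin.
origin = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]
compact = [Fraction(15, 100), Fraction(40, 100), Fraction(60, 100)]
p_compact = total_probability(origin, compact)

# Example 13: two boxes chosen with equal chance; chance of red within each box.
box = [Fraction(1, 2), Fraction(1, 2)]
red = [Fraction(2, 3), Fraction(2, 4)]
p_first_box_given_red = bayes(box, red, 0)

print(float(p_compact), p_first_box_given_red)
```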

Chapter Three - Probability Distributions

To cope with uncertainties of outcome, a statistical model that describes the behavior of the outcome

is needed. These theoretical models which are very similar to relative frequency distributions, are

called probability distributions.

Random Variables - A random variable is a variable that takes on different numerical values

determined by the outcomes of a random experiment.


Example 1

Consider an experiment of tossing a coin 3 times.

Let the random variable X be the number of heads obtained.

As S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT},

the possible values of X are 0, 1, 2 and 3.

Discrete random variable - in a given interval, only a specified number of values can occur.

Continuous random variable - in a given interval, any value can occur.

Probability Distribution of a random variable - is a representation of the probabilities for all

the possible outcomes.

Example 2

The probability distribution of the number of heads obtained when a coin is tossed 4 times.

| x | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| P(X=x) | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |

That is, $P(X = x) = \dfrac{_4C_x}{16}$, x = 0, 1, 2, 3, 4

Example 3

Consider an experiment of tossing two fair dice.

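Let the random variable X be the sum of the two dice. Then the probability distribution of X is:

| x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X=x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |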

The probability function, $f(x)$, of a discrete random variable X expresses the probability that X takes the value x, as a function of x. That is,

$f(x) = P(X = x)$

where the function is evaluated at all possible values of x.

Properties of the probability function $P(X = x)$:

1. $P(X = x) \ge 0$ for any value x.

2. $\sum_x P(X = x) = 1$.

Mathematical Expectations

The expected value, E(X), of a discrete random variable X is defined as

$E(X) \text{ or } \mu_x = \sum_x x\,P(X = x)$

It is the mean of the probability distribution.

Let X be a random variable. The expectation of the squared discrepancy about the mean, $(X - \mu_x)^2$, is called the variance, denoted $\sigma_x^2$, and given by

$Var(X) \text{ or } \sigma_x^2 = E[(X - \mu_x)^2] = \sum_x (x - \mu_x)^2\,P(X = x) = \sum_x x^2\,P(X = x) - \mu_x^2$

Example 4

Calculate the mean and variance of the discrete probability distribution in example 2 and 3.
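One way to check an answer to this exercise is to apply the two defining sums directly. The sketch below is not the notes' own solution; the helper name `mean_and_variance` is illustrative, and the dice probabilities are written with the shortcut (6 - |x - 7|)/36, which reproduces the table in Example 3.

```python
from fractions import Fraction
from math import comb


def mean_and_variance(dist):
    """Mean = sum of x P(X=x); variance = sum of x^2 P(X=x) minus mean^2."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var


# Example 2: number of heads in 4 tosses of a coin.
heads = {x: Fraction(comb(4, x), 16) for x in range(5)}

# Example 3: sum of two fair dice.
dice_sum = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

# Both tables must satisfy the second property of a probability function.
assert sum(heads.values()) == 1 and sum(dice_sum.values()) == 1

print(mean_and_variance(heads), mean_and_variance(dice_sum))
```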

The Normal Distribution

The normal distribution is a probability distribution of a continuous random variable. It is based on the Law of Errors, which states that

1. Errors are inevitable.

2. Large errors are less likely than small errors.

3. Positive and negative errors are equally likely.

Definition :

A continuous random variable X is defined to be a normal random variable if its probability density function is given by

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$ for $-\infty < x < +\infty$

where µ = the mean of X, σ = the standard deviation of X, π = 3.14159

Notation : $X \sim N(\mu, \sigma^2)$

Properties of the normal distribution:-

1. It is a continuous distribution.

2. The curve is symmetric and bell-shaped about a vertical axis through the mean µ .

3. The total area under the curve and above the horizontal axis is equal to 1.

4. Area under the normal curve:

• Approximately 68% of the values in a normally distributed population are within 1 standard deviation of the mean.

• Approximately 95.5% of the values in a normally distributed population are within 2 standard deviations of the mean.

• Approximately 99.7% of the values in a normally distributed population are within 3 standard deviations of the mean.

The standard normal curve :

The distribution of a normal random variable with µ = 0 and σ =1 is called a standard normal

distribution. Usually a standard normal random variable is denoted by Z.

Notation : Z ~ N(0, 1)

Remark : Usually a table of Z is set up to find the probability P(Z ≥ z) for z ≥ 0.

Example 7

Given Z ~ N(0, 1)

(a) P(Z > 1.73) = 0.0418

(b) P(0 < Z < 1.73) = P(Z > 0) - P(Z > 1.73) = 0.5 - 0.0418 = 0.4582

(c) P(−2.42 < Z < 0.8) = 1 - P(Z < -2.42) - P(Z > 0.8)

= 1 - 0.00776 - 0.2119 = 0.78034

(d) P(1.8 < Z < 2.8) = P(Z > 1.8) - P(Z > 2.8) = 0.0359 - 0.00256 = 0.03334

(e) the value z that has

(i) 5% of the area below it

Let the corresponding z value be z1, then we have P(Z < z1) = 0.05.

From the standard normal distribution table we have P(Z < -1.64) = 0.05.

So z1 = -1.64

(ii) 39.44% of the area between 0 and z.

Let the corresponding z value be z1, then we have P(0 < Z < z1) = 0.3944.

From the standard normal distribution table we have P(0 < Z < 1.25) = 0.3944.

So z1 = 1.25
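The table look-ups in Example 7 can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch with illustrative names; small differences in the last digit compared with the worked answers come from rounding in the printed table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail(z):
    """P(Z > z), the quantity tabulated in the notes."""
    return 1 - Z.cdf(z)


part_a = upper_tail(1.73)
part_b = upper_tail(0) - upper_tail(1.73)
part_c = 1 - Z.cdf(-2.42) - upper_tail(0.8)
part_d = upper_tail(1.8) - upper_tail(2.8)

# Part (e): go from an area back to a z value with the inverse CDF.
z_5_percent_below = Z.inv_cdf(0.05)
z_area_from_zero = Z.inv_cdf(0.5 + 0.3944)

print(round(part_a, 4), round(part_b, 4), round(part_c, 4), round(part_d, 4))
print(round(z_5_percent_below, 2), round(z_area_from_zero, 2))
```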

Theorem :

If X is a normal random variable with mean µ and standard deviation σ, then

$Z = \dfrac{X - \mu}{\sigma}$

is a standard normal random variable and hence

$P(x_1 < X < x_2) = P\left(\dfrac{x_1 - \mu}{\sigma} < Z < \dfrac{x_2 - \mu}{\sigma}\right)$

Example 8

Given $X \sim N(50, 10^2)$, find P(45 < X < 62).

Solution:

$P(45 < X < 62) = P\left(\dfrac{45 - \mu}{\sigma} < \dfrac{X - \mu}{\sigma} < \dfrac{62 - \mu}{\sigma}\right)$

$= P\left(\dfrac{45 - 50}{10} < Z < \dfrac{62 - 50}{10}\right)$

$= P(-0.5 < Z < 1.2)$

$= 1 - P(Z < -0.5) - P(Z > 1.2)$

= 1 - 0.3085 - 0.1151 = 0.5764


Example 9

The charge account at a certain department store is approximately normally distributed with an

average balance of $80 and a standard deviation of $30. What is the probability that a charge

account randomly selected has a balance

(a) over $125;

(b) between $65 and $95.

Let X be the balance in the charge account. $X \sim N(80, 30^2)$

(a) $$P(X > 125) = P\left(\frac{X - \mu}{\sigma} > \frac{125 - \mu}{\sigma}\right) = P\left(Z > \frac{125 - 80}{30}\right) = P(Z > 1.5) = 0.0668$$

(b) $$P(65 < X < 95) = P\left(\frac{65 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{95 - \mu}{\sigma}\right) = P\left(\frac{65 - 80}{30} < Z < \frac{95 - 80}{30}\right)$$

$$= P(-0.5 < Z < 0.5) = 1 - P(Z < -0.5) - P(Z > 0.5)$$

$$= 1 - 0.3085 - 0.3085 = 0.3830$$

Example 10

On an examination the average grade was 74 and the standard deviation was 7. If 12% of the

class are given A’s, and the grades are curved to follow a normal distribution, what is the lowest

possible A and the highest possible B?

Let X be the examination grade and x1 be the lowest grade for A.

$$P(X > x_1) = 0.12 \Rightarrow P\left(Z > \frac{x_1 - 74}{7}\right) = 0.12$$

From the standard normal distribution table, we get $P(Z > 1.17) = 0.1210$ and $P(Z > 1.18) = 0.1190$, so $P(Z > 1.175) \cong 0.12$.

Thus $\frac{x_1 - 74}{7} = 1.175$, i.e. $x_1 = 74 + (7)(1.175) = 82.2 \cong 83$ (rounded up to the next whole grade, since the lowest A must be at least 82.2).

The highest possible B is 82.
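The standardisation used in Examples 8 to 10 can be sketched in Python as follows. This is an illustrative check only, assuming the standard-library `statistics.NormalDist`; the function name `prob_between` is hypothetical. `NormalDist` standardises internally, so the results should agree with the table-based answers up to rounding.

```python
from statistics import NormalDist

def prob_between(mu, sigma, lo, hi):
    """P(lo < X < hi) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)

# Example 8: X ~ N(50, 10^2)
p_ex8 = prob_between(50, 10, 45, 62)

# Example 9: charge account balances, X ~ N(80, 30^2)
balance = NormalDist(80, 30)
p_ex9a = 1 - balance.cdf(125)
p_ex9b = prob_between(80, 30, 65, 95)

# Example 10: grade exceeded by the top 12% of the class
cutoff = NormalDist(74, 7).inv_cdf(1 - 0.12)

print(round(p_ex8, 4), round(p_ex9a, 4), round(p_ex9b, 4), round(cutoff, 1))
```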

The Binomial Distribution

A binomial experiment possesses the following properties :

1. There are n identical observations or trials.

2. Each trial has two possible outcomes, one called “success” and the other “failure”. The

outcomes are mutually exclusive and collectively exhaustive.

3. The probabilities of success p and of failure 1 −p remain the same for all trials.

4. The outcomes of trials are independent of each other.

Example 11

1. In testing 10 items as they come off an assembly line, where each test or trial may

indicate a defective or a non-defective item.

2. Five cards are drawn with replacement from an ordinary deck and each trial is labelled a

success or failure depending on whether the card is red or black.

Definition :

In a binomial experiment with a constant probability p of success at each trial, the probability

distribution of the binomial random variable X, the number of successes in n independent trials, is

called the binomial distribution.

Notation : X ~ b(n, p)

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

where p + q = 1.

Example 12

Of a large number of mass-produced articles, one-tenth is defective. Find the probabilities that a

random sample of 20 will obtain

(a) exactly two defective articles;

(b) at least two defective articles.

Let X be the number of defective articles in a random sample of 20. $X \sim b(20, \tfrac{1}{10})$

(a) $$P(X = 2) = \binom{20}{2}\left(\frac{1}{10}\right)^{2}\left(\frac{9}{10}\right)^{18} = 0.28518$$

(b) $$P(X \geq 2) = 1 - P(X = 0) - P(X = 1)$$

$$= 1 - \binom{20}{0}\left(\frac{1}{10}\right)^{0}\left(\frac{9}{10}\right)^{20} - \binom{20}{1}\left(\frac{1}{10}\right)^{1}\left(\frac{9}{10}\right)^{19}$$

$$= 1 - 0.12158 - 0.27017 = 0.60825$$

Example 13

A test consists of 6 questions, and to pass the test a student has to answer at least 4 questions

correctly. Each question has three possible answers, of which only one is correct. If a student

guesses on each question, what is the probability that the student will pass the test?

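Let X be the number of correctly answered questions among the 6 questions. $X \sim b(6, \tfrac{1}{3})$

$$P(X \geq 4) = \sum_{x=4}^{6} P(X = x) = \sum_{x=4}^{6} \binom{6}{x}\left(\frac{1}{3}\right)^{x}\left(\frac{2}{3}\right)^{6-x}$$

$$= \binom{6}{4}\left(\frac{1}{3}\right)^{4}\left(\frac{2}{3}\right)^{2} + \binom{6}{5}\left(\frac{1}{3}\right)^{5}\left(\frac{2}{3}\right)^{1} + \binom{6}{6}\left(\frac{1}{3}\right)^{6}\left(\frac{2}{3}\right)^{0} = 0.10014$$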

Theorem

The mean and variance of the binomial distribution with parameters n and p are $\mu = np$ and $\sigma^2 = npq$ respectively, where p + q = 1.

Example 14

A packaging machine produces 20 percent defective packages. A random sample of ten packages

is selected, what are the mean and standard deviation of the binomial distribution of that process?

Let X be the no. of defective packages in a sample of 10 packages. X ∼ b(10, 0.2)

Its mean is µ = np = (10)(0.2) = 2

Its standard deviation is $\sigma = \sqrt{npq} = \sqrt{(10)(0.2)(0.8)} = 1.265$
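The binomial calculations above can be reproduced with a few lines of Python. This is a minimal sketch using only `math.comb`; the function names `binom_pmf` and `binom_tail` are our own choices for illustration.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_tail(k, n, p):
    """P(X >= k) for X ~ b(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(k, n + 1))

# Defective articles: X ~ b(20, 1/10)
p_exactly_two = binom_pmf(2, 20, 0.1)
p_at_least_two = binom_tail(2, 20, 0.1)

# Guessing on a 6-question test: X ~ b(6, 1/3), pass with at least 4 correct
p_pass = binom_tail(4, 6, 1 / 3)

# Defective packages: X ~ b(10, 0.2)
mean_packages = 10 * 0.2
sd_packages = sqrt(10 * 0.2 * 0.8)

print(round(p_exactly_two, 5), round(p_at_least_two, 5), round(p_pass, 5))
print(mean_packages, round(sd_packages, 3))
```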

The Normal Approximation to the Binomial Distribution

Theorem :

Given X is a random variable which follows the binomial distribution with parameters n and p, then

$$P(X = x) \cong P\left(\frac{(x - 0.5) - np}{\sqrt{npq}} < Z < \frac{(x + 0.5) - np}{\sqrt{npq}}\right)$$

if n is large and p is not close to 0 or 1.

Remark : If both np and nq are greater than 5, the approximation will be good.

Example 15

A process yields 10% defective items. If 100 items are randomly selected from the process, what

is the probability that the number of defective exceeds 13?

Let X be the number of defective items in a random sample of 100 items. $X \sim b(100, 0.1)$

$\mu = np = (100)(0.1) = 10$, $\sigma = \sqrt{npq} = \sqrt{(100)(0.1)(0.9)} = 3$

$P(X > 13) \cong P(X' > 13.5)$ by normal approximation, where X' denotes the approximating normal variable.

$$P(X' > 13.5) = P\left(\frac{X' - \mu}{\sigma} > \frac{13.5 - \mu}{\sigma}\right) = P\left(Z > \frac{13.5 - 10}{3}\right) = P(Z > 1.167) = 0.121$$

Example 16

A multiple-choice quiz has 200 questions each with four possible answers of which only one is

the correct answer. What is the probability that sheer guesswork yields from 25 to 30 correct

answers for 80 of the 200 problems about which the student has no knowledge?

Let X be the number of correct answers for the 80 questions answered by sheer guesswork. $X \sim b(80, 0.25)$

$\mu = np = (80)(0.25) = 20$, $\sigma = \sqrt{npq} = \sqrt{(80)(0.25)(0.75)} = \sqrt{15}$

$P(25 \leq X \leq 30) \cong P(24.5 < X' < 30.5)$ by normal approximation

$$= P\left(\frac{24.5 - 20}{\sqrt{15}} < Z < \frac{30.5 - 20}{\sqrt{15}}\right) = P(1.16 < Z < 2.71) = 0.1230 - 0.00336 = 0.1196$$
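To see how well the continuity-corrected normal approximation performs, it can be compared with the exact binomial sum. The sketch below is illustrative only; the function names are hypothetical, and the normal probabilities come from `statistics.NormalDist` rather than from a printed table, so the last digit may differ slightly from the worked answers.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_range_exact(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for X ~ b(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(lo, hi + 1))

def binom_range_normal(lo, hi, n, p):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    approx = NormalDist(mu, sigma)
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)

# More than 13 defectives among 100 items, X ~ b(100, 0.1)
approx_defectives = binom_range_normal(14, 100, 100, 0.1)
exact_defectives = binom_range_exact(14, 100, 100, 0.1)

# From 25 to 30 correct guesses among 80 questions, X ~ b(80, 0.25)
approx_guesses = binom_range_normal(25, 30, 80, 0.25)
exact_guesses = binom_range_exact(25, 30, 80, 0.25)

print(round(approx_defectives, 4), round(exact_defectives, 4))
print(round(approx_guesses, 4), round(exact_guesses, 4))
```

Both np and nq exceed 5 in these two cases, so the approximate and exact values should be close.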

The Poisson Distribution

Experiments yielding numerical values of a random variable X, the number of successes

(observations) occurring during a given time interval (or in a specified region) are often called

Poisson experiments.

A Poisson experiment has the following properties:

1. The number of successes in any interval is independent of the number of successes in any other interval.

2. The probability of a single success occurring during a short interval is proportional to the

length of the time interval and does not depend on the number of successes occurring

outside this time interval.

3. The probability of more than one success in a very small interval is negligible.

Examples of random variables following a Poisson distribution

1. The number of customers arriving during a time period of length t.

2. The number of telephone calls per hour received by an office.

3. The number of typing errors per page.

4. The number of accidents occurring at a junction per day.

Definition :

The probability distribution of the Poisson random variable X is called the Poisson distribution.

Notation : X ~ Po(λ )

where λ is the average number of successes occurring in the given time interval.

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

where e ≈ 2.71828.

Theorem: In a Poisson distribution the mean is equal to the variance, i.e., $\mu = \sigma^2$.

Example 17

The average number of radioactive particles passing through a counter during 1 millisecond in a

laboratory experiment is 4. What is the probability that 6 particles enter the counter in a given

millisecond?

Let X be the number of particles entering the counter in a given millisecond. $X \sim Po(4)$

$$P(X = 6) = \frac{e^{-4}4^{6}}{6!} = 0.1042$$

Example 19

Ships arrive in a harbour at a mean rate of two per hour. Suppose that this situation can be

described by a Poisson distribution. Find the probabilities for a 30-minute period that

(a) No ships arrive;

(b) Three ships arrive.

Let X be the number of ships arriving in the harbour in a 30-minute period. $X \sim Po(\lambda)$ with $\lambda = \tfrac{2}{2} = 1$

(a) $$P(X = 0) = \frac{e^{-1}1^{0}}{0!} = 0.3679$$

(b) $$P(X = 3) = \frac{e^{-1}1^{3}}{3!} = 0.0613$$

Theorem :

The mean and variance of the Poisson distribution are both equal to λ.
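The Poisson probabilities in the particle and ship examples, and the equality of mean and variance, can be checked with the short Python sketch below. The function name `poisson_pmf` is our own, and the infinite sums for the mean and variance are truncated at 60 terms, which is an assumption that is harmless for λ = 4.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# Radioactive particles: lambda = 4 per millisecond
p_six_particles = poisson_pmf(6, 4)

# Ships: 2 per hour, so lambda = 1 for a 30-minute period
lam_ships = 2 * 0.5
p_no_ship = poisson_pmf(0, lam_ships)
p_three_ships = poisson_pmf(3, lam_ships)

# Numerical check that mean = variance = lambda (sums truncated at 60 terms)
mean_4 = sum(x * poisson_pmf(x, 4) for x in range(60))
var_4 = sum((x - mean_4) ** 2 * poisson_pmf(x, 4) for x in range(60))

print(round(p_six_particles, 4), round(p_no_ship, 4), round(p_three_ships, 4))
print(round(mean_4, 6), round(var_4, 6))
```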

Poisson approximation to the binomial distribution

If n is large and p is near 0 or near 1.00 in the binomial distribution, then the binomial distribution

can be approximated by the Poisson distribution with parameter np.

Example 20

If the probability that an individual suffers a bad reaction from a certain injection is 0.001, determine the probability that out of 2000 individuals, more than 2 individuals will suffer a bad reaction.

Solution: According to the binomial distribution, the required probability is

$$1 - \left\{\binom{2000}{0}(0.001)^{0}(0.999)^{2000} + \binom{2000}{1}(0.001)^{1}(0.999)^{1999} + \binom{2000}{2}(0.001)^{2}(0.999)^{1998}\right\}$$

Using the Poisson distribution with $\lambda = np = 2$:

P(0 suffers) = $\frac{2^{0}e^{-2}}{0!} = \frac{1}{e^{2}}$

P(1 suffers) = $\frac{2^{1}e^{-2}}{1!} = \frac{2}{e^{2}}$

P(2 suffer) = $\frac{2^{2}e^{-2}}{2!} = \frac{2}{e^{2}}$

Then the required probability = $1 - \frac{5}{e^{2}} = 0.323$.

Generally speaking, the Poisson distribution will provide a good approximation to the binomial when

(i) n is at least 20 and p is at most 0.05; or

(ii) n is at least 100 and p is less than 0.1, in which case the approximation will generally be excellent.

Example 21

Two percent of the output of a machine is defective. A lot of 300 pieces will be produced.

Determine the probability that exactly four pieces will be defective.

Let X be the number of defective pieces among 300 pieces. $X \sim b(300, 0.02)$

$$P(X = 4) = {}_{300}C_{4}\,(0.02)^{4}(0.98)^{296} = 0.1338$$

By Poisson approximation:

$\lambda = np = (300)(0.02) = 6$

$$P(X = 4) = \frac{e^{-6}6^{4}}{4!} = 0.1339$$
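The quality of the Poisson approximation in the injection and defective-piece examples can be examined by computing both sides directly. The following Python sketch is an illustration under our own naming (`binom_pmf`, `poisson_pmf`); it is not part of the original notes.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# Bad reactions: n = 2000, p = 0.001, P(more than 2)
n, p = 2000, 0.001
exact_reactions = 1 - sum(binom_pmf(x, n, p) for x in range(3))
approx_reactions = 1 - sum(poisson_pmf(x, n * p) for x in range(3))

# Defective pieces: n = 300, p = 0.02, P(exactly 4)
exact_four = binom_pmf(4, 300, 0.02)
approx_four = poisson_pmf(4, 300 * 0.02)

print(round(exact_reactions, 4), round(approx_reactions, 4))
print(round(exact_four, 5), round(approx_four, 5))
```

In both cases n is at least 100 and p is below 0.1, so the two columns should agree to about three decimal places.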

CHAPTER 4 - SAMPLING DISTRIBUTIONS AND ESTIMATION

Definition

1. A sample statistic is a characteristic of a sample.

A population parameter is a characteristic of a population.

2. A statistic is a random variable that depends only on the observed random sample.

3. A sampling distribution is a probability distribution for a sample statistic. It indicates the

extent to which a sample statistic will tend to vary because of chance variation in random

sampling.

4. The standard deviation of the distribution of a sample statistic is known as the standard

error of the statistic.

An illustrating example

Suppose a population consists of four elements, {0,1,2,3}. A simple random sample of two

elements is to be drawn.

The population has two parameters: a mean $\mu$ of 1.5 and a variance $\sigma^2$ of 1.6667 (computed with divisor N − 1 = 3).

Obviously there are six possible samples (${}_{4}C_{2} = 6$). They are

Sample Sample mean error Probability

0,1 0.5 -1.0 1/6

0,2 1.0 -0.5 1/6

0,3 1.5 0 1/6

1,2 1.5 0 1/6

1,3 2.0 0.5 1/6

2,3 2.5 1.0 1/6

From the above table, we can see that if we draw a sample and use the sample mean to estimate

the population mean, the accuracy of our estimate depends on which sample we have drawn,

which in turn depends on chance.

The probability distribution of sample mean is known as a sampling distribution of sample mean,

as compiled in the following table:

Sample mean 0.5 1.0 1.5 2.0 2.5

Probability 1/6 1/6 2/6 1/6 1/6

The expected value of the sample mean is

$$E(\bar{y}) = \sum \bar{y}\,P(\bar{y}) = 0.5(1/6) + 1.0(1/6) + 1.5(2/6) + 2.0(1/6) + 2.5(1/6) = 1.5 = \bar{Y}$$

Hence the average value of the sample mean is equal to the population mean. We call the sample

mean an unbiased estimator of the population mean.

The variance of the sample mean (i.e., the average squared deviation of the sample mean from the population mean) is:

$$V(\bar{y}) = \sum (\bar{y} - \bar{Y})^{2} P(\bar{y})$$

$$= (0.5 - 1.5)^{2}\tfrac{1}{6} + (1.0 - 1.5)^{2}\tfrac{1}{6} + (1.5 - 1.5)^{2}\tfrac{2}{6} + (2.0 - 1.5)^{2}\tfrac{1}{6} + (2.5 - 1.5)^{2}\tfrac{1}{6} = 0.4167$$

Sampling Distribution of Mean

The Central Limit Theorem

If repeated samples of size n are drawn from any infinite population with mean $\mu$ and variance $\sigma^2$, and n is large (n ≥ 30), the distribution of $\bar{x}$, the sample mean, is approximately normal, with mean $\mu$ (i.e. $E(\bar{x}) = \mu$) and variance $\sigma^2/n$ (i.e. $V(\bar{x}) = \sigma^2/n$), and this approximation becomes better as n becomes larger.

Notes: As the previous illustrative example shows, the following modifications apply:

(i) If the population is finite, $V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^{2}}{n}$, where $(1 - n/N)$ is known as the finite population correction factor. When N is very big relative to n, the factor is approximately equal to 1.

(ii) If n is small, say less than 30, the sampling distribution is not so close to normal. A t-distribution will be used (discussed later).

In the above example, N = 4 and n = 2, so $V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^{2}}{n} = (1 - 2/4)(1.6667/2) = 0.4167$. If the population is big (or the sample is drawn with replacement), then $V(\bar{x}) = \frac{\sigma^{2}}{n} = 1.6667/2 = 0.8333$.

In this course we assume a big population or sampling with replacement.
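The four-element illustration can be verified by brute-force enumeration. The sketch below lists every sample of size 2 drawn without replacement, computes the mean and variance of the sample mean, and compares the variance with the finite population correction formula. It is a minimal illustration; the variable names are our own, and `statistics.variance` is used because it applies the N − 1 divisor that gives 1.6667 in the notes.

```python
from itertools import combinations
from statistics import mean, variance

population = [0, 1, 2, 3]
n = 2
N = len(population)

# All equally likely samples of size n drawn without replacement
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# Expected value and variance of the sample mean over all samples
expected_mean = mean(sample_means)
var_of_mean = sum((m - expected_mean) ** 2 for m in sample_means) / len(sample_means)

# Finite population correction formula, with sigma^2 computed using divisor N - 1
fpc_variance = (1 - n / N) * variance(population) / n

print(len(samples), expected_mean, round(var_of_mean, 4), round(fpc_variance, 4))
```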

Example 1

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Let $\bar{X}$ be the average life of the 16 bulbs. $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu = 800,\ \sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n} = \frac{40^{2}}{16}\right)$

$$P(\bar{X} < 775) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{775 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(Z < \frac{775 - 800}{40/\sqrt{16}}\right)$$

$$= P(Z < -2.5) = 0.00621$$

Example 2

The mean IQ scores of all students attending a college is 110 with a standard deviation of 10.

(a) If the IQ scores are normally distributed, what is the probability that the score of any one student is greater than 112?

(b) What is the probability that the mean score in a random sample of 36 students is greater

than 112?

(c) What is the probability that the mean score in a random sample of 100 students is greater

than 112?

Solution

(a) Let X be the student's IQ score. $X \sim N(110, 10^{2})$

$$P(X > 112) = P\left(Z > \frac{112 - 110}{10}\right) = P(Z > 0.2) = 0.4207$$

(b) Let $\bar{X}_1$ be the mean score of a sample of 36 students. $\bar{X}_1 \sim N\left(\mu_{\bar{x}} = \mu = 110,\ \sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n} = \frac{10^{2}}{36}\right)$

$$P(\bar{X}_1 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{36}}\right) = P(Z > 1.2) = 0.1151$$

(c) Let $\bar{X}_2$ be the mean score of a sample of 100 students. $\bar{X}_2 \sim N\left(\mu = 110,\ \frac{\sigma^{2}}{n} = \frac{10^{2}}{100}\right)$

$$P(\bar{X}_2 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{100}}\right) = P(Z > 2) = 0.0228$$
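The effect of the sample size on the spread of the sample mean can be seen by computing these probabilities directly. The following Python sketch assumes the sampling distribution $N(\mu, \sigma^2/n)$ stated in the Central Limit Theorem; the helper name `prob_mean_above` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_above(mu, sigma, n, threshold):
    """P(sample mean > threshold) when the sample mean ~ N(mu, sigma^2 / n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return 1 - sampling_dist.cdf(threshold)

# Light bulbs: mean 800, sd 40, n = 16, P(sample mean < 775)
p_bulbs = NormalDist(800, 40 / sqrt(16)).cdf(775)

# IQ scores: mean 110, sd 10, threshold 112, for n = 1, 36 and 100
p_single = prob_mean_above(110, 10, 1, 112)
p_n36 = prob_mean_above(110, 10, 36, 112)
p_n100 = prob_mean_above(110, 10, 100, 112)

print(round(p_bulbs, 5), round(p_single, 4), round(p_n36, 4), round(p_n100, 4))
```

The probability of exceeding 112 shrinks as n grows because the standard error $\sigma/\sqrt{n}$ shrinks.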

Estimation

Estimation is the process of using statistics from sample data to estimate the parameters of the

population. A statistic is a random variable which depends on which sample is drawn from a

population.

The following are some examples:

Estimator    Population parameter

1. $\bar{x}$    $\mu$

2. $s^{2}$    $\sigma^{2}$

3. $\hat{P}$    $P$

There are two important properties for an estimator, namely, unbiasedness and efficiency.

Unbiased estimator: An estimator, for example $\bar{x}$, is unbiased if and only if $E(\bar{x}) = \mu$.

Efficiency: The efficiency of an estimator, for example $\bar{x}$, is given by $V(\bar{x})$. The smaller $V(\bar{x})$ is, the more accurate $\bar{x}$ is as an estimator.

There are two types of estimate:

1. A point estimate is a single-value estimate of a population parameter, for example, $\hat{\mu} = \bar{x}$; $\hat{P} = p$.

2. An interval estimate of a population parameter gives an interval that may contain the true value of the parameter with a certain probability (i.e. confidence); for example, $\Pr(a < \mu < b) = 0.99$.

For a point estimate, both the accuracy and reliability of the estimation are unknown. For an

interval estimate, the width of the interval gives the accuracy and the probability gives the

reliability of the estimation.

Example 3

(a) The mean and standard deviation for the quality point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find a 95% confidence interval for the mean of the entire senior class.

(b) How large a sample is required in (a) if we want to be 95% confident that our estimate of µ is off by less than 0.05?

Solution

Let $\mu$ be the mean of the entire senior class.

Given: n = 36, $\bar{x} = 2.6$, s = 0.3, $(1 - \alpha) = 0.95 \Rightarrow \alpha = 0.05$

(a) A 95% confidence interval estimate for $\mu$ is

$$\bar{x} - z_{0.025}\frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{x} + z_{0.025}\frac{\hat{\sigma}}{\sqrt{n}} \Rightarrow 2.6 - 1.96\left(\frac{0.3}{\sqrt{36}}\right) < \mu < 2.6 + 1.96\left(\frac{0.3}{\sqrt{36}}\right)$$

$$\Rightarrow 2.502 < \mu < 2.698$$

(b) Let $n_1$ be the required sample size.

To be 95% confident that the estimate of $\mu$ is off by less than 0.05 would imply

$$z_{0.025}\left(\frac{\hat{\sigma}}{\sqrt{n_1}}\right) \leq 0.05 \Rightarrow 1.96\left(\frac{0.3}{\sqrt{n_1}}\right) \leq 0.05$$

$$\therefore n_1 \geq \left[\frac{(1.96)(0.3)}{0.05}\right]^{2} = 138.30 \cong 139$$
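The interval and the sample-size calculation for the senior-class example can be sketched as two small Python functions. The names `z_interval` and `sample_size` are our own; the critical value 1.96 is taken from the normal table as in the notes rather than computed.

```python
from math import ceil, sqrt

def z_interval(xbar, sd, n, z):
    """Large-sample confidence interval: xbar -/+ z * sd / sqrt(n)."""
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

def sample_size(sd, z, max_error):
    """Smallest n for which z * sd / sqrt(n) <= max_error."""
    return ceil((z * sd / max_error) ** 2)

Z_025 = 1.96  # z value with upper-tail area 0.025, read from the table

low, high = z_interval(2.6, 0.3, 36, Z_025)
n_required = sample_size(0.3, Z_025, 0.05)

print(round(low, 3), round(high, 3), n_required)
```

The sample size is rounded up, not to the nearest integer, because rounding down would leave the error bound slightly above 0.05.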

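A summary table for constructing 100(1 − α)% confidence intervals for means and proportions

Each entry lists what is being estimated, the conditions, and the formula.

Estimating: Mean
Conditions: Large samples (n ≥ 30) OR σ is known
Formula: $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Estimating: Mean *
Conditions: Small samples and σ unknown
Formula: $\bar{X} - t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}$, with $\nu = n - 1$

Estimating: Proportion
Conditions: Large sample
Formula: $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

Estimating: Difference of means
Conditions: Large samples OR $\sigma_1$ and $\sigma_2$ are known
Formula: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{\sigma}_1^{2}}{n_1} + \frac{\hat{\sigma}_2^{2}}{n_2}}$

Estimating: Difference of means
Conditions: Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 = \sigma_2$
Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\nu}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, with $\nu = n_1 + n_2 - 2$ and pooled estimate of the sample standard deviation
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^{2} + (n_2 - 1)s_2^{2}}{n_1 + n_2 - 2}}$$

Estimating: Difference of means
Conditions: Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 \neq \sigma_2$
Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}$, with
$$\nu = \frac{\left(s_1^{2}/n_1 + s_2^{2}/n_2\right)^{2}}{\dfrac{(s_1^{2}/n_1)^{2}}{n_1 - 1} + \dfrac{(s_2^{2}/n_2)^{2}}{n_2 - 1}}$$

Estimating: Difference of means
Conditions: Paired observations
Formula: $\bar{d} \pm t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}$, where $d = x_1 - x_2$ and $\nu = n - 1$

Estimating: Difference of proportions
Conditions: Large samples
Formula: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$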

Example 4

The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and

9.6 liters. Find a 95% confidence interval for the mean of all such containers, assuming an

approximate normal distribution.

Solution

Let $\mu$ be the mean of all such containers.

Given: n = 7, $\sum x = 70$, $\sum x^{2} = 700.48$

$$\bar{x} = \frac{\sum x}{n} = \frac{70}{7} = 10$$

$$s^{2} = \frac{\sum x^{2} - \frac{(\sum x)^{2}}{n}}{n - 1} = \frac{700.48 - \frac{70^{2}}{7}}{6} = 0.08$$

∴ $s = \sqrt{0.08} = 0.2828$; $(1 - \alpha) = 0.95 \Rightarrow \alpha = 0.05$, $t_{0.025,6} = 2.447$

A 95% confidence interval estimate for $\mu$ is

$$\bar{x} - t_{0.025,6}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{0.025,6}\frac{s}{\sqrt{n}} \Rightarrow 10 - 2.447\left(\frac{0.2828}{\sqrt{7}}\right) < \mu < 10 + 2.447\left(\frac{0.2828}{\sqrt{7}}\right)$$

$$\Rightarrow 9.738 < \mu < 10.262$$
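The small-sample interval for the sulfuric acid containers can be recomputed from the raw data. This is a minimal sketch: the function name `t_interval` is our own, and because the Python standard library has no t-distribution quantile function, the critical value 2.447 is entered by hand from the t table (6 degrees of freedom, upper-tail area 0.025), exactly as in the worked solution.

```python
from math import sqrt
from statistics import mean, stdev

contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]  # litres
T_025_6 = 2.447  # t table value, 6 degrees of freedom, upper-tail area 0.025

def t_interval(data, t_crit):
    """Confidence interval xbar -/+ t * s / sqrt(n) for a small normal sample."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = t_interval(contents, T_025_6)
print(round(low, 3), round(high, 3))
```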

Example 5

In a random sample of n = 500 families owning television sets in the city of Hamilton, Canada, it

was found that x = 340 owned color sets. Find a 95% confidence interval for the actual

proportion of families in this city with colour sets.

Let P be the actual proportion of families in this city with colour sets.

Given: n = 500, $\hat{p} = \frac{x}{n} = \frac{340}{500} = 0.68$, $(1 - \alpha) = 0.95 \Rightarrow \alpha = 0.05$

A 95% confidence interval for P is

$$\hat{p} - z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

$$\Rightarrow 0.68 - 1.96\sqrt{\frac{(0.68)(0.32)}{500}} < P < 0.68 + 1.96\sqrt{\frac{(0.68)(0.32)}{500}} \Rightarrow 0.64 < P < 0.72$$

Example 6

A standardized chemistry test was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference $\mu_1 - \mu_2$, where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all girls who might take this test.

Given: $n_1 = 75$, $n_2 = 50$, $\bar{x}_1 = 82$, $s_1 = 8$, $\bar{x}_2 = 76$, $s_2 = 6$, $(1 - \alpha) = 0.96 \Rightarrow \alpha = 0.04$

A 96% confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{x}_1 - \bar{x}_2) - z_{0.02}\sqrt{\frac{\hat{\sigma}_1^{2}}{n_1} + \frac{\hat{\sigma}_2^{2}}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{0.02}\sqrt{\frac{\hat{\sigma}_1^{2}}{n_1} + \frac{\hat{\sigma}_2^{2}}{n_2}}$$

Since $n_1 > 30$ and $n_2 > 30$, we take $\hat{\sigma}_1 = s_1$ and $\hat{\sigma}_2 = s_2$:

$$\Rightarrow (82 - 76) - 2.05\sqrt{\frac{8^{2}}{75} + \frac{6^{2}}{50}} < \mu_1 - \mu_2 < (82 - 76) + 2.05\sqrt{\frac{8^{2}}{75} + \frac{6^{2}}{50}}$$

$$\Rightarrow 3.43 < \mu_1 - \mu_2 < 8.57$$

Example 7

In a batch chemical process, two catalysts are being compared for their effect on the output of the process reaction. A sample of 12 batches was prepared using catalyst 1 and a sample of 10 batches was obtained using catalyst 2. The 12 batches for which catalyst 1 was used gave an average yield of 85 with a sample standard deviation of 4, while the second sample gave an average of 81 and a sample standard deviation of 5. Find a 90% confidence interval for the difference between the population means, assuming the populations are approximately normally distributed with equal variances.

Solution

Let $\mu_1$ and $\mu_2$ be the mean population yield using catalyst 1 and catalyst 2, respectively.

Given: $n_1 = 12$, $n_2 = 10$, $\bar{x}_1 = 85$, $s_1 = 4$, $\bar{x}_2 = 81$, $s_2 = 5$, $(1 - \alpha) = 0.90 \Rightarrow \alpha = 0.10$, $\nu = n_1 + n_2 - 2 = 12 + 10 - 2 = 20$, $t_{0.05,20} = 1.725$

Pooled estimate of the sample standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^{2} + (n_2 - 1)s_2^{2}}{n_1 + n_2 - 2}} = \sqrt{\frac{(12 - 1)4^{2} + (10 - 1)5^{2}}{12 + 10 - 2}} = 4.478$$

A 90% confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{x}_1 - \bar{x}_2) - t_{0.05,20}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{0.05,20}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$\Rightarrow (85 - 81) - (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}} < \mu_1 - \mu_2 < (85 - 81) + (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}}$$

$$\Rightarrow 0.69 < \mu_1 - \mu_2 < 7.31$$
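The two difference-of-means intervals (large samples, and small samples with a pooled standard deviation) can be sketched in Python as below. The function names are hypothetical, and the critical values 2.05 and 1.725 are entered from the tables as in the worked solutions rather than computed.

```python
from math import sqrt

def diff_interval_large(x1, s1, n1, x2, s2, n2, z):
    """Large-sample interval for mu1 - mu2, using s1 and s2 in place of sigma1 and sigma2."""
    half_width = z * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the common standard deviation."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def diff_interval_pooled(x1, s1, n1, x2, s2, n2, t_crit):
    """Small-sample interval for mu1 - mu2 assuming equal variances."""
    half_width = t_crit * pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

# Chemistry test: boys (n = 75) versus girls (n = 50), 96% interval, z = 2.05
grades_low, grades_high = diff_interval_large(82, 8, 75, 76, 6, 50, 2.05)

# Catalysts: 12 and 10 batches, 90% interval, t(0.05, 20) = 1.725
sp = pooled_sd(4, 12, 5, 10)
yield_low, yield_high = diff_interval_pooled(85, 4, 12, 81, 5, 10, 1.725)

print(round(grades_low, 2), round(grades_high, 2))
print(round(sp, 3), round(yield_low, 2), round(yield_high, 2))
```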

Example 8

The weights of 10 adults selected randomly, before and after a certain new diet was introduced, were recorded as follows:

Adult   Before ($x_1$)   After ($x_2$)   Difference ($d = x_1 - x_2$)
1       76               81              -5
2       60               52              8
3       85               87              -2
4       58               70              -12
5       91               86              5
6       75               77              -2
7       82               90              -8
8       64               63              1
9       79               85              -6
10      88               83              5

Find a 98% confidence interval for the mean difference in weight.

Solution

$$\bar{d} = \frac{\sum d_i}{n} = -1.6$$

$$s_d^{2} = \frac{\sum (d_i - (-1.6))^{2}}{n - 1} = 40.7, \quad \text{so } s_d = 6.38$$

For $\nu = n - 1 = 9$, $t_{0.01} = 2.821$.

A 98% confidence interval is

$$-1.6 \pm (2.821)\left(\frac{6.38}{\sqrt{10}}\right)$$

That is, $-7.29 < \mu_d < 4.09$.
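The paired-observation interval for the diet data can be recomputed from the table. The sketch below is illustrative; the variable names are our own, and the critical value 2.821 is taken from the t table (9 degrees of freedom, upper-tail area 0.01) as in the solution.

```python
from math import sqrt
from statistics import mean, stdev

before = [76, 60, 85, 58, 91, 75, 82, 64, 79, 88]
after = [81, 52, 87, 70, 86, 77, 90, 63, 85, 83]
T_01_9 = 2.821  # t table value, 9 degrees of freedom, upper-tail area 0.01

# Differences d = before - after, one per adult
d = [b - a for b, a in zip(before, after)]
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation of the differences

half_width = T_01_9 * s_d / sqrt(len(d))
low, high = d_bar - half_width, d_bar + half_width

print(d_bar, round(s_d, 2), round(low, 2), round(high, 2))
```

Since the interval contains 0, these data do not show a clear change in mean weight at the 98% confidence level.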

Example 9

A certain change in a manufacturing procedure for component parts is being considered. Samples

are taken using both the existing and the new procedure in order to determine if the new

procedure results in an improvement. If 75 of 1500 items from the existing procedure were found

to be defective and 80 of 2000 items from the new procedure were found to be defective, find a

90% confidence interval for the true difference in the fraction of defectives between the existing

and the new process.

Solution

Let $P_1$ and $P_2$ be the true fraction of defectives of the existing and the new processes, respectively.

Given: $n_1 = 1500$, $x_1 = 75$, $\hat{p}_1 = \frac{75}{1500} = 0.05$

$n_2 = 2000$, $x_2 = 80$, $\hat{p}_2 = \frac{80}{2000} = 0.04$

$(1 - \alpha) = 0.90 \Rightarrow \alpha = 0.10$

A 90% confidence interval for $P_1 - P_2$ is:

$$(\hat{p}_1 - \hat{p}_2) - z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

$$\Rightarrow (0.01) - 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}} < P_1 - P_2 < (0.01) + 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}}$$

$$\Rightarrow -0.001697 < P_1 - P_2 < 0.021697$$
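The large-sample intervals for a single proportion and for a difference of proportions can be sketched as follows. The function names are hypothetical, and the z values 1.96 and 1.64 are the table values used in the worked solutions.

```python
from math import sqrt

def proportion_interval(x, n, z):
    """Large-sample interval for a proportion: p_hat -/+ z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def proportion_diff_interval(x1, n1, x2, n2, z):
    """Large-sample interval for P1 - P2."""
    p1, p2 = x1 / n1, x2 / n2
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half_width, (p1 - p2) + half_width

# Colour television sets: 340 of 500 families, 95% interval
tv_low, tv_high = proportion_interval(340, 500, 1.96)

# Defectives: 75 of 1500 (existing) versus 80 of 2000 (new), 90% interval
def_low, def_high = proportion_diff_interval(75, 1500, 80, 2000, 1.64)

print(round(tv_low, 2), round(tv_high, 2))
print(round(def_low, 6), round(def_high, 6))
```

The second interval contains 0, so these samples alone do not establish that the new procedure has a lower fraction of defectives.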

Lecture 5 - Introduction to Test of Hypothesis

Statistical Hypothesis

Consider the following example:

A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \neq 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.

When a random sample is drawn from a population (the 50 lines randomly selected), the sample information can be used to assess the validity of some conjecture, or hypothesis. Here $\mu = 8$ kilograms is known as the null hypothesis and $\mu \neq 8$ kilograms is the alternative hypothesis. They are complementary to each other, and we need to decide which one to accept on the basis of the sample result of 50 lines.

Now let us construct a 95% confidence interval for the mean breaking strength of the population:

$$P\left(7.8 - 1.96\,\frac{0.5}{\sqrt{50}} < \mu < 7.8 + 1.96\,\frac{0.5}{\sqrt{50}}\right) = 0.95$$

i.e., $P(7.6614 < \mu < 7.9386) = 0.95$.

As there is a probability of 0.95 that the mean breaking strength is between 7.66 kg and 7.94 kg, it is highly unlikely that the null hypothesis $\mu = 8$ kg is true, and hence it should be rejected.

There are four possible situations for the above decision making exercise:

              $H_0$ is correct     $H_0$ is wrong
Accept $H_0$  Correct decision     Type 2 error
Reject $H_0$  Type 1 error         Correct decision

We still have a probability of 1 − 0.95, or 0.05, of rejecting a true $H_0$. We call this probability the ‘level of significance’ or $\alpha$, which is the probability of committing a type 1 error.

The rationale of hypothesis testing is simply outlined as above. There are however some formal

concepts and procedures to conduct the test. The details are put down below.

Some Hypothesis Testing Terminology

1. Null hypothesis, H0

A hypothesis that is held to be true until very strong evidence to the contrary is obtained.

H0 : $\mu = \mu_0$

2. Alternative hypothesis, H1

A hypothesis that is complementary to the null hypothesis. Hence it will be accepted if the null hypothesis is rejected.

$H_1: \mu \neq \mu_0$ (two-tail test)

$H_1: \mu > \mu_0$ (one-tail test)

$H_1: \mu < \mu_0$ (one-tail test)

In the one-tail test we have some expectation about the direction of the error when the null hypothesis is wrong, while in the two-tail test we don’t have such an expectation.

3. Test statistic

The value, based on the sample, used to determine whether the null hypothesis should be rejected or accepted.

4. Critical region

A region such that the null hypothesis is rejected if the test statistic falls in it.

5. Types of error

(a) Type I error: Reject H0 when H0 is true

(b) Type II error: Accept H0 when H1 is true

6. The significance level, α

The probability of committing a type 1 error, i.e., P(Type I error) = α.

The probability of committing a type 2 error is β; i.e., P(Type II error) = β.

Basic Steps in Testing Hypothesis

1. Formulate the null hypothesis.

2. Formulate the alternative hypothesis.

3. Specify the level of significance to be used.

4. Select the appropriate test statistic and establish the critical region.

5. Compute the value of the test statistic.

6. Conclusion: Reject H0 if the statistic has a value in the critical region, otherwise

accept H0.

Tests concerning means

The tests concerning means and proportions are summarized in the following table.

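Each entry lists H0, the conditions, and the test statistic.

H0: $\mu = \mu_0$
Conditions: Large samples (n ≥ 30) OR σ is known
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

H0: $\mu = \mu_0$
Conditions: Small samples and σ unknown
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $\nu = n - 1$

H0: $\mu_1 - \mu_2 = d_0$
Conditions: Large samples OR $\sigma_1$ and $\sigma_2$ are known
Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}$

H0: $\mu_1 - \mu_2 = d_0$
Conditions: Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 = \sigma_2$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ with $\nu = n_1 + n_2 - 2$ and
$$s_p^{2} = \frac{(n_1 - 1)s_1^{2} + (n_2 - 1)s_2^{2}}{n_1 + n_2 - 2}$$
if $\sigma_1 = \sigma_2$ but unknown

H0: $\mu_1 - \mu_2 = d_0$
Conditions: Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 \neq \sigma_2$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}}}$ with
$$\nu = \frac{\left(s_1^{2}/n_1 + s_2^{2}/n_2\right)^{2}}{\dfrac{(s_1^{2}/n_1)^{2}}{n_1 - 1} + \dfrac{(s_2^{2}/n_2)^{2}}{n_2 - 1}}$$

H0: $\mu_1 - \mu_2 = d_0$
Conditions: Paired observations
Test statistic: $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$ with $\nu = n - 1$

H0: $p = p_0$
Conditions: Large sample
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

H0: $p_1 - p_2 = 0$
Conditions: Large samples
Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ where $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$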

Example 1

A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \neq 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.

Null hypothesis: $\mu = 8$ kilograms

Alternative hypothesis: $\mu \neq 8$ kilograms

Level of significance: 0.01

Critical region: $Z > z_{0.005} = 2.58$ or $Z < -z_{0.005} = -2.58$

Computation:

n = 50, $\bar{x} = 7.8$, $\sigma = 0.5$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.828$$

Conclusion: As the sample z (= -2.828) falls inside the critical region, reject the null hypothesis at the 0.01 level of significance and conclude that $\mu$ is significantly smaller than 8 kilograms.
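The steps of the fishing-line test can be mirrored in a short Python function. This is a sketch under our own naming (`z_test_two_sided`); it computes the critical value with `statistics.NormalDist` instead of reading 2.58 from the table, so the critical value differs slightly in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha):
    """Two-tail z test of H0: mu = mu0 when sigma is known or n is large.

    Returns the test statistic, the critical value and the decision.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject_h0 = abs(z) > z_crit
    return z, z_crit, reject_h0

z, z_crit, reject_h0 = z_test_two_sided(xbar=7.8, mu0=8.0, sigma=0.5, n=50, alpha=0.01)
print(round(z, 3), round(z_crit, 3), reject_h0)
```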

Example 2

The average length of time for students to register for fall classes at a certain college has been 50

minutes with a standard deviation of 10 minutes. A new registration procedure using modern

computing machines is being tried. If a random sample of 12 students had an average registration

time of 42 minutes with a standard deviation of 11.9 minutes under the new system, test the

hypothesis that the population mean is now less than 50, using a level of significance of (1) 0.05,

and (2) 0.01. Assume the population of times to be normal.

Let $\mu$ be the population mean time for students to register under the new registration procedure.

(1) Null hypothesis: $\mu = 50$ minutes

Alternative hypothesis: $\mu < 50$ minutes

Level of significance: 0.05

Critical region: (n = 12 < 30 and the new σ is unknown, so the t-test should be used)

degrees of freedom $\nu = n - 1 = 12 - 1 = 11$

∴ $t < -t_{0.05,11} = -1.796$

Computation:

n = 12, $\bar{x} = 42$, s = 11.9

$$\therefore t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{42 - 50}{11.9/\sqrt{12}} = -2.329$$

Conclusion: As the sample t (= -2.329) falls inside the critical region, reject the null hypothesis at the 0.05 level of significance and conclude that $\mu$ is significantly smaller than 50 minutes.

(2) Identical to (1) except that the critical region is replaced by:

$t < -t_{0.01,11} = -2.718$

and the corresponding conclusion changes as follows:

As the sample t (= -2.329) falls outside the critical region, do not reject the null hypothesis at the 0.01 level of significance; $\mu$ is not highly significantly smaller than 50 minutes.

Example 3

An experiment was performed to compare the abrasive wear of two different laminated materials.

Twelve pieces of material 1 were tested, by exposing each piece to a machine measuring wear.

Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed.

The samples of material 1 gave an average (coded) wear of 85 units with a standard deviation of

4, while the samples of material 2 gave an average of 81 and a standard deviation of 5. Test the

hypothesis that the two types of material exhibit the same mean abrasive wear at the 0.10 level of

significance. Assume the populations to be approximately normal with equal variances.

Let

1

µ and

2

µ be the mean abrasive wear of material 1 and 2 respectively.

Null hypothesis:

2 1

µ µ · , i.e. 0

2 1

· − µ µ

Alternative hypothesis:

2 1

µ µ ≠ , i.e. 0

2 1

≠ − µ µ

Level of significance: 0.10

Critical region: (As both

1

n and

2

n are smaller than 30 and their standard deviations are

unknown, so t-test has to be used.)

20 2 10 12 2

2 1

· − + · − + · n n ν

, ∴

725 . 1

05 . 0 , 20

· >t t

or

725 . 1

05 . 0 , 20

− · − < t t

46

Computation:

12

1

· n 85

1

· x 4

1

· s

10

2

· n 81

2

· x 5

2

· s

05 . 20

2 10 12

5 ) 9 ( 4 ) 11 (

2

) 1 ( ) 1 (

2 2

2 1

2

2 2

2

1 1 2

·

− +

+

·

− +

− + −

·

n n

s n s n

s

p

086 . 2

10

1

12

1

05 . 20

0 ) 81 85 (

1 1

) ( ) (

2 1

2 1 2 1

·

+

− −

·

+

− − −

·

n n

s

x x

t

p

µ µ

Conclusion: As the sample t (=2.086) falls inside the critical region, so reject the null hypothesis

at 0.10 level of significance and conclude that the mean abrasive wear of material 1 is

significantly higher than that of the material 2.
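The pooled-variance computation in Example 3 can be checked with a few lines of Python. This is only a sketch working from the summary statistics given in the example; the function name `pooled_t` and its argument names are illustrative choices, not part of the notes.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2, delta0=0.0):
    """Two-sample t statistic assuming equal population variances.

    delta0 is the hypothesised value of mu1 - mu2 (0 under the null here).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2 - delta0) / se
    return sp2, t, df

sp2, t, df = pooled_t(12, 85, 4, 10, 81, 5)
print(round(sp2, 2), round(t, 3), df)   # 20.05 2.086 20
```

With 20 degrees of freedom the two-sided critical value at the 0.10 level is 1.725 (from the t table used above), so the computed t lies in the critical region.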

Example 4

Five samples of a ferrous-type substance are to be used to determine if there is a difference

between a laboratory chemical analysis and an X-ray fluorescence analysis of the iron content.

Each sample was split into two sub-samples and the two types of analysis were applied.

Following are the coded data showing the iron content analysis:

                    Sample
Analysis      1      2      3      4      5
X-ray        2.0    2.0    2.3    2.1    2.4
Chemical     2.2    1.9    2.5    2.3    2.4

Assuming the populations normal, test at the 0.05 level of significance whether the two methods

of analysis give, on the average, the same result.

Let $\mu_1$ and $\mu_2$ be the mean iron content determined by the laboratory chemical analysis and X-ray fluorescence analysis respectively; and $\mu_D$ be the mean of the population of differences of paired measurements.

Null hypothesis: $\mu_1 = \mu_2$ or $\mu_D = 0$

Alternative hypothesis: $\mu_1 \ne \mu_2$ or $\mu_D \ne 0$

Level of significance: 0.05

Critical region: (As n = 5 < 30, so t-test should be used.)


∴ $t > t_{0.025,\,4} = 2.776$ or $t < -t_{0.025,\,4} = -2.776$

Computation:

                    Sample
Analysis      1      2      3      4      5
X-ray        2.0    2.0    2.3    2.1    2.4
Chemical     2.2    1.9    2.5    2.3    2.4
$d_i$       -0.2    0.1   -0.2   -0.2    0
$d_i^2$      0.04   0.01   0.04   0.04   0

$\sum_{i=1}^{5} d_i = -0.5$, $\sum_{i=1}^{5} d_i^2 = 0.13$

$$\bar{d} = \frac{\sum d_i}{n} = \frac{-0.5}{5} = -0.1$$

$$s_d^2 = \frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n-1)} = \frac{(5)(0.13) - (-0.5)^2}{(5)(4)} = 0.02$$

$$t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n}} = \frac{(-0.1) - 0}{\sqrt{0.02/5}} = -1.5811$$

Conclusion: As the sample t (= -1.5811) falls outside the critical region, do not reject the null hypothesis at the 0.05 level of significance and conclude that there is no significant difference in the mean iron content determined by the above two analyses.
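The paired-difference computation of Example 4 can be reproduced as follows. This is a minimal sketch; the helper name `paired_t` is an illustrative choice, and the differences are taken as X-ray minus chemical, as in the table above.

```python
import math

xray = [2.0, 2.0, 2.3, 2.1, 2.4]
chemical = [2.2, 1.9, 2.5, 2.3, 2.4]

def paired_t(a, b, mu_d=0.0):
    """t statistic for the mean of paired differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    d_bar = sum(d) / n
    # Computational form of the sample variance of the differences.
    var_d = (n * sum(v * v for v in d) - sum(d) ** 2) / (n * (n - 1))
    t = (d_bar - mu_d) / math.sqrt(var_d / n)
    return d_bar, var_d, t

d_bar, var_d, t = paired_t(xray, chemical)
print(round(d_bar, 2), round(var_d, 2), round(t, 4))
```

The printed values should agree with the hand computation (mean difference -0.1, variance 0.02, t about -1.58), up to floating-point rounding.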

Tests Concerning Proportions

Example 5


A manufacturing company has submitted a claim that 90% of items produced by a certain process

are non-defective. An improvement in the process is being considered that they feel will lower

the proportion of defective below the current 10%. In an experiment 100 items are produced with

the new process and 5 are defective. Is this evidence sufficient to conclude that the method has

been improved? Use a 0.05 level of significance.

Let P be the proportion of defective product in the new production process.

Null hypothesis: P = 0.1

Alternative hypothesis: P < 0.1

Level of significance: 0.05

Critical region: $Z < -z_{0.05} = -1.64$

Computation:

$n = 100$, $x = 5$, $\hat{p} = \dfrac{x}{n} = \dfrac{5}{100} = 0.05$

$$z = \frac{\hat{p} - P}{\sqrt{\dfrac{P(1-P)}{n}}} = \frac{0.05 - 0.1}{\sqrt{\dfrac{(0.1)(0.9)}{100}}} = -1.667$$

Conclusion: As the sample z (=-1.667) falls inside the critical region, so reject the null hypothesis

at 0.05 level of significance and conclude that P is significantly smaller than 0.1. That is, the

production method has been improved in lowering the proportion of defective below the current

10%.
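The z statistic of Example 5 can be verified with a short sketch. The function name `one_proportion_z` is illustrative; the standard error uses the hypothesised proportion, as in the formula above.

```python
import math

def one_proportion_z(x, n, p0):
    """z statistic for testing a single proportion against p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis
    return p_hat, (p_hat - p0) / se

p_hat, z = one_proportion_z(5, 100, 0.1)
print(p_hat, round(z, 3))   # 0.05 -1.667
```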

Example 6

A vote is to be taken among the residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. The construction site is within the town limits and for this reason many voters in the county feel that the proposal will pass because of the large proportion of town voters who favor the construction. To determine if there is a significant difference in the proportion of town voters and county voters favoring the proposal, a poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it, would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters? Use a 0.025 level of significance.

Let $P_1$ and $P_2$ be the proportions of town voters and county voters, respectively, favouring the proposal.

Null hypothesis: $P_1 = P_2$ or $P_1 - P_2 = 0$

Alternative hypothesis: $P_1 > P_2$ or $P_1 - P_2 > 0$

Level of significance: 0.025


Critical region: $Z > z_{0.025} = 1.96$

Computation:

$n_1 = 200$, $x_1 = 120$, $\hat{p}_1 = \dfrac{x_1}{n_1} = \dfrac{120}{200} = 0.6$

$n_2 = 500$, $x_2 = 240$, $\hat{p}_2 = \dfrac{x_2}{n_2} = \dfrac{240}{500} = 0.48$

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{(200)(0.6) + (500)(0.48)}{200 + 500} = 0.514$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}(1 - \hat{p})\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}} = \frac{(0.6 - 0.48) - 0}{\sqrt{(0.514)(0.486)\left[\dfrac{1}{200} + \dfrac{1}{500}\right]}} = 2.870$$

Conclusion: As the sample z (= 2.870) falls inside the critical region, reject the null hypothesis at the 0.025 level of significance and conclude that the proportion of town voters favouring the proposal is significantly larger than that of the county voters.
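The pooled two-proportion statistic of Example 6 can be sketched in the same way. The function name `two_proportion_z` is an illustrative choice; the pooled estimate is computed from the raw counts, which is equivalent to the weighted form used above.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2 using the pooled proportion estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p1 - p2) / se

pooled, z = two_proportion_z(120, 200, 240, 500)
print(round(pooled, 3), round(z, 2))   # 0.514 2.87
```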


Chapter 7 - Chi-square Tests

There are two types of chi-square tests: goodness-of-fit test and tests for independence.

Goodness-of-fit Test

A test to determine if a population has a specified theoretical distribution. The test is based on

how good a fit we have between the frequency of occurrence of observations in an observed

sample and the expected frequencies obtained from the hypothesized distribution.

Theorem: A goodness-of-fit test between observed and expected frequencies is based on the quantity

$$\chi^2_{test} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $\chi^2_{test}$ is a value of the random variable whose sampling distribution is approximated very closely by the Chi-square distribution, $O_i$ is the observed frequency of cell i, and $E_i$ is the expected frequency of cell i.

The number of degrees of freedom in a Chi-square goodness-of-fit test is equal to the number of cells minus the number of quantities obtained from the observed data that are used in the calculations of the expected frequencies.

For a level of significance equal to $\alpha$, $\chi^2_{test} > \chi^2_{\alpha}$ constitutes the critical region. The decision criterion described here should not be used unless each of the expected frequencies is at least equal to 5.

Example 1

Consider the tossing of a die 120 times.

Face         1    2    3    4    5    6
Observed    20   22   17   18   19   24
Expected

By comparing the observed frequencies with the expected frequencies, one has to decide whether the die is fair or not.

Null hypothesis: the die is a fair die, i.e. $P(X = i) = \dfrac{1}{6}$ for i = 1, 2, 3, 4, 5, and 6

Alternative hypothesis: the die is not a fair die


Level of significance: 0.05

Critical region: $\nu = n - 1 = 6 - 1 = 5$; ∴ $\chi^2 > \chi^2_{0.05,\,5} = 11.07$

Computation:

Expected value $= nP(X = i) = 120\left(\dfrac{1}{6}\right) = 20$

i                   1    2    3    4    5    6
Observed ($O_i$)   20   22   17   18   19   24
Expected ($E_i$)   20   20   20   20   20   20
$O_i - E_i$         0    2   -3   -2   -1    4

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{0^2}{20} + \frac{2^2}{20} + \frac{(-3)^2}{20} + \frac{(-2)^2}{20} + \frac{(-1)^2}{20} + \frac{4^2}{20} = 1.7$$

Conclusion: As the sample $\chi^2$ (= 1.7) falls outside the critical region, do not reject the null hypothesis and conclude that there is no evidence that the die is not fair.
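The goodness-of-fit statistic for the die example can be checked with a short sketch. The critical value 11.07 is simply copied from the chi-square table used above, not computed by the code.

```python
observed = [20, 22, 17, 18, 19, 24]
n = sum(observed)                            # 120 tosses
expected = [n / 6] * 6                       # fair die: 20 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                       # no parameters estimated from the data
critical = 11.07                             # chi-square table value for 0.05 and 5 df

print(round(chi2, 2), df, chi2 > critical)   # 1.7 5 False
```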

Example 2

The following distribution of battery lives may be approximated by the normal distribution.

Class boundaries    $O_i$    z-value    p-value    $E_i$
1.45 - 1.95           2
1.95 - 2.45           1
2.45 - 2.95           4
2.95 - 3.45          15
3.45 - 3.95          10
3.95 - 4.45           5
4.45 - 4.95           3

Chi-squared test can be applied to test whether the above frequency distribution can be

approximated by a normal distribution or not.

Null hypothesis: the distribution can be approximated by a normal distribution

Alternative hypothesis: the distribution cannot be approximated by a normal distribution


Level of significance: 0.05

Critical region: $\chi^2 > \chi^2_{0.05,\,\nu}$, where $\nu = n - 3$, and n is the number of cells.

Computation:

For finding the expected values, the mean and standard deviation of the frequency distribution

have to be found first.

Class boundaries    Frequency ($f_i = O_i$)    Class mark ($x_i$)
1.45 - 1.95           2                          1.7
1.95 - 2.45           1                          2.2
2.45 - 2.95           4                          2.7
2.95 - 3.45          15                          3.2
3.45 - 3.95          10                          3.7
3.95 - 4.45           5                          4.2
4.45 - 4.95           3                          4.7

$n = 40$, $\sum fx = 136.5$, $\sum fx^2 = 484.75$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{136.5}{40} = 3.4125$$

$$s = \sqrt{\frac{\sum fx^2 - \left(\sum fx\right)^2 / n}{n - 1}} = \sqrt{\frac{484.75 - (136.5)^2/40}{39}} = 0.6969$$

$$z\text{-value} = \left[\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right]$$

where $L_i$ and $U_i$ are the Lower and Upper Boundaries of the ith class.

$$p\text{-value} = P\left[\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right]$$

$$E_i = (40)\,P\left[\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right]$$

Class boundaries    $O_i$    z-value              p-value    $E_i$
1.45 - 1.95           2      Z < -2.10            .0179       0.716
1.95 - 2.45           1      -2.10 < Z < -1.38    .0659       2.636
2.45 - 2.95           4      -1.38 < Z < -0.66    .1708       6.832
2.95 - 3.45          15      -0.66 < Z < 0.05     .2653      10.612
3.45 - 3.95          10      0.05 < Z < 0.77      .2595      10.38
3.95 - 4.45           5      0.77 < Z < 1.49      .1525       6.1
4.45 - 4.95           3      1.49 < Z             .0681       2.724

In order to satisfy the rule that the expected value in each cell is larger than or equal to 5, we have to combine the first three classes into one cell and the last two classes into another cell. As such, the number of cells (n) is 4.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(7 - 10.184)^2}{10.184} + \frac{(15 - 10.612)^2}{10.612} + \frac{(10 - 10.38)^2}{10.38} + \frac{(8 - 8.824)^2}{8.824} = 2.901$$

Conclusion: Since $\chi^2_{0.05,\,1} = 3.841$, the sample $\chi^2$ (= 2.901) falls outside the critical region. As such, do not reject the null hypothesis and conclude that the distribution of battery lives can be approximated by a normal distribution.
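The expected frequencies in the battery-life example can be rebuilt programmatically. The sketch below assumes, as the table above does, that the z-values are rounded to two decimals and that the first and last classes are treated as open-ended; the helper `phi` (standard normal CDF via `math.erf`) stands in for the z-table lookup, so the figures may differ from the hand calculation in the last decimal place.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

boundaries = [1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
observed = [2, 1, 4, 15, 10, 5, 3]
mean, sd, n = 3.4125, 0.6969, 40

# Standardise the interior boundaries, rounded to 2 d.p. as in a z-table.
cuts = [round((b - mean) / sd, 2) for b in boundaries[1:-1]]
cdf = [0.0] + [phi(z) for z in cuts] + [1.0]     # open-ended end classes
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

# Pool classes 1-3 and 6-7 so every expected frequency is at least 5.
groups = [(0, 3), (3, 4), (4, 5), (5, 7)]
obs_g = [sum(observed[a:b]) for a, b in groups]
exp_g = [sum(expected[a:b]) for a, b in groups]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_g, exp_g))
df = len(groups) - 3     # cells minus 3 quantities taken from the data (n, mean, s)

print([round(e, 3) for e in exp_g], round(chi2, 2), df)
```

The pooled expected frequencies should come out close to 10.18, 10.61, 10.38 and 8.82, and the statistic close to 2.90, in line with the hand computation.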

Test for Independence

The Chi-square test procedure can also be used to test the hypothesis of independence of two

variables/attributes. The observed frequencies of two variables are entered in a two-way

classification table, or contingency table.

Remark: The expected frequency of the cell in the i-th row and j-th column in the contingency table is

$$E_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{grand total}}$$

The degrees of freedom for the contingency table is equal to (r −1) (c −1) where r is the number of

rows and c is the number of columns in the table.


Example 3

Suppose that we wish to study the relationship between grade point average and appearance.

                          Grade Point Average
Appearance        1         2         3         4       Totals
attractive      14 ( )    11 ( )    10 ( )     5 ( )      40
ordinary        10 ( )    16 ( )    16 ( )    14 ( )      56
unattractive     3 ( )     4 ( )     7 ( )    10 ( )      24
Totals          27        31        33        29         120

Null hypothesis: There is no relationship between grade point average and appearance. That is,

the two characteristics are independent.

Alternative hypothesis: There is a relationship between grade point average and appearance. That

is, the two characteristics are not independent.

Level of significance: 0.05

Critical region: $\chi^2 > \chi^2_{0.05,\,\nu}$, where $\nu = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$

Computation:

                          Grade Point Average
Appearance        1            2             3            4          Totals
attractive      14 (9)      11 (10.33)    10 (11)       5 (9.67)       40
ordinary        10 (12.6)   16 (14.47)    16 (15.4)    14 (13.53)      56
unattractive     3 (5.4)     4 (6.2)       7 (6.6)     10 (5.8)        24
Totals          27          31            33           29             120

(Expected frequencies are shown in parentheses.)

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(14 - 9)^2}{9} + \frac{(11 - 10.33)^2}{10.33} + \frac{(10 - 11)^2}{11} + \frac{(5 - 9.67)^2}{9.67}$$

$$+ \frac{(10 - 12.6)^2}{12.6} + \frac{(16 - 14.47)^2}{14.47} + \frac{(16 - 15.4)^2}{15.4} + \frac{(14 - 13.53)^2}{13.53}$$

$$+ \frac{(3 - 5.4)^2}{5.4} + \frac{(4 - 6.2)^2}{6.2} + \frac{(7 - 6.6)^2}{6.6} + \frac{(10 - 5.8)^2}{5.8} = 10.818$$

Conclusion: Since $\chi^2_{0.05,\,6} = 12.592$, the sample $\chi^2$ (= 10.818) falls outside the critical region. So do not reject the null hypothesis and conclude that there is no evidence of a relationship between grade point average and appearance.
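The expected frequencies and the test statistic for a contingency table follow mechanically from the row and column totals. A minimal sketch for the table of Example 3 is given below; because it keeps unrounded expected frequencies, the statistic may differ slightly from the hand value of 10.818.

```python
table = [
    [14, 11, 10, 5],    # attractive
    [10, 16, 16, 14],   # ordinary
    [3, 4, 7, 10],      # unattractive
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# E_ij = (total of row i) * (total of column j) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)

print(round(chi2, 2), df)   # roughly 10.82 with 6 degrees of freedom
```

The same code applies unchanged to a test for homogeneity, since only the interpretation of the rows and columns differs.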

Test for Homogeneity

To test the hypothesis that several population proportions are equal.

Remark: The approach for the test of homogeneity is the same as for the test of

independence of variables/attributes.

Example 4

A study of the purchase decisions for 3 stock portfolio managers, A, B, and C was conducted to

compare the rates of stock purchases that resulted in profits over a time period that was less than

or equal to one year. One hundred randomly selected purchases obtained for each of the managers

showed the results given in the table. Do these data provide evidence of differences among the

rates of successful purchases for the three portfolio managers? Test with $\alpha = 0.05$.

                                      Manager
Result                            A       B       C
Purchases show profit            63      71      55
Purchases do not show profit     37      29      45
Total                           100     100     100

Null hypothesis: the rates of stock purchases that resulted in profit were the same for the three

stock portfolio managers

Alternative: their rates were not all the same

Level of significance: 0.05

Critical region: $\nu = (2 - 1)(3 - 1) = 2$; ∴ $\chi^2 > \chi^2_{0.05,\,2} = 5.991$

Computation:

                                        Manager
Result                            A          B          C        Total
Purchases show profit            63 (63)    71 (63)    55 (63)    189
Purchases do not show profit     37 (37)    29 (37)    45 (37)    111
Total                           100        100        100         300

$$\chi^2 = \frac{(63 - 63)^2}{63} + \frac{(71 - 63)^2}{63} + \frac{(55 - 63)^2}{63} + \frac{(37 - 37)^2}{37} + \frac{(29 - 37)^2}{37} + \frac{(45 - 37)^2}{37} = 5.49$$

Conclusion: As the sample $\chi^2$ (= 5.49) falls outside the critical region, do not reject the null hypothesis and conclude that there is insufficient evidence that the rates of purchases resulting in profit differ among the three portfolio managers.


A useful method for summarizing a set of data is the construction of a frequency table, or a frequency distribution. That is, we divide the overall range of values into a number of classes and count the number of observations that fall into each of these classes or intervals.

The general rules for constructing a frequency distribution are:
(i) There should not be too few or too many classes.
(ii) Insofar as possible, equal class intervals are preferred. But the first and last classes can be open-ended to cater for extreme values.

In example 1, the sample size is 100 and the range for the data is 113 (137 - 24). A frequency distribution with six classes is appropriate and it is shown below.

Frequency distribution of household town gas consumption

Town gas monthly consumption (in cubic metres)    Number of households
20 - 39                                              5
40 - 59                                             15
60 - 79                                             25
80 - 99                                             30
100 - 119                                           18
120 - 139                                            7
Total                                              100

Class limits: are the numbers that typically serve to identify the classes in a listing of a frequency distribution. Thus, in the above frequency distribution, for the class whose frequency is 30, its lower class limit is 80 and upper class limit is 99.

As contrasted to a class limit, a class boundary is the precise point that separates one class from another, rather than being a value indicated in one of the classes. A class boundary is typically located midway between the upper limit of a class and the lower limit of the next higher class adjoining it. Therefore the class boundary separating the class 60-79 and the class 80-99 is halfway between 79 and 80, that is, at the point 79.5.

Class interval: is the width of a class. The class interval of a class is computed by subtracting the lower limit (boundary) of the class from the lower limit (boundary) of the next class.

Class midpoint or class mark: is the point dividing the class into equal halves on the basis of class interval. This point can be obtained by adding the lower and upper limits (boundaries) of a class and dividing by 2.

Relative frequency of a class: is the frequency of the class divided by the total frequency of the distribution.

Cumulative frequency distribution: shows the number of items of a series that are less than (or more than) certain specified values.
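The definitions above can be illustrated on the town gas frequency distribution. The sketch below assumes integer class limits with a constant gap between adjacent classes (1 here), so that each boundary lies half a gap beyond the limit; the variable names are illustrative.

```python
# Class limits and frequencies from the town gas frequency distribution.
limits = [(20, 39), (40, 59), (60, 79), (80, 99), (100, 119), (120, 139)]
freqs = [5, 15, 25, 30, 18, 7]

total = sum(freqs)
gap = limits[1][0] - limits[0][1]        # distance between adjacent limits

boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]
intervals = [hi - lo for lo, hi in boundaries]    # class intervals (widths)
marks = [(lo + hi) / 2 for lo, hi in limits]      # class marks (midpoints)
relative = [f / total for f in freqs]             # relative frequencies

# "Less than" cumulative frequencies at each upper class boundary.
cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)

print(boundaries[3], intervals[3], marks[3], relative[3], cumulative)
# (79.5, 99.5) 20.0 89.5 0.3 [5, 20, 45, 75, 93, 100]
```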


Measure of Central Tendency

A value that would describe the 'centre' of a distribution would be visually located near the spot where most of the data seem to be concentrated. Consequently, values that fulfil this role are called measures of central tendency. The most common measures of the central tendency of a data set are the arithmetic mean (or simply the mean), the median and the mode.

The mean of a set of numerical data is the sum of the set divided by the number of observations, that is, their average.

The median of a distribution is the value which divides the distribution so that an equal number of values lie on either side of it, i.e., half of the items have values smaller or equal to it and half of the items have values larger or equal to it.

The mode of a set of numerical data is the value which occurs most frequently.

Example 1 (calculating mean, median and mode for individual data)

The following table shows the hourly wage rates of eight sampled construction workers.

Worker i                     1     2     3     4     5     6     7     8
Hourly wage rate ($x_i$)   $35    38    46    60    65    69    72    78

$$\text{Mean } (\bar{x}) = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{8} = \frac{463}{8} = 57.875 \ (\$)$$

Location of the median: $\dfrac{n + 1}{2} = \dfrac{9}{2} = 4.5$th

$$\text{Median} = \frac{x_4 + x_5}{2} = \frac{60 + 65}{2} = 62.5 \ (\$)$$

Mode: no value occurs more than once in this small sample, so the mode cannot be identified.

Example 2 (calculating mean, median and mode for grouped data)

The following table shows the daily wages of a random sample of construction workers. Calculate its mean, median and mode.


Daily Wages ($)    Number of Workers
200 - 399             5
400 - 599            15
600 - 799            25
800 - 999            30
1000 - 1199          18
1200 - 1399           7
Total               100

Solution

Daily Wages ($)    Number of Workers ($f_i$)    Class Mark ($x_i$)    $f_i x_i$
200 - 399             5                           299.5                1,497.5
400 - 599            15                           499.5                7,492.5
600 - 799            25                           699.5               17,487.5
800 - 999            30                           899.5               26,985.0
1000 - 1199          18                         1,099.5               19,791.0
1200 - 1399           7                         1,299.5                9,096.5
Total               100                                               82,350.0

$$\text{Mean } (\bar{x}) = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \frac{82,350.0}{100} = 823.5 \ (\$)$$

Daily Wages ($)    Number of Workers ($f_i$)    Cumulative Frequency ($F_i$)
200 - 399             5                            5
400 - 599            15                           20
600 - 799            25                           45
800 - 999            30                           75
1000 - 1199          18                           93
1200 - 1399           7                          100
Total               100

As 0.5n = 0.5(100) = 50, so the median lies in the 4th class.

$$\text{Median} = L_4 + \frac{0.5n - F_3}{f_4}(c_4)$$

where L is the lower class boundary, c is the class interval.

$$= 799.5 + \frac{0.5(100) - 45}{30}(200) = 832.8 \ (\$)$$

Daily Wages ($)    Number of Workers ($f_i$)    Class Interval ($c_i$)    Relative Density $f_i' = \dfrac{f_i}{c_i}(200)$
200 - 399             5                           200                        5
400 - 599            15                           200                       15
600 - 799            25                           200                       25
800 - 999            30                           200                       30
1000 - 1199          18                           200                       18
1200 - 1399           7                           200                        7
Total               100

As $f_4' = 30$ is the largest relative density, so mode lies in the 4th class.

$$\text{Mode} = L_4 + \frac{f_4' - f_3'}{(f_4' - f_5') + (f_4' - f_3')}(c_4) = 799.5 + \frac{30 - 25}{(30 - 18) + (30 - 25)}(200) = 858.3 \ (\$)$$
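The three grouped-data formulas used in Example 2 can be collected in one short script. This is a sketch that assumes equal class widths (so the frequencies themselves serve as relative densities); the variable names are illustrative.

```python
lower_bounds = [199.5, 399.5, 599.5, 799.5, 999.5, 1199.5]   # lower class boundaries
width = 200                                                   # common class interval
freqs = [5, 15, 25, 30, 18, 7]
n = sum(freqs)

# Mean: weight each class mark by its frequency.
marks = [lb + width / 2 for lb in lower_bounds]
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Median: locate the class containing the 0.5n-th item, then interpolate.
cum = 0
for i, f in enumerate(freqs):
    if cum + f >= n / 2:
        median = lower_bounds[i] + (n / 2 - cum) / f * width
        break
    cum += f

# Mode: interpolate inside the class with the largest frequency.
m = freqs.index(max(freqs))
d_prev = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d_next = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)
mode = lower_bounds[m] + d_prev / (d_prev + d_next) * width

print(mean, round(median, 1), round(mode, 1))   # 823.5 832.8 858.3
```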

Advantages and disadvantages of each measure

Mean
Advantages:
(i) All values in the distribution are used in its calculation, so it can be regarded as more representative than the other two measures.
(ii) Its method of calculation is simple and most people understand the meaning of its result.
(iii) Its result can easily be used in further analysis.
Disadvantages:
(i) Its result can be easily distorted by extreme values. As such, its result may be rather lower or higher than the bulk of the values and becomes unrepresentative.
(ii) In case of open end classes, mean can be calculated only if their class marks are determined. If such classes contain a large proportion of the values, then the mean may be subjected to substantial error.

Median
Advantage: Its result will not be affected by extreme values and open end classes.
Disadvantage: It has to be supplemented by other statistics because it does not reflect the distribution in the way that the mean does, that is, including all values.

Mode
Advantages:
(i) Its result will not be affected by extreme values and open end classes.
(ii) If data are not grouped, it can be determined easily.
Disadvantages:
(i) It has to be supplemented by other statistics.
(ii) It is difficult to obtain an accurate estimate of the mode if the values are classified into a frequency distribution.

How to select a suitable measure
(i) Always select the mean whenever there is no special reason for choosing the other two measures.
(ii) Select the median if the distribution consists of a substantial amount of extremely large or small values.
(iii) Select the mode if an integral result is preferred, as in cases where the data are in ordinal scales.

Measure of data variation (variability)

A measure of central tendency is almost never, by itself, sufficient to provide an adequate summary of the characteristics of a set of data. We will usually require, in addition, a measure of the amount of variation in the data.

Example 1
Consider the following measurements, in grams, for two samples of strawberry jam bottled by companies A and B:

Sample for Company A    31    32    32    33    32
Sample for Company B    28    29    32    35    36

Both samples have the same mean, 32 grams. It is obvious that company A, in comparison with company B, bottles strawberry jam with a more consistent content. We say that the variability of the observations is smaller for company A. Therefore in buying strawberry jam we would feel more confident that the bottle we select will be closer to the advertised average content if we buy from company A.

The most important measures of variability or dispersion are the range, mean deviation, standard deviation and variance. (There are some other measures like quartile deviation and percentiles. We shall not study these measures. Read our textbook if interested.)

The range of a set of numbers is the difference between the largest and the smallest number in the set.

Example 2 (For individual data)
The following table shows the hourly wage rates of eight sampled construction workers.

Worker i                     1     2     3     4     5     6     7     8
Hourly wage rate ($x_i$)   $35    38    46    60    65    69    72    78

The range is $78 - $35 = $43.

Though range is simple and can be obtained easily, its result is unstable. This is particularly true if the sample size is large. So whenever the sample size is over 10, we seldom choose to use range to indicate variability of the data.

Mean deviation is the average of the absolute deviation of the numerical data from their mean.

Worker i    Hourly wage rate ($x_i$)    $|x_i - \bar{x}| = |x_i - 57.875|$
1           $35                          22.875
2            38                          19.875
3            46                          11.875
4            60                           2.125
5            65                           7.125
6            69                          11.125
7            72                          14.125
8            78                          20.125
Total       463                         109.25

$$\text{Mean deviation} = \frac{\sum_{i=1}^{8} |x_i - 57.875|}{8} = \frac{109.25}{8} = 13.656 \ (\$)$$

The mean deviation is a good measure to show the extent of variation of the data in a distribution. However, when this measurement is used in further analysis, it would give rise to some unnecessary tedious mathematical problem as a result of its absolute value term. To avoid this pitfall, we can use the standard deviation instead.

Standard deviation of a population ($\sigma$) is the square root of the average of the squared distances of the observations from the mean.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}, \text{ where } \mu \text{ is the population mean}$$

To compute the sample standard deviation (s) we use the above formula, replacing $\mu$ by $\bar{x}$ and N by n - 1.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

For the hourly wage rates above,

$$s = \sqrt{\frac{\sum_{i=1}^{8} (x_i - 57.875)^2}{7}} = 16.226 \ (\$)$$

Variance is the square of the standard deviation.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Example 3 (for grouped data)
The following table shows the daily wages of a random sample of construction workers. Calculate its mean deviation, variance, and standard deviation.

Daily Wages ($)    Number of Workers
200 - 399             5
400 - 599            15
600 - 799            25
800 - 999            30
1000 - 1199          18
1200 - 1399           7
Total               100

Solution

Daily Wages ($)    Number of Workers ($f_i$)    Class Mark ($x_i$)    $f_i|x_i - \bar{x}| = f_i|x_i - 823.5|$
200 - 399             5                           299.5                 2,620
400 - 599            15                           499.5                 4,860
600 - 799            25                           699.5                 3,100
800 - 999            30                           899.5                 2,280
1000 - 1199          18                         1,099.5                 4,968
1200 - 1399           7                         1,299.5                 3,332
Total               100                                                21,160

$$\text{Mean deviation} = \frac{\sum_{i=1}^{6} f_i |x_i - \bar{x}|}{\sum_{i=1}^{6} f_i} = \frac{21,160}{100} = 211.60 \ (\$)$$

Daily Wages ($)    Number of Workers ($f_i$)    Class Mark ($x_i$)    $f_i (x_i - \bar{x})^2$
200 - 399             5                           299.5                1,372,880
400 - 599            15                           499.5                1,574,640
600 - 799            25                           699.5                  384,400
800 - 999            30                           899.5                  173,280
1000 - 1199          18                         1,099.5                1,371,168
1200 - 1399           7                         1,299.5                1,586,032
Total               100                                                6,462,400

$$\text{Variance } (s^2) = \frac{6,462,400}{99} = 65,276.77$$

$$\text{Standard deviation} = \sqrt{65,276.77} = 255.49 \ (\$)$$

Comparison of the variation of two distributions

The values of the standard deviations cannot be used as the bases of the comparison because:
(a) units of measurements of the two distributions may be different, and
(b) average values of two distributions may be widely dissimilar.

The correct measure that should be used is the coefficient of variation (CV).

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Example 4
The following table shows the summary statistics for the daily wages of two types of workers.

                   Daily Wages
Worker's Type      Mean      Standard deviation
I                  $100      $20
II                 $150      $24

Compare these two daily wages distributions.

Solution

In comparison        Distribution    Reason
Average magnitude    II > I          $\bar{x}_{II} = 150 > \bar{x}_{I} = 100$
Variation            I > II          $CV_I = \dfrac{20}{100} \times 100\% = 20\% > CV_{II} = \dfrac{24}{150} \times 100\% = 16\%$

Chapter Two - Probability

Introduction and concepts

“Perhaps it was man’s unquenchable thirst for gambling that led to the early development of probability theory. In an effort to increase their winnings, gamblers called upon the

mathematicians to provide optimum strategies for various games of chance.” ---- from Walpole R.E. Introduction to Statistics

Probability is the basis upon which the discipline of statistics has been developed and applied in many fields associated with chance occurrences such as politics, business, weather forecasting, and scientific research. In fact uncertainty is a basic element of human experiences. To cite some examples: travelling time, number of customers, rainfall, temperature, share price movement, length of our life, etc. Probability may be taken as a tool with which we may solve problems involving uncertainties.

There are three approaches to understand probability.

Suppose in a trial of an experiment, there are k possible outcomes which are equally likely. The probability of the occurrence of an outcome is therefore 1/k. Thus in throwing a coin, the probability of having a head is ½. In our course, we shall adopt this approach but the empirical approach is always useful in giving us some intuition to understand the problem.

In the empirical approach, probability may be taken as a relative frequency. As such the probability of an aeroplane arriving at its destination on time may be taken as the proportion of times the aeroplane has been on time in the past, say, one thousand times.

The third approach is very mathematical. A number of axioms have been set up and from these some theorems of probability have been developed. This approach is too abstract and usually used by mathematicians.

Some Basic Concepts

Sample space: is a set of all possible outcomes of an experiment.

Example 1
Three items are selected at random from a manufacturing process. Each item is inspected and classified defective (D) or non-defective (N).
Its sample space is S = {DDD, DDN, DND, NDD, DNN, NDN, NND, NNN}

Event: is a subset of a sample space.

Example 2
The event that the number of defectives in the above example is greater than 1. Its sample space is {DDD, DDN, DND, NDD}. The probability of the event is 4/8 or ½.

Some counting rules

To find the probability of an event we need to count the number of outcomes of the event and the number of all possible outcomes of the experiment, and then to divide the former by the latter. Hence the following counting rules may be helpful.

Example 3
Suppose a licence plate contains two letters followed by three digits with the first digit not zero. How many different licence plates can be printed?

                     1st Letter    2nd Letter    1st Digit    2nd Digit    3rd Digit
Number of Choices    A-Z (26)      A-Z (26)      1-9 (9)      0-9 (10)     0-9 (10)

Number of different licence plates that can be printed is (26)(26)(9)(10)(10) = 608,400

Example 4
Find the possible permutations (the number of ways where sequence of the letters is counted) from 3 letters A, B, C. Consider the following tree diagram:

• The number of permutations of n distinct objects is $_nP_n = (n)(n-1)(n-2)\cdots(2)(1) = n!$

• The number of permutations of n distinct objects taken r at a time is

$$_nP_r = (n)(n-1)\cdots(n-r+2)(n-r+1) = \frac{(n)(n-1)\cdots(n-r+2)(n-r+1)\,(n-r)(n-r-1)\cdots(2)(1)}{(n-r)(n-r-1)\cdots(2)(1)} = \frac{n!}{(n-r)!}$$

e.g. The number of 3-letter words formed from 5 letters is $_5P_3 = \dfrac{5!}{(5-3)!} = 60$

• The number of distinct permutations of n objects of which $n_1$ are alike of the first kind, $n_2$ are alike of the second kind, ..., $n_k$ are alike of the kth kind and $n_1 + n_2 + \cdots + n_k = n$ is

$$\frac{n!}{(n_1!)(n_2!)\cdots(n_k!)}$$

Find the possible permutations of the following 5 letters: A A A B C
There are five objects of which three are alike. ∴ The answer $= \dfrac{_5P_5}{3!} = \dfrac{5!}{3!} = 20$

Example 5
How many 7-letter words can be formed using the letters of the word 'BENZENE'? (there are 1 B, 3 E, 2 N and 1 Z)
The number of 7-letter words that can be formed is $\dfrac{7!}{(1!)(3!)(2!)(1!)} = 420$

• The number of combinations (number of ways where sequence is not counted) of n distinct objects taken r at a time is

$$_nC_r = \frac{n!}{r!(n-r)!}$$

Find the possible combinations of 5 distinct objects taken 3 at a time. The answer $= \dfrac{5!}{3!(5-3)!} = 10$

Example 6

The number of 3-person committees that can be formed from a group of 4 persons is $_4C_3 = \dfrac{4!}{3!(4-3)!} = 4$

Example 7
A box contains 8 eggs, 3 of which are rotten. Three eggs are picked at random. Find the probabilities of the following events.
(a) Exactly two eggs are rotten.
(b) All eggs are rotten.
(c) No egg is rotten.

Solution:
(a) The 8 eggs can be divided into 2 groups, namely, 3 rotten eggs as the first group and 5 good eggs as the second group. Getting 2 rotten eggs in 3 randomly selected eggs can occur if we select randomly 2 eggs from the first group and 1 egg from the second group. The number of this outcome is $(_3C_2)(_5C_1) = 15$. Total number of possible outcomes of selecting 3 eggs randomly from the total 8 eggs is $_8C_3 = 56$. Thus the probability of having exactly two rotten among the 3 randomly selected eggs is

$$\frac{(_3C_2)(_5C_1)}{_8C_3} = \frac{15}{56}$$

(b) Similarly, the probability of having all 3 rotten eggs is

$$\frac{(_3C_3)(_5C_0)}{_8C_3} = \frac{1}{56}$$

(c) The probability of having no rotten egg is

$$\frac{(_3C_0)(_5C_3)}{_8C_3} = \frac{10}{56} = \frac{5}{28}$$
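The counting in Example 7 can be checked with the standard library. This sketch uses `math.comb` (available from Python 3.8) and `fractions.Fraction` so that the probabilities stay as exact fractions; the helper name `p_rotten` is illustrative.

```python
from fractions import Fraction
from math import comb

rotten, good, picked = 3, 5, 3
total = comb(rotten + good, picked)          # 8C3 ways to pick 3 eggs

def p_rotten(k):
    """Probability that exactly k of the picked eggs are rotten."""
    return Fraction(comb(rotten, k) * comb(good, picked - k), total)

print(total, p_rotten(2), p_rotten(3), p_rotten(0))   # 56 15/56 1/56 5/28
```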

Rules of probability

The following rules may help us to find the probability of an event.

Addition Rule: For any two events A and B (not necessarily mutually exclusive)

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

where
$A \cup B$ is the union of two sets A and B, it is the set of elements that belong to A or to B or to both.
$A \cap B$ is the intersection of two sets A and B, it is the set of elements that are common to A and B.

Illustrative example
180 students took examinations in English and Mathematics. Their results were as follows:
Number of students passing English = 80
Number of students passing Mathematics = 120
Number of students passing at least one subject = 144

Then we can rewrite the above results as:
Probability that a randomly selected student passed English $= \dfrac{80}{180} = \dfrac{4}{9}$
Probability that a randomly selected student passed Mathematics $= \dfrac{120}{180} = \dfrac{2}{3}$
Probability that a randomly selected student passed at least one subject $= \dfrac{144}{180} = \dfrac{4}{5}$

Find the probability that a randomly selected student passed both subjects.

Solution
Let E be the event of passing English, and M be the event of passing Mathematics.
It is given that: $P(E) = \dfrac{4}{9}$, $P(M) = \dfrac{2}{3}$, $P(M \cup E) = \dfrac{4}{5}$

As $P(M \cup E) = P(E) + P(M) - P(M \cap E)$

∴ $P(M \cap E) = P(E) + P(M) - P(M \cup E) = \dfrac{4}{9} + \dfrac{2}{3} - \dfrac{4}{5} = \dfrac{14}{45} = 0.31$

Example 8

A card is drawn from a complete deck of playing cards. What is the probability that the card is a heart or an ace?

Solution
Let A be the event of getting a heart, and B be the event of getting an ace. The probability that the card is a heart or an ace is $P(A \cup B)$.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$

For mutually exclusive events, $P(A \cup B) = P(A) + P(B)$

What is the probability of getting a total of '7' or '11' when a pair of dice are tossed?

Solution
Total number of possible outcomes = (6)(6) = 36
Possible outcomes of getting a total of '7': {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
Possible outcomes of getting a total of '11': {(5,6), (6,5)}

Let A be the event of getting a total of '7', and B be the event of getting a total of '11'. The probability of getting a total of '7' or '11' is $P(A \cup B)$.

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B)$, as A and B are mutually exclusive

$$= \frac{6}{36} + \frac{2}{36} = \frac{2}{9}$$

If A and A' are complementary events then $P(A) = 1 - P(A')$

Example 9
A coin is tossed six times in succession. What is the probability that at least one head occurs?

Let A be the number of heads that occur in six successive tosses.

$P(A \ge 1) = 1 - P(A = 0)$

$$= 1 - \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = \frac{63}{64}$$

Conditional Probability

Let A and B be two events. The conditional probability of event A given that event B has occurred, denoted by $P(A / B)$, is defined as

$$P(A / B) = \frac{P(A \cap B)}{P(B)}$$

provided that P(B) > 0. Similarly, the conditional probability of B given that event A has occurred is defined as

$$P(B / A) = \frac{P(A \cap B)}{P(A)}$$

provided P(A) > 0.

Example 10

A hamburger chain found that 75% of all customers use mustard, 80% use ketchup, and 65% use both, when ordering a hamburger. What are the probabilities that:
(a) a ketchup-user uses mustard?
(b) a mustard-user uses ketchup?

Solution

Let A be the event of using mustard, and B be the event of using ketchup. It is given that:

$$P(A) = 0.75, \quad P(B) = 0.80, \quad P(A \cap B) = 0.65$$

(a) P(a ketchup-user uses mustard) $= P(A / B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.65}{0.80} = 0.8125$

(b) P(a mustard-user uses ketchup) $= P(B / A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.65}{0.75} = 0.8667$

Multiplicative Rule

$$P(A \cap B) = P(A)\,P(B / A) = P(B)\,P(A / B)$$

Statistical Independence: the occurrence or non-occurrence of one event has no effect on the probability of occurrence of the other event.
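The figures of Example 10 can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name `conditional` is a hypothetical helper, and the independence check uses the product criterion P(A ∩ B) = P(A)P(B).

```python
import math

# Figures given in Example 10 (mustard = A, ketchup = B).
p_a = 0.75
p_b = 0.80
p_a_and_b = 0.65

def conditional(p_joint, p_condition):
    """Return P(joint event) / P(conditioning event); hypothetical helper."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition

p_a_given_b = conditional(p_a_and_b, p_b)  # P(A / B)
p_b_given_a = conditional(p_a_and_b, p_a)  # P(B / A)

# Multiplicative rule: P(A and B) = P(A) P(B / A).
rule_holds = math.isclose(p_a * p_b_given_a, p_a_and_b)

# Product criterion for independence: P(A and B) = P(A) P(B).
independent = math.isclose(p_a * p_b, p_a_and_b)

print(round(p_a_given_b, 4), round(p_b_given_a, 4))  # 0.8125 0.8667
print(rule_holds, independent)                       # True False
```

Here P(A)P(B) = 0.60 differs from P(A ∩ B) = 0.65, so using mustard and using ketchup do not behave as independent events in this example.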

Two events A and B are independent if and only if

$$P(A \cap B) = P(A)\,P(B)$$

Example 11

A pair of fair dice are thrown twice. What is the probability of getting totals of 7 and 11?

Solution

Let $A_i$ be the event of getting '7' in the i-th throw and $B_j$ be the event of getting '11' in the j-th throw. Since $A_i$ and $B_j$ are independent,

$$P(\text{totals of 7 and 11}) = P(A_1 \cap B_2) + P(B_1 \cap A_2)$$
$$= P(A_1)P(B_2 / A_1) + P(B_1)P(A_2 / B_1)$$
$$= P(A_1)P(B_2) + P(B_1)P(A_2)$$
$$= \left(\frac{6}{36}\right)\left(\frac{2}{36}\right) + \left(\frac{2}{36}\right)\left(\frac{6}{36}\right) = \frac{1}{54}$$

Theorem of Total Probability

If the events $B_1, B_2, \ldots, B_k$ constitute a partition of the sample space S such that $P(B_i) \ne 0$ for i = 1, 2, ..., k, then for any event A of S

$$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k)$$
$$= P(B_1)P(A / B_1) + P(B_2)P(A / B_2) + \cdots + P(B_k)P(A / B_k)$$

Example 12

Suppose 50% of the cars are manufactured in the United States and 15% of these are compact; 30% of the cars are manufactured in Europe and 40% of these are compact; and finally, 20% are manufactured in Japan and 60% of these are compact. If a car is picked at random from the lot, find the probability that it is a compact.

Let A be the event that the car is compact, $B_1$ be the event that the car is manufactured in the United States, $B_2$ be the event that the car is manufactured in Europe, and $B_3$ be the event that the car is manufactured in Japan.

$$P(A) = P(A \cap B_1) + P(A \cap B_2) + P(A \cap B_3)$$
$$= P(B_1)P(A / B_1) + P(B_2)P(A / B_2) + P(B_3)P(A / B_3)$$
$$= (0.50)(0.15) + (0.30)(0.40) + (0.20)(0.60) = 0.315$$

Bayes' Theorem

If $E_1, E_2, \ldots, E_k$ are mutually exclusive events such that $E_1 \cup E_2 \cup \cdots \cup E_k$ contains all sample points of S, then for any event D of S with $P(D) \ne 0$,

$$P(E_i / D) = \frac{P(E_i \cap D)}{P(D)} = \frac{P(E_i \cap D)}{\sum_{j=1}^{k} P(E_j \cap D)} = \frac{P(E_i)P(D / E_i)}{P(E_1)P(D / E_1) + P(E_2)P(D / E_2) + \cdots + P(E_k)P(D / E_k)}$$

Example 13

Suppose a box contains 2 red balls and 1 white ball and a second box contains 2 red balls and 2 white balls. One of the boxes is selected by chance and a ball is drawn from it. If the drawn ball is red, what is the probability that it came from the 1st box?

Solution

Let A be the event of drawing a red ball and B be the event of choosing the 1st box.

Given: $P(B) = P(B') = \frac{1}{2}$, $P(A / B) = \frac{2}{3}$, $P(A / B') = \frac{2}{4}$

P(coming from the 1st box / the drawn ball is red)

$$= P(B / A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A \cap B)}{P(A \cap B) + P(A \cap B')}$$
$$= \frac{P(B)P(A / B)}{P(B)P(A / B) + P(B')P(A / B')} = \frac{\left(\frac{1}{2}\right)\left(\frac{2}{3}\right)}{\left(\frac{1}{2}\right)\left(\frac{2}{3}\right) + \left(\frac{1}{2}\right)\left(\frac{2}{4}\right)} = \frac{4}{7}$$

Chapter Three - Probability Distributions

To cope with uncertainties of outcome, a statistical model that describes the behavior of the outcome is needed. These theoretical models, which are very similar to relative frequency distributions, are called probability distributions.

Random Variables - A random variable is a variable that takes on different numerical values determined by the outcomes of a random experiment.
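To make the definition concrete, a random variable can be viewed as a function that maps each outcome of an experiment to a number. The sketch below is an illustration only: the experiment (two tosses of a fair coin) and the names `outcomes`, `num_heads` and `distribution` are chosen here for brevity and are not taken from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of the illustrative experiment: two tosses of a fair coin.
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

def num_heads(outcome):
    """The random variable: number of heads in one outcome."""
    return outcome.count("H")

# Each outcome is equally likely, so P(X = x) = (number of outcomes with X = x) / 4.
counts = Counter(num_heads(o) for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
# 0 1/4
# 1 1/2
# 2 1/4
```

The value of the random variable is fixed once the outcome is known; the randomness comes entirely from which outcome the experiment produces.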

Example 1

An experiment of tossing a coin 3 times. Let the random variable X be the number of heads achieved. Then S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT} and X = {0, 1, 2, 3}.

Discrete random variable - in a given interval, only a specified number of values can occur.

Continuous random variable - in a given interval, any value can occur.

Probability Distribution of a random variable - is a representation of the probabilities for all the possible outcomes.

Example 2

The probability distribution of the number of heads that occur when a coin is tossed 4 times.

x          0      1      2      3      4
P(X=x)     1/16   4/16   6/16   4/16   1/16

That is, $P(X = x) = \dfrac{{}_4C_x}{16}$, x = 0, 1, 2, 3, 4.

Example 3

Consider an experiment of tossing two fair dice. Let the random variable X be the sum of the two dice. Then the probability distribution of X is:

x          2      3      4      5      6      7      8      9      10     11     12
P(X=x)     1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

The probability function f(x) of a discrete random variable X expresses the probability that X takes the value x, as a function of x. That is,

$$f(x) = P(X = x)$$

where the function is evaluated at all possible values of x.

Properties of probability function P(X = x):
1. $P(X = x) \ge 0$ for any value x.
2. $\sum_x P(X = x) = 1$.

Mathematical Expectations

The expected value, E(X), of a discrete random variable X is defined as

$$E(X) \text{ or } \mu_x = \sum_x x\,P(X = x)$$

It is the mean of the probability distribution.

Let X be a random variable. The expectation of the squared discrepancy about the mean, $(X - \mu_x)^2$, is called the variance, denoted $\sigma_x^2$, and given by

$$Var(X) \text{ or } \sigma_x^2 = E[(X - \mu_x)^2] = \sum_x (x - \mu_x)^2 P(X = x) = \sum_x x^2 P(X = x) - \mu_x^2$$

Example 4

Calculate the mean and variance of the discrete probability distributions in Examples 2 and 3.

The Normal Distribution

The normal distribution is the probability distribution of a continuous random variable. It is based on the Law of Errors, which states that
1. Errors are inevitable.
2. Positive and negative errors are equally likely.
3. Large errors are less likely than small errors.

Definition :

A continuous random variable X is defined to be a normal random variable if its probability function is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] \quad \text{for } -\infty < x < +\infty$$

where µ = the mean of X, σ = the standard deviation of X, π = 3.14159.

Notation : $X \sim N(\mu, \sigma^2)$

Properties of the normal distribution:
1. It is a continuous distribution.
2. The curve is symmetric and bell-shaped about a vertical axis through the mean µ.
3. The total area under the curve and above the horizontal axis is equal to 1.
4. Area under the normal curve:
• Approximately 68% of the values in a normally distributed population are within 1 standard deviation from the mean.
• Approximately 95.5% of the values in a normally distributed population are within 2 standard deviations from the mean.
• Approximately 99.7% of the values in a normally distributed population are within 3 standard deviations from the mean.

The standard normal curve : The distribution of a normal random variable with µ = 0 and σ = 1 is called a standard normal distribution. Usually a standard normal random variable is denoted by Z.

Notation : $Z \sim N(0, 1)$

Remark : Usually a table of Z is set up to find the probability P(Z ≥ z) for z ≥ 0.

Example 7

Given Z ~ N(0, 1), find

(a) P(Z > 1.73) = 0.0418

(b) P(0 < Z < 1.73) = P(Z > 0) − P(Z > 1.73) = 0.5 − 0.0418 = 0.4582

5 < Z <1.78034 P(1.8) = 0. Solution: 45 − µ X − µ 62 − µ P (45 < X < 62) = P < < σ σ σ 62 − 50 45 − 50 = P <Z < 10 10 = P ( − .5764 23 .42 < Z < 0.P(Z > 0.8 < Z < 2. From the standard normal distribution table we have P(0 < Z < 1.00256 = 0.P(Z < -2.8) = 1 .64 (ii) 39.2119 = 0.44% of the area between 0 and z.0.. then Z= X −µ σ is a standard normal random variable and hence P ( x1 < X < x2 ) = P ( x1 − µ x −µ <Z< 2 ) σ σ Example 8 Given X ~ N(50. From the standard normal distribution table we have P(Z < -1.25) = 0.5) − P ( Z >1. 102).0.8) . find P(45 < X < 62). Let the corresponding z value be z1.(c) (d) (e) P(− 2.8) = P(Z > 1. then we have P(Z < z1) = 0. then we have P(0 < Z < z1) = 0..2) 0 =1 − P ( Z < −0. So z1 = -1.1151 = 0.00776 .64) = 0.05.03334 the value z that has (i) 5% of the area below it Let the corresponding z value be z1.3085 .2) = 1 .3944.42) .8) = 1 .0.05.P(Z > 2. So z1 = 1.25 Theorem : If X is a normal random variable with mean µ and standard deviation σ .3944.0359 .

Example 9

The balance of a charge account at a certain department store is approximately normally distributed with an average balance of $80 and a standard deviation of $30. What is the probability that a charge account randomly selected has a balance
(a) over $125,
(b) between $65 and $95?

Let X be the balance in the charge account. $X \sim N(80, 30^2)$

(a) $$P(X > 125) = P\left(\frac{X - \mu}{\sigma} > \frac{125 - \mu}{\sigma}\right) = P\left(Z > \frac{125 - 80}{30}\right) = P(Z > 1.5) = 0.0668$$

(b) $$P(65 < X < 95) = P\left(\frac{65 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{95 - \mu}{\sigma}\right) = P\left(\frac{65 - 80}{30} < Z < \frac{95 - 80}{30}\right)$$
$$= P(-0.5 < Z < 0.5) = 1 - P(Z < -0.5) - P(Z > 0.5) = 1 - 0.3085 - 0.3085 = 0.3830$$

Example 10

On an examination the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B?

Let X be the examination grade and $x_1$ be the lowest grade for A.

$$P(X > x_1) = 0.12 \Rightarrow P\left(Z > \frac{x_1 - 74}{7}\right) = 0.12$$

From the standard normal distribution, we get P(Z > 1.17) = 0.1210 and P(Z > 1.18) = 0.1190, so P(Z > 1.175) ≅ 0.12.

Thus

$$\frac{x_1 - 74}{7} = 1.175$$

$$x_1 = 74 + (7)(1.175) = 82.2 \cong 83$$

The highest possible B is 82.

The Binomial Distribution

A binomial experiment possesses the following properties :
1. There are n identical observations or trials.
2. Each trial has two possible outcomes, one called "success" and the other "failure". The outcomes are mutually exclusive and collectively exhaustive.
3. The probabilities of success p and of failure 1 − p remain the same for all trials.
4. The outcomes of trials are independent of each other.

Example 11

1. In testing 10 items as they come off an assembly line, each test or trial may indicate a defective or a non-defective item.
2. Five cards are drawn with replacement from an ordinary deck and each trial is labelled a success or failure depending on whether the card is red or black.

Definition : In a binomial experiment with a constant probability p of success at each trial, the probability distribution of the binomial random variable X, the number of successes in n independent trials, is called the binomial distribution.

Notation : $X \sim b(n, p)$

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n, \quad p + q = 1$$

Example 12

Of a large number of mass-produced articles, one-tenth is defective. Find the probabilities that a random sample of 20 will contain
(a) exactly two defective articles,
(b) at least two defective articles.

Let X be the number of defective articles in a random sample of 20. $X \sim b(20, \frac{1}{10})$

(a) $$P(X = 2) = \binom{20}{2}\left(\frac{1}{10}\right)^2\left(\frac{9}{10}\right)^{18} = 0.28517$$

(b) $$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{20}{0}\left(\frac{1}{10}\right)^0\left(\frac{9}{10}\right)^{20} - \binom{20}{1}\left(\frac{1}{10}\right)^1\left(\frac{9}{10}\right)^{19}$$
$$= 1 - 0.12158 - 0.27017 = 0.60825$$

Example 13

A test consists of 6 questions, and to pass the test a student has to answer at least 4 questions correctly. Each question has three possible answers, of which only one is correct. If a student guesses on each question, what is the probability that the student will pass the test?

Let X be the no. of correctly answered questions among 6 questions. $X \sim b(6, \frac{1}{3})$

$$P(X \ge 4) = \sum_{x=4}^{6} P(X = x) = \sum_{x=4}^{6} \binom{6}{x}\left(\frac{1}{3}\right)^x\left(\frac{2}{3}\right)^{6-x}$$
$$= \binom{6}{4}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^2 + \binom{6}{5}\left(\frac{1}{3}\right)^5\left(\frac{2}{3}\right)^1 + \binom{6}{6}\left(\frac{1}{3}\right)^6\left(\frac{2}{3}\right)^0 = 0.10014$$

Theorem : The mean and variance of the binomial distribution with parameters n and p are µ = np and σ² = npq respectively, where p + q = 1.

Example 14

A packaging machine produces 20 percent defective packages. A random sample of ten packages is selected. What are the mean and standard deviation of the binomial distribution of that process?

Let X be the no. of defective packages in a sample of 10 packages. $X \sim b(10, 0.2)$

Its mean is µ = np = (10)(0.2) = 2.
Its standard deviation is $\sigma = \sqrt{npq} = \sqrt{(10)(0.2)(0.8)} = 1.265$.

The Normal Approximation to the Binomial Distribution

Theorem : Given X is a random variable which follows the binomial distribution with parameters n and p, then

$$P(X = x) \cong P\left(\frac{(x - 0.5) - np}{\sqrt{npq}} < Z < \frac{(x + 0.5) - np}{\sqrt{npq}}\right)$$

if n is large and p is not close to 0 or 1.

Remark : If both np and nq are greater than 5, the approximation will be good.
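The binomial formulas above, and the continuity-corrected normal approximation, can be compared numerically. This is a minimal sketch using only the standard library; the function names are hypothetical, and the second set of values (n = 50, p = 0.4, x = 20) is an illustrative choice made here so that np and nq both exceed 5, not a case from the notes.

```python
import math

def binomial_pmf(x, n, p):
    """Exact P(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_pmf(x, n, p):
    """Normal approximation to P(X = x) with the +/- 0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    lower = (x - 0.5 - mu) / sigma
    upper = (x + 0.5 - mu) / sigma
    return std_normal_cdf(upper) - std_normal_cdf(lower)

# Example 14: X ~ b(10, 0.2), mean np and standard deviation sqrt(npq).
mu = 10 * 0.2
sigma = math.sqrt(10 * 0.2 * 0.8)
print(mu, round(sigma, 3))  # 2.0 1.265

# Illustrative values (an assumption of this sketch): np = 20 and nq = 30.
exact = binomial_pmf(20, 50, 0.4)
approx = normal_approx_pmf(20, 50, 0.4)
print(exact, approx)
```

With np and nq both well above 5, the two printed probabilities should be close, which is what the remark above leads one to expect.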

Example 15

A process yields 10% defective items. If 100 items are randomly selected from the process, what is the probability that the number of defectives exceeds 13?

Let X be the no. of defectives in a random sample of 100 items. $X \sim b(100, 0.1)$

$$\mu = np = (100)(0.1) = 10, \quad \sigma = \sqrt{npq} = \sqrt{(100)(0.1)(0.9)} = 3$$

By normal approximation,

$$P(X > 13) \cong P(X' > 13.5) = P\left(\frac{X' - \mu}{\sigma} > \frac{13.5 - \mu}{\sigma}\right) = P\left(Z > \frac{13.5 - 10}{3}\right) = P(Z > 1.167) = 0.121$$

Example 17

A multiple-choice quiz has 200 questions each with four possible answers of which only one is the correct answer. What is the probability that sheer guesswork yields from 25 to 30 correct answers for 80 of the 200 problems about which the student has no knowledge?

Let X be the no. of correct answers for 80 with sheer guesswork. $X \sim b(80, 0.25)$

$$\mu = np = (80)(0.25) = 20, \quad \sigma = \sqrt{npq} = \sqrt{(80)(0.25)(0.75)} = \sqrt{15}$$

By normal approximation,

$$P(25 \le X \le 30) \cong P(24.5 < X' < 30.5) = P\left(\frac{24.5 - 20}{\sqrt{15}} < Z < \frac{30.5 - 20}{\sqrt{15}}\right)$$
$$= P(1.16 < Z < 2.71) = 0.1230 - 0.00336 = 0.1196$$

The Poisson Distribution

Experiments yielding numerical values of a random variable X, the number of successes (observations) occurring during a given time interval (or in a specified region), are often called Poisson experiments.

A Poisson experiment has the following properties:
1. The number of successes in any interval is independent of the number of successes in any other interval.
2. The probability of a single success occurring during a short interval is proportional to the length of the time interval and does not depend on the number of successes occurring outside this time interval.
3. The probability of more than one success in a very small interval is negligible.

Examples of random variables following Poisson Distribution

1. The number of customers arriving during a time period of length t.
2. The number of telephone calls per hour received by an office.
3. The number of typing errors per page.
4. The number of accidents occurring at a junction per day.

Definition : The probability distribution of the Poisson random variable X is called the Poisson distribution.

Notation : $X \sim Po(\lambda)$, where λ is the average number of successes occurring in the given time interval.

$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots, \quad e = 2.71828$$

Theorem: In a Poisson distribution the mean is equal to the variance, i.e. µ = σ².

Example 18

The average number of radioactive particles passing through a counter during 1 millisecond in a laboratory experiment is 4. What is the probability that 6 particles enter the counter in a given millisecond?

Let X be the no. of particles entering the counter in a given millisecond. $X \sim Po(4)$

$$P(X = 6) = \frac{e^{-4}4^6}{6!} = 0.1042$$

Example 19

Ships arrive in a harbour at a mean rate of two per hour. Suppose that this situation can be described by a Poisson distribution. Find the probabilities for a 30-minute period that
(a) no ships arrive,
(b) three ships arrive.

Let X be the no. of ships arriving in a harbour in a 30-minute period. $X \sim Po(\frac{2}{2} = 1)$

(a) $$P(X = 0) = \frac{e^{-1}1^0}{0!} = 0.3679$$

(b) $$P(X = 3) = \frac{e^{-1}1^3}{3!} = 0.0613$$

Theorem : The mean and variance of the Poisson distribution are both equal to λ.

Poisson approximation to the binomial distribution

If n is large and p is near 0 or near 1.00 in the binomial distribution, then the binomial distribution can be approximated by the Poisson distribution with parameter np.

Generally speaking, the Poisson distribution will provide a good approximation to the binomial when
(i) n is at least 20 and p is at most 0.05; or
(ii) n is at least 100, in which case the approximation will generally be excellent provided p < 0.1.

Example 20

If the probability that an individual suffers a bad reaction from a certain injection is 0.001, determine the probability that, out of 2000 individuals, more than 2 individuals will suffer a bad reaction.

Solution: According to the binomial distribution, the required probability is

$$1 - \left[\binom{2000}{0}(0.001)^0(0.999)^{2000} + \binom{2000}{1}(0.001)^1(0.999)^{1999} + \binom{2000}{2}(0.001)^2(0.999)^{1998}\right]$$

Using the Poisson distribution with λ = np = 2:

$$P(0 \text{ suffers}) = \frac{2^0 e^{-2}}{0!} = \frac{1}{e^2}, \quad P(1 \text{ suffers}) = \frac{2^1 e^{-2}}{1!} = \frac{2}{e^2}, \quad P(2 \text{ suffer}) = \frac{2^2 e^{-2}}{2!} = \frac{2}{e^2}$$

Then the required probability $= 1 - \dfrac{5}{e^2} = 0.323$

Example 21

Two percent of the output of a machine is defective. A lot of 300 pieces will be produced. Determine the probability that exactly four pieces will be defective.

Let X be the no. of defective pieces among 300 pieces. $X \sim b(300, 0.02)$

$$P(X = 4) = {}_{300}C_4\,(0.02)^4(0.98)^{296} = 0.1338$$

By Poisson approximation: λ = np = (300)(0.02) = 6

$$P(X = 4) = \frac{e^{-6}6^4}{4!} = 0.1338$$

CHAPTER 4 - SAMPLING DISTRIBUTIONS AND ESTIMATION

Definition

1. A population parameter is a characteristic of a population.
2. A sample statistic is a characteristic of a sample. A statistic is a random variable that depends only on the observed random sample.
3. A sampling distribution is a probability distribution for a sample statistic.
4. The standard deviation of the distribution of a sample statistic is known as the standard error of the statistic. It indicates the extent to which a sample statistic will tend to vary because of chance variation in random sampling.

An illustrating example

Suppose a population consists of four elements, {0, 1, 2, 3}. The population has two parameters: a mean µ of 1.5 and a variance σ² of 1.6667. A simple random sample of two elements is to be drawn. Obviously there are six possible samples (${}_4C_2 = 6$). They are

Sample         0,1    0,2    0,3    1,2    1,3    2,3
Sample mean    0.5    1.0    1.5    1.5    2.0    2.5
Error          -1.0   -0.5   0      0      0.5    1.0
Probability    1/6    1/6    1/6    1/6    1/6    1/6

From the above table, we can see that if we draw a sample and use the sample mean to estimate the population mean, the accuracy of our estimate depends on which sample we have drawn, which in turn depends on chance. The probability distribution of the sample mean is known as the sampling distribution of the sample mean, as compiled in the following table:

Sample mean    0.5    1.0    1.5    2.0    2.5
Probability    1/6    1/6    2/6    1/6    1/6

The expected value of the sample mean is

$$E(\bar{y}) = \sum \bar{y}\,P(\bar{y}) = 0.5 \cdot \tfrac{1}{6} + 1.0 \cdot \tfrac{1}{6} + 1.5 \cdot \tfrac{2}{6} + 2.0 \cdot \tfrac{1}{6} + 2.5 \cdot \tfrac{1}{6} = 1.5 = \bar{Y}$$

Hence the average value of the sample mean is equal to the population mean. We call the sample mean an unbiased estimator of the population mean.

The variance of the sample mean (i.e. the average squared deviation of the sample mean from the population mean) is:

$$V(\bar{y}) = \sum (\bar{y} - \bar{Y})^2 P(\bar{y}) = (0.5 - 1.5)^2 \cdot \tfrac{1}{6} + (1.0 - 1.5)^2 \cdot \tfrac{1}{6} + (1.5 - 1.5)^2 \cdot \tfrac{2}{6} + (2.0 - 1.5)^2 \cdot \tfrac{1}{6} + (2.5 - 1.5)^2 \cdot \tfrac{1}{6} = 0.4167$$

Sampling Distribution of Mean

The Central Limit Theorem

If repeated samples of size n are drawn from any infinite population with mean µ and variance σ², and n is large (n ≥ 30), the distribution of $\bar{x}$, the sample mean, is approximately normal, with mean µ (i.e. $E(\bar{x}) = \mu$) and variance σ²/n (i.e. $V(\bar{x}) = \sigma^2/n$), and this approximation becomes better as n becomes larger.

Notes: Comparing with the previous illustrating example, we can see the following modifications:

(i) If the population is finite, $V(\bar{x}) = \left(1 - \dfrac{n}{N}\right)\dfrac{\sigma^2}{n}$, where (1 − n/N) is known as the finite population correction factor. In the above example, n = 2, N = 4, and

$$V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n} = \left(1 - \frac{2}{4}\right)\left(\frac{1.6667}{2}\right) = 0.4167$$

When N is very big, the factor is approximately equal to 1. If the population is big (or the sample is drawn with replacement), then $V(\bar{x}) = \dfrac{\sigma^2}{n} = 1.6667/2 = 0.8333$. In this course we assume a big population or sampling with replacement.

(ii) If n is small, say less than 30, the sampling distribution is not close to normal. A t-distribution will be used (discussed later).

Example 1

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Let $\bar{X}$ be the average life of the 16 bulbs. $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu = 800,\ \sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n} = \dfrac{40^2}{16}\right)$

$$P(\bar{X} < 775) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{775 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(Z < \frac{775 - 800}{40/\sqrt{16}}\right) = P(Z < -2.5) = 0.00621$$

Example 2

The mean IQ score of all students attending a college is 110 with a standard deviation of 10.
(a) If the IQ scores are normally distributed, what is the probability that the score of any one student is greater than 112?
(b) What is the probability that the mean score in a random sample of 36 students is greater than 112?
(c) What is the probability that the mean score in a random sample of 100 students is greater than 112?

Solution

(a) Let X be the student's IQ score. $X \sim N(110, 10^2)$

$$P(X > 112) = P\left(Z > \frac{112 - 110}{10}\right) = P(Z > 0.2) = 0.4207$$

(b) Let $\bar{X}_1$ be the mean score of a sample of 36 students. $\bar{X}_1 \sim N\left(\mu_{\bar{x}} = \mu = 110,\ \sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n} = \dfrac{10^2}{36}\right)$

$$P(\bar{X}_1 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{36}}\right) = P(Z > 1.2) = 0.1151$$

(c) Let $\bar{X}_2$ be the mean score of a sample of 100 students. $\bar{X}_2 \sim N\left(\mu = 110,\ \dfrac{\sigma^2}{n} = \dfrac{10^2}{100}\right)$

$$P(\bar{X}_2 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{100}}\right) = P(Z > 2) = 0.0228$$

Estimation

Estimation is the process of using statistics from sample data to estimate the parameters of the population. A statistic is a random variable which depends on which sample is drawn from a population. The following are some examples:

     Estimator        Population parameter
1.   $\bar{x}$        µ
2.   $s^2$            σ²
3.   $\hat{p}$        P

There are two important properties for an estimator, namely, unbiasedness and efficiency.

Unbiased estimator: An estimator, for example $\bar{x}$, is unbiased if and only if $E(\bar{x}) = \mu$.

Efficiency: The efficiency of an estimator, for example $\bar{x}$, is given by $V(\bar{x})$. The smaller the $V(\bar{x})$, the more accurate $\bar{x}$ will be as an estimator.

There are two types of estimate:
1. A point estimate is a single-value estimate of a population parameter, for example, $\hat{\mu} = \bar{x}$, $\hat{P} = p$.
2. An interval estimate of a population parameter gives an interval that may contain the true value of the parameter with a certain probability (i.e. confidence), for example, Pr(a < µ < b) = 0.99.

For a point estimate, both the accuracy and reliability of the estimation are unknown. For an interval estimate, the width of the interval gives the accuracy and the probability gives the reliability of the estimation.

Example 3

(a) The mean and standard deviation for the quality point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find a 95% confidence interval for the mean of the entire senior class.
(b) How large a sample is required in (a) if we want to be 95% confident that the estimate of µ is off by less than 0.05?

Solution

Let µ be the mean of the entire senior class.

(a) Given: n = 36, $\bar{x} = 2.6$, s = 0.3, (1 − α) = 0.95 ⇒ α = 0.05

A 95% confidence interval estimate for µ is

$$\bar{x} - z_{0.025}\frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{x} + z_{0.025}\frac{\hat{\sigma}}{\sqrt{n}}$$
$$\Rightarrow 2.6 - 1.96\,\frac{0.3}{\sqrt{36}} < \mu < 2.6 + 1.96\,\frac{0.3}{\sqrt{36}}$$
$$\Rightarrow 2.502 < \mu < 2.698$$

(b) Let $n_1$ be the required sample size. To be 95% confident that the estimate of µ is off by less than 0.05 would imply

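$$z_{0.025}\frac{\hat{\sigma}}{\sqrt{n_1}} \le 0.05 \Rightarrow 1.96\,\frac{0.3}{\sqrt{n_1}} \le 0.05$$
$$\therefore n_1 \ge \left(\frac{(1.96)(0.3)}{0.05}\right)^2 = 138.30 \cong 139$$

A summary table for constructing a 100(1 − α)% confidence interval for mean and proportion

Estimating: Mean
Conditions: Large samples (n ≥ 30) OR σ is known
Formula: $\bar{X} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

Estimating: Mean *
Conditions: Small samples and σ unknown
Formula: $\bar{X} - t_{\nu,\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\nu,\alpha/2}\dfrac{s}{\sqrt{n}}$, with ν = n − 1

Estimating: Proportion
Conditions: Large sample
Formula: $\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

Estimating: Difference of means
Conditions: Large sample OR σ1 and σ2 are known
Formula: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Estimating: Difference of means
Conditions: Small sample & σ1 and σ2 are unknown, assume σ1 = σ2
Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{\nu,\alpha/2}\,(s_p)\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, with $\nu = n_1 + n_2 - 2$ and pooled estimate of sample standard deviation $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Estimating: Difference of means
Conditions: Small sample & σ1 and σ2 are unknown, assume σ1 ≠ σ2
Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{\nu,\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$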

with $\nu = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$

Estimating: Difference of means
Conditions: Paired observations
Formula: $\bar{d} \pm t_{\nu,\alpha/2}\dfrac{s_d}{\sqrt{n}}$, with ν = n − 1 and $d = x_1 - x_2$

Estimating: Difference of proportions
Conditions: Large samples
Formula: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

Example 4

The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean of all such containers, assuming an approximate normal distribution.

Solution

Let µ be the mean of all such containers.

Given: n = 7, $\sum x = 70$, $\sum x^2 = 700.48$

$$\bar{x} = \frac{\sum x}{n} = \frac{70}{7} = 10$$
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{700.48 - \frac{70^2}{7}}{6} = 0.08 \quad \therefore s = \sqrt{0.08} = 0.2828$$

(1 − α) = 0.95 ⇒ α = 0.05, $t_{6,\,0.025} = 2.447$

A 95% confidence interval estimate for µ is

$$\bar{x} - t_{6,\,0.025}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{6,\,0.025}\frac{s}{\sqrt{n}}$$
$$\Rightarrow 10 - 2.447\,\frac{0.2828}{\sqrt{7}} < \mu < 10 + 2.447\,\frac{0.2828}{\sqrt{7}}$$
$$\Rightarrow 9.738 < \mu < 10.262$$

Example 5

In a random sample of n = 500 families owning television sets in the city of Hamilton, Canada, it was found that x = 340 owned colour sets. Find a 95% confidence interval for the actual proportion of families in this city with colour sets.

Let P be the actual proportion of families in this city with colour sets.

Given: n = 500, $\hat{p} = \dfrac{x}{n} = \dfrac{340}{500} = 0.68$, (1 − α) = 0.95 ⇒ α = 0.05

A 95% confidence interval for P is

$$\hat{p} - z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$\Rightarrow 0.68 - 1.96\sqrt{\frac{(0.68)(0.32)}{500}} < P < 0.68 + 1.96\sqrt{\frac{(0.68)(0.32)}{500}}$$
$$\Rightarrow 0.64 < P < 0.72$$

Example 6

A standardized chemistry test was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference µ1 − µ2, where µ1 is the mean score of all boys and µ2 is the mean score of all girls who might take this test.

Given: $n_1 = 75$, $n_2 = 50$, $\bar{x}_1 = 82$, $\bar{x}_2 = 76$, $s_1 = 8$, $s_2 = 6$, (1 − α) = 0.96 ⇒ α = 0.04

$n_1 > 30$ and $n_2 > 30$, so $\hat{\sigma}_1 = s_1$ and $\hat{\sigma}_2 = s_2$.

A 96% confidence interval for µ1 − µ2 is:

$$(\bar{x}_1 - \bar{x}_2) - z_{0.02}\sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{0.02}\sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$$
$$\Rightarrow (82 - 76) - 2.05\sqrt{\frac{8^2}{75} + \frac{6^2}{50}} < \mu_1 - \mu_2 < (82 - 76) + 2.05\sqrt{\frac{8^2}{75} + \frac{6^2}{50}}$$
$$\Rightarrow 3.43 < \mu_1 - \mu_2 < 8.57$$

Example 7

In a batch chemical process, two catalysts are being compared for their effect on the output of the process reaction. A sample of 12 batches was prepared using catalyst 1 and a sample of 10 batches was obtained using catalyst 2. The 12 batches for which catalyst 1 was used gave an average yield of 85 with a sample standard deviation of 4, while the average for the second sample gave

an average of 81 and a sample standard deviation of 5. Find a 90% confidence interval for the difference between the population means, assuming the populations are approximately normally distributed with equal variances.

Solution

Let µ1 and µ2 be the mean population yield using catalyst 1 and catalyst 2, respectively.

Given: $n_1 = 12$, $n_2 = 10$, $\bar{x}_1 = 85$, $\bar{x}_2 = 81$, $s_1 = 4$, $s_2 = 5$

(1 − α) = 0.90 ⇒ α = 0.10, $\nu = n_1 + n_2 - 2 = 12 + 10 - 2 = 20$, $t_{20,\,0.05} = 1.725$

Pooled estimate of sample standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(12 - 1)4^2 + (10 - 1)5^2}{12 + 10 - 2}} = 4.478$$

A 90% confidence interval for µ1 − µ2 is:

$$(\bar{x}_1 - \bar{x}_2) - t_{20,\,0.05}\,(s_p)\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{20,\,0.05}\,(s_p)\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$\Rightarrow (85 - 81) - (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}} < \mu_1 - \mu_2 < (85 - 81) + (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}}$$
$$\Rightarrow 0.69 < \mu_1 - \mu_2 < 7.31$$

Example 8

The weights of 10 adults selected randomly before and after a certain new diet was introduced were recorded as follows:

Adult          1     2     3     4     5     6     7     8     9     10
Before (x1)    76    60    85    58    91    75    82    64    79    88
After (x2)     81    52    87    70    86    77    90    63    85    83
Difference     -5    8     -2    -12   5     -2    -8    1     -6    5

Find a 98% confidence interval for the mean difference in weight.

Solution

$$\bar{d} = \frac{\sum d_i}{n} = -1.6$$
$$s_d^2 = \frac{\sum (d_i - (-1.6))^2}{n - 1} = 40.7, \quad s_d = \sqrt{40.7} = 6.38$$

For ν = n − 1 = 9, $t_{0.01} = 2.821$. A 98% confidence interval is

$$-1.6 \pm (2.821)\frac{6.38}{\sqrt{10}}$$

That is, $-7.29 < \mu_d < 4.09$.

Example 9

A certain change in a manufacturing procedure for component parts is being considered. Samples are taken using both the existing and the new procedure in order to determine if the new procedure results in an improvement. If 75 of 1500 items from the existing procedure were found to be defective and 80 of 2000 items from the new procedure were found to be defective, find a 90% confidence interval for the true difference in the fraction of defectives between the existing and the new process.

Solution

Let $P_1$ and $P_2$ be the true fraction of defectives of the existing and the new processes, respectively.

Given: $n_1 = 1500$, $x_1 = 75$, $\hat{p}_1 = \dfrac{75}{1500} = 0.05$; $n_2 = 2000$, $x_2 = 80$, $\hat{p}_2 = \dfrac{80}{2000} = 0.04$

(1 − α) = 0.90 ⇒ α = 0.10

A 90% confidence interval for $P_1 - P_2$ is:

$$(\hat{p}_1 - \hat{p}_2) - z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

$$\Rightarrow (0.01) - 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}} < P_1 - P_2 < (0.01) + 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}}$$
$$\Rightarrow -0.001697 < P_1 - P_2 < 0.021697$$

Lecture 5 - Introduction to Test of Hypothesis

Statistical Hypothesis

Consider the following example:

A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that µ = 8 kilograms against the alternative that µ ≠ 8 kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.

When a random sample is drawn from a population (the 50 lines randomly selected), the sample information can be used to assess the validity of some conjecture, or hypothesis. Here µ = 8 kilograms is known as the null hypothesis and µ ≠ 8 kilograms is the alternative hypothesis. They are complementary to each other, and we need to decide which one to accept on the basis of the sample result of 50 lines.

Now let us make a 95% confidence interval about the mean breaking strength of the population as below:

$$P\left(7.8 - 1.96 \cdot \frac{0.5}{\sqrt{50}} < \mu < 7.8 + 1.96 \cdot \frac{0.5}{\sqrt{50}}\right) = 0.95$$

i.e. P(7.6614 < µ < 7.9386) = 0.95.

As there is a probability of 0.95 that the mean breaking strength is between 7.66 kg and 7.94 kg, it is highly unlikely that the null hypothesis µ = 8 kg is true and hence it should be rejected.

There are four possible situations for the above decision making exercise:

               H0 is correct       H0 is wrong
Accept H0      Correct decision    Type 2 error
Reject H0      Type 1 error        Correct decision
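The 95% interval for the fishing-line example above can be reproduced with a few lines of code. This is a minimal sketch, not part of the original notes; the function name `z_confidence_interval` is hypothetical, and the critical value 1.96 is passed in as a table value rather than computed.

```python
import math

def z_confidence_interval(xbar, sigma, n, z):
    """Interval xbar +/- z * sigma / sqrt(n) for a mean with known sigma."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Fishing-line example: n = 50, sample mean 7.8 kg, sigma = 0.5 kg, z_0.025 = 1.96.
lower, upper = z_confidence_interval(xbar=7.8, sigma=0.5, n=50, z=1.96)
print(round(lower, 4), round(upper, 4))  # 7.6614 7.9386

# The claimed mean of 8 kg lies outside the interval.
print(lower < 8 < upper)  # False
```

Because the hypothesized value 8 falls outside the interval, the informal argument in the text leads to rejecting µ = 8 kg.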

We still have a probability of 1 − 0.95, or 0.05, of rejecting a true H0, which is the probability of committing a type 1 error. We call this probability the 'level of significance' or α.

The rationale of hypothesis testing is simply outlined as above. There are however some formal concepts and procedures to conduct the test. The details are put down below.

Some Hypothesis Testing Terminology

1. Null hypothesis, H0
A hypothesis that is held to be true until very strong evidence to the contrary is obtained.
H0 : µ = µ0

2. Alternative hypothesis, H1
It is a hypothesis that is the complement of the null hypothesis. Hence it will be accepted if the null hypothesis is rejected.
H1 : µ ≠ µ0 (two-tail test)
H1 : µ > µ0 (one-tail test)
H1 : µ < µ0 (one-tail test)
In the one-tail test we have some expectation about the direction of the error when the null hypothesis is wrong, while in the two-tail test we don't have such an expectation.

3. Test statistic is the value, based on the sample, used to determine whether the null hypothesis should be rejected or accepted.

4. Critical region is a region such that, if the test statistic falls in it, the null hypothesis will be rejected.

5. Types of error
(a) Type I error: Reject H0 when H0 is true
(b) Type II error: Accept H0 when H1 is true

6. The significance level, $\alpha$, is the probability of committing a Type I error, i.e. $P(\text{Type I error}) = \alpha$. The probability of committing a Type II error is $\beta$, i.e. $P(\text{Type II error}) = \beta$.

Basic Steps in Testing Hypothesis

1. Formulate the null hypothesis.

2. Formulate the alternative hypothesis.

3. Specify the level of significance to be used.

4. Select the appropriate test statistic and establish the critical region.

5. Compute the value of the test statistic.

6. Conclusion: Reject $H_0$ if the statistic has a value in the critical region, otherwise accept $H_0$.
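The six steps above can be sketched in code for the simplest case, a z-test of a single mean with known $\sigma$. This is a minimal illustration rather than a definitive implementation: the function name `z_test_mean` and its `tail` parameter are assumptions introduced here, and the critical value comes from Python's standard-library `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha, tail="two"):
    """Return (z, critical value, reject?) for a z-test of H0: mu = mu0.

    tail is "two" (H1: mu != mu0), "right" (H1: mu > mu0) or "left" (H1: mu < mu0).
    The function name and the tail argument are illustrative assumptions.
    """
    # Step 5: compute the test statistic z = (x_bar - mu0) / (sigma / sqrt(n)).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 4: establish the critical region from the level of significance.
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z < -critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    # Step 6: reject H0 only if z falls in the critical region.
    return z, critical, reject


# Fishing-line data from the text: n = 50, sample mean 7.8, sigma = 0.5, alpha = 0.01.
z, critical, reject = z_test_mean(7.8, 8.0, 0.5, 50, alpha=0.01, tail="two")
print(round(z, 3), round(critical, 2), reject)  # -2.828 2.58 True
```

Steps 1 to 3 correspond to choosing `mu0`, `tail` and `alpha` before any data are examined, which is why they appear as arguments and not as computed values.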

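Tests concerning means

The tests concerning means and proportions are summarized in the following table.

| $H_0$ | Conditions | Test statistic |
|---|---|---|
| $\mu = \mu_0$ | Large samples ($n \ge 30$) or $\sigma$ is known | $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ |
| $\mu = \mu_0$ | Small samples and $\sigma$ unknown | $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $\nu = n - 1$ |
| $\mu_1 - \mu_2 = d_0$ | Large samples or $\sigma_1$ and $\sigma_2$ are known | $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ |
| $\mu_1 - \mu_2 = d_0$ | Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 = \sigma_2$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ with $\nu = n_1 + n_2 - 2$ and $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ |
| $\mu_1 - \mu_2 = d_0$ | Small samples, $\sigma_1$ and $\sigma_2$ unknown, assume $\sigma_1 \ne \sigma_2$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ with $\nu = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$ |
| $\mu_1 - \mu_2 = d_0$ | Paired observations | $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$ with $\nu = n - 1$ |
| $p = p_0$ | Large sample | $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$ |
| $p_1 - p_2 = 0$ | Large samples | $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ with $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ |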
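The three small-sample $t$ statistics in the table can be sketched as plain functions. This is an illustrative sketch only: the function names are assumptions made here, and each function returns the statistic together with its degrees of freedom so that the critical value can still be looked up in a $t$ table. The inputs in the usage lines are the summary figures of the abrasive-wear example and the paired differences of the iron-content example worked in this lecture.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2, d0=0.0):
    """Two-sample t assuming equal unknown variances; returns (t, degrees of freedom)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((mean1 - mean2) - d0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_variance_t(mean1, mean2, s1, s2, n1, n2, d0=0.0):
    """Two-sample t assuming unequal variances; returns (t, approximate degrees of freedom)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = ((mean1 - mean2) - d0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def paired_t(differences, d0=0.0):
    """Paired-observation t on the list of differences; returns (t, degrees of freedom)."""
    n = len(differences)
    d_bar = sum(differences) / n
    sd2 = sum((d - d_bar) ** 2 for d in differences) / (n - 1)
    t = (d_bar - d0) / sqrt(sd2 / n)
    return t, n - 1


# Abrasive-wear summary figures: means 85 and 81, standard deviations 4 and 5, n = 12 and 10.
t_pooled, df_pooled = pooled_t(85, 81, 4, 5, 12, 10)
print(round(t_pooled, 3), df_pooled)  # 2.086 20

# Iron-content paired differences (x-ray minus chemical).
t_paired, df_paired = paired_t([-0.2, 0.1, -0.2, -0.2, 0.0])
print(round(t_paired, 4), df_paired)  # -1.5811 4
```

The unequal-variance version generally gives a non-integer number of degrees of freedom; in practice it is usually rounded down before consulting a table.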

Example 1

A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \ne 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.

Null hypothesis: $\mu = 8$ kilograms

Alternative hypothesis: $\mu \ne 8$ kilograms

Level of significance: 0.01

Critical region: $Z > z_{0.005} = 2.58$ or $Z < -z_{0.005} = -2.58$

Computation: $n = 50$, $\bar{x} = 7.8$, $\sigma = 0.5$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.828$$

Conclusion: As the sample $z$ ($= -2.828$) falls inside the critical region, reject the null hypothesis at the 0.01 level of significance and conclude that $\mu$ is significantly smaller than 8 kilograms.

Example 2

The average length of time for students to register for fall classes at a certain college has been 50 minutes with a standard deviation of 10 minutes. A new registration procedure using modern computing machines is being tried. If a random sample of 12 students had an average registration time of 42 minutes with a standard deviation of 11.9 minutes under the new system, test the hypothesis that the population mean is now less than 50, using a level of significance of (1) 0.05, and (2) 0.01. Assume the population of times to be normal.

Let $\mu$ be the population mean time for students to register under the new registration procedure.

(1) Null hypothesis: $\mu = 50$ minutes

Alternative hypothesis: $\mu < 50$ minutes

Level of significance: 0.05

Critical region: As $n = 12 < 30$ and the new $\sigma$ is unknown, the t-test should be used, with degrees of freedom $\nu = n - 1 = 12 - 1 = 11$.

$$\therefore\; t < -t_{11,\,0.05} = -1.796$$

Computation: $n = 12$, $\bar{x} = 42$, $s = 11.9$

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{42 - 50}{11.9/\sqrt{12}} = -2.329$$

Conclusion: As the sample $t$ ($= -2.329$) falls inside the critical region, reject the null hypothesis at the 0.05 level of significance and conclude that $\mu$ is significantly smaller than 50 minutes.

(2) Identical with (1), except that the critical region is replaced by

$$t < -t_{11,\,0.01} = -2.718$$

and the corresponding conclusion changes as follows:

As the sample $t$ ($= -2.329$) falls outside the critical region, do not reject the null hypothesis at the 0.01 level of significance; $\mu$ is not highly significantly smaller than 50 minutes.

Example 3

An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a standard deviation of 4, while the samples of material 2 gave an average of 81 and a standard deviation of 5. Test the hypothesis that the two types of material exhibit the same mean abrasive wear at the 0.10 level of significance. Assume the populations to be approximately normal with equal variances.

Let $\mu_1$ and $\mu_2$ be the mean abrasive wear of material 1 and material 2 respectively.

Null hypothesis: $\mu_1 = \mu_2$, i.e. $\mu_1 - \mu_2 = 0$

Alternative hypothesis: $\mu_1 \ne \mu_2$, i.e. $\mu_1 - \mu_2 \ne 0$

Level of significance: 0.10

Critical region: As both $n_1$ and $n_2$ are smaller than 30 and the population standard deviations are unknown, the t-test has to be used, with $\nu = n_1 + n_2 - 2 = 12 + 10 - 2 = 20$.

$$\therefore\; t > t_{20,\,0.05} = 1.725 \quad\text{or}\quad t < -t_{20,\,0.05} = -1.725$$

Computation: $n_1 = 12$, $n_2 = 10$, $\bar{x}_1 = 85$, $\bar{x}_2 = 81$, $s_1 = 4$, $s_2 = 5$

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(11)4^2 + (9)5^2}{12 + 10 - 2} = 20.05$$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(85 - 81) - 0}{\sqrt{20.05}\,\sqrt{\dfrac{1}{12} + \dfrac{1}{10}}} = 2.086$$

Conclusion: As the sample $t$ ($= 2.086$) falls inside the critical region, reject the null hypothesis at the 0.10 level of significance and conclude that the mean abrasive wear of material 1 is significantly higher than that of material 2.

Example 4

Five samples of a ferrous-type substance are to be used to determine if there is a difference between a laboratory chemical analysis and an X-ray fluorescence analysis of the iron content. Each sample was split into two sub-samples and the two types of analysis were applied. Following are the coded data showing the iron content analysis:

| Sample | X-ray | Chemical |
|---|---|---|
| 1 | 2.0 | 2.2 |
| 2 | 2.0 | 1.9 |
| 3 | 2.3 | 2.5 |
| 4 | 2.1 | 2.3 |
| 5 | 2.4 | 2.4 |

Assuming the populations normal, test at the 0.05 level of significance whether the two methods of analysis give, on the average, the same result.

Let $\mu_1$ and $\mu_2$ be the mean iron content determined by the laboratory chemical analysis and the X-ray fluorescence analysis respectively, and $\mu_D$ be the mean of the population of differences of paired measurements.

Null hypothesis: $\mu_1 = \mu_2$ or $\mu_D = 0$

Alternative hypothesis: $\mu_1 \ne \mu_2$ or $\mu_D \ne 0$

Level of significance: 0.05

Critical region: As $n = 5 < 30$, the t-test should be used.

$$\therefore\; t > t_{4,\,0.025} = 2.776 \quad\text{or}\quad t < -t_{4,\,0.025} = -2.776$$

Computation:

| Sample | X-ray | Chemical | $d_i$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 2.0 | 2.2 | -0.2 | 0.04 |
| 2 | 2.0 | 1.9 | 0.1 | 0.01 |
| 3 | 2.3 | 2.5 | -0.2 | 0.04 |
| 4 | 2.1 | 2.3 | -0.2 | 0.04 |
| 5 | 2.4 | 2.4 | 0 | 0 |

$$\sum_{i=1}^{5} d_i = -0.5, \qquad \sum_{i=1}^{5} d_i^2 = 0.13$$

$$\bar{d} = \frac{\sum d_i}{5} = \frac{-0.5}{5} = -0.1$$

$$s_d^2 = \frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)} = \frac{(5)(0.13) - (-0.5)^2}{(5)(4)} = 0.02$$

$$t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n}} = \frac{(-0.1) - 0}{\sqrt{0.02}/\sqrt{5}} = -1.5811$$

Conclusion: As the sample $t$ ($= -1.5811$) falls outside the critical region, do not reject the null hypothesis at the 0.05 level of significance and conclude that there is no significant difference in the mean iron content determined by the above two analyses.

Tests Concerning Proportions

Example 5

1 Level of significance: 0. so reject the null hypothesis at 0.05 level of significance and conclude that P is significantly smaller than 0. Is this evidence sufficient to conclude that the method has been improved? Use a 0. Example 6 A vote is to be taken among the residents of a town and the surrounding country to determine whether a proposed chemical plant should be constructed.1. would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters? Use a 0. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it.1)( 0. Let P be the proportion of defective product in the new production process.1 (0. To determine if there is a significant difference in the proportion of town voters and county voters favoring the proposal.05 − 0. In an experiment 100 items are produced with the new process and 5 are defective.A manufacturing company has submitted a claim that 90% of items produced by a certain process are non-defective.1 Alternative hypothesis: P < 0.05 = −1.025 level of significance.9) 100 = −1. Null hypothesis: P = P2 1 or P − P2 = 0 1 or Alternative hypothesis: P > P2 1 Level of significance: 0. favouring 1 the proposal. Let P and P2 be the proportions of town voters and country voters.05 Critical region: Z < −z 0.025 P − P2 > 0 1 49 . Null hypothesis: P = 0. That is.667 Conclusion: As the sample z (=-1.667) falls inside the critical region.05 level of significance. The construction site is within the town limits and for this reason many voters in the country feel that the proposal will pass because of the large proportion of town voters who favor the construction. respectively.05 n 100 z= ˆ p −P P (1 − P ) n = 0. An improvement in the process is being considered that they feel will lower the proportion of defective below the current 10%. the production method has been improved in lowering the proportion of defective below the current 10%. 
a poll is taken.64 Computation: n = 100 x=5 ˆ p= x 5 = = 0.

Critical region: $Z > z_{0.025} = 1.96$

Computation: $n_1 = 200$, $x_1 = 120$, $n_2 = 500$, $x_2 = 240$

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{120}{200} = 0.6, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{240}{500} = 0.48$$

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{(200)(0.6) + (500)(0.48)}{200 + 500} = 0.514$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.6 - 0.48) - 0}{\sqrt{(0.514)(0.486)\left(\dfrac{1}{200} + \dfrac{1}{500}\right)}} = 2.870$$

Conclusion: As the sample $z$ ($= 2.870$) falls inside the critical region, reject the null hypothesis at the 0.025 level of significance and conclude that the proportion of town voters favoring the proposal is significantly larger than that of the county voters.
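The two proportion tests above can be reproduced with a short sketch. The function names are assumptions introduced for illustration, and the critical values are taken from the standard library's `statistics.NormalDist`, so they carry more decimals than the table values 1.64 and 1.96 used in the text.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """z statistic for H0: p = p0, given x successes in n trials."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion estimate."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Example 5: 5 defectives in 100 items, H1: P < 0.1, alpha = 0.05 (left tail).
z5 = one_proportion_z(5, 100, 0.1)
crit5 = NormalDist().inv_cdf(0.05)
print(round(z5, 3), z5 < crit5)  # -1.667 True

# Example 6: 120 of 200 town voters, 240 of 500 county voters, H1: P1 > P2, alpha = 0.025.
z6 = two_proportion_z(120, 200, 240, 500)
crit6 = NormalDist().inv_cdf(1 - 0.025)
print(round(z6, 2), z6 > crit6)  # 2.87 True
```

The pooled proportion is computed here from the raw counts, which is algebraically the same as the weighted form $(n_1\hat{p}_1 + n_2\hat{p}_2)/(n_1 + n_2)$ but avoids rounding $\hat{p}$ to 0.514 first.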

Chapter 7 - Chi-square Tests

There are two types of chi-square tests: the goodness-of-fit test and tests for independence.

Goodness-of-fit Test

A test to determine if a population has a specified theoretical distribution. The test is based on how good a fit we have between the frequency of occurrence of observations in an observed sample and the expected frequencies obtained from the hypothesized distribution.

Theorem: A goodness-of-fit test between observed and expected frequencies is based on the quantity

$$\chi^2_{test} = \sum \frac{(O_i - E_i)^2}{E_i}$$

where $\chi^2_{test}$ is a value of the random variable whose sampling distribution is approximated very closely by the Chi-square distribution, $O_i$ is the observed frequency of cell $i$, and $E_i$ is the expected frequency of cell $i$.

The number of degrees of freedom in a Chi-square goodness-of-fit test is equal to the number of cells minus the number of quantities obtained from the observed data that are used in the calculations of the expected frequencies.

For a level of significance equal to $\alpha$, $\chi^2_{test} > \chi^2_{\alpha}$ constitutes the critical region.

The decision criterion described here should not be used unless each of the expected frequencies is at least equal to 5.

Example 1

Consider the tossing of a die 120 times.

| Faces | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed | 20 | 22 | 17 | 18 | 19 | 24 |

By comparing the observed frequencies with the expected frequencies, one has to decide whether the die is a fair die or not.

Null hypothesis: the die is a fair die, i.e. $P(X = i) = \frac{1}{6}$ for $i = 1, 2, 3, 4, 5$ and $6$

Alternative hypothesis: the die is not a fair die

Level of significance: 0.05

Critical region: $\nu = n - 1 = 6 - 1 = 5$, $\therefore\; \chi^2 > \chi^2_{5,\,0.05} = 11.07$

Computation:

$$\text{Expected value} = nP(X = i) = 120\left(\frac{1}{6}\right) = 20$$

| $i$ | Observed ($O_i$) | Expected ($E_i$) | $O_i - E_i$ |
|---|---|---|---|
| 1 | 20 | 20 | 0 |
| 2 | 22 | 20 | 2 |
| 3 | 17 | 20 | -3 |
| 4 | 18 | 20 | -2 |
| 5 | 19 | 20 | -1 |
| 6 | 24 | 20 | 4 |

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{0^2}{20} + \frac{2^2}{20} + \frac{(-3)^2}{20} + \frac{(-2)^2}{20} + \frac{(-1)^2}{20} + \frac{4^2}{20} = 1.7$$

Conclusion: As the sample $\chi^2$ ($= 1.7$) falls outside the critical region, do not reject the null hypothesis and conclude that the die is a fair die.

Example 2

The following distribution of battery lives may be approximated by the normal distribution.

| Class boundaries | $O_i$ |
|---|---|
| 1.45 - 1.95 | 2 |
| 1.95 - 2.45 | 1 |
| 2.45 - 2.95 | 4 |
| 2.95 - 3.45 | 15 |
| 3.45 - 3.95 | 10 |
| 3.95 - 4.45 | 5 |
| 4.45 - 4.95 | 3 |

The Chi-square test can be applied to test whether the above frequency distribution can be approximated by a normal distribution or not.

Null hypothesis: the distribution can be approximated by a normal distribution

Alternative hypothesis: the distribution cannot be approximated by a normal distribution

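Level of significance: 0.05

Critical region: $\chi^2 > \chi^2_{\nu,\,0.05}$ where $\nu = k - 3$, and $k$ is the number of cells. (Three quantities from the observed data are used to calculate the expected frequencies: the total frequency, the mean and the standard deviation.)

Computation: For finding the expected values, the mean and standard deviation of the frequency distribution have to be found first.

| Class boundaries | Class mark ($x_i$) | $O_i$ ($f_i$) |
|---|---|---|
| 1.45 - 1.95 | 1.7 | 2 |
| 1.95 - 2.45 | 2.2 | 1 |
| 2.45 - 2.95 | 2.7 | 4 |
| 2.95 - 3.45 | 3.2 | 15 |
| 3.45 - 3.95 | 3.7 | 10 |
| 3.95 - 4.45 | 4.2 | 5 |
| 4.45 - 4.95 | 4.7 | 3 |

$$n = \sum f = 40, \qquad \sum fx = 136.5, \qquad \sum fx^2 = 484.75$$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{136.5}{40} = 3.4125$$

$$s = \sqrt{\frac{\sum fx^2 - \left(\sum fx\right)^2/n}{n - 1}} = \sqrt{\frac{484.75 - (136.5)^2/40}{39}} = 0.6969$$

$$z\text{-value}: \quad \frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}$$

$$p\text{-value} = P\left(\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right)$$

where $L_i$ and $U_i$ are the lower and upper boundaries of the $i$th class. Here the "p-value" of a class is the probability that a normal observation falls in that class.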

$$E_i = (40)\,P\left(\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right)$$

| Class boundaries | $O_i$ | z-value | p-value | $E_i$ |
|---|---|---|---|---|
| 1.45 - 1.95 | 2 | $Z < -2.10$ | 0.0179 | 0.716 |
| 1.95 - 2.45 | 1 | $-2.10 < Z < -1.38$ | 0.0659 | 2.636 |
| 2.45 - 2.95 | 4 | $-1.38 < Z < -0.66$ | 0.1708 | 6.832 |
| 2.95 - 3.45 | 15 | $-0.66 < Z < 0.05$ | 0.2653 | 10.612 |
| 3.45 - 3.95 | 10 | $0.05 < Z < 0.77$ | 0.2595 | 10.38 |
| 3.95 - 4.45 | 5 | $0.77 < Z < 1.49$ | 0.1525 | 6.1 |
| 4.45 - 4.95 | 3 | $1.49 < Z$ | 0.0681 | 2.724 |

In order to satisfy the rule that the expected value in each cell is larger than or equal to 5, we have to combine the first three classes into one cell ($O = 7$, $E = 10.184$) and the last two classes into another cell ($O = 8$, $E = 8.824$). As such, the number of cells is 4 and $\nu = 4 - 3 = 1$.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(7 - 10.184)^2}{10.184} + \frac{(15 - 10.612)^2}{10.612} + \frac{(10 - 10.38)^2}{10.38} + \frac{(8 - 8.824)^2}{8.824} = 2.901$$

Conclusion: Since $\chi^2_{1,\,0.05} = 3.841$, the sample $\chi^2$ ($= 2.901$) falls outside the critical region. As such, do not reject the null hypothesis and conclude that the distribution of battery lives can be approximated by a normal distribution.

Test for Independence

The Chi-square test procedure can also be used to test the hypothesis of independence of two variables/attributes. The observed frequencies of two variables are entered in a two-way classification table, or contingency table.

Remark: The expected frequency of the cell in the $i$th row and $j$th column of the contingency table is

$$E_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{grand total}}$$

The degrees of freedom for the contingency table is equal to $(r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns in the table.
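The remark above translates directly into a few lines of code. The sketch below is illustrative only: the function names are assumptions, and the 2-by-3 table of counts is hypothetical data chosen so that the arithmetic is easy to check by hand.

```python
def expected_frequencies(observed):
    """Return the table of E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """Return (r - 1)(c - 1) for an r-by-c contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical 2-by-3 table of observed counts, for illustration only.
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)
print(expected)                      # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(degrees_of_freedom(observed))  # 2
```

Because both rows of the hypothetical table have the same total (60), the expected counts in each column are split evenly between the rows.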

Example 3

Suppose that we wish to study the relationship between grade point average and appearance.

| Appearance | Grade Point Average 1 | 2 | 3 | 4 | Totals |
|---|---|---|---|---|---|
| attractive | 14 | 11 | 10 | 5 | 40 |
| ordinary | 10 | 16 | 16 | 14 | 56 |
| unattractive | 3 | 4 | 7 | 10 | 24 |
| Totals | 27 | 31 | 33 | 29 | 120 |

Null hypothesis: There is no relationship between grade point average and appearance. That is, the two characteristics are independent.

Alternative hypothesis: There is a relationship between grade point average and appearance. That is, the two characteristics are not independent.

Level of significance: 0.05

Critical region: $\chi^2 > \chi^2_{\nu,\,0.05}$, where $\nu = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$

Computation (expected frequencies in parentheses):

| Appearance | Grade Point Average 1 | 2 | 3 | 4 | Totals |
|---|---|---|---|---|---|
| attractive | 14 (9) | 11 (10.33) | 10 (11) | 5 (9.67) | 40 |
| ordinary | 10 (12.6) | 16 (14.47) | 16 (15.4) | 14 (13.53) | 56 |
| unattractive | 3 (5.4) | 4 (6.2) | 7 (6.6) | 10 (5.8) | 24 |
| Totals | 27 | 31 | 33 | 29 | 120 |

$$\begin{aligned}
\chi^2 = \sum \frac{(O - E)^2}{E} &= \frac{(14 - 9)^2}{9} + \frac{(11 - 10.33)^2}{10.33} + \frac{(10 - 11)^2}{11} + \frac{(5 - 9.67)^2}{9.67} \\
&\quad + \frac{(10 - 12.6)^2}{12.6} + \frac{(16 - 14.47)^2}{14.47} + \frac{(16 - 15.4)^2}{15.4} + \frac{(14 - 13.53)^2}{13.53} \\
&\quad + \frac{(3 - 5.4)^2}{5.4} + \frac{(4 - 6.2)^2}{6.2} + \frac{(7 - 6.6)^2}{6.6} + \frac{(10 - 5.8)^2}{5.8} = 10.818
\end{aligned}$$

Conclusion: Since $\chi^2_{6,\,0.05} = 12.592$, the sample $\chi^2$ ($= 10.818$) falls outside the critical region. So do not reject the null hypothesis and conclude that there is no evidence to support a relationship between grade point average and appearance.

Test for Homogeneity

To test the hypothesis that several population proportions are equal.

Remark: The approach for the test of homogeneity is the same as for the test of independence of variables/attributes.

Example 4

A study of the purchase decisions for 3 stock portfolio managers, A, B, and C, was conducted to compare the rates of stock purchases that resulted in profits over a time period that was less than or equal to one year. One hundred randomly selected purchases obtained for each of the managers showed the results given in the table. Do these data provide evidence of differences among the rates of successful purchases for the three portfolio managers? Test with $\alpha = 0.05$.

| Result | Manager A | Manager B | Manager C |
|---|---|---|---|
| Purchases show profit | 63 | 71 | 55 |
| Purchases do not show profit | 37 | 29 | 45 |
| Total | 100 | 100 | 100 |

Null hypothesis: the rates of stock purchases that resulted in profit were the same for the three stock portfolio managers

Alternative hypothesis: their rates were not all the same

Level of significance: 0.05

Critical region: $\nu = (2 - 1)(3 - 1) = 2$, $\therefore\; \chi^2 > \chi^2_{2,\,0.05} = 5.991$

Computation (expected frequencies in parentheses):

| Result | Manager A | Manager B | Manager C | Total |
|---|---|---|---|---|
| Purchases show profit | 63 (63) | 71 (63) | 55 (63) | 189 |
| Purchases do not show profit | 37 (37) | 29 (37) | 45 (37) | 111 |
| Total | 100 | 100 | 100 | 300 |

$$\chi^2 = \frac{(63 - 63)^2}{63} + \frac{(71 - 63)^2}{63} + \frac{(55 - 63)^2}{63} + \frac{(37 - 37)^2}{37} + \frac{(29 - 37)^2}{37} + \frac{(45 - 37)^2}{37} = 5.49$$

Conclusion: As the sample $\chi^2$ ($= 5.49$) falls outside the critical region, do not reject the null hypothesis and conclude that there is not sufficient evidence that the rates of purchases resulting in profit differed among the three portfolio managers.
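The chi-square statistic used throughout this chapter is the same sum in every case, so a single helper covers both the goodness-of-fit and the homogeneity computations. The sketch below is a minimal illustration: the function name is an assumption, the cells are passed as flat lists, and the critical values are simply copied from the text because the Python standard library has no chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given two flat lists of equal length."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 120 tosses, expected frequency 20 per face, critical value 11.07.
die = chi_square_statistic([20, 22, 17, 18, 19, 24], [20] * 6)
print(round(die, 2), die > 11.07)  # 1.7 False

# Portfolio-manager example: cells listed row by row, critical value 5.991.
managers = chi_square_statistic([63, 71, 55, 37, 29, 45],
                                [63, 63, 63, 37, 37, 37])
print(round(managers, 2), managers > 5.991)  # 5.49 False
```

In both cases the comparison prints `False`, meaning the statistic falls outside the critical region and the null hypothesis is not rejected.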
