
**Introduction to Probability and Statistics**

Lecture Notes

Muhammad El-Taha

Department of Mathematics and Statistics

University of Southern Maine

96 Falmouth Street

Portland, ME 04104-9300

MBA 604, Spring 2003

MBA 604

Introduction to Probability and Statistics

Course Content.

Topic 1: Data Analysis

Topic 2: Probability

Topic 3: Random Variables and Discrete Distributions

Topic 4: Continuous Probability Distributions

Topic 5: Sampling Distributions

Topic 6: Point and Interval Estimation

Topic 7: Large Sample Estimation

Topic 8: Large-Sample Tests of Hypothesis

Topic 9: Inferences From Small Samples

Topic 10: The Analysis of Variance

Topic 11: Simple Linear Regression and Correlation

Topic 12: Multiple Linear Regression


Contents

1 Data Analysis

1 Introduction

2 Graphical Methods

3 Numerical methods

4 Percentiles

5 Sample Mean and Variance for Grouped Data

6 z-score

2 Probability

1 Sample Space and Events

2 Probability of an Event

3 Laws of Probability

4 Counting Sample Points

5 Random Sampling

6 Modeling Uncertainty

3 Discrete Random Variables

1 Random Variables

2 Expected Value and Variance

3 Discrete Distributions

4 Markov Chains

4 Continuous Distributions

1 Introduction

2 The Normal Distribution

3 Uniform: U[a,b]

4 Exponential

5 Sampling Distributions

1 The Central Limit Theorem (CLT)

2 Sampling Distributions

6 Large Sample Estimation

1 Introduction

2 Point Estimators and Their Properties

3 Single Quantitative Population

4 Single Binomial Population

5 Two Quantitative Populations

6 Two Binomial Populations

7 Large-Sample Tests of Hypothesis

1 Elements of a Statistical Test

2 A Large-Sample Statistical Test

3 Testing a Population Mean

4 Testing a Population Proportion

5 Comparing Two Population Means

6 Comparing Two Population Proportions

7 Reporting Results of Statistical Tests: P-Value

8 Small-Sample Tests of Hypothesis

1 Introduction

2 Student’s t Distribution

3 Small-Sample Inferences About a Population Mean

4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples

5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples

6 Inferences About a Population Variance

7 Comparing Two Population Variances

9 Analysis of Variance

1 Introduction

2 One Way ANOVA: Completely Randomized Experimental Design

3 The Randomized Block Design

10 Simple Linear Regression and Correlation

1 Introduction

2 A Simple Linear Probabilistic Model

3 Least Squares Prediction Equation

4 Inferences Concerning the Slope

5 Estimating E(y|x) For a Given x

6 Predicting y for a Given x

7 Coefficient of Correlation

8 Analysis of Variance

9 Computer Printouts for Regression Analysis

11 Multiple Linear Regression

1 Introduction: Example

2 A Multiple Linear Model

3 Least Squares Prediction Equation


Chapter 1

Data Analysis

Chapter Content.

Introduction

Statistical Problems

Descriptive Statistics

Graphical Methods

Frequency Distributions (Histograms)

Other Methods

Numerical methods

Measures of Central Tendency

Measures of Variability

Empirical Rule

Percentiles

1 Introduction

Statistical Problems

1. A market analyst wants to know the effectiveness of a new diet.

2. A pharmaceutical company wants to know whether a new drug is superior to existing drugs, and what its possible side effects are.

3. How fuel efficient is a certain car model?

4. Is there any relationship between your GPA and employment opportunities?

5. If you answer all questions on a true/false (or multiple-choice) examination completely at random, what are your chances of passing?

6. What is the effect of package design on sales?


7. How should polls be interpreted? How many individuals do you need to sample for your inferences to be acceptable? What is meant by the margin of error?

8. What is the effect of market strategy on market share?

9. How should one pick the stocks to invest in?

I. Definitions

Probability: A game of chance

Statistics: Branch of science that deals with data analysis

Course objective: To make decisions in the presence of uncertainty

Terminology

Data: Any recorded event (e.g. times to assemble a product)

Information: Any acquired data (e.g. a collection of numbers (data))

Knowledge: Useful data

Population: set of all measurements of interest

(e.g. all registered voters, all freshman students at the university)

Sample: A subset of measurements selected from the population of interest

Variable: A property of an individual population unit (e.g. major, height, weight of

freshman students)

Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements.

Inferential Statistics: deals with procedures used to make inferences (predictions)

about a population parameter from information contained in a sample.

Elements of a statistical problem:

(i) A clear definition of the population and variable of interest.

(ii) A design of the experiment or sampling procedure.

(iii) Collection and analysis of data (gathering and summarizing data).

(iv) A procedure for making predictions about the population based on sample information.

(v) A measure of “goodness” or reliability for the procedure.

Objective. (better statement)

To make inferences (predictions, decisions) about certain characteristics of a population based on information contained in a sample.

Types of data: qualitative vs quantitative OR discrete vs continuous

Descriptive statistics

Graphical vs numerical methods


2 Graphical Methods

Frequency and relative frequency distributions (Histograms):

Example

Weight Loss Data

20.5 19.5 15.6 24.1 9.9

15.4 12.7 5.4 17.0 28.6

16.9 7.8 23.3 11.8 18.4

13.4 14.3 19.2 9.2 16.8

8.8 22.1 20.8 12.6 15.9

Objective: Provide a useful summary of the available information.

Method: Construct a statistical graph called a “histogram” (or frequency distribution)

Weight Loss Data

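| Class | Class boundaries | Class frequency, f | Relative frequency, f/n |
|---|---|---|---|
| 1 | 5.0-9.0- | 3 | 3/25 (.12) |
| 2 | 9.0-13.0- | 5 | 5/25 (.20) |
| 3 | 13.0-17.0- | 7 | 7/25 (.28) |
| 4 | 17.0-21.0- | 6 | 6/25 (.24) |
| 5 | 21.0-25.0- | 3 | 3/25 (.12) |
| 6 | 25.0-29.0 | 1 | 1/25 (.04) |
| Totals | | 25 | 1.00 |

Note: a trailing "-" (as in 5.0-9.0-) means the upper boundary belongs to the next class; for example, 17.0 is counted in class 4.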

Let

k = # of classes

max = largest measurement

min = smallest measurement

n = sample size

w = class width

Rule of thumb:

- The number of classes chosen is usually between 5 and 20 (most of the time between 7 and 13).

- The more data one has, the larger the number of classes.


Formulas:

$$k = 1 + 3.3\log_{10}(n), \qquad w = \frac{\max - \min}{k}.$$

Note: $w = \frac{28.6 - 5.4}{6} = 3.87$. But we used $w = \frac{29 - 5}{6} = 4.0$ (why?)

Graphs: Graph the frequency and relative frequency distributions.

Exercise. Repeat the above example using 12 and 4 classes, respectively. Comment on the usefulness of each, including k = 6.

Steps in Constructing a Frequency Distribution (Histogram)

1. Determine the number of classes

2. Determine the class width

3. Locate class boundaries

4. Proceed as above
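The four steps above can be sketched in Python for the weight loss data. This is a minimal illustration, not part of the original notes: the helper names are hypothetical, the class count from $k = 1 + 3.3\log_{10}(n)$ is rounded up (the notes do not state a rounding rule), and the rounded width $w = 4$ with lower boundary 5.0 is taken from the worked example.

```python
import math

# Weight loss data (n = 25) from the example above.
weight_loss = [
    20.5, 19.5, 15.6, 24.1, 9.9,
    15.4, 12.7, 5.4, 17.0, 28.6,
    16.9, 7.8, 23.3, 11.8, 18.4,
    13.4, 14.3, 19.2, 9.2, 16.8,
    8.8, 22.1, 20.8, 12.6, 15.9,
]


def number_of_classes(n):
    """Step 1: k = 1 + 3.3 log10(n), rounded up (assumed rounding rule)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_frequencies(data, lower, width, k):
    """Steps 3-4: count values in [lower + i*w, lower + (i+1)*w), i = 0..k-1.

    A value exactly on the top boundary goes into the last class,
    matching the closed last class 25.0-29.0 in the table.
    """
    freqs = [0] * k
    for x in data:
        i = int((x - lower) // width)
        if i == k and x == lower + k * width:
            i = k - 1
        if 0 <= i < k:
            freqs[i] += 1
    return freqs


n = len(weight_loss)
k = number_of_classes(n)                                 # Step 1
raw_width = (max(weight_loss) - min(weight_loss)) / k    # Step 2: about 3.87
width, lower = 4.0, 5.0                                  # rounded as in the notes
freqs = class_frequencies(weight_loss, lower, width, k)

for i, f in enumerate(freqs):
    lo = lower + i * width
    print(f"class {i + 1}: {lo:.1f}-{lo + width:.1f}  f = {f}  f/n = {f / n:.2f}")
```

The counts reproduce the frequency column of the table (3, 5, 7, 6, 3, 1); changing `width` and `k` is one way to try the 12-class and 4-class exercise.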

Possible shapes of frequency distributions

1. Normal distribution (Bell shape)

2. Exponential

3. Uniform

4. Binomial, Poisson (discrete variables)

Important

- The normal distribution is the most popular, the most useful and the easiest to handle.

- It occurs naturally in practical applications.

- It lends itself easily to more in-depth analysis.

Other Graphical Methods

- Statistical Table: Comparing different populations

- Bar Charts

- Line Charts

- Pie-Charts

- Cheating with Charts


3 Numerical methods

| Measures of Central Tendency | Measures of Dispersion (Variability) |
|---|---|
| 1. Sample mean | 1. Range |
| 2. Sample median | 2. Mean Absolute Deviation (MAD) |
| 3. Sample mode | 3. Sample Variance |
| | 4. Sample Standard Deviation |

I. Measures of Central Tendency

Given a sample of measurements $(x_1, x_2, \dots, x_n)$ where

$n$ = sample size

$x_i$ = value of the $i$th observation in the sample

1. Sample Mean (arithmetic average)

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x}{n}$$

Example 1: Given a sample of 5 test grades

(90, 95, 80, 60, 75)

then

$\sum x = 90 + 95 + 80 + 60 + 75 = 400$

$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80.$

Example 2: Let x = age of a randomly selected student. Sample:

(20, 18, 22, 29, 21, 19)

$\sum x = 20 + 18 + 22 + 29 + 21 + 19 = 129$

$\bar{x} = \frac{\sum x}{n} = \frac{129}{6} = 21.5$

2. Sample Median

The median of a sample (data set) is the middle number when the measurements are arranged in ascending order.

Note:

If n is odd, the median is the middle number.

If n is even, the median is the average of the two middle numbers.

Example 1: Sample (9, 2, 7, 11, 14), n = 5

Step 1: arrange in ascending order

2, 7, 9, 11, 14

Step 2: med = 9.

Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6

Step 1: 2, 6, 7, 9, 11, 14

Step 2: med = $\frac{7 + 9}{2} = 8$.

Remarks:

(i) $\bar{x}$ is sensitive to extreme values.

(ii) The median is insensitive to extreme values (because the median is a measure of location or position).

3. Mode

The mode is the value of x (observation) that occurs with the greatest frequency.

Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7
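The three measures of central tendency can be checked with Python's standard `statistics` module, using the samples from the examples above. A minimal sketch; the last two lines use a hypothetical variant of the grade sample (75 replaced by 750) to illustrate Remarks (i) and (ii).

```python
from statistics import mean, median, mode

grades = [90, 95, 80, 60, 75]           # Example 1 (sample mean)
ages = [20, 18, 22, 29, 21, 19]         # Example 2 (sample mean)
odd_sample = [9, 2, 7, 11, 14]          # median, n odd
even_sample = [9, 2, 7, 11, 6, 14]      # median, n even
mode_sample = [9, 2, 7, 11, 14, 7, 2, 7]

print(mean(grades))         # 80
print(mean(ages))           # 21.5
print(median(odd_sample))   # 9
print(median(even_sample))  # 8.0, the average of the middle values 7 and 9
print(mode(mode_sample))    # 7

# Hypothetical extreme value: it moves the mean but not the median.
with_outlier = [90, 95, 80, 60, 750]
print(mean(with_outlier), median(with_outlier))  # 215 90
```

`median` sorts the data itself, so step 1 (arranging in ascending order) is handled internally.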


[Figure: relative positions of $\bar{x}$, median and mode in a relative frequency distribution]


II. Measures of Variability

Given: a sample of size n

sample: $(x_1, x_2, \dots, x_n)$

1. Range:

Range = largest measurement - smallest measurement

or Range = max - min

Example 1: Sample (90, 85, 65, 75, 70, 95)

Range = max - min = 95-65 = 30

2. Mean Absolute Deviation (MAD) (not in textbook)

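$$\text{MAD} = \frac{\sum \lvert x - \bar{x} \rvert}{n}$$

Example 2: Same sample

$\bar{x} = \frac{\sum x}{n} = 80$

| | $x$ | $x - \bar{x}$ | $\lvert x - \bar{x} \rvert$ |
|---|---|---|---|
| | 90 | 10 | 10 |
| | 85 | 5 | 5 |
| | 65 | -15 | 15 |
| | 75 | -5 | 5 |
| | 70 | -10 | 10 |
| | 95 | 15 | 15 |
| Totals | 480 | 0 | 60 |

$$\text{MAD} = \frac{\sum \lvert x - \bar{x} \rvert}{n} = \frac{60}{6} = 10.$$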

Remarks:

(i) MAD is a good measure of variability

(ii) It is difficult to work with in mathematical manipulations.
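As a quick check of the range and MAD computations for the sample (90, 85, 65, 75, 70, 95), here is a minimal Python sketch; the function names are illustrative only, and the MAD divisor is n, as in the definition above.

```python
sample = [90, 85, 65, 75, 70, 95]


def sample_range(xs):
    """Range = max - min."""
    return max(xs) - min(xs)


def mean_absolute_deviation(xs):
    """MAD = sum(|x - xbar|) / n, with divisor n as defined in the notes."""
    xbar = sum(xs) / len(xs)
    return sum(abs(x - xbar) for x in xs) / len(xs)


print(sample_range(sample))             # 30
print(mean_absolute_deviation(sample))  # 10.0
```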

3. Sample Variance, $s^2$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

4. Sample Standard Deviation, $s$

$$s = \sqrt{s^2} \quad \text{or} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

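Example: Same sample as before ($\bar{x} = 80$)

| | $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|---|
| | 90 | 10 | 100 |
| | 85 | 5 | 25 |
| | 65 | -15 | 225 |
| | 75 | -5 | 25 |
| | 70 | -10 | 100 |
| | 95 | 15 | 225 |
| Totals | 480 | 0 | 700 |

Therefore

$$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{700}{5} = 140$$

$$s = \sqrt{s^2} = \sqrt{140} = 11.83$$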

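Shortcut Formula for Calculating $s^2$ and $s$

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \qquad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \quad \left(\text{or } s = \sqrt{s^2}\right).$$

Example: Same sample

| | $x$ | $x^2$ |
|---|---|---|
| | 90 | 8100 |
| | 85 | 7225 |
| | 65 | 4225 |
| | 75 | 5625 |
| | 70 | 4900 |
| | 95 | 9025 |
| Totals | 480 | 39,100 |

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{39{,}100 - \frac{(480)^2}{6}}{5} = \frac{39{,}100 - 38{,}400}{5} = \frac{700}{5} = 140$$

$$s = \sqrt{s^2} = \sqrt{140} = 11.83.$$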
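The defining and shortcut formulas can be compared directly. A minimal Python sketch for the same sample; the function names are illustrative, and `statistics.variance` from the standard library is used only as a cross-check because it also divides by n - 1.

```python
import math
import statistics

sample = [90, 85, 65, 75, 70, 95]


def variance_definition(xs):
    """Defining formula: s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Shortcut formula: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


s2_def = variance_definition(sample)
s2_short = variance_shortcut(sample)
s = math.sqrt(s2_def)

print(s2_def, s2_short, round(s, 2))  # 140.0 140.0 11.83

# statistics.variance also divides by n - 1 for a sample.
print(statistics.variance(sample))
```

The two formulas are algebraically equal. As a general numerical caveat (not discussed in the notes), the shortcut subtracts two large, nearly equal numbers, so with large values in floating-point arithmetic it can lose precision; the defining formula avoids that.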

Numerical Methods (Summary)

Data: $\{x_1, x_2, \dots, x_n\}$

(i) Measures of central tendency

Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

Sample median: the middle number when the measurements are arranged in ascending order

Sample mode: most frequently occurring value

(ii) Measures of variability

Range: $r = \max - \min$

Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$

Exercise: Find all the measures of central tendency and measures of variability for the

weight loss example.

Graphical Interpretation of the Variance:

Finite Populations

Let $N$ = population size.

Data: $\{x_1, x_2, \dots, x_N\}$

Population mean: $\mu = \frac{\sum x_i}{N}$

Population variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$, i.e.

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Population parameters vs sample statistics.

Sample statistics: $\bar{x}$, $s^2$, $s$.

Population parameters: $\mu$, $\sigma^2$, $\sigma$.
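To contrast the two divisors, the sketch below treats the same six numbers once as a complete finite population (divide by N) and once as a sample (divide by n - 1). Treating the grades as a population is a hypothetical choice for illustration; the functions are from Python's standard `statistics` module.

```python
import statistics

# Hypothetical: the six grades from the earlier examples, viewed both as a
# complete finite population and as a sample.
values = [90, 85, 65, 75, 70, 95]

mu = statistics.fmean(values)            # population mean
sigma2 = statistics.pvariance(values)    # sum((x - mu)^2) / N, here 700 / 6
sigma = statistics.pstdev(values)        # population standard deviation
s2 = statistics.variance(values)         # sum((x - xbar)^2) / (n - 1), here 700 / 5
s = statistics.stdev(values)             # sample standard deviation

print(mu, round(sigma2, 2), round(s2, 2))
```

For the same data the sample variance is always the larger of the two, since it divides the same sum of squares by a smaller number.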

Practical Significance of the Standard Deviation

Chebyshev’s Inequality (regardless of the shape of the frequency distribution)

Given a number $k \ge 1$ and a set of measurements $x_1, x_2, \dots, x_n$, at least $\left(1 - \frac{1}{k^2}\right)$ of the measurements lie within $k$ standard deviations of their sample mean.

Restated. At least $\left(1 - \frac{1}{k^2}\right)$ of the observations lie in the interval $(\bar{x} - ks, \bar{x} + ks)$.

Example. A set of grades has $\bar{x} = 75$, $s = 6$. Then

(i) (k = 1): at least 0% of all grades lie in [69, 81]

(ii) (k = 2): at least 75% of all grades lie in [63, 87]

(iii) (k = 3): at least 88.9% (= 8/9) of all grades lie in [57, 93]

(iv) (k = 4): at least ?% of all grades lie in [?, ?]

(v) (k = 5): at least ?% of all grades lie in [?, ?]
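The Chebyshev bound and interval can be computed for any $k \ge 1$. A minimal Python sketch; `chebyshev` is a hypothetical helper name. It prints cases (i) to (iii); changing the loop to `(4, 5)` is one way to check parts (iv) and (v).

```python
def chebyshev(xbar, s, k):
    """Lower bound 1 - 1/k^2 and interval (xbar - k*s, xbar + k*s), for k >= 1."""
    if k < 1:
        raise ValueError("the inequality is stated here for k >= 1")
    return 1 - 1 / k**2, (xbar - k * s, xbar + k * s)


for k in (1, 2, 3):
    bound, (lo, hi) = chebyshev(75, 6, k)
    print(f"k = {k}: at least {bound:.1%} of grades lie in [{lo}, {hi}]")
```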

Suppose that you are told that the frequency distribution is bell shaped. Can you improve the estimates in Chebyshev’s Inequality?

Empirical rule. Suppose a set of measurements $x_1, x_2, \dots, x_n$ has a bell-shaped distribution. Then

(i) approximately 68% of the measurements lie within one standard deviation of their sample mean, i.e. in $(\bar{x} - s, \bar{x} + s)$

(ii) approximately 95% of the measurements lie within two standard deviations of their sample mean, i.e. in $(\bar{x} - 2s, \bar{x} + 2s)$

(iii) almost all (at least 99%) of the measurements lie within three standard deviations of their sample mean, i.e. in $(\bar{x} - 3s, \bar{x} + 3s)$

Example. A data set has $\bar{x} = 75$, $s = 6$. The frequency distribution is known to be normal (bell shaped). Then

(i) (69, 81) contains approximately 68% of the observations

(ii) (63, 87) contains approximately 95% of the observations

(iii) (57, 93) contains at least 99% (almost all) of the observations
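The empirical-rule percentages come from the normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8+), assuming the population is exactly normal, which the rule only approximates for real bell-shaped data.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"P(within {k} sd) = {within:.4f}")

# The example: a normal population with mean 75 and standard deviation 6.
grades = NormalDist(mu=75, sigma=6)
print(round(grades.cdf(87) - grades.cdf(63), 4))  # interval (63, 87)
```

The three probabilities are about 68.3%, 95.4% and 99.7%, which is where the rule's 68%, 95% and "almost all" come from.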

Comments.

(i) The empirical rule works better if the sample size is large.

(ii) In your calculations always keep 6 significant digits.


(iii) Approximation: $s \approx \frac{\text{range}}{4}$

(iv) Coefficient of variation (c.v.) $= \frac{s}{\bar{x}}$

4 Percentiles

Percentiles are useful when the data are badly skewed.

Let $x_1, x_2, \dots, x_n$ be a set of measurements arranged in increasing order.

Definition. Let $0 < p < 100$. The $p$th percentile is a number $x$ such that $p\%$ of all measurements fall below the $p$th percentile and $(100 - p)\%$ fall above it.

Example. Data: 2, 5, 8, 10, 11, 14, 17, 20.

(i) Find the 30th percentile.

Solution.

(S1) position = .3(n + 1) = .3(9) = 2.7, i.e. 70% of the way from the 2nd to the 3rd ordered value

(S2) 30th percentile = 5 + .7(8 − 5) = 5 + 2.1 = 7.1

Special Cases.

1. Lower Quartile (25th percentile)

Example.

(S1) position = .25(n + 1) = .25(9) = 2.25

(S2) $Q_1 = 5 + .25(8 - 5) = 5 + .75 = 5.75$

2. Median (50th percentile)

Example.

(S1) position = .5(n + 1) = .5(9) = 4.5

(S2) median: $Q_2 = 10 + .5(11 - 10) = 10.5$

3. Upper Quartile (75th percentile)

Example.

(S1) position = .75(n + 1) = .75(9) = 6.75

(S2) $Q_3 = 14 + .75(17 - 14) = 16.25$

Interquartile range.

$IQ = Q_3 - Q_1$

Exercise. Find the interquartile range (IQ) in the above example.
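The two-step percentile rule, (S1) position = p(n + 1)/100 and (S2) linear interpolation, can be sketched in Python. The function name is hypothetical, and clamping positions outside [1, n] to the smallest or largest value is an assumption, since the notes do not cover that case.

```python
import math
import statistics


def percentile(data, p):
    """p-th percentile (0 < p < 100) using the position rule of the notes.

    position = p/100 * (n + 1), then linear interpolation between the two
    ordered values around that position. Positions below 1 or above n are
    clamped to the smallest/largest value (an assumption).
    """
    xs = sorted(data)
    n = len(xs)
    pos = p / 100 * (n + 1)
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    j = math.floor(pos)   # 1-based rank of the lower neighbour
    frac = pos - j
    return xs[j - 1] + frac * (xs[j] - xs[j - 1])


data = [2, 5, 8, 10, 11, 14, 17, 20]
for p in (30, 25, 50, 75):
    print(p, round(percentile(data, p), 4))

# Cross-check: statistics.quantiles' default "exclusive" method also places
# quantiles using n + 1, so its quartiles agree with the rule above.
print(statistics.quantiles(data, n=4))
```

The printed values match the worked example (7.1, 5.75, 10.5, 16.25). Other software may use different percentile conventions and give slightly different answers on small samples.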


5 Sample Mean and Variance for Grouped Data

Example: (weight loss data)

Weight Loss Data

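| Class | Class boundaries | Midpoint, x | Frequency, f | xf | $x^2 f$ |
|---|---|---|---|---|---|
| 1 | 5.0-9.0- | 7 | 3 | 21 | 147 |
| 2 | 9.0-13.0- | 11 | 5 | 55 | 605 |
| 3 | 13.0-17.0- | 15 | 7 | 105 | 1,575 |
| 4 | 17.0-21.0- | 19 | 6 | 114 | 2,166 |
| 5 | 21.0-25.0- | 23 | 3 | 69 | 1,587 |
| 6 | 25.0-29.0 | 27 | 1 | 27 | 729 |
| Totals | | | 25 | 391 | 6,809 |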

Let k = number of classes.

Formulas.

$$\bar{x}_g = \frac{\sum xf}{n}$$

$$s_g^2 = \frac{\sum x^2 f - (\sum xf)^2 / n}{n - 1}$$

where the summation is over the $k$ classes.

Exercise: Use the grouped data formulas to calculate the sample mean, sample variance

and sample standard deviation of the grouped data in the weight loss example. Compare

with the raw data results.
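The grouped-data formulas only need the class midpoints and frequencies. A minimal Python sketch for the weight loss table; the variable names are illustrative, and the comparison with the raw-data results is left to the exercise.

```python
import math

# Weight loss data in grouped form: class midpoints x and frequencies f.
midpoints = [7, 11, 15, 19, 23, 27]
freqs = [3, 5, 7, 6, 3, 1]

n = sum(freqs)
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))

xbar_g = sum_xf / n                              # grouped sample mean
s2_g = (sum_x2f - sum_xf ** 2 / n) / (n - 1)     # grouped sample variance
s_g = math.sqrt(s2_g)                            # grouped standard deviation

print(n, sum_xf, sum_x2f)  # 25 391 6809, the column totals of the table
print(xbar_g, round(s2_g, 4), round(s_g, 4))
```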

6 z-score

1. The sample z-score for a measurement x is

$$z = \frac{x - \bar{x}}{s}$$

2. The population z-score for a measurement x is

$$z = \frac{x - \mu}{\sigma}$$

Example. A set of grades has $\bar{x} = 75$, $s = 6$. Suppose your score is 85. What is your relative standing, i.e. how many standard deviations, $s$, above (or below) the mean is your score?

Answer.

$$z = \frac{x - \bar{x}}{s} = \frac{85 - 75}{6} = 1.67$$

standard deviations above average.
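Both z-score formulas have the same shape, so one small function covers them: pass $(\bar{x}, s)$ for the sample version or $(\mu, \sigma)$ for the population version. A minimal sketch; `z_score` is a hypothetical name.

```python
def z_score(x, center, spread):
    """Number of standard deviations x lies above (+) or below (-) the center."""
    return (x - center) / spread


z = z_score(85, 75, 6)  # the grade example: xbar = 75, s = 6
print(round(z, 2))      # 1.67
```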

Review Exercises: Data Analysis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. (Fluoride Problem) The regulation board of health in a particular state specifies that the fluoride level must not exceed 1.5 ppm (parts per million). The 25 measurements below represent the fluoride level for a sample of 25 days. Although fluoride levels are measured more than once per day, these data represent the early morning readings for the 25 days sampled.

.75 .86 .84 .85 .97

.94 .89 .84 .83 .89

.88 .78 .77 .76 .82

.71 .92 1.05 .94 .83

.81 .85 .97 .93 .79

(i) Show that $\bar{x} = .8588$, $s^2 = .0065$, $s = .0803$.

(ii) Find the range, R.

(iii) Using k = 7 classes, find the width, w, of each class interval.

(iv) Locate class boundaries.

(v) Construct the frequency and relative frequency distributions for the data.


| Class | Frequency | Relative frequency |
|---|---|---|
| .70-.75- | | |
| .75-.80- | | |
| .80-.85- | | |
| .85-.90- | | |
| .90-.95- | | |
| .95-1.00- | | |
| 1.00-1.05 | | |
| Totals | | |

(vi) Graph the frequency and relative frequency distributions and state your conclusions. (Vertical axis must be clearly labeled.)

2. Given the following data set (weight loss per week)

(9, 2, 5, 8, 4, 5)

(i) Find the sample mean.

(ii) Find the sample median.

(iii) Find the sample mode.

(iv) Find the sample range.

(v) Find the mean absolute deviation.

(vi) Find the sample variance using the defining formula.

(vii) Find the sample variance using the short-cut formula.

(viii) Find the sample standard deviation.

(ix) Find the first and third quartiles, $Q_1$ and $Q_3$.

(x) Repeat (i)-(ix) for the data set (21, 24, 15, 16, 24).

Answers: $\bar{x} = 5.5$, med = 5, mode = 5, range = 7, MAD = 2, $s^2 = 6.7$, $s = 2.588$, $Q_3 = 8.25$.

3. Grades for 50 students from a previous MAT test are summarized below.

| Class | Frequency, f | xf | $x^2 f$ |
|---|---|---|---|
| 40-50- | 4 | | |
| 50-60- | 6 | | |
| 60-70- | 10 | | |
| 70-80- | 15 | | |
| 80-90- | 10 | | |
| 90-100 | 5 | | |
| Totals | | | |


(i) Complete all entries in the table.

(ii) Graph the frequency distribution. (Vertical axis must be clearly labeled)

(iii) Find the sample mean for the grouped data

(iv) Find the sample variance and standard deviation for the grouped data.

Answers: $\sum xf = 3610$, $\sum x^2 f = 270{,}250$, $\bar{x} = 72.2$, $s^2 = 196$, $s = 14$.

4. Refer to the raw data in the fluoride problem.

(i) Find the sample mean and standard deviation for the raw data.

(ii) Find the sample mean and standard deviation for the grouped data.

(iii) Compare the answers in (i) and (ii).

Answers: $\sum xf = 21.475$, $\sum x^2 f = 18.58$, $\bar{x}_g = .859$, $s_g = .0745$.

5. Suppose that the mean of a population is 30. Assume the standard deviation is

known to be 4 and that the frequency distribution is known to be bell-shaped.

(i) Approximately what percentage of measurements fall in the interval (22, 34)?

(ii) Approximately what percentage of measurements fall in the interval $(\mu, \mu + 2\sigma)$?

(iii) Find the interval around the mean that contains 68% of measurements.

(iv) Find the interval around the mean that contains 95% of measurements.

6. Refer to the data in the fluoride problem. Suppose that the relative frequency distribution is bell-shaped. Using the empirical rule

(i) find the interval around the mean that contains 99.6% of measurements.

(ii) find the percentage of measurements that fall in the interval $(\mu + 2\sigma, \infty)$.

7. (4 pts.) Answer True or False. (Circle your choice.)

T F (i) The median is insensitive to extreme values.

T F (ii) The mean is insensitive to extreme values.

T F (iii) For a positively skewed frequency distribution, the mean is larger than the

median.

T F (iv) The variance is equal to the square of the standard deviation.

T F (v) Numerical descriptive measures computed from sample measurements are

called parameters.

T F (vi) The number of students attending a Mathematics lecture on any given day

is a discrete variable.


T F (vii) The median is a better measure of central tendency than the mean when a

distribution is badly skewed.

T F (viii) Although we may have a large mass of data, statistical techniques allow us

to adequately describe and summarize the data with an average.

T F (ix) A sample is a subset of the population.

T F (x) A statistic is a number that describes a population characteristic.

T F (xi) A parameter is a number that describes a sample characteristic.

T F (xii) A population is a subset of the sample.

T F (xiii) A population is the complete collection of items under study.


Chapter 2

Probability

Contents.

Sample Space and Events

Probability of an Event

Equally Likely Outcomes

Conditional Probability and Independence

Laws of Probability

Counting Sample Points

Random Sampling

1 Sample Space and Events

Definitions

Random experiment: involves obtaining observations of some kind

Examples: tossing a coin, throwing a die, polling, inspecting an assembly line, counting arrivals at an emergency room, etc.

Population: Set of all possible observations. Conceptually, a population could be generated by repeating an experiment indefinitely.

Outcome of an experiment:

Elementary event (simple event): one possible outcome of an experiment

Event (Compound event): One or more possible outcomes of a random experiment

Sample space: the set of all sample points (simple events) for an experiment, i.e. the set of all possible outcomes of the experiment.

Notation.

Sample space: $S$

Sample point: $E_1, E_2, \dots$ etc.

Event: A, B, C, D, E etc. (any capital letter).

Venn diagram:

Example.

$S = \{E_1, E_2, \dots, E_6\}$.

That is, $S = \{1, 2, 3, 4, 5, 6\}$. We may think of $S$ as a representation of the possible outcomes of a throw of a die.

More definitions

Union, Intersection and Complementation

Given two events A and B in a sample space S.

1. The union of A and B, $A \cup B$, is the event containing all sample points in either A or B or both. Sometimes we use "A or B" for union.

2. The intersection of A and B, $A \cap B$, is the event containing all sample points that are in both A and B. Sometimes we use AB or "A and B" for intersection.

3. The complement of A, $A^c$, is the event containing all sample points that are not in A. Sometimes we use "not A" or $\bar{A}$ for complement.

Mutually Exclusive Events (Disjoint Events). Two events are said to be mutually exclusive (or disjoint) if their intersection is empty (i.e. $A \cap B = \emptyset$).

Example. Suppose $S = \{E_1, E_2, \dots, E_6\}$. Let $A = \{E_1, E_3, E_5\}$; $B = \{E_1, E_2, E_3\}$. Then

(i) $A \cup B = \{E_1, E_2, E_3, E_5\}$.

(ii) $AB = \{E_1, E_3\}$.

(iii) $A^c = \{E_2, E_4, E_6\}$; $B^c = \{E_4, E_5, E_6\}$;

(iv) A and B are not mutually exclusive (why?)

(v) Give two events in S that are mutually exclusive.

2 Probability of an event

Relative Frequency Definition. If an experiment is repeated a large number, $n$, of times and the event A is observed $n_A$ times, the probability of A is

$$P(A) \approx \frac{n_A}{n}$$

Interpretation

$n$ = # of trials of an experiment

$n_A$ = frequency of the event A

$\frac{n_A}{n}$ = relative frequency of A

$P(A) \approx \frac{n_A}{n}$ if $n$ is large enough.

(In fact, $P(A) = \lim_{n \to \infty} \frac{n_A}{n}$.)
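The relative frequency definition can be illustrated by simulation: repeat an experiment n times, count $n_A$, and watch $n_A/n$ settle down as n grows. A minimal sketch, assuming a fair-coin model for `toss`; the helper names are hypothetical, and the seed only makes the run reproducible.

```python
import random


def relative_frequency(event, trial, n, seed=0):
    """Return n_A / n: the fraction of n simulated trials in which event A occurs."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_a = sum(1 for _ in range(n) if event(trial(rng)))
    return n_a / n


def toss(rng):
    """One toss of a fair coin (assumed model: H and T equally likely)."""
    return rng.choice("HT")


def is_head(outcome):
    return outcome == "H"


for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(is_head, toss, n))
```

The printed values are estimates, not probabilities: for small n they can be far from 0.5, which is why the definition requires n to be large.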

Conceptual Definition of Probability

Consider a random experiment whose sample space is $S$ with sample points $E_1, E_2, \dots$. For each sample point $E_i$ of the sample space $S$ define a number $P(E_i)$ that satisfies the following three conditions:

(i) $0 \le P(E_i) \le 1$ for all $i$

(ii) $P(S) = 1$

(iii) (Additive property) $\sum_S P(E_i) = 1$, where the summation is over all sample points in $S$.

We refer to $P(E_i)$ as the probability of $E_i$.

Definition. The probability of any event A is equal to the sum of the probabilities of the sample points in A.

Example. Let $S = \{E_1, \dots, E_{10}\}$. It is known that $P(E_i) = 1/20$, $i = 1, \dots, 6$; $P(E_i) = 1/5$, $i = 7, 8, 9$; and $P(E_{10}) = 2/20$. In tabular form, we have

| $E_i$ | $E_1$ | $E_2$ | $E_3$ | $E_4$ | $E_5$ | $E_6$ | $E_7$ | $E_8$ | $E_9$ | $E_{10}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $P(E_i)$ | 1/20 | 1/20 | 1/20 | 1/20 | 1/20 | 1/20 | 1/5 | 1/5 | 1/5 | 1/10 |

Question: Calculate $P(A)$ where $A = \{E_i, i \ge 6\}$.

Answer:

$$P(A) = P(E_6) + P(E_7) + P(E_8) + P(E_9) + P(E_{10}) = 1/20 + 1/5 + 1/5 + 1/5 + 1/10 = 0.75$$
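The rule "P(A) is the sum of the probabilities of the sample points in A" is easy to check with exact fractions. A minimal sketch using Python's `fractions.Fraction`; sample points $E_i$ are represented by their index i, which is an encoding choice for illustration.

```python
from fractions import Fraction

# P(E_i) for i = 1, ..., 10, as in the table above (E_i represented by i).
p = {i: Fraction(1, 20) for i in range(1, 7)}
p.update({i: Fraction(1, 5) for i in (7, 8, 9)})
p[10] = Fraction(1, 10)

assert all(0 <= v <= 1 for v in p.values())  # condition (i)
assert sum(p.values()) == 1                  # conditions (ii)-(iii)


def prob(event):
    """P(A) = sum of the probabilities of the sample points in A."""
    return sum(p[i] for i in event)


A = {i for i in p if i >= 6}
print(prob(A))  # 3/4
```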

Steps in calculating probabilities of events

1. Deﬁne the experiment

2. List all simple events

3. Assign probabilities to simple events

4. Determine the simple events that constitute an event

5. Add up the simple events’ probabilities to obtain the probability of the event


Example. Calculate the probability of observing exactly one H in a toss of two fair coins.

Solution.

S = {HH, HT, TH, TT}

A = {HT, TH}

P(A) = 0.5

Interpretations of Probability

(i) In real-world applications one observes (measures) relative frequencies; one cannot measure probabilities directly. However, one can estimate probabilities.

(ii) At the conceptual level we assign probabilities to events. The assignment, however, should make sense (e.g. P(H) = .5, P(T) = .5 in a toss of a fair coin).

(iii) In some cases probabilities can be a measure of belief (subjective probability).

This measure of belief should however satisfy the axioms.

(iv) Typically, we would like to assign probabilities to simple events directly; then use

the laws of probability to calculate the probabilities of compound events.

Equally Likely Outcomes

The equally likely probability P defined on a finite sample space $S = \{E_1, \dots, E_N\}$ assigns the same probability $P(E_i) = 1/N$ to every $E_i$.

In this case, for any event A

$$P(A) = \frac{N_A}{N} = \frac{\text{sample points in } A}{\text{sample points in } S} = \frac{\#(A)}{\#(S)}$$

where $N$ is the number of sample points in $S$ and $N_A$ is the number of sample points in $A$.
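For equally likely outcomes, $P(A) = \#(A)/\#(S)$ can be computed by listing $S$ with `itertools.product` and counting. A minimal sketch, shown on the two-coin example solved earlier; `equally_likely_prob` is a hypothetical helper name, and changing `repeat=2` to `repeat=3` lists the sample space for the three-toss example that follows.

```python
from fractions import Fraction
from itertools import product


def equally_likely_prob(sample_space, event):
    """P(A) = #(A) / #(S) when every sample point has probability 1/N."""
    points = list(sample_space)
    favourable = [pt for pt in points if event(pt)]
    return Fraction(len(favourable), len(points))


# Two fair coins: S = {HH, HT, TH, TT}; A = "exactly one H".
two_coins = ["".join(t) for t in product("HT", repeat=2)]
print(two_coins)                                                    # ['HH', 'HT', 'TH', 'TT']
print(equally_likely_prob(two_coins, lambda s: s.count("H") == 1))  # 1/2
```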

Example. Toss a fair coin 3 times.

(i) List all the sample points in the sample space

Solution: $S = \{HHH, \dots, TTT\}$ (Complete this)

(ii) Find the probability of observing exactly two heads, and the probability of observing at most one head.

3 Laws of Probability

Conditional Probability

The conditional probability of the event A given that event B has occurred is denoted by $P(A|B)$. Then

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

provided $P(B) > 0$. Similarly,

$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$

provided $P(A) > 0$.

Independent Events

Definitions. (i) Two events A and B are said to be independent if $P(A \cap B) = P(A)P(B)$.

(ii) Two events A and B that are not independent are said to be dependent.

Remarks. (i) If A and B are independent, then

P(A|B) = P(A) and P(B|A) = P(B).

(ii) If A is independent of B then B is independent of A.

Probability Laws

Complementation law:

$$P(A) = 1 - P(A^c)$$

Additive law:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Moreover, if A and B are mutually exclusive, then $P(AB) = 0$ and

$$P(A \cup B) = P(A) + P(B)$$

Multiplicative law (Product rule)

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$

Moreover, if A and B are independent,

$$P(AB) = P(A)P(B)$$

Example. Let $S = \{E_1, E_2, \dots, E_6\}$; $A = \{E_1, E_3, E_5\}$; $B = \{E_1, E_2, E_3\}$; $C = \{E_2, E_4, E_6\}$; $D = \{E_6\}$. Suppose that all elementary events are equally likely.

(i) What does it mean that all elementary events are equally likely?

(ii) Use the complementation rule to find $P(A^c)$.

(iii) Find P(A|B) and P(B|A)

(iv) Find P(D) and P(D|C)


(v) Are A and B independent? Are C and D independent?

(vi) Find $P(A \cap B)$ and $P(A \cup B)$.
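The definitions of conditional probability and independence translate directly into set operations on an equally likely sample space. A minimal sketch using Python sets and exact fractions; writing $E_i$ as the integer i and the helper names `P`, `cond` and `independent` are illustrative choices. It can be used to check answers to (iii) to (vi).

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # E_1, ..., E_6 written as 1, ..., 6; all equally likely


def P(event):
    """Equally likely probability: #(A ∩ S) / #(S)."""
    return Fraction(len(event & S), len(S))


def cond(a, b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if P(b) == 0:
        raise ValueError("P(B) must be positive")
    return P(a & b) / P(b)


def independent(a, b):
    """A and B are independent iff P(A ∩ B) = P(A) P(B)."""
    return P(a & b) == P(a) * P(b)


A = {1, 3, 5}
B = {1, 2, 3}
print(cond(A, B))          # 2/3
print(independent(A, B))   # False
```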

Law of total probability. Let $B$, $B^c$ be complementary events and let A denote an arbitrary event. Then

$$P(A) = P(A \cap B) + P(A \cap B^c),$$

or

$$P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).$$

Bayes’ Law

Let $B$, $B^c$ be complementary events and let A denote an arbitrary event with $P(A) > 0$. Then

$$P(B|A) = \frac{P(AB)}{P(A)} = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)}.$$

Remarks.

(i) The events of interest here are $B$ and $B^c$; $P(B)$ and $P(B^c)$ are called prior probabilities, and

(ii) $P(B|A)$ and $P(B^c|A)$ are called posterior (revised) probabilities.

(iii) Bayes’ Law is important in several fields of application.

Example 1. A laboratory blood test is 95 percent effective in detecting a certain disease when it is, in fact, present. However, the test also yields a “false positive” result for 1 percent of healthy persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result will imply he or she has the disease.) If 0.5 percent of the population actually has the disease, what is the probability that a person has the disease given that the test result is positive?

Solution. Let D be the event that the tested person has the disease and E the event that the test result is positive. The desired probability $P(D|E)$ is obtained by

$$P(D|E) = \frac{P(D \cap E)}{P(E)} = \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{(.95)(.005)}{(.95)(.005) + (.01)(.995)} = \frac{95}{294} \approx .323.$$

Thus only 32 percent of those persons whose test results are positive actually have the

disease.
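The blood-test calculation can be reproduced with exact fractions, which shows where 95/294 comes from. A minimal sketch; `bayes_posterior` is a hypothetical helper name, and it handles only the two-event partition $\{B, B^c\}$ used in the notes.

```python
from fractions import Fraction


def bayes_posterior(p_b, p_a_given_b, p_a_given_not_b):
    """P(B | A) via Bayes' law with the partition {B, B^c}."""
    p_not_b = 1 - p_b
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b   # law of total probability
    return p_a_given_b * p_b / p_a


# Blood-test example: P(D) = .005, P(E | D) = .95, P(E | D^c) = .01
posterior = bayes_posterior(Fraction("0.005"), Fraction("0.95"), Fraction("0.01"))
print(posterior, float(posterior))  # 95/294, about 0.323
```

Passing the inputs as strings to `Fraction` keeps them exact; with ordinary floats the result is the same to about 15 digits but no longer exactly 95/294.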

Probabilities in Tabulated Form

4 Counting Sample Points

Is it always necessary to list all sample points in S?

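Coin Tosses

| Coins | Sample points | Coins | Sample points |
|---|---|---|---|
| 1 | 2 | 2 | 4 |
| 3 | 8 | 4 | 16 |
| 5 | 32 | 6 | 64 |
| 10 | 1024 | 20 | 1,048,576 |
| 30 | $\approx 10^9$ | 40 | $\approx 10^{12}$ |
| 50 | $\approx 10^{15}$ | 64 | $\approx 10^{19}$ |

Note that $2^{30} \approx 10^9$ = one billion, $2^{40} \approx 10^{12}$ = one thousand billion (one trillion), $2^{50} \approx 10^{15}$ = one thousand trillion (one quadrillion).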

RECALL: $P(A) = \frac{n_A}{n}$, so for some applications we need to find $n$ and $n_A$, where $n$ and $n_A$ are the numbers of points in S and A, respectively.

Basic principle of counting: mn rule

Suppose that two experiments are to be performed. Then if experiment 1 can result

in any one of m possible outcomes and if, for each outcome of experiment 1, there are n

possible outcomes of experiment 2, then together there are mn possible outcomes of the

two experiments.

Examples.

(i) Toss two coins: mn = 2 × 2 = 4

(ii) Throw two dice: mn = 6 × 6 = 36

(iii) A small community consists of 10 men, each of whom has 3 sons. If one man

and one of his sons are to be chosen as father and son of the year, how many diﬀerent

choices are possible?

Solution: Regarding the choice of the man as the outcome of the first experiment and the subsequent choice of one of his sons as the outcome of the second experiment, we see from the basic principle that there are 10 × 3 = 30 possible choices.

Generalized basic principle of counting


If r experiments that are to be performed are such that the first one may result in any of $n_1$ possible outcomes, and if for each of these $n_1$ possible outcomes there are $n_2$ possible outcomes of the second experiment, and if for each of the possible outcomes of the first two experiments there are $n_3$ possible outcomes of the third experiment, and so on, then there are a total of $n_1 \cdot n_2 \cdots n_r$ possible outcomes of the r experiments.

Examples

(i) There are 5 routes available between A and B; 4 between B and C; and 7 between

C and D. What is the total number of available routes between A and D?

Solution: The total number of available routes is 5 · 4 · 7 = 140.

(ii) A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,

and 2 seniors. A subcommittee of 4, consisting of 1 individual from each class, is to be

chosen. How many diﬀerent subcommittees are possible?

Solution: It follows from the generalized principle of counting that there are 3·4·5·2 =

120 possible subcommittees.

(iii) How many different 7-place license plates are possible if the first 3 places are to be occupied by letters and the final 4 by numbers?

Solution: It follows from the generalized principle of counting that there are 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000 possible license plates.

(iv) In (iii), how many license plates would be possible if repetition among letters or

numbers were prohibited?

Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible license plates.
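The counting examples above all multiply the number of choices available at each stage. As a quick check, here is a small Python sketch using `math.perm` (Python 3.8+) for the stages where choices may not repeat; the variable names are illustrative.

```python
from math import perm

# Generalized counting principle: multiply the number of choices at each stage.
routes = 5 * 4 * 7                   # A->B, B->C, C->D
subcommittees = 3 * 4 * 5 * 2        # one member from each class
plates = 26**3 * 10**4               # letters and digits may repeat
# No repetition: 26*25*24 ordered letters times 10*9*8*7 ordered digits
plates_no_repeat = perm(26, 3) * perm(10, 4)

print(routes, subcommittees, plates, plates_no_repeat)
# 140 120 175760000 78624000
```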

Permutations: (Ordered arrangements)

The number of ways of ordering n distinct objects taken r at a time (order is important) is given by

$$\frac{n!}{(n-r)!} = n(n-1)(n-2)\cdots(n-r+1).$$

Examples

(i) In how many ways can you arrange the letters a, b and c? List all arrangements.

Answer: There are 3! = 6 arrangements (permutations): abc, acb, bac, bca, cab, cba.

(ii) A box contains 10 balls. Balls are selected without replacement one at a time. In

how many diﬀerent ways can you select 3 balls?

Solution: Note that n = 10, r = 3. The number of different ways is

$$10 \cdot 9 \cdot 8 = \frac{10!}{7!} = 720,$$

which is equal to $\frac{n!}{(n-r)!}$.
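The permutation formula can be checked by brute force for small cases. This Python sketch enumerates the orderings of a, b, c with `itertools.permutations` and compares `math.perm(n, r)` with $n!/(n-r)!$ for the ball example; it is an illustration, not part of the original notes.

```python
from itertools import permutations
from math import factorial, perm

# All orderings of the letters a, b, c (n = r = 3)
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# Ordered selections of r = 3 balls from n = 10, without replacement
n, r = 10, 3
print(perm(n, r), factorial(n) // factorial(n - r))  # 720 720
```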

Combinations

For r ≤ n, we define

$$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$

and say that $\binom{n}{r}$ represents the number of possible combinations of n objects taken r at a time (with no regard to order).

Examples

(i) A committee of 3 is to be formed from a group of 20 people. How many diﬀerent

committees are possible?

Solution: There are

$$\binom{20}{3} = \frac{20!}{3!\,17!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$$

possible committees.

(ii) From a group of 5 men and 7 women, how many diﬀerent committees consisting

of 2 men and 3 women can be formed?

Solution: $\binom{5}{2}\binom{7}{3} = 10 \cdot 35 = 350$ possible committees.
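The combination formula translates directly into code. In this sketch the helper `n_choose_r` is a hypothetical name that implements $n!/((n-r)!\,r!)$ literally; it is checked against Python’s built-in `math.comb` (Python 3.8+).

```python
from math import comb, factorial

def n_choose_r(n: int, r: int) -> int:
    """n!/((n-r)! r!): the number of unordered selections of r objects from n."""
    return factorial(n) // (factorial(n - r) * factorial(r))

committees = n_choose_r(20, 3)                 # committees of 3 from 20 people
mixed = n_choose_r(5, 2) * n_choose_r(7, 3)    # 2 of 5 men and 3 of 7 women
assert committees == comb(20, 3) and mixed == comb(5, 2) * comb(7, 3)
print(committees, mixed)  # 1140 350
```

The second count multiplies two combinations because, by the basic counting principle, each choice of men can be paired with each choice of women.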

5 Random Sampling

Deﬁnition. A sample of size n is said to be a random sample if the n elements are selected

in such a way that every possible combination of n elements has an equal probability of

being selected.

In this case the sampling process is called simple random sampling.

Remarks. (i) If n is large, we say the random sample provides an honest representation

of the population.

(ii) For finite populations the number of possible samples of size n is $\binom{N}{n}$. For instance, the number of possible samples when N = 28 and n = 4 is $\binom{28}{4} = 20{,}475$.

(iii) Tables of random numbers may be used to select random samples.
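A pseudo-random number generator can play the role of a random number table. The sketch below assumes the population units are labeled 1 to N (a hypothetical labeling) and uses Python’s `random.sample`, which selects n distinct units so that every subset of size n is equally likely. The seed is arbitrary and only makes the draw reproducible.

```python
import random
from math import comb

N, n = 28, 4
population = list(range(1, N + 1))   # label the population units 1..N

rng = random.Random(2003)            # arbitrary fixed seed for reproducibility
sample = rng.sample(population, n)   # simple random sample, without replacement
print(sorted(sample))

print(comb(N, n))  # 20475 possible samples of size 4
```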

6 Modeling Uncertainty

The purpose of modeling uncertainty (randomness) is to discover the laws of change.

1. Concept of Probability. Even though probability (chance) involves the notion of

change, the laws governing the change may themselves remain ﬁxed as time passes.

Example. Consider a chance experiment: Toss of a coin.


Probabilistic Law. In a long series of fair coin tosses, the proportion of (H)eads is very close to 0.5. In the model (abstraction): P(H) = 0.5 exactly.

Why Probabilistic Reasoning?

Example. Toss 5 coins repeatedly and write down the number of heads observed in each

trial. Now, what percentage of trials produce 2 Heads?

Answer. Use the Binomial law to show that

$$P(2 \text{ Heads}) = \binom{5}{2}(0.5)^2(1-0.5)^3 = \frac{5!}{2!\,3!}(0.5)^2(0.5)^3 = 0.3125.$$

Conclusion. There is no need to carry out this experiment to answer the question.

(Thus saving time and eﬀort).
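To see that the model answer agrees with what the experiment would show, the sketch below computes the Binomial probability exactly and then simulates the 5-coin experiment. The seed and the number of trials are arbitrary choices; the simulated share varies slightly with them.

```python
import random
from math import comb

# Exact answer from the Binomial law: C(5, 2) (0.5)^2 (0.5)^3
exact = comb(5, 2) * 0.5**2 * 0.5**3

# The experiment itself, simulated: toss 5 fair coins many times
rng = random.Random(1)          # fixed seed for reproducibility
trials = 100_000
hits = 0
for _ in range(trials):
    heads = sum(rng.random() < 0.5 for _ in range(5))
    if heads == 2:
        hits += 1

print(exact)          # 0.3125
print(hits / trials)  # close to 0.3125; the exact digits depend on the seed
```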

2. The Interplay Between Probability and Statistics. (Theory versus Application)

(i) Theory is an exact discipline developed from logically deﬁned axioms (conditions).

(ii) Theory is related to physical phenomena only in inexact terms (i.e. approxi-

mately).

(iii) When theory is applied to real problems, it works (i.e. it makes sense).

Example. A fair die is tossed a very large number of times, and face 6 is observed 1,500 times. Estimate how many times the die was tossed.

Answer. About 9000 times: since P(6) = 1/6, the number of tosses n satisfies n/6 ≈ 1,500.

Review Exercises: Probability

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. An experiment consists of tossing 3 fair coins.

(i) List all the elements in the sample space.

(ii) Describe the following events:

A = {Observe exactly two heads}

B = {Observe at most one tail}

C = {Observe at least two heads}

D = {Observe exactly one tail}

(iii) Find the probabilities of events A, B, C, D.


2. Suppose that S = {1, 2, 3, 4, 5, 6} such that P(1) = .1, P(2) = .1, P(3) = .1, P(4) = .2, P(5) = .2, P(6) = .3.

(i) Find the probability of the event A = {4, 5, 6}.

(ii) Find the probability of the complement of A.

(iii) Find the probability of the event B = {even}.

(iv) Find the probability of the event C = {odd}.

3. An experiment consists of throwing a fair die.

(i) List all the elements in the sample space.

(ii) Describe the following events:

A = {Observe a number larger than 3}

B = {Observe an even number}

C = {Observe an odd number}

(iii) Find the probabilities of events A, B, C.

(iv) Compare problems 2 and 3.

4. Refer to problem 3. Find

(i) $A \cup B$

(ii) $A \cap B$

(iii) $B \cap C$

(iv) $A^c$

(v) $C^c$

(vi) $A \cup C$

(vii) $A \cap C$

(viii) Find the probabilities in (i)-(vii).

(ix) Refer to problem 2 and answer questions (i)-(viii).

5. The following probability table gives the intersection probabilities for four events A, B, C and D:

          A       B
C       .06     .31
D       .55     .08

The four entries sum to 1.00.

(i) Using the deﬁnitions, ﬁnd P(A), P(B), P(C), P(D), P(C|A), P(D|A) and P(C|B).


(ii) Find $P(B^c)$.

(iii) Find $P(A \cap B)$.

(iv) Find $P(A \cup B)$.

(v) Are B and C independent events? Justify your answer.

(vi) Are B and C mutually exclusive events? Justify your answer.

(vii) Are C and D independent events? Justify your answer.

(viii) Are C and D mutually exclusive events? Justify your answer.

6. Use the laws of probability to justify your answers to the following questions:

(i) If P(A ∪ B) = .6, P(A) = .2, and P(B) = .4, are A and B mutually exclusive? Independent?

(ii) If P(A ∪ B) = .65, P(A) = .3, and P(B) = .5, are A and B mutually exclusive? Independent?

(iii) If P(A ∪ B) = .7, P(A) = .4, and P(B) = .5, are A and B mutually exclusive? Independent?

7. Suppose that the following two weather forecasts were reported on two local TV

stations for the same period. First report: The chances of rain are today 30%, tomorrow

40%, both today and tomorrow 20%, either today or tomorrow 60%. Second report: The

chances of rain are today 30%, tomorrow 40%, both today and tomorrow 10%, either

today or tomorrow 60%. Which of the two reports, if any, is more believable? Why? No

credit if answer is not justiﬁed. (Hint: Let A and B be the events of rain today and rain

tomorrow.)

8. A box contains ﬁve balls, a black (b), white (w), red (r), orange (o), and green (g).

Three balls are to be selected at random.

(i) Find the sample space S (Hint: there are 10 sample points).

S = {bwr, · · ·}

(ii) Find the probability of selecting a black ball.

(iii) Find the probability of selecting one black and one red ball.

9. A box contains four black and six white balls.

(i) If a ball is selected at random, what is the probability that it is white? black?

(ii) If two balls are selected without replacement, what is the probability that both

balls are black? both are white? the ﬁrst is white and the second is black? the ﬁrst is

black and the second is white? one ball is black?

(iii) Repeat (ii) if the balls are selected with replacement.


(Hint: Start by defining the events $B_1$ and $B_2$ as the first ball is black and the second ball is black, respectively, and by defining the events $W_1$ and $W_2$ as the first ball is white and the second ball is white, respectively. Then use the product rule.)

10. Answer by True or False. (Circle your choice).

T F (i) An event is a speciﬁc collection of simple events.

T F (ii) The probability of an event can sometimes be negative.

T F (iii) If A and B are mutually exclusive events, then they are also dependent.

T F (iv) The sum of the probabilities of all simple events in the sample space may be

less than 1 depending on circumstances.

T F (v) A random sample of n observations from a population is not likely to provide

a good estimate of a parameter.

T F (vi) A random sample of n observations from a population is one in which every

diﬀerent subset of size n from the population has an equal probability of being selected.

T F (vii) The probability of an event can sometimes be larger than one.

T F (viii) The probability of an elementary event can never be larger than one half.

T F (ix) Although the probability of an event occurring is .9, the event may not occur

at all in 10 trials.

T F (x) If a random experiment has 5 possible outcomes, then the probability of each

outcome is 1/5.

T F (xi) If two events are independent, the occurrence of one event should not aﬀect

the likelihood of the occurrence of the other event.


Chapter 3

Random Variables and Discrete Distributions

Contents.

Random Variables

Expected Values and Variance

Binomial

Poisson

Hypergeometric

1 Random Variables

The discrete rv arises in situations when the population (or set of possible outcomes) is discrete (or qualitative).

Example. Toss a coin 3 times, then

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let the variable of interest, X, be the number of heads observed then relevant events

would be

{X = 0} = {TTT}

{X = 1} = {HTT, THT, TTH}

{X = 2} = {HHT, HTH, THH}

{X = 3} = {HHH}.

The relevant question is to find the probability of each of these events.

Note that X takes integer values even though the sample space consists of H’s and

T’s.


The variable X transforms the problem of calculating probabilities from that of set

theory to calculus.

Deﬁnition. A random variable (r.v.) is a rule that assigns a numerical value to each

possible outcome of a random experiment.

Interpretation:

- random: the value of the r.v. is unknown until the outcome is observed

- variable: it takes a numerical value

Notation: We use X, Y, etc. to represent r.v.s.

A Discrete r.v. takes a finite or countably infinite number of possible values (e.g. toss a coin, throw a die, etc.)

A Continuous r.v. has a continuum of possible values.

(e.g. height, weight, price, etc.)

Discrete Distributions. The probability distribution of a discrete r.v. X assigns a probability p(x) to each possible value x such that

(i) $0 \le p(x) \le 1$, and

(ii) $\sum_x p(x) = 1$,

where the summation is over all possible values of x.

Discrete distributions in tabulated form

Example.

Which of the following deﬁnes a probability distribution?

x 0 1 2

p(x) 0.30 0.50 0.20

x 0 1 2

p(x) 0.60 0.50 -0.10

x -1 1 2

p(x) 0.30 0.40 0.20
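The two conditions (i) and (ii) can be checked mechanically. This Python sketch uses a hypothetical helper `is_valid_pmf`; a small tolerance is used for condition (ii) because decimal probabilities are not exact in floating point.

```python
from math import isclose

def is_valid_pmf(probs, tol=1e-9):
    """Check (i) 0 <= p(x) <= 1 for every x and (ii) the p(x) sum to 1."""
    probs = list(probs)
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

tables = {
    "first":  {0: 0.30, 1: 0.50, 2: 0.20},
    "second": {0: 0.60, 1: 0.50, 2: -0.10},
    "third":  {-1: 0.30, 1: 0.40, 2: 0.20},
}
for name, pmf in tables.items():
    print(name, is_valid_pmf(pmf.values()))
# first: valid; second: fails (i), a negative p(x); third: fails (ii), sums to 0.9
```

Note that the values of x themselves (here -1) may be negative; only the probabilities are restricted.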

Remarks. (i) Discrete distributions arise when the r.v. X is discrete (qualitative data)


(ii) Continuous distributions arise when the r.v. X is continuous (quantitative data)

Remarks. (i) In data analysis we described a set of data (sample) by dividing it into

classes and calculating relative frequencies.

(ii) In Probability we described a random experiment (population) in terms of events

and probabilities of events.

(iii) Here, we describe a random experiment (population) by using random variables,

and probability distribution functions.

2 Expected Value and Variance

Definition 2.1 The expected value of a discrete rv X is denoted by µ and is defined to be

$$\mu = \sum_x x\,p(x).$$

Notation: The expected value of X is also denoted by µ = E[X], or sometimes $\mu_X$ to emphasize its dependence on X.

Definition 2.2 If X is a rv with mean µ, then the variance of X is defined by

$$\sigma^2 = \sum_x (x-\mu)^2\,p(x).$$

Notation: Sometimes we use $\sigma^2 = V(X)$ (or $\sigma_X^2$).

Shortcut Formula

$$\sigma^2 = \sum_x x^2\,p(x) - \mu^2$$

Definition 2.3 If X is a rv with mean µ, then the standard deviation of X, denoted by $\sigma_X$ (or simply σ), is defined by

$$\sigma = \sqrt{V(X)} = \sqrt{\sum_x (x-\mu)^2\,p(x)}.$$

Shortcut Formula

$$\sigma = \sqrt{\sum_x x^2\,p(x) - \mu^2}$$
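The definitions of µ, σ² and σ, together with the shortcut formula, can be written as a few lines of Python. In this sketch the helper `mean_var_sd` is a hypothetical name; it computes the variance both ways and checks that they agree. The distribution used is the one in Review Exercise 5 of this chapter (x = 10, 15, 20, 25).

```python
from math import sqrt

def mean_var_sd(pmf):
    """Return (mu, sigma^2, sigma) for a discrete pmf given as {x: p(x)}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())        # definition
    var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2   # shortcut formula
    assert abs(var - var_short) < 1e-9   # both formulas give the same value
    return mu, var, sqrt(var)

mu, var, sd = mean_var_sd({10: 0.2, 15: 0.3, 20: 0.4, 25: 0.1})
print(round(mu, 2), round(var, 2), round(sd, 2))  # 17.0 21.0 4.58
```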


3 Discrete Distributions

Binomial.

The binomial experiment (distribution) arises in the following situation:

(i) the underlying experiment consists of n independent and identical trials;

(ii) each trial results in one of two possible outcomes, a success or a failure;

(iii) the probability of a success in a single trial is equal to p and remains the same

throughout the experiment; and

(iv) the experimenter is interested in the rv X that counts the number of successes

observed in n trials.

A r.v. X is said to have a binomial distribution with parameters n and p if

$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n,$$

where q = 1 − p.

Mean: µ = np

Variance: $\sigma^2 = npq$, $\sigma = \sqrt{npq}$

Example: Bernoulli.

A rv X is said to have a Bernoulli distribution with parameter p if

Formula: $p(x) = p^x (1-p)^{1-x}$, x = 0, 1.

Tabulated form:

x 0 1

p(x) 1-p p

Mean: µ = p

Variance: $\sigma^2 = pq$, $\sigma = \sqrt{pq}$

Binomial Tables.

Cumulative probabilities are given in the table.

Example. Suppose X has a binomial distribution with n = 10, p = .4. Find

(i) P(X ≤ 4) = .633

(ii) P(X < 6) = P(X ≤ 5) = .834

(iii) P(X > 4) = 1 − P(X ≤ 4) = 1 − .633 = .367

(iv) P(X = 5) = P(X ≤ 5) − P(X ≤ 4) = .834 − .633 = .201
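The table lookups above can be reproduced from the binomial formula. This is a minimal Python sketch; `binom_pmf` and `binom_cdf` are illustrative helper names, with `binom_cdf` giving the cumulative probability P(X ≤ x) that the binomial table lists.

```python
from math import comb

def binom_pmf(x, n, p):
    """p(x) = C(n, x) p^x q^(n-x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x), the cumulative probability given in binomial tables."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.4
print(round(binom_cdf(4, n, p), 3))      # 0.633  P(X <= 4)
print(round(binom_cdf(5, n, p), 3))      # 0.834  P(X < 6) = P(X <= 5)
print(round(1 - binom_cdf(4, n, p), 3))  # 0.367  P(X > 4)
print(round(binom_pmf(5, n, p), 3))      # 0.201  P(X = 5)
```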

Exercise: Answer the same question with p = 0.7


Poisson.

The Poisson random variable arises when counting the number of events that occur in an interval of time when the events are occurring at a constant rate. Examples include the number of arrivals at an emergency room, the number of items demanded from an inventory, and the number of items in a batch of random size.

A rv X is said to have a Poisson distribution with parameter λ > 0 if

$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \ldots.$$

Graph.

Mean: µ = λ

Variance: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$

Note: e ≈ 2.71828

Example. Suppose the number of typographical errors on a single page of your book

has a Poisson distribution with parameter λ = 1/2. Calculate the probability that there

is at least one error on this page.

Solution. Letting X denote the number of errors on a single page, we have

$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.5} \approx 0.393.$$
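The Poisson formula and the complement rule used in this solution can be sketched in Python as follows; `poisson_pmf` is an illustrative helper name. Numerically, $1 - e^{-0.5} \approx 0.3935$.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) lam^x / x!  for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 0.5
p_at_least_one = 1 - poisson_pmf(0, lam)   # complement of "no errors on the page"
print(round(p_at_least_one, 3))  # 0.393
```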

Rule of Thumb. The Poisson distribution provides good approximations to binomial

probabilities when n is large and µ = np is small, preferably with np ≤ 7.

Example. Suppose that the probability that an item produced by a certain machine will be defective is 0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.

Solution. Using the binomial distribution, the desired probability is

$$P(X \le 1) = p(0) + p(1) = \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 = 0.7361.$$

Using the Poisson approximation with λ = np = 1, we have

$$P(X \le 1) \approx e^{-1} + e^{-1} \approx 0.7358,$$

which is close to the exact answer.
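The comparison between the exact binomial answer and its Poisson approximation can be reproduced with a short Python sketch (variable names are illustrative):

```python
from math import comb, exp, factorial

n, p = 10, 0.1
# Exact: binomial P(X <= 1) = p(0) + p(1)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2))

# Approximation: Poisson with lam = np
lam = n * p
approx = sum(exp(-lam) * lam**k / factorial(k) for k in range(2))

print(round(exact, 4), round(approx, 4))  # 0.7361 0.7358
```

Raising n while holding np fixed (for example n = 100, p = 0.01) brings the two answers closer together, in line with the rule of thumb.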

Hypergeometric.

The hypergeometric distribution arises when one selects a random sample of size n, without replacement, from a finite population of size N divided into two classes consisting of D elements of the first kind and N − D of the second kind. Such a scheme is called sampling without replacement from a finite dichotomous population.

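Formula:

$$f(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}},$$

where max(0, n − N + D) ≤ x ≤ min(n, D). We define f(x) = 0 elsewhere.

Mean: $E[X] = n\left(\frac{D}{N}\right)$

Variance: $V(X) = \left(\frac{N-n}{N-1}\right)(n)\left(\frac{D}{N}\right)\left(1 - \frac{D}{N}\right)$

The factor $\frac{N-n}{N-1}$ is called the finite population correction factor.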

Example. (Sampling without replacement)

Suppose an urn contains D = 10 red balls and N − D = 15 white balls. A random sample of size n = 8, without replacement, is drawn and the number of red balls is denoted by X. Then

$$f(x) = \frac{\binom{10}{x}\binom{15}{8-x}}{\binom{25}{8}}, \quad 0 \le x \le 8.$$
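For the urn example (N = 25, D = 10, n = 8), the sketch below computes f(x), confirms that it sums to 1, and checks the mean and variance formulas, including the finite population correction factor. `hypergeom_pmf` is an illustrative helper name.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """f(x) = C(D, x) C(N-D, n-x) / C(N, n); zero outside the support."""
    if x < max(0, n - N + D) or x > min(n, D):
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 25, 10, 8
pmf = {x: hypergeom_pmf(x, N, D, n) for x in range(n + 1)}
mean = sum(x * f for x, f in pmf.items())
var = sum((x - mean) ** 2 * f for x, f in pmf.items())

print(round(sum(pmf.values()), 6))        # 1.0
print(round(mean, 4), n * D / N)          # 3.2 3.2
fpc = (N - n) / (N - 1)                   # finite population correction factor
print(round(var, 4), round(fpc * n * (D / N) * (1 - D / N), 4))  # 1.36 1.36
```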

4 Markov Chains

Example 1. (Brand Switching Problem)

Suppose that a manufacturer of a product (Brand 1) is competing with only one

other similar product (Brand 2). Both manufacturers have been engaged in aggressive

advertising programs which include oﬀering rebates, etc. A survey is taken to ﬁnd out

the rates at which consumers are switching brands or staying loyal to brands. Responses

to the survey are given below. If the manufacturers are competing for a population of

300,000 buyers, how should they plan for the future (the immediate future and the long run)?

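Brand Switching Data (survey responses)

                    This week
Last week      Brand 1   Brand 2   Total
Brand 1           90        10      100
Brand 2           40       160      200

Transition proportions (last week to this week):

               Brand 1   Brand 2
Brand 1        90/100    10/100
Brand 2        40/200    160/200

So

$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}.$$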

Question 1. Suppose that customer behavior does not change over time, and that 1/3 of all customers purchased B1 this week.

What percentage will purchase B1 next week?

What percentage will purchase B2 next week?

What percentage will purchase B1 two weeks from now?

What percentage will purchase B2 two weeks from now?

Solution: Note that $\pi^0 = (1/3, 2/3)$. Then

$$\pi^1 = (\pi^1_1, \pi^1_2) = (\pi^0_1, \pi^0_2)P = (1/3, 2/3)\begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix} = (1.3/3, 1.7/3).$$

B1 buyers will be 300,000(1.3/3) = 130,000.

B2 buyers will be 300,000(1.7/3) = 170,000.

Two weeks from now: exercise.

Question 2. Determine whether each brand will eventually retain a constant share of

the market.

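Solution:

We need to solve π = πP and $\sum_i \pi_i = 1$, that is,

$$(\pi_1, \pi_2) = (\pi_1, \pi_2)\begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix} \quad \text{and} \quad \pi_1 + \pi_2 = 1.$$

Matrix multiplication gives

$$\begin{aligned}
\pi_1 &= 0.9\pi_1 + 0.2\pi_2 \\
\pi_2 &= 0.1\pi_1 + 0.8\pi_2 \\
\pi_1 + \pi_2 &= 1
\end{aligned}$$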


One equation is redundant. Choosing the first and the third, we get

$$0.1\pi_1 = 0.2\pi_2 \quad \text{and} \quad \pi_1 + \pi_2 = 1,$$

which gives

$$(\pi_1, \pi_2) = (2/3, 1/3).$$

Brand 1 will eventually capture two thirds of the market (200,000 customers).
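Both questions can be checked numerically. This Python sketch (helper name `step` is illustrative) multiplies the row vector of market shares by P once for next week, then repeats the multiplication many times. For a chain like this one, where every entry of P is positive, the repeated products settle on the solution of π = πP; the 200 iterations are an arbitrary but ample choice.

```python
def step(dist, P):
    """One transition: row vector dist times transition matrix P."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]

P = [[0.9, 0.1],
     [0.2, 0.8]]
buyers = 300_000

pi1 = step([1/3, 2/3], P)                  # next week's market shares
print([round(buyers * s) for s in pi1])    # [130000, 170000]

# Long run: repeated transitions approach the solution of pi = pi P
dist = [1/3, 2/3]
for _ in range(200):
    dist = step(dist, P)
print([round(s, 4) for s in dist])         # [0.6667, 0.3333]
```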

Example 2. On any particular day Rebecca is either cheerful (c) or gloomy (g). If she is

cheerful today then she will be cheerful tomorrow with probability 0.7. If she is gloomy

today then she will be gloomy tomorrow with probability 0.4.

(i) What is the transition matrix P?

Solution:

$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{pmatrix}$$

(rows and columns ordered c, g).

(ii) What is the fraction of days Rebecca is cheerful? Gloomy?

Solution: The fraction of days Rebecca is cheerful is the probability that on any given day Rebecca is cheerful. This can be obtained by solving π = πP, where $\pi = (\pi_0, \pi_1)$, and $\pi_0 + \pi_1 = 1$.

Exercise. Complete this problem.

Review Exercises: Discrete Distributions

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Identify the following as discrete or continuous random variables.

(i) The market value of a publicly listed security on a given day

(ii) The number of printing errors observed in an article in a weekly news magazine

(iii) The time to assemble a product (e.g. a chair)

(iv) The number of emergency cases arriving at a city hospital

(v) The number of sophomores in a randomly selected Math. class at a university

(vi) The rate of interest paid by your local bank on a given day

2. What restrictions do we place on the probabilities associated with a particular

probability distribution?


3. Indicate whether or not the following are valid probability distributions. If they

are not, indicate which of the restrictions has been violated.

(i)

x -1 0 1 3.5

p(x) .6 .1 .1 .2

(ii)

x -1 1 3.5

p(x) .6 .6 -.2

(iii)

x -2 1 4 6

p(x) .2 .2 .2 .1


4. A random variable X has the following probability distribution:

x 1 2 3 4 5

p(x) .05 .10 .15 .45 .25

(i) Verify that X has a valid probability distribution.

(ii) Find the probability that X is greater than 3, i.e. P(X > 3).

(iii) Find the probability that X is greater than or equal to 3, i.e. P(X ≥ 3).

(iv) Find the probability that X is less than or equal to 2, i.e. P(X ≤ 2).

(v) Find the probability that X is an odd number.

(vi) Graph the probability distribution for X.

5. A discrete random variable X has the following probability distribution:

x 10 15 20 25

p(x) .2 .3 .4 .1

(i) Calculate the expected value of X, E(X) = µ.

(ii) Calculate the variance of X, $\sigma^2$.

(iii) Calculate the standard deviation of X, σ.

Answers: µ = 17, $\sigma^2 = 21$, σ = 4.58.

6. For each of the following probability distributions, calculate the expected value of X, E(X) = µ; the variance of X, $\sigma^2$; and the standard deviation of X, σ.

(i)

x 1 2 3 4

p(x) .4 .3 .2 .1


(ii)

x -2 -1 2 4

p(x) .2 .3 .3 .2

7. In how many ways can a committee of ten be chosen from ﬁfteen individuals?

8. Answer by True or False. (Circle your choice).

T F (i) The expected value is always positive.

T F (ii) A random variable has a single numerical value for each outcome of a random

experiment.

T F (iii) The only rule that applies to all probability distributions is that the possible

random variable values are always between 0 and 1.

T F (iv) A random variable is one that takes on diﬀerent values depending on the

chance outcome of an experiment.

T F (v) The number of television programs watched per day by a college student is

an example of a discrete random variable.

T F (vi) The monthly volume of gasoline sold in one gas station is an example of a

discrete random variable.

T F (vii) The expected value of a random variable provides a complete description of

the random variable’s probability distribution.

T F (viii) The variance can never be equal to zero.

T F (ix) The variance can never be negative.

T F (x) The probability p(x) for a discrete random variable X must be greater than

or equal to zero but less than or equal to one.

T F (xi) The sum of all probabilities p(x) for all possible values of X is always equal

to one.

T F (xii) The most common method for sampling more than one observation from a

population is called random sampling.

Review Exercises: Binomial Distribution

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.


1. List the properties for a binomial experiment.

2. Give the formula for the binomial probability distribution.

3. Calculate

(i) 5!

(ii) 10!

(iii) $\frac{7!}{3!\,4!}$

4. Consider a binomial distribution with n = 4 and p = .5.

(i) Use the formula to ﬁnd P(0), P(1), · · · , P(4).

(ii) Graph the probability distribution found in (i)

(iii) Repeat (i) and (ii) when n = 4, and p = .2.

(iv) Repeat (i) and (ii) when n = 4, and p = .8.

5. Consider a binomial distribution with n = 5 and p = .6.

(i) Find P(0) and P(2) using the formula.

(ii) Find P(X ≤ 2) using the formula.

(iii) Find the expected value E(X) = µ

(iv) Find the standard deviation σ

6. Consider a binomial distribution with n = 500 and p = .6.

(i) Find the expected value E(X) = µ

(ii) Find the standard deviation σ

7. Consider a binomial distribution with n = 25 and p = .6.

(i) Find the expected value E(X) = µ

(ii) Find the standard deviation σ

(iii) Find P(0) and P(2) using the table.

(iv) Find P(X ≤ 2) using the table.

(v) Find P(X < 12) using the table.

(vi) Find P(X > 13) using the table.

(vii) Find P(X ≥ 8) using the table.

8. A sales organization makes one sale for every 200 prospects that it contacts. The organization plans to contact 100,000 prospects over the coming year.

(i) What is the expected value of X, the annual number of sales?

(ii) What is the standard deviation of X?


(iii) Within what limits would you expect X to fall with 95% probability? (Use the empirical rule.) Answers: µ = 500, σ = 22.3

9. Identify the binomial experiment in the following group of statements.

(i) a shopping mall is interested in the income levels of its customers and is taking a

survey to gather information

(ii) a business ﬁrm introducing a new product wants to know how many purchases

its clients will make each year

(iii) a sociologist is researching an area in an eﬀort to determine the proportion of

households with male “head of households”

(iv) a study is concerned with the average hours worked by teenagers who are attending high school

(v) Determining whether or not a manufactured item is defective.

(vi) Determining the number of words typed before a typist makes an error.

(vii) Determining the weekly pay rate per employee in a given company.

10. Answer by True or False. (Circle your choice).

T F (i) In a binomial experiment each trial is independent of the other trials.

T F (ii) A binomial distribution is a discrete probability distribution.

T F (iii) The standard deviation of a binomial probability distribution is given by npq.


Chapter 4

Continuous Distributions

Contents.

1. Standard Normal

2. Normal

3. Uniform

4. Exponential

1 Introduction

RECALL: The continuous rv arises in situations when the population (or set of possible outcomes) is continuous (or quantitative).

Example. Observe the lifetime of a light bulb; then

S = {x : 0 ≤ x < ∞}.

Let the variable of interest, X, be the observed lifetime of the light bulb; then relevant events would be {X ≤ x}, {X ≥ 1000}, or {1000 ≤ X ≤ 2000}.

The relevant question is to find the probability of each of these events.

Important. For any continuous pdf the area under the curve is equal to 1.

2 The Normal Distribution

Standard Normal.

A normally distributed (bell shaped) random variable with µ = 0 and σ = 1 is said

to have the standard normal distribution. It is denoted by the letter Z.


pdf of Z:

$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$

Graph.

Tabulated Values.

Values of P(0 ≤ Z ≤ z) are tabulated in the appendix.

Critical Values: $z_\alpha$ of the standard normal distribution are given by

$$P(Z \ge z_\alpha) = \alpha,$$

which is in the tail of the distribution.

Examples.

(i) P(0 ≤ Z ≤ 1) = .3413

(ii) P(−1 ≤ Z ≤ 1) = .6826

(iii) P(−2 ≤ Z ≤ 2) = .9544

(iv) P(−3 ≤ Z ≤ 3) = .9974

Examples. Find $z_0$ such that

(i) $P(Z > z_0) = .10$; $z_0 = 1.28$.

(ii) $P(Z > z_0) = .05$; $z_0 = 1.645$.

(iii) $P(Z > z_0) = .025$; $z_0 = 1.96$.

(iv) $P(Z > z_0) = .01$; $z_0 = 2.33$.

(v) $P(Z > z_0) = .005$; $z_0 = 2.58$.

(vi) $P(Z \le z_0) = .10, .05, .025, .01, .005$. (Exercise)
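In place of the printed table, Python’s `statistics.NormalDist` (Python 3.8+) provides the standard normal CDF (`cdf`) and its inverse (`inv_cdf`). This sketch reproduces the central areas and the upper-tail critical values listed above; the central areas may differ from the table values in the fourth decimal because table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

# Central areas P(-k <= Z <= k) for k = 1, 2, 3
for k in (1, 2, 3):
    print(k, round(Z.cdf(k) - Z.cdf(-k), 4))   # about 0.6827, 0.9545, 0.9973

# Upper-tail critical values z_alpha with P(Z >= z_alpha) = alpha
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(Z.inv_cdf(1 - alpha), 3))
```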

Normal

A rv X is said to have a Normal pdf with parameters µ and σ if

Formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty,$$

where $-\infty < \mu < \infty$ and $0 < \sigma < \infty$.

Properties

Mean: E[X] = µ

Variance: $V(X) = \sigma^2$

Graph: Bell shaped.

Area under graph = 1.

Standardizing a normal r.v.:


Z-score:

$$Z = \frac{X - \mu_X}{\sigma_X} \quad \text{OR (simply)} \quad Z = \frac{X - \mu}{\sigma}.$$

Conversely,

$$X = \mu + \sigma Z.$$

Example. If X is a normal rv with parameters µ = 3 and $\sigma^2 = 9$, find (i) P(2 < X < 5), (ii) P(X > 0), and (iii) P(X > 9).

Solution. (i)

$$P(2 < X < 5) = P(-0.33 < Z < 0.67) = .3779.$$

(ii)

$$P(X > 0) = P(Z > -1) = P(Z < 1) = .8413.$$

(iii)

$$P(X > 9) = P(Z > 2.0) = 0.5 - 0.4772 = .0228.$$
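These three probabilities can be computed without standardizing by hand, using `statistics.NormalDist` with µ = 3 and σ = 3 (since σ² = 9). Because the z-values are not rounded to two decimals here, part (i) comes out near 0.378 rather than the table-based .3779.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=3)   # sigma^2 = 9, so sigma = 3

p_i = X.cdf(5) - X.cdf(2)       # P(2 < X < 5)
p_ii = 1 - X.cdf(0)             # P(X > 0)
p_iii = 1 - X.cdf(9)            # P(X > 9)
print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
```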

Exercise Refer to the above example, ﬁnd P(X < −3).

Example The length of life of a certain type of automatic washer is approximately

normally distributed, with a mean of 3.1 years and standard deviation of 1.2 years. If

this type of washer is guaranteed for 1 year, what fraction of original sales will require

replacement?

Solution. Let X be the length of life of an automatic washer selected at random; then

$$z = \frac{1 - 3.1}{1.2} = -1.75.$$

Therefore

$$P(X < 1) = P(Z < -1.75).$$

Exercise: Complete the solution of this problem.

Normal Approximation to the Binomial Distribution.

When and how to use the normal approximation:

1. Use it when n is large, i.e. np ≥ 5 and n(1 − p) ≥ 5.

2. The approximation can be improved using a continuity correction: for an integer k, replace X = k by k − 0.5 < X < k + 0.5.

Example. Let X be the number of times that a fair coin, flipped 40 times, lands heads. (i) Find the probability that X = 20. (ii) Find P(10 ≤ X ≤ 20). Use the normal approximation.

Solution. Note that np = 20 and np(1 − p) = 10.

$$P(X = 20) = P(19.5 < X < 20.5) = P\left(\frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right) \approx P(-0.16 < Z < 0.16) = .1272.$$

The exact result is

$$P(X = 20) = \binom{40}{20}(0.5)^{20}(0.5)^{20} \approx .1254.$$
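The exact binomial value and the continuity-corrected normal approximation can be compared directly. This Python sketch uses `math.comb` and `statistics.NormalDist`; the exact value is $\binom{40}{20}/2^{40} \approx 0.1254$. With unrounded z-values the approximation is about 0.1256, while rounding z to ±0.16 for the table gives the .1272 shown above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 20 and sqrt(10)

exact = comb(n, 20) * p**20 * (1 - p) ** 20
normal = NormalDist(mu, sigma)
approx = normal.cdf(20.5) - normal.cdf(19.5)   # continuity correction for X = 20
print(round(exact, 4), round(approx, 4))
```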

(ii) Exercise.

3 Uniform: U[a,b]

Formula:

$$f(x) = \frac{1}{b-a}, \quad a < x < b; \qquad f(x) = 0 \text{ elsewhere}.$$

Graph.

Mean: µ = (a + b)/2

Variance: $\sigma^2 = (b-a)^2/12$; $\sigma = (b-a)/\sqrt{12}$

CDF: (Area between a and c)

$$P(X \le c) = \begin{cases} 0, & c \le a, \\ \dfrac{c-a}{b-a}, & a \le c \le b, \\ 1, & c \ge b. \end{cases}$$
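The three-case CDF and the moment formulas for U[a, b] can be sketched in Python as below. The helper name `uniform_cdf` and the endpoints a = 2, b = 10 are illustrative choices, not values from the notes.

```python
def uniform_cdf(c, a, b):
    """P(X <= c) for X ~ U[a, b]: 0 below a, linear on [a, b], 1 above b."""
    if c <= a:
        return 0.0
    if c >= b:
        return 1.0
    return (c - a) / (b - a)

a, b = 2.0, 10.0                  # illustrative endpoints
mean = (a + b) / 2                # (a + b)/2
var = (b - a) ** 2 / 12           # (b - a)^2 / 12
print(mean, round(var, 3), uniform_cdf(4.0, a, b))  # 6.0 5.333 0.25
```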


Exercise. Specialize the above results to the Uniform [0, 1] case.

4 Exponential

The exponential pdf often arises in practice as the distribution of the amount of time until some specific event occurs. Examples include the time until a new car breaks down and the time until an arrival at an emergency room.

A rv X is said to have an exponential pdf with parameter λ > 0 if

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0; \qquad f(x) = 0 \text{ elsewhere}.$$

Properties

Graph.

Mean: µ = 1/λ

Variance: $\sigma^2 = 1/\lambda^2$, σ = 1/λ

CDF: $P(X \le a) = 1 - e^{-\lambda a}$.

$P(X > a) = e^{-\lambda a}$

Example 1. Suppose that the length of a phone call in minutes is an exponential rv with

parameter λ = 1/10. If someone arrives immediately ahead of you at a public telephone

booth, ﬁnd the probability that you will have to wait (i) more than 10 minutes, and (ii)

between 10 and 20 minutes.

Solution. Let X be the length of a phone call in minutes by the person ahead of you.

(i)

$$P(X > 10) = e^{-10\lambda} = e^{-1} \approx 0.368$$

(ii)

$$P(10 < X < 20) = e^{-1} - e^{-2} \approx 0.233$$
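Both parts of the phone-call example use only the tail formula $P(X > a) = e^{-\lambda a}$. A minimal Python sketch, with `expon_sf` as an illustrative helper name:

```python
from math import exp

def expon_sf(a, lam):
    """P(X > a) = e^(-lam * a) for an exponential rv with parameter lam."""
    return exp(-lam * a)

lam = 1 / 10                                                   # mean call length 10 minutes
p_more_than_10 = expon_sf(10, lam)                             # e^-1
p_between_10_and_20 = expon_sf(10, lam) - expon_sf(20, lam)    # e^-1 - e^-2
print(round(p_more_than_10, 3), round(p_between_10_and_20, 3))  # 0.368 0.233
```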

Example 2. The amount of time, in hours, that a computer functions before breaking

down is an exponential rv with λ = 1/100.

(i) What is the probability that a computer will function between 50 and 150 hours

before breaking down?

(ii) What is the probability that it will function less than 100 hours?

Solution.


(i) The probability that a computer will function between 50 and 150 hours before breaking down is given by

$$P(50 \le X \le 150) = e^{-50/100} - e^{-150/100} = e^{-1/2} - e^{-3/2} \approx .383.$$

(ii) Exercise.

Memoryless Property

FACT. The exponential rv has the memoryless property: $P(X > s + t \mid X > s) = P(X > t)$ for all s, t ≥ 0.

Converse. The exponential distribution is the only continuous distribution with the memoryless property.
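The memoryless property says that, for an exponential rv, $P(X > s + t \mid X > s) = P(X > t)$: having already lasted s units of time does not change the distribution of the remaining time. The sketch below checks this numerically, using λ = 1/100 from the computer example; the values s = 50 and t = 100 are illustrative.

```python
from math import exp, isclose

def expon_sf(a, lam):
    """P(X > a) = e^(-lam * a)."""
    return exp(-lam * a)

lam = 1 / 100        # rate from the computer example
s, t = 50.0, 100.0   # illustrative: already working s hours; ask about t more

# {X > s + t} is contained in {X > s}, so the conditional probability is a ratio
conditional = expon_sf(s + t, lam) / expon_sf(s, lam)
unconditional = expon_sf(t, lam)
print(isclose(conditional, unconditional))  # True
```

Algebraically, the ratio is $e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$, which is why the check succeeds for any s and t.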

Review Exercises: Normal Distribution

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Calculate the area under the standard normal curve between the following values.

(i) z = 0 and z = 1.6 (i.e. P(0 ≤ Z ≤ 1.6))

(ii) z = 0 and z = −1.6 (i.e. P(−1.6 ≤ Z ≤ 0))

(iii) z = .86 and z = 1.75 (i.e. P(.86 ≤ Z ≤ 1.75))

(iv) z = −1.75 and z = −.86 (i.e. P(−1.75 ≤ Z ≤ −.86))

(v) z = −1.26 and z = 1.86 (i.e. P(−1.26 ≤ Z ≤ 1.86))

(vi) z = −1.0 and z = 1.0 (i.e. P(−1.0 ≤ Z ≤ 1.0))

(vii) z = −2.0 and z = 2.0 (i.e. P(−2.0 ≤ Z ≤ 2.0))

(viii) z = −3.0 and z = 3.0 (i.e. P(−3.0 ≤ Z ≤ 3.0))

2. Let Z be a standard normal random variable. Find $z_0$ such that

(i) $P(Z \ge z_0) = 0.05$

(ii) $P(Z \ge z_0) = 0.99$

(iii) $P(Z \ge z_0) = 0.0708$

(iv) $P(Z \le z_0) = 0.0708$

(v) $P(-z_0 \le Z \le z_0) = 0.68$

(vi) $P(-z_0 \le Z \le z_0) = 0.95$

3. Let Z be a standard normal random variable. Find $z_0$ such that

(i) $P(Z \ge z_0) = 0.10$

(ii) $P(Z \ge z_0) = 0.05$

(iii) $P(Z \ge z_0) = 0.025$

(iv) $P(Z \ge z_0) = 0.01$

(v) $P(Z \ge z_0) = 0.005$

4. A normally distributed random variable X possesses a mean of µ = 10 and a

standard deviation of σ = 5. Find the following probabilities.

(i) X falls between 10 and 12 (i.e. P(10 ≤ X ≤ 12)).

(ii) X falls between 6 and 14 (i.e. P(6 ≤ X ≤ 14)).

(iii) X is less than 12 (i.e. P(X ≤ 12)).

(iv) X exceeds 10 (i.e. P(X ≥ 10)).

5. The height of adult women in the United States is normally distributed with mean

64.5 inches and standard deviation 2.4 inches.

(i) Find the probability that a randomly chosen woman is taller than 70 inches. (Answer: .011)

(ii) Alice is 71 inches tall. What percentage of women are shorter than Alice? (Answer: .9966, i.e. 99.66%)
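Instead of a normal table, the answers to question 5 can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8 and later). This is only a sketch for verifying the given answers; the exercise still asks for the z-score work by hand.

```python
from statistics import NormalDist

# Heights of adult women: normal with mean 64.5 in and standard deviation 2.4 in
heights = NormalDist(mu=64.5, sigma=2.4)

p_taller_than_70 = 1 - heights.cdf(70)  # z = (70 - 64.5)/2.4, about 2.29
p_shorter_than_71 = heights.cdf(71)     # z = (71 - 64.5)/2.4, about 2.71

print(round(p_taller_than_70, 3), round(p_shorter_than_71, 4))  # 0.011 0.9966
```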

6. The lifetimes of batteries produced by a firm are normally distributed with a mean of 100 hours and a standard deviation of 10 hours. What is the probability that a randomly selected battery will last between 110 and 120 hours?

7. Answer True or False. (Circle your choice.)

T F (i) The standard normal distribution has its mean and standard deviation equal

to zero.

T F (ii) The standard normal distribution has its mean and standard deviation equal

to one.

T F (iii) The standard normal distribution has its mean equal to one and standard

deviation equal to zero.

T F (iv) The standard normal distribution has its mean equal to zero and standard

deviation equal to one.

T F (v) Because the normal distribution is symmetric, half of the area under the curve lies below the 40th percentile.

T F (vi) The total area under the normal curve is equal to one only if the mean is equal to zero and standard deviation equal to one.

T F (vii) The normal distribution is symmetric only if the mean is zero and the standard deviation is one.

Chapter 5

Sampling Distributions

Contents.

The Central Limit Theorem

The Sampling Distribution of the Sample Mean

The Sampling Distribution of the Sample Proportion

The Sampling Distribution of the Difference Between Two Sample Means

The Sampling Distribution of the Difference Between Two Sample Proportions

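1 The Central Limit Theorem (CLT)

Roughly speaking, the CLT says that, for large samples:

The sampling distribution of the sample mean, $\bar X$, is approximately normal; that is,

$Z = \dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}}$

is approximately standard normal.

The sampling distribution of the sample proportion, $\hat P$, is approximately normal; that is,

$Z = \dfrac{\hat P - \mu_{\hat P}}{\sigma_{\hat P}}$

is approximately standard normal.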
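To see the CLT at work, the following sketch simulates it. The exponential population (λ = 1, so µ = σ = 1), the sample size n = 36, the number of repetitions and the random seed are all illustrative assumptions. The simulated sample means should have a mean close to µ and a standard deviation close to σ/√n = 1/6, and roughly 95% of them should fall within 1.96 standard errors of µ, as a normal distribution predicts, even though the population itself is skewed.

```python
import math
import random
import statistics

random.seed(604)              # fixed seed so the run is reproducible
lam, n, reps = 1.0, 36, 10_000
mu = sigma = 1 / lam          # mean and sd of an exponential(lam) population

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(random.expovariate(lam) for _ in range(n))
         for _ in range(reps)]

se = sigma / math.sqrt(n)     # sigma_xbar = sigma / sqrt(n)
within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3), round(within, 3))
```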

2 Sampling Distributions

Suppose the distribution of X is normal with mean µ and standard deviation σ.

(i) What is the distribution of $\dfrac{X - \mu}{\sigma}$?

Answer: It is a standard normal, i.e.

$Z = \dfrac{X - \mu}{\sigma}$

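I. The Sampling Distribution of the Sample Mean

Let $\bar X$ denote the mean of a random sample of size n from this population.

(ii) What are the mean (expected value) and standard deviation of $\bar X$?

Answer:

$\mu_{\bar X} = E(\bar X) = \mu$

$\sigma_{\bar X} = S.E.(\bar X) = \dfrac{\sigma}{\sqrt n}$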

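(iii) What is the sampling distribution of the sample mean $\bar X$?

Answer: The distribution of $\bar X$ is a normal distribution with mean µ and standard deviation $\sigma/\sqrt n$; equivalently,

$Z = \dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{\bar X - \mu}{\sigma/\sqrt n}$

has a standard normal distribution.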

(iv) What is the sampling distribution of the sample mean, $\bar X$, if X is not normally distributed?

Answer: The distribution of $\bar X$ is approximately a normal distribution with mean µ and standard deviation $\sigma/\sqrt n$, provided n is large (i.e. $n \ge 30$).

Example. Consider a population, X, with mean µ = 4 and standard deviation σ = 3. A sample of size 36 is to be selected.

(i) What are the mean and standard deviation of $\bar X$?

(ii) Find $P(4 < \bar X < 5)$.

(iii) Find $P(\bar X > 3.5)$. (exercise)

(iv) Find $P(3.5 \le \bar X \le 4.5)$. (exercise)
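Part (ii) can be checked with a short sketch: by the CLT (n = 36 ≥ 30), $\bar X$ is approximately normal with mean 4 and standard deviation $3/\sqrt{36} = 0.5$, so $P(4 < \bar X < 5) = P(0 < Z < 2)$. The code uses the standard-library `statistics.NormalDist`; the variable names are our own.

```python
import math
from statistics import NormalDist

mu, sigma, n = 4, 3, 36
se = sigma / math.sqrt(n)                # sigma_xbar = 3/6 = 0.5
xbar_dist = NormalDist(mu=mu, sigma=se)

p = xbar_dist.cdf(5) - xbar_dist.cdf(4)  # P(4 < Xbar < 5) = P(0 < Z < 2)
print(se, round(p, 4))  # 0.5 0.4772
```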

II. The Sampling Distribution of the Sample Proportion

Suppose the distribution of X is binomial with parameters n and p, and let $\hat P = X/n$ be the sample proportion (with q = 1 − p).

(ii) What are the mean (expected value) and standard deviation of $\hat P$?

Answer:

$\mu_{\hat P} = E(\hat P) = p$

$\sigma_{\hat P} = S.E.(\hat P) = \sqrt{\dfrac{pq}{n}}$

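(iii) What is the sampling distribution of the sample proportion $\hat P$?

Answer: $\hat P$ has approximately a normal distribution with mean p and standard deviation $\sqrt{pq/n}$; equivalently,

$Z = \dfrac{\hat P - \mu_{\hat P}}{\sigma_{\hat P}} = \dfrac{\hat P - p}{\sqrt{pq/n}}$

is approximately standard normal, provided n is large (i.e. $np \ge 5$ and $nq \ge 5$).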

Example. It is claimed that at least 30% of all adults favor brand A over brand B. To test this claim, a sample of n = 400 adults is selected. Suppose 130 individuals indicated a preference for brand A.

DATA SUMMARY: n = 400, x = 130, p = .30, $\hat p = 130/400 = .325$

(i) Find the mean and standard deviation of the sample proportion $\hat P$.

Answer:

$\mu_{\hat p} = p = .30$

$\sigma_{\hat p} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(.30)(.70)}{400}} = .023$

(ii) Find $P(\hat P > 0.30)$.
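A minimal sketch of part (i), and of part (ii) under the normal approximation (np = 120 and nq = 280 are both at least 5). Because 0.30 is the mean of $\hat P$, the symmetry of the normal curve gives $P(\hat P > 0.30) = 0.5$.

```python
import math
from statistics import NormalDist

n, p = 400, 0.30
q = 1 - p
se = math.sqrt(p * q / n)             # sigma_phat = sqrt(pq/n)
phat_dist = NormalDist(mu=p, sigma=se)

p_above = 1 - phat_dist.cdf(0.30)     # P(Phat > 0.30)
print(round(se, 3), p_above)  # 0.023 0.5
```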

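III. Comparing two Sample Means

$E(\bar X_1 - \bar X_2) = \mu_1 - \mu_2$

$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

$Z = \dfrac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

provided $n_1, n_2 \ge 30$.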

IV. Comparing two Sample Proportions

$E(\hat P_1 - \hat P_2) = p_1 - p_2$

$\sigma_{\hat P_1 - \hat P_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

$Z = \dfrac{(\hat P_1 - \hat P_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$

provided $n_1$ and $n_2$ are large.

Review Exercises: Sampling Distributions

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A normally distributed random variable X possesses a mean of µ = 20 and a standard deviation of σ = 5. A random sample of n = 16 observations is to be selected. Let $\bar X$ be the sample average.

(i) Describe the sampling distribution of $\bar X$ (i.e. describe the distribution of $\bar X$ and give $\mu_{\bar x}$, $\sigma_{\bar x}$). (Answer: $\mu_{\bar x} = 20$, $\sigma_{\bar x} = 1.25$)

(ii) Find the z-score of $\bar x = 22$. (Answer: 1.6)

(iii) Find $P(\bar X \ge 22)$.

(iv) Find $P(20 \le \bar X \le 22)$.

(v) Find $P(16 \le \bar X \le 19)$.

(vi) Find $P(\bar X \ge 23)$.

(vii) Find $P(\bar X \ge 18)$.

2. The number of trips to a doctor’s office per family per year in a given community is known to have a mean of 10 with a standard deviation of 3. Suppose a random sample of 49 families is taken and a sample mean is calculated.

(i) Describe the sampling distribution of the sample mean, $\bar X$. (Include the mean $\mu_{\bar x}$, standard deviation $\sigma_{\bar x}$, and type of distribution.)

(ii) Find the probability that the sample mean, $\bar X$, does not exceed 9. (Answer: .01)

(iii) Find the probability that the sample mean, $\bar X$, does not exceed 11. (Answer: .99)

3. When a random sample of size n is drawn from a normal population with mean µ and variance $\sigma^2$, the sampling distribution of the sample mean $\bar X$ will be

(a) exactly normal.

(b) approximately normal

(c) binomial

(d) none of the above

4. Answer True or False. (Circle your choice.)

T F (i) The central limit theorem applies regardless of the shape of the population frequency distribution.

T F (ii) The central limit theorem is important because it explains why some estimators tend to possess, approximately, a normal distribution.

Chapter 6

Large Sample Estimation

Contents.

1. Introduction

2. Point Estimators and Their Properties

3. Single Quantitative Population

4. Single Binomial Population

5. Two Quantitative Populations

6. Two Binomial Populations

7. Choosing the Sample Size

1 Introduction

Types of estimators.

1. Point estimator

2. Interval estimator: (L, U)

Desired Properties of Point Estimators.

(i) Unbiased: Mean of the sampling distribution is equal to the parameter.

(ii) Minimum variance: Small standard error of point estimator.

(iii) Error of estimation: distance between a parameter and its point estimate is small.

Desired Properties of Interval Estimators.

(i) Confidence coefficient: P(interval estimator will enclose the parameter) = 1 − α should be as high as possible.

(ii) Confidence level: the confidence coefficient expressed as a percentage.

(iii) Margin of error (bound on the error of estimation): should be as small as possible.

Parameters of Interest.

Single Quantitative Population: µ

Single Binomial Population: p

Two Quantitative Populations: $\mu_1 - \mu_2$

Two Binomial Populations: $p_1 - p_2$

2 Point Estimators and Their Properties

Parameter of interest: θ

Sample data: n, $\hat\theta$, $\sigma_{\hat\theta}$

Point estimator: $\hat\theta$

Estimator mean: $\mu_{\hat\theta} = \theta$ (Unbiased)

Standard error: $SE(\hat\theta) = \sigma_{\hat\theta}$

Assumptions: Large sample + others (to be specified in each case)

3 Single Quantitative Population

Parameter of interest: µ

Sample data: n, $\bar x$, s

Other information: α

Point estimator: $\bar x$

Estimator mean: $\mu_{\bar x} = \mu$

Standard error: $SE(\bar x) = \sigma/\sqrt n$ (also denoted as $\sigma_{\bar x}$; in practice σ is estimated by s)

Confidence Interval (C.I.) for µ:

$\bar x \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt n}$

Confidence level: (1 − α)100%, which is the probability that the interval estimator contains the parameter.

Margin of Error (or Bound on the Error of Estimation).

$B = z_{\alpha/2} \dfrac{\sigma}{\sqrt n}$

Assumptions.

1. Large sample (n ≥ 30)

2. Sample is randomly selected

Example 1. We are interested in estimating the mean number of unoccupied seats per flight, µ, for a major airline. A random sample of n = 225 flights shows that the sample mean is 11.6 and the standard deviation is 4.1.

Data summary: n = 225; $\bar x = 11.6$; s = 4.1.

Question 1. What is the point estimate of µ (do not give the margin of error)?

$\bar x = 11.6$

Question 2. Give a 95% bound on the error of estimation (also known as the margin of error).

$B = z_{\alpha/2} \dfrac{\sigma}{\sqrt n} = 1.96 \dfrac{4.1}{\sqrt{225}} = 0.5357$

Question 3. Find a 90% confidence interval for µ.

$\bar x \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt n}$

$11.6 \pm 1.645 \dfrac{4.1}{\sqrt{225}}$

$11.6 \pm 0.45 = (11.15, 12.05)$

Question 4. Interpret the CI found in Question 3.

We are 90% confident that the interval (11.15, 12.05) contains µ.

OR

If repeated sampling is used, then about 90% of the CIs constructed in this way would contain µ.

Question 5. What is the width of the CI found in Question 3?

The width of the CI is

$W = 2 z_{\alpha/2} \dfrac{\sigma}{\sqrt n}$

W = 2(0.45) = 0.90

OR

W = 12.05 − 11.15 = 0.90
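The calculations in Questions 2, 3 and 5 can be reproduced with the sketch below, which (as in the notes) uses s in place of σ for this large sample and takes the table values $z_{.025} = 1.96$ and $z_{.05} = 1.645$. Without intermediate rounding the width is about 0.899, which agrees with 0.90 after rounding.

```python
import math

n, xbar, s = 225, 11.6, 4.1
se = s / math.sqrt(n)             # estimated sigma / sqrt(n)

B95 = 1.96 * se                   # Question 2: 95% margin of error
B90 = 1.645 * se                  # 90% margin of error
ci90 = (xbar - B90, xbar + B90)   # Question 3
width = ci90[1] - ci90[0]         # Question 5: W = 2 * B90

print(round(B95, 4), [round(v, 2) for v in ci90], round(width, 3))  # 0.5357 [11.15, 12.05] 0.899
```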

Question 6. If n, the sample size, is increased, what happens to the width of the CI? What happens to the margin of error?

The width of the CI decreases.

The margin of error decreases.

(Both are proportional to $1/\sqrt n$.)

Sample size:

$n \ge \dfrac{(z_{\alpha/2})^2 \sigma^2}{B^2}$

where σ is estimated by s.

Note: In the absence of data, σ is sometimes approximated by R/4, where R is the range.

Example 2. Suppose you want to construct a 99% CI for µ so that W = 0.05. You are told that preliminary data shows a range from 13.3 to 13.7. What sample size should you choose?

A. Data summary: α = .01; R = 13.7 − 13.3 = .4; so σ ≈ .4/4 = .1. Now B = W/2 = 0.05/2 = 0.025. Therefore

$n \ge \dfrac{(z_{\alpha/2})^2 \sigma^2}{B^2} = \dfrac{(2.58)^2 (.1)^2}{(0.025)^2} = 106.50$.

So n = 107 (round up).
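The sample-size rule can be coded as a small helper; `sample_size_mean` is our own hypothetical name. It rounds up with `math.ceil`, since a fractional answer must be raised to the next whole unit to satisfy $n \ge (z_{\alpha/2})^2\sigma^2/B^2$.

```python
import math

def sample_size_mean(z: float, sigma: float, B: float) -> int:
    """Smallest integer n with n >= z^2 * sigma^2 / B^2."""
    return math.ceil((z * sigma / B) ** 2)

R = 13.7 - 13.3    # range of the preliminary data
sigma = R / 4      # rule of thumb: sigma is about R/4
B = 0.05 / 2       # B = W/2
print(sample_size_mean(2.58, sigma, B))  # 107
```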

Exercise 1. Find the sample size necessary to reduce W in the flight example to .6. Use α = 0.05.

4 Single Binomial Population

Parameter of interest: p

Sample data: n, x, $\hat p = x/n$ (x here is the number of successes).

Other information: α

Point estimator: $\hat p$

Estimator mean: $\mu_{\hat p} = p$

Standard error: $\sigma_{\hat p} = \sqrt{\dfrac{pq}{n}}$

Confidence Interval (C.I.) for p:

$\hat p \pm z_{\alpha/2} \sqrt{\dfrac{\hat p \hat q}{n}}$

Confidence level: (1 − α)100%, which is the probability that the interval estimator contains the parameter.

Margin of Error.

$B = z_{\alpha/2} \sqrt{\dfrac{\hat p \hat q}{n}}$

Assumptions.

1. Large sample (np ≥ 5; nq ≥ 5)

2. Sample is randomly selected

Example 3. A random sample of n = 484 voters in a community produced x = 257 voters in favor of candidate A.

Data summary: n = 484; x = 257; $\hat p = \dfrac{x}{n} = \dfrac{257}{484} = 0.531$.

Question 1. Do we have a large sample size?

$n\hat p = 484(0.531) = 257$, which is ≥ 5.

$n\hat q = 484(0.469) = 227$, which is ≥ 5.

Therefore we have a large sample size.

Question 2. What is the point estimate of p and its margin of error?

$\hat p = \dfrac{x}{n} = \dfrac{257}{484} = 0.531$

$B = z_{\alpha/2} \sqrt{\dfrac{\hat p \hat q}{n}} = 1.96 \sqrt{\dfrac{(0.531)(0.469)}{484}} = 0.044$

Question 3. Find a 90% confidence interval for p.

$\hat p \pm z_{\alpha/2} \sqrt{\dfrac{\hat p \hat q}{n}}$

$0.531 \pm 1.645 \sqrt{\dfrac{(0.531)(0.469)}{484}}$

$0.531 \pm 0.037 = (0.494, 0.568)$

Question 4. What is the width of the CI found in Question 3?

The width of the CI is

$W = 2 z_{\alpha/2} \sqrt{\dfrac{\hat p \hat q}{n}} = 2(0.037) = 0.074$
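Questions 1 to 4 can be reproduced with the following sketch (standard library only). It keeps $\hat p$ unrounded, so the interval endpoints match the notes after rounding; the width computed without intermediate rounding is about 0.0746 rather than 2(0.037) = 0.074.

```python
import math

n, x = 484, 257
phat = x / n
qhat = 1 - phat

# Question 1: large-sample check
assert n * phat >= 5 and n * qhat >= 5

se = math.sqrt(phat * qhat / n)
B95 = 1.96 * se                   # Question 2
B90 = 1.645 * se
ci90 = (phat - B90, phat + B90)   # Question 3
width = 2 * B90                   # Question 4

print(round(phat, 3), round(B95, 3), [round(v, 3) for v in ci90], round(width, 4))
```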

Question 5. Interpret the CI found in Question 3.

We are 90% confident that the interval (0.494, 0.568) contains p.

OR

If repeated sampling is used, then about 90% of the CIs constructed in this way would contain p.

Question 6. If n, the sample size, is increased, what happens to the width of the CI? What happens to the margin of error?

The width of the CI decreases.

The margin of error decreases.

Sample size.

$n \ge \dfrac{(z_{\alpha/2})^2 (\hat p \hat q)}{B^2}$.

Note: In the absence of data, choose $\hat p = \hat q = 0.5$, or simply $\hat p \hat q = 0.25$.

Example 4. Suppose you want to provide an accurate estimate of the proportion of customers preferring one brand of coffee over another. You need to construct a 95% CI for p so that B = 0.015. You are told that preliminary data shows $\hat p = 0.35$. What sample size should you choose? Use α = 0.05.

Data summary: α = .05; $\hat p = 0.35$; B = 0.015

$n \ge \dfrac{(z_{\alpha/2})^2 (\hat p \hat q)}{B^2} = \dfrac{(1.96)^2 (0.35)(0.65)}{(0.015)^2} = 3{,}884.28$

So n = 3,885 (round up).
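The same rounding-up rule applies to proportions. A minimal sketch (the helper name `sample_size_prop` is ours):

```python
import math

def sample_size_prop(z: float, phat: float, B: float) -> int:
    """Smallest integer n with n >= z^2 * phat * qhat / B^2."""
    return math.ceil(z ** 2 * phat * (1 - phat) / B ** 2)

print(sample_size_prop(1.96, 0.35, 0.015))  # 3885
```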

Exercise 2. Suppose that no preliminary estimate of $\hat p$ is available. Find the new sample size. Use α = 0.05.

Exercise 3. Suppose that no preliminary estimate of $\hat p$ is available. Find the sample size necessary so that α = 0.01.

5 Two Quantitative Populations

Parameter of interest: $\mu_1 - \mu_2$

Sample data:

Sample 1: $n_1, \bar x_1, s_1$

Sample 2: $n_2, \bar x_2, s_2$

Point estimator: $\bar X_1 - \bar X_2$

Estimator mean: $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$

Standard error: $SE(\bar X_1 - \bar X_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Confidence Interval.

$(\bar x_1 - \bar x_2) \pm z_{\alpha/2} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Assumptions.

1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)

2. Samples are randomly selected

3. Samples are independent

Sample size (with equal sample sizes, $n = n_1 = n_2$).

$n \ge \dfrac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{B^2}$

6 Two Binomial Populations

Parameter of interest: $p_1 - p_2$

Sample 1: $n_1, x_1, \hat p_1 = \dfrac{x_1}{n_1}$

Sample 2: $n_2, x_2, \hat p_2 = \dfrac{x_2}{n_2}$

$p_1 - p_2$ (unknown parameter)

α (significance level)

Point estimator: $\hat p_1 - \hat p_2$

Estimator mean: $\mu_{\hat p_1 - \hat p_2} = p_1 - p_2$

Estimated standard error: $\sigma_{\hat p_1 - \hat p_2} = \sqrt{\dfrac{\hat p_1 \hat q_1}{n_1} + \dfrac{\hat p_2 \hat q_2}{n_2}}$

Confidence Interval.

$(\hat p_1 - \hat p_2) \pm z_{\alpha/2} \sqrt{\dfrac{\hat p_1 \hat q_1}{n_1} + \dfrac{\hat p_2 \hat q_2}{n_2}}$

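Assumptions.

1. Large samples ($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)

2. Samples are randomly and independently selected

Sample size (with $n = n_1 = n_2$).

$n \ge \dfrac{(z_{\alpha/2})^2 (\hat p_1 \hat q_1 + \hat p_2 \hat q_2)}{B^2}$

For unknown parameters (take $\hat p_1 = \hat p_2 = 0.5$):

$n \ge \dfrac{(z_{\alpha/2})^2 (0.5)}{B^2}$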

Review Exercises: Large-Sample Estimation

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A random sample of size n = 100 is selected from a quantitative population. The data produced a mean and standard deviation of $\bar x = 75$ and s = 6, respectively.

(i) Estimate the population mean µ, and give a 95% bound on the error of estimation (or margin of error). (Answer: B = 1.18)

(ii) Find a 99% confidence interval for the population mean. (Answer: B = 1.55)

(iii) Interpret the confidence interval found in (ii).

(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half. (Answer: n = 400)

2. An examination of the yearly premiums for a random sample of 80 automobile

insurance policies from a major company showed an average of $329 and a standard

deviation of $49.

(i) Give the point estimate of the population parameter µ and a 99% bound on the error of estimation (margin of error). (Answer: B = 14.135)

(ii) Construct a 99% confidence interval for µ.

(iii) Suppose we wish our estimate in (i) to be accurate to within $5 with 95% confidence; how many insurance policies should be sampled to achieve the desired level of accuracy? (Answer: n = 369)

3. Suppose we wish to estimate the average daily yield of a chemical manufactured in a chemical plant. The daily yield recorded for n = 100 days produces a mean and standard deviation of $\bar x = 870$ and s = 20 tons, respectively.

(i) Estimate the average daily yield µ, and give a 95% bound on the error of estimation (or margin of error).

(ii) Find a 99% confidence interval for the population mean.

(iii) Interpret the confidence interval found in (ii).

(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half.

4. Answer True or False. (Circle your choice.)

T F (i) If the population variance increases and other factors are the same, the width of the confidence interval for the population mean tends to increase.

T F (ii) As the sample size increases, the width of the confidence interval for the population mean tends to decrease.

T F (iii) Populations are characterized by numerical descriptive measures called statistics.

T F (iv) If, for a given C.I., α is increased, then the margin of error will increase.

T F (v) The sample standard deviation s can be used to approximate σ when n is larger than 30.

T F (vi) The sample mean always lies above the population mean.

Chapter 7

Large-Sample Tests of Hypothesis

Contents.

1. Elements of a statistical test

2. A Large-sample statistical test

3. Testing a population mean

4. Testing a population proportion

5. Testing the difference between two population means

6. Testing the difference between two population proportions

7. Reporting results of statistical tests: p-Value

1 Elements of a Statistical Test

Null hypothesis: $H_0$

Alternative (research) hypothesis: $H_a$

Test statistic:

Rejection region: reject $H_0$ if .....

Graph:

Decision: either “Reject $H_0$” or “Do not reject $H_0$”

Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to “favor $H_a$”.

Comments:

* $H_0$ represents the status quo.

* $H_a$ is the hypothesis that we want to provide evidence to justify. We support $H_a$ by showing that the data would be unlikely if $H_0$ were true; this is a form of proof by contradiction.

Type I error ≡ {reject $H_0$ | $H_0$ is true}

Type II error ≡ {do not reject $H_0$ | $H_0$ is false}

α = Prob{Type I error}

β = Prob{Type II error}

Power of a statistical test:

Prob{reject $H_0$ | $H_0$ is false} = 1 − β

Example 1.

$H_0$: Innocent

$H_a$: Guilty

α = Prob{sending an innocent person to jail}

β = Prob{letting a guilty person go free}

Example 2.

$H_0$: New drug is not acceptable

$H_a$: New drug is acceptable

α = Prob{marketing a bad drug}

β = Prob{not marketing an acceptable drug}

2 A Large-Sample Statistical Test

Parameter of interest: θ

Sample data: n, $\hat\theta$, $\sigma_{\hat\theta}$

Test:

Null hypothesis ($H_0$): $\theta = \theta_0$

Alternative hypothesis ($H_a$): 1) $\theta > \theta_0$; 2) $\theta < \theta_0$; 3) $\theta \ne \theta_0$

Test statistic (TS):

$z = \dfrac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$

Critical value: either $z_\alpha$ or $z_{\alpha/2}$

Rejection region (RR):

1) Reject $H_0$ if $z > z_\alpha$

2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Graph:

Decision: 1) if observed value is in RR: “Reject $H_0$”

2) if observed value is not in RR: “Do not reject $H_0$”

Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to · · · .

Assumptions: Large sample + others (to be specified in each case).

One tailed statistical test

Upper (right) tailed test

Lower (left) tailed test

Two tailed statistical test

3 Testing a Population Mean

Parameter of interest: µ

Sample data: n, $\bar x$, s

Other information: $\mu_0$ = target value, α

Test:

$H_0: \mu = \mu_0$

$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$

T.S.:

$z = \dfrac{\bar x - \mu_0}{\sigma/\sqrt n}$

Rejection region (RR):

1) Reject $H_0$ if $z > z_\alpha$

2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Graph:

Decision: 1) if observed value is in RR: “Reject $H_0$”

2) if observed value is not in RR: “Do not reject $H_0$”

Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to “favor $H_a$”.

Assumptions:

Large sample (n ≥ 30)

Sample is randomly selected

Example: Test the hypothesis that weight loss in a new diet program exceeds 20 pounds during the first month.

Sample data: n = 36, $\bar x = 21$, $s^2 = 25$ (so s = 5), $\mu_0 = 20$, α = 0.05

$H_0: \mu = 20$ (µ is not larger than 20)

$H_a: \mu > 20$ (µ is larger than 20)

T.S.:

$z = \dfrac{\bar x - \mu_0}{s/\sqrt n} = \dfrac{21 - 20}{5/\sqrt{36}} = 1.2$

Critical value: $z_\alpha = 1.645$

RR: Reject $H_0$ if z > 1.645

Graph:

Decision: Do not reject $H_0$

Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in the new diet program exceeds 20 pounds during the first month.
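A sketch of this test in Python: `NormalDist().inv_cdf(1 - alpha)` returns the upper-tail critical value $z_\alpha$ (about 1.645 for α = 0.05), replacing the table lookup; s = 5 is used in place of σ as in the notes.

```python
import math
from statistics import NormalDist

n, xbar, s2, mu0, alpha = 36, 21, 25, 20, 0.05

z = (xbar - mu0) / (math.sqrt(s2) / math.sqrt(n))  # T.S. with s = 5
z_crit = NormalDist().inv_cdf(1 - alpha)           # z_alpha for an upper-tailed test
reject = z > z_crit                                # RR: reject H0 if z > z_alpha

print(round(z, 2), round(z_crit, 3), reject)  # 1.2 1.645 False
```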

Exercise: Test the claim that weight loss is not equal to 19.5.

4 Testing a Population Proportion

Parameter of interest: p (unknown parameter)

Sample data: n and x (or $\hat p = x/n$)

$p_0$ = target value

α (significance level)

Test:

$H_0: p = p_0$

$H_a$: 1) $p > p_0$; 2) $p < p_0$; 3) $p \ne p_0$

T.S.:

$z = \dfrac{\hat p - p_0}{\sqrt{p_0 q_0/n}}$

RR:

1) Reject $H_0$ if $z > z_\alpha$

2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Graph:

Decision:

1) if observed value is in RR: “Reject $H_0$”

2) if observed value is not in RR: “Do not reject $H_0$”

Conclusion: At (α)100% significance level there is (in)sufficient statistical evidence to “favor $H_a$”.

Assumptions:

1. Large sample (np ≥ 5, nq ≥ 5)

2. Sample is randomly selected

Example. Test the hypothesis that p > .10 for sample data: n = 200, x = 26.

Solution.

$\hat p = \dfrac{x}{n} = \dfrac{26}{200} = .13$.

Now

$H_0: p = .10$

$H_a: p > .10$

TS:

$z = \dfrac{\hat p - p_0}{\sqrt{p_0 q_0/n}} = \dfrac{.13 - .10}{\sqrt{(.10)(.90)/200}} = 1.41$

RR: reject $H_0$ if z > 1.645

Graph:

Dec: Do not reject $H_0$

Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that p > .10.
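The test statistic for this example can be reproduced as follows; the standard error uses $p_0$ (not $\hat p$), as in the T.S. formula for testing a proportion. A sketch:

```python
import math

n, x, p0 = 200, 26, 0.10
phat = x / n
z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)   # T.S. under H0: p = p0

print(round(phat, 2), round(z, 2), z > 1.645)  # 0.13 1.41 False
```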

Exercise. Is the large sample assumption satisfied here?

5 Comparing Two Population Means

Parameter of interest: $\mu_1 - \mu_2$

Sample data:

Sample 1: $n_1, \bar x_1, s_1$

Sample 2: $n_2, \bar x_2, s_2$

Test:

$H_0: \mu_1 - \mu_2 = D_0$

$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$; 3) $\mu_1 - \mu_2 \ne D_0$

T.S.:

$z = \dfrac{(\bar x_1 - \bar x_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

RR:

1) Reject $H_0$ if $z > z_\alpha$

2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Graph:

Decision:

Conclusion:

Assumptions:

1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)

2. Samples are randomly selected

3. Samples are independent

Example: (Comparing two weight loss programs)

Refer to the weight loss example. Test the hypothesis that weight loss differs between the two diet programs.

1. Sample 1: $n_1 = 36$, $\bar x_1 = 21$, $s_1^2 = 25$ (old)

2. Sample 2: $n_2 = 36$, $\bar x_2 = 18.5$, $s_2^2 = 24$ (new)

$D_0 = 0$, α = 0.05

$H_0: \mu_1 - \mu_2 = 0$

$H_a: \mu_1 - \mu_2 \ne 0$

T.S.:

$z = \dfrac{(\bar x_1 - \bar x_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{21 - 18.5}{\sqrt{25/36 + 24/36}} = 2.14$

Critical value: $z_{\alpha/2} = 1.96$

RR: Reject $H_0$ if z > 1.96 or z < −1.96

Graph:

Decision: Reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss differs between the two diet programs.
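A sketch reproducing z = 2.14, with the sample variances $s_1^2$, $s_2^2$ standing in for $\sigma_1^2$, $\sigma_2^2$ (large samples):

```python
import math

n1, xbar1, var1 = 36, 21.0, 25.0   # old program
n2, xbar2, var2 = 36, 18.5, 24.0   # new program
D0 = 0.0

se = math.sqrt(var1 / n1 + var2 / n2)   # sqrt(49/36) = 7/6
z = ((xbar1 - xbar2) - D0) / se
reject = abs(z) > 1.96                  # two-tailed RR at alpha = 0.05

print(round(z, 2), reject)  # 2.14 True
```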

Exercise: Test the hypothesis that weight loss in the old diet program exceeds that of the new program.

Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 1.

6 Comparing Two Population Proportions

Parameter of interest: $p_1 - p_2$

Sample 1: $n_1, x_1, \hat p_1 = \dfrac{x_1}{n_1}$,

Sample 2: $n_2, x_2, \hat p_2 = \dfrac{x_2}{n_2}$,

$p_1 - p_2$ (unknown parameter)

Common (pooled) estimate:

$\hat p = \dfrac{x_1 + x_2}{n_1 + n_2}$

Test:

$H_0: p_1 - p_2 = 0$

$H_a$: 1) $p_1 - p_2 > 0$

2) $p_1 - p_2 < 0$

3) $p_1 - p_2 \ne 0$

T.S.:

$z = \dfrac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\hat p \hat q (1/n_1 + 1/n_2)}}$

RR:

1) Reject $H_0$ if $z > z_\alpha$

2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Graph:

Decision:

Conclusion:

Assumptions:

Large samples ($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)

Samples are randomly and independently selected
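The pooled test statistic can be computed with a small helper. In this sketch, `two_prop_z` is our own hypothetical name and the counts are made-up illustrative numbers (not data from these notes): 40 successes out of 200 in sample 1 and 60 out of 200 in sample 2.

```python
import math

def two_prop_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 - p2 = 0, using the pooled estimate phat."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)     # common (pooled) estimate
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(40, 200, 60, 200)  # hypothetical counts
print(round(z, 2))  # -2.31
```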

Example: Test the hypothesis that $p_1 - p_2 < 0$ if it is known that the test statistic is z = −1.91.

Solution:

$H_0: p_1 - p_2 = 0$

$H_a: p_1 - p_2 < 0$

TS: z = −1.91

RR: reject $H_0$ if z < −1.645

Graph:

Dec: reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that $p_1 - p_2 < 0$.

Exercise: Repeat as a two-tailed test.

7 Reporting Results of Statistical Tests: P-Value

Definition. The p-value for a test of a hypothesis is the smallest value of α for which the null hypothesis is rejected, i.e. for which the statistical results are significant.

The p-value is called the observed significance level.

Note: The p-value is the probability (when $H_0$ is true) of obtaining a value of the test statistic as extreme as, or more extreme than, the actual sample value in support of $H_a$.

Examples. Find the p-value in each case:

(i) Upper tailed test:

$H_0: \theta = \theta_0$

$H_a: \theta > \theta_0$

TS: z = 1.76

p-value = P(Z > 1.76) = .0392

(ii) Lower tailed test:

$H_0: \theta = \theta_0$

$H_a: \theta < \theta_0$

TS: z = −1.86

p-value = P(Z < −1.86) = .0314

(iii) Two tailed test:

$H_0: \theta = \theta_0$

$H_a: \theta \ne \theta_0$

TS: z = 1.76

p-value = 2P(Z > 1.76) = 2(.0392) = .0784
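The three p-values can be computed from the standard normal CDF instead of a table; a sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF

p_upper = 1 - Phi(1.76)           # (i)   P(Z > 1.76)
p_lower = Phi(-1.86)              # (ii)  P(Z < -1.86)
p_two = 2 * (1 - Phi(abs(1.76)))  # (iii) 2 P(Z > |z|)

print(round(p_upper, 4), round(p_lower, 4), round(p_two, 4))  # 0.0392 0.0314 0.0784
```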

Decision rule using p-value: (Important)

Reject $H_0$ for all α > p-value.

Review Exercises: Testing Hypothesis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A local pizza parlor advertises that their average time for delivery of a pizza is within 30 minutes of receipt of the order. The delivery times for a random sample of 64 orders were recorded, with a sample mean of 34 minutes and a standard deviation of 21 minutes.

(i) Is there sufficient evidence to conclude that the actual delivery time is longer than what is claimed by the pizza parlor? Use α = .05.

$H_0$:

$H_a$:

T.S. (Answer: 1.52)

R.R.

Graph:

Dec:

Conclusion:

(ii) Test the hypothesis $H_a: \mu \ne 30$.

2. Answer True or False. (Circle your choice.)

T F (v) If, for a given test, α is fixed and the sample size is increased, then β will increase.

Chapter 8

Small-Sample Tests of Hypothesis

Contents:

1. Introduction

2. Student’s t distribution

3. Small-sample inferences about a population mean

4. Small-sample inferences about the difference between two means: Independent Samples

5. Small-sample inferences about the difference between two means: Paired Samples

6. Inferences about a population variance

7. Comparing two population variances

1 Introduction

When the sample size is small, we consider only normal populations.

For non-normal (e.g. binomial) populations, different techniques are necessary.

2 Student’s t Distribution

RECALL

For small samples (n < 30) from normal populations, we have

$z = \dfrac{\bar x - \mu}{\sigma/\sqrt n}$

If σ is unknown, we use s instead, but the resulting statistic no longer has a Z distribution.

Assumptions.

1. Sampled population is normal

2. Small random sample (n < 30)

3. σ is unknown

$t = \dfrac{\bar x - \mu}{s/\sqrt n}$

Properties of the t Distribution:

(i) It has n − 1 degrees of freedom (df)

(ii) Like the normal distribution, it has a symmetric mound-shaped probability distribution

(iii) More variable (flatter, with heavier tails) than the normal distribution

(iv) The distribution depends on the degrees of freedom. Moreover, as n becomes larger, t converges to Z.

(v) Critical values (tail probabilities) are obtained from the t table

Examples.

(i) Find $t_{0.05,5} = 2.015$

(ii) Find $t_{0.005,8} = 3.355$

(iii) Find $t_{0.025,26} = 2.056$

3 Small-Sample Inferences About a Population Mean

Parameter of interest: µ

Sample data: n, $\bar x$, s

Other information: $\mu_0$ = target value, α

Point estimator: $\bar x$

Estimator mean: $\mu_{\bar x} = \mu$

Estimated standard error: $\sigma_{\bar x} = s/\sqrt n$

Confidence Interval for µ:

$\bar x \pm t_{\alpha/2,\,n-1} \left(\dfrac{s}{\sqrt n}\right)$

Test:

$H_0: \mu = \mu_0$

$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$.

Critical value: either $t_{\alpha,\,n-1}$ or $t_{\alpha/2,\,n-1}$

T.S.: $t = \dfrac{\bar x - \mu_0}{s/\sqrt n}$

RR:

1) Reject $H_0$ if $t > t_{\alpha,\,n-1}$

2) Reject $H_0$ if $t < -t_{\alpha,\,n-1}$

3) Reject $H_0$ if $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$

Graph:

Decision: 1) if observed value is in RR: “Reject $H_0$”

2) if observed value is not in RR: “Do not reject $H_0$”

Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to “favor $H_a$”.

Assumptions.

1. Small sample (n < 30)

2. Sample is randomly selected

3. Normal population

4. Unknown variance

Example. For the sample data given below, test the hypothesis that weight loss in a new diet program exceeds 20 pounds during the first month.

1. Sample data: n = 25, $\bar x = 21.3$, $s^2 = 25$, $\mu_0 = 20$, α = 0.05

Critical value: $t_{0.05,24} = 1.711$

$H_0: \mu = 20$

$H_a: \mu > 20$

T.S.:

$t = \dfrac{\bar x - \mu_0}{s/\sqrt n} = \dfrac{21.3 - 20}{5/\sqrt{25}} = 1.3$

RR: Reject $H_0$ if t > 1.711

Graph:

Decision: Do not reject $H_0$

Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in the new diet program exceeds 20 pounds during the first month.
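A sketch of the computation. Python's standard library has no t-distribution quantile function, so the critical value $t_{0.05,24} = 1.711$ is taken from the t table as in the notes.

```python
import math

n, xbar, s2, mu0 = 25, 21.3, 25, 20
t_crit = 1.711                    # t_{0.05,24} from the t table

t = (xbar - mu0) / (math.sqrt(s2) / math.sqrt(n))   # s = 5
reject = t > t_crit                                 # upper-tailed RR

print(round(t, 2), reject)  # 1.3 False
```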

Exercise. Test the claim that weight loss is not equal to 19.5 (i.e. $H_a: \mu \ne 19.5$).

4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples

Parameter of interest: $\mu_1 - \mu_2$

Sample data:

Sample 1: $n_1, \bar x_1, s_1$

Sample 2: $n_2, \bar x_2, s_2$

Other information: $D_0$ = target value, α

Point estimator: $\bar X_1 - \bar X_2$

Estimator mean: $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$

Assumptions.

1. Normal populations

2. Small samples ($n_1 < 30$; $n_2 < 30$)

3. Samples are randomly selected

4. Samples are independent

5. Variances are equal with common variance

$\sigma^2 = \sigma_1^2 = \sigma_2^2$

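Pooled estimator for σ.

$s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Estimator standard error:

$\sigma_{\bar X_1 - \bar X_2} = \sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Reason:

$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}} = \sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Confidence Interval:

$(\bar x_1 - \bar x_2) \pm \left(t_{\alpha/2,\,n_1+n_2-2}\right)\left(s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\right)$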

Test:

$H_0: \mu_1 - \mu_2 = D_0$

$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$; 3) $\mu_1 - \mu_2 \ne D_0$

T.S.:

$t = \dfrac{(\bar x_1 - \bar x_2) - D_0}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

RR: 1) Reject $H_0$ if $t > t_{\alpha,\,n_1+n_2-2}$

2) Reject $H_0$ if $t < -t_{\alpha,\,n_1+n_2-2}$

3) Reject $H_0$ if $t > t_{\alpha/2,\,n_1+n_2-2}$ or $t < -t_{\alpha/2,\,n_1+n_2-2}$

Graph:

Decision:

Conclusion:

Example. (Comparison of two weight loss programs)

Refer to the weight loss example. Test the hypothesis that weight loss in a new diet program is different from that of an old program. We are told that the observed value of the test statistic is 2.2 and that

1. Sample 1: $n_1 = 7$

2. Sample 2: $n_2 = 8$

α = 0.05

Solution.

$H_0: \mu_1 - \mu_2 = 0$

$H_a: \mu_1 - \mu_2 \ne 0$

T.S.:

$t = \dfrac{(\bar x_1 - \bar x_2) - 0}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = 2.2$

Critical value: $t_{.025,13} = 2.160$

RR: Reject $H_0$ if t > 2.160 or t < −2.160

Graph:

Decision: Reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss differs between the two diet programs.

Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 0.

Minitab Commands: A two-sample t procedure with a pooled estimate of variance

MTB> twosample C1 C2;

SUBC> pooled;

SUBC> alternative 1.

Note: alternative: 1 = right-tailed; −1 = left-tailed; 0 = two-tailed.

5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples

Parameter of interest: $\mu_1 - \mu_2 = \mu_d$

Sample of paired differences data:

Sample: $n$ = number of pairs, $\bar d$ = sample mean, $s_d$ = sample standard deviation

Other information: $D_0$ = target value, $\alpha$

Point estimator: $\bar d$

Estimator mean: $\mu_{\bar d} = \mu_d$

Assumptions.

1. Normal populations

2. Small samples ($n < 30$ pairs)

3. Samples are randomly selected

4. Samples are paired (not independent)

Sample standard deviation of the sample of $n$ paired differences:

$$s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar d)^2}{n - 1}}$$

Estimator standard error: $\sigma_{\bar d} = \sigma_d/\sqrt{n}$, estimated by $s_d/\sqrt{n}$

Confidence Interval.

$$\bar d \pm t_{\alpha/2,\, n-1}\,\frac{s_d}{\sqrt{n}}$$

Test.

$H_0: \mu_1 - \mu_2 = D_0$ (equivalently, $\mu_d = D_0$)

$H_a$: 1) $\mu_1 - \mu_2 = \mu_d > D_0$; 2) $\mu_1 - \mu_2 = \mu_d < D_0$; 3) $\mu_1 - \mu_2 = \mu_d \neq D_0$

T.S.:

$$t = \frac{\bar d - D_0}{s_d/\sqrt{n}}$$

RR:

1) Reject $H_0$ if $t > t_{\alpha,\, n-1}$

2) Reject $H_0$ if $t < -t_{\alpha,\, n-1}$

3) Reject $H_0$ if $t > t_{\alpha/2,\, n-1}$ or $t < -t_{\alpha/2,\, n-1}$

Graph:

Decision:

Conclusion:

Example. A manufacturer wishes to compare wearing qualities of two diﬀerent types

of tires, A and B. For the comparison a tire of type A and one of type B are randomly

assigned and mounted on the rear wheels of each of ﬁve automobiles. The automobiles

are then operated for a speciﬁed number of miles, and the amount of wear is recorded

for each tire. These measurements are tabulated below.

Automobile Tire A Tire B

1 10.6 10.2

2 9.8 9.4

3 12.3 11.8

4 9.7 9.1

5 8.8 8.3

$\bar x_1 = 10.24$, $\bar x_2 = 9.76$

Using the independent-samples test of the previous section we would get $t = 0.57$, an insignificant result that is inconsistent with the data: tire A shows more wear than tire B on every one of the five automobiles.

Automobile Tire A Tire B d=A-B

1 10.6 10.2 .4

2 9.8 9.4 .4

3 12.3 11.8 .5

4 9.7 9.1 .6

5 8.8 8.3 .5

$\bar x_1 = 10.24$, $\bar x_2 = 9.76$, $\bar d = .48$

Q1: Provide a summary of the data in the above table.

Sample summary: $n = 5$, $\bar d = .48$, $s_d = .0837$

Q2: Do the data provide sufficient evidence to indicate a difference in average wear for the two tire types?

Test. (parameter $\mu_d = \mu_1 - \mu_2$)

$H_0: \mu_d = 0$

$H_a: \mu_d \neq 0$

T.S.:

$$t = \frac{\bar d - D_0}{s_d/\sqrt{n}} = \frac{.48 - 0}{.0837/\sqrt{5}} = 12.8$$

RR: Reject $H_0$ if $t > 2.776$ or $t < -2.776$ ($t_{.025,4} = 2.776$)

Graph:

Decision: Reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that the average amount of wear for type A tires is different from that for type B tires.
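The tire comparison can be reproduced with a short Python sketch, assuming only the standard library (`statistics.mean`, `statistics.stdev`, `statistics.variance`); the helper `paired_t` is a hypothetical name. It also recomputes the independent-samples statistic to show why ignoring the pairing hides the difference.

```python
import math
from statistics import mean, stdev, variance

def paired_t(a, b, D0=0.0):
    """Return d_bar, s_d, t and df for H0: mu_d = D0, where d_i = a_i - b_i."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                          # sample standard deviation (divisor n - 1)
    t = (d_bar - D0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]

d_bar, s_d, t, df = paired_t(tire_a, tire_b)
reject = abs(t) > 2.776                     # two-tailed, t_{.025,4} = 2.776 from the notes

# For contrast: the pooled (independent-samples) t ignores the pairing.
# With equal sample sizes the pooled variance is the average of the two variances.
n = len(tire_a)
sp_sq = (variance(tire_a) + variance(tire_b)) / 2
t_indep = (mean(tire_a) - mean(tire_b)) / math.sqrt(sp_sq * 2 / n)

print(f"d_bar={d_bar:.2f}  s_d={s_d:.4f}  t={t:.1f}  reject={reject}  t_indep={t_indep:.2f}")
```

The printed values should agree with the summary above ($\bar d = .48$, $s_d = .0837$, $t \approx 12.8$) and with the $t = 0.57$ quoted for the independent-samples test.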

Exercise. Construct a 99% conﬁdence interval for the diﬀerence in average wear for the

two tire types.

6 Inferences About a Population Variance

Chi-square distribution. When a random sample of size $n$ is drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $S^2$ depends on $n$. The distribution of the standardized statistic

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

is called the chi-square distribution.

Degrees of freedom (df): $\nu = n - 1$

Graph: non-symmetrical; its shape depends on df

Critical values: from $\chi^2$ tables

Test.

$H_0: \sigma^2 = \sigma_0^2$

$H_a: \sigma^2 \neq \sigma_0^2$ (two-tailed test).

T.S.:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

RR: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha/2}$ or $\chi^2 < \chi^2_{1-\alpha/2}$, where $\chi^2$ is based on $(n-1)$ degrees of freedom.

Graph:

Decision:

Conclusion:

Assumptions.

1. Normal population

2. Random sample

Example:


Use text
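The notes defer to the text for a worked example, so the sketch below only encodes the test statistic and the two-tailed rejection rule. It is a hypothetical helper written for illustration; the two critical values are inputs read from a $\chi^2$ table, as in the notes, rather than computed.

```python
def chi_square_variance_test(n, s_sq, sigma0_sq, chi2_upper, chi2_lower):
    """Two-tailed test of H0: sigma^2 = sigma0^2 for a normal population.

    chi2_upper is chi^2_{alpha/2} and chi2_lower is chi^2_{1-alpha/2}, both with
    n - 1 degrees of freedom, read from a chi-square table by the user.
    Returns the test statistic and whether H0 is rejected.
    """
    stat = (n - 1) * s_sq / sigma0_sq
    reject = stat > chi2_upper or stat < chi2_lower
    return stat, reject
```

For instance, with $n = 11$ (10 df) and $\alpha = 0.05$, a chi-square table gives $\chi^2_{.025} = 20.483$ and $\chi^2_{.975} = 3.247$, which would be passed as `chi2_upper` and `chi2_lower`.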

7 Comparing Two Population Variances

F-distribution. When independent samples are drawn from two normal populations with equal variances, $S_1^2/S_2^2$ possesses a sampling distribution that is known as an F distribution. That is,

$$F = \frac{s_1^2}{s_2^2}$$

Degrees of freedom (df): $\nu_1 = n_1 - 1$; $\nu_2 = n_2 - 1$

Graph: non-symmetrical; its shape depends on df

Critical values: from F tables

Test.

$H_0: \sigma_1^2 = \sigma_2^2$

$H_a: \sigma_1^2 \neq \sigma_2^2$ (two-tailed test).

T.S.: $F = s_1^2/s_2^2$, where $s_1^2$ is the larger sample variance.

Note: $F = \dfrac{\text{larger sample variance}}{\text{smaller sample variance}}$

RR: Reject $H_0$ if $F > F_{\alpha/2}$, where $F_{\alpha/2}$ is based on $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.

Graph:

Decision:

Conclusion:

Assumptions.

1. Normal populations

2. Independent random samples

Example. (Investment Risk) Investment risk is generally measured by the volatility of possible outcomes of the investment. The most common method for measuring investment volatility is by computing the variance (or standard deviation) of possible outcomes. Returns over the past 10 years for the first alternative and 8 years for the second alternative produced the following data:

Data Summary:

Investment 1: $n_1 = 10$, $\bar x_1 = 17.8\%$; $s_1^2 = 3.21$

Investment 2: $n_2 = 8$, $\bar x_2 = 17.8\%$; $s_2^2 = 7.14$

Both populations are assumed to be normally distributed.

Q1: Do the data present sufficient evidence to indicate that the risks for investments 1 and 2 are unequal?

Solution.

Test:

$H_0: \sigma_1^2 = \sigma_2^2$

$H_a: \sigma_1^2 \neq \sigma_2^2$ (two-tailed test).

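T.S.: the larger sample variance, $s_2^2$, goes in the numerator:

$$F = \frac{s_2^2}{s_1^2} = \frac{7.14}{3.21} = 2.22$$

RR: Reject $H_0$ if $F > F_{\alpha/2}$, where

$$F_{\alpha/2,\, n_2-1,\, n_1-1} = F_{.025,7,9} = 4.20$$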

Graph:

Decision: Do not reject $H_0$

Conclusion: At 5% signiﬁcance level there is insuﬃcient statistical evidence to indicate

that the risks for investments 1 and 2 are unequal.
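A minimal Python sketch of the variance-ratio test, applied to the investment data. The function name is hypothetical; it places the larger sample variance in the numerator, as the notes prescribe, and the critical value $F_{.025,7,9} = 4.20$ is taken from the solution above rather than computed.

```python
def f_test_variances(var1, n1, var2, n2):
    """F statistic with the larger sample variance in the numerator.

    Returns (F, numerator df, denominator df); compare F with F_{alpha/2} from an F table.
    """
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Investment risk example: s1^2 = 3.21 (n1 = 10), s2^2 = 7.14 (n2 = 8)
F, df_num, df_den = f_test_variances(3.21, 10, 7.14, 8)
F_crit = 4.20                                   # F_{.025,7,9} from the notes
print(round(F, 2), df_num, df_den, F > F_crit)  # 2.22 7 9 False
```

Since $2.22 < 4.20$, the sketch agrees with the decision not to reject $H_0$.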

Exercise. Do the upper-tailed test, that is, $H_a: \sigma_1^2 > \sigma_2^2$.

Chapter 9

Analysis of Variance

Contents.

1. Introduction

2. One Way ANOVA: Completely Randomized Experimental Design

3. The Randomized Block Design

1 Introduction

Analysis of variance is a statistical technique used to compare more than two population means by isolating the sources of variability.

Example. Four groups of salespeople for a magazine sales agency were subjected to different sales training programs. Because there were some dropouts during the training program, the number of trainees varied from program to program. At the end of the training programs each salesperson was assigned a sales area from a group of sales areas that were judged to have equivalent sales potentials. The table below lists the number of sales made by each person in each of the four groups of salespeople during the first week after completing the training program. Do the data present sufficient evidence to indicate a difference in the mean achievement for the four training programs?

Goal. Test whether the means are equal or not. That is,

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_a$: Not all means are equal

Deﬁnitions:

(i) Response: variable of interest or dependent variable (sales)

(ii) Factor: categorical variable or independent variable (training technique)

(iii) Treatment levels (factor levels): method of training; $t = 4$

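| Training Group | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| | 65 | 75 | 59 | 94 | |
| | 87 | 69 | 78 | 89 | |
| | 73 | 83 | 67 | 80 | |
| | 79 | 81 | 62 | 88 | |
| | 81 | 72 | 83 | | |
| | 69 | 79 | 76 | | |
| | | 90 | | | |
| $n_i$ | $n_1 = 6$ | $n_2 = 7$ | $n_3 = 6$ | $n_4 = 4$ | $n = 23$ |
| $T_i$ | 454 | 549 | 425 | 351 | GT = 1779 |
| $\bar T_i$ | 75.67 | 78.43 | 70.83 | 87.75 | |
| parameter | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\mu_4$ | |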

(iv) ANOVA: ANalysis OF VAriance

(v) N-Way ANOVA: studies N factors.

(vi) experimental unit: (trainee)

2 One Way ANOVA: Completely Randomized Experimental Design

ANOVA Table

Source df SS MS F p-value

Treatments 3 712.6 237.5 3.77

Error 19 1,196.6 63.0

Totals 22 1909.2

Inferences about population means

Test.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_a$: Not all means are equal

T.S.: $F = \dfrac{MST}{MSE} = 3.77$, where $F$ is based on $(t-1)$ and $(n-t)$ df.

RR: Reject $H_0$ if $F > F_{\alpha,\, t-1,\, n-t}$, i.e., reject $H_0$ if $F > F_{0.05,3,19} = 3.13$

Graph:

Decision: Reject $H_0$

Conclusion: At 5% signiﬁcance level there is suﬃcient statistical evidence to indicate

a diﬀerence in the mean achievement for the four training programs.

Assumptions.

1. Sampled populations are normal

2. Independent random samples

3. All t populations have equal variances

Computations.

ANOVA Table

Source df SS MS F p-value

Treatments t-1 SST MST=SST/(t-1) MST/MSE

Error n-t SSE MSE=SSE/(n-t)

Totals n-1 TSS

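| Training Group | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| | $x_{11}$ | $x_{21}$ | $x_{31}$ | $x_{41}$ | |
| | $x_{12}$ | $x_{22}$ | $x_{32}$ | $x_{42}$ | |
| | $x_{13}$ | $x_{23}$ | $x_{33}$ | $x_{43}$ | |
| | $x_{14}$ | $x_{24}$ | $x_{34}$ | $x_{44}$ | |
| | $x_{15}$ | $x_{25}$ | $x_{35}$ | | |
| | $x_{16}$ | $x_{26}$ | $x_{36}$ | | |
| | | $x_{27}$ | | | |
| $n_i$ | $n_1$ | $n_2$ | $n_3$ | $n_4$ | $n$ |
| $T_i$ | $T_1$ | $T_2$ | $T_3$ | $T_4$ | GT |
| $\bar T_i$ | $\bar T_1$ | $\bar T_2$ | $\bar T_3$ | $\bar T_4$ | |
| parameter | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\mu_4$ | |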

Notation:

TSS: sum of squares of total deviation.

SST: sum of squares of total deviation between treatments.

SSE: sum of squares of total deviation within treatments (error).

CM: correction for the mean


GT: Grand Total.

Computational Formulas for TSS, SST and SSE:

$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{n_i} x_{ij}^2 - CM$$

$$SST = \sum_{i=1}^{t}\frac{T_i^2}{n_i} - CM$$

$$SSE = TSS - SST$$

Calculations for the training example produce

$$CM = \frac{\left(\sum\sum x_{ij}\right)^2}{n} = \frac{1{,}779^2}{23} = 137{,}601.8$$

$$TSS = \sum\sum x_{ij}^2 - CM = 1{,}909.2$$

$$SST = \sum\frac{T_i^2}{n_i} - CM = 712.6$$

$$SSE = TSS - SST = 1{,}196.6$$

Thus

ANOVA Table

Source df SS MS F p-value

Treatments 3 712.6 237.5 3.77

Error 19 1,196.6 63.0

Totals 22 1909.2
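The computational formulas can be checked with a short Python sketch that recomputes CM, TSS, SST and SSE for the training data. `one_way_anova` is a hypothetical helper, and the critical value $F_{0.05,3,19} = 3.13$ is taken from the notes rather than computed.

```python
def one_way_anova(groups):
    """One-way ANOVA via the computational formulas (CM, TSS, SST, SSE)."""
    t = len(groups)                                   # number of treatments
    n = sum(len(g) for g in groups)                   # total sample size
    grand_total = sum(sum(g) for g in groups)         # GT

    cm = grand_total ** 2 / n                         # correction for the mean
    tss = sum(x * x for g in groups for x in g) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = tss - sst

    mst = sst / (t - 1)
    mse = sse / (n - t)
    return {"CM": cm, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse, "df": (t - 1, n - t)}

# Number of sales per trainee in each of the four training groups
training = [
    [65, 87, 73, 79, 81, 69],
    [75, 69, 83, 81, 72, 79, 90],
    [59, 78, 67, 62, 83, 76],
    [94, 89, 80, 88],
]
res = one_way_anova(training)
reject = res["F"] > 3.13          # F_{0.05,3,19} from the notes
print(round(res["F"], 2), res["df"], reject)
```

The returned sums of squares and F should match the ANOVA table above up to rounding.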

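Confidence Intervals.

Estimate of the common variance:

$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t}}$$

CI for $\mu_i$:

$$\bar T_i \pm t_{\alpha/2,\, n-t}\,\frac{s}{\sqrt{n_i}}$$

CI for $\mu_i - \mu_j$:

$$(\bar T_i - \bar T_j) \pm t_{\alpha/2,\, n-t}\sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$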
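As an illustration of the interval for $\mu_i - \mu_j$, the sketch below compares training programs 4 and 3. The helper name is hypothetical, and the value $t_{.025,19} = 2.093$ is an assumption taken from a standard t table, since the notes do not list it.

```python
import math

def anova_diff_ci(total_i, n_i, total_j, n_j, mse, t_crit):
    """CI for mu_i - mu_j: (Tbar_i - Tbar_j) +/- t_crit * sqrt(MSE * (1/n_i + 1/n_j))."""
    diff = total_i / n_i - total_j / n_j              # Tbar_i - Tbar_j
    half = t_crit * math.sqrt(mse * (1 / n_i + 1 / n_j))
    return diff - half, diff + half

# Programs 4 and 3: T_4 = 351 (n_4 = 4), T_3 = 425 (n_3 = 6); MSE = SSE/(n - t) = 1,196.6/19.
# t_{.025,19} = 2.093 comes from a standard t table (assumption, not from the notes).
lo, hi = anova_diff_ci(351, 4, 425, 6, 1196.6 / 19, 2.093)
print(round(lo, 2), round(hi, 2))
```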

MINITAB

MTB> aovoneway C1-C4.

Exercise. Produce a Minitab output for the above example.


3 The Randomized Block Design

The randomized block design extends the paired-difference design to more than two treatments.

A randomized block design consists of b blocks, each containing t experimental units.

The t treatments are randomly assigned to the units in each block, and each treatment

appears once in every block.

Example. A consumer preference study involving three different package designs (treatments) was laid out in a randomized block design among four supermarkets (blocks). The data shown in the table below represent the number of units sold for each package design within each supermarket during each of three given weeks.

(i) Provide a data summary.

(ii) Do the data present suﬃcient evidence to indicate a diﬀerence in the mean sales

for each package design (treatment)?

(iii) Do the data present suﬃcient evidence to indicate a diﬀerence in the mean sales

for the supermarkets?

weeks

w1 w2 w3

s1 (1) 17 (3) 23 (2) 34

s2 (3) 21 (1) 15 (2) 26

s3 (1) 1 (2) 23 (3) 8

s4 (2) 22 (1) 6 (3) 16

Remarks.

(i) In each supermarket (block) the ﬁrst entry represents the design (treatment) and

the second entry represents the sales per week.

(ii) The three designs are assigned to each supermarket completely at random.

(iii) An alternative design would be to use 12 supermarkets, with each design (treatment) randomly assigned to 4 supermarkets. In this case the difference in sales could be due to more than just differences in package design; for example, larger supermarkets would be expected to have larger overall sales of the product than smaller supermarkets. The randomized block design eliminates this store-to-store variability.

For computational purposes we rearrange the data so that each row is a block (supermarket) and each column is a treatment (package design).

Data Summary. The treatment and block totals are

$t = 3$ treatments; $b = 4$ blocks

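Treatments

| | $t_1$ | $t_2$ | $t_3$ | $B_i$ |
|---|---|---|---|---|
| s1 | 17 | 34 | 23 | $B_1$ |
| s2 | 15 | 26 | 21 | $B_2$ |
| s3 | 1 | 23 | 8 | $B_3$ |
| s4 | 6 | 22 | 16 | $B_4$ |
| $T_i$ | $T_1$ | $T_2$ | $T_3$ | |

$T_1 = 39$, $T_2 = 105$, $T_3 = 68$

$B_1 = 74$, $B_2 = 62$, $B_3 = 32$, $B_4 = 44$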

Calculations for the package design example produce

$$CM = \frac{\left(\sum\sum x_{ij}\right)^2}{n} = 3{,}745.33$$

$$TSS = \sum\sum x_{ij}^2 - CM = 940.67$$

$$SST = \frac{\sum T_i^2}{b} - CM = 547.17$$

$$SSB = \frac{\sum B_i^2}{t} - CM = 348.00$$

$$SSE = TSS - SST - SSB = 45.50$$
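The randomized block calculations can be sketched in Python. `randomized_block_anova` is a hypothetical helper; it expects a table with one row per block (supermarket) and one column per treatment (package design), matching the rearranged data above.

```python
def randomized_block_anova(table):
    """Sums of squares and F ratios for a randomized block design.

    table[j][i] is the response for block j (row) and treatment i (column).
    """
    b = len(table)                    # number of blocks
    t = len(table[0])                 # number of treatments
    n = b * t
    treatment_totals = [sum(row[i] for row in table) for i in range(t)]
    block_totals = [sum(row) for row in table]

    cm = sum(block_totals) ** 2 / n                       # correction for the mean
    tss = sum(x * x for row in table for x in row) - cm
    sst = sum(T * T for T in treatment_totals) / b - cm
    ssb = sum(B * B for B in block_totals) / t - cm
    sse = tss - sst - ssb

    df_error = (t - 1) * (b - 1)                          # equals n - t - b + 1
    mse = sse / df_error
    return {
        "T": treatment_totals, "B": block_totals,
        "CM": cm, "TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
        "F_treatments": (sst / (t - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }

# Rows: supermarkets s1..s4; columns: package designs t1..t3
sales = [
    [17, 34, 23],
    [15, 26, 21],
    [1, 23, 8],
    [6, 22, 16],
]
res = randomized_block_anova(sales)
print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in res.items()})
```

The two F ratios should match the Minitab printout that follows (36.08 and 15.30).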


MINITAB.(Commands and Printouts)

MTB> Print C1-C3

ROW UNITS TRTS BLOCKS

1 17 1 1

2 34 2 1

3 23 3 1

4 15 1 2

5 26 2 2

6 21 3 2

7 1 1 3

8 23 2 3

9 8 3 3

10 6 1 4

11 22 2 4

12 16 3 4

MTB> ANOVA C1=C2 C3

ANOVA Table

Source df SS MS F p-value

Treatments 2 547.17 273.58 36.08 0.000

Blocks 3 348.00 116.00 15.30 0.003

Error 6 45.50 7.58

Totals 11 940.67

Solution to (ii)

Test.

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_a$: Not all means are equal

T.S.: $F = \dfrac{MST}{MSE} = 36.08$, where $F$ is based on $(t-1)$ and $(n-t-b+1)$ df.

RR: Reject $H_0$ if $F > F_{\alpha,\, t-1,\, n-t-b+1}$, i.e., reject $H_0$ if $F > F_{0.05,2,6} = 5.14$

Graph:

Decision: Reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the three package designs.

Note that $n - t - b + 1 = (t-1)(b-1)$.

Solution to (iii)

Test.

$H_0$: Block means are equal

$H_a$: Not all block means are equal (i.e., blocking is desirable)

T.S.: $F = \dfrac{MSB}{MSE} = 15.30$, where $F$ is based on $(b-1)$ and $(n-t-b+1)$ df.

RR: Reject $H_0$ if $F > F_{\alpha,\, b-1,\, n-t-b+1}$, i.e., reject $H_0$ if $F > F_{0.005,3,6} = 12.92$

Graph:

Decision: Reject $H_0$

Conclusion: At 0.5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the four supermarkets; that is, the data support our decision to use supermarkets as blocks.

Assumptions.

1. Sampled populations are normal

2. Dependent random samples due to blocking

3. All t populations have equal variances

Confidence Intervals.

Estimate of the common variance:

$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t-b+1}}$$

CI for $\mu_i - \mu_j$:

$$(\bar T_i - \bar T_j) \pm t_{\alpha/2,\, n-t-b+1}\; s\sqrt{\frac{2}{b}}$$

Exercise. Construct a 90% C.I. for the difference between mean sales from package designs 1 and 2.

Chapter 10

Simple Linear Regression and

Correlation

Contents.

1. Introduction: Example

2. A Simple Linear probabilistic model

3. Least squares prediction equation

4. Inferences concerning the slope

5. Estimating E(y|x) for a given x

6. Predicting y for a given x

7. Coeﬃcient of correlation

8. Analysis of Variance

9. Computer Printouts

1 Introduction

Linear regression is a statistical technique used to predict (forecast) the value of a

variable from known related variables.

Example. (Ad Sales) Consider the problem of predicting the gross monthly sales volume $y$ for a corporation that is not subject to substantial seasonal variation in its sales volume. For the predictor variable $x$ we use the amount spent by the company on advertising during the month of interest. We wish to determine whether advertising is worthwhile, that is, whether advertising is actually related to the firm's sales volume. In addition, we wish to use the amount spent on advertising to predict the sales volume. The data in the table below represent a sample of advertising expenditures, $x$, and the associated sales volume, $y$, for 10 randomly selected months.

Ad Sales Data

Month y (×10,000 dollars) x (×10,000 dollars)

1 101 1.2

2 92 0.8

3 110 1.0

4 120 1.3

5 90 0.7

6 82 0.8

7 93 1.0

8 75 0.6

9 91 0.9

10 105 1.1

Deﬁnitions.

(i) Response: dependent variable of interest (sales volume)

(ii) Independent (predictor) variable (Ad expenditure)

(iii) Linear equations (straight line): y = a + bx

Scatter diagram:

Best ﬁt straight line:

Equation of a straight line:

(y-intercept and slope)

2 A Simple Linear Probabilistic Model

Model.

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where

$x$: independent variable (predictor)

$y$: dependent variable (response)

$\beta_0$ and $\beta_1$ are unknown parameters.

$\epsilon$: random error due to other factors not included in the model.

Assumptions.

1. $E(\epsilon) := \mu_\epsilon = 0$.

2. $Var(\epsilon) := \sigma_\epsilon^2 = \sigma^2$.

3. The r.v. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.

4. The random components of any two observed $y$ values are independent.

3 Least Squares Prediction Equation

The least squares prediction equation is sometimes called the estimated regression equation or the prediction equation.

$$\hat y = \hat\beta_0 + \hat\beta_1 x$$

This equation is obtained by using the method of least squares; that is, $\hat\beta_0$ and $\hat\beta_1$ are chosen to

$$\min \sum (y - \hat y)^2$$

Computational Formulas.

Objective: Estimate $\beta_0$, $\beta_1$ and $\sigma^2$.

$\bar x = \sum x/n$; $\bar y = \sum y/n$

$$SS_{xx} = \sum (x - \bar x)^2 = \sum x^2 - \left(\sum x\right)^2/n$$

$$SS_{yy} = \sum (y - \bar y)^2 = \sum y^2 - \left(\sum y\right)^2/n$$

$$SS_{xy} = \sum (x - \bar x)(y - \bar y) = \sum xy - \left(\sum x\right)\left(\sum y\right)/n$$

$$\hat\beta_1 = SS_{xy}/SS_{xx}$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

To estimate $\sigma^2$:

$$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = SS_{yy} - (SS_{xy})^2/SS_{xx}$$

$$s^2 = \frac{SSE}{n-2}$$

Remarks.

(i) $\hat\beta_1$ is the slope of the estimated regression equation.

(ii) $s^2$ provides a measure of the spread of points $(x, y)$ around the regression line.

Ad Sales example

Question 1. Do a scatter diagram. Can you say that x and y are linearly related?

Answer.

Question 2. Use the computational formulas to provide a data summary.

Answer.

Data Summary.

$\bar x = 0.94$; $\bar y = 95.9$

$SS_{xx} = .444$

$SS_{xy} = 23.34$

$SS_{yy} = 1600.9$

Optional material

Ad Sales Calculations

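| Month | $x$ | $y$ | $x^2$ | $xy$ | $y^2$ |
|---|---|---|---|---|---|
| 1 | 1.2 | 101 | 1.44 | 121.2 | 10,201 |
| 2 | 0.8 | 92 | 0.64 | 73.6 | 8,464 |
| 3 | 1.0 | 110 | 1.00 | 110.0 | 12,100 |
| 4 | 1.3 | 120 | 1.69 | 156.0 | 14,400 |
| 5 | 0.7 | 90 | 0.49 | 63.0 | 8,100 |
| 6 | 0.8 | 82 | 0.64 | 65.6 | 6,724 |
| 7 | 1.0 | 93 | 1.00 | 93.0 | 8,649 |
| 8 | 0.6 | 75 | 0.36 | 45.0 | 5,625 |
| 9 | 0.9 | 91 | 0.81 | 81.9 | 8,281 |
| 10 | 1.1 | 105 | 1.21 | 115.5 | 11,025 |
| Sum | $\sum x = 9.4$ | $\sum y = 959$ | $\sum x^2 = 9.28$ | $\sum xy = 924.8$ | $\sum y^2 = 93{,}569$ |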

$$\bar x = \sum x/n = 0.94; \qquad \bar y = \sum y/n = 95.9$$

$$SS_{xx} = \sum x^2 - \left(\sum x\right)^2/n = 9.28 - \frac{(9.4)^2}{10} = .444$$

$$SS_{xy} = \sum xy - \left(\sum x\right)\left(\sum y\right)/n = 924.8 - \frac{(9.4)(959)}{10} = 23.34$$

$$SS_{yy} = \sum y^2 - \left(\sum y\right)^2/n = 93{,}569 - \frac{(959)^2}{10} = 1600.9$$

Question 3. Estimate the parameters $\beta_0$ and $\beta_1$.

Answer.

$$\hat\beta_1 = SS_{xy}/SS_{xx} = \frac{23.34}{.444} = 52.5676 \approx 52.57$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 95.9 - (52.5676)(.94) \approx 46.49.$$

Question 4. Estimate $\sigma^2$.

Answer.

$$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1{,}600.9 - (52.5676)(23.34) = 373.97.$$

Therefore

$$s^2 = \frac{SSE}{n-2} = \frac{373.97}{8} = 46.75$$
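The estimates in Questions 3 and 4 can be reproduced with plain Python using the computational formulas. This is a sketch for checking the arithmetic, not a replacement for the Minitab REGRESS command.

```python
# Ad Sales data: x = advertising expenditure, y = sales volume (both in units of 10,000 dollars)
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]
n = len(x)

# Computational formulas
ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

beta1_hat = ss_xy / ss_xx
beta0_hat = sum(y) / n - beta1_hat * sum(x) / n
sse = ss_yy - beta1_hat * ss_xy
s_sq = sse / (n - 2)

print(f"y_hat = {beta0_hat:.2f} + {beta1_hat:.2f} x,  s^2 = {s_sq:.2f}")
```

The printed line should read $\hat y = 46.49 + 52.57x$ with $s^2 = 46.75$, as in the answers above.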

Question 5. Find the least squares line for the data.

Answer.

$$\hat y = \hat\beta_0 + \hat\beta_1 x = 46.49 + 52.57x$$

Remark. This equation is also called the estimated regression equation or prediction

line.

Question 6. Predict sales volume, y, for a given expenditure level of $10,000 (i.e., x = 1.0).

Answer.

ŷ = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06.

So the predicted sales volume is $990,600.

Question 7. Predict the mean sales volume E(y|x) for a given expenditure level of $10,000, i.e., x = 1.0.

Answer.

E(y|x) = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06,

so the mean sales volume is $990,600.

Remark. In Question 6 and Question 7 we obtained the same estimate; the bound on the error of estimation, however, will be different.

4 Inferences Concerning the Slope

Parameter of interest: $\beta_1$

Point estimator: $\hat\beta_1$

Estimator mean: $\mu_{\hat\beta_1} = \beta_1$

Estimator standard error: $\sigma_{\hat\beta_1} = \sigma/\sqrt{SS_{xx}}$

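Test.

$H_0: \beta_1 = \beta_{10}$ (with $\beta_{10} = 0$: no linear relationship)

$H_a: \beta_1 \neq \beta_{10}$ (with $\beta_{10} = 0$: there is a linear relationship)

T.S.:

$$t = \frac{\hat\beta_1 - \beta_{10}}{s/\sqrt{SS_{xx}}}$$

RR:

Reject $H_0$ if $t > t_{\alpha/2,\, n-2}$ or $t < -t_{\alpha/2,\, n-2}$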

Graph:

Decision:

Conclusion:

Question 8. Determine whether there is evidence to indicate a linear relationship between advertising expenditure, $x$, and sales volume, $y$.

Answer.

Test.

$H_0: \beta_1 = 0$ (no linear relationship)

$H_a: \beta_1 \neq 0$ (there is a linear relationship)

T.S. (with $s = \sqrt{46.75} = 6.84$):

$$t = \frac{\hat\beta_1 - 0}{s/\sqrt{SS_{xx}}} = \frac{52.57 - 0}{6.84/\sqrt{.444}} = 5.12$$

RR: (critical value: $t_{.025,8} = 2.306$)

Reject $H_0$ if $t > 2.306$ or $t < -2.306$

Graph:

Decision: Reject $H_0$

Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, $x$, and sales volume, $y$.

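Confidence interval for $\beta_1$:

$$\hat\beta_1 \pm t_{\alpha/2,\, n-2}\,\frac{s}{\sqrt{SS_{xx}}}$$

Question 9. Find a 95% confidence interval for $\beta_1$.

Answer.

$$\hat\beta_1 \pm t_{\alpha/2,\, n-2}\,\frac{s}{\sqrt{SS_{xx}}}$$

$$52.57 \pm 2.306\,\frac{6.84}{\sqrt{.444}}$$

$$52.57 \pm 23.67 = (28.90,\ 76.24)$$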
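The slope test and interval can be sketched as one hypothetical helper. It takes $s = 6.84$, $SS_{xx} = .444$ and $t_{.025,8} = 2.306$ from the notes; nothing is looked up or computed from a distribution.

```python
import math

def slope_inference(beta1_hat, s, ss_xx, t_crit, beta10=0.0):
    """t statistic for H0: beta1 = beta10 and the CI beta1_hat +/- t_crit * s / sqrt(SSxx)."""
    se = s / math.sqrt(ss_xx)              # estimated standard error of beta1_hat
    t = (beta1_hat - beta10) / se
    half_width = t_crit * se
    return t, (beta1_hat - half_width, beta1_hat + half_width)

t, ci = slope_inference(52.57, 6.84, 0.444, 2.306)
reject = abs(t) > 2.306                    # two-tailed rule with t_{.025,8}
print(round(t, 2), tuple(round(v, 2) for v in ci), reject)
```

It should return $t \approx 5.12$ and the interval $(28.90, 76.24)$ found in Questions 8 and 9.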

5 Estimating E(y|x) For a Given x

The confidence interval (CI) for the expected (mean) value of $y$ given $x = x_p$ is given by

$$\hat y \pm t_{\alpha/2,\, n-2}\sqrt{s^2\left[\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}\right]}$$

6 Predicting y for a Given x

The prediction interval (PI) for a particular value of $y$ given $x = x_p$ is given by

$$\hat y \pm t_{\alpha/2,\, n-2}\sqrt{s^2\left[1 + \frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}\right]}$$
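To illustrate the remark after Question 7 (same point estimate, different error bounds), the sketch below evaluates both intervals at $x_p = 1.0$ for the Ad Sales data. The helper is hypothetical; all inputs come from the Ad Sales answers, with $t_{.025,8} = 2.306$ from the notes.

```python
import math

def mean_ci_and_pi(beta0, beta1, s_sq, n, x_bar, ss_xx, x_p, t_crit):
    """Return y_hat, the CI for E(y | x_p) and the PI for a single y at x_p."""
    y_hat = beta0 + beta1 * x_p
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * math.sqrt(s_sq * core)          # mean response
    pi_half = t_crit * math.sqrt(s_sq * (1 + core))    # one new observation
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))

y_hat, ci, pi = mean_ci_and_pi(46.49, 52.57, 46.75, 10, 0.94, 0.444, 1.0, 2.306)
print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The prediction interval is wider because of the extra 1 inside the brackets, which accounts for the scatter of a single new observation around its mean.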

7 Coeﬃcient of Correlation

In a previous section we tested for a linear relationship between $x$ and $y$. Now we measure how strong that linear relationship is. The measure is called the coefficient of correlation between $y$ and $x$:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

Remarks.

(i) $-1 \le r \le 1$.

(ii) The population coefficient of correlation is $\rho$.

(iii) $r > 0$ indicates a positive correlation ($\hat\beta_1 > 0$)

(iv) $r < 0$ indicates a negative correlation ($\hat\beta_1 < 0$)

(v) $r = 0$ indicates no linear correlation ($\hat\beta_1 = 0$)

Question 10. Find the coefficient of correlation, $r$.

Answer.

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{23.34}{\sqrt{0.444(1{,}600.9)}} = 0.88$$

Coefficient of determination

Algebraic manipulations show that

$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}}$$

Question 11. By what percentage is the sum of squares of deviations of $y$ about the mean ($SS_{yy}$) reduced by using $\hat y$ rather than $\bar y$ as a predictor of $y$?

Answer.

$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 0.88^2 \approx 0.77$$

So using $\hat y$ reduces $SS_{yy}$ by about 77%. $r^2$ is called the coefficient of determination.
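A quick numerical check of Questions 10 and 11, using the Ad Sales summary quantities. Both expressions for $r^2$ give about 0.766 (the 76.6% R-sq in the printout below); squaring the rounded $r = 0.88$ gives the 0.77 quoted above.

```python
import math

# Ad Sales summary quantities from the notes
ss_xx, ss_xy, ss_yy, sse = 0.444, 23.34, 1600.9, 373.97

r = ss_xy / math.sqrt(ss_xx * ss_yy)         # coefficient of correlation
r_sq_from_r = r ** 2
r_sq_from_sse = (ss_yy - sse) / ss_yy        # (SSyy - SSE) / SSyy

print(f"r = {r:.3f}, r^2 = {r_sq_from_r:.3f}, (SSyy - SSE)/SSyy = {r_sq_from_sse:.3f}")
```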

8 Analysis of Variance

Notation:

$TSS := SS_{yy} = \sum (y - \bar y)^2$ (total SS of deviations).

$SSR = \sum (\hat y - \bar y)^2$ (SS of deviations due to regression, or explained deviations)

$SSE = \sum (y - \hat y)^2$ (SS of deviations for the error, or unexplained deviations)

$$TSS = SSR + SSE$$

Question 12. Give the ANOVA table for the AD sales example.

Answer.

Question 13. Use ANOVA table to test for a signiﬁcant linear relationship between

sales and advertising expenditure.


ANOVA Table

Source df SS MS F p-value

Reg. 1 1,226.927 1,226.927 26.25 0.0001

Error 8 373.973 46.747

Totals 9 1,600.900

ANOVA Table

Source df SS MS F p-value

Reg. 1 SSR MSR=SSR/(1) MSR/MSE

Error n-2 SSE MSE=SSE/(n-2)

Totals n-1 TSS

Answer.

Test.

$H_0: \beta_1 = 0$ (no linear relationship)

$H_a: \beta_1 \neq 0$ (there is a linear relationship)

T.S.: $F = \dfrac{MSR}{MSE} = 26.25$

RR: (critical value: $F_{.005,1,8} = 14.69$)

Reject $H_0$ if $F > 14.69$

(OR: Reject $H_0$ if $\alpha$ > p-value)

Graph:

Decision: Reject $H_0$

Conclusion: At 0.5% signiﬁcance level there is suﬃcient statistical evidence to indicate

a linear relationship between advertising expenditure, x, and sales volume, y.
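The ANOVA decomposition can be verified directly from its definitions. The sketch below refits the Ad Sales line, computes TSS, SSR and SSE as sums of squared deviations, and forms $F = MSR/MSE$; the critical value $F_{.005,1,8} = 14.69$ is taken from the notes.

```python
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit using the definitional forms of SSxx and SSxy
ss_xx = sum((v - x_bar) ** 2 for v in x)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
beta1_hat = ss_xy / ss_xx
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = [beta0_hat + beta1_hat * v for v in x]

# Sums of squares: TSS = SSR + SSE
tss = sum((v - y_bar) ** 2 for v in y)
ssr = sum((v - y_bar) ** 2 for v in y_hat)
sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))

msr = ssr / 1            # regression df = 1
mse = sse / (n - 2)      # error df = n - 2
F = msr / mse
reject = F > 14.69       # F_{.005,1,8} from the notes
print(round(ssr, 3), round(sse, 3), round(tss, 3), round(F, 2), reject)
```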

9 Computer Printouts for Regression Analysis

Store y in C1 and x in C2.

MTB> Plot C1 C2. : Gives a scatter diagram.

MTB> Regress C1 1 C2.

Computer output for Ad sales example:

More generally we obtain:


The regression equation is

y=46.5 + 52.6 x

Predictor Coef Stdev t-ratio P

Constant 46.486 9.885 4.70 0.000

x 52.57 10.26 5.12 0.000

s=6.837 R-sq=76.6% R-sq(adj)=73.7%

Analysis of Variance

Source df SS MS F p-value

Reg. 1 1,226.927 1,226.927 26.25 0.000

Error 8 373.973 46.747

Totals 9 1,600.900

Review Exercises: Linear Regression

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Given the following data set

x -3 -1 1 1 2

y 6 4 3 1 1

(i) Plot the scatter diagram, and indicate whether x and y appear linearly related.

(ii) Show that $\sum x = 0$; $\sum y = 15$; $\sum x^2 = 16$; $\sum y^2 = 63$; $SS_{xx} = 16$; $SS_{yy} = 18$; and $SS_{xy} = -16$.

(iii) Find the regression equation for the data. (Answer: $\hat y = 3 - x$)

(iv) Plot the regression equation on the same graph as (i); does the line appear to provide a good fit for the data points?

(v) Compute SSE and $s^2$. (Answer: $s^2 = 2/3$)

(vi) Estimate the expected value of $y$ when $x = -1$

(vii) Find the correlation coefficient $r$ and find $r^2$. (Answer: $r = -.943$, $r^2 = .889$)

The regression equation is

$y = \hat\beta_0 + \hat\beta_1 x$

Predictor Coef Stdev t-ratio P

Constant $\hat\beta_0$ $\sigma_{\hat\beta_0}$ TS: t p-value

x $\hat\beta_1$ $\sigma_{\hat\beta_1}$ TS: t p-value

s = $\sqrt{MSE}$ R-sq = $r^2$ R-sq(adj)

Analysis of Variance

Source df SS MS F p-value

Reg. 1 SSR MSR=SSR/(1) MSR/MSE

Error n-2 SSE MSE=SSE/(n-2)

Totals n-1 TSS

2. A study of middle- to upper-level managers is undertaken to investigate the relationship between salary level, $Y$, and years of work experience, $X$. A random sample of 20 managers is chosen with the following results (in thousands of dollars): $\sum x_i = 235$; $\sum y_i = 763.8$; $SS_{xx} = 485.75$; $SS_{yy} = 2{,}236.1$; and $SS_{xy} = 886.85$. It is further assumed that the relationship is linear.

(i) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation.

(Answer: $\hat y = 16.73 + 1.826x$)

(ii) Find the correlation coefficient, $r$. (Answer: $r = .85$)

(iii) Find $r^2$ and interpret its value.

3. Minitab's Regress command has been applied to data on family income, $X$, and last year's energy consumption, $Y$, from a random sample of 25 families. The income data are in thousands of dollars and the energy consumption data are in millions of BTU. A portion of a linear regression computer printout is shown below.

Predictor Coef stdev t-ratio P

Constant 82.036 2.054 39.94 0.000

X 0.93051 0.05727 16.25 0.000

s= R-sq=92.0% R-sq(adj)=91.6%

Analysis of Variance


Source DF SS MS F P

Regression 7626.6 264.02 0.000

Error 23

Total 8291

(i) Complete all missing entries in the table.

(ii) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation.

(iii) Do the data present suﬃcient evidence to indicate that Y and X are linearly

related? Test by using α = 0.01.

(iv) Determine a point estimate for last year’s mean energy consumption of all families

with an annual income of $40,000.

4. Answer True or False. (Circle your choice.)

T F (i) The correlation coefficient r shows the degree of association between x and y.

T F (ii) The coefficient of determination $r^2$ shows the percentage change in y resulting from a one-unit change in x.

T F (iii) The last step in a simple regression analysis is drawing a scatter diagram.

T F (iv) r = 1 implies no linear correlation between x and y.

T F (v) We always estimate the value of a parameter and predict the value of a random variable.

T F (vi) If $\beta_1 = 1$, we always predict the same value of y regardless of the value of x.

T F (vii) It is necessary to assume that the response y of a probability model has a normal distribution if we are to estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$.

Chapter 11

Multiple Linear Regression

Contents.

1. Introduction: Example

2. Multiple Linear Model

3. Analysis of Variance

4. Computer Printouts

1 Introduction: Example

Multiple linear regression is a statistical technique used to predict (forecast) the value of a variable from multiple known related variables.

2 A Multiple Linear Model

Model.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$$

where

$x_i$: independent variables (predictors)

$y$: dependent variable (response)

$\beta_i$: unknown parameters.

$\epsilon$: random error due to other factors not included in the model.

Assumptions.

1. $E(\epsilon) := \mu_\epsilon = 0$.

2. $Var(\epsilon) := \sigma_\epsilon^2 = \sigma^2$.

3. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.

4. The random components of any two observed $y$ values are independent.

3 Least Squares Prediction Equation

Estimated Regression Equation

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$$

This equation is obtained by using the method of least squares.

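Multiple Regression Data

| Obser. | $y$ | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|---|
| 1 | $y_1$ | $x_{11}$ | $x_{21}$ | $x_{31}$ |
| 2 | $y_2$ | $x_{12}$ | $x_{22}$ | $x_{32}$ |
| ... | ... | ... | ... | ... |
| n | $y_n$ | $x_{1n}$ | $x_{2n}$ | $x_{3n}$ |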

Minitab Printout

The regression equation is

$y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$

Predictor Coef Stdev t-ratio P

Constant $\hat\beta_0$ $\sigma_{\hat\beta_0}$ TS: t p-value

$x_1$ $\hat\beta_1$ $\sigma_{\hat\beta_1}$ TS: t p-value

$x_2$ $\hat\beta_2$ $\sigma_{\hat\beta_2}$ TS: t p-value

$x_3$ $\hat\beta_3$ $\sigma_{\hat\beta_3}$ TS: t p-value

s = $\sqrt{MSE}$ $R^2 = r^2$ $R^2$(adj)

Analysis of Variance

Source df SS MS F p-value

Reg. 3 SSR MSR=SSR/(3) MSR/MSE

Error n-4 SSE MSE=SSE/(n-4)

Totals n-1 TSS

Source df SS

$x_1$ 1 $SS_{x_1}$

$x_2$ 1 $SS_{x_2}$

$x_3$ 1 $SS_{x_3}$

Unusual observations (ignore)

MINITAB.

Use the REGRESS command to regress $y$, stored in C1, on the 3 predictor variables stored in C2-C4.

MTB> Regress C1 3 C2-C4;

SUBC> Predict x1 x2 x3.

The subcommand PREDICT in Minitab, followed by fixed values of $x_1$, $x_2$, and $x_3$, calculates the estimated value $\hat y$ (Fit), its estimated standard error (Stdev.Fit), a 95% CI for $E(y)$, and a 95% PI for $y$.

Example. A county assessor wishes to develop a model to relate the market value, $y$, of single-family residences in a community to the variables:

$x_1$: living area in thousands of square feet;

$x_2$: number of floors;

$x_3$: number of bedrooms;

$x_4$: number of baths.

Observations were recorded for 29 randomly selected single-family homes from residences recently sold at fair market value. The resulting prediction equation will then be used for assessing the values of single-family residences in the county to establish the amount each homeowner owes in property taxes.

A Minitab printout is given below:

MTB> Regress C1 4 C2-C5;

SUBC> Predict 1.0 1 3 2;

SUBC> Predict 1.4 2 3 2.5.

The regression equation is

$y = -16.6 + 7.84x_1 - 34.4x_2 - 7.99x_3 + 54.9x_4$

Predictor Coef. Stdev t-ratio P

Constant −16.58 18.88 −0.88 0.389

$x_1$ 7.839 1.234 6.35 0.000

$x_2$ −34.39 11.15 −3.09 0.005

$x_3$ −7.990 8.249 −0.97 0.342

$x_4$ 54.93 13.52 4.06 0.000

s = 16.58 $R^2$ = 88.2% $R^2$(adj) = 86.2%

Analysis of Variance

Source df SS MS F p-value

Reg. 4 49359 12340 44.88 0.000

Error 24 6599 275

Totals 28 55958

Source df SS

$x_1$ 1 44444

$x_2$ 1 59

$x_3$ 1 321

$x_4$ 1 4536

Fit Stdev.Fit 95%C.I. 95%P.I.

113.32 5.80 (101.34, 125.30) (77.05, 149.59)

137.75 5.48 (126.44, 149.07) (101.70, 173.81)
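Several printout quantities can be recovered from the ANOVA table alone. The sketch below recomputes $R^2 = SSR/TSS$, the adjusted $R^2$ and $F$ for the assessor example; the adjusted-$R^2$ formula used is the standard definition $1 - \frac{SSE/(n-k-1)}{TSS/(n-1)}$ with $k = 4$ predictors, which we assume is what Minitab reports.

```python
# Assessor example ANOVA table: SSR, SSE, sample size n, number of predictors k
ssr, sse, n, k = 49359, 6599, 29, 4

tss = ssr + sse                                   # 55958, the "Totals" row
msr = ssr / k
mse = sse / (n - k - 1)                           # error df = 24
F = msr / mse
r_sq = ssr / tss
r_sq_adj = 1 - (sse / (n - k - 1)) / (tss / (n - 1))

print(f"R-sq = {r_sq:.1%}, R-sq(adj) = {r_sq_adj:.1%}, F = {F:.2f}")
```

The results should agree with the printed $R^2 = 88.2\%$, $R^2(\text{adj}) = 86.2\%$ and $F = 44.88$.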


Q1. What is the prediction equation ?

The regression equation is

$y = -16.6 + 7.84x_1 - 34.4x_2 - 7.99x_3 + 54.9x_4$

Q2. What type of model has been chosen to ﬁt the data?

Multiple linear regression model.

Q3. Do the data provide suﬃcient evidence to indicate that the model contributes

information for the prediction of y? Test using α = 0.05.

Test:

$H_0$: model not useful

$H_a$: model is useful

T.S.: $F = 44.88$ (p-value = 0.000)

DR: Reject $H_0$ if $\alpha$ > p-value

Graph:

Decision: Reject $H_0$

Conclusion: At 5% signiﬁcance level there is suﬃcient statistical evidence to indicate

that the model contributes information for the prediction of y.

Q4. Give a 95% CI for $E(y)$ and a 95% PI for $y$ when $x_1 = 10$, $x_2 = 1$, $x_3 = 3$, and $x_4 = 2$.

CI: (101.34, 125.30)

PI: (77.05, 149.59)

Non-Linear Models

Example.

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_1^2 x_2$$

116


How fuel eﬃcient a certain car model is? 4. A market analyst wants to know the eﬀectiveness of a new diet. wants to know if a new drug is superior to already existing drugs. What is the eﬀect of package designs on sales. 5 . A pharmaceutical Co. 2. 5. 3. If you answer all questions on a (T. Is there any relationship between your GPA and employment opportunities. Introduction Statistical Problems Descriptive Statistics Graphical Methods Frequency Distributions (Histograms) Other Methods Numerical methods Measures of Central Tendency Measures of Variability Empirical Rule Percentiles 1 Introduction Statistical Problems 1. or possible side eﬀects.F) (or multiple choice) examination completely randomly. what are your chances of passing? 6.Chapter 1 Data Analysis Chapter Content.

weight of freshman students) Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements. (ii) a design of the experiment or sampling procedure. height. Types of data: qualitative vs quantitative OR discrete vs continuous Descriptive statistics Graphical vs numerical methods 6 . (iv) Procedure for making predictions about the population based on sample information. times to assemble a product) Information: Any aquired data ( e. Elements of a statistical problem: (i) A clear deﬁnition of the population and variable of interest. How to interpret polls. What is the eﬀect of market strategy on market share? 9. major. (better statement) To make inferences (predictions. all registered voters. A collection of numbers (data)) Knowledge: Useful data Population: set of all measurements of interest (e.7. Inferential Statistics: deals with procedures used to make inferences (predictions) about a population parameter from information contained in a sample. (iii) Collection and analysis of data (gathering and summarizing data).g.g. all freshman students at the university) Sample: A subset of measurements selected from the population of interest Variable: A property of an individual population unit (e.g. Objective. Deﬁnitions Probability: A game of chance Statistics: Branch of science that deals with data analysis Course objective: To make decisions in the prescence of uncertainty Terminology Data: Any recorded event (e. How many individuals you need to sample for your inferences to be acceptable? What is meant by the margin of error? 8.g. How to pick the stocks to invest in? I. decisions) about certain characteristics of a population based on information contained in a sample. (v) A measure of “goodness” or reliability for the procedure.

7 .3 19. f 1 5.0-17.1 Loss 15.2 20.28) 6/25 (.5 15.5 19.9 Objective: Provide a useful summary of the available information.6 5.0-9.) -The more data one has the larger is the number of classes.2 12.0 11.0-13.6 9.4 12.04) 1.12) 1/25 (.7 16.07 4 17.24) 3/25 (.00 Let k = # of classes max = largest measurement min = smallest measurement n = sample size w = class width Rule of thumb: -The number of classes chosen is usually between 5 and 20.06 5 21.6 18.0-29.3 8. Method: Construct a statistical graph called a “histogram” (or frequency distribution) Weight Loss Data class boundtally class aries freq.0 1 Totals 25 rel.0-25.8 9.8 22. f /n 3/25 (.20) 7/25 (.9 7.03 2 9.05 3 13.0-21. freq. (Most of the time between 7 and 13.8 13.4 14.4 23.8 15.03 6 25.4 16.1 17.8 Data 24.12) 5/25 (.9 28.2 Graphical Methods Frequency and relative frequency distributions (Histograms): Example Weight 20.

0 (why?) 6 Graphs: Graph the frequency and relative frequency distributions. easiest to handle . Exponential 3. max − min .Pie-Charts .87. Repeat the above example using 12 and 4 classes respectively. Determine the class width 3. Proceed as above Possible shapes of frequency distributions 1.Cheating with Charts 8 . k w= Note: w = 28. Comment on the usefulness of each including k = 6. Normal distribution (Bell shape) 2. most useful.Line Charts .3log10 (n). Binomial. Steps in Constructing a Frequency Distribution (Histogram) 1. Exercise.It lends itself easily to more in depth analysis Other Graphical Methods -Statistical Table: Comparing diﬀerent populations . Determine the number of classes 2. Poisson (discrete variables) Important -The normal distribution is the most popular.It occurs naturally in practical applications .Formulas: k = 1 + 3. But we used w = 29−5 = 4.4 6 = 3.6−5. Uniform 4. Locate class boundaries 4.Bar Charts .

5 6 2. 75) then x = 90 + 95 + 80 + 60 + 75 = 400 x x = n = 400 = 80. 21. Sample median 3. Sample Mean (arithmetic average) x = x = x1 +x2 +···+xn n x n or Example 1: Given a sample of 5 test grades (90. Sample mode Measures of Dispersion (Variability) Range Mean Absolute Deviation (MAD) Sample Variance Sample Standard Deviation 1. x2 .3 Numerical methods Measures of Central Tendency 1. · · · . 95. 80. 5 Example 2: Let x = age of a randomly selected student sample: (20. the median is the middle number 9 . 22. 18. 4. xn ) where n = sample size xi = value of the ith observation in the sample 1. 19) x = 20 + 18 + 22 + 29 + 21 + 19 = 129 x x = n = 129 = 21. Note: If n is odd. 60. 3. Sample Median The median of a sample (data set) is the middle number when the measurements are arranged in ascending order. 29. I. 2. Measures of Central Tendency Given a sample of measurements (x1 . Sample mean 2.

Example: Sample: (9. Mode The mode is the value of x (observation) that occurs with the greatest frequency. 2. 2. 14). 6. 7. 3. 14). 11. 2. 2. Example 1: Sample (9. the median is the average of the middle two numbers. 7. 2 Remarks: (i) x is sensitive to extreme values (ii) the median is insensitive to extreme values (because median is a measure of location or position). n = 6 Step 1: 2. 11. 9. 7. mode = 7 10 . 7. 7. 7). n = 5 Step 1: arrange in ascending order 2. 7. 6. 11. 14 Step 2: med = 9. 14 Step 2: med = 7+9 = 8. 14. 9. 11. 11. Example 2: Sample (9.If n is even.

median and mode on relative frequency distribution.Eﬀect of x. 11 .

n 6 x = 80 n |x − x| n Remarks: (i) MAD is a good measure of variability (ii) It is diﬃcult for mathematical manipulations 3.min Example 1: Sample (90.min = 95-65 = 30 2. 95) Range = max . Range: Range = largest measurement . s (x − x)2 n−1 12 .II. 85. s2 s2 = 4. · · · . x2 .smallest measurement or Range = max . xn ) 1. Mean Absolute Diﬀerence (MAD) (not in textbook) MAD = Example 2: Same sample x= x x − x |x − x| 90 10 10 85 5 5 65 -15 15 75 -5 5 70 -10 10 95 15 15 Totals 480 0 60 MAD = 60 |x − x| = = 10. Sample Variance. Sample Standard Deviation. 75. Measures of Variability Given: a sample of size n sample: (x1 . 65. 70.

2 (or s = Example: Same sample 13 .s = or s = √ s2 (x−x)2 n−1 Example: Same sample as before (x = 80) x x − x (x − x)2 90 10 100 85 5 25 65 -15 225 75 -5 25 70 -10 100 95 15 225 Totals 480 0 700 Therefore x= s2 = 480 x = = 80 n 6 700 (x − x)2 = = 140 n−1 5 √ √ s = s2 = 140 = 11.83 Shortcut Formula for Calculating s2 and s s2 = ( x) x2 − n n−1 2 s= ( x) x2 − n n−1 √ s2 ).

83. Data: {x1 . 100 − 38.x x2 90 8100 85 7225 65 4225 75 5625 70 4900 95 9025 Totals 480 39. x2 . · · · . s = 2 2 ( x) n 2 2 Numerical methods(Summary) Data: {x1 . 100 − (480) 6 s = = n−1 5 39. xn } (i) Measures of central tendency Sample mean: x = n i Sample median: the middle number when the measurements are arranged in ascending order Sample mode: most frequently occurring value (ii) Measures of variability Range: r = max − min √ Sample standard deviation: s= s2 Exercise: Find all the measures of central tendency and measures of variability for the weight loss example. xN } Population mean: µ = Population variance: xi N x Sample Variance: s2 = (xi −x)2 n−1 σ2 = (xi − µ)2 N 14 . 400 700 = = = 140 5 5 √ √ s2 = 140 = 11.100 x − 39. x2 . · · · . Graphical Interpretation of the Variance: Finite Populations Let N = population size.

(xi − µ)2 N σ= Population parameters vs sample statistics. xn . σ.e. x2 . x + s) (ii) approximately 95% of the measurements lie within two standard deviations of their sample mean. s = 6. that is bell shaped. . 87] (iii) (k = 3): at least 88% of all grades lie in [57. . i. i.e. Practical Signiﬁcance of the standard deviation Chebyshev’s Inequality. s. Empirical rule. (x − 2s. (x − 3s.Population standard deviation: σ = √ σ 2 . xn . .e. ?] Suppose that you are told that the frequency distribution is bell shaped. (Regardless of the shape of frequency distribution) Given a number k ≥ 1. x + 3s) Example A data set has x = 75. Can you improve the estimates in Chebyshev’s Inequality. σ 2 . s2 .e. (x − s. x2 . 87) contains approximately 95% of the observations (iii) (57. . at least (1 − k12 ) of the measurements lie within k standard deviations of their sample mean. 81] (ii) (k = 2): at least 75% of all grades lie in [63. Sample statistics: x. Then (i) (69. x + 2s) (iii) at least (almost all) 99% of the measurements lie within three standard deviations of their sample mean. Restated. 81) contains approximately 68% of the observations (ii) (63. and a set of measurements x1 . . 93) contains at least 99% (almost all) of the observations Comments. s = 6. i. i. Then (i) approximately 68% of the measurements lie within one standard deviations of their sample mean. ?] (v) (k = 5): at least ?% of all grades lie in [?. 93] (iv) (k = 4): at least ?% of all grades lie in [?. At least (1 − k12 ) observations lie in the interval (x − ks. Population parameters: µ. x + ks). A set of grades has x = 75. . Example. Then (i) (k = 1): at least 0% of all grades lie in [69. . (i) Empirical rule works better if sample size is large (ii) In your calculations always keep 6 signiﬁcant digits 15 . Given a set of measurements x1 . . The frequency distribution is known to be normal (bell shaped).

25 (S2) Q1 = 5 + . 17.5(9) = 4.(iii) Approximation: s range 4 s x (iv) Coeﬃcient of variation (c. .75(17 − 14) = 16.25(n + 1) = . xn be a set of measurements arranged in increasing order. 1. Median (50th percentile) Example.25(9) = 2. Example.1 = 7. Lower Quartile (25th percentile) Example. Let 0 < p < 100.75(n + 1) = . Let x1 . . (i) Find the 30th percentile.3(n + 1) = . 5.75 2.5(n + 1) = . 14. IQ = Q3 − Q1 Exercise. (S1) position = .25(8 − 5) = 5 + . . 11.1 Special Cases.7 (S2) 30th percentile = 5 + .3(9) = 2.5 (S2) median: Q2 = 10 + .25 Interquartiles.75(9) = 6.7(8 − 5) = 5 + 2. 10. 20. The pth percentile is a number x such that p% of all measurements fall below the pth percentile and (100 − p)% fall above it. Find the interquartile (IQ) in the above example. x2 . Upper Quartile (75th percentile) Example.5 3. Deﬁnition.75 = 5.5(11 − 10) = 10. (S1) position = . Solution. 16 . Data: 2. .) = 4 Percentiles Using percentiles is useful if data is badly skewed.75 (S2) Q3 = 14 + . (S1) position = .v. (S1) position = . 8.

019 6 5 21. x f 1 5.015 7 4 17.0 27 1 Totals 25 Example: (weight loss data) x2 f 147 605 1.587 729 6.0-17.0-9.0-21.575 2.5 Sample Mean and Variance For Grouped Data Weight Loss Data class boundaries mid-pt.809 xf 21 55 105 114 69 27 391 Let k = number of classes. sample variance and sample standard deviation of the grouped data in the weight loss example.166 1.023 3 6 25. Compare with the raw data results.011 5 3 13. Formulas. 6 z-score 1.07 3 2 9. The sample z-score for a measurement x is z= x−x s 2. xg = xf n s2 = g x2 f − ( xf )2 /n n−1 where the summation is over the number of classes k. Exercise: Use the grouped data formulas to calculate the sample mean.0-29. freq.0-13.0-25. The population z-score for a measurement x is 17 .

s = . w. Show your work graphically in all relevant questions. how many standard deviations.8588. s2 = .e. Review Exercises: Data Analysis Please show all work. s.82 . What is your relative standing. 1.76 .05 . Suppose your score is 85. Use the formula.89 .94 . The 25 measurements below represent the ﬂuoride level for a sample of 25 days. No credit for a correct ﬁnal answer without a valid argument.88 .97 . answer method whenever possible.89 .z= x−µ σ Example.94 .83 .0065.66 z= s 6 standard deviations above average. (i.93 . substitution.85 . these data represent the early morning readings for the 25 days sampled. above (below) the mean your score is)? Answer. . of each class interval.92 1. (Fluoride Problem) The regulation board of health in a particular state specify that the ﬂuoride level must not exceed 1.84 .78 .77 .79 (i) Show that x = . 18 . (ii) Find the range.83 . R.0803. (iv) Locate class boundaries (v) Construct the frequency and relative frequency distributions for the data.86 .97 . Although ﬂuoride levels are measured more than once per day. s = 6. (iii) Using k = 7 classes. A set of grades has x = 75.75 .71 .84 . 85 − 75 x−x = = 1.85 . ﬁnd the width.81 .5 ppm (parts per million).

(iii) Find the sample mode. 8.5.7.80. (Vertical axis must be clearly labeled) 2. 16. 24. (ii) Find the sample median. Grades for 50 students from a previous MAT test are summarized below.70-.95.588.75. 5) (i) Find the sample mean.90-. 5. 6.95-1.85. class frequency. MAD=2. med =5. (viii) Find the sample standard deviation. Answers: x = 5. ss . 2.90. Q − 3 = 8. 15. s = 2.80-. Q1 and Q3 .00-1. Given the following data set (weight loss per week) (9.001.class frequency . (v) Find the mean absolute diﬀerence. (x) Repeat (i)-(ix) for the data set (21. 24). (iv) Find the sample range. mode =5 range = 7. f xf x2 f 40 -504 50 -606 60-7010 70-8015 80-9010 90-100 5 Totals 19 . (ix) Find the ﬁrst and third quartiles. 3.05 Totals relative frequency (vi) Graph the frequency and relative frequency distributions and state your conclusions. 4. (vii) Find the sample variance using the short-cut formula. (vi) Find the sample variance using the deﬁning formula.25.85-.75-.

Suppose that the relative frequency distribution is bell-shaped. Refer to the raw data in the ﬂuoride problem. T F (iv) The variance is equal to the square of the standard deviation. T F (ii) The mean is insensitive to extreme values. 250. 5. ∞) 7.0745.(i) Complete all entries in the table. s = 14. Using the empirical rule (i) ﬁnd the interval around the mean that contains 99. Σx2 f = 270.58. Assume the standard deviation is known to be 4 and that the frequency distribution is known to be bell-shaped. (Circle your choice). sg = . (4 pts. T F (v) Numerical descriptive measures computed from sample measurements are called parameters. Refer to the data in the ﬂuoride problem. (i) Approximately what percentage of measurements fall in the interval (22. µ + 2σ) (iii) Find the interval around the mean that contains 68% of measurements (iv)Find the interval around the mean that contains 95% of measurements 6. s2 = 196. Σx2 f = 18. x = 72. (iii) Compare the answers in (i) and (ii). (ii) ﬁnd the percentage of measurements fall in the interval (µ + 2σ. T F (vi) The number of students attending a Mathematics lecture on any given day is a discrete variable. (ii) Find the sample mean and standard deviation for the grouped data. 4.475. T F (iii) For a positively skewed frequency distribution. (i) Find the sample mean and standard deviation for the raw data. the mean is larger than the median. (ii) Graph the frequency distribution. (Vertical axis must be clearly labeled) (iii) Find the sample mean for the grouped data (iv) Find the sample variance and standard deviation for the grouped data. Answers: Σxf = 21. Suppose that the mean of a population is 30.6% of measurements. 20 .) Answer by True of False . T F (i) The median is insensitive to extreme values. 34) (ii) Approximately what percentage of measurements fall in the interval (µ. xg =. Answers: Σxf = 3610.2.

21 . T F (xii) A population is a subset of the sample. T F (xi) A parameter is a number that describes a sample characteristic. T F (ix) A sample is a subset of the population. T F (xiii) A population is the complete collection of items under study.T F (vii) The median is a better measure of central tendency than the mean when a distribution is badly skewed. statistical techniques allow us to adequately describe and summarize the data with an average. T F (x) A statistic is a number that describes a population characteristic. T F (viii) Although we may have a large mass of data.

etc. Conceptually. a population could be generated by repeating an experiment indeﬁnitely. polling. Sample Space and Events Probability of an Event Equally Likely Outcomes Conditional Probability and Independence Laws of Probability Counting Sample Points Random Sampling 1 Sample Space and Events Deﬁnitions Random experiment: involves obtaining observations of some kind Examples Toss of a coin. counting arrivals at emergency room. inspecting an assembly line. Outcome of an experiment: Elementary event (simple event): one possible outcome of an experiment Event (Compound event): One or more possible outcomes of a random experiment Sample space: the set of all sample points (simple events) for an experiment is called a sample space. Population: Set of all possible observations.Chapter 2 Probability Contents. Sample space : S 22 . throw a die. or set of all possible outcomes for an experiment Notation.

B = {E1 . Then (i)A ∪ B = {E1 . That is S = {1. . . E5 }. 2 Probability of an event Relative Frequency Deﬁnition If an experiment is repeated a large number. Event: A. . The union of A and B. . Let A = {E1 . . B. D. C. is the event containing all sample points in either A or B or both. . (iii) Ac = {E2 . A ∩ B. E6 }. A ∩ B = φ). Ac . E2 . 3. (any capital letter). (iv) A and B are not mutually exclusive (why?) (v) Give two events in S that are mutually exclusive. (ii) AB = {E1 . Venn diagram: Example. . . Sometimes we use notA or A for complement. (i. The complement of A. S = {E1 . A ∪ B. is the event containing all sample points that are not in A. Sometimes we use AorB for union. E5 . 3. E6 }. E3 . More deﬁnitions Union. E5 }. 5. E2 . E3 }. B c = {E4 .e. E2 . Example Suppose S = {E1 .Sample point: E1 . Mutually Exclusive Events (Disjoint Events) Two events are said to be mutually exclusive (or disjoint) if their intersection is empty. the probability of A is P (A) Interpretation n = # of trials of an experiment nA n 23 . of times and the event A is observed nA times. . E3 . 4. The intersection of A and B. is the event containing all sample points that are both in A and B. E4 . Intersection and Complementation Given A and B two events in a sample space S. 2. 1. 6}. E6 }. E2 . 2. E etc. . . E3 }. We may think of S as representation of possible outcomes of a throw of a die. E6 }. n. Sometimes we use AB or AandB for intersection. etc. E2 .

For each event Ei of the sample space S deﬁne a number P (E) that satisﬁes the following three conditions: (i) 0 ≤ P (Ei ) ≤ 1 for all i (ii) P (S) = 1 (iii) (Additive property) P (Ei ) = 1. i = 7. .75 Steps in calculating probabilities of events 1. . . 9 and P (E10 ) = 2/20. In tabular form. Add up the simple events’ probabilities to obtain the probability of the event 24 . we have Ei p(Ei ) E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 1/20 1/20 1/20 1/20 1/20 1/20 1/5 1/5 1/5 1/10 Question: Calculate P (A) where A = {Ei . . List all simple events 3. 6 and P (Ei ) = 1/5. It is known that P (Ei) = 1/20. . A (In fact. . . Example. . . We refer to P (Ei ) as the probability of the Ei . .. i = 1. E2 . S where the summation is over all sample points in S.nA = frequency of the event A nA = relative frequency of A n P (A) nA n if n is large enough. Let S = {E1 .) Conceptual Deﬁnition of Probability Consider a random experiment whose sample space is S with sample points E1 . i ≥ 6}. A: P (A) = P (E6 ) + P (E7 ) + P (E8 ) + P (E9 ) + P (E10 ) = 1/20 + 1/5 + 1/5 + 1/5 + 1/10 = 0. . P (A) = limn→∞ nn . Deﬁne the experiment 2. 8. . Deﬁnition The probability of any event A is equal to the sum of the probabilities of the sample points in A. Assign probabilities to simple events 4. E10 }. Determine the simple events that constitute an event 5.

Example Calculate the probability of observing one H in a toss of two fair coins.g. · · · T T T } (Complete this) (ii) Find the probability of observing exactly two heads. 3 Laws of Probability Conditional Probability The conditional probability of the event A given that event B has occurred is denoted by P (A|B). (e. should make sense. . Example. (ii) At the conceptual level we assign probabilities to events. however. HT. assigns the same probability P (Ei ) = 1/N for all Ei . Solution. one can estimate probabilities. at most one head. T H. T H} P (A) = 0. (iii) In some cases probabilities can be a measure of belief (subjective probability). Toss a fair coin 3 times. . In this case. then use the laws of probability to calculate the probabilities of compound events. This measure of belief should however satisfy the axioms. EN }. Then P (A ∩ B) P (A|B) = P (B) 25 . T T } A = {HT. Equally Likely Outcomes The equally likely probability P deﬁned on a ﬁnite sample space S = {E1 . However. S = {HH. we would like to assign probabilities to simple events directly.5 in a toss of a fair coin). one cannot measure probabilities. P(T)=.5. (i) List all the sample points in the sample space Solution: S = {HHH. . (iv) Typically. P(H)=. for any event A P (A) = sample points in A #(A) NA = = N sample points in S #(S) where N is the number of the sample points in S and NA is the number of the sample points in A. The assignment. .5 Interpretations of Probability (i) In real world applications one observes (measures) relative frequencies.

Remarks. A = {E1 . . E3 }. (i) What does it mean that all elementary events are equally likely? (ii) Use the complementation rule to ﬁnd P (Ac ). E6 }. (ii) If A is independent of B then B is independent of A. Similarly. . Probability Laws Complementation law: P (A) = 1 − P (Ac ) Additive law: P (A ∪ B) = P (A) + P (B) − P (A ∩ B) Moreover. if A and B are independent P (AB) = P (A)P (B) Example Let S = {E1 . (i) If A and B are independent. E4 . (iii) Find P (A|B) and P (B|A) (iv) Find P (D) and P (D|C) 26 . if A and B are mutually exclusive.provided P (B) > 0. (i) Two events A and B are said to be independent if P (A ∩ B) = P (A)P (B). E2 . . then P (A|B) = P (A) and P (B|A) = P (B). (ii) Two events A and B that are not independent are said to be dependent. . E6 }. then P (AB) = 0 and P (A ∪ B) = P (A) + P (B) Multiplicative law (Product rule) P (A ∩ B) = P (A|B)P (B) = P (B|A)P (A) Moreover. C = {E2 .D = {E6 }. Suppose that all elementary events are equally likely. E2 . E5 }. P (B|A) = P (A ∩ B) P (A) Independent Events Deﬁnitions. B = {E1 . E3 .

(ii) Bayes’ Law is important in several ﬁelds of applications. However. (That is. B c be complementary events and let A denote an arbitrary event.995) 95 294 .01. P (A) P (A|B)P (B) + P (A|B c )P (B c ) Remarks. The desired probability P (D|E) is obtained by P (D ∩ E) P (E) P (E|D)P (D) P (E|D)P (D) + P (E|D c )P (D c) (. the test also yields a “false positive” results for 1 percent of healthy persons tested.005) (.(v) Are A and B independent? Are C and D independent? (vi) Find P (A ∩ B) and P (A ∪ B). then.) If 0. B c . Bayes’ Law Let the B. P (B) and P (B c ) are called prior probabilities. or P (A) = P (A|B)P (B) + P (A|B c )P (B c ). A laboratory blood test is 95 percent eﬀective in detecting a certain disease when it is. with probability 0.5 percent of the population actually has the disease. (i) The events of interest here are B. present. Law of total probability Let the B. Then P (B|A) = P (AB) P (A|B)P (B) = .005) + (. what is the probability a person has the disease given that the test result is positive? Solution Let D be the event that the tested person has the disease and E the event that the test result is positive.323. in fact. if a healthy person is tested. B c be complementary events and let A denote an arbitrary event.01)(. and (ii) P (B|A) and P (B c |A) are called posterior (revised) probabilities.95)(. Then P (A) = P (A ∩ B) + P (A ∩ B c ) . Example 1. P (D|E) = = = = 27 .95)(. the test result will imply he or she has the disease.

from the basic principle. so for some applications we need to ﬁnd n. we see. each of whom has 3 sons.Thus only 32 percent of those persons whose test results are positive actually have the disease. there are n possible outcomes of experiment 2. then together there are mn possible outcomes of the two experiments. nA where n and nA are the number of points in S and A respectively. how many diﬀerent choices are possible? Solution: Let the choice of the man as the outcome of the ﬁrst experiment and the subsequent choice of one of his sons as the outcome of the second experiment. for each outcome of experiment 1. (i) Toss two coins: mn = 2 × 2 = 4 (ii) Throw two dice: mn = 6 × 6 = 36 (iii) A small community consists of 10 men. 250 1015 = one trillion. Generalized basic principle of counting 28 .048. Examples.576 9 10 40 1012 15 10 64 1019 Note that 230 109 = one billion. that there are 10 × 3 = 30 possible choices. Probabilities in Tabulated Form 4 Counting Sample Points Is it always necessary to list all sample points in S? Coins 1 3 5 10 30 50 Coin Tosses sample-points Coins sample-points 2 2 4 8 4 16 32 6 64 1024 20 1. Then if experiment 1 can result in any one of m possible outcomes and if. A RECALL: P (A) = nn . If one man and one of his sons are to be chosen as father and son of the year. 240 1012 = one thousand billion. Basic principle of counting: mn rule Suppose that two experiments are to be performed.

What is the total number of available routes between A and D? Solution: The total number of available routes is mnt = 5. (iii) How many diﬀerent 7−place license plates are possible if the ﬁrst 3 places are to be occupied by letters and the ﬁnal 4 by numbers? Solution: It follows from the generalized principle of counting that there are 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175. and if for each of the possible outcomes of the ﬁrst two experiments there are n3 possible outcomes of the third experiment. consisting of 1 individual from each class.4. how many license plates would be possible if repetition among letters or numbers were prohibited? Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78. A subcommittee of 4. r = 3. Permutations: (Ordered arrangements) The number of ways of ordering n distinct objects taken r at a time (order is important) is given by n! = n(n − 1)(n − 2) · · · (n − r + 1) (n − r)! Examples (i) In how many ways can you arrange the letters a. (iv) In (iii).7 = 140. (ii) A college planning committee consists of 3 freshmen. and if for each of these n1 possible outcomes there are n2 possible outcomes of the second experiment. and 2 seniors. In how many diﬀerent ways can you select 3 balls? Solution: Note that n = 10. Number of diﬀerent ways is 10 · 9 · 8 = 10! = 720. . 5 juniors. List all arrangements. 000 possible license plates.If r experiments that are to be performed are such that the ﬁrst one may result in any of n1 possible outcomes. 000 possible license plates. 4 sophomores. Balls are selected without replacement one at a time. and if.. How many diﬀerent subcommittees are possible? Solution: It follows from the generalized principle of counting that there are 3·4·5·2 = 120 possible subcommittees. 4 between B and C. Examples (i) There are 5 routes available between A and B. (ii) A box contains 10 balls. . is to be chosen. 624. and 7 between C and D. 760. 7! 29 . b and c. 
then there are a total of n1 · n2 · · · nr possible outcomes of the r experiments. Answer: There are 3! = 6 arrangements or permutations. .

Combinations. For r ≤ n, we define
$$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$
and say that $\binom{n}{r}$ represents the number of possible combinations of n objects taken r at a time (with no regard to order).
Examples
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
Solution: There are $\binom{20}{3} = \frac{20!}{3!\,17!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$ possible committees.
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can be formed?
Solution: $\binom{5}{2}\binom{7}{3} = 10 \cdot 35 = 350$ possible committees.

5 Random Sampling
Definition. A sample of size n is said to be a random sample if the n elements are selected in such a way that every possible combination of n elements has an equal probability of being selected. In this case the sampling process is called simple random sampling.
Remarks. (i) If n is large, we say the random sample provides an honest representation of the population.
(ii) For finite populations the number of possible samples of size n is $\binom{N}{n}$. For instance, the number of possible samples when N = 28 and n = 4 is $\binom{28}{4} = 20{,}475$.
(iii) Tables of random numbers may be used to select random samples.

6 Modeling Uncertainty
The purpose of modeling uncertainty (randomness) is to discover the laws of change.
1. Concept of Probability. Even though probability (chance) involves the notion of change, the laws governing the change may themselves remain fixed as time passes.
Example. Consider a chance experiment: toss of a coin.

Probabilistic Law. In a fair coin tossing experiment the percentage of (H)eads is very close to 0.5. In the model (abstraction): P(H) = 0.5 exactly.
Why Probabilistic Reasoning?
Example. Toss 5 coins repeatedly and write down the number of heads observed in each trial. Now, what percentage of trials produce 2 Heads?
Answer. Use the Binomial law to show that
$$P(2 \text{ Heads}) = \binom{5}{2}(0.5)^2(1-0.5)^3 = \frac{5!}{2!\,3!}(0.5)^2(0.5)^3 = 0.3125.$$
Conclusion. There is no need to carry out this experiment to answer the question (thus saving time and effort).
2. The Interplay Between Probability and Statistics (Theory versus Application).
(i) Theory is an exact discipline developed from logically defined axioms (conditions).
(ii) Theory is related to physical phenomena only in inexact terms (i.e. approximately).
(iii) When theory is applied to real problems, it works (i.e. it makes sense).
Example. A fair die is tossed a very large number of times. It was observed that face 6 appeared 1,500 times. Estimate how many times the die is tossed.
Answer. About 9000 times (since P(6) = 1/6, the number of tosses is roughly 6 × 1500).

Review Exercises: Probability
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.
1. An experiment consists of tossing 3 fair coins.
(i) List all the elements in the sample space.
(ii) Describe the following events: A = {observe exactly two heads}, B = {observe at most one tail}, C = {observe at least two heads}, D = {observe exactly one tail}.
(iii) Find the probabilities of events A, B, C, D.

P (5) = . 5. 4.1. P (D).2. Refer to problem 3.31 . (iii) Find the probability of the event B = {even}. The following probability table gives the intersection probabilities for four events A. B. 3. (i) Find the probability of the event A = {4. 5. 6} such that P (1) = . 6}.. and 3. P(4)=.06 . P (6) = .2.P(3)=. P (2) = . C and D: A .2. 32 .08 1. (iv) Find the probability of the event C = {odd}. 4. Suppose that S = {1.1. 3. Find (i) A ∪ B (ii) A ∩ B (iii) B ∩ C (iv) Ac (v) C c (vi) A ∪ C (vii) A ∩ C (viii) Find the probabilities in (i)-(vii). P (C|A). P (C). 5. P (D|A) and P (C|B). B. (ii) Describe the following events: A = { observe a number larger than 3 } B = { Observe an even number} C = { Observe an odd number} (iii) Find the probabilities of events A.00 C D (i) Using the deﬁnitions. (ix) Refer to problem 2. C. and answer questions (i)-(viii). ﬁnd P (A). 2.55 B 0. (ii) Find the probability of the complement of A. (i) List all the elements in the sample space.3. An experiment consists of throwing a fair die. P (B).1. (iv) Compare problems 2.

· · ·} (ii) Find the probability of selecting a black ball. Suppose that the following two weather forecasts were reported on two local TV stations for the same period. Use the laws of probability to justify your answers to the following questions: (i) If P (A ∪ B) = . and green (g).4. are A and B mutually exclusive? independent? (ii) If P (A ∪ B) = . both today and tomorrow 10%. tomorrow 40%. and P (B) = . 6. P (A) = . white (w).4.6.5. (i) If a ball is selected at random. P (A) = . Second report: The chances of rain are today 30%. A box contains four black and six white balls. (iii) Find the probability of selecting one black and one red ball.3. either today or tomorrow 60%. (Hint: Let A and B be the events of rain today and rain tomorrow.5. both today and tomorrow 20%. what is the probability that both balls are black? both are white? the ﬁrst is white and the second is black? the ﬁrst is black and the second is white? one ball is black? (iii) Repeat (ii) if the balls are selected with replacement. (i) Find the sample space S (Hint: there is 10 sample points). orange (o).(ii) Find P (B c ). Three balls are to be selected at random. a black (b). and P (B) = . what is the probability that it is white? black? (ii) If two balls are selected without replacement. if any. S = {bwr. red (r). 9. (iii) Find P (A ∩ B). First report: The chances of rain are today 30%.) 8. (v) Are B and C independent events? Justify your answer. are A and B mutually exclusive? independent? (iii) If P (A ∪ B) = . A box contains ﬁve balls. are A and B mutually exclusive? independent? 7. (vi) Are B and C mutually exclusive events? Justify your answer. is more believable? Why? No credit if answer is not justiﬁed. either today or tomorrow 60%. 33 . Which of the two reports. (vii) Are C and D independent events? Justify your answer.2. and P (B) = . P (A) = . (viii) Are C and D mutually exclusive events? Justify your answer. tomorrow 40%. (iv) Find P (A ∪ B).65.7.

(Hint: Start by defining the events $B_1$ and $B_2$ as the first ball is black and the second ball is black, respectively, and the events $W_1$ and $W_2$ as the first ball is white and the second ball is white, respectively. Then use the product rule.)
10. Answer by True or False. (Circle your choice.)
T F (i) An event is a specific collection of simple events.
T F (ii) The probability of an event can sometimes be negative.
T F (iii) If A and B are mutually exclusive events, then they are also dependent.
T F (iv) The sum of the probabilities of all simple events in the sample space may be less than 1 depending on circumstances.
T F (v) A random sample of n observations from a population is not likely to provide a good estimate of a parameter.
T F (vi) A random sample of n observations from a population is one in which every different subset of size n from the population has an equal probability of being selected.
T F (vii) The probability of an event can sometimes be larger than one.
T F (viii) The probability of an elementary event can never be larger than one half.
T F (ix) Although the probability of an event occurring is .9, the event may not occur at all in 10 trials.
T F (x) If a random experiment has 5 possible outcomes, then the probability of each outcome is 1/5.
T F (xi) If two events are independent, the occurrence of one event should not affect the likelihood of the occurrence of the other event.
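The combination counts, the simple-random-sampling definition, and the "why probabilistic reasoning" coin example from Sections 4 to 6 of this chapter can be reproduced with a short Python sketch (standard library only; `math.comb` needs Python 3.8+). The population labels 1 to 28 and the random seed are hypothetical, chosen only so the run is reproducible, and `random.sample` is used here as a software stand-in for a table of random numbers.

```python
import math
import random

# Combinations: C(n, r) = n! / ((n - r)! r!), order ignored
committees_of_3 = math.comb(20, 3)                    # 3 people out of 20
mixed_committees = math.comb(5, 2) * math.comb(7, 3)  # 2 of 5 men and 3 of 7 women
samples_28_choose_4 = math.comb(28, 4)                # possible samples, N = 28, n = 4

# Simple random sampling: random.sample draws n distinct units, and every
# subset of size n is equally likely to be the one drawn.
rng = random.Random(2003)              # fixed seed only for reproducibility
population = list(range(1, 29))        # hypothetical unit labels 1..28
sample = rng.sample(population, 4)

# Why probabilistic reasoning: P(exactly 2 heads in 5 fair tosses)
exact = math.comb(5, 2) * 0.5**2 * (1 - 0.5) ** 3
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if sum(rng.random() < 0.5 for _ in range(5)) == 2
)
simulated = hits / trials              # close to, but not exactly, 0.3125

print(committees_of_3, mixed_committees, samples_28_choose_4, sorted(sample))
print(exact, simulated)
```

The simulated fraction changes with the seed and the number of trials, while the binomial value 0.3125 does not, which is the point of the Conclusion in the coin example.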

Chapter 3 Random Variables and Discrete Distributions
Contents. Random Variables; Expected Values and Variance; Binomial; Poisson; Hypergeometric.
1 Random Variables
The discrete rv arises in situations when the population (or possible outcomes) are discrete (or qualitative).
Example. Toss a coin 3 times; then S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let the variable of interest, X, be the number of heads observed; then the relevant events would be
{X = 0} = {TTT}
{X = 1} = {HTT, THT, TTH}
{X = 2} = {HHT, HTH, THH}
{X = 3} = {HHH}.
The relevant question is to find the probability of each of these events. Note that X takes integer values even though the sample space consists of H's and T's.
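The events {X = x} in the example can be generated mechanically. This Python sketch (standard library only) enumerates the 8 equally likely outcomes, groups them by the number of heads, and turns the counts into exact probabilities with `fractions.Fraction`; it assumes a fair coin, as in the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; collect the outcomes that make up each event {X = x}.
events = {}
for outcome in sample_space:
    events.setdefault(outcome.count("H"), []).append(outcome)

# Each outcome has probability 1/8, so P(X = x) = |{X = x}| / 8.
pmf = {x: Fraction(len(outs), len(sample_space)) for x, outs in sorted(events.items())}

for x, outs in sorted(events.items()):
    print(f"X = {x}: {outs}  P = {pmf[x]}")
```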

50 -0.The variable X transforms the problem of calculating probabilities from that of set theory to calculus.v. etc. assigns a ﬁnite or countably inﬁnite number of possible values (e.60 0.s. etc. Which of the following deﬁnes a probability distribution? x p(x) 0 1 2 0.variable: it takes a numerical value Notation: We use X. toss a coin.v. Discrete distributions in tabulated form Example. etc.v.) is a rule that assigns a numerical value to each possible outcome of a random experiment.v. assigns a probability p(x) for each possible x such that (i) 0 ≤ p(x) ≤ 1.30 0.10 x p(x) -1 1 2 0.v.) A Continuous r. is unknown until the outcome is observed .40 0.50 0. X is discrete (qualitative data) 36 . X.30 0. price. to represent r. (i) Discrete distributions arise when the r.v. A Discrete r. has a continuum of possible values. Y .. Deﬁnition.20 Remarks.v. throw a die.20 x p(x) 0 1 2 0.) Discrete Distributions The probability distribution of a discrete r. A random variable (r. weight.g. height. and (ii) x p(x) = 1 where the summation is over all possible values of x.g. Interpretation: -random: the value of the r. (e.

(ii) Continuous distributions arise when the r.v. X is continuous (quantitative data).
Remarks. (i) In data analysis we described a set of data (sample) by dividing it into classes and calculating relative frequencies.
(ii) In Probability we described a random experiment (population) in terms of events and probabilities of events.
(iii) Here, we describe a random experiment (population) by using random variables and probability distribution functions.

2 Expected Value and Variance
Definition 2.1 The expected value of a discrete rv X is denoted by µ and is defined to be
$$\mu = \sum_x x\,p(x).$$
Notation: The expected value of X is also denoted by µ = E[X], or sometimes $\mu_X$ to emphasize its dependence on X.
Definition 2.2 If X is a rv with mean µ, then the variance of X is defined by
$$\sigma^2 = \sum_x (x-\mu)^2\,p(x).$$
Notation: Sometimes we use $\sigma^2 = V(X)$ (or $\sigma_X^2$).
Shortcut Formula: $\sigma^2 = \sum_x x^2 p(x) - \mu^2$.
Definition 2.3 If X is a rv with mean µ, then the standard deviation of X, denoted by $\sigma_X$ (or simply σ), is defined by
$$\sigma = \sqrt{V(X)} = \sqrt{\sum_x (x-\mu)^2\,p(x)}.$$
Shortcut Formula: $\sigma = \sqrt{\sum_x x^2 p(x) - \mu^2}$.
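A minimal Python sketch of Definitions 2.1 to 2.3, using as an assumed example the distribution of X = number of heads in three fair-coin tosses (p(0) = p(3) = 1/8, p(1) = p(2) = 3/8). Exact fractions make it easy to see that the definition and the shortcut formula agree.

```python
import math
from fractions import Fraction

# Distribution of X = number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(pmf):
    """Definition 2.1: mu = sum over x of x p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Definition 2.2: sigma^2 = sum over x of (x - mu)^2 p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """Shortcut formula: sigma^2 = sum over x of x^2 p(x) - mu^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

mu = mean(pmf)
var = variance(pmf)
sigma = math.sqrt(var)          # Definition 2.3 (converted to float)
print(mu, var, variance_shortcut(pmf), sigma)
```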

3 Discrete Distributions
Binomial. The binomial experiment (distribution) arises in the following situation: (i) the underlying experiment consists of n independent and identical trials; (ii) each trial results in one of two possible outcomes, a success or a failure; (iii) the probability of a success in a single trial is equal to p and remains the same throughout the experiment; and (iv) the experimenter is interested in the rv X that counts the number of successes observed in n trials.
A r.v. X is said to have a binomial distribution with parameters n and p if
$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n,$$
where q = 1 − p.
Mean: µ = np. Variance: $\sigma^2 = npq$, $\sigma = \sqrt{npq}$.
Example: Bernoulli. A rv X is said to have a Bernoulli distribution with parameter p if
Formula: $p(x) = p^x(1-p)^{1-x}$, x = 0, 1.
Tabulated form:
x      0        1
p(x)   1 − p    p
Mean: µ = p. Variance: $\sigma^2 = pq$, $\sigma = \sqrt{pq}$.
Binomial Tables. Cumulative probabilities are given in the table.
Example. Suppose X has a binomial distribution with n = 10, p = .4. Find
(i) P(X ≤ 4) = .633
(ii) P(X < 6) = P(X ≤ 5) = .834
(iii) P(X > 4) = 1 − P(X ≤ 4) = 1 − .633 = .367
(iv) P(X = 5) = P(X ≤ 5) − P(X ≤ 4) = .834 − .633 = .201
Exercise: Answer the same question with p = 0.7.
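The binomial table look-ups in the example can be reproduced from the formula. This Python sketch (standard library only) defines two hypothetical helpers, `binom_pmf` and `binom_cdf`; the cumulative sum plays the role of the binomial table.

```python
import math

def binom_pmf(x, n, p):
    """p(x) = C(n, x) p^x q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): the quantity listed in a cumulative binomial table."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.4
p_le_4 = binom_cdf(4, n, p)                        # (i)
p_lt_6 = binom_cdf(5, n, p)                        # (ii)  P(X < 6) = P(X <= 5)
p_gt_4 = 1 - binom_cdf(4, n, p)                    # (iii)
p_eq_5 = binom_cdf(5, n, p) - binom_cdf(4, n, p)   # (iv)
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

print(round(p_le_4, 3), round(p_lt_6, 3), round(p_gt_4, 3), round(p_eq_5, 3))
print(mu, sigma)
```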

Poisson. The Poisson random variable arises when counting the number of events that occur in an interval of time when the events are occurring at a constant rate; examples include the number of arrivals at an emergency room, the number of items demanded from an inventory, and the number of items in a batch of a random size.
A rv X is said to have a Poisson distribution with parameter λ > 0 if
$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \ldots$$
Mean: µ = λ. Variance: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$.
Note: e ≈ 2.71828.
Example. Suppose the number of typographical errors on a single page of your book has a Poisson distribution with parameter λ = 1/2. Calculate the probability that there is at least one error on this page.
Solution. Letting X denote the number of errors on a single page, we have
$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.5} \approx 0.393.$$
Rule of Thumb. The Poisson distribution provides good approximations to binomial probabilities when n is large and µ = np is small, preferably with np ≤ 7.
Example. Suppose that the probability that an item produced by a certain machine will be defective is 0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.
Solution. Using the binomial distribution, the desired probability is
$$P(X \le 1) = p(0) + p(1) = \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 = 0.7361.$$
Using the Poisson approximation, we have λ = np = 1 and
$$P(X \le 1) \approx e^{-1} + e^{-1} \approx 0.7358,$$
which is close to the exact answer.
Hypergeometric. The hypergeometric distribution arises when one selects a random sample of size n, without replacement, from a finite population of size N divided into two classes consisting

of D elements of the first kind and N − D of the second kind. Such a scheme is called sampling without replacement from a finite dichotomous population.
Formula:
$$f(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \quad \max(0,\, n-N+D) \le x \le \min(n, D).$$
We define f(x) = 0 elsewhere.
Mean: $E[X] = n\left(\frac{D}{N}\right)$.
Variance: $V(X) = \left(\frac{N-n}{N-1}\right)(n)\left(\frac{D}{N}\right)\left(1 - \frac{D}{N}\right)$.
The factor $\frac{N-n}{N-1}$ is called the finite population correction factor.
Example. (Sampling without replacement) Suppose an urn contains D = 10 red balls and N − D = 15 white balls. A random sample of size n = 8, without replacement, is drawn and the number of red balls is denoted by X. Then
$$f(x) = \frac{\binom{10}{x}\binom{15}{8-x}}{\binom{25}{8}}, \quad 0 \le x \le 8.$$

4 Markov Chains
Example 1. (Brand Switching Problem) Suppose that a manufacturer of a product (Brand 1) is competing with only one other similar product (Brand 2). Both manufacturers have been engaged in aggressive advertising programs which include offering rebates, etc. A survey is taken to find out the rates at which consumers are switching brands or staying loyal to brands. Responses to the survey are given below. If the manufacturers are competing for a population of y = 300,000 buyers, how should they plan for the future (immediate future, and in the long run)?
Brand Switching Data
                      This week
Last week        Brand 1    Brand 2    Total
Brand 1             90         10        100
Brand 2             40        160        200

2 0.7/3) B1 buyers will be 300. suppose that customer behavior is not changed over time. 1.1 0. π2 ) = (1/3. Two weeks from now: exercise. and i πi = 1.8 Question 1.9π1 + 0. π2 ) = (π1 .1 0.9 0.Brand 1 Brand 2 Brand 1 90/100 10/100 Brand 2 40/200 160/200 So P = 0.7/3) = 170.3/3. π2 ) = (π1 .1π1 + 0. that is (π1 .9 0. 000(1. What percentage will purchase B1 next week? What percentage will purchase B2 next week? What percentage will purchase B1 two weeks from now? What percentage will purchase B2 two weeks from now? Solution: Note that π 0 = (1/3. Question 2. 2/3) 0. then 1 1 0 0 π 1 = (π1 . 000(1. π2 ) and π1 + π2 = 1 Matrix multiplication gives π1 = 0.9 0. 000.1 0. 2/3).8 = (1.8 41 .3/3) = 130.2 0. π2 )P 1 1 π 1 = (π1 . Determine whether each brand will eventually retain a constant share of the market.8π2 π1 + π2 = 1 0.2π2 π2 = 0. If 1/3 of all customers purchased B1 this week. 000 B2 buyers will be 300. Solution: We need to solve π = πP .2 0.

One equation is redundant. Choosing the first and the third, we get $0.1\pi_1 = 0.2\pi_2$, which gives $(\pi_1, \pi_2) = (2/3, 1/3)$. Brand 1 will eventually capture two thirds of the market (200,000 customers).
Example 2. On any particular day Rebecca is either cheerful (c) or gloomy (g). If she is cheerful today then she will be cheerful tomorrow with probability 0.7. If she is gloomy today then she will be gloomy tomorrow with probability 0.4.
(i) What is the transition matrix P?
Solution:
$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{pmatrix}$$
(ii) What is the fraction of days Rebecca is cheerful? gloomy?
Solution: The fraction of days Rebecca is cheerful is the probability that on any given day Rebecca is cheerful. This can be obtained by solving π = πP, where $\pi = (\pi_0, \pi_1)$ and $\pi_0 + \pi_1 = 1$.
Exercise. Complete this problem.

Review Exercises: Discrete Distributions
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.
1. Identify the following as discrete or continuous random variables.
(i) The market value of a publicly listed security on a given day
(ii) The number of printing errors observed in an article in a weekly news magazine
(iii) The time to assemble a product (e.g. a chair)
(iv) The number of emergency cases arriving at a city hospital
(v) The number of sophomores in a randomly selected Math. class at a university
(vi) The rate of interest paid by your local bank on a given day
2. What restrictions do we place on the probabilities associated with a particular probability distribution?

2 .1 .6 .2 .6 .5 -.3.6 3.1 43 . If they are not.1 3. Indicate whether or not the following are valid probability distributions. (i) x p(x) -1 0 1 . indicate which of the restrictions has been violated.5 .2 6 .2 (ii) x p(x) -2 1 4 .2 (ii) x -1 1 p(x) .

and the standard deviation of X. i. A random variable X has the following probability distribution: x p(x) 1 2 . (vi) Graph the probability distribution for X. (v) Find the probability that X is an odd number. σ 2 .58. 5.25 (i) Verify that X has a valid probability distribution.05 . σ. P (X ≤ 2). (ii) Calculate the standard deviation of X. the variance of X.e. σ = 4. A discrete random variable X has the following probability distribution: x p(x) 10 15 20 25 . i. P (X ≥ 3). (i) x p(x) 1 2 . 6. (iv) Find the probability that X is less than or equal to 2. For each of the following probability distributions.45 .15 .1 (i) Calculate the expected value of X. i. P (X > 3).3 3 4 . (ii) Calculate the variance of X. σ 2 = 21.4 .2 .4 . calculate the expected value of X.e.2 .3 .1 44 . (ii) Find the probability that X is greater than 3. (iii) Find the probability that X is greater than or equal to 3. Answers: µ = 17.e.10 3 4 5 . E(X) = µ.4. σ. σ 2 . E(X) = µ.

(ii)
x      −2    −1     2     4
p(x)   .2    .3    .3    .2
7. In how many ways can a committee of ten be chosen from fifteen individuals?
8. Answer by True or False. (Circle your choice.)
T F (i) The expected value is always positive.
T F (ii) A random variable has a single numerical value for each outcome of a random experiment.
T F (iii) The only rule that applies to all probability distributions is that the possible random variable values are always between 0 and 1.
T F (iv) A random variable is one that takes on different values depending on the chance outcome of an experiment.
T F (v) The number of television programs watched per day by a college student is an example of a discrete random variable.
T F (vi) The monthly volume of gasoline sold in one gas station is an example of a discrete random variable.
T F (vii) The expected value of a random variable provides a complete description of the random variable's probability distribution.
T F (viii) The variance can never be equal to zero.
T F (ix) The variance can never be negative.
T F (x) The probability p(x) for a discrete random variable X must be greater than or equal to zero but less than or equal to one.
T F (xi) The sum of all probabilities p(x) for all possible values of X is always equal to one.
T F (xii) The most common method for sampling more than one observation from a population is called random sampling.

Review Exercises: Binomial Distribution
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

8. Consider a binomial distribution with n = 500 and p = . 46 .6. List the properties for a binomial experiment. 5. (i) Use the formula to ﬁnd P (0). Consider a binomial distribution with n = 4 and p = . P (1). (v) Find P (X < 12) using the table. 3.1. (ii) What is the standard deviation of X. The organization plans to contact 100.6. 8. (i) Find the expected value E(X) = µ (ii) Find the standard deviation σ 7. (ii) Graph the probability distribution found in (i) (iii) Repeat (i) and (ii) when n = 4. (iii) Find the expected value E(X) = µ (iv) Find the standard deviation σ 6. (vii) Find P (X ≥ 8) using the table. 2. P (4). and p = .2. (iv) Find P (X ≤ 2) using the table. 000 prospects over the coming year. Give the formula for the binomial probability distribution. Consider a binomial distribution with n = 5 and p = . the annual number of sales. Calculate (i) 5! (ii) 10! (iii) 7! 3!4! 4. (iv) Repeat (i) and (ii) when n = 4.6. (ii) Find P (X ≤ 2) using the formula. (i) Find the expected value E(X) = µ (ii) Find the standard deviation σ (iii) Find P (0) and P (2) using the table. A sales organization makes one sale for every 200 prospects that it contacts. (vi) Find P (X > 13) using the table. and p = . (i) What is the expected value of X.5. · · · . (i) Find P (0) and P (2) using the formula. Consider a binomial distribution with n = 25 and p = .

(iii) Within what limits would you expect X to fall with 95% probability? (Use the empirical rule.)
Answers: µ = 500, σ = 22.3
9. Identify the binomial experiment in the following group of statements.
(i) a shopping mall is interested in the income levels of its customers and is taking a survey to gather information
(ii) a business firm introducing a new product wants to know how many purchases its clients will make each year
(iii) a sociologist is researching an area in an effort to determine the proportion of households with male "head of households"
(iv) a study is concerned with the average hours worked by teenagers who are attending high school
(v) Determining whether or not a manufactured item is defective.
(vi) Determining the number of words typed before a typist makes an error.
(vii) Determining the weekly pay rate per employee in a given company.
10. Answer by True or False. (Circle your choice.)
T F (i) In a binomial experiment each trial is independent of the other trials.
T F (ii) A binomial distribution is a discrete probability distribution.
T F (iii) The standard deviation of a binomial probability distribution is given by npq.
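For readers who want to check the Poisson, hypergeometric, and Markov-chain examples in Sections 3 and 4 of this chapter numerically, here is a Python sketch using only the standard library. The helper names (`poisson_pmf`, `hypergeom_pmf`, `step`) are hypothetical. The long-run formula used for the brand-switching chain is specific to two-state chains: it is the balance equation 0.1π1 = 0.2π2 combined with π1 + π2 = 1; chains with more states would need a general linear-system solve.

```python
import math
from fractions import Fraction

# Poisson: p(x) = e^(-lam) lam^x / x!
def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

p_at_least_one_error = 1 - poisson_pmf(0, 0.5)          # typo example, lam = 1/2

# Defective items, n = 10, p = 0.1: exact binomial vs Poisson with lam = np = 1
binom_le_1 = sum(math.comb(10, k) * 0.1**k * 0.9 ** (10 - k) for k in (0, 1))
poisson_le_1 = sum(poisson_pmf(k, 1.0) for k in (0, 1))

# Hypergeometric: urn with D = 10 red among N = 25 balls, sample n = 8
def hypergeom_pmf(x, N, D, n):
    if not max(0, n - N + D) <= x <= min(n, D):
        return Fraction(0)                               # f(x) = 0 elsewhere
    return Fraction(math.comb(D, x) * math.comb(N - D, n - x), math.comb(N, n))

f = {x: hypergeom_pmf(x, 25, 10, 8) for x in range(9)}
hyper_mean = sum(x * p for x, p in f.items())            # should equal n D / N

# Markov chain: brand switching, rows = last week, columns = this week
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(8, 10)]]

def step(pi, P):
    """Market shares one week later: the row vector pi times P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P[0]))]

shares_next_week = step([Fraction(1, 3), Fraction(2, 3)], P)
p12, p21 = P[0][1], P[1][0]
long_run = [p21 / (p12 + p21), p12 / (p12 + p21)]       # two-state chains only

print(round(p_at_least_one_error, 4), round(binom_le_1, 4), round(poisson_le_1, 4))
print(sum(f.values()), hyper_mean)
print(shares_next_week, [300_000 * s for s in shares_next_week], long_run)
```

Exact fractions are used for the hypergeometric and Markov parts so that identities such as "the probabilities sum to 1" and "π = πP" can be checked with plain equality.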

Example. 0 ≤ x < ∞} Let the variable of interest. be observed lifetime of the light bulb then relevant events would be {X ≤ x}. or {1000 ≤ X ≤ 2000}. Observe the lifetime of a light bulb. Standard Normal 2. Important.Chapter 4 Continuous Distributions Contents. Normal 3. {X ≥ 1000}. The relevant question is to ﬁnd the probability of each these events. then S = {x. 1. X. 2 The Normal Distribution Standard Normal. 48 . It is denoted by the letter Z. A normally distributed (bell shaped) random variable with µ = 0 and σ = 1 is said to have the standard normal distribution. Exponential 1 Introduction RECALL: The continuous rv arises in situations when the population (or possible outcomes) are continuous (or quantitative). For any continuous pdf the area under the curve is equal to 1. Uniform 4.

pdf of Z:
$$f(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}, \quad -\infty < z < \infty.$$
Area under graph = 1.
Tabulated Values. Values of P(0 ≤ Z ≤ z) are tabulated in the appendix.
Examples. (i) P(0 ≤ Z ≤ 1) = .3413 (ii) P(−1 ≤ Z ≤ 1) = .6826 (iii) P(−2 ≤ Z ≤ 2) = .9544 (iv) P(−3 ≤ Z ≤ 3) = .9974
Critical Values: $z_\alpha$ of the standard normal distribution are given by $P(Z \ge z_\alpha) = \alpha$, which is in the tail of the distribution.
Examples. Find $z_0$ such that
(i) $P(Z > z_0) = .10$; $z_0 = 1.28$.
(ii) $P(Z > z_0) = .05$; $z_0 = 1.645$.
(iii) $P(Z > z_0) = .025$; $z_0 = 1.96$.
(iv) $P(Z > z_0) = .01$; $z_0 = 2.33$.
(v) $P(Z > z_0) = .005$; $z_0 = 2.58$.
(vi) $P(Z \le z_0) = .10, .05, .025, .01, .005$. (Exercise)
Normal. A rv X is said to have a Normal pdf with parameters µ and σ if
Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty,$$
where −∞ < µ < ∞, 0 < σ < ∞.
Properties. Mean: E[X] = µ. Variance: V(X) = σ². Graph: Bell shaped.
Standardizing a normal r.v.:

Z-score:
$$Z = \frac{X - \mu_X}{\sigma_X}$$
or (simply)
$$Z = \frac{X - \mu}{\sigma}.$$
Conversely, X = µ + σZ.
Example. If X is a normal rv with parameters µ = 3 and σ² = 9, find (i) P(2 < X < 5), (ii) P(X > 0), and (iii) P(X > 9).
Solution (i) P(2 < X < 5) = P(−0.33 < Z < 0.67) = .3779.
(ii) P(X > 0) = P(Z > −1) = P(Z < 1) = .8413.
(iii) P(X > 9) = P(Z > 2.0) = 0.5 − 0.4772 = .0228.
Exercise. Refer to the above example; find P(X < −3).
Example. The length of life of a certain type of automatic washer is approximately normally distributed, with a mean of 3.1 years and standard deviation of 1.2 years. If this type of washer is guaranteed for 1 year, what fraction of original sales will require replacement?
Solution. Let X be the length of life of an automatic washer selected at random; then
$$z = \frac{1 - 3.1}{1.2} = -1.75.$$
Therefore P(X < 1) = P(Z < −1.75).

Exercise: Complete the solution of this problem.
Normal Approximation to the Binomial Distribution.
When and how to use the normal approximation:
1. Large n, i.e. np ≥ 5 and n(1 − p) ≥ 5.
2. The approximation can be improved using correction factors.
Example. Let X be the number of times that a fair coin, flipped 40 times, lands heads. (i) Find the probability that X = 20. (ii) Find P(10 ≤ X ≤ 20). Use the normal approximation.
Solution. Note that np = 20 and np(1 − p) = 10.
(i)
$$P(X = 20) = P(19.5 < X < 20.5) = P\left(\frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right) \approx P(-0.16 < Z < 0.16) = .1272.$$
The exact result is
$$P(X = 20) = \binom{40}{20}(0.5)^{20}(0.5)^{20} = .1254.$$
(ii) Exercise.
3 Uniform: U[a, b]
Formula:
$$f(x) = \begin{cases} \frac{1}{b-a}, & a < x < b \\ 0, & \text{elsewhere} \end{cases}$$
Mean: µ = (a + b)/2
Variance: σ² = (b − a)²/12, $\sigma = (b - a)/\sqrt{12}$
CDF (area between a and c):
$$P(X \le c) = \begin{cases} 0, & c \le a \\ \frac{c-a}{b-a}, & a \le c \le b \\ 1, & c \ge b \end{cases}$$
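The coin-flipping example can be checked in Python with `statistics.NormalDist` (Python 3.8+). Using the unrounded z = 0.5/√10 ≈ 0.158 gives an approximation of about 0.1256; the printed .1272 comes from rounding z to 0.16 before reading the table. The helper `uniform_cdf` is a hypothetical illustration of the piecewise uniform CDF, evaluated on an assumed interval [2, 6].

```python
import math
from statistics import NormalDist

# Normal approximation to the binomial: n = 40 flips of a fair coin
n, p = 40, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))      # 20 and sqrt(10)

exact = math.comb(n, 20) * p**20 * (1 - p) ** 20   # exact binomial P(X = 20)
Y = NormalDist(mu, sigma)
approx = Y.cdf(20.5) - Y.cdf(19.5)                 # with the continuity correction

# Uniform U[a, b]: P(X <= c) is the area between a and c
def uniform_cdf(c, a, b):
    if c <= a:
        return 0.0
    if c >= b:
        return 1.0
    return (c - a) / (b - a)

print(round(exact, 4), round(approx, 4))
print(uniform_cdf(3, 2, 6), uniform_cdf(1, 2, 6), uniform_cdf(7, 2, 6))
```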

Exercise. Specialize the above results to the Uniform[0, 1] case.
4 Exponential
The exponential pdf often arises, in practice, as being the distribution of the amount of time until some specific event occurs. Examples include time until a new car breaks down, time until an arrival at an emergency room, etc.
A rv X is said to have an exponential pdf with parameter λ > 0 if
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$
Properties. Mean: µ = 1/λ. Variance: σ² = 1/λ², σ = 1/λ.
CDF: $P(X \le a) = 1 - e^{-\lambda a}$, so $P(X > a) = e^{-\lambda a}$.
Example 1. Suppose that the length of a phone call in minutes is an exponential rv with parameter λ = 1/10. If someone arrives immediately ahead of you at a public telephone booth, find the probability that you will have to wait (i) more than 10 minutes, and (ii) between 10 and 20 minutes.
Solution. Let X be the length of a phone call in minutes by the person ahead of you.
(i) $P(X > 10) = e^{-10\lambda} = e^{-1} \approx 0.368$
(ii) $P(10 < X < 20) = e^{-1} - e^{-2} \approx 0.233$
Example 2. The amount of time, in hours, that a computer functions before breaking down is an exponential rv with λ = 1/100. (i) What is the probability that a computer will function between 50 and 150 hours before breaking down? (ii) What is the probability that it will function less than 100 hours?
Solution.

75)) (iv) z = −1.26 and z = 1.0708 (v) P (−z0 ≤ Z ≤ z0 ) = 0. P (−2.0 ≤ Z ≤ 2. P (0 ≤ Z ≤ 1.0 (i. Use the formula. 1.86 (i.6 (i.68 (vi) P (−z0 ≤ Z ≤ z0 ) = 0.6)) (ii) z = 0 and z = −1.e. P (−1.e. P (−1.0 (i.86)) (v) z = −1. Review Exercises: Normal Distribution Please show all work.0708 (iv) P (Z ≤ z0 ) = 0. P (−3.0 and z = 3.0 ≤ Z ≤ 3.0)) 2.26 ≤ Z ≤ 1. Converse The exponential distribution is the only continuous distribution with the memoryless property.384 (ii) Exercise. The exponential rv has the memoryless property.0 and z = 2.0)) (viii) z = −3.0 and z = 1.86)) (vi) z = −1.75 ≤ Z ≤ −.0 ≤ Z ≤ 1. P (. answer method whenever possible. Calculate the area under the standard normal curve between the following values.75 and z = −.6 (i.e. Let Z be a standard normal distribution. Find z0 such that (i) P (Z ≥ z0 ) = 0.86 and z = 1.05 (ii) P (Z ≥ z0 ) = 0.0 (i.e.(i) The probability that a computer will function between 50 and 150 hours before breaking down is given by P (50 ≤ X ≤ 150) = e−50/100 − e−150/100 = e−1/2 − e−3/2 .6 ≤ Z ≤ 0)) (iii) z = . Show your work graphically in all relevant questions. substitution. P (−1.e.86 (i.e. No credit for a correct ﬁnal answer without a valid argument.95 53 .86 ≤ Z ≤ 1.0)) (vii) z = −2. Memoryless Property FACT.e. (i) z = 0 and z = 1.e. P (−1.99 (iii) P (Z ≥ z0 ) = 0.75 (i.

3. Let Z be a standard normal distribution. Find $z_0$ such that (i) $P(Z \ge z_0) = 0.10$ (ii) $P(Z \ge z_0) = 0.05$ (iii) $P(Z \ge z_0) = 0.025$ (iv) $P(Z \ge z_0) = 0.01$ (v) $P(Z \ge z_0) = 0.005$
4. A normally distributed random variable X possesses a mean of µ = 10 and a standard deviation of σ = 5. Find the following probabilities.
(i) X falls between 10 and 12 (i.e. P(10 ≤ X ≤ 12)).
(ii) X falls between 6 and 14 (i.e. P(6 ≤ X ≤ 14)).
(iii) X is less than 12 (i.e. P(X ≤ 12)).
(iv) X exceeds 10 (i.e. P(X ≥ 10)).
5. The height of adult women in the United States is normally distributed with mean 64.5 inches and standard deviation 2.4 inches.
(i) Find the probability that a randomly chosen woman is taller than 70 inches. (Answer: .011)
(ii) Alice is 71 inches tall. What percentage of women are shorter than Alice? (Answer: .9966)
6. The lifetimes of batteries produced by a firm are normally distributed with a mean of 100 hours and a standard deviation of 10 hours. What is the probability a randomly selected battery will last between 110 and 120 hours?
7. Answer by True or False. (Circle your choice.)
T F (i) The standard normal distribution has its mean and standard deviation equal to zero.
T F (ii) The standard normal distribution has its mean and standard deviation equal to one.
T F (iii) The standard normal distribution has its mean equal to one and standard deviation equal to zero.
T F (iv) The standard normal distribution has its mean equal to zero and standard deviation equal to one.
T F (v) Because the normal distribution is symmetric, half of the area under the curve lies below the 40th percentile.

T F (vi) The total area under the normal curve is equal to one only if the mean is equal to zero and standard deviation equal to one.
T F (vii) The normal distribution is symmetric only if the mean is zero and the standard deviation is one.
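Many of the table look-ups in this chapter can be reproduced with `statistics.NormalDist` and `math.exp` from the Python standard library (3.8+). This sketch covers the standard normal areas and critical values, the µ = 3, σ² = 9 example, and Example 1 for the exponential distribution. Software works with unrounded z-values, so some results differ from the printed table values in the last digit (for example 0.6827 rather than .6826, and about 0.3781 rather than .3779 for P(2 < X < 5)).

```python
import math
from statistics import NormalDist

Z = NormalDist()                                         # standard normal

# Standard normal areas and critical values
half_area = Z.cdf(1) - 0.5                               # P(0 <= Z <= 1)
central = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}   # P(-k <= Z <= k)
crit = {a: Z.inv_cdf(1 - a) for a in (0.10, 0.05, 0.025, 0.01, 0.005)}

# X normal with mu = 3 and sigma^2 = 9 (so sigma = 3)
X = NormalDist(mu=3, sigma=3)
p_2_to_5 = X.cdf(5) - X.cdf(2)                           # P(2 < X < 5)
p_above_0 = 1 - X.cdf(0)                                 # P(X > 0)
p_above_9 = 1 - Z.cdf((9 - X.mean) / X.stdev)            # standardize: P(Z > 2)

# Exponential phone calls, lam = 1/10: P(X > a) = e^(-lam a)
lam = 1 / 10
wait_over_10 = math.exp(-lam * 10)
wait_10_to_20 = math.exp(-lam * 10) - math.exp(-lam * 20)

print(round(half_area, 4), {k: round(v, 4) for k, v in central.items()})
print({a: round(z, 3) for a, z in crit.items()})
print(round(p_2_to_5, 4), round(p_above_0, 4), round(p_above_9, 4))
print(round(wait_over_10, 3), round(wait_10_to_20, 3))
```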

Chapter 5 Sampling Distributions
Contents. The Central Limit Theorem; The Sampling Distribution of the Sample Mean; The Sampling Distribution of the Sample Proportion; The Sampling Distribution of the Difference Between Two Sample Means; The Sampling Distribution of the Difference Between Two Sample Proportions.
1 The Central Limit Theorem (CLT)
Roughly speaking, the CLT says that, for large samples, the sampling distribution of the sample mean, $\bar{X}$, is approximately normal, so that
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
is approximately standard normal. Likewise, the sampling distribution of the sample proportion, $\hat{P}$, is approximately normal, so that
$$Z = \frac{\hat{P} - \mu_{\hat{P}}}{\sigma_{\hat{P}}}$$
is approximately standard normal.
2 Sampling Distributions
Suppose the distribution of X is normal with mean µ and standard deviation σ.
(i) What is the distribution of $\frac{X - \mu}{\sigma}$?
Answer: It is a standard normal, i.e.

$$Z = \frac{X - \mu}{\sigma}.$$
I. The Sampling Distribution of the Sample Mean
(ii) What is the mean (expected value) and standard deviation of $\bar{X}$?
Answer:
$$\mu_{\bar{X}} = E(\bar{X}) = \mu, \qquad \sigma_{\bar{X}} = S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$
(iii) What is the sampling distribution of the sample mean $\bar{X}$?
Answer: The distribution of $\bar{X}$ is a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$; equivalently,
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
(iv) What is the sampling distribution of the sample mean, $\bar{X}$, if X is not normally distributed?
Answer: The distribution of $\bar{X}$ is approximately a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$, provided n is large (i.e. n ≥ 30).
Example. Consider a population, X, with mean µ = 4 and standard deviation σ = 3. A sample of size 36 is to be selected.
(i) What is the mean and standard deviation of $\bar{X}$?
(ii) Find $P(4 < \bar{X} < 5)$.
(iii) Find $P(\bar{X} > 3.5)$. (exercise)
(iv) Find $P(3.5 \le \bar{X} \le 4.5)$. (exercise)
II. The Sampling Distribution of the Sample Proportion
Suppose the distribution of X is binomial with parameters n and p.
(ii) What is the mean (expected value) and standard deviation of $\hat{P}$?
Answer:
$$\mu_{\hat{P}} = E(\hat{P}) = p$$

$\sigma_{\hat P} = S.E.(\hat P) = \sqrt{\frac{pq}{n}}$
(iii) What is the sampling distribution of the sample proportion $\hat P$?
Answer: $\hat P$ has a normal distribution with mean p and standard deviation $\sqrt{pq/n}$, or equivalently
$Z = \frac{\hat P - \mu_{\hat P}}{\sigma_{\hat P}} = \frac{\hat P - p}{\sqrt{pq/n}}$
provided n is large (i.e. np ≥ 5 and nq ≥ 5).

Example. It is claimed that at least 30% of all adults favor brand A versus brand B. To test this theory a sample of n = 400 is selected. Suppose 130 individuals indicated a preference for brand A.
DATA SUMMARY: n = 400, x = 130, p = .30, $\hat p$ = 130/400 = .325
(i) Find the mean and standard deviation of the sample proportion $\hat P$.
Answer: $\mu_{\hat p} = p = .30$ and $\sigma_{\hat p} = \sqrt{pq/n} = .023$
(ii) Find $P(\hat P > 0.30)$.

III. Comparing two Sample Means

$E(\bar X_1 - \bar X_2) = \mu_1 - \mu_2$
$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
$Z = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
provided n1, n2 ≥ 30.
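The probability calculations in Sections I and II can be checked numerically. The sketch below is a minimal example. It assumes Python 3.8+, so that the standard library's `statistics.NormalDist` is available. The helper names `prob_mean_between` and `proportion_se` are illustrative only and do not come from the notes.

For the population with µ = 4, σ = 3 and n = 36, the sketch reproduces $\sigma_{\bar X} = 0.5$ and computes $P(4 < \bar X < 5) = P(0 < Z < 2)$. For the brand A example, it computes the standard error $\sqrt{pq/n} \approx .023$.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, mean 0 and sd 1


def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo < Xbar < hi), treating Xbar as normal with mean mu and sd sigma/sqrt(n)."""
    se = sigma / sqrt(n)
    return Z.cdf((hi - mu) / se) - Z.cdf((lo - mu) / se)


def proportion_se(p, n):
    """Standard deviation of the sample proportion, sqrt(pq/n)."""
    return sqrt(p * (1 - p) / n)


# Section I example: mu = 4, sigma = 3, n = 36
se_mean = 3 / sqrt(36)
p_4_5 = prob_mean_between(4, 3, 36, 4, 5)

# Section II example: p = .30, n = 400
se_prop = proportion_se(0.30, 400)
p_above = 1 - Z.cdf((0.30 - 0.30) / se_prop)

print(f"sd of Xbar = {se_mean:.2f}, P(4 < Xbar < 5) = {p_4_5:.4f}")
print(f"sd of Phat = {se_prop:.3f}, P(Phat > 0.30) = {p_above:.2f}")
```

In the brand A example, .30 is the mean of $\hat P$, so $P(\hat P > 0.30)$ comes out as exactly one half.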

IV. Comparing two Sample Proportions

$E(\hat P_1 - \hat P_2) = p_1 - p_2$
$\sigma_{\hat P_1 - \hat P_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
$Z = \frac{\hat P_1 - \hat P_2 - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
provided n1 and n2 are large.

Review Exercises: Sampling Distributions

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A normally distributed random variable X possesses a mean of µ = 20 and a standard deviation of σ = 5. A random sample of n = 16 observations is to be selected. Let $\bar X$ be the sample average.
(i) Describe the sampling distribution of $\bar X$ (i.e. describe the distribution of $\bar X$ and give $\mu_{\bar x}$, $\sigma_{\bar x}$). (Answer: $\mu_{\bar x}$ = 20, $\sigma_{\bar x}$ = 1.25)
(ii) Find the z-score of $\bar x$ = 22. (Answer: 1.6)
(iii) Find $P(\bar X \ge 22)$.
(iv) Find $P(20 \le \bar X \le 22)$.
(v) Find $P(16 \le \bar X \le 19)$.
(vi) Find $P(\bar X \ge 23)$.
(vii) Find $P(\bar X \ge 18)$.

2. The number of trips to a doctor's office per family per year in a given community is known to have a mean of 10 with a standard deviation of 3. Suppose a random sample of 49 families is taken and a sample mean is calculated.
(i) Describe the sampling distribution of the sample mean, $\bar X$. (Include the mean $\mu_{\bar x}$, the standard deviation $\sigma_{\bar x}$, and the type of distribution.)

(ii) Find the probability that the sample mean, $\bar X$, does not exceed 9. (Answer: .01)
(iii) Find the probability that the sample mean, $\bar X$, does not exceed 11. (Answer: .99)

3. When a random sample of size n is drawn from a normal population with mean µ and variance σ², the sampling distribution of the sample mean $\bar X$ will be
(a) exactly normal
(b) approximately normal
(c) binomial
(d) none of the above

4. Answer by True or False. (Circle your choice).
T F (i) The central limit theorem applies regardless of the shape of the population frequency distribution.
T F (ii) The central limit theorem is important because it explains why some estimators tend to possess, approximately, a normal distribution.
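True/False item (i) above can be explored by simulation. The following sketch is not part of the original notes. It assumes an exponential population with mean 1 and standard deviation 1; this strongly right-skewed shape is chosen only for illustration. The sketch draws many samples of size n = 36 and checks three things that should hold if $\bar X$ is close to normal:
- the sample means average out close to µ;
- their standard deviation is close to $\sigma/\sqrt n$;
- roughly 95% of them lie within 1.96 standard errors of µ.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, sd 1, strongly right-skewed) and return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n = 36
means = simulate_sample_means(n=n, reps=5000)
se = 1 / sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Share of sample means within 1.96 standard errors of mu = 1
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

print(f"mean of sample means: {mean(means):.3f} (theory 1.000)")
print(f"sd of sample means:   {stdev(means):.3f} (theory {se:.3f})")
print(f"share within 1.96 SE: {inside:.3f} (about 0.95 if Xbar is near-normal)")
```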

Chapter 6 Large Sample Estimation

Contents.
1. Introduction
2. Point Estimators and Their Properties
3. Single Quantitative Population
4. Single Binomial Population
5. Two Quantitative Populations
6. Two Binomial Populations
7. Choosing the Sample Size

1 Introduction

Types of estimators.
1. Point estimator
2. Interval estimator: (L, U)

Desired Properties of Point Estimators.
(i) Unbiased: Mean of the sampling distribution is equal to the parameter.
(ii) Minimum variance: Small standard error of point estimator.
(iii) Error of estimation: distance between a parameter and its point estimate is small.

Desired Properties of Interval Estimators.
(i) Confidence coefficient: P(interval estimator will enclose the parameter) = 1 − α should be as high as possible.
(ii) Confidence level: Confidence coefficient expressed as a percentage.
(iii) Margin of Error (Bound on the error of estimation): should be as small as possible.

Parameters of Interest.

Single Quantitative Population: µ
Single Binomial Population: p
Two Quantitative Populations: µ1 − µ2
Two Binomial Populations: p1 − p2

2 Point Estimators and Their Properties

Parameter of interest: θ
Sample data: n, $\hat\theta$, $\sigma_{\hat\theta}$
Point estimator: $\hat\theta$
Estimator mean: $\mu_{\hat\theta} = \theta$ (Unbiased)
Standard error: $SE(\hat\theta) = \sigma_{\hat\theta}$
Assumptions: Large sample + others (to be specified in each case)

3 Single Quantitative Population

Parameter of interest: µ
Sample data: n, $\bar x$, s
Other information: α
Point estimator: $\bar x$
Estimator mean: $\mu_{\bar x} = \mu$
Standard error: $SE(\bar x) = \sigma/\sqrt n$ (also denoted as $\sigma_{\bar x}$)
Confidence Interval (C.I.) for µ: $\bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt n}$
Confidence level: (1 − α)100%, which is the probability that the interval estimator contains the parameter.
Margin of Error (or Bound on the Error of Estimation): $B = z_{\alpha/2} \frac{\sigma}{\sqrt n}$
Assumptions.
1. Large sample (n ≥ 30)
2. Sample is randomly selected
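The large-sample interval and margin of error above translate directly into code. This is a minimal sketch assuming Python 3.8+ (`statistics.NormalDist`). The names `z_critical` and `mean_ci` are hypothetical helpers, and s is used in place of σ, as in Example 1.

Applied to the airline data of Example 1 (n = 225, $\bar x$ = 11.6, s = 4.1), it gives B ≈ 0.536 at 95% and the 90% interval (11.15, 12.05).

```python
from math import sqrt
from statistics import NormalDist


def z_critical(conf):
    """z_{alpha/2} for a two-sided interval with confidence level conf = 1 - alpha."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(xbar, s, n, conf):
    """Large-sample CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).

    s stands in for sigma (large-sample assumption, n >= 30).
    Returns (lower, upper, margin_of_error)."""
    b = z_critical(conf) * s / sqrt(n)
    return xbar - b, xbar + b, b


# Airline data of Example 1: n = 225, xbar = 11.6, s = 4.1
_, _, b95 = mean_ci(11.6, 4.1, 225, 0.95)
lo, hi, b90 = mean_ci(11.6, 4.1, 225, 0.90)
print(f"95% margin of error B = {b95:.4f}")
print(f"90% CI = ({lo:.2f}, {hi:.2f}), width W = {hi - lo:.2f}")
```

The exact quantiles (1.95996..., 1.64485...) differ slightly from the table values 1.96 and 1.645. This is why the results agree with the hand calculations only to the number of decimals reported.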

Example 1. We are interested in estimating the mean number of unoccupied seats per flight, µ, for a major airline. A random sample of n = 225 flights shows that the sample mean is 11.6 and the standard deviation is 4.1.
Data summary: n = 225; $\bar x$ = 11.6; s = 4.1.

Question 1. What is the point estimate of µ (do not give the margin of error)?
$\bar x$ = 11.6

Question 2. Give a 95% bound on the error of estimation (also known as the margin of error).
$B = z_{\alpha/2} \frac{\sigma}{\sqrt n} = 1.96 \frac{4.1}{\sqrt{225}} = 0.5357$

Question 3. Find a 90% confidence interval for µ.
$\bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt n}$
$11.6 \pm 1.645 \frac{4.1}{\sqrt{225}}$
11.6 ± 0.45 = (11.15, 12.05)

Question 4. Interpret the CI found in Question 3.
The interval contains µ with probability 0.90.
OR
If repeated sampling is used, then 90% of the CIs constructed would contain µ.

Question 5. What is the width of the CI found in Question 3?
The width of the CI is
$W = 2 z_{\alpha/2} \frac{\sigma}{\sqrt n}$
W = 2(0.45) = 0.90
OR
W = 12.05 − 11.15 = 0.90

Question 6. If n, the sample size, is increased, what happens to the width of the CI? What happens to the margin of error?
The width of the CI decreases.
The margin of error decreases.

Sample size:
$n \ge \frac{(z_{\alpha/2})^2 \sigma^2}{B^2}$

where σ is estimated by s.
Note: In the absence of data, σ is sometimes approximated by R/4, where R is the range.

Example 2. Suppose you want to construct a 99% CI for µ so that W = 0.05. You are told that preliminary data shows a range from 13.3 to 13.7. What sample size should you choose?
A. Data summary: α = .01; R = 13.7 − 13.3 = .4; so σ ≈ .4/4 = .1. Now B = W/2 = 0.05/2 = 0.025. Therefore
$n \ge \frac{(z_{\alpha/2})^2 \sigma^2}{B^2} = \frac{2.58^2 (.1)^2}{0.025^2} = 106.5$
So n = 107. (round up)

Exercise 1. Find the sample size necessary to reduce W in the flight example to .6. Use α = 0.05.

4 Single Binomial Population

Parameter of interest: p
Sample data: n, x, $\hat p = x/n$ (x here is the number of successes).
Other information: α
Point estimator: $\hat p$
Estimator mean: $\mu_{\hat p} = p$
Standard error: $\sigma_{\hat p} = \sqrt{\frac{pq}{n}}$
Confidence Interval (C.I.) for p: $\hat p \pm z_{\alpha/2} \sqrt{\frac{\hat p \hat q}{n}}$
Confidence level: (1 − α)100%, which is the probability that the interval estimator contains the parameter.
Margin of Error. $B = z_{\alpha/2} \sqrt{\frac{\hat p \hat q}{n}}$

Assumptions.
1. Large sample (np ≥ 5; nq ≥ 5)
2. Sample is randomly selected

Example 3. A random sample of n = 484 voters in a community produced x = 257 voters in favor of candidate A.
Data summary: n = 484; x = 257; $\hat p = x/n = 257/484 = 0.531$.

Question 1. Do we have a large sample size?
$n\hat p$ = 484(0.531) = 257, which is ≥ 5.
$n\hat q$ = 484(0.469) = 227, which is ≥ 5.
Therefore we have a large sample size.

Question 2. What is the point estimate of p and its margin of error?
$\hat p = \frac{x}{n} = \frac{257}{484} = 0.531$
$B = z_{\alpha/2} \sqrt{\frac{\hat p \hat q}{n}} = 1.96 \sqrt{\frac{(0.531)(0.469)}{484}} = 0.044$

Question 3. Find a 90% confidence interval for p.
$\hat p \pm z_{\alpha/2} \sqrt{\frac{\hat p \hat q}{n}}$
$0.531 \pm 1.645 \sqrt{\frac{(0.531)(0.469)}{484}}$
0.531 ± 0.037 = (0.494, 0.568)

Question 4. Interpret the CI found in Question 3.
The interval contains p with probability 0.90.
OR
If repeated sampling is used, then 90% of the CIs constructed would contain p.

Question 5. What is the width of the CI found in Question 3?
The width of the CI is
$W = 2 z_{\alpha/2} \sqrt{\frac{\hat p \hat q}{n}} = 2(0.037) = 0.074$

Question 6. If n, the sample size, is increased, what happens to the width of the CI? What happens to the margin of error?

The width of the CI decreases.
The margin of error decreases.

Sample size.
$n \ge \frac{(z_{\alpha/2})^2 (\hat p \hat q)}{B^2}$
Note: In the absence of data, choose $\hat p = \hat q = 0.5$ or simply $\hat p \hat q = 0.25$.

Example 4. Suppose you want to provide an accurate estimate of customers preferring one brand of coffee over another. You need to construct a 95% CI for p so that B = 0.015. You are told that preliminary data shows $\hat p$ = 0.35. What sample size should you choose? Use α = 0.05.
Data summary: α = .05; $\hat p$ = 0.35; B = 0.015
$n \ge \frac{(z_{\alpha/2})^2 (\hat p \hat q)}{B^2} = \frac{(1.96)^2 (0.35)(0.65)}{0.015^2} = 3{,}884.28$
So n = 3,885. (round up)

Exercise 2. Suppose that no preliminary estimate of p is available. Find the new sample size. Use α = 0.05.
Exercise 3. Suppose that no preliminary estimate of p is available. Find the sample size necessary so that α = 0.01.

5 Two Quantitative Populations

Parameter of interest: µ1 − µ2
Sample data:
Sample 1: n1, $\bar x_1$, s1
Sample 2: n2, $\bar x_2$, s2
Point estimator: $\bar X_1 - \bar X_2$
Estimator mean: $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$
Standard error: $SE(\bar X_1 - \bar X_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence Interval.
$(\bar x_1 - \bar x_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Assumptions.
1. Large samples (n1 ≥ 30; n2 ≥ 30)
2. Samples are randomly selected
3. Samples are independent

Sample size.
$n \ge \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{B^2}$

6 Two Binomial Populations

Parameter of interest: p1 − p2
Sample 1: n1, x1, $\hat p_1 = x_1/n_1$
Sample 2: n2, x2, $\hat p_2 = x_2/n_2$
p1 − p2 (unknown parameter)
α (significance level)
Point estimator: $\hat p_1 - \hat p_2$
Estimator mean: $\mu_{\hat p_1 - \hat p_2} = p_1 - p_2$
Estimated standard error: $\sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$
Confidence Interval.
$(\hat p_1 - \hat p_2) \pm z_{\alpha/2} \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$

Assumptions.
1. Large samples (n1p1 ≥ 5, n1q1 ≥ 5, n2p2 ≥ 5, n2q2 ≥ 5)
2. Samples are randomly and independently selected

Sample size.
$n \ge \frac{(z_{\alpha/2})^2 (\hat p_1 \hat q_1 + \hat p_2 \hat q_2)}{B^2}$
For unknown parameters:
$n \ge \frac{(z_{\alpha/2})^2 (0.5)}{B^2}$

Review Exercises: Large-Sample Estimation

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A random sample of size n = 100 is selected from a quantitative population. The data produced a mean and standard deviation of $\bar x$ = 75 and s = 6 respectively.
(i) Estimate the population mean µ, and give a 95% bound on the error of estimation (or margin of error). (Answer: B = 1.18)
(ii) Find a 99% confidence interval for the population mean. (Answer: B = 1.55)
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half. (Answer: n = 400)

2. An examination of the yearly premiums for a random sample of 80 automobile insurance policies from a major company showed an average of $329 and a standard deviation of $49.
(i) Give the point estimate of the population parameter µ and a 99% bound on the error of estimation (margin of error). (Answer: B = 14.135)
(ii) Construct a 99% confidence interval for µ.
(iii) Suppose we wish our estimate in (i) to be accurate to within $5 with 95% confidence. How many insurance policies should be sampled to achieve the desired level of accuracy? (Answer: n = 369)

3. Suppose we wish to estimate the average daily yield of a chemical manufactured in a chemical plant. The daily yield recorded for n = 100 days produces a mean and standard deviation of $\bar x$ = 870 and s = 20 tons respectively.
(i) Estimate the average daily yield µ, and give a 95% bound on the error of estimation (or margin of error).
(ii) Find a 99% confidence interval for the population mean.
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half.

4. Answer by True or False. (Circle your choice).
T F (i) If the population variance increases and other factors are the same, the width of the confidence interval for the population mean tends to increase.

T F (ii) As the sample size increases, the width of the confidence interval for the population mean tends to decrease.
T F (iii) Populations are characterized by numerical descriptive measures called statistics.
T F (iv) If, for a given C.I., α is increased, then the margin of error will increase.
T F (v) The sample standard deviation s can be used to approximate σ when n is larger than 30.
T F (vi) The sample mean always lies above the population mean.
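The sample-size formulas and the proportion interval of this chapter can be collected into a short sketch. It assumes Python 3.8+, and the function names are illustrative. Sample sizes are rounded up with `math.ceil`, as in the worked examples. The z values are passed in explicitly so that the table values used in the notes (2.58, 1.96) can be reproduced.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_for_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= B, i.e. n >= z^2 sigma^2 / B^2, rounded up."""
    return ceil((z * sigma / bound) ** 2)


def n_for_proportion(z, p_hat, bound):
    """n >= z^2 p_hat q_hat / B^2, rounded up; use p_hat = 0.5 if no estimate is available."""
    return ceil(z ** 2 * p_hat * (1 - p_hat) / bound ** 2)


def proportion_ci(x, n, conf):
    """Large-sample CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat q_hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    b = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - b, p_hat + b


print(n_for_mean(2.58, 0.1, 0.025))         # Example 2: 107
print(n_for_proportion(1.96, 0.35, 0.015))  # Example 4: 3885
print(n_for_mean(1.96, 49, 5))              # Review Exercise 2(iii): 369
lo, hi = proportion_ci(257, 484, 0.90)      # Example 3
print(f"90% CI for p: ({lo:.3f}, {hi:.3f})")
```

One caution: when $z^2\sigma^2/B^2$ is an exact integer, floating-point error could make `ceil` round up by one too many. None of the examples in the notes is affected.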

Chapter 7 Large-Sample Tests of Hypothesis

Contents.
1. Elements of a statistical test
2. A Large-sample statistical test
3. Testing a population mean
4. Testing a population proportion
5. Testing the difference between two population means
6. Testing the difference between two population proportions
7. Reporting results of statistical tests: p-Value

1 Elements of a Statistical Test

Null hypothesis: H0
Alternative (research) hypothesis: Ha
Test statistic:
Rejection region: reject H0 if ...
Graph:
Decision: either "Reject H0" or "Do not reject H0"
Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to "favor Ha".
Comments:
* H0 represents the status quo.
* Ha is the hypothesis that we want to provide evidence to justify. We show that Ha is true by showing that H0 is false, that is, proof by contradiction.
Type I error ≡ {reject H0 | H0 is true}

Type II error ≡ {do not reject H0 | H0 is false}
α = Prob{Type I error}
β = Prob{Type II error}
Power of a statistical test: Prob{reject H0 | H0 is false} = 1 − β

Example 1.
H0: Innocent
Ha: Guilty
α = Prob{sending an innocent person to jail}
β = Prob{letting a guilty person go free}

Example 2.
H0: New drug is not acceptable
Ha: New drug is acceptable
α = Prob{marketing a bad drug}
β = Prob{not marketing an acceptable drug}

2 A Large-Sample Statistical Test

Parameter of interest: θ
Sample data: n, $\hat\theta$, $\sigma_{\hat\theta}$
Test:
Null hypothesis (H0): θ = θ0
Alternative hypothesis (Ha): 1) θ > θ0; 2) θ < θ0; 3) θ ≠ θ0
Test statistic (TS): $z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$
Critical value: either $z_\alpha$ or $z_{\alpha/2}$
Rejection region (RR):
1) Reject H0 if $z > z_\alpha$
2) Reject H0 if $z < -z_\alpha$
3) Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject H0"
2) if observed value is not in RR: "Do not reject H0"

Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to ··· .
Assumptions: Large sample + others (to be specified in each case).
One tailed statistical test:
Upper (right) tailed test
Lower (left) tailed test
Two tailed statistical test

3 Testing a Population Mean

Parameter of interest: µ
Sample data: n, $\bar x$, s
Other information: µ0 = target value, α
Test:
H0: µ = µ0
Ha: 1) µ > µ0; 2) µ < µ0; 3) µ ≠ µ0
T.S.: $z = \frac{\bar x - \mu_0}{\sigma/\sqrt n}$
Rejection region (RR):
1) Reject H0 if $z > z_\alpha$
2) Reject H0 if $z < -z_\alpha$
3) Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject H0"
2) if observed value is not in RR: "Do not reject H0"
Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to "favor Ha".
Assumptions:
Large sample (n ≥ 30)
Sample is randomly selected

Example: Test the hypothesis that weight loss in a new diet program exceeds 20 pounds during the first month.
Sample data: n = 36, $\bar x$ = 21, s² = 25, µ0 = 20, α = 0.05
H0: µ = 20 (µ is not larger than 20)

Ha: µ > 20 (µ is larger than 20)
T.S.: $z = \frac{\bar x - \mu_0}{s/\sqrt n} = \frac{21 - 20}{5/\sqrt{36}} = 1.2$
Critical value: $z_\alpha$ = 1.645
RR: Reject H0 if z > 1.645
Graph:
Decision: Do not reject H0
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in a new diet program exceeds 20 pounds per first month.
Exercise: Test the claim that weight loss is not equal to 19.5.

4 Testing a Population Proportion

Parameter of interest: p (unknown parameter)
Sample data: n and x (or $\hat p = x/n$)
p0 = target value
α (significance level)
Test:
H0: p = p0
Ha: 1) p > p0; 2) p < p0; 3) p ≠ p0
T.S.: $z = \frac{\hat p - p_0}{\sqrt{p_0 q_0/n}}$
RR:
1) Reject H0 if $z > z_\alpha$
2) Reject H0 if $z < -z_\alpha$
3) Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject H0"
2) if observed value is not in RR: "Do not reject H0"
Conclusion: At (α)100% significance level there is (in)sufficient statistical evidence to "favor Ha".
Assumptions:

1. Large sample (np ≥ 5, nq ≥ 5)
2. Sample is randomly selected

Example. Test the hypothesis that p > .10 for sample data: n = 200, x = 26.
Solution.
$\hat p = x/n = 26/200 = .13$
Now
H0: p = .10
Ha: p > .10
TS: $z = \frac{\hat p - p_0}{\sqrt{p_0 q_0/n}} = \frac{.13 - .10}{\sqrt{(.10)(.90)/200}} = 1.41$
RR: reject H0 if z > 1.645
Graph:
Dec: Do not reject H0
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that p > .10.
Exercise: Is the large sample assumption satisfied here?

5 Comparing Two Population Means

Parameter of interest: µ1 − µ2
Sample data:
Sample 1: n1, $\bar x_1$, s1
Sample 2: n2, $\bar x_2$, s2
Test:
H0: µ1 − µ2 = D0
Ha: 1) µ1 − µ2 > D0; 2) µ1 − µ2 < D0; 3) µ1 − µ2 ≠ D0
T.S.: $z = \frac{(\bar x_1 - \bar x_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
RR:
1) Reject H0 if $z > z_\alpha$
2) Reject H0 if $z < -z_\alpha$

3) Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
1. Large samples (n1 ≥ 30; n2 ≥ 30)
2. Samples are randomly selected
3. Samples are independent

Example: (Comparing two weight loss programs) Refer to the weight loss example. Test the hypothesis that weight loss in the two diet programs is different.
1. Sample 1: n1 = 36, $\bar x_1$ = 21, $s_1^2$ = 25 (old)
2. Sample 2: n2 = 36, $\bar x_2$ = 18.5, $s_2^2$ = 24 (new)
D0 = 0, α = 0.05
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
T.S.: $z = \frac{(\bar x_1 - \bar x_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = 2.14$
Critical value: $z_{\alpha/2}$ = 1.96
RR: Reject H0 if z > 1.96 or z < −1.96
Graph:
Decision: Reject H0
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss in the two diet programs is different.
Exercise: Test the hypothesis that weight loss in the old diet program exceeds that of the new program.
Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 1.

6 Comparing Two Population Proportions

Parameter of interest: p1 − p2
Sample 1: n1, x1, $\hat p_1 = x_1/n_1$

Sample 2: n2, x2, $\hat p_2 = x_2/n_2$
p1 − p2 (unknown parameter)
Common estimate: $\hat p = \frac{x_1 + x_2}{n_1 + n_2}$
Test:
H0: p1 − p2 = 0
Ha: 1) p1 − p2 > 0; 2) p1 − p2 < 0; 3) p1 − p2 ≠ 0
T.S.: $z = \frac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\hat p \hat q \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
RR:
1) Reject H0 if $z > z_\alpha$
2) Reject H0 if $z < -z_\alpha$
3) Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
Large samples (n1p1 ≥ 5, n1q1 ≥ 5, n2p2 ≥ 5, n2q2 ≥ 5)
Samples are randomly and independently selected

Example: Test the hypothesis that p1 − p2 < 0 if it is known that the test statistic is z = −1.91.
Solution:
H0: p1 − p2 = 0
Ha: p1 − p2 < 0
TS: z = −1.91
RR: reject H0 if z < −1.645
Graph:
Dec: reject H0
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that p1 − p2 < 0.
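The large-sample test statistics of Sections 3, 4 and 6, together with the rejection-region rule, can be sketched in plain Python. The function names are illustrative, not from any library. The notes give no raw data for the two-proportion test, so `z_two_proportions` is included only to show the pooled-estimate formula.

The sketch reproduces z = 1.2 for the diet example and z ≈ 1.41 for the p > .10 example. Neither exceeds $z_{.05}$ = 1.645.

```python
from math import sqrt


def z_mean(xbar, mu0, s, n):
    """Large-sample statistic for H0: mu = mu0 (s used in place of sigma)."""
    return (xbar - mu0) / (s / sqrt(n))


def z_proportion(x, n, p0):
    """Statistic for H0: p = p0; the standard error uses p0, not p_hat."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def z_two_proportions(x1, n1, x2, n2):
    """Statistic for H0: p1 - p2 = 0, using the pooled estimate (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def reject(z, tail, z_crit):
    """Rejection-region rule; tail is 'upper', 'lower' or 'two'.
    Pass z_alpha for one tailed tests and z_{alpha/2} for two tailed tests."""
    if tail == "upper":
        return z > z_crit
    if tail == "lower":
        return z < -z_crit
    return abs(z) > z_crit


z1 = z_mean(21, 20, 5, 36)        # weight loss example
z2 = z_proportion(26, 200, 0.10)  # p > .10 example
print(f"diet: z = {z1:.2f}, reject H0: {reject(z1, 'upper', 1.645)}")
print(f"proportion: z = {z2:.2f}, reject H0: {reject(z2, 'upper', 1.645)}")
print(f"p1 - p2 < 0 example: reject H0: {reject(-1.91, 'lower', 1.645)}")
```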

Exercise: Repeat as a two tailed test.

7 Reporting Results of Statistical Tests: P-Value

Definition. The p-value for a test of a hypothesis is the smallest value of α for which the null hypothesis is rejected, i.e. for which the statistical results are significant. The p-value is also called the observed significance level.
Note: The p-value is the probability (when H0 is true) of obtaining a value of the test statistic as extreme as or more extreme than the actual sample value, in the direction supporting Ha.

Examples. Find the p-value in each case:
(i) Upper tailed test:
H0: θ = θ0
Ha: θ > θ0
TS: z = 1.76
p-value = .0392
(ii) Lower tailed test:
H0: θ = θ0
Ha: θ < θ0
TS: z = −1.86
p-value = .0314
(iii) Two tailed test:
H0: θ = θ0
Ha: θ ≠ θ0
TS: z = 1.76
p-value = 2(.0392) = .0784

Decision rule using p-value: (Important)
Reject H0 for all α > p-value

Review Exercises: Testing Hypothesis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A local pizza parlor advertises that their average time for delivery of a pizza is within 30 minutes of receipt of the order. The delivery time for a random sample of 64 orders was recorded, with a sample mean of 34 minutes and a standard deviation of 21 minutes.

(i) Is there sufficient evidence to conclude that the actual delivery time is larger than what is claimed by the pizza parlor? Use α = .05.
H0:
Ha:
T.S. (Answer: 1.52)
R.R.
Graph:
Dec:
Conclusion:
(ii) Test the hypothesis that Ha: µ ≠ 30.

2. Answer by True or False. (Circle your choice).
T F (v) If, for a given test, α is fixed and the sample size is increased, then β will increase.
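The p-values in the three examples of Section 7 follow from the standard normal CDF. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, is shown below. `p_value` and `reject_by_p` are illustrative names. The second one encodes the decision rule "reject H0 for all α > p-value".

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def p_value(z, tail):
    """p-value of an observed z statistic; tail is 'upper', 'lower' or 'two'."""
    if tail == "upper":
        return 1 - Phi(z)
    if tail == "lower":
        return Phi(z)
    return 2 * (1 - Phi(abs(z)))


def reject_by_p(p, alpha):
    """Decision rule from the notes: reject H0 for all alpha > p-value."""
    return alpha > p


for z, tail in [(1.76, "upper"), (-1.86, "lower"), (1.76, "two")]:
    p = p_value(z, tail)
    print(f"{tail:>5} tailed, z = {z:+.2f}: p-value = {p:.4f}, "
          f"reject at alpha = .05: {reject_by_p(p, 0.05)}")
```

The two tailed p-value is simply twice the one tailed value for the same |z|, which matches example (iii).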

Chapter 8 Small-Sample Tests of Hypothesis

Contents:
1. Introduction
2. Student's t distribution
3. Small-sample inferences about a population mean
4. Small-sample inferences about the difference between two means: Independent Samples
5. Small-sample inferences about the difference between two means: Paired Samples
6. Inferences about a population variance
7. Comparing two population variances

1 Introduction

When the sample size is small, we only deal with normal populations. For non-normal (e.g. binomial) populations, different techniques are necessary.

2 Student's t Distribution

RECALL
For small samples (n < 30) from normal populations, we have
$z = \frac{\bar x - \mu}{\sigma/\sqrt n}$
If σ is unknown, we use s instead, but then we no longer have a Z distribution.
Assumptions.

1. Sampled population is normal
2. Small random sample (n < 30)
3. σ is unknown
$t = \frac{\bar x - \mu}{s/\sqrt n}$

Properties of the t Distribution:
(i) It has n − 1 degrees of freedom (df)
(ii) Like the normal distribution, it has a symmetric mound-shaped probability distribution
(iii) More variable (flatter) than the normal distribution
(iv) The distribution depends on the degrees of freedom. Moreover, as n becomes larger, t converges to Z.
(v) Critical values (tail probabilities) are obtained from the t table

Examples.
(i) Find $t_{0.05,5}$ = 2.015
(ii) Find $t_{0.005,8}$ = 3.355
(iii) Find $t_{0.025,26}$ = 2.056

3 Small-Sample Inferences About a Population Mean

Parameter of interest: µ
Sample data: n, $\bar x$, s
Other information: µ0 = target value, α
Point estimator: $\bar x$
Estimator mean: $\mu_{\bar x} = \mu$
Estimated standard error: $\sigma_{\bar x} = s/\sqrt n$
Confidence Interval for µ: $\bar x \pm t_{\alpha/2, n-1} \left(\frac{s}{\sqrt n}\right)$
Test:
H0: µ = µ0
Ha: 1) µ > µ0; 2) µ < µ0; 3) µ ≠ µ0.
Critical value: either $t_{\alpha, n-1}$ or $t_{\alpha/2, n-1}$

T.S.: $t = \frac{\bar x - \mu_0}{s/\sqrt n}$
RR:
1) Reject H0 if $t > t_{\alpha, n-1}$
2) Reject H0 if $t < -t_{\alpha, n-1}$
3) Reject H0 if $t > t_{\alpha/2, n-1}$ or $t < -t_{\alpha/2, n-1}$
Graph:
Decision:
1) if observed value is in RR: "Reject H0"
2) if observed value is not in RR: "Do not reject H0"
Conclusion: At 100α% significance level there is (in)sufficient statistical evidence to "favor Ha".
Assumptions.
1. Small sample (n < 30)
2. Sample is randomly selected
3. Normal population
4. Unknown variance

Example. For the sample data given below, test the hypothesis that weight loss in a new diet program exceeds 20 pounds per first month.
Sample data: n = 25, $\bar x$ = 21.3, s² = 25, µ0 = 20, α = 0.05
Critical value: $t_{0.05,24}$ = 1.711
H0: µ = 20
Ha: µ > 20
T.S.: $t = \frac{\bar x - \mu_0}{s/\sqrt n} = \frac{21.3 - 20}{5/\sqrt{25}} = 1.3$
RR: Reject H0 if t > 1.711
Graph:
Decision: Do not reject H0
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in a new diet program exceeds 20 pounds per first month.
Exercise. Test the claim that weight loss is not equal to 19.5 (i.e. Ha: µ ≠ 19.5).

4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples

Parameter of interest: µ1 − µ2

Sample data:
Sample 1: n1, $\bar x_1$, s1
Sample 2: n2, $\bar x_2$, s2
Other information: D0 = target value, α
Point estimator: $\bar X_1 - \bar X_2$
Estimator mean: $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$
Assumptions.
1. Normal populations
2. Small samples (n1 < 30; n2 < 30)
3. Samples are randomly selected
4. Samples are independent
5. Variances are equal with common variance $\sigma^2 = \sigma_1^2 = \sigma_2^2$
Pooled estimator for σ:
$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
Estimator standard error:
$\sigma_{\bar X_1 - \bar X_2} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Reason:
$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Confidence Interval:
$(\bar x_1 - \bar x_2) \pm t_{\alpha/2, n_1+n_2-2} \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Test:
H0: µ1 − µ2 = D0

Ha: 1) µ1 − µ2 > D0; 2) µ1 − µ2 < D0; 3) µ1 − µ2 ≠ D0
T.S.: $t = \frac{(\bar x_1 - \bar x_2) - D_0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
RR:
1) Reject H0 if $t > t_{\alpha, n_1+n_2-2}$
2) Reject H0 if $t < -t_{\alpha, n_1+n_2-2}$
3) Reject H0 if $t > t_{\alpha/2, n_1+n_2-2}$ or $t < -t_{\alpha/2, n_1+n_2-2}$
Graph:
Decision:
Conclusion:

Example. (Comparison of two weight loss programs) Refer to the weight loss example. Test the hypothesis that weight loss in a new diet program is different from that of an old program. We are told that the observed value of the test statistic is 2.2, and we know that
1. Sample 1: n1 = 7
2. Sample 2: n2 = 8
α = 0.05
Solution.
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
T.S.: $t = \frac{(\bar x_1 - \bar x_2) - 0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.2$
Critical value: $t_{.025,13}$ = 2.160
RR: Reject H0 if t > 2.160 or t < −2.160
Graph:
Decision: Reject H0
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss in the two diet programs is different.
Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 0.
Minitab Commands: A two-sample t procedure with a pooled estimate of variance
MTB> twosample C1 C2;
SUBC>pooled;

SUBC> alternative 1.
Note: alternative: 1 = right-tailed; -1 = left-tailed; 0 = two-tailed.
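Below is a minimal sketch of the pooled two-sample t statistic of Section 4 in plain Python. `pooled_t` is an illustrative name, not the Minitab command. As test data it uses the tire wear measurements from the paired-samples example that follows, deliberately analysed as if the samples were independent. It reproduces the value t ≈ 0.57 quoted there, which shows why ignoring the pairing hides the difference.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, d0=0.0):
    """Two-sample t statistic with pooled variance (equal-variance assumption).
    Returns (t, degrees_of_freedom)."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance s^2 (divisor n - 1)
    s2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(s2) * sqrt(1 / n1 + 1 / n2)
    return (mean(sample1) - mean(sample2) - d0) / se, n1 + n2 - 2


# Tire wear data, treated here (incorrectly) as two independent samples
tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]
t, df = pooled_t(tire_a, tire_b)
print(f"pooled t = {t:.2f} on {df} df")
```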

5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples

Parameter of interest: µ1 − µ2 = µd
Sample of paired differences data:
Sample: n = number of pairs, $\bar d$ = sample mean, $s_d$
Other information: D0 = target value, α
Point estimator: $\bar d$
Estimator mean: $\mu_{\bar d} = \mu_d$
Assumptions.
1. Normal populations
2. Small samples (n1 < 30; n2 < 30)
3. Samples are randomly selected
4. Samples are paired (not independent)
Sample standard deviation of the sample of n paired differences:
$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar d)^2}{n - 1}}$
Estimator standard error: $\sigma_{\bar d} = s_d/\sqrt n$
Confidence Interval.
$\bar d \pm t_{\alpha/2, n-1} \, s_d/\sqrt n$
Test.
H0: µ1 − µ2 = D0 (equivalently, µd = D0)
Ha: 1) µ1 − µ2 = µd > D0; 2) µ1 − µ2 = µd < D0; 3) µ1 − µ2 = µd ≠ D0
T.S.: $t = \frac{\bar d - D_0}{s_d/\sqrt n}$
RR:
1) Reject H0 if $t > t_{\alpha, n-1}$
2) Reject H0 if $t < -t_{\alpha, n-1}$


3) Reject H0 if $t > t_{\alpha/2, n-1}$ or $t < -t_{\alpha/2, n-1}$
Graph:
Decision:
Conclusion:

Example. A manufacturer wishes to compare the wearing qualities of two different types of tires, A and B. For the comparison, a tire of type A and one of type B are randomly assigned and mounted on the rear wheels of each of five automobiles. The automobiles are then operated for a specified number of miles, and the amount of wear is recorded for each tire. These measurements are tabulated below.

Automobile   Tire A   Tire B
1            10.6     10.2
2            9.8      9.4
3            12.3     11.8
4            9.7      9.1
5            8.8      8.3
Means: $\bar x_1$ = 10.24, $\bar x_2$ = 9.76

Using the test of the previous section, we would have t = 0.57, resulting in an insignificant test, which is inconsistent with the data.

Automobile   Tire A   Tire B   d = A − B
1            10.6     10.2     .4
2            9.8      9.4      .4
3            12.3     11.8     .5
4            9.7      9.1      .6
5            8.8      8.3      .5
Means: $\bar x_1$ = 10.24, $\bar x_2$ = 9.76, $\bar d$ = .48

Q1: Provide a summary of the data in the above table.
Sample summary: n = 5, $\bar d$ = .48, $s_d$ = .0837
Q2: Do the data provide sufficient evidence to indicate a difference in average wear for the two tire types?
Test. (parameter µd = µ1 − µ2)
H0: µd = 0
Ha: µd ≠ 0
T.S.: $t = \frac{\bar d - D_0}{s_d/\sqrt n} = \frac{.48 - 0}{.0837/\sqrt 5} = 12.8$


RR: Reject H0 if t > 2.776 or t < −2.776 ($t_{.025,4}$ = 2.776)
Graph:
Decision: Reject H0
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that the average amount of wear for type A tires is different from that for type B tires.
Exercise. Construct a 99% confidence interval for the difference in average wear for the two tire types.

6 Inferences About a Population Variance

Chi-square distribution. When a random sample of size n is drawn from a normal population with mean µ and standard deviation σ, the sampling distribution of S² depends on n. The standardized distribution of S² is called the chi-square distribution and is given by
$X^2 = \frac{(n-1)s^2}{\sigma^2}$
Degrees of freedom (df): ν = n − 1
Graph: Non-symmetrical and depends on df
Critical values: using X² tables
Test.
H0: $\sigma^2 = \sigma_0^2$
Ha: $\sigma^2 \ne \sigma_0^2$ (two-tailed test).
T.S.: $X^2 = \frac{(n-1)s^2}{\sigma_0^2}$
RR: Reject H0 if $X^2 > X^2_{\alpha/2}$ or $X^2 < X^2_{1-\alpha/2}$, where X² is based on (n − 1) degrees of freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal population
2. Random sample
Example:

Use text.

7 Comparing Two Population Variances

F-distribution. When independent samples are drawn from two normal populations with equal variances, $S_1^2/S_2^2$ possesses a sampling distribution that is known as an F distribution. That is
$F = \frac{s_1^2}{s_2^2}$
Degrees of freedom (df): ν1 = n1 − 1; ν2 = n2 − 1
Graph: Non-symmetrical and depends on df
Critical values: using F tables
Test.
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test).
T.S.: $F = \frac{s_1^2}{s_2^2}$, where $s_1^2$ is the larger sample variance.
Note: $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}}$
RR: Reject H0 if $F > F_{\alpha/2}$, where $F_{\alpha/2}$ is based on (n1 − 1) and (n2 − 1) degrees of freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal populations
2. Independent random samples

Example. (Investment Risk) Investment risk is generally measured by the volatility of possible outcomes of the investment. The most common method for measuring investment volatility is by computing the variance (or standard deviation) of possible outcomes. Returns over the past 10 years for the first alternative and 8 years for the second alternative produced the following data:
Data Summary:
Investment 1: n1 = 10, $\bar x_1$ = 17.8%, $s_1^2$ = 3.21
Investment 2: n2 = 8, $\bar x_2$ = 17.8%, $s_2^2$ = 7.14
Both populations are assumed to be normally distributed.

Q1: Do the data present sufficient evidence to indicate that the risks for investments 1 and 2 are unequal?
Solution.
Test:
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test).
T.S.: $F = \frac{s_2^2}{s_1^2} = \frac{7.14}{3.21} = 2.22$ (the larger sample variance, $s_2^2$, is in the numerator)
RR: Reject H0 if $F > F_{\alpha/2}$, where $F_{\alpha/2, n_2-1, n_1-1} = F_{.025,7,9} = 4.20$
Graph:
Decision: Do not reject H0
Conclusion: At 5% significance level there is insufficient statistical evidence to indicate that the risks for investments 1 and 2 are unequal.
Exercise. Do the upper tail test. That is, Ha: $\sigma_1^2 > \sigma_2^2$.
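The paired-difference t statistic for the tire example and the variance-ratio F statistic for the investment example can both be checked with the standard library. This is a sketch under stated assumptions: `paired_t` and `variance_ratio` are illustrative names. The critical values 2.776 and 4.20 are hard-coded from the tables quoted above rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b, d0=0.0):
    """Paired-difference statistic t = (dbar - D0) / (s_d / sqrt(n)) with n - 1 df."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return (mean(d) - d0) / (stdev(d) / sqrt(n)), n - 1


def variance_ratio(s2_a, s2_b):
    """F statistic for H0: sigma1^2 = sigma2^2 with the larger sample variance on top."""
    return max(s2_a, s2_b) / min(s2_a, s2_b)


# Tire wear example (Section 5)
tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]
t, df = paired_t(tire_a, tire_b)
print(f"paired t = {t:.1f} on {df} df; reject (|t| > 2.776): {abs(t) > 2.776}")

# Investment risk example (Section 7): s1^2 = 3.21, s2^2 = 7.14
f = variance_ratio(3.21, 7.14)
print(f"F = {f:.2f}; reject (F > 4.20): {f > 4.20}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 4)` and `scipy.stats.f.ppf(0.975, 7, 9)` return these critical values instead of relying on the printed tables.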

Chapter 9 Analysis of Variance

Contents.
1. Introduction
2. One Way ANOVA: Completely Randomized Experimental Design
3. The Randomized Block Design

1 Introduction

Analysis of variance is a statistical technique used to compare more than two population means by isolating the sources of variability.

Example. Four groups of sales people for a magazine sales agency were subjected to different sales training programs. Because there were some dropouts during the training program, the number of trainees varied from program to program. At the end of the training programs, each salesperson was assigned a sales area from a group of sales areas that were judged to have equivalent sales potentials. The table below lists the number of sales made by each person in each of the four groups of sales people during the first week after completing the training program. Do the data present sufficient evidence to indicate a difference in the mean achievement for the four training programs?

Goal. Test whether the means are equal or not. That is
H0: µ1 = µ2 = µ3 = µ4
Ha: Not all means are equal

Definitions:
(i) Response: variable of interest or dependent variable (sales)
(ii) Factor: categorical variable or independent variable (training technique)
(iii) Treatment levels (factor levels): method of training; t = 4

Training Group
1       2       3       4
65      75      59      94
87      69      78      89
73      83      67      80
79      81      62      88
81      72      83
69      79      76
        90
$n_i$:        6       7       6       4       (n = 23)
$T_i$:        454     549     425     351     (GT = 1779)
$\bar T_i$:   75.67   78.43   70.83   87.75
parameter:    µ1      µ2      µ3      µ4

(iv) ANOVA: ANalysis Of VAriance
(v) N-Way ANOVA: studies N factors.
(vi) Experimental unit: (trainee)

2 One Way ANOVA: Completely Randomized Experimental Design

ANOVA Table
Source of error   df   SS        MS      F      p-value
Treatments        3    712.6     237.5   3.77
Error             19   1,196.6   63.0
Totals            22   1,909.2

Inferences about population means
Test.
H0: µ1 = µ2 = µ3 = µ4
Ha: Not all means are equal
T.S.: $F = \frac{MST}{MSE} = 3.77$, where F is based on (t − 1) and (n − t) df.
RR: Reject H0 if $F > F_{\alpha, t-1, n-t}$,
i.e. Reject H0 if $F > F_{0.05,3,19} = 3.13$

Graph:
Decision: Reject $H_0$.
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a difference in the mean achievement for the four training programs.

Assumptions.
1. Sampled populations are normal.
2. Independent random samples.
3. All $t$ populations have equal variances.

Computations.

| Training group | Observations | $n_i$ | $T_i$ | $\bar T_i$ | Parameter |
|---|---|---|---|---|---|
| 1 | $x_{11}, x_{12}, \dots, x_{16}$ | $n_1$ | $T_1$ | $\bar T_1$ | $\mu_1$ |
| 2 | $x_{21}, x_{22}, \dots, x_{27}$ | $n_2$ | $T_2$ | $\bar T_2$ | $\mu_2$ |
| 3 | $x_{31}, x_{32}, \dots, x_{36}$ | $n_3$ | $T_3$ | $\bar T_3$ | $\mu_3$ |
| 4 | $x_{41}, x_{42}, x_{43}, x_{44}$ | $n_4$ | $T_4$ | $\bar T_4$ | $\mu_4$ |
| Total | | $n$ | $GT$ | | |

ANOVA Table

| Source of error | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Treatments | $t-1$ | SST | $MST = SST/(t-1)$ | $MST/MSE$ | |
| Error | $n-t$ | SSE | $MSE = SSE/(n-t)$ | | |
| Totals | $n-1$ | TSS | | | |

Notation:
TSS: sum of squares of total deviation.
SST: sum of squares of total deviation between treatments.
SSE: sum of squares of total deviation within treatments (error).
CM: correction for the mean.

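GT: grand total.

Computational Formulas for TSS, SST and SSE:

$$CM = \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} x_{ij}\right)^2}{n} = \frac{GT^2}{n}$$
$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{n_i} x_{ij}^2 - CM$$
$$SST = \sum_{i=1}^{t}\frac{T_i^2}{n_i} - CM$$
$$SSE = TSS - SST$$

Calculations for the training example produce

$CM = (\sum x_{ij})^2/n = 1779^2/23 = 137{,}601.8$
$TSS = \sum x_{ij}^2 - CM = 1{,}909.2$
$SST = \sum T_i^2/n_i - CM = 712.6$
$SSE = TSS - SST = 1{,}196.6$

Thus

ANOVA Table

| Source of error | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Treatments | 3 | 712.6 | 237.5 | 3.77 | |
| Error | 19 | 1,196.6 | 63.0 | | |
| Totals | 22 | 1,909.2 | | | |

Confidence Intervals.

Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t}}$$

CI for $\mu_i$:
$$\bar T_i \pm t_{\alpha/2, n-t} \frac{s}{\sqrt{n_i}}$$

CI for $\mu_i - \mu_j$:
$$(\bar T_i - \bar T_j) \pm t_{\alpha/2, n-t} \sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

MINITAB

MTB> aovoneway C1-C4.

Exercise. Produce a Minitab output for the above example.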
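As a cross-check on the hand calculations, here is a minimal plain-Python sketch of the computational formulas (CM, TSS, SST, SSE) applied to the training data. The critical value 3.13 is the one quoted in the notes; the value $t_{0.025,19} = 2.093$ used for the $\mu_4 - \mu_3$ interval is an assumption read from a standard t table, and the helper names are hypothetical.

```python
import math

# Sales in the first week after training, by training group.
groups = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}

def one_way_anova(samples):
    """Return (SST, SSE, TSS, MST, MSE, F) using the computational formulas."""
    all_obs = [x for s in samples for x in s]
    n, t = len(all_obs), len(samples)
    cm = sum(all_obs) ** 2 / n                               # correction for the mean
    tss = sum(x * x for x in all_obs) - cm                   # total sum of squares
    sst = sum(sum(s) ** 2 / len(s) for s in samples) - cm    # between treatments
    sse = tss - sst                                          # within treatments (error)
    mst, mse = sst / (t - 1), sse / (n - t)
    return sst, sse, tss, mst, mse, mst / mse

def diff_ci(sample_i, sample_j, mse, t_crit):
    """CI for mu_i - mu_j: (Tbar_i - Tbar_j) +/- t * sqrt(s^2 (1/n_i + 1/n_j))."""
    diff = sum(sample_i) / len(sample_i) - sum(sample_j) / len(sample_j)
    half = t_crit * math.sqrt(mse * (1 / len(sample_i) + 1 / len(sample_j)))
    return diff - half, diff + half

sst, sse, tss, mst, mse, f = one_way_anova(list(groups.values()))
print(f"SST = {sst:.1f}, SSE = {sse:.1f}, TSS = {tss:.1f}, F = {f:.2f}")

F_CRIT = 3.13   # F_{0.05,3,19}, as quoted in the notes
print("Reject H0" if f > F_CRIT else "Do not reject H0")

T_CRIT = 2.093  # assumption: t_{0.025,19} from a standard t table
lo, hi = diff_ci(groups[4], groups[3], mse, T_CRIT)
print(f"95% CI for mu4 - mu3: ({lo:.2f}, {hi:.2f})")
```

The printed SST, SSE, TSS and F should agree with the ANOVA table above to the rounding shown there.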

3 The Randomized Block Design

The randomized block design extends the paired-difference design to more than two treatments. A randomized block design consists of $b$ blocks, each containing $t$ experimental units. The $t$ treatments are randomly assigned to the units in each block, and each treatment appears once in every block.

Example. A consumer preference study involving three different package designs (treatments) was laid out in a randomized block design among four supermarkets (blocks). The data shown in Table 1 below represent the number of units sold for each package design within each supermarket during each of three given weeks.
(i) Provide a data summary.
(ii) Do the data present sufficient evidence to indicate a difference in the mean sales for each package design (treatment)?
(iii) Do the data present sufficient evidence to indicate a difference in the mean sales for the supermarkets?

Table 1. (Design) units sold, by supermarket and week

| Supermarket | Week w1 | Week w2 | Week w3 |
|---|---|---|---|
| s1 | (1) 17 | (3) 23 | (2) 34 |
| s2 | (3) 21 | (1) 15 | (2) 26 |
| s3 | (1) 1 | (2) 23 | (3) 8 |
| s4 | (2) 22 | (1) 6 | (3) 16 |

Remarks.
(i) In each supermarket (block) the first entry represents the design (treatment) and the second entry represents the sales per week.
(ii) The three designs are assigned to each supermarket completely at random.
(iii) An alternate design would be to use 12 supermarkets, with each design (treatment) randomly assigned to 4 supermarkets. In this case the difference in sales could be due to more than just differences in package design; that is, larger supermarkets would be expected to have larger overall sales of the product than smaller supermarkets. The randomized block design eliminates the store-to-store variability.

For computational purposes we rearrange the data so that each row is a supermarket (block) and each column is a package design (treatment).

Data Summary. The treatment and block totals, for $t = 3$ treatments and $b = 4$ blocks, are:

| Block | $t_1$ | $t_2$ | $t_3$ | $B_i$ |
|---|---|---|---|---|
| s1 | 17 | 34 | 23 | $B_1 = 74$ |
| s2 | 15 | 26 | 21 | $B_2 = 62$ |
| s3 | 1 | 23 | 8 | $B_3 = 32$ |
| s4 | 6 | 22 | 16 | $B_4 = 44$ |
| $T_i$ | $T_1 = 39$ | $T_2 = 105$ | $T_3 = 68$ | |

Calculations for the package design example produce

$CM = (\sum x_{ij})^2/n = 212^2/12 = 3{,}745.33$
$TSS = \sum x_{ij}^2 - CM = 940.67$
$SST = \sum T_i^2/b - CM = 547.17$
$SSB = \sum B_i^2/t - CM = 348.00$
$SSE = TSS - SST - SSB = 45.50$

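MINITAB (Commands and Printouts)

MTB> Print C1-C3

| ROW | UNITS | TRTS | BLOCKS |
|---|---|---|---|
| 1 | 17 | 1 | 1 |
| 2 | 34 | 2 | 1 |
| 3 | 23 | 3 | 1 |
| 4 | 15 | 1 | 2 |
| 5 | 26 | 2 | 2 |
| 6 | 21 | 3 | 2 |
| 7 | 1 | 1 | 3 |
| 8 | 23 | 2 | 3 |
| 9 | 8 | 3 | 3 |
| 10 | 6 | 1 | 4 |
| 11 | 22 | 2 | 4 |
| 12 | 16 | 3 | 4 |

MTB> ANOVA C1=C2 C3

ANOVA Table

| Source of error | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Treatments | 2 | 547.17 | 273.58 | 36.08 | 0.000 |
| Blocks | 3 | 348.00 | 116.00 | 15.30 | 0.003 |
| Error | 6 | 45.50 | 7.58 | | |
| Totals | 11 | 940.67 | | | |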
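The block-design sums of squares can be reproduced with a short plain-Python sketch of the same formulas (rows are supermarkets, columns are package designs). It is an illustration of the hand calculation, not a replacement for the Minitab `ANOVA` command; the function name is hypothetical.

```python
# Rows are blocks (supermarkets s1..s4); columns are treatments (designs 1..3).
data = [
    [17, 34, 23],
    [15, 26, 21],
    [1, 23, 8],
    [6, 22, 16],
]

def rbd_anova(rows):
    """Randomized block ANOVA via CM, TSS, SST, SSB and SSE."""
    b, t = len(rows), len(rows[0])
    n = b * t
    gt = sum(sum(r) for r in rows)                            # grand total
    cm = gt ** 2 / n                                          # correction for the mean
    tss = sum(x * x for r in rows for x in r) - cm
    trt_totals = [sum(r[j] for r in rows) for j in range(t)]  # T_j
    blk_totals = [sum(r) for r in rows]                       # B_i
    sst = sum(T * T for T in trt_totals) / b - cm
    ssb = sum(B * B for B in blk_totals) / t - cm
    sse = tss - sst - ssb
    df_e = n - t - b + 1                                      # equals (t-1)(b-1)
    mst, msb, mse = sst / (t - 1), ssb / (b - 1), sse / df_e
    return {"SST": sst, "SSB": ssb, "SSE": sse, "TSS": tss,
            "MSE": mse, "F_trt": mst / mse, "F_blk": msb / mse}

res = rbd_anova(data)
for name, value in res.items():
    print(f"{name}: {value:.2f}")
```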

Solution to (ii)

Test.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a:$ Not all means are equal

T.S.: $F = \dfrac{MST}{MSE} = 36.08$, where $F$ is based on $(t-1)$ and $(n-t-b+1)$ df.

RR: Reject $H_0$ if $F > F_{\alpha, t-1, n-t-b+1}$, i.e. reject $H_0$ if $F > F_{0.05, 2, 6} = 5.14$.

Graph:
Decision: Reject $H_0$.
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the three package designs.

Note that $n - t - b + 1 = (t-1)(b-1)$.

Solution to (iii)

Test.
$H_0:$ Block means are equal
$H_a:$ Not all block means are equal (i.e. blocking is desirable)

T.S.: $F = \dfrac{MSB}{MSE} = 15.30$, where $F$ is based on $(b-1)$ and $(n-t-b+1)$ df.

RR: Reject $H_0$ if $F > F_{\alpha, b-1, n-t-b+1}$, i.e. reject $H_0$ if $F > F_{0.005, 3, 6} = 12.92$.

Graph:
Decision: Reject $H_0$.
Conclusion: At 0.5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the four supermarkets; that is, the data support our decision to use supermarkets as blocks.

Assumptions.
1. Sampled populations are normal.
2. Dependent random samples due to blocking.
3. All $t$ populations have equal variances.

Confidence Intervals.

Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t-b+1}}$$

CI for $\mu_i - \mu_j$:
$$(\bar T_i - \bar T_j) \pm t_{\alpha/2, n-t-b+1}\, s\sqrt{\frac{2}{b}}$$

Exercise. Construct a 90% C.I. for the difference between mean sales from package designs 1 and 2.
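One way to check an answer to the exercise is the plain-Python sketch below, which applies the block-design interval formula with SSE and its df taken from the ANOVA table. The critical value $t_{0.05,6} = 1.943$ is an assumption read from a standard t table (90% two-sided); the helper name is hypothetical.

```python
import math

# Treatment totals T_i and the number of blocks b from the data summary.
T = {1: 39, 2: 105, 3: 68}
b = 4
sse, df_error = 45.50, 6          # from the ANOVA table; df = (t-1)(b-1)
s = math.sqrt(sse / df_error)     # s = sqrt(MSE)

T_CRIT = 1.943                    # assumption: t_{0.05,6} from a t table

def block_diff_ci(i, j, t_crit):
    """(Tbar_i - Tbar_j) +/- t * s * sqrt(2/b) for a randomized block design."""
    diff = T[i] / b - T[j] / b
    half = t_crit * s * math.sqrt(2 / b)
    return diff - half, diff + half

lo, hi = block_diff_ci(1, 2, T_CRIT)
print(f"90% CI for mu1 - mu2: ({lo:.2f}, {hi:.2f})")
```

Because the whole interval lies below zero, design 2 appears to outsell design 1, which is consistent with the F test in the solution to (ii).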

Chapter 10 Simple Linear Regression and Correlation

Contents.
1. Introduction: Example
2. A Simple Linear Probabilistic Model
3. Least Squares Prediction Equation
4. Inferences Concerning the Slope
5. Estimating E(y|x) for a Given x
6. Predicting y for a Given x
7. Coefficient of Correlation
8. Analysis of Variance
9. Computer Printouts

1 Introduction

Linear regression is a statistical technique used to predict (forecast) the value of a variable from known related variables.

Example (Ad Sales). Consider the problem of predicting the gross monthly sales volume $y$ for a corporation that is not subject to substantial seasonal variation in its sales volume. For the predictor variable $x$ we use the amount spent by the company on advertising during the month of interest. We wish to determine whether advertising is worthwhile, that is, whether advertising is actually related to the firm's sales volume. In addition, we wish to use the amount spent on advertising to predict the sales volume. The data in the table below represent a sample of advertising expenditures, $x$, and the associated sales volume, $y$, for 10 randomly selected months.

Ad Sales Data

| Month | $y$ (\$10,000) | $x$ (\$10,000) |
|---|---|---|
| 1 | 101 | 1.2 |
| 2 | 92 | 0.8 |
| 3 | 110 | 1.0 |
| 4 | 120 | 1.3 |
| 5 | 90 | 0.7 |
| 6 | 82 | 0.8 |
| 7 | 93 | 1.0 |
| 8 | 75 | 0.6 |
| 9 | 91 | 0.9 |
| 10 | 105 | 1.1 |

Definitions.
(i) Response: dependent variable of interest (sales volume).
(ii) Independent (predictor) variable (ad expenditure).
(iii) Linear equations (straight line): $y = a + bx$.

(Figure: scatter diagram with the best-fit straight line; a straight line is described by its y-intercept and slope.)

2 A Simple Linear Probabilistic Model

Model.
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where
$x$: independent variable (predictor)
$y$: dependent variable (response)
$\beta_0$ and $\beta_1$ are unknown parameters
$\varepsilon$: random error due to other factors not included in the model.

Assumptions.
1. $E(\varepsilon) := \mu_\varepsilon = 0$.
2. $Var(\varepsilon) := \sigma_\varepsilon^2 = \sigma^2$.

3. The random variable $\varepsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.
4. The random components of any two observed $y$ values are independent.

Objective: Estimate $\beta_0$, $\beta_1$ and $\sigma^2$.

3 Least Squares Prediction Equation

The least squares prediction equation is sometimes called the estimated regression equation or the prediction equation:
$$\hat y = \hat\beta_0 + \hat\beta_1 x$$
This equation is obtained by using the method of least squares, that is, by minimizing $\sum (y - \hat y)^2$.

Computational Formulas.
$$\bar x = \frac{\sum x}{n}, \qquad \bar y = \frac{\sum y}{n}$$
$$SS_{xx} = \sum (x - \bar x)^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS_{yy} = \sum (y - \bar y)^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SS_{xy} = \sum (x - \bar x)(y - \bar y) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
$$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

To estimate $\sigma^2$:
$$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = SS_{yy} - \frac{(SS_{xy})^2}{SS_{xx}}, \qquad s^2 = \frac{SSE}{n-2}$$

Remarks.
(i) $\hat\beta_1$ is the slope of the estimated regression equation.
(ii) $s^2$ provides a measure of spread of points $(x, y)$ around the regression line.

Ad Sales example

Question 1. Do a scatter diagram. Can you say that $x$ and $y$ are linearly related?

Answer.

Question 2. Use the computational formulas to provide a data summary.

Answer. Data Summary.

$\bar x = 0.94$, $\bar y = 95.9$
$SS_{xx} = 0.444$
$SS_{xy} = 23.34$
$SS_{yy} = 1{,}600.9$

Optional material: Ad Sales Calculations

| Month | $x$ | $y$ | $x^2$ | $xy$ | $y^2$ |
|---|---|---|---|---|---|
| 1 | 1.2 | 101 | 1.44 | 121.2 | 10,201 |
| 2 | 0.8 | 92 | 0.64 | 73.6 | 8,464 |
| 3 | 1.0 | 110 | 1.00 | 110.0 | 12,100 |
| 4 | 1.3 | 120 | 1.69 | 156.0 | 14,400 |
| 5 | 0.7 | 90 | 0.49 | 63.0 | 8,100 |
| 6 | 0.8 | 82 | 0.64 | 65.6 | 6,724 |
| 7 | 1.0 | 93 | 1.00 | 93.0 | 8,649 |
| 8 | 0.6 | 75 | 0.36 | 45.0 | 5,625 |
| 9 | 0.9 | 91 | 0.81 | 81.9 | 8,281 |
| 10 | 1.1 | 105 | 1.21 | 115.5 | 11,025 |
| Sum | 9.4 | 959 | 9.28 | 924.8 | 93,569 |

$\bar x = \sum x/n = 0.94$, $\bar y = \sum y/n = 95.9$

$SS_{xx} = \sum x^2 - (\sum x)^2/n = 9.28 - \dfrac{(9.4)^2}{10} = 0.444$
$SS_{xy} = \sum xy - (\sum x)(\sum y)/n = 924.8 - \dfrac{(9.4)(959)}{10} = 23.34$
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 93{,}569 - \dfrac{(959)^2}{10} = 1{,}600.9$
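The same summary, and the least squares estimates that follow from it, can be computed directly from the raw Ad Sales data. This is a minimal plain-Python sketch of the computational formulas; the function name is hypothetical.

```python
# Ad Sales data, x and y both in units of $10,000.
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]

def least_squares(x, y):
    """Simple linear regression via SSxx, SSxy and SSyy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    ss_xx = sum(v * v for v in x) - sx ** 2 / n
    ss_yy = sum(v * v for v in y) - sy ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    b1 = ss_xy / ss_xx                  # slope
    b0 = sy / n - b1 * sx / n           # intercept: ybar - b1 * xbar
    sse = ss_yy - b1 * ss_xy            # error sum of squares
    s2 = sse / (n - 2)                  # estimate of sigma^2
    return {"SSxx": ss_xx, "SSxy": ss_xy, "SSyy": ss_yy,
            "b0": b0, "b1": b1, "SSE": sse, "s2": s2}

fit = least_squares(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
print("y_hat at x = 1.0:", round(fit["b0"] + fit["b1"] * 1.0, 2))
```

Carrying full precision, the prediction at $x = 1.0$ comes out as 99.05; the value 99.06 in Questions 6 and 7 results from adding the rounded coefficients 46.49 and 52.57.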

Question 3. Estimate the parameters $\beta_0$ and $\beta_1$.

Answer.
$\hat\beta_1 = SS_{xy}/SS_{xx} = \dfrac{23.34}{0.444} = 52.5676 \approx 52.57$
$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 95.9 - (52.5676)(0.94) = 46.4865 \approx 46.49$

Question 4. Find the least squares line for the data.

Answer.
$\hat y = \hat\beta_0 + \hat\beta_1 x = 46.49 + 52.57x$

Remark. This equation is also called the estimated regression equation or prediction line.

Question 5. Estimate $\sigma^2$.

Answer.
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1{,}600.9 - (52.5676)(23.34) = 373.97$

Therefore
$s^2 = \dfrac{SSE}{n-2} = \dfrac{373.97}{8} = 46.75$

Question 6. Predict the mean sales volume $E(y|x)$ for a given expenditure level of \$10,000 (i.e. $x = 1.0$).

Answer.
$E(y|x) = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$, so the mean sales volume is \$990,600.

Question 7. Predict the sales volume, $y$, for a given expenditure level of \$10,000 (i.e. $x = 1.0$).

Answer.
$\hat y = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$, so the sales volume is \$990,600.

Remark. In Question 6 and Question 7 we obtained the same estimate; the bound on the error of estimation will, however, be different.

4 Inferences Concerning the Slope

Parameter of interest: $\beta_1$
Point estimator: $\hat\beta_1$

Estimator mean: $\mu_{\hat\beta_1} = \beta_1$
Estimator standard error: $\sigma_{\hat\beta_1} = \sigma/\sqrt{SS_{xx}}$

Test.
$H_0: \beta_1 = \beta_{10}$
$H_a: \beta_1 \neq \beta_{10}$
(With $\beta_{10} = 0$, $H_0$ states that there is no linear relationship and $H_a$ that there is one.)

T.S.: $t = \dfrac{\hat\beta_1 - \beta_{10}}{s/\sqrt{SS_{xx}}}$

RR: Reject $H_0$ if $t > t_{\alpha/2, n-2}$ or $t < -t_{\alpha/2, n-2}$.

Graph:
Decision:
Conclusion:

Question 8. Determine whether there is evidence to indicate a linear relationship between advertising expenditure, $x$, and sales volume, $y$.

Answer.

Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)

T.S.: With $s = \sqrt{s^2} = \sqrt{46.75} = 6.84$,
$t = \dfrac{\hat\beta_1 - 0}{s/\sqrt{SS_{xx}}} = \dfrac{52.57 - 0}{6.84/\sqrt{0.444}} = 5.12$

RR: (critical value: $t_{0.025, 8} = 2.306$) Reject $H_0$ if $t > 2.306$ or $t < -2.306$.

Graph:
Decision: Reject $H_0$.
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, $x$, and sales volume, $y$.

Confidence interval for $\beta_1$:
$$\hat\beta_1 \pm t_{\alpha/2, n-2} \frac{s}{\sqrt{SS_{xx}}}$$

Question 9. Find a 95% confidence interval for $\beta_1$.

Answer.
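The slope test statistic of Question 8 can be checked with the sketch below, which starts from the data summary values. The critical value 2.306 is the one quoted in the notes; the function name is hypothetical.

```python
import math

# Summary values for the Ad Sales example.
n = 10
ss_xx, ss_xy, ss_yy = 0.444, 23.34, 1600.9
b1 = ss_xy / ss_xx
s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))     # s = sqrt(MSE)

def slope_t(b1, beta10, s, ss_xx):
    """t = (b1 - beta10) / (s / sqrt(SSxx)), with n - 2 degrees of freedom."""
    return (b1 - beta10) / (s / math.sqrt(ss_xx))

t_stat = slope_t(b1, 0.0, s, ss_xx)
T_CRIT = 2.306                                    # t_{0.025,8}, from the notes
print(f"t = {t_stat:.2f}; reject H0: {abs(t_stat) > T_CRIT}")
```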

$$52.57 \pm 2.306\,\frac{6.84}{\sqrt{0.444}} = 52.57 \pm 23.67 = (28.90,\ 76.24)$$

5 Estimating E(y|x) for a Given x

The confidence interval (CI) for the expected (mean) value of $y$ given $x = x_p$ is given by
$$\hat y \pm t_{\alpha/2, n-2} \sqrt{s^2\left[\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}\right]}$$

6 Predicting y for a Given x

The prediction interval (PI) for a particular value of $y$ given $x = x_p$ is given by
$$\hat y \pm t_{\alpha/2, n-2} \sqrt{s^2\left[1 + \frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}\right]}$$

7 Coefficient of Correlation

In a previous section we tested for a linear relationship between $x$ and $y$. Now we examine how strong the linear relationship between $x$ and $y$ is. We call this measure the coefficient of correlation between $y$ and $x$:
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

Remarks.
(i) $-1 \le r \le 1$.
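The three interval formulas (for $\beta_1$, for $E(y|x_p)$, and for a new $y$) can be compared side by side in a short sketch. Here $x_p = 1.0$ is our choice, matching the \$10,000 level of Questions 6 and 7, and $t_{0.025,8} = 2.306$ is the value used in the notes; the helper names are hypothetical.

```python
import math

# Ad Sales summary values.
n, xbar, ybar = 10, 0.94, 95.9
ss_xx, ss_xy, ss_yy = 0.444, 23.34, 1600.9
b1 = ss_xy / ss_xx
b0 = ybar - b1 * xbar
s2 = (ss_yy - b1 * ss_xy) / (n - 2)
T_CRIT = 2.306                          # t_{0.025,8}, from the notes

def interval(center, half):
    return center - half, center + half

slope_ci = interval(b1, T_CRIT * math.sqrt(s2 / ss_xx))

def mean_and_prediction_intervals(xp):
    """CI for E(y | x = xp) and PI for a new y at x = xp."""
    y_hat = b0 + b1 * xp
    lev = 1 / n + (xp - xbar) ** 2 / ss_xx          # 1/n + (xp - xbar)^2 / SSxx
    ci = interval(y_hat, T_CRIT * math.sqrt(s2 * lev))
    pi = interval(y_hat, T_CRIT * math.sqrt(s2 * (1 + lev)))
    return y_hat, ci, pi

y_hat, ci, pi = mean_and_prediction_intervals(1.0)
print("95% CI for beta1:", [round(v, 2) for v in slope_ci])
print("y_hat:", round(y_hat, 2))
print("95% CI for E(y):", [round(v, 2) for v in ci])
print("95% PI for y:", [round(v, 2) for v in pi])
```

With unrounded inputs the slope interval is about (28.91, 76.23), slightly different from (28.90, 76.24) above because the notes round $s$ to 6.84. The PI is always wider than the CI because of the extra "1 +" term, which accounts for the variability of an individual $y$ around its mean; this is the "different bound on the error of estimation" mentioned in the Remark after Question 7.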

(ii) The population coefficient of correlation is $\rho$.
(iii) $r > 0$ indicates a positive correlation ($\hat\beta_1 > 0$).
(iv) $r < 0$ indicates a negative correlation ($\hat\beta_1 < 0$).
(v) $r = 0$ indicates no correlation ($\hat\beta_1 = 0$).

Question 10. Find the coefficient of correlation, $r$.

Answer.
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{23.34}{\sqrt{0.444\,(1{,}600.9)}} = 0.88$$

Coefficient of determination

Algebraic manipulations show that
$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}}$$

Question 11. By what percentage is the sum of squares of deviations of $y$ about the mean ($SS_{yy}$) reduced by using $\hat y$ rather than $\bar y$ as a predictor of $y$?

Answer.
$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 0.88^2 = 0.77,$$
that is, by about 77%. $r^2$ is called the coefficient of determination.

8 Analysis of Variance

Notation:
$TSS := SS_{yy} = \sum (y - \bar y)^2$ (total SS of deviations)
$SSR = \sum (\hat y - \bar y)^2$ (SS of deviations due to regression, or explained deviations)
$SSE = \sum (y - \hat y)^2$ (SS of deviations for the error, or unexplained deviations)
$TSS = SSR + SSE$

Question 12. Give the ANOVA table for the Ad sales example.

Question 13. Use the ANOVA table to test for a significant linear relationship between sales and advertising expenditure.

ANOVA Table (general form)

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 1 | SSR | $MSR = SSR/1$ | $MSR/MSE$ | |
| Error | $n-2$ | SSE | $MSE = SSE/(n-2)$ | | |
| Totals | $n-1$ | TSS | | | |

Answer (Question 12).

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 1 | 1,226.927 | 1,226.927 | 26.25 | 0.000 |
| Error | 8 | 373.973 | 46.747 | | |
| Totals | 9 | 1,600.900 | | | |

Answer (Question 13).

Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)

T.S.: $F = \dfrac{MSR}{MSE} = 26.25$

RR: (critical value: $F_{0.005, 1, 8} = 14.69$) Reject $H_0$ if $F > 14.69$ (or: reject $H_0$ if $\alpha >$ p-value).

Graph:
Decision: Reject $H_0$.
Conclusion: At 0.5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, $x$, and sales volume, $y$.

9 Computer Printouts for Regression Analysis

Store $y$ in C1 and $x$ in C2.

MTB> Plot C1 C2. : Gives a scatter diagram.
MTB> Regress C1 1 C2.

Computer output for Ad sales example:
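The decomposition $TSS = SSR + SSE$ and the F statistic in the Ad sales ANOVA table can be verified from the raw data with the sketch below, which computes each sum of squares from its definition rather than from the shortcut formulas. The critical value 14.69 is the one quoted in the notes.

```python
import math

# Ad Sales data (units of $10,000).
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

ss_xx = sum((a - xbar) ** 2 for a in x)
ss_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
b1 = ss_xy / ss_xx
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * a for a in x]

tss = sum((b - ybar) ** 2 for b in y)                 # SSyy, total
ssr = sum((h - ybar) ** 2 for h in y_hat)             # explained by regression
sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))     # unexplained (error)

r = ss_xy / math.sqrt(ss_xx * tss)
msr, mse = ssr / 1, sse / (n - 2)
F = msr / mse

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, TSS = {tss:.3f}, F = {F:.2f}")
F_CRIT = 14.69                                        # F_{0.005,1,8}, from the notes
print("Reject H0" if F > F_CRIT else "Do not reject H0")
```

In simple linear regression this F equals the square of the slope t statistic from Question 8 (about $5.12^2$), so the two tests of $H_0: \beta_1 = 0$ are equivalent.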

The regression equation is
y = 46.5 + 52.6 x

| Predictor | Coef | Stdev | t-ratio | P |
|---|---|---|---|---|
| Constant | 46.486 | 9.885 | 4.70 | 0.000 |
| x | 52.57 | 10.26 | 5.12 | 0.000 |

s = 6.837   R-sq = 76.6%   R-sq(adj) = 73.7%

Analysis of Variance

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 1 | 1,226.927 | 1,226.927 | 26.25 | 0.000 |
| Error | 8 | 373.973 | 46.747 | | |
| Totals | 9 | 1,600.900 | | | |

More generally we obtain:

The regression equation is
$\hat y = \hat\beta_0 + \hat\beta_1 x$

| Predictor | Coef | Stdev | t-ratio | P |
|---|---|---|---|---|
| Constant | $\hat\beta_0$ | $\sigma_{\hat\beta_0}$ | TS: $t$ | p-value |
| $x$ | $\hat\beta_1$ | $\sigma_{\hat\beta_1}$ | TS: $t$ | p-value |

$s = \sqrt{MSE}$   R-sq $= r^2$   R-sq(adj)

Analysis of Variance

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 1 | SSR | $MSR = SSR/1$ | $MSR/MSE$ | p-value |
| Error | $n-2$ | SSE | $MSE = SSE/(n-2)$ | | |
| Totals | $n-1$ | TSS | | | |

Review Exercises: Linear Regression

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Given the following data set

| $x$ | -3 | -1 | 1 | 1 | 2 |
|---|---|---|---|---|---|
| $y$ | 6 | 4 | 3 | 1 | 1 |

(i) Plot the scatter diagram, and indicate whether $x$ and $y$ appear linearly related.
(ii) Show that $\sum x = 0$, $\sum y = 15$, $\sum x^2 = 16$, $\sum y^2 = 63$, $SS_{xx} = 16$, $SS_{yy} = 18$, and $SS_{xy} = -16$.
(iii) Find the regression equation for the data. (Answer: $\hat y = 3 - x$)
(iv) Plot the regression equation on the same graph as (i). Does the line appear to provide a good fit for the data points?
(v) Compute SSE and $s^2$. (Answer: $s^2 = 2/3$)
(vi) Estimate the expected value of $y$ when $x = -1$.
(vii) Find the correlation coefficient $r$ and find $r^2$. (Answer: $r = -0.943$, $r^2 = 0.889$)

2. The Minitab Regress command has been applied to data on family income, $X$, and last year's energy consumption, $Y$, from a random sample of 25 families. The income data are in thousands of dollars and the energy consumption data are in millions of BTU. It is further assumed that the relationship is linear. A portion of a linear regression computer printout is shown below.

| Predictor | Coef | Stdev | t-ratio | P |
|---|---|---|---|---|
| Constant | 82.036 | 2.054 | 39.94 | 0.000 |
| X | 0.93051 | 0.05727 | 16.25 | 0.000 |

s =   R-sq = 92.0%   R-sq(adj) = 91.6%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | | | 7626.6 | 264.02 | 0.000 |
| Error | 23 | | | | |
| Total | | 8291 | | | |

(i) Complete all missing entries in the table.
(ii) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation.
(iii) Do the data present sufficient evidence to indicate that $Y$ and $X$ are linearly related? Test by using $\alpha = 0.01$.
(iv) Determine a point estimate for last year's mean energy consumption of all families with an annual income of \$40,000.

3. A study of middle to upper-level managers is undertaken to investigate the relationship between salary level, $Y$, and years of work experience, $X$. A random sample of 20 managers is chosen with the following results (in thousands of dollars): $\sum x_i = 235$, $\sum y_i = 763.8$, $SS_{xx} = 485.75$, $SS_{yy} = 2{,}236.1$, and $SS_{xy} = 886.85$.
(i) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation. (Answer: $\hat y = 16.73 + 1.826x$)
(ii) Find the correlation coefficient, $r$. (Answer: $r = 0.85$)
(iii) Find $r^2$ and interpret its value.

4. Answer by True or False. (Circle your choice.)
T F (i) The correlation coefficient $r$ shows the degree of association between $x$ and $y$.
T F (ii) The coefficient of determination $r^2$ shows the percentage change in $y$ resulting from a one-unit change in $x$.
T F (iii) The last step in a simple regression analysis is drawing a scatter diagram.
T F (iv) $r = 1$ implies no linear correlation between $x$ and $y$.
T F (v) We always estimate the value of a parameter and predict the value of a random variable.
T F (vi) If $\beta_1 = 1$, we always predict the same value of $y$ regardless of the value of $x$.
T F (vii) It is necessary to assume that the response $y$ of a probability model has a normal distribution if we are to estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$.
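To check the stated answers to Exercises 1 and 3, the sketch below computes the least squares line, SSE, $s^2$ and $r$ from summary statistics, and builds those summaries from raw data when raw data are given. It is a self-check aid under the formulas of Section 3, not part of the exercises; the function names are hypothetical.

```python
import math

def fit_from_summaries(n, sum_x, sum_y, ss_xx, ss_yy, ss_xy):
    """Least squares line, SSE, s^2 and r from summary statistics."""
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    sse = ss_yy - b1 * ss_xy
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return b0, b1, sse, sse / (n - 2), r

def summaries(x, y):
    """(n, sum x, sum y, SSxx, SSyy, SSxy) computed from raw data."""
    n, sx, sy = len(x), sum(x), sum(y)
    return (n, sx, sy,
            sum(v * v for v in x) - sx ** 2 / n,
            sum(v * v for v in y) - sy ** 2 / n,
            sum(a * b for a, b in zip(x, y)) - sx * sy / n)

# Exercise 1: raw data.
ex1 = fit_from_summaries(*summaries([-3, -1, 1, 1, 2], [6, 4, 3, 1, 1]))
b0, b1, sse, s2, r = ex1
print(f"Ex. 1: y_hat = {b0:.2f} + ({b1:.2f})x, SSE = {sse:.2f}, s^2 = {s2:.4f}, r = {r:.3f}")

# Exercise 3: only summary statistics are given (n = 20 managers).
b0, b1, sse, s2, r = fit_from_summaries(20, 235, 763.8, 485.75, 2236.1, 886.85)
print(f"Ex. 3: y_hat = {b0:.2f} + {b1:.3f}x, r = {r:.2f}, r^2 = {r * r:.2f}")
```

For Exercise 3 the unrounded intercept is 16.74; the stated answer 16.73 comes from using the rounded slope 1.826.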

Chapter 11 Multiple Linear Regression

Contents.
1. Introduction: Example
2. Multiple Linear Model
3. Analysis of Variance
4. Computer Printouts

1 Introduction: Example

Multiple linear regression is a statistical technique used to predict (forecast) the value of a variable from multiple known related variables.

2 A Multiple Linear Model

Model.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$
where
$x_i$: independent variables (predictors)
$y$: dependent variable (response)
$\beta_i$: unknown parameters
$\varepsilon$: random error due to other factors not included in the model.

Assumptions.
1. $E(\varepsilon) := \mu_\varepsilon = 0$.
2. $Var(\varepsilon) := \sigma_\varepsilon^2 = \sigma^2$.
3. $\varepsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.

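4. The random components of any two observed $y$ values are independent.

3 Least Squares Prediction Equation

Estimated Regression Equation
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$$
This equation is obtained by using the method of least squares.

Multiple Regression Data

| Obser. | $y$ | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|---|
| 1 | $y_1$ | $x_{11}$ | $x_{21}$ | $x_{31}$ |
| 2 | $y_2$ | $x_{12}$ | $x_{22}$ | $x_{32}$ |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |
| $n$ | $y_n$ | $x_{1n}$ | $x_{2n}$ | $x_{3n}$ |

Minitab Printout

The regression equation is
$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$

| Predictor | Coef | Stdev | t-ratio | P |
|---|---|---|---|---|
| Constant | $\hat\beta_0$ | $\sigma_{\hat\beta_0}$ | TS: $t$ | p-value |
| $x_1$ | $\hat\beta_1$ | $\sigma_{\hat\beta_1}$ | TS: $t$ | p-value |
| $x_2$ | $\hat\beta_2$ | $\sigma_{\hat\beta_2}$ | TS: $t$ | p-value |
| $x_3$ | $\hat\beta_3$ | $\sigma_{\hat\beta_3}$ | TS: $t$ | p-value |

$s = \sqrt{MSE}$   $R^2 = r^2$   $R^2$(adj)

Analysis of Variance

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 3 | SSR | $MSR = SSR/3$ | $MSR/MSE$ | p-value |
| Error | $n-4$ | SSE | $MSE = SSE/(n-4)$ | | |
| Totals | $n-1$ | TSS | | | |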

| Source | df | SS |
|---|---|---|
| $x_1$ | 1 | $SS_{x_1 x_1}$ |
| $x_2$ | 1 | $SS_{x_2 x_2}$ |
| $x_3$ | 1 | $SS_{x_3 x_3}$ |

Unusual observations (ignore)
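To show what the least squares step above computes, here is a plain-Python sketch that solves the normal equations $(X'X)\hat\beta = X'y$ for a three-predictor model and fills in the ANOVA quantities with the df of the layout above (3 and $n-4$). The eight observations are hypothetical, made up only to exercise the code, and Gaussian elimination is one of several ways to solve the system (statistical packages typically use more numerically stable methods such as QR).

```python
# HYPOTHETICAL data: eight observations of (x1, x2, x3) and y.
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 0), (4, 3, 1),
        (5, 6, 1), (6, 5, 0), (7, 8, 1), (8, 7, 0)]
y = [5.1, 8.2, 6.9, 11.8, 12.1, 9.8, 15.2, 12.9]

def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):                         # back substitution
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z

def fit_multiple(rows, y):
    X = [[1.0, *r] for r in rows]             # column of 1s for the intercept
    p = len(X[0])                             # parameters b0, b1, b2, b3
    xtx = [[sum(xi[a] * xi[c] for xi in X) for c in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)                    # normal equations (X'X) beta = X'y
    y_hat = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    n, ybar = len(y), sum(y) / len(y)
    tss = sum((v - ybar) ** 2 for v in y)
    ssr = sum((h - ybar) ** 2 for h in y_hat)
    sse = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    msr, mse = ssr / (p - 1), sse / (n - p)   # df 3 and n - 4
    return beta, y_hat, X, {"SSR": ssr, "SSE": sse, "TSS": tss, "F": msr / mse}

beta, y_hat, X, anova = fit_multiple(rows, y)
print("b0..b3:", [round(b, 3) for b in beta])
print({name: round(v, 3) for name, v in anova.items()})
```

Two properties of any least squares fit with an intercept make good sanity checks: the residuals are orthogonal to every column of $X$, and $TSS = SSR + SSE$.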

MINITAB. Use the REGRESS command to regress $y$ stored in C1 on the 3 predictor variables stored in C2-C4.

MTB> Regress C1 3 C2-C4;
SUBC> Predict x1 x2 x3.

The subcommand PREDICT in Minitab, followed by fixed values of $x_1$, $x_2$, and $x_3$, calculates the estimated value of $\hat y$ (Fit), its estimated standard error (Stdev.Fit), a 95% CI for $E(y)$, and a 95% PI for $y$.

Example. A county assessor wishes to develop a model to relate the market value, $y$, of single-family residences in a community to the variables:
$x_1$: living area in hundreds of square feet;
$x_2$: number of floors;
$x_3$: number of bedrooms;
$x_4$: number of baths.

Observations were recorded for 29 randomly selected single-family homes from residences recently sold at fair market value. The resulting prediction equation will then be used for assessing the values of single-family residences in the county to establish the amount each homeowner owes in property taxes. A Minitab printout is given below:

MTB> Regress C1 4 C2-C5;
SUBC> Predict 10 1 3 2;
SUBC> Predict 14 2 3 2.5.

The regression equation is
y = -16.6 + 7.84 x1 - 34.4 x2 - 7.99 x3 + 54.9 x4

| Predictor | Coef | Stdev | t-ratio | P |
|---|---|---|---|---|
| Constant | -16.58 | 18.88 | -0.88 | 0.389 |
| x1 | 7.839 | 1.234 | 6.35 | 0.000 |
| x2 | -34.39 | 11.15 | -3.09 | 0.005 |
| x3 | -7.990 | 8.249 | -0.97 | 0.342 |
| x4 | 54.93 | 13.52 | 4.06 | 0.000 |

s = 16.58   $R^2$ = 88.2%   $R^2$(adj) = 86.2%

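Analysis of Variance

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Reg. | 4 | 49359 | 12340 | 44.88 | 0.000 |
| Error | 24 | 6599 | 275 | | |
| Totals | 28 | 55958 | | | |

| Source | df | SS |
|---|---|---|
| x1 | 1 | 44444 |
| x2 | 1 | 59 |
| x3 | 1 | 321 |
| x4 | 1 | 4536 |

| Fit | Stdev.Fit | 95% C.I. | 95% P.I. |
|---|---|---|---|
| 113.32 | 5.80 | (101.34, 125.30) | (77.05, 149.59) |
| 137.75 | 5.48 | (126.44, 149.07) | (101.70, 173.81) |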
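The printed Fit, C.I., P.I., F, $R^2$ and $R^2$(adj) can be reproduced from the coefficients and sums of squares above, which is a useful way to see what PREDICT reports. In this sketch $t_{0.025,24} = 2.064$ is an assumption read from a standard t table, and the prediction-interval standard error $\sqrt{MSE + \text{Stdev.Fit}^2}$ is the usual formula for a new observation. Small differences in the last digit come from the rounded coefficients.

```python
import math

# Coefficients (Constant, x1..x4) and sums of squares from the printout.
coef = [-16.58, 7.839, -34.39, -7.990, 54.93]
ssr, sse, n, k = 49359, 6599, 29, 4           # k = number of predictors
T_CRIT = 2.064                                # assumption: t_{0.025,24} from a t table

def fit(x):
    """Estimated y for predictor values x = (x1, x2, x3, x4)."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))

mse = sse / (n - k - 1)
F = (ssr / k) / mse
r2 = ssr / (ssr + sse)
r2_adj = 1 - mse / ((ssr + sse) / (n - 1))

def intervals(y_hat, se_fit):
    """95% CI for E(y) and 95% PI for a new y, given the fit and its standard error."""
    ci = (y_hat - T_CRIT * se_fit, y_hat + T_CRIT * se_fit)
    se_pred = math.sqrt(mse + se_fit ** 2)    # adds the error variance for a new y
    pi = (y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred)
    return ci, pi

for x, se_fit in [((10, 1, 3, 2), 5.80), ((14, 2, 3, 2.5), 5.48)]:
    y_hat = fit(x)
    ci, pi = intervals(y_hat, se_fit)
    print(f"x = {x}: fit = {y_hat:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"PI = ({pi[0]:.2f}, {pi[1]:.2f})")
print(f"F = {F:.2f}, R2 = {r2:.3f}, R2(adj) = {r2_adj:.3f}, s = {math.sqrt(mse):.2f}")
```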

Q1. What type of model has been chosen to fit the data?
Multiple linear regression model.

Q2. What is the prediction equation?
The regression equation is $\hat y = -16.6 + 7.84x_1 - 34.4x_2 - 7.99x_3 + 54.9x_4$.

Q3. Do the data provide sufficient evidence to indicate that the model contributes information for the prediction of $y$? Test using $\alpha = 0.05$.

Test:
$H_0$: model not useful
$H_a$: model is useful

T.S.: p-value = 0.000
DR: Reject $H_0$ if $\alpha >$ p-value.
Graph:
Decision: Reject $H_0$.
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate that the model contributes information for the prediction of $y$.

Q4. Give a 95% CI for $E(y)$ and PI for $y$ when $x_1 = 10$, $x_2 = 1$, $x_3 = 3$, and $x_4 = 2$.
CI: (101.34, 125.30)
PI: (77.05, 149.59)

Non-Linear Models

Example.
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_1^2 x_2$$
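A model like the one above is non-linear in $x$ but still linear in the $\beta$'s, so it can be fitted by ordinary least squares once the derived column $x_1^2 x_2$ is added to the data. The sketch below illustrates this with hypothetical, noise-free data generated from known coefficients (`TRUE_BETA`, our choice), so the fit should recover them up to floating-point rounding.

```python
# y = b0 + b1*x1 + b2*x2 + b3*(x1**2 * x2): non-linear in x, linear in the betas.
TRUE_BETA = [1.0, 2.0, -0.5, 0.25]            # hypothetical coefficients

def design_row(x1, x2):
    """Row of the design matrix: intercept, x1, x2 and the derived term x1^2 * x2."""
    return [1.0, x1, x2, x1 ** 2 * x2]

def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z

# HYPOTHETICAL, noise-free observations on a small grid of (x1, x2) values.
points = [(x1, x2) for x1 in (0, 1, 2, 3) for x2 in (1, 2, 3)]
X = [design_row(x1, x2) for x1, x2 in points]
y = [sum(b * v for b, v in zip(TRUE_BETA, row)) for row in X]

p = len(TRUE_BETA)
xtx = [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]
xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
beta = solve(xtx, xty)                        # least squares via the normal equations
print("recovered betas:", [round(b, 6) for b in beta])
```

With real data the same design-matrix construction applies; only the recovered coefficients would no longer match a known truth exactly. In Minitab the derived column would be created first and then included as an extra predictor in REGRESS.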
