MBA 604

Introduction to Probability and Statistics
Lecture Notes
Muhammad El-Taha
Department of Mathematics and Statistics
University of Southern Maine
96 Falmouth Street
Portland, ME 04104-9300
MBA 604, Spring 2003
Course Content.
Topic 1: Data Analysis
Topic 2: Probability
Topic 3: Random Variables and Discrete Distributions
Topic 4: Continuous Probability Distributions
Topic 5: Sampling Distributions
Topic 6: Point and Interval Estimation
Topic 7: Large Sample Estimation
Topic 8: Large-Sample Tests of Hypothesis
Topic 9: Inferences From Small Sample
Topic 10: The Analysis of Variance
Topic 11: Simple Linear Regression and Correlation
Topic 12: Multiple Linear Regression
Contents
1 Data Analysis
    1 Introduction
    2 Graphical Methods
    3 Numerical methods
    4 Percentiles
    5 Sample Mean and Variance For Grouped Data
    6 z-score
2 Probability
    1 Sample Space and Events
    2 Probability of an event
    3 Laws of Probability
    4 Counting Sample Points
    5 Random Sampling
    6 Modeling Uncertainty
3 Discrete Random Variables
    1 Random Variables
    2 Expected Value and Variance
    3 Discrete Distributions
    4 Markov Chains
4 Continuous Distributions
    1 Introduction
    2 The Normal Distribution
    3 Uniform: U[a,b]
    4 Exponential
5 Sampling Distributions
    1 The Central Limit Theorem (CLT)
    2 Sampling Distributions
6 Large Sample Estimation
    1 Introduction
    2 Point Estimators and Their Properties
    3 Single Quantitative Population
    4 Single Binomial Population
    5 Two Quantitative Populations
    6 Two Binomial Populations
7 Large-Sample Tests of Hypothesis
    1 Elements of a Statistical Test
    2 A Large-Sample Statistical Test
    3 Testing a Population Mean
    4 Testing a Population Proportion
    5 Comparing Two Population Means
    6 Comparing Two Population Proportions
    7 Reporting Results of Statistical Tests: P-Value
8 Small-Sample Tests of Hypothesis
    1 Introduction
    2 Student's t Distribution
    3 Small-Sample Inferences About a Population Mean
    4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples
    5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples
    6 Inferences About a Population Variance
    7 Comparing Two Population Variances
9 Analysis of Variance
    1 Introduction
    2 One Way ANOVA: Completely Randomized Experimental Design
    3 The Randomized Block Design
10 Simple Linear Regression and Correlation
    1 Introduction
    2 A Simple Linear Probabilistic Model
    3 Least Squares Prediction Equation
    4 Inferences Concerning the Slope
    5 Estimating E(y|x) For a Given x
    6 Predicting y for a Given x
    7 Coefficient of Correlation
    8 Analysis of Variance
    9 Computer Printouts for Regression Analysis
11 Multiple Linear Regression
    1 Introduction: Example
    2 A Multiple Linear Model
    3 Least Squares Prediction Equation
Chapter 1
Data Analysis
Chapter Content.
Introduction
Statistical Problems
Descriptive Statistics
Graphical Methods
Frequency Distributions (Histograms)
Other Methods
Numerical methods
Measures of Central Tendency
Measures of Variability
Empirical Rule
Percentiles
1 Introduction
Statistical Problems
1. A market analyst wants to know the effectiveness of a new diet.
2. A pharmaceutical company wants to know whether a new drug is superior to existing drugs, and what its possible side effects are.
3. How fuel-efficient is a certain car model?
4. Is there any relationship between your GPA and employment opportunities?
5. If you answer all questions on a true/false (or multiple-choice) examination completely randomly, what are your chances of passing?
6. What is the effect of package design on sales?
7. How should polls be interpreted? How many individuals do you need to sample for your inferences to be acceptable? What is meant by the margin of error?
8. What is the effect of market strategy on market share?
9. How should you pick stocks to invest in?
I. Definitions
Probability: A game of chance
Statistics: Branch of science that deals with data analysis
Course objective: To make decisions in the presence of uncertainty
Terminology
Data: Any recorded event (e.g. times to assemble a product)
Information: Any acquired data (e.g. a collection of numbers)
Knowledge: Useful data
Population: set of all measurements of interest
(e.g. all registered voters, all freshman students at the university)
Sample: A subset of measurements selected from the population of interest
Variable: A property of an individual population unit (e.g. major, height, weight of
freshman students)
Descriptive Statistics: deals with procedures used to summarize the information con-
tained in a set of measurements.
Inferential Statistics: deals with procedures used to make inferences (predictions)
about a population parameter from information contained in a sample.
Elements of a statistical problem:
(i) A clear definition of the population and variable of interest.
(ii) a design of the experiment or sampling procedure.
(iii) Collection and analysis of data (gathering and summarizing data).
(iv) Procedure for making predictions about the population based on sample infor-
mation.
(v) A measure of “goodness” or reliability for the procedure.
Objective. (better statement)
To make inferences (predictions, decisions) about certain characteristics of a popula-
tion based on information contained in a sample.
Types of data: qualitative vs quantitative OR discrete vs continuous
Descriptive statistics
Graphical vs numerical methods
2 Graphical Methods
Frequency and relative frequency distributions (Histograms):
Example
Weight Loss Data
20.5 19.5 15.6 24.1 9.9
15.4 12.7 5.4 17.0 28.6
16.9 7.8 23.3 11.8 18.4
13.4 14.3 19.2 9.2 16.8
8.8 22.1 20.8 12.6 15.9
Objective: Provide a useful summary of the available information.
Method: Construct a statistical graph called a “histogram” (or frequency distribution)
Weight Loss Data
class   boundaries    freq, f   rel. freq, f/n
1       5.0-9.0-      3         3/25 (.12)
2       9.0-13.0-     5         5/25 (.20)
3       13.0-17.0-    7         7/25 (.28)
4       17.0-21.0-    6         6/25 (.24)
5       21.0-25.0-    3         3/25 (.12)
6       25.0-29.0     1         1/25 (.04)
Totals                25        1.00
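The tally behind this table can be reproduced mechanically. The following is a minimal sketch, assuming that each class includes its lower boundary and excludes its upper one (with the last class also including 29.0); the helper name `class_index` is illustrative and not part of the notes.

```python
from collections import Counter

# Weight loss data from the example above (n = 25).
data = [20.5, 19.5, 15.6, 24.1, 9.9,
        15.4, 12.7, 5.4, 17.0, 28.6,
        16.9, 7.8, 23.3, 11.8, 18.4,
        13.4, 14.3, 19.2, 9.2, 16.8,
        8.8, 22.1, 20.8, 12.6, 15.9]

lower, width, k = 5.0, 4.0, 6   # class boundaries 5.0, 9.0, ..., 29.0

def class_index(x):
    """Return the 1-based class of x; the last class also includes its upper boundary."""
    i = int((x - lower) // width) + 1
    return min(i, k)

freq = Counter(class_index(x) for x in data)
n = len(data)
for c in range(1, k + 1):
    lo = lower + (c - 1) * width
    print(f"{c}  {lo:4.1f}-{lo + width:4.1f}  f = {freq[c]}  f/n = {freq[c] / n:.2f}")
```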
Let
k = # of classes
max = largest measurement
min = smallest measurement
n = sample size
w = class width
Rule of thumb:
-The number of classes chosen is usually between 5 and 20. (Most of the time between
7 and 13.)
-The more data one has the larger is the number of classes.
Formulas:
$$k = 1 + 3.3\log_{10}(n); \qquad w = \frac{\max - \min}{k}.$$
Note: $w = \frac{28.6 - 5.4}{6} = 3.87$. But we used $w = \frac{29 - 5}{6} = 4.0$ (why?)
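As a quick check of the two formulas for the weight loss data, here is a small sketch. Rounding the number of classes up to the next integer is an assumption of this sketch; the notes only give the rule of thumb.

```python
import math

n = 25                      # sample size of the weight loss data
smallest, largest = 5.4, 28.6

k_raw = 1 + 3.3 * math.log10(n)   # rule-of-thumb number of classes
k = math.ceil(k_raw)              # rounding up is an assumption of this sketch
w = (largest - smallest) / k      # class width before rounding to a convenient value

print(round(k_raw, 2), k, round(w, 2))
```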
Graphs: Graph the frequency and relative frequency distributions.
Exercise. Repeat the above example using 12 and 4 classes respectively. Comment on
the usefulness of each including k = 6.
Steps in Constructing a Frequency Distribution (Histogram)
1. Determine the number of classes
2. Determine the class width
3. Locate class boundaries
4. Proceed as above
Possible shapes of frequency distributions
1. Normal distribution (Bell shape)
2. Exponential
3. Uniform
4. Binomial, Poisson (discrete variables)
Important
-The normal distribution is the most popular, most useful, easiest to handle
- It occurs naturally in practical applications
- It lends itself easily to more in depth analysis
Other Graphical Methods
-Statistical Table: Comparing different populations
- Bar Charts
- Line Charts
- Pie-Charts
- Cheating with Charts
3 Numerical methods
Measures of Central Tendency    Measures of Dispersion (Variability)
1. Sample mean                  1. Range
2. Sample median                2. Mean Absolute Deviation (MAD)
3. Sample mode                  3. Sample Variance
                                4. Sample Standard Deviation
I. Measures of Central Tendency
Given a sample of measurements $(x_1, x_2, \dots, x_n)$ where
$n$ = sample size
$x_i$ = value of the $i$th observation in the sample
1. Sample Mean (arithmetic average)
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x}{n}$$
Example 1: Given a sample of 5 test grades
(90, 95, 80, 60, 75)
then
$$\sum x = 90 + 95 + 80 + 60 + 75 = 400$$
$$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80.$$
Example 2: Let $x$ = age of a randomly selected student sample:
(20, 18, 22, 29, 21, 19)
$$\sum x = 20 + 18 + 22 + 29 + 21 + 19 = 129$$
$$\bar{x} = \frac{\sum x}{n} = \frac{129}{6} = 21.5$$
2. Sample Median
The median of a sample (data set) is the middle number when the measurements are
arranged in ascending order.
Note:
If n is odd, the median is the middle number
If n is even, the median is the average of the middle two numbers.
Example 1: Sample (9, 2, 7, 11, 14), n = 5
Step 1: arrange in ascending order
2, 7, 9, 11, 14
Step 2: med = 9.
Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6
Step 1: 2, 6, 7, 9, 11, 14
Step 2: med = $\frac{7 + 9}{2} = 8$.
Remarks:
(i) $\bar{x}$ is sensitive to extreme values
(ii) the median is insensitive to extreme values (because median is a measure of
location or position).
3. Mode
The mode is the value of x (observation) that occurs with the greatest frequency.
Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7
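The three measures of central tendency can be checked against the examples above with Python's standard `statistics` module. This is only a sketch for verifying hand calculations; the variable names are illustrative.

```python
import statistics

# Sample mean of the five test grades.
mean_grade = statistics.mean([90, 95, 80, 60, 75])

# Median for an odd and an even sample size.
median_odd = statistics.median([9, 2, 7, 11, 14])
median_even = statistics.median([9, 2, 7, 11, 6, 14])

# Mode: the most frequently occurring value.
mode_value = statistics.mode([9, 2, 7, 11, 14, 7, 2, 7])

print(mean_grade, median_odd, median_even, mode_value)
```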
Effect of $\bar{x}$, median and mode on relative frequency distribution. (Figure not reproduced.)
II. Measures of Variability
Given: a sample of size n
sample: $(x_1, x_2, \dots, x_n)$
1. Range:
Range = largest measurement - smallest measurement
or Range = max - min
Example 1: Sample (90, 85, 65, 75, 70, 95)
Range = max - min = 95-65 = 30
2. Mean Absolute Difference (MAD) (not in textbook)
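$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$$
Example 2: Same sample
$$\bar{x} = \frac{\sum x}{n} = 80$$
$x$       $x - \bar{x}$   $|x - \bar{x}|$
90        10              10
85        5               5
65        -15             15
75        -5              5
70        -10             10
95        15              15
Totals 480   0            60
$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{60}{6} = 10.$$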
Remarks:
(i) MAD is a good measure of variability
(ii) It is difficult for mathematical manipulations
3. Sample Variance, $s^2$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
4. Sample Standard Deviation, $s$
$$s = \sqrt{s^2} \quad \text{or} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example: Same sample as before ($\bar{x} = 80$)
$x$       $x - \bar{x}$   $(x - \bar{x})^2$
90        10              100
85        5               25
65        -15             225
75        -5              25
70        -10             100
95        15              225
Totals 480   0            700
Therefore
$$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{700}{5} = 140$$
$$s = \sqrt{s^2} = \sqrt{140} = 11.83$$
Shortcut Formula for Calculating $s^2$ and $s$
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \quad (\text{or } s = \sqrt{s^2}).$$
Example: Same sample
$x$       $x^2$
90        8100
85        7225
65        4225
75        5625
70        4900
95        9025
Totals 480   39,100
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{39{,}100 - \frac{(480)^2}{6}}{5} = \frac{39{,}100 - 38{,}400}{5} = \frac{700}{5} = 140$$
$$s = \sqrt{s^2} = \sqrt{140} = 11.83.$$
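The defining formula and the shortcut formula should always agree. A short sketch that computes both for the same sample, along with the MAD, is given below; the variable names are illustrative.

```python
import math

sample = [90, 85, 65, 75, 70, 95]
n = len(sample)
mean = sum(sample) / n

# Defining formula: sum of squared deviations divided by n - 1.
s2_def = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
s2_short = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# Mean absolute deviation about the sample mean.
mad = sum(abs(x - mean) for x in sample) / n

s = math.sqrt(s2_def)
print(s2_def, s2_short, round(s, 2), mad)
```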
Numerical methods (Summary)
Data: $\{x_1, x_2, \dots, x_n\}$
(i) Measures of central tendency
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample median: the middle number when the measurements are arranged in ascending order
Sample mode: most frequently occurring value
(ii) Measures of variability
Range: $r = \max - \min$
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$
Exercise: Find all the measures of central tendency and measures of variability for the
weight loss example.
Graphical Interpretation of the Variance:
Finite Populations
Let N = population size.
Data: $\{x_1, x_2, \dots, x_N\}$
Population mean: $\mu = \frac{\sum x_i}{N}$
Population variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$, i.e.
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
Population parameters vs sample statistics.
Sample statistics: $\bar{x}$, $s^2$, $s$.
Population parameters: $\mu$, $\sigma^2$, $\sigma$.
Practical Significance of the standard deviation
Chebyshev’s Inequality. (Regardless of the shape of frequency distribution)
Given a number $k \ge 1$ and a set of measurements $x_1, x_2, \dots, x_n$, at least $(1 - \frac{1}{k^2})$ of the measurements lie within $k$ standard deviations of their sample mean.
Restated. At least $(1 - \frac{1}{k^2})$ of the observations lie in the interval $(\bar{x} - ks, \bar{x} + ks)$.
Example. A set of grades has $\bar{x} = 75$, $s = 6$. Then
(i) (k = 1): at least 0% of all grades lie in [69, 81]
(ii) (k = 2): at least 75% of all grades lie in [63, 87]
(iii) (k = 3): at least 88% of all grades lie in [57, 93]
(iv) (k = 4): at least ?% of all grades lie in [?, ?]
(v) (k = 5): at least ?% of all grades lie in [?, ?]
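The bounds in this example follow directly from the inequality. A minimal sketch, with an illustrative helper name `chebyshev_bound`, reproduces the first three cases and can be reused for other values of $k$.

```python
def chebyshev_bound(mean, s, k):
    """Return (lower, upper, minimum fraction) for k standard deviations around the mean."""
    return mean - k * s, mean + k * s, 1 - 1 / k ** 2

for k in (1, 2, 3):
    lo, hi, frac = chebyshev_bound(75, 6, k)
    print(f"k = {k}: at least {frac:.3f} of grades lie in [{lo}, {hi}]")
```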
Suppose that you are told that the frequency distribution is bell shaped. Can you
improve the estimates in Chebyshev's Inequality?
Empirical rule. Given a set of measurements $x_1, x_2, \dots, x_n$ whose frequency distribution is bell shaped. Then
(i) approximately 68% of the measurements lie within one standard deviation of their sample mean, i.e. in $(\bar{x} - s, \bar{x} + s)$
(ii) approximately 95% of the measurements lie within two standard deviations of their sample mean, i.e. in $(\bar{x} - 2s, \bar{x} + 2s)$
(iii) almost all (at least 99%) of the measurements lie within three standard deviations of their sample mean, i.e. in $(\bar{x} - 3s, \bar{x} + 3s)$
Example. A data set has $\bar{x} = 75$, $s = 6$. The frequency distribution is known to be
normal (bell shaped). Then
(i) (69, 81) contains approximately 68% of the observations
(ii) (63, 87) contains approximately 95% of the observations
(iii) (57, 93) contains at least 99% (almost all) of the observations
Comments.
(i) Empirical rule works better if sample size is large
(ii) In your calculations always keep 6 significant digits
(iii) Approximation: $s \approx \frac{\text{range}}{4}$
(iv) Coefficient of variation (c.v.) = $\frac{s}{\bar{x}}$
4 Percentiles
Using percentiles is useful if data is badly skewed.
Let $x_1, x_2, \dots, x_n$ be a set of measurements arranged in increasing order.
Definition. Let $0 < p < 100$. The $p$th percentile is a number $x$ such that $p\%$ of all measurements fall below the $p$th percentile and $(100 - p)\%$ fall above it.
Example. Data: 2, 5, 8, 10, 11, 14, 17, 20.
(i) Find the 30th percentile.
Solution.
(S1) position = .3(n + 1) = .3(9) = 2.7
(S2) 30th percentile = 5 + .7(8 − 5) = 5 + 2.1 = 7.1
Special Cases.
1. Lower Quartile (25th percentile)
Example.
(S1) position = .25(n + 1) = .25(9) = 2.25
(S2) $Q_1$ = 5 + .25(8 − 5) = 5 + .75 = 5.75
2. Median (50th percentile)
Example.
(S1) position = .5(n + 1) = .5(9) = 4.5
(S2) median: $Q_2$ = 10 + .5(11 − 10) = 10.5
3. Upper Quartile (75th percentile)
Example.
(S1) position = .75(n + 1) = .75(9) = 6.75
(S2) $Q_3$ = 14 + .75(17 − 14) = 16.25
Interquartile range.
$$IQ = Q_3 - Q_1$$
Exercise. Find the interquartile range (IQ) in the above example.
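The two-step procedure (S1, S2) used above can be written as a small function. This is a sketch of the position rule $p(n+1)/100$ with linear interpolation as used in these examples; clamping positions that fall outside $[1, n]$ is an assumption of the sketch, since the notes do not cover that case. Statistical software often uses other percentile conventions and may give slightly different values.

```python
def percentile(sorted_data, p):
    """p-th percentile using the position rule p/100 * (n + 1) with linear interpolation."""
    n = len(sorted_data)
    position = p / 100 * (n + 1)          # (S1) 1-based position, possibly fractional
    position = min(max(position, 1), n)   # clamping to [1, n] is an assumption of this sketch
    whole = int(position)
    frac = position - whole
    if whole == n:
        return sorted_data[-1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + frac * (above - below)  # (S2) interpolate between neighbours

data = [2, 5, 8, 10, 11, 14, 17, 20]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(percentile(data, 30), q1, q2, q3)
```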
5 Sample Mean and Variance
For Grouped Data
Example: (weight loss data)
Weight Loss Data
class   boundaries    mid-pt. $x$   freq. $f$   $xf$    $x^2 f$
1       5.0-9.0-      7             3           21      147
2       9.0-13.0-     11            5           55      605
3       13.0-17.0-    15            7           105     1,575
4       17.0-21.0-    19            6           114     2,166
5       21.0-25.0-    23            3           69      1,587
6       25.0-29.0     27            1           27      729
Totals                              25          391     6,809
Let k = number of classes.
Formulas.
$$\bar{x}_g = \frac{\sum xf}{n}$$
$$s_g^2 = \frac{\sum x^2 f - (\sum xf)^2 / n}{n - 1}$$
where the summation is over the number of classes k.
Exercise: Use the grouped data formulas to calculate the sample mean, sample variance
and sample standard deviation of the grouped data in the weight loss example. Compare
with the raw data results.
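One way to check a hand calculation for this exercise is to apply the grouped-data formulas to the midpoints and frequencies in the table. The sketch below does this; the variable names are illustrative.

```python
import math

midpoints = [7, 11, 15, 19, 23, 27]   # class midpoints x
freqs = [3, 5, 7, 6, 3, 1]            # class frequencies f

n = sum(freqs)
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))

mean_g = sum_xf / n                               # grouped sample mean
var_g = (sum_x2f - sum_xf ** 2 / n) / (n - 1)     # grouped sample variance
sd_g = math.sqrt(var_g)                           # grouped sample standard deviation
print(n, sum_xf, sum_x2f)
```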
6 z-score
1. The sample z-score for a measurement $x$ is
$$z = \frac{x - \bar{x}}{s}$$
2. The population z-score for a measurement $x$ is
$$z = \frac{x - \mu}{\sigma}$$
Example. A set of grades has $\bar{x} = 75$, $s = 6$. Suppose your score is 85. What is your relative standing (i.e. how many standard deviations, $s$, above or below the mean is your score)?
Answer.
$$z = \frac{x - \bar{x}}{s} = \frac{85 - 75}{6} = 1.67$$
standard deviations above average.
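The same calculation as a one-line function, as a sketch; the name `z_score` is illustrative, and the same function serves for the sample and population versions depending on which center and spread are passed in.

```python
def z_score(x, center, spread):
    """Number of standard deviations that x lies above (+) or below (-) the center."""
    return (x - center) / spread

z = z_score(85, 75, 6)
print(round(z, 2))
```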
Review Exercises: Data Analysis
Please show all work. No credit for a correct final answer without a valid argu-
ment. Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. (Fluoride Problem) The regulation board of health in a particular state specifies
that the fluoride level must not exceed 1.5 ppm (parts per million). The 25 measurements
below represent the fluoride level for a sample of 25 days. Although fluoride levels are
measured more than once per day, these data represent the early morning readings for
the 25 days sampled.
.75 .86 .84 .85 .97
.94 .89 .84 .83 .89
.88 .78 .77 .76 .82
.71 .92 1.05 .94 .83
.81 .85 .97 .93 .79
(i) Show that $\bar{x} = .8588$, $s^2 = .0065$, $s = .0803$.
(ii) Find the range, R.
(iii) Using k = 7 classes, find the width, w, of each class interval.
(iv) Locate class boundaries
(v) Construct the frequency and relative frequency distributions for the data.
class frequency relative frequency
.70-.75-
.75-.80-
.80-.85-
.85-.90-
.90-.95-
.95-1.00-
1.00-1.05
Totals
(vi) Graph the frequency and relative frequency distributions and state your conclu-
sions. (Vertical axis must be clearly labeled)
2. Given the following data set (weight loss per week)
(9, 2, 5, 8, 4, 5)
(i) Find the sample mean.
(ii) Find the sample median.
(iii) Find the sample mode.
(iv) Find the sample range.
(v) Find the mean absolute difference.
(vi) Find the sample variance using the defining formula.
(vii) Find the sample variance using the short-cut formula.
(viii) Find the sample standard deviation.
(ix) Find the first and third quartiles, $Q_1$ and $Q_3$.
(x) Repeat (i)-(ix) for the data set (21, 24, 15, 16, 24).
Answers: $\bar{x} = 5.5$, med = 5, mode = 5, range = 7, MAD = 2, $s^2 = 6.7$, $s = 2.588$, $Q_3 = 8.25$.
3. Grades for 50 students from a previous MAT test are summarized below.
class     frequency, $f$   $xf$   $x^2 f$
40-50-    4
50-60-    6
60-70-    10
70-80-    15
80-90-    10
90-100    5
Totals
(i) Complete all entries in the table.
(ii) Graph the frequency distribution. (Vertical axis must be clearly labeled)
(iii) Find the sample mean for the grouped data
(iv) Find the sample variance and standard deviation for the grouped data.
Answers: $\sum xf = 3610$, $\sum x^2 f = 270{,}250$, $\bar{x} = 72.2$, $s^2 = 196$, $s = 14$.
4. Refer to the raw data in the fluoride problem.
(i) Find the sample mean and standard deviation for the raw data.
(ii) Find the sample mean and standard deviation for the grouped data.
(iii) Compare the answers in (i) and (ii).
Answers: $\sum xf = 21.475$, $\sum x^2 f = 18.58$, $\bar{x}_g = .859$, $s_g = .0745$.
5. Suppose that the mean of a population is 30. Assume the standard deviation is
known to be 4 and that the frequency distribution is known to be bell-shaped.
(i) Approximately what percentage of measurements fall in the interval (22, 34)
(ii) Approximately what percentage of measurements fall in the interval (µ, µ + 2σ)
(iii) Find the interval around the mean that contains 68% of measurements
(iv)Find the interval around the mean that contains 95% of measurements
6. Refer to the data in the fluoride problem. Suppose that the relative frequency
distribution is bell-shaped. Using the empirical rule
(i) find the interval around the mean that contains 99.6% of measurements.
(ii) find the percentage of measurements that fall in the interval $(\mu + 2\sigma, \infty)$
7. (4 pts.) Answer True or False. (Circle your choice.)
T F (i) The median is insensitive to extreme values.
T F (ii) The mean is insensitive to extreme values.
T F (iii) For a positively skewed frequency distribution, the mean is larger than the
median.
T F (iv) The variance is equal to the square of the standard deviation.
T F (v) Numerical descriptive measures computed from sample measurements are
called parameters.
T F (vi) The number of students attending a Mathematics lecture on any given day
is a discrete variable.
T F (vii) The median is a better measure of central tendency than the mean when a
distribution is badly skewed.
T F (viii) Although we may have a large mass of data, statistical techniques allow us
to adequately describe and summarize the data with an average.
T F (ix) A sample is a subset of the population.
T F (x) A statistic is a number that describes a population characteristic.
T F (xi) A parameter is a number that describes a sample characteristic.
T F (xii) A population is a subset of the sample.
T F (xiii) A population is the complete collection of items under study.
Chapter 2
Probability
Contents.
Sample Space and Events
Probability of an Event
Equally Likely Outcomes
Conditional Probability and Independence
Laws of Probability
Counting Sample Points
Random Sampling
1 Sample Space and Events
Definitions
Random experiment: involves obtaining observations of some kind
Examples Toss of a coin, throw a die, polling, inspecting an assembly line, counting
arrivals at emergency room, etc.
Population: Set of all possible observations. Conceptually, a population could be gen-
erated by repeating an experiment indefinitely.
Outcome of an experiment:
Elementary event (simple event): one possible outcome of an experiment
Event (Compound event): One or more possible outcomes of a random experiment
Sample space: the set of all sample points (simple events) for an experiment is called
a sample space; or set of all possible outcomes for an experiment
Notation.
Sample space : S
Sample point: $E_1, E_2, \dots$ etc.
Event: A, B, C, D, E etc. (any capital letter).
Venn diagram:
Example.
$S = \{E_1, E_2, \dots, E_6\}$.
That is S = {1, 2, 3, 4, 5, 6}. We may think of S as representation of possible outcomes
of a throw of a die.
More definitions
Union, Intersection and Complementation
Given A and B two events in a sample space S.
1. The union of $A$ and $B$, $A \cup B$, is the event containing all sample points in either $A$ or $B$ or both. Sometimes we use "$A$ or $B$" for union.
2. The intersection of $A$ and $B$, $A \cap B$, is the event containing all sample points that are in both $A$ and $B$. Sometimes we use $AB$ or "$A$ and $B$" for intersection.
3. The complement of $A$, $A^c$, is the event containing all sample points that are not in $A$. Sometimes we use "not $A$" or $\bar{A}$ for complement.
Mutually Exclusive Events (Disjoint Events) Two events are said to be mutually exclusive (or disjoint) if their intersection is empty (i.e. $A \cap B = \emptyset$).
Example Suppose $S = \{E_1, E_2, \dots, E_6\}$. Let
$A = \{E_1, E_3, E_5\}$;
$B = \{E_1, E_2, E_3\}$. Then
(i) $A \cup B = \{E_1, E_2, E_3, E_5\}$.
(ii) $AB = \{E_1, E_3\}$.
(iii) $A^c = \{E_2, E_4, E_6\}$; $B^c = \{E_4, E_5, E_6\}$;
(iv) A and B are not mutually exclusive (why?)
(v) Give two events in S that are mutually exclusive.
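These event operations map directly onto Python's built-in set operations. In the sketch below each sample point $E_i$ is represented by its index $i$, which is a convention of this example only.

```python
S = {1, 2, 3, 4, 5, 6}      # E_i is represented by its index i
A = {1, 3, 5}
B = {1, 2, 3}

union = A | B               # A or B
intersection = A & B        # A and B
A_c = S - A                 # complement of A
B_c = S - B                 # complement of B

print(union, intersection, A_c, B_c)
print(A.isdisjoint(B))      # True only if A and B are mutually exclusive
```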
2 Probability of an event
Relative Frequency Definition If an experiment is repeated a large number, $n$, of times and the event $A$ is observed $n_A$ times, the probability of $A$ is
$$P(A) \approx \frac{n_A}{n}$$
Interpretation
$n$ = # of trials of an experiment
$n_A$ = frequency of the event $A$
$\frac{n_A}{n}$ = relative frequency of $A$
$P(A) \approx \frac{n_A}{n}$ if $n$ is large enough.
(In fact, $P(A) = \lim_{n \to \infty} \frac{n_A}{n}$.)
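The relative frequency definition can be illustrated by simulation. The sketch below simulates throws of a fair die with Python's pseudo-random generator and tracks how often an even face shows; the event, the seed and the sample sizes are all choices made for this illustration.

```python
import random

def relative_frequency(event, n, seed=0):
    """Throw a fair die n times and return n_A / n for the given event (a set of faces)."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    n_A = sum(rng.randint(1, 6) in event for _ in range(n))
    return n_A / n

A = {2, 4, 6}                      # "an even face shows", so P(A) = 0.5
for n in (100, 1_000, 100_000):
    print(n, relative_frequency(A, n))
```

As $n$ grows, the printed relative frequencies should settle near 0.5, although any single run can deviate.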
Conceptual Definition of Probability
Consider a random experiment whose sample space is $S$ with sample points $E_1, E_2, \dots$.
For each sample point $E_i$ of the sample space $S$ define a number $P(E_i)$ that satisfies the following three conditions:
(i) $0 \le P(E_i) \le 1$ for all $i$
(ii) $P(S) = 1$
(iii) (Additive property)
$$\sum_{S} P(E_i) = 1,$$
where the summation is over all sample points in $S$.
We refer to $P(E_i)$ as the probability of $E_i$.
Definition The probability of any event A is equal to the sum of the probabilities of the
sample points in A.
Example. Let $S = \{E_1, \dots, E_{10}\}$. It is known that $P(E_i) = 1/20$, $i = 1, \dots, 6$, $P(E_i) = 1/5$, $i = 7, 8, 9$, and $P(E_{10}) = 2/20$. In tabular form, we have
$E_i$      $E_1$   $E_2$   $E_3$   $E_4$   $E_5$   $E_6$   $E_7$   $E_8$   $E_9$   $E_{10}$
$P(E_i)$   1/20    1/20    1/20    1/20    1/20    1/20    1/5     1/5     1/5     1/10
Question: Calculate $P(A)$ where $A = \{E_i, i \ge 6\}$.
A:
$$P(A) = P(E_6) + P(E_7) + P(E_8) + P(E_9) + P(E_{10}) = 1/20 + 1/5 + 1/5 + 1/5 + 1/10 = 0.75$$
Steps in calculating probabilities of events
1. Define the experiment
2. List all simple events
3. Assign probabilities to simple events
4. Determine the simple events that constitute an event
5. Add up the simple events’ probabilities to obtain the probability of the event
Example Calculate the probability of observing one H in a toss of two fair coins.
Solution.
S = {HH, HT, TH, TT}
A = {HT, TH}
P(A) = 0.5
Interpretations of Probability
(i) In real world applications one observes (measures) relative frequencies, one cannot
measure probabilities. However, one can estimate probabilities.
(ii) At the conceptual level we assign probabilities to events. The assignment, how-
ever, should make sense. (e.g. P(H)=.5, P(T)=.5 in a toss of a fair coin).
(iii) In some cases probabilities can be a measure of belief (subjective probability).
This measure of belief should however satisfy the axioms.
(iv) Typically, we would like to assign probabilities to simple events directly; then use
the laws of probability to calculate the probabilities of compound events.
Equally Likely Outcomes
The equally likely probability $P$ defined on a finite sample space $S = \{E_1, \dots, E_N\}$ assigns the same probability $P(E_i) = 1/N$ to every $E_i$.
In this case, for any event $A$
$$P(A) = \frac{N_A}{N} = \frac{\text{sample points in } A}{\text{sample points in } S} = \frac{\#(A)}{\#(S)}$$
where $N$ is the number of sample points in $S$ and $N_A$ is the number of sample points in $A$.
Example. Toss a fair coin 3 times.
(i) List all the sample points in the sample space
Solution: S = {HHH, · · · TTT} (Complete this)
(ii) Find the probability of observing exactly two heads, at most one head.
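For small experiments the sample space can be enumerated by machine and the counting formula $P(A) = \#(A)/\#(S)$ applied directly. The following sketch does this for three tosses of a fair coin; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of H and T of length 3: HHH, HHT, ..., TTT.
S = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    """P(A) = #(A) / #(S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

exactly_two = [s for s in S if s.count("H") == 2]
at_most_one = [s for s in S if s.count("H") <= 1]
print(len(S), prob(exactly_two), prob(at_most_one))
```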
3 Laws of Probability
Conditional Probability
The conditional probability of the event $A$ given that event $B$ has occurred is denoted by $P(A|B)$. Then
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
provided $P(B) > 0$. Similarly,
$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$
Independent Events
Definitions. (i) Two events A and B are said to be independent if
P(A∩ B) = P(A)P(B).
(ii) Two events A and B that are not independent are said to be dependent.
Remarks. (i) If A and B are independent, then
P(A|B) = P(A) and P(B|A) = P(B).
(ii) If A is independent of B then B is independent of A.
Probability Laws
Complementation law:
$P(A) = 1 - P(A^c)$
Additive law:
P(A∪ B) = P(A) + P(B) −P(A∩ B)
Moreover, if A and B are mutually exclusive, then P(AB) = 0 and
P(A∪ B) = P(A) + P(B)
Multiplicative law (Product rule)
P(A∩ B) = P(A|B)P(B)
= P(B|A)P(A)
Moreover, if A and B are independent
P(AB) = P(A)P(B)
Example Let $S = \{E_1, E_2, \dots, E_6\}$; $A = \{E_1, E_3, E_5\}$; $B = \{E_1, E_2, E_3\}$; $C = \{E_2, E_4, E_6\}$; $D = \{E_6\}$. Suppose that all elementary events are equally likely.
(i) What does it mean that all elementary events are equally likely?
(ii) Use the complementation rule to find $P(A^c)$.
(iii) Find $P(A|B)$ and $P(B|A)$
(iv) Find $P(D)$ and $P(D|C)$
(v) Are $A$ and $B$ independent? Are $C$ and $D$ independent?
(vi) Find $P(A \cap B)$ and $P(A \cup B)$.
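Hand answers to parts (iii) to (v) can be checked with exact fractions. In this sketch each $E_i$ is represented by its index, and the helper names `P`, `P_given` and `independent` are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # E_i is represented by its index i
A, B, C, D = {1, 3, 5}, {1, 2, 3}, {2, 4, 6}, {6}

def P(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(S))

def P_given(event, given):
    """Conditional probability P(event | given) = P(event and given) / P(given)."""
    return P(event & given) / P(given)

def independent(X, Y):
    """Check the definition P(X and Y) = P(X) P(Y)."""
    return P(X & Y) == P(X) * P(Y)

print(P_given(A, B), P_given(B, A), P(D), P_given(D, C))
print(independent(A, B), independent(C, D))
```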
Law of total probability Let $B$ and $B^c$ be complementary events and let $A$ denote an arbitrary event. Then
$$P(A) = P(A \cap B) + P(A \cap B^c),$$
or
$$P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).$$
Bayes' Law
Let $B$ and $B^c$ be complementary events and let $A$ denote an arbitrary event. Then
$$P(B|A) = \frac{P(AB)}{P(A)} = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)}.$$
Remarks.
(i) The events of interest here are $B$ and $B^c$; $P(B)$ and $P(B^c)$ are called prior probabilities.
(ii) $P(B|A)$ and $P(B^c|A)$ are called posterior (revised) probabilities.
(iii) Bayes' Law is important in several fields of application.
Example 1. A laboratory blood test is 95 percent effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive” result for
1 percent of healthy persons tested. (That is, if a healthy person is tested, then, with
probability 0.01, the test result will imply he or she has the disease.) If 0.5 percent of
the population actually has the disease, what is the probability a person has the disease
given that the test result is positive?
Solution Let $D$ be the event that the tested person has the disease and $E$ the event that the test result is positive. The desired probability $P(D|E)$ is obtained by
$$P(D|E) = \frac{P(D \cap E)}{P(E)} = \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{(.95)(.005)}{(.95)(.005) + (.01)(.995)} = \frac{95}{294} \approx .323.$$
Thus only 32 percent of those persons whose test results are positive actually have the disease.
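The calculation in this example is Bayes' law with the law of total probability in the denominator. A short sketch, with an illustrative function name `posterior`, makes it easy to see how the answer changes with the prior.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """P(D | E) by Bayes' law for complementary events D and D^c."""
    numerator = p_pos_given_d * prior
    total = numerator + p_pos_given_not_d * (1 - prior)   # law of total probability, P(E)
    return numerator / total

p = posterior(prior=0.005, p_pos_given_d=0.95, p_pos_given_not_d=0.01)
print(round(p, 3))
```

Calling `posterior` with a larger prior, for instance 0.05 instead of 0.005, shows how strongly the posterior depends on how common the disease is.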
Probabilities in Tabulated Form
4 Counting Sample Points
Is it always necessary to list all sample points in S?
Coin Tosses
Coins   sample-points        Coins   sample-points
1       2                    2       4
3       8                    4       16
5       32                   6       64
10      1024                 20      1,048,576
30      $\approx 10^{9}$     40      $\approx 10^{12}$
50      $\approx 10^{15}$    64      $\approx 10^{19}$
Note that $2^{30} \approx 10^{9}$ = one billion, $2^{40} \approx 10^{12}$ = one thousand billion, $2^{50} \approx 10^{15}$ = one million billion.
RECALL: $P(A) = \frac{n_A}{n}$, so for some applications we need to find $n$ and $n_A$, where $n$ and $n_A$ are the number of points in $S$ and $A$ respectively.
Basic principle of counting: mn rule
Suppose that two experiments are to be performed. Then if experiment 1 can result
in any one of m possible outcomes and if, for each outcome of experiment 1, there are n
possible outcomes of experiment 2, then together there are mn possible outcomes of the
two experiments.
Examples.
(i) Toss two coins: mn = 2 ×2 = 4
(ii) Throw two dice: mn = 6 ×6 = 36
(iii) A small community consists of 10 men, each of whom has 3 sons. If one man
and one of his sons are to be chosen as father and son of the year, how many different
choices are possible?
Solution: Regarding the choice of the man as the outcome of the first experiment and the
subsequent choice of one of his sons as the outcome of the second experiment, we see,
from the basic principle, that there are 10 × 3 = 30 possible choices.
Generalized basic principle of counting
If $r$ experiments that are to be performed are such that the first one may result in any of $n_1$ possible outcomes, and if for each of these $n_1$ possible outcomes there are $n_2$ possible outcomes of the second experiment, and if for each of the possible outcomes of the first two experiments there are $n_3$ possible outcomes of the third experiment, and if, . . ., then there are a total of $n_1 \cdot n_2 \cdots n_r$ possible outcomes of the $r$ experiments.
Examples
(i) There are 5 routes available between A and B; 4 between B and C; and 7 between
C and D. What is the total number of available routes between A and D?
Solution: The total number of available routes is mnt = 5 · 4 · 7 = 140.
(ii) A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,
and 2 seniors. A subcommittee of 4, consisting of 1 individual from each class, is to be
chosen. How many different subcommittees are possible?
Solution: It follows from the generalized principle of counting that there are 3·4·5·2 =
120 possible subcommittees.
(iii) How many different 7−place license plates are possible if the first 3 places are to
be occupied by letters and the final 4 by numbers?
Solution: It follows from the generalized principle of counting that there are 26 · 26 ·
26 · 10 · 10 · 10 · 10 = 175,760,000 possible license plates.
(iv) In (iii), how many license plates would be possible if repetition among letters or
numbers were prohibited?
Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible
license plates.
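The counting examples above can be checked with a short Python sketch. The helper name `count_outcomes` is our own illustration, not part of the notes; it simply multiplies the number of choices available at each stage.

```python
from math import prod

def count_outcomes(*choices_per_stage):
    """Generalized counting principle: multiply the choices at each stage."""
    return prod(choices_per_stage)

routes = count_outcomes(5, 4, 7)                      # A -> B -> C -> D
subcommittees = count_outcomes(3, 4, 5, 2)            # one person per class
plates_with_repeats = count_outcomes(26, 26, 26, 10, 10, 10, 10)
plates_no_repeats = count_outcomes(26, 25, 24, 10, 9, 8, 7)

print(routes, subcommittees, plates_with_repeats, plates_no_repeats)
```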
Permutations: (Ordered arrangements)
The number of ways of ordering n distinct objects taken r at a time (order is important)
is given by
$\frac{n!}{(n-r)!} = n(n-1)(n-2) \cdots (n-r+1)$
Examples
(i) In how many ways can you arrange the letters a, b and c? List all arrangements.
Answer: There are 3! = 6 arrangements or permutations: abc, acb, bac, bca, cab, cba.
(ii) A box contains 10 balls. Balls are selected without replacement one at a time. In
how many different ways can you select 3 balls?
Solution: Note that n = 10, r = 3. The number of different ways is
$10 \cdot 9 \cdot 8 = \frac{10!}{7!} = 720$
(which is equal to $\frac{n!}{(n-r)!}$).
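As a quick check of the permutation examples, the sketch below uses Python's standard library (`itertools.permutations` and `math.perm`); the variable names are illustrative only.

```python
from itertools import permutations
from math import factorial, perm

# All orderings of the letters a, b, c (3! of them).
arrangements = ["".join(p) for p in permutations("abc")]

# Ordered selections of 3 balls out of 10, drawn without replacement.
ordered_draws = perm(10, 3)

print(arrangements)
print(ordered_draws, factorial(10) // factorial(7))
```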
Combinations
For r ≤ n, we define
$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$
and say that $\binom{n}{r}$ represents the number of possible combinations of n objects taken r at
a time (with no regard to order).
Examples
(i) A committee of 3 is to be formed from a group of 20 people. How many different
committees are possible?
Solution: There are $\binom{20}{3} = \frac{20!}{3!\,17!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$ possible committees.
(ii) From a group of 5 men and 7 women, how many different committees consisting
of 2 men and 3 women can be formed?
Solution: $\binom{5}{2} \binom{7}{3} = 10 \cdot 35 = 350$ possible committees.
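The two committee counts can be reproduced with `math.comb`, as in this small sketch (variable names are our own):

```python
from math import comb

# Committee of 3 chosen from 20 people, order irrelevant.
committees_of_3 = comb(20, 3)

# 2 of 5 men and 3 of 7 women: multiply the two independent choices.
mixed_committees = comb(5, 2) * comb(7, 3)

print(committees_of_3, mixed_committees)
```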
5 Random Sampling
Definition. A sample of size n is said to be a random sample if the n elements are selected
in such a way that every possible combination of n elements has an equal probability of
being selected.
In this case the sampling process is called simple random sampling.
Remarks. (i) If n is large, we say the random sample provides an honest representation
of the population.
(ii) For finite populations the number of possible samples of size n is $\binom{N}{n}$. For instance,
the number of possible samples when N = 28 and n = 4 is $\binom{28}{4} = 20,475$.
(iii) Tables of random numbers may be used to select random samples.
6 Modeling Uncertainty
The purpose of modeling uncertainty (randomness) is to discover the laws of change.
1. Concept of Probability. Even though probability (chance) involves the notion of
change, the laws governing the change may themselves remain fixed as time passes.
Example. Consider a chance experiment: Toss of a coin.
Probabilistic Law. In a fair coin tossing experiment the percentage of (H)eads is very
close to 0.5. In the model (abstraction): P(H) = 0.5 exactly.
Why Probabilistic Reasoning?
Example. Toss 5 coins repeatedly and write down the number of heads observed in each
trial. Now, what percentage of trials produce 2 Heads?
Answer. Use the binomial law to show that
$P(2 \text{ Heads}) = \binom{5}{2} (0.5)^2 (1 - 0.5)^3 = \frac{5!}{2!\,3!} (0.5)^2 (0.5)^3 = 0.3125$
Conclusion. There is no need to carry out this experiment to answer the question.
(Thus saving time and effort).
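To illustrate how the probabilistic answer relates to actually running the experiment, the sketch below computes the exact value and compares it with a simulated relative frequency. The seed, the number of trials, and the function name `binomial_pmf` are arbitrary choices of ours.

```python
import random
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) when X counts successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

exact = binomial_pmf(2, 5, 0.5)

# Simulate: toss 5 fair coins many times, record how often exactly 2 heads appear.
rng = random.Random(604)          # fixed seed so the run is repeatable
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if sum(rng.random() < 0.5 for _ in range(5)) == 2
)
estimate = hits / trials

print(exact, estimate)
```

The simulated frequency should land close to the exact probability, which is the point of the conclusion above: the model gives the answer without the tossing.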
2. The Interplay Between Probability and Statistics. (Theory versus Application)
(i) Theory is an exact discipline developed from logically defined axioms (conditions).
(ii) Theory is related to physical phenomena only in inexact terms (i.e. approximately).
(iii) When theory is applied to real problems, it works (i.e. it makes sense).
Example. A fair die is tossed a very large number of times. It was observed that
face 6 appeared 1,500 times. Estimate how many times the die was tossed.
Answer. About 9,000 times, since face 6 is expected on 1/6 of the tosses and 1,500 × 6 = 9,000.
Review Exercises: Probability
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. An experiment consists of tossing 3 fair coins.
(i) List all the elements in the sample space.
(ii) Describe the following events:
A = { observe exactly two heads}
B = { Observe at most one tail}
C = { Observe at least two heads}
D = {Observe exactly one tail}
(iii) Find the probabilities of events A, B, C, D.
2. Suppose that S = {1, 2, 3, 4, 5, 6} such that P(1) = .1, P(2) = .1,P(3)=.1, P(4)=.2,
P(5) = .2, P(6) = .3.
(i) Find the probability of the event A = {4, 5, 6}.
(ii) Find the probability of the complement of A.
(iii) Find the probability of the event B = {even}.
(iv) Find the probability of the event C = {odd}.
3. An experiment consists of throwing a fair die.
(i) List all the elements in the sample space.
(ii) Describe the following events:
A = { observe a number larger than 3 }
B = { Observe an even number}
C = { Observe an odd number}
(iii) Find the probabilities of events A, B, C.
(iv) Compare problems 2. and 3.
4. Refer to problem 3. Find
(i) A∪ B
(ii) A∩ B
(iii) B ∩ C
(iv) $A^c$
(v) $C^c$
(vi) A ∪ C
(vii) A ∩ C
(viii) Find the probabilities in (i)-(vii).
(ix) Refer to problem 2., and answer questions (i)-(viii).
5. The following probability table gives the intersection probabilities for four events
A, B, C and D:
       A      B
C     .06    .31
D     .55    .08
(The four probabilities sum to 1.00.)
(i) Using the definitions, find P(A), P(B), P(C), P(D), P(C|A), P(D|A) and P(C|B).
(ii) Find $P(B^c)$.
(iii) Find P(A∩ B).
(iv) Find P(A∪ B).
(v) Are B and C independent events? Justify your answer.
(vi) Are B and C mutually exclusive events? Justify your answer.
(vii) Are C and D independent events? Justify your answer.
(viii) Are C and D mutually exclusive events? Justify your answer.
6. Use the laws of probability to justify your answers to the following questions:
(i) If P(A ∪ B) = .6, P(A) = .2, and P(B) = .4, are A and B mutually exclusive?
independent?
(ii) If P(A∪ B) = .65, P(A) = .3, and P(B) = .5, are A and B mutually exclusive?
independent?
(iii) If P(A ∪ B) = .7, P(A) = .4, and P(B) = .5, are A and B mutually exclusive?
independent?
7. Suppose that the following two weather forecasts were reported on two local TV
stations for the same period. First report: The chances of rain are today 30%, tomorrow
40%, both today and tomorrow 20%, either today or tomorrow 60%. Second report: The
chances of rain are today 30%, tomorrow 40%, both today and tomorrow 10%, either
today or tomorrow 60%. Which of the two reports, if any, is more believable? Why? No
credit if answer is not justified. (Hint: Let A and B be the events of rain today and rain
tomorrow.)
8. A box contains five balls, a black (b), white (w), red (r), orange (o), and green (g).
Three balls are to be selected at random.
(i) Find the sample space S (Hint: there are 10 sample points).
S = {bwr, · · ·}
(ii) Find the probability of selecting a black ball.
(iii) Find the probability of selecting one black and one red ball.
9. A box contains four black and six white balls.
(i) If a ball is selected at random, what is the probability that it is white? black?
(ii) If two balls are selected without replacement, what is the probability that both
balls are black? both are white? the first is white and the second is black? the first is
black and the second is white? one ball is black?
(iii) Repeat (ii) if the balls are selected with replacement.
(Hint: Start by defining the events $B_1$ and $B_2$ as the first ball is black and the
second ball is black respectively, and by defining the events $W_1$ and $W_2$ as the first
ball is white and the second ball is white respectively. Then use the product rule.)
10. Answer by True or False. (Circle your choice).
T F (i) An event is a specific collection of simple events.
T F (ii) The probability of an event can sometimes be negative.
T F (iii) If A and B are mutually exclusive events, then they are also dependent.
T F (iv) The sum of the probabilities of all simple events in the sample space may be
less than 1 depending on circumstances.
T F (v) A random sample of n observations from a population is not likely to provide
a good estimate of a parameter.
T F (vi) A random sample of n observations from a population is one in which every
different subset of size n from the population has an equal probability of being selected.
T F (vii) The probability of an event can sometimes be larger than one.
T F (viii) The probability of an elementary event can never be larger than one half.
T F (ix) Although the probability of an event occurring is .9, the event may not occur
at all in 10 trials.
T F (x) If a random experiment has 5 possible outcomes, then the probability of each
outcome is 1/5.
T F (xi) If two events are independent, the occurrence of one event should not affect
the likelihood of the occurrence of the other event.
Chapter 3
Random Variables and Discrete
Distributions
Contents.
Random Variables
Expected Values and Variance
Binomial
Poisson
Hypergeometric
1 Random Variables
The discrete rv arises in situations when the population (or possible outcomes) are
discrete (or qualitative).
Example. Toss a coin 3 times, then
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let the variable of interest, X, be the number of heads observed then relevant events
would be
{X = 0} = {TTT}
{X = 1} = {HTT, THT, TTH}
{X = 2} = {HHT, HTH, THH}
{X = 3} = {HHH}.
The relevant question is to find the probability of each of these events.
Note that X takes integer values even though the sample space consists of H’s and
T’s.
The variable X transforms the problem of calculating probabilities from that of set
theory to calculus.
Definition. A random variable (r.v.) is a rule that assigns a numerical value to each
possible outcome of a random experiment.
Interpretation:
-random: the value of the r.v. is unknown until the outcome is observed
- variable: it takes a numerical value
Notation: We use X, Y , etc. to represent r.v.s.
A Discrete r.v. takes a finite or countably infinite number of possible values
(e.g. toss a coin, throw a die, etc.)
A Continuous r.v. has a continuum of possible values.
(e.g. height, weight, price, etc.)
Discrete Distributions The probability distribution of a discrete r.v., X, assigns a
probability p(x) for each possible x such that
(i) 0 ≤ p(x) ≤ 1, and
(ii) $\sum_x p(x) = 1$,
where the summation is over all possible values of x.
Discrete distributions in tabulated form
Example.
Which of the following defines a probability distribution?
x 0 1 2
p(x) 0.30 0.50 0.20
x 0 1 2
p(x) 0.60 0.50 -0.10
x -1 1 2
p(x) 0.30 0.40 0.20
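The two conditions can be checked mechanically. The following is a minimal sketch; the function name `is_probability_distribution` and the tolerance are our own assumptions, and only the p(x) rows of the three tables matter for the check.

```python
def is_probability_distribution(probs, tol=1e-9):
    """Check (i) every p(x) lies in [0, 1] and (ii) the p(x) sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = abs(sum(probs) - 1) <= tol
    return in_range and sums_to_one

tables = [
    [0.30, 0.50, 0.20],    # first table
    [0.60, 0.50, -0.10],   # second table: contains a negative entry
    [0.30, 0.40, 0.20],    # third table: entries sum to 0.90
]
results = [is_probability_distribution(t) for t in tables]
print(results)
```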
Remarks. (i) Discrete distributions arise when the r.v. X is discrete (qualitative data)
(ii) Continuous distributions arise when the r.v. X is continuous (quantitative data)
Remarks. (i) In data analysis we described a set of data (sample) by dividing it into
classes and calculating relative frequencies.
(ii) In Probability we described a random experiment (population) in terms of events
and probabilities of events.
(iii) Here, we describe a random experiment (population) by using random variables,
and probability distribution functions.
2 Expected Value and Variance
Definition 2.1 The expected value of a discrete rv X is denoted by µ and is defined to
be
$\mu = \sum_x x\,p(x).$
Notation: The expected value of X is also denoted by µ = E[X], or sometimes $\mu_X$ to
emphasize its dependence on X.
Definition 2.2 If X is a rv with mean µ, then the variance of X is defined by
$\sigma^2 = \sum_x (x - \mu)^2 p(x)$
Notation: Sometimes we use $\sigma^2 = V(X)$ (or $\sigma_X^2$).
Shortcut Formula
$\sigma^2 = \sum_x x^2 p(x) - \mu^2$
Definition 2.3 If X is a rv with mean µ, then the standard deviation of X, denoted by
$\sigma_X$ (or simply σ), is defined by
$\sigma = \sqrt{V(X)} = \sqrt{\sum_x (x - \mu)^2 p(x)}$
Shortcut Formula
$\sigma = \sqrt{\sum_x x^2 p(x) - \mu^2}$
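The definitions and the shortcut formula can be compared numerically. This sketch uses the first tabulated distribution from the previous section (x = 0, 1, 2 with p(x) = 0.30, 0.50, 0.20); the function name `mean_and_variance` is illustrative.

```python
from math import sqrt

def mean_and_variance(xs, ps):
    """Return mu, the variance from the definition, and the shortcut variance."""
    mu = sum(x * p for x, p in zip(xs, ps))
    var_definition = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    var_shortcut = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2
    return mu, var_definition, var_shortcut

mu, var_definition, var_shortcut = mean_and_variance([0, 1, 2], [0.30, 0.50, 0.20])
sigma = sqrt(var_definition)
print(mu, var_definition, var_shortcut, sigma)
```

Both variance formulas should agree up to floating-point rounding.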
3 Discrete Distributions
Binomial.
The binomial experiment (distribution) arises in the following situation:
(i) the underlying experiment consists of n independent and identical trials;
(ii) each trial results in one of two possible outcomes, a success or a failure;
(iii) the probability of a success in a single trial is equal to p and remains the same
throughout the experiment; and
(iv) the experimenter is interested in the rv X that counts the number of successes
observed in n trials.
A r.v. X is said to have a binomial distribution with parameters n and p if
$p(x) = \binom{n}{x} p^x q^{n-x} \quad (x = 0, 1, \ldots, n)$
where q = 1 − p.
Mean: $\mu = np$
Variance: $\sigma^2 = npq$, $\sigma = \sqrt{npq}$
Example: Bernoulli.
A rv X is said to have a Bernoulli distribution with parameter p if
Formula: $p(x) = p^x (1-p)^{1-x}, \quad x = 0, 1.$
Tabulated form:
x       0       1
p(x)    1−p     p
Mean: $\mu = p$
Variance: $\sigma^2 = pq$, $\sigma = \sqrt{pq}$
Binomial Tables.
Cumulative probabilities are given in the table.
Example. Suppose X has a binomial distribution with n = 10, p = .4. Find
(i) P(X ≤ 4) = .633
(ii) P(X < 6) = P(X ≤ 5) = .834
(iii) P(X > 4) = 1 −P(X ≤ 4) = 1 −.633 = .367
(iv) P(X = 5) = P(X ≤ 5) −P(X ≤ 4) = .834 −.633 = .201
Exercise: Answer the same question with p = 0.7
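When a table is not at hand, the cumulative probabilities can be computed directly from the formula. The sketch below reproduces the n = 10, p = .4 values; `binomial_pmf` and `binomial_cdf` are helper names of our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial rv with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), the cumulative probability listed in binomial tables."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.4
p_le_4 = binomial_cdf(4, n, p)      # P(X <= 4)
p_lt_6 = binomial_cdf(5, n, p)      # P(X < 6) = P(X <= 5)
p_gt_4 = 1 - p_le_4                 # P(X > 4)
p_eq_5 = p_lt_6 - p_le_4            # P(X = 5)
print(round(p_le_4, 3), round(p_lt_6, 3), round(p_gt_4, 3), round(p_eq_5, 3))
```

Changing `p` to 0.7 gives a way to check the exercise once it has been worked by hand.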
Poisson.
The Poisson random variable arises when counting the number of events that occur
in an interval of time when the events are occurring at a constant rate; examples include
number of arrivals at an emergency room, number of items demanded from an inventory;
number of items in a batch of a random size.
A rv X is said to have a Poisson distribution with parameter λ > 0 if
$p(x) = e^{-\lambda} \lambda^x / x!, \quad x = 0, 1, \ldots$
Graph.
Mean: µ = λ
Variance: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$
Note: $e \approx 2.71828$
Example. Suppose the number of typographical errors on a single page of your book
has a Poisson distribution with parameter λ = 1/2. Calculate the probability that there
is at least one error on this page.
Solution. Letting X denote the number of errors on a single page, we have
$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.5} \approx 0.393$
Rule of Thumb. The Poisson distribution provides good approximations to binomial
probabilities when n is large and µ = np is small, preferably with np ≤ 7.
Example. Suppose that the probability that an item produced by a certain machine
will be defective is 0.1. Find the probability that a sample of 10 items will contain at
most 1 defective item.
Solution. Using the binomial distribution, the desired probability is
$P(X \le 1) = p(0) + p(1) = \binom{10}{0} (0.1)^0 (0.9)^{10} + \binom{10}{1} (0.1)^1 (0.9)^9 = 0.7361$
Using the Poisson approximation with λ = np = 1, we have
$P(X \le 1) \approx e^{-1} + e^{-1} \approx 0.7358$
which is close to the exact answer.
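The comparison between the exact binomial value and its Poisson approximation can be sketched as follows. The helper names are ours, and the example uses the same n = 10 and p = 0.1 as above.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

n, p = 10, 0.1
lam = n * p                                            # Poisson parameter np
exact = sum(binomial_pmf(x, n, p) for x in (0, 1))     # P(X <= 1), binomial
approx = sum(poisson_pmf(x, lam) for x in (0, 1))      # P(X <= 1), Poisson
print(round(exact, 4), round(approx, 4))
```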
Hypergeometric.
The hypergeometric distribution arises when one selects a random sample of size n,
without replacement, from a finite population of size N divided into two classes consisting
of D elements of the first kind and N − D of the second kind. Such a scheme is called
sampling without replacement from a finite dichotomous population.
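Formula:
$f(x) = \frac{\binom{D}{x} \binom{N-D}{n-x}}{\binom{N}{n}},$
where $\max(0, n - N + D) \le x \le \min(n, D)$. We define f(x) = 0 elsewhere.
Mean: $E[X] = n \left( \frac{D}{N} \right)$
Variance: $V(X) = \left( \frac{N-n}{N-1} \right) (n) \left( \frac{D}{N} \left( 1 - \frac{D}{N} \right) \right)$
The factor $\frac{N-n}{N-1}$ is called the finite population correction factor.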
Example. (Sampling without replacement)
Suppose an urn contains D = 10 red balls and N − D = 15 white balls. A random
sample of size n = 8, without replacement, is drawn and the number of red balls is
denoted by X. Then
$f(x) = \frac{\binom{10}{x} \binom{15}{8-x}}{\binom{25}{8}}, \quad 0 \le x \le 8.$
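A short sketch of the urn example: it tabulates f(x) for x = 0, ..., 8, then checks that the probabilities sum to 1 and that the mean matches n(D/N). The function name `hypergeometric_pmf` is an assumption of ours.

```python
from math import comb

def hypergeometric_pmf(x, N, D, n):
    """P(X = x): x items of the first kind in a sample of n drawn without replacement."""
    if x < max(0, n - N + D) or x > min(n, D):
        return 0.0                                   # outside the support
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 25, 10, 8                                  # 10 red, 15 white, draw 8
pmf = [hypergeometric_pmf(x, N, D, n) for x in range(n + 1)]
total = sum(pmf)
mean = sum(x * px for x, px in enumerate(pmf))
print(total, mean, n * D / N)
```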
4 Markov Chains
Example 1.(Brand Switching Problem)
Suppose that a manufacturer of a product (Brand 1) is competing with only one
other similar product (Brand 2). Both manufacturers have been engaged in aggressive
advertising programs which include offering rebates, etc. A survey is taken to find out
the rates at which consumers are switching brands or staying loyal to brands. Responses
to the survey are given below. If the manufacturers are competing for a population of
y = 300,000 buyers, how should they plan for the future (the immediate future and the
long run)?
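Brand Switching Data
                 This week
Last week    Brand 1   Brand 2   Total
Brand 1         90        10      100
Brand 2         40       160      200
Dividing each row by its total gives the transition probabilities:
             Brand 1   Brand 2
Brand 1      90/100    10/100
Brand 2      40/200   160/200
So
$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$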
Question 1. Suppose that customer behavior does not change over time, and that 1/3 of all
customers purchased B1 this week.
What percentage will purchase B1 next week?
What percentage will purchase B2 next week?
What percentage will purchase B1 two weeks from now?
What percentage will purchase B2 two weeks from now?
Solution: Note that $\pi^0 = (1/3, 2/3)$, then
$\pi^1 = (\pi^1_1, \pi^1_2) = (\pi^0_1, \pi^0_2) P$
$\pi^1 = (\pi^1_1, \pi^1_2) = (1/3, 2/3) \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix} = (1.3/3, 1.7/3)$
B1 buyers will be 300,000(1.3/3) = 130,000
B2 buyers will be 300,000(1.7/3) = 170,000.
Two weeks from now: exercise.
Question 2. Determine whether each brand will eventually retain a constant share of
the market.
Solution:
We need to solve $\pi = \pi P$ and $\sum_i \pi_i = 1$, that is
$(\pi_1, \pi_2) = (\pi_1, \pi_2) \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$
and
$\pi_1 + \pi_2 = 1$
Matrix multiplication gives
$\pi_1 = 0.9\pi_1 + 0.2\pi_2$
$\pi_2 = 0.1\pi_1 + 0.8\pi_2$
$\pi_1 + \pi_2 = 1$
One equation is redundant. Choosing the first and the third, we get
$0.1\pi_1 = 0.2\pi_2$ and $\pi_1 + \pi_2 = 1$
which gives
$(\pi_1, \pi_2) = (2/3, 1/3)$
Brand 1 will eventually capture two thirds of the market (200,000 customers).
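The brand-switching calculations can be sketched in plain Python. The function `step` (one multiplication of a row vector by P) is a name of our own, and repeated multiplication is used here as a numerical stand-in for solving π = πP; it is not the method used in the solution above.

```python
def step(pi, P):
    """One week of transitions: the row vector pi times the matrix P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P[0]))]

P = [[0.9, 0.1],
     [0.2, 0.8]]
pi0 = [1 / 3, 2 / 3]

pi1 = step(pi0, P)                 # market shares next week
buyers_next_week = [300_000 * share for share in pi1]

# Long run: keep applying P until the shares stop changing.
pi_long_run = pi0
for _ in range(200):
    pi_long_run = step(pi_long_run, P)

print(pi1, buyers_next_week, pi_long_run)
```

The long-run shares do not depend on the starting vector, which can be seen by rerunning the loop from a different `pi0`.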
Example 2. On any particular day Rebecca is either cheerful (c) or gloomy (g). If she is
cheerful today then she will be cheerful tomorrow with probability 0.7. If she is gloomy
today then she will be gloomy tomorrow with probability 0.4.
(i) What is the transition matrix P?
Solution:
$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{pmatrix}$
(ii) What is the fraction of days Rebecca is cheerful? gloomy?
Solution: The fraction of days Rebecca is cheerful is the probability that on any given
day Rebecca is cheerful. This can be obtained by solving $\pi = \pi P$, where $\pi = (\pi_0, \pi_1)$
and $\pi_0 + \pi_1 = 1$.
Exercise. Complete this problem.
Review Exercises: Discrete Distributions
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. Identify the following as discrete or continuous random variables.
(i) The market value of a publicly listed security on a given day
(ii) The number of printing errors observed in an article in a weekly news magazine
(iii) The time to assemble a product (e.g. a chair)
(iv) The number of emergency cases arriving at a city hospital
(v) The number of sophomores in a randomly selected Math. class at a university
(vi) The rate of interest paid by your local bank on a given day
2. What restrictions do we place on the probabilities associated with a particular
probability distribution?
3. Indicate whether or not the following are valid probability distributions. If they
are not, indicate which of the restrictions has been violated.
(i)
x      -1    0     1     3.5
p(x)   .6    .1    .1    .2
(ii)
x      -1    1     3.5
p(x)   .6    .6    -.2
(iii)
x      -2    1     4     6
p(x)   .2    .2    .2    .1
4. A random variable X has the following probability distribution:
x 1 2 3 4 5
p(x) .05 .10 .15 .45 .25
(i) Verify that X has a valid probability distribution.
(ii) Find the probability that X is greater than 3, i.e. P(X > 3).
(iii) Find the probability that X is greater than or equal to 3, i.e. P(X ≥ 3).
(iv) Find the probability that X is less than or equal to 2, i.e. P(X ≤ 2).
(v) Find the probability that X is an odd number.
(vi) Graph the probability distribution for X.
5. A discrete random variable X has the following probability distribution:
x 10 15 20 25
p(x) .2 .3 .4 .1
(i) Calculate the expected value of X, E(X) = µ.
(ii) Calculate the variance of X, $\sigma^2$.
(iii) Calculate the standard deviation of X, σ.
Answers: µ = 17, $\sigma^2 = 21$, σ = 4.58.
6. For each of the following probability distributions, calculate the expected value of
X, E(X) = µ; the variance of X, $\sigma^2$; and the standard deviation of X, σ.
(i)
x 1 2 3 4
p(x) .4 .3 .2 .1
(ii)
x -2 -1 2 4
p(x) .2 .3 .3 .2
7. In how many ways can a committee of ten be chosen from fifteen individuals?
8. Answer by True or False. (Circle your choice).
T F (i) The expected value is always positive.
T F (ii) A random variable has a single numerical value for each outcome of a random
experiment.
T F (iii) The only rule that applies to all probability distributions is that the possible
random variable values are always between 0 and 1.
T F (iv) A random variable is one that takes on different values depending on the
chance outcome of an experiment.
T F (v) The number of television programs watched per day by a college student is
an example of a discrete random variable.
T F (vi) The monthly volume of gasoline sold in one gas station is an example of a
discrete random variable.
T F (vii) The expected value of a random variable provides a complete description of
the random variable’s probability distribution.
T F (viii) The variance can never be equal to zero.
T F (ix) The variance can never be negative.
T F (x) The probability p(x) for a discrete random variable X must be greater than
or equal to zero but less than or equal to one.
T F (xi) The sum of all probabilities p(x) for all possible values of X is always equal
to one.
T F (xii) The most common method for sampling more than one observation from a
population is called random sampling.
Review Exercises: Binomial Distribution
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. List the properties for a binomial experiment.
2. Give the formula for the binomial probability distribution.
3. Calculate
(i) 5!
(ii) 10!
(iii) $\frac{7!}{3!\,4!}$
4. Consider a binomial distribution with n = 4 and p = .5.
(i) Use the formula to find P(0), P(1), · · · , P(4).
(ii) Graph the probability distribution found in (i)
(iii) Repeat (i) and (ii) when n = 4, and p = .2.
(iv) Repeat (i) and (ii) when n = 4, and p = .8.
5. Consider a binomial distribution with n = 5 and p = .6.
(i) Find P(0) and P(2) using the formula.
(ii) Find P(X ≤ 2) using the formula.
(iii) Find the expected value E(X) = µ
(iv) Find the standard deviation σ
6. Consider a binomial distribution with n = 500 and p = .6.
(i) Find the expected value E(X) = µ
(ii) Find the standard deviation σ
7. Consider a binomial distribution with n = 25 and p = .6.
(i) Find the expected value E(X) = µ
(ii) Find the standard deviation σ
(iii) Find P(0) and P(2) using the table.
(iv) Find P(X ≤ 2) using the table.
(v) Find P(X < 12) using the table.
(vi) Find P(X > 13) using the table.
(vii) Find P(X ≥ 8) using the table.
8. A sales organization makes one sale for every 200 prospects that it contacts. The
organization plans to contact 100, 000 prospects over the coming year.
(i) What is the expected value of X, the annual number of sales?
(ii) What is the standard deviation of X?
(iii) Within what limits would you expect X to fall with 95% probability? (Use the
empirical rule.) Answers: µ = 500, σ = 22.3
9. Identify the binomial experiment in the following group of statements.
(i) a shopping mall is interested in the income levels of its customers and is taking a
survey to gather information
(ii) a business firm introducing a new product wants to know how many purchases
its clients will make each year
(iii) a sociologist is researching an area in an effort to determine the proportion of
households with male “head of households”
(iv) a study is concerned with the average hours worked by teenagers who are attending
high school
(v) Determining whether or not a manufactured item is defective.
(vi) Determining the number of words typed before a typist makes an error.
(vii) Determining the weekly pay rate per employee in a given company.
10. Answer by True or False. (Circle your choice).
T F (i) In a binomial experiment each trial is independent of the other trials.
T F (ii) A binomial distribution is a discrete probability distribution.
T F (iii) The standard deviation of a binomial probability distribution is given by npq.
Chapter 4
Continuous Distributions
Contents.
1. Standard Normal
2. Normal
3. Uniform
4. Exponential
1 Introduction
RECALL: The continuous rv arises in situations when the population (or possible
outcomes) are continuous (or quantitative).
Example. Observe the lifetime of a light bulb, then
S = {x, 0 ≤ x < ∞}
Let the variable of interest, X, be observed lifetime of the light bulb then relevant events
would be {X ≤ x}, {X ≥ 1000}, or {1000 ≤ X ≤ 2000}.
The relevant question is to find the probability of each of these events.
Important. For any continuous pdf the area under the curve is equal to 1.
2 The Normal Distribution
Standard Normal.
A normally distributed (bell shaped) random variable with µ = 0 and σ = 1 is said
to have the standard normal distribution. It is denoted by the letter Z.
pdf of Z:
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}; \quad -\infty < z < \infty,$
Graph.
Tabulated Values.
Values of P(0 ≤ Z ≤ z) are tabulated in the appendix.
Critical Values: The critical values $z_\alpha$ of the standard normal distribution are given by
$P(Z \ge z_\alpha) = \alpha$
which is the area in the upper tail of the distribution.
Examples.
(i) P(0 ≤ Z ≤ 1) = .3413
(ii) P(−1 ≤ Z ≤ 1) = .6826
(iii) P(−2 ≤ Z ≤ 2) = .9544
(iv) P(−3 ≤ Z ≤ 3) = .9974
Examples. Find $z_0$ such that
(i) $P(Z > z_0) = .10$; $z_0 = 1.28$.
(ii) $P(Z > z_0) = .05$; $z_0 = 1.645$.
(iii) $P(Z > z_0) = .025$; $z_0 = 1.96$.
(iv) $P(Z > z_0) = .01$; $z_0 = 2.33$.
(v) $P(Z > z_0) = .005$; $z_0 = 2.58$.
(vi) $P(Z \le z_0) = .10, .05, .025, .01, .005$. (Exercise)
Normal
A rv X is said to have a Normal pdf with parameters µ and σ if
Formula:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}; \quad -\infty < x < \infty,$
where
$-\infty < \mu < \infty; \quad 0 < \sigma < \infty.$
Properties
Mean: E[X] = µ
Variance: $V(X) = \sigma^2$
Graph: Bell shaped.
Area under graph = 1.
Standardizing a normal r.v.:
Z-score:
$Z = \frac{X - \mu_X}{\sigma_X}$
OR (simply)
$Z = \frac{X - \mu}{\sigma}$
Conversely,
$X = \mu + \sigma Z.$
Example If X is a normal rv with parameters µ = 3 and $\sigma^2 = 9$, find (i) P(2 < X < 5),
(ii) P(X > 0), and (iii) P(X > 9).
Solution (i)
$P(2 < X < 5) = P(-0.33 < Z < 0.67) = .3779.$
(ii)
$P(X > 0) = P(Z > -1) = P(Z < 1) = .8413.$
(iii)
$P(X > 9) = P(Z > 2.0) = 0.5 - 0.4772 = .0228$
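The table lookups in this example can be cross-checked with the error function from Python's `math` module, using the identity Φ(z) = (1 + erf(z/√2))/2. The helper name `normal_cdf` is our own. Because the code does not round z to two decimals, the first answer differs slightly from the table-based .3779.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal rv with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 3, 3                                   # sigma^2 = 9
p_between = normal_cdf(5, mu, sigma) - normal_cdf(2, mu, sigma)   # P(2 < X < 5)
p_above_0 = 1 - normal_cdf(0, mu, sigma)                          # P(X > 0)
p_above_9 = 1 - normal_cdf(9, mu, sigma)                          # P(X > 9)
print(round(p_between, 4), round(p_above_0, 4), round(p_above_9, 4))
```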
Exercise Refer to the above example, find P(X < −3).
Example The length of life of a certain type of automatic washer is approximately
normally distributed, with a mean of 3.1 years and standard deviation of 1.2 years. If
this type of washer is guaranteed for 1 year, what fraction of original sales will require
replacement?
Solution Let X be the length of life of an automatic washer selected at random, then
$z = \frac{1 - 3.1}{1.2} = -1.75$
Therefore
$P(X < 1) = P(Z < -1.75) =$
Exercise: Complete the solution of this problem.
Normal Approximation to the Binomial Distribution.
When and how to use the normal approximation:
1. Large n, i.e. np ≥ 5 and n(1 −p) ≥ 5.
2. The approximation can be improved using correction factors.
Example. Let X be the number of times that a fair coin, flipped 40 times, lands heads.
(i) Find the probability that X = 20. (ii) Find P(10 ≤ X ≤ 20). Use the normal
approximation.
Solution Note that np = 20 and np(1 − p) = 10.
$P(X = 20) = P(19.5 < X < 20.5)$
$= P\left( \frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}} \right)$
$\approx P(-0.16 < Z < 0.16)$
$= .1272.$
The exact result is
$P(X = 20) = \binom{40}{20} (0.5)^{20} (0.5)^{20} \approx .1254$
(ii) Exercise.
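The continuity-corrected approximation for P(X = 20) can be compared with the exact binomial value in a few lines. This is a sketch with helper names of our own; since z is not rounded to 0.16 here, the approximation comes out at roughly 0.126 rather than the table-based .1272.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 40, 0.5
mu = n * p                          # np = 20
sigma = sqrt(n * p * (1 - p))       # sqrt(10)

# Continuity correction: P(X = 20) is approximated by P(19.5 < X < 20.5).
approx = normal_cdf((20.5 - mu) / sigma) - normal_cdf((19.5 - mu) / sigma)
exact = comb(n, 20) * p**20 * (1 - p)**20
print(round(approx, 4), round(exact, 4))
```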
3 Uniform: U[a,b]
Formula:
$f(x) = \frac{1}{b-a}$ for $a < x < b$, and $f(x) = 0$ elsewhere.
Graph.
Mean: $\mu = (a + b)/2$
Variance: $\sigma^2 = (b-a)^2/12$; $\sigma = (b-a)/\sqrt{12}$
CDF: (Area between a and c)
$P(X \le c) = 0$, $c \le a$,
$P(X \le c) = \frac{c-a}{b-a}$, $a \le c \le b$,
$P(X \le c) = 1$, $c \ge b$
Exercise. Specialize the above results to the Uniform [0, 1] case.
4 Exponential
The exponential pdf often arises, in practice, as being the distribution of the amount
of time until some specific event occurs. Examples include time until a new car breaks
down, time until an arrival at emergency room, ... etc.
A rv X is said to have an exponential pdf with parameter λ > 0 if
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ elsewhere.
Properties
Graph.
Mean: $\mu = 1/\lambda$
Variance: $\sigma^2 = 1/\lambda^2$, $\sigma = 1/\lambda$
CDF: $P(X \le a) = 1 - e^{-\lambda a}$.
$P(X > a) = e^{-\lambda a}$
Example 1. Suppose that the length of a phone call in minutes is an exponential rv with
parameter λ = 1/10. If someone arrives immediately ahead of you at a public telephone
booth, find the probability that you will have to wait (i) more than 10 minutes, and (ii)
between 10 and 20 minutes.
Solution Let X be the length of a phone call in minutes by the person ahead of
you.
(i)
$P(X > 10) = e^{-\lambda a} = e^{-1} \approx 0.368$
(ii)
$P(10 < X < 20) = e^{-1} - e^{-2} \approx 0.233$
Example 2. The amount of time, in hours, that a computer functions before breaking
down is an exponential rv with λ = 1/100.
(i) What is the probability that a computer will function between 50 and 150 hours
before breaking down?
(ii) What is the probability that it will function less than 100 hours?
Solution.
(i) The probability that a computer will function between 50 and 150 hours before
breaking down is given by
$P(50 \le X \le 150) = e^{-50/100} - e^{-150/100} = e^{-1/2} - e^{-3/2} \approx .383$
(ii) Exercise.
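The worked parts of both exponential examples can be checked with the CDF formula. In this sketch the function name `exponential_cdf` and the variable names are our own.

```python
from math import exp

def exponential_cdf(a, lam):
    """P(X <= a) for an exponential rv with rate lam."""
    return 1 - exp(-lam * a) if a >= 0 else 0.0

# Example 1: phone call length, lam = 1/10 per minute.
wait_more_than_10 = 1 - exponential_cdf(10, 1 / 10)
wait_10_to_20 = exponential_cdf(20, 1 / 10) - exponential_cdf(10, 1 / 10)

# Example 2 (i): computer lifetime, lam = 1/100 per hour.
works_50_to_150 = exponential_cdf(150, 1 / 100) - exponential_cdf(50, 1 / 100)

print(round(wait_more_than_10, 3), round(wait_10_to_20, 3), round(works_50_to_150, 3))
```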
Memoryless Property
FACT. The exponential rv has the memoryless property.
Converse The exponential distribution is the only continuous distribution with the
memoryless property.
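The notes state the property without writing it out. One standard way to express it, using the tail formula $P(X > a) = e^{-\lambda a}$ given earlier, is the following short derivation (the symbols s and t, both nonnegative, are our notation):

```latex
P(X > s + t \mid X > s)
  = \frac{P(X > s + t)}{P(X > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(X > t), \qquad s, t \ge 0 .
```

In words: given that the waiting time has already exceeded s, the chance of waiting at least t more is the same as for a fresh start.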
Review Exercises: Normal Distribution
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. Calculate the area under the standard normal curve between the following values.
(i) z = 0 and z = 1.6 (i.e. P(0 ≤ Z ≤ 1.6))
(ii) z = 0 and z = −1.6 (i.e. P(−1.6 ≤ Z ≤ 0))
(iii) z = .86 and z = 1.75 (i.e. P(.86 ≤ Z ≤ 1.75))
(iv) z = −1.75 and z = −.86 (i.e. P(−1.75 ≤ Z ≤ −.86))
(v) z = −1.26 and z = 1.86 (i.e. P(−1.26 ≤ Z ≤ 1.86))
(vi) z = −1.0 and z = 1.0 (i.e. P(−1.0 ≤ Z ≤ 1.0))
(vii) z = −2.0 and z = 2.0 (i.e. P(−2.0 ≤ Z ≤ 2.0))
(viii) z = −3.0 and z = 3.0 (i.e. P(−3.0 ≤ Z ≤ 3.0))
2. Let Z be a standard normal random variable. Find $z_0$ such that
(i) $P(Z \ge z_0) = 0.05$
(ii) $P(Z \ge z_0) = 0.99$
(iii) $P(Z \ge z_0) = 0.0708$
(iv) $P(Z \le z_0) = 0.0708$
(v) $P(-z_0 \le Z \le z_0) = 0.68$
(vi) $P(-z_0 \le Z \le z_0) = 0.95$
3. Let Z be a standard normal random variable. Find $z_0$ such that
(i) $P(Z \ge z_0) = 0.10$
(ii) $P(Z \ge z_0) = 0.05$
(iii) $P(Z \ge z_0) = 0.025$
(iv) $P(Z \ge z_0) = 0.01$
(v) $P(Z \ge z_0) = 0.005$
4. A normally distributed random variable X possesses a mean of µ = 10 and a
standard deviation of σ = 5. Find the following probabilities.
(i) X falls between 10 and 12 (i.e. P(10 ≤ X ≤ 12)).
(ii) X falls between 6 and 14 (i.e. P(6 ≤ X ≤ 14)).
(iii) X is less than 12 (i.e. P(X ≤ 12)).
(iv) X exceeds 10 (i.e. P(X ≥ 10)).
5. The height of adult women in the United States is normally distributed with mean
64.5 inches and standard deviation 2.4 inches.
(i) Find the probability that a randomly chosen woman is taller than 70 inches.
(Answer: .011)
(ii) Alice is 71 inches tall. What proportion of women are shorter than Alice? (Answer:
.9966)
6. The lifetimes of batteries produced by a firm are normally distributed with a mean
of 100 hours and a standard deviation of 10 hours. What is the probability that a randomly
selected battery will last between 110 and 120 hours?
7. Answer by True or False. (Circle your choice).
T F (i) The standard normal distribution has its mean and standard deviation equal
to zero.
T F (ii) The standard normal distribution has its mean and standard deviation equal
to one.
T F (iii) The standard normal distribution has its mean equal to one and standard
deviation equal to zero.
T F (iv) The standard normal distribution has its mean equal to zero and standard
deviation equal to one.
T F (v) Because the normal distribution is symmetric half of the area under the curve
lies below the 40th percentile.
T F (vi) The total area under the normal curve is equal to one only if the mean is
equal to zero and standard deviation equal to one.
T F (vii) The normal distribution is symmetric only if the mean is zero and the
standard deviation is one.
Chapter 5
Sampling Distributions
Contents.
The Central Limit Theorem
The Sampling Distribution of the Sample Mean
The Sampling Distribution of the Sample Proportion
The Sampling Distribution of the Difference Between Two Sample Means
The Sampling Distribution of the Difference Between Two Sample Proportions
1 The Central Limit Theorem (CLT)
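Roughly speaking, the CLT says that, for a large sample size n, the sampling distributions of the sample mean and of the sample proportion are approximately normal.
The standardized sample mean, $\bar{X}$, is approximately standard normal:
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \]
The standardized sample proportion, $\hat{P}$, is approximately standard normal:
\[ Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} \]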
2 Sampling Distributions
Suppose the distribution of X is normal with mean $\mu$ and standard deviation $\sigma$.
(i) What is the distribution of $(X - \mu)/\sigma$?
Answer: It is a standard normal, i.e.
\[ Z = \frac{X - \mu}{\sigma} \]
I. The Sampling Distribution of the Sample Mean
(ii) What is the mean (expected value) and standard deviation of $\bar{X}$?
Answer:
\[ \mu_{\bar{X}} = E(\bar{X}) = \mu \]
\[ \sigma_{\bar{X}} = S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]
(iii) What is the sampling distribution of the sample mean $\bar{X}$?
Answer: The distribution of $\bar{X}$ is a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$; equivalently,
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
(iv) What is the sampling distribution of the sample mean, $\bar{X}$, if X is not normally distributed?
Answer: The distribution of $\bar{X}$ is approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, provided n is large (i.e. $n \ge 30$).
Example. Consider a population, X, with mean µ = 4 and standard deviation σ = 3.
A sample of size 36 is to be selected.
(i) What is the mean and standard deviation of $\bar{X}$?
(ii) Find $P(4 < \bar{X} < 5)$.
(iii) Find $P(\bar{X} > 3.5)$. (exercise)
(iv) Find $P(3.5 \le \bar{X} \le 4.5)$. (exercise)
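Parts (i) and (ii) of this example can be checked numerically. The sketch below is only an illustration, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the normal table; the variable names are hypothetical and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4, 3, 36          # population mean, population sd, sample size
se = sigma / sqrt(n)             # standard error of the sample mean
xbar_dist = NormalDist(mu, se)   # approximate sampling distribution of the sample mean

# (ii) P(4 < Xbar < 5) as a difference of two cumulative probabilities
prob = xbar_dist.cdf(5) - xbar_dist.cdf(4)
print(f"mean = {mu}, standard error = {se}, P(4 < Xbar < 5) = {prob:.4f}")
```

The same `xbar_dist` object can be reused for the two parts left as exercises.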
II. The Sampling Distribution of the Sample Proportion
Suppose the distribution of X is binomial with parameters n and p, and let $q = 1 - p$.
(ii) What is the mean (expected value) and standard deviation of $\hat{P}$?
Answer:
\[ \mu_{\hat{P}} = E(\hat{P}) = p \]
\[ \sigma_{\hat{P}} = S.E.(\hat{P}) = \sqrt{\frac{pq}{n}} \]
(iii) What is the sampling distribution of the sample proportion $\hat{P}$?
Answer: $\hat{P}$ has approximately a normal distribution with mean $p$ and standard deviation $\sqrt{pq/n}$; equivalently,
\[ Z = \frac{\hat{P} - \mu_{\hat{P}}}{\sigma_{\hat{P}}} = \frac{\hat{P} - p}{\sqrt{\frac{pq}{n}}} \]
provided n is large (i.e. $np \ge 5$ and $nq \ge 5$).
Example. It is claimed that at least 30% of all adults favor brand A versus brand B.
To test this theory a sample of n = 400 is selected. Suppose 130 individuals indicated
a preference for brand A.
DATA SUMMARY: $n = 400$, $x = 130$, $p = .30$, $\hat{p} = 130/400 = .325$
(i) Find the mean and standard deviation of the sample proportion $\hat{P}$.
Answer:
\[ \mu_{\hat{p}} = p = .30 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.30)(.70)}{400}} = .023 \]
(ii) Find $P(\hat{P} > 0.30)$.
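As a rough numerical check of this example, the sketch below computes the standard error, the probability in part (ii), and the z-score of the observed proportion. It assumes the normal approximation stated above and uses Python's `statistics.NormalDist`; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x, p = 400, 130, 0.30
q = 1 - p
p_hat = x / n                       # observed sample proportion, 0.325
se = sqrt(p * q / n)                # standard error of the sample proportion

phat_dist = NormalDist(p, se)       # approximate sampling distribution of P-hat
prob_above_p = 1 - phat_dist.cdf(0.30)   # (ii) P(P-hat > 0.30)
z_observed = (p_hat - p) / se       # z-score of the observed 0.325

print(f"se = {se:.3f}, P(P-hat > 0.30) = {prob_above_p:.2f}, z = {z_observed:.2f}")
```

Because 0.30 is the mean of the sampling distribution, the probability in (ii) is one half by symmetry.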
III. Comparing two Sample Means
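\[ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 \]
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\[ Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
provided $n_1, n_2 \ge 30$.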
IV. Comparing two Sample Proportions
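\[ E(\hat{P}_1 - \hat{P}_2) = p_1 - p_2 \]
\[ \sigma_{\hat{P}_1 - \hat{P}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \]
\[ Z = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} \]
provided $n_1$ and $n_2$ are large.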
Review Exercises: Sampling Distributions
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. A normally distributed random variable X possesses a mean of $\mu = 20$ and a
standard deviation of $\sigma = 5$. A random sample of n = 16 observations is to be selected.
Let $\bar{X}$ be the sample average.
(i) Describe the sampling distribution of $\bar{X}$ (i.e. describe the distribution of $\bar{X}$ and
give $\mu_{\bar{x}}$, $\sigma_{\bar{x}}$). (Answer: $\mu_{\bar{x}} = 20$, $\sigma_{\bar{x}} = 5/\sqrt{16} = 1.25$)
(ii) Find the z-score of $\bar{x} = 22$. (Answer: 1.6)
(iii) Find $P(\bar{X} \ge 22)$.
(iv) Find $P(20 \le \bar{X} \le 22)$.
(v) Find $P(16 \le \bar{X} \le 19)$.
(vi) Find $P(\bar{X} \ge 23)$.
(vii) Find $P(\bar{X} \ge 18)$.
2. The number of trips to doctor’s office per family per year in a given community is
known to have a mean of 10 with a standard deviation of 3. Suppose a random sample
of 49 families is taken and a sample mean is calculated.
(i) Describe the sampling distribution of the sample mean, $\bar{X}$. (Include the mean $\mu_{\bar{x}}$,
standard deviation $\sigma_{\bar{x}}$, and type of distribution.)
(ii) Find the probability that the sample mean, $\bar{X}$, does not exceed 9. (Answer: .01)
(iii) Find the probability that the sample mean, $\bar{X}$, does not exceed 11. (Answer:
.99)
3. When a random sample of size n is drawn from a normal population with mean $\mu$
and variance $\sigma^2$, the sampling distribution of the sample mean $\bar{X}$ will be
(a) exactly normal.
(b) approximately normal
(c) binomial
(d) none of the above
4. Answer True or False. (Circle your choice.)
T F (i) The central limit theorem applies regardless of the shape of the population
frequency distribution.
T F (ii) The central limit theorem is important because it explains why some estimators
tend to possess, approximately, a normal distribution.
Chapter 6
Large Sample Estimation
Contents.
1. Introduction
2. Point Estimators and Their Properties
3. Single Quantitative Population
4. Single Binomial Population
5. Two Quantitative Populations
6. Two Binomial Populations
7. Choosing the Sample Size
1 Introduction
Types of estimators.
1. Point estimator
2. Interval estimator: (L, U)
Desired Properties of Point Estimators.
(i) Unbiased: Mean of the sampling distribution is equal to the parameter.
(ii) Minimum variance: Small standard error of point estimator.
(iii) Error of estimation: distance between a parameter and its point estimate is small.
Desired Properties of Interval Estimators.
(i) Confidence coefficient: P(interval estimator will enclose the parameter)=1 − α
should be as high as possible.
(ii) Confidence level: Confidence coefficient expressed as a percentage.
(iii) Margin of Error: (Bound on the error of estimation) should be as small as possible.
Parameters of Interest.
Single Quantitative Population: $\mu$
Single Binomial Population: $p$
Two Quantitative Populations: $\mu_1 - \mu_2$
Two Binomial Populations: $p_1 - p_2$
2 Point Estimators and Their Properties
Parameter of interest: θ
Sample data: $n$, $\hat{\theta}$, $\sigma_{\hat{\theta}}$
Point estimator: $\hat{\theta}$
Estimator mean: $\mu_{\hat{\theta}} = \theta$ (Unbiased)
Standard error: $SE(\hat{\theta}) = \sigma_{\hat{\theta}}$
Assumptions: Large sample + others (to be specified in each case)
3 Single Quantitative Population
Parameter of interest: µ
Sample data: $n$, $\bar{x}$, $s$
Other information: $\alpha$
Point estimator: $\bar{x}$
Estimator mean: $\mu_{\bar{x}} = \mu$
Standard error: $SE(\bar{x}) = \sigma/\sqrt{n}$ (also denoted as $\sigma_{\bar{x}}$)
Confidence Interval (C.I.) for $\mu$:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Confidence level: $(1 - \alpha)100\%$, which is the probability that the interval estimator
contains the parameter.
Margin of Error (or Bound on the Error of Estimation).
\[ B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Assumptions.
1. Large sample (n ≥ 30)
2. Sample is randomly selected
Example 1. We are interested in estimating the mean number of unoccupied seats per
flight, $\mu$, for a major airline. A random sample of n = 225 flights shows that the sample
mean is 11.6 and the standard deviation is 4.1.
Data summary: $n = 225$; $\bar{x} = 11.6$; $s = 4.1$.
Question 1. What is the point estimate of $\mu$? (Do not give the margin of error.)
\[ \bar{x} = 11.6 \]
Question 2. Give a 95% bound on the error of estimation (also known as the margin
of error).
\[ B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \frac{4.1}{\sqrt{225}} = 0.5357 \]
Question 3. Find a 90% confidence interval for $\mu$.
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
\[ 11.6 \pm 1.645 \frac{4.1}{\sqrt{225}} \]
\[ 11.6 \pm 0.45 = (11.15, 12.05) \]
Question 4. Interpret the CI found in Question 3.
The interval contains $\mu$ with probability 0.90.
OR
If repeated sampling is used, then 90% of the CIs constructed would contain $\mu$.
Question 5. What is the width of the CI found in Question 3?
The width of the CI is
\[ W = 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
\[ W = 2(0.45) = 0.90 \]
OR
\[ W = 12.05 - 11.15 = 0.90 \]
Question 6. If n, the sample size, is increased what happens to the width of the CI?
what happens to the margin of error?
The width of the CI decreases.
The margin of error decreases.
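The margin of error and the interval in Example 1 can be reproduced with a few lines of code. This is a minimal sketch, assuming `statistics.NormalDist().inv_cdf` in place of the z table and using s as the estimate of σ; the helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, conf):
    """Large-sample CI for a mean, with s standing in for sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    margin = z * s / sqrt(n)                      # bound on the error, B
    return xbar - margin, xbar + margin, margin

low, high, b90 = z_interval(11.6, 4.1, 225, 0.90)   # Question 3
_, _, b95 = z_interval(11.6, 4.1, 225, 0.95)        # Question 2
print(f"95% bound: {b95:.4f}; 90% CI: ({low:.2f}, {high:.2f}); width: {2 * b90:.2f}")
```

Calling the helper with a larger n shows the margin of error shrinking, which is the point of Question 6.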
Sample size:
\[ n \ge \frac{(z_{\alpha/2})^2 \sigma^2}{B^2} \]
where $\sigma$ is estimated by $s$.
Note: In the absence of data, $\sigma$ is sometimes approximated by $R/4$, where $R$ is the
range.
Example 2. Suppose you want to construct a 99% CI for µ so that W = 0.05. You are
told that preliminary data shows a range from 13.3 to 13.7. What sample size should
you choose?
A. Data summary: $\alpha = .01$; $R = 13.7 - 13.3 = .4$;
so $\sigma \approx .4/4 = .1$. Now
$B = W/2 = 0.05/2 = 0.025$. Therefore
\[ n \ge \frac{(z_{\alpha/2})^2 \sigma^2}{B^2} = \frac{2.58^2 (.1)^2}{0.025^2} = 106.50 \]
So $n = 107$ (round up).
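The sample-size calculation in Example 2 can be written as a small helper. The sketch below is an illustration only; the function name `sample_size_for_mean` is an assumption, and the critical value 2.58 is taken from the example rather than computed.

```python
from math import ceil

def sample_size_for_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound, rounded up."""
    return ceil((z * sigma / bound) ** 2)

sigma_guess = (13.7 - 13.3) / 4          # range / 4 approximation of sigma
n_needed = sample_size_for_mean(2.58, sigma_guess, 0.05 / 2)
print(n_needed)
```

Rounding up rather than to the nearest integer keeps the achieved bound at or below the target B.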
Exercise 1. Find the sample size necessary to reduce W in the flight example to .6. Use
α = 0.05.
4 Single Binomial Population
Parameter of interest: p
Sample data: $n$, $x$, $\hat{p} = x/n$ ($x$ here is the number of successes).
Other information: $\alpha$
Point estimator: $\hat{p}$
Estimator mean: $\mu_{\hat{p}} = p$
Standard error: $\sigma_{\hat{p}} = \sqrt{pq/n}$
Confidence Interval (C.I.) for $p$:
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \]
Confidence level: $(1 - \alpha)100\%$, which is the probability that the interval estimator
contains the parameter.
Margin of Error.
\[ B = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \]
Assumptions.
1. Large sample (np ≥ 5; nq ≥ 5)
2. Sample is randomly selected
Example 3. A random sample of n = 484 voters in a community produced x = 257
voters in favor of candidate A.
Data summary: $n = 484$; $x = 257$; $\hat{p} = x/n = 257/484 = 0.531$.
Question 1. Do we have a large sample size?
$n\hat{p} = 484(0.531) = 257$, which is $\ge 5$.
$n\hat{q} = 484(0.469) = 227$, which is $\ge 5$.
Therefore we have a large sample size.
Question 2. What is the point estimate of p and its margin of error?
\[ \hat{p} = \frac{x}{n} = \frac{257}{484} = 0.531 \]
\[ B = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(0.531)(0.469)}{484}} = 0.044 \]
Question 3. Find a 90% confidence interval for p.
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \]
\[ 0.531 \pm 1.645 \sqrt{\frac{(0.531)(0.469)}{484}} \]
\[ 0.531 \pm 0.037 = (0.494, 0.568) \]
Question 4. What is the width of the CI found in Question 3?
The width of the CI is
\[ W = 2 z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 2(0.037) = 0.074 \]
Question 5. Interpret the CI found in Question 3.
The interval contains p with probability 0.90.
OR
If repeated sampling is used, then 90% of the CIs constructed would contain p.
Question 6. If n, the sample size, is increased, what happens to the width of the CI?
What happens to the margin of error?
The width of the CI decreases.
The margin of error decreases.
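The numbers in Example 3 can be verified with the sketch below. It assumes the large-sample interval given above and uses `statistics.NormalDist().inv_cdf` for the critical value; the helper name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf):
    """Point estimate and margin of error for a large-sample CI for p."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)       # B
    return p_hat, margin

p_hat, b90 = proportion_interval(257, 484, 0.90)     # Question 3
_, b95 = proportion_interval(257, 484, 0.95)         # Question 2
print(f"p_hat = {p_hat:.3f}, 95% B = {b95:.3f}, "
      f"90% CI = ({p_hat - b90:.3f}, {p_hat + b90:.3f})")
```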
Sample size.
\[ n \ge \frac{(z_{\alpha/2})^2 (\hat{p}\hat{q})}{B^2} \]
Note: In the absence of data, choose $\hat{p} = \hat{q} = 0.5$ or simply $\hat{p}\hat{q} = 0.25$.
Example 4. Suppose you want to provide an accurate estimate of customers preferring
one brand of coffee over another. You need to construct a 95% CI for p so that B = 0.015.
You are told that preliminary data shows $\hat{p} = 0.35$. What sample size should you choose?
Use $\alpha = 0.05$.
Data summary: $\alpha = .05$; $\hat{p} = 0.35$; $B = 0.015$
\[ n \ge \frac{(z_{\alpha/2})^2 (\hat{p}\hat{q})}{B^2} = \frac{(1.96)^2 (0.35)(0.65)}{0.015^2} = 3,884.28 \]
So $n = 3,885$ (round up).
Exercise 2. Suppose that no preliminary estimate of $\hat{p}$ is available. Find the new sample
size. Use $\alpha = 0.05$.
Exercise 3. Suppose that no preliminary estimate of $\hat{p}$ is available. Find the sample
size necessary so that $\alpha = 0.01$.
5 Two Quantitative Populations
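Parameter of interest: $\mu_1 - \mu_2$
Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Point estimator: $\bar{X}_1 - \bar{X}_2$
Estimator mean: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
Standard error: $SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence Interval.
\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
Assumptions.
1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)
2. Samples are randomly selected
3. Samples are independent
Sample size (taking $n_1 = n_2 = n$).
\[ n \ge \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{B^2} \]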
6 Two Binomial Populations
Parameter of interest: $p_1 - p_2$
Sample 1: $n_1$, $x_1$, $\hat{p}_1 = x_1/n_1$
Sample 2: $n_2$, $x_2$, $\hat{p}_2 = x_2/n_2$
$p_1 - p_2$ (unknown parameter)
$\alpha$ (significance level)
Point estimator: $\hat{p}_1 - \hat{p}_2$
Estimator mean: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
Estimated standard error: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
Confidence Interval.
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \]
Assumptions.
1. Large samples
($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)
2. Samples are randomly and independently selected
Sample size (taking $n_1 = n_2 = n$).
\[ n \ge \frac{(z_{\alpha/2})^2 (\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2)}{B^2} \]
For unknown parameters:
\[ n \ge \frac{(z_{\alpha/2})^2 (0.5)}{B^2} \]
Review Exercises: Large-Sample Estimation
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. A random sample of size n = 100 is selected from a quantitative population. The
data produced a mean and standard deviation of $\bar{x} = 75$ and $s = 6$, respectively.
(i) Estimate the population mean µ, and give a 95% bound on the error of estimation
(or margin of error). (Answer: B=1.18)
(ii) Find a 99% confidence interval for the population mean. (Answer: B=1.55)
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in
(ii) by half. (Answer: n=400)
2. An examination of the yearly premiums for a random sample of 80 automobile
insurance policies from a major company showed an average of $329 and a standard
deviation of $49.
(i) Give the point estimate of the population parameter µ and a 99% bound on the
error of estimation. (Margin of error). (Answer: B=14.135)
(ii) Construct a 99% confidence interval for µ.
(iii) Suppose we wish our estimate in (i) to be accurate to within $5 with 95% con-
fidence; how many insurance policies should be sampled to achieve the desired level of
accuracy? (Answer: n=369)
3. Suppose we wish to estimate the average daily yield of a chemical manufactured
in a chemical plant. The daily yield, recorded for n = 100 days, produced a mean and
standard deviation of $\bar{x} = 870$ and $s = 20$ tons, respectively.
(i) Estimate the average daily yield µ, and give a 95% bound on the error of estimation
(or margin of error).
(ii) Find a 99% confidence interval for the population mean.
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in
(ii) by half.
4. Answer True or False. (Circle your choice.)
T F (i) If the population variance increases and other factors are the same, the width
of the confidence interval for the population mean tends to increase.
T F (ii) As the sample size increases, the width of the confidence interval for the
population mean tends to decrease.
T F (iii) Populations are characterized by numerical descriptive measures called statistics.
T F (iv) If, for a given C.I., α is increased, then the margin of error will increase.
T F (v) The sample standard deviation s can be used to approximate σ when n is
larger than 30.
T F (vi) The sample mean always lies above the population mean.
Chapter 7
Large-Sample Tests of Hypothesis
Contents.
1. Elements of a statistical test
2. A Large-sample statistical test
3. Testing a population mean
4. Testing a population proportion
5. Testing the difference between two population means
6. Testing the difference between two population proportions
7. Reporting results of statistical tests: p-Value
1 Elements of a Statistical Test
Null hypothesis: $H_0$
Alternative (research) hypothesis: $H_a$
Test statistic:
Rejection region: reject $H_0$ if .....
Graph:
Decision: either “Reject $H_0$” or “Do not reject $H_0$”
Conclusion: At the $100\alpha\%$ significance level there is (in)sufficient statistical evidence to
“favor $H_a$”.
Comments:
* $H_0$ represents the status quo.
* $H_a$ is the hypothesis that we want to provide evidence to justify. We support $H_a$
by showing that the data are inconsistent with $H_0$, in the spirit of a proof by contradiction.
Type I error ≡ {reject $H_0$ | $H_0$ is true}
Type II error ≡ {do not reject $H_0$ | $H_0$ is false}
α = Prob{Type I error}
β = Prob{Type II error}
Power of a statistical test:
Prob{reject $H_0$ | $H_0$ is false} = $1 - \beta$
Example 1.
$H_0$: Innocent
$H_a$: Guilty
α = Prob{sending an innocent person to jail}
β = Prob{letting a guilty person go free}
Example 2.
$H_0$: New drug is not acceptable
$H_a$: New drug is acceptable
α = Prob{marketing a bad drug}
β = Prob{not marketing an acceptable drug}
2 A Large-Sample Statistical Test
Parameter of interest: θ
Sample data: $n$, $\hat{\theta}$, $\sigma_{\hat{\theta}}$
Test:
Null hypothesis ($H_0$): $\theta = \theta_0$
Alternative hypothesis ($H_a$): 1) $\theta > \theta_0$; 2) $\theta < \theta_0$; 3) $\theta \ne \theta_0$
Test statistic (TS):
\[ z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} \]
Critical value: either $z_{\alpha}$ or $z_{\alpha/2}$
Rejection region (RR):
1) Reject $H_0$ if $z > z_{\alpha}$
2) Reject $H_0$ if $z < -z_{\alpha}$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision: 1) if observed value is in RR: “Reject $H_0$”
2) if observed value is not in RR: “Do not reject $H_0$”
Conclusion: At the $100\alpha\%$ significance level there is (in)sufficient statistical evidence to
· · · .
Assumptions: Large sample + others (to be specified in each case).
One tailed statistical test
Upper (right) tailed test
Lower (left) tailed test
Two tailed statistical test
3 Testing a Population Mean
Parameter of interest: µ
Sample data: $n$, $\bar{x}$, $s$
Other information: $\mu_0$ = target value, $\alpha$
Test:
$H_0$: $\mu = \mu_0$
$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$
T.S.:
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
Rejection region (RR):
1) Reject $H_0$ if $z > z_{\alpha}$
2) Reject $H_0$ if $z < -z_{\alpha}$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision: 1) if observed value is in RR: “Reject $H_0$”
2) if observed value is not in RR: “Do not reject $H_0$”
Conclusion: At the $100\alpha\%$ significance level there is (in)sufficient statistical evidence to
“favor $H_a$”.
Assumptions:
Large sample (n ≥ 30)
Sample is randomly selected
Example: Test the hypothesis that weight loss in a new diet program exceeds 20 pounds
during the first month.
Sample data: $n = 36$, $\bar{x} = 21$, $s^2 = 25$, $\mu_0 = 20$, $\alpha = 0.05$
$H_0$: $\mu = 20$ ($\mu$ is not larger than 20)
$H_a$: $\mu > 20$ ($\mu$ is larger than 20)
T.S.:
\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{21 - 20}{5/\sqrt{36}} = 1.2 \]
Critical value: $z_{\alpha} = 1.645$
RR: Reject $H_0$ if $z > 1.645$
Graph:
Decision: Do not reject $H_0$
Conclusion: At the 5% significance level there is insufficient statistical evidence to conclude
that weight loss in the new diet program exceeds 20 pounds in the first month.
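The steps of this upper-tailed test can be sketched in code as follows. This is an illustration under the large-sample assumptions above, with `statistics.NormalDist().inv_cdf` standing in for the z table; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s2, mu0, alpha = 36, 21, 25, 20, 0.05

z = (xbar - mu0) / (sqrt(s2) / sqrt(n))     # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)    # upper-tail critical value z_alpha
reject = z > z_crit                         # rejection region: z > z_alpha

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

For a lower-tailed or two-tailed version, only the critical value and the comparison in the last step change.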
Exercise: Test the claim that weight loss is not equal to 19.5.
4 Testing a Population Proportion
Parameter of interest: p (unknown parameter)
Sample data: $n$ and $x$ (or $\hat{p} = x/n$)
$p_0$ = target value
$\alpha$ (significance level)
Test:
$H_0$: $p = p_0$
$H_a$: 1) $p > p_0$; 2) $p < p_0$; 3) $p \ne p_0$
T.S.:
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} \]
RR:
1) Reject $H_0$ if $z > z_{\alpha}$
2) Reject $H_0$ if $z < -z_{\alpha}$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: “Reject $H_0$”
2) if observed value is not in RR: “Do not reject $H_0$”
Conclusion: At the $100\alpha\%$ significance level there is (in)sufficient statistical evidence
to “favor $H_a$”.
Assumptions:
1. Large sample ($np \ge 5$, $nq \ge 5$)
2. Sample is randomly selected
Example. Test the hypothesis that p > .10 for sample data: n = 200, x = 26.
Solution.
\[ \hat{p} = \frac{x}{n} = \frac{26}{200} = .13 \]
Now
$H_0$: $p = .10$
$H_a$: $p > .10$
TS:
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.13 - .10}{\sqrt{(.10)(.90)/200}} = 1.41 \]
RR: reject $H_0$ if $z > 1.645$
Graph:
Dec: Do not reject $H_0$
Conclusion: At the 5% significance level there is insufficient statistical evidence to conclude
that $p > .10$.
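The same calculation can be expressed as a short script. The sketch assumes α = 0.05, as implied by the critical value 1.645 in the solution, and uses illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0, alpha = 200, 26, 0.10, 0.05

p_hat = x / n                                   # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # standard error uses p0, not p_hat
z_crit = NormalDist().inv_cdf(1 - alpha)        # upper-tail critical value
reject = z > z_crit

print(f"p_hat = {p_hat}, z = {z:.2f}, reject H0: {reject}")
```

Note that the standard error is computed under the null value p0, which is the difference from the confidence-interval formula of the previous chapter.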
Exercise Is the large sample assumption satisfied here ?
5 Comparing Two Population Means
Parameter of interest: $\mu_1 - \mu_2$
Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Test:
$H_0$: $\mu_1 - \mu_2 = D_0$
$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$;
3) $\mu_1 - \mu_2 \ne D_0$
T.S.:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
RR:
1) Reject $H_0$ if $z > z_{\alpha}$
2) Reject $H_0$ if $z < -z_{\alpha}$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)
2. Samples are randomly selected
3. Samples are independent
Example: (Comparing two weight loss programs)
Refer to the weight loss example. Test the hypothesis that the mean weight losses in the two diet
programs are different.
1. Sample 1: $n_1 = 36$, $\bar{x}_1 = 21$, $s_1^2 = 25$ (old)
2. Sample 2: $n_2 = 36$, $\bar{x}_2 = 18.5$, $s_2^2 = 24$ (new)
$D_0 = 0$, $\alpha = 0.05$
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 \ne 0$
T.S. (with $s_1^2$ and $s_2^2$ used in place of $\sigma_1^2$ and $\sigma_2^2$):
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{21 - 18.5}{\sqrt{\frac{25}{36} + \frac{24}{36}}} = 2.14 \]
Critical value: $z_{\alpha/2} = 1.96$
RR: Reject $H_0$ if $z > 1.96$ or $z < -1.96$
Graph:
Decision: Reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to conclude
that the mean weight losses in the two diet programs are different.
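A short numerical check of the two-sample test statistic is sketched below. It assumes, as in the example, that the sample variances are substituted for the population variances; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, xbar1, s1_sq = 36, 21.0, 25.0     # old program
n2, xbar2, s2_sq = 36, 18.5, 24.0     # new program
d0, alpha = 0.0, 0.05

se = sqrt(s1_sq / n1 + s2_sq / n2)             # standard error of the difference
z = ((xbar1 - xbar2) - d0) / se                # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```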
Exercise: Test the hypothesis that weight loss in the old diet program exceeds that of
the new program.
Exercise: Test the claim that the difference in mean weight loss for the two programs
is greater than 1.
6 Comparing Two Population Proportions
Parameter of interest: $p_1 - p_2$
Sample 1: $n_1$, $x_1$, $\hat{p}_1 = x_1/n_1$,
Sample 2: $n_2$, $x_2$, $\hat{p}_2 = x_2/n_2$,
$p_1 - p_2$ (unknown parameter)
Common estimate:
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
Test:
$H_0$: $p_1 - p_2 = 0$
$H_a$: 1) $p_1 - p_2 > 0$
2) $p_1 - p_2 < 0$
3) $p_1 - p_2 \ne 0$
T.S.:
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}} \]
RR:
1) Reject $H_0$ if $z > z_{\alpha}$
2) Reject $H_0$ if $z < -z_{\alpha}$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
Large samples ($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)
Samples are randomly and independently selected
Example: Test the hypothesis that $p_1 - p_2 < 0$ if it is known that the test statistic is
$z = -1.91$.
Solution:
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 < 0$
TS: $z = -1.91$
RR: reject $H_0$ if $z < -1.645$
Graph:
Dec: reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to conclude
that $p_1 - p_2 < 0$.
Exercise: Repeat as a two tailed test
7 Reporting Results of Statistical Tests: P-Value
Definition. The p-value for a test of a hypothesis is the smallest value of $\alpha$ for which
the null hypothesis is rejected, i.e. for which the statistical results are significant.
The p-value is also called the observed significance level.
Note: The p-value is the probability (when $H_0$ is true) of obtaining a value of the
test statistic as extreme as or more extreme than the actual sample value, in the direction that supports $H_a$.
Examples. Find the p-value in each case:
(i) Upper tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta > \theta_0$
TS: $z = 1.76$
p-value = .0392
(ii) Lower tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta < \theta_0$
TS: $z = -1.86$
p-value = .0314
(iii) Two tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta \ne \theta_0$
TS: $z = 1.76$
p-value = 2(.0392) = .0784
Decision rule using p-value: (Important)
Reject $H_0$ for all $\alpha >$ p-value
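The three p-values above can be reproduced from the standard normal distribution. The sketch below is a minimal illustration; the function name `p_value` and its `tail` argument are assumptions made for this example, not notation from the notes.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def p_value(z, tail):
    """p-value of a z statistic for an 'upper', 'lower', or 'two' tailed test."""
    if tail == "upper":
        return 1 - Z.cdf(z)
    if tail == "lower":
        return Z.cdf(z)
    return 2 * (1 - Z.cdf(abs(z)))   # two-tailed: double the one-tail area

p_upper = p_value(1.76, "upper")
p_lower = p_value(-1.86, "lower")
p_two = p_value(1.76, "two")
print(f"{p_upper:.4f} {p_lower:.4f} {p_two:.4f}")
```

With α = 0.05, the decision rule rejects $H_0$ in cases (i) and (ii) but not in case (iii).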
Review Exercises: Testing Hypothesis
Please show all work. No credit for a correct final answer without a valid argument.
Use the formula, substitution, answer method whenever possible. Show your work
graphically in all relevant questions.
1. A local pizza parlor advertises that their average time for delivery of a pizza is
within 30 minutes of receipt of the order. The delivery times for a random sample of 64
orders were recorded, with a sample mean of 34 minutes and a standard deviation of 21
minutes.
(i) Is there sufficient evidence to conclude that the actual delivery time is larger than
what is claimed by the pizza parlor? Use $\alpha = .05$.
$H_0$:
$H_a$:
T.S. (Answer: 1.52)
R.R.
Graph:
Dec:
Conclusion:
(ii) Test the hypothesis that $H_a$: $\mu \ne 30$.
2. Answer True or False. (Circle your choice.)
T F (v) If, for a given test, α is fixed and the sample size is increased, then β will
increase.
Chapter 8
Small-Sample Tests of Hypothesis
Contents:
1. Introduction
2. Student’s t distribution
3. Small-sample inferences about a population mean
4. Small-sample inferences about the difference between two means: Independent
Samples
5. Small-sample inferences about the difference between two means: Paired Samples
6. Inferences about a population variance
7. Comparing two population variances
1 Introduction
When the sample size is small we only deal with normal populations.
For non-normal (e.g. binomial) populations different techniques are necessary
2 Student’s t Distribution
RECALL
For small samples (n < 30) from normal populations, we have
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
If $\sigma$ is unknown, we use $s$ instead, but the statistic no longer has a Z distribution.
Assumptions.
1. Sampled population is normal
2. Small random sample (n < 30)
3. $\sigma$ is unknown
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
Properties of the t Distribution:
(i) It has $n - 1$ degrees of freedom (df).
(ii) Like the normal distribution, it has a symmetric, mound-shaped probability distribution.
(iii) It is more variable (flatter) than the normal distribution.
(iv) The distribution depends on the degrees of freedom. Moreover, as n becomes
larger, t converges to Z.
(v) Critical values (tail probabilities) are obtained from the t table.
Examples.
(i) Find $t_{0.05,5} = 2.015$
(ii) Find $t_{0.005,8} = 3.355$
(iii) Find $t_{0.025,26} = 2.056$
3 Small-Sample Inferences About a Population Mean
Parameter of interest: µ
Sample data: $n$, $\bar{x}$, $s$
Other information: $\mu_0$ = target value, $\alpha$
Point estimator: $\bar{x}$
Estimator mean: $\mu_{\bar{x}} = \mu$
Estimated standard error: $\sigma_{\bar{x}} = s/\sqrt{n}$
Confidence Interval for $\mu$:
\[ \bar{x} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right) \]
Test:
$H_0$: $\mu = \mu_0$
$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$.
Critical value: either $t_{\alpha,\,n-1}$ or $t_{\alpha/2,\,n-1}$
T.S.:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
RR:
1) Reject $H_0$ if $t > t_{\alpha,\,n-1}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n-1}$
3) Reject $H_0$ if $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$
Graph:
Decision: 1) if observed value is in RR: “Reject $H_0$”
2) if observed value is not in RR: “Do not reject $H_0$”
Conclusion: At the $100\alpha\%$ significance level there is (in)sufficient statistical evidence to
“favor $H_a$”.
Assumptions.
1. Small sample (n < 30)
2. Sample is randomly selected
3. Normal population
4. Unknown variance
Example. For the sample data given below, test the hypothesis that weight loss in a new
diet program exceeds 20 pounds in the first month.
1. Sample data: $n = 25$, $\bar{x} = 21.3$, $s^2 = 25$, $\mu_0 = 20$, $\alpha = 0.05$
Critical value: $t_{0.05,24} = 1.711$
$H_0$: $\mu = 20$
$H_a$: $\mu > 20$
T.S.:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{21.3 - 20}{5/\sqrt{25}} = 1.3 \]
RR: Reject $H_0$ if $t > 1.711$
Graph:
Decision: Do not reject $H_0$
Conclusion: At the 5% significance level there is insufficient statistical evidence to conclude
that weight loss in the new diet program exceeds 20 pounds in the first month.
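The small-sample test follows the same pattern as the large-sample one, with a t critical value in place of z. In the sketch below the critical value 1.711 is simply copied from the t table as in the example, since the Python standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt

n, xbar, s2, mu0 = 25, 21.3, 25, 20
t_crit = 1.711                     # t_{0.05,24}, read from the t table

t = (xbar - mu0) / (sqrt(s2) / sqrt(n))   # test statistic with n - 1 = 24 df
reject = t > t_crit                       # upper-tailed rejection region

print(f"t = {t:.2f}, df = {n - 1}, reject H0: {reject}")
```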
Exercise. Test the claim that weight loss is not equal to 19.5 (i.e. $H_a$: $\mu \ne 19.5$).
4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples
Parameter of interest: $\mu_1 - \mu_2$
Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Other information: $D_0$ = target value, $\alpha$
Point estimator: $\bar{X}_1 - \bar{X}_2$
Estimator mean: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
Assumptions.
1. Normal populations
2. Small samples ($n_1 < 30$; $n_2 < 30$)
3. Samples are randomly selected
4. Samples are independent
5. Variances are equal with common variance
\[ \sigma^2 = \sigma_1^2 = \sigma_2^2 \]
Pooled estimator for $\sigma$.
\[ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
Estimator standard error:
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Reason:
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Confidence Interval:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2} \left( s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \]
Test:
$H_0$: $\mu_1 - \mu_2 = D_0$
$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$;
3) $\mu_1 - \mu_2 \ne D_0$
T.S.:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
RR: 1) Reject $H_0$ if $t > t_{\alpha,\,n_1+n_2-2}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n_1+n_2-2}$
3) Reject $H_0$ if $t > t_{\alpha/2,\,n_1+n_2-2}$ or $t < -t_{\alpha/2,\,n_1+n_2-2}$
Graph:
Decision:
Conclusion:
Example.(Comparison of two weight loss programs)
Refer to the weight loss example. Test the hypothesis that mean weight loss in a new diet
program is different from that of an old program. We are told that the observed
value of the test statistic is 2.2, and we know that
1. Sample 1: $n_1 = 7$
2. Sample 2: $n_2 = 8$
$\alpha = 0.05$
Solution.
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 \ne 0$
T.S.:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.2 \]
Critical value: $t_{.025,13} = 2.160$
RR: Reject $H_0$ if $t > 2.160$ or $t < -2.160$
Graph:
Decision: Reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to conclude
that the mean weight losses in the two diet programs are different.
Exercise: Test the claim that the difference in mean weight loss for the two programs
is greater than 0.
Minitab Commands: A two-sample t procedure with a pooled estimate of variance
MTB> twosample C1 C2;
SUBC> pooled;
SUBC> alternative 1.
Note: alternative: 1 = right-tailed; -1 = left-tailed; 0 = two-tailed.
5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples
Parameter of interest: $\mu_1 - \mu_2 = \mu_d$
Sample of paired differences data:
Sample: $n$ = number of pairs, $\bar{d}$ = sample mean, $s_d$
Other information: $D_0$ = target value, $\alpha$
Point estimator: $\bar{d}$
Estimator mean: $\mu_{\bar{d}} = \mu_d$
Assumptions.
1. Normal populations
2. Small samples ($n_1 < 30$; $n_2 < 30$)
3. Samples are randomly selected
4. Samples are paired (not independent)
Sample standard deviation of the sample of n paired differences
\[ s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} \]
Estimator standard error: $\sigma_{\bar{d}} = s_d/\sqrt{n}$
Confidence Interval.
\[ \bar{d} \pm t_{\alpha/2,\,n-1}\, s_d/\sqrt{n} \]
Test.
$H_0$: $\mu_1 - \mu_2 = D_0$ (equivalently, $\mu_d = D_0$)
$H_a$: 1) $\mu_1 - \mu_2 = \mu_d > D_0$; 2) $\mu_1 - \mu_2 = \mu_d < D_0$;
3) $\mu_1 - \mu_2 = \mu_d \ne D_0$
T.S.:
\[ t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} \]
RR:
1) Reject $H_0$ if $t > t_{\alpha,\,n-1}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n-1}$
3) Reject $H_0$ if $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$
Graph:
Decision:
Conclusion:
Example. A manufacturer wishes to compare wearing qualities of two different types
of tires, A and B. For the comparison a tire of type A and one of type B are randomly
assigned and mounted on the rear wheels of each of five automobiles. The automobiles
are then operated for a specified number of miles, and the amount of wear is recorded
for each tire. These measurements are tabulated below.
Automobile   Tire A   Tire B
1            10.6     10.2
2            9.8      9.4
3            12.3     11.8
4            9.7      9.1
5            8.8      8.3
Mean         $\bar{x}_1 = 10.24$   $\bar{x}_2 = 9.76$
Using the independent-samples test of the previous section we would have t = 0.57, an
insignificant result that is inconsistent with the data, since tire A shows more wear than
tire B on every automobile.
Automobile   Tire A   Tire B   d = A − B
1            10.6     10.2     .4
2            9.8      9.4      .4
3            12.3     11.8     .5
4            9.7      9.1      .6
5            8.8      8.3      .5
Mean         $\bar{x}_1 = 10.24$   $\bar{x}_2 = 9.76$   $\bar{d} = .48$
Q1: Provide a summary of the data in the above table.
Sample summary: $n = 5$, $\bar{d} = .48$, $s_d = .0837$
Q2: Do the data provide sufficient evidence to indicate a difference in average wear
for the two tire types?
Test. (parameter $\mu_d = \mu_1 - \mu_2$)
$H_0$: $\mu_d = 0$
$H_a$: $\mu_d \ne 0$
T.S.:
\[ t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} = \frac{.48 - 0}{.0837/\sqrt{5}} = 12.8 \]
RR: Reject $H_0$ if $t > 2.776$ or $t < -2.776$ ($t_{.025,4} = 2.776$)
Graph:
Decision: Reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to conclude
that the average amount of wear for type A tires is different from that for type B
tires.
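The contrast between the paired and the pooled analyses of the tire data can be checked directly. The sketch below uses only Python's standard `statistics` module; the list names are illustrative, and the pooled calculation follows the independent-samples formulas of the previous section.

```python
from math import sqrt
from statistics import mean, stdev, variance

tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]
n = len(tire_a)

# Paired analysis: work with the within-automobile differences d = A - B.
d = [a - b for a, b in zip(tire_a, tire_b)]
d_bar, s_d = mean(d), stdev(d)
t_paired = d_bar / (s_d / sqrt(n))

# Pooled (independent-samples) analysis, which ignores the pairing.
s_pooled = sqrt(((n - 1) * variance(tire_a) + (n - 1) * variance(tire_b)) / (2 * n - 2))
t_pooled = (mean(tire_a) - mean(tire_b)) / (s_pooled * sqrt(1 / n + 1 / n))

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, "
      f"paired t = {t_paired:.1f}, pooled t = {t_pooled:.2f}")
```

Pairing removes the large automobile-to-automobile variation from the standard error, which is why the paired statistic is so much larger.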
Exercise. Construct a 99% confidence interval for the difference in average wear for the
two tire types.
6 Inferences About a Population Variance
Chi-square distribution. When a random sample of size n is drawn from a normal
population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $S^2$
depends on n. The standardized distribution of $S^2$ is called the chi-square distribution and
is given by
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]
Degrees of freedom (df): $\nu = n - 1$
Graph: Non-symmetrical and depends on df
Critical values: using $\chi^2$ tables
Test.
$H_0$: $\sigma^2 = \sigma_0^2$
$H_a$: $\sigma^2 \ne \sigma_0^2$ (two-tailed test).
T.S.:
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \]
RR: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha/2}$ or $\chi^2 < \chi^2_{1-\alpha/2}$, where $\chi^2$ is based on $(n - 1)$ degrees of
freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal population
2. Random sample
Example:
Use text
7 Comparing Two Population Variances
F-distribution. When independent samples are drawn from two normal populations
with equal variances, then $S_1^2/S_2^2$ possesses a sampling distribution that is known as an
F distribution. That is,
\[ F = \frac{s_1^2}{s_2^2} \]
Degrees of freedom (df): $\nu_1 = n_1 - 1$; $\nu_2 = n_2 - 1$
Graph: Non-symmetrical and depends on df
Critical values: using F tables
Test.
$H_0$: $\sigma_1^2 = \sigma_2^2$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test).
T.S.: $F = s_1^2/s_2^2$, where $s_1^2$ is the larger sample variance.
Note: F = (larger sample variance)/(smaller sample variance)
RR: Reject $H_0$ if $F > F_{\alpha/2}$, where $F_{\alpha/2}$ is based on $(n_1 - 1)$ and $(n_2 - 1)$ degrees of
freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal populations
2. Independent random samples
Example. (Investment Risk) Investment risk is generally measured by the volatility
of possible outcomes of the investment. The most common method for measuring in-
vestment volatility is by computing the variance ( or standard deviation) of possible
outcomes. Returns over the past 10 years for first alternative and 8 years for the second
alternative produced the following data:
Data Summary:
Investment 1: n
1
= 10, x
1
= 17.8%; s
2
1
= 3.21
Investment 2: n
2
= 8, x
2
= 17.8%; s
2
2
= 7.14
Both populations are assumed to be normally distributed.
Q1: Do the data present sufficient evidence to indicate that the risks for investments 1 and 2 are unequal?
Solution.
Test:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$ (two-tailed test).
T.S.: $F = \frac{s_2^2}{s_1^2} = \frac{7.14}{3.21} = 2.22$
RR: Reject $H_0$ if $F > F_{\alpha/2}$, where $F_{\alpha/2,\,n_2-1,\,n_1-1} = F_{.025,7,9} = 4.20$
Graph:
Decision: Do not reject $H_0$
Conclusion: At 5% significance level there is insufficient statistical evidence to indicate
that the risks for investments 1 and 2 are unequal.
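The investment-risk test can be checked with a short Python sketch. The helper name `variance_ratio` is made up for this illustration; the critical value 4.20 is the table value quoted in the solution above, not something the code computes.

```python
def variance_ratio(s2_1, n1, s2_2, n2):
    """Return (F, numerator df, denominator df), larger variance on top."""
    if s2_1 >= s2_2:
        return s2_1 / s2_2, n1 - 1, n2 - 1
    return s2_2 / s2_1, n2 - 1, n1 - 1

# Data summary from the investment example.
F, df_num, df_den = variance_ratio(3.21, 10, 7.14, 8)

F_CRIT = 4.20            # F_{.025,7,9}, read from the F table
reject = F > F_CRIT

print(f"F = {F:.2f} on ({df_num}, {df_den}) df, reject H0: {reject}")
```

Putting the larger variance in the numerator is what allows the two-tailed test to use only the upper critical value $F_{\alpha/2}$.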
Exercise. Do the upper tail test. That is, $H_a: \sigma_1^2 > \sigma_2^2$.
Chapter 9
Analysis of Variance
Contents.
1. Introduction
2. One Way ANOVA: Completely Randomized Experimental Design
3. The Randomized Block Design
1 Introduction
Analysis of variance is a statistical technique used to compare more than two population means by isolating the sources of variability.
Example. Four groups of sales people for a magazine sales agency were subjected to
different sales training programs. Because there were some dropouts during the training
program, the number of trainees varied from program to program. At the end of the
training programs each salesperson was assigned a sales area from a group of sales areas
that were judged to have equivalent sales potentials. The table below lists the number
of sales made by each person in each of the four groups of sales people during the first
week after completing the training program. Do the data present sufficient evidence to
indicate a difference in the mean achievement for the four training programs?
Goal. Test whether the means are equal or not. That is,
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: Not all means are equal
Definitions:
(i) Response: variable of interest or dependent variable (sales)
(ii) Factor: categorical variable or independent variable (training technique)
(iii) Treatment levels (factor levels): method of training; t = 4
Training Group
                1        2        3        4
               65       75       59       94
               87       69       78       89
               73       83       67       80
               79       81       62       88
               81       72       83
               69       79       76
                        90
$n_i$           6        7        6        4     n = 23
$T_i$         454      549      425      351     GT = 1779
$\bar{T}_i$  75.67    78.43    70.83    87.75
parameter   $\mu_1$  $\mu_2$  $\mu_3$  $\mu_4$
(iv) ANOVA: ANalysis OF VAriance
(v) N-Way ANOVA: studies N factors.
(vi) experimental unit: (trainee)
2 One Way ANOVA: Completely Randomized Experimental Design
ANOVA Table
Source of error df SS MS F p-value
Treatments 3 712.6 237.5 3.77
Error 19 1,196.6 63.0
Totals 22 1909.2
Inferences about population means
Test.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: Not all means are equal
T.S.: $F = \frac{MST}{MSE} = 3.77$, where F is based on (t − 1) and (n − t) df.
RR: Reject $H_0$ if $F > F_{\alpha,\,t-1,\,n-t}$,
i.e. reject $H_0$ if $F > F_{0.05,3,19} = 3.13$
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate
a difference in the mean achievement for the four training programs.
Assumptions.
1. Sampled populations are normal
2. Independent random samples
3. All t populations have equal variances
Computations.
ANOVA Table
Source of error   df     SS     MS               F         p-value
Treatments        t-1    SST    MST=SST/(t-1)    MST/MSE
Error             n-t    SSE    MSE=SSE/(n-t)
Totals            n-1    TSS
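Training Group
                1           2           3           4
             $x_{11}$    $x_{21}$    $x_{31}$    $x_{41}$
             $x_{12}$    $x_{22}$    $x_{32}$    $x_{42}$
             $x_{13}$    $x_{23}$    $x_{33}$    $x_{43}$
             $x_{14}$    $x_{24}$    $x_{34}$    $x_{44}$
             $x_{15}$    $x_{25}$    $x_{35}$
             $x_{16}$    $x_{26}$    $x_{36}$
                         $x_{27}$
$n_i$         $n_1$       $n_2$       $n_3$       $n_4$      n
$T_i$         $T_1$       $T_2$       $T_3$       $T_4$      GT
$\bar{T}_i$  $\bar{T}_1$ $\bar{T}_2$ $\bar{T}_3$ $\bar{T}_4$
parameter    $\mu_1$     $\mu_2$     $\mu_3$     $\mu_4$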
Notation:
TSS: sum of squares of total deviation.
SST: sum of squares of total deviation between treatments.
SSE: sum of squares of total deviation within treatments (error).
CM: correction for the mean
GT: Grand Total.
Computational Formulas for TSS, SST and SSE:
$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{n_i} x_{ij}^2 - CM$$
$$SST = \sum_{i=1}^{t} \frac{T_i^2}{n_i} - CM$$
$$SSE = TSS - SST$$
Calculations for the training example produce
$$CM = \Big(\sum\sum x_{ij}\Big)^2 / n = 1{,}779^2/23 = 137{,}601.8$$
$$TSS = \sum\sum x_{ij}^2 - CM = 1{,}909.2$$
$$SST = \sum \frac{T_i^2}{n_i} - CM = 712.6$$
$$SSE = TSS - SST = 1{,}196.6$$
Thus
ANOVA Table
Source of error df SS MS F p-value
Treatments 3 712.6 237.5 3.77
Error 19 1,196.6 63.0
Totals 22 1909.2
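The sums of squares in the table can be reproduced from the raw training data with a short Python sketch. The function name `one_way_anova` and the dictionary layout are choices made for this illustration only; the arithmetic follows the computational formulas for CM, TSS, SST and SSE given above.

```python
# Sales by training group, as listed in the data table.
groups = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}

def one_way_anova(groups):
    """Sums of squares and F for a completely randomized design."""
    data = list(groups.values())
    t = len(data)                                  # number of treatments
    n = sum(len(g) for g in data)                  # total sample size
    grand_total = sum(sum(g) for g in data)
    cm = grand_total ** 2 / n                      # correction for the mean
    tss = sum(x * x for g in data for x in g) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in data) - cm
    sse = tss - sst
    mst = sst / (t - 1)
    mse = sse / (n - t)
    return {"CM": cm, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}

result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name:>3} = {value:,.2f}")
```

The p-value column is left out because it needs an F distribution function, which the Python standard library does not provide.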
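Confidence Intervals.
Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t}}$$
CI for $\mu_i$:
$$\bar{T}_i \pm t_{\alpha/2,\,n-t}\,\frac{s}{\sqrt{n_i}}$$
CI for $\mu_i - \mu_j$:
$$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2,\,n-t}\sqrt{s^2\Big(\frac{1}{n_i} + \frac{1}{n_j}\Big)}$$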
MINITAB
MTB> aovoneway C1-C4.
Exercise. Produce a Minitab output for the above example.
3 The Randomized Block Design
Extends paired-difference design to more than two treatments.
A randomized block design consists of b blocks, each containing t experimental units.
The t treatments are randomly assigned to the units in each block, and each treatment
appears once in every block.
Example. A consumer preference study involving three different package designs (treatments) was laid out in a randomized block design among four supermarkets (blocks). The data shown in Table 1 below represent the number of units sold for each package design within each supermarket during each of three given weeks.
(i) Provide a data summary.
(ii) Do the data present sufficient evidence to indicate a difference in the mean sales
for each package design (treatment)?
(iii) Do the data present sufficient evidence to indicate a difference in the mean sales
for the supermarkets?
Table 1.
              weeks
        w1        w2        w3
s1    (1) 17    (3) 23    (2) 34
s2    (3) 21    (1) 15    (2) 26
s3    (1) 1     (2) 23    (3) 8
s4    (2) 22    (1) 6     (3) 16
Remarks.
(i) In each supermarket (block) the first entry represents the design (treatment) and
the second entry represents the sales per week.
(ii) The three designs are assigned to each supermarket completely at random.
(iii) An alternate design would be to use 12 supermarkets. Each design (treatment)
would be randomly assigned to 4 supermarkets. In this case the difference in sales could
be due to more than just differences in package design. That is larger supermarkets
would be expected to have larger overall sales of the product than smaller supermarkets.
The randomized block design eliminates the store-to-store variability.
For computational purposes we rearrange the data by treatment, as shown below.
Data Summary. t = 3 treatments; b = 4 blocks
           Treatments
         t1      t2      t3      $B_i$
s1       17      34      23      $B_1$
s2       15      26      21      $B_2$
s3        1      23       8      $B_3$
s4        6      22      16      $B_4$
$T_i$   $T_1$   $T_2$   $T_3$
The treatment and block totals are
$T_1 = 39$, $T_2 = 105$, $T_3 = 68$
$B_1 = 74$, $B_2 = 62$, $B_3 = 32$, $B_4 = 44$
Calculations for the package design example produce
$$CM = \Big(\sum\sum x_{ij}\Big)^2 / n = 212^2/12 = 3{,}745.33$$
$$TSS = \sum\sum x_{ij}^2 - CM = 940.67$$
$$SST = \sum \frac{T_i^2}{b} - CM = 547.17$$
$$SSB = \sum \frac{B_i^2}{t} - CM = 348.00$$
$$SSE = TSS - SST - SSB = 45.50$$
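A short Python sketch can confirm these sums of squares from the rearranged table. The function name `randomized_block_anova` and the list-of-rows layout are assumptions made for this illustration; the formulas are the ones used in the calculations above.

```python
# Rows are supermarkets (blocks s1..s4); columns are package designs t1..t3.
sales = [
    [17, 34, 23],
    [15, 26, 21],
    [1, 23, 8],
    [6, 22, 16],
]

def randomized_block_anova(table):
    """Sums of squares and F ratios for a randomized block design."""
    b = len(table)                 # number of blocks
    t = len(table[0])              # number of treatments
    n = b * t
    block_totals = [sum(row) for row in table]
    treatment_totals = [sum(col) for col in zip(*table)]
    cm = sum(block_totals) ** 2 / n
    tss = sum(x * x for row in table for x in row) - cm
    sst = sum(T * T for T in treatment_totals) / b - cm
    ssb = sum(B * B for B in block_totals) / t - cm
    sse = tss - sst - ssb
    mse = sse / ((t - 1) * (b - 1))
    return {
        "CM": cm, "TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
        "F_treatments": (sst / (t - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }

block_result = randomized_block_anova(sales)
for name, value in block_result.items():
    print(f"{name:>12} = {value:,.2f}")
```

The error degrees of freedom are written as (t − 1)(b − 1), which equals n − t − b + 1.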
MINITAB.(Commands and Printouts)
MTB> Print C1-C3
ROW UNITS TRTS BLOCKS
1 17 1 1
2 34 2 1
3 23 3 1
4 15 1 2
5 26 2 2
6 21 3 2
7 1 1 3
8 23 2 3
9 8 3 3
10 6 1 4
11 22 2 4
12 16 3 4
MTB> ANOVA C1=C2 C3
ANOVA Table
Source of error df SS MS F p-value
Treatments 2 547.17 273.58 36.08 0.000
Blocks 3 348.00 116.00 15.30 0.003
Error 6 45.50 7.58
Totals 11 940.67
Solution to (ii)
Test.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all means are equal
T.S.: $F = \frac{MST}{MSE} = 36.08$, where F is based on (t − 1) and (n − t − b + 1) df.
RR: Reject $H_0$ if $F > F_{\alpha,\,t-1,\,n-t-b+1}$,
i.e. reject $H_0$ if $F > F_{0.05,2,6} = 5.14$
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate
a real difference in the mean sales for the three package designs.
Note that n −t −b + 1 = (t −1)(b −1).
Solution to (iii)
Test.
$H_0$: Block means are equal
$H_a$: Not all block means are equal (i.e. blocking is desirable)
T.S.: $F = \frac{MSB}{MSE} = 15.30$, where F is based on (b − 1) and (n − t − b + 1) df.
RR: Reject $H_0$ if $F > F_{\alpha,\,b-1,\,n-t-b+1}$,
i.e. reject $H_0$ if $F > F_{0.005,3,6} = 12.92$
Graph:
Decision: Reject $H_0$
Conclusion: At the 0.5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the four supermarkets; that is, the data support our decision to use supermarkets as blocks.
Assumptions.
1. Sampled populations are normal
2. Dependent random samples due to blocking
3. All t populations have equal variances
Confidence Intervals.
Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-t-b+1}}$$
CI for $\mu_i - \mu_j$:
$$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2,\,n-t-b+1}\, s\sqrt{\frac{2}{b}}$$
Exercise. Construct a 90% C.I. for the difference between mean sales from package
designs 1 and 2.
Chapter 10
Simple Linear Regression and
Correlation
Contents.
1. Introduction: Example
2. A Simple Linear probabilistic model
3. Least squares prediction equation
4. Inferences concerning the slope
5. Estimating E(y|x) for a given x
6. Predicting y for a given x
7. Coefficient of correlation
8. Analysis of Variance
9. Computer Printouts
1 Introduction
Linear regression is a statistical technique used to predict (forecast) the value of a
variable from known related variables.
Example. (Ad Sales) Consider the problem of predicting the gross monthly sales volume y for a corporation that is not subject to substantial seasonal variation in its sales volume. For the predictor variable x we use the amount spent by the company on advertising during the month of interest. We wish to determine whether advertising is worthwhile, that is, whether advertising is actually related to the firm’s sales volume. In addition we wish to use the amount spent on advertising to predict the sales volume. The data in the table below represent a sample of advertising expenditures, x, and the associated sales volume, y, for 10 randomly selected months.
Ad Sales Data
Month y (in $10,000) x (in $10,000)
1 101 1.2
2 92 0.8
3 110 1.0
4 120 1.3
5 90 0.7
6 82 0.8
7 93 1.0
8 75 0.6
9 91 0.9
10 105 1.1
Definitions.
(i) Response: dependent variable of interest (sales volume)
(ii) Independent (predictor) variable ( Ad expenditure)
(iii) Linear equations (straight line): y = a + bx
Scatter diagram:
Best fit straight line:
Equation of a straight line:
(y-intercept and slope)
2 A Simple Linear Probabilistic Model
Model.
$$Y = \beta_0 + \beta_1 X + \epsilon$$
where
x: independent variable (predictor)
y: dependent variable (response)
$\beta_0$ and $\beta_1$ are unknown parameters.
$\epsilon$: random error due to other factors not included in the model.
Assumptions.
1. $E(\epsilon) := \mu_\epsilon = 0$.
2. $Var(\epsilon) := \sigma_\epsilon^2 = \sigma^2$.
3. The r.v. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.
4. The random components of any two observed y values are independent.
3 Least Squares Prediction Equation
The least squares prediction equation is sometimes called the estimated regression equation or the prediction equation.
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x$$
This equation is obtained by using the method of least squares; that is,
$$\min \sum (y - \hat{y})^2$$
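Computational Formulas.
Objective: Estimate $\beta_0$, $\beta_1$ and $\sigma^2$.
$$\bar{x} = \sum x/n; \qquad \bar{y} = \sum y/n$$
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \Big(\sum x\Big)^2/n$$
$$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \Big(\sum y\Big)^2/n$$
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \Big(\sum x\Big)\Big(\sum y\Big)/n$$
$$\hat\beta_1 = SS_{xy}/SS_{xx}$$
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
To estimate $\sigma^2$:
$$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = SS_{yy} - (SS_{xy})^2/SS_{xx}$$
$$s^2 = \frac{SSE}{n-2}$$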
Remarks.
(i) $\hat\beta_1$ is the slope of the estimated regression equation.
(ii) $s^2$ provides a measure of spread of points (x, y) around the regression line.
Ad Sales example
Question 1. Do a scatter diagram. Can you say that x and y are linearly related?
Answer.
Question 2. Use the computational formulas to provide a data summary.
Answer.
Data Summary.
$\bar{x} = 0.94$; $\bar{y} = 95.9$
$SS_{xx} = .444$
$SS_{xy} = 23.34$
$SS_{yy} = 1600.9$
Optional material
Ad Sales Calculations
Month     x       y      $x^2$     xy       $y^2$
1        1.2     101     1.44    121.2    10,201
2        0.8      92     0.64     73.6     8,464
3        1.0     110     1.00    110.0    12,100
4        1.3     120     1.69    156.0    14,400
5        0.7      90     0.49     63.0     8,100
6        0.8      82     0.64     65.6     6,724
7        1.0      93     1.00     93.0     8,649
8        0.6      75     0.36     45.0     5,625
9        0.9      91     0.81     81.9     8,281
10       1.1     105     1.21    115.5    11,025
Sum      9.4     959     9.28    924.8    93,569
       ($\sum x$) ($\sum y$) ($\sum x^2$) ($\sum xy$) ($\sum y^2$)
$$\bar{x} = \sum x/n = 0.94; \qquad \bar{y} = \sum y/n = 95.9$$
$$SS_{xx} = \sum x^2 - \Big(\sum x\Big)^2/n = 9.28 - \frac{(9.4)^2}{10} = .444$$
$$SS_{xy} = \sum xy - \Big(\sum x\Big)\Big(\sum y\Big)/n = 924.8 - \frac{(9.4)(959)}{10} = 23.34$$
$$SS_{yy} = \sum y^2 - \Big(\sum y\Big)^2/n = 93{,}569 - \frac{(959)^2}{10} = 1600.9$$
Question 3. Estimate the parameters $\beta_0$ and $\beta_1$.
Answer.
$$\hat\beta_1 = SS_{xy}/SS_{xx} = \frac{23.34}{.444} = 52.5676 \approx 52.57$$
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 95.9 - (52.5676)(.94) \approx 46.49$$
Question 4. Estimate $\sigma^2$.
Answer.
$$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1{,}600.9 - (52.5676)(23.34) = 373.97$$
Therefore
$$s^2 = \frac{SSE}{n-2} = \frac{373.97}{8} = 46.75$$
Question 5. Find the least squares line for the data.
Answer.
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x = 46.49 + 52.57x$$
Remark. This equation is also called the estimated regression equation or prediction line.
Question 6. Predict sales volume, y, for a given expenditure level of $10,000 (i.e. x = 1.0).
Answer.
$$\hat{y} = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$$
So sales volume is $990,600.
Question 7. Predict the mean sales volume E(y|x) for a given expenditure level of $10,000, x = 1.0.
Answer.
$$E(y|x) = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$$
so the mean sales volume is $990,600.
Remark. In Question 6 and Question 7 we obtained the same estimate; the bound on the error of estimation will, however, be different.
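The answers to Questions 2 through 6 can be reproduced from the raw Ad Sales data with the computational formulas. The sketch below is one possible arrangement; the function name `least_squares` is made up for this illustration. Because it carries unrounded coefficients, the prediction at x = 1.0 comes out near 99.05 rather than the 99.06 obtained above from the rounded values 46.49 and 52.57.

```python
# Ad Sales data: x = advertising, y = sales (both in units of $10,000).
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]

def least_squares(x, y):
    """Return (b0, b1, SSE, s2) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_xx                 # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate
    sse = ss_yy - b1 * ss_xy
    s2 = sse / (n - 2)                 # estimate of sigma^2
    return b0, b1, sse, s2

b0, b1, sse, s2 = least_squares(x, y)
y_hat = b0 + b1 * 1.0                  # prediction at x = 1.0

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse:.2f}, s^2 = {s2:.2f}")
print(f"predicted sales at x = 1.0: {y_hat:.2f}")
```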
4 Inferences Concerning the Slope
Parameter of interest: $\beta_1$
Point estimator: $\hat\beta_1$
Estimator mean: $\mu_{\hat\beta_1} = \beta_1$
Estimator standard error: $\sigma_{\hat\beta_1} = \sigma/\sqrt{SS_{xx}}$
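Test.
$H_0: \beta_1 = \beta_{10}$ (no linear relationship when $\beta_{10} = 0$)
$H_a: \beta_1 \neq \beta_{10}$ (there is a linear relationship when $\beta_{10} = 0$)
T.S.:
$$t = \frac{\hat\beta_1 - \beta_{10}}{s/\sqrt{SS_{xx}}}$$
RR:
Reject $H_0$ if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$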
Graph:
Decision:
Conclusion:
Question 8. Determine whether there is evidence to indicate a linear relationship between advertising expenditure, x, and sales volume, y.
Answer.
Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)
T.S.:
$$t = \frac{\hat\beta_1 - 0}{s/\sqrt{SS_{xx}}} = \frac{52.57 - 0}{6.84/\sqrt{.444}} = 5.12$$
RR: (critical value: $t_{.025,8} = 2.306$)
Reject $H_0$ if t > 2.306 or t < −2.306
Graph:
Decision: Reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, x, and sales volume, y.
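Confidence interval for $\beta_1$:
$$\hat\beta_1 \pm t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{SS_{xx}}}$$
Question 9. Find a 95% confidence interval for $\beta_1$.
Answer.
$$\hat\beta_1 \pm t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{SS_{xx}}}$$
$$52.57 \pm 2.306\,\frac{6.84}{\sqrt{.444}}$$
$$52.57 \pm 23.67 = (28.90, 76.24)$$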
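The slope test and the interval for $\beta_1$ use the same standard error, so both can be checked together. This sketch plugs in the rounded summary values used in the worked answers (52.57, 6.84, .444) and the table value $t_{.025,8} = 2.306$; the variable names are chosen for this illustration. The half-width works out to about 23.67, which is the value consistent with the endpoints (28.90, 76.24).

```python
import math

b1_hat = 52.57      # slope estimate (rounded, as in the worked answers)
s = 6.84            # s = sqrt(MSE), rounded
ss_xx = 0.444
T_CRIT = 2.306      # t_{.025,8} from the t table

se_b1 = s / math.sqrt(ss_xx)          # estimated standard error of the slope
t_stat = (b1_hat - 0) / se_b1         # test statistic for H0: beta_1 = 0
half_width = T_CRIT * se_b1
ci = (b1_hat - half_width, b1_hat + half_width)
reject = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI for beta_1: {b1_hat} +/- {half_width:.2f} = ({ci[0]:.2f}, {ci[1]:.2f})")
```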
5 Estimating E(y|x) For a Given x
The confidence interval (CI) for the expected (mean) value of y given $x = x_p$ is given by
$$\hat{y} \pm t_{\alpha/2,\,n-2}\sqrt{s^2\Big[\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}\Big]}$$
6 Predicting y for a Given x
The prediction interval (PI) for a particular value of y given $x = x_p$ is given by
$$\hat{y} \pm t_{\alpha/2,\,n-2}\sqrt{s^2\Big[1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}\Big]}$$
7 Coefficient of Correlation
In a previous section we tested for a linear relationship between x and y.
Now we examine how strong a linear relationship between x and y is.
We call this measure the coefficient of correlation between y and x.
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$
Remarks.
(i) $-1 \le r \le 1$.
(ii) The population coefficient of correlation is ρ.
(iii) r > 0 indicates a positive correlation ($\hat\beta_1 > 0$)
(iv) r < 0 indicates a negative correlation ($\hat\beta_1 < 0$)
(v) r = 0 indicates no correlation ($\hat\beta_1 = 0$)
Question 10. Find the coefficient of correlation, r.
Answer.
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{23.34}{\sqrt{0.444(1{,}600.9)}} = 0.88$$
Coefficient of determination
Algebraic manipulations show that
$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}}$$
Question 11. By what percentage is the sum of squares of deviations of y about the mean ($SS_{yy}$) reduced by using $\hat{y}$ rather than $\bar{y}$ as a predictor of y?
Answer.
$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 0.88^2 = 0.77$$
that is, a reduction of about 77%.
$r^2$ is called the coefficient of determination.
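The two routes to $r^2$ (squaring r, or using the reduction in the sum of squares) can be compared numerically. The sketch below uses the Ad Sales data summary; the variable names are chosen for this illustration. Carried without intermediate rounding, $r^2$ is about 0.766, which matches the R-sq of 76.6% on the Minitab printout, while squaring the rounded r = 0.88 gives the 0.77 quoted above.

```python
import math

ss_xx, ss_yy, ss_xy = 0.444, 1600.9, 23.34   # Ad Sales data summary

r = ss_xy / math.sqrt(ss_xx * ss_yy)         # coefficient of correlation
sse = ss_yy - ss_xy ** 2 / ss_xx             # unexplained sum of squares
r2_from_sse = (ss_yy - sse) / ss_yy          # coefficient of determination

print(f"r = {r:.4f}")
print(f"r^2 = {r ** 2:.4f} (squared r), {r2_from_sse:.4f} (from SSE)")
```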
8 Analysis of Variance
Notation:
$TSS := SS_{yy} = \sum (y - \bar{y})^2$ (Total SS of deviations).
$SSR = \sum (\hat{y} - \bar{y})^2$ (SS of deviations due to regression or explained deviations)
$SSE = \sum (y - \hat{y})^2$ (SS of deviations for the error or unexplained deviations)
$TSS = SSR + SSE$
Question 12. Give the ANOVA table for the Ad Sales example.
Answer.
ANOVA Table
Source   df     SS     MS               F          p-value
Reg.     1      SSR    MSR=SSR/(1)      MSR/MSE
Error    n-2    SSE    MSE=SSE/(n-2)
Totals   n-1    TSS
ANOVA Table
Source   df   SS           MS           F       p-value
Reg.     1    1,226.927    1,226.927    26.25   0.0001
Error    8    373.973      46.747
Totals   9    1,600.900
Question 13. Use the ANOVA table to test for a significant linear relationship between sales and advertising expenditure.
Answer.
Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)
T.S.: $F = \frac{MSR}{MSE} = 26.25$
RR: (critical value: $F_{.005,1,8} = 14.69$)
Reject $H_0$ if F > 14.69
(OR: Reject $H_0$ if α > p-value)
Graph:
Decision: Reject $H_0$
Conclusion: At 0.5% significance level there is sufficient statistical evidence to indicate
a linear relationship between advertising expenditure, x, and sales volume, y.
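The entries of the regression ANOVA table follow from the three sums of squares in the data summary. The sketch below rebuilds them in Python; it relies on the identity $SSR = \hat\beta_1 SS_{xy} = (SS_{xy})^2/SS_{xx}$, and the critical value 14.69 is the table value quoted in the test above.

```python
ss_xx, ss_yy, ss_xy, n = 0.444, 1600.9, 23.34, 10   # Ad Sales data summary

ssr = ss_xy ** 2 / ss_xx      # explained SS (equals beta1_hat * SS_xy)
sse = ss_yy - ssr             # unexplained SS, since TSS = SSR + SSE
msr = ssr / 1                 # regression has 1 df
mse = sse / (n - 2)           # error has n - 2 df
f_stat = msr / mse

F_CRIT = 14.69                # F_{.005,1,8} from the F table
reject = f_stat > F_CRIT

print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, MSE = {mse:.3f}")
print(f"F = {f_stat:.2f}, reject H0: {reject}")
```

In simple linear regression F equals the square of the t statistic for the slope, which is why 26.25 is close to 5.12 squared.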
9 Computer Printouts for Regression Analysis
Store y in C1 and x in C2.
MTB> Plot C1 C2. : Gives a scatter diagram.
MTB> Regress C1 1 C2.
Computer output for Ad sales example:
The regression equation is
y=46.5 + 52.6 x
Predictor Coef Stdev t-ratio P
Constant 46.486 9.885 4.70 0.000
x 52.57 10.26 5.12 0.000
s=6.837 R-sq=76.6% R-sq(adj)=73.7%
Analysis of Variance
Source df SS MS F p-value
Reg. 1 1,226.927 1,226.927 26.25 0.000
Error 8 373.973 46.747
Totals 9 1,600.900
Review Exercises: Linear Regression
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.
1. Given the following data set
x -3 -1 1 1 2
y 6 4 3 1 1
(i) Plot the scatter diagram, and indicate whether x and y appear linearly related.
(ii) Show that $\sum x = 0$; $\sum y = 15$; $\sum x^2 = 16$; $\sum y^2 = 63$; $SS_{xx} = 16$; $SS_{yy} = 18$; and $SS_{xy} = -16$.
(iii) Find the regression equation for the data. (Answer: $\hat{y} = 3 - x$)
(iv) Plot the regression equation on the same graph as (i). Does the line appear to provide a good fit for the data points?
(v) Compute SSE and $s^2$. (Answer: $s^2 = 2/3$)
(vi) Estimate the expected value of y when x = −1.
(vii) Find the correlation coefficient r and find $r^2$. (Answer: r = −.943, $r^2$ = .889)
More generally, the Minitab regression printout has the form:
The regression equation is
y = $\hat\beta_0 + \hat\beta_1 x$
Predictor   Coef             Stdev                      t-ratio   P
Constant    $\hat\beta_0$    $\sigma_{\hat\beta_0}$     TS: t     p-value
x           $\hat\beta_1$    $\sigma_{\hat\beta_1}$     TS: t     p-value
s = $\sqrt{MSE}$    R-sq = $r^2$    R-sq(adj)
Analysis of Variance
Source   df     SS     MS               F          p-value
Reg.     1      SSR    MSR=SSR/(1)      MSR/MSE
Error    n-2    SSE    MSE=SSE/(n-2)
Totals   n-1    TSS
2. A study of middle to upper-level managers is undertaken to investigate the relationship between salary level, Y, and years of work experience, X. A random sample of 20 managers is chosen with the following results (in thousands of dollars): $\sum x_i = 235$; $\sum y_i = 763.8$; $SS_{xx} = 485.75$; $SS_{yy} = 2{,}236.1$; and $SS_{xy} = 886.85$. It is further assumed that the relationship is linear.
(i) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation.
(Answer: $\hat{y} = 16.73 + 1.826x$)
(ii) Find the correlation coefficient, r. (Answer: r = .85)
(iii) Find $r^2$ and interpret its value.
3. Minitab’s Regress command has been applied to data on family income, X, and last year’s energy consumption, Y, from a random sample of 25 families. The income data are in thousands of dollars and the energy consumption data are in millions of BTU. A portion of a linear regression computer printout is shown below.
Predictor   Coef      stdev     t-ratio   P
Constant    82.036    2.054     39.94     0.000
X           0.93051   0.05727   16.25     0.000
s=          R-sq=92.0%    R-sq(adj)=91.6%
Analysis of Variance
Source       DF    SS       MS    F        P
Regression         7626.6         264.02   0.000
Error        23
Total              8291
(i) Complete all missing entries in the table.
(ii) Find $\hat\beta_0$, $\hat\beta_1$, and the estimated regression equation.
(iii) Do the data present sufficient evidence to indicate that Y and X are linearly
related? Test by using α = 0.01.
(iv) Determine a point estimate for last year’s mean energy consumption of all families
with an annual income of $40,000.
4. Answer by True or False. (Circle your choice.)
T F (i) The correlation coefficient r shows the degree of association between x and y.
T F (ii) The coefficient of determination $r^2$ shows the percentage change in y resulting from a one-unit change in x.
T F (iii) The last step in a simple regression analysis is drawing a scatter diagram.
T F (iv) r = 1 implies no linear correlation between x and y.
T F (v) We always estimate the value of a parameter and predict the value of a random variable.
T F (vi) If $\beta_1 = 1$, we always predict the same value of y regardless of the value of x.
T F (vii) It is necessary to assume that the response y of a probability model has a normal distribution if we are to estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$.
Chapter 11
Multiple Linear Regression
Contents.
1. Introduction: Example
2. Multiple Linear Model
3. Analysis of Variance
4. Computer Printouts
1 Introduction: Example
Multiple linear regression is a statistical technique used to predict (forecast) the value of a variable from multiple known related variables.
2 A Multiple Linear Model
Model.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$$
where
$x_i$: independent variables (predictors)
y: dependent variable (response)
$\beta_i$: unknown parameters.
$\epsilon$: random error due to other factors not included in the model.
Assumptions.
1. $E(\epsilon) := \mu_\epsilon = 0$.
2. $Var(\epsilon) := \sigma_\epsilon^2 = \sigma^2$.
3. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.
4. The random components of any two observed y values are independent.
3 Least Squares Prediction Equation
Estimated Regression Equation
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$$
This equation is obtained by using the method of least squares.
Multiple Regression Data
Obser.   y         $x_1$       $x_2$       $x_3$
1        $y_1$     $x_{11}$    $x_{21}$    $x_{31}$
2        $y_2$     $x_{12}$    $x_{22}$    $x_{32}$
···      ···       ···         ···         ···
n        $y_n$     $x_{1n}$    $x_{2n}$    $x_{3n}$
Minitab Printout
The regression equation is
y = $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$
Predictor   Coef             Stdev                      t-ratio   P
Constant    $\hat\beta_0$    $\sigma_{\hat\beta_0}$     TS: t     p-value
$x_1$       $\hat\beta_1$    $\sigma_{\hat\beta_1}$     TS: t     p-value
$x_2$       $\hat\beta_2$    $\sigma_{\hat\beta_2}$     TS: t     p-value
$x_3$       $\hat\beta_3$    $\sigma_{\hat\beta_3}$     TS: t     p-value
s = $\sqrt{MSE}$    $R^2 = r^2$    $R^2$(adj)
Analysis of Variance
Source   df       SS     MS               F          p-value
Reg.     3        SSR    MSR=SSR/(3)      MSR/MSE
Error    n − 4    SSE    MSE=SSE/(n-4)
Totals   n − 1    TSS
Source   df   SS
$x_1$    1    $SS_{x_1x_1}$
$x_2$    1    $SS_{x_2x_2}$
$x_3$    1    $SS_{x_3x_3}$
Unusual observations (ignore)
MINITAB.
Use the REGRESS command to regress y stored in C1 on the 3 predictor variables stored in C2-C4.
MTB> Regress C1 3 C2-C4;
SUBC> Predict x1 x2 x3.
The subcommand PREDICT in Minitab, followed by fixed values of $x_1$, $x_2$, and $x_3$, calculates the estimated value of $\hat{y}$ (Fit), its estimated standard error (Stdev.Fit), a 95% CI for E(y), and a 95% PI for y.
Example. A county assessor wishes to develop a model to relate the market value, y, of single-family residences in a community to the variables:
$x_1$: living area in thousands of square feet;
$x_2$: number of floors;
$x_3$: number of bedrooms;
$x_4$: number of baths.
Observations were recorded for 29 randomly selected single-family homes from residences recently sold at fair market value. The resulting prediction equation will then be used for assessing the values of single-family residences in the county to establish the amount each homeowner owes in property taxes.
A Minitab printout is given below:
MTB> Regress C1 4 C2-C5;
SUBC> Predict 1.0 1 3 2;
SUBC> Predict 1.4 2 3 2.5.
The regression equation is
y = −16.6 + 7.84 x1 − 34.4 x2 − 7.99 x3 + 54.9 x4
Predictor   Coef.     Stdev    t-ratio   P
Constant    −16.58    18.88    −0.88     0.389
x1          7.839     1.234    6.35      0.000
x2          −34.39    11.15    −3.09     0.005
x3          −7.990    8.249    −0.97     0.342
x4          54.93     13.52    4.06      0.000
s = 16.58    R-sq = 88.2%    R-sq(adj) = 86.2%
Analysis of Variance
Source   df   SS      MS      F       p-value
Reg.     4    49359   12340   44.88   0.000
Error    24   6599    275
Totals   28   55958
Source   df   SS
x1       1    44444
x2       1    59
x3       1    321
x4       1    4536
Fit      Stdev.Fit   95% C.I.            95% P.I.
113.32   5.80        (101.34, 125.30)    (77.05, 149.59)
137.75   5.48        (126.44, 149.07)    (101.70, 173.81)
Q1. What is the prediction equation?
The regression equation is
$$\hat{y} = -16.6 + 7.84x_1 - 34.4x_2 - 7.99x_3 + 54.9x_4$$
Q2. What type of model has been chosen to fit the data?
Multiple linear regression model.
Q3. Do the data provide sufficient evidence to indicate that the model contributes information for the prediction of y? Test using α = 0.05.
Test:
$H_0$: model not useful
$H_a$: model is useful
T.S.: p-value = 0.000
DR. Reject $H_0$ if α > p-value
Graph:
Decision: Reject $H_0$
Conclusion: At the 5% significance level there is sufficient statistical evidence to indicate that the model contributes information for the prediction of y.
Q4. Give a 95% CI for E(y) and PI for y when $x_1 = 10$, $x_2 = 1$, $x_3 = 3$, and $x_4 = 2$.
CI: (101.34, 125.30)
PI: (77.05, 149.59)
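The summary line of the printout (s, R-sq, R-sq(adj)) and the F ratio can all be recovered from the three sums of squares in the Analysis of Variance table. The sketch below shows the arithmetic; the adjusted R-squared formula used, $1 - \frac{SSE/(n-k-1)}{TSS/(n-1)}$, is the usual textbook definition and is an assumption here, since these notes do not state it.

```python
n, k = 29, 4                          # observations, number of predictors
ssr, sse, tss = 49359, 6599, 55958    # from the Analysis of Variance table

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse                    # overall F test of model usefulness
s = mse ** 0.5                        # estimated standard deviation of the error
r2 = ssr / tss                        # coefficient of determination
r2_adj = 1 - (sse / (n - k - 1)) / (tss / (n - 1))   # assumed standard formula

print(f"F = {f_stat:.2f}, s = {s:.2f}")
print(f"R-sq = {100 * r2:.1f}%, R-sq(adj) = {100 * r2_adj:.1f}%")
```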
Non-Linear Models
Example.
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_1^2 x_2$$

If you answer all questions on a (T. or possible side effects. How fuel efficient a certain car model is? 4. A market analyst wants to know the effectiveness of a new diet.Chapter 1 Data Analysis Chapter Content. Introduction Statistical Problems Descriptive Statistics Graphical Methods Frequency Distributions (Histograms) Other Methods Numerical methods Measures of Central Tendency Measures of Variability Empirical Rule Percentiles 1 Introduction Statistical Problems 1. 2. wants to know if a new drug is superior to already existing drugs.F) (or multiple choice) examination completely randomly. 3. A pharmaceutical Co. 5 . What is the effect of package designs on sales. what are your chances of passing? 6. 5. Is there any relationship between your GPA and employment opportunities.

decisions) about certain characteristics of a population based on information contained in a sample. weight of freshman students) Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements. A collection of numbers (data)) Knowledge: Useful data Population: set of all measurements of interest (e.g. How to pick the stocks to invest in? I. (iii) Collection and analysis of data (gathering and summarizing data). How to interpret polls. times to assemble a product) Information: Any aquired data ( e. (iv) Procedure for making predictions about the population based on sample information.g.g. Definitions Probability: A game of chance Statistics: Branch of science that deals with data analysis Course objective: To make decisions in the prescence of uncertainty Terminology Data: Any recorded event (e. How many individuals you need to sample for your inferences to be acceptable? What is meant by the margin of error? 8. (better statement) To make inferences (predictions. (v) A measure of “goodness” or reliability for the procedure. all freshman students at the university) Sample: A subset of measurements selected from the population of interest Variable: A property of an individual population unit (e. Objective.g. Inferential Statistics: deals with procedures used to make inferences (predictions) about a population parameter from information contained in a sample. Types of data: qualitative vs quantitative OR discrete vs continuous Descriptive statistics Graphical vs numerical methods 6 . height.7. Elements of a statistical problem: (i) A clear definition of the population and variable of interest. What is the effect of market strategy on market share? 9. all registered voters. major. (ii) a design of the experiment or sampling procedure.

2 Graphical Methods

Frequency and relative frequency distributions (Histograms):

Example. Weight Loss Data
[Data table: 25 weight-loss measurements, smallest 5.4 and largest 28.6; the individual values are not legible in this copy.]

Objective: Provide a useful summary of the available information.
Method: Construct a statistical graph called a “histogram” (or frequency distribution)

Weight Loss Data
class   boundaries   freq., f   rel. freq., f/n
1       5.0-9.0      3          3/25 (.12)
2       9.0-13.0     5          5/25 (.20)
3       13.0-17.0    7          7/25 (.28)
4       17.0-21.0    6          6/25 (.24)
5       21.0-25.0    3          3/25 (.12)
6       25.0-29.0    1          1/25 (.04)
Totals               25         1.00

Let
k = # of classes
max = largest measurement
min = smallest measurement
n = sample size
w = class width

Rule of thumb:
-The number of classes chosen is usually between 5 and 20. (Most of the time between 7 and 13.)
-The more data one has the larger is the number of classes.

Formulas:
$k = 1 + 3.3\log_{10}(n)$,
$w = \dfrac{\max - \min}{k}$.
Note: $w = \frac{28.6 - 5.4}{6} = 3.87$. But we used $w = \frac{29 - 5}{6} = 4.0$ (why?)

Graphs: Graph the frequency and relative frequency distributions.

Exercise. Repeat the above example using 12 and 4 classes respectively. Comment on the usefulness of each including k = 6.

Steps in Constructing a Frequency Distribution (Histogram)
1. Determine the number of classes
2. Determine the class width
3. Locate class boundaries
4. Proceed as above

Possible shapes of frequency distributions
1. Normal distribution (Bell shape)
2. Exponential
3. Uniform
4. Binomial, Poisson (discrete variables)

Important
-The normal distribution is the most popular, most useful, easiest to handle
-It occurs naturally in practical applications
-It lends itself easily to more in depth analysis

Other Graphical Methods
-Statistical Table: Comparing different populations
-Bar Charts
-Line Charts
-Pie-Charts
-Cheating with Charts
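The first three steps of constructing a frequency distribution can be sketched in a few lines of Python. This is only an illustration for the weight loss example (n = 25, min = 5.4, max = 28.6); the function names are made up for this sketch, and rounding the class limits out to 5 and 29 follows the choice made in the notes.

```python
import math

def suggested_number_of_classes(n):
    """Rule-of-thumb class count k = 1 + 3.3*log10(n), before rounding."""
    return 1 + 3.3 * math.log10(n)

def class_width(largest, smallest, k):
    """Class width w = (max - min) / k."""
    return (largest - smallest) / k

def class_boundaries(start, width, k):
    """Return the k + 1 boundaries start, start + w, ..., start + k*w."""
    return [start + i * width for i in range(k + 1)]

n = 25
k = round(suggested_number_of_classes(n))    # 1 + 3.3*log10(25) is about 5.6
w_exact = class_width(28.6, 5.4, k)          # about 3.87
boundaries = class_boundaries(5.0, 4.0, k)   # limits rounded out to 5 and 29, so w = 4

print(k, round(w_exact, 2), boundaries)
```

Rounding the limits out to whole numbers gives boundaries that are easier to read than steps of 3.87, at the cost of a slightly wider histogram.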

3 Numerical methods

Measures of Central Tendency
1. Sample mean
2. Sample median
3. Sample mode

Measures of Dispersion (Variability)
1. Range
2. Mean Absolute Deviation (MAD)
3. Sample Variance
4. Sample Standard Deviation

I. Measures of Central Tendency
Given a sample of measurements $(x_1, x_2, \cdots, x_n)$ where
n = sample size
$x_i$ = value of the ith observation in the sample

1. Sample Mean (arithmetic average)
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$ or $\bar{x} = \dfrac{\sum x}{n}$

Example 1: Given a sample of 5 test grades (90, 95, 80, 60, 75) then
$\sum x = 90 + 95 + 80 + 60 + 75 = 400$
$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80.$

Example 2: Let x = age of a randomly selected student
sample: (20, 18, 22, 29, 21, 19)
$\sum x = 20 + 18 + 22 + 29 + 21 + 19 = 129$
$\bar{x} = \frac{\sum x}{n} = \frac{129}{6} = 21.5$

2. Sample Median
The median of a sample (data set) is the middle number when the measurements are arranged in ascending order.
Note: If n is odd, the median is the middle number

If n is even, the median is the average of the middle two numbers.

Example 1: Sample (9, 2, 7, 11, 14), n = 5
Step 1: arrange in ascending order: 2, 7, 9, 11, 14
Step 2: med = 9.

Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6
Step 1: 2, 6, 7, 9, 11, 14
Step 2: med = $\frac{7+9}{2}$ = 8.

Remarks:
(i) $\bar{x}$ is sensitive to extreme values
(ii) the median is insensitive to extreme values (because median is a measure of location or position).

3. Mode
The mode is the value of x (observation) that occurs with the greatest frequency.
Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7
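The worked examples for the mean, median and mode can be checked with Python's standard `statistics` module. The sketch below reuses the samples from the notes; the last two lines add one made-up extreme value (140 in place of 14) purely to illustrate the remarks on sensitivity.

```python
import statistics

grades = [90, 95, 80, 60, 75]
ages = [20, 18, 22, 29, 21, 19]
odd_sample = [9, 2, 7, 11, 14]
even_sample = [9, 2, 7, 11, 6, 14]
mode_sample = [9, 2, 7, 11, 14, 7, 2, 7]

mean_grades = statistics.mean(grades)        # 400 / 5
mean_ages = statistics.mean(ages)            # 129 / 6
median_odd = statistics.median(odd_sample)   # middle value of 2, 7, 9, 11, 14
median_even = statistics.median(even_sample) # average of 7 and 9
mode_value = statistics.mode(mode_sample)    # 7 occurs three times

# Hypothetical extreme value: the mean moves a lot, the median does not move.
with_outlier = [9, 2, 7, 11, 140]
mean_shift = statistics.mean(with_outlier) - statistics.mean(odd_sample)
median_shift = statistics.median(with_outlier) - statistics.median(odd_sample)

print(mean_grades, mean_ages, median_odd, median_even, mode_value)
print(mean_shift, median_shift)
```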

[Figure: effect of $\bar{x}$, median and mode on the relative frequency distribution.]

II. Measures of Variability
Given: a sample of size n
sample: $(x_1, x_2, \cdots, x_n)$

1. Range:
Range = largest measurement - smallest measurement
or Range = max - min
Example 1: Sample (90, 85, 65, 75, 70, 95)
Range = max - min = 95 - 65 = 30

2. Mean Absolute Difference (MAD) (not in textbook)
$\text{MAD} = \dfrac{\sum |x - \bar{x}|}{n}$
Example 2: Same sample
x        $x - \bar{x}$   $|x - \bar{x}|$
90       10      10
85       5       5
65       -15     15
75       -5      5
70       -10     10
95       15      15
Totals   480     0       60

$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$
$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{60}{6} = 10.$

Remarks:
(i) MAD is a good measure of variability
(ii) It is difficult for mathematical manipulations

3. Sample Variance, $s^2$
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$

4. Sample Standard Deviation, s

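$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$ or $s = \sqrt{s^2}$

Example: Same sample as before ($\bar{x} = 80$)
x        $x - \bar{x}$   $(x - \bar{x})^2$
90       10      100
85       5       25
65       -15     225
75       -5      25
70       -10     100
95       15      225
Totals   480     0       700

Therefore
$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$
$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{700}{5} = 140$
$s = \sqrt{s^2} = \sqrt{140} = 11.83$

Shortcut Formula for Calculating $s^2$ and s
$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$
$s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$ (or $s = \sqrt{s^2}$).

Example: Same sample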

x        $x^2$
90       8100
85       7225
65       4225
75       5625
70       4900
95       9025
Totals   480     39,100

$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \dfrac{39{,}100 - \frac{(480)^2}{6}}{5} = \dfrac{39{,}100 - 38{,}400}{5} = \dfrac{700}{5} = 140$
$s = \sqrt{s^2} = \sqrt{140} = 11.83.$

Numerical methods (Summary)
Data: $\{x_1, x_2, \cdots, x_n\}$
(i) Measures of central tendency
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample median: the middle number when the measurements are arranged in ascending order
Sample mode: most frequently occurring value
(ii) Measures of variability
Range: r = max - min
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
Sample standard deviation: $s = \sqrt{s^2}$

Exercise: Find all the measures of central tendency and measures of variability for the weight loss example.

Graphical Interpretation of the Variance:

Finite Populations
Let N = population size.
Data: $\{x_1, x_2, \cdots, x_N\}$
Population mean: $\mu = \frac{\sum x_i}{N}$
Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
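As a numerical check on the measures of variability summarized above, the following sketch recomputes the range, MAD, and the sample variance (defining and shortcut formulas) for the six-grade sample. The last lines treat the same six numbers as if they were a whole population, which is an assumption made only to contrast the divisor N with n - 1.

```python
import math

sample = [90, 85, 65, 75, 70, 95]
n = len(sample)
mean = sum(sample) / n                                   # 480 / 6

data_range = max(sample) - min(sample)                   # 95 - 65
mad = sum(abs(x - mean) for x in sample) / n             # 60 / 6

# Sample variance: defining formula and shortcut formula (divisor n - 1).
s2_defining = sum((x - mean) ** 2 for x in sample) / (n - 1)
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)
s = math.sqrt(s2_defining)

# Population version (divisor N), treating the six values as the population.
sigma2 = sum((x - mean) ** 2 for x in sample) / n
sigma = math.sqrt(sigma2)

print(data_range, mad, s2_defining, s2_shortcut, round(s, 2))
print(round(sigma2, 2), round(sigma, 2))
```

The two sample-variance formulas agree exactly here; the shortcut form only needs the totals of x and x squared, which is why it is convenient for hand calculation.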

Population standard deviation: $\sigma = \sqrt{\sigma^2}$, i.e.
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$

Population parameters vs sample statistics.
Sample statistics: $\bar{x}$, $s^2$, s.
Population parameters: $\mu$, $\sigma^2$, $\sigma$.

Practical Significance of the standard deviation

Chebyshev’s Inequality. (Regardless of the shape of frequency distribution)
Given a number $k \ge 1$, and a set of measurements $x_1, x_2, \ldots, x_n$, at least $(1 - \frac{1}{k^2})$ of the measurements lie within k standard deviations of their sample mean.
Restated. At least $(1 - \frac{1}{k^2})$ observations lie in the interval $(\bar{x} - ks, \bar{x} + ks)$.

Example. A set of grades has $\bar{x} = 75$, s = 6. Then
(i) (k = 1): at least 0% of all grades lie in [69, 81]
(ii) (k = 2): at least 75% of all grades lie in [63, 87]
(iii) (k = 3): at least 88% of all grades lie in [57, 93]
(iv) (k = 4): at least ?% of all grades lie in [?, ?]
(v) (k = 5): at least ?% of all grades lie in [?, ?]
Suppose that you are told that the frequency distribution is bell shaped. Can you improve the estimates in Chebyshev’s Inequality?

Empirical rule. Given a set of measurements $x_1, x_2, \ldots, x_n$, that is bell shaped. Then
(i) approximately 68% of the measurements lie within one standard deviation of their sample mean, i.e. $(\bar{x} - s, \bar{x} + s)$
(ii) approximately 95% of the measurements lie within two standard deviations of their sample mean, i.e. $(\bar{x} - 2s, \bar{x} + 2s)$
(iii) at least (almost all) 99% of the measurements lie within three standard deviations of their sample mean, i.e. $(\bar{x} - 3s, \bar{x} + 3s)$

Example. A data set has $\bar{x} = 75$, s = 6. The frequency distribution is known to be normal (bell shaped). Then
(i) (69, 81) contains approximately 68% of the observations
(ii) (63, 87) contains approximately 95% of the observations
(iii) (57, 93) contains at least 99% (almost all) of the observations

Comments.
(i) Empirical rule works better if sample size is large
(ii) In your calculations always keep 6 significant digits

(iii) Approximation: $s \approx \dfrac{\text{range}}{4}$
(iv) Coefficient of variation (c.v.) = $\dfrac{s}{\bar{x}}$

4 Percentiles
Using percentiles is useful if data is badly skewed.
Let $x_1, x_2, \ldots, x_n$ be a set of measurements arranged in increasing order.
Definition. Let 0 < p < 100. The pth percentile is a number x such that p% of all measurements fall below the pth percentile and (100 - p)% fall above it.

Example. Data: 2, 5, 8, 10, 11, 14, 17, 20.
(i) Find the 30th percentile.
Solution.
(S1) position = .3(n + 1) = .3(9) = 2.7
(S2) 30th percentile = 5 + .7(8 - 5) = 5 + 2.1 = 7.1

Special Cases.
1. Lower Quartile (25th percentile)
Example.
(S1) position = .25(n + 1) = .25(9) = 2.25
(S2) Q1 = 5 + .25(8 - 5) = 5 + .75 = 5.75
2. Median (50th percentile)
Example.
(S1) position = .5(n + 1) = .5(9) = 4.5
(S2) median: Q2 = 10 + .5(11 - 10) = 10.5
3. Upper Quartile (75th percentile)
Example.
(S1) position = .75(n + 1) = .75(9) = 6.75
(S2) Q3 = 14 + .75(17 - 14) = 16.25

Interquartiles.
IQ = Q3 - Q1
Exercise. Find the interquartile (IQ) in the above example.
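The two-step percentile procedure (S1: position = p(n + 1)/100, S2: interpolate between neighbouring ordered values) can be sketched as a small function. The name `percentile` and the handling of positions that fall outside 1..n are choices made for this sketch, not part of the notes. Python's `statistics.quantiles` with its default "exclusive" method uses the same (n + 1) positions, so it serves as a cross-check for the quartiles.

```python
import math
import statistics

def percentile(data, p):
    """pth percentile using position p*(n+1)/100 and linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    position = p * (n + 1) / 100          # step (S1)
    lower = math.floor(position)          # index (1-based) of the value just below
    fraction = position - lower
    if lower < 1:                         # position before the first value
        return xs[0]
    if lower >= n:                        # position at or beyond the last value
        return xs[-1]
    # step (S2): move the fractional part of the way to the next value
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])

data = [2, 5, 8, 10, 11, 14, 17, 20]
p30 = percentile(data, 30)
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
library_quartiles = statistics.quantiles(data, n=4)   # default method="exclusive"

print(round(p30, 2), q1, q2, q3, library_quartiles)
```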

5 Sample Mean and Variance For Grouped Data

Example: (weight loss data)

Weight Loss Data
class   boundaries   mid-pt. x   freq. f   xf    $x^2 f$
1       5.0-9.0      7           3         21    147
2       9.0-13.0     11          5         55    605
3       13.0-17.0    15          7         105   1,575
4       17.0-21.0    19          6         114   2,166
5       21.0-25.0    23          3         69    1,587
6       25.0-29.0    27          1         27    729
Totals                           25        391   6,809

Let k = number of classes.
Formulas.
$\bar{x}_g = \dfrac{\sum xf}{n}$
$s_g^2 = \dfrac{\sum x^2 f - (\sum xf)^2/n}{n-1}$
where the summation is over the number of classes k.

Exercise: Use the grouped data formulas to calculate the sample mean, sample variance and sample standard deviation of the grouped data in the weight loss example. Compare with the raw data results.

6 z-score
1. The sample z-score for a measurement x is
$z = \dfrac{x - \bar{x}}{s}$
2. The population z-score for a measurement x is

$z = \dfrac{x - \mu}{\sigma}$

Example. A set of grades has $\bar{x} = 75$, s = 6. Suppose your score is 85. What is your relative standing (i.e. how many standard deviations, s, above (below) the mean is your score)?
Answer.
$z = \dfrac{x - \bar{x}}{s} = \dfrac{85 - 75}{6} = 1.67$
standard deviations above average.

Review Exercises: Data Analysis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. (Fluoride Problem) The regulation board of health in a particular state specifies that the fluoride level must not exceed 1.5 ppm (parts per million). The 25 measurements below represent the fluoride level for a sample of 25 days. Although fluoride levels are measured more than once per day, these data represent the early morning readings for the 25 days sampled.

.85   .84   .85   .94   .88
1.05  .94   .71   .81   .89
.78   .75   .97   .77   .86
.84   .89   .92   .76   .83
.97   .93   .83   .82   .79

(i) Show that $\bar{x} = .8588$, $s^2 = .0065$, s = .0803.
(ii) Find the range, R.
(iii) Using k = 7 classes, find the width, w, of each class interval.
(iv) Locate class boundaries
(v) Construct the frequency and relative frequency distributions for the data.

class       frequency   relative frequency
.70-.75
.75-.80
.80-.85
.85-.90
.90-.95
.95-1.00
1.00-1.05
Totals

(vi) Graph the frequency and relative frequency distributions and state your conclusions. (Vertical axis must be clearly labeled)

2. Given the following data set (weight loss per week)
(9, 2, 5, 8, 4, 5)
(i) Find the sample mean.
(ii) Find the sample median.
(iii) Find the sample mode.
(iv) Find the sample range.
(v) Find the mean absolute difference.
(vi) Find the sample variance using the defining formula.
(vii) Find the sample variance using the short-cut formula.
(viii) Find the sample standard deviation.
(ix) Find the first and third quartiles, Q1 and Q3.
(x) Repeat (i)-(ix) for the data set (21, 24, 15, 16, 24).
Answers: $\bar{x}$ = 5.5, med = 5, mode = 5, range = 7, MAD = 2, $s^2$ = 6.7, s = 2.588, Q3 = 8.25.

3. Grades for 50 students from a previous MAT test are summarized below.

class    frequency, f   xf   $x^2 f$
40-50    4
50-60    6
60-70    10
70-80    15
80-90    10
90-100   5
Totals

(i) Complete all entries in the table.
(ii) Graph the frequency distribution. (Vertical axis must be clearly labeled)
(iii) Find the sample mean for the grouped data
(iv) Find the sample variance and standard deviation for the grouped data.
Answers: $\sum xf = 3610$, $\sum x^2 f = 270{,}250$, $\bar{x} = 72.2$, $s^2 = 196$, s = 14.

4. Refer to the raw data in the fluoride problem.
(i) Find the sample mean and standard deviation for the raw data.
(ii) Find the sample mean and standard deviation for the grouped data.
(iii) Compare the answers in (i) and (ii).
Answers: $\sum xf = 21.475$, $\sum x^2 f = 18.58$, $\bar{x}_g = .859$, $s_g = .0745$.

5. Suppose that the mean of a population is 30. Assume the standard deviation is known to be 4 and that the frequency distribution is known to be bell-shaped.
(i) Approximately what percentage of measurements fall in the interval (22, 34)
(ii) Approximately what percentage of measurements fall in the interval $(\mu, \mu + 2\sigma)$
(iii) Find the interval around the mean that contains 68% of measurements
(iv) Find the interval around the mean that contains 95% of measurements

6. Refer to the data in the fluoride problem. Suppose that the relative frequency distribution is bell-shaped. Using the empirical rule
(i) find the interval around the mean that contains 99.6% of measurements.
(ii) find the percentage of measurements that fall in the interval $(\mu + 2\sigma, \infty)$

7. (4 pts.) Answer by True or False. (Circle your choice).
T F (i) The median is insensitive to extreme values.
T F (ii) The mean is insensitive to extreme values.
T F (iii) For a positively skewed frequency distribution, the mean is larger than the median.
T F (iv) The variance is equal to the square of the standard deviation.
T F (v) Numerical descriptive measures computed from sample measurements are called parameters.
T F (vi) The number of students attending a Mathematics lecture on any given day is a discrete variable.

T F (vii) The median is a better measure of central tendency than the mean when a distribution is badly skewed.
T F (viii) Although we may have a large mass of data, statistical techniques allow us to adequately describe and summarize the data with an average.
T F (ix) A sample is a subset of the population.
T F (x) A statistic is a number that describes a population characteristic.
T F (xi) A parameter is a number that describes a sample characteristic.
T F (xii) A population is a subset of the sample.
T F (xiii) A population is the complete collection of items under study.
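As a closing numerical check for this chapter, the grouped-data formulas of Section 5 and the z-score of Section 6 can be sketched as below. The class midpoints for the grades table (45, 55, ..., 95) are taken as the centres of the stated classes, which is an assumption of this sketch; the function names are illustrative only.

```python
import math

def grouped_summary(midpoints, freqs):
    """Return n, sum(xf), sum(x^2 f), grouped mean and grouped sample variance."""
    n = sum(freqs)
    sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))
    mean_g = sum_xf / n
    var_g = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    return n, sum_xf, sum_x2f, mean_g, var_g

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Grades table from review exercise 3 (classes 40-50, ..., 90-100).
grades = grouped_summary([45, 55, 65, 75, 85, 95], [4, 6, 10, 15, 10, 5])

# Weight loss table from Section 5: only the column totals are checked here.
weight = grouped_summary([7, 11, 15, 19, 23, 27], [3, 5, 7, 6, 3, 1])

z = z_score(85, 75, 6)

print(grades[1], grades[2], grades[3], round(grades[4], 2), round(math.sqrt(grades[4]), 2))
print(weight[1], weight[2], round(z, 2))
```

For the grades table this reproduces the stated answers (3610, 270,250, 72.2, and a variance close to 196); for the weight loss table it reproduces the column totals 391 and 6,809.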

Chapter 2

Probability

Contents.
Sample Space and Events
Probability of an Event
Equally Likely Outcomes
Conditional Probability and Independence
Laws of Probability
Counting Sample Points
Random Sampling

1 Sample Space and Events

Definitions
Random experiment: involves obtaining observations of some kind
Examples: Toss of a coin, throw a die, polling, inspecting an assembly line, counting arrivals at emergency room, etc.
Population: Set of all possible observations. Conceptually, a population could be generated by repeating an experiment indefinitely.
Outcome of an experiment:
Elementary event (simple event): one possible outcome of an experiment
Event (Compound event): One or more possible outcomes of a random experiment
Sample space: the set of all sample points (simple events) for an experiment is called a sample space; or set of all possible outcomes for an experiment

Notation.
Sample space: S

Sample point: $E_1, E_2, \ldots$ etc.
Event: A, B, C, D, E etc. (any capital letter).
Venn diagram:

Example.
$S = \{E_1, E_2, \ldots, E_6\}$.
That is S = {1, 2, 3, 4, 5, 6}. We may think of S as representation of possible outcomes of a throw of a die.

More definitions
Union, Intersection and Complementation
Given A and B two events in a sample space S.
1. The union of A and B, $A \cup B$, is the event containing all sample points in either A or B or both. Sometimes we use A or B for union.
2. The intersection of A and B, $A \cap B$, is the event containing all sample points that are both in A and B. Sometimes we use AB or A and B for intersection.
3. The complement of A, $A^c$, is the event containing all sample points that are not in A. Sometimes we use not A or $\bar{A}$ for complement.

Mutually Exclusive Events (Disjoint Events)
Two events are said to be mutually exclusive (or disjoint) if their intersection is empty. (i.e. $A \cap B = \phi$).

Example. Suppose $S = \{E_1, E_2, \ldots, E_6\}$. Let
$A = \{E_1, E_3, E_5\}$;
$B = \{E_1, E_2, E_3\}$. Then
(i) $A \cup B = \{E_1, E_2, E_3, E_5\}$.
(ii) $AB = \{E_1, E_3\}$.
(iii) $A^c = \{E_2, E_4, E_6\}$; $B^c = \{E_4, E_5, E_6\}$;
(iv) A and B are not mutually exclusive (why?)
(v) Give two events in S that are mutually exclusive.

2 Probability of an event

Relative Frequency Definition
If an experiment is repeated a large number, n, of times and the event A is observed $n_A$ times, the probability of A is
$P(A) \approx \dfrac{n_A}{n}$
Interpretation
n = # of trials of an experiment

$n_A$ = frequency of the event A
$\frac{n_A}{n}$ = relative frequency of A
$P(A) \approx \frac{n_A}{n}$ if n is large enough.
(In fact, $P(A) = \lim_{n \to \infty} \frac{n_A}{n}$.)

Conceptual Definition of Probability
Consider a random experiment whose sample space is S with sample points $E_1, E_2, \ldots$. For each event $E_i$ of the sample space S define a number $P(E_i)$ that satisfies the following three conditions:
(i) $0 \le P(E_i) \le 1$ for all i
(ii) P(S) = 1
(iii) (Additive property) $\sum_S P(E_i) = 1$, where the summation is over all sample points in S.
We refer to $P(E_i)$ as the probability of the $E_i$.

Definition. The probability of any event A is equal to the sum of the probabilities of the sample points in A.

Example. Let $S = \{E_1, \ldots, E_{10}\}$. It is known that $P(E_i) = 1/20$, i = 1, ..., 6 and $P(E_i) = 1/5$, i = 7, 8, 9 and $P(E_{10}) = 2/20$. In tabular form, we have

$E_i$      E1     E2     E3     E4     E5     E6     E7    E8    E9    E10
$p(E_i)$   1/20   1/20   1/20   1/20   1/20   1/20   1/5   1/5   1/5   1/10

Question: Calculate P(A) where $A = \{E_i, i \ge 6\}$.
A: $P(A) = P(E_6) + P(E_7) + P(E_8) + P(E_9) + P(E_{10}) = 1/20 + 1/5 + 1/5 + 1/5 + 1/10 = 0.75$

Steps in calculating probabilities of events
1. Define the experiment
2. List all simple events
3. Assign probabilities to simple events
4. Determine the simple events that constitute an event
5. Add up the simple events’ probabilities to obtain the probability of the event
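The five steps above can be traced in code for the ten-point sample space of the example. This is a sketch only: exact fractions are used to avoid rounding, and the helper names `is_valid_assignment` and `event_probability` are made up for illustration.

```python
from fractions import Fraction

# Steps 2 and 3: list the simple events and assign their probabilities.
probs = {f"E{i}": Fraction(1, 20) for i in range(1, 7)}
probs.update({f"E{i}": Fraction(1, 5) for i in range(7, 10)})
probs["E10"] = Fraction(2, 20)

def is_valid_assignment(probs):
    """Check 0 <= P(Ei) <= 1 for every sample point and that the total is 1."""
    in_range = all(0 <= p <= 1 for p in probs.values())
    return in_range and sum(probs.values()) == 1

def event_probability(event, probs):
    """Step 5: add up the probabilities of the sample points in the event."""
    return sum(probs[point] for point in event)

# Step 4: the simple events that make up A = {Ei, i >= 6}.
A = [f"E{i}" for i in range(6, 11)]
p_A = event_probability(A, probs)

print(is_valid_assignment(probs), p_A, float(p_A))
```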

Example. Calculate the probability of observing one H in a toss of two fair coins.
Solution.
S = {HH, HT, TH, TT}
A = {HT, TH}
P(A) = 0.5

Interpretations of Probability
(i) In real world applications one observes (measures) relative frequencies, one cannot measure probabilities. However, one can estimate probabilities.
(ii) At the conceptual level we assign probabilities to events. The assignment, however, should make sense. (e.g. P(H)=.5, P(T)=.5 in a toss of a fair coin).
(iii) In some cases probabilities can be a measure of belief (subjective probability). This measure of belief should however satisfy the axioms.
(iv) Typically, we would like to assign probabilities to simple events directly; then use the laws of probability to calculate the probabilities of compound events.

Equally Likely Outcomes
The equally likely probability P defined on a finite sample space $S = \{E_1, \ldots, E_N\}$ assigns the same probability $P(E_i) = 1/N$ for all $E_i$. In this case, for any event A
$P(A) = \dfrac{N_A}{N} = \dfrac{\text{sample points in } A}{\text{sample points in } S} = \dfrac{\#(A)}{\#(S)}$
where N is the number of the sample points in S and $N_A$ is the number of the sample points in A.

Example. Toss a fair coin 3 times.
(i) List all the sample points in the sample space
Solution: S = {HHH, ···, TTT} (Complete this)
(ii) Find the probability of observing exactly two heads, at most one head.

3 Laws of Probability

Conditional Probability
The conditional probability of the event A given that event B has occurred is denoted by P(A|B). Then
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$

provided P(B) > 0. Similarly,
$P(B|A) = \dfrac{P(A \cap B)}{P(A)}$

Independent Events
Definitions.
(i) Two events A and B are said to be independent if $P(A \cap B) = P(A)P(B)$.
(ii) Two events A and B that are not independent are said to be dependent.
Remarks.
(i) If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B).
(ii) If A is independent of B then B is independent of A.

Probability Laws
Complementation law: $P(A) = 1 - P(A^c)$
Additive law: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Moreover, if A and B are mutually exclusive, then P(AB) = 0 and $P(A \cup B) = P(A) + P(B)$
Multiplicative law (Product rule): $P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$
Moreover, if A and B are independent, P(AB) = P(A)P(B)

Example. Let $S = \{E_1, E_2, \ldots, E_6\}$; $A = \{E_1, E_3, E_5\}$; $B = \{E_1, E_2, E_3\}$; $C = \{E_2, E_4, E_6\}$; $D = \{E_6\}$. Suppose that all elementary events are equally likely.
(i) What does it mean that all elementary events are equally likely?
(ii) Use the complementation rule to find $P(A^c)$.
(iii) Find P(A|B) and P(B|A)
(iv) Find P(D) and P(D|C)

(v) Are A and B independent? Are C and D independent?
(vi) Find $P(A \cap B)$ and $P(A \cup B)$.

Law of total probability
Let B, $B^c$ be complementary events and let A denote an arbitrary event. Then
$P(A) = P(A \cap B) + P(A \cap B^c)$,
or
$P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)$.

Bayes’ Law
Let B, $B^c$ be complementary events and let A denote an arbitrary event. Then
$P(B|A) = \dfrac{P(AB)}{P(A)} = \dfrac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)}$.
Remarks.
(i) The events of interest here are B, $B^c$; P(B) and $P(B^c)$ are called prior probabilities, and
(ii) P(B|A) and $P(B^c|A)$ are called posterior (revised) probabilities.
(iii) Bayes’ Law is important in several fields of applications.

Example 1. A laboratory blood test is 95 percent effective in detecting a certain disease when it is, in fact, present. However, the test also yields a “false positive” result for 1 percent of healthy persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result will imply he or she has the disease.) If 0.5 percent of the population actually has the disease, what is the probability a person has the disease given that the test result is positive?
Solution. Let D be the event that the tested person has the disease and E the event that the test result is positive. The desired probability P(D|E) is obtained by
$P(D|E) = \dfrac{P(D \cap E)}{P(E)} = \dfrac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \dfrac{(.95)(.005)}{(.95)(.005) + (.01)(.995)} = \dfrac{95}{294} \approx .323.$
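The blood test calculation can be reproduced with a short sketch of the law of total probability and Bayes' law for a pair of complementary events. The function and parameter names are illustrative; the three inputs are the figures given in the example.

```python
def total_probability(p_a_given_b, p_a_given_not_b, p_b):
    """P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) for complementary events B, B^c."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

def bayes_posterior(p_a_given_b, p_a_given_not_b, p_b):
    """P(B|A) = P(A|B)P(B) / P(A), with P(A) from the law of total probability."""
    return p_a_given_b * p_b / total_probability(p_a_given_b, p_a_given_not_b, p_b)

# Blood test example: B = has the disease (D), A = positive result (E).
p_positive = total_probability(0.95, 0.01, 0.005)     # P(E)
p_disease_given_positive = bayes_posterior(0.95, 0.01, 0.005)

print(round(p_positive, 4), round(p_disease_given_positive, 3))
```

The small prior P(D) = .005 is what keeps the posterior low even though the test is 95 percent effective; raising the prior in the call above shows how quickly the posterior grows.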

Thus only 32 percent of those persons whose test results are positive actually have the disease.

Probabilities in Tabulated Form

4 Counting Sample Points
Is it always necessary to list all sample points in S?

Coin Tosses
Coins   sample-points     Coins   sample-points
1       2                 2       4
3       8                 4       16
5       32                6       64
10      1024              20      1,048,576
30      ≈ 10^9            40      ≈ 10^12
50      ≈ 10^15           64      ≈ 10^19

Note that $2^{30} \approx 10^9$ = one billion, $2^{40} \approx 10^{12}$ = one thousand billion (one trillion), $2^{50} \approx 10^{15}$ = one thousand trillion.

RECALL: $P(A) = \frac{n_A}{n}$, where n and $n_A$ are the number of points in S and A respectively, so for some applications we need to find n, $n_A$.

Basic principle of counting: mn rule
Suppose that two experiments are to be performed. Then if experiment 1 can result in any one of m possible outcomes and if, for each outcome of experiment 1, there are n possible outcomes of experiment 2, then together there are mn possible outcomes of the two experiments.
Examples.
(i) Toss two coins: mn = 2 × 2 = 4
(ii) Throw two dice: mn = 6 × 6 = 36
(iii) A small community consists of 10 men, each of whom has 3 sons. If one man and one of his sons are to be chosen as father and son of the year, how many different choices are possible?
Solution: Let the choice of the man be the outcome of the first experiment and the subsequent choice of one of his sons be the outcome of the second experiment. We see, from the basic principle, that there are 10 × 3 = 30 possible choices.

Generalized basic principle of counting

If r experiments that are to be performed are such that the first one may result in any of $n_1$ possible outcomes, and if for each of these $n_1$ possible outcomes there are $n_2$ possible outcomes of the second experiment, and if for each of the possible outcomes of the first two experiments there are $n_3$ possible outcomes of the third experiment, and if, ..., then there are a total of $n_1 \cdot n_2 \cdots n_r$ possible outcomes of the r experiments.

Examples
(i) There are 5 routes available between A and B; 4 between B and C; and 7 between C and D. What is the total number of available routes between A and D?
Solution: The total number of available routes is mnt = 5 · 4 · 7 = 140.
(ii) A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, and 2 seniors. A subcommittee of 4, consisting of 1 individual from each class, is to be chosen. How many different subcommittees are possible?
Solution: It follows from the generalized principle of counting that there are 3 · 4 · 5 · 2 = 120 possible subcommittees.
(iii) How many different 7-place license plates are possible if the first 3 places are to be occupied by letters and the final 4 by numbers?
Solution: It follows from the generalized principle of counting that there are 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000 possible license plates.
(iv) In (iii), how many license plates would be possible if repetition among letters or numbers were prohibited?
Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible license plates.

Permutations: (Ordered arrangements)
The number of ways of ordering n distinct objects taken r at a time (order is important) is given by
$\dfrac{n!}{(n-r)!} = n(n-1)(n-2) \cdots (n-r+1)$
Examples
(i) In how many ways can you arrange the letters a, b and c. List all arrangements.
Answer: There are 3! = 6 arrangements or permutations.
(ii) A box contains 10 balls. Balls are selected without replacement one at a time. In how many different ways can you select 3 balls?
Solution: Note that n = 10, r = 3. Number of different ways is $10 \cdot 9 \cdot 8 = \frac{10!}{7!} = 720$

(which is equal to $\frac{n!}{(n-r)!}$).

Combinations
For $r \le n$, we define
$\dbinom{n}{r} = \dfrac{n!}{(n-r)!\,r!}$
and say that $\binom{n}{r}$ represents the number of possible combinations of n objects taken r at a time (with no regard to order).
Examples
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
Solution: There are $\binom{20}{3} = \frac{20!}{3!\,17!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$ possible committees.
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can be formed?
Solution: $\binom{5}{2}\binom{7}{3} = 350$ possible committees.

5 Random Sampling
Definition. A sample of size n is said to be a random sample if the n elements are selected in such a way that every possible combination of n elements has an equal probability of being selected. In this case the sampling process is called simple random sampling.
Remarks.
(i) If n is large, we say the random sample provides an honest representation of the population.
(ii) For finite populations the number of possible samples of size n is $\binom{N}{n}$. For instance the number of possible samples when N = 28 and n = 4 is $\binom{28}{4} = 20{,}475$.
(iii) Tables of random numbers may be used to select random samples.

6 Modeling Uncertainty
The purpose of modeling uncertainty (randomness) is to discover the laws of change.
1. Concept of Probability. Even though probability (chance) involves the notion of change, the laws governing the change may themselves remain fixed as time passes.
Example. Consider a chance experiment: Toss of a coin.

Probabilistic Law. In a fair coin tossing experiment the percentage of (H)eads is very close to 0.5. In the model (abstraction): P(H) = 0.5 exactly.

Why Probabilistic Reasoning?
Example. Toss 5 coins repeatedly and write down the number of heads observed in each trial. Now, what percentage of trials produce 2 Heads?
Answer. Use the Binomial law to show that
$P(2\ \text{Heads}) = \dbinom{5}{2}(0.5)^2(1 - .5)^3 = \dfrac{5!}{2!\,3!}(0.5)^2(.5)^3 = 0.3125$
Conclusion. There is no need to carry out this experiment to answer the question. (Thus saving time and effort).

2. The Interplay Between Probability and Statistics. (Theory versus Application)
(i) Theory is an exact discipline developed from logically defined axioms (conditions).
(ii) Theory is related to physical phenomena only in inexact terms (i.e. approximately).
(iii) When theory is applied to real problems, it works (i.e. it makes sense).
Example. A fair die is tossed for a very large number of times. It was observed that face 6 appeared 1,500 times. Estimate how many times the die is tossed.
Answer. 9000 times.

Review Exercises: Probability

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. An experiment consists of tossing 3 fair coins.
(i) List all the elements in the sample space.
(ii) Describe the following events:
A = {observe exactly two heads}
B = {observe at most one tail}
C = {observe at least two heads}
D = {observe exactly one tail}
(iii) Find the probabilities of events A, B, C, D.

P (6) = .00 C D (i) Using the definitions. 5. The following probability table gives the intersection probabilities for four events A.55 B 0. 3. (ix) Refer to problem 2. 32 . P (B). Find (i) A ∪ B (ii) A ∩ B (iii) B ∩ C (iv) Ac (v) C c (vi) A ∪ C (vii) A ∩ C (viii) Find the probabilities in (i)-(vii).2. C and D: A . (i) List all the elements in the sample space. (iv) Compare problems 2. (i) Find the probability of the event A = {4. 6}. P (C|A). P (2) = .08 1.2. (ii) Describe the following events: A = { observe a number larger than 3 } B = { Observe an even number} C = { Observe an odd number} (iii) Find the probabilities of events A. 2. (iv) Find the probability of the event C = {odd}.P(3)=.31 .2. 5.1. Suppose that S = {1. P(4)=.1. find P (A). (ii) Find the probability of the complement of A. B. (iii) Find the probability of the event B = {even}. 5. An experiment consists of throwing a fair die. and 3.1. 6} such that P (1) = . 4. P (5) = . P (C). 4. Refer to problem 3. 3. B. C.06 . P (D|A) and P (C|B).. P (D). and answer questions (i)-(viii).3.

are A and B mutually exclusive? independent? 7. (viii) Are C and D mutually exclusive events? Justify your answer. and green (g).6.3. either today or tomorrow 60%. (iii) Find the probability of selecting one black and one red ball. Suppose that the following two weather forecasts were reported on two local TV stations for the same period.7. · · ·} (ii) Find the probability of selecting a black ball. 6.) 8. and P (B) = .4. both today and tomorrow 20%. orange (o). either today or tomorrow 60%. (vi) Are B and C mutually exclusive events? Justify your answer. P (A) = . white (w). are A and B mutually exclusive? independent? (iii) If P (A ∪ B) = . red (r). P (A) = . what is the probability that it is white? black? (ii) If two balls are selected without replacement. Three balls are to be selected at random.65.5. tomorrow 40%. 9. (iii) Find P (A ∩ B). S = {bwr. (iv) Find P (A ∪ B). and P (B) = . Which of the two reports.4. (i) Find the sample space S (Hint: there is 10 sample points). is more believable? Why? No credit if answer is not justified. P (A) = . A box contains four black and six white balls. (Hint: Let A and B be the events of rain today and rain tomorrow. and P (B) = . tomorrow 40%. a black (b). are A and B mutually exclusive? independent? (ii) If P (A ∪ B) = . First report: The chances of rain are today 30%. both today and tomorrow 10%. (vii) Are C and D independent events? Justify your answer. Second report: The chances of rain are today 30%. if any. what is the probability that both balls are black? both are white? the first is white and the second is black? the first is black and the second is white? one ball is black? (iii) Repeat (ii) if the balls are selected with replacement.2.(ii) Find P (B c ). (v) Are B and C independent events? Justify your answer. 33 . (i) If a ball is selected at random.5. A box contains five balls. Use the laws of probability to justify your answers to the following questions: (i) If P (A ∪ B) = .

(Hint: Start by defining the events B1 and B2 as the first ball is black and the second ball is black respectively, and by defining the events W1 and W2 as the first ball is white and the second ball is white respectively. Then use the product rule)

10. Answer by True or False. (Circle your choice).
T F (i) An event is a specific collection of simple events.
T F (ii) The probability of an event can sometimes be negative.
T F (iii) If A and B are mutually exclusive events, then they are also dependent.
T F (iv) The sum of the probabilities of all simple events in the sample space may be less than 1 depending on circumstances.
T F (v) A random sample of n observations from a population is not likely to provide a good estimate of a parameter.
T F (vi) A random sample of n observations from a population is one in which every different subset of size n from the population has an equal probability of being selected.
T F (vii) The probability of an event can sometimes be larger than one.
T F (viii) The probability of an elementary event can never be larger than one half.
T F (ix) Although the probability of an event occurring is .9, the event may not occur at all in 10 trials.
T F (x) If a random experiment has 5 possible outcomes, then the probability of each outcome is 1/5.
T F (xi) If two events are independent, the occurrence of one event should not affect the likelihood of the occurrence of the other event.

Chapter 3
Random Variables and Discrete Distributions

Contents.
Random Variables
Expected Values and Variance
Binomial
Poisson
Hypergeometric

1 Random Variables
The discrete rv arises in situations when the population (or possible outcomes) are discrete (or qualitative).
Example. Toss a coin 3 times, then
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let the variable of interest, X, be the number of heads observed. Then the relevant events are
{X = 0} = {TTT}
{X = 1} = {HTT, THT, TTH}
{X = 2} = {HHT, HTH, THH}
{X = 3} = {HHH}.
The relevant question is to find the probability of each of these events.
Note that X takes integer values even though the sample space consists of H's and T's.
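The coin-toss example can be checked by brute-force enumeration. The sketch below assumes a fair coin, so that all eight outcomes are equally likely (the notes do not state this explicitly); the names `sample_space` and `pmf` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a coin: HHH, HHT, ..., TTT.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

# X assigns to each outcome the number of heads it contains.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Assumption: fair coin, so P(X = x) = (#outcomes with x heads) / 8.
pmf = {x: Fraction(k, len(sample_space)) for x, k in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Under the fair-coin assumption the probabilities are 1/8, 3/8, 3/8 and 1/8 for X = 0, 1, 2, 3.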

The variable X transforms the problem of calculating probabilities from that of set theory to calculus.
Definition. A random variable (r.v.) is a rule that assigns a numerical value to each possible outcome of a random experiment.
Interpretation:
- random: the value of the r.v. is unknown until the outcome is observed
- variable: it takes a numerical value
Notation: We use X, Y, etc. to represent r.v.s.
A Discrete r.v. assigns a finite or countably infinite number of possible values (e.g. toss a coin, throw a die, etc.)
A Continuous r.v. has a continuum of possible values (e.g. height, weight, price, etc.)

Discrete Distributions
The probability distribution of a discrete r.v., X, assigns a probability p(x) for each possible x such that
(i) $0 \le p(x) \le 1$, and
(ii) $\sum_x p(x) = 1$
where the summation is over all possible values of x.

Discrete distributions in tabulated form
Example. Which of the following defines a probability distribution?

x       0      1      2
p(x)    0.30   0.50   0.20

x       0      1      2
p(x)    0.60   0.50   -0.10

x       -1     1      2
p(x)    0.30   0.40   0.20

Remarks. (i) Discrete distributions arise when the r.v. X is discrete (qualitative data)

(ii) Continuous distributions arise when the r.v. X is continuous (quantitative data)
Remarks. (i) In data analysis we described a set of data (sample) by dividing it into classes and calculating relative frequencies. (ii) In Probability we described a random experiment (population) in terms of events and probabilities of events. (iii) Here, we describe a random experiment (population) by using random variables, and probability distribution functions.

2 Expected Value and Variance
Definition 2.1 The expected value of a discrete rv X is denoted by $\mu$ and is defined to be
$$\mu = \sum_x x\,p(x).$$
Notation: The expected value of X is also denoted by $\mu = E[X]$, or sometimes $\mu_X$ to emphasize its dependence on X.
Definition 2.2 If X is a rv with mean $\mu$, then the variance of X is defined by
$$\sigma^2 = \sum_x (x - \mu)^2 p(x).$$
Notation: Sometimes we use $\sigma^2 = V(X)$ (or $\sigma_X^2$).
Shortcut Formula
$$\sigma^2 = \sum_x x^2 p(x) - \mu^2$$
Definition 2.3 If X is a rv with mean $\mu$, then the standard deviation of X, denoted by $\sigma_X$ (or simply $\sigma$), is defined by
$$\sigma = \sqrt{V(X)} = \sqrt{\sum_x (x - \mu)^2 p(x)}$$
Shortcut Formula
$$\sigma = \sqrt{\sum_x x^2 p(x) - \mu^2}$$
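As a rough illustration of these definitions, the sketch below computes the mean, variance and standard deviation of a tabulated distribution, and confirms that the shortcut formula gives the same variance. The distribution used (number of heads in three tosses of a fair coin) and the function names are assumptions made for the example.

```python
from math import sqrt

def mean_var_sd(dist):
    """dist maps each value x to p(x); returns (mu, sigma^2, sigma)."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, sqrt(var)

def shortcut_variance(dist):
    """Variance via the shortcut formula: sum of x^2 p(x) minus mu^2."""
    mu = sum(x * p for x, p in dist.items())
    return sum(x * x * p for x, p in dist.items()) - mu ** 2

# Assumed example: number of heads in three tosses of a fair coin.
dist = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mu, var, sd = mean_var_sd(dist)
print(mu, var, sd)
print(shortcut_variance(dist))
```

Both routes give a variance of 0.75 around a mean of 1.5 for this distribution.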

3 Discrete Distributions
Binomial. The binomial experiment (distribution) arises in the following situation:
(i) the underlying experiment consists of n independent and identical trials;
(ii) each trial results in one of two possible outcomes, a success or a failure;
(iii) the probability of a success in a single trial is equal to p and remains the same throughout the experiment; and
(iv) the experimenter is interested in the rv X that counts the number of successes observed in n trials.
A r.v. X is said to have a binomial distribution with parameters n and p if
$$p(x) = \binom{n}{x} p^x q^{n-x} \quad (x = 0, 1, \ldots, n)$$
where $q = 1 - p$.
Mean: $\mu = np$
Variance: $\sigma^2 = npq$, $\sigma = \sqrt{npq}$

Example: Bernoulli. A rv X is said to have a Bernoulli distribution with parameter p if
Formula: $p(x) = p^x (1 - p)^{1-x}$, $x = 0, 1$.
Tabulated form:

x       0      1
p(x)    1-p    p

Mean: $\mu = p$
Variance: $\sigma^2 = pq$, $\sigma = \sqrt{pq}$

Binomial Tables. Cumulative probabilities are given in the table.
Example. Suppose X has a binomial distribution with n = 10, p = .4. Find
(i) $P(X \le 4) = .633$
(ii) $P(X < 6) = P(X \le 5) = .834$
(iii) $P(X > 4) = 1 - P(X \le 4) = 1 - .633 = .367$
(iv) $P(X = 5) = P(X \le 5) - P(X \le 4) = .834 - .633 = .201$
Exercise: Answer the same question with p = 0.7
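The table look-ups in the example with n = 10 and p = .4 can be reproduced directly from the binomial formula. The following is a minimal sketch using only the standard library; `binom_pmf` and `binom_cdf` are illustrative helper names, not part of the course material.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial rv with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Cumulative probability P(X <= x), as tabulated in binomial tables."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.4
print(round(binom_cdf(4, n, p), 3))       # (i)   P(X <= 4)
print(round(binom_cdf(5, n, p), 3))       # (ii)  P(X < 6) = P(X <= 5)
print(round(1 - binom_cdf(4, n, p), 3))   # (iii) P(X > 4)
print(round(binom_pmf(5, n, p), 3))       # (iv)  P(X = 5)
```

Changing `p` to 0.7 gives a way to check the exercise once it has been worked from the table.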

Poisson. The Poisson random variable arises when counting the number of events that occur in an interval of time when the events are occurring at a constant rate; examples include number of arrivals at an emergency room, number of items demanded from an inventory, number of items in a batch of a random size.
A rv X is said to have a Poisson distribution with parameter $\lambda > 0$ if
$$p(x) = e^{-\lambda} \lambda^x / x!, \quad x = 0, 1, \ldots$$
Graph.
Mean: $\mu = \lambda$
Variance: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$
Note: $e \approx 2.71828$

Example. Suppose the number of typographical errors on a single page of your book has a Poisson distribution with parameter $\lambda = 1/2$. Calculate the probability that there is at least one error on this page.
Solution. Letting X denote the number of errors on a single page, we have
$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.5} \approx 0.393$$

Rule of Thumb. The Poisson distribution provides good approximations to binomial probabilities when n is large and $\mu = np$ is small, preferably with $np \le 7$.
Example. Suppose that the probability that an item produced by a certain machine will be defective is 0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.
Solution. Using the binomial distribution, the desired probability is
$$P(X \le 1) = p(0) + p(1) = \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 = 0.7361$$
Using the Poisson approximation, we have $\lambda = np = 1$ and
$$e^{-1} + e^{-1} \approx 0.7358$$
which is close to the exact answer.

Hypergeometric. The hypergeometric distribution arises when one selects a random sample of size n, without replacement, from a finite population of size N divided into two classes consisting

of D elements of the first kind and N − D of the second kind. Such a scheme is called sampling without replacement from a finite dichotomous population.
Formula:
$$f(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}},$$
where $\max(0, n - N + D) \le x \le \min(n, D)$. We define $f(x) = 0$ elsewhere.
Mean: $E[X] = n\left(\frac{D}{N}\right)$
Variance: $V(X) = \left(\frac{N-n}{N-1}\right)(n)\left(\frac{D}{N}\right)\left(1 - \frac{D}{N}\right)$
The factor $\frac{N-n}{N-1}$ is called the finite population correction factor.
Example. (Sampling without replacement) Suppose an urn contains D = 10 red balls and N − D = 15 white balls. A random sample of size n = 8, without replacement, is drawn and the number of red balls is denoted by X. Then
$$f(x) = \frac{\binom{10}{x}\binom{15}{8-x}}{\binom{25}{8}}, \quad 0 \le x \le 8.$$

4 Markov Chains
Example 1. (Brand Switching Problem) Suppose that a manufacturer of a product (Brand 1) is competing with only one other similar product (Brand 2). Both manufacturers have been engaged in aggressive advertising programs which include offering rebates, etc. A survey is taken to find out the rates at which consumers are switching brands or staying loyal to brands. Responses to the survey are given below. If the manufacturers are competing for a population of y = 300,000 buyers, how should they plan for the future (immediate future, and in the long-run)?

Brand Switching Data
                     This week
Last week     Brand 1   Brand 2   Total
Brand 1          90        10      100
Brand 2          40       160      200
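One way to explore the survey data numerically is to convert the counts into switching proportions and then apply them week after week. The sketch below is only an illustration: it assumes that customer behavior stays the same over time, and the starting shares of 50% each are hypothetical, chosen just to have somewhere to start.

```python
# Survey counts: rows = brand bought last week, columns = brand bought this week.
counts = [[90, 10],
          [40, 160]]

# Divide each row by its total to estimate the switching proportions.
P = [[c / sum(row) for c in row] for row in counts]

def step(shares, P):
    """Market shares one week later: the row vector `shares` times P."""
    n = len(shares)
    return [sum(shares[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical starting shares (an assumption, not survey data).
shares = [0.5, 0.5]
for week in range(100):
    shares = step(shares, P)

print(P)
print([round(s, 4) for s in shares])
```

After many weeks the printed shares stop changing, which is the numerical counterpart of the long-run question posed in the example.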

            Brand 1    Brand 2
Brand 1     90/100     10/100
Brand 2     40/200     160/200

So
$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$

Question 1. Suppose that customer behavior is not changed over time. If 1/3 of all customers purchased B1 this week:
What percentage will purchase B1 next week?
What percentage will purchase B2 next week?
What percentage will purchase B1 two weeks from now?
What percentage will purchase B2 two weeks from now?
Solution: Note that $\pi^0 = (1/3, 2/3)$, then
$$\pi^1 = (\pi^1_1, \pi^1_2) = (\pi^0_1, \pi^0_2) P$$
$$\pi^1 = (\pi^1_1, \pi^1_2) = (1/3, 2/3) \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix} = (1.3/3, 1.7/3)$$
B1 buyers will be 300,000(1.3/3) = 130,000
B2 buyers will be 300,000(1.7/3) = 170,000.
Two weeks from now: exercise.

Question 2. Determine whether each brand will eventually retain a constant share of the market.
Solution: We need to solve $\pi = \pi P$ and $\sum_i \pi_i = 1$, that is
$$(\pi_1, \pi_2) = (\pi_1, \pi_2) \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$
and $\pi_1 + \pi_2 = 1$.
Matrix multiplication gives
$$\pi_1 = 0.9\pi_1 + 0.2\pi_2$$
$$\pi_2 = 0.1\pi_1 + 0.8\pi_2$$
$$\pi_1 + \pi_2 = 1$$

One equation is redundant. Choose the first and the third; we get
$$0.1\pi_1 = 0.2\pi_2 \quad \text{and} \quad \pi_1 + \pi_2 = 1$$
which gives
$$(\pi_1, \pi_2) = (2/3, 1/3)$$
Brand 1 will eventually capture two thirds of the market (200,000 customers).

Example 2. On any particular day Rebecca is either cheerful (c) or gloomy (g). If she is cheerful today then she will be cheerful tomorrow with probability 0.7. If she is gloomy today then she will be gloomy tomorrow with probability 0.4.
(i) What is the transition matrix P?
Solution:
$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{pmatrix}$$
(ii) What is the fraction of days Rebecca is cheerful? gloomy?
Solution: The fraction of days Rebecca is cheerful is the probability that on any given day Rebecca is cheerful. This can be obtained by solving $\pi = \pi P$, where $\pi = (\pi_0, \pi_1)$, and $\pi_0 + \pi_1 = 1$.
Exercise. Complete this problem.

Review Exercises: Discrete Distributions
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Identify the following as discrete or continuous random variables.
(i) The market value of a publicly listed security on a given day
(ii) The number of printing errors observed in an article in a weekly news magazine
(iii) The time to assemble a product (e.g. a chair)
(iv) The number of emergency cases arriving at a city hospital
(v) The number of sophomores in a randomly selected Math. class at a university
(vi) The rate of interest paid by your local bank on a given day

2. What restrictions do we place on the probabilities associated with a particular probability distribution?

6 . indicate which of the restrictions has been violated.5 .2 .5 -.2 (ii) x p(x) -2 1 4 . (i) x p(x) -1 0 1 .1 .3.2 (ii) x -1 1 p(x) . Indicate whether or not the following are valid probability distributions.6 3.1 3.1 43 .2 6 .2 .6 . If they are not.

(ii) Calculate the variance of X. σ 2 . P (X > 3).15 .2 . For each of the following probability distributions. σ 2 = 21.25 (i) Verify that X has a valid probability distribution. A discrete random variable X has the following probability distribution: x p(x) 10 15 20 25 .1 (i) Calculate the expected value of X. calculate the expected value of X.58. E(X) = µ. (ii) Find the probability that X is greater than 3.3 .3 3 4 .4 . (i) x p(x) 1 2 . i. and the standard deviation of X.e. σ = 4.10 3 4 5 .2 . P (X ≥ 3). 6. A random variable X has the following probability distribution: x p(x) 1 2 .1 44 . i. (iii) Find the probability that X is greater than or equal to 3. (vi) Graph the probability distribution for X. σ 2 .e. σ. i. Answers: µ = 17. (v) Find the probability that X is an odd number.4 . the variance of X.4. σ. E(X) = µ. (iv) Find the probability that X is less than or equal to 2.05 .e. P (X ≤ 2). (ii) Calculate the standard deviation of X.45 . 5.

T F (x) The probability p(x) for a discrete random variable X must be greater than or equal to zero but less than or equal to one. T F (ii) A random variable has a single numerical value for each outcome of a random experiment. T F (viii) The variance can never be equal to zero. Show your work graphically in all relevant questions. T F (iv) A random variable is one that takes on different values depending on the chance outcome of an experiment.2 . T F (iii) The only rule that applies to all probability distributions is that the possible random variable values are always between 0 and 1. Use the formula.2 7. T F (ix) The variance can never be negative. Review Exercises: Binomial Distribution Please show all work. Answer by True of False . T F (vii) The expected value of a random variable provides a complete description of the random variable’s probability distribution.3 . substitution. T F (i) The expected value is always positive. T F (xii) The most common method for sampling more than one observation from a population is called random sampling. No credit for a correct final answer without a valid argument. T F (xi) The sum of all probabilities p(x) for all possible values of X is always equal to one. In how many ways can a committee of ten be chosen from fifteen individuals? 8. T F (vi) The monthly volume of gasoline sold in one gas station is an example of a discrete random variable. T F (v) The number of television programs watched per day by a college student is an example of a discrete random variable.(ii) x p(x) -2 -1 2 . answer method whenever possible.3 4 . (Circle your choice). 45 .

1. List the properties for a binomial experiment.

2. Give the formula for the binomial probability distribution.

3. Calculate
(i) 5!
(ii) 10!
(iii) $\frac{7!}{3!\,4!}$

4. Consider a binomial distribution with n = 4 and p = .5.
(i) Use the formula to find P(0), P(1), ..., P(4).
(ii) Graph the probability distribution found in (i)
(iii) Repeat (i) and (ii) when n = 4, and p = .2.
(iv) Repeat (i) and (ii) when n = 4, and p = .8.

5. Consider a binomial distribution with n = 5 and p = .6.
(i) Find P(0) and P(2) using the formula.
(ii) Find $P(X \le 2)$ using the formula.
(iii) Find the expected value $E(X) = \mu$
(iv) Find the standard deviation $\sigma$

6. Consider a binomial distribution with n = 500 and p = .6.
(i) Find the expected value $E(X) = \mu$
(ii) Find the standard deviation $\sigma$

7. Consider a binomial distribution with n = 25 and p = .6.
(i) Find the expected value $E(X) = \mu$
(ii) Find the standard deviation $\sigma$
(iii) Find P(0) and P(2) using the table.
(iv) Find $P(X \le 2)$ using the table.
(v) Find $P(X < 12)$ using the table.
(vi) Find $P(X > 13)$ using the table.
(vii) Find $P(X \ge 8)$ using the table.

8. A sales organization makes one sale for every 200 prospects that it contacts. The organization plans to contact 100,000 prospects over the coming year.
(i) What is the expected value of X, the annual number of sales.
(ii) What is the standard deviation of X.

(iii) Within what limits would you expect X to fall with 95% probability. (Use the empirical rule).
Answers: $\mu = 500$, $\sigma = 22.3$

9. Identify the binomial experiment in the following group of statements.
(i) a shopping mall is interested in the income levels of its customers and is taking a survey to gather information
(ii) a business firm introducing a new product wants to know how many purchases its clients will make each year
(iii) a sociologist is researching an area in an effort to determine the proportion of households with male "head of households"
(iv) a study is concerned with the average hours worked by teenagers who are attending high school
(v) Determining whether or not a manufactured item is defective.
(vi) Determining the number of words typed before a typist makes an error.
(vii) Determining the weekly pay rate per employee in a given company.

10. Answer by True or False. (Circle your choice).
T F (i) In a binomial experiment each trial is independent of the other trials.
T F (ii) A binomial distribution is a discrete probability distribution
T F (iii) The standard deviation of a binomial probability distribution is given by npq.

Chapter 4
Continuous Distributions

Contents.
1. Standard Normal
2. Normal
3. Uniform
4. Exponential

1 Introduction
RECALL: The continuous rv arises in situations when the population (or possible outcomes) are continuous (or quantitative).
Example. Observe the lifetime of a light bulb, then
$$S = \{x,\ 0 \le x < \infty\}$$
Let the variable of interest, X, be the observed lifetime of the light bulb. Then the relevant events would be $\{X \le x\}$, $\{X \ge 1000\}$, or $\{1000 \le X \le 2000\}$.
The relevant question is to find the probability of each of these events.
Important. For any continuous pdf the area under the curve is equal to 1.

2 The Normal Distribution
Standard Normal.
A normally distributed (bell shaped) random variable with $\mu = 0$ and $\sigma = 1$ is said to have the standard normal distribution. It is denoted by the letter Z.

pdf of Z:
$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$
Graph.
Tabulated Values.
Values of $P(0 \le Z \le z)$ are tabulated in the appendix.
Critical Values: $z_\alpha$ of the standard normal distribution are given by
$$P(Z \ge z_\alpha) = \alpha$$
which is in the tail of the distribution.
Examples.
(i) $P(0 \le Z \le 1) = .3413$
(ii) $P(-1 \le Z \le 1) = .6826$
(iii) $P(-2 \le Z \le 2) = .9544$
(iv) $P(-3 \le Z \le 3) = .9974$
Examples. Find $z_0$ such that
(i) $P(Z > z_0) = .10$; $z_0 = 1.28$.
(ii) $P(Z > z_0) = .05$; $z_0 = 1.645$.
(iii) $P(Z > z_0) = .025$; $z_0 = 1.96$.
(iv) $P(Z > z_0) = .01$; $z_0 = 2.33$.
(v) $P(Z > z_0) = .005$; $z_0 = 2.58$.
(vi) $P(Z \le z_0) = .10, .05, .025, .01, .005$. (Exercise)

Normal
A rv X is said to have a Normal pdf with parameters $\mu$ and $\sigma$ if
Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty,$$
where $-\infty < \mu < \infty$; $0 < \sigma < \infty$.
Properties
Mean: $E[X] = \mu$
Variance: $V(X) = \sigma^2$
Graph: Bell shaped. Area under graph = 1.
Standardizing a normal r.v.:

Z-score:
$$Z = \frac{X - \mu_X}{\sigma_X}$$
OR (simply)
$$Z = \frac{X - \mu}{\sigma}$$
Conversely,
$$X = \mu + \sigma Z.$$
Example If X is a normal rv with parameters $\mu = 3$ and $\sigma^2 = 9$, find (i) $P(2 < X < 5)$, (ii) $P(X > 0)$, and (iii) $P(X > 9)$.
Solution
(i) $P(2 < X < 5) = P(-0.33 < Z < 0.67) = .3779$.
(ii) $P(X > 0) = P(Z > -1) = P(Z < 1) = .8413$.
(iii) $P(X > 9) = P(Z > 2.0) = 0.5 - 0.4772 = .0228$
Exercise Refer to the above example, find $P(X < -3)$.
Example The length of life of a certain type of automatic washer is approximately normally distributed, with a mean of 3.1 years and standard deviation of 1.2 years. If this type of washer is guaranteed for 1 year, what fraction of original sales will require replacement?
Solution Let X be the length of life of an automatic washer selected at random, then
$$z = \frac{1 - 3.1}{1.2} = -1.75$$
Therefore
$$P(X < 1) = P(Z < -1.75) =$$

Exercise: Complete the solution of this problem.

Normal Approximation to the Binomial Distribution.
When and how to use the normal approximation:
1. Large n, i.e. $np \ge 5$ and $n(1 - p) \ge 5$.
2. The approximation can be improved using correction factors.
Example. Let X be the number of times that a fair coin, flipped 40 times, lands heads. (i) Find the probability that X = 20. (ii) Find $P(10 \le X \le 20)$. Use the normal approximation.
Solution Note that $np = 20$ and $np(1 - p) = 10$.
$$P(X = 20) = P(19.5 < X < 20.5) = P\left(\frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right) \approx P(-0.16 < Z < 0.16) = .1272.$$
The exact result is
$$P(X = 20) = \binom{40}{20}(0.5)^{20}(0.5)^{20} = .1254$$
(ii) Exercise.

3 Uniform: U[a,b]
Formula:
$$f(x) = \frac{1}{b - a}, \quad a < x < b; \qquad f(x) = 0 \text{ elsewhere}$$
Graph.
Mean: $\mu = (a + b)/2$
Variance: $\sigma^2 = (b - a)^2/12$; $\sigma = (b - a)/\sqrt{12}$
CDF: (Area between a and c)
$P(X \le c) = 0$, $c \le a$.
$P(X \le c) = \frac{c - a}{b - a}$, $a \le c \le b$.
$P(X \le c) = 1$, $c \ge b$.
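The uniform formulas translate almost line for line into code. The sketch below is illustrative only: the endpoints a = 2 and b = 10 are hypothetical values chosen for the demonstration, and the function names are not from the notes.

```python
from math import sqrt

def uniform_cdf(c, a, b):
    """P(X <= c) for X uniform on [a, b]: the area between a and c."""
    if c <= a:
        return 0.0
    if c >= b:
        return 1.0
    return (c - a) / (b - a)

def uniform_mean_sd(a, b):
    """Mean (a + b)/2 and standard deviation (b - a)/sqrt(12)."""
    return (a + b) / 2, (b - a) / sqrt(12)

# Hypothetical endpoints, assumed for illustration.
a, b = 2.0, 10.0
mean, sd = uniform_mean_sd(a, b)
prob = uniform_cdf(6.0, a, b) - uniform_cdf(4.0, a, b)  # P(4 <= X <= 6)

print(mean, sd, prob)
```

An interval probability is obtained as a difference of two CDF values, which for a uniform rv is just the interval length divided by b − a.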

Exercise. Specialize the above results to the Uniform [0, 1] case.

4 Exponential
The exponential pdf often arises, in practice, as being the distribution of the amount of time until some specific event occurs. Examples include time until a new car breaks down, time until an arrival at emergency room, etc.
A rv X is said to have an exponential pdf with parameter $\lambda > 0$ if
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0; \qquad f(x) = 0 \text{ elsewhere}$$
Properties
Graph.
Mean: $\mu = 1/\lambda$
Variance: $\sigma^2 = 1/\lambda^2$, $\sigma = 1/\lambda$
CDF: $P(X \le a) = 1 - e^{-\lambda a}$.
$P(X > a) = e^{-\lambda a}$
Example 1. Suppose that the length of a phone call in minutes is an exponential rv with parameter $\lambda = 1/10$. If someone arrives immediately ahead of you at a public telephone booth, find the probability that you will have to wait (i) more than 10 minutes, and (ii) between 10 and 20 minutes.
Solution Let X be the length of a phone call in minutes by the person ahead of you.
(i) $P(X > 10) = e^{-\lambda a} = e^{-1} \approx 0.368$
(ii) $P(10 < X < 20) = e^{-1} - e^{-2} \approx 0.233$
Example 2. The amount of time, in hours, that a computer functions before breaking down is an exponential rv with $\lambda = 1/100$.
(i) What is the probability that a computer will function between 50 and 150 hours before breaking down?
(ii) What is the probability that it will function less than 100 hours?
Solution.

75 (i.0)) (vii) z = −2. Calculate the area under the standard normal curve between the following values.0 (i.e. P (−2. Let Z be a standard normal distribution.86 (i.75 ≤ Z ≤ −.6 ≤ Z ≤ 0)) (iii) z = . P (−1.0)) 2. answer method whenever possible.0 and z = 1.e.26 ≤ Z ≤ 1.0 and z = 2. Memoryless Property FACT. No credit for a correct final answer without a valid argument.0 ≤ Z ≤ 1.384 (ii) Exercise. P (−3. P (−1.86)) (vi) z = −1.95 53 . P (−1.0 (i.0 ≤ Z ≤ 3.6 (i. The exponential rv has the memoryless property. substitution. 1. Find z0 such that (i) P (Z ≥ z0 ) = 0.26 and z = 1.0 (i.0)) (viii) z = −3.0 and z = 3. P (−1.e. (i) z = 0 and z = 1. P (0 ≤ Z ≤ 1.0708 (v) P (−z0 ≤ Z ≤ z0 ) = 0.6 (i.0708 (iv) P (Z ≤ z0 ) = 0. Converse The exponential distribution is the only continuous distribution with the memoryless property.05 (ii) P (Z ≥ z0 ) = 0.(i) The probability that a computer will function between 50 and 150 hours before breaking down is given by P (50 ≤ X ≤ 150) = e−50/100 − e−150/100 = e−1/2 − e−3/2 .e.86 (i.75 and z = −.86)) (v) z = −1.e.68 (vi) P (−z0 ≤ Z ≤ z0 ) = 0. Use the formula. Review Exercises: Normal Distribution Please show all work.e.99 (iii) P (Z ≥ z0 ) = 0. P (.e. Show your work graphically in all relevant questions.75)) (iv) z = −1.e.86 ≤ Z ≤ 1.0 ≤ Z ≤ 2.6)) (ii) z = 0 and z = −1.86 and z = 1.

3. Let Z be a standard normal distribution. Find $z_0$ such that
(i) $P(Z \ge z_0) = 0.10$
(ii) $P(Z \ge z_0) = 0.05$
(iii) $P(Z \ge z_0) = 0.025$
(iv) $P(Z \ge z_0) = 0.01$
(v) $P(Z \ge z_0) = 0.005$

4. A normally distributed random variable X possesses a mean of $\mu = 10$ and a standard deviation of $\sigma = 5$. Find the following probabilities.
(i) X falls between 10 and 12 (i.e. $P(10 \le X \le 12)$).
(ii) X falls between 6 and 14 (i.e. $P(6 \le X \le 14)$).
(iii) X is less than 12 (i.e. $P(X \le 12)$).
(iv) X exceeds 10 (i.e. $P(X \ge 10)$).

5. The height of adult women in the United States is normally distributed with mean 64.5 inches and standard deviation 2.4 inches.
(i) Find the probability that a randomly chosen woman is taller than 70 inches. (Answer: .011)
(ii) Alice is 71 inches tall. What percentage of women are shorter than Alice? (Answer: .9966)

6. The lifetimes of batteries produced by a firm are normally distributed with a mean of 100 hours and a standard deviation of 10 hours. What is the probability a randomly selected battery will last between 110 and 120 hours?

7. Answer by True or False. (Circle your choice).
T F (i) The standard normal distribution has its mean and standard deviation equal to zero.
T F (ii) The standard normal distribution has its mean and standard deviation equal to one.
T F (iii) The standard normal distribution has its mean equal to one and standard deviation equal to zero.
T F (iv) The standard normal distribution has its mean equal to zero and standard deviation equal to one.
T F (v) Because the normal distribution is symmetric half of the area under the curve lies below the 40th percentile.

T F (vi) The total area under the normal curve is equal to one only if the mean is equal to zero and standard deviation equal to one.
T F (vii) The normal distribution is symmetric only if the mean is zero and the standard deviation is one.

Chapter 5
Sampling Distributions

Contents.
The Central Limit Theorem
The Sampling Distribution of the Sample Mean
The Sampling Distribution of the Sample Proportion
The Sampling Distribution of the Difference Between Two Sample Means
The Sampling Distribution of the Difference Between Two Sample Proportions

1 The Central Limit Theorem (CLT)
Roughly speaking, the CLT says that for large samples the following standardized statistics are approximately standard normal.
For the sample mean, $\bar{X}$:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
For the sample proportion, $\hat{P}$:
$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$$

2 Sampling Distributions
Suppose the distribution of X is normal with mean $\mu$ and standard deviation $\sigma$.
(i) What is the distribution of $\frac{X - \mu}{\sigma}$?
Answer: It is a standard normal, i.e.

$$Z = \frac{X - \mu}{\sigma}$$

I. The Sampling Distribution of the Sample Mean
(ii) What is the mean (expected value) and standard deviation of $\bar{X}$?
Answer:
$$\mu_{\bar{X}} = E(\bar{X}) = \mu$$
$$\sigma_{\bar{X}} = S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
(iii) What is the sampling distribution of the sample mean $\bar{X}$?
Answer: The distribution of $\bar{X}$ is a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, equivalently,
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
(iv) What is the sampling distribution of the sample mean, $\bar{X}$, if X is not normally distributed?
Answer: The distribution of $\bar{X}$ is approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ provided n is large (i.e. $n \ge 30$).
Example. Consider a population, X, with mean $\mu = 4$ and standard deviation $\sigma = 3$. A sample of size 36 is to be selected.
(i) What is the mean and standard deviation of $\bar{X}$?
(ii) Find $P(4 < \bar{X} < 5)$,
(iii) Find $P(\bar{X} > 3.5)$, (exercise)
(iv) Find $P(3.5 \le \bar{X} \le 4.5)$. (exercise)

II. The Sampling Distribution of the Sample Proportion
Suppose the distribution of X is binomial with parameters n and p.
(ii) What is the mean (expected value) and standard deviation of $\hat{P}$?
Answer:
$$\mu_{\hat{P}} = E(\hat{P}) = p$$

$$\sigma_{\hat{P}} = S.E.(\hat{P}) = \sqrt{\frac{pq}{n}}$$
(iii) What is the sampling distribution of the sample proportion $\hat{P}$?
Answer: $\hat{P}$ has approximately a normal distribution with mean p and standard deviation $\sqrt{pq/n}$, equivalently
$$Z = \frac{\hat{P} - \mu_{\hat{P}}}{\sigma_{\hat{P}}} = \frac{\hat{P} - p}{\sqrt{pq/n}}$$
provided n is large (i.e. $np \ge 5$, and $nq \ge 5$).
Example. It is claimed that at least 30% of all adults favor brand A versus brand B. To test this theory a sample n = 400 is selected. Suppose 130 individuals indicated preference for brand A.
DATA SUMMARY: n = 400, x = 130, p = .30, $\hat{p} = 130/400 = .325$
(i) Find the mean and standard deviation of the sample proportion $\hat{P}$.
Answer:
$$\mu_{\hat{p}} = p = .30$$
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = .023$$
(ii) Find $P(\hat{P} > 0.30)$

III. Comparing two Sample Means
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
provided $n_1, n_2 \ge 30$.
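For the brand-preference example earlier in this section (p = .30, n = 400), the sampling distribution of the sample proportion can be sketched with the standard library's `statistics.NormalDist`. The helper name `proportion_sampling_dist` is illustrative, and the z-score of the observed proportion .325 is computed only as an extra illustration; it is not one of the questions asked in the notes.

```python
from math import sqrt
from statistics import NormalDist

def proportion_sampling_dist(p, n):
    """Large-sample normal approximation for the sample proportion:
    mean p and standard deviation sqrt(p * q / n)."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

dist = proportion_sampling_dist(p=0.30, n=400)

se = dist.stdev                     # standard error of the sample proportion
prob_above_p = 1 - dist.cdf(0.30)   # P(P-hat > 0.30) under the claim p = .30
z_observed = (0.325 - 0.30) / se    # z-score of the observed proportion .325

print(round(se, 3), prob_above_p, round(z_observed, 2))
```

Because the approximating normal curve is centered at p = .30, the probability of a sample proportion above .30 is one half.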

IV. Comparing two Sample Proportions
$$E(\hat{P}_1 - \hat{P}_2) = p_1 - p_2$$
$$\sigma_{\hat{P}_1 - \hat{P}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
$$Z = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
provided $n_1$ and $n_2$ are large.

Review Exercises: Sampling Distributions
Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A normally distributed random variable X possesses a mean of $\mu = 20$ and a standard deviation of $\sigma = 5$. A random sample of n = 16 observations is to be selected. Let $\bar{X}$ be the sample average.
(i) Describe the sampling distribution of $\bar{X}$ (i.e. describe the distribution of $\bar{X}$ and give $\mu_{\bar{x}}$, $\sigma_{\bar{x}}$). (Answer: $\mu = 20$, $\sigma_{\bar{x}} = 1.25$)
(ii) Find the z-score of $\bar{x} = 22$ (Answer: 1.6)
(iii) Find $P(\bar{X} \ge 22)$.
(iv) Find $P(20 \le \bar{X} \le 22)$.
(v) Find $P(16 \le \bar{X} \le 19)$.
(vi) Find $P(\bar{X} \ge 23)$.
(vii) Find $P(\bar{X} \ge 18)$.

2. The number of trips to doctor's office per family per year in a given community is known to have a mean of 10 with a standard deviation of 3. Suppose a random sample of 49 families is taken and a sample mean is calculated.
(i) Describe the sampling distribution of the sample mean, $\bar{X}$. (Include the mean $\mu_{\bar{x}}$, standard deviation $\sigma_{\bar{x}}$, and type of distribution).

(ii) Find the probability that the sample mean, $\bar{X}$, does not exceed 9. (Answer: .01)
(iii) Find the probability that the sample mean, $\bar{X}$, does not exceed 11. (Answer: .99)

3. When a random sample of size n is drawn from a normal population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the sample mean $\bar{X}$ will be
(a) exactly normal.
(b) approximately normal
(c) binomial
(d) none of the above

4. Answer by True or False. (Circle your choice).
T F (i) The central limit theorem applies regardless of the shape of the population frequency distribution.
T F (ii) The central limit theorem is important because it explains why some estimators tend to possess, approximately, a normal distribution.

Chapter 6
Large Sample Estimation

Contents.
1. Introduction
2. Point Estimators and Their Properties
3. Single Quantitative Population
4. Single Binomial Population
5. Two Quantitative Populations
6. Two Binomial Populations
7. Choosing the Sample Size

1 Introduction
Types of estimators.
1. Point estimator
2. Interval estimator: (L, U)
Desired Properties of Point Estimators.
(i) Unbiased: Mean of the sampling distribution is equal to the parameter.
(ii) Minimum variance: Small standard error of point estimator.
(iii) Error of estimation: distance between a parameter and its point estimate is small.
Desired Properties of Interval Estimators.
(i) Confidence coefficient: P(interval estimator will enclose the parameter) $= 1 - \alpha$ should be as high as possible.
(ii) Confidence level: Confidence coefficient expressed as a percentage.
(iii) Margin of Error: (Bound on the error of estimation) should be as small as possible.
Parameters of Interest.

Single Quantitative Population: $\mu$
Single Binomial Population: $p$
Two Quantitative Populations: $\mu_1 - \mu_2$
Two Binomial Populations: $p_1 - p_2$

2 Point Estimators and Their Properties
Parameter of interest: $\theta$
Sample data: $n$, $\hat{\theta}$, $\sigma_{\hat{\theta}}$
Point estimator: $\hat{\theta}$
Estimator mean: $\mu_{\hat{\theta}} = \theta$ (Unbiased)
Standard error: $SE(\hat{\theta}) = \sigma_{\hat{\theta}}$
Assumptions: Large sample + others (to be specified in each case)

3 Single Quantitative Population
Parameter of interest: $\mu$
Sample data: $n$, $\bar{x}$, $s$
Other information: $\alpha$
Point estimator: $\bar{x}$
Estimator mean: $\mu_{\bar{x}} = \mu$
Standard error: $SE(\bar{x}) = \sigma/\sqrt{n}$ (also denoted as $\sigma_{\bar{x}}$)
Confidence Interval (C.I.) for $\mu$:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Confidence level: $(1 - \alpha)100\%$ which is the probability that the interval estimator contains the parameter.
Margin of Error. (or Bound on the Error of Estimation)
$$B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Assumptions.
1. Large sample ($n \ge 30$)
2. Sample is randomly selected
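The interval formula can be sketched in a few lines of Python. The example below uses the summary statistics of the airline example in these notes (n = 225, sample mean 11.6, s = 4.1) and, as the notes do for large samples, uses s in place of the unknown σ. The function names are illustrative, and the critical value comes from `statistics.NormalDist` rather than a printed table, so it is slightly more precise than 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(alpha):
    """Upper alpha/2 critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_confidence_interval(xbar, s, n, alpha):
    """Large-sample (1 - alpha) CI for a mean, with s standing in for sigma.
    Returns (lower limit, upper limit, margin of error B)."""
    bound = z_critical(alpha) * s / sqrt(n)
    return xbar - bound, xbar + bound, bound

low, high, bound = mean_confidence_interval(xbar=11.6, s=4.1, n=225, alpha=0.10)
print(round(low, 2), round(high, 2), round(bound, 2))
```

Halving `alpha` or increasing `n` in the call shows directly how the confidence level and the sample size affect the width of the interval.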

Example 1. We are interested in estimating the mean number of unoccupied seats per flight, $\mu$, for a major airline. A random sample of n = 225 flights shows that the sample mean is 11.6 and the standard deviation is 4.1.
Data summary: n = 225; $\bar{x} = 11.6$; s = 4.1.
Question 1. What is the point estimate of $\mu$ (Do not give the margin of error)?
$$\bar{x} = 11.6$$
Question 2. Give a 95% bound on the error of estimation (also known as the margin of error).
$$B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \frac{4.1}{\sqrt{225}} = 0.5357$$
Question 3. Find a 90% confidence interval for $\mu$.
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$11.6 \pm 1.645 \frac{4.1}{\sqrt{225}}$$
$$11.6 \pm 0.45 = (11.15, 12.05)$$
Question 4. Interpret the CI found in Question 3.
The interval contains $\mu$ with probability 0.90.
OR
If repeated sampling is used, then 90% of CI constructed would contain $\mu$.
Question 5. What is the width of the CI found in Question 3?
The width of the CI is
$$W = 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$W = 2(0.45) = 0.90$$
OR
$$W = 12.05 - 11.15 = 0.90$$
Question 6. If n, the sample size, is increased what happens to the width of the CI? what happens to the margin of error?
The width of the CI decreases.
The margin of error decreases.
Sample size:
$$n \approx \frac{(z_{\alpha/2})^2 \sigma^2}{B^2}$$

where $\sigma$ is estimated by s.
Note: In the absence of data, $\sigma$ is sometimes approximated by $R/4$ where R is the range.
Example 2. Suppose you want to construct a 99% CI for $\mu$ so that W = 0.05. You are told that preliminary data shows a range from 13.3 to 13.7. What sample size should you choose?
A. Data summary: $\alpha = .01$; R = 13.7 − 13.3 = .4; so $\sigma \approx .4/4 = .1$. Now B = W/2 = 0.05/2 = 0.025. Therefore
$$n \approx \frac{(z_{\alpha/2})^2 \sigma^2}{B^2} = \frac{2.58^2 (.1)^2}{0.025^2} = 106.50.$$
So n = 107. (round up)
Exercise 1. Find the sample size necessary to reduce W in the flight example to .6. Use $\alpha = 0.05$.

4 Single Binomial Population
Parameter of interest: $p$
Sample data: $n$, $x$, $\hat{p} = x/n$ (x here is the number of successes).
Other information: $\alpha$
Point estimator: $\hat{p}$
Estimator mean: $\mu_{\hat{p}} = p$
Standard error: $\sigma_{\hat{p}} = \sqrt{pq/n}$
Confidence Interval (C.I.) for p:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Confidence level: $(1 - \alpha)100\%$ which is the probability that the interval estimator contains the parameter.
Margin of Error.
$$B = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Assumptions.
1. Large sample ($np \ge 5$; $nq \ge 5$)
2. Sample is randomly selected
Example 3. A random sample of n = 484 voters in a community produced x = 257 voters in favor of candidate A.
Data summary: n = 484; x = 257; $\hat{p} = x/n = 257/484 = 0.531$.
Question 1. Do we have a large sample size?
$n\hat{p} = 484(0.531) = 257$ which is $\ge 5$.
$n\hat{q} = 484(0.469) = 227$ which is $\ge 5$.
Therefore we have a large sample size.
Question 2. What is the point estimate of p and its margin of error?
$$\hat{p} = \frac{x}{n} = \frac{257}{484} = 0.531$$
$$B = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(0.531)(0.469)}{484}} = 0.044$$
Question 3. Find a 90% confidence interval for p.
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.531 \pm 1.645 \sqrt{\frac{(0.531)(0.469)}{484}}$$
$$0.531 \pm 0.037 = (0.494, 0.568)$$
Question 4. Interpret the CI found in Question 3.
The interval contains p with probability 0.90.
OR
If repeated sampling is used, then 90% of CI constructed would contain p.
Question 5. What is the width of the CI found in Question 3?
The width of the CI is
$$W = 2 z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 2(0.037) = 0.074$$
Question 6. If n, the sample size, is increased what happens to the width of the CI? what happens to the margin of error?

The width of the CI decreases. The margin of error decreases.

Sample size.
$$n \approx \frac{(z_{\alpha/2})^2 (\hat{p}\hat{q})}{B^2}$$
Note: In the absence of data, choose $\hat{p} = \hat{q} = 0.5$ or simply $\hat{p}\hat{q} = 0.25$.

Example 4. Suppose you want to provide an accurate estimate of customers preferring one brand of coffee over another. You need to construct a 95% CI for $p$ so that $B = 0.015$. You are told that preliminary data shows a $\hat{p} = 0.35$. What sample size should you choose? Use $\alpha = 0.05$.
Data summary: $\alpha = .05$, $\hat{p} = 0.35$, $B = 0.015$
$$n \approx \frac{(z_{\alpha/2})^2 (\hat{p}\hat{q})}{B^2} = \frac{(1.96)^2 (0.35)(0.65)}{0.015^2} = 3{,}884.28$$
So $n = 3{,}885$. (round up)

Exercise 2. Suppose that no preliminary estimate of $\hat{p}$ is available. Find the new sample size. Use $\alpha = 0.05$.
Exercise 3. Suppose that no preliminary estimate of $\hat{p}$ is available. Find the sample size necessary so that $\alpha = 0.01$.

5 Two Quantitative Populations

Parameter of interest: $\mu_1 - \mu_2$
Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Point estimator: $\bar{X}_1 - \bar{X}_2$
Estimator mean: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
Standard error: $SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence Interval.
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Assumptions.
1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)
2. Samples are randomly selected
3. Samples are independent

Sample size.
$$n \approx \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{B^2}$$

6 Two Binomial Populations

Parameter of interest: $p_1 - p_2$
Sample 1: $n_1$, $x_1$, $\hat{p}_1 = x_1/n_1$
Sample 2: $n_2$, $x_2$, $\hat{p}_2 = x_2/n_2$
$p_1 - p_2$ (unknown parameter)
$\alpha$ (significance level)
Point estimator: $\hat{p}_1 - \hat{p}_2$
Estimator mean: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
Estimated standard error: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
Confidence Interval.
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Assumptions.
1. Large samples ($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)
2. Samples are randomly and independently selected

Sample size.
$$n \approx \frac{(z_{\alpha/2})^2 (\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2)}{B^2}$$
For unknown parameters:
$$n \approx \frac{(z_{\alpha/2})^2 (0.5)}{B^2}$$

Review Exercises: Large-Sample Estimation

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A random sample of size $n = 100$ is selected from a quantitative population. The data produced a mean and standard deviation of $\bar{x} = 75$ and $s = 6$ respectively.
(i) Estimate the population mean $\mu$, and give a 95% bound on the error of estimation (or margin of error). (Answer: B=1.18)
(ii) Find a 99% confidence interval for the population mean. (Answer: B=1.55)
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half. (Answer: n=400)

2. An examination of the yearly premiums for a random sample of 80 automobile insurance policies from a major company showed an average of $329 and a standard deviation of $49.
(i) Give the point estimate of the population parameter $\mu$ and a 99% bound on the error of estimation. (Margin of error). (Answer: B=14.135)
(ii) Construct a 99% confidence interval for $\mu$.
(iii) Suppose we wish our estimate in (i) to be accurate to within $5 with 95% confidence; how many insurance policies should be sampled to achieve the desired level of accuracy? (Answer: n=369)

3. Suppose we wish to estimate the average daily yield of a chemical manufactured in a chemical plant. The daily yield recorded for $n = 100$ days produces a mean and standard deviation of $\bar{x} = 870$ and $s = 20$ tons respectively.
(i) Estimate the average daily yield $\mu$, and give a 95% bound on the error of estimation (or margin of error).
(ii) Find a 99% confidence interval for the population mean.
(iii) Interpret the confidence interval found in (ii).
(iv) Find the sample size necessary to reduce the width of the confidence interval in (ii) by half.

4. Answer by True or False. (Circle your choice).
T F (i) If the population variance increases and other factors are the same, the width of the confidence interval for the population mean tends to increase.

T F (ii) As the sample size increases, the width of the confidence interval for the population mean tends to decrease.
T F (iii) Populations are characterized by numerical descriptive measures called statistics.
T F (iv) If, for a given C.I., $\alpha$ is increased, then the margin of error will increase.
T F (v) The sample standard deviation $s$ can be used to approximate $\sigma$ when $n$ is larger than 30.
T F (vi) The sample mean always lies above the population mean.
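
The large-sample formulas of this chapter for a single proportion (the confidence interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ and the sample size $n \approx z_{\alpha/2}^2\,\hat{p}\hat{q}/B^2$) can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names are illustrative, and the critical value is computed with the standard library instead of being read from a z table, so intermediate values may differ slightly from hand calculations that use 1.96 or 1.645.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(alpha):
    """Upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(x, n, alpha):
    """Large-sample CI for p: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    margin = z_critical(alpha) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_proportion(p_hat, bound, alpha):
    """Smallest n whose margin of error is at most `bound` (rounded up)."""
    z = z_critical(alpha)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / bound ** 2)


# Example 3 (voters): 90% CI for p with x = 257, n = 484.
low, high = proportion_ci(x=257, n=484, alpha=0.10)

# Example 4 (coffee): 95% CI with B = 0.015 and preliminary p_hat = 0.35.
n_needed = sample_size_proportion(p_hat=0.35, bound=0.015, alpha=0.05)

print(round(low, 3), round(high, 3), n_needed)
```

Rounding the interval endpoints to three decimals and rounding the sample size up reproduces the figures worked out by hand in Examples 3 and 4.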

Chapter 7
Large-Sample Tests of Hypothesis

Contents.
1. Elements of a statistical test
2. A Large-sample statistical test
3. Testing a population mean
4. Testing a population proportion
5. Testing the difference between two population means
6. Testing the difference between two population proportions
7. Reporting results of statistical tests: p-Value

1 Elements of a Statistical Test

Null hypothesis: $H_0$
Alternative (research) hypothesis: $H_a$
Test statistic:
Rejection region: reject $H_0$ if ...
Graph:
Decision: either "Reject $H_0$" or "Do not reject $H_0$"
Conclusion: At $100\alpha\%$ significance level there is (in)sufficient statistical evidence to "favor $H_a$".

Comments:
* $H_0$ represents the status quo.
* $H_a$ is the hypothesis that we want to provide evidence to justify. We show that $H_a$ is true by showing that $H_0$ is false, that is, proof by contradiction.

Type I error ≡ { reject $H_0$ | $H_0$ is true }

Type II error ≡ { do not reject $H_0$ | $H_0$ is false }
$\alpha$ = Prob{Type I error}
$\beta$ = Prob{Type II error}
Power of a statistical test: Prob{reject $H_0$ | $H_0$ is false} $= 1 - \beta$

Example 1.
$H_0$: Innocent
$H_a$: Guilty
$\alpha$ = Prob{sending an innocent person to jail}
$\beta$ = Prob{letting a guilty person go free}

Example 2.
$H_0$: New drug is not acceptable
$H_a$: New drug is acceptable
$\alpha$ = Prob{marketing a bad drug}
$\beta$ = Prob{not marketing an acceptable drug}

2 A Large-Sample Statistical Test

Parameter of interest: $\theta$
Sample data: $n$, $\hat{\theta}$, $\sigma_{\hat{\theta}}$
Test:
Null hypothesis ($H_0$): $\theta = \theta_0$
Alternative hypothesis ($H_a$): 1) $\theta > \theta_0$; 2) $\theta < \theta_0$; 3) $\theta \ne \theta_0$
Test statistic (TS):
$$z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}}$$
Critical value: either $z_\alpha$ or $z_{\alpha/2}$
Rejection region (RR):
1) Reject $H_0$ if $z > z_\alpha$
2) Reject $H_0$ if $z < -z_\alpha$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject $H_0$"
2) if observed value is not in RR: "Do not reject $H_0$"

Conclusion: At $100\alpha\%$ significance level there is (in)sufficient statistical evidence to ··· .
Assumptions: Large sample + others (to be specified in each case).

One tailed statistical test
  Upper (right) tailed test
  Lower (left) tailed test
Two tailed statistical test

3 Testing a Population Mean

Parameter of interest: $\mu$
Sample data: $n$, $\bar{x}$, $s$
Other information: $\mu_0$ = target value, $\alpha$
Test:
$H_0$: $\mu = \mu_0$
$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$
T.S.:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Rejection region (RR):
1) Reject $H_0$ if $z > z_\alpha$
2) Reject $H_0$ if $z < -z_\alpha$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject $H_0$"
2) if observed value is not in RR: "Do not reject $H_0$"
Conclusion: At $100\alpha\%$ significance level there is (in)sufficient statistical evidence to "favor $H_a$".
Assumptions:
Large sample ($n \ge 30$)
Sample is randomly selected

Example: Test the hypothesis that weight loss in a new diet program exceeds 20 pounds during the first month.
Sample data: $n = 36$, $\bar{x} = 21$, $s^2 = 25$, $\mu_0 = 20$, $\alpha = 0.05$
$H_0$: $\mu = 20$ ($\mu$ is not larger than 20)

$H_a$: $\mu > 20$ ($\mu$ is larger than 20)
T.S.:
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{21 - 20}{5/\sqrt{36}} = 1.2$$
Critical value: $z_\alpha = 1.645$
RR: Reject $H_0$ if $z > 1.645$
Graph:
Decision: Do not reject $H_0$
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in a new diet program exceeds 20 pounds per first month.

Exercise: Test the claim that weight loss is not equal to 19.5.

4 Testing a Population Proportion

Parameter of interest: $p$ (unknown parameter)
Sample data: $n$ and $x$ (or $\hat{p} = x/n$)
$p_0$ = target value
$\alpha$ (significance level)
Test:
$H_0$: $p = p_0$
$H_a$: 1) $p > p_0$; 2) $p < p_0$; 3) $p \ne p_0$
T.S.:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
RR:
1) Reject $H_0$ if $z > z_\alpha$
2) Reject $H_0$ if $z < -z_\alpha$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
1) if observed value is in RR: "Reject $H_0$"
2) if observed value is not in RR: "Do not reject $H_0$"
Conclusion: At $(\alpha)100\%$ significance level there is (in)sufficient statistical evidence to "favor $H_a$".
Assumptions:

1. Large sample ($np \ge 5$, $nq \ge 5$)
2. Sample is randomly selected

Example. Test the hypothesis that $p > .10$ for sample data: $n = 200$, $x = 26$.
Solution. $\hat{p} = x/n = 26/200 = .13$.
Now
$H_0$: $p = .10$
$H_a$: $p > .10$
TS:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.13 - .10}{\sqrt{(.10)(.90)/200}} = 1.41$$
RR: reject $H_0$ if $z > 1.645$
Graph:
Dec: Do not reject $H_0$
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that $p > .10$.

Exercise. Is the large sample assumption satisfied here?

5 Comparing Two Population Means

Parameter of interest: $\mu_1 - \mu_2$
Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Test:
$H_0$: $\mu_1 - \mu_2 = D_0$
$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$; 3) $\mu_1 - \mu_2 \ne D_0$
T.S.:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
RR:
1) Reject $H_0$ if $z > z_\alpha$
2) Reject $H_0$ if $z < -z_\alpha$

3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
1. Large samples ($n_1 \ge 30$; $n_2 \ge 30$)
2. Samples are randomly selected
3. Samples are independent

Example: (Comparing two weight loss programs) Refer to the weight loss example. Test the hypothesis that weight loss in the two diet programs are different.
1. Sample 1: $n_1 = 36$, $\bar{x}_1 = 21$, $s_1^2 = 25$ (old)
2. Sample 2: $n_2 = 36$, $\bar{x}_2 = 18.5$, $s_2^2 = 24$ (new)
$D_0 = 0$, $\alpha = 0.05$
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 \ne 0$
T.S.:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = 2.14$$
Critical value: $z_{\alpha/2} = 1.96$
RR: Reject $H_0$ if $z > 1.96$ or $z < -1.96$
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss in the two diet programs are different.

Exercise: Test the hypothesis that weight loss in the old diet program exceeds that of the new program.
Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 1.

6 Comparing Two Population Proportions

Parameter of interest: $p_1 - p_2$
Sample 1: $n_1$, $x_1$, $\hat{p}_1 = x_1/n_1$

Sample 2: $n_2$, $x_2$, $\hat{p}_2 = x_2/n_2$
$p_1 - p_2$ (unknown parameter)
Common estimate:
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Test:
$H_0$: $p_1 - p_2 = 0$
$H_a$: 1) $p_1 - p_2 > 0$; 2) $p_1 - p_2 < 0$; 3) $p_1 - p_2 \ne 0$
T.S.:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\,(1/n_1 + 1/n_2)}}$$
RR:
1) Reject $H_0$ if $z > z_\alpha$
2) Reject $H_0$ if $z < -z_\alpha$
3) Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Graph:
Decision:
Conclusion:
Assumptions:
Large sample ($n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$)
Samples are randomly and independently selected

Example: Test the hypothesis that $p_1 - p_2 < 0$ if it is known that the test statistic is $z = -1.91$.
Solution:
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 < 0$
TS: $z = -1.91$
RR: reject $H_0$ if $z < -1.645$
Graph:
Dec: reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that $p_1 - p_2 < 0$.

Exercise: Repeat as a two tailed test.

7 Reporting Results of Statistical Tests: P-Value

Definition. The p-value for a test of a hypothesis is the smallest value of $\alpha$ for which the null hypothesis is rejected, i.e. the statistical results are significant.
The p-value is called the observed significance level.
Note: The p-value is the probability (when $H_0$ is true) of obtaining a value of the test statistic as extreme or more extreme than the actual sample value in support of $H_a$.

Examples. Find the p-value in each case:
(i) Upper tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta > \theta_0$
TS: $z = 1.76$
p-value = .0392
(ii) Lower tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta < \theta_0$
TS: $z = -1.86$
p-value = .0314
(iii) Two tailed test:
$H_0$: $\theta = \theta_0$
$H_a$: $\theta \ne \theta_0$
TS: $z = 1.76$
p-value = 2(.0392) = .0784

Decision rule using p-value: (Important)
Reject $H_0$ for all $\alpha >$ p-value

Review Exercises: Testing Hypothesis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. A local pizza parlor advertises that their average time for delivery of a pizza is within 30 minutes of receipt of the order. The delivery time for a random sample of 64

orders were recorded, with a sample mean of 34 minutes and a standard deviation of 21 minutes.
(i) Is there sufficient evidence to conclude that the actual delivery time is larger than what is claimed by the pizza parlor? Use $\alpha = .05$.
$H_0$:
$H_a$:
T.S. (Answer: 1.52)
R.R.
Graph:
Dec:
Conclusion:
(ii) Test the hypothesis that $H_a$: $\mu \ne 30$.

2. Answer by True or False. (Circle your choice).
T F (v) If, for a given test, $\alpha$ is fixed and the sample size is increased, then $\beta$ will increase.
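
The large-sample z test and the p-value decision rule of this chapter (reject $H_0$ for all $\alpha >$ p-value) can be sketched in a few lines of Python. This is an illustrative sketch and not part of the original notes; the function names and the `tail` argument are assumptions made here, and the normal probabilities come from the standard library, not from the z table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x_bar, mu0, s, n):
    """Large-sample test statistic z = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


def p_value(z, tail):
    """p-value of an observed z for an 'upper', 'lower', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Weight-loss example: n = 36, x_bar = 21, s = 5, H0: mu = 20, Ha: mu > 20.
alpha = 0.05
z = z_statistic(x_bar=21, mu0=20, s=5, n=36)
p = p_value(z, "upper")
reject = alpha > p  # decision rule: reject H0 when alpha exceeds the p-value

print(round(z, 2), round(p, 4), reject)
```

For the weight-loss data the p-value exceeds 0.05, so the rule agrees with the rejection-region decision "Do not reject $H_0$" reached earlier in the chapter.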

Chapter 8
Small-Sample Tests of Hypothesis

Contents:
1. Introduction
2. Student's t distribution
3. Small-sample inferences about a population mean
4. Small-sample inferences about the difference between two means: Independent Samples
5. Small-sample inferences about the difference between two means: Paired Samples
6. Inferences about a population variance
7. Comparing two population variances

1 Introduction

When the sample size is small we only deal with normal populations.
For non-normal (e.g. binomial) populations different techniques are necessary.

2 Student's t Distribution

RECALL
For small samples ($n < 30$) from normal populations, we have
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
If $\sigma$ is unknown, we use $s$ instead; but we no longer have a Z distribution.
Assumptions.

1. Sampled population is normal
2. Small random sample ($n < 30$)
3. $\sigma$ is unknown
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Properties of the t Distribution:
(i) It has $n - 1$ degrees of freedom (df)
(ii) Like the normal distribution it has a symmetric mound-shaped probability distribution
(iii) More variable (flat) than the normal distribution
(iv) The distribution depends on the degrees of freedom. Moreover, as $n$ becomes larger, $t$ converges to $Z$.
(v) Critical values (tail probabilities) are obtained from the t table

Examples.
(i) Find $t_{0.05,5} = 2.015$
(ii) Find $t_{0.005,8} = 3.355$
(iii) Find $t_{0.025,26} = 2.056$

3 Small-Sample Inferences About a Population Mean

Parameter of interest: $\mu$
Sample data: $n$, $\bar{x}$, $s$
Other information: $\mu_0$ = target value, $\alpha$
Point estimator: $\bar{x}$
Estimator mean: $\mu_{\bar{x}} = \mu$
Estimated standard error: $\sigma_{\bar{x}} = s/\sqrt{n}$
Confidence Interval for $\mu$:
$$\bar{x} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right)$$
Test:
$H_0$: $\mu = \mu_0$
$H_a$: 1) $\mu > \mu_0$; 2) $\mu < \mu_0$; 3) $\mu \ne \mu_0$.
Critical value: either $t_{\alpha,\,n-1}$ or $t_{\alpha/2,\,n-1}$

T.S.: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
RR:
1) Reject $H_0$ if $t > t_{\alpha,\,n-1}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n-1}$
3) Reject $H_0$ if $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$
Graph:
Decision:
1) if observed value is in RR: "Reject $H_0$"
2) if observed value is not in RR: "Do not reject $H_0$"
Conclusion: At $100\alpha\%$ significance level there is (in)sufficient statistical evidence to "favor $H_a$".
Assumptions.
1. Small sample ($n < 30$)
2. Sample is randomly selected
3. Normal population
4. Unknown variance

Example. For the sample data given below, test the hypothesis that weight loss in a new diet program exceeds 20 pounds per first month.
Sample data: $n = 25$, $\bar{x} = 21.3$, $s^2 = 25$, $\mu_0 = 20$, $\alpha = 0.05$
Critical value: $t_{0.05,24} = 1.711$
$H_0$: $\mu = 20$
$H_a$: $\mu > 20$
T.S.:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{21.3 - 20}{5/\sqrt{25}} = 1.3$$
RR: Reject $H_0$ if $t > 1.711$
Graph:
Decision: Do not reject $H_0$
Conclusion: At 5% significance level there is insufficient statistical evidence to conclude that weight loss in a new diet program exceeds 20 pounds per first month.

Exercise. Test the claim that weight loss is not equal to 19.5 (i.e. $H_a$: $\mu \ne 19.5$).

4 Small-Sample Inferences About the Difference Between Two Means: Independent Samples

Parameter of interest: $\mu_1 - \mu_2$

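Sample data:
Sample 1: $n_1$, $\bar{x}_1$, $s_1$
Sample 2: $n_2$, $\bar{x}_2$, $s_2$
Other information: $D_0$ = target value, $\alpha$
Point estimator: $\bar{X}_1 - \bar{X}_2$
Estimator mean: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
Assumptions.
1. Normal populations
2. Small samples ($n_1 < 30$; $n_2 < 30$)
3. Samples are randomly selected
4. Samples are independent
5. Variances are equal with common variance $\sigma^2 = \sigma_1^2 = \sigma_2^2$
Pooled estimator for $\sigma$:
$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
Estimator standard error:
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Reason:
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Confidence Interval:
$$(\bar{x}_1 - \bar{x}_2) \pm (t_{\alpha/2,\,n_1+n_2-2}) \left( s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right)$$
Test:
$H_0$: $\mu_1 - \mu_2 = D_0$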

$H_a$: 1) $\mu_1 - \mu_2 > D_0$; 2) $\mu_1 - \mu_2 < D_0$; 3) $\mu_1 - \mu_2 \ne D_0$
T.S.:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
RR:
1) Reject $H_0$ if $t > t_{\alpha,\,n_1+n_2-2}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n_1+n_2-2}$
3) Reject $H_0$ if $t > t_{\alpha/2,\,n_1+n_2-2}$ or $t < -t_{\alpha/2,\,n_1+n_2-2}$
Graph:
Decision:
Conclusion:

Example. (Comparison of two weight loss programs) Refer to the weight loss example. Test the hypothesis that weight loss in a new diet program is different from that of an old program. We are told that the observed value is 2.2 and we know that
1. Sample 1: $n_1 = 7$
2. Sample 2: $n_2 = 8$
$\alpha = 0.05$
Solution.
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 \ne 0$
T.S.:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.2$$
Critical value: $t_{.025,13} = 2.160$
RR: Reject $H_0$ if $t > 2.160$ or $t < -2.160$
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that weight loss in the two diet programs are different.

Exercise: Test the claim that the difference in mean weight loss for the two programs is greater than 0.

Minitab Commands: A twosample t procedure with a pooled estimate of variance
MTB> twosample C1 C2;
SUBC>pooled;


SUBC> alternative 1.
Note: alternative: 1 = right-tailed; -1 = left-tailed; 0 = two-tailed.
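
For readers without Minitab, the pooled two-sample t statistic of this section can be computed directly from summary statistics. The sketch below is not part of the original notes. The notes' example reports only the observed value $t = 2.2$, so the sample means and variances used here are hypothetical, chosen only to illustrate the formulas with $n_1 = 7$ and $n_2 = 8$; the critical value $t_{.025,13} = 2.160$ is the table value quoted above.

```python
from math import sqrt


def pooled_t_statistic(n1, mean1, var1, n2, mean2, var2, d0=0.0):
    """Return (t, df, pooled variance) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Estimated standard error of (x_bar1 - x_bar2).
    std_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - d0) / std_error
    return t, df, pooled_var


# Hypothetical summary statistics (NOT from the notes), with n1 = 7 and n2 = 8.
t, df, pooled_var = pooled_t_statistic(7, 21.0, 25.0, 8, 18.5, 24.0)

T_CRITICAL = 2.160  # t_{.025,13} read from the t table
reject_two_tailed = abs(t) > T_CRITICAL

print(round(t, 3), df, round(pooled_var, 3), reject_two_tailed)
```

With these hypothetical inputs the statistic falls inside $(-2.160,\ 2.160)$, so $H_0$ would not be rejected; the notes' own example, with observed value 2.2, leads to rejection.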

5 Small-Sample Inferences About the Difference Between Two Means: Paired Samples

Parameter of interest: $\mu_1 - \mu_2 = \mu_d$
Sample of paired differences data:
Sample: $n$ = number of pairs, $\bar{d}$ = sample mean, $s_d$
Other information: $D_0$ = target value, $\alpha$
Point estimator: $\bar{d}$
Estimator mean: $\mu_{\bar{d}} = \mu_d$
Assumptions.
1. Normal populations
2. Small samples ($n_1 < 30$; $n_2 < 30$)
3. Samples are randomly selected
4. Samples are paired (not independent)
Sample standard deviation of the sample of $n$ paired differences:
$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$$
Estimator standard error: $\sigma_{\bar{d}} = s_d/\sqrt{n}$
Confidence Interval.
$$\bar{d} \pm t_{\alpha/2,\,n-1}\, s_d/\sqrt{n}$$
Test.
$H_0$: $\mu_1 - \mu_2 = D_0$ (equivalently, $\mu_d = D_0$)
$H_a$: 1) $\mu_1 - \mu_2 = \mu_d > D_0$; 2) $\mu_1 - \mu_2 = \mu_d < D_0$; 3) $\mu_1 - \mu_2 = \mu_d \ne D_0$
T.S.:
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$
RR:
1) Reject $H_0$ if $t > t_{\alpha,\,n-1}$
2) Reject $H_0$ if $t < -t_{\alpha,\,n-1}$


3) Reject $H_0$ if $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$
Graph:
Decision:
Conclusion:

Example. A manufacturer wishes to compare wearing qualities of two different types of tires, A and B. For the comparison a tire of type A and one of type B are randomly assigned and mounted on the rear wheels of each of five automobiles. The automobiles are then operated for a specified number of miles, and the amount of wear is recorded for each tire. These measurements are tabulated below.

Automobile   Tire A   Tire B
1            10.6     10.2
2             9.8      9.4
3            12.3     11.8
4             9.7      9.1
5             8.8      8.3
             $\bar{x}_1 = 10.24$   $\bar{x}_2 = 9.76$

Using the previous section test we would have $t = 0.57$, resulting in an insignificant test which is inconsistent with the data.

Automobile   Tire A   Tire B   d = A - B
1            10.6     10.2     .4
2             9.8      9.4     .4
3            12.3     11.8     .5
4             9.7      9.1     .6
5             8.8      8.3     .5
             $\bar{x}_1 = 10.24$   $\bar{x}_2 = 9.76$   $\bar{d} = .48$

Q1: Provide a summary of the data in the above table.
Sample summary: $n = 5$, $\bar{d} = .48$, $s_d = .0837$
Q2: Do the data provide sufficient evidence to indicate a difference in average wear for the two tire types?
Test. (parameter $\mu_d = \mu_1 - \mu_2$)
$H_0$: $\mu_d = 0$
$H_a$: $\mu_d \ne 0$
T.S.:
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} = \frac{.48 - 0}{.0837/\sqrt{5}} = 12.8$$


RR: Reject $H_0$ if $t > 2.776$ or $t < -2.776$ ($t_{.025,4} = 2.776$)
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to conclude that the average amount of wear for type A tire is different from that for type B tire.

Exercise. Construct a 99% confidence interval for the difference in average wear for the two tire types.

6 Inferences About a Population Variance

Chi-square distribution. When a random sample of size $n$ is drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $S^2$ depends on $n$. The standardized distribution of $S^2$ is called the chi-square distribution and is given by
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
Degrees of freedom (df): $\nu = n - 1$
Graph: Non-symmetrical and depends on df
Critical values: using $\chi^2$ tables
Test.
$H_0$: $\sigma^2 = \sigma_0^2$
$H_a$: $\sigma^2 \ne \sigma_0^2$ (two-tailed test).
T.S.:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
RR: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha/2}$ or $\chi^2 < \chi^2_{1-\alpha/2}$, where $\chi^2$ is based on $(n - 1)$ degrees of freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal population
2. Random sample
Example:

Use text

7 Comparing Two Population Variances

F-distribution. When independent samples are drawn from two normal populations with equal variances, then $S_1^2/S_2^2$ possesses a sampling distribution that is known as an F distribution. That is
$$F = \frac{s_1^2}{s_2^2}$$
Degrees of freedom (df): $\nu_1 = n_1 - 1$; $\nu_2 = n_2 - 1$
Graph: Non-symmetrical and depends on df
Critical values: using F tables
Test.
$H_0$: $\sigma_1^2 = \sigma_2^2$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test).
T.S.: $F = \dfrac{s_1^2}{s_2^2}$ where $s_1^2$ is the larger sample variance.
Note: $F = \dfrac{\text{larger sample variance}}{\text{smaller sample variance}}$
RR: Reject $H_0$ if $F > F_{\alpha/2}$ where $F_{\alpha/2}$ is based on $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.
Graph:
Decision:
Conclusion:
Assumptions.
1. Normal populations
2. Independent random samples

Example. (Investment Risk) Investment risk is generally measured by the volatility of possible outcomes of the investment. The most common method for measuring investment volatility is by computing the variance (or standard deviation) of possible outcomes. Returns over the past 10 years for the first alternative and 8 years for the second alternative produced the following data:
Data Summary:
Investment 1: $n_1 = 10$, $\bar{x}_1 = 17.8\%$; $s_1^2 = 3.21$
Investment 2: $n_2 = 8$, $\bar{x}_2 = 17.8\%$; $s_2^2 = 7.14$
Both populations are assumed to be normally distributed.

Q1: Do the data present sufficient evidence to indicate that the risks for investments 1 and 2 are unequal?
Solution.
Test:
$H_0$: $\sigma_1^2 = \sigma_2^2$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test).
T.S.:
$$F = \frac{s_2^2}{s_1^2} = \frac{7.14}{3.21} = 2.22$$
RR: Reject $H_0$ if $F > F_{\alpha/2}$ where $F_{\alpha/2,\,n_2-1,\,n_1-1} = F_{.025,7,9} = 4.20$
Graph:
Decision: Do not reject $H_0$
Conclusion: At 5% significance level there is insufficient statistical evidence to indicate that the risks for investments 1 and 2 are unequal.

Exercise. Do the upper tail test. That is $H_a$: $\sigma_1^2 > \sigma_2^2$.
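
The two-tailed variance-ratio test above places the larger sample variance in the numerator and orders the degrees of freedom to match. A minimal Python sketch of that bookkeeping follows; it is not part of the original notes, the function name is illustrative, and the critical value is passed in as a number read from an F table (here $F_{.025,7,9} = 4.20$, as quoted in the solution).

```python
def variance_ratio_test(var1, n1, var2, n2, f_critical):
    """Two-tailed F test with the larger sample variance in the numerator.

    Returns (F, (numerator df, denominator df), reject H0?).
    """
    if var1 >= var2:
        f_stat, df = var1 / var2, (n1 - 1, n2 - 1)
    else:
        f_stat, df = var2 / var1, (n2 - 1, n1 - 1)
    return f_stat, df, f_stat > f_critical


# Investment risk example: s1^2 = 3.21 (n1 = 10), s2^2 = 7.14 (n2 = 8).
f_stat, df, reject = variance_ratio_test(3.21, 10, 7.14, 8, f_critical=4.20)

print(round(f_stat, 2), df, reject)
```

The degrees of freedom returned tell you which F-table entry to look up; swapping them is a common source of error when the second sample has the larger variance, as it does here.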

Chapter 9
Analysis of Variance

Contents.
1. Introduction
2. One Way ANOVA: Completely Randomized Experimental Design
3. The Randomized Block Design

1 Introduction

Analysis of variance is a statistical technique used to compare more than two population means by isolating the sources of variability.

Example. Four groups of sales people for a magazine sales agency were subjected to different sales training programs. Because there were some dropouts during the training program, the number of trainees varied from program to program. At the end of the training programs each salesperson was assigned a sales area from a group of sales areas that were judged to have equivalent sales potentials. The table below lists the number of sales made by each person in each of the four groups of sales people during the first week after completing the training program. Do the data present sufficient evidence to indicate a difference in the mean achievement for the four training programs?

Goal. Test whether the means are equal or not. That is
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: Not all means are equal

Definitions:
(i) Response: variable of interest or dependent variable (sales)
(ii) Factor: categorical variable or independent variable (training technique)
(iii) Treatment levels (factor levels): method of training; $t = 4$

                 Training Group
             1        2        3        4
             65       75       59       94
             87       69       78       89
             73       83       67       80
             79       81       62       88
             81       72       83
             69       79       76
                      90
$n_i$        6        7        6        4        $n = 23$
$T_i$        454      549      425      351      GT = 1779
$\bar{T}_i$  75.67    78.43    70.83    87.75
parameter    $\mu_1$  $\mu_2$  $\mu_3$  $\mu_4$

(iv) ANOVA: ANalysis OF VAriance
(v) N-Way ANOVA: studies N factors.
(vi) Experimental unit: (trainee)

2 One Way ANOVA: Completely Randomized Experimental Design

ANOVA Table
Source of error   df   SS        MS      F      p-value
Treatments        3    712.6     237.5   3.77
Error             19   1,196.6   63.0
Totals            22   1909.2

Inferences about population means
Test.
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: Not all means are equal
T.S.: $F = \dfrac{MST}{MSE} = 3.77$
where $F$ is based on $(t-1)$ and $(n-t)$ df.
RR: Reject $H_0$ if $F > F_{\alpha,\,t-1,\,n-t}$
i.e. Reject $H_0$ if $F > F_{0.05,3,19} = 3.13$

Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a difference in the mean achievement for the four training programs.
Assumptions.
1. Sampled populations are normal
2. Independent random samples
3. All $t$ populations have equal variances

Computations.

ANOVA Table
S of error   df      SS     MS                F         p-value
Trments      t-1     SST    MST=SST/(t-1)     MST/MSE
Error        n-t     SSE    MSE=SSE/(n-t)
Totals       n-1     TSS

                 Training Group
             1          2          3          4
             $x_{11}$   $x_{21}$   $x_{31}$   $x_{41}$
             $x_{12}$   $x_{22}$   $x_{32}$   $x_{42}$
             $x_{13}$   $x_{23}$   $x_{33}$   $x_{43}$
             $x_{14}$   $x_{24}$   $x_{34}$   $x_{44}$
             $x_{15}$   $x_{25}$   $x_{35}$
             $x_{16}$   $x_{26}$   $x_{36}$
                        $x_{27}$
$n_i$        $n_1$      $n_2$      $n_3$      $n_4$      $n$
$T_i$        $T_1$      $T_2$      $T_3$      $T_4$      GT
$\bar{T}_i$  $\bar{T}_1$  $\bar{T}_2$  $\bar{T}_3$  $\bar{T}_4$
parameter    $\mu_1$    $\mu_2$    $\mu_3$    $\mu_4$

Notation:
TSS: sum of squares of total deviation.
SST: sum of squares of total deviation between treatments.
SSE: sum of squares of total deviation within treatments (error).
CM: correction for the mean

GT: Grand Total.

Computational Formulas for TSS, SST and SSE:
$$TSS = \sum_{i=1}^{t} \sum_{j=1}^{n_i} x_{ij}^2 - CM$$
$$SST = \sum_{i=1}^{t} \frac{T_i^2}{n_i} - CM$$
$$SSE = TSS - SST$$
Calculations for the training example produce
$CM = (\sum\sum x_{ij})^2 / n = 1{,}779^2/23 = 137{,}601.8$
$TSS = \sum\sum x_{ij}^2 - CM = 1{,}909.2$
$SST = \sum \frac{T_i^2}{n_i} - CM = 712.6$
$SSE = TSS - SST = 1{,}196.6$
Thus

ANOVA Table
Source of error   df   SS        MS      F      p-value
Treatments        3    712.6     237.5   3.77
Error             19   1,196.6   63.0
Totals            22   1909.2

Confidence Intervals.
Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n - t}}$$
CI for $\mu_i$:
$$\bar{T}_i \pm t_{\alpha/2,\,n-t} \frac{s}{\sqrt{n_i}}$$
CI for $\mu_i - \mu_j$:
$$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2,\,n-t} \sqrt{s^2 \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$

MINITAB
MTB> aovoneway C1-C4.
Exercise. Produce a Minitab output for the above example.
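
The computational formulas for CM, TSS, SST and SSE can also be checked outside Minitab. The following Python sketch is not part of the original notes; it applies the formulas above to the training data, and the function and variable names are illustrative choices.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares via the computational (CM) formulas.

    `groups` is a list of lists, one list of observations per treatment.
    """
    t = len(groups)
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cm = grand_total ** 2 / n                      # correction for the mean
    tss = sum(x * x for g in groups for x in g) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = tss - sst
    mst = sst / (t - 1)
    mse = sse / (n - t)
    return {"CM": cm, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse,
            "df": (t - 1, n - t)}


# Sales data for the four training groups.
training = [
    [65, 87, 73, 79, 81, 69],
    [75, 69, 83, 81, 72, 79, 90],
    [59, 78, 67, 62, 83, 76],
    [94, 89, 80, 88],
]
result = one_way_anova(training)

print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in result.items()})
```

The resulting F statistic is then compared with the table value $F_{0.05,3,19} = 3.13$, as in the test above.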

3 The Randomized Block Design

Extends paired-difference design to more than two treatments.
A randomized block design consists of $b$ blocks, each containing $t$ experimental units. The $t$ treatments are randomly assigned to the units in each block, and each treatment appears once in every block.

Example. A consumer preference study involving three different package designs (treatments) was laid out in a randomized block design among four supermarkets (blocks). The data shown in Table 1 below represent the number of units sold for each package design within each supermarket during each of three given weeks.
(i) Provide a data summary.
(ii) Do the data present sufficient evidence to indicate a difference in the mean sales for each package design (treatment)?
(iii) Do the data present sufficient evidence to indicate a difference in the mean sales for the supermarkets?

              weeks
       w1        w2        w3
s1     (1) 17    (3) 23    (2) 34
s2     (3) 21    (1) 15    (2) 26
s3     (1) 1     (2) 23    (3) 8
s4     (2) 22    (1) 6     (3) 16

Remarks.
(i) In each supermarket (block) the first entry represents the design (treatment) and the second entry represents the sales per week.
(ii) The three designs are assigned to each supermarket completely at random.
(iii) An alternate design would be to use 12 supermarkets. Each design (treatment) would be randomly assigned to 4 supermarkets. In this case the difference in sales could be due to more than just differences in package design. That is, larger supermarkets would be expected to have larger overall sales of the product than smaller supermarkets. The randomized block design eliminates the store-to-store variability.

For computational purposes we rearrange the data so that

Data Summary. The treatment and block totals are
$t = 3$ treatments; $b = 4$ blocks

         Treatments
       t1      t2      t3      $B_i$
s1     17      34      23      $B_1$
s2     15      26      21      $B_2$
s3     1       23      8       $B_3$
s4     6       22      16      $B_4$
$T_i$  $T_1$   $T_2$   $T_3$

$T_1 = 39$, $T_2 = 105$, $T_3 = 68$
$B_1 = 74$, $B_2 = 62$, $B_3 = 32$, $B_4 = 44$

Calculations for this example produce
$CM = (\sum\sum x_{ij})^2 / n = 3{,}745.33$
$TSS = \sum\sum x_{ij}^2 - CM = 940.67$
$SST = \sum \frac{T_i^2}{b} - CM = 547.17$
$SSB = \sum \frac{B_i^2}{t} - CM = 348.00$
$SSE = TSS - SST - SSB = 45.50$

MINITAB. (Commands and Printouts)
MTB> Print C1-C3

ROW   UNITS   TRTS   BLOCKS
1     17      1      1
2     34      2      1
3     23      3      1
4     15      1      2
5     26      2      2
6     21      3      2
7     1       1      3
8     23      2      3
9     8       3      3
10    6       1      4
11    22      2      4
12    16      3      4

MTB> ANOVA C1=C2 C3

ANOVA Table
Source of error   df   SS       MS       F       p-value
Treatments        2    547.17   273.58   36.08   0.000
Blocks            3    348.00   116.00   15.30   0.003
Error             6    45.50    7.58
Totals            11   940.67
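
The Minitab table can be cross-checked from the computational formulas for CM, TSS, SST, SSB and SSE. The Python sketch below is not part of the original notes; it assumes the data are stored with one row per block (supermarket) and one column per treatment (package design), and the function name is illustrative.

```python
def randomized_block_anova(table):
    """Sums of squares and F ratios for a randomized block design.

    `table[i][j]` is the response for block i under treatment j.
    """
    b = len(table)          # number of blocks
    t = len(table[0])       # number of treatments
    n = b * t
    grand_total = sum(sum(row) for row in table)
    cm = grand_total ** 2 / n
    tss = sum(x * x for row in table for x in row) - cm
    sst = sum(sum(col) ** 2 for col in zip(*table)) / b - cm   # treatments
    ssb = sum(sum(row) ** 2 for row in table) / t - cm         # blocks
    sse = tss - sst - ssb
    mse = sse / ((t - 1) * (b - 1))
    return {"CM": cm, "TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatments": (sst / (t - 1)) / mse,
            "F_blocks": (ssb / (b - 1)) / mse}


# Rows: supermarkets s1..s4; columns: package designs t1..t3.
sales = [
    [17, 34, 23],
    [15, 26, 21],
    [1, 23, 8],
    [6, 22, 16],
]
result = randomized_block_anova(sales)

print({k: round(v, 2) for k, v in result.items()})
```

Because the sketch carries full precision through the division, its treatment F ratio can differ in the last printed digit from a value obtained by dividing the rounded mean squares.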

Solution to (ii)
Test.
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_a$: Not all means are equal
T.S.: $F = \dfrac{MST}{MSE} = 36.09$
where $F$ is based on $(t-1)$ and $(n-t-b+1)$ df.
RR: Reject $H_0$ if $F > F_{\alpha,\,t-1,\,n-t-b+1}$
i.e. Reject $H_0$ if $F > F_{0.05,2,6} = 5.14$
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the three package designs.

Solution to (iii)
Test.
$H_0$: Block means are equal
$H_a$: Not all block means are equal (i.e. blocking is desirable)
T.S.: $F = \dfrac{MSB}{MSE} = 15.30$
where $F$ is based on $(b-1)$ and $(n-t-b+1)$ df.
RR: Reject $H_0$ if $F > F_{\alpha,\,b-1,\,n-t-b+1}$
i.e. Reject $H_0$ if $F > F_{0.005,3,6} = 12.92$
Graph:
Decision: Reject $H_0$
Conclusion: At .5% significance level there is sufficient statistical evidence to indicate a real difference in the mean sales for the four supermarkets, that is, the data supports our decision to use supermarkets as blocks.

Assumptions.
1. Sampled populations are normal
2. Dependent random samples due to blocking
3. All $t$ populations have equal variances

Confidence Intervals.
Estimate of the common variance:
$$s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n - t - b + 1}}$$
Note that $n - t - b + 1 = (t - 1)(b - 1)$.
CI for $\mu_i - \mu_j$:

$$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2,\,n-t-b+1}\; s \sqrt{\frac{2}{b}}$$
Exercise. Construct a 90% C.I. for the difference between mean sales from package designs 1 and 2.

Chapter 10
Simple Linear Regression and Correlation

Contents.
1. Introduction: Example
2. A Simple Linear probabilistic model
3. Least squares prediction equation
4. Inferences concerning the slope
5. Estimating E(y|x) for a given x
6. Predicting y for a given x
7. Coefficient of correlation
8. Analysis of Variance
9. Computer Printouts

1 Introduction

Linear regression is a statistical technique used to predict (forecast) the value of a variable from known related variables.

Example. (Ad Sales) Consider the problem of predicting the gross monthly sales volume $y$ for a corporation that is not subject to substantial seasonal variation in its sales volume. For the predictor variable $x$ we use the amount spent by the company on advertising during the month of interest. We wish to determine whether advertising is worthwhile, that is, whether advertising is actually related to the firm's sales volume. In addition we wish to use the amount spent on advertising to predict the sales volume. The data in the table below represent a sample of advertising expenditures, $x$, and the associated sales

3 90 0.6 91 0. 2.1 Definitions.000) 101 1.0 75 0. Y = β0 + β1 X + where x: independent variable (predictor) y: dependent variable (response) β0 and β1 are unknown parameters. Assumptions. (i) Response: dependent variable of interest (sales volume) (ii) Independent (predictor) variable ( Ad expenditure) (iii) Linear equations (straight line): y = a + bx Scatter diagram: Best fit straight line: Equation of a straight line: (y-intercept and slope) 2 A Simple Linear Probabilistic Model Model.volume.000) x(x$10.8 93 1.2 92 0. y.9 105 1. 99 . : random error due to other factors not included in the model. Month 1 2 3 4 5 6 7 8 9 10 Ad Sales Data y(y$10.8 110 1.0 120 1. V ar( ) := σ 2 = σ 2 . E( ) := µ = 0. 1. for 10 randomly selected months.7 82 0.

3. The r.v. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.
4. The random components of any two observed $y$ values are independent.

3 Least Squares Prediction Equation

The least squares prediction equation is sometimes called the estimated regression equation or the prediction equation.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
This equation is obtained by using the method of least squares; that is
$$\min \sum (y - \hat{y})^2$$
Computational Formulas.
Objective: Estimate $\beta_0$, $\beta_1$ and $\sigma^2$.
$\bar{x} = \sum x / n$; $\bar{y} = \sum y / n$
$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$
$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - (\sum y)^2 / n$
$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - (\sum x)(\sum y)/n$
$\hat{\beta}_1 = SS_{xy}/SS_{xx}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
To estimate $\sigma^2$:
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = SS_{yy} - (SS_{xy})^2 / SS_{xx}$.
$$s^2 = \frac{SSE}{n - 2}$$
Remarks.
(i) $\hat{\beta}_1$ is the slope of the estimated regression equation.
(ii) $s^2$ provides a measure of spread of points $(x, y)$ around the regression line.

Ad Sales example
Question 1. Do a scatter diagram. Can you say that $x$ and $y$ are linearly related?
Answer.
Question 2. Use the computational formulas to provide a data summary.

Answer. Data Summary.
$\bar{x} = 0.94$, $\bar{y} = 95.9$
$SS_{xx} = 0.444$
$SS_{xy} = 23.34$
$SS_{yy} = 1600.9$

Optional material

Ad Sales Calculations
Month    x      y      x^2     xy       y^2
1        1.2    101    1.44    121.2    10,201
2        0.8     92    0.64     73.6     8,464
3        1.0    110    1.00    110.0    12,100
4        1.3    120    1.69    156.0    14,400
5        0.7     90    0.49     63.0     8,100
6        0.8     82    0.64     65.6     6,724
7        1.0     93    1.00     93.0     8,649
8        0.6     75    0.36     45.0     5,625
9        0.9     91    0.81     81.9     8,281
10       1.1    105    1.21    115.5    11,025
Sum      9.4    959    9.28    924.8    93,569

$\bar{x} = \sum x / n = 0.94$, $\bar{y} = \sum y / n = 95.9$
$SS_{xx} = \sum x^2 - (\sum x)^2 / n = 9.28 - (9.4)^2 / 10 = 0.444$
$SS_{xy} = \sum xy - (\sum x)(\sum y) / n = 924.8 - (9.4)(959) / 10 = 23.34$
$SS_{yy} = \sum y^2 - (\sum y)^2 / n = 93{,}569 - (959)^2 / 10 = 1600.9$
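The computational formulas of Section 3 can be checked numerically. The following is a minimal Python sketch, not part of the original notes, that applies them to the Ad Sales data; the function name `least_squares` and the dictionary keys are illustrative choices.

```python
# Ad Sales data from the table above (both variables in units of $10,000).
x = [1.2, 0.8, 1.0, 1.3, 0.7, 0.8, 1.0, 0.6, 0.9, 1.1]   # advertising expenditure
y = [101, 92, 110, 120, 90, 82, 93, 75, 91, 105]          # sales volume


def least_squares(x, y):
    """Apply the computational formulas for simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_xx                  # slope estimate
    b0 = sum_y / n - b1 * sum_x / n     # intercept estimate
    sse = ss_yy - b1 * ss_xy            # sum of squares for error
    s2 = sse / (n - 2)                  # estimate of sigma^2
    return {"ss_xx": ss_xx, "ss_xy": ss_xy, "ss_yy": ss_yy,
            "b0": b0, "b1": b1, "sse": sse, "s2": s2}


fit = least_squares(x, y)
for name, value in fit.items():
    print(f"{name:6s} = {value:10.4f}")
```

The printed values should agree with the data summary up to rounding. Small differences in the last digit arise because the notes round intermediate results before reusing them.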

Question 3. Estimate the parameters $\beta_0$ and $\beta_1$.
Answer.
$\hat{\beta}_1 = SS_{xy} / SS_{xx} = 23.34 / 0.444 = 52.5676 \approx 52.57$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 95.9 - (52.5676)(0.94) \approx 46.49$.

Question 4. Estimate $\sigma^2$.
Answer.
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1{,}600.9 - (52.5676)(23.34) = 373.97$.
Therefore
$s^2 = SSE / (n - 2) = 373.97 / 8 = 46.75$

Question 5. Find the least squares line for the data.
Answer.
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 46.49 + 52.57x$
Remark. This equation is also called the estimated regression equation or prediction line.

Question 6. Predict sales volume, y, for a given expenditure level of $10,000 (i.e. x = 1.0).
Answer.
$\hat{y} = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$.
So sales volume is $990,600.

Question 7. Predict the mean sales volume E(y|x) for a given expenditure level of $10,000, x = 1.0.
Answer.
$E(y|x) = 46.49 + 52.57x = 46.49 + (52.57)(1.0) = 99.06$
so the mean sales volume is $990,600.
Remark. In Question 6 and Question 7 we obtained the same estimate; the bound on the error of estimation will, however, be different.

4 Inferences Concerning the Slope

Parameter of interest: $\beta_1$
Point estimator: $\hat{\beta}_1$

Estimator mean: $\mu_{\hat{\beta}_1} = \beta_1$
Estimator standard error: $\sigma_{\hat{\beta}_1} = \sigma / \sqrt{SS_{xx}}$

Test.
$H_0: \beta_1 = \beta_{10}$ (no linear relationship when $\beta_{10} = 0$)
$H_a: \beta_1 \neq \beta_{10}$ (there is a linear relationship when $\beta_{10} = 0$)
T.S.:
$t = \dfrac{\hat{\beta}_1 - \beta_{10}}{s / \sqrt{SS_{xx}}}$
RR: Reject $H_0$ if $t > t_{\alpha/2, n-2}$ or $t < -t_{\alpha/2, n-2}$
Graph:
Decision:
Conclusion:

Question 8. Determine whether there is evidence to indicate a linear relationship between advertising expenditure, x, and sales volume, y.
Answer.
Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)
T.S.:
$t = \dfrac{\hat{\beta}_1 - 0}{s / \sqrt{SS_{xx}}} = \dfrac{52.57 - 0}{6.84 / \sqrt{.444}} = 5.12$
RR: (critical value: $t_{.025, 8} = 2.306$)
Reject $H_0$ if t > 2.306 or t < −2.306
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, x, and sales volume, y.

Confidence interval for $\beta_1$:
$\hat{\beta}_1 \pm t_{\alpha/2, n-2} \dfrac{s}{\sqrt{SS_{xx}}}$

Question 9. Find a 95% confidence interval for $\beta_1$.
Answer.

$\hat{\beta}_1 \pm t_{\alpha/2, n-2} \dfrac{s}{\sqrt{SS_{xx}}}$
$52.57 \pm 2.306 \dfrac{6.84}{\sqrt{.444}}$
$52.57 \pm 23.67 = (28.90, 76.24)$

5 Estimating E(y|x) For a Given x

The confidence interval (CI) for the expected (mean) value of y given $x = x_p$ is given by
$\hat{y} \pm t_{\alpha/2, n-2} \sqrt{s^2 \left[ \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}} \right]}$

6 Predicting y for a Given x

The prediction interval (PI) for a particular value of y given $x = x_p$ is given by
$\hat{y} \pm t_{\alpha/2, n-2} \sqrt{s^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}} \right]}$

7 Coefficient of Correlation

In a previous section we tested for a linear relationship between x and y. Now we examine how strong the linear relationship between x and y is. We call this measure the coefficient of correlation between y and x.
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}$

Remarks.
(i) $-1 \leq r \leq 1$.

(ii) The population coefficient of correlation is $\rho$.
(iii) r > 0 indicates a positive correlation ($\hat{\beta}_1 > 0$)
(iv) r < 0 indicates a negative correlation ($\hat{\beta}_1 < 0$)
(v) r = 0 indicates no correlation ($\hat{\beta}_1 = 0$)

Question 10. Find the coefficient of correlation, r.
Answer.
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \dfrac{23.34}{\sqrt{0.444(1{,}600.9)}} = 0.88$

Coefficient of determination
Algebraic manipulations show that
$r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}}$

Question 11. By what percentage is the sum of squares of deviations of y about the mean ($SS_{yy}$) reduced by using $\hat{y}$ rather than $\bar{y}$ as a predictor of y?
Answer.
$r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}} = 0.88^2 = 0.77$
so the sum of squares is reduced by about 77%.
$r^2$ is called the coefficient of determination.

8 Analysis of Variance

Notation:
$TSS := SS_{yy} = \sum (y - \bar{y})^2$ (Total SS of deviations).
$SSR = \sum (\hat{y} - \bar{y})^2$ (SS of deviations due to regression or explained deviations)
$SSE = \sum (y - \hat{y})^2$ (SS of deviations for the error or unexplained deviations)
$TSS = SSR + SSE$

Question 12. Give the ANOVA table for the Ad Sales example.
Question 13. Use the ANOVA table to test for a significant linear relationship between sales and advertising expenditure.

ANOVA Table
Source    df     SS     MS               F          p-value
Reg.      1      SSR    MSR=SSR/(1)      MSR/MSE
Error     n-2    SSE    MSE=SSE/(n-2)
Totals    n-1    TSS

Answer.
ANOVA Table
Source    df    SS           MS           F        p-value
Reg.      1     1,226.927    1,226.927    26.25    0.0001
Error     8       373.973       46.747
Totals    9     1,600.900

Test.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_a: \beta_1 \neq 0$ (there is a linear relationship)
T.S.: $F = MSR / MSE = 26.25$
RR: (critical value: $F_{.005, 1, 8} = 14.69$)
Reject $H_0$ if F > 14.69
(OR: Reject $H_0$ if $\alpha$ > p-value)
Graph:
Decision: Reject $H_0$
Conclusion: At 0.5% significance level there is sufficient statistical evidence to indicate a linear relationship between advertising expenditure, x, and sales volume, y.

9 Computer Printouts for Regression Analysis

Store y in C1 and x in C2.
MTB> Plot C1 C2. : Gives a scatter diagram.
MTB> Regress C1 1 C2.

Computer output for Ad sales example:

The regression equation is
y=46.5 + 52.6 x
Predictor    Coef      Stdev    t-ratio    P
Constant     46.486    9.885    4.70       0.000
x            52.57     10.26    5.12       0.000
s=6.837    R-sq=76.6%    R-sq(adj)=73.7%

Analysis of Variance
Source    df    SS           MS           F        p-value
Reg.      1     1,226.927    1,226.927    26.25    0.000
Error     8       373.973       46.747
Totals    9     1,600.900

More generally we obtain:

The regression equation is
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Predictor    Coef                Stdev                        t-ratio    P
Constant     $\hat{\beta}_0$     $\sigma_{\hat{\beta}_0}$     TS: t      p-value
x            $\hat{\beta}_1$     $\sigma_{\hat{\beta}_1}$     TS: t      p-value
$s = \sqrt{MSE}$    R-sq = $r^2$    R-sq(adj)

Analysis of Variance
Source    df     SS     MS               F          p-value
Reg.      1      SSR    MSR=SSR/(1)      MSR/MSE
Error     n-2    SSE    MSE=SSE/(n-2)
Totals    n-1    TSS

Review Exercises: Linear Regression

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. Given the following data set
x    -3    -1    1    1    2
y     6     4    3    1    1
(i) Plot the scatter diagram, and indicate whether x and y appear linearly related.
(ii) Show that $\sum x = 0$; $\sum y = 15$; $\sum x^2 = 16$; $\sum y^2 = 63$; $SS_{xx} = 16$; $SS_{yy} = 18$; and $SS_{xy} = -16$.
(iii) Find the regression equation for the data. (Answer: $\hat{y} = 3 - x$)
(iv) Plot the regression equation on the same graph as (i). Does the line appear to provide a good fit for the data points?
(v) Compute SSE and $s^2$. (Answer: $s^2 = 2/3$)
(vi) Estimate the expected value of y when x = −1.
(vii) Find the correlation coefficient r and find $r^2$. (Answer: r = −.943, $r^2$ = .889)

2. A study of middle to upper-level managers is undertaken to investigate the relationship between salary level, Y, and years of work experience, X. A random sample of 20 managers is chosen with the following results (in thousands of dollars): $\sum x_i = 235$; $\sum y_i = 763.8$; $SS_{xx} = 485.75$; $SS_{yy} = 2{,}236.1$; and $SS_{xy} = 886.85$. It is further assumed that the relationship is linear.
(i) Find $\hat{\beta}_0$, $\hat{\beta}_1$, and the estimated regression equation. (Answer: $\hat{y} = 16.73 + 1.826x$)
(ii) Find the correlation coefficient, r. (Answer: r = .85)
(iii) Find $r^2$ and interpret its value.

3. The Minitab Regress command has been applied to data on family income, X, and last year's energy consumption, Y, from a random sample of 25 families. The income data are in thousands of dollars and the energy consumption data are in millions of BTU. A portion of a linear regression computer printout is shown below.

Predictor    Coef       stdev      t-ratio    P
Constant     82.036     2.054      39.94      0.000
X            0.93051    0.05727    16.25      0.000
s= ____    R-sq=92.0%    R-sq(adj)=91.6%

Analysis of Variance
Source        DF      SS      MS        F         P
Regression    ____    ____    7626.6    264.02    0.000
Error         23      ____    ____
Total         ____    8291

(i) Complete all missing entries in the table.
(ii) Find $\hat{\beta}_0$, $\hat{\beta}_1$, and the estimated regression equation.
(iii) Do the data present sufficient evidence to indicate that Y and X are linearly related? Test by using $\alpha = 0.01$.
(iv) Determine a point estimate for last year's mean energy consumption of all families with an annual income of $40,000.

4. Answer by True or False. (Circle your choice).
T F (i) The correlation coefficient r shows the degree of association between x and y.
T F (ii) The coefficient of determination $r^2$ shows the percentage change in y resulting from a one-unit change in x.
T F (iii) The last step in a simple regression analysis is drawing a scatter diagram.
T F (iv) r = 1 implies no linear correlation between x and y.
T F (v) We always estimate the value of a parameter and predict the value of a random variable.
T F (vi) If $\beta_1 = 1$, we always predict the same value of y regardless of the value of x.
T F (vii) It is necessary to assume that the response y of a probability model has a normal distribution if we are to estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$.
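The Ad Sales results of this chapter can be checked numerically. The Python sketch below is an illustration rather than part of the original notes. It starts from the data summary and recomputes the slope test statistic, the 95% confidence interval for the slope, r, r squared, the ANOVA F statistic, and the half-widths of the Section 5 and Section 6 intervals at x_p = 1.0. The critical value 2.306 is copied from the notes rather than computed, and all variable names are illustrative.

```python
import math

# Summary quantities from the Ad Sales data summary.
n = 10
x_bar, y_bar = 0.94, 95.9
ss_xx, ss_xy, ss_yy = 0.444, 23.34, 1600.9
t_crit = 2.306                      # t_{.025, 8}, copied from the notes

# Least squares estimates and the ANOVA decomposition.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = ss_yy - b1 * ss_xy
ssr = ss_yy - sse                   # TSS = SSR + SSE
s2 = sse / (n - 2)                  # MSE
s = math.sqrt(s2)

# Section 4: test of H0: beta1 = 0 and 95% CI for the slope.
se_b1 = s / math.sqrt(ss_xx)
t_stat = (b1 - 0) / se_b1
ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# Sections 7 and 8: correlation, determination, and the F statistic.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r2 = (ss_yy - sse) / ss_yy
f_stat = (ssr / 1) / s2

# Sections 5 and 6: interval half-widths at x_p = 1.0.
x_p = 1.0
y_hat = b0 + b1 * x_p
leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
half_ci = t_crit * math.sqrt(s2 * leverage)         # for E(y | x_p)
half_pi = t_crit * math.sqrt(s2 * (1 + leverage))   # for a single y

print(f"t = {t_stat:.2f}, F = {f_stat:.2f}, r = {r:.2f}, r^2 = {r2:.2f}")
print(f"95% CI for slope: ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
print(f"y_hat = {y_hat:.2f}, CI half-width = {half_ci:.2f}, "
      f"PI half-width = {half_pi:.2f}")
```

In simple linear regression the F statistic equals the square of the slope t statistic, so the two tests agree. The prediction interval is always wider than the confidence interval at the same x_p because of the extra 1 inside the bracket, which is the difference noted in the remark after Question 7.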

Chapter 11
Multiple Linear Regression

Contents.
1. Introduction: Example
2. Multiple Linear Model
3. Analysis of Variance
4. Computer Printouts

1 Introduction: Example

Multiple linear regression is a statistical technique used to predict (forecast) the value of a variable from multiple known related variables.

2 A Multiple Linear Model

Model.
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$
where
$x_i$: independent variables (predictors)
y: dependent variable (response)
$\beta_i$: unknown parameters.
$\epsilon$: random error due to other factors not included in the model.

Assumptions.
1. $E(\epsilon) := \mu_\epsilon = 0$.
2. $Var(\epsilon) := \sigma_\epsilon^2 = \sigma^2$.
3. $\epsilon$ has a normal distribution with mean 0 and variance $\sigma^2$.
4. The random components of any two observed y values are independent.

3 Least Squares Prediction Equation

Estimated Regression Equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$
This equation is obtained by using the method of least squares.

Multiple Regression Data
Obser.    y      x1     x2     x3
1         y1     x11    x21    x31
2         y2     x12    x22    x32
···       ···    ···    ···    ···
n         yn     x1n    x2n    x3n

Minitab Printout
The regression equation is
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$
Predictor    Coef                Stdev                        t-ratio    P
Constant     $\hat{\beta}_0$     $\sigma_{\hat{\beta}_0}$     TS: t      p-value
x1           $\hat{\beta}_1$     $\sigma_{\hat{\beta}_1}$     TS: t      p-value
x2           $\hat{\beta}_2$     $\sigma_{\hat{\beta}_2}$     TS: t      p-value
x3           $\hat{\beta}_3$     $\sigma_{\hat{\beta}_3}$     TS: t      p-value
$s = \sqrt{MSE}$    $R^2 = r^2$    $R^2$(adj)

Analysis of Variance
Source    df     SS     MS               F          p-value
Reg.      3      SSR    MSR=SSR/(3)      MSR/MSE
Error     n−4    SSE    MSE=SSE/(n-4)
Totals    n−1    TSS

Source    df    SS
x1        1     SSx1x1
x2        1     SSx2x2
x3        1     SSx3x3

Unusual observations (ignore)


MINITAB.
Use REGRESS command to regress y stored in C1 on the 3 predictor variables stored in C2 − C4.
MTB> Regress C1 3 C2-C4;
SUBC> Predict x1 x2 x3.
The subcommand PREDICT in Minitab, followed by fixed values of x1, x2, and x3, calculates the estimated value of $\hat{y}$ (Fit), its estimated standard error (Stdev.Fit), a 95% CI for E(y), and a 95% PI for y.

Example. A county assessor wishes to develop a model to relate the market value, y, of single-family residences in a community to the variables:
x1: living area in thousands of square feet;
x2: number of floors;
x3: number of bedrooms;
x4: number of baths.
Observations were recorded for 29 randomly selected single-family homes from residences recently sold at fair market value. The resulting prediction equation will then be used for assessing the values of single family residences in the county to establish the amount each homeowner owes in property taxes.

A Minitab printout is given below:
MTB> Regress C1 4 C2-C5;
SUBC> Predict 1.0 1 3 2;
SUBC> Predict 1.4 2 3 2.5.

The regression equation is
y = −16.6 + 7.84x1 − 34.4x2 − 7.99x3 + 54.9x4
Predictor    Coef.     Stdev    t-ratio    P
Constant     −16.58    18.88    −0.88      0.389
x1           7.839     1.234    6.35       0.000
x2           −34.39    11.15    −3.09      0.005
x3           −7.990    8.249    −0.97      0.342
x4           54.93     13.52    4.06       0.000
s = 16.58    R2 = 88.2%    R2 (adj) = 86.2%

Analysis of Variance
Source    df    SS       MS       F        p-value
Reg.      4     49359    12340    44.88    0.000
Error     24    6599     275
Totals    28    55958

Source    df    SS
x1        1     44444
x2        1     59
x3        1     321
x4        1     4536

Fit       Stdev.Fit    95% C.I.            95% P.I.
113.32    5.80         (101.34, 125.30)    (77.05, 149.59)
137.75    5.48         (126.44, 149.07)    (101.70, 173.81)
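The entries of this printout are tied together by the formulas in the general Minitab layout, and the arithmetic can be checked directly. The Python sketch below is an illustration, not part of the original notes. It recomputes MSR, MSE, F, s, R2 and adjusted R2 from the ANOVA sums of squares, then evaluates the fitted equation. One reading is assumed here: the printed Fit values are reproduced when x1 is entered as 10 and 14 (matching x1 = 10 in Q4) rather than 1.0 and 1.4, which suggests the living-area variable is scaled in hundreds of square feet. The helper name `fitted_value` is hypothetical.

```python
import math

# ANOVA entries copied from the Minitab printout for the assessor example.
df_reg, df_err = 4, 24
ssr, sse = 49359.0, 6599.0
tss = ssr + sse                       # 55958 in the printout

msr = ssr / df_reg                    # mean square for regression
mse = sse / df_err                    # mean square for error
f_stat = msr / mse                    # test statistic for overall usefulness
s = math.sqrt(mse)                    # printed as s = 16.58
r2 = ssr / tss                        # printed as 88.2%
r2_adj = 1 - (sse / df_err) / (tss / (df_reg + df_err))   # printed as 86.2%

# Coefficients copied from the printout: constant, x1, x2, x3, x4.
coef = [-16.58, 7.839, -34.39, -7.990, 54.93]


def fitted_value(x1, x2, x3, x4):
    """Evaluate the estimated regression equation at the given predictors."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], (x1, x2, x3, x4)))


# Assumption: x1 entered as 10 and 14, which reproduces the printed fits.
fit_a = fitted_value(10, 1, 3, 2)
fit_b = fitted_value(14, 2, 3, 2.5)

print(f"MSR = {msr:.0f}, MSE = {mse:.0f}, F = {f_stat:.2f}, s = {s:.2f}")
print(f"R2 = {100 * r2:.1f}%, R2(adj) = {100 * r2_adj:.1f}%")
print(f"fits: {fit_a:.2f}, {fit_b:.2f}")
```

The fitted values differ from the printed 113.32 and 137.75 only in the second decimal place, which is consistent with the coefficients being rounded in the printout.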


Q1. What is the prediction equation?
The regression equation is
y = −16.6 + 7.84x1 − 34.4x2 − 7.99x3 + 54.9x4

Q2. What type of model has been chosen to fit the data?
Multiple linear regression model.

Q3. Do the data provide sufficient evidence to indicate that the model contributes information for the prediction of y? Test using $\alpha = 0.05$.
Test:
$H_0$: model not useful
$H_a$: model is useful
T.S.: p-value = 0.000
DR. Reject $H_0$ if $\alpha$ > p-value
Graph:
Decision: Reject $H_0$
Conclusion: At 5% significance level there is sufficient statistical evidence to indicate that the model contributes information for the prediction of y.

Q4. Give a 95% CI for E(y) and PI for y when x1 = 10, x2 = 1, x3 = 3, and x4 = 2.
CI: (101.34, 125.30)
PI: (77.05, 149.59)

Non-Linear Models
Example.
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_1^2 x_2$
