Professional Documents
Culture Documents
By
Ritwik Jain
Sudhanshu Kumar Ashwini
Standard Deviation Vs Variance
S.D has Same unit as that of sample
Variance Defines The Spread and distribution
S.D is more useful in displaying the data where as
Variance is more useful in calculation on data
What is a Probability Distribution?
Experiment: Toss a
coin three times.
Observe the number
of heads. The possible
results are: zero
heads, one head, two
heads, and three
heads.
What is the probability
distribution for the
3
number of heads?
Probability Distribution of Number of Heads
Observed in 3 Tosses of a Coin
4
Characteristics of a Probability Distribution
5
Random Variable
Random variable - a quantity resulting from an
experiment that, by chance, can assume different
values
7
Mean, S.D, Variance of a Probability
Distribution
MEAN
•The mean is a typical value used to represent the
central location of a probability distribution.
•The mean of a probability distribution is also
referred to as its expected value.
•S.D and Variance Measures the amount of spread
in a distribution
8
Binomial Probability Distribution
Characteristics of a Binomial Probability
Distribution
There are only two possible outcomes on a particular
trial of an experiment.
The outcomes are mutually exclusive,
The random variable is the result of counts.
Each trial is independent of any other trial
9
Real Life samples
A study in June 2003 by the Illinois Department of
Transportation concluded that 76.2 percent of front
seat occupants used seat belts. A sample of 12
vehicles is selected. What is the probability the
front seat occupants in at least 7 of the 12 vehicles
are wearing seat belts?
Poisson Probability Distribution
The Poisson probability distribution describes the
number of times some event occurs during a
specified interval. The interval may be time,
distance, area, or volume.
Assumptions of the Poisson Distribution
(1) The probability is proportional to the length of the
interval.
(2) The intervals are independent.
11
Poisson Probability Distribution
12
Poisson Probability Distribution - Example
13
Binomial Versus Poisson
The Uniform Distribution
15
Real Life Example
Southwest Arizona State University provides bus
service to students while they are on campus. A bus
arrives at the North Main Street and College Drive stop
every 30 minutes between 6 A.M. and 11 P.M. during
weekdays. Students arrive at the bus stop at random
times. The time that a student waits is uniformly
distributed from 0 to 30 minutes. How long will a
student “typically” have to wait for a bus? In other
words what is the mean waiting time? What is the
standard deviation of the waiting times? What is the
probability a student will wait more than 25 minutes?
Characteristics of a Normal
Probability Distribution
It is bell-shaped and has a single peak at the center of the distribution.
The arithmetic mean, median, and mode are equal
The total area under the curve is 1.00; half the area under the normal
curve is to the right of this center point and the other half to the left of
it.
It is symmetrical about the mean.
It is asymptotic: The curve gets closer and closer to the X-axis but
never actually touches it. To put it another way, the tails of the curve
extend indefinitely in both directions.
The location of a normal distribution is determined by the mean,, the
dispersion or spread of the distribution is determined by the standard
deviation,σ .
17
The Normal Distribution - Families
18
The Standard Normal Probability
Distribution
The standard normal distribution is a normal distribution
with a mean of 0 and a standard deviation of 1.
It is also called the z distribution.
A z-value is the distance between a selected value,
designated X, and the population mean , divided by the
population standard deviation, σ.
The formula is:
19
The Empirical Rule
About 68 percent of the
area under the normal
curve is within one
standard deviation of the
mean.
About 95 percent is within
two standard deviations of
the mean.
Practically all is within
three standard deviations
of the mean.
20
The Empirical Rule - Example
As part of its quality assurance
program, the Autolite
Battery Company conducts
tests on battery life. For a
particular D-cell alkaline
battery, the mean life is 19
hours. The useful life of the
battery follows a normal
distribution with a standard
deviation of 1.2 hours.
23
Probability Sampling
A probability sample is a sample
selected such that each item or person
in the population being studied has a
known likelihood of being included in
the sample.
24
Methods of Probability Sampling
25
Methods of Probability Sampling
26
Methods of Probability Sampling
In nonprobability sample inclusion in the sample is
based on the judgment of the person selecting the
sample.
27
Sampling Distribution of the Sample
Means
28
Central Limit Theorem
29
Using the Sampling
Distribution of the Sample Mean (Sigma Known)
X
z
n
30
Using the Sampling
Distribution of the Sample Mean (Sigma Unknown)
If the population does not follow the normal
distribution, but the sample is of at least 30
observations, the sample means will follow the normal
distribution.
To determine the probability a sample mean falls
within a particular region, use:
X
t
s n
31
Using the Sampling Distribution of the Sample Mean (Sigma
Known) - Example
The Quality Assurance Department for Cola, Inc., maintains records
regarding the amount of cola in its Jumbo bottle. The actual amount of
cola in each bottle is critical, but varies a small amount from one bottle
to the next. Cola, Inc., does not wish to underfill the bottles. On the
other hand, it cannot overfill each bottle. Its records indicate that the
amount of cola follows the normal probability distribution. The mean
amount per bottle is 31.2 ounces and the population standard deviation
is 0.4 ounces. At 8 A.M. today the quality technician randomly selected
16 bottles from the filling line. The mean amount of cola contained in
the bottles is 31.38 ounces.
Is this an unlikely result? Is it likely the process is putting too much soda
in the bottles? To put it another way, is the sampling error of 0.18
ounces unusual?
32
Point and Interval Estimates
A point estimate is the statistic, computed from sample
information, which is used to estimate the population
parameter.
A confidence interval estimate is a range of values constructed
from sample data so that the population parameter is likely to
occur within that range at a specified probability. The
specified probability is called the level of confidence.
33
Point and Interval Estimates
A point estimate is the statistic, computed from sample information, which is used to
estimate the population parameter.
A confidence interval estimate is a range of values constructed from sample data so that
the population parameter is likely to occur within that range at a specified probability.
The specified probability is called the level of confidence.
34
Factors Affecting Confidence Interval Estimates
The factors that determine the width of a
confidence interval are:
1.The sample size, n.
2.The variability in the population, usually σ
estimated by s.
3.The desired level of confidence.
35
Interval Estimates - Interpretation
For a 95% confidence interval about 95% of the similarly constructed intervals
will contain the parameter being estimated. Also 95% of the sample means
for a specified sample size will lie within 1.96 standard deviations of the
hypothesized population
36
Characteristics of the t-distribution
1. It is, like the z distribution, a continuous distribution.
2. It is, like the z distribution, bell-shaped and symmetrical.
3. There is not one t distribution, but rather a family of t
distributions. All t distributions have a mean of 0, but their
standard deviations differ according to the sample size, n.
4. The t distribution is more spread out and flatter at the
center than the standard normal distribution As the sample
size increases, however, the t distribution approaches the
standard normal distribution,
37
When to Use the z or t Distribution for
Confidence Interval Computation
38
Real Life Example
A tire manufacturer wishes to investigate the tread
life of its tires. A sample of 10 tires driven 50,000
miles revealed a sample mean of 0.32 inch of tread
remaining with a standard deviation of 0.09 inch.
Construct a 95 percent confidence interval for the
population mean. Would it be reasonable for the
manufacturer to conclude that after 50,000 miles
the population mean amount of tread remaining is
0.30 inches?
Population Proportion- Example
The union representing the Bottle Blowers of
America (BBA) is considering a proposal to merge
with the Teamsters Union. According to BBA
union bylaws, at least three-fourths of the union
membership must approve any merger. A random
sample of 2,000 current BBA members reveals
1,600 plan to vote for the merger proposal. What is
the estimate of the population proportion?
Develop a 95 percent confidence interval for the
population proportion. Basing your decision on
this sample information, can you conclude that the
necessary proportion of BBA members favor the
merger? Why?
What is a Hypothesis?
41
What is Hypothesis Testing?
Hypothesis testing is a procedure, based on
sample evidence and probability theory,
used to determine whether the hypothesis is
a reasonable statement and should not be
rejected, or is unreasonable and should be
rejected.
42
Hypothesis Testing Steps
43
Important Things to Remember about H0 and H1
H : null hypothesis and H : alternate hypothesis
0 1
H and H are mutually exclusive and collectively exhaustive
0 1
H is always presumed to be true
0
H has the burden of proof
1
A random sample (n) is used to “reject H ”
0
If we conclude 'do not reject H ', this does not necessarily mean that the
0
null hypothesis is true, it only suggests that there is not sufficient
evidence to reject H0; rejecting the null hypothesis then, suggests that
the alternative hypothesis may be true.
Equality is always part of H (e.g. “=” , “≥” , “≤”).
0
“≠” “<” and “>” always part of H
1
44
How to Set Up a Claim as Hypothesis
45
Left-tail or Right-tail Test?
• The direction of the test involving
claims that use the words “has
improved”, “is better than”, and the like Inequality
Keywords Part of:
will depend upon the variable being Symbol
46
47
Real Life example
Jamestown Steel Company manufactures and
assembles desks and other office equipment at several
plants in western New York State. The weekly
production of the Model A325 desk at the Fredonia
Plant follows the normal probability distribution with a
mean of 200 and a standard deviation of 16. Recently,
because of market expansion, new production methods
have been introduced and new employees hired. The
vice president of manufacturing would like to
investigate whether there has been a change in the
weekly production of the Model A325 desk.
Testing for a Population Mean with a
Known Population Standard Deviation- Example
49
Testing for a Population Mean with a
Known Population Standard Deviation- Example
Type II Error:
Defined as the probability of “accepting” the null
hypothesis when it is actually false.
This is denoted by the Greek letter “β”
51
Real Life Example Population Standard Deviation
Unknown -
The McFarland Insurance Company Claims Department reports the mean cost to
process a claim is $60. An industry comparison showed this amount to be
larger than most other insurance companies, so the company instituted cost-
cutting measures. To evaluate the effect of the cost-cutting measures, the
Supervisor of the Claims Department selected a random sample of 26 claims
processed last month. The sample information is reported below.
At the .01 significance level is it reasonable a claim is now less than $60?
52
Testing for a Population Mean with a
Known Population Standard Deviation- Example
53
Testing for a Population Mean with a
Known Population Standard Deviation- Example
55
Assumptions in Testing a Population Proportion using the z-
Distribution
56
Test Statistic for Testing a Single
Population Proportion
Hypothesized
population proportion
Sample proportion
p
z
(1 )
n
Sample size
57
Test Statistic for Testing a Single Population
Proportion - Example
58
Test Statistic for Testing a Single Population
Proportion - Example
59
Testing for a Population Proportion - Example
61
EXAMPLE 1 continued
62
Example 1 continued
63
Example 1 continued
Xs Xu
z
s2 u2
ns nu
5.5 5.3
The computed value of 3.13 is larger than the
2 2
0.40 0.30 critical value of 2.33. Our decision is to reject the
null hypothesis. The difference of .20 minutes
50 100 between the mean checkout time using the
standard method is too large to have occurred by
0.2
3.13 chance. We conclude the U-Scan method is
0.064 faster.
64
Two-Sample Tests about Proportions
Here are several examples.
The vice president of human resources wishes to know whether there is
a difference in the proportion of hourly employees who miss more than
5 days of work per year at the Atlanta and the Houston plants.
General Motors is considering a new design for the Pontiac Grand Am.
The design is shown to a group of potential buyers under 30 years of
age and another group over 60 years of age. Pontiac wishes to know
whether there is a difference in the proportion of the two groups who
like the new design.
A consultant to the airline industry is investigating the fear of flying
among adults. Specifically, the company wishes to know whether there
is a difference in the proportion of men versus women who are fearful
of flying.
65
Two Sample Tests of Proportions
We investigate whether two samples came from
populations with an equal proportion of successes.
66
Two Sample Tests of Proportions continued
67
Real Life
68
Two Sample Tests of Proportions - Example
69
Two Sample Tests of Proportions - Example
70
Two Sample Tests of Proportions - Example
The computed value of 2.21 is in the area of rejection. Therefore, the null hypothesis is
rejected at the .05 significance level. To put it another way, we reject the null hypothesis
that the proportion of young women who would purchase Heavenly is equal to the
proportion of older women who would purchase Heavenly.
71
Comparing Population Means with Unknown Population
Standard Deviations (the Pooled t-test)
72
Comparing Population Means with Unknown Population Standard
Deviations (the Pooled t-test) - Example
Step 2: State the level of significance. The .10 significance level is stated in
the problem.
73
Comparing Population Means with Unknown Population Standard
Deviations (the Pooled t-test) - Example
74
Comparing Population Means with Unknown Population Standard
Deviations (the Pooled t-test) - Example
75
Comparing Population Means with Unknown Population Standard
Deviations (the Pooled t-test) - Example
76
Comparing Population Means with Unequal Population
Standard Deviations
77
Two-Sample Tests of Hypothesis:
Dependent Samples
For example:
If you wished to buy a car you would look at the same
car at two (or more) different dealerships and compare
the prices.
If you wished to measure the effectiveness of a new diet
you would weigh the dieters at the start and at the finish
of the program.
78
Hypothesis Testing Involving Paired
Observations
d
t
sd / n
Where
is the mean of the differences
sdd is the standard deviation of the differences
n is the number of pairs (differences)
79
Hypothesis Testing Involving Paired Observations -
Example
80
Hypothesis Testing Involving Paired
Observations - Example
81
Characteristics of F-Distribution
There is a “family” of F
Distributions.
Each member of the family is
determined by two parameters: the
numerator degrees of freedom and
the denominator degrees of
freedom.
F cannot be negative, and it is a
continuous distribution.
The F distribution is positively
skewed.
Its values range from 0 to
As F the curve approaches
the X-axis.
82
Comparing Two Population Variances
The F distribution is used to test the hypothesis that the variance of one normal
population equals the variance of another normal population. The following
examples will show the use of the test:
Two Barth shearing machines are set to produce steel bars of the same length.
The bars, therefore, should have the same mean length. We want to ensure that
in addition to having the same mean length they also have similar variation.
The mean rate of return on two types of common stock may be the same, but
there may be more variation in the rate of return in one than the other. A sample
of 10 technology and 10 utility stocks shows the same mean rate of return, but
there is likely more variation in the Internet stocks.
A study by the marketing department for a large newspaper found that men and
women spent about the same amount of time per day reading the paper.
However, the same report indicated there was nearly twice as much variation in
time spent per day among the men than the women.
83
Test for Equal Variances
84
Test for Equal Variances - Example
Lammers Limos offers limousine service from the city hall in Toledo,
Ohio, to Metro Airport in Detroit. Sean Lammers, president of the
company, is considering two routes. One is via U.S. 25 and the other via I-
75. He wants to study the time it takes to drive to the airport using each
route and then compare the results. He collected the following sample data,
which is reported in minutes.
Using the .10 significance level, is there a difference in the variation in the
driving times for the two routes?
85
Test for Equal Variances - Example
86
Test for Equal Variances - Example
87
Test for Equal Variances - Example
Step 5: Compute the value of F and make a decision
We conclude that there is a difference in the variation of the travel times along
88
the two routes.
Comparing Means of Two or More
Populations
The F distribution is also used for testing whether two
or more sample means came from the same or equal
populations.
Assumptions:
The sampled populations follow the normal
distribution.
The populations have equal standard deviations.
The samples are randomly selected and are
independent.
89
Comparing Means of Two or More
Populations
The Null Hypothesis is that the population means are
the same. The Alternative Hypothesis is that at least
one of the means is different.
The Test Statistic is the F distribution.
The Decision rule is to reject the null hypothesis if F
(computed) is greater than F (table) with numerator
and denominator degrees of freedom.
Hypothesis Setup and Decision Rule:
H0: µ1 = µ2 =…= µk
H1: The means are not all equal
Reject H0 if F > F,k-1,n-k
90
Analysis of Variance – F statistic
SST k 1
F
SSE n k
91
Real Life Exmple
92
Comparing Means of Two or More Populations –
Example
94
Comparing Means of Two or More Populations – Example
95
Comparing Means of Two or More Populations –
Example
96
Computing SS Total and SSE
97
Computing SST
The computed value of F is 8.99, which is greater than the critical value of 5.09,
so the null hypothesis is rejected.
Conclusion: The population means are not all equal. The mean scores are not
the same for the four airlines; at this point we can only conclude there is a
difference in the treatment means. We cannot determine which treatment groups
differ or how many treatment groups differ.
98
Two-Way Analysis of Variance
B r2 ( X ) 2
SSB
k n
99
Two-Way Analysis of Variance - Example
WARTA, the Warren Area Regional Transit Authority, is expanding
bus service from the suburb of Starbrick into the central business
district of Warren. There are four routes being considered from
Starbrick to downtown Warren: (1) via U.S. 6, (2) via the West
End, (3) via the Hickory Street Bridge, and (4) via Route 59.
WARTA conducted several tests to determine whether there was
a difference in the mean travel times along the four routes.
Because there will be many different drivers, the test was set up
so each driver drove along each of the four routes. Next slide
shows the travel time, in minutes, for each driver-route
combination. At the .05 significance level, is there a difference in
the mean travel time along the four routes? If we remove the
effect of the drivers, is there a difference in the mean travel time?
100
Two-Way Analysis of Variance - Example
101
Two-Way ANOVA with Interaction
102
Graphical Observation of Mean Times
103
Interaction Effect
We can investigate these questions statistically by extending the two-
way ANOVA procedure presented in the previous section. We add
another source of variation, namely, the interaction.
In order to estimate the “error” sum of squares, we need at least two
measurements for each driver/route combination.
As example, suppose the experiment presented earlier is repeated by
measuring two more travel times for each driver and route
combination. That is, we replicate the experiment. Now we have three
new observations for each driver/route combination.
Using the mean of three travel times for each driver/route combination
we get a more reliable measure of the mean travel time.
104
Three Tests in ANOVA with Replication
The ANOVA now has three sets of hypotheses to test:
1. H0: There is no interaction between drivers and routes.
H1: There is interaction between drivers and routes.
105
Question’s
When can we Have class Tomorrow?
Can We extend it to fourth session if Topics are Not
finished By tomorrow?