Elementary Statistics
Larson Farber
[Figure: scatter plot of Accidents versus Hours of Training]
Ch. 9 Larson/Farber
Section 9.1
Correlation
Correlation
A relationship between two variables.
Explanatory (Independent) Variable, x | Response (Dependent) Variable, y
Hours of Training | Number of Accidents
Shoe Size | Height
Cigarettes smoked per day | Lung Capacity
Score on SAT | Grade Point Average
Height | IQ
What type of relationship exists between the two variables, and is the correlation significant?
Scatter Plots and Types of Correlation
x = hours of training
y = number of accidents
[Figure: scatter plot of Accidents (y) versus Hours of Training (x)]
Scatter Plots and Types of Correlation
x = SAT score
y = GPA
[Figure: scatter plot of GPA (1.50 to 4.00) versus Math SAT score (300 to 800)]
[Figure: scatter plot with Height (60 to 80) on the x-axis and IQ (80 to 150) on the y-axis]
No linear correlation
Correlation Coefficient
A measure of the strength and direction of a linear relationship between two variables:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
The value of r ranges from −1 to 1.
If r is close to −1, there is a strong negative linear correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive linear correlation.
Application
Absences, x | Final Grade, y
8 | 78
2 | 92
5 | 90
12 | 58
15 | 43
9 | 74
6 | 81
[Figure: scatter plot of Final Grade (40 to 95) versus Absences (0 to 16)]
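Computation of r
# | x | y | xy | x² | y²
1 | 8 | 78 | 624 | 64 | 6084
2 | 2 | 92 | 184 | 4 | 8464
3 | 5 | 90 | 450 | 25 | 8100
4 | 12 | 58 | 696 | 144 | 3364
5 | 15 | 43 | 645 | 225 | 1849
6 | 9 | 74 | 666 | 81 | 5476
7 | 6 | 81 | 486 | 36 | 6561
Σ | 57 | 516 | 3751 | 579 | 39898
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} = \frac{-3155}{\sqrt{804}\,\sqrt{13030}} \approx -0.975$$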
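As a cross-check on the hand computation, the short Python sketch below rebuilds the column sums (Σx, Σy, Σxy, Σx², Σy²) from the absences/grades data and applies the same computational formula for r. The function name `pearson_r` is our own choice for illustration, not something from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r via the computational (column sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


absences = [8, 2, 5, 12, 15, 9, 6]     # x
grades = [78, 92, 90, 58, 43, 74, 81]  # y

r = pearson_r(absences, grades)
print(round(r, 3))  # -0.975
```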
Hypothesis Test for Significance
r is the correlation coefficient for the sample. The correlation coefficient for the population is ρ (rho).
For a two-tailed test for significance:
H0: ρ = 0 (The correlation is not significant.)
Ha: ρ ≠ 0 (The correlation is significant.)
For left-tailed and right-tailed tests for negative or positive significance:
Left-tailed: H0: ρ ≥ 0, Ha: ρ < 0
Right-tailed: H0: ρ ≤ 0, Ha: ρ > 0
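The slides state the hypotheses but not the test statistic. A standard statistic for H0: ρ = 0 is t = r/√((1 − r²)/(n − 2)), which has n − 2 degrees of freedom. The sketch below assumes that statistic and applies it to r = −0.975 and n = 7 from the absences example; `t_statistic_for_r` is a hypothetical helper name, and the critical value still has to be looked up in a t table for the chosen α.

```python
from math import sqrt


def t_statistic_for_r(r, n):
    """t statistic for testing H0: rho = 0, returned with its degrees of freedom."""
    t = r / sqrt((1 - r ** 2) / (n - 2))
    return t, n - 2


t, df = t_statistic_for_r(-0.975, 7)
print(round(t, 2), df)  # -9.81 5
```

For a two-tailed test, the correlation is judged significant when |t| exceeds the tabled critical value for d.f. = 5.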
Linear Regression
The Line of Regression
Once you know there is a significant linear correlation, you
can write an equation describing the relationship between
the x and y variables. This equation is called the line of
regression or least squares line.
The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is ŷ = mx + b.
The slope m is
$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
The y-intercept is
$$b = \bar{y} - m\bar{x}$$
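(x_i, y_i) = a data point
(x_i, ŷ_i) = a point on the line with the same x-value
d_i = y_i − ŷ_i is called a residual.
[Figure: revenue versus advertising dollars (Ad $), showing a data point (x_i, y_i), the point (x_i, ŷ_i) on the regression line, and the residual d_i between them]
The line of regression is the line for which the sum of the squared residuals, Σd², is a minimum.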
Write the equation of the line of regression with x = number of absences and y = final grade. Calculate m and b.
# | x | y | xy | x²
1 | 8 | 78 | 624 | 64
2 | 2 | 92 | 184 | 4
3 | 5 | 90 | 450 | 25
4 | 12 | 58 | 696 | 144
5 | 15 | 43 | 645 | 225
6 | 9 | 74 | 666 | 81
7 | 6 | 81 | 486 | 36
Σ | 57 | 516 | 3751 | 579
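A minimal sketch of the slope and intercept formulas above, applied to the absences/grades data. The helper name `regression_line` is our own; it computes b as ȳ − m·x̄ exactly as on the slide.

```python
def regression_line(xs, ys):
    """Least-squares slope m and y-intercept b for y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
m, b = regression_line(absences, grades)
print(round(m, 3), round(b, 3))  # -3.924 105.668
```

Without intermediate rounding, b comes out as 105.668; the value 105.667 used later in this chapter comes from rounding m and x̄ first.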
Section 9.3
Measures of Regression and Correlation
The Coefficient of Determination
The coefficient of determination, r², is the ratio of the explained variation in y to the total variation in y:
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
The correlation coefficient between the number of times absent and the final grade is r = −0.975. The coefficient of determination is r² = (−0.975)² ≈ 0.9506.
Interpretation: About 95% of the variation in final grades can be
explained by the number of times a student is absent. The other
5% is unexplained and can be due to sampling error or to other
variables such as intelligence, amount of time studied, etc.
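To see that the ratio of explained to total variation really is r², the sketch below fits the least-squares line, then computes the explained variation Σ(ŷ − ȳ)² and the total variation Σ(y − ȳ)² directly. The function name `r_squared` is hypothetical, chosen for this illustration.

```python
def r_squared(xs, ys):
    """Coefficient of determination as explained variation / total variation."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept (centered form, equivalent to the sums form)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    y_hat = [m * x + b for x in xs]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # sum of (y-hat - y-bar)^2
    total = sum((y - y_bar) ** 2 for y in ys)           # sum of (y - y-bar)^2
    return explained / total


absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
r2 = r_squared(absences, grades)
print(round(r2, 4))  # 0.9502
```

The result, 0.9502, is the square of the unrounded r (about −0.9748); squaring the rounded value −0.975 gives the slightly larger 0.9506.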
The Standard Error of Estimate
The standard error of estimate, s_e, is the standard deviation of the observed y_i values about the predicted ŷ values:
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
Calculate ŷ = −3.924x + 105.667 for each x.
# | x | y | ŷ | (y − ŷ)²
1 | 8 | 78 | 74.275 | 13.8756
2 | 2 | 92 | 97.819 | 33.8608
3 | 5 | 90 | 86.047 | 15.6262
4 | 12 | 58 | 58.579 | 0.3352
5 | 15 | 43 | 46.807 | 14.4932
6 | 9 | 74 | 70.351 | 13.3152
7 | 6 | 81 | 82.123 | 1.2611
Σ | | | | 92.767
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{92.767}{5}} \approx 4.307$$
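The table above can be reproduced with a few lines of Python. This sketch uses the rounded line ŷ = −3.924x + 105.667 from the slide; `standard_error_of_estimate` is a hypothetical helper name.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys, m, b):
    """s_e = sqrt(sum((y - y_hat)^2) / (n - 2)) for the line y_hat = m*x + b."""
    residual_ss = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return sqrt(residual_ss / (len(xs) - 2))


absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
se = standard_error_of_estimate(absences, grades, m=-3.924, b=105.667)
print(round(se, 3))  # 4.307
```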
Prediction Intervals
Given a linear regression equation and a specific value x₀ of x, a c-prediction interval for y is
$$\hat{y} - E < y < \hat{y} + E$$
where
$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$$
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
The point (6, 82.123) is the point on the regression line with an x-coordinate of 6.
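2. Find E. Here t_c = 2.015 (90% level, d.f. = n − 2 = 5) and x̄ = 57/7 ≈ 8.14.
$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = 2.015(4.307)\sqrt{1 + \frac{1}{7} + \frac{7(6 - 8.14)^2}{7(579) - (57)^2}} = 2.015(4.307)\sqrt{1.18273} \approx 9.438$$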
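A minimal sketch of the prediction-interval formula, assuming the rounded values used on the slides (m = −3.924, b = 105.667, s_e = 4.307) and the tabled t_c = 2.015. The function name `prediction_interval` is our own; it does not look up t_c itself.

```python
from math import sqrt


def prediction_interval(xs, x0, m, b, se, t_c):
    """Return (y_hat - E, y_hat + E, E) at x = x0.

    t_c must be supplied from a t table with d.f. = n - 2.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    y_hat = m * x0 + b
    e = t_c * se * sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    return y_hat - e, y_hat + e, e


absences = [8, 2, 5, 12, 15, 9, 6]
low, high, e = prediction_interval(absences, x0=6, m=-3.924, b=105.667, se=4.307, t_c=2.015)
print(f"E = {e:.3f}; {low:.2f} < y < {high:.2f}")
```

With the unrounded x̄ this gives E ≈ 9.44 and roughly 72.68 < y < 91.56; the slide's 9.438 differs only because x̄ was rounded to 8.14.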
Minitab Output
Regression Analysis
Section 9.4
Multiple Regression
More Explanatory Variables
absence | IQ | Grade
8 | 115 | 78
2 | 135 | 92
5 | 126 | 90
12 | 110 | 58
15 | 105 | 43
9 | 120 | 74
6 | 125 | 81
Minitab Output
Regression Analysis
Interpretation
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ
Predicting the Response Variable
Chapter 10
Chi-Square Tests and the F-Distribution
Elementary Statistics
Larson Farber
Section 10.1
Goodness of Fit
Chi-Square Distributions
Several important statistical tests use a probability distribution known as chi-square, denoted χ².
[Figure: χ² distributions for 1 or 2 d.f. and for 3 or more d.f.]
χ² is a family of distributions. The graph of the χ² distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment.
The χ² distributions are skewed right and are not symmetric.
The value of χ² is greater than or equal to 0.
Multinomial Experiments
χ² = 0.6755
[Figure: χ² distribution with critical value 11.34]
Section 10.2
Independence
Test for Independence
Application
The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent. Use α = 0.05.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Chi-Square Test
O | E | (O − E)² | (O − E)²/E
22 | 18.33 | 13.49 | 0.74
81 | 79.42 | 2.50 | 0.03
9 | 14.25 | 27.61 | 1.94
14 | 17.67 | 13.49 | 0.76
75 | 76.58 | 2.50 | 0.03
19 | 13.75 | 27.61 | 2.01
Σ: 220 | 220.00 | | 5.51
$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 5.51$$
[Figure: χ² distribution with critical value 5.99]
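A minimal sketch of the test for independence, assuming the six observed counts above form a 2 × 3 contingency table with rows [22, 81, 9] and [14, 75, 19]. That grouping reproduces the listed expected values, but the row and column labels are not shown in the slides. Expected counts use the standard rule E = (row total)(column total)/(sample size); `chi_square_independence` is a hypothetical helper name.

```python
def chi_square_independence(observed):
    """Return the chi-square statistic, expected counts, and d.f. for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        exp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # (row total)(column total)/n
            exp_row.append(e)
            chi2 += (o - e) ** 2 / e
        expected.append(exp_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, expected, df


# Observed counts in the order of the O column: one row per gender (assumed layout)
observed = [[22, 81, 9], [14, 75, 19]]
chi2, expected, df = chi_square_independence(observed)
print(round(chi2, 2), df)  # 5.51 2
```

Since 5.51 is less than the critical value 5.99 for d.f. = 2 shown above, the statistic does not fall in the rejection region.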
Comparing Two Variances
Two-Sample Test for Variances
To compare population variances σ₁² and σ₂², use the F-distribution.
Let s₁² and s₂² represent the sample variances of two different populations. If both populations are normal and the population variances σ₁² and σ₂² are equal, then the sampling distribution of
$$F = \frac{s_1^2}{s_2^2}$$
is called an F-distribution. s₁² always represents the larger of the two variances.
[Figure: F-distribution with d.f.N = 8 and d.f.D = 20]
F-Test for Variances
In F-tests for equal variances, use only the right-tail critical value. For a right-tailed test, use the critical value from the table for the given α. For a two-tailed test, use the right-tail critical value for α/2.
Application
An engineer wants to perform a t-test to see if the mean gas consumption of Car A is lower than that for Car B. A random sample of the gas consumption of 16 Car A's has a standard deviation of 4.5. A random sample of the gas consumption of 22 Car B's has a standard deviation of 4.2. Should the engineer use the t-test with equal variances or the one for unequal variances? Use α = 0.05.
1. Write the null and alternative hypotheses.
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
Since the sample variance for Car A is larger than that for Car B, use s₁² to represent the sample variance for Car A.
2. State the level of significance.
α = 0.05
3. Determine the sampling distribution.
An F-distribution with d.f.N = 15 and d.f.D = 21
4. Find the critical value.
5. Find the rejection region.
[Figure: F-distribution with a right-tail area of 0.025 beyond the critical value 2.53]
The critical value is 2.53, so the rejection region is F > 2.53.
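The remaining steps (computing the test statistic and deciding) are not shown above. Under the stated setup (s₁ = 4.5 with n₁ = 16 for Car A, s₂ = 4.2 with n₂ = 22 for Car B), a minimal sketch of F = s₁²/s₂² is below; `variance_f_test` is a hypothetical helper name.

```python
def variance_f_test(s1, n1, s2, n2):
    """Return F (larger sample variance over smaller) with d.f.N and d.f.D."""
    var1, var2 = s1 ** 2, s2 ** 2
    if var1 < var2:
        # Swap so the larger sample variance is in the numerator
        var1, var2, n1, n2 = var2, var1, n2, n1
    return var1 / var2, n1 - 1, n2 - 1


f, df_n, df_d = variance_f_test(s1=4.5, n1=16, s2=4.2, n2=22)
print(round(f, 2), df_n, df_d)  # 1.15 15 21
```

Since F ≈ 1.15 is less than the critical value 2.53, it does not fall in the rejection region F > 2.53.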
Analysis of Variance (ANOVA)
$$SS_B = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2$$
$$MS_B = \frac{SS_B}{k - 1} = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$
Mean Square Within
Calculate SSW and divide by N − k, the degrees of freedom:
$$MS_W = \frac{SS_W}{N - k} = \frac{\sum (n_i - 1)s_i^2}{N - k}$$
•If MSB is close in value to MSW, the variation among the sample means is not attributable to different effects of the treatments on the variable, and the ratio of the two measures (the F-ratio) is close to 1.
6. Find the test statistic.
$$F = \frac{MS_B}{MS_W}$$
Northeast | Midwest | South | West
308 | 246 | 103 | 223
58 | 169 | 143 | 184
141 | 246 | 164 | 221
109 | 158 | 119 | 269
220 | 167 | 99 | 199
144 | 76 | 214 | 171
316 | | 108 | 204
Calculate the mean and variance for each sample:
x̄ | 185.14 | 177.00 | 135.71 | 210.14
s² | 9838.66 | 4050.05 | 1741.39 | 1020.80
Calculate the grand mean of all values:
$$\bar{\bar{x}} = \frac{4779}{27} = 177$$
$$SS_B = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2 = 20086$$
$$MS_B = \frac{SS_B}{k - 1} = \frac{20086}{3} = 6695.33$$
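$$SS_W = \sum (n_i - 1)s_i^2$$
Sample | n | s² | (n − 1)s²
1 | 7 | 9838.66 | 59031.9
2 | 6 | 4050.05 | 20250.2
3 | 7 | 1741.39 | 10448.4
4 | 7 | 1020.80 | 6124.8
Σ | | | 95855
$$MS_W = \frac{SS_W}{N - k} = \frac{95855}{23} = 4167.61$$
$$F = \frac{MS_B}{MS_W} = \frac{6695.33}{4167.61} \approx 1.61$$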
[Figure: F-distribution with the rejection region in the right tail, α = 0.10]
7. Make your decision.
Since F ≈ 1.61 does not fall in the rejection region, fail to reject the null hypothesis.
8. Interpret your decision
There is not enough evidence to support the claim that the means are not all equal; the data do not show that mean expenses for reading differ among the four regions.
Minitab Output
One-way Analysis of Variance
Analysis of Variance
Source | DF | SS | MS | F | P
Factor | 3 | 20085 | 6695 | 1.61 | 0.215
Error | 23 | 95857 | 4168 | |
Total | 26 | 115942 | | |
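A minimal one-way ANOVA sketch computed from the raw regional data with no intermediate rounding. The helper name `one_way_anova` is our own; it accumulates SSW as the within-sample sum of squared deviations, which equals Σ(nᵢ − 1)sᵢ². Its results match the Minitab output (SS ≈ 20085 and 95857, F ≈ 1.61); the hand-computed values above differ slightly because of rounding.

```python
def one_way_anova(samples):
    """Return SS_B, SS_W, MS_B, MS_W, and F for a list of samples."""
    k = len(samples)
    big_n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / big_n
    ss_b = 0.0
    ss_w = 0.0
    for s in samples:
        n_i = len(s)
        mean_i = sum(s) / n_i
        ss_b += n_i * (mean_i - grand_mean) ** 2   # between-samples term
        ss_w += sum((x - mean_i) ** 2 for x in s)  # equals (n_i - 1) * s_i^2
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (big_n - k)
    return ss_b, ss_w, ms_b, ms_w, ms_b / ms_w


regions = {
    "Northeast": [308, 58, 141, 109, 220, 144, 316],
    "Midwest": [246, 169, 246, 158, 167, 76],
    "South": [103, 143, 164, 119, 99, 214, 108],
    "West": [223, 184, 221, 269, 199, 171, 204],
}
ss_b, ss_w, ms_b, ms_w, f = one_way_anova(list(regions.values()))
print(round(ss_b), round(ss_w), round(f, 2))  # 20085 95857 1.61
```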
Chapter 11
Nonparametric Tests
Elementary Statistics
Larson Farber
Section 11.1
Hypotheses
Left-tailed test: H0: median ≥ k and Ha: median < k
or
Right-tailed test: H0: median ≤ k and Ha: median > k
or
Two-tailed test: H0: median = k and Ha: median ≠ k
Sign Test
To use the sign test, first compare each entry in the sample to
the hypothesized median, k.
Sign Test
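Test Statistic: When n ≤ 25, the test statistic is the smaller number of + or − signs.
When n > 25, the test statistic is
$$z = \frac{(x + 0.5) - 0.5n}{\frac{\sqrt{n}}{2}}$$
where x is the smaller number of + or − signs.
The sign test is a test of the binomial probability p = 0.50; for n > 25, the normal approximation to the binomial distribution above is used.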
Application
A meteorologist claims that the daily median temperature for the month of January in San Diego is 57°F. The temperatures (in degrees Fahrenheit) for 18 randomly selected January days are listed below. At α = 0.01, can you support the meteorologist’s claim?
58 62 55 55 53 52 52 59 55 55 60 56 57 61 58 63 63 55
1. Write the null and alternative hypotheses.
H0: median = 57 and Ha: median ≠ 57
2. State the level of significance.
α = 0.01
3. Determine the sampling distribution.
Binomial with p = 0.5
58 62 55 55 53 52 52 59 55
55 60 56 57 61 58 63 63 55
+ + - - - - - + -
- + - 0 + + + + -
4. Find the critical value.
There are 8 + signs, 9 − signs, and one 0 (a value equal to 57), which is discarded, so n = 17. Using Table 8, the critical value is 2.
7. Make your decision
The test statistic, 8, does not fall in the critical region. Fail to
reject the null hypothesis.
8. Interpret your decision
There is not enough evidence to reject the meteorologist’s claim that the median daily temperature for January in San Diego is 57°F.
The sign test can also be used with paired
data (such as before and after). Find the
difference between corresponding values and
record the sign. Use the same procedure.
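The counting steps of the sign test can be sketched as follows. The helper name `sign_test` is hypothetical; it discards values equal to the hypothesized median, returns the smaller sign count when n ≤ 25, and otherwise returns the z statistic from the normal approximation above. The critical value itself still comes from the table.

```python
from math import sqrt


def sign_test(data, k):
    """Sign test about a hypothesized median k.

    Returns (number of + signs, number of - signs, n, test statistic).
    """
    plus = sum(1 for v in data if v > k)
    minus = sum(1 for v in data if v < k)
    n = plus + minus  # entries equal to k (sign 0) are discarded
    x = min(plus, minus)
    if n <= 25:
        return plus, minus, n, x  # compare x with the tabled critical value
    z = ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)  # normal approximation for n > 25
    return plus, minus, n, z


temps = [58, 62, 55, 55, 53, 52, 52, 59, 55, 55, 60, 56, 57, 61, 58, 63, 63, 55]
print(sign_test(temps, 57))  # (8, 9, 17, 8)
```

For paired data, the same function can be applied to the list of differences with k = 0.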
Section 11.2
α = 0.01
Before | After | Difference | Absolute value | Rank | Signed rank
The sum of the positive ranks is 5 + 6 + 3 + 8 + 7 + 4 = 33.
Wilcoxon Rank-Sum
The Wilcoxon rank-sum test is a nonparametric test that can
be used to determine whether two independent samples were
selected from populations having the same distribution.
When the samples are the same size, it does not matter which is n1.
Wilcoxon Rank-Sum
Test statistic:
Combine the data from both samples and rank them.
R = the sum of the ranks for the smaller sample.
Find the z-score for the value of R:
$$z = \frac{R - \mu_R}{\sigma_R}$$
where
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
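A minimal sketch of the rank-sum computation, assuming tied values receive the average of the ranks they occupy. The names `average_ranks` and `rank_sum_z` are our own, and the two three-value samples in the usage line are hypothetical; they only illustrate the arithmetic, since the normal approximation is meant for reasonably large samples.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Wilcoxon rank-sum R and its z-score (sample1 should be the smaller sample)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r = sum(ranks[:n1])  # R: sum of the ranks belonging to sample1
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r, (r - mu_r) / sigma_r


# Hypothetical data, chosen only to show the arithmetic
r, z = rank_sum_z([1, 2, 3], [4, 5, 6])
print(r, round(z, 3))  # 6.0 -1.964
```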
Section 11.3
The Kruskal-Wallis Test
The Kruskal-Wallis test is a nonparametric test that can be used to determine whether three or more independent samples were selected from populations having the same distribution.
Reject the null hypothesis when H is greater than the critical value (always use a right-tailed test).
Application
You want to compare the hourly pay rates of accountants who work in Michigan, New York and Virginia. To do so, you randomly select 10 accountants in each state and record their hourly pay rates as shown below. At the 0.01 level, can you conclude that the distributions of accountants’ hourly pay rates in these three states are different?
MI (1) | NY (2) | VA (3)
14.24 | 21.18 | 17.02
14.06 | 20.94 | 20.63
14.85 | 16.26 | 17.47
17.47 | 21.03 | 15.54
14.83 | 19.95 | 15.38
19.01 | 17.54 | 14.9
13.08 | 14.89 | 20.48
15.94 | 18.88 | 18.5
13.48 | 20.06 | 12.8
16.94 | 21.81 | 15.57
1. Write the null and alternative hypotheses.
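Find the test statistic: R1 = 94.5, R2 = 223, R3 = 147.5
n1 = 10, n2 = 10, and n3 = 10, so N = 30.
$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3}\right) - 3(N+1) = \frac{12}{30(30 + 1)}\left(\frac{94.5^2}{10} + \frac{223^2}{10} + \frac{147.5^2}{10}\right) - 3(30 + 1) \approx 10.76$$
[Figure: χ² distribution with critical value 9.210; the test statistic 10.76 lies in the rejection region]
Make Your Decision
The test statistic, 10.76, falls in the rejection region, so reject the null hypothesis.
Interpret Your Decision
There is enough evidence to conclude that the distributions of accountants’ hourly pay rates in the three states are different.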
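The rank sums and H can be reproduced with a short sketch, assuming tied values (here the two 17.47 entries) share the average of their ranks. The names `average_ranks` and `kruskal_wallis_h` are hypothetical helpers for this illustration.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(samples):
    """Return the rank sum of each sample and the Kruskal-Wallis statistic H."""
    pooled = [v for sample in samples for v in sample]
    ranks = average_ranks(pooled)
    big_n = len(pooled)
    rank_sums, start = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    h = 12 / (big_n * (big_n + 1)) * sum(
        r ** 2 / len(sample) for r, sample in zip(rank_sums, samples)
    ) - 3 * (big_n + 1)
    return rank_sums, h


mi = [14.24, 14.06, 14.85, 17.47, 14.83, 19.01, 13.08, 15.94, 13.48, 16.94]
ny = [21.18, 20.94, 16.26, 21.03, 19.95, 17.54, 14.89, 18.88, 20.06, 21.81]
va = [17.02, 20.63, 17.47, 15.54, 15.38, 14.9, 20.48, 18.5, 12.8, 15.57]
rank_sums, h = kruskal_wallis_h([mi, ny, va])
print(rank_sums, round(h, 2))  # [94.5, 223.0, 147.5] 10.76
```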
Section 11.4
Rank Correlation
The Spearman rank correlation coefficient, r_s, is a measure of the strength of the relationship between two variables. The Spearman rank correlation coefficient is calculated using the ranks of paired sample data entries. The formula for the Spearman rank correlation coefficient is
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between the ranks of a pair of data entries.
The hypotheses:
H0: ρs = 0 (There is no correlation between the variables.)
Ha: ρs ≠ 0 (There is a significant correlation between the variables.)
Rank Correlation
Seven candidates applied for a nursing position. The seven candidates were placed in rank order first by x and then by y. The results of the rankings are listed below. Using a 0.05 level of significance, test the claim that there is a significant correlation between the variables.
Candidate | x | y | d | d²
1 | 2 | 1 | 1 | 1
2 | 4 | 4 | 0 | 0
3 | 1 | 3 | −2 | 4
4 | 5 | 2 | 3 | 9
5 | 7 | 6 | 1 | 1
6 | 3 | 1 | 2 | 4
7 | 6 | 7 | −1 | 1
Σd² = 20
Critical value = 0.715
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(20)}{7(7^2 - 1)} = 1 - 0.357 = 0.643$$
Since the statistic 0.643 does not fall in the rejection region, fail to reject H0. There is not enough evidence to support the claim that there is a significant correlation.
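A minimal sketch of the Spearman computation, using the x and y ranks exactly as listed in the table and the slide's critical value 0.715. The helper name `spearman_rs` is our own; it applies the shortcut formula, which is exact when the ranks contain no ties.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman rank correlation r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Uses the shortcut formula, which is exact when the ranks contain no ties.
    """
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


x = [2, 4, 1, 5, 7, 3, 6]  # x ranks for candidates 1 to 7
y = [1, 4, 3, 2, 6, 1, 7]  # y ranks as listed in the table
rs, sum_d2 = spearman_rs(x, y)
critical_value = 0.715     # value given on the slide
print(sum_d2, round(rs, 3), abs(rs) > critical_value)  # 20 0.643 False
```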