JOHN S. LOUCKS
St. Edwards University
Chapter 3
Descriptive Statistics: Numerical Methods
Measures of Location
Measures of Variability
Measures of Relative Location and Detecting Outliers
Exploratory Data Analysis
Measures of Association Between Two Variables
The Weighted Mean and Working with Grouped Data
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
Mean
\( \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \)
425 440 450 465 480 510 575
430 440 450 470 485 515 575
430 440 450 470 490 525 580
435 445 450 472 490 525 590
435 445 450 475 490 525 600
435 445 460 475 500 535 600
435 445 460 475 500 549 600
435 445 460 480 500 550 600
440 450 465 480 500 570 615
440 450 465 480 510 570 615
Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Mode
450 occurred most frequently (7 times)
Mode = 450
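The three location measures above can be checked with a short sketch. It assumes Python's standard `statistics` module and the 70 data values from the slides; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(data)        # sum of the values divided by n
med = median(data)        # n is even, so the two middle values are averaged
most_common = mode(data)  # the value that occurs most often

print(round(x_bar, 2), med, most_common)
```

The results should agree with the slide values of 490.80, 475, and 450.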
Percentiles
90th Percentile
i = (p/100)n = (90/100)70 = 63
Because i is an integer, average the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Quartiles
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 53rd data value = 525
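The position rule used in the percentile and quartile examples can be sketched as a small function. This is a minimal illustration, assuming 0 < p < 100 and data already sorted in ascending order; the function name `percentile` is a hypothetical helper, not a library call. Library routines such as `numpy.percentile` interpolate by default and may return slightly different values.

```python
import math

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_values, p):
    """p-th percentile using the position rule i = (p/100) n (assumes 0 < p < 100)."""
    n = len(sorted_values)
    i = p * n / 100  # multiply before dividing to avoid rounding surprises
    if i != int(i):
        # i is not an integer: round up and take the value in that position.
        return sorted_values[math.ceil(i) - 1]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    i = int(i)
    return (sorted_values[i - 1] + sorted_values[i]) / 2

for p in (25, 50, 75, 90):
    print(p, percentile(data, p))
```

For this data set the function should reproduce the slide results: 445 for the first quartile, 475 for the median, 525 for the third quartile, and 585 for the 90th percentile.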
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
Sample variance: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \)
Population variance: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \)
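Standard Deviation
Sample standard deviation: \( s = \sqrt{s^2} \)
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Coefficient of Variation
Sample: \( \frac{s}{\bar{x}}(100) \)
Population: \( \frac{\sigma}{\mu}(100) \)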
Variance
\( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \)
Standard Deviation
\( s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \)
Coefficient of Variation
\( \frac{s}{\bar{x}}(100) = \frac{54.74}{490.80}(100) = 11.15 \)
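As a check on the arithmetic, the three variability measures can be recomputed from the raw data. This is a sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(data)
s2 = variance(data)    # sample variance: squared deviations divided by n - 1
s = stdev(data)        # sample standard deviation: square root of s2
cv = s / x_bar * 100   # coefficient of variation, in percent

print(round(s2, 2), round(s, 2), round(cv, 2))
```

Rounded to two decimals, the values should match 2,996.16, 54.74, and 11.15.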
z-Scores
Chebyshev's Theorem
Empirical Rule
Detecting Outliers
z-Scores
\( z_i = \frac{x_i - \bar{x}}{s} \)
z-score of the smallest value (425):
\( z = \frac{425 - 490.80}{54.74} = -1.20 \)
-1.20 -0.93 -0.75 -0.47 -0.20 0.35 1.54
-1.11 -0.93 -0.75 -0.38 -0.11 0.44 1.54
-1.11 -0.93 -0.75 -0.38 -0.01 0.62 1.63
-1.02 -0.84 -0.75 -0.34 -0.01 0.62 1.81
-1.02 -0.84 -0.75 -0.29 -0.01 0.62 1.99
-1.02 -0.84 -0.56 -0.29 0.17 0.81 1.99
-1.02 -0.84 -0.56 -0.29 0.17 1.06 1.99
-1.02 -0.84 -0.56 -0.20 0.17 1.08 1.99
-0.93 -0.75 -0.47 -0.20 0.17 1.45 2.27
-0.93 -0.75 -0.47 -0.20 0.35 1.45 2.27
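The table of z-scores can be regenerated with a few lines. This sketch assumes the sample mean and sample standard deviation are computed from the same 70 values, and it also applies the |z| > 3 outlier criterion that the slides use later; the variable names are illustrative.

```python
from statistics import mean, stdev

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar, s = mean(data), stdev(data)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - x_bar) / s for x in data]

# Flag values whose z-score exceeds 3 in absolute value.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]

print(round(min(z_scores), 2), round(max(z_scores), 2), outliers)
```

The smallest and largest z-scores should round to -1.20 and 2.27, and the list of outliers should be empty.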
Chebyshev's Theorem
At least (1 - 1/k^2) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.
At least 75% of the items must be within k = 2 standard deviations of the mean.
At least 89% of the items must be within k = 3 standard deviations of the mean.
At least 94% of the items must be within k = 4 standard deviations of the mean.
Let k = 1.5 with \( \bar{x} \) = 490.80 and s = 54.74.
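For k = 1.5, the guaranteed proportion and the observed proportion can be compared directly. The sketch below is an illustration under the assumption that the interval is the mean plus or minus k sample standard deviations; the variable names are not from the slides.

```python
from statistics import mean, stdev

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

k = 1.5
x_bar, s = mean(data), stdev(data)

bound = 1 - 1 / k**2                          # Chebyshev's guaranteed minimum proportion
lower, upper = x_bar - k * s, x_bar + k * s   # interval of k standard deviations
inside = sum(lower <= x <= upper for x in data)

print(f"at least {bound:.0%} guaranteed; {inside} of {len(data)} values observed inside")
```

Chebyshev's bound is a minimum, so the observed share (60 of 70 values here) is allowed to be well above the guaranteed 56%.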
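Empirical Rule
For data having a bell-shaped distribution:
Approximately 68% of the data values will be within 1 standard deviation of the mean.
Approximately 95% of the data values will be within 2 standard deviations of the mean.
Almost all (about 99.7%) of the data values will be within 3 standard deviations of the mean.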
Interval        Range               % in Interval
Within +/- 1s   436.06 to 545.54    48/70 = 69%
Within +/- 2s   381.32 to 600.28    68/70 = 97%
Within +/- 3s   326.58 to 655.02    70/70 = 100%
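The counts in the table can be reproduced by counting the values inside each interval. This is a rough sketch; because it uses the unrounded standard deviation, the printed interval endpoints may differ from the table in the last digit, while the counts should be the same.

```python
from statistics import mean, stdev

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar, s = mean(data), stdev(data)

within = {}  # k -> number of values within k standard deviations of the mean
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    within[k] = sum(low <= x <= high for x in data)
    print(f"+/- {k}s: {low:.2f} to {high:.2f}  {within[k]}/{len(data)}")
```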
Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
Five-Number Summary
Box Plot
Five-Number Summary
Smallest Value
First Quartile
Median
Third Quartile
Largest Value
Box Plot
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
All data values lie between the limits, so there are no outliers.
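The limit calculation is short enough to sketch in code. The example below takes Q1 = 445 and Q3 = 525 as computed on the Interquartile Range slide, together with the smallest and largest data values (425 and 615); the variable names are illustrative.

```python
# Quartiles as computed on the Interquartile Range slide.
q1, q3 = 445, 525
# Smallest and largest of the 70 data values.
smallest, largest = 425, 615

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# A value is an outlier if it falls outside the limits; checking the two
# extremes is enough to decide whether any value does.
has_outliers = smallest < lower_limit or largest > upper_limit

print(lower_limit, upper_limit, has_outliers)
```

With these quartiles the limits come out to 325 and 645, and both extremes fall inside them.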
[Figure: box plot of the data values; the surviving axis ticks read 375, 400, 425, 450, 475, 500]
Measures of Association Between Two Variables
Covariance
Correlation Coefficient
Covariance
Sample covariance: \( s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \)
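Correlation Coefficient
Sample correlation coefficient: \( r_{xy} = \frac{s_{xy}}{s_x s_y} \)
Population correlation coefficient: \( \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \)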
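The slides give no paired data for these two measures, so the sketch below uses a small set of made-up observations purely for illustration. The function names `sample_covariance` and `sample_correlation` are hypothetical helpers that follow the sample formulas with the n - 1 divisor.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical paired observations, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = sample_covariance(xs, ys)
r_xy = sample_correlation(xs, ys)
print(s_xy, round(r_xy, 4))
```

Unlike the covariance, the correlation coefficient does not depend on the units of x and y and always lies between -1 and +1.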
Weighted Mean
Mean for Grouped Data
Variance for Grouped Data
Standard Deviation for Grouped Data
Weighted Mean
\( \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \)
where:
x_i = value of observation i
w_i = weight for observation i
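A minimal sketch of the weighted mean follows. The values and weights are invented for illustration, and `weighted_mean` is a hypothetical helper name rather than a library function.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical observations and weights, made up for illustration only.
values = [10, 20, 30]
weights = [1, 2, 3]

result = weighted_mean(values, weights)
print(round(result, 2))
```

When every weight is equal, the formula reduces to the ordinary mean, which is a quick sanity check on any implementation.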
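Grouped Data
Mean for grouped sample data: \( \bar{x} = \frac{\sum f_i M_i}{\sum f_i} \)
Mean for grouped population data: \( \mu = \frac{\sum f_i M_i}{N} \)
where:
f_i = frequency of class i
M_i = midpoint of class i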
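[Table: frequency distribution of the grouped data; only the class frequencies 17, 12, 8, 7, 4, 2, 4, 2, 6 survive in this copy]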
Variance for Grouped Data
Sample Data: \( s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \)
Population Data: \( \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \)
Standard Deviation for Grouped Data
\( s = \sqrt{3{,}017.89} = 54.94 \)
This approximation differs by only $.20 from the actual standard deviation of $54.74.
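The grouped-data approximation can be reproduced from the raw values. The class limits are not legible in this copy of the slides, so the sketch assumes ten classes of width 20 starting at 420 (420-439 through 600-619); that assumption is labeled in the code, and it does reproduce the variance of 3,017.89 quoted above.

```python
from math import sqrt

# The 70 data values from the slides, in ascending order.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# ASSUMPTION: ten classes of width 20 with lower limits 420, 440, ..., 600.
lower_limits = range(420, 620, 20)
freq = [sum(lo <= x <= lo + 19 for x in data) for lo in lower_limits]  # f_i
mid = [lo + 9.5 for lo in lower_limits]                                # M_i

n = sum(freq)
x_bar_g = sum(f * m for f, m in zip(freq, mid)) / n
s2_g = sum(f * (m - x_bar_g) ** 2 for f, m in zip(freq, mid)) / (n - 1)
s_g = sqrt(s2_g)

print(freq, round(x_bar_g, 2), round(s2_g, 2), round(s_g, 2))
```

Grouping replaces each value with its class midpoint, which is why the result is only an approximation of the standard deviation computed from the raw data.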
End of Chapter 3