Chapter 3 B

Contemporary Business Statistics, 3e
by Williams, Sweeney, and Anderson
Slides by JOHN LOUCKS
St. Edward’s University
© 2009 Cengage South-Western. All Rights Reserved Slide

1
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
■ Measures of Distribution Shape, Relative Location, and Detecting Outliers
■ Exploratory Data Analysis
■ Measures of Association Between Two Variables
■ The Weighted Mean and Working with Grouped Data

2
Measures of Distribution Shape, Relative Location, and Detecting Outliers
■ Distribution Shape
■ z-Scores
■ Chebyshev’s Theorem
■ Empirical Rule
■ Detecting Outliers

3
Distribution Shape: Skewness
■ An important measure of the shape of a distribution is called skewness.
■ The formula for computing skewness for a data set is somewhat complex.
■ Skewness can be easily computed using statistical software.
■ Excel’s SKEW function can be used to compute the skewness of a data set.
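As a rough illustration of what such software computes, the sketch below implements the adjusted sample skewness, n/((n-1)(n-2)) times the sum of cubed standardized deviations, which is the formula Excel documents for SKEW. The function name `sample_skewness` and the toy data sets are illustrative assumptions and do not come from the slides.

```python
import statistics


def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Hypothetical toy data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(sample_skewness(symmetric))          # zero for symmetric data
print(sample_skewness(right_tailed) > 0)   # positive for a right tail
```

Negating every value in `right_tailed` flips the sign of the result, which matches the left-skewed case described on the following slides.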

4
■ Symmetric (not skewed)

• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; skewness = 0]

5
■ Moderately Skewed Left

• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; skewness = -.31]

6
■ Moderately Skewed Right

• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; skewness = .31]

7
■ Highly Skewed Right

• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution; skewness = 1.25]

8
■ Example: Apartment Rents

Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

9
[Figure: relative frequency histogram of the apartment rents; skewness = .92]

10
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value x_i is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

11
z-Scores
■ An observation’s z-score is a measure of the relative location of the observation in a data set.
■ A data value less than the sample mean will have a z-score less than zero.
■ A data value greater than the sample mean will have a z-score greater than zero.
■ A data value equal to the sample mean will have a z-score of zero.

12
z-Scores

• z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
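
The standardized values above can be reproduced with a few lines of code. This is a minimal sketch that takes the sample mean (490.80) and sample standard deviation (54.74) reported on the slides as given; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Sample mean and standard deviation of the apartment rents, as reported on the slides.
MEAN_RENT = 490.80
STD_RENT = 54.74

smallest = round(z_score(425, MEAN_RENT, STD_RENT), 2)  # smallest rent
largest = round(z_score(615, MEAN_RENT, STD_RENT), 2)   # largest rent
print(smallest, largest)
```

The two printed values correspond to the first and last entries of the standardized-values table.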

13
Chebyshev’s Theorem
At least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

14
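At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.
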

15

Let z = 1.5 with x̄ = 490.80 and s = 54.74.

At least 1 - 1/(1.5)^2 = 1 - 0.44 = 0.56, or 56%, of the rent values must be between

$$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$$

and

$$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$$

(Actually, 86% of the rent values are between 409 and 573.)
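
The comparison between the guaranteed 56% and the observed 86% can be checked directly. The sketch below is illustrative only: the function names are assumptions, the rent values are copied from the apartment rents example, and the mean and standard deviation are the values reported on the slides.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of any data set within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def proportion_within(data, mean, std_dev, z):
    """Observed proportion of data values within z standard deviations of the mean."""
    lower = mean - z * std_dev
    upper = mean + z * std_dev
    inside = sum(1 for x in data if lower <= x <= upper)
    return inside / len(data)


# The 70 apartment rents from the example, in ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

bound = chebyshev_lower_bound(1.5)
observed = proportion_within(RENTS, 490.80, 54.74, 1.5)
print(f"guaranteed at least {bound:.0%}, observed {observed:.0%}")
```

Chebyshev's bound holds for any distribution shape, which is why the observed share can be well above the guaranteed minimum.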

16
Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
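
For a normal random variable these proportions can be computed from the error function, since P(|Z| <= k) = erf(k / sqrt(2)). The sketch below is illustrative; the function name is an assumption. The exact values differ from the slide figures (68.26%, 95.44%, 99.72%) in the last digit, presumably because the slide figures were derived from a rounded normal table.

```python
import math


def normal_proportion_within(k):
    """P(|Z| <= k) for a standard normal random variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    share = normal_proportion_within(k)
    print(f"within {k} standard deviation(s) of the mean: {share:.2%}")
```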

17
Empirical Rule
[Figure: bell-shaped curve showing 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]

18
Detecting Outliers
■ An outlier is an unusually small or unusually large value in a data set.
■ A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
■ It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set

19
Detecting Outliers

• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
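
The |z| > 3 screening rule can be sketched as a small helper. The function name `z_score_outliers` and the toy data are hypothetical; the sketch uses the sample standard deviation, and a flagged value still needs the manual review described on the previous slide.

```python
import statistics


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    if s == 0:
        return []  # all values are identical, so nothing stands out
    return [x for x in data if abs((x - mean) / s) > threshold]


# Hypothetical toy data: nineteen zeros and one very large value.
toy = [0] * 19 + [100]
print(z_score_outliers(toy))
```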

20
Exploratory Data Analysis
■ Five-Number Summary
■ Box Plot

21
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value

22
Five-Number Summary

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
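
The five values above can be reproduced in code. This part of the deck does not state which quartile rule it uses, so the sketch assumes a common textbook position rule: compute i = (p/100)n; if i is not an integer, round up and take that value; if i is an integer, average the values in positions i and i + 1. The function names are illustrative.

```python
def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) using the position rule i = (p/100) * n."""
    n = len(sorted_data)
    position, remainder = divmod(p * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1 (1-based).
        return (sorted_data[position - 1] + sorted_data[position]) / 2
    # i is not an integer: round up to the next position (1-based).
    return sorted_data[position]


def five_number_summary(data):
    """Smallest value, first quartile, median, third quartile, largest value."""
    ordered = sorted(data)
    return (ordered[0], percentile(ordered, 25), percentile(ordered, 50),
            percentile(ordered, 75), ordered[-1])


# The 70 apartment rents from the example.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(five_number_summary(RENTS))
```

Other quartile conventions (for example, interpolating between positions) can give slightly different quartiles for the same data.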

23
Box Plot

• A box is drawn with its ends located at the first and
third quartiles.
• A vertical line is drawn in the box at the location of
the median (second quartile).
[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
24
Box Plot
■ Limits are located (not drawn) using the interquartile range (IQR).
■ Data outside these limits are considered outliers.
■ The location of each outlier is shown with the symbol * .

25
Box Plot

• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
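
The limit calculation can be written as a short helper. This is a sketch with illustrative function names; the quartiles 445 and 525 are the values from the apartment rents example.

```python
def box_plot_limits(q1, q3, factor=1.5):
    """Lower and upper limits located factor * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def values_outside_limits(data, q1, q3, factor=1.5):
    """Data values that fall below the lower limit or above the upper limit."""
    lower, upper = box_plot_limits(q1, q3, factor)
    return [x for x in data if x < lower or x > upper]


lower, upper = box_plot_limits(445, 525)
print(lower, upper)

# The smallest and largest rents (425 and 615) are both inside the limits.
print(values_outside_limits([425, 615], 445, 525))
```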

26
Box Plot

• Whiskers (dashed lines) are drawn from the ends
of the box to the smallest and largest data values
inside the limits.
[Figure: box plot on an axis from 400 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]
27
Measures of Association Between Two Variables
■ Covariance
■ Correlation Coefficient

28
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

29
Covariance
The covariance is computed as follows:

For samples:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

For populations:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
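
The two formulas differ only in the divisor, which a short sketch makes explicit. The function names and the toy data are illustrative assumptions.

```python
def _sum_of_cross_products(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sample_covariance(xs, ys):
    """Sample covariance: divide by n - 1."""
    if len(xs) < 2:
        raise ValueError("sample covariance needs at least two pairs")
    return _sum_of_cross_products(xs, ys) / (len(xs) - 1)


def population_covariance(xs, ys):
    """Population covariance: divide by N."""
    return _sum_of_cross_products(xs, ys) / len(xs)


# Hypothetical toy data in which y rises with x, so the covariance is positive.
xs = [1, 2, 3]
ys = [2, 4, 6]
print(sample_covariance(xs, ys), population_covariance(xs, ys))
```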

30
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

31
The correlation coefficient is computed as follows:

For samples:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

32
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.

33
Covariance and Correlation Coefficient
■ Example: Golfing Study

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving     Average
Distance (yds.)     18-Hole Score
277.6               69
259.5               71
269.1               70
267.0               70
255.6               71
272.9               69

34
            x         y        (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
            277.6     69        10.65       -1.0        -10.65
            259.5     71        -7.45        1.0         -7.45
            269.1     70         2.15        0            0
            267.0     70         0.05        0            0
            255.6     71       -11.35        1.0        -11.35
            272.9     69         5.95       -1.0         -5.95
Average     267.0     70.0                  Total       -35.40
Std. Dev.   8.2192    .8944

35

• Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

• Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
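
The same result can be obtained in code from the six observations of the golfing study. This is a minimal sketch; the function name `sample_correlation` is an illustrative choice.

```python
import statistics


def sample_correlation(xs, ys):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Golfing study data: average driving distance (yds.) and average 18-hole score.
drive = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]

print(round(sample_correlation(drive, score), 4))
```

A value this close to -1 indicates a strong negative linear relationship: in this sample, longer drives go with lower scores.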

36
Using Excel to Compute the Covariance and Correlation Coefficient
• Formula Worksheet

     A          B          C                    D                        E
1    Average    18-Hole
     Drive      Score
2    277.6      69         Pop. Covariance      =COVAR(A2:A7,B2:B7)
3    259.5      71         Samp. Correlation    =CORREL(A2:A7,B2:B7)
4    269.1      70
5    267.0      70
6    255.6      71
7    272.9      69
8

37
Using Excel to Compute the Covariance and Correlation Coefficient
• Value Worksheet

     A          B          C                    D          E
1    Average    18-Hole
     Drive      Score
2    277.6      69         Pop. Covariance      -5.9
3    259.5      71         Samp. Correlation    -0.9631
4    269.1      70
5    267.0      70
6    255.6      71
7    272.9      69
8

Sample Covariance:

$$s_{xy} = \frac{n}{n - 1}\,\sigma_{xy} = \frac{6}{6 - 1}(-5.9) = -7.08$$
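
The rescaling from a population covariance to a sample covariance is a one-line calculation. The sketch below uses a hypothetical helper name and the values from the worksheet.

```python
def population_to_sample_covariance(population_cov, n):
    """Rescale a population covariance (divisor N) to a sample covariance (divisor n - 1)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return n / (n - 1) * population_cov


# Population covariance returned by the worksheet for the six golf observations.
print(round(population_to_sample_covariance(-5.9, 6), 2))
```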

38
The Weighted Mean and Working with Grouped Data
■ Weighted Mean
■ Mean for Grouped Data
■ Variance for Grouped Data
■ Standard Deviation for Grouped Data

39
Weighted Mean
■ When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
■ In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
■ When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

40
Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

where:
x_i = value of observation i
w_i = weight for observation i
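
A short sketch of the formula follows, using the GPA setting mentioned earlier. The grade points and credit hours are hypothetical numbers chosen for illustration, as is the function name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical GPA: grade points (A = 4, B = 3, C = 2) weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))
```

With equal weights the function reduces to the ordinary mean.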

41
Grouped Data
■ The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
■ To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
■ We compute a weighted mean of the class midpoints using the class frequencies as weights.
■ Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

42
Mean for Grouped Data
■ Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

■ Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where:
f_i = frequency of class i
M_i = midpoint of class i

43
Sample Mean for Grouped Data

The previously presented sample of apartment rents is shown here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439     8
440-459     17
460-479     12
480-499     8
500-519     7
520-539     4
540-559     2
560-579     4
580-599     2
600-619     6
44
Sample Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439     8      429.5    3436.0
440-459     17     449.5    7641.5
460-479     12     469.5    5634.0
480-499     8      489.5    3916.0
500-519     7      509.5    3566.5
520-539     4      529.5    2118.0
540-559     2      549.5    1099.0
560-579     4      569.5    2278.0
580-599     2      589.5    1179.0
600-619     6      609.5    3657.0
Total       70              34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

This approximation differs by $2.41 from the actual sample mean of $490.80.
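
The grouped-mean calculation can be sketched as follows. The class limits and frequencies are taken from the frequency distribution of apartment rents; the names `RENT_CLASSES`, `class_midpoint`, and `grouped_mean` are illustrative.

```python
# (lower class limit, upper class limit, frequency) for the apartment rents.
RENT_CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def class_midpoint(lower, upper):
    """Midpoint M_i of a class, halfway between its limits."""
    return (lower + upper) / 2


def grouped_mean(classes):
    """Approximate mean: sum(f_i * M_i) / n, with n the total frequency."""
    n = sum(freq for _, _, freq in classes)
    total = sum(freq * class_midpoint(lo, hi) for lo, hi, freq in classes)
    return total / n


print(round(grouped_mean(RENT_CLASSES), 2))
```

The result is an approximation because every rent in a class is treated as if it equaled the class midpoint.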

45
Variance for Grouped Data
■ For sample data

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

■ For population data

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

46
Sample Variance for Grouped Data

Rent ($)    f_i    M_i      M_i - x̄    (M_i - x̄)^2    f_i(M_i - x̄)^2
420-439     8      429.5    -63.7      4058.96        32471.71
440-459     17     449.5    -43.7      1910.56        32479.59
460-479     12     469.5    -23.7      562.16         6745.97
480-499     8      489.5    -3.7       13.76          110.11
500-519     7      509.5    16.3       265.36         1857.55
520-539     4      529.5    36.3       1316.96        5267.86
540-559     2      549.5    56.3       3168.56        6337.13
560-579     4      569.5    76.3       5820.16        23280.66
580-599     2      589.5    96.3       9271.76        18543.53
600-619     6      609.5    116.3      13523.36       81140.18
Total       70                                        208234.29
47
Sample Variance for Grouped Data

• Sample Variance

$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$

• Sample Standard Deviation

$$s = \sqrt{3{,}017.89} = 54.94$$

This approximation differs by only $.20 from the actual standard deviation of $54.74.
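
The grouped variance and standard deviation can be sketched in the same way as the grouped mean. The midpoints and frequencies come from the table on the previous slide; the names used here are illustrative.

```python
import math

# (class midpoint M_i, frequency f_i) for the apartment rent classes.
MIDPOINTS_AND_FREQUENCIES = [
    (429.5, 8), (449.5, 17), (469.5, 12), (489.5, 8), (509.5, 7),
    (529.5, 4), (549.5, 2), (569.5, 4), (589.5, 2), (609.5, 6),
]


def grouped_sample_variance(classes):
    """Approximate sample variance: sum(f_i * (M_i - mean)**2) / (n - 1)."""
    n = sum(freq for _, freq in classes)
    mean = sum(freq * mid for mid, freq in classes) / n
    return sum(freq * (mid - mean) ** 2 for mid, freq in classes) / (n - 1)


variance = grouped_sample_variance(MIDPOINTS_AND_FREQUENCIES)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

For population data the divisor n - 1 would be replaced by N, as in the population formula.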

48
End of Chapter 3, Part B
