Contemporary Business Statistics, 3e
by Williams, Sweeney, and Anderson

Slides by John Loucks
St. Edward’s University

© 2009 Cengage South-Western. All Rights Reserved Slide


1
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
■ Measures of Distribution Shape, Relative Location, and Detecting Outliers
■ Exploratory Data Analysis
■ Measures of Association Between Two Variables
■ The Weighted Mean and Working with Grouped Data

Measures of Distribution Shape,
Relative Location, and Detecting Outliers
■ Distribution Shape
■ z-Scores
■ Chebyshev’s Theorem
■ Empirical Rule
■ Detecting Outliers

Distribution Shape: Skewness

■ An important measure of the shape of a distribution is called skewness.
■ The formula for computing skewness for a data set is somewhat complex.
■ Skewness can be easily computed using statistical software.
■ Excel’s SKEW function can be used to compute the skewness of a data set.

Distribution Shape: Skewness

■ Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.

[Figure: relative frequency histogram of a symmetric distribution, vertical axis from 0 to .35; Skewness = 0]

Distribution Shape: Skewness

■ Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.

[Figure: relative frequency histogram of a moderately left-skewed distribution, vertical axis from 0 to .35; Skewness = -.31]

Distribution Shape: Skewness

■ Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.

[Figure: relative frequency histogram of a moderately right-skewed distribution, vertical axis from 0 to .35; Skewness = .31]

Distribution Shape: Skewness

■ Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.

[Figure: relative frequency histogram of a highly right-skewed distribution, vertical axis from 0 to .35; Skewness = 1.25]

Distribution Shape: Skewness

■ Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a college town. The monthly rents (in dollars) are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
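
The skewness of these 70 rents can be checked with a short script. This is a minimal sketch that assumes the adjusted sample-skewness formula documented for Excel's SKEW function, n/((n-1)(n-2)) times the sum of cubed standardized deviations; the function name `sample_skewness` is illustrative and not part of the slides.

```python
from math import sqrt

# Apartment rents from the slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


skew = sample_skewness(rents)
print(round(skew, 2))
```

A positive result indicates a right-skewed sample, which agrees with the mean (490.80) lying above the median (475).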

Distribution Shape: Skewness

■ Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents, vertical axis from 0 to .35; Skewness = .92]

z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

z-Scores

■ An observation’s z-score is a measure of the relative location of the observation in a data set.
■ A data value less than the sample mean will have a z-score less than zero.
■ A data value greater than the sample mean will have a z-score greater than zero.
■ A data value equal to the sample mean will have a z-score of zero.

z-Scores

■ Example: Apartment Rents

• z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
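
The standardized values in this table can be reproduced as follows. This is a small sketch using Python's standard `statistics` module; `stdev` is the sample standard deviation (divisor n - 1), which is the `s` used on the slide.

```python
from statistics import mean, stdev

# Apartment rents from the earlier slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)   # sample mean
s = stdev(rents)      # sample standard deviation

# One z-score per observation: (x_i - x_bar) / s
z_scores = [(x - x_bar) / s for x in rents]

print(round(x_bar, 2), round(s, 2))
print(round(z_scores[0], 2), round(z_scores[-1], 2))
```

The first and last entries of `z_scores` correspond to the smallest (425) and largest (615) rents.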

Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.

Chebyshev’s Theorem

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.

Chebyshev’s Theorem

■ Example: Apartment Rents

Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$ or 56% of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values are between 409 and 573.)
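
The guaranteed bound and the actual coverage can be compared directly. This is a minimal sketch; `chebyshev_bound` is an illustrative helper name, and the mean and standard deviation are the rounded values quoted on the slide.

```python
# Apartment rents from the earlier slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(z):
    """Minimum fraction of data within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2


x_bar, s, z = 490.80, 54.74, 1.5   # values quoted on the slide
lower, upper = x_bar - z * s, x_bar + z * s

# Count how many rents actually fall inside the interval.
inside = sum(lower <= x <= upper for x in rents)
actual = inside / len(rents)

print(round(chebyshev_bound(z), 2), round(lower), round(upper), round(actual, 2))
```

The actual share is well above the bound because Chebyshev's theorem holds for any distribution shape and is therefore conservative.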

Empirical Rule

For data having a bell-shaped distribution:

68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
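
These percentages can be checked against the standard normal distribution. The sketch below uses the identity P(|Z| < k) = erf(k / sqrt(2)); `normal_within` is an illustrative name. Direct computation gives values that differ from the slide in the last digit (about 68.27, 95.45, and 99.73), presumably because the slide's figures come from doubling four-decimal table areas; that explanation is an assumption, not something the slides state.

```python
from math import erf, sqrt


def normal_within(k):
    """P(|Z| < k) for a standard normal random variable Z."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    # Percentage of values within k standard deviations of the mean.
    print(k, round(100 * normal_within(k), 2))
```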

Empirical Rule

[Figure: bell-shaped curve over x, showing 68.26% of the area within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]

Detecting Outliers

■ An outlier is an unusually small or unusually large value in a data set.
■ A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
■ It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set

Detecting Outliers

■ Example: Apartment Rents

• The most extreme z-scores are -1.20 and 2.27.
• Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
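
The |z| > 3 criterion can be applied mechanically. This is a short sketch; the function name `z_score_outliers` and its `threshold` parameter are illustrative choices.

```python
from statistics import mean, stdev

# Apartment rents from the earlier slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in magnitude."""
    x_bar, s = mean(data), stdev(data)
    return [x for x in data if abs((x - x_bar) / s) > threshold]


print(z_score_outliers(rents))   # empty list: no rent has |z| > 3
```

A flagged value is only a candidate for review; as noted on the previous slide, it may still be a correctly recorded value that belongs in the data set.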

Exploratory Data Analysis

■ Five-Number Summary
■ Box Plot

Five-Number Summary

1 Smallest Value

2 First Quartile

3 Median

4 Third Quartile

5 Largest Value

Five-Number Summary

■ Example: Apartment Rents

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
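
The five numbers can be recomputed from the sorted data. This sketch assumes the percentile position rule i = (p/100)n, rounding up when i is not an integer and averaging positions i and i + 1 when it is; this part of the deck does not state the rule, so treat it as an assumption. Software that interpolates between positions may report slightly different quartiles. The function names are illustrative.

```python
# Apartment rents from the slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) by the position rule."""
    n = len(sorted_data)
    whole, remainder = divmod(p * n, 100)   # i = p * n / 100
    if remainder:
        # i is not an integer: round up to 1-based position whole + 1.
        return sorted_data[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


def five_number_summary(data):
    ordered = sorted(data)
    return (ordered[0], percentile(ordered, 25), percentile(ordered, 50),
            percentile(ordered, 75), ordered[-1])


print(five_number_summary(rents))
```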

Box Plot

■ Example: Apartment Rents

• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Box Plot

■ Limits are located (not drawn) using the interquartile range (IQR).
■ Data outside these limits are considered outliers.
■ The location of each outlier is shown with the symbol * .

Box Plot

■ Example: Apartment Rents

• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
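
The limits, the outlier check, and the whisker endpoints can be computed together. This is a minimal sketch that takes the quartiles from the five-number summary as given; `box_plot_limits` and its multiplier parameter `k` are illustrative names.

```python
# Apartment rents from the earlier slide, in ascending order (n = 70).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def box_plot_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(445, 525)   # quartiles from the summary

# Values outside the limits are treated as outliers.
outliers = [x for x in rents if x < lower or x > upper]

# Whiskers end at the most extreme values that are still inside the limits.
whiskers = (min(x for x in rents if x >= lower),
            max(x for x in rents if x <= upper))

print(lower, upper, outliers, whiskers)
```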

Box Plot

■ Example: Apartment Rents

• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot on an axis from 400 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]
Measures of Association
Between Two Variables
■ Covariance
■ Correlation Coefficient

Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Covariance

The covariance is computed as follows:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples}$$

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations}$$

Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Correlation Coefficient

The correlation coefficient is computed as follows:

$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples}$$

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations}$$

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

Covariance and Correlation Coefficient

■ Example: Golfing Study

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving     Average
Distance (yds.)     18-Hole Score
277.6               69
259.5               71
269.1               70
267.0               70
255.6               71
272.9               69

Covariance and Correlation Coefficient

■ Example: Golfing Study

            x         y       x_i - x̄    y_i - ȳ    (x_i - x̄)(y_i - ȳ)
            277.6     69       10.65      -1.0       -10.65
            259.5     71       -7.45       1.0        -7.45
            269.1     70        2.15       0           0
            267.0     70        0.05       0           0
            255.6     71      -11.35       1.0       -11.35
            272.9     69        5.95      -1.0        -5.95
Average     266.95    70.0                Total      -35.40
Std. Dev.   8.2192    .8944

Covariance and Correlation Coefficient

■ Example: Golfing Study

• Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

• Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
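
The golfing calculations can be verified with a few lines of code. This is a sketch under the definitions given on the covariance and correlation slides; the function names `covariance` and `correlation` and the `sample` flag are illustrative. The population covariance is included because it relates to the sample value by the factor n/(n - 1).

```python
from math import sqrt

# Golfing study data from the slide.
drive = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]   # average drive (yds.)
score = [69, 71, 70, 70, 71, 69]                      # average 18-hole score


def covariance(x, y, sample=True):
    """Sample covariance (divisor n - 1) or population covariance (divisor n)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)


def correlation(x, y):
    """Sample correlation: s_xy / (s_x * s_y)."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


s_xy = covariance(drive, score)                     # sample covariance
sigma_xy = covariance(drive, score, sample=False)   # population covariance
r_xy = correlation(drive, score)

print(round(s_xy, 2), round(sigma_xy, 2), round(r_xy, 4))
```

The strongly negative correlation says that, in this sample, longer average drives go with lower (better) scores; it does not by itself show that one causes the other.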

Using Excel to Compute the
Covariance and Correlation Coefficient
■ Example: Golfing Study
• Formula Worksheet

     A               B               C                    D                        E
1    Average Drive   18-Hole Score
2    277.6           69              Pop. Covariance      =COVAR(A2:A7,B2:B7)
3    259.5           71              Samp. Correlation    =CORREL(A2:A7,B2:B7)
4    269.1           70
5    267.0           70
6    255.6           71
7    272.9           69

Using Excel to Compute the
Covariance and Correlation Coefficient
■ Example: Golfing Study
• Value Worksheet

     A               B               C                    D          E
1    Average Drive   18-Hole Score
2    277.6           69              Pop. Covariance      -5.9
3    259.5           71              Samp. Correlation    -0.9631
4    269.1           70
5    267.0           70
6    255.6           71
7    272.9           69

Sample Covariance: $s_{xy} = \frac{n}{n - 1}\sigma_{xy} = \frac{6}{6 - 1}(-5.9) = -7.08$

The Weighted Mean and
Working with Grouped Data
■ Weighted Mean
■ Mean for Grouped Data
■ Variance for Grouped Data
■ Standard Deviation for Grouped Data

Weighted Mean

■ When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
■ In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
■ When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

where:
$x_i$ = value of observation $i$
$w_i$ = weight for observation $i$
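
A GPA-style calculation illustrates the formula. The grade points and credit hours below are hypothetical values invented for this sketch, not data from the slides, and `weighted_mean` is an illustrative function name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)   # (12 + 12 + 6) / 10 = 3.0
```

With equal weights the function reduces to the ordinary mean.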

Grouped Data

■ The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
■ To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
■ We compute a weighted mean of the class midpoints using the class frequencies as weights.
■ Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Mean for Grouped Data

■ Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

■ Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where:
$f_i$ = frequency of class $i$
$M_i$ = midpoint of class $i$

Sample Mean for Grouped Data

■ Example: Apartment Rents

The previously presented sample of apartment rents is shown here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6
Sample Mean for Grouped Data

■ Example: Apartment Rents

Rent ($)    f_i    M_i      f_i M_i
420-439      8     429.5     3436.0
440-459     17     449.5     7641.5
460-479     12     469.5     5634.0
480-499      8     489.5     3916.0
500-519      7     509.5     3566.5
520-539      4     529.5     2118.0
540-559      2     549.5     1099.0
560-579      4     569.5     2278.0
580-599      2     589.5     1179.0
600-619      6     609.5     3657.0
Total       70              34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

This approximation differs by $2.41 from the actual sample mean of $490.80.
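
The grouped-data mean can be recomputed from the class limits and frequencies. This is a minimal sketch; the variable names are illustrative, and each midpoint is taken as the average of the class limits, matching the M_i column.

```python
# (lower limit, upper limit, frequency) for each rent class on the slide.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # M_i

n = sum(freqs)
weighted_total = sum(f * m for f, m in zip(freqs, midpoints))   # sum of f_i * M_i
grouped_mean = weighted_total / n

print(n, weighted_total, round(grouped_mean, 2))
```

The gap between this value and the ungrouped mean arises because every rent in a class is treated as if it equaled the class midpoint.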

Variance for Grouped Data

■ For sample data

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

■ For population data

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

Sample Variance for Grouped Data

■ Example: Apartment Rents

Rent ($)    f_i    M_i      M_i - x̄    (M_i - x̄)^2    f_i (M_i - x̄)^2
420-439      8     429.5     -63.7       4058.96         32471.71
440-459     17     449.5     -43.7       1910.56         32479.59
460-479     12     469.5     -23.7        562.16          6745.97
480-499      8     489.5      -3.7         13.76           110.11
500-519      7     509.5      16.3        265.36          1857.55
520-539      4     529.5      36.3       1316.96          5267.86
540-559      2     549.5      56.3       3168.56          6337.13
560-579      4     569.5      76.3       5820.16         23280.66
580-599      2     589.5      96.3       9271.76         18543.53
600-619      6     609.5     116.3      13523.36         81140.18
Total       70                                          208234.29
Sample Variance for Grouped Data

■ Example: Apartment Rents

• Sample Variance

$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$

• Sample Standard Deviation

$$s = \sqrt{3{,}017.89} = 54.94$$

This approximation differs by only $.20 from the actual standard deviation of $54.74.
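
The grouped variance and standard deviation follow the same pattern as the grouped mean. This is a short sketch with illustrative variable names; it uses the unrounded grouped mean, so intermediate values can differ from the table in the last decimal place.

```python
from math import sqrt

# Class frequencies f_i and midpoints M_i from the grouped rent data.
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]

n = sum(freqs)
x_bar = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sample variance for grouped data: sum of f_i * (M_i - x_bar)^2 over (n - 1).
variance = sum(f * (m - x_bar) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
std_dev = sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```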

End of Chapter 3, Part B
