
Business Analytics

Descriptive Statistics
Analyzing Distributions

Lecture # 04

TOPICS to be COVERED

01 Percentiles

02 Quartiles

03 Variance

04 Standard Deviation

© 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part,
except for use as permitted in a license distributed with a certain product or service or otherwise on a password-
protected website for classroom use.
Analyzing Distributions
• Distributions are very useful for interpreting
and analyzing data.
• A distribution describes the overall variability
of the observed values of a variable.
• In this section we introduce additional ways of
analyzing distributions.

Analyzing Distributions
• Percentiles
• Quartiles
• Z-score
• Empirical Rule
• Box Plots

Percentile
• A percentile is the value of a variable at which a
specified (approximate) percentage of
observations are below that value.

• The pth percentile tells us the point in the data
where approximately p% of the observations
have values less than the pth percentile; hence,
approximately (100 − p)% of the observations
have values greater than the pth percentile.
• Percentile:
The value below which a given percentage of the data falls.

Example
• If a child's weight is at the 50th percentile line,
that means that out of 100 normal children
her age, 50 will be bigger than she is and 50
smaller.
• Similarly, if she is in the 75th percentile, that
means that she is bigger than 75 children and
smaller than only 25, compared with 100
children her age.

Examples:
Colleges and universities frequently report admission test
scores in terms of percentiles.

• For instance, suppose an applicant obtains a raw score of 54
on the verbal portion of an admission test. How this student
performed in relation to other students taking the same test
may not be readily apparent.

• However, if the raw score of 54 corresponds to the 70th
percentile, we know that approximately 70% of the students
scored lower than this individual, and approximately 30% of
the students scored higher.
To calculate the pth percentile for a data set containing n observations, we must first
arrange the data in ascending order (smallest value to largest value). The location of the
pth percentile, denoted Lp, is then
Lp = (p/100)(n + 1)

Exercise:

Compute the 85th percentile for the home sales
data in Table 2.9.

# selling price
1 108,000
2 138,000
3 138,000
4 142,000
5 186,000
6 199,500
7 208,000
8 254,000
9 254,000
10 257,500
11 298,000
12 456,250
• With p = 85 and n = 12, the location of the
85th percentile is
L85 = (85/100)(12 + 1) = 11.05

• L85 = 11.05 is interpreted as follows: the 85th
percentile is 5% of the way between the value in
position 11 and the value in position 12.
• Equivalently, the 85th percentile is the value in position 11
(298,000) plus 0.05 times the difference
between the value in position 12 (456,250) and
the value in position 11 (298,000).
Thus, the 85th percentile
= 298,000 + 0.05(456,250 - 298,000)
= 298,000 + 0.05(158,250)
= 305,912.50

• Therefore, $305,912.50 represents the 85th
percentile of the home sales data.
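
The steps above (sort, locate, interpolate) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `percentile` is hypothetical, and the handling of locations that fall before the first or after the last observation (clamping to the nearest value) is an assumption, since the slides do not cover that case.

```python
import math

def percentile(values, p):
    """Return the pth percentile using the location Lp = (p/100)(n + 1)."""
    data = sorted(values)            # ascending order is required
    n = len(data)
    loc = p * (n + 1) / 100          # 1-based, possibly fractional position
    lower = math.floor(loc)          # position at or just below the location
    frac = loc - lower               # fraction of the way to the next position
    if lower < 1:                    # assumption: clamp to the smallest value
        return data[0]
    if lower >= n:                   # assumption: clamp to the largest value
        return data[-1]
    # value at position `lower` plus a fraction of the gap to the next value
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# Home sales data from the table above
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

p85 = percentile(home_sales, 85)
print(round(p85, 2))                 # 305912.5
```

Other software may use a different location rule (for example, one based on n − 1 rather than n + 1), so results for small data sets can differ slightly from the hand calculation.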

Quartiles
• When the data are divided into four equal parts:
– Each part contains approximately 25% of the
observations
– Division points are referred to as quartiles

Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (also the median)
Q3 = third quartile, or 75th percentile

• To demonstrate quartiles, the home sales data
are again arranged in ascending order

# Selling Price
1. 108,000 7. 208,000
2. 138,000 8. 254,000
3. 138,000 9. 254,000
4. 142,000 10. 257,500
5. 186,000 11. 298,000
6. 199,500 12. 456,250

• We already identified Q2, the second quartile
(median) as 203,750 (as calculated earlier)
• To find Q1 and Q3 we must find the 25th and
75th percentiles.
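
As a rough check on the quartile calculation, the sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its `method="exclusive"` option places cut points on an (n + 1) basis, which matches the location rule used for the 85th percentile; the comments show the corresponding hand calculation.

```python
import statistics

# Home sales data, already in ascending order
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# By hand, with n = 12:
#   L25 = (25/100)(12 + 1) = 3.25 -> 138,000 + 0.25(142,000 - 138,000)
#   L75 = (75/100)(12 + 1) = 9.75 -> 254,000 + 0.75(257,500 - 254,000)
q1, q2, q3 = statistics.quantiles(home_sales, n=4, method="exclusive")

print(q1, q2, q3)   # 139000.0 203750.0 256625.0
```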

• Therefore, for the home sales data:
• 25th percentile (Q1) is $139,000
• 50th percentile (Q2) is $203,750
• 75th percentile (Q3) is $256,625
• Thus, the quartiles divide the home sales data into four
parts, with each part containing 25% of the observations.

108,000 138,000 138,000 | 142,000 186,000 199,500 | 208,000 254,000 254,000 | 257,500 298,000 456,250
• The difference between the third and first
quartiles is often referred to as the interquartile
range, or IQR.
• For the home sales data,
IQR = Q3 - Q1 = 256,625 - 139,000 = 117,625.
• Because it excludes the smallest and largest 25% of
values in the data, the IQR is a useful measure of
variation for data that have extreme values or are
highly skewed.
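
The robustness of the IQR to extreme values can be illustrated with a short sketch. The helper name `iqr` and the modified data set `with_outlier` are hypothetical: the latter replaces the largest sale with an invented, much larger value purely for illustration.

```python
import statistics

def iqr(values):
    """Interquartile range, Q3 - Q1, using the (n + 1) location rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# Hypothetical variant: the largest sale is replaced by an extreme value
with_outlier = home_sales[:-1] + [4562500]

for label, data in [("original", home_sales), ("with outlier", with_outlier)]:
    data_range = max(data) - min(data)   # sensitive to the extreme value
    print(label, "range:", data_range, "IQR:", iqr(data))
```

The range grows from 348,250 to 4,454,500 when the extreme value is introduced, while the IQR stays at 117,625 because Q1 and Q3 depend only on the middle of the ordered data.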
Variance
• The variance is a measure of variability that
utilizes all the data. The variance is based on
the deviation about the mean, which is the
difference between the value of each
observation (xi) and the mean (x̄ for a sample,
μ for a population).
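
The sketch below works through the sample variance, s^2 = Σ(xi − x̄)^2 / (n − 1), step by step. The five class sizes are an assumption: the underlying table is not reproduced in these slides, so the values were chosen to reproduce the sample mean of 44 and sample variance of 64 quoted later in the lecture.

```python
import statistics

# Assumed class sizes (not given in the slides); they reproduce
# the quoted sample mean of 44 and sample variance of 64.
class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n                      # sample mean

# Deviation about the mean for each observation; these always sum to zero
deviations = [x - mean for x in class_sizes]

# Squaring removes the signs so the deviations do not cancel out
sum_sq = sum(d ** 2 for d in deviations)

sample_variance = sum_sq / (n - 1)               # divide by n - 1 for a sample
population_variance = sum_sq / n                 # divide by n if the data are the whole population

print(mean, sum_sq, sample_variance)             # 44.0 256.0 64.0
print(statistics.variance(class_sizes))          # library cross-check: 64
```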

Standard Deviation
• The standard deviation is defined to be the
positive square root of the variance. We use ‘s’
to denote the sample standard deviation and ‘σ’
to denote the population standard deviation.

Table 2.12

The sample variance for the sample of class sizes in five college classes is
s^2 = 64.
Thus, the sample standard deviation is s = √64 = 8.
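
A short sketch of the same step in Python, taking the square root of the stated variance and cross-checking with `statistics.stdev`. The list `class_sizes` is an assumption (the table values are not reproduced here); it was chosen so that the sample variance equals the stated 64.

```python
import math
import statistics

sample_variance = 64                 # value stated above
s = math.sqrt(sample_variance)       # positive square root of the variance
print(s)                             # 8.0

# Assumed class sizes that reproduce s^2 = 64 (not given in the slides)
class_sizes = [46, 54, 42, 46, 32]
print(statistics.stdev(class_sizes))
```

Unlike the variance, which is in squared units, the standard deviation is expressed in the same units as the data (here, students per class), which makes it easier to interpret.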

Coefficient of Variation
• In some situations we may be interested in a
descriptive statistic that indicates how large
the standard deviation is relative to the mean.
This measure is called the coefficient of
variation and is usually expressed as a
percentage.

• For the class size data, we found a sample mean of 44 and a sample
standard deviation of 8.
• The coefficient of variation is (8/44 × 100)% = 18.2%.
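
The calculation can be wrapped in a small helper, sketched below. The function name `coefficient_of_variation` is hypothetical, and rejecting a zero mean is an assumption added to avoid division by zero.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        # Assumption: the measure is treated as undefined for a zero mean
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100

cv = coefficient_of_variation(8, 44)
print(f"{cv:.1f}%")   # 18.2%
```

Because it is a ratio, the coefficient of variation has no units, which makes it convenient for comparing the variability of variables measured on different scales.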

Thank You !

