Chapter 3 (Descriptive)

CHAPTER 3
DESCRIPTIVE
STATISTICS
POINTS TO HIGHLIGHT
§ Modifying data in Excel
§ Creating distributions from data
§ Measures of location
§ Measures of variability
§ Analyzing distribution
§ Measure of association between two variables

§ Sorting and Filtering Data in Excel
§ Conditional Formatting of Data in Excel

SORTING AND FILTERING DATA IN EXCEL
To sort the automobiles by March 2010 sales:
§ Step 1: Select cells A1:F21
§ Step 2: Click the Data tab in the Ribbon
§ Step 3: Click Sort in the Sort & Filter group
§ Step 4: Select the check box for My data has headers
§ Step 5: In the first Sort by dropdown menu, select Sales (March
2010)
§ Step 6: In the Order dropdown menu, select Largest to Smallest
§ Step 7: Click OK
4
FIGURE 3.1: USING EXCEL’S SORT FUNCTION TO SORT THE
TOP-SELLING AUTOMOBILES DATA
5
FIGURE 3.2: TOP-SELLING AUTOMOBILES DATA SORTED BY
SALES IN MARCH 2010 SALES
6
SORTING AND FILTERING DATA IN EXCEL
§ Using Excel’s Filter function to see the sales of

models made by Toyota
§ Step 1: Select cells A1:F21
§ Step 2: Click the Data tab in the Ribbon
§ Step 3: Click Filter in the Sort & Filter group
§ Step 4: Click on the Filter Arrow in column B, next to Manufacturer
§ Step 5: If all choices are checked, you can easily deselect all choices by
unchecking (Select All). Then select only the check box for Toyota.
§ Step 6. Click OK
7
FIGURE 3.3: TOP SELLING AUTOMOBILES DATA FILTERED TO
SHOW ONLY AUTOMOBILES MANUFACTURED BY TOYOTA
8
CONDITIONAL FORMATTING OF DATA IN EXCEL
§ Makes it easy to identify data that satisfy certain conditions in a
data set
9
Example:
§ To identify the automobile models in Table 3.1 for which sales had
decreased from March 2010 to March 2011:
§ Step 1: Starting with the original data shown in Figure 3.1, select cells
F1:F21
§ Step 2: Click on the Home tab in the Ribbon
§ Step 3: Click Conditional Formatting in the Styles group
§ Step 4: Select Highlight Cells Rules, and click Less Than from the
dropdown menu
§ Step 5: Enter 0% in the Format cells that are LESS THAN: box
§ Step 6: Click OK
10
FIGURE 3.4: USING CONDITIONAL FORMATTING IN EXCEL TO
HIGHLIGHT AUTOMOBILES WITH DECLINING SALES FROM MARCH
2010
11
FIGURE 3.5: USING CONDITIONAL FORMATTING IN EXCEL TO
GENERATE DATA BARS FOR THE TOP-SELLING AUTOMOBILES
DATA
12
§ Quick Analysis button appears just outside the bottom-
right corner of a group of selected cells
§ Provides shortcuts for Conditional Formatting, adding Data
Bars, etc.
§ Frequency Distributions for Categorical Data
§Relative Frequency and Percent Frequency
Distributions
§ Frequency Distributions for Quantitative Data
§ Histograms
FREQUENCY DISTRIBUTIONS FOR CATEGORICAL
DATA
§ Frequency distribution: A summary of

data that shows the number (frequency) of
observations in each of several
nonoverlapping classes,
§ Typically referred to as bins, when dealing

with distributions
15
TABLE 3.6: DATA FROM A SAMPLE OF 50 SOFT
DRINK PURCHASES
16
§ The frequency distribution summarizes information about the
popularity of the five soft drinks:
§ Coca-Cola is the leader

§ Pepsi is second
§ Diet Coke is third and Sprite and Dr. Pepper are tied for fourth
FIGURE 3.8: CREATING A FREQUENCY DISTRIBUTION FOR
SOFT DRINKS DATA IN EXCEL
18
RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS
§ Relative frequency distribution: It is a

tabular summary of data showing the
relative frequency for each bin
§ Percent frequency distribution:

Summarizes the percent frequency of the
data for each bin
19
TABLE 3.9: RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS OF SOFT DRINK PURCHASES
20
FREQUENCY DISTRIBUTIONS FOR QUANTITATIVE
DATA
Three steps necessary to define the classes for

a frequency distribution with quantitative data:
1. Determine the number of non-overlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
21
Table 3.10: Year-End Audit Times (Days)
Table 3.11: Frequency, Relative Frequency, and Percent

Frequency Distributions for the Audit Time Data
22
FIGURE 3.12: USING EXCEL TO GENERATE A FREQUENCY
DISTRIBUTION FOR AUDIT TIMES DATA
23
HISTOGRAM
§ A common graphical presentation of quantitative
data
§ Constructed by placing the variable of interest on

the horizontal axis and the selected frequency
measure (absolute frequency, relative frequency, or
percent frequency) on the vertical axis.
§ The frequency measure of each class is shown by

drawing a rectangle whose base is determined by
the class limits on the horizontal axis and whose
height is the corresponding frequency measure.
24
FIGURE 3.13: HISTOGRAM FOR THE AUDIT TIME
DATA
25
FIGURE 3.14: CREATING A HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL
26
FIGURE 3.15: COMPLETED HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL
27
CREATING DISTRIBUTIONS FROM DATA
§ Histograms provides information about the shape, or form, of

a distribution
§ Skewness: Lack of symmetry
§ Skewness is an important characteristic of the shape of a
distribution
28
FIGURE 3.16: HISTOGRAMS SHOWING DISTRIBUTIONS WITH
DIFFERENT LEVELS OF SKEWNESS
29
CUMULATIVE DISTRIBUTIONS
§ Cumulative frequency distribution: shows the number of

data items with values less than or equal to the upper class
limit of each class
§ A variation of the frequency distribution that provides another
tabular summary of quantitative data
30
TABLE 3.17: CUMULATIVE FREQUENCY, CUMULATIVE RELATIVE
FREQUENCY, AND CUMULATIVE PERCENT FREQUENCY
DISTRIBUTIONS FOR THE AUDIT TIME DATA
31
§ Mean (Arithmetic Mean)
§ Median
§ Mode
§ Geometric Mean
MEASURES OF LOCATION
Mean/Arithmetic Mean
§ Average value for a variable
§ The mean is denoted by 𝑥̅
§ n = sample size
§ 𝑥! = value of variable x for the first observation
§ 𝑥" = value of variable x for the second observation
§ 𝑥# = value of variable x for the nth observation
33
TABLE 3.18: DATA ON HOME SALES IN CINCINNATI, OHIO,
SUBURB
34
COMPUTATION OF SAMPLE MEAN
§ illustration: Computation of the mean home selling price for the
sample of 12 home sales
35
Median
§ Value of the item in the middle when the
data are arranged in ascending order
§ Value of middle item, for an odd number of
observations
§ Average of values of two middle items, for an
even number of observations
36
COMPUTATION OF SAMPLE MEDIAN
§ illustration: When the number of observations
are odd
v Consider the class size data for a sample of five
college classes:
46 54 42 46 32
v Arrange the class size data in ascending order
32 42 46 46 54
v Middlemost value in the data set = 46
v Median is 46
37
COMPUTATION OF SAMPLE MEDIAN
§ illustration - When the number of observations
are even
§ Consider the data on home sales in Cincinnati, Ohio,
Suburb (Table 2.9)
§ Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
§ Median = average of two middle values
= "199,500 + 208,000" /"2" =203,750
Middle Two Values

38
Mode
§ Value that occurs most frequently in a data set
§ Consider the class size data:
32 42 46 46 54
§ Observe - 46 is the only value that occurs more
than once
§ Mode is 46
§ Multimodal data - Data contain at least two
modes
§ Bimodal data - Data contain exactly two modes
39
FIGURE 3.19: CALCULATING THE MEAN, MEDIAN, AND MODES
FOR THE HOME SALES DATA USING EXCEL
40
Geometric Mean
§ nth root of the product of n values
§ Used in analyzing growth rates or rate of change
§ Sample geometric mean
41
TABLE 3.20: PERCENTAGE ANNUAL RETURNS AND GROWTH
FACTORS FOR THE MUTUAL FUND DATA
§ illustration: Consider the percentage annual returns and
growth factors for the mutual fund data over the past 10
years
§ We will determine the mean rate of growth for the fund
over the 10-year period
42
COMPUTATION OF GEOMETRIC MEAN
§ Solution:
§ Product of the growth factors:
(.779)(1.287)(1.109)(1.049)(1.158)(1.055)(.630)(1.265)(1.1
51)(1.021)
= 1.335
§ Geometric mean of the growth factors:
!"
𝑥!̅ = 1.335 = 1.029
§ Conclude that annual returns grew at an average annual rate of
(1.029 – 1)100% or 2.9%
43
FIGURE 3.21: CALCULATING THE GEOMETRIC MEAN FOR THE
MUTUAL FUND DATA USING EXCEL
44
§ Range
§ Variance
§ Standard Deviation
§ Coefficient of Variation
MEASURES OF VARIABILITY
Table 3.22: Annual Payouts for Figure 3.23: Histograms for
Payouts of Past 20 Years from Fund
Two Different Investment Funds A and Fund B
46
COMPUTATION OF RANGE
Range
§ Found by subtracting the smallest value from the largest value in a
data set
§ Illustration: Consider the data on home sales in Cincinnati, Ohio,
suburb
§ Largest home sales price: $456,250
§ Smallest home sales price: $108,000
§ Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
§ Drawback: Range is based on only two of the observations and
thus is highly influenced by extreme values
47
Variance
§ Measure of variability that utilizes all the data
§ It is based on the deviation about the mean, which is the
difference between the value of each observation (xi) and the
mean
§ The deviations about the mean are squared while computing

the variance
$
∑ $# % $̅
§ Sample variance, 𝑠" = '%(
$
∑ $# % )
§ Population variance , 𝜎" = *
48
TABLE 3.24: COMPUTATION OF DEVIATIONS AND SQUARED
DEVIATIONS ABOUT THE MEAN FOR THE CLASS SIZE DATA
Computation of Sample
Variance:
49
FIGURE 3.25: CALCULATING VARIABILITY MEASURES FOR
THE HOME SALES DATA IN EXCEL
50
§ Standard Deviation
§ Positive square root of the variance
§ Measured in the same units as the original data
§ For sample , s= 𝑠 "
§ For population, σ= σ"
§ Coefficient of Variation
+,-./-0/ /123-,34.
§
51-.
x 100 %
§ Measures the standard deviation relative to the mean
§ Expressed as a percentage
51
COMPUTATION OF COEFFICIENT OF VARIATION
§ Illustration:
§ Consider the class size data:
46 54 42 46 32
§ Mean, 𝑥̅ = 44
§ Standard deviation, s = 8
8
§ Coefficient of variation = 44 x 100 % = 18.2%
52
§ Percentiles
§ Quartiles
§ Z-Scores
§ Empirical Rule
§ Identifying Outliers
§ Box Plots
ANALYZING DISTRIBUTIONS
Percentiles
§ Value of a variable at which a specified (approximate)
percentage of observations are below that value
§ The pth percentile tells us the point in the data where:
§ Approximately p percent of the observations have values less
than the pth percentile
§ Approximately (100 – p) percent of the observations have values
greater than the pth percentile
54
§ Steps to calculate the pth percentile:
§ Arrange the data in ascending order (smallest to largest value)
§ Compute k = (n + 1) × p
§ Divide k into its integer component, i, and its decimal
component, d
§ If d = 0, find the kth largest value in the data set; this is the pth
percentile
§ If d > 0, the percentile is between the values in positions i and i + 1
in the sorted data; to find this percentile, we must interpolate
between these two values:
i. Calculate the difference between the values in
positions i and i + 1 in the sorted data set; we define
this difference between the two values as m
ii. Multiply this difference by d: t = m × d
iii. To find the pth percentile, add t to the value in
position i of the sorted data
55
§ Illustration
§ To determine the 85th percentile for the home sales data in

Table 3.18.
1. Arrange the data in ascending order
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
3. Dividing 11.05 into the integer and decimal components
gives us i = 11 and d = 0.05
4. d > 0, interpolate between the values in the 11th and 12th
positions in the sorted data
56
Illustration (contd.)
§ To determine the 85th percentile for the home sales data in Table
3.18
§ The value in the 11th position is 298,000
§ The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7912.5
pth percentile = 298,000 + 7912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
57
Quartiles
§ When the data is divided into four equal parts:
§ Each part contains approximately 25% of the observations
§ Division points are referred to as quartiles
§ 𝑄! = first quartile, or 25th percentile
§ 𝑄" = second quartile, or 50th percentile (also the median)
§ 𝑄# = third quartile, or 75th percentile
58
z-score
§ Measures the relative location of a value in the data set
§ Helps to determine how far a particular value is from the
mean relative to the data set’s standard deviation
§ Standardized value
§ If 𝑥! , 𝑥" , . . . , 𝑥# is a sample of n observations
$# % $̅
𝑧6 = 7
§ 𝑧$ = z-score for 𝑥$
§ 𝑥̅ = sample mean
§ s = sample standard deviation
59
TABLE 3.26: Z-SCORES FOR THE CLASS SIZE DATA
§ For class size data, 𝑥̅ = 44 and s= 8
§ For observations with a value > mean, z-score >0
§ For observations with a value <mean, z-score <0
60
FIGURE 3.27: CALCULATING Z-SCORES FOR THE
HOME SALES DATA IN EXCEL
61
EXAMPLE: WHICH IS THE BETTER OFFER?
Suppose that two graduating seniors, one a marketing major
and one an accounting major, are comparing job offers. The
accounting major has an offer for $45,000 per year, and the
marketing student has an offer for $42,000 per year. Summary
information about the distribution of offers follows:
Accounting: mean = 46,000 Standard deviation = 1500
Marketing: mean = 42,500 Standard deviation = 1000
EXAMPLE: WHICH IS THE BETTER OFFER?
§ Accounting § Marketing
45000 − 46000 42000 − 42500

z score = = −0.67 z score = = −0.5
1500 1000
This offer is 0.67 SD below the This offer is 0.5 SD below the
mean mean
Empirical Rule
§ For data having a bell-shaped distribution:
§ Within 1 standard deviation—approximately 68% of the data values
§ Within 2 standard deviations—approximately 95% of the data values
§ Within 3 standard deviations—almost all the data values
Identifying Outliers
§ Outliers: Extreme values in a data set
§ It can be identified using standardized values (z-scores)
§ Any data value with a z-score less than –3 or greater than +3 is an
outlier
64
Box Plots
§ Graphical summary of the distribution of data
§ Developed from the quartiles for a data set
Figure 3.28: Box

Plot for the
Home Sales Data
65
FIGURE 3.29: BOX PLOTS COMPARING HOME SALE
PRICES IN DIFFERENT COMMUNITIES
66
§ Scatter Charts
§ Covariance
§ Correlation Coefficient
TABLE 3.30: DATA FOR BOTTLED WATER SALES AT
QUEENSLAND AMUSEMENT PARK FOR A SAMPLE OF 14
SUMMER DAYS
68
FIGURE 3.31: CHART SHOWING THE POSITIVE LINEAR
RELATION BETWEEN SALES AND HIGH TEMPERATURES
Scatter chart
69
MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
q Scatter Charts:
§ Useful graph for analyzing the relationship between

two variables
§ The scatter chart also suggests that a straight line

could be used as an approximation for the relationship
between two variables
70
TWO VARIABLES
§ Covariance: Descriptive measure of the linear association
between two variables
§ Sample covariance for a sample of size n with the observations
∑ %8 ()% &8 ( &*
(𝑥! , 𝑦! ), (𝑥" , 𝑦" ), and so on: 𝑠%& =
+(,
∑ $# % µ% 9# % µ&
§ Population covariance,𝜎$9 = *
71
TABLE 3.32: SAMPLE COVARIANCE CALCULATIONS FOR DAILY HIGH
TEMPERATURE AND BOTTLED WATER SALES AT QUEENSLAND AMUSEMENT
PARK
72
FIGURE 3.33: CALCULATING COVARIANCE AND CORRELATION
COEFFICIENT FOR BOTTLED WATER SALES USING EXCEL
73
TWO VARIABLES
§ Correlation coefficient: Measures the relationship between
two variables
§ Not affected by the units of measurement for x andy
§ Sample correlation coefficient denoted by𝑟$9
'
§ 𝑟%& = ! !"
!
! "
∑ "# % &" ## % &#
§ 𝑠"# = sample covariance = '%(
∑ "# % "̅ $
§ 𝑠" = sample standard deviation of x = '%(
∑ ## % #* $
§ 𝑠# = sample standard deviation of y= '%(
74
INTERPRETATION OF CORRELATION
COEFFICIENT
–1 ≤ r ≤ +1
r value Relationship between

the x and y variables
<0 Negative linear
Near 0 No linear relationship
>0 Positive linear

75
FIGURE 3.34: SCATTER DIAGRAMS AND ASSOCIATED
COVARIANCE VALUES FOR DIFFERENT VARIABLE
RELATIONSHIPS
(a) (b) (c)

𝑠%& Positive: 𝑠%& Approximately 0: 𝑠%& Negative:
(x and y are positively (x and y are not (x and y are negatively
linearly related) linearly related) linearly related)
76
COMPUTATION OF CORRELATION COEFFICIENT
Illustration
§ To determine the sample correlation coefficient for bottled
water sales at Queensland Amusement Park:
#!" 12.8
𝑟!" = # # = (4.36)(3.15) = 0.93
! "
§ There is a very strong linear relationship between high

temperature and sales
77
FIGURE 3.35: EXAMPLE OF NONLINEAR RELATIONSHIP
PRODUCING A CORRELATION COEFFICIENT NEAR ZERO
78

Chapter 3 (Descriptive)

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 3 (Descriptive)

Uploaded by

Copyright:

Available Formats

CHAPTER 3

§ Creating distributions from data

§ Measure of association between two variables

§ Conditional Formatting of Data in Excel

§ Using Excel’s Filter function to see the sales of

§ Step 2: Click on the Home tab in the Ribbon

§ Step 3: Click Conditional Formatting in the Styles group

§ Frequency distribution: A summary of

§ Typically referred to as bins, when dealing

§ Coca-Cola is the leader

§ Relative frequency distribution: It is a

§ Percent frequency distribution:

Three steps necessary to define the classes for

Table 3.11: Frequency, Relative Frequency, and Percent

§ Constructed by placing the variable of interest on

§ The frequency measure of each class is shown by

§ Histograms provides information about the shape, or form, of

§ Cumulative frequency distribution: shows the number of

Middle Two Values

§ The deviations about the mean are squared while computing

§ To determine the 85th percentile for the home sales data in

45000 − 46000 42000 − 42500

Figure 3.28: Box

§ Useful graph for analyzing the relationship between

§ The scatter chart also suggests that a straight line

r value Relationship between

Near 0 No linear relationship

>0 Positive linear

(a) (b) (c)

§ There is a very strong linear relationship between high

You might also like