Professional Documents
Culture Documents
DESCRIPTIVE
STATISTICS
POINTS TO HIGHLIGHT
§ Modifying data in Excel
§ Measures of location
§ Measures of variability
§ Analyzing distribution
4
FIGURE 3.1: USING EXCEL’S SORT FUNCTION TO SORT THE
TOP-SELLING AUTOMOBILES DATA
5
FIGURE 3.2: TOP-SELLING AUTOMOBILES DATA SORTED BY
SALES IN MARCH 2010 SALES
6
SORTING AND FILTERING DATA IN EXCEL
7
FIGURE 3.3: TOP SELLING AUTOMOBILES DATA FILTERED TO
SHOW ONLY AUTOMOBILES MANUFACTURED BY TOYOTA
8
CONDITIONAL FORMATTING OF DATA IN EXCEL
§ Makes it easy to identify data that satisfy certain conditions in a
data set
9
CONDITIONAL FORMATTING OF DATA IN EXCEL
Example:
§ To identify the automobile models in Table 3.1 for which sales had
decreased from March 2010 to March 2011:
§ Step 1: Starting with the original data shown in Figure 3.1, select cells
F1:F21
§ Step 4: Select Highlight Cells Rules, and click Less Than from the
dropdown menu
§ Step 5: Enter 0% in the Format cells that are LESS THAN: box
§ Step 6: Click OK
10
FIGURE 3.4: USING CONDITIONAL FORMATTING IN EXCEL TO
HIGHLIGHT AUTOMOBILES WITH DECLINING SALES FROM MARCH
2010
11
FIGURE 3.5: USING CONDITIONAL FORMATTING IN EXCEL TO
GENERATE DATA BARS FOR THE TOP-SELLING AUTOMOBILES
DATA
12
CONDITIONAL FORMATTING OF DATA IN EXCEL
§ Quick Analysis button appears just outside the bottom-
right corner of a group of selected cells
§ Provides shortcuts for Conditional Formatting, adding Data
Bars, etc.
§ Frequency Distributions for Categorical Data
§Relative Frequency and Percent Frequency
Distributions
§ Frequency Distributions for Quantitative Data
§ Histograms
FREQUENCY DISTRIBUTIONS FOR CATEGORICAL
DATA
15
TABLE 3.6: DATA FROM A SAMPLE OF 50 SOFT
DRINK PURCHASES
16
§ The frequency distribution summarizes information about the
popularity of the five soft drinks:
18
RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS
19
TABLE 3.9: RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS OF SOFT DRINK PURCHASES
20
FREQUENCY DISTRIBUTIONS FOR QUANTITATIVE
DATA
21
Table 3.10: Year-End Audit Times (Days)
22
FIGURE 3.12: USING EXCEL TO GENERATE A FREQUENCY
DISTRIBUTION FOR AUDIT TIMES DATA
23
HISTOGRAM
§ A common graphical presentation of quantitative
data
24
FIGURE 3.13: HISTOGRAM FOR THE AUDIT TIME
DATA
25
FIGURE 3.14: CREATING A HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL
26
FIGURE 3.15: COMPLETED HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL
27
CREATING DISTRIBUTIONS FROM DATA
28
FIGURE 3.16: HISTOGRAMS SHOWING DISTRIBUTIONS WITH
DIFFERENT LEVELS OF SKEWNESS
29
CUMULATIVE DISTRIBUTIONS
30
TABLE 3.17: CUMULATIVE FREQUENCY, CUMULATIVE RELATIVE
FREQUENCY, AND CUMULATIVE PERCENT FREQUENCY
DISTRIBUTIONS FOR THE AUDIT TIME DATA
31
§ Mean (Arithmetic Mean)
§ Median
§ Mode
§ Geometric Mean
MEASURES OF LOCATION
Mean/Arithmetic Mean
§ Average value for a variable
§ The mean is denoted by 𝑥̅
§ n = sample size
§ 𝑥! = value of variable x for the first observation
§ 𝑥" = value of variable x for the second observation
§ 𝑥# = value of variable x for the nth observation
33
TABLE 3.18: DATA ON HOME SALES IN CINCINNATI, OHIO,
SUBURB
34
COMPUTATION OF SAMPLE MEAN
§ illustration: Computation of the mean home selling price for the
sample of 12 home sales
35
MEASURES OF LOCATION
Median
§ Value of the item in the middle when the
data are arranged in ascending order
§ Value of middle item, for an odd number of
observations
§ Average of values of two middle items, for an
even number of observations
36
COMPUTATION OF SAMPLE MEDIAN
§ illustration: When the number of observations
are odd
v Consider the class size data for a sample of five
college classes:
46 54 42 46 32
v Arrange the class size data in ascending order
32 42 46 46 54
v Middlemost value in the data set = 46
v Median is 46
37
COMPUTATION OF SAMPLE MEDIAN
§ illustration - When the number of observations
are even
§ Consider the data on home sales in Cincinnati, Ohio,
Suburb (Table 2.9)
§ Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
§ Median = average of two middle values
= "199,500 + 208,000" /"2" =203,750
32 42 46 46 54
§ Observe - 46 is the only value that occurs more
than once
§ Mode is 46
§ Multimodal data - Data contain at least two
modes
§ Bimodal data - Data contain exactly two modes
39
FIGURE 3.19: CALCULATING THE MEAN, MEDIAN, AND MODES
FOR THE HOME SALES DATA USING EXCEL
40
MEASURES OF LOCATION
Geometric Mean
§ nth root of the product of n values
§ Used in analyzing growth rates or rate of change
§ Sample geometric mean
41
TABLE 3.20: PERCENTAGE ANNUAL RETURNS AND GROWTH
FACTORS FOR THE MUTUAL FUND DATA
§ illustration: Consider the percentage annual returns and
growth factors for the mutual fund data over the past 10
years
§ We will determine the mean rate of growth for the fund
over the 10-year period
42
COMPUTATION OF GEOMETRIC MEAN
§ Solution:
§ Product of the growth factors:
(.779)(1.287)(1.109)(1.049)(1.158)(1.055)(.630)(1.265)(1.1
51)(1.021)
= 1.335
§ Geometric mean of the growth factors:
!"
𝑥!̅ = 1.335 = 1.029
§ Conclude that annual returns grew at an average annual rate of
(1.029 – 1)100% or 2.9%
43
FIGURE 3.21: CALCULATING THE GEOMETRIC MEAN FOR THE
MUTUAL FUND DATA USING EXCEL
44
§ Range
§ Variance
§ Standard Deviation
§ Coefficient of Variation
MEASURES OF VARIABILITY
Table 3.22: Annual Payouts for Figure 3.23: Histograms for
Payouts of Past 20 Years from Fund
Two Different Investment Funds A and Fund B
46
COMPUTATION OF RANGE
Range
§ Found by subtracting the smallest value from the largest value in a
data set
§ Illustration: Consider the data on home sales in Cincinnati, Ohio,
suburb
§ Largest home sales price: $456,250
§ Smallest home sales price: $108,000
§ Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
§ Drawback: Range is based on only two of the observations and
thus is highly influenced by extreme values
47
MEASURES OF VARIABILITY
Variance
§ Measure of variability that utilizes all the data
§ It is based on the deviation about the mean, which is the
difference between the value of each observation (xi) and the
mean
48
TABLE 3.24: COMPUTATION OF DEVIATIONS AND SQUARED
DEVIATIONS ABOUT THE MEAN FOR THE CLASS SIZE DATA
Computation of Sample
Variance:
49
FIGURE 3.25: CALCULATING VARIABILITY MEASURES FOR
THE HOME SALES DATA IN EXCEL
50
MEASURES OF VARIABILITY
§ Standard Deviation
§ Positive square root of the variance
§ Measured in the same units as the original data
§ For sample , s= 𝑠 "
§ For population, σ= σ"
§ Coefficient of Variation
+,-./-0/ /123-,34.
§
51-.
x 100 %
§ Measures the standard deviation relative to the mean
§ Expressed as a percentage
51
COMPUTATION OF COEFFICIENT OF VARIATION
§ Illustration:
§ Consider the class size data:
46 54 42 46 32
§ Mean, 𝑥̅ = 44
§ Standard deviation, s = 8
8
§ Coefficient of variation = 44 x 100 % = 18.2%
52
§ Percentiles
§ Quartiles
§ Z-Scores
§ Empirical Rule
§ Identifying Outliers
§ Box Plots
ANALYZING DISTRIBUTIONS
Percentiles
§ Value of a variable at which a specified (approximate)
percentage of observations are below that value
§ The pth percentile tells us the point in the data where:
§ Approximately p percent of the observations have values less
than the pth percentile
§ Approximately (100 – p) percent of the observations have values
greater than the pth percentile
54
ANALYZING DISTRIBUTIONS
§ Steps to calculate the pth percentile:
§ Arrange the data in ascending order (smallest to largest value)
§ Compute k = (n + 1) × p
§ Divide k into its integer component, i, and its decimal
component, d
§ If d = 0, find the kth largest value in the data set; this is the pth
percentile
§ If d > 0, the percentile is between the values in positions i and i + 1
in the sorted data; to find this percentile, we must interpolate
between these two values:
i. Calculate the difference between the values in
positions i and i + 1 in the sorted data set; we define
this difference between the two values as m
ii. Multiply this difference by d: t = m × d
iii. To find the pth percentile, add t to the value in
position i of the sorted data
55
ANALYZING DISTRIBUTIONS
§ Illustration
Illustration (contd.)
§ To determine the 85th percentile for the home sales data in Table
3.18
§ The value in the 11th position is 298,000
§ The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7912.5
pth percentile = 298,000 + 7912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
57
ANALYZING DISTRIBUTIONS
Quartiles
§ When the data is divided into four equal parts:
§ Each part contains approximately 25% of the observations
§ Division points are referred to as quartiles
§ 𝑄! = first quartile, or 25th percentile
§ 𝑄" = second quartile, or 50th percentile (also the median)
§ 𝑄# = third quartile, or 75th percentile
58
ANALYZING DISTRIBUTIONS
z-score
§ Measures the relative location of a value in the data set
§ Helps to determine how far a particular value is from the
mean relative to the data set’s standard deviation
§ Standardized value
§ If 𝑥! , 𝑥" , . . . , 𝑥# is a sample of n observations
$# % $̅
𝑧6 = 7
§ 𝑧$ = z-score for 𝑥$
§ 𝑥̅ = sample mean
§ s = sample standard deviation
59
TABLE 3.26: Z-SCORES FOR THE CLASS SIZE DATA
§ For class size data, 𝑥̅ = 44 and s= 8
§ For observations with a value > mean, z-score >0
§ For observations with a value <mean, z-score <0
60
FIGURE 3.27: CALCULATING Z-SCORES FOR THE
HOME SALES DATA IN EXCEL
61
EXAMPLE: WHICH IS THE BETTER OFFER?
Suppose that two graduating seniors, one a marketing major
and one an accounting major, are comparing job offers. The
accounting major has an offer for $45,000 per year, and the
marketing student has an offer for $42,000 per year. Summary
information about the distribution of offers follows:
Accounting: mean = 46,000 Standard deviation = 1500
Marketing: mean = 42,500 Standard deviation = 1000
EXAMPLE: WHICH IS THE BETTER OFFER?
§ Accounting § Marketing
This offer is 0.67 SD below the This offer is 0.5 SD below the
mean mean
ANALYZING DISTRIBUTIONS
Empirical Rule
§ For data having a bell-shaped distribution:
§ Within 1 standard deviation—approximately 68% of the data values
§ Within 2 standard deviations—approximately 95% of the data values
§ Within 3 standard deviations—almost all the data values
Identifying Outliers
§ Outliers: Extreme values in a data set
§ It can be identified using standardized values (z-scores)
§ Any data value with a z-score less than –3 or greater than +3 is an
outlier
64
ANALYZING DISTRIBUTIONS
Box Plots
§ Graphical summary of the distribution of data
§ Developed from the quartiles for a data set
65
FIGURE 3.29: BOX PLOTS COMPARING HOME SALE
PRICES IN DIFFERENT COMMUNITIES
66
§ Scatter Charts
§ Covariance
§ Correlation Coefficient
TABLE 3.30: DATA FOR BOTTLED WATER SALES AT
QUEENSLAND AMUSEMENT PARK FOR A SAMPLE OF 14
SUMMER DAYS
68
FIGURE 3.31: CHART SHOWING THE POSITIVE LINEAR
RELATION BETWEEN SALES AND HIGH TEMPERATURES
Scatter chart
69
MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
q Scatter Charts:
70
MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
§ Covariance: Descriptive measure of the linear association
between two variables
§ Sample covariance for a sample of size n with the observations
∑ %8 ()% &8 ( &*
(𝑥! , 𝑦! ), (𝑥" , 𝑦" ), and so on: 𝑠%& =
+(,
∑ $# % µ% 9# % µ&
§ Population covariance,𝜎$9 = *
71
TABLE 3.32: SAMPLE COVARIANCE CALCULATIONS FOR DAILY HIGH
TEMPERATURE AND BOTTLED WATER SALES AT QUEENSLAND AMUSEMENT
PARK
72
FIGURE 3.33: CALCULATING COVARIANCE AND CORRELATION
COEFFICIENT FOR BOTTLED WATER SALES USING EXCEL
73
MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
§ Correlation coefficient: Measures the relationship between
two variables
§ Not affected by the units of measurement for x andy
§ Sample correlation coefficient denoted by𝑟$9
'
§ 𝑟%& = ! !"
!
! "
∑ "# % &" ## % &#
§ 𝑠"# = sample covariance = '%(
∑ "# % "̅ $
§ 𝑠" = sample standard deviation of x = '%(
∑ ## % #* $
§ 𝑠# = sample standard deviation of y= '%(
74
INTERPRETATION OF CORRELATION
COEFFICIENT
–1 ≤ r ≤ +1
76
COMPUTATION OF CORRELATION COEFFICIENT
Illustration
§ To determine the sample correlation coefficient for bottled
water sales at Queensland Amusement Park:
#!" 12.8
𝑟!" = # # = (4.36)(3.15) = 0.93
! "
77
FIGURE 3.35: EXAMPLE OF NONLINEAR RELATIONSHIP
PRODUCING A CORRELATION COEFFICIENT NEAR ZERO
78