
CHAPTER 3

DESCRIPTIVE
STATISTICS
POINTS TO HIGHLIGHT
§ Modifying data in Excel

§ Creating distributions from data

§ Measures of location

§ Measures of variability

§ Analyzing distribution

§ Measure of association between two variables


§ Sorting and Filtering Data in Excel

§ Conditional Formatting of Data in Excel


SORTING AND FILTERING DATA IN EXCEL
To sort the automobiles by March 2010 sales:
§ Step 1: Select cells A1:F21
§ Step 2: Click the Data tab in the Ribbon
§ Step 3: Click Sort in the Sort & Filter group
§ Step 4: Select the check box for My data has headers
§ Step 5: In the first Sort by dropdown menu, select Sales (March
2010)
§ Step 6: In the Order dropdown menu, select Largest to Smallest
§ Step 7: Click OK

FIGURE 3.1: USING EXCEL’S SORT FUNCTION TO SORT THE
TOP-SELLING AUTOMOBILES DATA

FIGURE 3.2: TOP-SELLING AUTOMOBILES DATA SORTED BY
SALES IN MARCH 2010

SORTING AND FILTERING DATA IN EXCEL

§ Using Excel’s Filter function to see the sales of models made by
Toyota:
§ Step 1: Select cells A1:F21
§ Step 2: Click the Data tab in the Ribbon
§ Step 3: Click Filter in the Sort & Filter group
§ Step 4: Click on the Filter Arrow in column B, next to Manufacturer
§ Step 5: If all choices are checked, you can easily deselect all choices by
unchecking (Select All). Then select only the check box for Toyota.
§ Step 6: Click OK
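
For readers who also script their analyses, the same sort and filter can be sketched in Python. This is only an illustration: the rows and field names below are hypothetical placeholders, not values from the automobile data set.

```python
# Hypothetical rows standing in for the automobile table; all values are made up.
rows = [
    {"model": "Model A", "manufacturer": "Toyota", "sales_march_2010": 29_000},
    {"model": "Model B", "manufacturer": "Honda", "sales_march_2010": 31_000},
    {"model": "Model C", "manufacturer": "Toyota", "sales_march_2010": 25_000},
    {"model": "Model D", "manufacturer": "Ford", "sales_march_2010": 42_000},
]

# Sort: largest to smallest by March 2010 sales.
sorted_rows = sorted(rows, key=lambda r: r["sales_march_2010"], reverse=True)

# Filter: keep only the rows whose manufacturer is Toyota.
toyota_rows = [r for r in rows if r["manufacturer"] == "Toyota"]

print([r["model"] for r in sorted_rows])
print([r["model"] for r in toyota_rows])
```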

FIGURE 3.3: TOP-SELLING AUTOMOBILES DATA FILTERED TO
SHOW ONLY AUTOMOBILES MANUFACTURED BY TOYOTA

CONDITIONAL FORMATTING OF DATA IN EXCEL
§ Makes it easy to identify data that satisfy certain conditions in a
data set

CONDITIONAL FORMATTING OF DATA IN EXCEL
Example:
§ To identify the automobile models in Table 3.1 for which sales had
decreased from March 2010 to March 2011:

§ Step 1: Starting with the original data shown in Figure 3.1, select cells
F1:F21

§ Step 2: Click on the Home tab in the Ribbon

§ Step 3: Click Conditional Formatting in the Styles group

§ Step 4: Select Highlight Cells Rules, and click Less Than from the
dropdown menu

§ Step 5: Enter 0% in the Format cells that are LESS THAN: box

§ Step 6: Click OK
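
The "less than 0%" rule can be mirrored in a short Python sketch. It assumes that column F holds the percent change in sales from March 2010 to March 2011; the rows below are hypothetical, not the values from Table 3.1.

```python
# Hypothetical (model, March 2010 sales, March 2011 sales) rows; values are made up.
sales = [
    ("Model A", 29_000, 31_900),
    ("Model B", 31_000, 27_900),
    ("Model C", 25_000, 25_000),
]

def percent_change(old, new):
    """Change from old to new as a fraction (0.10 means +10%)."""
    return (new - old) / old

# Mirror "Format cells that are LESS THAN 0%": keep models whose change is negative.
declining = [model for model, y2010, y2011 in sales
             if percent_change(y2010, y2011) < 0]
print(declining)
```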
FIGURE 3.4: USING CONDITIONAL FORMATTING IN EXCEL TO
HIGHLIGHT AUTOMOBILES WITH DECLINING SALES FROM MARCH
2010

FIGURE 3.5: USING CONDITIONAL FORMATTING IN EXCEL TO
GENERATE DATA BARS FOR THE TOP-SELLING AUTOMOBILES
DATA

CONDITIONAL FORMATTING OF DATA IN EXCEL
§ Quick Analysis button appears just outside the bottom-
right corner of a group of selected cells
§ Provides shortcuts for Conditional Formatting, adding Data
Bars, etc.
§ Frequency Distributions for Categorical Data
§ Relative Frequency and Percent Frequency Distributions
§ Frequency Distributions for Quantitative Data
§ Histograms
FREQUENCY DISTRIBUTIONS FOR CATEGORICAL
DATA

§ Frequency distribution: A summary of data that shows the number
(frequency) of observations in each of several nonoverlapping classes

§ The classes are typically referred to as bins when dealing with
distributions

TABLE 3.6: DATA FROM A SAMPLE OF 50 SOFT
DRINK PURCHASES

§ The frequency distribution summarizes information about the
popularity of the five soft drinks:

§ Coca-Cola is the leader


§ Pepsi is second
§ Diet Coke is third and Sprite and Dr. Pepper are tied for fourth
FIGURE 3.8: CREATING A FREQUENCY DISTRIBUTION FOR
SOFT DRINKS DATA IN EXCEL

RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS

§ Relative frequency distribution: A tabular summary of data showing
the relative frequency for each bin

§ Percent frequency distribution: A tabular summary of data showing
the percent frequency for each bin
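
A minimal sketch of the three distributions, assuming the usual definitions: relative frequency is a bin's frequency divided by n, and percent frequency is the relative frequency times 100. The ten purchases below are hypothetical; the 50 observations of Table 3.6 are not reproduced here.

```python
from collections import Counter

# Hypothetical sample of soft drink purchases (not the Table 3.6 data).
purchases = ["Coca-Cola", "Pepsi", "Coca-Cola", "Sprite", "Diet Coke",
             "Coca-Cola", "Pepsi", "Dr. Pepper", "Diet Coke", "Coca-Cola"]

frequency = Counter(purchases)                       # count per bin
n = len(purchases)
relative = {drink: count / n for drink, count in frequency.items()}
percent = {drink: 100 * share for drink, share in relative.items()}

for drink, count in frequency.most_common():
    print(f"{drink:<10} {count:>3} {relative[drink]:.2f} {percent[drink]:.0f}%")
```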

TABLE 3.9: RELATIVE FREQUENCY AND PERCENT FREQUENCY
DISTRIBUTIONS OF SOFT DRINK PURCHASES

FREQUENCY DISTRIBUTIONS FOR QUANTITATIVE
DATA

Three steps necessary to define the classes for a frequency
distribution with quantitative data:
1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
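
The three steps can be sketched as follows. The audit times are illustrative values, not a reproduction of Table 3.10. The choices of five bins, a width rounded up from (largest − smallest) / number of bins, and a starting limit of 10 are assumptions made for the example.

```python
import math

# Illustrative audit times in days (not taken from Table 3.10).
times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
         22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_bins = 5                                               # Step 1: number of bins
width = math.ceil((max(times) - min(times)) / num_bins)    # Step 2: bin width
lower = 10                                                 # Step 3: convenient limit at or below the minimum
limits = [(lower + i * width, lower + (i + 1) * width - 1) for i in range(num_bins)]

# Frequency of each bin: count the values between its lower and upper limit.
counts = [sum(lo <= t <= hi for t in times) for lo, hi in limits]

for (lo, hi), count in zip(limits, counts):
    print(f"{lo}-{hi}: {count}")
```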

Table 3.10: Year-End Audit Times (Days)

Table 3.11: Frequency, Relative Frequency, and Percent


Frequency Distributions for the Audit Time Data

FIGURE 3.12: USING EXCEL TO GENERATE A FREQUENCY
DISTRIBUTION FOR AUDIT TIMES DATA

HISTOGRAM
§ A common graphical presentation of quantitative
data

§ Constructed by placing the variable of interest on the horizontal
axis and the selected frequency measure (absolute frequency, relative
frequency, or percent frequency) on the vertical axis

§ The frequency measure of each class is shown by drawing a rectangle
whose base is determined by the class limits on the horizontal axis
and whose height is the corresponding frequency measure

FIGURE 3.13: HISTOGRAM FOR THE AUDIT TIME
DATA

FIGURE 3.14: CREATING A HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL

FIGURE 3.15: COMPLETED HISTOGRAM FOR THE AUDIT
TIME DATA USING DATA ANALYSIS TOOLPAK IN EXCEL

CREATING DISTRIBUTIONS FROM DATA

§ Histograms provide information about the shape, or form, of a
distribution
§ Skewness: Lack of symmetry
§ Skewness is an important characteristic of the shape of a
distribution

FIGURE 3.16: HISTOGRAMS SHOWING DISTRIBUTIONS WITH
DIFFERENT LEVELS OF SKEWNESS

CUMULATIVE DISTRIBUTIONS

§ Cumulative frequency distribution: Shows the number of data items
with values less than or equal to the upper class limit of each class
§ A variation of the frequency distribution that provides another
tabular summary of quantitative data
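
A cumulative distribution is a running total of the bin frequencies. The sketch below uses hypothetical upper class limits and frequencies, not the values behind Table 3.17.

```python
from itertools import accumulate

# Hypothetical bins: upper class limit and frequency of each class.
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

cumulative = list(accumulate(frequencies))            # running total of frequencies
n = cumulative[-1]                                    # total number of observations
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * r for r in cumulative_relative]

for limit, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit}: {c} ({r:.2f})")
```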

TABLE 3.17: CUMULATIVE FREQUENCY, CUMULATIVE RELATIVE
FREQUENCY, AND CUMULATIVE PERCENT FREQUENCY
DISTRIBUTIONS FOR THE AUDIT TIME DATA

§ Mean (Arithmetic Mean)
§ Median
§ Mode
§ Geometric Mean
MEASURES OF LOCATION
Mean/Arithmetic Mean
§ Average value for a variable
§ The mean is denoted by $\bar{x}$, where $\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
§ n = sample size
§ $x_1$ = value of variable x for the first observation
§ $x_2$ = value of variable x for the second observation
§ $x_n$ = value of variable x for the nth observation

TABLE 3.18: DATA ON HOME SALES IN CINCINNATI, OHIO,
SUBURB

COMPUTATION OF SAMPLE MEAN
§ Illustration: Computation of the mean home selling price for the
sample of 12 home sales
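
As a quick check of the computation, the sketch below averages the 12 selling prices that this chapter lists for the home sales data. The variable names are illustrative.

```python
# The 12 home selling prices listed in this chapter's home sales illustrations.
prices = [108_000, 138_000, 138_000, 142_000, 186_000, 199_500,
          208_000, 254_000, 254_000, 257_500, 298_000, 456_250]

n = len(prices)                 # sample size
mean_price = sum(prices) / n    # sum of the observations divided by n
print(mean_price)
```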

MEASURES OF LOCATION
Median
§ Value of the item in the middle when the
data are arranged in ascending order
§ Value of middle item, for an odd number of
observations
§ Average of values of two middle items, for an
even number of observations

COMPUTATION OF SAMPLE MEDIAN
§ Illustration: When the number of observations is odd
§ Consider the class size data for a sample of five college classes:
46 54 42 46 32
§ Arrange the class size data in ascending order:
32 42 46 46 54
§ Middlemost value in the data set = 46
§ Median is 46

COMPUTATION OF SAMPLE MEDIAN
§ Illustration: When the number of observations is even
§ Consider the data on home sales in a Cincinnati, Ohio, suburb
(Table 3.18)
§ Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
§ The middle two values are 199,500 and 208,000
§ Median = average of the two middle values
= (199,500 + 208,000) / 2 = 203,750


MEASURES OF LOCATION
Mode
§ Value that occurs most frequently in a data set
§ Consider the class size data:

32 42 46 46 54
§ Observe - 46 is the only value that occurs more
than once
§ Mode is 46
§ Multimodal data - Data contain at least two
modes
§ Bimodal data - Data contain exactly two modes
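
The median and mode illustrations can be reproduced with Python's standard `statistics` module. This is a sketch for checking the hand calculations, not a description of the Excel workflow shown in the figures.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]
prices = [108_000, 138_000, 138_000, 142_000, 186_000, 199_500,
          208_000, 254_000, 254_000, 257_500, 298_000, 456_250]

# Median: middle value (odd n) or average of the two middle values (even n).
median_class = statistics.median(class_sizes)
median_price = statistics.median(prices)

# multimode returns every value tied for the highest frequency.
modes_class = statistics.multimode(class_sizes)
modes_price = statistics.multimode(prices)

print(median_class, median_price, modes_class, modes_price)
```

The home sales data come out bimodal here, because 138,000 and 254,000 each occur twice.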
FIGURE 3.19: CALCULATING THE MEAN, MEDIAN, AND MODES
FOR THE HOME SALES DATA USING EXCEL

MEASURES OF LOCATION
Geometric Mean
§ nth root of the product of n values
§ Used in analyzing growth rates or rate of change
§ Sample geometric mean: $\bar{x}_g = \sqrt[n]{(x_1)(x_2) \cdots (x_n)}$

TABLE 3.20: PERCENTAGE ANNUAL RETURNS AND GROWTH
FACTORS FOR THE MUTUAL FUND DATA
§ Illustration: Consider the percentage annual returns and growth
factors for the mutual fund data over the past 10 years
§ We will determine the mean rate of growth for the fund over the
10-year period

COMPUTATION OF GEOMETRIC MEAN
§ Solution:
§ Product of the growth factors:
(.779)(1.287)(1.109)(1.049)(1.158)(1.055)(.630)(1.265)(1.151)(1.021)
= 1.335
§ Geometric mean of the growth factors:
$\bar{x}_g = \sqrt[10]{1.335} = 1.029$
§ Conclude that annual returns grew at an average annual rate of
(1.029 – 1)100% or 2.9%
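
The calculation can be checked with a few lines of Python using the ten growth factors listed above. Computed from the rounded factors, the product comes out very slightly below the 1.335 reported on the slide, which does not change the three-decimal geometric mean.

```python
import math

# The ten growth factors from the mutual fund illustration.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

product = math.prod(growth_factors)
geo_mean = product ** (1 / len(growth_factors))   # nth root of the product
annual_rate_percent = (geo_mean - 1) * 100        # average annual growth rate

print(round(product, 4), round(geo_mean, 3), round(annual_rate_percent, 1))
```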

FIGURE 3.21: CALCULATING THE GEOMETRIC MEAN FOR THE
MUTUAL FUND DATA USING EXCEL

§ Range
§ Variance
§ Standard Deviation
§ Coefficient of Variation
MEASURES OF VARIABILITY
Table 3.22: Annual Payouts for Two Different Investment Funds
Figure 3.23: Histograms for Payouts of Past 20 Years from Fund A and
Fund B

COMPUTATION OF RANGE
Range
§ Found by subtracting the smallest value from the largest value in a
data set
§ Illustration: Consider the data on home sales in Cincinnati, Ohio,
suburb
§ Largest home sales price: $456,250
§ Smallest home sales price: $108,000
§ Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
§ Drawback: Range is based on only two of the observations and
thus is highly influenced by extreme values

MEASURES OF VARIABILITY
Variance
§ Measure of variability that utilizes all the data
§ It is based on the deviation about the mean, which is the
difference between the value of each observation ($x_i$) and the
mean
§ The deviations about the mean are squared while computing the
variance
§ Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
§ Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
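
A minimal sketch of both formulas, using the class size data from this chapter. The population variance line is shown only for contrast; it would apply if these five classes were the entire population rather than a sample.

```python
class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n
squared_devs = [(x - mean) ** 2 for x in class_sizes]   # squared deviations about the mean

sample_variance = sum(squared_devs) / (n - 1)   # divide by n - 1 for a sample
population_variance = sum(squared_devs) / n     # divide by N for a population

print(mean, sum(squared_devs), sample_variance, population_variance)
```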

TABLE 3.24: COMPUTATION OF DEVIATIONS AND SQUARED
DEVIATIONS ABOUT THE MEAN FOR THE CLASS SIZE DATA

Computation of sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64$
FIGURE 3.25: CALCULATING VARIABILITY MEASURES FOR
THE HOME SALES DATA IN EXCEL

MEASURES OF VARIABILITY
§ Standard Deviation
§ Positive square root of the variance
§ Measured in the same units as the original data
§ For sample, $s = \sqrt{s^2}$
§ For population, $\sigma = \sqrt{\sigma^2}$

§ Coefficient of Variation
§ $\left( \frac{\text{Standard deviation}}{\text{Mean}} \times 100 \right)\%$
§ Measures the standard deviation relative to the mean
§ Expressed as a percentage

COMPUTATION OF COEFFICIENT OF VARIATION

§ Illustration:
§ Consider the class size data:
46 54 42 46 32
§ Mean, 𝑥̅ = 44
§ Standard deviation, s = 8
§ Coefficient of variation = $\left( \frac{8}{44} \times 100 \right)\% = 18.2\%$
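
The same numbers can be reproduced with the standard `statistics` module, whose `stdev` function uses the sample (n − 1) formula. A short sketch:

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)
s = statistics.stdev(class_sizes)     # sample standard deviation
cv_percent = s / mean * 100           # standard deviation relative to the mean

print(mean, s, round(cv_percent, 1))
```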

§ Percentiles
§ Quartiles
§ Z-Scores
§ Empirical Rule
§ Identifying Outliers
§ Box Plots
ANALYZING DISTRIBUTIONS

Percentiles
§ Value of a variable at which a specified (approximate)
percentage of observations are below that value
§ The pth percentile tells us the point in the data where:
§ Approximately p percent of the observations have values less
than the pth percentile
§ Approximately (100 – p) percent of the observations have values
greater than the pth percentile

ANALYZING DISTRIBUTIONS
§ Steps to calculate the pth percentile:
§ Arrange the data in ascending order (smallest to largest value)
§ Compute k = (n + 1) × p, with p written as a proportion (for
example, 0.85 for the 85th percentile)
§ Divide k into its integer component, i, and its decimal
component, d
§ If d = 0, the pth percentile is the value in position k of the
sorted data
§ If d > 0, the percentile is between the values in positions i and i + 1
in the sorted data; to find this percentile, we must interpolate
between these two values:
i. Calculate the difference between the values in
positions i and i + 1 in the sorted data set; we define
this difference between the two values as m
ii. Multiply this difference by d: t = m × d
iii. To find the pth percentile, add t to the value in
position i of the sorted data

ANALYZING DISTRIBUTIONS
§ Illustration
§ To determine the 85th percentile for the home sales data in
Table 3.18:
1. Arrange the data in ascending order
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
3. Dividing 11.05 into the integer and decimal components
gives us i = 11 and d = 0.05
4. d > 0, interpolate between the values in the 11th and 12th
positions in the sorted data
ANALYZING DISTRIBUTIONS

Illustration (contd.)
§ To determine the 85th percentile for the home sales data in Table
3.18
§ The value in the 11th position is 298,000
§ The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7,912.5
85th percentile = 298,000 + 7,912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
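
The percentile steps can be sketched as a small function. The function name is illustrative, and the clamping of positions that fall outside the data is an assumption added here, since the steps above do not cover that case.

```python
def percentile(data, p):
    """pth percentile with p as a proportion (0 < p < 1), using k = (n + 1) * p."""
    values = sorted(data)            # arrange in ascending order
    n = len(values)
    k = (n + 1) * p
    i = int(k)                       # integer component
    d = k - i                        # decimal component
    if i < 1:                        # assumption: clamp to the smallest value
        return values[0]
    if i >= n:                       # assumption: clamp to the largest value
        return values[-1]
    if d == 0:
        return values[i - 1]         # value in position k (positions start at 1)
    m = values[i] - values[i - 1]    # difference between positions i and i + 1
    return values[i - 1] + m * d     # interpolate: add t = m * d

prices = [108_000, 138_000, 138_000, 142_000, 186_000, 199_500,
          208_000, 254_000, 254_000, 257_500, 298_000, 456_250]
p85 = percentile(prices, 0.85)
print(round(p85, 2))
```

Because 0.85 is not exactly representable in binary floating point, the result is compared with a small tolerance rather than for exact equality.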

ANALYZING DISTRIBUTIONS

Quartiles
§ When the data is divided into four equal parts:
§ Each part contains approximately 25% of the observations
§ Division points are referred to as quartiles
§ $Q_1$ = first quartile, or 25th percentile
§ $Q_2$ = second quartile, or 50th percentile (also the median)
§ $Q_3$ = third quartile, or 75th percentile
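
As a hedged illustration, Python's `statistics.quantiles` with its default "exclusive" method interpolates at position (n + 1) × p, which matches the percentile steps used in this chapter. Applied to the home sales data it gives the three quartiles directly.

```python
import statistics

prices = [108_000, 138_000, 138_000, 142_000, 186_000, 199_500,
          208_000, 254_000, 254_000, 257_500, 298_000, 456_250]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(prices, n=4)
print(q1, q2, q3)
```

The second quartile equals the median of 203,750 computed earlier for the same data.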

ANALYZING DISTRIBUTIONS
z-score
§ Measures the relative location of a value in the data set
§ Helps to determine how far a particular value is from the
mean relative to the data set’s standard deviation
§ Standardized value
§ If $x_1, x_2, \ldots, x_n$ is a sample of n observations:
$z_i = \frac{x_i - \bar{x}}{s}$
§ $z_i$ = z-score for $x_i$
§ 𝑥̅ = sample mean
§ s = sample standard deviation

TABLE 3.26: Z-SCORES FOR THE CLASS SIZE DATA
§ For class size data, $\bar{x}$ = 44 and s = 8
§ For observations with a value > mean, z-score > 0
§ For observations with a value < mean, z-score < 0
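
A short sketch that computes the z-scores for the class size data listed earlier in this chapter (46, 54, 42, 46, 32), using the sample standard deviation:

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)
s = statistics.stdev(class_sizes)                  # sample standard deviation
z_scores = [(x - mean) / s for x in class_sizes]   # distance from the mean in SD units

for x, z in zip(class_sizes, z_scores):
    print(x, round(z, 2))
```

Values above the mean of 44 get positive z-scores and values below it get negative ones; the z-scores sum to zero.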

FIGURE 3.27: CALCULATING Z-SCORES FOR THE
HOME SALES DATA IN EXCEL

EXAMPLE: WHICH IS THE BETTER OFFER?
Suppose that two graduating seniors, one a marketing major
and one an accounting major, are comparing job offers. The
accounting major has an offer for $45,000 per year, and the
marketing student has an offer for $42,000 per year. Summary
information about the distribution of offers follows:
Accounting: mean = 46,000; standard deviation = 1,500
Marketing: mean = 42,500; standard deviation = 1,000
EXAMPLE: WHICH IS THE BETTER OFFER?
§ Accounting: z-score = (45,000 − 46,000) / 1,500 = −0.67
This offer is 0.67 SD below the mean
§ Marketing: z-score = (42,000 − 42,500) / 1,000 = −0.5
This offer is 0.5 SD below the mean
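
The comparison can be sketched in code. The helper name `z_score` is illustrative, and the final line encodes one reading of the example: the offer with the higher z-score sits better relative to other offers in its own field.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

accounting_z = z_score(45_000, 46_000, 1_500)
marketing_z = z_score(42_000, 42_500, 1_000)

# Higher z-score = better position relative to the field's own distribution.
better = "marketing" if marketing_z > accounting_z else "accounting"
print(round(accounting_z, 2), marketing_z, better)
```

On this relative measure the marketing offer comes out ahead, even though the accounting offer is larger in dollars.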
ANALYZING DISTRIBUTIONS

Empirical Rule
§ For data having a bell-shaped distribution:
§ Within 1 standard deviation—approximately 68% of the data values
§ Within 2 standard deviations—approximately 95% of the data values
§ Within 3 standard deviations—almost all the data values

Identifying Outliers
§ Outliers: Extreme values in a data set
§ They can be identified using standardized values (z-scores)
§ Any data value with a z-score less than –3 or greater than +3 is an
outlier
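
Applying the ±3 rule to the home sales data from this chapter gives the sketch below. Even the largest price, 456,250, has a z-score of roughly 2.5, so the rule flags no outliers in this sample.

```python
import statistics

prices = [108_000, 138_000, 138_000, 142_000, 186_000, 199_500,
          208_000, 254_000, 254_000, 257_500, 298_000, 456_250]

mean = statistics.mean(prices)
s = statistics.stdev(prices)                      # sample standard deviation
z_scores = [(x - mean) / s for x in prices]

# Flag any value whose z-score is below -3 or above +3.
outliers = [x for x, z in zip(prices, z_scores) if z < -3 or z > 3]
print(round(max(z_scores), 2), outliers)
```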

ANALYZING DISTRIBUTIONS
Box Plots
§ Graphical summary of the distribution of data
§ Developed from the quartiles for a data set

Figure 3.28: Box Plot for the Home Sales Data

FIGURE 3.29: BOX PLOTS COMPARING HOME SALE
PRICES IN DIFFERENT COMMUNITIES

§ Scatter Charts

§ Covariance
§ Correlation Coefficient
TABLE 3.30: DATA FOR BOTTLED WATER SALES AT
QUEENSLAND AMUSEMENT PARK FOR A SAMPLE OF 14
SUMMER DAYS

FIGURE 3.31: CHART SHOWING THE POSITIVE LINEAR
RELATION BETWEEN SALES AND HIGH TEMPERATURES
Scatter chart

MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
§ Scatter Charts:

§ Useful graph for analyzing the relationship between two variables

§ The scatter chart also suggests that a straight line could be used
as an approximation for the relationship between two variables

MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
§ Covariance: Descriptive measure of the linear association
between two variables
§ Sample covariance for a sample of size n with the observations
$(x_1, y_1)$, $(x_2, y_2)$, and so on: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
§ Population covariance: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
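
A minimal sketch of the sample covariance formula. The temperature and sales values are hypothetical, chosen only to keep the arithmetic small; they are not the Queensland Amusement Park data.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations about the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical daily high temperatures and bottled water sales.
temps = [78, 79, 80, 80, 82]
sales = [23, 22, 24, 22, 24]

s_xy = sample_covariance(temps, sales)
print(round(s_xy, 2))
```

A positive result indicates that the two variables tend to move in the same direction in this made-up sample.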

TABLE 3.32: SAMPLE COVARIANCE CALCULATIONS FOR DAILY HIGH
TEMPERATURE AND BOTTLED WATER SALES AT QUEENSLAND AMUSEMENT
PARK

FIGURE 3.33: CALCULATING COVARIANCE AND CORRELATION
COEFFICIENT FOR BOTTLED WATER SALES USING EXCEL

MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
§ Correlation coefficient: Measures the strength of the linear
relationship between two variables
§ Not affected by the units of measurement for x and y
§ Sample correlation coefficient denoted by $r_{xy}$
§ $r_{xy} = \frac{s_{xy}}{s_x s_y}$
§ $s_{xy}$ = sample covariance = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
§ $s_x$ = sample standard deviation of x = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
§ $s_y$ = sample standard deviation of y = $\sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$
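
The definition can be sketched directly from its three ingredients. The function name and the tiny data sets are illustrative; the second call rescales x to show that the result does not depend on the units of measurement.

```python
import math

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), each computed with the n - 1 divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
r = sample_correlation(xs, ys)
r_rescaled = sample_correlation([100 * x for x in xs], ys)   # same data, different units
print(round(r, 3), round(r_rescaled, 3))
```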

INTERPRETATION OF CORRELATION
COEFFICIENT
–1 ≤ r ≤ +1

r value    Relationship between the x and y variables
< 0        Negative linear
Near 0     No linear relationship
> 0        Positive linear

FIGURE 3.34: SCATTER DIAGRAMS AND ASSOCIATED
COVARIANCE VALUES FOR DIFFERENT VARIABLE
RELATIONSHIPS

(a) $s_{xy}$ positive: x and y are positively linearly related
(b) $s_{xy}$ approximately 0: x and y are not linearly related
(c) $s_{xy}$ negative: x and y are negatively linearly related

COMPUTATION OF CORRELATION COEFFICIENT

Illustration
§ To determine the sample correlation coefficient for bottled
water sales at Queensland Amusement Park:

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{12.8}{(4.36)(3.15)} = 0.93$

§ There is a very strong linear relationship between high
temperature and sales

FIGURE 3.35: EXAMPLE OF NONLINEAR RELATIONSHIP
PRODUCING A CORRELATION COEFFICIENT NEAR ZERO
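
A small constructed example of the same effect (it is not the data behind Figure 3.35): y is determined exactly by x through y = x², yet the sample correlation coefficient is zero because the relationship is not linear.

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]          # perfectly related to x, but not linearly

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = s_xy / (s_x * s_y)

print(s_xy, r)
```

A correlation near zero therefore rules out a linear relationship only, not every kind of relationship.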

