Business Statistics
Graphs, Charts, and Tables - Describing Your Data
Dr. M. Raghunadh Acharya
11/07/10 1
Contents …
• Construct a frequency distribution both
manually and with a computer
• Construct and interpret a histogram
• Create and interpret bar charts, pie
charts, and stem-and-leaf diagrams
• Present and interpret data in line charts
and scatter diagrams
11/07/10 2
Frequency Distributions
What is a Frequency Distribution?
• A frequency distribution is a list or a table …
• containing the values of a variable (or a set
of ranges within which the data falls) ...
• and the corresponding frequencies with
which each value occurs (or frequencies with
which data falls within each range)
11/07/10 3
Why Use Frequency Distributions?
• A frequency distribution is a way to summarize data
• The distribution condenses the
raw data into a more useful
form...
• and allows for a quick visual
interpretation of the data
11/07/10 4
Frequency Distribution:
Discrete Data
• Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200
11/07/10 5
Relative Frequency
Relative Frequency: What proportion is in each category?
Number of days read    Frequency    Relative Frequency
0                      44           .22
1                      24           .12
2                      18           .09
3                      16           .08
4                      20           .10
5                      22           .11
6                      26           .13
7                      30           .15
Total                  200          1.00

For 0 days: 44/200 = .22, so 22% of the people in the sample report that they read the newspaper 0 days per week.
11/07/10 6
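The relative frequencies in the table can be reproduced with a few lines of Python. This is a minimal sketch; the variable names (`frequency`, `relative`) are illustrative and not part of the original slides.

```python
# Frequencies from the newspaper survey: days read per week -> number of customers.
frequency = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequency.values())  # total number of customers surveyed

# Relative frequency = category frequency / total number of observations.
relative = {days: count / total for days, count in frequency.items()}

for days, count in frequency.items():
    print(f"{days}  {count:3d}  {relative[days]:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no category was missed.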
Frequency Distribution: Continuous Data
• Continuous Data: may take on any value in some interval
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
(Temperature is a continuous variable because it could be measured to any degree of precision desired)
11/07/10 7
Grouping Data by Classes
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 20)
• Compute class width: 10 (46/5 = 9.2, then round up)
• Determine class boundaries: 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
11/07/10 8
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution
Class              Frequency    Relative Frequency
10 but under 20    3            .15
20 but under 30    6            .30
30 but under 40    5            .25
40 but under 50    4            .20
50 but under 60    2            .10
Total              20           1.00
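As a rough illustration of the "with a computer" part, the grouping steps can be sketched in Python. The names `temps`, `lower_limits`, and `counts` are assumptions made for this example; the classes are the "10 but under 20" style intervals used above.

```python
# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

width = 10                           # class width chosen above
lower_limits = [10, 20, 30, 40, 50]  # lower limit of each class

# Count how many observations fall in each "low but under low + width" class.
counts = {low: 0 for low in lower_limits}
for t in temps:
    low = (t // width) * width  # valid because every limit is a multiple of the width
    counts[low] += 1

n = len(temps)
for low in lower_limits:
    print(f"{low} but under {low + width}: {counts[low]:2d}  {counts[low] / n:.2f}")
```

A value equal to a boundary, such as 30, is counted in the class that starts at that boundary, which matches the "but under" wording.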
11/07/10 9
Histograms
• The classes or intervals are shown on the horizontal axis
• Frequency is measured on the vertical axis
• Bars of the appropriate heights can be used to represent the number of observations within each class
• Such a graph is called a histogram
11/07/10 10
Histogram Example
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: histogram of the temperature data, with class midpoints on the horizontal axis. There are no gaps between bars, since the data are continuous.]
11/07/10 11
Questions for Grouping Data
into Classes
• 1. How wide should each interval be? (How many classes should be used?)
• 2. How should the endpoints of the intervals be determined?
• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too "jagged" nor too "blocky"
• The goal is to appropriately show the pattern of variation in the data
11/07/10 12
How Many Class Intervals?
• Many (Narrow class intervals)
– May yield a very jagged distribution with gaps from empty classes
– Can give a poor indication of how frequency varies across classes
• Few (Wide class intervals)
– May compress variation too much and yield a blocky distribution
– Can obscure important patterns of variation
11/07/10 13
(X axis labels are upper class endpoints)
General Guidelines
Number of Data Points    Number of Classes
under 50                 5 - 7
50 - 100                 6 - 10
100 - 250                7 - 12
over 250                 10 - 20
– Class widths can typically be reduced as the number of observations increases
– Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful
11/07/10 14
Class Width
• The class width is the distance between the lowest possible value and the highest possible value for a frequency class
• The minimum class width is
$$W = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$$
11/07/10 15
Histograms in Excel
1. Select Tools/Data Analysis
11/07/10 16
Histograms in Excel (continued)
2. Choose Histogram
3. Input data and bin ranges
11/07/10 17
Stem and Leaf Diagram
• A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
11/07/10 18
Example:
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Here, use the 10's digit for the stem unit:
                     Stem | Leaf
• 12 is shown as        1 | 2
• 35 is shown as        3 | 5
11/07/10 19
Example:
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Completed Stem-and-leaf diagram:
Stem | Leaves
   1 | 2 3 7
   2 | 1 4 4 6 7 7
   3 | 0 2 5 7 8
   4 | 1 3 4 6
   5 | 3 8
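The stem-and-leaf method can be sketched in Python as follows. This is only an illustration under the assumption of two-digit values with the 10's digit as the stem; the names `data` and `stems` are not from the slides.

```python
from collections import defaultdict

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Split each value into its 10's digit (stem) and 1's digit (leaf).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Because the data are sorted first, the leaves on each stem come out in ascending order, as the method requires.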
11/07/10 20
Using other stem units
• Using the 100's digit as the stem:
– Round off the 10's digit to form the leaves
                         Stem | Leaf
– 613 would become          6 | 1
– 776 would become          7 | 8
– ...
– 1224 becomes             12 | 2
11/07/10 21
Graphing Categorical Data
Categorical Data can be graphed with:
• Pie Charts
• Bar Charts
• Pareto Diagram
11/07/10 22
Bar and Pie Charts
• Bar charts and Pie charts are often used for qualitative (category) data
• Height of bar or size of pie slice shows the frequency or percentage for each category
11/07/10 23
Pie Chart Example
Current Investment Portfolio
Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100
(Variables are Qualitative)
[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%. Percentages are rounded to the nearest percent.]
11/07/10 24
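Bar Chart Example
[Figure: bar chart of the investment portfolio]
11/07/10 25
Pareto Diagram Example
[Figure: Pareto diagram of the investment portfolio. Bars show % invested in each category (bar graph); the line shows cumulative % invested (line graph).]
11/07/10 26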
Bar Chart Example
Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200
11/07/10 27
Tabulating and Graphing
Multivariate Categorical Data
• Investment in thousands of dollars
Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32.0          44            19.0          95
CD                     15.5          20            13.5          49
Savings                16.0          28            7.0           51
Total                  110.0         147           67.0          324
11/07/10 28
Tabulating and Graphing Multivariate Categorical Data (continued)
• Side by side charts
11/07/10 29
Side-by-Side Chart Example
• Sales by quarter for three sales territories:
         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9
11/07/10 30
Line Charts and Scatter Diagrams
• Line charts show values of one variable vs. time
– Time is traditionally shown on the horizontal axis
• Scatter Diagrams show points for bivariate data
– One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
11/07/10 31
Line Chart Example
Year Inflation Rate
1985 3.56
1986 1.86
1987 3.65
1988 4.14
1989 4.82
1990 5.40
1991 4.21
1992 3.01
1993 2.99
1994 2.56
1995 2.83
1996 2.95
1997 2.29
1998 1.56
1999 2.21
2000 3.36
2001 2.85
2002 1.58
11/07/10 32
Scatter Diagram Example
Volume per day    Cost per day
23 125
26 140
29 146
33 160
38 167
42 170
50 188
55 195
60 200
11/07/10 33
Types of Relationships
• Linear Relationships
[Figure: two scatter diagrams of Y against X showing linear patterns]
11/07/10 34
Types of Relationships (continued)
• Curvilinear Relationships
[Figure: two scatter diagrams of Y against X showing curved patterns]
11/07/10 35
Types of Relationships (continued)
• No Relationship
[Figure: two scatter diagrams of Y against X showing no pattern]
11/07/10 36
Chapter Summary
• Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
– Table
– Graph
• Techniques reviewed in this chapter:
– Frequency Distributions and Histograms
– Bar Charts and Pie Charts
– Stem and Leaf Diagrams
– Line Charts and Scatter Diagrams
11/07/10 37
Summarization measures
Summarization measures represent the data with a single number or a few numbers. They help describe a data set and compare one data set with another. Summary measures computed from a sample can be used to estimate the corresponding population measures.
The measures used to represent the data are as follows:
1. Measures of Center and Location: mean, median, mode, geometric mean, midrange
2. Other Measures of Location: weighted mean, percentiles, quartiles
3. Measures of Variation: range, interquartile range, variance and standard deviation, coefficient of variation
11/07/10 38
Summary Measures
Describing Data Numerically
• Center and Location: Mean, Median, Mode, Weighted Mean
• Other Measures of Location: Percentiles, Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
11/07/10 39
Overview: Measures of Center and Location
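Center and Location: Mean, Median, Mode, Weighted Mean
• Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Sample weighted mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$
• Population weighted mean: $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$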
11/07/10 40
Mean (Arithmetic Average)
• The Mean is the arithmetic average of data values
– Sample mean (n = Sample Size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
– Population mean (N = Population Size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
11/07/10 41
Mean (Arithmetic Average)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
[Figure: two dot plots on a 0-10 axis, one with Mean = 3 and one with Mean = 4]
$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$
11/07/10 42
Median
• Not affected by extreme values
[Figure: two dot plots on a 0-10 axis, each with Median = 3]
• In an ordered array, the median is the “middle” number
– If n or N is odd, the median is the middle number
– If n or N is even, the median is the average of the two middle numbers
11/07/10 43
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two dot plots, one on a 0-14 axis with Mode = 5 and one on a 0-6 axis with No Mode]
11/07/10 44
Weighted Mean
• Used when values are grouped by frequency or relative importance
Example: Sample of 26 Repair Projects
Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2
Weighted Mean Days to Complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$
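The weighted mean for the repair-project example can be checked with a short Python sketch. The list names `days` and `freq` are illustrative only; the frequencies act as the weights.

```python
# Days to complete (values) and number of projects taking that long (weights).
days = [5, 6, 7, 8]
freq = [4, 12, 8, 2]

# Weighted mean = sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(freq, days)) / sum(freq)

print(round(weighted_mean, 2))
```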
11/07/10 45
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
[Figure: five houses on a hill labelled $2,000 K, $500 K, $300 K, $100 K, $100 K]
11/07/10 46
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
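The three measures for the house-price example can be computed with Python's standard `statistics` module. This is a small sketch; the variable names are chosen for the example.

```python
import statistics

# Prices of the five houses, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # pulled upward by the $2,000,000 house
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single expensive house raises the mean well above the median, which is the outlier effect the example is meant to show.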
11/07/10 47
Which measure of location is the “best”?
• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to extreme values
– Example: Median home prices may be reported for a region, since the median is less sensitive to outliers
11/07/10 48
Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
– Left-Skewed: Mean < Median < Mode (longer tail extends to left)
– Symmetric: Mean = Median = Mode
– Right-Skewed: Mode < Median < Mean (longer tail extends to right)
11/07/10 49
Other Location Measures
Percentiles
The pth percentile in a data array:
• p% are less than or equal to this value
• (100 – p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
Quartiles
• 1st quartile = 25th percentile
• 2nd quartile = 50th percentile = median
• 3rd quartile = 75th percentile
11/07/10 50
Percentiles
• The pth percentile in an ordered array of n values is the value in the ith position, where
$$i = \frac{p}{100}(n + 1)$$
• Example: The 60th percentile in an ordered array of 19 values is the value in 12th position:
$$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$$
11/07/10 51
Quartiles
[Figure: ranked data divided by Q1, Q2, and Q3 into four groups of 25% each]
• Quartiles split the ranked data into 4 equal groups
• Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value half way between the 2nd and 3rd values,
so Q1 = 12.5
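The position rule i = (p/100)(n + 1) can be sketched as a small Python function. The function name `percentile` and the linear interpolation between neighbouring values are assumptions of this example, chosen to match the "half way between" step in the quartile calculation; other software may use slightly different conventions.

```python
def percentile(ordered, p):
    """Value at 1-based position i = p(n + 1)/100 in an ordered list,
    interpolating linearly when i falls between two positions."""
    n = len(ordered)
    i = p * (n + 1) / 100
    lower = int(i)       # whole part of the position
    frac = i - lower     # fractional part of the position
    if lower < 1:
        return ordered[0]    # position before the first value
    if lower >= n:
        return ordered[-1]   # position at or past the last value
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(sample, 25)
print(q1)
```

For a whole-number position, such as the 60th percentile of 19 values, the function simply returns the value in that position.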
11/07/10 52
Box and Whisker Plot
• A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box and whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the data in each of the four segments]
11/07/10 53
Shape of Box and Whisker Plots
• The Box and central line are centered between the endpoints if data is symmetric around the median
• A Box and Whisker plot can be shown in either vertical or horizontal format
11/07/10 54
Distribution Shape and Box and Whisker Plot
[Figure: three box and whisker plots, each marking Q1, Q2, and Q3, for Left-Skewed, Symmetric, and Right-Skewed distributions]
11/07/10 55
Box-and-Whisker Plot Example
• Below is a Box-and-Whisker plot for the following data:
0 2 2 2 3 3 4 5 5 10 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
[Figure: box and whisker plot with axis marks at 0, 2, 3, 5, and 27]
• This data is very right skewed, as the plot depicts
11/07/10 56
Measures of Variation
Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
11/07/10 57
Variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
11/07/10 58
Range
• Difference between the largest and the smallest observations.
$$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$$
Example:
[Figure: dot plot on a 0-14 axis]
Range = 14 - 1 = 13
11/07/10 59
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two dot plots on a 7-12 axis with differently distributed data; both have Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
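The sensitivity of the range to a single outlier can be seen in a short Python sketch. The helper name `data_range` is an assumption made for this example.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# The two data sets differ only in their last value (5 versus 120).
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = without_outlier[:-1] + [120]

print(data_range(without_outlier), data_range(with_outlier))
```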
11/07/10 60
Interquartile Range
• Can eliminate some outlier problems by using the Interquartile range
• Eliminate some high- and low-valued observations and calculate the range from the remaining values
• Interquartile range = 3rd quartile – 1st quartile
11/07/10 61
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
[Figure: box and whisker plot with 25% of the data in each of the four segments]
Interquartile range = 57 – 30 = 27
11/07/10 62
Variance
• Average of squared deviations of values from the mean
– Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
– Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
11/07/10 63
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
– Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
11/07/10 64
Calculation Example: Sample Standard Deviation
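Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$
$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$$
$$= \sqrt{\frac{36 + 16 + 4 + 1 + 1 + 4 + 4 + 64}{7}} = \sqrt{\frac{130}{7}} = 4.3095$$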
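A calculation like this is easy to get wrong by hand, so it can help to recompute it step by step. The sketch below follows the sample standard deviation formula directly and then compares the result with `statistics.stdev` from the Python standard library; the variable names are chosen for the example.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(squared_deviations / (n - 1))

print(mean, squared_deviations, round(s, 4))
print(round(statistics.stdev(data), 4))  # library result for comparison
```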
11/07/10 65
Comparing Standard Deviations
[Figure: three dot plots on an 11-21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
11/07/10 66
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
11/07/10 67
Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
11/07/10 68
The Empirical Rule
• If the data distribution is bell-shaped, then the interval:
• μ ± 1σ contains about 68% of the values in the population or the sample
[Figure: bell-shaped curve centered at μ, with the region μ ± 1σ shaded and labelled 68%]
11/07/10 69
The Empirical Rule
• μ ± 2σ contains about 95% of the values in the population or the sample
• μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: two bell-shaped curves, one with μ ± 2σ shaded (95%) and one with μ ± 3σ shaded (99.7%)]
11/07/10 70
Tchebysheff’s Theorem
• Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
• Examples:
– k = 1: at least $(1 - 1/1^2) = 0\%$ within μ ± 1σ
– k = 2: at least $(1 - 1/2^2) = 75\%$ within μ ± 2σ
– k = 3: at least $(1 - 1/3^2) = 89\%$ within μ ± 3σ
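The bound in the theorem is simple enough to tabulate. The following Python sketch uses a hypothetical helper, `chebyshev_lower_bound`, to compute the guaranteed minimum proportion for several values of k.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.0%}")
```

These are guaranteed minimums for any distribution, so for bell-shaped data they are much lower than the Empirical Rule percentages.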
11/07/10 71
Standardized Data Values
• A standardized data value refers to the number of standard deviations a value is from the mean
• Standardized data values are sometimes referred to as z-scores
11/07/10 72
Standardized Population Values
$$z = \frac{x - \mu}{\sigma}$$
where:
• x = original data value
• μ = population mean
• σ = population standard deviation
• z = standard score
(number of standard deviations x is from μ)
11/07/10 73
Standardized Sample Values
$$z = \frac{x - \bar{x}}{s}$$
where:
• x = original data value
• $\bar{x}$ = sample mean
• s = sample standard deviation
• z = standard score
(number of standard deviations x is from $\bar{x}$)
Remark: The standardized sample values are used for constructing the confidence limits for the population parameters.
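Standardizing a sample can be sketched in Python as follows. The data are the eight values from the sample standard deviation example; reusing them here is a choice made for illustration, as are the variable names.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = statistics.mean(data)  # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)

# z = (x - x_bar) / s: number of standard deviations each value is from the mean.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"{x:3d}  z = {z:+.2f}")
```

Values above the mean get positive z-scores and values below it get negative ones; the z-scores of a sample always have mean 0 and standard deviation 1.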
11/07/10 74
