Graphs, Charts, and Tables: Describing Your Data
Dr.M.Raghunadh Acharya
11/07/10 1
Contents …
• Construct a frequency distribution both
manually and with a computer
• Construct and interpret a histogram
• Create and interpret bar charts, pie
charts, and stem-and-leaf diagrams
• Present and interpret data in line charts
and scatter diagrams
Frequency Distributions
What is a Frequency Distribution?
• A frequency distribution is a list or a table …
• containing the values of a variable (or a set
of ranges within which the data falls) ...
• and the corresponding frequencies with
which each value occurs (or frequencies with
which data falls within each range)
Why Use Frequency Distributions?
Frequency Distribution: Discrete Data
• Discrete data: possible values are countable
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Grouping Data by Classes
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 20)
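The grouping steps above can be sketched in Python. This is only a minimal sketch: the class width of 10 (the range divided by the number of classes, rounded up to a convenient value) and the first class boundary of 10 are assumptions that the slide does not state, and `frequency_distribution` is a hypothetical helper name.

```python
import math

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

def frequency_distribution(values, num_classes, width, start):
    """Count how many values fall in each class [lower, upper)."""
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table

data_range = max(data) - min(data)   # 58 - 12 = 46
num_classes = 5
# Assumption: round range / classes up to the next multiple of 10.
width = math.ceil(data_range / num_classes / 10) * 10
table = frequency_distribution(data, num_classes, width, start=10)

for lower, upper, freq in table:
    print(f"{lower} but under {upper}: {freq}")
```

A different starting boundary or width gives a different table, so these two choices should be reported alongside the distribution.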
Frequency Distribution
Histogram Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
No gaps between bars, since the data are continuous
(Histogram x-axis: Class Midpoints)
Questions for Grouping Data into Classes
(X axis labels are upper class endpoints)
General Guidelines
Class Width
• The class width is the distance between the
lowest possible value and the highest possible
value for a frequency class
Histograms in Excel
1. Select Tools/Data Analysis
2. Choose Histogram
3. Input data and bin ranges
Stem and Leaf Diagram
• A simple way to see distribution
details in a data set
METHOD: Separate the sorted
data series into leading digits
(the stem) and the trailing digits
(the leaves)
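The method above can be sketched in Python as follows. The sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit; `stem_and_leaf` is a hypothetical helper name, and the data are the 20 values used earlier in this deck.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```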
Example:
                    Stem   Leaf
• 12 is shown as     1   |  2
• 35 is shown as     3   |  5
Example:
Graphing Categorical Data
Bar and Pie Charts
Pie Chart Example
Current Investment Portfolio
Investment Type   Amount (in thousands $)   Percentage
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110                       100
(Variables are Qualitative)
Pie chart slices: Stocks 42%, Bonds 29%, Savings 15%, CD 14%
Percentages are rounded to the nearest percent
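The percentage column and the rounded pie-slice labels in this example can be reproduced with a short calculation. The following is a minimal sketch; the variable names are illustrative only.

```python
# Amounts in thousands of dollars, from the portfolio table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())

# Percentage of the total, to two decimals (table column).
percentages = {name: round(100 * amount / total, 2)
               for name, amount in portfolio.items()}

# Rounded to the nearest percent (pie chart labels).
slice_labels = {name: round(pct) for name, pct in percentages.items()}

for name in portfolio:
    print(f"{name}: {percentages[name]}% (pie label {slice_labels[name]}%)")
```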
Bar Chart Example
Pareto Diagram Example
• % invested in each category (bar graph)
• cumulative % invested (line graph)
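A Pareto diagram orders the categories from largest to smallest and overlays the running total. The sketch below computes both series, assuming the diagram is built from the investment portfolio amounts in the pie chart example (the slide does not say so explicitly).

```python
from itertools import accumulate

# Assumption: same portfolio amounts (thousands of dollars) as the pie chart example.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# Sort categories by amount, largest first (the bar order in a Pareto diagram).
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ordered]
shares = [100 * amount / total for _, amount in ordered]

# Running total of the shares (the line in a Pareto diagram).
cumulative = list(accumulate(shares))

for name, share, cum in zip(names, shares, cumulative):
    print(f"{name}: {share:.2f}% invested, {cum:.2f}% cumulative")
```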
Bar Chart Example
Number of days read   Frequency
0 44
1 24
2 18
3 16
4 20
5 22
6 26
7 30
Total 200
Tabulating and Graphing
Multivariate Categorical Data
Tabulating and Graphing Multivariate Categorical Data (continued)
• Side by side charts
Side-by-Side Chart Example
Sales by quarter for three sales territories:
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9
Line Charts and Scatter Diagrams
Line Chart Example
Year Inflation Rate
1985 3.56
1986 1.86
1987 3.65
1988 4.14
1989 4.82
1990 5.40
1991 4.21
1992 3.01
1993 2.99
1994 2.56
1995 2.83
1996 2.95
1997 2.29
1998 1.56
1999 2.21
2000 3.36
2001 2.85
2002 1.58
Scatter Diagram Example
23 125
26 140
29 146
33 160
38 167
42 170
50 188
55 195
60 200
Types of Relationships
• Linear Relationships
• Curvilinear Relationships
• No Relationship
(Each type is illustrated with scatter plots of Y against X.)
Chapter Summary
Summarization measures …..
Summary Measures
Overview: Measures of Center and Location
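Sample mean and sample weighted mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$$
Population mean and population weighted mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \mu_W = \frac{\sum w_i x_i}{\sum w_i}$$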
Mean (Arithmetic Average)
– Sample mean (n = Sample Size)
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
– Population mean (N = Population Size)
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
Mean (Arithmetic Average)
(Figure: two dot plots, each on an axis from 0 to 10)
Data 1, 2, 3, 4, 5: Mean = 3
$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
Data 1, 2, 3, 4, 10: Mean = 4
$$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$
Median
(Figure: two dot plots, each on an axis from 0 to 10)
Median = 3 for both data sets
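The contrast between the mean and the median can be checked numerically. The sketch below assumes that the two dot plots on this slide show the same two data sets as the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the slide itself does not list the values.

```python
from statistics import mean, median

# Assumed data sets, taken from the arithmetic on the mean slide.
without_extreme = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]

for label, values in [("without extreme value", without_extreme),
                      ("with extreme value", with_extreme)]:
    print(f"{label}: mean = {mean(values)}, median = {median(values)}")
```

Replacing 5 with 10 moves the mean from 3 to 4 while the median stays at 3.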
Mode
(Figure: two dot plots)
First data set: Mode = 5
Second data set: No Mode
Weighted Mean
Example: Sample of 26 Repair Projects
Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2
Weighted Mean Days to Complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$
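The weighted mean for the repair projects can be verified with a few lines of Python. This is a minimal sketch in which the frequencies serve as the weights; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

days_to_complete = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]          # number of projects for each duration

result = weighted_mean(days_to_complete, frequency)
print(f"Weighted mean: {result:.2f} days")
```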
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
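The summary statistics for the five house prices can be computed with Python's standard `statistics` module. The sketch also divides the sum of 3,000,000 by the five houses to get the mean, a value the surviving slide text does not show.

```python
from statistics import mean, median, mode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print("Sum:   ", sum(house_prices))
print("Mean:  ", mean(house_prices))     # sum divided by the number of houses
print("Median:", median(house_prices))   # middle value of the ranked data
print("Mode:  ", mode(house_prices))     # most frequent value
```

The single $2,000,000 house pulls the mean well above the median, which is why the choice of measure matters for skewed data like this.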
Which measure of location is the “best”?
Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
Other Measures of Location
• Percentiles
• Quartiles
Percentiles
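• The pth percentile in an ordered array of n values is the value in the ith position, where
$$i = \frac{p}{100}(n + 1)$$
• Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$$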
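The position formula above translates directly into code. This is a minimal sketch; `percentile_position` is a hypothetical helper name, and the position it returns is 1-based.

```python
def percentile_position(p, n):
    """1-based position of the p-th percentile in an ordered array of n values."""
    return p * (n + 1) / 100

position = percentile_position(60, 19)
print(f"60th percentile of 19 values is at position {position}")
```

When the position is not a whole number, the percentile falls between two neighbouring values of the ordered array.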
Quartiles
• Quartiles split the ranked data into 4 equal groups, with Q1, Q2, and Q3 as the dividing values
• Example (n = 9): Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
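The "halfway between the 2nd and 3rd values" step is a linear interpolation at a fractional position. The sketch below illustrates it. The nine sample values are hypothetical, chosen only so that the 2nd and 3rd values are 12 and 13, because the slide's own data did not survive. The helper names are also assumptions, and the code expects positions between 1 and n.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in a sorted list."""
    lower = int(position)              # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

def quartile(ordered, q):
    """q-th quartile (q = 1, 2, 3) using the (n + 1) position rule."""
    position = 25 * q * (len(ordered) + 1) / 100
    return value_at_position(ordered, position)

# Hypothetical ranked sample with n = 9 (not taken from the slide).
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1, q2, q3 = (quartile(sample, q) for q in (1, 2, 3))
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```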
Box and Whisker Plot
Example:
Shape of Box and Whisker Plots
Distribution Shape and Box and Whisker Plot
(Figure: three box-and-whisker plots, each marked with Q1, Q2, and Q3)
Box-and-Whisker Plot Example
Data: 0 2 2 2 3 3 4 5 5 10 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
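The five values that define this box-and-whisker plot can be recomputed from the eleven data values. The sketch below assumes the quartiles are located with the (n + 1) position rule from the percentile slide; `five_number_summary` is a hypothetical helper name.

```python
def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the (n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)

    def at(position):
        # 1-based position; interpolate when it is fractional.
        lower = int(position)
        fraction = position - lower
        if fraction == 0:
            return ordered[lower - 1]
        return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

    q1, q2, q3 = (at(p * (n + 1) / 100) for p in (25, 50, 75))
    return ordered[0], q1, q2, q3, ordered[-1]

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
summary = five_number_summary(data)
print("Min, Q1, Q2, Q3, Max =", summary)
```

The long right whisker (from 5 to 27) reflects the right skew caused by the values 10 and 27.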
Measures of Variation
Variation
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Variation
Same center,
different variation
Range
Example:
(Figure: dot plot on an axis from 0 to 14)
Range = 14 - 1 = 13
Disadvantages of the Range
(Figure: two dot plots, each on an axis from 7 to 12)
Range = 12 - 7 = 5 for both data sets
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
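The effect of a single outlier on the range can be reproduced with the two 25-value data sets above. This is a minimal sketch; `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# The two data sets differ only in their last value (5 versus 120).
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print("Range without outlier:", data_range(without_outlier))
print("Range with outlier:   ", data_range(with_outlier))
```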
Interquartile Range
Example:
X minimum   Q1   Median (Q2)   Q3   X maximum
12          30   45            57   70
(each of the four segments holds 25% of the data)
Interquartile range = 57 - 30 = 27
Variance
– Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
– Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Standard Deviation
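– Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
– Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$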
Calculation Example: Sample Standard Deviation
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$$s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$
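The sample standard deviation for these eight values can be checked step by step. In the sketch below the squared deviations from the mean of 16 sum to 130, so s = sqrt(130 / 7), approximately 4.3095. The variable names are illustrative only.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                    # sample mean
squared_deviations = sum((x - x_bar) ** 2 for x in data)
sample_variance = squared_deviations / (n - 1)           # divide by n - 1 for a sample
s = math.sqrt(sample_variance)

print(f"n = {n}, mean = {x_bar}")
print(f"sum of squared deviations = {squared_deviations}")
print(f"s = {s:.4f}")
```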
Comparing Standard Deviations
(Figure: three dot plots, each on an axis from 11 to 21)
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Population:
$$CV = \frac{\sigma}{\mu} \cdot 100\%$$
Sample:
$$CV = \frac{s}{\bar{x}} \cdot 100\%$$
Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$$CV_A = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$$CV_B = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
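The comparison of the two stocks reduces to one small function. This is a minimal sketch; `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"CV of stock A: {cv_a}%")
print(f"CV of stock B: {cv_b}%")
```

Because the CV is unit-free, it can also compare variables measured in different units.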
The Empirical Rule
For a bell-shaped distribution:
• μ ± 1σ contains about 68% of the values
• μ ± 2σ contains about 95% of the values
• μ ± 3σ contains about 99.7% of the values
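The 68%, 95%, and 99.7% figures can be checked against an exact normal distribution, which is the idealized bell shape the empirical rule approximates. The sketch below uses `NormalDist` from Python's standard `statistics` module; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution stands in for "bell-shaped" data.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only roughly bell-shaped will match these percentages approximately, not exactly.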
Tchebysheff’s Theorem
Standardized Data Values
• A standardized data value refers to the number of standard deviations a value is from the mean
Standardized Population Values
$$z = \frac{x - \mu}{\sigma}$$
where:
• x = original data value
• μ = population mean
• σ = population standard deviation
• z = standard score
(number of standard deviations x is from μ)
Standardized Sample Values
$$z = \frac{x - \bar{x}}{s}$$
where:
• x = original data value
• $\bar{x}$ = sample mean
• s = sample standard deviation
• z = standard score
(number of standard deviations x is from $\bar{x}$)
Remark: The standardized sample values are used for
constructing the confidence limits for the
population parameters.
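Standardizing a value takes one line of arithmetic. The sketch below reuses the Stock A figures from the coefficient of variation example (mean $50, standard deviation $5). The observed prices of $60 and $45 are hypothetical inputs chosen for illustration, and `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical observed prices for a stock with mean $50 and standard deviation $5.
z_high = z_score(60, mean=50, std_dev=5)
z_low = z_score(45, mean=50, std_dev=5)

print(f"z for $60: {z_high}")   # above the mean, so positive
print(f"z for $45: {z_low}")    # below the mean, so negative
```

The same function covers the population form (pass μ and σ) and the sample form (pass the sample mean and s).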