
Organizing and Visualizing Data

1
The Choice Is Yours
Even though he is still in his 20s, Tom Sanchez
realizes that he needs to start funding his
retirement plan now because you can never
start too early to save for retirement. Sanchez
wants to make a reasonable investment
choice and believes that placing his money in
retirement funds would be a good choice for
his current financial situation. He decides to
contact the Choice Is Yours (CIY) investment
service that a business professor had once
said was noted for its ethical behavior and
fairness toward younger investors.
2
What Sanchez did not know is that Choice Is Yours
has already been thinking about studying a wide
variety of retirement funds, with the business
objective of being able to suggest appropriate
funds for its younger investors. A company task
force has already selected 318 retirement funds
that may prove appropriate for younger investors.
You have been asked to define, collect, organize,
and visualize data about these funds in ways that
could assist prospective clients making decisions
about the funds in which they will invest. What
facts about each fund would you collect to help
customers compare and contrast the many funds?
3
You decide that a good starting point would
be to define the variables for key
characteristics of each fund, including each
fund’s past performance. You also decide to
define variables such as the amount of assets
that a fund manages and whether the goal of
a fund is to invest in companies whose
earnings are expected to substantially
increase in future years (a “growth” fund) or
invest in companies whose stock price is
undervalued, priced low relative to their
earnings potential (a “value” fund).
4
You collect data from appropriate sources and
use the business convention of placing the data
for each variable in its own column in a
worksheet. As you think more about your task,
you realize that 318 rows of data, one for each
fund in the sample, would be a lot for anyone to
review. Prospective clients such as Tom Sanchez
will be forced to scroll down through several
screens to view all the data and will face the
challenge of remembering the data that has
gone off screen. Is there something else you can
do? Can you organize and present these data to
prospective clients in a more helpful and
comprehensible manner?
5
• Data.files\Retirement Funds.xlsx

6
Methods to Organize and Visualize Data
Variable Type: Categorical
• Organizing: Summary Table, Contingency Table
• Visualizing: Bar Chart, Pie Chart, Pareto Chart, Side-by-side Bar Chart

Variable Type: Numerical
• Organizing: Ordered Array, Frequency Distribution, Relative Frequency Distribution, Percentage Distribution, Cumulative Percentage Distribution
• Visualizing: Stem-and-leaf Display, Histogram, Polygon, Cumulative Percentage Polygon, Boxplot

Variable Type: Two Numerical Variables
• Organizing: Bivariate Tables
• Visualizing: Scatter Plot, Time-series Plot

Variable Type: Categorical and Numerical Variables Considered Together
• Organizing: Multidimensional Contingency Tables
• Visualizing: Pivot Tables

7
Organizing Categorical Data

8
Summary Table
How adults pay their monthly bills
Form of Payment      Number   Percentage (%)
Cash                 118      29.5
Cheque               128      32.0
Electronic/online    154      38.5
Total                400      100.0

9
Retirement Funds Categorized by Risk
Fund Risk Level   Number of Funds   Percentage of Funds (%)
Low               99                31.13
Average           145               45.60
High              74                23.27
Total             318               100.00
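
The percentage column of a summary table is each category's count divided by the total count. A minimal Python sketch of that arithmetic, using the risk-level counts from the table above (the variable names are illustrative, not part of the original material):

```python
# Counts of retirement funds by risk level, taken from the summary table.
risk_counts = {"Low": 99, "Average": 145, "High": 74}

total = sum(risk_counts.values())

# Percentage of funds in each category, rounded to two decimals.
risk_percentages = {
    level: round(100 * count / total, 2)
    for level, count in risk_counts.items()
}

for level, count in risk_counts.items():
    print(f"{level:<8} {count:>4} {risk_percentages[level]:>7.2f}")
print(f"{'Total':<8} {total:>4} {100:>7.2f}")
```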

10
Contingency Table
Fund Type and Risk Level
Risk Level
Fund Type Low Average High Total
Growth 62 113 48 223
Value 37 32 26 95
Total 99 145 74 318

11
Fund Type and Risk Level
(based on percentage of overall total)
Risk Level
Fund Type Low Average High Total

Growth 19.50% 35.53% 15.09% 70.13%

Value 11.64% 10.06% 8.18% 29.87%

Total 31.13% 45.60% 23.27% 100%

12
Fund Type and Risk Level
(based on percentage of Row Total)
Risk Level
Fund Type Low Average High Total

Growth 27.80% 50.67% 21.52% 100%

Value 38.95% 33.68% 27.37% 100%

Total 31.13% 45.60% 23.27% 100%

13
Fund Type and Risk Level
(based on percentage of Column Total)
Risk Level
Fund Type Low Average High Total

Growth 62.63% 77.93% 64.86% 70.13%

Value 37.37% 22.07% 35.14% 29.87%

Total 100% 100% 100% 100%
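
The three percentage tables above differ only in the base used for each cell: the overall total, the row total, or the column total. A short Python sketch of that idea, assuming the counts from the Fund Type and Risk Level contingency table (the helper `pct` and the dictionary layout are illustrative choices):

```python
# Observed counts from the Fund Type by Risk Level contingency table.
counts = {
    "Growth": {"Low": 62, "Average": 113, "High": 48},
    "Value": {"Low": 37, "Average": 32, "High": 26},
}
risk_levels = ["Low", "Average", "High"]

row_totals = {fund: sum(row.values()) for fund, row in counts.items()}
col_totals = {r: sum(counts[f][r] for f in counts) for r in risk_levels}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# The same cell count expressed against three different bases.
overall_pct = {f: {r: pct(counts[f][r], grand_total) for r in risk_levels} for f in counts}
row_pct = {f: {r: pct(counts[f][r], row_totals[f]) for r in risk_levels} for f in counts}
col_pct = {f: {r: pct(counts[f][r], col_totals[r]) for r in risk_levels} for f in counts}

print(overall_pct["Growth"]["Low"], row_pct["Growth"]["Low"], col_pct["Growth"]["Low"])
```

Which base to use depends on the question: row percentages compare risk profiles within a fund type, while column percentages compare fund types within a risk level.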

14
Organizing Numerical Data

15
Raw Data
Petrol sold by 30 petrol filling stations ('000 litres)

16.2 15.7 16.4 16.0 15.9
15.4 16.4 15.8 16.3 16.9
16.0 15.2 15.7 15.6 16.0
16.6 15.8 16.2 15.9 16.8
15.9 15.6 15.9 16.8 16.3
15.8 16.1 15.9 16.3 16.0

16
Ordered Array
Petrol sold by 30 petrol filling stations ('000 litres)

15.2 15.4 15.6 15.6 15.7
15.7 15.8 15.8 15.8 15.9
15.9 15.9 15.9 15.9 16.0
16.0 16.0 16.0 16.1 16.2
16.2 16.3 16.3 16.3 16.4
16.4 16.6 16.8 16.8 16.9
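
An ordered array is simply the raw data sorted from smallest to largest. A minimal Python sketch, assuming the 30 petrol values from the Raw Data slide (the names `petrol_sold` and `ordered_array` are illustrative):

```python
# Petrol sold by 30 filling stations ('000 litres), as listed in the raw data.
petrol_sold = [
    16.2, 15.7, 16.4, 16.0, 15.9,
    15.4, 16.4, 15.8, 16.3, 16.9,
    16.0, 15.2, 15.7, 15.6, 16.0,
    16.6, 15.8, 16.2, 15.9, 16.8,
    15.9, 15.6, 15.9, 16.8, 16.3,
    15.8, 16.1, 15.9, 16.3, 16.0,
]

# sorted() returns a new list and leaves the raw data untouched.
ordered_array = sorted(petrol_sold)

# The smallest and largest values can be read off the two ends.
print(ordered_array[0], ordered_array[-1])
```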

17
• Unorganized Data of Meal Cost
• Restaurants.xlsx

18
City Restaurants Meal Cost ($)

27 53 53 65 47 51 81 57 63 53

30 63 68 29 44 48 57 29 34 42

76 42 53 30 64 88 57 82 51 38

41 32 69 45 55 38 54 57 31 62

44 44 43 53 45 55 92 90 42 45
Suburban Restaurants Meal Cost ($)
35 33 48 52 58 51 48 40 48 36

43 42 39 49 38 48 48 56 41 41

47 30 32 54 32 44 48 45 43 36

48 50 48 61 35 30 37 53 36 46

56 44 29 32 46 47 48 35 31 28
City Restaurant Meal Cost (Ordered Array)
27 29 29 30 30 31 32 34 38 38

41 42 42 42 43 44 44 44 45 45

45 47 48 51 51 53 53 53 53 53

54 55 55 57 57 57 57 62 63 63

64 65 68 69 76 81 82 88 90 92

21
Suburban Restaurant Meal Cost (Ordered Array)
28 29 30 30 31 32 32 32 33 35

35 35 36 36 36 37 38 39 40 41

41 42 43 43 44 44 45 46 46 47

47 48 48 48 48 48 48 48 48 48

49 50 51 52 53 54 56 56 58 61

22
The Frequency Distribution
A frequency distribution lists the values observed in the data together with
the frequency with which each value occurs.
• Ungrouped frequency distribution
• Ungrouped.FD.xlsx
• Grouped frequency distribution
– Frequency
– Classes
– Class Interval
– Mid Value of a Class
– Range
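
An ungrouped frequency distribution can be built by counting how often each distinct value occurs. The sketch below does this for the petrol data from the Raw Data slide and also illustrates two of the terms listed above, using the usual definitions (range = largest value minus smallest value; mid value = average of a class's two limits). The function `class_midpoint` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

# Petrol sold by 30 filling stations ('000 litres), as listed in the raw data.
petrol_sold = [
    16.2, 15.7, 16.4, 16.0, 15.9,
    15.4, 16.4, 15.8, 16.3, 16.9,
    16.0, 15.2, 15.7, 15.6, 16.0,
    16.6, 15.8, 16.2, 15.9, 16.8,
    15.9, 15.6, 15.9, 16.8, 16.3,
    15.8, 16.1, 15.9, 16.3, 16.0,
]

# Ungrouped frequency distribution: each distinct value with its count.
ungrouped = Counter(petrol_sold)
for value in sorted(ungrouped):
    print(f"{value:.1f}  {ungrouped[value]}")

# Range: largest observation minus smallest (rounded to hide float noise).
data_range = round(max(petrol_sold) - min(petrol_sold), 1)


def class_midpoint(lower, upper):
    """Mid value of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2
```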

23
The Three-Year Percentage Return of Retirement
Funds (Grouped)
3-Year Return % Growth Funds (freq.) Value Funds (freq.)
0-5 1 0
5-10 2 1
10-15 16 12
15-20 52 35
20-25 101 29
25-30 33 9
30-35 13 7
35-40 2 2
40-45 0 0
45-50 0 0
50-55 2 0
55-60 0 0
60-65 1 0
Total 223 95
24
Activity
Construct a frequency distribution for
City and Suburban Meal Cost

25
Restaurant Meal Cost (Grouped)
Meal Cost No. of Restaurants Meal Cost No. of Restaurants
(CI) (city frequency) (CI) (suburb frequency)

20-30 3 20-30 2
30-40 7 30-40 16
40-50 13 40-50 23
50-60 14 50-60 8
60-70 7 60-70 1
70-80 1 70-80 0
80-90 3 80-90 0
90-100 2 90-100 0
Total 50 Total 50
26
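Types of classes
• Exclusive: the upper limit of one class is the lower limit of the next
(20-30, 30-40, ...); a value equal to an upper limit is counted in the next class.
• Inclusive: both limits belong to the class (20-29, 30-39, ...).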

27
Inclusive FD
Meal Cost No. of Restaurants Meal Cost No. of Restaurants
(CI) (city frequency) (CI) (suburb frequency)

20-29 3 20-29 2
30-39 7 30-39 16
40-49 13 40-49 23
50-59 14 50-59 8
60-69 7 60-69 1
70-79 1 70-79 0
80-89 3 80-89 0
90-99 2 90-99 0
Total 50 Total 50
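
The grouped tables above can be reproduced by counting how many values fall in each class. The following Python sketch tallies the city meal costs from the unorganized data, once with exclusive classes (20-30, 30-40, ...) and once with inclusive classes (20-29, 30-39, ...). The class width of 10 and the starting limit of 20 follow the tables; the variable names are illustrative.

```python
# City restaurant meal costs ($), as listed in the unorganized data.
city_costs = [
    27, 53, 53, 65, 47, 51, 81, 57, 63, 53,
    30, 63, 68, 29, 44, 48, 57, 29, 34, 42,
    76, 42, 53, 30, 64, 88, 57, 82, 51, 38,
    41, 32, 69, 45, 55, 38, 54, 57, 31, 62,
    44, 44, 43, 53, 45, 55, 92, 90, 42, 45,
]

class_width = 10
lower_limits = list(range(20, 100, class_width))  # 20, 30, ..., 90

# Exclusive classes: lower <= x < upper, so 30 belongs to 30-40, not 20-30.
exclusive = [
    sum(lo <= cost < lo + class_width for cost in city_costs)
    for lo in lower_limits
]

# Inclusive classes: lower <= x <= upper, with upper = lower + 9.
inclusive = [
    sum(lo <= cost <= lo + class_width - 1 for cost in city_costs)
    for lo in lower_limits
]

for lo, freq in zip(lower_limits, exclusive):
    print(f"{lo}-{lo + class_width}  {freq}")
```

Because the meal costs are whole dollars, both class types give the same frequencies; with fractional data, inclusive classes would leave gaps between 29 and 30, 39 and 40, and so on.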
28
Principles of Classification
a) The number of classes should be between 5 and 15.
b) Classes should not overlap.
c) All class intervals should have the same width.
d) Class interval: i = Range / Number of classes.
e) Open-ended classes should be avoided.
f) Lower class limits should be simple multiples of the class width.
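
As a rough illustration of principles (a), (d), and (f), the sketch below derives class limits for the city meal costs, whose smallest and largest values are 27 and 92. The choice of 8 classes and the rounding of the width up to a multiple of 5 are assumptions made for this example, not rules from the slides.

```python
import math

minimum, maximum = 27, 92   # smallest and largest city meal costs ($)
num_classes = 8             # assumed choice within the suggested 5 to 15

data_range = maximum - minimum          # 65
raw_width = data_range / num_classes    # 8.125

# Round the raw width up to a convenient value (here, a multiple of 5).
class_width = math.ceil(raw_width / 5) * 5

# Start the first class at a multiple of the class width at or below the minimum.
first_lower = (minimum // class_width) * class_width

class_limits = [
    (first_lower + k * class_width, first_lower + (k + 1) * class_width)
    for k in range(num_classes)
]
print(class_limits)
```

With these choices the classes run from 20-30 up to 90-100, which matches the grouped meal-cost table.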

29
Relative Frequency

30
Relative Frequency Distribution
3-Year      Growth    Relative    Percentage   Value     Relative    Percentage
Return %    (freq.)   frequency                (freq.)   frequency
0-5         1         0.0045      0.45         0         0.0000      0.00
5-10        2         0.0090      0.90         1         0.0105      1.05
10-15       16        0.0717      7.17         12        0.1263      12.63
15-20       52        0.2332      23.32        35        0.3684      36.84
20-25       101       0.4529      45.29        29        0.3053      30.53
25-30       33        0.1480      14.80        9         0.0947      9.47
30-35       13        0.0583      5.83         7         0.0737      7.37
35-40       2         0.0090      0.90         2         0.0211      2.11
40-45       0         0.0000      0.00         0         0.0000      0.00
45-50       0         0.0000      0.00         0         0.0000      0.00
50-55       2         0.0090      0.90         0         0.0000      0.00
55-60       0         0.0000      0.00         0         0.0000      0.00
60-65       1         0.0045      0.45         0         0.0000      0.00
Total       223       1.0000      100.00       95        1.0000      100.00
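
Relative frequency is a class frequency divided by the total number of observations, and the percentage is the relative frequency multiplied by 100. A minimal Python sketch using the growth-fund frequencies from the table above (the list names are illustrative):

```python
class_labels = [
    "0-5", "5-10", "10-15", "15-20", "20-25", "25-30", "30-35",
    "35-40", "40-45", "45-50", "50-55", "55-60", "60-65",
]
# Growth-fund frequencies for each 3-year return class.
growth_freq = [1, 2, 16, 52, 101, 33, 13, 2, 0, 0, 2, 0, 1]

total = sum(growth_freq)

# Relative frequency = class frequency / total; percentage = 100 * relative.
relative = [freq / total for freq in growth_freq]
percentage = [100 * rel for rel in relative]

for label, freq, rel, pct in zip(class_labels, growth_freq, relative, percentage):
    print(f"{label:<6} {freq:>4} {rel:>8.4f} {pct:>7.2f}")
```

Relative frequencies make the growth and value columns comparable even though the two groups contain different numbers of funds (223 and 95).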

31
The Cumulative Distribution and the Percentage Cumulative Distribution

32
Less Than Cumulative Frequency Distribution
(Growth Funds)
3-Year      No. of   No. of funds       Relative    % of funds
Return %    funds    (less than the     freq. (%)   (less than the
                     upper limit)                   upper limit)
0-5         1        1                  0.45        0.45
5-10        2        3                  0.90        1.35
10-15       16       19                 7.17        8.52
15-20       52       71                 23.32       31.84
20-25       101      172                45.29       77.13
25-30       33       205                14.80       91.93
30-35       13       218                5.83        97.76
35-40       2        220                0.90        98.65
40-45       0        220                0.00        98.65
45-50       0        220                0.00        98.65
50-55       2        222                0.90        99.55
55-60       0        222                0.00        99.55
60-65       1        223                0.45        100.00
Total       223                         100.00
33
More Than Cumulative Frequency Distribution
(Growth Funds)
3-Year      No. of   No. of funds       Relative    % of funds
Return %    funds    (more than the     freq. (%)   (more than the
                     lower limit)                   lower limit)
0-5         1        223                0.45        100.00
5-10        2        222                0.90        99.55
10-15       16       220                7.17        98.65
15-20       52       204                23.32       91.48
20-25       101      152                45.29       68.16
25-30       33       51                 14.80       22.87
30-35       13       18                 5.83        8.07
35-40       2        5                  0.90        2.24
40-45       0        3                  0.00        1.35
45-50       0        3                  0.00        1.35
50-55       2        3                  0.90        1.35
55-60       0        1                  0.00        0.45
60-65       1        1                  0.45        0.45
Total       223                         100.00
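
Both cumulative distributions are running totals of the class frequencies: the "less than" version accumulates from the first class onward, and the "more than" version accumulates from the last class backward. A short Python sketch using the growth-fund frequencies (the variable names are illustrative):

```python
from itertools import accumulate

# Growth-fund frequencies for the classes 0-5, 5-10, ..., 60-65.
growth_freq = [1, 2, 16, 52, 101, 33, 13, 2, 0, 0, 2, 0, 1]
total = sum(growth_freq)

# "Less than" cumulative frequency: running total from the first class.
less_than = list(accumulate(growth_freq))

# "More than" cumulative frequency: running total from the last class,
# reversed back so that it lines up with the original class order.
more_than = list(accumulate(reversed(growth_freq)))[::-1]

# Cumulative percentages computed from the cumulative counts.
less_than_pct = [round(100 * c / total, 2) for c in less_than]
more_than_pct = [round(100 * c / total, 2) for c in more_than]

print(less_than)
print(more_than)
```

Computing each cumulative percentage from its cumulative count, rather than adding up already-rounded class percentages, keeps rounding error from accumulating down the column.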
34
Visualizing Data

35
Categorical Data
– The Bar Chart
– The Pie Chart
– The Pareto Chart

36
The Bar Chart
[Bar chart: Type of Funds. Growth: 223 funds; Value: 95 funds. Horizontal axis from 0 to 250.]

37
The Bar Chart
[Bar chart: Risk Type. Low: 99 funds; Average: 145 funds; High: 74 funds.]

38
The Pie Chart
[Pie chart: Market Cap. Categories: Large, Mid-Cap, Small. Slice labels: 24%, 50%, 27%.]

39
The Pareto Chart
[Pareto chart: bars show Frequency (left axis, 0 to 400); line shows Cumulative % (right axis, 0 to 100).]

40
• Data.files\ATM Transactions.xlsx

41
The Side-by-side Bar Chart
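[Side-by-side bar chart: number of funds by fund type (Growth, Value), with separate bars for Low, Average, and High risk. Vertical axis from 0 to 120.]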

42
Numerical Data
– The Histogram
– The Percent Polygon
– The Cumulative Percent Polygon

43
Exercise 1.3
A sports psychologist studying the effect of jogging
on college students’ grades collected data from a
group of college joggers. Along with some other
variables, he recorded the average number of
miles run per day. He compiled his results into the
following distribution:

44
Miles per day   Freq.
1.00-1.40       32
1.40-1.80       43
1.80-2.20       81
2.20-2.60       122
2.60-3.00       131
3.00-3.40       130
3.40-3.80       111
3.80-4.20       95
4.20-4.60       82
4.60-5.00       47
5.00 and up     53
45
a) Construct an ogive. Approximately how many
miles a day does the middle jogger run?
b) Approximately what proportion of college joggers
run at least 3.0 miles a day?

46
