
CHAPTER 3

VISUALISING DATA
CONTENT

3.1 Data Organisation and Frequency Distribution
3.2 Histograms, Frequency Polygons, Ogive
3.3 Stem and Leaf Plot
3.4 Box-plot
3.5 Other Types of Graphs (For Qualitative Data)
3.6 Graphical Summary using Microsoft Excel
OVERVIEW
ORGANISING AND GRAPHING DATA
(QUANTITATIVE DATA)

FREQUENCY DISTRIBUTION (Section 3.1)
HISTOGRAM (Section 3.2)
FREQUENCY POLYGONS (Section 3.2)
OGIVE (CUMULATIVE FREQUENCY GRAPH) (Section 3.2)
STEM AND LEAF PLOT (Section 3.3)
BOX-PLOT (Section 3.4)
OVERVIEW
ORGANISING AND GRAPHING DATA
(QUALITATIVE DATA)

FREQUENCY DISTRIBUTION (Section 3.1)
PIE CHART (Section 3.5)
BAR CHART (Section 3.5)
PARETO CHART (Section 3.5)
CONTINGENCY TABLE (Chapter 7)
3.1

DATA ORGANISATION
AND
FREQUENCY DISTRIBUTION
FREQUENCY DISTRIBUTION
(QUANTITATIVE DATA)

UNGROUPED DATA
Data are given as INDIVIDUAL POINTS

EXAMPLE

Television sets   Frequency
0                 2
1                 13
2                 18
3                 0
4                 10
5                 2

GROUPED DATA
Data are given in INTERVALS

EXAMPLE

Exam Score   Frequency
90-99        7
80-89        5
70-79        15
60-69        4
50-59        5
40-49        1
FREQUENCY DISTRIBUTION
(QUANTITATIVE DATA)

FREQUENCY DISTRIBUTION

UNGROUPED DATA GROUPED DATA

Which type is more desirable?

REASON

Grouping the data:
Reduces the complexity of the data
Helps to smooth out irregularities in the distribution


EXAMPLE 3.1
(QUANTITATIVE DATA)

Heights of statistics students were obtained by a lecturer as part of a study conducted for class. The
last digits of those heights are listed below.

0 1 6 5 5 5 0 3 3

5 5 8 8 0 7 5 5 0

8 0 2 5 3 1 9 5 5

5 5 4 6 5 0 0 5 0

Construct a frequency distribution with 10 classes.

Solution

Last Digit 0 1 2 3 4 5 6 7 8 9

Frequency 8 2 1 3 1 14 2 1 3 1
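
The tally in the solution above can be checked with a short script. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative and not part of the original notes.

```python
from collections import Counter

# Last digits of the heights, copied from Example 3.1.
last_digits = [
    0, 1, 6, 5, 5, 5, 0, 3, 3,
    5, 5, 8, 8, 0, 7, 5, 5, 0,
    8, 0, 2, 5, 3, 1, 9, 5, 5,
    5, 5, 4, 6, 5, 0, 0, 5, 0,
]

counts = Counter(last_digits)
# One class per digit 0..9; a digit that never occurs would get frequency 0.
frequency = [counts.get(d, 0) for d in range(10)]

for digit, freq in zip(range(10), frequency):
    print(digit, freq)
```

The ten printed frequencies should add up to 36, the number of heights recorded.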
FREQUENCY DISTRIBUTION
(QUANTITATIVE DATA-GROUPED DATA)

01 Determine the number of classes using Sturges' formula:
   $k = 1 + 3.322 \log_{10} n$ (round up)

02 Calculate the class width:
   $\text{Class width} = \dfrac{\text{maximum} - \text{minimum}}{k}$ (round up)

03 Choose either the minimum data value or a convenient value below the
   minimum data value as the first lower class limit/class boundary.

04 Using the first lower class limit and the class width, list the other lower
   class limits.

05 List the lower class limits in a vertical column and then enter the upper
   class limits.

06 Take each individual data value and put a tally mark in the appropriate
   class. Add the tally marks to find the total frequency for each class.
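
Steps 01 to 04 can be sketched in a few lines of Python. The function names and the sample inputs (40 observations ranging from 33 to 85, with 31 as the first lower limit) are illustrative assumptions, not part of the procedure itself.

```python
import math

def number_of_classes(n):
    """Sturges' formula, rounded up to a whole number of classes."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(minimum, maximum, k):
    """Range divided by the number of classes, rounded up."""
    return math.ceil((maximum - minimum) / k)

def lower_limits(start, width, k):
    """Lower class limits, starting from the chosen first lower limit."""
    return [start + i * width for i in range(k)]

k = number_of_classes(40)          # 40 observations
w = class_width(33, 85, k)         # smallest value 33, largest value 85
print(k, w, lower_limits(31, w, k))
```

Note that `math.ceil` leaves an exact whole-number quotient unchanged, so the class width is only increased when the division leaves a fraction.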
EXAMPLE 3.2
(QUANTITATIVE DATA)

The following table shows the marks obtained by 40 students in an examination.

33 38 41 41 43 47 48 48 49 49

50 52 52 52 54 55 56 56 57 58

59 60 61 62 64 65 65 66 66 68

68 69 71 71 75 76 79 80 81 85

Construct a frequency distribution table for the data above with 31 as the starting point.
Solution

$k = 1 + 3.322 \log_{10} 40 = 6.32 \approx 7$ (round up)

$\text{Class width} = \dfrac{85 - 33}{7} = 7.43 \approx 8$ (round up)
EXAMPLE 3.2
(QUANTITATIVE DATA)-CONTINUE

Class Boundaries    Frequency
$31 \le x < 39$     2
$39 \le x < 47$     3
$47 \le x < 55$     10
$55 \le x < 63$     9
$63 \le x < 71$     8
$71 \le x < 79$     4
$79 \le x < 87$     4
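
The tally step can be reproduced as follows. This sketch assumes each class includes its lower boundary and excludes its upper boundary, which is the reading that reproduces the tabulated frequencies; the variable names are illustrative.

```python
marks = [
    33, 38, 41, 41, 43, 47, 48, 48, 49, 49,
    50, 52, 52, 52, 54, 55, 56, 56, 57, 58,
    59, 60, 61, 62, 64, 65, 65, 66, 66, 68,
    68, 69, 71, 71, 75, 76, 79, 80, 81, 85,
]

start, width, k = 31, 8, 7   # starting point, class width and class count from the solution

# Each class is treated as the half-open interval [lower, lower + width).
table = []
for i in range(k):
    lower = start + i * width
    upper = lower + width
    freq = sum(1 for m in marks if lower <= m < upper)
    table.append((lower, upper, freq))

for lower, upper, freq in table:
    print(f"{lower} <= x < {upper}: {freq}")
```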
EXAMPLE 3.3
(QUALITATIVE DATA)

A sample of 25 young executives was taken at random. Each executive was asked to choose only one
of the five listed models of national cars; Myvi, Bezza, Viva, Axia and Alza.

Bezza Viva Bezza Myvi Myvi


Myvi Myvi Alza Alza Alza
Axia Axia Viva Alza Viva
Alza Myvi Viva Myvi Alza
Bezza Viva Alza Alza Viva

Develop a frequency distribution table for the data above.

Solution

Models Myvi Bezza Viva Axia Alza

Frequency 6 3 6 2 8
3.2

• HISTOGRAMS
• FREQUENCY POLYGON
• OGIVE
OVERVIEW

ORGANISING AND GRAPHING GROUPED DATA

GRAPH               X-AXIS                    Y-AXIS
HISTOGRAM           CLASS BOUNDARIES          FREQUENCY
FREQUENCY POLYGON   MIDPOINT                  FREQUENCY
OGIVE               UPPER CLASS BOUNDARIES    CUMULATIVE FREQUENCY
HISTOGRAM
(THE SHAPE OF DISTRIBUTION)

Left-skewed Distribution
Symmetrical Distribution
Right-skewed Distribution

Identify the direction of skewness from the "TAIL": a longer tail on the left means left-skewed, and a longer tail on the right means right-skewed.

Other shapes: UNIFORM, U-SHAPE, BIMODAL, J-SHAPE, REVERSE J
EXAMPLE 3.4
(HISTOGRAM)

The following frequency distribution summarized the percentage of sugar in a popular brand of soft
drink.

The percentage of sugar Frequency

32.0-32.9 5

33.0-33.9 12

34.0-34.9 20

35.0-35.9 16

36.0-36.9 7

Construct a histogram for the data above. Then identify the shape of distribution based on the
histogram.
EXAMPLE 3.4
(HISTOGRAM)-CONTINUE

Solution

Class Boundaries   Frequency
31.95-32.95        5
32.95-33.95        12
33.95-34.95        20
34.95-35.95        16
35.95-36.95        7

[Histogram: x-axis Class Boundaries (31.95, 32.95, 33.95, 34.95, 35.95, 36.95); y-axis Frequency]

Shape: SYMMETRICAL DISTRIBUTION
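
The class boundaries in the solution come from closing the gap between neighbouring class limits: half of the gap is subtracted from each lower limit and added to each upper limit. A minimal sketch, with illustrative variable names:

```python
# Class limits from Example 3.4 as (lower, upper) pairs.
limits = [(32.0, 32.9), (33.0, 33.9), (34.0, 34.9), (35.0, 35.9), (36.0, 36.9)]

# Gap between one class's upper limit and the next class's lower limit.
gap = round(limits[1][0] - limits[0][1], 10)
half = gap / 2

# Rounding removes floating-point noise from the subtraction and addition.
boundaries = [(round(lo - half, 2), round(hi + half, 2)) for lo, hi in limits]
print(boundaries)
```

Here the gap is 0.1, so each limit moves by 0.05.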
EXAMPLE 3.5
(FREQUENCY POLYGON)

The following table depicted the frequency distribution for the weight of 52 female workers in a factory.
Measurements have been recorded to the nearest kilogram (kg).

Weight (kg) Number of female workers

40-44 3

45-49 2

50-54 7

55-59 18

60-64 18

65-69 3

70-74 1

Construct a frequency polygon for the frequency distribution above.


EXAMPLE 3.5
(FREQUENCY POLYGON)-CONTINUE

Solution

Midpoint   Frequency
37         0
42         3
47         2
52         7
57         18
62         18
67         3
72         1
77         0

[Frequency polygon: x-axis Midpoint (37, 42, 47, 52, 57, 62, 67, 72, 77); y-axis Frequency (0 to 20)]
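
The points of a frequency polygon can be generated from the class limits. This sketch assumes the polygon is closed with one empty class of the same width on each side; under that assumption the closing points fall at 37 and 77. Variable names are illustrative.

```python
# Class limits and frequencies from Example 3.5.
classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
freqs = [3, 2, 7, 18, 18, 3, 1]

width = classes[1][0] - classes[0][0]            # distance between lower limits: 5
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Close the polygon with one empty class on each side.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)
print(points)
```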
EXAMPLE 3.6
(OGIVE/
CUMULATIVE FREQUENCY GRAPH)

The following table shows the frequency distribution which depicts the years of service for 75
employees of a large manufacturing department of an international company.

Class Limits 1-5 6-10 11-15 16-20 21-25 26-30

Frequency 21 25 15 0 8 6

Construct a cumulative frequency graph (ogive) for the given frequency distribution. Then, find the
number of employees who have served in the company for less than 18 years.

Solution

Upper boundary 0.5 5.5 10.5 15.5 20.5 25.5 30.5

Cumulative Frequency 0 21 46 61 61 69 75
EXAMPLE 3.6
(OGIVE/
CUMULATIVE FREQUENCY GRAPH)-
CONTINUE

[Ogive: x-axis Upper boundary (0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5); y-axis Cumulative frequency (0 to 80)]

The cumulative frequency stays at 61 between the boundaries 15.5 and 20.5, so the number of employees who have served less than 18 years is 61.
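
Reading a value off the ogive amounts to linear interpolation between neighbouring points. The sketch below rebuilds the cumulative frequencies of Example 3.6 and reads the graph at 18 years; `read_ogive` is a hypothetical helper name, not something defined in the notes.

```python
from itertools import accumulate

upper_limits = [5, 10, 15, 20, 25, 30]     # upper class limits from Example 3.6
freqs = [21, 25, 15, 0, 8, 6]

# Ogive points: start at the first lower boundary with cumulative frequency 0.
xs = [0.5] + [u + 0.5 for u in upper_limits]
cum = [0] + list(accumulate(freqs))

def read_ogive(x):
    """Linear interpolation between neighbouring ogive points."""
    for (x0, c0), (x1, c1) in zip(zip(xs, cum), zip(xs[1:], cum[1:])):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range of the ogive")

print(list(zip(xs, cum)))
print(read_ogive(18))
```

Because the class 16-20 has frequency 0, the graph is flat between 15.5 and 20.5, which is why the reading at 18 equals the cumulative frequency at 15.5.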


3.3

STEM AND LEAF PLOT


• shows data in graphic form
• stem (leading digit)
• leaf (trailing digit)
• the stem must be arranged in order
• a key indicator defines the stem and leaf values
• mixture model/back-to-back stem and leaf
• shape of distribution (rotated to the horizontal position)
EXAMPLE 3.7
(STEM AND LEAF PLOT)

Alias Consultancy conducted a survey on the number of motorcycle thefts in Malaysia for a period of 25
days. The data acquired are shown below.

10 11 12 13 13 14 25 26 27 27
28 28 29 29 31 31 32 33 34 34
45 47 49 50 52

Construct a stem and leaf plot. Then give a comment on the distribution of the number of motorcycle
thefts in Klang.
Solution

Stem | Leaf
1    | 0 1 2 3 3 4
2    | 5 6 7 7 8 8 9 9
3    | 1 1 2 3 4 4
4    | 5 7 9
5    | 0 2

Key: 1 | 0 means 10

Distribution: Right-skewed distribution
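
A stem and leaf plot for two-digit data can be built by splitting each value into its tens digit (stem) and units digit (leaf). A minimal sketch, where `stem_and_leaf` is a hypothetical helper name:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    # Include stems with no leaves so the shape is not distorted.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

thefts = [10, 11, 12, 13, 13, 14, 25, 26, 27, 27,
          28, 28, 29, 29, 31, 31, 32, 33, 34, 34,
          45, 47, 49, 50, 52]

for stem, leaves in stem_and_leaf(thefts).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```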
EXAMPLE 3.8
(MIXTURES STEM AND LEAF PLOT)

The numbers of blocked intrusion attempts on each day during the first two weeks of the month were

56 47 49 37 38 60 50 43 43 59 50 56 54 58

After the change of firewall settings, the numbers of blocked intrusions during the next 20 days were

53 21 32 49 45 38 44 33 32 43
53 46 36 48 39 35 37 36 39 45

i. Construct a back-to-back stem and leaf plot. Then comment on the distributions of the
number of blocked intrusion attempts before and after the change.
ii. Based on the answer in (i), can we say that the change of firewall settings has reduced the number
of blocked intrusion attempts? Justify your answer.
EXAMPLE 3.8
(MIXTURES STEM AND LEAF PLOT)-
CONTINUE

Solution

i
          Before | Stem | After
                 |  2   | 1
             8 7 |  3   | 2 2 3 5 6 6 7 8 9 9
         9 7 3 3 |  4   | 3 4 5 5 6 8 9
   9 8 6 6 4 0 0 |  5   | 3 3
               0 |  6   |

Key: 2 | 1 means 21

Before: Left-skewed distribution
After: Right-skewed distribution

ii
Since the peak of the distribution of the number of blocked intrusion attempts shifted to the left (towards lower values) after the change of firewall settings, we can say that the change has reduced the number of blocked intrusion attempts.
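
The back-to-back layout can be generated by sharing one column of stems between the two data sets and writing the left-hand leaves outward from the stem. This is a minimal sketch with illustrative names; the column width of 14 characters is an arbitrary choice for alignment.

```python
before = [56, 47, 49, 37, 38, 60, 50, 43, 43, 59, 50, 56, 54, 58]
after = [53, 21, 32, 49, 45, 38, 44, 33, 32, 43,
         53, 46, 36, 48, 39, 35, 37, 36, 39, 45]

def leaves(values, stem):
    """Sorted units digits of the values whose tens digit equals stem."""
    return sorted(v % 10 for v in values if v // 10 == stem)

stems = range(min(before + after) // 10, max(before + after) // 10 + 1)
rows = []
for s in stems:
    left = leaves(before, s)[::-1]      # left side reads outward from the stem
    right = leaves(after, s)
    rows.append((left, s, right))

for left, s, right in rows:
    left_text = " ".join(map(str, left)).rjust(14)
    print(left_text, "|", s, "|", " ".join(map(str, right)))
```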
EXERCISE 3.1
(STEM AND LEAF PLOT)
1. The following data show 22 exam marks for a Mathematics course:

44 52 70 75 53 44 52 66
57 79 83 68 94 66 59 45
69 48 53 80 95 44

Construct the stem-and-leaf plot for the data.

2. The data shown represents the percentage of unemployed males and females in 1995 for a
sample of countries of the world. Using the whole numbers as stems and the decimals as
leaves construct a back-to-back (mixture) stem and leaf plot and compare the distribution
of the two groups.

Females: 4.9 5.0 5.3 5.5 5.6 5.6 5.8 6.1 6.3 6.6 6.7 7.1 7.4 7.6 7.9
Males:   2.1 2.3 2.3 2.7 3.0 3.3 3.3 3.6 3.7 3.9 4.2 4.2 4.4 4.5 5.6
3.4

BOX-PLOT
WHAT IS
BOX-PLOT??
A box and whisker plot:

• is a graphical method of displaying variation in a set of data
• can provide additional detail while allowing multiple sets of data to be displayed in the same graph
• is very effective and easy to read, as it can summarise data from multiple sources and display the results in a single graph
• allows comparison of data from different categories for easier, more effective decision making
BOX-PLOT

Boxplots (box and whisker plots) are graphical representations of the five-number summary of a data set, together with any outliers.

The five-number summary is:
• The lowest value of the data set (minimum)
• Q1 (1st quartile or 25th percentile)
• The median (2nd quartile or 50th percentile)
• Q3 (3rd quartile or 75th percentile)
• The highest value of the data set (maximum)

[Figures: a vertical boxplot; a horizontal boxplot]
HOW TO DRAW
BOX-PLOT??
PROCEDURES FOR
CONSTRUCTING A BOX-PLOT

1 Arrange the data in ascending order.

2 Find the first quartile $Q_1$, second quartile $Q_2$ and third quartile $Q_3$.

3 Identify the presence of outliers, using $IQR = Q_3 - Q_1$:
  Lower limit: $x < Q_1 - 1.5\,IQR$
  Upper limit: $x > Q_3 + 1.5\,IQR$
  Note: the larger the value of IQR, the larger the variability.

4 Draw a scale for the data on the x-axis.

5 Locate the minimum value, $Q_1$, $Q_2$, $Q_3$, maximum value and outliers (if any) on the scale.

6 Draw a box from $Q_1$ to $Q_3$ with a vertical line through $Q_2$, and connect the box to the lower and
  upper values.
OUTLIERS

an extremely high or an
extremely low data value when
compared with the rest of the
data values.
HOW TO DETECT OUTLIERS??

1 Arrange the data in order

2 Find $Q_1$ and $Q_3$

3 Find lower limit: $Q_1 - 1.5(Q_3 - Q_1)$

4 Find upper limit: $Q_3 + 1.5(Q_3 - Q_1)$

5 Check the data set for any data value which is smaller than the lower limit or
  larger than the upper limit:

  $x < Q_1 - 1.5(Q_3 - Q_1)$ or $x > Q_3 + 1.5(Q_3 - Q_1)$
EXAMPLE 3.9
OUTLIERS

The number of credits in business courses that eight job applicants had is shown here:
9, 12, 15, 27, 33, 45, 63, 72.
Find the first and third quartiles for the above data. Is there any outlier in the above
data?
Solution:
$Q_1 = x_{1(8)/4} = x_2 \Rightarrow Q_1 = \dfrac{x_2 + x_3}{2} = 13.5$ and $Q_3 = x_{3(8)/4} = x_6 \Rightarrow Q_3 = \dfrac{x_6 + x_7}{2} = 54$

$Q_1 - 1.5(Q_3 - Q_1) = 13.5 - 1.5(54 - 13.5) = -47.25$
$Q_3 + 1.5(Q_3 - Q_1) = 54 + 1.5(54 - 13.5) = 114.75$

Since every value satisfies $-47.25 < x < 114.75$, there is no outlier.
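
The outlier check can be sketched in Python. The quartile rule coded here is inferred from the worked examples in this chapter (position q·n/4; average two neighbouring values when the position is a whole number, otherwise round the position up). It is an assumption about the convention used, and other software may use a different quartile rule. The function names are illustrative.

```python
import math

def quartile(sorted_data, q):
    """q-th quartile (q = 1, 2, 3) using the position rule q*n/4."""
    n = len(sorted_data)
    pos = q * n / 4
    if pos == int(pos):
        # Whole-number position: average that value and the next one.
        p = int(pos)
        return (sorted_data[p - 1] + sorted_data[p]) / 2
    # Fractional position: round up to the next position.
    return sorted_data[math.ceil(pos) - 1]

def find_outliers(data):
    """Return (lower limit, upper limit, outliers) using the 1.5 * IQR rule."""
    xs = sorted(data)
    q1, q3 = quartile(xs, 1), quartile(xs, 3)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [x for x in xs if x < lower or x > upper]

credits = [9, 12, 15, 27, 33, 45, 63, 72]
print(find_outliers(credits))
```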


EXERCISE 3.2
OUTLIERS

1. Given 19 2 1 4 3 7 5 4 6 . Find outliers if any.

2. Given 6 2 11 4 3 7 7 5 8 6 21 12. Find outliers if any.


PROCEDURES FOR
CONSTRUCTING A BOX-PLOT

1 Arrange the data in ascending order.

2 Find the first quartile $Q_1$, second quartile $Q_2$ and third quartile $Q_3$.

3 Identify the presence of outliers, using $IQR = Q_3 - Q_1$:
  Lower limit: $x < Q_1 - 1.5\,IQR$
  Upper limit: $x > Q_3 + 1.5\,IQR$
  Note: the larger the value of IQR, the larger the variability.

4 Draw a scale for the data on the x-axis.

5 Locate the minimum value, $Q_1$, $Q_2$, $Q_3$, maximum value and outliers (if any) on the scale.

6 Draw a box from $Q_1$ to $Q_3$ with a vertical line through $Q_2$, and connect the box to the lower and
  upper values.
EXAMPLE 3.10
(BOX-PLOT)

A company that sells mail-order handphones is interested in studying its typical weekly sales for
inventory planning. The company has randomly selected 10 weekly sales figures from last year’s records and
obtained the data (RM’000) shown below.

147 108 123 122 131 115 125 127 128 123

Construct a box-plot for the data above. Then, comment on the shape of the distribution.
Solution

108 115 122 123 123 125 127 128 131 147

$Q_1 = x_{1(10)/4} = x_{2.5} \Rightarrow Q_1 = x_3 = 122$

$Q_2 = x_{2(10)/4} = x_5 \Rightarrow Q_2 = \dfrac{x_5 + x_6}{2} = 124$

$Q_3 = x_{3(10)/4} = x_{7.5} \Rightarrow Q_3 = x_8 = 128$
EXAMPLE 3.10
(BOX-PLOT)-CONTINUE

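Information      Weekly sales (RM’000)
Minimum          115
First Quartile   122
Median           124
Third Quartile   128
Maximum          131
Outlier          Yes (108, 147)

$IQR = Q_3 - Q_1 = 128 - 122 = 6$
Lower limit: $Q_1 - 1.5\,IQR = 113$, so any $x < 113$ is an outlier.
Upper limit: $Q_3 + 1.5\,IQR = 137$, so any $x > 137$ is an outlier.

[Box-plot: x-axis Weekly Sales (RM’000), scale 105 to 145; box at Q1, Q2, Q3 with lines to the Minimum and Maximum, and one outlier beyond each end]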
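
The values needed to draw this box-plot can be collected in one pass. The sketch below again assumes the chapter's quartile position rule (q·n/4, averaging at whole positions and rounding up otherwise) and takes the plotted minimum and maximum to be the smallest and largest values that are not outliers, as in the table above. `box_plot_summary` is a hypothetical helper name.

```python
import math

def quartile(xs, q):
    """Position rule q*n/4: average two values at a whole position, else round up."""
    pos = q * len(xs) / 4
    if pos == int(pos):
        return (xs[int(pos) - 1] + xs[int(pos)]) / 2
    return xs[math.ceil(pos) - 1]

def box_plot_summary(data):
    xs = sorted(data)
    q1, q2, q3 = (quartile(xs, q) for q in (1, 2, 3))
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lower <= x <= upper]
    return {
        "minimum": inside[0],        # smallest value that is not an outlier
        "Q1": q1, "median": q2, "Q3": q3,
        "maximum": inside[-1],       # largest value that is not an outlier
        "limits": (lower, upper),
        "outliers": [x for x in xs if x < lower or x > upper],
    }

sales = [147, 108, 123, 122, 131, 115, 125, 127, 128, 123]
print(box_plot_summary(sales))
```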
EXAMPLE 3.11
(PARALLEL BOX-PLOTS)

Mr. Tan is interested in comparing the number of hours that housewives spend on television programmes in
the town and rural areas of Kuantan. He therefore randomly surveyed 20 housewives from the town area and 11 from
the rural area. The following table shows the collected data in ascending order.

Town Area:  25 26 27 28 28 29 29 31 32 34 35 35 36 37 38 39 40 41 43 45
Rural Area: 2 7 8 11 12 15 20 24 30 32 34

Construct parallel box-plots.


EXAMPLE 3.11
(PARALLEL BOX-PLOTS)-CONTINUE

Town Area (T), n = 20
Minimum: 25
First Quartile: $Q_1 = x_{1(20)/4} = x_5 \Rightarrow Q_1 = \dfrac{28 + 29}{2} = 28.5$
Median: $Q_2 = x_{2(20)/4} = x_{10} \Rightarrow Q_2 = \dfrac{34 + 35}{2} = 34.5$
Third Quartile: $Q_3 = x_{3(20)/4} = x_{15} \Rightarrow Q_3 = \dfrac{38 + 39}{2} = 38.5$
Maximum: 45
IQR: $IQR = Q_3 - Q_1 = 10$
Lower limit: $Q_1 - 1.5\,IQR = 13.5$
Upper limit: $Q_3 + 1.5\,IQR = 53.5$
Outlier: No

Rural Area (R), n = 11
Minimum: 2
First Quartile: $Q_1 = x_{1(11)/4} = x_{2.75} \Rightarrow Q_1 = x_3 = 8$
Median: $Q_2 = x_{2(11)/4} = x_{5.5} \Rightarrow Q_2 = x_6 = 15$
Third Quartile: $Q_3 = x_{3(11)/4} = x_{8.25} \Rightarrow Q_3 = x_9 = 30$
Maximum: 34
IQR: $IQR = Q_3 - Q_1 = 22$
Lower limit: $Q_1 - 1.5\,IQR = -25$
Upper limit: $Q_3 + 1.5\,IQR = 63$
Outlier: No
EXAMPLE 3.11
(PARALLEL BOX-PLOTS)-CONTINUE

[Parallel box-plots: Rural Area (R) and Town Area (T), each marked with Minimum, Q1, Q2, Q3 and Maximum, on a common scale from 0 to 50]
BOX-PLOT
(SHAPE DISTRIBUTION)

Symmetrical: median is near the centre of the box
Right-skewed: median falls to the left of the centre of the box
Left-skewed: median falls to the right of the centre of the box


BOX-PLOT
(SHAPE DISTRIBUTION)

• Suppose the median is near the centre of the box (approximately symmetric). Then compare the two lines (whiskers):

Symmetrical: the lines are about the same length
Right-skewed: the right line is longer than the left line
Left-skewed: the left line is longer than the right line
COMPARING BOX-PLOT

• If the boxplots for two or more data sets are graphed on the same axis, the
  distributions can be compared using their central tendency (average) and variability
  values.
• To compare the average, use the location of the medians.
• To compare the variability, use the length of the IQR.
COMPARING BOX-PLOT

[Parallel box-plots: Evening and Day]

Shape
Evening: left-skewed
Day: left-skewed

Variability
Evening: IQR = 18
Day: IQR = 27.25
Since IQR (evening) < IQR (day), the test scores during the evening are less variable, that is more consistent, than those during the day.
EXERCISE 3.3
BOX-PLOT

1. Jason saves a portion of his salary from his part-time job in the hope
of buying a pair of shoes. He recorded the number of ringgit he was able
to save over the past 15 weeks.
19, 12, 9, 7, 17, 10, 6, 18, 9, 14, 19, 8, 5, 17, 9
Draw a box-plot to illustrate the money saved by Jason.

2. Test scores for a college statistics class held during the day are:
99; 56; 32; 90; 81; 56; 45; 77; 84; 72; 68; 32; 79; 90
Test scores for a college statistics class held during the evening are:
78; 68; 89; 76; 65; 45; 90; 80; 85; 78; 98; 90; 81; 25

Construct a box-plot for each set of data on the same axis.
3.5

OTHER TYPES OF GRAPHS


(FOR QUALITATIVE DATA)
EXAMPLE 3.11
(PIE CHART)

The following table depicted the operating cost of a minimart in Kelantan for the year 2017.

Item Rental Electricity Administration Wages Others

Expense (RM) 2500 1800 2200 3000 500

i. Construct a pie chart to represent the data above.


ii. If the operating cost for each item increased by 20% in 2017 compared with 2016, determine the
angle of the sector for electricity for the year 2016.

Solution

i Total Expenses = RM10000

Item             Expense (RM)   Angle (°)
Rental           2500           90°
Electricity      1800           64.8°
Administration   2200           79.2°
Wages            3000           108°
Others           500            18°

$\text{Angle} = \dfrac{x}{10000} \times 360^\circ$, where x represents the expense for a particular item.
EXAMPLE 3.11
(PIE CHART)-CONTINUE

[Pie chart: Rental 25.00%, Electricity 18.00%, Administration 22.00%, Wages 30.00%, Others 5.00%]

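ii Each 2017 expense is 1.2 times the 2016 expense, so each 2016 expense is the 2017 expense divided by 1.2.

Total Expenses = RM10000 / 1.2 ≈ RM8333.33

Item             Expense (RM)   Angle (°)
Rental           2083.33        90°
Electricity      1500.00        64.8°
Administration   1833.33        79.2°
Wages            2500.00        108°
Others           416.67         18°

Every item changes by the same factor, so the proportions and the angles are unchanged. Therefore, the angle for the sector of electricity is 64.8°.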
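
The sector angles can be checked numerically. This sketch assumes that a 20% increase means each 2017 expense equals 1.2 times the 2016 expense; since every item is scaled by the same factor, the angles come out the same for both years. `sector_angles` is a hypothetical helper name.

```python
expenses_2017 = {
    "Rental": 2500, "Electricity": 1800, "Administration": 2200,
    "Wages": 3000, "Others": 500,
}

def sector_angles(expenses):
    """Angle of each sector in degrees: share of the total times 360."""
    total = sum(expenses.values())
    return {item: round(x / total * 360, 1) for item, x in expenses.items()}

# Assumption: every 2017 cost is 20% higher than in 2016, so 2016 = 2017 / 1.2.
expenses_2016 = {item: x / 1.2 for item, x in expenses_2017.items()}

print(sector_angles(expenses_2017))
print(sector_angles(expenses_2016))
```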


EXAMPLE 3.12
(PARETO CHART)

Construct a Pareto chart to represent the data given in the following table to display the frequency and
percentage of the different areas of employment for the accounting graduates.

Area                  Number of graduates
Accounting            80
Marketing             30
Finance               60
General Management    15
Others                15
Total                 200

Solution

Area                  Number of graduates   Percentage (%)   Cumulative Percentage (%)
Accounting            80                    40               40
Finance               60                    30               70
Marketing             30                    15               85
General Management    15                    7.5              92.5
Others                15                    7.5              100

$\text{Percentage} = \dfrac{x}{200} \times 100\%$, where x represents the number of graduates in a particular area.
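
The columns of a Pareto table follow from three operations: sort the categories by descending frequency, convert each frequency to a percentage, and accumulate the percentages. A minimal sketch with illustrative variable names:

```python
from itertools import accumulate

graduates = {
    "Accounting": 80, "Marketing": 30, "Finance": 60,
    "General Management": 15, "Others": 15,
}

# Sort categories from the highest to the lowest frequency.
ordered = sorted(graduates.items(), key=lambda item: item[1], reverse=True)
total = sum(graduates.values())

percentages = [count / total * 100 for _, count in ordered]
cumulative = list(accumulate(percentages))

for (area, count), pct, cum in zip(ordered, percentages, cumulative):
    print(f"{area:20s} {count:4d} {pct:6.1f} {cum:6.1f}")
```

In the chart, the sorted frequencies become the bars and the cumulative percentages become the line.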
EXAMPLE 3.12
(PARETO CHART)-CONTINUE

[Pareto chart: x-axis Area (Accounting, Finance, Marketing, General Management, Others); left y-axis Frequency (0 to 100); right y-axis cumulative percentage (0% to 100%)]
EXAMPLE 3.13
(BAR CHART)

The following data show the statistics for two major types of violence against women in a city from 2015 to
2017.

Violence            2015   2016   2017
Domestic violence   410    433    489
Rape                180    210    306

Construct a clustered bar chart to represent the information above.


Solution

[Clustered bar chart: x-axis Year (2015, 2016, 2017); y-axis Frequency (0 to 600); bars for Domestic violence (410, 433, 489) and Rape (180, 210, 306)]
3.6

GRAPH SUMMARY USING


MICROSOFT EXCEL
THANK YOU
END OF CHAPTER 3
