
ORGANIZING DATA

■ The data need to be organized to show
important properties that may help in
the analysis of the data.
■ The organized data provide better
understanding for statistical inference.

2
RAW DATA
● Definition
● Data recorded in the sequence in which
they are collected and before they are
processed or ranked are called raw data.

3
Raw Data: Ages of 50 students

21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24

4
3 Ways to Present Data
1. Textual
- the presentation is in narrative or
paragraph form
- may not immediately catch the reader's
interest, but it can present a more
comprehensive picture of the data
because of the further written explanation
of its nature.
5
3 Ways to Present Data
2. Tabular
- makes use of rows and columns, as in a
frequency table or frequency distribution
- data are presented in a systematic
and orderly manner which catches one’s
attention and may facilitate the
comprehension and analysis of the data
presented
6
3 Ways to Present Data
3. Graphical
- a pictorial or geometrical presentation
of the given data
- can take the form of a frequency
polygon, bar graph, pie graph, stem-and-leaf
display, or pictograph

7
Type of Employment Students
Intend to Engage in

Type of Employment     Number of Students
(variable)             (frequency column)
Private companies      60
Government             30
Own business           10
                       Sum = 100

In the table, Government is an example of a category and 30 is its frequency.

8
Frequency Distributions
● Definition
● A frequency distribution lists all
categories and the number of elements that
belong to each of the categories.

9
Example
● A sample of 30 employees from large
companies was selected, and these
employees were asked how stressful their
jobs were. Their responses are recorded
next, where "very" represents very stressful,
"somewhat" means somewhat stressful, and
"none" stands for not stressful at all.
10
Example
Somewhat None     Somewhat Very     Very     None
Very     Somewhat Somewhat Very     Somewhat Somewhat
Very     Somewhat None     Very     None     Somewhat
Somewhat Very     Somewhat Somewhat Very     None
Somewhat Very     Very     Somewhat None     Somewhat

Construct a frequency distribution table for these data.

11
Solution
Table 1: Frequency Distribution of Stress on Job

Stress on Job Tally Frequency (f)


Very |||| |||| 10
Somewhat |||| |||| |||| 14
None |||| | 6
Sum = 30
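
The tally in Table 1 can also be checked by counting the responses programmatically. The following is a minimal sketch, assuming the 30 responses are typed in row by row as listed in the example; the variable names are illustrative only.

```python
from collections import Counter

# The 30 responses from the example, read row by row.
responses = (
    "Somewhat None Somewhat Very Very None "
    "Very Somewhat Somewhat Very Somewhat Somewhat "
    "Very Somewhat None Very None Somewhat "
    "Somewhat Very Somewhat Somewhat Very None "
    "Somewhat Very Very Somewhat None Somewhat"
).split()

# Counter does the tallying: one count per category.
frequency = Counter(responses)

for category in ("Very", "Somewhat", "None"):
    print(f"{category:<10}{frequency[category]:>3}")
print(f"{'Sum':<10}{sum(frequency.values()):>3}")
```

The counts should agree with the Frequency (f) column of Table 1.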

12
Relative Frequency and
Percentage Distributions
● Calculating Relative Frequency of a
Category

Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)

13
Relative Frequency and
Percentage Distributions cont.
● Calculating Percentage

Percentage = (Relative frequency) · 100

14
Example
● Determine the relative frequency and
percentage for the data in table 1

15
Solution

Table 2: Relative Frequency and Percentage Distributions of Stress on Job

Stress on Job Relative Frequency Percentage


Very 10/30 = .333 .333(100) = 33.3
Somewhat 14/30 = .467 .467(100) = 46.7
None 6/30 = .200 .200(100) = 20.0
Sum = 1.00 Sum = 100
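
The two columns of Table 2 follow directly from the frequencies in Table 1. A short sketch of the arithmetic, assuming the frequencies are entered by hand (the dictionary names are illustrative):

```python
# Frequencies taken from Table 1.
frequency = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequency.values())

# Relative frequency = frequency of the category / sum of all frequencies.
relative = {category: f / total for category, f in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {category: r * 100 for category, r in relative.items()}

for category in frequency:
    print(f"{category:<10}{relative[category]:>7.3f}{percentage[category]:>7.1f}")
```

The slide multiplies the rounded relative frequency by 100, while this sketch rounds only when printing; for these data both give the same values to one decimal place.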

16
Thesis table

17
Graphical Presentation
of Data
● Definition
● A graph made of bars whose heights
represent the frequencies of respective
categories is called a bar graph.

18
Bar graph for the frequency
distribution of stress on job

19
Graphical Presentation
of Data cont.
● Definition
● A circle divided into portions that represent
the relative frequencies or percentages of a
population or a sample belonging to
different categories is called a pie chart.

20
Pie chart for the percentage distribution
of stress on jobs

21
ORGANIZING AND GRAPHING
QUANTITATIVE DATA
■ Frequency Distributions
■ Constructing Frequency Distribution
Tables
■ Relative and Percentage Distributions
■ Graphing Grouped Data
■ Histograms
■ Polygons

22
Frequency Distributions cont.
When the number of observations is large,
a grouped frequency distribution is
preferred.
The data are arranged into categories or
classes with their corresponding
frequencies.

23
Frequency Distributions
Daily Earnings of 100 Employees of a Company

Daily Earnings (Peso)    Number of Employees (f)
(variable)               (frequency column)
401 to 600                9
601 to 800               22
801 to 1000              39
1001 to 1200             15
1201 to 1400              9
1401 to 1600              6

The third class is 801 to 1000, and its frequency is 39.
For the sixth class, the lower limit is 1401 and the upper limit is 1600.

24
Frequency Distributions cont.
● Definition
● The class boundary is given by the
midpoint of the upper limit of one class and
the lower limit of the next class.

25
Frequency Distributions cont.
Finding Class Width

Class width = Upper boundary – Lower boundary

26
Frequency Distributions cont.
Calculating Class Midpoint or Mark

Class midpoint (mark) = (Lower limit + Upper limit) / 2

27
Class Boundaries, Class Widths, and Class
Midpoints for Daily Earnings

Class Limits Class Boundaries Class Width Class Midpoint


401 to 600 400.5 to less than 600.5 200 500.5
601 to 800 600.5 to less than 800.5 200 700.5
801 to 1000 800.5 to less than 1000.5 200 900.5
1001 to 1200 1000.5 to less than 1200.5 200 1100.5
1201 to 1400 1200.5 to less than 1400.5 200 1300.5
1401 to 1600 1400.5 to less than 1600.5 200 1500.5
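
The boundaries, widths, and midpoints in this table can be reproduced from the class limits alone. The sketch below assumes the boundary lies halfway between the upper limit of one class and the lower limit of the next, and that the midpoint is the average of the two limits, which agrees with the values tabulated above.

```python
# Class limits for the daily earnings data.
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]

rows = []
for lower, upper in limits:
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary   # upper boundary - lower boundary
    midpoint = (lower + upper) / 2            # average of the class limits
    rows.append((lower_boundary, upper_boundary, width, midpoint))

for (lower, upper), row in zip(limits, rows):
    print(f"{lower} to {upper}: {row[0]} to less than {row[1]}, "
          f"width {row[2]}, midpoint {row[3]}")
```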

28
Constructing Frequency
Distribution Tables
■ Procedure for Constructing Frequency
Distribution
1. Determine the range by taking the
difference of the highest value and lowest
value.
2. Determine the number of class intervals.
There is no definite rule for choosing this
number, as long as the classes convey the
necessary information (neither too many
nor too few).
29
Constructing Frequency
Distribution Tables
■ Procedure for Constructing Frequency
Distribution
2. The ideal number of class intervals is
between 5 and 20 depending on the nature of
data.

30
Constructing Frequency
Distribution Tables

31
Constructing Frequency
Distribution Tables
■ Procedure for Constructing Frequency
Distribution
3. Divide the range by the desired number of
class intervals to get the size (width) of the
class interval. The lower limit of the lowest class
interval is preferably a multiple of the class size.

32
Example
● The table gives the admission test scores of
the students. Construct a frequency
distribution table.

33
Scores of Students in the Admission Test

192 152 142 152 165 175
139 124 164 136 167 146
165 198 162 167 177 152
160 140 200 133 223 155
217 230 205 169 187 165

34
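Solution
The lowest score is 124 and the highest is 230.
Using 5 classes,

Approximate class width = (230 – 124) / 5 = 21.2

Now we round this approximate width to a convenient
number, say 22.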

35
Solution
The lower limit of the first class can be taken as
124 or any number less than 124. Suppose we
take 124 as the lower limit of the first class. Then
our classes will be
124 – 145, 146 – 167, 168 – 189, 190 – 211,
and 212 - 233
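
The steps of the procedure can be sketched as follows for the admission test scores. This is an illustration only: the choice of 5 classes is read off the solution's class list, and the width of 22 and the starting limit of 124 are the values chosen in the solution.

```python
# Admission test scores from the example (30 values).
scores = [192, 152, 142, 152, 165, 175, 139, 124, 164, 136,
          167, 146, 165, 198, 162, 167, 177, 152, 160, 140,
          200, 133, 223, 155, 217, 230, 205, 169, 187, 165]

number_of_classes = 5                    # assumed from the five classes listed
data_range = max(scores) - min(scores)   # step 1: highest minus lowest value
approximate_width = data_range / number_of_classes

width = 22          # approximate width rounded to a convenient number
first_lower = 124   # lower limit of the first class, as in the solution

# Each class starts `width` above the previous one and ends one unit
# before the next class begins.
classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
           for i in range(number_of_classes)]

print(data_range, approximate_width)
print(classes)
```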

36
Frequency Distribution for the Data

Total Test Scores Tally f


124 – 145 |||| | 6
146 – 167 |||| |||| ||| 13
168 – 189 |||| 4
190 – 211 |||| 4
212 - 233 ||| 3
∑f = 30
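
The tally above can be verified by assigning each score to the class whose limits contain it. A minimal sketch, with the scores and class limits typed in from the example:

```python
# Admission test scores from the example (30 values).
scores = [192, 152, 142, 152, 165, 175, 139, 124, 164, 136,
          167, 146, 165, 198, 162, 167, 177, 152, 160, 140,
          200, 133, 223, 155, 217, 230, 205, 169, 187, 165]

classes = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]

# Start every class at zero, then count each score once.
frequency = {limits: 0 for limits in classes}
for score in scores:
    for lower, upper in classes:
        if lower <= score <= upper:
            frequency[(lower, upper)] += 1
            break

for (lower, upper), f in frequency.items():
    print(f"{lower} - {upper}: {f}")
print("Sum of f =", sum(frequency.values()))
```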

37
Relative Frequency and
Percentage Distributions
Relative frequency of a class = f / ∑f
Percentage = (Relative frequency) · 100

38
Example
● Calculate the relative frequencies and
percentages

39
Solution
Relative Frequency and Percentage Distributions

Scores       Class Boundaries            Relative Frequency   Percentage
124 – 145    123.5 to less than 145.5    .200                 20.0
146 – 167    145.5 to less than 167.5    .433                 43.3
168 – 189    167.5 to less than 189.5    .133                 13.3
190 – 211    189.5 to less than 211.5    .133                 13.3
212 – 233    211.5 to less than 233.5    .100                 10.0
                                         Sum = .999           Sum = 99.9%

40
Graphing Grouped Data

● Definition
● A histogram is a graph in which classes are marked
on the horizontal axis and the frequencies, relative
frequencies, or percentages are marked on the
vertical axis. The frequencies, relative frequencies,
or percentages are represented by the heights of
the bars. In a histogram, the bars are drawn
adjacent to each other.
41
Frequency histogram
(Figure: adjacent bars for the classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233. Horizontal axis: Admission test scores. Vertical axis: Frequency.)
42
Relative frequency histogram
(Figure: adjacent bars for the classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233. Horizontal axis: Admission test scores. Vertical axis: Relative Frequency, marked from .10 to .50.)
43
Graphing Grouped Data cont.

● Definition
● A graph formed by joining the midpoints of
the tops of successive bars in a histogram
with straight lines is called a polygon.

44
Frequency polygon
(Figure: polygon drawn over the classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233. Vertical axis: Frequency.)
45
Bar Graph
■ A bar graph illustrates the data using bars
drawn in the xy-plane

46
Example
The administration in a large city wanted to know the
distribution of vehicles owned by households in that city. A
sample of 40 randomly selected households from this city
produced the following data on the number of vehicles
owned:
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Construct a frequency distribution table for these data, and
draw a bar graph.

47
Solution
Frequency Distribution of Vehicles Owned

Vehicles Owned    Number of Households (f)
0                  2
1                 18
2                 11
3                  4
4                  3
5                  2
                  Σf = 40
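
The frequency distribution above, together with a rough text version of the bar graph, can be sketched as follows. The data are the 40 values from the example; the use of `#` characters as bars is purely illustrative and is not the graph shown on the slide.

```python
from collections import Counter

# Number of vehicles owned by the 40 sampled households, row by row.
vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
            1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
            2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
            4, 2, 1, 1, 2, 1, 1, 4, 1, 3]

frequency = Counter(vehicles)

# One bar per value; the bar length equals the frequency.
for value in sorted(frequency):
    print(f"{value} | {'#' * frequency[value]} ({frequency[value]})")
print("Sum of f =", sum(frequency.values()))
```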
48
Bar graph

49
SHAPES OF HISTOGRAMS
1. Symmetric
2. Skewed
3. Uniform or rectangular

50
Symmetric histograms

51
(a) A histogram skewed to the right. (b) A
histogram skewed to the left.

52
A histogram with uniform distribution

53
CUMULATIVE FREQUENCY
DISTRIBUTIONS
● Definition
● A cumulative frequency distribution gives the
total number of values that fall below the
upper boundary of each class.

54
Example
● Using the frequency distribution of
Admission Test Scores, reproduced in the
next slide, prepare a cumulative frequency
distribution.

55
Example
Total Test Scores f
124 – 145 6
146 – 167 13
168 – 189 4
190 – 211 4
212 - 233 3

56
Solution
Cumulative Frequency Distribution of Test Scores

Class Limits Class Boundaries Cumulative Frequency


124 – 145 123.5 to less than 145.5 6
124 – 167 123.5 to less than 167.5 6 + 13 = 19
124 – 189 123.5 to less than 189.5 6 + 13 + 4 = 23
124 – 211 123.5 to less than 211.5 6 + 13 + 4 + 4 = 27
124 – 233 123.5 to less than 233.5 6 + 13 + 4 + 4 + 3 = 30
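
The running sums in the last column are a cumulative total of the class frequencies. A small sketch using the standard library's `itertools.accumulate`, with the frequencies and upper boundaries typed in from the tables above:

```python
from itertools import accumulate

# Class frequencies and upper class boundaries for the test scores.
frequencies = [6, 13, 4, 4, 3]
upper_boundaries = [145.5, 167.5, 189.5, 211.5, 233.5]

# Running total: each entry counts the values below that upper boundary.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"123.5 to less than {boundary}: {total}")
```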

57
CUMULATIVE FREQUENCY
DISTRIBUTIONS cont.
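● Calculating Cumulative Relative Frequency
and Cumulative Percentage

Cumulative relative frequency = (Cumulative frequency of a class) / (Total number of observations)
Cumulative percentage = (Cumulative relative frequency) · 100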

58
Cumulative Relative Frequency and
Cumulative Percentage Distributions

Class Limits    Cumulative Relative Frequency    Cumulative Percentage
124 – 145       6/30 = .200                       20.0
124 – 167       19/30 = .633                      63.3
124 – 189       23/30 = .767                      76.7
124 – 211       27/30 = .900                      90.0
124 – 233       30/30 = 1.00                     100.0
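
The same columns can be computed from the cumulative frequencies. A brief sketch, assuming the cumulative frequencies 6, 19, 23, 27, and 30 are entered by hand and rounding only for display:

```python
# Cumulative frequencies for the test score classes.
cumulative = [6, 19, 23, 27, 30]
total = cumulative[-1]  # the last cumulative frequency is the total count

# Cumulative relative frequency = cumulative frequency / total.
cum_relative = [c / total for c in cumulative]

# Cumulative percentage = cumulative relative frequency * 100.
cum_percentage = [r * 100 for r in cum_relative]

for c, r, p in zip(cumulative, cum_relative, cum_percentage):
    print(f"{c}/{total} = {r:.3f}   {p:.1f}")
```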

59
CUMULATIVE FREQUENCY
DISTRIBUTIONS cont.
● Definition
● An ogive is a curve drawn for the cumulative
frequency distribution by joining with
straight lines the dots marked above the
upper boundaries of classes at heights equal
to the cumulative frequencies of respective
classes.

60
Ogive for the cumulative frequency
distribution
(Figure: horizontal axis: Admission Test Scores, marked at the class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, and 233.5. Vertical axis: Cumulative frequency, marked from 10 to 30.)
61
STEM-AND-LEAF DISPLAYS
● Definition
● In a stem-and-leaf display of quantitative
data, each value is divided into two portions
– a stem and a leaf. The leaves for each
stem are shown separately in a display.

62
Example
● The following are the scores of 30 college
students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98

● Construct a stem-and-leaf display.

63
Solution
● To construct a stem-and-leaf display for
these scores, we split each score into two
parts. The first part contains the first digit,
which is called the stem. The second part
contains the second digit, which is called the
leaf.

64
Solution
● We observe from the data that the stems
for all scores are 5, 6, 7, 8, and 9 because
all the scores lie in the range 50 to 98.

65
Stem-and-leaf display

Stems | Leaves

5 | 2      (leaf for 52)
6 |
7 | 5      (leaf for 75)
8 |
9 |

66
Solution
● After we have listed the stems, we read the
leaves for all scores and record them next
to the corresponding stems on the right
side of the vertical line.

67
Stem-and-leaf display of test
scores

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

68
Ranked stem-and-leaf display of test
scores

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
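
Both displays can be built by splitting each score into its tens digit (stem) and units digit (leaf). The sketch below assumes two-digit scores, as in this example; `divmod(score, 10)` is just one convenient way to do the split.

```python
from collections import defaultdict

# Scores of the 30 students, in the order given in the example.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

# Unranked display: leaves are recorded in the order the scores appear.
display = defaultdict(list)
for score in scores:
    stem, leaf = divmod(score, 10)   # e.g. 75 -> stem 7, leaf 5
    display[stem].append(leaf)

# Ranked display: the leaves of each stem sorted in increasing order.
ranked = {stem: sorted(leaves) for stem, leaves in display.items()}

for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
print()
for stem in sorted(ranked):
    print(stem, "|", " ".join(str(leaf) for leaf in ranked[stem]))
```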

69
