
SLIDESMANIA.COM

HEY!
ARE YOU READY FOR
THE NEXT CHAPTER?

CHAPTER 4 :
DATA MANAGEMENT

UNIT 4.1 : Introduction to Data Management


UNIT 4.2 : Measures of Central Tendency
UNIT 4.3 : Measures of Dispersion
UNIT 4.4 : Measures of Relative Position
UNIT 4.5 : Probabilities and Normal Distributions
UNIT 4.6 : Linear Regression and Correlation

WAZZ UP!
I’M JOMEL, THE FIRST REPORTER.
I’M GOING TO INTRODUCE YOU TO
DATA MANAGEMENT

INTRODUCTION
TO
DATA MANAGEMENT
UNIT 4.1 INTRODUCTION TO DATA MANAGEMENT

Data management is the process by which information is acquired and
processed to ensure the accessibility and reliability of the data for its
users. One of the most important tools for processing and managing such
information is statistics.

Statistics is used in most areas of human endeavor: education, research,
business, agriculture, and other fields, and even in everyday life.

A. ORGANIZATION OF DATA

1. Frequency distribution - an overview of all the distinct values of a
variable and the number of times each occurs. That is, a frequency
distribution tells how frequencies are distributed over values.

2. Graphs and charts - the most useful tools for presenting data.

DEFINITION OF TERMS
RAW DATA – data collected in original form.
RANGE – highest value minus lowest value.
CLASS LIMITS – the smallest and largest observations in each class (the
apparent class limits).
CLASS BOUNDARIES – individual values chosen to separate classes (often the
midpoints between the upper class limit of one class and the lower class
limit of the next).
INTERVAL (width) – distance between the lower class boundary and the upper
class boundary (denoted by i).
FREQUENCY – number of values in a specific class of a frequency
distribution (denoted by f).
PERCENTAGE – the relative frequency (class frequency divided by the total
number of observations) multiplied by 100%.
CUMULATIVE FREQUENCY – sum of the frequencies accumulated up to the upper
boundary of a class in a frequency distribution.
MIDPOINT – point halfway between the class limits of each class; it
represents the data within the class.

CATEGORICAL FREQUENCY DISTRIBUTION

Used to organize nominal-level or ordinal-level data.
Some examples: gender, business type, political affiliation, etc.

Example:
High     High     High     Low      Average
Average  Low      Average  Average  Average
Low      Average  Average  High     High
Low      Low      Average  High     High

STEP 1: Construct a table
STEP 2: Tally the raw data
STEP 3: Convert to numerical frequencies
STEP 4: Get the percentage

CLASS    TALLY        FREQUENCY  PERCENTAGE
High     IIIII – II   7          35
Average  IIIII – III  8          40
Low      IIIII        5          25
TOTAL                 20         100
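The four steps above can be sketched in a few lines of Python. This is only an illustration using the 20 ratings from the example; the variable names (`ratings`, `table`) are assumptions made here and are not part of the slides.

```python
from collections import Counter

# Raw data copied from the example (20 observations).
ratings = """
High High High Low Average
Average Low Average Average Average
Low Average Average High High
Low Low Average High High
""".split()

# Steps 2 and 3: tally the raw data and convert to numerical frequencies.
counts = Counter(ratings)
total = sum(counts.values())

# Step 4: percentage = frequency / total * 100.
table = {label: (counts[label], 100 * counts[label] / total)
         for label in ("High", "Average", "Low")}

for label, (freq, pct) in table.items():
    print(f"{label:<8} {freq:>3} {pct:>6.1f}")
print(f"{'TOTAL':<8} {total:>3} {100.0:>6.1f}")
```

The computed frequencies and percentages should agree with the table: 7, 8, and 5 observations, or 35, 40, and 25 percent.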

DID YOU KNOW ???


There are four types of frequency distributions in statistics:

• Ungrouped frequency distribution
• Grouped frequency distribution
• Relative frequency distribution
• Cumulative frequency distribution

DETERMINING THE CLASS INTERVAL

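- Generally, the number of classes for a frequency distribution table
varies from 5 to 20.
- It is preferred to have more classes as the size of the data set
increases.

$$\text{CLASS INTERVAL} = \frac{\text{RANGE}}{\text{NUMBER OF CLASSES}} = \frac{HV - LV}{k}$$

where HV is the highest value, LV is the lowest value, and k is the number
of classes.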

GROUPED FREQUENCY DISTRIBUTION

- Used when the range of the data set is large.
- The data must be grouped into classes, whether it is categorical or
interval data.
Example: These data represent the record high temperatures in °F for each of the
50 states. Construct a grouped frequency distribution for the data using 7
classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Solution:
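Step 1: Determine the class interval or width.

• Find the highest value and lowest value: H = 134 and L = 100.
• Find the range: R = highest value - lowest value, so R = 134 - 100 = 34.
• Select the number of classes desired (usually between 5 and 20). In this
case, 7 is arbitrarily chosen. Find the class width by dividing the range by
the number of classes and rounding up:

$$\text{Width} = \frac{R}{\text{number of classes}} = \frac{34}{7} \approx 4.9, \text{ rounded up to } 5$$

• Starting from the lowest value, 100, add the width repeatedly to get the
lower class limits 100, 105, ..., 130. Each upper class limit is one less
than the next lower class limit, which gives 100–104, 105–109, ..., 130–134.
• Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit: 99.5–104.5, 104.5–109.5, etc.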
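As a rough sketch of Step 1, the width, class limits, and class boundaries can be computed as follows. Rounding up with `math.ceil` and the 0.5 offset are assumptions that fit this example, where the data are whole numbers; the names used are illustrative only.

```python
import math

high, low, k = 134, 100, 7          # highest value, lowest value, number of classes
data_range = high - low             # R = 34
width = math.ceil(data_range / k)   # 34 / 7 is about 4.86, rounded up to 5

# Lower limits step by the width; each upper limit is one less than the next lower limit.
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]

# Boundaries sit half a unit below and above the limits (whole-number data assumed).
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

for (lo, hi), (b_lo, b_hi) in zip(limits, boundaries):
    print(f"{lo}-{hi}   {b_lo}-{b_hi}")
```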
Step 2: Tally the data.

Step 3: Find the numerical frequencies from the tallies.

Step 4: Find the relative frequency, percentage, midpoint, and cumulative frequencies.

Class    Class                                        Relative               Cumulative
Limits   Boundaries   Tally                  Freq.    Frequency  Pct.  Mid.  Frequency
100–104  99.5–104.5   II                     2        0.04       4     102   2
105–109  104.5–109.5  IIIII – III            8        0.16       16    107   10
110–114  109.5–114.5  IIIII – IIIII –        18       0.36       36    112   28
                      IIIII – III
115–119  114.5–119.5  IIIII – IIIII – III    13       0.26       26    117   41
120–124  119.5–124.5  IIIII – II             7        0.14       14    122   48
125–129  124.5–129.5  I                      1        0.02       2     127   49
130–134  129.5–134.5  I                      1        0.02       2     132   50

∑f = 50
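Steps 2 to 4 can be checked with a short script. This is a minimal sketch that assumes the class limits from Step 1 (starting at 100, width 5, 7 classes); it counts the observations in each class instead of tallying by hand.

```python
from itertools import accumulate

# Record high temperatures (°F) for the 50 states, from the example.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

low, width, k = 100, 5, 7
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]

# Frequency: number of observations inside each pair of class limits.
freqs = [sum(lo <= t <= hi for t in temps) for lo, hi in limits]
n = sum(freqs)

rel_freqs = [f / n for f in freqs]                 # relative frequency
midpoints = [(lo + hi) / 2 for lo, hi in limits]   # halfway between the limits
cum_freqs = list(accumulate(freqs))                # running total of frequencies

for (lo, hi), f, rf, m, cf in zip(limits, freqs, rel_freqs, midpoints, cum_freqs):
    print(f"{lo}-{hi}  f={f:<3} rel={rf:.2f}  pct={100 * rf:.0f}  mid={m:.0f}  cum={cf}")
```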

B. GRAPHICAL STATISTICAL DATA

1. Histogram
- is a graphical representation of a grouped frequency distribution with
continuous classes. It is an area diagram and can be defined as a set of
rectangles with bases along the intervals between class boundaries and
with areas proportional to the frequencies in the corresponding classes.
Source: https://ezspss.com/frequency-distribution-in-spss/

DID YOU KNOW ???

The term ‘histogram’ was first introduced by Karl Pearson.

Histogram (Example)

Uncle Bruno owns a garden with 30 black cherry trees. Each tree is of a
different height. The heights of the trees (in feet):
61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5, 76, 76.2,
76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87.

Height Range (ft)  Number of Trees (Frequency)
60 - 65            3
66 - 70            3
71 - 75            8
76 - 80            10
81 - 85            5
86 - 90            1
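The frequencies in the table can be reproduced from the raw heights, and a rough text histogram printed, with the sketch below. The sideways layout with one `#` per tree is only an assumption for illustration; a real histogram would draw vertical rectangles.

```python
heights = [61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5,
           76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80,
           81, 82, 83, 84, 85, 87]

# Height ranges from the table, as (lower, upper) class limits.
classes = [(60, 65), (66, 70), (71, 75), (76, 80), (81, 85), (86, 90)]

# Count the trees whose height falls within each range.
freqs = [sum(lo <= h <= hi for h in heights) for lo, hi in classes]

# One '#' per tree gives a sideways histogram.
for (lo, hi), f in zip(classes, freqs):
    print(f"{lo}-{hi} | {'#' * f} ({f})")
```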

2. Frequency polygon
- is a line graph of class frequency plotted against class midpoint. It
can be obtained by joining the midpoints of the tops of the rectangles in
the histogram.
Source: https://www.sciencedirect.com/topics/mathematics/frequency-polygon#:~:text=A%20frequency%20polygon%20is%20a,3.3.).


EXAMPLE

A frequency polygon for 642 psychology test scores shown in Figure 1 was
constructed from the frequency table shown in Table 1.

Lower Limit  Upper Limit  Count  Cumulative Count
29.5         39.5         0      0
39.5         49.5         3      3
49.5         59.5         10     13
59.5         69.5         53     66
69.5         79.5         107    173
79.5         89.5         147    320
89.5         99.5         130    450
99.5         109.5        78     528
109.5        119.5        59     587
119.5        129.5        36     623
129.5        139.5        11     634
139.5        149.5        6      640
149.5        159.5        1      641
159.5        169.5        1      642
169.5        179.5        0      642
Table 1. Frequency Distribution of Psychology Test Scores.
Figure 1. Frequency polygon for the psychology test scores.
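The points that a frequency polygon joins can be derived directly from Table 1. The following sketch computes each class midpoint and pairs it with the class count; drawing the line through these points is left to whatever plotting tool is at hand, and the names used are illustrative.

```python
# (lower limit, upper limit, count) rows from Table 1.
rows = [
    (29.5, 39.5, 0), (39.5, 49.5, 3), (49.5, 59.5, 10), (59.5, 69.5, 53),
    (69.5, 79.5, 107), (79.5, 89.5, 147), (89.5, 99.5, 130),
    (99.5, 109.5, 78), (109.5, 119.5, 59), (119.5, 129.5, 36),
    (129.5, 139.5, 11), (139.5, 149.5, 6), (149.5, 159.5, 1),
    (159.5, 169.5, 1), (169.5, 179.5, 0),
]

# Each vertex of the polygon is (class midpoint, class count).
points = [((lo + hi) / 2, count) for lo, hi, count in rows]

for x, y in points:
    print(f"midpoint {x:5.1f} -> count {y}")
```

The empty classes at both ends (count 0) are what bring the polygon back down to the x-axis.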

3. Cumulative frequency polygon
- is a type of frequency polygon that shows cumulative frequencies. In
other words, the frequencies (or percents) are accumulated on the graph
from left to right. This graph, also called an ogive, plots cumulative
frequency on the y-axis and class boundaries along the x-axis.
Source: https://www.researchgate.net/figure/14-Weight-Cumulative-Frequency-Polygon-OGIVE_fig4_259990962


Cumulative frequency polygon (Example)

A cumulative frequency polygon for the same 642 psychology test scores,
shown in Figure 2, was constructed from the cumulative counts in Table 1.

Figure 2. Cumulative frequency polygon for the psychology test scores.
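The cumulative counts behind Figure 2 are running totals of the class counts. A minimal sketch, assuming the counts and upper class boundaries from Table 1, pairs each upper boundary with its cumulative count to get the ogive points:

```python
from itertools import accumulate

# Upper class boundaries and counts from Table 1.
upper = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5,
         119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
counts = [0, 3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1, 0]

# Running total of the counts, accumulated from left to right.
cumulative = list(accumulate(counts))

# Each ogive point is (upper class boundary, cumulative count).
ogive_points = list(zip(upper, cumulative))

for x, y in ogive_points:
    print(f"{x:6.1f}  {y}")
```

The last cumulative count equals the total number of scores, 642.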
Allotted Time : 20 mins

ACTIVITY 1
Instructions: Do the following in your notebook or on bond paper. Submit in PDF or JPG form. The filename must
be Surname_Group No._Activity1_
SJS Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The
owner wants additional information on the ages of the people taking travel tours. Construct a
histogram, frequency polygon, and cumulative frequency polygon.

Class Limits  Class Boundaries  Midpoints  Frequency  Cumulative Frequency
18 – 26       17.5 – 26.5       22         3          3
27 – 35       26.5 – 35.5       31         5          8
36 – 44       35.5 – 44.5       40         9          17
45 – 53       44.5 – 53.5       49         14         31
54 – 62       53.5 – 62.5       58         11         42
63 – 71       62.5 – 71.5       67         6          48
72 – 80       71.5 – 80.5       76         2          50

Count the frequency of each value of the variable, then graphically
display the counts with one of the following:

• HISTOGRAM
• FREQUENCY POLYGON
• CUMULATIVE FREQUENCY POLYGON
• PARETO CHART
• BAR CHART OR BAR GRAPH
• PIE CHART OR CIRCLE GRAPH
• TIME SERIES GRAPH
• PICTOGRAPH
• SCATTER PLOT

4. Pareto Chart
- is a graph used to represent a frequency distribution for categorical
(or nominal-level) data. Frequencies are displayed by the heights of
vertical bars, which are arranged in order from highest to lowest.
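The defining step of a Pareto chart is the ordering of the bars. The sketch below illustrates it with the Favorite Type of Movie counts that appear in this unit's bar chart example; those counts are borrowed here only for illustration, and the text bars stand in for the vertical bars of a real chart.

```python
# Category counts borrowed from the Favorite Type of Movie table in this unit.
counts = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

# Pareto order: categories sorted by frequency, highest first.
# sorted() is stable, so tied categories keep their original relative order.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, freq in pareto:
    print(f"{category:<8} {'#' * freq} ({freq})")
```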
Example: Pareto chart (figure not reproduced), ∑f = 149

5. Bar chart or Bar graph
- is similar to a histogram. The bases of the rectangles are arbitrary
intervals whose centers are the category codes. The height of each
rectangle represents the frequency of the category. It is also applicable
to categorical (or nominal-level) data.

Example
Table: Favorite Type of Movie

Comedy Action Romance Drama SciFi

4 5 6 1 4

6. Pie chart or Circle graph
- is a circle divided into portions that represent the relative
frequencies (or percentages) of the data belonging to different
categories. The data should be categorical or nominal-level.

Example
Table: Favorite Type of Movie

            Comedy  Action  Romance  Drama  SciFi  TOTAL
Count       4       5       6        1      4      20
Fraction    4/20    5/20    6/20     1/20   4/20
Percentage  20%     25%     30%      5%     20%    100%
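The percentages in this example come from dividing each count by the total. The sketch below repeats that arithmetic and also converts each share to a central angle by multiplying the fraction by 360 degrees; the angles are not given on the slide and are added here only as an illustration of how the slices would be drawn.

```python
counts = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
total = sum(counts.values())

# Percentage of the whole for each category.
percentages = {name: 100 * c / total for name, c in counts.items()}

# Central angle of each slice: the same fraction of a full 360-degree circle.
angles = {name: 360 * c / total for name, c in counts.items()}

for name in counts:
    print(f"{name:<8} {counts[name]}/{total} = {percentages[name]:.0f}%  ({angles[name]:.0f} degrees)")
```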

7. Time series graph
- represents data that occur over a specific period of time under
observation. In addition, it shows a trend or pattern of increase or
decrease over that period.

Example
The table shows the sales of a company in millions of dollars.

Year  Sales ($m)
1986  12
1987  3
1988  9
1989  24
1990  33
1991  48
1992  27
1993  15
1994  36
1995  57
1996  51
1997  24
1998  45
1999  63
2000  57

8. Pictogram (Pictograph)
- combines the attention-getting quality of pictures with the accuracy of
the bar chart. Appropriate pictures arranged in a row (sometimes in a
column) present the quantities for comparison.

Example
Information about 300 children of a school who come to school by different modes of transportation.

→ 1 face represents 10 children

Information gathered from the pictograph:

(i) Number of students going to school by different modes of transportation:
Auto-rickshaw = 6 × 10 = 60, Car = 4 × 10 = 40, Bicycle = 7 × 10 = 70,
Bus = 10 × 10 = 100, On foot = 3 × 10 = 30
(ii) Total number = 60 + 40 + 70 + 100 + 30 = 300
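Reading a pictograph is a multiplication by the key. A small sketch of the computation in (i) and (ii), with the face counts taken from the example and a plain `o` standing in for each face symbol (an assumption, since the picture itself is not reproduced here):

```python
CHILDREN_PER_FACE = 10  # key of the pictograph: 1 face represents 10 children

# Number of faces shown for each mode of transportation.
faces = {"Auto-rickshaw": 6, "Car": 4, "Bicycle": 7, "Bus": 10, "On foot": 3}

# (i) Children per mode = number of faces x key.
children = {mode: n * CHILDREN_PER_FACE for mode, n in faces.items()}

# (ii) Total number of children.
total = sum(children.values())

for mode, n in faces.items():
    print(f"{mode:<14} {'o' * n:<10} = {children[mode]}")
print(f"Total = {total}")
```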

9. Scatter Plot
- is used to examine possible relationships between two numerical
variables. One variable is plotted on the x-axis and the other on the
y-axis.

Example
The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature
on that day. Here are their figures for the last 12 days:
Ice Cream Sales vs Temperature

Temperature °C Ice Cream Sales


14.2° $215
16.4° $325
11.9° $185
15.2° $332
18.5° $406
22.1° $522
19.4° $412
25.1° $614
23.4° $544
18.1° $421
22.6° $445
17.2° $408
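A scatter plot of these 12 pairs would show sales rising with temperature. As an optional numeric check that is not part of the slide, the sketch below lists the points in plotting order and computes the Pearson correlation coefficient by hand; a value close to +1 suggests a strong positive linear relationship. Correlation itself is the subject of Unit 4.6, so this is only a preview under that assumption.

```python
import math

temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

# Points as (x, y) = (temperature, sales), sorted by temperature for plotting.
points = sorted(zip(temps, sales))

n = len(temps)
mean_t = sum(temps) / n
mean_s = sum(sales) / n

# Pearson r = sum of cross-deviations / sqrt(product of summed squared deviations).
cross = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps, sales))
ss_t = sum((t - mean_t) ** 2 for t in temps)
ss_s = sum((s - mean_s) ** 2 for s in sales)
r = cross / math.sqrt(ss_t * ss_s)

for t, s in points:
    print(f"{t:5.1f} C  ${s}")
print(f"r = {r:.3f}")
```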

SOME GUIDELINES IN CREATING A GRAPH:

1. The graph/chart should include a title.
2. The scales for all axes should be included.
3. The scale on the y-axis should start at zero.
4. The graph/chart should not distort the data.
5. The x-axis and y-axis should be properly labeled.
6. The graph/chart should not contain unnecessary decorations.
7. The simplest possible graph/chart should be used for any data set.

YEYY !!!
WE’RE DONE WITH UNIT 4.1
PREPARE YOURSELF AS WE DISCUSS MORE
ABOUT DATA MANAGEMENT.
SEE YOU IN UNIT 4.2 !!!

“Facts are
stubborn things,
but statistics
are pliable.”
-Mark Twain
