
LESSON 1 (Finals): Data Management

Introduction
Statistics is very important, especially in academic endeavors such as research writing. Data
management is one of the processes involved in arriving at accurate findings and
conclusions.
This unit will broaden your understanding of Mathematics as it relates to managing data. You are
expected to apply methods for organizing and analyzing large amounts of information and carry
out a culminating investigation that integrates statistical concepts and skills.
Moreover, this unit covers important statistical tools in data management: gathering and
organizing data, representing data using graphs and charts, interpreting organized
data, measures of central tendency, measures of dispersion and relative position, the normal
distribution curve, and linear correlation.
Enjoy your learning!

Topic 1: Data Gathering, Organization, Presentation and Interpretation

Learning Objectives

Upon the completion of this topic, you are expected to:


a. summarize and present data using the different methods of data presentation;
b. construct graphs and tables to present given data; and
c. interpret the data presented.

Presentation of Content

I. Data Gathering
Research is only valuable if you can share the data effectively. In this topic, you will learn how
to organize data and construct various charts and graphs to represent them.

What is Data?
Data is a collection of information from facts, statistics, numbers, characteristics, observations,
and measurements that represent an idea. There are two forms of data. 
1. Quantitative data deals with the quantity (for example, the number of whales at Sea
World). 
2. Qualitative data is another form of data that deals with the description of things. It can
be observed but not measured (such as the color of your eyes).

What are the Levels of Measuring Data?


When grouped, data can be formed into a single variable. Variables in quantitative analysis are
usually classified by their level of measurement, as indicated below.
1. Nominal variables are categorical variables and have the lowest level of measurement.
Categorical means that the values are labels rather than numerical quantities. Examples are
civil status, ID number, religion, sex, etc.

When you are asked about your civil status, you will not answer 1, 2, 3, etc. Rather,
your answer would be single, married, widow, or widower. These data (single,
married, widow, widower) are called categorical data.
Likewise, sex is either male or female, not 4 or 5. The category is either female or male.

2. Ordinal variables are categorical variables with order (e.g., level of satisfaction, quality
of life indices).
3. Interval variables are quantitative variables that have no true zero point (e.g., temperature
in degrees Celsius, Intelligence Quotient).
4. Ratio variables have the highest level of measurement and a true zero point
(e.g., weight of a child, number of vaccinations).
 
A. Methods of Gathering Data
There are different methods that you can use to collect data and they are the following:
1. Direct method is data collection through the use of interviews. The enumerator talks to
the subject personally and gets the data through a series of questions asked of the
subject of the interview.
2. Indirect method is data collection through the use of questionnaires. These
questionnaires may be sent by postal or electronic mail.
3. Observation is data collection through the use of our senses. For example, the
MMDA reports every week on the number of accidents happening on EDSA. To do
this, MMDA personnel simply count the number of accidents seen on their CCTV.
4. Experimentation is data collection through experiments, usually done in laboratories and classrooms.
5. Registration is acquiring data from private and government agencies such as from the
National Statistics Office, the Bangko Sentral ng Pilipinas, Department of Finance, etc.

II. Organization of Data


After data has been collected, it can be consolidated and summarized in tables.
When the variable of interest is qualitative, the statistical table is a list of the categories being
considered, along with a measure of how often each value occurred.
The data can be summarized in the following ways:
A. The frequency, or number of measurements, in each category
B. The relative frequency, or proportion, of measurements in each category
C. The percentage of measurements in each category
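
The three summaries above can be sketched in a few lines of Python. This is a minimal illustration only: the list of civil-status responses and the function name `summarize` are made up for the example and do not come from the lesson.

```python
from collections import Counter

# Hypothetical responses to a civil-status question (illustrative data only).
responses = ["single", "married", "single", "widow", "married",
             "single", "single", "married", "widower", "single"]

def summarize(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)          # frequency of each category
    n = len(values)                   # total number of measurements
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}

summary = summarize(responses)
for cat, (f, rel, pct) in summary.items():
    print(f"{cat:<8} {f:>3} {rel:>6.2f} {pct:>6.1f}%")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check that no measurement was dropped.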

III. Presentation of Data


Once the measurements are summarized in a statistical table, you can either use graphs or charts
to display the distribution of the data.

A. Ways of Presenting Data


These are the different ways of presenting data.
1. Textual Form: Data and information are presented in paragraph and narrative form.
2. Tabular Form: Quantitative data are summarized in rows and columns.
3. Graphical Form: Data are presented in charts, graphs, or pictures.

Textual Form
Have you seen data presented in textual form? Below is an example.
The study revealed that Mathematics teachers always used the chalkboard (4.62) and textbooks (4.37);
and they sometimes used geometric figures (3.29), graphs (3.16), graphing board (3.12), pictures
(3.02), flash cards (3.01), and whiteboard (3.00). The respondents seldom used geometry board
(2.19), advance organizers (2.12), and realia (2.12). The overall weighted mean of 2.93 indicates
that the Mathematics teachers sometimes used the given traditional instructional materials in
teaching mathematical concepts.

Tabular Form
We can present data using a frequency distribution table.

Frequency Distribution Table


A frequency distribution is an arrangement of numerical data according to size or magnitude,
with the corresponding frequencies and class marks.
How can we present data using a frequency distribution?

Constructing the Frequency Distribution Table


Refer to the guidelines below in constructing the table.

1. Determine the range of the data (the difference between the highest and lowest values).
2. Divide the range by the number of classes to determine the class interval. To determine
the number of classes, we can use the formula:

$$k = 1 + 3.3 \log n$$

Where:
k = number of classes
n = number of values
log = the common (base-10) logarithm
3. The result is rounded off to the nearest whole number.
4. Start the first class with the lowest observation or a multiple of the class interval. This is
the lower limit of the first class. The highest observation is the upper limit of the last
class.
5. Determine the other lower limits by adding the class interval until we reach the computed
number of classes (k).
6. Write the upper limits by subtracting 1 from the lower limit of the next class.
7. Count the number of values that fall under each class.
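
Steps 1 to 3 can be sketched in Python as follows. The function names `number_of_classes` and `class_interval` are illustrative, and the inputs (n = 50 values with a range of 99) are the figures of the sales example in this topic. Rounding the class interval to a whole number is an assumption that suits whole-number data.

```python
import math

def number_of_classes(n):
    """k = 1 + 3.3 log10(n), rounded to the nearest whole number."""
    return round(1 + 3.3 * math.log10(n))

def class_interval(data_range, k):
    """Range divided by the number of classes, rounded to a whole number."""
    return round(data_range / k)

k = number_of_classes(50)      # 1 + 3.3 * log10(50) is about 6.6
i = class_interval(99, k)      # 99 / 7 is about 14.14
print(k, i)
```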

Example:
Construct a frequency distribution from the sales volume of 50 medical sales representatives.
723 735 720 765 779 788 745 757 819 767
767 755 781 800 812 796 753 728 740 753
770 793 786 775 760 801 793 786 794 781
738 744 757 769 752 735 746 769 777 766
750 771 730 745 783 779 805 788 768 760

Solution:

1. Compute the range (R).

$$R = 819 - 720 = 99$$

2. Find the number of classes (k).

$$k = 1 + 3.3 \log n = 1 + 3.3 \log(50) \approx 6.6, \text{ or } 7$$

3. Compute the class interval (i).

$$i = \frac{99}{7} \approx 14.14, \text{ or } 14$$
 
After computing the required values, we can now construct the frequency distribution table.
Amount of Sales   Boundaries       Number of Sales   Relative Frequency
(Classes)                          (Frequency)       (Percentage)
720 - 733         719.5 - 733.5          4                 8%
734 - 747         733.5 - 747.5          8                16%
748 - 761         747.5 - 761.5          9                18%
762 - 775         761.5 - 775.5         10                20%
776 - 789         775.5 - 789.5         10                20%
790 - 803         789.5 - 803.5          6                12%
804 - 819         803.5 - 819.5          3                 6%

Note: The lower boundary of each class is 0.5 unit below the smallest value of the class, and
the upper boundary is 0.5 unit above the largest value of the class. The data
can be summarized in the table by recording the number (frequency) and the percentage (relative
frequency) of observations in each category or class.
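
The tally in the table can be checked with a short Python sketch. The function name `frequency_table` is illustrative. It assumes whole-number data, starts the first class at the lowest observation, and stretches the last class to the highest observation, as the lesson does.

```python
sales = [
    723, 735, 720, 765, 779, 788, 745, 757, 819, 767,
    767, 755, 781, 800, 812, 796, 753, 728, 740, 753,
    770, 793, 786, 775, 760, 801, 793, 786, 794, 781,
    738, 744, 757, 769, 752, 735, 746, 769, 777, 766,
    750, 771, 730, 745, 783, 779, 805, 788, 768, 760,
]

def frequency_table(data, k, width):
    """Build k classes of the given width, starting at the lowest value.

    The last class is stretched up to the highest value.
    Returns a list of (lower limit, upper limit, frequency, percentage).
    """
    lowest, highest = min(data), max(data)
    rows = []
    for c in range(k):
        lower = lowest + c * width
        upper = highest if c == k - 1 else lower + width - 1
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq, 100 * freq / len(data)))
    return rows

table = frequency_table(sales, k=7, width=14)
for lower, upper, freq, pct in table:
    # Class boundaries are 0.5 below the lower limit and 0.5 above the upper limit.
    print(f"{lower} - {upper}   {lower - 0.5} - {upper + 0.5}   {freq:>2}   {pct:.0f}%")
```

The frequencies should add up to 50, the number of sales representatives.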

Graphical Form
We can present data using charts and graphs. For instance, a pie chart displays how the total
quantity is distributed among the categories, while a bar chart uses the height of each bar to
display the amount in a particular category.

Example:
Four thousand new students were admitted to a university in Metro Manila for the school year
2011-2012. The students were enrolled in the following programs:
Program Number of students
Accounting 320
Actuarial Science 440
Banking and Finance 720
Entrepreneurial Management 1,080
Economics 800
Marketing 400
Tourism 240
Total 4,000

How do we present these data using a pie chart and a bar graph?

Below are the calculations for the construction of the pie chart.
Program                      Frequency   Relative Frequency   Percent   Angle (degrees)
Accounting                       320            .08              8%          28.8
Actuarial Science                440            .11             11%          39.6
Banking and Finance              720            .18             18%          64.8
Entrepreneurial Management     1,080            .27             27%          97.2
Economics                        800            .20             20%          72.0
Marketing                        400            .10             10%          36.0
Tourism                          240            .06              6%          21.6
Total                          4,000           1.00            100%         360.0
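
The pie chart calculations follow one rule: each slice's angle is its relative frequency times 360 degrees. A minimal Python sketch of the same calculation, with the illustrative function name `pie_chart_rows`:

```python
enrollment = {
    "Accounting": 320,
    "Actuarial Science": 440,
    "Banking and Finance": 720,
    "Entrepreneurial Management": 1080,
    "Economics": 800,
    "Marketing": 400,
    "Tourism": 240,
}

def pie_chart_rows(counts):
    """Return {category: (relative frequency, percent, angle in degrees)}."""
    total = sum(counts.values())
    rows = {}
    for name, frequency in counts.items():
        relative = frequency / total
        rows[name] = (relative, 100 * relative, 360 * relative)
    return rows

rows = pie_chart_rows(enrollment)
for name, (relative, percent, angle) in rows.items():
    print(f"{name:<28} {relative:.2f} {percent:>5.0f}% {angle:>6.1f}")
```

The angles should add up to 360 degrees, just as the percentages add up to 100.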

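From the given calculations, this is how to present the data using a pie chart.

[Figure: pie chart titled "Program Preference of the New Students"; legend: Acc, As, Bf, Em, Eco, M, T]

This can also be represented by a solid diagram:

[Figure: solid diagram of the same data]

This is how to present the data using a graph.

[Figure: graph of the same data]
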
Application

1. Below is a summary of the color preferences of randomly selected car buyers in Tuguegarao:

BLACK   RED   BLUE   GRAY   WHITE
320     180   195    155    250

A. Construct a relative frequency (percentage) distribution.
B. Construct a pie chart to describe the data.
C. Construct a bar chart to describe the data.
 
2. Consider the following data:
31 60 75 55 50 40 66 38
39 27 67 61 72 56 59 42
70 24 41 41 25 27 50 48

Present the data by constructing a frequency table.

 
Topic 2: Measures of Central Tendency

Learning Objectives

Upon the completion of this topic, you are expected to:


a. define and differentiate the measures of central tendency: mean, median, and mode;
b. give the advantages of the mean, median, and mode;
c. calculate the mean, median, and mode for grouped and ungrouped data; and
d. identify the most appropriate measure of central tendency for a given distribution.

Presentation of Content

I. Mean
Are you familiar with averages? One of them is the mean.
The mean is the most popular and well-known measure of central tendency. It can be used with
both discrete and continuous data, although its use is most often with continuous data. The mean
is equal to the sum of all the values in the data set divided by the number of values in the data
set.
 
So, if we have n values in a data set and they have values $x_1, x_2, \ldots, x_n$, the sample mean,
usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter Σ,
pronounced "sigma", which means "sum of".

$$\bar{x} = \frac{\sum x}{n}$$

You may have noticed that the above formula refers to the sample mean.
Why have we called it a sample mean?

In statistics, samples and populations have very different meanings and these differences are very
important, even if, in the case of the mean, they are calculated in the same way.
 
To acknowledge that we are calculating the population mean and not the sample mean, we use
the Greek lower case letter "mu", denoted as µ:

$$\mu = \frac{\sum x}{N}$$

Characteristics of the Mean
These are some of the characteristics of the mean.
1. The mean is essentially a model of your data set.
2. It includes every value in your data set as part of the calculation.
3. Mean is the only measure of central tendency where the sum of the deviations of each
value from the mean is always zero.
4. The mean is a reliable or a more stable measurement to use when sample data are being
used to make inferences about populations.
5. The mean is sensitive to, or greatly affected by, extreme values, whether high or low, and
this can make it an inappropriate average to use when such values are present.
6. The mean is the most commonly used, easily understood, easily calculated, and generally
recognized average.
7. It is the best measure to use when the distribution is symmetrical.
8. It is a useful measure for inferential statistics.
9. It is used to obtain an average value of a series of values after each item is weighted. This
is referred to as the weighted mean.
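
Characteristic 3 follows directly from the definition of the mean. A short derivation, using the sample mean $\bar{x} = \frac{\sum x}{n}$, so that $\sum x = n\bar{x}$:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$

For instance, for the values 2, 4, and 9 (chosen here only for illustration), the mean is 5 and the deviations are −3, −1, and 4, which sum to zero.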

Mean Computation for Ungrouped Data


For ungrouped data, the mean is computed by simply adding all the values and dividing the sum
by the total number of items. For the sample mean, the formula is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Where:
$\bar{x}$ = sample mean
x = value of each item
n = number of items in the sample
Σ = the summation of

In simpler form, the formula for the sample mean may be presented as:

$$\bar{x} = \frac{\sum x}{n}$$

And for the population mean, it is:

$$\mu = \frac{\sum x}{N}$$

Where:
µ = arithmetic mean of a population
N = number of items in the population

Example:
Let us consider the scores of Michael in his statistics class. The scores have been arrayed in
descending order.
76 76 62 51 45 27 12 6 2

Solution:
Since, in the case of Michael's scores, Σx = 357 and n = 9, Michael's mean score is

$$\bar{x} = \frac{\sum x}{n} = \frac{357}{9} = 39.67$$

Example:
The grades in Geometry of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, and 76. What is the
average grade of the 10 students?

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{832}{10} = 83.2$$

Hence, the average grade of the 10 students is 83.2.

Example:
The weights of four bags of wheat (in kg) are 103, 105, 102, and 104. Find the mean weight.

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{414}{4} = 103.5 \text{ kg}$$
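
The three worked examples above can be reproduced with a small Python sketch. The function name `mean` and the variable names are illustrative. The computation is simply the sum of the values divided by their count.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

michael_scores = [76, 76, 62, 51, 45, 27, 12, 6, 2]
geometry_grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
wheat_bags_kg = [103, 105, 102, 104]

print(round(mean(michael_scores), 2))  # 39.67
print(mean(geometry_grades))           # 83.2
print(mean(wheat_bags_kg))             # 103.5
```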

II. Median
The median ($\tilde{x}$) of a set of data is a measure of central tendency that occupies the middle
position in an array of values.
It is the number that divides the bottom 50% of the data from the top 50%; that is, half the data
items fall below the median and half fall above it. If the number of items n is odd, the median is
simply the middle value. If n is even, the median is the average of the two middle data values in
the ordered list.
The middle value or term in a set of data arranged according to size/ magnitude (either increasing
or decreasing) is called the median.
Uses of Median
The median is used whenever an average of position is desired. It is used when open-ended
intervals are involved. Since the median divides a distribution in half, it is also frequently used as
an average in testing general abilities, such as in intelligence tests.

Characteristics of Median
The median is another widely used average that is easy to understand and easy to compute. It cannot
be found unless the items are arranged in ascending or descending order. It is the point that
divides the frequency distribution into two halves. The median is not affected by extremely
high or low values, so it is a better choice when a distribution is badly skewed. It may also be
determined in an open-ended distribution.

Median Computation for Ungrouped Data


The median is computed as follows:
1. Arrange the items in an array.
2. Identify the middle value.

Example 1:
The library logbook shows that 58, 60, 54, 35, and 97 books, respectively, were borrowed from
Monday to Friday last week. Find the median.

Solution:
Arrange the data (58, 60, 54, 35, and 97) in increasing order.
35, 54, 58, 60, 97
We can see from the arranged numbers that the middle value is 58. Thus, the median is 58.

Example 2:
The amounts of money a balut vendor earned on five randomly selected days are:
₱ 86, ₱ 109, ₱ 141, ₱ 74, ₱ 123
Find the median.

Solution:
Making an array, we have:
₱ 74, ₱ 86, ₱ 109, ₱ 123, ₱ 141
Since there are 5 (odd) items, the median is the middle (third) value:
$\tilde{x}$ = ₱ 109

Example 3:
Andrea’s scores in 10 quizzes during the first quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7. Find the
median.

Solution:
Arrange the scores in increasing order: 5, 6, 6, 7, 7, 8, 9, 9, 10, 10
Since the number of measures is even, the median is the average of the two middle scores.

$$\tilde{x} = \frac{7 + 8}{2} = 7.5$$

Hence, the median of the set of scores is 7.5.
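
The two-step procedure (arrange, then pick the middle) can be sketched in Python. The function name `median` is illustrative. It handles both the odd and the even case described above and is checked against the three examples.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)          # step 1: arrange the items in an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

print(median([58, 60, 54, 35, 97]))               # Example 1
print(median([86, 109, 141, 74, 123]))            # Example 2
print(median([8, 7, 6, 10, 9, 5, 9, 6, 10, 7]))   # Example 3
```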

III. Mode
The mode ($\hat{x}$), by definition, is the most commonly occurring value in a series. A series may
have more than one mode or none at all.
For grouped data, the class with the greatest frequency is called the modal class.
A distribution with only one mode is said to be unimodal. When two values (or classes)
share the highest frequency, the distribution is referred to as bimodal. Further, the
distribution is multimodal when there are three or more modes.

Uses of Mode
It is used when a quick estimate of the average is needed. It helps us spot a trend. Because the
mode is the most frequently occurring value, a shoe producer or clothing manufacturer who wants
to know the size that will fit the greatest number of people would seek the modal size.
The producer or manufacturer will then make more shoes or dresses in the
most commonly purchased size than in other sizes.
The mode therefore provides information that helps businesspeople and producers in
business decision making.
The mode is the measure or value which occurs most frequently in a set of data.
It is the value with the greatest frequency.
 
Characteristics of Mode
It is the simplest measure of central tendency. It is not affected by extreme values in a
distribution, but it is an unreliable measure of central tendency.
It is not necessary to arrange the items before the mode is known.
The mode may not exist in some sets of data, or there may be more than one mode in other data sets.
 
Mode Computation for Ungrouped Data
For ungrouped data, the most frequently occurring score is the mode.
To find the mode for a set of data:
1. Select the measure that appears most often in the set;
2. If two or more measures appear the same number of times, then each of these values is a
mode; and
3. If every measure appears the same number of times, then the set of data has no mode.

Example 1
Find the mode of the following values.
3, 4, 7, 7, 7, 8, 11, 11, 14, 18, 19

Answer
$\hat{x}$ = 7

Example 2
Determine the mode of the following set of data.
6, 6, 6, 9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 15, 15, 15, 15, 15, 21, 21, 35, 35

Answer
$\hat{x}$ = 12 (the value 12 appears six times, while 15 appears only five times)
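
The three rules for finding the mode can be sketched in Python. The function name `modes` is illustrative, and the second and third data sets are made up to show the bimodal and no-mode cases. The sketch assumes a non-empty data set.

```python
from collections import Counter

def modes(values):
    """Return all modes in ascending order, or [] if every value is equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # Rule 3: if every measure appears the same number of times, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Rules 1 and 2: every value that reaches the highest frequency is a mode.
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 4, 7, 7, 7, 8, 11, 11, 14, 18, 19]))  # Example 1 data
print(modes([1, 1, 2, 2, 3]))   # hypothetical bimodal set
print(modes([1, 2, 3]))         # hypothetical set with no mode
```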
 
Topic 3: Measures of Dispersion

Learning Objectives

Upon the completion of this topic, you are expected to:


a. define range, standard deviation, and variance;
b. calculate range, standard deviation, and variance for ungrouped data; and
c. describe the given set of data using the computed measures of dispersion.

Presentation of Content

I. Range
The range is the simplest measure of variability. It is the difference between the largest value and
the smallest value. The formula for the range is:
$$R = H - L$$

Where:
R = Range
H = Highest value
L = Lowest value

For example, test scores of 10, 8, 9, 7, 5, and 3 give a range of 10 − 3 = 7.


 
Characteristics of Range
It is easy to compute and understand. It emphasizes the extreme values. However, it is the most
unstable measure because its value easily changes or fluctuates with a change in the extreme
values.

Uses of Range
The range is used to report the movement of stock prices over a period of time, and weather
reports typically state the high and low temperature readings for a 24-hour period.
 
Example 1:
The following are the daily wages, in pesos (Php), of 8 workers in each of two garment factories,
Factory A and Factory B. Find the range of wages in each factory.
Factory A: 400, 450, 520, 380, 482, 495, 575, 450
Factory B: 450, 400, 450, 480, 450, 450, 400, 672

Solution:
Finding the range of wages: Range = Highest wage − Lowest wage
Range A = 575 − 380 = 195
Range B = 672 − 400 = 272

Comparing the two ranges, you will note that the wages of workers of Factory B have a higher range
than the wages of workers of Factory A. These ranges tell us that the wages of workers of Factory B
are more scattered than the wages of workers of Factory A.
Note, however, that the range is not a stable measure of variability because its value can fluctuate
greatly even with a change in just a single value, either the highest or the lowest.

Example 2:
Find the range in the sets A, B, and C.
Set A : 81, 83, 87, 90, 94
Set B : 84, 86, 87, 88, 90
Set C : 85, 86, 87, 88, 89

Solution:
Set A: Range = H − L = 94 − 81 = 13

Set B: Range = H − L = 90 − 84 = 6

Set C: Range = H − L = 89 − 85 = 4

Based on the computed ranges for sets A, B, and C, it can be concluded that set A has greater
variability than sets B and C.
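
The comparison of the three sets can be sketched in Python. The function name `value_range` is illustrative. It applies R = H − L to each set.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

sets = {
    "A": [81, 83, 87, 90, 94],
    "B": [84, 86, 87, 88, 90],
    "C": [85, 86, 87, 88, 89],
}

ranges = {name: value_range(values) for name, values in sets.items()}
print(ranges)

# The set with the largest range is the most spread out by this measure.
print(max(ranges, key=ranges.get))
```

All three sets have the same middle value, 87, yet different ranges, which is why a measure of dispersion is reported alongside a measure of central tendency.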
 
II. Variance
The variance of a set of data is denoted by the symbol $s^2$. It measures how spread out the data
are. To find the variance ($s^2$), we use the formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Where:
n = the total number of data values
x = the raw score
$\bar{x}$ = the mean of the data

Variance Computation for Ungrouped Data


To calculate the variance, follow these steps:
1. Work out the mean (the simple average of the numbers).
2. For each number, subtract the mean and square the result (the squared difference).
3. Work out the average of those squared differences. For a sample, divide their sum by n − 1.

Example:
You and your friends have just measured the heights of your dogs (in millimeters). The heights
(at the shoulders) are: 600mm, 470mm, 170mm, 430mm, and 300mm. Find out the value of the
variance.
 

Solution:
Step 1. Work out the mean (the simple average of the numbers).

$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1{,}970}{5} = 394$$

Step 2. For each number, subtract the mean and square the result (the squared difference),
and work out the average of those squared differences.

$$s^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{4}$$

$$s^2 = \frac{108{,}520}{4}$$

$$s^2 = 27{,}130$$

So the value of the variance is 27,130.

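
The variance steps can be sketched in Python using the dog heights from the example. The function name `sample_variance` is illustrative. The result is cross-checked against `statistics.variance` from the Python standard library, which also divides by n − 1.

```python
import statistics

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                              # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]   # step 2
    return sum(squared_diffs) / (n - 1)                 # step 3 (sample: n - 1)

heights_mm = [600, 470, 170, 430, 300]
print(sample_variance(heights_mm))       # 27130.0
print(statistics.variance(heights_mm))   # same value from the standard library
```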
III. Standard Deviation
While the range is about how much your data covers, standard deviation has to do more with
how much difference there is between the scores. It is defined as a number representing how far
from the mean each score is.
Simply, the standard deviation is the square root of the variance.

Characteristics of Standard Deviation


Standard deviation is a number used to tell how measurements for a group are spread out from
the mean or expected value.
A low standard deviation means that most of the numbers are very close to the average. A
high standard deviation means that the numbers are spread out.

Standard Deviation Computation for Ungrouped Data


To find the standard deviation, follow the steps below.
1. Calculate the mean.
2. Calculate the deviations, which are the scores minus the average.
3. Square the deviations.
4. Sum up the squared deviations.
5. Divide the sum of the squared deviations by the number of scores in your data set minus 1.
6. Take the square root of the result.

Formula:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Where:
s = the standard deviation
x = the individual score
$\bar{x}$ = the mean
n = the number of scores

Example 1
Sam has 20 rose bushes, but only counted the flowers on 6 of them! The "population" is all 20
rose bushes and the "sample" is the 6 bushes whose flowers Sam counted. Suppose Sam's
flower counts are 9, 2, 5, 4, 12, and 7. Find the value of the standard deviation.
 
Solution
Step 1. Work out the mean.
Using the sampled values 9, 2, 5, 4, 12, 7:

$$\bar{x} = \frac{9 + 2 + 5 + 4 + 12 + 7}{6} = \frac{39}{6} = 6.5$$

Step 2. Then for each number, subtract the mean and square the result.

$$(9 - 6.5)^2 = (2.5)^2 = 6.25$$
$$(2 - 6.5)^2 = (-4.5)^2 = 20.25$$
$$(5 - 6.5)^2 = (-1.5)^2 = 2.25$$
$$(4 - 6.5)^2 = (-2.5)^2 = 6.25$$
$$(12 - 6.5)^2 = (5.5)^2 = 30.25$$
$$(7 - 6.5)^2 = (0.5)^2 = 0.25$$

Step 3. Then work out the mean of those squared differences, dividing by n − 1 because the data are a sample.

$$\sum (x - \bar{x})^2 = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5$$

$$s^2 = \frac{65.5}{6 - 1} = \frac{65.5}{5} = 13.1$$

This value is called the sample variance.

Step 4. Take the square root of that.

$$s = \sqrt{13.1} \approx 3.62$$
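
The six steps for the standard deviation can be sketched in Python with Sam's flower counts. The function name `sample_std_dev` is illustrative. It divides by n − 1 because the six bushes are a sample of the 20.

```python
import math

def sample_std_dev(values):
    """Square root of the sample variance (sum of squared deviations over n - 1)."""
    n = len(values)
    mean = sum(values) / n                            # step 1: mean
    sum_sq = sum((x - mean) ** 2 for x in values)     # steps 2-4: deviations, squares, sum
    variance = sum_sq / (n - 1)                       # step 5: divide by n - 1
    return math.sqrt(variance)                        # step 6: square root

flower_counts = [9, 2, 5, 4, 12, 7]
print(round(sample_std_dev(flower_counts), 2))  # 3.62
```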

Application

Directions: Find the range, variance, and standard deviation of the following quantitative
frequency distributions.

The following data represent the difference in scores between the winning and losing teams in a
sample of 15 college football bowl games from 2018-2019.

12 15 16 11 13
24 15 15 12 15
12 9 12 9 16
25 8 11 8 17
12 10 10 9 18
