Data Management: Midterm

Introduction to Data Management
• Raw data
• Range
• Frequency distribution
• Class limits (Apparent limits)
• Class boundaries (Real limits)
• Interval (width)
• Frequency (f)
• Percentage
• Cumulative frequency
• Midpoint (x)
Frequency Distribution
is a grouping of data into categories showing the number of observations in each of the non-overlapping classes. Equivalently, it is the organization of data in tabular form, using mutually exclusive classes and showing the number of observations in each.
Raw data
is the data collected in its original form.
Range
is the difference between the highest value and the lowest value in a distribution.
Class limits
are the highest and lowest values describing a class.
Class boundaries
are the upper and lower values of a class in a grouped frequency distribution. They have one more decimal place than the class limits and end with the digit 5.
Interval
is the distance between the class lower boundary and the class upper boundary. It is denoted by the symbol i.
Frequency
is the number of values in a specific class of a frequency distribution.
Percentage
is obtained by multiplying the relative frequency by 100%.
Cumulative frequency
is the sum of the frequencies accumulated up to the upper boundary of a class in a frequency distribution.
Midpoint
is the point halfway between the limits of each class and is representative of the data within that class.
Twenty applicants were given a performance evaluation appraisal.
High     High     High     Low      Average
Average  Low      Average  Average  Average
Low      Average  Average  High     High
Low      Low      Average  High     High
Construct a frequency distribution for the data.
Class     Tally        f     Percentage   Found by
High      IIIII II     7     35           (7/20) x 100
Average   IIIII III    8     40           (8/20) x 100
Low       IIIII        5     25           (5/20) x 100
Total                  20    100
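The tally above can be reproduced with a short script. This is a minimal sketch, assuming the twenty ratings are typed in as a list; the function name `frequency_table` is made up for illustration.

```python
from collections import Counter

# The twenty ratings from the example, in the order given.
ratings = (
    "High High High Low Average "
    "Average Low Average Average Average "
    "Low Average Average High High "
    "Low Low Average High High"
).split()

def frequency_table(values):
    """Return (class, frequency, percentage) rows, most frequent class first."""
    counts = Counter(values)
    n = len(values)
    return [(label, f, 100 * f / n) for label, f in counts.most_common()]

for label, f, pct in frequency_table(ratings):
    print(f"{label:<8} {f:>3} {pct:>6.1f}")
```

Each percentage is the frequency divided by the total number of observations, times 100, as in the "Found by" column.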
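Determining the Class Interval
"2 to the k rule": choose the smallest number of classes k such that
\[ 2^k \ge n \]
\[ \text{Suggested class interval} = \frac{\text{Range}}{\text{Number of classes}} = \frac{HV - LV}{k} \]
Where: HV = highest value in the data set
LV = lowest value in the data set
k = number of classes
n = number of observations
i = suggested class interval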
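The "2 to the k rule" can be sketched in a few lines. This is an illustrative helper, not part of the original slides; the function names are assumptions.

```python
def classes_by_2k_rule(n):
    """Smallest number of classes k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def suggested_interval(values, k):
    """Range (highest value minus lowest value) divided by the number of classes."""
    return (max(values) - min(values)) / k

k = classes_by_2k_rule(60)   # 2**5 = 32 < 60, 2**6 = 64 >= 60, so k = 6
print(k, suggested_interval([10, 29], k))
```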
Determining the Class Interval
\[ \text{Suggested class interval} = \frac{\text{Range}}{1 + 3 \log n} \]
Construct a frequency distribution for the following data.
11 19 11 15 16 10
16 16 15 17 10 27
21 11 13 21 10 16
11 19 24 12 22 13
19 13 18 20 21 11
19 15 11 25 29 23
16 23 10 17 11 27
16 24 12 21 13 12
26 15 11 14 10 12
11 15 18 12 20 13
Determine the following:
a. Range
b. Interval
c. Class limits
d. Relative frequencies
e. Percentage
f. Cumulative frequency
g. Midpoints
Determine the value of k = 1 + 3 log n, where n = 60.
log 60 = 1.7781512
k = 1 + 3(1.7781512)
k = 1 + 5.3344536
k = 6.3344536, or about 6. Therefore, 6 is the estimated number of classes.
Range = 29 - 10 = 19
Class size = 19/6 = 3.17, or 3.
The interval is 3.
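The same arithmetic can be checked in code. This is a small sketch that assumes a base-10 logarithm and rounding to the nearest whole number, as in the worked steps above.

```python
import math

n = 60
k_exact = 1 + 3 * math.log10(n)   # about 6.33
k = round(k_exact)                # estimated number of classes

value_range = 29 - 10             # highest value minus lowest value
width = round(value_range / k)    # 19 / 6 is about 3.17, rounded to 3

print(k, width)
```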
Class boundaries   Class limits   f     x     fx     Cf    Relative f   Percentage
27.5 – 30.5        28-30          1     29    29     60    0.0167       1.67
24.5 – 27.5        25-27          4     26    104    59    0.0667       6.67
21.5 – 24.5        22-24          5     23    115    55    0.0833       8.33
18.5 – 21.5        19-21          10    20    200    50    0.1667       16.67
15.5 – 18.5        16-18          10    17    170    40    0.1667       16.67
12.5 – 15.5        13-15          11    14    154    30    0.1833       18.33
9.5 – 12.5         10-12          19    11    209    19    0.3167       31.67
Total, n                          60          981
Relative f is found by dividing f by n (for example, 1/60 = 0.0167), and the percentage by multiplying relative f by 100 (0.0167 x 100 = 1.67).
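A grouped table like this one can be built directly from the raw data. The sketch below assumes whole-number data, a lowest class limit of 10, and a class width of 3; `grouped_frequency` is a hypothetical helper name.

```python
scores = [
    11, 19, 11, 15, 16, 10, 16, 16, 15, 17, 10, 27,
    21, 11, 13, 21, 10, 16, 11, 19, 24, 12, 22, 13,
    19, 13, 18, 20, 21, 11, 19, 15, 11, 25, 29, 23,
    16, 23, 10, 17, 11, 27, 16, 24, 12, 21, 13, 12,
    26, 15, 11, 14, 10, 12, 11, 15, 18, 12, 20, 13,
]

def grouped_frequency(values, lowest, width):
    """Rows of (lower limit, upper limit, f, midpoint, cumulative f), lowest class first.

    Assumes whole-number data, so each class covers `width` consecutive integers.
    """
    rows, cumulative = [], 0
    lower = lowest
    while lower <= max(values):
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        cumulative += f
        rows.append((lower, upper, f, (lower + upper) / 2, cumulative))
        lower += width
    return rows

for lower, upper, f, midpoint, cf in grouped_frequency(scores, 10, 3):
    print(f"{lower}-{upper}  f={f:<3} x={midpoint:<5} fx={f * midpoint:<6} cf={cf}")
```

The rows come out lowest class first, which is the reverse of the order used in the table.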
Construct a frequency distribution for the following data: the scores of the students in a Geometry test.
55 63 44 37 50 57 44 57 42 46
58 40 54 65 39 27 28 56 38 45
30 35 56 78 55 27 50 28 44 28
39 37 65 43 33 70 60 61 60 44
Class boundaries   Class limits   f (frequency)   x (midpoint)   fx      Cf (cumulative frequency)   Relative frequency   Percentage
71.5 – 80.5        72 – 80        1               76             76      40                          0.025                2.5
62.5 – 71.5        63 – 71        4               67             268     39                          0.1                  10
53.5 – 62.5        54 – 62        11              58             638     35                          0.275                27.5
44.5 – 53.5        45 – 53        4               49             196     24                          0.1                  10
35.5 – 44.5        36 – 44        12              40             480     20                          0.3                  30
26.5 – 35.5        27 – 35        8               31             248     8                           0.2                  20
Total, n                          40                             1,906                                                    100

Median (grouped data)
In computing the median of grouped data, determine the median class, which contains the (N/2)th score under Cf of the cumulative frequency distribution. To solve for the median, we use the formula:
\[ Md = X_{LB} + \left( \frac{\frac{N}{2} - cf_b}{f_m} \right) i \]
Where: Md = median
X_LB = the lower boundary or true lower limit of the median class
N = total frequency
cf_b = cumulative frequency before the median class
f_m = frequency of the median class
i = size of the class interval
Compute the median given the following data:
Scores in Statistics f
75 – 79 6
70 – 74 7
65 – 69 2
60 – 64 8
55 – 59 12
50 – 54 7
45 – 49 10
40 – 44 8
N 60
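The grouped-median formula can be applied to this table as follows. This is a sketch under two assumptions: the classes are listed from lowest to highest, and the class limits are whole numbers, so each lower boundary is the lower limit minus 0.5. The function name is illustrative.

```python
def grouped_median(classes, width):
    """classes: list of (lower_limit, f), lowest class first."""
    n = sum(f for _, f in classes)
    cum_before = 0                      # cf_b: cumulative frequency before the class
    for lower_limit, f in classes:
        if cum_before + f >= n / 2:     # this class contains the (N/2)th score
            lower_boundary = lower_limit - 0.5
            return lower_boundary + (n / 2 - cum_before) / f * width
        cum_before += f

statistics_scores = [(40, 8), (45, 10), (50, 7), (55, 12),
                     (60, 8), (65, 2), (70, 7), (75, 6)]
print(round(grouped_median(statistics_scores, 5), 2))
```

Here N/2 = 30 falls in the class 55 – 59, so X_LB = 54.5, cf_b = 25, f_m = 12, and i = 5.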
Measures of Central Tendency
Central tendency determines a numerical value in the central region of a distribution. It refers to the center of the distribution of observations. There are three measures of central tendency:
- Mean
- Median
- Mode
Any data set can be characterized by measuring central tendency. A measure of central tendency, commonly referred to as an average, is a single value that represents a data set. Its purpose is to locate the center of a data set.
Mean
• The mean is also called the arithmetic mean or average. It can be affected by extreme scores. It is stable and varies less from sample to sample. It is used when the most reliable measure is desired and when there are few very high values and few very low values. The mean is the balance point of a score distribution.
Properties of Mean
1. A set of data has only one mean.
2. The mean can be applied to interval or ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. The mean is affected by extremely small or large values in a data set.
6. The mean is most appropriate for symmetrical data.
Mean
\[ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
\[ \text{Sample mean: } \bar{x} = \frac{\sum x}{n} \qquad \text{Population mean: } \mu = \frac{\sum x}{N} \]
where: x̄ = sample mean (read as “x bar”).
μ = population mean (read as “mu”).
x = the value of any particular observation or measurement.
Σx = sum of all x’s.
n = total number of values in the sample.
N = total number of values in the population.
Ex.1: The daily salaries of a sample of eight employees at GMS, Inc. are Php550, Php420, Php560, Php500, Php700, Php670, Php860, Php480. Find the mean daily rate of the employees.
\[ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{n} \]
\[ \bar{x} = \frac{550 + 420 + 560 + 500 + 700 + 670 + 860 + 480}{8} = \frac{4{,}740}{8} = 592.50 \]
The sample mean daily salary of the employees is Php592.50.

Find the population mean of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
\[ \mu = \frac{\sum x}{N} = \frac{53 + 45 + 59 + 48 + 54 + 46 + 51 + 58 + 55}{9} = \frac{469}{9} = 52.11 \]
The population mean age of the middle-management employees is 52.11.
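Both worked examples can be verified with Python's standard `statistics` module. This is a quick check rather than a new method; the variable names are chosen for illustration.

```python
from statistics import mean

salaries = [550, 420, 560, 500, 700, 670, 860, 480]   # sample of 8 daily salaries
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]           # population of 9 ages

sample_mean = mean(salaries)       # sum of values divided by n
population_mean = mean(ages)       # sum of values divided by N

print(sample_mean, round(population_mean, 2))
```

The arithmetic is the same for a sample and a population; only the symbols (x̄ and n versus μ and N) differ.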
MEDIAN
The median is the midpoint of the data array. When the data set is ordered, whether ascending or descending, it is called a data array. The median is an appropriate measure of central tendency for data that are ordinal or above, but it is most valuable for ordinal data.
Properties of Median
1. The median is unique; there is only one median for a data set.
2. The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the middle observation.
3. The median is not affected by extremely small or large values.
4. The median can be applied to ordinal, interval, and ratio data.
5. The median is most appropriate for skewed data.
To determine the value of the median for ungrouped data, we need to consider two rules.
• If n is odd, the median is the middle-ranked value.
• If n is even, the median is the average of the two middle-ranked values.
\[ \text{Middle rank} = \frac{n + 1}{2} \]
Note that n is the population or sample size.
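The two rules can be written out as a small function. This is a sketch; `median_by_rank` is an illustrative name, and the data sets are the ages and salaries used earlier in these notes.

```python
def median_by_rank(values):
    """Median of ungrouped data using the middle rank (n + 1) / 2."""
    data = sorted(values)
    n = len(data)
    rank = (n + 1) / 2                 # 1-based position in the ordered data
    if n % 2 == 1:
        return data[int(rank) - 1]     # odd n: the middle-ranked value
    lower = data[n // 2 - 1]           # even n: the rank falls between two values
    upper = data[n // 2]
    return (lower + upper) / 2

print(median_by_rank([53, 45, 59, 48, 54, 46, 51, 58, 55]))       # n = 9, rank 5
print(median_by_rank([550, 420, 560, 500, 700, 670, 860, 480]))   # n = 8, rank 4.5
```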

MODE
The mode is the value in a data set that appears most frequently. Like the median and unlike the mean, the mode is not affected by extreme values in a data set. A data set may not contain any mode if none of the values is “most typical”. A data set that has only one value occurring with the greatest frequency is said to be unimodal. If the data set has two values with the same greatest frequency, both values are considered modes and the data set is bimodal. If the data set has more than two modes, it is said to be multimodal. In some cases all the values in a data set have the same frequency. When this occurs, the data set is said to have no mode.
Properties of MODE
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode or even no mode in any given data set.
4. The mode is not affected by extremely small or large values.
5. The mode can be applied to nominal, ordinal, interval, and ratio data.
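The unimodal, bimodal, and no-mode cases can be told apart by counting frequencies. The sketch below assumes a non-empty data set and follows the convention above that a data set whose values all share the same frequency has no mode; the function name `modes` is an assumption.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # More than one distinct value, all equally frequent: no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([32, 38, 34, 42, 36, 34, 40, 44, 32, 34]))   # unimodal
print(modes([1, 1, 2, 2, 3]))                            # bimodal
print(modes([1, 2, 3]))                                  # no mode
```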
Weighted Mean
The weighted mean is particularly useful when various classes or groups contribute differently to the total. The weighted mean is found by multiplying each value by its corresponding weight, adding the products, and dividing by the sum of the weights.
\[ \bar{x}_w = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + w_3 + \dots + w_n} \]
Where: x̄_w = weighted mean.
w_i = corresponding weight.
x_i = the value of any particular observation or measurement.
Example 1:
At the Mathematics department of San Sebastian College there are 18 instructors, 12 assistant professors, 7 associate professors, and 3 professors. Their monthly salaries are Php30,500.00, Php33,700.00, Php38,600.00, and Php45,000.00, respectively. What is the weighted mean salary?
Solution:
Let w1 = 18, w2 = 12, w3 = 7, w4 = 3
x1 = 30,500, x2 = 33,700, x3 = 38,600, x4 = 45,000
\[ \bar{x}_w = \frac{30{,}500(18) + 33{,}700(12) + 38{,}600(7) + 45{,}000(3)}{18 + 12 + 7 + 3} = \frac{1{,}358{,}600}{40} = 33{,}965 \]
The weighted mean salary is Php33,965.00.
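The weighted mean calculation can be sketched as a short function. The function and variable names are illustrative; the figures are the salaries and head counts from the example.

```python
def weighted_mean(values, weights):
    """Sum of value-times-weight products divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

monthly_salaries = [30500, 33700, 38600, 45000]   # x1 .. x4
head_counts = [18, 12, 7, 3]                      # w1 .. w4

print(weighted_mean(monthly_salaries, head_counts))
```

The simple average of the four salary levels would ignore that most of the staff are instructors; the weights correct for that.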

Measures of Dispersion
Another important characteristic of a data set is how it is distributed, or how far each element is from some measure of central tendency (average). There are several ways to measure the variability of the data. The most common and most important is the standard deviation, which provides an average distance of each element from the mean.
Standard Deviation
Standard deviation is a statistical term that provides a good indication of volatility. It measures how widely values are dispersed from the average. Dispersion is the distance between the actual value and the average value.
Range
Probably the simplest and easiest measure of dispersion to determine is the range. The range is the difference between the highest value and the lowest value in the data set.
Advantages:
- it is easy to compute
- it is easy to understand
Disadvantages:
- it can be distorted by a single extreme value (outlier)
- only two values are used in the calculation
Assignment:
(ungrouped data)
The daily rates of a sample of eight employees at GSM Inc. are Php550.00, Php420.00, Php560.00, Php500.00, Php700.00, Php670.00, Php860.00, Php480.00.
Find the variance and standard deviation.

How do we compute the mean?
(Ungrouped data)
The mean is the balance point of a distribution.
\[ \text{Mean} = \frac{\text{sum of values}}{\text{the number of values}} \]
Example 1:
Jeffrey has been working on programming and uploading a Web site for his company for the past 24 months. The following numbers represent the number of hours Jeffrey has worked on this Web site for each of the past 7 months: 24, 25, 33, 50, 53, 66, 78. What is the mean (average) number of hours that Jeffrey worked on this Web site each month?
Solution:
Step 1. Add the numbers to determine the total number of hours he worked.
24 + 25 + 33 + 50 + 53 + 66 + 78 = 329
Step 2. Divide the total by the number of months.
\[ \text{Mean} = \frac{\text{sum of values}}{\text{the number of values}} = \frac{329}{7} = 47 \]
Jeffrey worked an average of 47 hours on this Web site each month.
Example 2:
The following are Marivic’s scores in the statistics quizzes: 70, 72, 77, 78, 84, and 79.
A. Compute the mean of the scores.
B. Show that the sum of the differences of the scores from the mean is 0.
C. Show that the mean is greatly affected by extreme values.
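The three parts can be explored numerically. This is a sketch only: the extra score of 10 used for part C is a made-up extreme value, not part of the original data.

```python
scores = [70, 72, 77, 78, 84, 79]

# A. Mean of the scores.
mean_score = sum(scores) / len(scores)

# B. The deviations from the mean add up to zero (up to floating-point rounding).
deviations = [x - mean_score for x in scores]

# C. Adding one hypothetical extreme score pulls the mean down noticeably.
with_outlier = scores + [10]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(round(mean_score, 2), round(sum(deviations), 10), round(mean_with_outlier, 2))
```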
Weighted Mean
\[ WM = \frac{\sum fX}{N} \]
Where: WM = weighted mean
f = frequency
X = score
ΣfX = sum of the products of frequency and score
N = total frequency
Example 3:
There are 1,000 notebooks sold at Php10.00 each, 500 notebooks at Php20.00 each, 500 notebooks at Php25.00 each, and 100 notebooks at Php30.00 each. Compute the weighted mean.
Solution:
Prepare the frequency distribution.
Notebook’s Price (X)   f            fX
Php10                  1,000        Php10,000.00
20                     500          10,000
25                     500          12,500
30                     100          3,000
                       N = 2,100    ΣfX = Php35,500.00
\[ WM = \frac{\sum fX}{N} = \frac{35{,}500}{2{,}100} = 16.90 \]
The weighted mean price is Php16.90.
Grouped Data: MEAN
\[ M = \frac{\sum fX}{N} \]
where M = mean
f = frequency
X = class mark
ΣfX = sum of the products of the frequencies and class marks
N = total frequency
Example 4:
The table below summarizes the weights of the cubs. Find the average weight of the cubs.
Weight of the Cubs   f
201 – 210            3
191 – 200            8
181 – 190            12
171 – 180            11
161 – 170            9
151 – 160            2
                     N = 45
Reminder:
• The class mark is the average of the lower class limit and the upper class limit of each class in the given frequency distribution.
Solution:
• In solving for the mean of grouped data (a frequency distribution), we add two columns: the class mark (midpoint) X and the product fX.
Weight of the Cubs   f        X       fX
201 – 210            3        205.5   616.5
191 – 200            8        195.5   1564.0
181 – 190            12       185.5   2226.0
171 – 180            11       175.5   1930.5
161 – 170            9        165.5   1489.5
151 – 160            2        155.5   311.0
                     N = 45           ΣfX = 8137.5
\[ M = \frac{\sum fX}{N} = \frac{8137.5}{45} = 180.83 \]
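The two added columns can be computed in one pass. This is a minimal sketch, assuming each class is given as (lower limit, upper limit, frequency); `grouped_mean` is an illustrative name.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, f)."""
    total_f = sum(f for _, _, f in classes)
    # Class mark X is the average of the two class limits; fX is f times X.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

cubs = [(201, 210, 3), (191, 200, 8), (181, 190, 12),
        (171, 180, 11), (161, 170, 9), (151, 160, 2)]

print(round(grouped_mean(cubs), 2))
```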
Exercises:
1. The sizes of pants sold during one business day in a department store are 32, 38, 34, 42, 36, 34, 40, 44, 32, 34. Find the average size of the pants.
2. Given the frequency distribution of the weights of 50 pieces of luggage, compute the mean.
Weight (kilograms)   Number of Pieces, f
7 – 9                2
10 – 12              8
13 – 15              14
16 – 18              19
19 – 21              7
N                    50
Assignment:
Advance reading on measures of dispersion:
- variance
- standard deviation
The sizes of pants sold during one business day in a department store are 32, 38, 34, 42, 36, 34, 40, 44, 32, 34. Find the average size of the pants.
\[ \bar{x} = \frac{\sum x}{n} = \frac{32 + 32 + 34 + 34 + 34 + 36 + 38 + 40 + 42 + 44}{10} = \frac{366}{10} = 36.6 \]
Class boundaries   Class limits   f (frequency)   x (midpoint)   fx    Cf (cumulative frequency)   Relative frequency   Percentage
6.5 – 9.5          7 – 9          2               8              16    2                           0.04                 4
9.5 – 12.5         10 – 12        8               11             88    10                          0.16                 16
12.5 – 15.5        13 – 15        14              14             196   24                          0.28                 28
15.5 – 18.5        16 – 18        19              17             323   43                          0.38                 38
18.5 – 21.5        19 – 21        7               20             140   50                          0.14                 14
Total, n                          50                             763                                                    100

Mean,
\[ \bar{x} = \frac{\sum fX}{N} = \frac{763}{50} = 15.26 \]
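Median,
\[ Md = X_{LB} + \left( \frac{\frac{N}{2} - cf_b}{f_m} \right) i \]
\[ = 15.5 + \left( \frac{\frac{50}{2} - 24}{19} \right) \times 3 \]
\[ = 15.5 + \left( \frac{25 - 24}{19} \right) \times 3 \]
\[ = 15.5 + \left( \frac{1}{19} \right) \times 3 \]
\[ = 15.5 + (0.052631578) \times 3 \]
\[ = 15.5 + 0.157894736 \]
\[ Md = 15.65789474 \]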
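Mode,
\[ Mo = X_{LB} + \left( \frac{df_1}{df_1 + df_2} \right) i \]
where df_1 is the difference between the frequency of the modal class and the frequency of the class below it, and df_2 is the difference between the frequency of the modal class and the frequency of the class above it.
\[ = 15.5 + \left( \frac{19 - 14}{(19 - 14) + (19 - 7)} \right) \times 3 \]
\[ = 15.5 + \left( \frac{5}{5 + 12} \right) \times 3 \]
\[ = 15.5 + \left( \frac{5}{17} \right) \times 3 \]
\[ = 15.5 + (0.294117647) \times 3 \]
\[ = 15.5 + 0.88235294 \]
\[ Mo = 16.38235294 \]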
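The grouped-mode calculation can be sketched as follows. The assumptions are that classes are listed from lowest to highest, that limits are whole numbers (so the lower boundary is the lower limit minus 0.5), and that a missing neighbouring class counts as frequency 0. The function name is illustrative.

```python
def grouped_mode(classes, width):
    """classes: list of (lower_limit, f), lowest class first."""
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))                            # modal class: highest frequency
    before = freqs[m - 1] if m > 0 else 0                  # class just below the modal class
    after = freqs[m + 1] if m + 1 < len(freqs) else 0      # class just above it
    df1 = freqs[m] - before
    df2 = freqs[m] - after
    lower_boundary = classes[m][0] - 0.5
    return lower_boundary + df1 / (df1 + df2) * width

luggage = [(7, 2), (10, 8), (13, 14), (16, 19), (19, 7)]
print(round(grouped_mode(luggage, 3), 4))
```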
Ascending order of Classes
Class limits   f     x     fx    Cf    x̄       x - x̄    (x - x̄)^2   f(x - x̄)^2
19 – 21        7     20    140   50    15.26   4.74     22.4676     157.2732
16 – 18        19    17    323   43    15.26   1.74     3.0276      57.5244
13 – 15        14    14    196   24    15.26   -1.26    1.5876      22.2264
10 – 12        8     11    88    10    15.26   -4.26    18.1476     145.1808
7 – 9          2     8     16    2     15.26   -7.26    52.7076     105.4152
Total, n       50          763
Σ(x - x̄) = -6.3
Σ(x - x̄)^2 = 97.938
Σf(x - x̄)^2 = 487.62
VARIANCE & STANDARD DEVIATION (sample)
VARIANCE (Ungrouped: Population)
VARIANCE (Ungrouped: Sample)
VARIANCE (Grouped: Population)
VARIANCE (Grouped: Sample)
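For reference, the four cases named in these headings are usually written as shown below. This is a sketch using the symbols already defined in these notes (μ and N for a population, x̄ and n for a sample, f for the class frequency, and x for the class midpoint in grouped data). The two population forms are the standard textbook definitions and are an assumption here, since only the sample forms are worked out in the examples that follow. In every case the standard deviation is the square root of the variance.

```latex
\begin{aligned}
\text{Ungrouped, population:}\quad & \sigma^2 = \frac{\sum (x - \mu)^2}{N} \\
\text{Ungrouped, sample:}\quad     & s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \\
\text{Grouped, population:}\quad   & \sigma^2 = \frac{\sum f\,(x - \mu)^2}{N} \\
\text{Grouped, sample:}\quad       & s^2 = \frac{\sum f\,(x - \bar{x})^2}{n - 1}
\end{aligned}
```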
Start here!
Ex.1: The daily salaries of a sample of eight employees at GMS, Inc. are Php550, Php420, Php560, Php500, Php700, Php670, Php860, Php480. Find the mean daily rate of the employees.
\[ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{n} \]
\[ \bar{x} = \frac{550 + 420 + 560 + 500 + 700 + 670 + 860 + 480}{8} = \frac{4{,}740}{8} = 592.50 \]
The sample mean daily salary of the employees is Php592.50.

Subtract the mean from each value in the data set, then square each difference.
x       x̄        x - x̄     (x - x̄)^2
550     592.50   -42.5     1,806.25
420     592.50   -172.5    29,756.25
560     592.50   -32.5     1,056.25
500     592.50   -92.5     8,556.25
700     592.50   107.5     11,556.25
670     592.50   77.5      6,006.25
860     592.50   267.5     71,556.25
480     592.50   -112.5    12,656.25
Σx = 4,740       Σ(x - x̄) = 0      Σ(x - x̄)^2 = 142,950
Variance and standard deviation
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{142{,}950}{8 - 1} = 20{,}421.43 \]
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{142{,}950}{8 - 1}} = \sqrt{20{,}421.43} = 142.90 \]
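The sample variance and standard deviation above can be reproduced step by step and cross-checked against Python's `statistics` module, whose `variance` and `stdev` functions also divide by n - 1. The variable names are illustrative.

```python
from statistics import mean, stdev, variance

salaries = [550, 420, 560, 500, 700, 670, 860, 480]

x_bar = mean(salaries)
sum_sq = sum((x - x_bar) ** 2 for x in salaries)   # sum of squared deviations
s2 = sum_sq / (len(salaries) - 1)                  # sample variance
s = s2 ** 0.5                                      # sample standard deviation

print(sum_sq, round(s2, 2), round(s, 2))
# Library cross-check using the same n - 1 divisor.
print(round(variance(salaries), 2), round(stdev(salaries), 2))
```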
Class limits   f     x     fx    Cf    x̄       x - x̄    (x - x̄)^2   f(x - x̄)^2
19 – 21        7     20    140   50    15.26   4.74     22.4676     157.2732
16 – 18        19    17    323   43    15.26   1.74     3.0276      57.5244
13 – 15        14    14    196   24    15.26   -1.26    1.5876      22.2264
10 – 12        8     11    88    10    15.26   -4.26    18.1476     145.1808
7 – 9          2     8     16    2     15.26   -7.26    52.7076     105.4152
Total, n       50          763
Σ(x - x̄) = -6.3
Σ(x - x̄)^2 = 97.938
Σf(x - x̄)^2 = 487.62
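The last column of this table feeds the grouped sample variance, s^2 = Σf(x - x̄)^2 / (n - 1). The sketch below carries the calculation through for the luggage data; the function name and the (midpoint, frequency) input layout are assumptions made for illustration.

```python
def grouped_sample_variance(classes):
    """classes: list of (midpoint, f). Uses the n - 1 divisor."""
    n = sum(f for _, f in classes)
    x_bar = sum(f * x for x, f in classes) / n
    sum_f_sq = sum(f * (x - x_bar) ** 2 for x, f in classes)
    return sum_f_sq / (n - 1)

luggage = [(8, 2), (11, 8), (14, 14), (17, 19), (20, 7)]

s2 = grouped_sample_variance(luggage)   # 487.62 / 49
s = s2 ** 0.5

print(round(s2, 4), round(s, 2))
```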
MEASURES OF RELATIVE POSITION: QUARTILE
Ungrouped data:
\[ Q_k = \frac{k(N + 1)}{4} \]
Where:
Q_k = location (rank) of the kth quartile in the ordered data
N = number of values in the data set
k = quartile number
Grouped data:
\[ Q_n = X_{LB} + \left( \frac{\frac{nN}{4} - cf_b}{f_m} \right) i \]
Where:
Q_n = the score corresponding to the nth quartile rank.
X_LB = lower boundary of the quartile class.
f_m = frequency of the quartile class.
cf_b = cumulative frequency of the interval before the quartile class.
i = class size.
4 = the number of equal parts into which the distribution is divided.
N = total frequency
Find the 1st, 2nd, and 3rd quartiles of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
1. Arrange the data in order.
45, 46, 48, 51, 53, 54, 55, 58, 59
2. Locate the 1st, 2nd, and 3rd quartiles using the formula Q_k = k(N + 1)/4.
Location of Q_1 = 1(9 + 1)/4 = 10/4 = 2.5
Location of Q_2 = 2(9 + 1)/4 = 20/4 = 5
Location of Q_3 = 3(9 + 1)/4 = 30/4 = 7.5
3. Identify the 1st, 2nd, and 3rd quartile values in the data set.
45, 46, 48, 51, 53, 54, 55, 58, 59
Q_1 = 47 (location 2.5, halfway between 46 and 48)
Q_2 = 53 (location 5)
Q_3 = 56.5 (location 7.5, halfway between 55 and 58)
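The locate-then-read-off procedure can be sketched as a function. It assumes the k(N + 1)/4 location rule used above, linear interpolation between neighbouring values when the location is fractional, and a data set large enough that every location falls inside the ordered list. The function name is illustrative.

```python
def quartile(values, k):
    """kth quartile of ungrouped data using the location k(N + 1)/4."""
    data = sorted(values)
    location = k * (len(data) + 1) / 4    # 1-based position in the ordered data
    whole = int(location)
    fraction = location - whole
    if fraction == 0:
        return data[whole - 1]
    # Fractional location: move that fraction of the way to the next value.
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
print(quartile(ages, 1), quartile(ages, 2), quartile(ages, 3))
```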
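MEASURES OF RELATIVE POSITION: DECILE
Ungrouped data:
\[ D_k = \frac{k(N + 1)}{10} \]
Where:
D_k = location (rank) of the kth decile in the ordered data
k = decile number
N = number of values in the data set
Grouped data:
\[ D_n = X_{LB} + \left( \frac{\frac{nN}{10} - cf_b}{f_m} \right) i \]
Where:
D_n = the score corresponding to the nth decile rank.
f_m = frequency of the decile class.
N = total frequency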
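MEASURES OF RELATIVE POSITION: PERCENTILE
Ungrouped data:
\[ P_k = \frac{k(N + 1)}{100} \]
Where:
P_k = location (rank) of the kth percentile in the ordered data
k = percentile number
N = number of values in the data set
Grouped data:
\[ P_n = X_{LB} + \left( \frac{\frac{nN}{100} - cf_b}{f_m} \right) i \]
Where:
P_n = the score corresponding to the nth percentile rank.
f_m = frequency of the percentile class.
N = total frequency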
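The grouped quartile, decile, and percentile formulas differ only in the divisor (4, 10, or 100), so one function can cover all three. This is a sketch under the same assumptions as before: classes listed from lowest to highest, whole-number class limits, and a lower boundary equal to the lower limit minus 0.5. The name `grouped_quantile` is hypothetical, and the data are the luggage weights from the earlier exercise.

```python
def grouped_quantile(classes, width, k, parts):
    """classes: list of (lower_limit, f), lowest first; parts is 4, 10, or 100."""
    n = sum(f for _, f in classes)
    target = k * n / parts                  # kN/4, kN/10, or kN/100
    cum_before = 0                          # cf_b
    for lower_limit, f in classes:
        if f > 0 and cum_before + f >= target:
            lower_boundary = lower_limit - 0.5
            return lower_boundary + (target - cum_before) / f * width
        cum_before += f

luggage = [(7, 2), (10, 8), (13, 14), (16, 19), (19, 7)]

q1 = grouped_quantile(luggage, 3, 1, 4)      # first quartile
q2 = grouped_quantile(luggage, 3, 2, 4)      # second quartile
d5 = grouped_quantile(luggage, 3, 5, 10)     # fifth decile
p50 = grouped_quantile(luggage, 3, 50, 100)  # fiftieth percentile
print(round(q1, 4), round(q2, 4), round(d5, 4), round(p50, 4))
```

Q_2, D_5, and P_50 all land on the same point, which is also the grouped median computed earlier for this table.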
Given the frequency distribution below, find Q_1, Q_3, P_10, P_50, D_2, and D_5.
(Tip: Complete the table with the required information for Q, D, P.)
Statistics Test Results
Class Limits   f     cf
60 – 62        2     40
57 – 59        2     38
54 – 56        4     36
51 – 53        5     32
48 – 50        11    27
45 – 47        8     16
42 – 44        4     8
39 – 41        2     4
36 – 38        1     2
33 – 35        1     1
N = 40
Take Home Exercises
Solve for 𝐷3 , 𝐷7 , and 𝐷9
Score in Algebra f
75 – 79 6
70 – 74 7
65 – 69 2
60 – 64 8
55 – 59 12
50 – 54 7
45 – 49 10
40 – 44 8
N 60
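SUMMARY OF FORMULAS: UNGROUPED AND GROUPED DATA
MEAN
Ungrouped: \[ \bar{x} = \frac{\sum x}{n} \]
Grouped: \[ \bar{x} = \frac{\sum fx}{n} \]
MEDIAN
Ungrouped: middle-most value
Grouped: \[ \tilde{x} = X_{LB} + \left( \frac{\frac{N}{2} - cf_b}{f_m} \right) i \]
MODE
Ungrouped: most frequent value(s)
Grouped: \[ \hat{x} = X_{LB} + \left( \frac{df_1}{df_1 + df_2} \right) i \]
VARIANCE
Ungrouped: \[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Grouped: \[ s^2 = \frac{\sum f (x - \bar{x})^2}{n - 1} \]
STANDARD DEVIATION
Ungrouped: \[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Grouped: \[ s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} \]
QUARTILE (1 – 4)
Ungrouped: \[ Q_k = \frac{k(N + 1)}{4} \]
Grouped: \[ Q_n = X_{LB} + \left( \frac{\frac{nN}{4} - cf_b}{f_m} \right) i, \quad \text{for example } Q_2 = X_{LB} + \left( \frac{\frac{2N}{4} - cf_b}{f_m} \right) i \]
DECILE (1 – 10)
Ungrouped: \[ D_k = \frac{k(N + 1)}{10} \]
Grouped: \[ D_n = X_{LB} + \left( \frac{\frac{nN}{10} - cf_b}{f_m} \right) i, \quad \text{for example } D_5 = X_{LB} + \left( \frac{\frac{5N}{10} - cf_b}{f_m} \right) i \]
PERCENTILE (1 – 100)
Ungrouped: \[ P_k = \frac{k(N + 1)}{100} \]
Grouped: \[ P_n = X_{LB} + \left( \frac{\frac{nN}{100} - cf_b}{f_m} \right) i, \quad \text{for example } P_{75} = X_{LB} + \left( \frac{\frac{75N}{100} - cf_b}{f_m} \right) i \]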
