STATISTICAL TREATMENT OF DATA

STATISTICS is a branch of mathematics concerned with the collection, organization, analysis and interpretation of numerical data.

• Collection of Data refers to the process of obtaining information.
• Organization of Data refers to the systematic presentation of the data in tables, graphs or charts so that
logical and statistical conclusions can be drawn from the collected measurements.
• Analysis of Data refers to the process of extracting from the given data the relevant information from which a numerical
description can be formulated.
• Interpretation of Data refers to the task of drawing conclusions from the analyzed data.

TWO MAJOR AREAS OF STATISTICS

1. Descriptive Statistics – statistical method concerned with describing the properties and characteristics of a set of
data (e.g. gender, age group, percentage of literacy, average family income, etc.)
2. Inferential Statistics – statistical method concerned with analyzing data in order to make predictions, inferences,
interpretations or conclusions about the entire population

STATISTICAL TERMS

• Data – any quantitative or qualitative information
a. Quantitative data – numerical information obtained from counting or measuring that can be
manipulated by any fundamental operation (e.g. age, IQ scores, height, weight, income)
b. Qualitative data – descriptive attributes that cannot be subjected to mathematical operations (e.g. gender,
citizenship, educational attainment, religion)
• Population – totality of all the elements or persons for which one has an interest at a particular time (e.g. faculty
of a school, graduating class, employees of a company)
• Sample – part of a population determined by sampling procedures; its size is usually denoted by n
• Parameter – statistical information or attribute taken from a population
• Statistic – any estimate of a statistical attribute taken from a sample
• Variable – specific factor, property or characteristic of a population or a sample which differentiates a sample or
group of samples from another group
a. Discrete variable – can be obtained by counting (e.g. number of computers in the laboratory, etc.)
b. Continuous variable – can be obtained by measuring objects or attributes (e.g. weight of students, etc.)

SIGMA NOTATION (∑) – summation; the most frequently used form of notation in statistics, which abbreviates the sum of the
quantities in a given range
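
As a minimal illustration, the sum written as $\sum_{i=1}^{n} x_i$ corresponds to adding every element of a list. The list `x` and its values below are hypothetical, chosen only for this sketch.

```python
# Hypothetical quantities x_1 ... x_5, used only for illustration
x = [3, 5, 2, 8, 7]

# Sigma notation over the full range: sum of x_i for i = 1 to n
total = sum(x)

# Sigma notation over a restricted range: i = 2 to 4 (Python indices 1 to 3)
partial = sum(x[1:4])

print(total)    # 25
print(partial)  # 15
```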

COLLECTION OF DATA

1. Interview – requires face-to-face inquiry with the respondent
2. Questionnaire – makes use of written questions to be answered by the respondents
3. Observation – makes use of the different human senses in gathering information
4. Registration/ Census – requires enactment of law to take effect because it needs the participation of a large part of,
if not the entire, population
5. Experimentation – usually conducted in laboratories where specimens are subjected to some aspects of control
to find out cause-and-effect relationships

TYPES OF DATA

• Primary data – gathered directly from the source
• Secondary data – gathered from secondary sources like books, magazines, journals, etc.

STATISTICAL PRESENTATION

• Textual presentation – data are presented in paragraph form
• Graphical presentation – data are presented in visual form; pictures displaying numerical information
• Tabular presentation – data are presented in tables to show the relation between the column and row quantities

STATISTICAL GRAPHS

• Bar graph – shows the relative sizes of data; bars drawn proportional to the data may be horizontal or vertical
• Line graph – shows the relationship between two or more sets of continuous data
• Circle graph – best used to compare parts to a whole, where the size of each sector of the circle is proportional to
the size of the category that it represents

FREQUENCY DISTRIBUTION – the tabular presentation of data showing the frequency of each score

Midyear Test Scores of 45 Students in Mathematics IV

29 27 28 27 34 29 27 27 28

25 23 35 25 29 33 23 27 33

27 22 40 27 21 29 22 25 29

25 21 20 21 23 25 30 20 28

30 29 28 30 27 27 27 19 30

STEPS IN CONSTRUCTING FREQUENCY DISTRIBUTION TABLES

1. Find the RANGE ($r$) (difference between the highest score and the lowest score)
$r = \text{highest score} - \text{lowest score} = 40 - 19 = 21$

2. Decide on the number of CLASSES (grouping or category; ideally between 5 and 15)
*assume that the desired number of classes is 7

3. Determine CLASS INTERVAL (size of each class rounded to the nearest integer)
$i = \frac{\text{range}}{\text{desired number of classes}} = \frac{21}{7} = 3$

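4. Determine the classes starting with the lowest class.
*19 – 21 where 19 is the lower limit (LL) and 21 is the upper limit (UL)
*continue in steps of the class interval until the highest score is covered; here the highest score (40) falls just
outside the seventh class (37 – 39), so an eighth class (40 – 42) is needed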

5. Determine the CLASS FREQUENCY (𝑓) for each class by counting the tally.

Class      Frequency (f)
40 – 42    1
37 – 39    0
34 – 36    2
31 – 33    2
28 – 30    14
25 – 27    15
22 – 24    5
19 – 21    6
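
The five steps can be sketched in Python as follows. This is only an illustrative sketch: the function name `frequency_table` and its parameter `desired_classes` are hypothetical, the scores are assumed to be integers, and Python's built-in `round` stands in for "rounded to the nearest integer".

```python
from collections import Counter

# Midyear test scores of the 45 students, copied from the data set above
scores = [
    29, 27, 28, 27, 34, 29, 27, 27, 28,
    25, 23, 35, 25, 29, 33, 23, 27, 33,
    27, 22, 40, 27, 21, 29, 22, 25, 29,
    25, 21, 20, 21, 23, 25, 30, 20, 28,
    30, 29, 28, 30, 27, 27, 27, 19, 30,
]

def frequency_table(data, desired_classes):
    """Return (class interval, [(lower limit, upper limit, frequency), ...])."""
    low, high = min(data), max(data)
    data_range = high - low                                   # Step 1: range
    interval = max(1, round(data_range / desired_classes))    # Steps 2 and 3
    counts = Counter(data)
    rows = []
    lower = low                                               # Step 4: lowest class first
    while lower <= high:
        upper = lower + interval - 1
        # Step 5: class frequency = number of scores from lower to upper
        freq = sum(counts[v] for v in range(lower, upper + 1))
        rows.append((lower, upper, freq))
        lower = upper + 1
    return interval, rows

interval, table = frequency_table(scores, desired_classes=7)
for lower, upper, freq in reversed(table):   # print the highest class first
    print(f"{lower} – {upper}: {freq}")
```

Because the classes must reach the highest score, the loop may produce one class more than the desired number, as it does for this data set.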

NUMERICAL VALUES RELEVANT IN FREQUENCY DISTRIBUTION

• Class Mark (CM) – middle value in a class (e.g. for the class 25 – 27)
$CM = \frac{25 + 27}{2} = 26$

• Class Boundaries – described as true limits because they are more precise expressions of the class limits
a. Lower Boundary (LB) – 0.5 less than the lower limit (e.g. 25 – 0.5 = 24.5)
b. Upper Boundary (UB) – 0.5 more than the upper limit (e.g. 27 + 0.5 = 27.5)

• Cumulative Frequency (Cf)
a. <Cf – found by adding the frequency of the class and the frequencies of all the lower classes
(e.g. for the class 28 – 30: 14 + (15 + 5 + 6) = 40)
b. >Cf – found by adding the frequency of the class and the frequencies of all the upper classes
(e.g. for the class 28 – 30: 14 + (2 + 2 + 0 + 1) = 19)

Midyear Test Scores of 45 Students in Mathematics IV

                                   Class Boundaries  Cumulative Frequency
Class      f     CM    LL    UL    LB      UB        <Cf     >Cf
40 – 42    1     41    40    42    39.5    42.5      45      1
37 – 39    0     38    37    39    36.5    39.5      44      1
34 – 36    2     35    34    36    33.5    36.5      44      3
31 – 33    2     32    31    33    30.5    33.5      42      5
28 – 30    14    29    28    30    27.5    30.5      40      19
25 – 27    15    26    25    27    24.5    27.5      26      34
22 – 24    5     23    22    24    21.5    24.5      11      39
19 – 21    6     20    19    21    18.5    21.5      6       45
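
The class mark, class boundaries and cumulative frequencies follow mechanically from the class limits and frequencies. The sketch below shows one way to compute them; the variable names are hypothetical and the classes are listed from the lowest class upward.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency), listed from the lowest class upward
classes = [
    (19, 21, 6), (22, 24, 5), (25, 27, 15), (28, 30, 14),
    (31, 33, 2), (34, 36, 2), (37, 39, 0), (40, 42, 1),
]

freqs = [f for _, _, f in classes]
class_marks = [(ll + ul) / 2 for ll, ul, _ in classes]    # CM = (LL + UL) / 2
lower_bounds = [ll - 0.5 for ll, _, _ in classes]         # LB = LL - 0.5
upper_bounds = [ul + 0.5 for _, ul, _ in classes]         # UB = UL + 0.5

# <Cf: running total of frequencies from the lowest class upward
less_than_cf = list(accumulate(freqs))
# >Cf: running total of frequencies from the highest class downward
greater_than_cf = list(accumulate(reversed(freqs)))[::-1]

for row in zip(classes, class_marks, lower_bounds, upper_bounds,
               less_than_cf, greater_than_cf):
    print(row)
```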

HISTOGRAM – bar graph-like representation of a frequency distribution; the height of each bar corresponds to the class
frequency and the width corresponds to the class interval

FREQUENCY POLYGON – line graph where the frequency of each class is plotted against its class mark

OGIVE – line graph where the cumulative frequency of each class is plotted against the corresponding class boundary; the
less than ogive and the greater than ogive intersect at the MEDIAN of the data
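
As a rough numerical check, the points of both ogives can be computed without drawing them. In the sketch below the less than ogive is taken at the upper boundary of each class and the greater than ogive at the lower boundary; the helper `ogive_intersection` is hypothetical and finds the crossing by linear interpolation between plotted points.

```python
# Class boundaries from 18.5 to 42.5 and class frequencies, lowest class first
boundaries = [18.5, 21.5, 24.5, 27.5, 30.5, 33.5, 36.5, 39.5, 42.5]
freqs = [6, 5, 15, 14, 2, 2, 0, 1]
n = sum(freqs)

# Less than ogive: number of scores below each boundary
less_than = [0]
for f in freqs:
    less_than.append(less_than[-1] + f)

# Greater than ogive: number of scores above each boundary
greater_than = [n - c for c in less_than]

def ogive_intersection(xs, rising, falling):
    """Return the x value where the rising and falling curves cross."""
    for k in range(len(xs) - 1):
        d0 = rising[k] - falling[k]
        d1 = rising[k + 1] - falling[k + 1]
        if d0 <= 0 <= d1 and d0 != d1:
            t = -d0 / (d1 - d0)          # fraction of the way across the segment
            return xs[k] + t * (xs[k + 1] - xs[k])
    return None

crossing = ogive_intersection(boundaries, less_than, greater_than)
print(round(crossing, 2))
```

The two curves cross where both cumulative frequencies equal half of the total frequency, which is the point that splits the data into two equal parts.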

QUANTITATIVE DESCRIPTION OF DATA

• Measures of Central Tendency/ Measures of Average – statistics that serve as a representative of the data; each is a
single quantitative value that represents the set of data under investigation and lies near the center of the set
• Measures of Dispersion/ Measures of Spread – statistics that indicate how closely or how widely the data are spread
around the average

MEASURES OF CENTRAL TENDENCY

1. Mean – arithmetic average; the sum of the quantities divided by the number of quantities under consideration
a. Mean of Ungrouped Data
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
b. Mean of Grouped Data Using Class Mark
$\bar{x} = \frac{\sum fx}{\sum f}$

where: $\bar{x}$ = mean of grouped data
$f$ = frequency of each class
$x$ = class mark of each class
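
Both formulas translate directly into code. The sketch below uses hypothetical function names; the ungrouped values are made up for illustration, while the grouped example uses the class limits and frequencies of the midyear test scores.

```python
def mean_ungrouped(values):
    """Sum of the quantities divided by the number of quantities."""
    return sum(values) / len(values)

def mean_grouped(classes):
    """classes: (lower limit, upper limit, frequency) triples.

    Each class is represented by its class mark x = (LL + UL) / 2.
    """
    total_fx = sum(f * (ll + ul) / 2 for ll, ul, f in classes)   # sum of f*x
    total_f = sum(f for _, _, f in classes)                      # sum of f
    return total_fx / total_f

# Hypothetical ungrouped data
print(mean_ungrouped([4, 8, 6, 10]))          # 7.0

# Midyear test scores grouped into classes of size 3
classes = [
    (19, 21, 6), (22, 24, 5), (25, 27, 15), (28, 30, 14),
    (31, 33, 2), (34, 36, 2), (37, 39, 0), (40, 42, 1),
]
grouped_mean = mean_grouped(classes)
print(round(grouped_mean, 2))                 # 26.8
```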

2. Median – middle value in a set of quantities; the value that separates an ordered set of data into two equal parts
a. Median of Ungrouped Data
• If n is odd, the median is the $\left(\frac{n+1}{2}\right)$th quantity
• If n is even, the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th quantities
b. Median of Grouped Data
$\tilde{x} = lb_{me} + \left[\frac{\frac{\sum f}{2} - cf}{f_{me}}\right] i$

where: $\tilde{x}$ = median of grouped data
$lb_{me}$ = lower boundary of the median class (the first class whose <Cf reaches $\frac{\sum f}{2}$)
$\sum f$ = sum of frequencies
$cf$ = cumulative frequency (<Cf) of the class immediately below the median class
$f_{me}$ = frequency of the median class
$i$ = class interval
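
A possible implementation of both median rules is sketched below. The function names are hypothetical, the grouped version assumes integer class limits (so each lower boundary is the lower limit minus 0.5), and the median class is taken as the first class whose running cumulative frequency reaches half of the total.

```python
def median_ungrouped(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)th quantity
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th

def median_grouped(classes, interval):
    """classes: (lower limit, frequency) pairs, from the lowest class upward."""
    half = sum(f for _, f in classes) / 2
    cf = 0                                         # cumulative frequency below the current class
    for lower_limit, f in classes:
        if f > 0 and cf + f >= half:               # this is the median class
            return (lower_limit - 0.5) + (half - cf) / f * interval
        cf += f
    raise ValueError("the distribution has no observations")

# Hypothetical ungrouped data
print(median_ungrouped([7, 3, 9]))        # 7
print(median_ungrouped([3, 1, 4, 2]))     # 2.5

# Midyear test scores grouped into classes of size 3
classes = [(19, 6), (22, 5), (25, 15), (28, 14), (31, 2), (34, 2), (37, 0), (40, 1)]
grouped_median = median_grouped(classes, interval=3)
print(round(grouped_median, 2))           # 26.8
```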

3. Mode – quantity that occurs most frequently
• Unimodal distribution – contains only one mode
• Bimodal distribution – contains two modes
• Trimodal distribution – contains three modes
• No Mode – no quantity occurs more frequently than the others
a. Mode of Grouped Data
$\hat{x} = lb_{mo} + \left(\frac{D_1}{D_1 + D_2}\right) i$

where: $\hat{x}$ = mode of grouped data
$lb_{mo}$ = lower boundary of the modal class
$D_1$ = difference between the frequencies of the modal class and the next lower class
$D_2$ = difference between the frequencies of the modal class and the next upper class
$i$ = class interval
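
The sketch below illustrates both cases with hypothetical function names. It assumes that $D_1$ compares the modal class with the next lower class and $D_2$ with the next upper class, that a missing neighbouring class counts as frequency 0, and that the grouped data have a single modal class.

```python
from collections import Counter

def modes_ungrouped(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                  # every quantity is equally frequent
    return sorted(v for v, c in counts.items() if c == top)

def mode_grouped(classes, interval):
    """classes: (lower limit, frequency) pairs, from the lowest class upward."""
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))                    # position of the modal class
    f_lower = freqs[k - 1] if k > 0 else 0
    f_upper = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - f_lower                        # assumption: modal class vs next lower class
    d2 = freqs[k] - f_upper                        # assumption: modal class vs next upper class
    if d1 + d2 == 0:
        raise ValueError("no single modal class can be identified")
    return (classes[k][0] - 0.5) + d1 / (d1 + d2) * interval

# Hypothetical ungrouped data
print(modes_ungrouped([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes_ungrouped([1, 2, 3]))         # []      (no mode)

# Midyear test scores grouped into classes of size 3
classes = [(19, 6), (22, 5), (25, 15), (28, 14), (31, 2), (34, 2), (37, 0), (40, 1)]
grouped_mode = mode_grouped(classes, interval=3)
print(round(grouped_mode, 2))             # 27.23
```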

EXAMPLE:

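Midyear Test Scores of 45 Students in Mathematics IV

                                       Class Boundaries  Cumulative Frequency
Class      f     CM (x)    LL    UL    LB      UB        <Cf     >Cf       fx
40 – 42    1     41        40    42    39.5    42.5      45      1         41
37 – 39    0     38        37    39    36.5    39.5      44      1         0
34 – 36    2     35        34    36    33.5    36.5      44      3         70
31 – 33    2     32        31    33    30.5    33.5      42      5         64
28 – 30    14    29        28    30    27.5    30.5      40      19        406
25 – 27    15    26        25    27    24.5    27.5      26      34        390
22 – 24    5     23        22    24    21.5    24.5      11      39        115
19 – 21    6     20        19    21    18.5    21.5      6       45        120
$\sum f = 45$          $\sum fx = 1206$

A. MEAN
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1206}{45} = 26.80$

B. MEDIAN
The median class is 25 – 27, the first class whose <Cf (26) reaches $\frac{\sum f}{2} = 22.5$.
$\tilde{x} = lb_{me} + \left[\frac{\frac{\sum f}{2} - cf}{f_{me}}\right] i$
$\tilde{x} = 24.5 + \left[\frac{\frac{45}{2} - 11}{15}\right] 3$
$\tilde{x} = 24.5 + \left[\frac{22.5 - 11}{15}\right] 3$
$\tilde{x} = 24.5 + \left[\frac{11.5}{15}\right] 3$
$\tilde{x} = 24.5 + 2.30$
$\tilde{x} = 26.80$

C. MODE
The modal class is 25 – 27 ($f = 15$). Taking $D_1$ against the next lower class (22 – 24) and $D_2$ against the next
upper class (28 – 30): $D_1 = 15 - 5 = 10$ and $D_2 = 15 - 14 = 1$.
$\hat{x} = lb_{mo} + \left(\frac{D_1}{D_1 + D_2}\right) i$
$\hat{x} = 24.5 + \left(\frac{10}{10 + 1}\right) 3$
$\hat{x} = 24.5 + \left(\frac{10}{11}\right) 3$
$\hat{x} = 24.5 + 2.73$
$\hat{x} = 27.23$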

CHARACTERISTICS OF MEASURES OF CENTRAL TENDENCY

                    MEAN                              MEDIAN                            MODE

Nature of           computational or calculated       rank or positional average        inspectional or commercial
Computation         average                                                             average

Sensitivity to      easily affected by an increase    may or may not be affected by     may or may not be affected by
Other Data          or decrease in the number of      extreme values                    the introduction of other data
                    data

Usability           most widely used average and      less widely used than the mean    rarely used and cannot be
                    subject to further mathematical   but can be subjected to a few     mathematically manipulated
                    computation                       mathematical computations

Nature of           measure for interval or ratio     measure for ordinal scales such   measure for nominal scales
Data                scales such as scores, grades,    as test scores, salary            such as number of certain
                    temperature and population                                          brand of commodities
