
ENMA 104 – ENGINEERING DATA ANALYSIS

Course Description:
This course introduces different methods of data collection and the suitability
of using a particular method for a given situation. It covers the relationship of
probability to statistics; probability distributions of random variables and their
uses; linear functions of random variables within the context of their application
to data analysis and inference; estimation techniques for unknown parameters; and
hypothesis testing used in making inferences from sample to population. It also
covers inference for regression parameters and the building of models for estimating
means and predicting future values of key variables under study. Statistically based
experimental design techniques and analysis of outcomes of experiments are discussed
with the aid of statistical software.

TOPIC 1
LEARNING OUTCOME:
1. Define and explain statistical terms
2. Differentiate the different divisions of statistics
3. Identify the scale of measurements of variables
4. Be able to apply summation rules

INTRODUCTION
Statistics is the field of study concerned with the collection, analysis and
interpretation of uncertain data. The methods of statistics allow scientists and
engineers to design valid experiments and to draw reliable conclusions from the data
they produce.
The collection and analysis of data are fundamental to science and
engineering. Scientists discover the principles that govern the physical world, and
engineers learn how to design important new products and processes, by analyzing data
collected in scientific experiments. A major difficulty with scientific data is that
they are subject to random variation or uncertainty. That is, when scientific
measurements are repeated, they come out somewhat differently each time.

DIVISION OF STATISTICS
1. Descriptive Statistics
This refers to the methods of summarizing and presenting data in
a form that makes them easier to analyze and interpret. It characterizes the
distribution of a set of observations on a specific variable or variables. Let it be
clearly understood that descriptive statistics provides information only about the
collected data and in no way draws inferences or conclusions concerning a larger set
of data. The construction of tables, charts and graphs are some examples.

2. Inferential Statistics
This refers to the drawing of valid conclusions or inferences about
a population based on a representative sample systematically taken from the same
population. The data on hand are usually a sample of the actual population of
interest. For example, most presidential election polls survey only a sample of about
1,000 individuals, and yet the goal is to describe the expected voting behavior of
100 million or more.

POPULATION AS DIFFERENTIATED FROM SAMPLE


The word population refers to the total collection of actual or potential
realizations of the unit of observation in the research study. This means there are
populations of students, teachers, principals, animals and many others.
The sample is the specific, finite, realized set of members that represents the
characteristics or traits of the population. These members constitute a subset of the
population. Numerical measures of the population are called parameters, while those
of the sample are called estimates or statistics.

VARIABLE
The variable is the construct that a researcher is attempting to measure. The
term variable refers to a characteristic of the subjects or individuals. For example,
course preference is a variable that can take n values such as education, criminology,
nursing, computer and others. Other examples include gender, age, intelligence,
attitude, faculty rank and so on.
Variables are classified into qualitative and quantitative variables. A
qualitative variable (also called categorical) has values that are described by words
rather than numbers. Qualitative variables generally have either a nominal or an
ordinal scale. Examples are gender, disease status, occupation, race and others.
A quantitative or numerical variable has values that arise from counting,
measuring something or from some kind of mathematical operation. These are
variable values that are intrinsically numeric. Number of children in a family and age
are examples of quantitative variables.

VARIABLES ACCORDING TO CONTINUITY OF VALUES


1. Continuous variables – These are variables whose levels can take
continuous values.
Example. The height of a person, which can be 150 cm, 160 cm and so on
depending on the accuracy of measurement, is a continuous variable.

2. Discrete or discontinuous variables – These are variables whose
values or levels cannot take the form of decimals.
Example. The number of children in a family, which can assume any of
the values 1, 2, 3, … but cannot be 1.5 or 3.4, is a discrete variable.

VARIABLES ACCORDING TO SCALE OF MEASUREMENT

1. Nominal Scale – Data classified into various distinct categories in which
no ordering of the labels or of the numbers assigned to the things is implied. It has
two characteristics:
a.) the data consist of two or more categories, and
b.) the categories do not have an intrinsic order.

Dichotomous variables are nominal variables that have two categories.

a.) Dichotomous variables are designed to give you an either/or response.
Example: you are either male or female. Do you like playing
chess? You may answer yes or no.
b.) Dichotomous variables can be either fixed or designed. Example: the sex
variable can only be dichotomous (male or female); therefore it is fixed.

2. Ordinal Scale – Data classified into distinct categories in which ordering
of the labels or of the numbers assigned to things is implied.

Example: educational attainment, restaurant rating, pain level, etc.

3. Interval Scale – Ordered scale in which the difference between
measurements is a meaningful quantity, that is, the intervals between scale points
are meaningful. The absence of an absolute (meaningful) zero is a key characteristic
of interval data.

Example: temperature, scores in an achievement test, calendar time

4. Ratio Scale – Ordered scale in which the difference between
measurements is meaningful and which also has an absolute zero, that is, a
meaningful zero that represents the absence of the quantity being measured.

Example: age, weight, height, distance

SUMMATION NOTATION

In statistics it is frequently necessary to work with sums of numerical values.


For example, we may wish to compute the average cost of a certain brand of
toothpaste sold at different stores. Perhaps we would like to know the total number of
heads that occur when 3 coins are tossed several times.
Consider a controlled experiment in which the decreases in weight over a 6-month
period were 15, 10, 18 and 6 kilograms, respectively. If we designate the first
recorded value as $x_1$, the second as $x_2$ and so on, then we can write $x_1 = 15$,
$x_2 = 10$, $x_3 = 18$ and $x_4 = 6$.
Using the Greek letter Σ (capital sigma) to indicate “summation of”, we can
write the sum of the 4 weights as

\[ \sum_{i=1}^{4} x_i \]

where we read “summation of $x_i$, $i$ going from 1 to 4”. The numbers 1 and 4 are
called the lower and upper limits of summation. Hence

\[ \sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 15 + 10 + 18 + 6 = 49 \]

Also,

\[ \sum_{i=2}^{3} x_i = x_2 + x_3 = 10 + 18 = 28 \]

In general, the symbol $\sum_{i=1}^{n} x_i$ means that we replace $i$ wherever it
appears after the summation symbol by 1, then by 2, and so on up to $n$, and we add
up the terms.


Therefore, we can write

\[ \sum_{i=1}^{3} x_i^2 = x_1^2 + x_2^2 + x_3^2 \]

\[ \sum_{j=2}^{5} x_j y_j = x_2 y_2 + x_3 y_3 + x_4 y_4 + x_5 y_5 \]

The subscript may be any letter, although i, j and k seem to be preferred by
statisticians.
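The notation can be checked with a short Python sketch. The helper name `summation` is only illustrative; it takes the lower and upper limits as 1-based, inclusive values, just as they are written under and over the sigma, and the data are the four weight decreases from the example above.

```python
# Weight decreases from the example: x1 = 15, x2 = 10, x3 = 18, x4 = 6
x = [15, 10, 18, 6]

def summation(values, lower, upper):
    """Add values[lower..upper], using 1-based inclusive limits as in sigma notation."""
    return sum(values[lower - 1:upper])

total = summation(x, 1, 4)                     # x1 + x2 + x3 + x4
partial = summation(x, 2, 3)                   # x2 + x3
sum_of_squares = sum(v ** 2 for v in x[0:3])   # x1^2 + x2^2 + x3^2

print(total, partial, sum_of_squares)
```

Python lists are indexed from 0, which is why the helper shifts the lower limit by one.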

Solved Problems

1 ) If $x_1 = 3$, $x_2 = 5$ and $x_3 = 7$, find
a ) $\sum_{i=1}^{3} x_i$   b ) $\sum_{i=1}^{3} 2x_i^2$   c ) $\sum_{i=2}^{3} (x_i - i)$

Solution:
a ) $\sum_{i=1}^{3} x_i = x_1 + x_2 + x_3 = 3 + 5 + 7 = 15$
b ) $\sum_{i=1}^{3} 2x_i^2 = 2x_1^2 + 2x_2^2 + 2x_3^2 = 2(3)^2 + 2(5)^2 + 2(7)^2 = 166$
c ) $\sum_{i=2}^{3} (x_i - i) = (x_2 - 2) + (x_3 - 3) = (5 - 2) + (7 - 3) = 7$

2 ) Given $x_1 = 2$, $x_2 = -3$, $x_3 = 1$, $y_1 = 4$, $y_2 = 2$ and $y_3 = 5$, evaluate

a ) $\sum_{i=1}^{3} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 = (2)(4) + (-3)(2) + (1)(5) = 7$

b ) $\left( \sum_{i=2}^{3} x_i \right) \left( \sum_{j=1}^{2} y_j^2 \right) = (x_2 + x_3)(y_1^2 + y_2^2) = (-3 + 1)(4^2 + 2^2) = -40$

Theorem 1: The summation of the sum of two or more variables is the sum of their
summations.

\[ \sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i \]

Theorem 2: If c is a constant, then

\[ \sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i \]

Theorem 3: If c is a constant, then

\[ \sum_{i=1}^{n} c = nc \]
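A quick numerical check of the three theorems, as a sketch rather than a proof. The values of x and y are those of Solved Problem 2; the constant c = 3 is an arbitrary choice made for illustration.

```python
x = [2, -3, 1]
y = [4, 2, 5]
c = 3          # arbitrary constant, chosen only for this check
n = len(x)

# Theorem 1: the summation of a sum is the sum of the summations
theorem1_holds = sum(a + b for a, b in zip(x, y)) == sum(x) + sum(y)

# Theorem 2: a constant factor can be moved outside the summation
theorem2_holds = sum(c * a for a in x) == c * sum(x)

# Theorem 3: summing a constant n times gives n * c
theorem3_holds = sum(c for _ in range(n)) == n * c

print(theorem1_holds, theorem2_holds, theorem3_holds)
```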

Exercises:

1 ) If x1 = 4 , x2 = −3 , x3 = 6 and x4 = −1,evaluate the following :


4
a )  xi (xi − 3)
2
4
b )  (xi + 1)
2
3
c)
(xi + 2 )
i =1 i =2 i =2 xi
2 ) If x1 = −2 , x2 = 3 , x3 = 1 , y1 = 4 , y2 = 0 and y3 = −5 ,evaluate the following :

b )  (2 xi + yi − 3) c ) ( x 2 )( y )
3
a )  xi yi
2

i =2

TOPIC 2

LEARNING OUTCOME:

1. Identify ways of data collection.


2. Understand manners in which data are collected.
3. Present data in different ways.
4. Learn steps in constructing frequency distribution.

COLLECTION OF DATA

In order to ensure the accuracy of data, one must know the right sources of
data and methods of collecting them.

TYPES OF DATA

1. Primary Data – These refer to data observed or collected from
firsthand information directly gathered from the source.

Advantage: The information gathered is more accurate and more likely
to be correct.

Disadvantage: Collection can be costly and time consuming.

2. Secondary Data – These refer to information collected in the past or
taken from other sources, previously gathered by other individuals or agencies.

Advantage: a) It can be obtained quickly.
b) It can be less expensive, inasmuch as it can be gathered from
books and the internet.

Disadvantage: a) There is no real control over the quality of the data.
b) The available information may not meet one’s specific
needs.

METHODS OF COLLECTING DATA

1. Direct Method or Interview – The researcher gets the needed
data/information directly from the respondent.

2. Indirect Method or Questionnaire – This is commonly used in
collecting primary data. The information is collected through a set of questionnaires.
A questionnaire is a document prepared by the researcher containing sets of questions
given out to acquire the needed information.

3. Registration Method – It refers to the continuous, permanent, compulsory
recording of the occurrence of events together with certain identifying or descriptive
characteristics concerning them, as provided through civil code, laws or regulations.
Examples of the registration method are records of births, marriages and deaths at NSO.
The registration record of Filipinos of voting age at the COMELEC is another example.

4. Observation Method – It involves mechanical or human observation of what
people actually do or what events take place. The information is collected by observing
the process at work.

5. Experimentation Method – An experiment is a study of cause and
effect. It differs from non-experimental methods in that it involves the deliberate
manipulation of one variable, while trying to keep all other variables constant.

PRESENTATION OF DATA
1. Textual – In this mode of presentation, the data are explained or discussed
in text or in paragraph form.
2. Tabular – The data are systematically presented through tables
consisting of vertical columns and horizontal rows, with headings describing these
rows and columns.
3. Graphical – It is the most effective means of presenting statistical
data because a graph may make things clearer.

TYPES OF COMMON GRAPHS


1. Histograms - The histogram is a popular graphing tool. It is used to
summarize discrete or continuous data that are measured on an interval scale. It is
often used to illustrate the major features of the distribution of the data in a
convenient form. A histogram divides up the range of possible values in a data set into
classes or groups. For each group, a rectangle is constructed with a base length equal
to the range of values in that specific group, and an area proportional to the number
of observations falling into that group. This means that the rectangles will be drawn of
non-uniform height. A histogram has an appearance similar to a vertical bar graph, but
when the variables are continuous, there are no gaps between the bars. When the
variables are discrete, however, gaps should be left between the bars. Figure 1 is a
good example of a histogram


2. Histograph – A histograph is a graph formed by joining the midpoints of histogram
column tops. These graphs are used only when depicting data from the continuous
variables shown on a histogram.

A histograph smooths out the abrupt changes that may appear in a histogram, and is
useful for demonstrating continuity of the variable being studied. Figures 2 and 3 are
good examples of histographs.

Unlike Figure 2, this histograph has spaces between the bars. By just looking at this
illustration, the reader can immediately tell that the spaces mean the variables are
discrete. In this way, histographs make it easier for the readers to determine what
type of variables has been used.

3. Frequency Polygon – A frequency polygon is a line graph that connects the
midpoints of the histogram intervals, plus extra intervals at the beginning and end so
that the line will touch the x-axis. An ogive is a line graph of the cumulative
frequencies.

4. Pie Chart – It is a circular graph that shows the relative contribution of different
categories to an overall total. A wedge of the circle represents each
category’s contribution, such that the graph resembles a pie that has been cut into
different sized slices.

FREQUENCY DISTRIBUTIONS

Raw Data
Raw data are collected data that have not been organized numerically. An
example is the set of weights of 300 students obtained from an alphabetical listing of
college records.

Array
An array is an arrangement of raw numerical data according to
magnitude, in either ascending or descending order. The difference between the
largest and smallest number is called the range of the data. For example, if the
heaviest weight of 300 students is 100 kg and the lightest is 54 kg, the range is
100 – 54 = 46 kg.

Frequency Distribution

It is a tabular arrangement of data showing its classification or grouping
according to size.

Class interval – This refers to the grouping defined by a lower limit and an upper limit.

Class frequency – This refers to the number of observations belonging to a class
interval.

Class mark – This is the midpoint or middle value of the class interval.

Class boundary – This is the more precise expression of the class limits, also called
the true limits.

Class size – This is the width of each class interval.

Steps in Constructing a Frequency Distribution

1. Arrange the given raw data in ascending order.
2. Compute the range:
Range = Highest Score – Lowest Score
3. Determine the number of classes by using Sturges' formula:
\[ K = 1 + 3.322 \log n \]
where: K is the approximate number of classes
n is the number of observations
log is the base-10 logarithm
4. Compute the class size, C = R ÷ K. The computed value of C should be
rounded off for convenience.
5. Determine the lowest limit.
6. Tally each score in the class interval to which it belongs. Sum the
frequencies and check that the total is equal to the total number of observations.
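Steps 2 to 4 can be sketched in Python as follows. The function names `sturges_classes` and `class_size` are illustrative only, and the small list of scores is a hypothetical input used just to show the call; rounding C to a convenient whole number is left to the reader, as in step 4.

```python
import math

def sturges_classes(n):
    """Approximate number of classes K from Sturges' formula (base-10 logarithm)."""
    return 1 + 3.322 * math.log10(n)

def class_size(data):
    """Unrounded class size C = R / K, where R is the range of the data."""
    data_range = max(data) - min(data)
    return data_range / sturges_classes(len(data))

# Hypothetical scores, used only to demonstrate the calls
sample_scores = [12, 15, 17, 18, 18, 20, 21, 23, 26, 29]
k = sturges_classes(len(sample_scores))
c = class_size(sample_scores)
print(round(k, 3), round(c, 2))
```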

Relative Frequency Distribution


The relative frequency, denoted by RF (%), is derived by taking the ratio of the
number of items in each class to the total frequency. The relative frequency
distribution may be expressed in percent, and its total is 100%.

Cumulative Frequency Distribution


The cumulative frequency is the accumulated frequency of the classes. It can
be accumulated starting from either the beginning or the end of the distribution.

The “less than” cumulative frequency is the number of observations that are
less than the upper class boundary in a given interval.

The "greater than” cumulative frequency is the number of observations that are
greater than the lower class boundary in a given interval.

Example: Grouped Data


Construct a frequency distribution table of 100 students in a Calculus class.
The following are their scores out of 30 items.

12 12 12 12 13 14 14 14 14 14
15 15 15 15 16 16 16 16 17 17
17 17 17 17 17 17 17 17 18 18
18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 20 20
20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 22 22 22 22 23
23 23 23 23 23 24 24 24 24 24
24 25 25 25 25 25 25 26 26 26
26 27 27 27 27 27 28 28 29 29

Compute the Range: R = Highest Score – Lowest Score ; R = 29 – 12 ; R = 17

Compute the Number of Classes: K = 1 + 3.322 log n
K = 1 + 3.322 log 100 = 1 + 3.322(2) = 7.644
(Normally 5 - 20)

Compute the Class Size: C = R ÷ K ; C = 17 ÷ 7.644 ; C = 2.22 ≅ 2

In some books 2.22 is rounded up, so 2.22 becomes 3. In that case the frequency table
would have class intervals 12 – 14 ; 15 – 17 ; 18 – 20 ; and so on. The table below
uses C = 2.

Class Mark: (Lower limit + Upper limit) ÷ 2 ; example: (12 + 13) ÷ 2 = 12.5
Class Boundary: Subtract 0.5 from the lower limit and add 0.5 to the upper limit.

<Cumulative Frequency: Add the frequencies from the top, starting at 5:
5 + 9 = 14 ; 14 + 14 = 28 ; …
>Cumulative Frequency: Subtract the frequencies from the top, starting with the total
number of observations, which is 100: 100 – 5 = 95 ; 95 – 9 = 86 ; …

Relative Frequency: Divide the frequency of the class interval by the total number of
observations, then multiply by 100% ; example: (5 ÷ 100) x 100% = 5%

Class      Frequency  Class  Class          <Cumulative  >Cumulative  Relative
Interval              Mark   Boundaries     Frequency    Frequency    Frequency
12 - 13        5      12.5   11.5 – 13.5         5          100          5%
14 - 15        9      14.5   13.5 – 15.5        14           95          9%
16 - 17       14      16.5   15.5 – 17.5        28           86         14%
18 - 19       20      18.5   17.5 – 19.5        48           72         20%
20 - 21       17      20.5   19.5 – 21.5        65           52         17%
22 - 23       10      22.5   21.5 – 23.5        75           35         10%
24 - 25       12      24.5   23.5 – 25.5        87           25         12%
26 - 27        9      26.5   25.5 – 27.5        96           13          9%
28 - 29        4      28.5   27.5 – 29.5       100            4          4%
           N = 100

FREQUENCY DISTRIBUTION TABLE
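The table can be reproduced with a short Python sketch. The function `frequency_table` and its dictionary keys are illustrative names, not part of any library, and the code assumes whole-number scores so that class boundaries lie 0.5 beyond the class limits. To keep the listing short, the 100 scores are entered as a tally (score: number of occurrences) counted from the data above.

```python
def frequency_table(scores, lowest, size, classes):
    """Build one row per class: frequency, class mark, boundaries, cumulative and relative frequency."""
    n = len(scores)
    rows = []
    less_cf = 0
    for k in range(classes):
        lower = lowest + k * size
        upper = lower + size - 1
        f = sum(1 for s in scores if lower <= s <= upper)
        greater_cf = n - less_cf      # observations in this class and all later classes
        less_cf += f                  # observations in this class and all earlier classes
        rows.append({
            "interval": (lower, upper),
            "frequency": f,
            "mark": (lower + upper) / 2,
            "boundaries": (lower - 0.5, upper + 0.5),
            "less_cf": less_cf,
            "greater_cf": greater_cf,
            "relative_pct": 100 * f / n,
        })
    return rows

# Tally of the 100 Calculus scores listed in the example (score: count)
tally = {12: 4, 13: 1, 14: 5, 15: 4, 16: 4, 17: 10, 18: 13, 19: 7, 20: 12,
         21: 5, 22: 4, 23: 6, 24: 6, 25: 6, 26: 4, 27: 5, 28: 2, 29: 2}
scores = [s for s, count in tally.items() for _ in range(count)]

table = frequency_table(scores, lowest=12, size=2, classes=9)
for row in table:
    print(row)
```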


Exercise:

1. Prepare the frequency distribution table for the data using 46 as the lowest
limit and 5 as the class interval.

68 73 81 76 83 96 76 83 95 81 48 56
75 79 90 73 76 77 84 63 68 65 62 70
50 57 89 91 69 72 51 60 81 48 72

2. Below are scores in a Statistics examination of 150 students. Construct a
frequency table. Present the given data in graphical form using histogram, frequency
polygon and ogive.

27 64 62 88 59 59 35 26 57 45
53 69 43 42 75 72 54 77 42 70
70 71 75 25 41 49 43 68 59 58
57 69 61 67 58 71 39 62 44 29
27 94 31 83 55 56 56 57 68 35
41 61 56 42 78 66 62 48 73 80
49 73 51 44 51 61 44 69 55 55
43 62 65 51 13 63 85 76 70 56
80 62 60 60 34 54 61 49 39 70
79 40 34 55 46 68 59 45 58 72
44 54 54 48 44 57 89 54 69 63
48 53 60 47 29 45 60 41 85 58
76 40 73 40 36 86 51 33 46 72
46 51 56 59 69 53 71 61 55 79
65 55 91 55 61 57 58 80 67 45

TOPIC 3

LEARNING OUTCOME:
1. Identify the location of a set of data using measures of central tendency.
2. Calculate the measures of central tendency for both grouped and ungrouped
data.
3. Apply measures of position for both grouped and ungrouped data.
4. Interpret the skewness and kurtosis of a data set.

MEASURES OF CENTRAL TENDENCY


After the data have been presented in tabular or graphical form, the researcher
must be able to describe them in terms of a single number. This single figure, which is
representative of or summarizes the characteristics of a given set of data, is called a
measure of central tendency.

The most commonly used measures of central tendency are mean, median and
mode.

Measures of Central Tendency of Ungrouped Data
Ungrouped data or raw data are those which are not yet organized or arranged
into a frequency distribution. As a rule of thumb, if the number of observations is
less than or equal to 30, the data are treated as ungrouped data.

Mean
The arithmetic mean or arithmetic average is defined as the sum of all items or
terms divided by the total number of items or terms. The definition is the same for
both the sample and population although different symbols are used.

The symbol for the sample mean is $\bar{x}$ (x bar) and for the population mean is
the Greek letter $\mu$ (mu).

Population mean: \[ \mu = \frac{\sum x}{N} \]
where: $\mu$ = mean
$\sum x$ = sum of all scores
$N$ = total number of terms in the population

Sample mean: \[ \bar{x} = \frac{\sum x}{n} \]
where: $\bar{x}$ = mean
$\sum x$ = sum of all scores
$n$ = total number of terms in the sample

Example: Find the mean of the sample whose scores are: 12, 10, 18, 16, 20 and 14

\[ \bar{x} = \frac{\sum x}{n} = \frac{12 + 10 + 18 + 16 + 20 + 14}{6} = 15 \]

Median
The median of ungrouped data is the value of the middle item after arranging
the data in ascending order.

Note: for an odd number of terms, the median is the middle term;
for an even number of terms, it is the sum of the two middle terms divided by 2.

Example: Determine the median of 6, 14, 10, 12, 2, 8, 4


2, 4, 6, 8, 10, 12, 14
median = 8

Example: Determine the median of 10, 8, 20, 16, 9, 13, 15, 7


7, 8, 9, 10, 13, 15, 16, 20
median = (10 +13)/2=11.5

Mode
The mode for ungrouped data is defined as the value that appears with the
highest frequency.

Example: Find the mode of 4, 7, 6, 4, 11, 3, 5, 8, 9 , 2


mode = 4
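The three ungrouped-data examples above can be checked with Python's standard `statistics` module. This is only a convenience sketch; the variable names are illustrative.

```python
import statistics

# Data from the three worked examples above
mean_value = statistics.mean([12, 10, 18, 16, 20, 14])
median_odd = statistics.median([6, 14, 10, 12, 2, 8, 4])           # odd number of terms
median_even = statistics.median([10, 8, 20, 16, 9, 13, 15, 7])     # even number of terms
mode_value = statistics.mode([4, 7, 6, 4, 11, 3, 5, 8, 9, 2])

print(mean_value, median_odd, median_even, mode_value)
```

`statistics.median` sorts the data internally, so the values do not need to be arranged in ascending order first.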
Measures of Central Tendency of Grouped Data
Grouped data are data organized and summarized in the form of a
frequency distribution. As a rule of thumb, if the number of observations is greater
than 30, the data are treated as grouped data. These are data classified into
categories for better presentation and analysis.

Mean

1. Long Method:
\[ \bar{x} = \frac{\sum X_i F_i}{n} \]
where: $X_i$ = class mark
$F_i$ = frequency
$n$ = total frequency

2. Short Method (coded formula):
\[ \bar{x} = A_m + \left[ \frac{\sum_{i=1}^{n} d_i f_i}{n} \right] i \]
where: $A_m$ = assumed mean or class mark of the
interval with the highest frequency
$d_i$ = coded deviation
$f_i$ = frequency
$i$ = class interval (class size)
$n$ = total frequency

Example:
Find the mean score of 60 students in an entrance examination.

Class      Frequency  Class Mark
Interval   (f_i)      (X_i)       X_i F_i    d_i    d_i f_i
18 - 26        8        22          176      -2      -16
27 - 35       13        31          403      -1      -13
36 - 44       21        40          840       0        0
45 - 53        6        49          294       1        6
54 - 62       12        58          696       2       24
           n = 60              Σ X_i F_i = 2409   Σ d_i f_i = 1

Solution:

1. Long Method:
\[ \bar{x} = \frac{\sum X_i F_i}{n} = \frac{2409}{60} = 40.15 \]

2. Short Method (coded formula):
\[ \bar{x} = A_m + \left[ \frac{\sum_{i=1}^{n} d_i f_i}{n} \right] i = 40 + \left[ \frac{1}{60} \right] 9 = 40.15 \]
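Both methods can be sketched in a few lines of Python, using the class marks and frequencies of the example. The variable names are illustrative; the coded deviations are obtained here by dividing each class mark's distance from the assumed mean by the class size, which matches the column of values -2 to 2 in the table.

```python
class_marks = [22, 31, 40, 49, 58]
frequencies = [8, 13, 21, 6, 12]
n = sum(frequencies)

# Long method: sum of (class mark x frequency) divided by n
long_mean = sum(x * f for x, f in zip(class_marks, frequencies)) / n

# Short (coded) method: assumed mean plus the scaled mean of the coded deviations
assumed_mean = 40   # class mark of the interval with the highest frequency
class_size = 9
coded = [(x - assumed_mean) // class_size for x in class_marks]
short_mean = assumed_mean + sum(d * f for d, f in zip(coded, frequencies)) / n * class_size

print(long_mean, short_mean)
```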

Median

The formula for finding the median of grouped data:

\[ M_d = LCB_{M_d} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right) \]
where: $M_d$ = median
$LCB_{M_d}$ = lower class boundary of the median class
${<}cf$ = less than cumulative frequency of the class preceding the median class
$f_i$ = frequency of the median class
$c$ = class interval (class size)
$n$ = total frequency

To solve for the median:

1. Compute the less than cumulative frequency.

2. Find the median class: the first class interval whose less than cumulative
frequency is equal to or greater than n/2, one half of the total number of respondents.

3. Apply the formula by substituting the given values.

Example:

Class      Frequency  <Cumulative
Interval   (f_i)      Frequency
18 - 26        8           8
27 - 35       13          21
36 - 44       21          42   → median class
45 - 53        6          48
54 - 62       12          60
           n = 60

n/2 = 60/2 = 30

\[ M_d = LCB_{M_d} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right) = 35.5 + 9 \left( \frac{\frac{60}{2} - 21}{21} \right) = 39.36 \]
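The three steps can be sketched as a small Python function. The name `grouped_median` is illustrative, and the code assumes whole-number class limits, so the lower class boundary is the lower limit minus 0.5.

```python
def grouped_median(lower_limits, frequencies, class_size):
    """Median of grouped data: LCB + c * ((n/2 - <cf) / f) for the median class."""
    n = sum(frequencies)
    half = n / 2
    cf_before = 0   # less than cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cf_before + f >= half:   # first class whose <cf reaches n/2
            lcb = lower - 0.5
            return lcb + class_size * (half - cf_before) / f
        cf_before += f
    raise ValueError("frequencies must contain at least one observation")

median = grouped_median([18, 27, 36, 45, 54], [8, 13, 21, 6, 12], 9)
print(round(median, 2))
```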

Mode
The formula for finding the mode of grouped data:

\[ M_o = LCB_{M_o} + c \left( \frac{f_{M_o} - f_1}{2f_{M_o} - f_1 - f_2} \right) \]
where: $M_o$ = mode
$LCB_{M_o}$ = lower class boundary of the modal class
$f_{M_o}$ = frequency of the modal class
$f_1$ = frequency of the class before the modal class
$f_2$ = frequency of the class after the modal class
$c$ = class size
Example:

Class      Frequency
Interval   (f_i)
18 - 26        8
27 - 35       13
36 - 44       21   → modal class
45 - 53        6
54 - 62       12
           n = 60

\[ M_o = LCB_{M_o} + c \left( \frac{f_{M_o} - f_1}{2f_{M_o} - f_1 - f_2} \right) = 35.5 + 9 \left( \frac{21 - 13}{2(21) - 13 - 6} \right) = 38.63 \]
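A minimal Python sketch of the grouped mode formula follows. The name `grouped_mode` is illustrative. Two assumptions are made that the text does not state: class limits are whole numbers (boundary = lower limit minus 0.5), and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """Mode of grouped data: LCB + c * (f_mo - f1) / (2*f_mo - f1 - f2)."""
    m = max(range(len(frequencies)), key=lambda k: frequencies[k])  # index of the modal class
    f_mo = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0                      # class before (assumed 0 if absent)
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0   # class after (assumed 0 if absent)
    lcb = lower_limits[m] - 0.5
    return lcb + class_size * (f_mo - f1) / (2 * f_mo - f1 - f2)

mode = grouped_mode([18, 27, 36, 45, 54], [8, 13, 21, 6, 12], 9)
print(round(mode, 2))
```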

Weighted Mean

The weighted arithmetic mean is the mean obtained when each observed value is
multiplied by a weight assigned according to its relative importance, and the sum of
these products is divided by the sum of the weights.

\[ \bar{x} = \frac{\sum wx}{n} \]
where: w = weight of each item
x = value of each item
n = total of the weights, $n = \sum w$

Example: A man bought 10 liters of premium gasoline at P43.00 per liter, 12 liters at
P43.50 and 18 liters at P42.50 from three different gasoline stations. Find the average
price per liter.

\[ \bar{x} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3}{w_1 + w_2 + w_3} = \frac{10(43.00) + 12(43.50) + 18(42.50)}{10 + 12 + 18} = 42.925 \]
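The gasoline example can be verified with a short Python sketch; `weighted_mean` is an illustrative name. The liters bought act as the weights and the prices per liter as the values.

```python
def weighted_mean(values, weights):
    """Sum of weight x value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [43.00, 43.50, 42.50]   # pesos per liter
liters = [10, 12, 18]            # weights
average_price = weighted_mean(prices, liters)
print(average_price)
```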

MEASURES OF POSITION

Quantiles
The quantiles are a natural extension of the median concept in that they
are the values which divide the distribution into a given number of equal parts. While
the median divides the distribution into two equal parts, the quantiles divide the
distribution into four equal parts (quartiles), ten equal parts (deciles) or one
hundred equal parts (percentiles).

Ungrouped Data
The position of the i-th quantile in the ordered data is:

Quartile: \[ Q_i = \frac{i(n+1)}{4} \]
Decile: \[ D_i = \frac{i(n+1)}{10} \]
Percentile: \[ P_i = \frac{i(n+1)}{100} \]

Example: Find the 3rd quartile of the following data.

5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15, and 13

Arrange in ascending order: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23

\[ Q_3 = \frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75\text{th position} \quad \therefore \; Q_3 = 17 + 0.75(19 - 17) = 18.5 \]
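The position rule and the interpolation between neighbouring values can be sketched in Python. The function name `quantile_by_position` is illustrative. Clamping to the smallest or largest value when the position falls outside 1 to n is an assumption added here; the text does not cover that case.

```python
def quantile_by_position(data, i, parts):
    """i-th quantile for `parts` equal parts (4 = quartile, 10 = decile, 100 = percentile)."""
    values = sorted(data)
    n = len(values)
    position = i * (n + 1) / parts    # 1-based position in the ordered data
    whole = int(position)             # rank of the value just below the position
    fraction = position - whole
    if whole < 1:                     # assumption: clamp below the first value
        return values[0]
    if whole >= n:                    # assumption: clamp above the last value
        return values[-1]
    return values[whole - 1] + fraction * (values[whole] - values[whole - 1])

data = [5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15, 13]
q3 = quantile_by_position(data, 3, 4)
print(q3)
```

Library routines such as `statistics.quantiles` support several interpolation conventions, so their results can differ slightly from this hand method depending on the option chosen.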

Grouped Data

Quartile:
\[ Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right) \]
where: $LCB_{Q_i}$ = lower class boundary of the $Q_i$th class
$c$ = class size
$n$ = total number of observations in the distribution
${<}cf_{Q_i - 1}$ = less than cumulative frequency preceding the $Q_i$th class
$f_{Q_i}$ = frequency of the $Q_i$th class

Decile:
\[ D_i = LCB_{D_i} + c \left( \frac{\frac{in}{10} - {<}cf_{D_i - 1}}{f_{D_i}} \right) \]
where: $LCB_{D_i}$ = lower class boundary of the $D_i$th class
$c$ = class size
$n$ = total number of observations in the distribution
${<}cf_{D_i - 1}$ = less than cumulative frequency preceding the $D_i$th class
$f_{D_i}$ = frequency of the $D_i$th class

Percentile:
\[ P_i = LCB_{P_i} + c \left( \frac{\frac{in}{100} - {<}cf_{P_i - 1}}{f_{P_i}} \right) \]
where: $LCB_{P_i}$ = lower class boundary of the $P_i$th class
$c$ = class size
$n$ = total number of observations in the distribution
${<}cf_{P_i - 1}$ = less than cumulative frequency preceding the $P_i$th class
$f_{P_i}$ = frequency of the $P_i$th class

Example: Compute $Q_3$.

Class      Frequency  <Cumulative
Interval   (f_i)      Frequency
18 - 26        8           8
27 - 35       13          21
36 - 44       21          42
45 - 53        6          48   → class interval containing the desired quartile
54 - 62       12          60
           n = 60

\[ \frac{in}{4} = \frac{3 \cdot 60}{4} = 45 \]

\[ Q_3 = LCB_{Q_3} + c \left( \frac{\frac{3n}{4} - {<}cf_{Q_3 - 1}}{f_{Q_3}} \right) = 44.5 + 9 \left( \frac{45 - 42}{6} \right) = 49 \]
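Because the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one Python function can cover all three. This is a sketch under the assumption of whole-number class limits (lower boundary = lower limit minus 0.5); the name `grouped_quantile` is illustrative.

```python
def grouped_quantile(lower_limits, frequencies, class_size, i, parts):
    """i-th quantile of grouped data; parts = 4, 10 or 100 for Q, D or P."""
    n = sum(frequencies)
    target = i * n / parts
    cf_before = 0   # less than cumulative frequency preceding the current class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cf_before + f >= target:   # first class whose <cf reaches i*n/parts
            return (lower - 0.5) + class_size * (target - cf_before) / f
        cf_before += f
    raise ValueError("frequencies must contain at least one observation")

limits = [18, 27, 36, 45, 54]
freqs = [8, 13, 21, 6, 12]
q3 = grouped_quantile(limits, freqs, 9, i=3, parts=4)
p50 = grouped_quantile(limits, freqs, 9, i=50, parts=100)   # same value as the median
print(q3, round(p50, 2))
```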

Measures of Skewness and Kurtosis

A fundamental task in many statistical analyses is to characterize the location
and variability of a data set. A further characterization of the data includes skewness
and kurtosis.

The histogram is an effective graphical technique for showing both the skewness
and kurtosis of a data set.

Skewness

Skewness is a measure of the asymmetry of a distribution, that is, of its deviation
from the symmetry of a normal distribution.

\[ Sk = \frac{3(\text{Mean} - \text{Median})}{\text{standard deviation}} = \frac{3(\mu - M_d)}{\sigma} \]

Interpretation

Right skewed distribution – if skewness is greater than 0, most values are
concentrated on the left of the mean, with extreme values to the right.

Left skewed distribution – if skewness is less than 0, most values are concentrated
on the right of the mean, with extreme values to the left.

Symmetrical distribution – if skewness is equal to 0, the distribution is symmetrical
around the mean, and the mean and median are equal.

Example: Calculate the skewness of the following data

2 3 3 4 4 7 12

\[ \mu = \frac{\sum x}{N} = \frac{2 + 3 + 3 + 4 + 4 + 7 + 12}{7} = 5 \]

\[ \sigma^2 = \frac{(2-5)^2 + (3-5)^2 + (3-5)^2 + (4-5)^2 + (4-5)^2 + (7-5)^2 + (12-5)^2}{7} = 10.29 \]

\[ \sigma = \sqrt{10.29} = 3.21 \]

The median of the data is 4, so

\[ Sk = \frac{3(\mu - M_d)}{\sigma} = \frac{3(5 - 4)}{3.21} = 0.93, \text{ skewed to the right} \]
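The skewness formula can be sketched with Python's `statistics` module, using the population standard deviation (`pstdev`) to match the divisor N in the example. The function name `pearson_skewness` is illustrative. Without intermediate rounding the result is about 0.935; the hand computation gives 0.93 because the standard deviation was first rounded to 3.21.

```python
import statistics

def pearson_skewness(data):
    """Sk = 3 * (mean - median) / population standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sigma = statistics.pstdev(data)
    return 3 * (mean - median) / sigma

sk = pearson_skewness([2, 3, 3, 4, 4, 7, 12])
print(round(sk, 3))
```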

Kurtosis

Kurtosis is used in distribution analysis as a sign of flattening or “peakedness”
of a distribution.

For ungrouped data: \[ K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} \]

For grouped data: \[ K = \frac{\sum_{i=1}^{N} f_i (X_{mi} - \mu)^4}{N\sigma^4} \]

Where: $X_i$ = score of the ith individual in the sample
$f_i$ = frequency of the ith category
$X_{mi}$ = class mark of the ith category
$\mu$ = population mean
$N$ = sample size
$\sigma$ = standard deviation

Interpretation

Under the formulas above, a normal distribution has K = 3, so the interpretation is
based on the excess kurtosis, K − 3.

If K − 3 > 0 – Leptokurtic distribution, sharper than a normal distribution,
with values concentrated around the mean and thicker tails. This means high
probability for extreme values.

If K − 3 < 0 – Platykurtic distribution, flatter than a normal distribution with
a wider peak. The probability for extreme values is less than for a normal distribution
and the values are spread more widely around the mean.

Example: The following indicates the test scores of 10 students in Physics. Calculate
the kurtosis.

22 27 32 37 42 47 52 57 62 67

X_i    (X_i − X̄)               (X_i − X̄)^4
67     (67 − 44.5) =  22.5      256289.06
62     (62 − 44.5) =  17.5       93789.06
57     (57 − 44.5) =  12.5       24414.06
52     (52 − 44.5) =   7.5        3164.06
47     (47 − 44.5) =   2.5          39.06
42     (42 − 44.5) =  −2.5          39.06
37     (37 − 44.5) =  −7.5        3164.06
32     (32 − 44.5) = −12.5       24414.06
27     (27 − 44.5) = −17.5       93789.06
22     (22 − 44.5) = −22.5      256289.06
Σ X_i = 445                     Σ (X_i − X̄)^4 = 755390.63
μ = 44.5    σ = 14.36

\[ K = \frac{755390.63}{10(42522.39)} = 1.78 \]

Since K − 3 = 1.78 − 3 = −1.22 < 0, the distribution is platykurtic.
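The ungrouped kurtosis formula can be sketched in Python as below; `kurtosis` and `excess` are illustrative names. The function returns the ratio defined by the formula, for which a normal distribution gives 3, so subtracting 3 yields the excess kurtosis that is compared with 0.

```python
def kurtosis(data):
    """K = sum((x - mu)^4) / (N * sigma^4), with sigma the population standard deviation."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n        # sigma squared
    fourth_moment = sum((x - mu) ** 4 for x in data) / n
    return fourth_moment / variance ** 2                   # sigma^4 = variance squared

scores = [22, 27, 32, 37, 42, 47, 52, 57, 62, 67]
k = kurtosis(scores)
excess = k - 3   # negative: flatter than a normal distribution
print(round(k, 2), round(excess, 2))
```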

Exercises:

1. The following scores were received by 20 engineering students in a short quiz:
10, 9, 15, 20, 13, 15, 18, 11, 7, 12, 15, 13, 18, 19, 12, 8, 10, 13, 17 and 15. Find
$Q_3$, $D_{10}$, $P_{40}$, the mean, the median and the mode.

2. The following is the frequency distribution of examination marks in math. Compute
the mean, median and mode.

Class interval Frequency


45 – 49 3
50 – 54 4
55 – 59 5
60 – 64 6
65 – 69 13
70 – 74 9
75 – 79 8
80 – 84 7
85 – 89 2

3. The following are the result of exam of 55 students in Thermodynamics. Compute
P62 , D7 & Q1.

Class interval Frequency


33 – 40 4
41 – 48 10
57 – 64 11
65 – 72 9
73 – 80 6
81 – 88 2
89 – 96 5

4. Find the degree of skewness for the following data:

a. Scores in Mechanics quiz


50 55 56 57 57 57 65 65 66
70 70 70 70 70

b. Height of students in cm
126 134 155 150 150 152 152 160 165
174 175 175 176 176 176 177 177

5. Calculate the degree of kurtosis for the data below:

a. Weight in kilogram of luggage in a local airport


15 20 20 21 21 21 22 23 25 28
30 31 32 33 34 35 35 35 35 36

b. Scoring averages of basketball players during the recent tournament


10.5 11 12 12.5 12.5 15.5 15.5 16 16 18
18.5 19 19 20.5 20.5 22 22 23 24.5 25

TOPIC 4

MEASURES OF DISPERSION OR VARIABILITY

While measures of central tendency are used to estimate “normal” values of a
data set, measures of dispersion are important for describing the spread of the data,
or its variation around the central value. Two distinct samples may have the same
mean or median, but completely different levels of variability, or vice versa. A proper
description of the set of data should include both these characteristics. There are
various methods that can be used to measure the dispersion of a data set, each with its
own set of advantages and disadvantages.

For Ungrouped Data

Range
It is defined as the difference between the largest and smallest values. It
is one of the simplest measures of variability to calculate. It depends only on the
extreme values and provides no information about how the remaining data are
distributed.

Mean Absolute Deviation (MAD)

To arrive at a more precise and reliable measure of variation, all item values in
the distribution must be taken into account by determining the amount by which each
item value varies from the mean of the distribution. One way of doing so is to use
the mean absolute deviation.

\[ MAD = \frac{\sum |x_i - \bar{x}|}{n} \]
where: $x_i$ = value of each observation
$\bar{x}$ = mean
$n$ = total number of items
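A minimal Python sketch of the MAD formula; the function name is illustrative. The data are the six sample scores from the ungrouped mean example in Topic 3 (mean 15), reused here only for demonstration.

```python
def mean_absolute_deviation(data):
    """Average of the absolute deviations of each value from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

scores = [12, 10, 18, 16, 20, 14]   # deviations from the mean 15: 3, 5, 3, 1, 5, 1
mad = mean_absolute_deviation(scores)
print(mad)
```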

Quartile Deviation or Semi- Interquartile Range

It is the amount of dispersion present in the middle 50% of the values in a
distribution. It is the difference between the third quartile and the first quartile,
divided by two.

\[ QD = \frac{Q_3 - Q_1}{2} \]

Variance

It is the average of the squared deviations from the distribution's mean. If
all values are identical, the variance is zero; the greater the dispersion of the values, the
greater the variance. The symbol for the sample variance is $s^2$ and for the population
variance is $\sigma^2$.

Population Variance: $$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$$
where: $X_i$ = value of each item
$\mu$ = population mean
$N$ = total number of observations in the population

Sample Variance: $$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$
where: $X_i$ = value of each item
$\bar{X}$ = sample mean
$n$ = total number of observations in the sample
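The only difference between the two formulas is the divisor. A minimal Python sketch follows; the function names and data are illustrative assumptions, and the results are compared with `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), statistics.pvariance(data))  # 32 / 8
print(sample_variance(data), statistics.variance(data))       # 32 / 7
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population formula applied to the same values.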
Standard Deviation

It is the positive square root of the variance and measures the spread or
dispersion of the values around the mean of the distribution. It is the most widely used
measure of spread because taking the square root undoes the squaring in the variance
and expresses deviations in the original units of the data, and because it is closely
related to the normal distribution. It is also regarded as the most important measure of
dispersion since it helps determine where the values of the distribution are
located in relation to the mean.

Population Standard Deviation: $$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$$
where: $X_i$ = value of each item
$\mu$ = population mean
$N$ = total number of observations in the population

Sample Standard Deviation: $$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$
where: $X_i$ = value of each item
$\bar{X}$ = sample mean
$n$ = total number of observations in the sample

Example: Ungrouped Data

The weights in kilograms of 12 children are: 50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52 and 63.
Solve for the following:
a. Range
b. Quartile Deviation
c. Mean Absolute Deviation
d. Variance
e. Standard Deviation

a. Range = Highest Value - Lowest Value = 63 - 45 = 18

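b. $$QD = \frac{Q_3 - Q_1}{2}$$

Data arranged in ascending order: 45, 48, 48, 50, 52, 54, 55, 57, 59, 60, 61, 63

Position of $Q_1$: $$\frac{i(n+1)}{4} = \frac{1(12+1)}{4} = 3.25\text{th position} \quad \therefore Q_1 = 48 + 0.25(50 - 48) = 48.50$$

Position of $Q_3$: $$\frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75\text{th position} \quad \therefore Q_3 = 59 + 0.75(60 - 59) = 59.75$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{59.75 - 48.50}{2} = \mathbf{5.63}$$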
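The position rule used in part (b) can be sketched in Python as follows. The helper name `quartile` is an assumption made for illustration. To the best of our understanding, `statistics.quantiles` with `method="exclusive"` applies the same i(n + 1)/4 position rule, so it is used here as a cross-check.

```python
import statistics


def quartile(values, i):
    """i-th quartile at position i(n + 1)/4 of the sorted data,
    interpolating linearly between the two neighbouring values."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = i * (n + 1) / 4        # 1-based position, may be fractional
    k = int(position)                 # rank of the lower neighbour
    fraction = position - k           # how far to move toward the next value
    lower = ordered[k - 1]
    upper = ordered[min(k, n - 1)]    # next value up (same value at the top end)
    return lower + fraction * (upper - lower)


weights = [50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52, 63]
q1 = quartile(weights, 1)
q3 = quartile(weights, 3)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)

# Cross-check with the standard library (returns Q1, Q2, Q3).
print(statistics.quantiles(weights, n=4, method="exclusive"))
```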

c. Mean Absolute Deviation

$$\bar{x} = \frac{\sum x}{n} = \frac{652}{12} = 54.33$$

X_i      |X_i - X̄|                 (X_i - X̄)^2
45       |45 - 54.33| = 9.33       87.05
48       |48 - 54.33| = 6.33       40.07
48       |48 - 54.33| = 6.33       40.07
50       |50 - 54.33| = 4.33       18.75
52       |52 - 54.33| = 2.33       5.43
54       |54 - 54.33| = 0.33       0.11
55       |55 - 54.33| = 0.67       0.45
57       |57 - 54.33| = 2.67       7.13
59       |59 - 54.33| = 4.67       21.81
60       |60 - 54.33| = 5.67       32.15
61       |61 - 54.33| = 6.67       44.49
63       |63 - 54.33| = 8.67       75.17
Total    ∑|X_i - X̄| = 58.00       ∑(X_i - X̄)^2 = 372.68

$$MAD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{58.00}{12} = \mathbf{4.83}$$

d. Variance

$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \frac{372.68}{12-1} = \mathbf{33.88}$$

e. Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{372.68}{12-1}} = \mathbf{5.82}$$
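Hand computations like parts (a), (c), (d) and (e) are easy to get wrong, so it can help to check them by machine. The sketch below recomputes the range, MAD, sample variance and sample standard deviation for the same 12 weights. The variable names are illustrative, and the cross-check uses `variance` and `stdev` from Python's standard `statistics` module.

```python
import math
import statistics

weights = [50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52, 63]

n = len(weights)
x_bar = sum(weights) / n                                      # sample mean

value_range = max(weights) - min(weights)
mad = sum(abs(x - x_bar) for x in weights) / n
variance = sum((x - x_bar) ** 2 for x in weights) / (n - 1)   # sample variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library's sample formulas.
assert math.isclose(variance, statistics.variance(weights))
assert math.isclose(std_dev, statistics.stdev(weights))

print(f"range              = {value_range}")
print(f"MAD                = {mad:.2f}")
print(f"variance           = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Because the code keeps the unrounded mean, its results may differ in the last decimal place from a table built with the mean rounded to 54.33.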

For Grouped Data

Range = Upper boundary of the highest class – lower boundary of the lowest class

Quartile Deviation: $$QD = \frac{Q_3 - Q_1}{2}$$

Mean Absolute Deviation: $$MAD = \frac{\sum f|cm - \bar{x}|}{n}$$
where: $cm$ = class mark (class midpoint)
$\bar{x}$ = mean
$n$ = total number of items
$f$ = frequency

Variance: $$\sigma^2 = \frac{\sum f(cm - \bar{x})^2}{n}$$
where: $cm$ = class mark of each class
$\bar{x}$ = mean
$n$ = total number of observations
$f$ = frequency

Standard Deviation: $$\sigma = \sqrt{\frac{\sum f(cm - \bar{x})^2}{n}}$$
where: $cm$ = class mark of each class
$\bar{x}$ = mean
$n$ = total number of observations
$f$ = frequency
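The grouped-data formulas above can be sketched in Python as shown below. The function names and the small frequency table are illustrative assumptions. The mean is computed directly as ∑f·cm / n, which is also an assumption here, since this section itself only shows the coded-deviation method for the mean.

```python
import math


def grouped_mean(class_marks, frequencies):
    """Mean of grouped data: sum of f * cm divided by n."""
    n = sum(frequencies)
    return sum(f * cm for cm, f in zip(class_marks, frequencies)) / n


def grouped_mad(class_marks, frequencies):
    """Mean absolute deviation of grouped data."""
    n = sum(frequencies)
    x_bar = grouped_mean(class_marks, frequencies)
    return sum(f * abs(cm - x_bar) for cm, f in zip(class_marks, frequencies)) / n


def grouped_variance(class_marks, frequencies):
    """Variance of grouped data, dividing by n as in the formula above."""
    n = sum(frequencies)
    x_bar = grouped_mean(class_marks, frequencies)
    return sum(f * (cm - x_bar) ** 2 for cm, f in zip(class_marks, frequencies)) / n


def grouped_std_dev(class_marks, frequencies):
    """Standard deviation: positive square root of the grouped variance."""
    return math.sqrt(grouped_variance(class_marks, frequencies))


# Hypothetical frequency table: class marks 10, 20, 30 with frequencies 1, 2, 1.
marks = [10, 20, 30]
freqs = [1, 2, 1]
print(grouped_mean(marks, freqs))      # (10 + 40 + 30) / 4 = 20.0
print(grouped_mad(marks, freqs))       # (10 + 0 + 10) / 4 = 5.0
print(grouped_variance(marks, freqs))  # (100 + 0 + 100) / 4 = 50.0
print(grouped_std_dev(marks, freqs))
```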

Example: Grouped Data

The following is a frequency distribution of an achievement test. Using the
table, compute the following: a) Mean Absolute Deviation, b) Standard Deviation, and
c) Quartile Deviation.

a) Mean Absolute Deviation

Class      Frequency   Class mark
interval   (f)         (cm)         d     df     |cm - x̄|                 f|cm - x̄|
18 - 26    8           22           -2    -16    |22 - 40.15| = 18.15     8(18.15) = 145.20
27 - 35    13          31           -1    -13    |31 - 40.15| = 9.15      13(9.15) = 118.95
36 - 44    21          40           0     0      |40 - 40.15| = 0.15      21(0.15) = 3.15
45 - 53    6           49           1     6      |49 - 40.15| = 8.85      6(8.85) = 53.10
54 - 62    12          58           2     24     |58 - 40.15| = 17.85     12(17.85) = 214.20
Total      N = 60                         ∑df = 1                         ∑f|cm - x̄| = 534.60

$$\bar{x} = A_m + \left(\frac{\sum df}{N}\right) i = 40 + \left(\frac{1}{60}\right) 9 = 40.15$$

$$MAD = \frac{\sum f|cm - \bar{x}|}{n} = \frac{534.60}{60} = \mathbf{8.91}$$
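The coded-deviation computation of the mean and the grouped MAD from part (a) can be reproduced with a short Python sketch. The variable names are illustrative. The assumed mean A_m = 40 and class width i = 9 are taken from the table above.

```python
class_marks = [22, 31, 40, 49, 58]
frequencies = [8, 13, 21, 6, 12]
assumed_mean = 40   # A_m: class mark of the class chosen as the origin
class_width = 9     # i: size of each class interval

n = sum(frequencies)

# Coded deviations d: how many class widths each class mark lies from A_m.
d = [(cm - assumed_mean) / class_width for cm in class_marks]
sum_df = sum(di * f for di, f in zip(d, frequencies))

x_bar = assumed_mean + (sum_df / n) * class_width
mad = sum(f * abs(cm - x_bar) for cm, f in zip(class_marks, frequencies)) / n

print(f"sum of df = {sum_df:.0f}")
print(f"mean      = {x_bar:.2f}")
print(f"MAD       = {mad:.2f}")
```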

b) Standard Deviation

Class      Frequency   Class mark
interval   (f)         (cm)         (cm - x̄)                 (cm - x̄)^2    f(cm - x̄)^2
18 - 26    8           22           22 - 40.15 = -18.15      329.42         2635.36
27 - 35    13          31           31 - 40.15 = -9.15       83.72          1088.36
36 - 44    21          40           40 - 40.15 = -0.15       0.02           0.42
45 - 53    6           49           49 - 40.15 = 8.85        78.32          469.92
54 - 62    12          58           58 - 40.15 = 17.85       318.62         3823.44
Total      N = 60                                                           ∑f(cm - x̄)^2 = 8017.50

$$\sigma = \sqrt{\frac{\sum f(cm - \bar{x})^2}{n}} = \sqrt{\frac{8017.5}{60}} = \mathbf{11.56}$$

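c) Quartile Deviation

$$Q_1 = LCB_{Q_1} + c\left(\frac{\frac{in}{4} - {<}cf_{Q_1 - 1}}{f_{Q_1}}\right) = 26.5 + 9\left(\frac{\frac{1(60)}{4} - 8}{13}\right) = 31.35$$

$$Q_3 = LCB_{Q_3} + c\left(\frac{\frac{in}{4} - {<}cf_{Q_3 - 1}}{f_{Q_3}}\right) = 44.5 + 9\left(\frac{\frac{3(60)}{4} - 42}{6}\right) = 49$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{49 - 31.35}{2} = \mathbf{8.83}$$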
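The interpolation used for the grouped quartiles can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes integer class limits, so that the lower class boundary (LCB) is the lower class limit minus 0.5.

```python
def grouped_quartile(lower_limits, frequencies, class_width, i):
    """i-th quartile of grouped data by interpolating inside the quartile class.

    Assumes integer class limits, so LCB = lower class limit - 0.5.
    """
    n = sum(frequencies)
    target = i * n / 4            # in/4: cumulative count that locates Q_i
    cumulative = 0                # less-than cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= target:
            lcb = lower - 0.5
            return lcb + class_width * (target - cumulative) / f
        cumulative += f
    raise ValueError("quartile class not found")


lower_limits = [18, 27, 36, 45, 54]
frequencies = [8, 13, 21, 6, 12]

q1 = grouped_quartile(lower_limits, frequencies, 9, 1)
q3 = grouped_quartile(lower_limits, frequencies, 9, 3)
quartile_deviation = (q3 - q1) / 2
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, QD = {quartile_deviation:.2f}")
```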

Exercises:

1. The following are the scores of 15 IT students in 3 quizzes.

a) 16, 15, 18, 30, 32, 12, 16, 30, 24, 16, 18, 35, 50, 35, 36
b) 11, 13, 15, 17, 14, 35, 10, 11, 12, 12, 16, 24, 24, 26, 45
c) 16, 17, 18, 19, 20, 21, 22, 22, 22, 24, 26, 20, 19, 30, 27

Compute the following:

a) range
b) quartile deviation
c) mean absolute deviation
d) variance
e) standard deviation

2. The table shows the frequency distribution of the number of netbooks sold during
the past month at 45 stores. Solve for the a) quartile deviation, b) mean absolute
deviation, c) variance and d) standard deviation.

Class interval    Frequency
3 - 11            5
12 - 20           10
21 - 29           4
30 - 38           14
39 - 47           8
48 - 56           3
57 - 65           1

3. The following is a list of scores from an English exam given to Grade 7 students.
Compute the a) range, b) quartile deviation, c) mean absolute deviation, d)
variance and e) standard deviation.

23 35 20 65 79 88 45 57 119 67 67 55 81
100 112 96 53 28 40 53 70 93 86 75 6 101
93 86 94 81 38 44 57 69 52 35 46 69 77
66 50 71 30 45 83 79 105 88 68 60

4. The table shows the departure of trains in a certain station. Solve for the a) quartile
deviation, b) mean absolute deviation, c) variance and d) standard deviation.

Class interval    Frequency
10 - 14           13
15 - 19           24
20 - 24           18
25 - 29           15
30 - 34           20
35 - 39           16
