ENGINEERING DATA ANALYSIS (EDA)
EDA – Module 2: DESCRIBING DATA
2021-2022
Unit 1, Module 1: Intro to EDA as a Subject in BSCE
EDA Overview; CHED PSG for BSCE program; Descriptive Statistics; Qualitative and Quantitative Variables, Discrete and Continuous Variables; Levels of Measurement
Unit 2, Module 2: Describing Data
Frequency Tables; Frequency Distributions; Graphical Presentations; Symmetry and Skewness; Measure of Central Locations; Measure of Variations
Unit 3, Module 3: Descriptive Statistics, Introduction to Probability
Sampling Distributions; A Survey of Probability Concepts; Discrete Probability Concepts; Discrete Probability Distributions; Continuous Probability Distributions
Unit 4, Module 4:
Mean, Median, Mode, Percentile, Quartile, Outliers; Bayes Theorem, Empirical Rule; Hypothesis Testing, Z-test and t-tests; Normal Distribution; Linear Regression; ANOVA
Unit 5, Module 5: Sensitivity Analysis, Project Study Using MS Excel and PowerPoint Presentations
Sensitivity Analysis; Project Study in Statistics; Statistical Tools and MS Excel; PowerPoint Presentations of Analyzed Data
Course Outline (Module 2)
1. Frequency Tables
2. Frequency Distributions
3. Graphical Presentations
4. Symmetry and Skewness
5. Measure of Central Locations
6. Measure of Variations
Measure of Central Locations/Tendencies
Harmonic Mean
$\mu_H = \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{X_i}}$
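A minimal Python sketch of the harmonic mean formula, assuming every observation is positive. The speeds are hypothetical illustration values, and the result is cross-checked against the standard library's `statistics.harmonic_mean`.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """mu_H = n / sum(1 / X_i); this sketch assumes every X_i is positive."""
    xs = list(values)
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("expected a non-empty list of positive values")
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical data: average speed over two equal-distance legs at 40 and 60 km/h.
speeds = [40, 60]
print(harmonic_mean_manual(speeds))
print(harmonic_mean(speeds))
```

For equal distances, the harmonic mean (48 km/h here) is the correct average speed, not the arithmetic mean of 50.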
2. Median
The median of a set of observations arranged in increasing or decreasing order of magnitude is
the middle value when the number of observations is odd, or
the arithmetic mean of the two middle values when the number of observations is even.
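The odd/even rule above can be sketched directly in Python. This is a plain illustration of the definition, not a library routine; the small lists are hypothetical.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)  # arrange in increasing order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # odd n: 5
print(median([7, 1, 5, 3]))  # even n: (3 + 5) / 2 = 4.0
```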
3. Mode
The mode of a set of observations is
that value which occurs most often or
with the greatest frequency.
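A short sketch of the mode definition using `collections.Counter`. It returns every value tied for the greatest frequency, since a data set can be bimodal; the sample list is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of empty data")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 1, 3, 0, 1, 3]))  # [1, 3]: two values tie, so the data are bimodal
```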
Measure of Variation
➢ Population Variance
$\sigma^2 = \dfrac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} = \dfrac{N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2}$
N = population size
➢ Sample Variance
$s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \dfrac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
n = sample size, $\bar{X}$ = sample mean
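A sketch showing that the definitional and shortcut forms of the variance agree, for both the population (divide by N) and the sample (divide by n − 1) versions. It uses the Sample Problem 1 data and checks the results against Python's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def variance_definitional(values, sample=False):
    """sum((X_i - mean)^2) divided by N (population) or n - 1 (sample)."""
    xs = list(values)
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def variance_shortcut(values, sample=False):
    """(N*sum(X^2) - (sum X)^2) / N^2 for a population, / (n(n - 1)) for a sample."""
    xs = list(values)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    numerator = n * total_sq - total ** 2
    return numerator / (n * (n - 1)) if sample else numerator / n ** 2


data = [2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]  # Sample Problem 1 data
for is_sample in (False, True):
    print(variance_definitional(data, is_sample), variance_shortcut(data, is_sample))
print(statistics.pvariance(data), statistics.variance(data))
print(math.sqrt(variance_shortcut(data, sample=True)))  # s = sqrt(s^2)
```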
Standard Deviation = square root of the variance
$\sigma = \sqrt{\sigma^2}$
$s = \sqrt{s^2}$
Coding Techniques
Mean:
Multiplication: $\sigma^2_{\text{actual}} = \dfrac{\sigma^2_{\text{coded}}}{(\text{constant})^2}$
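A sketch of the multiplication rule, assuming each coded value was obtained by multiplying an actual observation by a constant c (u_i = c · x_i). The measurements and the constant 1000 are hypothetical. Dividing the coded variance by c² should recover the variance of the actual data.

```python
import math
from statistics import pvariance

# Hypothetical raw measurements (small decimals that are awkward to work with by hand).
actual = [0.012, 0.015, 0.011, 0.018, 0.014]
c = 1000                          # hypothetical coding constant
coded = [c * x for x in actual]   # multiplication coding: u_i = c * x_i

var_coded = pvariance(coded)
var_actual = var_coded / c ** 2   # sigma^2_actual = sigma^2_coded / (constant)^2

print(var_coded, var_actual, pvariance(actual))
```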
Sample Problems
1. The numbers of incorrect answers on a true-false competency test for a random sample of 15 students were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find (a) the mean; (b) the median; (c) the mode.
2. The average IQ of 10 students in a mathematics
course is 114. If 9 of the students have IQs of 101,
125, 118, 128, 106, 115, 99, 118, and 109, what
must be the other IQ?
By MS Excel
MEAN = 2.4
MEDIAN = 2
MODE = 3
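The same three results can be cross-checked with Python's standard `statistics` module. This is a sketch for verification, not the method used on the slide, which is MS Excel.

```python
from statistics import mean, median, mode

wrong_answers = [2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

print(mean(wrong_answers))    # 2.4
print(median(wrong_answers))  # 2
print(mode(wrong_answers))    # 3
```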
Solution to Problem 2: Let x be the missing IQ. The ten IQs are 101, 125, 118, 128, 106, 115, 99, 118, 109, and x, so their sum is 1019 + x.
$\text{Mean} = \dfrac{1019 + x}{10} = 114;\quad 1140 = 1019 + x;\quad x = 121$
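The reasoning above generalizes: the missing value equals (target mean × total count) − (sum of the known values). The helper name `missing_value` is hypothetical and only illustrates this rearrangement.

```python
def missing_value(known, target_mean, n_total):
    """Value that makes the mean of n_total observations equal target_mean."""
    return target_mean * n_total - sum(known)


iqs = [101, 125, 118, 128, 106, 115, 99, 118, 109]
print(missing_value(iqs, 114, 10))  # 1140 - 1019 = 121
```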
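Six numbers have a mean of 17. Two more numbers, x and x + 4, are added, and the mean of all eight numbers becomes 19. Find the two numbers.
By block system:
$(17)(6) + x + (x + 4) = 19(8)$
$2x = (19)(8) - (17)(6) - 4 = 46$
$x = \dfrac{46}{2} = 23$
The other number is 23 + 4 = 27.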
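The block-system equation can be solved for x in one line. The function `block_system_pair` and its parameter names are hypothetical and chosen for illustration. It assumes exactly two numbers, x and x + gap, are added to the original block.

```python
def block_system_pair(n_old, mean_old, mean_new, gap):
    """Solve mean_old * n_old + x + (x + gap) = mean_new * (n_old + 2) for x."""
    x = (mean_new * (n_old + 2) - mean_old * n_old - gap) / 2
    return x, x + gap


print(block_system_pair(n_old=6, mean_old=17, mean_new=19, gap=4))  # (23.0, 27.0)
```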
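Measure of Central Locations (Grouped Data)
Mean: $\mu_G = \dfrac{\sum f_i X_i}{\sum f_i}$
Median: $Me = L_{me} + C\left(\dfrac{\frac{N}{2} - F_{cum}}{f_{me}}\right)$
Mode: $Mo = L_{mo} + C\left(\dfrac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right)$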
Where:
fi = the frequency of the ith class
Xi = the class mark (midpoint) of the ith class
Lme = the lower true class boundary of the median class
C = the class width
N = the total number of observations
Fcum = the less-than (<) cumulative frequency of the class just before the median class
fme = the frequency of the median class
Lmo = the lower true class boundary of the modal class
fmo = the frequency of the modal class
fb = the frequency of the class just before the modal class
fa = the frequency of the class just after the modal class
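A sketch of the grouped median and mode formulas that locates the median class and the modal class automatically. It assumes equal class widths, and treats a missing neighbouring class as having frequency 0 (an assumption for the first or last class). The usage reuses the contractor example that appears later in this module, with frequencies obtained by counting its 60 values into classes of width 4.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Me = L_me + C * (N/2 - F_cum) / f_me.

    The median class is the first class whose less-than cumulative
    frequency reaches N/2; F_cum is the cumulative frequency before it.
    """
    n = sum(freqs)
    cum = 0
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= n / 2:
            return lower + width * (n / 2 - cum) / f
        cum += f
    raise ValueError("frequencies must contain at least one positive value")


def grouped_mode(lower_boundaries, width, freqs):
    """Mo = L_mo + C * (f_mo - f_b) / (2 f_mo - f_b - f_a).

    The modal class is the class with the largest frequency (the first one
    on ties); a missing neighbouring class is treated as frequency 0.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    f_mo = freqs[i]
    f_b = freqs[i - 1] if i > 0 else 0
    f_a = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_boundaries[i] + width * (f_mo - f_b) / (2 * f_mo - f_b - f_a)


# Contractor example: class width 4, lower true boundaries -0.5, 3.5, ..., 23.5.
lower = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5]
freqs = [5, 22, 13, 6, 7, 4, 3]
print(round(grouped_median(lower, 4, freqs), 1))  # 8.4
print(round(grouped_mode(lower, 4, freqs), 1))    # 6.1
```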
Measure of Variation
(Grouped Data)
RANGE = Highest Value – Lowest Value
Population Variance
$\sigma_G^2 = \dfrac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu_G^2 = \dfrac{N\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$
Standard Deviation, σ
$\sigma = \sqrt{\sigma_G^2}$
Sample Variance
$s_G^2 = \dfrac{n\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$
Standard Deviation, S
$s = \sqrt{s_G^2}$
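A sketch of the grouped shortcut formulas, which need only Σ fᵢXᵢ and Σ fᵢXᵢ². The class marks and frequencies are those of the 60-contractor example in this module; the frequencies come from counting its raw values into classes of width 4. The rounded outputs should match the worked results of 42.03 (population) and 42.74 (sample).

```python
import math


def grouped_variance(class_marks, freqs, sample=False):
    """Shortcut form: (N*sum(f X^2) - (sum f X)^2) / N^2, or / (n(n - 1)) for a sample."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    numerator = n * sum_fx2 - sum_fx ** 2
    return numerator / (n * (n - 1)) if sample else numerator / n ** 2


marks = [1.5, 5.5, 9.5, 13.5, 17.5, 21.5, 25.5]  # class marks X_i
freqs = [5, 22, 13, 6, 7, 4, 3]                  # class frequencies f_i
pop_var = grouped_variance(marks, freqs)
samp_var = grouped_variance(marks, freqs, sample=True)
print(round(pop_var, 2), round(math.sqrt(pop_var), 2))    # 42.03 6.48
print(round(samp_var, 2), round(math.sqrt(samp_var), 2))  # 42.74 6.54
```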
EXAMPLE:
The following numbers represent the total number of projects
undertaken by 60 different contractors in CAR for the past five
years.
12 6 8 23 6 7 25 7 3 3 4 1
18 10 14 7 19 9 6 8 4 4 6 3
18 13 24 7 6 8 9 7 5 5 6 5
19 6 8 14 8 8 14 8 6 5 21 22
12 8 17 10 2 17 7 7 16 10 22 25
Excel Sheet
MAX = 25
MIN = 1
COUNT = 60
RANGE = 25 – 1 = 24
K = √60 = 7.745967, say 7 or 8
Class width = 3 if K = 8; 4 if K = 7
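A sketch of the class-planning step using the worksheet figures (MAX = 25, MIN = 1, COUNT = 60). It shows the square-root rule used above, plus Sturges' rule (1 + 3.3 log N), which this module also cites. The width is rounded up as in the worksheet; the helper names are hypothetical.

```python
import math


def number_of_classes(n):
    """Two rules of thumb for K: square-root rule and Sturges' rule."""
    return math.sqrt(n), 1 + 3.3 * math.log10(n)


def class_width(data_range, k):
    """Round range / K up to a whole number, as in the worksheet (3 for K = 8, 4 for K = 7)."""
    return math.ceil(data_range / k)


data_range = 25 - 1
k_sqrt, k_sturges = number_of_classes(60)
print(round(k_sqrt, 6), round(k_sturges, 2))                   # 7.745967 6.87
print(class_width(data_range, 8), class_width(data_range, 7))  # 3 4
```

Whatever width you choose, check that the resulting classes contain both the minimum and the maximum. For example, 8 classes of width 3 starting at 1 end at 24 and miss the value 25. For N = 20, the same formulas give 4.47 and 5.29, which matches the audit-time solution later in this module.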
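Excel Sheet (Final)
Classes   True Class Boundaries   Class Mark, Xi   Frequency, fi   Cumulative Frequency (<)
0 – 3     -0.5 – 3.5              1.5              5               5
4 – 7     3.5 – 7.5               5.5              22              27
8 – 11    7.5 – 11.5              9.5              13              40
12 – 15   11.5 – 15.5             13.5             6               46
16 – 19   15.5 – 19.5             17.5             7               53
20 – 23   19.5 – 23.5             21.5             4               57
24 – 27   23.5 – 27.5             25.5             3               60
Total                                              60
Σ fiXi = 618; Σ fiXi² = 8887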
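A sketch that bins the 60 contractor values with upper class limits as bins. This mirrors the counting behaviour of Excel's FREQUENCY worksheet function: the first count is values ≤ the first bin, each later count is values above the previous bin and ≤ the current one, and a final extra count holds values above the last bin. The Python function `frequency` is a hypothetical stand-in, not part of any Excel API.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Mimic Excel's FREQUENCY(data_array, bins_array) for sorted bins:
    counts[0] = values <= bins[0]; counts[i] = values in (bins[i-1], bins[i]];
    the final extra element counts values above the last bin."""
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts


projects = [
    12, 6, 8, 23, 6, 7, 25, 7, 3, 3, 4, 1,
    18, 10, 14, 7, 19, 9, 6, 8, 4, 4, 6, 3,
    18, 13, 24, 7, 6, 8, 9, 7, 5, 5, 6, 5,
    19, 6, 8, 14, 8, 8, 14, 8, 6, 5, 21, 22,
    12, 8, 17, 10, 2, 17, 7, 7, 16, 10, 22, 25,
]
upper_limits = [3, 7, 11, 15, 19, 23, 27]  # upper class limits of 0-3, 4-7, ..., 24-27
print(frequency(projects, upper_limits))   # [5, 22, 13, 6, 7, 4, 3, 0]
```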
Excel Sheet (tentative)
Figure: Frequency histogram over the classes -0.5 – 3.5, 3.5 – 7.5, . . . , 23.5 – 27.5, with the modal reading marked
Coefficient of Skewness
$\text{Coef. of Skewness} = \dfrac{3(\text{mean} - \text{median})}{\text{Standard Deviation}}$
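A sketch applying this coefficient to the grouped results of the 60-contractor example, treated as a sample (s = √42.74). The example does not compute the coefficient itself, so this only illustrates the formula. A positive value indicates a distribution skewed to the right.

```python
import math


def pearson_skewness(mean, median, std_dev):
    """Coef. of skewness = 3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


# Grouped results from the 60-contractor example, treated as a sample.
mean_g = 618 / 60                                    # 10.3
median_g = 7.5 + 4 * (60 / 2 - 27) / 13              # about 8.42
s_g = math.sqrt((60 * 8887 - 618 ** 2) / (60 * 59))  # about 6.54
sk = pearson_skewness(mean_g, median_g, s_g)
print(round(sk, 2))  # 0.86 (positive: skewed to the right)
```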
Frequency Ogives
Figure: Cumulative frequency ogive plotted against class mark (1.5 to 25.5), with the median reading marked
Central Tendencies
1. Mean
$\mu_G = \dfrac{\sum f_i X_i}{\sum f_i} = \dfrac{618}{60} = 10.3$ contracts
2. Median
$Me_G = L_{me} + C\left(\dfrac{\frac{N}{2} - F_{cum}}{f_{me}}\right) = 7.5 + 4\left(\dfrac{\frac{60}{2} - 27}{13}\right) = 8.4$ contracts
3. Mode
$Mo_G = L_{mo} + C\left(\dfrac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right) = 3.5 + 4\left(\dfrac{22 - 5}{2(22) - 5 - 13}\right) = 6.1$ contracts
Variance:
1. As a population:
$\sigma_G^2 = \dfrac{N\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{N^2} = \dfrac{60(8887) - (618)^2}{60^2} = 42.03$
Standard Deviation: $\sigma_G = \sqrt{42.03} \approx 6.48$ contracts
2. As a sample:
$s_G^2 = \dfrac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n-1)} = \dfrac{60(8887) - (618)^2}{60(59)} = 42.74$
Standard Deviation: $s_G = \sqrt{42.74} \approx 6.54$ contracts
Percentile, Pi
➢Values that divide a set of observations into 100 equal parts.
➢Interpretation: These values, denoted by P1, P2, . . . , P99, are such that 1% of the data falls below P1, 2% falls below P2, . . . , 69% falls below P69, . . . , and 99% falls below P99.
Decile, Di
➢Values that divide a set of observations into 10 equal parts.
➢Interpretation: These values, denoted by D1, D2, . . . . , D9, are such that
10% of the data falls below D1, 40% of the data falls below D4, . . . . , 90% of
the data falls below D9.
Quartile, Qi
➢Values that divide a set of observations into 4 equal parts.
➢Interpretation: These values, denoted by Q1, Q2, and Q3, are such that 25% of
the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.
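A sketch computing quartiles, deciles, and percentiles with Python's `statistics.quantiles` on the Sample Problem 1 data. The choice of `method="inclusive"` is an assumption. Textbooks, Excel, and Python use different interpolation conventions, so hand-computed values may differ slightly.

```python
from statistics import quantiles

wrong_answers = [2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]  # Sample Problem 1 data

quartiles = quantiles(wrong_answers, n=4, method="inclusive")      # Q1, Q2, Q3
deciles = quantiles(wrong_answers, n=10, method="inclusive")       # D1 ... D9
percentiles = quantiles(wrong_answers, n=100, method="inclusive")  # P1 ... P99

print(quartiles)                       # [1.0, 2.0, 3.0]
print(len(deciles), len(percentiles))  # 9 99
print(deciles[3], percentiles[68])     # D4 and P69
```

Q2 equals the median (2), and the number of cut points is always one less than the number of equal parts.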
Grouped Data – Problem Solving
14 15 27 21 18
19 18 22 33 16
18 17 23 28 13
Grouped Data Analysis – Solution
Frequency distribution table (FDT)
Steps:
1. Determine the range of the data.
R = Highest Value – Lowest Value
R = 33 – 12 = 21
2. Determine the number of classes, K (some textbooks recommend between 5 and 20).
$K = \sqrt{N} = \sqrt{20} = 4.47$ or $K = 1 + 3.3\log N = 1 + 3.3\log 20 = 5.29$; say K = 5
Figure: Frequency histogram of audit time (days), with classes 10-14, 15-19, 20-24, 25-29, and 30-34
Notes: =FREQUENCY(A2:A21,D2:D6)
Enter it as an array formula with CTRL+SHIFT+ENTER.
Grouped Data – Table 2.7 Frequency Data
Grouped Data – Charting in MS Excel
Quiz
A portion of a frequency distribution table (FDT) is given below.
Classes Frequency
10.25 – 5
24.92 – 8
39.59 – 15
54.26 – 10
68.93 – 19
83.60 – 8
Required:
1. Construct the FDT.
2. Determine the mode of the data.
3. Determine the coefficient of skewness. Consider the data as a sample.