
ENGINEERING DATA ANALYSIS

Data Collection

Descriptive Statistics
    Measures of Location
    Measures of Dispersion
    Measures of Skewness & Kurtosis
Data Presentation
    Tabulation
    Diagrams
    Graphs
Inferential Statistics
    Estimation (point estimate, interval estimate)
    Hypothesis testing
    Univariate analysis
    Multivariate analysis
Useful for exploring a data set as well as for
reporting the final results of a study.
Measures of Central Tendency
or
Measures of Location
or
Measures of Averages
Descriptive Statistics
The goal of descriptive statistics is to summarize a
collection of data in a clear and understandable
way.
Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central
location of an entire distribution of scores.
The typical score.
The center of the distribution.
One distribution can have multiple locations where
scores cluster.
Must decide which measure is best for a given situation.
Central Tendency
Measures of Central Tendency:
Mean
The sum of all scores divided by the number of
scores.
Median
The value that divides the distribution in half when
observations are ordered.
Mode
The most frequent score.
Measure of central tendency
Arithmetic Mean (Mean)
Definition:
Sum of all the observations divided by the number
of the observations

The arithmetic mean is the most common measure
of the central location of a sample.

Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean

Population: $\mu = \frac{\Sigma X}{N}$
    $\mu$ ("mu"): the population mean
    $\Sigma X$ ("sigma"): the sum of X, add up all scores
    $N$: the total number of scores in a population

Sample: $\bar{X} = \frac{\Sigma X}{n}$
    $\bar{X}$ ("X bar"): the sample mean
    $\Sigma X$ ("sigma"): the sum of X, add up all scores
    $n$: the total number of scores in a sample
The Mean
PROPERTIES:
the arithmetic average of all the scores,
$(\Sigma X)/N$
the sum of the deviations from the mean,
$\Sigma (X_i - \bar{X})$, is equal to 0.
the sum of squared deviations,
$\Sigma (X_i - \bar{X})^2$, is at its minimum when taken about the mean.
Mean reflects magnitude of every observation.
Easily affected by presence of extreme values.
Calculating the Mean for
Grouped Data

$\bar{X} = \frac{\sum fX}{N}$

where: fX = a score multiplied by its frequency

Mean affected by extreme values

[Figure: two data sets plotted on number lines. First data set: Mean = 5. Second data set, which includes an extreme value: Mean = 6.]
Calculating the Mean
Calculate the mean of the following data:
1 5 4 3 2
Sum the scores ($\Sigma X$):
1 + 5 + 4 + 3 + 2 = 15
Divide the sum ($\Sigma X = 15$) by the number of
scores (N = 5):
15 / 5 = 3
Mean = $\bar{X}$ = 3
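The steps above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `mean` is chosen here for the example.

```python
def mean(scores):
    """Arithmetic mean: the sum of all scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


scores = [1, 5, 4, 3, 2]
print(sum(scores))   # sum of the scores: 15
print(mean(scores))  # 15 / 5 = 3.0
```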
Calculating the Mean for
Grouped Data
Find the mean of the following data:

Score   Number of students
10       3
9       10
8        9
7        8
6       10
5        2

Mean =
[3(10)+10(9)+9(8)+8(7)+10(6)+2(5)]/42 = 7.57
Calculating the Mean for
Grouped Data
Find the mean of the following data:

CI        Frequency (f)   Class Midpoint (CM), X   fX
40-50     4               45                       180
51-61     3               56                       168
62-72     6               67                       402
73-83     8               78                       624
84-94     7               89                       623
95-105    2               100                      200
TOTAL     30                                       2197
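As a rough check of the grouped-data formula, the sketch below applies mean = (sum of fX) / N to both tables: the score table and the class-interval table, where each class is represented by its midpoint. The helper name `grouped_mean` is an assumption made for this example.

```python
def grouped_mean(values, freqs):
    """Mean of grouped data: sum of (frequency * value) divided by total frequency."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Score table: score -> number of students
scores = [10, 9, 8, 7, 6, 5]
students = [3, 10, 9, 8, 10, 2]

# Class-interval table: class midpoint -> frequency
midpoints = [45, 56, 67, 78, 89, 100]
freqs = [4, 3, 6, 8, 7, 2]

print(round(grouped_mean(scores, students), 2))  # 318 / 42 = 7.57
print(round(grouped_mean(midpoints, freqs), 2))  # 2197 / 30 = 73.23
```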
The Median
The median is simply another name for the
50th percentile
It is the score in the middle; half of the scores
are larger than the median and half of the scores
are smaller than the median
Not affected by extreme values

[Figure: two data sets plotted on number lines. First data set: Median = 5. Second data set, which includes an extreme value: Median = 5.]
The Median

Properties
It is a positional value and hence is not affected by the
presence of extreme values.
The sum of absolute deviations from a point is at a minimum
when that point is the median.
May not be an actual value in the data (for example, when
n is even and the two middle scores differ).
Good with ordinal data.
How To Calculate the Median
Conceptually, it is easy to calculate the median
There are many minor problems that can occur; it
is best to let a computer do it
Sort the data from highest to lowest
Find the score in the middle
middle = (n + 1) / 2
If n, the number of scores, is even the median is
the average of the middle two scores
Calculating the Median for
Grouped Data

$\text{Median} = l + \left( \frac{N/2 - cf}{f} \right) h$

• To use this formula, first determine the median class:
the first class whose less-than cumulative frequency (<cF)
is equal to or greater than N / 2;
• l = lower true class boundary (TCB) of the median class;
• cf = less-than cumulative frequency of the class before
the median class;
• f = frequency of the median class (f_md);
• h = class width.
Median Example
What is the median of the following scores:
10 8 14 15 7 3 3 8 12 10 9
Sort the scores:
15 14 12 10 10 9 8 8 7 3 3
Determine the middle score:
middle = (n + 1) / 2 = (11 + 1) / 2 = 6
Middle score = median = 9
Median Example
What is the median of the following scores:
24 18 19 42 16 12
Sort the scores:
42 24 19 18 16 12
Determine the middle score:
middle = (n + 1) / 2 = (6 + 1) / 2 = 3.5
Median = average of 3rd and 4th scores:
(19 + 18) / 2 = 18.5
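Both worked examples can be reproduced with a short sketch. The function name `median` is chosen for illustration. The code sorts in ascending order rather than highest to lowest, which does not change the middle position.

```python
def median(scores):
    """Middle score of the sorted data; average of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle scores


print(median([10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]))  # 9
print(median([24, 18, 19, 42, 16, 12]))                # 18.5
```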
Calculating the Median for
Grouped Data
Find the median of the following data:

CI        f     TCB           CM    <cF
40-50     4     39.5-50.5     45    4
51-61     3     50.5-61.5     56    7
62-72     6     61.5-72.5     67    13
73-83     8     72.5-83.5     78    21
84-94     7     83.5-94.5     89    28
95-105    2     94.5-105.5    100   30
TOTAL     30
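A minimal sketch of the grouped-median formula, Median = l + ((N/2 - cf) / f) * h, applied to this table. The function name `grouped_median` and its argument layout are assumptions for illustration. It expects the lower true class boundaries in ascending order and a common class width.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Grouped median: l + ((N/2 - cf) / f) * h, using the first class whose
    cumulative frequency reaches N / 2 as the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= half:
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("median class not found")


lower_boundaries = [39.5, 50.5, 61.5, 72.5, 83.5, 94.5]
freqs = [4, 3, 6, 8, 7, 2]

# Median class is 73-83: l = 72.5, cf = 13, f = 8, h = 11
print(grouped_median(lower_boundaries, freqs, 11))  # 72.5 + (2 / 8) * 11 = 75.25
```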
The Mode
The mode is the score that occurs most frequently in
a set of data
Not Affected by Extreme Values
There May Not be a Mode
There May be Several Modes (Unimodal, Bimodal,
Multimodal)
Used for Either Numerical or Categorical Data

[Figure: two data sets plotted on number lines. First data set: Mode = 9. Second data set: No Mode.]
The Mode
The mode is not a very useful measure of
central tendency (determined by frequency)
It is insensitive to large changes in the data set
That is, two data sets that are very different from
each other can have the same mode
The mode is primarily used with nominally
scaled data
It is the only measure of central tendency that is
appropriate for nominally scaled data
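The points above (several modes, no mode, nominal data) can be illustrated with a small sketch. The helper `modes` is hypothetical, and treating "no value repeats" as "no mode" is a convention assumed here. The material names in the nominal example are made up.

```python
from collections import Counter


def modes(data):
    """Return all most-frequent values; an empty list when no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # every value occurs once: treated here as "no mode"
    return [value for value, count in counts.items() if count == top]


print(modes([10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]))    # [10, 8, 3] (multimodal)
print(modes([0, 1, 2, 3, 4, 5, 6]))                     # [] (no mode)
print(modes(["steel", "copper", "steel", "aluminum"]))  # ['steel'] (nominal data)
```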
Calculating the Mode for
Grouped Data

$\text{Mode} = l + \left( \frac{f_m - f_1}{2 f_m - f_1 - f_2} \right) h$

To use this formula, first determine the modal class.
The modal class is the class with the maximum
frequency;
l = lower true class boundary (TCB) of the modal class;
f_m = maximum frequency;
f_1 = frequency of the pre-modal class;
f_2 = frequency of the post-modal class;
h = class width.
Calculating the mode for
Grouped Data
Find the mode of the following data:

CI        f     TCB           CM    <cF
40-50     4     39.5-50.5     45    4
51-61     3     50.5-61.5     56    7
62-72     6     61.5-72.5     67    13
73-83     8     72.5-83.5     78    21
84-94     7     83.5-94.5     89    28
95-105    2     94.5-105.5    100   30
TOTAL     30
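A sketch of the grouped-mode formula applied to this table. The function name `grouped_mode` is an assumption. It takes the first class with the maximum frequency as the modal class, treats a missing neighbour class as frequency 0, and does not handle the degenerate case where the denominator is zero.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Grouped mode: l + ((fm - f1) / (2*fm - f1 - f2)) * h."""
    i = freqs.index(max(freqs))                     # modal class (first maximum)
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # pre-modal class frequency
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # post-modal class frequency
    return lower_boundaries[i] + (fm - f1) / (2 * fm - f1 - f2) * width


lower_boundaries = [39.5, 50.5, 61.5, 72.5, 83.5, 94.5]
freqs = [4, 3, 6, 8, 7, 2]

# Modal class is 73-83: l = 72.5, fm = 8, f1 = 6, f2 = 7, h = 11
print(round(grouped_mode(lower_boundaries, freqs, 11), 2))  # 72.5 + (2 / 3) * 11 = 79.83
```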
Measures of Skewness and

Measures of Kurtosis
Skewness
One way of studying the skewness of a frequency distribution
table (FDT) is to compare the values of the mean, median and mode.

Measure of Skewness – the extent to which the items depart
from a symmetrical distribution. It describes both the degree
and the direction of the asymmetry.
Symmetrical Distribution
Asymmetrical or Skewed Distribution
Relation Between
Mean, Median & Mode
In symmetrical distributions,
the median and mean are
equal
For normal distributions, mean =
median = mode
In positively skewed
distributions, the mean is
greater than the median
mode < median < mean

In negatively skewed
distributions, the mean is
smaller than the median
mean < median < mode
Shape of Curve
Describes How Data Are Distributed
Measures of Shape:
Symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
Example of Skewed Distribution

CI        f
40-50     4
51-61     3
62-72     6
73-83     8
84-94     7
95-105    2
TOTAL     30

Mean, $\bar{X}$ = 73.23
Median = 75.25
Mode = 79.83
SD = 15.99 (population formula; 16.26 with n - 1)

Coefficient of Skewness:
Sk = 3(mean - median)/SD
Sk < 0 (negatively/left skewed)
Sk = 0 (not skewed or symmetrical)
Sk > 0 (positively/right skewed)
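As an illustration, the sketch below evaluates Pearson's second coefficient of skewness, 3(mean - median)/SD, for this table. It assumes the population standard deviation, computed from the sum of f(X - mean)^2 = 7667.3670 given in the grouped variance table with N = 30. The function name is hypothetical.

```python
import math


def pearson_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd


# Assumption: population SD from the grouped variance table (7667.3670 / 30)
sd = math.sqrt(7667.3670 / 30)
sk = pearson_skewness(73.23, 75.25, sd)

print(round(sd, 2))  # 15.99
print(round(sk, 2))  # -0.38, so the distribution is negatively (left) skewed
```

The negative sign agrees with the ordering mean < median < mode (73.23 < 75.25 < 79.83).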
Kurtosis
Denoted by K.
Flatness or peakedness of one distribution in relation to
another.
Types of distribution:
a. Leptokurtic, K > 3: more peaked (implies data are
concentrated towards the mean, homogeneous)
b. Platykurtic, K < 3: less peaked (implies data are more
dispersed, heterogeneous)
c. Mesokurtic, K = 3: falls between a and b and approximates the
normal distribution
Shape of Curve
Describes How Data Are Distributed
Measures of Shape:
Leptokurtic, Platykurtic, Mesokurtic
[Figure: three curves compared. Leptokurtic (most peaked), Mesokurtic (normal, Mean = Median = Mode), and Platykurtic (flattest).]
Example of Skewed Distribution

CI        f
40-50     4
51-61     3
62-72     6
73-83     8
84-94     7
95-105    2
TOTAL     30

Mean, $\bar{X}$ = 73.23
Median = 75.25
Mode = 79.83
SD = 15.99 (population formula; 16.26 with n - 1)

In terms of excess kurtosis (K - 3):
K - 3 < 0 (platykurtic)
K - 3 = 0 (mesokurtic)
K - 3 > 0 (leptokurtic)
Measures of Dispersion or
Measures of Variability/Variation
Dispersion/Variability

These simply describe the spread or variability of the observations
in the data set.
A. Range
B. Variance
C. Standard Deviation
D. Coefficient of Variation
RANGE
•Highest value – lowest value (ungrouped data set)
•Upper limit of the highest CI – Lower limit of the
lowest class (grouped data)
•Properties:
•1. Quick but rough measure of dispersion.
•2. The larger the range, the more dispersed the
observations.
•3. It considers only the lowest and the highest values.
Calculating Range

•Coefficient of range:
•$\text{COR} = \frac{\text{maximum} - \text{minimum}}{\text{maximum} + \text{minimum}}$
Example: {12, 15, 20, 28, 30, 40, 50}
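For the example data set, the range and coefficient of range can be worked out as in the sketch below. The function names are chosen for illustration.

```python
def value_range(data):
    """Range of ungrouped data: highest value minus lowest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """COR = (maximum - minimum) / (maximum + minimum)."""
    return (max(data) - min(data)) / (max(data) + min(data))


data = [12, 15, 20, 28, 30, 40, 50]
print(value_range(data))                     # 50 - 12 = 38
print(round(coefficient_of_range(data), 3))  # 38 / 62 = 0.613
```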
Variance
•Mean of the squared deviations of the observations from the mean
•Important Measure of Variation
•Shows Variation About the Mean:
•For the Population (use N in the denominator):
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
•For the Sample (use n - 1 in the denominator):
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Variance
•Properties:
•1. Always non-negative
•2. Easy to manipulate for further mathematical
treatment.
•3. Makes use of all observations.
•4. Expressed in squared units.
Calculating Ungrouped Data
Variance

•For Ungrouped: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$

•Find the variance of the given data set:
•{53, 49, 57, 49, 50}
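A minimal sketch of both variance formulas for this data set. The function name and the `sample` flag are assumptions for illustration. With `sample=False` the denominator is N, and with `sample=True` it is n - 1.

```python
def variance(data, sample=False):
    """Mean of squared deviations from the mean (N or n - 1 in the denominator)."""
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two observations")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


data = [53, 49, 57, 49, 50]                   # mean = 51.6, sum of squared deviations = 47.2
print(round(variance(data), 2))               # population: 47.2 / 5 = 9.44
print(round(variance(data, sample=True), 2))  # sample: 47.2 / 4 = 11.8
```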
Grouped Data

For the Population (use N in the denominator):
$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$
For the Sample (use n - 1 in the denominator):
$s^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1}$
Calculating Grouped Data Variance
•Find the variance of the following table:

CI        f     CM (Xi)   (Xi - mean)   (Xi - mean)²   f(Xi - mean)²
40-50     4     45        -28.23        796.9329       3187.7316
51-61     3     56        -17.23        296.8729       890.6187
62-72     6     67        -6.23         38.8129        232.8774
73-83     8     78        4.77          22.7529        182.0232
84-94     7     89        15.77         248.6929       1740.8503
95-105    2     100       26.77         716.6329       1433.2658
TOTAL     30                                           7667.3670
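The table total can be turned into a variance and standard deviation as sketched below. The helper `grouped_variance` is hypothetical. It uses the unrounded mean (2197 / 30), so its sum of squares differs from the table's 7667.3670 only in the fourth decimal place.

```python
import math


def grouped_variance(midpoints, freqs, sample=False):
    """Grouped variance: sum of f * (X - mean)^2 divided by N (or n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return sum_sq / (n - 1 if sample else n)


midpoints = [45, 56, 67, 78, 89, 100]
freqs = [4, 3, 6, 8, 7, 2]

pop_var = grouped_variance(midpoints, freqs)
samp_var = grouped_variance(midpoints, freqs, sample=True)
print(round(pop_var, 2), round(math.sqrt(pop_var), 2))    # 255.58 15.99
print(round(samp_var, 2), round(math.sqrt(samp_var), 2))  # 264.39 16.26
```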
Standard Deviation

•Important Measure of Variation
•Shows Variation About the Mean:
•For the Population:
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
•For the Sample:
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Calculating ungrouped data
(SD)
•For Ungrouped:
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
• Example: {53, 49, 57, 49, 50}
Calculating grouped data (SD)
•For Grouped:

CI        f     CM (Xi)   (Xi - mean)   (Xi - mean)²   f(Xi - mean)²
40-50     4     45        -28.23        796.9329       3187.7316
51-61     3     56        -17.23        296.8729       890.6187
62-72     6     67        -6.23         38.8129        232.8774
73-83     8     78        4.77          22.7529        182.0232
84-94     7     89        15.77         248.6929       1740.8503
95-105    2     100       26.77         716.6329       1433.2658
TOTAL     30                                           7667.3670
Coefficient of Variation

Measure of Relative Variation
Always expressed as a percentage (%)
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (for Sample):
$CV = \left( \frac{SD}{\bar{X}} \right) \cdot 100\%$
Find the Variance, SD & CV

1. Five test scores for Calculus I: 95, 83, 92, 81, 75.
2. The retirement ages of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
3. Scores on 10-point quizzes from MAT117:
9, 6, 7, 10, 9, 4, 9, 2, 9, 10, 7, 7, 5, 6, 7
4. 11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165
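Hand-worked answers to exercises like these can be checked with Python's standard `statistics` module. The sketch below uses the data set {53, 49, 57, 49, 50} from the ungrouped variance example rather than the exercise data, and it assumes the sample formulas (n - 1 in the denominator). The helper name `summarize` is made up for this example.

```python
import statistics


def summarize(data):
    """Return (sample variance, sample SD, CV in percent) for ungrouped data."""
    mean = statistics.mean(data)
    var = statistics.variance(data)  # sample variance, n - 1 in the denominator
    sd = statistics.stdev(data)      # square root of the sample variance
    cv = sd / mean * 100             # coefficient of variation, in percent
    return var, sd, cv


var, sd, cv = summarize([53, 49, 57, 49, 50])
print(round(var, 2))  # 11.8
print(round(sd, 2))   # 3.44
print(round(cv, 2))   # 6.66 (percent)
```

For the population versions, `statistics.pvariance` and `statistics.pstdev` use N in the denominator instead.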
Find the Mean, Median, Mode,
Variance, SD & CV (Grouped)
Exam marks for 60 students (marked out of 65)

mean = 30.3 sd = 14.46

Group Frequency Table

Marks                  Frequency   Percent
0 but less than 10     4           6.7
10 but less than 20    9           15.0
20 but less than 30    17          28.3
30 but less than 40    15          25.0
40 but less than 50    9           15.0
50 but less than 60    5           8.3
60 or over             1           1.7
Total                  60          100.0
Measures of Position
Position

• Quartiles, Q (1, 2, 3)
• Deciles, D (1, 2, 3, 4, 5, 6, 7, 8, 9)
• Percentiles, P (1, 2, 3, 4, 5, … 99)
Quartiles, denoted by Q, are the three score points that divide
the distribution into four equal parts.
Contents:
a. Lower quartile, Q₁ (first quartile) – 25% of scores fall below it
b. Middle quartile, Q₂ (second quartile) – 50% of scores fall below it
(median)
c. Upper quartile, Q₃ (third quartile) – 75% of scores fall below it
Deciles are the nine score points that divide the distribution
into ten equal parts. (D₁, D₂, D₃, …, D₉)
Contents:
a. D₁ (first decile) – 10% of scores fall below it
b. D₂ (second decile) – 20% of scores fall below it

Percentiles are the 99 score points that divide the distribution
into 100 equal parts. (P₁, P₂, P₃, …, P₉₉)
Contents:
a. P₁ (first percentile) – 1% of scores fall below it
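Quartiles, deciles and percentiles can all be obtained as cut points with `statistics.quantiles` from Python's standard library (Python 3.8 or later). Several conventions exist for locating these points, and the slides do not say which one is intended. The sketch below assumes the library's default "exclusive" method, which places the k-th cut point at position k(n + 1)/parts in the sorted data. The data are the retirement ages from the exercises.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

quartiles = statistics.quantiles(ages, n=4)      # three cut points: Q1, Q2, Q3
deciles = statistics.quantiles(ages, n=10)       # nine cut points: D1 ... D9
percentiles = statistics.quantiles(ages, n=100)  # 99 cut points: P1 ... P99

print(quartiles)                              # [54.0, 57.0, 58.0]
print(len(deciles), len(percentiles))         # 9 99
print(quartiles[1] == statistics.median(ages))  # True: Q2 is the median
```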

