DATA DESCRIPTION
7.1 INTRODUCTION TO DATA
LEARNING OUTCOMES
• At the end of this topic, students should be able to:
• Identify discrete and continuous data.
• Identify ungrouped and grouped data.
• Construct and interpret stem and leaf diagrams.
Population
Parameter
Sample
Example
The population is all students from our matriculation college, as shown below.
Sample 1 – girls’ marks in Maths
Sample 2 – boys’ weights
Sample 3 – boys’ heights
Sample 4 – girls’ heights
Variable
Qualitative Data
Qualitative data are not in numerical form; instead they are
recorded as attributes, such as man or woman, blue or black,
and yes or no.
Quantitative Data
Example 3
Shoe size is a discrete variable, e.g. 5, 5½, 6, 6½, etc., with no values in between.
Example 4
Temperature is a continuous variable.
Example 5 Example 6
For the value 386:
STEM: 38, the leading digits (used in sorting)
LEAF: 6, the trailing digit (shown in display)
• 56 • 58
• 61 • 58
• 61 • 63
• 60 • 61
• 59 • 59
• 57
The ordered numbers from least to greatest are 56,
57, 58, 58, 59, 59, 60, 61, 61, 61, 63
MEDIAN: Find the middle number in a set of numbers.
MODE: Find the number that occurs most often in a set of numbers.
The median is 59 inches.
The mode is 61 inches.
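The stem and leaf display, median and mode for the eleven heights listed above can be checked with a short Python sketch. The function name `stem_and_leaf` is illustrative only, and the sketch assumes two-digit whole numbers so that the last digit is the leaf.

```python
from collections import Counter, defaultdict

heights = [56, 58, 61, 58, 61, 63, 60, 61, 59, 59, 57]  # inches, from the list above

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 56 -> stem 5, leaf 6
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(heights)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))

ordered = sorted(heights)
median = ordered[len(ordered) // 2]           # middle value; n = 11 is odd
mode = Counter(heights).most_common(1)[0][0]  # most frequent value
print("median =", median, "mode =", mode)
```

The display has stem 5 with leaves 6 7 8 8 9 9 and stem 6 with leaves 0 1 1 1 3, and the sketch reproduces the median of 59 and the mode of 61.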
7.2 MEASURES OF LOCATION (For Ungrouped Data)
a) Find and interpret the mean, mode, median, quartiles and percentiles for
ungrouped data
c) Find and interpret the mean, mode, median, quartiles and percentiles for
grouped data
Ungrouped Data
Mean
The mean of a set of data $x_1, x_2, x_3, \ldots, x_n$ is
defined as
$$\bar{x} = \frac{\text{Sum of all data}}{\text{Number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
Example 3
$$\bar{x} = \frac{\sum x}{n}$$
(b) Find the mean of a set of data
$$\bar{x} = \frac{\sum x}{n}$$
Example
Given that the mean of a set of data 4, 7, 8, 11, x, 18, 9 and 10 is 10, find the
value of x.
Mean, $\bar{x} = 10$
$$\frac{4 + 7 + 8 + 11 + x + 18 + 9 + 10}{8} = 10$$
$$\frac{67 + x}{8} = 10$$
$$67 + x = 80$$
$$x = 13$$
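The same rearrangement can be sketched in Python: the required total is the mean times the number of values, and the missing value is that total minus the sum of the known values. The helper name `missing_value` is an assumption made for illustration.

```python
def missing_value(known_values, target_mean):
    """Return the single unknown value that gives the whole set the target mean."""
    n = len(known_values) + 1            # the unknown counts as one more value
    required_total = target_mean * n     # sum of all data = mean * n
    return required_total - sum(known_values)

known = [4, 7, 8, 11, 18, 9, 10]         # the data above, without x
x = missing_value(known, 10)
print(x)  # 13
```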
Mode
The mode of a set of data is the value
that occurs most frequently.
Example 4
Find the mode for the following set of data.
a) 5, 2, 3, 3, 5, 4, 28, 5
Solution :
b) 2, 3, 5, 8, 10
Solution :
Solution :
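As a rough check on parts a) and b), the mode can be found by counting how often each value occurs. This is only a sketch: the name `modes` is illustrative, and it assumes the convention that a data set has no mode when every value occurs equally often.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] if all values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumed convention: several distinct values, all tied, means "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([5, 2, 3, 3, 5, 4, 28, 5]))  # [5]
print(modes([2, 3, 5, 8, 10]))           # []
```

In a) the value 5 occurs three times, more than any other value; in b) every value occurs once, so under the assumed convention there is no mode.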
Median
The median is the middle value when
a set of data is arranged in order of
magnitude.
$$\text{median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} = x_{\frac{n+1}{2}}$$
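(a) If n is an odd number, the median is the $\left(\frac{n+1}{2}\right)$th value.
$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}$$
(b) If n is an even number, the median is the average of the $\left(\frac{n}{2}\right)$th value and the $\left(\frac{n}{2}+1\right)$th value.
$$\text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ value}\right]$$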
• Half, or 50%, of the values in a set of data are smaller than the median
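The odd and even cases can be sketched in Python as follows. The positions in the formulas count from 1, while Python lists count from 0, so the index is shifted by one. The second data set is a made-up illustration, not taken from these notes.

```python
def median(data):
    """Median of ungrouped data, following the odd/even rule for n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                     # the (n+1)/2-th value (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of n/2-th and (n/2+1)-th

print(median([56, 58, 61, 58, 61, 63, 60, 61, 59, 59, 57]))  # n = 11 (odd): 59
print(median([2, 4, 6, 8]))                                  # n = 4 (even): 5.0
```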
Example
Example 5
Find the median for the following set of data.
Percentiles
The values that divide a list of
arranged numbers into 100 equal
parts.
Find the first, second and third quartile of the following data:
a) 5, 8, 4, 4, 6, 3, 8
Example 8:
Find the 15th, 25th and 60th percentile of the following data:
3, 4, 5, 7, 2, 1, 6, 9, 2, 8, 6, 8
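Several conventions exist for locating percentiles in ungrouped data, and these notes do not show which one is intended. The sketch below assumes one common textbook rule: compute the position kn/100; if it is a whole number, average that value and the next one; otherwise round the position up. Quartiles are treated as the 25th, 50th and 75th percentiles. The function name `percentile` and the rule itself are assumptions, so answers may differ under another convention.

```python
def percentile(data, k):
    """k-th percentile under the assumed rule (position = k*n/100, 1-based)."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        # Whole-number position: average that value and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round the position up; as a 0-based index that is `whole`.
    return ordered[whole]

example_8 = [3, 4, 5, 7, 2, 1, 6, 9, 2, 8, 6, 8]
print([percentile(example_8, k) for k in (15, 25, 60)])

quartile_data = [5, 8, 4, 4, 6, 3, 8]
print([percentile(quartile_data, k) for k in (25, 50, 75)])
```

Under this assumed rule the percentiles for Example 8 come out as 2, 2.5 and 6, and the quartiles of the seven-value data set as 4, 5 and 8.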
7.2 MEASURES OF LOCATION (For Grouped Data)
LEARNING OUTCOMES
1) Find and interpret the mean, mode, median, quartiles and percentiles for
grouped data.
Mean
• If a set of grouped data is given as a frequency distribution, for
example in the form of class intervals, then
$$\text{mean} = \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k}$$
$$\text{Mean, } \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
$$\text{Mode} = \hat{x} = L + \left(\frac{d_1}{d_1 + d_2}\right) C$$
L = lower class boundary of the modal class
d1 = the frequency difference between the modal class and the class
before it.
d2 = the frequency difference between the modal class and the class
after it.
C = class width
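A minimal Python sketch of this formula is given below, applied to the frequency table from Example 20 later in these notes. It assumes class boundaries are obtained by the usual 0.5 adjustment (so 20-29 becomes 19.5 to 29.5), takes the first class with the highest frequency as the modal class, and treats a missing neighbouring class as having frequency 0. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Estimate the mode with L + d1 / (d1 + d2) * C."""
    i = frequencies.index(max(frequencies))  # first modal class
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - before             # difference with the class before
    d2 = frequencies[i] - after              # difference with the class after
    return lower_boundaries[i] + d1 / (d1 + d2) * width

# Example 20 table: classes 10-19, 20-29, ..., 60-69 (boundaries assumed)
lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freq = [5, 7, 4, 4, 3, 2]
print(grouped_mode(lower, freq, 10))
```

Here the modal class is 20-29 with L = 19.5, d1 = 7 - 5 = 2 and d2 = 7 - 4 = 3, so the estimate is 19.5 + (2/5)(10) = 23.5.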
Mode from histogram
[Figure: histogram with frequency on the vertical axis and class boundaries on the horizontal axis, marking L, d1, d2, C and the mode.]
d1 and d2 are the frequency differences between the modal class and the
classes before and after it.
C is the width of the modal class.
Median
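• Since the original information of the raw data is lost when the
data is grouped, the median of grouped data can only be
estimated.
$$\text{Median} = \tilde{x} = L + \left(\frac{\frac{n}{2} - F}{f}\right) C$$
L = lower class boundary of the median class
F = cumulative frequency before the median class
f = frequency of the median class
C = class width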
[Figure: number line from the smallest data value to the largest data value, divided into four parts at Q1, Q2 and Q3.]
$$Q_k = L_k + \left(\frac{\frac{k}{4}(n) - F_k}{f_k}\right) C_k, \quad k = 1, 2, 3$$
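$$Q_k = \left(\frac{k}{4}\, n\right)^{\text{th}} \text{ observation}; \quad k = 1, 2, 3$$
First quartile: $Q_1 = \left(\frac{1}{4}\, n\right)^{\text{th}}$ observation
Second quartile: $Q_2 = \left(\frac{2}{4}\, n\right)^{\text{th}}$ observation
Third quartile: $Q_3 = \left(\frac{3}{4}\, n\right)^{\text{th}}$ observation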
Percentile
Percentiles divide a set of data which are arranged in
ascending order into 100 equal parts, denoted by
P1, P2, P3, ..., P98, P99.
[Figure: number line from the smallest data value to the largest data value, divided at P1, P2, P3, ..., P98, P99 into parts of 1% each.]
The kth percentile, Pk, can be calculated by using the formula
$$P_k = L_k + \left(\frac{\frac{k}{100}(n) - F_k}{f_k}\right) C_k, \quad k = 1, 2, 3, \ldots, 99$$
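The median, quartile and percentile formulas for grouped data share one pattern: find the class containing the (k/parts)·n-th observation, then interpolate inside it. The Python sketch below captures that pattern with parts = 2, 4 or 100. It uses the Example 20 table from later in these notes and assumes class boundaries from the usual 0.5 adjustment; the function name `grouped_quantile` is illustrative.

```python
def grouped_quantile(lower_boundaries, frequencies, width, k, parts):
    """Estimate L + ((k/parts)*n - F) / f * C for the class holding that position."""
    n = sum(frequencies)
    target = k * n / parts          # position of the required observation
    cumulative = 0                  # F: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k/parts must lie between 0 and 1")

# Example 20 table: classes 10-19, 20-29, ..., 60-69 (boundaries assumed)
lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freq = [5, 7, 4, 4, 3, 2]

median = grouped_quantile(lower, freq, 10, 1, 2)
q1 = grouped_quantile(lower, freq, 10, 1, 4)
q3 = grouped_quantile(lower, freq, 10, 3, 4)
p10 = grouped_quantile(lower, freq, 10, 10, 100)
print(median, q1, q3, q3 - q1, p10)
```

For this table n = 25, so the median position is 12.5, which falls in the class 30-39 (L = 29.5, F = 12, f = 4) and gives 29.5 + (0.5/4)(10) = 30.75. The interquartile range is Q3 - Q1.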
Example 9
Calculate:
a) Mean
b) Median
c) Mode
d) Q1 and Q3
e) Interquartile range
f) P10
Example 11
Calculate:
a) Mean
b) Median
c) Mode
d) Q1 and Q3
e) Interquartile range
f) P70
7.3 MEASURES OF DISPERSION
Measures of Dispersion
[Figure: two distributions centred on the same mean, one with smaller dispersion and one with larger dispersion.]
Consider the following Math quiz scores of students
from two classes, MS1 and MD1:
MS1   3   4   5   6   8   9   10   12   15
MD1   3   7   7   7   8   8   8    9    15
Sample Variance
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
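The quiz scores of MS1 and MD1 make a convenient check of this formula: both classes have the same mean, but their variances differ. The Python sketch below applies the formula directly; the function name `sample_variance` is illustrative.

```python
def sample_variance(data):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    total = sum(data)
    total_of_squares = sum(x * x for x in data)
    return (total_of_squares - total ** 2 / n) / (n - 1)

ms1 = [3, 4, 5, 6, 8, 9, 10, 12, 15]
md1 = [3, 7, 7, 7, 8, 8, 8, 9, 15]

print(sum(ms1) / len(ms1), sample_variance(ms1))  # 8.0 15.5
print(sum(md1) / len(md1), sample_variance(md1))  # 8.0 9.75
```

Both means are 8, but MS1 has the larger variance (15.5 against 9.75), so its scores are more widely dispersed about the mean.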
Sample Variance
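For grouped data with $k$ classes:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$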
xi : class midpoint
fi : class frequency
n : the number of data values
Example
Example 19
The data below represent the number of kilometers
that 20 runners ran during a week. Find the variance.
Skewness is a lack of symmetry in a distribution.
In a symmetrical distribution, the mean, mode
and median are equal to each other:
$$\bar{x} = \hat{x} = \tilde{x}$$
A set of observations that is not symmetrically distributed is said to
be skewed. It is positively skewed if a greater proportion of the
observations are less than or equal to (as opposed to greater than
or equal to) the mean; this indicates that the mean is larger than
the median. The histogram of a positively skewed distribution will
generally have a long right tail; thus, this distribution is also known
as being skewed to the right.
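[Figure: positively skewed frequency curve; vertical axis: frequency, horizontal axis: variable; the mode, median and mean are marked from left to right.]
Concept of Skewness
The longer tail occurs to the right of the curve:
$$\hat{x} < \tilde{x} < \bar{x}$$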
On the other hand, a negatively skewed distribution has more
observations that are greater than or equal to the mean. Such a
distribution has a mean that is less than the median. The
histogram of a negatively skewed distribution will generally have a
long left tail; thus, the phrase skewed to the left is applied here.
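[Figure: negatively skewed frequency curve; vertical axis: frequency, horizontal axis: variable; the mean, median and mode are marked from left to right.]
The longer tail occurs to the left of the curve:
$$\bar{x} < \tilde{x} < \hat{x}$$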
Note that when distributions are skewed, the median generally lies
between the mode and the mean. Pearson's coefficient of skewness, sk,
is computed as:
$$sk = \frac{\text{mean} - \text{mode}}{\text{std dev}}$$
Or
$$sk = \frac{3(\text{mean} - \text{median})}{\text{std dev}}$$
If sk = 0, the distribution is symmetric.
If sk > 0, the distribution is positively skewed.
If sk < 0, the distribution is negatively skewed.
Example 20
Class Intervals   Frequency
10-19             5
20-29             7
30-39             4
40-49             4
50-59             3
60-69             2
Find:
a) Mean
b) Median
c) Mode
d) Variance
e) Standard deviation
f) Pearson’s coefficient of skewness, and interpret your answer
g) State, with reason, whether the mean or the median is a better
measure of location
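One way to check parts a) to f) is the Python sketch below. It is not the official solution: it assumes class boundaries from the usual 0.5 adjustment (9.5, 19.5, ...), class midpoints as the x values, and the grouped formulas for the mean, median, mode and sample variance given earlier in these notes.

```python
import math

# Example 20 table; boundaries assume the usual 0.5 adjustment to class limits.
lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freq = [5, 7, 4, 4, 3, 2]
width = 10

mid = [b + width / 2 for b in lower]                 # class midpoints x_i
n = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, mid))
sum_fx2 = sum(f * x * x for f, x in zip(freq, mid))

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# Median: interpolate inside the class containing the (n/2)-th observation.
cumulative = 0
for b, f in zip(lower, freq):
    if cumulative + f >= n / 2:
        median = b + (n / 2 - cumulative) / f * width
        break
    cumulative += f

# Mode: L + d1 / (d1 + d2) * C for the class with the highest frequency.
i = freq.index(max(freq))
d1 = freq[i] - (freq[i - 1] if i > 0 else 0)
d2 = freq[i] - (freq[i + 1] if i < len(freq) - 1 else 0)
mode = lower[i] + d1 / (d1 + d2) * width

sk_mode = (mean - mode) / std_dev
sk_median = 3 * (mean - median) / std_dev
print(mean, median, mode, variance, round(std_dev, 3))
print(round(sk_mode, 3), round(sk_median, 3))
```

Under these assumptions the sketch gives a mean of 34.1, a median of 30.75, a mode of 23.5, a variance of 254 and a standard deviation of about 15.94. Both versions of the coefficient are positive (about 0.67 and 0.63), which suggests a positively skewed distribution; for part g) that would point towards the median, since the mean is pulled towards the long right tail.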
Example 21
The following table gives the cumulative frequency distribution for the
ANSWER:
a) mean = 10.72, median = 10.808, standard deviation = 3.812
b) sk = -0.069 (slightly skewed to the left; almost symmetric)
c) Mean (because the distribution is almost symmetric)
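The value in b) can be reproduced from the figures in a) with the median form of Pearson's coefficient, and the sign rule then gives the direction of skew. The function names in this Python sketch are illustrative.

```python
def pearson_sk_median(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / std dev."""
    return 3 * (mean - median) / std_dev

def describe_skew(sk):
    """Apply the sign rule: sk = 0 symmetric, sk > 0 positive, sk < 0 negative."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

sk = pearson_sk_median(10.72, 10.808, 3.812)
print(round(sk, 3), describe_skew(sk))  # -0.069 negatively skewed
```

The coefficient is negative but very close to zero, which matches the answer's description of a distribution that is skewed to the left yet almost symmetric.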