
Chapter 7:

DATA DESCRIPTION

7.1 INTRODUCTION TO DATA


7.2 MEASURES OF LOCATION
7.3 MEASURES OF DISPERSION

7.1 INTRODUCTION TO DATA
LEARNING OUTCOMES
• At the end of this topic, students should be able to:
• Identify the discrete and continuous data.
• Identify ungrouped and grouped data.
• Construct and interpret stem and leaf diagrams.

Population

A population is the collection of all elements whose characteristics are being studied.

Parameter

A parameter is a summary measure of a population (such as the population mean μ, the population variance σ², etc.). The corresponding summary measure of a sample is called a statistic.
Sample

A sample is any set of entities, cases, subjects, items or experimental units chosen from the population.

Example
The population is all students of our matriculation college. Possible samples include:
Sample 1: girls’ marks in Maths
Sample 2: boys’ weights
Sample 3: boys’ heights
Sample 4: girls’ heights
Variable

A variable is any measured characteristic or attribute that differs for different subjects. For example, if the weights of 30 subjects were measured, then weight would be a variable.

Qualitative Data
Qualitative data are not in numerical form but are instead recorded as attributes, such as man or woman, blue or black, and yes or no.

Quantitative Data

Quantitative data are observations that can be measured numerically.
A survey was carried out on 40 boys and 60 girls at a college
to find whether they liked the subject Mathematics or not.
State the
a) population
b) sample
c) variable
d) whether the variable is qualitative or quantitative.

a) All students of the college
b) The 40 boys and 60 girls surveyed
c) Whether the student likes Mathematics
d) Qualitative
Discrete and Continuous Data
Discrete data can only take certain individual values.
Continuous data can take any value in a certain range.

Example 1: The number of pages in a book is a discrete variable.
Example 2: The length of a film is a continuous variable.
Example 3: Shoe size is a discrete variable, e.g. 5, 5½, 6, 6½, etc., with no values in between.
Example 4: Temperature is a continuous variable.
Example 5: The number of people in a race is a discrete variable.
Example 6: The time taken to run a race is a continuous variable.
DATA

Ungrouped data: any data that you first gather. It is also called raw data.
Grouped data: data that have been organised into a frequency distribution.
Example of ungrouped data:

(i) 3, 5, 6, 2, 5, 2, 4, 6, 5

(ii) Number of books   0  1  2  3
     Frequency         3  7  4  2

Example of grouped data:

Height (cm)   150-155  155-160  160-165  165-170
Frequency        2        8        6        5

Note: All data are to be treated as a SAMPLE unless otherwise stated.
The most commonly used graphs in statistics are:
1. The Histogram
2. The Frequency Polygon.
3. The Cumulative Frequency Graph
4. The Bar Chart
5. Pie Chart
6. Pareto Charts
7. Dot Plot
8. Stem and Leaf Plot
9. Time Series Graph
Stem-and-Leaf Diagram
A Stem and Leaf Diagram is a special table where each data value is
split into a "stem" (the first digit or digits) and a "leaf" (usually the
last digit).
Stem and Leaf Plot (Stemplot)
• The technique combines a graphic technique with a sorting technique, i.e. listing the data in rank order according to numerical value.
• A stem and leaf plot organizes data by showing the items in order using stems and leaves.
• It summarizes the shape of the data (the distribution) and provides extra detail regarding individual values.
• The leaf is the trailing digit.
• The stem is the leading digit(s) of the data.


For example, the value 386 might be split as 38 and 6, as shown:

STEM (leading digits, used in sorting): 38
LEAF (trailing digit, shown in display): 6

A stem-and-leaf diagram is a method of presenting a data set so that gaps or concentrations in the data are easy to see. The following example clarifies the process of constructing a stem-and-leaf display.
The heights of 11 fourth-grade badminton players (in inches) are:

56, 61, 61, 60, 59, 57, 58, 58, 63, 61, 59
The ordered numbers from least to greatest are 56, 57, 58, 58, 59, 59, 60, 61, 61, 61, 63.

Then, put your data in a stem-and-leaf plot.

• Each STEM stands for the first digit of each number.
• Record the tens digits in order from least to greatest.

HEIGHT IN INCHES
Stem   Leaves
5
6
• Each LEAF stands for the second digit of each number.
• Record the ones digits in order from least to greatest.

HEIGHT IN INCHES
Stem   Leaves
5      6, 7, 8, 8, 9, 9
6      0, 1, 1, 1, 3
HEIGHT IN INCHES
Stem Leaves
5 6, 7, 8, 8, 9, 9
6 0, 1, 1, 1, 3

MEDIAN: Find the middle number in a set of numbers. The median is 59 inches.
MODE: Find the number that occurs most often in a set of numbers. The mode is 61 inches.
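The construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` is hypothetical, and it assumes whole-number data where the last digit is the leaf and the remaining leading digits form the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Sort the data, then file each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 56 -> stem 5, leaf 6
        plot[stem].append(leaf)
    return dict(plot)

# Heights (in inches) of the 11 badminton players from the example
heights = [56, 61, 61, 60, 59, 57, 58, 58, 63, 61, 59]
plot = stem_and_leaf(heights)

print("Stem   Leaves")
for stem in sorted(plot):
    print(f"{stem}      " + ", ".join(str(leaf) for leaf in plot[stem]))
```

Note that this sketch only lists stems that actually occur in the data; a stem with no leaves would have to be added by hand.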
7.2 MEASURES OF LOCATION (For Ungrouped Data)

a) Find and interpret the mean, mode, median, quartiles and percentiles for ungrouped data.

b) Construct and interpret box-and-whisker plots for ungrouped data.

c) Find and interpret the mean, mode, median, quartiles and percentiles for grouped data.
Ungrouped Data

9, 12, 14, 11, 14, 14, 13, 13, 14, 13, 11, 13, 11

What is the middle value of the set?
What is the average?
Mean
The mean of a set of data $x_1, x_2, x_3, \ldots, x_n$ is defined as

$$\bar{x} = \frac{\text{sum of all data}}{\text{number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
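Example 3

(a) Find the mean of the set of numbers 3, 5, 7, 4, 5, 9, 6.

$$\bar{x} = \frac{\sum x}{n}$$

(b) Find the mean of the following set of data.

Number of male children   0  1  2  3  4  5
Frequency                 2  5  7  3  2  1

$$\bar{x} = \frac{\sum x}{n}$$

(Here each value is counted as many times as its frequency.)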
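As a hedged check of Example 3, the sketch below computes both means in Python. The helper names `mean` and `mean_from_frequencies` are illustrative only; for part (b) the sum is taken as each value multiplied by its frequency, and n as the total frequency.

```python
def mean(values):
    """Sum of all data divided by the number of data."""
    return sum(values) / len(values)

def mean_from_frequencies(values, freqs):
    """Mean of a frequency table: each value counts as many times as its frequency."""
    total = sum(f * x for x, f in zip(values, freqs))
    return total / sum(freqs)

# (a) raw list of numbers
numbers = [3, 5, 7, 4, 5, 9, 6]
mean_a = mean(numbers)  # 39 / 7

# (b) number of male children with frequencies
children = [0, 1, 2, 3, 4, 5]
frequency = [2, 5, 7, 3, 2, 1]
mean_b = mean_from_frequencies(children, frequency)  # 41 / 20

print(round(mean_a, 3), round(mean_b, 3))
```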
Example
Given that the mean of the set of data 4, 7, 8, 11, x, 18, 9 and 10 is 10, find the value of x.

Mean, $\bar{x} = 10$

$$\frac{4 + 7 + 8 + 11 + x + 18 + 9 + 10}{8} = 10$$
$$\frac{67 + x}{8} = 10$$
$$67 + x = 80$$
$$x = 13$$
Mode
The mode of a set of data is the value
that occurs most frequently.

Example 4
Find the mode for the following set of data.

a) 5, 2, 3, 3, 5, 4, 28, 5

Solution :
b) 2, 3, 5, 8, 10

Solution :

c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5

Solution :
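A possible way to check answers to Example 4 is to count how often each value occurs. The sketch below uses a hypothetical helper `modes`; it assumes the common convention that a data set in which every value occurs exactly once has no mode, and it returns every value that ties for the highest count.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no value repeats, so there is no mode
    return [v for v, c in counts.items() if c == top]

modes_a = modes([5, 2, 3, 3, 5, 4, 28, 5])
modes_b = modes([2, 3, 5, 8, 10])
modes_c = modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5])
print(modes_a, modes_b, modes_c)
```

A data set such as (c), where two values tie for the highest frequency, is usually described as bimodal.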
Median
The median is the middle value of a set of data once the data are arranged in order of magnitude.

For a set of data $x_1, x_2, x_3, \ldots, x_n$ arranged in order of magnitude, there are two cases.

a) When the number of data (n) is odd,

$$\text{median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} = x_{\frac{n+1}{2}}$$

b) When the number of data (n) is even,

$$\text{median} = \text{mean of the two middle values} = \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right)$$
Median of ungrouped data
• If the values of a set of data are arranged in order of magnitude (ascending or descending order), the value lying in the middle is known as the median of the set of data.
• Suppose a set of data has n values arranged in order:

(a) If n is an odd number, the median is the $\frac{n+1}{2}$th value:

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}$$

(b) If n is an even number, the median is the average of the $\frac{n}{2}$th value and the $\left(\frac{n}{2}+1\right)$th value:

$$\text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ value}\right]$$

• Half, or 50%, of the values in a set of data are smaller than the median.
Example 5
Find the median for the following set of data.

a) 21, 24, 17, 28, 36, 20, 32

b) 3.56, 2.71, 5.48, 8.61, 4.35, 6.22
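The two cases of the median rule (n odd, n even) can be sketched as follows. The function name `median` is illustrative; Python lists are 0-based, so the 1-based positions in the formulas are shifted by one.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)th value in 1-based counting
    # n even: average of the (n / 2)th and (n / 2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

median_a = median([21, 24, 17, 28, 36, 20, 32])            # n = 7 (odd)
median_b = median([3.56, 2.71, 5.48, 8.61, 4.35, 6.22])    # n = 6 (even)
print(median_a, round(median_b, 3))
```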


Example 6

A set of data is 11, 2, 13, 6, x, 11, 3, 12, 6, y. The mode is 6 and the median is 7. Find the values of x and y. Hence calculate the mean of the data.
Quartiles
The values that divide a list of
arranged numbers into quarters.

First put the list of numbers in order.
Then cut the list into four equal parts.
The quartiles are at the "cuts".
Percentiles
The values that divide a list of
arranged numbers into 100 equal
parts.

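First put the list of numbers in order.
Then use the formula below:

$$P_k \text{ and } Q_k = \begin{cases} \dfrac{X_s + X_{s+1}}{2}, & \text{if } s \text{ is an integer} \\ X_{\lceil s \rceil}, & \text{if } s \text{ is a non-integer} \end{cases}$$

where $s = \dfrac{nk}{100}$ and $\lceil s \rceil$ is the least integer greater than s. For a quartile, use the equivalent percentile: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$.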
Example 7:

Find the first, second and third quartiles of the following data:

a) 5, 8, 4, 4, 6, 3, 8
Example 8:

Find the 15th, 25th and 60th percentiles of the following data:

3, 4, 5, 7, 2, 1, 6, 9, 2, 8, 6, 8
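The percentile rule for ungrouped data (s = nk/100, average two values when s is an integer, otherwise round s up) can be sketched as below and applied to Examples 7 and 8. The function name `percentile` is hypothetical, k is assumed to be a whole number from 1 to 99, and quartiles are obtained as the 25th, 50th and 75th percentiles. Other textbooks and software use different percentile conventions, so results may differ from library functions.

```python
def percentile(values, k):
    """k-th percentile of ungrouped data using the rule s = n * k / 100."""
    ordered = sorted(values)
    n = len(ordered)
    if (n * k) % 100 == 0:
        s = n * k // 100
        # s is an integer: average X_s and X_(s+1) (1-based positions)
        return (ordered[s - 1] + ordered[s]) / 2
    s_up = (n * k) // 100 + 1  # least integer greater than s
    return ordered[s_up - 1]

# Example 7: quartiles as P25, P50, P75
data7 = [5, 8, 4, 4, 6, 3, 8]
q1, q2, q3 = (percentile(data7, k) for k in (25, 50, 75))

# Example 8: 15th, 25th and 60th percentiles
data8 = [3, 4, 5, 7, 2, 1, 6, 9, 2, 8, 6, 8]
p15, p25, p60 = (percentile(data8, k) for k in (15, 25, 60))

print(q1, q2, q3)
print(p15, p25, p60)
```

The integer test uses whole-number arithmetic on n × k rather than comparing floating-point values, which avoids rounding surprises.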

7.2 MEASURES OF LOCATION
(For Grouped Data)
LEARNING OUTCOMES

• At the end of this topic, students should be able to:

1) find and interpret the mean, mode, median, quartiles and percentiles for grouped data.

2) construct and interpret box-and-whisker plots.
Mean
• If a set of grouped data is given as a frequency distribution, for example in the form of class intervals, then

$$\text{mean} = \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k}$$

$$\text{Mean}, \ \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

• $x_i$ is the midpoint of the ith class and $f_i$ is the corresponding frequency.
Mode
Mode is the value that occurs with the highest frequency in a set of data.

$$\text{Mode} = \hat{x} = L + \left(\frac{d_1}{d_1 + d_2}\right) C$$

L = lower class boundary of the modal class
d1 = the frequency difference between the modal class and the class before it
d2 = the frequency difference between the modal class and the class after it
C = class width
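The grouped mean and mode formulas can be tried out with a short sketch. The function names are illustrative, and the data are borrowed from the distance table in Example 19 later in the chapter (class boundaries 5.5 to 40.5, 20 runners), purely as sample input. The sketch assumes a single modal class and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)

def grouped_mode(classes, freqs):
    """Mode estimate L + d1 / (d1 + d2) * C for the first class with the highest frequency."""
    i = freqs.index(max(freqs))
    lo, hi = classes[i]
    before = freqs[i - 1] if i > 0 else 0               # assumed 0 if no class before
    after = freqs[i + 1] if i + 1 < len(freqs) else 0   # assumed 0 if no class after
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lo + d1 / (d1 + d2) * (hi - lo)

# Class boundaries and frequencies taken from the Example 19 table
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

mean_value = grouped_mean(classes, freqs)
mode_value = grouped_mode(classes, freqs)
print(round(mean_value, 3), round(mode_value, 3))
```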
Mode from histogram
[Figure: histogram of frequency against class boundaries, showing the mode located inside the modal class, with L, C, d1 and d2 marked.]
d1 and d2 are the frequency differences between the modal class and the classes before and after it.
C is the width of the modal class.
L is the lower class boundary of the modal class.
Median
• Since the original information in the raw data is lost when the data are grouped, the median of grouped data can only be estimated.

• The median of grouped data is calculated using the following formula:

$$\text{Median} = \tilde{x} = L + \left(\frac{\frac{n}{2} - F}{f}\right) C$$

L = lower boundary of the median class
n = total number of observations
F = cumulative frequency before the median class
C = class width
f = frequency of the median class
Quartile
• Quartiles divide a set of data arranged in ascending order into 4 equal parts.

Smallest data value --25%-- Q1 --25%-- Q2 --25%-- Q3 --25%-- Largest data value
Quartile
The procedure for estimating quartiles is the same as that for estimating the median, because the median is the second quartile, Q2.
Therefore the first (Q1), second (Q2) and third (Q3) quartiles are estimated by the formula

$$Q_k = L_k + \left(\frac{\frac{k}{4}(n) - F_k}{f_k}\right) C_k, \quad k = 1, 2, 3$$

Lk = lower boundary of the class where Qk lies
n = total number of observations
Fk = cumulative frequency before the Qk class
Ck = class width where Qk lies
fk = frequency of the class where Qk lies
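Quartile

$$Q_k = \left(\frac{k}{4} \times n\right)^{\text{th}} \text{ observation}, \quad k = 1, 2, 3$$

First quartile: $Q_1 = \left(\frac{1}{4} \times n\right)^{\text{th}}$ observation
Second quartile: $Q_2 = \left(\frac{2}{4} \times n\right)^{\text{th}}$ observation
Third quartile: $Q_3 = \left(\frac{3}{4} \times n\right)^{\text{th}}$ observation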
Percentile
• Percentiles divide a set of data arranged in ascending order into 100 equal parts, denoted by

P1, P2, P3, . . ., P98, P99

Smallest data value --1%-- P1 --1%-- P2 --1%-- P3 . . . P98 --1%-- P99 --1%-- Largest data value
Percentile
The kth percentile, Pk, can be calculated by using the formula

$$P_k = L_k + \left(\frac{\frac{k}{100}(n) - F_k}{f_k}\right) C_k, \quad k = 1, 2, 3, \ldots, 99$$

Lk = lower boundary of the class where Pk lies
n = total number of observations
Fk = cumulative frequency before the Pk class
Ck = class width where Pk lies
fk = frequency of the class where Pk lies
Percentile

1st quartile, Q1 = The 25th percentile

Second quartile, Q2 = The median = The 50th percentile

Third quartile, Q3 = The 75th percentile
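Because the quartiles and the median are special percentiles, one function is enough to estimate all of them for grouped data. The sketch below is an assumption-laden illustration: the name `grouped_percentile` is hypothetical, the sample data are borrowed from the Example 19 distance table, and when k·n/100 falls exactly on a cumulative frequency the lower of the two adjacent classes is used (both choices give the same boundary value).

```python
def grouped_percentile(classes, freqs, k):
    """Estimate P_k = L + ((k * n / 100 - F) / f) * C for grouped data."""
    n = sum(freqs)
    target = k * n / 100
    cumulative = 0  # F: cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("k must be between 1 and 99")

# Class boundaries and frequencies taken from the Example 19 table (n = 20)
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

q1 = grouped_percentile(classes, freqs, 25)      # first quartile
median = grouped_percentile(classes, freqs, 50)  # second quartile
q3 = grouped_percentile(classes, freqs, 75)      # third quartile
iqr = q3 - q1                                    # interquartile range
print(round(q1, 3), median, q3, round(iqr, 3))
```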

Example 9

The table shows the weight frequency distribution of 304 contestants in a competition. Find:
a) Mode
b) Median
c) Mean
d) Q1 , P30 , P80 (using formulae)
Example 10

Calculate:
a) Mean
b) Median
c) Mode
d) Q1 and Q3
e) Interquartile range
f) P10
Example 11

Calculate:
a) Mean
b) Median
c) Mode
d) Q1 and Q3
e) Interquartile range
f) P70
7.3 MEASURES OF DISPERSION
Measures of Dispersion

At the end of this lesson, students should be able to:

a) Find and interpret variance and standard deviation for ungrouped data.
b) Find and interpret variance and standard deviation for grouped data.
c) Find and interpret Pearson’s coefficient of skewness.

• Dispersion (variation) is how the data are spread out, or dispersed, from the mean.
• The smaller the dispersion values, the more consistent the data (values are close together).
• The larger the dispersion values, the more spread out the data values are. This means that the data are not as consistent.
[Figure: distributions with smaller and larger dispersion about the mean.]

• Consider the following Math quiz scores of students from two classes, MS1 and MD1.

MS1   3  4  5  6  8  9  10  12  15
MD1   3  7  7  7  8  8   8   9  15

• Both classes have the same mean score, which is 8. So, which class is better?
• From observation, most of the scores of students from MD1 are closer to the mean.
• The variability or dispersion of the scores from the mean is less for class MD1 than for class MS1. Therefore, we can say that class MD1 is better (more consistent) than class MS1.
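The comparison of the two classes can be made numerical. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions compute the sample versions (divisor n − 1); treating the scores as samples is an assumption in line with the chapter's note that data are samples unless otherwise stated.

```python
import statistics

ms1 = [3, 4, 5, 6, 8, 9, 10, 12, 15]
md1 = [3, 7, 7, 7, 8, 8, 8, 9, 15]

# Both classes share the same mean score
mean_ms1 = statistics.mean(ms1)
mean_md1 = statistics.mean(md1)

# Sample variance and standard deviation (divisor n - 1)
var_ms1 = statistics.variance(ms1)
var_md1 = statistics.variance(md1)
sd_ms1 = statistics.stdev(ms1)
sd_md1 = statistics.stdev(md1)

print(mean_ms1, mean_md1)
print(round(sd_ms1, 3), round(sd_md1, 3))
```

The smaller standard deviation for MD1 matches the observation that its scores sit closer to the mean.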
Variance for Ungrouped Data

Sample Variance

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

$x_i$ : each data value
n : the number of data values
Example 18

Find the variance and standard deviation of the following numbers: 6, 7, 10, 11, 11, 13, 16, 18, 25.

** Assume SAMPLE if not mentioned
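A minimal sketch of the sample variance formula applied to the numbers in Example 18 follows. The function name `sample_variance` is illustrative; the data are treated as a sample, as the note above instructs.

```python
import math

def sample_variance(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [6, 7, 10, 11, 11, 13, 16, 18, 25]
variance = sample_variance(data)  # (1801 - 117^2 / 9) / 8
std_dev = math.sqrt(variance)
print(variance, round(std_dev, 3))
```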


Variance for Grouped Data

Sample Variance

$$s^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right]$$

$x_i$ : class midpoint
$f_i$ : class frequency
n : the number of data values ($n = \sum f_i$)
Example 19
The data below represent the number of kilometers that 20 runners ran during a week. Find the variance and the standard deviation for the distribution.

Distance (km)    Frequency


5.5 – 10.5 1
10.5 – 15.5 2
15.5 – 20.5 3
20.5 – 25.5 5
25.5 – 30.5 4
30.5 – 35.5 3
35.5 – 40.5 2
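The grouped variance formula can be applied to this table as sketched below. The function name `grouped_sample_variance` is hypothetical; class midpoints stand in for the individual values, so the result is an estimate.

```python
import math

def grouped_sample_variance(classes, freqs):
    """s^2 = (sum(f * x^2) - (sum(f * x))^2 / n) / (n - 1), with x the class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Distance (km) classes and frequencies from the table above
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

variance = grouped_sample_variance(classes, freqs)  # (13310 - 490^2 / 20) / 19
std_dev = math.sqrt(variance)
print(round(variance, 3), round(std_dev, 3))
```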
SYMMETRY AND SKEWNESS
A set of observations is symmetrically distributed if its
graphical representation (histogram, bar chart) is
symmetric with respect to a vertical axis passing through
the mean. For a symmetrically distributed population or
sample, the mean, median and mode have the same
value. Half of all measurements are greater than the mean,
while half are less than the mean.
Concept of Skewness
• The skewness of a distribution is defined as the lack of symmetry.
• In a symmetrical distribution, the mean, mode and median are equal to each other.

[Figure: symmetric curve with $\bar{x} = \hat{x} = \tilde{x}$ at the centre.]
• $\bar{x} = \hat{x} = \tilde{x}$
A set of observations that is not symmetrically distributed is said to
be skewed. It is positively skewed if a greater proportion of the
observations are less than or equal to (as opposed to greater than
or equal to) the mean; this indicates that the mean is larger than
the median. The histogram of a positively skewed distribution will
generally have a long right tail; thus, this distribution is also known
as being skewed to the right.
[Figure: positively skewed distribution; vertical axis: frequency, horizontal axis: variable, with the mode, median and mean marked.]
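Concept of Skewness

• If the distribution is not symmetrical, it is said to be skewed.
[Figure: positively skewed curve with the mode $\hat{x}$, median $\tilde{x}$ and mean $\bar{x}$ marked from left to right.]
• The longer tail occurs to the right of the curve.
• $\hat{x} < \tilde{x} < \bar{x}$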
On the other hand, a negatively skewed distribution has more
observations that are greater than or equal to the mean. Such a
distribution has a mean that is less than the median. The
histogram of a negatively skewed distribution will generally have a
long left tail; thus, the phrase skewed to the left is applied here.
[Figure: negatively skewed distribution; vertical axis: frequency, horizontal axis: variable, with the mean, median and mode marked.]
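Concept of Skewness

• If the distribution is not symmetrical, it is said to be skewed.
[Figure: negatively skewed curve with the mean $\bar{x}$, median $\tilde{x}$ and mode $\hat{x}$ marked from left to right.]
• The longer tail occurs to the left of the curve.
• $\bar{x} < \tilde{x} < \hat{x}$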
Note that when distributions are skewed, the median generally lies between the mode and the mean, and the following empirical relationship is approximately satisfied:

mean – mode = 3 (mean – median)

• If mean > mode, the skew is positive.
• If mean < mode, the skew is negative.
• If mean = mode, the skew is zero and the distribution is symmetrical.
Pearson’s Coefficient of Skewness

• Pearson’s coefficient of skewness is given by

$$s_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$

or

$$s_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

• If sk = 0, the distribution is symmetric.
  If sk > 0, the distribution is positively skewed.
  If sk < 0, the distribution is negatively skewed.
Example 20

Class Intervals   Frequency
10-19             5
20-29             7
30-39             4
40-49             4
50-59             3
60-69             2

Find:
a) Mean
b) Median
c) Mode
d) Variance
e) Standard deviation
f) Pearson’s coefficient of skewness, and interpret your answer
g) State with reason whether the mean or the median is a better measure of location
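One way to check working for Example 20 is the sketch below, which strings the grouped-data formulas together. It is not a model answer: the helper names are illustrative, and it assumes the data are whole numbers so that the class 10-19 has boundaries 9.5 and 19.5 (class width 10), and that the data are a sample.

```python
import math

# Class boundaries assumed from the limits 10-19, 20-29, ... (whole-number data)
classes = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
           (39.5, 49.5), (49.5, 59.5), (59.5, 69.5)]
freqs = [5, 7, 4, 4, 3, 2]
n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in classes]

sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)

def grouped_median(classes, freqs):
    """L + ((n / 2 - F) / f) * C for the class containing the (n / 2)th observation."""
    target = sum(freqs) / 2
    cumulative = 0
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f

def grouped_mode(classes, freqs):
    """L + d1 / (d1 + d2) * C for the first class with the highest frequency."""
    i = freqs.index(max(freqs))
    lo, hi = classes[i]
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)
    return lo + d1 / (d1 + d2) * (hi - lo)

median = grouped_median(classes, freqs)
mode = grouped_mode(classes, freqs)

# Pearson's coefficient of skewness, both forms
sk_mode = (mean - mode) / std_dev
sk_median = 3 * (mean - median) / std_dev

print(mean, median, mode, variance, round(std_dev, 3))
print(round(sk_mode, 3), round(sk_median, 3))
```

Both coefficients come out positive under these assumptions, which points to a positively skewed distribution.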
Example 21
The following table gives the cumulative frequency distribution for the weights (kg) of fifty hampers during a festival at a supermarket.

(a) Find the mean, median and standard deviation.
(b) Hence, calculate Pearson’s coefficient of skewness and interpret your answer.
(c) State with reason whether the mean or the median is a better measure of location.

ANSWER:
a) mean = 10.72, median = 10.808, standard deviation = 3.812
b) Sk = -0.069 (skewed to the left / almost symmetric)
c) Mean (because the distribution is almost symmetric)
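The quoted answer for part (b) can be reproduced from the summary values in part (a) using the median form of Pearson's coefficient. The sketch below only re-does that arithmetic; the function name `pearson_skewness` is illustrative.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

# Summary values quoted in the answer to part (a)
sk = pearson_skewness(mean=10.72, median=10.808, std_dev=3.812)
print(round(sk, 3))
```

The value is negative but very close to zero, which agrees with the interpretation "skewed to the left / almost symmetric".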
