
Descriptive Statistics: Numerical

Dr. Md. Israt Rayhan


Professor
Institute of Statistical Research and Training (ISRT)
University of Dhaka
Email: israt@isrt.ac.bd
Describing Data Numerically

Central Tendency
• Arithmetic Mean
• Median
• Mode

Variation
• Range
• Variance
• Standard Deviation
• Coefficient of Variation

Prepared by Dr. Md. Israt Rayhan


Measures of Central Tendency
Overview

• Mean: the arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Median: the midpoint of the ranked values
• Mode: the most frequently observed value



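Arithmetic Mean
• The arithmetic mean (mean) is the most common measure of central tendency
• For a population of N values:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

  where the $x_i$ are the population values and $N$ is the population size

• For a sample of size n:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

  where the $x_i$ are the observed values and $n$ is the sample size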
Arithmetic Mean
(continued)

• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis, one per data set]

Data 1, 2, 3, 4, 5: $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$, so Mean = 3
Data 1, 2, 3, 4, 10: $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$, so Mean = 4
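
The effect of a single extreme value on the mean can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is illustrative and not part of the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # the last value is replaced by an extreme value

print(arithmetic_mean(without_outlier))  # 3.0
print(arithmetic_mean(with_outlier))     # 4.0
```

Changing one observation from 5 to 10 moves the mean from 3 to 4, even though the other four values are unchanged.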



Median
• In an ordered list, the median is the “middle” number (50% above, 50% below)

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]

• Not affected by extreme values



Finding the Median

• The location of the median:

$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$

• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle numbers
• Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
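
The position rule can be sketched in Python as follows. The function name `median_by_position` and the even-sized data set are assumptions made for illustration; the two odd-sized data sets are the ones used for the mean earlier.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2  # 1-based position, not the median itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two values; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([1, 2, 3, 4, 5]))   # 3
print(median_by_position([1, 2, 3, 4, 10]))  # 3
print(median_by_position([1, 2, 3, 4]))      # 2.5 (hypothetical even-sized set)
```

For n = 4 the position is 2.5, which is read as "halfway between the 2nd and 3rd ranked values".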



Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes

[Figure: two dot plots; on the 0 to 14 axis, Mode = 9; on the 0 to 6 axis, No Mode]
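
A minimal Python sketch of the three cases (one mode, several modes, no mode) is given below. The function name `modes` and all data values are hypothetical, and the convention that a data set has no mode when every value occurs exactly once is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return a list of the most frequent values; empty if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: every value occurs once, so there is no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 3, 5, 7, 9, 9, 12]))    # [9]
print(modes([1, 2, 3, 4, 5, 6]))        # []  (no mode)
print(modes([1, 1, 2, 2, 3]))           # [1, 2]  (several modes)
print(modes(["red", "blue", "red"]))    # ['red']  (categorical data)
```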
Which measure of location is the “best”?

• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to extreme values
• Example: Median home prices may be reported for a region, because the median is less sensitive to outliers
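
The home price example can be illustrated with Python's standard `statistics` module. The prices below are hypothetical values chosen only to show the effect and do not come from the slides.

```python
import statistics

# Hypothetical home prices in thousands of dollars; the last one is an outlier.
prices = [200, 210, 220, 230, 1500]

mean_price = statistics.mean(prices)      # pulled upward by the outlier
median_price = statistics.median(prices)  # middle value of the ranked prices

print(mean_price)    # 472
print(median_price)  # 220
```

Four of the five homes cost between 200 and 230, so the median of 220 describes a typical home better than the mean of 472.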



Range

• Simplest measure of variation
• Difference between the largest and the smallest observations:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example:

[Figure: dot plot on a 0 to 14 axis; smallest observation 1, largest observation 14]

Range = 14 - 1 = 13
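
In code the range is a one-line computation. The sketch below uses a hypothetical data set whose smallest value is 1 and largest value is 14, matching the example; the individual values and the name `data_range` are assumptions.

```python
def data_range(values):
    """Return the largest observation minus the smallest observation."""
    return max(values) - min(values)


# Hypothetical observations with minimum 1 and maximum 14.
observations = [1, 3, 5, 8, 9, 9, 12, 14]

print(data_range(observations))  # 13
```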



Population Variance

• Average of squared deviations of values from the mean
• Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where μ = population mean
      N = population size
      $x_i$ = ith value of the variable x
Sample Variance

• Average (approximately) of squared deviations of values from the mean
• Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Where $\bar{x}$ = arithmetic mean
      n = sample size
      $x_i$ = ith value of the variable x
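
The population and sample variance formulas differ only in the divisor (N versus n - 1). A minimal Python sketch, assuming a single illustrative function `variance` with a `sample` flag (both names are hypothetical), applied to the data 1, 2, 3, 4, 5 from the mean example:

```python
def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 or by N."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations to compute the variance")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n  # n - 1 for a sample, N for a population
    return squared_deviations / divisor


data = [1, 2, 3, 4, 5]  # mean = 3, squared deviations sum to 10

print(variance(data, sample=False))  # 2.0  (10 / 5, population formula)
print(variance(data, sample=True))   # 2.5  (10 / 4, sample formula)
```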
Population Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
• Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
• Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$



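Calculation Example:
Sample Standard Deviation

Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$$

$$= \sqrt{\frac{36 + 16 + 4 + 1 + 1 + 4 + 4 + 64}{7}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

The result is a measure of the “average” scatter around the mean.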
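
Recomputing this example step by step in Python gives a sum of squared deviations of 130 and a sample standard deviation of about 4.3095. The sketch below uses only the standard library; the variable names are illustrative, and `statistics.stdev` serves as a cross-check because it also divides by n - 1.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)

mean = sum(data) / n                                # 16.0
squared_deviations = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)            # 130.0

s = math.sqrt(sum_of_squares / (n - 1))             # divide by n - 1 = 7

print(squared_deviations)  # [36.0, 16.0, 4.0, 1.0, 1.0, 4.0, 4.0, 64.0]
print(sum_of_squares)      # 130.0
print(round(s, 4))         # 4.3095

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, statistics.stdev(data))
```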
Coefficient of Variation

• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$
Comparing Coefficient of Variation

• Stock A:
  • Average price last year = $50
  • Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

• Stock B:
  • Average price last year = $100
  • Standard deviation = $5

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
Approximations for Grouped Data
Suppose a data set contains values $m_1, m_2, \ldots, m_K$, occurring with frequencies $f_1, f_2, \ldots, f_K$

• For a population of N observations, the mean is

$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N} \quad \text{where} \quad N = \sum_{i=1}^{K} f_i$$

• For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where} \quad n = \sum_{i=1}^{K} f_i$$
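
A minimal Python sketch of the grouped-data mean follows. The function name `grouped_mean` and the values and frequencies are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def grouped_mean(values, frequencies):
    """Return sum(f_i * m_i) divided by the total frequency sum(f_i)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total = sum(frequencies)  # N for a population, n for a sample
    if total == 0:
        raise ValueError("the total frequency must be positive")
    weighted_sum = sum(f * m for f, m in zip(frequencies, values))
    return weighted_sum / total


# Hypothetical grouped data: values m_i with frequencies f_i.
m = [5, 15, 25]
f = [2, 5, 3]

# Weighted sum = 2*5 + 5*15 + 3*25 = 160, total frequency = 10.
print(grouped_mean(m, f))  # 16.0
```

The computation is the same for a population and a sample; only the symbol for the total frequency (N or n) and for the result (μ or x̄) changes.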
