
Statistics

Lecture 5
Descriptive Statistics

1. ANALYSIS OF LOCATION (CENTRAL TENDENCY)


where along the scale of all possible values our particular distribution
happens to be centered (mean, median, mode)
The purpose: to describe the population with one figure, a representative value for a mass of data.
MEASURES OF LOCATION: THE MEASURES OF POSITION
• Mode
• Quantiles: Quartiles, Deciles, Percentiles

2. ANALYSIS OF DISPERSION (VARIATION)


how the data vary (it should always complement the measures of location,
because e.g. the mean, when it appears alone, can be very misleading)

3. ANALYSIS OF SKEWNESS (SYMMETRY)


whether the distribution is symmetric or skewed

4. ANALYSIS OF CONCENTRATION

ANALYSIS OF CONCENTRATION - distribution of the total value between the elementary units:
whether the total value of the variable is uniformly distributed
between the elementary units or not

ANALYSIS OF KURTOSIS (PEAKEDNESS) - concentration of elementary units near the mean value:
whether the distribution is mesokurtic, leptokurtic or platykurtic
Descriptive Statistics - MEASURES OF CENTRAL TENDENCY - Measures of Position

The Median - the value above which and below which lie an equal number of observations

Some Properties of the Median

• always exists and is unique for every distribution
• is not affected by extreme values in the distribution, so it is a useful
  representative value for highly skewed distributions
• can be estimated from a frequency distribution even if the distribution has
  unequal class intervals or open-ended classes

Duration of telephone calls - xi (in min.)    Number of calls - ni
Less than 2                                     9
2 - 4                                          15
4 - 8                                           6
Above 8                                         3
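
As a worked illustration of the last property, the median of the telephone-call table can be estimated by interpolation inside the median class. This is a sketch under one assumption: the median is computed like the quartiles in this lecture, with N/2 in place of N/4. Only the boundary, width and frequency of the median class enter the calculation, so the open-ended and unequal classes cause no difficulty.

```latex
\begin{aligned}
N &= 9 + 15 + 6 + 3 = 33, \qquad \frac{N}{2} = 16.5 \\
\text{cumulative frequencies: } & 9,\ 24,\ 30,\ 33
  \quad\Rightarrow\quad \text{median class: } 2\text{-}4 \\
Me &= 2 + \frac{16.5 - 9}{15} \cdot 2 = 3 \text{ min.}
\end{aligned}
```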
Descriptive Statistics - MEASURES OF CENTRAL TENDENCY - Measures of Position

Quartiles Q1, Q3                                        EXCEL ---> QUARTILE

Q1 and Q3 are calculated like Q2 (the median).
STEPS:
• calculate the value N/4 (for Q1) or 3N/4 (for Q3)
• build the cumulative series to determine the class intervals containing Q1 and Q3

The 1st quartile (lower) - Q1
Q1 exceeds 25% of the data and is exceeded by 75% of the data.

$$Q_1 = x_{Q_1} + \frac{\frac{N}{4} - \mathrm{cum}\,n_{Q_1-1}}{n_{Q_1}} \cdot h_{Q_1}$$

$x_{Q_1}$ - the lower boundary of the class containing Q1
$n_{Q_1}$ - the frequency of the class containing Q1
$\mathrm{cum}\,n_{Q_1-1}$ - the cumulative frequency of the class preceding the class containing Q1
$h_{Q_1}$ - the width of the class containing Q1

The 2nd quartile - Q2 = MEDIAN

The 3rd quartile (upper) - Q3
Q3 exceeds 75% of the data and is exceeded by 25% of the data.

$$Q_3 = x_{Q_3} + \frac{\frac{3N}{4} - \mathrm{cum}\,n_{Q_3-1}}{n_{Q_3}} \cdot h_{Q_3}$$

$x_{Q_3}$ - the lower boundary of the class containing Q3
$n_{Q_3}$ - the frequency of the class containing Q3
$\mathrm{cum}\,n_{Q_3-1}$ - the cumulative frequency of the class preceding the class containing Q3
$h_{Q_3}$ - the width of the class containing Q3

E.g. Variable - newlyweds by age of men

Age of men - xi    Percentage - ni    cum ni
less than 20         1                  1
20 - 25             29                 30      <- class containing Q1
25 - 30             44                 74
30 - 35             14                 88      <- class containing Q3
35 - 40              4                 92
40 - 50              4                 96
50 and more          4                100
Total              100

$$\frac{N}{4} = 25, \qquad Q_1 = 20 + \frac{25 - 1}{29} \cdot 5 \approx 24$$

The interpretation: 75% of men get married at ages above 24
and 25% of men get married at ages below 24.

$$\frac{3N}{4} = 75, \qquad Q_3 = 30 + \frac{75 - 74}{14} \cdot 5 \approx 30$$

The interpretation: 25% of men get married at ages above 30
and 75% of men get married at ages below 30.
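
The interpolation steps for the quartiles can be sketched in Python. This is a minimal sketch, not the lecture's own code: the function name `grouped_quantile` is hypothetical, and the two open-ended classes are closed at 15 and 60 purely as an assumption (those outer bounds do not affect Q1, the median or Q3 for this table).

```python
def grouped_quantile(lower_bounds, widths, freqs, p):
    """Estimate the p-th quantile (0 < p < 1) of grouped data by interpolation."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    target = p * sum(freqs)   # N/4 for Q1, N/2 for the median, 3N/4 for Q3
    cum_prev = 0              # cumulative frequency of the preceding classes
    for lower, width, freq in zip(lower_bounds, widths, freqs):
        if freq > 0 and cum_prev + freq >= target:
            # lower boundary + the share of this class needed to reach the target
            return lower + (target - cum_prev) / freq * width
        cum_prev += freq
    raise ValueError("frequencies must contain at least one positive value")


# Newlyweds by age of men (percentages from the table).
# Assumption: "less than 20" is treated as 15-20 and "50 and more" as 50-60.
lower_bounds = [15, 20, 25, 30, 35, 40, 50]
widths = [5, 5, 5, 5, 5, 10, 10]
freqs = [1, 29, 44, 14, 4, 4, 4]

q1 = grouped_quantile(lower_bounds, widths, freqs, 0.25)
median = grouped_quantile(lower_bounds, widths, freqs, 0.50)
q3 = grouped_quantile(lower_bounds, widths, freqs, 0.75)
print(round(q1, 2), round(median, 2), round(q3, 2))
```

Rounded to whole years, the results agree with the values 24 and 30 used on the slide.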
Descriptive Statistics - MEASURES OF CENTRAL TENDENCY

Comparison of the MEAN, MEDIAN and MODE


MEAN - the center of gravity of the distribution.
MEDIAN - divides the area into two equal parts
MODE - highest point of the distribution.

Symmetrical distribution - has only one peak and is symmetric;
the median, the mean and the mode all coincide.

The bell-shaped normal curve is an example of such a case.

https://www.quora.com/How-can-I-tell-if-I-see-a-normal-distribution
Descriptive Statistics - MEASURES OF DISPERSION

MEASURES OF DISPERSION – describe the amount of spread in a set of data in one of two ways:

DISTANCE MEASURES - the distance between two particular observed values (e.g. we can calculate Q3 - Q1)
• can be calculated quickly and easily,
• frequently used in describing the data, but rarely used to measure the variation in a set of data
  when we are interested in testing hypotheses about a population parameter

DEVIATIONS FROM THE MEAN - the average of the deviations of the units from some central value (mean)

DISTANCE MEASURES
THE RANGE - the distance between the largest and the smallest observed value
THE INTERQUARTILE RANGE   IQR = Q3 - Q1

the middle 50% of the observations fall between Q1 and Q3

E.g. Variable - newlyweds by age of men

IQR = Q3 - Q1 = 30 - 24 = 6
Interpretation: The middle 50% of the men get married between the ages of 24 and 30,
so the range of the middle 50% of newly married men is about 6 years.
Descriptive Statistics - MEASURES OF DISPERSION

DISTANCE MEASURES

QUARTILE DEVIATION
(SEMI-INTERQUARTILE RANGE)

$$Q = \frac{Q_3 - Q_1}{2}$$

- a measure of dispersion around the median (the average distance of the 1st and the 3rd quartiles from the median)
- a measure of the spread of the middle 50% of the data

E.g. Variable - newlyweds by age of men

$$Q = \frac{30 - 24}{2} = 3$$

Interpretation: The ages of newly married men differ from the median
by about 3 years on average.
Descriptive Statistics - MEASURES OF DISPERSION

DEVIATIONS FROM THE MEAN

! Problem with averaging the deviations from the mean:
the positive and the negative deviations cancel each other

THE MEAN ABSOLUTE DEVIATION                         EXCEL ---> AVEDEV

the average value of the absolute deviations from the mean

$$d = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

In the case of grouped data:

$$d = \frac{\sum_{i=1}^{k} |x'_i - \bar{x}| \, f_i}{n}$$

THE VARIANCE - the average value of the squared deviations from the mean

SAMPLE
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2}{n - 1} - \frac{n}{n - 1}\,\bar{x}^2$$

POPULATION
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

! problems with interpretation: the variance measures the average of the squared deviations,
so it is expressed in squared units and is frequently a very large number relative to the original observations

therefore the square root of the variance is calculated: the STANDARD DEVIATION
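
The definitions above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the three wages are the ones from the wages-of-workers example in this lecture, and the variable names are illustrative only.

```python
import statistics

wages = [1800, 2000, 2200]   # wages of workers (zloty), from the lecture example
n = len(wages)
mean = statistics.fmean(wages)

# Plain deviations from the mean cancel out, which is why they are not averaged.
sum_of_deviations = sum(x - mean for x in wages)

# Mean absolute deviation: the average of |x_i - mean|.
mean_abs_dev = sum(abs(x - mean) for x in wages) / n

# Sample variance (divides by n - 1) and population variance (divides by N).
sample_var = statistics.variance(wages)
population_var = statistics.pvariance(wages)

# Shortcut form of the sample variance: sum(x^2)/(n-1) - n/(n-1) * mean^2.
sample_var_shortcut = sum(x ** 2 for x in wages) / (n - 1) - n / (n - 1) * mean ** 2

print(sum_of_deviations, round(mean_abs_dev, 2), sample_var, round(population_var, 2))
```

The variances (40000 and about 26667) are far larger than the wages' spread of a few hundred zloty, which illustrates why the square root is taken.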


Descriptive Statistics - MEASURES OF DISPERSION

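DEVIATIONS FROM THE MEAN                         EXCEL ---> STDEVP

STANDARD DEVIATION      SAMPLE: s = s_x = S = S_x      POPULATION: σ

detailed series

$$S_x = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} \qquad S_x = \sqrt{\frac{\sum_{i=1}^{N} x_i^2}{N} - \bar{x}^2}$$

E.g. Variable - wages of workers

Id       Wages - xi    $(x_i - \bar{x})^2$
1        1800          40000
2        2000              0
3        2200          40000
Total    6000          80000

$$\bar{x} = \frac{6000}{3} = 2000, \qquad S_x = \sqrt{\frac{80000}{3}} \approx 163.3$$

Interpretation: The wages of workers differ from
the mean by about 163.3 zloty on average.

frequency distribution of a discrete variable

$$S_x = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 n_i}{N}} \qquad S_x = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 n_i}{N} - \bar{x}^2}$$

E.g. Variable - the grades

Grades - xi    Number of grades - ni    $x_i n_i$    $(x_i - \bar{x})^2 n_i$
2               0                        0            0
3               3                        9            3
4              10                       40            0
5               3                       15            3
Total          16                       64            6

$$\bar{x} = \frac{64}{16} = 4, \qquad S_x = \sqrt{\frac{6}{16}} \approx 0.61$$

Interpretation: The grades differ from the mean by about 0.61 on average.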
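
Both examples can be reproduced with a short sketch. The helper names `std_detailed` and `std_frequency` are hypothetical; each divides by N, as in the formulas for the detailed series and for the frequency distribution of a discrete variable.

```python
import math


def std_detailed(values):
    """Standard deviation of a detailed series (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_frequency(values, counts):
    """Standard deviation of a discrete frequency distribution (divides by N)."""
    total = sum(counts)
    mean = sum(x * n_i for x, n_i in zip(values, counts)) / total
    squared = sum((x - mean) ** 2 * n_i for x, n_i in zip(values, counts))
    return math.sqrt(squared / total)


wage_std = std_detailed([1800, 2000, 2200])              # wages of workers
grade_std = std_frequency([2, 3, 4, 5], [0, 3, 10, 3])   # grades and their counts
print(round(wage_std, 1), round(grade_std, 2))           # 163.3 0.61
```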
Descriptive Statistics - MEASURES OF DISPERSION

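DEVIATIONS FROM THE MEAN                         EXCEL ---> STDEVP

STANDARD DEVIATION

frequency distribution of a continuous variable

$$S_x = \sqrt{\frac{\sum_{i=1}^{k} (x'_i - \bar{x})^2 n_i}{N}} \qquad S_x = \sqrt{\frac{\sum_{i=1}^{k} x'^{2}_i n_i}{N} - \bar{x}^2}$$

where $x'_i$ is the midpoint of the i-th class

E.g. Variable - wages of accountants

Wages - xi (thousand zloty)    Number of employees - ni    Midpoint - $x'_i$    $x'_i n_i$    $x'^{2}_i n_i$
0 - 6                           3                           3                    9             27
6 - 12                          4                           9                   36            324
12 - 18                        13                          15                  195           2925
Total                          20                                              240           3276

$$\bar{x} = \frac{240}{20} = 12, \qquad S_x = \sqrt{\frac{3276}{20} - 12^2} = \sqrt{19.8} \approx 4.45$$

Interpretation: The wages of accountants differ from the
mean by about 4.45 thousand zloty (4450 zloty) on average.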
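
For the wages of accountants, the class midpoints stand in for the individual values. The sketch below is illustrative only and its variable names are assumptions; it computes the standard deviation in two ways, from the squared deviations and from the mean of squares minus the squared mean, and both give the same result.

```python
import math

# (lower bound, upper bound, number of employees) for each wage class
classes = [(0, 6, 3), (6, 12, 4), (12, 18, 13)]

total = sum(n_i for _, _, n_i in classes)
midpoints = [(low + high) / 2 for low, high, _ in classes]
counts = [n_i for _, _, n_i in classes]

# Mean computed from the class midpoints.
mean = sum(m * n_i for m, n_i in zip(midpoints, counts)) / total

# Direct form: square root of the average squared deviation from the mean.
std_direct = math.sqrt(
    sum((m - mean) ** 2 * n_i for m, n_i in zip(midpoints, counts)) / total
)

# Shortcut form: square root of (mean of squares - squared mean).
mean_of_squares = sum(m ** 2 * n_i for m, n_i in zip(midpoints, counts)) / total
std_shortcut = math.sqrt(mean_of_squares - mean ** 2)

print(mean, round(std_direct, 2), round(std_shortcut, 2))
```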
Descriptive Statistics - MEASURES OF DISPERSION CHEBYSHEV’S THEOREM

Empirical (68-95-99.7) Rule


For data sets having a distribution that is approximately bell shaped, the following properties apply:
• about 68% of all values fall within 1 standard deviation of the mean
• about 95% of all values fall within 2 standard deviations of the mean
• about 99.7% of all values fall within 3 standard deviations of the mean
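
The three percentages can be verified for an exactly normal distribution. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); for real data that are only approximately bell shaped, the shares will only be close to these values.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```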

THE TYPICAL RANGE

$$\bar{x} - S_x < x_{typ} < \bar{x} + S_x$$

E.g. Variable - wages of accountants

$$12 - 4.45 < x_{typ} < 12 + 4.45 \quad\Rightarrow\quad 7.55 < x_{typ} < 16.45$$

Interpretation:
A typical accountant earns from 7.55 to 16.45 thousand zloty.
Or: about 68% of accountants earn from 7.55 to 16.45 thousand zloty.

THE POSITION version of TYPICAL RANGE

II THE TYPICAL RANGE

$$Me - Q < x_{typ} < Me + Q$$

E.g. Variable - newlyweds by age of men

Interpretation: The typical age of newlywed men ranges from 24 to 30.
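
Both versions of the typical range are a centre plus or minus a spread, which the sketch below makes explicit. The helper `typical_range` is hypothetical. The median of about 27.27 is not stated on the slides; it is an assumption, interpolated from the newlyweds table in the same way as the quartiles.

```python
import math


def typical_range(center, spread):
    """Return (center - spread, center + spread)."""
    return center - spread, center + spread


# Classical version: mean +/- standard deviation (wages of accountants).
mean_wage = 240 / 20
std_wage = math.sqrt(3276 / 20 - mean_wage ** 2)
classical = typical_range(mean_wage, std_wage)

# Positional version: median +/- quartile deviation (newlyweds by age of men).
median_age = 25 + (50 - 30) / 44 * 5   # assumption: interpolated like Q1 and Q3
quartile_deviation = (30 - 24) / 2     # Q = (Q3 - Q1) / 2 with the rounded quartiles
positional = typical_range(median_age, quartile_deviation)

print([round(v, 2) for v in classical], [round(v, 2) for v in positional])
```

Rounded to whole years, the positional range gives the 24 to 30 quoted in the interpretation.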


Descriptive Statistics - MEASURES OF DISPERSION

THE COEFFICIENT OF VARIATION - the sample Standard Deviation stated as a proportion of
the arithmetic Mean of the distribution

$$V_x = \frac{S_x}{\bar{x}} \cdot 100\%$$

The lower the value of the coefficient of variation,
the more homogeneous the population.

E.g. Variable - wages of accountants

$$V_x = \frac{4.45}{12} \cdot 100\% \approx 37.1\%$$

Interpretation: The standard deviation of the wages of
accountants constitutes over 37% of the mean.

THE POSITION version of coefficient of variation

II THE COEFFICIENT OF VARIATION - the Quartile Deviation stated as a proportion of the Median

$$V_Q = \frac{Q}{Me} \cdot 100\%$$

E.g. Variable - newlyweds by age of men

Interpretation: The quartile deviation of the age of newlywed
men constitutes over 11% of the median.
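
The two percentages quoted above can be recomputed as follows. This is a sketch with a hypothetical helper name. The median used for the positional version is an assumption: it is interpolated from the newlyweds table like the quartiles, since its value is not printed on the slide.

```python
import math


def coefficient_of_variation(spread, center):
    """Spread stated as a percentage of the measure of location."""
    if center == 0:
        raise ValueError("the coefficient is undefined when the center is zero")
    return spread / center * 100


# Classical version: standard deviation relative to the mean (wages of accountants).
classical_cv = coefficient_of_variation(math.sqrt(19.8), 12)

# Positional version: quartile deviation relative to the median (newlyweds).
median_age = 25 + (50 - 30) / 44 * 5   # assumption: interpolated median, about 27.27
positional_cv = coefficient_of_variation(3, median_age)

print(round(classical_cv, 1), round(positional_cv, 1))
```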
Descriptive Statistics – MEASURES OF SKEWNESS (SYMMETRY)

Is the distribution symmetric or skewed (positively or negatively)?

EXCEL ---> SKEW

Symmetric distribution: Mean = Median = Mode

The distribution is negatively skewed: most of the values are to the right of the mean ($\bar{x} - D < 0$).
E.g. age at death - most people die at ages over 60.

The distribution is positively skewed: most of the values are to the left of the mean ($\bar{x} - D > 0$).
E.g. income - most people have low incomes; many economic and demographic variables.
Descriptive Statistics – MEASURES OF SKEWNESS (SYMMETRY)

Pearson's coefficient of skewness (D - the mode, Me - the median)

(1) $\bar{x} - D \approx 3(\bar{x} - Me)$

(2) $A_s = \frac{\bar{x} - D}{S_x} \approx \frac{3(\bar{x} - Me)}{S_x}$

(3) $A_s \in \langle -1; 1 \rangle$

Measures of position:
Yule's coefficient

$$A_Q = \frac{(Q_3 - Me) - (Me - Q_1)}{(Q_3 - Me) + (Me - Q_1)} = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}, \qquad A_Q \in \langle -1; 1 \rangle$$

the direction of skewness

As, AQ = 0 - the distribution is symmetric
As, AQ < 0 - the distribution is skewed to the left (negatively skewed)
As, AQ > 0 - the distribution is skewed to the right (positively skewed)

the strength of skewness

if the absolute value of the coefficient is close to zero,
the distribution is only slightly skewed
0.1 - 0.4 - the distribution is slightly skewed
0.5 - 0.7 - the distribution is moderately skewed
0.8 - 1.0 - the distribution is strongly skewed

E.g. As = 0.8 - the distribution is strongly skewed to the right:
     most of the values are to the left of the mean
     As = -0.1 - the distribution is slightly skewed to the left:
     most of the values are to the right of the mean
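
Both coefficients can be tried out with the sketch below. It assumes the common textbook forms, A_s = (mean - mode) / S_x and A_Q = ((Q3 - Me) - (Me - Q1)) / (Q3 - Q1). The function names and the ten-value sample are hypothetical. The quartiles and the median for Yule's coefficient are interpolated from the newlyweds table.

```python
import statistics


def pearson_skewness(mean, mode, std):
    """Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std


def yule_skewness(q1, median, q3):
    """Yule's positional coefficient based on the quartiles and the median."""
    return ((q3 - median) - (median - q1)) / (q3 - q1)


# Hypothetical right-skewed sample (not from the lecture).
sample = [1, 2, 2, 2, 3, 3, 4, 5, 6, 7]
a_s = pearson_skewness(
    statistics.fmean(sample),
    statistics.mode(sample),
    statistics.pstdev(sample),   # population form, dividing by N
)

# Newlyweds by age of men: values interpolated from the grouped table.
q1 = 20 + (25 - 1) / 29 * 5
median = 25 + (50 - 30) / 44 * 5
q3 = 30 + (75 - 74) / 14 * 5
a_q = yule_skewness(q1, median, q3)

print(round(a_s, 2), round(a_q, 3))
```

For the hypothetical sample the coefficient is about 0.8, so it would be read as strongly skewed to the right. For the newlyweds, Yule's coefficient is very close to zero, so the middle half of that distribution is almost symmetric.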
Descriptive Statistics – MEASURES OF SKEWNESS (SYMMETRY)

The analysis of skewness can be presented by calculating the 3rd moment about the mean
(the cubed deviations from the mean)

The normalised third central moment

$$\alpha_3 = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{S_x^3}$$

is a standardised measure of skewness
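
The sketch below assumes the usual definition of the normalised third central moment: the average cubed deviation divided by the cube of the standard deviation, with both computed by dividing by N. The function name is hypothetical. The skewed sample is hypothetical too, and the symmetric one expands the grades distribution from the standard deviation example.

```python
def normalised_third_moment(values):
    """Average cubed deviation from the mean divided by S_x cubed."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        raise ValueError("all values are equal, so the moment is undefined")
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / variance ** 1.5


skewed = [1, 2, 2, 2, 3, 3, 4, 5, 6, 7]       # hypothetical right-skewed sample
symmetric = [3] * 3 + [4] * 10 + [5] * 3      # grades: 3 x 3, 10 x 4, 3 x 5

alpha3_skewed = normalised_third_moment(skewed)
alpha3_symmetric = normalised_third_moment(symmetric)
print(round(alpha3_skewed, 2), alpha3_symmetric)
```

A positive value indicates a longer right tail, and a symmetric distribution gives zero.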


Descriptive Statistics – MEASURES OF KURTOSIS

The analysis of kurtosis can be presented by calculating the 4th moment about the mean
EXCEL ---> KURT

The normalised fourth central moment

$$\alpha_4 = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4}{S_x^4}$$

is a standardised measure of kurtosis

MESOKURTIC (normal) DISTRIBUTION - distributions with the same
concentration about the mean value as the normal distribution.

PLATYKURTIC DISTRIBUTION - distributions with a lower (smaller)
concentration about the mean value than the normal distribution.
They are flatter than the normal one.

LEPTOKURTIC DISTRIBUTION - distributions with a higher (greater)
concentration about the mean value than the normal distribution.
They are more peaked than the normal one.
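
The three types can be told apart by comparing the normalised fourth central moment with 3, its value for the normal distribution. The sketch below is illustrative only: the function names, the two small samples and the tolerance of 0.1 used to call a distribution mesokurtic are all assumptions.

```python
def normalised_fourth_moment(values):
    """Average fourth-power deviation from the mean divided by S_x to the fourth."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        raise ValueError("all values are equal, so the moment is undefined")
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / variance ** 2


def classify_kurtosis(alpha4, tolerance=0.1):
    """Compare alpha4 with 3 (normal distribution); the tolerance is an assumption."""
    if abs(alpha4 - 3) <= tolerance:
        return "mesokurtic"
    return "platykurtic" if alpha4 < 3 else "leptokurtic"


flat = list(range(1, 10))                 # hypothetical evenly spread sample
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]     # hypothetical sample piled up at the mean

alpha4_flat = normalised_fourth_moment(flat)
alpha4_peaked = normalised_fourth_moment(peaked)
print(round(alpha4_flat, 2), classify_kurtosis(alpha4_flat))
print(round(alpha4_peaked, 2), classify_kurtosis(alpha4_peaked))
```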
