2023 Statistics Fin 5

Statistics
Lecture 5
Descriptive Statistics
1. ANALYSIS OF LOCATION (CENTRAL TENDENCY)

where, along the scale of all possible values, our particular distribution
happens to be centered (mean, median, mode)
The purpose: to describe the population with one figure, a representative value of a mass of data.
MEASURES OF LOCATION – THE MEASURES OF POSITION:
• Mode
• Quantiles: Quartiles, Deciles, Percentiles
2. ANALYSIS OF DISPERSION (VARIATION)

how the data varies (it should always complement measures of location)
3. ANALYSIS OF SKEWNESS (SYMMETRY)

whether the distribution is symmetric or skewed
4. ANALYSIS OF CONCENTRATION
• ANALYSIS OF CONCENTRATION – distribution of the total value among the elementary units:
whether the total value of the variable is distributed uniformly among the elementary units or not
• ANALYSIS OF KURTOSIS (PEAKEDNESS) – concentration of the elementary units near the mean value:
whether the distribution is mesokurtic, leptokurtic or platykurtic
Descriptive Statistics - MEASURES OF CENTRAL TENDENCY - Measures of Position
MEDIAN – the value above which and below which lie an equal number of observations
Some properties of the median:
• always exists and is unique for every distribution
• is not affected by extreme values in the distribution, so it is a useful representative value for highly skewed distributions
• can be estimated from a frequency distribution even if the distribution has unequal class intervals or open-ended classes
Telephone-call duration - xi (in min.)    Number of calls - ni
Less than 2                               9
2 - 4                                     15
4 - 8                                     6
Above 8                                   3
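The telephone-call table has open-ended classes at both ends, yet the median can still be estimated, as the last property says. A minimal Python sketch (Python is used here only for illustration), assuming the same interpolation formula as for the quartiles with N/2 in place of N/4; the function name `grouped_median` and the (lower boundary, width, frequency) layout are hypothetical:

```python
# Grouped-data median by linear interpolation inside the median class:
#   Me = x0 + (N/2 - cumulative frequency of preceding classes) / n0 * h
# ASSUMPTION: open-ended classes are stored with None; they are allowed
# as long as the median does not fall inside them.

def grouped_median(classes):
    n_total = sum(freq for _, _, freq in classes)
    position = n_total / 2          # N/2
    cumulative = 0                  # cumulative frequency of the preceding classes
    for lower, width, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            if lower is None or width is None:
                raise ValueError("median falls in an open-ended class")
            return lower + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("empty distribution")

# Telephone-call durations (in min.)
calls = [
    (None, None, 9),   # less than 2 (open-ended)
    (2, 2, 15),        # 2 - 4
    (4, 4, 6),         # 4 - 8
    (8, None, 3),      # above 8 (open-ended)
]

median_duration = grouped_median(calls)
print(median_duration)  # 3.0
```

Here N = 33, so N/2 = 16.5 falls in the 2-4 class (cumulative frequencies 9, 24, ...), and about half of the calls lasted less than 3 minutes. The open classes never enter the calculation.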
Quartiles Q1, Q3   EXCEL ---> QUARTILE
Q1 and Q3 are calculated in the same way as Q2 (the median).
STEPS:
• calculate the value N/4 (for Q1) or 3N/4 (for Q3)
• build a cumulative series to determine the class interval containing Q1 or Q3
The 1st quartile (lower) - Q1 exceeds 25% of the data and is exceeded by 75% of the data
$Q_1 = x_0 + \frac{N/4 - \mathrm{cum}\,n_{-1}}{n_0} \cdot h$
– $x_0$ – the lower boundary of the class containing Q1
– $n_0$ – the frequency of the class containing Q1
– $\mathrm{cum}\,n_{-1}$ – the cumulative frequency of the class preceding the class containing Q1
– $h$ – the width of the class containing Q1
The 2nd quartile - Q2 = MEDIAN
The 3rd quartile (upper) - Q3 exceeds 75% of the data and is exceeded by 25% of the data
$Q_3 = x_0 + \frac{3N/4 - \mathrm{cum}\,n_{-1}}{n_0} \cdot h$
– $x_0$ – the lower boundary of the class containing Q3
– $n_0$ – the frequency of the class containing Q3
– $\mathrm{cum}\,n_{-1}$ – the cumulative frequency of the class preceding the class containing Q3
– $h$ – the width of the class containing Q3
E.g. Variable – newlyweds by age of men (percentage)

Age of men - xi    ni     cum ni
less than 20       1      1
20 - 25            29     30      (Q1 class)
25 - 30            44     74
30 - 35            14     88      (Q3 class)
35 - 40            4      92
40 - 50            4      96
50 and more        4      100
Total              100

$\frac{N}{4} = 25 \qquad Q_1 = 20 + \frac{25 - 1}{29} \cdot 5 \approx 24$
The interpretation: 25% of men get married below the age of 24 and 75% of men get married above the age of 24.

$\frac{3N}{4} = 75 \qquad Q_3 = 30 + \frac{75 - 74}{14} \cdot 5 \approx 30$
The interpretation: 75% of men get married below the age of 30 and 25% of men get married above the age of 30.
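The quartile steps above (position N/4 or 3N/4, cumulative series, interpolation inside the class) can be sketched as one Python routine. This is an illustrative sketch, not the lecture's method verbatim: the name `grouped_quantile`, the (lower boundary, width, frequency) layout and the use of `None` for open-ended classes are assumptions.

```python
def grouped_quantile(classes, p):
    """Quantile of order p (0.25 -> Q1, 0.5 -> Me, 0.75 -> Q3) from grouped data.

    classes: list of (lower boundary, width, frequency); None marks an open end.
    Formula: Q = x0 + (p*N - cum_prev) / n0 * h
    """
    n_total = sum(freq for _, _, freq in classes)
    position = p * n_total                 # N/4, N/2 or 3N/4
    cumulative = 0                         # cumulative frequency before the current class
    for lower, width, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            if lower is None or width is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("p must be in (0, 1]")

# Newlyweds (percentages) by age of men
newlyweds = [
    (None, None, 1),   # less than 20
    (20, 5, 29),       # 20 - 25
    (25, 5, 44),       # 25 - 30
    (30, 5, 14),       # 30 - 35
    (35, 5, 4),        # 35 - 40
    (40, 10, 4),       # 40 - 50
    (50, None, 4),     # 50 and more
]

q1 = grouped_quantile(newlyweds, 0.25)
me = grouped_quantile(newlyweds, 0.50)
q3 = grouped_quantile(newlyweds, 0.75)
print(round(q1, 2), round(me, 2), round(q3, 2))  # 24.14 27.27 30.36
```

The unrounded quartiles 24.14 and 30.36 round to the 24 and 30 used in the lecture; the median (about 27.27) follows from the same formula with N/2 = 50.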
Descriptive Statistics - MEASURES OF CENTRAL TENDENCY
Comparison of the MEAN, MEDIAN and MODE

MEAN - the center of gravity of the distribution.
MEDIAN - divides the area into two equal parts
MODE - highest point of the distribution.
Symmetrical distribution – has only one peak and is symmetric; the mean, the median and the mode all coincide.
The bell shaped normal curve is an example of such a case.
https://www.quora.com/How-can-I-tell-if-I-see-a-normal-distribution


Measures of dispersion should always complement measures of location, because a measure such as the mean can be very misleading when it appears alone.

Descriptive Statistics - MEASURES OF DISPERSION
MEASURES OF DISPERSION – describe the amount of spread in a set of data in one of two ways:
DISTANCE MEASURES - the distance between two particular observed values (e.g. we can calculate Q3 - Q1)
• can be calculated quickly and easily
• frequently used in describing the data, but rarely used to measure the variation in a set of data when we are interested in testing hypotheses about a population parameter
DEVIATIONS FROM THE MEAN - the average of the deviations of the units from some central value (mean)
DISTANCE MEASURES
THE RANGE $R = x_{max} - x_{min}$
THE INTERQUARTILE RANGE IQR = Q3 - Q1
the middle 50% of the observations fall between Q1 and Q3
E.g. Variable - newlyweds by age of men
IQR=Q3-Q1=30-24=6
Interpretation: 50% of men get married between the ages of 24 and 30.
The age of the middle 50% of newly married men varies from 24 to 30, so
the range of the middle 50% of newly married men equals about 6 years.
DISTANCE MEASURES
QUARTILE DEVIATION
(SEMI-INTERQUARTILE RANGE) $Q = \frac{Q_3 - Q_1}{2}$
- measure of dispersion around the median (the average spread of the 1st and the 3rd quartiles from the median)
- the measure of the middle 50% of the data
E.g. $Q = \frac{30 - 24}{2} = 3$
Interpretation: The ages of newly married men differ from the median by about 3 years on average.
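A short arithmetic check of the two distance measures for the newlyweds example. The first part uses the lecture's rounded quartiles (24 and 30); the second re-applies the grouped-data quartile formula without rounding, to show how much the rounding matters. Variable names are illustrative.

```python
# Distance measures for the newlyweds example (ages in years).
q1, q3 = 24, 30                # rounded quartiles from the lecture

iqr = q3 - q1                  # interquartile range: spread of the middle 50%
quartile_deviation = iqr / 2   # semi-interquartile range Q = (Q3 - Q1) / 2
print(iqr, quartile_deviation)  # 6 3.0

# Unrounded grouped-data quartiles give a slightly larger spread
q1_exact = 20 + (25 - 1) / 29 * 5
q3_exact = 30 + (75 - 74) / 14 * 5
print(round(q3_exact - q1_exact, 2), round((q3_exact - q1_exact) / 2, 2))  # 6.22 3.11
```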
DEVIATIONS FROM THE MEAN: ! Problem with averaging the deviations from the mean –
the positive and the negative deviations cancel each other out (their sum is always zero)
THE MEAN ABSOLUTE DEVIATION EXCEL ---> AVEDEV
the average value of the absolute deviations from the mean
$d = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$     In the case of grouped data: $d = \frac{\sum_{i=1}^{k} |x'_i - \bar{x}| f_i}{n}$
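A minimal Python sketch of both mean-absolute-deviation formulas, reusing the wage and grade data from the standard-deviation examples of this lecture. It also shows why plain deviations cannot be averaged: they sum to zero. The function names are hypothetical.

```python
# Mean absolute deviation:
#   detailed series: d = sum|x_i - mean| / n
#   grouped data:    d = sum|x'_i - mean| * f_i / n

def mean_absolute_deviation(values):
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def grouped_mean_absolute_deviation(points, freqs):
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(points, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(points, freqs)) / n

wages = [1800, 2000, 2200]  # wages of workers (detailed series)
mean_wage = sum(wages) / len(wages)
# Plain deviations cancel out, which is why absolute values (or squares) are needed
print(sum(x - mean_wage for x in wages))          # 0.0
print(round(mean_absolute_deviation(wages), 2))   # 133.33

grades = [2, 3, 4, 5]
counts = [0, 3, 10, 3]
print(grouped_mean_absolute_deviation(grades, counts))  # 0.375
```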
THE VARIANCE - the average value of the squared deviations from the mean
SAMPLE $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
POPULATION $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
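To contrast the sample and population formulas (and to check the computational form of the sample variance), the sketch below treats the three worker wages from the later example once as a sample and once as a whole population. That double use is purely illustrative. The comparison with Python's `statistics.variance` and `statistics.pvariance` is only a consistency check.

```python
import statistics

wages = [1800, 2000, 2200]
n = len(wages)
mean = sum(wages) / n
ss = sum((x - mean) ** 2 for x in wages)             # sum of squared deviations

sample_var = ss / (n - 1)                            # s^2, divides by n - 1
population_var = ss / n                              # sigma^2, divides by N
shortcut_var = (sum(x * x for x in wages) - n * mean ** 2) / (n - 1)

print(sample_var, shortcut_var)                      # 40000.0 40000.0
print(round(population_var, 2))                      # 26666.67

# The standard library agrees with both definitions
assert statistics.variance(wages) == sample_var
assert abs(statistics.pvariance(wages) - population_var) < 1e-9
```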
Problems with interpretation: the variance measures the average of the squared deviations, so it is expressed in squared units

! and it is frequently a very large number relative to the original observations
Therefore the square root of the variance is calculated: the STANDARD DEVIATION

DEVIATIONS FROM THE MEAN EXCEL ---> STDEVP

STANDARD DEVIATION   SAMPLE: $s = s_x = S = S_x$   POPULATION: $\sigma$
detailed series
$S_x = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} = \sqrt{\frac{\sum_{i=1}^{N} x_i^2}{N} - \bar{x}^2}$
E.g. Variable – wages of workers
Id       Wages - xi    (xi − x̄)²
1        1800          40000
2        2000          0
3        2200          40000
Total    6000          80000
$\bar{x} = 6000/3 = 2000 \qquad S_x = \sqrt{80000/3} \approx 163.3$
Interpretation: The wages of workers differ from the mean by about 163.3 zloty on average.
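Both forms of the detailed-series standard deviation (the definitional one and the computational "mean of squares minus square of the mean" one) can be checked on the wage table. A minimal Python sketch using only `math.sqrt`:

```python
import math

wages = [1800, 2000, 2200]   # wages of workers (zloty)
N = len(wages)
mean = sum(wages) / N

# Definitional form: sqrt( sum (x_i - mean)^2 / N )
s_def = math.sqrt(sum((x - mean) ** 2 for x in wages) / N)
# Computational form: sqrt( sum x_i^2 / N - mean^2 )
s_comp = math.sqrt(sum(x * x for x in wages) / N - mean ** 2)

print(round(s_def, 1), round(s_comp, 1))  # 163.3 163.3
```

Both forms divide by N, matching Excel's STDEVP (the population version).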
frequency distribution of a discrete variable
$S_x = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 n_i}{N}} = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 n_i}{N} - \bar{x}^2}$
E.g. Variable – the grades
Grades - xi    Number of grades - ni    xi·ni    (xi − x̄)²·ni
2              0                        0        0
3              3                        9        3
4              10                       40       0
5              3                        15       3
Total          16                       64       6
$\bar{x} = 64/16 = 4 \qquad S_x = \sqrt{6/16} \approx 0.61$
Interpretation: The grades differ from the mean by about 0.61 on average.
DEVIATIONS FROM THE MEAN EXCEL ---> STDEVP

STANDARD DEVIATION
frequency distribution of a continuous variable (x'i – the class midpoint)
$S_x = \sqrt{\frac{\sum_{i=1}^{k} (x'_i - \bar{x})^2 n_i}{N}} = \sqrt{\frac{\sum_{i=1}^{k} x_i'^2 n_i}{N} - \bar{x}^2}$
E.g. Variable – wages of accountants
Wages - xi    Number of employees - ni    Midpoint - x'i    x'i·ni    x'i²·ni
0 - 6         3                           3                 9         27
6 - 12        4                           9                 36        324
12 - 18       13                          15                195       2925
Total         20                                            240       3276
$\bar{x} = 240/20 = 12 \qquad S_x = \sqrt{3276/20 - 12^2} = \sqrt{19.8} \approx 4.45$
Interpretation: The wages of accountants differ from the mean by about 4.45 thousand zloty (about 4450 zloty) on average.
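The discrete and the continuous case use the same formula; the only difference is that for class intervals the midpoints x'i stand in for the values. A minimal Python sketch applied to both tables (the function name `grouped_std` is hypothetical):

```python
import math

def grouped_std(points, freqs):
    """Standard deviation (dividing by N) for a frequency distribution.

    points: values x_i (discrete variable) or class midpoints x'_i (continuous variable)
    freqs:  frequencies n_i
    Returns (mean, std) using S = sqrt( sum x^2 n / N - mean^2 ).
    """
    N = sum(freqs)
    mean = sum(x * n for x, n in zip(points, freqs)) / N
    mean_sq = sum(x * x * n for x, n in zip(points, freqs)) / N
    return mean, math.sqrt(mean_sq - mean ** 2)

# Discrete variable: grades
mean_g, std_g = grouped_std([2, 3, 4, 5], [0, 3, 10, 3])
print(mean_g, round(std_g, 2))   # 4.0 0.61

# Continuous variable: wages of accountants, classes 0-6, 6-12, 12-18 -> midpoints 3, 9, 15
mean_w, std_w = grouped_std([3, 9, 15], [3, 4, 13])
print(mean_w, round(std_w, 2))   # 12.0 4.45
```

Using midpoints assumes the values are spread evenly within each class, so for continuous data the result is an approximation.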
Descriptive Statistics - MEASURES OF DISPERSION CHEBYSHEV’S THEOREM
Empirical (68-95-99.7) Rule

For data sets having a distribution that is approximately bell shaped, the following properties apply:
• about 68% of all values fall within 1 standard deviation of the mean
• about 95% of all values fall within 2 standard deviations of the mean
• about 99.7% of all values fall within 3 standard deviations of the mean
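The heading of this part also names Chebyshev's theorem, whose statement is not reproduced here; in its standard form it says that for any distribution at least 1 − 1/k² of the values lie within k standard deviations of the mean. A small Python sketch comparing that guaranteed minimum with the shares for an exactly normal distribution (computed with `math.erf`) shows where the 68-95-99.7 figures come from:

```python
import math

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of ANY distribution lies within k std devs (informative for k > 1)."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4), round(chebyshev_lower_bound(k), 4))
# 1 0.6827 0.0
# 2 0.9545 0.75
# 3 0.9973 0.8889
```

The empirical rule holds only for roughly bell-shaped data; Chebyshev's bound is weaker but applies to any distribution.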
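THE TYPICAL RANGE $\bar{x} - S_x < x_{typ} < \bar{x} + S_x$

E.g. Variable – wages of accountants: $12 - 4.45 < x_{typ} < 12 + 4.45$
Interpretation:
A typical accountant earns between 7.55 and 16.45 thousand zloty.
Or: about 68% of accountants earn between 7.55 and 16.45 thousand zloty (if the distribution is approximately bell shaped).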
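The typical range is simply the mean plus or minus one standard deviation. A Python sketch that recomputes it directly from the accountants' frequency table (midpoints 3, 9, 15 with frequencies 3, 4, 13):

```python
import math

mids, freqs = [3, 9, 15], [3, 4, 13]   # wages of accountants: class midpoints and counts
N = sum(freqs)
mean = sum(x * n for x, n in zip(mids, freqs)) / N
std = math.sqrt(sum((x - mean) ** 2 * n for x, n in zip(mids, freqs)) / N)

low, high = mean - std, mean + std     # typical range: mean - S < x_typ < mean + S
print(round(low, 2), round(high, 2))   # 7.55 16.45
```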
THE POSITION VERSION OF THE TYPICAL RANGE

Interpretation: The typical age of newlywed men ranges from 24 to 30.

THE COEFFICIENT OF VARIATION - the sample standard deviation stated as a proportion of
the arithmetic mean of the distribution
$V_x = \frac{S_x}{\bar{x}} \cdot 100\%$
The lower the value of the coefficient of variation, the more homogeneous the population.
E.g. Variable – wages of accountants: $V_x = \frac{4.45}{12} \cdot 100\% \approx 37\%$
Interpretation: The standard deviation of the wages of accountants constitutes over 37% of the mean.
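A Python sketch of the coefficient of variation for the accountants' wages, recomputing the mean and standard deviation from the frequency table so the block runs on its own:

```python
import math

mids, freqs = [3, 9, 15], [3, 4, 13]   # wages of accountants: class midpoints and counts
N = sum(freqs)
mean = sum(x * n for x, n in zip(mids, freqs)) / N
std = math.sqrt(sum((x - mean) ** 2 * n for x, n in zip(mids, freqs)) / N)

cv = std / mean * 100                  # V_x = S_x / mean * 100%
print(round(cv, 1))                    # 37.1
```

Because V_x has no units, it can compare variables measured in different units or at very different levels.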
THE POSITION VERSION OF THE COEFFICIENT OF VARIATION - the quartile deviation stated as a proportion of the median

$V_Q = \frac{Q}{Me} \cdot 100\%$
Interpretation: The quartile deviation of the age of newlywed men constitutes over 11% of the median.
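The median of the newlyweds distribution is not printed in this section. The sketch below derives it, together with the quartiles, from the grouped-data interpolation formula applied to the newlyweds table (an assumption consistent with the quartile calculation), and then computes the positional coefficient of variation:

```python
# Positional coefficient of variation V_Q = Q / Me * 100% for the newlyweds table.
# Each quantile: x0 + (position - cumulative frequency of preceding classes) / class frequency * width
q1 = 20 + (25 - 1) / 29 * 5    # N/4 = 25 falls in the 20-25 class
me = 25 + (50 - 30) / 44 * 5   # N/2 = 50 falls in the 25-30 class
q3 = 30 + (75 - 74) / 14 * 5   # 3N/4 = 75 falls in the 30-35 class

q = (q3 - q1) / 2              # quartile deviation
vq = q / me * 100              # in percent

print(round(q, 2), round(me, 2), round(vq, 1))  # 3.11 27.27 11.4
```

With the rounded Q = 3 the ratio is exactly 11%; the unrounded quartiles give about 11.4%. Either way it is "over 11%" of the median.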



Descriptive Statistics – MEASURES OF SKEWNESS (SYMMETRY)
Whether the distribution is symmetric or skewed (positively or negatively)?   EXCEL ---> SKEW
• symmetric distribution: Mean = Median = Mode
• negatively skewed distribution: most of the values are to the right of the mean,
e.g. age at death – most people die at ages over 60
• positively skewed distribution: most of the values are to the left of the mean,
e.g. income – most people have low incomes; many economic and demographic variables
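Pearson's coefficient of skewness (D – the mode)
(1) $\bar{x} - D \approx 3(\bar{x} - Me)$ (for moderately skewed distributions)
(2) $A_s = \frac{\bar{x} - D}{S_x}$
(3) usually $A_s \in \langle -1; 1 \rangle$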

Measures of position:
Yule's coefficient $A_Q = \frac{(Q_3 - Me) - (Me - Q_1)}{Q_3 - Q_1} \in \langle -1; 1 \rangle$
the direction of skewness

AS, AQ = 0 - the distribution is symmetric
AS, AQ < 0 - the distribution is skewed to the left (negatively skewed)
AS, AQ > 0 - the distribution is skewed to the right (positively skewed)
the strength of skewness (absolute value of the coefficient)

if the value of the coefficient is close to zero, the distribution is only slightly skewed
0.1 − 0.4 - the distribution is slightly skewed
0.5 − 0.7 - the distribution is moderately skewed
0.8 − 1.0 - the distribution is strongly skewed
E.g. As = 0.8 – the distribution is strongly skewed to the right: most of the values are to the left of the mean
As = −0.1 – the distribution is slightly skewed to the left: most of the values are to the right of the mean
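A Python sketch that turns a skewness coefficient into the verbal description above and computes Yule's coefficient for the newlyweds table. Assumptions: the lecture's bands leave gaps (0.4 to 0.5, 0.7 to 0.8), so the cut-offs at 0.5 and 0.8 are a choice made here, and the quartiles and median come from the grouped-data formula. The function names are hypothetical.

```python
def yule_coefficient(q1, me, q3):
    """Yule's coefficient of skewness: ((Q3 - Me) - (Me - Q1)) / (Q3 - Q1), in [-1, 1]."""
    return ((q3 - me) - (me - q1)) / (q3 - q1)

def describe_skewness(a):
    """Direction from the sign, strength from |a| (band boundaries are an assumption)."""
    if a == 0:
        return "symmetric"
    direction = "to the right (positively)" if a > 0 else "to the left (negatively)"
    size = abs(a)
    if size < 0.5:
        strength = "slightly"
    elif size < 0.8:
        strength = "moderately"
    else:
        strength = "strongly"
    return f"{strength} skewed {direction}"

print(describe_skewness(0.8))    # strongly skewed to the right (positively)
print(describe_skewness(-0.1))   # slightly skewed to the left (negatively)

# Newlyweds by age of men: grouped-data quartiles and median
q1 = 20 + (25 - 1) / 29 * 5
me = 25 + (50 - 30) / 44 * 5
q3 = 30 + (75 - 74) / 14 * 5
a_q = yule_coefficient(q1, me, q3)
print(round(a_q, 3))             # -0.008
```

A value this close to zero means the middle half of the age distribution is practically symmetric around the median.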
The analysis of skewness can also be carried out by calculating the 3rd moment about the mean
(the cubed deviations from the mean):
$m_3 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{N}$
The normalised third central moment
$\alpha_3 = \frac{m_3}{S_x^3}$
is a standardised measure of skewness
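A Python sketch of the normalised third central moment, applied to the accountants' wage table (class midpoints stand in for the values, as in the standard-deviation example). Excel's SKEW applies a small-sample correction, so its result will generally differ from this population-style moment. The helper name `central_moment` is hypothetical.

```python
import math

mids, freqs = [3, 9, 15], [3, 4, 13]   # wages of accountants: class midpoints and counts
N = sum(freqs)
mean = sum(x * n for x, n in zip(mids, freqs)) / N

def central_moment(r):
    """r-th central moment of the grouped data: sum (x' - mean)^r * n / N."""
    return sum((x - mean) ** r * n for x, n in zip(mids, freqs)) / N

std = math.sqrt(central_moment(2))
alpha3 = central_moment(3) / std ** 3   # normalised third central moment
print(round(alpha3, 2))                 # -1.1
```

The negative sign agrees with the table: most accountants are in the top class, so the long tail points to the left.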




Descriptive Statistics – MEASURES OF KURTOSIS
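The analysis of kurtosis can be carried out by calculating the 4th moment about the mean
EXCEL ---> KURT
The normalised fourth central moment
$\alpha_4 = \frac{m_4}{S_x^4}$, where $m_4 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{N}$
is a standardised measure of kurtosis (for the normal distribution $\alpha_4 = 3$)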
MESOKURTIC (normal) DISTRIBUTION - distributions with the same
concentration about the mean value as the normal distribution
PLATYKURTIC DISTRIBUTION - distributions with a lower (smaller)
concentration about the mean value than the normal distribution;
they are flatter than the normal one.
LEPTOKURTIC DISTRIBUTION - distributions with a higher (greater)
concentration about the mean value than the normal distribution;
they are more peaked than the normal one.
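A Python sketch of the normalised fourth central moment for the accountants' wage table, with a classification into the three types above. Assumptions: the 0.1 tolerance used to call a distribution mesokurtic is arbitrary, not a lecture rule, and Excel's KURT reports a sample-adjusted excess kurtosis (relative to 3), so its value will differ from this one. The helper names are hypothetical.

```python
mids, freqs = [3, 9, 15], [3, 4, 13]   # wages of accountants: class midpoints and counts
N = sum(freqs)
mean = sum(x * n for x, n in zip(mids, freqs)) / N

def central_moment(r):
    """r-th central moment of the grouped data: sum (x' - mean)^r * n / N."""
    return sum((x - mean) ** r * n for x, n in zip(mids, freqs)) / N

alpha4 = central_moment(4) / central_moment(2) ** 2   # m4 / S^4
excess = alpha4 - 3                                   # 0 for the normal distribution

def kurtosis_type(a4, tol=0.1):
    """Classify by comparing with the normal value 3 (tol is an illustrative assumption)."""
    if abs(a4 - 3) <= tol:
        return "mesokurtic"
    return "platykurtic" if a4 < 3 else "leptokurtic"

print(round(alpha4, 2), round(excess, 2), kurtosis_type(alpha4))  # 2.69 -0.31 platykurtic
```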
