
STATISTICS

May 2022

Statistical indicators of univariate data – Descriptive analysis

A. Indicators of central tendency

The mean of a non-binary variable

Significance: the average level of the variable, at sample level.

I. Ungrouped data

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$, $i = 1, \dots, n$, is the level of the variable for statistical unit “i”.

II. Grouped data

$$\bar{x} = \frac{\sum_{i=1}^{r} x_i \cdot n_i}{\sum_{i=1}^{r} n_i}$$

where $x_i$, $i = 1, \dots, r$, is the level of the variable for group “i”, and $n_i$, $i = 1, \dots, r$, is the absolute frequency of group “i”.
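A minimal Python sketch of the two mean formulas, under the assumption that grouped data are given as two parallel lists (distinct levels and their absolute frequencies). The function names and the sample of marks are hypothetical, chosen only for illustration.

```python
def mean_ungrouped(values):
    """Arithmetic mean: the sum of x_i divided by n."""
    return sum(values) / len(values)


def mean_grouped(levels, freqs):
    """Weighted mean: the sum of x_i * n_i divided by the sum of n_i."""
    return sum(x * n for x, n in zip(levels, freqs)) / sum(freqs)


# Hypothetical sample: marks of 10 students.
marks = [5, 6, 6, 7, 7, 7, 8, 8, 9, 10]

# The same sample written as groups: distinct values x_i with frequencies n_i.
levels = [5, 6, 7, 8, 9, 10]
freqs = [1, 2, 3, 2, 1, 1]

print(mean_ungrouped(marks))
print(mean_grouped(levels, freqs))
```

Both calls describe the same sample, so the two formulas give the same mean.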
The mean of the binary variable

$$f = \frac{m}{n}$$

where m is the number of units in the favorable case (those which meet a previously set condition) and n is the total number of units in the sample (the sample size).
The Median

Significance: 50% of the sample units have a value of the variable lower than the Median, and 50% have a value higher than it.

I. Ungrouped data

Rank the data in ascending order.
Find the Median position (location): $Me_{pos} = \frac{n+1}{2}$
If n is an odd number, Me = the middle value in the ranked data series.
If n is an even number, Me = the mean of the two middle values in the ranked data series.
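The steps for ungrouped data can be sketched in Python as follows. The function name is hypothetical; positions are 1-based as in the text, so the code subtracts 1 when indexing a Python list.

```python
def median_ungrouped(values):
    """Median of raw data, following the position rule Me_pos = (n + 1) / 2."""
    ranked = sorted(values)            # rank the data in ascending order
    n = len(ranked)
    if n % 2 == 1:
        pos = (n + 1) // 2             # whole-number position for odd n
        return ranked[pos - 1]         # middle value (1-based -> 0-based)
    # Even n: the position falls between two values, so average them.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))   # odd n
print(median_ungrouped([4, 1, 3, 2]))      # even n
```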

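II. Grouped data

Calculate the ascending absolute cumulative frequencies ($Fa_i$).
Find the Median position (location): $Me_{pos} = \frac{\sum n_i + 1}{2} = \frac{n+1}{2}$
Find the first $Fa_i \ge Me_{pos}$ (when n is odd, the position is a whole number and a cumulative frequency exactly equal to it already contains the middle unit).
The value of the variable corresponding to this $Fa_i$ is the Median.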
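A short Python sketch of the grouped-data procedure, assuming the groups are listed in ascending order of the variable. The function name and data are hypothetical. The comparison is written as "greater than or equal to" so that a cumulative frequency exactly equal to the median position still selects that group; this is an assumption of the sketch.

```python
from itertools import accumulate


def median_grouped(levels, freqs):
    """Median of grouped data via ascending cumulative frequencies Fa_i."""
    cumulative = list(accumulate(freqs))     # Fa_1, Fa_2, ..., Fa_r
    pos = (cumulative[-1] + 1) / 2           # Me_pos = (n + 1) / 2
    for x, fa in zip(levels, cumulative):
        if fa >= pos:                        # first Fa_i reaching the position
            return x


# Hypothetical marks: distinct values and their absolute frequencies.
levels = [5, 6, 7, 8, 9, 10]
freqs = [1, 2, 3, 2, 1, 1]

print(median_grouped(levels, freqs))
```

Here the cumulative frequencies are 1, 3, 6, 8, 9, 10 and the position is 5.5, so the third group is the first to reach it.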
The Mode

The value of the variable ($x_i$) with the maximum frequency (the maximum $n_i$).

Significance: more units in the sample have the value of the variable equal to the Mode than to any other value.
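The Mode can be found by counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the function name is hypothetical, and returning every tied value is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def modes(values):
    """Return the value(s) with the maximum absolute frequency n_i."""
    counts = Counter(values)                 # value -> frequency
    top = max(counts.values())               # the maximum n_i
    return sorted(x for x, n in counts.items() if n == top)


print(modes([5, 6, 6, 7, 7, 7, 8, 8, 9, 10]))   # a single mode
print(modes([1, 1, 2, 2, 3]))                   # two values tie
```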

B. Indicators of variability

Range: $R = x_{\max} - x_{\min}$
Quartiles

The first quartile (Q1)

Significance: 25% of the sample units have a value of the variable lower than Q1, and 75% have a value higher than it.
Follow the same steps as for determining the Median, for ungrouped / grouped data, but change the position as follows:

$$Q_{1,pos} = \frac{\sum n_i + 1}{4} = \frac{n+1}{4}$$
The second quartile (Q2) is the Median (Me)

The third quartile (Q3)

Significance: 75% of the sample units have a value of the variable lower than Q3, and 25% have a value higher than it.
Follow the same steps as for determining the Median, for ungrouped / grouped data, but change the position as follows:

$$Q_{3,pos} = \frac{3\left(\sum n_i + 1\right)}{4} = \frac{3(n+1)}{4}$$
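A Python sketch of the quartile positions for ungrouped data. The text does not say what to do when the position is not a whole number; the sketch assumes linear interpolation between the two neighbouring ranked values, which reduces to "the mean of the two middle values" for the Median. The helper names are hypothetical.

```python
def value_at_position(ranked, pos):
    """Value at a 1-based, possibly fractional, position in ranked data.

    Assumption: fractional positions are resolved by linear interpolation.
    """
    pos = min(max(pos, 1), len(ranked))      # keep the position inside the data
    lower = int(pos)                         # whole part of the position
    frac = pos - lower                       # fractional part
    if frac == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


def quartiles_q1_q3(values):
    """Q1 and Q3 using the positions (n + 1) / 4 and 3 (n + 1) / 4."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = value_at_position(ranked, (n + 1) / 4)
    q3 = value_at_position(ranked, 3 * (n + 1) / 4)
    return q1, q3


print(quartiles_q1_q3([2, 4, 4, 5, 7, 9, 10]))             # whole-number positions
print(quartiles_q1_q3([5, 6, 6, 7, 7, 7, 8, 8, 9, 10]))    # fractional positions
```

Under this assumption the results should agree with `statistics.quantiles(values, n=4)` from the Python standard library, whose default "exclusive" method uses the same positions.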
Interquartile range

The difference between the third and the first quartiles: $IQR = Q_3 - Q_1$
Outliers

The values of the variable $x_i$ which meet one of the following conditions:
$x_i < Q_1 - 1.5 \cdot IQR$
$x_i > Q_3 + 1.5 \cdot IQR$
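The two outlier conditions can be checked with a few lines of Python. This sketch takes Q1 and Q3 from `statistics.quantiles`, whose default "exclusive" method places the quartiles at positions (n + 1) / 4 and 3 (n + 1) / 4 and interpolates linearly between ranked values; the interpolation is an assumption. The function name and data are hypothetical.

```python
from statistics import quantiles


def find_outliers(values):
    """Return the values lying beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)       # Q1, Q2 (Median), Q3
    iqr = q3 - q1                            # interquartile range
    low = q1 - 1.5 * iqr                     # lower limit
    high = q3 + 1.5 * iqr                    # upper limit
    return [x for x in values if x < low or x > high]


data = [2, 4, 4, 5, 7, 9, 10, 30]
print(find_outliers(data))
```

For this sample Q1 = 4 and Q3 = 9.75, so the limits are -4.625 and 18.375, and only the value 30 falls outside them.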
Variance – for a non-binary variable

Significance: if $s^2 = 0$, there is no variation in the data series: all values are equal to each other and equal to the Mean; if $s^2 > 0$, there is variability in the data series, and the higher the $s^2$ value, the greater the variability.

I. Ungrouped data

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \quad \text{if } n \le 30$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \quad \text{if } n > 30$$

II. Grouped data

$$s^2 = \frac{\sum_{i=1}^{r} (x_i - \bar{x})^2 \cdot n_i}{n-1} \quad \text{if } n \le 30$$

$$s^2 = \frac{\sum_{i=1}^{r} (x_i - \bar{x})^2 \cdot n_i}{n} \quad \text{if } n > 30$$
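A Python sketch that covers all four variance formulas with one function: ungrouped data are treated as groups of frequency 1, and the divisor switches between n - 1 and n at the n = 30 threshold stated in the text. The function name and sample data are hypothetical.

```python
def variance(levels, freqs=None):
    """Sample variance s^2 for ungrouped data (freqs=None) or grouped data."""
    if freqs is None:
        freqs = [1] * len(levels)            # each raw value counts once
    n = sum(freqs)                           # sample size
    mean = sum(x * k for x, k in zip(levels, freqs)) / n
    squares = sum((x - mean) ** 2 * k for x, k in zip(levels, freqs))
    divisor = n - 1 if n <= 30 else n        # rule from the text
    return squares / divisor


raw = [2, 4, 4, 4, 5, 5, 7, 9]               # n = 8, so the divisor is n - 1
print(variance(raw))
print(variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))   # same sample, grouped
print(variance([0, 1], [20, 20]))            # n = 40, so the divisor is n
```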
The variance of a binary variable

$$s_b^2 = f(1 - f)$$
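For a binary variable, both the mean f = m / n and the variance follow directly from counting the favorable cases. The sketch below is illustrative only: the condition "mark of at least 7" and the names are hypothetical.

```python
def binary_mean_and_variance(flags):
    """Return (f, s_b^2) for a list of True/False observations."""
    n = len(flags)                           # sample size
    m = sum(1 for flag in flags if flag)     # units in the favorable case
    f = m / n                                # mean of the binary variable
    return f, f * (1 - f)                    # variance f (1 - f)


marks = [5, 6, 6, 7, 7, 7, 8, 8, 9, 10]
passed = [mark >= 7 for mark in marks]       # hypothetical condition

f, var_b = binary_mean_and_variance(passed)
print(f, var_b)
```

Seven of the ten units meet the condition, so f is 0.7 and the variance is approximately 0.21.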
Standard Deviation

$$s = \sqrt{s^2}$$

Significance: shows by how many measurement units a value of the data series deviates, on average, from the sample mean.
Coefficient of variation

$$v = \frac{s}{\bar{x}} \cdot 100$$

Significance: if v ≤ 35%, the data series is homogeneous and the mean is representative; if v > 35%, the data series is not homogeneous and the mean is not representative.
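A small Python sketch that chains the standard deviation and the coefficient of variation and applies the 35% threshold. It assumes ungrouped data and the same divisor rule as for the variance (n - 1 for n ≤ 30, n otherwise); the function name and data are hypothetical.

```python
from math import sqrt


def coefficient_of_variation(values):
    """Return v = s / mean * 100, as a percentage."""
    n = len(values)
    mean = sum(values) / n
    squares = sum((x - mean) ** 2 for x in values)
    s = sqrt(squares / (n - 1 if n <= 30 else n))   # standard deviation
    return s / mean * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]
v = coefficient_of_variation(data)
homogeneous = v <= 35                        # threshold from the text

print(round(v, 2), homogeneous)
```

For this sample v is about 42.76%, so by the rule above the series is not homogeneous and the mean is not representative.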

C. Indicators of the distribution shape

1. Skewness
Pearson’s Coefficient of Skewness ($Sk^P \in [-1, 1]$)

$$Sk^P_{(Mo)} = \frac{\bar{x} - Mo}{s}$$

$$Sk^P_{(Me)} = \frac{3(\bar{x} - Me)}{s}$$

Interpretation:
Type of skewness:
- $Sk^P > 0$ → positive skewness, small values predominate
- $Sk^P < 0$ → negative skewness, large values predominate
- $Sk^P = 0$ → symmetrical distribution, no skewness
Strength (intensity) of skewness:
- $|Sk^P| \approx 0$ → weak skewness
- $|Sk^P| \approx 0.5$ → medium/average skewness
- $|Sk^P| \approx 1$ → strong skewness
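Both Pearson coefficients can be computed with Python's `statistics` module. This is a sketch under two assumptions: `stdev` divides by n - 1, which matches the variance rule for samples of at most 30 units, and `mode` returns the first of several tied values. The function name and data are hypothetical.

```python
from statistics import mean, median, mode, stdev


def pearson_skewness(values):
    """Return (Sk based on the Mode, Sk based on the Median)."""
    x_bar = mean(values)
    s = stdev(values)                        # n - 1 divisor (small sample)
    sk_mo = (x_bar - mode(values)) / s
    sk_me = 3 * (x_bar - median(values)) / s
    return sk_mo, sk_me


data = [2, 4, 4, 4, 5, 5, 7, 9]              # mean 5, Mode 4, Median 4.5
sk_mo, sk_me = pearson_skewness(data)
print(round(sk_mo, 3), round(sk_me, 3))
```

Both coefficients are positive here, which corresponds to positive skewness: small values predominate.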
Fisher’s Coefficient of skewness

$$Sk^F = \frac{n}{(n-1)(n-2)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$$

where s is the standard deviation.

Interpretation:
Type of skewness:
- $Sk^F > 0$ → positive skewness, small values predominate
- $Sk^F < 0$ → negative skewness, large values predominate
- $Sk^F = 0$ → symmetrical distribution, no skewness
Strength (intensity) of skewness:
- $0 < |Sk^F| < 0.5$ → weak skewness
- $0.5 < |Sk^F| < 1$ → medium/average skewness
- $|Sk^F| > 1$ → strong skewness
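A Python sketch of Fisher's coefficient of skewness, written as n / ((n - 1)(n - 2)) times the sum of cubed deviations divided by s cubed. It assumes that s is the standard deviation computed with the n - 1 divisor and that n is at least 3; the function name and data are hypothetical.

```python
from math import sqrt


def fisher_skewness(values):
    """Fisher's sample skewness; requires at least 3 observations."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: s uses the n - 1 divisor.
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubes = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubes / s ** 3


print(round(fisher_skewness([2, 4, 4, 4, 5, 5, 7, 9]), 4))   # right-skewed
print(fisher_skewness([1, 2, 3, 4, 5]))                      # symmetrical
```

The first sample gives a value between 0.5 and 1, which the scale above reads as medium positive skewness; the symmetrical sample gives 0.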

2. Kurtosis
Fisher’s Coefficient of Kurtosis

$$Kurt = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}$$

where s is the standard deviation.

Interpretation:
- $Kurt > 0$ → leptokurtic distribution (higher than the normal distribution): the values of the variable are more concentrated around the mean than in the normal distribution
- $Kurt < 0$ → platykurtic distribution (lower than the normal distribution): the values of the variable are less concentrated around the mean than in the normal distribution
- $Kurt = 0$ → mesokurtic distribution (normal distribution)
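A Python sketch of Fisher's coefficient of kurtosis. It assumes the usual sample form of the formula, in which the subtracted term is 3 (n - 1)^2 / ((n - 2)(n - 3)), that s uses the n - 1 divisor, and that n is at least 4. The function name and data are hypothetical.

```python
def fisher_kurtosis(values):
    """Fisher's sample (excess) kurtosis; requires at least 4 observations."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: s^2 uses the n - 1 divisor.
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)
    fourth = sum((x - mean) ** 4 for x in values)
    first_term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth / s2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first_term - correction


kurt = fisher_kurtosis([2, 4, 4, 4, 5, 5, 7, 9])
print(round(kurt, 4))
```

The result is positive for this sample, so under the interpretation above the distribution is leptokurtic.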
