Practical
Mr. Ninganagouda P
Lecturer in biostatistics,
Dept. of community medicine,
SIMS&RH, Tumkur
Measures of central tendency
Definition:
The tendency of the observations to concentrate around a central value is called central tendency. The central value around which the observations concentrate is called a measure of central tendency (also called a measure of location or an average).
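1. Arithmetic mean
If the observations x₁, x₂, …, xₙ have frequencies f₁, f₂, …, fₙ, the arithmetic mean is
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad N = \sum_{i=1}^{n} f_i$$
For raw data (every fᵢ = 1), this reduces to $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.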
2. Median
Definition:
The median of a set of values is the middle-most value when the values are arranged in ascending order of magnitude (such an arrangement is called an array). It is a value that is greater than half of the values and less than the remaining half. The median is denoted by M.
For raw data with n observations,
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value in the arrayed series}$$
3. Mode
Definition:
The mode is the value that has the highest frequency, i.e. the most frequently occurring value. It is denoted by Z.
For raw data, and also for a discrete frequency distribution, the mode is the value with the highest frequency.
Median
Merits:
• The logic behind its computation is easily understood, and it is easily computed.
• It can be computed even when some of the extreme values are missing.
• It is not affected by abnormal extreme values.
• It can be found graphically.
Demerits:
• It is not based on all the values.
• It cannot be used in advanced statistical analysis.
• It is not as stable as the arithmetic mean.

Mode
Merits:
• The logic behind its computation is easily understood, and it is easily computed.
• It can be computed even when some of the extreme values are missing.
• It can be found graphically.
Demerits:
• It is not based on all the values.
• It cannot be used in advanced statistical analysis.
• It is not as stable as the arithmetic mean.
MEASURES OF DISPERSION
Definition : Variation (dispersion) is the property of deviation of
values from the average. The degree of variation is indicated by the
measures of variation.
The measures of variation in common use are:
1. Range (R)
2. Quartile deviation or semi-interquartile range (QD)
3. Mean deviation (M.D)
4. Standard deviation (S.D)
5. Variance (σ²)
6. Coefficient of variation (C.V)
1. The Range:
Definition: The range is the difference between the highest and the lowest values in the data.
If H is the highest value and L is the lowest value in the data, the range is
$$R = H - L$$
The relative measure of variation based on the range, used for comparing frequency distributions, is the coefficient of range:
$$\text{Coefficient of range} = \frac{H - L}{H + L}$$
Interquartile range
• Advantages: not affected by extreme values; often used with skewed data.
• Disadvantages: does not tell you what happens beyond the quartiles.
Variance
• Advantages: a good measure; all values are used; used when data are fairly symmetrical.
• Disadvantages: mathematical properties not useful (SD preferred); not so good if data are strongly skewed.
Problems on
Solution:
Mean:
Using the mean formula for raw data,
$$\bar{x} = \frac{2.5 + 3.3 + 1.7 + 4 + 3.8 + 2.7 + 4.1 + 3.4 + 3.9 + 4.2 + 3 + 2.8 + 3.1 + 3 + 2.1}{15} = \frac{47.6}{15} = 3.17 \approx 3 \text{ kg}$$
Range (R):
$$R = H - L$$
Here, the highest value H = 4.2 kg and the lowest value L = 1.7 kg, so
$$R = 4.2 - 1.7 = 2.5 \text{ kg}$$
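The arithmetic for the first problem can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only, and the coefficient of range uses the (H − L)/(H + L) formula from the range definition.

```python
from statistics import mean

# Weights (kg) from the first problem, in the order used in the mean calculation
weights = [2.5, 3.3, 1.7, 4, 3.8, 2.7, 4.1, 3.4, 3.9, 4.2, 3, 2.8, 3.1, 3, 2.1]

x_bar = mean(weights)                        # sum of values / n
high, low = max(weights), min(weights)
value_range = high - low                     # R = H - L
coeff_range = (high - low) / (high + low)    # relative, unit-free measure

print(f"Mean = {x_bar:.2f} kg")                      # Mean = 3.17 kg
print(f"Range = {value_range:.1f} kg")               # Range = 2.5 kg
print(f"Coefficient of range = {coeff_range:.2f}")   # Coefficient of range = 0.42
```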
2. The number of days taken for signs and symptoms to appear after exposure is: 12, 19, 13, 5, 14, 10, 22, 43, 14, 51, 5.
Find the mean, median and mode.
Solution:
Mean:
$$\bar{x} = \frac{12 + 19 + 13 + 5 + 14 + 10 + 22 + 43 + 14 + 51 + 5}{11} = \frac{208}{11} = 18.91 \approx 19 \text{ days}$$
Median: The arrayed series (ascending order) is
5, 5, 10, 12, 13, 14, 14, 19, 22, 43, 51
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value in the arrayed series}$$
Here, n = 11, so
$$M = \left(\frac{11+1}{2}\right)^{\text{th}} = 6^{\text{th}} \text{ value in the arrayed series}$$
M = 14 days
Mode (Z): The values 5 and 14 each occur twice, which is the highest frequency in the data.
Therefore, the distribution is bimodal, with Z = 5 days and Z = 14 days.
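As a cross-check of the second problem, the standard `statistics` module gives the same three results. `multimode` (Python 3.8+) returns every value tied for the highest frequency, which makes the bimodal result explicit; this is an illustrative sketch, not part of the original practical.

```python
from statistics import mean, median, multimode

# Days to onset of signs and symptoms after exposure (second problem)
days = [12, 19, 13, 5, 14, 10, 22, 43, 14, 51, 5]

print(sorted(days))            # the arrayed series
print(round(mean(days), 2))    # 18.91
print(median(days))            # 14, the 6th value of the arrayed series
print(multimode(days))         # [5, 14]: both values occur twice
```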
3. The following data give the number of children of 26 couples. Find the median.
No. of children per couple: 2, 0, 5, 2, 2, 1, 0, 0, 3, 4, 2, 1, 1, 2, 3, 0, 1, 2, 7, 2, 2, 1, 3, 4, 1, 5
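Solution: The arrayed series (ascending order) is
0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 4 4 5 5 7
Here, n = 26. Therefore, the median is
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{26+1}{2}\right)^{\text{th}} = 13.5^{\text{th}} \text{ value in the arrayed series}$$
$$M = 13^{\text{th}} \text{ value} + 0.5\,(14^{\text{th}} \text{ value} - 13^{\text{th}} \text{ value}) = 2 + 0.5\,(2 - 2) = 2 \text{ children}$$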
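The fractional-position rule used above (take the kth value and add the fractional part times the gap to the next value) can be sketched directly. The names `pos`, `k` and `frac` are illustrative; `statistics.median` is used only as a cross-check, since for an even n it averages the two middle values, which is the same thing when the fraction is 0.5.

```python
from statistics import median

# Number of children per couple (third problem, 26 couples)
children = [2, 0, 5, 2, 2, 1, 0, 0, 3, 4, 2, 1, 1,
            2, 3, 0, 1, 2, 7, 2, 2, 1, 3, 4, 1, 5]

arrayed = sorted(children)
n = len(arrayed)
pos = (n + 1) / 2        # 13.5th value
k = int(pos)             # whole part: 13
frac = pos - k           # fractional part: 0.5
# Python lists are 0-based, so the kth value is arrayed[k - 1]
m = arrayed[k - 1] + frac * (arrayed[k] - arrayed[k - 1])

print(n, pos, m)             # 26 13.5 2.0
print(median(children))      # 2.0
```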
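Mean:
$$\bar{x} = \frac{9.3 + 11.5 + 10.2 + 11.2 + 11.7 + 12.1 + 11.2 + 10.7 + 8.7 + 9.3 + 8.9 + 12.4 + 11.6}{13} = \frac{138.8}{13} = 10.68 \text{ g/dl}$$
Quartile deviation: The arrayed series (ascending order) is
8.7, 8.9, 9.3, 9.3, 10.2, 10.7, 11.2, 11.2, 11.5, 11.6, 11.7, 12.1, 12.4
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} = \left(\frac{13+1}{4}\right)^{\text{th}} = 3.5^{\text{th}} \text{ observation}$$
$$Q_1 = 3^{\text{rd}} \text{ value} + 0.5\,(4^{\text{th}} \text{ value} - 3^{\text{rd}} \text{ value}) = 9.3 + 0.5\,(9.3 - 9.3) = 9.3$$
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} = \left(\frac{3(13+1)}{4}\right)^{\text{th}} = 10.5^{\text{th}} \text{ observation}$$
$$Q_3 = 10^{\text{th}} \text{ value} + 0.5\,(11^{\text{th}} \text{ value} - 10^{\text{th}} \text{ value}) = 11.6 + 0.5\,(11.7 - 11.6) = 11.65$$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{11.65 - 9.3}{2} = 1.175 \approx 1.2 \text{ g/dl}$$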
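A small sketch of the (n + 1)p quartile positions used above, with a hypothetical helper `value_at` that interpolates between neighbouring values. As a cross-check, `statistics.quantiles(..., method="exclusive")` uses the same (n + 1)p positions, so its first and third cut points should agree with Q1 and Q3.

```python
from statistics import mean, quantiles

# Haemoglobin levels (g/dl), 13 observations, from the solution above
hb = [9.3, 11.5, 10.2, 11.2, 11.7, 12.1, 11.2, 10.7, 8.7, 9.3, 8.9, 12.4, 11.6]


def value_at(arrayed, pos):
    """Value at a 1-based, possibly fractional, position in a sorted list.

    kth value + fraction * ((k+1)th value - kth value), as in the worked example.
    """
    k = int(pos)
    frac = pos - k
    if frac == 0:
        return arrayed[k - 1]
    return arrayed[k - 1] + frac * (arrayed[k] - arrayed[k - 1])


arrayed = sorted(hb)
n = len(arrayed)                           # 13
q1 = value_at(arrayed, (n + 1) / 4)        # 3.5th observation
q3 = value_at(arrayed, 3 * (n + 1) / 4)    # 10.5th observation
qd = (q3 - q1) / 2

print(f"mean = {mean(hb):.2f}, Q1 = {q1:.2f}, Q3 = {q3:.2f}, QD = {qd:.1f}")
# Cross-check with the standard library's (n + 1)p ("exclusive") method
print(quantiles(hb, n=4, method="exclusive"))
```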
Solution:
Measures of central tendency:
Mean:
$$\bar{x} = \frac{7 + 10 + 19 + 14 + 15 + 10 + 13 + 14 + 9 + 17 + 11 + 16 + 11 + 17 + 11 + 8}{16} = \frac{202}{16} = 12.63 \approx 13 \text{ kg}$$
Median: The arrayed series (ascending order) is
7, 8, 9, 10, 10, 11, 11, 11, 13, 14, 14, 15, 16, 17, 17, 19
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value in the arrayed series}$$
Here, n = 16, so
$$M = \left(\frac{16+1}{2}\right)^{\text{th}} = 8.5^{\text{th}} \text{ value in the arrayed series}$$
$$M = 8^{\text{th}} \text{ value} + 0.5\,(9^{\text{th}} \text{ value} - 8^{\text{th}} \text{ value}) = 11 + 0.5\,(13 - 11) = 12 \text{ kg}$$
Interpretation: ?
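The mean and median of these weights can be verified in a few lines with the standard `statistics` module (an illustrative check, not part of the original solution). For an even number of values, `median` averages the 8th and 9th values of the arrayed series.

```python
from statistics import mean, median

# Weights (kg) from the worked solution above, 16 values
weights = [7, 10, 19, 14, 15, 10, 13, 14, 9, 17, 11, 16, 11, 17, 11, 8]

print(sum(weights), len(weights))  # 202 16
print(mean(weights))               # 12.625
print(sorted(weights))             # the arrayed series
print(median(weights))             # 12.0, i.e. (11 + 13) / 2
```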
6. The table below gives the frequency distribution of individuals in a community according to the number of illnesses suffered by them in a year.
No. of illnesses suffered in a year | No. of individuals
0 | 24
1 | 76
2 | 79
3 | 81
4 | 86
5 | 51
6 | 26
7 | 43
Total | 466
Solution:
No. of illnesses (xᵢ) | No. of individuals (fᵢ) | fᵢxᵢ | LCF
0 | 24 | 0 × 24 = 0 | 24
1 | 76 | 1 × 76 = 76 | 24 + 76 = 100
2 | 79 | 2 × 79 = 158 | 100 + 79 = 179
3 | 81 | 3 × 81 = 243 | 179 + 81 = 260
4 | 86 | 4 × 86 = 344 | 260 + 86 = 346
5 | 51 | 5 × 51 = 255 | 346 + 51 = 397
6 | 26 | 6 × 26 = 156 | 397 + 26 = 423
7 | 43 | 7 × 43 = 301 | 423 + 43 = 466
Total | Σfᵢ = N = 466 | Σfᵢxᵢ = 1533 |
Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{1533}{466} = 3.29 \approx 3 \text{ illnesses suffered per year}$$
Range: the range is taken over the values of the variable (number of illnesses), not over the frequencies.
$$R = H - L = 7 - 0 = 7 \text{ illnesses}$$
Quartile deviation:
$$Q.D = \frac{Q_3 - Q_1}{2}$$
$$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}} = \left(\frac{467}{4}\right)^{\text{th}} = 116.75^{\text{th}} \text{ observation} = 2$$
(from the LCF column, observations 101 to 179 have the value 2)
$$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}} = 350.25^{\text{th}} \text{ observation} = 5$$
(observations 347 to 397 have the value 5)
$$Q.D = \frac{5 - 2}{2} = 1.5 \approx 2 \text{ illnesses suffered per year}$$
Standard deviation (deviations taken from the unrounded mean x̄ = 3.29):
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} = \sqrt{\frac{1771.89}{466}} = 1.95 \approx 2 \text{ illnesses suffered per year}$$
Variance:
$$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N} = \frac{1771.89}{466} = 3.80 \approx 4$$
Coefficient of variation:
$$\text{C.V} = \frac{\sigma}{\mu} \times 100 = \frac{1.95}{3.29} \times 100 = 59.3\%$$
Interpretation: Mean ± SD is 3.29 ± 1.95 illnesses suffered per year. The median (the 233.5th observation) is 3, with interquartile range Q₁ to Q₃ of 2 to 5 illnesses suffered per year. The mode is 4 illnesses (the highest frequency, 86 individuals) and the range is 7 illnesses.
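All the summary measures for this discrete frequency distribution can be reproduced in one short script. This is a sketch under stated assumptions: `value_at_position` is a hypothetical helper that reads the value off the less-than cumulative frequency column exactly as the table does (it does not interpolate when a fractional position falls between two different x values), and deviations are taken from the unrounded mean 1533/466.

```python
from math import sqrt

# x_i = number of illnesses in a year, f_i = number of individuals
illnesses = [0, 1, 2, 3, 4, 5, 6, 7]
people = [24, 76, 79, 81, 86, 51, 26, 43]

N = sum(people)                                            # 466
mean = sum(x * f for x, f in zip(illnesses, people)) / N   # 1533 / 466


def value_at_position(pos):
    """Return the first x whose less-than cumulative frequency reaches pos."""
    lcf = 0
    for x, f in zip(illnesses, people):
        lcf += f
        if lcf >= pos:
            return x
    raise ValueError("position is beyond N")


q1 = value_at_position((N + 1) / 4)         # 116.75th observation
median = value_at_position((N + 1) / 2)     # 233.5th observation
q3 = value_at_position(3 * (N + 1) / 4)     # 350.25th observation
qd = (q3 - q1) / 2

# Deviations are measured from the unrounded mean
variance = sum(f * (x - mean) ** 2 for x, f in zip(illnesses, people)) / N
sd = sqrt(variance)
cv = sd / mean * 100

mode = illnesses[people.index(max(people))]      # x with the highest frequency
value_range = max(illnesses) - min(illnesses)    # H - L over the x values

print(f"N={N} mean={mean:.2f} median={median} Q1={q1} Q3={q3} QD={qd}")
print(f"variance={variance:.2f} SD={sd:.2f} CV={cv:.1f}% mode={mode} range={value_range}")
```

With these inputs the script reports a mean of about 3.29, a variance of about 3.80, an SD of about 1.95 and a CV of about 59.3%.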
7. In a clinical trial studying the improvement in haemoglobin level among anaemic patients, the following data were obtained. Calculate the median haemoglobin level.
Distribution of haemoglobin level of 50 patients
Haemoglobin level (gm%) | No. of patients
4 - 6 | 2
6 - 8 | 3
8 - 10 | 15
10 - 12 | 20
12 - 14 | 10
Total | 50
Solution: The data are grouped (classified), so the table is first extended with the upper class boundaries and the less-than cumulative frequencies:
Hb (gm%) (class interval) | No. of patients (frequency, f) | Upper class boundary | Less-than cumulative frequency (<C.F)
4 - 6 | 2 | 6 | 2
6 - 8 | 3 | 8 | 5
8 - 10 | 15 | 10 | 20 (= m)
10 - 12 (median class, l = 10) | 20 (= f) | 12 | 40
12 - 14 | 10 | 14 | 50
Total | N = 50 | |
Given, N = 50 (total frequency),
l = 10 (lower boundary of the median class),
m = 20 (cumulative frequency of the classes before the median class),
c = 2 (class interval, i.e. 14 - 12 = 12 - 10 = … = 2; equal class intervals) and
f = 20 (frequency of the median class 10 - 12).
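Median formula for grouped (classified) data:
$$M = l + \frac{\frac{N}{2} - m}{f} \times c$$
Here N/2 = 50/2 = 25, so
$$M = 10 + \frac{25 - 20}{20} \times 2 = 10 + 0.5 = 10.5 \text{ gm\%}$$
Therefore, the median haemoglobin level is 10.5 gm%.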
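The steps above (build the cumulative frequencies, locate the class containing the (N/2)th observation, then interpolate within it) can be sketched as follows. `grouped_median` is a hypothetical helper written for this example; it assumes contiguous classes given as (lower, upper) boundary pairs.

```python
# Haemoglobin classes (gm%) as (lower, upper) boundaries, with frequencies
classes = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
freqs = [2, 3, 15, 20, 10]


def grouped_median(classes, freqs):
    """Median for grouped data: M = l + (N/2 - m) * c / f."""
    N = sum(freqs)
    half = N / 2
    m = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if m + f >= half:           # this is the median class
            c = upper - lower       # class interval
            return lower + (half - m) * c / f
        m += f
    raise ValueError("empty distribution")


print(grouped_median(classes, freqs))  # 10.5
```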
8. The distribution of systolic blood pressure (mmHg) values of patients attending a rural health camp is given in the following table.
Systolic blood pressure (mmHg) | Number of patients
90 – 100 | 3
100 – 110 | 5
110 – 120 | 7
120 – 130 | 10
130 – 140 | 15
140 – 150 | 11
150 – 160 | 9
160 – 170 | 6
170 – 180 | 4
Total | 70
Calculate all the measures of central tendency and dispersion, and the coefficient of variation, and interpret the results.
Solution:
Mean:
Systolic blood pressure (mmHg) (class interval) | Number of patients (f) | Mid-point of CI (x) | f*x | LCF
90 – 100 | 1 | 95 | 95 | 1
Median: Given, l = 130, f = 13, m = 19, c = 10 and N/2 = 60/2 = 30.
$$M = l + \frac{\frac{N}{2} - m}{f} \times c = 130 + \frac{30 - 19}{13} \times 10 = 130 + 8.46 = 138.46 \text{ mmHg}$$
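Mode: The mode formula for grouped (classified or tabulated) data is
$$Z = l + \frac{f - f_1}{2f - f_1 - f_2} \times c$$
where f₁ and f₂ are the frequencies of the classes just before and just after the modal class.
$$Z = 130 + \frac{13 - 11}{2 \times 13 - 11 - 12} \times 10 = 130 + 6.67 = 136.67 \text{ mmHg}$$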
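The two interpolation formulas for grouped data can be evaluated directly from the given quantities. This is a minimal sketch: `grouped_median_from_params` and `grouped_mode` are hypothetical helper names, and the parameter values are the ones stated in the solution (l = 130, f = 13, m = 19, c = 10, N = 60, f₁ = 11, f₂ = 12).

```python
def grouped_median_from_params(l, N, m, f, c):
    """M = l + (N/2 - m) * c / f."""
    return l + (N / 2 - m) * c / f


def grouped_mode(l, f, f1, f2, c):
    """Z = l + (f - f1) * c / (2f - f1 - f2).

    f1, f2: frequencies of the classes just before and just after the modal class.
    """
    return l + (f - f1) * c / (2 * f - f1 - f2)


print(round(grouped_median_from_params(l=130, N=60, m=19, f=13, c=10), 2))  # 138.46
print(round(grouped_mode(l=130, f=13, f1=11, f2=12, c=10), 2))              # 136.67
```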
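Quartiles:
$$Q_1 = l_1 + \frac{\frac{N}{4} - m_1}{f_1} \times c, \qquad \frac{N}{4} = \frac{60}{4} = 15^{\text{th}} \text{ observation}$$
The 15th observation lies in the Q₁ class 120 - 130, so
$$Q_1 = 120 + \frac{15 - 8}{11} \times 10 = 120 + 6.36 = 126.36 \text{ mmHg}$$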
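$$Q_3 = l_3 + \frac{\frac{3N}{4} - m_3}{f_3} \times c, \qquad \frac{3N}{4} = \frac{3 \times 60}{4} = 45^{\text{th}} \text{ observation}$$
The 45th observation lies in the Q₃ class 150 - 160, so
$$Q_3 = 150 + \frac{45 - 44}{9} \times 10 = 150 + 1.11 = 151.11 \text{ mmHg}$$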
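The quartile formulas have the same shape as the grouped median, with N/4 or 3N/4 in place of N/2. A sketch using a hypothetical helper `grouped_quantile`, with the class boundaries, cumulative frequencies and class frequencies stated in the solution:

```python
def grouped_quantile(l, position, m, f, c):
    """Q = l + (position - m) * c / f, with position = N/4 for Q1 or 3N/4 for Q3.

    l: lower boundary of the quartile class, m: cumulative frequency before it,
    f: its frequency, c: class interval.
    """
    return l + (position - m) * c / f


N = 60
q1 = grouped_quantile(l=120, position=N / 4, m=8, f=11, c=10)
q3 = grouped_quantile(l=150, position=3 * N / 4, m=44, f=9, c=10)
print(round(q1, 2), round(q3, 2))  # 126.36 151.11
print(round((q3 - q1) / 2, 2))     # quartile deviation
```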
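Standard deviation: From the totals of the working table, N = 60 and $\sum f_i (x_i - \bar{x})^2 = 18820$, so
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} = \sqrt{\frac{18820}{60}} = 17.71 \text{ mmHg}$$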
$$\text{Coefficient of variation (C.V)} = \frac{\sigma}{\mu} \times 100 = \frac{17.71}{139} \times 100 = 12.74\%$$
Coefficient of variation (CV) of the treatment group:
$$\text{C.V} = \frac{\sigma}{\mu} \times 100 = \frac{4}{16} \times 100 = 25\%$$
Coefficient of variation (CV) of the control group:
$$\text{C.V} = \frac{\sigma}{\mu} \times 100 = \frac{3}{11} \times 100 = 27.27 \approx 27\%$$
Therefore, the C.V of the treatment group (25%) is less than the C.V of the control group (27%). Thus the treatment group is more consistent.
10. In a study on dental hygiene conducted by the Department of Community Medicine in association with the Department of Dentistry, the following data were obtained:
Group | Sample size | Mean dental hygiene score | S.D
Rural school children | 60 | 24 | 6
Urban school children | 60 | 28 | 3
Compare the two groups and find which group is more consistent.
Solution:
Coefficient of variation (CV) of rural school children:
$$\text{C.V} = \frac{\sigma}{\mu} \times 100 = \frac{6}{24} \times 100 = 25\%$$
Coefficient of variation (CV) of urban school children:
$$\text{C.V} = \frac{\sigma}{\mu} \times 100 = \frac{3}{28} \times 100 = 10.71\%$$
Therefore, the C.V of rural school children (25%) is greater than the C.V of urban school children (10.71%).
Thus the dental hygiene scores of urban school children are more consistent.
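The last two comparisons follow one rule: the group with the smaller CV has less variability relative to its mean and is called more consistent. A sketch with hypothetical helpers `coefficient_of_variation` and `more_consistent`, where each group is a (name, mean, SD) tuple taken from the two problems:

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100: variability relative to the mean."""
    return sd / mean * 100


def more_consistent(group_a, group_b):
    """Each group is (name, mean, sd); return the name with the smaller CV."""
    return min(group_a, group_b,
               key=lambda g: coefficient_of_variation(g[2], g[1]))[0]


comparisons = [
    (("treatment group", 16, 4), ("control group", 11, 3)),
    (("rural school children", 24, 6), ("urban school children", 28, 3)),
]
for a, b in comparisons:
    for name, mean, sd in (a, b):
        print(f"{name}: CV = {coefficient_of_variation(sd, mean):.2f} %")
    print("More consistent:", more_consistent(a, b))
```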
THANK YOU