Unit 13 - Basic Statistics

Mathematics for IT Unit 13
Unit 13 Basic Statistics

Structure:
13.1 Introduction
Objectives
13.2 Measures of Central Tendency
13.3 Standard Deviation
13.4 Discrete Series
13.5 Methods: Deviation taken from Assumed Mean
13.6 Continuous Series
13.7 Combined Standard Deviation
13.8 Coefficient of Variation
13.9 Variance
13.10 Summary
13.11 Terminal Questions
13.12 Answers
13.1 Introduction
In this unit we discuss some of the basic concepts of statistics. A single
value that is representative of a set of values may be used to indicate the
general size of the members of the set; the word ‘average’ is often used for
this single value. The statistical term for ‘average’ is the arithmetic mean,
or just the mean. Other measures of central tendency include the median and
the mode. The standard deviation of a set of data indicates the amount of
dispersion, or scatter, of the members of the set about the measure of
central tendency.
Objectives:
At the end of this unit you should be able to understand
• the concept of central tendency and its applications
• how to calculate the standard deviation and variance of a given set of data.
13.2 Measures of Central Tendency

Measures of central tendency
This section is devoted to the measures used to summarize data. The three
fundamental measures, the mean, the median and the mode, are discussed in
detail. Two other measures, the geometric mean and the harmonic mean, are
also mentioned. Definition, method of computation, interpretation and uses
form the structure of the explanation for each measure.
Manipal University Jaipur B0947
Generally, in a frequency distribution, the values cluster around a central
value. This property of concentration of the values around a central value is
called Central Tendency. The central value around which there is
concentration is called measure of central tendency (measure of location,
average).
Generally, frequency distributions are compared simply by comparing their
measures of central tendency.
For a frequency distribution, five important measures of central tendency are
defined.
They are:
1. Arithmetic Mean (A.M.)
2. Median
3. Mode
4. Geometric Mean (G.M.)
5. Harmonic Mean (H.M.)
Depending upon the need and nature of the study, a suitable measure is chosen.
A measure of central tendency is considered good if it has most of the
following qualities.
Desired qualities of an ideal measure of central tendency:
1. It should be easy to understand. Its computation procedure should be
simple.
2. It should be rigidly defined.
3. It should be based on all the values.
4. It should not be affected too much by abnormal extreme values.
5. It should be capable of further algebraic treatment so that it can be
used in further analysis of the data.
6. It should be stable. That is, the sampling variation in the value of the
measure should be as small as possible.
Arithmetic Mean (Mean)
The arithmetic mean of a set of values is obtained by dividing the sum of the
values by the number of values in the set. The arithmetic mean of the values
\(x_1, x_2, \ldots, x_n\) is
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n} \]
If the observations \(x_1, x_2, \ldots, x_n\) have frequencies
\(f_1, f_2, \ldots, f_n\), the arithmetic mean is
\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum fx}{N} \]
(for a discrete frequency distribution)
where \(N = \sum f\) is the total frequency.
Thus, for raw data, the arithmetic mean is
\[ \bar{x} = \frac{\sum x}{n} \]
For tabulated data (discrete or continuous), it is
\[ \bar{x} = \frac{\sum fx}{N} \]
For a continuous distribution, x is the mid-value of each class.
Example 1: Heights of six students are 163, 173, 168, 156, 162 and 165
cms. Find the arithmetic mean.
Solution: The arithmetic mean of the heights is
\[ \bar{x} = \frac{\sum x}{n} = \frac{163 + 173 + 168 + 156 + 162 + 165}{6} = \frac{987}{6} = 164.5 \text{ cms.} \]
Example 2: In a one-day cricket match, a bowler bowls 8 overs. He gives
away 3, 5, 12, 0, 4, 1, 3 and 7 runs in these overs. Find the mean run rate
per over.
Solution: The mean run rate is
\[ \bar{x} = \frac{\text{Sum of values}}{\text{Number of values}} = \frac{\sum x}{n} = \frac{3 + 5 + 12 + 0 + 4 + 1 + 3 + 7}{8} = \frac{35}{8} = 4.375 \text{ runs per over.} \]

Example 3: In an office there are 84 employees. Their salaries are as given
below:
Salary (Rs.)   2430   2590   2870   3390   4720   5160
Employees         4     28     31     16      3      2
i) Find the mean salary of the employees.
ii) What is the total salary of the employees?
Solution:
Salary (Rs.) (x)   Employees (f)   fx
2430                4                9720
2590               28               72520
2870               31               88970
3390               16               54240
4720                3               14160
5160                2               10320
Total              84              249930
(i) The mean salary of the employees is
\[ \bar{x} = \frac{\sum fx}{N} = \frac{249930}{84} = \text{Rs. } 2975.36 \]
(ii) The total salary of the employees is
\[ \sum fx = \text{Rs. } 249930 \]
Example 4: A survey of 128 smokers revealed the following frequency
distribution of their daily expenditure on smoking. Find the mean daily
expenditure.
Expenditure (Rs.)   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
No. of smokers           23        44        35        12         9         3         2

Solution:
Expenditure (Rs.)   Frequency (f)   Mid-value (x)   fx
10 – 20              23              15               345
20 – 30              44              25              1100
30 – 40              35              35              1225
40 – 50              12              45               540
50 – 60               9              55               495
60 – 70               3              65               195
70 – 80               2              75               150
Total               128               -              4050
The mean is
\[ \bar{x} = \frac{\sum fx}{N} = \frac{4050}{128} = \text{Rs. } 31.64 \]
The mean daily expenditure is Rs. 31.64.
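The two forms of the mean used in Examples 1 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `arithmetic_mean` and `frequency_mean` are our own and are not part of the unit.

```python
def arithmetic_mean(values):
    """Mean of raw data: sum of the values divided by their number."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of tabulated data: sum of f*x divided by N = sum of f.

    For a continuous series, pass the class mid-values as `values`.
    """
    total_frequency = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total_frequency


# Example 1: heights of six students (raw data)
heights = [163, 173, 168, 156, 162, 165]
print(arithmetic_mean(heights))  # 164.5

# Example 3: salaries with the number of employees as frequencies
salaries = [2430, 2590, 2870, 3390, 4720, 5160]
employees = [4, 28, 31, 16, 3, 2]
print(round(frequency_mean(salaries, employees), 2))  # 2975.36

# Example 4: class mid-values with the number of smokers as frequencies
mid_values = [15, 25, 35, 45, 55, 65, 75]
smokers = [23, 44, 35, 12, 9, 3, 2]
print(round(frequency_mean(mid_values, smokers), 2))  # 31.64
```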
Change of Origin and Scale
Let \(x_1, x_2, \ldots, x_n\) be n values and let ‘a’ be a constant. Then
\(x_1 - a, x_2 - a, \ldots, x_n - a\) are the values of
\(x_1, x_2, \ldots, x_n\) with origin shifted to ‘a’. If ‘c’ is a positive
constant,
\[ \frac{x_1 - a}{c}, \frac{x_2 - a}{c}, \ldots, \frac{x_n - a}{c} \]
are the values \(x_1, x_2, \ldots, x_n\) with origin shifted to a and scale
changed by c. Thus, \(u = \frac{x - a}{c}\) is the variable x with origin
shifted to a and scale changed by c.
Here \(u = \frac{x - a}{c}\); therefore, \(x = a + cu\), and so
\[ \bar{x} = a + c\bar{u} = a + \frac{c \sum fu}{N} \]
However, if c = 1,
\[ \bar{x} = a + \bar{u} = a + \frac{\sum u}{n} \]

Deviations: Let \(x_1, x_2, x_3, \ldots, x_n\) be n values and let ‘a’ be a
constant. Then \((x_1 - a), (x_2 - a), (x_3 - a), \ldots, (x_n - a)\) are the
deviations of the values from the constant a. The squares of these
deviations, namely
\((x_1 - a)^2, (x_2 - a)^2, (x_3 - a)^2, \ldots, (x_n - a)^2\), are the
squared deviations of the values.
Thus, \((x_1 - \bar{x}), (x_2 - \bar{x}), (x_3 - \bar{x}), \ldots, (x_n - \bar{x})\)
are the deviations from the arithmetic mean, and
\((x_1 - \bar{x})^2, (x_2 - \bar{x})^2, (x_3 - \bar{x})^2, \ldots, (x_n - \bar{x})^2\)
are the squared deviations from the arithmetic mean. The deviations may be
positive, negative or zero, but the squared deviations are never negative.
Properties of Arithmetic Mean:
Arithmetic mean has the following important properties:
1. The algebraic sum of the deviations of a set of values from their
arithmetic mean is zero. That is,
\[ \sum (x - \bar{x}) = 0 \]
2. The sum of the squared deviations of a set of values is minimum when the
deviations are taken around the arithmetic mean.
3. Let \(\bar{x}_1\) be the arithmetic mean of a set of \(n_1\) values, and
let \(\bar{x}_2\) be the arithmetic mean of another set of \(n_2\) values.
Then the arithmetic mean of the two sets of values put together is
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \quad \text{(combined arithmetic mean)} \]
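The first two properties follow from a short calculation, sketched here for raw data with an arbitrary constant a (the same symbols as in the paragraph on deviations):

```latex
% Property 1: deviations from the mean sum to zero
\sum (x - \bar{x}) = \sum x - n\bar{x} = n\bar{x} - n\bar{x} = 0

% Property 2: expand the squared deviations about any constant a
\sum (x - a)^2
  = \sum \bigl[(x - \bar{x}) + (\bar{x} - a)\bigr]^2
  = \sum (x - \bar{x})^2 + 2(\bar{x} - a)\sum (x - \bar{x}) + n(\bar{x} - a)^2
  = \sum (x - \bar{x})^2 + n(\bar{x} - a)^2
  \;\ge\; \sum (x - \bar{x})^2
```

The middle term vanishes by Property 1, and equality holds only when a equals the arithmetic mean.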
Example 5: The mean of marks scored by 30 girls of a class is 44%. The
mean for 50 boys is 42%. Find the mean for the whole class.
Solution: Here \(n_1 = 30\), \(n_2 = 50\), \(\bar{x}_1 = 44\%\) and
\(\bar{x}_2 = 42\%\). The combined arithmetic mean is
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{30 \times 44 + 50 \times 42}{30 + 50} = \frac{1320 + 2100}{80} = 42.75\% \]

Example 6: The average (mean) weight of a type of screw is 10.4 gms. A
packet of 100 such screws is mixed with another packet of 150 screws of
another type. In the mixture, the average weight is found to be 10.9 gms.
Find the average weight of the screws in the second packet.
Solution:
Here \(n_1 = 100\), \(n_2 = 150\), \(\bar{x}_1 = 10.4\) gms. and
\(\bar{x} = 10.9\) gms.
The mean weight of the second set of screws, \(\bar{x}_2\), can be calculated
by using the relation
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]
\[ 10.9 = \frac{100 \times 10.4 + 150\,\bar{x}_2}{100 + 150} = \frac{1040 + 150\,\bar{x}_2}{250} \]
Therefore, \(150\,\bar{x}_2 = (10.9 \times 250) - 1040 = 1685\)
and so \(\bar{x}_2 = \frac{1685}{150} = 11.23\) gms.
Thus, the mean weight of screws in the second packet is 11.23 gms.
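The combined-mean relation can be used in both directions, as Examples 5 and 6 do. The following Python sketch illustrates this; the helper names `combined_mean` and `missing_group_mean` are hypothetical and chosen only for this illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups put together: (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def missing_group_mean(n1, mean1, n2, overall_mean):
    """Solve the combined-mean relation for the mean of the second group."""
    return (overall_mean * (n1 + n2) - n1 * mean1) / n2


# Example 5: 30 girls with mean 44% and 50 boys with mean 42%
print(combined_mean(30, 44, 50, 42))  # 42.75

# Example 6: 100 screws with mean 10.4 gms, mixture of 250 with mean 10.9 gms
print(round(missing_group_mean(100, 10.4, 150, 10.9), 2))  # 11.23
```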
Merits of arithmetic mean:
1. It is rigidly defined
2. The logic behind its computation can be easily understood. It can be
easily computed.
3. It can be easily adapted for further statistical analysis
4. It is based on all the values
5. It is more stable than any other average
6. It can be calculated even when some of the values are equal to zero or
negative.
Demerits of arithmetic mean:
1. It is highly affected by abnormal extreme values
2. Since it is based on all the values, even if one of the values is missing, it
cannot be calculated.
3. Sometimes, the arithmetic mean may be a value which is not assumed
by the variable.

Median
The median of a set of values is the middle-most value when they are arranged
in ascending order of magnitude. (Such an arrangement is called an array.)
It is a value that is greater than half of the values and less than the
remaining half. The median is denoted by M.
In the case of raw data and also of a discrete frequency distribution, the
median is
\[ M = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ value in the arrayed series} \]
In the case of a continuous frequency distribution, the median is
\[ M = l + \left(\frac{\frac{N}{2} - m}{f}\right) \times c \]
where l : lower limit of the median class
c : width of the median class
f : frequency of the median class
m : less-than cumulative frequency up to l (the cumulative
frequency corresponding to the class preceding the median
class)
N : total frequency
The median class is the class which contains the median.
Example 7:
The following data relate to the number of children of 25 couples. Find the
median.
No. of children per couple: 2, 0, 5, 2, 5, 1, 0, 0, 3, 4, 2, 1, 1, 2, 3, 0,
1, 2, 7, 2, 2, 1, 3, 4, 1.
Solution:
The arrayed series (ascending order) is:
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5,
7
Here, n = 25. Therefore, the median is
\[ M = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ value in the arrayed series} = \left(\frac{25 + 1}{2}\right)^{\text{th}} \text{ value} = 13^{\text{th}} \text{ value} \]
= 2 children per couple
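Both median formulas can be sketched in Python as follows. The function names are our own. Two points are assumptions rather than part of the text: for an even number of values the sketch averages the two middle values (the usual convention, which the unit does not state), and the grouped formula is illustrated on the marks distribution of Example 9, whose median class is 39.5 – 49.5 with a cumulative frequency of 56 before it.

```python
import statistics


def median_raw(values):
    """Median of raw data by the ((n + 1) / 2)-th value rule."""
    ordered = sorted(values)  # the arrayed series
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # list positions start at 0
    # Assumption: for even n, average the two middle values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def median_grouped(lower, width, freq, cum_before, total):
    """Median of a continuous series: l + ((N/2 - m) / f) * c."""
    return lower + (total / 2 - cum_before) / freq * width


children = [2, 0, 5, 2, 5, 1, 0, 0, 3, 4, 2, 1, 1,
            2, 3, 0, 1, 2, 7, 2, 2, 1, 3, 4, 1]
print(median_raw(children))          # 2
print(statistics.median(children))   # 2, the standard library agrees

# Marks distribution of Example 9: N = 134, median class 39.5 - 49.5
print(round(median_grouped(39.5, 10, 36, 56, 134), 2))  # 42.56
```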
Merits of median:
1. The logic behind its computation is easily understood. It can be easily
computed.
2. Even when some of the extreme values are missing, it can be
computed.
3. It is not affected by abnormal extreme values
4. It can be used for the study of qualitative data also.
Demerits of median:
1. It is not based on all the values
2. It cannot be used in deep statistical analysis.
Mode
The mode is the value which has the highest frequency. It is the most
frequently occurring value. It is denoted by Z.
In the case of raw data, and also in the case of a discrete frequency
distribution, the mode is the value with the highest frequency.
In the case of a continuous frequency distribution, the mode is
\[ Z = l + \frac{(f - f_1) \times c}{2f - f_1 - f_2} \]
where l : lower limit of the modal class
f : frequency of the modal class
c : width of the modal class
f1 : frequency of the class preceding the modal class
f2 : frequency of the class succeeding the modal class
The modal class is the class containing the mode.
Generally, the modal class is the class with the highest frequency. But
sometimes it may be a class other than the class with the highest
frequency. In such a situation, the mode is obtained by using the formula
\[ Z = l + \frac{c f_2}{f_1 + f_2} \]
Most frequency distributions have only one value with the highest frequency.
Such frequency distributions are unimodal: they have only one mode. On
the other hand, if more than one value in a frequency distribution has the
highest frequency, the distribution is multimodal: it has more than one
mode. If there are two modes, the distribution is bimodal.
For a distribution which has more than one mode, the mode is said to be
ill-defined.
Example 8:
The following are the number of children for 20 couples. Find the mode.
No. of children per couple: 2, 3, 6 , 3, 4, 0, 5, 2 , 2, 4, 3, 2, 1, 0, 4, 2, 2, 1, 1, 3
Solution:
The data should be tabulated first.
No. of children   Tally marks   No. of couples (frequency)
0                 II            2
1                 III           3
2                 IIII I        6
3                 IIII          4
4                 III           3
5                 I             1
6                 I             1
Total                           20
Here, the value 2 has the highest frequency.
Therefore, the mode is Z = 2 children per couple.
Example 9:
For the following distribution, find the mode.
Percentage marks   10 – 19   20 – 29   30 – 39   40 – 49   50 – 59   60 – 69   70 – 79
No. of students          8        19        29        36        25        13         4

Solution:
Here, the class intervals are of the inclusive type. First, they should be
converted into the exclusive type.
Modified class   Frequency
9.5 – 19.5        8
19.5 – 29.5      19
29.5 – 39.5      29 (f1)
39.5 – 49.5      36 (f)    Modal class
49.5 – 59.5      25 (f2)
59.5 – 69.5      13
69.5 – 79.5       4
Total           134
Since 36 is the highest frequency and it is far higher than the other
frequencies, the class interval 39.5 – 49.5 is the modal class. Thus l = 39.5,
f = 36, f1 = 29, f2 = 25 and c = 10.
The mode is
\[ Z = l + \frac{(f - f_1) \times c}{2f - f_1 - f_2} = 39.5 + \frac{(36 - 29) \times 10}{2 \times 36 - 29 - 25} = 39.5 + \frac{70}{18} = 43.4\% \]
Merits and demerits of mode
The merits and demerits of the mode are the same as the merits and demerits
of the median. In addition, one more demerit of the mode can be listed:
for some frequency distributions, the mode is ill-defined.
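The mode of raw data (Example 8) and the grouped-mode formula (Example 9) can be sketched in Python as below. The function `grouped_mode` is a hypothetical helper; it simply takes the class with the highest frequency as the modal class, and, as an assumption of this sketch, treats a missing neighbouring class as having frequency 0. `statistics.multimode` is a standard-library function that returns every value with the highest frequency, so a list with more than one entry signals an ill-defined mode.

```python
import statistics


def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous series: l + (f - f1) * c / (2f - f1 - f2)."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    f = freqs[k]
    # Assumption: a modal class at either end has a neighbour of frequency 0
    f1 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0
    return lower_limits[k] + (f - f1) * width / (2 * f - f1 - f2)


# Example 8: number of children for 20 couples
children = [2, 3, 6, 3, 4, 0, 5, 2, 2, 4, 3, 2, 1, 0, 4, 2, 2, 1, 1, 3]
print(statistics.multimode(children))  # [2]

# Example 9: exclusive-type class limits and frequencies
lower_limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
students = [8, 19, 29, 36, 25, 13, 4]
print(round(grouped_mode(lower_limits, 10, students), 1))  # 43.4
```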
13.3 Standard Deviation

Definition: Standard deviation is the root mean square deviation of the
values from their arithmetic mean.
S.D. is the abbreviation and \(\sigma\) (read, sigma) is the symbol.
The mean square deviation of the values from their arithmetic mean is the
variance, denoted by \(\sigma^2\). Standard deviation is the positive square
root of the variance. Karl Pearson introduced the concept of standard
deviation in 1893. Standard deviation is also called root mean square
deviation. Ignoring the negative signs of the deviations is a mathematical
deficiency of mean deviation; standard deviation avoids it by squaring the
deviations. Standard deviation possesses most of the desirable properties of
a good measure of dispersion. It is the most widely used absolute measure of
dispersion.
The corresponding relative measure is the coefficient of variation, which is
also very popular and extensively used.
\[ \text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100 \]
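Formulae
1. Actual mean method
   Individual observations: \(\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N}}\)
   Discrete series: \(\sigma = \sqrt{\dfrac{\sum f (X - \bar{X})^2}{N}}\)
   Continuous series: \(\sigma = \sqrt{\dfrac{\sum f (m - \bar{X})^2}{N}}\)
2. Direct method
   Individual observations: \(\sigma = \sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\)
   Discrete series: \(\sigma = \sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}\)
   Continuous series: \(\sigma = \sqrt{\dfrac{\sum fm^2}{N} - \left(\dfrac{\sum fm}{N}\right)^2}\)
3. Assumed mean method
   Individual observations: \(\sigma = \sqrt{\dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2}\)
   Discrete series: \(\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}\)
   Continuous series: \(\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}\)
4. Step deviation method
   Individual observations: \(\sigma = C \times \sqrt{\dfrac{\sum d'^2}{N} - \left(\dfrac{\sum d'}{N}\right)^2}\)
   Discrete series: \(\sigma = C \times \sqrt{\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2}\)
   Continuous series: \(\sigma = C \times \sqrt{\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2}\)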
Individual Observations
Method 1: Deviation taken from actual mean
\[ \text{Standard Deviation, } \sigma = \sqrt{\frac{\sum x^2}{N}} \quad \text{where } x = X - \bar{X} \]
This method is simple when the \(X - \bar{X}\) values are integers.
Steps:
1. Form a table with the given values, X, in the first column.
2. Find the arithmetic mean, \(\bar{X} = \dfrac{\sum X}{N}\).
3. Find the deviation of each value from the actual mean and call it x,
i.e., find \(x = X - \bar{X}\). Enter those values in the next column.
4. Find the squares of the deviations of the values from the actual mean,
i.e., find \(x^2\). Enter those values in the next column.
5. Find the mean of the squared deviations of the values from their
arithmetic mean, i.e., find \(\dfrac{\sum x^2}{N}\).
6. Find the square root of \(\dfrac{\sum x^2}{N}\). It is the standard
deviation.
Example 1: Find the standard deviation of the numbers 77, 73, 75, 70, 72,
76, 75, 72, 74, 76.
Solution:
X      x = X - X̄ (X̄ = 74)    x²
77      3                      9
73     -1                      1
75      1                      1
70     -4                     16
72     -2                      4
76      2                      4
75      1                      1
72     -2                      4
74      0                      0
76      2                      4
ΣX = 740   Σ(X - X̄) = 0      Σx² = 44
Arithmetic mean:
\[ \bar{X} = \frac{\sum X}{N} = \frac{740}{10} = 74 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{44}{10}} = \sqrt{4.4} = 2.10 \]
Method 2: Direct Method
Without taking any deviations, the standard deviation can be calculated
directly by the formula
\[ \text{Standard Deviation, } \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} \]
This method can be used for all kinds of data. This formula is also used
later for correcting mistakes in the calculations.
Steps:
1. Form a table with the given values, X, in the first column.
2. Find the square of each X and write it in the next column under the
title X².
3. Find the totals \(\sum X\) and \(\sum X^2\), and N, the number of values.
4. Substitute in the above formula and simplify.
Example 2:
10 students of the B.Com. class of a college have obtained the following
marks in statistics out of 100. Calculate the standard deviation.
S. No : 1 2 3 4 5 6 7 8 9 10
Marks : 5 10 20 25 40 42 45 48 70 80
Solution:
S. No   Marks X    X²
1        5          25
2       10         100
3       20         400
4       25         625
5       40        1600
6       42        1764
7       45        2025
8       48        2304
9       70        4900
10      80        6400
Total   ΣX = 385   ΣX² = 20143
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{20143}{10} - \left(\frac{385}{10}\right)^2} = \sqrt{2014.30 - (38.5)^2} \]
\[ = \sqrt{2014.30 - 1482.25} = \sqrt{532.05} = 23.07 \]

Method 3: Deviations taken from assumed mean
The procedure is the same as the one followed in the calculation of the
arithmetic mean, but the formula is as follows:
\[ \text{Standard Deviation, } \sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} \]
where d = X – A. This method is preferred when the \(X - \bar{X}\) values are
fractions.
Steps:
1. Form a table with the given values, X, in the first column.
2. Assume any value as ‘A’ if it is not specified in the problem. It is
preferable to choose a value between the minimum and the maximum value of
X as A.
3. Find the deviation of each value from the assumed mean A and call it d,
i.e., find d = X – A, and write the deviations in the next column.
4. Write the squares of the deviations, d², in the next column.
5. Find \(\sum d\) and \(\sum d^2\) and identify N, the number of values.
Substitute them in the above formula and simplify.
Example 3:
For the data below, calculate the standard deviation:
40, 50, 60, 70, 80, 90, 100
Solution:
X      d = X - A (A = 70)    d²
40     -30                   900
50     -20                   400
60     -10                   100
70       0                     0
80      10                   100
90      20                   400
100     30                   900
Total  Σd = 0                Σd² = 2800
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{2800}{7} - \left(\frac{0}{7}\right)^2} = \sqrt{400 - 0^2} = 20 \]

Method 4: Step Deviation Method
The procedure is the same as the one followed in the calculation of the
arithmetic mean, but the formula is as follows:
\[ \text{Standard Deviation, } \sigma = C \times \sqrt{\frac{\sum d'^2}{N} - \left(\frac{\sum d'}{N}\right)^2}, \qquad d' = \frac{X - A}{C} \]
This method is preferred when the \(X - \bar{X}\) values are fractions and
there is a common difference between the X values.
Steps:
1. Form a table with the given values, X, in the first column.
2. Choose A and C as mentioned under Arithmetic mean.
3. Find the step deviation corresponding to each X, i.e., find
\(d' = \dfrac{X - A}{C}\), and write those values in the next column.
4. Write the squares of d', i.e., \(d'^2\), in the next column.
5. Find \(\sum d'\) and \(\sum d'^2\) and identify N, the number of values.
Substitute them in the above formula and simplify.
Example 4: Given below are the marks obtained by 5 B.Sc. students.
Roll No : 101 102 103 104 105
Marks : 10 30 20 25 15
Calculate the standard deviation.
Solution:
Roll No.   Marks X   d' = (X - A)/C (A = 20, C = 5)   d'²
101        10        -2                                4
102        30         2                                4
103        20         0                                0
104        25         1                                1
105        15        -1                                1
Total       –        Σd' = 0                           Σd'² = 10
Standard deviation:
\[ \sigma = C \times \sqrt{\frac{\sum d'^2}{N} - \left(\frac{\sum d'}{N}\right)^2} = 5 \times \sqrt{\frac{10}{5} - \left(\frac{0}{5}\right)^2} = 5 \times \sqrt{2 - 0^2} = 5 \times \sqrt{2} \]
\[ = 5 \times 1.4142 = 7.07 \]
Note: A problem can be solved by any one method.
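That the four methods for individual observations agree can be checked numerically. The Python sketch below applies each formula to the marks of Example 4; the function names are our own, and A = 20 and C = 5 are the values chosen in that example.

```python
import math


def sd_actual_mean(xs):
    """Method 1: square root of the mean squared deviation from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def sd_direct(xs):
    """Method 2: sqrt(sum(X^2)/N - (sum(X)/N)^2), no deviations taken."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


def sd_assumed_mean(xs, a):
    """Method 3: the direct formula applied to d = X - A."""
    n = len(xs)
    d = [x - a for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


def sd_step_deviation(xs, a, c):
    """Method 4: C times the direct formula applied to d' = (X - A) / C."""
    n = len(xs)
    d = [(x - a) / c for x in xs]
    return c * math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


marks = [10, 30, 20, 25, 15]
results = [
    sd_actual_mean(marks),
    sd_direct(marks),
    sd_assumed_mean(marks, a=20),
    sd_step_deviation(marks, a=20, c=5),
]
print([round(r, 2) for r in results])  # [7.07, 7.07, 7.07, 7.07]
```

The choice of A and C changes only the size of the intermediate numbers, not the result, which is why the assumed mean and step deviation methods are convenient for hand calculation.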
13.4 Discrete Series

13.4.1 Method 1: Deviations taken from actual mean
\[ \text{Standard Deviation, } \sigma = \sqrt{\frac{\sum fx^2}{N}} \]
where \(x = X - \bar{X}\) and \(N = \sum f\).
Steps:
1. Find the arithmetic mean, \(\bar{X}\).
2. Find the deviation of each X from the actual mean and call it x, i.e.,
find \(x = X - \bar{X}\).
3. Find the x² values.
4. Multiply each x² by the corresponding f to get fx².
5. Find \(\sum fx^2\).
6. Divide \(\sum fx^2\) by N and find the square root of the quotient, i.e.,
find \(\sqrt{\dfrac{\sum fx^2}{N}}\).
Example 5: Calculate the standard deviation of the following series.
X 6 9 12 15 18
f 7 12 13 10 8
Solution:
X      f        fX          x = X - X̄ (X̄ = 12)    x²    fx²
6      7        42          -6                     36    252
9      12       108         -3                      9    108
12     13       156          0                      0      0
15     10       150          3                      9     90
18     8        144          6                     36    288
Total  N = 50   ΣfX = 600    –                      –    Σfx² = 738
Arithmetic mean:
\[ \bar{X} = \frac{\sum fX}{N} = \frac{600}{50} = 12.00 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fx^2}{N}} = \sqrt{\frac{738}{50}} = \sqrt{14.76} = 3.84 \]
Method 2: Direct Method
Under this method, the formula becomes the following:
\[ \sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} \]
Steps:
1. Form a table with the given values, X, and the frequencies, f, in the
first two columns.
2. Multiply each X by the corresponding f to find fX. Write all such fX
values in the next column.
3. Multiply each fX by the corresponding X to find fX². (It is not (fX)²;
that is, fX should not be squared.) Write all such fX² values in the next
column.
4. Find \(N (= \sum f)\), \(\sum fX\) and \(\sum fX^2\).
5. Substitute in the above formula and simplify.
Example 6: Calculate the standard deviation.
No. of goals scored in a match : (X) 0 1 2 3 4 5
No. of matches : (f) 1 2 4 3 0 2
Solution:
X      f        fX         fX²
0      1        0          0
1      2        2          2
2      4        8          16
3      3        9          27
4      0        0          0
5      2        10         50
Total  N = 12   ΣfX = 29   ΣfX² = 95
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{95}{12} - \left(\frac{29}{12}\right)^2} = \sqrt{7.9167 - (2.4167)^2} \]
\[ = \sqrt{7.9167 - 5.8404} = \sqrt{2.0763} \]
= 1.44
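For a discrete series the direct method reduces to three weighted sums, which the following Python sketch computes. The function name `frequency_sd` is hypothetical; the data are those of Examples 5 and 6.

```python
import math


def frequency_sd(values, freqs):
    """Direct method: sqrt(sum(f*X^2)/N - (sum(f*X)/N)^2), with N = sum(f)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    # f * X^2 is each fX multiplied once more by X; it is not (fX) squared
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Example 6: goals scored per match and the number of matches
goals = [0, 1, 2, 3, 4, 5]
matches = [1, 2, 4, 3, 0, 2]
print(round(frequency_sd(goals, matches), 2))  # 1.44

# Example 5: the same formula reproduces the actual-mean result
print(round(frequency_sd([6, 9, 12, 15, 18], [7, 12, 13, 10, 8]), 2))  # 3.84
```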
13.5 Method 3: Deviation taken from assumed mean

The formula is as follows:
\[ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \]
where d = X – A, A is the assumed mean and \(N = \sum f\).
Steps:
1. Form a table with the given values, X, and the frequencies, f, in the
first two columns.
2. Choose the value for A, the assumed mean, if it is not specified.
3. Subtract A from each X and form the next column with the d = X – A values.
4. Multiply each d by the corresponding f and enter all such products in the
next column under the title fd.
5. Multiply each fd by the corresponding d and enter all such products in
the next column under the title fd² (these are not the squares of the fd
values).
6. Find \(N (= \sum f)\), \(\sum fd\) and \(\sum fd^2\), substitute them in
the above formula and simplify.
Example 7:
Calculate the standard deviation from the following data:
X : 6 9 12 15 18
f : 7 12 19 10 2
Solution:
X      f        d = X - A (A = 12)   fd          fd²
6      7        -6                   -42         252
9      12       -3                   -36         108
12     19        0                     0           0
15     10        3                    30          90
18     2         6                    12          72
Total  N = 50    –                   Σfd = -36   Σfd² = 522
\[ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{522}{50} - \left(\frac{-36}{50}\right)^2} = \sqrt{10.44 - (-0.72)^2} \]
\[ = \sqrt{10.4400 - 0.5184} = \sqrt{9.9216} = 3.15 \]
13.5.1 Method 4: Step Deviation Method
The following formula is used:
\[ \text{Standard Deviation, } \sigma = C \times \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2}, \qquad d' = \frac{X - A}{C}; \quad N = \sum f \]
Steps:
1. Form a table with the given values, X, and the frequencies, f, in the
first two columns.
2. Choose the values for A and C.
3. Find \(d' = \dfrac{X - A}{C}\) corresponding to each X and enter the values
in the next column.
4. Multiply each d' by the corresponding f to get fd'. Enter the products in
the next column.
5. Multiply each fd' by the corresponding d' to get fd'². Enter the products
in the next column.
6. Find \(N (= \sum f)\), \(\sum fd'\) and \(\sum fd'^2\), substitute them in
the above formula and simplify.

Example 8:
The weekly salaries of a group of employees are given in the following table.
Find the mean and standard deviation of the salaries.
Salaries (in Rs.) : 75 80 85 90 95 100
No. of persons : 3 7 18 12 6 4
Solution:
Salary (in Rs.) X   No. of persons f   d' = (X - A)/C (A = 85; C = 5)   fd'         fd'²
75                  3                  -2                               -6          12
80                  7                  -1                               -7           7
85                  18                  0                                0           0
90                  12                  1                               12          12
95                  6                   2                               12          24
100                 4                   3                               12          36
Total               N = 50              –                               Σfd' = 23   Σfd'² = 91
Arithmetic mean:
\[ \bar{X} = A + C \times \frac{\sum fd'}{N} = 85 + 5 \times \frac{23}{50} = 85 + 2.3 = 87.30 \text{ (Rs.)} \]
Standard deviation:
\[ \sigma = C \times \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} = 5 \times \sqrt{\frac{91}{50} - \left(\frac{23}{50}\right)^2} = 5 \times \sqrt{1.82 - (0.46)^2} \]
\[ = 5 \times \sqrt{1.8200 - 0.2116} = 5 \times \sqrt{1.6084} = 5 \times 1.2682 \]
= 6.34 (Rs.)
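The step deviation method gives the mean and the standard deviation from the same two totals. A minimal Python sketch, with a hypothetical function name and the data of Example 8, is shown below. Carrying the square root at full precision gives about 6.34; rounding the square root to 1.27 before multiplying by 5 gives 6.35.

```python
import math


def step_deviation_summary(values, freqs, a, c):
    """Return (mean, standard deviation) using d' = (X - A) / C."""
    n = sum(freqs)
    steps = [(x - a) / c for x in values]
    sum_fd = sum(f * d for d, f in zip(steps, freqs))
    sum_fd2 = sum(f * d * d for d, f in zip(steps, freqs))
    mean = a + c * sum_fd / n
    sd = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


salaries = [75, 80, 85, 90, 95, 100]
persons = [3, 7, 18, 12, 6, 4]
mean, sd = step_deviation_summary(salaries, persons, a=85, c=5)
print(round(mean, 2), round(sd, 2))  # 87.3 6.34
```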
13.6 Continuous Series

When X in the formulae for the standard deviation of a discrete series is
replaced by m, the corresponding formulae for a continuous series are
obtained.
The calculations start with the mid values (m) of the class intervals and
the class frequencies (f). When less-than or more-than cumulative
frequencies are given, the class intervals and class frequencies are to be
found first.
Method 1: Deviations taken from actual mean
The formula is as follows:
\[ \text{Standard deviation, } \sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}} \]
Steps:
1. Form a table with class intervals and class frequencies in the first two
columns.
2. Find the mid values (m) and write them in the next column.
3. Find the products of f and m and write them in the next column.
4. Find \(\bar{X} = \dfrac{\sum fm}{N}\), where \(N = \sum f\). \(\bar{X}\)
may be found by another formula also.
5. Subtract \(\bar{X}\) from each m. Enter the resulting \(m - \bar{X}\)
values in the next column.
6. Write \((m - \bar{X})^2\) in the next column.
7. Find \(f (m - \bar{X})^2\) and write the products in the next column.
8. Find \(\sum f (m - \bar{X})^2\).
9. Divide \(\sum f (m - \bar{X})^2\) by N and take the square root to get
the standard deviation.

Example 9: Find the standard deviation.
Class Intervals : 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
Frequency : 2 5 9 3 1
Solution:
Class Interval   Frequency f   Mid value m   fm          m - X̄ (X̄ = 23)   (m - X̄)²   f(m - X̄)²
0 – 10           2             5             10          -18               324         648
10 – 20          5             15            75           -8                64         320
20 – 30          9             25            225           2                 4          36
30 – 40          3             35            105          12               144         432
40 – 50          1             45            45           22               484         484
Total            N = 20        –             Σfm = 460     –                 –         Σf(m - X̄)² = 1920
Arithmetic mean:
\[ \bar{X} = \frac{\sum fm}{N} = \frac{460}{20} = 23 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}} = \sqrt{\frac{1920}{20}} = \sqrt{96} = 9.80 \]
Method 2: Direct method
Under this method, the form of the formula is as follows:
\[ \sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2} \]
Steps:
1. Form a table with class intervals and frequencies in the first two
columns.
2. Find the mid values (m) and write them in the next column.
3. Find the products of f and m and write those fm values in the next
column.
4. Find the products of m and fm and write those fm² values in the next
column.
5. Find \(N (= \sum f)\), \(\sum fm\) and \(\sum fm^2\).
6. Substitute in the formula and simplify.
Example 10:
The following data were obtained while observing the life span of a few neon
lights of a company. Calculate the S.D.
Life span (years) : 4 – 6 6 – 8 8 – 10 10 – 12 12 – 14
No. of neon lights : 10 17 32 21 20
Solution:
Life span (years)   No. of neon lights (f)   Mid value (m)   fm          fm²
4 – 6               10                       5               50          250
6 – 8               17                       7               119         833
8 – 10              32                       9               288         2592
10 – 12             21                       11              231         2541
12 – 14             20                       13              260         3380
Total               N = 100                  -               Σfm = 948   Σfm² = 9596
\[ \text{Standard Deviation, } \sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2} = \sqrt{\frac{9596}{100} - \left(\frac{948}{100}\right)^2} = \sqrt{95.96 - (9.48)^2} \]
\[ = \sqrt{95.9600 - 89.8704} = \sqrt{6.0896} \]
= 2.47
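For a continuous series the only extra step is replacing each class by its mid value. The Python sketch below does this and then applies the direct method; the function name `grouped_sd` is our own, and the last class of Example 10 is taken as 12 – 14, the limits used in the solution table (mid value 13).

```python
import math


def grouped_sd(class_limits, freqs):
    """Direct method on class mid values: sqrt(sum(f*m^2)/N - (sum(f*m)/N)^2)."""
    mids = [(low + high) / 2 for low, high in class_limits]
    n = sum(freqs)
    sum_fm = sum(f * m for m, f in zip(mids, freqs))
    sum_fm2 = sum(f * m * m for m, f in zip(mids, freqs))
    return math.sqrt(sum_fm2 / n - (sum_fm / n) ** 2)


# Example 10: life span of neon lights
life_span = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
lights = [10, 17, 32, 21, 20]
print(round(grouped_sd(life_span, lights), 2))  # 2.47

# Example 9: the same function reproduces the actual-mean result
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(round(grouped_sd(classes, [2, 5, 9, 3, 1]), 2))  # 9.8
```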

13.7 Combined Standard Deviation

When two or three groups merge, the mean and standard deviation of the
combined group are calculated as follows:
13.7.1 Case 1: Merger of two groups
            Size   Mean             SD
Group I     N1     \(\bar{X}_1\)    \(\sigma_1\)
Group II    N2     \(\bar{X}_2\)    \(\sigma_2\)
That is,
N1 – Number of items in the first group
N2 – Number of items in the second group
\(\bar{X}_1\) – Mean of items in the first group
\(\bar{X}_2\) – Mean of items in the second group
\(\sigma_1\) – Standard deviation of items in the first group
\(\sigma_2\) – Standard deviation of items in the second group
The mean of the combined group is
\[ \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} \]
The standard deviation of the combined group is
\[ \sigma_{12} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} \]
where \(d_1 = \bar{X}_1 - \bar{X}_{12}\) and \(d_2 = \bar{X}_2 - \bar{X}_{12}\).
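Example 11:
The mean and standard deviation of the marks of 63 children on a test are
27.6 and 7.1 respectively. To them is added a new group of 26 children who
have had less training and whose mean is 19.2 and standard deviation is 6.2.
How will the values for the combined group differ from those of the original
63 children as to mean and standard deviation?
Solution:
Number of children   Mean mark                 S.D. of marks
N1 = 63              \(\bar{X}_1 = 27.6\)      \(\sigma_1 = 7.1\)
N2 = 26              \(\bar{X}_2 = 19.2\)      \(\sigma_2 = 6.2\)
Combined mean:
\[ \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{63 \times 27.6 + 26 \times 19.2}{63 + 26} = \frac{1738.8 + 499.2}{89} = \frac{2238.0}{89} = 25.15 \]
Here \(d_1 = 27.6 - 25.15 = 2.45\) and \(d_2 = 19.2 - 25.15 = -5.95\).
Combined standard deviation:
\[ \sigma_{12} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} = \sqrt{\frac{63 \times 7.1^2 + 26 \times 6.2^2 + 63 \times 2.45^2 + 26 \times 5.95^2}{63 + 26}} \]
\[ = \sqrt{\frac{3175.83 + 999.44 + 378.1575 + 920.4650}{89}} = \sqrt{\frac{5473.8925}{89}} = \sqrt{61.5044} \]
Thus, adding the new group lowers the mean from 27.6 to 25.15 and raises the
standard deviation from 7.1 to
= 7.84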
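The combined mean and standard deviation can be computed for any number of groups with one loop, since each group contributes N(σ² + d²) to the numerator. The following Python sketch assumes each group is given as a (size, mean, S.D.) tuple; the function name `combine_groups` is hypothetical.

```python
import math


def combine_groups(groups):
    """Combine groups given as (size, mean, sd) tuples.

    Returns the combined mean and the combined standard deviation
    sqrt(sum(N_i * (sd_i^2 + d_i^2)) / sum(N_i)), d_i = mean_i - combined mean.
    """
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total
    return mean, math.sqrt(variance)


# Example 11: 63 children (27.6, 7.1) joined by 26 children (19.2, 6.2)
mean, sd = combine_groups([(63, 27.6, 7.1), (26, 19.2, 6.2)])
print(round(mean, 2), round(sd, 2))  # 25.15 7.84
```

Because the function accepts a list, the same call covers the merger of three groups by passing three tuples.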
13.7.2 Case 2: Merger of three groups
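            Size   Mean             SD
Group I     N1     \(\bar{X}_1\)    \(\sigma_1\)
Group II    N2     \(\bar{X}_2\)    \(\sigma_2\)
Group III   N3     \(\bar{X}_3\)    \(\sigma_3\)
The mean of the combined groups is
\[ \bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3} \]
The standard deviation of the combined groups is
\[ \sigma_{123} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_3 \sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}} \]
where \(d_1 = \bar{X}_1 - \bar{X}_{123}\), \(d_2 = \bar{X}_2 - \bar{X}_{123}\)
and \(d_3 = \bar{X}_3 - \bar{X}_{123}\).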
Uses
Standard deviation is the best absolute measure of dispersion. It is a part of
many statistical concepts such as skewness, kurtosis, correlation,
regression, estimation, sampling, tests of significance and statistical
quality control. Standard deviation is of immense use not only in statistics
but also in biology, education, psychology and other disciplines.
Merits
1. Standard deviation is rigidly defined
2. It is calculated on the basis of the magnitudes of all the items
3. It could be manipulated further. The combined standard deviation can
be calculated
4. Mistakes in its calculation can be corrected. The entire calculation
need not be redone.
5. Coefficient of variation is based on standard deviation. It is the best
and most widely used relative measure of dispersion
6. It is least affected by sampling fluctuations. This sampling stability
has earned it an indispensable place in tests of significance.
7. It reduces the complexity in the approach of normal distribution by
providing standard normal variable
8. It is the most important absolute measure of dispersion. It is used in all
the areas of statistics. It is widely used in other disciplines such as
psychology, education and biology as well.
9. Scientific calculators show the standard deviation of any series.
10. Different forms of the formula are available.
Demerits
1. Compared with other absolute measures of dispersion, it is difficult to
calculate
2. It is not simple to understand

3. It gives more weightage to the items away from the mean than those
near the mean as the deviations are squared.
13.8 Coefficient of Variation

\[ \text{Coefficient of variation} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100 \]
C.V. is the abbreviation.
\[ \text{C.V.} = \frac{\text{S.D.}}{\text{A.M.}} \times 100 = \frac{\sigma}{\bar{X}} \times 100 \]
Karl Pearson gave this definition. Like all other relative measures of
dispersion, it is a pure number. All relative measures of dispersion are free
from units of measurement such as kg., metre, litre, etc. The variations in
two or more series (groups or sets of data) are compared on the basis of a
relative measure of dispersion.
For example, the incomes of two persons over various periods of time may be
quoted in different currencies, say rupees for one and dollars for the
other. The variations in their incomes can still be compared by using any
relative measure of dispersion.
Coefficient of variation is the most widely used, and the best, relative
measure of dispersion. It is a percentage.
While comparing two or more groups, the group which has the smaller
coefficient of variation is less variable, or more consistent, more stable,
more uniform
or more homogeneous.
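The definition can be sketched in Python as follows. The function name `coefficient_of_variation` is our own, and the data are the ten values that Example 12 works through by hand.

```python
import math


def coefficient_of_variation(values):
    """C.V. = (standard deviation / arithmetic mean) * 100, a percentage."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100


data = [40, 41, 45, 49, 50, 51, 55, 59, 60, 60]
print(round(coefficient_of_variation(data), 2))  # 13.92
```

To compare two series, the function is applied to each one; the series with the smaller result is the more consistent one.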
Example 12:
Calculate the coefficient of variation of the following:
40, 41, 45, 49, 50, 51, 55, 59, 60, 60

Solution:
X          X - X̄ (X̄ = 51)   (X - X̄)²
40         -11               121
41         -10               100
45          -6                36
49          -2                 4
50          -1                 1
51           0                 0
55           4                16
59           8                64
60           9                81
60           9                81
ΣX = 510     –               Σ(X - X̄)² = 504
Mean:
\[ \bar{X} = \frac{\sum X}{N} = \frac{510}{10} = 51.00 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{504}{10}} = \sqrt{50.4} = 7.10 \]
Coefficient of variation:
\[ \text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{7.10}{51.00} \times 100 = 13.92 \]
13.9 Variance
Definition
Variance is the mean square deviation of the values from their arithmetic
mean. \(\sigma^2\) (read, sigma square) is the symbol. Standard deviation is
the positive square root of variance and is denoted by \(\sigma\). The term
variance was introduced by R.A. Fisher in the year 1918. It is much used in
sampling, analysis of variance, etc. In analysis of variance, the total
variation is split into a few components. Each component is due to one
factor of variation. The significance of the variation is then tested.
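Formulae
These formulae can be compared with those under standard deviation.
1. Actual mean method
   Individual observations: \(\sigma^2 = \dfrac{\sum (X - \bar{X})^2}{N}\)
   Discrete series: \(\sigma^2 = \dfrac{\sum f (X - \bar{X})^2}{N}\)
   Continuous series: \(\sigma^2 = \dfrac{\sum f (m - \bar{X})^2}{N}\)
2. Direct method
   Individual observations: \(\sigma^2 = \dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2\)
   Discrete series: \(\sigma^2 = \dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2\)
   Continuous series: \(\sigma^2 = \dfrac{\sum fm^2}{N} - \left(\dfrac{\sum fm}{N}\right)^2\)
3. Assumed mean method
   Individual observations: \(\sigma^2 = \dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2\)
   Discrete series: \(\sigma^2 = \dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2\)
   Continuous series: \(\sigma^2 = \dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2\)
4. Step deviation method
   Individual observations: \(\sigma^2 = C^2 \times \left[\dfrac{\sum d'^2}{N} - \left(\dfrac{\sum d'}{N}\right)^2\right]\)
   Discrete series: \(\sigma^2 = C^2 \times \left[\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2\right]\)
   Continuous series: \(\sigma^2 = C^2 \times \left[\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2\right]\)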
Individual Observations
Example 13: The numbers of goals scored by a team in different matches are
given below. Calculate the variance.
Goals X    X - X̄ (X̄ = 1.7)   (X - X̄)²
2           0.3               0.09
0          -1.7               2.89
1          -0.7               0.49
3           1.3               1.69
0          -1.7               2.89
4           2.3               5.29
3           1.3               1.69
1          -0.7               0.49
1          -0.7               0.49
2           0.3               0.09
ΣX = 17      –                Σ(X - X̄)² = 16.10
Mean:
\[ \bar{X} = \frac{\sum X}{N} = \frac{17}{10} = 1.7 \]
Variance:
\[ \sigma^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{16.10}{10} = 1.61 \]
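The variance of Example 13 can be checked in Python, either from the definition or with the standard library. The function `population_variance` below is our own; `statistics.pvariance` and `statistics.pstdev` are the library functions that divide by N, as this unit does, whereas `statistics.variance` divides by N − 1 and would give a different figure.

```python
import statistics


def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


goals = [2, 0, 1, 3, 0, 4, 3, 1, 1, 2]
print(round(population_variance(goals), 2))    # 1.61
print(round(statistics.pvariance(goals), 2))   # 1.61
print(round(statistics.pstdev(goals), 2))      # 1.27, the positive square root
```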
Self Assessment Questions
1. Why is standard deviation considered to be the most popular
measure of dispersion?
2. Calculate the standard deviation from the following data:
14, 22, 9, 15, 20, 17, 12, 11
3. The table below gives the marks obtained by 10 B.Com. students in
statistics examination. Calculate standard deviation.
Numbers: 1 2 3 4 5 6 7 8 9 10
Narks: 43 48 65 57 31 60 37 48 78 59

4. Calculate standard deviation from the following:

Marks : 10 20 30 40 50 60
No. of students : 8 12 20 10 7 3
5. Compute the standard deviation from the following data:
Class : 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency : 8 12 11 14 9 7 4
13.10 Summary
In this unit we discussed the concept of standard deviation and the
different formulae for computing it, each with worked examples. The concept
of variance was then discussed, also with examples.
13.11 Terminal Questions

1. Explain the coefficient of variation and give its formula.
2. Prices of a particular commodity over five years in two cities are given
below:
Price in City A    Price in City B
20                 10
22                 20
19                 18
23                 12
16                 15
From the above data, find the variance and the coefficient of variation
for each city, and state which city had more stable prices.
3. Define skewness and state the formulae used to measure it.
4. Calculate (i) Karl Pearson's coefficient of skewness and (ii) Bowley's
coefficient of skewness for the data given below:
Mid value : 20  30  40  50  60  70  80
Frequency : 1   12  55  91  55  12  1
5. Calculate Kelly's coefficient of skewness.
Class     : 30 – 49  50 – 69  70 – 89  90 – 109  110 – 129  130 – 149  150 – 169
Frequency : 25       40       50       100       80         50         25
13.12 Answers
Self Assessment Questions
1. Karl Pearson introduced the concept of standard deviation in 1893. It is
the most important measure of dispersion and is widely used in many
statistical formulae. Standard deviation is also called root-mean-square
deviation, mean error or mean square error.
It is popular because it is the square root of the mean of the squared
deviations from the arithmetic mean. Squaring makes every deviation
positive, so the drawback of mean deviation, which ignores algebraic
signs, is overcome.
It is defined as the positive square root of the arithmetic mean of the
squares of the deviations of the given observations from their arithmetic
mean. The standard deviation is denoted by the Greek letter $\sigma$ (sigma).
Deviations taken from the actual mean
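2. Calculation of standard deviation from the actual mean

Value (X)    $X - \bar{X}$ ($\bar{X} = 15$)    $(X - \bar{X})^2$
14           -1                       1
22            7                      49
9            -6                      36
15            0                       0
20            5                      25
17            2                       4
12           -3                       9
11           -4                      16
$\sum X = 120$                      $\sum (X - \bar{X})^2 = 140$

\[ \bar{X} = \frac{\sum X}{N} = \frac{120}{8} = 15 \]
\[ \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{140}{8}} = \sqrt{17.5} = 4.18 \]
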
Alternatively
We can find the standard deviation by using the values directly, i.e. no
deviations are found.

Value (X)    $X^2$
14           196
22           484
9             81
15           225
20           400
17           289
12           144
11           121
$\sum X = 120$   $\sum X^2 = 1940$

\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}
          = \sqrt{\frac{1940}{8} - \left(\frac{120}{8}\right)^2}
          = \sqrt{242.5 - 225} = \sqrt{17.5} = 4.18 \]
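Both routes to the answer for Question 2 can be compared in a short Python sketch. The variable names are illustrative only; the point is that the actual-mean formula and the direct formula produce the same value.

```python
import math

values = [14, 22, 9, 15, 20, 17, 12, 11]
n = len(values)

# Actual mean method: deviations are taken from the arithmetic mean
mean = sum(values) / n
sd_actual = math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# Direct method: only the sum of X and the sum of X squared are needed
sd_direct = math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

print(round(sd_actual, 2), round(sd_direct, 2))  # prints 4.18 4.18
```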
Deviations taken from an assumed mean
The formula is
\[ \sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \qquad d = X - A \]

3. Deviations from an assumed mean (A = 50)

Roll number    Marks (X)    d = X - A    $d^2$
1              43            -7           49
2              48            -2            4
3              65            15          225
4              57             7           49
5              31           -19          361
6              60            10          100
7              37           -13          169
8              48            -2            4
9              78            28          784
10             59             9           81
N = 10                      $\sum d = 26$   $\sum d^2 = 1826$

\[ \sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}
          = \sqrt{\frac{1826}{10} - \left(\frac{26}{10}\right)^2}
          = \sqrt{182.6 - 2.6^2} = \sqrt{182.6 - 6.76} = \sqrt{175.84} = 13.26 \]
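The assumed-mean calculation above can be sketched in Python. The function name `sd_assumed_mean` and the second assumed mean of 40 are illustrative assumptions, added only to show that the choice of A does not change the result.

```python
import math

marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]

def sd_assumed_mean(values, assumed_mean):
    """Standard deviation from deviations d = X - A about an assumed mean A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)

sd_50 = sd_assumed_mean(marks, 50)  # A = 50, as in the worked answer
sd_40 = sd_assumed_mean(marks, 40)  # a different A, for comparison
print(round(sd_50, 2), round(sd_40, 2))  # prints 13.26 13.26
```

The correction term $(\sum d / N)^2$ is what removes the effect of the assumed mean, so any convenient A may be used.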
dxx
4. Marks (x) f fx d2 fd2
x  30.8
10 8 80 – 20.8 432.64 3461.12
20 12 240 – 10.8 116.64 1399.68
30 20 600 – 0.8 0.64 12.80
40 10 400 9.2 84.64 846.40
50 7 350 19.2 368.64 2580.48
60 3 180 29.2 852.64 2557.92
X = 210 N = 60 fx = 1850 fd2 = 10,858.40
Mean X 
 fx  1850  30.8
N 60

Standard deviation  
 fd 2
N
10858 .64

60
= 13.45
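For a discrete series, each value is weighted by its frequency. The following Python sketch repeats the calculation above; `grouped_mean_sd` is an illustrative name, and the code uses the unrounded mean (30.83...) rather than 30.8, which does not change the result to two decimal places.

```python
import math

marks = [10, 20, 30, 40, 50, 60]
students = [8, 12, 20, 10, 7, 3]  # frequencies f

def grouped_mean_sd(values, freqs):
    """Mean and standard deviation of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, math.sqrt(variance)

mean, sd = grouped_mean_sd(marks, students)
print(round(mean, 2), round(sd, 2))  # prints 30.83 13.45
```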
Another method (deviations from an assumed mean, A = 30)

Marks (x)    f     d = x - 30    fd      $fd^2$
10           8     -20           -160    3200
20           12    -10           -120    1200
30           20      0              0       0
40           10     10            100    1000
50           7      20            140    2800
60           3      30             90    2700
N = 60             $\sum fd = 50$   $\sum fd^2 = 10{,}900$

\[ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}
          = \sqrt{\frac{10900}{60} - \left(\frac{50}{60}\right)^2}
          = \sqrt{181.67 - 0.69} = \sqrt{180.98} = 13.45 \]
Method 3 (step deviation, A = 30, C = 10)

Marks (x)    f     $d' = \frac{x - 30}{10}$    $fd'$    $fd'^2$
10           8     -2            -16     32
20           12    -1            -12     12
30           20     0              0      0
40           10     1             10     10
50           7      2             14     28
60           3      3              9     27
N = 60             $\sum fd' = 5$   $\sum fd'^2 = 109$

\[ \sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times C
          = \sqrt{\frac{109}{60} - \left(\frac{5}{60}\right)^2} \times 10
          = \sqrt{1.8167 - 0.0069} \times 10 = \sqrt{1.8098} \times 10
          = 1.345 \times 10 = 13.45 \]
5. Step deviation method, with A = 35, C = 10 and $d = \frac{X - A}{C}$

Class      Mid value (X)    d     f     fd     $fd^2$
0 – 10     5                -3    8     -24    72
10 – 20    15               -2    12    -24    48
20 – 30    25               -1    17    -17    17
30 – 40    35                0    14      0     0
40 – 50    45                1    9       9     9
50 – 60    55                2    7      14    28
60 – 70    65                3    4      12    36
                            N = 71   $\sum fd = -30$   $\sum fd^2 = 210$

Mean:
\[ \bar{X} = A + \frac{\sum fd}{N} \times C = 35 + \frac{-30}{71} \times 10 = 35 - 4.225 = 30.775 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times C
          = \sqrt{\frac{210}{71} - \left(\frac{-30}{71}\right)^2} \times 10
          = \sqrt{2.957 - 0.4225^2} \times 10 = \sqrt{2.7785} \times 10
          = 1.667 \times 10 = 16.67 \]
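The step deviation method for a continuous series can be sketched as follows. The function name and the tuple layout (lower limit, upper limit, frequency) are illustrative assumptions; the frequencies are those used in the answer table above.

```python
import math

# (lower limit, upper limit, frequency) for each class
classes = [(0, 10, 8), (10, 20, 12), (20, 30, 17), (30, 40, 14),
           (40, 50, 9), (50, 60, 7), (60, 70, 4)]

def step_deviation_stats(classes, assumed_mean, width):
    """Mean and S.D. using step deviations d = (mid value - A) / C."""
    n = sum(f for _, _, f in classes)
    sum_fd = 0.0
    sum_fd2 = 0.0
    for lower, upper, f in classes:
        mid = (lower + upper) / 2
        d = (mid - assumed_mean) / width
        sum_fd += f * d
        sum_fd2 += f * d * d
    mean = assumed_mean + (sum_fd / n) * width
    sd = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width
    return n, mean, sd

n, mean, sd = step_deviation_stats(classes, assumed_mean=35, width=10)
print(n, round(mean, 3), round(sd, 2))  # prints 71 30.775 16.67
```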
Terminal Questions
1. The standard deviation is an absolute measure of dispersion. The
coefficient of variation is the corresponding relative measure: it shows
the relationship between the standard deviation and the arithmetic mean,
expressed as a percentage.
\[ \text{Coefficient of variation (C.V.)} = \frac{\text{standard deviation}}{\text{mean}} \times 100 = \frac{\sigma}{\bar{X}} \times 100 \]
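2. Calculation of coefficient of variation

Price x    dx = x - 20    $dx^2$    Price y    dy = y - 15    $dy^2$
20          0              0        10         -5             25
22          2              4        20          5             25
19         -1              1        18          3              9
23          3              9        12         -3              9
16         -4             16        15          0              0
$\sum x = 100$  $\sum dx = 0$  $\sum dx^2 = 30$  $\sum y = 75$  $\sum dy = 0$  $\sum dy^2 = 68$

City A:
\[ \bar{x} = \frac{\sum x}{N} = \frac{100}{5} = 20, \qquad
   \sigma_x = \sqrt{\frac{\sum dx^2}{N}} = \sqrt{\frac{30}{5}} = \sqrt{6} = 2.45 \]
\[ \text{C.V.} = \frac{\sigma_x}{\bar{x}} \times 100 = \frac{2.45}{20} \times 100 = 12.25, \qquad
   \text{Variance } \sigma_x^2 = 6 \]
City B:
\[ \bar{y} = \frac{\sum y}{N} = \frac{75}{5} = 15, \qquad
   \sigma_y = \sqrt{\frac{\sum dy^2}{N}} = \sqrt{\frac{68}{5}} = \sqrt{13.6} = 3.69 \]
\[ \text{C.V.} = \frac{\sigma_y}{\bar{y}} \times 100 = \frac{3.69}{15} \times 100 = 24.6, \qquad
   \text{Variance } \sigma_y^2 = 13.6 \]
City A had more stable prices than City B because the coefficient of
variation is lower in City A.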
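The comparison of the two cities can be reproduced with a short Python sketch. The function name is illustrative. Because the code does not round the standard deviation before dividing, it gives 24.59 for City B rather than the 24.6 obtained above from the rounded value 3.69.

```python
import math

city_a = [20, 22, 19, 23, 16]
city_b = [10, 20, 18, 12, 15]

def variance_and_cv(values):
    """Return (variance, coefficient of variation in percent)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return variance, math.sqrt(variance) / mean * 100

var_a, cv_a = variance_and_cv(city_a)
var_b, cv_b = variance_and_cv(city_b)

# The series with the smaller C.V. is the more stable one
more_stable = "City A" if cv_a < cv_b else "City B"
print(round(cv_a, 2), round(cv_b, 2), more_stable)
```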
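3. "Skewness is the degree of asymmetry, or departure from symmetry, of
a distribution."
Three formulae are used, where $\bar{X}$ is the mean, Z the mode, M the
median, $Q_1$ and $Q_3$ the quartiles, $P_{10}$ and $P_{90}$ the percentiles, and
$D_1$ and $D_9$ the deciles.
1. Karl Pearson's coefficient of skewness
\[ SK_P = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{X} - Z}{\sigma} \]
(or)
\[ SK_P = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{X} - M)}{\sigma} \]
The second form can be used when the mode is ill defined.
2. Bowley's coefficient of skewness
\[ SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} \]
3. Kelly's coefficient of skewness
\[ SK_K = \frac{P_{90} + P_{10} - 2M}{P_{90} - P_{10}} = \frac{D_9 + D_1 - 2M}{D_9 - D_1} \]
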
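4. Take A = 50 and C = 10, so that $d = \frac{\text{mid value} - A}{C}$.

Mid value    f     d     fd     $fd^2$    Class interval    cf
20           1     -3    -3     9       15 – 25           1
30           12    -2    -24    48      25 – 35           13
40           55    -1    -55    55      35 – 45           68
50           91     0      0     0      45 – 55           159
60           55     1     55    55      55 – 65           214
70           12     2     24    48      65 – 75           226
80           1      3      3     9      75 – 85           227
Total        N = 227     $\sum fd = 0$   $\sum fd^2 = 224$

i) Arithmetic mean:
\[ \bar{X} = A + \frac{\sum fd}{N} \times C = 50 + \frac{0}{227} \times 10 = 50 + 0 = 50 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times C
          = \sqrt{\frac{224}{227} - \left(\frac{0}{227}\right)^2} \times 10 = 9.93 \]
Greatest frequency = 91, so the modal class interval is 45 – 55.
L = 45, h = 10, $f_1 = 91$, $f_0 = 55$, $f_2 = 55$
\[ \text{Mode } Z = L + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}
                 = 45 + \frac{10(91 - 55)}{2 \times 91 - 55 - 55}
                 = 45 + \frac{360}{72} = 50 \]
Karl Pearson's coefficient of skewness:
\[ SK_P = \frac{\bar{X} - Z}{\sigma} = \frac{50 - 50}{9.93} = 0 \]

ii) In the formulae below, c is the cumulative frequency of the class
preceding the class in question.
$Q_1$: $\frac{N}{4} = \frac{227}{4} = 56.75$; the $Q_1$ class interval is 35 – 45.
L = 35, h = 10, f = 55, c = 13
\[ Q_1 = L + \frac{h}{f}\left(\frac{N}{4} - c\right) = 35 + \frac{10}{55}(56.75 - 13)
       = 35 + \frac{10}{55}(43.75) = 42.95 \]
$Q_2$ (or) median: $\frac{N}{2} = \frac{227}{2} = 113.5$; the median class interval is 45 – 55.
L = 45, h = 10, f = 91, c = 68
\[ M = L + \frac{h}{f}\left(\frac{N}{2} - c\right) = 45 + \frac{10}{91}(113.5 - 68) = 45 + 5 = 50 \]
$Q_3$: $\frac{3N}{4} = 170.25$; the $Q_3$ class interval is 55 – 65.
L = 55, h = 10, f = 55, c = 159
\[ Q_3 = L + \frac{h}{f}\left(\frac{3N}{4} - c\right) = 55 + \frac{10}{55}(170.25 - 159) = 57.05 \]
Bowley's coefficient of skewness:
\[ SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{57.05 + 42.95 - 2 \times 50}{57.05 - 42.95} = \frac{0}{14.10} = 0 \]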
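The interpolation steps used above for the mode, quartiles and median can be collected into one Python sketch. The helper name `quantile` and the data layout are illustrative assumptions; the formulae are the ones used in the answer, applied without intermediate rounding.

```python
import math

# (lower boundary, upper boundary, frequency) for each class interval
classes = [(15, 25, 1), (25, 35, 12), (35, 45, 55), (45, 55, 91),
           (55, 65, 55), (65, 75, 12), (75, 85, 1)]
n = sum(f for _, _, f in classes)

def quantile(classes, position):
    """Interpolate L + (h / f) * (position - c) within the containing class."""
    cumulative = 0  # c: cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= position:
            return lower + (upper - lower) / f * (position - cumulative)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

mids = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]
mean = sum(f * m for m, f in zip(mids, freqs)) / n
sd = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n)

# Mode interpolated inside the class with the greatest frequency
i = freqs.index(max(freqs))
lower, upper, f1 = classes[i]
f0 = freqs[i - 1] if i > 0 else 0
f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
mode = lower + (upper - lower) * (f1 - f0) / (2 * f1 - f0 - f2)

q1, median, q3 = (quantile(classes, n * k / 4) for k in (1, 2, 3))
sk_pearson = (mean - mode) / sd
sk_bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(q1, 2), round(median, 2), round(q3, 2))
```

Both coefficients come out as zero (up to floating-point error), which reflects the fact that the frequencies are symmetric about the mid value 50.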
5.
Class        Frequency    True class interval    Cumulative frequency
30 – 49      25           29.5 – 49.5            25
50 – 69      40           49.5 – 69.5            65
70 – 89      50           69.5 – 89.5            115
90 – 109     100          89.5 – 109.5           215
110 – 129    80           109.5 – 129.5          295
130 – 149    50           129.5 – 149.5          345
150 – 169    25           149.5 – 169.5          370
             N = 370

$P_{10}$: $\frac{10N}{100} = 10 \times \frac{370}{100} = 37$. The 37th item falls in the class
interval 49.5 – 69.5, which is therefore the $P_{10}$ class interval.
$L_{10} = 49.5$, $h_{10} = 20$, $f_{10} = 40$, $C_{10} = 25$
\[ P_{10} = L_{10} + \frac{h_{10}}{f_{10}}\left(\frac{10N}{100} - C_{10}\right)
          = 49.5 + \frac{20}{40}(37 - 25) = 49.5 + 6 = 55.5 \]
Median: $\frac{N}{2} = \frac{370}{2} = 185$; 89.5 – 109.5 is the median class interval.
L = 89.5, h = 20, f = 100, C = 115
\[ M = L + \frac{h}{f}\left(\frac{N}{2} - C\right) = 89.5 + \frac{20}{100}(185 - 115) = 89.5 + 14 = 103.5 \]
$P_{90}$: $\frac{90N}{100} = 90 \times \frac{370}{100} = 333$; 129.5 – 149.5 is the $P_{90}$ class interval.
$L_{90} = 129.5$, $h_{90} = 20$, $f_{90} = 50$, $C_{90} = 295$
\[ P_{90} = L_{90} + \frac{h_{90}}{f_{90}}\left(\frac{90N}{100} - C_{90}\right)
          = 129.5 + \frac{20}{50}(333 - 295) = 129.5 + 15.2 = 144.7 \]
Kelly's coefficient of skewness:
\[ SK_K = \frac{P_{90} + P_{10} - 2M}{P_{90} - P_{10}}
        = \frac{144.7 + 55.5 - 2 \times 103.5}{144.7 - 55.5}
        = \frac{-6.8}{89.2} = -0.0762 \]
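Kelly's coefficient uses the same interpolation as the median, applied at the 10th, 50th and 90th percentile positions. The sketch below is illustrative only; the helper name `percentile` is an assumption and the class boundaries are the true class intervals from the table above.

```python
# (lower boundary, upper boundary, frequency) using the true class intervals
classes = [(29.5, 49.5, 25), (49.5, 69.5, 40), (69.5, 89.5, 50),
           (89.5, 109.5, 100), (109.5, 129.5, 80), (129.5, 149.5, 50),
           (149.5, 169.5, 25)]
n = sum(f for _, _, f in classes)

def percentile(classes, k):
    """k-th percentile: L + (h / f) * (k * N / 100 - C) in the containing class."""
    position = k * n / 100
    cumulative = 0  # C: cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= position:
            return lower + (upper - lower) / f * (position - cumulative)
        cumulative += f
    raise ValueError("percentile position exceeds the total frequency")

p10 = percentile(classes, 10)
median = percentile(classes, 50)
p90 = percentile(classes, 90)
sk_kelly = (p90 + p10 - 2 * median) / (p90 - p10)
print(round(p10, 1), round(median, 1), round(p90, 1), round(sk_kelly, 4))
```

The coefficient is negative because the median (103.5) lies above the midpoint of $P_{10}$ and $P_{90}$ (100.1).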
