Kulwant Singh Kapoor
Data Structure
The process of arranging data in groups or
classes according to resemblances and
similarities is technically called
classification.
Types of Classification:
Geographical
Chronological
Qualitative
Quantitative
Geographical Data
In geographical classification data are classified on the
basis of place.
Example: geographical distribution of National Income
COUNTRY INCOME IN US DOLLARS
Canada 7950
USA 7880
West Germany 7510
France 6730
USSR 2800
India 500
Chronological Data
When data are classified on the basis of time, the
classification is chronological; such data are
also known as a time series.
Example: production of polio vaccine by a company
“X”.
YEAR No. of Vaccines
2005 12,800
2006 15,600
2007 18,200
2008 16,600
2009 20,000
2010 20,800
Qualitative Data
When data are classified on the basis of descriptive
characteristics or attributes.
Examples:
• Male/ Female
• Strongly agree/ Agree/Disagree/Strongly Disagree
• Low/Medium/High
• Diabetic/Non Diabetic
• Hypertensive/Mildly Hypertensive/Non
Hypertensive
Quantitative Classification
When classification is based on characteristics
which are capable of Quantitative measurement.
Example:
Height/Weight
Income/Expenditure
Blood Pressure
Body Temperature
Blood Count
Quantitative Data
• Ungrouped: raw data
• Grouped: discrete data, continuous data
MEASURES OF CENTRAL TENDENCY
• Mean
• Median
• Mode
• Quartile
• Percentile
MEAN
• The arithmetic mean of a given set of observations is
their sum divided by the number of observations.
For example, if $X_1, X_2, X_3, \dots, X_n$ are the given n
observations, then their arithmetic mean, denoted
by $\bar{X}$, is
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \]
EXAMPLE 1
MARKS OF 24
STUDENTS
12 43 54 67 87 98 65 43
54 67 89 90 98 76 54 56
54 98 89 78 90 98 99 87
TOTAL 1746
# OF OBSERVATIONS 24
MEAN 72.75
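As a quick check of Example 1, the short Python sketch below recomputes the total, the number of observations and the mean from the 24 marks. The variable names are illustrative and are not part of the original slides.

```python
# Marks of the 24 students from Example 1.
marks = [
    12, 43, 54, 67, 87, 98, 65, 43,
    54, 67, 89, 90, 98, 76, 54, 56,
    54, 98, 89, 78, 90, 98, 99, 87,
]

total = sum(marks)   # sum of all observations
n = len(marks)       # number of observations
mean = total / n     # arithmetic mean = sum / count

print(total, n, mean)
```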
Arithmetic Mean for Ungrouped
Series
Employee Income (X) X − A
1 1000
2 1500
3 800
4 1200
5 900
For discrete data, the mean is calculated with
respect to the frequencies.
In the case of continuous data, the value of X is
taken as the mid value of the corresponding
class.
\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \]
EXAMPLE 2 NUMBER OF STUDENTS ABSENT IN A YEAR
X f Xf
1 8 8
2 9 18
3 21 63
4 32 128
5 12 60
6 22 132
7 24 168
8 37 296
9 15 135
10 20 200
TOTAL 200 1208
MEAN 6.04
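The frequency-weighted mean of Example 2 can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the function name `frequency_mean` are assumptions made for illustration.

```python
# Example 2: each value X mapped to its frequency f.
absences = {1: 8, 2: 9, 3: 21, 4: 32, 5: 12, 6: 22, 7: 24, 8: 37, 9: 15, 10: 20}

def frequency_mean(freq):
    """Mean of a discrete frequency distribution: sum(f * X) / sum(f)."""
    total_f = sum(freq.values())
    total_fx = sum(x * f for x, f in freq.items())
    return total_fx / total_f

mean_absent = frequency_mean(absences)
print(round(mean_absent, 2))
```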
• Marks Students X − 40
X f d f*d
20 8
30 12
40 20
50 10
60 6
70 4
Total 60
EXAMPLE 3 DISTRIBUTION OF NUMBER OF
PROCESSED ARTICLES PER DAY
PER PERSON
LIMITS f X fX
80-100 7 90 630
100-120 50 110 5500
120-140 80 130 10400
140-160 60 150 9000
160-180 3 170 510
TOTAL 200 26040
MEAN 130.2
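For a continuous distribution the only extra step is taking each class mid value as X. The sketch below does this for the Example 3 table; the tuple layout `(lower, upper, frequency)` and the name `grouped_mean` are illustrative choices, not something the slides prescribe.

```python
# Example 3: (lower limit, upper limit, frequency) for each class.
classes = [(80, 100, 7), (100, 120, 50), (120, 140, 80), (140, 160, 60), (160, 180, 3)]

def grouped_mean(rows):
    """Mean of a continuous distribution, taking X as each class mid value."""
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in rows:
        mid = (lower + upper) / 2   # mid value of the class
        total_f += f
        total_fx += f * mid
    return total_fx / total_f

articles_mean = grouped_mean(classes)
print(round(articles_mean, 1))
```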
Mathematical Properties of
Arithmetic Mean
• Property 1 – The Algebraic sum of the
deviations of the given set of
observations from their arithmetic
mean is zero
• Property 2 – If the sizes and the means
of two component series are known, then
the mean of the resultant series obtained
on combining the given series can be
found
Merits and demerits of
Arithmetic Mean
Merits:
i. It is rigidly defined.
ii. It is easy to calculate and understand.
iii. It is based on all the observations
iv. It is suitable for further mathematical
treatment.
v. Of all the averages, arithmetic mean is
affected least by fluctuations of sampling
or arithmetic mean is a stable average.
Demerits:
i. It is affected by extreme observations.
ii. It cannot be used in case of open end classes such as less than 10
and more than 70, etc.
iii. It cannot be determined by inspection, nor can it be located
graphically.
iv. It cannot be used in dealing with qualitative characteristics.
v. It cannot be obtained if a single observation is missing or lost.
vi. In an extremely skewed distribution it is not representative of the
distribution and hence is not a suitable measure of location.
vii. It may lead to wrong conclusion if the details of the data from which
it is obtained are not available.
viii. Arithmetic mean may not be one of the values which the variable
actually takes and is termed as fictitious mean
Mean For Combined Data
• If $\bar{X}_1$ is the mean of $n_1$ observations and
$\bar{X}_2$ is the mean of $n_2$ observations,
the combined mean is given by
\[ \bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2} \]
Example
• The mean height of 25 male workers in a
factory is 61 inches and the mean height of 35
female workers in the same factory is 58
inches. Find the combined mean height of the 60
workers.
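The slide leaves this example as an exercise. A minimal Python sketch of the combined-mean calculation, using a hypothetical helper named `combined_mean`, could look like this:

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two series: (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# 25 male workers with mean height 61 in, 35 female workers with mean height 58 in.
result = combined_mean(25, 61, 35, 58)
print(result)  # 59.25
```

The result lies closer to 58 than to 61 because the larger group carries more weight.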
Median
Median is that value of the variable which
divides the group in two equal parts, one
part comprising all the values greater and
the other, all the values less than the
median.
Median is only a positional average, i.e., its
value depends on the position occupied by
a value in the frequency distribution.
Calculation of Median
Case I: Ungrouped data: If the number of observations is odd,
then the median is the middle value after the observations
have been arranged in ascending or descending order of
magnitude. If the number of observations is even, the median
is taken as the arithmetic mean of the two middle values.
Case II: Discrete distribution: In the case of a frequency
distribution where the variable takes the values $X_1, X_2, \dots, X_n$
with respective frequencies $f_1, f_2, \dots, f_n$ and ∑f = N, the total
frequency, the median is the size of the (N+1)/2-th item or
observation. In this case the use of the cumulative frequency
(c.f.) distribution facilitates the calculations.
EXAMPLE 4
MARKS OF 10
STUDENTS ARE
4 7 6 8 9 4 3 2 7 8
IN ORDER 2 3 4 4 6 7 7 8 8 9
MEDIAN 6.5
MARKS OF 11
STUDENTS ARE
4 7 6 8 9 4 3 2 7 8 4
IN ORDER 2 3 4 4 4 6 7 7 8 8 9
MEDIAN 6
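Both halves of Example 4 can be checked with a small function. The sketch below assumes the usual convention, which the 6.5 result above also follows, of averaging the two middle values when the count is even; the function name is illustrative.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

marks_10 = [4, 7, 6, 8, 9, 4, 3, 2, 7, 8]
marks_11 = [4, 7, 6, 8, 9, 4, 3, 2, 7, 8, 4]
print(median(marks_10))  # 6.5
print(median(marks_11))  # 6
```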
8 COINS ARE TOSSED AND THE NUMBER OF HEADS IS NOTED
THE EXPERIMENT IS REPEATED 256 TIMES
# HEADS FREQUENCY
X f CF Xf
0 1 1 0
1 9 10 9
2 26 36 52
3 59 95 177
4 72 167 288
5 52 219 260
6 29 248 174
7 7 255 49
8 1 256 8
TOTAL 256 1017
N/2 128
MEDIAN 4
MEAN 3.972656
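The cumulative-frequency lookup for a discrete distribution can be sketched in Python as follows. The function name `discrete_median` is an assumption for illustration, and the sketch simply returns the first value whose cumulative frequency reaches the (N+1)/2-th item.

```python
from itertools import accumulate

heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]          # number of heads X
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]      # frequency f of each X

def discrete_median(xs, fs):
    """Value whose cumulative frequency first reaches the (N + 1) / 2-th item."""
    n_total = sum(fs)
    position = (n_total + 1) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= position:
            return x
    raise ValueError("empty distribution")

coin_median = discrete_median(heads, freq)
coin_mean = sum(x * f for x, f in zip(heads, freq)) / sum(freq)
print(coin_median)  # 4
```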
• Case III: Continuous distribution:
Compute the cumulative frequency (cf)
Find N/2
See the cf just greater than N/2
The corresponding class contains the median value and is
called the median class
\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right) \]
where l is the lower limit of the median class
f is the frequency of the median class
h is the magnitude (width) of the median class
N is the total frequency
C is the cf of the class preceding the median class
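The slides do not work a continuous-median example, so the sketch below applies the steps above to the class limits and frequencies of Example 3 purely as an illustration. The function name `grouped_median` and the `(lower, upper, frequency)` layout are assumptions.

```python
def grouped_median(classes):
    """Median = l + (h / f) * (N/2 - C) for (lower, upper, frequency) classes."""
    n_total = sum(f for _, _, f in classes)
    half = n_total / 2
    cumulative = 0  # C: cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f > half:       # first cf just greater than N/2
            h = upper - lower           # magnitude (width) of the median class
            return lower + (h / f) * (half - cumulative)
        cumulative += f
    raise ValueError("no median class found")

# Class limits and frequencies taken from Example 3.
articles = [(80, 100, 7), (100, 120, 50), (120, 140, 80), (140, 160, 60), (160, 180, 3)]
print(grouped_median(articles))  # 130.75
```

Here N/2 = 100, the median class is 120-140 (cf 137), and C = 57, so the median is 120 + (20/80)(100 − 57) = 130.75.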
Merits and Demerits of Median
Merits:
i. It is rigidly defined.
ii. It is easy to understand and calculate, even for a non-medical
person.
iii. It is not affected by extreme observations and as such is very
useful in the case of skewed distributions.
iv. It can be computed when dealing with a distribution with open
end classes.
v. It can sometimes be located by simple inspection and can
also be computed graphically.
vi. It is the only average to be used while dealing with qualitative
characteristics which cannot be measured quantitatively but
can still be arranged in ascending or descending order of
magnitude.
Demerits:
i. In the case of an even number of observations of
ungrouped data it cannot be determined
exactly.
ii. It is not based on each and every item of the
distribution.
iii. It is not suitable for further mathematical
treatment.
iv. It is relatively less stable than the mean, particularly
for small samples.
Quartile
The values which divide the given data into four
equal parts are known as quartiles. Therefore,
there will be only three such points $Q_1$, $Q_2$ and $Q_3$,
such that $Q_1 \le Q_2 \le Q_3$, termed the three quartiles.
$Q_1$, known as the lower or first quartile, is the value
which has 25% of the items of the distribution
below it and consequently 75% of the items
greater than it. $Q_2$, the second quartile, coincides
with the median and has an equal number of
observations above and below it. $Q_3$, the upper or third
quartile, has 75% of the observations below it and
consequently 25% of the observations above it.
\[ Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right) \]
\[ Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right) \]
Here l, h, f and C are defined as for the median, but refer to the
class containing the quartile.
Percentile
Percentiles are the values which divide the
series into 100 equal parts. So, there are 99
percentiles $P_1, P_2, \dots, P_{99}$ such that
$P_1 \le P_2 \le \dots \le P_{99}$.
The i-th percentile value is:
\[ P_i = l + \frac{h}{f}\left(\frac{iN}{100} - C\right) \]
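The quartile and percentile formulas share one shape: only the fraction of N changes (1/4, 3/4, or i/100). The Python sketch below exploits that with a single hypothetical helper, `partition_value`, and applies it to the Example 3 classes as an illustration; the slides themselves give no worked quartile or percentile for grouped data.

```python
def partition_value(classes, fraction):
    """l + (h / f) * (fraction * N - C); fraction is 1/4 for Q1, i/100 for P_i."""
    n_total = sum(f for _, _, f in classes)
    target = fraction * n_total
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f > target:     # first cf just greater than the target
            return lower + ((upper - lower) / f) * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must be below 1")

# Class limits and frequencies taken from Example 3.
articles = [(80, 100, 7), (100, 120, 50), (120, 140, 80), (140, 160, 60), (160, 180, 3)]
q1 = partition_value(articles, 1 / 4)
q3 = partition_value(articles, 3 / 4)
p50 = partition_value(articles, 50 / 100)   # 50th percentile, same as the median
print(round(q1, 2), round(q3, 2), round(p50, 2))
```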
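MODE
• Mode is the value which has the
greatest frequency density
• Mode for a continuous distribution is
given by
\[ \text{Mode} = l + \frac{h\,(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)} \]
where l is the lower limit and h the magnitude of the modal class,
$f_1$ is the frequency of the modal class, and $f_0$ and $f_2$ are the
frequencies of the classes preceding and succeeding it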
EXAMPLE 7
CLASS f x xf
10-20 4 15 60
20-30 6 25 150
30-40 5 35 175
40-50 10 45 450
50-60 20 55 1100
60-70 22 65 1430
70-80 21 75 1575
80-90 6 85 510
90-100 2 95 190
100-110 1 105 105
TOTAL 97 5745
MEAN 59.2268
f1=22 f0=20 f2=21 l=60 h=10
MODE 66.6666667
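The mode calculation of Example 7 can be sketched in Python as below. The function name `grouped_mode` is illustrative, and the sketch assumes equal class widths, so that the class with the greatest frequency is also the one with the greatest frequency density.

```python
def grouped_mode(classes):
    """Mode = l + h * (f1 - f0) / ((f1 - f0) - (f2 - f1)), equal class widths assumed."""
    fs = [f for _, _, f in classes]
    k = fs.index(max(fs))                      # modal class: greatest frequency
    lower, upper, f1 = classes[k]
    f0 = fs[k - 1] if k > 0 else 0             # frequency of the preceding class
    f2 = fs[k + 1] if k + 1 < len(fs) else 0   # frequency of the succeeding class
    h = upper - lower
    return lower + h * (f1 - f0) / ((f1 - f0) - (f2 - f1))

# (lower limit, upper limit, frequency) from Example 7.
example7 = [(10, 20, 4), (20, 30, 6), (30, 40, 5), (40, 50, 10), (50, 60, 20),
            (60, 70, 22), (70, 80, 21), (80, 90, 6), (90, 100, 2), (100, 110, 1)]
mode = grouped_mode(example7)
print(round(mode, 4))
```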
Measures of Dispersion
Range
Quartile deviation
Mean Deviation
Variance
Standard deviation
RANGE
\[ \text{Range} = X_{\max} - X_{\min} \]
Range is the difference between the two extreme
observations of the distribution
OR
It is the difference between the greatest (maximum) and the
smallest (minimum) observation of the distribution.
It is the simplest but a crude measure of dispersion. It is
rigidly defined, readily comprehensible and the easiest to
compute, requiring very little calculation.
EXAMPLE
MARKS OF STUDENTS
ROLL NO. MARKS SORTED
123 98 52
125 95 56
126 96 56
127 87 66
128 56 78
134 52 87
135 89 89
136 78 95
137 56 96
138 66 98
RANGE 98 − 52 = 46
RANGE
Merits and Demerits of Range
It is not based on the entire set of data.
Its value varies very widely from sample to
sample.
If $X_{\max}$ and $X_{\min}$ remain unaltered and all the
other values are replaced by another set of observations
lying between them, the range of the distribution remains the same.
It cannot be used when dealing with open end
classes.
It is not suitable for mathematical treatment.
It is very sensitive to the size of the sample.
It is too indefinite to be used as a practical
measure of dispersion.
QUARTILE DEVIATION
\[ \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} \]
It is a measure of dispersion based on the upper quartile
$Q_3$ and the lower quartile $Q_1$.
Interquartile Range = $Q_3 - Q_1$
Quartile Deviation is obtained from the interquartile range
on dividing by 2.
Merits and Demerits of Quartile Deviation
Merits:
It is quite easy to understand and calculate.
It makes use of 50% of the data and as such is a
better measure than the range.
As it ignores 25% of the data from the beginning and
25% from the top end, it is not affected at all by
extreme observations.
It can be computed from a frequency
distribution with open end classes.
Demerits:
It is not based on all observations.
It is affected considerably by
fluctuations of sampling.
It is not suitable for further
mathematical treatment.
EXAMPLE
DISTRIBUTION OF MONTHLY EARNING
MONTH EARNING
1 10239
2 10250
3 10251
4 10251
5 10257
6 10258
7 10260
8 10261
9 10262
10 10262
11 10273
12 10275
Q1 10251
Q3 10262
QUARTILE DEVIATION 5.5
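The quartile deviation in this example can be reproduced with Python's standard `statistics.quantiles`. Several quartile conventions exist and the slides do not say which one they use; the sketch below relies on the library default (the "exclusive" method), which happens to give the same Q1 and Q3 for this data.

```python
import statistics

# Monthly earnings from the example, already in ascending order.
earnings = [10239, 10250, 10251, 10251, 10257, 10258,
            10260, 10261, 10262, 10262, 10273, 10275]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(earnings, n=4)
interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(q1, q3, quartile_deviation)
```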
MEAN DEVIATION
Average or mean deviation is the average amount of scatter
of the items in a distribution from either the mean or the
median, ignoring the signs of the deviations. The average that is
taken of the scatter is an arithmetic mean, which accounts for
the fact that this measure is often called the mean deviation.
For ungrouped data
\[ \text{Mean Deviation} = \frac{1}{n}\sum_{i} \left| X_i - \bar{X} \right| \]
For grouped data
\[ \text{Mean Deviation} = \frac{1}{N}\sum_{i} f_i \left| X_i - \bar{X} \right| \]
EXAMPLE
DISTRIBUTION OF SERIES OF DAILY RENTS
HOUSE RENT |X − MEAN|
1 3000 1819
2 3000 1819
3 3000 1819
4 3750 1069
5 4000 819.4
6 4000 819.4
7 4000 819.4
8 4500 319.4
9 4750 69.44
10 5000 180.6
11 5000 180.6
12 5000 180.6
13 5250 430.6
14 5250 430.6
15 5500 680.6
16 6250 1431
17 6500 1681
18 9000 4181
TOTAL 86750 18750
MEAN 4819.4
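The table stops at the total of the absolute deviations; dividing that total by the number of houses gives the mean deviation. A minimal Python sketch of the whole calculation, with an illustrative function name, follows.

```python
rents = [3000, 3000, 3000, 3750, 4000, 4000, 4000, 4500, 4750,
         5000, 5000, 5000, 5250, 5250, 5500, 6250, 6500, 9000]

def mean_deviation(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

md = mean_deviation(rents)
print(round(md, 2))  # 1041.67
```

This is 18750 / 18, the table's total absolute deviation divided by the 18 observations.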
EXAMPLE
DISTRIBUTION OF HEIGHTS OF STUDENTS
HEIGHT # OF STUDENTS
X f fX f|X − MEAN|
158 15 2370 49.1667
159 20 3180 45.5556
160 32 5120 40.8889
161 35 5635 9.72222
162 33 5346 23.8333
163 21 3423 36.1667
164 10 1640 27.2222
165 8 1320 29.7778
166 6 996 28.3333
TOTAL 180 29030 290.667
MEAN 161.278
MD 1.61481
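For grouped data each absolute deviation is weighted by its frequency. The sketch below reproduces the height example; the dictionary layout and the name `grouped_mean_deviation` are assumptions for illustration.

```python
# Height X mapped to the number of students f.
heights = {158: 15, 159: 20, 160: 32, 161: 35, 162: 33,
           163: 21, 164: 10, 165: 8, 166: 6}

def grouped_mean_deviation(freq):
    """(1/N) * sum(f * |X - mean|) for a frequency distribution."""
    n_total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n_total
    return sum(f * abs(x - mean) for x, f in freq.items()) / n_total

md_heights = grouped_mean_deviation(heights)
print(round(md_heights, 5))
```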
STANDARD DEVIATION
• It is defined as the positive square root of the
mean of the squares of the deviations of the given
observations from their mean
For ungrouped data
\[ \text{Standard Deviation} = \sigma = \sqrt{\frac{1}{n}\sum_{i} \left( X_i - \bar{X} \right)^2} \]
For grouped data
\[ \text{Standard Deviation} = \sigma = \sqrt{\frac{1}{N}\sum_{i} f_i \left( X_i - \bar{X} \right)^2} \]
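VARIANCE
It is the square of the standard deviation and is denoted
by $\sigma^2$
For ungrouped data
\[ \text{Variance} = \sigma^2 = \frac{1}{n}\sum_{i} \left( X_i - \bar{X} \right)^2 \]
For grouped data
\[ \text{Variance} = \sigma^2 = \frac{1}{N}\sum_{i} f_i \left( X_i - \bar{X} \right)^2 \]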
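PROPERTIES OF STANDARD
DEVIATION
PROPERTY – 1
Standard deviation is independent of change of origin but not of scale
PROPERTY – 2
Standard deviation is the minimum value of the root mean square deviation,
which is least when the deviations are taken about the mean
PROPERTY – 3
Standard deviation is suitable for further mathematical treatment
PROPERTY – 4
SD ≤ Range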
MERITS AND DEMERITS OF SD
• Is the most important and widely used
measure of dispersion
• It is based on all the observations
• The squaring of the deviations removes the
drawback of ignoring the signs of deviations
in computing the mean deviation
• It is affected least by fluctuations of
sampling
EXAMPLE
X (X − MEAN)^2
12 13.69
15 0.49
24 68.89
12 13.69
13 7.29
15 0.49
14 2.89
12 13.69
16 0.09
24 68.89
TOTAL 157 190.1
MEAN 15.7
VARIANCE 19.01
SD 4.36
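The variance and standard deviation in this example divide by n, not n − 1, so they are population measures. A short Python sketch under that assumption is shown below; the function name is illustrative.

```python
import math

data = [12, 15, 24, 12, 13, 15, 14, 12, 16, 24]

def variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

var = variance(data)
sd = math.sqrt(var)   # standard deviation is the positive square root
print(round(var, 2), round(sd, 2))  # 19.01 4.36
```

The standard library's `statistics.pvariance` and `statistics.pstdev` use the same divide-by-n definition.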
EXAMPLE
# LETTERS IN WORD FREQUENCY
X f fX d = |X − MEAN| fd^2
1 3 3 3.277 32.208
2 8 16 2.277 41.463
3 9 27 1.277 14.667
4 10 40 0.277 0.765
5 5 25 0.723 2.617
6 4 24 1.723 11.880
7 3 21 2.723 22.251
8 1 8 3.723 13.864
9 3 27 4.723 66.932
10 1 10 5.723 32.757
TOTAL 47 201 239.404
MEAN 4.277
VARIANCE 5.094
EXAMPLE
CLASS f BOUNDARIES x xf d^2 fd^2
30-39 1 29.5-39.5 34.5 34.5 1128.96 1128.96
40-49 4 39.5-49.5 44.5 178 556.96 2227.84
50-59 14 49.5-59.5 54.5 763 184.96 2589.44
60-69 20 59.5-69.5 64.5 1290 12.96 259.2
70-79 22 69.5-79.5 74.5 1639 40.96 901.12
80-89 12 79.5-89.5 84.5 1014 268.96 3227.52
90-99 2 89.5-99.5 94.5 189 696.96 1393.92
TOTAL 75 5107.5 11728
MEAN 68.1
VARIANCE 156
SD 12.5
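The grouped variance in this table can be recomputed from the class limits and frequencies alone. The sketch below is illustrative (names and data layout are assumptions); note that the slide rounds the variance to 156 and the SD to 12.5.

```python
import math

# (lower limit, upper limit, frequency). The mid value is the same whether it
# is computed from the limits 30-39 or from the boundaries 29.5-39.5.
classes = [(30, 39, 1), (40, 49, 4), (50, 59, 14), (60, 69, 20),
           (70, 79, 22), (80, 89, 12), (90, 99, 2)]

def grouped_variance(rows):
    """(1/N) * sum(f * (x - mean)^2), with x the mid value of each class."""
    n_total = sum(f for _, _, f in rows)
    mids = [((lo + hi) / 2, f) for lo, hi, f in rows]
    mean = sum(x * f for x, f in mids) / n_total
    return sum(f * (x - mean) ** 2 for x, f in mids) / n_total

grouped_var = grouped_variance(classes)
grouped_sd = math.sqrt(grouped_var)
print(round(grouped_var, 2), round(grouped_sd, 2))
```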
CORRELATION
When the relationship is of a quantitative
nature, the appropriate statistical tool for
discovering and measuring the relationship
and expressing it in a brief formula is known
as correlation.
It is defined as an analysis of the covariation
between two or more variables.
Types of Correlation
a) Positive and negative correlation
b) Linear and nonlinear correlation
METHODS OF STUDYING
CORRELATION
1. Scatter diagram
2. Karl Pearson’s coefficient of
correlation
3. Bivariate correlation method
4. Rank correlation
Scatter Diagram
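Karl Pearson’s Coefficient of
Correlation
• It is a numerical measure of the linear
relationship between two variables X and Y and is
defined as the ratio of the covariance
between X and Y to the product of their
standard deviations
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]
\[ r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\;\sqrt{\frac{1}{n}\sum (y - \bar{y})^2}} \]
\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]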
EXAMPLE
ADVERTISING EXPENSES (x) AND SALES (y)
x y dx=x-mx dy=y-my dx^2 dy^2 dxdy
39 47 -26 -19 676 361 494
65 53 0 -13 0 169 0
62 58 -3 -8 9 64 24
90 86 25 20 625 400 500
82 62 17 -4 289 16 -68
75 68 10 2 100 4 20
25 60 -40 -6 1600 36 240
98 91 33 25 1089 625 825
36 51 -29 -15 841 225 435
78 84 13 18 169 324 234
TOTAL 650 660 0 0 5398 2224 2704
mx= 65
my= 66
r= 0.78
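The correlation coefficient of this example can be recomputed from the deviation sums. The Python sketch below uses the deviation form of Pearson's formula; the function name `pearson_r` is an illustrative choice.

```python
import math

x = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]   # advertising expenses
y = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]   # sales

def pearson_r(xs, ys):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations taken from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(round(r, 2))  # 0.78
```

The 1/n factors of the covariance and the two standard deviations cancel, which is why only the three sums (2704, 5398 and 2224 in the table) are needed.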