# Basic Statistics

Measures of Central Tendency Variable A variable is a quantity which can vary from one value to another. The common examples of variables are barometer readings, temperature, rainfall records, wages, heights etc. Variables are of two kinds. (i) Continuous variables: The quantities which can assume any numerical value within a certain range are called continuous variables. For example, if we consider the height H of a child at various ages, we observe that as the child grows from 110 cm to 160 cm (say), his height takes all possible values within this range. Thus, the height H of a child at various ages is an example of a continuous variable. (ii) Discrete (or discontinuous) variables: The quantities which cannot assume all possible values are called discrete variables. For example, the number of children x in a family can take any of the values 0, 1, 2, 3 , ..., but cannot be 2.1, 2.46, 3.783. Thus, the number of children x is an example of a discrete variable. statistics paper which are arranged according to their roll numbers, the maximum marks allotted to the paper being 100: 17, 71, 70, 14, 22, 00, 55, 59, 23, 87, 93, 21, 22, 50, 87, 54, 70, 52, 87, 74, 63, 87, 28, 04, 17, 49, 85, 81, 76, 52, 21, 86, 50, 87, 26, 87, 50, 60, 32, 40, 30, 90, 27, 89, 81, 21, 14, 30, 37, 22. The data given above is in raw form, i.e. we cannot conclude anything. Such type of data is called Ungrouped data. When the data is arranged in ascending or descending order of magnitude, the data is said to be arranged in an array. Suppose these data is arranged in the intervals 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90100. This is done by the method called ‘tally method.’ In this method, we prepare a frequency table as follows:

Constant: A quantity is a constant if it can assume only one value.

Frequency Distribution A frequency distribution is defined when the following two information are specified: (i) (ii) The value which the variable takes, and The number of times (i.e. frequencies) a value is taken by a variable.

Consider the marks obtained by 50 students in a

In the above example, 'marks' is the variable x and the 'number of students' against the marks is called frequency f. Thus, the frequency of the class 50-60 is 8. The value of the variable which is mid-way between the upper and lower limits is called mid-

.. 1 ..

Median 5. we assume any value of the variable as ‘assumed mean’ and then find the
1 actual mean using X = A + h N
∑ fi ui where A:
xi − A h
Assumed mean. is given by
Marks above 90 above 80 above 70 above 60 above 50 above 40 above 30 above 20 above 10 above 0
X 1 + X 2 + X 3 + . Cumulative frequency 'more than' Number of students 2 13 18 20 28 30 34 44 48 50 Cumulative frequency 'less than' Marks under 20 under 30 under 40 under 50 under 60 under 70 under 80 under 90 under 100 Number of students 6 16 20 22 30 32 37 48 50
Measures of Central Tendency The following are the five measures of central tendency. Arithmetic mean 2. Let W . each item of the series is considered equally important but there may be cases where all items may not have equal importance and some of them may be comparatively more important than others.. 1. Mode Arithmetic Mean (AM) If X1. 95. X 2.. Geometric mean 3. . we employ a welldefined short-cut method which is explained in the example below. .. . the following two forms of the frequency distribution may also be used. For the table. X n respectively.
. . . The cumulative frequency corresponding to a class is the total of all the frequencies upto and including that class. . In the above example.. Xn are the given n observations. to calculate AM.. X 3 .. h: Class size and ui =
. proper weightage is to be given to various items — the weights attached to each item being proportional to the importance of the item in the distribution.+ X n ∑ X = n n Weighted Arithmetic Mean In the calculation of simple average.. In this method. In the above example.. usually denoted by X . then their AM. 25. The cumulative frequencies are shown in the last column.. the mid-values of the classes are respectively 5. W . Harmonic mean 4.. W .point (or mid-value) of the class. . .. 2 . Then the weighted arithmetic mean.. 15. ... the cumulative frequency of the class 0-10 is 2 (since there is no class above it). X 3 .. X 2 .. In such cases. usually denoted
by X
W
is given by X W =
∑ Wi Xi
i=1 n
n
∑ Wi
i=1
Sometimes.. W be the weights attached to X=
1 2 3 n
variable values X 1 .

Step 1: Find the cumulative frequencies (c. h = 20
fi 7 14 19 25 20 10 5 xi . 891. Step 1: Obtain the frequency distribution. Y 4 = 6.
greater than For grouped or continuous frequency distribution. 3 .
. then median is 2 n the AM of the values of 2 observations. h = 20.. respectively.. n Step 3: If n is odd. . X2 = –5.. calculation of multiplying two inconvenient numbers is avoided. (using (4) and (e))
44 = 900 + 20 − = 891.
Median The median is that value of the variable which divides the group in to two equal parts. mean wage = Rs.) just
N and determine the corresponding 2 value of the variable. Calculation of Median For individual observations Step 1: Arrange the observations x1.f.f.2.A ui = ( x1 − A) /h −100 −5 − 80 − 40 0 20 80 100 −4 −2 0 1 4 5 fiui − 35 − 56 − 38 0 20 40 25
For discrete frequency distribution. Wage (Amount in Rs.) 800 820 860 900 920 980 1000 Number of workers 7 14 19 25 20 10 5 Sol. Y 2 = –8. Calculate (1) ΣX (2) ΣY (3) ΣXY (5) ΣY2 (4) ΣX2 (6) (ΣX)(ΣY) (7) ΣXY2 (8) Σ(X + Y)(X – Y) Sol. 1 Two variables X and Y. xn in ascending or descending order of magnitude.. assume the values X1 = 2.
xi 800 820 860 900 920 980 1000
n + 1 observation. x2. 2 Find the mean wage from the data given below. where N = 2
∑ fi
i =1
n
Step 3: See the cumulative frequency (c. (1) (2) (3) (4) (5) (6) (7) (8) ΣX = –7 ΣY = 5 ΣXY = 26 ΣX2 = 109 ΣY2 = 209 (ΣX)(ΣY) = –35 (using (1) and (2)) ΣXY2 = –190 Σ(X + Y)(X – Y) = Σ(X2 – Y2) = 109 – 209 = –100. If n is even. Step 2: Determine the total number of observations.Ex.) Step 2: Find
N . then median is the value of
Ex.. With the help of this method. One part comprises all the values greater than and the other part comprises all the values less than the median. Step 2: Prepare the cumulative frequency column and obtain N = Σ fi Find
∑fi = 100
Here A = 900. 2
. say.
∑fi ui = − 44
1 n ∴ Mean = X = A + h ∑ fi ui N i=1
N . which is the median. X3 = 4.
th
th
n and + 1 2
th
Let the assumed mean be A = 900. Y 3 = 10.2 100 Hence. X4 = –8 and Y1 = –3.

6. The 2 2
N is 2 26 and the corresponding class is 15-20. 5. Ex.5. 4 Find the mean.. f is frequency of the median class. 9.6. (2) Arranged in an array.
Note: The mean deviation from the median for any distribution is minimum. 2. the numbers are 48. 5. 5. median. L = lower limit of the f median class. 12.
cumulative frequency just greater than . The mode may or may not exist.5. 5. 50. 8. 5. (1) Arranged in an array. 2 Mode = Number most frequently occurring = 5.Step 3: See the cumulative frequency just greater than
N and determine the corresponding class. 48.
of two middle numbers =
Here N = 49 ⇒
N 49 = = 24. Mean = 49. the numbers are 2.9 Sol. Ex. 6. 4. F = cumulative frequency of the class preceding the median class. Median = middle number = 49. 9 has two modes. F = 11. 10. 8 and 9. 48.9. 5. Mode = non existent. and mode for the sets (1) 3.
Mode Mode is the value which occurs most frequently in a set of observations. 49. 4. 4 . 7. 5. 15.7. 6 and (2) 51. 7. 3 Calculate the median for the following distribution: Class: 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 Frequency: 5 6 15 10 5 4 2 2 Sol. Step 4: Use the formula: Median
Thus 15-20 is the median class such that l = 15. 7. 5. 3. 9. h = width (size) of the median class.
N −F 24.7. 12. 50. 9.5.5 × 5 = 19. f = 15. The set 2. 5. where. Examples The set 2. and even if it does exist. Mean = 5.3. 9. it may not be unique.. 4. Median = Arithmetic mean
Class 5 − 10 10 − 15 15 − 20 20 − 25 25 − 30 30 − 35 35 − 40 40 − 45
Freqency Cumulative frequency 5 6 15 10 5 4 2 2 N = 49 5 11 26 36 41 45 47 49
1 (5 + 5) = 5. 16 has no mode. 2 This class is known as the median class. 5.3 and 51.8. 7. 10. 2. 11. N = Σ fi.5 15
N/ 2 − F =L+ × h . 8. 4 and 7.6. 6. 49. 10. 2. A distribution having a unique mode is called ‘unimodal’ and one having more than one is called ‘multimodal’. 3. The set 3.5 − 11 ∴ Median = l + 2 × h = 15 + ×5 f 15
= 15 + 13.1.
. 48.5. h = 5. and is called bimodal. 2. 18 has mode 9.

. Shares 3. we get 3.. mode is the value of the variable corresponding to the maximum frequency. 18 Sol. Uses 1. 7. 6. 5 (2) 9. 8. 3. To counter this. 10. Mode = I +
Relationship Between Mean (M). Mean = Median = Mode In case of a ‘moderately’ asymmetrical distribution. Median (Md) and Mode(Mo) In case of symmetrical distribution. Quartile deviation 5. 3. 9. After arranging the terms of (1). 12. Let Xmax be the greatest observation and Xmin the smallest observation of the variable. f1= frequency of the modal class f0= Frequency of class preceding the modal class f2= Frequency of class succeeding the modal class The above formula can be rephrased as Mode
D1 = L + D + D c . Range = Xmax–Xmin. After arranging the terms of (2). In case of continuous frequency distribution. 9. 15. Mode = 3 Median – 2 Mean Measures of Dispersion The various measures of dispersion are as follows 1.42 = 25. 9. Standard deviation 4. 57
= 21 +
. 8. Then.Calculation of Mode In case of frequency distribution.
h( f1 − f0 ) . the class corresponding to the maximum frequency is called the modal class and the value of mode is obtained as.
(M0 ) = l +
= 21 +
f − f−1 ×i 2f − f−1 − f1
72 − 36 ×7 144 − 36 − 51 252 = 21 + 4. Therefore.
Ex.e. 8. 5 Compute the mode of the following distribution: Class intervals: 0-7 7-14 14-21 21-28 28-35 35-42 42-49 Frequency : 19 25 36 72 51 43 28 Mode
Sol. Weather forecast Ex. 18. the class containing the mode) D1 = Excess of modal frequency over frequency of next lower class D2 = Excess of modal frequency over frequency of next higher class c = Size of the model class interval. semi-interquartile range and 10 – 90 percentile range were designed to improve on the deficiencies of the range. 7. range = 15. 1 2 1 where L1 = Lower class boundary of the modal class (i.. the range is same but the variation is greater in the first series than the second series. 10 – 90 percentile range Range It is the difference between two extreme observations of a distribution. 18. like in this example. 6. 9. 15. Therefore. 8. 10. 8. 5 . range = 15. 18. 9. 5. 8. Quality control 2.42 . we get 3. Range 2. Mean deviation 3. ( f1 − f0 ) − ( f2 − f1 )
where l = Lower limit of modal class h = Magnitude of the modal class. 6 Find the range of the sets (1) 12. Note that range is not a very good measure of dispersion because.

93 − 62.62 63 . The number of students in the range X ± 2 MD is
62.74 Number of students 5 18 42 27 8 Total = 100
0.93 to 71.5 − 74.45 ± 3(2. 7 Determine the percentage of the students heights in the following table of mean deviation of the heights of 100 male students.65 . (2) The range from 62.5 5 + 18 + 42 + 27 + 5– 3 74.31 1.45| (f) 32... (3) The number of students in the range
X ± 3 MD is
60.65 66 .
Sol.Mean Deviation (MD) If X1.74
Class |X – X| = Frequency f|X – X| M ark (X) |X – 76.71 – 68. mean deviation about A is given by 1 1 Σ fi | Xi − A | = Σ fi | di | = | X − X | .5 3 – 65.26) = 67. then the mean deviation (MD) about A. which is 97% of 3 the total.45 62..68 69 . 3 3 which is 55% of total.67 − 59.5 8 ≈ 86. X2.1 64 18 3.68 .19 in.9 67 42 0.45 68. the upper class boundary of the second class is 65.97 − 71. The number of students in the range
Ex.5) of the students in the fourth 3 class (since the class-interval size is 3 inches.23 8 ≈ 97.21 (18) + (27) ≈ 55.
.5 18 + 42 + 27 + 18 – 3 71. which is 86% of 3 the total. Xn are n given observations.71 inches is
X ± MD = 67.
Height (in) 60 63 66 69 72 .4 73 8 5.71 72 . X3 .71 . to 69.97 inches is
X ± MD is 42 +
(2) X ± 2 MD
X ± 2 MD = 67. 6 .26.5 inches). In case of frequency distribution. n n
+
1 (69.78. and the lower class boundary of the fourth class is 68. where d = X – A.5 inches.45 18.85 70 27 2. i i n n Modulus removes the effects of negative deviations and therefore gives us the absolute value of the deviation.. is given by MD =
The range from 65.45 ± 6.19) of the students in the second class
1 1 Σ | Xi − A | = Σ | di | .62 .50
.55 N = Σf = 100 Σf |X – X| = 226. that fall within the ranges (1) X ± MD (3) X ± 3 MD
Height (inches) 60 .25 61 5 6.55 44. This range includes
all the individuals in the third class +
1 (65.45 ± 2.

if multiplied by k. the minimum is that for which a = X .. σ does not change.
2 variance = σ =
=
X2 − X2
The root mean square deviation about any point a is
defined as S =
∑(X
i =1
N
i
− a)2
. But. Sample variance means variance of a sample drawn out from a population. Therefore. if S=
each observation is transformed to transforms to
aσ . 7 . This is so because the sum of the squares of the deviation of a set of numbers Xi from any number a is minimum if and only if a = X .Generally. This means that if each of the observations are increased by a constant k.. where a is the
N
average besides the arithmetic mean. σ c
.Standard Deviation If X1.. This property provides an important reason for defining the standard deviation as above. XN is the set of N observations. X ax i + b . .. Therefore. Note that root mean square deviation about origin is also called the quadratic mean and is given by
1 ∑ x i2 . X2. N Also note that σ is independent of origin but not of scale.
. σ changes to kσ. s 2 N represents sample variance and σ2 represents population variance. then its standard deviation is given by
Variance It is the square of the standard deviation. Of all such deviations.
1 σ= Σ ( Xi − X)2 where X is the AM alternative N formula for standard deviations is
σ=
2 Σfi X i Σf X − i i N N 2
1 Σ ( Xi − X)2 .