Chapter 3: Measures of Central Tendency and Dispersion
Raw data are the raw material that has to be converted into a finished product (information). In a voluminous database of raw data, it is impossible to see any pattern unless the data are converted into information by data reduction. The reduction can be achieved by summary measures, which are concise and yet give a reasonably accurate view of the original data. This chapter covers the important summary measures of central tendency and dispersion (variation).
1) What is Central Tendency?
Whenever you measure things of the same kind, a fairly large number of such measurements will tend to cluster around the middle value. The question that arises is: "Is it possible to define one typical, representative average such that the remaining items in the data set cluster around it, that is, tend to be close to this value?" Such a value is called a measure of "central tendency". Other terms used synonymously are "measures of location" and "statistical averages".
2) Measures of Central Tendency
Quantitative specialists, statisticians, and information analysts rely heavily on summary measures when a large mass of data has to be analyzed to help decision-makers. As a manager, you need these summary measures of central tendency to draw meaningful conclusions in your functional area of operation. The most widely used measures of central tendency are the arithmetic mean, the median, and the mode.
Arithmetic Mean
The arithmetic mean (usually just called the mean) is the most common measure of central tendency used by managers in their sphere of activities. It is defined as the sum of all observations in a data set divided by the total number of observations. For example, consider a data set containing the following observations: 4, 3, 6, 5, 3, 3. The arithmetic mean = (4+3+6+5+3+3)/6 = 24/6 = 4. In symbolic form, the mean is given by
$\bar{X} = \dfrac{\sum X}{n}$
where $\bar{X}$ = arithmetic mean, $\sum X$ = sum of all X values in the data set, and n = total number of observations (sample size).
Arithmetic Mean for Raw Data Example
The inner diameters of a particular grade of tire, based on 5 sample measurements, are as follows (figures in millimeters): 565, 570, 572, 568, 585. Applying the formula $\bar{X} = \sum X / n$, we get mean = (565+570+572+568+585)/5 = 2860/5 = 572.
Caution: The arithmetic mean is affected by extreme values and by fluctuations in sampling. It is not the best average to use when the data set contains extreme values (very high or very low values).
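A minimal Python sketch of the mean formula, applied to the two data sets above. It assumes the observations are held in a plain list of numbers; `arithmetic_mean` is a hypothetical helper name, and the standard library's `statistics.mean` computes the same quantity.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: the sum of all observations divided by n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


observations = [4, 3, 6, 5, 3, 3]           # small example from the text
tire_diameters = [565, 570, 572, 568, 585]  # tire example, in millimeters

print(arithmetic_mean(observations))    # 4.0
print(arithmetic_mean(tire_diameters))  # 572.0
```

Because every observation enters the sum, a single very large or very small value shifts the result, which is the reason for the caution above.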
Median
The median is the middlemost observation when you arrange the data in ascending or descending order of magnitude; that is, the data are ranked and the middle value is picked. The median is such that 50% of the observations are above it and 50% of the observations are below it. It is a very useful measure for ranked data in the context of consumer preferences and ratings. It is not affected by extreme values, although its position in the ranked data depends on the number of observations.
Median = $\left(\dfrac{n+1}{2}\right)$th value of the ranked data, where n = number of observations in the sample.
Note: If the sample size is an odd number, the median is the ((n+1)/2)th value in the ranked data. If the sample size is even, the median lies between the two middle values; take the average of these two middle values.
Median for Raw Data Example - Odd Sample Size
Marks obtained by 7 students in a Computer Science exam are given below. Compute the median.
45 40 60 80 90 65 55
Arranging the data after ranking gives: 90 80 65 60 55 45 40
Median = ((n+1)/2)th value in this set = ((7+1)/2)th observation = 4th observation = 60
Hence median = 60 for this problem.
Median for Raw Data Example - Even Sample Size
The diameter of a shaft (in millimeters) in a manufacturing unit was measured for 10 samples. Calculate the median value.
Arranging the data in ascending order, you get:
2.43 2.45 2.46 2.50 2.55 2.56 2.58 2.60 2.65 2.66
The median falls between the 5th and 6th observations, that is, between 2.55 and 2.56. Hence median = (2.55+2.56)/2 = 2.555.
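The odd/even rule above can be sketched in Python as follows. `median` here is a hypothetical helper, not a library function; the standard library's `statistics.median` applies the same rule. Sorting in ascending order is an arbitrary choice, since descending order gives the same middle value.

```python
def median(values):
    """Return the middle value of the ranked data.

    Odd n: the ((n + 1) / 2)th ranked value.
    Even n: the average of the two middle ranked values.
    """
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)  # ascending order; descending gives the same median
    n = len(ranked)
    mid = n // 2             # 0-based index of the middle (or upper-middle) value
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


marks = [45, 40, 60, 80, 90, 65, 55]
shaft_diameters = [2.43, 2.45, 2.46, 2.50, 2.55, 2.56, 2.58, 2.60, 2.65, 2.66]

print(median(marks))                      # 60
print(round(median(shaft_diameters), 3))  # 2.555
```

The result is rounded for display because the decimal diameters are stored as binary floating-point numbers.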
Mode
The mode is the value that occurs most often; it has the maximum frequency of occurrence. The mode is not affected by extreme values. It is a very useful measure for inventory decisions, for example, deciding which shirt collar size is the most popular and should be kept in stock during the festival season. Another example where the mode is the only answer is in determining the most typical shoe size to be kept in stock in a shop selling shoes. The median and mean will not be helpful in this type of situation.
Caution: In a few real-life problems there will be more than one mode, as in bimodal and multi-modal data. In these cases the mode cannot be uniquely determined.
Mode for Raw Data Example
The life in number of hours of 10 flashlight batteries is as follows. Find the mode.
340 350 340 340 320 340 330 330 340 350
340 occurs five times, more often than any other value. Hence, mode = 340.
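A small Python sketch of finding the mode by counting frequencies. It returns a list so that the bimodal and multi-modal cases from the caution above are visible rather than hidden. `modes` is a hypothetical helper name. The standard library's `statistics.multimode` (Python 3.8+) is a close alternative that lists the modes in order of first appearance rather than sorted.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the maximum frequency, in ascending order.

    One element means the data are unimodal; two or more means bimodal or
    multi-modal data, where the mode is not uniquely determined.
    """
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


battery_life = [340, 350, 340, 340, 320, 340, 330, 330, 340, 350]  # hours
print(modes(battery_life))     # [340]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (hypothetical bimodal data)
```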
Mean for Grouped Data
The formula for the mean is given by
$\bar{X} = \dfrac{\sum fX}{n}$
where $\bar{X}$ = mean, $\sum fX$ = sum of the cross products of the frequency f of each class with the midpoint X of that class, and $n = \sum f$ = total number of observations (total frequency).
Mean for Grouped Data Example
Find the arithmetic mean for the following continuous frequency distribution:
Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2
Solution for the Example
Class   Midpoint X   Frequency f   fX
0-1     0.5          1             0.5
1-2     1.5          4             6.0
2-3     2.5          8             20.0
3-4     3.5          7             24.5
4-5     4.5          3             13.5
5-6     5.5          2             11.0
Totals               25            75.5
Applying the formula, mean = $\bar{X} = \sum fX / n$ = 75.5/25 = 3.02
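The grouped-mean calculation in the table can be sketched in Python like this. It assumes each class is given as a (lower, upper) pair and that the class midpoint stands in for every observation in the class. `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(classes, frequencies):
    """Mean of a frequency distribution: sum(f * X) / n, X = class midpoint, n = sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return sum_fx / n


classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]  # (lower, upper) limits
frequencies = [1, 4, 8, 7, 3, 2]

print(grouped_mean(classes, frequencies))  # 3.02
```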
Median for Grouped Data
The formula for the median is given by
$\text{Median} = L + \dfrac{(n/2) - m}{f} \times c$
where L = lower limit of the median class, $n = \sum f$ = total number of observations, m = cumulative frequency preceding the median class, f = frequency of the median class, and c = class interval of the median class.
Median for Grouped Data Example
Find the median for the following continuous frequency distribution:
Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2
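Solution for the Example
Class   Frequency   Cumulative Frequency
0-1     1           1
1-2     4           5
2-3     8           13
3-4     7           20
4-5     3           23
5-6     2           25
Total   25
Here n/2 = 25/2 = 12.5, so the median class is 2-3, the first class whose cumulative frequency (13) reaches 12.5. Thus L = 2, m = 5, f = 8, and c = 1. Substituting these values in the formula, we have
Median = 2 + ((25/2) - 5)/8 × 1 = 2 + 7.5/8 = 2.9375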
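Here is a Python sketch of the grouped-median formula. It walks the cumulative frequencies to locate the median class, the same way the table above does. It assumes classes are (lower, upper) pairs listed in ascending order. `grouped_median` is a hypothetical helper name.

```python
def grouped_median(classes, frequencies):
    """Median = L + ((n/2 - m) / f) * c for a continuous frequency distribution."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    m = 0  # cumulative frequency preceding the current class
    for (lower, upper), f in zip(classes, frequencies):
        if m + f >= half:
            # The first class whose cumulative frequency reaches n/2 is the median class
            return lower + (half - m) / f * (upper - lower)
        m += f
    raise ValueError("classes and frequencies must have the same length")


classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
frequencies = [1, 4, 8, 7, 3, 2]

print(grouped_median(classes, frequencies))  # 2.9375
```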
Mode for Grouped Data
$\text{Mode} = L + \dfrac{d_1}{d_1 + d_2} \times c$
where L = lower limit of the modal class, $d_1 = f_1 - f_0$, $d_2 = f_1 - f_2$, $f_1$ = frequency of the modal class, $f_0$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class succeeding the modal class, and c = class interval of the modal class.
Mode for Grouped Data Example
Find the mode for the following continuous frequency distribution:
Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2
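Solution for the Example
Class        0-1   1-2   2-3   3-4   4-5   5-6   Total
Frequency      1     4     8     7     3     2      25
The modal class is 2-3, which has the highest frequency (8). Using $\text{Mode} = L + \dfrac{d_1}{d_1 + d_2} \times c$:
L = 2
$d_1 = f_1 - f_0$ = 8 - 4 = 4
$d_2 = f_1 - f_2$ = 8 - 7 = 1
c = 1
Hence Mode = 2 + (4/(4+1)) × 1 = 2.8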
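A Python sketch of the grouped-mode formula. It picks the class with the highest frequency and applies $d_1$ and $d_2$ as defined above. Two assumptions are not stated in the text: when the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and ties go to the first class with the highest frequency. `grouped_mode` is a hypothetical helper name.

```python
def grouped_mode(classes, frequencies):
    """Mode = L + d1 / (d1 + d2) * c, with d1 = f1 - f0 and d2 = f1 - f2."""
    if not frequencies:
        raise ValueError("frequencies must be non-empty")
    # Modal class: highest frequency (the first one wins on ties)
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])
    lower, upper = classes[k]
    f1 = frequencies[k]
    # Assumption: a missing neighbour class (modal class at either end) counts as frequency 0
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("the formula is undefined when d1 + d2 == 0")
    return lower + d1 / (d1 + d2) * (upper - lower)


classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
frequencies = [1, 4, 8, 7, 3, 2]

print(round(grouped_mode(classes, frequencies), 4))  # 2.8
```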
Comparison of Mean, Median, Mode
Definition
Mean: the arithmetic average of all observations in the data set.
Median: the middle value in the data set arranged in ascending or descending order.
Mode: the most frequently occurring value in the distribution; it has the largest frequency.
Measurement
Mean: requires measurement on all observations.
Median: does not require measurement on all observations.
Mode: does not require measurement on all observations.
Uniqueness
Mean: uniquely and comprehensively defined.
Median: cannot be determined under all conditions.
Mode: not uniquely defined for multi-modal situations.
Comparison of Mean, Median, Mode (Cont.)
Effect of extreme values
Mean: affected by extreme values.
Median: not affected by extreme values.
Mode: not affected by extreme values.
Algebraic treatment
Mean: can be treated algebraically; that is, means of several groups can be combined.
Median: cannot be treated algebraically; that is, medians of several groups cannot be combined.
Mode: cannot be treated algebraically; that is, modes of several groups cannot be combined.
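To illustrate the "algebraic treatment" row, the Python sketch below combines group means using only each group's size and mean. It then shows that group medians give no such shortcut. The two groups are made-up numbers used only for this illustration, and `combined_mean` is a hypothetical helper. `statistics.mean` and `statistics.median` are standard-library functions.

```python
from statistics import mean, median


def combined_mean(summaries):
    """Combine groups from (size, mean) pairs: sum(n_i * mean_i) / sum(n_i)."""
    total_n = sum(n for n, _ in summaries)
    return sum(n * m for n, m in summaries) / total_n


# Hypothetical groups, made up purely for illustration
group_a = [1, 2, 3]
group_b = [4, 10, 20, 30]
pooled = group_a + group_b

summaries = [(len(group_a), mean(group_a)), (len(group_b), mean(group_b))]
print(combined_mean(summaries), mean(pooled))  # both equal 10

# No analogous rule for medians: group medians are 2 and 15, pooled median is 4
print(median(group_a), median(group_b), median(pooled))
```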
3) Measures of Dispersion
In simple terms, measures of dispersion indicate how large the spread of the distribution is around the central tendency. They answer the question "What is the magnitude of departure from the average value for different groups having identical averages?" It is important to study central tendency along with dispersion to throw light on the shape of the curve, that is, to gauge whether there is any distortion of the bell-shaped, symmetrical normal distribution curve that forms the foundation on which much of statistical inference is built.
Range
Range is the simplest of all measures of dispersion. It is calculated as the difference between the maximum and minimum values in the data set.
Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$
Example for Computing Range
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the range.
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9
Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$ = 18 - 9 = 9
Limitation of Range
Caution: Range is a good measure of spread only when a data set shows a stable pattern of variation without extreme values. If either component of the range (the maximum value or the minimum value) is an extreme value, the range should not be used.
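A Python sketch of the range, together with its limitation. The second call swaps in a hypothetical extreme value (45 in place of 18) that is not part of the original data, to show how strongly one extreme value moves the result. `data_range` is a hypothetical name, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = X_max - X_min (named data_range to avoid shadowing Python's built-in range)."""
    return max(values) - min(values)


returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]  # % return per annum
print(data_range(returns))  # 9

# Hypothetical extreme value: if 18 were 45 instead, the range would jump to 36
with_extreme = [45 if r == 18 else r for r in returns]
print(data_range(with_extreme))  # 36
```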
Interquartile Range
Range is entirely dependent on the maximum and minimum values in the data set and is highly misleading when one of them is an extreme value. To overcome this deficiency, you can resort to the interquartile range. It is computed as the range after eliminating the highest and lowest 25% of observations in a data set arranged in ascending order. Thus this measure is not sensitive to extreme values.
Interquartile range = range computed on the middle 50% of the observations
Interquartile Range - Example
The following data represent the percentage return on investment per annum for 9 mutual funds. Calculate the interquartile range.
Arranged in ascending order, the data set is: 9, 10.5, 11, 11, 12, 12, 14, 14, 18
Ignore the first two (9, 10.5) and last two (14, 18) observations in this data set. The remaining observations, 11, 11, 12, 12, 14, make up roughly the middle 50% of the data. Calculating the range of these gives the interquartile range: 14 - 11 = 3.
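The trimming procedure from this example can be sketched in Python as follows. The rounding rule is an assumption: k = n // 4 observations are dropped from each end, which reproduces the "drop two of nine" step above. Quartile-based definitions (Q3 minus Q1, for example via the standard library's `statistics.quantiles`) interpolate between observations and can give a different number. `trimmed_interquartile_range` is a hypothetical helper name.

```python
def trimmed_interquartile_range(values):
    """Range of the observations left after dropping the lowest and highest quarter.

    Assumption: k = n // 4 observations are removed from each end of the ranked
    data, which rounds down when n is not a multiple of 4 (2 of 9 here).
    """
    ranked = sorted(values)
    k = len(ranked) // 4
    middle = ranked[k:len(ranked) - k]
    return max(middle) - min(middle)


returns = [9, 10.5, 11, 11, 12, 12, 14, 14, 18]
print(trimmed_interquartile_range(returns))  # 3

# Hypothetical extreme value: replacing 18 with 45 leaves the result unchanged
print(trimmed_interquartile_range(returns[:-1] + [45]))  # 3
```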
Mean Absolute Deviation (MAD)
Mean absolute deviation (MAD) is defined as the average of the deviations measured from the arithmetic mean, in which all deviations are treated as positive, ignoring their actual sign. Unlike the range, MAD is based on all observations; hence it reflects the dispersion of every item in the distribution. In symbolic form, it is defined by the following formula:
$\text{MAD} = \dfrac{\sum |X - \bar{X}|}{n}$
where $\sum |X - \bar{X}|$ represents the sum of all deviations from the arithmetic mean after ignoring sign, $\bar{X}$ = arithmetic mean, and n = number of observations in the sample (sample size).
Caution: MAD has two weaknesses. 1) It cannot be combined for several groups. 2) Ignoring the sign has serious implications for a business manager attempting to measure the spread of the distribution in a scientific manner, because absolute values do not lend themselves to algebraic treatment.
Example for MAD
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the MAD. (Please note that this is the same data set used for computing the range.)
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9
$\bar{X} = \sum X / n$ = (12+14+11+18+10.5+11.3+12+14+11+9)/10 = 122.8/10 = 12.28
$\sum |X - \bar{X}|$ = |12 - 12.28| + |14 - 12.28| + |11 - 12.28| + |18 - 12.28| + |10.5 - 12.28| + |11.3 - 12.28| + |12 - 12.28| + |14 - 12.28| + |11 - 12.28| + |9 - 12.28| = 18.32
MAD = $\sum |X - \bar{X}| / n$ = 18.32/10 = 1.832
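A short Python sketch of the MAD formula, reproducing the hand calculation above. `mean_absolute_deviation` is a hypothetical helper name. It measures deviations from the arithmetic mean, as the text defines MAD, and not from the median, which some other texts use.

```python
def mean_absolute_deviation(values):
    """MAD = sum(|X - mean|) / n: deviations from the arithmetic mean with their signs ignored."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD of an empty data set is undefined")
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]  # % return per annum
print(round(mean_absolute_deviation(returns), 3))  # 1.832
```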
Standard Deviation
Standard deviation forms the basis for the discussion of inferential statistics. It is a classic measure of dispersion and has many advantages over the other measures of variation. It is based on all observations, and it is capable of being treated algebraically, which implies that you can combine the standard deviations of many groups. It plays a vital role in testing hypotheses and forming confidence intervals. To define standard deviation, you need to define another term called variance. In simple terms, standard deviation is the square root of variance.
Important Terms with Notations
Sample variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
Sample standard deviation: $S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
Population variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$
where $\bar{X} = \dfrac{\sum X}{n}$ (sample mean), $\mu = \dfrac{\sum X}{N}$ (population mean), n = number of observations in the sample (sample size), and N = number of observations in the population (population size).
Key remarks:
1. The divisor n - 1 is always used while calculating the sample variance to ensure the property of being unbiased.
2. $S^2$ is an unbiased estimator of $\sigma^2$.
3. $\bar{X}$ is an unbiased estimator of $\mu$.
4. Standard deviation is always the square root of variance.
Example for Standard Deviation
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the sample standard deviation.
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9
Solution for the Example
Solution for the Example (Cont.)
From the Microsoft Excel spreadsheet in the previous slide, it is easy to see that:
Mean = $\bar{X} = \dfrac{\sum X}{n}$ = 12.28 (column A, row 14)
Sample variance = $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ = 56.956/9 = 6.33 (column D, row 14)
Sample standard deviation = $S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$ = $\sqrt{6.33}$ = 2.52 (column D, row 15)
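The spreadsheet calculation can be reproduced with the Python sketch below. It uses the n - 1 divisor from the sample formulas above. `sample_variance` and `sample_std_dev` are hypothetical helper names. The standard library's `statistics.stdev` (and `statistics.variance`) use the same n - 1 divisor and serve as a cross-check. For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.

```python
import math
import statistics


def sample_variance(values):
    """S^2 = sum((X - mean)^2) / (n - 1); the n - 1 divisor keeps S^2 unbiased for sigma^2."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """S is the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]  # % return per annum

print(round(sample_variance(returns), 2))   # 6.33
print(round(sample_std_dev(returns), 2))    # 2.52
# Cross-check against the standard library, which also divides by n - 1
print(round(statistics.stdev(returns), 2))  # 2.52
```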
Standard Deviation for Grouped Data
The standard deviation for sample data based on a frequency distribution is given by
$S = \sqrt{\dfrac{\sum f (X - \bar{X})^2}{n - 1}}$
which is used to estimate the population standard deviation. Here X = midpoint of each class, $\bar{X} = \dfrac{\sum fX}{n}$, and $n = \sum f$ is the sample size.
Standard Deviation for Grouped Data - Example
Frequency Distribution of Funds
Return on Investment of Mutual Funds   Number of Mutual Funds
5-10                                   10
10-15                                  12
15-20                                  16
20-25                                  14
25-30                                  8
Total                                  60
Solution for the Example
Solution for the Example (Cont.)
From the Microsoft Excel spreadsheet in the previous slide, it is easy to see that:
Mean = $\bar{X} = \dfrac{\sum fX}{n}$ = 1040/60 = 17.333 (cell F10)
Standard deviation = $S = \sqrt{\dfrac{\sum f (X - \bar{X})^2}{n - 1}}$ = $\sqrt{2448.33/59}$ = 6.44 (cell H12)
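The grouped standard deviation can be sketched in Python as follows. As in the formula, it assumes every fund in a class sits at the class midpoint. `grouped_sample_std_dev` is a hypothetical helper name.

```python
import math


def grouped_sample_std_dev(classes, frequencies):
    """S = sqrt(sum(f * (X - mean)^2) / (n - 1)), with X = class midpoint and n = sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need a total frequency of at least 2")
    x_bar = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    sum_sq = sum(f * (x - x_bar) ** 2 for f, x in zip(frequencies, midpoints))
    return math.sqrt(sum_sq / (n - 1))


classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]  # return on investment
funds = [10, 12, 16, 14, 8]                                   # number of mutual funds

print(round(grouped_sample_std_dev(classes, funds), 2))  # 6.44
```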
Coefficient of Variation (Relative Dispersion)
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean. It measures the extent of spread in a distribution as a percentage of the mean, which makes it the measure to use when you want to compare the relative spread across groups or segments. The larger the CV, the greater the percentage spread. As a manager, you would like to have a small CV so that your assessment of a situation is robust and the percentage risk is minimized. In symbolic form,
$CV = \dfrac{S}{\bar{X}}$ for sample data and $CV = \dfrac{\sigma}{\mu}$ for population data.
Coefficient of Variation Example
Consider two sales persons working in the same territory. Their sales performance in selling PCs is given below. Comment on the results.
Sales Person 1: mean sales (one-year average) 50 units; standard deviation 5 units
Sales Person 2: mean sales (one-year average) 75 units; standard deviation 25 units
Interpretation for the Example
The CV is 5/50 = 0.10, or 10%, for Sales Person 1 and 25/75 = 0.33, or 33%, for Sales Person 2. Sales Person 2 has a very high departure (standard deviation) from his average sales. Even though Sales Person 2 has achieved a higher average, his performance is not consistent and seems erratic. Looking at the scatter, Sales Person 1 appears to perform better than Sales Person 2 in terms of consistency, with less relative dispersion. The moral of the story is "don't get carried away by absolute numbers".
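The comparison above can be sketched in Python as follows. `coefficient_of_variation` is a hypothetical helper name. The `%` format specifier multiplies by 100 and appends a percent sign, so the ratio prints directly as a percentage.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean; format with '%' to read it as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean


cv_1 = coefficient_of_variation(5, 50)   # Sales Person 1
cv_2 = coefficient_of_variation(25, 75)  # Sales Person 2

print(f"Sales Person 1: {cv_1:.0%}")  # Sales Person 1: 10%
print(f"Sales Person 2: {cv_2:.0%}")  # Sales Person 2: 33%
```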