Analysis of Data

   

Measures of Central Tendency, Measures of Dispersion, Skewness, Kurtosis

A measure of central tendency (an average) is a single central value that describes the entire data set. Methods: 1. Arithmetic mean (or mean) 2. Median 3. Mode 4. Geometric mean 5. Harmonic mean

Mean: The mean of n observations is the ratio of the sum of the observations to the total number of observations. Ungrouped data / raw data:

$$\text{Mean} = \frac{\text{Sum of observations}}{\text{No. of observations}}$$

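Calculating Mean from grouped data:

$$\text{Mean} = \frac{\sum f x}{\sum f}$$

where f is the class frequency and x is the mid-point of the class interval.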

Deviation Method:

$$\text{Mean} = A + \left[\frac{\sum f d}{\sum f}\right] \times h$$

where A is an assumed value chosen from the series X (the mid-points of the class intervals),

$$d = \frac{x - A}{h}$$

and h is the width of the class interval.

1. Child Care Community Nursery is eligible for a county social services grant as long as the average age of its children stays below 9. If these data represent the ages of all the children currently attending Child Care, does it qualify for the grant?
8 5 9 10 9 12 7 12 13 7 8

2. The following data represent the ages of patients admitted to a small hospital.
85 75 66 43 40 88 80 56 56 67 89 83 65 53 75 87 83 52 44 48
a) Construct a frequency distribution with classes 40-49, 50-59, etc.
b) Compute the arithmetic mean (A.M.).

3. National Tire Company holds reserve funds in short-term marketable securities. The ending daily balance (in million \$) of the marketable securities account for 2 weeks is shown below:
Week 1: 1.973 1.970 1.972 1.975 1.976
Week 2: 1.969 1.892 1.893 1.887 1.895
What was the average amount invested in marketable securities during
a) the first week
b) the second week
c) the 2-week period
d) An average balance over the 2 weeks of more than \$1.970 million would qualify National Tire Company for higher interest rates. Does it qualify?
e) If the answer to part (c) is less than \$1.970 million, by how much would the last day's invested amount have to rise to qualify the company for the higher interest rates?

Calculate the mean from the following data:
Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of students      4      7     13     21     15     25     20
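As an illustrative sketch (not a prescribed solution), the marks table can be run through both the direct formula and the deviation method. The assumed value A = 35 (the mid-point of 30-40) and the variable names are choices made here for illustration; any mid-point works as A.

```python
# Marks example: class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 7, 13, 21, 15, 25, 20]

mids = [(lo + hi) / 2 for lo, hi in classes]  # class mid-points x
N = sum(freqs)                                # total frequency, sum of f

# Direct method: mean = sum(f * x) / sum(f)
mean_direct = sum(f * x for f, x in zip(freqs, mids)) / N

# Deviation method: mean = A + (sum(f * d) / sum(f)) * h, with d = (x - A) / h
A = 35.0  # assumed value (illustrative choice)
h = 10.0  # class width
d = [(x - A) / h for x in mids]
mean_deviation = A + sum(f * di for f, di in zip(freqs, d)) / N * h

print(round(mean_direct, 2), round(mean_deviation, 2))
```

Both methods give the same mean, since the deviation method only shifts and rescales x before averaging.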

Merits: 1. It is based on each and every observation of the data. 2. It is useful for performing statistical procedures such as comparing the means from several data sets. 3. It is unique.
Demerits: 1. It is highly affected by extreme values. 2. It is not suitable in the case of open-end class intervals.

Ex: Compute the average of 2, 8, 9, 11. Mean = (2+8+9+11)/4 = 30/4 = 7.5. The mean is not suitable here: the extreme value 2 pulls it below three of the four observations.

Median: The median is the value that separates the ordered data into two equal parts. Raw data: 1. Arrange the data in ascending or descending order of magnitude. 2. Compute the [(n+1)/2]th term, which gives the position of the median.

Calculating the Median from Grouped data:

$$\text{Median} = L + \left[\frac{\frac{N}{2} - m}{f}\right] \times h$$

where
L is the lower limit of the median class,
f is the frequency of the median class,
h is the width of the median class interval,
N is the total frequency,
m is the sum of all the class frequencies up to but not including the median class.
The median class is the class whose cumulative frequency is just greater than N/2.

1. Find the median of 5, 7, 4, 9, 5, 6, 2.
Ans: Ordered data: 2, 4, 5, 5, 6, 7, 9. n = 7 (odd), so the median is the (7+1)/2 = 4th term = 5.
2. Find the median of 8, 7, 9, 4, 8, 10, 9, 9, 3, 5.
Ans: Ordered data: 3, 4, 5, 7, 8, 8, 9, 9, 9, 10. n = 10 (even), so the median is the (10+1)/2 = 5.5th term, i.e. the average of the 5th and 6th terms = (8+8)/2 = 8.
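The positional rule used in these two examples can be sketched in a few lines. The helper name `median_by_position` is hypothetical; Python's standard `statistics.median` applies the same rule and is used here only as a cross-check.

```python
import statistics


def median_by_position(values):
    """Median of raw data: sort, then take the middle term (odd n)
    or the average of the two middle terms (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the (n+1)/2-th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of n/2-th and (n/2+1)-th terms


odd_example = [5, 7, 4, 9, 5, 6, 2]
even_example = [8, 7, 9, 4, 8, 10, 9, 9, 3, 5]

print(median_by_position(odd_example), median_by_position(even_example))
print(statistics.median(odd_example), statistics.median(even_example))
```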

Income Group   Frequency   C.f
2000-3000      2           2
3000-4000      5           7 (m)
4000-5000      6 (f)       13   Median Class
5000-6000      4           17
6000-7000      3           20
               N = 20

N/2 = 20/2 = 10, L = 4000

$$\text{Median} = 4000 + \left[\frac{10 - 7}{6}\right] \times 1000 = 4000 + 500 = 4500$$

The following data relate to the distribution of total loans/credit among various borrowers according to rate of interest.
Rate of interest   % of Total   C.F
<6                 10.2         10.2
6-8                15.5         25.7
8-10               26           51.7
10-12              32.6         84.3
>12                15.7         100
                   N = 100

$$\text{Median} = 8 + \left[\frac{50 - 25.7}{26}\right] \times 2 = 8 + \frac{24.3}{13} = 8 + 1.8692 = 9.8692 \approx 9.87$$
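The two grouped-median calculations above can be checked with a small sketch. The function name and parameter names are hypothetical; the median class is assumed to have been identified already from the cumulative frequencies.

```python
def grouped_median(lower, width, freq, cum_before, total):
    """Median = L + ((N/2 - m) / f) * h for an already identified median class.

    lower      -- L, lower limit of the median class
    width      -- h, width of the median class
    freq       -- f, frequency of the median class
    cum_before -- m, cumulative frequency of the classes before the median class
    total      -- N, total frequency
    """
    return lower + (total / 2 - cum_before) / freq * width


# Income example: median class 4000-5000
income_median = grouped_median(lower=4000, width=1000, freq=6, cum_before=7, total=20)

# Rate-of-interest example: median class 8-10 (percentages act as frequencies)
interest_median = grouped_median(lower=8, width=2, freq=26, cum_before=25.7, total=100)

print(income_median, round(interest_median, 2))
```

Only the median class needs finite limits, which is why the open-ended classes "<6" and ">12" cause no difficulty here.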

Merits: 1. Extreme values do not affect its value. 2. It is especially useful when the data are skewed.
Demerits: 1. It may not be representative of the data. Ex: 1, 2, 10. 2. It does not consider all the observations.

Mode
- The mode of a data set is the value that occurs with the greatest frequency.
- The greatest frequency can occur at two or more different values.
- If the data have exactly two modes, the data are bimodal.
- If the data have more than two modes, the data are multimodal.

Example data (apartment rents, 70 observations in ascending order):
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
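As a quick check on the 70 rents listed above, the most frequent value can be counted directly. This is only a sketch; `statistics.multimode` returns every value tied for the highest frequency, so it also covers the bimodal and multimodal cases.

```python
import statistics
from collections import Counter

# The 70 apartment rents from the example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

modes = statistics.multimode(rents)     # all values with the greatest frequency
top_count = Counter(rents)[modes[0]]    # how often the mode occurs

print(modes, top_count)
```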

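Finding Mode for grouped data:

$$\text{Mode} = L + \left[\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right] \times h$$

where L is the lower limit of the modal class, f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes just before and just after it, and h is the class width.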

Land holdings   No. of holdings
0-2             15
2-4             20 (f0)
4-6             35 (f1)   Modal Class
6-8             25 (f2)
8-10            15

$$\text{Mode} = 4 + \left[\frac{35 - 20}{2(35) - 20 - 25}\right] \times 2 = 4 + \left[\frac{15}{25}\right] \times 2 = 4 + \frac{30}{25} = 4 + 1.2 = 5.2$$
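The land-holdings calculation can be reproduced with a short sketch. The function and parameter names are hypothetical, and the modal class (4-6) is assumed to have been picked out by inspection as the class with the highest frequency.

```python
def grouped_mode(lower, width, f1, f0, f2):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h.

    lower -- L, lower limit of the modal class
    width -- h, class width
    f1    -- frequency of the modal class
    f0    -- frequency of the class before it
    f2    -- frequency of the class after it
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


land_mode = grouped_mode(lower=4, width=2, f1=35, f0=20, f2=25)
print(round(land_mode, 2))
```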

Compute the average growth rate (savings A/c) for the following:
Year   Interest rate (%)   Growth factor   Savings at the end of year
1      7                   1.07            107
2      8                   1.08            115.56
3      10                  1.10            127.12
4      12                  1.12            142.37
5      18                  1.18            168

Growth factor = 1 + (Interest rate/100). Let us deposit initially Rs 100.00.
Average growth factor (arithmetic mean) = (1.07+1.08+1.10+1.12+1.18)/5 = 1.11, i.e. an average interest rate of 11% per year.
On that basis a Rs 100.00 deposit would grow in five years to (100)x(1.11)x(1.11)x(1.11)x(1.11)x(1.11) = Rs 168.51, which is more than the actual balance of Rs 168.00. The arithmetic mean therefore overstates the average growth rate.

$$\text{G.M} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

Now, average growth factor =

$$\sqrt[5]{(1.07)(1.08)(1.10)(1.12)(1.18)} = \sqrt[5]{1.679965} = 1.1093$$
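A short sketch makes the comparison between the two averages concrete, using the five growth factors from the savings example. The variable names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

factors = [1.07, 1.08, 1.10, 1.12, 1.18]  # yearly growth factors
n = len(factors)

arithmetic = sum(factors) / n             # arithmetic mean of the factors
geometric = math.prod(factors) ** (1 / n)  # geometric mean: n-th root of the product

actual_balance = 100 * math.prod(factors)  # Rs 100 compounded year by year
balance_am = 100 * arithmetic ** n         # balance implied by the arithmetic mean
balance_gm = 100 * geometric ** n          # balance implied by the geometric mean

print(round(arithmetic, 2), round(geometric, 4))
print(round(actual_balance, 2), round(balance_am, 2), round(balance_gm, 2))
```

Compounding at the geometric mean reproduces the actual final balance, which is why it is the appropriate average for growth factors.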

Let us consider the following series:
1. 50, 50, 50
2. 49, 50, 51
3. 40, 50, 60
4. 10, 50, 90
All four series have the same mean (50) but differ in how widely the values are spread around it.

Measures of Variability
- Range
- Interquartile Range
- Quartile Deviation
- Mean Deviation
- Variance
- Standard Deviation
- Coefficient of Variation

Range
- The range of a data set is the difference between the largest and smallest data values.
- It is the simplest measure of variability.
- It is very sensitive to the smallest and largest data values.

Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Interquartile Range
- The interquartile range of a data set is the difference between the third quartile and the first quartile.
- It overcomes the sensitivity to extreme data values.

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80


The middle 50% of the apartments have rents between 445 and 525, so the range of their rents is 80.
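The range and interquartile range of the rent data can be recomputed as a sketch. Quartile conventions differ between textbooks and libraries; the default "exclusive" method of `statistics.quantiles` interpolates at position p(n+1), and for this data set it gives the quartiles quoted above. Other conventions may give slightly different values.

```python
import statistics

# The 70 apartment rents from the example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

rent_range = max(rents) - min(rents)           # largest value - smallest value

q1, q2, q3 = statistics.quantiles(rents, n=4)  # quartiles, p(n+1) position rule
iqr = q3 - q1                                  # interquartile range

print(rent_range, q1, q3, iqr)
```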

Quartile Deviation is

$$\text{Q.D} = \frac{Q_3 - Q_1}{2}$$

and the coefficient of Q.D is

$$\frac{Q_3 - Q_1}{Q_3 + Q_1}$$

It can be used in open-end frequency distributions.

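Mean Deviation
- M.D is the arithmetic mean of the absolute deviations of all items from an average and is given by

$$\text{M.D} = \frac{1}{n} \sum \left| x - \text{Mean} \right|$$

$$\text{M.D} = \frac{1}{n} \sum \left| x - \text{Median} \right|$$

For a frequency distribution with total frequency N:

$$\text{M.D} = \frac{1}{N} \sum f \left| x - \text{Mean} \right|$$

$$\text{M.D} = \frac{1}{N} \sum f \left| x - \text{Median} \right|$$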
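A minimal sketch of the raw-data formulas, applied to the four small series listed earlier in these notes (each with mean 50). The function name and its `about` parameter are hypothetical.

```python
import statistics


def mean_deviation(values, about="mean"):
    """Average absolute deviation of raw data from the mean or the median."""
    if about == "mean":
        center = statistics.mean(values)
    else:
        center = statistics.median(values)
    return sum(abs(x - center) for x in values) / len(values)


series = [
    [50, 50, 50],
    [49, 50, 51],
    [40, 50, 60],
    [10, 50, 90],
]

for s in series:
    print(s, round(mean_deviation(s), 3))
```

The mean deviation grows from 0 for the first series to about 26.667 for the last, although all four share the same mean.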

Variance
- The variance is a measure of variability that utilizes all the data.
- It is based on the difference between the value of each observation (x) and the mean.

The variance is the average of the squared differences between each data value and the mean, and is usually denoted by σ².

$$\sigma^2 = \frac{1}{n} \sum (x - \bar{x})^2$$

$$\sigma^2 = \frac{1}{N} \sum f (x - \bar{x})^2$$

$$\sigma^2 = \left[\frac{1}{N} \sum f d^2 - \left(\frac{\sum f d}{N}\right)^2\right] \times h^2$$

where

$$d = \frac{x - A}{h}$$

- The standard deviation of a data set is the positive square root of the variance, denoted by σ.

$$\sigma = \sqrt{\frac{1}{n} \sum (x - \bar{x})^2}$$

$$\sigma = \sqrt{\frac{1}{N} \sum f (x - \bar{x})^2}$$

$$\sigma = \sqrt{\frac{1}{N} \sum f d^2 - \left(\frac{\sum f d}{N}\right)^2} \times h$$

where

$$d = \frac{x - A}{h}$$
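To illustrate that the definitional and the step-deviation formulas agree, the sketch below applies both to the marks distribution from the grouped-mean exercise earlier in these notes. The assumed value A = 35 and the variable names are illustrative choices.

```python
import math

# Marks distribution: class mid-points and frequencies.
mids = [5, 15, 25, 35, 45, 55, 65]
freqs = [4, 7, 13, 21, 15, 25, 20]
N = sum(freqs)

# Definitional form: variance = (1/N) * sum(f * (x - mean)^2)
mean = sum(f * x for f, x in zip(freqs, mids)) / N
var_direct = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / N

# Step-deviation form: variance = [(1/N) sum(f d^2) - (sum(f d) / N)^2] * h^2
A, h = 35, 10  # assumed value and class width (illustrative choice of A)
d = [(x - A) / h for x in mids]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
var_step = (sum_fd2 / N - (sum_fd / N) ** 2) * h ** 2

sd = math.sqrt(var_direct)  # standard deviation: positive square root

print(round(var_direct, 2), round(var_step, 2), round(sd, 2))
```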

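Calculate Q.D and the coefficient of Q.D from the following data:

Roll No   1   2   3   4   5   6
Marks     25  55  5   45  15  35

Marks arranged in ascending order: 5, 15, 25, 35, 45, 55 (n = 6).

$$Q_1 = \text{size of } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{7}{4}\right)^{\text{th}} \text{ item} = \text{size of } 1.75^{\text{th}} \text{ item}$$

Size of 1.75th item = size of 1st item + 0.75 (size of 2nd item - size of 1st item)
= 5 + 0.75(15 - 5) = 5 + 7.5 = 12.5

$$Q_3 = \text{size of } \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{3 \times 7}{4}\right)^{\text{th}} \text{ item} = \text{size of } 5.25^{\text{th}} \text{ item}$$

Size of 5.25th item = size of 5th item + 0.25 (size of 6th item - size of 5th item)
= 45 + 0.25(55 - 45) = 45 + 2.5 = 47.5

Q.D = (47.5 - 12.5)/2 = 17.5
Coefficient of Q.D = (47.5 - 12.5)/(47.5 + 12.5) = 35/60 ≈ 0.583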
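The same quartile positions can be checked programmatically. This is a sketch only: the default "exclusive" method of `statistics.quantiles` interpolates at position p(n+1), which matches the (n+1)/4 and 3(n+1)/4 rule used in this example.

```python
import statistics

marks = [25, 55, 5, 45, 15, 35]  # marks of roll numbers 1 to 6

# quantiles() sorts the data itself and returns Q1, Q2, Q3 for n=4.
q1, q2, q3 = statistics.quantiles(marks, n=4)

quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, quartile_deviation, round(coefficient_qd, 3))
```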

A quality control laboratory received samples of electric bulbs from two suppliers for testing their lives. The results were as follows.
Length of life (in hours)   Company A   Company B
1500-2000                   16          18
2000-2500                   26          22
2500-3000                   8           8

- Which company's bulbs have the greater length of life?
- Which company's bulbs are more uniform with respect to their lives?

The Shareholders Research Centre of India has recently conducted a research study on the price behaviour of three leading industrial shares, A, B and C, for the period 1979-1985, the results of which are published as follows in its Quarterly Journal:
Share   Average price (Rs)   Standard deviation   Current selling price
A       18.2                 5.4                  36.00
B       22.5                 4.5                  34.75
C       24.0                 6.0                  39.00

i) Which share, in your opinion, appears to be more stable?
ii) If you are the holder of all the three shares, which one would you like to dispose of at present, and why?
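One common way to approach part (i) is the coefficient of variation, taken here in its standard form C.V = (standard deviation / mean) x 100; these notes list the measure but do not spell out the formula, so that definition is an assumption of this sketch. A smaller C.V indicates a relatively more stable price.

```python
# (average price, standard deviation) for each share, from the table above.
shares = {
    "A": (18.2, 5.4),
    "B": (22.5, 4.5),
    "C": (24.0, 6.0),
}

# Coefficient of variation in percent (assumed standard definition).
cv = {name: sd / mean * 100 for name, (mean, sd) in shares.items()}

most_stable = min(cv, key=cv.get)  # share with the smallest relative variation

for name, value in cv.items():
    print(name, round(value, 2))
print("Most stable:", most_stable)
```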

X   0   1   2   3   4   5   6   7   8
f   1   8   28  56  70  56  28  8   1

Mean = Median = Mode
Q2 - Q1 = Q3 - Q2

[Figure: bell-shaped curve, frequency f plotted against x]

Skewness: Lack of symmetry

Absolute Measures:
Sk = Mean - Median
Sk = Mean - Mode

Relative Measures:

Karl Pearson's Coefficient of Skewness is given by

$$Sk = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

Bowley's Coefficient of Skewness is given by

$$Sk_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)}$$
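As a sanity check, Karl Pearson's coefficient can be evaluated on the symmetric distribution tabulated earlier (X = 0 to 8), where mean, median and mode coincide. The sketch uses the standard forms (Mean - Mode)/SD and 3(Mean - Median)/SD, and expands the frequency table into raw data purely for convenience.

```python
import statistics

xs = list(range(9))                      # X = 0, 1, ..., 8
fs = [1, 8, 28, 56, 70, 56, 28, 8, 1]    # frequencies f

# Expand the frequency table into raw observations.
data = [x for x, f in zip(xs, fs) for _ in range(f)]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)             # population standard deviation

sk_mode = (mean - mode) / sd             # Pearson coefficient using the mode
sk_median = 3 * (mean - median) / sd     # Pearson coefficient using the median

print(mean, median, mode, round(sd, 4), sk_mode, sk_median)
```

Both coefficients come out as zero, as expected for a perfectly symmetric distribution.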

Kurtosis: Convexity of the curve

Ex: The data on the profits (in Rs lakh) earned by 60 companies are as follows:
Profits     No. of companies
Below 10    5
10-20       12
20-30       20
30-40       16
40-50       5
Above 50    2

Determine the coefficient of skewness.
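Because the first and last classes are open-ended, a quartile-based measure such as Bowley's coefficient is a natural choice here. The sketch below is one possible approach, not a prescribed solution: it interpolates the quartiles within their classes in the same way as the grouped median, and the helper name `grouped_quantile` is hypothetical. The lower limit 0 given to the "Below 10" class is a placeholder that is never used, since no quartile falls in an open class.

```python
def grouped_quantile(lowers, width, freqs, p):
    """Interpolated quantile: L + ((p*N - m) / f) * h for the class
    whose cumulative frequency first reaches p*N."""
    total = sum(freqs)
    target = p * total
    cum = 0  # m: cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie between 0 and 1")


lowers = [0, 10, 20, 30, 40, 50]   # lower limits; 0 is a placeholder for "Below 10"
freqs = [5, 12, 20, 16, 5, 2]      # number of companies per class
width = 10

q1 = grouped_quantile(lowers, width, freqs, 0.25)
q2 = grouped_quantile(lowers, width, freqs, 0.50)
q3 = grouped_quantile(lowers, width, freqs, 0.75)

bowley = ((q3 - q2) - (q2 - q1)) / ((q3 - q2) + (q2 - q1))

print(round(q1, 2), q2, q3, round(bowley, 3))
```

Under these assumptions the coefficient is small and positive, which would indicate a slight positive skew.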