Statistical methods

- Measurement of central tendency
- Measurement of dispersion
- Correlation
- Regression

Data & its types 

Definition of Data:
- Facts, figures, enumerations & other materials, past and present, serving as the basis for study and analysis
- Data are the raw material for analysis; they provide the basis for testing hypotheses and developing scales and tables
- Data help researchers draw inferences on specific issues / problems
- Quality of findings depends on the relevance, adequacy & reliability of the data

Types of data (not in the statistical sense)

A. By source
1. Personal data (individual as a source)
   - Demographic & socio-economic characteristics
   - Behaviour variables
   - Attitudes, behaviour, opinions
   - Awareness, preferences, knowledge
   - Practices, intentions
2. Organisational data (organisational sources)
   - Archives, manuscript libraries, museums
3. Territorial data
   - Economic structure, occupation pattern

B. By method
I. Secondary (paper method)
II. Primary (pencil method)

Methods & Techniques of Data Collection 

I-Secondary data
- Published & unpublished
- How to scrutinize
- Methods where used:
  A-Meta analysis
  B-Historical method
  C-Content analysis
  D-Informetrics
  E-Use studies

II-Primary data
  A-Records & relics
  B-Observation
  C-Experimentation
  D-Simulation
  E-Ask people orally
  F-Ask people in writing
  G-Panel study
  H-Projective techniques
  I-Sociometry
  J-Case study
- Interview / Depth interview / Schedule
- Mail survey / questionnaire
- Mechanical devices

Data Sources
- Primary Data
- Secondary Data
  - Studies and reports
  - Policies on foreign direct investment
  - Industry bodies: ASSOCHAM (Associated Chambers of Commerce and Industry), CII (Confederation of Indian Industry), FICCI (Federation of Indian Chambers of Commerce and Industry)
  - Rules on international trading
  - Central and local govt., state budgets
  - Internet sites / web pages of different companies and organizations
  - Imports and exports

Skewness and Kurtosis: some examples
[Figure: frequency histograms of Educational Attainment and Reason for Termination, each annotated with mean, standard deviation and N]

Pictogram
[Figure]

Annotated box plot
[Figure]

Describing Data Numerically
- Central Tendency: Arithmetic Mean, Median, Mode
- Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency: Overview
- Mean: arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median: midpoint of ranked values
- Mode: most frequently observed value

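Arithmetic Mean
- The arithmetic mean (mean) is the most common measure of central tendency
- For a population of N values:
  $$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
  where the $x_i$ are the population values and $N$ is the population size
- For a sample of size n:
  $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
  where the $x_i$ are the observed values and $n$ is the sample size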

Arithmetic Mean (continued)
- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)

Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4

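Median
- In an ordered list, the median is the "middle" number (50% above, 50% below)
- [Figure: two dot plots on a 0-10 axis, each with Median = 3]
- Not affected by extreme values
- For grouped data the median is the second quartile:
  $$\text{Median} = Q_2 = L + \frac{\frac{N}{2} - C}{f}\,h$$
  where $L$ is the lower limit of the median class, $C$ the cumulative frequency before it, $f$ its frequency and $h$ the class width
- Example use: compare the knowledge level in two subjects for a group of students by the median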

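Quartiles, Deciles, Percentiles
- Similar to the median, which divides the data into two parts
- Quartiles divide the data into four parts, deciles into ten parts and percentiles into 100 parts

$$Q_j = L + \frac{\frac{jN}{4} - \text{p.c.f.}}{f}\,h, \qquad j = 1, 2, 3$$

$$D_j = L + \frac{\frac{jN}{10} - \text{p.c.f.}}{f}\,h, \qquad j = 1, 2, \ldots, 9$$

$$P_j = L + \frac{\frac{jN}{100} - \text{p.c.f.}}{f}\,h, \qquad j = 1, 2, \ldots, 99$$

Here p.c.f. is the cumulative frequency of the class preceding the class that contains the required quartile, decile or percentile; $L$, $f$ and $h$ are the lower limit, frequency and width of that class.

- Empirical relation: Mode = 3 Median - 2 Mean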

Finding the Median
- The location of the median:
  $$\text{Median position} = \frac{n+1}{2}\ \text{position in the ordered data}$$
- If the number of values is odd, the median is the middle number
- If the number of values is even, the median is the average of the two middle numbers
- Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data

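Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be several modes
- For grouped data:
  $$\text{Mode} = L + \frac{f - f_{-1}}{2f - f_{-1} - f_{1}}\,h$$
  where $f$ is the frequency of the modal class, $f_{-1}$ the frequency before the modal class and $f_{1}$ the frequency after the modal class
- [Figure: dot plot on a 0-14 axis with Mode = 9; dot plot on a 0-6 axis with No Mode]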

Review Example
- Five houses on a hill by the beach
- House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Review Example: Summary Statistics
- House Prices: $2,000,000 + $500,000 + $300,000 + $100,000 + $100,000; Sum = $3,000,000
- Mean: ($3,000,000/5) = $600,000
- Median: middle value of ranked data = $300,000
- Mode: most frequent value = $100,000
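The three summary statistics for the house prices can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, mode

# House prices from the review example (five houses).
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = mean(house_prices)      # sum of values / number of values
median_price = median(house_prices)  # middle value of the ranked data
mode_price = mode(house_prices)      # most frequently observed value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred when outliers are present.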

Example (ungrouped data): 5, 9, 7, 9, 10, 9, 5 gives Mean = 7.714286, Median = 9, Mode = 9.

Example (grouped data)

Class    Frequency   Less-than C.F.   More-than C.F.
5-10         5             5               49
10-15        6            11               44
15-20       15            26               38
20-25       10            36               23
25-30        5            41               13
30-35        4            45                8
35-40        2            47                4
40-45        2            49                2

Median class: the class whose cumulative frequency first reaches Total Freq/2 = 49/2 = 24.5, i.e. 15-20 (cumulative frequency 26).
Median = L + [(N/2 - C)/f] h = 15 + [((1/2)49 - 11)/15] x 5 = 19.5

Modal class: the class with the maximum frequency, i.e. 15-20 (frequency 15).
Mode = L + [(f - f-1)/(2f - f-1 - f1)] h = 15 + [(15 - 6)/(2 x 15 - 6 - 10)] x 5 = 18.21
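The grouped-data median and mode formulas can be sketched in code as follows. The helper names `grouped_median` and `grouped_mode` are hypothetical, and the sketch assumes equal class widths and a single modal class.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - C) / f) * h for equal-width classes."""
    half = sum(freqs) / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_limits, freqs, width):
    """Mode = L + (f - f_prev) / (2f - f_prev - f_next) * h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # modal class index
    f = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    return lower_limits[i] + (f - f_prev) / (2 * f - f_prev - f_next) * width

# Classes 5-10, 10-15, ..., 40-45 with the frequencies from the example.
lower_limits = [5, 10, 15, 20, 25, 30, 35, 40]
freqs = [5, 6, 15, 10, 5, 4, 2, 2]

median_value = grouped_median(lower_limits, freqs, 5)
mode_value = grouped_mode(lower_limits, freqs, 5)
print(round(median_value, 2), round(mode_value, 2))
```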

Which measure of location is the "best"?
- Mean is generally used, unless extreme values (outliers) exist
- Then median is often used, since the median is not sensitive to extreme values
- Example: median home prices may be reported for a region, since they are less sensitive to outliers

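Geometric mean & Harmonic mean
- Geometric mean is the nth root of the product of n observations (ex: average percent increase in sales, production). Best considered when constructing index numbers.

$$\log \text{G.M.} = \frac{\sum \log X}{N}, \qquad \text{G.M.} = \operatorname{antilog}\left(\frac{\sum \log X}{N}\right)$$

Combined geometric mean of two groups:

$$\log G = \frac{N_1 \log G_1 + N_2 \log G_2}{N_1 + N_2}$$

- Harmonic mean: restricted use, such as the average rate of increase of profits or the average price at which an article has been sold

$$\text{H.M.} = \frac{N}{\sum \left(\frac{1}{X}\right)}, \qquad \text{H.M.} = \frac{N}{\sum \left(\frac{f}{X}\right)} \ \text{for a frequency distribution, with } N = \sum f$$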
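As a minimal sketch of the two formulas for ungrouped data, the functions below follow the log/antilog form of the geometric mean and the reciprocal form of the harmonic mean. Natural logarithms are used here as an assumption (any base gives the same result), and the sample values are illustrative only.

```python
import math

def geometric_mean(values):
    """G.M. = antilog(sum(log X) / N); all values must be positive."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    """H.M. = N / sum(1 / X); all values must be non-zero."""
    return len(values) / sum(1 / x for x in values)

# Illustrative data (not from the slides).
gm = geometric_mean([2, 8])     # square root of 2 * 8
hm = harmonic_mean([40, 60])    # e.g. average of two rates
print(round(gm, 4), round(hm, 4))
```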

Measures of Variability
- Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
- Measures of variation give information on the spread or variability of the data values
- [Figure: two distributions with the same center, different variation]

Range
- Simplest measure of variation
- Difference between the largest and the smallest observations:
  Range = X_largest - X_smallest
- Example: [Figure: dot plot on a 0-14 axis with smallest value 1 and largest value 14]
  Range = 14 - 1 = 13

Disadvantages of the Range
- Ignores the way in which data are distributed
  [Figure: two differently distributed data sets on a 7-12 axis]
  Range = 12 - 7 = 5 in both cases
- Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
  Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
  Range = 120 - 1 = 119

Interquartile Range
- Can eliminate some outlier problems by using the interquartile range
- Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
- Interquartile range = 3rd quartile - 1st quartile:
  IQR = Q3 - Q1

$$\text{Coefficient of range} = \frac{L - S}{L + S}, \qquad \text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

where $L$ and $S$ are the largest and smallest observations.

Interquartile Range
Example:

X minimum   Q1   Median (Q2)   Q3   X maximum
   12       30       45        57      70

(25% of the observations lie in each of the four segments.)
Interquartile range = 57 - 30 = 27

Quartiles
- Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% | Q1 | 25% | Q2 | 25% | Q3 | 25%)
- The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
- Q2 is the same as the median (50% are smaller, 50% are larger)
- Only 25% of the observations are greater than the third quartile

Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
- First quartile position: Q1 = 0.25(n+1)
- Second quartile position: Q2 = 0.50(n+1) (the median position)
- Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values

Quartiles
- Example: Find the first quartile
  Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)
  Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
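The position rule can be sketched as a small function. The name `quartile` is hypothetical, and linear interpolation between neighbouring ranked values is assumed for fractional positions (which reproduces the "halfway between" step of the example).

```python
def quartile(data, q):
    """Value at position q * (n + 1) in the ranked data (1-based)."""
    ranked = sorted(data)
    n = len(ranked)
    position = q * (n + 1)
    lower = int(position)          # whole part of the position
    fraction = position - lower    # how far to move towards the next value
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = quartile(sample, 0.25)
q2 = quartile(sample, 0.50)
q3 = quartile(sample, 0.75)
print(q1, q2, q3, q3 - q1)  # the last value is the interquartile range
```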

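Standard deviation
- Variance = square of S.D.

From the actual mean:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{N}}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}} \ \text{(frequency data)}$$

From an assumed mean $A$, with $d = x - A$:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \qquad \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \ \text{(frequency data)}$$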

Population Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Population standard deviation:
  $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

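Calculation Example: Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$

The squared deviations are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130.
s is a measure of the "average" scatter around the mean.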
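A minimal sketch that recomputes the sample standard deviation of the eight values from the definition and cross-checks it against `statistics.stdev` (which also divides by n - 1). The variable names are illustrative.

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                  # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)       # sum of squared deviations
s = math.sqrt(sum_sq_dev / (n - 1))                    # sample standard deviation

print(x_bar, sum_sq_dev, round(s, 4))
print(math.isclose(s, stdev(data)))                    # agrees with the library
```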

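SD Example

Size x   Freq f   d = x - 9    fd    fd^2
  6        3         -3        -9     27
  7        6         -2       -12     24
  8        9         -1        -9      9
  9       13          0         0      0
 10        8          1         8      8
 11        5          2        10     20
 12        4          3        12     36
Total     48                    0    124

Since the sum of fd is 0, the mean is 9 and

$$\text{S.D.} = \sqrt{\frac{\sum_i f_i (x_i - \bar{x})^2}{\sum_i f_i}} = \sqrt{\frac{\sum f d^2}{\sum f}} = \sqrt{\frac{124}{48}} = 1.607$$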

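Example: Who is the better scorer and who is more consistent?

Scores of Batsman A (x): 12 115 6 73 7 19 119 36 84 29
Scores of Batsman B (y): 47 12 16 42 4 51 37 48 13 0

Taking deviations from the assumed mean 51 ($d_1 = x - 51$, $d_2 = y - 51$), with n = 10:

             Sum of d    Sum of d^2
Batsman A      -10         17508
Batsman B     -240          9302

$$\text{Mean}_A = 51 + \frac{-10}{10} = 50, \qquad \text{Mean}_B = 51 + \frac{-240}{10} = 27$$

$$\text{S.D.}_A = \sqrt{\frac{\sum d_1^2}{n} - \left(\frac{\sum d_1}{n}\right)^2} = \sqrt{1750.8 - (-1)^2} = 41.8$$

$$\text{S.D.}_B = \sqrt{\frac{\sum d_2^2}{n} - \left(\frac{\sum d_2}{n}\right)^2} = \sqrt{930.2 - (-24)^2} = 18.8$$

$$\text{C.V.}_A = (41.8/50) \times 100 = 83.6\%, \qquad \text{C.V.}_B = (18.8/27) \times 100 = 69.6\%$$

Batsman A is the better scorer (higher mean); Batsman B is more consistent (lower coefficient of variation).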
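The comparison of the two batsmen can be reproduced with a short sketch. It assumes the population form of the standard deviation (divisor n), as in the assumed-mean formula, and the helper name `coefficient_of_variation` is hypothetical. Small differences in the last digit arise because the slide rounds the standard deviations before dividing.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (standard deviation / mean) * 100, using the divisor n."""
    return pstdev(values) / mean(values) * 100

batsman_a = [12, 115, 6, 73, 7, 19, 119, 36, 84, 29]
batsman_b = [47, 12, 16, 42, 4, 51, 37, 48, 13, 0]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)

better_scorer = "A" if mean(batsman_a) > mean(batsman_b) else "B"
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 1), round(cv_b, 1), better_scorer, more_consistent)
```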

Measuring variation
[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Comparing Standard Deviations
[Figure: three dot plots on an 11-21 axis]
- Data A: Mean = 15.5, s = 3.338
- Data B: Mean = 15.5, s = 0.926
- Data C: Mean = 15.5, s = 4.570

Advantages of Variance and Standard Deviation
- Each value in the data set is used in the calculation
- Values far from the mean are given extra weight (because deviations from the mean are squared)

The Empirical Rule
- If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample

The Empirical Rule (continued)
- $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
- $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample

Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$

Comparing Coefficient of Variation
- Stock A:
  - Average price last year = $50
  - Standard deviation = $5
  $$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
- Stock B:
  - Average price last year = $100
  - Standard deviation = $5
  $$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Approximations for Grouped Data
Suppose a data set contains values $m_1, m_2, \ldots, m_K$, occurring with frequencies $f_1, f_2, \ldots, f_K$.
- For a population of N observations the mean is
  $$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}, \qquad \text{where } N = \sum_{i=1}^{K} f_i$$
- For a sample of n observations, the mean is
  $$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}, \qquad \text{where } n = \sum_{i=1}^{K} f_i$$

Shape of a Distribution
- Describes how data are distributed
- Measures of shape: symmetric or skewed
  - Left-Skewed: Mean < Median
  - Symmetric: Mean = Median
  - Right-Skewed: Median < Mean

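Moments
- Moments are defined as

$$r\text{th moment about the mean} = \mu_r = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^r$$

$$r\text{th moment about an arbitrary point } a = \mu'_r = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - a)^r$$

From these we get

$$\mu_2 = \mu'_2 - \mu_1'^2$$

$$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu_1'^3$$

$$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu_1'^2 - 3\mu_1'^4$$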

Skewness
- Skewness refers to lack of symmetry (about the mean). A skewness coefficient also tells the direction of the skewness, i.e. whether the distribution is left-skewed or right-skewed.
- Karl Pearson coefficient of skewness = (Mean - Mode)/Standard deviation
- Bowley's or quartile coefficient of skewness:
  $$\text{Bowley skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
- Coefficient of skewness based on the third moment:
  $$\gamma_1 = \sqrt{\beta_1}, \qquad \text{where } \beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
  $\beta_1$ is always positive, so the sign of $\gamma_1$ is taken from the sign of $\mu_3$.

Moments
- From the given data find the first four moments about the arbitrary origin (A = 20) and about the mean

Monthly profit    Mid point X   No. of companies (f)   d = (X - 20)/5    fd    fd^2   fd^3   fd^4
Less than 7.5          5                 4                  -3           -12     36   -108    324
7.5-12.5              10                10                  -2           -20     40    -80    160
12.5-17.5             15                20                  -1           -20     20    -20     20
17.5-22.5             20                36                   0             0      0      0      0
22.5-27.5             25                16                   1            16     16     16     16
27.5-32.5             30                12                   2            24     48     96    192
32.5-37.5             35                 2                   3             6     18     54    162
Total                                N = 100                              -6    178    -42    874

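Moments about the arbitrary origin and about the mean
With class width i = 5 and N = 100:

$$\mu'_1 = \frac{\sum fd}{N}\, i = \frac{-6}{100} \times 5 = -0.3, \qquad \mu'_2 = \frac{\sum fd^2}{N}\, i^2 = \frac{178}{100} \times 25 = 44.5$$

$$\mu'_3 = \frac{\sum fd^3}{N}\, i^3 = \frac{-42}{100} \times 125 = -52.5, \qquad \mu'_4 = \frac{\sum fd^4}{N}\, i^4 = \frac{874}{100} \times 625 = 5462.5$$

$$\mu_2 = 44.5 - (-0.3)^2 = 44.41$$

$$\mu_3 = -52.5 - 3(44.5)(-0.3) + 2(-0.3)^3 = -12.504$$

$$\mu_4 = 5462.5 - 4(-52.5)(-0.3) + 6(44.5)(-0.3)^2 - 3(-0.3)^4 = 5423.5057$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.0018, \qquad \gamma_1 = \text{Skewness} = \sqrt{\beta_1} = 0.0422 \ \text{in magnitude, negative in sign because } \mu_3 < 0$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = 2.7499, \qquad \gamma_2 = \text{Kurtosis} = \beta_2 - 3 = -0.2501$$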
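The central moments can also be computed directly from the class midpoints, which gives an independent check on the step-deviation arithmetic. This is a minimal sketch; the variable and function names are illustrative and the signed form $\gamma_1 = \mu_3 / \mu_2^{3/2}$ is used so that the direction of skewness is kept.

```python
midpoints = [5, 10, 15, 20, 25, 30, 35]
freqs = [4, 10, 20, 36, 16, 12, 2]

n = sum(freqs)
x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / n

def central_moment(r):
    """r-th moment about the mean for the frequency table above."""
    return sum(f * (x - x_bar) ** r for f, x in zip(freqs, midpoints)) / n

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)

beta1 = mu3 ** 2 / mu2 ** 3
gamma1 = mu3 / mu2 ** 1.5      # signed skewness coefficient
beta2 = mu4 / mu2 ** 2
gamma2 = beta2 - 3             # excess kurtosis; negative means platykurtic

print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
print(round(gamma1, 4), round(gamma2, 4))
```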

Kurtosis
- Kurtosis refers to the bulginess, or degree of flatness or peakedness, of a distribution

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$

- More peaked than normal: leptokurtic, $\beta_2 > 3$
- Less peaked than normal: platykurtic, $\beta_2 < 3$
- The normal curve is mesokurtic, $\beta_2 = 3$

Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3 and r = 0]

Correlation
- Correlation helps in determining the degree of relationship between two or more variables. However, it does not tell us about a cause-and-effect relationship.
- Methods: scatter diagram, Karl Pearson coefficient of correlation, Spearman's rank correlation
- Karl Pearson formula:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

Ex: Psychological test scores for intelligence ratio and engineering ability ratio of 10 students are as follows. Calculate the coefficient of correlation. (Mean of x = xbar = 99, mean of y = ybar = 98)

Student   Intelligence ratio x   X = x - xbar   X^2   Engg ratio y   Y = y - ybar   Y^2    XY
A                105                  6          36       101             3           9     18
B                104                  5          25       103             5          25     25
C                102                  3           9       100             2           4      6
D                101                  2           4        98             0           0      0
E                100                  1           1        95            -3           9     -3
F                 99                  0           0        96            -2           4      0
G                 98                 -1           1       104             6          36     -6
H                 96                 -3           9        92            -6          36     18
I                 93                 -6          36        97            -1           1      6
J                 92                 -7          49        94            -4          16     28
TOTAL            990                  0         170       980             0         140     92

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}} = \frac{92}{\sqrt{170 \times 140}} = 0.596$$
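The worked example can be verified with a direct implementation of the Karl Pearson formula. This is a minimal sketch; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)) with deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

intelligence = [105, 104, 102, 101, 100, 99, 98, 96, 93, 92]
engineering = [101, 103, 100, 98, 95, 96, 104, 92, 97, 94]

r = pearson_r(intelligence, engineering)
print(round(r, 3))
```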

Correlation of bivariate grouped data
When frequency data are given:

$$r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2}\ \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}}$$

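Spearman Rank Correlation
Ex:

Person   Rank in stat R1   Rank in income R2   D = R1 - R2   D squared
A               9                  1                 8            64
B              10                  2                 8            64
C               6                  3                 3             9
D               5                  4                 1             1
E               7                  5                 2             4
F               2                  6                -4            16
G               4                  7                -3             9
H               8                  8                 0             0
I               1                  9                -8            64
J               3                 10                -7            49
Total                                                            280

$$r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(10^2 - 1)} = 1 - 1.697 = -0.697$$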
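A minimal sketch of the rank-correlation formula applied to the ten persons in the example. It assumes there are no tied ranks, and the function name `spearman_rho` is hypothetical.

```python
def spearman_rho(ranks_1, ranks_2):
    """r = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(ranks_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

rank_in_stat = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
rank_in_income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

rho = spearman_rho(rank_in_stat, rank_in_income)
print(round(rho, 3))
```

A negative value indicates that persons ranked high in statistics tend to be ranked low in income in this sample.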

Features of Correlation Coefficient, r
- Unit free
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship

Interpreting the Result
- r = .733
- There is a relatively strong positive linear relationship between test score #1 and test score #2
- Students who scored high on the first test tended to score high on the second test
- [Figure: Scatter Plot of Test Scores, Test #2 Score against Test #1 Score, both axes from 70 to 100]

Obtaining Linear Relationships, i.e. regression
An equation can be fit to show the best linear relationship between two variables:

Y = a + bX

where Y is the dependent variable and X is the independent variable.

Normal equations for the regression line of y on x:

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2$$

Regression example

sn      x     y    x square   y square    xy
1       1     2        1          4        2
2       2     5        4         25       10
3       3     3        9          9        9
4       4     8       16         64       32
5       5     7       25         49       35
n = 5  15    25       55        151       88

Regression line of y on x:

$$y = a + bx, \qquad \sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2$$

Regression line of x on y:

$$x = a + by, \qquad \sum x = na + b\sum y, \qquad \sum xy = a\sum y + b\sum y^2$$
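Solving the two normal equations for a and b gives closed-form expressions, sketched below for the five-point example. The function name `fit_line` is hypothetical; swapping the arguments gives the regression line of x on y.

```python
def fit_line(xs, ys):
    """Solve sum(y) = n*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]

a_yx, b_yx = fit_line(x, y)   # y = a + b x
a_xy, b_xy = fit_line(y, x)   # x = a + b y
print(round(a_yx, 2), round(b_yx, 2), round(a_xy, 2), round(b_xy, 2))
```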

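Regression coefficient when deviations are taken from the means
Dividing the first normal equation by n:

$$\frac{\sum y}{n} = a + b\,\frac{\sum x}{n} \quad\Rightarrow\quad \bar{y} = a + b\bar{x}$$

This shows that the point of means $(\bar{x}, \bar{y})$ lies on $y = a + bx$. Shifting the origin to $(\bar{x}, \bar{y})$, the second normal equation $\sum xy = a\sum x + b\sum x^2$ takes the form

$$\sum (x - \bar{x})(y - \bar{y}) = a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2 = 0 + b\sum (x - \bar{x})^2$$

because $\sum (x - \bar{x}) = 0$. Writing $X = x - \bar{x}$ and $Y = y - \bar{y}$:

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}$$

thus

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$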

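Regression coefficients of y on x and x on y
So we get the regression lines of y on x and x on y respectively:

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}), \qquad (x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

$$(y - \bar{y}) = b_{yx}(x - \bar{x}), \qquad b_{yx} = \frac{\sum XY}{\sum X^2}$$

$$(x - \bar{x}) = b_{xy}(y - \bar{y}), \qquad b_{xy} = \frac{\sum XY}{\sum Y^2}$$

$$b_{yx} \cdot b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2$$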

Ex: From the following data find the two regression equations.

Sales x   Purchase y   X = x - xbar = x - 90   Y = y - ybar = y - 70   X square   Y square     XY
   91         71                 1                       1                  1          1         1
   97         75                 7                       5                 49         25        35
  108         69                18                      -1                324          1       -18
  121         97                31                      27                961        729       837
   67         70               -23                       0                529          0         0
  124         91                34                      21               1156        441       714
   51         39               -39                     -31               1521        961      1209
   73         61               -17                      -9                289         81       153
  111         80                21                      10                441        100       210
   57         47               -33                     -23               1089        529       759
Total 900    700                 0                       0               6360       2868      3900

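Regression coefficients and regression lines of y on x and x on y

$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{3900}{6360} = 0.613, \qquad b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{3900}{2868} = 1.36$$

Regression line of y on x:

$$(y - \bar{y}) = b_{yx}(x - \bar{x}): \quad (y - 70) = 0.613\,(x - 90), \quad y = 14.83 + 0.613x$$

Regression line of x on y:

$$(x - \bar{x}) = b_{xy}(y - \bar{y}): \quad (x - 90) = 1.36\,(y - 70), \quad x = -5.2 + 1.36y$$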
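The two regression coefficients for the sales and purchase data can be checked with a short sketch that works with deviations from the means. The function name `regression_coefficients` is hypothetical; the product of the two coefficients gives $r^2$.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) = (sum XY / sum X^2, sum XY / sum Y^2)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dev_x = [x - x_bar for x in xs]   # X = x - xbar
    dev_y = [y - y_bar for y in ys]   # Y = y - ybar
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx ** 2 for dx in dev_x)
    sum_yy = sum(dy ** 2 for dy in dev_y)
    return sum_xy / sum_xx, sum_xy / sum_yy

sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
purchase = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

b_yx, b_xy = regression_coefficients(sales, purchase)
r_squared = b_yx * b_xy

# Intercepts of the two lines, from y - ybar = b_yx (x - xbar) and vice versa.
intercept_yx = 70 - b_yx * 90
intercept_xy = 90 - b_xy * 70
print(round(b_yx, 3), round(b_xy, 2), round(r_squared, 3))
```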
