Bivariate Analysis

• It is concerned with the relationship between pairs of variables (X, Y) in a data set.
• It is the simultaneous analysis of two variables.
• It is usually undertaken to see if one variable is related to another variable.
• Three cases arise:
– Both variables are numerical
– Both variables are categorical
– One variable is numerical and the other is categorical
Bivariate Statistical Techniques

• Correlation Analysis
• Linear Regression Analysis
• Association of Attributes
• Two-way ANOVA
Cross Tabulation
• A cross tabulation is a technique that describes two or more
variables simultaneously and results in tables that reflect the
joint distribution of two or more variables that have a limited
number of categories or distinct values.
• Cross tabulation displays the joint distribution of two or more variables.
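A cross tabulation can be sketched in a few lines. The survey responses below are hypothetical and made up purely for illustration, and `cross_tab` is a helper name introduced here, not something from the slides.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration only):
# each pair is (gender, preferred_product).
responses = [
    ("M", "A"), ("F", "B"), ("M", "A"), ("F", "A"),
    ("M", "B"), ("F", "B"), ("F", "B"), ("M", "A"),
]

def cross_tab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = cross_tab(responses)

# Print the joint distribution as a small two-way table.
print("     " + "  ".join(cols))
for r in rows:
    print(f"{r:<4} " + "  ".join(str(table[r][c]) for c in cols))
```

Each cell holds the number of observations that fall in that row category and that column category at the same time, which is the joint distribution described above.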
Correlation Analysis
• Study of relationship between two variables
• If changes in the value of one variable will affect the value of
the other variable then both the variables are correlated.
Types of correlation
• On the Basis of Direction
– Positive Correlation
– Negative Correlation
• On the basis of Number of Variables
– Simple Correlation
– Multiple Correlation
– Partial Correlation
• On the Basis of Ratio of Change
– Linear Correlation
– Non Linear Correlation
Correlation: On the Basis of Direction

• Positive Correlation
– Correlation is positive when two variables vary in the same direction.
– For example, the correlation between sales and advertising expenses:

Firms            A   B   C   D   E   F
Advt. expenses   12  14  17  18  20  23
Sales            20  30  40  50  60  70
• Negative Correlation
– The two variables vary in opposite directions.
– When one variable increases, the other decreases, and vice versa.
– For example, the correlation between production and price of a crop:

Production (Kg)     100  200  300  400  500
Price (Rs per Kg)   10   9    8    7    6
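The direction of the two examples above can be checked numerically. The sketch below uses the two tables as given and computes the coefficient of correlation from deviations about the means; the choice of that formula and the name `pearson_r` are assumptions of this sketch, since these slides do not state a formula.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Data copied from the two tables above.
advt = [12, 14, 17, 18, 20, 23]
sales = [20, 30, 40, 50, 60, 70]
production = [100, 200, 300, 400, 500]
price = [10, 9, 8, 7, 6]

r_pos = pearson_r(advt, sales)        # both rise together
r_neg = pearson_r(production, price)  # one rises while the other falls

print(f"advt vs sales:       r = {r_pos:+.3f}")
print(f"production vs price: r = {r_neg:+.3f}")
```

A positive sign indicates that the variables move in the same direction, a negative sign that they move in opposite directions.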
• Linear Correlation
– Change in values of one variable has a fixed ratio
to the variation in the values of other variable.
– When these variables are plotted on the graph,
then plotted points would fall on a straight line.
Groundnut (Kg)   10  20  30  40  50  60
Oil (Kg)         2   4   6   8   10  12
• Non Linear Correlation
– Change in values of one variable does not have a fixed
ratio to the variation in the value of other variable.
– When these variables are plotted on a graph, then plotted
points would fall on a curve.

Use of fertilizer (in Kg)     1  2  3  4   5   6
Production of rice (in Kg)    4  6  9  16  25  28
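The "fixed ratio" test that separates linear from non-linear correlation can be sketched directly on the two tables above. The helper `has_fixed_ratio` is a name introduced here for illustration.

```python
import math

def has_fixed_ratio(xs, ys):
    """True if the change in y per unit change in x is the same at every step."""
    points = list(zip(xs, ys))
    ratios = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return all(math.isclose(r, ratios[0]) for r in ratios)

# Data copied from the two tables above.
groundnut = [10, 20, 30, 40, 50, 60]
oil = [2, 4, 6, 8, 10, 12]
fertilizer = [1, 2, 3, 4, 5, 6]
rice = [4, 6, 9, 16, 25, 28]

print("groundnut vs oil linear: ", has_fixed_ratio(groundnut, oil))
print("fertilizer vs rice linear:", has_fixed_ratio(fertilizer, rice))
```

For groundnut and oil every step adds 2 Kg of oil per 10 Kg of groundnut, so the points lie on a straight line; for fertilizer and rice the step sizes vary, so they do not.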
• Simple Correlation
– When we measure the linear relationship between two variables then
this relationship is known as simple correlation.
– For example, the relationship between sales and expenses.
• Partial Correlation
– When several related variables are involved and we study the relationship between two of them while holding the effect of the others constant, it is known as partial correlation.
– For example, the correlation between height and weight after removing the effect of a third variable, ‘age’.
• Multiple Correlation
– Measurement of the combined effect of multiple variables on one variable.
– For example, the relationship of rainfall and temperature to the yield of wheat.
Degree of correlation
• Perfect Correlation
– Perfect positive correlation
• two variables change in the same direction and in the same
proportion
• Coefficient of correlation in this case is +1
– Perfect negative correlation
• Two variables change in opposite directions and in the same proportion
• Coefficient of correlation is -1
• Absence of Correlation
– If series of two variables show no relation between them
or change in one variable does not lead to change in the
other variable then it means there is no relationship
between variables.
– Coefficient of correlation is zero.
• Limited degree of correlation
– If two variables are not perfectly correlated or there is an
absence of perfect correlation, then it is referred to as
Limited correlation
– It may be positive, negative or zero.
– Lies within limits +/- 1
• High Degree (+/- 0.75 to +/- 1)
• Moderate Degree (+/- 0.25 to +/- 0.75)
• Low Degree (0 to +/- 0.25)
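[Figure: scale of the correlation coefficient from -1 to +1. -1 = perfect indirect correlation; -1 to -0.75 strong; -0.75 to -0.25 intermediate; -0.25 to 0 weak; 0 = no relation; 0 to 0.25 weak; 0.25 to 0.75 intermediate; 0.75 to 1 strong; +1 = perfect direct correlation]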
If r = 0, there is no association or correlation between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
Degree       Positive correlation coefficient   Negative correlation coefficient
Perfect      +1                                 -1
Limited:
  High       Between +0.75 and +1               Between -0.75 and -1
  Moderate   Between +0.25 and +0.75            Between -0.25 and -0.75
  Low        Between 0 and +0.25                Between 0 and -0.25
Absence      Zero                               Zero
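The table above can be turned into a small lookup. This is only a sketch: the function name `degree_of_correlation` is introduced here, and the boundary values 0.25 and 0.75 are assigned to the higher band, following the inequalities listed before the table.

```python
def degree_of_correlation(r):
    """Classify a correlation coefficient using the bands in the table above."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "absence"
    size = abs(r)
    if size == 1:
        degree = "perfect"
    elif size >= 0.75:
        degree = "high"
    elif size >= 0.25:
        degree = "moderate"
    else:
        degree = "low"
    direction = "positive" if r > 0 else "negative"
    return f"{degree} {direction}"

for value in (1, 0.9, 0.72, 0.1, 0, -0.1, -0.5, -1):
    print(f"r = {value:+.2f}: {degree_of_correlation(value)}")
```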
Correlation Coefficient
• The correlation coefficient is a numerical measure of how strongly
changes in one variable are associated with changes in another
variable.
Methods of Computing Correlation
• Scatter diagram
• Spearman’s Rank Correlation
• Karl Pearson’s Coefficient of Correlation
Scatter Plot/Diagram
• Rectangular coordinate
• Two quantitative variables
• One variable is called independent (X) and
the second is called dependent (Y)
• Points are not joined
• No frequency table
[Figure: two scatter plots of SBP (mmHg) on the vertical axis, 80 to 220, against weight (kg) on the horizontal axis, 60 to 120]
Scatter Diagram: a dot chart used specifically to show the correlation between two variables
• Merits of Scatter Plot
– It is very simple and non-mathematical technique.
– It is not influenced by the size of extreme items
– It is the basic step to find out the relationship
between two variables
• Demerits of Scatter Plot
– It cannot find out exact degree of correlation
between two variables.
– We can only view the visual form of correlation
and direction on the chart
Karl Pearson’s Coefficient of Correlation
• Calculate Karl Pearson's coefficient of correlation for the following data:

X 42 52 55 60 66 68 65 60 58 34
Y 11 13 18 22 26 40 31 27 24 18

X   Y   x = X - X̄   x²   y = Y - Ȳ   y²   xy
42 11
52 13
55 18
60 22
66 26
68 40
65 31
60 27
58 24
34 18
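The blank columns of the worksheet above can be filled in with a short script. It is a sketch that assumes the usual deviation form of Karl Pearson's coefficient, r = ∑xy / √(∑x² × ∑y²), where x and y are deviations from the respective means; the slides show the worksheet columns for this form but not the formula itself.

```python
import math

# Data from the worksheet above.
X = [42, 52, 55, 60, 66, 68, 65, 60, 58, 34]
Y = [11, 13, 18, 22, 26, 40, 31, 27, 24, 18]

n = len(X)
mean_x = sum(X) / n   # 56
mean_y = sum(Y) / n   # 23

# Deviations from the means (the x and y columns of the worksheet).
x = [xi - mean_x for xi in X]
y = [yi - mean_y for yi in Y]

sum_x2 = sum(d * d for d in x)               # total of the x² column
sum_y2 = sum(d * d for d in y)               # total of the y² column
sum_xy = sum(a * b for a, b in zip(x, y))    # total of the xy column

r = sum_xy / math.sqrt(sum_x2 * sum_y2)

print(f"sum x² = {sum_x2:.0f}, sum y² = {sum_y2:.0f}, sum xy = {sum_xy:.0f}")
print(f"r = {r:.4f}")
```

With these data the column totals are ∑x² = 1058, ∑y² = 674 and ∑xy = 643, giving r of about +0.76, a high degree of positive correlation.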
Merits:

1. This method indicates the presence or absence of correlation between two variables and gives the exact degree of their correlation.
2. In this method, we can also ascertain the direction of the correlation: positive or negative.
3. This method has many algebraic properties that make the calculation of the coefficient of correlation, and other related factors, easy.
Demerits:

1. It is more difficult to calculate than other methods.
2. It is much affected by the values of extreme items.
3. It is based on many assumptions, such as a linear relationship and a cause and effect relationship, which may not always hold good.
4. It is very likely to be misinterpreted in the case of homogeneous data.


Assumptions:

Karl Pearson based his formula on the following basic assumptions:

(A) The two variables are affected by many independent causes and form a normal distribution.
(B) A cause and effect relationship exists between the two variables.
(C) The relationship between the two variables is linear.

The coefficient is often denoted by r.
Spearman’s Rank Correlation
• Technique to find the correlation between the ranks of two series
• This technique is used when the values of the variable cannot be measured
quantitatively.
• Professor Charles Spearman worked out a method for determining
correlation in which the values of all data of a series are assigned ranks in
decreasing or increasing (ascending) order.
• In this ranking process, the highest value is given rank 1, the next highest
value is given rank 2, and so on. In some series the values of two or more
observations are the same.
• In that case, the mean of the ranks is shared equally by those observations.
For example, if a series has two observations of 67 each, one at S. No. 3 and
the other at S. No. 10, then instead of being ranked 6 and 7 respectively,
both are ranked 6.5 (the mean of rank 6 and rank 7).
a) Problems where actual ranks are given.

1) Calculate the difference ‘D’ of the two ranks, i.e. (R1 – R2).

2) Square the differences and calculate their sum, i.e. ∑D².

3) Substitute the values obtained in the formula r = 1 - 6∑D² / [n(n² - 1)].


b) Problems where Ranks are not given :

1. If the ranks are not given, then we need to assign ranks to the data
series.
2. The lowest value in the series can be assigned rank 1, or the highest
value in the series can be assigned rank 1.
3. We need to follow the same scheme of ranking for the other series.
4. Then calculate the rank correlation coefficient in the same way as
when the ranks are given.
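c) When the ranks are repeated
r = 1 - 6[∑d² + ∑(m³ - m)/12] / [n(n² - 1)]
Where
r = Rank Coefficient of Correlation
d = Difference between two ranks (R1 - R2)
n = Number of pairs of observations
m = Number of times a value is repeated (one correction term for each group of tied values)

The rank correlation coefficient (r) always lies between -1 and +1.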
• The following are the ranks obtained by 10 students in two subjects, English
and History. To what extent is the knowledge of the students in the two
subjects related?

English 1 2 3 4 5 6 7 8 9 10
History 2 4 1 5 3 10 9 6 7 8
Rank of English (R1)   Rank of History (R2)   D = R1 - R2   D²
1                      2                      -1            1
2                      4                      -2            4
3                      1                      +2            4
4                      5                      -1            1
5                      3                      +2            4
6                      10                     -4            16
7                      9                      -2            4
8                      6                      +2            4
9                      7                      +2            4
10                     8                      +2            4
                                              ∑D² =         46

r = 1 - (6 × 46) / [10(10² - 1)] = +0.72
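The English and History calculation can be reproduced with a short sketch of the formula r = 1 - 6∑D² / [n(n² - 1)]. The function name `spearman_from_ranks` is introduced here for illustration, and the sketch assumes the ranks contain no ties.

```python
def spearman_from_ranks(r1, r2):
    """Rank correlation from two lists of ranks (no tied ranks assumed)."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # ∑D²
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from the table above.
english = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
history = [2, 4, 1, 5, 3, 10, 9, 6, 7, 8]

r = spearman_from_ranks(english, history)
print(f"r = {r:+.2f}")
```

The result, about +0.72, falls in the moderate positive band, close to the boundary with high correlation.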


When ranks are not given
• Calculate the coefficient of correlation by the rank method
X 54 58 85 75 65 90 80 50
Y 120 134 150 115 110 140 142 100
X    Rank (R1)   Y     Rank (R2)   D = R1 - R2   D²
54   7           120   5           2             4
58   6           134   4           2             4
85   2           150   1           1             1
75   4           115   6           -2            4
65   5           110   7           -2            4
90   1           140   3           -2            4
80   3           142   2           1             1
50   8           100   8           0             0
                                   ∑D² =         22

r = 1 - (6 × 22) / [8(8² - 1)] = 0.738
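When only the raw values are given, the ranking step can be automated as well. The sketch below gives rank 1 to the highest value, as in the table above; `rank_descending` is a helper name introduced here, and it deliberately does not handle tied values.

```python
def rank_descending(values):
    """Rank 1 for the highest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_ranks(r1, r2):
    """r = 1 - 6∑D² / [n(n² - 1)] for untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Data from the example above.
X = [54, 58, 85, 75, 65, 90, 80, 50]
Y = [120, 134, 150, 115, 110, 140, 142, 100]

R1 = rank_descending(X)
R2 = rank_descending(Y)
r = spearman_from_ranks(R1, R2)

print("R1 =", R1)
print("R2 =", R2)
print(f"r = {r:.3f}")
```

The same ranking scheme (highest value first) is applied to both series, as required in step 3 of the procedure for ungiven ranks.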


In another example, relating sales and advertisement, n = 8 and ∑d² = 4.
Applying the above formula, we get
r = 1 - 6∑d² / [n(n² - 1)]
r = 1 - (6 × 4) / [8(8² - 1)]
r = 1 - 0.0476
r = 0.95
The high positive value of the rank correlation coefficient indicates
that there is very good agreement between sales and
advertisement.
Tied/Repeated Ranks
• Obtain the rank correlation coefficient for the following data

X 68 64 74 50 64 80 74 40 55 64
Y 62 58 67 45 81 60 67 48 50 70

The value 64 occurs three times in X and would take ranks 5, 6 and 7, so each occurrence receives the mean rank 6.

X    Rank (R1)   Y    Rank (R2)   D = R1 - R2   D²
68   4           62   5           -1            1
64   6           58   7           -1            1
74   2.5         67   3.5         -1            1
50   9           45   10          -1            1
64   6           81   1           5             25
80   1           60   6           -5            25
74   2.5         67   3.5         -1            1
40   10          48   9           1             1
55   8           50   8           0             0
64   6           70   2           4             16
                                  ∑D² =         72
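The tied-rank example can be worked through in code. This sketch assumes the standard textbook correction, in which (m³ - m)/12 is added to ∑D² for every value repeated m times; the helper names `average_ranks` and `tie_correction` are introduced here for illustration.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the highest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1           # first rank position occupied by v
        count = ordered.count(v)               # how many positions v occupies
        rank_of[v] = first + (count - 1) / 2   # mean of those positions
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of (m³ - m)/12 over every value that is repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

# Data from the example above.
X = [68, 64, 74, 50, 64, 80, 74, 40, 55, 64]
Y = [62, 58, 67, 45, 81, 60, 67, 48, 50, 70]

R1, R2 = average_ranks(X), average_ranks(Y)
n = len(X)
sum_d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
correction = tie_correction(X) + tie_correction(Y)

r = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print("R1 =", R1)
print("R2 =", R2)
print(f"sum D² = {sum_d2}, correction = {correction}")
print(f"r = {r:.3f}")
```

Here 74 and 64 are repeated in X (m = 2 and m = 3) and 67 is repeated in Y (m = 2), so the correction is 0.5 + 2 + 0.5 = 3, and r works out to about 0.545.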
• Compute the coefficient of rank correlation between Eco. marks and Statistics marks as given
below:
Solution:
This is a case of tied ranks, as more than one student shares the same
mark in both Economics and Statistics.
For Eco., the student receiving 80 marks gets rank 1, the one getting 62
marks receives rank 2, the student with 60 receives rank 3, and the student
with 56 marks gets rank 4. Since there are two students each
getting 50 marks, each receives a common rank, the
average of the next two ranks 5 and 6, i.e. (5 + 6) / 2 = 5.5. Lastly,
rank 7 goes to the student getting the lowest Eco. marks.
In a similar manner, we award ranks to the students according to their Stats marks.
Computation of Rank Correlation Between Eco Marks and Stats
Marks with Tied Marks

For the Economics marks there is one tie of length 2, and for the
Statistics marks there are two ties of lengths 2 and 3
respectively.
Merits of Spearman’s Rank Correlation

1. This method is simpler to understand and easier to apply than Karl Pearson’s correlation method.

2. This method is useful where we can give only the ranks and not the actual data (qualitative terms).

3. This method should be used where the initial data are in the form of ranks.

Limitations of Spearman’s Rank Correlation

1. It cannot be used for finding correlation in a grouped frequency distribution.

2. This method should not be applied where N exceeds 30.


Regression Analysis

1. Regression analysis is a mathematical measure of the average relationship between two or more variables.

2. Regression analysis is a statistical tool used to predict the value of an unknown variable from a known variable.

Advantages of Regression Analysis

1. Regression analysis provides estimates of values of the dependent variable from the values of the independent variables.

2. Regression analysis also helps to obtain a measure of the error involved in using the regression line as a basis for estimation.

3. Regression analysis helps in obtaining a measure of the degree of association or correlation that exists between the two variables.
Assumptions in Regression Analysis
1. An actual linear relationship exists.

2. The regression analysis is used to estimate values within the range for which it is valid.

3. The relationship between the dependent and independent variables remains the same till the regression equation is calculated.

4. The dependent variable takes any random value, but the values of the independent variables are fixed.

5. In regression, we have only one dependent variable in our estimating equation. However, we can use more than one independent variable.
Regression line
• The regression line is the line which gives the
best estimate of one variable from the value
of any other given variable.
• The regression line gives the average
relationship between the two variables in
mathematical form.
• The regression line has the following
properties, where Yc is the value of Y computed from the line:
a) ∑(Y – Yc) = 0 and
b) ∑(Y – Yc)² = minimum
• For two variables X and Y, there are always two lines of regression.
• Regression line of X on Y: gives the best estimate for the value of X for
any specific given value of Y.
• X = a + bY
• a = X-intercept
• b = Slope of the line
• X = Dependent variable
• Y = Independent variable
• Regression line of Y on X: gives the best estimate for the value of Y for
any specific given value of X.
• Y = a + bX
• a = Y-intercept
• b = Slope of the line
• Y = Dependent variable
• X = Independent variable
• The Explanation of Regression Line
• In case of perfect correlation (positive or negative), the two
lines of regression coincide.
• If the two regression lines are far from each other, the degree of correlation is low,
and vice versa.
• The mean values of X and Y can be obtained as the point of
intersection of the two regression lines.
• The higher the degree of correlation between the variables, the smaller the
angle between the lines, and vice versa.
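The properties listed above can be checked numerically. The sketch below reuses the X and Y data from the Karl Pearson example and assumes the usual least-squares slopes, b(Y on X) = ∑xy / ∑x² and b(X on Y) = ∑xy / ∑y², with x and y as deviations from the means; these slope formulas are not given in the slides.

```python
import math

# Data reused from the Karl Pearson example earlier in these notes.
X = [42, 52, 55, 60, 66, 68, 65, 60, 58, 34]
Y = [11, 13, 18, 22, 26, 40, 31, 27, 24, 18]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
x = [xi - mean_x for xi in X]
y = [yi - mean_y for yi in Y]
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Regression line of Y on X: Y = a_yx + b_yx * X
b_yx = sum_xy / sum_x2
a_yx = mean_y - b_yx * mean_x

# Regression line of X on Y: X = a_xy + b_xy * Y
b_xy = sum_xy / sum_y2
a_xy = mean_x - b_xy * mean_y

# Property a): residuals about the fitted line sum to zero.
residuals = [yi - (a_yx + b_yx * xi) for xi, yi in zip(X, Y)]

r = sum_xy / math.sqrt(sum_x2 * sum_y2)

print(f"Y on X: Y = {a_yx:.3f} + {b_yx:.3f} X")
print(f"X on Y: X = {a_xy:.3f} + {b_xy:.3f} Y")
print(f"sum of residuals = {sum(residuals):.2e}")
print(f"b_yx * b_xy = {b_yx * b_xy:.4f}, r² = {r * r:.4f}")
```

Both lines pass through the point of means (56, 23), which is their point of intersection, and the product of the two slopes equals r². As r approaches +1 or -1 the two lines close up on each other, which is why they coincide under perfect correlation.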
