Module 12 Q1 Bivariateanalysis
QUADRANT –I
2. LEARNING OUTCOME:
After studying this module, you shall be able to
Know the basics of Bivariate analysis
Understand the Cross tabulation
Comprehend the Spearman's Rank Correlation Coefficient
Understand the Correlation Coefficient
3. Introduction
Bivariate analysis is one of the simplest forms of statistical analysis. It investigates the
association between two variables in order to determine the empirical relationship between them.
It can be used to test simple hypotheses of relationship and can help establish the degree to
which knowing the value of one variable makes it easier to understand and predict the value of
the other.
4. Bivariate analysis
There are three types of measures used for carrying out bivariate analysis. These are (a) cross
tabulation (b) Spearman’s rank correlation coefficient and (c) Pearson’s linear correlation
coefficient.
[Figure: Cross tabulation | Introduction of third variable | Spearman Rank Correlation Coefficient]
4.1 Cross-tabulation
Figure 2 Cross Tabulation (Adapted from Indiamart.com)
For the purposes of cross tabulation, the answers to two questions are combined and the data are
organized together. A cross tabulation counts the observations in every cross-category of the two
variables, and its descriptive outcome is a frequency count for each cell of the table. For
example, cross-tabulating a two-category measure of income against a two-category measure of
intention to purchase a product yields the cross-classification shown in Table 1.
In constructing cross-classification tables, one first has to decide which data should be given
primary emphasis and which secondary emphasis. Data with primary emphasis are normally shown in
columns and data with secondary emphasis in rows. The same order is followed for higher-order
tables, i.e. those having three or more dimensions. This convention is almost invariably followed
because figures are easier to read when they follow one another down a column rather than across
a row.
Table 1
                                                    Income
                                         Low income level   High income level
Intention to     Low level of intention        120                 60
purchase         High level of intention        80                190
Respondent count                               200                250
The results of a cross-tabulation are more meaningful if cell frequencies are expressed as
percentages. The nature of the relationship between the variables determines the basis for
calculating category percentages: one variable is treated as independent and the other as
dependent. Here, purchase intention can be treated as the dependent variable and income as the
independent variable. The general practice is to compute percentages in the direction of the
independent variable, so that each category of the independent variable totals 100 per cent
across the categories of the dependent variable. There are 200 respondents with low income, of
whom 120 have a low intention to purchase and 80 a high intention to purchase. Expressed in
percentage terms, 60 per cent of the low-income respondents have a low intention to purchase the
product and 40 per cent have a high intention to purchase. There are 250 respondents with high
income, of whom 60 have a low intention to purchase and 190 a high intention to purchase.
Computing percentages by column, 24 per cent of them have a low purchase intention whereas 76 per
cent have a high purchase intention. The results indicate that purchase intention for the product
increases with income.
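As an illustrative sketch (Python; the dictionary layout and the function name `column_percentages` are our own, not part of the module), the column percentages can be computed directly from the cell counts of Table 1:

```python
# Cell counts from Table 1, keyed by income level (the independent variable).
table_1 = {
    "Low income": {"Low intention": 120, "High intention": 80},
    "High income": {"Low intention": 60, "High intention": 190},
}


def column_percentages(table):
    """Express each cell as a percentage of its column total (independent-variable category)."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * count / total for row, count in cells.items()}
    return result


for column, shares in column_percentages(table_1).items():
    print(column, {row: f"{pct:.0f}%" for row, pct in shares.items()})
```

Each income column sums to 100 per cent, which is what computing percentages in the direction of the independent variable means for this table.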
Thus, any two variables, each having its own set of categories, can be cross-tabulated.
Interpreting the results of a cross tabulation may reveal a high association between the two
variables, but that does not mean that the independent variable causes the dependent variable.
The researcher assumes the direction of causality on the basis of experience or expectations; a
high association between two variables does not by itself imply a cause-and-effect relationship.
[Figure: Initial relationship was spurious]
Table 2
                                                  Income
                                          Low income   High income
Consumption of      High consumption         30%           55%
ice cream           Low consumption          70%           45%
                    Total                   100%          100%
No. of respondents                           600           400
The results indicate that 55 per cent of high-income respondents fall into the high consumption
category, compared with 30 per cent of low-income respondents. Before inferring that high-income
respondents consume more ice cream than low-income respondents, a third variable, gender, is
introduced into the analysis. The results are reported in Table 3.
Table 3
                                                       Gender
                                   Males                            Females
                        Low income    High income        Low income    High income
High consumption           30%            38%               20%            60%
Low consumption            70%            62%               80%            40%
Column total              100%           100%              100%           100%
No. of respondents         400            180               200            220
In Table 3, the gender of the respondent is introduced as the third variable, and the relationship
between ice cream consumption and income is re-examined in its light. Among females, 60 per cent
of those with high income fall in the high consumption category, compared with 20 per cent of
those with low income. Among males, 38 per cent of those with high income fall in the high
consumption category, compared with 30 per cent of those with low income. The percentages are
therefore much closer for males than for females. The relationship between ice cream consumption
and income has thus been refined by introducing gender as a third variable: high-income
respondents are more likely to fall in the high consumption category, and this is more so for
females than for males.
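The refinement can be made concrete by comparing percentage-point gaps. The sketch below (Python; variable names are illustrative) takes the high-consumption percentages from Tables 2 and 3 and computes the gap between high- and low-income respondents overall and within each gender. It only restates the tabulated percentages and does not test significance.

```python
# Percentage of respondents in the high-consumption category.
overall = {"Low income": 30, "High income": 55}  # Table 2
by_gender = {                                     # Table 3
    "Males": {"Low income": 30, "High income": 38},
    "Females": {"Low income": 20, "High income": 60},
}

# Percentage-point gap between high- and low-income respondents.
overall_gap = overall["High income"] - overall["Low income"]
gaps = {g: row["High income"] - row["Low income"] for g, row in by_gender.items()}

print("Overall gap:", overall_gap)  # Overall gap: 25
print("Gap by gender:", gaps)       # Gap by gender: {'Males': 8, 'Females': 40}
```

The single 25-point gap splits into a small gap for males and a large one for females, which is the refinement described above.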
The table indicates that 35 per cent of respondents with high education own a flat in a high-rise
building, as opposed to 22 per cent of those with low education. When a third variable, income
(categorized as low and high), is introduced, the results are as shown in Table 5.
Table 5
                                                       Income
                                         Low income                     High income
                                  High         Low               High         Low
                                  education    education         education    education
Ownership of flats     Yes          18%          18%               45%          45%
in high-rise           No           82%          82%               55%          55%
buildings
Column total                       100%         100%              100%         100%
No. of respondents                  100          300               200          200
Table 5 shows that, within each income level, the percentage owning a flat in a high-rise building
is the same for both education levels (18 per cent for low income, 45 per cent for high income).
Ownership therefore depends on income rather than education: it is higher for high-income than for
low-income respondents, indicating that the initial relationship between education and ownership
was spurious.
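A minimal sketch of the same kind of check for Table 5 (Python; names are illustrative): holding income fixed, the education gap in ownership vanishes, while holding education fixed, the income gap remains.

```python
# Percentage owning a flat in a high-rise building ("Yes" row of Table 5).
owns_flat = {
    "Low income": {"High education": 18, "Low education": 18},
    "High income": {"High education": 45, "Low education": 45},
}

# Education gap with income held fixed (percentage points).
education_gap = {inc: row["High education"] - row["Low education"] for inc, row in owns_flat.items()}

# Income gap with education held fixed.
income_gap = {
    edu: owns_flat["High income"][edu] - owns_flat["Low income"][edu]
    for edu in ("High education", "Low education")
}

print(education_gap)  # {'Low income': 0, 'High income': 0}
print(income_gap)     # {'High education': 27, 'Low education': 27}
```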
Table 7 shows that 65 per cent of males above 35 years of age have a high desire to go to the
temple, whereas 70 per cent of females below 35 years of age have a high desire to go to the
temple; the effect of age runs in opposite directions for the two genders. Therefore, the
introduction of a third variable has revealed the suppressed relationship between the desire to
visit the temple and age.
Table 7
                                                     Gender
                                          Male                          Female
                                 Below 35     Above 35         Below 35     Above 35
                                 years        years            years        years
Desire to visit     High           35%          65%              70%          30%
temple              Low            65%          35%              30%          70%
Column total                      100%         100%             100%         100%
No. of respondents                 300          300              100          100
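For Table 7, with the two age columns read as below and above 35 years of age, the within-gender age gaps can be computed in the same way (Python sketch; names are illustrative). The gaps have opposite signs for males and females, which is the pattern the text describes.

```python
# Percentage with a high desire to visit the temple ("High" row of Table 7).
high_desire = {
    "Male": {"Below 35": 35, "Above 35": 65},
    "Female": {"Below 35": 70, "Above 35": 30},
}

# Age gap (above 35 minus below 35) within each gender, in percentage points.
age_gap = {g: row["Above 35"] - row["Below 35"] for g, row in high_desire.items()}

print(age_gap)  # {'Male': 30, 'Female': -40}
```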
Table 8 indicates that 60 per cent of the large households buy large-size toothpaste whereas 60
per cent of the small households buy small-size toothpaste. When income, categorized as low and
high, is introduced as a third variable, the results are as shown in Table 9.
Table 9
                                                     Income
                                          Low income                     High income
                                 Large        Small             Large        Small
                                 household    household         household    household
Size of             Large          60%          40%               60%          40%
toothpaste          Small          40%          60%               40%          60%
Column total                      100%         100%              100%         100%
No. of respondents                 100          150               100          250
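Even with the introduction of the third variable, income, the initial relationship remains
unchanged: in both income groups, 60 per cent of large households buy large-size toothpaste and 60
per cent of small households buy small-size toothpaste.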
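For Table 9 (Python sketch; names are illustrative), the gap in large-size purchases between large and small households is the same in both income groups, and it equals the 20-point gap implied by the Table 8 figures quoted in the text, assuming there are only two pack sizes.

```python
# Percentage buying large-size toothpaste ("Large" row of Table 9).
buys_large = {
    "Low income": {"Large household": 60, "Small household": 40},
    "High income": {"Large household": 60, "Small household": 40},
}

# Household-size gap within each income group, in percentage points.
size_gap = {inc: row["Large household"] - row["Small household"] for inc, row in buys_large.items()}

# Gap implied by Table 8 as quoted in the text: 60% of large households buy the large size,
# and 60% of small households buy the small size, i.e. 40% buy large (two pack sizes assumed).
table_8_gap = 60 - (100 - 60)

print(size_gap, table_8_gap)  # {'Low income': 20, 'High income': 20} 20
```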
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where d = difference between the two ranks of an item (R1 - R2),
n = number of pairs of ranks, and
rs = Spearman's rank correlation coefficient.
Example:
Ten salesmen employed by a company were given a month's training. At the end of the training,
they took a test and were ranked on the basis of their performance. They were then posted to their
respective areas, and at the end of six months they were rated on their sales performance. The two
sets of ranks are shown below:
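Salesman                               1   2   3   4   5   6   7   8   9  10
Rank obtained in training (R1)         4   6   1   3   9   7  10   2   8   5
Rank based on sales performance (R2)   5   8   3   1   7   6   9   2  10   4
d = R1 - R2                           -1  -2  -2   2   2   1   1   0  -2   1
d²                                     1   4   4   4   4   1   1   0   4   1

∑d² = 24

$$ r_s = 1 - \frac{6 \times 24}{10(10^2 - 1)} = 1 - \frac{144}{990} = 0.855 $$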
A coefficient of 0.855 shows a very high degree of correlation between the performance in
training and the sales performance of the ten salesmen.
First, we set up the null hypothesis that there is no correlation between the ranked data. From
the table of critical values of Spearman's rank correlation, the critical value of rs for n = 10 at
the 5 per cent level of significance is 0.6364. As the computed value of rs, 0.855, exceeds the
critical value, the null hypothesis is rejected. It can be inferred that performance in training
and sales performance are associated for the sample of ten salesmen.
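The salesman example can be reproduced with a short sketch (Python; the function names are our own). It applies Spearman's formula rs = 1 - 6∑d²/(n(n² - 1)), which assumes there are no tied ranks, and compares the result with the critical value 0.6364 quoted in the text rather than looking it up.

```python
def sum_squared_rank_differences(r1, r2):
    """Sum of d^2, where d = R1 - R2 for each item."""
    if len(r1) != len(r2):
        raise ValueError("both rankings must cover the same items")
    return sum((a - b) ** 2 for a, b in zip(r1, r2))


def spearman_rs(r1, r2):
    """rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid when there are no tied ranks."""
    n = len(r1)
    return 1 - 6 * sum_squared_rank_differences(r1, r2) / (n * (n ** 2 - 1))


training = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]  # rank obtained in training (R1)
sales = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]     # rank based on sales performance (R2)

rs = spearman_rs(training, sales)
CRITICAL_RS = 0.6364  # n = 10, 5% level, value quoted in the text

print(sum_squared_rank_differences(training, sales), round(rs, 3))  # 24 0.855
print("Reject H0" if abs(rs) > CRITICAL_RS else "Do not reject H0")  # Reject H0
```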
Where ranks are not given, we have to assign ranks to the actual data. We may do so by giving rank
1 either to the highest value or to the lowest value, provided the same convention is used for
both variables.
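Where the data are raw values, ranks can be assigned programmatically. This Python sketch (the function name `assign_ranks` and the sample scores are hypothetical) gives rank 1 to the highest value by default, or to the lowest with `highest_first=False`. Tied values receive the average of the ranks they span; that tie rule is a common convention assumed here, not something the text specifies, and with ties the 6∑d² formula is only an approximation.

```python
def assign_ranks(values, highest_first=True):
    """Rank values with rank 1 for the highest (or lowest) value; ties share the average rank."""
    # Indices sorted by value; sorted() is stable, so tied values keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


marks = [72, 85, 60, 85, 90]  # hypothetical scores
print(assign_ranks(marks))                       # [4.0, 2.5, 5.0, 2.5, 1.0]
print(assign_ranks(marks, highest_first=False))  # [2.0, 3.5, 1.0, 3.5, 5.0]
```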
The scatter diagram method of ascertaining the correlation between two variables is to prepare a
dot chart. The paired values are plotted on graph paper as dots, and by looking at the scatter the
investigator can form an idea of the relationship between the variables.
In the graphic method, the individual values of the two variables are plotted on graph paper,
giving two curves, one for the X variable and one for the Y variable. Examining the direction and
closeness of the two curves indicates whether the variables are related.
The direction and strength of the linear association between two paired numerical variables is
measured by Pearson's correlation coefficient r.
When the two variables move in the same direction, the correlation between them is positive.
When they move in opposite directions, the correlation is negative. When their movements show no
connection with each other, there is zero correlation.
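$$ r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
where \bar{x} and \bar{y} are the means of X and Y.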
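A direct transcription of Pearson's formula, r = ∑(x - x̄)(y - ȳ) / √(∑(x - x̄)² ∑(y - ȳ)²), into Python (the function name is ours), checked on the salesmen's ranks from the Spearman example. When there are no ties, Pearson's r computed on ranks equals Spearman's rs, so the result should agree with 0.855.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviations about the means of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


training = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]
sales = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]
print(round(pearson_r(training, sales), 4))  # 0.8545
```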
The statistical test for the significance of a correlation coefficient is conducted using a t
statistic. The hypotheses to be tested are:

$$ H_0: \rho = 0 \qquad H_1: \rho \neq 0 $$

The test statistic is given by

$$ t_{n-2} = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} $$

Given the values of r and n, the test statistic t can be computed. For a given level of
significance, if the absolute value of the computed t is greater than the tabulated t with n - 2
degrees of freedom, the null hypothesis of no correlation between X and Y is rejected.
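A sketch of this t test in Python (the function name is ours), using t = r√(n - 2)/√(1 - r²). The inputs r = 0.855 and n = 10 are borrowed from the salesman example purely as illustrative numbers, and the critical value 2.306 (two-tailed, 5 per cent, 8 degrees of freedom) is hard-coded from a standard t table instead of being computed.

```python
from math import sqrt


def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to the t distribution with n - 2 df."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.855, 10    # illustrative inputs borrowed from the salesman example
T_CRITICAL = 2.306  # two-tailed 5% point of t with 8 df, from a standard t table

t = correlation_t(r, n)
print(f"t = {t:.2f} with {n - 2} df:", "reject H0" if abs(t) > T_CRITICAL else "do not reject H0")
```

The absolute value is compared because the alternative hypothesis ρ ≠ 0 is two-sided.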
Summary
Bivariate analysis examines the relationship between two variables. Three types of measures are
used for carrying out bivariate analysis: (a) cross tabulation, (b) Spearman's rank correlation
coefficient and (c) Pearson's linear correlation coefficient. Spearman's rank correlation
coefficient is a non-parametric measure of the degree of association between two variables, used
when statistical series are ranked according to size. Correlation is a measure of association
between two numerical variables. When two variables are involved we speak of simple correlation,
and when more than two variables are involved the subject matter is called multiple correlation.
Correlation analysis helps in determining the degree of relationship between two variables; it
does not determine a cause-and-effect relationship. Even a high degree of correlation does not
indicate that a relation of cause and effect exists between the variables. The scatter diagram
method of ascertaining correlation between two variables is to prepare a dot chart: the data are
plotted on graph paper as dots, and by looking at the scatter the investigator can form an idea of
the relationship between the variables.