Module 12 Q1 Bivariateanalysis

Items Description of Module
Subject Name Management

Paper Name Research Methodology
Module Title Bivariate analysis
Module ID Module 12
Pre-Requisites Understanding the nature of bivariate analysis
Objectives To study bivariate analysis
Keywords Bivariate analysis, Cross tabulation, Spearman, Correlation Coefficient
Role Name Affiliation
Principal Investigator Prof. Ipshita Bansal Department of Management Studies, BPSMV, Khanpur Kalan, Sonipat
Co-Principal Investigator
Paper Coordinator Prof. S.P. Singh Department of Management Studies, GKV, Haridwar
Content Writer (CW) Prof. S.P. Singh Department of Management Studies, GKV, Haridwar
Content Reviewer (CR)
Language Editor (LE)
QUADRANT –I
1. Module : Bivariate Analysis

2. Learning Outcome
3. Meaning of Bivariate analysis
4. Cross tabulation
5. Spearman’s Rank Correlation Coefficient
6. Correlation Coefficient
Summary
1. Module: Bivariate Analysis
2. LEARNING OUTCOME:
After studying this module, you shall be able to
• Know the basics of Bivariate analysis
• Understand Cross tabulation
• Comprehend Spearman’s Rank Correlation Coefficient
• Understand the Correlation Coefficient
3. Introduction
Bivariate analysis is one of the simplest forms of statistical analysis. It investigates the
association between two variables in order to determine the empirical relationship between them.
It can be used to test simple hypotheses of relationship and helps establish the degree to which
the value of one variable can be understood and anticipated when the value of the other variable
is known.
4. Bivariate analysis
There are three types of measures used for carrying out bivariate analysis. These are (a) cross
tabulation (b) Spearman’s rank correlation coefficient and (c) Pearson’s linear correlation
coefficient.
Figure 1 Bi-variate Analysis (diagram with boxes: Cross tabulation; Introduction of third variable; Spearman Rank Correlation Coefficient)
4.1 Cross-tabulation
Figure 2 Cross Tabulation (Adapted from Indiamart.com)
For the purposes of cross tabulation, the answers to two questions are combined and the data are
organized together. A cross tabulation counts the observations in every individual cross-category
of two variables. The descriptive outcome of a cross tabulation is expressed as a frequency count
for each cell in the analysis. For example, cross-tabulating a two-category measure of income
with a two-category measure of intention to purchase a product yields the cross-classification
exhibited in Table 1.
In constructing cross-classification tables, one has to first determine which data should be given
primary emphasis and which should be given secondary emphasis. Data with primary emphasis
are normally given in columns while those with secondary emphasis are shown in rows. This
order is repeated for higher order tables, that is, those having three or more dimensions. This
convention is almost invariably followed because it is easier to read data when figures follow one
another in a column rather than in a row.
Table 1
                                        Income
Intention to purchase                   Low income level    High income level
Low level of intention to purchase      120                 60
High level of intention to purchase     80                  190
Respondent count                        200                 250
The results of a cross-tabulation are more meaningful if cell frequencies are computed in
percentage terms. The nature of the relationship between the variables determines the basis for
calculating category percentages. The variables can be treated as independent and dependent
variables. Here, purchase intention could be treated as the dependent variable and income as the
independent variable. The general practice is to cast percentages in the direction of the
independent variable, across the dependent variable. There are 200 respondents with low income,
of whom 120 have low purchase intention and 80 have high intention to purchase. Expressed in
percentage terms, 60 per cent of the respondents with low income have a low level of intention to
purchase the product and 40 per cent have a high level of intention to purchase. There are 250
respondents with a high level of income, of whom 60 have a low level of intention to purchase
and 190 have a high level of intention to purchase the product. Computing percentages by
column, 24 per cent have low purchase intention whereas 76 per cent have high purchase
intention for the product. The results indicate that with an increase in income, the purchase
intention for the product increases.
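The column percentages for Table 1 can be checked with a few lines of code. The following is a minimal sketch in Python; the dictionary layout and the function name column_percentages are assumptions made for illustration and do not come from the module.

```python
# Cell counts from Table 1: outer keys are income groups (columns),
# inner keys are purchase-intention categories (rows).
counts = {
    "Low income": {"Low intention": 120, "High intention": 80},
    "High income": {"Low intention": 60, "High intention": 190},
}

def column_percentages(table):
    """Return each cell as a percentage of its column (income group) total."""
    result = {}
    for income, cells in table.items():
        total = sum(cells.values())
        result[income] = {k: 100.0 * v / total for k, v in cells.items()}
    return result

percentages = column_percentages(counts)
for income, cells in percentages.items():
    for intention, pct in cells.items():
        print(f"{income:12s} {intention:15s} {pct:5.1f}%")
```

Percentages are taken within each income group because income is treated as the independent variable, so each column sums to 100.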
Thus, two variables, each possessing certain categories, can be cross-tabulated. The cross
tabulation results may show a high association between the two variables. That does not mean
that the independent variable is the cause of the dependent variable. The researcher assumes the
causality between the two variables on the basis of experience or expectations. A high association
between two variables does not imply a relationship of cause and effect.
4.2 Introduction of third variable

Once the relationship between the two variables has been established, the researcher may
introduce a third variable into the analysis to elaborate and refine the initially observed
relationship between the two variables. The main question being asked is whether the
interpretation of the relationship is modified by the introduction of the third variable. There are
four possibilities on introducing the third variable:
(a) It may refine the association that was originally observed between the two variables.
(b) It may show that there was no association between the initial variables, that is, the original
association was spurious.
(c) It may reveal an association between the original two variables although no association was
observed originally.
(d) It may not change the initial association between the two variables.
Figure 3 Steps in Introduction of a third variable (diagram: introduction of third variable leading to four outcomes: refining the initial relationship; initial relationship was spurious; reveal suppressed relation; no change in initial relationship)
4.2.1 Refining the initial relationship

The data in Table 2 represent the relationship between consumption of ice cream and income
level. The respondents are divided into two categories, high consumption and low consumption,
based on the amount of ice cream consumed. Similarly, the variable income is divided into two
categories: low income and high income.
Table 2
                                Income
Consumption of ice cream        Low income    High income
High consumption                30%           55%
Low consumption                 70%           45%
Total                           100%          100%
No. of respondents              600           400
The results indicate that 55 per cent of high income respondents fall into the high consumption
category as compared to 30 per cent of low income respondents. Before inferring that high
income respondents consume more ice cream than low income respondents, a third variable,
namely gender, is introduced into the analysis. The results are reported in Table 3.
Table 3
                                      Gender
                      Males                        Females
                      Low income    High income    Low income    High income
High consumption      30%           38%            20%           60%
Low consumption       70%           62%            80%           40%
Column total          100%          100%           100%          100%
No. of respondents    400           180            200           220
In Table 3 the gender of the respondent is introduced as the third variable. The relationship
between consumption of ice cream and income of respondents is re-examined in the light of the
third variable. Among females, 60 per cent of those with high income fall in the high
consumption category as compared to 20 per cent of those with low income. Among males, 38
per cent of those with high income fall in the high consumption category as compared to 30 per
cent of those with low income. The percentages are therefore much closer for males than for
females. The relationship between ice cream consumption and income has thus been refined by
the introduction of a third variable, namely gender. High income respondents are more likely to
fall in the high consumption category, and this is more so for females than for males.
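The refinement can be expressed as the difference, in percentage points, between the high income and low income shares within each gender. The sketch below is illustrative only; the variable names are assumptions, and the figures are the high consumption percentages from Table 3.

```python
# Share of respondents in the high-consumption category, from Table 3 (per cent).
high_consumption = {
    "Males": {"Low income": 30, "High income": 38},
    "Females": {"Low income": 20, "High income": 60},
}

# Income effect within each gender: high-income share minus low-income share.
income_gap = {
    gender: shares["High income"] - shares["Low income"]
    for gender, shares in high_consumption.items()
}
print(income_gap)
```

A larger gap for one gender than for the other is what is meant by the third variable refining the initial relationship.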
4.2.2 Initial relationship was spurious

A study was conducted to examine the relation between the ownership of a flat in a high-rise
building and education level. The ownership of a flat was categorized as yes or no, whereas the
variable education was categorized as low education and high education. The results of the study
are shown in Table 4.
Table 4
Ownership of flats in high rise buildings    High Education    Low Education
Yes                                          35%               22%
No                                           65%               78%
Column total                                 100%              100%
No. of respondents                           300               500
The table indicates that 35 per cent of respondents with high education own a flat in a high rise
building as opposed to 22 per cent of those with low education. When a third variable, income,
categorized as low and high income, is introduced, the results are as shown in Table 5.
Table 5
                                                           Income
                                             Low income                        High income
Ownership of flats in high rise buildings    High education    Low education   High education    Low education
Yes                                          18%               18%             45%               45%
No                                           82%               82%             55%               55%
Column total                                 100%              100%            100%              100%
No. of respondents                           100               300             200               200
The results on the table show that irrespective of the education level, the ownership of flat in high
rise buildings depends upon the income level. It is more for the high income respondents than that
for the low income respondents, indicating that the initial relationship was spurious.
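How an association can appear between education and ownership even though ownership rates are identical within each income group can be seen by pooling the income groups of Table 5. The following Python sketch is an illustration under stated assumptions: the data structure and the function name ownership_by_education are hypothetical, and the owner counts are reconstructed from the printed percentages and respondent numbers.

```python
# Table 5: (income, education) -> (number of respondents, per cent owning a flat).
table5 = {
    ("Low income", "High education"): (100, 18),
    ("Low income", "Low education"): (300, 18),
    ("High income", "High education"): (200, 45),
    ("High income", "Low education"): (200, 45),
}

def ownership_by_education(table):
    """Pool the income groups and return per cent owning a flat per education level."""
    owners, totals = {}, {}
    for (_income, education), (n, pct_yes) in table.items():
        owners[education] = owners.get(education, 0.0) + n * pct_yes / 100.0
        totals[education] = totals.get(education, 0) + n
    return {edu: 100.0 * owners[edu] / totals[edu] for edu in totals}

pooled = ownership_by_education(table5)
print(pooled)
```

Pooling gives a higher ownership share for the high education group only because a larger proportion of that group has high income. The pooled shares computed this way differ somewhat from the percentages printed in Table 4, but the direction of the difference is the same.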
4.2.3 Reveal suppressed association

A study was conducted to examine the relationship between the desire to visit temple and age.
The desire to visit temple was categorized as low and high, and age was categorized as younger
respondents (age less than 35 years) and older respondents (at least 35 years of age). The cross
tabulation of the data resulted in Table 6.
The results in Table 6 show that the desire to visit temple is independent of age. When gender is
added as the third variable, the results obtained are summarized in Table 7.
Table 6
                                Age
Desire to visit temple          < 35 years    ≥ 35 years
High                            50%           50%
Low                             50%           50%
Column total                    100%          100%
No. of respondents              400           400
It is seen that 65 per cent of males aged 35 years or more have a high desire to go to temple,
whereas 70 per cent of females below 35 years of age have a high desire to go to temple. Among
males the desire rises with age, while among females it falls. Therefore, the introduction of the
third variable has revealed the suppressed relationship between desire to visit temple and age.
Table 7
                                      Gender
                          Male                          Female
Desire to visit temple    < 35 years    ≥ 35 years      < 35 years    ≥ 35 years
High                      35%           65%             70%           30%
Low                       65%           35%             30%           70%
Column total              100%          100%            100%          100%
No. of respondents        300           300             100           100
4.2.4 No change in initial relationship

There are situations when the introduction of a third variable does not change the initial
relationship. Consider the data in Table 8, where one variable is the size of toothpaste bought by
the families and the other variable is the size of the household. The size of toothpaste was
categorized as small and large, and the size of household was categorized as small and large.
Table 8
                          Household size
Size of toothpaste        Large         Small
Large                     60%           40%
Small                     40%           60%
Column total              100%          100%
No. of respondents        200           300
Table 8 indicates that 60 per cent of the large households buy large sized toothpaste whereas 60
per cent of small households buy small-size toothpaste. Now if income categorized as low income
and high income is introduced as third variable, the new table is table 9.
Table 9
                                           Income
                          Low income                            High income
Size of toothpaste        Large household    Small household    Large household    Small household
Large                     60%                40%                60%                40%
Small                     40%                60%                40%                60%
Column total              100%               100%               100%               100%
No. of respondents        100                150                100                250
The results in Table 9 indicate that, within both the low income and the high income groups, 60
per cent of the large households buy large-sized toothpaste whereas 60 per cent of small
households buy small-size toothpaste. It is found that even with the introduction of the third
variable, i.e., income, the initial relationship remains unchanged.
4.3 Spearman Rank Correlation Coefficient

Spearman’s rank correlation coefficient is one of the non-parametric measures used to assess the
degree of association between two variables. Sometimes, we come across statistical series that
are ranked according to size because the exact magnitude of individual items in the series cannot
be ascertained. In such cases Karl Pearson’s coefficient of correlation cannot be calculated.
Instead, Spearman’s rank correlation is used. This method is based on the ranks of the
observations rather than on a specific distribution of X and Y. There are two types of problems in
rank correlation: (i) when actual ranks are given and (ii) when ranks are not given.
When ranks are given
The rank correlation coefficient is calculated with the help of the following formula:
\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]
Where, d = difference of two ranks (R1 - R2)
n = number of paired observations
rs = Spearman’s correlation or rank correlation
Example:
Ten salesmen employed by a company were given a month’s training. At the end of the training,
they took a test and were ranked on the basis of their performance. They were then posted to
their respective areas. At the end of six months they were rated in respect of their sales
performance. These ranks are shown below:
Salesmen                            1    2    3    4    5    6    7    8    9    10
Rank obtained in training           4    6    1    3    9    7    10   2    8    5
Rank based on sales performance     5    8    3    1    7    6    9    2    10   4
Salesmen    Rank obtained in    Rank obtained on the    Difference     Differences
            training (X)        basis of sales (Y)      (X - Y = d)    squared (d²)
1           4                   5                       -1             1
2           6                   8                       -2             4
3           1                   3                       -2             4
4           3                   1                       2              4
5           9                   7                       2              4
6           7                   6                       1              1
7           10                  9                       1              1
8           2                   2                       0              0
9           8                   10                      -2             4
10          5                   4                       1              1
                                                        ∑d²            24
\[ r_s = 1 - \frac{6 \times 24}{10(10^2 - 1)} = 1 - \frac{144}{990} = 0.855 \]
A coefficient of 0.855 shows a very high degree of correlation between the performance in
training and the sales performance of the ten salesmen.
To test significance, we first set up the null hypothesis that there is no correlation between the
ranked data. From the table of Spearman’s rank correlation values, the critical value of rs for
n = 10 at the 5 per cent level of significance is 0.6364. As the calculated value of the rank
correlation rs is 0.855, which is more than the critical value, the null hypothesis is rejected. It can
be inferred that the performance in training and the sales performance of a sample of ten
salesmen are associated.
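The worked example for the ten salesmen can be reproduced in code. The sketch below, in Python, is a minimal illustration: the function name spearman_from_ranks is an assumption, the ranks are those of the example, and the critical value is the one quoted in the text rather than one computed here.

```python
# Ranks of the ten salesmen from the worked example.
training_rank = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]
sales_rank = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rs = spearman_from_ranks(training_rank, sales_rank)
critical_value = 0.6364  # value quoted in the text for n = 10 at the 5 per cent level
reject_null = abs(rs) > critical_value
print(round(rs, 3), reject_null)
```

The comparison uses the absolute value of rs so that a strong negative rank correlation would also lead to rejection of the null hypothesis.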
In cases where ranks are not given, we have to assign ranks to actual data. We may do so by
taking highest value as 1 or the lowest value as 1.
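Assigning ranks to actual data can be sketched as follows. This Python function is hypothetical and for illustration only: the name assign_ranks, the sample scores, and the treatment of tied values (each tied value receives the average of the ranks it would occupy, a common convention that the module does not discuss) are all assumptions.

```python
def assign_ranks(values, highest_first=True):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of equal values starting at this position.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + 1 + end + 1) / 2  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks

scores = [78, 92, 65, 92, 70]  # hypothetical test scores, not from the text
print(assign_ranks(scores))
```

Setting highest_first to False gives rank 1 to the lowest value instead. Whichever direction is chosen, the same direction should be used for both variables.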
Spearman’s r is a distribution free or non-parametric measure of correlation. This is due to the
fact that no strict assumptions are made regarding the underlying distribution from which the
sample observations have been taken. As such the result may not be as dependable as in case of
ordinary correlation where the distribution is known.
4.4 Correlation Analysis

A commonly used measure of association between two numerical variables is the correlation
analysis. When we are dealing with two variables we are talking in terms of simple correlation
and when more than two variables are involved, the subject matter of interest is called multiple
correlations.
Correlation analysis helps us in determining the degree of relationship between two variables; it
does not determine a cause and effect relationship. Even a high degree of correlation does not
indicate a relation of cause and effect existing between the variables. By itself it establishes only
covariation.
There are various methods of ascertaining correlation between two variables. These are:
• Scatter Diagram
• Graphic Method
• Karl Pearson’s Correlation Analysis
• Concurrent Deviation Method, and
• Least Squares Method
The scatter diagram method of ascertaining correlation between two variables is to prepare a dot
chart. The data are plotted on graph paper in the form of dots. By looking at the scatter the
investigator can form an idea about the relationship between the variables.
In graphic method the individual values of the two variables are plotted on the graph paper.
The two curves are obtained, one for X variable and another for Y variable. The examination
of the direction and nearness of the two curves can indicate whether the variables are related.
The direction and strength of the linear association between two numerical paired variables is
measured by the coefficient r.
The movement of the two variables in the same direction tells about the positive correlation
between the two variables. When two variables move in the opposite direction, the correlation
is negative. When the variables move in no connection with each other, there is zero
correlation.
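A quantitative estimate of the linear correlation between two variables X and Y is given by
Karl Pearson’s coefficient:
\[ r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} \]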
The statistical test for the significance of a correlation coefficient is conducted using a
t-statistic. The hypothesis to be tested is:
H0 : ρ = 0 H1 : ρ ≠ 0
The test statistic is given by
\[ t_{n-2} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \]
Given the values of r and n, the value of the test statistic t can be computed. For a given level of
significance, if the absolute value of the computed t is greater than the tabulated t with n-2
degrees of freedom, the null hypothesis of no correlation between X and Y is rejected.
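The calculation of Karl Pearson’s coefficient and the associated t-statistic can be sketched in code. The Python example below is illustrative only: the function names pearson_r and t_statistic and the five paired observations are hypothetical, and the deviation-from-mean form of the coefficient is used.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 4), round(t, 4))
```

The computed t is then compared with the tabulated t for n-2 degrees of freedom at the chosen level of significance. The t_statistic function is undefined when r is exactly 1 or -1, since the denominator becomes zero.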
Summary
Bivariate analysis examines the relationship between two variables. There are three types of
measures used for carrying out bivariate analysis. These are (a) cross tabulation (b) Spearman’s
rank correlation coefficient and (c) Pearson’s linear correlation coefficient. Spearman’s rank
correlation coefficient is one of the non-parametric tests used to measure the degree of association
between two variables. Sometimes, we come across statistical series that are ranked according to
size. Correlation is a measure of association between two numerical variables. When we are
dealing with two variables we are talking in terms of simple correlation and when more than two
variables are involved, the subject matter of interest is called multiple correlations. Correlation
analysis helps us in determining the degree of relationship between two variables; it does not
determine a cause and effect relationship. Even a high degree of correlation does not indicate a
relation of cause and effect existing between the variables. The scatter diagram method of
ascertaining correlation between two variables is to prepare a dot chart. The data are plotted on
graph paper in the form of dots. By looking at the scatter the investigator can form an idea about
the relationship between the variables.
