
There is always something new to experience!!!!
Always curious to learn!!!
We are here to learn those interesting things!!!
What’s next?
Today’s challenge is: LOOKING FOR RELATIONS AND ASSOCIATIONS!!!
LOOKING FOR RELATIONS... AMONG A SET OF VARIABLES


Looking at the following table, can we say that the higher the category of officers, the higher the degree of satisfaction?
Categories of Officers (Category I = Senior Post)

Satisfaction     I     II    III    IV
High            40     60     52    48
Medium         103     87     82    88
Low             57     53     66    64

THINK
• Do you think your credit card has boosted your purchasing power?
a) Strongly Agree
b) Somewhat Agree
c) Neither Disagree nor Agree
d) Disagree
e) Strongly Disagree
• Do you feel you end up buying more when using a credit card?
a) Strongly Agree
b) Somewhat Agree
c) Neither Disagree nor Agree
d) Disagree
e) Strongly Disagree
What is the thing you wish to ‘PUT ON TEST’?
What to do …?
Research Project: “TABOOS ABOUT ABORTION IN INDIAN SOCIETY”

One of the research issues in it:
“Under what circumstances do people like to have an abortion?”
The researcher obtained responses from male and female respondents.
The data is as follows:
FEELINGS ABOUT ABORTION (rows) by GENDER (columns)

                                   MALE   FEMALE   TOTAL
Whenever they want abortion         466      448     914
Only in special circumstances       345      383     728
Should not be allowed                38       46      84
TOTAL                               849      877    1726

IS THERE ANY ‘DEGREE OF ASSOCIATION’ BETWEEN GENDER AND FEELINGS ABOUT ABORTION?
MEASURES OF ASSOCIATION/CORRELATION
• A Measure of Association is a numerical index summarizing the strength, degree and direction of the relationship in a two-dimensional cross-classification of variables. It reveals ‘how two variables are related’.
• Any measure of association/correlation represents the mutual relationship between two variables.
FIRST, MEASURES OF ASSOCIATION FOR NOMINAL DATA...

MEASURES OF ASSOCIATION: NOMINAL DATA
• CHI-SQUARE is used to determine whether there exists any association/relation among nominal variables. But the CHI-SQUARE STATISTIC has the following features/limitations:
  • It shows only whether a relationship exists or not.
  • It does not provide any degree (strength) of relationship among the variables.
  • It is highly sensitive to sample size: for the same pattern of proportions, its value grows in proportion to the number of observations.
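To see the sample-size sensitivity concretely, here is a minimal sketch in plain Python (standard library only). It computes the Pearson chi-square statistic for the gender-by-feelings-about-abortion table given earlier, and then for a hypothetical table with exactly ten times as many respondents in every cell. The proportions are identical, yet the statistic is multiplied by ten. The function name `chi_square` is our own illustration, not a library API.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: feelings about abortion; columns: male, female (table above).
abortion = [
    [466, 448],  # whenever they want abortion
    [345, 383],  # only in special circumstances
    [38, 46],    # should not be allowed
]
# Hypothetical: same proportions, ten times the respondents.
abortion_x10 = [[10 * cell for cell in row] for row in abortion]

chi2 = chi_square(abortion)
chi2_x10 = chi_square(abortion_x10)
print(f"chi-square, n = 1726:  {chi2:.3f}")
print(f"chi-square, n = 17260: {chi2_x10:.3f}")
print(f"ratio: {chi2_x10 / chi2:.1f}")
```

This is exactly why the measures that follow divide chi-square by n before interpreting it.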


MEASURES OF ASSOCIATION… NOMINAL DATA
MEASURES BASED UPON CHI-SQUARE
Cramer’s CONTINGENCY coefficient, V:
$V = \sqrt{\dfrac{\phi^2}{q - 1}}$
where
$\phi^2 = \dfrac{\chi^2}{n}$, with n the total number of observations, and
q = Minimum(number of rows, number of columns).
• In case of no association, V = 0, and in case of perfect association, V = 1.
MEASURES OF ASSOCIATION… NOMINAL DATA (CONTINUED…)
• There is no general consensus as to what constitutes a strong or a weak relation among the variables. However, the following is offered as a guideline for interpreting the values of the above-mentioned statistics.

Value of Statistic    Possible Interpretation
0    - 0.0999         Negligible Association
0.10 - 0.1999         Weak Association
0.20 - 0.3999         Moderate Association
0.40 - 0.5999         Fairly Strong Association
0.60 - 0.9999         Very Strong Association
What is the degree of association between the degree a student pursues and the student’s major?
OBSERVED DATA
                       Degree
Major          BBA    MBA    DBA    TOTAL
Accounting      75      5     10       90
Finance         25     20      5       50
Other           20      5     35       60
TOTAL          120     30     50      200


EXPECTED DATA
                       Degree
Major          BBA    MBA    DBA    TOTAL
Accounting      54   13.5   22.5       90
Finance         30    7.5   12.5       50
Other           36      9     15       60
TOTAL          120     30     50      200


CALCULATION OF CHI-SQUARE (each cell = (Observed - Expected)² / Expected)
                        Degree
Major          BBA     MBA     DBA    TOTAL
Accounting    8.17    5.35    6.94    20.46
Finance       0.83   20.83    4.50    26.17
Other         7.11    1.78   26.67    35.56
TOTAL        16.11   27.96   38.11    82.19


• Cramer’s V is:
$V = \sqrt{\dfrac{82.19/200}{3 - 1}} \approx 0.4533$
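The hand calculation above can be checked with a short sketch in plain Python. It recomputes the expected counts, the chi-square statistic and Cramer’s V for the Major-by-Degree table, and labels the result with the guideline bands from the interpretation table. The function names `cramers_v` and `interpret` are illustrative, not from a statistics package.

```python
import math

def cramers_v(table):
    """Return (chi_square, V) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi_sq = chi2 / n                     # phi^2 = chi^2 / n
    q = min(len(table), len(table[0]))    # q = min(rows, columns)
    return chi2, math.sqrt(phi_sq / (q - 1))

def interpret(v):
    """Map a value to the guideline bands given in the interpretation table."""
    bands = [(0.10, "Negligible"), (0.20, "Weak"), (0.40, "Moderate"), (0.60, "Fairly Strong")]
    for upper, label in bands:
        if v < upper:
            return label + " Association"
    return "Very Strong Association"

# Rows: Accounting, Finance, Other; columns: BBA, MBA, DBA (observed data above).
major_by_degree = [
    [75, 5, 10],
    [25, 20, 5],
    [20, 5, 35],
]
chi2, v = cramers_v(major_by_degree)
print(f"chi-square = {chi2:.2f}, V = {v:.4f} -> {interpret(v)}")
# prints: chi-square = 82.19, V = 0.4533 -> Fairly Strong Association
```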
• Gender: □ Male □ Female
• Occupation: □ Student □ Homemaker □ Salaried □ Self-Employed
• Marital Status: □ Single □ Married

• Which outdoor ENTERTAINMENT do you like most?
a) Go shopping in malls, etc.
b) Watch a movie in a theater
c) Dine with friends and family members
d) Go out for an adventurous experience
e) Any other (please specify)

What is the thing you wish to ‘PUT ON TEST’?
SPSS Output …
MARITAL STATUS * ENTERTAINMENT Crosstabulation (Count)
ENTERTAINMENT columns: (a) Go for shopping in malls etc; (b) Watch movie in a theater; (c) Dine with friends and family members; (d) Go out for adventurous experience; (e) Any Other (please specify)

                             (a)   (b)   (c)   (d)   (e)   Total
MARITAL STATUS  MARRIED       80    40    10    35    15     180
                UNMARRIED     25    40    50    25     5     145
Total                        105    80    60    60    20     325

Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .426       .000
                     Cramer's V    .426       .000
N of Valid Cases                    325
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

What can you say about the association?
SPSS Output …
GENDER * ENTERTAINMENT Crosstabulation (Count)
ENTERTAINMENT columns: (a) Go for shopping in malls etc; (b) Watch movie in a theater; (c) Dine with friends and family members; (d) Go out for adventurous experience; (e) Any Other (please specify)

                    (a)   (b)   (c)   (d)   (e)   Total
GENDER   MALE        40    60    20    75    25     220
         FEMALE      45    25    10     5    20     105
Total                85    85    30    80    45     325

Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .371       .000
                     Cramer's V    .371       .000
N of Valid Cases                    325
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

What can you say about the association?
SPSS Output …
OCCUPATION * ENTERTAINMENT Crosstabulation (Count)
ENTERTAINMENT columns: (a) Go for shopping in malls etc; (b) Watch movie in a theater; (c) Dine with friends and family members; (d) Go out for adventurous experience; (e) Any Other (please specify)

                           (a)   (b)   (c)   (d)   (e)   Total
OCCUPATION  STUDENTS        25    40    50    15     5     135
            HOMEMAKER       30    12     8     2     5      57
            SALARIED        25    20    10     5     3      63
            SELF-EMPLOYED    8     7     4     1     3      23
Total                       88    79    72    23    16     278

Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .370       .000
                     Cramer's V    .214       .000
N of Valid Cases                    278
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

What can you say about the association?
What do you say?
Which has more impact on the type of entertainment: Gender, Marital Status or Occupation?
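One way to approach this question is to recompute phi and Cramer’s V from the three crosstabulations and rank them by V. The sketch below (plain Python, illustrative function name) should reproduce the SPSS values up to rounding. V rather than phi is the fair basis for comparison: for tables larger than 2 x k, phi can exceed 1 (its maximum is the square root of q - 1), which is why the 4-row occupation table shows phi = .370 but V = .214.

```python
import math

def phi_and_cramers_v(table):
    """Return (phi, Cramer's V) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)
    q = min(len(table), len(table[0]))
    return phi, phi / math.sqrt(q - 1)  # V = sqrt(phi^2 / (q - 1))

# Counts from the three SPSS crosstabulations
# (columns: shopping, movie, dine out, adventure, any other).
tables = {
    "Marital status": [[80, 40, 10, 35, 15],
                       [25, 40, 50, 25, 5]],
    "Gender": [[40, 60, 20, 75, 25],
               [45, 25, 10, 5, 20]],
    "Occupation": [[25, 40, 50, 15, 5],
                   [30, 12, 8, 2, 5],
                   [25, 20, 10, 5, 3],
                   [8, 7, 4, 1, 3]],
}

results = {name: phi_and_cramers_v(t) for name, t in tables.items()}
# Rank the three factors by Cramer's V, strongest association first.
for name, (phi, v) in sorted(results.items(), key=lambda item: item[1][1], reverse=True):
    print(f"{name:15s} phi = {phi:.3f}  Cramer's V = {v:.3f}")
```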
SECOND, MEASURES OF ASSOCIATION FOR ORDINAL DATA...

MEASURES OF ASSOCIATION… ORDINAL DATA
· Measures of association between ordinal data are classified into two groups:
  – those based upon the concept of rank order correlation;
  – those based upon the concepts of agreement/concordance or disagreement/discordance.
• METHOD BASED UPON THE RANK ORDER CORRELATION CONCEPT:
SPEARMAN RANK ORDER CORRELATION
$\rho = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$
where d is the difference between the two ranks of each pair and N is the number of pairs.
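A minimal sketch of the Spearman formula in plain Python, assuming there are no tied ranks (the shortcut formula is exact only in that case; with ties, a common approach is to assign average ranks and compute Pearson’s r on the ranks). The two judges’ rankings are hypothetical, chosen so the arithmetic is easy to follow: the differences are -1, 1, -1, 1, 0, so Σd² = 4 and ρ = 1 - 24/120.

```python
def spearman_no_ties(rank_x, rank_y):
    """Spearman's rho = 1 - 6*sum(d^2) / (N(N^2 - 1)); exact only when there are no ties."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have equal length")
    n = len(rank_x)
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_no_ties(judge_a, judge_b))  # 0.8
```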
MEASURES OF ASSOCIATION… ORDINAL DATA (CONTINUED…)
• Methods based upon the concepts of AGREEMENT/CONCORDANCE or DISAGREEMENT/DISCORDANCE
• These measures are developed to measure association among ordinal data in the sense of: “WHAT IS THE DEGREE OF AGREEMENT OR DISAGREEMENT AMONG THE RANKS ASSIGNED TO TWO VARIABLES?”
• These measures make use of the following concepts:
· AGREEMENT/CONCORDANCE (C): the degree of harmony/agreement between two rankings. Two pairs (X1, Y1) and (X2, Y2) are said to be concordant if
X1 > X2 and Y1 > Y2, OR
X1 < X2 and Y1 < Y2.
MEASURES OF ASSOCIATION… ORDINAL DATA (CONTINUED…)
• Methods based upon the concepts of AGREEMENT/CONCORDANCE or DISAGREEMENT/DISCORDANCE

· DISAGREEMENT/DISCORDANCE (D): the degree of disharmony/disagreement between two rankings. Two pairs (X1, Y1) and (X2, Y2) are said to be discordant if
• X1 > X2 and Y1 < Y2, OR
• X1 < X2 and Y1 > Y2.

· TIES: If equality is found between a pair of observations (X1 = X2 or Y1 = Y2), then there exists a tie.
* PAIRS TIED ON X; PAIRS TIED ON Y; AND PAIRS TIED ON BOTH X AND Y.
MEASURES OF ASSOCIATION… ORDINAL DATA (CONTINUED…)
• Methods based upon the concepts of AGREEMENT/CONCORDANCE or DISAGREEMENT/DISCORDANCE

· Goodman and Kruskal Gamma (γ):
$\gamma = \dfrac{C - D}{C + D}$
where C and D are the numbers of concordant and discordant pairs; tied pairs are left out.
Can we estimate the DEGREE OF AGREEMENT between two tests?
Two tests were conducted to measure the LEADERSHIP TRAITS OF 10 EMPLOYEES. The data collected show the number of traits (out of 20) possessed by each employee.

Employee   Test #1   Test #2
1          10        11
2          12        15
3          13        14
4          14        14
5          15        14
6          10        13
7           8         9
8           9         9
9          12        10
10         15        15
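Applying the concordance definitions to the leadership data: the sketch below compares every one of the 10 x 9 / 2 = 45 pairs of employees, classifies each pair as concordant, discordant or tied, and then applies the Gamma formula. It is plain Python with illustrative names; ties are counted separately and, as Gamma requires, excluded from C and D.

```python
from itertools import combinations

def pair_counts(x, y):
    """Classify every pair of observations (x[i], y[i]) as concordant, discordant or tied."""
    counts = {"C": 0, "D": 0, "tied_x": 0, "tied_y": 0, "tied_xy": 0}
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            counts["tied_xy"] += 1      # tied on both X and Y
        elif dx == 0:
            counts["tied_x"] += 1       # tied on X only
        elif dy == 0:
            counts["tied_y"] += 1       # tied on Y only
        elif (dx > 0) == (dy > 0):
            counts["C"] += 1            # both scores move in the same direction
        else:
            counts["D"] += 1            # scores move in opposite directions
    return counts

test1 = [10, 12, 13, 14, 15, 10, 8, 9, 12, 15]
test2 = [11, 15, 14, 14, 14, 13, 9, 9, 10, 15]

counts = pair_counts(test1, test2)
gamma = (counts["C"] - counts["D"]) / (counts["C"] + counts["D"])
print(counts)
print(f"gamma = {gamma:.4f}")
```

For this data the sketch finds 32 concordant and 5 discordant pairs (plus 3 pairs tied on Test #1 and 5 tied on Test #2), so γ = 27/37 ≈ 0.73.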
Looking at the following table, can we say that the higher the category of officers, the higher the degree of satisfaction?
Categories of Officers (Category I = Senior Post)

Satisfaction     I     II    III    IV
High            40     60     52    48
Medium         103     87     82    88
Low             57     53     66    64

Revisiting the
Problem!!!
1. You prefer that your life partner’s family should be financially sound.
2. Love marriage is better than ‘arranged marriage’.

Are respondents consistent in their responses?
SPSS Output …
Love marriage is better than 'arranged marriage'. * You prefer that your life partner's family should be financially sound. Crosstabulation (Count)
Rows: Love marriage is better than 'arranged marriage'.
Columns: You prefer that your life partner's family should be financially sound.

                     STRONGLY  SOMEWHAT            SOMEWHAT  STRONGLY
                     AGREE     AGREE     NEUTRAL   DISAGREE  DISAGREE   Total
STRONGLY AGREE          10        4        12        13        16         55
SOMEWHAT AGREE          15        6        22         8        25         76
NEUTRAL                 20       12        23         8        15         78
SOMEWHAT DISAGREE       28        9        24        18        10         89
STRONGLY DISAGREE       22       14        25        17         5         83
Total                   95       45       106        64        71        381

Symmetric Measures
                                Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Gamma      -.197           .049              -3.991          .000
N of Valid Cases                  381
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

What can you CONCLUDE?
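For a crosstabulation, C and D can be counted cell by cell: each cell’s count times the total of all cells below and to the right gives concordant pairs, and times the total below and to the left gives discordant pairs. The sketch below assumes that rows and columns are both ordered from STRONGLY AGREE to STRONGLY DISAGREE, as printed in the SPSS table; with that coding it should agree with the SPSS Gamma of -.197. Plain Python, illustrative names.

```python
def gamma_from_table(table):
    """Goodman-Kruskal gamma for a crosstab whose rows and columns are both ordered."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in a later row only
                for m in range(n_cols):
                    if m > j:                   # later column too: same direction
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # earlier column: opposite direction
                        discordant += table[i][j] * table[k][m]
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma

# Rows: "Love marriage is better..."; columns: "...family should be financially sound."
# Both ordered STRONGLY AGREE, SOMEWHAT AGREE, NEUTRAL, SOMEWHAT DISAGREE, STRONGLY DISAGREE.
crosstab = [
    [10, 4, 12, 13, 16],
    [15, 6, 22, 8, 25],
    [20, 12, 23, 8, 15],
    [28, 9, 24, 18, 10],
    [22, 14, 25, 17, 5],
]
c, d, g = gamma_from_table(crosstab)
print(f"C = {c}, D = {d}, gamma = {g:.3f}")
```

Here the sketch gives C = 18,260 and D = 27,244, so γ = -8,984/45,504 ≈ -0.197: more discordant than concordant pairs.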
THIRD, MEASURES OF ASSOCIATION FOR INTERVAL & RATIO SCALE DATA...
MEASURES OF ASSOCIATION FOR INTERVAL AND RATIO SCALE DATA
• The degree of correlation between two variables measured on interval and ratio scales can be measured through PEARSON’S CORRELATION COEFFICIENT, which is:
$r_{xy} = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$

• Value of r (absolute value; the sign gives the direction)    Possible Interpretation
0.90 - 1.00         Very Strong Association
0.70 - 0.90         Fairly Strong Association
0.40 - 0.70         Moderate Association
0.20 - 0.40         Weak Association
Less than 0.2       Negligible Association
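The computational formula can be sketched directly in plain Python. As a check, it is applied to the interest-rate and futures-index data from the regression example later in this deck (Day 12 outlier removed); the result should match the Multiple R of 0.8153 in the EXCEL output there. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """r = [N*sum(XY) - sum(X)*sum(Y)] / sqrt([N*sum(X^2) - sum(X)^2][N*sum(Y^2) - sum(Y)^2])."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

interest_rate = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

r = pearson_r(interest_rate, futures_index)
print(f"r = {r:.4f}")
```

By the guideline table, a value of about 0.82 would read as a fairly strong association.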
FOURTH, MEASURES OF ASSOCIATION FOR INTERVAL & RATIO SCALE DATA AND NOMINAL DATA...
What to do …?
Research Project: “TV VIEWING HABITS AMONG WOMEN IN NCR”

One of the research issues in it:
“Does the working/non-working status of women have any impact on the hours spent viewing TV?”
The researcher obtained responses from working and non-working women respondents.
The data is ……………

IS THERE ANY ‘DEGREE OF ASSOCIATION’ BETWEEN STATUS OF WOMEN AND TV VIEWING HOURS?
MEASURE OF ASSOCIATION … INTERVAL BY NOMINAL
• ETA (η): When one variable is categorical and the other is a scaled one, Eta is a more suitable measure of correlation.
• Usually, ETA is used when the dependent variable is measured at the scale level while the independent variable is measured at the nominal level.
• ETA squared (η²) is the proportion of the variation in the dependent variable explained by the independent variable.
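A minimal sketch of Eta, assuming the usual correlation-ratio definition η² = SS_between / SS_total (the between-group sum of squares of the scale variable divided by its total sum of squares). The TV-viewing hours below are hypothetical, since the survey data is not reproduced here. Because one variable serves as the grouping variable and the other as the scale dependent, Eta is directional, which is why SPSS reports it under Directional Measures.

```python
import math

def eta(groups):
    """Correlation ratio for a scale variable split by a nominal one: returns (eta, eta^2)."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Between-group variation: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    eta_sq = ss_between / ss_total
    return math.sqrt(eta_sq), eta_sq

# Hypothetical daily TV-viewing hours (NOT the survey data).
hours = {"working": [1, 2, 3], "non-working": [4, 5, 6]}
e, e_sq = eta(hours)
print(f"eta = {e:.4f}, eta^2 = {e_sq:.4f}")
```

Here η² = 13.5 / 17.5 ≈ 0.77, i.e. about 77% of the variation in the toy hours is explained by working status.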
SPSS Output …
Directional Measures
                                                                           Value
Nominal by Interval   Eta   STATUS OF WOMEN RESPONDENT Dependent            .799
                            AVERAGE TV VIEWING HOUR PER DAY IN THE
                            LAST WEEK Dependent                             .617

What can you say about the association?
The Last thing about Correlation …
“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning”
With a happy relation and correlation … let’s MOVE on to something different...
Have you ever wondered how a financial analyst can predict the profits of a company?
Are you aware of the TOOL needed to estimate the beta of a company?
Can I establish a functional relation between the EPS of a share and its market price?
Can you model a CAUSE-and-EFFECT RELATIONSHIP between two variables?
If you are serious about looking for answers to the issues raised, then we have ……
Regression Model
$Y_i = \alpha + \beta X_i + \varepsilon_i$
Dr. C. P. Gupta
DETERMINISTIC vs. STOCHASTIC MODEL
• A model with the following functional relation is called a DETERMINISTIC model:
$Y = \alpha + \beta X$
• A model with the following functional relation is called a STOCHASTIC model:
$Y = \alpha + \beta X + \varepsilon$
Why a DISTURBANCE/ERROR TERM…???
• Usually, the following rationales are given for “Why an ERROR term…?”:
  • Omission of variables and specification error
  • Measurement error
  • Human indeterminacy, or the stochastic nature of economic processes
Some Preliminaries!!!
• Parameters: Unknown constants in a model are called parameters. For instance, α and β are parameters in the following model:
$Y = \alpha + \beta X + \varepsilon$
• Estimators: An estimator is a rule, formula or algorithm that is applied to the data in a specific sample to compute an estimate of a population parameter.
• Estimates: An estimate is a number or specific value computed or obtained through an estimator.
What is a GOOD estimator?
• Following are some criteria on the basis of which one can judge the “goodness” of an estimator:
  • Computational cost
  • Highest R²
  • Linear estimator
  • Unbiased estimator
  • Minimum-variance or efficient estimator
  • Based on all available information
CLASSICAL LINEAR REGRESSION MODEL
• The general form of the Classical Linear Regression Model:
$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, \ldots, n$

• BASIC ASSUMPTIONS:
  • Zero mean of the disturbance: $E[\varepsilon_i] = 0$ for all i;
  • Homoscedasticity: $\mathrm{Var}[\varepsilon_i] = \sigma^2$, a constant for all i;
  • Non-autocorrelation: $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0$ if $i \neq j$;
  • Uncorrelatedness of regressor and disturbance: $\mathrm{Cov}[X_i, \varepsilon_j] = 0$ for all i and j;
  • Normality: $\varepsilon_i \sim N(0, \sigma^2)$; and
  • Non-stochastic regressor: the value of $X_i$ is a known constant in the probability distribution of $Y_i$.
The parameters of the Classical Regression Model are determined by the LEAST SQUARES METHOD.

Using the Least Squares Method, the estimate of β, say b, is determined as follows:
$b = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$
And the estimate of α, say a, can be determined thus:
$a = \bar{Y} - b\bar{X}$
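The two least-squares formulas translate directly into code. The sketch below (plain Python, illustrative function name) applies them to the interest-rate data with the Day 12 outlier removed, as in the step-by-step example that follows; it should reproduce b ≈ 22.0307 and a ≈ 56.4740.

```python
def least_squares(x, y):
    """Return (a, b) for y = a + b*x using b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squared x-deviations
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Interest Rate and Futures Index for Days 1-11 and 13 (Day 12 outlier removed).
interest_rate = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

a, b = least_squares(interest_rate, futures_index)
print(f"Futures Index = {a:.4f} + {b:.4f} x Interest Rate")
```

Dividing Sxy by Sxx is the same as dividing the covariance by the variance (the n or n - 1 divisors cancel), which is why the calculator route below gives the same slope.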
Let’s do step-by-step Regression
Analysis …
Trying to establish a Relation between the
Interest Rates and Futures Index
Day Interest Rate Futures Index
1 7.43 221
2 7.48 222
3 8.00 226
4 7.75 225
5 7.60 224
6 7.63 223
7 7.68 223
8 7.67 226
9 7.59 226
10 8.07 235
11 8.03 233
12 7.25 325
13 8.00 241
Step No. 1: Do we have sufficient evidence to fit a Linear Regression Model?

Can you fit a linear regression model to the data?
Relook at the data…
What would you like to say about this point (Day 12: Interest Rate 7.25, Futures Index 325)?

It is an OUTLIER!!!!!
Identify an outlier and remove it…
Removing the outlier we get the final data
for Regression Analysis …
Day Interest Rate Futures Index
1 7.43 221
2 7.48 222
3 8.00 226
4 7.75 225
5 7.60 224
6 7.63 223
7 7.68 223
8 7.67 226
9 7.59 226
10 8.07 235
11 8.03 233
13 8.00 241
Using the Least Squares Method, we get …

Estimate of the Beta (β)…
$\hat{\beta} = \dfrac{\mathrm{Covariance}(x, y)}{\mathrm{Variance}(x)}$

Using a scientific calculator, one can get:
Covariance (Interest Rate, Futures Index) = 1.0180; and Variance of Interest Rate = 0.0462.
Therefore, the estimate of Beta (computed from the unrounded covariance and variance) is 22.0307.
Using the Least Squares Method, we get …

Estimate of the Alpha (α)…
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

Using a scientific calculator, one can get:
Mean of Interest Rate = 7.7442; and
Mean of Futures Index = 227.0833.
Therefore, the estimate of Alpha (computed from unrounded values) is 56.4740.
Our FINAL REGRESSION EQUATION…

Futures Index = 56.4740 + 22.0307 Interest Rate


[Figure: Line of BEST FIT (regression line) of Futures Index on Interest Rate; fitted line f(x) = 22.03x + 56.47, R² = 0.66]
Will our story of Regression Analysis end here?

NO!
We shall have the beginning of … a NEW STORY!
Before we proceed further, we must ask: ‘how good is our line of BEST FIT?’

For that, we need a tool… to measure the DEGREE OF GOODNESS OF FIT.
One of the ways in which the FIT of the Regression can be evaluated is:
‘whether variation in x is a good predictor of variation in y.’

Now, what can measure such a variation?
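And, it is … R² = R Square!!!!

R² is the COEFFICIENT OF DETERMINATION. It is a measure that represents the proportion of the total variation in the dependent variable explained by the model:
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
where SSR is the regression (explained) sum of squares, SSE the residual sum of squares and SST the total sum of squares.

The higher the value of R², the greater the variation explained and hence the better the fit.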
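A sketch of R² as 1 - SSE/SST in plain Python. It refits the interest-rate example (outlier removed) so that the block runs on its own; the value should agree with R Square = 0.6646 in the EXCEL output and R² = 0.66 on the chart. The function name is illustrative.

```python
def fit_and_r_squared(x, y):
    """Fit y = a + b*x by least squares and return R^2 = 1 - SSE/SST."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                      # total variation
    return 1 - sse / sst

interest_rate = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

r2 = fit_and_r_squared(interest_rate, futures_index)
print(f"R^2 = {r2:.4f}")
```

For a simple regression with one X, R² is also the square of Pearson’s r (0.8153² ≈ 0.6646).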
It is good that R² can explain the GOODNESS of FIT. But how can I believe that whatever is explained is statistically significant?!!!!!
For that, one can use ANOVA!!!!!!
But why ANOVA in Regression?
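ANALYSIS OF VARIANCE

ANOVA TABLE (K = number of independent variables, n = number of observations)
Source of Variation   Variation (Sum of Squares)   Degrees of Freedom   Mean Square       F-Ratio
Regression            SSR                          K                    SSR/K             Ratio of Mean Squares = (SSR/K) / (SSE/(n-(K+1)))
Residuals             SSE                          n-(K+1)              SSE/(n-(K+1))
Total                 SST                          n-1                  SST/(n-1)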
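The ANOVA table for a simple regression (K = 1) can be filled in from the fitted line. This plain-Python sketch computes SST, SSE, SSR = SST - SSE, the mean squares and the F-ratio for the interest-rate example; the numbers should line up with the ANOVA block of the EXCEL output (SS 269.1232 and 135.7934, F ≈ 19.82). The p-value (Significance F) needs the F distribution and is left out to keep the sketch dependency-free; the function name is illustrative.

```python
def regression_anova(x, y):
    """ANOVA quantities for a simple linear regression y = a + b*x (so K = 1)."""
    k = 1                                   # one independent variable
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse
    msr = ssr / k                           # regression mean square
    mse = sse / (n - (k + 1))               # residual mean square
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df": (k, n - (k + 1), n - 1), "F": msr / mse}

interest_rate = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

anova = regression_anova(interest_rate, futures_index)
print(anova)
```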
Summarizing… Evaluating the FIT of the Regression!
• There are different tools to capture different dimensions of FIT!!!!!
  • Coefficient of determination, or R²
  • Analysis of Variance (ANOVA)
  • Testing the significance of the parameters of the model individually
Using EXCEL to get the Result of Regression Analysis.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.8153
R Square              0.6646
Adjusted R Square     0.6311
Standard Error        3.6850
Observations          12

ANOVA
              df   SS         MS       F       Significance F
Regression     1   269.1232   269.1    19.82   0.0012
Residual      10   135.7934   13.58
Total         11   404.9167

                Coefficients   Standard Error   t Stat   P-value
Intercept          56.4740         38.3384       1.473    0.172
Interest Rate      22.0307          4.9487       4.452    0.001
Once we get the Regression Line, and assuming that it is the BEST FITTED LINE, then WHAT?
Where to go?
One can use Regression Analysis for …
• First, establishing a relation between the variables and estimating values.
• Second, making a forecast!!!!!

Let’s take another example:
Car   Age (Years)   Selling Price (Rs. '000)
1 9 81
2 7 60
3 11 36
4 12 40
5 8 56
6 7 15
7 8 76
8 11 80
9 10 80
10 12 60
11 6 86
12 8 80
13 5 90
14 8 70
15 9 50
16 12 40
17 8 75
18 7 65
19 6 85
20 10 50
EXCEL OUTPUT……
1. What is the Regression Line?
2. How well does the Regression Line fit the data?

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.4218
R Square              0.1779
Adjusted R Square     0.1322
Standard Error        18.8546
Observations          20

ANOVA
              df   SS            MS        F         Significance F
Regression     1   1384.805684   1384.81   3.89541   0.063970444
Residual      18   6398.944316   355.497
Total         19   7783.75

               Coefficients   Standard Error   t Stat    P-value
Intercept         98.6206         18.1639       5.42948   3.7E-05
Age(Years)        -4.0081          2.0308      -1.9737    0.06397
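To cross-check the regression line and R Square in this output, the sketch below refits the car data by least squares in plain Python (illustrative names). It should give a slope of about -4.0081 (since price is in Rs. ’000, roughly Rs. 4,008 less per extra year of age), an intercept of about 98.6206 and R² ≈ 0.1779.

```python
def fit_line(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - sse / sst

# Cars 1-20 from the table above.
age_years = [9, 7, 11, 12, 8, 7, 8, 11, 10, 12, 6, 8, 5, 8, 9, 12, 8, 7, 6, 10]
price_000 = [81, 60, 36, 40, 56, 15, 76, 80, 80, 60, 86, 80, 90, 70, 50, 40, 75, 65, 85, 50]

a, b, r2 = fit_line(age_years, price_000)
print(f"Selling Price = {a:.4f} + ({b:.4f}) x Age, R^2 = {r2:.4f}")
```

The low R² and the Significance F of about 0.064 together suggest age alone explains little of the price variation in this sample.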
EXCEL OUTPUT……
1. What is the Regression Line?
2. How well does the Regression Line fit the data?
3. If the R/S Ratio is 0.30, then determine the P/E Ratio.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.7262
R Square              0.5274
Adjusted R Square     0.5012
Standard Error        2.0738
Observations          20

ANOVA
              df   SS         MS        F         Significance F
Regression     1   86.4036    86.4036   20.0902   0.0003
Residual      18   77.4139    4.3008
Total         19   163.8175

               Coefficients   Standard Error   t Stat   P-value
Intercept          5.9772          0.9174      6.5155   0.0000
R/S Ratio         74.0676         16.5248      4.4822   0.0003
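Using the fitted line from this output, a forecast is simply the intercept plus the slope times the new R/S Ratio. A minimal sketch, hard-coding the coefficients as printed (so the result inherits their rounding) and assuming, as question 3 implies, that the P/E Ratio is the dependent variable. The names are illustrative. Forecasts are most trustworthy for R/S values inside the range of the 20 observations, which this output does not show.

```python
INTERCEPT = 5.9772   # Intercept coefficient from the EXCEL output above
SLOPE_RS = 74.0676   # coefficient on R/S Ratio

def predict_pe(rs_ratio):
    """Point forecast of the P/E Ratio from the fitted line P/E = 5.9772 + 74.0676 * R/S."""
    return INTERCEPT + SLOPE_RS * rs_ratio

print(f"{predict_pe(0.30):.4f}")  # 28.1975
```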
