
Regression Analysis

Presenter: Muhammad Akram Naseem (iqra4ever@gmail.com)
Research Centre for Training and Development (RCTD)

3/10/2014

Unlock the Potential of Data Analysis

Model Building
Model
A mathematical way to express a theory is known as a model.
Types of Models
1. Exact Models (Mathematical Models)
2. In-Exact Models (Statistical Models)


Exact Models(Mathematical Model)


An expression by which the output can be determined exactly from the input(s) is known as an exact model, e.g.
Chemical formula of water: H2O
Chemical formula of glucose: C6H12O6
Area of a circle = pi × (radius)^2


In-Exact Models (Statistical Model)


Expressions in which the output cannot be determined exactly from the input(s) are known as statistical models, e.g.
1. Fertility of land cannot be determined exactly from the amount of rainfall alone
2. CGPA (marks) of students cannot be determined exactly from their study hours
3. Sales of ice cream cannot be determined exactly from the daily temperature alone

Different Forms of Statistical Model


Linear Models
1. Simple Linear Models
2. Multiple Linear Models
Non-Linear Models
1. Polynomial Models
2. Reciprocal Models
3. Logarithmic Models

Linearity Determination
Graphic Form
[Figures: Fig-1 and Fig-2 each plot a straight line of the form Y = a + bX]


Linearity Determination
Graphic Form
[Figure: Fig-3 plots a quadratic (non-linear) curve]


Linearity Determination
Linear with respect to variables and parameters:
Y = α + βX + ε
Y = α + β1X1 + β2X2 + ... + βkXk + ε


Assumptions of Classical Linear Regression Model(CLRM)


1. The regression model is linear in the parameters: Y = α + βX + u
2. X values are fixed in repeated sampling (X is assumed to be nonstochastic)
3. Zero mean value of the disturbance ui, i.e. E(ui) = 0
4. Homoscedasticity, or equal variance of ui: Var(ui) = σ²



Assumptions of Classical Linear Regression Model(CLRM)


5. No autocorrelation between the disturbances: the correlation between any two ui and uj (i ≠ j) is zero, Cov(ui, uj) = 0
6. Zero covariance between ui and Xi: E(ui Xi) = 0
7. The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations n must be greater than the number of explanatory variables.
8. Variability in X values. The X values in a given sample must not all be the same. Var(X) must be a finite positive number.
9. The regression model is correctly specified. Alternatively, there is no specification bias or error.
10. There is no perfect multicollinearity. That is, there is no perfect linear relationship among the explanatory variables.

Regression Analysis
Regression
The dependence of one variable on one or more other variables is known as regression.
Simple Regression
The dependence of one variable on a single variable is known as simple regression.
Multiple Regression
The dependence of one variable on more than one variable is known as multiple regression.

Regression Analysis
Simple Regression
1. Blood pressure (Y) depends on age (X): Y = α + βX + ε
2. CGPA of students (Y) depends on study hours (X): Y = α + βX + ε
3. Production of a certain crop (Y) depends on the amount of fertilizer used (X): Y = α + βX + ε

Regression Analysis
Y = α + βX + ε
Y: dependent variable
α: Y-intercept
β: slope of the line (regression coefficient, or rate of change)
X: independent variable
ε: residual term


9 Cases of bivariate Regression


Case #  Dependent variable (Y)       Independent variable (X)     Technique
1       Quantitative                 Quantitative                 Regression
2       Quantitative                 Categorical-Binary           Regression
3       Quantitative                 Categorical-Multi Category   Regression
4       Categorical-Binary           Quantitative                 Logistic Regression
5       Categorical-Binary           Categorical-Binary           Logistic Regression
6       Categorical-Binary           Categorical-Multi Category   Logistic Regression
7       Categorical-Multi Category   Quantitative                 Multinomial Logistic
8       Categorical-Multi Category   Categorical-Binary           Multinomial Logistic
9       Categorical-Multi Category   Categorical-Multi Category   Multinomial Logistic


Regression Analysis
Purpose of Regression Analysis
1. To find out the rate of change
2. To estimate the dependent variable on the basis of the independent variable(s)


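Regression lines for different values of the slope β
Y = α + βX + ε
(a) β > 0: the line slopes upward, Y increases with X
(b) β < 0: the line slopes downward, Y decreases as X increases
(c) β = 0: the line is horizontal, Y does not change with X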


Simple Regression Analysis (Case-1)


Case: Blood pressure is dependent on age
Dependent variable: Blood pressure (B.P)
Independent variable: Age (x)
Model: B.P = α + β(Age) + ε
Least-squares estimates (a estimates α, b estimates β):
b = (Σxy - (Σx)(Σy)/n) / (Σx² - (Σx)²/n)
a = ȳ - b x̄
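The least-squares estimates a and b can be computed directly from the sums Σx, Σy, Σxy and Σx². The sketch below is a minimal illustration of that arithmetic. The function name and the five (age, B.P) pairs are hypothetical and are not the data behind the slides.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: corrected sum of cross-products over corrected sum of squares
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean of y minus slope times mean of x
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical illustration data (not the data set used in the slides)
ages = [20, 30, 40, 50, 60]
blood_pressure = [120, 126, 130, 135, 139]

a, b = fit_simple_regression(ages, blood_pressure)
print(f"B.P = {a:.3f} + {b:.3f} * Age")
```

For these made-up values the slope works out to 0.47 and the intercept to 111.2, which is the same kind of result SPSS reports in its coefficients table.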


Scatter Diagram
[Figure: scatter plot of B.P (vertical axis, 110 to 150) against AGE (horizontal axis, 10 to 80)]


Simple Regression Analysis (Case-1)

Steps in SPSS:
1. Click on Analyze
2. Click on Linear (under Regression)
3. Shift the desired variables into the Dependent and Independent boxes
4. Click on OK


Simple Regression Analysis (Case-1)


Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       0.965   .930       .926                2.178
Explanatory power of the model: R Square = .930

ANOVA
Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   1078.29          1    1078.29       227.268   0.00
Residual     80.658           17   4.745
Total        1158.97          18
a. Predictors: (Constant), Age
b. Dependent Variable: B.P

The P-value suggests that the model is significant.
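The Model Summary figures follow from the ANOVA sums of squares. As a rough check, the sketch below recomputes the mean squares, F, R Square, Adjusted R Square and the standard error of the estimate from the Case-1 values reported above. The variable names are illustrative, and small differences from the SPSS output are due to rounding in the printed table.

```python
import math

# Values copied from the Case-1 ANOVA table
ss_regression, df_regression = 1078.29, 1
ss_residual, df_residual = 80.658, 17
ss_total = 1158.97

# Mean square = sum of squares / degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = explained variance per df / unexplained variance per df
f_statistic = ms_regression / ms_residual

# R Square = share of total variation explained by the model
r_square = ss_regression / ss_total

# Adjusted R Square penalises for the number of predictors k
n = df_regression + df_residual + 1   # number of observations
k = df_regression                     # number of predictors
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Std. Error of the Estimate = square root of the residual mean square
std_error = math.sqrt(ms_residual)

print(round(f_statistic, 2), round(r_square, 3),
      round(adjusted_r_square, 3), round(std_error, 3))
```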



Output tables
1. Summary table
2. ANOVA table
3. Coefficients table


Simple Regression Analysis (Case-1)


Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B         Std. Error          Beta                        t        Sig.
(Constant)   112.216   1.401                                           80.097   0.00
Age          0.447     0.030               0.965                       15.075   0.00

Estimated model: B.P = 112.216 + 0.447 Age

The P-value suggests that the explanatory variable is significant.
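One way to read the estimated model B.P = 112.216 + 0.447 Age is to plug in an age and see what it predicts. The snippet below is a small sketch of that use. The function name and the chosen ages are assumptions for illustration only.

```python
# Coefficients taken from the Case-1 coefficients table
INTERCEPT = 112.216
SLOPE_AGE = 0.447


def predict_bp(age):
    """Estimated blood pressure for a given age under the fitted model."""
    return INTERCEPT + SLOPE_AGE * age


# Hypothetical ages chosen only to illustrate the calculation
bp_at_50 = predict_bp(50)
change_per_year = predict_bp(41) - predict_bp(40)

print(f"Estimated B.P at age 50: {bp_at_50:.3f}")
print(f"Estimated change in B.P per extra year of age: {change_per_year:.3f}")
```

The second figure is just the slope: each additional year of age is associated with an estimated rise of 0.447 in blood pressure.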



Practice- Case-1
An experiment was conducted to study the impact of heart rate (X) on anxiety (Y). The data relate to 12 normal adults and are given in the SPSS file.
Estimate the model: Y = α + βX + ε


Simple Regression Analysis (Case-2)


In Case 2 our objective is to know the impact of a categorical (binary) explanatory variable on a quantitative dependent variable, how the analysis is performed, and how the findings are interpreted.
Dependent variable: Marks
Independent variable: Gender


Multiple Regression
1. Saving of a household (Y) depends on monthly income (X1), size of family (X2) and so on:
Y = α + β1X1 + β2X2 + ... + βkXk + ε
2. CGPA of students (Y) depends on study hours (X1), IQ (X2) and so on:
Y = α + β1X1 + β2X2 + ... + βkXk + ε
3. Production of a certain crop (Y) depends on the amount of fertilizer used (X1), water (X2) and so on:
Y = α + β1X1 + β2X2 + ... + βkXk + ε

Simple Regression Analysis (Case-2)


In regression analysis, whenever an explanatory variable is categorical, we introduce dummy variables.
Number of dummy variables = number of categories - 1
In Case 2 our explanatory variable is gender (male, female), which has two categories, so we introduce one dummy variable (D) with the following coding scheme:
Female = 1, Male = 0
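The coding rule (one dummy per category, minus one reference category) can be sketched in a few lines. The helper below is a hypothetical illustration, not an SPSS feature. The function name, the `D_` prefix and the sample values are assumptions.

```python
def dummy_code(values, reference):
    """Build 0/1 dummy variables for every category except the reference."""
    categories = sorted(set(values) - {reference})
    return {
        f"D_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


# Two categories -> one dummy (Female = 1, Male = 0)
gender = ["male", "female", "female", "male"]
gender_dummies = dummy_code(gender, reference="male")
print(gender_dummies)

# Three categories -> two dummies, with the reference category coded 0 on both
jobs = ["clerical", "custodial", "manager", "clerical"]
job_dummies = dummy_code(jobs, reference="manager")
print(job_dummies)
```

The reference category never gets its own column. It is the case in which every dummy equals 0.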

Simple Regression Analysis (Case-2)


Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       0.056   .003       -.007               5.484
Explanatory power of the model: R Square = .003

ANOVA
Model        Sum of Squares   df   Mean Square   F       Sig.
Regression   9.213            1    9.213         0.306   0.58
Residual     2947.547         98   30.077
Total        2956.760         99
a. Predictors: (Constant), gender
b. Dependent Variable: Marks

The P-value suggests that the model is not significant.



Simple Regression Analysis (Case-2)


Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B        Std. Error           Beta                        t        Sig.
(Constant)   68.455   0.739                                            92.569   0.00
gender       -0.610   1.102                -0.056                      -0.553   0.581

Estimated model: Marks = 68.455 - 0.610 gender

The P-value suggests that the explanatory variable is not significant.



Simple Regression Analysis (Case-2)


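Marks = 68.455 - 0.610 gender
Average marks of male students (gender = 0): Marks = 68.455 - 0.610(0) = 68.455 ... (1)
Average marks of female students (gender = 1): Marks = 68.455 - 0.610(1) = 67.845 ... (2)
The difference (2) - (1) = -0.610, which is the coefficient of gender: on average, female students score 0.610 marks lower than male students, a difference that is not statistically significant (Sig. = 0.581).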
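The interpretation rests on a general property: with a single 0/1 dummy as the regressor, the intercept equals the mean of the reference group and the slope equals the difference between the two group means. The sketch below checks this on a tiny made-up data set. The marks and the function name are hypothetical and are not the 100 observations used in the slides.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data: gender dummy (female = 1, male = 0) and marks
gender = [0, 0, 0, 1, 1, 1]
marks = [70, 68, 66, 69, 65, 67]

a, b = fit_simple_regression(gender, marks)
male_mean = mean([m for g, m in zip(gender, marks) if g == 0])
female_mean = mean([m for g, m in zip(gender, marks) if g == 1])

print(f"intercept = {a:.3f}, male mean = {male_mean:.3f}")
print(f"intercept + slope = {a + b:.3f}, female mean = {female_mean:.3f}")
```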

Practice:-Case 2
Use the Case2.sav data file and determine the impact of gender on the salary of the employees of an organization.
Dependent variable: Salary
Independent variable: Gender
Estimate the model: Y = α + βX + ε


Simple Regression Analysis (Case-3)


In Case 3 our objective is to know the impact of a multi-category explanatory variable on a quantitative dependent variable, how the analysis is performed, and how the findings are interpreted.
File used: Case3.sav
Dependent variable: Salary
Independent variable: Job category

Simple Regression Analysis (Case-3)


In Case 3 our explanatory variable is job category, which has 3 sub-categories (Clerical, Custodial, Manager). We make two dummy variables, treating one sub-category as the reference (base) category:
D1: clerical = 1, otherwise = 0
D2: custodial = 1, otherwise = 0
Manager is the reference category (D1 = 0 and D2 = 0).

Simple Regression Analysis (Case-3)


Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       0.805   .649       .647                10144.651
Explanatory power of the model: R Square = .649

ANOVA
Model        Sum of Squares   df    Mean Square   F         Sig.
Regression   8.943E10         2     4.472E10      434.502   0.00
Residual     4.847E10         471   1.029E8
Total        1.379E11         473
a. Predictors: (Constant), D1, D2
b. Dependent Variable: Salary

The P-value suggests that the model is significant.



Simple Regression Analysis (Case-3)


Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B            Std. Error       Beta                        t         Sig.
(Constant)   63977.798    1106.872                                     57.801    0.00
D1           -36138.018   1228.281         -0.897                      -29.422   0.00
D2           -33038.909   2244.280         -0.449                      -14.721   0.00

Estimated model: Salary = 63977 - 36138 D1 - 33038 D2

The P-values suggest that the explanatory variables D1 and D2 are significant.



Simple Regression Analysis (Case-3)
Average salary of different job categories
[Figure: bar chart of average salary by job category: Clerical 27840, Custodian 30939, Manager 63978]

Estimated model: Salary = 63977 - 36138 D1 - 33038 D2
Average salary of clerks: 63977 - 36138(1) - 33038(0) = 27839
Average salary of custodians: 63977 - 36138(0) - 33038(1) = 30939
Average salary of managers: 63977 - 36138(0) - 33038(0) = 63977
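The hand calculation uses truncated coefficients, which is why it gives 27839 and 63977 while the chart shows 27840 and 63978. As a rough check, the sketch below repeats the calculation with the unrounded coefficients from the coefficients table (63977.798, -36138.018, -33038.909). The dictionary layout and names are illustrative choices.

```python
# Unrounded coefficients from the Case-3 coefficients table
CONSTANT = 63977.798
COEF_D1 = -36138.018   # clerical = 1, otherwise 0
COEF_D2 = -33038.909   # custodial = 1, otherwise 0

# Dummy codes (D1, D2) for each job category; manager is the reference
dummy_codes = {
    "Clerical": (1, 0),
    "Custodial": (0, 1),
    "Manager": (0, 0),
}

average_salary = {
    job: CONSTANT + COEF_D1 * d1 + COEF_D2 * d2
    for job, (d1, d2) in dummy_codes.items()
}

for job, salary in average_salary.items():
    print(f"{job}: {salary:.3f} (rounded: {round(salary)})")
```

Each dummy coefficient is the gap between that category's average salary and the managers' average, so both gaps here are negative.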

Multiple Regression
Y = α + β1X1 + β2X2 + ... + βkXk + ε
Y: dependent variable
α: intercept
β1, β2, ..., βk: regression coefficients
X1, X2, ..., Xk: independent variables
ε: residual term


Multiple Regression
To know the impact of age and weight on blood pressure, a random sample of 20 patients is collected and analyzed.
BP = α + β1(Age) + β2(Weight) + ε


Multiple Regression

Steps in SPSS:
1. Click on Analyze
2. Click on Linear (under Regression)
3. Shift the desired variables into the Dependent and Independent boxes
4. Click on OK


Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
1.00   0.99       0.99                0.53
Explanatory power of the model: R Square = 0.99

ANOVA
             Sum of Squares   df      Mean Square   F        Sig.
Regression   555.18           2.00    277.59        978.25   0.00
Residual     4.82             17.00   0.28
Total        560              19

The P-value suggests that the model is significant.



Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B        Std. Error           Beta                        t       Sig.
(Constant)   -16.58   3.01                                             -5.51   0.00
Age          0.71     0.05                 0.33                        13.23   0.00
Weight       1.03     0.03                 0.82                        33.15   0.00

Estimated model: BP = -16.58 + 0.71 Age + 1.03 Weight

The P-values suggest that both explanatory variables (Age and Weight) are significant.
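In a multiple regression each coefficient is read holding the other variables fixed. The short sketch below applies the estimated model BP = -16.58 + 0.71 Age + 1.03 Weight to one patient. The function name and the patient's age and weight are made up for illustration.

```python
# Coefficients from the multiple regression coefficients table
INTERCEPT = -16.58
COEF_AGE = 0.71
COEF_WEIGHT = 1.03


def predict_bp(age, weight):
    """Estimated blood pressure for a given age and weight."""
    return INTERCEPT + COEF_AGE * age + COEF_WEIGHT * weight


# Hypothetical patient used only for illustration
estimated_bp = predict_bp(age=50, weight=90)

# Effect of one extra year of age with weight held fixed
age_effect = predict_bp(51, 90) - predict_bp(50, 90)
# Effect of one extra unit of weight with age held fixed
weight_effect = predict_bp(50, 91) - predict_bp(50, 90)

print(f"Estimated BP: {estimated_bp:.2f}")
print(f"Per-year effect of age: {age_effect:.2f}")
print(f"Per-unit effect of weight: {weight_effect:.2f}")
```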



Practice
The data given in the SPSS file were collected using a simple random sample of 20 hypertensive patients.
Y = mean arterial blood pressure (mmHg)
X1 = age (years)
X2 = weight (kg)
X3 = body surface area (sq m)
X4 = duration of hypertension (years)
X5 = basal pulse (beats/min)
X6 = measure of stress
