Associate professor
Dept of Accounting & Information Systems
University of Dhaka
Regression Analysis
• Independent variable: This is the variable that explains the other variable.
Its values do not depend on the other variable, so it is called the independent variable. We denote it as X.
Regression Analysis
LINEAR REGRESSION: The line of regression is the graphical representation
of the relationship that gives the best estimate of one variable for any given
value of the other variable.
If X and Y are two variables whose relationship is to be indicated, the line that
gives the best estimate of Y for any value of X is called the regression line of Y on X.
Assumptions Underlying Linear Regression
1. Normality: For each value of X, there is a group of Y values, and these
Y values are normally distributed.
The means of these normal distributions of Y values all lie on the
straight line of regression. The standard deviations of these normal
distributions are equal.
2. Independence: The Y values are statistically independent. This means that
in the selection of a sample, the Y values chosen for a particular X value do
not depend on the Y values for any other X values.
Assumptions Underlying Linear Regression
The least squares method of fitting a line of best fit requires minimizing the
sum of the squares of vertical deviations of each observed Y value from the
fitted line.
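The least squares idea can be illustrated with a minimal Python sketch. It assumes the fitted line is written as Ŷ = a + bX; the function names and the five data points are hypothetical and chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y-hat = a + b*X that minimizes
    the sum of squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical deviations of each observed Y from the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Any other line, for example one with a slightly different slope,
# gives a larger sum of squared deviations.
worse = sum_squared_deviations(xs, ys, a, b + 0.1)
print(a, b, best, worse)
```

Comparing `best` with `worse` shows the defining property: moving away from the least squares slope increases the sum of squared deviations.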
$$s_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} \qquad s_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$
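The two forms of the standard error of estimate, the definitional form based on (Y - Ŷ) and the computational form based on ΣY², ΣY and ΣXY, can be checked against each other with a short Python sketch. It assumes a is the intercept and b the slope of the fitted line Ŷ = a + bX; the function name and data are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return (definitional, computational) values of s_y.x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Least squares slope and intercept.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n

    # Definitional form: squared deviations of Y from the fitted values.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_definitional = math.sqrt(sse / (n - 2))

    # Computational form: uses only the sums, a and b.
    s_computational = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    return s_definitional, s_computational


# Hypothetical data, not taken from the slides.
s_def, s_comp = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(s_def, s_comp)
```

Both values agree up to floating-point rounding, which is why the computational form is convenient for hand calculation.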
Finding the Regression Equation - Example
Tom Keller: $\hat{Y} = 18.9476 + 1.1842X = 18.9476 + 1.1842(20) = 42.6316$
Soni Jones: $\hat{Y} = 18.9476 + 1.1842X = 18.9476 + 1.1842(30) = 54.4736$
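The two estimates can be reproduced with a few lines of Python. The coefficients come from the example; the function and variable names are hypothetical.

```python
INTERCEPT = 18.9476  # a in Y-hat = a + b*X
SLOPE = 1.1842       # b in Y-hat = a + b*X


def estimate(x):
    """Estimated Y for a given X from the fitted regression equation."""
    return INTERCEPT + SLOPE * x


keller = estimate(20)
jones = estimate(30)
print(round(keller, 4), round(jones, 4))
```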
Plotting the Estimated and the Actual Y’s
Standard Error of the Estimate - Example
Standard Error of the Estimate
Graphical Illustration of the Differences between Actual Y and Estimated Y, $(Y - \hat{Y})$
PROPERTIES OF REGRESSION COEFFICIENTS
1. The slope of the regression line is called the regression coefficient. It gives
the change in the dependent variable for a unit change in the independent
variable. For paired data on X and Y there are two regression lines, the
regression line of Y on X and the regression line of X on Y, so we have two
regression coefficients: $b_{yx}$ (the slope of Y on X) and $b_{xy}$ (the slope of X on Y).
The signs of both the regression coefficients are the same, and so the value of r
will also have the same sign.
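The sign property can be illustrated with a small Python sketch. It computes the slope of Y on X (called `b_yx` here), the slope of X on Y (`b_xy`), and the correlation coefficient r for two hypothetical data sets. It also checks the standard identity r² = b_yx · b_xy, which is not spelled out in the text above but explains why the three quantities share a sign. All names and data are assumptions for illustration.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx                 # slope of the regression line of Y on X
    b_xy = sxy / syy                 # slope of the regression line of X on Y
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return b_yx, b_xy, r


# Hypothetical data with a positive relationship.
pos = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# Hypothetical data with a negative relationship.
neg = regression_coefficients([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])

for b_yx, b_xy, r in (pos, neg):
    print(b_yx, b_xy, r, r * r, b_yx * b_xy)
```

In both cases the two slopes and r carry the same sign, because all three share the numerator Σ(X - X̄)(Y - Ȳ).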
PROPERTIES OF REGRESSION COEFFICIENTS
Problem
Problem
Multiple Regression Analysis
Multiple Regression Analysis
Regression Plane for a 2-Independent Variable Linear Regression Equation
Problem
• Here,
• Z = Financial distress
• X1 (Liquidity Ratio) = Net Working Capital / Total Assets
• X2 (Profitability Ratio) = Retained Earnings / Total Assets
• X3 (Return on Assets) = EBIT / Total Assets
• X4 (Solvency Ratio) = Market Value of Equity / Book Value of Total Liabilities
Multiple Linear Regression - Example
The Multiple Regression Equation – Interpreting the Regression Coefficients
• The regression coefficient for mean outside temperature is -4.583. The coefficient
is negative and shows an inverse relationship between heating cost and
temperature. As the outside temperature increases, the cost to heat the home
decreases. The numeric value of the regression coefficient provides more
information. If we increase temperature by 1 degree and hold the other two
independent variables constant, we can estimate a decrease of $4.583 in monthly
heating cost. So if the mean temperature in Boston is 25 degrees and it is 35
degrees in Philadelphia, all other things being the same (insulation and age of
furnace), we expect the heating cost would be $45.83 less in Philadelphia.
• The attic insulation variable also shows an inverse relationship: the more
insulation in the attic, the less the cost to heat the home. So the negative sign for
this coefficient is logical. For each additional inch of insulation, we expect the
cost to heat the home to decline $14.83 per month, regardless of the outside
temperature or the age of the furnace.
• The age of the furnace variable shows a direct relationship. With an older
furnace, the cost to heat the home increases. Specifically, for each additional year
older the furnace is, we expect the cost to increase $6.10 per month.
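The interpretation above can be turned into a short Python sketch that computes the expected change in monthly heating cost from changes in the three independent variables. The coefficients are the ones quoted in the text. The intercept is not needed because only differences are computed. The function and parameter names are hypothetical.

```python
# Regression coefficients quoted in the text (dollars per unit change).
B_TEMPERATURE = -4.583   # per degree of mean outside temperature
B_INSULATION = -14.83    # per inch of attic insulation
B_FURNACE_AGE = 6.10     # per year of furnace age


def change_in_cost(d_temp=0.0, d_insulation=0.0, d_age=0.0):
    """Expected change in monthly heating cost for the given changes,
    holding any variable that is not passed constant."""
    return (B_TEMPERATURE * d_temp
            + B_INSULATION * d_insulation
            + B_FURNACE_AGE * d_age)


# Boston (25 degrees) versus Philadelphia (35 degrees), all else equal.
boston_to_philadelphia = change_in_cost(d_temp=35 - 25)
print(round(boston_to_philadelphia, 2))
```

A negative result means a lower expected cost, matching the $45.83 reduction described for Philadelphia.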
Applying the Model for Estimation
Multiple Standard Error of Estimate
Multiple Regression and Correlation Assumptions
The ANOVA Table
Minitab – the ANOVA Table
Coefficient of Multiple Determination ($R^2$)
Minitab – the ANOVA Table
$$R^2 = \frac{SSR}{SS\ total} = \frac{171{,}220}{212{,}916} = 0.804$$
Adjusted Coefficient of Determination
Adjusted Coefficient of Determination
Evaluating the Assumptions of Multiple Regression
Analysis of Residuals
Scatter Diagram
Residual Plot
Distribution of Residuals
Both MINITAB and Excel offer another graph that helps to evaluate the
assumption of normally distributed residuals. It is called a normal probability
plot and is shown to the right of the histogram.
Multicollinearity
• Multicollinearity exists when independent
variables (X’s) are correlated.
• Correlated independent variables make it
difficult to make inferences about the individual
regression coefficients (slopes) and their
individual effects on the dependent variable (Y).
• However, correlated independent variables do
not affect a multiple regression equation’s
ability to predict the dependent variable (Y).
Variance Inflation Factor
• A general rule is if the correlation between two independent variables
is between -0.70 and 0.70 there likely is not a problem using both of
the independent variables.
• A more precise test is to use the variance inflation factor (VIF).
• The value of VIF is found as follows: $VIF = \frac{1}{1 - R_j^2}$
• The term $R_j^2$ refers to the coefficient of determination, where the selected
independent variable is used as a dependent variable and the remaining
independent variables are used as independent variables.
• A VIF greater than 10 is considered unsatisfactory, indicating that the independent
variable should be removed from the analysis.
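The VIF procedure described above can be sketched in Python with NumPy: regress each independent variable on the remaining ones, take the resulting coefficient of determination, and apply VIF = 1 / (1 - R²). The helper names and the synthetic data are assumptions made for illustration. They are not the heating-cost data or the Minitab output.

```python
import numpy as np


def r_squared(y, X):
    """R^2 from an ordinary least squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(residuals ** 2) / ss_total


def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)      # remaining independent variables
        r2_j = r_squared(X[:, j], others)     # column j treated as dependent
        factors.append(1.0 / (1.0 - r2_j))
    return factors


# Synthetic data: x3 is nearly a copy of x1, x2 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this synthetic setup the first and third variables should show VIF values well above 10, while the second should stay close to 1, the smallest value a VIF can take.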
Multicollinearity – Example
Refer to the data in the table, which relates the heating cost to the
independent variables outside temperature, amount of insulation, and age of
furnace.
Develop a correlation matrix for all the independent variables.
Does it appear there is a problem with multicollinearity?
Find and interpret the variance inflation factor for each of the independent
variables.
Correlation Matrix - Minitab
VIF – Minitab Example
Coefficient of
Determination
Residual Plot versus Fitted Values
• The graph below shows the
residuals plotted on the
vertical axis and the fitted
values on the horizontal axis.
• Note the run of residuals
above the mean of the
residuals, followed by a run
below the mean. A scatter plot
such as this would indicate
possible autocorrelation.
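The pattern described above, a long run of residuals above the mean followed by a long run below it, can be spotted informally by counting runs. The Python sketch below is a rough illustration only, not a formal test for autocorrelation; the function name and both residual series are hypothetical.

```python
def count_runs(residuals):
    """Count runs of consecutive residuals on the same side of their mean."""
    mean = sum(residuals) / len(residuals)
    above = [r > mean for r in residuals]
    runs = 1
    for previous, current in zip(above, above[1:]):
        if current != previous:   # side of the mean changed: a new run starts
            runs += 1
    return runs


# Hypothetical residuals, ordered by fitted value.
patterned = [3, 2, 4, 1, 2, -2, -3, -1, -4, -2]     # one run above, one below
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]   # side changes every time

print(count_runs(patterned), count_runs(alternating))
```

Very few runs relative to the number of observations, as in the first series, is the kind of pattern the plot is meant to reveal.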
Qualitative Independent Variables
• Frequently we wish to use nominal-scale
variables—such as gender, whether the home has
a swimming pool, or whether the sports team
was the home or the visiting team—in our
analysis. These are called qualitative variables.
• To use a qualitative variable in regression
analysis, we use a scheme of dummy variables in
which one of the two possible conditions is coded
0 and the other 1.
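The dummy variable scheme can be sketched in a few lines of Python. The function name, the swimming-pool values, and the choice of which condition is coded 1 are assumptions for illustration.

```python
def encode_dummy(values, coded_as_one):
    """Code a two-condition qualitative variable as 0/1.

    Observations equal to `coded_as_one` become 1; all others become 0.
    """
    return [1 if value == coded_as_one else 0 for value in values]


# Hypothetical observations: whether each home has a swimming pool.
has_pool = ["yes", "no", "no", "yes"]
pool_dummy = encode_dummy(has_pool, "yes")
print(pool_dummy)
```

The resulting 0/1 column can then be used in the regression like any other independent variable. Its coefficient estimates the difference in the dependent variable between the two conditions, holding the other variables constant.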