
Multiple Regression Analysis
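General Form of the Multiple Regression Equation

ŷ = a + b1X1 + b2X2 + ... + bkXk

Where:
Y = the dependent variable (ŷ is its estimated value)
a = the intercept (constant) of y
b = the coefficient of an independent variable
X = an independent variable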
Example
Suppose the selling price of a home is directly
related to the number of rooms and inversely related
to its age. Let X1 refer to the number of rooms, X2 to
the age of the home in years, and ŷ (y hat) to the
estimated selling price of the home ($000).
ŷ = 21.2 + 18.7X1 - 0.25X2
So, what is a seven-room house that is 30 years old
expected to sell for?
ŷ = 21.2 + 18.7(7) - 0.25(30) = 144.6, that is, $144,600
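
The same arithmetic can be checked with a short Python sketch. The function name `predicted_price` is a hypothetical helper introduced only for illustration; the coefficients are the ones given in the example equation.

```python
def predicted_price(rooms: float, age: float) -> float:
    """Estimated selling price in thousands of dollars ($000)."""
    return 21.2 + 18.7 * rooms - 0.25 * age


# A seven-room house that is 30 years old.
price = predicted_price(rooms=7, age=30)
print(f"${price * 1000:,.0f}")  # estimated price in dollars
```

Because the age coefficient is negative, each additional year of age lowers the estimate by $250, while each additional room raises it by $18,700.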
Salsberry Realty sells homes along the East Coast of the United
States. One question frequently asked by prospective buyers is
“How much can we expect to pay to heat the home in the winter?”
The research department at Salsberry thinks three variables relate to
heating costs: the mean daily outside temperature, the number of
inches of insulation, and the age in years of the furnace. They
select a random sample of 20 homes. Determine the regression
equation.
y is the dependent variable
X1 is the outside temperature
X2 is inches of insulation
X3 is the age of the furnace
ŷ = a + b1X1 + b2X2 + b3X3
ŷ is used to estimate the value of y
Once we determine the regression equation, we can calculate the heating costs
for January, given the mean outside temperature is 30 degrees, there are 5 inches
of insulation, and the furnace is 10 years old.

ŷ = a + b1X1 + b2X2 + b3X3
ŷ = 427.194 - 4.583X1 - 14.831X2 + 6.101X3
ŷ = 427.194 - 4.583(30) - 14.831(5) + 6.101(10) = 276.56

Thus, the estimated heating costs for January are $276.56.
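
A minimal Python sketch of this prediction follows. It assumes an intercept of 427.194 and slopes of -4.583 (temperature), -14.831 (insulation), and 6.101 (furnace age); these values are an assumption chosen because they reproduce the $276.56 estimate quoted here. The function name is hypothetical.

```python
# Assumed coefficients (see the note above); not an authoritative listing.
INTERCEPT = 427.194
B_TEMPERATURE = -4.583   # dollars per degree of mean outside temperature
B_INSULATION = -14.831   # dollars per inch of insulation
B_FURNACE_AGE = 6.101    # dollars per year of furnace age


def estimated_heating_cost(temperature: float, insulation: float,
                           furnace_age: float) -> float:
    """Estimated heating cost in dollars from the assumed equation."""
    return (INTERCEPT
            + B_TEMPERATURE * temperature
            + B_INSULATION * insulation
            + B_FURNACE_AGE * furnace_age)


# January: mean temperature 30 degrees, 5 inches of insulation, 10-year-old furnace.
january = estimated_heating_cost(30, 5, 10)
print(round(january, 2))
```

Under these assumed coefficients, warmer weather and thicker insulation lower the estimate, while an older furnace raises it.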


An ANOVA table summarizes the multiple regression analysis.
It reports the total amount of variation in y, divided into two
components:
❖ The regression, the variation explained by all the independent
variables
❖ The residual or error, the unexplained variation of y
It also reports the degrees of freedom for the regression, the error variation,
and the total variation.
Total sum of squares = SS total = Σ(y - ȳ)^2

Regression sum of squares = SSR = Σ(ŷ - ȳ)^2

Residual or error sum of squares = SSE = Σ(y - ŷ)^2
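
The three sums of squares can be sketched in Python as below. The data are hypothetical and are not taken from the heating-cost sample: `y` holds three made-up observations and `y_hat` holds the fitted values of a least-squares line through x = 1, 2, 3. The function name is also an assumption.

```python
def sums_of_squares(y, y_hat):
    """Return (SS total, SSR, SSE) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return ss_total, ssr, sse


# Hypothetical observations and their least-squares fitted values.
y = [1.0, 3.0, 2.0]
y_hat = [1.5, 2.0, 2.5]

ss_total, ssr, sse = sums_of_squares(y, y_hat)
print(ss_total, ssr, sse)  # 2.0 0.5 1.5
```

For fitted values from a least-squares equation with an intercept, the regression and error components add up to the total, as they do here (0.5 + 1.5 = 2.0).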


Multiple Standard Error of Estimate

s = √(SSE / (n - (k + 1))) = √(Σ(y - ŷ)^2 / (n - (k + 1)))

where:
y is the actual observation.
ŷ is the estimated value computed from the regression equation.
n is the number of observations in the sample.
k is the number of independent variables.
SSE is the residual sum of squares from an ANOVA table.

The actual heating cost for the first observation, y, is $250; the outside
temperature, X1, is 35 degrees; the depth of insulation, X2, is 3 inches; and the
age of the furnace, X3, is 6 years. Substituting these values into the regression
equation gives ŷ = 258.90.

The actual heating cost was $250, so the residual, which is the difference
between the actual value and the estimated value, is y - ŷ = 250 - 258.90 =
-8.90.

This difference of $8.90 is the random or unexplained error for the first home
sampled. Our next step is to square this difference, that is, find (y - ŷ)^2 = (250
- 258.90)^2 = (-8.90)^2 = 79.21.
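
The residual and its square for this first home can be reproduced with a few lines of Python. The variable names are illustrative; the two dollar amounts are the ones quoted above.

```python
actual = 250.00      # observed heating cost for the first home
estimated = 258.90   # value from the regression equation

residual = actual - estimated   # negative: the equation overestimated the cost
squared_residual = residual ** 2

print(round(residual, 2), round(squared_residual, 2))  # -8.9 79.21
```

Repeating this for all 20 homes and summing the squared residuals gives SSE.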
Purpose of the Multiple Standard Error of Estimate
First, the units are the same as the dependent variable, so the standard error is in
dollars, $51.05.

Second, we expect the residuals to be approximately normally distributed, so
about 68 percent of the residuals will be within ±$51.05 and about 95
percent within ±2($51.05), or ±$102.10.
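
As a small sketch, the two ranges follow directly from the standard error of $51.05, assuming the residuals are approximately normal. The variable names are hypothetical.

```python
standard_error = 51.05  # dollars, as quoted in the text

within_68 = 1 * standard_error   # about 68 percent of residuals
within_95 = 2 * standard_error   # about 95 percent of residuals

print(f"about 68%: ±${within_68:.2f}")
print(f"about 95%: ±${within_95:.2f}")
```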
Coefficient of Multiple Determination

The coefficient of multiple determination, R^2, measures the goodness of fit
of the regression equation; that is, it gives the proportion or percentage of the
total variation in the dependent variable that is explained by the independent
variables. R^2 lies between 0 and 1, and the closer R^2 is to 1, the better the
model fits.
We conclude that the independent variables (outside temperature, amount of
insulation, and age of furnace) explain, or account for, 80.4 percent of the
variation in heating cost.
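
In terms of the sums of squares from the ANOVA table, the coefficient of multiple determination is usually written as the ratio below. This is the standard textbook form, stated here as an assumption because the formula itself is not reproduced in the text.

```latex
R^2 = \frac{\mathrm{SSR}}{\mathrm{SS\ total}}
    = 1 - \frac{\mathrm{SSE}}{\mathrm{SS\ total}}
```

Read this way, the 80.4 percent is the share of the total variation in heating cost that falls in the regression row of the ANOVA table rather than the error row.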
Adjusted Coefficient of Determination

If we compare the R^2 (0.80) to the adjusted R^2 (0.77), the difference in this
case is small.
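
A hedged sketch of the adjustment: a common form is adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - (k + 1)), which is assumed here because the formula is not reproduced in the text. Using R^2 = 0.804, n = 20 homes, and k = 3 independent variables from the heating-cost example:

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjust R^2 for the number of independent variables k and sample size n."""
    return 1 - (1 - r_squared) * (n - 1) / (n - (k + 1))


adj = adjusted_r_squared(0.804, n=20, k=3)
print(round(adj, 2))  # 0.77
```

The adjustment penalizes each added independent variable, so adjusted R^2 never exceeds R^2.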
Measures of Effectiveness
There are two measures of effectiveness of the regression equation: the multiple
standard error of estimate and the coefficient of multiple determination.

The multiple standard error of estimate is similar to the standard deviation:
❖ It is measured in the same units as the dependent variable.
❖ It is based on squared deviations between the observed and predicted values of
the dependent variable.
❖ It ranges from 0 to plus infinity.
❖ It is calculated from the following equation:

s = √(SSE / (n - (k + 1)))
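
As a rough illustration of the standard-error calculation, the sketch below assumes the usual form s = sqrt(SSE / (n - (k + 1))). The SSE value of 1,600 is hypothetical and chosen only to keep the arithmetic easy; n = 20 and k = 3 match the heating-cost sample. The function name is also an assumption.

```python
import math


def multiple_standard_error(sse: float, n: int, k: int) -> float:
    """Multiple standard error of estimate from SSE, sample size n, and k predictors."""
    return math.sqrt(sse / (n - (k + 1)))


# Hypothetical SSE; n and k as in the heating-cost sample.
s = multiple_standard_error(sse=1600.0, n=20, k=3)
print(s)  # 10.0
```

The divisor n - (k + 1) is the error degrees of freedom from the ANOVA table.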
Practice Problems
The director of marketing at Reeves Wholesale Products is studying monthly
sales. Three independent variables were selected as estimators of sales: regional
population, per capita income, and regional unemployment rate. The regression
equation was computed to be (in dollars):

ŷ = 64,100 + 0.394X1 + 9.6X2 - 11,600X3

a. What is the full name of the equation?

b. Interpret the number 64,100.

c. What are the estimated monthly sales for a particular region with a population of
796,000, per capita income of $6,940, and an unemployment rate of 6.0 percent?
Answer

a. Multiple regression equation

b. 64,100 is the Y-intercept, the estimated monthly sales when all three
independent variables equal zero

c. ŷ = 64,100 + 0.394(796,000) + 9.6(6,940) - 11,600(6.0) = $374,748
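
The answer to part c can be verified with a short Python sketch. The function name `estimated_sales` is hypothetical; the coefficients are those of the Reeves Wholesale Products equation.

```python
def estimated_sales(population: float, per_capita_income: float,
                    unemployment_rate: float) -> float:
    """Estimated monthly sales in dollars from the Reeves equation."""
    return (64_100
            + 0.394 * population
            + 9.6 * per_capita_income
            - 11_600 * unemployment_rate)


sales = estimated_sales(796_000, 6_940, 6.0)
print(f"${sales:,.0f}")
```

Note that the unemployment rate enters as 6.0 (percent), not 0.06, to match the scale of the coefficient.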


Consider the ANOVA table that follows.

a. Determine the standard error of estimate. About 95 percent of the residuals will
be between what two values?
b. Determine the coefficient of multiple determination. Interpret this value.
c. Determine the coefficient of multiple determination, adjusted for the degrees of
freedom.
Answer
