
1. MULTIPLE REGRESSION
• In a simple linear regression there is only one
independent variable.
• In a multiple regression there are two or more
independent variables.
• The multiple regression model for two
independent variables is:
• $Y = \alpha + \beta_1 X_1 + \beta_2 X_2$
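A quick illustration (not from the slides): the coefficients of this two-predictor model can be estimated by least squares in a few lines of Python with NumPy. The X1, X2 and Y values below are made-up numbers used only to show the mechanics.

```python
import numpy as np

# Hypothetical data (illustrative only, not the deodorant example).
X1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
Y  = np.array([5.0, 9.0, 10.0, 15.0, 16.0, 20.0])

# Design matrix: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates of (alpha, beta1, beta2).
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
alpha, beta1, beta2 = coef
print(alpha, beta1, beta2)
```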
1. MULTIPLE REGRESSION
• Normal Equations
• $\sum Y = n\alpha + \beta_1 \sum X_1 + \beta_2 \sum X_2$

• $\sum X_1 Y = \alpha \sum X_1 + \beta_1 \sum X_1^2 + \beta_2 \sum X_1 X_2$

• $\sum X_2 Y = \alpha \sum X_2 + \beta_1 \sum X_1 X_2 + \beta_2 \sum X_2^2$
1. MULTIPLE REGRESSION
• Matrix Notation
$$\begin{pmatrix} \sum Y \\ \sum X_1 Y \\ \sum X_2 Y \end{pmatrix} = \begin{pmatrix} n & \sum X_1 & \sum X_2 \\ \sum X_1 & \sum X_1^2 & \sum X_1 X_2 \\ \sum X_2 & \sum X_1 X_2 & \sum X_2^2 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{pmatrix}$$
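A minimal sketch of this matrix equation in code, assuming NumPy: the 3×3 matrix of sums and the right-hand-side vector are built exactly as written above and solved for $(\alpha, \beta_1, \beta_2)$. The data arrays are the same made-up values as in the earlier sketch.

```python
import numpy as np

X1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # illustrative data
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
Y  = np.array([5.0, 9.0, 10.0, 15.0, 16.0, 20.0])
n = len(Y)

# Coefficient matrix of the normal equations (the matrix of sums).
A = np.array([
    [n,         X1.sum(),       X2.sum()],
    [X1.sum(),  (X1**2).sum(),  (X1*X2).sum()],
    [X2.sum(),  (X1*X2).sum(),  (X2**2).sum()],
])
# Right-hand side: the vector of cross-product sums.
b = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

# Solve A [alpha, beta1, beta2]^T = b.
alpha, beta1, beta2 = np.linalg.solve(A, b)
print(alpha, beta1, beta2)
```

In exact arithmetic this is equivalent to the `lstsq` call in the earlier sketch; `lstsq` is generally the numerically safer choice for larger problems.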
1. MULTIPLE REGRESSION
• A company sells a new brand of deodorant in 6 marketing districts. The management of the company, in an exploration of the feasibility of predicting district sales ($Y$) from target population size ($X_1$) and per capita income ($X_2$), collected data and obtained the following results:
1. MULTIPLE REGRESSION
• Summary statistics for the $n = 6$ districts:

$\sum Y = 54$, $\sum X_1 = 48$, $\sum X_2 = 102$, $\sum X_1 Y = 339$, $\sum X_2 Y = 720$, $\sum X_1^2 = 494$, $\sum X_2^2 = 2188$, $\sum X_1 X_2 = 1034$, $\sum Y^2 = 576$


1. MULTIPLE REGRESSION
• Obtain the least-squares estimates of the regression coefficients.
• Solve for the three constants (e.g., using a calculator):
• $\alpha = 16.48$, $\beta_1 = 0.39$, $\beta_2 = -0.62$

• Write down the multiple regression equation.

• $Y = 16.48 + 0.39 X_1 - 0.62 X_2$
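A short verification sketch (assuming NumPy): plugging the summary sums from the table into the matrix form of the normal equations reproduces the coefficients quoted above.

```python
import numpy as np

n = 6  # six marketing districts
A = np.array([
    [n,      48.0,    102.0],    # [n,       sum X1,    sum X2   ]
    [48.0,   494.0,   1034.0],   # [sum X1,  sum X1^2,  sum X1*X2]
    [102.0,  1034.0,  2188.0],   # [sum X2,  sum X1*X2, sum X2^2 ]
])
b = np.array([54.0, 339.0, 720.0])   # [sum Y, sum X1*Y, sum X2*Y]

alpha, beta1, beta2 = np.linalg.solve(A, b)
print(round(alpha, 2), round(beta1, 2), round(beta2, 2))  # 16.48 0.39 -0.62
```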


1. MULTIPLE REGRESSION
• Interpret the regression coefficients in your equation above.
$\alpha = 16.48$
• If both the target population and the per capita income are zero, predicted sales are 16.48.
$\beta_1 = 0.39$
• If the target population increases by 1 while per capita income stays the same, sales increase by 0.39.
$\beta_2 = -0.62$
• If per capita income increases by 1 while the target population stays the same, sales decrease by 0.62.
1. MULTIPLE REGRESSION
• Residual Analysis
• The difference between the observed $Y$ and the corresponding fitted value $Y^*$ is called the residual, $e$.
• $e = Y - Y^*$
• Example
• The following data were collected on two variables $x$ and $y$, and a least-squares regression line was fitted to the data. The resulting equation is $y^* = 2.35 + 0.86x$
1. MULTIPLE REGRESSION
• Residual Analysis
x 23 15 26 24 22 29 32 40 41 46

y 19 18 22 20 27 25 32 38 35 45

• Find the residuals for the data above.
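One way to compute the requested residuals, sketched in Python with NumPy (the slide itself expects hand or calculator work):

```python
import numpy as np

x = np.array([23, 15, 26, 24, 22, 29, 32, 40, 41, 46], dtype=float)
y = np.array([19, 18, 22, 20, 27, 25, 32, 38, 35, 45], dtype=float)

y_fit = 2.35 + 0.86 * x   # fitted values from the given equation
residuals = y - y_fit     # e = y - y*

# e.g. the first residual is 19 - (2.35 + 0.86*23) = -3.13
print(residuals)
```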


1. MULTIPLE REGRESSION
• Residual Analysis
• The sum of the residuals is zero: $\sum (Y - Y^*) = 0$ (checked in the sketch after this list)
• Errors have constant variance
• Errors are independent
• Errors are normally distributed
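A small check, assuming NumPy, of the first property: the residuals of an exact least-squares fit sum to zero up to floating-point error (the rounded coefficients 2.35 and 0.86 leave a small nonzero remainder, so the line is re-fitted at full precision here). The remaining three properties are assumptions about the errors that are usually examined graphically.

```python
import numpy as np

x = np.array([23, 15, 26, 24, 22, 29, 32, 40, 41, 46], dtype=float)
y = np.array([19, 18, 22, 20, 27, 25, 32, 38, 35, 45], dtype=float)

slope, intercept = np.polyfit(x, y, 1)   # full-precision least-squares fit
residuals = y - (intercept + slope * x)

print(residuals.sum())   # ~0, apart from floating-point rounding
# Constant variance, independence and normality are usually judged from
# residual-vs-fitted plots, plots against time/order, and normal Q-Q plots.
```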
2. VARIABLE SELECTION
•Concept
•Selection of the subset of independent
variables that contributes significantly
to explaining the dependent variable
2. VARIABLE SELECTION
•Advantages
•Cost Reduction
•Less Time
•A simpler, easier-to-handle resulting model
2. VARIABLE SELECTION
•Disadvantages
•Can be abused
•Risk of discarding key explanatory variables
•Reduced explanatory power of the model
2. VARIABLE SELECTION
•Stages
•Data Collection Stage
•Screening Stage
•Model Building Stage
2. VARIABLE SELECTION
•Techniques
•Consider all possible regression models
(a sketch of this follows after this list)
•Identify independent variables that are
‘good’
•Employ automatic search procedures
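The sketch below illustrates the first technique, considering all possible regression models, under two assumptions not stated in the slides: the fits are ordinary least squares via NumPy, and the comparison criterion is adjusted R²; other criteria (Mallows' Cp, AIC, cross-validation error) could be used instead.

```python
import itertools
import numpy as np

def adjusted_r2(y, X_sub):
    """Fit OLS of y on X_sub (plus an intercept) and return adjusted R^2."""
    n, p = X_sub.shape
    design = np.column_stack([np.ones(n), X_sub])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = (resid ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_subset(y, X):
    """Try every non-empty subset of the columns of X; return the best one."""
    k = X.shape[1]
    best_score, best_idx = -np.inf, None
    for size in range(1, k + 1):
        for idx in itertools.combinations(range(k), size):
            score = adjusted_r2(y, X[:, idx])
            if score > best_score:
                best_score, best_idx = score, idx
    return best_idx, best_score
```

Because the number of subsets grows as 2^k, this brute-force search is practical only for a modest number of candidate variables, which is one motivation for the automatic search procedures mentioned above.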
3. OUTLIERS
• Definition
• Extreme observations that are not typical of the
rest of the observations
• Causes
• Errors made while collecting, recording,
transcribing, or keying in the data
• Genuine observations
3. OUTLIERS
• Effects
• Useful evidence
• Misleading fit
• Incorrect estimate
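The slides do not prescribe a detection rule, but a common convention (used here as an assumption) is to flag observations whose standardized residuals exceed roughly 2 to 3 in absolute value. The sketch applies that idea, with NumPy, to the x-y data from the residual-analysis example.

```python
import numpy as np

x = np.array([23, 15, 26, 24, 22, 29, 32, 40, 41, 46], dtype=float)
y = np.array([19, 18, 22, 20, 27, 25, 32, 38, 35, 45], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# ddof=2 because two parameters (slope and intercept) were estimated.
standardized = residuals / residuals.std(ddof=2)

# The cutoff of 2 is a convention, not a rule from the slides.
flagged = np.where(np.abs(standardized) > 2)[0]
print(flagged)   # indices of observations worth a closer look
```

Flagged points are candidates for a closer look (data-collection error versus genuine observation), not for automatic deletion.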
