Professional Documents
Culture Documents
Weichen Wang
August 2021
Response variable?
800
600
400
200
Weight (carats)
Response variable?
800
Price
600
400
200
Weight (carats)
Response variable?
800
Price
Explanatory variable?
600
400
200
Weight (carats)
Response variable?
800
Price
Explanatory variable?
600
Weight
400
200
Weight (carats)
Response variable?
800
Price
Explanatory variable?
600
Weight
400
Weight (carats)
Response variable?
800
Price
Explanatory variable?
600
Weight
400
Weight (carats)
v
n u n
1X u 1 X
x̄ = xi , sx = t (xi − x̄)2
n i=1
n − 1 i=1
r=0.98 r=−0.98
5000
5000
r=0.98 r=0.98 r=0.98
Price (Singapore dollars)
4000
4000
800
Price (RMB)
Price (RMB)
3000
3000
600
2000
2000
400
1000
1000
200
0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35 0.03 0.04 0.05 0.06 0.07
0.35
1000
r=0.98 r=0.98
0.30
Price (Singapore dollars)
800
Weight (carats)
0.25
600
0.20
400
0.15
200
0.15 0.20 0.25 0.30 0.35 200 400 600 800 1000
3
2
2
1
1
0
0
−1
−1
−1
−2
−2
−2
r=0.98 r=0.68 r=0.43
−3
−3
−3
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
The scatterplot below shows the relationship between HS graduate rate in all 50
US states and DC and the % of residents who live below the poverty line
Which of the following is the best guess for the correlation?
18
16
(a)
0.6
% in poverty
14
(b)
-0.75 12
(c)
-0.1 10
(d)
0.02 8
6
(e)
-1.5
80 85 90
% HS grad
The scatterplot below shows the relationship between HS graduate rate in all 50
US states and DC and the % of residents who live below the poverty line
Which of the following is the best guess for the correlation?
18
16
(a)
0.6
% in poverty
14
(b)
-0.75 12
(c)
-0.1 10
(d)
0.02 8
6
(e)
-1.5
80 85 90
% HS grad
Which of the following is the best guess for the correlation between % in poverty
and % female householder?
18
16
(a)
0.1
% in poverty
14
(b)
-0.6 12
(c)
-0.4 10
(d)
0.9 8
6
(e)
0.5
8 10 12 14 16 18
% female householder, no husband present
Which of the following is the best guess for the correlation between % in poverty
and % female householder?
18
16
(a)
0.1
% in poverty
14
(b)
-0.6 12
(c)
-0.4 10
(d)
0.9 8
6
(e)
0.5
8 10 12 14 16 18
% female householder, no husband present
Which of the following has the strongest correlation, i.e. correlation coefficient
closest to +1 or -1?
(a) (b)
(c) (d)
Which of the following has the strongest correlation, i.e. correlation coefficient
closest to +1 or -1?
(b) →
(a) (b) correlation
means linear
association
(c) (d)
1000
Price (Singapore dollars)
800
600
400
200
Weight (carats)
1000
Price (Singapore dollars)
800
600
400
200
Weight (carats)
How much do I expect to pay for a ring with a 0.3 carat diamond?
Given a budget of $1000, how big can I expect to afford?
predictor
I y is called the dependent variable or response
predictor
I y is called the dependent variable or response
Examples
I Weight of diamond vs price
I Education vs salary
predictor
I y is called the dependent variable or response
Examples
I Weight of diamond vs price
I Education vs salary
1000
Price (Singapore dollars)
800
600
400
200
Weight (carats)
ˆ i = b0 + b1 weighti
price
H
@ HH
predicted price
@
R
@ H
j
H
slope explanatory variable weight
intercept
(a)
1000
(b)
(c)
800
Which of the follow-
ing appears to be the
line that best fits the
linear relationship be- 600
tween price and weight?
Choose one.
400
200
(a)
1000
(b)
(c)
800
Which of the follow-
ing appears to be the
line that best fits the
linear relationship be- 600
tween price and weight?
Choose one.
400
(a)
200
800
observed y=750
residual e=80
predicted y=670
The price for this ring is 80
600
Weight (carats)
800
weight price
(x) (y )
600
correlation r= 0.98
200
Weight (carats)
b1 = ...?
213.64
b1 = 0.98 × = 3721.02
0.06
213.64
b1 = 0.98 × = 3721.02
0.06
Interpretation
213.64
b1 = 0.98 × = 3721.02
0.06
Interpretation
each 1 carat increase in weight results in SGD$ 3,721 increase in price
each 0.1 carat increase in weight results in SGD$ 372.1 increase in price
Weichen Wang (HKU) 2021 27 / 132
Intercept
The intercept is where the regression line intersects the y -axis. The calculation of
the intercept uses the fact the a regression line always passes through (x̄, ȳ ).
b0 = ȳ − b1 x̄
b0 = ȳ − b1 x̄
In context...
weight price
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
sd sx = 0.06 sy = 213.64
correlation r = 0.98
b0 = ȳ − b1 x̄
In context...
weight price
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
sd sx = 0.06 sy = 213.64
correlation r = 0.98
b0 = ȳ − b1 x̄
1000
In context...
500
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
0
sd sx = 0.06 sy = 213.64
correlation r = 0.98 b0=−259.63
−500
−0.1 0.0 0.1 0.2 0.3
b0 = 500.08 − 3721.02 × 0.20 = −259.63
Weight (carats)
b0 = ȳ − b1 x̄
1000
In context...
500
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
0
sd sx = 0.06 sy = 213.64
correlation r = 0.98 b0=−259.63
−500
−0.1 0.0 0.1 0.2 0.3
b0 = 500.08 − 3721.02 × 0.20 = −259.63
Weight (carats)
Interpretation
b0 = ȳ − b1 x̄
1000
In context...
500
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
0
sd sx = 0.06 sy = 213.64
correlation r = 0.98 b0=−259.63
−500
−0.1 0.0 0.1 0.2 0.3
b0 = 500.08 − 3721.02 × 0.20 = −259.63
Weight (carats)
Interpretation
If you walk into the store, asking for a 0-carat ring, the store actually pays you
SGD$259.63!
b0 = ȳ − b1 x̄
1000
In context...
500
(x) (y )
mean x̄ = 0.20 ȳ = 500.08
0
sd sx = 0.06 sy = 213.64
correlation r = 0.98 b0=−259.63
−500
−0.1 0.0 0.1 0.2 0.3
b0 = 500.08 − 3721.02 × 0.20 = −259.63
Weight (carats)
Interpretation
If you walk into the store, asking for a 0-carat ring, the store actually pays you
SGD$259.63!
Don’t extrapolate outside the data range!
Weichen Wang (HKU) 2021 28 / 132
Regression Line
1000
Price (Singapore dollars)
800
600
400
200
Weight (carats)
price
[ = b0 + b1 weight
\ = a0 + a1 price
weight
price
[ = b0 + b1 weight
\ = a0 + a1 price
weight
price
[ = b0 + b1 weight
\ = a0 + a1 price
weight
1000
predicted price=856.68
800
plugging in the value of x in the linear
model equation.
600
According to the linear model
400
price = −259.63 + 3721.02 weight,
what is the predicted price for a 0.30
200
carat ring? 0.15 0.20 0.25 0.30 0.35
Weight (carats)
1000
predicted price=856.68
800
plugging in the value of x in the linear
model equation.
600
According to the linear model
400
price = −259.63 + 3721.02 weight,
what is the predicted price for a 0.30
200
carat ring? 0.15 0.20 0.25 0.30 0.35
yi = β0 + β1 xi + i
yi = β0 + β1 xi + i
E (yi |xi ) = β0 + β1 xi
Model parameters: β0 , β1 , σ 2
Think of data as the sum of the signal β0 + β1 xi and noise i .
Interpretations:
I β + β x : conditional mean of y at x = x .
0 1 i i
I β : average of y when x = 0.
0
I β : average change in y between locations x and x + 1.
1
I σ: standard deviation of variation around conditional mean.
I Equal variance
I Normally distributed
dependence.
I A Residual vs. x scatterplot should reveal no meaningful
pattern.
I A Residual vs. Predicted scatterplot should reveal no
meaningful pattern.
I A histogram and normal quantile plot of the residuals should
1000
Price (Singapore dollars)
800
600
400
200
Weight (carats)
50
50
800
Residuals
Residuals
0
0
600
−50
−50
400
200
0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35 200 400 600 800 1000
50
Residuals
200 400 600 800
Price
0
−50
0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35
50
Residuals
200 400 600 800
Price
0
−50
0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35
50
Residuals
200 400 600 800
Price
0
−50
0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35
Sample Quantiles
50
Frequency
0 2 4 6 8
0
−50
−100 −50 0 50 −2 −1 0 1 2
1 2 3 4 5 6 7
DisplayFeet
Weichen Wang (HKU) 2021 47 / 132
Model Diagnostics 1: A Linear Fit Does Not Make Sense!
400
300
Sales
200
100
1 2 3 4 5 6 7
DisplayFeet
100
Residuals
50
0
−100
1 2 3 4 5 6 7
DisplayFeet
400
400
300
300
Sales
Sales
200
200
100
100
1 2 3 4 5 6 7 1 2 3 4 5 6 7
DisplayFeet DisplayFeet
100
Residuals
residuals
50
50
0
0
−50
−100
1 2 3 4 5 6 7 1 2 3 4 5 6 7
DisplayFeet DisplayFeet
Bodywtkg
8
5000
logBrainwtg
6
Brainwtg
3000
4
2
1000
0
−2
0
Bodywtkg logBodywtkg
8
5000
logBrainwtg
logBrainwtg
6
6
Brainwtg
3000
4
2
2
1000
0
−2
−2
0
60
60
50
50
40
40
Frequency
Frequency
30
30
20
20
10
10
0
0
0 2000 4000 6000 0 1000 3000 5000
Bodywtkg Brainwtg
Histogram of logBodywtkg Histogram of logBrainwtg
15
15
Frequency
Frequency
10
10
5
5
0
−5 0 5 10 −2 0 2 4 6 8 10
logBodywtkg logBrainwtg
y = a + bx
If x changes from x to x + δ, the exact change in y is bδ.
y = a + b loge x
If x changes from x to x(1 + p%) (a p% change), the
approximate change in y is bp%.
loge y = a + bx
If x changes from x to x + δ, the approximate change in y is y to
y (1 + bδ).
loge y = a + b loge x
If x changes from x to x(1 + p%) (a p% change), the
approximate change in y is y to y (1 + bp%).
1500
40
Sample Quantiles
Frequency
30
500
20
10
−500
0
2
Sample Quantiles
15
1
Frequency
10
0
5
−1
0
−2 −1 0 1 2 −2 −1 0 1 2
0 5 10 15 20 25 30
Period
residuals
1e+07
4e+07
−1e+07
0e+00
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Period Period
residuals
1e+07
4e+07
−1e+07
0e+00
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Period Period
2 4 6 8 10 12 14 16
Number.of.Crews
5 10
Residuals
50
−5 0
30
−15
10
2 4 6 8 10 12 14 16 10 20 30 40 50 60
2 4 6 8 10 12 14 16
Number.of.Crews
7
RoomsPerCrew
70
RoomsClean
6
50
5
4
30
3
10
Number.of.Crews 1/NumCrews
Outliers are points that lie away from the cloud of points.
Outliers are points that lie away from the cloud of points.
High leverage points: outliers that lie horizontally away from the
center of the cloud.
Outliers are points that lie away from the cloud of points.
High leverage points: outliers that lie horizontally away from the
center of the cloud.
Influential points: high leverage points that actually influence the
slope of the regression line.
Outliers are points that lie away from the cloud of points.
High leverage points: outliers that lie horizontally away from the
center of the cloud.
Influential points: high leverage points that actually influence the
slope of the regression line.
In order to determine if a point is influential, visualize the
regression line with and without the point. Does the slope of the
line change considerably? If so, then the point is influential. If
not, then it’s not an influential point.
the points
40
the outlier?
0
(a)
influential
(b)
high leverage −20
(c)
none of the above
(d)
there are no outliers
2
−2
40
the outlier?
0
(a)
influential
(b)
high leverage −20
(c)
none of the above
(d)
there are no outliers
2
−2
10
−5
−5
10
Not much...
−5
−5
w. outliers
w.o. outliers
1e+05 2e+05 3e+05 4e+05
House.Price
Crime.Rate
w. outliers
w.o. outliers
25000
Profit
15000
5000
0
Sq_Feet
Weichen Wang (HKU) 2021 62 / 132
Model Diagnostics 5: Outliers
y
(a)
Constant variability
(b)
Linear relationship
(c)
Normal residuals
g$residuals
(d)
No extreme outliers x
y
(a)
Constant variability
(b)
Linear relationship
(c)
Normal residuals
g$residuals
(d)
No extreme outliers x
y
(a)
Constant variability
(b)
Linear relationship
(c)
Normal residuals
g$residuals
g$residuals
(d)
No extreme outliersx x
y
(a)
Constant variability
(b)
Linear relationship
(c)
Normal residuals
g$residuals
g$residuals
(d)
No extreme outliersx x
yi = β0 + β1 xi + i
yi = β0 + β1 xi + i
Population line
yi = β0 + β1 xi + i
yi = b0 + b1 xi + ei
Population line
yi = β0 + β1 xi + i
yi = b0 + b1 xi + ei
weight
We want to
price
600
population line.
200
I H
0 : β1 = 0 (nothing is going on): The explanatory variable is
not a significant predictor of the response variable, i.e. no
relationship ⇒ slope of the relationship is 0.
I H
a : β1 6= 0 (something going on): The explanatory variable is
a significant predictor of the response variable, i.e.
relationship ⇒ slope of the relationship is different than 0.
I H
0 : β1 = 0 (nothing is going on): The explanatory variable is
not a significant predictor of the response variable, i.e. no
relationship ⇒ slope of the relationship is 0.
I H
a : β1 6= 0 (something going on): The explanatory variable is
a significant predictor of the response variable, i.e.
relationship ⇒ slope of the relationship is different than 0.
2 What is the range of possible values that β1 might take on?
I We can use confidence interval for β
1 to answer this.
I H
0 : β1 = 0 (nothing is going on): The explanatory variable is
not a significant predictor of the response variable, i.e. no
relationship ⇒ slope of the relationship is 0.
I H
a : β1 6= 0 (something going on): The explanatory variable is
a significant predictor of the response variable, i.e.
relationship ⇒ slope of the relationship is different than 0.
2 What is the range of possible values that β1 might take on?
I We can use confidence interval for β
1 to answer this.
3 What is the range of possible values of E (Y ) for given value X ?
I We can use confidence intervals for E (Y ).
I H
0 : β1 = 0 (nothing is going on): The explanatory variable is
not a significant predictor of the response variable, i.e. no
relationship ⇒ slope of the relationship is 0.
I H
a : β1 6= 0 (something going on): The explanatory variable is
a significant predictor of the response variable, i.e.
relationship ⇒ slope of the relationship is different than 0.
2 What is the range of possible values that β1 might take on?
I We can use confidence interval for β
1 to answer this.
3 What is the range of possible values of E (Y ) for given value X ?
I We can use confidence intervals for E (Y ).
Remember: We lose 1 degree of freedom for each parameter we estimate, and in simple linear regression we
estimate 2 parameters, β0 and β1 .
Remember: We lose 1 degree of freedom for each parameter we estimate, and in simple linear regression we
estimate 2 parameters, β0 and β1 .
1000
800
price
600
400
200
weight
Weichen Wang (HKU) 2021 76 / 132
4. Prediction Interval for y = β0 + β1 x +
How much might pay for a specific ring with a 1/4 carat diamond?
1000
800
price
600
400
200
weight
Weichen Wang (HKU) 2021 77 / 132
Comparison
1000
800
price
600
400
200
weight
200 horsepower?
I How much does my 200-pound brother owe me after riding
200 horsepower?
I How much does my 200-pound brother owe me after riding
60
50
40
MPG
30
20
2 3 4 5 6
Weight
60
50
40
MPG
30
20
2 3 4 5 6
Weight
2 3 4 5 6
Weight
2 3 4 5 6
Weight
2 4 6 90 130
3 5 7
GPHM
6
Weight
4
2
350
Horsepower
100
130
Wheelbase
90
250000
Price
0
2 4 6 90 130
3 5 7
GPHM
6
Weight
4
2
350
Horsepower
100
130
Wheelbase
90
250000
Price
0
1.5
8
GPHM Residual
1.0
GPHM Actual
7
0.5
6
0.0
5
4
−1.0
3
4 5 6 7 8 4 5 6 7 8
1.5
80
1.0
Frequency
60
0.5
0.0
40
20
−1.0
0
1.5
8
GPHM Residual
1.0
GPHM Actual
7
0.5
6
0.0
5
4
−1.0
3
4 5 6 7 8 4 5 6 7 8
1.5
80
1.0
Frequency
60
0.5
0.0
40
20
−1.0
0
H0 : βk = 0 for all k.
H0 : βk = 0 for all k.
(TSS − SSE )/K (178.04 + 9.04 + 6.35 + 0.41)/4
F = = = 352.5 ∼ FK ,n−K −1
SSE /(n − K − 1) 29.42/214
H0 : βk = 0 for all k.
(TSS − SSE )/K (178.04 + 9.04 + 6.35 + 0.41)/4
F = = = 352.5 ∼ FK ,n−K −1
SSE /(n − K − 1) 29.42/214
I At least one β
k is not zero, i.e. at least one of the Xk ’s is
useful in predicting Y .
Weichen Wang (HKU) 2021 99 / 132
Multiple Linear Regression Review
40 60 80 120
60
Age
40
20
Correlation
120
20 40 60 2 4 6 8
8
Med
6
Rating
(Intercept) 3.30845 3.42190 0.967 0.436
Age -0.04144 0.10786 -0.384 0.738
4
moderate incomes ($70K ∼ $80K)
2
Estimate Std. Error t value Pr (> |t|)
(Intercept) 8.36412 2.34772 3.563 0.0026
Age -0.07978 0.04791 -1.665 0.1153
20 30 40 50 60 70
Age
high incomes (> $110K)
Estimate Std. Error t value Pr (> |t|)
(Intercept) 12.07081 1.28999 9.357 0.000235
Age -0.06243 0.01873 -3.332 0.020727
8
Med
6
Rating
(Intercept) 3.30845 3.42190 0.967 0.436
Age -0.04144 0.10786 -0.384 0.738
4
moderate incomes ($70K ∼ $80K)
2
Estimate Std. Error t value Pr (> |t|)
(Intercept) 8.36412 2.34772 3.563 0.0026
Age -0.07978 0.04791 -1.665 0.1153
20 30 40 50 60 70
Age
high incomes (> $110K)
Estimate Std. Error t value Pr (> |t|)
(Intercept) 12.07081 1.28999 9.357 0.000235
Age -0.06243 0.01873 -3.332 0.020727
over TBill30
Also, consider multiple linear regression of exPACGE on both
exSP500 and exVW
SP500
Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.009682 0.004317 2.243 0.026803
SP500 0.310295 0.087490 3.547 0.000562
ANOVA F-statistic 12.58 p-value 0.0005623
0.15
0.10
0.10
0.05
0.05
PACGE
PACGE
0.00
0.00
−0.05
−0.05
−0.10
−0.10
−0.20 −0.10 0.00 0.05 0.10 −0.20 −0.10 0.00 0.05 0.10
SP500 VW
Estimate Std. Error t value Pr (> |t|) Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.009682 0.004317 2.243 0.026803 (Intercept) 0.008371 0.004371 1.915 0.057918
SP500 0.310295 0.087490 3.547 0.000562 VW 0.315696 0.084970 3.715 0.000313
ANOVA F-statistic 12.58 p-value 0.0005623 ANOVA F-statistic 13.8 p-value 0.0003126
0.10
0.05
Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.009682 0.004317 2.243 0.026803
SP500 0.310295 0.087490 3.547 0.000562
0.00
ANOVA F-statistic 12.58 p-value 0.0005623
VW
Estimate Std. Error t value Pr (> |t|)
−0.10
(Intercept) 0.008371 0.004371 1.915 0.057918
VW 0.315696 0.084970 3.715 0.000313
ANOVA F-statistic 13.8 p-value 0.0003126
−0.20
Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.005448 0.005119 1.064 0.289
SP500 -0.821098 0.749946 -1.095 0.276
−0.20 −0.10 0.00 0.05 0.10
VW 1.111498 0.731784 1.519 0.132
ANOVA F-statistic 7.513 p-value 0.0008547 SP500
0.10
0.05
Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.009682 0.004317 2.243 0.026803
SP500 0.310295 0.087490 3.547 0.000562
0.00
ANOVA F-statistic 12.58 p-value 0.0005623
VW
Estimate Std. Error t value Pr (> |t|)
−0.10
(Intercept) 0.008371 0.004371 1.915 0.057918
VW 0.315696 0.084970 3.715 0.000313
ANOVA F-statistic 13.8 p-value 0.0003126
−0.20
Estimate Std. Error t value Pr (> |t|)
(Intercept) 0.005448 0.005119 1.064 0.289
SP500 -0.821098 0.749946 -1.095 0.276
−0.20 −0.10 0.00 0.05 0.10
VW 1.111498 0.731784 1.519 0.132
ANOVA F-statistic 7.513 p-value 0.0008547 SP500
Huge Collinearity!!!
redundant?
factors simultaneously.
I When predictors are collinear, the F test reveals their net
factors simultaneously.
I When predictors are collinear, the F test reveals their net
2 SSE /(n − K − 1)
Radj =1−
TSS/(n − 1)
where n is the number of cases and K is the number of predictors
2 SSE /(n − K − 1)
Radj =1−
TSS/(n − 1)
where n is the number of cases and K is the number of predictors
2
Because K is never negative, Radj will always be smaller than R 2 .
2
Radj applies a penalty for the number of predictors
2
Therefore, we can choose models with higher Radj over others.
Weichen Wang (HKU) 2021 115 / 132
R 2 vs. adjusted R 2
6
rating
External Internal
origin
I x
1 be the indicator function of being ‘Internal‘,
I (Origin = Internal)
I β
0 = µExternal
I β
1 = µInternal − µExternal
I x
1 be the indicator function of being ‘Internal‘,
I (Origin = Internal)
I β
0 = µExternal
I β
1 = µInternal − µExternal
ANOVA model is the same as
yi = β0 + β1 xi1 + i
I x
1 be the indicator function of being ‘Internal‘,
I (Origin = Internal)
I β
0 = µExternal
I β
1 = µInternal − µExternal
ANOVA model is the same as
yi = β0 + β1 xi1 + i
These two tests are equivalent
H0 : µInternal = µExternal and H0 : β1 = 0
Weichen Wang (HKU) 2021 120 / 132
Regress Manager Rating on Origin
origin
6
rating
External
Internal
4
60 80 100
salary
origin
6
rating
External
Internal
4
60 80 100
salary
origin
6
rating
External
Internal
4
60 80 100
salary
origin
6
rating
External
Internal
4
60 80 100
salary
External
Internal
4 External
Estimate Std. Error t value Pr (> |t|)
(Intercept) -1.9369 0.9862 -1.964 0.0542
2 salary 0.1054 0.0125 8.432 9.01e-12
60 80 100
salary
External
Internal
4 External
Estimate Std. Error t value Pr (> |t|)
(Intercept) -1.9369 0.9862 -1.964 0.0542
2 salary 0.1054 0.0125 8.432 9.01e-12
60 80 100
salary
explain Rating.
I With Salary added, the effect of Origin changes sign. Now
7.5
rating origin
External
5.0 Internal
2.5
60 80 100
salary
7.5
rating origin
External
5.0 Internal
2.5
60 80 100
salary
60 80 100
salary
60 80 100
salary
60 80 100
salary
Beyond just looking at the plot, we can fit a model that allows
the slopes to differ.
This model gives an estimate of the difference between the slopes.
This estimate is known as an interaction.
An interaction between a dummy variable and a numerical
variable measures the difference between the slopes of the
numerical variable in the two groups.
8
Estimate Std. Error t value Pr (> |t|)
origin (Intercept) -1.936941 1.156482 -1.675 0.0961
6
rating
External
originInternal 0.243417 1.447230 0.168 0.8667
salary 0.105391 0.014657 7.191 3.09e-11
Internal
4 originInternal:salary 0.003702 0.019520 0.190 0.8499
60 80 100
salary
8
Estimate Std. Error t value Pr (> |t|)
origin (Intercept) -1.936941 1.156482 -1.675 0.0961
6
rating
External
originInternal 0.243417 1.447230 0.168 0.8667
salary 0.105391 0.014657 7.191 3.09e-11
Internal
4 originInternal:salary 0.003702 0.019520 0.190 0.8499
60 80 100
salary
−2
External Internal
origin