
STAT 250

Dr. Kari Lock Morgan

Simple Linear
Regression

SECTIONS 2.6, 9.1


• Simple linear regression

Statistics: Unlocking the Power of Data Lock5


Want More Stats???
• If you have enjoyed learning how to analyze data, and
want to learn more:
• STAT 460 (Intermediate Applied Statistics)
• STAT 461 (Analysis of Variance)
• STAT 462 (Applied Regression Analysis)
• All applied, only prerequisite is STAT 200 or 250

• If you like math and want to learn more of the
mathematical theory behind what we’ve learned:
• take STAT 414 (Probability)
• and then STAT 415 (Mathematical Statistics)
• Prerequisite: MATH 230 or MATH 231
MODELING



Question of the Day
Is the size of certain regions of
your brain associated with the
size of your social network?



Social Networks and the Brain
Data from 40 students at City College London

How to measure brain size?

How to measure social network size?

Source: R. Kanai, B. Bahrami, R. Roylance and G. Rees (2011).
Online social network size is reflected in human brain structure, Proceedings of the
Royal Society B: Biological Sciences. 10/19/11.



Measuring Brain Size
Structural Magnetic Resonance Imaging (MRI)

Voxel-based morphometry (VBM) to compute regional
grey matter volume based on T1-weighted anatomical
MRI scans

Brain regions found significant in initial study
• Amygdala (emotion and emotional memory)
• Middle temporal gyrus (social perception)
• Entorhinal cortex (memory and navigation)
• Superior temporal sulcus (perception of others)

Response: normalized z-score of grey matter density for
these brain regions
Brain Regions

Image from Do our Brains Determine our Facebook Friend Count? (www.nature.com)
Social Networks and the Brain
How to measure size of social network?
• How many were present at your 18th or 21st birthday party?
• If you were going to have a party now, how many people would you
invite?
• What is the total number of friends in your phonebook?
• Write down the names of the people to whom you would send a text
message marking a celebratory event. How many people is that?
• Write down the names of people in your phonebook you would meet for
a chat in a small group (one to three people). How many people is that?
• How many friends have you kept from school and university whom you
could have a friendly conversation with now?
• How many friends do you have on ‘Facebook’?
• How many friends do you have from outside school or university?
• Write down the names of the people of whom you feel you could ask a
favor and expect to have it granted. How many people is that?



Social Networks and the Brain

[Scatterplot: response variable, y, against explanatory variable, x; r = 0.436]
Linear Model

A linear model predicts a response
variable, y, using a linear function of
explanatory variables

Simple linear regression predicts one
response variable, y, as a linear
function of one explanatory variable, x



Regression Line
Goal: Find a straight line that best fits the data



Equation of the Line
The estimated regression line is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where $\hat{\beta}_0$ is the intercept, $\hat{\beta}_1$ is the slope, x is the explanatory
variable, and $\hat{y}$ is the predicted response variable.
• Slope: increase in predicted y for every unit
increase in x
• Intercept: predicted y value when x = 0
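To make the slope and intercept concrete, here is a minimal Python sketch that fits a line to a handful of points. It assumes "best fits" means the usual least-squares criterion, and the data values are made up for illustration (they are not the brain data). The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # change in predicted y per unit increase in x
    intercept = y_bar - slope * x_bar  # predicted y when x = 0
    return intercept, slope


# Made-up toy data, for illustration only
toy_xs = [1, 2, 3, 4, 5]
toy_ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(toy_xs, toy_ys)
print(f"predicted y = {intercept:.2f} + {slope:.2f} x")
```

The least-squares line always passes through the point of means, which is why the intercept can be computed from the slope and the two sample means.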
Regression Line
 
Which plot goes with the line?
[Figure: four scatterplots labeled (a), (b), (c), (d)]



Regression Line

$$\widehat{\text{GM Density}} = -0.9276 + 0.00231 \cdot \text{FBfriends}$$

Intercept: −0.9276; Slope: 0.00231



Regression Model

$$\widehat{\text{GM Density}} = -0.9276 + 0.00231 \cdot \text{FBfriends}$$

Which is a correct interpretation?
a) The average GM density is -0.9276
b) For every extra 0.00231 Facebook friends, the
predicted GM density increases by 1
c) Predicted GM Density increases by 0.00231 for
each extra Facebook friend
d) For every extra 0.00231 Facebook friends, the
predicted GM density increases by -0.9276
Explanatory and Response
• Unlike correlation, for linear regression it
does matter which is the explanatory
variable and which is the response
 



Prediction
• The regression equation can be used to
predict y for a given value of x
 

$$\widehat{\text{GM Density}} = -0.9276 + 0.00231 \cdot \text{FBfriends}$$
What GM density would you predict for a
person with 100 Facebook friends?
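The prediction is a plug-in calculation. A short Python sketch using the intercept and slope from the fitted equation (the function name `predict_gm_density` is hypothetical):

```python
INTERCEPT = -0.9276  # from the fitted equation on the slides
SLOPE = 0.00231      # from the fitted equation on the slides


def predict_gm_density(fb_friends):
    """Predicted normalized GM density for a given number of Facebook friends."""
    return INTERCEPT + SLOPE * fb_friends


prediction = predict_gm_density(100)
print(round(prediction, 4))  # -0.9276 + 0.00231 * 100 = -0.6966
```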



Prediction



Prediction
Should you use this equation to predict the
normalized size of these regions of your brain?

a) Yes
b) No



Regression Caution 1

• Do not use the regression equation or line
to predict outside the range of x values
available in your data (do not extrapolate!)

• If none of the x values are anywhere near
0, then the intercept is meaningless!
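One way to build this caution into code is to refuse predictions outside the observed x range. The sketch below is only an illustration: the function name `predict_within_range`, the toy x values, and the toy intercept and slope are all made up, not taken from the brain study.

```python
def predict_within_range(x_new, observed_xs, intercept, slope):
    """Predict y at x_new, but only if x_new lies inside the observed x range."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lo}, {hi}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x_new


# Made-up x values and coefficients, for illustration only
toy_xs = [1, 2, 3, 4, 5]

print(predict_within_range(3.0, toy_xs, intercept=1.0, slope=2.0))  # inside the range

try:
    predict_within_range(10.0, toy_xs, intercept=1.0, slope=2.0)    # outside the range
except ValueError as err:
    print(err)
```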



Linearity
• The relationship between x and y is
linear (it makes sense to draw a line
through the scatterplot)



Regression Caution 2

• Computers will calculate a regression line for
any two quantitative variables, even if they are
not associated or if the association is not linear

• ALWAYS PLOT YOUR DATA!

• The regression line/equation should only be
used if the association is approximately linear



Charlie
Dog Years
• From www.dogyears.com:
“The old rule-of-thumb that one dog year
equals seven years of a human life is not
accurate. The ratio is higher with youth
and decreases a bit as the dog ages.”

• 1 dog year = 7 human years
• Linear: human age = 7×dog age
[Plot comparing the ACTUAL and LINEAR relationships]

A linear model can still be useful, even
if it doesn’t perfectly fit the data.
“All models are wrong,
but some are useful”
-George Box



Simple Linear Model
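• The population/true simple linear model is

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error

• $\beta_0$ and $\beta_1$ are unknown parameters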


• Can use familiar inference methods!
Inference for the Slope
Test for whether the slope is significantly
different from 0 (whether there is any linear
relationship between x and y):
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Confidence interval for the true slope
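In simple linear regression, the t statistic for testing whether the slope is 0 can be computed from the sample correlation r and the sample size n alone. The Python sketch below uses r = 0.436 and n = 40 from earlier slides, and assumes the two values describe the same sample. The function name `slope_t_statistic` is hypothetical.

```python
import math


def slope_t_statistic(r, n):
    """t statistic for H0: slope = 0 in simple linear regression.

    Uses the identity t = r * sqrt(n - 2) / sqrt(1 - r^2).
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = slope_t_statistic(r=0.436, n=40)
print(f"t = {t_stat:.2f} on {40 - 2} degrees of freedom")
```

The statistic would then be compared with a t distribution with n − 2 degrees of freedom to get a p-value, which regression software reports directly.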



Regression in Minitab



Association

Is the number of Facebook friends significantly
associated with Grey Matter Density?
a) Yes
b) No



Social Networks and the Brain

Should you go out and add more Facebook
friends to increase the size of your brain?

a) Yes
b) No



Regression Caution 4



Confidence Interval

$$\text{statistic} \pm t^* \cdot SE$$
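As a rough sketch of this formula applied to the slope, the Python below builds an interval around 0.00231. Several inputs are assumptions rather than values from the slides: the standard error is backed out as slope / t, with t computed from r = 0.436 and n = 40, and t* = 2.024 is an approximate 95% critical value for 38 degrees of freedom. In practice, software output would supply the SE and t* directly.

```python
import math


def confidence_interval(statistic, t_star, se):
    """Return (lower, upper) for statistic ± t* × SE."""
    margin = t_star * se
    return statistic - margin, statistic + margin


slope = 0.00231   # sample slope from the fitted equation
r, n = 0.436, 40  # sample correlation and sample size from the slides

# Assumption: back out the SE of the slope from t = slope / SE,
# where t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
se_slope = slope / t_stat

t_star = 2.024    # assumed approximate 95% critical value, 38 df

lower, upper = confidence_interval(slope, t_star, se_slope)
print(f"approximate 95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```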



Multiple Testing?



False Positive (Type I Error) Protection

To further protect against Type I errors, they
performed two independent analyses on two
separate samples (n = 125, then n = 40)



Real-World Network Size
What about real-world network size?



$R^2$

$R^2$ is the proportion of the variability in
the response variable, Y, that is
explained by the explanatory variable, X

For simple linear regression, $R^2 = r^2$ ($R^2$ is just
the sample correlation squared)
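The two descriptions of R² can be checked against each other numerically. The Python sketch below computes the sample correlation and, separately, the proportion of variability explained by the least-squares line, then confirms they agree. The data are made up for illustration and the function name `r_and_r_squared` is hypothetical. The last line squares the correlation from the brain data, r = 0.436.

```python
import math


def r_and_r_squared(xs, ys):
    """Return (r, R^2), where R^2 = 1 - SSE/SST for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variability in y (SST)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)           # sample correlation

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variability left over after using the line (SSE)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / syy                # proportion of variability explained
    return r, r_squared


# Made-up toy data, for illustration only
toy_xs = [1, 2, 3, 4, 5, 6]
toy_ys = [2.0, 1.5, 3.5, 3.0, 5.5, 4.5]

r, r_squared = r_and_r_squared(toy_xs, toy_ys)
print(math.isclose(r ** 2, r_squared))  # the two definitions agree

print(round(0.436 ** 2, 2))             # r = 0.436 from the brain data
```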



$R^2$
[Two scatterplots: one with $R^2 = 0.67$, one with $R^2 = 0.09$]

How much does the variability in Y decrease if you know X?



Regression in Minitab

$0.436^2 = 0.19$



To Do
HW 2.6, HW 9.1 (due Wednesday, 4/19)

