Regression Analysis

Scatter plots
Regression analysis requires
interval- or ratio-level data.
To see if your data fit the
assumptions of regression, it is
wise to examine a scatter plot first.
The reason?
Regression analysis assumes a linear
relationship. If you have a curvilinear
relationship or no relationship,
regression analysis is of little use.

Types of Lines

Scatter plot
[Scatter plot: "Percent of Population with Bachelor's Degree by Personal Income Per Capita" — x-axis: Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates; y-axis: Personal Income Per Capita, current dollars, 1999]
This is a linear relationship.
It is also a positive relationship:
as the percentage of the population
with BAs increases, so does
personal income per capita.

Regression Line
[Scatter plot with fitted regression line: Percent of Population with Bachelor's Degree by Personal Income Per Capita — R Sq Linear = 0.542]

The regression line is the best
straight-line description of the
plotted points, and you can use it
to describe the association between
the variables.
If all the points fall exactly on
the line, then the error is 0 and
you have a perfect relationship.
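The least-squares line the slides describe can be computed directly. A minimal sketch in plain Python, using made-up numbers (not the slides' state data) that happen to lie exactly on a line, so every point falls on the fit:

```python
# Fit a least-squares regression line y = a + b*x by hand.
# The x/y values are hypothetical, chosen so y = 1 + 2x exactly.
xs = [10, 15, 20, 25, 30]
ys = [21, 31, 41, 51, 61]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x  # intercept

print(a, b)  # 1.0 2.0 -- a perfect fit, all points on the line
```

Because the points sit exactly on the line, this is the "perfect relationship" case: every residual is zero.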

Things to remember
Regression still focuses on
association, not causation.
Association is a necessary
prerequisite for inferring
causation, but in addition:
1. The independent variable must
precede the dependent variable in
time.
2. The two variables must be plausibly
linked by a theory.
3. Competing independent variables
must be eliminated.

Regression Table
[Scatter plot with fitted regression line: Percent of Population with Bachelor's Degree by Personal Income Per Capita — R Sq Linear = 0.542]

The regression coefficient is not a
good indicator of the strength of
the relationship: two scatter plots
with very different dispersions can
produce the same regression line.

[Two scatter plots against Personal Income Per Capita: Percent of Population with Bachelor's Degree, and Population Per Square Mile — R Sq Linear = 0.461]
Regression coefficient
The regression coefficient is the
slope of the regression line, and it
tells you the nature of the
relationship between the variables:
how much change in the independent
variable is associated with how much
change in the dependent variable.
The larger the regression coefficient,
the more change.

Pearson's r
To determine strength, you look at
how closely the dots are clustered
around the line. The more tightly
the cases are clustered, the
stronger the relationship; the
more distant, the weaker.
Pearson's r ranges from -1 to +1,
with 0 indicating no linear
relationship at all.
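Pearson's r can be computed by hand as the covariance divided by the product of the standard deviations. A small sketch with hypothetical data (not the slides' data):

```python
import math

# Pearson's r for a small made-up data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
sy = math.sqrt(sum((y - my) ** 2 for y in ys))

r = cov / (sx * sy)  # always falls between -1 and +1
print(r)  # 0.8 for these numbers: a fairly strong positive relationship
```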

Reading the tables


Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

When you run a regression analysis in SPSS, you
get three tables. Each tells you something about
the relationship.
The first is the Model Summary.
R is the Pearson product-moment correlation
coefficient. In this case R is .736.
R is the square root of R-Square, and it is the
correlation between the observed and
predicted values of the dependent variable.

R-Square
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

R-Square is the proportion of variance in the
dependent variable (income per capita) which can be
predicted from the independent variable (level of
education).
This value indicates that 54.2% of the variance in
income can be predicted from the variable education.
Note that this is an overall measure of the strength
of association, and does not reflect the extent to
which any particular independent variable is
associated with the dependent variable.
R-Square is also called the coefficient of
determination.
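With a single predictor, the coefficient of determination is simply Pearson's r squared, which is easy to verify from the slide's Model Summary values:

```python
# With one predictor, R-Square is the square of Pearson's r.
# R = .736 is the value reported in the Model Summary.
r = 0.736
r_square = r ** 2
print(round(r_square, 3))  # 0.542, i.e. 54.2% of variance explained
```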

Adjusted R-square
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

As predictors are added to the model, each predictor will explain
some of the variance in the dependent variable simply due to
chance.
One could continue to add predictors to the model which would
continue to improve the ability of the predictors to explain the
dependent variable, although some of this increase in R-square
would be simply due to chance variation in that particular sample.
The adjusted R-square attempts to yield a more honest estimate
of R-square for the population. The value of R-square
was .542, while the value of Adjusted R-square was .532. There
isn't much difference because we are dealing with only one
variable.
When the number of observations is small and the number of
predictors is large, there will be a much greater difference
between R-square and adjusted R-square.
By contrast, when the number of observations is very large
compared to the number of predictors, the value of R-square and
adjusted R-square will be much closer.
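The adjustment uses a standard formula based on the number of observations n and predictors k. A quick check against the slide's values, assuming n = 50 (the 50 states, consistent with the ANOVA table's total df of 49):

```python
# Adjusted R-square penalizes R-square for the number of predictors:
#   adj = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
# n = 50 cases and k = 1 predictor (n inferred from total df = 49).
r_square = 0.542
n, k = 50, 1
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
print(round(adj_r_square, 3))  # 0.532, matching the Model Summary
```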

ANOVA
ANOVAb
Model 1        Sum of Squares   df   Mean Square    F        Sig.
Regression     4.32E+08         1    432493775.8    56.775   .000a
Residual       3.66E+08         48   7617618.586
Total          7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

The p-value associated with this F value is very small (0.0000).
These values are used to answer the question "Do the
independent variables reliably predict the dependent
variable?".
The p-value is compared to your alpha level (typically 0.05)
and, if smaller, you can conclude "Yes, the independent
variables reliably predict the dependent variable".
If the p-value were greater than 0.05, you would say that
the group of independent variables does not show a
statistically significant relationship with the dependent
variable, or that the group of independent variables does
not reliably predict the dependent variable.
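The F statistic in the ANOVA table can be recovered from R-square and the degrees of freedom. A sketch using the rounded R² from the Model Summary, so the result only approximates the reported F = 56.775:

```python
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
# k = 1 predictor, residual df = 48, from the ANOVA table.
# Using the rounded R-square, so this is approximate.
r_square = 0.542
k, df_resid = 1, 48
f = (r_square / k) / ((1 - r_square) / df_resid)
print(round(f, 1))  # 56.8, close to the reported 56.775
```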

Coefficients
Coefficientsa
                                                   Unstandardized          Standardized
Model 1                                            B           Std. Error  Beta    t       Sig.
(Constant)                                         10078.565   2312.771            4.358   .000
Percent of Population 25 years and Over with
Bachelor's Degree or More, March 2000 estimates    688.939     91.433      .736    7.535   .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

B: these are the values for the regression
equation for predicting the dependent variable
from the independent variable.
These are called unstandardized coefficients
because they are measured in their natural units.
As such, the coefficients cannot be compared with
one another to determine which one is more
influential in the model, because they can be
measured on different scales.
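The B values from the coefficients table give the prediction equation directly. A sketch (the function name is illustrative, not from the slides):

```python
# Prediction equation from the coefficients table:
#   predicted income = 10078.565 + 688.939 * percent_BA
constant = 10078.565
b = 688.939

def predicted_income(percent_ba):
    """Predicted personal income per capita (hypothetical helper)."""
    return constant + b * percent_ba

# e.g. a state where 25% of the population holds a BA:
print(round(predicted_income(25), 2))  # 27302.04
```

Read B's natural units off the equation: each additional percentage point of BA holders is associated with about $689 more income per capita.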

Coefficients
Coefficientsa
                                                   Unstandardized          Standardized
Model 1                                            B           Std. Error  Beta    t       Sig.
(Constant)                                         13032.847   1902.700            6.850   .000
Percent of Population 25 years and Over with
Bachelor's Degree or More, March 2000 estimates    517.628     78.613      .553    6.584   .000
Population Per Square Mile                         7.953       1.450       .461    5.486   .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

This table includes two independent variables and
shows how their different measurement scales
affect the B values.
That is why you need to look at the
standardized Betas to compare their effects.

Coefficients
Coefficientsa
                                                   Unstandardized          Standardized
Model 1                                            B           Std. Error  Beta    t       Sig.
(Constant)                                         10078.565   2312.771            4.358   .000
Percent of Population 25 years and Over with
Bachelor's Degree or More, March 2000 estimates    688.939     91.433      .736    7.535   .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Beta: these are the standardized coefficients.
These are the coefficients that you would obtain if you
standardized all of the variables in the regression, including
the dependent and all of the independent variables, and ran
the regression.
By standardizing the variables before running the
regression, you have put all of the variables on the same
scale, and you can compare the magnitude of the
coefficients to see which one has more of an effect.
You will also notice that the larger betas are associated
with the larger t-values.
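Standardization can be sketched directly: z-score both variables, then fit the slope. With one predictor, the resulting Beta equals Pearson's r. Hypothetical data again, not the slides':

```python
import math

# Z-score both variables, then fit the slope of the standardized
# regression. With one predictor, this Beta equals Pearson's r.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

def zscores(v):
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / n)
    return [(x - m) / sd for x in v]

zx, zy = zscores(xs), zscores(ys)
beta = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
print(round(beta, 3))  # 0.8 -- the same value Pearson's r gives here
```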

How to translate a typical table


Regression Analysis: Level of Education by Income per Capita

                                    Income per capita
Independent variables               b           Beta
Percent population with BA          688.939     .736
R2                                  .542
Number of Cases                     49

Part of the Regression Equation


b represents the slope of the line.
It is calculated by dividing the change
in the dependent variable by the
change in the independent variable.
The difference between the actual
value of Y and the value predicted by
the equation is called the residual.
The residual represents how much error
there is in the regression equation's
prediction of the Y value for any
individual case as a function of X.
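Residuals can be computed by fitting the line and subtracting the predicted values from the actual ones. A sketch with made-up data scattered around a known line:

```python
# Residual = actual Y minus the Y predicted by the regression line.
# Hypothetical data: roughly y = 1 + 2x plus small errors.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = my - b * mx

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(e, 3) for e in residuals])
# For a least-squares fit, the residuals always sum to (almost) zero.
print(abs(sum(residuals)) < 1e-9)
```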

Comparing two variables


Regression analysis is useful for
comparing two variables to see whether
controlling for another independent variable
affects your model.
For the first independent variable,
education, the argument is that a more
educated populace will have higher-paying
jobs, producing a higher level of per capita
income in the state.
The second independent variable is
included because we expect to find
better-paying jobs, and therefore more
opportunity for state residents to obtain
them, in urban rather than rural areas.

Single Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

ANOVAb
Model 1        Sum of Squares   df   Mean Square    F        Sig.
Regression     4.32E+08         1    432493775.8    56.775   .000a
Residual       3.66E+08         48   7617618.586
Total          7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Coefficientsa
                                                   Unstandardized          Standardized
Model 1                                            B           Std. Error  Beta    t       Sig.
(Constant)                                         10078.565   2312.771            4.358   .000
Percent of Population 25 years and Over with
Bachelor's Degree or More, March 2000 estimates    688.939     91.433      .736    7.535   .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Multiple Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .849a   .721       .709                2177.791
a. Predictors: (Constant), Population Per Square Mile, Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

ANOVAb
Model 1        Sum of Squares   df   Mean Square    F        Sig.
Regression     5.75E+08         2    287614518.2    60.643   .000a
Residual       2.23E+08         47   4742775.141
Total          7.98E+08         49
a. Predictors: (Constant), Population Per Square Mile, Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Coefficientsa
                                                   Unstandardized          Standardized
Model 1                                            B           Std. Error  Beta    t       Sig.
(Constant)                                         13032.847   1902.700            6.850   .000
Percent of Population 25 years and Over with
Bachelor's Degree or More, March 2000 estimates    517.628     78.613      .553    6.584   .000
Population Per Square Mile                         7.953       1.450       .461    5.486   .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Single Regression

                                    Income per capita
Independent variables               b           Beta
Percent population with BA          688.939     .736
R2                                  .542
Number of Cases                     49

Multiple Regression

                                    Income per capita
Independent variables               b           Beta
Percent population with BA          517.628     .553
Population Density                  7.953       .461
R2                                  .721
Adjusted R2                         .709
Number of Cases                     49
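Multiple regression with two predictors can be sketched by solving the normal equations. The data below are synthetic, generated from y = 2 + 3·x1 + 0.5·x2, so the fit should recover those coefficients exactly; this illustrates the mechanics, not the slides' state data:

```python
# Two-predictor multiple regression via the normal equations
# X'X coef = X'y, solved with Gaussian elimination.

def solve(m, v):
    """Solve m @ coef = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - s) / aug[r][r]
    return coef

# Synthetic data built from y = 2 + 3*x1 + 0.5*x2 (no noise).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [10, 8, 14, 6, 12, 16]
ys = [2 + 3 * a + 0.5 * b for a, b in zip(x1, x2)]

# Design matrix columns: [1 (constant), x1, x2].
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
intercept, b1, b2 = solve(xtx, xty)
print(round(intercept, 6), round(b1, 6), round(b2, 6))  # 2.0 3.0 0.5
```

In practice you would let SPSS (or a statistics library) do this; the point is that the B column of the Coefficients table is just the solution of these equations.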
