
Research Methodology and Applied Statistics
Week 9
SIMPLE LINEAR REGRESSION AND CORRELATION
REFERENCE

Anthony Hayter. (2012). Probability and Statistics for Engineers and Scientists. 4th ed. Thomson Brooks/Cole, Australia. ISBN: 978-1133112143.

Bina Nusantara University 2


Learning Outcomes
LO3: Use proper statistical techniques for statistical decision making
LO4: Interpret the results of the calculation
LO5: Draw statistical conclusions from experiments and observations



Correlation Analysis

1.1 Definition

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
- It is concerned only with the strength of the relationship
- No causal effect is implied
- In regression, by contrast, the variables are distinguished as dependent and independent variables

[Figure: Variable 1 and Variable 2]


1.2 Correlation Coefficient

A scatter plot (or scatter diagram) is used to show the relationship between two variables.

[Figure: scatter plots of y against x illustrating strong relationships, weak relationships, and no relationship]

The population correlation coefficient ρ (rho) measures the strength of the association between the variables.

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
- ρ and r range between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
Example values of r in scatterplots

[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]


Sample correlation coefficient:

\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}}
\]

where
r = sample correlation coefficient
n = sample size
x = first variable
y = second variable
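The formula for the sample correlation coefficient can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sample_correlation` and the toy data are hypothetical, and the function assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def sample_correlation(xs, ys):
    """Return r_xy for paired observations (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data: two exact lines, one rising and one falling
xs = [1, 2, 3, 4, 5]
r_up = sample_correlation(xs, [2, 4, 6, 8, 10])
r_down = sample_correlation(xs, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

An exact increasing line gives r = +1 and an exact decreasing line gives r = -1, the two extremes of the range.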
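Testing a Correlation

Test statistic (the same for all three cases):

\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\]

Hypothesis                          Critical values (reject H0 if)
H0: ρ ≥ ρ0,  H1: ρ < ρ0             t < -t_{α, n-2}
H0: ρ ≤ ρ0,  H1: ρ > ρ0             t > t_{α, n-2}
H0: ρ = ρ0,  H1: ρ ≠ ρ0             t > t_{α/2, n-2} or t < -t_{α/2, n-2}

Note: this t statistic applies when ρ0 = 0.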
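The two-sided version of this test can be sketched as follows. The values r = 0.6 and n = 27 are hypothetical, chosen so the arithmetic is easy to follow, and the critical value 2.060 is read from a t table for α/2 = 0.025 with 25 degrees of freedom. The function names are illustrative only.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), for testing rho = 0."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def reject_two_sided(t, t_crit):
    """Reject H0: rho = 0 when |t| exceeds t_{alpha/2, n-2}."""
    return abs(t) > t_crit

# Hypothetical sample: r = 0.6 from n = 27 pairs
t_value = correlation_t_statistic(0.6, 27)   # 0.6 * 5 / 0.8 = 3.75
decision = reject_two_sided(t_value, 2.060)  # table value t_{0.025, 25}
print(round(t_value, 2), decision)
```

The critical value is passed in rather than computed so that the sketch needs only the standard library.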

Example 1

Correlation between student population and quarterly sales

\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} = \frac{2840}{\sqrt{(568)(15730)}} = 0.95
\]
The Simple Linear Regression Model

Modeling is often performed by finding a functional relationship between the expected value of a dependent variable and a set of explanatory or independent variables.

[Figure: independent variable and dependent variable]


Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable


Linear regression is a modeling technique in which the expected value of a dependent variable is modeled as a linear combination of a set of independent variables.

[Figure: scatterplot of the dependent variable (Y) against the independent variable (X)]


Regression analysis is used to:

- Model the relationship between the independent and dependent variables
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable


Example 2

A regression model for the timing of production runs

- Model: What is the relationship between run size and run time?
- Predict: What is the run time when the run size is 200?
- Impact: Does run size affect run time?

Answers:

- Model: [equation image not recovered]
- Predict: 201.7
- Impact: Yes

- Only one independent variable
- The relationship between the independent variable and the dependent variable is described by a linear function
- Changes in the dependent variable are assumed to be caused by changes in the independent variable


When data are collected in pairs, the standard notation used to designate this is:

\[
(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)
\]

where
x = independent variable
y = dependent variable
n = number of data pairs


The population regression model:

\[
y = \beta_0 + \beta_1 x + \varepsilon
\]

where
y = dependent variable
β0 = population y-intercept
β1 = population slope coefficient
x = independent variable
ε = random error term, or residual

β0 + β1x is the linear component and ε is the random error component.
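To see what the population model describes, one can simulate observations from it. The sketch below is illustrative only: the parameter values, the function name `simulate_observations`, and the choice of normally distributed errors are assumptions made for the example, not values taken from the slides.

```python
import random

def simulate_observations(beta0, beta1, sigma, xs, seed=0):
    """Draw y = beta0 + beta1 * x + eps with eps ~ N(0, sigma^2) (assumed)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values and x values
xs = [2, 6, 8, 12, 16, 20]
noisy = simulate_observations(60.0, 5.0, 10.0, xs)
# With sigma = 0 the random error component vanishes and only the
# linear component beta0 + beta1 * x remains
noise_free = simulate_observations(60.0, 5.0, 0.0, xs)
print(noise_free)
```

Setting sigma to zero leaves the points exactly on the line, which separates the linear component from the random error component.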
What is linear? Check the scatterplot:

[Figure: scatterplot of the dependent variable (Y) against the independent variable (X)]
[Figure: scatter plots of y against x contrasting linear relationships with nonlinear relationships]
- Usually we have a sample of data instead of the whole population.
- The slope β1 and intercept β0 are unknown, since these are the values for the whole population.
- We therefore use the given data to estimate the slope and the intercept.

\[
y = \beta_0 + \beta_1 x + \varepsilon
\]

The estimated regression equation:

\[
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
\]

where
ŷ = estimated (or predicted) y value
β̂0 = estimate of the regression intercept
β̂1 = estimate of the regression slope
x = independent variable

The individual random error terms ε_i have a mean of zero.


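Estimation of slope and intercept

\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]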
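The two estimators translate directly into code. The sketch below assumes the x values are not all equal (so that S_xx is nonzero); the function name and the toy data, which lie exactly on y = 1 + 2x, are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0_hat, beta1_hat) from the least squares formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1_hat = s_xy / s_xx                # slope: S_xy / S_xx
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept: y_bar - slope * x_bar
    return beta0_hat, beta1_hat

# Hypothetical toy data on the exact line y = 1 + 2x
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1)
```

Because the toy points have no scatter, the estimates recover the intercept 1 and slope 2.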


Estimation of variance

\[
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}
\]
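A short sketch of the variance estimate: fit the line, sum the squared residuals, and divide by n - 2. The helper names and the four data points are hypothetical and were picked so the result can be checked by hand (fitted line ŷ = 1.9x, residuals 0.1, 0.2, -0.7, 0.4).

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    return y_bar - beta1 * x_bar, beta1

def residual_variance(xs, ys):
    """sigma_hat^2 = sum of squared residuals / (n - 2)."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need n > 2 so that n - 2 is positive")
    beta0, beta1 = fit_line(xs, ys)
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)

# Hypothetical toy data: squared residuals sum to 0.70, so 0.70 / 2 = 0.35
sigma2_hat = residual_variance([1, 2, 3, 4], [2, 4, 5, 8])
print(round(sigma2_hat, 4))
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, were estimated from the same data.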


Coefficient of Determination

R², the coefficient of determination of the regression line, is defined as the proportion of the total sample variability in the Y's explained by the regression model.

\[
R^2 = r_{xy}^2, \qquad \sqrt{R^2} = |r_{xy}|
\]
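The equality between R² and the squared sample correlation can be checked numerically. In this sketch R² is computed as 1 - SSE/SST, which is one standard way to express "proportion of variability explained" and is an assumption about the intended definition. The function name and toy data are hypothetical.

```python
import math

def r_squared_and_r(xs, ys):
    """Return (R^2 from the fitted line, sample correlation r_xy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variability (SST)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    # Unexplained variability: squared residuals around the fitted line
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / s_yy, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data
r2, r = r_squared_and_r([1, 2, 3, 4], [2, 4, 5, 8])
print(round(r2, 4), round(r ** 2, 4))
```

Both printed numbers agree, which illustrates R² = r² for simple linear regression.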

Example 3

Student population and quarterly sales data for 10 Armand's Pizza Parlors

[Figure: scatterplot of the student population and quarterly sales data]
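\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{2840}{568} = 5
\]

\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 130 - 5(14) = 60
\]

\[
\hat{y} = 60 + 5x
\]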
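The calculation in Example 3 can be reproduced in code. The data table for this example is not reproduced in these notes, so the ten (x, y) pairs below are an assumption: they are values often quoted for the Armand's Pizza Parlors example and are used here only because they reproduce the figures on the slide (sums 2840 and 568, means 14 and 130). The prediction point x = 10 is likewise hypothetical.

```python
# ASSUMED data (not recovered from the slides): student population and
# quarterly sales for 10 restaurants; chosen because they match the slide's sums
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(population)
x_bar = sum(population) / n  # 14
y_bar = sum(sales) / n       # 130
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(population, sales))
s_xx = sum((x - x_bar) ** 2 for x in population)

beta1_hat = s_xy / s_xx                # 2840 / 568 = 5
beta0_hat = y_bar - beta1_hat * x_bar  # 130 - 5 * 14 = 60

def predict(x):
    """Fitted value from the estimated regression line."""
    return beta0_hat + beta1_hat * x

print(s_xy, s_xx, beta0_hat, beta1_hat, predict(10))
```

With these values the fitted line is ŷ = 60 + 5x, so the hypothetical point x = 10 gives a fitted value of 110.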

Inference on the Parameter

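The slope estimator β̂1 has a normal distribution.

Confidence interval estimation

\[
\beta_1 \in \left( \hat{\beta}_1 - t_{\alpha/2,\,n-2}\,\frac{\hat{\sigma}}{\sqrt{S_{xx}}},\ \ \hat{\beta}_1 + t_{\alpha/2,\,n-2}\,\frac{\hat{\sigma}}{\sqrt{S_{xx}}} \right)
\]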
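A minimal sketch of the slope confidence interval, using the summary values that appear in these notes for the student population example (β̂1 = 5, σ̂² = 191.25, S_xx = 568, n = 10). The critical value 2.306 is the table value of t for α/2 = 0.025 with 8 degrees of freedom. The function name is hypothetical.

```python
import math

def slope_confidence_interval(beta1_hat, sigma_hat, s_xx, t_crit):
    """beta1_hat +/- t_crit * sigma_hat / sqrt(S_xx)."""
    half_width = t_crit * sigma_hat / math.sqrt(s_xx)
    return beta1_hat - half_width, beta1_hat + half_width

# Summary values from the student population example; t_{0.025, 8} = 2.306
sigma_hat = math.sqrt(191.25)
ci_low, ci_high = slope_confidence_interval(5.0, sigma_hat, 568.0, 2.306)
print(round(ci_low, 3), round(ci_high, 3))
```

The interval runs from about 3.66 to about 6.34 and does not contain zero, which agrees with rejecting β1 = 0 at the 5% level.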


Hypothesis test:
H0: β1 = 0
H1: β1 ≠ 0

Test statistic:

\[
t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}}
\]

Critical value: reject H0 if

\[
|t| > t_{\alpha/2,\,n-2}
\]


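Example 4

Continued from Example 3

Hypothesis test:
H0: β1 = 0
H1: β1 ≠ 0

Test statistic:

\[
t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}} = \frac{5}{\sqrt{191.25 / 568}} = 8.62
\]

Critical value (α = 5%):

\[
t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306
\]

Conclusion: since |t| = 8.62 > 2.306, reject H0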
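The arithmetic of Example 4 can be checked with a few lines of Python. The inputs are the figures shown in the example (β̂1 = 5, σ̂² = 191.25, S_xx = 568); the critical value is entered by hand from a t table (8 degrees of freedom, α/2 = 0.025), and the function name is hypothetical.

```python
import math

def slope_t_statistic(beta1_hat, sigma2_hat, s_xx):
    """t = beta1_hat / (sigma_hat / sqrt(S_xx))."""
    return beta1_hat / math.sqrt(sigma2_hat / s_xx)

t_slope = slope_t_statistic(5.0, 191.25, 568.0)
t_critical = 2.306  # table value for 8 degrees of freedom, alpha/2 = 0.025
reject_h0 = abs(t_slope) > t_critical
print(round(t_slope, 2), reject_h0)
```

The statistic is far beyond the critical value, so the slope is significantly different from zero at the 5% level.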

An estimator of this unknown quantity is the value of the estimated regression equation at X = x*, namely

\[
\hat{y}|_{x^*} = \hat{\beta}_0 + \hat{\beta}_1 x^*
\]

This estimator has a normal distribution.

Confidence interval estimation

[Figure: fitted line of y against x showing, at x = x_p, the prediction interval for an individual y given x_p and the confidence interval for the mean of y given x_p]
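The interval formulas themselves are not legible in these notes, so the sketch below uses the standard textbook forms as an assumption: the half-width is t·σ̂·sqrt(1/n + (x* - x̄)²/S_xx) for the mean response, with an extra "1 +" under the square root for an individual prediction. The inputs are the summary values from the student population example; the point x* = 10 and the function names are hypothetical.

```python
import math

def mean_response_interval(x_star, beta0, beta1, sigma_hat, n, x_bar, s_xx, t_crit):
    """Confidence interval for the mean of y at x_star (assumed standard form)."""
    fit = beta0 + beta1 * x_star
    half = t_crit * sigma_hat * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)
    return fit - half, fit + half

def prediction_interval(x_star, beta0, beta1, sigma_hat, n, x_bar, s_xx, t_crit):
    """Prediction interval for one new y at x_star (assumed standard form)."""
    fit = beta0 + beta1 * x_star
    half = t_crit * sigma_hat * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / s_xx)
    return fit - half, fit + half

# Summary values from the example; x_star = 10 is a hypothetical query point
args = (10, 60.0, 5.0, math.sqrt(191.25), 10, 14.0, 568.0, 2.306)
ci = mean_response_interval(*args)
pi = prediction_interval(*args)
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Both intervals are centred on the fitted value 110, and the prediction interval is the wider of the two because it also accounts for the scatter of an individual observation around the mean.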
The Analysis of Variance Table

The analysis of variance (ANOVA) table provides a different test statistic, one that can also be used when there is more than one predictor variable, that is, in multiple regression.
The ANOVA table is based upon the variability in the dependent variable (y) and provides a test of the hypotheses
H0: β1 = 0
H1: β1 ≠ 0

- Hypothesis:
H0: β1 = 0
H1: β1 ≠ 0
- Test statistic:

\[
F = \frac{MSR}{MSE} = \frac{SSR / 1}{SSE / (n - 2)}
\]

ANOVA Table

[Table: ANOVA table for the simple linear regression]

\[
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]

\[
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]

\[
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
\]
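The three sums of squares can be computed directly from their definitions. The sketch below also forms F = MSR/MSE with 1 and n - 2 degrees of freedom, which is the standard ANOVA statistic for simple linear regression and is stated here as an assumption because the slide's own formula is not legible. The function name and toy data are hypothetical.

```python
def anova_sums(xs, ys):
    """Return (SST, SSE, SSR, F) for a least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    fitted = [beta0 + beta1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # error
    ssr = sum((f - y_bar) ** 2 for f in fitted)             # regression
    f_stat = (ssr / 1) / (sse / (n - 2))                    # MSR / MSE
    return sst, sse, ssr, f_stat

# Hypothetical toy data: SST = 18.75, SSE = 0.70, SSR = 18.05
sst, sse, ssr, f_stat = anova_sums([1, 2, 3, 4], [2, 4, 5, 8])
print(round(sst, 2), round(sse, 2), round(ssr, 2), round(f_stat, 2))
```

For a least squares fit the pieces add up, SST = SSR + SSE, which is why the table can split the total variability into a regression part and an error part.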

[Figure: the sum of squares for a simple linear regression]

Example 5

Continued from example 3
