
APPLIED STATISTICS (D1074)

SIMPLE LINEAR REGRESSION AND CORRELATION

Week 11-14
(1) Correlation Analysis
1.1 Definition

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
- Only concerned with the strength of the relationship
- No causal effect is implied
- In regression, by contrast, the variables are distinguished as dependent and independent variables

[Diagram: Variable 1 and Variable 2]

Bina Nusantara University 3


1.2 Correlation Coefficient

A scatter plot (or scatter diagram) is used to show the relationship between two variables

[Figure: scatter plots of y against x, two showing strong relationships and two showing weak relationships]

[Figure: scatter plots of y against x showing no relationship]

The population correlation coefficient ρ (rho) measures the strength of the association between the variables

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations

The correlation coefficient r:
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
Example values of r in scatter plots
[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

Sample correlation coefficient:
$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} $$
where
r = Sample correlation coefficient
n = Sample size
x = first variable
y = second variable
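
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `sample_correlation` and the toy data are hypothetical, chosen so that the two extreme cases r = +1 and r = -1 are easy to check by hand.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r_xy of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data: an exact increasing line and an exact decreasing line
xs = [1, 2, 3, 4, 5]
r_up = sample_correlation(xs, [2, 4, 6, 8, 10])    # perfect positive relationship
r_down = sample_correlation(xs, [10, 8, 6, 4, 2])  # perfect negative relationship
```

Points lying exactly on a rising line give r = 1 and points on a falling line give r = -1, which matches the range described earlier.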
1.3 Testing a Correlation
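Test statistic, with n - 2 degrees of freedom (this form applies when $\rho_0 = 0$):
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

Hypothesis | Critical values (reject $H_0$ if)
$H_0: \rho \ge \rho_0$, $H_1: \rho < \rho_0$ | $t < -t_{\alpha, n-2}$
$H_0: \rho \le \rho_0$, $H_1: \rho > \rho_0$ | $t > t_{\alpha, n-2}$
$H_0: \rho = \rho_0$, $H_1: \rho \ne \rho_0$ | $t < -t_{\alpha/2, n-2}$ or $t > t_{\alpha/2, n-2}$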
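
As a rough sketch of the two-sided test, the statistic can be evaluated for r = 0.95 (the value reported in Example 1) and n = 10 (the number of pizza parlors in Example 3). The function name is hypothetical, the 5% significance level is an assumption, and the critical value 2.306 is read from a standard t table rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.95, 10   # r from Example 1; n = 10 parlors (Example 3)
t_stat = correlation_t_statistic(r, n)

# Assumed: alpha = 5%, two-sided, so the table value is t_{0.025, 8} = 2.306
t_crit = 2.306
reject_h0 = abs(t_stat) > t_crit
```

With these inputs the statistic is about 8.6, well beyond the critical value, so the hypothesis of zero correlation would be rejected.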
Example 1

Correlation between student population and quarterly sales

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} = \frac{2840}{\sqrt{(568)(15730)}} = 0.95 $$
(2) The Simple Linear Regression Model
2.1 Modeling

Modeling is often performed by finding a functional relationship between the expected value of a dependent variable and a set of explanatory (independent) variables

[Diagram: independent variable -> dependent variable]

2.2 Variable
Dependent variable:
The variable we wish to explain
Independent variable:
The variable used to explain the dependent variable

2.3 Linear Regression
Linear regression is a modeling technique in which the expected value of a dependent variable is modeled as a linear combination of a set of independent variables
[Figure: plot of Dependent Variable (Y) against Independent Variable (X)]


Regression analysis is used to:
- Model the relationship between the independent and dependent variables
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable


Example 2

A regression model for the timing of production runs
- Model : What is the model relationship between run size and run time?
- Predict : What is the run time when the run size is 200?
- Impact : Does the run size have an effect on the run time?


Answers:
- Model : [fitted regression equation, shown as an image on the slide]
- Predict : 201.7
- Impact : Yes

2.4 Simple Linear Regression

- Only one independent variable
- The relationship between the independent variable and the dependent variable is described by a linear function
- Changes in the dependent variable are assumed to be caused by changes in the independent variable


When data are collected in pairs, the standard notation used to designate this is:
$$ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) $$
where
x = independent variable
y = dependent variable
n = number of data pairs

2.5 Simple Linear Regression Model

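The population regression model:
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where
$y$ = dependent variable
$\beta_0$ = population y-intercept
$\beta_1$ = population slope coefficient
$x$ = independent variable
$\varepsilon$ = random error term, or residual

$\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component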
What is linear? Check with a scatterplot:
[Figure: scatterplot of Dependent Variable (Y) against Independent Variable (X)]

[Figure: scatter plots of y against x, two showing linear relationships and two showing nonlinear relationships]
2.6 Estimation

- Usually we have a sample of data instead of the whole population
- The slope $\beta_1$ and intercept $\beta_0$ are unknown, since these are the values for the whole population
- We therefore use the given data to estimate the slope and the intercept

$$ y = \beta_0 + \beta_1 x + \varepsilon $$
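The estimated regression line:
$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x $$
where
$\hat{y}$ = estimated (or predicted) y value
$\hat{\beta}_0$ = estimate of the regression intercept
$\hat{\beta}_1$ = estimate of the regression slope
$x$ = independent variable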

The individual random error terms ei have a mean of zero


Estimation of slope and intercept


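$$ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$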
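
The two estimators can be sketched directly from their definitions. The helper name `fit_line` is hypothetical. The data table of the pizza parlor example did not survive on the slides, so the ten (x, y) pairs below are an assumption: they were chosen because they reproduce the sums quoted in these slides (2840, 568, 15730, with means 14 and 130).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope estimate: S_xy / S_xx
    b0 = y_bar - b1 * x_bar    # intercept estimate: y_bar - b1 * x_bar
    return b0, b1

# Assumed data (not recoverable from the slides): student population and
# quarterly sales for 10 parlors, consistent with the quoted sums.
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_line(population, sales)
```

Under that assumption the fit gives a slope of 5 and an intercept of 60.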


Estimation of variance
$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} $$
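
A short sketch of this estimate, using the fitted line with intercept 60 and slope 5 from Example 3. The function name `error_variance` is hypothetical, and the data values are an assumption (the slide's table is not recoverable); they are consistent with the sums quoted in these slides.

```python
def error_variance(xs, ys, b0, b1):
    """Estimate of the error variance: SSE / (n - 2)."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (len(xs) - 2)

# Assumed data for the 10 pizza parlors (not shown in the extracted slides)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

sigma2_hat = error_variance(population, sales, b0=60, b1=5)
sigma_hat = sigma2_hat ** 0.5   # estimated standard deviation of the errors
```

The result, 191.25, agrees with the value used later in Example 4.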

2.7 Coefficient of Determination
- $R^2$, the coefficient of determination of the regression line, is defined as the proportion of the total sample variability in the $y$'s explained by the regression model

$$ R^2 = r_{xy}^2 $$
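
As a quick arithmetic check, the coefficient of determination can be computed from the three sums quoted in Example 1 (2840, 568 and 15730). The variable names are illustrative only.

```python
import math

# Sums quoted in Example 1 of these slides
s_xy = 2840    # sum of (x_i - x_bar)(y_i - y_bar)
s_xx = 568     # sum of (x_i - x_bar)^2
s_yy = 15730   # sum of (y_i - y_bar)^2

r_xy = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient
r_squared = r_xy ** 2                  # coefficient of determination
```

Roughly 90% of the variability in quarterly sales is explained by the linear relationship with student population.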

Example 3

Student population and quarterly sales data for 10 Armand's Pizza Parlors


[Figure: scatterplot of the Example 3 data]

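$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{2840}{568} = 5 $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 130 - 5(14) = 60 $$

$$ \hat{y} = 60 + 5x $$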

(3) Inference on the Parameter
3.1 Parameter $\beta_1$

The slope estimator $\hat{\beta}_1$ has a normal distribution

Confidence interval estimation

$$ \hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \, \frac{\hat{\sigma}}{\sqrt{S_{xx}}} $$
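Hypothesis test :
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic :
$$ t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}} $$

Critical value (reject $H_0$ if) :
$$ |t| > t_{\alpha/2,\, n-2} $$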


Example 4

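Continued from Example 3

Hypothesis test :
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic :
$$ t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}} = \frac{5}{\sqrt{191.25 / 568}} = 8.62 $$
Critical value :
$$ t_{\alpha/2,\, n-2} = t_{5\%/2,\, 8} = 2.306 $$
Conclusion : $|t| = 8.62 > 2.306$, so reject $H_0$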
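
The calculation in Example 4 can be reproduced from the quantities already given in the slides (slope estimate 5, variance estimate 191.25, S_xx = 568, n = 10). The sketch below also evaluates the confidence interval for the slope from section 3.1 with the same standard error. The critical value 2.306 is taken from a standard t table for 8 degrees of freedom at the 5% two-sided level, not computed here.

```python
import math

# Quantities taken from Examples 3 and 4 of these slides
b1_hat = 5.0
sigma2_hat = 191.25
s_xx = 568.0
n = 10
df = n - 2       # degrees of freedom for the slope test

t_crit = 2.306   # t_{0.025, 8} from a t table (alpha = 5%, two-sided)

se_b1 = math.sqrt(sigma2_hat / s_xx)   # standard error: sigma_hat / sqrt(S_xx)
t_stat = b1_hat / se_b1
reject_h0 = abs(t_stat) > t_crit

# 95% confidence interval for the slope
ci_slope = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)
```

The interval is roughly (3.66, 6.34). It excludes zero, which is consistent with rejecting the null hypothesis.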
3.2 Regression Line

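The unknown quantity of interest is the mean response $\beta_0 + \beta_1 x^*$ at $X = x^*$. An estimator of this unknown quantity is the value of the estimated regression equation at $X = x^*$, namely $\hat{\beta}_0 + \hat{\beta}_1 x^*$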


The estimator $\hat{\beta}_0 + \hat{\beta}_1 x^*$ has a normal distribution

Confidence interval estimation

[Figure: fitted line y = b0 + b1 x, showing the prediction interval for an individual y given xp and the confidence interval for the mean of y given xp]
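
The interval formulas themselves are missing from the extracted slides, so the sketch below uses the standard textbook forms, which are assumed to match: the margin for the mean response is t times sigma-hat times sqrt(1/n + (x* - x_bar)^2 / S_xx), and the prediction interval adds 1 under the square root. The function name `intervals_at` and the evaluation point x* = 10 are hypothetical. The other inputs come from Examples 3 and 4.

```python
import math

def intervals_at(x_star, b0, b1, sigma_hat, n, x_bar, s_xx, t_crit):
    """Fitted value plus half-widths of the CI (mean response) and PI (new y)."""
    y_hat = b0 + b1 * x_star
    leverage = 1 / n + (x_star - x_bar) ** 2 / s_xx
    ci_margin = t_crit * sigma_hat * math.sqrt(leverage)      # mean of y at x_star
    pi_margin = t_crit * sigma_hat * math.sqrt(1 + leverage)  # individual y at x_star
    return y_hat, ci_margin, pi_margin

y_hat, ci_margin, pi_margin = intervals_at(
    x_star=10,                     # hypothetical point of interest
    b0=60, b1=5,                   # fitted line from Example 3
    sigma_hat=math.sqrt(191.25),   # from Example 4
    n=10, x_bar=14, s_xx=568,
    t_crit=2.306,                  # t_{0.025, 8} from a t table
)
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the scatter of an individual observation around the mean.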
(4) The Analysis of Variance Table
4.1 Definition

The analysis of variance (ANOVA) table provides a different test statistic, one that can also be used when there is more than one predictor variable, that is, in multiple regression

The ANOVA table is based upon the variability in the dependent variable (y) and provides a test of the hypotheses
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
4.2 Hypothesis Test

- Hypothesis :
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
- Test statistic :

4.3 ANOVA Table

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$
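
The three sums of squares can be checked numerically for the fitted line of Example 3. The data values are an assumption (the slide's table is not recoverable) and were chosen to be consistent with the sums quoted in these slides. The F statistic shown, MSR / MSE with 1 and n - 2 degrees of freedom, is the usual ANOVA statistic for simple linear regression and is assumed to be the one on the missing slide.

```python
# Assumed data for the 10 pizza parlors (not shown in the extracted slides)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60, 5   # fitted line from Example 3

n = len(sales)
y_bar = sum(sales) / n
fitted = [b0 + b1 * x for x in population]

sst = sum((y - y_bar) ** 2 for y in sales)               # total variability
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # unexplained (error)
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained by regression

# F statistic: mean square regression over mean square error
f_stat = (ssr / 1) / (sse / (n - 2))
```

Here SST = SSR + SSE (15730 = 14200 + 1530), and the F statistic of about 74.25 is the square of the t statistic from Example 4.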

4.4 The Sum of Squares for a Simple Linear Regression


Example 5

Continued from example 3

(5) Residual Analysis
5.1 Residuals
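The differences between the observed values and the predictions, $y - \hat{y}$, are called residuals or errors
[Figure: data points and the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with the residuals marked]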

The residuals are defined as $e_i = y_i - \hat{y}_i$
Residual analysis can be used to:
- Identify data points that are outliers
- Check whether the fitted model is appropriate
- Check whether the error variance is constant
- Check whether the error terms are normally distributed
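
A minimal sketch of the first of these checks, using the fitted line from Example 3. The data values are an assumption (not recoverable from the slides), and the cutoff of two estimated standard deviations is a common rule of thumb rather than a rule stated in the slides.

```python
# Assumed data for the 10 pizza parlors (not shown in the extracted slides)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60, 5   # fitted line from Example 3

# Residuals e_i = y_i - y_hat_i
residuals = [y - (b0 + b1 * x) for x, y in zip(population, sales)]

n = len(residuals)
sigma_hat = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5

# Rule-of-thumb screen (an assumption): flag points with |e_i| > 2 * sigma_hat
flagged = [
    (x, y)
    for x, y, e in zip(population, sales, residuals)
    if abs(e) > 2 * sigma_hat
]
```

For a least-squares fit with an intercept the residuals sum to zero. With this data no point exceeds the cutoff, so none is flagged as a possible outlier.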


Residual plot showing random scatter and no patterns


Residual plot indicating points that may be outliers


A grouping of positive and negative residuals indicates that the linear model is inappropriate


A funnel shape in the residual plot indicates a nonconstant error variance

(6) Application with Minitab
Correlation Analysis

The Simple Linear Regression Model



Exercises
(1)



Exercises
(2)



