
Linear Regression

SitiHawa/MTE3105/ed.2

Think Of
Below are two sets of data, set A (coursework scores) and set B (test scores). Think about what conclusion you can draw about the relationship between these two sets of data.

Set A 83.0 92.0 80.0 83.0 80.0 80.0 90.0 83.0 80.0 80.0 94.0 94.0 80.0 80.0 90.0 90.0 92.0 94.0

Set B 71.0 82.0 71.0 51.0 71.0 61.0 65.0 60.0 80.0 82.0 64.0 59.0 47.0 57.0 71.0 78.0 83.0 80.0


Think Of
Dependent variable Independent variable

Plot a graph of this data and explain it to the class.

Dependent Or Independent

SitiHawa/MTE3105/ed.3

Linear Regression
Example: a set of data and the plot of a "best-fit" straight line through the data.


Linear Regression
The relationship between two sets of data (x and y) is linear when plotting the data (y versus x) gives a straight line.
Such data have a linear correlation and follow the equation of a straight line, y = mx + b, where m is the slope and b is the y-intercept.


Linear Regression : Concept

Example of linear regression with one dependent and one independent variable.

Scatterplots
A scatterplot is a graphic tool used to display the relationship between two quantitative variables. A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatterplot represents one observation from a data set. The position of the dot on the scatterplot represents its X and Y values.

[Figure: example scatterplot, with the x axis running from 0.0 to 8.0 and the y axis from 0.0 to 6.0]

Linear Regression : Scatter Diagram


A scatterplot is often used to identify potential associations between two variables: the explanatory variable and the response variable.

Positive association (positive slope)

Negative association (negative slope)

No association


Linear Regression
Given a set of data (xi, yi) with n data points, the slope m and the y-intercept b can be determined using the following:

$$m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{\sum y - m \sum x}{n}$$
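
As a rough sketch, the slope and intercept formulas can be applied to Set A and Set B from the "Think Of" slide. Treating Set A (coursework score) as x and Set B (test score) as y is an assumption made here for illustration, and the function name `fit_line` is hypothetical.

```python
# Set A (coursework score) and Set B (test score) from the "Think Of" slide.
# Assumption: Set A plays the role of x and Set B the role of y.
set_a = [83.0, 92.0, 80.0, 83.0, 80.0, 80.0, 90.0, 83.0, 80.0,
         80.0, 94.0, 94.0, 80.0, 80.0, 90.0, 90.0, 92.0, 94.0]
set_b = [71.0, 82.0, 71.0, 51.0, 71.0, 61.0, 65.0, 60.0, 80.0,
         82.0, 64.0, 59.0, 47.0, 57.0, 71.0, 78.0, 83.0, 80.0]


def fit_line(xs, ys):
    """Return (m, b) for y = m*x + b using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sx2 - (Sx)^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: (Sy - m*Sx) / n
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = fit_line(set_a, set_b)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
```

For this data the slope comes out at roughly 0.53 and the intercept at roughly 22.6.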

Correlation Coefficient
Given a set of data (xi, yi) with n data points, the correlation coefficient, r, can be determined by:

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$

See example using MS Excel
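
The same calculation can be sketched in Python instead of a spreadsheet. The example below applies the correlation formula to Set A and Set B from the "Think Of" slide; using Set A as x and Set B as y is an assumption for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

# Set A (coursework score) as x and Set B (test score) as y (an assumption).
set_a = [83.0, 92.0, 80.0, 83.0, 80.0, 80.0, 90.0, 83.0, 80.0,
         80.0, 94.0, 94.0, 80.0, 80.0, 90.0, 90.0, 92.0, 94.0]
set_b = [71.0, 82.0, 71.0, 51.0, 71.0, 61.0, 65.0, 60.0, 80.0,
         82.0, 64.0, 59.0, 47.0, 57.0, 71.0, 78.0, 83.0, 80.0]


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the summation formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(set_a, set_b)
print(f"r = {r:.3f}")
```

For this data r is about 0.28, a fairly weak positive linear relationship between coursework and test scores.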


Linear Regression
Correlation coefficient, r, is a measure of the reliability of the linear relationship between the x and y values.

r = 1 or r = -1 indicates an exact linear relationship between x and y.

Values of r close to 1 or -1 indicate excellent linear reliability.

If the correlation coefficient is relatively far from 1 or -1, predictions based on the linear relationship, y = mx + b, will be less reliable.
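
To illustrate the exact-linear case, the sketch below computes r for points that lie exactly on a line. The two small data sets are made up for illustration only (points on y = 2x + 1 and on y = -2x + 1), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the summation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Illustrative (made-up) points lying exactly on a straight line.
xs = [1, 2, 3, 4]
rising = [2 * x + 1 for x in xs]    # exact line with positive slope
falling = [-2 * x + 1 for x in xs]  # exact line with negative slope

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```

An exact line with positive slope gives r = 1, and an exact line with negative slope gives r = -1.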


Correlation Coefficients, r
Correlation coefficients measure the strength of association between two variables. The most common correlation coefficient, called the Pearson product-moment correlation coefficient, measures the strength of the linear association between variables.

Scatterplots and Correlation Coefficients


The scatterplots below show how different patterns of data produce different degrees of correlation.

Maximum positive correlation (r = 1.0)

Strong positive correlation (r = 0.80)

Zero correlation (r = 0)

Maximum negative correlation (r = -1.0)

Moderate negative correlation (r = -0.43)

Strong correlation with outlier (r = 0.71)


Linear Regression : Concept


Simple Linear Regression is the method for finding the "line of best fit" between the dependent variable, y, and the independent variable, x.

Simple: only one independent variable


In general, the goal of linear regression is to find the line that best predicts y from x. Linear regression does this by finding the line that minimizes the sum of the squares of the vertical distances of the points from the line. This line, the Least Squares Regression Line, is the line that minimizes the sum of the squared errors of the data points.
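
The "minimizes the sum of the squares" idea can be checked numerically. The sketch below fits a line to Set A and Set B from the "Think Of" slide (Set A as x and Set B as y is an assumption) and compares its sum of squared errors with that of a few slightly shifted lines. The helper names `least_squares` and `sse`, and the size of the shifts, are chosen only for illustration.

```python
# Set A (coursework score) as x and Set B (test score) as y (an assumption).
set_a = [83.0, 92.0, 80.0, 83.0, 80.0, 80.0, 90.0, 83.0, 80.0,
         80.0, 94.0, 94.0, 80.0, 80.0, 90.0, 90.0, 92.0, 94.0]
set_b = [71.0, 82.0, 71.0, 51.0, 71.0, 61.0, 65.0, 60.0, 80.0,
         82.0, 64.0, 59.0, 47.0, 57.0, 71.0, 78.0, 83.0, 80.0]


def least_squares(xs, ys):
    """Return the slope m and intercept b of the least squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


m, b = least_squares(set_a, set_b)
best = sse(set_a, set_b, m, b)

# Nudge the slope or the intercept and recompute the error each time.
shifted = [
    sse(set_a, set_b, m + 0.1, b),
    sse(set_a, set_b, m - 0.1, b),
    sse(set_a, set_b, m, b + 1.0),
    sse(set_a, set_b, m, b - 1.0),
]
print(best, shifted)
```

Every shifted line has a larger sum of squared errors than the fitted line, which is what "best fit" means here.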


How to Interpret a Correlation Coefficient


The sign and the absolute value of a correlation coefficient describe the direction and the magnitude of the relationship between two variables.
The value of a correlation coefficient ranges between -1 and 1. The greater the absolute value of a correlation coefficient, the stronger the linear relationship. The strongest linear relationship is indicated by a correlation coefficient of -1 or 1. The weakest linear relationship is indicated by a correlation coefficient equal to 0. A positive correlation means that if one variable gets bigger, the other variable tends to get bigger.

A negative correlation means that if one variable gets bigger, the other variable tends to get smaller.
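
A minimal sketch of this reading of r: the sign gives the direction and the absolute value gives the magnitude. The function names `describe` and `stronger` are hypothetical, and the sample values are the r values quoted on the scatterplot slide.

```python
def describe(r):
    """Split a correlation coefficient into (direction, magnitude)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


def stronger(r1, r2):
    """Return whichever coefficient indicates the stronger linear relationship."""
    return r1 if abs(r1) >= abs(r2) else r2


print(describe(0.80))         # direction and magnitude of r = 0.80
print(describe(-0.43))        # direction and magnitude of r = -0.43
print(stronger(-0.43, 0.80))  # compares |-0.43| with |0.80|
print(stronger(-1.0, 0.80))   # compares |-1.0| with |0.80|
```

Note that r = -1.0 counts as stronger than r = 0.80, because strength depends only on the absolute value.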

Interpolation & Extrapolation


Interpolation is making a prediction within the range of values of the predictor in the sample used to generate the model. Extrapolation is making a prediction outside the range of values of the predictor in the sample used to generate the model.
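
The distinction can be sketched as a simple range check on the predictor. The example assumes Set A (coursework score) from the "Think Of" slide is the predictor used to build the model; `prediction_kind` is a hypothetical helper name and the query values 85, 100, and 70 are chosen only for illustration.

```python
# Set A (coursework score), assumed here to be the predictor in the sample.
set_a = [83.0, 92.0, 80.0, 83.0, 80.0, 80.0, 90.0, 83.0, 80.0,
         80.0, 94.0, 94.0, 80.0, 80.0, 90.0, 90.0, 92.0, 94.0]


def prediction_kind(x_new, xs):
    """Classify a prediction at x_new relative to the sampled predictor range."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"


print(prediction_kind(85.0, set_a))   # 85 lies inside the range 80 to 94
print(prediction_kind(100.0, set_a))  # 100 lies above the largest sampled value
print(prediction_kind(70.0, set_a))   # 70 lies below the smallest sampled value
```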
