
2/3/2021 Linear Regression | Six Sigma Study Guide

Linear Regression
Posted by Ramana PV

Linear regression is a statistical technique for estimating the mathematical relationship between a dependent variable (usually denoted as Y) and an independent variable (usually denoted as X). In other words, it predicts the change in the dependent variable from the change in the independent variable.

Dependent variable or criterion variable – the variable for which we wish to make a prediction

Independent variable or predictor variable – the variable used to explain the dependent variable

When to Use Linear Regression


In simple linear regression, only one independent variable is used to predict a single dependent variable, whereas in multiple linear regression more than one independent variable is used to predict a single dependent variable. The basic difference between simple and multiple regression is the number of explanatory variables.

For example, compare the crop yield against the rainfall in a season.

Notes about Linear Regression


The first step of linear regression is to test the linearity assumption. This can be done by plotting the values on a graph known as a scatter plot and observing the relationship between the dependent and independent variables. If the points follow an exponential rather than a straight-line pattern, a linear regression equation is not meaningful.

https://sixsigmastudyguide.com/linear-regression/ 1/17

Draw the line that passes as close as possible to the majority of the points; this line is considered the “best fit” line.

The mathematical equation of the line is y = a + bx + ε

Where

b – slope of the line

a – y-intercept (the value of y when x = 0)

Random error (ε, epsilon) – the difference between an observed value of y and the mean value of y for a given value of x.

Assumptions of Linear Regression


There is a linear relationship between the dependent and independent variables

All variables in the regression are multivariate normal

There is little or no multicollinearity in the data

The response variable is continuous, and the spread of the residuals is approximately the same along the whole regression line

The method of Least Squares



The method of least squares is a standard approach in regression analysis for determining the best fit line for a given set of data. It provides a visual representation of the relationship between the data points.

In general, the dependent variable is plotted on the y-axis and the independent variable on the x-axis. The least squares method determines the position of a straight line, also called the trend line, and the equation of that line. This straight line is also known as the best fit line.

The least squares method means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation. For instance, the least squares equations can be used to find the values of the coefficients a and b.

The least squares estimators of a and b are computed as follows:

b̂ = SSxy / SSxx and â = ȳ − b̂x̄, where SSxy = Σ(x − x̄)(y − ȳ) and SSxx = Σ(x − x̄)²

Compute the â and b̂ values and then substitute them into the equation of a line to obtain the least squares prediction equation, or regression line.
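
As a minimal sketch of the computation described above, the following Python function uses the standard closed-form least squares estimators, b̂ = SSxy / SSxx and â = ȳ − b̂x̄. The function name `least_squares_fit` and the sample data are hypothetical and chosen only for illustration; they are not the article's dataset.

```python
def least_squares_fit(x, y):
    """Return (a_hat, b_hat) for the line y = a + b*x fitted by least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("x values must not all be identical")
    b_hat = ss_xy / ss_xx            # slope
    a_hat = y_bar - b_hat * x_bar    # intercept
    return a_hat, b_hat


# Hypothetical data lying exactly on y = 1 + 2x (not from the article)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a_hat, b_hat = least_squares_fit(x, y)
print(f"y_hat = {a_hat:.2f} + {b_hat:.2f} x")
```

Because the illustrative points lie exactly on a line, the fit recovers that line; with real data the points scatter around the fitted line.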

Linear Regression example in DMAIC


Example: Linear regression is used specifically in the Analyze phase of DMAIC to estimate the mathematical relationship between a dependent variable and an independent variable.


A passenger vehicle manufacturer is reviewing the training records of 10 salespersons. The aim is to compare each salesperson's achieved target (in %) with the number of sales training modules completed.

Compute the least squares prediction equation, or regression line.


Furthermore, predict y for a given value of x by substituting it into the prediction equation. For example, if a salesperson completes 15 training modules, then the predicted achieved target would be:

ŷ = 31.09 + 3.5742(15) ≈ 84.70, i.e. about 84.7%
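
The substitution above can be sketched in a few lines of Python. The coefficients are the rounded values reported in the example; the function name `predict_target` is a hypothetical helper introduced here for illustration.

```python
# Rounded coefficients reported in the example above
A_HAT = 31.09    # intercept
B_HAT = 3.5742   # slope (percentage points per training module)


def predict_target(modules_completed, a=A_HAT, b=B_HAT):
    """Predicted achieved target (in %) for a given number of modules."""
    return a + b * modules_completed


print(round(predict_target(15), 1))  # 84.7
```

Predictions are most trustworthy for x values inside the range covered by the original data.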

Estimate the variability of random errors


Recall that the mathematical equation of the line is y = a + bx + ε and that the least squares line is ŷ = â + b̂x.


A random error (ε) affects the error of prediction. Hence the variability of the random errors (σε²) is the key parameter when predicting with the least squares line.

Estimate the variability of the random error, σε²

Example: From the above data, compute the variability of the random errors

From the above calculation, σ̂ε is 5.38. If the random errors are approximately normally distributed, about 95% of the points should fall within ±1.96 σ̂ε, i.e. ±10.54, of the line. In the above graph, all the values are within ±10.54 of the line.
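
A short sketch of this step, assuming the usual estimator σ̂ε = √(SSE / (n − 2)), where SSE is the sum of squared residuals about the fitted line. The helper name `residual_std_error` is hypothetical; the value 5.38 is the one reported above.

```python
import math


def residual_std_error(x, y, a_hat, b_hat):
    """Estimate sigma_epsilon as sqrt(SSE / (n - 2)) for the line a_hat + b_hat*x."""
    n = len(x)
    if n != len(y) or n <= 2:
        raise ValueError("need more than 2 paired observations")
    # Sum of squared differences between observed and fitted values
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Half-width of the approximate 95% band, using the value reported above
SIGMA_HAT = 5.38
band = 1.96 * SIGMA_HAT
print(f"+/- {band:.2f}")  # +/- 10.54
```

The divisor is n − 2 rather than n because two parameters (a and b) were estimated from the data.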

Test of slope coefficient


The existence of a significant relationship between the dependent and independent variables can be tested by checking whether b is equal to 0. If b is not equal to 0, there is a linear relationship. The null and alternative hypotheses are

The null hypothesis H0: b = 0


The alternative hypothesis H1: b ≠ 0

Degrees of freedom = n-2

Example: From the above data, determine whether the slope is significant at a 95% confidence level

Determine the critical values of t for 8 degrees of freedom at a 95% confidence level

t0.025, 8 = -2.306 and 2.306

We reject the null hypothesis if the t value is greater than 2.306 or less than -2.306. The calculated t value is 5.481, which is not between -2.306 and 2.306.

In this case, we reject the null hypothesis and conclude that b ≠ 0, so there is a linear relationship between the dependent and independent variables.
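
The test above can be sketched as follows, assuming the usual test statistic t = b̂ / s_b̂ with s_b̂ = σ̂ε / √SSxx. The function names and the small dataset are hypothetical; only the values 5.481 and 2.306 come from the example.

```python
import math


def slope_t_statistic(x, y):
    """t statistic for H0: b = 0 in simple linear regression."""
    n = len(x)
    if n != len(y) or n <= 2:
        raise ValueError("need more than 2 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = ss_xy / ss_xx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    if sse == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    sigma_hat = math.sqrt(sse / (n - 2))   # residual standard error
    s_b = sigma_hat / math.sqrt(ss_xx)     # standard error of the slope
    return b_hat / s_b


def reject_null(t_value, t_critical):
    """Two-sided decision rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > t_critical


# Values reported in the example: t = 5.481, critical value 2.306 (8 d.f.)
print(reject_null(5.481, 2.306))  # True

# Hypothetical data (not from the article), 2 degrees of freedom
t_demo = slope_t_statistic([1, 2, 3, 4], [2, 1, 4, 3])
print(round(t_demo, 3))
```

The critical value still has to be looked up in a t table (or computed with a statistics library) for the relevant degrees of freedom.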


Confidence interval estimate for the slope b


The confidence interval estimate for the slope b is b̂ ± t(α/2, n−2) × s_b̂, where s_b̂ is the estimated standard error of the slope.

Example: from the above data, compute the confidence interval around the slope of the line

2.0707 < b < 5.07
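
As a rough check of this interval, the sketch below rebuilds it from the numbers already reported: the slope 3.5742, the t statistic 5.481, and the critical value 2.306. It assumes the standard form b̂ ± t × s_b̂ and recovers s_b̂ from t = b̂ / s_b̂; the function name is hypothetical.

```python
B_HAT = 3.5742   # slope reported in the example
T_STAT = 5.481   # calculated t value reported in the example
T_CRIT = 2.306   # t critical value for 8 d.f. at 95% confidence


def slope_confidence_interval(b_hat, s_b, t_crit):
    """Return (lower, upper) bounds of b_hat +/- t_crit * s_b."""
    margin = t_crit * s_b
    return b_hat - margin, b_hat + margin


# Standard error of the slope, recovered from t = b_hat / s_b
s_b = B_HAT / T_STAT
lower, upper = slope_confidence_interval(B_HAT, s_b, T_CRIT)
print(f"{lower:.2f} < b < {upper:.2f}")  # 2.07 < b < 5.08
```

The result agrees with the interval above up to rounding of the inputs. Because the interval does not contain 0, it is consistent with rejecting H0: b = 0.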

Correlation Coefficient


The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample.

Pearson’s Correlation Coefficient

Example: from the above data, find the correlation coefficient


Note that -1 ≤ r ≤ +1

The line slopes upward to the right when r is positive

The line slopes downward to the right when r is negative

A value closer to 1 indicates a stronger positive linear relationship

A value closer to -1 indicates a stronger negative linear relationship

r = 0 implies no linear correlation
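
The properties listed above can be illustrated with a minimal sketch of Pearson's r, computed as SSxy / √(SSxx × SSyy). The function name `pearson_r` and the datasets are hypothetical and are not the article's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data (not from the article)
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])     # perfectly increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # perfectly decreasing line
r_noisy = pearson_r(x, [2, 1, 4, 3, 5])   # increasing trend with scatter
print(r_up, r_down, round(r_noisy, 2))
```

The sign of r matches the direction of the slope, and its magnitude shrinks as the points scatter further from a straight line.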

How is correlation analysis used to compare bivariate data?

Measures of central tendency, variance, or spread summarize a single variable by providing important information about its distribution. Often, more than one variable is collected in a study or experiment. When two variables are measured on a single experimental unit, the resulting data are called bivariate data, for example job satisfaction stratified by income.

In most instances with bivariate data, the aim is to determine whether one variable influences the other. The values of the two variables are often represented using scatter plots to explore the relationship between them.


Depending on the type of data, bivariate data can be described with graphs and numerical measures. If one or both variables are qualitative, use a pie chart or bar chart to see the relationship between the variables, for example to compare opinion and gender. If both variables are quantitative, use a scatter plot. The correlation coefficient is often used in comparing bivariate data.

Example: Correlation between the amount of time spent in a casino (independent variable) and the amount ($) lost (dependent variable).

Correlation coefficient

The correlation coefficient varies between -1 and +1. Values approaching -1 or +1 indicate strong correlation (negative or positive) and values close to 0 indicate little or no correlation between x and y.

Sample Correlation Coefficient


Correlation does not mean causation.

A positive correlation can be either good news or bad news.

A negative correlation is not necessarily bad news. It merely means that as the independent variable increases, the dependent variable tends to decrease.

r = 0 does not indicate the absence of a relationship; a curvilinear pattern may exist. r = -0.76 has the same predictive power as r = +0.76.
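
The last two points can be seen in a small sketch. The data are hypothetical: y is exactly x², a perfect curvilinear relationship, yet Pearson's r is 0. The second part shows that r = -0.76 and r = +0.76 give the same r², which is why their predictive power is the same.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data: y depends on x exactly, but not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
r_curved = pearson_r(x, y)
print(r_curved)  # 0.0

# The sign of r does not change the share of variation explained
r2_negative = (-0.76) ** 2
r2_positive = (+0.76) ** 2
print(round(r2_negative, 4), round(r2_positive, 4))  # 0.5776 0.5776
```

This is one reason to look at a scatter plot before relying on r alone.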


Coefficient of determination (R²)


The coefficient of determination is the proportion of the total variation that is explained when a linear regression is performed, i.e. the explained variation divided by the total variation.

r² lies in the interval 0 ≤ r² ≤ 1.


Example: from the above data, compute the coefficient of determination

We can say that 79% of the variation in sales target achieved can be explained by variation in the number of training modules completed.
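
A minimal sketch of this calculation, assuming the usual decomposition r² = 1 − SSE / SST, where SST is the total variation of y about its mean and SSE is the unexplained variation about the fitted line. The function name and the small dataset are hypothetical; only the value 0.79 comes from the example.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    n = len(y)
    if n != len(y_hat) or n < 2:
        raise ValueError("y and y_hat must have the same length (at least 2)")
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))     # unexplained variation
    if sst == 0:
        raise ValueError("r-squared is undefined when y is constant")
    return 1 - sse / sst


# Hypothetical data (not from the article); fitted values from y_hat = 1 + 0.6x
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
y_hat = [1 + 0.6 * xi for xi in x]
r2_demo = r_squared(y, y_hat)
print(round(r2_demo, 2))  # 0.36

# For the example above, r-squared = 0.79 corresponds to |r| of about 0.89
print(round(math.sqrt(0.79), 2))  # 0.89
```

In simple linear regression, r² is the square of Pearson's r, so either quantity can be derived from the other once the sign of the slope is known.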

Linear Regression Related Topics


Residual Analysis: “Because a linear regression model is not always appropriate for the data, you should assess the appropriateness of the model by defining residuals and examining residual plots.”


Canonical Correlation Analysis


Canonical correlation analysis seeks the best sets of linear combinations with
independent variables related to dependent variables.


Multiple Linear Regression


Multiple linear regression is an extension of the methodology
of simple linear regression. It is used to study more than
two variables.


Contributors

Ramana PV


Comments (7)

Ronald Bettinardi
September 18, 2018 at 10:19 am


link does not work


LUCA AMADEI
May 2, 2019 at 12:48 pm

Ted, the link doesn’t work. Could you kindly provide a valid one? Thank you!


Ted Hessing
May 3, 2019 at 9:45 am

Hi all, Updated with a few links and a few videos. Let me know how this works for you!

Best, Ted.


Lyla
January 10, 2020 at 2:13 am

All your contributions are very useful for professionals and non-professionals. I appreciate your availability to share these types of great and valuable info, and you did it very well! Can’t wait to read more… You nailed it…


Ted Hessing
January 10, 2020 at 8:40 am

Thanks for the kind words, Lyla!


Anshika Tela
April 13, 2020 at 3:26 am

Ted, can you explain how the 1.96 and 10.54 are derived?

From the above calculation σ̂ε is 5.38. Thus, most of the points will fall within ±1.96 σ̂ε i.e 10.54 of the line, hence approx 95% of the values should be in this region. Moreover from the above graph, it is clearly evident that all the values are within ±10.54 of the line.


Ramana
April 15, 2020 at 9:22 am

Anshika,

A random error (ε) affects the error of prediction. Hence the variability of the random errors (σε²) is the key parameter while predicting by the least squares line.

Random errors in experimental measurements are caused by unknown and unpredictable changes in the experiment. Random errors often have a Gaussian normal distribution.


For the standard normal distribution, P(-1.96 < Z < 1.96) = 0.95, i.e., there is a 95% probability that a standard normal variable, Z, will fall between -1.96 and 1.96 (refer to the Z table).

From the calculation, the variability of the random error is 5.38, and 1.96 × 5.38 = 10.54. About 95% of values should be in this region, and if you observe the graph above (in the example), all the points fall within ±10.54 of the LS line.

Hope this clarifies!

Thanks
