
Simple Linear Regression

Simple Linear Regression Model

• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be related to changes in X
• The Pearson correlation coefficient (PCC) measures the degree to which a set of data points forms a straight-line relationship.

• Regression is a statistical procedure that determines the equation for the straight line that best fits a specific set of data.

• Linear regression is concerned with predicting the value of one variable based on (given) the value of the other variable: the regression of Y on X.

JHU Intro to Clinical Research
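The slope, intercept, and Pearson correlation described above can be sketched with NumPy. The five (x, y) points are the same ones used in the worked error example later in these slides:

```python
import numpy as np

# Five (x, y) points matching the worked example later in these slides
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

# Pearson correlation coefficient: how closely the points form a straight line
r = np.corrcoef(x, y)[0, 1]

# Least-squares line of best fit: a degree-1 polynomial gives slope m and intercept c
m, c = np.polyfit(x, y, 1)
print(r, m, c)
```

For these points the fit recovers the slope 0.4 and intercept 2.4 used in the later slides.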


• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one
independent variable
• Explain the impact of changes in an independent variable on the dependent
variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to predict or explain the dependent
variable

Copyright © 2016 Pearson Education, Ltd. Chapter 12, Slide 5


Types of Relationships DCOVA
(continued)

[Figure: four scatter plots of Y against X, contrasting strong relationships (points tightly clustered around a trend) with weak relationships (points loosely scattered)]
Copyright © 2016 Pearson Education, Ltd. Chapter 12, Slide 6
Types of Correlation
Equation of the Regression Model

Ŷ = mX + c

Where

• Ŷ represents the predicted value of Y (the predicted mean of Y for a given X)
• c represents the intercept of the line
• m represents the slope of the line
Error in Linear Regression

y = mx + c + e

Where

y is the observed value in the dataset

e is the error (residual) term: the difference between the observed value and the value predicted by the line

Error in Linear Regression

x    y    m      mx     c
1    3    0.4    0.4    2.4
2    4    0.4    0.8    2.4
3    2    0.4    1.2    2.4
4    4    0.4    1.6    2.4
5    5    0.4    2.0    2.4

y = mx + c + e
e = y - (mx + c)
e = 3 - (1 × 0.4 + 2.4) = 0.2
e = 4 - (2 × 0.4 + 2.4) = 0.8
e = 2 - (3 × 0.4 + 2.4) = -1.6
e = 4 - (4 × 0.4 + 2.4) = 0
e = 5 - (5 × 0.4 + 2.4) = 0.6
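The residual computation above can be checked in a few lines of NumPy:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])
m, c = 0.4, 2.4  # slope and intercept from the table above

y_pred = m * x + c   # predicted values on the fitted line
e = y - y_pred       # residual: observed value minus predicted value
print(e)
```

Because the line was fitted by least squares, the residuals sum to zero.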
Multiple Linear Regression
Simple regression considers the relation between a single independent variable
and dependent variable

[Figure: a single predictor X pointing to the outcome Y]

B.Burt Gerstman
• Multiple regression considers the influence of multiple independent variables on a
dependent variable Y

• The intent is to look at the independent effect of each variable while “adjusting
out” the influence of potential confounders

[Figure: multiple predictors X1, X2, X3, …, Xn all pointing to a single outcome Y]
Multiple Regression
• The general form of the multiple regression model is estimated by the following equation:

y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk

As before, the coefficient b0 represents the intercept, but the bk's are now the partial regression coefficients.
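A minimal sketch of estimating b0 and the partial regression coefficients by ordinary least squares with NumPy. The data here are synthetic, and the true coefficient values (1, 2, and -3) are made up for illustration:

```python
import numpy as np

# Synthetic data: 100 observations of two predictors X1 and X2 (values assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Design matrix with a leading column of ones so b0 (the intercept) is estimated
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = b  # intercept and partial regression coefficients
print(b0, b1, b2)
```

With little noise, the estimates land close to the true values 1, 2, and -3.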
P-Value
The P value is a statistical measure that helps scientists determine whether the results of an experiment are within the normal range of values for the events being observed. Usually, if the P value of a data set is below a certain pre-determined threshold (for instance, 0.05), scientists will reject the "null hypothesis" of their experiment; in other words, they will rule out the hypothesis that the variables of their experiment had no meaningful effect on the results.

Wikihow
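As an illustration, scipy.stats.linregress reports a P value for the null hypothesis that the regression slope is zero, here on the same five points as the worked example (SciPy is an assumed dependency, not mentioned in the slides):

```python
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [3, 4, 2, 4, 5]

# linregress tests the null hypothesis that the true slope is zero
result = stats.linregress(x, y)
print(result.slope, result.pvalue)
# Compare result.pvalue against the pre-determined 0.05 threshold
```

For this tiny sample the P value is well above 0.05, so the null hypothesis of no linear effect is not rejected.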
Importing the dataset
Performing the data preprocessing steps
Comparing the prediction results with y_test
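The three steps listed above might be sketched with scikit-learn on a synthetic dataset; the dataset and variable names are stand-ins, since the lecture's actual data file is not shown here:

```python
# Assumes scikit-learn; the data are synthetic, not the lecture's dataset
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=200)

# Preprocessing step: split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare the first few predictions side by side with y_test
print(np.c_[y_pred[:5], y_test[:5]])
```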
Polynomial Regression
Polynomial regression is a form of linear regression, known as a special case of multiple linear regression, which estimates the relationship between the variables as an nth-degree polynomial.

https://www.analyticsvidhya.com/
Advantages
• A broad range of functions can be fit.

• Polynomials are flexible and can fit a wide range of curvatures.

• Polynomials often give a very good approximation of the relationship.

Disadvantages
• Polynomials are very sensitive to outliers.
Polynomial regression is often needed for more complex relationships (Figs. B & C); Fig. A is more linear.

Y′ = A + BX + CX² + DX³ + … + QX^(N-1)

• If you include (N-1) regressors based on X, you will perfectly fit the data.

• The order of the equation is the highest power: (N-1) in this example.
X^(N-1) is the highest-order predictor; all other regressors are lower order.

dionysus.psych.wisc.edu
Set the polynomial regression degree to 4.
Based on the scenario given in the lecture, predict the new region manager's salary.
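A degree-4 polynomial regression along the lines of the exercise might look like the following scikit-learn sketch; the position levels and salary figures below are illustrative stand-ins, not the lecture's actual dataset:

```python
# Assumes scikit-learn; levels and salaries are illustrative, not the lecture's data
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

levels = np.arange(1, 11).reshape(-1, 1)            # position levels 1..10
salary = np.array([45, 50, 60, 80, 110, 150, 200,
                   300, 500, 1000]) * 1000.0        # illustrative salaries

# Expand the single feature into powers up to degree 4, then fit linearly
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(levels)
model = LinearRegression().fit(X_poly, salary)

# Predict the salary for a new, in-between level such as 6.5
pred = model.predict(poly.transform([[6.5]]))[0]
print(pred)
```

This is the "special case of multiple linear regression" from the earlier slide: each power of the level acts as one regressor.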
Support Vector Regression (SVR)
• Support Vector Machines (SVMs) are well known in classification problems. The use
of SVMs in regression is not as well documented, however. These types of models
are known as Support Vector Regression (SVR).

• Support Vector Regression (SVR) uses the same principle as SVM, but for
regression problems.

towardsdatascience.com
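A minimal SVR sketch with scikit-learn, mirroring the SVM-based principle described above; the RBF kernel and the C and epsilon values are illustrative choices, not prescribed by the slides:

```python
# Assumes scikit-learn; kernel and parameters are illustrative choices
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# SVR is sensitive to feature scale, so standardize X first
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X_scaled, y)
y_pred = model.predict(X_scaled)
```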
End
