
Linear Regression

Context:
Introduction:
• The primary objective of regression is to study the
dependence relationship between two quantitative
variables X and Y, measured on the same individuals.
Examples:
o Different phenomena measured on the same individuals:
weight and height.
o Successive measurements of the same phenomenon:
blood pressure in the morning and in the evening.
→ Are X and Y independent? If not, is their relationship linear?
Application of Regression

• Regression analysis is an indispensable tool for
analyzing relationships between financial variables.
For example, it makes it possible to determine the rate of
change of a bond portfolio.
• Predict the price of a house
• Predict the price of a car
• Predict a person's height
• ...
Aims:
• In statistics, a linear regression model is a model that
seeks to establish a linear relationship between a
variable X and a variable Y.
Example:
Y → Variable to explain: blood pressure
X → Explanatory variable: age

Age (X):             41    43    54    44    48    56
Blood pressure (Y):  12.0  12.0  14    14.4  13.3  13.8
• Objective: build a model that explains
blood pressure as a function of age.
→ Draw the line that passes as close as possible to all the points.
→ The best line is the one that minimizes the distance between each observed point
and the corresponding fitted point on the line.
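
To make the idea of a "best line" concrete, here is a minimal Python sketch that scores two candidate lines on the age and blood pressure data from the example. It assumes the distance is measured as the sum of squared vertical gaps, which is the least-squares criterion. The function name and the two candidate lines are hypothetical, chosen by hand for illustration.

```python
# Age / blood pressure observations from the example table.
ages = [41, 43, 54, 44, 48, 56]
pressures = [12.0, 12.0, 14, 14.4, 13.3, 13.8]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical gaps between observed and fitted points.

    a is the intercept and b the slope of the candidate line.
    """
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical candidate lines, picked by hand (not estimated).
sse_line_1 = sum_squared_residuals(8.4, 0.10, ages, pressures)
sse_line_2 = sum_squared_residuals(10.0, 0.05, ages, pressures)

# The line with the smaller score passes closer to the observations.
better = "line 1" if sse_line_1 < sse_line_2 else "line 2"
print(f"line 1: {sse_line_1:.3f}  line 2: {sse_line_2:.3f}  better: {better}")
```

Under this criterion the first candidate fits the six observations better than the second.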

Step1: Specify the dependent and
independent variables
• The aim of simple regression (resp. multiple regression) is to
explain a variable Y using a variable X (resp. several
variables X1, ..., Xq).
• The variable Y is called the dependent variable, or
variable to be explained (the labels of the observations).
• The variables Xj (j=1,...,q) are called independent
variables, or explanatory variables (the features of the
observations).
Step2: Check the linearity
• Before any analysis, it is useful to plot the data.
A simple regression study therefore always begins with a scatter plot of the
observations (xi, yi), i = 1, ..., n.
• This first plot shows whether a linear
model is appropriate.
Step3: Estimate the model
• The goal of regression is to find a function f such that
yi ≈ f(xi).
- Simple linear regression (one feature): for example, predicting the price of a
house from its area only.

Y = b0 + b1X + ε
- Multiple linear regression: the multiple regression model is a
generalization of the simple regression model to q > 1 explanatory
variables X1, ..., Xq:

Y = b0 + b1X1 + ... + bqXq + ε
ε is added because in reality the relationship between X and Y is not perfectly
linear.
Step3: Simple linear regression
The Ordinary Least Squares (OLS)

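• The function f: yi ≈ f(xi).
• Simple linear regression:

yi = a + b xi + εi
• yi is the dependent variable (the label of observation i).
• a and b are the coefficients (intercept and slope), written b0 and b1 in the model above.
• xi is the independent variable (explanatory variable).
• εi is a random error.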
[Figure: regression line with intercept b0, showing an observed value and the corresponding estimated value]
Problem statement
The Ordinary Least Squares (OLS)
• We must look for the values of a and b that define the line for which
the sum of the squares of the errors (residuals) is the
smallest.
• We therefore seek to minimize Σ εi² = Σ (yi − (a + b xi))², where
εi = yi − (a + b xi) represents the residual of observation i.
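
The minimization above has a standard closed-form solution for a single explanatory variable: the slope is b = Sxy / Sxx and the intercept is a = ȳ − b·x̄. The following Python sketch applies it to the age and blood pressure data from the example. The function name `fit_simple_ols` is a hypothetical helper, not something defined in these slides.

```python
# Age / blood pressure observations from the example table.
ages = [41, 43, 54, 44, 48, 56]
pressures = [12.0, 12.0, 14, 14.4, 13.3, 13.8]


def fit_simple_ols(xs, ys):
    """Return (a, b): intercept and slope minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


a, b = fit_simple_ols(ages, pressures)

# Residuals: observed value minus value estimated by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(ages, pressures)]
sse = sum(r ** 2 for r in residuals)

print(f"intercept a = {a:.3f}, slope b = {b:.4f}, sum of squared errors = {sse:.3f}")
```

With an intercept in the model, the residuals of the fitted line sum to zero up to rounding, which is a quick sanity check on the estimate.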
[Figure: four scatter plots illustrating the correlation coefficient R: no correlation, positive correlation (R > 0), negative correlation (R < 0), perfect correlation (R = 1)]
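
As a sketch of how such a coefficient is obtained, the Python snippet below computes the Pearson correlation coefficient, assuming that this is what R denotes in the plots. The function name `pearson_r` and the two small toy series are hypothetical. The age and blood pressure values come from the example earlier in the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Example data from the slides: a positive but imperfect correlation.
ages = [41, 43, 54, 44, 48, 56]
pressures = [12.0, 12.0, 14, 14.4, 13.3, 13.8]
r_example = pearson_r(ages, pressures)

# Toy series (illustration only): exactly linear, increasing and decreasing.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(f"example: {r_example:.3f}, increasing: {r_perfect:.1f}, decreasing: {r_negative:.1f}")
```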
Step3: Multiple linear regression

• The multiple regression model is a generalization of the
simple regression model to q > 1 explanatory variables X1, ..., Xq:

Y = b0 + b1X1 + ... + bqXq + ε
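
A minimal Python sketch of fitting such a model follows, assuming the same least-squares criterion as in the simple case and solving the normal equations (XᵀX) b = Xᵀy by Gaussian elimination. The helper names are hypothetical. The data are synthetic and noise-free, generated from known coefficients purely so that the result can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs stay untouched.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_ols(rows, ys):
    """Return [b0, b1, ...]: intercept followed by one coefficient per variable."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept
    p = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic data generated from Y = 1 + 2*X1 + 3*X2 (illustration only).
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]

coefficients = fit_multiple_ols(features, targets)
print([round(c, 6) for c in coefficients])
```

Because the synthetic targets contain no noise, the fit recovers the generating coefficients up to rounding. On real data the residual term ε would not vanish.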
Logistic Regression

Graphic Representation

• Judging from the graphs, a linear regression seems inadequate for the
first 2 graphs, whose plots have a sinusoidal (graph 1) or sigmoidal (graph 2) shape.

• On the other hand, a straight line seems to be a good approximation of the
relationship between X and Y in the last graph.

[Figure: three scatter plots, with a sinusoidal shape, a sigmoidal shape, and a linear shape]

Definition
• Logistic regression is a statistical model used to study the relationship
between:
❑ A set of explanatory variables Xi (quantitative or qualitative)
❑ A qualitative variable Y

➢ Generalized linear model whose link function is the logit,
the inverse of the logistic function
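
For a single explanatory variable, the relation between the link function and the logistic function can be written as follows. This is a sketch under assumed notation: the coefficients b0 and b1 are borrowed from the linear model earlier in the slides and p stands for P(Y = 1 | X).

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = b_0 + b_1 X
\quad\Longleftrightarrow\quad
p = P(Y = 1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
```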
Definition
There are 3 types of logistic regression:
❑ Binary ⇒ the variable to be explained is binary (e.g. alive / dead)
❑ Ordinal ⇒ the variable to be explained is ordinal (e.g. stages of cancer)
❑ Multinomial ⇒ the variable to be explained is nominal with more than two
categories (e.g. types of cancer)
We focus here on binary logistic regression:
• Ordinal regression: requires several additional
assumptions.
• Multinomial regression: can be considered as a combination of several
binary logistic regressions.
Principle
• The idea is to model the probability of each modality of Y (presence
or absence) as a function of X.
• Y: binary variable to be explained (1 or 0).
Principle
• P(Y=1|X) is a numerical value
• So, can we use a linear model?
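
One way to explore this question is to fit a straight line to a binary outcome and look at the predicted values. The Python sketch below does this on a small hypothetical data set (the six observations are invented for illustration), using the usual least-squares slope and intercept.

```python
# Hypothetical binary data: Y switches from 0 to 1 as X grows.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

# Least-squares line fitted to the 0/1 outcome.
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean


def linear_prediction(x):
    """Value of the fitted straight line at x."""
    return intercept + slope * x


predictions = [linear_prediction(x) for x in xs]

# A probability must lie in [0, 1]; collect the predictions that do not.
outside = [p for p in predictions if p < 0 or p > 1]
print([round(p, 3) for p in predictions])
print("predictions outside [0, 1]:", len(outside))
```

On this data the line predicts a negative value at the smallest X and a value above 1 at the largest X, so its output cannot be read as a probability without further transformation.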
Principle
• In linear regression: the output (dependent variable) is a
continuous value. It can have any value from an infinite
number of possible values (Example: predicting the value of
a house).
• In logistic regression: the output (dependent variable) has a
limited number of possible values.
→ Generally, logistic regression is used when the dependent
variable is categorical.
Principle
• Logistic regression is built around the logistic function.
• The logistic function, also called the sigmoid function, is an
S-shaped curve that maps any real number to
a value between 0 and 1.
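
A minimal Python sketch of the sigmoid function follows, written in two branches so that very large negative inputs do not overflow. The helper `probability_of_one` and its coefficients b0 and b1 are assumptions for illustration: they show how a linear combination of X could be turned into P(Y=1|X), not values estimated from data.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so this form stays finite.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def probability_of_one(x, b0, b1):
    """P(Y=1 | X=x) for a one-variable logistic model with assumed coefficients."""
    return sigmoid(b0 + b1 * x)


for z in (-1000, -2, 0, 2, 1000):
    print(z, sigmoid(z))

# Hypothetical coefficients, for illustration only.
print(probability_of_one(3.0, b0=-4.0, b1=1.5))
```

The curve is symmetric around z = 0, where it equals 0.5. In floating-point arithmetic the extreme inputs round to exactly 0.0 and 1.0, although the mathematical function never reaches those bounds.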
