Regression
Context:
Introduction:
• The primary objective of regression is the study of
dependence relationships between two quantitative
variables X and Y, measured on the same individuals.
Examples:
o Different phenomena measured on the same individuals:
weight and height.
o Successive measurements of the same phenomenon:
blood pressure in the morning and evening.
→ Are X and Y independent? If not, is their relationship linear?
Application of Regression
Step1: Specify the dependent and
independent variables
• The aim of simple (resp. multiple) regression is to explain a
variable Y using a variable X (resp. several
variables X1, ..., Xq).
• The variable Y is called the dependent variable, or
variable to be explained (the labels of the observations).
• The variables Xj (j=1,...,q) are called independent
variables, or explanatory variables (the features of the
observations).
Step2: Check the linearity
• Before any analysis, it is useful to plot the data.
A simple regression study therefore always begins with a plot of the
observations (xi, yi), i = 1, ..., n.
• This first plot shows whether a linear
model is relevant.
Step3: Estimate the model
• The goal of regression is to find a function f :
yi ≈ f(xi).
- Simple linear regression (one feature): for example, predicting the price of a
house based on its area only.
Y = b0 + b1X + ε
- Multiple linear regression: the multiple regression model
generalizes the simple regression model to q > 1 explanatory
variables:
Y = b0 + b1X1 + ... + bqXq + ε
• yi is the dependent variable (the observed value, or label, for individual i).
• In the simple model, b0 and b1 are the coefficients (intercept and slope).
• xi is the independent variable (explanatory variable).
• ε is a random error.
[Figure: regression line with intercept b0, showing the observed value and the estimated value of Y.]
Problem statement
The Ordinary Least Squares (OLS)
• We must look for the optimal values of b0 and b1, the coefficients of the line,
that make the sum of the squared errors as small as possible.
• We therefore seek to minimize Σ ei² = Σ (yi − b0 − b1xi)², summed over
i = 1, ..., n, where ei = yi − (b0 + b1xi) represents the residual.
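As a minimal sketch of the least-squares criterion above, the following Python computes the standard closed-form estimates for the simple model Y = b0 + b1X + ε: the slope is the ratio of the centered cross-products to the centered squares of x, and the intercept makes the line pass through the point of means. The function names and the area/price figures are hypothetical, chosen only to echo the house-price example; they are not data from the course.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of (y - b0 - b1*x)**2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: house area (m^2) and price (arbitrary units)
areas = [50.0, 70.0, 90.0, 110.0, 130.0]
prices = [152.0, 208.0, 271.0, 329.0, 390.0]

b0, b1 = fit_simple_ols(areas, prices)
best_ssr = sum_squared_residuals(areas, prices, b0, b1)
print("intercept:", b0, "slope:", b1, "SSR:", best_ssr)
```

Any other line, for instance one with a slightly different slope, gives a larger sum of squared residuals on the same data, which is what "optimal" means here.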
[Figure: scatter plots illustrating the correlation coefficient R: no correlation, positive correlation (R > 0), negative correlation (R < 0), perfect correlation (R = 1).]
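The cases in the figure can be checked numerically. The sketch below assumes that R denotes the Pearson linear correlation coefficient, which the slides do not state explicitly; the function name and the small data sets are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined when one variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples for three of the cases in the figure
r_perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # points exactly on a rising line
r_negative = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])   # points exactly on a falling line
r_none = pearson_r([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])      # symmetric V shape, no linear trend
print(r_perfect, r_negative, r_none)
```

The last sample shows that R measures only linear association: Y is fully determined by X there, yet R is zero.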
Step3: Multiple linear regression
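The same least-squares idea extends to several explanatory variables. The sketch below assumes the usual form Y = b0 + b1X1 + ... + bqXq + ε and estimates the coefficients by solving the normal equations (XᵀX)b = XᵀY with plain Gaussian elimination. The function names and the tiny data set are hypothetical, and a numerical library would normally be preferred over this hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: explanatory variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """rows: one list [x1, ..., xq] per individual. Returns [b0, b1, ..., bq]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, 3.0, 4.0, 6.0, 8.0]
coefficients = fit_multiple_ols(rows, ys)
print(coefficients)
```

Because the data contain no noise, the fit recovers the generating coefficients up to floating-point rounding.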
Graphic Representation
• Judging from the graphs, a linear regression is inadequate for the first two,
whose points follow a sinusoidal (graph 1) or sigmoidal (graph 2) shape.
• For the last graph, on the other hand, a straight line appears to be a good
approximation of the relationship between X and Y.
[Figure: three scatter plots labeled sinusoidal shape (graph 1), sigmoidal shape (graph 2), and linear shape (graph 3).]
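A rough numerical companion to this visual check is to fit a line and see how much of the variation in Y it explains. The sketch below uses the coefficient of determination R² = 1 − SSR/SST, a measure not introduced in the slides, on hypothetical synthetic data. It is only a complement to the plot: a sigmoidal cloud, for instance, can still produce a fairly high R², so the scatter plot remains the primary check.

```python
import math


def r_squared_of_line(xs, ys):
    """Fit y = b0 + b1*x by least squares and return R^2 = 1 - SSR/SST."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both variables must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


# Hypothetical data on the same x grid (about two periods of the sine)
xs = [0.1 * i for i in range(126)]
# Straight line plus a small alternating perturbation
linear_ys = [2.0 + 3.0 * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]
# Sinusoidal shape
sinus_ys = [math.sin(x) for x in xs]

r2_linear = r_squared_of_line(xs, linear_ys)
r2_sinus = r_squared_of_line(xs, sinus_ys)
print("R^2 linear data:", r2_linear)
print("R^2 sinusoidal data:", r2_sinus)
```

On this data the line explains almost all of the variation in the linear case and only a small share in the sinusoidal case, in line with the visual conclusion.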
Definition
• Logistic regression is a statistical model used to study the relationships
between:
❑ A set of qualitative variables Xi
❑ A qualitative variable Y