Regression

Linear Regression
Context:
Introduction:
• The primary objective of regression is the study of
dependence relationships between two quantitative
variables X and Y, measured on the same individuals.
Examples:
o Different phenomena measured in the same individuals:
weight and height.
o Successive measurements of the same phenomenon:
blood pressure in the morning and evening.
→ Are they independent? Are they linearly related or not?
Application of Regression
• Regression analysis is an indispensable tool for
analyzing relationships between financial variables.
It makes it possible to determine the rate of change of a portfolio
of bonds.
• Predict the price of a house
• Predict the price of a car
• Predict a person's height
• ...
Aims:
• In statistics, a linear regression model is a model that
seeks to establish a linear relationship between a
variable X and a variable Y.
Example:
Y → Variable to explain: Blood pressure
X → Explanatory variable: Age
Age              41    43    54    44    48    56
Blood Pressure   12.0  12.0  14.0  14.4  13.3  13.8
Aims:
• Objective: we are trying to create a model that explains
blood pressure as a function of age.
→ Draw the line that passes as close as possible to all the points.
Aims:
→ The best line is the one that minimizes the distance between each observed point
and the corresponding adjusted (fitted) point on the line.
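As a rough illustration of the "best line" idea, here is a minimal Python sketch that fits a straight line to the age / blood pressure data from the example. It assumes NumPy is available and uses np.polyfit with degree 1, which performs a least-squares fit. The variable names are illustrative and do not come from the slides.

```python
import numpy as np

# Data from the example: age (X) and blood pressure (Y)
age = np.array([41, 43, 54, 44, 48, 56], dtype=float)
blood_pressure = np.array([12.0, 12.0, 14.0, 14.4, 13.3, 13.8])

# A degree-1 polynomial fit is a least-squares straight line.
# np.polyfit returns the highest-degree coefficient first: (slope, intercept).
slope, intercept = np.polyfit(age, blood_pressure, deg=1)

# Adjusted (fitted) points on the line, one per observation
fitted = intercept + slope * age

print(f"blood_pressure ≈ {intercept:.2f} + {slope:.3f} * age")
```

The fitted slope is positive, which suggests that blood pressure tends to increase with age in this small sample. With only six observations, this is an illustration rather than a reliable estimate.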
Step1: Specify the dependent and
independent variables
• The aim of simple (resp. multiple) regression is to explain a
variable Y using a variable X (resp. several
variables X1, ..., Xq).
• The variable Y is called the dependent variable, or
variable to be explained (the labels of the observations).
• The variables Xj (j=1,...,q) are called independent
variables, or explanatory variables (the features of the
observations).
Step2: Check the linearity
• Before any analysis, it is useful to plot the data.
A simple regression study therefore always begins with a plot of the
observations (xi, yi), i = 1, ..., n.
• This first plot shows whether a linear
model is relevant.
Step3: Estimate the model
• The goal of regression is to find a function f such that
yi ≈ f(xi).
- Simple linear regression (one feature): for example, predicting the price of a
house based on its area only.
Y = b0 + b1X + ε
- Multiple linear regression: the multiple regression model is a
generalization of the simple regression model to the case where the number n of
explanatory variables is greater than 1:
Y = b0 + b1X1 + ... + bnXn + ε
ε is added because in reality the relation between X and Y is not perfectly
linear.
Step3: Simple linear regression
The Ordinary Least Squares (OLS)
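• The function f: yi ≈ f(xi).
• Simple linear regression: yi = b0 + b1 xi + εi
• yi is the dependent variable (variable to be explained).
• b0 and b1 (also written a and b) are the coefficients (intercept and slope).
• xi is the independent variable (explanatory variable).
• εi is a random error.
[Figure: fitted line with intercept b0, showing the observed value and the estimated value of a point]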
Problem statement
The Ordinary Least Squares (OLS)
• We must look for the optimal values a and b of the line, i.e. those for which
the sum of the squares of the errors (residuals) is the
smallest.
• We therefore seek to minimize Σ εi² = Σ (yi − a − b xi)², summed over i = 1, ..., n, where εi = yi − (a + b xi) represents the residual.
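The minimization above has a well-known closed-form solution for a straight line: b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − b x̄. The following Python sketch applies these formulas to the blood pressure example, assuming NumPy is available and taking a as the intercept and b as the slope. The function names ols_simple and sum_squared_residuals are hypothetical helpers introduced for illustration.

```python
import numpy as np

def ols_simple(x, y):
    """Return (a, b) minimizing sum((y - a - b*x)**2); a = intercept, b = slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    a = y_mean - b * x_mean
    return a, b

def sum_squared_residuals(x, y, a, b):
    """Sum of the squared residuals for the line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (a + b * x)) ** 2))

# Data from the blood pressure example
age = [41, 43, 54, 44, 48, 56]
blood_pressure = [12.0, 12.0, 14.0, 14.4, 13.3, 13.8]

a, b = ols_simple(age, blood_pressure)
sse_best = sum_squared_residuals(age, blood_pressure, a, b)

# Any other line, e.g. the same slope with a shifted intercept, does worse
sse_shifted = sum_squared_residuals(age, blood_pressure, a + 0.5, b)

print(f"a = {a:.3f}, b = {b:.4f}")
print(f"SSE at optimum = {sse_best:.3f}, SSE with shifted intercept = {sse_shifted:.3f}")
```

Comparing the two sums of squares is only a spot check, not a proof, but it shows what "optimal" means here: moving away from (a, b) increases the sum of squared residuals.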
Problem statement
[Figure: four scatter plots illustrating the correlation coefficient R: no correlation (R = 0), positive correlation (R > 0), negative correlation (R < 0), perfect correlation (R = 1)]
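To make the correlation coefficient concrete, the sketch below computes it for the blood pressure example. It assumes that R denotes the Pearson correlation coefficient and that NumPy is available. The second computation uses a made-up, exactly linear variable to show the perfect correlation case.

```python
import numpy as np

# Data from the blood pressure example
age = np.array([41, 43, 54, 44, 48, 56], dtype=float)
blood_pressure = np.array([12.0, 12.0, 14.0, 14.4, 13.3, 13.8])

# np.corrcoef returns the 2x2 correlation matrix; R is the off-diagonal entry
r = np.corrcoef(age, blood_pressure)[0, 1]

# Hypothetical variable that is an exact linear function of age
exactly_linear = 2.0 * age + 1.0
r_perfect = np.corrcoef(age, exactly_linear)[0, 1]

print(f"R(age, blood pressure) = {r:.3f}")
print(f"R(age, exact linear function of age) = {r_perfect:.3f}")
```

Here R is positive but clearly below 1, which corresponds to the "positive correlation" case rather than the "perfect correlation" case.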
Step3: Multiple linear regression
• The multiple regression model is a generalization of the
simple regression model to the case where the number n of explanatory variables is
greater than 1.
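A minimal sketch of multiple linear regression with two explanatory variables, assuming NumPy is available. The data are hypothetical and noise-free, generated from known coefficients so that the estimate can be checked. np.linalg.lstsq solves the least-squares problem, and the column of ones is added so that the first coefficient plays the role of the intercept b0.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Design matrix: a leading column of ones for the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ coeffs ≈ y
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the data contain no noise, the estimated coefficients recover the generating values up to rounding. With real data, the ε term means the fit is only approximate.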
Logistic Regression
Graphic Representation
• Judging from the graphs, a linear regression seems inadequate for the
first 2 graphs, whose plots have a sinusoidal (graph 1) or sigmoidal (graph 2) shape.
• On the other hand, modeling the relationship between X
and Y by a straight line seems to be a good approximation for the last graph.
[Figure: three scatter plots with a sinusoidal shape, a sigmoidal shape and a linear shape]
Definition
• Logistic regression is a statistical model used to study the relationships
between:
❑ A set of explanatory variables Xi (qualitative here; quantitative variables are also possible)
❑ A qualitative variable Y
➢ Generalized linear model using the logit (the inverse of the
logistic function) as link function
Definition
There are 3 types of logistic regression:
❑ Binary ⇒ the variable to be explained is binary (e.g. alive / dead)
❑ Ordinal ⇒ the variable to be explained is ordinal (e.g. stages of cancer)
❑ Multinomial ⇒ the variable to be explained is qualitative with more than two unordered categories (e.g. types of
cancer)
We focus here on binary logistic regression. For the other two types:
• Ordinal regression requires several additional
assumptions.
• Multinomial regression can be considered as a combination of several
binary logistic regressions.
Principle
• The idea consists of modeling the probability of each modality of Y (presence
or absence) as a function of X.
• Y: variable to be explained, binary (1 or 0)
Principle
• P(Y=1|X) is a numerical value
• So, can we use a linear model?
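One way to explore this question is to fit a straight line to a binary outcome and look at its predictions. The sketch below uses hypothetical 0/1 data and assumes NumPy is available. It is an illustration of the difficulty, not a method from the slides.

```python
import numpy as np

# Hypothetical data: Y is 0 for small values of X and 1 for large values
x = np.arange(1, 9, dtype=float)  # 1, 2, ..., 8
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Least-squares straight line fitted to the 0/1 outcome
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions slightly outside the observed range of X
pred_low = intercept + slope * 0.0
pred_high = intercept + slope * 10.0

print(f"prediction at x = 0:  {pred_low:.3f}")
print(f"prediction at x = 10: {pred_high:.3f}")
```

The fitted line gives a value below 0 at x = 0 and above 1 at x = 10, so its output cannot be read as a probability P(Y=1|X) everywhere. This is one reason to look for a function whose values stay between 0 and 1.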
Principle
• In linear regression: the output (dependent variable) is a
continuous value. It can have any value from an infinite
number of possible values (Example: predicting the value of
a house).
• In logistic regression: the output (dependent variable) has a
limited number of possible values.
→ Generally, logistic regression is used when the dependent
variable is categorical.
Principle
• Logistic regression is built around the logistic function, which is
at the heart of the method.
• The logistic function, also called the sigmoid function, is an
S-shaped curve that takes any real number and maps it to
a value between 0 and 1.
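The logistic function is σ(z) = 1 / (1 + e^(−z)). The sketch below implements it with NumPy and applies it to a linear combination b0 + b1 x to obtain a value that can be read as P(Y=1|X). The coefficients b0 and b1 and the age value are hypothetical, chosen only to illustrate the mapping.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number to a value in (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# The S-shaped curve: large negative inputs go towards 0, large positive towards 1
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
values = sigmoid(z)
print(values)

# Hypothetical coefficients (not estimated from any data)
b0, b1 = -4.0, 0.1
age = 50
p = sigmoid(b0 + b1 * age)  # sigmoid(1.0)
print(f"P(Y=1 | age={age}) = {p:.3f}")
```

In an actual binary logistic regression, b0 and b1 would be estimated from the data, typically by maximum likelihood, instead of being fixed by hand as in this sketch.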
