Introduction to Linear and Nonlinear Regression
thierry.chonavel@imt-atlantique.fr
FC IMT
18 June 2021
Regression: a probabilistic approach
Problem: in a probability space (Ω, A, P) we consider real-valued
random variables (RVs) X = (X_1, ..., X_n) and Y, and we look for the
best approximation (in some sense) of Y as a function of X.
Interest: for an experiment ω ∈ Ω and an observation
x_{1:n} = X_{1:n}(ω), one may want to find an approximation
of the corresponding y = Y(ω).
Hypothesis: in what follows, all RVs Z involved are assumed to have a
finite second-order moment: E[Z²] < ∞.
Remarks:
- Here, uppercase letters denote RVs and lowercase letters samples
from these RVs.
- Note, however, that in statistics lowercase letters are often used
both for random variables and for their realizations, the distinction
arising from the context.
- For possibly multivariate RVs (that is, random vectors), boldface
notation will be used.
Regression: examples of application
Outline
1 Space L²(Ω, A, P)
2 Nonlinear regression
3 Linear regression
4 Implementation
5 Regression and Machine Learning
6 Exercise
Space L²(Ω, A, P)
Let C(X) denote the class of random variables that are equal to X
almost everywhere (a.e.): Y ∈ C(X) iff P(Y ≠ X) = 0.
Definition: L²(Ω, A, P) = {C(X); E[X²] < ∞}. In practice we identify
a random variable with the class it belongs to.
Then ‖X‖ = √(E[X²]) is a semi-norm for RVs of (Ω, A, P) with finite
variance, but a norm on L²(Ω, A, P).
Properties of L²(Ω, A, P)
A nice feature of Hilbert spaces is that projection onto closed convex
sets or closed subspaces operates as in Euclidean spaces. A very useful
consequence of this property is that it allows one to solve
approximation problems.
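For reference, here is a standard textbook statement of the projection theorem being invoked (the slide leaves it implicit):

\textbf{Projection theorem.} Let $H$ be a Hilbert space and $C \subseteq H$
a closed convex set. For every $x \in H$ there exists a unique
$\hat{x} \in C$ such that
\[
  \| x - \hat{x} \| = \min_{c \in C} \| x - c \| .
\]
Moreover, when $C$ is a closed subspace, $\hat{x}$ is characterized by
the orthogonality condition $\langle x - \hat{x}, c \rangle = 0$ for all
$c \in C$.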
Regression function (I)
Regression function (II)
{g(X); g ∈ L²(ℝ, B(ℝ), P_X)} is a closed Hilbert subspace of
L²(Ω, A, P) and we can apply the projection theorem:
∃! h ∈ L²(ℝ, B(ℝ), P_X), h = arg min_{g ∈ L²(ℝ, B(ℝ), P_X)} E[(Y − g(X))²].
y = h(x) is called the regression function of Y knowing that X = x,
and it is denoted by h(x) = E[Y | X = x]. It is given by the following
result:
Theorem: conditional expectation in L²(Ω, A, P)
If (X, Y) ∈ L²(Ω, A, P) with pdf p(x, y), then
E[Y | X = x] = ∫ y · p(x, y)/p(x) dy.
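A quick numerical illustration of this formula, assuming a standard bivariate Gaussian with correlation ρ, for which E[Y | X = x] = ρx is known in closed form (the whole setup is an illustrative assumption, not from the slides):

# Numerically evaluate E[Y | X = x0] = ∫ y p(x0, y) dy / p(x0)
import numpy as np

rho = 0.7
x0 = 1.0                                   # condition on X = x0
y = np.linspace(-8.0, 8.0, 4001)

# Joint pdf of a standard bivariate Gaussian with correlation rho
def p_xy(x, y):
    z = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

p_x = np.trapz(p_xy(x0, y), y)             # marginal p(x0) = ∫ p(x0, y) dy
cond_mean = np.trapz(y * p_xy(x0, y), y) / p_x
print(cond_mean)                           # ≈ rho * x0 = 0.7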
Linear regression
For nonlinear regression, the computation of
E[Y | X = x] = ∫ y · p(x, y)/p(x) dy
requires knowledge of p(x, y) and the computation of an integral.
Remarks
(i) Here we assumed Y(ω) ∈ ℝ^p with p ≥ 1; the expression of
E[Y | X = x] is a straightforward extension of the scalar case for Y.
(ii) For large p, computing E[Y | X = x] can be demanding.
Simpler than conditional nonlinear regression: linear regression
Ŷ = AX + b, with (A, b) = arg min_{C,d} E[‖Y − (CX + d)‖²].
Solution: Ŷ = E[Y] + cov(Y, X) cov(X)⁻¹ (X − E[X]).
When X(ω), Y(ω) ∈ ℝ, we get the regression line of Y on X:
y = E[Y] + (σ_YX / σ_X²)(x − E[X]).
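A minimal numpy sketch of this moment-based solution, estimated from samples (the data-generating model and all names are illustrative assumptions):

# Estimate A = cov(Y, X) cov(X)^{-1} and b = E[Y] - A E[X] from data
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # n samples of X in R^3
Y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=1000)

mX, mY = X.mean(axis=0), Y.mean()
cov_X = np.cov(X, rowvar=False)                # cov(X), 3 x 3
cov_YX = (Y - mY) @ (X - mX) / (len(Y) - 1)    # cov(Y, X), shape (3,)

A = np.linalg.solve(cov_X, cov_YX)             # regression coefficients
b = mY - A @ mX                                # intercept
print(A, b)                                    # ≈ [1, -2, 0.5], 3.0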
Linear regression: the Gaussian case
Y | X = x ∼ N(m_Y + Γ_YX Γ_X⁻¹ (x − m_X), Γ_Y − Γ_YX Γ_X⁻¹ Γ_YX^T).
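As a sanity check, a small Monte-Carlo sketch of this formula in dimension 1 + 1 (parameter values and the selection window are illustrative assumptions):

# Compare empirical conditional moments with the Gaussian formula
import numpy as np

rng = np.random.default_rng(1)
m = np.array([1.0, 2.0])                 # (m_X, m_Y)
G = np.array([[2.0, 0.8],                # joint covariance [[Γ_X,  Γ_XY],
              [0.8, 1.0]])               #                   [Γ_YX, Γ_Y ]]
Z = rng.multivariate_normal(m, G, size=200_000)
X, Y = Z[:, 0], Z[:, 1]

x0 = 0.5
mask = np.abs(X - x0) < 0.05             # keep samples with X ≈ x0
th_mean = m[1] + G[1, 0] / G[0, 0] * (x0 - m[0])
th_var = G[1, 1] - G[1, 0] ** 2 / G[0, 0]
print(Y[mask].mean(), th_mean)           # ≈ 1.8 on both sides
print(Y[mask].var(), th_var)             # ≈ 0.68 on both sides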
Practical implementation of linear regression from data
Assume the means, correlations, or distribution of (X, Y) ∈ ℝ^p × ℝ^q
are not known, but a sample set S = {(x_k, y_k)}_{k=1:n}, consisting of
independent realizations of (X, Y), is available.
Let m̂_X = (1/n) Σ_{k=1:n} x_k and m̂_Y = (1/n) Σ_{k=1:n} y_k.
For discrete-valued RVs,
E[Y | X = x] ≈ Σ_{y ∈ Y(Ω)} y × P̂(X = x, Y = y) / P̂(X = x)    (2)
where P̂(X = x, Y = y) = n_xy / n, P̂(X = x) = n_x / n,
n_xy = #{k; (x_k, y_k) = (x, y)}, n_x = #{k; x_k = x}.
- The case of absolutely continuous random variables is addressed in
the lesson about kernel methods.
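A direct translation of (2) into numpy for discrete-valued samples (the toy data are an illustrative assumption):

# Empirical conditional mean E[Y | X = x0] from discrete samples
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(0, 3, size=10_000)          # discrete X in {0, 1, 2}
y = x + rng.integers(0, 2, size=10_000)      # discrete Y depending on X

def cond_mean(x_samples, y_samples, x0):
    # Σ_y y * (n_{x0,y} / n_{x0}) is exactly the sample mean of the
    # y_k whose paired x_k equals x0
    return y_samples[x_samples == x0].mean()

print(cond_mean(x, y, 1))                    # ≈ 1.5 for this toy model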
Example
Example: study the regression of Y w.r.t. X, with
X = (π/2)(1 − 2Z₁),   Y = (0.1X + cos X) × (1 + Z₂)    (3)
where Z₁ ∼ U_[0,1] and Z₂ ∼ N(0, 1) are independent.
Clearly, E[Y | X] = 0.1X + cos X, since Z₂ is independent of X and
E[1 + Z₂] = 1.
Comparison with linear regression: see the simulation sketch below.
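A small simulation sketch of this comparison (sample size and print-out grid are arbitrary choices):

# Compare E[Y | X = x] = 0.1 x + cos x with the fitted regression line
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
Z1, Z2 = rng.uniform(size=n), rng.normal(size=n)
X = np.pi / 2 * (1 - 2 * Z1)                     # uniform on [-π/2, π/2]
Y = (0.1 * X + np.cos(X)) * (1 + Z2)

# Regression line y = E[Y] + (σ_YX / σ_X²)(x - E[X])
C = np.cov(X, Y)
a = C[0, 1] / C[0, 0]
b = Y.mean() - a * X.mean()

xs = np.linspace(-np.pi / 2, np.pi / 2, 5)
print(np.c_[0.1 * xs + np.cos(xs), a * xs + b])  # nonlinear vs. linear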
Regression and Machine Learning
Not all approximation problems in machine learning boil down to
regression among random vectors.
Often, we look for some parameter θ that enters a distance d between
two quantities x and y; θ is obtained by minimizing this distance.
For example, we consider a transform F_θ (linear or nonlinear) such
that θ is estimated by θ̂ = arg min_θ d(F_θ(x), y).
d(F_θ(x), y) can be a very complex function of θ, and θ itself can be
high-dimensional. Then, iterative optimization techniques such as
gradient descent algorithms are often used.
Example: to train neural networks, which are made of a succession of
linear and nonlinear transforms where the weights θ of the linear
transforms must be learned, we can search for θ by minimizing
Σ_i ‖F_θ(x_i) − y_i‖², where the training pairs (x_i, y_i)_{i∈I}
represent input data x_i and the corresponding desired outputs y_i.
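A minimal gradient-descent sketch for this criterion, with F_θ taken linear for simplicity rather than a full neural network (all names and values are illustrative assumptions):

# Gradient descent on θ for the criterion Σ_i ||F_θ(x_i) - y_i||²
import numpy as np

rng = np.random.default_rng(4)
xs = rng.normal(size=(200, 2))                 # training inputs x_i
ys = xs @ np.array([2.0, -1.0]) + 0.5          # desired outputs y_i

theta = np.zeros(3)                            # θ = (w1, w2, bias)
lr = 0.05
for _ in range(500):
    pred = xs @ theta[:2] + theta[2]           # F_θ(x_i)
    err = pred - ys
    # gradient of the (averaged) squared-error criterion w.r.t. θ
    grad = np.r_[2 * xs.T @ err, 2 * err.sum()] / len(ys)
    theta -= lr * grad                         # gradient step

print(theta)                                   # ≈ [2.0, -1.0, 0.5]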
Exercise