
Regression and Interpolation

CS 740
Ajit Rajwade

Problem statements
• Consider a set of N points {(xi,yi)}, 1 ≤ i ≤ N.
• Suppose we know that these points actually
lie on a function of the form y = f(x;a) where
f(.) represents a function family and a
represents a set of parameters.
• For example: f(x) is a linear function of x, i.e.
of the form y = f(x) = mx+c. In this case, a =
(m,c).

2
Problem statements
• Example 2: f(x) is a quadratic function of x, i.e. of the
form y = f(x) = px² + qx + r. In this case, a = (p,q,r).

• Example 3: f(x) is a trigonometric function of x, i.e. of
the form f(x) = p sin(qx+r). In this case, a = (p,q,r).

• In each case, we assume knowledge of the function
family. But we do not know the function parameters,
and would like to estimate them from {(xi,yi)}.

• This is the problem of fitting a function (of known
family) to a set of points.
3
Problem statements
• In function regression (or approximation), we
want to find a such that for all i, f(xi;a)≈yi.

• In function interpolation, we want to fit some
function such that f(xi;a)=yi.

4
Polynomial Regression
• A polynomial of degree n is a function of the
form:
y = \sum_{i=0}^{n} a_i x^i = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n

• Polynomial regression is the task of fitting a
polynomial function to a set of points.

5
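As a small illustration (mine, not from the slides), the formula above can be evaluated with numpy; note that np.polyval expects coefficients ordered from the highest degree down, while the slide's vector a = (a0, a1, ..., an) is ordered from the lowest degree up. The coefficients used here are made up.

```python
import numpy as np

# Hypothetical degree-2 polynomial y = 1 + 2x + 3x^2,
# i.e. a = (a0, a1, a2) = (1, 2, 3) in the slide's lowest-degree-first ordering.
a = np.array([1.0, 2.0, 3.0])

x = np.array([0.0, 1.0, 2.0])
# np.polyval wants highest-degree-first, so reverse a before evaluating.
y = np.polyval(a[::-1], x)
print(y)   # [ 1.  6. 17.]
```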
Polynomial regression
• Let us assume that x is the independent
variable, and y is the dependent variable.
• In the point set {(xi,yi)} containing N points, we
will assume that the x-coordinates of the
points are available accurately, whereas the y-
coordinates are affected by measurement
error – called noise.

6
7
Least Squares Polynomial regression
• If we assume the polynomial is linear, we have
yi = mxi + c + ei, where ei is the noise in yi. We
want to estimate m and c.
• We will do so by minimizing the following
w.r.t. m and c:
\sum_{i=1}^{N} \left( y_i - m x_i - c \right)^2

• This is called least squares regression.


8
Least Squares Polynomial regression
• In matrix form, we can write the following:

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} =
\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix}
\begin{pmatrix} m \\ c \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}

y \approx Xa

a = (X^T X)^{-1} X^T y = X^{\dagger} y \quad \text{(pseudo-inverse of X)}

The solution will exist provided there are at least two points with
distinct x-coordinates. Basically, X^T X must be non-singular.
9
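A minimal numpy sketch of the line fit above (my own illustration with made-up data): build X with a column of x-values and a column of ones, solve the normal equations, and cross-check against np.linalg.lstsq, which solves the same least squares problem via a more numerically stable factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data from y = 2x + 1 + noise (noise only in y).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + 0.3 * rng.standard_normal(x.size)

# Design matrix: one row [x_i, 1] per point.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: a = (X^T X)^{-1} X^T y.
a_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit via lstsq, which avoids forming X^T X explicitly.
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("m, c from normal equations:", a_normal)
print("m, c from lstsq           :", a_lstsq)
```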
Least Squares Polynomial regression
• For higher order polynomials, we have:

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} =
\begin{pmatrix} x_1^n & \cdots & x_1^2 & x_1 & 1 \\ x_2^n & \cdots & x_2^2 & x_2 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ x_N^n & \cdots & x_N^2 & x_N & 1 \end{pmatrix}
\begin{pmatrix} a_n \\ \vdots \\ a_2 \\ a_1 \\ a_0 \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}

y \approx Xa

a = (X^T X)^{-1} X^T y = X^{\dagger} y

The solution will exist provided there are at least n+1 points with
distinct x-coordinates. Basically, X^T X must be non-singular.
10
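The same recipe for a general degree-n polynomial, again a sketch with hypothetical data: np.vander builds exactly the matrix X shown above (powers x^n down to x^0), so the recovered coefficients come out ordered a_n, ..., a_1, a_0 as in the slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data from a cubic, with additive noise in y only.
x = np.linspace(-1.0, 1.0, 40)
y = 0.5 - 2.0 * x + 3.0 * x**3 + 0.05 * rng.standard_normal(x.size)

n = 3                                   # polynomial degree
X = np.vander(x, n + 1)                 # columns: x^n, ..., x^2, x, 1
a, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated (a_n, ..., a_0):", a)
# np.polyfit performs the same least squares fit.
print("np.polyfit check         :", np.polyfit(x, y, n))
```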
Least Squares Polynomial regression
• Regardless of the order of the polynomial, we
are dealing with a linear form of regression –
because in each case, we are solving an
equation of the form y ≈ Xa.

• There are some assumptions made on the
errors in the measurement of y. We will not
go into those details.

11
Least Squares Polynomial Regression:
Geometric Interpretation
• The least squares solution seeks the set of polynomial
coefficients that minimizes the sum of squared differences
between the measured y coordinate of each point and the y
coordinate predicted by the assumed polynomial model, i.e.

\min_{\{a_j\}} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2

\equiv \; \min_{a} \| y - Xa \|_2^2
12
Least Squares Polynomial Regression:
Geometric Interpretation
• The least squares solution seeks the set of polynomial
coefficients that minimizes the sum of squared differences
between the measured y coordinate of each point and the y
coordinate predicted by the assumed polynomial model, i.e. we
are trying to find a model that minimizes vertical distances
(distances along the Y axis).

13
Weighted least squares: Row weighting
\min_{\{a_j\}} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2 \;\equiv\; \min_{a} \| y - Xa \|_2^2

Different instances of y get different weights (the higher the
weight, the more importance is given to that instance!).

\min_{\{a_j\}} \sum_{i=1}^{N} w_i \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2 \;\equiv\; \min_{a} \| W (y - Xa) \|_2^2,
\quad W = \mathrm{diag}(w_1, w_2, \ldots, w_N)

Wy \approx WXa

a = \left( (WX)^T (WX) \right)^{-1} (WX)^T W y
14
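A short sketch of the row-weighted solve above (data and weights are hypothetical): multiply y and X by W = diag(w_1, ..., w_N) and solve the resulting least squares problem, as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a line, with the last few y-values much noisier.
x = np.linspace(0.0, 5.0, 30)
y = 4.0 * x - 2.0 + 0.1 * rng.standard_normal(x.size)
y[-5:] += 3.0 * rng.standard_normal(5)

X = np.column_stack([x, np.ones_like(x)])

# Down-weight the unreliable rows.
w = np.ones(x.size)
w[-5:] = 0.1
W = np.diag(w)

# Solve Wy ~= WXa in the least squares sense:
# a = ((WX)^T (WX))^{-1} (WX)^T Wy.
WX, Wy = W @ X, W @ y
a = np.linalg.solve(WX.T @ WX, WX.T @ Wy)
print("weighted fit (m, c):", a)
```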
Regularization
• In some cases, the system of equations may be ill-
conditioned or even singular.
• In such cases, the solution may be non-unique or
very sensitive to noise.
• It is common practice to choose a “simple” or
“smooth” solution.
• This practice of choosing a “smooth” solution is
called regularization.
• Regularization is a common method used in
several problems in machine learning, computer
vision and statistics.
15
Regularization: Ridge regression
\min_{a} \| y - Xa \|_2^2 + \lambda \| a \|_2^2

Penalize (or discourage) solutions in which the magnitude of the vector a
is too high. The parameter λ is called the regularization parameter. The
larger the value of λ, the greater the encouragement to find a smooth
solution.

Taking the derivative w.r.t. a and setting it to zero, we get this
regularized solution for a:

(X^T X + \lambda I)\, a = X^T y

Using the SVD X = U S V^T:

(V S^2 V^T + \lambda I)\, a = V S U^T y
V (S^2 + \lambda I) V^T a = V S U^T y
(S^2 + \lambda I) V^T a = S U^T y
V^T a = (S^2 + \lambda I)^{-1} S U^T y, \quad \text{i.e.} \quad a = \sum_{i=1}^{r} \frac{S_{ii} (u_i^T y)}{S_{ii}^2 + \lambda}\, v_i

where r is the rank of X.
16
Regularization: Tikhonov regularization
\min_{a} \| y - Xa \|_2^2 + \lambda \| Ba \|_2^2

(X^T X + \lambda B^T B)\, a = X^T y

Penalize (or discourage) solutions in which the magnitude of the vector Ba is too
high. The parameter λ is called the regularization parameter. The larger the value
of λ, the greater the encouragement to find a smooth solution. The matrix B can be
chosen in different ways – very commonly, one penalizes large values of the
gradient of the vector a, given as follows:

\nabla a = \left( a_1, \; a_2 - a_1, \; a_3 - a_2, \; \ldots, \; a_{n-1} - a_{n-2}, \; a_n - a_{n-1} \right)

In such a case, B is given as follows:

B = \begin{pmatrix}
 1 &  0 &  0 & \cdots &  0 & 0 \\
-1 &  1 &  0 & \cdots &  0 & 0 \\
 0 & -1 &  1 & \cdots &  0 & 0 \\
 \vdots &  & \ddots & \ddots &  & \vdots \\
 0 & \cdots &  0 & -1 &  1 & 0 \\
 0 & \cdots &  0 &  0 & -1 & 1
\end{pmatrix}
17
Total Least Squares
• There are situations when there are errors in
not just the yi values, but in the xi values as
well.
• In such situations, least squares is not
applicable – instead a method called total
least squares is used.
• Total least squares solutions heavily use the
SVD – so this is one more application of the
SVD for you!

18
Total Least Squares
• Consider the equation y ≈ Xa for polynomial
regression.
• We want to find a, in the case where both y
and X have errors (in the earlier case, only y
had errors).
• So we seek to find a new version of y (denoted
y*) close to y, and a new version of X
(denoted X*) close to X, such that y*=X*a.
• Thus we want (see next slide):
19
Find (y*, X*) to minimize
\| y - y* \|_2^2 + \| X - X* \|_F^2 \quad \text{such that} \quad y* = X* a.

Define Z* = (X* \,|\, y*), \quad Z = (X \,|\, y).

Minimize \| Z - Z* \|_F^2 such that Z* \begin{pmatrix} a \\ -1 \end{pmatrix} = 0.

Size of X* is N \times (n+1). Size of Z* is N \times (n+2).

As the vector \begin{pmatrix} a \\ -1 \end{pmatrix} is not all zeros,
the column rank of Z* is less than n+2.

To minimize \| Z - Z* \|_F^2, Z* is the best rank-(n+1) approximation to Z, given by

Z* = \sum_{i=1}^{n+1} \sigma_i u_i v_i^T from the SVD of Z (Eckart-Young theorem).

Also \begin{pmatrix} a \\ -1 \end{pmatrix} \propto v_{n+2}, \quad \text{i.e.} \quad a = -\frac{v_{1:n+1,\,n+2}}{v_{n+2,\,n+2}}.
20
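A sketch of the SVD recipe above for a total least squares line fit, on toy data with noise in both coordinates (the data are hypothetical, and the ones-column of X is handled exactly as the slide does, i.e. stacked into Z along with everything else): form Z = (X | y), take its SVD, and read a off the last right singular vector.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical line y = 2x + 1, with noise added to BOTH coordinates.
x_true = np.linspace(0.0, 10.0, 100)
x = x_true + 0.2 * rng.standard_normal(x_true.size)
y = 2.0 * x_true + 1.0 + 0.2 * rng.standard_normal(x_true.size)

X = np.column_stack([x, np.ones_like(x)])   # N x (n+1)
Z = np.column_stack([X, y])                 # Z = (X | y), N x (n+2)

# SVD of Z; the last right singular vector spans the null space of the
# best rank-(n+1) approximation Z* (Eckart-Young).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
v_last = Vt[-1]                             # v_{n+2}

# (a; -1) is proportional to v_{n+2}, so a = -v_{1:n+1} / v_{n+2, n+2}.
a_tls = -v_last[:-1] / v_last[-1]
print("TLS estimate of (m, c):", a_tls)
```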
Interpolation
• In the case of interpolation, we want the
function being fit to pass through the given
data-points exactly.
• The type of interpolation is determined by
whether it is a function of one or more
variables, as well as the type of function.
• Let us consider lower-order polynomial
interpolation in 1D.

21
Interpolation
• Given a set of N points lying on a curve, we
can perform linear interpolation by simply
joining consecutive points with a line.

22
Interpolation
• The problem with this approach is that the slopes
of adjacent lines change abruptly at each data-
point – called C1 discontinuity.
• This can be resolved by stitching piecewise
quadratic polynomials between consecutive point
pairs and requiring that
 Adjacent polynomials actually pass through the
point (interpolatory condition)
 Adjacent polynomials have the same slope at
that point (C1 continuity condition)
23
Interpolation
• We could go one step further and require that
 Adjacent polynomials have the same second derivative
at that point (C2 continuity condition)
• This requires that the polynomials be piecewise cubic
(degree 3) at least.
• This type of interpolation is called cubic-spline
interpolation and is very popular in graphics and image
processing.
• The data-points (where the continuity conditions are
imposed) are called knots or control points.
• A cubic spline is a piecewise polynomial that is twice
continuously differentiable (including at the knots).
24
Cubic spline interpolation
• A curve is parameterized as y = f(t) where ‘t’ is
the independent variable.
• Consider N points (ti,yi) where i ranges from 1
to N.
• We will have N-1 cubic polynomials, each of
the form yi(t) = ai + bi t + ci t² + di t³ where i
ranges from 1 to N-1.
• Thus the overall function y(t) is equal to yi(t)
where t lies in between ti and ti+1.
25
Cubic spline interpolation
• The total number of unknowns here is 4(N-1)
(why?).
• To determine them, we need to specify 4(N-1)
equations.
• As the curve must pass through the N points,
we get N equations as follows:
y_1 = a_1 + b_1 t_1 + c_1 t_1^2 + d_1 t_1^3
y_2 = a_1 + b_1 t_2 + c_1 t_2^2 + d_1 t_2^3
...
y_i = a_{i-1} + b_{i-1} t_i + c_{i-1} t_i^2 + d_{i-1} t_i^3
...
y_N = a_{N-1} + b_{N-1} t_N + c_{N-1} t_N^2 + d_{N-1} t_N^3
26
Cubic spline interpolation
• The interpolatory conditions at the N-2
interior points yield N-2 equations of the
form:
y_2 = a_1 + b_1 t_2 + c_1 t_2^2 + d_1 t_2^3 = a_2 + b_2 t_2 + c_2 t_2^2 + d_2 t_2^3
...
y_i = a_{i-1} + b_{i-1} t_i + c_{i-1} t_i^2 + d_{i-1} t_i^3 = a_i + b_i t_i + c_i t_i^2 + d_i t_i^3
...
y_{N-1} = a_{N-2} + b_{N-2} t_{N-1} + c_{N-2} t_{N-1}^2 + d_{N-2} t_{N-1}^3 = a_{N-1} + b_{N-1} t_{N-1} + c_{N-1} t_{N-1}^2 + d_{N-1} t_{N-1}^3

27
Cubic spline interpolation
• The C1 continuity conditions at the N-2
interior points yield N-2 equations of the
form:
b_1 + 2 c_1 t_2 + 3 d_1 t_2^2 = b_2 + 2 c_2 t_2 + 3 d_2 t_2^2
...
b_{i-1} + 2 c_{i-1} t_i + 3 d_{i-1} t_i^2 = b_i + 2 c_i t_i + 3 d_i t_i^2
...
b_{N-2} + 2 c_{N-2} t_{N-1} + 3 d_{N-2} t_{N-1}^2 = b_{N-1} + 2 c_{N-1} t_{N-1} + 3 d_{N-1} t_{N-1}^2

28
Cubic spline interpolation
• The C2 continuity conditions at the N-2
interior points yield N-2 equations of the
form:
2 c_1 + 6 d_1 t_2 = 2 c_2 + 6 d_2 t_2
...
2 c_{i-1} + 6 d_{i-1} t_i = 2 c_i + 6 d_i t_i
...
2 c_{N-2} + 6 d_{N-2} t_{N-1} = 2 c_{N-1} + 6 d_{N-1} t_{N-1}

• The total number of equations is N + 3(N-2) =
4N-6, whereas there are 4N-4 unknowns.
29
Cubic spline interpolation
• We need two more equations.
• One possible set of conditions is to impose that the
second derivative of the curve at the two end-points be
zero. This gives two more equations of the form:
2 c_1 + 6 d_1 t_1 = 0
2 c_{N-1} + 6 d_{N-1} t_N = 0
• A cubic spline with these two conditions is called a
natural cubic spline and these conditions are called
natural boundary conditions.
• We now have 4(N-1) linear equations and 4(N-1)
unknowns.
• Solving the linear system gives us the cubic spline!
30
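For reference, a sketch using scipy (assuming scipy is available): scipy.interpolate.CubicSpline with bc_type='natural' imposes exactly these natural boundary conditions. The knots below are made up.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical knots (t_i, y_i) to be interpolated.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.5, 0.5, 3.0, 2.0])

# bc_type='natural' imposes y''(t_1) = 0 and y''(t_N) = 0.
cs = CubicSpline(t, y, bc_type='natural')

print(cs(np.linspace(t[0], t[-1], 9)))   # interpolated values on a finer grid
print(cs(t, 2))                          # second derivatives at the knots; zero at both ends
```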
Cubic spline interpolation
• For the case of N = 3 points, the system of
equations looks as follows:

\begin{pmatrix}
1 & t_1 & t_1^2 & t_1^3 & 0 & 0 & 0 & 0 \\
1 & t_2 & t_2^2 & t_2^3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & t_2 & t_2^2 & t_2^3 \\
0 & 0 & 0 & 0 & 1 & t_3 & t_3^2 & t_3^3 \\
0 & 1 & 2 t_2 & 3 t_2^2 & 0 & -1 & -2 t_2 & -3 t_2^2 \\
0 & 0 & 2 & 6 t_2 & 0 & 0 & -2 & -6 t_2 \\
0 & 0 & 2 & 6 t_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 6 t_3
\end{pmatrix}
\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ y_2 \\ y_3 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}

31
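The same 8×8 system, assembled and solved directly with numpy for the three points (1,10), (2,18) and (3,11) used in the figure on the next slide; this is a sketch of mine, not code from the course.

```python
import numpy as np

# Three knots (t_i, y_i); same points as the spline figure on the next slide.
t1, t2, t3 = 1.0, 2.0, 3.0
y1, y2, y3 = 10.0, 18.0, 11.0

# Unknowns ordered as (a1, b1, c1, d1, a2, b2, c2, d2).
A = np.array([
    [1, t1, t1**2, t1**3, 0, 0,  0,      0       ],   # spline 1 through (t1, y1)
    [1, t2, t2**2, t2**3, 0, 0,  0,      0       ],   # spline 1 through (t2, y2)
    [0, 0,  0,     0,     1, t2, t2**2,  t2**3   ],   # spline 2 through (t2, y2)
    [0, 0,  0,     0,     1, t3, t3**2,  t3**3   ],   # spline 2 through (t3, y3)
    [0, 1,  2*t2,  3*t2**2, 0, -1, -2*t2, -3*t2**2],  # C1 continuity at t2
    [0, 0,  2,     6*t2,  0, 0,  -2,    -6*t2    ],   # C2 continuity at t2
    [0, 0,  2,     6*t1,  0, 0,  0,      0       ],   # natural condition at t1
    [0, 0,  0,     0,     0, 0,  2,      6*t3    ],   # natural condition at t3
])
b = np.array([y1, y2, y2, y3, 0, 0, 0, 0])

coeffs = np.linalg.solve(A, b)
print("a1, b1, c1, d1 =", coeffs[:4])
print("a2, b2, c2, d2 =", coeffs[4:])
```

Solving this system gives the coefficients of the natural cubic spline plotted on the next slide.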
32
[Figure] A cubic-spline fit to three points (1,10), (2,18) and (3,11), marked with red asterisks.

33
References
• Scientific Computing, Michael Heath
• http://en.wikipedia.org/wiki/Total_least_squares

34
