
Regression and Interpolation

CS 740
Ajit Rajwade

Problem statements
• Consider a set of N points {(xi,yi)}, 1 ≤ i ≤ N.
• Suppose we know that these points actually
lie on a function of the form y = f(x;a) where
f(.) represents a function family and a
represents a set of parameters.
• For example: f(x) is a linear function of x, i.e.
of the form y = f(x) = mx+c. In this case, a =
(m,c).

2
Problem statements
• Example 2: f(x) is a quadratic function of x, i.e. of the
form y = f(x) = px² + qx + r. In this case, a = (p,q,r).

• Example 3: f(x) is a trigonometric function of x, i.e. of
the form f(x) = p sin(qx+r). In this case, a = (p,q,r).

• In each case, we assume knowledge of the function
family. But we do not know the function parameters,
and would like to estimate them from {(xi,yi)}.

• This is the problem of fitting a function (of known
family) to a set of points.
3
Problem statements
• In function regression (or approximation), we
want to find a such that for all i, f(xi;a)≈yi.

• In function interpolation, we want to fit some
function such that f(xi;a)=yi.

4
Polynomial Regression
• A polynomial of degree n is a function of the
form:
y = \sum_{i=0}^{n} a_i x^i = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n

• Polynomial regression is the task of fitting a
polynomial function to a set of points.

5
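As a small illustration (mine, not from the slides), the formula above can be evaluated with numpy; note that np.polyval expects coefficients ordered from the highest degree down, while the slide's vector a = (a0, a1, ..., an) is ordered from the lowest degree up. The coefficients used here are made up.

```python
import numpy as np

# Hypothetical degree-2 polynomial y = 1 + 2x + 3x^2,
# i.e. a = (a0, a1, a2) = (1, 2, 3) in the slide's lowest-degree-first ordering.
a = np.array([1.0, 2.0, 3.0])

x = np.array([0.0, 1.0, 2.0])
# np.polyval wants highest-degree-first, so reverse a before evaluating.
y = np.polyval(a[::-1], x)
print(y)   # [ 1.  6. 17.]
```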
Polynomial regression
• Let us assume that x is the independent
variable, and y is the dependent variable.
• In the point set {(xi,yi)} containing N points, we
will assume that the x-coordinates of the
points are available accurately, whereas the y-
coordinates are affected by measurement
error – called noise.

6
7
Least Squares Polynomial regression
• If we assume the polynomial is linear, we have
yi = mxi + c + ei, where ei is the noise in yi. We
want to estimate m and c.
• We will do so by minimizing the following
w.r.t. m and c:
\sum_{i=1}^{N} \left( y_i - m x_i - c \right)^2

• This is called least squares regression.


8
Least Squares Polynomial regression
• In matrix form, we can write the following:

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} =
\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix}
\begin{pmatrix} m \\ c \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}

y \approx Xa

a = (X^T X)^{-1} X^T y = X^{\dagger} y \quad \text{(pseudo-inverse of X)}

The solution will exist provided there are at least two points with
distinct x-coordinates. Basically, X^T X must be non-singular.
9
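A minimal numpy sketch of the line fit above (my own illustration with made-up data): build X with a column of x-values and a column of ones, solve the normal equations, and cross-check against np.linalg.lstsq, which solves the same least squares problem via a more numerically stable factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data from y = 2x + 1 + noise (noise only in y).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + 0.3 * rng.standard_normal(x.size)

# Design matrix: one row [x_i, 1] per point.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: a = (X^T X)^{-1} X^T y.
a_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit via lstsq, which avoids forming X^T X explicitly.
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("m, c from normal equations:", a_normal)
print("m, c from lstsq           :", a_lstsq)
```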
Least Squares Polynomial regression
• For higher order polynomials, we have:

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} =
\begin{pmatrix} x_1^n & \cdots & x_1^2 & x_1 & 1 \\ x_2^n & \cdots & x_2^2 & x_2 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ x_N^n & \cdots & x_N^2 & x_N & 1 \end{pmatrix}
\begin{pmatrix} a_n \\ \vdots \\ a_2 \\ a_1 \\ a_0 \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}

y \approx Xa

a = (X^T X)^{-1} X^T y = X^{\dagger} y

The solution will exist provided there are at least n+1 points with
distinct x-coordinates. Basically, X^T X must be non-singular.
10
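The same recipe for a general degree-n polynomial, again a sketch with hypothetical data: np.vander builds exactly the matrix X shown above (powers x^n down to x^0), so the recovered coefficients come out ordered a_n, ..., a_1, a_0 as in the slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data from a cubic, with additive noise in y only.
x = np.linspace(-1.0, 1.0, 40)
y = 0.5 - 2.0 * x + 3.0 * x**3 + 0.05 * rng.standard_normal(x.size)

n = 3                                   # polynomial degree
X = np.vander(x, n + 1)                 # columns: x^n, ..., x^2, x, 1
a, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated (a_n, ..., a_0):", a)
# np.polyfit performs the same least squares fit.
print("np.polyfit check         :", np.polyfit(x, y, n))
```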
Least Squares Polynomial regression
• Regardless of the order of the polynomial, we
are dealing with a linear form of regression –
because in each case, we are solving an
equation of the form y ≈ Xa.

• There are some assumptions made on the
errors in the measurement of y. We will not
go into those details.

11
Least Squares Polynomial Regression:
Geometric Interpretation
• The least squares solution seeks the set of polynomial
coefficients that minimizes the sum of squared differences
between the measured y coordinate of each point and the y
coordinate predicted by the assumed polynomial model, i.e.

\min_{\{a_j\}} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2

\equiv \; \min_{a} \| y - Xa \|_2^2
12
Least Squares Polynomial Regression:
Geometric Interpretation
• The least squares solution seeks the set of polynomial
coefficients that minimizes the sum of squared differences
between the measured y coordinate of each point and the y
coordinate predicted by the assumed polynomial model, i.e. we
are trying to find a model that minimizes vertical distances
(distances along the Y axis).

13
Weighted least squares: Row weighting
\min_{\{a_j\}} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2 \;\equiv\; \min_{a} \| y - Xa \|_2^2

Different instances of y get different weights (the higher the
weight, the more importance is given to that instance!).

\min_{\{a_j\}} \sum_{i=1}^{N} w_i \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2 \;\equiv\; \min_{a} \| W (y - Xa) \|_2^2,
\quad W = \mathrm{diag}(w_1, w_2, \ldots, w_N)

Wy \approx WXa

a = \left( (WX)^T (WX) \right)^{-1} (WX)^T W y
14
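A short sketch of the row-weighted solve above (data and weights are hypothetical): multiply y and X by W = diag(w_1, ..., w_N) and solve the resulting least squares problem, as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a line, with the last few y-values much noisier.
x = np.linspace(0.0, 5.0, 30)
y = 4.0 * x - 2.0 + 0.1 * rng.standard_normal(x.size)
y[-5:] += 3.0 * rng.standard_normal(5)

X = np.column_stack([x, np.ones_like(x)])

# Down-weight the unreliable rows.
w = np.ones(x.size)
w[-5:] = 0.1
W = np.diag(w)

# Solve Wy ~= WXa in the least squares sense:
# a = ((WX)^T (WX))^{-1} (WX)^T Wy.
WX, Wy = W @ X, W @ y
a = np.linalg.solve(WX.T @ WX, WX.T @ Wy)
print("weighted fit (m, c):", a)
```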
Regularization
• In some cases, the system of equations may be ill-
conditioned or even singular.
• In such cases, the solution may be non-unique or
very sensitive to noise.
• It is common practice to choose a “simple” or
“smooth” solution.
• This practice of choosing a “smooth” solution is
called regularization.
• Regularization is a common method used in
several problems in machine learning, computer
vision and statistics.
15
Regularization: Ridge regression
\min_{a} \| y - Xa \|_2^2 + \lambda \| a \|_2^2

Penalize (or discourage) solutions in which the magnitude of the vector a
is too high. The parameter λ is called the regularization parameter. The
larger the value of λ, the greater the encouragement to find a smooth
solution.

Taking the derivative w.r.t. a and setting it to zero, we get this
regularized solution for a:

(X^T X + \lambda I)\, a = X^T y

Using the SVD X = U S V^T:

(V S^2 V^T + \lambda I)\, a = V S U^T y
V (S^2 + \lambda I) V^T a = V S U^T y
(S^2 + \lambda I) V^T a = S U^T y
V^T a = (S^2 + \lambda I)^{-1} S U^T y, \quad \text{i.e.} \quad a = \sum_{i=1}^{r} \frac{S_{ii} (u_i^T y)}{S_{ii}^2 + \lambda}\, v_i

where r is the rank of X.
16
Regularization: Tikhonov regularization
\min_{a} \| y - Xa \|_2^2 + \lambda \| Ba \|_2^2

(X^T X + \lambda B^T B)\, a = X^T y

Penalize (or discourage) solutions in which the magnitude of the vector Ba is too
high. The parameter λ is called the regularization parameter. The larger the value
of λ, the greater the encouragement to find a smooth solution. The matrix B can be
chosen in different ways – very commonly, one penalizes large values of the
gradient of the vector a, given as follows:

\nabla a = \left( a_1, \; a_2 - a_1, \; a_3 - a_2, \; \ldots, \; a_{n-1} - a_{n-2}, \; a_n - a_{n-1} \right)

In such a case, B is given as follows:

B = \begin{pmatrix}
 1 &  0 &  0 & \cdots &  0 & 0 \\
-1 &  1 &  0 & \cdots &  0 & 0 \\
 0 & -1 &  1 & \cdots &  0 & 0 \\
 \vdots &  & \ddots & \ddots &  & \vdots \\
 0 & \cdots &  0 & -1 &  1 & 0 \\
 0 & \cdots &  0 &  0 & -1 & 1
\end{pmatrix}
17
Total Least Squares
• There are situations when there are errors in
not just the yi values, but in the xi values as
well.
• In such situations, least squares is not
applicable – instead a method called total
least squares is used.
• Total least squares solutions heavily use the
SVD – so this is one more application of the
SVD for you!

18
Total Least Squares
• Consider the equation y ≈ Xa for polynomial
regression.
• We want to find a, in the case where both y
and X have errors (in the earlier case, only y
had errors).
• So we seek to find a new version of y (denoted
y*) close to y, and a new version of X
(denoted X*) close to X, such that y*=X*a.
• Thus we want (see next slide):
19
Find (y*, X*) to minimize
\| y - y* \|_2^2 + \| X - X* \|_F^2 \quad \text{such that} \quad y* = X* a.

Define Z* = (X* \,|\, y*), \quad Z = (X \,|\, y).

Minimize \| Z - Z* \|_F^2 such that Z* \begin{pmatrix} a \\ -1 \end{pmatrix} = 0.

Size of X* is N \times (n+1). Size of Z* is N \times (n+2).

As the vector \begin{pmatrix} a \\ -1 \end{pmatrix} is not all zeros,
the column rank of Z* is less than n+2.

To minimize \| Z - Z* \|_F^2, Z* is the best rank-(n+1) approximation to Z, given by

Z* = \sum_{i=1}^{n+1} \sigma_i u_i v_i^T from the SVD of Z (Eckart-Young theorem).

Also \begin{pmatrix} a \\ -1 \end{pmatrix} \propto v_{n+2}, \quad \text{i.e.} \quad a = -\frac{v_{1:n+1,\,n+2}}{v_{n+2,\,n+2}}.
20
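A sketch of the SVD recipe above for a total least squares line fit, on toy data with noise in both coordinates (the data are hypothetical, and the ones-column of X is handled exactly as the slide does, i.e. stacked into Z along with everything else): form Z = (X | y), take its SVD, and read a off the last right singular vector.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical line y = 2x + 1, with noise added to BOTH coordinates.
x_true = np.linspace(0.0, 10.0, 100)
x = x_true + 0.2 * rng.standard_normal(x_true.size)
y = 2.0 * x_true + 1.0 + 0.2 * rng.standard_normal(x_true.size)

X = np.column_stack([x, np.ones_like(x)])   # N x (n+1)
Z = np.column_stack([X, y])                 # Z = (X | y), N x (n+2)

# SVD of Z; the last right singular vector spans the null space of the
# best rank-(n+1) approximation Z* (Eckart-Young).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
v_last = Vt[-1]                             # v_{n+2}

# (a; -1) is proportional to v_{n+2}, so a = -v_{1:n+1} / v_{n+2, n+2}.
a_tls = -v_last[:-1] / v_last[-1]
print("TLS estimate of (m, c):", a_tls)
```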
Interpolation
• In the case of interpolation, we want the
function being fit to pass through the given
data-points exactly.
• The type of interpolation is determined by
whether it is a function of one or more
variables, as well as the type of function.
• Let us consider lower-order polynomial
interpolation in 1D.

21
Interpolation
• Given a set of N points lying on a curve, we
can perform linear interpolation by simply
joining consecutive points with a line.

22
Interpolation
• The problem with this approach is that the slopes
of adjacent lines change abruptly at each data-
point – called C1 discontinuity.
• This can be resolved by stitching piecewise
quadratic polynomials between consecutive point
pairs and requiring that
 Adjacent polynomials actually pass through the
point (interpolatory condition)
 Adjacent polynomials have the same slope at
that point (C1 continuity condition)
23
Interpolation
• We could go one step further and require that
 Adjacent polynomials have the same second derivative
at that point (C2 continuity condition)
• This requires that the polynomials be piecewise cubic
(degree 3) at least.
• This type of interpolation is called cubic-spline
interpolation and is very popular in graphics and image
processing.
• The data-points (where the continuity conditions are
imposed) are called knots or control points.
• A cubic spline is a piecewise polynomial that is twice
continuously differentiable (including at the knots).
24
Cubic spline interpolation
• A curve is parameterized as y = f(t) where ‘t’ is
the independent variable.
• Consider N points (ti,yi) where i ranges from 1
to N.
• We will have N-1 cubic polynomials, each of
the form yi(t) = ai + bi t + ci t² + di t³ where i
ranges from 1 to N-1.
• Thus the overall function y(t) is equal to yi(t)
where t lies in between ti and ti+1.
25
Cubic spline interpolation
• The total number of unknowns here is 4(N-1)
(why?).
• To determine them, we need to specify 4(N-1)
equations.
• As the curve must pass through the N points,
we get N equations as follows:
y_1 = a_1 + b_1 t_1 + c_1 t_1^2 + d_1 t_1^3
y_2 = a_1 + b_1 t_2 + c_1 t_2^2 + d_1 t_2^3
...
y_i = a_{i-1} + b_{i-1} t_i + c_{i-1} t_i^2 + d_{i-1} t_i^3
...
y_N = a_{N-1} + b_{N-1} t_N + c_{N-1} t_N^2 + d_{N-1} t_N^3
26
Cubic spline interpolation
• The interpolatory conditions at the N-2
interior points yield N-2 equations of the
form:
y_2 = a_1 + b_1 t_2 + c_1 t_2^2 + d_1 t_2^3 = a_2 + b_2 t_2 + c_2 t_2^2 + d_2 t_2^3
...
y_i = a_{i-1} + b_{i-1} t_i + c_{i-1} t_i^2 + d_{i-1} t_i^3 = a_i + b_i t_i + c_i t_i^2 + d_i t_i^3
...
y_{N-1} = a_{N-2} + b_{N-2} t_{N-1} + c_{N-2} t_{N-1}^2 + d_{N-2} t_{N-1}^3 = a_{N-1} + b_{N-1} t_{N-1} + c_{N-1} t_{N-1}^2 + d_{N-1} t_{N-1}^3

27
Cubic spline interpolation
• The C1 continuity conditions at the N-2
interior points yield N-2 equations of the
form:
b_1 + 2 c_1 t_2 + 3 d_1 t_2^2 = b_2 + 2 c_2 t_2 + 3 d_2 t_2^2
...
b_{i-1} + 2 c_{i-1} t_i + 3 d_{i-1} t_i^2 = b_i + 2 c_i t_i + 3 d_i t_i^2
...
b_{N-2} + 2 c_{N-2} t_{N-1} + 3 d_{N-2} t_{N-1}^2 = b_{N-1} + 2 c_{N-1} t_{N-1} + 3 d_{N-1} t_{N-1}^2

28
Cubic spline interpolation
• The C2 continuity conditions at the N-2
interior points yield N-2 equations of the
form:
2 c_1 + 6 d_1 t_2 = 2 c_2 + 6 d_2 t_2
...
2 c_{i-1} + 6 d_{i-1} t_i = 2 c_i + 6 d_i t_i
...
2 c_{N-2} + 6 d_{N-2} t_{N-1} = 2 c_{N-1} + 6 d_{N-1} t_{N-1}

• The total number of equations is N + 3(N-2) =
4N-6, whereas there are 4N-4 unknowns.
29
Cubic spline interpolation
• We need two more equations.
• One possible set of conditions is to impose that the
second derivative of the curve at the two end-points be
zero. This gives two more equations of the form:
2 c_1 + 6 d_1 t_1 = 0
2 c_{N-1} + 6 d_{N-1} t_N = 0
• A cubic spline with these two conditions is called a
natural cubic spline and these conditions are called
natural boundary conditions.
• We now have 4(N-1) linear equations and 4(N-1)
unknowns.
• Solving the linear system gives us the cubic spline!
30
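For reference, a sketch using scipy (assuming scipy is available): scipy.interpolate.CubicSpline with bc_type='natural' imposes exactly these natural boundary conditions. The knots below are made up.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical knots (t_i, y_i) to be interpolated.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.5, 0.5, 3.0, 2.0])

# bc_type='natural' imposes y''(t_1) = 0 and y''(t_N) = 0.
cs = CubicSpline(t, y, bc_type='natural')

print(cs(np.linspace(t[0], t[-1], 9)))   # interpolated values on a finer grid
print(cs(t, 2))                          # second derivatives at the knots; zero at both ends
```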
Cubic spline interpolation
• For the case of N = 3 points, the system of
equations looks as follows:

\begin{pmatrix}
1 & t_1 & t_1^2 & t_1^3 & 0 & 0 & 0 & 0 \\
1 & t_2 & t_2^2 & t_2^3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & t_2 & t_2^2 & t_2^3 \\
0 & 0 & 0 & 0 & 1 & t_3 & t_3^2 & t_3^3 \\
0 & 1 & 2 t_2 & 3 t_2^2 & 0 & -1 & -2 t_2 & -3 t_2^2 \\
0 & 0 & 2 & 6 t_2 & 0 & 0 & -2 & -6 t_2 \\
0 & 0 & 2 & 6 t_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 6 t_3
\end{pmatrix}
\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ y_2 \\ y_3 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}

31
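The same 8×8 system, assembled and solved directly with numpy for the three points (1,10), (2,18) and (3,11) used in the figure on the next slide; this is a sketch of mine, not code from the course.

```python
import numpy as np

# Three knots (t_i, y_i); same points as the spline figure on the next slide.
t1, t2, t3 = 1.0, 2.0, 3.0
y1, y2, y3 = 10.0, 18.0, 11.0

# Unknowns ordered as (a1, b1, c1, d1, a2, b2, c2, d2).
A = np.array([
    [1, t1, t1**2, t1**3, 0, 0,  0,      0       ],   # spline 1 through (t1, y1)
    [1, t2, t2**2, t2**3, 0, 0,  0,      0       ],   # spline 1 through (t2, y2)
    [0, 0,  0,     0,     1, t2, t2**2,  t2**3   ],   # spline 2 through (t2, y2)
    [0, 0,  0,     0,     1, t3, t3**2,  t3**3   ],   # spline 2 through (t3, y3)
    [0, 1,  2*t2,  3*t2**2, 0, -1, -2*t2, -3*t2**2],  # C1 continuity at t2
    [0, 0,  2,     6*t2,  0, 0,  -2,    -6*t2    ],   # C2 continuity at t2
    [0, 0,  2,     6*t1,  0, 0,  0,      0       ],   # natural condition at t1
    [0, 0,  0,     0,     0, 0,  2,      6*t3    ],   # natural condition at t3
])
b = np.array([y1, y2, y2, y3, 0, 0, 0, 0])

coeffs = np.linalg.solve(A, b)
print("a1, b1, c1, d1 =", coeffs[:4])
print("a2, b2, c2, d2 =", coeffs[4:])
```

Solving this system gives the coefficients of the natural cubic spline plotted on the next slide.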
32
[Figure] A cubic-spline fit to three points (1,10), (2,18) and (3,11), marked with red asterisks.

33
References
• Scientific Computing, Michael Heath
• http://en.wikipedia.org/wiki/Total_least_squares

34
