You are on page 1of 6

UNIVERSIDAD DEL PACÍFICO

Departamento Académico de Economía


Matemáticas III (130233) - Sección A
Primer Semestre 2018
Profesor Diego Winkelried

4 | Topics in calculus

4.1 Transformations from Rn to Rm

• Transformation. A transformation from Rn to Rm , denoted as f : Rn → Rm is a generalization of a function:


it is an operation that takes as an input a vector x ∈ Rn and gives as an output a vector y = f (x ) ∈ Rm . By
convention, functions are scalar valued, m = 1.

• Gradient of a function. Consider a function f : Rn → R, that is to say, a transformation taking as input a


vector x = (x 1 ,x 2 , . . . ,x n ) 0 ∈ Rn and rendering a scalar y = f (x ). Upon differentiating,

∂ f (x ) ∂ f (x ) ∂ f (x )
dy = dx 1 + dx 2 + · · · + dx n .
∂x 1 ∂x 2 ∂x n
The differential of y is the sum of the differentials of each argument, x i , weighted by the partial derivatives
∂ f (x )/∂x i . This result can be expressed as the inner product of two vectors,
 dx 1 
∂ f (x ) ∂ f (x ) ∂ f (x )  dx 2 
" #  
dy = ···  ...  → dy = ∇f (x )dx ,
∂x 1 ∂x 2 ∂x n 1×n
 dx n  n×1
 

where dx ∈ Rn is the vector that collects the differentials of the entries of x, and ∇f (x ) is the gradient (by
convention a row vector, of dimension 1 × n), that collects the partial derivatives.

• Directional derivative. Suppose that the function f (x ) is initially located at point x = a. Then, an
infinitesimal change from x to x + ϵu occurs, where ϵ is an arbitrarily small number, and u is the vector
that gives the direction of the change. The change produced in f (x ), in the direction of u, is known as the
directional derivative, denoted as

Du f (a) = ∇f (a)u .

The similarities with the definition of the differential are rather obvious. The directional derivative simply
evaluates the change at dx = u. Typically, it is assumed that ku k = 1, such that u provides indeed the
direction of the change, whereas ∇f (a) gives its magnitude.
Suppose that u = e i , a canonical vector. Then, Du f (a) = ∂ f (x )/∂x i . Thus, the directional derivative
generalizes the notion of a partial derivative, in which the change is taken along one of the coordinate curves,
all other coordinates held constant.

• Derivative matrix (Jacobian matrix). Consider a transformation f : Rn → Rm . Each element of vector y


is the result of evaluating a function fi : Rn → R at x. In other words, yi = fi (x ) for i = 1, 2, . . . ,m. The
transformation y = f (x ) simply collects the results of all m function involved. Thus,

dyi = ∇fi (x )dx .

Derechos reservados c 2018, Diego Winkelried 29


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico
Class Notes 4 - Topics in calculus

Upon stacking the m differentials dyi in a vector dy ∈ Rm , we obtain


 dy1   ∇f 1 (x )   dx 1 
 dy2 ∇f 2 (x )   dx 2 
     
 ...
 =  
..  ...  → dy = D f (x )dx .
.
 
 
 dym  ∇fm (x )   dx n  n×1
   
 m×1 m×n

This expression generalizes the notion of the differential for transformations whose range is Rm . The matrix
D f (x ) links the changes in the n entries of x, to the changes of the m entries of y, and is therefore m × n.
The i-th row of this matrix contains the gradient of function fi (x ). Thus, explicitly
 ∂ f 1 (x ) ∂ f 1 (x ) ∂ f 1 (x ) 
···
 ∂x 1 ∂x 2 ∂x n
 

 ∂ f 2 (x ) ∂ f 2 (x ) ∂ f 2 (x ) 
···
∂x 1 ∂x 2 ∂x n
 
D f (x ) =   .
.. .. .. ..
. . . .
 
 
 ∂ fm (x ) ∂ fm (x ) ∂ fm (x ) 
 
 ∂x 1 ···
∂x 2 ∂x n  m×n

This is the Jacobian matrix, and contains the partial derivatives of all functions fi (x ) (i = 1, 2, . . . ,m arrayed
in rows), with respect to all the inputs x j (j = 1, 2, . . . ,n arrayed in columns).
• Jacobian. The determinant of D f (x ), of course when m = n.
• Composition of transformations. Consider two transformations f : Rn → Rm and д : Rp → Rn .
Transformation д(·) takes a u in Rp and renders a vector x = д(u) from Rn . On the other hand, transformation
f (·) takes a vector x in Rn and gives as a result a vector y = f (x ) in Rm .
We could think of a transformation h : Rp → Rm which is the result of the sequential application of the
transformations д y f : y = h(u) takes a vector u in Rp and renders a y = f (д(u)) de Rm . h is a composite
transformation. Often, the operation f (д(u)) is denoted as f ◦ д(u).
• Jacobian matrix of a composition. We note that dx = Dд(u)du, where Dд(u) is the n × p Jacobian matrix
of д. Moreover, we have that dy = D f (x )dx, where D f (x ) is the m × n Jacobian matrix of f . Then, for the
composite function y = f ◦ д(u)
dy = D f (x )dx = D f (x )[ Dд(u)du ] = D f (x )Dд(u)du ≡ D f ◦ д(u)du .
The m × p Jacobian matrix of the composite transformation is the product of the Jacobian matrices of each of
the transformations composing f ◦ д.
• Chain rule. The (i, j)-th entry of D f ◦ д(u) is the inner product between the i-th row of D f (x ) and the j-th
column of Dд(u). Hence,
∂ fi (x ) ∂д1 (u) ∂ fi (u) ∂д2 (u) ∂ fi (x ) ∂дn (u)
[D f ◦ д(u)]i j = + +...+ ,
∂x 1 ∂u j ∂x 2 ∂u j ∂x n ∂u j
for i = 1, 2 . . . ,m and j = 1, 2, . . . ,p.
Recall that yi = fi (x ) and x s = дs (u). Then, we can write
∂yi ∂x 1 ∂yi ∂x 2 ∂yi ∂x n ∂yi
[D f ◦ д(u)]i j = + +...+ ≡ ,
∂x 1 ∂u j ∂x 2 ∂u j ∂x n ∂u j ∂u j
which reveals a generalization of the chain rule: how yi varies with a change in u j , after taking into account
the composition process f ◦ д.

Derechos reservados c 2018, Diego Winkelried 30


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico
Class Notes 4 - Topics in calculus

4.2 Implicit functions

• Consider two vector y ∈ Rm and x ∈ Rn and a transformation F : Rm+n → Rm such that, for a given point
(x̄, ȳ), we have that
F (x̄, ȳ) = 0 .
This equality defines an implicit function. In particular, the goal is to verify if there exists a transformation
f : Rn → Rm , such that ȳ = f (x̄ ) and
F (x̄, f (x̄ )) = 0 .

• Implicit function theorem. Upon differentiating the equality F (x̄, ȳ) = 0 we obtain, using partitions,
f g " dx #
DF x (x̄, ȳ) DFy (x̄, ȳ) =0 → DF x (x̄, ȳ)dx + DFy (x̄, ȳ)dy = 0 ,
dy
where DF x (·) is the m×n Jacobian matrix that collects the partial derivatives of F (·) with respect to x, whereas
DFy (·) is the m × m Jacobian matrix that collects the partial derivatives of F (·) with respect to y. Solving for
dy,
dy = −DFy (x̄, ȳ) −1 DF x (x̄, ȳ)dx ,
which established a relationship between dy and dx.
dy measures the change in y for a change in x in other to keep the equality F (x̄, ȳ) = 0 and, in doing so,
defines y as a function of x, around (x̄, ȳ).
• Explicit functions. Consider the “explicit function” approach y = f (x ). Upon differentiating, by the chain
rule,
dF (x, f (x )) = DF x (x,y)dx + DFy (x,y)D f (x )dx .
Provided that dF (x̄, f (x̄ )) = 0, we have
D f (x̄ )dx = −DFy (x̄, ȳ) −1 DF x (x̄, ȳ)dx → D f (x̄ ) = −DFy (x̄, ȳ) −1 DF x (x̄, ȳ) .

• Explicit functions, again. In the case F (x,y) = f (x ) − y, we observe that DFy (x,y) = −I m and therefore
D f (x ) = DF x (x,y).
• Existence. From our previous analysis, we can conclude that an implicit function exists around point (x̄, ȳ)
if DFy (x̄, ȳ) is a nonsingular matrix. Hence, if the Jacobian satisfies
| DFy (x̄, ȳ) | , 0 .

4.3 Taylor polynomials and Taylor series

• Basics. The purpose is to derive polynomial approximations of a continuous function f : R → R, around


(i.e., in the neighborhood of ) a point a.
To this end, consider the polynomial
pn (x ) = c 0 + c 1 (x − a) + c 2 (x − a) 2 + c 3 (x − a) 3 + · · · + c n (x − a) n ,
in x of degree n.
Also, denote the i-th derivative of f (x ) with respect to x as
d i f (x ) d i f (x )
f (i ) (x ) = and f (i ) (a) = .
d xi d x i

x =a

Derechos reservados c 2018, Diego Winkelried 31


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico
Class Notes 4 - Topics in calculus

• Linear approximation. From the definition of the derivative, we know that the linear function (i.e., a
polynomial of degree 1) p1 (x ) = c 0 + c 1 (x − a) that best approximates function f (x ) around x = a, is
the tangent curve that passes through f (a). At the tangency point, p1 (a) = f (a) and p1(1) (a) = f (1) (a). Then,

p1 (x ) = c 0 + c 1 (x − a) → p1 (a) = c 0 = f (a) → c 0 = f (a)


p1(1) (x ) = c1 → p1(1) (a) = c1 = f (1) (a) → c 1 = f (1) (a) .

Upon replacing these findings on the definition of p1 (x ), we obtain the equation of the tangent curve

p1 (x ) = f (a) + f (1) (a)(x − a) .

• Quadratic approximation. Let us find now the quadratic function p2 (x ) = c 0 +c 1 (x −a) +c 2 (x −a) 2 that best
approximates f (x ) around x = a. At a we have that p1 (a) = f (a), p1(1) (a) = f (1) (a) and p1(2) (a) = f (2) (a).
That is, the value, slope and curvature of both f (x ) and p2 (x ) coincide at x = a. Then,

p2 (x ) = c 0 + c 1 (x − a) + c 2 (x − a) 2 → p2 (a) = c 0 = f (a) → c 0 = f (a)


p2(1) (x ) = c 1 + 2c 2 (x − a) → p2(1) (a) = c 1 = f (1) (a) → c 1 = f (1) (a)
p2(2) (x ) = 2c 2 → p2(2) (a) = 2c 2 = f (2) (a) → c 2 = 12 f (2) (a) .

Upon replacing these findings on the definition of p2 (x ), we obtain

f (2) (a)
p2 (x ) = f (a) + f (1) (a)(x − a) + (x − a) 2 .
2
Note that p2 (x ) equals the linear approximation p1 (x ) plus the quadratic term 21 f (2) (a)(x − a) 2 .

• Cubic approximation. Now we get the cubic polynomial p3 (x ) = c 0 + c 1 (x − a) + c 2 (x − a) 2 + c 3 (x − a) 2


that best approximates f (x ) around x = a. We now require that at point a the value of p3 (x ) and its first three
derivatives to be the same as those of f (x ). Thus,

p3 (x ) = c 0 + c 1 (x − a) + c 2 (x − a) 2 + c 3 (x − a) n → p3 (a) = c 0 = f (a) → c 0 = f (a)


p3(1) (x ) = c 1 + 2c 2 (x − a) + 3c 3 (x − a) 2 → pn(1) (a) = c1 = f (1) (a) → c 1 = f (1) (a)
p3(2) (x ) = 2c 2 + (3 · 2)c 3 (x − a) → p3(2) (a) = 2c 2 = f (2) (a) → c 2 = 12 f (2) (a)
p3(3) (x ) = (3 · 2)c 3 (x − a) → pn(3) (a) = 3!c 3 = f (3) (a) → c 3 = 1 (3)
3! f (a) .

Upon replacing these findings on the definition of p3 (x ), we obtain

f (2) (a) f (3) (a)


p3 (x ) = f (a) + f (1) (a)(x − a) + (x − a) 2 + (x − a) 3 .
2! 3!
1 (3)
Note that p3 (x ) equals the quadratic approximation p2 (x ) plus the cubic term 3! f (a)(x − a) 3 .

• General case. Consider now a polynomial of degree n. To determine its n + 1 coefficients, we impose the
n + 1 restrictions pn(k ) (a) = f (k ) (a) for k = 0, 1, 2, . . . ,n. That is to say, at point a the value of f (x ) and pn (x )
coincides, as well as their first n derivatives. Thus,

pn (x ) = c 0 + c 1 (x − a) + c 2 (x − a) 2 + · · · + c n (x − a) n → pn (a) = c 0
pn(1) (x ) = c 1 + 2c 2 (x − a) + 3c 3 (x − a) 2 + · · · + nc n (x − a) n−1 → pn(1) (a) = c 1
pn(2) (x ) = 2c 2 + (3 · 2)c 3 (x − a) + (4 · 3)c 4 (x − a) 2 + · · · + n(n − 1)c n (x − a) n−2 → pn(2) (a) = 2c 2
pn(3) (x ) = (3 · 2)c 3 (x − a) + (4 · 3 · 2)c 4 (x − a) + · · · + n(n − 1)(n − 2)c n (x − a) n−3 → pn(3) (a) = 3!c 3 .
..
.
pn(n) (x ) = n · (n − 1) · (n − 2) · · · 4 · 3 · 2c n → pn(n) (a) = n!c n

Derechos reservados c 2018, Diego Winkelried 32


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico
Class Notes 4 - Topics in calculus

from which we deduce pn(k ) (a) = k!c k for k = 0, 1, 2, . . . ,n (recall that 0! = 1).
Then, from the equality pn(k ) (a) = f (k ) (a) we obtain that c k = f (k ) (a)/!k, rendering the so-called Taylor
polynomial of degree n
f (a) f (1) (a) f (2) (a) f (k ) (a) f (n) (a)
pn (x ) = + (x − a) + (x − a) 2 + · · · + (x − a) k + · · · + (x − a) n .
0! 1! 2! k! n!
Note that
f (n) (a)
pn (x ) = pn−1 (x ) + (x − a) n .
n!
Thus, Taylor polynomials are formed by sequentially adding terms to the approximation in order to match
higher-order derivatives of the original function f (x ). For this reason, we can expect the degree of accuracy
of the approximation (in the neighborhood of a) to be increasing in n.
• Remainder (Optional). When approximating a function by a Taylor polynomial of degree n, we generate an
approximation error, a remainder, which is equal to
Rn (x ) = f (x ) − pn (x ) .
An explicit, useful form for Rn (x ) can be found through a simple application of the Mean Value Theorem.
This important theorem says that if we have a function д(x ) that is continuous on the interval [a,b], then there
exists a value (the mean value) c ∈ [a,b] such that
д(b) = д(a) + д 0 (c)(b − a) .
Note that the theorem does not refer to an approximation, it involves an equality (with no error). The theorem
states that c exists, it does not tell us how much it is, only that it belongs to [a,b].
Back to the remainder of the Taylor approximation, we then have that
f (n+1) (z)
Rn (x ) = (x − a) n+1 ,
(n + 1)!
where z lies between x and a. This looks quite similar to the extra term we would add when we move from
pn (x ) to pn+1 (x ). The important difference is that in the definition of Rn (x ), f (n+1) (·) is evaluated at z, whereas
it is evaluted at a in pn+1 (x ) − pn (x ).
This expression for Rn (x ) is known as the Lagrange remainder.
• Bounding the approximation error (Optional). We can use the Lagrange remainder to compute the
magnitude of the maximum error (in absolute value) that the Taylor polynominal pn (x ), centered at a,
produces when approximating f (x ).
Let Mn = maxz ∈[x,a] | f (n+1) (z) |. That is, Mn is the maximum absolute value of f (n+1) (·) on the interval
between x and a. Then,
f (n+1) (z) |x − a|n+1
(x − a) n+1 ≤ Mn

|Rn (x )| = = Error bound .
(n + 1)! (n + 1)!
• Taylor’s theorem. Suppose that for all n, Mn ≤ M which is a finite number. Then,
|x − a|n+1
lim |Rn (x )| ≤ M lim = 0.
n→∞ n→∞ (n + 1)!
This happens because the factorial function grows faster than the power function.
The implication of Taylor’s theorem is that since Rn (x ) → 0 as n → ∞, then
lim pn (x ) = lim f (x ) − Rn (x ) = f (x ) − lim Rn (x ) = f (x ) .
 
n→∞ n→∞ n→∞

As n increases, our approximation becomes progressively more accurate to the extent that the Taylor
polynominal converges to f (x ).

Derechos reservados c 2018, Diego Winkelried 33


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico
Class Notes 4 - Topics in calculus

• Taylor series. Also known as the power series representation of f (x ):

f (k ) (a)

(x − a) k .
X
f (x ) = lim pn (x ) =
n→∞ k!
k =0

This is a Taylor polynomial of infinite degree.

• Ratio test. Consider an infinite series



X
S= sk .
k =0

The ratio tests says that this infinite sum converges to a finite value (that we actually do not always know) if
sn+1
lim = L < 1.
n→∞ s n

• Range of convergence. For a Taylor series, sk = f (k ) (a)(x − a) k /k!. Then, taking the limit of the ratio test,
sn+1 f (n+1) (a) (x − a) n+1 n! f (n+1) (a) (x − a) n+1 1
L = lim = lim (n) · · = |x − a| lim · · .
n→∞ s n n→∞ f (a) (x − a) n (n + 1)! n→∞ f (n) (a) (x − a) n n + 1

This will be less than one if


f (n) (a)
|x − a| < lim (n+1) (n + 1) ≡ RC .
n→∞ f (a)
The quantity RC is the range of convergence. This means that Taylor’s theorem holds (so pn (x ) converges to
f (x ) as n increases) for all values of x such that |x − a| < RC.
The interval ICa = (RC − a, RC + a) is the interval of convergence. Taylor’s theorem holds if x ∈ ICa .

• McLaurin polynominals and series. Taylor polynomials and series for a = 0.

• Multivariate functions (Optional). If f : Rn → R, the second order Taylor polynominal of f (x ) around


point a ∈ Rn is
1
p2 (x ) = f (a) + ∇f (a)(x − a) + (x − a) 0H (a)(x − a) ,
2
where ∇f (a) is the gradient of f evaluated at point a, and H (a) is the Hessian (i.e., the matrix of second
order partial derivatives) also evaluated at point a.
This is a straightforward generalization of the scalar case. The linear term comes from the inner product of
two vectors, whereas the quadratic term comes from a quadratic form.

Derechos reservados c 2018, Diego Winkelried 34


Prohibida su reproducción y distribución fuera de la Universidad del Pacífico

You might also like