
Russian edition: Kurs vysshei matematiki dlya inzhenerov (A Course of Higher Mathematics for Engineers), by M. Krasnov, A. Kiselev, G. Makarenko, E. Shikin. In two volumes. Volume 1.
Mathematical Analysis for Engineers
M. Krasnov, A. Kiselev, G. Makarenko, E. Shikin
In two volumes. Volume 1
Mir Publishers, Moscow
Translated from Russian
by Alexander Yastrebov

First published 1990

In English

Printed in the Union of Soviet Socialist Republics

ISBN 5-03-000270-7 © Mir Publishers, 1989


ISBN 5-03-000269-3
Contents

Preface 9
Chapter 1 An Introduction to Analytic Geometry 11
1.1 Cartesian Coordinates 11
1.2 Elementary Problems of Analytic Geometry 14
1.3 Polar Coordinates 18
1.4 Second- and Third-Order Determinants 19
Chapter 2 Elements of Vector Algebra 24
2.1 Fixed Vectors and Free Vectors 24
2.2 Linear Operations on Vectors 26
2.3 Coordinates and Components of a Vector 30
2.4 Projection of a Vector onto an Axis 33
2.5 Scalar Product of Two Vectors 34
2.6 Vector Product of Two Vectors 39
2.7 Mixed Products of Three Vectors 43
Exercises 45
Answers 46
Chapter 3 The Line and the Plane 47
3.1 The Plane 47
3.2 Straight Line in a Plane 51
3.3 Straight Line in Three-Dimensional Space 55
Exercises 60
Answers 62
Chapter 4 Curves and Surfaces of the Second Order 63
4.1 Changing the Axes of Coordinates in a Plane 63
4.2 Curves of the Second Order 66
4.3 The Ellipse 67
4.4 The Hyperbola 71
4.5 The Parabola 77
4.6 Optical Properties of Curves of the Second Order 79
4.7 Classification of Curves of the Second Order 83
4.8 Surfaces of the Second Order 89
4.9 Classification of Surfaces 90
4.10 Standard Equations of Surfaces of the Second Order 95
Exercises 102
Answers 102

Chapter 5 Matrices. Determinants. Systems of Linear Equations 103


5.1 Matrices 103
5.2 Determinants 122
5.3 Inverse Matrices 133
5.4 Rank of a Matrix 139
5.5 Systems of Linear Equations 143
Exercises 165
Answers 167
Chapter 6 Linear Spaces and Linear Operators 168
6.1 The Concept of Linear Space 168
6.2 Linear Subspaces 170
6.3 Linearly Dependent Vectors 174
6.4 Basis and Dimension 175
6.5 Changing a Basis 181
6.6 Euclidean Spaces 183
6.7 Orthogonalization 185
6.8 Orthocomplements of Linear Subspaces 189
6.9 Unitary Spaces 191
6.10 Linear Mappings 192
6.11 Linear Operators 197
6.12 Matrices of Linear Operators 200
6.13 Eigenvalues and Eigenvectors 205
6.14 Adjoint Operators 209
6.15 Symmetric Operators 211
6.16 Quadratic Forms 213
6.17 Classification of Curves and Surfaces of the Second
Order 221
Exercises 227
Answers 228
Chapter 7 An Introduction to Analysis 229
7.1 Basic Concepts 229
7.2 Sequences of Numbers 239
7.3 Functions of One Variable and Limits 247
7.4 Infinitesimals and Infinities 258
7.5 Operations on Limits 266
7.6 Continuous Functions. Continuity at a Point 272
7.7 Continuity on a Closed Interval 283
7.8 Comparison of Infinitesimals 288
7.9 Complex Numbers 294
Exercises 302
Answers 304

Chapter 8 Differential Calculus. Functions of One Variable 305


8.1 Derivatives and Differentials 305
8.2 Differentiation Rules 316
8.3 Differentiation of Composite and Inverse Functions 324
8.4 Derivatives and Differentials of Higher Orders 332
8.5 Mean Value Theorems 339
8.6 L'Hospital's Rule 344
8.7 Tests for Increase and Decrease of a Function on a Closed
Interval and at a Point 349
8.8 Extrema of a Function. Maximum and Minimum of a Func-
tion on a Closed Interval 352
8.9 Investigating the Shape of a Curve. Points of Inflection 362
8.10 Asymptotes of a Curve 367
8.11 Curve Sketching 373
8.12 Approximate Solution of Equations 381
8.13 Taylor's Theorem 385
8.14 Vector Function of a Scalar Argument 396
Exercises 401
Answers 403
Chapter 9 Integral Calculus. The Indefinite Integral 409
9.1 Basic Concepts and Definitions 409
9.2 Methods of Integration 414
9.3 Integrating Rational Functions 424
9.4 Integrals Involving Irrational Functions 435
9.5 Integrals Involving Trigonometric Functions 445
Exercises 450
Answers 453
Chapter 10 Integral Calculus. The Definite Integral 456
10.1 Basic Concepts and Definitions 456
10.2 Properties of the Definite Integral 461
10.3 Fundamental Theorems for Definite Integrals 467
10.4 Evaluating Definite Integrals 472
10.5 Computing Areas and Volumes by Integration 476
10.6 Computing Arc Lengths by Integration 488
10.7 Applications of the Definite Integral 495
10.8 Numerical Integration 498
Exercises 503
Answers 505
Chapter 11 Improper Integrals 506
11.1 Integrals with Infinite Limits of Integration 506
11.2 Integrals of Nonnegative Functions 511
11.3 Absolutely Convergent Improper Integrals 514
11.4 Cauchy Principal Value of the Improper Integrals 519
11.5 Improper Integrals of Unbounded Functions 520
11.6 Improper Integrals of Unbounded Nonnegative Functions.
Convergence Tests 523
11.7 Cauchy Principal Value of the Improper Integral Involving
Unbounded Functions 525
Exercises 526
Answers 527
Chapter 12 Functions of Several Variables 529
12.1 Basic Notions and Notation 529
12.2 Limits and Continuity 533
12.3 Partial Derivatives and Differentials 538
12.4 Derivatives of Composite Functions 545
12.5 Implicit Functions 550
12.6 Tangent Planes and Normal Lines to a Surface 555
12.7 Derivatives and Differentials of Higher Orders 558
12.8 Taylor's Theorem 562
12.9 Extrema of a Function of Several Variables 566
Exercises 580
Answers 583
Appendix I Elementary Functions 587
Index 596
Preface

This two-volume book was written for students of technical colleges who have had the usual mathematical training. It contains just enough information to continue with a wide variety of engineering disciplines. It
covers analytic geometry and linear algebra, differential and integral cal-
culus for functions of one and more variables, vector analysis, numerical
and functional series (including Fourier series), ordinary differential equa-
tions, functions of a complex variable, Laplace and Fourier transforms,
and equations of mathematical physics. This list itself demonstrates that
the book covers the material for both a basic course in higher mathematics
and several specialist sections that are important for applied problems.
Hence, it may be used by a wide range of readers. Besides students in techni-
cal colleges and those starting a mathematics course, it may be found useful
by engineers and scientists who wish to refresh their knowledge of some
aspects of mathematics.
We tried to give the fundamental material concisely and without dis-
tracting detail. We concentrated on the presentation of the basic ideas of
linear algebra and analysis to make it detailed and as comprehensible as
possible. Mastery of these ideas is a prerequisite for understanding the later material.
The many examples also serve this aim: they were written to help students master the mechanics of solving typical problems.
The more than 600 diagrams are simple illustrations, clear enough to demonstrate the ideas and statements convincingly, and they can be fairly easily reproduced.
We took care not to burden the course with scrupulous proofs of theorems which have little practical application. As a rule we chose
the proof (marked in the text with special symbols) that was constructive
in nature or explained fundamental ideas that had been introduced, show-
ing how they work. This approach made it possible to devise algorithms
for solving whole classes of important problems.
In addition to the examples, we have included several carefully selected
problems and exercises (around 1000) which should be of interest to those
pursuing an independent mathematics course. The problems have the form
of moderately sized theorems. They are very simple but are good training
for those learning the fundamental ideas.

Chapters 1-6, 26 and Appendix II were written by E. Shikin, Chapters 7-8, 11, 12, 17-21, 27, 28 and 29-32 by M. Krasnov, Chapters 9, 10, 13-16 by A. Kiselev, and Chapters 22-25 and Appendix I by G. Makarenko. There was no general editor, but each of the authors read the chapters written by his colleagues, and so each chapter benefited from collective advice.

The Authors
Chapter 1
An Introduction to Analytic Geometry

1.1 Cartesian Coordinates


Coordinate axis. Let L be an arbitrary line. We may move along
L in either of two directions. When the direction of moving is fixed, the
line is said to be directed.
Definition. A directed line is called an axis.
The direction of an axis is indicated by an arrow (Fig. 1.1).
We fix on the axis L a point O and a line segment a of a unit length,
called a unit distance (Fig. 1.2). Let M be a point on L. We associate with
M a number x such that the value of x is equal to the positive distance
between O and M if the direction of moving from O to M coincides with
the direction of L, and to the negative distance otherwise (Fig. 1.3).
Fig. 1.1   Fig. 1.2   Fig. 1.3   Fig. 1.4
Definition. The axis L with the reference point O and the unit distance
a given on it is called the coordinate axis; the number x as defined above
is said to be the coordinate of M.
In symbols, we write M(x) to designate a point M whose coordinate is x.
Cartesian coordinates in a plane. Let O be a point in a plane. We draw
through O two mutually perpendicular lines L1 and L2. Let us choose a
direction for each line and a unit distance a which is the same for L1 and
L2. Then L1 and L2 become coordinate axes with a common reference point
0 (Fig. 1.4).
We call one of the coordinate axes the x-axis or the axis of abscissas
and the other one the y-axis or the axis of ordinates (Fig. 1.5). The point
0 is called the origin of coordinates.
Let M be a point in a plane as shown in Fig. 1.6. We drop from M
two perpendiculars onto the coordinate axes, the points Mx and My being
the projections of M on the x- and y-axes, and associate with M an ordered
pair (x, y) of numbers so that x is the coordinate of the point Mx and
y is the coordinate of the point My.

Fig. 1.5   Fig. 1.6

Fig. 1.7 and the accompanying table of signs:

Quadrant    I    II    III    IV
x           +    -     -      +
y           +    +     -      -

The numbers x and y are called the Cartesian coordinates of M, x being


the abscissa and y being the ordinate.
In symbols, we write M(x, y) to designate a point M whose coordinates
are x and y.
For short, we shall refer to the frame of reference given above as a Carte-
sian coordinate system set up in a plane.
The coordinate axes divide a plane into four parts called quadrants.
These are numbered as shown in Fig. 1.7 and the accompanying table.
Remark. The unit distances may be different for the two axes. Then
the coordinate system is called rectangular.
Cartesian coordinates in three-dimensional space. Let O be a point in three-dimensional space. We draw through O three mutually perpendicular lines L1, L2 and L3. We choose a direction for each line and a unit distance which is the same for L1, L2 and L3. Then L1, L2 and L3 become coordinate axes with a common reference point O (Fig. 1.8).
We call one of the axes the x-axis or the axis of abscissas, the second one the y-axis or the axis of ordinates, and the third one the z-axis or the axis of applicates. The point O is called the origin of coordinates (Fig. 1.9).

Fig. 1.8   Fig. 1.9   Fig. 1.10

Let M be an arbitrary point in three-dimensional space as shown in


Fig. 1.10. We drop from M three perpendiculars onto the coordinate axes,
the points Mx, My and Mz being the projections of M on the x-, y- and
z-axes, and associate with M an ordered triple (x, y, z) of numbers, so that
x is the coordinate of the point Mx, y is the coordinate of the point My
and z is the coordinate of the point Mz.

The numbers x, y and z are said to be the Cartesian coordinates of


M; x, y and z are called the abscissa, ordinate and applicate of the point
M, respectively.
In symbols, we write M(x, y, z) to designate a point M whose coor-
dinates are x, y and z.
Thus we have set up a Cartesian coordinate system in three-dimensional
space.
Definition. A plane determined by any pair of coordinate axes is called
a coordinate plane.
There are three coordinate planes in three-dimensional space, namely
the xy-plane, the xz-plane and the yz-plane. These planes divide the space
into eight parts called octants.

1.2 Elementary Problems of Analytic Geometry


Distance formulas. Let M1(x1) and M2(x2) be points on a coordinate axis. Then the distance d between M1(x1) and M2(x2) is given by

d = d(M1, M2) = |x2 - x1|.

Let there be given a Cartesian coordinate system in an xy-plane. Then the distance between any two points M1(x1, y1) and M2(x2, y2) is given by

d = d(M1, M2) = √((x2 - x1)² + (y2 - y1)²).

Fig. 1.11

◄ Consider the right triangle MM1M2 (Fig. 1.11). The theorem of Pythagoras gives |M1M2|² = |M1M|² + |MM2|². Since the distance between M1 and M2 equals the length of the segment M1M2 and |M1M| = |x2 - x1|, |MM2| = |y2 - y1|, we have

d² = |x2 - x1|² + |y2 - y1|².

Notice that |x2 - x1|² = (x2 - x1)² and |y2 - y1|² = (y2 - y1)². Then, extracting the square root of d², we get the desired formula. ►

Remark. In three-dimensional space the distance between M1(x1, y1, z1) and M2(x2, y2, z2) is

d = d(M1, M2) = √((x2 - x1)² + (y2 - y1)² + (z2 - z1)²).

(Show this.)
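For readers who want to verify such computations numerically, the two distance formulas translate directly into a few lines of Python; this is only an illustrative sketch, and the function names are ours, not the book's.

    import math

    def distance_2d(p1, p2):
        # d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
        return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

    def distance_3d(p1, p2):
        # the same formula with the z-coordinate included
        return math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2)))

    print(distance_2d((0, 0), (3, 4)))        # 5.0
    print(distance_3d((1, 2, 3), (4, 6, 3)))  # 5.0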
Examples. (1) Write the equation of a circle with radius r and centre at the point P(a, b).
◄ Let M(x, y) be a point of the circle (Fig. 1.12). Then |MP| = r. Since |MP| is the distance between M and P, we have

|MP| = r = √((x - a)² + (y - b)²).

Squaring this equation, we get

(x - a)² + (y - b)² = r².

This is the desired equation of a circle. ►
(2) Let F1(-c, 0) and F2(c, 0) be points in a plane and a (a > c ≥ 0) be a given number. Find the condition to be satisfied by the coordinates (x, y) of a point M for the sum of the distances between M and F1 and between M and F2 to be equal to 2a.

Fig. 1.12   Fig. 1.13

◄ Let us find the distances between M and F1 and between M and F2 (Fig. 1.13). We have

|MF1| = √((x + c)² + y²)   and   |MF2| = √((x - c)² + y²).

Whence

√((x + c)² + y²) + √((x - c)² + y²) = 2a.

Transpose the second radical to the right:

√((x + c)² + y²) = 2a - √((x - c)² + y²).

Then, squaring and simplifying both sides of the equation, we get

a√((x - c)² + y²) = a² - cx.

Squaring and further simplifying both sides of the above equation, we obtain

(a² - c²)x² + a²y² = a²(a² - c²).

Setting b² = a² - c² and dividing both sides by a²b², we arrive at the equation of an ellipse (see Chap. 4)

x²/a² + y²/b² = 1.

This is the condition we have sought. ►
Division of a line segment in a given ratio. Let M1(x1, y1) and M2(x2, y2) be two distinct points in a plane. Let a point M(x, y) lie on the line segment M1M2 and divide M1M2 in the ratio λ1 : λ2, i.e.,

|M1M| / |MM2| = λ1/λ2.

Represent the coordinates (x, y) of M in terms of the coordinates of M1 and M2 and the numbers λ1 and λ2.

Fig. 1.14

◄ Suppose that the segment is not parallel to the y-axis (Fig. 1.14). Then

|M1M| / |MM2| = |M1x Mx| / |Mx M2x|.

Since |M1x Mx| = |x1 - x| and |Mx M2x| = |x - x2|, we have

|x1 - x| / |x - x2| = λ1/λ2.

The point M lies between M1 and M2. Hence there holds either x1 < x < x2 or x1 > x > x2. This implies that the differences x1 - x and x - x2 are always of the same sign. Thus we may write

(x1 - x) / (x - x2) = λ1/λ2.

Hence

x = (λ2 x1 + λ1 x2) / (λ1 + λ2).     (*)

When the segment M1M2 is parallel to the y-axis, we have x1 = x2 = x. Notice that this result immediately follows from (*) if we set x1 = x2.
Similar reasoning yields

y = (λ2 y1 + λ1 y2) / (λ1 + λ2). ►

Example. Find the coordinates of the centre of gravity M of the triangle with vertices at M1(x1, y1), M2(x2, y2) and M3(x3, y3) (Fig. 1.15).

Fig. 1.15

◄ Recall that in any triangle the centre of gravity and the point of intersection of the medians coincide, so that M divides each median in the ratio 2 : 1 reckoning from the corresponding vertex. Thus the coordinates of M are

x = (1·x3 + 2x') / (2 + 1)   and   y = (1·y3 + 2y') / (2 + 1),

where x' and y' are the coordinates of the point M' of the median M3M'. Since M' is the mid-point of M1M2, we have

x' = (1·x1 + 1·x2) / (1 + 1)   and   y' = (1·y1 + 1·y2) / (1 + 1).

Substituting these relations into the formulas for x and y, we arrive at the desired result

x = (x1 + x2 + x3)/3   and   y = (y1 + y2 + y3)/3. ►
Remark. Let M(x, y, z) divide a segment joining M1(x1, y1, z1) and M2(x2, y2, z2) in the ratio λ1 : λ2. Then

x = (λ2 x1 + λ1 x2) / (λ1 + λ2),   y = (λ2 y1 + λ1 y2) / (λ1 + λ2),   z = (λ2 z1 + λ1 z2) / (λ1 + λ2).
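As an illustrative sketch (the helper names are ours, not the book's), the division and centroid formulas can be checked in Python:

    def divide(M1, M2, lam1, lam2):
        # coordinate-wise (lam2*M1 + lam1*M2) / (lam1 + lam2), formula (*)
        return tuple((lam2 * a + lam1 * b) / (lam1 + lam2) for a, b in zip(M1, M2))

    def centroid(M1, M2, M3):
        # centre of gravity of a triangle: arithmetic mean of the vertices
        return tuple((a + b + c) / 3 for a, b, c in zip(M1, M2, M3))

    print(divide((0, 0), (6, 3), 2, 1))      # (4.0, 2.0), ratio 2 : 1 from M1
    print(centroid((0, 0), (6, 0), (0, 6)))  # (2.0, 2.0)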

1.3 Polar Coordinates


Consider an axis L in a plane and a point O on L (Fig. 1.16). Let M be a point distinct from O, as shown in Fig. 1.17. The number r is the distance between O and M, and φ is the angle between the positive direction of L and OM measured counterclockwise. We easily see that the position of M in the plane is uniquely defined by the values of r and φ.

Fig. 1.16   Fig. 1.17

The ordered pair (r, φ) of numbers represents the polar coordinates of M. The numbers r and φ are called the polar radius and the polar angle, respectively.
The point O is referred to as the pole and the axis L as the polar axis.
It is clear that r > 0 and 0 ≤ φ < 2π.
When M coincides with the pole, r = 0. In this case the polar angle is not defined.
Therefore we may set up another coordinate system in a plane, namely the polar coordinate system.
The Cartesian coordinate system is said to be compatible with a given polar system if the origin O is the pole, the x-axis is the polar axis and the y-axis makes an angle of +π/2 with the x-axis (Fig. 1.18). Then the relations between the Cartesian coordinates and the polar coordinates are given by the formulas

cos φ = x/r   and   sin φ = y/r,

or

x = r cos φ   and   y = r sin φ,

where r = √(x² + y²).
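The conversion between polar and Cartesian coordinates can be sketched in Python as follows (math.atan2 is used here only as a convenient way of recovering the polar angle in [0, 2π); the helper names are ours):

    import math

    def polar_to_cartesian(r, phi):
        # x = r cos(phi), y = r sin(phi)
        return r * math.cos(phi), r * math.sin(phi)

    def cartesian_to_polar(x, y):
        # r = sqrt(x^2 + y^2); phi reduced to the interval [0, 2*pi)
        r = math.hypot(x, y)
        phi = math.atan2(y, x) % (2 * math.pi)
        return r, phi

    print(polar_to_cartesian(2.0, math.pi / 2))  # approximately (0.0, 2.0)
    print(cartesian_to_polar(1.0, 1.0))          # (1.414..., 0.785...)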

Fig. 1.18   Fig. 1.19

Example. Let R (R > 0) be a given number. The set of points whose polar coordinates (r, φ) satisfy the condition

r = R

is a circle with radius R and centre at the pole (Fig. 1.19).

1.4 Second- and Third-Order Determinants


We are given four numbers a11, a12, a21, a22 (these are read a one-one, a one-two, a two-one, a two-two).
The second-order determinant is the number a11 a22 - a12 a21. In symbols, we write

Δ = | a11  a12 | = a11 a22 - a12 a21 = det |aij|   (i = 1, 2; j = 1, 2)     (1.1)
    | a21  a22 |

to designate the second-order determinant.
The numbers a11, a12, a21 and a22 are called the elements of Δ. The pairs a11, a12 and a21, a22 of elements are referred to as the first and the second rows of Δ, respectively; a11, a21 and a12, a22 as the first and the second columns of Δ, respectively; a11, a22 as the principal (or positive) diagonal of Δ; and a12, a21 as the secondary (or negative) diagonal of Δ (see Fig. 1.20).

Fig. 1.20
Therefore the value of the second-order determinant is equal to the product of the elements on the principal diagonal minus the product of the elements on the secondary diagonal.
Example. Compute the value of

| -1  3 |
| -2  4 |

◄ Using (1.1), we get

| -1  3 | = -4 + 6 = 2. ►
| -2  4 |
We encounter second-order determinants when dealing with systems of linear equations in two unknowns

a11 x + a12 y = b1,
a21 x + a22 y = b2.

Provided that Δ = a11 a22 - a12 a21 ≠ 0, eliminating the unknowns gives

x = (b1 a22 - b2 a12) / (a11 a22 - a12 a21)   and   y = (b2 a11 - b1 a21) / (a11 a22 - a12 a21),

or, as quotients of determinants,

x = | b1  a12 |  /  | a11  a12 | ,     y = | a11  b1 |  /  | a11  a12 | .
    | b2  a22 |     | a21  a22 |           | a21  b2 |     | a21  a22 |
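These quotient formulas are easy to check numerically. A minimal Python sketch of Cramer's rule for two unknowns (the helper names are ours, not the book's):

    def det2(a11, a12, a21, a22):
        # second-order determinant, formula (1.1)
        return a11 * a22 - a12 * a21

    def solve_2x2(a11, a12, a21, a22, b1, b2):
        # Cramer's rule; valid only when the main determinant is nonzero
        d = det2(a11, a12, a21, a22)
        if d == 0:
            raise ValueError("the determinant of the system is zero")
        x = det2(b1, a12, b2, a22) / d
        y = det2(a11, b1, a21, b2) / d
        return x, y

    print(solve_2x2(1, 2, 3, 4, 5, 6))  # (-4.0, 4.5) solves x + 2y = 5, 3x + 4y = 6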

Let there be given nine numbers aij (i = 1, 2, 3; j = 1, 2, 3).
The third-order determinant

    | a11  a12  a13 |
Δ = | a21  a22  a23 | = det |aij|
    | a31  a32  a33 |

is the number whose value is computed as

Δ = a11 a22 a33 + a12 a23 a31 + a21 a32 a13
  - a13 a22 a31 - a21 a12 a33 - a11 a23 a32.     (1.2)

The first subscript i of aij refers to the row, the second subscript j refers to the column.
The triples of elements a11, a22, a33 and a13, a22, a31 are called the principal (or positive) diagonal and the secondary (or negative) diagonal of Δ, respectively.
To tackle the question of signs in (1.2) we mention that the three positive terms a11, a22, a33; a12, a23, a31 and a21, a32, a13 correspond to the principal diagonal and to the vertices of the two triangles whose bases are parallel to the principal diagonal. Similarly, the secondary diagonal and the vertices of the two triangles whose bases are parallel to the secondary diagonal correspond to the three negative terms in (1.2), as shown in Fig. 1.21 and Fig. 1.22.

Fig. 1.21 Fig. 1.22

This consideration lays down a convenient procedure for computing the value of the third-order determinant, called the rule of a triangle.
Example. Compute the value of

    |  1   0  -1 |
Δ = |  2   4   3 | .
    |  3  -1   6 |

◄ The rule of a triangle gives

Δ = 1×4×6 + (-1)×2×(-1) + 0×3×3 - (-1)×3×4 - 0×2×6 - 1×(-1)×3
  = 24 + 2 + 0 + 12 - 0 + 3 = 41. ►

Now we shall turn our attention to some properties of the second- and third-order determinants, easily verified by applying formulas (1.1) and (1.2).
Properties of determinants. (1) The value of the determinant is unchanged if the rows and the columns are transposed, i.e.,

| a11  a12 |   | a11  a21 |      | a11  a12  a13 |   | a11  a21  a31 |
| a21  a22 | = | a12  a22 | ,    | a21  a22  a23 | = | a12  a22  a32 | .
                                 | a31  a32  a33 |   | a13  a23  a33 |

(2) The determinant reverses its sign if any two rows (or any two columns) are interchanged.
(3) The factor common to all elements of some row (or some column) may be taken outside the determinant, i.e.,

| ka11  ka12  ka13 |   | a11  ka12  a13 |       | a11  a12  a13 |
| a21   a22   a23  | = | a21  ka22  a23 | = k   | a21  a22  a23 | .
| a31   a32   a33  |   | a31  ka32  a33 |       | a31  a32  a33 |

The following three properties are consequences of Properties 1-3. We


can also verify these properties by directly applying formulas (1.1) and (1.2).
(4) The determinant is zero if it contains two identical rows (or two
identical columns).
(5) The determinant vanishes if all elements of some row (or some
column) are zeros.
(6) The determinant is zero if it contains two proportional rows (or two
proportional columns).
We shall outline another approach to computation of the third-order
determinant
    | a11  a12  a13 |
Δ = | a21  a22  a23 | .
    | a31  a32  a33 |

The minor Mij of Δ is the second-order determinant obtained by deleting from Δ the row and the column which intersect in aij.
For example, the minor M23 of Δ is the second-order determinant

M23 = | a11  a12 | .
      | a31  a32 |

The cofactor Aij of the element aij in Δ is the minor Mij if the sum (i + j) of the numbers of the ith row and the jth column is even, and the minor Mij multiplied by (-1) otherwise, i.e.,

Aij = (-1)^(i+j) Mij,

where (-1)^(i+j) is the position sign.


Theorem 1.1. The third-order determinant Δ is equal to the sum of the products of the elements in some row (or some column) of Δ and their respective cofactors, so that the following expansions hold:

Δ = ai1 Ai1 + ai2 Ai2 + ai3 Ai3   (i = 1, 2, 3),     (1.3)
Δ = a1j A1j + a2j A2j + a3j A3j   (j = 1, 2, 3).     (1.4)

By way of illustration we show that

Δ = a11 A11 + a12 A12 + a13 A13.

◄ Applying formula (1.2) and grouping the terms, we obtain

Δ = a11(a22 a33 - a32 a23) + a12(a23 a31 - a21 a33) + a13(a21 a32 - a31 a22)

  = a11 | a22  a23 | - a12 | a21  a23 | + a13 | a21  a22 | = a11 A11 + a12 A12 + a13 A13. ►
        | a32  a33 |       | a31  a33 |       | a31  a32 |

Formulas (1.3) and (1.4) represent the expansion of the determinant with respect to the ith row and the expansion of the determinant with respect to the jth column, respectively.
Example. Compute the value of

    |  1   0  -1 |
Δ = |  2   4   3 | .
    |  3  -1   6 |

◄ Expanding the determinant with respect to the first row, we get

Δ = 1·| 4   3 | - 0·| 2  3 | - 1·| 2   4 |
      | -1  6 |     | 3  6 |     | 3  -1 |

  = (24 + 3) + 0 + (2 + 12) = 41. ►
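The expansion with respect to the first row is easy to automate. A minimal Python sketch (the helpers are ours, not the book's) that reproduces the value 41 obtained above:

    def det2(m):
        # second-order determinant of a 2x2 list of lists
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    def minor(m, i, j):
        # delete row i and column j from the 3x3 matrix m
        return [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

    def det3(m):
        # expansion with respect to the first row, formula (1.3) with i = 1
        return sum((-1) ** j * m[0][j] * det2(minor(m, 0, j)) for j in range(3))

    print(det3([[1, 0, -1], [2, 4, 3], [3, -1, 6]]))  # 41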
Chapter 2
Elements of Vector Algebra

2.1 Fixed Vectors and Free Vectors


Let A and B be two distinct points. We may move along a line segment joining A and B in either of two directions. Suppose we start moving at A and end at B. This gives rise to a directed segment AB. When moving from B to A, we get a directed segment BA.
Directed segments are frequently called fixed vectors.
The direction of a fixed vector (or a directed segment) is indicated pictorially by an arrow (Fig. 2.1).
We denote a fixed vector by a symbol, say AB, with the understanding
that the first letter in a pair refers to the initial point of the fixed vector
and the second letter to the terminal point.
A fixed vector is said to be zero if its initial and terminal points coincide.

Fig. 2.1   Fig. 2.2

Definition. The fixed vectors AB and CD are said to be equivalent if the mid-points of the segments AD and BC coincide (Fig. 2.2).
In symbols, we write AB = CD to signify the fact that AB and CD are two equivalent fixed vectors.
Notice that when AB and CD do not lie in the same line, we may introduce a definition equivalent to the one given above, namely AB and CD are equivalent if the quadrilateral ABCD is a parallelogram. Hence all equivalent fixed vectors are of the same length.
By way of illustration we consider the square and the fixed vectors shown in Fig. 2.3. It is clear that AB and DC are equivalent while BC and DA are not.
Equivalent fixed vectors obey the following laws.
(1) Any fixed vector is equivalent to itself, i.e., AB = AB.
(2) If AB = CD then CD = AB.
(3) If AB = CD and CD = EF then AB = EF.
Fig. 2.3   Fig. 2.4   Fig. 2.5

Let AB be a given fixed vector and C be an arbitrary point. By virtue of the above definition we can always find a point D (Fig. 2.4) such that

CD = AB.
Hence, to any point we may apply a fixed vector equivalent to the given one.
Now we shall turn our attention to free vectors, i.e., to vectors which
may be made to start at an arbitrary point. In other words, a free vector
is a vector which may be moved rigidly parallel to itself.
We clearly see that a given fixed vector AB uniquely defines a free vector
which starts at A and ends at B.
Free vectors whose initial points lie on the line determined by a given
(nonzero) fixed vector are thought of as sliding vectors (Fig. 2.5).
Fixed vectors and sliding vectors are widely used in theoretical
mechanics.
We shall denote free vectors by bold face small letters a, b, c, ... and the zero vector by 0.
Let a be a free vector and A be a point. Then there exists only one point B such that AB = a. Hence, the operation of moving a given free vector to a specific point gives rise to a unique fixed vector AB. This operation is referred to as applying a free vector a to a point A.

Fig. 2.6

Notice that applying a given free vector to different points, we obtain


fixed vectors which are equivalent and, consequently, of the same length.
Therefore we may speak of the length of a free vector, setting it equal to
the length of the corresponding fixed vector. The length of the zero vector
is equal to zero. We shall denote the length of a free vector a by |a|.
Notice that if a = b then |a| = |b|, but the converse is not true.

2.2 Linear Operations on Vectors


Addition of vectors. Let there be given two vectors a and b. Apply
the vector a to some point O so that OA = a and then apply the vector
b to the point A, as shown in Fig. 2.7. The resultant vector OB is the sum
of a and b denoted by a + b. This rule of addition of two vectors is known
as the triangle law.

Fig. 2.7 Fig. 2.8


It is easy to see that addition of two vectors obeys the commutative law, i.e., there holds

a + b = b + a

for any two vectors a and b (Fig. 2.8).
Apply a and b to some point O and construct a parallelogram with
sides a and b, as shown in Fig. 2.9. Then the vector OB which starts at
0 and ends at the vertex opposite to O represents the sum a + b ( or b + a).
This rule of addition is called the parallelogram law.

Fig. 2.9   Fig. 2.10

Now we consider three vectors a, b and c and proceed as follows. First, we apply a to some point O so that OA = a. Then we apply b to A so that AB = b. And finally, we apply c to B so that BC = c. Thus we obtain a polygonal line of three vectors as shown in Fig. 2.10.
Using the triangle law, we obtain OB = a + b and OC = (a + b) + c (Fig. 2.11).
On the other hand, we have AC = b + c and OC = a + (b + c) (Fig. 2.12). Hence, we conclude that for any three vectors a, b, c there holds

(a + b) + c = a + (b + c).

Thus addition of three vectors obeys the associative law and we may write the sum of a, b and c as

a + b + c.

Fig. 2.11 Fig. 2.12



Fig. 2.13

Similarly, we define the sum of any number of vectors as a vector which starts at the initial point of the first summand and ends at the terminal point of the last summand, as depicted in Fig. 2.13 for seven vectors. The sum of the seven vectors is

a = a1 + a2 + a3 + a4 + a5 + a6 + a7.

This rule of addition may be referred to as the law of closure of a polygonal line to complete a polygon.
Example. Find the sum of six vectors each of which starts at the centre
and ends at a vertex of a regular hexagon (Fig. 2.14).

a3 a,
\
\
\
\
a
a"' a,
\
\
\

as a6 a4

Fig. 2.14 Fig. 2.15

◄ The law of closure of a polygonal line yields

Multiplication of a vector by a scalar. Definition. Free vectors a and


b are said to be collinear if their corresponding fixed vectors lie on parallel
lines (Fig. 2.15) or on the same line.

In symbols, we write a ∥ b to denote collinear vectors.
Remark. It follows from the definition that a and b are always collinear when either a or b is a zero vector.
Let a and b be two collinear vectors applied to a point O so that OA = a and OB = b. Then the points O, A and B lie on the same line. The point O may lie either outside the segment joining A and B or between A and B, as shown in Fig. 2.16. The vectors a and b are of the same direction in the first case and of opposite directions in the second one.
The vectors a and b are said to be equivalent if they are of the same length and of the same direction.
Let a be a vector and λ be a scalar.
Definition. The product of a vector a by a scalar λ is the vector b such that
(a) |b| = |λ|·|a|;
(b) a and b are of the same direction if λ > 0 and of opposite directions when λ < 0.
In symbols, we write b = λa.
When λ = 0 we get λa = 0.

Fig. 2.16   Fig. 2.17

Therefore the vectors a and b = λa are collinear by definition. The converse is also true, namely if a (a ≠ 0) and b are collinear vectors, then there exists a scalar λ such that b = λa.
The basic properties of multiplication of a vector by a scalar are
(1) (λ + μ)a = λa + μa.
(2) λ(μa) = (λμ)a.
(3) λ(a + b) = λa + λb.
(Here λ and μ are any real numbers, and a and b are any vectors.)
Definition. A vector a° of unit length (|a°| = 1) is called a unit vector.
If a ≠ 0 then the vector

a° = (1/|a|) a = a/|a|

is a unit vector of the same direction as a (Fig. 2.17).



2.3 Coordinates and Components of a Vector


Let a Cartesian coordinate system with the origin O be given in
three-dimensional space.
Given a point M in the space, we may find three points P, Q and R called the projections of M onto the coordinate axes (Fig. 2.18). Conversely, given three points P, Q and R on the coordinate axes, there exists a unique point M such that P, Q and R are the projections of M onto the coordinate axes. Hence, the position of M in the space is uniquely defined by the projections P, Q and R of M onto the coordinate axes and vice versa.
Recall that the positions of P, Q and R on the coordinate axes are identified by their respective coordinates x, y and z. There is a one-to-one correspondence between M and the ordered triple (x, y, z) of numbers called the Cartesian coordinates of M.

Fig. 2.18   Fig. 2.19

Let i, j, k be three unit vectors located at the origin O of a given Cartesian coordinate system and pointing in the positive directions of the x-, y- and z-axes, respectively, as shown in Fig. 2.19. We consider a vector a which starts at the origin O of coordinates and ends at the point A. Let us draw through A planes perpendicular to the coordinate axes. These planes intersect the x-, y- and z-axes at the points P, Q and R, respectively. It follows from Fig. 2.20 that

a = OP + OQ + OR.     (2.1)

The vectors OP, OQ and OR are collinear to the unit vectors i, j, k, respectively. Hence, there exist numbers x, y and z such that OP = xi, OQ = yj and OR = zk, and


a = xi + yj + zk. (2.2)
Formula (2.2) is referred to as the expansion of a with respect to the unit
vectors i, j, k.
Expansion (2.2) is applicable to any vector. Since i, j, k are of a unit
length and any two of them are perpendicular to each other, we call the
ordered triple (i, j, k) of unit vectors the orthonormal basis. It is easily
seen that for any a the expansion is unique. The numbers x, y and z in
(2.2) are called the coordinates of a. These are the same as the coordinates
x, y and z of the terminal point A of a. So we write a = {x, y, z }.

Fig. 2.20   Fig. 2.21

The vectors xi, yj and zk in (2.2) are called the components of a.
Let a = {x1, y1, z1} and b = {x2, y2, z2} be two vectors. Then a and b are equivalent if and only if their respective coordinates are equal, i.e.,

a = b   iff   x1 = x2,  y1 = y2,  z1 = z2.

The position vector of a point M(x, y, z) is the vector r = xi + yj + zk which starts at the origin O of coordinates and ends at M(x, y, z) (Fig. 2.21).
Linear operations on vectors specified by their coordinates. Let there be given two vectors a = {x1, y1, z1} and b = {x2, y2, z2}, so that a = x1 i + y1 j + z1 k and b = x2 i + y2 j + z2 k. Then, by the laws of addition, we get

a + b = (x1 i + y1 j + z1 k) + (x2 i + y2 j + z2 k)
      = (x1 + x2) i + (y1 + y2) j + (z1 + z2) k

or

a + b = {x1 + x2, y1 + y2, z1 + z2}.

Hence, addition of vectors becomes addition of their corresponding coordinates.
Similarly, we have

a - b = {x1 - x2, y1 - y2, z1 - z2}.

Also

λa = λx1 i + λy1 j + λz1 k

or

λa = {λx1, λy1, λz1}.

Hence, multiplication of a vector by a scalar reduces to multiplication of all the coordinates of the vector by the scalar.
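The coordinate rules above translate directly into code. A minimal Python sketch (illustrative helper names, not the book's):

    def add(a, b):
        # addition of vectors = addition of the corresponding coordinates
        return tuple(x + y for x, y in zip(a, b))

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def scale(lam, a):
        # multiplication of a vector by a scalar
        return tuple(lam * x for x in a)

    a, b = (1, 2, 3), (4, 5, 6)
    print(add(a, b))      # (5, 7, 9)
    print(sub(a, b))      # (-3, -3, -3)
    print(scale(2.0, a))  # (2.0, 4.0, 6.0)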
Let a = {x1, y1, z1} and b = {x2, y2, z2} be collinear vectors, and b ≠ 0. Then a = μb, i.e.,

x1 = μx2,  y1 = μy2,  z1 = μz2,

or

x1/x2 = y1/y2 = z1/z2.     (2.3)

Conversely, if (2.3) is fulfilled, then a = μb, i.e., a and b are collinear.
In other words, the vectors a and b are collinear if and only if their coordinates are proportional.

Fig. 2.22

Example. Find the coordinates of the vector M1M2, where M1(x1, y1, z1) is the initial point and M2(x2, y2, z2) is the terminal point of the vector (Fig. 2.22).
◄ From Fig. 2.22 it follows that M1M2 = r2 - r1, where r1 and r2 are the position vectors of M1 and M2, respectively. Thus

M1M2 = {x2 - x1, y2 - y1, z2 - z1}.

Hence we conclude that the coordinates of M1M2 are equal to the differences of the respective coordinates of M2 and M1. ►

2.4 Projection of a Vector onto an Axis


Let l be an axis and AB be a nonzero directed segment of l (Fig. 2.23).
The magnitude of the directed segment AB of the axis l is the number equal to the positive length of AB if the directions of AB and l coincide, and to the negative length of AB otherwise.

Fig. 2.23   Fig. 2.24

Let AB be an arbitrary vector shown in Fig. 2.24. We drop from A and B perpendiculars onto the axis l. Thus we get the directed segment CD of l.
Definition. The projection of the vector AB onto the axis l is the magnitude of the directed segment CD as given above.

Fig. 2.25

Basic properties of projections. (1) The projection of a vector AB onto an axis l is equal to the product of the length of AB and the cosine of the angle between l and AB (Fig. 2.25), i.e.,

proj_l AB = |AB| cos α.

Fig. 2.26

(2) The projection of the sum of vectors onto an axis l is equal to the sum of the projections of the vectors onto the axis.
For example, from Fig. 2.26 it follows that

proj_l (a + b + c) = proj_l a + proj_l b + proj_l c.

2.5 Scalar Product of Two Vectors


Let there be given two nonzero vectors a and b.
Definition. The scalar product of two vectors a and b is the number a·b such that

a·b = |a| |b| cos φ = |a| |b| cos (a, b),     (2.4)

where φ = (a, b) is the angle between a and b.
Observe that |b| cos φ is the projection of b onto the axis determined by a. Then we may write (see Fig. 2.27)

a·b = |a| proj_a b.     (2.5)

Similarly, we have

a·b = |b| proj_b a.

Hence, the scalar product of two vectors is a number equal to the product of the length of one vector and the projection of the other vector onto the axis determined by the first vector.
When either a or b is a zero vector, we have a·b = 0.
Properties of the scalar product. (1) The scalar product of a and b vanishes if and only if a and/or b are zero vectors, or if a and b are perpendicular (written a ⊥ b).
◄ This immediately follows from (2.4). ►
Since the direction of the zero vector is not defined, we may set it perpendicular to any vector. So we may write the property in question as

a ⊥ b   iff   a·b = 0.

Fig. 2.27 (a), (b), (c)

(2) The scalar product obeys the commutative law, i.e.,

a·b = b·a.

◄ This immediately follows from (2.4), taking into account that cos φ is an even function. ►
(3) The scalar product obeys the distributive law relative to the sum of vectors, i.e.,

(a + b)·c = a·c + b·c.

◄ Indeed,

(a + b)·c = |c| proj_c (a + b) = |c| (proj_c a + proj_c b)
          = |c| proj_c a + |c| proj_c b = a·c + b·c. ►

(4) If a or b is multiplied by a scalar λ, the scalar product of a and b is multiplied by λ, i.e.,

(λa)·b = a·(λb) = λ(a·b).

◄ Let λ > 0. Then

λ(a·b) = λ|a| |b| cos (a, b)

and

(λa)·b = |λ| |a| |b| cos (λa, b) = λ|a| |b| cos (a, b),

since the angles (a, b) and (λa, b) are equal (Fig. 2.28).
We proceed similarly when λ < 0. The case of λ = 0 is trivial. ►
Remark. In general

(a·b)·c ≠ a·(b·c).

Fig. 2.28

Scalar product of two vectors specified by their coordinates. Let a = {x1, y1, z1} and b = {x2, y2, z2} be two vectors specified by their coordinates with respect to the orthonormal basis (i, j, k).
We consider the scalar product of a and b, i.e.,

a·b = (x1 i + y1 j + z1 k)·(x2 i + y2 j + z2 k).

By the distributive law, we have

a·b = x1 x2 (i·i) + y1 x2 (j·i) + z1 x2 (k·i) + x1 y2 (i·j)
    + y1 y2 (j·j) + z1 y2 (k·j) + x1 z2 (i·k) + y1 z2 (j·k) + z1 z2 (k·k).

Observe that

i·j = i·k = j·k = 0,
i·i = j·j = k·k = 1.

Whence we obtain

a·b = x1 x2 + y1 y2 + z1 z2.     (2.6)

We conclude that, given vectors a and b specified by their coordinates with respect to the orthonormal basis (i, j, k), the scalar product of a and b is equal to the sum of the products of their respective coordinates.
Example. Compute the scalar product of a = 4i - 2j + k and b = 6i + 3j + 2k.
◄ a·b = 4×6 + (-2)×3 + 1×2 = 20. ►

The scalar square of the vector a is a·a = a².



Using (2.6) and setting b = a, we obtain

a² = x1² + y1² + z1².     (2.7)

On the other hand,

a² = a·a = |a|² cos 0 = |a|²,

so that (2.7) gives

|a| = √(x1² + y1² + z1²).     (2.8)

Hence, the length of a vector specified by its coordinates with respect to the orthonormal basis is equal to the square root of the sum of the squares of the coordinates.
Direction cosines. By definition, the scalar product of two vectors a and b is

a·b = |a| |b| cos φ,

where φ is the angle between a and b.
Provided that both a and b are nonzero vectors, the angle between a and b is given by

cos φ = a·b / (|a| |b|).     (2.9)

For a = {x1, y1, z1} and b = {x2, y2, z2}, (2.9) becomes

cos φ = (x1 x2 + y1 y2 + z1 z2) / (√(x1² + y1² + z1²) · √(x2² + y2² + z2²)).     (2.10)

Example. Find the angle between a = {2, -4, 4} and b = {-3, 2, 6}.
◄ Using (2.10), we get

cos φ = (-6 - 8 + 24) / (√(4 + 16 + 16) · √(9 + 4 + 36)) = 10/42 = 5/21.

The value of φ is easily obtained from tables of trigonometric functions. ►
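Formulas (2.6), (2.8) and (2.10) are easy to check numerically. A minimal Python sketch (our helper names) applied to the example above:

    import math

    def dot(a, b):
        # scalar product = sum of products of respective coordinates, formula (2.6)
        return sum(x * y for x, y in zip(a, b))

    def length(a):
        # formula (2.8)
        return math.sqrt(dot(a, a))

    def cos_angle(a, b):
        # formula (2.10)
        return dot(a, b) / (length(a) * length(b))

    print(cos_angle((2, -4, 4), (-3, 2, 6)))  # 0.238... = 5/21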
Let b = i, that is, b = {1, 0, 0}. Then for any a = {x1, y1, z1} ≠ 0 and the unit vector i we have

cos α = a·i / |a|,

or

cos α = x1 / √(x1² + y1² + z1²),     (2.11)

where α is the angle between a and the x-axis.
Similarly,

cos β = y1 / √(x1² + y1² + z1²) = a·j / |a|,     (2.12)

cos γ = z1 / √(x1² + y1² + z1²) = a·k / |a|.     (2.13)

Formulas (2.11)-(2.13) define the direction cosines of the vector a, i.e., the cosines of the angles between a and the coordinate axes (Fig. 2.29).

Fig. 2.29   Fig. 2.30

Example. Find the coordinates of the unit vector n°.
◄ Let n° = xi + yj + zk. Since |n°| = 1, we get

n°·i = x = |n°| |i| cos (n°, i) = cos α,
n°·j = y = cos β,
n°·k = z = cos γ.

Thus the coordinates of the unit vector are the direction cosines of this vector, i.e., the cosines of the angles between n° and the coordinate axes, so that

n° = i cos α + j cos β + k cos γ.

Whence

(n°)² = n°·n° = 1 = cos²α + cos²β + cos²γ. ►

Example. Let n° be orthogonal to the z-axis, i.e.,

n° = xi + yj.

Then the coordinates x and y of n° are

x = cos φ and y = sin φ,

so that (Fig. 2.30)

n° = i cos φ + j sin φ.

2.6 Vector Product of Two Vectors

Definition. The vector product of two noncollinear vectors a and b is a vector a × b such that
(1) the length of a × b is equal to |a| |b| sin φ, where φ is the angle between a and b (Fig. 2.31);

Fig. 2.31

(2) a × b is perpendicular to both a and b, i.e., a × b is perpendicular to the plane determined by a and b;
(3) a × b is so directed that the shortest rotation from a to b is made counterclockwise as seen from the terminal point of the vector a × b (Fig. 2.32). In other words, the ordered triple of vectors (a, b, a × b) is right-handed, i.e., the thumb, the index finger and the middle finger of the right hand point along the directions of a, b and a × b, respectively.

Fig. 2.32

When a and b are collinear, a × b = 0.
The length of the vector product of a and b, given by

|a × b| = |a| |b| sin φ,     (2.14)

is numerically equal to the area S0 of the parallelogram with sides a and b (Fig. 2.33), i.e.,

|a × b| = S0.

Fig. 2.33

Properties of the vector product. (1) The vector product of a and b is equal to the zero vector if and only if a and/or b are zero vectors, or if a and b are collinear. (Recall that if a and b are collinear then the angle between a and b is either zero or π.)
◄ This immediately follows from |a × b| = |a| |b| sin φ. ►
We may set the zero vector collinear to any vector and express the property as

a ∥ b   iff   a × b = 0.

(2) The vector product obeys the anticommutative law, i.e.,

b × a = -(a × b).     (2.15)

◄ Indeed, a × b and b × a are collinear and of the same length. Since the shortest rotation from a to b is counterclockwise for a × b and is clockwise for b × a, we conclude that a × b and b × a are of opposite directions (Fig. 2.34). ►
(3) The vector product obeys the distributive law relative to the sum of vectors, i.e.,

(a + b) × c = a × c + b × c.

(4) If a or b is multiplied by a scalar λ, the vector product of a and b is multiplied by λ, i.e.,

(λa) × b = a × (λb) = λ(a × b).

Remark. The vector product is not associative. Indeed, the identity (a × b) × c = a × (b × c) is not generally true. For example,

(i × j) × j = -i   but   i × (j × j) = 0.

Vector product of two vectors specified by their coordinates. Let a = {x1, y1, z1} and b = {x2, y2, z2} be two vectors specified by their coordinates with respect to the orthonormal basis (i, j, k).

Fig. 2.34   Fig. 2.35

By the distributive law, we obtain

a × b = (x1 i + y1 j + z1 k) × (x2 i + y2 j + z2 k)
      = x1 x2 (i × i) + x1 y2 (i × j) + x1 z2 (i × k) + y1 x2 (j × i)
      + y1 y2 (j × j) + y1 z2 (j × k) + z1 x2 (k × i) + z1 y2 (k × j)
      + z1 z2 (k × k).     (2.16)

The vector products of the basis vectors (see Fig. 2.35) are

i × i = j × j = k × k = 0,
i × j = k,   j × k = i,   k × i = j,
j × i = -k,   k × j = -i,   i × k = -j.

Substituting these relations into (2.16), we obtain

a × b = -x2 y1 k + x2 z1 j + x1 y2 k - z1 y2 i + y1 z2 i - x1 z2 j
      = i(y1 z2 - y2 z1) + j(x2 z1 - x1 z2) + k(x1 y2 - x2 y1).     (2.17)

To memorize (2.17) it is helpful to rewrite it in the form of the third-order determinant

        | i    j    k  |
a × b = | x1   y1   z1 | .     (2.18)
        | x2   y2   z2 |

Expanding (2.18) with respect to the first row, we obtain (2.17).


Examples. (1) Compute the area of a parallelogram with sides a = i + j - k and b = 2i + j - k.
◄ The area we are seeking is S0 = |a × b|. By (2.18), we get

        | i   j   k  |
a × b = | 1   1  -1  | = i·0 - j·1 + k·(-1) = -j - k.
        | 2   1  -1  |

Whence

S0 = √(1 + 1) = √2. ►
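Formula (2.17) coded as a short Python sketch (helper names are ours); it reproduces the area √2 found above:

    import math

    def cross(a, b):
        # coordinates of a x b, formula (2.17)
        x1, y1, z1 = a
        x2, y2, z2 = b
        return (y1 * z2 - y2 * z1,
                x2 * z1 - x1 * z2,
                x1 * y2 - x2 * y1)

    def length(v):
        return math.sqrt(sum(c * c for c in v))

    a, b = (1, 1, -1), (2, 1, -1)
    print(cross(a, b))          # (0, -1, -1), i.e. -j - k
    print(length(cross(a, b)))  # 1.414... = sqrt(2)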
(2) Compute the area of the triangle OAB shown in Fig. 2.36.

Fig. 2.36

◄ We clearly see that the area SΔ of OAB is half the area S0 of the parallelogram OACB. Computing the vector product of a = OA and b = OB, we obtain

        | i    j    k |
a × b = | x1   y1   0 | = (x1 y2 - x2 y1) k.
        | x2   y2   0 |

Whence

SΔ = (1/2) |x1 y2 - x2 y1|. ►

2.7 Mixed Products of Three Vectors


Scalar triple product. Let there be given three vectors a, b and c. We find the vector product a × b of a and b, and then the scalar product of a × b and c, i.e., the number (a × b)·c.
The number

(a × b)·c

is called the scalar triple product of the three vectors a, b and c.
Geometric interpretation of the scalar triple product. Let three vectors a, b and c be applied to some point O as shown in Fig. 2.37. If the points O, A, B and C lie in the same plane, i.e., if the vectors a, b and c are coplanar, their scalar triple product (a × b)·c is equal to zero, i.e.,

Fig. 2.37

(a × b)·c = 0, since the vector a × b is perpendicular to the plane containing the vectors a and b, and hence to c. In other words, the vectors a × b and c are perpendicular to each other and their scalar product is equal to zero.
When the points O, A, B, C do not lie in the same plane, i.e., when a, b and c are noncoplanar, we construct a parallelepiped with edges OA, OB and OC (Fig. 2.38).
By definition

a × b = Se,

Fig. 2.38

where S is the area of the parallelogram OADB and e is the unit vector perpendicular to both a and b such that the ordered triple of vectors (a, b, e) is right-handed, i.e., the thumb, the index finger and the middle finger of the right hand point along the directions of a, b and e, respectively.
Computing the scalar product of a × b and c, we obtain

(a × b)·c = (Se)·c = S(e·c) = S proj_e c.

The number proj_e c is equal to the height h of the parallelepiped if the angle φ between e and c is acute, i.e., if the ordered triple of vectors (a, b, c) is right-handed, and to the negative of the height if the angle φ is obtuse, i.e., if (a, b, c) is a left-handed ordered triple of vectors, so that

(a × b)·c = ±Sh = ±V.

Therefore the scalar triple product of a, b and c is numerically equal to the volume V of the parallelepiped with edges a, b, c when the triple (a, b, c) is right-handed, and to minus V when the triple (a, b, c) is left-handed.
The geometry reveals that multiplying the three vectors in any other order, we obtain either +V or -V.
Notice that when the triple (a, b, c) is right-handed, both (b, c, a) and (c, a, b) are also right-handed, while the triples (b, a, c), (a, c, b) and (c, b, a) are all left-handed. Thus

(a × b)·c = (b × c)·a = (c × a)·b = -(b × a)·c = -(a × c)·b = -(c × b)·a.

Once again we shall emphasize the following: the scalar triple product of three vectors a, b, c is equal to zero if and only if a, b and c are coplanar. In symbols,

a, b, c are coplanar   iff   (a × b)·c = 0.

Scalar triple product of three vectors specified by their coordinates. Let a = {x1, y1, z1}, b = {x2, y2, z2} and c = {x3, y3, z3} be three vectors specified by their coordinates with respect to the orthonormal basis (i, j, k). Let us express the scalar triple product (a × b)·c in terms of the coordinates of a, b and c.
We have

        | i    j    k  |
a × b = | x1   y1   z1 | = i(y1 z2 - y2 z1) + j(x2 z1 - x1 z2) + k(x1 y2 - x2 y1).
        | x2   y2   z2 |

Whence

(a × b)·c = x3 | y1  z1 | - y3 | x1  z1 | + z3 | x1  y1 | = | x1  y1  z1 |
               | y2  z2 |      | x2  z2 |      | x2  y2 |   | x2  y2  z2 | .
                                                            | x3  y3  z3 |

Thus the scalar triple product of three vectors specified by their coordinates with respect to the orthonormal basis (i, j, k) is equal to the third-order determinant whose rows are the coordinates of the vectors being multiplied.
The necessary and sufficient condition for three vectors a = {x1, y1, z1}, b = {x2, y2, z2} and c = {x3, y3, z3} to be coplanar may be written as

| x1  y1  z1 |
| x2  y2  z2 | = 0.
| x3  y3  z3 |

Example. Verify whether the vectors a = {7, 4, 6}, b = {2, 1, 1} and c = {19, 11, 17} are coplanar.
◄ The vectors are coplanar if

    |  7   4   6 |
Δ = |  2   1   1 |
    | 19  11  17 |

is equal to zero, and noncoplanar otherwise.
Expanding Δ with respect to the first row, we obtain

Δ = 7×6 - 4×15 + 6×3 = 0.

Thus we conclude that the vectors are coplanar. ►
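The coplanarity test amounts to one scalar triple product in code. A minimal Python sketch (illustrative helpers, not the book's) for the example above:

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        x1, y1, z1 = a
        x2, y2, z2 = b
        return (y1 * z2 - y2 * z1, x2 * z1 - x1 * z2, x1 * y2 - x2 * y1)

    def triple(a, b, c):
        # scalar triple product (a x b) . c
        return dot(cross(a, b), c)

    print(triple((7, 4, 6), (2, 1, 1), (19, 11, 17)))  # 0, so the vectors are coplanar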
Vector triple product. The vector triple product a × (b × c) of three vectors a, b and c is the vector perpendicular to both a and b × c. Hence the vector a × (b × c) lies in the plane determined by b and c and may be expanded with respect to b and c. It is easy to show that

a × (b × c) = b (a·c) - c (a·b).
Exercises
1. The vectors AB = c, BC = a and CA = b are the sides of a triangle. Express the medians of the triangle in terms of a, b and c.
2. Let p, q and p + q be three vectors applied to a common point. Find the relationship between p and q provided that p + q bisects the angle between p and q.
3. Compute the lengths of the diagonals of a parallelogram with sides a = 5p + 2q and b = p - 3q provided that |p| = 2√2, |q| = 3 and (p, q) = π/4.
4. Prove that the diagonals of a rhombus are perpendicular to each other.
5. Compute the scalar product of a = 4i + 7j + 3k and b = 3i - 5j + k.
6. Find the unit vector a° which is parallel to a = {6, 7, -6}.
7. Find the projection of the vector a = i + j - k onto the axis determined by the vector b = 2i - j - 3k.
8. Find the cosine of the angle between the vectors AB and AC, given A(-4, 0, 4), B(-1, 6, 7) and C(1, 10, 9).
9. Find the unit vector p° which is perpendicular both to the vector a = {3, 6, 8} and to the x-axis.
10. Find the sine of the angle between the diagonals of the parallelogram with sides a = 2i + j - k and b = i - 3j + k.
11. Find the height h of the parallelepiped with edges a = 3i + 2j - 5k, b = i - j + 4k and c = i - 3j + k provided that the base of the parallelepiped is the parallelogram with sides a and b.

Answers

1. AM = c + a/2 = (c - b)/2; BN = a + b/2 = (a - c)/2; CP = b + c/2 = (b - a)/2. 2. |p| = |q|, since the diagonal of a parallelogram bisects the angle only if the parallelogram is a rhombus. 3. |a + b| = 15, |a - b| = √593. 5. -20. 6. a° = {6/11, 7/11, -6/11} or a° = {-6/11, -7/11, 6/11}. 7. proj_b a = 4/√14. 8. cos φ = 1, φ = 0. 9. p° = ±{0, -4/5, 3/5}. 10. sin φ = √(248/273). 11. h = 49/√323.
Chapter 3
The Line and the Plane

3.1 The Plane


Normal and general equations of a plane. We may determine a plane
in space by specifying, relative to a given point O, two quantities, namely
(1) the perpendicular distance p between O and the plane, i.e., the length
of the perpendicular OT dropped from O onto the plane, and (2) the unit
vector n° which starts at O and is perpendicular to the plane, as shown
in Fig. 3.1.

Fig. 3.1

Suppose that an arbitrary point M moves in the plane π. Whatever the path of motion, the position vector r = OM varies so that the relation

proj_n° OM = p     (3.1)

remains valid unless M leaves the plane. In other words, relation (3.1) describes a property peculiar to all points of the plane, i.e., relation (3.1) is the equation of a plane. Observing that proj_n° OM = r·n°, we may write (3.1) as

r·n° - p = 0.     (3.2)

Equation (3.2) is called the normal vector equation of a plane. The position vector r is sometimes referred to as a moving position vector.
Let us set up in space a Cartesian coordinate system with the origin at O. Then n° = {cos α, cos β, cos γ} and r = {x, y, z}, and equation (3.2) takes the form

x cos α + y cos β + z cos γ - p = 0,     (3.3)

in which case
(1) cos²α + cos²β + cos²γ = 1,
(2) the term (-p) is nonpositive.
Equation (3.3) is called the normal coordinate equation of a plane.
Since (3.3) is a first-degree equation in x, y and z, we conclude that in three-dimensional space a plane is determined by a first-degree equation in the Cartesian coordinates x, y and z.
The converse is also true, namely any first-degree equation in the Cartesian coordinates x, y and z defines a plane.
◄ Indeed, let us consider a general first-degree equation

Ax + By + Cz + D = 0   (A² + B² + C² > 0).     (3.4)

Multiplying (3.4) by a scalar μ, we obtain

μAx + μBy + μCz + μD = 0.     (3.5)

We choose μ such that (3.5) reduces to the normal equation (3.3). It suffices to set

μA = cos α,  μB = cos β,  μC = cos γ,  μD = -p.     (3.6)

From (3.6) we obtain

μ²(A² + B² + C²) = 1.

Whence

μ = ± 1/√(A² + B² + C²).     (3.7)

From μD = -p ≤ 0 it follows that the sign in (3.7) must be opposite to the sign of D. Substituting (3.7) into (3.6), we find expressions for cos α, cos β, cos γ and p in terms of the coefficients of equation (3.4). In other words, we reduce equation (3.4) to the normal coordinate equation (3.3). The scalar μ is said to be a normalizing factor. Since the normal coordinate equation defines a plane, so does equation (3.4), called the general equation of a plane in three-dimensional space. ►
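The normalizing factor is easy to apply in code. A minimal Python sketch (the function name is ours) that returns the direction cosines and p for a general plane equation:

    import math

    def normalize_plane(A, B, C, D):
        # mu takes the sign opposite to D, so that mu*D = -p <= 0
        norm = math.sqrt(A * A + B * B + C * C)
        mu = -1.0 / norm if D > 0 else 1.0 / norm
        # returns (cos alpha, cos beta, cos gamma, p)
        return mu * A, mu * B, mu * C, -mu * D

    print(normalize_plane(2, -2, 1, -6))  # (0.666..., -0.666..., 0.333..., 2.0)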
Therefore, any first-degree equation in x, y and z defines a plane in
space as a set of points whose coordinates satisfy this equation.
We shall call any nonzero vector perpendicular to a given plane the normal vector to this plane. This implies that the vector n = {A, B, C} is a vector normal to the plane (3.4). Hence, A, B and C are easy to interpret as the coordinates of the vector n = {A, B, C} normal to the plane (3.4) (see Fig. 3.2). Any other normal vector is obtained by multiplying n by a nonzero scalar.

Fig. 3.2

Equation of a plane which passes through a given point and is perpendicular to a given direction. We wish to find an equation of a plane which passes through a point M0 specified by the position vector r0 = {x0, y0, z0} and is perpendicular to the vector n = {A, B, C}.
Let r = {x, y, z} be the position vector of an arbitrary point M of the plane (Fig. 3.3). The vector M0M = r - r0 lies in the plane and, consequently, is perpendicular to the vector n. So the scalar product of the vectors r - r0 and n is equal to zero, i.e.,

(r - r0)·n = 0.     (3.8)

The identity (3.8) remains valid for all points of the plane and does not hold for any point outside the plane. Hence, (3.8) is the vector equation we are seeking. Expressing (3.8) in terms of the coordinates of r - r0 and n, we obtain the desired equation

A(x - x0) + B(y - y0) + C(z - z0) = 0.     (3.9)

Fig. 3.3
The angle between two planes. We consider two planes specified by the equations

A1x + B1y + C1z + D1 = 0   (A1² + B1² + C1² > 0),
A2x + B2y + C2z + D2 = 0   (A2² + B2² + C2² > 0).

The angle between the two planes is either of the two dihedral angles formed by these planes (Fig. 3.4). When two planes are parallel, the angle between them is equal either to zero or to π.

Fig. 3.4

We easily see from Fig. 3.4 that one of the two dihedral angles is equal to the angle φ between the vectors n1 = {A1, B1, C1} and n2 = {A2, B2, C2}, whence we have

cos φ = n1·n2 / (|n1| |n2|) = (A1A2 + B1B2 + C1C2) / (√(A1² + B1² + C1²) · √(A2² + B2² + C2²)).     (3.10)

Conditions of perpendicularity and parallelism of two planes. If two planes are perpendicular to each other, their respective normal vectors are also perpendicular. Hence, there holds

n1·n2 = 0

or

A1A2 + B1B2 + C1C2 = 0.

This is the condition of perpendicularity of two planes.

If two planes are parallel, their respective normal vectors are collinear, i.e., n1 = λn2. Expressing this identity in terms of the coordinates of n1 and n2, we obtain

A1 = λA2,  B1 = λB2,  C1 = λC2,

or

A1/A2 = B1/B2 = C1/C2.

This is the condition of parallelism of two planes.
Problem. Find the equation of the plane passing through three noncollinear points M1(x1, y1, z1), M2(x2, y2, z2) and M3(x3, y3, z3).

3.2 Straight Line in a Plane


An approach identical to that we used in preceding sections is fully
applicable when we consider a (straight) line in a plane.
Here we present the major results with reference to accompanying
figures.
1. Standard forms of the equation of a line:
(a) The normal vector equation (Fig. 3.5)
r • n° - p = 0.

Fig. 3.S Fig. 3.6

(b) The normal coordinate equation (Fig. 3.6)


x cos 'P + y sin <,e - p = 0.
(c) The general equation (Fig. 3.7)
Ax + By + C =0 (A 2 + B2 > 0),
where A and Bare coordinates of the vector n = (A, B} normal to the line.
4*
52 3. The Line and the Plane

Multiplying the general equation by


1
µ, = ± --;:===
✓A 2 -.+--B2
we may reduce the general equation to the normal coordinate equation.

IJ:lc%+b
katan a

Fig. 3.7 Fig. 3.8

The sign of µ, is chosen so that


µ,C = -p' 0.
We mention two interesting cases:
(a) If B ¢ 0, then setting k =- ~ and b =- i (see Fig. 3.8), we
obtain the slope intercept form of the equation of a line
y=kx+b.
(b) If ABC ¢ 0, then setting a = - ~ and b = - i (Fig. 3.9), we
obtain the intercept form of the equation of a line
;+~=1.

Fig. 3.9 Fig. 3.10


3.2 Straight Line in a Plane 53

2. The distance between the point M* and the line L is the length of
the perpendicular segment M*N dropped from M* onto L (Fig. 3.10).
Let M*(x*, y*) be a given point and
x cos <P + y sin '/J - p =0
be a normal coordinate equation of L. Then the distance between M* and
Lis

d = d(M*, L) = Ix* cos '/J + y* sin <P - Pl•


◄ Let r be a position vector of an arbitrary point Mon L. Then r • n° = p.
We denote the position vector of M* by r*. The difference r* • n° - p
is equal to the projection of the vector r* - r onto the axis L* determined
b y nO, 1.e.,

r* • n° - p = r* • n° - r • n° = (r* - r) · n° = projL•(r* - r),


as shown in Fig. 3.11.

Fig. 3.11

Finding the absolute value of the above identity, we obtain


Ir*· n° - Pl= d(M", L)
or, in terms of coordinates,
d(M*, L) = ix- cos <P + y* sin <P - Pl• ►

If L is specified by the general equation


Ax+ By+ C = 0,
54 \3. The Line and the Plane
\

the distance between ~ and L is


d(M*, L) = IAx• +By•+ Cl.
✓A2 + B2
Remark. Similarly, we may define the distance between a point and ~
plane. If a plane is given by the general equation
Ax+By+ Cz+D=O,
then the distance between the plane and the point M*(x*, y•, z*) is
IAx• +By•+ cz• + DI
d(M*, 1r) = ---;======:----
✓A 2 + B2 + c2

Fig. 3.12 Fig. 3.13

3. The equation of a line which passes through a given point


Mo (Xo, y0) and is perpendicular to a given normal vector n =
(A, B) ;ii! 0 (Fig. 3.12) is
A (x - Xo) + B(y - Yo) = 0.
4. The angle between two lines A1x + B1y + C1 =0 and A2x +
&,y + C2 = 0 (Fig. 3.13) is given by
A1A2 + B1lh.
cos 'P = ----;==.:===.:=---;=::;:==::;;;=•
✓A~+ B~ ·YA~+ B~
The two lines are
(a) perpendicular to each other iff
A1A2 + B1lh. = 0,
(b) parallel iff
Ai B1
A2 - 1h. •
3.3 Straight Line in Three-Dimensional Space 55

3.3 Straight Line in Three-Dimensional Space


Equation of a line. The line is uniquely determined in space by a
position vector ro of a point Mo belonging to the line and a nonzero vector
s parallel to this line. The vector s is called the direction vector of the line.
Let r = OM be a position vector of an arbitrary point M on the line.
From Fig. 3.14 we easily see that
._.. --+- --..
OM = OMo + MoM. (3.11)
The vector MoM
is parallel to the vector s so that MoM
= st, where
t is a scalar whose value depends on the position of Mon the line. Hence,
.(3.11) may be written as
r = ro + st. (3.12)
Equation (3.12) is called the parametric vector equation of a line.

Fig. 3.14

We set up a Cartesian coordinate system with the origin at 0. Let us


denote the coordinates of Mo, i.e., coordinates of ro, by Xo, Yo, Zo; the coor-
dinates of an arbitrary point M, i.e., the coordinates of r, by x, y, z; and
the coordinates of the direction vectors by I, m, n. Then the vector equa-
tion (3.12) becomes
x = Xo + It, y = Yo + mt, z = Zo + nt. (3.13)
Equations (3.13) are called the parametric equations of a line. The num-
bers I, m and n are referred to as direction numbers of a line.
Eliminating t from (3.13), we obtain

x- Xo Y -Yo z - 7A>
I = t, ---= t, - - - = t.
m n
56 3. The Line and the Plane

Whence
x- Xo Y - Yo z- ZO
I m - n , (3.14)
where Xo, Yo, zo are the coordinates of Mo belonging to the line and I, m,
n are the direction numbers (/2 + m 2 + n 2 > 0).
Relation (3.14) is said to be the point direction equation of a line.
Any two equations obtained from (3.14), say
X - Xo Y - Yo Y - Yo Z - ZO
- - - - - - - and - - - -
! m m n '
define a line as a line of intersection of two planes.
Equation of a line passing through two given points. We wish to find
equation of a line passing through the points M1 (x1, Y1, z1) and
M2(X2, Y2, z2). Suppose we are seeking for the point direction equation of
a line. So we need to know the coordinates of any point belonging to the
line and the direction vector. Let us select M1 (x1, Y1, z1) as the point we
---.
need and define the direction vector as M1M2 = {x2 - X1, Y2 - Y1,
z2 - z1}. Then, the desired equation is
X - X1 Y - Yt Z - Zl
(3.15)

Remark. If M1 and M2 lie in the xy-plane, i.e., if z1 = z2 = 0, the equa-


tion of a line passing through M1 and M2 is
y -y1
Y2 -yi
Example. Find the equation of a line passing through M1(l, 0, -1) and
M2(3, 1, 1).
◄ Using (3.15), we obtain the desired equation as

x-1 y z+l
-- ►
2- =
. -
1 = - 2- - .
General equation of a line. Reducing the general equation to the point
direction equation of a line. A line in space may be determined as a line
of intersection of any two distinct nonparallel planes.
Let two distinct nonparallel planes be specified by the equations
A1x + B1y + C1z + Di = 0 and A2x + B2Y + C2z + Di. = 0.
Then the system of the two equations

[ A1x + B1Y + C1z + D1 = 0, (3.1 6)


A2x + B2y + C2z + Di. =0
is said to represent the general equation of a line.
3.3 Straight Line in Three-Dimensional Space 57

System (3.16) may be reduced to the point direction equation of a line.


This reduction requires a point belonging to the line and a direction vector
be known.
We may find the coordinates of a point belonging to the line from (3.16)
by setting one unknown equal to an arbitrary value and solving the system
of two equations thus obtained to get the other two coordinates of the
point we are seeking for. Notice that the direction vector s being of the
same direction and lying in the line of intersection of the given planes must
be perpendicular to the normal vectors n1 = {A1, B1, C1 } and n2 =
{A2, B2, C2 }. Conversely, any vector perpendicular to n1 and n2 is parallel
to both planes, i.e., this vector is parallel to the line of intersection of the
planes. The vector product n1 ,c 02 is known to be perpendicular both to
~1 and n2. Hence, we may take s = n1 ,c n2 as the direction vector of the
plane.
Example. Reduce

[
X - y +Z- 3 = 0, (3.17)
X + y - 2z + 1 = 0
to the point direction equation of a line.
◄ We find a point belonging to the line (3.17). Setting, for example, z = 0,
we arrive at the system

[
X - y = 3,
x+y=-1.
Whence, x = 1 and y = -2. Thus Mo(l, -2, 0) belongs to the line. ►-
The normal vectors to the planes are n1 = {1, -1, 1 } and
n2 = {l, 1, -2). Then the direction vector of the line is
i j k
S = n1 ,c n2 = 1 - 1 1 =i + 3j + 2k.
1 1 -2
Hence, the point direction equation of a line specified by the system
(3.17) is
X - 1 y +2 z
----=- ►
1 3 2·
Angle between a line and a plane. Let there be given a line
X - Xo _ Y - Yo _ Z - Zo (3.18)
I m n
and a plane
Ax + By + Cz + D = 0. (3.19)
58 3. The Line and the Plane

The angle 'P between the line and the plane is the smallest angle between
the line and its projection onto the plane (Fig. 3.15).
Let a be an angle between the line (3.18) and the vector normal to the
plane (3.19). Then
n•s
cos a = lnl • Isl '
where s is the direction vector of the line (3.18) (see Fig. 3.16).

Fig. 3.15

Observe that lcos al = sin <p. Then


. + Cnl
IA/ + Bm
Sln '/) = ~;:::::======--;:::=======:::=--. (3.20)
✓ A 2 + B2 + c2 . ✓ 12 + m2 + n2

When the line (3.18) is parallel to the plane (3.19), t~e direction vector
s of the line is perpendicular to the vector n normal to the plane so that
n •s =0
or
Al + Bm + Cn = 0. (3.21)

- n

/
/

sin 'f =-COS a. sin So =COS a.


'IT
; <0:<77' O<ct<T

Fig. 3.16
3.3 Straight Line in Three-Dimensional Space 59

This is the condition of parallelism of a line and a plane.


When the line (3.18) is perpendicular to the plane (3.19), the vectors
s and n are parallel, i.e., s II n and
A B C
T =m =n (3.22)

This is the condition for a line and a plane to be perpendicular.


Intersection of a line and a plane. Let there be given a line
x- Xo y-yo z-zo
(3.23)
I m n
and a plane
Ax + By + Cz + D = 0. (3.24)
The coordinates of the point at which the line (3.23) and the plane (3.24)
intersect are to satisfy both (3.23) and (3.24). Thus, we must solve the sys-
tem containing (3.23) and (3.24) in three unknowns x, y, z to obtain the
point at which the line and the plane meet.
We reduce equation (3.23) to the parametric equations
x = Xo + It, y = Yo + mt, z = zo + nt. (3.25)
Substituting (3.25) into (3.24), we obtain
AXo + Byo + Czo + D + t(AI + Bm + Cn) = 0.
Whence
A.xo+Byo+Czo+D
t =- Al + Bm + Cn (Al + Bm + Cn ¢ 0). (3.26)

Substituting (3.26) into (3.25), we obtain the coordinates of the desired


intersection point.
If Al + Bm + Cn = 0 and AXo + Byo + Czo + D ¢. 0, the line (3.23)
is parallel to the plane (3.24) and the point (Xo, Yo, zo) does not belong
to the plane. Hence, the line (3.23) and the plane (3.24) have no points
in common.
When Al + Bm + Cn = 0 and AXo + Byo + Czo + D = 0, the former
equality implies that the line and the plane are parallel to each other and
by virtue of the latter equality the point (Xo, Yo, zo) belongs to the plane
(3.24). Hence, the line (3.23) is contained in the plane (3.24).
Example. Find the point of intersection of the line
x-7 y-3 z+l
3 - 1 - -2
(3.27)
60 3. The Line and the Plane

with the plane


2x + y + 7z - 3 = O. (3.28)

◄ We reduce (3.27) to the parametric equations


X = 1 + 3(, y = 3 + t, Z = -1 - 2t. (3.29)

Then substituting (3.29) into (3.28), we obtain


6t + 14 + t + 3 - 14( - 7 - 3 = 0
or
-7(+7=0.
Whence, t = 1.
Now we get from (3.29) the coordinates x = 10, y = 4, z = -3 of the
intersection point. ►
Problem. Find the distance between the point M1 (x1, Y1, z1) and the line
x- Xo z - Zo
I n

Exercises

1. Write the equation of (a) the plane which is parallel to the


xz-plane and passes through the point (2, -5, 3); (b) the plane which con-
tains the z-axis and passes through the point ( - 3, 1, -2).
2. Given the points A(l, 3, -2) and B(7, -4, 4), write the equation of
a plane which passes through B and is perpendicular to the segment AB.
3. Write the equations of (a) the plane which passes through the point
( - 2, 7, 3) and is parallel to the plane x - 4y + 5z - 1 = 0; (b) the plane
which passes through the origin of coordinates and is perpendicular to the
planes 2x - y + Sz + 3 = 0 and x + 3y - z - 1 = 0.
4. Write the equation of the plane, such that the point P(3, - 6, 2) is the
foot of the perpendicular dropped from the origin of coordinates onto the
plane.
5. Write the equation of the plane passing through the origin of coordinates
and the points A(3, -2, 1)° and B(l, 4, 0).
6. Write the equation of the line passing through the origin of coordinates
and the point (a, b, c).
7. Compute the angle between the lines
x-1 y+2 z-5 x y-3 z+1
- - - - - - - - - - - and
3 6 2 2 9 6
Exercises 61

(The angle between two given lines is that between any two lines which
pass through an arbitrary point in space and are parallel to the given lines.)
8. Reduce the equations of the line
2x - 3y - 3z + 9 = O,
[
X - 2y + Z + 3 = 0

to the point direction equation.


9. Write the equation of the line which passes through the point
(2, - S, 3) and is pafallel to (a) the z-axis, (b) the line
x-1 y-2 z+3
4 - -6 - 9
(c) the line

f 2x - y + 3z - 1 = 0,
lsx + 4y - z - 7 = o.
10. Write the equation of the line which passes through the point
(1, -1, 0) and is perpendicular to the plane 2x - 3y + 5z - 7 = 0.
. o f (a) t he 1·1ne x - 7 = y - 4 = z - 5
. o f.1ntersect1on
11• F'1nd t he point 5 1 4

with the plane 3x - y + 2z - 5 = 0; (b) the line x ; 1 - Y ~ 3 =;


with the plane 3x - 3y + 2z - S = 0.
12. Write the equation of the plane which passes through the origin of
.
coord1nates . perpend'1cular to t h e 1·1ne x + 2 - y - 3 - z _- 1 •
and 1s 4 5 2
13. Find the projection of the point A (4, - 3, 1) onto the plane
X + 2y - Z - 3 = 0.
14. Write the equation of the plane which passes through the point
(4, - 3, 1) and is parallel to the lines
X _y Z x+l y-3 z-4
6 - 2 - -3 and
5 - 4
- 2

15. Write the equation of the plane which passes through the line
x - 3 _ Y + 4 _ z - 2 and is parallel to the line x ~ 5 -
2 1 -3
y-2 z-1
7 - 2 •
62 3. The Line and the Plane

Answers
LOOy+S=~OOx+~=Q1~-~+~-~=Q100x-~+~+
15 = O; (b) 2x - y - z = ~ 4. 3x - 6y + 2z - 49 = O. 5. 4x - y - 14z = 0.
6• x = yb = -z . 7• cos 'P = -
72. 8. -x = -y = -
5
Z +
-3; provi'ded t hat t he point
. 1ymg
. on t he
0
C 77 9 -1
x-2
line is (0, 0, ;_ 3). 9. (a) 0 = y+S
0 = z-3
1 or
[x-2=0
Y + s = 0' (b)
x-2
4 =

y + S = Z - 3 . (c) X - 2 = y + S = Z - 3 . lO. x - 1 = y + 1 = :_ .
-6 9 -11 17 13 2 -3 S
11. (a) (2, 3, l); (b) the line is parallel to the plane. 12. 4x + Sy - 2z = 0. 13. (S, -1, 0).
14. 1~ - 27y + 14z - 159 = 0. 15. 23x - 16y + 10: - 153 = 0. ·
Chapter 4
Curves and Surfaces of the Second Order

4.1 Changing the Axes of Coordinates in a Plane


Suppose that we are given two Cartesian coordinate systems Oxy
and O 'x' y' in a plane (Fig. 4.1). We may indicate the location of an ar-
bitrary point Min a plane by giving its coordinates (x, y) on the x- and
y-axes or its coordinates (x' , y ') on the x ' - and y '-axes. It is clear that
the coordinates (x, y) and (x', y ') are somehow related. We shall try to
ascertain relationships between (x, y) and (x', y') and, consequently, those
between the axes of the two Cartesian coordinate systems in a plane.
y

0 X
Fig. 4.1

Translation of the axes of coordinates. Suppose that the Cartesian coor-


dinate system O 'x 'y ' is obtained by moving the x- and y-axes parallel to
themselves so that the origin of coordinates O goes into the point O'
(Fig. 4.2). This means that the unit vectors of the parallel axes are
equivalent.
y'

O' I I'

0 I

Fig. 4.2
64 4. Curves and Surfaces of the Second Order
:..:.:....=..:...:=-:...=-:..=.=:......=;.;=~---------·--

Let r and r' be position vectors of a point M relative to the origins


0 and 0 ' of the given Cartesian coordinate systems 0xy and 0 'x 'y'
(Fig. 4.3). Then we have
r =xi+ yj,
r' = x' i + y 'j,
and
~

00' = ai + /3j,
where a and /3 are the coordinates of the origin 0' on the x- and y-axes,
respectively.
Since
r= r' + 00',
we obtain
xi+ yj = (x'i + y'i) + (ai + {3j).
Whence
x = x' + a,
y = y I + {3.

Fig. 4.3 Fig. 4.4

Rotation of the axes of coordinates. Suppose that the x ' - and y '-axes
are obtained by rotating the x- and y-axes through an angle 'P so that the
origins of the two Cartesian coordinate systems coincide (Fig. 4.4).
We find coordinates of the unit vectors i' and j ' relative to the Cartesian
coordinate system 0xy (Fig. 4.5). It is easy to see that the coordinates of
the unit vector i' are the cosines of the angles 'P and ; - 'P made by the
unit vector i ' with the x- and y-axes, respectively. Thus, we may write
i' = i cos 'P + j sin 'P·
4.1 Changing the Axes of Coordinates in a Plane 65

Similarly, the coordinates of the unit vector j' are the cosines of the
angles cp + ; and cp so that
j' = -i sin cp + j cos cp.

Since for an arbitrary point M its position vectors r = xi + yj and


r' = x' i' + y 'j' are equivalent, 1.e.,

xi + yj = X I i' + y j ''
I

then substituting the expressions for i' and j ' into the above identity, we
obtain
xi + yj = x' (i cos cp + j sin cp) + y' ( - i sin cp + j cos cp)
= (x' cos 'P - y' sin cp )i + (x' sin cp + y' cos cp )j.

X
0 x,x 1

Fig. 4.5 Fig. 4.6

Whence
x = x' cos 'P - y' sin 'P,
y = x' sin cp + y' cos cp.

Reflection of the axes of coordinates. Suppose that the Cartesian coor-


dinate system O 'x 'y' is obtained from the system Oxy by reversing the
y-axis as shown in Fig. 4.6. Then for an arbitrary point M its coordinates
(x, y) and (x' , y ' ) are related as
x' = x and y' = - y.
5-9505
66 4. Curves and Surfaces of the Second Order ______________


I
I
I
y I
x'

--- ---------

0 X

Fig. 4.7

Any change from one Cartesian coordinate system to another one with
the same unit distance may be made using in succession the translation,
rotation and reflection of the axes (Fig. 4. 7).

4.2 Curves of the Second Order


Suppose that we are given a Cartesian coordinate system in a plane~
A locus of points of a plane whose coordinates satisfy the equation
F(x, y) = 0,
where F(x, y) is a function of two variables, is called a plane cw·ve. The
equation (•) is said to be the equation of a plane curve.
For example, the equation x + y = 0 is the equation of a line which
bisects the second and the fourth quadrants of the coordinate plane
(Fig. 4.8), and the equation x 2 + y 2 - 1 = 0 is the equation of a circle with
unit radius and centre at the origin of coordinates (Fig. 4.9).
We consider the quadratic polynomial of two variables x and y
F(x, y) = Ax2 + 2Bxy + Cy 2 + 2Dx + 2Ey + F
(A 2 + B2 + C2 > 0).

y
y

X X
1

Fig. 4.8 Fig. 4.9


4.3 The Ellipse 67

The equation
F(x, y) =0
is said to be the equation of a curve of the second order.
While curves of the first order are straight lines (and only straight lines),
the quadratic polynomial equation provides a variety of curves of the se-
cond order. So it is helpful to precede our study of the general equation
of a curve of the second order with an investigation of some important
specific plane curves of the second order.

4.3 The Ellipse


The ellipse is a plane curve defined by the Cartesian equation
x2 Y2
-+-=l (4.1)
a2 bi '

where a ~ b > 0.
Equation (4.1) is called the standard Cartesian equation of the ellipse
and the associated Cartesian coordinate system is called the standard Carte-
sian coordinate system.
y

Fig. 4.10

The circle
x2 + y2 = a2 (4.2)
is an ellipse with a = b. This enables us to regard the ellipse (4.1) as a plane
figure obtained by uniformly compressing the circle (4.2) with coefficient
b/a towards the x-axis (Fig. 4.10). In other words, equation (4.1) of an ellipse
is obtained by substituting (alb )y for y in the equation x 2 + y 2 = a2 • (The
I '
'I
68 4. Curves and Surfaces of the Second Order

uniform compression of a circle towards the x-axis with coefficient k > 0


is a transformation which sends an arbitrary point M(x, y) on the circle
to the point M' (x, y/k).)
Properties of the ellipse. (1) The ellipse (4.1) is contained in the rectangle
P = ((x, Y) : lxl ~ a, IYI ~ b ).

◄ This property is easy to verify since


y2
and - ~ I
b2
for any point M(x, y) lying on the ellipse. ►
The points (±a, 0) and (0, ± b) are called the vertices of the el-
lipse (Fig. 4.11).

y
y

-a o a I
// 0
I I /
I I /
/

L- --~------=-=--
-b
- -J ..- /

(-xo,Yo)
Fig. 4.11 Fig. 4.12

(2) For the ellipse (4.1) the x- and y-axes of the standard Cartesian coor-
dinate system are the axes of symmetry and the origin O of coordinates
is the centre of symmetry. This means that when Mo(X-0, Yo) belongs to
the ellipse, so do the points ( -X-O, Yo), ( -X-O, -yo) and (X-O, -Yo) (Fig. 4.12).
(3) If a > b, that is, unless the ellipse is a circle, the x- and y-axes of
the standard Cartesian coordinate system are the unique axes of symmetry
of the ellipse (4.1).
Set c = ✓ a2 - b 2 • It is easy to see that c < a. The points ( - c, 0) and
(c, 0) are called the left-hand and right-handfoci of the ellipse. The distance
between the foci is equal to 2c.
(4) The ellipse is a locus of points of a plane such that the sum of
the distances of these points from two given points (from the foci of the
ellipse) is constant (is equal to a given number) (Fig. 4.13).
69

◄ Let M(x, y) be a point lying on the ellipse


x2 Y2
-a2+ -b2
=1.

The distance from M(x, y) to the left-hand and to the right-hand foci
of the ellipse are
Q1= ✓ (x+c) 2 +Y 2 and Qr= ✓ (x-c) 2 +y 2 .

Fig. 4.13

By the substitution

y2 = b(1 - ~)
2

we easily obtain

QI=

c2 C C
- x2 +2cx+a2 - -x+a =a+ -x,
a2 a a

because lxl ~ aand£ < 1.


a
Similarly, we obtain
C
Qr= a - - x.
a
Adding QI and Qr, we have the desired result
QI+ Qr= 2a.
.
Notice that we have proved in Chapter 1 (see Section 2, Exercise 2) that
any point possessing this property lies on the ellipse. ►
70 4. Curves and Surfaces of the Second Order

The number e = £ is called the eccentricity of the ellipse (4.1). It is


a
easy to see that O < e < 1, and e = 0 in the case of a circle.
The lines
a a
x + - = 0 and x - - = 0
e e
are called the directrices of the ellipse (4.1) {Fig. 4.14). Any ellipse has two
directrices.
(5) The ellipse is a locus of points of a plane such that for any point
of the locus the ratio of its distance from a given point (from the focus
of the ellipse) to its distance from a given line (from the directrix lying
on the same side of the y-axis as the focus) is constant (is equal to the
eccentricity of the ellipse).

y
y I
I
I
I
I
I
I
I _g_
e
'al-e
I
-a a I X X
I
I I
I I
I I
I I
I I
Fig. 4.14 Fig. 4.15

◄ Let M(x, y) be a point on the ellipse (4.1) (Fig. 4.15). The distances
from M(x, y) to the right-hand focus and to the right-hand directrix are
a
er = a - ex and dr = - - X.
e
Whence we obtain the desired result
er
dr = e.

Similarly, we obtain
_g!_ = a + ex = e.
di a +x
e
Now we consider the point (c, 0) and the line x = ae (c = ae).
The distance from the point M(x, y) to the given point (c, 0) and. to
_______ 4.4 The Hyperbola ______ ·--···--··-- 71

.
t he given 1·1ne y = -a are
e
a
✓ (x - c) 2 + y 2 and - X '
e
respectively.
We require that
✓ (x - c)2 + y2

I: I -x
= e.

Then
✓ (x - c)2 + y 2 = la - exl.
Squaring the above identity and setting b 2 = a2 - c2 , we easily obtain
x2 y2
-a2+ -b2
= 1.

Thus the point M(x, y) belongs to the ellipse (4.1). ►

4.4 The Hyperbola


The hyperbola is a plane curve given ·by the Cartesian equation
x2 y2
-a2 - -b2 = I ' (4.3)

where a > 0 and b > 0.


Equation (4.3) is called the standard Cartesian equation of the hyperbo-
la and the associated Cartesian coordinate system is called the standard
Cartesian coordinate system.
Properties of the hyperbola. (1) The hyperbola (4.3) lies outside the strip
lxl < a (Fig. 4.lg).
◄ It is easy to see that for any point lying on the hyperbola there holds

x2 y2
-2=l+-2~l.
a b
Therefore lxl ~ a for any point lying on the hyperbola. ►
The points (±a, 0) are the vertices of the hyperbola.
(2) The hyperbola (4.3) lies in the interior of the vertical angles formed
by the lines y = ±bx, the x-axis being the bisector of these angles
a
(Fig. 4.17).
72 4. Curves and Surfaces of the Second Order

◄ From the inequality


x2 Y2
a2 > --,;r
it follows that for any point M(x, y) lying on the hyperbola
lYI <·b
a
lxl
holds. ►

a r

Fig. 4.16 Fig. 4.17

Thus the hyperbola contains two parts called the branches of the
hyperbola.
The lines
x + Y = 0 and x - Y = 0
a b a b
are called the asymptotes of the hyperbola.
(3) The hyperbola contains points infinitely distant from the origin 0
of coordinates.
◄ For example, let M(x, y) be a point on the hyperbola and let IYI = n,
where n is an arbitrary positive number (Fig __ 4.18).

The~I =a TT> : lvl = : n


and

Let us establish relations between points lying on the hyperbola and


points belonging to the asymptotes of the hyperbola.
73

We choose a point M lying on the hyperbola (4.3) in the first quadrant


of the standard Cartesian coordinate system and a point N belonging to
the asymptote ; - t = 0 and having the same abscissa as M (Fig. 4.19).
These points are

0 a+b I X
n~

Fig. 4.18 Fig. 4.19

The distance between the points M and N is

d(M, N) =~ (x - ✓x2 - a 2 ) ·

Multiplying and dividing d(M, N) by x + ✓ x2 + a2 , we obtain


ab
d(M, N) = ✓ .
X + x2 - a2
Passing to the limit we conclude that d(M, N) tends to zero as x tends
to infinity.
Thus we have proved the following property of the hyperbola:
(4) If a point recedes along the asymptote from the origin of coordinates
into infinity, i.e., if x ➔ + oo, then there exists a point lying on the hyperbola
and having the same abscissa as the point on the asymptote so that the
distance between these two points tends to zero.
The converse is also true, namely:
(5) If a point M(x, y) recedes along the hyperbola from the origin of
coordinates into infinity, i.e., if x2 + y 2 ➔ oo, then the distance from
74 4. Curves and Surfaces of the Second Order

M(x, y) either to the asymptote


X +Y =O
a b
or to the asymptote
X y =0
a fJ
tends to zero.
(6) The axes of the standard Cartesian coordinate system are the axes
of symmetry for the hyperbola (4.J) and the origin of coordinates is its
centre of symmetry (Fig. 4.20).

';:-,_

(-Xo, · · · · · · · · · · · · · · · · · · · · · · ·· , oJ
/
'- ' /
' /
/
X

/0'
//
/
'-
'
.......... · · · · · · · · · · · · · Yo)

~ ~
;,,

Fig. 4.20

The axes of the standard Cartesian coordinate system are the unique
axes of symmetry of the hyperbola.
Set c = ✓ a2 + b 2 • It is easy to see that c > 0. The points ( - c, 0) and
(c, 0) are called the foci of the hyperbola. The distance between the foci
is 2c.
(7) The hyperbola is a locus of points of a plane such that for any point
the absolute value of the difference of its distances to two given points
(to the foci of the hyperbola) is constant (is equal to a given number).
◄ The proof is similar to that of Property 4 for the ellipse. For example,
we shall show that each point of the hyperbola possesses this property. Let
M(x, y) be a point lying on the hyperbola (4.3) (Fig. 4.21). Then the dis-
tances from M(x, y) to the foci of the hyperbola are

Qi = ✓ (X + C) 2 + y2 - C
a+-x
a
and
Qr = ✓ (X - C) 2 + y2 - a-~+
4.4 The Hyperbola 75

.
S1nce -C > 1,
a
a + £ x if x ~ a,
a
Qt=
- a - £ x if x ~ - a
a

!:I

--- - --- -- ,y)

Fig. 4.21

and
C
-a+-x if X ~ a,
a
Qr=
C
a--x if X ~ -a.
a
Whence
if X ~ a,
QI - Qr=
[_: if x~ -a.
Therefore we may write
le, - erl = 2a. ►
The number e = £ is called the eccentricity of the hyperbola (4.3). It
a
is easy to see that e > 1.
The lines
x+a =0 and x-a =0
e e
are called the directrices of the hyperbola (4.3).
76 4. Curves and Surfaces of the Second Order___ ___ . ---------··--------· - - - -

a 0 a X
-el le
I I
I I
I I
I I
I I
I I
I I
I I
Fig. 4.22

Any hyperbola has two directrices (Fig. 4.22).


The following property is similar to Property 5 of the ellipse:
(8) The hyperbola is a locus of points of a plane such that for any point
the ratio of its distance from a given point (from the focus of the hyperbola)
to its distance from a given line (from the directrix lying .on the same side
of the y-axis as the focus) is constant (is equal to the eccentricity of the
hyperbola) (Fig. 4.23).

Fig. 4.23
4.5 The Parabola 77

Fig. 4.24

The hyperbola
x2 y2 -
a2 - b2 - -1 (4.4)

is called the conjugate hyperbola of the hyperbola (4.3). Locations of the


hyperbolas (4.3) and (4.4) are shown in Fig. 4.24.

4.5 The Parabola


The parabola is a plane curve defined by the Cartesian equation
y2 = 2px, (4.5)
where p > 0.
Equation (4.5) is called the standard Cartesian equation of the parabola
and the associated Cartesian coordinate system is called the standard Carte-
sian coordinate system (Fig. 4.25).
y

Fig. 4.25 Fig. 4.26


78 4. Curves and Surfaces of the Second Order

Properties of the parabola. (1) The parabola lies in a semi plane to the
right from the y-axis, i.e., x ~ 0 for any point lying on the parabola
(Fig. 4.26). The origin of coordinates belongs to the parabola and is called
the vertex of the parabola (4.5).
(2) The parabola contains points infinitely distant from the origin of
coordinates..
(3) The axis of abscissas of the standard Cartesian coordinate system
is the unique axis of symmetry of the parabola (4.5) (Fig. 4.27).
The axis of symmetry is called the axis of the parabola.

The point «,
The number p is sometimes called the focal parameter of the parabola.
0) is the focus, and the line x
the parabola (4.5).
=- i is the directrix of

(4) The parabola is a locus of points of a plane such that for any point
its distance from a given point (from the focus of the parabola) is equal
to its distance from a given line (the directrix of the parabola) (Fig. 4.28).

lj

- .E I r
2 I
I
I
I
I

Fig. 4.27 Fig. 4.28

◄ Let M(x, y) be a point lying on the parabola (4.5). The distances from
M(x, y) to the focus ("i ,0) and from M(x, y) to the directrix x = -i are

x+P
2

respectively.
Substituting 2px for y 2 , we easily obtain
4.6 Properties of Curves of the Second Order 79

The converse is also true Let M(x, y) be a point equidistant from the
point ~ , 0) and from the line x = - ~ , i.e.,

Squaring the above identity, we easily obtain


y2 = 2px. ►

4.6 Optical Properties of Curves of the Second Order


Tangents to ellipses and hyperbolas. Let y = f(x) be an equa-
tion of a plane curve. Then the equation of a tangent to the plane curve
at a point (Xo, Yo), where Yo = f(Xo), may be written in the form
Y - Yo = f' (Xo)(x - Xo). (4.6)
Let Mo(Xo, Yo) be a point lying on the ellipse
x2 Y2
-a2+ -b2
=1.

Fig. 4.29

For definiteness we assume that the point Mo lies in the first quadrant
of the coordinate plane, i.e., Xo > 0 and Yo > 0 (Fig. 4.29).
The part of the ellipse in the first quadrant is given by the equation
~
y=b✓ ,.-IT.

Using (4.6), we obtain the equation of a tangent to the ellipse at the


80 4. Curves and Surfaces of the Second Order

point Mo(Xo, Yo)


Xob
y-yo = - - - - (x - Xo).

a1 j1 :4
Since the point Mo(Xo, Yo) belongs to the ellipse,

Yo=b✓ i-
~.
7
Whence we have
Xob 2
Y - Yo = - ~ - (x - Xo).
a yo
2

The above relation may be presented in the form


2 2
XXo + YYo _ = 0.
a2 b2 ( ~a2 + ~)
b2
Since
2 2
Xo + Yo _ 1
a2 b2 - '
we arrive at the equation
XXo + YYo _ l
7 7- ·
This is the equation of the tangent to the ellipse at the point (Xo, y0 )
which is independent of the location of Mo in the coordinate plane. In other
words, this equation is satisfied by the Cartesian coordinates of each point
lying on the ellipse.
Similarly, we may easily derive the equation of a tangent to the hyperbo-
la in the form
XXo _ YYo _ l
7 7-'
the point (Xo, Yo) lying on the hyperbola.
Tangents to parabolas. Let x = g(y) be an equation of a plane curve.
Then the equation of a tangent to this plane curve at a point (Xo, Yo), where
Xo = g(yo), may be written in the form

x - Xo = g' (yo)(y - Yo). (4.7)


Let Mo(Xo, Yo) be a point lying on a parabola. Using (4.7), we obtain
the equation of a tangent to the parabola at Mo(Xo, Yo) in the form
Yo
X - Xo = - (y - Yo)
p
4.6 Properties of Curves of the Second Order 81

or
YYo - Y~ + PXo - px = 0.
Whence, observing that y~ = 2pXo, we arrive at the equation of a tan-
gent of the form
YYo = p(x + Xo).
Remark. Comparing standard Cartesian equations of the ellipse, hyper-
bola and parabola with the equations of respective tangents we see that
the latter can easily be obtained from the former. Indeed, substituting YYo
for y 2 and XXo for x 2 in the equations of the ellipse and hyperbola, and
x + Xo for 2x in the equation of the parabola, we immediately arrive at
the equations of the respective tangents. Notice that the point with coor-
dinates (Xo, Yo) lies on a plane curve.
Optical property of the ellipse. Let Mo(Xo, Yo) be a point lying on the
ellipse
x2 Y2
-a2+ -b2
=1.

Recall that the distance from Mo to the foci Fi and Fr of the ellipse are
e, = le.xo + al and Qr = leXo - al,
respectively.
We draw through Mo a tangent line given by the equation
XXo + YYo _ l
7 y-.

Fig. 4.30

It is easy to find the distances from the tangent to the foci Fi( - c, 0)
and Fr(c, 0) (Fig. 4.30). We have
CXo l
--+
a2
and hr=µ

x~ x~
where p, = II a4 + b4 is a normalizing factor.

Ii 9505
82 4. Curves and Surfaces of the Second Order

We can easily verify that


h1 h,
---
e, e,
Indeed,
µ CXo
--+ l
h1 a2 _µ leXo + al _µ
-
e, le.xo + al a le.xo + al a
and
µ
CXo _ l
h, a2 _µ
--
e, leXo - al a

Fig. 4.31

Fig. 4.32

We see from Fig. 4.30 that the above ratios are equal to the sines of
the angles made by the segments FiMo and FrMo with the tangent. Since
the sines are equal so are the angles. Therefore a tangent at any point lying
on an ellipse makes equal angles with the line segments joining this point
with the foci of the ellipse. This property is sometimes called the optical
property of the ellipse because the rays from a source of light placed at
one of the foci are reflected from the "mirror" surface of the ellipse and
are focused into the other focus (Fig. 4.31).
4. 7 Classification of Curves of the Second Order 83

Optical property of the hyperbola. Similarly, we may establish the fol-


lowing optical property of the hyperbola: the rays from a source of light
placed at one o_f the foci seem to emanate from the other focus after being
reflected from a "mirror" surface of the. hyperbola (Fig. 4.32).

(a) (b)

Fig. 4.33

Optical property of the parabola. If a source of light is placed at the


focus of a parabola the rays from this source are reflected by the "parabolic
mirror" in directions parallel to the axis of the parabola (Fig. 4.33).

4. 7 Classification of Curves of the Second Order


We begin our classification with the discussion of plane curves given
by quadratic polynomials.
Theorem 4.1. Let us set up a Cartesian coordinate system Oxy in a plane
and let
f(x, y) = ax2 + 2bxy + cy 2 + 2dx + 2ey + g
(a 2 + b2 + c2 > 0) (4.8)
be a quadratic polynomial of two variables x and y.
Then there exists a Cartesian coordinate system O 'XY such that sub-
stituting X for x and Y for y reduces the polynomial f(xJ y) to the poly-
nomial F(X, Y) which may be of the following three kinds:
(a) AX2 + BY2 + C, A· B ~ O;
(b) BY2 + 2DX, B · D ~ O;
(c) BY2 + E,
◄ Step 1. By a suitable rotation of the x- and y-axes we may eliminate
the term 2bxy from f(x, y).
84 4. Curves and Surfaces of the Second Order

Let b-# 0.
Consider a Cartesian coordinate system Ox' y' obtained from the origi-
nal system Oxy by rotating the x- and y-axes through an angle cp (Fig. 4.34).
The new and original axes of coordinates are related as
x = x ' cos cp - y ' sin 'P, (4.9)
y = x' sin cp + y' cos 'P·
Substituting (4.9) into (4.8), we reduce the term 2bxy to the term
2b 'x' y', where
2b' = 2( - a sin cp cos cp + b(cos 2 'P - sin 2 cp) + c sin cp cos 'P)
= (c - a) sin 2'(J + 2b cos 2cp.
To eliminate 2b 'x 'y ' from the polynomial f(x' , y ') it suffices to set
2b ' = 0. Whence we have
a-c
cot 2'() = 2b .

lj
y'
y
I'
y

'f
a' X
X

0 r

Fig. 4.34 Fig. 4.35

Observe that a, b and c are known. From the above identity we may
find the value 'P of the angle through which we should rotate the original
axes of coordinates to eliminate the term 2b 'x 'y' from the polynomial
f(x' , y' ). In other words, we may always choose a Cartesian coordinate
system such that the original polynomial is reduced to the form
f(x, y) = ax2 + cy 2 + 2dx + 2ey + g,
where a2 + c2 > 0, and in what follows we shall consider the above poly-
nomial. For definiteness, we also set c ¢ 0, since substituting y for x, we
may always reduce the polynomial to a form with c ¢ 0.
4. 7 Classification of Curves of the Second Order 85

Step 2. By a suitable translation of the axes of coordinates we may fur-


ther simplify the polynomial f(x, y).
Consider a Cartesian coordinate system O 'XY obtained from the sys-
tem Oxy by translating the x- and y-axes (Fig. 4.35) in which case
X = x + a,
y = y + {3, (4.10)

where - a and - {3 are the coordinates of the origin O' of the system
0 'XY in the original system Oxy.
Depending on the relations between the coefficients a, b, c, d and e,
the polynomial f(x, y) reduces to one of the following three kinds
(1) a ¢ 0, c ¢ 0. Then setting a =d and {3 = ~, we obtain
a C
F(X, Y) = AX2 + BY2 + C,
d2 e2
where A = a, B = b, and C = g - - - -.
a C

(2) a = 0, d ¢ 0. Then setting

a = ~d ( g - e: ) and /j =~
we obtain
F(X, Y) = BY2 + 2DX,
where B = c and D = d.
(3) a = d = 0. Then setting
e
a= 0 and {3 =-
c
we obtain
F(X, Y) = BY2 + E,
e2
where B =c and E =g - -. ►
C

Standard equations of curves of the second order. We consider plane


curves of the second order defined by the polynomial equation
F(X, Y) = 0,
where F(X, Y) is a quadratic polynomial discussed in tne preceding
theorem.
Depending on the kind of F(X, Y) the polynomial equation determines
equations of plane curves of different shapes. Let us consider them
separately.
86 4. Curves and Surfaces of the Second Order

(1) F(X, Y) = AX2 + BY2 + C = 0, A · B # 0.


Assume that A · B > 0. Multiplying the polynomial equation by ( -1)
and substituting Y for X and X for Y, if necessary, we may reduce the
equation to a form with B ~ A > 0. Then
(a) If C < 0, we obtain an equation of the ellipse
x2 . y2
-+-=1
a2 b2 '

where a2 = - ~, b 2 = - i (a ~ b > 0).


(b) If C > 0, we obtain an equation of the imaginary ellipse*>
x2 y2
- + - = -1
a2 b2 '

wh ere a 2 = -
C and b 2 = -
C (a ~ b > 0).
A B
Notice that in the XY-plane there exists no point whose coordinates
satisfy this equation.
(c) If C = 0, we obtain the equation
x2 y2
-+-=0
a2 b2 '
1 an d b2 = B
wh ere a 2 = A 1 (a~ b > 0).

The origin of coordinates is the only point whose coordinates satisfy


this equation. We may regard the origin of coordinates as a point of inter-
section of two imaginary lines**>.
Assume that A • B < 0. Multiplying the polynomial equation by ( - 1)
and substituting Y for X and, if necessary, X for Y we may always reduce
the equation to a form with A > 0, B < 0 and C ~ 0.
Then
(a) If C < 0, we obtain an equation of the hyperbola
x2 y2
-a2- - =b2 1 ,

where a 2 =- A
C and b 2 =B
C (a > 0, b > 0).

•> This curve is called the imaginary ellipse since its equation resembles the equation of
the real ellipse.
••> We speak of imaginary lines since the respective equation resembles the equation
defining two intersecting lines in a plane.
4.7 Classification of Curves of the Second Order 87

(b) If C = 0, we obtain the equation


x2 y2
---=0
a2 b2 '

where a2 = A1 an d b2 =- 1.
B
This equation defines two intersecting lines
X Y X Y
---=0 and - + - = 0
a b a b
in a plane.
(2) F(X, Y) = BY2 + 2DX = 0, B · D -# 0.
Assume that B · D < 0. Substituting if necessary - X for X we always
reduce the polynomial equation to a form with B · D < 0. Then we obtain
an equation of the parabola
Y2 = 2pX,

where p = - D
B (p > 0).
(3) F(X, Y) = BY2 + E = 0, B -# 0.
Assume that B > 0. Then
(a) If E < 0, we obtain the equation
y2 - c2 = 0,
E
where c 2 = - B (c > 0).
This equation defines two parallel lines in a plane.
(b) If E > 0, we obtain the equation
y2 + c2 = 0,
where c 2 = ! (c > 0).
There is no plane which contains a point whose coordinates satisfy this
equation, called the equation of two imaginary parallel lines because it
resembles the equation defining two parallel lines.
(c) If E = 0, we obtain the equation
y2 = 0
which defines two coinciding lines in a plane.
We may determine what type of plane curve a polynomial equation
represents without making manipulations as given above. It suffices to de-
fine the signs of some expressions involving the coefficients of the poly-
nomial equation in question.
88 4. Curves and Surfaces of the Second Order

Table 4.1 Classification of Curves of the Second Order

D i:i Curve Shape

-
r
- Ellipse
'-- -- _/'
+ + Imaginary ellipse

0 Pair of imaginary intersecting


lines

~o Hyperbola
"'\ ~
-
/ ~ ·

0 Pair of intersecting lines ~I/ -

/
-
~
~o Parabola

Pair of parallel lines

-
0 0 Pair of imaginary parallel
lines

Pair of coinciding lines


4.8 Surfaces of the Second Order 89

For the polynomial equation


ax2 + 2bxy + cy 2 + 2dx + 2ey + g =0
the criteria to determine the type of the plane curve are the signs of
a b d
a b
D= and A= b c e
b C
d e g

The numbers D and A are called invariants since they are independent
of the Cartesian coordinate system set up in a plane.
Table 4.1 shows a classification of plane curves of the second order in
terms of D and A.

4.8 Surfaces of the Second Order


Suppose that we are given a Cartesian coordinate system in three-
dimensional space. A set of points whose Cartesian coordinates x, y and
z satisfy the equation
F(x, y, z) = 0
is called a surface in three-dimensional space. The equation (*) is called
the equation of a surface.

Fig. 4.36

Example. The equation


x2 + y 2 + z2 - a2 =O (a> 0)
is an equation of a sphere with radius r and centre at the point (0, 0, 0)
(Fig. 4.36).
90 4. Curves and Surfaces of the Second Order

Consider the quadratic polynomial of three variables x, y and z


F(x, y, z) = a11X2 + 2a12XY + 2a13XZ + a22y 2 + 2a23YZ
+ a33Z 2 + 2a14X + 2a24Y + 2a34Z + 0:'.44 = 0,
2 2 2 2 2 2
au + a12 + a13 + a22 + a23 + a33 > 0.
The equation
F(x, y, z) =0
is said to be the equation of surface of the second order.
This equation determines a rich collection of surfaces. Its investigation
is a mucli more complicated task than that of analyzing the equation of
a plane curve and requires special techniques and procedures. This will be
made in Chapter 6. Here we confine ourselves to classification of surfaces
in general and to standard equations of major surfaces of the second order.

4.9 Classification of Surfaces


Surfaces of revolution. We consider a curve -y (Fig. 4.3 7) defined
in the xz-plane by the equation
z = f(x) (x ~ 0).

z
'Y

0 X y

Fig. 4.37 Fig. 4.38

Revolving the curve -y around the z-axis generates a surface called the
surface of revolution (Fig. 4.38).
We find the equation of this surface, i.e., the equation to be satisfied
only by points lying on the surface.
Let Mo(Xo, 0, ~)bean arbitrary point on the curve -y (Fig. 4.39). When
the curve -y revolves around the z-axis the point Mo moves in a plane which
4.9 Classification of Surfaces 91

passes through Mo and is perpendicular to the z-axis. The path of moving


is a circle
x2 + y 2 = x5
with radius equal to the abscissa Xo of Mo.
Notice that for the given curve 'Y Zo = f(Xo). Therefore the coordinates
of any point on the circle are related as
Zo = J(✓x2 + y2 )·
It is easy to see that this reasoning is fully applicable to every point
on the curve 'Y. Whence we infer that the surface of revolution is defined
by the equation
Z = f (✓ x2 + Y 2 ) •

Fig. 4.39 Fig. 4.40

Cylindrical surfaces. Suppose that we draw through each point of a


given curve 'Y a line I parallel to a given line /o. A totality of lines drawn
in this way through points of the curve 'Y, i.e., a set of points lying on
parallel lines drawn through points of the curve 'Y is called the cylindrical
surface. The curve 'Y is called the directing line of the cylindrical surface
and any line which passes through a point on the directing line and is
parallel to the line lo is called the generating line of the cylindrical surface
(Fig. 4.40).
We find the equation of the cylindrical surface.
92 4. Curves and Surfaces of the Second Order

Consider a plane 1r which passes through an arbitrary point O and is


perpendicular to the generating line of the cylindrical surface (Fig. 4.41).
Let us set up a Cartesian coordinate system Oxyz with origin at O and
the z-axis perpendicular to the plane 1r. Then the plane 1r becomes a ~oor-
dinate plane, i.e., the xy-plane.
It is clear that the line of intersection of the cylindrical surface and
the plane 1r is the directing line -yo.
Let F(x, y) = 0 be the equation of the directing line -y0•
We show that the above equation may be considered the equation of
a cylindrical surface.

,.,,-1
.,, I

I
I
I
I
IJ
I
I
1o
~ I .......... J
0 X :
I .....
I /
/

Fig. 4.41 Fig. 4.42

◄ Indeed, if a point (x, y, z) lies on the cylindrical surface, then the point
(x, y, 0) belongs to the directing line -yo (Fig. 4.42). Hence the point
(x, y, 0) satisfies the equation
F(x, y) = 0.
On the other hand, this equation is also satisfied by the coordinates x and
y of the point (x, y, z). Therefore we may regard the equation
F(x, y) =0
as the equation of the cylindrical surface since it holds true for any point
on this surface. ►
Example. Let there be given a Cartesian coordinate system Oxyz in
three-dimensional space (Fig. 4.43). The equation
4.9 Classification of Surfaces 93

x2 y2
-+-=1
a2 b2
defines a cylindrical surface called the elliptic cylinder.
Remark. The equation
F(y, z) = 0
defines a cylindrical surface with directing line parallel to the x-axis, and
the equation
F(x, z) = O
defines a cylindrical surface with directing line parallel to the y-axis.

Fig. 4.43 Fig. 4.44

Conic surfaces. Suppose that we are given an arbitrary curve 'Y and a
point O outside 'Y· Let us draw lines through O and every point of 'Y. A
totality of lines thus obtained, that is a set of points lying on these lines,
is called the conic surface with the directing line -y and vertex O (Fig. 4.44).
Any line passing through the vertex O and a point on the directing line
'Y is called the generating line of the conic surface.
Consider a function F(x, y, z) of three variables x, y and z.
The function F(x, y, z) is called a homogeneous function of degree
q if for any t > 0 there holds
F(tx, ty, tz) = tq F(x, y., z).
94 4. Curves and Surfaces of the Second Order

Let us show that given a homogeneous function F(x, y, z) the equation


F(x, y, z) =O
defines a conic surface.
◄ Indeed, let

F(Xo, Yo, Zo) = 0,


i.e., a point Mo(Xo, Yo, Zo) lies on the surface defined by the equation
F(x, y, z) = 0.

Fig. 4.45 Fig. 4.46

We set F(0, 0, 0) = 0 and draw a line / through the points Mo and


0(0, 0, 0) (Fig. 4.45). The line / is given by the parametric equations
x = tXo, Y = !Yo, Z = tZo.
Substituting the parametric equations into F(x, y, z), we obtain
-
F(x, y, z) = F(tXo, tyo, t~) = tqF(Xo, Yo, -<A>) = 0.
This means that the line / belongs to the surface defined by the equation
F(x, y, z) = 0.
Hence this equation defines a conic surface. ►
Example. The function
x2 Y2 z2
F(x, Y, z) = - 2 + -2 - -2
a b c
4.10 Equations of Surfaces of the Second Order 95

is a homogeneous function of degree 2, i.e.,


(tx) 2 (ty) 2 (tz) 2
F(tx, ty, tz) = 2 + 2 - 2
a b c
= t2 ( x2 + Y2 -
2 2
z2 )
2 = t2F(x, y, z).
a b c
Whence we conclude that
x2 Y2 z2
-+---=0
a2 b2 c2

is an equation of a conic surface (Fig. 4.46).

4.10 Standard Equations of Surfaces of the Second Order


Ellipsoids. A surface of the second order given by the standard
Cartesian equation
x2 Y2 z2
-
a2+ -
b2+ -c2= 1,
where a ~ b ~ c > 0, is called the ellipsoid.
z z

0 X X

Fig. 4.47

To investigate the shape of the ellipsoid we revolve the ellipse


x2 z2
-+-=1
a2 c2

around the z-axis (Fig. 4.47). This leads to a surface


x2 + Y2 z2
---+-=l
a2 c2

called the ellipsoid of revolution, which gives an idea about the shape of
96 4. Curves and Surfaces of the Second Order

the ellipsoid. It suffices to compress the ellipsoid of revolution along the


y-axis with the compression coefficient b ~ 1 to get the ellipsoid given
a
by the standard Cartesian equation. In other words, substituting : y for
y in ( *), we obtain the standard Cartesian equation of the ellipsoid*>.
Hyperboloids. Revolving a hyperbola given by the equation
x2 z2
---=1
a2 c2
around the z-axis generates a surface called the one-sheet hyperboloid of
revolution (Fig. 4.48) which is defined by the equation
x2 + Y2 z2
a2 - 7 = 1.

Fig. 4.48

By compressing a one-sheet hyperboloid of revolution· uniformly with


compression coefficient b ~ 1 along the y-axis, we obtain a hyperboloid
a
of one sheet given by the standard Cartesian equation
x2 Y2 z2
-
a2+ - -
b2 - =
c2
1.

The standard Cartesian equation of the hyperboloid of one sheet is easily


obtained from the equation of the one-sheet hyperboloid of revolution by
substituting : y for y into the latter equation.

•> The ellipsoid of revolution is also obtained by uniformly compressing a sphere


x2 + y 2 + z2 = a2 with the compression coefficient £ ~ 1 along the z-axis.
a
4.10 Equations of Surfaces of the Second Order 97

Revolving a conjugate hyperbola given by the equation


x2 z2
- - - = -1
a2 c2
around the z-axis generates a two-sheet hyperboloid of revolution
(Fig. 4.49)
x2 + Y2 z2
-~
a2 - - -c2= -1 .
As before, by compressing a two-sheet hyperboloid uniformly with com-
pression coefficient b ~ 1 along the y-axis, we obtain a hyperboloid of two
a
sheets given by the standard Cartesian equation
x2 Y2z2
-
a2+ -
b2 - -c2= -1.
z

0 X X

Fig. 4.49
l

0 X X

Fig. 4.50

Elliptic paraboloids. Revolving a parabola given by the equation


x 2 = 2pz
around the z-axis generates a paraboloid of revolution (Fig. 4.50).
x2 + y 2 = 2pz.
7-9505
98 4. Curves and Surfaces of the Second Order

By compressing a paraboloid of revolution uniformly with compression

coefficient j; . ; 1 along the y-axis, we obtain• an elliptic paraboloid given

by the standard Cartesian equation


x2 . Y2
-+-=2z.
p q
It is easy to see that the standard equation of the elliptic paraboloid

is obtained by substituting $ y for y in the equation of the paraboloid


of revolution
x2 + Y2
---=2z.
p
When p < 0, the standard Cartesian equation describes the elliptic
paraboloid shown in Fig. 4.51.

z
lj h=-2

Fig. 4.51 Fig. 4.52

Hyperbolic paraboloid. A surface of the second order given by a stan-


dard Cartesian equation of the form
x2 Y2
---=2z
p q '
where p > 0 and q > 0, is called a hyperbolic paraboloid.
We shall investigate the shape of this surface by applying the foil owing
technique. Draw planes parallel to coordinate planes. These planes, called
sections, intersect the surface in question along plane curves. Mapping the
lines of intersection onto the coordinate planes, we obtain families of lines
4.10 Equations of Surfaces of the Second Order 99

whose structures, i.e., shapes and mutual locations of the lines on the coor-
dinate planes, enable us to make a conclusion on the shape of the surface
itself.
Let us start with sections z = h = const parallel to the xy-plane. Depend-
ing on the values of h, we obtain three families of intersection lines, namely
(a) a family of hyperbolas
x2 y2
----= 1
(✓2ph )2 (✓2qh )2 '

where h > O;
(b) a family of conjugate hyperbolas
x2 y2
-;:::::===-= -1
(✓ -2ph )2 (✓ - 2qh) 2 '

where h < O;
(c) two intersecting straight lines
y2
----c=-= =0
(vq)2 '
provided that h = 0.

Fig. 4.53

Notice that these straight lines are asymptotes of all the hyperbolas of
the families (a) and (b), i.e., they are asymptotes of hyperbolas for any value
of h distinct from zero.
Mapping the intersection lines onto the xy-plane, we obtain the family
of lines shown in Fig. 4.52 from which we infer that the surface in question
is of a saddle shape (Fig. 4.53).
I'
100 4. Curves and Surfaces of the Second Order

Now we cut the surface by planes y = h. Substituting h for y in (*),


we obtain a family of parabolas in the xz-plane

x2 = 2p(z + 2~q)'
as shown in Fig. 4.54.
Similarly, cutting the surface by planes x = h, we obtain a family of
parabolas in the yz-plane
2
y2 = - 2q ( z - 2pq
h ) ,

as shown in Fig. 4.55.


z
z

0 I

Fig. 4.54 Fig. 4.55


z

Fig. 4.56

From Figs. 4.54 and 4.55 we conclude that a hyperbolic paraboloid


(Fig. 4.56) is obtained by translating a parabola x 2 = 2pz along a parabola
y 2 = - 2qz or by translating y 2 = - 2qz along x2 = 2pz.
Remark. Intersecting a surface by a plane parallel to coordinate planes
is fully applicable to the analysis of all surfaces considered above. However,
revolving plane curves of the second order and further compressing surfaces
thus obtained is a much easier way to investigate surfaces of the se~ond
order.
4.10 Equations of Surfaces of the Second Order 101

Cylinders. Recall that the shape of a cylinder (of cylindrical surface)


is determined by the shape of its directing line. Here we enumerate the fol-
lowing kinds of cylinders we have encountered in the preceding sections.
These are
z

:.r.

;---- - ............
L_
-- ....
Fig. 4.57 Fig. 4.58

(a) elliptic cylinder (Fig. 4.57)


x2 y2
-+-=l
a2 b2 '
(b) hyperbolic cylinder (Fig. 4.58)
x2 y2
---=1
a2 b2 '
(c) parabolic cylinder (Fig. 4.59)
y2 = 2px.
z z

X
X

-----

Fig. 4.60
Fig. 4.59
102 4. Curves and Surfaces of the Second Order

Cones of the second order. A surface given by the standard Cartesian


equation
x2 Y2 z2
-+---=0
a2 b2 c2
1s called a cone of the second order.
We may investigate the shape of this surface either by revolving two
intersecting straight lines
x2 z2
---=0
a2 c2
around the z-axis and further compressing the surface thus obtained or
by intersecting the cone in question by planes parallel to coordinate planes.
In both cases we infer that a cone of the second order is of the shape shown
in Fig. 4.60.

Exercises
x2 y2
1. For the hyperbola 9 - -16 = 1 find; (a) the coordinates of
foci, (b) the eccentricity, (c) the equations of asymptotes and directrices,
(d) the equation of the conjugate hyperbola and its eccentricity.
2. Write down the equation of a parabola provided that the distance from
the focus to the vertex is equal to 3.
2 2
3. Write down the equation of the tangent to the ellipse ; 2 + ; 8 = 1
at the point M( 4, 3).
4. Identify the types and locations of plane curves given by the equations:
(a) x 2 + 2y + 4x - 4y = 0; (b) 6xy + 8y 2 - 12x - 26y + 11 = 0;
(c) x2 - 4xy + 4y 2 + 4x - 3y - 7 = 0; (d) xy + x + y = 0, (e) x2 -
5xy + 4y 2 + x + 2y - 2 = 0; (f) 4x 2 - 12xy + 9y 2 - 2x + 3y - 2 = 0.

Answers
= 35 , (c) y = ± 34 x, x -= ± 59 ; (d) 9x - 16
2 Y 2
1. (a) F1(-5, 0), Fr(5, O); (b) e -= -1,

~ ~
e = 45 . 2. y 2 = l2x. 3. 3x + 4y - 24 = 0. 4. (a) The ellipse 6 + - 3- = 1 with centre
2 2
at O ' ( - 2, 1) and the major axis O 'X parallel to the x-axis; (b) the hyperbola ~ - ~ =1
with centre at O' ( -1, 2) and the tangent of the angle between the axis O 'X and the x-axis
equal to 3; (c) the parabola Y2 = Js X with vertex at O' (3, 2), the vector of the O 'X-axis
directed to the vertex is { - 2, - 1 ); (d) the hyperbola with centre at O ' ( =-1~.J), the asymptotes
are parallel to the x- and y-axes; (e) a pair of intersecting straight lines x - y - 1 = 0 and
x - 4y + 2 = O; (f) a pair of parallel straight lines 2x - 3y + 1 = 0 and 2x - 3y - 2 = 0.
Chapter 5
Matrices. Determinants.
Systems of Linear Equations

5.1 Matrices
Definitions. An m x n matrix is an array of m•n numbers au (i = 1,
2, ... , m; j = 1, 2, ... , n) arranged in the rectangular form

au 0:12
0:21 0:22
A= (5.1)
Cim 1 Cim2 Cimn

The numbers a;j (i = 1, 2, ... , m; j = 1, 2, ... , n) are called elements


or entries or coefficients of A.
The horizontal n-tuple of numbers
a;1, a;z, ... , Ciin (i = 1, 2, ... , m)
is called the ith row of the matrix A.
The vertical m-tuple of numbers
Citj
Ci2j
U = 1, 2, ... , n)

Cimj

is called the jth column of the matrix A.


Therefore each m x n matrix has m rows and n columns. The element
a;i occupies the position where the ith row and jth column meet.
The numbers i and j indicate the position of the element a;i in A and
may be thought of as the coordinates of a;i in A (Fig. 5.1). A concise nota-
tion for matrices of the form m x n is
A= (aij).
A 1 x n matrix is called a row-vector and an m x 1 matrix is called
a column-vector.
104 5. Matrices. Determinants. Systems of Linear Equations

When m = n, the matrix

a12 lXIn
a22 a2n
A= . .. . . . . .. . .. . . . . . . .. .
lXnl lXn2 lXnn

is called a square matrix of order n.

[ - ----

Fig. 5.1

For example, the matrix A = (a11) containing a single element 1s a


square matrix of order 1.
The n-tuple of numbers
a 11 , lX22 , . . . , lXnn

is called the principal diagonal of the matrix A.


An m x n matrix whose elements are zeros is called a zero matrix.
A square matrix of the form

1 0 0
0 1 0
I= ..... . . . . . . . . . . '
(5.2)
0 0 1
is called an identity or unit matrix.
For any m x n matrix there exists a zero matrix and for any square
matrix of a given order n there exists a unit matrix.
We shall denote the set of all matrices of the type m x n by IRm x n with
the understanding that we are concerned with matrices whose elements are
5.1 Matrices 105

real numbers. Since a set of real numbers is conventionally denoted by fR,


the notation fRm x n signifies that we consider a set of all m x n matrices
of real numbers. In Chapter 26 we shall consider a set of m x n matrices
of complex numbers denoted by cm x n because a set of complex numbers
is conventionally denoted by C.
To signify that matrix (5.1) is of the type m x n we shall write
A= (au) E IRmxn.

The matrices A = (au) and B = (/3u) are called equal if they are of the
same type and their elements occupying identical positions coincide, i.e., if
AEIRmxn, BEIRmxn
and
a;i=/3u (i= 1, 2, ... , m; j= 1, 2, ... , n).
In symbols, we write A = B.
Now we shall turn our attention to arithmetic operations on matrices.
Addition of matrices. Let A and B be two matrices of type m x n, that
.
IS,
A= (au) E IRmxn and B = (/3u) E IRmxn.
The sum of matrices A and B is the matrix C = (')'ij) E fRm x n whose
elements are
'"Yii = au + /3u (i = 1, 2, ... , m; j = 1, 2, ... , n). (5.3)
In symbols, we write C = A + B.
Multiplication of a matrix by a scalar. The product of a matrix A =
(au) E IRm x n by a scalar Ais the matrix B = (/3u) E IRm x n whose elements are
f3u=Aa;i (i= 1, 2, ... , m; j= 1, 2, ... , n). (5.4)
In symbols, we write B = AA.
By way of illustration we show how these operations are performed by
using notation (5.1 ):

an a12 ... a1n /311 /312 b1n


a21 a22 ... a2n /321 /322 f32n
. . . . . . . . . . . . . . . . . . . .. + . .. .... . . .. . . . . .. . . .
Olm! am2 Olmn f3m1 f3m2 f3mn

an+ /311 a12 + /312 Olin + f31n


a21 + /321 a22 + /322 Ol2n + f32n
'
Olm! + f3m1 Olm2 + f3m2 Olmn + f3mn
106 5. Matrices. Determinants. Systems of Linear Equations

0'.11 a12 O'.ln /\a11 Aa12 l\a1n


a21 a22 0'.2n /\a21 l\a22 l\a2n
..................... ............... .........
O'.ml O'.m2 O'.mn /\0'.ml l\am2 l\O'.mn

Multiplication of matrices. Let A = (a;k) and B = (/3kj) be two square


matrices of order n. The product of A and B is the matrix C = ('Yu) E fRn x n
whose elements are
'YU = a;1{31j + O'.i2/32j + ... + O'.inf3nj (5.5)
(i, j = 1, 2, ... , n).
In symbols, we write C = AB.
By way of illustration we write this operation as

O'.tj O'.In
0'.2j 0'.2n

I O'.il O'.i2 O'.ij O'.in

O'.n 1 O'.nj O'.nn

J
/311 (312 /31j /31n
/321 (322 /32j f32n
. . . . ... . ... . . . . . . . . . . . . . . . .. ...
X
{3;1 {3;2 f3u /3in
. . . . . ... . . . . . . . . . . .. .. . .. . . .. . .
f3n1 f3n2 /3nj f3nn

J
"/11 "/12 "/lj "/In
"/21 "/22 "/2j "/2n

'Yil 'Yi2 'YU "/in l.

'Yn 1 "/n2 "/nj 'Ynn

Notice that the sum of n ele~ents (numbers)


a1 + a2 + . . . + a; + ... + O'.n

is conventionally written in the form


n
~ a;.
i=1
5.1 Matrices 107

Then the operation of multiplication of two matrices may be conveniently


presented as
n
"fij = I; CXikf3kj (i, } = 1, 2, ... , n).
k=I

In general AB ~ BA.
Examples. (1) Let

A= (~ ~) and B = (~ ~)-

Then

AB= (~ ~) and BA= (~ ~) ·

It is not hard to find similar examples for matrices of any order.


(2) Let A be a square matrix of order 3, 1.e.,

CXI 1 CX12
CX13)
A = ( CX21 cx22 cx23 •
CX31 CX32 CX33

◄ We show that pre-multiplying the matrix A by the matrix


0
0
1
interchanges the second and third rows in A.
We have
0
0
Q)
1 ·
( CX11
cx21
cx12
cx22
CX13)
CX23
1 Q CX31 CX32 CX33

l ·cx12 + O·cx22 + O•cx32 l ·cx13 + O·cx23 + O·cx33)


O·cx12 + O·cx22 + l •cx32 O•cx13 + O·cx23 + l ·cx33
O·cx12 + l ·cx22 + O•cx32 O·cx13 + l ·cx2 3 + O•cx33 .
cx11 cx12
( CX13)
= CX31 CX32 CX33 • ►
cx21 cx22 cx23

Similarly, it is easy to verify that post-multiplying the matrix A by P 23


interchanges the second and third columns in A.
108 5. Matrices. Determinants. Systems of Linear Equations

(3) For any matrix A there holds


l·A = A·I ='A, (5.6)
where I is an identity matrix.
◄ Let A be a square matrix of order 3

0'.11
( 0'.13)
A= a21 0'.23 ·
0'.31 0'.33

Then
0'.11 a12
A·I = ( a21 a22
0'.31 0'.32

a11·l + a12·0 + a13•0 a 11 ·0 + a12·l + a1rO


=( a21 · 1 + a22 · 0 + a23 · 0 a21 · 0 + a22 · 1 + a23 · 0
a31·l + a32·0 + a3rO a 31 ·0 + a32·l + a33·0

0'.11 a12
0'.13)
= ( a21 a22 a23 = A.
0'.3 I 0'.32 Q'.33

It is easy to verify in the same way that


l·A = A. ►

Relation (5.6) explains why the matrix I is called the identity matrix.
The operation of matrix multiplication obeys associative and distribu-
tive laws, namely if A, B, C and D are square matrices of order n, then
(AB)C = A(BC),
A(B + C) = AB + AC,
(B + C)D = BD + CD.
◄ By way of illustration we show that A(B + C) = AB + AC. It is clear
that the three matrices AB, AC and A(B + C) are of the same order n.
Their elements in the (i, j)th position are, respectively,
~

n
~ O'.ik"(kj, (i,j= 1, 2, ... , n).
k=l
n
~ O'.ik(/3kj + "(kj),
k=l
5.1 Matrices 109

Then we obtain
n n n
~ a;k(f3kj + "(kj) = ~ a;kf3kj + ~ aik"(kj
k=l k=l k=l
and this proves the property in question.
Similar reasoning is applicable to the other two identities. ►
Remark. The operation of matrix multiplication may also be defined
for rectangular matrices.
◄ Let there be given matrices A = (a;k) E fRm x n and B = (f3kJ) E fRn x 1.
Then the elements of the matrix C = AB E fRm x I are
n
'Yii = ~ a;k/3ki (i = 1, 2, ... , m; j = l, 2, ... , /). (5.7)
k=l
Therefore the product of two rectangular matrices may be defined only
when the number of columns in the first factor is equal to the number
of rows in the second factor (Fig. 5.2) as is easily seen from (5.7). ►


6x5

Fig. 5.2

Products of rectangular matrices obey the laws (5.6) provided that the
corresponding operations of matrix multiplication make sense.
Example. Compute the product of the matrix

A =( ! !) by the matrU B = (: : : : ).
◄ Observe that the number of columns in A is equal to the number of
rows in B. This means that we may multiply A by B. Computing the
product, we obtain

AB=(! !)-(: : :
9·1 + 5·1 9·0 + 5.9 9·1 + 5.4 9·2 + 5·2)
= ( 1·1 + 9·1 1·0 + 9.9 1·1 + 9.4 1·2 + 9·2
8·1+6·1 8·0+6·9 8·1 + 6·4 8·2 + 6·2
14 45 29 28)
- ( 10 81 37 20 . ►
14 54 32 28
110 5. Matrices. Determinants. Systems of Linear Equations

Summing up elements of a matrix. In applications we often need to


compute the sum of elements of a rectangular matrix, say the matrix A
0:'.11 a12 a1n
a21 a22 a2n
A=
am1 am2 amn

We may compute the sum H of elements of A either by computing


column sums
m m m
b ail, b a;2, ... , b a;n
i =1 i=1 i =1
and adding them together so that
m m
ff = b a;i + ~ a;2 +
i=l i=l

or by finding row sums


n n n
~ a1j, ~ a2j, ... , b amj
j=l j=l j=l

and adding them together so that


n n
f ·(±au).
n
ff = ~ a lj + b a2j + + ~ amj =
j=l j=l j=l i=l }=1

Whence we obtain
n m m n
~ ~au=~ b au.
J=l i=l i=l j=l

Transposition of matrices. The matrix

a11 a21 am1


a12 a22 am2

a1n a2n amn

is called the transpose of the matrix_

a12 a1n
a22 a2n
A= (5.1)
am2 amn

In symbols, we write A ' to denote the transpose of A.


5.1 Matrices 111

Example. By definition the transpose of

A= G 2
6
3
7 :)
.
lS
1 5
2 6
A' -
3 7
4 8
It is important to observe that the element of A' in the (i, ))th position
coincides with the element of A in the U, 1)th position. The operation of
transposition interchanges rows and columns of the matrix A so that rows
in A become columns in A ' and columns in A become rows in A '. There-
fore if the matrix A has m rows and n columns, the transpose A' of A
contains n rows and m columns (Fig. 5.3).

6 x11

11x6

8 2

2
V

V
V
V

Fig. 5.3

The operation of transposition satisfies the following conditions:


(a) (A')'= A,
(b) (A + B)' =A' + B ',
(c) (AA)' = M ',
(d) (AB)' = B 'A'.
112 5. Matrices. J?eterminants. Systems of Linear Equations
---'==---------

Linear dependence of row-vectors. We consider the operations of matrix


addition and multiplication of a matrix by a scalar over a set of 1 x n
matrices, i.e., row-vectors.
Let
a = (a1, a2, ... , an ) E fR
lxn
,
b = (/31, f32, ... , f3n) E fR 1 xn.

Using (5.3) and (5.4), we obtain


a + b = (a1 + f31, a2 + (32, ... , an + f3n) (5.8)
and
(5.9)
It is easy to verify that operations (5.8) and (5.9) satisfy the following
conditions:

(a) a + b = b + a,
(b) (a + b) + c = a + (b + c),
(c) a + 0 = 0 + a = a,
(d) A(a + b) = Ml + Ab,
(e) (A + µ) a = Ml + µa, (5.10)
(f) A(µa) = (Aµ)a,
(g) I •a = a,
the equation a + x = 0 being uniquely soluble for any row-vector a. In
(5.10) A and µ are arbitrary scalars, a, b, c and x are row-vectors (1 x n
matrices), 0 is a zero row-vector (a zero 1 x n matrix). We shall see in
Chap. 6 that conditions (5.10) define a linear vector space over a set of
row-vectors.
Now we shall introduce an important notion of linear dependence of
row-vectors.
Let a1, a2, ... , am be m row-vectors. A relation of the form
(5.11)

is called a linear combination of row_-vectors a1, a2, ... , am with scalars


A1, A2, ... , Am.
The linear combination (5.11) is called nontrivial if at least one of A1 ,
A2, ... , Am is distinct from zero and trivial if A1 = A2 = . . . = Am = 0. It
is easy to see that in the latter case b is a zero row-vector.
Row-vectors are called linearly dependent if there exists their nontrivial
linear combination identical to a zero row-vector, and linearly independent
if a zero row-vector corresponds to their trivial linear combination only.
5.1 Matrices 113

We show that for linearly dependent row-vectors one of them may be


expressed as a linear combination of the others.
◄ Let a1, a2, ... , am be linearly dependent row-vectors. This means that
there exist scalars A1, A2, ... , Am, not all zero, such that
A1a1 + A2a2 + ... + Am- 1am- l + Amam = 0.
Without a loss of generality we may set Am -;t; 0. Transposing the first m - 1
summands to the right, we obtain
Amam = -A1a1 - A2a2 - ... - Am- 1am- l•
Then dividing the above expression by Am -;t; 0, we arrive at

Whence we conclude that am is a linear combination of the other row-


vectors.
The converse is also true, namely, if one of the row-vectors is a linear
combination of the other (m - 1) row-vectors, i.e., if

then the nontrivial linear combination


µ1a1 + µ2a2 + ... + µm-lam-1 + (-l)am

of the row-vectors a1, a2, ... , am - 1, am is equal to a zero row-vector. Hence


the row-vectors are linearly dependent. ►
Similar property of linear dependence is easily derived for the set [Rm x 1
of column-vectors, that is, for the set of all m x 1 matrices.
Elementary operations on matrices. Let A and A be two arbitrary
m x n matrices and let
a 1, a2, ... , ak, ... , a,, ... , am
be rows in A.
The matrix A is said to be obtai:ped from the' matrix A by
(a) interchanging two rows in A if a1, a2, ... , a1, ... , ak, ... , am are
rows of A,
(b) multiplying one particular row of A by a nonzero scalar {3 if a1,
a2, ... , f3ak, ... , a,, ... , am are rows of A,
(c) adding one row of A multiplied by a scalar 'Y to another row if a1,
a2, ... , ak, ... , a1 + yak, ... , am are rows of A.
The operations (a)-(c) are called the elementary row operations.
Elementary column operations can be defined similarly.

8-9505
114 5. Matrices. Determinants. Systems of Linear Equations .
-----

Example. The matrix

(i ~ D
is obtained from the matrix

A=(i ! ~)
by interchanging the second and third rows, and the matrix
1
0
0
is obtained from A by interchanging the first and second columns.
Adding the third row of A multiplied by - 2 to the first row of A,
we obtain the matrix
-12
1
6
Remark. It is easy to see that when the matrix A is obtained by applying
either of the elementary row operations to the matrix A, the transition from
A to A is achieved by the same row operation, that is, either by interchang-
ing the kth and /th rows or by multiplying the kth row by the scalar 1/ {3
or by adding the kth row multiplied by - 'Y to the /th row.
Now we shall turn our attention to the procedure of changing from
an arbitrary matrix to a matrix of simpler form by a finite sequence of
elementary row operations.
Let A = (au) E [Rm x n be a nonzero matrix.
Step 1. Since A is a nonzero matrix there exists at least one nonzero
element in A. Consequently, there exists at least one nonzero row in A.
We choose a nonzero row such that its first nonzero element occurs in a
column with the smallest number k1 ~ 1. Interchanging this row and the
first row of A, we reduce A to the matrix

(5.12)
5.1 Matrices 115

<Xik1 •
Adding scalar multiples of the first row of (5.12) by - (I) (1 = 2,
<X1k1
3, ... , m) to the corresponding rows, we obtain a matrix of the form
0
0 (5.13)
rv(l)
0 0 0 u.mn

Observe that the elements in the k1th row are zeros with the exception of
the element a~}/1 •
It may happen that rows of (5.13) are zero ones with the exception of
the first row. In this case the procedure terminates. If this is not the case,
i.e., if there exist nonzero rows in addition to the first row, the procedure
goes on.
Step 2. Similar to Step 1 we choose a row such that its first nonzero
element occurs in a column with the smallest number k2 (k1 < k2).
Then interchanging this row and the second row of (5.13), we obtain
0 0 (X~~I (X~~2 - 1 (X~~2 (X~~

0 0 0 0 ai:J 2 a~;/
rv(l)
0 0 0 0 u.mn

(5.14)
(I)
. 1es o f the second row bY -
Adding scalar mult1p a~:J
<Xikz (.
, = 3, 4, . . . '
2

m) to the corresponding rows, we reduce (5.14) to a matrix such that its


first row is identical to that of the first row in (5.13) and its elements in
the k2th row are zeros with the exception of two elements from the top.
The procedure terminates if the matrix thus obtained contains no non-
zero ·rows but the first and second rows, and goes on otherwise. It is impor-
tant to observe that for the procedure discussed the total number of steps
to reduce a given matrix to a simpler form does not exceed the number
min (m, n). This means that the procedure terminates in a finite number
of successive steps and we arrive at a matrix of the form
0 ... 0 a\~. . . . . .. . . .
0 0 0 0 a~f2 ...
. . .. .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . .
<X(r)
0 0 0 0 0 0 rk (5.15)
'
0 0 0 0 0 0 0 0
.....................................................
0 0 0 0 0 0 0 0

8*
116 5. Matrices. Determinants. Systems of Linear Equations

where k1 < k2 < . . . < k, and

o:i~. ¢ 0, o:~~2 ¢ 0, ... ' o:~~, ¢ 0.


The matrix (5.15) is called a matrix of the schematic form.
Notice t~at the procedure of reducing a matrix to the schematic form
involves a sequence of elementary row operations (a) and (c). In other
words, we may state the following theorem.
Theorem 5.1. The transition from any arbitrary matrix to a matrix of
schematic form is achieved by a sequence of elementary row operations.
Examples. (1) Reduce the matrix
0 0 0 0 0 1
0 0 -2 3 -4 5
A= 0 0 0 0 0 0
0 1 1 1 1 1
to a matrix of schematic form.
◄ Interchanging the first and fourth rows of A, we obtain

0 1 1 1 1
0 0 -2 3 -4
A1= 0 0 0 0 0
0 0 0 0 0
Interchanging the third and fourth rows of A1, we obtain
0 1 1 1 1 1
0 0 -2 3 -4 5
A2 = 0 0 0 0 0 1
0 0 0 0 0 0
The matrix A2 is of schematic form. ►
(2) Reduce the matrix
3 -1 3 2 5
5 -3 2 3 4
A= 1 -3 -5 0 -7
7 -5 1 4 1 \
to a matrix of schematic form.
◄ Interchanging the first and third rows, we obtain

1 -3 -5 0 -7
5 -3 2 3 4
A1= 3 -1 3 2 5
7 -5 1 4 1
5.1 Matrices 117

Step 1. Subtracting the first row of A1 multiplied by numbers 5, 3 and


7 from the second, third and fourth rows, respectively, we obtain
1 -3 -5 0 7
0 12 27 3 39
A2 = 0 8 18 2 26
0 16 36 4 50
Step 2. To simplify computations we apply the elementary operation
(b), namely, we multiply the second row by 1/3, the third row by 1/2 and
the fourth row by 1/2. Then from A2 we obtain
-3 5 0 -7
4 9 1 13
4 9 1 13
8 18 2 25
Subtracting the second row of A3 multiplied by 1 and 2 from the third
and fourth rows, respectively, we arrive at
-3 5 0 -7
4 9 1 13
0 0 0 0
0 0 0 -1
Step 3. Observe that the third row of Ai is a zero one. Then interchang-
ing the third and fourth rows, we obtain
-3 5 0 7
4 9 1 13
As= 0 0 0 -1
0 0 0 0
The matrix As is of schematic form. ►
By a sequence of elementary column operations a matrix of the form
(5.15) is reduced to the form
r
1 0 0 0
0 1 0 0
r . . . . .. . . . . . ... . . . . . . 0
0 0 1 0
... . . . .. .. . . . . . .. .. . (5.16)
0 0 0 1

0 0
118 5. Matrices. Determinants. Systems of Linear Equations

whose elements are zeros with the exception of those occupying the posi-
tions (1, 1), (2, 2), ... , (r, r), each of which is equal to unity.
Interchanging the columns in (5.15) so that the k1 th column replaces
the first one, the k2th column replaces the second one, etc., until the krth
column replaces the rth column, we obtain a matrix of the form

O'.tr O'.ln
0'.2r 0'.2n
(5.17)
O'.rr O'.rn

0
where &1 I ~ 0, &22 -:¢. 0, ... ' CXrr -:¢. 0.
For example, interchanging the third and fifth columns in As, we obtain
7 0 5
13 1 9
-1 0 0
0 0 0

Adding multiples of the first column of (5.17) by - ~Ii U = 2, 3, ... ,


0'.11
n) to the corresponding columns, we obtain a matrix of the form

0'.11 0 0 0 0
0 0'.22

0 0 0 O'.rn

0 0 0 0 0

where the first row contains only one nonzero element, &.11.
Similarly, operating on the rows 2, 3, ... , r, we obtain the matrix

&.11
&22 0 \
&;; (5.18)

0 CXrr

By a sequence of elementary column operations of type (b) matrix (5.18)


is reduced to the form (5.16).
119
- - -5.1
- Matrices
--------------- - - - - - - - - - - - - - - - - -

It is easy to observe that the matrix A6 admits a representation of the


form
1 0 0 0 0
0 4 0 0 0
A1 = 0 0 -1 0 0 '
0 0 0 0 0
whence we obtain

1 0 0 0 0
0 1 0 0 0
As= 0 0 1 0 0
.
0 0 0 0 0

Elementary matrices. The elementary operations which are summarized


above are closely associated with square matrices called elementary ma-
trices. These are of the following types:
(a) matrices obtained from the corresponding identity matrices by inter-
changing any two rows. For instance, the matrix

i
.
J
1 •

1
••• 0 1 l
P;i = 1
•• .
~

1
1 . .. 0 ... j
• 1


1

is obtained from the identity matrix

0 .
1 l
"
1 j
0
120 5. Matrices. Determinants. Systems of Linear Equations

by interchanging the ith and jth rows. Observe that off-diagonal elements
of Pu are zeros with the exceptions of the elements in the (i, })th and
U, l)th positions.
(b) Matrices obtained from the corresponding identity matrices by sub-
stituting a nonzero scalar for a diagonal element. For example, the matrix
1

1
. . . . . /3 . . . . .
J
1

1
J

differs from the identity matrix by the element {3 ~ 0 in the U, })th position.
Notice that all off-diagonal elements of Dj are zeros.
(c) Matrices that differ from the corresponding identity matrices by one
off-diagonal element. For example, the matrix
i
1

Lu= ·••')'···1 .... J


.•
1
differs from the unit matrix by the element -y in the U, i)th position, and
the matrix
J
1

Ru= I

1
l

\
also differs from the identity matrix by the same element but located in
the (i, j)th position. Observe that off-diagonal elements of Lij and Rij are
zeros with the exception of the element -y.
We point out here that for any matrix each elementary operation is
equivalent to pre- or post-multiplication of the matrix by a suitable elemen-
tary matrix.
5.1 Matrices
----------.~-----··--·------ - . 121

Theorem 5.2. Elementary operations on a matrix are equivalent to pre-


and post-multiplication of the matrix by elementary matrices.
Let A be an arbitrary matrix and Pu, Dj, Lu and RiJ be elementary
matrices given above. Then
(a) interchanging the ith and jth rows of A is equivalent to pre-
multiplying A by Pu,
(b) multiplying one particular row of A by a nonzero scalar {3 is equiva-
lent to pre-multiplication of A by D1,
(c) adding one row of A multiplied by a scalar 'Y to another row is
equivalent to pre-multiplying A by Lu,
(a') interchanging the ith and jth columns of A is equivalent to post-
multiplication of A by Pu,
(b ') multiplying one particular column of A by a nonzero scalar {3 is
equivalent to post-multiplication of A by Dj,
(c ') adding one column of A multiplied by a scalar 'Y to another column
is equivalent to post-multiplication of A by Ru,
◄ For simplicity we consider a square matrix of order 3

0'.11 a12
Q'.13)
A= ( a21 Q'.22 0'.23 ,
0'.31 0'.32 Q'.33
Recall that pre-multiplying A by the matrix

P23 =( g ~ I)
gives
0'.11 Q'.12
Q'.13)
Q'.33
B = ( a31 0'.32
a21 a22 0'.23

and post-multiplying A by P23 gives

C = ( ::: ::: ::: ) .


Q'.31 Q'.33 0'.32

It is easy to see that the matrix B differs from A by the order of rows
and the matrix C differs from A by the order of columns.
Similarly we can verify the conditions (a) and (a') for the matrices

P12 =( ~ 0~ ~1 )
0
and P13 =( ~1 ~0 0~ ) .
122__ 5. Matrices. Determinants. Systems of Linear Equations

Let us multiply A by the matrix


'•

D2=(i i ~)-
Pre-multiplying A by D2, we have

D2A =
1
( ~
0
{3 O)(au
0 a21
a12
a22
0 1 a31 a32

(au
/3a21
a12
/3a22
ao)
/3a23 .
a31 a32 a33

Post-multiplying A by D2, we obtain

AD2=
(a11
a21
a12
a22
a")C
a23 0
0
{3
a31 a32 a33 0 0

= (a11a21
/3a12
{3a22
a13)
0:'.23 •
a31 /3a32 a33

Similarly, we verify the conditions (b) and (b ') for the elementary ma-
trices
0 0
1 1
0 0
Notice that the conditions (c) and (c ') can easily be verified in a similar
way. ►

5.2 Determinants \
We associate with a square matrix a single number called a deter-
minant according to the following rules:
The determinant of the matrix of order 1
(an)
is equal to a11.
5.2 Determinants 123

The determinant of the matrix of order 2

(::: :::)
is a number equal to a11 a22 - a12a21.
In symbols, we write
a11 a12
det (au a12) = (5.19)
a21 a22 a21 a22

The determinant of the matrix of order 3

( a11a21
a12
a22
a13)
a23
a31 a32 a33

is a number equal to
a22 a23 a12 a13 a12 a13
a11
a32 a33
- a21
a32
+ a31
a33 a22 a23

Using (5.19), we obtain

det ( a11 a21


a12
a22
a13) -
a23
au
a21
a12
a22
a13
a23
a31 a32 a33 a31 a32 a33

= a11a22a3'3 + a21a32a13 + a31a12a23


(5.20) ·
- a11a32a23 - a21a12a33 - a31a22a13.

+ Fig. 5.4

Figure 5.4 is helpful to memorize formula (5.20). The drawing on the


left refers to the arrangement of the positive terms in (5.20) and that on
the right to the arrangement of the negative terms.
Suppose that we have defined determinants for matrices of orders less
than n.
124 5. Matrices. Determinants. Systems of Linear Equations

The determinant of the matrix of order n


0'.12 CX1n
a22 0'.2n
A= (5.21)
O'.nl O'.n2 O'.nn

is a number equal to
D = a11M11 - a21M21 + ... + ( - It+ 1 an1Mn1, (5.22)
where M;i (i = 1, 2, ... , n) is the determinant of order n - I
Q'.13 O'.ln
0'.23 0'.2n

O'.i-1,2 O'.i-1,3 O'.i - l ,n (5.23)


O'.i + 1,2 O'.i + 1,3 O'.i + l,n

O'.n2 O'.n3 O'.nn

obtained by deleting from A the ith row and the first column.
In symbols, we write

D = det A= IAI - (5.24)


O'.nl CXn2 O'.nn

Formula (5.22) defines the expansion of determinant with respect to


the first column.
It is easy to see that for n = 2 and n = 3 formula (5.22) gives results
identical to those given by (5.19) and (5.20), respectively.
For example, for n = 3 we have
D = a11 (a22a33 - a32a23) - a21 (a12a33 - a32a13)
+ a31 (a12a23 - a22a13). (5.25)
We abbreviate (5.22) by writing
n
D = ~ (-li+ 1 a;1Mil, (5.26)
i =1

Example. Compute the determinant of the triangular matrix


a11 a12 a13 a1n
0 a22 a23 a2n
A= 0 0 Q'.33 iX3n .
. . . . . . . . . . . . . . . . . . . .. .
0 0 0 ...
5.2 Determinants 125

◄ We have

IAI = au
0 0 0

Whence we infer that the determinant of the triangular matrix is equal


to the product of the elements on the principal diagonal. ►
The determinant of a matrix of order n - 1 obtained from the matrix
A by deleting the ith row and the jth column is called the minor Mu of
A (Fig. 5.5). For instance, the minor M;1 is the determinant of the matrix
obtained from A by deleting the ith row and the first column.

Minor Mij obtained


from matrix A by
deleting the ith row
and the Jth column

Fig. 5.5
126 5. Matrices. Determinants. Systems of Linear Equations

Consider n numbers
n
D1 = ~ (-li+ 1auMu U= 1, 2, ... , n). (5.27)
;=1
It is easy to see that for j = 1 formula (5.27) coincides with (5.26). Let
us prove that D = D1 = D2 = . . . = Dn.
◄ We confine ourselves to the case when n = 3. By setting j = 2 in (5.27),
we have
D2 = - a12M12 + a22M22 - a32M32, (5.28)
where
a21 a23
M12 = = a21 a33 - a31 a23,
CT31 CT33

a11 a13
M22 = = a11a33 - a31a13, (5.29)
CTJI CT33

a11 CT13
M32 = = a11a23 - a13a21.
a21 a23

Whence we obtain
D2 = - a12(a12a33 - a31a23) + a22(a11a33 - a13a31)

- a32(a11a23 - aua21). (5.30)


Comparing (5.25) and (5.30), we conclude that D = D2. The identity
D = D3 is verified in a similar way. ►
Remark. In general the identities D = Di = D2 = . . . = Dn are verified
by computing determinants of orders n - 1, n - 2, etc.,
Therefore we have proved that the expansion of determinant with
respect to the jth column is given by
n
D = ~ (-li+JauMu U = 1, 2, ... , n). (5.31)
i=1
We call the number:
A;J = ( -1); +JMu (5.32)
the cofactor of the element a;J in I A I .
Notice that the cofactor Au of a;J is independent of the value of au.
In other words, Au remains unchanged when the element au is replaced
by f3u in A.
Substituting (5.32) into (5.31), we may write
n
D = ~ auA;J = Ct1jA1j + CT2jA2j + ... + CTnjAnj
;=1
U = 1, 2, ... , n). (5.33)
5.2 Determinants 127

Whence we conclude that the determinant of a square matrix is equal to


the sum of the elements of any column multiplied by their respective
cofactors.
Similarly to (5.27) we may introduce the numbers

n
~t = ~ ( - 1)' + j auMij (i = 1, 2, ... , n) (5.34)
J=I

such that ~ 1 = ~2 = ... = ~n- To verify these identities it suffices to inter-


change the rows and columns in A, i.e., to transpose A, and apply to A'
the same reasoning as above. In conclusion we shall prove the following
theorem.
Theorem 5.3. For any square matrix of order n there holds

n
D= ~ (-l)t+JauM;1 (i= 1, 2, ... , n), (5.35)
)=1

i.e., the determinant of any square matrix of order n admits expansion with
respect to the ith row.
◄ It is sufficient to prove that

~l = D. (5.36)
Again we confine ourselves to n = 3.
Using (5.34), we have

~1 = a11M11 - a12M12 + a13M13 = a11 (a22a33 - a23a32)

- a12(a21a33 - a23a31) + a13(a21a32 - a22a31).

Comparing this expression with (5.25), we infer that ~1 = D. ►


Remark. In general the identity ~1 = D is verified by computing deter-
minants of orders n - 1, n - 2, etc.
Substituting (5.32) into (5.35), we arrive at
n
D = ~ auAu = ail Ail+ ai2A;2 + . . . + a;nAin
j=l
(i = 1, 2, ... , n). (5.37)

Therefore, the determinant of a square matrix of order n is equal to


the sum of the elements of any row multiplied by their respective cofactors.
Example. Compute the determinants of the elementary matrices P;J,
D1 and Lu.
128 5. Matrices. Determinants. Systems of Linear Equations
-------------

◄ The determinant of Pu is

l J
1

1
0 1 l
IPiJ I 1

1
1 0 J
1

Expanding P ;1 with respect to the first row, we have a determinant of


order n - I equal to IP ;1 I. Then expanding the determinant of order n - I
with respect to its first row, we obtain a determinant of order n - 2 which
is also equal to I PiJ I. It is easy to see that going on with this procedure
we obtain

0 1
I

1
I Pu I 1 . 0
1

after i - 1 steps.
Using this procedure further, we finally arrive at
0 1
I Pu I = l O = -1.

Recall that Dj is a diagonal matrix of order n such that <Y.jj = {3 and


the other diagonal elements are unities. Then
IDj I = /3.
5.2 Determinants 129

For the matrix Lu we have


I Lu I = 1. ►
Properties of determinants. Chapter 1 represents a detailed account of
properties of the second- and third-order determinants. Here we point out
properties possessed by an arbitrary determinant of order n.
Linear property. Suppose that we are given a determinant D such that
one of its rows is a linear combination of the other two rows. Let

D= l.

O'.n l

Then
D = AD' + µD",
where the determinants

CX1 I 0'.12 CXtn 0'.1 I CX12 CXJn

D' {31 f3n l and D" = ')'1 ')'2 ')'n I

O'.n l O'.n2 O'.nn O'.n I CXn2 O'.nn

differ from D by their ith rows.


◄ To verify this property it suffices to expand D, D' and D II with respect
to their ith rows. Observe that the cofactors of A{3j + µ-yj, {3j and ')'j are
all equal to Au. Then using (5.37) we have
n
D = ~ (A{3j + µ-yj)A;j,
j=l
n n
D' = ~ {3jAu and D 11
~ ')'jAu.
j=l j=l

Whence
D = }.JJ' + µD 11
• ►

Property of antisymmetry. If a determinant J5 is obtained from a deter-


minant D by interchanging any two rows in D, then
Jj = -D.

9-9505
130 5. Matrices. Determinants. Systems of Linear Equations

◄ Suppose that

a22 n'.2n
Q'.12 Q'. 1n
D=
n'.n2 n'.nn

is obtained from
ll'.ln
ll'.2n
D=
ll'.nl n'.n2 ll'.nn

by interchanging the first and second rows in D.


Expand D with respect to the second row and J5 with respect to the
first row. Using (5.35), we obtain
D = - a21M21 + a22M22 + ... + ( -1) 2 + na2nM2n
and
J5 = a21M21 - a22M22 + ... + (-1) 1 + na2nM2n,
Whence
J5 = -D. ►
Similarly we may prove this property in the case when 15 is obtained
from D by interchanging any two rows.
Transposition of a determinant. The operation of transposing a matrix
A leaves the determinant of A unchanged, 1.e.
IA' I= IAI.
◄ Expanding IA I with respect to the first row and IA' I with respect to
the first column, we infer that I A' I = IA I. ►
Notice that this property implies the equivalence of rows and columns
of any determinant. So the above two properties may be described in terms
of columns of determinants.
Multiplication of determinants. Let A and B be two square matrices
of the same order. Then
IABI = IAl· 1B1,
that is, the determinant of the product of two square matrices of the same
order is equal to the product of the respective determinants.
Now we shall mention some properties of determinants which are help-
ful in computing determinants.
(a) The determinant with two identical rows is equal to zero.
5.2 Determinants 131

◄ Indeed, by property of antisymmetry interchanging any two rows


reverses the sign of determinant. However, interchanging two identical rows
leaves the determinant unchanged. Hence D = - D. This means that
D = 0. ►
(b) Multiplying a row by a scalar is equivalent to multiplying the deter-
minant by the same scalar.
◄ This follows from linear property provided that µ, = 0. ►
(c) The determinant with a zero row is equal to zero.
◄ To prove this property it suffices to expand the determinant with respect
to the zero row. ►-
(d) The determinant with two proportional rows is equal to zero.
◄ By property (b) the common factor may be taken outside the deter-
minant. Then two rows become identical. Hence the determinant is equal
to zero. ►
(e) Adding one row multiplied by a scalar to another row leaves the
determinant unchanged.
◄ By linear property this operation results in the determinant equal to
the sum of two determinants, i.e., the original determinant and that with
two proportional rows. By property (d) the latter is equal to zero. Therefore
the determinant remains unchanged. ►
In general a determinant remains unchanged when a linear combination
of rows is added to any of its rows. The same is true for the columns of
a determinant.
Example. Prove that the sum of elements of one row multiplied by
cofactors of the respective elements of another row is equal to zero.
◄ Replacing the kth row by the ith row in

Cc'.il Cc'.i2 Cc'.in l


D= (i ~ k)
k

Cc'.n 1 Cc'.n2 CX.nn

we obtain

cx.;2 CX.in l

Cc'.i2 k

CX.n 1 CX.n2 CX.nn

41'
132 5. Matrices. Determinants. Systems of Linear Equations

Observe that the ith and kth rows in J5 are identical. By property (a),
J5 = 0. Then expanding 15 with respect to the kth row, we arrive at the
desired identity
O'.i1Ak1 + O'.i2Ak2 + ... + O'.inAkn = 0 (i ~ k).
(Recall that cofactors of elements in any row do not depend on the values
of these elements.) ►
Computing determinants. When computing determinants we widely use
elementary (row and column) operations. It is important to point out that
being subjected to elementary operations either (a) or (b) or (c) the deter-
minant either reverses its sign or is multiplied by a scalar or remai~s un-
changed, respectively. Therefore the elementary operations are closely
related to the properties of determinants.
Example. Compute the determinant of
2 -5 4 3
3 -4 7 4
A= 4 -10 8 3
-3 2 -5 3
◄ Observe that a11 ~ 0 and divides only two elements of the first column
with non-zero remainders. To avoid division of elements of the matrix we
multiply the second row by - 2, the third row by -1 and the fourth row
by 2. Then we obtain
2 -5 4 3
-6 8 -14 -8
( - 2)( -1)21 A I - - 3 .
-4 10 -8
-6 4 -10 6
Step 1. Adding multiples by 3, 2 and 3 of the first row to the second,
third and fourth rows, respectively, we obtain
2 -5 4 3
0 -7 -2 1
41AI - 3 .
0 0 0
0 -11 2 15
Step 2. To avoid division we multiply the fourth row by - 7. Then we
have
2 -5 4 3
0 -7 -2 1
(-7)4 IAI - 0 3 .
0 0
0 77 -14 -105
5.3 Inverse Matrices 133

Adding the second row multiplied by 11 to the fourth row gives


2 -5 4 3
0 -7 -2 1
-281AI - 3 .
0 0 0
0 0 -36 -94
Step 3. Interchanging the third and fourth rows gives
2 -5 4 3
0 -7 -2 1
281AI - -94 .
0 0 -36
0 0 0 3
Computing the determinant of the triangular matrix given above, we
obtain
28 IA I = 2(-7)( - 36)3
and finally
IAI = 54. ►

5.3 Inverse Matrices


A square matrix is called nonsingular if its determinant is not equal
to zero.
Let A = (a;j) be a nonsingular matrix of order n and let B be a square
matrix of order n such that the element of B in the position where the.
ith row and jth column meet is equal to the cofactor Ai; of Olji in A, i.e.,
B = (/3u) where /3u = Aii• Therefore the matrices A and B are related as
shown below:

A= j;
Oljl Olji Oljn

Olnl Olni Olnn

Au Ant
A12 An2
B= .
Au An; l.

A1n Onn
134 5. Matrices. Determinants. Systems of Linear Equations

The matrix B possesses the fallowing property

IAI. 0
AB=BA= IAI •I. (5.38)
0 IAI
By way of illustration we show that
AB= IAl•I.
◄ The element of AB occupying the position where the ith row and the
jth column meet is given by
n
'YU = ~ Ol;kAjk.
k=l
Setting i = j, we see that 'Yii is the result of the expansion of the deter-
minant of A with respect to the ith row, i.e.,
n
'Yii = ~ Ol;kAik = IA I.
k=l

In other words, each diagonal element of AB is equal to the value of the


determinant of A.
By the exercise considered in the previous section, we have
'YU= 0
for each off-diagonal element in AB. ►
Similar reasoning is applicable to the identity BA = IA I · I.
The matrix

Ali An;
(5.39)
IAI IAI

A1n Ann
IAI IAI
is called the inverse of the matrix A.
From (5.38) it follows that
AA - 1 = I, A - 1A = I. (5.40)
This means that the matrix A - 1 may be considered as a solution satisfy-
ing two matrix equations simultaneously
AX = I and XA = I,
5.3 Inverse Matrices 135

where
Xu X1j X1n

X= Xin

Xn1 Xnj Xnn

is an unknown matrix.
Let us show that A - 1 is the only solution satisfying both these equa-
tions simultaneously.
Assume that there exists a matrix C such that
AC = I and CA = I.
Pre-multiplying the former identity by A - 1 and post-multiplying the
latter by A - 1 , we have
A - 1 (AC) = (A - 1)1 and (CA)A - 1 = l(A - 1). (5.41)
By the associative law of multiplication we may write
A - 1 (AC) = (A - 1 A)C and (CA)A - 1 = C(AA - 1).

Substituting these identities into (5.41) and using (5.40), we obtain


IC = A - 1 I and CI = IA - 1•
Whence, (5.6) gives
C = A - 1•
Computing inverse matrices. We shall mention here an effective and .
rather straightforward method for computing an inverse matrix by succes-
sive use of elementary (row and column) operations. We begin with theoret-
ical background of the method, known as pivotal condensation.
Theorem 5.4. By a sequence of elementary row (column) operations an
arbitrary nonsingular matrix is reducible to the identity matrix.
◄ By Theorem 5.1 any matrix is reducible by elementary row operations
to a matrix of schematic form.
Suppose that we are given a square nonsingular matrix.
Let us show that this matrix may be reduced to the triangular matrix
aP? aiY aiY ai~
0 a~~ a~~ a~;? (5.42)
'
0 0 a~~ a~~
. .. . . . .. . . . . . . . . . . .. . . . . . . . .
0 0 0 a<n)
nn

where aP/ ¢ 0, a~~ ¢ 0, a~:Y ¢ 0, ... , a~':l ¢ 0 (see formula (5.15)).


136 5. Matrices. Determinants. Systems of Linear Equations

By Theorem 5.2 each elementary row operation is equivalent to pre-


multiplication of the given matrix by a suitable elementary matrix. Recall
that the determinants of the elementary matrices are not equal to zero.
We have also seen that the determinant of the product of two square ma-
trices is equal to the product of the determinants of these matrices. Thus
multiplying a nonsingular matrix by any elementary matrix and, conse-
quently, any elementary row operation, gives rise to a nonsingular matrix.
This means that if the matrix (5.42) is obtained from the given nonsingular
matrix by applying a sequence of elementary row operations, (5.42) is non-
singular and, consequently, its determinant
a Pl- a~~ . a~~ . . . .. a~':?
is distinct from zero. Whence we infer that
(1) 0 O'. 22
O'. 11 ~ ,
(2) .,....
~ 0
,
(3) ~ 0 (n)
O'. H r- , • • • , O'. nn
~
r-
0•

If (5.42) is subjected to a sequence of elementary row operations of


type (b), it becomes

1 0'.12 0!13 O'.In


0 1 0'.23 0'.2n (5.43)
0 O' 1 0'.3n
. . . . . .. . . .. .. .. . ... . . . ..
0 0 0 1
Adding multiplies of the nth row of (5.43) by - &in, - &2n, •.• ,
- &n - 1,n to the 1st, 2nd, ... , (n - l)th rows, respectively, we obtain the
matrix

1 0'.12 0'.13 0'.1,n-l 0


0 1 0'.23 0'.2,n-1 0
0 0 1 0'.3,n - 1 0
.. . . . .. ... . . .. . . . . . . . . . . ... .
0 0 0 1 0
0 0 0 0 1
such that in its nth column all the elements but the last one, which is equal
to 1, are zeros.
Similarly, adding multiplies of the (n - l)th row of the obtained matrix
by -&1,n-l, -&2,n-1, ••• , -&n-2,n-1 to the first n - 2 rows, we obtain
the matrix such that its (n - l)th and nth columns contain unities in the
principal diagonal and zeros in the other rows. In the end of this process
we add the multiple by - &12 of the second row of the matrix obtained
after n - 1 steps to the first row and finally arrive at the identity matrix
5.3 Inverse Matrices 137

1 0
1
0 1
and thus complete the proof. ►
In matrix form this theorem states the following.
Theorem 5.5. Let A be an arbitrary nonsingular matrix. Then there exist
elementary matrices Q1, Q2, ... , Qk such that
Qk,Qk- I'.,. ·Q2·Q1 ·A = I. (5.44)
Post-multiplying (5.44) by A - 1, we obtain
Qk,Qk-1',,, ·Q2·Q1 = A - 1.
Pivotal condensation. Suppose that we are given a nonsingular matrix
A of order n. Let
(A I I) (5.45)
be a matrix of order n x 2n. If (5.45) is subjected to a sequence of elemen-
tary row operations or, equivalently, is premultiplied by a sequence of
elementary matrices, the matrix A reduces to the identity matrix and the
identity matrix I reduces to the matrix A - 1 inverse to A. Thus, by elemen-
tary row operations (5.45) is reduced to
(I I A - 1 ).

It is easy to understand how we should proceed to compute the inverse ·


of an arbitrary nonsingular square matrix A = (au) (i, j = 1, 2, ... , n):
1. Set up the matrix

0'.11 0'.12 q1n 1 0 0


I
0'.2 I a22 0'.2n 0 1 0
(A I I) = . .. .. . .. . .. .. . . . ... . . ... . . . .. . . . .
O'.n 1 O'.n2 O'.nn 0 0 1
of order n x 2n.
2. By elementary row operations reduce (A I I) to the form

I &12 &in /311 /312 f31n


0 I &2n /321 /322 f32n
(A 1B) = . .. . .. . . . .. . . . .. . . .... . .. . .. .. . . . . ... .
0 0 1 f3n1 f3n2 f3nn
where A is the triangular matrix of order n obtained from A. (See the proce-
dure used to prove Theorem 5.1.)
138 5. Matrices. Determinants. Systems of Linear Equations

3. By elementary row operations reduce (A I B) to the form


1 Q Q ,'11 ,'12 ,'In
0 1 0 ,'21 ,'22 ,'2n
(I IC)=
0 0 1 'Ynl 'Yn2 'Ynn

where I is th·e identity matrix obtained from A. (See the procedure employed
in Theorem 5.4.)
4. The matrix C = (-y;j) of order n (i, j = l, 2, ... , n) in (I IC) is the
inverse of the matrix A, i.e.,
C = A- 1•

Example. Compute the matrix inverse to


1 1 1 1
1 1 -1 -1
A= 1 -1 1 -1
1 -1 -1 1

◄ Set up the matrix of order 4 x 8


1 1 1 1 1 0 0 0
1 ·1 -1 -1 0 1 0 0
B= 1 -1 1 -1 0 1 0
0
1 -1 -1 1 0 0 0 1
Step I. Subtract the first row from the other three rows:
1 1 1 1 1 0 0 0
0 0 -2 -2 -1 1 0 0
B ~ Bi = O -2 0 -2 -1 0 1 0
0 -2 -2 0 -1 0 0 1
Step 2. Observe that a22 = 0. Interchange the second and third rows
and subtract the second row thus obtained from the fourth row:
1 1 1 1 1 0 0 0
0 -2 0 -2 -1 0 1 0
0 0 -2 -2 -1 1 0 0
0 0 -2 2 0 0 -1 1
Step 3. Subtract the third row from the fourth one and divide each row
by its diagonal element:
1 1 1 1 1 0 0 0
0 1 0 1 1/2 0 -1/2 0
0 0 1 1 1/2 -1/2 0 0
0 0 0 1 1/4 1/4 -1/4 1/4
5.4 Rank of a Matrix 139

Step 4. Subtract the fourth row from the other three rows:
1 1 1 0 3/4 1/4 1/4 -1/4
0 1 0 0 1/4 1/4 -1/4 -1/4
B3 ⇒ B4 = 0 0 1 0 1/4 -1/4 1/4 -1/4
0 0 0 1 1/4 1/4 -1/4 1/4
Step 5. Subtract the third row from the first one:
1 1 0 0 1/2 1/2 0 0
0 1 0 0 1/4 1/4 -1/4 -1/4
0 0 1 0 1/4 -1/4 1/4 -1/4
0 0 0 1 1/4 1/4 -1/4 1/4
Step 6. Subtract the second row from the first one:
1 0 0 0 1/4 1/4 1/4 1/4
0 1 0 0 1/4 1/4 -1/4 -1/4
Bs ⇒ B6 = -1/4 -1/4
0 0 1 0 1/4 1/4
0 0 0 1 1/4 -1/4 -1/4 1/4
Whence we conclude that

A - i =!A. ►

5.4 Rank of a Matrix


Suppose that we are given the matrix

CTll a12 CT1n


a21 a22 CT2n
A=
CTm1 CTm2 CTmn

Let us choose in A k rows and k columns such that i1 < i2 < . . . < it and
ji < h < ... < }k refer to the chosen rows and columns, respectively, and
set up the kth order matrix

A=

The determinant Mk of this matrix is called the kth order minor of


the matrix A.
It is easy to see that an m x n matrix has minors of orders 1, 2, ... ,
min (m, n).
140 5. Matrices. Determinants. Systems of Linear Equations

Example. We consider an 11 x 14 matrix shown in the upper part of


Fig. 5.6. Let us choose 7 rows and 7 columns, say, the rows numbered 1,
3, 4, 6, 8, 9, 10 and the columns numbered 2, 5, 6, 7, 10, 12, 13. If we
set up a matrix of order 7 comprising the elements which are contained
in both the rows and columns chosen, we get the matrix shown in the lower
part of Fig. 5.6 provided that the mutual arrangement of these elements
in the original matrix is preserved in the new one. The determinant of the
matrix thus obtained is the 7th order minor of the original matrix.
Let A be a nonzero matrix. Then
2 5 6 7 10 1213 there exists a number r such that
1 (a) at least one rth order minor
of A is distinct from zero and
3
,4 (b) any sth order minor of A
(s > r), if it exists, is equal to zero.
6 ~ W/41 ~ WJ The number r is called the rank
of the matrix A.
1t~ ~ ~~ In symbols, we denote the rank
of A by r(A).
Obviously
0 ~ r(A) ~ min(m, n).
The minor Mr which is distinct
from zero and such that its order is
equal to the rank of the matrix A
is called the base minor of A. The
rows and columns of A comprising
the elements of the base minor of
A are called the base rows and the
base columns of A.
Fig. 5.6 Theorem 5.6. Let A be an ar-
bitrary matrix. Then
(1) The base rows (columns) of A are linearly independent.
(2) Each row (column) of A admits a representation in the form of a
linear combination of the base rows (columns) of A.
◄ For definiteness assume that the base minor of A is of order r and
is in the top left corner of the matrix A, i.e.,
0!11 O!lr O!t,r+ 1 O!In
Mr
O!rl O!rr O!r,r+ 1 O!rn
A= (5.46)
O!r + 1, 1 O!r+ l,r O!r+ l,r+ 1 O!r + 1,n

O!ml O!mr O!m,r+ 1 O!mn


5.4 Rank of a Matrix 141

Then the first r rows of A denoted by a 1, a2, ... , a, become the base rows
of A.
We shall prove that a1, a2, ... , a, are linearly independent. Let us sup-
pose the opposite, i.e., the rows a1, a2, ... , a, are linearly dependent. Then
one of the rows is a linear combination of the other r - 1 rows. Assume
that

This means that the rth row of the base minor M, is a linear combination
of the other rows of M,. Recall that any determinant such that one of its
rows is a linear combination of the others is equal to zero. Hence, M, = 0.
But this contradicts the definition of the base minor. Consequently, the
assumption on the linear dependence of a1, a2, ... , a, is false. Thus we
infer that a1, a2, ... , a, are linearly independent.
Now we show that each row of A can be represented as a linear combi-
nation of the other base rows of A. First, we ascertain that for any i and
j (1 ~ i ~ m, 1 ~ j ~ n) there holds
a11 CX1r CX1j

Ll=
a,1 CXrr CXrj
= 0. (5.47)

a;1 CXir CXij

Indeed, if i ~ r .1. contains two identical rows, and if j ~ r .1. contains


two identical columns. Hence, .1. = 0 in both these cases. When i > r and
j > r, .1. becomes the (r + l)th order minor of the matrix A.
Let us fix a subscript i (1 ~ i ~ m), the ith row, and expand the deter-
minant .1. with respect to the last column. We get
(5.48)
The identity (5.48) holds for any j (1 ~ j ~ n) and .1.1, .1.2, ... , .1., are in-
dependent of j.
Setting

At=

we may write (5.48) in the form


CXij = AtCXtj + A20l.2j + ... + ArCXrj, J = 1, 2, ... ' n,
or
ail = AtCXtt + A2a2t + ... + A,a,1
a;2 = At CXt2 + A2a22 + . . . + A,a,2 (5.49)
142 5. Matrices. Determinants. Systems of Linear Equations

From (5.49) it .follows that

Whence we conclude that the ith row of A is a linear combination of the


base rows a1, a2, ... , Br. Since the above reasoning is applicable to any
other row of A we infer that each row of A is a linear combination of
the base rows in A. ►
Proposition. If a matrix is subjected to elementary operations its rank
does not increase.
◄ Let A be a matrix of rank f obtained from a matrix A of rank r by
interchanging its rows. We consider an arbitrary minor Ms of order s in
A and choose in A a minor Ms of the same order s, such that the elements
of Ms are arranged in the rows and columns of A in the same manner
as the rows and columns of A comprising the elements of Ms. Observe
that interchanging the rows of A rearranges the rows without altering them.
This means that the rows in Ms may only differ from those in Ms by their
arrangements. Whence it follows that either Ms = Ms or Ms = - Ms.
By definition of the rank all minors of A whose orders exceed r are
equal to zero. From the above consideration we infer that any sth (s > r)
order minor Ms of A is equal to zero, i.e., Ms = 0. This means that the
rank of A cannot exceed the rank of A, i.e., f ~ r.
Similarly, we may show that f ~ r in the cases when the matrix A is
obtained from A by elementary operations of the other two types. Obvious-
ly, this assertion is true for elementary column operations. ►
Theorem 5.7. If an arbitrary matrix A is subjected to elementary row
(column) operations its rank remains unchanged.
◄ It suffices to recall that if the matrix A is obtained from the matrix
A by elementary operations, the matrix A is obtainable from A by elemen-
tary operations of the same type. Toking into account the above assertion,
we conclude that r ~ f.
Comparing the inequalities f ~ r and r ~ f, we arrive at the desired
identity

- = r.
r ►

Remark. The number of nonzero rows in a matrix of schematic form


is equal to the rank of the matrix.
Indeed, for a matrix of schematic form the rth order minor whose ele-
ments are contained in the first r rows and in the columns k1, k2, ... ,
kr is distinct from zero, i.e.,
5.5 Systems of Linear Equations 143

0
_ (1) (2) (r) 0
- ex tk1 cx.21c2 ••• cx,k, ~

0 (r)
CX.rk,

and any minor of order s (s > r) comprises the zero row and, consequently,
is equal to zero.
Thus, subjecting a matrix to a sequence of elementary operations, we
may easily and effectively define the rank of this matrix.
Example. Find the rank of

A = (~ - ~ ~ - i ~).
2 -1 1 8 2
◄ Step 1. Subtracting the multiples of the first row by 2 and 1 from the
second and third rows, respectively, we get
2 -1 3 -2
r(A) = r ( ~ -1 5
o oI -2 10
Step 2. Subtracting the second row multiplied by 2 from the third row,
we get
-1
0 I -I
3 -2 4)~5 - = 2. ►
0 0 0

5.5 Systems of Linear Equations


Definitions. Suppose that we are given a matrix

CX.11 Ci.In f31


cx.21 Ci.Zn (32
A= (5.50)
Ci.ml CX.m2 CX.mn f3m

whose first n columns are nonzero.


A collection of relations of the form
CX.11X1 + cx.12X2 + ... + CX1nXn = (31,
CX.21X2 + CX.22X2 + . . . + CX2nXn = (32, (5.51)

CX.m1X1 + CX.m2X2 + ... + CXmnXn = f3m,


144 _ 5. Mat~ices. D~term!na_~ts. ~ys_te~s of__~j!_l_ear Equation=s'-----------

where Xi, X2, ... , Xn are quantities to be defined (unknowns), is called


the system of m linear equations in n unknowns or the linear system.
The numbers au (i = 1, 2, ... , m; j = 1, 2, ... , n) are called the coeffi-
cients of the linear system (5.51) and the numbers {3; (i = 1, 2, ... , m) are
called the constant or free terms.
A solution of the linear system (5.51) is an ordered n-tuple of numbers
,'1, 'Y2, ... , 'Yn which, being substituted into (5.51) for the unknowns x 1 ,
X2, ... , Xn, turns each equation of (5.51) into identity.
The system of linear equations is said to be consistent if it has a solution
and inconsistent if it has no solutions.
Two solutions 'Yl , 'Y2, ... , 'Yn and 'Y {, yJ,, ... , 'Y; are called distinct
if at least one of the identities ,'1 = 'Y{, 'Y2 = yf_, ... , 'Yn = ,,.; is violated.
A consistent system is called determinate if it has a unique solution,
and indeterminate if it has at least two distinct solutions.
In matrix form, the system (5.91) may be written as the matrix equation
AX= b (5.52)
'
where
0'.11 Ci.In

0'.21 0'.2n
A=
Ci.ml O'.m2 O'.mn

(5.53)
/31
f32
b= and X =

/3m

The matrix A is called the coefficient matrix of the system (5.51), the
column-vector b is called the column-vector of constant terms and the
column-vector X is called the column-vector of unknowns. The matrix
A= (Alb)
is called the augmented matrix of the system (5.51). The solution of the
matrix equation (5.52) is the column-vector

,'l
,'2
f=

'Yn
5.5 Systems of Linear Equations 145

Theorem 5.8 (Kronecker-Capelli theorem). The system of linear equa-


tions (the linear system) is consistent if and only if the rank of the coeffi-
cient matrix is equal to the rank of the augmented matrix.
◄ Suppose that the system (5.51) is consistent. This means that the n-tuple
of numbers 'YI, ')'2, ... , 'Yn turns each equation of (5.51) into identity, that
IS,
a 11 'Y 1 + a 12 ')'2 + . . . + a 1n ')' n = /3 1
a21 ')' 1 + a22 ')'2 + . . . + 0'.2n ')'n = f32
..................... ....... . "

CXml ')'I + O'.m2')'2 + • • • + O'.mn')'n = f3m

From the above identities it follows that in the- augmented matrix


A = (A I b) the column-vector b of constant terms is a linear combination
of the other columns in A, i.e., b is a linear combination of the columns
of A.
Adding to the column-vector b the first column of A multiplied by - ')'1,
the second column of A multiplied by - ')'2, etc., and finally the nth column
of A multiplied by - 'Yn, we reduce the augmented matrix A to the matrix
A= (A I 0).
The rank of A is equal to that of A since, by Theorem 5.7, the elemen-
tary column operations leave the rank unchanged. On the other hand, it
is clear that the rank of A = (A I 0) is equal to that of A. Whence we infer
that r(A) = r(A) = r(A).
Now we suppose that the rank of the coefficient matrix is equal to that
of the augmented matrix.
Since A = (A I b), the matrices A and A have common base minors.
Let the common base minor be of order r and let this base minor be in
the top left corners of the respective matrices.
By Theorem 5.6 any column in A admits a representation in the form
of a linear combination of the base columns of A. In particular, the
column-vector of constant terms, i.e., the (n + l)th column in A, may be
written as
O'. 1 l a12 O'.Ir
a21 a22 0'.2r
')'l + ')'2 + ... + 'Yr

O!mt O!m2 O'.mr

or
O'.I 1)'l + 0'.12 ')'2 + + O'.tr'Yr = f31,
a21 'YI + a22 ')'2 + + 0'.2r')'r = f32,
O!mI 'YI + O'.m2'Y2 + ••• + O!mr'Yr = f3m.
10-9505
146 5. Matrices. Determinants. Systems of Linear Equations

It is easy to see that the ordered n-tuple of numbers


')'1, ')'2, , , ., ')'r, 0, ... , 0
turns each equation of the original system into identity. Whence it follows
that the system (5.51) is consistent. ►
Equivalence of linear systems. We shall call a collection of all solutions
of a given linear system a set of solutions.
1\:vo linear systems comprising equal numbers of unknowns are said
to be equivalent if their sets of solutions coincide. In other words, two
linear systems are equivalent if each solution of one system is a solution
of the other one and vice versa, or if both these systems have no solutions.
It is clear that any linear system is uniquely determined by its augmented
matrix. Suppose that A and A' are two m x (n + 1) augmented matrices
of the linear systems
lt'.11X1 + a12X2 + ... + CX1nXn = /31,
a21X1 + CX22X2 + ... + CX2nXn = /32, (*)

and
a{1X1 + a{2X2 + ... + a{nXn = {3{,
CX21X1 + CX22X2 + .. , + CX2nXn = /32,

The system (* ') is said to be obtained from the system ( *) by elementary


operations if the augmented matrix A ' is obtained from the augmented
matrix A by elementary row operations.
Theorem 5.9. If the linear system (* ') is obtained from the linear system
( *) by elementary operations these systems are equivalent.
◄ First, we suppose that the system ( *) is consistent. Let ')'1, ')'2, •.. , ')'n
be a solution of ( *). We show that this n-tuple of numbers turns each equa-
tion of(*') into identity. Consider three cases-which correspond to elemen-
tary operations of different kinds:
(a) Assume that the system ( * ') is obtained from ( *) by interchanging
rows in A. This means that the matrix A' comprises the same collection
of rows as A and, consequently, the system ( * ') contains the same collec-
tion of equations as (•). Whence we conclude that if the n-tuple ')'1, ')'2,
••• , ')'n turns each equation of ( *) into identity, it also turns each equation
of (• ') into identity.
(b) Assume that the system (• ') is obtained from (•) by multiplying
the kth row in A by a nonzero scalar A. Then the kth equation of (• ')
5.5 Systems of Linear Equations 147

will be of the form


(ACc'.k1)X1 + (ACc'.k2)X2 + ... + (Aakn)Xn = A(3k
and the other (m - 1) equations will be the same as in ( *).
Substituting 'YI, 'Y2, ... , 'Yn into the kth equation of ( *), we obtain the
identity
akI'YI + Cc'.k2'Y2 + · · · + Cc'.kn'Yn = f3k,
Multiplying this identity by A -:;c 0, we arrive at
(ACc'.kl)'YI + (ACc'.k2)'Y2 + ,., + (ACc'.kn)'Yn = A{3k.
Whence we conclude that the solution of ( *) is also the solution of ( * ')
since 'YI, -y2, ... , 'Yn turns the kth and all other equations of (* ') into
identities.
(c) Now we assume that the system ( * ') is obtained by adding to the
/th row in A the kth row multiplied by a scalar µ. Then the /th equation
of ( * ') will be of the form
(a11 + µak1)X1 + (a,2 + µak2)X2
(5.54)
and the other (m - 1) equations in (* ') will be the same as in ( *).
Substituting 'YI, -y2, ... , 'Yn into the kth and /th equations of ( *), we
obtain two identities
ak 1'Yl + Cc'.k2 'Y2 + + Cc'.kn'Yn = f3k,
Cc'./1 'Yl + Cc'./2 'Y2 + + Cc'.fn'Yn = f31.
Substituting the multiple of the first identity by µ into the second one,
we obtain
(a11 + µak1)-y1 + (an +- µak2)-y2
+ . • • + (Cc'./n + µCc'.kn)'Yn = (3, + µ{3k •
Whence we infer that the solution 'YI, 'YZ, ... , 'Yn of (*) is also the solu-
tion of ( * '). Notice that by substituting 'YI, -yz, ... , 'Yn into (5.54), we reach
the same result.
Thus, in each case any solution of the system ( *) is a solution of the
system ( * ').
Observe that the system ( *) may also be obtained from the system ( * ')
hy elementary operations. Repeating the previous arguments, we infer that
any solution of the system ( * ') is a solution of the system ( *).
Second, we suppose that the system ( *) is inconsistent. Then the system
(• ') is also inconsistent. To prove this fact, assume the opposite, i.e., assume
I hat the system ( * ') is consistent. Then Theorem 5.9 implies that the system

I fl I
148 5. Matrices. Determinants. Systems of Linear Equations

(•) is also consistent. However, this contradicts to what we have assumed.


Whence we conclude that if the system (*) is inconsistent, so is the system
(• '). ►
Remark. Clearly, if the system (* ') is obtained from the system (•) by
a finite sequence of elementary operations, then these systems are
equivalent..
Gaussian elimination. When we wish to solve a linear system we should,
first, find out whether the system is consistent, and if this is the case,
second, find the set of its solutions.
We shall outline a method of solving a given linear system called the
method of Gaussian elimination. The method employs elementary opera-
tions to reduce a given linear system to a system of simpler form whose
solutions are easily computed.
The elementary operations on a linear system are equivalent to the
elementary row operations on its augmented matrix. So we shall simultane-
ously consider the linear system
a11X1 + a12X2 + ... + a1nXn = /31,
a21X1 + a22X2 + ... + a2nXn = /32,

and the augmented matrix

an a12 U1n
a21 a22 a2n
A=
Uml Um2 Umn

In the previous sections we have shown that the elementary row opera-
tions may reduce the augmented matrix A to schematic form

A'-
a(r)
tJr+ 1

0
0
0
Similarly, the system (•) is reduced to the form with the augmented
matrix A.
5.5 Systems of Linear Equations 149

If the constant term β_{r+1}^(r+1) is distinct from zero, the new linear system
(and, consequently, the original one) is inconsistent. Indeed, the (r + 1)th
equation takes the form
0·x₁ + 0·x₂ + ... + 0·xₙ = β_{r+1}^(r+1) ≠ 0.
This means that there exists no n-tuple of numbers turning the above equation
into an identity.
Let β_{r+1}^(r+1) = 0. Then only the first r rows in A′ are distinct from zero.
To simplify the notation we set
k₂ = 2, k₃ = 3, ..., k_r = r.
(This may be achieved by renumbering the unknowns: y₁ = x₁, y₂ = x_{k₂}, ..., y_r = x_{k_r}, ....)
The linear system takes the form
α₁₁^(1)x₁ + α₁₂^(1)x₂ + ... + α₁ᵣ^(1)xᵣ + ... + α₁ₙ^(1)xₙ = β₁^(1),
          α₂₂^(2)x₂ + ... + α₂ᵣ^(2)xᵣ + ... + α₂ₙ^(2)xₙ = β₂^(2),
          ..............................................
                    α_rr^(r)xᵣ + ... + α_rn^(r)xₙ = β_r^(r),
where α₁₁^(1) ≠ 0, α₂₂^(2) ≠ 0, ..., α_rr^(r) ≠ 0. Two cases are possible.
(1) The number n of unknowns in (*′) is equal to the number r of
equations, i.e., r = n. Then the system (*′) takes the form
α₁₁^(1)x₁ + α₁₂^(1)x₂ + ... + α₁,ₙ₋₁^(1)x_{n−1} + α₁ₙ^(1)xₙ = β₁^(1),
          α₂₂^(2)x₂ + ... + α₂,ₙ₋₁^(2)x_{n−1} + α₂ₙ^(2)xₙ = β₂^(2),
          ................................................
                    α_{n−1,n−1}^(n−1)x_{n−1} + α_{n−1,n}^(n−1)xₙ = β_{n−1}^(n−1),
                                                α_nn^(n)xₙ = β_n^(n),
where α_kk^(k) ≠ 0, k = 1, 2, ..., n.


The last equation uniquely defines the value of xₙ. Substituting the
value of xₙ into the (n − 1)th equation, we get the value of x_{n−1}. Then
substituting the values of xₙ and x_{n−1} into the (n − 2)th equation, we obtain
the value of x_{n−2}. Continuing this process, we obtain the values of
x_{n−3}, x_{n−4}, ..., x₃, x₂. Finally, substituting the values of x₂, x₃, ..., xₙ
into the first equation we obtain the value of x₁.
Thus the system (*′) has a unique solution and so does the system (*).
(2) The number n of unknowns in (*′) is greater than the number r
of equations, i.e., n > r.
We set the unknowns x_{r+1}, x_{r+2}, ..., xₙ equal to arbitrary numbers
γ_{r+1}, γ_{r+2}, ..., γₙ. Then transposing the corresponding terms to the right,
we obtain the system
α₁₁^(1)x₁ + α₁₂^(1)x₂ + ... + α₁ᵣ^(1)xᵣ = β₁^(1) − α_{1,r+1}^(1)γ_{r+1} − ... − α₁ₙ^(1)γₙ,
          α₂₂^(2)x₂ + ... + α₂ᵣ^(2)xᵣ = β₂^(2) − α_{2,r+1}^(2)γ_{r+1} − ... − α₂ₙ^(2)γₙ,
          ........................................................
                    α_rr^(r)xᵣ = β_r^(r) − α_{r,r+1}^(r)γ_{r+1} − ... − α_rn^(r)γₙ.
Similarly to case (1) we obtain the values of xᵣ, x_{r−1}, ..., x₂, x₁.
Since γ_{r+1}, γ_{r+2}, ..., γₙ are arbitrary numbers we infer that the system
has infinitely many solutions. ►
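The two cases above translate directly into a computational procedure. Below is a minimal sketch in Python (the helper name gauss_solve, the partial-pivoting choice and the convention of setting free unknowns to zero are our own, not taken from the text): it performs forward elimination on the augmented matrix, detects inconsistency, and back-substitutes.

def gauss_solve(A, b, eps=1e-12):
    # Forward elimination on the augmented matrix (A | b).
    n_rows, n_cols = len(A), len(A[0])
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    pivots = []          # column index of the pivot in each nonzero row
    r = 0                # current pivot row
    for c in range(n_cols):
        p = max(range(r, n_rows), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < eps:
            continue                     # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, n_rows):
            f = M[i][c] / M[r][c]
            for j in range(c, n_cols + 1):
                M[i][j] -= f * M[r][j]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    # A row 0 ... 0 | beta with beta != 0 means the system is inconsistent.
    for i in range(r, n_rows):
        if abs(M[i][n_cols]) > eps:
            return None
    # Back substitution; free unknowns (columns without a pivot) are set to 0,
    # which yields one particular solution when n > r.
    x = [0.0] * n_cols
    for k in range(r - 1, -1, -1):
        c = pivots[k]
        s = M[k][n_cols] - sum(M[k][j] * x[j] for j in range(c + 1, n_cols))
        x[c] = s / M[k][c]
    return x

# The second worked example below (unique solution):
print(gauss_solve([[2, 5, -8], [4, 3, -9], [2, 3, -5], [1, 8, -7]],
                  [8, 9, 7, 12]))        # -> [3.0, 2.0, 1.0]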
Example. Solve the system
3x₁ − 5x₂ + 2x₃ + 4x₄ = 2,
7x₁ − 4x₂ +  x₃ + 3x₄ = 5,
5x₁ + 7x₂ − 4x₃ − 6x₄ = 3.
◄ Let us arrange the augmented matrix
3  −5   2   4   2
7  −4   1   3   5
5   7  −4  −6   3
and reduce it to schematic form.
Step 1. To get a matrix with α₁₁ = 1 we subtract the first row multiplied
by 2 from the second row and interchange the first row and the obtained
second one. We have
1   6  −3  −5   1
3  −5   2   4   2
5   7  −4  −6   3
Subtracting the multiples of the first row by 3 and 5 from the second
and third rows, respectively, we get
1   6  −3  −5   1
0 −23  11  19  −1
0 −23  11  19  −2
Step 2. Subtracting the second row from the third one, we have
1   6  −3  −5   1
0 −23  11  19  −1
0   0   0   0  −1
Since the rank of the coefficient matrix is equal to 2 and the rank of
the augmented matrix is equal to 3, we conclude that the linear system is
inconsistent. ►
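The inconsistency found here is exactly the rank criterion used in the conclusion: the system is consistent if and only if the rank of the coefficient matrix equals the rank of the augmented matrix. A quick numerical check of this example (using NumPy, which is our own choice of tool; the text itself relies on no software):

import numpy as np

A = np.array([[3., -5,  2,  4],
              [7., -4,  1,  3],
              [5.,  7, -4, -6]])
b = np.array([2., 5., 3.])
Ab = np.column_stack([A, b])                 # augmented matrix
print(np.linalg.matrix_rank(A),              # 2
      np.linalg.matrix_rank(Ab))             # 3  -> ranks differ: inconsistent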

Example. Solve the system


2x1 + 5x2 - 8X3 = 8,
4x1 + 3x2 - 9x3 = 9,
2x1 + 3X2 - 5X3 = 7,
X1 + 8X2 - 7X3 = 12.
◄ Arrange the augmented matrix
2 5 -8 8
4 3 -9 9
A= 2 3 -5 7
1 8 -7 12

Step 1. Interchanging the first and fourth rows, we have


1 8 -7 12
4 3 -9 9
2 3 -5 7
2 5 -8 8
Subtracting the multiples of the first row by 4, 2 and 2 from the second,
third and fourth rows, respectively, we get

1 8 -7 12
0 -29 19 -39
0 -13 9 -17
0 -11 6 -16
Step 2. To simplify computations we subtract the third row multiplied
by 2 from the second row and then the fourth row from the third one.
We have
1   8  −7   12
0  −3   1   −5
0  −2   3   −1
0 −11   6  −16
Subtracting the third row from the second one, we get
1   8  −7   12
0  −1  −2   −4
0  −2   3   −1
0 −11   6  −16
Subtracting the multiples of the second rows by 2 and 11 from the third
and fourth rows, respectively, we obtain

1   8  −7   12
0  −1  −2   −4
0   0   7    7
0   0  28   28
Step 3. Subtracting the third row multiplied by 4 from the fourth row,
we have
1   8  −7   12
0  −1  −2   −4
0   0   7    7
0   0   0    0
Multiplying the third row by 1/7, we obtain
1   8  −7   12
0  −1  −2   −4
0   0   1    1
0   0   0    0
The system is consistent since the rank of the coefficient matrix is equal
to the rank of the augmented matrix, that is, r(A) = r(A) = 3. Since the
rank is equal to the number of unknowns the system has a unique solution.
Thus the original system is equivalent to the system
Xt + 8X2 - 7X3 = 12,
[ - X2 - 2x3 = -4,
X3 = 1.
From the third equation we have x 3 = 1. Substituting X3 = 1 into the
second equation, we have - X2 - 2 = -4. Whence, x2 = 2. Substituting
the values of x2 and X3 into the first equation, we get x1 + 16 - 7 = 12.
Whence X1 = 3. Thus the system has the unique solution x1 = 3, x2 = 2,
X3 = 1. ►
Example. Solve the system
3Xt - 2x2 + 5X3 + 4X4 = 2,
[ 6x1 - 4x2 + 4x3 + 3X4 = 3,
9x1 - 6x2 + 3x3 + 2X4 = 4.
◄ Arrange the augmented matrix
3  −2   5   4   2
6  −4   4   3   3
9  −6   3   2   4
Step 1. Subtracting the multiples of the first row by 2 and 3 from the
second and third rows, respectively, we have
3  −2   5   4   2
0   0  −6  −5  −1
0   0 −12 −10  −2
Step 2. Subtracting the multiple of the second row by 2 from the third
row, we get
3  −2   5   4   2
0   0  −6  −5  −1
0   0   0   0   0
The system is consistent (r(A) = r(A) = 2) and has infinitely many so-
lutions since the rank of the coefficient matrix is smaller than the number
of unknowns (r(A) < 4).
The original system is equivalent to the system

[
3xi - 2x2 + 5x3 + 4X4 = 2,
- 6X3 - 5X4 = -1.
Let us find the solution of this system.
Set x₂ and x₄ equal to arbitrary numbers γ₂ and γ₄. Then transposing
the corresponding terms of the system to the right, we have
3x₁ + 5x₃ = 2 + 2γ₂ − 4γ₄,
     6x₃ = 1 − 5γ₄.
From the last equation we get
x₃ = (1/6)(1 − 5γ₄),
where γ₄ is an arbitrary number.
Substituting the expression for x₃ into the first equation, we obtain
x₁ = (1/18)(7 + 12γ₂ + γ₄),
where γ₂ and γ₄ are arbitrary numbers.
The general solution of the system is
x₁ = (1/18)(7 + 12γ₂ + γ₄),  x₂ = γ₂,  x₃ = (1/6)(1 − 5γ₄),  x₄ = γ₄.
Any specific solution is obtainable from the general one by giving
specific values to γ₂ and γ₄. For instance, if we set x₂ = γ₂ = 1 and x₄ =
γ₄ = −1, we get x₁ = x₃ = 1. Thus a specific solution of the system is
x₁ = 1, x₂ = 1, x₃ = 1, x₄ = −1. ►

Cramer's rule. Suppose we are given a system of n linear equations in


n unknowns
α₁₁x₁ + α₁₂x₂ + ... + α₁ₙxₙ = β₁,
α₂₁x₁ + α₂₂x₂ + ... + α₂ₙxₙ = β₂,
.....................................     (5.55)
αₙ₁x₁ + αₙ₂x₂ + ... + αₙₙxₙ = βₙ,
or, in matrix form,
AX = b.     (5.56)
System (5.55) is called the quadratic system of linear equations.
If the square matrix A is nonsingular the system (5.55) is consistent
and has a unique solution since r(A) = n.
Pre-multiplying (5.56) by the matrix A⁻¹ inverse to A, we have
X = A⁻¹b
or, using formula (5.39),

  x₁            A₁₁  A₂₁  ...  Aₙ₁     β₁
  x₂   = 1/|A|  A₁₂  A₂₂  ...  Aₙ₂  ·  β₂
  ...           ..................     ...
  xₙ            A₁ₙ  A₂ₙ  ...  Aₙₙ     βₙ

After a little algebra in the right-hand side of the above expression, we
obtain each unknown as a quotient of determinants:

       | α₁₁  α₁₂  ...  β₁  ...  α₁ₙ |
       | α₂₁  α₂₂  ...  β₂  ...  α₂ₙ |
       | ........................... |
       | αₙ₁  αₙ₂  ...  βₙ  ...  αₙₙ |
x_j = ---------------------------------- ,   j = 1, 2, ..., n,     (5.57)
       | α₁₁  α₁₂  ... α₁ⱼ  ...  α₁ₙ |
       | α₂₁  α₂₂  ... α₂ⱼ  ...  α₂ₙ |
       | ........................... |
       | αₙ₁  αₙ₂  ... αₙⱼ  ...  αₙₙ |

where the column of constant terms stands in the jth place of the numerator,

where the numerator is the determinant of the matrix obtained from A


by substituting the column-vector b for thejth column and the denominator
is the determinant of the coefficient matrix A.
It is important to mention that Cramer's rule (5.57) is mainly of theoretical
interest; for practical purposes it is too awkward for solving linear systems
except those in two or three unknowns.
Remark. Cramer's rule involves n + 1 determinants of order n, thus requiring
a vast amount of direct computation, proportional to n!·n, which far exceeds
the number of arithmetic operations required by Gaussian elimination. When
n = 30 the task of computing the solution of the linear system in this way
becomes time consuming even for advanced computers. The total number of
arithmetic operations involved in Gaussian elimination is proportional to n³.
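For small systems formula (5.57) is nevertheless easy to program directly. A sketch (Python with NumPy determinants; the function name cramer is ours), shown only to make the formula concrete, not as a recommended solver:

import numpy as np

def cramer(A, b):
    A, b = np.asarray(A, float), np.asarray(b, float)
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("singular coefficient matrix: Cramer's rule does not apply")
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                      # replace the jth column by the constant terms
        x[j] = np.linalg.det(Aj) / d      # formula (5.57)
    return x

# 2x1 + x2 = 1, x1 + 2x2 = 2 (the system used later for simple iteration)
print(cramer([[2, 1], [1, 2]], [1, 2]))   # [0. 1.]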
To solve the quadratic linear system
AX= b,
where A is a nonsingular square matrix, we should proceed as follows:
Step 1. Arrange the augmented matrix

Ā = (A | b).

Step 2. By elementary row operations reduce Ā to the form

             1  α̃₁₂  ...  α̃₁ₙ   β̃₁
(Ã | b̃) =   0   1   ...  α̃₂ₙ   β̃₂
             ......................
             0   0   ...   1    β̃ₙ

where Ã is a triangular matrix.
Step 3. By elementary row operations reduce (Ã | b̃) to the form

            1  0  ...  0   γ₁
(I | c) =   0  1  ...  0   γ₂
            ..................
            0  0  ...  1   γₙ

where I is the identity matrix.


Step 4. Write down the system corresponding to the augmented matrix (I | c):
x₁ = γ₁,
x₂ = γ₂,
..........
xₙ = γₙ.

The n-tuple
x₁ = γ₁, x₂ = γ₂, ..., xₙ = γₙ
is the solution of the original system.
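Steps 1–4 amount to the Gauss–Jordan variant of elimination: first zeros below the pivots, then zeros above them as well, so that the last column becomes the solution. A minimal sketch (plain Python; gauss_jordan is our own helper name, and it assumes, as the text does, that A is a nonsingular square matrix):

def gauss_jordan(A, b):
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]          # augmented matrix (A | b)
    for k in range(n):
        # Step 2: choose a pivot, scale the pivot row to make the pivot 1 ...
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        # Step 3: ... and clear the pivot column both below and above the pivot.
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [M[i][j] - f * M[k][j] for j in range(n + 1)]
    # Step 4: the last column of (I | c) is the solution.
    return [M[i][n] for i in range(n)]

print(gauss_jordan([[2, 1], [1, 2]], [1, 2]))          # [0.0, 1.0]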
Homogeneous linear systems. The system of m linear equations in n
unknowns is called homogeneous if all constant terms are equal to zero,
1.e., the homogeneous linear system is given by
a11X1 + a12X2 + ... + a1nXn = 0,
a21X1 + a22X2 + ... + a2nXn = 0, (5.58)
am1X1 + am2X2 + ... + amnXn = 0.
We shall consider some important properties of homogeneous linear
systems.
(a) Any homogeneous system is consistent.
◄ The n-tuple X1 = 0, X2 = 0, ... , Xn = 0 is the trivial solution of any
homogeneous system. ►
(b) If the number n of the unknowns exceeds the number m of the equa-
tions the homogeneous system has nontrivial solutions.
◄ By definition the rank r of the system (5.58) satisfies the inequality
r ~ m < n. Whence we infer that the linear system is indeterminate. ►
(c) The sum of solutions of the homogeneous system (5.58) is a solution
of (5.58).
◄ Let γ′₁, γ′₂, ..., γ′ₙ and γ″₁, γ″₂, ..., γ″ₙ be two solutions of (5.58). This
means that
Σ_{j=1}^n α_{ij}γ′ⱼ = 0  and  Σ_{j=1}^n α_{ij}γ″ⱼ = 0
for any i (i = 1, 2, ..., m).
Since
Σ_{j=1}^n α_{ij}(γ′ⱼ + γ″ⱼ) = Σ_{j=1}^n α_{ij}γ′ⱼ + Σ_{j=1}^n α_{ij}γ″ⱼ = 0  (i = 1, ..., m),
the n-tuple
γ′₁ + γ″₁, γ′₂ + γ″₂, ..., γ′ₙ + γ″ₙ,
i.e., the sum of the solutions of (5.58), is also a solution of the homogeneous
linear system (5.58). ►
(d) The product of the solution of (5.58) by an arbitrary number is also
a solution of (5.58).

◄ Let γ₁, γ₂, ..., γₙ be a solution of (5.58), i.e.,
Σ_{j=1}^n α_{ij}γⱼ = 0  (i = 1, 2, ..., m),
and let µ be an arbitrary number. Then
Σ_{j=1}^n α_{ij}(µγⱼ) = µ Σ_{j=1}^n α_{ij}γⱼ = 0  (i = 1, 2, ..., m).
Whence we conclude that the n-tuple µγ₁, µγ₂, ..., µγₙ, i.e., the product
of the solution γ₁, γ₂, ..., γₙ by the number µ, is a solution of the system
(5.58). ►
In matrix form, the homogeneous linear system may be written as

α₁₁   α₁₂   ...  α₁ₙ     x₁     0
α₂₁   α₂₂   ...  α₂ₙ  ·  x₂  =  0
.......................  ...   ...
α_{m1} α_{m2} ... α_{mn}   xₙ     0

or, as the matrix equation,
AX = 0.     (5.59)
Then the properties (c) and (d) may be formulated as follows.
(c′) If the column-vectors Γ′ and Γ″ are solutions of the system
(5.59), so is the column-vector Γ′ + Γ″.
◄ Indeed, if Γ′ and Γ″ are solutions of (5.59), then AΓ′ = 0,
AΓ″ = 0 and A(Γ′ + Γ″) = AΓ′ + AΓ″ = 0 + 0 = 0, so that the
column-vector Γ′ + Γ″ is a solution of (5.59). ►
(d′) If the column-vector Γ is a solution of (5.59), so is µΓ, where µ
is an arbitrary number.
◄ Indeed, if Γ is a solution of (5.59), then AΓ = 0 and A(µΓ) = µ(AΓ) =
µ·0 = 0, so that the product of the column-vector Γ by an arbitrary number
µ is a solution of (5.59). ►
The properties (c′) and (d′) imply that the solutions of the homogeneous
linear system, with the induced operations of addition and multiplication
by an arbitrary number, form a linear space.*) Let us acquaint ourselves
with an important property of this linear space.
By the method of Gaussian elimination we may reduce the homogene-
ous linear system (5.58) to the form

*) The notion of linear space will be discussed in Chap. 6.



x₁ + β₁₂x₂ + ... + β₁ᵣxᵣ + β_{1,r+1}x_{r+1} + ... + β₁ₙxₙ = 0,
       x₂ + ... + β₂ᵣxᵣ + β_{2,r+1}x_{r+1} + ... + β₂ₙxₙ = 0,     (5.60)
       ..................................................
              xᵣ + β_{r,r+1}x_{r+1} + ... + βᵣₙxₙ = 0.

Allowing for renumbering of the unknowns, we may consider x₁, x₂, ...,
xᵣ as the principal unknowns and the other unknowns as indeterminate (free)
unknowns.
Let the rank r of the system (5.60) be smaller than the number n of
the unknowns, that is, r < n.
We compute n − r basic solutions by giving the indeterminate
unknowns x_{r+1}, x_{r+2}, ..., xₙ the values from the following table

            x_{r+1}   x_{r+2}   ...   x_{n−1}   xₙ
   1           1         0      ...      0       0
   2           0         1      ...      0       0     (5.61)
  ...         ...       ...     ...     ...     ...
 n−r−1         0         0      ...      1       0
  n−r          0         0      ...      0       1

Each row in (5.61) determines a solution of the system (5.60), so that
we obtain a collection of n − r solutions (written as column-vectors)
Γ₁ = (γ₁₁, γ₂₁, ..., γ_{r1}, 1, 0, ..., 0, 0),
Γ₂ = (γ₁₂, γ₂₂, ..., γ_{r2}, 0, 1, ..., 0, 0),
................................................
Γ_{n−r−1} = (γ_{1,n−r−1}, γ_{2,n−r−1}, ..., γ_{r,n−r−1}, 0, 0, ..., 1, 0),
Γ_{n−r} = (γ_{1,n−r}, γ_{2,n−r}, ..., γ_{r,n−r}, 0, 0, ..., 0, 1).
The solutions Γ₁, Γ₂, ..., Γ_{n−r} are linearly independent.
◄ Consider the linear combination
µ_{r+1}Γ₁ + µ_{r+2}Γ₂ + ... + µₙΓ_{n−r}
 = (µ_{r+1}γ₁₁ + µ_{r+2}γ₁₂ + ... + µₙγ_{1,n−r},
    ..............................................
    µ_{r+1}γ_{r1} + µ_{r+2}γ_{r2} + ... + µₙγ_{r,n−r},
    µ_{r+1}, µ_{r+2}, ..., µ_{n−1}, µₙ),     (5.62)
written as a column-vector.
It is easy to see that the linear combination (5.62) is equal to the zero
column-vector if and only if µ_{r+1} = µ_{r+2} = ... = µ_{n−1} = µₙ = 0. Hence
it follows that the zero (trivial) solution of the system (5.60) is equal only
to the trivial linear combination of the solutions Γ₁, Γ₂, ..., Γ_{n−r}. ►
The properties (c′) and (d′) imply that given arbitrary scalars µ_{r+1},
µ_{r+2}, ..., µₙ, the linear combination (5.62) is a solution of the system
(5.60).
◄ Let
Γ = (γ₁, γ₂, ..., γᵣ, µ_{r+1}, µ_{r+2}, ..., µₙ)     (5.63)
(written as a column-vector) be an arbitrary solution of (5.60). Then (5.63)
may be represented as a linear combination of the form (5.62).
Multiplying the solutions Γ₁, Γ₂, ..., Γ_{n−r} by µ_{r+1}, µ_{r+2}, ..., µₙ,
respectively, and adding the multiples obtained, we get a solution of (5.60)
in the form of the linear combination (5.62).
Comparing (5.62) and (5.63), we easily see that both comprise the same
values of the indeterminate unknowns µ_{r+1}, µ_{r+2}, ..., µₙ. Observe that
the values of the indeterminate unknowns uniquely determine the values
of the principal unknowns. Hence the solutions (5.62) and (5.63) are identical
and
Γ = µ_{r+1}Γ₁ + µ_{r+2}Γ₂ + ... + µₙΓ_{n−r}.     (5.64)
Therefore the solutions Γ₁, Γ₂, ..., Γ_{n−r} of the homogeneous system
(5.58) are such that (a) they are linearly independent and (b) any solution
of (5.58) may be represented as a linear combination of Γ₁, Γ₂, ..., Γ_{n−r}.
Definition. Any collection of n - r solutions of the homogeneous sys-
tem (5.58) that satisfy the conditions (a) and (b) given above is called the
fundamental system of solutions of (5.58).
Example. Compute the solutions of the system
3x₁ − 2x₂ + 5x₃ + 4x₄ = 0,
6x₁ − 4x₂ + 4x₃ + 3x₄ = 0,
9x₁ − 6x₂ + 3x₃ + 2x₄ = 0.
◄ By the method of Gaussian elimination we have
3x₁ − 2x₂ + 5x₃ + 4x₄ = 0,
          − 6x₃ − 5x₄ = 0.
Let us choose x₂ and x₄ as indeterminate unknowns and arrange the
table

        x₁       x₂     x₃      x₄
1      2/3       1      0       0
2     1/18       0    −5/6      1

The fundamental system comprises the solutions (written as column-vectors)
Γ₁ = (2/3, 1, 0, 0)  and  Γ₂ = (1/18, 0, −5/6, 1).

Any solution of the given linear system can be expressed as
Γ = µΓ₁ + νΓ₂ = µ(2/3, 1, 0, 0) + ν(1/18, 0, −5/6, 1),
where µ and ν are arbitrary numbers. ►
Summing up, it is important to mention that it suffices to find the fun-
damental system of solutions to describe the set of solutions of the
homogeneous linear system since the set of all linear combinations of the
fundamental solutions forms the desired set. It is also easy to see that table
(5.61) represents the identity matrix of order n - r. However, this is one
of the many ways to identify values of the indeterminate unknowns. Table
(5.61) may contain any other nonsingular matrix of order n - r. Finally,
we emphasize that any homogeneous linear system having nontrivial solu-
tions possesses the fundamental system of solutions.
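A fundamental system of solutions can also be obtained with a computer algebra system. The following sketch (assuming the SymPy library; nullspace is its standard routine returning a basis of the solution set of AX = 0) reproduces, for the example above, exactly the solutions Γ₁ and Γ₂ found by hand:

from sympy import Matrix

A = Matrix([[3, -2, 5, 4],
            [6, -4, 4, 3],
            [9, -6, 3, 2]])
# Each element of A.nullspace() is one basic (fundamental) solution.
for v in A.nullspace():
    print(v.T)
# prints the columns (2/3, 1, 0, 0) and (1/18, 0, -5/6, 1),
# i.e. the vectors Γ1 and Γ2 obtained in the example.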
Methods of solving linear systems. In general the methods used to com-
pute the solutions of linear systems can be classified as direct and indirect
(or iterative).
Direct methods are based upon Gaussian elimination with some inessen-
tial modifications aimed at diminishing the number of operations required
so that it remains proportional to n 3 • When input data, i.e., the coefficient
matrix A and the constant terms in (5.52), are defined exactly and the com-
putations are free from rounding-off errors, the direct method yields the
true solution as a result of finite sequence of steps or operations.
Indirect methods provide the approximate solutions by giving sequences
of approximate solutions that start from some initial approximation. The
complete cycle of computing the new approximate solution from the origi-
nal one is called the iteration. So the indirect methods are often called the
iterative ones. They involve a finite number of iterations to reach the ap-
proximate solutions with given precision.
In what follows we consider two methods, one direct and the other in-
direct, which are of practical interest in a variety of applications.
Gaussian elimination for linear systems with tridiagonal matrices. Of
practical interest are linear systems with tridiagonal coefficient matrices
which are square matrices A = (au) whose nonzero elements are adjacent
to their principal diagonals so that a;j = 0 if I i - j I > 1.
Let
      b₁  c₁  0   0  ...  0     0     0
      a₂  b₂  c₂  0  ...  0     0     0
A =   0   a₃  b₃  c₃ ...  0     0     0
      .........................................
      0   0   0   0  ... a_{n−1} b_{n−1} c_{n−1}
      0   0   0   0  ...  0     aₙ    bₙ

be a tridiagonal coefficient matrix. Then the corresponding linear system


takes the form
b₁x₁ + c₁x₂                               = d₁,
a₂x₁ + b₂x₂ + c₂x₃                        = d₂,
..................................................
        aᵢx_{i−1} + bᵢxᵢ + cᵢx_{i+1}      = dᵢ,
..................................................
                    aₙx_{n−1} + bₙxₙ      = dₙ,
where i = 2, 3, ..., n − 1, or, more briefly,
aᵢx_{i−1} + bᵢxᵢ + cᵢx_{i+1} = dᵢ,     (5.65)
where i = 1, 2, ..., n, a₁ = 0, cₙ = 0.
Let us apply the method of Gaussian elimination to (5.65).
We assume that the unknowns xᵢ are related as
xᵢ = α_{i+1}x_{i+1} + β_{i+1}  (i = 1, 2, ..., n),     (5.66)
where α_{i+1} and β_{i+1} are as yet unknown quantities.
Diminishing all the subscripts in (5.66) by unity and substituting the
obtained relations into (5.65), we get
aᵢ(αᵢxᵢ + βᵢ) + bᵢxᵢ + cᵢx_{i+1} = dᵢ.
Whence
xᵢ = −(cᵢ/(bᵢ + aᵢαᵢ))x_{i+1} − (aᵢβᵢ − dᵢ)/(bᵢ + aᵢαᵢ).
This expression is identical to (5.66) if
α_{i+1} = −cᵢ/(bᵢ + aᵢαᵢ)  and  β_{i+1} = −(aᵢβᵢ − dᵢ)/(bᵢ + aᵢαᵢ).
To start the computations we also need to know the values of α₁, β₁ and
x_{n+1}. These are usually set equal to
α₁ = 0, β₁ = 0, x_{n+1} = 0.
Collecting all the formulas together, we may describe the method as
consisting of
(a) the elimination stage
α_{i+1} = −cᵢ/(bᵢ + aᵢαᵢ),  i = 1, 2, ..., n;  α₁ = 0,
β_{i+1} = −(aᵢβᵢ − dᵢ)/(bᵢ + aᵢαᵢ),  i = 1, 2, ..., n;  β₁ = 0,

(b) the back-substitution stage
xᵢ = α_{i+1}x_{i+1} + β_{i+1},  i = n, n − 1, ..., 1,  x_{n+1} = 0.
Therefore on completing the elimination stage we obtain the values of
the quantities α_{i+1} and β_{i+1} (i = 1, 2, ..., n) and, consequently, the value
of the unknown xₙ. Then substituting xₙ into the expression relating xₙ
and x_{n−1}, we get the value of x_{n−1}. Substituting x_{n−1} into the relation
which comprises x_{n−1} and x_{n−2}, we get the value of x_{n−2}. Continuing this
process, we finally compute the values of all the unknowns x₁, x₂, ..., xₙ.
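A compact Python sketch of the two stages (the function name thomas and the 0-based index shift are our own choices; the arrays must satisfy a[0] = 0 and c[n-1] = 0, matching a₁ = 0 and cₙ = 0 above):

def thomas(a, b, c, d):
    """Solve a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i for a tridiagonal system."""
    n = len(b)
    alpha = [0.0] * (n + 1)      # alpha[0] plays the role of alpha_1 = 0
    beta = [0.0] * (n + 1)       # beta[0]  plays the role of beta_1  = 0
    # elimination stage
    for i in range(n):
        denom = b[i] + a[i] * alpha[i]
        alpha[i + 1] = -c[i] / denom
        beta[i + 1] = (d[i] - a[i] * beta[i]) / denom
    # back-substitution stage: x_i = alpha_{i+1} x_{i+1} + beta_{i+1}, x_{n+1} = 0
    x = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        x[i] = alpha[i + 1] * x[i + 1] + beta[i + 1]
    return x[:n]

# 2x1 - x2 = 1, -x1 + 2x2 - x3 = 0, -x2 + 2x3 = 1  ->  x = (1, 1, 1)
print(thomas([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1]))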
Method of simple iteration. Suppose that we are given a linear system
(5.52) and start the computations from some initial approximate solution
Yo.
Let Y_k and Y_{k+1} be two approximate solutions obtained on completing
the kth and (k + 1)th iterations, respectively. Suppose that Y_k and Y_{k+1}
are related as
(Y_{k+1} − Y_k)/τ + AY_k = b  (k = 0, 1, 2, ...)     (5.67)
or
Y_{k+1} = Y_k − τ(AY_k − b),     (5.68)
where τ (τ > 0) is a real number called the stationary parameter.
The procedure which employs the recurrence relations (5.67) and (5.68)
to compute the approximate solution of the system (5.52) is called the
method of simple iteration.
In terms of the unknowns, the relation (5.68) is written as
y_i^{(k+1)} = y_i^{(k)} − τ(Σ_{j=1}^n α_{ij}y_j^{(k)} − β_i)  (i = 1, 2, ..., n).     (5.69)

The precision of the method is given by the value of the error
Z_k = Y_k − X,
i.e., the value of the difference between the solution of (5.68) and the true
solution X of the system (5.52).
Let ε (ε > 0) be a given relative error used to estimate the accuracy of
the approximate solution Y_k. The process of computation terminates
when the following condition is satisfied:
|Z_k| = (Σ_{i=1}^n (z_k^{(i)})²)^{1/2} ≤ ε (Σ_{i=1}^n (z_0^{(i)})²)^{1/2} = ε|Z₀|.

The iterative method is said to converge if


lim IZkl = 0.
k-+oo


Suppose that the coefficient matrix A = (α_{ij}) of (5.52) is symmetric and
all the eigenvalues of A are positive, i.e., 0 < λ₁ ≤ λ₂ ≤ ... ≤ λₙ (see Chapter 6).
Theorem 5.10. Whatever the initial approximation Y₀, the method of
simple iteration converges if the stationary parameter τ satisfies the
condition
τ < 2/λₙ.     (5.70)

Example. Given the initial approximation
Y₀ = (0, 0)
and the relative error ε = 0.3, use the method of simple iteration to compute
the solution of the linear system
2x₁ + x₂ = 1,
x₁ + 2x₂ = 2,
or, in matrix form, AX = b with
A = (2  1; 1  2),  b = (1, 2).
◄ Let us compute the eigenvalues of the symmetric matrix A. We have
| 2 − λ    1   |
|   1    2 − λ |  = (λ − 2)² − 1 = 0.
Whence λ₁ = 1 and λ₂ = 3.
Put
τ = 1/2 = 0.5.
Thus condition (5.70) is satisfied. Then from (5.68) we have
Y₁ = Y₀ − 0.5(AY₀ − b) = (0, 0) + 0.5(1, 2) = (0.5, 1).
Whence
Y₂ = Y₁ − 0.5(AY₁ − b) = (0.5, 1) − 0.5[(2, 2.5) − (1, 2)] = (0.5 − 0.5, 1 − 0.25) = (0, 0.75).
Let us estimate the precision of the approximate solutions. Since
X = (0, 1)
is the true solution of the system, then
|Z₀| = |Y₀ − X| = √(0² + 1²) = 1,
|Z₁| = |Y₁ − X| = √(0.5)² = 0.5,
|Z₂| = |Y₂ − X| = √(0.25)² = 0.25.
Whence we infer that the approximate solution
Y₂ = (0, 0.75)
obtained after the second iteration meets the accuracy required, since
|Z₂| = 0.25 ≤ ε|Z₀| = 0.3. ►
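The iteration (5.68) for this example takes only a few lines to reproduce (a NumPy sketch with our own variable names); with τ = 0.5 and Y₀ = (0, 0) it prints exactly the iterates obtained above:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0])
tau = 0.5

y = np.zeros(2)                      # Y0 = (0, 0)
for k in range(1, 3):
    y = y - tau * (A @ y - b)        # formula (5.68)
    print(k, y)
# 1 [0.5 1. ]
# 2 [0.   0.75]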

Exercises
1. Multiply the matrix A by the matrix B:

(a) A= (~ g), B = (~ g): (b) A = (~ g), B = (~ g).


1 a C)
2. Multiply the matrix ( 0 1 b by itself.
0 0 1
3. Compute the product AB and BA for A= (2 -3 0) and

B= (D·
1 1 1 1
1 2 1 2
4. Reduce the matrices to schematic forms: (a) 1 1 3 1
1 2 1 4
1 2 3 4
3 4 5 6
(b)
5 6 7 8
31 23 55 42

0 1 2 3
1 0 1 2
5. Compute the values of the determinants: (a) 2 1 0 1
3 2 1 0
1 2 3 4 1 2 3 4
3 6 8 11 3 4 5 6
(b) (c)
7 13 20 26 5 6 7 9
31 23 55 42 31 23 55 42

6. Invert the matrices: (a) (


-8
5 -4)
6 ' (b) (-i -1) -3
7 2
2 -4
'

0 1 1 1
-1 0 1 1
(c) -1 -1 1
0
-1 -1 -1 0

7. Find the ranks of: (a)


1
4
7
10
2
5
8
11
3
6
9
12
(b) c
2 45 7
8 11 ;
3 6 9 12
10)
1 -1 0 0
0 1 -1 0
(c)
0 0 1 -1
-1 0 0 1
X1 + X2 - X3 = 1,
8. Compute the solutions of the linear systems: (a) ( X1 - X2 + X3 = 1,
- X1 + X2 - X3 = 1;
7x1 - 5x2 - 2x3 - 4x4 = 8,
X1 - X2 + 2x3 = 2, - 3x1 + 2x2 + X3 + 2x4 = - 3,
(b) ( X1 + 2x2 + X3 = -2, (c)
2X1 -=- X2 - X3 - 2x4 = 1,
2x1 - X2 - X3 = - 2; -X1 + X3 + 2X4 = 1;

- 3X1 + X2 + X3 = 0,
(d) ( 5x1 + x2 - 2x3 = 2,
-2X1 - 2x2 + X3 = -3.

3x1 + 2x2 + X3 = 0,
9. Compute the solutions of the systems: (a) ( 7x1 + 6x2 + 5x3 = 0,
5x1 + 4x2 + 3x3 = 0;

- X1 + X3 + 2x4 = 0, x1 + 4x2 + 2x3 - 3xs = 0,


X2 + X3 + 2x4 = 0, (c) [ 2x1 + 9x2 + 5x3 + 2x4 + xs = 0,
(b)
7Xi - 5x2 - 2x3 - 4x4 = 0, xi + 3x2 + X3 - 2x.i - 9xs = 0.
4x1 - 3x2 - X3 - 2x4 = O;

Answers
1 2a ab+
2. ( 0 1 2b
2c) .
0 0 1

3. AB = (- 1), BA =( i -12 0)
-9 0 .
-3 0

1 1 1 1 1 2 3 4
0 1 0 1 0 -2 -4 -6
4. (a) 00 2 0 (b) 0 0 40 35
00 0 2 0 0 0 0

5. (a) -12; (b) 5; (c) 80. 6. (a) - - 1(6 4)


2 8 5
; (b) (32
2
25
14
-1)
1 0
11 -1
;

0 -1 1 -1
1 0 -1 1
(c) -1 1 0 -1
1 -1 1 0

7. (a) 2; (b) 2; (c) 3; 8. (a) x1 = 1, xz = 1, X3 = 1; (b) x1 = -1, xz = -1, X3 = 1;


(c) x1 = -1 - 'Yl + 2-y4, x2 = - 3 + ')'3 + 2-y4, X3 = ')'3, X4 = 'Y4; (d) the system is inconsis-
tent. 9. (a) X1 = X3, X2 = -2x3; (b) X1 = X2 = X3 + 2,X4; or

1
1
or =A 1 +µ
0

X1
X2
(c) X3 =A +µ
X4
Xs
Chapter 6
Linear Spaces and Linear Operators

6.1. The Concept of Linear Space


A set V of mathematical objects x, y, z, ... , called the elements of
V, is said to form a linear real or complex space if (1) given two arbitrary
elements x and y in V there exists an element x + y in V called the sum
of x and y, and (2) given an arbitrary element x in V and an arbitrary
real or complex α there exists an element αx in V called the multiple of
x by α, and the induced operations of addition and multiplication by a
scalar satisfy the following eight conditions
(a) (x + y) + z = x + (y + z);
(b) x + y = y + x;
(c) there exists an element θ in V such that for any x in V there holds x + θ = x;
(d) for any x in V there exists an element (−x) in V such that x + (−x) = θ;
(e) α(x + y) = αx + αy;
(f) (α + β)x = αx + βx;
(g) α(βx) = (αβ)x;
(h) 1·x = x.
The elements x, y, z, ... of the linear space V are often called the vectors,
the element θ is called the zero vector and the element (−x) is called the
additive inverse of the vector x. The linear space is also called the vector
space.
Examples of linear spaces. The following sets of mathematical objects
form linear spaces:
(1) The set V 3 of geometric free vectors with the induced operations
of vector addition and multiplication of a vector by a scalar described in
Chap. 2 (Fig. 6.1) as well as the set V 1 of vectors in a line and the set
V 2 of vectors in a plane.
(2) The set of sequences (ξ¹, ξ², ..., ξⁿ) of n real numbers. The operations
of addition and multiplication by a real number are induced as
(a) Addition
(ξ¹, ξ², ..., ξⁿ) + (η¹, η², ..., ηⁿ) = (ξ¹ + η¹, ξ² + η², ..., ξⁿ + ηⁿ);
(b) Multiplication by a real number
λ(ξ¹, ξ², ..., ξⁿ) = (λξ¹, λξ², ..., λξⁿ).
In symbols ℝⁿ designates the n-dimensional real coordinate space
(Fig. 6.2).

Fig. 6.1 (a), (b)
Fig. 6.2 (a), (b)

(3) The set ℝ^{m×n} of m × n matrices, discussed in Chap. 5, with the
operations of matrix addition

α₁₁  ...  α₁ₙ     β₁₁  ...  β₁ₙ     α₁₁+β₁₁  ...  α₁ₙ+β₁ₙ
α₂₁  ...  α₂ₙ  +  β₂₁  ...  β₂ₙ  =  α₂₁+β₂₁  ...  α₂ₙ+β₂ₙ
.............     .............     .......................
α_{m1} ... α_{mn}   β_{m1} ... β_{mn}   α_{m1}+β_{m1} ... α_{mn}+β_{mn}

and multiplication of a matrix by a real number

    α₁₁  ...  α₁ₙ     λα₁₁  ...  λα₁ₙ
λ   α₂₁  ...  α₂ₙ  =  λα₂₁  ...  λα₂ₙ
    .............     ................
    α_{m1} ... α_{mn}   λα_{m1} ... λα_{mn}

In particular, the set ℝ^{1×n} of row-vectors, i.e., 1 × n matrices, and the
set ℝ^{m×1} of column-vectors, i.e., m × 1 matrices, form linear spaces.
(4) The set C(O, 1) of real functions continuous on the interval (0, 1)
with the usual operations of addition of functions and multiplication of
a function by a real number.
In all these cases the validity of the conditions (a)-(h) are directly
verified.
Properties of linear spaces. (a) The zero vector θ is uniquely defined.
◄ Let θ₁ and θ₂ be zero vectors in V and consider the sum θ₁ + θ₂. Since
θ₂ is a zero vector, condition (c) implies that θ₁ + θ₂ = θ₁. The vector θ₁
is also a zero vector, so θ₁ + θ₂ = θ₂ + θ₁ = θ₂. Hence θ₁ = θ₂. ►
(b) For any vector x the additive inverse vector (−x) of x is uniquely
defined.
◄ Let x′ and x″ be two vectors inverse to x. Let us show that x′ and x″
are equal.
Consider the sum x″ + x + x′. Since x′ is inverse to the vector x,
condition (a) implies that
x″ + x + x′ = x″ + (x + x′) = x″ + θ = x″.
Analogously, we have
x″ + x + x′ = (x″ + x) + x′ = θ + x′ = x′.
Hence x′ = x″. ►
It is easy to verify the following properties:
(c) for any vector x there holds 0·x = θ;
(d) for any vector x there holds −x = (−1)x;
(e) for any real number α there holds αθ = θ;
(f) the identity αx = θ implies that either α = 0 or x = θ.

6.2 Linear Subspaces


A non-empty subset W of a linear space V is said to form a linear
subspace in V if given arbitrary vectors x and y in W and any number
a, the following conditions are satisfied:
{i) x + y is an element of W and (ii) ax is an element of W. The set
W is said to be closed with respect to the operations (i) and (ii).
Examples of linear subspaces. (1) The set V2 of vectors in a plane forms
a subspace of the linear space V 3 •

(2) The set of solutions of the homogeneous system of m linear equations in
n unknowns

α₁₁   ...  α₁ₙ     x₁     0
α₂₁   ...  α₂ₙ  ·  x₂  =  0     (*)
..............     ...    ...
α_{m1} ... α_{mn}    xₙ     0

or, in matrix form,
AX = 0,
forms a linear subspace.
◄ In Chap. 5 we have shown that the sum of solutions of (*) and
the multiple of a solution of (*) by a number are also solutions of
(*). Whence we infer that the set of solutions of (*) forms a linear
subspace of the linear space ℝ^{n×1}. ►
(3) The set of all functions f(x) such that each f(x) is equal to zero
at x = 0, i.e., f(0) = 0, forms a subspace in the linear space C(−1, 1)
of real-valued functions continuous on the interval (−1, 1).
◄ It is easy to verify that the set
C₀(−1, 1) = {f(x) | f(0) = 0}
is a linear subspace since both the sum f(x) + g(x) and the multiple αf(x)
are equal to zero at x = 0 when f(0) = g(0) = 0. ►
From the definition of linear subspace it follows that if x1, x2, ... , Xq
are vectors in W so is any linear combination a1x1 + a2X2 + ... + aqXq,
Also, for any linear subspace W the conditions (a)-(h) are satisfied. This
means that any linear subspace of linear space is itself a linear space. It
suffices to show that both the zero vector and the additive vector inverse
to an arbitrary vector in W belong to W.
◄ Let x be an arbitrary vector in W. Then multiplying x by O and -1,
we get the zero vector (} = Ox and the additive vector - x = ( - l)x inverse
to x. Obviously, the other conditions are also satisfied. ►
Sums and intersections of linear subspaces. Let V be a linear space and
let W 1 and W2 be linear subspaces in V.
The sum W 1 + W2 of the linear subspaces W 1 and W 2 is the set of
all vectors x in V such that
(6.1)

where x1 is a vector in W 1 and x2 is a vector in W 2.


In symbols, we write the sum of W 1 and W2 as
W1+W2= {x=x1+x2lx1EW1,x2EW2}.
We show that the sum W 1 + W 2 is a linear subspace in V.

◄ Let x and y be two arbitrary vectors in W 1 + W 2. By definition of


the sum of linear spaces there exist x1 and Y1 in W 1, and x2 and Y2 in
W2 such that
x = x1 + x2 and y = Yi + Y2-

Whence the sum x + y may be written as


X+ Y = (x1 + X2) + (Y1 + Y2) = (x1 + Y1) + (x2 + Y2).
Since x1 + y 1 belongs to W 1 and x2 + Y2 belongs to W 2 the sum x + y
is a vector in W 1 + W2.
Similarly, we can prove that ax is a vector in W 1 + W 2. ►
The sum of the linear spaces W 1 and W2 is said to be direct if the expan-
sion (6.1) is unique for any vector x in W 1 + W2 (Fig. 6.3).

Fig. 6.3

In symbols, we write W 1 EB W2 to designate the direct sum of W 1 and


W2.
The intersectio'n W 1 n W 2 of the linear subspaces W 1 and W 2 of the
linear space Vis the set of all vectors x that belong both to W 1 and to W2.
Properties of sums and intersections of linear subspaces. (a) The inter-
section W 1 n W 2 is non-empty since W 1 n W2 always contains the zero
vector.
(b) The intersection W 1 n W2 is a linear subspace of the linear space V.
(c) If the zero vector is the only vector contained both in W1 and in
W2, the sum of W1 and W2 is direct (denoted by W1 EB W2).

Linear span. Let V be a linear space and X be a subset of V. The set
L(X) = { y = Σ_{j=1}^q αⱼxⱼ | xⱼ ∈ X, αⱼ ∈ ℝ, q = 1, 2, ... }
of all linear combinations of vectors in X is called the linear span L(X)
of X. In other words, the linear span L(X) of X is the set of all vectors
y that may be represented as linear combinations of vectors in X.

Properties of linear spans. (a) The linear span L(X) contains the set X.
(b) The linear span L(X) forms a linear subspace in V.
◄ This follows from the fact that the sum of linear combinations of vec-
tors in X and the multiples of linear combinations by an arbitrary number
are linear combinations of vectors in X. ►
(c) The linear span L(X) is the smallest linear subspace containing the
set X.
In other words, if the linear subspace W contains the set X, then W
also contains the linear span L(X) of X.
◄ Indeed, an arbitrary linear combination a1x1 + a2X2 + ... + aqXq of
vectors in X, being an element of L(X), is also contained in W. ►


Fig. 6.4

Examples. (1) Let ξ = (1, 1, 0) and η = (1, 0, 1) be two vectors in the
linear space ℝ³. Then the set of solutions of the equation
ξ¹ − ξ² − ξ³ = 0     (6.2)
is the linear span L(ξ, η) of the vectors ξ and η.
◄ Indeed, the triples (1, 1, 0) and (1, 0, 1) form the fundamental system
of solutions of the homogeneous equation (6.2). Hence each solution of
(6.2) is a linear combination of the fundamental solutions (Fig. 6.4). ►
(2) Let C(- oo, oo) be a linear space of real-valued functions continuous

at every point of the number line and let


X = {1, x, ..., xⁿ}
be the set of monomials 1, x, ..., xⁿ.
Then the set of polynomials of degree not exceeding n with real coefficients
forms the linear span L(X) of X.
In symbols, L(1, x, ..., xⁿ) designates the linear span of the set of
monomials 1, x, ..., xⁿ, and Mₙ stands for the set of polynomials of degree
not exceeding n with real coefficients.
I

6.3 Linearly Dependent Vectors


Definition. The vectors x₁, x₂, ..., x_q in the linear space V are called
linearly dependent if there exist numbers α₁, α₂, ..., α_q, not all zero, such
that
α₁x₁ + α₂x₂ + ... + α_qx_q = θ.     (6.3)
If the equality (6.3) is satisfied by α₁ = α₂ = ... = α_q = 0 only, the
vectors x₁, x₂, ..., x_q are called linearly independent.
The following propositions are valid.
Theorem 6.1. The vectors x1, x2, ... , Xq (q ~ 2) are linearly dependent
if and only if at least one of these vectors may be represented as a linear
combination of the others.
◄ Assume that the vectors x₁, x₂, ..., x_q are linearly dependent. For
definiteness we set α_q ≠ 0 in (6.3). Transposing to the right all the terms
in (6.3) except the qth one and dividing all the terms by α_q ≠ 0, we have
x_q = −(α₁/α_q)x₁ − (α₂/α_q)x₂ − ... − (α_{q−1}/α_q)x_{q−1},
i.e., the vector x_q is a linear combination of the vectors x₁, x₂, ..., x_{q−1}.
Conversely, if one of the vectors, say x_q, is equal to a linear combination
of the others,
β₁x₁ + β₂x₂ + ... + β_{q−1}x_{q−1} = x_q,
then transposing x_q to the left, we obtain the linear combination
β₁x₁ + β₂x₂ + ... + β_{q−1}x_{q−1} + (−1)x_q = θ,
which contains at least one nonzero coefficient (−1 ≠ 0). This means that
the vectors x₁, x₂, ..., x_q are linearly dependent. ►
Theorem 6.2. Let x1 , x2, ... , Xq be linearly independent vectors and let
y = a1x1 + a2X2 + ... + aqXq. Then the coefficients a1, a2, ... , aq are
uniquely defined by y.

◄ Let
y = α₁x₁ + α₂x₂ + ... + α_qx_q = β₁x₁ + β₂x₂ + ... + β_qx_q.
Whence
(α₁ − β₁)x₁ + (α₂ − β₂)x₂ + ... + (α_q − β_q)x_q = θ.
Since the vectors x₁, x₂, ..., x_q are linearly independent,
α₁ − β₁ = α₂ − β₂ = ... = α_q − β_q = 0.
Hence α₁ = β₁, α₂ = β₂, ..., α_q = β_q. ►
Theorem 6.3. A collection of vectors containing linearly dependent vec-
tors is linearly dependent.

Fig. 6.5

◄ Let X1, X2, ... , Xq, Xq + 1, ... , Xm be a collection of vectors such that the
first q vectors are linearly dependent. Then
a1X1 + a2,X2 + ... + aqXq =8
and not all a1, a2, ... , aq are equal to zero. Adding to this linear combina-
tion the multiples of Xq + 1, Xq + 2, ... , Xm by zero, we obtain the linear combi-
nation
a1X1 + a2,X2 + ... + aqXq + 0Xq+ 1 + ... + Oxm = 8,
where not all a; are equal to zero. ►
Example. The vectors in V3 are linearly dependent if and only if they
are coplanar (Fig. 6.5).

6.4 Basis and Dimension


The sequence of vectors e1, e2, ... , en in a linear space V is called
the basis of V if e1, e2, ... , en are linearly independent and any vector in
V may be expressed as a linear combination of e1, e2, ... , en.

We speak of the sequence of e1, e2, ... , en to mean an ordered n-tuple


of vectors. It is easy to see that by permutation of n vectors we obtain
n ! ordered sequences (collections) of n vectors.
Example. Let a, b, c be three noncoplanar vectors in V3 (Fig. 6.6). Then
the ordered triples (a, b, c), (b, c, a), (c, a, b), (b, a, c), (a, c, b) and
(c, b, a) form distinct bases in V 3.
Let e = (e₁, e₂, ..., eₙ) be a basis in V. Then, given an arbitrary vector
x in V, there exists a collection of n numbers ξ¹, ξ², ..., ξⁿ such that
x = ξ¹e₁ + ξ²e₂ + ... + ξⁿeₙ = Σ_{i=1}^n ξⁱeᵢ.

Fig. 6.6

By Theorem 6.2 the numbers ξ¹, ξ², ..., ξⁿ, called the coordinates of
the vector x relative to the basis e, are uniquely defined.
Let x = Σ_{i=1}^n ξⁱeᵢ and y = Σ_{i=1}^n ηⁱeᵢ be vectors in V. Then
x + y = Σ_{i=1}^n ξⁱeᵢ + Σ_{i=1}^n ηⁱeᵢ = Σ_{i=1}^n (ξⁱ + ηⁱ)eᵢ
and, for any number α,
αx = α Σ_{i=1}^n ξⁱeᵢ = Σ_{i=1}^n (αξⁱ)eᵢ.
Thus, addition of vectors becomes addition of their respective coordinates,
and multiplication of a vector by a scalar becomes multiplication of the
coordinates by this scalar.
The coordinates of a vector may conveniently be written in the form
of a column-vector; for example, the column with the components
ξ¹, ξ², ..., ξⁿ
is the column-vector of the coordinates of the vector x = Σ_{i=1}^n ξⁱeᵢ relative
to the basis e.
Let
x₁ = Σ_{i=1}^n ξ₁ⁱeᵢ,  x₂ = Σ_{i=1}^n ξ₂ⁱeᵢ,  ...,  x_q = Σ_{i=1}^n ξ_qⁱeᵢ
be expansions of the vectors x₁, x₂, ..., x_q relative to the basis e and let
(ξ₁¹, ξ₁², ..., ξ₁ⁿ),  (ξ₂¹, ξ₂², ..., ξ₂ⁿ),  ...,  (ξ_q¹, ξ_q², ..., ξ_qⁿ)
be the column-vectors of the coordinates of x₁, x₂, ..., x_q relative to e.


Theorem 6.4. The vectors x1, x2, ... , Xq are linearly dependent if and
only if so are the column-vectors of their coordinates relative to some basis
e.
◄ Assume that the linear combination λ₁x₁ + λ₂x₂ + ... + λ_qx_q of the
vectors x₁, x₂, ..., x_q is equal to the zero vector, i.e.,
Σ_{k=1}^q λ_kx_k = θ
or, in explicit form,
Σ_{k=1}^q λ_k (Σ_{i=1}^n ξ_kⁱeᵢ) = Σ_{i=1}^n (Σ_{k=1}^q λ_kξ_kⁱ) eᵢ = θ.     (6.4)
Recall that the expansion of a vector relative to a given basis is unique.
Then from (6.4) we have
Σ_{k=1}^q λ_kξ_k¹ = 0, ..., Σ_{k=1}^q λ_kξ_kⁿ = 0
or, equivalently,
λ₁(ξ₁¹, ξ₁², ..., ξ₁ⁿ) + λ₂(ξ₂¹, ξ₂², ..., ξ₂ⁿ) + ... + λ_q(ξ_q¹, ξ_q², ..., ξ_qⁿ) = (0, 0, ..., 0),     (6.5)
the n-tuples being understood as column-vectors.
This means that the linear combination of the column-vectors of the
coordinates of x₁, x₂, ..., x_q is equal to the zero column-vector.
If we suppose that (6.5) holds, then reversing our arguments, we arrive
at formula (6.4). Whence we infer that if a nontrivial linear combination
of x1, x2, ... , Xq (i.e. a linear combination with A1, A2, ... , Aq not all equal
to zero) vanishes, then a nontrivial combination of column-vectors of the

coordinates of x1, x2, ... , Xq with the same numbers /\1, A2, ... , l\q is equal
to the zero column-vector and vice versa. ►
Theorem 6.5. Let a basis in V comprise n vectors. Then any system
of m (m > n) vectors in V is linearly dependent.
◄ By Theorem 6.3 it suffices to consider the case of m = n + 1.
Let x₁, x₂, ..., x_{n+1} be arbitrary vectors in V. Expanding these vectors
relative to the basis e = (e₁, e₂, ..., eₙ), we have
x₁ = ξ₁¹e₁ + ξ₁²e₂ + ... + ξ₁ⁿeₙ,
x₂ = ξ₂¹e₁ + ξ₂²e₂ + ... + ξ₂ⁿeₙ,
.........................................
x_{n+1} = ξ_{n+1}¹e₁ + ξ_{n+1}²e₂ + ... + ξ_{n+1}ⁿeₙ.
Arranging the coordinates of x₁, x₂, ..., x_{n+1} in matrix form, we obtain
the n × (n + 1) matrix

      ξ₁¹  ξ₂¹  ...  ξ_{n+1}¹
K =   ξ₁²  ξ₂²  ...  ξ_{n+1}²
      ........................
      ξ₁ⁿ  ξ₂ⁿ  ...  ξ_{n+1}ⁿ

where the jth column comprises the coordinates of xⱼ (j = 1, 2, ..., n + 1).
Since the rank of K does not exceed the number n of rows, the (n + 1)
columns in K are linearly dependent, being at the same time
the column-vectors of the coordinates of x₁, x₂, ..., x_{n+1}. Then by Theorem
6.4 the linear dependence of these column-vectors implies that the vectors
x₁, x₂, ..., x_{n+1} are linearly dependent. ►
Corollary. All bases in V comprise equal numbers of vectors.
◄ Let e be a basis comprising n vectors and let e' be a basis comprising
n ' vectors. By Theorem 6.5 the linear independence of e{ , ei , ... , e~ implies
that n ' ~ n. On the other hand, by Theorem 6.5 the linear independence
of e1, e2, ... , en implies that n ~ n' Thus we have n = n'. ►
The dimension of a linear space V is the number of vectors contained
in the basis of V.
Examples. (1) The basis of the real coordinate space IRn is formed by
the vectors
e1 = (1, 0, ... , 0, 0), e2 = (0, 1, ... , o, 0), ... , en = (0, o, ... , o, 1).
◄ Indeed, the vectors e1, e2, ... , en are linearly independent since the
identity
a1e1 + a2e2 + ... + anen =0
implies that

a1 (1, 0, ... , 0, 0) + a2(0, 1, ... , 0, 0)


+ ... + an(0, 0, ... , 0, 1) = (a1, a2, ... , an) = (0, 0, ... , 0).

Whence a1 = a2 = ... = an = 0.
Besides, any vector~ = (e, e, ..., ~n) in IRn may be expressed as a linear
combination of e1, e2, ... , en, i.e.,
~ = eo, o, ... , o, o> + ~2(0, 1, ... , o, o>
+ ... + ~n(0, 0, ... , 0, 1) = (e, I/, ... , ~n).
This means that the dimension of [Rn is equal to n. ►
(2) Recall that the homogeneous linear system
a11X1 + a12X2 + ... + a1nXn = 0,
a21X1 + a22X2 + ... + a2nXn = 0,

am1X1 + am2X2 + ... + amnXn = 0,


with nontrivial (nonzero) solutions possesses the fundamental system of
solutions which forms a linear space. The dimension of this linear space
of solutions of a homogeneous linear system is equal to the number of
the fundamental solutions, i.e., to n - r, where r is the rank of the coeffi-
cient matrix and n is the number of the unknowns.
(3) The dimension of the linear space Mn of polynomials of degree not
exceeding n is equal to n + 1.
◄ Since any polynomial of a degree not exceeding n takes the form

P(t) = ao + a1t + ... + antn,


it suffices to show that the vectors e1 = 1, e2 = t, ... , en+ 1 = tn are linearly
independent.
Consider the identity
aol + a1t + ... + antn = 0, (6.6)
where t is an arbitrary quantity. Setting t = 0, we have ao = 0.
On differentiating (6.6) with respect to t, we have
a1 + 2a2t + ... + nantn - 1 = 0.
Whence, setting t = 0, we obtain a1 = 0.
Repeating this process, we arrive at the identity
ao = a1 = ... = an = 0.
This means that the vectors e1 = 1, e2 = t, ... , en+ 1 = t,; are linearly
independent. Hence, the dimension of Mn is equal to n + 1. ►

A linear space of dimension n is called the n-dimensional linear space.


In symbols, we write dim V = n.
By way of convention the dimension of the linear space Vis considered
equal to n everywhere through this chapter unless otherwise mentioned.
It is clear that for any linear subspace W in V dim W ~ n.
We show that the linear space V comprises linear subspaces of any
dimension k ≤ n.
◄ Let e = (e₁, e₂, ..., eₙ) be a basis in V. It is easy to verify that the linear
span
L(e₁, e₂, ..., e_k)
is a linear subspace of dimension k. ►


By definition dim ( (J} = 0.
Theorem 6.6. Let a1, a2, ... , ak be linearly independent vectors in a linear
space V and k < n. Then there exist vectors ak + 1, ak + 2, ... , an in V such
that the vectors a1, a2, ... , ak, ak + 1, ... , an form a basis in V.
◄ Let b be an arbitrary vector in V. If the vectors a₁, a₂, ..., a_k, b are
linearly dependent, then
b = λ₁a₁ + λ₂a₂ + ... + λ_ka_k,     (6.7)
because the linear independence of a₁, a₂, ..., a_k implies that the nontrivial
linear combination
λ₁a₁ + λ₂a₂ + ... + λ_ka_k + µb = θ
involves µ ≠ 0.
Suppose that each vector b in V admits a representation of the form
(6.7). In this case the vectors a1, a2, ... , ak form a basis in V by definition.
However, this contradicts to the fact that the number k of the vectors is
smaller than the dimension n of V. This means that there must exist a vector
ak + 1 in V such that the vectors a1, a2, ... , ak, ak + 1 are linearly independent.
If k + 1 = n the vectors a1, a2, ... , ak, ak + 1 form a basis in V.
If k + 1 < n we repeat the previous reasoning for the vectors a1, a2,
... , ak, ak + 1.
This process enables us to complement the collection of the vectors a1,
a2, ... , ak by the vectors ak+ 1, ak+2, ... , an ~o that these n vectors form
a basis in V. In other words, we may always construct a basis of V which
incorporates the basis of a given subspace of V. ►
Example. Construct the basis of IR4 by complementing the collection
of the vectors a1 = (1, 2, 0, 1) and a2 = (-1, 1, 1, 0).
◄ We choose the vectors a₃ = (1, 0, 0, 0) and a₄ = (0, 1, 0, 0) in ℝ⁴. Let
us show that the vectors a₁, a₂, a₃, a₄ form a basis of ℝ⁴.

Consider the matrix

1 2 0 1
-1 1 1 0
A=
1 0 0 0
0 1 0 0

whose rows comprise the coordinates of a₁, a₂, a₃, a₄.
Notice that the rank of A is equal to 4. This means that the rows of
A and, consequently, the vectors a₁, a₂, a₃, a₄ are linearly independent.
Thus they form a basis of ℝ⁴. ►
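As in this example, checking whether a proposed collection of vectors is linearly independent (and hence a basis) reduces to a rank computation, which is easy to do numerically. A small check with NumPy (our own choice of tool):

import numpy as np

A = np.array([[ 1, 2, 0, 1],
              [-1, 1, 1, 0],
              [ 1, 0, 0, 0],
              [ 0, 1, 0, 0]], dtype=float)   # rows are a1, a2, a3, a4
print(np.linalg.matrix_rank(A))              # 4 -> the rows form a basis of R^4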
We proceed similarly in the general case.
Suppose we wish to construct the basis of IRn by complementing the
collection of k linearly independent vectors
a1 = (a11, a12, ... , <Xtn), a2 = (a21, a22, ... , a2n),
... , ak = (<Xk1, <Xk2, ... , <Xkn).

By elementary row operations the matrix

a12 <Xtn
a22 a2n
...............................

can be reduced to the schematic form and then complemented by n - k _


rows of the form
(0, 0, ... , 1, 0, ... , 0)
so that the rank of the matrix obtained becomes equal to n.
Theorem 6.7. Let W1 and W2 be linear subspaces in V. Then
dim (W1 + W2) + dim (W1 nW2) = dim W1 + dim W2.

6.5 Changing a Basis


Let e = (e₁, e₂, ..., eₙ) and e′ = (e′₁, e′₂, ..., e′ₙ) be bases of V. Expanding
the vectors in e′ with respect to the basis e, we have
e′ⱼ = Σ_{i=1}^n αⱼⁱeᵢ,  j = 1, 2, ..., n.     (6.8)
In matrix form, (6.8) becomes
(e′₁, e′₂, ..., e′ₙ) = (e₁, e₂, ..., eₙ)A.     (6.9)
The matrix

      α₁¹  α₂¹  ...  αₙ¹
A =   α₁²  α₂²  ...  αₙ²
      ....................
      α₁ⁿ  α₂ⁿ  ...  αₙⁿ

whose jth column consists of the coordinates of e′ⱼ relative to e,
describes the transition from the basis e to the basis e′ and is called the
transition matrix.
Properties of transition matrices. (a) The determinant of A is not equal
to zero, i.e., det A ≠ 0.
◄ Let us assume the converse, i.e., det A = 0. This means that the columns
in A are linearly dependent. Since the column-vectors in A are the column-vectors
of the coordinates of e′₁, e′₂, ..., e′ₙ relative to the basis e, Theorem 6.4 implies that
e′₁, e′₂, ..., e′ₙ are linearly dependent vectors. However, this contradicts
the fact that e′ is a basis of V. Consequently, the assumption that
det A = 0 is false. Hence det A ≠ 0. ►
(b) If ξ¹, ξ², ..., ξⁿ and ξ′¹, ξ′², ..., ξ′ⁿ are the coordinates of a vector
x relative to the bases e and e′, respectively, then

  ξ¹        ξ′¹
  ξ²  =  A  ξ′²      (6.10)
  ...       ...
  ξⁿ        ξ′ⁿ
◄ Substituting (6.8) into the formula
x = Σ_{i=1}^n ξⁱeᵢ = Σ_{j=1}^n ξ′ʲe′ⱼ,
we obtain
Σ_{i=1}^n ξⁱeᵢ = Σ_{j=1}^n ξ′ʲ(Σ_{i=1}^n αⱼⁱeᵢ) = Σ_{i=1}^n (Σ_{j=1}^n αⱼⁱξ′ʲ)eᵢ.
Since, relative to a given basis, the expansion of a vector is unique, the
above relation gives
ξⁱ = Σ_{j=1}^n αⱼⁱξ′ʲ,  i = 1, 2, ..., n.
Arranging these expressions in matrix form, we easily obtain (6.10). ►


(c) The inverse A - 1 of A is the transition matrix from the basis e' to
the basis e.
◄ To verify this property it suffices to post-multiply (6.9) by A - 1 • ►
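Formula (6.10) and property (c) are easy to verify numerically. A small sketch (NumPy; the particular basis e′ chosen below is our own illustration, not an example from the text): the columns of A hold the coordinates of e′₁ and e′₂ relative to e, and the coordinate columns of a fixed vector are related by ξ = Aξ′ and ξ′ = A⁻¹ξ.

import numpy as np

# e = standard basis of R^2; take e1' = e1 + e2, e2' = e1 - e2 (an illustrative choice)
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])          # columns = coordinates of e1', e2' relative to e

xi_prime = np.array([2.0, 3.0])      # coordinates of some x relative to e'
xi = A @ xi_prime                    # coordinates of the same x relative to e, by (6.10)
print(xi)                            # [ 5. -1.]
print(np.linalg.inv(A) @ xi)         # back to [2. 3.]: A^{-1} is the transition e' -> e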

6.6 Euclidean Spaces


A real linear space V is called a real Euclidean space if, given two
arbitrary vectors x and y in V there exists a number (x, y) (or x•y) such
that for any vectors x, y and z in V and any arbitrary number a the follow-
ing conditions are satisfied:
(a) (x, y) = (y, x);
(b) (x + y, z) = (x, z) + (y, z);
(c) (αx, y) = α(x, y);
(d) (x, x) ≥ 0; the identity holds if and only if x = θ.
The number (x, y) is called the scalar product of vectors x and y.
Examples of Euclidean spaces. (1) The space V₃ of free vectors with
the scalar product of a and b defined as
(a, b) = |a| |b| cos(a, b).
(2) The real coordinate space ℝⁿ with the scalar product of two vectors
ξ = (ξ¹, ξ², ..., ξⁿ) and η = (η¹, η², ..., ηⁿ) in ℝⁿ given by
(ξ, η) = Σ_{i=1}^n ξⁱηⁱ.
(3) A linear subspace of the Euclidean space.
In all these cases the validity of conditions (a)-(d) are directly verified.
Employing the definition of the Euclidean space we can easily prove
the following properties:
(1) (θ, x) = 0;
(2) (x, αy) = α(x, y);
(3) (x, y + z) = (x, y) + (x, z);
(4) (Σ_{i=1}^q αᵢxᵢ, Σ_{j=1}^l βⱼyⱼ) = Σ_{i=1}^q Σ_{j=1}^l αᵢβⱼ(xᵢ, yⱼ).

Theorem 6.8. For any two vectors x and y of the Euclidean space there
holds the inequality
(x, y)2 ~ (x, x)(y, y)
(the Cauchy-Schwarz inequality).
◄ If (x, x) = 0, then x = (J and the inequality holds since (8, y) = 0.
Let (x, x) ¢ 0. Then (x, x) > 0.

By the definition of the scalar product the inequality


( tx - y, tx - y) ~ 0 (6.11)
holds for any x and y in V and for any real number t.
The inequality (6.11) may be written as
t 2 (x, x) - 2t(x, y) + (y, y) ~ 0.
The left side of this inequality can be considered as a quadratic trinomi-
al with respect to t. Since the sign of the trinomial remains unchanged for
any t we infer that the discriminant of the trinomial is nonpositive, i.e.,
(x, y)2 - (x, x)(y, y) ~ 0.
Whence we have the desired inequality. ►
Definition. The length of an arbitrary vector x is the number |x| given by
|x| = √(x, x).
It is easily seen that |x| ≥ 0 for any x, and |x| = 0 only if x = θ.

Fig. 6.7          Fig. 6.8

Consider the following identities
|x + y|² = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = |x|² + 2(x, y) + |y|².     (6.12)
Replacing 2(x, y) by 2|(x, y)| and applying the Cauchy-Schwarz inequality
|(x, y)| ≤ |x| |y|, we have
|x + y|² ≤ |x|² + 2|x| |y| + |y|² = (|x| + |y|)².
Extracting the square root, we arrive at the triangle inequality (Fig. 6.7)
|x + y| ≤ |x| + |y|.
The angle between the nonzero vectors x and y is the number φ such
that 0 ≤ φ ≤ π and
cos φ = (x, y)/(|x| |y|).

By Theorem 6.8 it follows that
−1 ≤ (x, y)/(|x| |y|) ≤ 1
for any nonzero vectors x and y, i.e., the range of cos φ completely
corresponds to its domain. In other words, the angle between any two vectors
is correctly defined.
The vectors x and y are called orthogonal if (x, y) = 0.
From (6.12) it follows that, for orthogonal vectors, there holds the
identity
|x + y|² = |x|² + |y|²,
which may be thought of as a generalization of the theorem of Pythagoras,
namely, the square of the length of the sum of two orthogonal vectors is equal to
the sum of the squares of the lengths of these vectors (Fig. 6.8).
The collection of vectors f₁, f₂, ..., f_k is called orthogonal if (fᵢ, fⱼ) = 0
for i ≠ j, and orthonormal if
(fᵢ, fⱼ) = 1 for i = j  and  (fᵢ, fⱼ) = 0 for i ≠ j.
Theorem 6.9. Any orthonormal collection of vectors f₁, f₂, ..., f_k is
linearly independent.
◄ Computing the scalar products of
α₁f₁ + α₂f₂ + ... + α_kf_k = θ
by fⱼ (j = 1, 2, ..., k), we obtain the k identities
αⱼ(fⱼ, fⱼ) = 0  (j = 1, 2, ..., k).
Since (fⱼ, fⱼ) = 1, then αⱼ = 0 for every j (j = 1, 2, ..., k). ►

6.7 Orthogonalization
Let f1, f2, ... , fk be linearly independent vectors in a real Euclidean
space. We describe the procedure of constructing a collection of k or-
thogonal vectors using f1, f2, ... , fk (Fig. 6.9).
◄ Set g₁ = f₁. For the vector g₂ = f₂ − α₁g₁ to be orthogonal to g₁ it is
necessary that the identity
0 = (f₂, g₁) − α₁(g₁, g₁)
holds. Whence
α₁ = (f₂, g₁)/(g₁, g₁).
Thus the vector
g₂ = f₂ − ((f₂, g₁)/(g₁, g₁)) g₁
is orthogonal to g₁.
Fig. 6.9 (a), (b)
Using g₁, g₂, f₃, we construct the vector g₃ = f₃ − β₁g₁ − β₂g₂ which
is orthogonal both to g₁ and to g₂. It suffices to require that the numbers
β₁ and β₂ satisfy the conditions
0 = (f₃, g₁) − β₁(g₁, g₁)  and  0 = (f₃, g₂) − β₂(g₂, g₂).
Whence
β₁ = (f₃, g₁)/(g₁, g₁)  and  β₂ = (f₃, g₂)/(g₂, g₂),
so that the vector
g₃ = f₃ − ((f₃, g₁)/(g₁, g₁)) g₁ − ((f₃, g₂)/(g₂, g₂)) g₂
is orthogonal both to g₁ and to g₂.
Similarly, the vector
gᵢ = fᵢ − ((fᵢ, g₁)/(g₁, g₁)) g₁ − ((fᵢ, g₂)/(g₂, g₂)) g₂ − ... − ((fᵢ, g_{i−1})/(g_{i−1}, g_{i−1})) g_{i−1}
(i = 3, 4, ..., k)
is orthogonal to each of the vectors g₁, g₂, ..., g_{i−1}.
Therefore the vectors g₁, g₂, ..., g_k form a collection of orthogonal
vectors.
Dividing each vector gᵢ (i = 1, 2, ..., k) by its length |gᵢ| we obtain the
collection of orthonormal vectors (see Fig. 6.10)
e₁ = g₁/|g₁|,  e₂ = g₂/|g₂|,  ...,  e_k = g_k/|g_k|. ►

The basis e = (e₁, e₂, ..., eₙ) of a Euclidean space is called orthonormal
if
(eᵢ, eⱼ) = δᵢⱼ = 1 for i = j  and  0 for i ≠ j  (i, j = 1, 2, ..., n).
Fig. 6.10
Summing up, we arrive at the following theorem.


Theorem 6.10. There exists an orthonormal basis in any Euclidean
space.
Example. Let a1 = (1, 2, 0, 1), a2 = ( -1, 1, 1, 0) and a3 = (2, 2, 0, 1)
form a basis in a Euclidean space.
◄ We apply the procedure of orthogonalization to construct the orthonor-
mal basis in this space.
Set b₁ = a₁ and b₂ = a₂ − αa₁. For the vector b₂ to be orthogonal to
the vector a₁ it is necessary that the identity
0 = (a₂, a₁) − α(a₁, a₁)
is fulfilled. Whence
α = (a₂, a₁)/(a₁, a₁) = 1/6,
so that
b₂ = (−1, 1, 1, 0) − (1/6)(1, 2, 0, 1) = (−7/6, 2/3, 1, −1/6).
Evidently, the vector c₂ = 6b₂ = (−7, 4, 6, −1) is orthogonal to the
vector a₁. Indeed,
(c₂, a₁) = (−7)×1 + 4×2 + 6×0 + (−1)×1 = 0.
For the vector b₃ = a₃ − βa₁ − γc₂ to be orthogonal both to a₁ and to c₂
it is necessary that the identities
0 = (a₃, a₁) − β(a₁, a₁)  and  0 = (a₃, c₂) − γ(c₂, c₂)
are fulfilled. Whence
β = (a₃, a₁)/(a₁, a₁) = (2 + 4 + 0 + 1)/6 = 7/6
and
γ = (a₃, c₂)/(c₂, c₂) = (−14 + 8 + 0 − 1)/(49 + 16 + 36 + 1) = −7/102,
so that
b₃ = (2, 2, 0, 1) − (7/6)(1, 2, 0, 1) + (7/102)(−7, 4, 6, −1)
   = ((204 − 119 − 49)/102, (204 − 238 + 28)/102, 42/102, (102 − 119 − 7)/102)
   = (6/17, −1/17, 7/17, −4/17).
It is easy to see that the vector c₃ = 17b₃ = (6, −1, 7, −4) is orthogonal
both to a₁ and to c₂.
The vectors a₁, c₂, c₃ form a collection of orthogonal vectors. Dividing
a₁, c₂, c₃ by their lengths √6, √102, √102, respectively, we arrive at the
collection of orthonormal vectors
e₁ = (1/√6)(1, 2, 0, 1),
e₂ = (1/√102)(−7, 4, 6, −1),
e₃ = (1/√102)(6, −1, 7, −4).
The vectors e₁, e₂, e₃ form an orthonormal basis of the Euclidean
space. ►
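The orthogonalization procedure, applied to the vectors of this example by the short sketch below (plain Python with NumPy; gram_schmidt is our own helper), reproduces the orthogonal vectors found by hand, up to the scalar factors 6 and 17 used above:

import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal family g1, ..., gk built from f1, ..., fk."""
    gs = []
    for f in map(np.asarray, vectors):
        g = f.astype(float)
        for h in gs:
            g = g - (f @ h) / (h @ h) * h     # subtract the projection of f onto h
        gs.append(g)
    return gs

g1, g2, g3 = gram_schmidt([(1, 2, 0, 1), (-1, 1, 1, 0), (2, 2, 0, 1)])
print(g1)          # [1. 2. 0. 1.]
print(6 * g2)      # [-7.  4.  6. -1.]   (the vector c2 above)
print(17 * g3)     # [ 6. -1.  7. -4.]   (the vector c3 above)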
If we wish to compute the scalar product of two vectors in a Euclidean
space it is helpful to expand the vectors relative to the orthonormal basis
because in this case the desired product can be expressed by the simplest
formula.

◄ Let e = (e₁, e₂, ..., eₙ) be an orthonormal basis of a Euclidean space.
Computing the scalar product of the vectors
x = Σ_{i=1}^n ξⁱeᵢ  and  y = Σ_{j=1}^n ηʲeⱼ,
we have
(x, y) = Σ_{i=1}^n ξⁱηⁱ.
In particular,
(x, x) = Σ_{i=1}^n (ξⁱ)².
Whence
|x| = √(Σ_{i=1}^n (ξⁱ)²). ►
6.8 Orthocompliments of Linear Subspaces


Suppose that we are given a linear subspace W in a Euclidean
space V.
A collection of vectors in V that satisfy the condition
(y, x) = 0,
where x is an arbitrary vector in W, is said to be an orthocompliment of
W, denoted by W .1. In other words, the orthocompliment W .1 of W con-
sists of vectors y in V that are orthogonal to all vectors in W.
Properties of orthocompliments. (a) The orthocompliment W .1 of the
linear subspace W is a linear subspace in V.
◄ Let Y1 and Y2 belong to W .1, i.e.,

(Y1, x) = 0 and (Y2, x) =0


for any vector x in W.
Adding these expressions, by properties of scalar products, we derive
that for any vector x in W there holds
(Y1 + Y2, x) = 0.
Whence we conclude that Y1 + Y2 belongs to W .1.
Since (y, x) = 0 for any vector x in W, we infer that (ay, x) = a(y, x)
and, consequently, ay is a vector in W .1 • ►
(b) V = W ⊕ W⊥.

◄ This follows from the fact that the zero vector is the only vector con-
tained both in W and in W .1 • ►
Property (b) implies that any vector x in V is uniquely expressed as
x = y + z,     (*)
where y is a vector in W and z is a vector in W⊥ (Fig. 6.11).

Fig. 6.11

The vector y in W is called the W-component of x and z in W .1 the


orthogonal component of x with respect to W.
Let x be a given vector in V and let W be a linear subspace of V. We show
how to construct the W-component of x and the orthogonal component
of x with respect to W.
◄ We may assume that the orthonormal basis e1, e2, ..., ek is introduced
in W. Then the W-component of x, denoted by y, is expressed as
y = α1e1 + α2e2 + ... + αkek.
Substituting y into (*), we have
x = α1e1 + α2e2 + ... + αkek + z,
where z is the desired orthogonal component of x with respect to W.
Recall that z is orthogonal to W. Then, multiplying x successively by
e1, e2, ..., ek, we obtain
(x, e1) = α1, (x, e2) = α2, ..., (x, ek) = αk.
The vectors
y = (x, e1)e1 + (x, e2)e2 + ... + (x, ek)ek
and
z=x-y

are the W-component of x and the orthogonal component of x with respect


to W, correspondingly. ►
Example. Construct the W-component of x = (4, 2, 3, 5) for the linear
subspace W ⊂ ℝ⁴ specified by the system of equations
x1 + x2 + x3 + x4 = 0,
x1 - x2 - x3 + x4 = 0.
◄ The vectors a1 = (1, 0, 0, -1) and a2 = (0, 1, -1, 0) form the fun-
damental system of solutions and, consequently, a basis of W. To con-
struct the orthonormal basis of W it suffices to divide a1 and a2 by their
lengths. We obtain
e1 = (1/√2, 0, 0, -1/√2) and e2 = (0, 1/√2, -1/√2, 0).
The vector
y = (x, e1)e1 + (x, e2)e2
  = (4/√2 - 5/√2)(1/√2, 0, 0, -1/√2) + (2/√2 - 3/√2)(0, 1/√2, -1/√2, 0)
  = (-1/2, 0, 0, 1/2) + (0, -1/2, 1/2, 0) = (1/2)(-1, -1, 1, 1)
is the W-component of x = (4, 2, 3, 5) and the vector
z = x - y = (9/2, 5/2, 5/2, 9/2)


is the orthogonal component of x with respect to W. ►
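For a numerical cross-check of this example, the projection can be built directly from the orthonormal basis of W; the following sketch (Python/NumPy, illustrative only) reproduces the W-component and the orthogonal component found above.

```python
import numpy as np

# orthonormal basis of W (solutions of the system, normalized)
e1 = np.array([1, 0, 0, -1]) / np.sqrt(2)
e2 = np.array([0, 1, -1, 0]) / np.sqrt(2)

x = np.array([4, 2, 3, 5], dtype=float)

# W-component: sum of projections onto the basis vectors of W
y = np.dot(x, e1) * e1 + np.dot(x, e2) * e2
z = x - y                             # orthogonal component

print(y)                              # [-0.5 -0.5  0.5  0.5]
print(z)                              # [ 4.5  2.5  2.5  4.5]
print(np.dot(z, e1), np.dot(z, e2))   # both 0: z is orthogonal to W
```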

6.9 Unitary Spaces


A unitary space is a complex linear space U such that given any
ordered pair of vectors x and y in U, there exists a number called the scalar
product (x, y) of x and y so that for any vectors x, y and z in U the following
conditions are satisfied:
(a) (y, x) is the complex conjugate of (x, y);
(b) (x + y, z) = (x, z) + (y, z);
(c) (αx, y) = α(x, y);
(d) (x, x) ≥ 0, the identity (x, x) = 0 holding for x = θ only.
Example. Consider a coordinate space ℂⁿ whose elements are all possi-
ble ordered n-tuples of complex numbers. The scalar product of the vec-
tors ξ = (ξ¹, ξ², ..., ξⁿ) and η = (η¹, η², ..., ηⁿ) in ℂⁿ is expressed as
(ξ, η) = Σⱼ₌₁ⁿ ξʲη̄ʲ,
where η̄ʲ denotes the complex conjugate of ηʲ.

6.10 Linear Mappings


Let V and W be two linear spaces, either both real or both complex.
A linear mapping of a linear space V into a linear space W is a rule
or operation 𝒜 which associates every vector x in V with a unique vector
y = 𝒜x in W so that
(i) 𝒜(x1 + x2) = 𝒜x1 + 𝒜x2 and (ii) 𝒜(αx) = α𝒜x.
The conditions (i) and (ii) can be expressed in a single formula
𝒜(α1x1 + α2x2) = α1𝒜x1 + α2𝒜x2.
In symbols, we write 𝒜: V → W to designate a mapping of V into W.
To illustrate the concept of linear mapping we shall turn to some ex-
amples:
(1) Let V = W = Mn, where Mn is a space of polynomials of degree
not exceeding n. Laws of differentiation set every polynomial in Mn into
correspondence with its derivative which is also a polynomial of degree
not exceeding n, i.e., belongs to Mn. Moreover, the derivative of the sum of
polynomials (functions) is equal to the sum of the derivatives of these poly-
nomials, and the derivative of a polynomial multiplied by a scalar is equal
to the derivative of the polynomial multiplied by the same scalar. Under
the operation 𝒟: Mn → Mn which associates every polynomial in Mn with
its derivative the conditions (i) and (ii) are fulfilled. Hence, the operation
𝒟: Mn → Mn is a linear mapping.

Fig. 6.12
(2) The rule that sets every vector x in V into correspondence with a
vector λx in V, where λ is a given scalar, is a linear mapping, called the
similarity mapping (Fig. 6.12).
(3) Let e = (e1, e2, ..., en) be a basis of V. The rule 𝒫: V → V, which
associates every vector
x = ξ¹e1 + ξ²e2 + ... + ξᵏek + ... + ξⁿen
in V with the vector
𝒫x = ξ¹e1 + ξ²e2 + ... + ξᵏek

in V, given k < n, is a linear mapping called the projective mapping. Hence,
the linear mapping 𝒫: V → V sets every vector x from V into correspon-
dence with the vector 𝒫x in V (Fig. 6.13).
(4) The set T2 of trigonometric polynomials of the form
α cos x + β sin x
forms a linear space. Similarly to (1), it is easy to see that the rule
𝒟 = d/dx: α cos x + β sin x → -α sin x + β cos x
is a linear mapping
𝒟: T2 → T2.

(5) Let A be a given m × n matrix from ℝ^(m×n) and let X be an n × 1
matrix, i.e., a column-vector. Pre-multiplying X by A constitutes the linear
mapping
X ∈ ℝ^(n×1) → AX ∈ ℝ^(m×1),
which associates every column-vector X in ℝ^(n×1) with a column-vector in
ℝ^(m×1).
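Example (5) is the model for all finite-dimensional linear mappings. A minimal sketch of it (Python/NumPy, not from the original text), with an arbitrary 2 × 3 matrix chosen for illustration, checks the two defining conditions numerically:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])      # a fixed 2 x 3 matrix, so V = R^3, W = R^2

def mapping(x):
    """The linear mapping x -> Ax from R^3 to R^2."""
    return A @ x

x1, x2 = np.array([1.0, 0.0, 2.0]), np.array([-1.0, 4.0, 1.0])
lam, mu = 3.0, -2.0

# additivity and homogeneity combined: A(lam*x1 + mu*x2) = lam*A(x1) + mu*A(x2)
lhs = mapping(lam * x1 + mu * x2)
rhs = lam * mapping(x1) + mu * mapping(x2)
print(np.allclose(lhs, rhs))   # True
```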


Fig. 6.13

The mapping ℐ: V → V such that ℐx = x is called the identity
mapping.
The image of a linear mapping 𝒜: V → W is the set im 𝒜 of all vectors
from W satisfying the following condition: a vector y in W is contained
in im 𝒜 if there exists a vector x in V such that 𝒜x = y.
Referring to the examples of linear mappings we mention the following:
(1') The image of the differential mapping 𝒟: Mn → Mn is the set of
polynomials of degree not exceeding n - 1.

(2') The image of the similarity mapping coincides with the linear
space V.
(3') The image of the projective mapping 𝒫: V → V is the subspace
Wk = L(e1, e2, ..., ek).
(4') The image of the differential mapping 𝒟: T2 → T2 coincides with
the linear space T2.
Proposition. The image of the linear mapping 𝒜: V → W is a linear
subspace of W.
◄ Let y1 and y2 be vectors in im 𝒜. This means that in V there exist vectors
x1 and x2 such that 𝒜x1 = y1 and 𝒜x2 = y2.
From the formula
λ1y1 + λ2y2 = λ1𝒜x1 + λ2𝒜x2 = 𝒜(λ1x1 + λ2x2)
it follows that any linear combination of the vectors y1 and y2 also belongs
to im 𝒜. ►
The dimension of the image of the linear mapping is called the rank
of the linear mapping.
In symbols, we write rank 𝒜 to mean the rank of the linear mapping 𝒜.
The linear mappings 𝒜: V → W and ℬ: V → W are said to coincide
if for any vector x in V there holds 𝒜x = ℬx.
In symbols, we write 𝒜 = ℬ to mean the coincidence of 𝒜 and ℬ.
Theorem 6.11 (on construction of a linear mapping). Let V and W be
two linear spaces and e = (e1, e2, ..., en) be a basis in V, and let f1,
f2, ..., fn be arbitrary vectors in W.
Then there exists a unique linear mapping
𝒜: V → W
such that
𝒜ek = fk (k = 1, 2, ..., n). (6.13)
First, we prove that a linear mapping satisfying (6.13) exists.
◄ Consider the expansion of some vector x in V with respect to the
basis e,
x = ξ¹e1 + ξ²e2 + ... + ξⁿen,
and the mapping 𝒜: V → W such that
𝒜x = ξ¹f1 + ξ²f2 + ... + ξⁿfn. (6.14)
It is easy to verify directly that the mapping so induced is linear.
Let
x = ξ¹e1 + ... + ξⁿen and y = η¹e1 + ... + ηⁿen.
Then applying (6.14) we have
𝒜(λx + μy) = (λξ¹ + μη¹)f1 + ... + (λξⁿ + μηⁿ)fn
= λ(ξ¹f1 + ... + ξⁿfn) + μ(η¹f1 + ... + ηⁿfn) = λ𝒜x + μ𝒜y.
Moreover, taking x = ek in (6.14), we infer that
𝒜ek = fk (k = 1, 2, ..., n). ►
Now we show that the linear mapping satisfying (6.13) is unique.


Fig. 6.14

◄ Consider the mapping 𝒜 satisfying (6.13) and a mapping ℬ: V → W
such that
ℬek = fk (k = 1, 2, ..., n).
Computing the actions of 𝒜 and ℬ on an arbitrary vector
x = ξ¹e1 + ... + ξⁿen in V, we arrive at the identity
𝒜x = ξ¹f1 + ... + ξⁿfn = ξ¹ℬe1 + ... + ξⁿℬen = ℬ(ξ¹e1 + ... + ξⁿen) = ℬx.
Whence we conclude that the linear mappings 𝒜 and ℬ coincide. ►
From Theorem 6.11 it follows that we may define a linear mapping by
specifying its action on the base vectors of V.

The kernel of the linear mapping 𝒜: V → W is the set of all vectors of
V which are mapped into the zero vector θ_W of W.
We denote the kernel of 𝒜 by ker 𝒜.
Turning back to the examples (1)-(5) of linear mappings, we can easily
notice that
(1") The kernel of the differential mapping 𝒟: Mn → Mn is the set of
polynomials of degree zero.
(2") The kernel of the similarity mapping consists of θ_V only.
(3") The kernel of the projective mapping 𝒫: V → V is the linear sub-
space L(ek+1, ek+2, ..., en) (Fig. 6.14).
(4") The kernel of the differential mapping 𝒟: T2 → T2 consists of θ_V
only.
(5") The set of solutions of the homogeneous linear system
AX = 0
forms the kernel of the mapping
𝒜: ℝ^(n×1) → ℝ^(m×1).

Proposition. The kernel of the linear mapping 𝒜: V → W is a linear
subspace of V.
◄ The identities 𝒜x = θ_W and 𝒜y = θ_W imply that
𝒜(λx + μy) = λ𝒜x + μ𝒜y = λθ_W + μθ_W = θ_W. ►
The dimension of the kernel of the linear mapping 𝒜 is called the nul-
lity of 𝒜.
In symbols, we write dim ker 𝒜 = nullity 𝒜.
Notice that for any linear mapping 𝒜: V → W the following identity holds:
rank 𝒜 + nullity 𝒜 = dim V. (*)
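A quick numerical illustration of the identity (*): for the mapping X → AX of example (5) the dimension of the image (the rank of A) plus the dimension of the kernel equals the number of columns, i.e. dim V. The sketch below (Python/NumPy, illustrative, with an arbitrary matrix) counts the kernel dimension from the singular values.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # a rank-1 mapping from R^3 to R^2

rank = np.linalg.matrix_rank(A)

# nullity = number of (near-)zero singular values = dim ker A
singular_values = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.sum(singular_values > 1e-10)

print(rank, nullity, A.shape[1])      # 1 2 3 -> rank + nullity = dim V
```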
Operations on linear mappings. Let 𝒜: V → W and ℬ: V → W be two
linear mappings.
The sum of the linear mappings 𝒜 and ℬ is the mapping 𝒞: V → W
such that given any x in V
𝒞x = 𝒜x + ℬx.
It is easy to verify that the mapping 𝒞 is linear. Indeed,
𝒞(λx + μy) = 𝒜(λx + μy) + ℬ(λx + μy)
= λ(𝒜x + ℬx) + μ(𝒜y + ℬy) = λ𝒞x + μ𝒞y.
In symbols, we denote the sum of 𝒜 and ℬ as 𝒞 = 𝒜 + ℬ.
The product of the linear mapping 𝒜: V → W by an arbitrary number
α is the mapping ℰ: V → W such that given any x in V
ℰx = α𝒜x.

The mapping ℰ is linear since
ℰ(λx + μy) = α𝒜(λx + μy)
= λ(α𝒜x) + μ(α𝒜y) = λℰx + μℰy.
In symbols, we write ℰ = α𝒜.

6.11 Linear Operators


We shall confine ourselves to linear operators, that is, to linear map-
pings from a space V to V. Notice that the differential mappings, the identi-
ty and projective mappings and pre-multiplication of a column-vector by
a square matrix that we have encountered in the previous sections are all
examples of linear operators.
Multiplication of linear operators. Let 𝒜: V → V and ℬ: V → V be
two linear operators.
The product of the linear operator 𝒜 by the linear operator ℬ is the
mapping 𝒞: V → V such that
𝒞x = ℬ(𝒜x).
We show that 𝒞 is a linear operator. Indeed,
𝒞(λx + μy) = ℬ(𝒜(λx + μy)) = ℬ(λ𝒜x + μ𝒜y)
= λℬ(𝒜x) + μℬ(𝒜y) = λ𝒞x + μ𝒞y.
In symbols, we write 𝒞 = ℬ𝒜.


Remark. Notice that in general ℬ𝒜 ≠ 𝒜ℬ, as is easily seen from the
following example.
Example. Let V = ℝ² (Fig. 6.15).
◄ The mappings
𝒜: (ξ¹, ξ²) → (ξ¹, 0) and ℬ: (ξ¹, ξ²) → (ξ¹ + ξ², ξ²)
are linear operators from V into V.
Then
ℬ𝒜(ξ¹, ξ²) = ℬ(ξ¹, 0) = (ξ¹, 0)
and
𝒜ℬ(ξ¹, ξ²) = 𝒜(ξ¹ + ξ², ξ²) = (ξ¹ + ξ², 0).
Whence we deduce that
ℬ𝒜(ξ¹, ξ²) ≠ 𝒜ℬ(ξ¹, ξ²)
if ξ² ≠ 0. ►


Fig. 6.15

Definition. The linear operators 𝒜: V → V and ℬ: V → V are said to
be equivalent if given any vector x in V there holds 𝒜x = ℬx.
Let 𝒜: V → V be a linear operator.
The linear operator ℬ: V → V is said to be an inverse of 𝒜 if
ℬ𝒜 = 𝒜ℬ = ℐ, (6.15)
where ℐ: V → V is the identity operator, i.e., ℐx = x for any vector x in V.
Theorem 6.12. For the linear operator 𝒜: V → V to be invertible it is
necessary and sufficient that the image of 𝒜 coincide with the space V,
i.e., im 𝒜 = V.
◄ Suppose, first, that there exists a linear operator ℬ inverse to the given
linear operator 𝒜.
Recall that im 𝒜 is a subspace of V.
We show that an arbitrary vector y in V belongs to im 𝒜.
Let x = ℬy. By (6.15) we have
𝒜x = 𝒜(ℬy) = (𝒜ℬ)y = ℐy = y.
Whence we infer that the vector y is the image of the vector x = ℬy
and, consequently, y belongs to im 𝒜. Therefore, im 𝒜 = V. ►
Now we suppose that the image of 𝒜 coincides with the space V, i.e.,
im 𝒜 = V. Then
rank 𝒜 = dim V

and the linear operator 𝒜 maps a basis of V into another basis of V, i.e.,
𝒜: e = (e1, e2, ..., en) → f = (f1, f2, ..., fn),
where fk = 𝒜ek (k = 1, 2, ..., n).
Consider a linear operator ℬ such that
ℬfk = ek (k = 1, 2, ..., n). (6.16)
Theorem 6.11 states that the linear operator ℬ satisfying (6.16) is
unique.
Let us compute (ℬ𝒜)x and (𝒜ℬ)x for an arbitrary vector x in V.
◄ Relative to the basis e the vector x takes the form
x = ξ¹e1 + ξ²e2 + ... + ξⁿen.
Using (6.16), we obtain
(ℬ𝒜)x = ℬ(𝒜(ξ¹e1 + ... + ξⁿen)) = ℬ(ξ¹𝒜e1 + ... + ξⁿ𝒜en)
= ξ¹ℬf1 + ... + ξⁿℬfn = ξ¹e1 + ... + ξⁿen = x.
Analogously, relative to the basis f we have
x = η¹f1 + η²f2 + ... + ηⁿfn
and
(𝒜ℬ)x = η¹𝒜(ℬf1) + ... + ηⁿ𝒜(ℬfn) = η¹𝒜e1 + ... + ηⁿ𝒜en
= η¹f1 + ... + ηⁿfn = x.
Thus for any vector x in V we have
ℬ𝒜x = x and 𝒜ℬx = x.
Whence we conclude that
ℬ𝒜 = 𝒜ℬ = ℐ. ►

Remark. Theorem 6.12 states that the inverse ℬ of 𝒜 is uniquely
defined.
In symbols, we write 𝒜⁻¹ to mean the inverse of the linear operator 𝒜.
Corollary. The linear operator 𝒜: V → V is invertible if and only if
the kernel of 𝒜 consists of the zero vector only, i.e.,
ker 𝒜 = {θ_V}.
This immediately follows from Theorem 6.12 and formula (*) relating
the rank and the nullity of a linear mapping.

Example. The linear operator
𝒜: (ξ¹, ξ²) → (ξ¹, (2/3)ξ²)
compresses the plane uniformly (with coefficient 2/3) towards the ξ¹-axis,
while the inverse operator
𝒜⁻¹: (ξ¹, ξ²) → (ξ¹, (3/2)ξ²)
extends the plane uniformly with coefficient 3/2 (Fig. 6.16).


Fig. 6.16

6.12 Matrices of Linear Operators


Suppose that the linear operator 𝒜: V → V acts on the vectors e1,
e2, ..., en of the basis e = (e1, e2, ..., en) in V so that
𝒜eᵢ = αᵢ¹e1 + αᵢ²e2 + ... + αᵢⁿen (i = 1, 2, ..., n).
The matrix
A = A(e) = (αᵢᵏ) (k is the row index, i the column index),
whose columns comprise the coordinates of the images of the base vectors,
is called the matrix of the linear operator 𝒜 relative to the basis e.
Examples. (1) Relative to the basis e0 = 1, e1 = t, e2 = t²/2!, e3 = t³/3!
the matrix D(e) of the differential linear operator 𝒟: M3 → M3 takes

the form
D(e) = ( 0 1 0 0 )
       ( 0 0 1 0 )
       ( 0 0 0 1 )
       ( 0 0 0 0 ).
(2) Relative to the basis e1 = cos x, e2 = sin x the matrix D(e) of the
differential linear operator 𝒟: T2 → T2 takes the form
D(e) = (  0  1 )
       ( -1  0 ),
since
𝒟(cos x) = -sin x = 0·e1 - 1·e2
and
𝒟(sin x) = cos x = 1·e1 + 0·e2.
Let
y = 𝒜x
and let
x = ξ¹e1 + ξ²e2 + ... + ξⁿen and y = η¹e1 + η²e2 + ... + ηⁿen
be the expansions of the vectors x and y relative to the basis e.
Then the column-vectors
x(e) = (ξ¹, ξ², ..., ξⁿ)ᵀ and y(e) = (η¹, η², ..., ηⁿ)ᵀ,
which comprise the coordinates of x and y relative to the basis e, are related
as
y(e) = A(e)x(e). (6.17)
◄ Indeed, since the expansion of the vector y relative to the basis e is
unique, comparing
y = 𝒜x = ξ¹𝒜e1 + ... + ξⁿ𝒜en = Σᵢ₌₁ⁿ ξⁱ(αᵢ¹e1 + αᵢ²e2 + ... + αᵢⁿen)
with
y = η¹e1 + η²e2 + ... + ηⁿen
yields
ηᵏ = α₁ᵏξ¹ + α₂ᵏξ² + ... + αₙᵏξⁿ (k = 1, 2, ..., n).
Arranging these n identities in matrix form, we arrive at (6.17). ►
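Relation (6.17) is what makes a linear operator computable: once the images of the basis vectors are written as the columns of A(e), the action on any vector is a matrix-vector product. The sketch below (Python/NumPy, illustrative) uses the differentiation operator on T2 from example (2):

```python
import numpy as np

# matrix of d/dx on T2 relative to the basis e1 = cos x, e2 = sin x:
# columns are the coordinates of D(e1) = -sin x and D(e2) = cos x
D = np.array([[0, 1],
              [-1, 0]])

x = np.array([3.0, 5.0])   # coordinates of 3 cos x + 5 sin x
y = D @ x                  # coordinates of its derivative
print(y)                   # [ 5. -3.]  i.e. 5 cos x - 3 sin x
```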


Theorem 6.13. The rank of the matrix A(e) of the linear operator
𝒜: V → V is independent of the choice of the basis e and is equal to the
rank of 𝒜.
◄ Since
im 𝒜 = L(𝒜e1, 𝒜e2, ..., 𝒜en),
the rank of 𝒜 is equal to the maximal number of linearly independent
vectors belonging to the collection of the vectors 𝒜e1, 𝒜e2, ..., 𝒜en. By
virtue of Theorem 6.4 this number is equal to the maximal number of
linearly independent column-vectors in the matrix A(e), i.e. to the rank of
A(e).
Therefore
r(A(e)) = rank 𝒜. ►

It is easy to verify that addition and multiplication of the linear opera-
tors become addition and multiplication of the matrices of these operators
relative to the same basis, and multiplication of a linear operator by a scalar
becomes multiplication of the matrix of the operator by this scalar.
◄ By way of illustration we show that the matrix of 𝒞 = ℬ𝒜 is equal to
the product of the matrices B(e) and A(e) of the linear opera-
tors ℬ and 𝒜 relative to the basis e, i.e.,
C(e) = B(e)A(e).
Let
𝒜eᵢ = Σₖ₌₁ⁿ αᵢᵏek and ℬek = Σₘ₌₁ⁿ βₖᵐem.
Then
𝒞eᵢ = ℬ(𝒜eᵢ) = Σₖ₌₁ⁿ αᵢᵏℬek = Σₘ₌₁ⁿ (Σₖ₌₁ⁿ βₖᵐαᵢᵏ)em.
Setting
γᵢᵐ = Σₖ₌₁ⁿ βₖᵐαᵢᵏ (i, m = 1, 2, ..., n), (6.18)
we may write
C(e) = (γᵢᵐ). (6.19)
On the other hand, since A(e) = (αᵢᵏ) and B(e) = (βₖᵐ), (6.18) and (6.19)
yield
C(e) = B(e)A(e). (6.20)
Thus, relative to the basis e, the matrix of the linear operator ℬ𝒜 is
equal to B(e)A(e). ►
From the equivalence of the operation of multiplication of linear opera-
tors and that of multiplication of the respective matrices it easily follows
that the matrix of the linear operator 𝒜⁻¹ which is inverse to the opera-
tor 𝒜 is the inverse of the matrix A of 𝒜.
◄ Indeed, by definition of the inverse operator we have
𝒜⁻¹𝒜 = ℐ and 𝒜𝒜⁻¹ = ℐ.
This means that the matrix of 𝒜⁻¹, say B, must satisfy the identities
BA = I and AB = I.
Whence we conclude that B is the inverse of A, i.e.,
B = A⁻¹. ►
Theorem 6.14. Let 𝒜: V → V be a linear operator and let A = A(e)
and A' = A(e') be matrices of 𝒜 relative to the bases e and e' of a linear
space V.
Then
A' = S⁻¹AS, (6.21)
where S is the matrix of transition from e to e'.
◄ Let y = 𝒜x.
Relative to the bases e and e' the column-vectors of the coordinates
of x and y are related as
y(e) = Ax(e) and y(e') = A'x(e'). (6.22)
By Property (b) of transition matrices we may write
x(e) = Sx(e') and y(e) = Sy(e'). (6.23)
Substituting (6.23) into the first identity of (6.22), we obtain
Sy(e') = ASx(e').

Whence, using the second identity of (6.22), we have


SA'x(e') = ASx(e').
Notice that the above identity holds for any vector x. Hence we may
write
SA' = AS.
Since the transition matrix S is nonsingular and, consequently, inverti-
ble, pre-multiplication of the last identity by S - 1 gives (6.21). ►
Corollary. The determinant of the matrix of a linear operator remains
unchanged in any basis.
◄ Let us compute the determinant of the matrix

A(e ') = S - 1A(e)S.


We have
det A(e') = det (S - 1A(e)S) = det S - 1 •det A(e)•det S = det A(e),
since
det S - 1 = (det S)- 1• ►
It is easy to verify that the determinant of the matrix of the linear
operator
𝒜 - tℐ,
where ℐ is the identity operator and t is an arbitrary number, also remains
unchanged in any basis.
◄ Indeed, let A(e) - tI and A(e') - tI be matrices of the operator
𝒜 - tℐ relative to the bases e and e'.
Using (6.21), we obtain
A(e') - tI = S⁻¹A(e)S - tI = S⁻¹(A(e) - tI)S.
Then by the Corollary of Theorem 6.14 we have
det (A(e') - tI) = det (A(e) - tI). ►
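This invariance is easy to confirm numerically: conjugating a matrix by any invertible transition matrix S leaves the characteristic polynomial (and hence the eigenvalues) unchanged. A short check (Python/NumPy, illustrative, with an arbitrary A and S):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # matrix of an operator in the basis e
S = np.array([[1.0, 2.0],
              [1.0, -1.0]])         # transition matrix to a new basis e'

A_prime = np.linalg.inv(S) @ A @ S  # matrix of the same operator in e'

# coefficients of det(A - tI) and det(A' - tI) coincide
print(np.poly(A))        # [ 1. -5.  6.]  i.e. t^2 - 5t + 6
print(np.poly(A_prime))  # same coefficients (up to round-off)
```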

Expanding the determinant det (A - tI), we can write
χ(t) = det (A - tI) = (-1)ⁿtⁿ + γₙ₋₁tⁿ⁻¹ + ... + γ₁t + γ₀.
The polynomial χ(t) is called the characteristic polynomial of the linear
operator 𝒜 and the roots of χ(t) are called the eigenvalues of the linear
operator 𝒜 (the matrix A). Notice that the characteristic polynomial χ(t)
is independent of the choice of a basis.

6.13 Eigenvalues and Eigenvectors


A nonzero vector x in V is called an eigenvector of the linear
operator 𝒜: V → V if there exists a number λ (the eigenvalue of 𝒜) such that
𝒜x = λx. (*)
Examples. (1) Any polynomial of degree zero is an eigenvector of
the differential operator
𝒟 = d/dt: Mn → Mn.
◄ Consider the polynomial of degree zero equal to 1. Then,
applying (*), we have
(d/dt)(1) = 0 = 0 × 1.
Whence it follows that the corresponding eigenvalue λ of 𝒟 is equal to zero. ►


(2) The differential operator
𝒟 = d/dt: T2 → T2
has no eigenvectors.
◄ Consider the trigonometric polynomial α cos t + β sin t.
Using (*), we have
𝒟(α cos t + β sin t) = λ(α cos t + β sin t).
This means that
-α sin t + β cos t = λα cos t + λβ sin t,
or
(λβ + α) sin t + (λα - β) cos t = 0,
which is fulfilled if and only if
α + λβ = 0 and λα - β = 0.
Whence α = β = 0. Consequently, the polynomial α cos t + β sin t is the
zero one and cannot be an eigenvector of 𝒟. ►

Theorem 6.15. The number λ is an eigenvalue of the linear opera-
tor 𝒜 if and only if λ is a root of the characteristic polynomial χ(t)
of 𝒜, i.e. χ(λ) = 0.
◄ Suppose that λ is a root of the polynomial χ(t), in which case
χ(λ) = det (A(e) - λI) = 0. (6.24)
Consider the homogeneous linear system
(A(e) - λI)(ξ¹, ξ², ..., ξⁿ)ᵀ = 0,
or
(α₁¹ - λ)ξ¹ + α₂¹ξ² + ... + αₙ¹ξⁿ = 0,
α₁²ξ¹ + (α₂² - λ)ξ² + ... + αₙ²ξⁿ = 0, (6.25)
................................................
α₁ⁿξ¹ + α₂ⁿξ² + ... + (αₙⁿ - λ)ξⁿ = 0.
By virtue of (6.24) system (6.25) has a nonzero solution ξ¹, ξ², ...,
ξⁿ and
x = ξ¹e1 + ξ²e2 + ... + ξⁿen ≠ θ_V.
The column-vector x(e) of the coordinates of x satisfies the condition
(A(e) - λI)x(e) = 0,
which may be written as
A(e)x(e) = λx(e),
or, equivalently,
𝒜x = λx.
Whence it follows that λ is an eigenvalue of the linear operator 𝒜 and
x is an eigenvector of 𝒜.
Now we assume that λ is an eigenvalue of 𝒜. Then there exists a non-
zero vector x such that 𝒜x = λx.
Let e = (e1, e2, ..., en) be a basis. Then we may write the matrix equation
A(e)x(e) = λx(e),
or
(A(e) - λI)x(e) = 0. (6.26)

Since x is an eigenvector of 𝒜, the column-vector x(e) of the coor-
dinates of x is a nonzero vector. This means that system (6.26) has a nonzero
solution. In this case the condition
det (A(e) - λI) = 0
must be fulfilled.
Thus
χ(λ) = 0. ►

Remark. To find all the eigenvectors corresponding to a given eigen-
value λ it is necessary to construct the fundamental system of solutions
for (6.25).

Fig. 6.17

Examples. (1) Let
𝒫: V3 → V3
be a projective linear operator such that
𝒫: xi + yj + zk → xi + yj.
Compute the eigenvectors of 𝒫.
◄ Let us consider the action of 𝒫 on the base vectors i, j and k (Fig. 6.17).
We have
𝒫: i → i, 𝒫: j → j and 𝒫: k → θ.
Since the matrix of 𝒫 is
( 1 0 0 )
( 0 1 0 )
( 0 0 0 ),

the characteristic polynomial of 𝒫 becomes
χ(λ) = det ( 1-λ   0    0 )
            (  0   1-λ   0 ) = -λ(1 - λ)²,
            (  0    0   -λ )
with roots λ1 = 0 and λ2 = λ3 = 1.
Consider the homogeneous linear systems corresponding to the distinct
values of λ equal to 0 and 1. These are defined by the coefficient matrices
( 1 0 0 )        ( 0 0  0 )
( 0 1 0 )  and   ( 0 0  0 )
( 0 0 0 )        ( 0 0 -1 ),
respectively. Then we may write the homogeneous linear systems as
x = 0, y = 0   and   -z = 0.
The corresponding fundamental systems of solutions are
(0, 0, 1)ᵀ   and   (1, 0, 0)ᵀ, (0, 1, 0)ᵀ.
Whence we deduce that the eigenvectors of 𝒫 are the base vector k and
any vector of the form xi + yj (x² + y² > 0) associated with the eigen-
values 0 and 1, respectively. ►
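The same eigenvalues and eigenvectors can be read off numerically from the matrix of the operator; the sketch below (Python/NumPy, illustrative) treats the projection matrix of this example.

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])     # matrix of the projection onto the xy-plane

eigenvalues, eigenvectors = np.linalg.eig(P)
print(eigenvalues)                  # [1. 1. 0.]
print(eigenvectors)                 # columns: i, j (eigenvalue 1) and k (eigenvalue 0)
```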
(2) Let
𝒟 = d/dt: α + βt + γt² → β + 2γt
be a differential linear operator defined over the space of polynomials of
degree not exceeding 2.
Compute the eigenvectors of 𝒟.
◄ Relative to the basis 1, t, t² the matrix D of 𝒟 takes the form
D = ( 0 1 0 )
    ( 0 0 2 )
    ( 0 0 0 ).
Then the characteristic polynomial -λ³ = 0 has the root λ = 0 of mul-
tiplicity 3.
Consider the homogeneous linear system
β = 0,
2γ = 0,
0 = 0,
whose solution is 1, 0, 0. The triple 1, 0, 0 corresponds to the polynomial
of degree zero, which is the eigenvector of 𝒟. ►

6.14 Adjoint Operators


In a finite-dimensional Euclidean space a linear operator can be
subjected to an operation which gives rise to a linear operator called the
adjoint operator.
Let V be an n-dimensional Euclidean space and 𝒜: V → V be a linear
operator.
We shall associate with 𝒜 a linear operator 𝒜*: V → V such that for
any vector x and any vector y in V there holds
(𝒜x, y) = (x, 𝒜*y). (6.27)
Definition. The linear operator 𝒜*: V → V satisfying (6.27) is called
the adjoint operator of the linear operator 𝒜: V → V.
Let us show that for any linear operator 𝒜 there always exists the ad-
joint operator 𝒜*.
◄ Let e = (e1, e2, ..., en) be an orthonormal basis of V and A =
A(e) = (αᵢʲ) be the matrix of the linear operator 𝒜 relative to e, in which case
𝒜eᵢ = Σⱼ₌₁ⁿ αᵢʲeⱼ (i = 1, 2, ..., n). (6.28)
Transposing the matrix A and setting B = Aᵀ, i.e., βᵢʲ = αⱼⁱ for any i
and any j (i, j = 1, 2, ..., n), we may define the linear operator 𝒜*: V → V
such that
𝒜*eᵢ = Σⱼ₌₁ⁿ βᵢʲeⱼ (i = 1, 2, ..., n), (6.29)
where
βᵢʲ = αⱼⁱ, i, j = 1, 2, ..., n. (6.30)
(Recall that by virtue of Theorem 6.11 a linear operator is uniquely defined
by its action on the base vectors of V.)
To verify (6.27) we first set x and y equal to the base vectors, say, x = eᵢ
and y = eⱼ.
Since the basis e is orthonormal, (6.28) gives
(𝒜eᵢ, eⱼ) = (Σₖ αᵢᵏeₖ, eⱼ) = Σₖ αᵢᵏ(eₖ, eⱼ) = Σₖ αᵢᵏδₖⱼ = αᵢʲ.
Analogously, using (6.29) and (6.30), we obtain
(eᵢ, 𝒜*eⱼ) = (eᵢ, Σₖ βⱼᵏeₖ) = Σₖ βⱼᵏ(eᵢ, eₖ) = Σₖ βⱼᵏδᵢₖ = βⱼⁱ = αᵢʲ.
Whence
(𝒜eᵢ, eⱼ) = (eᵢ, 𝒜*eⱼ) (i, j = 1, 2, ..., n). (6.31)


Now we suppose that x and y are arbitrary vectors in V. Relative to
the orthonormal basis e the expansions of x and y are
x = Σᵢ₌₁ⁿ ξⁱeᵢ and y = Σⱼ₌₁ⁿ ηʲeⱼ.
Then, computing the expressions on the left- and right-hand sides of
(6.27), we have
(𝒜x, y) = (𝒜(Σᵢ ξⁱeᵢ), Σⱼ ηʲeⱼ) = Σᵢ Σⱼ ξⁱηʲ(𝒜eᵢ, eⱼ)
and
(x, 𝒜*y) = (Σᵢ ξⁱeᵢ, 𝒜*(Σⱼ ηʲeⱼ)) = Σᵢ Σⱼ ξⁱηʲ(eᵢ, 𝒜*eⱼ).
Whence, by virtue of (6.31), we arrive at the desired result. ►
Example. Consider the two-dimensional linear space of polynomials of
degree not exceeding 1 with real coefficients.
Let
φ(t) = a + bt and ψ(t) = c + dt
be any two polynomials of degree not exceeding 1 and let
(φ, ψ) = ac + bd (*)
be the scalar product of φ and ψ.
Then the two-dimensional space of polynomials of degree not exceeding
1 with real coefficients becomes a two-dimensional Euclidean space M1.
Let 𝒟: M1 → M1 be the differential operator such that 𝒟(a + bt) = b.
We can define the adjoint operator 𝒟*: M1 → M1 as follows.
Notice that the polynomials 1 and t form an orthonormal basis of M1
since, using (*), we obtain (1, 1) = (t, t) = 1 and (1, t) = 0, in which case
𝒟(1) = 0 and 𝒟(t) = 1, and the matrix of 𝒟 becomes
( 0 1 )
( 0 0 ).
Then the transposed matrix
( 0 0 )
( 1 0 )
is the matrix of the adjoint operator 𝒟* such that
𝒟*(1) = t and 𝒟*(t) = 0.
For an arbitrary polynomial φ(t) = a + bt we have 𝒟: a + bt → b and
𝒟*: a + bt → at.
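In an orthonormal basis the adjoint is just the transposed matrix, which is easy to verify against definition (6.27); the sketch below (Python/NumPy, illustrative) uses the matrix of 𝒟 from this example and random coordinate vectors.

```python
import numpy as np

D = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # matrix of the differentiation operator in the basis 1, t
D_star = D.T                 # matrix of the adjoint in the same (orthonormal) basis

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)

# (Dx, y) = (x, D*y) for every pair of vectors
print(np.isclose(np.dot(D @ x, y), np.dot(x, D_star @ y)))   # True
```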
Properties of adjoint operators. (a) Every linear operator has a single
adjoint operator.
◄ Suppose that ℬ and 𝒞 are two distinct adjoint operators of a given linear
operator 𝒜. Then for any x and any y in V there hold
(𝒜x, y) = (x, ℬy) and (𝒜x, y) = (x, 𝒞y).

Whence
(x, ℬy) = (x, 𝒞y)
and
(x, ℬy - 𝒞y) = 0.
Since x is an arbitrary vector in V we infer that the vector ℬy - 𝒞y
is orthogonal to every vector in V and, consequently, to itself. The latter
implies that ℬy - 𝒞y = θ and ℬy = 𝒞y. Hence, ℬ = 𝒞 since y is an ar-
bitrary vector in V. ►
From Property (a) it immediately follows that
(b) (α𝒜)* = α𝒜*, where α is an arbitrary real number;
(c) (𝒜 + ℬ)* = 𝒜* + ℬ*;
(d) (𝒜ℬ)* = ℬ*𝒜*;
(e) (𝒜*)* = 𝒜.
We shall also mention two other important properties of adjoint
operators, namely
(f) Let e be an orthonormal basis of V. Then for the linear operators
𝒜: V → V and ℬ: V → V to be mutually adjoint, i.e., to satisfy both
ℬ = 𝒜* and 𝒜 = ℬ*, it is necessary and sufficient that relative to the basis
e the matrix of one of the operators, say B = B(e), be obtained by transpos-
ing the matrix A = A(e) of the other operator, so that B = Aᵀ.
Notice that this property holds true only if A and B are matrices ar-
ranged relative to an orthonormal basis, and may fail otherwise.
(g) If a linear operator 𝒜 is nonsingular, so is the adjoint operator 𝒜*
of 𝒜, and
(𝒜⁻¹)* = (𝒜*)⁻¹.

6.15 Symmetric Operators


A linear operator 𝒜 is said to be symmetric (or self-adjoint) if 𝒜
is identical to its adjoint operator 𝒜*, so that 𝒜* = 𝒜.
Notice that by virtue of Property (f) the matrix of a symmetric operator
relative to an orthonormal basis is also symmetric and remains unchanged
when being transposed. Therefore a symmetric operator is adjoint to itself
and may also be called a self-adjoint operator.
Example. Let 𝒫 be a linear operator that defines an orthogonal projec-
tion of a three-dimensional Euclidean space with the induced Cartesian
coordinate system Oxyz onto the xy-plane (Fig. 6.18).
Relative to the orthonormal basis i, j, k the matrix of 𝒫 takes the sym-
metric form

( 1 0 0 )
( 0 1 0 )
( 0 0 0 ),
since 𝒫i = i, 𝒫j = j and 𝒫k = θ. This means that the operator 𝒫 is sym-
metric.

Fig. 6.18

Properties of symmetric operators. It is worth mentioning some remark-
able properties of symmetric operators. The first two properties given below
are direct consequences of the definition of a symmetric operator.
(a) For a linear operator 𝒜: V → V to be symmetric it is necessary and
sufficient that for any vectors x and y in V there holds
(𝒜x, y) = (x, 𝒜y). (6.32)
(b) For a linear operator to be symmetric it is necessary and sufficient
that relative to an orthonormal basis its matrix be symmetric.
(c) The characteristic polynomial of a symmetric operator (and of the as-
sociated symmetric matrix) has only real roots.
Recall that any real root λ of the characteristic polynomial is an eigen-
value of the corresponding linear operator 𝒜, i.e., there exists a nonzero
vector x (the eigenvector of 𝒜) such that 𝒜x = λx.
(d) The eigenvectors of a symmetric operator corresponding to distinct
eigenvalues are orthogonal.
◄ Let x1 and x2 be eigenvectors of 𝒜 so that 𝒜x1 = λ1x1 and
𝒜x2 = λ2x2, and let λ1 ≠ λ2.
Since 𝒜 is a symmetric operator we have
(𝒜x1, x2) = (x1, 𝒜x2).
On the other hand,
(𝒜x1, x2) = (λ1x1, x2) = λ1(x1, x2)

and
(x1, 𝒜x2) = (x1, λ2x2) = λ2(x1, x2).
Whence
λ1(x1, x2) = λ2(x1, x2)
and (λ1 - λ2)(x1, x2) = 0.
Since λ1 - λ2 ≠ 0 we arrive at
(x1, x2) = 0. ►
(e) Let 𝒜: V → V be a symmetric operator. Then in V there exists an
orthonormal basis e = (e1, e2, ..., en) comprising the eigenvectors of 𝒜,
so that
𝒜eᵢ = λᵢeᵢ (i = 1, 2, ..., n),
(eᵢ, eⱼ) = δᵢⱼ (i, j = 1, 2, ..., n).
Turning back to the previous example we easily see that the triple
(i, j, k) is the desired orthonormal basis in V since the vectors i and j are
the eigenvectors of 𝒫 corresponding to the eigenvalue 1 (of multiplicity 2)
and k is the eigenvector corresponding to the eigenvalue 0.
(f) If a nonsingular operator 𝒜: V → V is symmetric, so is its inverse
𝒜⁻¹: V → V.
Remark. All the eigenvalues of a nonsingular operator are distinct from
zero. Indeed, if λ is an eigenvalue of a nonsingular operator 𝒜, then
λ ≠ 0 and 1/λ is an eigenvalue of the inverse operator 𝒜⁻¹.
We shall say that a symmetric operator 𝒜 is positive if given any non-
zero vector x in V there holds (𝒜x, x) > 0.
Properties of positive operators. (a) A symmetric operator 𝒜: V → V
is positive if and only if all the eigenvalues of 𝒜 are positive.
(b) A positive operator is nonsingular.
(c) If an operator 𝒜 is positive, so is its inverse.

6.16 Quadratic Forms


Let A = (αᵢⱼ) be a symmetric matrix of order n, so that αᵢⱼ = αⱼᵢ.
Then the expression
Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱξʲ (6.33)
is said to be the quadratic form in the variables ξ¹, ξ², ..., ξⁿ. The matrix
A is called the associated matrix of the quadratic form.
The quadratic trinomial ax² + 2bxy + cy², where a, b and c are real
numbers, serves as an example of a quadratic form in two variables x
and y, the associated matrix being
( a b )
( b c ).

The n-tuple of numbers ξ¹, ξ², ..., ξⁿ may be regarded as the coor-
dinates of a vector x in an n-dimensional real space V relative to a given
basis, so that
x = ξ¹e1 + ξ²e2 + ... + ξⁿen,
where e = (e1, e2, ..., en) is an orthonormal basis of V. Then (6.33) be-
comes a numeric function of a vector-valued argument x defined over the
space V. This function is customarily written as
𝒜(x, x) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱξʲ. (6.34)

We shall also say that the function 𝒜(x, x) is defined in an n-dimensional
Euclidean space V.
We may also associate with any quadratic form 𝒜(x, x) the bilinear
form
𝒜(x, y) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱηʲ, (6.35)
where η¹, η², ..., ηⁿ are the coordinates of the vector y relative to the
orthonormal basis e = (e1, e2, ..., en), so that
y = η¹e1 + η²e2 + ... + ηⁿen.
The form (6.35) is called bilinear since it is linear in both the argument
x and the argument y, so that
𝒜(α1x1 + α2x2, y) = α1𝒜(x1, y) + α2𝒜(x2, y)
and
𝒜(x, β1y1 + β2y2) = β1𝒜(x, y1) + β2𝒜(x, y2),
where α1, α2, β1 and β2 are arbitrary numbers.
The bilinear form (6.35) is symmetric since its value is independent of
the order in which x and y occur in (6.35), i.e.,
𝒜(y, x) = 𝒜(x, y).
Computing the value of 𝒜(x, y) for the base vectors, i.e., for x = ek,
y = em, we obtain
𝒜(ek, em) = αₖₘ. (6.36)
Whence it follows that the elements of the associated matrix A of the quad-
ratic form (6.34) are the values of the bilinear form computed for the vec-
tors of the basis e.
The scalar product of two vectors in an n-dimensional coordinate space ℝⁿ,
(ξ, η) = ξ¹η¹ + ξ²η² + ... + ξⁿηⁿ,
where ξ = (ξ¹, ξ², ..., ξⁿ) ∈ ℝⁿ and η = (η¹, η², ..., ηⁿ) ∈ ℝⁿ, is a bilinear
form.
The associated quadratic form
|ξ|² = (ξ, ξ) = (ξ¹)² + (ξ²)² + ... + (ξⁿ)²
defines the square of the length of the vector ξ.
The coordinates of a vector x relative to a different basis are different
and so is the matrix of the quadratic form.
In a variety of applications we need to simplify a quadratic form by
converting it to a diagonal or normal form.
A quadratic form is said to be of diagonal form if the coefficients in
all the products ξⁱξʲ with i ≠ j are equal to zero. In other words, a quadratic
form 𝒜(x, x) is of diagonal form if αᵢⱼ = 0 for all i ≠ j and
𝒜(x, x) = α₁₁(ξ¹)² + α₂₂(ξ²)² + ... + αₙₙ(ξⁿ)².
The associated matrix is also diagonal, i.e., it has α₁₁, α₂₂, ..., αₙₙ on the
main diagonal and zeros elsewhere.

Theorem 6.16. For any quadratic form defined over a Euclidean space
there exists an orthonormal basis relative to which the associated matrix
becomes diagonal.
◄ To prove this theorem we shall use arguments that follow from
properties of symmetric operators.
We choose an orthonormal basis e = (e1, e2, ..., en) and consider the
linear operator 𝒜: V → V such that, relative to e, the matrix (αᵢʲ) of 𝒜
is identical to the matrix (αᵢⱼ) of the given quadratic form, i.e., αᵢʲ = αᵢⱼ.
Since (αᵢʲ) is symmetric, so is the operator 𝒜.
Let us compute (𝒜x, x). Since the basis e is orthonormal we have
(𝒜eᵢ, eⱼ) = αᵢʲ = αᵢⱼ,
and
(𝒜x, x) = (𝒜(Σᵢ ξⁱeᵢ), Σⱼ ξʲeⱼ)
= Σᵢ Σⱼ ξⁱξʲ(𝒜eᵢ, eⱼ) = Σᵢ Σⱼ αᵢⱼξⁱξʲ = 𝒜(x, x).
Whence we infer that the quadratic form 𝒜(x, x) defined over a Euclidean
space V and the symmetric operator 𝒜 acting in V are related as
𝒜(x, x) = (𝒜x, x). (6.37)
Recall that for any symmetric operator and, in particular, for 𝒜 in V
there exists an orthonormal basis f = (f1, f2, ..., fn) comprising the eigen-

vectors of 𝒜, so that
𝒜fk = λkfk (k = 1, 2, ..., n); (fk, fm) = δkm. (6.38)
Notice that
(𝒜fk, fm) = (λkfk, fm) = λkδkm = λk for k = m and 0 for k ≠ m.
Substituting the expansion of x
x = Σₖ₌₁ⁿ ηᵏfk
relative to the basis f = (f1, f2, ..., fn) into (𝒜x, x), we have
(𝒜x, x) = (𝒜(Σₖ ηᵏfk), Σₘ ηᵐfm)
= Σₖ Σₘ ηᵏηᵐ(𝒜fk, fm) = Σₖ λk(ηᵏ)².
Whence (6.37) yields
𝒜(x, x) = Σₖ₌₁ⁿ λk(ηᵏ)². (6.39)
Thus, the matrix A(f) of the original quadratic form becomes diagonal
relative to the basis f, so that A(f) has λ1, λ2, ..., λn on the main diagonal
and zeros elsewhere. ►


We may convert a quadratic form to diagonal form without making
explicit computations of the base vectors of f. It suffices to compute the
eigenvalues of the corresponding linear operator or, equivalently, the eigen-
values of the associated matrix A = (αᵢⱼ), counted with their multiplicities.
Example. Reduce the quadratic form
𝒜(x, x) = 2xy + 2yz + 2xz
to the diagonal form.
◄ The associated matrix is
( 0 1 1 )
( 1 0 1 )
( 1 1 0 ).
To find the eigenvalues we must solve
det ( -λ  1  1 )
    (  1 -λ  1 ) = -λ³ + 3λ + 2 = 0,
    (  1  1 -λ )
yielding λ1 = 2 and λ2,3 = -1.

Thus we have
𝒜(x, x) = 2x̄² - ȳ² - z̄².
It is much harder to compute the desired orthonormal basis. To this
end we shall find the eigenvectors of the symmetric operator 𝒜 that are
identical to the eigenvectors of the matrix of the quadratic form 𝒜(x, x).
Let λ = 2. Consider the homogeneous linear system specified by the
coefficient matrix
( -2  1  1 )
(  1 -2  1 )
(  1  1 -2 ).
We have
-2x + y + z = 0,
x - 2y + z = 0,
x + y - 2z = 0.
All solutions of the system are proportional to the vector (1, 1, 1)ᵀ.
Hence, the unit vector is ī = (1/√3, 1/√3, 1/√3)ᵀ.
Let λ = -1. The homogeneous linear system defined by the coefficient
matrix
( 1 1 1 )
( 1 1 1 )
( 1 1 1 )
has two linearly independent solutions and we have to choose them so that
they become orthogonal.
The system reduces to the single equation x + y + z = 0. Then the
desired solutions are (1, -2, 1)ᵀ and (1, 0, -1)ᵀ and the unit vectors are
j̄ = (1/√6, -2/√6, 1/√6)ᵀ and k̄ = (1/√2, 0, -1/√2)ᵀ.
It is easy to verify that both the vector j̄ and the vector k̄ are orthogonal
to the vector ī. (Notice that this result also follows from Property (d) of
a symmetric operator.)
Then the desired orthonormal basis comprises the vectors
ī = (i + j + k)/√3,  j̄ = (i - 2j + k)/√6,  k̄ = (i - k)/√2. ►
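Numerically the same diagonalization is one call to a symmetric eigensolver; the sketch below (Python/NumPy, illustrative) recovers the eigenvalues 2, -1, -1 and an orthonormal eigenbasis for the form 2xy + 2yz + 2zx.

```python
import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])     # associated matrix of 2xy + 2yz + 2zx

eigenvalues, Q = np.linalg.eigh(A)  # eigh: symmetric matrix, orthonormal eigenvectors
print(eigenvalues)                  # [-1. -1.  2.]

# in the new coordinates the form is diagonal: Q^T A Q = diag(eigenvalues)
print(np.round(Q.T @ A @ Q, 10))
```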
Remark. We may accept any n-dimensional Euclidean space as V.
However, of practical interest is the coordinate space ℝⁿ whose elements are
all possible ordered n-tuples of real numbers ξ = (ξ¹, ξ², ..., ξⁿ). The basis
of ℝⁿ comprises the vectors (1, 0, ..., 0, 0), (0, 1, ..., 0, 0), ..., (0, 0,
..., 0, 1), relative to which the scalar product of two vectors ξ = (ξ¹, ξ²,
..., ξⁿ) and η = (η¹, η², ..., ηⁿ) is given by the formula
(ξ, η) = ξ¹η¹ + ξ²η² + ... + ξⁿηⁿ.
We shall describe the procedure that enables us to choose the basis rela-
tive to which a given quadratic form specified over an n-dimensional coor-
dinate space becomes diagonal.
◄ Let 𝒜(x, x) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱξʲ be a given quadratic form.
Step 1. Write down the associated matrix
( α11 α12 ... α1n )
( α21 α22 ... α2n )
( ................ )
( αn1 αn2 ... αnn ).
Step 2. Solve the polynomial equation
det ( α11 - t   α12      ...  α1n      )
    ( α21       α22 - t  ...  α2n      ) = 0,
    ( .................................)
    ( αn1       αn2      ...  αnn - t  )
yielding the eigenvalues of the associated matrix of 𝒜(x, x).
Arrange the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λn, their multiplicities counted.
(Notice that all the eigenvalues are real since the matrix is symmetric.)
Step 3. Let λ be a root of multiplicity k. Then the homogeneous linear
system specified by the coefficient matrix
( α11 - λ  α12      ...  α1n     )
( α21      α22 - λ  ...  α2n     )
( ...............................)
( αn1      αn2      ...  αnn - λ )
has exactly k linearly independent solutions that form the fundamental sys-
tem of solutions. On orthogonalizing (if necessary) and normalizing the solu-
tions we obtain k pairwise orthogonal unit vectors.
Repeating this process for the other eigenvalues, we obtain exactly n
pairwise orthogonal unit vectors that comprise the orthonormal basis f1,
f2, ..., fn of ℝⁿ. Notice that the vectors corresponding to distinct eigen-
values are orthogonal by virtue of Property (d) of a symmetric operator.
Step 4. Write down 𝒜(x, x), relative to the basis f = (f1, f2, ..., fn),
in the diagonal form
𝒜(x, x) = λ1(η¹)² + λ2(η²)² + ... + λn(ηⁿ)²,
where x = η¹f1 + η²f2 + ... + ηⁿfn. ►

Definition. The quadratic form
𝒜(x, x) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱξʲ (6.40)
is called positive-definite if given any nonzero vector x or, equivalently,
given any nonzero n-tuple ξ¹, ξ², ..., ξⁿ, there holds
𝒜(x, x) > 0.
The scalar square of an arbitrary vector ξ = (ξ¹, ξ², ..., ξⁿ) of an n-
dimensional coordinate space, given by the formula
(ξ, ξ) = (ξ¹)² + (ξ²)² + ... + (ξⁿ)²,
is an example of a positive-definite quadratic form.
On reducing the positive-definite quadratic form 𝒜(x, x) to the di-
agonal form, we have
𝒜(x, x) = λ1(η¹)² + λ2(η²)² + ... + λn(ηⁿ)²,
where λ1 > 0, λ2 > 0, ..., λn > 0.
Criterion for a quadratic form to be positive-definite. The quadratic
form 𝒜(x, x) is positive-definite if and only if the leading minors of the
associated matrix A, i.e., the minors cut out of the left-hand upper corner
of A, are all positive, i.e.,
α11 > 0,   det ( α11 α12 ) > 0,   ...,
               ( α12 α22 )
det ( α11 α12 ... α1k )
    ( α21 α22 ... α2k )  > 0,   ...,
    ( ................ )
    ( α1k α2k ... αkk )
det ( α11 α12 ... α1n )
    ( α12 α22 ... α2n )  > 0.
    ( ................ )
    ( α1n α2n ... αnn )
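The criterion is easy to apply mechanically: compute the determinants of the nested upper-left submatrices. A small sketch (Python/NumPy, illustrative) checks it against the sign of the eigenvalues for the form 2x² + 5y² + 2z² - 4xy - 2xz + 4yz from Exercise 16 below.

```python
import numpy as np

A = np.array([[ 2.0, -2.0, -1.0],
              [-2.0,  5.0,  2.0],
              [-1.0,  2.0,  2.0]])   # matrix of 2x^2 + 5y^2 + 2z^2 - 4xy - 2xz + 4yz

# leading (upper-left) minors
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(np.round(minors, 6))           # [2. 6. 7.]  -> all positive

# cross-check: a symmetric matrix is positive-definite iff all eigenvalues are positive
print(np.all(np.linalg.eigvalsh(A) > 0))   # True
```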

Diagonalization of a quadratic form by completing the square. We shall
explain the procedure which is useful to convert a quadratic form to a di-
agonal form and, in particular, to decide on definiteness of quadratic
forms.
◄ Let
𝒜(x, x) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢⱼξⁱξʲ
be a given quadratic form and let α₁₁ ≠ 0.
By simple algebra we can reduce the sum of all terms involving ξ¹
to the form

α₁₁(ξ¹)² + 2α₁₂ξ¹ξ² + ... + 2α₁ₙξ¹ξⁿ
= α₁₁((ξ¹)² + 2(α₁₂/α₁₁)ξ¹ξ² + ... + 2(α₁ₙ/α₁₁)ξ¹ξⁿ)
= α₁₁(ξ¹ + (α₁₂/α₁₁)ξ² + ... + (α₁ₙ/α₁₁)ξⁿ)² - Σᵢ₌₂ⁿ Σⱼ₌₂ⁿ (α₁ᵢα₁ⱼ/α₁₁)ξⁱξʲ.
Setting
η¹ = ξ¹ + (α₁₂/α₁₁)ξ² + ... + (α₁ₙ/α₁₁)ξⁿ,
ηᵏ = ξᵏ (k = 2, 3, ..., n),
we obtain
𝒜(x, x) = α₁₁(η¹)² + Σᵢ₌₂ⁿ Σⱼ₌₂ⁿ α'ᵢⱼηⁱηʲ,
where
α'ᵢⱼ = αᵢⱼ - α₁ᵢα₁ⱼ/α₁₁ (i, j = 2, 3, ..., n).
We look now at
𝒜₁(x, x) = Σᵢ₌₂ⁿ Σⱼ₌₂ⁿ α'ᵢⱼηⁱηʲ.

It is easy to see that 𝒜₁(x, x) is a quadratic form in (n - 1) variables and
can also be represented as the sum of the square of one variable and a
quadratic form in the other (n - 2) variables. Thus, repeating this process
of "completing the square", we finally arrive at the desired diagonal form
of 𝒜(x, x).
If α₁₁ = 0 but some αᵢᵢ (2 ≤ i ≤ n) is distinct from zero, we start the process
by completing the square of ξⁱ.
Now we suppose that in 𝒜(x, x) all the coefficients in the squares of ξⁱ
(i = 1, 2, ..., n) are equal to zero, i.e., α₁₁ = α₂₂ = ... = αₙₙ = 0.
Then by the substitution
ξ¹ = η¹ + η²,
ξ² = η¹ - η²,
ξᵏ = ηᵏ (k = 3, 4, ..., n)
the quadratic form 𝒜(x, x) is reduced so that we again have the general
case. Indeed, by this substitution the term 2α₁₂ξ¹ξ² is reduced to 2α₁₂(η¹)² -
2α₁₂(η²)². ►
Example. By completing the square reduce the quadratic form
𝒜(x, x) = 2xy + 2yz + 2zx
to the diagonal form.
◄ By the substitution
x = x̃ + ỹ, y = x̃ - ỹ, z = z̃,
𝒜(x, x) is reduced to the form
𝒜(x, x) = 2x̃² - 2ỹ² + 4x̃z̃ = 2(x̃² + 2x̃z̃) - 2ỹ²
= 2(x̃ + z̃)² - 2ỹ² - 2z̃².
Set x̂ = x̃ + z̃, ŷ = ỹ, ẑ = z̃. Then
𝒜(x, x) = 2x̂² - 2ŷ² - 2ẑ². ►
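The substitution used in this example can be checked numerically, and the signs of the resulting squares can be compared with the orthogonal reduction 2x̄² - ȳ² - z̄² obtained earlier; a brief sketch (Python/NumPy, illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal(3)

original = 2*x*y + 2*y*z + 2*z*x

# the substitution used above: x = xt + yt, y = xt - yt, z = zt
xt, yt, zt = (x + y) / 2, (x - y) / 2, z
reduced = 2*(xt + zt)**2 - 2*yt**2 - 2*zt**2
print(np.isclose(original, reduced))     # True: the two expressions agree

# sign pattern matches the orthogonal reduction found earlier:
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(np.linalg.eigvalsh(A))             # [-1. -1.  2.]: one positive, two negative
```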
Remark. The major shortcoming of the process of completing the
square is that it involves transformations of coordinates which are not
orthogonal, that is, the new basis vectors are, in general, not pairwise orthogonal.
On comparing the diagonal forms of 2xy + 2yz + 2zx obtained by ap-
plying the procedure that involves identification of an orthonormal basis
and the procedure of completing the square we easily see that in both cases
the number of positive terms remains unchanged and so does the num-
ber of negative terms. This is an important property of quadratic forms
called the law of inertia which states that for any quadratic form the num-
ber of positive terms remains the same in all its diagonal forms and so
does the number of negative terms and the number of zero terms. Thus
these numbers are independent of procedures applied to reduce a given
quadratic form to a diagonal form.

6.17 Classification of Curves and Surfaces
of the Second Order
We are now well equipped to turn back to the analysis of the general
equations of curves and surfaces of the second order which we have en-
countered in Chap. 4.
Plane curves. The general equation of the second-order curve in the
xy-plane is
ax² + 2bxy + cy² + 2dx + 2ey + f = 0,
where a² + b² + c² > 0.
The associated matrix of the quadratic form ax² + 2bxy + cy² is
( a b )
( b c ).
Computing the roots λ1 and λ2 of the characteristic polynomial and
the corresponding orthonormal eigenvectors ī and j̄, we can use ī and j̄
as the unit vectors of the new coordinate axes, the x̄- and ȳ-axes, respectively

(Fig. 6.19). Then the original equation becomes
λ1x̄² + λ2ȳ² + 2d̄x̄ + 2ēȳ + f = 0.
Two cases have to be distinguished: (1) λ1·λ2 ≠ 0 and (2) either λ1 or
λ2 is equal to zero.
(1) By the translation
X = x̄ + d̄/λ1,  Y = ȳ + ē/λ2
the equation is reduced to the form
λ1X² + λ2Y² + f̄ = 0.


Fig. 6.19

In a way similar to that we have followed in Chap. 4 we consider all
possible combinations of signs of λ1, λ2 and f̄ and finally arrive at the equations
which specify an ellipse, a hyperbola, a pair of intersecting lines, a point
and an empty set in the xy-plane.
(2) For definiteness we set λ1 = 0 and λ2 ≠ 0. Then by the translation
X = x̄ + a,  Y = ȳ + ē/λ2
the equation
λ2ȳ² + 2d̄x̄ + 2ēȳ + f = 0
is reduced to the equation
λ2Y² + 2d̄X + f̄ = 0.

If d̄ ≠ 0 we put a = f̄/(2d̄), thus arriving at the equation of a parabola
λ2Y² + 2d̄X = 0.
If d̄ = 0 we put a = 0, thus obtaining the equation
λ2Y² + f̄ = 0,
which specifies either a pair of parallel lines or a pair of coinciding lines
or an empty set, corresponding to different signs of f̄/λ2.
Remark. Computation of the roots of the characteristic polynomial
and of the corresponding orthonormal eigenvectors is used here instead of
a suitable rotation of coordinate axes employed in Chap. 4 to eliminate
the term 2bxy from the general equation.
Surfaces of the second order. The general equation is
a11x² + 2a12xy + 2a13xz + a22y² + 2a23yz + a33z²
+ 2a14x + 2a24y + 2a34z + a44 = 0,
where a11² + a12² + a13² + a22² + a23² + a33² > 0.
To simplify the quadratic form involved in the general equation we con-
sider the associated matrix
( a11 a12 a13 )
( a12 a22 a23 )
( a13 a23 a33 )
and compute the roots λ1, λ2 and λ3 of the characteristic polynomial
det ( a11 - t   a12       a13     )
    ( a12       a22 - t   a23     ) = 0.
    ( a13       a23       a33 - t )
Let ī, j̄ and k̄ be the orthonormal eigenvectors of the associated matrix.
We accept ī, j̄, k̄ as the unit vectors of the new coordinate axes, the x̄-,
ȳ- and z̄-axes, respectively, relative to which the general equation takes the
form
λ1x̄² + λ2ȳ² + λ3z̄² + 2ā14x̄ + 2ā24ȳ + 2ā34z̄ + a44 = 0.
Three cases have to be distinguished:
(1) All three roots λ1, λ2, λ3 are distinct from zero.
(2) Only one root is equal to zero (for definiteness we set λ3 = 0).
(3) Two roots are equal to zero (for definiteness we set λ3 = λ2 = 0).
Remark. The roots λ1, λ2, λ3 are never all equal to zero simultaneously.
We shall consider these three cases separately.

Case 1. By the translation
X = x̄ + ā14/λ1,  Y = ȳ + ā24/λ2,  Z = z̄ + ā34/λ3
the general equation is reduced to the form
λ1X² + λ2Y² + λ3Z² + ā44 = 0.
(i) If ā44 ≠ 0 and
(a) if λ1, λ2 and λ3 are of the same sign, which is opposite to that of
ā44, we have the equation of the ellipsoid
X²/a² + Y²/b² + Z²/c² = 1,
where a² = -ā44/λ1, b² = -ā44/λ2, c² = -ā44/λ3;
(b) if the signs of λ1 and λ2 are opposite to that of ā44, and λ3 and
ā44 are of the same sign, we have the equation of the hyperboloid of one
sheet
X²/a² + Y²/b² - Z²/c² = 1,
where a² = -ā44/λ1, b² = -ā44/λ2, c² = ā44/λ3;
(c) if λ1, λ2 and ā44 are of the same sign and the sign of λ3 is opposite
to that of ā44, we have the equation of the hyperboloid of two sheets
X²/a² + Y²/b² - Z²/c² = -1,
where a² = ā44/λ1, b² = ā44/λ2, c² = -ā44/λ3.
(ii) If ā44 = 0 and
(a) if λ1, λ2 and λ3 are of the same sign, the equation defines the point
(0, 0, 0) in space;
(b) if any two roots are of the same sign which is opposite to the sign
of the third root, we have the equation of the cone of the second order
X²/a² + Y²/b² - Z²/c² = 0.

Case 2. By the translation
X = x̄ + ā14/λ1,  Y = ȳ + ā24/λ2,  Z = z̄
the general equation is reduced to the form
λ1X² + λ2Y² + 2ā34Z + ā44 = 0. (*)

(i) If ā34 ≠ 0, then by the translation
X = X,  Y = Y,  Z = Z + ā44/(2ā34)
the equation (*) is reduced to the form
λ1X² + λ2Y² + 2ā34Z = 0,
in which case
(a) if λ1 and λ2 are of the same sign, we have the equation of the elliptic
paraboloid
2Z = X²/p + Y²/q,
where p = -ā34/λ1 > 0 and q = -ā34/λ2 > 0.
Notice that we have assumed that the sign of ā34 is opposite to that
of λ1 and λ2. We can always make this assumption by reflecting the z-axis,
if necessary.
(b) If λ1 and λ2 are of opposite signs, we have the equation of the
hyperbolic paraboloid
2Z = X²/p - Y²/q,
where p = -ā34/λ1 > 0 and q = ā34/λ2 > 0.
(ii) If ā34 = 0 then the equation (*) becomes
λ1X² + λ2Y² + ā44 = 0,
which defines a family of cylindrical surfaces whose directing lines lie in
the XY-plane and are given by the equation
λ1X² + λ2Y² + ā44 = 0.
The following table shows the classification of the corresponding cylin-
drical surfaces.
Case 3. By the translation
X = x̄ + ā14/λ1,  Y = ȳ,  Z = z̄
the general equation is reduced to the form
λ1X² + 2ā24Y + 2ā34Z + ā44 = 0.
(i) If ā24 ≠ 0 and ā34 ≠ 0, then we can always reduce this equation to
the form with ā24 ≠ 0 and ā34 = 0. It suffices to transform the coordinates

Table 6.1

λ1·λ2 > 0, λ1·ā44 < 0:   X²/a² + Y²/b² = 1    (elliptic cylinder)
λ1·λ2 > 0, λ1·ā44 > 0:   X²/a² + Y²/b² = -1   (empty set)
λ1·λ2 > 0, ā44 = 0:      X²/a² + Y²/b² = 0    (z-axis)
λ1·λ2 < 0, ā44 ≠ 0:      X²/a² - Y²/b² = 1    (hyperbolic cylinder)
λ1·λ2 < 0, ā44 = 0:      X²/a² - Y²/b² = 0    (pair of intersecting lines)

as
Ŷ = (ā24Y + ā34Z)/β,  Ẑ = (-ā34Y + ā24Z)/β.
The equation becomes
λ1X² + 2βŶ + ā44 = 0,
where β = √(ā24² + ā34²).
Notice that we have to choose the transformation of coordinates such
that the new system of coordinates will be a Cartesian one.
By the translation
X̂ = X,  ŷ = Ŷ + ā44/(2β),  ẑ = Ẑ
the last equation is transformed to the equation of the parabolic cylinder
X̂² = 2pŷ, where p = -β/λ1.
(ii) If ā24 = ā34 = 0 we have the equation
λ1X² + ā44 = 0,
which defines a pair of parallel planes when λ1·ā44 < 0, a pair of coinciding
planes when ā44 = 0, or an empty set when λ1·ā44 > 0.

Exercises

1. Define the linear span generated by the polynomials 1 + t²,
t + t², 1 + t + t².
2. Determine whether the vectors x1, X2, X3 are linearly dependent or not:
(a) X1 = (1, 2, 3), X2 = (4, 5, 6), X3 = (7, 8, 9), (b) X1 = (1, 4, 7, 10), X2 =
(2, 5, 8, 11), X3 = (3, 6, 9, 12).
3. Show that the vectors x1 = (1, 1, 1), x2 = (1, 1, 0), X3 = (0, 1, -1) form
the basis of the linear space IR 3 •
4. Complement the collection of two vectors (1, 1, 0, 0) and (0, 0, 1, 1)
to get the basis of the linear space IR4 •
5. Verify that the vectors (2, 2, -1), (2, -1, 2), ( -1, 2, 2) form a basis
of the linear space IR 3 and find the coordinates of the vector x =
(3, 3, 3) relative to the basis of IR 3 •
6. Define the dimension and a basis of the linear span generated by the
vectors x1 = (1, 2, 2, -1), x2 = (2, 3, 2, 5), X3 = ( - 1, 4, 3, -1), X4 =
(2, 9, 3, 5) of the linear space IR4 •
7. Compute the angle between the vectors (2, -1, 3, - 2) and (3, 1, 5, 1)
in the Euclidean space IR4 •
8. Apply the procedure of orthogonalization to the vectors (1, - 2, 2),
(-1, 0, -1) and (5, -3, -7) in IR 3 •
9. Let L be the subspace generated by the vectors (1, 3, 3, 5), (1, 3, -5, -3),
(1, -5, 3, -3). Find the L-component of the vector x and the orthogonal
component of x with respect to L if x = (2, -5, 3, 4).
10. Let 𝒜 be an operator defined over the 3-dimensional Euclidean space
such that 𝒜x = (x, a)a, where a is a given vector. Prove that 𝒜 is a linear
operator.
11. Let 𝒜 be a linear operator that maps an arbitrary vector x = (x1, x2,
x3) as 𝒜x = (2x1 - x2 - x3, x1 - 2x2 + x3, x1 + x2 - 2x3). Find the
image, kernel, rank and nullity of 𝒜.
12. Find the matrix of the differential operator defined over the
2-dimensional linear space generated by the base functions
φ(t) = eᵗ cos t and ψ(t) = eᵗ sin t.
13. Let
( 0 0 1 )
( 0 1 0 )
( 1 0 0 )
be the matrix of an operator 𝒜 relative to the basis 1, t, t² of the space M2.
Find the matrix of 𝒜 relative to the basis formed by the polynomials
3t² + 2t, 5t² + 3t + 1, 7t² + 5t + 3.

14. Compute the eigenvectors and the eigenvalues of the operators defined
by the matrices

(a) (i ~). (b) G -2)


-1

-1
1 -2 .
1
15. Let an operator define the rotation of a plane through the angle ; .
Find the operator adjoint to the given one.
16. Convert the quadratic form 2x² + 5y² + 2z² - 4xy - 2xz + 4yz to the
diagonal form.
17. Specify what surfaces are given by the equations
(a) 7x² + 6y² + 5z² - 4xy - 4yz - 6x - 24y - 18z + 30 = 0;
(b) x² + 5y² + z² + 2xy + 6xz + 2yz - 2x + 6y + 2z = 0;
(c) 5x² - y² + z² + 4xy + 6xz + 2x + 4y + 6z - 8 = 0.

Answers
1. The collection of polynomials of degree not exceeding 2. 2. (a) yes; (b) yes.

4. For example, (0, 1, 0, 0), (0, 0, 1, 0). 5. (1, 1, 1). 6. 4; x1, x2, x3, x4. 7. π/4.
8. (1/3, -2/3, 2/3), (-2/3, -2/3, -1/3), (2/3, -1/3, -2/3).
9. y = (0, -3, 5, 2), z = (2, -2, -2, 2).
11. The basis of the image is y1 = (2, 1, 1), y2 = (-1, -2, 1). The basis of the kernel is
z = (1, 1, 1). The rank is 2. The nullity is 1.
12. The matrix with rows (1, 1), (-1, 1).
13. The matrix with rows (1, 0, 0), (-15/4, -4, -5), (9/4, 3, 4).

14. (a)A, = ,, = 2; (b) ,, = 1, ,, = 2, ,, = 3 and G) •(~): (:). G). (i)


15. The adjoint operator defines the rotation through the negative of the given angle.
16. x̃² + 7ỹ² + z̃²; x = x̃/√2 + ỹ/√6 + z̃/√3, y = -2ỹ/√6 + z̃/√3, z = x̃/√2 - ỹ/√6 - z̃/√3.
17. (a) ellipsoid x̃²/2 + ỹ² + z̃²/(2/3) = 1; (b) hyperboloid of one sheet
x̃²/(1/3) + ỹ²/(1/6) - z̃²/(1/2) = 1; (c) hyperbolic paraboloid x̃²/(4/(7√14)) - ỹ²/(2/√14) = 2z̃.
Chapter 7
An Introduction to Analysis

7.1 Basic Concepts


Sets. A set is a basic undefined concept in mathematics. We shall be
content with the understanding that a set is a group or a collection of well-
defined distinguishable objects which are thought of as a whole. It may
be a set of letters printed on this page, a set of grains of sand on the
seashore, a set of all roots of an equation or a set of all even numbers.
Each object in a set is called an element or a member of the set. To signify
that an element a is contained in a set A we write a ∈ A. The notation
a ∉ A means that a does not belong to A.
Let A and B be two sets. If every element in A is also contained in
B we say that A is a subset of B and write A ⊂ B. For instance, if Z is
the set of all whole numbers and Z' is the set of all even numbers then Z' ⊂ Z.
Notice that always A ⊂ A.
If A ⊂ B and B ⊂ A, i.e., if every element in A is also contained in
B and vice versa, we say that A and B are equal and write A = B. This
a set by listing all the elements of the set enclosed in braces. The sets
A = {a}, A = {a, b}, A = {a, b, c}
consist of just one element a, two elements a and b, and three elements
a, b and c, respectively. Sometimes it is impossible or impractical to list
all elements of the set. In this case three dots are used to represent unlisted
elements, e.g.,
A = {a, b, c, ...}
is a set that consists of a, b, c and some other elements. To define the
unlisted elements we shall use a written description that must fit all. ele-
ments of the set and only elements of the set. For example, we shall write
the set of natural numbers {1, 2, 3, ...},
the set of squares of natural numbers {1, 4, 9, ...},
the set of primes {2, 3, 5, 7, ...}.
If A ⊂ B and A ≠ B, A is called a proper subset of B.

Sometimes we do not know in advance whether a set contains at least


a single element or not. So it is helpful to introduce the notion of an empty
set, i.e., a set with no elements.*) We shall denote the empty set by ∅. The
empty set is a subset of any set; in other words, any set contains the empty
set as its subset.
Operations on sets. Let A and B be two sets. The union of two sets
A and B is the set C = A ∪ B of all elements contained either in A or in
B, or in both. The intersection of two sets A and B is the set C = A ∩ B
of all the elements contained both in A and in B. For example, let A =
{1, 2, 3} and B = {2, 3, 4, 5}. Then A ∪ B = {1, 2, 3, 4, 5} and A ∩ B =
{2, 3}.
If A ∩ B = ∅, A and B are said to be disjoint sets.
We can similarly define the union and intersection of any number of
sets.
Finite and infinite sets. A set is said to be finite if it has a finite number
of elements. A set of all residents of a specific town and a set of people
living on Earth are examples of finite sets.
A set is said to be infinite if it is not finite. The set ℕ = {1, 2, ...}
of all natural numbers is infinite.
Let A and B be two sets. We say that a one-to-one correspondence is
set up between A and B if each element of A is associated with an element
of B so that (i) distinct elements of A are associated with distinct elements
of B and (ii) each element of B is put into correspondence with an element
of A. If a one-to-one correspondence can be set up between
the sets A and B, these sets are called equivalent. We write A ~ B to mean
that A and B are equivalent sets.
An infinite set is said to be countable if it can be put into a one-to-one
correspondence with the set ℕ of natural numbers, i.e., if the set is equiva-
lent to ℕ. Any infinite set contains a countable subset.
It can be shown that the set of all rational numbers is countable while
the set of all real numbers is uncountable.
Real numbers. The numbers 1, 2, 3, ... are called natural numbers.
Every number which can be expressed as a fraction of the form ±m/n, where
m and n are natural numbers, and zero are rational numbers. Thus every
positive integer and every negative integer are rational numbers. All rational
numbers can be expressed as repeating decimal fractions. Unlike rational

•> It has not yet been established whether a set of natural numbers n such that the equa-
tion x" + 2 + yn + 2 = zn + 2 has positive integral solutions is empty or not. (In other words,
it has not yet been established if Fermat's last theorem is true or false.)
7.1 Basic Concepts 231

numbers, irrational numbers can be represented by infinite nonrepeating


decimal fractions. The union of rational and irrational numbers forms a
set of real numbers. It can be shown that the set of all rational numbers
is countable while the set of all real numbers is uncountable. By convention
the sets of natural, whole, rational and real numbers are denoted by ℕ,
ℤ, ℚ and ℝ, respectively. We shall not give formal definitions of basic
properties and operations on real numbers, assuming that these are
familiar to the reader from the course of high-school mathematics.
Absolute values of real numbers. Let a be a real number. The absolute
value (or modulus) of a is equal to a if a is positive and is equal to - a
if a is negative. The absolute value of zero is zero. We denote the absolute
value of a by lal and write

|a| = a if a ≥ 0,  |a| = −a if a < 0.
The inequality |x| ≤ a, where a > 0, is equivalent to the relation
−a ≤ x ≤ a.
(Show that this relation is true.)


The basic properties of the absolute values are:
(1) |a · b| = |a| · |b|,
(2) |a/b| = |a|/|b|  (b ≠ 0).

◄ Relations (1) and (2) are direct consequences of laws of multiplication


and division of real numbers and the definition of the absolute value of
a real number. ►
(3) |a + b| ≤ |a| + |b|.
◄ Indeed, it is easy to see that
−|a| ≤ a ≤ |a|,
−|b| ≤ b ≤ |b|.
Then adding the inequalities termwise, we obtain the inequality
−(|a| + |b|) ≤ a + b ≤ |a| + |b|,
which is equivalent to the desired relation |a + b| ≤ |a| + |b|. ►

(4) ||a| − |b|| ≤ |a − b|.
◄ Indeed, the relation
|a| = |(a − b) + b| ≤ |a − b| + |b|
implies that
|a| − |b| ≤ |a − b|.  (*)
Similarly, the relation
|b| = |(b − a) + a| ≤ |b − a| + |a| = |a − b| + |a|
yields
|b| − |a| ≤ |a − b|
or
|a| − |b| ≥ −|a − b|.  (**)
From (*) and (**) it follows that
−|a − b| ≤ |a| − |b| ≤ |a − b|.
Whence we obtain the desired inequality
||a| − |b|| ≤ |a − b|. ►
Absolute and relative errors. We shall introduce some notions that are
widely used whenever we apply numerical methods to compute approximate
solutions of mathematical problems.
Let a be a true value of some quantity and a* be an approximation
to a. We shall call a the exact number and a* the approximate number.
The simplest measure to estimate the precision of the approximate number
a* is the absolute error of a*. We say that a positive number Δ(a*) is the
absolute error of a* if
|a − a*| ≤ Δ(a*).  (***)
This definition of the absolute error is rather ambiguous. For example,
if both a and a* are known the absolute error of a* is exactly equal to
the absolute value of the difference between a and a*. However, we may
not know the value of a. In this case inequality (***) means that the abso-
lute value of the difference between a and a* does not exceed Δ(a*) and,
consequently, any other positive number larger than Δ(a*) may also be
regarded as the absolute error of a*.
The absolute error refers to the precision of approximation and tells
us nothing about the accuracy. For example, if we have made two measure-
ments of temperature with the same absolute error equal to 0.2 °C and
found the readings 1000 °C and 10 °C, both measurements are to the same
level of precision. However it is easy to see that the former is more accurate
than the latter.
The accuracy of an approximation refers to the relative error. We say
that a positive number δ(a*) is the relative error of the approximation a*
to a if
|a − a*|/|a*| ≤ δ(a*)  (a* ≠ 0).
The relative error is usually expressed as a percentage or a decimal fraction
of a*.
To signify that a* is the approximation to a with the absolute error
Δ(a*) we write
a = a* ± Δ(a*),
where a* and Δ(a*) are expressed as decimal expansions with the same
number of digits to the right of the decimal points. For example, the rela-
tion a = 5.272 ± 0.003 means that 5.272 − 0.003 ≤ a ≤ 5.272 + 0.003.
Similarly, if a* is the approximation to a with the relative error δ(a*)
we write
a = a*(1 ± δ(a*)),
where δ(a*) is expressed as a decimal fraction of a*.
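As an illustration, the relation a = 5.272 ± 0.003 above can be checked with a few lines of Python (a minimal sketch; the helper names are ours, not standard notation):

```python
# Absolute and relative errors of an approximation a_star to an exact value a.
def abs_error(a, a_star):
    return abs(a - a_star)

def rel_error(a, a_star):
    return abs((a - a_star) / a_star)   # definition used in the text, a_star != 0

a, a_star = 5.2745, 5.272
print(abs_error(a, a_star))   # 0.0025 <= 0.003, so Delta(a*) = 0.003 is admissible
print(rel_error(a, a_star))   # about 0.00047, i.e. roughly 0.05 per cent
```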
Significant digits. The first nonzero digit that occurs in the decimal ex-
pansion of the approximate number a* on the left and all subsequent digits
to the right of this digit are called the significant digits. For example, in
a1* = 0.2015, a2* = 23.653, a3* = 0.00201500 the significant digits are
2, 0, 1, 5;  2, 3, 6, 5, 3;  and 2, 0, 1, 5, 0, 0, respectively.
A significant digit is said to be accurate if the absolute error Δ(a*) of
the approximate number a* does not exceed the number whose decimal
expansion contains a unity in the decimal position corresponding to this
significant digit. If in the decimal expansion of a* the last accurate digit
occupies the kth position, counting from the left, the approximate number
a* is said to be accurate to k decimal positions. For example, if a* = 0.3745
and Δ(a*) = 0.0001 then all the digits in a* are accurate. If
a = 23.6538 and a* = 23.653 then the approximate number is accurate to
3 decimal positions or to 5 digits, for the absolute error Δ(a*) = 0.0008
is smaller than the number obtained by substituting a unity for the last
digit on the right and zeros for the other digits in a*, i.e.,
Δ(a*) = 0.0008 < 1 × 10^-3.
Sometimes it is helpful to apply a narrow definition of the accurate
digit, namely, a significant digit is said to be accurate if the absolute error
Δ(a*) of a* does not exceed the number whose decimal expansion contains
half a unity in the decimal position corresponding to this significant digit.
For example, if a= 14.98 then the approximate number a* = 15.00 is ac-
curate to 2 decimal positions or to 4 digits since Δ(a*) = 0.02 <
0.5 × 10^-1.
The absolute and relative errors are usually represented by numbers with
two or three significant digits.
Number line. Real numbers can be conveniently displayed as points on
a line (Fig. 7.1 ).
Suppose that we have drawn a line and fixed the direction, the origin
0 and the unit distance e for this line. To display a real number a on the
line we choose a point to the right of O such that the distance between
this point and the origin O is equal to a if the number a is positive. When
Fig. 7.1

a is negative we choose a point to the left of the origin O at the distance


equal to the absolute value lal of a. Clearly, the number a = 0 corresponds
to the origin 0. Therefore we have set up a one-to-one correspondence be-
tween the points of the line and the elements of the set of real numbers
so that each real number is associated with one and only one point on
the line and vice versa.
The line whose points are put into a one-to-one correspondence with
elements of the set of real numbers is called the number line.
We shall denote by x the point on the number line that corresponds
to the real number x.
The number line is a mathematical model which is helpful to interpret
relations between real numbers. The inequality X1 < x2 means that on the
number line x1 lies to the left of x2. The inequality x1 < X3 < X2 means
that X3 lies between X1 and x2.
Intervals. The important notions of intervals on the set of all real num-
bers are widely used in analysis.
A set of real numbers x is called
(a) the closed interval [a, b] if the inequality a ≤ x ≤ b holds for every
x in the set;
(b) the open interval (a, b) if the inequality a < x < b holds for every
x in the set;
(c) the half-open or half-closed interval (a, b] if the inequality
a < x ≤ b holds for all x in the set. (Similarly, the set of real numbers
x satisfying the inequality a ≤ x < b is also the half-open or half-closed
interval denoted by [a, b).)
We shall also consider infinite intervals by introducing points at infinity,
+∞ and −∞ being the positive and negative infinities, respectively. For
example,
(a, +∞) is the set of real numbers such that x > a,
[a, +∞) is the set of real numbers such that x ≥ a,
(−∞, b) is the set of real numbers such that x < b,
(−∞, b] is the set of real numbers such that x ≤ b,
(−∞, +∞) is the set ℝ of all real numbers.
Neighbourhoods of points. Let x0 be a point on the number line and
δ > 0 be a real number.
Any interval which contains x0 is called a neighbourhood of x0. The
interval (x0 − δ, x0 + δ), which is symmetric relative to x0, is called the δ-
neighbourhood of x0 (Fig. 7.2). Therefore the δ-neighbourhood of x0 is the
set of real numbers that satisfy the inequality |x − x0| < δ or
x0 − δ < x < x0 + δ. The set obtained from the interval (x0 − δ, x0 + δ) by
removing the point x0 is called the deleted δ-neighbourhood of x0.

Fig. 7.2

Bounded and unbounded sets. Let E be a set of real numbers. Then


the set E is called
(a) bounded above if there exists a number b such that x ≤ b for all
x ∈ E;
(b) bounded below if there exists a number a such that a ≤ x for all
x ∈ E;
(c) bounded if E is bounded above and bounded below, i.e., if there
exist numbers a and b such that
a ≤ x ≤ b for all x ∈ E.
Therefore the set E is bounded if E is contained in the closed interval [a, b].
For example, the set E = (−∞, 1] is bounded above and the set of all
natural numbers is bounded below.
A set which is not bounded above (below) is called unbounded above
(below). For example, the set of all natural numbers is unbounded above
and bounded below. The set of all negative numbers is unbounded below
and bounded above.
The sets of all integers, of all rational numbers and of all real numbers
are unbounded.
Supremum of a set. Let E be a set bounded above, i.e., let there exist
a number b such that x ≤ b for all x ∈ E. Then b is called an upper bound
of E. Any number b′ which is larger than b is also an upper bound of E.
Definition. The number M is called the supremum of E provided that
(i) for any x ∈ E there holds x ≤ M,
(ii) for any (whatever small) number ε > 0 there exists a number x* ∈ E
such that M − ε < x* ≤ M.
In other words, the supremum of E is the least upper bound of E. We
shall denote the supremum of E by M = sup E or M = sup{x: x ∈ E}.
If E is a set unbounded above we put the supremum of E equal to +∞
and write sup E = +∞.
Infimum of a set. Let E be a set bounded below, i.e., let there exist
a number a such that a ≤ x for all x ∈ E. Then a is called a lower bound
of E. Clearly, any number smaller than a is also a lower bound of E.
Definition. The number m is called the infimum of E provided that
(i) for any x ∈ E there holds x ≥ m;
(ii) for any (whatever small) number ε > 0 there exists a number x* ∈ E
such that m ≤ x* < m + ε.
Thus, the infimum of E is the greatest lower bound of E. We shall
denote the infimum of E by m = inf E or m = inf{x: x ∈ E}.
If E is a set unbounded below we put the infimum of E equal to −∞
and write inf E = −∞.
To illustrate the notions of supremum and infimum of a set we consider
the following examples. If E = [a, b] then inf E = a and sup E = b. If
E = (a, b) then again inf E = a and sup E = b. Notice that inf E and sup E
are contained in E in the former example and do not belong to E in the
latter one. For the set E = {1, 1/2, ..., 1/n, ...} we have inf E = 0 and
sup E = 1.
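The last example is easy to observe numerically: the terms 1/n come arbitrarily close to 0 without ever reaching it, while the value 1 is attained. A minimal Python sketch:

```python
# The set E = {1, 1/2, 1/3, ..., 1/n, ...}: sup E = 1 is attained, inf E = 0 is not.
E = [1.0 / n for n in range(1, 100001)]
print(max(E))   # 1.0   -- the supremum, contained in E
print(min(E))   # 1e-05 -- close to the infimum 0, but 0 itself is never reached
```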
In conclusion we state the following theorem.
Theorem 7.1. A non-empty set of real numbers which is bounded above
has a supremum and a non-empty set of real numbers which is bounded
below has an infimum.
Logical symbols and connectives. To abbreviate the notation and simpli-
fy the definitions we shall use logical symbols and connectives.
The universal quantifier ∀ is read "For every", "For any", "For each",
"For all".
The existential quantifier ∃ is read "There is", "There exist", "There
exists at least one ..." or something equivalent.
We say that a declarative sentence is a statement if it can be true or
false but not both. For example, the sentences "Mathematics is a science",
"The number 2 is smaller than the number 3" and "The number 6 is a
prime" are statements whereas the sentences "Close the door" or "How
old are you?" are not. We shall denote the statements by Greek letters α,
β, γ, etc.
The implication a ~ (3 (read "if a then /3" or "a implies /3,,) is a false
statement if and only if a, called the antecedent, is true and /3, called the
consequent, is false. We have to distinguish the implication from the cause-
and-effeet relation. V nlike the latter the implication a ~ (3 is true whenever
the antecedent a is false. In other words, a false statement implies any state-
ment, e.g., "if 2 x 2 = 5 then the unidentified flying object has landed near
your house".
The equivalence a # (3 (read "a if and only if /3") means that the state-
ments a and /3 are logically equivalent.
The conjunction a A. (3 (read "a and /3") is a compound statement made
up of the statements a and (3 connected by the conjunction and. The con-
junction a A. (3 is regarded as a true statement if and only if both a and
(3 are true.
The disjunction av /3 (read "a or /3") is a compound statement made
up of the statements a and /3 connected by the conjunction or. The disjunc-
tion is thought of as true if and only if at least one of the statements is true.
Let a be a statement. The statement a (read "not a•) is called the nega-
tion of a, a being true if a is false and vice versa.
To negate a statement that involves quantifiers we have to replace
every universal quantifier by the existential quantifier and vice versa and
replace the statement under the quantifiers by its negation; in particular,
for an implication β ⇒ γ we have ¬(β ⇒ γ) ⇔ β ∧ ¬γ.
Necessary and sufficient conditions. Let the theorem "If the statement
a is true so is the statement /3" be true. The statements a and /3 that can
be compound statements are called the hypothesis and conclusion, respec-
tively. The theorem can be symbolized as the implication a ~ (3 and can
also be expressed as
a is a sufficient condition for (3
or
/3 is a necessary condition for a.
Now we shall find out what we mean when speaking of the necessary
and sufficient conditions.
Let /3 be a statement. We say that a statement a is a sufficient condition
for (3 if a implies /3 and a is a necessary condition for (3 if a follows from (3.
Let a and /3 be two statements given as follows
a: "The number x is equal to zero",
(3: "The product xy is equal to zero".
Then a~ (3.
◄ Indeed, for xy to be equal to zero it is sufficient that x be equal to
zero. For x to be equal to zero it is necessary that xy b"e equal to zero.
238 7. An Introduction to Analysis

But /3 is not a sufficient condition for a since x can be distinct from zero
when xy is equal to zero. ►
If a and /3 are the statements each of which implies the other, i.e., a ~ /3
and /3 ~ a, we say that each of a and /3 is the necessary and sufficient
condition for the other and write
a <=> ./3.
The following expressions all mean that a is the necessary and sufficient
condition for /3 and vice versa:
(a) for a to be true it is necessary and sufficient that /3 hold;
(b) a holds if and only if /3 is satisfied;
(c) a is true if and only if /3 is true.
Mathematical induction. It is not a rare case when a statement which
is true in some particular instances turns out to be false in general. For
example, if we compute the values of 991n² + 1 for the subsequent natural
numbers 1, 2, 3, ..., 10^10 we shall fail to get at least one value which is
equal to the square of a natural number. Based upon this experience we
might conjecture that the expression 991n² + 1 will never produce squares
of natural numbers when n is natural. However this conclusion would be
false. The point is that the smallest n such that the value of 991n² + 1
becomes equal to the square of a natural number is extremely large, viz.,
n = 12055735790331359447442538767.
Against this background it seems reasonable to draw our attention to the
following problem. Let there be a statement which is true in some particular
cases. How can we prove that it is true in general without having to verify
it for each particular case that would be an impossible task?
An important tool which enables us to answer the imposed question
is the mathematical (complete) induction method based on the principle
of mathematical induction.
Principle of mathematical induction. Let α be a statement formulated
for every natural number n. Then α is true for all natural n provided that
(a) α is true for n = 1;
(b) if α is true for n = k then α is true for n = k + 1.
This principle lays down the basis of mathematical reasoning.
To illustrate how the principle of mathematical induction works we shall
prove Bernoulli's inequality:
If h > - 1, then
(1 + h)^n ≥ 1 + nh  ∀n ∈ ℕ.  (*)
Clearly, inequality (*) is true for n = 1. Assume that (*) has been proved
for n = m ≥ 1, i.e.,
(1 + h)^m ≥ 1 + mh.
Multiplying both sides of this inequality by (1 + h) > 0, we have


(1 + h)^(m+1) ≥ (1 + mh)(1 + h) = 1 + (m + 1)h + mh².
Deleting the nonnegative number mh² from the right-hand side, we obtain
(1 + h)^(m+1) ≥ 1 + (m + 1)h.
Thus (*) is true for n = m + 1. Hence, Bernoulli's inequality ( *) is true
for all n ∈ ℕ.
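A quick numerical check of Bernoulli's inequality for a few values of h > −1 and n (an illustrative sketch, of course not a substitute for the proof by induction):

```python
# Verify (1 + h)**n >= 1 + n*h for several h > -1 and natural n.
for h in (-0.5, 0.1, 2.0):
    for n in range(1, 20):
        assert (1 + h) ** n >= 1 + n * h
print("Bernoulli's inequality holds for all tested h and n")
```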

7.2 Sequences of Numbers


Notion and notation. Let every natural number n be associated with
a real number an and let a rule that puts n into correspondence with an
be known. Then we say that a sequence of numbers
a1 , a2, ... , an, ...

is defined on the set of natural numbers.


The numbers a1, a2, ... , an are called the terms of the sequence. We
call an the nth term of the sequence; an is usually given by a law which
enables us to compute any other term of the sequence.
To abbreviate the notation we shall denote the sequence a1, a2, ... ,
an, ... by writing {an}*).
Examples of sequences are

{1/n} = 1, 1/2, 1/3, ..., 1/n, ...;
{2^n} = 2, 4, 8, ..., 2^n, ...;
{1} = 1, 1, 1, ..., 1, ...;
{(−1)^(n+1)/n²} = 1, −1/2², 1/3², ..., (−1)^(n+1)/n², ...;
{cos n} = cos 1, cos 2, cos 3, ..., cos n, ....


Limit of a sequence. A number A is said to be the limit of a sequence
{an} if, given any positive number ε, there exists a number N such that
for all n > N
|an − A| < ε.

*) We have to distinguish the sequence {an} from the set {an}. For example, {5} =
5, 5, ..., 5, ... is a sequence whereas the set {5} contains a single element, 5.
We shall write
A = lim an or an → A as n → ∞
to mean that A is the limit of the sequence {an}.
Using logical symbols we can write the definition of a limit as
(lim an = A as n → ∞) ⇔ ∀ε > 0 ∃N ∀n > N: |an − A| < ε.  (7.1)

The notion of a limit is easy to interpret by displaying the terms of


the sequence {an} on the number line (Fig. 7.3).

Fig. 7.3

The inequality |an − A| < ε is equivalent to A − ε < an < A + ε, which implies that an
lies in the e-neighbourhood of A. Hence, A is the limit of {an} if, given
any e-neighbourhood of A, there exists a number N such that all an with
n > N are contained in this e-neighbourhood of A, i.e., in the open interval
(A - e, A + e). Thus, only the finite number of terms a1, a2, ... , aN can
be outside the open interval (A - e, A + e). Whence it follows that the
sequence with all terms equal to A, called the stationary sequence, has the
limit equal to A.
A sequence is said to converge if it has a finite limit and to diverge
otherwise.
Example. Consider the sequence {(n + 1)/n}, an = (n + 1)/n being the nth
term. Clearly, the larger n the closer to 1 is the fraction (n + 1)/n = 1 + 1/n.
This prompts us to assume that
lim an = lim (n + 1)/n = 1  as n → ∞.
◄ To prove that our assumption is true we shall take an arbitrary ε > 0
and show that there exists a number N such that for all n > N
|an − 1| = |(n + 1)/n − 1| = 1/n < ε.  (7.2)
From the inequality 1/n < ε we have n > 1/ε. Then for any natural N
exceeding 1/ε and all n > N there holds 1/n < ε, i.e., (7.2) is true. Hence,
by virtue of the definition of a limit we have lim an = 1 as n → ∞. ►
In general the number N is not independent of ε, i.e., N = N(ε). For
instance, referring to the previous example, we observe that we can choose
N equal to 10 or to any other number exceeding 10 when ε = 0.1. However,
if ε = 0.01, then N has to be larger than 100.
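This dependence N = N(ε) is easy to tabulate; the following Python sketch (illustrative) computes an admissible N for several values of ε and checks inequality (7.2):

```python
# For a_n = (n + 1)/n we have |a_n - 1| = 1/n, so any N > 1/eps works.
import math

def N_of_eps(eps):
    return math.ceil(1.0 / eps)

for eps in (0.1, 0.01, 0.001):
    N = N_of_eps(eps)
    assert all(abs((n + 1) / n - 1) < eps for n in range(N + 1, N + 1000))
    print(eps, N)   # eps = 0.1 gives N = 10, eps = 0.01 gives N = 100, ...
```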
Remark. The number N is not uniquely defined by the value of e in
the sense that if inequality (7.1) holds for all n > Ni it also holds for all
n > N2, where N2 > Ni. Then to prove that lim an = A it suffices to
n->oo

choose any number N such that Ian - A I < e for all n > N. Therefore we
do not need to find the smallest number N satisfying (7.1 ).
Now we shall prove two important theorems on limits of sequences.
Theorem 7.2 (Cauchy convergence criterion). For the sequence a1, a2,
..., an, ... to converge it is necessary and sufficient that for any ε > 0
there exists N such that for all n > N and all m > N
|an − am| < ε.
A sequence {an} satisfying Theorem 7.2 is said to be a Cauchy sequence.
Theorem 7.3 (on uniqueness of a limit). A sequence {an) can not have
two distinct limits.
◄ Let A be a limit of {an} and let B ≠ A. To prove that B is not a limit
of {an} we take ε so small that the ε-neighbourhood of A and the ε-
neighbourhood of B do not intersect. For the purpose it suffices to set
ε = |B − A|/3 (Fig. 7.4).

Fig. 7.4

Since lim an = A as n → ∞, only a finite number of terms of {an} can be outside
the open interval (A − ε, A + ε). Hence, the open interval (B − ε, B + ε)
contains at most a finite number of terms of {an} and B is not a limit of
{an}. ►
Bounded sequences. A sequence {an ) is called
(a) bounded above if there exists a number M such that an ≤ M for all n;
(b) bounded below if there exists a number m such that an ≥ m for all n;
(c) bounded if {an} is bounded both above and below, i.e., there exist
numbers m and M such that m ≤ an ≤ M for all n.
Notice that all terms of a bounded sequence are contained in the closed
interval [m, M] of the number line.
It is easy to observe that the sequence −1, −2, −3, ..., −n, ... is bounded
above, for all its terms are negative, whereas the sequence 1, 2, 3, ..., n, ...
is bounded below since all its terms are not
smaller than 1. The sequence {an}, where an = (n + 1)/n = 1 + 1/n is the nth
term, is bounded since 1 < an ≤ 2 for all n.
Sometimes it is helpful to use a slightly different definition of a bound-
ed sequence, namely, a sequence {an } is called bounded if there exists a
number K > 0 such that
|an| ≤ K  ∀n.
By using logical symbols we can write this definition as
({an} is bounded) ⇔ ∃K > 0 ∀n: |an| ≤ K.
A definition of an unbounded sequence is easily obtained from the defi-
nition of a bounded sequence by interchanging the quantifiers and convert-
ing the inequality involved, i.e.,
({an} is unbounded) ⇔ ∀K > 0 ∃n: |an| > K.
Example. Prove that the sequence {2^n} is unbounded.
◄ Evidently, for any K > 0 there exists n such that 2^n > K, i.e.,
n > log₂ K. Hence, the sequence {2^n} is unbounded. ►
A sequence is said to diverge to oo, and we write lim an = oo, if, given
n-+oo
any, whatever large, M > 0 there exists a number N = N(M) such that
|an| > M  ∀n > N.
A sequence {an} such that
∀M > 0 ∃N ∀n > N: an > M (an < −M)
is said to diverge to + oo ( - oo ). In this case we write lim an = + oo
( lim an = - oo). n-+oo
n-+oo
We shall say that a sequence which diverges to oo, + oo or - oo is infinite-
ly large. It has no finite limit satisfying the definition of a limit given above
since the symbols oo, + oo and - oo do not represent real numbers. In what
follows we shall mean a finite limit as a real number. It is also worth men-
tioning here that there exist unbounded sequences which are not infinitely
large. For example, the sequence {n sin(nπ/2)} is unbounded but does not
diverge to infinity.
Theorem 7.4. Every convergent sequence is bounded, i.e., there exist
numbers m and M such that for all n
m ≤ an ≤ M.
◄ Let lim an = A and let ε > 0 be an arbitrary number. Then there exists
N such that the open interval (A − ε, A + ε) contains all the terms an with
n > N, and a1, a2, ..., aN are the only terms of the sequence which can
lie outside this interval (Fig. 7.5). Thus only a finite number of terms can
lie outside the interval and we can choose the smallest, a′, and the largest,
a″, of them. Now let m = min{a′, A − ε} be the smaller of the numbers
a′ and A − ε and let M = max{a″, A + ε} be the larger of the numbers
a″ and A + ε. Then the closed interval [m, M] contains the terms a1, a2,
..., aN and the open interval (A − ε, A + ε). Since all the terms an with
n ≥ N + 1 belong to (A − ε, A + ε) the closed interval [m, M] contains
all the terms of the sequence {an}. Hence, {an} is a bounded sequence. ►

Fig. 7.5

Theorem 7.4 implies that for a sequence to converge it is necessary that


this sequence be bounded. However, this is not sufficient for a sequence
to converge, as is easily seen from the following example.
Example. The sequence
1, 0, 1, 0, 1, ...  (*)
is bounded but diverges.
◄ To prove that the sequence diverges we suppose the converse, i.e., the
sequence (*) has a limit, say, equal to A. Then for any ε > 0, say for ε = 1/4,
there exists N such that
|an − A| < 1/4  ∀n > N.
Since the terms of (*) alternate between 0 and 1 there must hold
|0 − A| = |A| < 1/4 and |1 − A| < 1/4  ∀n > N.
Whence, we have
1 = |(1 − A) + A| ≤ |1 − A| + |A| < 1/4 + 1/4 = 1/2,
that is, 1 < 1/2, which is impossible. This means that our assumption on
convergence of (*) is false. Hence, the sequence (*) does not have a limit,
i.e., it diverges. ►
Operations on convergent sequences. We shall discuss a number of theo-
rems which define arithmetic operations on convergent sequences and
basic properties of limits of sequences.
Theorem 7.5. Let { an } and { bn } be sequences which converge to A and


B, respectively. Then { an + bn} is a convergent sequence and

lim (an + bn) = A + B = lim an + lim bn.
◄ Let ε > 0 be an arbitrary number. Since lim an = A, there exists N1
such that
|an − A| < ε/2  ∀n > N1.  (**)
Similarly, since lim bn = B, there exists N2 such that
|bn − B| < ε/2  ∀n > N2.  (***)
Now let N be the largest of N1 and N2. Then for any n > N there hold
both (**) and (***). Hence, we have
|(an + bn) − (A + B)| = |(an − A) + (bn − B)|
≤ |an − A| + |bn − B| < ε/2 + ε/2 = ε.
Thus
∀ε > 0 ∃N ∀n > N: |(an + bn) − (A + B)| < ε.
By virtue of the definition of a limit we infer that A + B is the limit
of the sequence {an + bn } . ►
Theorem 7.5 can easily be generalized to any finite number of conver-
gent sequences.
Analogously we can prove
Theorem 7.6. If {an} and { bn} are convergent sequences so is { an - bn}
and
lim (an − bn) = lim an − lim bn.
Also of importance are
Theorem 7.7. If {an} and {bn} are convergent sequences so is {an · bn}
and
lim (an · bn) = lim an · lim bn.
Theorem 7.8. If {an} and {bn} are convergent sequences and if bn ≠ 0
for all n and lim bn ≠ 0, then {an/bn} is a convergent sequence and
lim (an/bn) = lim an / lim bn.
Monotone sequences. A sequence (an} is called


(a) nondecreasing if a1 ≤ a2 ≤ ... ≤ an ≤ an+1 ≤ ...;
(b) nonincreasing if a1 ≥ a2 ≥ ... ≥ an ≥ an+1 ≥ ...;
(c) monotone if {an} is either nondecreasing or nonincreasing.
A nondecreasing sequence {an} is bounded if it is bounded above, i.e.,
if there exists a number M such that an ≤ M ∀n, for all the terms of {an}
are contained in the closed interval [a1, M].
A nonincreasing sequence {an} is bounded if it is bounded below, i.e.,
if there exists a number m such that an ≥ m ∀n, for all the terms of {an}
are contained in the closed interval [m, a1].
Theorem 7.9. Every monotone and bounded sequence has a limit.
◄ Since a sequence {an } is bounded the terms of ( an } form a set which
has a supremum and an infimum. Let M be a supremum of this set and
let us show that lim an = M provided that ( an } is a nondecreasing se-
n-> oo
quence.
By definition of a supremum, for any ε > 0 there exists aN such that
aN > M − ε and aN ≤ M. Whence we have 0 ≤ M − aN < ε. Since {an}
is a nondecreasing sequence we have
0 ≤ M − an ≤ M − aN  ∀n ≥ N.
Then by virtue of M − aN < ε, we get
0 ≤ M − an < ε  ∀n ≥ N
or, equivalently,
|an − M| < ε  ∀n ≥ N.
Whence it follows that M is a limit of (an } .
Analogously we prove that lim an = m provided that ( an } is a nonin-
n->oo
creasing bounded sequence and m is an infimum of the set of the terms
of (an}, ►
Remark. For a sequence to converge it is not necessary that this se-
quence be monotone. For example, the sequence {(−1)^n/(n + 1)} is not mono-
tone but converges to 0, i.e., lim an = 0.
n-+oo
From Theorem 7.9 follows the nested closed interval theorem sometimes
called Cantor lemma.
Cantor lemma. Let there be given a sequence of the nested closed in-
tervals
Un = [an, bn] (n = 1, 2, ... ),
such that Un+ 1 C Un (n = 1, 2, ... ) and dn = bn - an ➔ 0 as n ➔ oo. Then
there exists only one point which belongs to all Un.
Cantor lemma refers to the remarkable property of a set of real num-


bers, i.e., the completeness of a number line, which implies that real num-
bers fill the number line without leaving "holes'' in it.
Number e and natural logarithms. Let

an = (1 + 1/n)^n  (*)
be the nth term of the sequence {an}. Then substituting 1, 2 and 3 for n in
(*), we get
a1 = 2,  a2 = (1 + 1/2)² = 2 + 1/4,  a3 = (1 + 1/3)³ = 2 + 1/3 + 1/27.
Thus a1 < a2 < a3.
Using the binomial theorem*), we can easily show that the sequence
{(1 + 1/n)^n} is a bounded and monotonically increasing sequence and
2 ≤ an = (1 + 1/n)^n < 3  ∀n.
This means that the sequence has a limit which is customarily denoted as
e = lim (1 + 1/n)^n  as n → ∞.
The number e is an irrational number and it can only be approximated:
e = 2.7183 .... Sometimes, in working with complicated expressions, it is
convenient to use the number e as the base of logarithms. The logarithm
of the number x > 0 to the base e is called the natural logarithm and denot-
ed by lnx.
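The slow convergence of (1 + 1/n)^n to e is easy to observe numerically (an illustrative sketch):

```python
# The terms a_n = (1 + 1/n)**n increase towards e = 2.71828...
import math

for n in (1, 2, 3, 10, 100, 10000, 1000000):
    a_n = (1 + 1 / n) ** n
    print(n, a_n, math.e - a_n)   # the difference e - a_n decreases to 0
```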

*) The binomial theorem is given by the formula
(a + b)^n = a^n + (n/1!) a^(n−1) b + (n(n − 1)/2!) a^(n−2) b² + ...
+ (n(n − 1) ... (n − k + 1)/k!) a^(n−k) b^k + ... + b^n,
where k! = 1 × 2 × 3 × ... × k.
Substituting 2 and 3 for n in the binomial formula, we obtain the formulas of abridged
multiplication representing the square of the sum of two real numbers
(a + b)² = a² + (2/1!) ab + (2 × 1/2!) b² = a² + 2ab + b²
and the cube of the sum of two real numbers
(a + b)³ = a³ + (3/1!) a²b + (3 × 2/2!) ab² + (3 × 2 × 1/3!) b³
= a³ + 3a²b + 3ab² + b³,
which are familiar to the reader from the course of high-school algebra.
Remark. To prove that the sequence {(1 + 1/n)^n} has a limit it is help-
ful to apply Bernoulli's inequality. Indeed, by virtue of this inequality we
have
an = (1 + 1/n)^n ≥ 1 + n · (1/n) = 2  ∀n,
which implies that the sequence {an} is bounded below.
Consider a sequence {bn} whose nth term bn is given by
bn = (1 + 1/n) an = (1 + 1/n)^(n+1) = ((n + 1)/n)^(n+1).
Evidently, bn = (1 + 1/n) an > 2 ∀n. Then we have
bn/bn+1 = ((n + 1)/n)^(n+1) / ((n + 2)/(n + 1))^(n+2) = (n + 1)^(2n+3) / (n^(n+1)(n + 2)^(n+2))
= [n/(n + 1)] · (n + 1)^(2n+4) / [n(n + 2)]^(n+2)
= [n/(n + 1)] · [(n + 1)²/((n + 1)² − 1)]^(n+2)
= [n/(n + 1)] · [1 + 1/((n + 1)² − 1)]^(n+2).
Applying Bernoulli's inequality to the relation
[1 + 1/((n + 1)² − 1)]^(n+2),
we obtain
bn/bn+1 ≥ [n/(n + 1)] · [1 + (n + 2)/((n + 1)² − 1)] = [n/(n + 1)] · [(n + 1)/n] = 1,
i.e., bn ≥ bn+1.
Thus, the sequence {bn} is nonincreasing and bounded below. This
means that {bn} has a limit and so does the sequence {an}, and
lim an = lim [bn/(1 + 1/n)] = lim bn.

7.3 Functions of One Variable and Limits


Notion of a function. Similarly to the concept of a set a function
is a primary basic notion. It can be used in analysis in slightly different
ways. We shall confine ourselves to the following notion of a function.
248 7. An Introduction to Analysis

Let X be a set of real numbers x and let there be given a certain law
or rule which assigns a real number y to every number x in X. Then we
say that there is a function defined on X and write
y = f(x) or y = y(x), x ∈ X*).
The set X is called the domain of a function and the set Y of values y
specified by the function is called the range of a function. The domain
is sometimes denoted by D(f) and the range by E(f) provided that the
range of a function contains values y for every x in X.
Sometimes a function is denoted by writing f instead of f(x).
A function is fully determined if there are given (i) its domain of defini-
tion X and (ii) a rule which associates every x in X with a certain value
Y = f(x).
The functions f and g are said to be equal if D(/) = D(g) and the identi-
ty f(x) = g(x) remains true for all x in D(/) = D(g). For example, the func-
tions y = x 2 , - oo < x < + oo, and y = x 2 , 0 ~ x ~ 1, are not equal since
their domains are distinct; these functions are equal on the closed interval
[O, 1].
Examples of functions. (a) A sequence {an} is a function of an integral
variable whose domain is the set of natural numbers, such that f(n) = an
(n = 1, 2, ... ).
(b) y = n! is defined on the set of natural numbers. The relation n!
(read n factorial) is equal to the product of all integers from 1 through
n, i.e.,
n! = 1 × 2 × 3 × ... × n,
and by convention 0! = 1.
Sometimes the symbol !! is used. The relation (2n)!! is assumed to be
equal to the product of all even integers from 2 through 2n. For example,
8!! = 2 × 4 × 6 × 8. The relation (2n − 1)!! is equal to the product of all
odd integers from 1 through 2n − 1, e.g., 7!! = 1 × 3 × 5 × 7.
(c) y = sgn x, where sgn x = 1 for x > 0, sgn x = 0 for x = 0, and
sgn x = −1 for x < 0,
is defined at each point of the number line −∞ < x < +∞. The domain

*> This notion describes a numeric function and can easily be generalized to the case
of arbitrary sets. Let M and N be two arbitrary sets. We say that there is a function f which
is defined on M and assumes values in N if every element in M is associated with one and
only one element in N. Thus the function f maps the set M on the set N and is sometimes
called the mapping. Notice that the extended notion of a function is fully applicable to the
case of sets of numbers.
of y = sgn x is the set containing only numbers -1, 0 and 1 (Fig. 7.6).
The abbreviation sgn x means sign um function.
(d) y = [x], where xis a real number and [x] is the largest integer not
exceeding x, i.e., [x] = n for n ~ x < n + 1, n = 0, ±1, ±2, .... This
function is defined at every point of the number line and its range is the
set of all integers (Fig. 7. 7).

Fig. 7.6

Fig. 7.7
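Both functions of examples (c) and (d) can be written in one line of Python; the sketch below is merely illustrative (Python's built-in math.floor realizes [x] for real arguments):

```python
import math

def sgn(x):
    # signum function: 1 for x > 0, 0 for x = 0, -1 for x < 0
    return 1 if x > 0 else (0 if x == 0 else -1)

print(sgn(2.5), sgn(0.0), sgn(-7))        # 1 0 -1
print(math.floor(3.7), math.floor(-1.2))  # [3.7] = 3, [-1.2] = -2
```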

Representations of functions. A function can be expressed by a formula,


a graph or a table. Respectively, we shall speak of the analytic, graphic
or tabular representations of functions.
(a) Analytic representation. A function y = f(x) is said to be expressed
analytically if it is defined by a formula or an equation which specifies
to what operations each x E D(f) must be subjected to obtain the cor-


responding value y of the function. For instance, the function y = x²/(1 + x²)
(−∞ < x < +∞) is given analytically. In this case the domain of a func-
tion, unless stated otherwise, is thought of as a set of real values x for
which the formula defining the function assumes only real and finite values.·
In this sense the domain of a function is sometimes called the domain of
existence of a function. For example, the domain of y = ✓1 - x 2 is the
closed interval - 1 ~ x ~ 1 and its range is the closed interval O ~ y ~ 1.
The domain of y = sin x is the infinite interval (the number line)
- oo < x < + oo and the range is the closed interval - 1 ~ y ~ 1.
Notice that not all formulas define functions. For instance, the formula
y = √(1 − x²) + √(x² − 4)
does not specify a function since for any real x at least one of the square
roots does not assume a real value.

Fig. 7.8  Fig. 7.9

An analytic representation does not mean that a function is always


specified by a single formula. A function can b~ given by different relations
for different parts of its domain. For example, the function y = f(x) shown
in Fig. 7 .8 can be specified as

f(x) = 0 for x < 0,  f(x) = x for 0 ≤ x ≤ 1,
f(x) = 2 − x for 1 < x ≤ 2,  f(x) = 0 for x > 2.
(b) Graphic representation. The graph of a function y = f(x) is a locus
of points (x, f(x)) on the xy-plane whose abscissas are the values of x and
ordinates are the values of y (Fig. 7 .9). We say that a function admits a
graphic representation when its graph is specified.
Notice that not all functions can be given by graphs. For example, the
Dirichlet function
D(x) = 1 for rational x,  D(x) = 0 for irrational x
does not admit a graphic representation. The domain of this function is
the number line and the range comprises two numbers O and 1.
(c) Tabular representation. A function is said to be specified in a tabular
form if it is represented by means of a table which contains numerical
values of the variable x and the corresponding values of y.
Limit of a function at a point. A limit is a basic concept of mathemati-
cal analysis.
Cauchy criterion for limits. Let f(x) be a function defined in a neigh-
bourhood Ω of a point x0, except probably at x0. Then a number A is a
limit of f(x) at x0 if, given any (whatever small) number ε > 0, there exists
a number δ > 0 such that
|f(x) − A| < ε
whenever |x − x0| < δ and x ≠ x0.
To signify that the number A is a limit of f(x) at the point x0 we write
lim f(x) = A  as x → x0.
Using logical symbols we can express this criterion as
(lim f(x) = A as x → x0) ⇔ ∀ε > 0 ∃δ > 0 ∀x, x ≠ x0:
|x − x0| < δ ⇒ |f(x) − A| < ε.
Example. Let f(x) = 2x + 3 and Xo = 1. Verify that lim f(x) = 5.
◄ Clearly, f(x) is defined everywhere including the point x0 = 1 where
f(1) = 5. Let ε > 0 be an arbitrary number. For the inequality
|(2x + 3) − 5| < ε to hold it is necessary that
|2x − 2| < ε ⇒ 2|x − 1| < ε ⇒ |x − 1| < ε/2.
Thus, if δ = ε/2, we have 0 < |x − 1| < δ = ε/2 and |f(x) − 5| < ε.
Whence it follows that the number 5 is the limit of f(x) = 2x + 3 at the
point x0 = 1. ►
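The choice δ = ε/2 found above is easy to verify numerically (a minimal sketch):

```python
# For f(x) = 2x + 3 and x0 = 1: 0 < |x - 1| < eps/2 implies |f(x) - 5| < eps.
def f(x):
    return 2 * x + 3

eps = 0.01
delta = eps / 2
for x in (1 - 0.9 * delta, 1 + 0.5 * delta, 1 + 0.99 * delta):
    assert 0 < abs(x - 1) < delta and abs(f(x) - 5) < eps
print("|f(x) - 5| < eps throughout the tested deleted delta-neighbourhood of 1")
```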
Geometric interpretation of a limit at a point. Let /(x) be given by a
graph so that the values of /(x) are equal to the ordinates of the curve
252 7. An Introduction to Analysis

M1M for x < Xo and to the ordinates of the curve MM2 for x > Xo
(Fig. 7.10). And let the value /(Xo) be equal to the ordinate of the point
N. In other words, we assume that the graph of f(x) is obtained from the
"good" curve M1MM2 by replacing N for M.
Let us show that at .xo the function f(x) has a limit equal to the ordinate
A of M.

Fig. 7.10

◄ We choose any (whatever small) e > 0 and fix the points with the or-
dinates A, A - e, A + eon the y-axis. Let P and Q be the points of inter-
section of the graph of y = f(x) with the lines y = A - e and y = A + e
and let Xo - h1 and Xo + h2, where h1 > 0 and h2 > 0, be the abscissas of
P and Q,.respectively. We easily see for any x ~ Xo in the interval (.xo - h1,
Xo + h2) the value of f(x) lies between A - e and A + e, i.e., for all x ~ Xo
we have
A - e < f(x) < A + e
whenever Xo - h1 < x < Xo + h2.
Let o = min ( h1, h2} be the smallest of h1 and h2. Then the interval
(Xo - o, Xo + o) is contained in the interval (Xo - h1, .xo + h2). Hence, the
inequality A - e < f(x) < A + e or, equivalently,
lf(x) - Al < e
remains true for all x ~ Xo belonging to the interval (Xo - o, Xo + o), i.e.,
for all x satisfying the condition
0 < |x − x0| < δ.
Whence it follows that lim f(x) = A as x → x0. ►
Thus we infer that the function y = f(x) has a limit A at a point xo


provided that for any (whatever narrow) strip between the lines y = A − ε
and y = A + ε there exists δ > 0 such that the graph of y = f(x) lies inside
this strip for all x belonging to the deleted δ-neighbourhood of x0.
Remarks. (1) In general the value of δ depends on the value of ε, i.e.,
δ = δ(ε).
(2) When we deal with a limit of a function at a point Xo we exclude
Xo from consideration. Thus a limit off at Xo is independent of the value
of f at .xo. Furthermore, a function can even be undefined at .xo. So any
two functions equal to each other in a neighbourhood of .xo, except pro-
bably at the point Xo, where they can differ or even can both be undefined,
have the same limit as x -+ Xo or have no limits at all. In particular, this
implies that when computing the limit of a fraction at xo we may reduce
this fraction by like factors vanishing at x = .xo.
Examples. (1) Compute lim x/x as x → 0.
◄ Notice that f(x) = x/x = 1 for all x ≠ 0 and is undefined at x = 0. By
the definition of a limit the point x = 0 is excluded from consideration. So
lim x/x = lim 1 = 1 as x → 0. ►

(2) Compute lim f(x) (see Fig. 7.11), where


x---+O

f(x) = x² for x ≠ 0,  f(x) = 1 for x = 0.
◄ The function g(x) = x², −∞ < x < +∞, is equal to f(x) everywhere,
except at x = 0, and its limit at x = 0 is equal to zero, i.e., lim g(x) = 0
as x → 0 (show this!). Hence, lim f(x) = 0 as x → 0. ►
Problems. Write down the Cauchy criterion for the following limits
(1) lim f(x) = 5, (2) lim f(x) = 0,
x---+1 x---+O
(3) lim f(x) = 1, (4) lim f(x) = -2.
x-+ -2 x-+ -3

Sequential criterion for limits. Let f(x) be a function defined in a neigh-


bourhood Ω of x0, except probably at x0. And let {xn}, where xn ∈ Ω and
xn ≠ x0, be a sequence of values of x that converges to x0. Then a number
A is the limit of f(x) at x0 provided that for any such {xn} the corresponding
sequence {f(xn)} of the values of f(x) converges to A.
This criterion is convenient to apply when we wish to determine whether
the function f(x) has a limit at Xo or not. It suffices to define a sequence
{f(xn)} which has no limit or two sequences {f(xn)} and {f(xn′)} having
distinct limits.
◄ Consider the function f(x) = sin(1/x), defined everywhere except at the
point x = 0 (Fig. 7.12).

Fig. 7.11  Fig. 7.12

We have
xn = 1/(nπ) → 0 and xn′ = 1/(π/2 + 2nπ) → 0 as n → ∞,
and
f(xn) = sin nπ = 0,
f(xn′) = sin(π/2 + 2nπ) = 1.
Hence we have found two sequences {xn} and {xn′} converging to x = 0,
for which the corresponding sequences of the values of f(x) converge
to different limits, viz., {f(xn)} converges to zero while {f(xn′)} converges
to 1. By the sequential criterion this means that f(x) = sin(1/x) has no limit
at x = 0. ►
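The two sequences used in this argument are easy to tabulate (an illustrative sketch):

```python
# x_n = 1/(n*pi) and x'_n = 1/(pi/2 + 2*n*pi) both tend to 0 as n grows,
# but f(x) = sin(1/x) is (numerically) 0 along the first and 1 along the second.
import math

def f(x):
    return math.sin(1.0 / x)

for n in (1, 10, 100):
    xn = 1.0 / (n * math.pi)
    xpn = 1.0 / (math.pi / 2 + 2 * n * math.pi)
    print(n, f(xn), f(xpn))
```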
Remark. It can be proved that both criteria for limits at a point are
equivalent.
Limit theorems. We shall derive some basic results which are of impor-
tance in examining limits of functions of one variable.
Theorem 7.10. Let J(x) have a limit at a point Xo. Then this limit is
unique.
◄ Let lim f(x) = A as x → x0. We shall show that there exists no number B ≠ A
which is a limit of f(x) at x0.
The statement lim f(x) ≠ B as x → x0 can be written as
∃ε > 0 ∀δ > 0 ∃x, x ≠ x0: |x − x0| < δ ∧ |f(x) − B| ≥ ε.


Using ||a| − |b|| ≤ |a − b|, we have
|f(x) − B| = |(f(x) − A) − (B − A)| ≥ ||f(x) − A| − |B − A||
= ||B − A| − |f(x) − A||.  (*)

Let ε = |B − A|/2 > 0. Since lim f(x) = A as x → x0, then for the chosen ε > 0
there exists δ > 0 such that
|f(x) − A| < ε  ∀x, x ≠ x0, |x − x0| < δ.
Applying (*), we reduce the above statement to the following
|f(x) − B| ≥ ε  ∀x, x ≠ x0, |x − x0| < δ.
Hence, we have found ε > 0 such that for any (whatever small) δ > 0
there exists a point x ≠ x0 for which 0 < |x − x0| < δ and |f(x) − B| ≥ ε.
Whence it follows that B ≠ lim f(x). ►
x->xo
A function f(x) is said to be bounded in a neighbourhood of a point
Xo if there exist numbers M > 0 and o > 0 such that

lf(x)I ~ M Vx E (Xo - o, Xo + o),


where f(x) is defined.
Theorem 7.11. Let f(x) have a finite limit at xo. Then f(x) is bounded
in a neighbourhood of Xo, i.e., 3M> 0, 30 > 0 and
lf(x)I ~ M vx E (xo - o, Xo + o), x ~ .xo.
◄ Let lim f(x) = A. Then for any e > 0, say, e = 1, there exists o> 0
x---+xo
such that
lf(x) - A I < I
whenever x ~ Xo and Ix - xo I < o.
Observe that
lf(x)I - IAI ~ lf(x) - Al
is always true. Then we have
lf(x)I < IAI + 1.
Let M = IA I + 1 if f(x)
is undefined at Xo and let M = max ( IA I + 1,
lf(.xo)I} if f(x) is defined at .xo. Then
lf(x)I ~ M

holds at every point x in the interval (Xo - o, Xo + 5). By definition this


means that f(x) is bounded in a neighbourhood of .xo. ►

Theorem 7.11 implies that every function having a limit is bounded.


However, the converse is not always true, i.e., a function can be bounded
in a neighbourhood of x0 but have no limit at x0. For example, f(x) = sin(1/x)
is bounded in a neighbourhood of x = 0 since |sin(1/x)| ≤ 1 ∀x, x ≠ 0,
but f(x) has no limit at x = 0.
The following two theorems are easy to interpret geometrically.
Theorem 7.12. Let f(x) ≤ φ(x) for all x in a neighbourhood of x0, except
probably at x0, and let f(x) and φ(x) have limits at x0. Then (see Fig. 7.13)
lim f(x) ≤ lim φ(x)  as x → x0.
Notice that in general from the strict inequality f(x) < φ(x) it follows only that
lim f(x) ≤ lim φ(x)  as x → x0.
For example, let
f(x) = x² and φ(x) = 2x² for x ≠ 0, φ(x) = 1 for x = 0.
Obviously, we have f(x) < φ(x) ∀x whereas
lim f(x) = lim φ(x) = 0  as x → 0.

Fig. 7.13  Fig. 7.14

Theorem 7.13. Let φ(x) ≤ f(x) ≤ ψ(x) be true for all x in a neighbour-
hood of x0, except probably at x0, and let φ(x) and ψ(x) have the same
limit A at x0. Then f(x) also has the limit A at x0 (Fig. 7.14).
Limit of a function when the variable tends to infinity. Let f(x) be de-
fined at every point of the number line or at all x whenever !xi > K for
some K > 0 so that we can compute the values of f(x) for whatever large
magnitudes of x.
Definition. We say that a number A is a limit of f(x) as x tends to


infinity and write
lim f(x) = A  as x → ∞
provided that for any ε > 0 there exists a number N > 0 such that
|f(x) − A| < ε
whenever |x| > N.
Substituting x > N and x < −N for |x| > N, we obtain the definitions of
A = lim f(x) as x → +∞ and A = lim f(x) as x → −∞, respectively. Whence it
follows that
A = lim f(x)  as x → ∞
if and only if both A = lim f(x) as x → +∞ and A = lim f(x) as x → −∞ hold.

Fig. 7.15

Geometrically the statement A = lim f(x) as x → +∞ means that, given any
(whatever narrow) strip between the lines y = A − ε and y = A + ε, there
exists a line x = N > 0 such that for all x > N > 0 the graph of y = f(x)
is contained inside this strip (Fig. 7.15). In this case we say that the curve
y = f(x) asymptotically tends to the line y = A as x → +∞.

Example. Consider the function f(x) = 1/(x² + 1) defined at all points
of the number line, in which case the numerator is bounded everywhere
and the denominator increases to infinity as |x| → +∞. This prompts us
to assume that lim f(x) = 0 as x → ∞.
◄ Let us choose ε > 0 such that 0 < ε ≤ 1. For the inequality
|1/(x² + 1) − 0| < ε
to hold there must hold 1/(x² + 1) < ε. Whence x² + 1 > 1/ε and
|x| > √(1/ε − 1).
Thus, if we choose N = √(1/ε − 1), then
|1/(x² + 1) − 0| < ε whenever |x| > N, i.e., A = 0 is the limit of the given
function as x → ∞.
Notice that the inequality 1/ε − 1 ≥ 0 is true only for ε ≤ 1. For ε > 1
it immediately follows that 1/(x² + 1) < ε for all x ∈ ℝ.
The graph of the even function y = 1/(x² + 1) asymptotically tends to the
line y = 0 as x → ±∞. ►
Problems. We leave it to the reader to interpret the following statements
in terms of inequalities involved in the definition of a limit
(1) lim f(x) = - 3; (2) lim f(x) = 1; (3) lim f(x) = 0.
x---+oo x---+ + oo x---+ -oo

7.4 Infinitesimals and Infinities


Notion of an infinitesimal. Let a function a(x) be defined in a
neighbourhood of a point Xo, except probably at .xo. Then a(x) is said
to be an infinitely small Junction or an infinitesimal as x tends to Xo if
a(x) has at Xo a limit equal to zero, i.e.,
lim a(x) = 0.
x---+xo

For instance, the function a(x). = x - 1 shown in Fig. 7.16 is an in-


finitesimal as x-+ 1 since lim (x - 1) = 0.
x➔ l

In general the function a(x) = x - Xo .offers an elementary example of


an infinitesimal as x -+ .xo.
Applying the Cauchy criterion for limits we can define an infinitesimal
as follows.
A function α(x) is an infinitesimal as x → x0 if given any ε > 0 there
exists δ > 0 such that
|α(x)| < ε
whenever |x − x0| < δ and x ≠ x0.
Using logical symbols we write
(α(x) is an infinitesimal as x → x0) ⇔ ∀ε > 0 ∃δ > 0 ∀x, x ≠ x0:
|x − x0| < δ ⇒ |α(x)| < ε.
Analogously, we can define infinitesimals as x ➔ oo, x ➔ + oo and
X ➔ - oo.
We say that α(x) is an infinitesimal as x → ∞ if lim α(x) = 0 as x → ∞.
If lim α(x) = 0 as x → +∞ or lim α(x) = 0 as x → −∞, the function α(x) is
said to be an infinitesimal as x → +∞ or as x → −∞, respectively.


Fig. 7.16

For instance, the function α(x) = 1/x, x ≠ 0, is an infinitesimal as
x → ∞, since lim 1/x = 0 as x → ∞. The function α(x) = e^(−x) is an
infinitesimal as x → +∞, for lim e^(−x) = 0 as x → +∞.
In what follows we shall confine ourselves to infinitesimals as x tends
to .xo by leaving to the reader a generalization of theorems and concepts
to the cases when x ➔ oo, x ➔ + oo and x ➔ - oo.
Properties of infinitesimals. Now we shall prove some important the-
orems which enable us to operate on infinitesimals.
Theorem 7.14. If α(x) and β(x) are infinitesimals as x → x0 so is the
sum α(x) + β(x) as x → x0.
◄ Since α(x) is an infinitesimal as x → x0 then for any ε > 0, there exists
δ1 > 0 such that
|α(x)| < ε/2,  (*)
whenever |x − x0| < δ1 and x ≠ x0.
Analogously, for β(x) we have δ2 > 0 such that
|β(x)| < ε/2  (**)
whenever |x − x0| < δ2 and x ≠ x0.
Let δ = min{δ1, δ2}. Then both (*) and (**) hold whenever |x − x0| < δ
and x ≠ x0. Hence
|α(x) + β(x)| ≤ |α(x)| + |β(x)| < ε/2 + ε/2 = ε
∀x, x ≠ x0, |x − x0| < δ.
Thus the sum α(x) + β(x) is an infinitesimal as x → x0. ►
A similar theorem can easily be proved for any finite number of in-
finitesimals as x → x0.
Theorem 7.15. If a function α(x) is an infinitesimal as x → x0 and if
a function f(x) is bounded in a neighbourhood of x0 then the product
α(x)f(x) is an infinitesimal as x → x0.
◄ Since f(x) is bounded in a neighbourhood of x0 there exist numbers
δ1 > 0 and M > 0 such that
|f(x)| ≤ M
for all x ∈ (x0 − δ1, x0 + δ1) where f(x) is defined.
Consider an arbitrary ε > 0. For the infinitesimal α(x) there exists δ2 > 0
such that, as x → x0,
|α(x)| < ε/M
whenever |x − x0| < δ2 and x ≠ x0.
Let δ = min{δ1, δ2}. Then both
|f(x)| ≤ M and |α(x)| < ε/M
are true whenever |x − x0| < δ and x ≠ x0.
Hence
|α(x)f(x)| = |α(x)| · |f(x)| < (ε/M) · M = ε  ∀x, x ≠ x0, |x − x0| < δ.
By definition we infer that the product α(x)f(x) is an infinitesimal as
x → x0. ►
Example. Consider the function y = x sin(1/x) (Fig. 7.17).
The relation x sin(1/x) can be considered as the product of the functions
α(x) = x and f(x) = sin(1/x). The function α(x) is an infinitesimal as x → 0
and f(x) = sin(1/x) is defined everywhere, except at the point x = 0, and is
bounded in any neighbourhood of x = 0. Then by Theorem 7.15 y =
x sin(1/x) is an infinitesimal as x → 0 so that
lim x sin(1/x) = 0  as x → 0.

Corollary. If a function a(x) is an infinitesimal as x ~ Xo and if a func-


tion f(x) has a finite limit at Xo then the product a(x)f(x) is an infinitesimal
as x ~ .xo.
◄ To prove this corollary it suffices to observe that a function having
a limit at Xo is bounded in a neighbourhood of .xo. ►

Fig. 7.17

Lemma. If a function f(x) has a nonzero limit at a point x0 then the func-
tion 1/f(x) is bounded in a deleted neighbourhood of x0.
◄ Let lim f(x) = A ≠ 0 as x → x0. Then for any ε > 0, say ε = |A|/2, there
exists δ > 0 such that for all x
|A − f(x)| < |A|/2
whenever 0 < |x − x0| < δ.
Since the inequality |A − f(x)| ≥ |A| − |f(x)| is always true we have
|A| − |f(x)| < |A|/2.
Whence
|f(x)| > |A|/2  ∀x, x ≠ x0, |x − x0| < δ.
This means that the function 1/f(x) is defined for all x, x ≠ x0,
|x − x0| < δ and
|1/f(x)| = 1/|f(x)| < 2/|A|.
Hence the function 1/f(x) is bounded in a deleted neighbourhood of
x0. ►
Theorem 7.16. If α(x) is an infinitesimal as x → x0 and if f(x) has a
nonzero limit at x0 then the quotient α(x)/f(x) is an infinitesimal as x → x0.
◄ Express α(x)/f(x) as
α(x)/f(x) = α(x) · (1/f(x)).
By virtue of the above lemma the function 1/f(x) is bounded in a neigh-
bourhood of the point x0. Then, applying Theorem 7.15, we infer that the
function α(x) · (1/f(x)) is an infinitesimal as x → x0. ►
The condition lim f(x) ≠ 0 is essential when we wish to apply Theo-
rem 7.16. For example, let α(x) = x and f(x) = x². The function α(x) is
an infinitesimal as x → 0 and the function f(x) has a limit equal to zero
at x = 0, i.e., f(x) is an infinitesimal as x → 0. Evidently, the quotient
α(x)/f(x) = x/x² = 1/x, x ≠ 0, is not an infinitesimal as x → 0. In general a quo-
tient of two infinitesimals is not an infinitesimal.
tient of two infinitesimals is not an infinitesimal.


Notion of infinity. Let a function f(x) be defined in a neighbourhood
of a point Xo, except probably at Xo, Then a function f(x) is said to be
an infinitely large function or an infinity as x tends to Xo if, given any
(whatever large) number M, there exists a number o > 0 such that

lf(x)I > M
whenever Ix - Xo I < o and x ~ Xo- In this case we write
lim f(x) = oo.
x-+xo

We say that f(x) has an infinite limit as x ➔ Xo- We also say that a limit
of f(x) is equal to A with the understanding that A is any (whatever large)
finite number.
7.4 Infinitesimals and Infinities 263

Using logical symbols we can write


(f(x) is an infinity as x → x0) ⇔ ∀M > 0 ∃δ > 0 ∀x, x ≠ x0:
|x − x0| < δ ⇒ |f(x)| > M.
Substituting f(x) > M or f(x) < −M for |f(x)| > M, we have
lim f(x) = +∞ or lim f(x) = −∞  as x → x0,
respectively.
For example, the function f(x) = 1/x defined at every x ≠ 0 is an infinity
as x → 0 (Fig. 7.18). The function f(x) = 1/x² defined at every x ≠ 0 is an
infinity tending to +∞ as x → 0 (Fig. 7.19).

Fig. 7.18  Fig. 7.19

Let us show that f(x) = 1/x is an infinity as x → 0. Consider any
whatever large M > 0. For the inequality |f(x)| = |1/x| = 1/|x| > M to
hold it is necessary and sufficient that the inequality |x| = |x − 0| < 1/M
hold. So if we put δ = 1/M, then |f(x)| = 1/|x| > M whenever |x − 0| =
|x| < δ and x ≠ 0. Whence we infer that f(x) = 1/x is an infinity as
x → 0.
To interpret the notion of infinity geometrically look at Fig. 7.20. A


functionf(x) is an infinity as x ~ Xo if given any (whatever wide) horizontal
strip between the lines y = - M and y = M there exist two vertical lines
x = x0 − δ and x = x0 + δ such that the graph of y = f(x), x ≠ x0, lies out-
side this horizontal strip whenever x is contained in the open interval
(x0 − δ, x0 + δ). Notice that not every function f(x) which is unbounded
in a neighbourhood of x0 is an infinite function as x → x0. For instance,
the function f(x) = (1/x) sin(1/x) is unbounded in a neighbourhood of x = 0;
however, f(x) is not an infinity as x → 0. (As an exercise draw a figure to
interpret this example geometrically.)

Fig. 7.20

We say that a function f(x) is an infinity as x ➔ oo and write


lim f(x) = oo
X-+ oo

if, given any (whatever large) number M > 0, there exists a number N > 0
such that
lf(x)I > M
whenever lxl > N.
Example. The function f(x) = x is an infinity as x -+ oo.


◄ Indeed, ∀M > 0 ∃N > 0, say N = M, such that |f(x)| = |x| > M
whenever |x| > N. ►
Analogously, we can define infinities as x → +∞ and as x → −∞.
Relations between infinitesimals and infinities. The following theorems
lay down the relations between infinitesimals and infinities.
Theorem 7.17. If a function f(x) is an infinity as x → x₀ then the function α(x) = 1/f(x) is an infinitesimal as x → x₀.

◄ Let ε > 0 be an arbitrary (however small) number. Since f(x) is an infinity as x → x₀, then for any M > 0, say M = 1/ε, there exists δ > 0 such that

    |f(x)| > M = 1/ε

whenever |x − x₀| < δ and x ≠ x₀, in which case the function α(x) = 1/f(x) is defined and

    |α(x)| = 1/|f(x)| < 1/M = ε.

Thus

    ∀ε > 0 ∃δ > 0, ∀x, x ≠ x₀: |x − x₀| < δ ⇒ |α(x)| < ε.

Whence we infer that α(x) = 1/f(x) is an infinitesimal as x → x₀. ►

Similarly we can prove the following theorem.
Theorem 7.18. If a function α(x) is an infinitesimal as x → x₀ and if α(x) is distinct from zero in the neighbourhood (x₀ − δ, x₀ + δ) of x₀, except possibly at x₀, then the function f(x) = 1/α(x) is an infinity as x → x₀.
Problems. Using inequalities write down the definitions for the limits:
(1) lim_{x→1} f(x) = ∞,   (2) lim_{x→−3} f(x) = +∞,
(3) lim_{x→0} f(x) = −∞,  (4) lim_{x→+∞} f(x) = +∞,
(5) lim f(x) = +∞,        (6) lim f(x) = −∞.
In conclusion of this section we shall consider the rational function

    y(x) = (a₀xᵐ + a₁xᵐ⁻¹ + ... + a_m)/(b₀xⁿ + b₁xⁿ⁻¹ + ... + b_n)   (a₀ ≠ 0, b₀ ≠ 0),
which is a quotient of two polynomials in x of degrees m and n, respectively. For sufficiently large |x| the denominator is distinct from zero; thus the quotient makes sense. Then, dividing the numerator and denominator by xⁿ, we obtain

    y(x) = (a₀xᵐ⁻ⁿ + a₁xᵐ⁻ⁿ⁻¹ + ... + a_m x⁻ⁿ)/(b₀ + b₁x⁻¹ + ... + b_n x⁻ⁿ).

It is easily seen that the limit of the denominator as x → ∞ is equal to b₀ ≠ 0. The numerator increases infinitely if m > n, has a limit equal to a₀ if m = n, or has a limit equal to zero if m < n. Therefore, the behaviour of this function as x → ∞ is fully determined by the formulas

    lim_{x→∞} (a₀xᵐ + a₁xᵐ⁻¹ + ... + a_m)/(b₀xⁿ + b₁xⁿ⁻¹ + ... + b_n) =
        ∞       if m > n,
        a₀/b₀   if m = n,
        0       if m < n.
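These three cases are easy to observe numerically: for large |x| the quotient behaves like (a₀/b₀)xᵐ⁻ⁿ. The following Python sketch is only an illustration (the particular coefficients are made up and are not taken from the text); it evaluates sample quotients at increasingly large x.

    # Behaviour of P_m(x)/Q_n(x) as x grows, for the cases m > n, m = n, m < n.
    # Coefficients are listed from the leading one down, e.g. [2, 1, -3] is 2x^2 + x - 3.
    def poly(coeffs, x):
        value = 0.0
        for c in coeffs:              # Horner's scheme
            value = value * x + c
        return value

    cases = {
        "m > n": ([2.0, 1.0, -3.0], [5.0, 4.0]),       # (2x^2 + x - 3)/(5x + 4)
        "m = n": ([2.0, 1.0, -3.0], [5.0, 0.0, 4.0]),  # (2x^2 + x - 3)/(5x^2 + 4)
        "m < n": ([2.0, 1.0], [5.0, 0.0, 4.0]),        # (2x + 1)/(5x^2 + 4)
    }
    for name, (num, den) in cases.items():
        for x in (1e2, 1e4, 1e6):
            print(name, "x =", x, "ratio =", poly(num, x) / poly(den, x))
    # m > n: the ratio grows without bound; m = n: it approaches a0/b0 = 0.4;
    # m < n: it approaches 0.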

7.5 Operations on Limits


Relationship between a function having a limit and an infinitesimal.
We shall prove a theorem which will be frequently used below.
Theorem 7.19. Let a function f(x) be defined in a neighbourhood of a point x₀, except possibly at x₀. For f(x) to have a limit A at x₀ it is necessary and sufficient that f(x) admit a representation of the form

    f(x) = A + α(x),

where α(x) is an infinitesimal as x → x₀.
◄ Necessity. Let f(x) have a limit equal to A at x₀, i.e.,

    lim_{x→x₀} f(x) = A.

We put
    α(x) = f(x) − A   (*)
or, equivalently,
    f(x) = A + α(x)
and prove that α(x) is an infinitesimal as x → x₀.
Let ε > 0 be an arbitrary number. Since lim_{x→x₀} f(x) = A, then for the given ε > 0 there exists δ > 0 such that

    |f(x) − A| < ε

whenever |x − x₀| < δ and x ≠ x₀.
Taking into account (*), we can write

    |α(x)| < ε
or
    ∀ε > 0 ∃δ > 0, ∀x, x ≠ x₀: |x − x₀| < δ ⇒ |α(x)| < ε.

Whence it follows that α(x) is an infinitesimal as x → x₀.
Sufficiency. Let f(x) admit a representation of the form

    f(x) = A + α(x),   (**)

where A is a constant quantity and α(x) is an infinitesimal as x → x₀. We shall prove that f(x) has a limit equal to A at x₀.
Suppose that ε > 0 is an arbitrary number. Since α(x) is an infinitesimal as x → x₀ there exists δ > 0 such that

    |α(x)| < ε

whenever |x − x₀| < δ and x ≠ x₀.
But by virtue of (**) we have α(x) = f(x) − A and hence |f(x) − A| < ε. Thus

    ∀ε > 0 ∃δ > 0, ∀x, x ≠ x₀: |x − x₀| < δ ⇒ |f(x) − A| < ε.

This means that A = lim_{x→x₀} f(x). ►

Example. Let f(x) = x and x₀ = 2. Then lim_{x→2} f(x) = 2 and

    f(x) = 2 + (x − 2),

where 2 is a constant quantity and x − 2 is an infinitesimal as x → 2, i.e., α(x) = x − 2.
Arithmetical operations on limits. Let f(x) and φ(x) be defined in a neighbourhood of a point x₀, except possibly at x₀.
Theorem 7.20. If lim_{x→x₀} f(x) = A and lim_{x→x₀} φ(x) = B then

(a) lim_{x→x₀} [f(x) ± φ(x)] = A ± B,

(b) lim_{x→x₀} [f(x) · φ(x)] = AB,

(c) lim_{x→x₀} f(x)/φ(x) = A/B provided that lim_{x→x₀} φ(x) = B ≠ 0.

◄ We shall prove Statement (b) of this theorem.
Let lim_{x→x₀} f(x) = A and lim_{x→x₀} φ(x) = B.
By Theorem 7.19 we have

    f(x) = A + α(x),   φ(x) = B + β(x),

where α(x) and β(x) are infinitesimals as x → x₀.
Whence

    f(x)φ(x) = [A + α(x)][B + β(x)] = AB + Bα(x) + Aβ(x) + α(x)β(x).

By Theorem 7.15, Bα(x), Aβ(x) and α(x)β(x) are infinitesimals as x → x₀. Then their sum is also an infinitesimal as x → x₀. Thus we have represented the function f(x)φ(x) as the sum of the constant quantity AB and an infinitesimal as x → x₀. By Theorem 7.19 we conclude that the function f(x)φ(x) has a limit equal to AB at x₀, i.e.,

    lim_{x→x₀} [f(x) · φ(x)] = AB = lim_{x→x₀} f(x) · lim_{x→x₀} φ(x). ►

It is easy to generalize Theorem 7.20 to the case of any finite number of factors.
Corollary. A constant factor can be taken outside the limit sign.
We leave it to the reader to prove Statements (a) and (c) and to solve the following problems.
Problems. (1) Let f(x) have a limit at x₀ and let φ(x) have no limit at x₀. Which of the following limits exist?
(i) lim_{x→x₀} [f(x) + φ(x)],   (ii) lim_{x→x₀} [f(x) · φ(x)].
(2) Let lim_{x→x₀} f(x) ≠ 0 and let φ(x) have no limit at x₀. Prove that lim_{x→x₀} [f(x) · φ(x)] does not exist.
Examples. (1) Compute lim_{x→0} (x² − 4)/(x + 1).
◄ Consider the given function as a quotient of two functions f(x) = x² − 4 and φ(x) = x + 1, each of which has a limit at x = 0, namely, lim_{x→0} f(x) = lim_{x→0} (x² − 4) = −4 and lim_{x→0} φ(x) = lim_{x→0} (x + 1) = 1 ≠ 0. Since the limit of φ(x) is not equal to zero, Statement (c) of Theorem 7.20 is applicable here. Then we have

    lim_{x→0} (x² − 4)/(x + 1) = lim_{x→0} (x² − 4) / lim_{x→0} (x + 1) = −4. ►

(2) Compute lim_{x→1} (x² − 1)/(x − 1).
◄ Setting f(x) = x² − 1 and φ(x) = x − 1, we have lim_{x→1} f(x) = 0 and lim_{x→1} φ(x) = 0. Thus we cannot use Statement (c) of Theorem 7.20 directly. To resolve this difficulty we can represent the quotient of f(x) and φ(x) as

    (x² − 1)/(x − 1) = (x − 1)(x + 1)/(x − 1).

Observing that the definition of a limit at a point does not involve this point and the value of the function at this point, we can exclude the point x = 1 from consideration. This means that we can cancel the factor x − 1 (assuming that x − 1 ≠ 0). Then we have

    (x² − 1)/(x − 1) = x + 1,  x ≠ 1.

Whence

    lim_{x→1} (x² − 1)/(x − 1) = lim_{x→1} (x + 1) = 2. ►

(3) Compute lim_{x→0} (√(1 + x²) − 1)/x².
◄ The limits of the numerator and denominator at x = 0 are equal to zero, and we cannot use Theorem 7.20 directly. But multiplying the numerator and denominator by √(1 + x²) + 1 and assuming x ≠ 0, we have at x ≠ 0

    (√(1 + x²) − 1)/x² = (√(1 + x²) − 1)(√(1 + x²) + 1)/(x²(√(1 + x²) + 1))
        = x²/(x²(√(1 + x²) + 1)) = 1/(√(1 + x²) + 1).

Then, applying the quotient formula given by Statement (c) of Theorem 7.20, we obtain

    lim_{x→0} (√(1 + x²) − 1)/x² = lim_{x→0} 1/(√(1 + x²) + 1)
        = 1 / lim_{x→0} (√(1 + x²) + 1) = 1/2. ►
(4) Consider the function f(x) = sin x², an even function, defined at every point of the number line and bounded everywhere, so that |sin x²| ≤ 1 ∀x. This function vanishes at x = ±√(nπ), where n = 0, 1, 2, ....
◄ Consider two consecutive points at which f(x) vanishes. Let these points be √(nπ) and √((n + 1)π); the distance between them is √((n + 1)π) − √(nπ).
Now we compute lim_{n→∞} (√((n + 1)π) − √(nπ)). Multiplying and dividing (√((n + 1)π) − √(nπ)) by (√((n + 1)π) + √(nπ)), we obtain

    lim_{n→∞} (√((n + 1)π) − √(nπ))
        = lim_{n→∞} ((n + 1)π − nπ)/(√((n + 1)π) + √(nπ))
        = lim_{n→∞} π/(√((n + 1)π) + √(nπ)) = 0,

since the numerator of the above fraction is a constant quantity and the denominator increases infinitely as n → ∞. Thus the distance between two consecutive points at which f(x) vanishes tends to zero as n → ∞. Hence the function f(x) = sin x² is nonperiodic. ►
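The shrinking gap between the consecutive zeros √(nπ) and √((n + 1)π) is easy to observe numerically; the short Python sketch below (the values of n are chosen arbitrarily) prints the gap for growing n.

    import math

    # Distance between consecutive zeros sqrt(n*pi) and sqrt((n+1)*pi) of sin(x^2).
    for n in (1, 10, 100, 10_000, 1_000_000):
        gap = math.sqrt((n + 1) * math.pi) - math.sqrt(n * math.pi)
        print(f"n = {n:>9d}   gap = {gap:.8f}")
    # The gap behaves like pi/(2*sqrt(n*pi)) and tends to 0; a periodic function
    # would have to repeat the spacing of its zeros, so sin(x^2) is nonperiodic.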

One-sided limits. Let f(x) be defined in an interval (a, x₀). Then a number A is said to be a left-hand limit of f(x) at x₀ if, given any ε > 0, there exists δ > 0 such that

    |f(x) − A| < ε

whenever x₀ − δ < x < x₀.
In symbols, we write

    A = lim_{x→x₀−0} f(x)  or  A = f(x₀ − 0).

Now let f(x) be defined in an interval (x₀, b). Then a number A is said to be a right-hand limit of f(x) at x₀ if, given any ε > 0, there exists δ > 0 such that

    |f(x) − A| < ε

whenever x₀ < x < x₀ + δ.
In symbols, we write

    A = lim_{x→x₀+0} f(x)  or  A = f(x₀ + 0).

Suppose that a function f(x) is defined in the neighbourhood (x₀ − h₁, x₀ + h₂), h₁ > 0, h₂ > 0, of x₀, except possibly at x₀. Furthermore, let there exist left- and right-hand limits of f(x) at x₀ (Fig. 7.21).
Then, applying the Cauchy criterion for the limit of f(x) at x₀, we can easily prove that for the function f(x) to have a limit at x₀ it is necessary and sufficient that there exist left- and right-hand limits of f(x) at x₀ and these limits be equal to each other. In other words, it is easy to prove that

    f(x₀ − 0) = f(x₀ + 0) = lim_{x→x₀} f(x).

◄ Let lim_{x→x₀} f(x) = A. Then, given any ε > 0, there exists δ > 0 such that

    |f(x) − A| < ε   (*)

holds for all x (x ≠ x₀) in (x₀ − δ, x₀ + δ).
Evidently, (*) holds for all x in (x₀ − δ, x₀) and all x in (x₀, x₀ + δ), so that by definition we have

    A = lim_{x→x₀−0} f(x)  and  A = lim_{x→x₀+0} f(x).

Conversely, let A = lim_{x→x₀−0} f(x) and A = lim_{x→x₀+0} f(x). Then, given any ε > 0, there exist δ₁ > 0 and δ₂ > 0 such that

    |f(x) − A| < ε

whenever x₀ − δ₁ < x < x₀ and x₀ < x < x₀ + δ₂.

Fig. 7.21        Fig. 7.22

Set δ = min {δ₁, δ₂}. Then we obtain

    |f(x) − A| < ε

for all x such that 0 < |x − x₀| < δ. Whence it follows that lim_{x→x₀} f(x) = A. ►

Examples. (1) Consider the function f(x) = |x| for x ≠ 0 (its value at x = 0 is immaterial for the limit), shown in Fig. 7.22.
We have
    lim_{x→0−0} f(x) = lim_{x→0+0} f(x) = 0 ⇒ lim_{x→0} f(x) = 0.

(2) Consider the function f(x) = 1/(1 + e^{1/x}), x ≠ 0, shown in Fig. 7.23.
We have
    lim_{x→0−0} f(x) = 1 and lim_{x→0+0} f(x) = 0 ⇒ lim_{x→0} f(x) does not exist.

(3) Consider the function f(x) = e^{1/x}, x ≠ 0, shown in Fig. 7.24.
We have
    lim_{x→0−0} f(x) = 0 and lim_{x→0+0} f(x) = +∞ ⇒ lim_{x→0} f(x) does not exist.
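One-sided limits such as those in Examples (2) and (3) can be estimated by evaluating the function at points approaching 0 from each side. A minimal Python sketch for Example (2) (the sample points are arbitrary; for x much closer to 0 from the right, e^{1/x} overflows in double precision, but the trend is already clear):

    import math

    def f(x):
        # Example (2): f(x) = 1/(1 + e^(1/x)), x != 0
        return 1.0 / (1.0 + math.exp(1.0 / x))

    for h in (0.1, 0.05, 0.01):
        print(f"f(-{h}) = {f(-h):.6f}    f(+{h}) = {f(+h):.6f}")
    # From the left the values tend to 1, from the right to 0, so the two
    # one-sided limits differ and lim_{x->0} f(x) does not exist.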

In conclusion we suppose that f(x) is defined on a closed interval


[a, b] or on an open interval (a, b ). Then at a f(x) can have only a right-
hand limit and at b it can only have a left-hand limit.

Fig. 7.23        Fig. 7.24

7.6 Continuous Functions. Continuity at a Point


Notion of continuity at a point. Let a function f(x) be defined in a neighbourhood Ω of a point x₀. A function f(x) is continuous at x₀ if (i) f(x) has a limit at x₀ and (ii) the limit of f(x) at x₀ is equal to the value f(x₀) of f(x) at x₀, i.e.,

    lim_{x→x₀} f(x) = f(x₀).   (*)

Since x₀ = lim_{x→x₀} x, we can write (*) as

    lim_{x→x₀} f(x) = f(lim_{x→x₀} x).

In other words, for a continuous function f(x) the symbols lim and f can be interchanged in (*).
Now we give a precise "ε-δ" definition of continuity.
Let f(x) be defined at x₀ and in a neighbourhood Ω of x₀. Then f(x) is a continuous function at x₀ if, given any ε > 0, there exists δ > 0 such that

    |f(x) − f(x₀)| < ε

whenever |x − x₀| < δ and x ∈ Ω.
Using logical symbols, we can express this definition as

    (f(x) is continuous at x₀) ⇔ ∀ε > 0 ∃δ > 0, ∀x ∈ Ω:
        |x − x₀| < δ ⇒ |f(x) − f(x₀)| < ε.

Notice that in general the value of δ depends on both the number ε > 0 and the point x₀, i.e., δ = δ(ε, x₀). It is important to mention here that the "ε-δ" definition of continuity does not require x ≠ x₀.
We can also give a slightly different definition of continuity at a point. Let y = f(x) be defined at x₀ and in the neighbourhood Ω of x₀ (Fig. 7.25). Consider the point x = x₀ + Δx in Ω which differs from x₀ by a positive or negative quantity Δx. The quantity Δx is called the increment of the argument x at x₀. Then the difference

    Δy = f(x₀ + Δx) − f(x₀)

is called the increment of the function f at x₀, corresponding to the increment Δx of the independent variable x.

Fig. 7.25

In terms of the increment of x the condition of continuity of f(x) at x₀, i.e.,

    lim_{x→x₀} f(x) = f(x₀),

becomes

    lim_{Δx→0} f(x₀ + Δx) = f(x₀)

or, equivalently,

    lim_{Δx→0} [f(x₀ + Δx) − f(x₀)] = 0.   (**)

Observe that f(x₀ + Δx) − f(x₀) = Δy. Then (**) can be written as

    lim_{Δx→0} Δy = 0,

which offers another definition of the continuity of y = f(x) at the point x₀: the function f(x) is continuous at x₀ ∈ Ω if the increment of f at x₀, corresponding to the increment Δx of x, tends to zero as Δx → 0.
Example. Let us show that the function y = x² is continuous at every point of the number line.
◄ Indeed, given any increment Δx at x₀ we have

    Δy = (x₀ + Δx)² − x₀² = 2x₀Δx + (Δx)² = (2x₀ + Δx)Δx.

Whence it follows that Δy → 0 as Δx → 0. This implies that y = x² is continuous at every point x₀ of the number line. ►
The definitions given above are equivalent to each other and any of them can be applied when suitable.
In many cases it is convenient to apply the definition of continuity at
a point which stems from the sequential criterion for limits. Let f(x) be
defined on any set E of real numbers and let Xo E E. A function f(x) is
continuous at Xo if for any sequence {xn), Xn EE which converges to Xo
the corresponding sequence {f(xn)} of the values of f(x) converges to f(Xo).
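The sequential formulation is convenient for numerical experiments: feed different sequences xₙ → x₀ into f and watch whether f(xₙ) settles to a single value. The Python sketch below (the two sequences are my own choice) applies this to sin (1/x) at x₀ = 0, a function that reappears later as a discontinuity of the second kind.

    import math

    # Two sequences converging to 0:
    #   a_n = 1/(n*pi)            gives sin(1/a_n) = sin(n*pi)          = 0,
    #   b_n = 1/(2*n*pi + pi/2)   gives sin(1/b_n) = sin(2n*pi + pi/2)  = 1.
    for n in (10, 100, 1000):
        a = 1.0 / (n * math.pi)
        b = 1.0 / (2 * n * math.pi + math.pi / 2)
        print(n, math.sin(1.0 / a), math.sin(1.0 / b))
    # The two sequences of function values converge to different numbers (0 and 1),
    # so sin(1/x) has no limit at 0 and cannot be made continuous there.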
Example. Prove that the Dirichlet function

    D(x) = { 1 for rational x,
             0 for irrational x

is discontinuous at every point.
◄ Let x₀ be any irrational number. Then D(x₀) = 0, and whatever x₀ there exists a sequence {rₙ} of rational numbers converging to x₀. By definition D(rₙ) = 1 ∀n; so the sequence {D(rₙ)} = {1} converges to 1, and hence {D(rₙ)} does not converge to D(x₀). This means that D(x) is not continuous at irrational x. By similar reasoning we can easily verify that D(x) is discontinuous at all rational x. ►
It is not hard to show that the "ε-δ" definition of continuity at a point and the definition stemming from the sequential criterion for limits are equivalent.
Properties of functions continuous at a point. The following two theorems describe important properties of continuous functions.
Theorem 7.21. If a function f(x) is continuous at a point x₀ and if f(x₀) > A (f(x₀) < A), then there exists δ > 0 such that f(x) > A (f(x) < A) for all x in the open interval (x₀ − δ, x₀ + δ).
◄ For definiteness, we assume that f(x₀) > A. Then

    f(x₀) = A + h,

where h > 0.
Let ε = h/2. Since f(x) is continuous at x₀ there exists δ > 0 such that for all x

    |f(x) − f(x₀)| < h/2
or
    −h/2 < f(x) − f(x₀) < h/2

whenever |x − x₀| < δ. Whence

    f(x) > f(x₀) − h/2 = A + h − h/2 = A + h/2.

Thus, f(x) > A, ∀x ∈ (x₀ − δ, x₀ + δ). ►
From the preceding theorem the following theorem can be derived.
Theorem 7.22. Let f(x) be continuous at x₀ and let f(x₀) ≠ 0. Then there exists a neighbourhood (x₀ − δ, x₀ + δ) of x₀ such that f(x) does not vanish and is of the same sign at every point of the neighbourhood.
◄ To prove this theorem it suffices to put A = 0 in Theorem 7.21. ►
Continuity of elementary functions. The basic elementary functions are*)
(a) the power function y = x^α, where α is an arbitrary real number, x > 0;
(b) the exponential function y = aˣ, a > 0, a ≠ 1, −∞ < x < +∞;
(c) the logarithmic function y = logₐ x, a > 0, a ≠ 1, x > 0;
(d) the trigonometric functions y = sin x, −∞ < x < +∞; y = cos x, −∞ < x < +∞; y = tan x, x ≠ π/2 + nπ, n = 0, ±1, ±2, ...; y = cot x, x ≠ nπ, n = 0, ±1, ±2, ...;
(e) the inverse trigonometric functions y = sin⁻¹ x, −1 ≤ x ≤ 1; y = cos⁻¹ x, −1 ≤ x ≤ 1; y = tan⁻¹ x, −∞ < x < +∞; y = cot⁻¹ x, −∞ < x < +∞.
We say that elementary functions are those obtained by means of a finite number of arithmetic operations and a finite number of operations of constructing a function of a function. It is not hard to show that the basic elementary functions are continuous at every point of their domains.
We prove that the function y = cos x is continuous at every point of the domain −∞ < x < +∞.
◄ First we prove the auxiliary inequality

    |sin x| ≤ |x|  ∀x.   (*)

Look at a circle of unit radius (Fig. 7.26). Let the radian measure of the angle AOB be x, 0 < x < π/2, and let the angle AOB be equal to the angle AOC. Evidently, the length of the segment BC is equal to 2 sin x and the length of the arc BC is equal to 2x. Since the arc length is greater than the length of the subtending chord we have

    2 sin x < 2x  and  sin x < x.

This inequality can be written as |sin x| < |x| for x ∈ (0, π/2). The identities |sin (−x)| = |−sin x| = |sin x| and |−x| = |x| imply that the inequality |sin x| < |x| also holds for x ∈ (−π/2, 0). Observe that (*) is also true for x = 0 since sin 0 = 0. If x ∉ (−π/2, π/2) then |x| ≥ π/2 > 1, whereas |sin x| ≤ 1 ∀x. Thus we conclude that the inequality

    |sin x| ≤ |x|

holds for any x.
Now we turn to the function y = cos x which is defined at each point of the number line. Let x be an arbitrary point (x ∈ ℝ) and Δx be an increment of x at x. Then the increment of y = cos x can be expressed as

    Δy = cos (x + Δx) − cos x = −2 sin (x + Δx/2) sin (Δx/2).

Whence

    |Δy| = |−2 sin (x + Δx/2) sin (Δx/2)| = 2 |sin (x + Δx/2)| |sin (Δx/2)|.   (**)

Fig. 7.26        Fig. 7.27

Recall that |sin (x + Δx/2)| ≤ 1 ∀x, ∀Δx and, by virtue of (*), |sin (Δx/2)| ≤ |Δx|/2. Thus we can write (**) as

    |Δy| ≤ 2 · 1 · |Δx|/2 = |Δx|
and
    0 ≤ |Δy| ≤ |Δx|,

where Δy is a function of Δx for a given x.
Whence, by virtue of Theorem 7.13, we obtain

    lim_{Δx→0} Δy = 0,

which means that the function y = cos x is continuous at every point of the number line. ►

*) A detailed account of the basic elementary functions is given in the Appendix.
Remarkable limits. We consider the relation (sin x)/x, which is a function of x defined at any x ≠ 0, and prove the following theorem.
Theorem 7.23. If x is measured in radians then

    lim_{x→0} (sin x)/x = 1.

◄ Suppose that 0 < x < π/2 and consider a circle of unit radius (Fig. 7.27). From Fig. 7.27 it is easily seen that

    area of △OAB < area of sector OAB < area of △OAC.

These areas are equal to (1/2) sin x, (1/2) x and (1/2) tan x, respectively. Then we have

    sin x < x < tan x,  x ∈ (0, π/2).

Dividing all the terms of these inequalities by sin x > 0, we obtain

    1 < x/(sin x) < 1/(cos x).

Whence

    1 > (sin x)/x > cos x,   (***)

which holds for x ∈ (0, π/2) and also for x ∈ (−π/2, 0) since

    sin (−x)/(−x) = (sin x)/x  and  cos (−x) = cos x.

The function y = cos x is continuous at any x and, in particular, at x = 0, so that

    lim_{x→0} cos x = cos 0 = 1.   (****)

Thus both φ(x) = cos x and ψ(x) = 1 have limits equal to 1 at x = 0. By Theorem 7.13, (***) and (****) yield the identity we are seeking. ►
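Numerically, sin x / x (x in radians) approaches 1 squeezed between cos x and 1, exactly as in the proof. A quick Python check (sample points arbitrary):

    import math

    for x in (1.0, 0.1, 0.01, 0.001):
        print(f"x = {x:<6}  sin(x)/x = {math.sin(x)/x:.10f}   cos(x) = {math.cos(x):.10f}")
    # In every row cos(x) < sin(x)/x < 1, and both bounds tend to 1 as x -> 0.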
In the preceding sections we have proved that

    lim_{n→∞} (1 + 1/n)ⁿ = e.

Denoting 1/n = z, we have n = 1/z and z → 0 as n → ∞. Then

    lim_{z→0} (1 + z)^{1/z} = e,

where z runs through the values 1, 1/2, 1/3, ....
We can verify that lim_{x→0} (1 + x)^{1/x} exists and is also equal to the number e when x tends to 0 in an arbitrary way, running through any sequence of real values distinct from 0, i.e.,

    lim_{x→0} (1 + x)^{1/x} = e.
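The limit can be observed by letting x run through small values of either sign; a short Python sketch (sample points arbitrary):

    import math

    for x in (0.1, 0.001, 1e-6, -0.1, -0.001, -1e-6):
        print(f"x = {x:>8}   (1 + x)^(1/x) = {(1.0 + x)**(1.0 / x):.9f}")
    print("e =", math.e)
    # Whether x -> 0 through positive or negative values, (1 + x)^(1/x) approaches e.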
Operations on functions continuous at a point. In this section we shall develop a number of important properties of continuous functions.
Theorem 7.24. Let f(x) and φ(x) be defined at x₀ and in a neighbourhood of x₀. If f(x) and φ(x) are continuous at x₀, so are the sum f(x) + φ(x), the difference f(x) − φ(x), the product f(x)φ(x) and, if φ(x₀) ≠ 0, the quotient f(x)/φ(x).
By way of illustration we shall prove the quotient rule; all the other rules can be proved in a similar way.
◄ Let f(x) and φ(x) be continuous at x₀ and let φ(x₀) ≠ 0. By Theorem 7.22 there exists a neighbourhood of x₀ such that φ(x) ≠ 0 for all x in this neighbourhood. Thus, the function F(x) = f(x)/φ(x) is defined at x₀ and in a neighbourhood of x₀. Since lim_{x→x₀} f(x) = f(x₀) and lim_{x→x₀} φ(x) = φ(x₀) ≠ 0, the quotient rule (see Theorem 7.20) is applicable here and we have

    lim_{x→x₀} F(x) = lim_{x→x₀} f(x)/φ(x) = lim_{x→x₀} f(x) / lim_{x→x₀} φ(x) = f(x₀)/φ(x₀) = F(x₀).

Thus lim_{x→x₀} F(x) = F(x₀), i.e., the function F(x) = f(x)/φ(x) is continuous at x₀. ►
Composite functions. Let E be an arbitrary subset of the number line and let u = φ(x) be a function defined on E. We denote by E₁ the set of values of u corresponding to the values of x in E. Furthermore, let there be defined a function y = f(u) on E₁. Then every x ∈ E corresponds to a certain u ∈ E₁ which in turn is associated with a certain value of y = f(u) (Fig. 7.28). Thus, the value of y is a function of x which is defined on E. In this case we say that y is a composite function of x and write

    y = f[φ(x)].

For instance, if u = sin x and y = eᵘ then y = e^{sin x} is a composite function defined at all x.

Fig. 7.28 (y = f(u) = f[φ(x)])

Theorem 7.25. Let u = φ(x) have a limit equal to A at x₀ and let y = f(u) be a continuous function at the point u = A. Then the composite function y = f[φ(x)] has a limit equal to f(A) at x₀.
◄ We have to prove that lim_{x→x₀} f[φ(x)] = f(A), i.e.,

    ∀ε > 0 ∃δ > 0, ∀x, x ≠ x₀: |x − x₀| < δ ⇒ |f[φ(x)] − f(A)| < ε.

Let ε > 0 be an arbitrary number. Since f(u) is continuous at u = A, then for the given ε > 0 there exists a number η > 0 such that

    |f(u) − f(A)| < ε   (*)

for all u whenever |u − A| < η.
By definition A = lim_{x→x₀} φ(x), so that for this η > 0 there exists δ > 0 such that |φ(x) − A| < η or, equivalently, |u − A| < η for all x ≠ x₀ whenever |x − x₀| < δ.
The inequality |u − A| < η implies (*), which can be written as

    |f[φ(x)] − f(A)| < ε.

Thus, for any ε > 0 there exists δ > 0 such that

    |f[φ(x)] − f(A)| < ε

whenever |x − x₀| < δ and x ≠ x₀.
This means that the number f(A) is the limit of the composite function f[φ(x)] at x₀. ►
We see that when the conditions of Theorem 7.25 are fulfilled we have

    lim_{x→x₀} f[φ(x)] = f(A)

or, equivalently,

    lim_{x→x₀} f[φ(x)] = f[lim_{x→x₀} φ(x)].

The latter identity indicates the rule for computing the limit of a composite function.
Example. Show that lim_{x→0} ln (1 + x)/x = 1.
◄ Observe that ln (1 + x)/x = ln (1 + x)^{1/x}. The function y = ln (1 + x)^{1/x} is a composite function made up of the functions y = ln u and u = (1 + x)^{1/x}. Since lim_{x→0} (1 + x)^{1/x} = e and y = ln u is continuous at u = e, Theorem 7.25 gives

    lim_{x→0} ln (1 + x)/x = lim_{x→0} ln (1 + x)^{1/x} = ln [lim_{x→0} (1 + x)^{1/x}] = ln e = 1. ►

Theorem 7.26. Let u = φ(x) be a continuous function at x₀ and let y = f(u) be continuous at u₀ = φ(x₀). Then the composite function y = f[φ(x)] is continuous at x₀.
◄ Since u = φ(x) is continuous at x₀ it has at x₀ a limit equal to φ(x₀) = u₀. Besides, y = f(u) is continuous at u₀. Then Theorem 7.25 implies that the composite function y = f[φ(x)] has at x₀ a limit equal to f(u₀) = f[φ(x₀)], or

    lim_{x→x₀} f[φ(x)] = f[φ(x₀)].

Whence we infer that the composite function f[φ(x)] is continuous at x₀. ►


Discontinuities of functions. Let f(x) be defined at a point Xo and in
a neighbourhood of .xo. If f(x) is continuous at Xo then
lim f(x) = f(.xo)
X---+Xo

or, in terms of one-sided limits,

    lim_{x→x₀−0} f(x) = lim_{x→x₀+0} f(x) = f(x₀).   (*)

Thus the function f(x) is continuous at the point x₀ if and only if there exist the left-hand and right-hand limits of f(x) at x₀, these limits being equal to each other and to the value of f(x) at x₀.
Definition. The function f(x) has a discontinuity at the point x₀ if f(x) is not continuous at x₀; then x₀ is called a point of discontinuity*).
The discontinuities of a function can be classified according to the way in which the condition (*) is violated.
Definition. A point x₀ is called a removable discontinuity of f(x) if f(x) has the left- and right-hand limits at x₀ which are equal to each other but are different from the value of f(x) at x₀, i.e.,

    lim_{x→x₀−0} f(x) = lim_{x→x₀+0} f(x) ≠ f(x₀).

When x₀ is a removable discontinuity it suffices to modify the function f(x) only at the point x₀ so that f(x) becomes continuous at x₀. That is, if x₀ is a removable discontinuity of f(x) then the function

    F(x) = { f(x)                for x ≠ x₀,
             lim_{x→x₀} f(x)     for x = x₀

is continuous at x₀. We have removed the discontinuity by changing the value of f(x) at one point, x₀.

Fig. 7.29        Fig. 7.30

Example. Let

    f(x) = { |x|  for x ≠ 0,
             1    for x = 0.

◄ We have

    lim_{x→0−0} f(x) = lim_{x→0+0} f(x) = 0 ≠ 1 = f(0),

so that the point x = 0 is a removable discontinuity of f(x) (Fig. 7.29). If we modify f(x) at x = 0 by putting f(0) = 0 then we get the function F(x) = |x|, which is continuous at x = 0. ►

*) If f(x) is not defined at x₀, the point x₀ is also called a point of discontinuity.
In general, if a function f(x) is continuous on an open interval (x₀ − δ₁, x₀ + δ₂) everywhere, except at x₀, and if x₀ is a removable discontinuity, then the graph of f(x) is represented by a continuous curve from which the point corresponding to the abscissa x₀ is deleted and is replaced by the point (x₀, f(x₀)), whose ordinate is different from the ordinate of the removed point (Fig. 7.30). It is important to mention that if x₀ is a removable discontinuity then there exists lim_{x→x₀} f(x).
We say that x₀ is an unremovable discontinuity of f(x) if lim_{x→x₀} f(x) does not exist at x₀.
Definition. If the left- and right-hand limits of f(x) at x₀ are finite and are different, i.e., if

    lim_{x→x₀−0} f(x) ≠ lim_{x→x₀+0} f(x),

then the point x₀ is a discontinuity at which f(x) has a jump (or a gap), and f(x₀) does not need to be equal to either of the one-sided limits. In this case the jump of f(x) at x₀ is equal to the difference f(x₀ + 0) − f(x₀ − 0) of the right- and left-hand limits of f(x) at x₀.

Fig. 7.31

Example. Consider f(x) = 2/(1 + e^{1/x}), x ≠ 0, f(0) = 1 (Fig. 7.31).
◄ At x = 0 the function f(x) has a jump since

    lim_{x→0−0} f(x) = 2  and  lim_{x→0+0} f(x) = 0. ►

A removable discontinuity and a discontinuity at which the function has a jump are called discontinuities of the first kind. If x₀ is a discontinuity of the first kind, the function f(x) has finite left-hand and right-hand limits at x₀. All other discontinuities are called discontinuities of the second kind. Thus, if x₀ is a discontinuity of the second kind, then at least one of the one-sided limits of f(x) at x₀ either does not exist or is infinite.
Examples. (1) If f(x) = 1/x, x ≠ 0, f(0) = 0, the point x = 0 is a discontinuity of the second kind, for lim_{x→0−0} f(x) = −∞ and lim_{x→0+0} f(x) = +∞.
(2) Let f(x) = sin (1/x), x ≠ 0, f(0) = 0.
◄ At x = 0, f(x) has neither finite nor infinite one-sided limits. We can easily prove this by applying the sequential criterion for limits. Thus the point x = 0 is a discontinuity of the second kind. ►
(3) For the Dirichlet function

    D(x) = { 1 for rational x,
             0 for irrational x

any point x₀ is a discontinuity of the second kind.
We say that a function f(x) is continuous at a point x₀ from the right if

    lim_{x→x₀+0} f(x) = f(x₀)  or  f(x₀ + 0) = f(x₀),

and f(x) is continuous at x₀ from the left if

    lim_{x→x₀−0} f(x) = f(x₀)  or  f(x₀ − 0) = f(x₀).

7.7 Continuity on a Closed Interval


Definitions. A function f(x) is said to be continuous on an open interval (a, b) if f(x) is continuous at every point of (a, b). We denote the set of all functions continuous on an open interval (a, b) by C(a, b).
A function f(x) is said to be continuous on a closed interval [a, b] if f(x) is continuous on the open interval (a, b) and is continuous from the left at b and from the right at a. We denote the set of all functions continuous on [a, b] by C[a, b].
Properties of functions continuous on closed intervals can be established with the help of the following theorems.
Theorem 7.27. Let f(x) be continuous on a closed interval [a, b] and let f(a) and f(b) be of opposite signs. Then there exists at least one point in (a, b) at which f(x) vanishes.
◄ Let f(a) and f(b) be of opposite signs. The point ξ = (a + b)/2 is the midpoint of [a, b]. If f(ξ) = 0 we get what we are seeking, i.e., the theorem is true. Let f(ξ) ≠ 0. Consider the closed intervals [a, ξ] and [ξ, b]. One of these intervals, denoted by [a₁, b₁], is such that f(x) assumes values of opposite signs at its endpoints. Clearly, the point ξ₁ = (a₁ + b₁)/2 becomes the midpoint of [a₁, b₁]. If f(ξ₁) = 0 the theorem is proved. However, it may occur that f(ξ₁) ≠ 0. Then we choose between the closed intervals [a₁, ξ₁] and [ξ₁, b₁] the one at whose endpoints f(x) assumes values of opposite signs, denote this closed interval by [a₂, b₂], and repeat the process as above.
It is clear that if we continue this process we finally either arrive at a point ξ ∈ (a, b) where f(ξ) = 0, thus completing the proof of the theorem, or we obtain an infinite sequence of nested closed intervals

    [a, b] ⊃ [a₁, b₁] ⊃ ... ⊃ [aₙ, bₙ] ⊃ ....

The lengths of these closed intervals tend to zero as n → ∞, i.e.,

    lim_{n→∞} (bₙ − aₙ) = lim_{n→∞} (b − a)/2ⁿ = 0,

and on every closed interval f(x) assumes values of opposite signs at the endpoints.
Recall that the Cantor lemma implies that there exists a unique point ξ contained in each [aₙ, bₙ], n = 0, 1, 2, .... Let us prove that f(ξ) = 0. We assume the converse, i.e., f(ξ) ≠ 0, and for definiteness we set f(ξ) > 0. Since f(x) is continuous at ξ ∈ [a, b], Theorem 7.22 implies that there exists an open interval (ξ − δ, ξ + δ) in which f(x) is positive. As ξ = lim_{n→∞} aₙ = lim_{n→∞} bₙ, we can choose sufficiently large n so that the closed interval [aₙ, bₙ] is contained in the open interval (ξ − δ, ξ + δ). This means that f(aₙ) and f(bₙ) are of the same sign (both positive). However, on every closed interval [aₙ, bₙ], f(aₙ) and f(bₙ) are of opposite signs. Thus we have a contradiction which makes our assumption f(ξ) ≠ 0 false. Hence, f(ξ) = 0. Notice that a < ξ < b, and the point ξ must be distinct from both a and b since f(a) ≠ 0 and f(b) ≠ 0. ►
It is easy to interpret this theorem geometrically (Fig. 7.32). If f(a)f(b) < 0 then the points A(a, f(a)) and B(b, f(b)) lie in different half-planes relative to the x-axis, and the graph of the continuous function f(x) intersects the x-axis at least at one point between a and b.
It is important that f(x) be continuous on [a, b]. If the function is discontinuous at a point in [a, b] it may have a jump from a negative value to a positive one without assuming a zero value, as does the function

    f(x) = { −1 for −1 ≤ x < 0,
              1 for 0 ≤ x ≤ 1,

shown in Fig. 7.33.
To illustrate how Theorem 7.27 works we consider the polynomial equation of odd degree with real coefficients

    P₂ₙ₊₁(x) = a₀x²ⁿ⁺¹ + a₁x²ⁿ + ... + a₂ₙ₊₁ = 0.

◄ Let a₀ > 0. For sufficiently large absolute values of x, P₂ₙ₊₁(x) becomes negative when x is negative and positive when x is positive. Since the polynomial is everywhere a continuous function of x, P₂ₙ₊₁(x) necessarily vanishes at some intermediate point of its domain when changing its sign from negative to positive. Hence any polynomial of odd degree with real coefficients has at least one real root. ►

Fig. 7.32        Fig. 7.33

We can apply Theorem 7.27 when we wish to find out whether a polynomial has a real root and, if it has one, to compute an approximate value of the root.
Example. Consider the polynomial P₃(x) = x³ + x − 1.
◄ Since P₃(x) is of odd degree it has at least one real root.
Evidently, we have P₃(0) = −1 < 0 and P₃(1) = 1 > 0, i.e., at the endpoints of [0, 1] P₃(x) assumes values of opposite signs. Hence P₃(x) has a real root contained in (0, 1).
At the midpoint ξ₁ = 1/2 of the closed interval [0, 1] we have P₃(1/2) = −3/8 < 0. Recall that P₃(1) > 0. This means that the desired root is contained in the open interval (1/2, 1). Again we compute the value of P₃(x) at the midpoint of [1/2, 1], i.e., at ξ₂ = 3/4. We already know that P₃(1/2) < 0, and we obtain P₃(3/4) = 11/64 > 0. Hence the root of P₃(x) is contained in (1/2, 3/4). Working through this process we obtain a sequence of nested intervals of decreasing lengths. In other words, at each step of this process we improve the approximation of the root, whose precision is given by the length of the last interval thus obtained. ►
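The halving procedure just described is the bisection method. A minimal Python implementation (the tolerance and the iteration cap are my own choices, not part of the text), applied to P₃(x) = x³ + x − 1 on [0, 1]:

    def bisect(f, a, b, tol=1e-10, max_iter=100):
        """Root of a continuous f on [a, b], assuming f(a) and f(b) have opposite
        signs, so that Theorem 7.27 guarantees a root exists."""
        fa, fb = f(a), f(b)
        assert fa * fb < 0, "f(a) and f(b) must have opposite signs"
        for _ in range(max_iter):
            mid = (a + b) / 2
            fm = f(mid)
            if fm == 0 or (b - a) / 2 < tol:
                return mid
            if fa * fm < 0:          # the root lies in [a, mid]
                b, fb = mid, fm
            else:                    # the root lies in [mid, b]
                a, fa = mid, fm
        return (a + b) / 2

    root = bisect(lambda x: x**3 + x - 1, 0.0, 1.0)
    print(root, root**3 + root - 1)   # about 0.6823278, residual close to 0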
Theorem 7.28 (intermediate value theorem). Let f(x) be continuous on [a, b] and let f(a) = A and f(b) = B. If C is any value between A and B then there exists at least one point ξ ∈ [a, b] such that f(ξ) = C; i.e., if f(x) is continuous on [a, b] it assumes all intermediate values between f(a) and f(b).
◄ Consider the function φ(x) = f(x) − C.
For definiteness we assume that A < B and A < C < B. Clearly, φ(x) is continuous on [a, b] and

    φ(a) = f(a) − C = A − C < 0,
    φ(b) = f(b) − C = B − C > 0.

By virtue of Theorem 7.27 there exists ξ ∈ (a, b) such that φ(ξ) = f(ξ) − C = 0. Hence f(ξ) = C. ►
Theorem 7.29. If a function f(x) is continuous on [a, b] then f(x) is bounded on [a, b], i.e., there exists a number K > 0 such that for all x ∈ [a, b]

    |f(x)| ≤ K.

Remark. If a function f(x) is continuous on an open interval (a, b) or on a half-open interval (a, b] or [a, b), then f(x) is not necessarily bounded on the respective interval. For example, f(x) = 1/x is continuous on the half-open interval (0, 1] and is not bounded on it.
Now we suppose that f(x) is defined and bounded on a set E. We say that the supremum of the set of values of f(x) on E is the supremum M of the function f(x) on E, i.e.,

    M = sup_{x∈E} f(x).

Similarly, the infimum m of f(x) on E is

    m = inf_{x∈E} f(x).

Theorem 7.30. If a function f(x) is continuous on [a, b] then f(x) attains its supremum and its infimum on [a, b], i.e., there exist points x₁ and x₂ in [a, b] such that

    f(x₁) = m = inf_{x∈[a,b]} f(x),   f(x₂) = M = sup_{x∈[a,b]} f(x),
and
    f(x₁) ≤ f(x) ≤ f(x₂)  for all x ∈ [a, b].

It is important here that f(x) is continuous on [a, b]. For instance, f(x) = x is continuous and bounded on (−1, 1) but does not attain its supremum sup_{x∈(−1,1)} x = 1, i.e., there exists no x₀ ∈ (−1, 1) such that f(x₀) = 1. Analogously, f(x) does not attain its infimum inf_{x∈(−1,1)} x = −1.
Example. Consider f(x) = x − [x] on [0, 1] (Fig. 7.34).
◄ Here sup_{x∈[0,1]} f(x) = 1 is not attained on [0, 1], since f(x) is not continuous on [0, 1] (at x = 1 the function drops back to f(1) = 0). ►

Fig. 7.34

We call the supremum of f(x) the absolute maximum of f(x) on


[a, b] and the infimum of f(x) the absolute minimum of f(x) on [a, b].
Then we can express Theorem 7.30 in a different way.
Theorem 7.30 '. If a function f(x) is continuous on [a, b] then f(x) at-
tains its absolute maximum and absolute minimum on [a, b].
Uniform continuity. Functions continuous on closed intervals possess an important property of uniform continuity.
Let f(x) be continuous on (a, b). Then, given any x₀ ∈ (a, b) and any number ε > 0, there exists a number δ > 0 such that |f(x) − f(x₀)| < ε whenever x ∈ (a, b) and |x − x₀| < δ.
The value of δ may depend both on ε and on x₀, i.e., δ = δ(ε, x₀), so that it may happen that for a given ε > 0 the number δ is different for different points xₖ ∈ (a, b) and there exists no δ which will work for all xₖ ∈ (a, b). The requirement that there exist δ = δ(ε) > 0 suitable for all xₖ simultaneously is stronger than the requirement that f(x) be continuous on (a, b).
Definition. A function f(x) is said to be uniformly continuous on an open interval (a, b) if for every ε > 0 there exists δ = δ(ε) > 0 such that

    |f(x′) − f(x″)| < ε

whenever |x′ − x″| < δ and x′, x″ ∈ (a, b).
Using logical symbols, we can write

    (f(x) is uniformly continuous on (a, b)) ⇔ ∀ε > 0 ∃δ = δ(ε) > 0,
        ∀x′, x″ ∈ (a, b): |x′ − x″| < δ ⇒ |f(x′) − f(x″)| < ε.

It is important here that for every ε > 0 there exists δ > 0 such that |f(x′) − f(x″)| < ε for all x′, x″ belonging to (a, b) only under the condition |x′ − x″| < δ. For example, the function f(x) = x is uniformly continuous on the number line. To verify this it suffices to put δ = ε.
It is easy to see that if f(x) is a uniformly continuous function on (a, b), it is continuous at every point x ∈ (a, b). The converse is not true. For instance, f(x) = sin (π/x) is continuous on (0, 1) but is not uniformly continuous on (0, 1). Indeed, let xₙ′ = 1/n and xₙ″ = 2/(2n + 1). Then we can choose n so that the value of

    |xₙ′ − xₙ″| = |1/n − 2/(2n + 1)| = 1/(n(2n + 1))

becomes smaller than any δ > 0, whereas

    |f(xₙ′) − f(xₙ″)| = 1 > ε  ∀ε < 1.

Thus for some ε > 0, say for ε = 1/2, and for any δ > 0 there exist points xₙ′ and xₙ″ in (0, 1) such that |xₙ′ − xₙ″| < δ and |f(xₙ′) − f(xₙ″)| > ε. Hence f(x) = sin (π/x) is not uniformly continuous on (0, 1).
Example. The function f(x) = 1/x is continuous on (0, 1) but is not uniformly continuous on (0, 1).
◄ Let xₙ′ = 1/n and xₙ″ = 1/(n + 2ε), where ε > 0 is an arbitrary number. Then

    |xₙ′ − xₙ″| = 2ε/(n(n + 2ε)),

which can be made smaller than any δ > 0 by choosing sufficiently large n. However, |f(xₙ′) − f(xₙ″)| = 2ε > ε ∀ε > 0.
Hence f(x) = 1/x is not uniformly continuous on (0, 1). ►
The following theorem is worth mentioning.
Theorem 7.31 (due to Cantor). If a function f(x) is continuous on a closed interval [a, b] then f(x) is uniformly continuous on [a, b].
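The failure of uniform continuity of 1/x can be made concrete with the pairs xₙ′ = 1/n, xₙ″ = 1/(n + 2ε) used above; the Python sketch below takes ε = 1 (an arbitrary choice) and shows that the points get arbitrarily close while the function values always differ by 2ε.

    eps = 1.0
    for n in (10, 100, 1000, 10000):
        x1 = 1.0 / n
        x2 = 1.0 / (n + 2 * eps)
        print(f"n = {n:>6}  |x1 - x2| = {abs(x1 - x2):.3e}   |1/x1 - 1/x2| = {abs(1/x1 - 1/x2):.1f}")
    # The distance |x1 - x2| = 2*eps/(n*(n + 2*eps)) shrinks to 0, yet the values
    # of f differ by exactly 2*eps, so no single delta(eps) works on all of (0, 1).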

7.8 Comparison of Infinitesimals


Definitions. Let α(x) and β(x) be infinitesimals as x → x₀. Then
(a) α(x) is an infinitesimal of higher order than β(x) if

    lim_{x→x₀} α(x)/β(x) = 0.

In symbols, we write α(x) = o(β(x)), x → x₀, to mean that α(x) is an infinitesimal of higher order at x₀ than the infinitesimal β(x).
For example, α(x) = x² and β(x) = x are infinitesimals as x → 0. Then

    lim_{x→0} α(x)/β(x) = lim_{x→0} x²/x = lim_{x→0} x = 0.

Whence α(x) = o(β(x)), x → 0, or x² = o(x), x → 0.
(b) α(x) and β(x) are infinitesimals of the same order if

    lim_{x→x₀} α(x)/β(x) = C ≠ 0.

For example, α(x) = 2x and β(x) = x, x → 0, are infinitesimals of the same order since

    lim_{x→0} α(x)/β(x) = lim_{x→0} 2x/x = 2.

(c) α(x) and β(x) are incomparable infinitesimals as x → x₀ if the ratio α(x)/β(x) has neither finite nor infinite limit as x → x₀. For example, α(x) = x sin (1/x) and β(x) = x are incomparable since the ratio α(x)/β(x) = sin (1/x) has no finite limit at x = 0 and is not an infinity as x → 0.
(d) α(x) is an infinitesimal of the mth order, where m is a positive integer, relative to the infinitesimal β(x) = x − x₀ as x → x₀ if

    lim_{x→x₀} α(x)/(x − x₀)ᵐ = B ≠ 0.

For example, α(x) = 3 sin² x is an infinitesimal of order m = 2 relative to β(x) = x as x → 0 since

    lim_{x→0} 3 sin² x / x² = 3.

Equivalent infinitesimals. The infinitesimals α(x) and β(x) as x → x₀ are said to be equivalent if at the point x₀

    lim_{x→x₀} α(x)/β(x) = 1.

Thus equivalent infinitesimals are infinitesimals of the same order.
We write

    α(x) ~ β(x),  x → x₀,

to mean that α(x) and β(x) are equivalent infinitesimals.
Sometimes equivalent infinitesimals α(x) and β(x) are called asymptotically equal as x → x₀.
Remark. Let α(x), β(x) and γ(x) be infinitesimals as x → x₀. It is not hard to verify that
(1) α(x) ~ α(x) as x → x₀;
(2) if α(x) ~ β(x), then β(x) ~ α(x) as x → x₀;
(3) if α(x) ~ β(x) and β(x) ~ γ(x), then α(x) ~ γ(x) as x → x₀.
Thus equivalence is a reflexive, symmetric and transitive relation.
Examples of equivalent infinitesimals. In the previous sections we have proved that

    lim_{x→0} (sin x)/x = 1 ⇔ sin x ~ x, x → 0,
and
    lim_{x→0} ln (1 + x)/x = 1 ⇔ ln (1 + x) ~ x, x → 0.

It is easy to verify the following:

    lim_{x→0} (tan x)/x = 1 ⇔ tan x ~ x, x → 0,
    lim_{x→0} (sin⁻¹ x)/x = 1 ⇔ sin⁻¹ x ~ x, x → 0,
    lim_{x→0} (tan⁻¹ x)/x = 1 ⇔ tan⁻¹ x ~ x, x → 0.

By way of illustration we show that

    lim_{x→0} (aˣ − 1)/x = ln a   (a > 0, a ≠ 1).

◄ Put aˣ − 1 = y, so that y → 0 as x → 0 and aˣ = 1 + y, x = ln (1 + y)/ln a. Whence

    lim_{x→0} (aˣ − 1)/x = lim_{y→0} y / (ln (1 + y)/ln a) = lim_{y→0} ln a / (ln (1 + y)/y) = ln a.

Thus aˣ − 1 ~ x ln a, x → 0.
In particular, putting a = e, we get

    lim_{x→0} (eˣ − 1)/x = 1 ⇒ eˣ − 1 ~ x, x → 0. ►

Let us also verify that lim_{x→0} ((1 + x)ᵖ − 1)/x = p, where p is a real number.
◄ Put (1 + x)ᵖ − 1 = y, so that y → 0 as x → 0 and (1 + x)ᵖ = 1 + y. Whence

    p ln (1 + x) = ln (1 + y).

Thus we can write

    ((1 + x)ᵖ − 1)/x = y/x = (y/ln (1 + y)) · (p ln (1 + x)/x).

Working through the limit as x → 0 (y → 0), we obtain

    lim_{x→0} ((1 + x)ᵖ − 1)/x = lim_{y→0} y/ln (1 + y) · lim_{x→0} p ln (1 + x)/x = p.

Hence, (1 + x)ᵖ − 1 ~ px, x → 0. ►
The following list of equivalent infinitesimals is extremely helpful in working through a variety of topics discussed in calculus:

    sin x ~ x, x → 0;
    tan x ~ x, x → 0;
    sin⁻¹ x ~ x, x → 0;
    tan⁻¹ x ~ x, x → 0;
    ln (1 + x) ~ x, x → 0;
    aˣ − 1 ~ x ln a, x → 0;
    eˣ − 1 ~ x, x → 0;
    (1 + x)ᵖ − 1 ~ px, x → 0.

Remark. If there exist a nonzero number a and a positive integer m such that f(x) ~ axᵐ, x → 0, we call axᵐ the principal (power) asymptotic term of the function f(x) as x → 0. The right-hand sides of the asymptotic equalities in the list presented above are the principal (power) terms of their left-hand sides.
Theorem 7.32. Let α(x), β(x), α₁(x) and β₁(x) be infinitesimals as x → x₀ and let α(x) ~ α₁(x) and β(x) ~ β₁(x). If there exists a finite or infinite limit of α(x)/β(x) at x₀, this limit remains unchanged when α(x) is replaced by α₁(x) and β(x) is replaced by β₁(x).
◄ Consider α₁(x)/β₁(x) and express it as

    α₁(x)/β₁(x) = (α₁(x)/α(x)) · (α(x)/β(x)) · (β(x)/β₁(x)).   (*)

By the hypothesis of the theorem

    lim_{x→x₀} α₁(x)/α(x) = 1  and  lim_{x→x₀} β(x)/β₁(x) = 1.

Let A be the limit of α(x)/β(x) at x₀. Then Theorem 7.20(b) gives

    lim_{x→x₀} α₁(x)/β₁(x) = lim_{x→x₀} α₁(x)/α(x) · lim_{x→x₀} α(x)/β(x) · lim_{x→x₀} β(x)/β₁(x) = A.

Whence we infer that the ratio α₁(x)/β₁(x) also has a limit equal to A at x₀.
If the ratio α(x)/β(x) is an infinity as x → x₀, then the right-hand side of (*) and, consequently, α₁(x)/β₁(x) are also infinities as x → x₀. ►

Example. Compute lim_{x→0} (1 − cos x)/ln (1 + x²).
◄ Applying Theorem 7.32 and the list of equivalent infinitesimals, we have

    lim_{x→0} (1 − cos x)/ln (1 + x²) = lim_{x→0} 2 sin² (x/2) / x²
        = lim_{x→0} 2 · (x/2)² / x² = lim_{x→0} (x²/2)/x² = 1/2. ►
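Replacing 1 − cos x by x²/2 and ln (1 + x²) by x² is easy to check numerically; the sketch below compares the exact quotient with the value 1/2 predicted by Theorem 7.32 (sample points arbitrary):

    import math

    for x in (0.5, 0.1, 0.01, 0.001):
        exact = (1 - math.cos(x)) / math.log1p(x**2)
        print(f"x = {x:<6}  (1 - cos x)/ln(1 + x^2) = {exact:.8f}")
    # The quotient tends to 0.5, the value obtained by substituting the equivalent
    # infinitesimals x^2/2 and x^2 for the numerator and denominator.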

Theorem 7.33. For the infinitesimals α(x) and β(x) as x → x₀ to be equivalent it is necessary and sufficient that the difference α(x) − β(x), as x → x₀, be an infinitesimal of higher order than α(x) and β(x).
◄ Necessity. Let α(x) and β(x) be equivalent infinitesimals as x → x₀. We prove that the difference γ(x) = α(x) − β(x), as x → x₀, is an infinitesimal of higher order than β(x) and, consequently, than α(x).
Indeed, by the hypothesis of the theorem we have α(x) ~ β(x), x → x₀, so that lim_{x→x₀} α(x)/β(x) = 1. Whence

    lim_{x→x₀} γ(x)/β(x) = lim_{x→x₀} (α(x) − β(x))/β(x) = lim_{x→x₀} (α(x)/β(x) − 1) = 0.

This means that γ(x) is an infinitesimal of higher order than β(x) as x → x₀.
Sufficiency. Let γ(x) = α(x) − β(x) be an infinitesimal of higher order than the infinitesimals α(x) and β(x) as x → x₀. We prove that α(x) ~ β(x) as x → x₀.
Indeed, as stated, γ(x) is an infinitesimal of higher order than β(x); so

    lim_{x→x₀} γ(x)/β(x) = 0.

Whence

    lim_{x→x₀} α(x)/β(x) = lim_{x→x₀} (β(x) + γ(x))/β(x) = lim_{x→x₀} (1 + γ(x)/β(x)) = 1,

which implies that α(x) and β(x) are equivalent infinitesimals as x → x₀. ►
Example. The functions α(x) = x + 2x³ and β(x) = x are infinitesimals as x → 0. The difference γ(x) = 2x³ is an infinitesimal of higher order than α(x) and β(x) as x → 0 and, consequently, α(x) ~ β(x), x → 0.
We leave it to the reader to solve the following problem.
Problem. Let β₁(x), β₂(x), ..., β_m(x) be infinitesimals of higher order than α(x) as x → x₀. Prove that the sum s(x) = α(x) + β₁(x) + β₂(x) + ... + β_m(x) is an infinitesimal equivalent to α(x) as x → x₀.
Landau symbols. Let f(x) and φ(x) be defined in a neighbourhood Ω of x₀, except possibly at x₀, and let φ(x) ≠ 0 in a neighbourhood Ω₀ of x₀, x ≠ x₀. (Notice that x₀ may be a finite or an infinite point.)
We say that f(x) is an infinitesimal relative to φ(x) as x → x₀ and write f(x) = o(φ(x)) if

    lim_{x→x₀} f(x)/φ(x) = 0.

In particular, f(x) = o(1), x → x₀, means that f(x) is an infinitesimal as x → x₀.
Examples. (1) x² = o(x), x → 0.
(2) x = o(x²), x → ∞.
In general, x^α = o(x^β), x → +∞, for α < β, and x^α = o(x^β), x → 0 + 0, for α > β.
We write f(x) = O(φ(x)), x → x₀, if there exist a number M > 0 and a neighbourhood Ω₀ of x₀ such that

    |f(x)| ≤ M|φ(x)|  ∀x ∈ Ω₀, x ≠ x₀.

Thus f(x) is bounded relative to φ(x).
The notation f(x) = O(1), x → x₀, means that the function f(x) is bounded in a neighbourhood of x₀.
Examples. (1) x = O(x²), [1, +∞).
(2) x² = O(x), [−2, 2].
(3) sin x = O(1), −∞ < x < +∞.
Notice that the symbolic relations we have introduced are not equalities in the ordinary sense. For instance, the "equality" sin x = O(1), −∞ < x < +∞, does not imply that O(1) = sin x.
The following rules for the symbol "o" can easily be verified:
(a) o(f(x)) + o(f(x)) = o(f(x)), x → x₀, x ∈ Ω;
(b) o(f(x)) · o(φ(x)) = o(f(x) · φ(x)), x → x₀, x ∈ Ω;
(c) o(o(f(x))) = o(f(x)), x → x₀, x ∈ Ω.
◄ Indeed, let g₁(x) = o(f(x)) and g₂(x) = o(f(x)). Then (a) becomes g₁(x) + g₂(x) = o(f(x)), x → x₀, since

    lim_{x→x₀} (g₁(x) + g₂(x))/f(x) = lim_{x→x₀} g₁(x)/f(x) + lim_{x→x₀} g₂(x)/f(x) = 0. ►

It is also easy to verify the following rules:
(d) o(f(x)) + O(f(x)) = O(f(x)), x → x₀;
(e) o(f(x)) · O(φ(x)) = o(f(x) · φ(x)), x → x₀;
(f) O(f(x)) · O(φ(x)) = O(f(x) · φ(x)), x → x₀;
(g) O(o(f(x))) = o(f(x)), o(O(f(x))) = o(f(x)), x → x₀.
Recall that f(x) and φ(x) are equivalent, or asymptotically equal, as x → x₀, and we write f(x) ~ φ(x), x → x₀, if

    lim_{x→x₀} f(x)/φ(x) = 1.

Using the list of equivalent infinitesimals and Theorem 7.33, we obtain the asymptotic relations

    sin x = x + o(x), x → 0;
    eˣ = 1 + x + o(x), x → 0;
    ln (1 + x) = x + o(x), x → 0;
    (1 + x)^α = 1 + αx + o(x), x → 0.

The relations of the form f(x) ~ φ(x), f(x) = o(φ(x)) and f(x) = O(φ(x)) are called asymptotic formulas or asymptotic relations.

7.9 Complex Numbers


Representation of a complex number on a plane. A complex num-
ber is a relation of the form
Z = X + iy
7.9 Complex Numbers 295

where x and y are arbitrary real numbers and i is the imaginary unity such
that i2 = -1.
The numbers x and y are called the real and imaginary parts of the
complex number z = x + iy, respectively. We denote the real and imaginary
parts of z by
x = Re z and y = Im z.
The complex number x + iO is thought of as identical to the real num-
ber x.
The complex numbers z1 = x1 + iy1 and z2 = x2 + iy2 are said to be
equal (z1 = z2) if and only if X1 = x2 and Y1 = Y2,
Operations on complex numbers. (a) Addition of complex numbers. The sum of z₁ = x₁ + iy₁ and z₂ = x₂ + iy₂ is the complex number

    z = z₁ + z₂ = (x₁ + x₂) + i(y₁ + y₂).

The operation of addition of complex numbers obeys
(1) the commutative law
    z₁ + z₂ = z₂ + z₁
and
(2) the associative law
    (z₁ + z₂) + z₃ = z₁ + (z₂ + z₃).
(b) Subtraction of complex numbers. For any complex numbers z₁ and z₂ there exists a number z such that z₁ = z + z₂. This number z is called the difference of z₁ and z₂ and is denoted by

    z = z₁ − z₂ = (x₁ − x₂) + i(y₁ − y₂).

(c) Multiplication of complex numbers. The product z₁z₂ of the complex numbers z₁ = x₁ + iy₁ and z₂ = x₂ + iy₂ is the complex number

    z = z₁z₂ = (x₁x₂ − y₁y₂) + i(x₁y₂ + x₂y₁).

To memorize this formula it suffices to replace i² by −1 in the product (x₁ + iy₁)(x₂ + iy₂).
If z₁ and z₂ are real numbers the operation of multiplication of complex numbers becomes identical to that of real numbers.
It is easy to verify that multiplication of complex numbers obeys
(1) the commutative law
    z₁z₂ = z₂z₁;
(2) the associative law
    (z₁z₂)z₃ = z₁(z₂z₃);
(3) the distributive law (with respect to the operation of addition of complex numbers)
    (z₁ + z₂)z₃ = z₁z₃ + z₂z₃.
(d) Division of complex numbers. For any complex numbers z₁ and z₂ (z₂ ≠ 0) there exists a number z such that

    z₁ = z₂z.   (*)

The complex number z is called the quotient of z₁ and z₂ and is denoted by z = z₁/z₂. We shall derive the quotient formula for z₁ and z₂.
◄ Let z₁ = x₁ + iy₁, z₂ = x₂ + iy₂ and z = x + iy. Then from (*) it follows that

    x₁ = x₂x − y₂y  and  y₁ = x₂y + xy₂.

This system of equations is solvable with respect to x and y provided that z₂ ≠ 0. We have

    z = x + iy = (x₁x₂ + y₁y₂)/(x₂² + y₂²) + i (x₂y₁ − x₁y₂)/(x₂² + y₂²). ►

The complex number
    z̄ = x − iy
is called the complex conjugate of the complex number z = x + iy and obeys the following rules: the conjugate of a sum is the sum of the conjugates, the conjugate of a product is the product of the conjugates, the conjugate of a quotient is the quotient of the conjugates, and

    z z̄ = |z|² = x² + y².
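Python's built-in complex type follows exactly these rules, so the division formula and the conjugate identities can be verified directly; the sample numbers below are arbitrary.

    z1 = 3 + 4j
    z2 = 1 - 2j

    # Division by the formula derived above ...
    x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
    d = x2**2 + y2**2
    by_formula = complex((x1*x2 + y1*y2) / d, (x2*y1 - x1*y2) / d)
    print(by_formula, z1 / z2)                 # the two results coincide: (-1+2j)

    # ... and the conjugate rules.
    print((z1 + z2).conjugate(), z1.conjugate() + z2.conjugate())
    print((z1 * z2).conjugate(), z1.conjugate() * z2.conjugate())
    print(z1 * z1.conjugate(), abs(z1)**2)     # z * conj(z) = |z|^2 = x^2 + y^2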

Trigonometric and exponential forms of a complex number. The complex number z = x + iy can be represented on the xy-plane by the point M with coordinates (x, y) or by a position vector starting at O(0, 0) and ending at M(x, y) (Fig. 7.35). We shall call the xy-plane the complex plane. We also call the x-axis the real axis and the y-axis the imaginary axis.
It is convenient to use the polar coordinates (r, θ), where r is the length of the vector OM and θ is the angle between the vector OM and the x-axis, to locate the point M in the xy-plane (Fig. 7.36). In polar coordinates

    x = r cos θ,  y = r sin θ.

Then the trigonometric form of a complex number is

    z = r(cos θ + i sin θ),  z ≠ 0,

with the absolute value (modulus) of z

    r = |z| = √(x² + y²) = √(z z̄) ≥ 0,

and the argument of z

    θ = Arg z = arg z + 2kπ  (k = 0, ±1, ±2, ...),

where arg z is the principal value of the argument.

Fig. 7.35        Fig. 7.36

The principal value of the argument of z is defined by

    −π < arg z ≤ π   (or 0 ≤ arg z < 2π)

or by

    arg z = tan⁻¹(y/x)         for x > 0,
            π + tan⁻¹(y/x)     for x < 0 and y ≥ 0,
            −π + tan⁻¹(y/x)    for x < 0 and y < 0,
            π/2                for x = 0 and y > 0,
            −π/2               for x = 0 and y < 0.

The argument of the complex number z = 0 is assumed to be undefined, and the modulus of z = 0 is assumed to be zero.
Two nonzero complex numbers z₁ and z₂ are equal to each other if and only if the moduli of z₁ and z₂ are equal and the arguments of z₁ and z₂ are either equal or differ from each other by an integer multiple of 2π, i.e.,

    |z₁| = |z₂|  and  Arg z₁ = Arg z₂ + 2πn,

where n is an integer.
Example. Compute the modulus and argument of the complex number

    z = −sin (π/8) − i cos (π/8).

◄ We have

    x = −sin (π/8) < 0  and  y = −cos (π/8) < 0.

The principal value of the argument of z is

    arg z = −π + tan⁻¹(cot (π/8)) = −π + tan⁻¹[tan (π/2 − π/8)]
        = −π + tan⁻¹(tan (3π/8)) = −π + 3π/8 = −5π/8.

Hence

    Arg z = −5π/8 + 2kπ,  k = 0, ±1, ±2, ...,

    |z| = √(sin² (π/8) + cos² (π/8)) = 1. ►
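The modulus and the principal argument are available in Python's cmath module (abs and cmath.phase, the latter returning a value in (−π, π]), which lets us confirm the example; this check is not part of the original text.

    import cmath, math

    z = complex(-math.sin(math.pi / 8), -math.cos(math.pi / 8))
    print(abs(z))               # 1.0
    print(cmath.phase(z))       # about -1.96350, i.e. -5*pi/8
    print(-5 * math.pi / 8)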

Fig. 7.37

The relationship between complex numbers and vectors on a plane enables us to interpret the operations of addition and subtraction of complex numbers as vector addition and subtraction, as shown in Fig. 7.37.
It is easy to verify the following inequalities:

    |z₁ + z₂| ≤ |z₁| + |z₂|,
    |z₁ − z₂| ≥ ||z₁| − |z₂||.

When dealing with complex numbers it is helpful to use Euler's formula

    cos θ + i sin θ = e^{iθ},

which enables us to present a complex number z in the exponential form as

    z = re^{iθ}.

It is advisable to present complex numbers in the trigonometric and exponential forms when we wish to multiply or divide complex numbers. If z₁ = r₁e^{iθ₁} and z₂ = r₂e^{iθ₂} then

    z₁z₂ = r₁e^{iθ₁} r₂e^{iθ₂} = r₁r₂e^{i(θ₁ + θ₂)}.

◄ Indeed,

    z₁z₂ = r₁e^{iθ₁} r₂e^{iθ₂} = r₁(cos θ₁ + i sin θ₁) r₂(cos θ₂ + i sin θ₂)
        = r₁r₂[(cos θ₁ cos θ₂ − sin θ₁ sin θ₂) + i(sin θ₁ cos θ₂ + sin θ₂ cos θ₁)]
        = r₁r₂(cos (θ₁ + θ₂) + i sin (θ₁ + θ₂)) = r₁r₂ e^{i(θ₁ + θ₂)}.

Thus, multiplication of complex numbers involves multiplication of moduli and addition of arguments, viz.,

    |z₁z₂| = |z₁| |z₂|
and
    arg (z₁z₂) = arg z₁ + arg z₂. ►

Analogously, the operation of division of complex numbers can be described as

    z₁/z₂ = r₁e^{iθ₁}/(r₂e^{iθ₂}) = (r₁/r₂) e^{i(θ₁ − θ₂)},

provided that r₂ ≠ 0. Whence it follows that

    |z₁/z₂| = |z₁|/|z₂|
and
    arg (z₁/z₂) = arg z₁ − arg z₂.

Extracting a root of a complex number. We introduce the operation of raising the complex number z to the nth power as

    zⁿ = z · z · ... · z  (n factors).
Let us represent the complex number z in the exponential and trigonometric forms as

    z = re^{iθ} = r(cos θ + i sin θ).

Then we arrive at the formula

    zⁿ = (re^{iθ})ⁿ = rⁿe^{inθ} = rⁿ(cos nθ + i sin nθ).

Setting r = 1, we obtain de Moivre's formula

    (cos θ + i sin θ)ⁿ = cos nθ + i sin nθ.

Based on this experience we can define the operation of extracting a root as follows. The complex number w is called an nth root of the complex number z if

    wⁿ = z.

In symbols, we write w = ⁿ√z.
We shall show that for any z ≠ 0 the root ⁿ√z assumes n distinct values. Substituting

    z = re^{iθ}  and  w = ϱe^{iφ}

into wⁿ = z, we obtain

    ϱⁿe^{inφ} = re^{iθ}.

Recall that if two complex numbers are equal, their moduli are equal, and their arguments are either equal or differ by an integer multiple of 2π. Then, from the above identity it follows that

    ϱⁿ = r  and  nφ = θ + 2kπ,
or
    ϱ = ⁿ√r  and  φ = (θ + 2kπ)/n.   (*)

Thus, the moduli of all the nth roots of z are equal and their arguments differ by integer multiples of 2π/n. Whence it follows that the nth roots can be displayed on the complex plane as the vertices of the regular polygon of n sides inscribed in the circle of radius ⁿ√r centred at the point z = 0 (Fig. 7.38).
Substituting k = 0, 1, 2, ..., n − 1 into (*), we get n distinct complex numbers

    ⁿ√z = ⁿ√r (cos ((θ + 2πk)/n) + i sin ((θ + 2πk)/n)),  k = 0, 1, 2, ..., n − 1,
or
    ⁿ√z = ⁿ√r e^{i(θ + 2πk)/n},  k = 0, 1, 2, ..., n − 1.   (**)

Example. Compute all the values of ∛i.
◄ Writing z = i in the exponential form, we have

    z = i = e^{iπ/2}.

Then we obtain from (**)

    wₖ = e^{i(π/6 + 2πk/3)},  k = 0, 1, 2.

Fig. 7.38        Fig. 7.39

Whence

    w₀ = e^{iπ/6} = cos (π/6) + i sin (π/6) = (√3 + i)/2,
    w₁ = e^{i5π/6} = cos (5π/6) + i sin (5π/6) = (−√3 + i)/2,
    w₂ = e^{i3π/2} = cos (3π/2) + i sin (3π/2) = −i

(see Fig. 7.39). ►
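Formula (**) translates directly into a few lines of Python; below it is used to reproduce the three cube roots of i (a sketch relying on the standard cmath module).

    import cmath

    def nth_roots(z, n):
        """All n values of the nth root of z, computed from formula (**)."""
        r, theta = abs(z), cmath.phase(z)
        return [r ** (1.0 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
                for k in range(n)]

    for w in nth_roots(1j, 3):
        print(w, "  w^3 =", w ** 3)
    # Output: (sqrt(3) + i)/2, (-sqrt(3) + i)/2 and -i; cubing each returns i up to
    # floating-point rounding.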


Limits of sequences of complex numbers. Let {Zn } be a sequence of
complex numbers, where Zn = Xn + iyn. A complex number z is a limit of
302 7. An Introduction to Analysis

{Zn} if given any e > 0 there exists a number N = N(e) such that
lzn - zl < e
whenever n ~ N.
A sequence {Zn } is called convergent to a number z if z is the limit
of {Zn }. In symbols, we write
Z = lim Zn or Zn ~ Z (n ~ 00)
n -+oo

to mean that z is a limit of Zn or Zn converges to z, or, using logical symbols,


(lim Zn = z) {:} (Ve > 0 3N 'rln ~ N, lzn - zl < e).
n-+ oo

Every term Zn = Xn + iyn of {Zn } is associated with the real numbers


Xn and Yn• Hence the sequence {Zn ) of complex numbers corresponds to
two sequences {Xn} and {Yn} of real numbers comprising the real and im-
aginary parts of terms Zn, respectively.
Theorem 7.34. A sequence {Zn} of complex numbers Zn = Xn + iyn is
convergent if and only if both the sequences {xn} and {Yn} of real numbers
are convergent.
◄ Let {Zn ) converge to a complex number z. This means that for any
e > 0 there exists N such that lzn - zl < e whenever n ~ N. Since
lxn - xi ~ lzn - zl < e and IYn - YI ~ lzn - zl < e then lim Xn = X and
n -+oo
lim Yn = y. Conversely, if lim Xn = x and lim Yn = y then for any e > 0
n -+oo n-+oo n-+oo
there exists Ni such that
e e
lxn - xi < ,,,fi and IYn - YI < ,,,fi"
Thus
lzn - zl = l(Xn + iyn) - (x + iy)j = ✓ (xn - x 2 ) + (yn - y) 2 < e.
Whence lim (Xn + iyn) = x + iy. ►
x-+oo

By virtue of this theorem all results concerning sequences of real num-


bers are fully applicable to sequences of complex numbers.

Exercises
1. Prove that the limit of {1/n²} is equal to zero. Find which n satisfy the inequality 1/n² < ε provided that (a) ε > 0 is arbitrary, (b) ε = 0.1 and (c) ε = 0.01.
2. Prove that the limit of [ n ] is equal to 1.
n +1
Exercises 303

Compute the following limits:
3. lim_{n→∞} (n + 2)²/n².
4. lim_{n→∞} (2n³ − n + 1)/(n⁴ + 16n + 2).
5. lim_{n→∞} (n² − n + 5)/(2n + 6).
6. lim_{n→∞} ∛(2n³ + n − 2)/(n + 1).
7. lim_{n→∞} (n + 6 − n²)/(√(n⁴ + n + 1) + ⁴√(n⁴ + 1)).
8. lim_{n→∞} (⁵√(n⁵ + 1) + √(n² + 2))/(⁴√(n⁴ + 3) + √(n³ + 5)).
(Hint. On evaluating the limit of a quotient of polynomials it is helpful to divide the numerator and denominator by nᵖ where p is the greatest exponent in the denominator. This approach is also used when we evaluate limits of fractions involving irrational functions.)
Compute the following limits:
9. lim_{n→∞} n!/((n + 1)! − n!).
10. lim_{n→∞} (1/2 + 1/4 + ... + 1/2ⁿ).
11. lim_{n→∞} (1 + 2 + ... + n)/n².
12. lim_{n→∞} (n cos n!)/(n² + 1).
13. lim_{n→∞} (√(n + 2) − √n).
14. lim_{n→∞} (√(n² − 4n + 5) − n).
15.
16. lim_{x→1} (x² − 2x + 1)/(x³ − x).
17. lim_{x→−1} (x² − 1)/(x² + 3x + 2).
18. lim_{x→−2} (x³ + 3x² + 2x)/(x² − x − 6).
19. lim_{x→0} (√(1 + x²) − 1)/x.
20. lim_{x→7} (2 − √(x − 3))/(x² − 49).
21. lim_{x→0} (∛(1 + x²) − 1)/x².
22. lim_{x→∞} (x³ + x)/(x⁴ − 3x² + 1).
23. lim_{x→∞} (1 + x − 2x³)/(1 + x² + 3x³).
Using equivalent infinitesimals, compute the following limits:
24. lim_{x→0} sin 2x/x.
25. lim_{x→0} sin αx/sin βx (α, β are constants).
26. lim_{x→0} tan 3x/sin 2x.
27. lim_{x→0} sin⁻¹3x/x.
28. lim_{x→0} tan⁻¹2x/sin 3x.
29. lim_{x→π} sin 3x/sin 2x.
30. lim_{x→1} (1 − x²)/sin πx.
31. lim_{x→0} ln(1 + x²)/sin²3x.
32. lim_{x→e} (ln x − 1)/(x − e).
33. lim_{x→0} (ln cos x)/x².
34. lim_{x→0} (e^{x²} − 1)/sin²x.
35. lim_{x→0} ln(1 − x)/(e^{sin x} − 1).
36. lim_{x→0} (e^{ax} − e^{bx})/x (a, b are constants).
37. lim_{x→∞} x(e^{1/x} − 1).

Compute the limits:
38. lim_{x→∞} (x/(1 + x))ˣ.
39. lim_{x→∞} ((x + 2)/(x − 1))^{3x}.
40. lim_{x→0} (cos x)^{1/x²}.
41. lim_{x→∞} ((x² + 1)/(2x² + 1))^{x²}.
42. Compute the modulus and the principal value of the argument of the following:
(a) 4 + 3i; (b) −2 + 2√3 i; (c) −7 − i; (d) −cos(π/5) + i sin(π/5); (e) 4 − 3i.
43. Write down the complex numbers in the trigonometric and exponential forms:
(a) −2; (b) 2i; (c) −i; (d) −√2 + i√2.
44. Compute the following:
(a) (2 − 2i)⁷; (b) (√3 − 3i)⁶; (c) ((1 + i)/(1 − i))⁴.
45. Compute all the values of the roots:
(a) ⁴√1; (b) √i; (c) ⁴√i; (d) ∛(−1 + i); (e) √(2 − 2√3 i).

Answers

1. (a) n > 1/√ε; (b) n ≥ 4; (c) n > 10. 3. 1. 4. 0. 5. ∞. 6. ∛2. 7. −1. 8. 0. 9. 0. 10. 1. 11. 1/2. 12. 0. 13. 0. 14. −2. 15. 1/2. 16. 0. 17. −2. 18. −2/5. 19. 0. 20. −1/56. 21. 1/3. 22. 0. 23. −2/3. 24. 2. 25. α/β. 26. 3/2. 27. 3. 28. 2/3. 29. −3/2 (put π − x = t). 30. 2/π. 31. 1/9. 32. 1/e. 33. −1/2. 34. 1. 35. −1. 36. a − b. 37. 1. 38. 1/e. 39. e⁹. 40. e^{−1/2}. 41. 0.
42. (a) r = 5, θ = tan⁻¹(3/4); (b) r = 4, θ = 2π/3; (c) r = 5√2, θ = −π + tan⁻¹(1/7); (d) r = 1, θ = 4π/5; (e) r = 5, θ = −tan⁻¹(3/4).
43. (a) 2(cos π + i sin π), 2e^{iπ}; (b) 2(cos(π/2) + i sin(π/2)), 2e^{iπ/2}; (c) cos(−π/2) + i sin(−π/2), e^{−iπ/2}; (d) 2(cos(3π/4) + i sin(3π/4)), 2e^{i3π/4}.
44. (a) 2¹⁰(1 + i); (b) 1728; (c) 1.
45. (a) ±1, ±i; (b) ±(√2/2)(1 + i); (c) ±(cos(π/8) + i sin(π/8)), ±(cos(3π/8) − i sin(3π/8)); (d) (∛4/2)(1 + i), ⁶√2(−cos(π/12) + i sin(π/12)), ⁶√2(sin(π/12) − i cos(π/12)); (e) ±(√3 − i).
Chapter 8
Differential Calculus.
Functions of One Variable

8.1 Derivatives and Differentials


The derivative. Let y = f(x) be a function defined on an open interval (a, b) and let x be a point in (a, b). Consider a point x + Δx, where Δx is a positive or negative increment of the variable x such that the point x + Δx is contained in (a, b). The increment Δx will produce an increment Δy in y = f(x) so that
Δy = f(x + Δx) − f(x).
We can write the ratio of the two increments Δy and Δx as
Δy/Δx = [f(x + Δx) − f(x)]/Δx  (Δx ≠ 0).
For a given x this ratio is a function of Δx, i.e.,
Δy/Δx = φ(Δx).
Definition. If there exists a limit of the ratio Δy/Δx as Δx → 0, this limit is called the derivative of the function y = f(x) at the point x and is denoted by f'(x) or y'(x) or y'_x.
Thus by definition we have
f'(x) = lim_{Δx→0} Δy/Δx = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx.   (*)
Examples. (1) Let y = x². Then for any x and any Δx we have Δy = (x + Δx)² − x² = 2xΔx + (Δx)². Whence Δy/Δx = 2x + Δx and lim_{Δx→0} Δy/Δx = 2x. By definition lim_{Δx→0} Δy/Δx = y'(x). Hence y = x² has the derivative y' = 2x, i.e., (x²)' = 2x at every point x.
(2) Let y = eˣ. Then for any x and any Δx we have Δy = e^{x+Δx} − eˣ = eˣ(e^{Δx} − 1). Whence
lim_{Δx→0} Δy/Δx = lim_{Δx→0} eˣ(e^{Δx} − 1)/Δx = eˣ lim_{Δx→0} (e^{Δx} − 1)/Δx = eˣ.
Thus (eˣ)' = eˣ for all x.
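The limit in the definition (*) can also be watched numerically. The sketch below is an added illustration (the point x = 1 and the step sizes are chosen arbitrarily): it computes the difference quotient Δy/Δx for y = x² and y = eˣ and compares it with the derivatives 2x and eˣ found above.

```python
import math

def diff_quotient(f, x, dx):
    """The ratio (f(x + dx) - f(x)) / dx from the definition of the derivative."""
    return (f(x + dx) - f(x)) / dx

x = 1.0
for dx in (0.1, 0.01, 0.001, 0.0001):
    q1 = diff_quotient(lambda t: t * t, x, dx)   # tends to 2x = 2
    q2 = diff_quotient(math.exp, x, dx)          # tends to e^x = e
    print(dx, q1, q2)
```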

Remark. Sometimes it is convenient to express (*) in the following form. Let f(x) be defined at x₀ and in a neighbourhood of x₀. Then
f'(x₀) = lim_{x→x₀} [f(x) − f(x₀)]/(x − x₀),
provided that this limit exists.
We say that the function f(x) has a derivative on (a, b) if this derivative f'(x) exists at each point x ∈ (a, b).
We leave it to the reader to solve the following problems by applying the definition of the derivative given above.
Problems. (1) Using the definition of the derivative find f'(0) for the function
f(x) = x² sin(1/x) for x ≠ 0, f(x) = 0 for x = 0.

(2) Let f(x) be a periodic function with period T. Prove that if f(x) has
a derivative then the derivative of f(x) is also a periodic function with peri-
od T.
(3) Prove that the derivative of an even function is also an even function
and the derivative of an odd function is an odd function provided that
these derivatives exist.
Geometric interpretation of the derivative. We consider the graph of a function y = f(x) which is defined on (a, b) (Fig. 8.1) and choose the points M(x, f(x)) and P(x + Δx, f(x + Δx)) on the curve y = f(x). We also draw the line through the points M and P.
Suppose that the point P is moving along the curve y = f(x) towards the point M (or, which is the same, Δx tends to zero) so that the line MP is changing its position until MP coincides with the line MT. The line MT which defines the limiting position of MP as Δx → 0 is called the tangent to the curve y = f(x) at the point M. Notice that when the point P moves to the point M the angle TMP tends to zero.
From Fig. 8.1 it is easily seen that the slope k of the line MP is
k = tan α = Δy/Δx.
Let φ ≠ π/2 be the angle between the tangent MT and the x-axis. Since the slope of the tangent MT to the curve y = f(x) at M is the limit of the slope of MP as the point P moves to the point M along the curve and, consequently, as Δx → 0, we get
tan φ = lim_{P→M} tan α = lim_{Δx→0} Δy/Δx = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx.
The latter limit, if it exists, is the derivative f'(x) and is equal to tan φ. Therefore, the derivative f'(x) of the function y = f(x) is the slope of the tangent to the curve y = f(x) at the point specified by the abscissa x.

Fig. 8.1

Equations of tangents and normals to a curve. Let a curve be defined by a function y = f(x) and let M₀(x₀, f(x₀)) be a point on the curve. We assume that f(x) has a derivative at x₀ and derive the equation of the tangent to the curve at M₀.
The equation of a line through the point M₀(x₀, y₀) is
y − y₀ = k(x − x₀),
where k is the slope of the line.
The slope k_T of the tangent to the curve y = f(x) at M₀ is equal to the derivative f'(x₀), so that the equation of the tangent takes the form
y − y₀ = f'(x₀)(x − x₀)  (y₀ = f(x₀)).
The normal to the curve at a given point is the line which passes through this point and is perpendicular to the tangent to the curve at this point. This implies that the slope k_n of the normal is related to the slope k_T of the tangent as
k_n = −1/k_T = −1/f'(x₀)  (f'(x₀) ≠ 0),

so that the equation of the normal to the curve y = f(x) at M₀(x₀, y₀) is
y − y₀ = −(1/f'(x₀))(x − x₀)  (f'(x₀) ≠ 0).
When f'(x₀) = 0 the equation of the normal becomes x = x₀.
Example. Write down the equations of the tangent and normal to the curve y = x² at the point O(0, 0).
◄ We have f(x) = x², f'(x) = 2x and f'(0) = 0. Then the equation of the tangent is y − 0 = 0·(x − 0) or y = 0, i.e., the tangent coincides with the x-axis, and the equation of the normal is x = 0, i.e., the normal coincides with the y-axis (Fig. 8.2). ►
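For a point where f'(x₀) ≠ 0 both equations are easy to evaluate directly. The following Python sketch is an added illustration; the point x₀ = 1 on the same parabola is chosen arbitrarily.

```python
def tangent_and_normal(f, fprime, x0):
    """Return (slope, intercept) of the tangent and the normal to y = f(x)
    at the point (x0, f(x0)); assumes f'(x0) != 0."""
    y0 = f(x0)
    kt = fprime(x0)          # slope of the tangent, k_T = f'(x0)
    kn = -1.0 / kt           # slope of the normal, k_n = -1/f'(x0)
    # y = k*(x - x0) + y0 rewritten as y = k*x + b
    return (kt, y0 - kt * x0), (kn, y0 - kn * x0)

tangent, normal = tangent_and_normal(lambda x: x ** 2, lambda x: 2 * x, 1.0)
print("tangent (slope, intercept):", tangent)   # (2.0, -1.0): y = 2x - 1
print("normal  (slope, intercept):", normal)    # (-0.5, 1.5): y = -x/2 + 3/2
```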

Fig. 8.2    Fig. 8.3

Application of the derivative in mechanics. Let s = s(t) be an equation (a law) of rectilinear motion of a point, which specifies the distance travelled by the point as a function of time t. Let Δs be the distance travelled by the point during the time interval Δt from t to t + Δt, i.e.,
Δs = s(t + Δt) − s(t).
The ratio Δs/Δt is called the mean velocity over the time interval Δt. We define the velocity v of the point at the moment t as the limit of the mean velocity over the time interval Δt as Δt → 0 and write
v(t) = lim_{Δt→0} Δs/Δt = s'(t).
Hence the velocity v(t) is equal to the derivative of the distance s with respect to time t, i.e., v(t) = s'(t).
Example. Let us consider the law of rectilinear motion s = t², where the distance s is measured in metres and the time t is measured in seconds, and compute the velocity at t = 3 s.

◄ The velocity of the point at any t is given by
v = ds/dt = 2t.
Whence we have v = 6 m/s at t = 3 s. ►


Right-hand and left-hand derivatives. The right-hand derivative f'(x + 0) of the function y = f(x) at a point x is
f'(x + 0) = lim_{Δx→0, Δx>0} Δy/Δx = lim_{Δx→0+0} Δy/Δx
and the left-hand derivative of y = f(x) at x is
f'(x − 0) = lim_{Δx→0, Δx<0} Δy/Δx = lim_{Δx→0−0} Δy/Δx,
provided the limits involved exist.
It is easy to see that for the derivative f'(x) to exist at a point x it is necessary and sufficient that the function y = f(x) have the right-hand and left-hand derivatives at x and that these derivatives be equal, so that
f'(x + 0) = f'(x − 0) = f'(x).
By way of illustration we shall show that there exist functions which have right-hand and left-hand derivatives at x but have no derivative at x. Consider the function f(x) = |x|. The ratio
[f(0 + Δx) − f(0)]/Δx = |Δx|/Δx
is equal to 1 for Δx > 0 and is equal to −1 for Δx < 0. Hence f(x) = |x| has the right-hand derivative f'(0 + 0) = lim_{Δx→0, Δx>0} |Δx|/Δx = 1 and the left-hand derivative f'(0 − 0) = lim_{Δx→0, Δx<0} |Δx|/Δx = −1 at x = 0, but these derivatives are distinct. Consequently, f(x) = |x| has no derivative at x = 0. Geometrically this means that there is no tangent to the curve y = |x| at the point O(0, 0) (Fig. 8.3).
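The two distinct one-sided limits for f(x) = |x| at x = 0 are visible directly in the difference quotients; the brief sketch below is an added numerical illustration (the step sizes are arbitrary).

```python
def one_sided_quotients(f, x, dx):
    """Right- and left-hand difference quotients of f at x for a step dx > 0."""
    right = (f(x + dx) - f(x)) / dx
    left = (f(x - dx) - f(x)) / (-dx)
    return right, left

for dx in (0.1, 0.001, 1e-6):
    print(dx, one_sided_quotients(abs, 0.0, dx))   # always (1.0, -1.0): the limits differ
```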
Let f(x) be continuous at a point x₀. We say that f(x) has an infinite derivative equal to +∞ or to −∞ at x₀ if at this point
f'(x₀) = lim_{Δx→0} Δy/Δx = +∞ or f'(x₀) = lim_{Δx→0} Δy/Δx = −∞,
respectively.

This implies that the tangent to the curve y = f(x) at the point (x₀, f(x₀)) is perpendicular to the x-axis. For example, if f(x) = ∛x then at x = 0 we have
Δy/Δx = [f(0 + Δx) − f(0)]/Δx = ∛Δx/Δx = 1/∛((Δx)²).
Whence it is easy to see that Δy/Δx tends to +∞ as Δx tends to zero in an arbitrary manner. The tangent to the curve y = ∛x at the point O(0, 0) coincides with the y-axis (Fig. 8.4).

Fig. 8.4

Fig. 8.5

Thus, if a function f(x) has a finite derivative at x₀ then there exists a tangent to the graph of y = f(x) at M₀(x₀, f(x₀)) (Fig. 8.5) and this tangent is given by the equation
y − f(x₀) = f'(x₀)(x − x₀).

Fig. 8.6

Fig. 8.7

We say that a function f(x) is smooth on an open interval (a, b) if f(x)


and its derivative are continuous on (a, b). A curve corresponding to a
smooth function is also called smooth on (a, b).
If a function y = f(x) is continuous at x₀ and has the right- and left-hand derivatives f'(x₀ + 0) and f'(x₀ − 0) and if f'(x₀ + 0) ≠ f'(x₀ − 0), then at the point M₀(x₀, f(x₀)) there exists no tangent to the curve y = f(x). In this case the curve is not smooth and at M₀(x₀, f(x₀)) there exist two lines, one of which is a tangent to the left branch of the curve and the other is a tangent to the right branch. The point M₀(x₀, f(x₀)) is said to be a corner point of the curve (Fig. 8.6). Notice that the point O(0, 0) is the corner point of the function (curve) y = |x|.
If a function f(x) is continuous at x₀ and its derivative is infinite at x₀, the following cases should be distinguished:
(1) f'(x₀) = +∞;
(2) f'(x₀) = −∞;
(3) f'(x₀ − 0) = −∞, f'(x₀ + 0) = +∞;
(4) f'(x₀ − 0) = +∞, f'(x₀ + 0) = −∞.
Figure 8.7 displays the graphs of y = f(x) and the tangents x = x₀ corresponding to cases (1)-(4).
Differentiable functions. Let y = f(x) be defined on an open interval (a, b) and let x be a point in (a, b). Consider an increment Δx in x such that the point x + Δx is contained in (a, b). The increment Δx produces the increment Δy in f(x) so that
Δy = f(x + Δx) − f(x).
The function f(x) is called differentiable at the point x ∈ (a, b) if the increment of f(x),
Δy = f(x + Δx) − f(x),
corresponding to the increment Δx admits a representation of the form
Δy = A Δx + α(Δx) Δx,
where A is a number independent of Δx (though in general A depends on x) and α(Δx) → 0 as Δx → 0.
Example. Let y = x². For any x and any Δx we have
Δy = (x + Δx)² − x² = 2x Δx + Δx·Δx,
with A = 2x and α = Δx. By definition this means that y = x² is differentiable at any point x and A = 2x, α(Δx) = Δx.
The following theorem specifies the necessary and sufficient conditions for a function to be differentiable.

Theorem 8.1. For a function y = f(x) to be differentiable at a point x it is necessary and sufficient that f(x) have a finite derivative f'(x) at x.
◄ Necessity. Let f(x) be differentiable at x. We prove that there exists a derivative f'(x) at x. Indeed, since y = f(x) is differentiable at x, the increment Δx in x gives rise to the increment Δy which can be expressed as
Δy = A Δx + α(Δx) Δx.
Whence
Δy/Δx = A + α(Δx),
where A is independent of Δx, i.e., is constant at a given point x, and α(Δx) → 0 as Δx → 0.
Theorem 7.19 implies that
A = lim_{Δx→0} Δy/Δx = f'(x).
Thus a derivative of f(x) at x does exist.
Sufficiency. Let f(x) have a finite derivative f'(x) at x. We prove that f(x) is differentiable at x. Indeed, since f'(x) exists at x there exists a limit of Δy/Δx as Δx → 0, so that
lim_{Δx→0} Δy/Δx = f'(x).
Whence, by virtue of Theorem 7.19, it follows that
Δy/Δx = f'(x) + α(Δx),
where α(Δx) → 0 as Δx → 0 and, consequently,
Δy = f'(x) Δx + α(Δx) Δx.
Since f'(x) is independent of Δx and α(Δx) → 0 as Δx → 0, this representation means that y = f(x) is differentiable at x. ►
Theorem 8.1 establishes a one-to-one correspondence between the no-
tion of a function differentiable at a point and the notion of a function
having a finite derivative at this point, that is, if f(x) is differentiable at
a point then it has a finite derivative at this point and vice versa. So the
operation of computing a derivative of a function is also called the differen-
tiation of a function.
Continuity of differentiable functions. We shall prove the theorem
which establishes a relationship between continuous functions and differen-
tiable functions.
Theorem 8.2. If a function y = f(x) is differentiable at a point x then f(x) is continuous at x.
◄ Indeed, if f(x) is differentiable at x then the increment Δy corresponding to the increment Δx in x admits a representation of the form
Δy = A Δx + α(Δx) Δx,
where A is constant at x and α(Δx) → 0 as Δx → 0. Whence lim_{Δx→0} Δy = 0, so that f(x) is continuous at x. ►
The converse is not true, namely, if f(x) is continuous at x it is not necessarily differentiable at x. For example, f(x) = |x| is continuous at x = 0 but has no derivative at x = 0. Hence f(x) is not differentiable at x = 0.
Example. The function
f(x) = x sin(1/x) for x ≠ 0, f(x) = 0 for x = 0
is continuous on (−∞, +∞). For all x ≠ 0, f(x) has a derivative; however, f(x) has neither right- nor left-hand derivatives at x = 0 since
Δy/Δx = [Δx sin(1/Δx)]/Δx = sin(1/Δx)
has no limit as Δx → 0 + 0 or as Δx → 0 − 0.
The examples we have considered involve functions each of which has
no derivative only at one point in its domain. The idea that a function
may have no derivatives only at a finite number of points prevailed in the
eighteenth and early nineteenth centuries. However, later on various emi-
nent mathematicians offered examples of functions continuous on a closed
interval [a, b] and having no derivatives at any point of [a, b].
The differential. Let y = f(x) be differentiable at a point x, i.e., let an increment Δx in x produce an increment Δy in f(x) so that
Δy = A Δx + α(Δx) Δx,
where α(Δx) → 0 as Δx → 0.
If y = f(x) is differentiable at x, the linear part A Δx of Δy, provided that A ≠ 0, is called the differential of y = f(x) and is denoted by dy or by df(x), so that
dy = A Δx.
When A ≠ 0 we say that A Δx is the principal linear part of Δy, since α(Δx) Δx is an infinitesimal of higher order than A Δx as Δx → 0. If A = 0 the differential is equal to zero.
By virtue of Theorem 8.1 we have A = f'(x). Then the differential dy becomes
dy = f'(x) Δx.
Along with the notion of the differential of a function we can introduce the notion of the differential of an independent variable x by putting dx = Δx. Then the differential of y = f(x) can be written as
dy = f'(x) dx.
Whence we have the Leibniz notation for the derivative: f'(x) = dy/dx. Thus the derivative of f(x) can also be thought of as a quotient of two differentials dy and dx.
We say that a function y = f(x) is differentiable on an open interval (a, b) if f(x) is differentiable at every point of (a, b).

Fig. 8.8

To interpret the notion of a differential geometrically we use Fig. 8.8. Let a function y = f(x) be differentiable on some open interval (a, b). We draw a tangent to the curve y = f(x) at a point M and choose a point M₁ whose abscissa is x + dx. Clearly, f'(x) = tan φ. Consider the triangle MPQ. It is easy to see that
PQ = MP tan φ = f'(x) dx = dy.
Thus, the differential dy = f'(x) dx of the function y = f(x) is the increment of the ordinate of the tangent to the curve y = f(x) at M when x is given the increment dx.

8.2 Differentiation Rules


The derivative of a constant function. If y = C = const at every point of (a, b) then y' = 0 in (a, b).
◄ Indeed, if y = C ∀x ∈ (a, b), then for any x ∈ (a, b) and any Δx such that x + Δx ∈ (a, b) we have
Δy = C − C = 0 and Δy/Δx = 0  (Δx ≠ 0).
Whence
y' = lim_{Δx→0} Δy/Δx = 0 ∀x ∈ (a, b).
Thus
(C)' = 0 and dC = 0. ►

The derivative of a sum of functions. Let u(x) and v(x) be differentiable at x. Then the sum y(x) = u(x) + v(x) is also differentiable at x and
y'(x) = (u(x) + v(x))' = u'(x) + v'(x).
◄ Indeed, the increment Δx in x produces increments Δu and Δv in u(x) and v(x) so that
u(x + Δx) = u(x) + Δu and v(x + Δx) = v(x) + Δv.
Then the increment Δy in y = u(x) + v(x) becomes
Δy = [(u + Δu) + (v + Δv)] − (u + v) = Δu + Δv.
Whence
Δy/Δx = Δu/Δx + Δv/Δx.   (*)
Since u(x) and v(x) are differentiable at x there exist the derivatives u'(x) and v'(x) at x. Then each summand in (*) has a limit as Δx → 0, so that there exists a limit of the right-hand side of (*) equal to u'(x) + v'(x). Hence the left-hand side of (*) also has a limit as Δx → 0. In other words, there exists
lim_{Δx→0} Δy/Δx = y'(x).
Thus, evaluating the limit of (*) as Δx → 0, we obtain
y'(x) = (u(x) + v(x))' = u'(x) + v'(x)
and
d(u + v) = du + dv. ►

Analogously, we prove that
(u(x) − v(x))' = u'(x) − v'(x)
and
d(u − v) = du − dv.
We can easily extend these results to any finite number of differentiable functions.
Example. Find the derivative of y = x² + eˣ + 2.
◄ y' = (x² + eˣ + 2)' = (x²)' + (eˣ)' + (2)' = 2x + eˣ. ►
The derivative of a product of functions (product rule). If u(x) and v(x) are differentiable at x, so is the product u(x)v(x), and
(u(x)v(x))' = u'(x)v(x) + u(x)v'(x).
◄ Let Δx be an increment in x. Then u(x) and v(x) receive the increments Δu and Δv which give rise to the increment
Δy = (u + Δu)(v + Δv) − uv = v Δu + u Δv + Δu Δv.
Consider the ratio
Δy/Δx = v Δu/Δx + u Δv/Δx + Δu Δv/Δx.   (*)
We prove that each summand in the right-hand side of (*) has a limit as Δx → 0. Indeed, at a given point x, u(x) and v(x) are constant. Since u(x) and v(x) are differentiable at x there exist the derivatives
u'(x) = lim_{Δx→0} Δu/Δx and v'(x) = lim_{Δx→0} Δv/Δx.
Since u(x) and v(x) are differentiable at x they are continuous at x, so that Δu and Δv tend to zero as Δx → 0. Thus the right-hand side of (*) has a limit equal to v(x)u'(x) + u(x)v'(x) as Δx → 0. Hence there exists a limit of the left-hand side of (*), i.e., there exists lim_{Δx→0} Δy/Δx = y'(x). Evaluating the limit of (*) as Δx → 0, we obtain
y'(x) = (u(x)v(x))' = u'(x)v(x) + u(x)v'(x).
Whence
d(uv) = v du + u dv. ►
Example. Find the derivative of y = (x² − 1)(eˣ + 2).
◄ y' = ((x² − 1)(eˣ + 2))' = (x² − 1)'(eˣ + 2) + (x² − 1)(eˣ + 2)'
= 2x(eˣ + 2) + (x² − 1)eˣ = (x² + 2x − 1)eˣ + 4x. ►

Corollary. If a function is multiplied by a constant factor, its derivative (or differential) is multiplied by this factor, i.e.,
(Cu(x))' = Cu'(x) and d(Cu(x)) = C du  (C = const).
The product rule is easily generalized to any finite number of differentiable functions, so that
(u₁(x)u₂(x)...uₙ(x))' = u₁'(x)u₂(x)...uₙ(x) + u₁(x)u₂'(x)...uₙ(x) + ... + u₁(x)u₂(x)...uₙ'(x).

The derivative of a quotient of functions (quotient rule). If u(x) and v(x) are differentiable at a point x and if v(x) ≠ 0 at x, then the quotient y(x) = u(x)/v(x) is differentiable at x and
y' = (u/v)' = (u'v − uv')/v²  (v(x) ≠ 0),
d(u/v) = (v du − u dv)/v².
◄ Since v(x) is differentiable at x, v(x) is continuous at x and, by virtue of Theorem 7.22, v(x + Δx) ≠ 0 for all sufficiently small |Δx|. Then the ratio
u(x + Δx)/v(x + Δx) = (u + Δu)/(v + Δv)
is defined for all sufficiently small |Δx|.
The increment Δx in x produces the increment
Δy = (u + Δu)/(v + Δv) − u/v = (v Δu − u Δv)/(v² + v Δv).
Whence
Δy/Δx = (v Δu/Δx − u Δv/Δx)/(v² + v Δv).   (*)
Recall that u(x) and v(x) are differentiable at x. This implies that there exist lim_{Δx→0} Δu/Δx = u'(x) and lim_{Δx→0} Δv/Δx = v'(x), and Δv → 0 as Δx → 0. Also, at a given point x the values of u and v are constant and, as stated, v(x) ≠ 0.
The right-hand side of (*) thus has a limit as Δx → 0 equal to (vu' − uv')/v². Hence there exists a limit on the left of (*), i.e., there exists lim_{Δx→0} Δy/Δx = y'(x). Evaluating the limit of (*) as Δx → 0, we obtain
y'(x) = (u(x)/v(x))' = [u'(x)v(x) − u(x)v'(x)]/v²(x)  (v(x) ≠ 0). ►
Example. Find the derivative of y = (eˣ − 1)/(x² + 5).
◄ y' = ((eˣ − 1)/(x² + 5))' = [(eˣ − 1)'(x² + 5) − (eˣ − 1)(x² + 5)']/(x² + 5)²
= [eˣ(x² + 5) − (eˣ − 1)·2x]/(x² + 5)² = [(x² − 2x + 5)eˣ + 2x]/(x² + 5)². ►
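Derivatives obtained by the product and quotient rules are easy to double-check with a computer algebra system. The sketch below is an added illustration relying on the third-party sympy library; it reproduces the last two examples.

```python
import sympy as sp

x = sp.symbols('x')

# Product rule example: y = (x**2 - 1)*(exp(x) + 2)
y1 = (x**2 - 1) * (sp.exp(x) + 2)
print(sp.expand(sp.diff(y1, x)))     # compare with (x**2 + 2*x - 1)*exp(x) + 4*x

# Quotient rule example: y = (exp(x) - 1)/(x**2 + 5)
y2 = (sp.exp(x) - 1) / (x**2 + 5)
print(sp.simplify(sp.diff(y2, x)))   # compare with ((x**2 - 2*x + 5)*exp(x) + 2*x)/(x**2 + 5)**2
```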

Problems. (1) Determine whether f(x) + φ(x) is differentiable at a point x or not, provided that f(x) is differentiable at x and φ(x) is not.
(2) Suppose that f(x) is differentiable at x₀, f(x₀) ≠ 0, and φ(x) is not differentiable at x₀. Prove that f(x)φ(x) is not differentiable at x₀.
(3) Suppose that f(x) and φ(x) have no derivatives at x₀. Does it imply that (a) f(x) + φ(x), (b) f(x)φ(x) and (c) f(x)/φ(x) have no derivatives at x₀?
(Consider the following functions: (i) f(x) = |x|, φ(x) = −|x|, x₀ = 0; (ii) f(x) = φ(x) = |x|, x₀ = 0; (iii) f(x) = φ(x) = |x| + 1, x₀ = 0.)
Derivatives of some elementary functions. (1) Exponential function y = aˣ (a > 0, a ≠ 1). This function is defined at every point of the number line. Hence, for any x and any Δx we have
Δy = a^{x+Δx} − aˣ = aˣ(a^{Δx} − 1)
and, for Δx ≠ 0,
Δy/Δx = aˣ (a^{Δx} − 1)/Δx,
lim_{Δx→0} Δy/Δx = lim_{Δx→0} aˣ (a^{Δx} − 1)/Δx = aˣ lim_{Δx→0} (a^{Δx} − 1)/Δx = aˣ ln a.
Thus
(aˣ)' = aˣ ln a.
In particular, if a = e then
(eˣ)' = eˣ.
(2) Logarithmic function y = ln x (x > 0). For any x > 0 and any Δx such that x + Δx > 0 we have
Δy = ln(x + Δx) − ln x = ln(1 + Δx/x).
Whence, since ln(1 + Δx/x) ~ Δx/x as Δx → 0,
lim_{Δx→0} Δy/Δx = lim_{Δx→0} ln(1 + Δx/x)/Δx = lim_{Δx→0} (Δx/x)/Δx = 1/x.
Thus
(ln x)' = 1/x.
Using the identity
log_a x = log_a e · ln x  (a > 0, a ≠ 1),
we arrive at
(log_a x)' = log_a e (ln x)' = (log_a e)/x = 1/(x ln a)
and, finally,
(log_a x)' = 1/(x ln a).
(3) Power function y = x^α (α is an arbitrary real number). This function is defined at least for all x > 0. Then
Δy = (x + Δx)^α − x^α = x^α[(1 + Δx/x)^α − 1]
and
Δy/Δx = x^α [(1 + Δx/x)^α − 1]/Δx.
Recall that (1 + Δx/x)^α − 1 ~ α Δx/x as Δx → 0 (see Chap. 7). Then
lim_{Δx→0} Δy/Δx = x^α lim_{Δx→0} [(1 + Δx/x)^α − 1]/Δx = x^α lim_{Δx→0} (α Δx/x)/Δx = αx^{α−1}.
Thus
(x^α)' = αx^{α−1}.

(4) Trigonometric functions. Consider y = sin x, −∞ < x < +∞. For any x and any Δx
Δy = sin(x + Δx) − sin x = 2 sin(Δx/2) cos(x + Δx/2)
and
Δy/Δx = [sin(Δx/2)/(Δx/2)] cos(x + Δx/2).
Since y = cos x is a continuous function at any point x,
lim_{Δx→0} cos(x + Δx/2) = cos x and lim_{Δx→0} sin(Δx/2)/(Δx/2) = 1,
we obtain
lim_{Δx→0} Δy/Δx = lim_{Δx→0} [sin(Δx/2)/(Δx/2)] cos(x + Δx/2) = cos x.
Thus
(sin x)' = cos x.
Analogously
(cos x)' = −sin x.
Using the formulas for the derivatives of sin x and cos x, we easily arrive at
(tan x)' = (sin x/cos x)' = [(sin x)' cos x − sin x (cos x)']/cos²x = 1/cos²x = sec²x,
so that
(tan x)' = 1/cos²x = sec²x  (x ≠ π/2 + nπ, n = 0, ±1, ±2, ...).
Analogously
(cot x)' = −1/sin²x = −cosec²x  (x ≠ nπ, n = 0, ±1, ±2, ...).

8.3 Differentiation of Composite and Inverse Functions


Derivatives of composite functions. First we shall prove the following important theorem.
Theorem 8.3 (chain rule). Let u = φ(x) be differentiable at x₀ and let y = f(u) be differentiable at u₀ = φ(x₀). Then the composite function y = f[φ(x)] is differentiable at x₀ and
{f[φ(x)]}'_x |_{x=x₀} = f'(u₀)φ'(x₀).
◄ Following our usual approach we consider the increment Δu in u = φ(x) corresponding to the increment Δx in x₀. The increment Δu produces the increment Δy in y = f(u).
Since y = f(u) is differentiable at u₀ we can write
Δy = f'(u₀)Δu + α(Δu)Δu,   (*)
where α(Δu) → 0 as Δu → 0.
The function α(Δu) is indeterminate at Δu = 0. If we set α(0) = 0, α(Δu) becomes continuous at Δu = 0.
Dividing both sides of (*) by Δx (Δx ≠ 0), we obtain
Δy/Δx = f'(u₀) Δu/Δx + α(Δu) Δu/Δx.   (**)
Also, u = φ(x) is differentiable at x₀ and, consequently, continuous at x₀. Hence Δu → 0 as Δx → 0. This implies that α(Δu) tends to zero and Δu/Δx → φ'(x₀) as Δx → 0. Therefore the right-hand side of (**) has a limit equal to f'(u₀)φ'(x₀) as Δx → 0. So there exists a limit of the left-hand side of (**), i.e., lim_{Δx→0} Δy/Δx, which is the derivative of the composite function y = f[φ(x)] at x₀ with respect to the variable x.
Evaluating the limit of (**) as Δx → 0, we obtain
{f[φ(x)]}'_x |_{x=x₀} = f'(u₀)φ'(x₀),
where f'(u₀) designates the derivative of f(u) with respect to the variable u at the point u₀ = φ(x₀) corresponding to the point x₀ of the variable x.
This identity can be written as
dy/dx = (dy/du)(du/dx) or y'_x = y'_u u'_x. ►
Examples. (1) Find the derivative of y = e^{sin x}.
◄ Here y is a composite function of x which can be expressed as y = e^{u(x)}, where u(x) = sin x. Then
y'_x = (e^u)'_u u'_x = e^u cos x = e^{sin x} cos x. ►

(2) Find the derivative of y = ln|x|, x ≠ 0.
◄ This function is even and is defined at every point of the number line except at x = 0. If x > 0 then |x| = x and ln|x| = ln x, so that
y'_x = (ln x)' = 1/x  (x > 0).
If x < 0 then |x| = −x and ln|x| = ln(−x).
Set y = ln u and u = −x. Then y = ln(−x) becomes a composite function. By the chain rule we have
y'_x = (ln u)'_u u'_x = (1/u)(−1) = (1/(−x))(−1) = 1/x,
so that for x < 0, y'_x = 1/x.
Thus
(ln|x|)' = 1/x  (x ≠ 0). ►
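Both chain-rule computations can be replayed symbolically; the brief sketch below is an added illustration using the third-party sympy library.

```python
import sympy as sp

x = sp.symbols('x')

# Example (1): y = exp(sin(x))
print(sp.diff(sp.exp(sp.sin(x)), x))        # exp(sin(x))*cos(x)

# Example (2), branch x < 0: y = ln(-x); the chain rule again gives 1/x
print(sp.simplify(sp.diff(sp.log(-x), x)))  # 1/x
```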

Remark. The chain rule is also applicable to any finite chain of functions. For example, if y = f(u), u = φ(z) and z = ψ(x), so that y = f{φ[ψ(x)]}, and if there exist the derivatives f'_u, φ'_z and ψ'_x, then
y'_x = f'_u φ'_z ψ'_x.
Invariance of the form of the differential. If y = f(u) is a differentiable function of an independent variable u then
dy = f'(u)du,   (*)
where du = Δu.
Let u be a differentiable function u = φ(x) of a variable x. Then we may consider y as a composite function y = f[φ(x)] of the variable x. Since x is an independent variable we can express the differential dy of the composite function y = f[φ(x)] as
dy = {f[φ(x)]}'_x dx.
The chain rule yields
{f[φ(x)]}'_x = f'(u)φ'(x)
and we can write
dy = f'(u)φ'(x)dx.
Since φ'(x)dx = du we again get the identity (*)
dy = f'(u)du.
Therefore the differential of a function is expressed by the same formula irrespective of whether the argument of this function is an independent variable or a function of another variable. This property is sometimes called the invariance of the form of the differential. It is worth mentioning that in the formula dy = f'(u)du the differential du is equal to an arbitrary increment Δu of an independent variable u or, when u = φ(x), du = φ'(x)dx is the linear part of the increment of u = φ(x) and, in general, is not equal to Δu.
Differentiation of inverse functions. Let y = f(x) be defined on a closed interval [a, b]. Suppose that the range of y = f(x) is the closed interval [α, β] on the y-axis. Furthermore, let each y in [α, β] correspond to only one x in [a, b] such that f(x) = y (Fig. 8.9). Then we can specify the function x = φ(y) on [α, β] by associating every y in [α, β] with the x in [a, b] such that f(x) = y. The function x = φ(y) is called the inverse function of y = f(x).

Fig. 8.9

Clearly, if x = φ(y) is the inverse of y = f(x) then y = f(x) is the inverse of x = φ(y). In this case we speak of y = f(x) and x = φ(y) as mutually inverse functions. We can write
f[φ(y)] = y and φ[f(x)] = x,
provided that y = f(x) and x = φ(y) are mutually inverse functions.
We shall follow a more constructive approach to specifying an inverse function. Let y = f(x) be an equation solvable for x so that every

y is associated with exactly one x. Then we have the equation x = φ(y) which defines x as a function of y. The function x = φ(y) is the inverse function of y = f(x).
Examples. (1) If y = 3x is defined on [0, 1] then the function x = y/3 defined on [0, 3] is the inverse function of y.
(2) The inverse function of y = x³, −∞ < x < +∞, is x = ∛y, −∞ < y < +∞.
(3) Let
y = x for rational x, y = 1 − x for irrational x.
Then the inverse function of y(x) is
x = y for rational y, x = 1 − y for irrational y.

Fig. 8.10

Clearly, the equations y = f(x) and x = φ(y) specify the same curve in the xy-plane. If we display the independent variable on the x-axis in both cases, i.e., if we plot the functions y = f(x) and y = φ(x) instead of y = f(x) and x = φ(y), their graphs will be symmetric relative to the line bisecting the first and third quadrants of the coordinate plane (Fig. 8.10).
We say that y = f(x) is an increasing function on [a, b] if for any x₁ and x₂ in [a, b] such that x₁ < x₂ there holds f(x₁) < f(x₂).
For instance, f(x) = x³ is an increasing function for −∞ < x < +∞.

Theorem 8.4. Let y = f(x) be a continuous and increasing function on [a, b] and let f(a) = α and f(b) = β. Then y = f(x) has an inverse function x = φ(y) on [α, β], x = φ(y) being continuous and increasing on [α, β].
We confine ourselves to a geometric interpretation (Fig. 8.11) of Theorem 8.4. The curve AB represents the graph of y = f(x) which is continuous and increasing on [a, b]. Every y in [α, β] corresponds to only one x in [a, b] such that f(x) = y. Thus AB sets up a one-to-one correspondence between x and y and we may consider x as a function of y on [α, β], i.e., we may say that x = φ(y) is the inverse function of y = f(x). The function x = φ(y) represented by the curve AB is continuous and increasing on [α, β] since as y increases, so does x.
Similar considerations are fully applicable to any continuous function which is decreasing on [a, b].

Fig. 8.11    Fig. 8.12

Theorem 8.5. Let y = f(x) have a derivative f'(x₀) ≠ 0 at a point x₀ and let there exist an inverse function x = φ(y) of y = f(x), x = φ(y) being continuous at y₀ = f(x₀). Then the inverse function x = φ(y) has a derivative at the point y₀ and
φ'(y₀) = 1/f'(x₀)  (f'(x₀) ≠ 0).
◄ Consider x = φ(y) and an increment Δy in y for y = y₀. Then we can write the corresponding increment Δx in x = φ(y) as
Δx = φ(y₀ + Δy) − φ(y₀).
Notice that Δy = f(x₀ + Δx) − f(x₀). Then Δy ≠ 0 implies that Δx ≠ 0; otherwise the given point x₀ would correspond to two distinct values y₀ = f(x₀) and y₀ + Δy, which is impossible by the definition of a function.
Against this background we can write the ratio Δx/Δy (Δy ≠ 0) as
Δx/Δy = 1/(Δy/Δx).   (*)
If Δy tends to zero so does Δx, for x = φ(y) is continuous at y₀.
Since y = f(x) has a derivative at x₀, lim_{Δx→0} Δy/Δx = f'(x₀). We also have f'(x₀) ≠ 0. Hence there exists a limit of the quotient Δy/Δx as Δy → 0 (and, consequently, as Δx → 0) and this limit is equal to f'(x₀). Then (*) implies that there exists a limit of Δx/Δy as Δy → 0 and
lim_{Δy→0} Δx/Δy = 1/f'(x₀).
On the other hand, the limit of Δx/Δy as Δy → 0 is the derivative φ'(y₀) of x = φ(y) at the point y = y₀. Hence we have
φ'(y₀) = 1/f'(x₀) or x'_y = 1/y'_x. ►   (**)
It is easy to interpret Theorem 8.5 geometrically. If y = f(x) has a derivative at x₀ then there exists a tangent to the graph of y = f(x) at M₀(x₀, f(x₀)), and if this tangent is not parallel to the x-axis, then it is also a tangent to the curve x = φ(y) at M₀ (Fig. 8.12). (Notice that the functions y = f(x) and x = φ(y) are inverse to each other and are specified by the same curve.) Looking at Fig. 8.12 we see that f'(x₀) = tan α, φ'(y₀) = tan β and α + β = π/2, so that tan β = tan(π/2 − α) = cot α = 1/tan α, i.e.,
φ'(y₀) = 1/f'(x₀).
We can also write (**) as
f'(x₀) = 1/φ'(y₀) or y'_x = 1/x'_y  (x'_y ≠ 0).

Differentiation formulas for inverse functions are easily obtainable by applying the chain rule. Indeed, let y = f(x) and x = φ(y) be mutually inverse functions. Then φ[f(x)] = x. On differentiating both sides with respect to x and using the chain rule, we get
φ'_y f'_x = 1,
whence
φ'_y = 1/f'_x  (f'_x ≠ 0) and f'_x = 1/φ'_y  (φ'_y ≠ 0).
Differentiation of inverse trigonometric functions. (a) The function y = sin⁻¹x is defined on the closed interval [−1, 1] (Fig. 8.13). Notice that y = sin⁻¹x is the inverse of x = sin y defined on −π/2 ≤ y ≤ π/2.
On the open interval −1 < x < 1, x = sin y has the positive derivative x'_y = cos y for each y ∈ (−π/2, π/2). Then there exists the derivative
y'_x = 1/x'_y = 1/cos y = 1/√(1 − sin²y) = 1/√(1 − x²),   (*)
where −1 < x < 1 and the "+" sign of √(1 − x²) is chosen, since cos y > 0 for all y ∈ (−π/2, π/2).

Fig. 8.13    Fig. 8.14

Hence
(sin⁻¹x)' = 1/√(1 − x²),  −1 < x < 1.
The points x = ±1 are deleted, for the derivative x'_y = cos y is equal to zero at y = ±π/2, where the right-hand side of (*) becomes undefined.
(b) The function y = tan⁻¹x, −∞ < x < +∞ (Fig. 8.14), is the inverse function of x = tan y, −π/2 < y < π/2. Then
y'_x = 1/x'_y = 1/(1/cos²y) = 1/(1 + tan²y) = 1/(1 + x²).
Hence
(tan⁻¹x)' = 1/(1 + x²) for all x.
(c) To derive the formulas for the derivatives of cos⁻¹x and cot⁻¹x it suffices to notice that
sin⁻¹x + cos⁻¹x = π/2,
tan⁻¹x + cot⁻¹x = π/2.
Whence
(cos⁻¹x)' = −1/√(1 − x²) for −1 < x < 1,
(cot⁻¹x)' = −1/(1 + x²) for all x.

Differentiation of hyperbolic functions. By definition
sinh x = (eˣ − e⁻ˣ)/2 and cosh x = (eˣ + e⁻ˣ)/2.
Then
(sinh x)' = ((eˣ − e⁻ˣ)/2)' = (eˣ + e⁻ˣ)/2 = cosh x
and
(cosh x)' = ((eˣ + e⁻ˣ)/2)' = (eˣ − e⁻ˣ)/2 = sinh x.
Also by definition
tanh x = sinh x/cosh x and coth x = cosh x/sinh x.
Using the quotient rule and the identity cosh²x − sinh²x = 1, we obtain
(tanh x)' = (sinh x/cosh x)' = (cosh²x − sinh²x)/cosh²x = 1/cosh²x
and
(coth x)' = (cosh x/sinh x)' = (sinh²x − cosh²x)/sinh²x = −1/sinh²x  (x ≠ 0).

Differentiation of basic elementary functions. Below is a list of formulas widely used in differential calculus.
(x^a)' = ax^{a−1}  (a is any real number, x > 0);
(log_a x)' = 1/(x ln a)  (a > 0, a ≠ 1, x > 0);
(ln x)' = 1/x  (x > 0);
(aˣ)' = aˣ ln a  (a > 0, a ≠ 1);
(eˣ)' = eˣ;
(sin x)' = cos x;
(cos x)' = −sin x;
(tan x)' = 1/cos²x  (x ≠ π/2 + nπ, n = 0, ±1, ±2, ...);
(cot x)' = −1/sin²x  (x ≠ nπ, n = 0, ±1, ±2, ...);
(sin⁻¹x)' = 1/√(1 − x²)  (−1 < x < 1);
(cos⁻¹x)' = −1/√(1 − x²)  (−1 < x < 1);
(tan⁻¹x)' = 1/(1 + x²);
(cot⁻¹x)' = −1/(1 + x²);
(sinh x)' = cosh x;
(cosh x)' = sinh x;
(tanh x)' = 1/cosh²x;
(coth x)' = −1/sinh²x  (x ≠ 0).
Logarithmic differentiation. Sometimes it is much easier to find the derivative of the natural logarithm of a function than that of the function itself. This gives rise to a convenient differentiation procedure called logarithmic differentiation.
Suppose we are seeking the derivative of a positive function y = f(x) and assume that it is much easier to compute the derivative of φ(x) = ln y = ln f(x). Then, differentiating this function with respect to x, we obtain
y'/y = φ'(x).
Whence y' = y·φ'(x) or
y' = f(x)(ln f(x))'.
This procedure is especially helpful when we deal with composite exponential functions (power-exponential functions) of the form
y = [u(x)]^{v(x)},
where u(x) and v(x) are both differentiable and u(x) > 0.
Then, by logarithmic differentiation we have
ln y = v(x) ln u(x)
and
y'/y = v'(x) ln u(x) + v(x)u'(x)/u(x).
Whence
y'(x) = [u(x)]^{v(x)} (v'(x) ln u(x) + v(x)u'(x)/u(x)).
Example. Find the derivative of y = xˣ, x > 0.
◄ Taking the natural logarithm of y = xˣ, we have
ln y = x ln x.
Differentiating with respect to x gives
y'/y = ln x + 1.
Whence
y' = y(ln x + 1)
or
y' = xˣ(ln x + 1). ►
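The same answer for y = xˣ can be confirmed symbolically; the short sketch below is an added illustration using the third-party sympy library and the general power-exponential formula derived above with u = v = x.

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = x ** x

# Direct differentiation; sympy uses the same logarithmic trick internally.
print(sp.diff(y, x))                         # x**x*(log(x) + 1)

# The formula y' = u**v * (v'*ln u + v*u'/u) with u = v = x:
u = v = x
formula = u**v * (sp.diff(v, x) * sp.log(u) + v * sp.diff(u, x) / u)
print(sp.simplify(formula - sp.diff(y, x)))  # 0, so the two expressions agree
```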
Applying differentials to approximate computations. Let y = f(x) be differentiable at a point x, so that an increment in y admits a representation of the form
Δy = f'(x)Δx + α(Δx)Δx  (Δx ≠ 0),
where f'(x)Δx = dy and α(Δx) → 0 as Δx → 0.
Let dy ≠ 0 and, consequently, f'(x) ≠ 0. Then
Δy/dy = 1 + α(Δx)/f'(x).
Whence lim_{Δx→0} Δy/dy = 1, i.e., the infinitesimals Δy and dy are equivalent as Δx → 0. This means that the difference Δy − dy = α(Δx)Δx is an infinitesimal of higher order than Δy and dy as Δx → 0. Hence we may use dy as an approximation of Δy, i.e., Δy ≈ dy, and the relative error can be made arbitrarily small for all sufficiently small |Δx|. Then we can apply the relation
f(x + Δx) ≈ f(x) + f'(x)Δx
to approximate values of f(x).
Example. Let y = x^β (β is any real number). Then
Δy = (x + Δx)^β − x^β,
dy = βx^{β−1}Δx.
Given small |Δx|, we have
(x + Δx)^β ≈ x^β + dy
or
(x + Δx)^β ≈ x^β + βx^{β−1}Δx.
In particular, for β = 1/2
√(x + Δx) ≈ √x + Δx/(2√x)  (x ≠ 0).
For instance, √3.9978 = √(4 + (−0.0022)). Putting x = 4 and Δx = −0.0022, we get
√3.9978 ≈ √4 + (−0.0022)/(2√4) = 2 − 0.00055 = 1.99945.
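The quality of the approximation f(x + Δx) ≈ f(x) + f'(x)Δx is easy to inspect numerically; the sketch below is an added illustration that redoes the √3.9978 computation and prints the error.

```python
import math

def linear_approx(f, fprime, x, dx):
    """First-order (differential) approximation f(x + dx) ≈ f(x) + f'(x)*dx."""
    return f(x) + fprime(x) * dx

x, dx = 4.0, -0.0022
approx = linear_approx(math.sqrt, lambda t: 1.0 / (2.0 * math.sqrt(t)), x, dx)
exact = math.sqrt(x + dx)
print(approx)                 # 1.99945
print(exact)                  # 1.9994499...
print(abs(exact - approx))    # error of the order of (dx)^2
```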

8.4 Derivatives and Differentials of Higher Orders


Derivatives of higher orders. Let f(x) have a derivative at every point x in an open interval (a, b). Then the derivative f'(x) of f(x) is a function defined on (a, b). It may happen that f'(x) has a derivative at a point x in (a, b). This derivative of f'(x) is called the second derivative of f(x), or the derivative of the second order of f(x), and is denoted by f''(x) or by f⁽²⁾(x). Thus
f''(x) = (f'(x))'.
Similarly, the nth-order derivative of f(x) is the derivative of the (n − 1)th-order derivative of f(x), i.e.,
f⁽ⁿ⁾(x) = (f⁽ⁿ⁻¹⁾(x))'.
To compute the nth derivative f⁽ⁿ⁾(x) one has to find the first derivative f'(x) of f(x), then the second derivative f''(x) as the first derivative of f'(x), and continue this process until the nth derivative is computed. Therefore the rules and formulas for computing first derivatives are sufficient to obtain derivatives of any desired order.
Examples. (1) Compute the nth derivative of y = e^{kx} where k = const.
◄ We get in succession y' = ke^{kx}, y'' = k²e^{kx}, y''' = k³e^{kx}, .... By induction we can easily prove that
(e^{kx})⁽ⁿ⁾ = kⁿe^{kx}  (n = 1, 2, 3, ...). ►
(2) Compute the nth derivative of y = sin x.
◄ We have y' = cos x = sin(x + π/2), y'' = −sin x = sin(x + π) = sin(x + 2·π/2), ....
By induction it follows that
(sin x)⁽ⁿ⁾ = sin(x + nπ/2) for all n ∈ ℕ. ►
(3) Compute the nth derivative of y = cos x.
◄ Following the usual approach we have
(cos x)⁽ⁿ⁾ = cos(x + nπ/2) for all n ∈ ℕ. ►
The set of all functions f(x) which are defined on (a, b) and have continuous nth derivatives at every point x ∈ (a, b) is denoted by Cⁿ(a, b). We say that a function f(x) is infinitely differentiable on (a, b) and write f(x) ∈ C^∞(a, b) if f(x) has derivatives of any order at each point x ∈ (a, b). For example, the functions eˣ, sin x and cos x are infinitely differentiable on (−∞, +∞).
(4) Compute all the derivatives of the function y = x⁴.
◄ We have y⁽¹⁾ = 4x³, y⁽²⁾ = 12x², y⁽³⁾ = 24x, y⁽⁴⁾ = 24. Since y⁽⁴⁾ is a constant, all derivatives of higher orders are zero, i.e., y⁽⁵⁾ = y⁽⁶⁾ = ... = y⁽ⁿ⁾ = 0. ►
To give a physical interpretation of the second derivative we consider the law s = s(t) of rectilinear motion of a point. Then the first derivative s'(t) = v(t) specifies the velocity of the point at the moment t and the second derivative s''(t) = v'(t) is the acceleration of the point at t.
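The closed-form expressions for the nth derivatives of sin x and e^{kx} obtained above are easy to test for particular n; the brief sketch below is an added illustration (the orders n and the constant k = 3 are chosen arbitrarily) using the third-party sympy library.

```python
import sympy as sp

x = sp.symbols('x')
k = 3                                                  # an arbitrary constant for e^{kx}

for n in (1, 2, 5, 8):
    d_sin = sp.diff(sp.sin(x), x, n)                   # n-th derivative of sin x
    print(n, sp.simplify(d_sin - sp.sin(x + n * sp.pi / 2)))   # 0: matches sin(x + n*pi/2)
    d_exp = sp.diff(sp.exp(k * x), x, n)               # n-th derivative of e^{kx}
    print(n, sp.simplify(d_exp - k**n * sp.exp(k * x)))        # 0: matches k^n e^{kx}
```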
Leibniz formula. Let u(x) and v(x) have nth derivatives. Then
(a) if y(x) = u(x) ± v(x), then y⁽ⁿ⁾(x) = u⁽ⁿ⁾(x) ± v⁽ⁿ⁾(x);
(b) if y(x) = u(x)v(x), then
y' = u'v + uv',
y'' = u''v + 2u'v' + uv'',
y''' = u'''v + 3u''v' + 3u'v'' + uv''' and so on.
It is easy to notice that the formulas on the right resemble the binomial formulas for (u + v)¹, (u + v)² and (u + v)³, with the exponents indicating how many times u and v are to be differentiated. The resemblance becomes nearly complete if u and v are replaced by u⁽⁰⁾ and v⁽⁰⁾, standing for derivatives of the zero order. Using the method of mathematical induction, we can show that in general
(uv)⁽ⁿ⁾ = u⁽ⁿ⁾v + (n/1!)u⁽ⁿ⁻¹⁾v⁽¹⁾ + (n(n − 1)/2!)u⁽ⁿ⁻²⁾v⁽²⁾ + ... + (n(n − 1)...(n − k + 1)/k!)u⁽ⁿ⁻ᵏ⁾v⁽ᵏ⁾ + ... + uv⁽ⁿ⁾.
This relation is called the Leibniz formula.
Example. Using the Leibniz formula, compute the derivative y⁽¹⁰⁰¹⁾ of the function y = x²eˣ.
◄ We have
y⁽¹⁰⁰¹⁾ = (eˣx²)⁽¹⁰⁰¹⁾ = (eˣ)⁽¹⁰⁰¹⁾x² + (1001/1!)(eˣ)⁽¹⁰⁰⁰⁾(x²)' + (1001·1000/2!)(eˣ)⁽⁹⁹⁹⁾(x²)'' + 0
= eˣx² + 2002eˣx + 1001·10³eˣ. ►
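For a modest order the Leibniz formula can be checked term by term; the sketch below is an added illustration (the order n = 5 is chosen arbitrarily) that compares the formula with direct differentiation of x²eˣ, using the third-party sympy library.

```python
import sympy as sp
from math import comb

x = sp.symbols('x')
u, v = sp.exp(x), x**2
n = 5

# Leibniz formula: (uv)^(n) = sum_k C(n, k) * u^(n-k) * v^(k)
leibniz = sum(comb(n, k) * sp.diff(u, x, n - k) * sp.diff(v, x, k) for k in range(n + 1))

direct = sp.diff(u * v, x, n)
print(sp.simplify(leibniz - direct))    # 0: the two results coincide
```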

We shall derive another useful formula. Let x = φ(y) and y = f(x) be mutually inverse functions and let f'(x) ≠ 0. Then
x'_y = 1/y'_x
and, differentiating this identity with respect to y once more,
x''_{yy} = −y''_{xx}/(y'_x)³.
Differentials of higher orders. Let y = f(x) be differentiable at a point x. It may happen that at x the differential dy = f'(x)dx is a differentiable function of x. Then there exists a differential of the differential of the given function, which is called the differential of the second order, or the second differential, of y = f(x) and is denoted by d²y, so that
d²y = d(dy).
Analogously we can define differentials of higher orders. The differential of the nth order dⁿy, or the nth differential of a function y = f(x), is the differential of the (n − 1)th differential of y = f(x), so that
dⁿy = d(dⁿ⁻¹y).
The differential dy may be called the first differential, or the differential of the first order, of a function y = f(x).
Now we shall derive some important formulas for differentials of higher orders. Let y = f(x) be a function of an independent variable x and let y = f(x) have differentials of any order. Then
dy = f'(x)dx,
where dx = Δx is independent of x.
By definition
d²y = d(dy) = d(f'(x)dx),
where f'(x)dx is a function of x, so that the factor dx, being independent of x, can be taken outside the differential sign. So
d²y = d(f'(x))dx.
Computing d(f'(x)) as the first differential of f'(x), we obtain
d(f'(x)) = (f'(x))'dx = f''(x)dx.
Hence the second-order differential d²y of the function y = f(x) at the point x is given by the formula
d²y = f''(x)dx²,
where dx² stands for (dx)², dx being the differential of the independent variable x at the point x.
Mathematical induction gives the formula for the nth differential as
dⁿy = f⁽ⁿ⁾(x)dxⁿ,
where dxⁿ = (dx)ⁿ. Whence
f⁽ⁿ⁾(x) = dⁿy/dxⁿ.
Now let y = f(u) where u = φ(x) is a function differentiable sufficiently many times. Since the first differential retains its form, we have
dy = f'(u)du,

where du = φ'(x)dx is in general dependent on x. Then
d²y = d(dy) = d(f'(u)du) = d(f'(u))du + f'(u)d(du)
= f''(u)du² + f'(u)d²u.
If u is an independent variable then d²u = 0 and
d²y = f''(u)du².
Comparing both expressions for d²y, we conclude that the second differential does not possess the property of invariance of form. However, for a linear function u = φ(x) = ax + b, where a and b are constants, the second differential retains its form.
Differentiation of functions given in a parametric form. Suppose that the Cartesian coordinate xy-system is set up in a plane. Let φ(t) and ψ(t) be continuous functions on a closed interval α ≤ t ≤ β. If we regard the parameter t as time, the functions φ(t) and ψ(t) specify the law of motion of a point M in the xy-plane, so that the coordinates of M are given by
x = φ(t), y = ψ(t),  α ≤ t ≤ β.   (*)
The set M of all points whose coordinates (x, y) in the xy-plane are given by the equations (*) is called a plane curve. In this case we say that the plane curve is given in a parametric form, or is represented parametrically.
For example, we can parametrically represent a circle with radius R and centre at the origin of coordinates by the equations
x = R cos t, y = R sin t,  0 ≤ t < 2π,
where t is the angle between the x-axis and the position vector OM from the origin O to the point M(x, y), measured in radians (Fig. 8.15).
The parametric equations of a plane curve can be reduced to an equation F(x, y) = 0 by eliminating the parameter t. For example, if we square and add the parametric equations of the circle, the parameter t is eliminated, so that the circle with radius R and centre at the origin of coordinates is given by the equation x² + y² = R², familiar to the reader. However, sometimes we fail to eliminate t from parametric equations. In these cases we need techniques that enable us to compute the derivative of y with respect to x when the curve is given in a parametric form.
We say that a functional relationship between y and x is represented parametrically if both variables x and y are specified separately as functions of a parameter t, so that x = φ(t), y = ψ(t), t ∈ (α, β).
Let x = φ(t) and y = ψ(t) be defined and continuous for the values of
t in the open interval (α, β). Suppose that there exists an inverse function t = g(x) of x = φ(t). Then y = ψ[g(x)] is a composite function of x. Furthermore, we assume that φ(t) and ψ(t) are differentiable at the point t ∈ (α, β), φ'(t) ≠ 0, and that t = g(x) is differentiable at the point x corresponding to t. Then by Theorem 8.3 the function y = ψ[g(x)] is differentiable at the point x and
y'_x = y'_t t'_x.
On the other hand, by virtue of Theorem 8.5 we have
t'_x = 1/x'_t,

Fig. 8.15

so that
y'_x = y'_t (1/x'_t) = y'_t/x'_t
or
dy/dx = ψ'(t)/φ'(t)  (φ'(t) ≠ 0).
This result follows immediately when we divide the numerator and denominator of dy/dx by dt, so that
dy/dx = (dy/dt)/(dx/dt) = ψ'(t)/φ'(t).
Example. Consider the circle represented parametrically by
x = R cos t, y = R sin t,  0 ≤ t < 2π.
Then
◄ dy/dx = (dy/dt)/(dx/dt) = (R cos t)/(−R sin t) = −cot t,
or dy/dx = −x/y (interpret this result geometrically). ►
If φ(t) and ψ(t) have kth derivatives and φ'(t) ≠ 0, then the composite function y = ψ[g(x)] has the kth derivative with respect to x.
The second derivative of y with respect to x is given by
d²y/dx² = d/dx (dy/dx) = d/dt (ψ'(t)/φ'(t)) · dt/dx
= {[ψ''(t)φ'(t) − ψ'(t)φ''(t)]/(φ')²(t)} · 1/φ'(t) = [ψ''(t)φ'(t) − ψ'(t)φ''(t)]/(φ')³(t).
Hence
y''_{xx} = (y'_x)'_t / x'_t,
and in general
y⁽ⁿ⁾_x = [y⁽ⁿ⁻¹⁾_x]'_t / x'_t,
where y = f(x) is given in a parametric form by the equations x = x(t) and y = y(t).
Example. Compute d²y/dx² provided that y = f(x) is given by
x = a(t − sin t),  y = a(1 − cos t).
◄ We have
dy/dx = (dy/dt)/(dx/dt) = (a sin t)/(a(1 − cos t)) = cot(t/2)
and
d²y/dx² = d/dx (dy/dx) = d/dt (cot(t/2)) · dt/dx = d/dt (cot(t/2)) · 1/(dx/dt)
= −(1/sin²(t/2)) · (1/2) · 1/(a(1 − cos t)) = −1/(4a sin⁴(t/2)). ►
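The parametric formulas for y'_x and y''_{xx} can be verified for this cycloid with a computer algebra system; the sketch below is an added illustration using the third-party sympy library, comparing the symbolic result with the closed forms above at the sample values t = 1, a = 1 (chosen arbitrarily).

```python
import sympy as sp

t, a = sp.symbols('t a', positive=True)
x = a * (t - sp.sin(t))
y = a * (1 - sp.cos(t))

yx = sp.diff(y, t) / sp.diff(x, t)        # dy/dx   = psi'(t)/phi'(t)
yxx = sp.diff(yx, t) / sp.diff(x, t)      # d2y/dx2 = (y'_x)'_t / x'_t

subs = {t: 1, a: 1}
print(sp.N(yx.subs(subs)), sp.N(sp.cot(sp.Rational(1, 2))))                 # both ~ 1.8305
print(sp.N(yxx.subs(subs)), sp.N(-1 / (4 * sp.sin(sp.Rational(1, 2))**4)))  # both ~ -4.732
```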
8.5 Mean Value Theorems

Theorem 8.6 (Rolle's theorem). If a function f(x) (1) is continuous on [a, b], (2) has a finite derivative at every point of (a, b) and (3) assumes equal values at the endpoints of [a, b], so that f(a) = f(b), then there exists at least one point ξ ∈ (a, b) such that f'(ξ) = 0.
◄ Since f(x) is continuous on [a, b], Theorem 7.30 implies that f(x) attains its maximum M and minimum m on [a, b].
Two cases should be distinguished:
(a) Let M = m. Then M ≤ f(x) ≤ M, i.e., f(x) is a constant function on [a, b]. Hence f'(x) = 0 for all x in (a, b) and the theorem is true.
(b) Let M ≠ m. Then f(x) attains either its maximum M or its minimum m at a point ξ contained in (a, b), since f(a) = f(b), so that f(x) cannot attain its maximum M at one end of [a, b] and its minimum m at the other end simultaneously. For definiteness we set M = f(ξ), a < ξ < b (Fig. 8.16).

Fig. 8.16

Since f(x) has a derivative f'(x) at every point of (a, b), there exists a derivative f'(ξ) at ξ and
lim_{Δx→0, Δx>0} [f(ξ + Δx) − f(ξ)]/Δx = lim_{Δx→0, Δx>0} [f(ξ − Δx) − f(ξ)]/(−Δx) = f'(ξ).
On the other hand, f(ξ) = M is the maximum of f(x) on [a, b], so that
f(ξ + Δx) − f(ξ) ≤ 0 and f(ξ − Δx) − f(ξ) ≤ 0.
Whence
[f(ξ + Δx) − f(ξ)]/Δx ≤ 0 and [f(ξ − Δx) − f(ξ)]/(−Δx) ≥ 0  (Δx > 0).
Evaluating the limits as Δx → 0, we obtain the inequalities
f'(ξ) ≤ 0 and f'(ξ) ≥ 0,
which must hold simultaneously. Hence f'(ξ) = 0 and the theorem is proved. ►
The geometric interpretation of Rolle's theorem is that if y = f(x) satisfies all three conditions of the theorem, then the graph of f(x) is represented by a curve AB (Fig. 8.17) such that (1) the curve AB is continuous on [a, b], (2) there exists a tangent to the curve at any point between A(a, f(a)) and B(b, f(b)) and (3) the ordinates of the points A and B of this curve are equal. Rolle's theorem states that for at least one point C(ξ, f(ξ)) of the curve AB there is a tangent parallel to the x-axis.

Fig. 8.17

By way of illustration we shall show that conditions (1)-(3) are important and that if they are violated Rolle's theorem may not be true. Consider the function f(x) = |x|, −1 ≤ x ≤ 1 (Fig. 8.18). For this function condition (2) is not satisfied as f(x) has no derivative at x = 0. In this case Rolle's theorem is inapplicable: in (−1, 1) there exists no point at which the derivative f'(x) is equal to zero. Indeed, f'(x) = −1 if −1 < x < 0 and f'(x) = 1 if 0 < x < 1, while at x = 0 f'(x) does not exist.

Fig. 8.18    Fig. 8.19

Looking at Fig. 8.19 we notice that the function f(x) = x − [x] does not satisfy condi-

tion (1) of Rolle's theorem since the point x = 1 is a discontinuity of f(x) and thus f(x) is not continuous on [0, 1]; the derivative f'(x) is equal to 1 at all points in (0, 1).
Problems. (1) Consider the function f(x) = 1 + x^m(1 − x)^n where m and n are positive integers. Without computing the derivative of f(x), show that the equation f'(x) = 0 has at least one real root in (0, 1).
(2) Prove that the equation x³ + 3x − 6 = 0 has only one real root.
Theorem 8.7 (mean-value theorem)*). If a function f(x) (1) is continuous on [a, b] and (2) has a derivative f'(x) in (a, b), then there exists at least one point ξ in (a, b) such that
[f(b) − f(a)]/(b − a) = f'(ξ),  a < ξ < b.
◄ We introduce an auxiliary function F(x) on [a, b] as
F(x) = f(x) − f(a) − {[f(b) − f(a)]/(b − a)}(x − a).
The function F(x) satisfies the conditions of Rolle's theorem. Indeed, F(x) is continuous on [a, b], since every summand in F(x) is continuous on [a, b]; in (a, b) F(x) has a finite derivative, for every summand of F(x) has a derivative in (a, b); and F(a) = F(b) = 0, i.e., F(x) assumes equal values at the endpoints of [a, b].
Then, by virtue of Rolle's theorem, there exists at least one point ξ in (a, b) such that F'(ξ) = 0.
On the other hand,
F'(x) = f'(x) − [f(b) − f(a)]/(b − a),
so that at the point ξ
f'(ξ) − [f(b) − f(a)]/(b − a) = 0.
Whence
f'(ξ) = [f(b) − f(a)]/(b − a),  ξ ∈ (a, b). ►
Rolle's theorem is a specific case of the mean value theorem for derivatives and is easily obtained by putting f(a) = f(b) in the latter.

*) In Russian-language mathematical literature this theorem is called the Lagrange theorem.
To interpret the mean value theorem geometrically we turn to Fig. 8.20. It is easy to notice that the ratio [f(b) − f(a)]/(b − a) is the slope of the line AB and f'(ξ) is the slope of the tangent to y = f(x) at the point with the abscissa x = ξ. Hence the mean value theorem states that if AB is a continuous curve such that there is a tangent at every point between A and B, then there exists at least one point C(ξ, f(ξ)) between A and B at which the tangent to the curve AB is parallel to the straight line AB.

Fig. 8.20

The formula
[f(b) − f(a)]/(b − a) = f'(ξ)
or
f(b) − f(a) = f'(ξ)(b − a),  a < ξ < b,
is also valid for a > b. Since ξ is in general unknown it is convenient to represent it as
ξ = a + θ(b − a),
where θ is a real number, 0 < θ < 1. Then the above formula becomes
f(b) − f(a) = f'(a + θ(b − a))(b − a),  0 < θ < 1.
Replacing a and b by x and x + Δx, respectively, we get
Δf(x) = f(x + Δx) − f(x) = f'(x + θΔx)Δx,  0 < θ < 1.
This is an exact expression relating an increment of y = f(x) and an increment Δx, whereas the relative error of the approximate relation
Δf(x) ≈ f'(x)Δx

tends to zero only as Llx -+ 0. Also notice that in the exact relation the
number 0 is in general unknown.
Example. By use of the mean value theorem prove that

|tan⁻¹ x2 - tan⁻¹ x1| ≤ |x2 - x1|   for all x1, x2.

◄ Consider the function f(x) = tan⁻¹ x. This function satisfies the conditions of the mean value theorem on any [a, b]. Then for any x1 and x2

f(x2) - f(x1) = f'(ξ)(x2 - x1)

or

tan⁻¹ x2 - tan⁻¹ x1 = (x2 - x1)/(1 + ξ²),

where the point ξ is between the points x1 and x2. Whence

|tan⁻¹ x2 - tan⁻¹ x1| = |x2 - x1|/(1 + ξ²)

and

|tan⁻¹ x2 - tan⁻¹ x1| ≤ |x2 - x1|

since 1/(1 + ξ²) ≤ 1 for all ξ. ►
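A quick numerical illustration of the theorem can be helpful. The sketch below is not part of the original text; it assumes a Python interpreter with only the standard library and uses bisection to locate a point ξ at which f'(ξ) equals the slope of the chord for f(x) = tan⁻¹ x on [0, 2].

    # Numerical illustration of the mean value theorem for f(x) = arctan x on [0, 2].
    import math

    def f(x):
        return math.atan(x)

    def fprime(x):
        return 1.0 / (1.0 + x * x)

    x1, x2 = 0.0, 2.0
    slope = (f(x2) - f(x1)) / (x2 - x1)    # slope of the chord AB

    # f'(x) = 1/(1 + x^2) decreases on [0, 2], so f'(x) - slope changes sign
    # exactly once there; locate that point by bisection.
    lo, hi = x1, x2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fprime(mid) > slope:
            lo = mid
        else:
            hi = mid

    xi = 0.5 * (lo + hi)
    print(xi, fprime(xi), slope)   # f'(xi) coincides with the chord slope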
Problem. Using the mean value theorem prove that

x/(1 + x) < ln(1 + x) < x,   x > -1.
Theorem 8.8 (Cauchy mean value theorem). If the functions f(x) and φ(x) (1) are continuous on [a, b], (2) have derivatives f'(x) and φ'(x) in (a, b) and (3) φ'(x) ≠ 0 in (a, b), then there exists at least one point ξ in (a, b) such that

[f(b) - f(a)]/[φ(b) - φ(a)] = f'(ξ)/φ'(ξ),   a < ξ < b.   (*)
◄ From the conditions of the theorem it follows that the difference φ(b) - φ(a) cannot be equal to zero. Indeed, if φ(b) - φ(a) = 0 then φ(x) would satisfy the conditions of Rolle's theorem so that φ'(x) would be equal to zero at least at one point ξ in (a, b). However this contradicts condition (3) of this theorem. Hence the formula (*) makes sense. Let us show that this formula holds for some ξ in (a, b).
Consider the auxiliary function

F(x) = f(x) - f(a) - {[f(b) - f(a)]/[φ(b) - φ(a)]}(φ(x) - φ(a))

that satisfies the conditions of Rolle's theorem. Indeed, (1) F(x) is continuous on [a, b] since f(x) and φ(x) are continuous on [a, b]; (2) F(x) has a derivative F'(x) at every point in (a, b), since every summand in the right-hand side of F(x) has a derivative in (a, b); and (3) F(a) = F(b) = 0.
From Rolle's theorem we conclude that between a and b there exists ξ such that F'(ξ) = 0. The derivative F'(x) of F(x) is

F'(x) = f'(x) - {[f(b) - f(a)]/[φ(b) - φ(a)]}φ'(x)

so that

f'(ξ) - {[f(b) - f(a)]/[φ(b) - φ(a)]}φ'(ξ) = 0.

Dividing both sides by φ'(ξ) ≠ 0, we arrive at the desired formula

f'(ξ)/φ'(ξ) = [f(b) - f(a)]/[φ(b) - φ(a)]. ►
It is easy to notice that if we put φ(x) = x the Cauchy mean value theorem becomes the mean value theorem for derivatives.
Problem. Given the differences f(b) - f(a) and φ(b) - φ(a), could we derive the formula involved in the Cauchy mean value theorem by applying the mean value theorem to these differences?
Remark. Rolle's theorem, the mean value theorem and the Cauchy mean value theorem imply that there exists some "middle" point ξ ∈ (a, b) at which some of the named relations are true. For this reason all these theorems are collectively named the mean value theorems for derivatives.

8.6 L'Hospital's Rule


Let functions f(x) and φ(x) be defined in a neighbourhood of a point x = a and let f(a) = 0 and φ(a) = 0. Then the quotient f(x)/φ(x) becomes indeterminate at x = a while the limit of this quotient can exist at x = a. The notation 0/0 is frequently used to refer to this ambiguous situation. When we wish to compute lim_{x→a} f(x)/φ(x) where the quotient f(x)/φ(x) is indeterminate at x = a we shall speak of evaluating the indeterminate form 0/0.
Evaluating the indeterminate form ∞/∞ involves the computation of lim_{x→a} f(x)/φ(x) provided that lim_{x→a} f(x) = ∞ and lim_{x→a} φ(x) = ∞.

Evaluating the indeterminate form ∞ - ∞ involves the computation of lim_{x→a} [f(x) - φ(x)] provided that lim_{x→a} f(x) = ∞ and lim_{x→a} φ(x) = ∞.
We can introduce similar notions for limits at infinity, i.e., for limits as x → ∞.
Theorem 8.9 (L'Hospital's rule). Let f(x) and φ(x) have derivatives f'(x) and φ'(x) in a neighbourhood (a - δ, a + δ) of a point a except possibly at a. Suppose that φ(x) and φ'(x) are not equal to zero in (a - δ, a + δ). If lim_{x→a} f(x) = 0 and lim_{x→a} φ(x) = 0 and if the quotient f'(x)/φ'(x) has a finite or infinite limit as x → a then there exists

lim_{x→a} f(x)/φ(x) = lim_{x→a} f'(x)/φ'(x).

◄ Theorem 8.9 says nothing about the values of f(x) and φ(x) at x = a. We put f(a) = 0 and φ(a) = 0. Then lim_{x→a} f(x) = f(a) and lim_{x→a} φ(x) = φ(a) and the functions f(x) and φ(x) become continuous at a, so that on [a, x] (or on [x, a]), where x is a point in (a - δ, a + δ), f(x) and φ(x) satisfy the conditions of the Cauchy mean value theorem. Hence, there exists at least one point ξ = ξ(x) between a and x such that

f(x)/φ(x) = [f(x) - f(a)]/[φ(x) - φ(a)] = f'(ξ)/φ'(ξ).   (*)

If for some x there exists more than one such ξ we choose any of them.
The point ξ depends on x and ξ → a as x → a. As stated, the quotient f'(x)/φ'(x) has a finite or infinite limit as x → a. This limit is independent of how x tends to a. So the quotient f'(ξ)/φ'(ξ) has a limit equal to the limit of f'(x)/φ'(x) as x → a and, consequently, as ξ → a, so that

lim_{x→a (ξ→a)} f'(ξ)/φ'(ξ) = lim_{x→a} f'(x)/φ'(x).   (**)

From the identities (*) and (**) obtained above it follows that

lim_{x→a} f(x)/φ(x) = lim_{x→a} f'(x)/φ'(x). ►
The above identity represents L'Hospital's rule, which allows us, under certain conditions, to replace a limit of a quotient of functions by a limit of a quotient of their derivatives, which is sometimes easier to compute.
Example.

lim_{x→0} (1 - cos x)/x² = lim_{x→0} (1 - cos x)'/(x²)' = lim_{x→0} (sin x)/(2x) = 1/2.
Remarks. (1) If the conditions of this theorem are satisfied on the open interval (a - δ, a) or on (a, a + δ), L'Hospital's rule is applicable to the computation of the limit of f(x)/φ(x) as x → a - 0 or as x → a + 0, respectively.
(2) It may happen that the limit of the quotient of derivatives does not exist while the limit of the quotient of the respective functions does exist. As an illustration we consider the functions f(x) = x² sin(1/x) and φ(x) = x. For the quotient of these functions at x = 0 we have

lim_{x→0} f(x)/φ(x) = lim_{x→0} [x² sin(1/x)]/x = lim_{x→0} x sin(1/x) = 0.

On the other hand, the quotient of the derivatives f'(x)/φ'(x) = 2x sin(1/x) - cos(1/x) has no limit at x = 0. Hence, from the existence of lim_{x→0} f(x)/φ(x) it does not necessarily follow that lim_{x→0} f'(x)/φ'(x) exists.
(3) Sometimes we have to apply L'Hospital's rule repeatedly when computing lim_{x→a} f(x)/φ(x). For example, if f(x) and φ(x) and their derivatives f'(x) and φ'(x) all satisfy the hypothesis of L'Hospital's rule we can apply the rule to compute lim_{x→a} f'(x)/φ'(x), and so on.
Example.

lim_{x→0} (x - sin x)/x³ = lim_{x→0} (x - sin x)'/(x³)' = lim_{x→0} (1 - cos x)/(3x²) = lim_{x→0} (sin x)/(6x) = 1/6.
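Such repeated applications of the rule are easy to check mechanically. The following sketch is an illustration added here, not part of the original exposition; it assumes the SymPy library and differentiates the numerator and denominator of the last example step by step, comparing the result with the limit computed directly.

    # Checking lim (x - sin x)/x^3 as x -> 0 by repeated differentiation (SymPy assumed).
    import sympy as sp

    x = sp.symbols('x')
    f = x - sp.sin(x)     # numerator
    g = x**3              # denominator

    num, den = f, g
    for _ in range(3):    # three applications of L'Hospital's rule
        num, den = sp.diff(num, x), sp.diff(den, x)

    print(sp.limit(num / den, x, 0))   # 1/6, limit of the quotient of derivatives
    print(sp.limit(f / g, x, 0))       # 1/6, the original limit, for comparison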

Theorem 8.10 (L'Hospital's rule for the indeterminate form ∞/∞). Let the functions f(x) and φ(x) have the derivatives f'(x) and φ'(x) in a neighbourhood (a - δ, a + δ) of a point a except possibly at a. Suppose that φ(x) and φ'(x) are not equal to zero in (a - δ, a + δ). If lim_{x→a} f(x) = ∞ and lim_{x→a} φ(x) = ∞ and if f'(x)/φ'(x) has a finite or infinite limit as x → a then there exists

lim_{x→a} f(x)/φ(x) = lim_{x→a} f'(x)/φ'(x).

Here we can also consider one-sided limits as x → a - 0 or as x → a + 0 (see Remark (1)).
Example.

lim_{x→0+0} (ln sin ax)/(ln sin bx) = lim_{x→0+0} (ln sin ax)'/(ln sin bx)' = lim_{x→0+0} [a cos ax/sin ax]/[b cos bx/sin bx]
  = (a/b) lim_{x→0+0} (cos ax/cos bx · sin bx/sin ax) = 1   (a > 0, b > 0).
L'Hospital's rules are used to compute the following limits:
(a) lim_{x→a} [f(x)φ(x)] for lim_{x→a} f(x) = 0 and lim_{x→a} φ(x) = ∞.
◄ It suffices to represent f(x)φ(x) as

f(x)φ(x) = f(x)/(1/φ(x))   or   f(x)φ(x) = φ(x)/(1/f(x)).

Then the functions on the right satisfy the hypothesis of L'Hospital's rule. ►
Example.

lim_{x→0+0} (x ln x) = lim_{x→0+0} (ln x)/(1/x) = lim_{x→0+0} (1/x)/(-1/x²) = 0.

(b) lim_{x→a} [f(x) - φ(x)] for lim_{x→a} f(x) = ∞ and lim_{x→a} φ(x) = ∞.
◄ To apply L'Hospital's rule it suffices to write f(x) - φ(x) as

f(x) - φ(x) = 1/(1/f(x)) - 1/(1/φ(x)) = [1/φ(x) - 1/f(x)]/[(1/f(x))·(1/φ(x))].

Example.

lim_{x→1} (x/(x - 1) - 1/ln x) = lim_{x→1} (x ln x - x + 1)/[(x - 1) ln x]
  = lim_{x→1} (ln x)/(ln x + 1 - 1/x) = lim_{x→1} (1/x)/(1/x + 1/x²) = 1/2.
(c) lim_{x→a} [f(x)]^{φ(x)} for each of the following cases:

(i) lim_{x→a} f(x) = 0 and lim_{x→a} φ(x) = 0 (0⁰);
(ii) lim_{x→a} f(x) = 1 and lim_{x→a} φ(x) = ∞ (1^∞);
(iii) lim_{x→a} f(x) = +∞ and lim_{x→a} φ(x) = 0 (∞⁰).

◄ Put

y = [f(x)]^{φ(x)}.

Let us consider

ln y = φ(x) ln f(x)

and evaluate

lim_{x→a} ln y = lim_{x→a} [φ(x) ln f(x)].

It is easy to notice that we have to evaluate a limit considered in (a) for each of the cases (i)-(iii).
Suppose that lim_{x→a} ln y = A. Then lim_{x→a} y = e^A, i.e.,

lim_{x→a} [f(x)]^{φ(x)} = e^A. ►

Example. Find lim_{x→0+0} x^x.
◄ Put y = x^x. Then ln y = x ln x. Whence

lim_{x→0+0} ln y = lim_{x→0+0} (ln x)/(1/x) = lim_{x→0+0} (1/x)/(-1/x²) = 0,

so that

lim_{x→0+0} y = e⁰ = 1. ►
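The logarithmic device used in case (c) can also be verified symbolically. The short sketch below (SymPy assumed; added for illustration only) computes the limit of ln y = x ln x and exponentiates it, reproducing the answer 1.

    # Evaluating the indeterminate form 0^0 for y = x^x as x -> 0+.
    import sympy as sp

    x = sp.symbols('x', positive=True)

    A = sp.limit(x * sp.log(x), x, 0, dir='+')   # limit of ln y, a 0*oo form as in case (a)
    print(A, sp.exp(A))                          # 0 and e^0 = 1
    print(sp.limit(x**x, x, 0, dir='+'))         # direct check: 1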

Theorem 8.11. Suppose that (1) the functions f(x) and φ(x) are defined for all x such that |x| is sufficiently large; (2) lim_{x→∞} f(x) = lim_{x→∞} φ(x) = 0 or lim_{x→∞} f(x) = ∞ and lim_{x→∞} φ(x) = ∞; (3) there exist derivatives f'(x) and φ'(x) (φ'(x) ≠ 0) for all x such that |x| is sufficiently large; (4) there exists a finite or infinite limit of the quotient f'(x)/φ'(x) as x → ∞. Then

lim_{x→∞} f(x)/φ(x) = lim_{x→∞} f'(x)/φ'(x).

To verify that Theorem 8.11 is true it suffices to put x = 1/t and use the results of Theorem 8.9 and Theorem 8.10.

Example.

lim_{x→+∞} x²e^(-x) = lim_{x→+∞} x²/e^x = lim_{x→+∞} 2x/e^x = lim_{x→+∞} 2/e^x = 0.

Applying L'Hospital's rule repeatedly n times, we obtain

lim_{x→+∞} x^n/e^x = lim_{x→+∞} nx^(n-1)/e^x = ... = lim_{x→+∞} n!/e^x = 0

so that the function e^x increases faster than any power function of x as x → +∞.
The following example shows that L'Hospital's rule, while applicable in evaluating lim_{x→+∞} √(1 + x²)/x, turns out to be unsuitable for practical purposes. Indeed, applying L'Hospital's rule, we have

lim_{x→+∞} √(1 + x²)/x = lim_{x→+∞} [x/√(1 + x²)]/1 = lim_{x→+∞} x/√(1 + x²)
  = lim_{x→+∞} 1/[x/√(1 + x²)] = lim_{x→+∞} √(1 + x²)/x, etc.,

while elementary algebraic manipulations easily yield

lim_{x→+∞} √(1 + x²)/x = lim_{x→+∞} √(1/x² + 1) = 1.

8.7 Tests for Increase and Decrease of a Function
on a Closed Interval and at a Point
Definitions. We say that a function f(x) defined on a closed interval [a, b] is nondecreasing on [a, b] if given any x1 and x2 in [a, b], the condition x1 < x2 implies that f(x1) ≤ f(x2).
If x1 < x2 always implies that f(x1) < f(x2) then f(x) is said to be increasing on [a, b].
We say that a function f(x) is nonincreasing on [a, b] if given any x1 and x2 in [a, b], x1 < x2 implies that f(x1) ≥ f(x2), and f(x) is decreasing on [a, b] if x1 < x2 implies that f(x1) > f(x2).
A function f(x) is said to be monotone on [a, b] if f(x) is only nondecreasing, in particular increasing, on [a, b] or only nonincreasing, in particular decreasing, on [a, b].

Theorem 8.12. Let f(x) be continuous on [a, b] and let f(x) have a derivative f'(x) on (a, b). For the function f(x) to be nondecreasing on [a, b] it is necessary and sufficient that f'(x) ≥ 0 for all x in (a, b).
Necessity. Let f(x) be nondecreasing on [a, b] (Fig. 8.21). We prove that f'(x) ≥ 0 on (a, b).

Fig. 8.21

◄ Consider two points x and x + Δx in (a, b). Since f(x) is nondecreasing, Δx and f(x + Δx) - f(x) are of the same sign for any Δx, so that

[f(x + Δx) - f(x)]/Δx ≥ 0.

Notice that at any point x in (a, b) there exists a derivative f'(x). Then from the above inequality it follows that

f'(x) = lim_{Δx→0} [f(x + Δx) - f(x)]/Δx ≥ 0.

Hence at any x in (a, b) f'(x) ≥ 0. ►


Sufficiency. Let f'(x) ≥ 0 on (a, b). We prove that f(x) is nondecreasing on [a, b].
◄ Indeed, let x1 < x2 be any two points in [a, b]. By Theorem 8.7 we have

f(x2) - f(x1) = f'(ξ)(x2 - x1),

where x1 < ξ < x2.
Since f'(x) ≥ 0 at any point x in (a, b), f'(ξ) ≥ 0. Also, x2 > x1. This implies that

f(x2) ≥ f(x1).

Hence, f(x) is nondecreasing on [a, b] since f(x1) ≤ f(x2) for x1 < x2. ►
Similarly, we can prove the following.
Theorem 8.13. Let f(x) be continuous on [a, b] and let f(x) have a derivative f'(x) on (a, b). For the function f(x) to be nonincreasing on [a, b] it is necessary and sufficient that f'(x) ≤ 0 for all x in (a, b).
Therefore if f'(x) does not change its sign on some interval, f(x) is monotone on this interval. Also true is the following proposition: if f'(x) > 0 in (a, b) then f(x) increases on [a, b]. This proposition gives the sufficient condition for a function to increase.
It is worth mentioning that if f(x) increases on [a, b] then it does not
follow that f' (x) > 0 everywhere in (a, b).
Example. The function f(x) = x³ increases on [-1, 1]. However the derivative f'(x) = 3x² is equal to zero at x = 0.
We can also speak of a function decreasing or increasing at a point.
A function f(x) is said to be increasing at a point x = x0 if there exists a neighbourhood (x0 - δ, x0 + δ) of x0 such that f(x) < f(x0) whenever x < x0 and f(x) > f(x0) whenever x > x0 (Fig. 8.22).
Fig. 8.22

Analogously, a function f(x) is said to be decreasing at a point x = x0 if in some neighbourhood of x0, f(x) > f(x0) whenever x < x0 and f(x) < f(x0) whenever x > x0.
The following theorem specifies the sufficient conditions for a function to be increasing or decreasing at a point.
Theorem 8.14. Let f(x) have a derivative f'(x0) at x0. If f'(x0) > 0 then f(x) increases at x0 and if f'(x0) < 0 then f(x) decreases at x0.
◄ Let f'(x0) > 0. Then

lim_{Δx→0} [f(x0 + Δx) - f(x0)]/Δx > 0.

This means that there exists δ > 0 such that

[f(x0 + Δx) - f(x0)]/Δx > 0

whenever 0 < |Δx| < δ.
Whence it follows that if 0 < |Δx| < δ, Δx and f(x0 + Δx) - f(x0) are of the same sign, namely, if Δx < 0 then f(x0 + Δx) - f(x0) < 0, i.e., f(x0 + Δx) < f(x0), and if Δx > 0 then f(x0 + Δx) - f(x0) > 0, i.e., f(x0 + Δx) > f(x0).
By definition this means that f(x) increases at x0.
Using similar reasoning, we can show that if f'(x0) < 0 then f(x) decreases at x0. ►
Fig. 8.23    Fig. 8.24

Notice that the function shown in Fig. 8.23 increases at x = 0; however, the derivative of this function does not exist at x = 0.
The function f(x) = x³ increases at x = 0 and its derivative f'(x) = 3x² vanishes at x = 0 (Fig. 8.24).

8.8 Extrema of a Function. Maximum and Minimum


of a Function on a Closed Interval
Local extrema. Let f(x) be defined in a neighbourhood of a point x0 and at x0. We say that f(x) has a local maximum at x0 if there exists δ > 0 such that

Δf = f(x) - f(x0) ≤ 0

for all x in (x0 - δ, x0 + δ) (Fig. 8.25).

A function f(x) is said to have a local minimum at x0 if there exists δ > 0 such that

Δf = f(x) - f(x0) ≥ 0

for all x in (x0 - δ, x0 + δ) (Fig. 8.26).
The point x0 at which f(x) has a local maximum (minimum) is called the point of local maximum (minimum). The local maximum and local minimum of a function are called the local extrema of this function.

Fig. 8.25    Fig. 8.26

These definitions mean that f(x0) is a local maximum of f(x) if there exists an open interval (x0 - δ, x0 + δ) such that f(x0) is a maximum of f(x) on this interval, and f(x0) is a local minimum of f(x) if there exists an open interval (x0 - δ, x0 + δ) such that f(x0) is a minimum of f(x) on this interval.


The term "local extremum" is used here because we consider a maxi-


mum or minimum of a function within some neighbourhood of a given
point rather than over the domain of a function. For example, the function
y = f(x) shown in Fig. 8.27 has a local maximum at xo and a local mini-
mum at X1 but f(xo) < f(x1).
In what follows we shall speak of extrema of a function omitting "lo-
cal" for brevity.
A function f(x) is said to have a strict maximum (minimum) at a point
x0 if there exists o > 0 such that
f(x) - f (xo) < 0 ( f(x) - f(xo) > 0)

Fig. 8.27

whenever 0 < |x - x0| < δ. Then the point x0 is called the point of strict maximum (minimum) of a function. Here we do not assume that f(x) is continuous at x0. For example, the function

f(x) = x²  for x ≠ 0,
f(x) = 1   for x = 0

is not continuous at x = 0 but f(x) has a maximum at x = 0. Indeed, there exists δ > 0, say δ = 1, such that f(x) - f(0) = f(x) - 1 < 0 for all x ≠ 0 in (-1, 1) (Fig. 8.28).
Problems. (1) Using the definitions of maximum and minimum, prove that the function

f(x) = e^(-1/x²)  for x ≠ 0,   f(x) = 0  for x = 0

has a minimum at x = 0, and the function

g(x) = x e^(-1/x²)  for x ≠ 0,   g(x) = 0  for x = 0

has no extremum at x = 0.

(2) Find out whether the function f(x) = (x - x0)^n φ(x) has a maximum or minimum or has no extremum at x0 provided that (1) φ(x) is continuous at x0 and φ(x0) ≠ 0, (2) the derivative φ'(x) does not exist at x0 and (3) n is a natural number.
Theorem 8.15 (necessary condition for extremum). A function f(x) can have an extremum only at the points where its derivative f'(x) either is equal to zero or does not exist.
◄ Let f(x) have a derivative at x0 and let f'(x0) ≠ 0. For definiteness we set f'(x0) > 0. Then f(x) is an increasing function at x0 so that there exists δ > 0 such that f(x) < f(x0) for all x in (x0 - δ, x0) and f(x0) < f(x) for all x in (x0, x0 + δ) (Fig. 8.29). This implies that there exists no neighbourhood of x0 where f(x0) is a maximum or a minimum of f(x), and the point x0 is neither a point of maximum nor a point of minimum of f(x).

Fig. 8.28    Fig. 8.29

Using similar reasoning we arrive at the same conclusion for f'(x0) < 0.
Hence, if there exists a derivative f'(x) at a point x0 and if f'(x0) ≠ 0, then f(x) can have neither a maximum nor a minimum at x0. Thus f(x) can have an extremum only at a point where its derivative f'(x) either is equal to zero or does not exist. ►
Figure 8.30 offers a geometric illustration of Theorem 8.15. The function y = f(x) has extrema at the points x1, x2, x3 and x4. The derivative f'(x) does not exist at x1 and x4, and f'(x) is equal to zero at x2 and x3.
The points where the necessary condition for extremum of a function f(x) is satisfied are sometimes called the critical points of f(x). These are the roots of the equation f'(x) = 0 and the points where f'(x) does not exist (in particular, the points where f'(x) is infinite). The points where f'(x) = 0 are called the stationary points of f(x), since the rate of change of the function f(x) at these points is zero.
Theorem 8.15 specifies only a necessary condition for an extremum of a function f(x); a function does not necessarily have a maximum or a minimum

at every critical point. For example, if f(x) = x³ then f'(0) = 0 so that the point x = 0 is a critical point. However f(x) = x³ has no extremum at x = 0 since f(0) = 0 and f(x) < 0 for x < 0 and f(x) > 0 for x > 0, which means that f(x) increases at x = 0.
The following theorems specify the sufficient conditions for a function to have a maximum or a minimum at a point.

Fig. 8.30

Theorem 8.16. Let x = xo be a critical point of a function f(x) where


either f' (xo) = 0 or f' (xo) does not exist and let f(x) be continuous at Xo.
Suppose that there exists o > 0 such that f' (x) > 0 for all x in (xo - o,
xo) and f' (x) < 0 for all x in (xo, xo + o) so that the derivative f' (x)
changes its sign from positive to negative at xo when x moves through xo
from left to right. Then f(x) has a maximum at xo.
◄ Since f' (x) > 0 on (xo - o, xo) then f(x) increases on the closed interval
[xo - o, xo]; similarly, f' (x) < 0 on (xo, xo + o) implies that f(x) decreases
on the closed interval [xo, Xo + o]. Hence f(xo) is a maximum of f(x) within
the neighbourhood (xo - o, xo + o) of xo (Fig. 8.31). This means that f(xo)
is a local maximum of f(x). ►
Theorem 8.17. Let x = Xo be a critical point of a function f(x) where
either f' (xo) = 0 or f' (xo) does not exist and let f(x) be continuous at xo.
Suppose that there exists o > 0 such that f' (x) < 0 for all x in (xo - o,
xo) and f' (x) > 0 for all x in (xo, xo + o) so that the derivative f' (x)
changes its sign from negative to positive at xo when x moves through xo
from left to right. Then f(x) has a minimum at xo.
The proof of Theorem 8.17 is similar to that of Theorem 8.16.
If within some neighbourhood (x0 - δ, x0 + δ) of a critical point x0 the derivative f'(x) is of the same sign on the left and on the right of x0, the function f(x) has no extremum at x0. Thus, if f'(x) > 0 for all x in (x0 - δ, x0) and for all x in (x0, x0 + δ), then, however small the neighbourhood (x0 - δ, x0 + δ), f(x) increases both on the left and on the right of x0, so that f(x0) is neither a maximum nor a minimum of f(x) in (x0 - δ, x0 + δ), i.e., f(x) has neither a maximum nor a minimum at x0.
For the sufficient conditions given by Theorem 8.16 and Theorem 8.17 to be satisfied it is important that the function f(x) be continuous at x0. For example, if

f(x) = -x  for x < 0,
f(x) = x + 1  for x ≥ 0,

then the derivative f'(x) does not exist at x = 0 (Fig. 8.32). The derivative f'(x) changes its sign when x moves through x = 0; however, f(x) has no extremum at x = 0 since there exists no neighbourhood of x = 0 where f(0) = 1 would be either a maximum or a minimum of f(x). The point is that the function f(x) is not continuous at x = 0.

Fig. 8.31    Fig. 8.32

The general procedure of computing extrema of a function involves the following steps:
(a) Compute the derivative f'(x) and find the roots of the equation f'(x) = 0;
(b) Find all points where f'(x) does not exist. These points along with the roots of f'(x) = 0 are the critical points of f(x);
(c) Determine the signs of f'(x) on the left and on the right of every critical point. Then f(x) has a maximum at a critical point x0 if f'(x) changes its sign from positive to negative when x moves through this critical point from left to right, and f(x) has a minimum at x0 if f'(x) changes its sign from negative to positive at x0 when x moves through this point from left to right. If f'(x) does not change its sign when x moves through x0, f(x) has neither a maximum nor a minimum at x0.
358 8. Differential Calculus. Functions of One Variable

Examples. (1) Investigate the function y = x²e^(-x) for extremum.
◄ (a) Computing the derivative y' of y, we have y' = 2xe^(-x) - x²e^(-x) = e^(-x)(2 - x)x.
(b) Solving the equation y' = 0 gives the critical points x = 0 and x = 2.
(c) Determining the signs of the derivative, we conclude that: (i) y' is negative on the left of x = 0 and positive on the right of x = 0; (ii) y' is positive on the left of x = 2 and negative on the right of x = 2. Hence, the function has a maximum at x = 2 and a minimum at x = 0 (Fig. 8.33). ►
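Steps (a)-(c) lend themselves to automation. The sketch below, added for illustration and assuming the SymPy library, finds the critical points of y = x²e^(-x) and tests the sign of y' on each side of them.

    # Steps (a)-(c) of the extremum procedure for y = x^2 * exp(-x).
    import sympy as sp

    x = sp.symbols('x')
    y = x**2 * sp.exp(-x)

    yprime = sp.diff(y, x)                            # (a) the derivative
    critical = sorted(sp.solve(sp.Eq(yprime, 0), x))  # (b) roots of y' = 0: [0, 2]

    eps = 0.1
    for c in critical:                                # (c) sign of y' on both sides
        left = yprime.subs(x, c - eps).evalf()
        right = yprime.subs(x, c + eps).evalf()
        if left < 0 < right:
            print(c, 'minimum')
        elif left > 0 > right:
            print(c, 'maximum')
        else:
            print(c, 'no extremum')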
(2) Investigate the function y = x^(2/3) for extremum.
◄ (a) The derivative y' = 2/(3∛x).
(b) The derivative y' does not vanish; however it does not exist at x = 0, and y'(x) → ∞ as x → 0. Hence the only critical point is x = 0.
(c) The derivative y'(x) is negative on the left of x = 0 and positive on the right of x = 0. Hence the function has a minimum at x = 0 (Fig. 8.34). ►

Fig. 8.33    Fig. 8.34

(3) Investigate the function y = x³ for extremum.
◄ (a) The derivative y' = 3x².
(b) The critical point as a solution of 3x² = 0 is x = 0.
(c) The derivative y'(x) is positive on the left and on the right of x = 0. Hence the function increases at x = 0 and has neither a maximum nor a minimum at x = 0. ►
Remark. If a function f(x) has a minimum at a point x0 this does not imply that f(x) necessarily increases on the right of x0 and decreases on the left of x0. Consider the function

f(x) = x²(2 - sin(1/x))  for x ≠ 0,
f(x) = 0  for x = 0.

It is easy to see that at x = 0 f(x) is continuous and has a minimum (Fig. 8.35). The derivative f'(x) = 2x(2 - sin(1/x)) + cos(1/x) is continuous in any neighbourhood of x = 0 except at x = 0, and f'(x) changes its sign infinitely many times. The function f(x) is not monotone on the left and on the right of x = 0.
Applying the second derivative to investigation of a function for extremum. The following theorem specifies the sufficient conditions for a function to have an extremum at a point.

Fig. 8.35

Theorem 8.18. Let f(x) have the first and second derivatives at a point x0 and let f'(x0) = 0 and f''(x0) ≠ 0. Then at x0 the function f(x) has a maximum if f''(x0) < 0 and a minimum if f''(x0) > 0.
◄ Observe that the point Xo is a critical point of the function f(x) since
f' (xo) = 0. Let f" (xo) < 0. This implies that at Xo the first derivative f' (x)
of f(x) decreases so that there exists a neighbourhood (x0 - o, xo + o) of
xo such that f' (x) > f' (xo) = 0 for all x in (xo - o, xo) and f' (x) <
f' (xo) = 0 for all x in (xo, xo + o). Hence f' (x) changes its sign from posi-
tive to negative at xo when x moves through x 0 from left to right. Thus
f(x) has a maximum at xo.
Similar reasoning yields that f(x) has a minimum at x 0 if f" (xo) > 0
at Xo. ►
Theorem 8.18 enables us to lay down the following useful procedure
for investigating a function for extremum. First, we find all critical points
of a function as outlined in the general procedure given above. Second,
we compute the second derivative f" (x) at a critical point and determine
its sign if f''(x) exists. If at some critical point x0 we have f''(x0) < 0, f(x) has a maximum at x0, and if f''(x0) > 0 then f(x) has a minimum at x0. If the second derivative either is equal to zero or does not exist at x0, then we can decide on the extremum of f(x) at this point by using the first derivative of f(x) as specified by the general procedure.
Example. Investigate the function y = e^(-x²) for extremum.
◄ We have y' = -2xe^(-x²) so that x = 0 is a critical point. We find y'' = -2e^(-x²) + 4x²e^(-x²) and y''(0) = -2 < 0, so that the function has a maximum at x = 0 (Fig. 8.36). ►
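The same example can be used to check Theorem 8.18 mechanically; the sketch below (SymPy assumed, illustrative only) evaluates y'' at the critical point.

    # Second-derivative test for y = exp(-x^2).
    import sympy as sp

    x = sp.symbols('x')
    y = sp.exp(-x**2)

    y1 = sp.diff(y, x)       # y' = -2x exp(-x^2)
    y2 = sp.diff(y, x, 2)    # y'' = (4x^2 - 2) exp(-x^2)

    for c in sp.solve(sp.Eq(y1, 0), x):    # critical points: [0]
        curvature = y2.subs(x, c)
        if curvature < 0:
            print(c, curvature, 'maximum')     # prints 0, -2, maximum
        elif curvature > 0:
            print(c, curvature, 'minimum')
        else:
            print(c, curvature, 'inconclusive')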

Fig. 8.36

Fig. 8.37

The absolute maximum and minimum of a function continuous on a


closed interval. If a function f(x) is defined and continuous on a closed
interval [a, b] then, by virtue of Theorem 7.30, f(x) attains its absolute
maximum and absolute minimum on [a, b].
If f(x) attains its maximum M at an interior point xo of [a, b], i.e.,
a< xo < b, then M = f(xo) is a local maximum of f(x) since in this case
there exists a neighbourhood of xo such that at all points belonging to this
neighbourhood which are on the left and on the right of xo the values of
f(x) do not exceed f(xo). However, the function f(x) can attain its absolute
maximum M at the endpoints of [a, b]. So, if we wish to find an absolute
maximum of a functionf(x) continuous on a closed interval [a, b] we have
to find all maxima of f(x) in an open interval (a, b) and the values of
f(x) at the endpoints of [a, b], i.e., f(a) and f(b), and choose the greatest
value as the absolute maximum of f(x) on [a, b].
The absolute minimum of a function f(x) continuous on [a, b] is the
least value of f(x) among all the minima of f(x) in (a, b) and the values
of f(a) and f(b). In the case of a function shown in Fig. 8.37 we have
M = f(b) and m = f(xo).
Example. Suppose that we have a square sheet of steel whose side is
a and we wish to make a box of maximal volume by cutting out four equal
squares at the vertices of the given sheet as shown in Fig. 8.38 and flanging
the sheet. How should we choose the size of the cut out squares to get
a box of maximal volume?
Fig. 8.38

◄ The volume of the box, being a function of x, is given by

v(x) = x(a - 2x)²,   0 ≤ x ≤ a/2.

The first derivative of v(x) is

dv/dx = (a - 2x)² - 4x(a - 2x) = (a - 2x)(a - 6x)

so that the critical points of v(x) are x1 = a/6 and x2 = a/2.
The open interval (0, a/2) contains the critical point x1 = a/6.
The second derivative of v(x) is

d²v/dx² = -2(a - 6x) - 6(a - 2x).

We have d²v/dx² < 0 at x = a/6 so that at this point the function v(x) has the maximum

v(a/6) = (a/6)(a - a/3)² = 2a³/27.

At the endpoints of [0, a/2] we have v(0) = v(a/2) = 0.
Therefore, we get the maximum of v(x), i.e., the box of maximal volume, by cutting out four equal squares with side x = a/6 from the given sheet of steel. In this case the maximal volume of the box is 2a³/27. ►
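As a check, the whole computation can be handed to a computer algebra system. The sketch below is illustrative only (SymPy assumed); the symbol a stands for the side of the sheet.

    # Maximizing v(x) = x(a - 2x)^2 on [0, a/2].
    import sympy as sp

    x = sp.symbols('x')
    a = sp.symbols('a', positive=True)

    v = x * (a - 2 * x)**2
    critical = sp.solve(sp.Eq(sp.diff(v, x), 0), x)             # [a/6, a/2] (order may vary)
    interior = [c for c in critical if sp.simplify(c - a / 2) != 0]
    x_star = interior[0]                                        # x = a/6

    print(x_star)
    print(sp.simplify(v.subs(x, x_star)))                       # 2*a**3/27
    print(sp.simplify(sp.diff(v, x, 2).subs(x, x_star)))        # -4*a < 0, so a maximum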

8.9 Investigating the Shape of a Curve.


Points of Inflection
Convexity of a curve and points of inflection. Let a curve be speci-
fied by a function y = f(x) and let y = f(x) have a finite derivative f' (xo)
at a point Xo so that at the point Mo(Xo, f(xo)) of the curve there exists
a tangent which is not parallel to the y-axis.

Fig. 8.39

We say that a curve is convex downward at a point Mo if there exists


a neighbourhood (xo - o, xo + o) of a point Xo such that all points of the
curve with abscissas contained in (xo - o, xo + o) lie above the tangent to
this curve at Mo (Fig. 8.39).
A curve is said to be convex upward at a point Mo if all points of this
curve with abscissas in some neighbourhood of xo lie below the tangent
to the curve at Mo (Fig. 8.40).

Let y = f(x) be a function differentiable on an open interval (a, b). We


say that the graph of y = f(x) is convex upward (downward) on (a, b) if
the graph does not lie above (below) the tangent to y = f(x) for any x in
(a, b).

Fig. 8.40

Fig. 8.41

The point Mo(xo, f(xo)) is called the point of inflection of a curve


y = f(x) if there exists a neighbourhood (xo - o, xo + o) of xo such that
for all x in (xo - o, xo + o) the curve is convex upward if x < xo and convex
downward if x > xo or vice versa (Fig. 8.41). In other words, the point Mo
is the point of inflection if on the left and on the right of Mo the curve
lies on different sides of the tangent at Mo.

Now we shall consider an analytical procedure for investigating the


sense of convexity and points of inflection of a curve.
Let us choose a point on a curve y = f(x) and a point on the tangent to y = f(x) at M0(x0, f(x0)). Suppose that these points have the same abscissa x, and y is the ordinate of the point chosen on the curve while Y is the ordinate of the point on the tangent (Fig. 8.42). Evidently, if y - Y > 0 for all x ≠ x0 in a sufficiently small neighbourhood of x0 the curve is convex downward at M0, and if y - Y < 0 for all x ≠ x0 in a sufficiently small neighbourhood of x0 the curve is convex upward at M0. Hence, to determine the sense of convexity of a curve at M0 it suffices to investigate the sign of the difference y - Y in a neighbourhood of x0.

Fig. 8.42
Since the equation of the tangent to the curve is

Y - f(x0) = f'(x0)(x - x0)

we have

y - Y = f(x) - [f(x0) + f'(x0)(x - x0)] = [f(x) - f(x0)] - f'(x0)(x - x0)

or

Δ = y - Y = [f(x0 + h) - f(x0)] - f'(x0)h,

where h = x - x0.
Let f(x) have the second derivative f''(x) at x0 and in some neighbourhood of x0. Applying the mean value theorem (Theorem 8.7) to the above identity, we get

Δ = f'(x0 + θh)h - f'(x0)h = [f'(x0 + θh) - f'(x0)]h,

where θ = θ(h) and 0 < θ < 1.

Since f''(x) exists at x0, then

lim_{Δx→0} [f'(x0 + Δx) - f'(x0)]/Δx = f''(x0)

so that

lim_{θh→0} [f'(x0 + θh) - f'(x0)]/(θh) = f''(x0)

and

f'(x0 + θh) - f'(x0) = f''(x0)·θh + α(θh)·θh,

where α(θh) → 0 as h → 0.
Then we can write

Δ = y - Y = [f''(x0) + α(θh)]θh².

Fig. 8.43    Fig. 8.44

Let f''(x0) ≠ 0. Since α(θh) is an infinitesimal as h → 0 there exists δ > 0 such that f''(x0) + α(θh) is of the same sign as f''(x0) and θh² > 0 whenever 0 < |h| < δ. Hence, if f''(x0) > 0 then y - Y > 0 for all x sufficiently near to x0 and the curve y = f(x) is convex downward at the point M0(x0, f(x0)), and if f''(x0) < 0 then y = f(x) is convex upward at M0 (Fig. 8.43). Whence follows the necessary condition for a point to be a point of inflection: the point M0(x0, f(x0)) may be a point of inflection of the curve y = f(x) only if f''(x0) = 0 or if f''(x0) does not exist.

This condition is not a sufficient one. For example, if f(x) = x⁴ then f''(x) = 12x² and f''(0) = 0; however, the point O(0, 0) is not a point of inflection of y = x⁴: at this point the curve is convex downward (Fig. 8.44).
The following theorem gives the sufficient condition for a point to be
a point of inflection of a curve.
Theorem 8.19. Let f(x) have the second derivative in a neighbourhood
of xo and let the second derivative of f(x) be continuous at xo. Then the
point Mo(xo, f(xo)) is a point of inflection of the curve y = f(x) if
f" (xo) = 0 and f" (x) changes its sign at Xo when x passes through Xo.
◄ Indeed, let f''(x0) = 0 and let there exist a neighbourhood (x0 - δ, x0 + δ) of x0 such that the sign of f''(x) for all x < x0 differs from that for all x > x0. This means that the sense of convexity changes at M0(x0, f(x0)) so that M0 is a point of inflection.
Suppose now that f''(x0) = 0 and there exists some neighbourhood of x0 such that f''(x) is of the same sign for all x < x0 and all x > x0. Then M0 is not a point of inflection since the curve is convex downward at M0 if f''(x) > 0 both on the left and on the right of x0, and the curve is convex upward at M0 if f''(x) < 0 both on the left and on the right of x0. ►
Problem. Let f(x) have a finite third derivative at x0 and let f''(x0) = 0 and f'''(x0) ≠ 0. Prove that M0(x0, f(x0)) is a point of inflection of the graph of f(x).
Sometimes it happens that at the point of inflection M0(x0, f(x0)) the tangent to the curve y = f(x) is vertical, i.e., parallel to the y-axis, so that f''(x) does not exist at x0. For example, if f(x) = x^(1/3) then f'(x) = (1/3)x^(-2/3) and f''(x) = -(2/9)x^(-5/3). Clearly, there exist no points where f''(x) = 0, and at x = 0 f''(x) does not exist. Let us investigate the sign of f''(x) in a neighbourhood of x = 0. We have f''(x) > 0 for all x in (-δ, 0), δ > 0, and f''(x) < 0 for all x in (0, δ), so that the curve is convex downward on the left of the point O(0, 0) and convex upward on the right of O(0, 0). Hence, the point O(0, 0) is the point of inflection of y = x^(1/3). The tangent to this curve at O(0, 0) is perpendicular to the x-axis, for lim_{x→0} f'(x) = +∞.
Finally, we can state the following sufficient condition for a point to be a point of inflection. Let y = f(x) have a tangent at M0(x0, f(x0)), which can be parallel to the y-axis. Let f(x) have the second derivative continuous in a neighbourhood of x0 except possibly at x0. If f''(x) is equal to zero or does not exist at x0 and if f''(x) changes its sign when x passes through x0, then M0(x0, f(x0)) is a point of inflection of y = f(x).
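These conditions translate into a simple search: find the points where f'' vanishes (or fails to exist) and test the sign of f'' on both sides. The sketch below, added for illustration and assuming SymPy, does this for f(x) = x⁴ and f(x) = x³: the first has no inflection at x = 0, the second does.

    # Testing candidate inflection points by the sign change of f''.
    import sympy as sp

    x = sp.symbols('x')
    eps = 0.1

    for f in (x**4, x**3):
        f2 = sp.diff(f, x, 2)
        for c in sp.solve(sp.Eq(f2, 0), x):          # candidates where f''(c) = 0
            left = f2.subs(x, c - eps).evalf()
            right = f2.subs(x, c + eps).evalf()
            if left * right < 0:                     # sign change <=> point of inflection
                print(f, c, 'point of inflection')
            else:
                print(f, c, 'no inflection')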

8.10 Asymptotes of a Curve


Consider a curve having an infinite branch (Fig. 8.45). An asymptote of an infinite branch of a curve is a line whose distance δ to a point M on the curve tends to zero as M recedes infinitely far from the origin of coordinates.
Vertical asymptotes. A line passing parallel to the y-axis through the
point x = xo on the x-axis is a vertical asymptote of the graph of y = f(x)
if at least one of the identities
lim_{x→x0-0} f(x) = ±∞   or   lim_{x→x0+0} f(x) = ±∞

holds.
Fig. 8.45

Clearly, in this case the distance δ = |x - x0| between the point M(x, f(x)) on the graph of y = f(x) and the line x = x0 tends to zero as M recedes infinitely far from the origin of coordinates.
For example, the graph of y = 1/x has the vertical asymptote x = 0 since lim_{x→0-0} 1/x = -∞ and lim_{x→0+0} 1/x = +∞ (Fig. 8.46). Similarly, the graph of y = e^(1/x) has the vertical asymptote x = 0, for lim_{x→0+0} e^(1/x) = +∞.

Figure 8.47 illustrates mutual positions of a curve and its vertical


asymptotes.
To find vertical asymptotes of a curve y = f(x) one should proceed as
follows:
(1) Find discontinuities of f(x) on the x-axis.
(2) Isolate all those discontinuities where at least one of the one-sided
limits of f(x) is equal to +∞ or -∞. Let these be x1, x2, ..., xm. Then
the lines x = x1, x = x2, ..., x = xm will be the vertical asymptotes of the

Fig. 8.46

Fig. 8.47

graph of y = f(x). For example, the vertical asymptotes of the graph of the curve y = 1/(x² - 1) are the lines x = -1 and x = 1 (Fig. 8.48).
The vertical line x = x0 (x0 being the endpoint of the interval where f(x) is defined) is the vertical asymptote of y = f(x) provided that (i) x0 is the left end of the interval and

lim_{x→x0+0} f(x) = +∞   or   lim_{x→x0+0} f(x) = -∞

or (ii) x0 is the right end of the interval and

lim_{x→x0-0} f(x) = +∞   or   lim_{x→x0-0} f(x) = -∞.

Fig. 8.48

For example, the function y = ln x is defined on the interval 0 < x < +∞ and lim_{x→0+0} ln x = -∞, so that the line x = 0, i.e., the y-axis, is the vertical asymptote of the graph of y = ln x.
Oblique asymptotes. Let y = f(x) be defined for all x ≥ a (or x ≤ a). Suppose that the line y = kx + b is an asymptote of the graph of y = f(x). Then the asymptote y = kx + b is called the oblique asymptote.

For definiteness we consider arbitrarily large positive x. If the line y = kx + b is an asymptote of y = f(x) then, by definition, the distance δ from the point M(x, f(x)) on the curve y = f(x) to the line y = kx + b tends to zero as x → +∞. Let α be the angle (α ≠ π/2) made by the asymptote with the x-axis (Fig. 8.49). Evidently, δ = |MN| cos α. Since cos α ≠ 0, |MN| tends to zero as δ tends to zero as x → +∞, and vice versa. Noting that |MN| = |f(x) - kx - b| we infer that the line y = kx + b is an oblique asymptote of the graph of y = f(x) if and only if

lim_{x→+∞} (f(x) - kx - b) = 0

so that f(x) admits a representation of the form

f(x) = kx + b + α(x),

where lim_{x→+∞} α(x) = 0.

Fig. 8.49

Observe that if there exists an asymptote y = kx + b of the curve y = f(x) as x → +∞ then the function y = f(x) is "nearly" a linear function as x → +∞, i.e., y = f(x) differs from y = kx + b by an infinitesimal as x → +∞.
Theorem 8.20. For the graph of the function y = f(x) to have the oblique asymptote y = kx + b as x → +∞ it is necessary and sufficient that there exist both

lim_{x→+∞} f(x)/x = k   and   lim_{x→+∞} [f(x) - kx] = b.

◄ Necessity. Let the graph of y = f(x) have an asymptote y = kx + b as x → +∞, so that

f(x) = kx + b + α(x),

where α(x) → 0 as x → +∞.

Then

lim_{x→+∞} f(x)/x = lim_{x→+∞} [k + b/x + α(x)/x] = k

and

lim_{x→+∞} [f(x) - kx] = lim_{x→+∞} [b + α(x)] = b,

i.e., both limits we are seeking do exist.
Sufficiency. Let both limits mentioned in Theorem 8.20 exist. Since lim_{x→+∞} [f(x) - kx] = b, the difference f(x) - kx - b becomes an infinitesimal as x → +∞. Denote this difference by α(x). Then we can write f(x) = kx + b + α(x) where α(x) → 0 as x → +∞. Whence it follows that the graph of the function y = f(x) has the oblique asymptote y = kx + b. ►
The proof is similar if x → -∞.

Fig. 8.50

Example. Prove that the graph of y = x²/(x - 1) has an oblique asymptote.
◄ The graph of y = x²/(x - 1) has the vertical asymptote x = 1 (Fig. 8.50). Let us write this function as

f(x) = x²/(x - 1) = x + 1 + 1/(x - 1),

where 1/(x - 1) tends to zero as x → ∞. Hence, the function being considered can be expressed as

f(x) = x + 1 + α(x),

where α(x) = 1/(x - 1) → 0 as x → ∞. Whence we conclude that the graph of the function y = x²/(x - 1) has the oblique asymptote y = x + 1. ►
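Theorem 8.20 gives a direct recipe for finding such asymptotes: compute k = lim f(x)/x and b = lim [f(x) - kx]. The sketch below (SymPy assumed; an illustration, not part of the original) applies it to the same function.

    # Oblique asymptote of y = x^2/(x - 1) via Theorem 8.20.
    import sympy as sp

    x = sp.symbols('x')
    f = x**2 / (x - 1)

    k = sp.limit(f / x, x, sp.oo)        # slope of the asymptote
    b = sp.limit(f - k * x, x, sp.oo)    # intercept
    print(k, b)                          # 1 and 1, i.e. the asymptote is y = x + 1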
It is worth noting that the graph of y = f(x) lies below its asymptote if Δ = f(x) - kx - b < 0, and the graph lies above the asymptote if Δ > 0.
Horizontal asymptotes. If the function f(x) has the finite limit lim_{x→+∞} f(x) = b (lim_{x→-∞} f(x) = b) then the line y = b is the horizontal asymptote of the right (left) branch of the graph of y = f(x).
Clearly, a horizontal asymptote is a specific case of an oblique asymptote provided that k = 0.

Fig. 8.51

Examples. (1) Let y = 1/x. Since lim_{x→∞} 1/x = 0 the graph of the function y = 1/x has the horizontal asymptote y = 0.
(2) Let y = tan⁻¹ x, so that lim_{x→+∞} tan⁻¹ x = π/2 and lim_{x→-∞} tan⁻¹ x = -π/2. Hence, the right-hand branch of the graph of y = tan⁻¹ x has the asymptote y = π/2 and the left-hand branch has the asymptote y = -π/2 (Fig. 8.51).

(3) Let y = (sin x)/x and y(0) = 1. Since lim_{x→∞} (sin x)/x = 0 the line y = 0 is the horizontal asymptote of the graph of the function y = (sin x)/x (Fig. 8.52).
As easily follows from the above example, the graph of the function y = f(x) can intersect its asymptote infinitely many times.
Problem. Derive the condition for asymptotes of the graph of the rational function to exist.
Fig. 8.52

8.11 Curve Sketching


In general the process of investigating a function and sketching its
graph involves a sequence of steps which enable us to determine the basic
properties of the function being considered. These are:
(a) The domain of definition of the function.
(b) The discontinuities of the function and their kinds. The vertical
asymptotes.
(c) The point symmetry and the translation symmetry, i.e., whether the
function is even or odd and whether the function is periodic or not.
(d) The points of intersection of the graph with the axes of coordinates.
(e) The behaviour of the function at infinity and the horizontal and
oblique asymptotes.
(f) The intervals where the function is monotone and the points where
the function has extrema.
(g) The intervals where the graph is convex downward and is convex
upward and the points of inflection.
Based on the results of the steps (a)-(g) we can easily plot the graph
of the function. To illustrate our approach to curve sketching we consider
the following examples of investigating functions and constructing their
graphs.
Examples. (1) y = 1/(1 + x²) (Witch of Agnesi).
(a) The domain of definition is the number line.
(b) There exist no discontinuities and no vertical asymptotes.

(c) The function is even, i.e., f(-x) = f(x), so that the graph is symmetric relative to the y-axis. The function is not periodic. Since the function is even it suffices to construct the graph for x ≥ 0 and reflect this graph with respect to the y-axis.
(d) The graph lies above the x-axis since y = 1 at x = 0 and y > 0 for all x; y ≠ 0.
(e) The graph has the horizontal asymptote y = 0 since lim_{x→±∞} f(x) = lim_{x→±∞} 1/(1 + x²) = 0. There are no oblique asymptotes.
(f) The first derivative of y is y' = -2x/(1 + x²)², so that f(x) increases for x < 0 and decreases for x > 0. The point x = 0 is critical. The derivative y' changes its sign from positive to negative when x moves through the point x = 0 from left to right. Hence, the function has the maximum y(0) = 1 at x = 0. This almost immediately follows from the inequality f(x) = 1/(1 + x²) ≤ 1, holding for all x.
(g) The second derivative y'' = -2(1 - 3x²)/(1 + x²)³ vanishes at the points x = 1/√3 and x = -1/√3. The graph is convex downward for x > 1/√3 and convex upward for 0 ≤ x < 1/√3 since y'' > 0 in the former case and y'' < 0 in the latter. Hence, x = 1/√3 = √3/3 is a point of inflection. By virtue of symmetry the point x = -1/√3 = -√3/3 is also a point of inflection.
It is convenient to represent the results in tabular form.

Table 8.1

x      | (-∞, -√3/3) | -√3/3               | (-√3/3, 0) | 0   | (0, √3/3) | √3/3                | (√3/3, +∞)
f'(x)  | +           | +                   | +          | 0   | -         | -                   | -
f''(x) | +           | 0                   | -          | -2  | -         | 0                   | +
f(x)   | ↗           | point of inflection | ↗          | max | ↘         | point of inflection | ↘

The arrows ↗ and ↘ refer to the increase and decrease of the function, respectively. The graph of the function is given in Fig. 8.53.
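The data for such a table can be produced mechanically. The sketch below, an illustration assuming SymPy, computes the critical points, the inflection candidates and the horizontal asymptote for the Witch of Agnesi.

    # Data behind Table 8.1 for y = 1/(1 + x^2).
    import sympy as sp

    x = sp.symbols('x', real=True)
    y = 1 / (1 + x**2)

    y1 = sp.diff(y, x)
    y2 = sp.diff(y, x, 2)

    print(sp.solve(sp.Eq(y1, 0), x))                        # critical points: [0]
    print(sp.solve(sp.Eq(y2, 0), x))                        # [-sqrt(3)/3, sqrt(3)/3]
    print(sp.limit(y, x, sp.oo), sp.limit(y, x, -sp.oo))    # horizontal asymptote y = 0
    print(y2.subs(x, 0))                                    # -2 < 0, so x = 0 is a maximum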
(2) y = x² + 1/x.
(a) The domain is the number line except the point x = 0.

(b) The discontinuity is the point x = 0. We have

lim_{x→0+0} (x² + 1/x) = +∞   and   lim_{x→0-0} (x² + 1/x) = -∞

so that the line x = 0 is the vertical asymptote of the graph.


Fig. 8.53

(c) The function is neither even nor odd and it is not periodic.
(d) Putting y = 0, we get x² + 1/x = 0 or (x³ + 1)/x = 0; whence x = -1, so that the graph of the function meets the x-axis at the point x = -1.
(e) There are neither oblique nor horizontal asymptotes since

lim_{x→±∞} f(x)/x = lim_{x→±∞} (x + 1/x²) = ±∞.

(f) The first derivative is y' = 2x - 1/x² = (2x³ - 1)/x², so that the point x = 1/∛2 is the critical point. The second derivative y'' = 2 + 2/x³ is positive at the point x = 1/∛2; hence, at this point the function has a minimum.
(g) The second derivative y'' = 2(x³ + 1)/x³ vanishes at x = -1 and changes its sign from positive to negative when x passes through the point x = -1 from left to right. Thus the point x = -1 is a point of inflection.
Since y'' > 0 for x in (-∞, -1) and for x in (0, +∞) and y'' < 0 on the interval -1 < x < 0, the graph is convex downward on the open inter-
vals (-∞, -1) and (0, +∞) and is convex upward on the open interval -1 < x < 0.
The graph of the function is given in Fig. 8.54.
(3) y = x + (ln x)/x.
(a) The domain is the positive number line x > 0.
(b) There are no discontinuities in the domain of definition.

Fig. 8.54

We have lim_{x→0+0} (x + (ln x)/x) = -∞, so that the line x = 0 passing through the endpoint of the domain of definition is the vertical asymptote of the graph of the function.
(c) The function is neither even nor odd nor periodic.
(d) Putting y = 0, we have x + (ln x)/x = 0 and x² + ln x = 0. The solutions of these equations, which can be obtained from the graph shown in Fig. 8.55, define the point of intersection of the graph under investigation with the x-axis.

(e) Since

lim_{x→+∞} f(x)/x = lim_{x→+∞} (1 + (ln x)/x²) = 1 = k

and

lim_{x→+∞} [f(x) - kx] = lim_{x→+∞} (ln x)/x = 0,

the line y = x is the oblique asymptote of the graph of the function.


Fig. 8.55

(f) The first derivative is y' = 1 + (1 - ln x)/x² = (x² + 1 - ln x)/x². One can easily see from Fig. 8.56 that x² + 1 > ln x for all x > 0. Hence, y' > 0 for all x and f(x) increases on the interval (0, +∞) and has no extrema.
(g) The second derivative y'' = -2/x³ - 1/x³ + (2 ln x)/x³ = (2 ln x - 3)/x³ vanishes at the point x = e^(3/2) and changes its sign from negative to positive when x passes through x = e^(3/2) from left to right. Hence, x = e^(3/2) is a point of inflection.
The graph of the function is shown in Fig. 8.57.
(4) y = x + 1/x².
(a) The domain is the number line except the point x = 0.
(b) The point x = 0 is a discontinuity of the second kind. The line

Fig. 8.56

Fig. 8.57

x = 0 is the vertical asymptote of the graph of the function, for

lim_{x→0±0} (x + 1/x²) = +∞.

(c) The function is neither even nor odd nor periodic.
(d) Putting y = 0, we have x³ + 1 = 0; whence x = -1, so that the graph of the function intersects the x-axis at the point x = -1.
(e) The graph has the oblique asymptote y = x since

lim_{x→±∞} f(x)/x = lim_{x→±∞} (1 + 1/x³) = 1 = k

and

lim_{x→±∞} [f(x) - x] = lim_{x→±∞} 1/x² = 0 = b.

(f) The equation y' = 1 - 2/x³ = (x³ - 2)/x³ = 0 gives x³ - 2 = 0, so that x = ∛2 is the critical point of the function.
Fig. 8.58

The second derivative y'' = 6/x⁴ is positive everywhere in the domain of definition of the function and, consequently, at x = ∛2. Whence it follows that the function has a minimum at x = ∛2.
(g) Since the second derivative y'' = 6/x⁴ is positive everywhere in the domain of definition, the graph is convex downward.
The graph of the function is given in Fig. 8.58.

(5) y = ∛(x(x - 3)²).
(a) The domain is the number line.
(b) The function is continuous everywhere in the domain of definition and its graph has no vertical asymptotes.
(c) The function is nonperiodic and neither even nor odd.
(d) The function vanishes at the points x = 0 and x = 3.
(e) The graph of the function has the oblique asymptote y = x - 2 since

lim_{x→±∞} f(x)/x = lim_{x→±∞} ∛(x(x - 3)²)/x = 1 = k

and

lim_{x→±∞} [f(x) - x] = lim_{x→±∞} [∛(x(x - 3)²) - x]
  = lim_{x→±∞} [x(x - 3)² - x³]/[∛(x²(x - 3)⁴) + x∛(x(x - 3)²) + x²] = -2 = b.

Fig. 8.59

(f) The first derivative

y' = [(x - 3)² + 2x(x - 3)]/(3[x(x - 3)²]^(2/3)) = (x - 1)/(x^(2/3)(x - 3)^(1/3))

vanishes at the point x = 1 and does not exist at the points x = 0 and x = 3. y' is of the same sign on the left and on the right of the point x = 0, so that the function has no extremum at x = 0. On the other hand the first derivative changes its sign from positive to negative when x passes through the point x = 1 from left to right, so that the function has a maximum at x = 1.
At the point x = 3 the function has a minimum since the first derivative changes its sign from negative to positive when x passes through the point x = 3 from left to right.
(g) The second derivative

y'' = -2/(x^(5/3)(x - 3)^(4/3))

does not exist at the point x = 0 and changes its sign from positive to negative when x passes through the point x = 0 from left to right, so that the point x = 0 is a point of inflection and the graph of the function has the tangent parallel to the y-axis at (0, 0). The point x = 3 is not a point of inflection. The graph of the function, shown in Fig. 8.59, is convex upward in the half-plane x > 0.

8.12 Approximate Solution of Equations


We wish to compute the real root of the equation
f(x) = 0.
Suppose that the following conditions are satisfied:
(1) the function f(x) is continuous on a closed interval [a, b];
(2) the values f(a) and f(b) are of opposite signs so that f(a)·f(b) < O;
(3) there exist derivatives f' (x) and f" (x) on [a, b] retaining their signs
on [a, b].
By virtue of conditions (1)-(2), Theorem 7.27 says that f(x) vanishes at least at one point ξ in (a, b); this means that the equation being considered has at least one real root ξ in (a, b). Furthermore, since f'(x) retains its sign on [a, b], the function f(x) is monotone on [a, b], so that the equation has the unique real root ξ in (a, b).
We shall consider an iterative method of computing the unique real root
of the equation f(x) = 0 with any predetermined precision.
Four cases have to be distinguished for [a, b]:
(a) f' (x) > 0 and f" (x) > 0;
(b) f' (x) > 0 and f" (x) < 0;
(c) f' (x) < 0 and f" (x) > 0;
(d) f' (x) < 0 and f" (x) < 0.

The graphs corresponding to cases (a)-(d) are given in Fig. 8.60.
Let us turn our attention to case (a) (Fig. 8.61). We have f'(x) > 0 and f''(x) > 0 on [a, b]. Draw the secant through the points A(a, f(a)) and B(b, f(b)). The equation of the secant is

[y - f(a)]/[f(b) - f(a)] = (x - a)/(b - a).

Fig. 8.60

The point a1 at which the secant AB intersects the x-axis lies between a and ξ, being a better approximation to ξ than a. Substituting y = 0 into the equation of the secant AB, we easily obtain

a1 = a - f(a)(b - a)/[f(b) - f(a)].

Look at Fig. 8.61. It is easy to observe that at a1 the signs of f(x) and f''(x) are opposite.
Now we draw the tangent to the curve y = f(x) at the point B(b, f(b)). Notice that at this point f(x) and f''(x) are of the same sign; this is a very important condition: if f(x) and f''(x) are of opposite signs at B(b, f(b))

it can happen that the process of approximating the root diverges. The point b1 of intersection of the tangent with the x-axis lies between b and ξ and offers a better approximation to ξ than b. The tangent is given by the equation

y - f(b) = f'(b)(x - b).

Then putting y = 0, we get

b1 = b - f(b)/f'(b)   (f'(b) ≠ 0).

Therefore we have a < a1 < ξ < b1 < b.


Fig. 8.61

Suppose that the absolute error of the approximation ξ* to ξ is preassigned. We can estimate the precision of the approximate values a1 and b1 by comparing the value |b1 - a1| with the preassigned error. If |b1 - a1| is greater than the preassigned error, then we choose the closed interval [a1, b1] and repeat the process by computing the new approximate values for the root ξ as

a2 = a1 - f(a1)(b1 - a1)/[f(b1) - f(a1)]

and

b2 = b1 - f(b1)/f'(b1)   (f'(b1) ≠ 0),

where a < a1 < a2 < ξ < b2 < b1 < b.



Repeating the process, we finally obtain two sequences of the approximate values for the root

a < a₁ < a₂ < ... < aₙ < ... < ξ

and

ξ < ... < bₙ < ... < b₂ < b₁ < b,

where

aₙ = aₙ₋₁ - f(aₙ₋₁)(bₙ₋₁ - aₙ₋₁)/[f(bₙ₋₁) - f(aₙ₋₁)],

bₙ = bₙ₋₁ - f(bₙ₋₁)/f'(bₙ₋₁)   (a₀ = a, b₀ = b, n = 1, 2, 3, ...).

The sequences {aₙ} and {bₙ} are monotone and bounded; hence they have limits. Let

lim aₙ = α   and   lim bₙ = β.

It is not hard to verify that α = β = ξ where ξ is the unique real root of


the equation f(x) = 0 provided that the conditions (1)-(3) hold.
Example. Compute the root ξ of the equation x² - 1 = 0 on the closed interval [0, 2].
◄ Evidently, ξ = 1. Let us apply the process outlined above. For the function f(x) = x² - 1 the following conditions are satisfied:
(1) f(x) is continuous on [0, 2];
(2) f(0) = -1 < 0 and f(2) = 3 > 0 so that f(0)·f(2) < 0;
(3) f'(x) = 2x and f''(x) = 2 retain their signs on [0, 2].
Thus there exists the unique root ξ of the equation x² - 1 = 0 on [0, 2]. Consequently, the process is applicable in this case.
We have a = 0 and b = 2. For n = 1 the recurrence formulas for aₙ and bₙ give

a₁ = 0 - (-1)·2/4 = 1/2 = 1 - 1/2,

b₁ = 2 - 3/4 = 5/4 = 1 + 1/4.

For n = 2 we get

a₂ = 1 - 1/14   and   b₂ = 1 + 1/40,

which are the approximate values for ξ with the absolute error Δ(ξ*) < 0.1. ►
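The recurrence for aₙ and bₙ is straightforward to program. The sketch below is a plain-Python illustration added to this edition; it covers case (a) (f' > 0, f'' > 0), as in the example just considered, and stops when |bₙ - aₙ| is below a preassigned error.

    # Combined chord (secant) and tangent (Newton) approximations for f(x) = 0 on [a, b],
    # assuming conditions (1)-(3) of the text and the sign pattern of case (a).
    def bracket_root(f, fprime, a, b, tol):
        while b - a > tol:
            a = a - f(a) * (b - a) / (f(b) - f(a))   # chord step, approaches the root from the left
            b = b - f(b) / fprime(b)                 # tangent step, approaches from the right
        return a, b

    f = lambda x: x * x - 1.0
    fprime = lambda x: 2.0 * x

    a_n, b_n = bracket_root(f, fprime, 0.0, 2.0, 1e-10)
    print(a_n, b_n)   # both values approximate the root xi = 1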

8.13 Taylor's Theorem


Taylor's formula for polynomials. Consider the nth degree poly-
nomial

where bo, b1, b2, ... , bn are constant coefficients.


We can express P(x) as the expansion in powers of x - a, with some
coefficients, where a is an arbitrary number. Indeed, put x = a + t. Then
P(x) becomes
P(x) = P(a + t) = bo + b1 (a + t) + ... + bn(a + tt.
On opening the brackets and collecting the similar terms, we can write
P(a + t) = Ao + A1t + A2t 2 + A3t 3 + ... + Antn
or, substituting x - a for t,

P(x) = A₀ + A₁(x - a) + A₂(x - a)² + A₃(x - a)³ + ... + Aₙ(x - a)ⁿ,   (*)

where A₀, A₁, ..., Aₙ are coefficients dependent on the original coefficients b₀, b₁, ..., bₙ.
Differentiating P(x) in (*) n times, we obtain

P'(x) = A₁ + 2A₂(x - a) + 3A₃(x - a)² + ... + nAₙ(x - a)ⁿ⁻¹,
P''(x) = 2·1·A₂ + 3·2·A₃(x - a) + ... + n(n - 1)Aₙ(x - a)ⁿ⁻²,
.................................................................
P⁽ⁿ⁾(x) = n(n - 1)...2·1·Aₙ.

Putting x = a in P(x) and in the above identities, we have

P(a) = A₀, P'(a) = 1!·A₁, P''(a) = 2!·A₂, ..., P⁽ⁿ⁾(a) = n!·Aₙ.
Whence

A₀ = P(a), A₁ = P'(a)/1!, A₂ = P''(a)/2!, ..., Aₙ = P⁽ⁿ⁾(a)/n!.

Therefore we can express P(x) as

P(x) = P(a) + [P'(a)/1!](x - a) + [P''(a)/2!](x - a)² + ... + [P⁽ⁿ⁾(a)/n!](x - a)ⁿ.

This is Taylor's formula, in powers of x - a, for the given polynomial P(x) of degree n, or Taylor's formula for P(x) at the point a.


If we put a = 0 we obtain the specific case of Taylor's formula

P(x) = P(0) + [P'(0)/1!]x + [P''(0)/2!]x² + ... + [P⁽ⁿ⁾(0)/n!]xⁿ,

which is called Maclaurin's formula.
Example. Expand the polynomial P(x) = x² - 3x + 2 in powers of x and in powers of x - 1.
◄ Applying Maclaurin's formula, we get

P(x) = x² - 3x + 2  ⇒  P(0) = 2,
P'(x) = 2x - 3      ⇒  P'(0) = -3,
P''(x) = 2          ⇒  P''(0) = 2,

and

P(x) = 2 - (3/1!)x + (2/2!)x² = 2 - 3x + x²,

so that the expansion of P(x) in powers of x is identical to P(x) itself.
To expand P(x) in powers of x - 1 we apply Taylor's formula and get

P(x) = x² - 3x + 2  ⇒  P(1) = 0,
P'(x) = 2x - 3      ⇒  P'(1) = -1,
P''(x) = 2          ⇒  P''(1) = 2,

and

P(x) = 0 - 1·(x - 1) + (2/2!)(x - 1)² = -(x - 1) + (x - 1)². ►
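The same re-expansion can be checked mechanically. The short sympy sketch below (our own illustration, not part of the original text) recomputes the coefficients P⁽ᵏ⁾(a)/k! for the polynomial of this example.

    import sympy as sp

    x = sp.symbols('x')
    P, a = x**2 - 3*x + 2, 1
    taylor = sum(P.diff(x, k).subs(x, a) / sp.factorial(k) * (x - a)**k
                 for k in range(3))
    print(taylor)                   # equals -(x - 1) + (x - 1)**2
    print(sp.expand(taylor - P))    # 0: the expansion reproduces P(x)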

Notice that Taylor's formula gives the value of P(x) at any point x
provided that the values of P(x) and all its derivatives at some point a
are known.
Taylor's formula for arbitrary functions. Now we consider a function
f(x) defined in a neighbourhood of x = a; the function f(x) may not be
a polynomial of degree (n - 1) but is supposed to have derivatives up to
order n in this neighbourhood.
Let us compute the values f(a), f'(a), ..., f⁽ⁿ⁻¹⁾(a) and use them to construct the function

Qₙ₋₁(x) = f(a) + f'(a)/1! (x - a) + ... + f⁽ⁿ⁻¹⁾(a)/(n - 1)! (x - a)ⁿ⁻¹.

Evidently, Qₙ₋₁(x) is Taylor's polynomial of degree n - 1 for the function f(x). If the original function f(x) were a polynomial of degree (n - 1) then we would have the identity f(x) = Qₙ₋₁(x) for all x in the neighbourhood in question. However, in the case under consideration the identity does not hold since, by the hypothesis, f(x) is not a polynomial of degree (n - 1).

Let us put
f(x) = Qn - 1 (x) + Rn (x).
This equality is called Taylor's formula for the function f(x) in the neigh-
bourhood of the point x = a or Taylor's formula for the function f(x) at
the point a, and Rn (x) is called the nth remainder of Taylor's series.
The remainder Rₙ(x) can be expressed in terms of the nth derivative of the function f(x). For the purpose we assume that f(x) is not a polynomial of degree (n - 1) and possesses continuous derivatives up to order (n - 1) on [a, b], and that there exists the nth derivative of f(x) in (a, b). Substituting b for x in Taylor's formula for the function f(x) at the point a, we can write

f(b) = f(a) + f'(a)/1! (b - a) + f''(a)/2! (b - a)² + ... + f⁽ⁿ⁻¹⁾(a)/(n - 1)! (b - a)ⁿ⁻¹ + Rₙ.
We shall represent Rₙ in the form

Rₙ = M(b - a)ⁿ,

where M is a quantity to be defined.
To this end we consider the auxiliary function

φ(x) = f(b) - [f(x) + f'(x)/1! (b - x) + f''(x)/2! (b - x)² + ... + f⁽ⁿ⁻¹⁾(x)/(n - 1)! (b - x)ⁿ⁻¹ + M(b - x)ⁿ]

defined on [a, b]. The function φ(x) is obtained by replacing a by x in the right-hand side of the formula for f(b) and subtracting the right-hand side from f(b).
Observe that φ(x) satisfies Rolle's theorem (Theorem 8.6). Indeed,
(a) φ(x) is continuous on [a, b] since the original function f(x) and all its derivatives up to order (n - 1) are continuous on [a, b];
(b) φ(x) has a derivative on (a, b) since the original function f(x) has the nth derivative on (a, b);
(c) φ(x) assumes equal values at the endpoints of [a, b]: computing φ(x) for x = a and x = b gives φ(a) = 0 and φ(b) = 0.
Then by virtue of Rolle's theorem there exists a point ξ in (a, b) such that φ'(ξ) = 0.
Computing the derivative φ'(x), we get

φ'(x) = -[f'(x) - f'(x)/1! + f''(x)/1! (b - x) - f''(x)·2(b - x)/2! + ...
    + f⁽ⁿ⁻¹⁾(x)/(n - 2)! (b - x)ⁿ⁻² - f⁽ⁿ⁻¹⁾(x)(n - 1)(b - x)ⁿ⁻²/(n - 1)!
    + f⁽ⁿ⁾(x)/(n - 1)! (b - x)ⁿ⁻¹ - Mn(b - x)ⁿ⁻¹],

so that all the terms cancel in pairs except the last two and

φ'(x) = -[f⁽ⁿ⁾(x)/(n - 1)! (b - x)ⁿ⁻¹ - Mn(b - x)ⁿ⁻¹].

Therefore

φ'(ξ) = -(b - ξ)ⁿ⁻¹ [f⁽ⁿ⁾(ξ)/(n - 1)! - Mn] = 0.

Since ξ ≠ b (the point ξ lies inside (a, b)), then

f⁽ⁿ⁾(ξ)/(n - 1)! - Mn = 0

and

M = f⁽ⁿ⁾(ξ)/n!.

Hence we can write Rₙ = M(b - a)ⁿ as

Rₙ = f⁽ⁿ⁾(ξ)/n! (b - a)ⁿ   (a < ξ < b).

Substituting this expression into the formula for f(b), we obtain

f(b) = f(a) + f'(a)/1! (b - a) + f''(a)/2! (b - a)² + ... + f⁽ⁿ⁻¹⁾(a)/(n - 1)! (b - a)ⁿ⁻¹ + f⁽ⁿ⁾(ξ)/n! (b - a)ⁿ.

This is Taylor's formula for the function f(x), and

Rₙ = f⁽ⁿ⁾(ξ)/n! (b - a)ⁿ,

where ξ lies between a and b, is the nth remainder as given by Lagrange.
If we put n = 1, Taylor's formula for f(x) becomes the formula given by Theorem 8.7,

f(b) = f(a) + f'(ξ)/1! (b - a)

or

f(b) - f(a) = f'(ξ)(b - a).

Taylor's formula remains valid for any points x₀ and x in [a, b], so that we can write it in the form

f(x) = f(x₀) + f'(x₀)/1! (x - x₀) + f''(x₀)/2! (x - x₀)² + ... + f⁽ⁿ⁻¹⁾(x₀)/(n - 1)! (x - x₀)ⁿ⁻¹ + Rₙ(x),

where Rₙ(x) = f⁽ⁿ⁾(ξ)/n! (x - x₀)ⁿ and the point ξ lies between x and x₀, or ξ = x₀ + θ(x - x₀) with 0 < θ < 1.
Putting x₀ = 0, we arrive at Maclaurin's formula

f(x) = f(0) + f'(0)/1! x + f''(0)/2! x² + ... + f⁽ⁿ⁻¹⁾(0)/(n - 1)! xⁿ⁻¹ + Rₙ(x),

where Rₙ(x) = f⁽ⁿ⁾(θx)/n! xⁿ, 0 < θ < 1.
n.
The remainder in Taylor's formula can also be represented in a form given by Peano. We have assumed that f(x) has the nth derivative f⁽ⁿ⁾(x) in a neighbourhood of x₀. Now we suppose that f⁽ⁿ⁾(x) is continuous at x₀. Then f⁽ⁿ⁾(x₀ + θ(x - x₀)) = f⁽ⁿ⁾(x₀) + α(x), where α(x) → 0 as x → x₀, and the nth remainder

Rₙ(x) = f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ

can be written as

Rₙ(x) = (f⁽ⁿ⁾(x₀) + α(x))/n! (x - x₀)ⁿ

or

Rₙ(x) = f⁽ⁿ⁾(x₀)/n! (x - x₀)ⁿ + α(x)(x - x₀)ⁿ/n!,

where α(x) → 0 as x → x₀.
Since α(x) → 0 as x → x₀, then α(x)(x - x₀)ⁿ/n! = o((x - x₀)ⁿ) as x → x₀, and Taylor's formula becomes

f(x) = f(x₀) + f'(x₀)/1! (x - x₀) + ... + f⁽ⁿ⁾(x₀)/n! (x - x₀)ⁿ + o((x - x₀)ⁿ)  as x → x₀.

This formula is sometimes called Taylor's formula for the function f(x)
with the remainder as given by Peano. It is easy to notice that the error
of the approximation of f(x) by Taylor's formula is an infinitesimal of higher order than (x - x₀)ⁿ as x → x₀. Hence, this formula is suitable to apply if we wish to approximate f(x) at points sufficiently close to x₀. So it is sometimes called the local Taylor's formula.
Maclaurin's formulas for some elementary functions. We shall apply Maclaurin's formula

f(x) = f(0) + f'(0)/1! x + f''(0)/2! x² + ... + f⁽ⁿ⁻¹⁾(0)/(n - 1)! xⁿ⁻¹ + f⁽ⁿ⁾(θx)/n! xⁿ,   0 < θ < 1,

to get approximations of some elementary functions.
Examples. (1) The function f(x) = eˣ.
◄ We have

f(x) = eˣ,        f(0) = 1,
f'(x) = eˣ,       f'(0) = 1,
. . . . . . . . . . . . . . . . . .
f⁽ⁿ⁻¹⁾(x) = eˣ,   f⁽ⁿ⁻¹⁾(0) = 1,
f⁽ⁿ⁾(x) = eˣ,     f⁽ⁿ⁾(θx) = e^{θx}.

Applying Maclaurin's formula, we obtain

eˣ = 1 + x/1! + x²/2! + ... + xⁿ⁻¹/(n - 1)! + e^{θx}/n! xⁿ,   0 < θ < 1.

Putting x = 1 gives

e = 2 + 1/2! + ... + 1/(n - 1)! + e^θ/n!,   0 < θ < 1.

Since

0 < e^θ/n! < 3/n!,

the polynomial 2 + 1/2! + 1/3! + ... + 1/(n - 1)! approximates the number e by defect (from below) with the approximation error less than 3/n!. ►
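A few lines of Python make the estimate concrete (a sketch using only the standard library):

    import math

    for n in range(3, 9):
        s = 2 + sum(1 / math.factorial(k) for k in range(2, n))   # 2 + 1/2! + ... + 1/(n-1)!
        print(n, s, math.e - s, 3 / math.factorial(n))            # the actual error stays below 3/n!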


(2) The sine function f(x) = sin x.
◄ We have

f(x) = sin x,      f(0) = 0,
f'(x) = cos x,     f'(0) = 1,
f''(x) = -sin x,   f''(0) = 0,
f'''(x) = -cos x,  f'''(0) = -1,

and in general

f⁽ᵐ⁾(x) = sin(x + mπ/2).

Whence

f⁽ᵐ⁾(0) = sin(mπ/2) = 0 for m = 2k,  (-1)ᵏ for m = 2k + 1,

and

f⁽ⁿ⁾(θx) = sin(θx + nπ/2).

Therefore, the terms with even powers of x vanish, so that Taylor's polynomial involving (2n + 1) terms is identically equal to that involving (2n + 2) terms. Then applying Maclaurin's formula and putting n = 2k + 1, we obtain

sin x = x/1! - x³/3! + x⁵/5! - ... + (-1)ᵏ⁻¹ x^{2k-1}/(2k - 1)! + R_{2k+1}(x),

where

R_{2k+1}(x) = x^{2k+1}/(2k + 1)! sin[θx + (2k + 1)π/2],   0 < θ < 1.

Clearly, |R_{2k+1}(x)| ≤ |x|^{2k+1}/(2k + 1)!. ►
(3) The cosine function f(x) = cos x.
◄ We have

f(x) = cos x,      f(0) = 1,
f'(x) = -sin x,    f'(0) = 0,
f''(x) = -cos x,   f''(0) = -1,

and in general

f⁽ᵐ⁾(x) = cos(x + mπ/2),

so that

f⁽ᵐ⁾(0) = cos(mπ/2) = 0 for m = 2k + 1,  (-1)ᵏ for m = 2k.

Applying Maclaurin's formula for n = 2k + 2, we get

cos x = 1 - x²/2! + x⁴/4! - x⁶/6! + ... + (-1)ᵏ x^{2k}/(2k)! + R_{2k+2}(x),

where

R_{2k+2}(x) = x^{2k+2}/(2k + 2)! cos[θx + (2k + 2)π/2],   0 < θ < 1.

Evidently, |R_{2k+2}(x)| ≤ x^{2k+2}/(2k + 2)!. ►
Maclaurin's formulas for sin x and cos x are suitable when we wish to
approximate these functions with any predetermined errors. The Taylor ap-
proximations to the functions sin x and cos x in a neighbourhood of x = 0
are shown in Fig. 8.62 and in Fig. 8.63.

Fig. 8.62                         Fig. 8.63

(4) The logarithmic function f(x) = ln(1 + x).
◄ This function is defined and differentiable infinitely many times for x > -1.
We have

f(x) = ln(1 + x),        f(0) = ln 1 = 0,
f'(x) = 1/(1 + x),       f'(0) = 1,
f''(x) = -1/(1 + x)²,    f''(0) = -1,
f'''(x) = 1·2/(1 + x)³,  f'''(0) = 2!,
. . . . . . . . . . . . . . . . . . . . . . . . .

Using Maclaurin's formula, we obtain

ln(1 + x) = x/1 - x²/2 + x³/3 - ... + (-1)ⁿ xⁿ⁻¹/(n - 1) + Rₙ(x),

where

Rₙ(x) = (-1)ⁿ⁺¹ xⁿ/(n(1 + θx)ⁿ),   0 < θ < 1. ►
(5) The power function f(x) = (1 + x)^α (α is real, x > -1).
◄ We have

f(x) = (1 + x)^α,                                     f(0) = 1;
f'(x) = α(1 + x)^{α-1},                               f'(0) = α;
f''(x) = α(α - 1)(1 + x)^{α-2},                       f''(0) = α(α - 1);
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
f⁽ⁿ⁻¹⁾(x) = α(α - 1) ... (α - n + 2)(1 + x)^{α-n+1},   f⁽ⁿ⁻¹⁾(0) = α(α - 1) ... (α - n + 2);
f⁽ⁿ⁾(x) = α(α - 1) ... (α - n + 1)(1 + x)^{α-n},       f⁽ⁿ⁾(θx) = α(α - 1) ... (α - n + 1)(1 + θx)^{α-n}.

Whence

(1 + x)^α = 1 + α/1! x + α(α - 1)/2! x² + ... + α(α - 1) ... (α - n + 2)/(n - 1)! xⁿ⁻¹ + Rₙ(x),

where

Rₙ(x) = α(α - 1) ... (α - n + 1)/n! (1 + θx)^{α-n} xⁿ,   0 < θ < 1.

It is easy to observe that for a natural number α = m Maclaurin's formula for (1 + x)^α becomes the binomial formula

(1 + x)^m = 1 + m/1! x + m(m - 1)/2! x² + ... + x^m   for all x.

Maclaurin's formula and equivalent infinitesimals. We have established


equivalence relations for infinitesimals in Chap. 7. These formulas are
suitable to represent elementary functions for sufficiently small Ix I and
to evaluate limits of these functions. However, it is not a rare case when
the limit in question depends on higher degrees of x than those involved
in equivalence relations previously obtained. In this case we need to extend
these equivalence relations to terms involving higher degrees of x to be able

to compute the limits of functions as x -+ 0. These extensions can easily


be obtained by applying Maclaurin's formula with remainder specified by
Peano. The following is the list of the equivalence relations thus obtained:
eˣ = 1 + x + x²/2! + ... + xⁿ/n! + o(xⁿ),   x → 0;

sin x = x - x³/3! + x⁵/5! - ... + (-1)ⁿ x^{2n+1}/(2n + 1)! + o(x^{2n+2}),   x → 0;

cos x = 1 - x²/2! + x⁴/4! - ... + (-1)ⁿ x^{2n}/(2n)! + o(x^{2n+1}),   x → 0;

ln(1 + x) = x - x²/2 + x³/3 - ... + (-1)ⁿ⁻¹ xⁿ/n + o(xⁿ),   x → 0;

(1 + x)^α = 1 + αx + α(α - 1)/2! x² + ... + α(α - 1) ... (α - n + 1)/n! xⁿ + o(xⁿ),   x → 0.
Example. Evaluate lim_{x→0} (x - sin x)/x³.
◄ The equivalence relations given in Chap. 7 are of no value here since the principal term of the limit depends on x³. However, using the equivalence relation for sin x given in the above list, we easily obtain

lim_{x→0} (x - sin x)/x³ = lim_{x→0} (x - (x - x³/3! + o(x⁴)))/x³ = lim_{x→0} (x³/3! + o(x⁴))/x³ = 1/3! = 1/6. ►
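The same limit can be confirmed with a computer algebra system; a short sympy check (our illustration):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.limit((x - sp.sin(x)) / x**3, x, 0))   # 1/6
    print(sp.series(sp.sin(x), x, 0, 6))            # x - x**3/6 + x**5/120 + O(x**6)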

Taylor's theorem and investigating a function for extremum. Taylor's formula is helpful to find extrema of a function.
Theorem 8.21. Let a function f(x) have the nth derivative f⁽ⁿ⁾(x) in a neighbourhood of a point x₀. Let f⁽ⁿ⁾(x) be continuous at x₀ and let f'(x₀) = f''(x₀) = ... = f⁽ⁿ⁻¹⁾(x₀) = 0 and f⁽ⁿ⁾(x₀) ≠ 0. Then (i) if n is odd, f(x) has no extremum at x₀; (ii) if n is even and f⁽ⁿ⁾(x₀) < 0 then f(x) has a maximum at x₀; and (iii) if n is even and f⁽ⁿ⁾(x₀) > 0, f(x) has a minimum at x₀.
◄ Observe that f(x) has an extremum at x₀ if there exists δ > 0 such that the difference f(x) - f(x₀) retains its sign on the interval (x₀ - δ, x₀ + δ).

Applying Taylor's formula, we have

f(x) = f(x₀) + f'(x₀)/1! (x - x₀) + f''(x₀)/2! (x - x₀)² + ... + f⁽ⁿ⁻¹⁾(x₀)/(n - 1)! (x - x₀)ⁿ⁻¹ + f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ,

where 0 < θ < 1. Whence

f(x) - f(x₀) = f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ     (*)

since f'(x₀) = f''(x₀) = ... = f⁽ⁿ⁻¹⁾(x₀) = 0.
By the hypothesis f⁽ⁿ⁾(x) is continuous at x₀ and f⁽ⁿ⁾(x₀) ≠ 0. This implies (see Theorem 7.22) that there exists δ > 0 such that at every point of (x₀ - δ, x₀ + δ) f⁽ⁿ⁾(x) is of the same sign as f⁽ⁿ⁾(x₀).
The following cases have to be distinguished:
(a) n is even and f⁽ⁿ⁾(x₀) > 0. Then

f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ ≥ 0   for all x ∈ (x₀ - δ, x₀ + δ),

so that (*) gives

f(x) - f(x₀) ≥ 0   for all x ∈ (x₀ - δ, x₀ + δ).

By the definition of a minimum of a function we infer that f(x) has a minimum at x₀.
(b) n is even and f⁽ⁿ⁾(x₀) < 0. Then we have

f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ ≤ 0   for all x ∈ (x₀ - δ, x₀ + δ)

and f(x) - f(x₀) ≤ 0 for all x ∈ (x₀ - δ, x₀ + δ), so that f(x) has a maximum at x₀.
(c) n is odd and f⁽ⁿ⁾(x₀) ≠ 0. Then

f⁽ⁿ⁾(x₀ + θ(x - x₀))/n! (x - x₀)ⁿ  and  f⁽ⁿ⁾(x₀)

are of the same sign whenever x > x₀ and of opposite signs when x < x₀. Hence, for however small δ > 0 the difference f(x) - f(x₀) changes its sign on the interval (x₀ - δ, x₀ + δ). This means that f(x) has no extremum at x₀. ►
Example. Investigate the functions y = x⁴ and y = x³ for extremum.
◄ It is easy to see that the point x = 0 is a critical point for these functions. The first three derivatives of y = x⁴ are zeros at x = 0 and f⁽⁴⁾(0) = 24 > 0. Hence, n = 4 is even and f⁽⁴⁾(0) > 0, so that y = x⁴ has a minimum at x = 0.

The function y = x 3 has no extremum at x = 0 since its first two deriva-


tives are zeros at x = 0 and the third derivative is not equal to zero so that
n = 3 is odd. ►
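The test of Theorem 8.21 is easy to automate; here is a small sympy sketch (the helper name classify is ours) applied to the two functions of the example.

    import sympy as sp

    x = sp.symbols('x')

    def classify(f, x0):
        n = 1
        while sp.diff(f, x, n).subs(x, x0) == 0:   # find the first nonvanishing derivative
            n += 1
        d = sp.diff(f, x, n).subs(x, x0)
        if n % 2 == 1:
            return 'no extremum (n = %d is odd)' % n
        return ('minimum' if d > 0 else 'maximum') + ' at x0 (n = %d)' % n

    print(classify(x**4, 0))   # minimum at x0 (n = 4)
    print(classify(x**3, 0))   # no extremum (n = 3 is odd)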
Remark. Taylor's formula is helpful to prove the following theorem which specifies the sufficient conditions for a function to have a point of inflection at x₀.
Theorem 8.22. Let f(x) have the nth derivative f⁽ⁿ⁾(x) in a neighbourhood of x₀ and let f⁽ⁿ⁾(x) be continuous at x₀. Moreover, let f''(x₀) = f'''(x₀) = ... = f⁽ⁿ⁻¹⁾(x₀) = 0 and f⁽ⁿ⁾(x₀) ≠ 0. Then M₀(x₀, f(x₀)) is a point of inflection of the graph of y = f(x) if n is odd.
To verify this theorem it suffices to consider the function f(x) = x³.

8.14 Vector Function of a Scalar Argument


Definition. Suppose a point M moves along some trajectory L. We
can locate this point at any time t by indicating its position vector r, its
velocity vector v, its acceleration vector w, etc. Each of these vectors can
be regarded as a vector function of a scalar argument t, i.e., r = r(t),
v = v(t), w = w(t).
We say that a vector function of a scalar argument t is defined on the
interval (a, {3) and write a = a(t) if there is a certain law which assigns
a definite vector a to each t from (a, {3).
Let a be expanded relative to the unit vectors i, j, k of some system
of coordinates so that
a = xi + yj + zk.
If a = a(t) is a vector function of the argument t then the coordinates x, y, z of a = a(t) become scalar functions of the argument t, i.e.,

x = φ(t),  y = ψ(t),  z = γ(t),   α < t < β.

Conversely, if the coordinates x, y, z of the vector a are functions of the argument t, so is the vector a, i.e.,

a = φ(t)i + ψ(t)j + γ(t)k.

Therefore the vector function a(t) is fully determined by the scalar functions x = φ(t), y = ψ(t) and z = γ(t), and vice versa.
The length and direction of the vector a(t) can in general be different
for different t; also, a(t) can be applied at different points for different
t, for example as the velocity vector of a moving particle.
Suppose now that the vector a(t) starts at a given point O in space.
For different t a(t) ends at different points which form some set of points
in this space. This set of points corresponding to all possible terminal points

of a(t) when t varies over its domain of definition is called the hodograph
of the vector function a(t). In general the hodograph of a(t) is a curve
which contains all points of the space where a(t) ends (Fig. 8.64). Observe
that the hodograph of the position vector r of a moving point coincides
with the trajectory L of this point.
The equation

r = r(t),   α < t < β,

or

r(t) = φ(t)i + ψ(t)j + γ(t)k

is called the vector equation of the curve L.

Fig. 8.64                         Fig. 8.65

The equations

x = φ(t),  y = ψ(t),  z = γ(t),   α < t < β,

are called the parametric equations of the curve L.
For example, the equations

x = R cos t,  y = R sin t,  z = ht,   0 ≤ t < 2π   (R, h are constant)

are the parametric equations which describe one turn of a screw line (or a circular helix) in space (Fig. 8.65).

Limits and continuity of vector functions. Let a = a(t) be defined in


a neighbourhood of a point t = t₀ except possibly at t = t₀. The constant vector A is called the limit of the vector function a(t) as t → t₀ if, given any ε > 0, there exists δ > 0 such that

|a(t) - A| < ε

whenever |t - t₀| < δ and t ≠ t₀.
In symbols one writes

lim_{t→t₀} a(t) = A.

The geometric interpretation of the notion of a limit of a vector function a(t) is that the length of the vector a(t) - A tends to zero as t → t₀,

Fig. 8.66                         Fig. 8.67

so that the vector a(t) tends to coincide with the vector A as t → t₀ (Fig. 8.66). Hence

(lim_{t→t₀} a(t) = A)  ⇔  (lim_{t→t₀} |a(t) - A| = 0).

Let a(t) = φ(t)i + ψ(t)j + γ(t)k and A = ai + bj + ck. Then

|a(t) - A| = √((φ(t) - a)² + (ψ(t) - b)² + (γ(t) - c)²).

Whence, if lim_{t→t₀} a(t) = A then

lim_{t→t₀} φ(t) = a,  lim_{t→t₀} ψ(t) = b,  lim_{t→t₀} γ(t) = c,

and vice versa.


Let a vector function a(t) be defined on the interval α < t < β and t₀ ∈ (α, β). We say that the vector function a(t) is continuous at t = t₀ if

lim_{t→t₀} a(t) = a(t₀).

Differentiation of vector functions. Let a = a(t) be a vector function defined on the interval α < t < β and let the curve L be the hodograph of a(t). Then a certain value t ∈ (α, β) corresponds to the point M on L. Suppose that Δt is an arbitrary increment in t such that t + Δt ∈ (α, β). Then the vector a(t + Δt) gives the point M₁ on L (Fig. 8.67).
Consider the increment Δa in the vector function a(t) corresponding to the increment Δt, so that

Δa = a(t + Δt) - a(t).

The ratio Δa/Δt is a vector collinear to the vector Δa.
Let the ratio Δa/Δt have a limit as Δt → 0. Then this limit is called the derivative of the vector function a(t) with respect to the scalar argument t at the point t, customarily denoted by da(t)/dt or by a'(t). Hence

da(t)/dt = lim_{Δt→0} Δa/Δt = lim_{Δt→0} (a(t + Δt) - a(t))/Δt.

The vector function a(t) having a derivative at a point t is called differentiable at t.
Now we shall investigate how the vector da/dt is directed. The point M₁ moves along the hodograph L towards the point M as Δt → 0, so that the secant MM₁ tends to the tangent to L at M. Hence, the derivative da/dt is a vector tangent to the hodograph of a(t) at M, and the direction of da/dt coincides with that of the motion of the terminal point of the vector a(t) caused by the increase of the parameter t.
The derivative da/dt can be expressed in terms of the coordinates of a(t). Let a(t) = φ(t)i + ψ(t)j + γ(t)k. Then

Δa(t) = a(t + Δt) - a(t) = iΔφ(t) + jΔψ(t) + kΔγ(t).

Dividing both sides by Δt ≠ 0, we obtain

Δa(t)/Δt = i Δφ(t)/Δt + j Δψ(t)/Δt + k Δγ(t)/Δt.

If the functions φ(t), ψ(t) and γ(t) have derivatives at a given point t then every summand on the right has a limit as Δt → 0, so that there exists a limit of the left-hand side of the above expression, i.e., there exists the derivative da(t)/dt. Evaluating the limit as Δt → 0, we get

da/dt = i dφ/dt + j dψ/dt + k dγ/dt.     (*)

Hence, relative to a stationary system of coordinates the derivative da/dt of the vector a(t) is given by the formula (*), so that in order to compute the derivative of the vector function a(t) we have to compute the derivatives of the coordinates of a(t).
Let r = r(t) be the position vector of a point moving in space. Then the velocity of this point at the moment t is determined by the derivative of r(t) at t:

dr(t)/dt = v(t).

Example. Find the derivative of the vector function

a(t) = iR cos t + jR sin t + kht   (R, h are constant).

◄ Applying (*), we get

da/dt = -iR sin t + jR cos t + kh. ►

Differentiation laws. (1) If c is a constant vector then dc/dt = 0.
(2) If a(t) and b(t) have derivatives at t then

d/dt (a(t) ± b(t)) = da(t)/dt ± db(t)/dt.

(3) If a vector function is multiplied by a constant scalar, its derivative is multiplied by this scalar, so that

d(αa(t))/dt = α da(t)/dt   (α is a constant).

(4) The derivative of the scalar product of vectors is given by

d/dt (a(t), b(t)) = (da/dt, b) + (a, db/dt).

Corollary. If e(t) is a unit vector, i.e., |e(t)| = 1, then de/dt is perpendicular to e.
◄ Indeed, if e is a unit vector then (e, e) = 1. On differentiating (e, e) = 1, we have

(de/dt, e) + (e, de/dt) = 0

or

2(de/dt, e) = 0.

Whence we infer that de/dt is perpendicular to e. ►
(5) The derivative of the vector product of vectors is given by

d/dt [a(t), b(t)] = [da/dt, b] + [a, db/dt].
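Both the componentwise differentiation rule (*) and the corollary on unit vectors are easy to check symbolically; the following brief sympy sketch (our own illustration) does so for the helix of the preceding example and for one particular unit vector.

    import sympy as sp

    t, R, h = sp.symbols('t R h', positive=True)
    a = sp.Matrix([R * sp.cos(t), R * sp.sin(t), h * t])   # a(t) = iR cos t + jR sin t + kht
    print(a.diff(t).T)                                     # (-R sin t, R cos t, h), as found above

    e = sp.Matrix([sp.cos(t), sp.sin(t), 0])               # a unit vector e(t)
    print(sp.simplify(e.dot(e.diff(t))))                   # 0: de/dt is perpendicular to e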


Exercises
Compute the derivatives of the following functions
1 }r;,
1. y = x2 - Sx + 1, 2. y = 2 -vx
C.
- - + v3.
X

3. y = (x 2 - 3x + 3)(x 3 - 1). 4. y = (Vx + 1) (- ~ - 1) . 5. y = 2 x .


VX X + 1

6. y = (x 3 + 1) (s --;) ,
X
y' (1) =? 7. y = 3
~.
✓ 1+7
8. y .
= sm x - cos x. g. y = 1 Sill X . lO . y = Sill
· 2 x.
+ COS X

11. y = ! tan 3 x - tan x + x. 12. y = sin Sx. 13. y = 2 sin(3x - 1).

14. y = sin 1/x. 15. y = sin 5 2x. 16. y = sin (sin x). 17. y =x sin - 1 x.

.
18• y = X Sill X tan - 1
X.
19• y . - 1 -2 • 20•
= Sill y = tan - 1 X 2 •
X

In x
21. y = In 2 x. 22. y = x 2 Iog3 x. 23. y = 2 • 24. y = In tan x.
1+X
x + 2-t
= -I1-
3
25. y • 26. y = 9x, 27. y = xex. 28. y =- -- .
nx x
29. y = x3 - 3x_ 30. y = 10 3x + 1 • 31. Y = 5sinx. 32. y = sin (3x}.

33. y = sinh 3 x. 34. y = ✓ cosh x. 35. y = tanh (In x). 36. y = 3sinh 2 x.

37. y = (sin xtosx. 38. y = Xsinx. 39. y = Xlnx.

40• Y _- (x - (x2)_ ~)3


x + 1 ✓ ✓
2 x 2 2 a2 2
5 · 41. Y =2 x - a - 2 In (x + x - a 2 ).

42. y = 21 In tan -X2 - -21 COS X


. 2 •
sin x
43. y = 3x 3 sin - 1 x + (x 2 + 2) ✓ 1 - x 2 •
44. y = x (sin - 1 x) 2 - 2x + 2 ✓ 1 - x 2 sin - 1 x.
Compute the first differentials of the functions

45. Y = ~ . 46. y = tan 2 X. 47. y = 5ln sin x.


4x
48. Compute the approximate value of tan - 1 1.02.
Compute the derivatives of higher orders for the functions
49. y = x 3 - 3x + 5, y" = ? 50. y = tan - 1 x, y" (1) = ?
51. y = a 3X, y "' = ? 52. y = x 5 In x, y"' = ?
Write down the nth derivatives of the following functions

53. y = x ex. 54. y = sin 2 x. 55. Y = x(l ~ x) •

Find the differentials of higher orders for the functions


56. y = e-x212, d 2y =? 57. y = xm, d 3y =?

Compute the derivatives : of the functions given parametrically

58 _ f x = 2t - 1, 59 _ f x = a cos 3 t, 60 _ f x ee 2_1_',
lv = t3. lv = b sin 3 t. lv
2
Compute the second derivatives d { of the functions given parame-
dx
trically

61 • f x = I~ t, 62 _ f x = et c~s t,
l_y = t . l_y = et sin t.
Draw the graphs of the following functions

63. y = X - X •
3 x2 - 1
64. y = - - . 65. y = - - . 66. y =
x2 + 1 x3 - 4

2
X X X
-x e2(x+ 1) X - 1
67. y = xe . 68. y = 2(x + 1) . 69. y = 2 In -X- + 1.

- 8 - x2 3 ,_____ 3 ...----
70. Y = ✓X 2 - 4
• 71. y = Vx(x - 1) 2 • 72. y = Vx(x - 2).

13. Y = Vex - 1) 2 - Vex - 2) 2 •


Compute the maximum and minimum for each of the following func-
tions defined on closed intervals
4
74. y =4- x - 2 for x E [l, 4].
X

15. y = V2x 2 (x - 6) for xE [-2, 4].

76• y -_ 1Ox + 10 ~
.1or x E [ - l, 21 •
2
X +2x+2
77. Expand the polynomial x 3 + 3x2 - 2x + 4 in powers of x + 1.
78. Expand the polynomial x 3 - 2x2 + 3x + 5 in powers of x - 2.
79. Use Taylor's formula to expand the function/(x) = 1/x in a neighbour-
hood of the point Xo = -1.
80. Use Maclaurin's formula to expand the function f(x) = x ex in a neigh-
bourhood of the point Xo = 0.
81. Use Maclaurin's formula to find the expansion of the function
f (x) = ex12 + 2 with the remainder o (xn).
82. The shape of a string hanging of its own weight alone is given by the
equation of the catenary y = a cosh x/a where a is a constant. Show that
given sufficiently small Ix I the string takes the shape of a parabola.
Use the derivatives of higher orders to investigate the shape for each
of the following functions in a neighbourhood of a given point xo.
83. y = sin 2 (x - 1) - x 2 + 2x, Xo = l.
84. y = cos x + cosh x, Xo = 0.
85. y = x 2 + 2 ln(x + 2), Xo = -1.
86. y = x 2 - 2ex - 1, Xo = 1.

Answers

1. y' = 2x - 5. 2. y' = -1 + _!_ . 3. y' = 5x4 - 12x3 + 9x 2 - 2x + 3.


l/x x2

4. Y' =- _l_ (1 +
2VX
!) .
X
S. Y' = 1 - x2 . 6. Y' (1)
(x2 + 1) 2
= 16. 7. Y' =- 2x
3(1 + x 2) 413
.

8. y' = cos x + sin x. 9. y' = 1 . 10. y' = sin 2x. 11. y' = tan4 x.
1 + cosx
1 1
12. y' = 5 c~s 5x. 13. y' = 6 cos (3x - 1). 14. y' = - - 2 cos-.
x X

X
15. y' = 10 sin 4 2x cos 2x. 16. y' = cos (sin x) cos x. 17. y' = sin - 1 x+ ---;:::====
✓ 1 - x2
. x sin x 2
18. y' = sm x tan - 1 x + x cos x tan - 1 x + - - . 19. y' = - --;:::::;===
1 + x2 Ix I ✓ x 2 - 4
20. y' = -~ . 21. y' = ~ In x. 22. y' = 2x log3 x + ~.
1 + x4 x In 3
1 + x2 - 2x In x ,
= -2- . 25. ,
= - -1- . 26. , _ nx
2
23. y' = -- --. 24. y y y - :, In 9.
x (1 + x 2)2 sin 2x x ln 2 x
, 2x(x In 2 - 1) + 2x 3 , 2 x
27. y' = ex(x + 1). 28. y = ----- - - - -- . 29. y = 3x - 3 In 3.
x2
30. y' = 3 · 103x + 1 In 10. 31. y' = 5sinx cos x In 5. 32. y' = 3x cos(3x) In 3.
• 2 h 34. y , sinh x 35. y , 1
33. y' = 3 smh x cos x. = ----;===. = --- --
2.Jcosh x x cosh 2 (In x)

36. y' = 3~inh 2 x sinh 2x In 3. 37. y' = (sin xtosx (c~~~~


smx
- sin x In (sin x)) .

38. y' .
= xsmx ( cos (In x) sin
+ -~- x) . 39. y' = 2x1 x -1 In x.
0

40. y' = 2(~ =--~~~~-/:~-~-~~--~ !) . 41. y' = ✓ x 2 - a2 • 42. y' =- 1- .


3(x - 5) 4 1/(x + 1) 2
I
sin 3 x

43. y' = 9x2 sin - i x. 44. y' = (sin - 1 x) 2 . 45. d y = - ✓ -- 2 tan x dx.
dx . 46. d•v
x5 cos 2 x
1
41. dy = 510 sin x cot x In 5 dx. 48. tan- 1 1.02 = 0.795. 49. y" = 6x. 50. y" (1) =
2
51. ym = 27a 3x In 3 a. 52. y'" = x 2(60 In X + 47). 53. y<n> = ex(x + n).

54. y<n> = 2n- 1 sin [2x + (n - 1) ~J .


2
55. y<n> = (- l)"·n! 1- -
[-
Xn + l (1 +
1
X)" + l
]

(H . mt. E xpress t h e function


· m ~ y = --
· t h e ,orm I -
x (l + x)
1+
=- -x-- -
x = -l - -1-) .
x (l + x) x 1 + x

56. d 2y = e -x 212 (x 2 - l)dx2 • 57. d 3y = m(m - l)(m - 2)xm- 3 dx 3 • 58. dy =! t 2•


dx 2
d b d d2 d2 2 -I
59. ~ tan t. 60. ~ -2e 31. 61. ~
-
= - - = = 9/ 3. 62. __!_ = ___e _ __
dx a dx dx 2 dy 2 (cos t - sin t) 3
63. Fig. 8.68. 64. Fig. 8.69. 65. Fig. 8.70. 66. Fig. 8.71. 67. Fig. 8.72. 68. Fig. 8.73.
69. Fig. 8.74. 70. Fig. 8.75. 71. Fig. 8.76. 72. Fig. 8.77. 73. Fig. 8.78. 74. M = 1, m = -1.
75. M = 0, m = -4. 76. M = 5, m = 0. 77. (x + 1) 3 - S(x + I) + 8.
78. (x - 2) 3 + 4(x - 2) 2 + 7(x - 2) + 11. 79. f(x) = -1 - (x + 1) - (x + 1) 2 - ••• -
(x + l}"
= x + -x + -x + ... +
2 3
(x + 1)"- 1 + ( - l)" - - - - - - - , 0 < 0< 1. 80. f(x)
[-1 + O(x + l)t+ 1 , l! 2!
xn-1 xn n 2
--- + - (Ox + n)eBx, 0 < (J < 1. 81. f(x) = I; _e_ xk + o(xn). 83. Fig. 8.79.
(n - 2)! n! k=O 2kk!
84. Fig. 8.80. 85. Fig. 8.81. 86. Fig. 8.82.

Fig. 8.68          Fig. 8.69          Fig. 8.70

Fig. 8.71          Fig. 8.72          Fig. 8.73

Fig. 8.74          Fig. 8.75

Fig. 8.76     Fig. 8.77     Fig. 8.78     Fig. 8.79     Fig. 8.80     Fig. 8.81     Fig. 8.82


Chapter 9
Integral Calculus.
The Indefinite Integral

9.1 Basic Concepts and Definitions


Antiderivative. Differential calculus provides methods of computing the derivative f'(x) of a given function f(x). Now we shall examine approaches and methods of solving the problem inverse to that of differentiating functions; namely, we shall study the problem of finding the function f(x) provided that its derivative f'(x) is given. The branch of mathematics that deals with methods and techniques of solving the latter problem is called integral calculus.
We say that the function F(x) is the antiderivative (or primitive) of the
function f(x) on the finite or infinite open interval (a, b) if F(x) is differen-
tiable at every point of (a, b) and F' (x) = f(x) or, equivalently,
dF(x) = f(x) dx for all x E (a, b).
Examples. (1) The function F(x) = sin⁻¹x is the antiderivative of the function f(x) = 1/√(1 - x²) on the interval (-1, 1) since for all x ∈ (-1, 1)

F'(x) = (sin⁻¹x)' = 1/√(1 - x²).

(2) The function F(x) = aˣ/ln a, 0 < a ≠ 1, is the antiderivative of the function f(x) = aˣ on the interval (-∞, +∞) since for all x ∈ (-∞, +∞)

F'(x) = (aˣ/ln a)' = aˣ ln a/ln a = aˣ.

If F(x) is an antiderivative of the function f(x) on the interval (a, b), so is the function Φ(x) = F(x) + C, where C is an arbitrary constant.
Indeed, Φ'(x) = [F(x) + C]' = F'(x) = f(x) for all x ∈ (a, b). Therefore if the function f(x) has an antiderivative on (a, b) then f(x) has infinitely many antiderivatives on this interval.
The following theorem establishes the relationship between any two distinct antiderivatives of a given function.

Theorem 9.1. If F(x) and Φ(x) are two arbitrary antiderivatives of a given function f(x) on the interval (a, b) then their difference is equal to some constant, so that Φ(x) - F(x) = C, x ∈ (a, b) (C is a constant).
◄ Since F(x) and Φ(x) are antiderivatives of f(x) on (a, b), then F'(x) = f(x) and Φ'(x) = f(x) for all x ∈ (a, b). On differentiating φ(x) = Φ(x) - F(x) we obtain φ'(x) = Φ'(x) - F'(x) = f(x) - f(x) = 0 for all x ∈ (a, b). Let x₀ and x be two distinct points in (a, b). Then applying the mean value theorem (Theorem 8.7) to the function φ(x) on the closed interval [x₀, x], we get

φ(x) - φ(x₀) = (x - x₀)φ'(ξ),

where x₀ < ξ < x.
Since φ'(x) = 0 on (a, b), then φ'(ξ) = 0 and, consequently, φ(x) = φ(x₀) for all x ∈ (a, b). Hence, the function φ(x) is constant on (a, b) and Φ(x) - F(x) = C, where C is a constant, for all x ∈ (a, b). ►
Corollary. If F(x) is any antiderivative of the function f(x) on the interval (a, b) then any other antiderivative Φ(x) of f(x) takes the form Φ(x) = F(x) + C, where C is a constant.
The indefinite integral. The set of all antiderivatives of a function f(x) on (a, b) is called the indefinite integral of f(x) on (a, b), written ∫f(x) dx.
The symbol ∫ is called the integral sign, the expression f(x) dx is called the element of integration, the function f(x) is called the integrand and the variable x is called the variable of integration.
If F(x) is any antiderivative of the function f(x) on the interval (a, b) then by virtue of the above corollary we have

∫f(x) dx = F(x) + C,

where C is an arbitrary constant.
Observe that any identity involving indefinite integrals in the left-hand and right-hand sides establishes the equality between sets of antiderivatives. This equality means that the sets contain the same elements, i.e., the same antiderivatives.
Sometimes the expression ∫f(x) dx is thought of as any element of the set of antiderivatives, i.e., any antiderivative of the function f(x).
We shall prove elsewhere the following theorem on the existence of the indefinite integral of a given function.
Theorem 9.2. Let f(x) be continuous on the interval (a, b). Then the function f(x) possesses an antiderivative and, consequently, an indefinite integral on (a, b).
The operation of computing the antiderivative or the indefinite integral of a function f(x) is called the integration of f(x). Clearly, this operation is the inverse of differentiation.

Properties of the indefinite integral. In what follows we assume that all functions involved are continuous on the interval (a, b); hence, we assume that the functions possess indefinite integrals on (a, b).
(1) The element of integration is equal to the differential of the indefinite integral, so that

d(∫f(x) dx) = f(x) dx.

◄ Indeed, since F'(x) = f(x) for all x ∈ (a, b), then

d(∫f(x) dx) = d[F(x) + C] = dF(x) = F'(x) dx = f(x) dx. ►

(2) The integrand is equal to the derivative of the indefinite integral:

(∫f(x) dx)' = f(x).

◄ This readily follows from property (1). ►
(3) The indefinite integral of the differential of some function differs from this function by an arbitrary constant:

∫dF(x) = F(x) + C.

◄ Indeed, if F'(x) = f(x) for all x ∈ (a, b), then

∫dF(x) = ∫F'(x) dx = ∫f(x) dx = F(x) + C. ►

(4) If the integrand is multiplied by a constant factor then the indefinite integral is multiplied by this factor:

∫Af(x) dx = A∫f(x) dx   (A = const, A ≠ 0).

◄ Indeed, by virtue of property (2) we have

(∫Af(x) dx)' = Af(x)  and  (A∫f(x) dx)' = A(∫f(x) dx)' = Af(x),

so that ∫Af(x) dx and A∫f(x) dx correspond to the same set of antiderivatives of the function Af(x). ►
(5) The indefinite integral of a sum (difference) of two functions is equal to the sum (difference) of the indefinite integrals of these functions:

∫[f(x) ± φ(x)] dx = ∫f(x) dx ± ∫φ(x) dx.

◄ By virtue of property (2) we have

(∫[f(x) ± φ(x)] dx)' = f(x) ± φ(x).

On the other hand we have

(∫f(x) dx ± ∫φ(x) dx)' = (∫f(x) dx)' ± (∫φ(x) dx)' = f(x) ± φ(x).

Therefore both ∫[f(x) ± φ(x)] dx and ∫f(x) dx ± ∫φ(x) dx are antiderivatives of the function f(x) ± φ(x). Hence, they differ from each other by some constant C. ►

Corollary.

∫[Σ_{k=1}^n A_k f_k(x)] dx = Σ_{k=1}^n A_k ∫f_k(x) dx,

where A_k = const (k = 1, 2, ..., n).
The expression of the form

Σ_{k=1}^n A_k f_k(x) = A₁f₁(x) + A₂f₂(x) + ... + A_n f_n(x),

where all A_k are some constants, is called a linear combination of the functions f_k(x), k = 1, 2, ..., n. Therefore we can formulate this corollary in the following way: the indefinite integral of a linear combination of any finite number of functions is equal to the linear combination of the indefinite integrals of these functions.
Properties (4) and (5) are sometimes called the linear properties of an indefinite integral.
Some indefinite integrals involving elementary functions. Every formula for the derivative of a given function, i.e., a formula of the form F'(x) = f(x), can be represented in terms of the function f(x) and its antiderivative F(x) as ∫f(x) dx = F(x) + C, where C is an arbitrary constant. This enables us to derive the integration formulas directly implied by the differentiation formulas for basic elementary functions. These are:

∫aˣ dx = aˣ/ln a + C,   0 < a ≠ 1.
(In particular, putting a = e gives ∫eˣ dx = eˣ + C.)

∫sin x dx = -cos x + C.

∫cos x dx = sin x + C.

∫dx/sin²x = -cot x + C,   x ≠ nπ (n = 0, ±1, ±2, ...).

∫dx/cos²x = tan x + C,   x ≠ π/2 + nπ (n = 0, ±1, ±2, ...).

∫dx/√(1 - x²) = sin⁻¹x + C,   |x| < 1.

∫dx/√(a² - x²) = sin⁻¹(x/a) + C,   |x| < a.

∫dx/√(x² ± a²) = ln|x + √(x² ± a²)| + C   (the "minus" sign requires the condition |x| > |a| be fulfilled).

∫dx/(1 + x²) = tan⁻¹x + C.

∫dx/(a² + x²) = (1/a)tan⁻¹(x/a) + C,   a ≠ 0.

∫sinh x dx = cosh x + C.

∫cosh x dx = sinh x + C.

∫dx/cosh²x = tanh x + C.

∫dx/sinh²x = -coth x + C,   x ≠ 0.

These formulas are easily verified by differentiating their right-hand


sides whose derivatives are equal to the respective integrands.
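The verification just mentioned can also be done mechanically; a short sympy sketch (our illustration) checks two of the table entries by differentiation.

    import sympy as sp

    x, a = sp.symbols('x a', positive=True)
    print(sp.simplify(sp.diff(sp.tan(x), x) - 1 / sp.cos(x)**2))            # 0
    print(sp.simplify(sp.diff(sp.atan(x / a) / a, x) - 1 / (a**2 + x**2)))  # 0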
The distinction between the operations of differentiation and integra-
tion is that on differentiating elementary functions we always obtain
elementary functions while subjecting some elementary functions to in-
tegration can sometimes lead to nonelementary functions, i.e., there exist
elementary functions which are nonintegrable in elementary functions. For
example, it has been established that the following integrals cannot be
represented in terms of elementary functions although, due to continuity
of integrands, these indefinite integrals exist in the domains of definition:

∫e^{-x²} dx,    ∫(sin x/x) dx, x ≠ 0,

∫sin x² dx,     ∫(cos x/x) dx, x ≠ 0,

∫cos x² dx,     ∫dx/ln x, 0 < x ≠ 1.

Some indefinite integrals can be reduced, by suitable manipulations on their integrands, to integrals easily computed by applying the integration formulas that we have derived for basic elementary functions. To illustrate this method of integration we turn our attention to the examples of computing each of the following indefinite integrals.

Examples. (1) ∫(√x³ - 1/(x√x))² dx.
◄ Manipulating the integrand, we get

∫(√x³ - 1/(x√x))² dx = ∫(x³ - 2 + x⁻³) dx = x⁴/4 - 2x - 1/(2x²) + C. ►

(2) ∫((x + 1)²/(x³ + x)) dx.
◄ We have

∫((x + 1)²/(x³ + x)) dx = ∫((x² + 2x + 1)/(x(x² + 1))) dx = ∫(1/x + 2/(1 + x²)) dx
= ∫dx/x + 2∫dx/(1 + x²) = ln|x| + 2 tan⁻¹x + C. ►

(3) ∫((2·3ˣ + 3·2ˣ)/5ˣ) dx.
◄ We have

∫((2·3ˣ + 3·2ˣ)/5ˣ) dx = 2∫(3/5)ˣ dx + 3∫(2/5)ˣ dx = 2(3/5)ˣ/ln(3/5) + 3(2/5)ˣ/ln(2/5) + C. ►
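A quick symbolic check of example (2), assuming the integrand as reconstructed above (a sketch): differentiating the computed antiderivative must give back the integrand.

    import sympy as sp

    x = sp.symbols('x', positive=True)
    integrand = (x + 1)**2 / (x**3 + x)
    F = sp.log(x) + 2 * sp.atan(x)                  # the antiderivative found above (for x > 0)
    print(sp.simplify(sp.diff(F, x) - integrand))   # 0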

9.2 Methods of Integration


Integration by substitution. We wish to compute the indefinite integral ∫f(x) dx of a continuous function f(x). Suppose that there exists a function x = φ(t) having a continuous derivative φ'(t) and the inverse function t = ψ(x). Then we may write

∫f(x) dx = ∫f[φ(t)]φ'(t) dt.

Clearly, substituting t = ψ(x) into the right-hand side of the above identity, we shall get the integral in terms of the original variable again.

◄ To prove the above identity we differentiate the left- and right-hand integrals with respect to x. On differentiating the left-hand integral, we get

(∫f(x) dx)'ₓ = f(x).

Observe that t = ψ(x) is the inverse function of x = φ(t), so that

t'ₓ = 1/x'ₜ = 1/φ'(t),

where φ'(t) ≠ 0.
Then applying to the right-hand integral the chain rule of differentiating a composite function, we obtain

(∫f[φ(t)]φ'(t) dt)'ₓ = (∫f[φ(t)]φ'(t) dt)'ₜ · t'ₓ = f[φ(t)]φ'(t) · 1/φ'(t) = f[φ(t)] = f(x).

Since the left- and right-hand integrals have the same derivative, these integrals define the same set of antiderivatives of the function f(x). Whence it follows that the desired identity is true. ►
The identity being considered lays down the method of integration called integration by substitution. When using this method we have to choose a suitable function φ(t) to simplify the computation of the original integral.
Examples. Compute the following integrals.

(1) ∫dx/√(x² + a²)   (a > 0).

◄ Putting x = a sinh t, we get dx = a cosh t dt. Then

∫dx/√(x² + a²) = ∫a cosh t dt/√(a²(sinh²t + 1)) = ∫a cosh t dt/(a cosh t) = ∫dt = t + C.

To express the result in terms of x we solve the equation x = a sinh t to get t as a function of x. Since sinh t = (eᵗ - e⁻ᵗ)/2, we have a(eᵗ - e⁻ᵗ)/2 = x, or ae²ᵗ - 2xeᵗ - a = 0. Whence we get

eᵗ = (x ± √(x² + a²))/a.

Since eᵗ > 0 we have eᵗ = (x + √(x² + a²))/a and t = ln(x + √(x² + a²)) - ln a. Finally, we obtain

∫dx/√(x² + a²) = ln(x + √(x² + a²)) + C₁,

where C₁ = C - ln a. ►
(2) ∫dx/((x + 2)√(x + 1)).
◄ Put x = t² - 1, so that dx = 2t dt and t = √(x + 1). Then

∫dx/((x + 2)√(x + 1)) = ∫2t dt/((t² + 1)t) = 2∫dt/(t² + 1) = 2 tan⁻¹t + C.

Substituting t = √(x + 1) into the right-hand side, we get

∫dx/((x + 2)√(x + 1)) = 2 tan⁻¹√(x + 1) + C. ►

Remark. Suppose that in the integral ∫f(x) dx the element of integration f(x) dx admits a representation of the form

f(x) dx = g[ψ(x)]ψ'(x) dx,

so that

f(x) dx = g[ψ(x)] d[ψ(x)].

Assume that the function g(t) is easily integrable, i.e., the integral

∫g(t) dt = F(t) + C

is easily computed. Then substituting t = ψ(x) into the latter integral, we obtain

∫f(x) dx = F[ψ(x)] + C.
(3) ∫((2ˣ - 2⁻ˣ)/(2ˣ + 2⁻ˣ)) dx.
◄ Put t = 2ˣ + 2⁻ˣ (t > 0), so that dt = (2ˣ ln 2 - 2⁻ˣ ln 2) dx and (2ˣ - 2⁻ˣ) dx = dt/ln 2. Then

∫((2ˣ - 2⁻ˣ)/(2ˣ + 2⁻ˣ)) dx = (1/ln 2)∫dt/t = (1/ln 2) ln t + C = (1/ln 2) ln(2ˣ + 2⁻ˣ) + C. ►
(4) ∫e²ˣ dx/√(eˣ + 1).
◄ Putting √(eˣ + 1) = t, we obtain eˣ dx/√(eˣ + 1) = 2 dt and eˣ = t² - 1. Then

∫e²ˣ dx/√(eˣ + 1) = ∫eˣ · (eˣ dx/√(eˣ + 1)) = 2∫(t² - 1) dt
= 2(t³/3 - t) + C = (2/3)(eˣ + 1)^{3/2} - 2√(eˣ + 1) + C = (2/3)(eˣ - 2)√(eˣ + 1) + C. ►
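For substitutions like this one, a crude numerical check is often enough. The sketch below compares a central difference quotient of the result of example (4) with the integrand at a few points (pure Python, no extra libraries).

    import math

    def F(x):                       # antiderivative obtained above
        return 2.0 / 3.0 * (math.exp(x) - 2.0) * math.sqrt(math.exp(x) + 1.0)

    def f(x):                       # original integrand e**(2x) / sqrt(e**x + 1)
        return math.exp(2 * x) / math.sqrt(math.exp(x) + 1.0)

    h = 1e-6
    for x in (0.0, 0.5, 1.0):
        print((F(x + h) - F(x - h)) / (2 * h) - f(x))   # close to 0 up to rounding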
Integration by parts. Let the functions u = u(x) and v = v(x) have continuous derivatives u'(x) and v'(x). Then applying the product rule of differentiation, we have

[u(x)v(x)]' = u(x)v'(x) + v(x)u'(x),

so that the product u(x)v(x) of the given functions is an antiderivative of the sum u(x)v'(x) + v(x)u'(x). Hence

∫[u(x)v'(x) + v(x)u'(x)] dx = u(x)v(x) + C.

By virtue of the linear property of indefinite integrals this expression can be written as

∫u(x)v'(x) dx = u(x)v(x) - ∫v(x)u'(x) dx + C

or

∫u dv = uv - ∫v du + C,

since v'(x) dx = dv and u'(x) dx = du by the definition of the differential.
Noting that the constant C can be combined with either of the integrals involved, we may write the above identity as

∫u dv = uv - ∫v du.

This identity lays down the method of integration called integration by parts. The idea is to replace the original integral ∫u dv by the integral ∫v du, which is easily computed in some specific instances. The method involves splitting the original element of integration into two factors u and dv = v' dx and subsequent differentiation of the former and integration of the latter. Clearly, integration by parts is helpful and constructive when this splitting provides easily differentiable u and easily integrable dv.
To illustrate integration by parts we shall work through the integral ∫(2 - 3x) cos x dx.
◄ Here u dv = (2 - 3x) cos x dx. Put u = 2 - 3x and dv = cos x dx, so that du = -3 dx and v = ∫cos x dx = sin x. Then

∫u dv = uv - ∫v du = ∫(2 - 3x) cos x dx
= (2 - 3x) sin x + 3∫sin x dx = (2 - 3x) sin x - 3 cos x + C. ►


Notice that if we put u = cos x and dv = (2 - 3x) dx, or u = (2 - 3x) cos x and dv = dx, the integral ∫v du becomes more complicated than the original one.
On integrating dv to get the function v we may choose any suitable value of the constant of integration C, since C is not involved in the final result (to verify this it suffices to substitute v + C for v in the identity describing integration by parts). In particular, it is convenient to put C = 0.
Examples. (1) We wish to compute the integral ∫ln x dx.
◄ Noting that u dv = ln x dx, we have the unique factorization u = ln x and dv = dx. Then du = dx/x and v = ∫dx = x; whence we obtain

∫ln x dx = x ln x - ∫dx = x ln x - x + C = x(ln x - 1) + C. ►

(2) Integrate by parts ∫√(a² - x²) dx   (|x| < |a|).
◄ Put u = √(a² - x²) and dv = dx, so that du = -x dx/√(a² - x²) and v = x. Integrating by parts, we get

∫√(a² - x²) dx = x√(a² - x²) + ∫x² dx/√(a² - x²) + 2C.

On manipulating the integrand of the right-hand integral, we obtain

∫√(a² - x²) dx = x√(a² - x²) + ∫((a² - (a² - x²))/√(a² - x²)) dx + 2C
= x√(a² - x²) + a²∫dx/√(a² - x²) - ∫√(a² - x²) dx + 2C
= x√(a² - x²) + a² sin⁻¹(x/a) - ∫√(a² - x²) dx + 2C.

We have obtained an equation in the one unknown ∫√(a² - x²) dx. Solving this equation, we get

∫√(a² - x²) dx = (1/2)x√(a² - x²) + (a²/2) sin⁻¹(x/a) + C. ►

Problem. Verify the following formulas:

(a) ∫√(x² + a²) dx = (1/2)x√(x² + a²) + (a²/2) ln(x + √(x² + a²)) + C;

(b) ∫√(x² - a²) dx = (1/2)x√(x² - a²) - (a²/2) ln|x + √(x² - a²)| + C,   where |x| > |a|.

Remark. Sometimes we can use integration by parts to compute the right-hand integral ∫v du as well. To illustrate this approach we shall work through the following examples of integrating by parts.
Examples. (1) We wish to compute the integral ∫x²2ˣ dx.
◄ Put u = x² and dv = 2ˣ dx, so that du = 2x dx and v = 2ˣ/ln 2.
Integration by parts gives

∫x²2ˣ dx = x² 2ˣ/ln 2 - (2/ln 2)∫x 2ˣ dx.

The right-hand integral can also be integrated by parts. If we put u = x and dv = 2ˣ dx then du = dx and v = 2ˣ/ln 2, so that

∫x²2ˣ dx = x² 2ˣ/ln 2 - (2/ln 2)(x 2ˣ/ln 2 - (1/ln 2)∫2ˣ dx)
= x² 2ˣ/ln 2 - (2/ln 2)(x 2ˣ/ln 2 - 2ˣ/ln²2) + C
= (x² - 2x/ln 2 + 2/ln²2) 2ˣ/ln 2 + C. ►
(2) Compute the integral ∫e^{αx} cos βx dx   (α ≠ 0, β ≠ 0).
◄ Applying integration by parts, we put u = e^{αx} and dv = cos βx dx (or u = cos βx and dv = e^{αx} dx), so that

du = αe^{αx} dx  and  v = ∫cos βx dx = (sin βx)/β.

Then

∫e^{αx} cos βx dx = (1/β)e^{αx} sin βx - (α/β)∫e^{αx} sin βx dx.

Integrating the latter integral by parts gives: u = e^{αx} and dv = sin βx dx, so that du = αe^{αx} dx and v = -(cos βx)/β, and

∫e^{αx} sin βx dx = -(1/β)e^{αx} cos βx + (α/β)∫e^{αx} cos βx dx.

Substituting this identity into the preceding one, we obtain

∫e^{αx} cos βx dx = (1/β)e^{αx} sin βx + (α/β²)e^{αx} cos βx - (α²/β²)∫e^{αx} cos βx dx.

Thus applying integration by parts twice we have arrived at an equation in one unknown, whence we have

(1 + α²/β²)∫e^{αx} cos βx dx = (e^{αx}/β²)(α cos βx + β sin βx)

and

∫e^{αx} cos βx dx = (e^{αx}/(α² + β²))(α cos βx + β sin βx) + C.

Analogously, we can compute the integral

∫e^{αx} sin βx dx = (e^{αx}/(α² + β²))(α sin βx - β cos βx) + C. ►

Integration by parts is helpful to compute a number of integrals given below.
Integrals of the form ∫Pₙ(x) ln x dx, where Pₙ(x) = Σ_{k=0}^n c_k x^k is an nth degree polynomial. If we put u = ln x and dv = Pₙ(x) dx then du = dx/x and v = ∫Pₙ(x) dx = Q_{n+1}(x), where Q_{n+1}(x) is a polynomial of degree (n + 1), and integration by parts gives

∫Pₙ(x) ln x dx = Q_{n+1}(x) ln x - ∫(Q_{n+1}(x)/x) dx = Q_{n+1}(x) ln x - H_{n+1}(x) + C,

where H_{n+1}(x) is a polynomial of degree (n + 1).
Example. Compute the integral ∫(4x³ + 2x) ln x dx.
◄ Put u = ln x and dv = (4x³ + 2x) dx, so that du = dx/x and v = ∫(4x³ + 2x) dx = x⁴ + x². Then integration by parts gives

∫(4x³ + 2x) ln x dx = (x⁴ + x²) ln x - ∫(x³ + x) dx = (x⁴ + x²) ln x - x⁴/4 - x²/2 + C. ►
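The general pattern ∫Pₙ(x) ln x dx = Q_{n+1}(x) ln x - H_{n+1}(x) + C is easy to implement on coefficient lists; the small sketch below (the function name and the convention c[k] ↔ coefficient of x^k are ours) reproduces the example.

    def poly_times_log(c):
        # Q_{n+1}: antiderivative of P_n (constant term zero)
        Q = [0.0] + [ck / (k + 1) for k, ck in enumerate(c)]
        # H_{n+1}: antiderivative of Q_{n+1}(x)/x, again a polynomial
        H = [0.0] + [Q[k] / k for k in range(1, len(Q))]
        return Q, H          # the integral is Q(x)*ln(x) - H(x) + C

    Q, H = poly_times_log([0, 2, 0, 4])      # P(x) = 4x**3 + 2x
    print(Q)   # [0.0, 0.0, 1.0, 0.0, 1.0]   -> Q(x) = x**4 + x**2
    print(H)   # [0.0, 0.0, 0.5, 0.0, 0.25]  -> H(x) = x**4/4 + x**2/2, as in the example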

Integrals of the form ∫Pₙ(x) tan⁻¹αx dx and ∫Pₙ(x) cot⁻¹αx dx, where α is a real number. These integrals are reduced to those of rational functions. For example, if we put u = tan⁻¹αx and dv = Pₙ(x) dx in the former integral then du = α dx/(1 + α²x²) and v = Q_{n+1}(x), and integration by parts gives

∫Pₙ(x) tan⁻¹αx dx = Q_{n+1}(x) tan⁻¹αx - α∫(Q_{n+1}(x)/(1 + α²x²)) dx.

Analogously, we work through the latter integral.

Example. Compute the integral ∫(3x² + 1) tan⁻¹x dx.
◄ Put u = tan⁻¹x and dv = (3x² + 1) dx, so that du = dx/(1 + x²) and v = x³ + x. Then integration by parts gives

∫(3x² + 1) tan⁻¹x dx = (x³ + x) tan⁻¹x - ∫((x³ + x)/(1 + x²)) dx
= (x³ + x) tan⁻¹x - ∫x dx = (x³ + x) tan⁻¹x - x²/2 + C. ►
Integrals of the form ∫Pₙ(x) sin⁻¹αx dx and ∫Pₙ(x) cos⁻¹αx dx, where α is a real number. To integrate these by parts we put u = sin⁻¹αx (u = cos⁻¹αx in the latter integral) and dv = Pₙ(x) dx. Then

du = α dx/√(1 - α²x²)  (with a minus sign in the cos⁻¹ case),   v = Q_{n+1}(x),

and

∫Pₙ(x) sin⁻¹αx dx = Q_{n+1}(x) sin⁻¹αx - α∫(Q_{n+1}(x)/√(1 - α²x²)) dx.

The integral in the right-hand side can be computed by using different methods which will be discussed in detail in Sec. 9.4.
Example. Integrate by parts ∫x² sin⁻¹x dx.
◄ Put u = sin⁻¹x and dv = x² dx. Then du = dx/√(1 - x²) and v = x³/3. Integrating by parts, we get

∫x² sin⁻¹x dx = (x³/3) sin⁻¹x - (1/3)∫(x³/√(1 - x²)) dx.

By the substitution t = √(1 - x²), so that x² = 1 - t² and x dx = -t dt, the integral on the right is reduced to

∫(x³/√(1 - x²)) dx = ∫(x²/√(1 - x²)) x dx = -∫((1 - t²)/t) t dt
= ∫(t² - 1) dt = t³/3 - t + C = -(1/3)(x² + 2)√(1 - x²) + C.

Finally we have

∫x² sin⁻¹x dx = (x³/3) sin⁻¹x + (1/9)(x² + 2)√(1 - x²) + C. ►

Integrals of the form ∫Pₙ(x)e^{λx} dx, where λ is a real number. Putting u = Pₙ(x) and dv = e^{λx} dx we obtain du = Pₙ'(x) dx and v = (1/λ)e^{λx} (λ ≠ 0), so that

∫Pₙ(x)e^{λx} dx = (1/λ)Pₙ(x)e^{λx} - (1/λ)∫Pₙ'(x)e^{λx} dx,

where Pₙ'(x) is a polynomial of degree (n - 1).
Then applying integration by parts to the right-hand integral we again obtain an expression which can also be subjected to integration by parts. Continuing this process n times, we finally arrive at the integral

∫e^{λx} dx = (1/λ)e^{λx} + C.

Example. Compute the integral ∫(x² + 2x)eˣ dx.
◄ Put u = x² + 2x and dv = eˣ dx, so that du = 2(x + 1) dx and v = eˣ. Then integration by parts gives

∫(x² + 2x)eˣ dx = (x² + 2x)eˣ - 2∫(x + 1)eˣ dx.

To integrate the right-hand integral by parts we put u = x + 1 and dv = eˣ dx. Then du = dx and v = eˣ, so that

∫(x + 1)eˣ dx = (x + 1)eˣ - ∫eˣ dx = (x + 1)eˣ - eˣ + C = xeˣ + C.

Finally, we have

∫(x² + 2x)eˣ dx = (x² + 2x)eˣ - 2xeˣ + C = x²eˣ + C. ►
Integrals of this form can also be computed by the method of comparing (unknown) coefficients. In this case we assume that the integral is equal to the product of the nth degree polynomial

Qₙ(x) = b₀ + b₁x + ... + bₙxⁿ

with unknown coefficients b₀, b₁, ..., bₙ by the function e^{λx}, i.e., we assume that

∫Pₙ(x)e^{λx} dx = Qₙ(x)e^{λx}.

On differentiating this identity we get

Pₙ(x)e^{λx} = Qₙ'(x)e^{λx} + λQₙ(x)e^{λx},

whence, dividing by e^{λx} ≠ 0, we get

Pₙ(x) = Qₙ'(x) + λQₙ(x).

Since the polynomial on the left is to be identically equal to the sum of polynomials on the right, the coefficients of each power of x must be equal; so comparing coefficients, we obtain a system of (n + 1) linear equations in the (n + 1) unknowns b_k. This system has a unique solution, for the respective determinant is distinct from zero. To demonstrate how this method works we shall compute ∫(x² + 2x)eˣ dx.

◄ Put

∫(x² + 2x)eˣ dx = (b₀ + b₁x + b₂x²)eˣ,

where b₀, b₁, b₂ are unknown coefficients.
On differentiating this identity, we get

(x² + 2x)eˣ = (b₁ + 2b₂x)eˣ + (b₀ + b₁x + b₂x²)eˣ.

Whence, division by eˣ ≠ 0 gives

2x + x² = b₀ + b₁x + b₂x² + b₁ + 2b₂x

or

2x + x² = (b₀ + b₁) + (b₁ + 2b₂)x + b₂x².

Comparing the coefficients of equal powers of x, we arrive at the system of linear equations

x⁰:  b₀ + b₁ = 0,
x¹:  b₁ + 2b₂ = 2,
x²:  b₂ = 1,

whose solution is b₀ = 0, b₁ = 0 and b₂ = 1.
Hence the original integral becomes

∫(x² + 2x)eˣ dx = x²eˣ + C. ►
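The comparing-coefficients step is just a small linear system; here it is handed to sympy's solver (a sketch reproducing the system of the example).

    import sympy as sp

    b0, b1, b2 = sp.symbols('b0 b1 b2')
    system = [sp.Eq(b0 + b1, 0),        # coefficient of x**0
              sp.Eq(b1 + 2*b2, 2),      # coefficient of x**1
              sp.Eq(b2, 1)]             # coefficient of x**2
    print(sp.solve(system, [b0, b1, b2]))   # {b0: 0, b1: 0, b2: 1}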
Integrals of the form ∫Pₙ(x) sin βx dx and ∫Pₙ(x) cos βx dx, where β is a real number, β ≠ 0.
◄ Putting u = Pₙ(x) and dv = sin βx dx (dv = cos βx dx in the latter integral), we have du = Pₙ'(x) dx and v = -(cos βx)/β (respectively v = (sin βx)/β). Then

∫Pₙ(x) sin βx dx = -Pₙ(x)(cos βx)/β + (1/β)∫Pₙ'(x) cos βx dx.

Applying integration by parts n times, we finally arrive at the integrals

∫sin βx dx = -(cos βx)/β + C  and  ∫cos βx dx = (sin βx)/β + C. ►

Example. Compute the integral ∫(x² - 1) cos x dx.
◄ Put u = x² - 1 and dv = cos x dx, so that du = 2x dx and v = sin x. Then integration by parts gives

∫(x² - 1) cos x dx = (x² - 1) sin x - 2∫x sin x dx.

Now we put u = x and dv = sin x dx, so that du = dx and v = -cos x. Then we have

∫(x² - 1) cos x dx = (x² - 1) sin x + 2x cos x - 2 sin x + C = (x² - 3) sin x + 2x cos x + C. ►

Integration by parts can also be helpful in dealing with some other integrals apart from those considered above. By way of illustration we shall compute the integral ∫x dx/sin²x.
◄ Putting u = x and dv = dx/sin²x, we obtain du = dx and v = ∫dx/sin²x = -cot x, so that

∫x dx/sin²x = -x cot x + ∫cot x dx.

By the substitution t = sin x and dt = cos x dx the integral on the right is reduced to

∫cot x dx = ∫cos x dx/sin x = ∫dt/t = ln|t| + C = ln|sin x| + C.

Finally, we have

∫x dx/sin²x = -x cot x + ln|sin x| + C. ►

9.3 Integrating Rational Functions


Rational functions. The simplest rational function is the nth degree polynomial

Qₙ(x) = a₀xⁿ + a₁xⁿ⁻¹ + ... + aₙ₋₁x + aₙ,

where the coefficients a₀, a₁, ..., aₙ are real numbers and a₀ ≠ 0.
The polynomial Qₙ(x) is said to be monic if a₀ = 1.
The real number b is called a root of the polynomial Qₙ(x) if Qₙ(b) = 0.
It is known from algebra that any real polynomial Qₙ(x) can be uniquely factorized into a product of monic linear polynomials x - b and monic quadratic polynomials x² + px + q, where p and q are real coefficients and every monic quadratic polynomial is irreducible to a product of linear polynomials, for it has no real roots. Then gathering equal factors, if any, we can write the monic polynomial Qₙ(x) in the form

Qₙ(x) = (x - a)^α (x - b)^β ... (x - l)^λ (x² + p₁x + q₁)^{μ₁} ... (x² + pₛx + qₛ)^{μₛ},

where the exponents α, β, ..., λ, μ₁, ..., μₛ are natural numbers.
It is easy to notice that for the nth degree monic polynomial Qₙ(x) the following condition is satisfied:

α + β + ... + λ + 2(μ₁ + μ₂ + ... + μₛ) = n.

When α = 1 the root a is said to be unrepeated, or simple; when α ≥ 2 we say that a is a repeated root of Qₙ and call the number α the multiplicity of a.
A real rational function f(x) is the ratio Pₘ(x)/Qₙ(x) of two real polynomials Pₘ(x) and Qₙ(x) which have no factor in common.
The rational function f(x) = Pₘ(x)/Qₙ(x) is called a proper rational function if the numerator Pₘ(x) has a degree lower than that of the denominator Qₙ(x), i.e., if m < n. If m ≥ n we can apply the division algorithm to express f(x) as

f(x) = Pₘ(x)/Qₙ(x) = R_{m-n}(x) + P(x)/Qₙ(x),

where R_{m-n}(x) and P(x) are real polynomials and P(x)/Qₙ(x) is a proper rational function. For example, if we apply the division algorithm to the rational function (x⁵ + 1)/(x² + 1) = P₅(x)/Q₂(x), we obtain

x⁵ + 1 = (x² + 1)(x³ - x) + (x + 1).

Hence, (x⁵ + 1)/(x² + 1) = x³ - x + (x + 1)/(x² + 1), so that R₃(x) = x³ - x, P(x) = x + 1 and (x + 1)/(x² + 1) is a proper rational function.
Splitting a rational function into partial fractions. Partial fractions are proper rational functions of the form

A/(x - a),   A/(x - a)^k,   (Mx + N)/(x² + px + q)   and   (Mx + N)/(x² + px + q)^k,

where A, M, N, a, p, and q are real numbers, k (k ≥ 2) is a natural number, and the quadratic trinomial x² + px + q has no real root, so that the condition p²/4 - q < 0, or q - p²/4 > 0, holds.
We turn once again to algebra and quote the following important fact.
Theorem 9.3. Let f(x) = Pₘ(x)/Qₙ(x) (m < n) be a proper real rational function and let Qₙ(x) = (x - a)^α (x - b)^β ... (x² + pₛx + qₛ)^{μₛ}. Then f(x)

is uniquely reducible to a sum of partial fractions of the form

Pₘ(x)/Qₙ(x) = A₁/(x - a) + A₂/(x - a)² + ... + A_α/(x - a)^α
    + B₁/(x - b) + B₂/(x - b)² + ... + B_β/(x - b)^β + ...
    + (M₁x + N₁)/(x² + pₛx + qₛ) + (M₂x + N₂)/(x² + pₛx + qₛ)² + ... + (M_{μs}x + N_{μs})/(x² + pₛx + qₛ)^{μₛ},

where A₁, A₂, ..., A_α, B₁, B₂, ..., B_β, ..., M₁, N₁, M₂, N₂, ..., M_{μs}, N_{μs} are real numbers not all equal to zero.
To define the coefficients in the numerators of the partial fractions we
multiply the left- and right-hand sides of the preceding expression by Qn(x)
and apply the method of comparing (unknown) coefficients, i.e., we com-
pare the coefficients of equal powers of x on the left and on the right thus
obtaining a system of linear equations in the desired unknowns. The solu-
tion of this system uniquely defines the coefficients we are seeking for.
Sometimes it is convenient to use another method of computing coeffi-
cients of partial fractions. If two polynomials are identical the identity
holds for any value of x; then multiplying both sides of the expansion given
in Theorem 9.3 by Qn(X) and substituting some specific values for x into
the identity thus obtained it is possible to get simple equations for the
unknown coefficients. The method is often useful when Qn(X) has only
simple real roots, substituting the values of the roots for x we arrive at
simple equations for the unknown coefficients.
To illustrate these methods we use some specific examples.
Examples. (1) Split the proper rational function (3x² - 6x + 2)/(x³ - 3x² + 2x) into partial fractions.
◄ On factoring the denominator, we get

x³ - 3x² + 2x = x(x² - 3x + 2) = x(x - 1)(x - 2).

Clearly, the denominator has simple real roots. Then we can write

(3x² - 6x + 2)/(x³ - 3x² + 2x) = A/x + B/(x - 1) + C/(x - 2).

Multiplying both sides by the denominator, we obtain the identity

3x² - 6x + 2 = A(x - 1)(x - 2) + Bx(x - 2) + Cx(x - 1)     (*)

or

3x² - 6x + 2 = (A + B + C)x² + (-3A - 2B - C)x + 2A.

Applying the method of comparing coefficients, we arrive at the system of linear equations in the unknowns A, B, C

A + B + C = 3,
-3A - 2B - C = -6,
2A = 2,

whose solution yields A = 1, B = 1 and C = 1.
We obtain the same result by substituting the values of the roots of the denominator for x in (*). Indeed, the roots are x₁ = 0, x₂ = 1 and x₃ = 2. Then the identity yields

2 = 2A and A = 1 for x = 0,
-1 = -B and B = 1 for x = 1,
2 = 2C and C = 1 for x = 2.

Hence

(3x² - 6x + 2)/(x³ - 3x² + 2x) = 1/x + 1/(x - 1) + 1/(x - 2). ►
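Decompositions of this kind can be checked with sympy's apart(); a one-line sketch for the function just treated:

    import sympy as sp

    x = sp.symbols('x')
    f = (3*x**2 - 6*x + 2) / (x**3 - 3*x**2 + 2*x)
    print(sp.apart(f))   # 1/x + 1/(x - 1) + 1/(x - 2), possibly printed in another order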
(2) Split the proper rational function (x³ + 3x + 1)/(x⁵ + 3x⁴ + 3x³ + x²) into partial fractions.
◄ The denominator can be factorized as

x⁵ + 3x⁴ + 3x³ + x² = x²(x³ + 3x² + 3x + 1) = (x + 1)³x².

Thus it has the repeated root x₁ = 0 of multiplicity 2 and the repeated root x₂ = -1 of multiplicity 3. Then

(x³ + 3x + 1)/(x⁵ + 3x⁴ + 3x³ + x²) = A₁/x + A₂/x² + B₁/(x + 1) + B₂/(x + 1)² + B₃/(x + 1)³.

Multiplying out, as before, we get

x³ + 3x + 1 = A₁x(x + 1)³ + A₂(x + 1)³ + B₁x²(x + 1)² + B₂x²(x + 1) + B₃x²     (**)

or

x³ + 3x + 1 = (A₁ + B₁)x⁴ + (3A₁ + A₂ + 2B₁ + B₂)x³ + (3A₁ + 3A₂ + B₁ + B₂ + B₃)x² + (A₁ + 3A₂)x + A₂.

The method of comparing coefficients gives

x⁴:  A₁ + B₁ = 0,
x³:  3A₁ + A₂ + 2B₁ + B₂ = 1,
x²:  3A₁ + 3A₂ + B₁ + B₂ + B₃ = 0,
x¹:  A₁ + 3A₂ = 3,
x⁰:  A₂ = 1,

so that A₁ = 0, A₂ = 1, B₁ = 0, B₂ = 0, B₃ = -3 and

(x³ + 3x + 1)/(x⁵ + 3x⁴ + 3x³ + x²) = 1/x² - 3/(x + 1)³.

Proceeding as in the previous example, we put x = 0 and x = -1 in the identity (**). This yields A₂ = 1 for x = 0 and B₃ = -3 for x = -1. Substituting these values for A₂ and B₃ into (**), we get

x³ + 3x + 1 = A₁x(x + 1)³ + (x + 1)³ + B₁x²(x + 1)² + B₂x²(x + 1) - 3x²

or

x³ + 3x + 1 - (x + 1)³ + 3x² = A₁x(x + 1)³ + B₁x²(x + 1)² + B₂x²(x + 1),

whence

0 = A₁x(x + 1)³ + B₁x²(x + 1)² + B₂x²(x + 1)

and

A₁(x + 1)² + B₁x(x + 1) + B₂x = 0.

Put x = 0 and x = -1. Then A₁ = 0, B₂ = 0 and, consequently, B₁ = 0. Thus we have the same values for the coefficients as those obtained by the method of comparing coefficients, namely,

A₁ = 0, A₂ = 1, B₁ = 0, B₂ = 0, B₃ = -3. ►

(3) Split the proper rational function $\dfrac{x^3 + x^2 + 1}{(x^2 + 1)^2}$ into partial fractions.
◄ The denominator has no real roots since x² + 1 does not vanish for any value of x. Then there must hold
$$\frac{x^3 + x^2 + 1}{(x^2 + 1)^2} = \frac{M_1x + N_1}{x^2 + 1} + \frac{M_2x + N_2}{(x^2 + 1)^2},$$
whence
$$x^3 + x^2 + 1 = (M_1x + N_1)(x^2 + 1) + M_2x + N_2$$
or
$$x^3 + x^2 + 1 = M_1x^3 + N_1x^2 + (M_1 + M_2)x + (N_1 + N_2).$$
Comparing the coefficients of equal powers of x, we get
M₁ = 1, N₁ = 1, M₁ + M₂ = 0, N₁ + N₂ = 1,
whence
M₁ = 1, N₁ = 1, M₂ = −1, N₂ = 0
and, consequently,
$$\frac{x^3 + x^2 + 1}{(x^2 + 1)^2} = \frac{x + 1}{x^2 + 1} - \frac{x}{(x^2 + 1)^2}. \quad ►$$
It is worth noting that sometimes it is easier to arrive at the desired result without applying the method of comparing coefficients. For instance, in example (2) a little algebra gives
$$\frac{x^3 + 3x + 1}{x^5 + 3x^4 + 3x^3 + x^2} = \frac{(x^3 + 3x^2 + 3x + 1) - 3x^2}{x^2(x + 1)^3} = \frac{(x + 1)^3 - 3x^2}{x^2(x + 1)^3} = \frac{1}{x^2} - \frac{3}{(x + 1)^3}.$$
Integrating partial fractions. A rational function can be uniquely represented as a sum of a polynomial (a zero polynomial in the case of a proper rational function) and a proper rational function, which in turn can be split into partial fractions. Since any real polynomial can easily be integrated by applying the standard integration formulas for elementary functions, integrating a rational function reduces to integrating a sum of partial fractions. So now we turn our attention to techniques of integrating partial fractions.
Using standard integration formulas we can easily integrate partial fractions whose denominators are monic linear polynomials or powers of them, namely
$$\int \frac{A}{x - a}\,dx = A\int \frac{d(x - a)}{x - a} = A\ln|x - a| + C,$$
$$\int \frac{A}{(x - a)^k}\,dx = A\int (x - a)^{-k}\,d(x - a) = \frac{A}{1 - k}(x - a)^{-k+1} + C = \frac{A}{(1 - k)(x - a)^{k-1}} + C.$$
To integrate a partial fraction of the form $\dfrac{Mx + N}{x^2 + px + q}$ we apply the method of completing the square in the denominator. This yields
$$x^2 + px + q = \left[x^2 + 2x\,\frac{p}{2} + \left(\frac{p}{2}\right)^2\right] + q - \left(\frac{p}{2}\right)^2 = \left(x + \frac{p}{2}\right)^2 + \left(q - \frac{p^2}{4}\right).$$
Since the second summand is positive we set it equal to a², where $a = +\sqrt{q - \dfrac{p^2}{4}}$. Then the substitution $x + \dfrac{p}{2} = t$ gives dx = dt and x² + px + q = t² + a². By virtue of the linearity property of the indefinite integral we have
$$\int \frac{Mx + N}{x^2 + px + q}\,dx = \int \frac{M\left(t - \frac{p}{2}\right) + N}{t^2 + a^2}\,dt = \frac{M}{2}\int \frac{2t\,dt}{t^2 + a^2} + \left(N - \frac{Mp}{2}\right)\int \frac{dt}{t^2 + a^2}$$
$$= \frac{M}{2}\int \frac{d(t^2 + a^2)}{t^2 + a^2} + \left(N - \frac{Mp}{2}\right)\int \frac{dt}{t^2 + a^2} = \frac{M}{2}\ln(t^2 + a^2) + \frac{1}{a}\left(N - \frac{Mp}{2}\right)\tan^{-1}\frac{t}{a} + C$$
$$= \frac{M}{2}\ln(x^2 + px + q) + \frac{2N - Mp}{\sqrt{4q - p^2}}\tan^{-1}\frac{2x + p}{\sqrt{4q - p^2}} + C.$$
To illustrate the method of integrating partial fractions of this form we work through
$$\int \frac{2 - x}{x^2 + 4x + 6}\,dx.$$
◄ Clearly, the quadratic polynomial x² + 4x + 6 has no real roots since $\dfrac{p^2}{4} - q = -2 < 0$. Completing the square, we get
$$x^2 + 4x + 6 = (x^2 + 4x + 4) + 2 = (x + 2)^2 + 2.$$
The substitution x + 2 = t yields dx = dt, x = t − 2 and x² + 4x + 6 = t² + 2 (observe that here a² = 2). Then
$$\int \frac{2 - x}{x^2 + 4x + 6}\,dx = \int \frac{2 - (t - 2)}{t^2 + 2}\,dt = \int \frac{4 - t}{t^2 + 2}\,dt = 4\int \frac{dt}{t^2 + 2} - \frac{1}{2}\int \frac{2t\,dt}{t^2 + 2}$$
$$= \frac{4}{\sqrt{2}}\tan^{-1}\frac{t}{\sqrt{2}} - \frac{1}{2}\ln(t^2 + 2) + C = 2\sqrt{2}\,\tan^{-1}\frac{x + 2}{\sqrt{2}} - \frac{1}{2}\ln(x^2 + 4x + 6) + C. \quad ►$$

To compute the integral of a partial fraction of the form $\dfrac{Mx + N}{(x^2 + px + q)^k}$ (k ≥ 2), where the denominator is irreducible to a product of monic linear polynomials, we proceed as before by putting $x + \dfrac{p}{2} = t$, so that $dx = dt$ and $a = \sqrt{q - \dfrac{p^2}{4}}$. Then
$$\int \frac{Mx + N}{(x^2 + px + q)^k}\,dx = \frac{M}{2}\int (t^2 + a^2)^{-k}\,d(t^2 + a^2) + \left(N - \frac{Mp}{2}\right)\int \frac{dt}{(t^2 + a^2)^k}$$
$$= \frac{M}{2(1 - k)(t^2 + a^2)^{k-1}} + \left(N - \frac{Mp}{2}\right)\int \frac{dt}{(t^2 + a^2)^k}$$
or, writing $I_k = \displaystyle\int \frac{dt}{(t^2 + a^2)^k}$ for the remaining integral, one finds (for example, by integration by parts) that
$$I_k = \frac{t}{2a^2(k - 1)(t^2 + a^2)^{k-1}} + \frac{2k - 3}{2a^2(k - 1)}\,I_{k-1}, \qquad k = 2, 3, \ldots$$
Thus we have arrived at a recurrence relation which yields the value of I_k for any k (k = 2, 3, ...). Indeed, the integral I₁ is easily computed by using a standard integration formula, i.e.,
$$I_1 = \int \frac{dt}{t^2 + a^2} = \frac{1}{a}\tan^{-1}\frac{t}{a} + C.$$
For k = 2 the recurrence relation gives
$$I_2 = \int \frac{dt}{(t^2 + a^2)^2} = \frac{t}{2a^2(t^2 + a^2)} + \frac{1}{2a^2}\,I_1 = \frac{t}{2a^2(t^2 + a^2)} + \frac{1}{2a^3}\tan^{-1}\frac{t}{a} + C.$$
Then we can easily compute I₃ by putting k = 3 in the recurrence relation; continuing this process we can find the value of I_k for any preassigned k. Substituting the expressions for t and a into the result obtained for a given k, we finally represent the answer in terms of x and the quantities M, N, p and q involved in the original integral.
Example. Compute the integral
$$\int \frac{x + 1}{(x^2 - 4x + 5)^2}\,dx.$$
◄ Clearly, the integrand is a partial fraction of the form considered above: the denominator has no real roots, since $\dfrac{p^2}{4} - q = -1 < 0$, and hence is irreducible to a product of monic linear polynomials, while the numerator is a linear polynomial. Completing the square, we have
$$x^2 - 4x + 5 = (x^2 - 4x + 4) + 1 = (x - 2)^2 + 1.$$
The substitution x − 2 = t (a² = 1) yields dx = dt, x = t + 2 and
$$\int \frac{x + 1}{(x^2 - 4x + 5)^2}\,dx = \int \frac{t + 3}{(t^2 + 1)^2}\,dt = \frac{1}{2}\int \frac{d(t^2 + 1)}{(t^2 + 1)^2} + 3\int \frac{dt}{(t^2 + 1)^2} = -\frac{1}{2(t^2 + 1)} + 3\int \frac{dt}{(t^2 + 1)^2}.$$
Put k = 2. Then
$$\int \frac{dt}{(t^2 + 1)^2} = \frac{t}{2(t^2 + 1)} + \frac{1}{2}\int \frac{dt}{t^2 + 1} = \frac{t}{2(t^2 + 1)} + \frac{1}{2}\tan^{-1}t + C.$$
Hence
$$\int \frac{x + 1}{(x^2 - 4x + 5)^2}\,dx = -\frac{1}{2(t^2 + 1)} + \frac{3t}{2(t^2 + 1)} + \frac{3}{2}\tan^{-1}t + C = \frac{3t - 1}{2(t^2 + 1)} + \frac{3}{2}\tan^{-1}t + C.$$
Finally, the substitution t = x − 2 gives
$$\int \frac{x + 1}{(x^2 - 4x + 5)^2}\,dx = \frac{3x - 7}{2(x^2 - 4x + 5)} + \frac{3}{2}\tan^{-1}(x - 2) + C. \quad ►$$
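The recurrence-based answer is easy to double-check by differentiation. A minimal sketch, assuming SymPy is available (not part of the original text):

```python
# Sketch: checking the antiderivative of (x + 1)/(x**2 - 4*x + 5)**2 (SymPy assumed).
from sympy import symbols, atan, simplify, diff

x = symbols('x')
integrand = (x + 1) / (x**2 - 4*x + 5)**2

# Antiderivative obtained by hand via the recurrence for I_k.
F_hand = (3*x - 7) / (2*(x**2 - 4*x + 5)) + 3*atan(x - 2)/2

# Differentiating F_hand and simplifying must reproduce the integrand.
assert simplify(diff(F_hand, x) - integrand) == 0
```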
To summarize the results outlined in the preceding discussion we quote the following theorem.
Theorem 9.4. The indefinite integral of any rational function always exists on intervals where Q_n(x) ≠ 0 and is expressed in terms of a finite number of elementary functions, namely, as an algebraic sum that can only involve polynomials, proper rational functions, logarithmic functions and arctangents.
It is instructive to summarize the general procedure of integrating rational functions as the following sequence of steps: (1) express the given rational function (the integrand), if necessary, as a sum of a polynomial and a proper rational function by applying the division algorithm for polynomials or any other suitable technique; (2) factor the denominator of the proper rational function into linear and quadratic polynomials; (3) split the proper rational function into partial fractions; (4) compute the integral of the given function as a sum of integrals of the summands obtained.
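These four steps map directly onto computer-algebra primitives. The sketch below, assuming SymPy is available (it is an illustration, not part of the original procedure), runs the pipeline on the integrand of example (1) that follows:

```python
# Sketch of the four-step procedure using SymPy (assumed available).
from sympy import symbols, apart, integrate

x = symbols('x')
f = x**3 / ((x - 1)*(x**2 - 4))

# Steps 1-3: apart() extracts the polynomial part and splits the proper part into partial fractions.
split_form = apart(f, x)
print(split_form)   # 1 - 1/(3*(x - 1)) + 2/(x - 2) - 2/(3*(x + 2)), up to term ordering

# Step 4: integrate term by term.
print(integrate(split_form, x))
```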
To conclude this section we consider a number of examples.
Examples. Compute each of the following integrals.
(1) $\displaystyle\int \frac{x^3}{(x - 1)(x^2 - 4)}\,dx$.
◄ Since (x − 1)(x² − 4) = x³ − x² − 4x + 4 and the numerator has the same degree as the denominator, the integrand is not a proper rational function, so we apply the division algorithm for polynomials:
$$x^3 = 1\cdot(x^3 - x^2 - 4x + 4) + (x^2 + 4x - 4).$$
Hence the integrand becomes
$$\frac{x^3}{(x - 1)(x^2 - 4)} = 1 + \frac{x^2 + 4x - 4}{(x - 1)(x^2 - 4)},$$
where R₀(x) = 1 and P(x) = x² + 4x − 4.
The denominator of the proper rational function has three distinct roots a = 1, b = 2 and c = −2, so this function can be split into partial fractions as
$$\frac{x^2 + 4x - 4}{(x - 1)(x^2 - 4)} = \frac{A}{x - 1} + \frac{B}{x - 2} + \frac{C}{x + 2},$$
whence
$$x^2 + 4x - 4 = A(x^2 - 4) + B(x - 1)(x + 2) + C(x - 1)(x - 2).$$
Setting x equal to the values of the roots, we obtain
A = −1/3 for x = 1, B = 2 for x = 2 and C = −2/3 for x = −2.
Then
$$\frac{x^2 + 4x - 4}{(x - 1)(x^2 - 4)} = -\frac{1}{3}\,\frac{1}{x - 1} + \frac{2}{x - 2} - \frac{2}{3}\,\frac{1}{x + 2}$$
and
$$\int \frac{x^3\,dx}{(x - 1)(x^2 - 4)} = \int\left(1 - \frac{1}{3}\,\frac{1}{x - 1} + \frac{2}{x - 2} - \frac{2}{3}\,\frac{1}{x + 2}\right)dx = x - \frac{1}{3}\ln|x - 1| + 2\ln|x - 2| - \frac{2}{3}\ln|x + 2| + C. \quad ►$$

(2) $\displaystyle\int \frac{x^2 + 1}{x^4 - x^3}\,dx$.

◄ The integrand is a proper rational function whose denominator x⁴ − x³ = x³(x − 1) has the root x = 0 of multiplicity 3 and the root x = 1 of multiplicity 1. Then splitting into partial fractions yields
$$\frac{x^2 + 1}{x^4 - x^3} = \frac{A_1}{x} + \frac{A_2}{x^2} + \frac{A_3}{x^3} + \frac{B}{x - 1}.$$
Multiplying both sides by the denominator of the integrand, we arrive at
$$x^2 + 1 = A_1x^2(x - 1) + A_2x(x - 1) + A_3(x - 1) + Bx^3$$
or
$$x^2 + 1 = (A_1 + B)x^3 + (-A_1 + A_2)x^2 + (-A_2 + A_3)x - A_3.$$
Comparing the coefficients of equal powers of x, we get A₁ + B = 0, −A₁ + A₂ = 1, −A₂ + A₃ = 0 and −A₃ = 1. Whence A₁ = −2, A₂ = −1, A₃ = −1 and B = 2. Then
$$\frac{x^2 + 1}{x^4 - x^3} = -\frac{2}{x} - \frac{1}{x^2} - \frac{1}{x^3} + \frac{2}{x - 1}$$
and
$$\int \frac{x^2 + 1}{x^4 - x^3}\,dx = \int\left(-\frac{2}{x} - \frac{1}{x^2} - \frac{1}{x^3} + \frac{2}{x - 1}\right)dx = -2\ln|x| + \frac{1}{x} + \frac{1}{2x^2} + 2\ln|x - 1| + C. \quad ►$$
(3) $\displaystyle\int \frac{x^3 - x}{(x^2 + 1)^2}\,dx$.
◄ Since the denominator has no real roots the integrand can be written as
$$\frac{x^3 - x}{(x^2 + 1)^2} = \frac{M_1x + N_1}{x^2 + 1} + \frac{M_2x + N_2}{(x^2 + 1)^2}.$$
Then
$$x^3 - x = (M_1x + N_1)(x^2 + 1) + M_2x + N_2$$
or
$$x^3 - x = M_1x^3 + N_1x^2 + (M_1 + M_2)x + (N_1 + N_2).$$
Comparing the coefficients, we have
M₁ = 1, N₁ = 0, M₁ + M₂ = −1, N₁ + N₂ = 0,
so that M₁ = 1, N₁ = 0, M₂ = −2, N₂ = 0 and
$$\int \frac{x^3 - x}{(x^2 + 1)^2}\,dx = \int\left[\frac{x}{x^2 + 1} - \frac{2x}{(x^2 + 1)^2}\right]dx = \frac{1}{2}\ln(x^2 + 1) + \frac{1}{x^2 + 1} + C.$$
Observe that the integrand can easily be split into partial fractions by simple algebra:
$$\frac{x^3 - x}{(x^2 + 1)^2} = \frac{(x^3 + x) - 2x}{(x^2 + 1)^2} = \frac{x(x^2 + 1) - 2x}{(x^2 + 1)^2} = \frac{x}{x^2 + 1} - \frac{2x}{(x^2 + 1)^2}. \quad ►$$

9.4 Integrals Involving Irrational Functions


Rational and irrational functions. We consider functions of several variables u₁, u₂, ..., u_k. Let a function R(u₁, u₂, ..., u_k) be representable in the form
$$R(u_1, u_2, \ldots, u_k) = \frac{P_m(u_1, u_2, \ldots, u_k)}{Q_n(u_1, u_2, \ldots, u_k)},$$
where P_m(u₁, u₂, ..., u_k) and Q_n(u₁, u₂, ..., u_k) are polynomials of degrees m and n in u₁, u₂, ..., u_k, respectively. Then R(u₁, u₂, ..., u_k) is a rational function in u₁, u₂, ..., u_k; otherwise, it is irrational. For example, a quadratic polynomial in the variables u₁ and u₂ takes the form
$$P_2(u_1, u_2) = A_{00} + A_{10}u_1 + A_{01}u_2 + A_{20}u_1^2 + A_{11}u_1u_2 + A_{02}u_2^2,$$
where the coefficients A₀₀, A₁₀, A₀₁, A₂₀, A₁₁, A₀₂ are real numbers and A₂₀² + A₁₁² + A₀₂² ≠ 0.
It is easy to observe that $f(x, y) = \dfrac{x^2 + 2y^3 + xy}{x^3 + xy^2 + 1}$ is a rational function in the variables x and y, since this function is a ratio of two polynomials, P₃(x, y) = x² + 2y³ + xy and Q(x, y) = x³ + xy² + 1, while $f(x, y) = \dfrac{\sqrt{x^2 - 2xy + 3}}{x + y}$ is an irrational function.

Suppose that the variables u₁, u₂, ..., u_k are themselves functions of a variable x, i.e., u₁ = f₁(x), u₂ = f₂(x), ..., u_k = f_k(x). Then the function R[f₁(x), f₂(x), ..., f_k(x)] is a rational function of the functions f₁(x), f₂(x), ..., f_k(x). For example, $f(x) = \dfrac{x^2 + \sqrt{x^2 + x + 1}}{x + 1 + 3\sqrt{x^2 + x + 1}}$ is a rational function in x and in √(x² + x + 1), so that f(x) = R(x, √(x² + x + 1)). It is worth noting that $f(x) = \dfrac{\ln x + e^{\sqrt{x^2 + 1}}}{2 + \sin x^2}$ is an irrational function in x and in √(x² + 1), while it is a rational function of the functions ln x, e^{√(x²+1)} and sin x², so that f(x) = R(ln x, e^{√(x²+1)}, sin x²).
It is not hard to notice that sometimes integrals involving irrational functions do not admit representations in terms of elementary functions. For example, the integrals
$$\int \frac{dx}{\sqrt{(1 - x^2)(1 - k^2x^2)}} \qquad\text{and}\qquad \int \sqrt{\frac{1 - k^2x^2}{1 - x^2}}\,dx \qquad (0 < k < 1),$$
called the elliptic integrals of the first and second kinds, respectively, cannot be expressed in elementary functions. However, by suitable substitutions it is often possible to reduce integrals involving irrational functions to integrals of rational functions. Below we consider techniques applicable to integrals of some specific forms.

Integrals of the form $\displaystyle\int R\left(x, \sqrt[m]{\frac{ax + b}{cx + d}}\right)dx$, where m ≥ 2, R(x, y) is a rational function in x and in $y = \sqrt[m]{\dfrac{ax + b}{cx + d}}$, and the coefficients a, b, c and d are real numbers such that ad − bc ≠ 0. We leave aside the case ad − bc = 0, for then $\dfrac{a}{c} = \dfrac{b}{d}$ and $\dfrac{ax + b}{cx + d} = \dfrac{a}{c} = \dfrac{b}{d}$, so that the integrand becomes a rational function in x alone, and such integrals have already been studied in detail.
The substitution
$$t = \sqrt[m]{\frac{ax + b}{cx + d}}$$
yields
$$t^m = \frac{ax + b}{cx + d}, \qquad (cx + d)t^m = ax + b, \qquad ax - cxt^m = dt^m - b,$$
so that
$$x = \frac{dt^m - b}{a - ct^m}$$
is a rational function in t. On differentiating x with respect to t, we have
$$dx = \frac{dmt^{m-1}(a - ct^m) + cmt^{m-1}(dt^m - b)}{(a - ct^m)^2}\,dt$$
and further
$$dx = \frac{(ad - bc)\,m t^{m-1}}{(a - ct^m)^2}\,dt.$$
Then
$$\int R\left(x, \sqrt[m]{\frac{ax + b}{cx + d}}\right)dx = \int R\left(\frac{dt^m - b}{a - ct^m},\, t\right)\frac{(ad - bc)\,m t^{m-1}}{(a - ct^m)^2}\,dt = \int R_1(t)\,dt,$$
where R₁(t) is a rational function in t: the factor $R\left(\dfrac{dt^m - b}{a - ct^m}, t\right)$, being a rational function of a rational function, is again a rational function, and hence its product by the second factor, which is also a rational function in t, is again a rational function in t.
Therefore we have reduced the original integral to one involving rational functions, which are already familiar to us. Let
$$\int R_1(t)\,dt = F(t) + C, \qquad F'(t) = R_1(t).$$
Then the original integral becomes
$$\int R\left(x, \sqrt[m]{\frac{ax + b}{cx + d}}\right)dx = F\left(\sqrt[m]{\frac{ax + b}{cx + d}}\right) + C.$$

By way of illustration we consider two examples.
Examples. Compute each of the following integrals.
(1) $\displaystyle\int \sqrt[4]{\frac{2x - 3}{2x + 3}}\,\frac{dx}{(2x + 3)^2}$.
◄ Observe that the integrand is a rational function in x and in $y = \sqrt[4]{\dfrac{2x - 3}{2x + 3}}$, i.e., it equals $R(x, y) = \dfrac{y}{(2x + 3)^2}$.
Put $t = \sqrt[4]{\dfrac{2x - 3}{2x + 3}}$. Then 2x − 3 = (2x + 3)t⁴, whence
$$x = \frac{3}{2}\,\frac{1 + t^4}{1 - t^4}, \qquad dx = \frac{12t^3}{(1 - t^4)^2}\,dt, \qquad 2x + 3 = \frac{6}{1 - t^4}.$$
Hence the original integral becomes
$$\int t\,\frac{(1 - t^4)^2}{36}\,\frac{12t^3}{(1 - t^4)^2}\,dt = \frac{1}{3}\int t^4\,dt = \frac{t^5}{15} + C = \frac{1}{15}\left(\sqrt[4]{\frac{2x - 3}{2x + 3}}\right)^5 + C. \quad ►$$

(2) $\displaystyle\int \frac{dx}{\sqrt[4]{x}\,(\sqrt[3]{x} + \sqrt{x})}$.
◄ Evidently, the integrand is of the form $R(x, \sqrt[12]{x})$. Put $t = \sqrt[12]{x}$. Then
$$x = t^{12}, \qquad \sqrt[4]{x} = t^3, \qquad \sqrt[3]{x} = t^4, \qquad \sqrt{x} = t^6, \qquad dx = 12t^{11}\,dt.$$
Hence
$$\int \frac{dx}{\sqrt[4]{x}\,(\sqrt[3]{x} + \sqrt{x})} = \int \frac{12t^{11}\,dt}{t^3(t^4 + t^6)} = 12\int \frac{t^4\,dt}{1 + t^2} = 12\int \frac{(t^4 - 1) + 1}{1 + t^2}\,dt = 12\int\left(t^2 - 1 + \frac{1}{1 + t^2}\right)dt$$
$$= 12\left(\frac{t^3}{3} - t + \tan^{-1}t\right) + C = 4\sqrt[4]{x} - 12\sqrt[12]{x} + 12\tan^{-1}\sqrt[12]{x} + C. \quad ►$$
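Root substitutions of this kind are easy to sanity-check by differentiation. A minimal sketch, assuming SymPy is available (not part of the original text), verifies the answer to example (2) numerically:

```python
# Sketch: verifying example (2) by differentiating the hand-computed antiderivative (SymPy assumed).
from sympy import symbols, atan, diff, root

x = symbols('x', positive=True)
integrand = 1 / (root(x, 4) * (root(x, 3) + root(x, 2)))
F_hand = 4*root(x, 4) - 12*root(x, 12) + 12*atan(root(x, 12))

# The derivative of F_hand must agree with the integrand; compare numerically at x = 2.3.
assert abs(float((diff(F_hand, x) - integrand).subs(x, 2.3))) < 1e-12
```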


Integrals of the form $\int R(x, \sqrt{ax^2 + bx + c})\,dx$, where the integrand, expressed as R(x, y), is a rational function in x and in $y = \sqrt{ax^2 + bx + c}$. By suitable substitutions, dependent on the coefficients a, b and c, these integrals are reduced to integrals of rational functions. Three cases are to be distinguished.
(i) Let a > 0. Then the substitution
$$t = \sqrt{ax^2 + bx + c} + \sqrt{a}\,x$$
yields
$$(t - \sqrt{a}\,x)^2 = ax^2 + bx + c \qquad\text{and}\qquad t^2 - 2\sqrt{a}\,xt = bx + c,$$
so that
$$x = \frac{t^2 - c}{2\sqrt{a}\,t + b}$$
is a rational function in t. On differentiating x with respect to t, we get
$$dx = \frac{2t(2\sqrt{a}\,t + b) - (t^2 - c)\,2\sqrt{a}}{(2\sqrt{a}\,t + b)^2}\,dt = 2\,\frac{\sqrt{a}\,t^2 + bt + c\sqrt{a}}{(2\sqrt{a}\,t + b)^2}\,dt.$$
We also have
$$\sqrt{ax^2 + bx + c} = t - \sqrt{a}\,x = t - \sqrt{a}\,\frac{t^2 - c}{2\sqrt{a}\,t + b} = \frac{\sqrt{a}\,t^2 + bt + c\sqrt{a}}{2\sqrt{a}\,t + b}.$$
Thus x, dx and $\sqrt{ax^2 + bx + c}$ are all rational functions in t, and we can write
$$\int R\left(x, \sqrt{ax^2 + bx + c}\right)dx = \int R_1(t)\,dt,$$
where
$$R_1(t) = R\left(\frac{t^2 - c}{2\sqrt{a}\,t + b},\, \frac{\sqrt{a}\,t^2 + bt + c\sqrt{a}}{2\sqrt{a}\,t + b}\right)\cdot 2\,\frac{\sqrt{a}\,t^2 + bt + c\sqrt{a}}{(2\sqrt{a}\,t + b)^2}$$
is a rational function in t.
Observe that the substitution $t = \sqrt{ax^2 + bx + c} - \sqrt{a}\,x$ also reduces the original integral to one involving rational functions.
Example. Compute the integral
$$\int \frac{dx}{\sqrt{x^2 + \alpha^2}}.$$
◄ Since a = 1 > 0 we put $t = \sqrt{x^2 + \alpha^2} + x$. Then
$$x = \frac{t^2 - \alpha^2}{2t}, \qquad dx = \frac{t^2 + \alpha^2}{2t^2}\,dt, \qquad \sqrt{x^2 + \alpha^2} = t - \frac{t^2 - \alpha^2}{2t} = \frac{t^2 + \alpha^2}{2t},$$
so that
$$\int \frac{dx}{\sqrt{x^2 + \alpha^2}} = \int \frac{2t}{t^2 + \alpha^2}\,\frac{t^2 + \alpha^2}{2t^2}\,dt = \int \frac{dt}{t} = \ln|t| + C = \ln\left|x + \sqrt{x^2 + \alpha^2}\right| + C.$$
Noting that $x + \sqrt{x^2 + \alpha^2} > 0$ for all x, we finally arrive at
$$\int \frac{dx}{\sqrt{x^2 + \alpha^2}} = \ln\left(x + \sqrt{x^2 + \alpha^2}\right) + C. \quad ►$$
Problem. Prove that
$$\int \frac{dx}{\sqrt{x^2 - \alpha^2}} = \ln\left|x + \sqrt{x^2 - \alpha^2}\right| + C, \qquad |x| > |\alpha|.$$
(ii) Let the quadratic trinomial ax² + bx + c have distinct real roots x₁ and x₂ (the coefficient a may be of either sign). Then the substitution
$$\sqrt{ax^2 + bx + c} = (x - x_1)t \qquad\text{(or } \sqrt{ax^2 + bx + c} = (x - x_2)t\text{)}$$
yields, since ax² + bx + c = a(x − x₁)(x − x₂),
$$a(x - x_1)(x - x_2) = (x - x_1)^2t^2 \qquad\text{and hence}\qquad a(x - x_2) = (x - x_1)t^2.$$
Whence
$$x = \frac{x_1t^2 - ax_2}{t^2 - a}, \qquad dx = \frac{2a(x_2 - x_1)t}{(t^2 - a)^2}\,dt$$
and
$$\sqrt{ax^2 + bx + c} = \left(\frac{x_1t^2 - ax_2}{t^2 - a} - x_1\right)t = \frac{a(x_1 - x_2)t}{t^2 - a}.$$
Since x, dx and $\sqrt{ax^2 + bx + c}$ are all rational functions in t, the original integral becomes
$$\int R\left(x, \sqrt{ax^2 + bx + c}\right)dx = \int R_1(t)\,dt,$$
where
$$R_1(t) = R\left(\frac{x_1t^2 - ax_2}{t^2 - a},\, \frac{a(x_1 - x_2)t}{t^2 - a}\right)\frac{2a(x_2 - x_1)t}{(t^2 - a)^2}$$
is a rational function in t.
Example. Compute the integral
$$\int \frac{dx}{(x - 2)\sqrt{1 - x^2}}, \qquad |x| < 1.$$
◄ Since 1 − x² has two distinct real roots x₁ = −1 and x₂ = 1, we put
$$\sqrt{1 - x^2} = (1 + x)t \qquad\text{(or } \sqrt{1 - x^2} = (1 - x)t\text{)}.$$
Then
$$1 - x^2 = (1 + x)^2t^2, \qquad 1 - x = (1 + x)t^2, \qquad x = \frac{1 - t^2}{1 + t^2},$$
$$x - 2 = -\frac{3t^2 + 1}{1 + t^2}, \qquad \sqrt{1 - x^2} = \frac{2t}{1 + t^2}, \qquad dx = -\frac{4t\,dt}{(1 + t^2)^2}$$
and
$$\int \frac{dx}{(x - 2)\sqrt{1 - x^2}} = \int \frac{(1 + t^2)(1 + t^2)\,4t\,dt}{(3t^2 + 1)\,2t\,(1 + t^2)^2} = 2\int \frac{dt}{3t^2 + 1}$$
$$= \frac{2}{3}\int \frac{dt}{t^2 + \frac{1}{3}} = \frac{2}{\sqrt{3}}\tan^{-1}(\sqrt{3}\,t) + C = \frac{2}{\sqrt{3}}\tan^{-1}\left(\sqrt{3}\,\sqrt{\frac{1 - x}{1 + x}}\right) + C. \quad ►$$
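A quick numerical spot-check of this result, sketched with the SymPy library (an assumption, not part of the original text): differentiate the answer and compare it with the integrand at a point inside |x| < 1.

```python
# Sketch: numerical spot-check of the Euler-substitution example (SymPy assumed).
from sympy import symbols, sqrt, atan, diff

x = symbols('x')
integrand = 1 / ((x - 2)*sqrt(1 - x**2))
F_hand = 2/sqrt(3) * atan(sqrt(3)*sqrt((1 - x)/(1 + x)))

# Compare the derivative of the hand answer with the integrand at x = 0.3.
assert abs(float((diff(F_hand, x) - integrand).subs(x, 0.3))) < 1e-12
```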

(iii) Let c > 0. Then the substitution $\sqrt{ax^2 + bx + c} = xt + \sqrt{c}$ (or $\sqrt{ax^2 + bx + c} = xt - \sqrt{c}$) easily reduces the original integral to one involving rational functions.
It is worth noting that the substitutions in (i) and (ii) already suffice to reduce integrals of the form $\int R(x, \sqrt{ax^2 + bx + c})\,dx$ to integrals of rational functions. Indeed, if b² − 4ac > 0 then the roots of ax² + bx + c are real and the substitution given in (ii) yields the desired result. If b² − 4ac < 0 then the sign of ax² + bx + c coincides with that of a; since ax² + bx + c must be positive, we have a > 0, and in this case the substitution shown in (i) leads to what we are seeking. Notice also that some integrals of the form $\int R(x, \sqrt{ax^2 + bx + c})\,dx$ can easily be computed without applying the substitutions mentioned in (i)-(iii). Below we consider three specific cases.

(a) Integrals of the form $\displaystyle\int \frac{dx}{\sqrt{ax^2 + bx + c}}$, a ≠ 0. Here it is convenient to complete the square in the radicand:
$$ax^2 + bx + c = a\left(x^2 + 2x\,\frac{b}{2a} + \frac{c}{a}\right) = a\left[\left(x^2 + 2x\,\frac{b}{2a} + \frac{b^2}{4a^2}\right) + \frac{c}{a} - \frac{b^2}{4a^2}\right]$$
$$= a\left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a} = a\left(x + \frac{b}{2a}\right)^2 + p,$$
where $p = \dfrac{4ac - b^2}{4a}$.
Then the substitution $t = x + \dfrac{b}{2a}$ yields dx = dt and
$$\int \frac{dx}{\sqrt{ax^2 + bx + c}} = \int \frac{dt}{\sqrt{at^2 + p}},$$
where a and p are either of opposite signs or both positive. When a > 0 (and p is of either sign) the integral is expressed by a logarithmic function; when a < 0 and p > 0 it is expressed by an arcsine, as the following two examples illustrate.
Examples. (1) $\displaystyle\int \frac{dx}{\sqrt{x^2 - 4x + 5}}$.
◄ Since x² − 4x + 5 = (x − 2)² + 1, the substitution x − 2 = t gives dx = dt and
$$\int \frac{dx}{\sqrt{x^2 - 4x + 5}} = \int \frac{dx}{\sqrt{(x - 2)^2 + 1}} = \int \frac{dt}{\sqrt{t^2 + 1}} = \ln\left(t + \sqrt{t^2 + 1}\right) + C = \ln\left(x - 2 + \sqrt{x^2 - 4x + 5}\right) + C. \quad ►$$

(2) $\displaystyle\int \frac{dx}{\sqrt{6x - x^2}}$.
◄ Since 6x − x² = −[(x² − 6x + 9) − 9] = 9 − (x − 3)², the substitution x − 3 = t yields dx = dt and
$$\int \frac{dx}{\sqrt{6x - x^2}} = \int \frac{dt}{\sqrt{9 - t^2}} = \sin^{-1}\frac{t}{3} + C = \sin^{-1}\frac{x - 3}{3} + C. \quad ►$$
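As before, the arcsine answer of example (2) can be spot-checked numerically; a minimal sketch assuming SymPy is available (not part of the original text):

```python
# Sketch: spot-checking example (2) by differentiation (SymPy assumed).
from sympy import symbols, sqrt, asin, diff

x = symbols('x')
F_hand = asin((x - 3)/3)
integrand = 1/sqrt(6*x - x**2)

# On 0 < x < 6 the derivative of the hand answer equals the integrand; compare at x = 1.5.
assert abs(float((diff(F_hand, x) - integrand).subs(x, 1.5))) < 1e-12
```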
(b) Integrals of the form $\displaystyle\int \frac{Mx + N}{\sqrt{ax^2 + bx + c}}\,dx$, a ≠ 0. Noting that (ax² + bx + c)′ = 2ax + b, we reduce the original integral to the one considered above by proceeding as follows:
$$\int \frac{Mx + N}{\sqrt{ax^2 + bx + c}}\,dx = \int \frac{\frac{M}{2a}\left[(2ax + b) - b\right] + N}{\sqrt{ax^2 + bx + c}}\,dx = \frac{M}{2a}\int \frac{(2ax + b)\,dx}{\sqrt{ax^2 + bx + c}} + \left(N - \frac{Mb}{2a}\right)\int \frac{dx}{\sqrt{ax^2 + bx + c}}$$
$$= \frac{M}{2a}\int \frac{d(ax^2 + bx + c)}{\sqrt{ax^2 + bx + c}} + \left(N - \frac{Mb}{2a}\right)\int \frac{dx}{\sqrt{ax^2 + bx + c}} = \frac{M}{a}\sqrt{ax^2 + bx + c} + \left(N - \frac{Mb}{2a}\right)\int \frac{dx}{\sqrt{ax^2 + bx + c}}.$$
Example. Compute the integral
$$\int \frac{x + 1}{\sqrt{6x - x^2}}\,dx.$$
◄ Noting that (6x − x²)′ = 6 − 2x, we obtain
$$\int \frac{x + 1}{\sqrt{6x - x^2}}\,dx = -\frac{1}{2}\int \frac{-2x - 2}{\sqrt{6x - x^2}}\,dx = -\frac{1}{2}\int \frac{(6 - 2x) - 8}{\sqrt{6x - x^2}}\,dx$$
$$= 4\int \frac{dx}{\sqrt{6x - x^2}} - \frac{1}{2}\int \frac{d(6x - x^2)}{\sqrt{6x - x^2}} = 4\sin^{-1}\frac{x - 3}{3} - \sqrt{6x - x^2} + C. \quad ►$$
(c) Integrals of the form $\displaystyle\int \frac{P_n(x)}{\sqrt{ax^2 + bx + c}}\,dx$, where Pₙ(x) is a polynomial of degree n. Here we apply the method of comparing coefficients to find the coefficients of an auxiliary polynomial. Let there hold the identity
$$\int \frac{P_n(x)}{\sqrt{ax^2 + bx + c}}\,dx = Q_{n-1}(x)\sqrt{ax^2 + bx + c} + A_n\int \frac{dx}{\sqrt{ax^2 + bx + c}}, \qquad (*)$$
where Q_{n−1}(x) is a polynomial of degree n − 1 with unknown coefficients A₀, A₁, ..., A_{n−1}, i.e.,
$$Q_{n-1}(x) = A_0 + A_1x + \ldots + A_{n-1}x^{n-1}.$$
To compute the values of the unknown coefficients we differentiate the above identity with respect to x, so that
$$\frac{P_n(x)}{\sqrt{ax^2 + bx + c}} = Q'_{n-1}(x)\sqrt{ax^2 + bx + c} + Q_{n-1}(x)\,\frac{ax + b/2}{\sqrt{ax^2 + bx + c}} + \frac{A_n}{\sqrt{ax^2 + bx + c}}.$$
Multiplying throughout by $\sqrt{ax^2 + bx + c}$, we get the identity
$$P_n(x) = Q'_{n-1}(x)(ax^2 + bx + c) + Q_{n-1}(x)\left(ax + \frac{b}{2}\right) + A_n,$$
which involves polynomials of degree n on both sides.
Comparing the coefficients of equal powers of x, we get (n + 1) equations which yield the desired coefficients A_k (k = 0, 1, 2, ..., n). Then, substituting these coefficients into the right-hand side of (*) and computing the integral $\int \frac{dx}{\sqrt{ax^2 + bx + c}}$, we arrive at the desired result.
Example. Compute the integral
$$\int \frac{x^2\,dx}{\sqrt{x^2 + 2x + 2}}.$$
◄ Put
$$\int \frac{x^2\,dx}{\sqrt{x^2 + 2x + 2}} = (A_0 + A_1x)\sqrt{x^2 + 2x + 2} + A_2\int \frac{dx}{\sqrt{x^2 + 2x + 2}}. \qquad (**)$$
On differentiating both sides, we have
$$\frac{x^2}{\sqrt{x^2 + 2x + 2}} = A_1\sqrt{x^2 + 2x + 2} + (A_0 + A_1x)\,\frac{x + 1}{\sqrt{x^2 + 2x + 2}} + \frac{A_2}{\sqrt{x^2 + 2x + 2}}.$$
Multiplying throughout by $\sqrt{x^2 + 2x + 2}$, we get
$$x^2 = A_1(x^2 + 2x + 2) + (A_0 + A_1x)(x + 1) + A_2$$
or
$$x^2 = 2A_1x^2 + (2A_1 + A_1 + A_0)x + (A_0 + 2A_1 + A_2).$$
The method of comparing coefficients gives
x²: 2A₁ = 1
x¹: 3A₁ + A₀ = 0
x⁰: A₀ + 2A₁ + A₂ = 0,
whence
A₀ = −3/2, A₁ = 1/2, A₂ = 1/2.
Computing the integral on the right-hand side of (**), we have
$$\int \frac{dx}{\sqrt{x^2 + 2x + 2}} = \int \frac{d(x + 1)}{\sqrt{(x + 1)^2 + 1}} = \ln\left(x + 1 + \sqrt{x^2 + 2x + 2}\right) + C.$$
Then the original integral becomes
$$\int \frac{x^2\,dx}{\sqrt{x^2 + 2x + 2}} = \frac{x - 3}{2}\sqrt{x^2 + 2x + 2} + \frac{1}{2}\ln\left(x + 1 + \sqrt{x^2 + 2x + 2}\right) + C. \quad ►$$
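The linear system for A₀, A₁, A₂ can also be generated and solved mechanically. A minimal sketch assuming SymPy is available (not part of the original text) reproduces the coefficients found by hand:

```python
# Sketch: solving for the undetermined coefficients of the example above (SymPy assumed).
from sympy import symbols, Poly, solve

x, A0, A1, A2 = symbols('x A0 A1 A2')

# Identity obtained after multiplying through by sqrt(x**2 + 2*x + 2):
lhs = x**2
rhs = A1*(x**2 + 2*x + 2) + (A0 + A1*x)*(x + 1) + A2

# Equate coefficients of equal powers of x and solve the linear system.
eqs = Poly(lhs - rhs, x).coeffs()
print(solve(eqs, [A0, A1, A2]))   # {A0: -3/2, A1: 1/2, A2: 1/2}
```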
9.5 Integrals Involving Trigonometric Functions

Integrals of the form ∫R(sin x, cos x) dx, where the integrand is a rational function in both sin x and cos x. For example, $f(x) = \dfrac{1 - 2\sin^2 x}{2 + \cos x}$ is a rational function in both sin x and cos x, while $g(x) = \dfrac{1 + \sin^2 x}{\sqrt{\cos x} + \cos x}$ is a rational function in sin x but irrational in cos x (we shall not examine functions of this kind here).
By the substitution $\tan\dfrac{x}{2} = t$, where −π < x < π, the original integral is reduced to the integral of a rational function. Indeed,
$$\sin x = \frac{2\sin\frac{x}{2}\cos\frac{x}{2}}{\cos^2\frac{x}{2} + \sin^2\frac{x}{2}} = \frac{2\tan\frac{x}{2}}{1 + \tan^2\frac{x}{2}} = \frac{2t}{1 + t^2},$$
$$\cos x = \frac{\cos^2\frac{x}{2} - \sin^2\frac{x}{2}}{\cos^2\frac{x}{2} + \sin^2\frac{x}{2}} = \frac{1 - \tan^2\frac{x}{2}}{1 + \tan^2\frac{x}{2}} = \frac{1 - t^2}{1 + t^2},$$
$$x = 2\tan^{-1}t, \qquad dx = \frac{2\,dt}{1 + t^2},$$
and
$$\int R(\sin x, \cos x)\,dx = \int R\left(\frac{2t}{1 + t^2},\, \frac{1 - t^2}{1 + t^2}\right)\frac{2\,dt}{1 + t^2} = \int R_1(t)\,dt,$$
where R₁(t) is a rational function in t.
Example. Compute the integral $\displaystyle\int \frac{dx}{\sin x}$.
◄ The substitution $\tan\dfrac{x}{2} = t$ yields
$$\int \frac{dx}{\sin x} = \int \frac{\dfrac{2\,dt}{1 + t^2}}{\dfrac{2t}{1 + t^2}} = \int \frac{dt}{t} = \ln|t| + C = \ln\left|\tan\frac{x}{2}\right| + C. \quad ►$$
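The tangent half-angle substitution can also be carried out mechanically. The sketch below, assuming SymPy is available (not part of the original text), rewrites the integrand of the example in terms of t = tan(x/2) and integrates the resulting rational function:

```python
# Sketch: the substitution t = tan(x/2) applied to 1/sin(x) (SymPy assumed).
from sympy import symbols, integrate, simplify

t = symbols('t')

# sin x and dx expressed through t = tan(x/2).
sin_x = 2*t/(1 + t**2)
dx_dt = 2/(1 + t**2)

rational_integrand = simplify((1/sin_x) * dx_dt)   # reduces to 1/t
print(rational_integrand)
print(integrate(rational_integrand, t))            # log(t), i.e. ln|tan(x/2)| up to a constant
```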

Sometimes the substitution $\tan\dfrac{x}{2} = t$ requires unwieldy computations, so we mention three specific cases when integrals of the form ∫R(sin x, cos x) dx can be computed by using simpler substitutions.
(a) Integrals of the form ∫R(sin x) cos x dx. Here the substitution sin x = t gives cos x dx = dt, so that the original integral becomes ∫R(t) dt. For example, if we wish to compute the integral
$$\int \frac{\cos x\,dx}{4 + \sin^2 x},$$
then the substitution sin x = t yields dt = cos x dx and
$$\int \frac{\cos x\,dx}{4 + \sin^2 x} = \int \frac{dt}{4 + t^2} = \frac{1}{2}\tan^{-1}\frac{t}{2} + C = \frac{1}{2}\tan^{-1}\left(\frac{\sin x}{2}\right) + C. \quad ►$$

(b) Integrals of the form ∫R(cos x) sin x dx. The substitution cos x = t yields sin x dx = −dt, so that the original integral becomes −∫R(t) dt. For example, using this substitution we can easily compute the following integral:
$$\int \frac{\sin x\,dx}{2 + \cos x} = -\int \frac{dt}{2 + t} = -\int \frac{d(2 + t)}{2 + t} = -\ln(2 + t) + C = -\ln(2 + \cos x) + C.$$

(c) Integrals of the form ∫R(sin x, cos x) dx where the integrand R(sin x, cos x) involves only even powers of sin x and cos x. Then the substitution tan x = t gives x = tan⁻¹t and $dx = \dfrac{dt}{1 + t^2}$, so that sin²x and cos²x, being rational functions in tan x, become rational functions in t. Indeed, we have
$$\sin^2 x = \frac{\sin^2 x}{\cos^2 x + \sin^2 x} = \frac{\tan^2 x}{1 + \tan^2 x} = \frac{t^2}{1 + t^2}, \qquad \cos^2 x = \frac{\cos^2 x}{\cos^2 x + \sin^2 x} = \frac{1}{1 + \tan^2 x} = \frac{1}{1 + t^2},$$
so that
$$\int R(\sin x, \cos x)\,dx = \int R_1(t)\,dt,$$
where R₁(t) is a rational function in t.
To demonstrate how this substitution works we compute
$$\int \frac{dx}{\sin^2 x + 4\cos^2 x + 2}.$$
◄ Put tan x = t. Then $dx = \dfrac{dt}{1 + t^2}$, $\sin^2 x = \dfrac{t^2}{1 + t^2}$ and $\cos^2 x = \dfrac{1}{1 + t^2}$, so that
$$\int \frac{dx}{\sin^2 x + 4\cos^2 x + 2} = \int \frac{1}{\dfrac{t^2}{1 + t^2} + \dfrac{4}{1 + t^2} + 2}\,\frac{dt}{1 + t^2}$$
$$= \frac{1}{3}\int \frac{dt}{t^2 + 2} = \frac{1}{3\sqrt{2}}\tan^{-1}\frac{t}{\sqrt{2}} + C = \frac{1}{3\sqrt{2}}\tan^{-1}\left(\frac{\tan x}{\sqrt{2}}\right) + C. \quad ►$$
Integrals of the form ∫sin^α x cos^β x dx, where α and β are real numbers. We consider two cases when the integral admits a representation in terms of elementary functions.
(a) Let either α or β be an odd positive integer. For definiteness we put β = 2k + 1, where k ≥ 0 is an integer; α may then be any real number. Using the identity cos²x + sin²x = 1, we have
$$\int \sin^\alpha x\,\cos^\beta x\,dx = \int \sin^\alpha x\,\cos^{2k+1}x\,dx = \int \sin^\alpha x\,(\cos^2 x)^k\cos x\,dx = \int \sin^\alpha x\,(1 - \sin^2 x)^k\cos x\,dx.$$
Put sin x = t. Then cos x dx = dt and
$$\int \sin^\alpha x\,\cos^{2k+1}x\,dx = \int t^\alpha(1 - t^2)^k\,dt.$$
Applying the binomial theorem to the factor (1 − t²)^k, we obtain a sum of (k + 1) power functions, which are easily integrable.
Examples. Compute each of the following integrals.
(1) ∫sin²x cos⁵x dx.
◄ Manipulating the integrand, we have
$$\int \sin^2 x\,\cos^5 x\,dx = \int \sin^2 x\,\cos^4 x\,\cos x\,dx = \int \sin^2 x\,(1 - \sin^2 x)^2\cos x\,dx.$$
Then the substitution sin x = t gives cos x dx = dt and
$$\int \sin^2 x\,(1 - \sin^2 x)^2\cos x\,dx = \int t^2(1 - t^2)^2\,dt = \int (t^2 - 2t^4 + t^6)\,dt = \frac{t^3}{3} - \frac{2}{5}t^5 + \frac{1}{7}t^7 + C$$
$$= \frac{1}{3}\sin^3 x - \frac{2}{5}\sin^5 x + \frac{1}{7}\sin^7 x + C. \quad ►$$
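A numerical spot-check of example (1), sketched under the assumption that SymPy is available (not part of the original text):

```python
# Sketch: spot-checking example (1) by differentiation (SymPy assumed).
from sympy import symbols, sin, cos, diff, Rational

x = symbols('x')
F_hand = sin(x)**3/3 - Rational(2, 5)*sin(x)**5 + sin(x)**7/7
integrand = sin(x)**2 * cos(x)**5

# Numerical comparison at an arbitrary point.
assert abs(float((diff(F_hand, x) - integrand).subs(x, 0.7))) < 1e-12
```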
(2) $\displaystyle\int \frac{\sin^3 x}{\cos^2 x}\,dx$.
◄ We have
$$\int \frac{\sin^3 x}{\cos^2 x}\,dx = \int \frac{\sin^2 x}{\cos^2 x}\,\sin x\,dx = \int \frac{1 - \cos^2 x}{\cos^2 x}\,\sin x\,dx.$$
Put cos x = t. Then sin x dx = −dt and
$$\int \frac{1 - \cos^2 x}{\cos^2 x}\,\sin x\,dx = \int \frac{t^2 - 1}{t^2}\,dt = \int\left(1 - \frac{1}{t^2}\right)dt = t + \frac{1}{t} + C = \cos x + \frac{1}{\cos x} + C. \quad ►$$
(3) $\displaystyle\int \frac{\cos^3 x}{\sqrt{\sin x}}\,dx$.
◄ We have
$$\int \frac{\cos^3 x}{\sqrt{\sin x}}\,dx = \int \frac{\cos^2 x}{\sqrt{\sin x}}\,\cos x\,dx = \int \frac{1 - \sin^2 x}{\sqrt{\sin x}}\,\cos x\,dx.$$
Put sin x = t. Then cos x dx = dt and
$$\int \frac{1 - \sin^2 x}{\sqrt{\sin x}}\,\cos x\,dx = \int \frac{1 - t^2}{\sqrt{t}}\,dt = \int (t^{-1/2} - t^{3/2})\,dt = 2t^{1/2} - \frac{2}{5}t^{5/2} + C = 2\sqrt{\sin x} - \frac{2}{5}\left(\sqrt{\sin x}\right)^5 + C. \quad ►$$

(b) Let α and β be positive even numbers, i.e., α = 2m and β = 2n, where m and n are natural numbers. Here it is helpful to transform the integrand by applying the trigonometric identities
$$\sin^2 x = \frac{1 - \cos 2x}{2}, \qquad \cos^2 x = \frac{1 + \cos 2x}{2}. \qquad (*)$$
Suppose that m ≠ n. Then using the identities (*), we obtain
$$\int \sin^{2m}x\,\cos^{2n}x\,dx = \int (\sin^2 x)^m(\cos^2 x)^n\,dx = \int\left(\frac{1 - \cos 2x}{2}\right)^m\left(\frac{1 + \cos 2x}{2}\right)^n dx = \frac{1}{2^{m+n}}\int (1 - \cos 2x)^m(1 + \cos 2x)^n\,dx.$$
Applying the binomial theorem to the factors (1 − cos 2x)^m and (1 + cos 2x)^n and multiplying the polynomials thus obtained, we arrive at an integrand involving odd and even powers of cos 2x. The terms involving odd powers of cos 2x are easily integrable, as we have discussed in the preceding subsection. To integrate the terms involving even powers of cos 2x we apply to them the identity (*); this yields terms involving powers of cos 4x. Continuing this process we finally arrive at integrals of the form ∫cos kx dx, where k > 0; these integrals can be computed without difficulty.
Suppose now that m = n. Then applying the identity
$$\sin x\cos x = \frac{1}{2}\sin 2x,$$
we obtain
$$\int \sin^{2n}x\,\cos^{2n}x\,dx = \int (\sin x\cos x)^{2n}\,dx = \int\left(\frac{1}{2}\sin 2x\right)^{2n}dx = \frac{1}{4^n}\int \sin^{2n}2x\,dx = \frac{1}{4^n}\int\left(\frac{1 - \cos 4x}{2}\right)^n dx = \frac{1}{8^n}\int (1 - \cos 4x)^n\,dx.$$
Clearly, the integral on the right is easily computed as described above.


Examples. Compute each of the following integrals.
(1) ∫sin²x cos⁴x dx.
◄ We have
$$\int \sin^2 x\,\cos^4 x\,dx = \int \frac{1 - \cos 2x}{2}\left(\frac{1 + \cos 2x}{2}\right)^2 dx = \frac{1}{8}\int (1 + \cos 2x - \cos^2 2x - \cos^3 2x)\,dx$$
$$= \frac{1}{8}\int\left[1 + \cos 2x - \frac{1 + \cos 4x}{2} - (1 - \sin^2 2x)\cos 2x\right]dx = \frac{1}{8}\int\left(\frac{1}{2} - \frac{1}{2}\cos 4x + \sin^2 2x\,\cos 2x\right)dx$$
$$= \frac{1}{8}\left(\frac{x}{2} - \frac{1}{8}\sin 4x + \frac{1}{6}\sin^3 2x\right) + C. \quad ►$$
(2) ∫sin²x cos²x dx.
◄ We have
$$\int \sin^2 x\,\cos^2 x\,dx = \int (\sin x\cos x)^2\,dx = \frac{1}{4}\int \sin^2 2x\,dx = \frac{1}{4}\int \frac{1 - \cos 4x}{2}\,dx = \frac{1}{8}\left(x - \frac{1}{4}\sin 4x\right) + C. \quad ►$$

Integrals of the form ∫sin αx cos βx dx, ∫cos αx cos βx dx, ∫sin αx sin βx dx, where α ≠ β. To compute these integrals it is helpful to use the following trigonometric identities:
$$\sin\alpha x\,\cos\beta x = \frac{1}{2}\left[\sin(\alpha + \beta)x + \sin(\alpha - \beta)x\right],$$
$$\cos\alpha x\,\cos\beta x = \frac{1}{2}\left[\cos(\alpha + \beta)x + \cos(\alpha - \beta)x\right],$$
$$\sin\alpha x\,\sin\beta x = \frac{1}{2}\left[\cos(\alpha - \beta)x - \cos(\alpha + \beta)x\right].$$
For example, if we wish to compute ∫sin αx cos βx dx, then
$$\int \sin\alpha x\,\cos\beta x\,dx = \frac{1}{2}\int\left[\sin(\alpha + \beta)x + \sin(\alpha - \beta)x\right]dx = -\frac{1}{2}\left[\frac{\cos(\alpha + \beta)x}{\alpha + \beta} + \frac{\cos(\alpha - \beta)x}{\alpha - \beta}\right] + C.$$
The other two integrals are computed in a similar way.
Example. Compute ∫cos 3x cos x dx.
◄ We have
$$\int \cos 3x\,\cos x\,dx = \frac{1}{2}\int (\cos 4x + \cos 2x)\,dx = \frac{1}{2}\left(\frac{1}{4}\sin 4x + \frac{1}{2}\sin 2x\right) + C = \frac{1}{8}\sin 4x + \frac{1}{4}\sin 2x + C. \quad ►$$
Exercises
Apply the standard integration formulas to compute the following
integrals

1. fx2
Txdx. 2. f 1/.7 . f✓~ 3. dx. 4. f(vx + Jx )'ttx.
5. f
z<gx dx. 6. f 6' dx.
32x
f+ 7. 2x
Hf
5x
dx. 8. f cos2x d
cos 2 x sin 2 x x.

9. f sin2x
sinx cos 3 x
d x. lO.f + si? 7x
Sill 5X
sin 3x dx.
COS 2x
11. f (tan x - cot x)2dx.

f sinh 2x dx.
12.
f cos2 xdx sin2 x · 13· f sinh2 x dxcosh2 x · 14· J coshx

ltanh 2 x dx. 16. fcoth2 x dx.


15.
J J 17.
Jf 4x fx+ 9 . 18.
Jf 4x fx- 9
.

19. f.J dx . 20. f dx .


J 4x + 9 2 J✓ 9 - 4x 2

Use integration by substitution to compute the following integrals

21. fxeX 212 dx. 22. lx 2 sin x33 dx. 23. f✓ x dx. 24. f dx .
J J J 1 - x2 J X In X
25.
JW<~\ ¼) . u. JW ;x-Tx . 27 " J✓/: 1 .

28. f f' f dx. 29. x2(x - 1) 18 dx. 30. f x-.J dx . 31. fx'-fx+Tdx.
J e -1 J J x-1 J
32. rJ Vx + 1 dx. 33. Jr✓ 1 x'- x dx. 34. Jr 1 +dxx 35. Jr✓ 1 x- x dx.
3 X
2
X 4 •
2
6

36. rJ In_sm2x
tan dx. 37. JIncotx
X d 38. J sin etan2x dx. X. 3
X
sin x COS X

39. J
tan - xe<tan- 2x i
dx. 40.
J lnx dx.
2
1
>
2
1+x x(4 + In x)
Use integration by parts to compute the following integrals

41. J xe - 'dx. 42. J x2' dx. 43. J x sin 2x dx. 44. J+


{I x)i' dx.

45. J (I + x In 2)2' dx. 46. J(2x - x 2 )e -, dx. 47. J tan - 1 x dx.

48. Jx tan - x dx. 49. Jsin - ix dx. 50. Jr


1 x2
COS X
dx. 51. r tan
J
X 2 X dx.

52. Jcos (In x) dx. 53. Jsin (In x) dx.


Integrate partial fractions

54. f 2 dx 55 f dx 56 f dx 57. J
dx
J 5 + 2x · • J 2 - 3x · • J (3x + 5) (1 - 2x)
5 •
3 •

58. f x2 x + 2 dx. 60. Jr x2 x- +X •+ 2 dX.


J +2x+5
61. f z xdx .
J X + 7X + 13

Applying the method of comparing coefficients, compute the following


integrals

62 _ f 2x + 3 dx. 63 _ f x dx . 64 _ f 3x2 - 2x - 4 dx.


Jx 2 + 3x - 10 J (x + 1)(2x + 1) J (x - 1)(x2 - 4)

65. jx 3-x + 2 dx. 66.


x - 1
2
j x 4 + 1 dx. 67.
3
x - x
j x + 2 dx. 68.
(x + 3)
2
j x 2 + 2x dx.
(x + 1)
4

69 f dx 70 f x2 - 3x d 71 f dx 72 f dx
· Jx + x 3 4 • • J (x + l)(x- 1)2 x. · J x(x + 1) 2 · · Jx 1
4 -

73. f : dx .
JX - 1

Compute the integrals involving irrational functions

74. f l ~ dx. 75. f ✓l +x dx.


J (1 - x)2 ✓ t+x J X

~
76. j(]~= x);I + x)' j: + ~ dx. 77. j ~T+x (1
d_x
+ x)2
78. f dx . 79. f dx . 80. f dx
J ✓x2 - 4x J ✓4x - x 2 J ✓x2 - 4x - 5 ·

81. f dx . 82. f dx . 83. f x + l dx.


J✓x 2 - 4x + 5 J✓ 5 - 4x - x 2 J✓ x 2 - 6x - 1

84. j ✓
x + 1
dx. 85.
j ✓
x - 3
dx. 86.
j 2x 2 + 3x + 2
--=====-- dx.
1 + 6x - x 2 1 + 6x - x 2 ✓ x 2 + 2x + 2

87. f 1-:; 2x- 3x' dx. 88. f ✓' + I dx. 89. f ✓x> dx . 90. f✓ x• dx.
J 1- x 2 J x2 +2 J 1- x2 J 1+ x 2

Compute the integrals involving trigonometric functions

91 • J2 + ~OS x . 92• j 5+ !~in x . 93• j 3 sin x ~ 4 cos x .

94. f dx . 95 _ f sin 2x dx. 96 _ f cosxdx .


J 5 + sin x + 3 cos x J 1 + sin 2 x J 5 + sin2 x- 6 sin x

97. j 1
1
+ cos x - sin
. x dx.
- COSX + SlnX
98. Jcos xdx. 3 99. Jsin xdx.
5

100. 1 sin 2 x cos 3 x dx. 101. 1 cos 4 x sin


. 3x dx. 102. 1 sin 34x dx.
COS X

103.1 sin 2x cos 2 x dx. 104. 1 sin4 x dx. 105. 1 sin4 x cos 2 x dx.

106. f dx
J sin4 x ·
107. r cos2
J sin4x
X dx. 108. f dx
J sin2 X cos4 X
109. 1 dx
cos 4 x - sin4 x ·
110. 1 sin 5x cos x dx. 111. 1 sin x cos 5x dx.

112. 1 cos 7xcos 3xdx. 113. 1 sin 15x sin !Ox dx. 114. 12cos X cos 3
X dx.

115. 1 sin x sin 2x sin 3x dx.

Answers
1. 0.3 x 1013 + C. 2. 4'Vi + C. 3. -~ x 1518 + C.
15
4. ~
.t.
+ 2x + ln I x I + C.

5. 16x + C. 6. _1_ (2)x + C. 7. - l5n-5x - 2ln-2x + C. 8. - cot x - tan X + C.


In 16 In I 3
3

9. 2 tan x + C. 10. 2x + C. 11. tan x - cot x - 4x + C. 12. tan x - cot x + C.


13. - tanh x - coth x + C. 14. 2 cosh x + C. 15. x - tanh x + C. 16. x - coth x + C.

In ✓ 2x + ✓ 4x 2
1 2x - 3
17. 1
6 tan
-I2x
3 + C. 18 .
12 In 2x + 3
+ C. 19. + 9 + C.

20.½sin- 1 ~x+C. 21. ~ 12 + C. 22. -cos ~ 3 +C. 23. - ✓ 1-x2 + C. 24. Inllnxl +C.

25. 4 ln (1 + \'x) + C. 26. - 8 ✓ 1 - Vx + C. 27. In ✓ ex +1 - 1 + C. 28. ln I ex - 1 I +


✓ ex+l +1
(x - 1)2, (x - 1)20 (x - 1)'9
ex + C. 29. 21 + 10 + 19 + C. 30. 2 tan - 1 ✓x - 1 .

3 3 t
31. 28 (x + 1)413 (4x - 3) + C. 32. 10 (x + 1)213 (2x - 3) + C. 33. -(2 + x 2 )v l -x 2 +C.

34.~ tan- 1 x 2 +C. 35.! sin- 1 x 3 +C. 36.¼ln 2 tanx+C. 37.ln lnsinxl +C. 38.½etan 2 x+c. I
1
2 e<tan-•)zx + C. 40.
39.- ln ✓4 + ln 2 x + C. 41. -(x + l)e-x + C. 42. xln~ - 1 2x + C.
In 2

43. ¼ sin 2x -1 cos 2x + C. 44. xex + C. 45. x 2x + C. 46. x 2e-x + C. 47. x tan- 1x -

In ✓ 1 + x2 + C. 48. ~ (x2 + 1) tan - 1 x - ½ x + C. 49. x sin - 1 x + ✓ 1 - x2 + C.



x2 X
50. x tan x + In I cos x I + C. 51. - 2 + x tan x + In I cos x I + C. 52. 2 (cos In x +

sin In x) + C. 53. 1(sin In x - cos In x) + C. 54. In 5 + 2x I I + C. 55. -½ In I 2 - 3x I + C.

56.- 1 2 +C. 57.


6(3x+ 5)
1 4 +C. 58. -~ tan- 1 x
8(1 - 2x) v2 2
1 +C. 59. Ji In ✓x 2 +2x+5 +

ltan- 1 x + 1 + C. 60.1n ✓x 2 1-+c. 61. In ✓x 2 +7x+13 -


- x + 2 + -:--~ tan- 1 ~_;
2 2 v7 v7
2x+7
7
v3 tan
_1
v3 + C. 62. In 1 x 2 + 3x - IO I + C. 63. In ✓ lx+II I
I 2x + 1 I + C. 64. In (x -

l)(x 2 - 4) I + C. 65. ~ + In ; ~ ! + C. 66. 2


x2 x 2 -1
+ In - x - - + C. 67.
I
x+ 3- +

1 1 1
In Ix + 3 I + C. 68. --+C. 69. - --- + In - - ---
X
-- + C.
3(x + 1) 3 x+ X 2x 2 X + 1

70. - 1
--1 - + In I x + 1 I + C. 71. In Ix I + C.
x- ✓x2+1

72. 4
1
In
x-1
X + 1
- 1-tan_,
2
x + C. 73. l
3
(1n Ix - 1 I
✓x2+ x + 1
+ v'3 tan - 1 2x+l)+c.
v3
74. ,Ji +x +C. 75. 2✓ 1 + x + In
-Jl + X - 1
+C. 76.
X

✓ 1 - x2
+ C.


1-x -Jl + X + 1

77. -83 c-x 1+ X +C. 78. In Ix - 2 + ✓ x 2 - 4x I +C. 79. ·


sm
-1X-2
2 + C.

80. In Ix - 2 + ✓ x 2- 4x- 5 I + C. 81. ln(x - 2 + ✓ x 2- 4x + 5 ) + C. 82. sin - 1 x ~} .

83. ✓ x 2 - 6x - 1 + 4 In Ix - 3 + ✓ x 2 - 6x - I I + C. 84. 2✓ 1 + 6x -- x 2 +
._,x-3+c
4 sm --- . 85. - ✓ 1+6x-x 2 +C. 86. x✓x2 +2x+2 +C.
v'io

87.x 2✓ 1-x 2 +sin- 1 x+C.88.~-In(x+ ✓x 2 +2)+C.89. _ s+ 4~:+ 3x 4 x

X X
2 _ 1 5 tan 2 +4 1 2 tan- - 1
2
91. In ( 3 + tan 2 ; ) + C. 92. 3 tan 3 + C. 93. 5 In +C.
X
2 tan 2 + 1

2 2tan~+l
94. - - tan - 1 ----- + C. 95. ln(l + sin 2 x) + C. 96. _!_ In 5 - sin x +C
✓-f5 ✓-f5 4 1 - sin x

tan x
97. -x + 2 ln
2 · x -
+ C. 98 . sm l.3
3 sm x + C. 99. - 51 cos s x + 3
2 cos 3 x -
X
1 + tan 2

. 3 x - I sm
cos x + C. 100. 31 sm . s x + C. 101 . I cos7 x - 1 cos s x + C. 102• 1
5 1 5 3 cos 3 x

1 x sin 4x 3 1 1 x
cos x + C. 103. 8- 32 + C. 104. 8 x - 4 sin 2x + 32 sin 4x + C. 105. 16 -

sin 4x _ sin 3 2x + C. 106• -cot x - 1 cot 3 x + C. 107. - 1 cot 3 x +


48 3 3 c. 108. tan x +
64

1 3
3 tan x - 2 cot 2x + C. 109. - 2
1 1n tan ( x - ,r)
4 + C. 110• - cos126x _ co~ 4x + C.

111 • _ co:26x + co~ 4x + C. 112_ si;~ox + si~4x + C. 113. _ sin5~5x + si~gx +C.

3 5 x cos6x cos4x cos 2x


114. 5 sin 6 x + 3 sin 6 + C 11S. 24 16 8 +C
Chapter 10
Integral Calculus.
The Definite Integral

10.1 Basic Concepts and Definitions


Area of a curvilinear figure. Suppose we wish to compute the area
of the plane figure aABb bounded by the graph of the positive-valued con-
tinuous function f(x), the x-axis and the vertical lines x = a and x = b
(Fig. 10.1).
Let us divide (or partition) the closed interval [a, b] into n subintervals by choosing points x₀, x₁, x₂, ..., x_n such that
$$x_0 = a < x_1 < x_2 < \ldots < x_{n-1} < x_n = b.$$

Fig. 10.1

Next we take an arbitrary point ξ_k (x_{k−1} ≤ ξ_k ≤ x_k) in each subinterval [x_{k−1}, x_k] and construct n rectangles with bases [x_{k−1}, x_k] and heights f(ξ_k), k = 1, 2, ..., n. The area of the kth rectangle is
$$\Delta Q_k = f(\xi_k)\Delta x_k,$$
where Δx_k = x_k − x_{k−1} is the base length.
By doing so we obtain a polygonal figure made up of n rectangles; its
area Qn is equal to the area of the union of these rectangles so that we

can write
$$Q_n = f(\xi_1)\Delta x_1 + f(\xi_2)\Delta x_2 + \cdots + f(\xi_n)\Delta x_n = \sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$

Clearly, if we divide the interval [a, b] into smaller subintervals the


number of subintervals increases and the lengths of these subintervals be-
come smaller so that the polygonal figure thus obtained comes closer to
the plane figure aABb.
Let λ = max₁≤k≤n Δx_k be the largest of the lengths of the subintervals [x_{k−1}, x_k], k = 1, 2, ..., n. Evidently, the number of subintervals tends to infinity and their lengths Δx_k tend to zero as λ → 0, since 0 ≤ Δx_k ≤ λ for all k = 1, 2, ..., n.
Suppose that there exists a finite limit Q of the areas Q_n of the polygonal figures as λ = max₁≤k≤n Δx_k → 0. Then this limit is said to be equal to the area of the plane figure aABb, i.e.,
$$Q = \lim_{\lambda\to 0} Q_n = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$
This limit, if it exists, must be the same for any collection of subintervals [x_{k−1}, x_k] and any choice of points ξ_k in the subintervals. Thus computing the area of the plane curvilinear figure aABb becomes equivalent to evaluating a limit of the form
$$\lim_{\max\Delta x_k\to 0}\ \sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$
Distance travelled by a particle. We wish to find the distance S travelled by a particle during the time interval from t = t₀ to t = T, provided that the velocity v of the moving particle is described by the function v = f(t).
Let us partition the interval [t₀, T] into n small subintervals by choosing points t₀, t₁, t₂, ..., t_n such that
$$t_0 < t_1 < t_2 < \ldots < t_n = T.$$
Suppose that the velocity v(t) remains nearly unchanged on each subinterval [t_{k−1}, t_k], so that it may be thought of as being equal to the value of v(t) at τ_k ∈ [t_{k−1}, t_k], i.e., v = f(τ_k), k = 1, 2, ..., n. Then the distance travelled by the particle during the time interval Δt_k = t_k − t_{k−1} is nearly equal to S_k = f(τ_k)Δt_k; hence we can approximate the distance S_n travelled by the particle during the time interval [t₀, T] as
$$S_n = S_1 + S_2 + \ldots + S_n = f(\tau_1)\Delta t_1 + f(\tau_2)\Delta t_2 + \ldots + f(\tau_n)\Delta t_n = \sum_{k=1}^{n} f(\tau_k)\Delta t_k.$$
We denote the largest of the subinterval lengths Δt_k, k = 1, 2, ..., n, by λ = max Δt_k. The number of subintervals increases infinitely and their lengths tend to zero as λ → 0. Evaluating the limit of S_n as λ → 0, we obtain the exact value S of the distance travelled by the particle from t₀ to T; so we write
$$S = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\tau_k)\Delta t_k.$$

Thus we have arrived at limits of the same form as before; if these two limits exist they are called the definite integrals of the functions f(x) and f(t), denoted by the symbols $\int_a^b f(x)\,dx$ and $\int_{t_0}^{T} f(t)\,dt$, respectively.

Definition of the definite integral. Let f(x) be a function defined on the closed interval [a, b] (a < b). Proceeding as before, we divide [a, b] into n subintervals by choosing points
$$x_0 = a < x_1 < x_2 < \ldots < x_{n-1} < x_n = b.$$
We call this collection of subintervals of [a, b] a partition of [a, b]. Let Δx_k = x_k − x_{k−1} > 0 be the length of the kth subinterval and let ξ_k be a point of the kth subinterval for each k = 1, 2, ..., n. A collection of points ξ₁, ξ₂, ..., ξ_n is called a selection for a given partition of [a, b]. Given a partition of [a, b] and a selection for this partition, we can evaluate the sum
$$S_n = f(\xi_1)\Delta x_1 + f(\xi_2)\Delta x_2 + \cdots + f(\xi_n)\Delta x_n = \sum_{k=1}^{n} f(\xi_k)\Delta x_k,$$
where f(ξ_k) is the value of f(x) at the point ξ_k ∈ [x_{k−1}, x_k]. This sum is called the integral (Riemann) sum for f(x) determined by the given partition of [a, b] and the given selection for this partition; hence the value of S_n depends on the partition of [a, b] and on the selection for the given partition, i.e., it depends on the way of dividing the interval [a, b] into subintervals [x_{k−1}, x_k] and of choosing the points ξ_k in these subintervals, k = 1, 2, ..., n.
Let λ be the largest of the lengths of the subintervals [x_{k−1}, x_k], k = 1, 2, ..., n, i.e., λ = max₁≤k≤n Δx_k. We say that a number J is the limit of the integral sums Σ_{k=1}^{n} f(ξ_k)Δx_k for f(x) on [a, b] if, given any number ε > 0, there exists a number δ > 0 such that
$$\left|\sum_{k=1}^{n} f(\xi_k)\Delta x_k - J\right| < \varepsilon$$
for every partition of [a, b] with Δx_k < δ, k = 1, 2, ..., n, i.e., for every partition with λ < δ, and every selection ξ_k, k = 1, 2, ..., n.

In symbols, we write
$$J = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$
Notice that the number δ depends on the value of ε; to signify this we shall sometimes write δ = δ(ε).
If, given any partition of the closed interval [a, b], a < b, and any selection ξ_k, k = 1, 2, ..., n, the integral sum Σ_{k=1}^{n} f(ξ_k)Δx_k has the same finite limit J as λ → 0, this limit is called the definite (or Riemann) integral of f(x) on [a, b] and is denoted by $\int_a^b f(x)\,dx$. Thus by the definition of the definite integral we have
$$J = \int_a^b f(x)\,dx = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$
The numbers a and b are called the lower limit and the upper limit of the integral, respectively; x is called the variable of integration. The function f(x) is called the integrand and the expression f(x) dx is called the element of integration.
It is worth noting that by virtue of the definition the definite integral remains unchanged when at any point c in [a, b] the value f(c) of the integrand is replaced by some other number. In other words, if we replace the integrand f(x) by the function
$$g(x) = \begin{cases} f(x) & \text{for } x \in [a, b],\ x \neq c,\\ A & \text{for } x = c,\end{cases}$$
where A ≠ f(c), then
$$\int_a^b f(x)\,dx = \int_a^b g(x)\,dx.$$
This is also true if f(x) is modified at any finite number of points in [a, b].
The definition above applies only if a < b; it is also convenient to include the cases a = b and b < a. We put
$$\int_a^a f(x)\,dx = 0 \quad\text{for } b = a \qquad\text{and}\qquad \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \quad\text{for } b < a.$$
Example. Compute $\displaystyle\int_a^b dx$.
◄ By the definition we have
$$\int_a^b dx = \lim_{\lambda\to 0}\sum_{k=1}^{n}\Delta x_k = \lim_{\lambda\to 0}\sum_{k=1}^{n}(x_k - x_{k-1})$$
$$= \lim_{\lambda\to 0}\left[(x_1 - x_0) + (x_2 - x_1) + \ldots + (x_{n-1} - x_{n-2}) + (x_n - x_{n-1})\right] = \lim_{\lambda\to 0}(x_n - x_0) = \lim_{\lambda\to 0}(b - a) = b - a. \quad ►$$
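Riemann sums can also be evaluated numerically to watch this convergence. The sketch below (plain Python, an illustration that is not part of the original text) approximates the area under f(x) = x² on [0, 1], whose exact value 1/3 the sums approach as the partition is refined:

```python
# Sketch: left-endpoint Riemann sums for f(x) = x**2 on [0, 1] with a uniform partition.
def riemann_sum(f, a, b, n):
    dx = (b - a) / n                                  # all subintervals have the same length here
    return sum(f(a + k*dx) * dx for k in range(n))    # xi_k chosen as the left endpoint

f = lambda x: x**2
for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(f, 0.0, 1.0, n))             # tends to 1/3 as n grows (lambda -> 0)
```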
Integrable functions. A function f(x) defined on the closed interval [a, b] is said to be integrable on [a, b] if there exists the definite integral $\int_a^b f(x)\,dx$.
Theorem 10.1. If the function f(x) is integrable on the closed interval [a, b] then f(x) is bounded on [a, b].
◄ Let f(x) be unbounded on [a, b]. We divide [a, b] into n subintervals [x_{k−1}, x_k], k = 1, 2, ..., n. Since f(x) is unbounded on [a, b], there exists a subinterval, say [x₀, x₁], where f(x) is unbounded. Consider a selection ξ₁, ξ₂, ..., ξ_n and the integral sum
$$S_n = \sum_{k=1}^{n} f(\xi_k)\Delta x_k = f(\xi_1)\Delta x_1 + \sum_{k=2}^{n} f(\xi_k)\Delta x_k.$$
Clearly, by a suitable choice of ξ₁ in [x₀, x₁], where f(x) is unbounded, it is possible to make the modulus |S_n| of S_n arbitrarily large while the points ξ₂, ξ₃, ..., ξ_n are kept the same, i.e., while the sum Σ_{k=2}^{n} f(ξ_k)Δx_k remains unchanged. This implies that the integral sum S_n has no finite limit as max₁≤k≤n Δx_k → 0, i.e., f(x) is not integrable on [a, b]. Whence it follows that if f(x) is integrable on [a, b] then f(x) is bounded on [a, b]. ►
Remark. If a function is bounded on [a, b] it is not necessarily integrable on this interval; in other words, a function can be bounded on [a, b] but not integrable on [a, b]. For example, the Dirichlet function
$$f(x) = D(x) = \begin{cases} 1 & \text{for rational } x,\\ 0 & \text{for irrational } x\end{cases}$$
is bounded on the closed interval [0, 1] since |f(x)| ≤ 1 for all x in [0, 1]; however, f(x) is not integrable on [0, 1].
◄ Indeed, the integral sum $S_n = \sum_{k=1}^{n} f(\xi_k)\Delta x_k$ becomes $S_n = \sum_{k=1}^{n} 1\cdot\Delta x_k = \sum_{k=1}^{n}\Delta x_k = 1$ for every selection of rational points ξ_k, k = 1, 2, ..., n, and $S_n = \sum_{k=1}^{n} 0\cdot\Delta x_k = 0$ for every selection of irrational points ξ_k. Hence, for any arbitrarily small λ = max₁≤k≤n Δx_k the integral sum S_n is equal either to 1 or to 0; this means that S_n has no limit as λ → 0, i.e., f(x) is not integrable on [0, 1]. ►
We shall give without proof three theorems that outline sufficient conditions for a function to be integrable on a closed interval.
Theorem 10.2. If the function f(x) is continuous on the closed interval [a, b] then f(x) is integrable on [a, b].
For example, the function f(x) = e^{−x²} is continuous on [0, a], where a is an arbitrary number, and, consequently, f(x) is integrable on [0, a], i.e., there exists the definite integral $\int_0^a e^{-x^2}\,dx$ of this function.
Theorem 10.3. If the function f(x) is defined and monotone on the closed interval [a, b] then f(x) is integrable on [a, b].
It is worth noting that all values of a function f(x) which is monotone on [a, b] lie between the numbers f(a) and f(b); so f(x) is bounded on [a, b].
Theorem 10.4. Let a function f(x) be bounded on a closed interval [a, b] and let f(x) have a finite number of discontinuities (of the first or second kind) on [a, b]. Then f(x) is integrable on [a, b].
For example, the function
$$f(x) = \begin{cases} \sin\dfrac{1}{x} & \text{for } x \neq 0,\\ 1 & \text{for } x = 0\end{cases}$$
is integrable on the closed interval [0, 1] since |f(x)| ≤ 1 for all x in [0, 1], i.e., f(x) is bounded on [0, 1], and f(x) has its only discontinuity (of the second kind) at x = 0.

10.2 Properties of the Definite Integral


We shall derive some properties of definite integrals assuming that all functions in question are continuous and, consequently, integrable on a closed interval [a, b].
(1) The definite integral depends only on its lower and upper limits, i.e., on the numbers a and b, and on the integrand f(x); it is independent of the variable of integration. So the definite integral remains unchanged when x is replaced by any other symbol, i.e.,
$$\int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(u)\,du.$$
(2) A constant factor can be taken outside the integral sign, so that
$$\int_a^b Af(x)\,dx = A\int_a^b f(x)\,dx, \qquad A = \text{const}.$$
◄ By the definition of the definite integral we have
$$\int_a^b Af(x)\,dx = \lim_{\lambda\to 0}\sum_{k=1}^{n} Af(\xi_k)\Delta x_k = A\lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k = A\int_a^b f(x)\,dx. \quad ►$$

(3) The definite integral of the sum (difference) of two functions is equal to the sum (difference) of the integrals of these functions, i.e.,
$$\int_a^b [f_1(x) \pm f_2(x)]\,dx = \int_a^b f_1(x)\,dx \pm \int_a^b f_2(x)\,dx.$$
◄ Indeed,
$$\int_a^b [f_1(x) \pm f_2(x)]\,dx = \lim_{\lambda\to 0}\sum_{k=1}^{n}[f_1(\xi_k) \pm f_2(\xi_k)]\Delta x_k = \lim_{\lambda\to 0}\sum_{k=1}^{n} f_1(\xi_k)\Delta x_k \pm \lim_{\lambda\to 0}\sum_{k=1}^{n} f_2(\xi_k)\Delta x_k = \int_a^b f_1(x)\,dx \pm \int_a^b f_2(x)\,dx. \quad ►$$

Corollary. The linearity property of the definite integral:
$$\int_a^b [A_1f_1(x) + A_2f_2(x)]\,dx = A_1\int_a^b f_1(x)\,dx + A_2\int_a^b f_2(x)\,dx,$$
where A₁ and A₂ are arbitrary constants.


(4) Given any numbers a, b and c there holds
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$$
provided that both integrals on the right exist. This property is often called the interval union property.
◄ We distinguish two cases.
(a) Let a < c < b. Then by the definition of the definite integral we have
$$\int_a^b f(x)\,dx = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k.$$
Since the definite integral is independent of the partition of [a, b], we can include the point c among the points of the partition by choosing the partition (Fig. 10.2) as
$$x_0 = a < x_1 < x_2 < \ldots < x_m = c < x_{m+1} < \ldots < x_n = b.$$
Then the integral sum Σ_{k=1}^{n} f(ξ_k)Δx_k associated with this partition of [a, b] can be split into two integral sums as
$$\sum_{k=1}^{n} f(\xi_k)\Delta x_k = \sum_{k=1}^{m} f(\xi_k)\Delta x_k + \sum_{k=m+1}^{n} f(\xi_k)\Delta x_k,$$
where the sums on the right are associated with the partitions of [a, c] and of [c, b], respectively.

Fig. 10.2

Evaluating the limit as λ = max₁≤k≤n Δx_k → 0, we obtain
$$\int_a^b f(x)\,dx = \lim_{\lambda\to 0}\sum_{k=1}^{n} f(\xi_k)\Delta x_k = \lim_{\lambda\to 0}\sum_{k=1}^{m} f(\xi_k)\Delta x_k + \lim_{\lambda\to 0}\sum_{k=m+1}^{n} f(\xi_k)\Delta x_k = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
(b) Now let a < b < c. As before we have
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx,$$
whence
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx - \int_b^c f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
Given f(x) > 0 and a < c < b, this property implies that the area of the curvilinear figure aABb is equal to the sum of the areas of the curvilinear figures acCA and cbBC, as is easily seen from Fig. 10.2. ►
(5) Let the functions f(x) and g(x) be such that f(x) ≤ g(x) on [a, b]. Then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
This means that integration preserves inequalities between functions.
◄ Since f(x) ≤ g(x) at every point x in [a, b], for any partition of [a, b] and any selection ξ_k, k = 1, 2, ..., n, there holds
$$\sum_{k=1}^{n} f(\xi_k)\Delta x_k \le \sum_{k=1}^{n} g(\xi_k)\Delta x_k.$$
Evaluating the limit as λ = max₁≤k≤n Δx_k → 0, we get for a ≤ b
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \quad ►$$
Fig. 10.3

Remark. If f(x) ≥ 0 and g(x) ≥ 0 on [a, b], this property means that the area of the curvilinear figure abB₁A₁ does not exceed the area of abB₂A₂ (Fig. 10.3). In particular, this property implies that
$$\int_a^b f(x)\,dx \ge 0 \quad\text{for } f(x) \ge 0 \qquad\left(\int_a^b f(x)\,dx \le 0 \quad\text{for } f(x) \le 0\right)$$
on [a, b], where a < b.
(6) Let a < b. Then
$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx.$$
◄ Integrating the inequalities
$$-|f(x)| \le f(x) \le |f(x)|$$
from a to b, we obtain
$$-\int_a^b |f(x)|\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx,$$
so that
$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx. \quad ►$$

(7) Let m and M be the minimum and maximum values of f(x), respectively, on [a, b], where a < b. Then
$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a).$$
◄ Since m ≤ f(x) ≤ M for all x in [a, b], Property 5 yields
$$\int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx.$$
Observe that
$$\int_a^b m\,dx = m\int_a^b dx = m(b - a) \qquad\text{and}\qquad \int_a^b M\,dx = M\int_a^b dx = M(b - a),$$
whence
$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a). \quad ►$$

Remark. Given a function f(x) > 0 on [a, b], where a < b, this property implies that the area Q of the curvilinear figure abBA lies between the areas Q₁ and Q₂ of the rectangles abB₁A₁ and abB₂A₂ (Fig. 10.4), i.e., Q₁ ≤ Q ≤ Q₂.
Fig. 10.4

211"
dx
Examples. (1) Evaluate the integral )
✓ 10 + 6 sinx
0
◄ Since
. 1 1
m = min = 0.25
o~x~2'11" ✓ 10 + 6 sin x ✓ 10 + 6 sinx X=-
'If

2
and
1 1
M= max = 0.50.
O~x~h ✓ lo + 6 sinx ✓ 10+6sinx X=
3
'IT
2
Then Property 7 yields
211"

2,r X 0.25 ~ dx
) ----;:=====- ~ 21r X 0.50
✓ 10 + 6 sinx
0
and
211'
dx
7r ~ ) --;:::::::=====- ~
2 7r. ►
✓ 10 + 6 sinx
0
(2) Determine which of the integrals
$$\int_0^1 e^{-x^2}\,dx \qquad\text{and}\qquad \int_0^1 e^{-x}\,dx$$
is larger without direct computation of their values.
◄ We have x² ≤ x for all x in [0, 1], whence −x ≤ −x². Since e > 1, it follows that e^{−x} ≤ e^{−x²}, and Property 5 yields
$$\int_0^1 e^{-x^2}\,dx \ge \int_0^1 e^{-x}\,dx. \quad ►$$

10.3 Fundamental Theorems for Definite Integrals


Mean value theorem. We start with a theorem which states that every continuous function on a closed interval attains a specific value called the mean value of the function.
Theorem 10.5 (mean value theorem). Let f(x) be a continuous function on a closed interval [a, b]. Then there exists at least one point ξ in [a, b] such that
$$\int_a^b f(x)\,dx = (b - a)f(\xi), \qquad a \le \xi \le b.$$
◄ Since f(x) is continuous on [a, b], it has a minimum value m and a maximum value M on [a, b] (see Fig. 10.4). Then Property 7 yields
$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a),$$
whence
$$m \le \frac{1}{b - a}\int_a^b f(x)\,dx \le M$$
because b − a > 0. Put
$$\frac{1}{b - a}\int_a^b f(x)\,dx = \mu, \qquad m \le \mu \le M.$$
Since f(x) is continuous on [a, b], it attains all intermediate values between m and M. This implies that there exists a point x = ξ in [a, b] such that f(ξ) = μ, i.e.,
$$\frac{1}{b - a}\int_a^b f(x)\,dx = f(\xi) \qquad\text{or}\qquad \int_a^b f(x)\,dx = (b - a)f(\xi), \qquad a \le \xi \le b. \quad ►$$

Remark. For a < b we have a ≤ ξ ≤ b. Put
$$\frac{\xi - a}{b - a} = \theta, \qquad 0 \le \theta \le 1.$$
Then ξ = a + (b − a)θ and we can rewrite the conclusion of this theorem as
$$\int_a^b f(x)\,dx = (b - a)f[a + (b - a)\theta], \qquad 0 \le \theta \le 1.$$

Fig. 10.5

To interpret the mean value theorem for integrals we look at Fig. 10.5. The curvilinear figure abBA is bounded by the graph AB of the function f(x), which is nonnegative on [a, b] (a < b), the x-axis and the vertical lines x = a and x = b. We let Q₁ denote the area of abBA; then ∫ₐᵇ f(x) dx = Q₁. For the rectangle abNM the base is [a, b] and the height is equal to the ordinate of the point C(ξ, f(ξ)), so the area Q₂ of abNM is Q₂ = (b − a)f(ξ). The mean value theorem tells us that there exists a point C(ξ, f(ξ)) on the graph AB such that Q₁ = Q₂.
The number
$$M[f(x)] = \frac{1}{b - a}\int_a^b f(x)\,dx$$
is called the mean value of the function f(x) on [a, b]. If a function f(x) is continuous on [a, b], then there exists a point ξ in [a, b] such that M[f(x)] = f(ξ).
Example. Compute the mean value of f(x) = sin x on [0, π].
◄ By virtue of the above definition we have
$$M[\sin x] = \frac{1}{\pi - 0}\int_0^{\pi}\sin x\,dx = \frac{1}{\pi}(-\cos\pi + \cos 0) = \frac{2}{\pi}. \quad ►$$
We have applied here the Newton-Leibniz theorem, which will be derived later in this section.
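As a numerical illustration (a sketch in plain Python, not part of the original text), the average of sin x sampled over [0, π] approaches the mean value 2/π ≈ 0.6366 computed above:

```python
# Sketch: approximating the mean value of sin x on [0, pi] by averaging samples.
import math

n = 100000
samples = (math.sin(math.pi * (k + 0.5) / n) for k in range(n))  # midpoints of a uniform partition
print(sum(samples) / n)        # ~ 0.636619..., i.e. 2/pi
print(2 / math.pi)
```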
Fundamental theorems of calculus. Let f(x) be a continuous function on [a, b]. We choose an arbitrary point x in [a, b] and consider the definite integral ∫ₐˣ f(x) dx. Since the definite integral is independent of the variable of integration, we can substitute t for x and write
$$\int_a^x f(x)\,dx = \int_a^x f(t)\,dt.$$
Since f(x) is continuous on [a, b], this integral exists for every x in [a, b]. Consequently, the integral becomes a function of its upper limit x. We let F(x) denote this function, i.e.,
$$F(x) = \int_a^x f(t)\,dt.$$
Theorem 10.6 (first fundamental theorem of calculus). Let f(x) be continuous on [a, b]. Then the function $F(x) = \int_a^x f(t)\,dt$ has a derivative at every point x in [a, b] and F′(x) = f(x).
In other words, the derivative of the definite integral with respect to its upper limit is equal to the value of the integrand at the upper limit of the integral.
◄ Consider an increment Δx ≠ 0 such that x + Δx ∈ [a, b]. This increment gives the increment ΔF in F(x). By virtue of the interval union property (Property 4) of the definite integral we have
$$\Delta F = F(x + \Delta x) - F(x) = \int_a^{x+\Delta x} f(t)\,dt - \int_a^x f(t)\,dt = \int_a^{x+\Delta x} f(t)\,dt + \int_x^a f(t)\,dt = \int_x^{x+\Delta x} f(t)\,dt.$$
Applying the mean value theorem, we obtain
$$\Delta F = (x + \Delta x - x)f(x + \theta\Delta x) = \Delta x\,f(x + \theta\Delta x),$$
whence
$$\frac{\Delta F}{\Delta x} = f(x + \theta\Delta x), \qquad 0 \le \theta \le 1.$$
Observe that f(x) is continuous at every point x in [a, b]. Then, evaluating the limit of the above ratio as Δx → 0, we get
$$\lim_{\Delta x\to 0}\frac{\Delta F}{\Delta x} = \lim_{\Delta x\to 0} f(x + \theta\Delta x) = f(x),$$
that is,
$$F'(x) = f(x) \qquad\text{or}\qquad \left(\int_a^x f(t)\,dt\right)' = f(x) \quad\text{for all } x \text{ in } [a, b]. \quad ►$$
Remark. If f(x) is a continuous function on [a, b], then for every x in [a, b] there holds
$$\left(\int_x^b f(t)\,dt\right)' = \left(-\int_b^x f(t)\,dt\right)' = -\left(\int_b^x f(t)\,dt\right)' = -f(x).$$
For example, $\left(\int_a^x e^{-t^2}\,dt\right)' = e^{-x^2}$ and $\left(\int_x^b e^{-t^2}\,dt\right)' = -e^{-x^2}$.
Theorem 10.7 (second fundamental theorem of calculus). Let f(x) be a continuous function on a closed interval [a, b]. Then f(x) has an antiderivative on [a, b] and, consequently, f(x) has an indefinite integral.
◄ Since f(x) is continuous on [a, b], there exists the definite integral ∫ₐˣ f(t) dt for every x in [a, b], i.e., there exists the function F(x) = ∫ₐˣ f(t) dt such that F′(x) = f(x) for all x in [a, b]. This means that F(x) is an antiderivative of f(x) on [a, b]. Whence it follows that the indefinite integral of the function f(x) continuous on [a, b] admits a representation of the form
$$\int f(x)\,dx = \int_a^x f(t)\,dt + C,$$
where C is an arbitrary constant. ►


10.3 Fundamental Theorems for Definite Integrals 471

Newton-Leibniz theorem. We now turn our attention to a theorem


which is helpful in computations of definite integrals.
Theorem 10.8 (Newton-Leibniz theorem). Let f(x) be a continuous func-
tion on a closed interval [a, b] and let F(x) be an antiderivative of f(x)
on [a, b]. Then
b
) /(x) dx = F(b) - F(a).
a

◄ Consider the function


X

cl>(x) = ) /(t) dt, x E [a, b].


a

This function is an antiderivative of /(x) on [a, b ]. Recall that any two


antiderivatives of a given function differ from each other by a constant;
hence there exists a constant C such that
X

cl>(x) = F(x) + C or ) /(t) dt = F(x) + C


a
for all x in [a, b].
a a
Put x = a. Then ) /(t) dt = F(a) + C. Since I /(t) dt = 0, we have
a a
F(a) +C = O; whence C = -F(a). Thus we arrive at
X

) /(t) dt = F(x) - F(a).


a
b
Put now x = b. Then ) f(t) dt = F(b) - F(a) and, substituting x for
a
t, we arrive at the desired formula
b
I /(x) dx = F(b) - F(a). ►
a
b
Remark. Using the notation F(b) - F(a) = F(x) , we can write the
a
Newton-Leibniz theorem as
b b
I /(x) dx = F(x)
a
a'

where f(x) = F' (x).


The Newton-Leibniz theorem establishes a close relation between the
definite integral of /(x) and its antiderivative F(x) so that the computation
of the former becomes the evaluation of the latter.
472 10. Integral Calculus. The Definite Integral

Examples. Compute the following definite integrals.


4
(1) ) x dx.
2

◄ Recall that
r
Jx dx = T
x2 x2
+ C and F(x) = T + C.

Then
4
r x2 4 42 22
J x dx = 2 2 =2 - 2 = 6. ►
2
11"

(2) ) sin x dx.


0

◄ We have
11"

j sin x dx = - cos x : - - cos 1r - ( - cos 0) = 2. ►


0

10.4 Evaluating Definite Integrals


Integration by substitution. Now we consider how a method of in-
tegration by substitution applies when we deal with definite integrals.
b
Theorem 10.9. Let there be given the integral ) f(x) dx where f(x) is
a
a continuous function on a closed interval [a, b] and let x = <P(t). Suppose
that the function <P(t) satisfies the following conditions:
(i) the function <PU) assumes values from a to b when t varies from
a to {3 so that <;?(a) = a, <P(f3) = band all intermediate values of cp(t) are
in [a, b];
(ii) the derivative cp' (t) of cp(t) is a continuous function on the closed
interval [a, {3].
Then
b {3
1f(x)dx = 1f[cp(t)]cp'(t)dt.
a a

◄ Applying the Newton-Leibniz theorem, we have


b
) f(x) dx = F(b) - F(a),
a
10.4 Evaluating Definite Integrals 473

where F(x) is an antiderivative of /(x) on [a, b ], i.e., F' (x) = /(x) for all
x in [a, b].
Consider the composite function 4'(t) = F [',O(t)] in ton [a, {l]. By the
chain rule of differentiation of a composite function we obtain
<l>'(t) = F' [',O(t)] 'P'(t) =f [',O(t)]'P'(t).
Hence 4'(t) is an antiderivative of the function/[',O(t)] 'P' (t) continuous
on [a, {l]. Then the Newton-Leibniz theorem yields the desired result,
namely
(j
I /[',O(t)] ',O' (t) dt = 4'(/1) - cl,(a)
a
b
= F['P(/1)] - F[',O(a)] = F(b) - F(a) = ) /(x) dx. ►
a

Remark. This method of integration is appropriate only if by a suitable


(j
choice of ',O(t) the integral 1/[',O(t)] 'P' (t) dt decomes easier to compute
a
b
than the original integral 1/(x) dx.
a
Notice that the method does not require the original variable be sub-
stituted for the new one.
We shall illustrate the integration by substitution by means of examples.
Examples. Integrate by substitution the following integrals.
a
(1) I ✓a 2 - x 2 dx (a > 0).
0

◄ Put x = a sin t. Then dx = a cos t dt and ✓ a 2 - x2 = a sin t. Sub-


stituting x = 0 and x = a into x = a sin t, we get two equations a sin t = 0
and a sin t = a; whence t = 0 and t = 1r/2 are the lower and upper limits,
respectively, of the new integral. Finally we have
11" 11"
a 2 2

J ✓a2 - x2 dx = a1 J cos t dt 2 = a2
J
1 + cos 2t
2
dt
0 0 0
11" 11"

e
ln2x
=2 ( 02 t
2

0
1 . 2
+ Z SID t :) -
1ra2
4
. ►

(2)
J
1
X
dx.
474 10. Integral Calculus. The Definite Integral

◄ Put x = e'. Since t =0 for x = l, t = l for x =e and t = In x,


e

j ln 2 x
x dx =
j t dt = Tt
1
2 3 1
= 31 . ►
0
I 0

It is worth noting that son1etimes it is convenient to use the substitution


t = 1/;(x) instead of x = 'P(t).
ln2
(3) 1✓ ex - l dx.
0

◄ Put t = ✓ ex - l . Then x = In (t 2 + 1) and dt = U&2 • Since


. t =0
1+t
for x =0 and t = l for x = In 2, respectively, we have
ln2 I

j ✓e• - I
t2 dt
1 + t2
=2 l
J
0 0

l 2
1+ t
) dt=2 (t 1-
0
tan - l t 01)

= 2(1 - tan - 1 1) =2 - ; . ►
1
(4) ) (2x3 - l)✓x4 - 2x + l dx.
0

◄ Put t = x 4 - 2x + 1. Notice that here we do not need to find the func-


tion x = 'P(t). On differentiating t = x 4 - 2x + l, we have
1
dt = (4x3 - 2)dx; whence (2x3 - 1) dx = 2 dt and

j
1
(2x3 - I)✓
,-----
x 4 - 2x I dx + = ~
0
j ,ft dt =
0 I

The following theorem is sometimes useful in simplifying the computa-


tions of definite integrals.
Theorem 10.10. Let f(x) be an integrable function on a closed interval
[ - a, a] which is symmetric relative to the point 0, where a > 0. Then

a
1f(x)dx =
[2) a

0
f(x)dx if f(x) is even,
-a 0 if f(x) is odd.
10.4 Evaluating Definite Integrals 475

◄ By the interval union property (Property 4) of the definite integral


we have
a O a
J /(x) dx = J /(x) dx + J /(x) dx.
-a -a 0

Put x = -t so that dx = -dt and t = -x. Then


0 0 a a
.\ /(x) dx = J /( - t) dt = J /( - t) dt = ) /( - x) dx.
-a a O 0
Hence
a a
J /(x) dx = J [/( - x) + /(x)] dx.
-a 0

Recall that for an even function /(x) we have /(x) = /( -x) so that
a a
J f(x) dx = 2 J /(x) dx. Similarly, for an odd function /(x), i.e., for
-a 0
a
f(-x) = -/(x), ) /(x) dx = 0. ►
-a
'Ir

For example, ) sin 3 x ecosxdx = 0 since the integrand is an odd func-


tion on the closed interval [ - 1r, 1r]. Indeed,
sin 3 ( - x) ecos( - x) = - sin 3 X ecosx, \IX E [ - 7r, 1r].

Integration by parts. We shall extend the method of integration by parts


examined in the previous chapter to definite integrals.
Theorem 10.11. Let the functions u = u(x) and v = v(x) have continu-
ous derivatives u' (x) and v' (x) on a closed interval [a, b]. Then
b b
J u du = uv
a
I! - ) v du.
a

◄ By the hypothesis uv = u(x) v(x) is a differentiable function on [a, b].


Then (uv)' = uv' + vu', i.e., uv is an antiderivative of the function
uv' + vu' on [a, b]. Applying the Newton-Leibniz theorem, we obtain

J (uv' + vu') dx= uv


a
I: and
b b

J uv' dx + aJ vu' dx=uv


a
I:.
Whence
b b

J uv' dx = uv
a
I! - J vu' dx.
a
476 10. Integral Calculus. The Definite Integral

By the definition of the differential we have v' dx = dv and u' dx = du.


Substituting these relations into the above identity, we obtain the desired
result
b b
) u dv = uv
a .
I! - ) v du.
a

Examples. Apply integration by parts to the following integrals.


1r

(1) 1 (1r - x) sin x dx.


0

◄ We have u dv = (1r - x) sin x dx. Put u = 1r - x; then dv = sin x dx,


du = - dx and v = - cos x. Integration by parts yields
I~ - ) cos x dx
1r 1r

1( 1r - x) sin x dx = - (1r - x) cos x


0 0

e
=(x - 1r) cos x I~ - sin x I~= ► 1r.

(2) J lnx ~
X
2-dx.
I
dx dx 1 and in-
◄ Put u = lnx and dv = - 2- . Then du= - - and v =
X X x
tegration by parts gives
e e
e e e
lnx 1 dx lnx 1
dx = - -lnx +
J
I
x2 X
1 J
I
x2 - X
I X

1 1
+1 = 1 - 2e- 1 • ►
e e

10.5 Computing Areas and Volumes by Integration


Areas of plane figures in Cartesian coordinates. Let f(x) be a
continuous nonnegative function on a closed interval [a, b] where a < b.
Then the area Q of the curvilinear figure abBA (see Fig. 10.6) is
b
Q = ) f(x)dx.
a

For example, if a plane figure is bounded by the parabola y = x 2 , the


vertical line x = a (a > 0) and the x-axis as shown in Fig. 10.7 then its
area Q is

Q =
a

Jx
r 2 x3 a a3
dx = -3- o - -3-·
0
10.5 Computing Areas and Volumes by Integration 477

0 a g X

Fig. 10.6

Suppose that f(x) is a negative function on the closed interval [a, b],
where a < b, i.e., f(x) < 0 on [a, b]. Then the region (the plane figure)
bounded by the graph of y = f(x), the lines x = a and x = b and the x-axis
b
lies below the x-axis (see Fig. 10.8) and i f(x) dx < 0. In this case the area
a
Q of the plane figure abBA is
b b
Q= - i f(x) dx or Q = i f(x) dx
a a

Example. Compute the area of the plane figure bounded by the parabo-
la y = x 2 - 2x and the x-axis (Fig. 10.9).

y y

a g
0 X

X A

Fig. 10.7 Fig. 10.8


478 10. Integral Calculus. The Definite Integral _________________ ·--------··-·------

◄ Since y ~ 0 on (0, 2] the plane figure lies below the x-axis, the desired
area 1s
2 2
x3 2

Q=-J (x 2 - 2x) dx = J(2.x - x 2) dx = x2 :


3 0
4
3
0 0

4
and Q =3 . ►

!/

Fig. 10.9

Let a function f(x) change its sign when x passes through a point
c E (a, b ). Then the plane figure bounded by the graph of y = f(x), the
x-axis and the lines x = a and x = b can be regarded as being made up
of two plane figures lying above and below the x-axis (Fig. 10.10). In this
case the area Q of the plane figure is
C b
Q = Qi + Q2 = i f(x) dx +
a
) f(x) dx
C

For example, for the plane figure bounded by the parabola y = 1 - x2,
the line x = 2 and the x- and y-axes (Fig. 1O.l 1) is
1 2

Q= (1 - x 2 )dx + (1 - x 2 ) dx
J
0
J 1

1 x3 1 2 x3 2
=X
3
+ X
3
0 0 1 1

1 8
=1-- + 2-1-- +l - .
-2
3 3 3
10.5 Computing Areas and Volumes by Integration 479

0 r

Fig. 10.10
Fig. 10.11

Suppose now that f(x) and g(x) are continuous functions on [a, b],
where a < b, and f(x) > g(x) > 0. Let the graphs of y = f(x) and y = g(x)
intersect at the points A and B (Fig. 10.12). Then the area Q of the plane
figure bounded by the graphs of y = f(x) and y = g(x) is equal to the differ-
ence of the area Qi of the plane figure aACBb and the area Q2 of the
plane figure aADBb. Thus
b b b
Q = f f(x) dx - f g(x) dx or Q = f [f(x) - g(x)] dx.
a a a

y
8

0 a 6 X

Fig. 10.12
480 10. Integral Calculus. The Definite Integral

To find the limits a and b of integration one must eliminate y from


the system of equations y + f(x) and y = g(x) and solve the equation
f(x) = g(x), whose real roots will be the limits sought for.
Example. Compute the area of the plane figure bounded by the parabo-
las y = 4x - x 2 and y = x 2 - 4x + 6 (Fig. 10.13).

0 X

Fig. 10.13

◄ The abscissas of the points A and B where the parabolas intersect are
the solutions of the equation 4x - x 2 = x 2 - 4x + 6 or x 2 - 4x + 3 = O
whose roots are x1 = 1 and x2 = 3; hence the lower and upper limits of
integration are a = 1 and b = 3. Then the desired area Q is
3 3

Q = ) [4x - x 2 - (x2 - 4x + 6)] dx = ) (8x - 2x2 - 6) dx


1 1

3 2 3
= 4x2 - - x3
3
- 6x
1 1

Let a curve AB be given by parametric equations x = 'P(t) and y = 1/;(t)


where 'P(t) and 1/;(t) are continuous functions. Suppose that the function
'()(t) has a continuous derivative 'P' (t) on the closed interval [a, /j) and
'()(a) = a and '()(/j) = b. (Fig. 10.14). Evidently, the area Q of the plane
figure abBA is
b
Q= Jydx.
a
10.5 Computing Areas and Volumes by Integration 481

If we put x = 'P(f) and y = t/;(t) then we get the following expression for
the area Q of the plane figure specified by parametric equations
(3
Q= j t/;(t) 'PI (t) dt.
Q'

0 a b X
t=d t=/3
Fig. 10.14

Example. Compute the area of the ellipse given by the parametric equa-
tions x = a cost and y = b sin t, 0 ~ t < 21r (a, b > 0).
◄ Since the ellipse is symmetric relative to the x-axis and to the y-axis
it suffices to compute the area of the ellipse in the first quadrant; the
desired area is
a
Q= 4 j ydx.
0

Put x = a cos t and y = b sin t so that dx = - a sin t dt. To find the


new lower and upper limits of integration we set x = 0 and x = a in
x=a cost; this gives a cost = 0 and a=a cost. Whence ti = a = 1r/2 and
ti= .B= 0. Therefore if x varies from 0 to a t changes from 1r /2 to 0 so that
0 0

Q =4 jb sin t ( - a sin t dt) = - 4ab j sin 2 t dt


~/2 ~12

I -cos 2t ( ~12 I . ~12)


2 dt = 2ab t - 2 s1n 2t = 1rab. ►
0 0

Sometimes it is convenient to compute areas of plane figures by apply-


ing formulas that involve integration with respect to the variable y. In this
case the variable xis regarded as ~ function in y, i.e., x = g(y) where g(y)
31-9505
482 10. Integral Calculus. The Definite Integral

is a single-valued function continuous on the closed interval [c, d] of the


y-axis. The limits c and d of integration with respect toy that are the points
of intersection of a given curve with the y-axis can be computed as the
roots of the equation g(y) = 0 obtained by putting x = 0 in the equation
x = g(y). Then the area Q of the plane figure bounded by the graph of
the curve x = g(y) and the y-axis (Fig. 10.15) is given by the formula
d
Q= i g(y)dy.
C

Example. Compute the area of the plane figure bounded by the parabo-
la x = 2 - y - y 2 and the y-axis (Fig. 10.16).

y !I

d,____,_____,~~~~~~~

X= g(y)

0 X

Fig. 10.15 Fig. 10.16

◄ The limits of integration are the points of intersection of the parabola


with the y-axis so that putting x = 0, we get the eq:µation 2 - y - y 2 = O;
whence Yi = c = - 2 and Y2 = d = 1. The desired area Q is
1
y2 y3
Q = f (2 - y - y 2) dy = 2y
1
-- -- = 4.5.
J
-2
-2 2 -2 3 -2

Problems. (1) Compute the area of the plane figure bounded by the
parabola y 2 = 2x + 1 and the line x - y - 1 = O;
(2) Compute the area of the plane figure bounded by the curves
y = sin - 1 x and y = cos - 1 x and the x-axis. (Hint. Write the equations of
the curve in the form x = g(y).) ·
10.5 Computing Areas and Volumes by Integration 483

Area of a plane figure in polar coordinates. Suppose that a curve is


specified by its polar equation e = f('P) where f('P) is a continuous non-
negative function on a closed interval [a, ,B]. The plane figure bounded
by this curve and the two rays that start at the pole and make the angles
a and ,B, respectively, with the polar axis is called the curvilinear sector
(Fig. 10.17).

Fig. 10.17

0 p
Fig. 10.18

To compute the area of the curvilinear sector OABO we divide the sector
into n subsectors by drawing n rays '/J = a = <Po; '/J = 'Pt, ... , '/Jn - 1;
'/J = (3 = '/Jn from the pole. Let A'fJ1, A'/>2, ... , A'/Jn be angles between the
rays. We let '(pk denote the angle formed by the rays '/Jk - 1 and '/Jk and Qk
the position vector associated with '(pk. Consider the circular sector of
radius Qk with central angle A'/Jk (Fig. 10.18). The area of this circular sector
is equal to AQk = ; QkA'/Jk or AQk =; j2(rpk)A'/Jk since Qk = f('Pk)-
When we replace each curvilinear subsector by the corresponding circu-
lar sector we obtain the plane figure made up of n circular sectors; its area
Qn is

31*
484 10. Integral Calculus. The Definite Integral

n n

k=l k =1
We denote the largest fj.<Pk by A = max fj.<Pk • If we make n tend to in-
1 ~k~n
finity, so that A ➔ 0, i.e., if we divide a given curvilinear sector into smaller
subsectors, then the plane figure made up of circular sectors comes closer
to the curvilinear sector OABO. Thus we may regard the limit of the area
Qn as A = max /j.<Pk ➔ 0 as the area of the curvilinear sector OABO
l~k~n
provided that this limit exists and is independent of a partition of the closed
interval [ec, /3] and of a selection 'Pk associated with a given partition, k = I,
2, ... 1 n, 1.e.,
n

Clearly, the sum ~ ~ / 2 ( 'Pk) fj.<Pk is an integral sum for the function
k =1
~/ 2 (<P) which is continuous on [a, /3] since f(<P) is continuous on [a, /3].
Hence, the limit of this sum as A ➔ 0 exists and is equal to the definite
(3

integral ) ~ J2(,p) d,p.


Oi

(jJ

2a p

Fig. 10.19

Therefore the area of the curvilinear sector OABO is given by


~ ~

Q =~ J/2 (,p) d,p or Q =~ J ,/d,p.


Example. Compute the area of the plane figure bounded by the cardioid
e = a(l + cos <P), a > 0 (Fig. 10.19).
10.5 Computing Areas and Volumes by Integration 485

◄ The desired area is

Q = ~2 r
21r
0
(1 + cos cp)2dcp = ~ 1
0
(1 + 2 cos 'ii' + cos 2 'P) d'()

~ J ( 1 + 2 cos 'P + 1 + cos 2'() )


2 d'P
=
0
21r

= ~ J G+ 2 cos cp + ~ cos 2cp) dcp = ~ 1ra2• ►


0

Volumes of solids. We consider a solid bounded by a closed surface.


Let Q be the area of the cross section of the solid by a plane perpendicular
to the x-axis (Fig. 10.20). Clearly the value of Q is dependent on the loca-
tion of the perpendicular plane relative to the x-axis, i.e., this value is a
z
p Q

'II
I
I

b X

Fig. 10.20

function in x so that Q = Q(x). We assume that Q(x) is a continuous func-


tion on a closed interval [a, b]. To set up the formula for the volume of
the given solid we draw n planes x = a = Xo; x = Xi, X2, .•. , Xn-1;
x = b = Xn perpendicular to the x-axis. These planes divide the solid into
n slices and the closed interval [a, b] into n subintervals [Xk- 1 , Xk], k = 1,
2, ... , n.
To approximate the volume of the kth slice we select an arbitrary point
tk in [Xk- 1, Xk] and consider the cylinder whose generating line is parallel_
486 10. Integral Calculus. The Definite Integral

to the x-axis and directing line is the line of intersection of the solid surface
with the perpendicular plane x = ~k (see Fig. 10.20). The volume ~Vk of
this cylinder is the product of its height !:ak and the area Q(~k) of its base
so that ~ Vk = Q(~k) AXk, where ~k E [xk _ 1, Xk]. The volume of all n
cylinders thus obtained is
n n
Vn = ~ ~vk = ~ Q(~k) AX°k,
k=I k=I

If the limit of the sum on the right exists as ~ = max AX"n ~ 0 we set
l~k~n
it equal to the volume of the solid in question so that
n

V = lim ~ Q(~k) AX°k.


},,--+Ok=I

Notice that this sum is an integral sum for the function Q(x) which is con-
tinuous on the closed interval [a, b]; hence, the above limit exists and is
equal to the definite integral
b
V = f Q(x) dx.
a

,--
\
1
y

Fig. 10.21

Example. Compute the area of the solid bounded by the ellipsoid


x2 Y2 z2
-
a2+ - +
b2 - =
c2 1 .

◄ The cross section of the ellipsoid with the plane perpendicular to the
x-axis at x is the ellipse (Fig. 10.21)
Y2 z2 x2
-+-=1--
b2 c2 a2
10.5 Computing Areas and Volumes by Integration 487

or
Y2 z2
-------+-------=!

with the semiaxes

bTT and cjt ;: .


Then the area of the cross section is

Q(x) = 1rbc ( I - :, ) .

lj
8

0 X
0 X

Fig. 10.23
Fig. 10.22

Applying (*), we obtain

2
( 1 - x02 ) dx = 3
4 1rabc.

When b = c = a the ellipsoid becomes the sphere x 2 + y 2 + z2 = a2


whose volume is equal to V = 1
1ra 3 • ►
Now we consider a solid generated by revolving the curvilinear plane
figure abBA around the x-axis (Fig. 10.22). The plane figure is bounded
by the graph of the curve y = f(x), the lines x = a and x = b where a < b
and the x-axis. The solid thus obtained is called the solid of revolution.
488 10. Integral Calculus. The Definite Integral

The cross section of the solid of revolution with the plane perpendicular
to the x-axis at x is the circle whose area is Q(x) = 1ry 2 = 1rf2 (x); hence,
the volume of the solid of revolution is
b b
V = 1r 1/ 2 (x) dx or V = 1r 1y 2 dx.
a a

Example.. Compute the volume of the solid of revolution generated by


revolving the arc DA of the parabola y 2 = 2px around the x-axis
(Fig. 10.23). .
◄ The arc DA is specified by the equation y = ✓2px where p > O;
hence, the desired volume is
a a
V = 1r 1 y2 dx = 1r 12px dx = 1rpa 2• ►
0 0

10.6 Computing Arc Lengths by Integration


Arc length. We consider the arc AB with the endpoints A and B
(Fig. 10.24). Let M 1 , M2, ... , Mn - 1 be points on AB. If we join these

Fig. 10.24

points by line segments AM1, M1 Mi, ... , Mn - 1B with lengths M1, &2,
... , ~Sn, respectively, we obtain the polygonal line AM1M2 ... Mn - 1B
.____,
inscribed in the arc AB. The length Sn of this polygonal line is
n
Sn = &1 + &2 + , , , + as"n = t
k=l
as"k,

Definition. The length S of the arc AB is the limit of the length Sn of


the polygonal line inscribed in AB as the largest of the line segments. of
10.6 Computing Arc Lengths by Integration 489

this polygonal line tends to zero, i.e.,


n
S = lim Sn = lim ~ ~k
maxfilk-+0 maxfilk-+0 k =l

provided that this limit exists and is independent of choices of the points
M1, M2, ... , Mn- I on the arc AB.
Arc length of a curve in Cartesian coordinates. Let an arc AB be speci-
fied by the equation y = f(x) where the function f(x) has a continuous
derivative f' (x) on a closed interval [a, b ]. Consider a partition of [a, b]
into n subintervals [Xk _ 1, Xk], k = 1, 2, ... , n by choosing points such that
Xo = a< X1 < X2 < ... < Xk- I < Xk < ... < Xn = b.
y

Fig. 10.25

Let (Xk, f(xk)) be a point on AB corresponding to the kth subdivision point


Xk of the polygonal line. Then the polygonal line inscribed in AB is given
by the points
A = Mo(Xo, f(Xo)), M1(X1, f(xi)), ... , B = Mn(Xn, f(Xn)).
We let 11.sk denote the length of the kth line segment of this polygonal line
and Lilk = Xk - Xk- 1 and .dYk = f(xk) - f(Xk- 1), k = 1, 2, ... , n. Then
the length of the kth line segment (Fig. 10.25) is

/J.sk = ✓(llxd + (Ayk) 2 = .J) + ( : : y Ax,.

Applying the mean value theorem for derivatives we obtain


.dYk = f(Xk) - f(Xk- 1) = (Xk - Xk- 1) · f' (~k) = f' (~k) Lilk,
490 10. Integral Calculus. The Definite Integral

where ~k is a point in [xk - 1, Xk]; whence


dsk = ✓ 1 + lf' (~k)] 2 AXk
so that the length of the polygonal line is
n
Sn = I; ✓ 1 + [/' (~k)] 2 AXk.
k =1

Recall that f' (x) is continuous on [a, b]. This implies that
✓ 1 + lf' (x)] 2 is a continuous function on [a, b]. Hence the integral sum
on the right of ( *) has a limit S as max dsk ~ 0 so that
1 ::s;;k,:;;;;n
n ~---- b
S= lim I; ✓ 1 + [/' (fa)] 2 AXk = I ✓ 1 + [/' (x)] 2 dx
max..isy-->Ok =1 J
a
l ::s;;k,:;;;;n

or, using the abbreviated notation,


b
S = ) ✓ 1 + (y; ) 2 dx.
a

Example. Compute the arc length of the catenary y =


ex+ e -x
2 = cosh x bounded by the points A (0, 1) and B(a, cosh a)
(Fig. 10.26).
◄ On differentiating the equation of the catenary we have

y' = sinh x.
Noting that cosh 2 x - sinh 2 x = 1, we obtain
✓ 1 + (y') 2 = ✓ 1 + sinh 2 x = ✓cosh 2 x = coshx (coshx > 0)
so that
a a
S= ) cosh x dx = sinh x = sinh a. ►
0 0

Arc length of a curve specified by parametric_equations. Let AB be an


arc of a curve specified by the parametric equations
x = 'P(t), y = VI (t), to ~ t ~ T.
We also assume that the functions 'P(t) and V1(t) have continuous deriva-
tives 'P'(t) and Vl'(t) on [to, '.T] and 'P'(t) ;c O on [to, '.T]. The original para-
metric equations specify the function y = f(x) having a continuous
. .
d envat1ve ' VI I ( t)
Yx = 'P, (t) on to,
[ '.T] Th
. en

✓ 1 +(y; 2 ) dx = ✓ ['P'(t)]2 + [V1'(t)] 2 dt


10.6 Computing Arc Lengths by Integration 491

and formula (**) yields


T
S = 1✓ [<P '(1)) 2 + fit,' (t)J 2 dt
to
or
T
S = 1✓ (xt' ) 2 + (y/ ) 2 dt.
to

Examples. (1) Compute the length of the circumference of the circle


with radius R (Fig. 10.27).
y y

y=cosh x

0 I

-----oc::-¥-_ _ __.____ _
(1 X

Fig. 10.26 Fig. 10.27

◄ The circle with radius R is given by the parametric equations


x = R cost, y = R sin I, 0 ~ I < 21r.
Then applying formula (•••) we obtain
21r ,------------ 21r
S = 1✓ (-R sin 1) 2 + (R cos 1)2 dt = R i dt = 21rR. ►
0 0

(2) Find the length of the ellipse given by the parametric equations
x = a cost, y = b sin t, 0 ~ t < 21r (0 < b ~ a).
◄ Since x; = -a sin t and y/ = b cost and the ellipse is symmetric rela-
tive to the coordinate axes, formula (•••) gives
1r/2
S =4 I ✓a 2 sin2 t + b 2 cos 2 t dt
0
..-/2
=4 I ✓a (1 -
2 cos 2 I) + b 2 cos 2 t dt
0
..-12 --------- 1r/2
=4 I✓ a2 - (a 2 - b 2 ) cos 2 t dt = 4a I ✓1 - e 2 cos 2 t dt,
0 0
492 10. Integral Calculus. The Definite Integral

✓ a2 - b2
where e = ---- is the eccentricity of the ellipse, 0 ~ e < 1.
a
The definite integral on the right is called the elliptic integral; it is worth
noting here that the Newton-Leibniz theorem is inapplicable for computing
this integral since the antiderivative is not an elementary function. ►
Notice_ that the substitution t =; - T yields

1r/2 ,------- 1r/2 ~---- 1r/2


i ✓1 - e 2 cos 2 t dt = ) ✓1 - e2 sin 2 r dr = ) ✓1 - e2 sin 2 t dt.
0 0 0

This is the notation mostly used for the elliptic integral.


y

Fig. 10.28

(3) Find the length of one arc of the cycloid specified by the parametric
equations (Fig. 10.28)
x = a (t - sin t), y = a (1 - cos t), 0 ~ t < 21r (a > 0).
◄ Applying formula (***), we have

S =a j✓
21r ,----------
(I - cos t) 2 + sin t dt = a
2 j ✓2
21r
- 2 cost dt
0 0
21r 21r

=a j .J4 sin ~ dt = j 2
2a
. t
s1n 2
dt
0 0
2,r

= 2a j sin ~ dt =
2,r
t
-4a cos- = 8a. ►
2 0
0
'--'
Arc length of a curve specified by its polar equation. Let AB be an
arc of a curve specified by its polar equation e = /(cp).on a closed interval
[a, ~] where .f(cp) has a continuous derivative/' (cp). To find the length
10.6 Computing Arc Lengths by Integration 493

of the arc AB we derive the parametric equations of the curve. Recall that
the relations between the polar and Cartesian coordinates are given by
x = e cos 'P and y = e sin 'P· Replacing e by f('P), we get the parametric
equations x = f( 'P) cos 'P and y = f( 'P) sin 'P that describe the given curve.
We use here the polar angle 'P as the parameter.
On differentiating the parametric equations, we have
x~ = f' ('P) cos 'P - f('P) sin 'P
and
y~ = f' ('P) sin 'P + f('P) cos 'P·
Squaring both identities and adding (x;) 2 with (y;) 2 , we get
(x;)2 + (y;)2 = [/' ('P)]2 + [/('P)]2.

Then formula (***) gives


/3
S = 1✓ [/' ('P)]2 + L/('P )]2 d'P

or, equivalently,
/3
s= ~ ✓ e2 + (e, )2 d'P,
a

Example. Find the length of the cardioid given by the polar equation
e = a (1 + cos 'P), a > 0 (see Fig. 10.19).
◄ On differentiating the original equation we obtain e ' = - a sin 'P =
f' ('P), 0 ~ 'P ~ 21r, so that by symmetry
,r

S =2 J✓a 2 (1 + cos ;,)2 + a 2 sin 2 ,p d,p


0
,r ,r

= 2a J✓2 (I +cos;,) d,p = 4a J.Jcos ~ d,p 2

0 0
'Ir

= 4a Jcos ~ d,p = Sa. ►


0

Now we consider an arc of the curve y = f(x) shown in Fig. 10.29. Sup-
pose that the function /(x) has a continuous derivative/' (x) on the closed
interval [a, b]. We let AM denote the arc bounded by the points A(a, f(a))
494 10. Integral Calculus. The Definite Integral

and M(x, f(x)). If we think of A as a fixed reference point and of M as


a point that can move along the curve the length of the arc AM becomes
a function in x. This length S is expressed as
X _____ X

S = _\ ✓ l + [/' (x)] 2 dx = i ✓ I + [f' (!)] 2 dt.


a a

Since the integrand on the right is a continuous function on [a, b] we


can write
X

dS(x)
~---
dx
--- =
dx
J
✓ + [f,(t)] 2 dt)
-d -( - 1 = ✓ l + L/,(x)] 2
Q

or

~~f = J (-~Y
1 +
Whence the length dS of the arc AM becomes

dS = j1 + e;: y dx or dS = ✓(dx)2 + (dy) 2 .

ij

dx

0 X x+ dx )(

Fig. 10.29

To interpret the above formulas geometrically we look at Fig. 10.29.


Clearly, the value of dS is equal to the length of the line segment MN
of the tangent MT between the points M(x, y) and N(x + dx, ..__.... y + dy) .
For sufficiently small dx = AX the length t:..S of the arc MM' of the
curve y = f(x) corresponding to the increment t:..x = dx can be thought of
10.7 Applications of the Definite Integral .495

as being nearly equal to the length of the line segment MN of the tangent
MT to the given curve at the point M, i.e., ~ = dS.
When a curve is specified by the parametric equations
x = cp(t), y = 1/; (t), to ~ t ~ T,
where cp(t) and 1/;(t) have continuous derivatives on [to, T], then
dS = ✓ [cp'(t)] 2 + [i/;'(t)] 2 dt
or
dS = ✓ (x/ ) 2 + (y/ ) 2 . dt.
If we put the parameter t equal to the length S of a variable arc, i.e.,
x = cp(S) and y = 1/;(s), then the above formulas become

dx )
( dS
2 ( dy )2
+ dS = 1.
For a curve given by the polar equation e = f(cp), ex ~ 'P ~ {3, where
f( cp) is a function having a continuous derivative oh the closed interval
[ex, {3] we have
dS = ✓ (/ + <e, )2 dcp.
10.7 Applications of the Definite Integral
,
Work done by a variable force. Suppose we wish to define a work
done by a force F in moving a particle M along the x-axis from point a
to point b (a < b ). Recall from the course of general physics that the work
W done by the constant force F is the product of the absolute value F
of this constant force by the distance S = b - a, i.e., W = FS, provided
that F is directed along the x-axis.
Let a force F acting on a particle M along the x-axis be a continuous
function in x (F = F(x)) on the closed interval [a, b] of the x-axis. We parti-
tion the closed interval [a, b] into n subintervals with lengths AX1, AX2,
... , AXn by choosing points in [a, b] such that Xo = a < Xi < X2 < ... <
Xn = b. Assume that ~k is an arbitrary point in the subinterval [Xk- 1, Xk]
and the force F is constant on [Xk - 1, Xk] so that F = F(~k)- Then given
a sufficiently small AXk the work awk done by F (~k) in moving the particle
from Xk - 1 to Xk is LiWk = F(~k) AX°k and the sum
n n
Wn = I; awk = I; F(~k) AXk
k=l k=l

approximates the work W done by the force F on the closed interval


[a, b]. Since Wn is the integral sum for the function F(x) on [a, b] the
496 10. Integral Calculus. The Definite Integral

limit of Wn as max '1xk ➔ 0 defines the work W done by the force


l~k~n
F on [a, b] (this limit exists, for F(x) is a continuous function on [a, b]).
Thus the desired work W should be
n b
W= lim ~ F(~k) tak = r F(x) dx.
· max .1.xy-+O k =1 J
l~k~n a

Example. Let two point charges Q1 and Q2 be located at the points Mo


and M1, respectively, and let r1 be the distance between Mo and M1. Define
the work W done in moving the charge Q2 from M1 to M2 assuming that
the distance between M2 and M1 is equal to r2 and the point Mo is the
origin of the frame of reference.
◄ Suppose that both charges are positive, i.e., Q1 > 0 and q2 > O; hence,
they repel each other. By Coulomb's law the absolute value F of the force
of the electrostatic interaction of two point charges in vacuum is
F- k Q1Q2
-
r2 '
where r is the distance between the charges and k is a constant.
Then applying the above formula, we obtain

W = Jk q~f 2
dr = kq1112 J~ = kq1112 ( - !) :
= kq1 Q2 (-r11- - - 1-) .
r2

The mass and centre of gravity of a rod of varying density. Consider
a rod of varying density. Think of the rod as a closed interval [a, b] of
the x-axis and suppose that the linear density of the rod is defined as a
function e = e(x).
We partition the closed interval [a, b] into n subintervals [xk _ 1, Xk],
k = 1, 2, ... , n, by choosing the points
Xo < Q < Xt < X2 < , .. < Xn-1 < Xn =- b.
We let ~k denote an arbitrary point in the kth subinterval [Xk- t, Xk],
k = 1, 2, ... , n, and consider the sum

Clearly, each summand gives the approximate value of the mass of the
part of the rod corresponding to the subinterval [Xk _ 1, Xk]. Then this sum
determines the approximate value of the total mass of the rod so that we
10.7 Applications of the Definite Integral 497

n
can define the total mass m as the limit of ~ e(~k) LUk as max LU°k ~ 0
k=I 1::;;k::;;n
b
which is equal to the integral ) e(x) dx. Thus, the total mass m of the
Ci

rod of varying density e(x) is given by the formula


b
m = I e(x) dx.
a

To define the centre of gravity of the nonhomogeneous rod we shall


use the formula for the centre of mass of n particles M1, M2, ... , Mn
with masses m1, m2, ... , mn located at the points X1, X2, ... , Xn on the
x-axis. The centre of mass Xe of this system of particles is given by the
formula

---

Let us partition the closed interval [a, b] into n subintervals [Xk _ 1, xk]
by choosing the points a = Xo < X1 < ... < b = Xn and compute the mass
mk corresponding to the kth subinterval, k = 1, 2, ... , n. Clearly,
Xk

mk = ) e(x) dx. Then applying the mean value theorem (Theorem 10.5),
Xk- I
we get

where Xk - 1 ~ ~k ~ Xk,
Assume that a particle with mass mk is located at some point ~k in
the subinterval [Xk - 1, Xk], Then we can replace the rod by a system of
n particles with masses mk located at the points 6, b, ... , ~n in the closed
interval [a, b]. Since

I;
k=l
n
mk = I;
n
r
x

k=lxk-1
e(x)dx =
b
I e(x)dx = m,
a

the centre of gravity of the nonhomogeneous rod can be approximated as


n
I; ~ke(~k) ax-k
k=l
Xe==------
m
The nominator is the integral sum for the function x e(x) on the closed
interval [a, b]; hence, the centre of gravity of the nonhomogeneous rod
32-q505
498 10. Integral Calculus. The Definite Integral

is given by the formula


b
1x e(x) dx
a
Xc=-----
b
1e(x) dx
a

Example. Define. the centre of gravity Xe of the nonhomogeneous rod


with linear density e = x and length I = 1.
◄ The mass of the rod is

1 1
m = 1xdx = 2 .
0

The desired centre of gravity is


1
Xe = 2 1
0
X
2
dx =
2
3. ►

10.8 Numerical Integration


In dealing with physical problems we frequently work with definite
integrals involving continuous functions that have no elementary an-
tiderivatives. To compute these integrals we have to numerically approxi-
mate them. Given below are two convenient approximations, namely,
trapezoidal approximation and parabolic approximation.
b
Trapezoidal approximation. We wish to compute the integral ) /(x) dx
a
where /(x) is a continuous function on the closed interval [a, b]. For sim-
plicity we set /(x) ~ 0. Consider the partition of [a, b] into n subintervals
of equal length (the regular partition) by the points
Xo = a < Xi < X2 < . . . < Xn - 1 < Xn = b.
Let us draw n vertical lines x = Xk, k = 0, 1, ... , n and construct n trape-
zoids as shown in Fig. 10.30. The total area of these n trapezoids is approxi-
mately equal to the area of the curvilinear figure aABb so that
b

r /(x) dx =: /(Xo) + /(xi) (x1 - Xo) + /(xi) + /(xi) (x2 - X1)


Ja 2 2

+ . .. + f(Xn-d + f(Xn) (
Xn - Xn - 1
)
2
n-l

b 2~ a ~(a)+ f(b) +2 ~ f(xk)],


k=l
10.8 Numerical Integration 499

where f(Xk- i) and f(xk) are the bases of the trapezoids and
Xk - Xk-1 = -b-a
- - are t h.elf h.h
e1g ts.
n
Thus we arrive at the formula

I
b n-1

f(x) dx"" b 2~ a ~(a)+ f(b) +2 ~ f(xk)]


a k=l

called the trapezoidal approximation. Notice that the larger is n the higher
precision of approximation is obtained.

lj

Xn-1 b=Xn X

Fig. 10.30

It is worth noting that given a function /{x) having the continuous sec-
ond derivative /" (x) on [a, b] the absolute error of approximation does
not exceed the number
M (b - a)3
12n 2 '
where M= max lf"(x)l-
a(;,x(;,b

Example. Compute the trapezoidal approximation to


f x dx+1 J
with n = 10. o
◄ Consider the regular partition of [Q, l] by the points Xo = 0, X1 = 0.1,
x2 = 0.2, ... , x9 = 0.9, X1o = 1. Let us compute the values of the function
/(x) = x+1 1 at these points. We have

32*
500 10. Integral Calculus. The Definite Integral

f(0) = 1.0000, f(0.1) = 0.9091, f(0.2) = 0.8333,


f(0.3) = 0.7692, f(0.4) = 0.7143, f(0.5) = 0.6667,
f(0.6) = 0.6250, f(0.7) = 0.5882, f(0.8) = 0.5556,
f(0.9) = 0.5263, f(l) = 0.5000.
Then the trapezoidal approximation is
I
f dx
= -101- ( l.OOOO +2 0·5000 + 0.9091 + 0.8333 + 0.7692
J
0
X + 1
+ 0. 7143 + 0.6667 + 0.6250 + 0. 5882 + 0.5 556 + 0.5263)
= 0.69377 z 0.6938.
Let us estimate the error of the computations. Since
f(x) = __ _!__ then f' (x) = - 1 and f" (x) = 2
x + 1 (x + 1)2 (x + 1) 3 •

On the closed interval [0, 1] we have If" (x)I ~ 2 and, consequently,


M = max lf" (x)I = 2; hence the error does not exceed the number
O~x~l
(b - a)2 2 1
M 2 = 2 - 600 < 0.0017.
12n 12 X 10
The Newton-Leibniz theorem gives the exact value of this integral as
I

f
j X
dx I
+
= ln(x + I) 1

o
= ln2"' 0.69315.
0

The absolute error in the trapezoidal approximation is smaller than


0.0007; hence both these results are in full agreement with estimates given
by theory. ►
Parabolic approximation. We start with computing the area Q of a plane
figure bounded by the arc AB of the parabola y = Ax2 + Bx + C passing
through the points Mo(0, Yo), Mi(h/2, Yt) and M2(h, Y2) (Fig. 10.31). This
area is given by
h

Q= J (Ax 2 + Bx + C) dx = A - 3-
~ h

0
+B2
~ h

0
+ Cx
h

0
0

= A - h3- + B 2h +
3
= 6h (2Ah 2 +
2
Ch 3Bh + 6C).

Now we express the area Q in terms of ordinates of M1, M2 and M3.


Substituting the coordinates of these points into the equation of the -given
10.8 Numerical Integration 501

parabola, we obtain
h1 h
Yo = C, Y1 = A 4 + B 2 + C, Y1 = Ah 1 + Bh + C.
Whence
2Ah 1 + 3Bh + 6C = Yo + 4y1 + Y1

so that
h
Q =6 (Yo + 4y1 + Yz).

------- y= Ax 2 +Bx +C

0 h h X
2

Fig. 10.31

b
Let us consider the definite integral ) f(x) dx where f(x) is a continuous_
a
nonnegative function on [a, b]. We divide [a, b] into 2n regular subintervals
(notice that 2n is even) by the points
a = Xo < X1 < X1 < ... < X1n - 1 < X1n - I < X1n = b
and write down the original integral in the form

1f(x) dx =
b ~
J f(x) dx +
~
1f(x) dx + t
... + J f(x) dx.
a Xo

We let A, M1, M1, ... , M2n - 2, M2n - 1, B denote the points of intersec-
tion of the vertical lines x = Xk, k = 0, 1, 2, ... , 2n, with the graph of
the function y = f(x); let Yo, Y1, Y2, ... , Y2n - 2, Y2n - 1, Yin be the ordinates
of the respective intersection points. If we draw a parabola with vertical
axis of symmetry through every triple of points M2k - 2, M2k - 1, M2k (k = 1,
2, ... , n) we get n curvilinear plane figures bounded above by the parabolas
(Fig. 10.32). Since the area of the curvilinear plane figure corresponding
b-a
to the subinterval [X2k- 2, X2k] with length h = - - - is approximately
n
502 10. Integral Calculus. The Definite Integral

equal to the area of the respective "parabolic" trapezoid, formula (*) gives

1
x2k

f(x) dx "" b ~ a (.Yik- 2 + 4.Yik- 1 + .Yik),


X2k -2

where Yk = f(xk), k = 1, 2, ... , n.


Then formula (**) becomes
b

Jf /(x) dx ~ b - a
6n [Yo + Y2n + 2 (Y2 + Y4 + . . . + Y2n - 2)
a
+ 4(y1 + Y3 + ... + Y2n - 1)].
This is the parabolic approximation which is sometimes called Simp-
son's approximation.

Fig. 10.32

Notice that when /(x) has a continuous fourth derivative .f4>(x) on


[a, b] the error in the Simpson's approximation is at most
M(b - a)5
2880n 4 '

where M = max IJ<4>(x)I.


a~x~b
The error in Simpson's approximation decreases faster than that in the
trapezoidal approximation as n increases; so the former is more precise than
the latter. 1

Example. Compute Simpson's approximation to the integral J x+l


dx

0
with 2n = 4.
Exercises 503

◄ Consider the regular partition of the closed interval [0, 1] by the points
Xo = 0, xi = 0.25, x2 = 0.50, X3 = 0.75, X4· = 1. The values of the function

y = 1 at these points are Yo = 1.0000, Yi = 0.8000, Y2 = 0.6662,


x+ 1
y3 = 0.5714, y4 = 0.5000. Then Simpson's approximation is
1

Jf dx b-a
x + l :::::: 6n [Yo + Y4 + 2y2 + 4 (Y1 + y3)] =
0

l; [1.0000 + 0.5000 + 2 X 0.6662 + 2(0.8000 + 0.5714)] :::::: 0.69325.

Let us estimate the error of the result thus obtained. The integrand
1 24
1 has the fourth derivative j< >(x) = so that
f(x) = 4
5
X + (X + 1)
24
= 24.
(1 + x)5
Hence the error is at most 24 < 0.0005. Comparing Simpson's
4
2880 X 2
approximation with the exact value of the integral we conclude that the
absolute error in Simpson's approximation is less than 0.0001 as it has been
predicted above.
The computations of the trapezoidal approximation and Simpson's ap-
1

· · to t h e 1ntegra
prox1mat1on · 1 Jf x dx
+ 1 prove t h at t h e 1atter 1s
. more precise
.
0
than the farmer. ►

Exercises
Compute the following integrals by applying the Newton-Leibniz
theorem
4 2e

J(rx- Jx)
11'

1.
1
f x vx dx.
0
2
5
2.
0
dx. 3. j
e
dx
X
4.
7r/2
) cos~dx.

l/v2 1 3
3
5.
) ✓
dx
1- x2
6. ) 1+
dx
x2 . 7. ) lxl dx.
-2
8. ) xdx
✓x + 1
-1/../2 0
0

-3 e4 1

~
dx
~
xdx
~
dx 10.
9. 11.
✓25 + 3x x✓ lnx ✓ 1 - x2
0 e 0
504 10. Integral Calculus. The Definite Integral

4 2 1
In (x + ✓ 9 + x 2 )
12.
i
0
✓9 + x2
dx. 13.
i
0
xdx
✓ 1 + x4
14.
i
0
xdx
✓ 4 - x4
1r/2
1r/2
COS X - SinX
15.
I
0
3cos 2 x sin 2x dx. 16.
J cosx + sinx
dx.

e
7r/ 12

i
1
cos (In x)
17. J In sin 2x • cot 2x dx. 18. dx. 19.
J sinh 2 xdx.
X
0 0
1
ln2 ln3
20. tanhxdx. 21. ) tanh 2 xdx.
J
0 0
Integrate by parts each of the following integrals
21r 1 1
22. ) x sin x dx.
0
23.
I In (1 + x) dx.
0
24.
J0 xsinh xdx
l/v'2 1
25. J sin - 1x dx. 26.
J
x3 exl/2 dx.
0 0
Compute the areas of the plane figures bounded by the given curves
and the indicated axes
27. y = x 2 + 2x - 3 (parabola), y = x + 3 (line).
28. y = 2x - x 2 (parabola), y = -x (line).
29. y = x 2 - l (parabola), x = 2 (line); the x- and y-axes.
30. y = x 2 - 3x - 4, y = 4 + 3x - x 2 (parabolas).
31. y = x - l, y = 1, y = In x.
32. y = e - X, y = eX, x = 1.
33. y = x 3 , y = 8; the y-axis.
34. x = a cos 3 t and y = a sin 3 t, a > 0 (astroid).
35. x = a (t - sin t) and y = a (l - cost), a > 0 (an arch of a cycloid); the
axis of abscissas.
36. x = a (2 cost - cos 2t) and y = a (2 sin t - sin 2t), a > 0 (cardioid).
37. e = a sin cp, a > 0. 38. e = a sin 2cp, a > 0. 39. e = 2 + sin 'P·
Compute the volume of the solid generated by revolving the given curves
or the given plane figures around the indicated axis
40. y = sin x, 0 ~ x ~ 1r; around the x-axis.
41. y = sin 2 x, 0 ~ x ~ 71'; around the x-axis.
x2 Y2
42. -
a2 + - = 1 (ellipse); around the x-axis.
b2
Exercises 505

43. The plane figure bounded by the parabola y = ax - x 2, a > 0, and


the x-axis; around the x-axis.
44. The plane figure bounded by the parabola y = x 2 , the line y = 1 and
the y-axis; around the y-axis.
45. Compute the volume of the solid bounded by the paraboloid
Y2 z2
2P + 2q = x and the plane x = a.
46. Compute the volume of the solid bounded by the hyperboloid of one
x2 Y.2 z2
sheet -
a2
+ -b2
- -c2
= 1 and the planes z = 0 and z = h.

Compute the length of the arc of the given curve between the indicated
points
x2
47. y = 2 ; from (0, 0) to (1, 1/2).
48. y = U (semicubical parabola); from the origin of coordinates to
A (1, 1).
49. y = lnx; from x = v3 to x = VS.
50. y = In sin x; from x = 7r/3 to x = 71"12.
51. x = a cos 3 t and y = a sin 3 t, a > 0 (astroid; define the total length).
t3
52. x = - t and y = t 2 + 2· from t = 0 to t = 3.
3 '
53. x = et cos t and y = et sin t; from t = 0 to t = In 71".
54. e = a e"", a > 0, (logarithmic spiral); inside the circle e ~ a.
55. e = a sin cp, a > 0; (total length).
56. e = acp, a > 0, (spiral of Archimedes); the length of the first winding.

Answers
5 4 1r 1r 8 2
1. - . 2. - . 3. In 2. 4. - I. 5. . 6. - . 7. 6.5. 8. - . 9. - - . 10. 2. 11. 1.
17 3 2 4 3 3
3 2 1 r.;:.- 1r 2 In 2 . . 2
12. - In 3. 13. - In (4 + "17 ). 14. - . 15. - - . 16. 0. 17. - . 18. sm 1. 19. smh 1.
2 2 4 In 3 4
5 4 -I 7r+4-\f'2 _C
20. -In-. 21. In 3 - - . 22. -21r. 23. In 4 - 1. 24. e . 25. - - - - . 26. 2 - ve.
4 5 2
125 125 3 2 2
27. - . 28. 4.5. 29. 2. 30. - . 31. e - 2.5. 32. 2 cosh 1 - 2. 33. 12. 34. - 1ra . 35. 31ra .
6 3 8
2 1ra 2 1ra 2 9 2 1f 2 2 3
1ra s 7r
4
36. 61ra . 37. - - . 38. - - . 39. - 1r. 40. - 2 . 41. - 1r . 42. - 7f{Jb . 43. - 30 . 44. - .
2 4 2 8 3 2
h2 ) 1 13 ✓ 13 - 8
45. 7f{J 2.Jpij. 46. 1rabh ( 1 + ~ . 47. 2 [\'2 + In (1 + v'2)]. 48. 27
1 3 1
49. 1 + - In- . 50. - In 3. 51. 6a. 52. 12. 53. (1r - 1)\12. 54. a\12. 55. 1ra.
2 2 2

56. 1r0✓ 1 + 41r 2 + ; In ((21r + ✓ 1 + 41r2 ) •


Chapter 11
Improper Integrals

11.1 Integrals with Infinite Limits of Integration


The notion of the definite integral involves a function which is de-
fined on a closed (bounded) interval [a, b] so that the interval of integration
is always bounded. Certain applications of integral calculus lead to integrals
of functions defined over unbounded domains; these are infinite semi-
intervals of forms [a, + oo), and ( - oo, b] or the infinite interval of the
form ( - oo, + oo). For example, we encounter such integrals when comput-
ing the potential of gravitational or electrostatic forces.
To make the notion of the definite integral applicable to unbounded
intervals of integration we must rely updn some new concepts and defini-
tions which clarify the meanings of the symbols
+ 00 b + 00
I f(x) dx, I f(x) dx, ) f(x) dx.
a -oo - 00

Let a function f(x) be defined for all x ~ a and integrable, say continu-
ous, on every finite closed interval a ~ x ~ b where a is a given number
and b (b ~ a) is any arbitrary number. To deduce what we mean by writing
+oo
I f(x)dx (*)
a
which is called the improper integral, we consider the function J(b) =
b
I /(x) dx of the variable b (b ;;;::: a).
a

Definition. If the function J(b) has a finite limit L as b--+ + oo the im-
proper integral (*) is said to converge.
In this case we write
+oo b
lim r /(x) dx = L.
J /(x) dx = b-++oo
a
J a

Suppose that the function J(b) has no (finite) limit as b --+ + oo. Then
the improper integral (*) is said to diverge; in this case no numerical value
is assigned to this integral.
11.l Integrals with Infinite Limits of Integration 507

+00

:x,amples. (1) The improper integral j dx


-----=-
1 + x2
converges and is equal
to 2. o

◄ Indeed, by the definition we have


+ 00

j dx
1+ X
2
_
-
.
11m
b-+ + oo
j
b
dx
1+ X
2 = .
11m tan - t b
b-++oo
= 21r . ►

0 0

+oo
(2) The improper integral I cos x dx diverges.
0
b
◄ Indeed, the integral I cos x dx = sin b has no limit as b ~ + oo; hence,
it diverges. ► 0

(3) Let two charges Q1 and Q2 be both positive so that they repel each
other. By the Coulomb's law the absolute value F of the force of electrostat-
ic interaction of two point charges in vacuum is given by
F = kq1q2
r2 '
where r is the distance between the charges and k is a constant.
Suppose that the charge q1 is located at the point Mo which is chosen
as the origin of some reference frame and the charge q2 is located at the
point M. We let r1 denote the distance between the ponts M and Mo and
compute the work done in moving the charge Q2 from M to infinity.
◄ The desired work Wis defined as the improper integral
+oo +oo

W = j k~? 2
dr = kq1 Q2 j dr
-2.
r

By the definition we have


+oo b

r
J
dr = lim
r2 b ➔ + oo
r dr
Jr 2
= lim
b-+ + oo
(-l)
r
r=b

r=r1
= lim
b-+ +oo
(
- 1- -
r1
!)
b
1

There1ore,
c W -- -kqi
-- . charge. Then W
Qi . Let q2 be a unit kqi
= --; th"1s
n ~
quantity is called the potential of the field induced by the charge Q1. ►
+oo

(4) Consider the integral j -dxxa, a = const.


1
508 I l. Improper Integrals

We wish to determine the values of a for which the integral converges


and those for which it diverges.
◄ By the definition of the improper integral we have
+ 00

J
l d:
X
= lim
b->+oo
1

Let a ;t:. 1. Then

x• -a b bl -a 1
1- Cc'. 1- Cc'. 1 - Cc'.

so that for a > 1 we have


b
lim l dx = lim ( bi - a
b _. + oo J Xa b -> + oo 1- Cc'.
1 b

Thus the original integral converges if a > 1. For a < 1 the integral J xadx

1
has no finite limit as b -+ + oo; in other words the integral diverges if a < 1.
Now let a = 1. Then
b b

l dx = f dx = In b.
J xa J X
1 1
b

Whence it follows that for a =l J dxxa


- -+ + oo as b -+ + oo.
Therefore we infer that 1
+oo

the integral
f dx ( converges if a > 1,
J
1
t
xa diverges if a ~ 1.

This result can easily be interpreted geometrically. Consider the region


D bounded from the left by the line x = 1, below by the x-axis and above
by the graph of the function y = llxa (Fig. 11.1). This region is unbounded
from the right, i.e., it extends to infinity. By convention we regard the limit
of the region bounded from the right by the line x = b provided that
b-+ + oo is the area of the infinite region D (Fig. 11.2). It is easy to see
that for a > 1 the area of the region D bounded above by the graph of
y = 1/xa is finite; if D is bounded above by the graph of the hyperbola
y = 1/x or of the curve y = 1/xa with a < 1 the notion of the area of D
has no meaning.
11.1 Integrals with Infinite Limits of Integration 509

y
cr=1

-r----a:<1
----a:=1
a>1
0 1 b X

Fig. 11.1

1 b X

Fig. 11.2

I
+ 00
dx
Remark. Notice that given any a ~ o> 0 the integral - - converges
X0t
a
for a > 1 and diverges for a ~ 1.
The definition of the improper integral
+ 00 b
1/(x) dx = lim
b-++oo
r /(x) dx
j
a a

implies directly the fallowing


510 11. Improper Integrals

+oo +00
(1) If the integral 1f(x) dx converges so does the integral 1 'A./(x) dx
a a
where 'A. is any real number and
+00 +oo
) 'A/(x) dx = 'A 1f(x) dx.
a a
+ 00 + 00
(2) If both 1/(x) dx and 1<P(X) dx converge so does
a a
+00
) (f(x) + 'P(x)) dx and
a

+00 +00 +00


1(f(x) + 'P(x)) dx = 1/(x) dx + 1<P(x) dx.
a a a

◄ Indeed, given any b > a there holds


b b b
) (f(x) + 'P(x)) dx = ) /(x) dx + ) 'P(x) dx.
a a a

Both integrals on the right have limits as b ➔ + oo; hence, the integral on
+oo
the left also has a limit as b ➔ + oo, i.e., 1(f(x) + <P(X)) dx converges.
a
Evaluating the limit of the above identity as b ➔ + oo, we obtain the desired
result. ►
+oo
Problem. Let the integral 1(f~x) + 'P(x)) dx converge. Determine
a
+00 +cxi
whether or not the improper integrals ) f(x) dx and ) 'P(x) dx converge.
a a
(3) Let u(x) and v(x) be continuously differentiable functions on the
line x ~ a. Then
+oo +00 + 00
I u dv = (u(x) v(x))
a a
) v du (integration by parts)
a

provided that at least two of the three terms involved make sense (notice
that in this case the third term also makes sense).
+ 00
Example. Integrate by parts Jn = I x'2e - x dx, where n is a natural
0

number or zero.
11.2 Integrals of Nonnegative Functions 511

◄ We have
+00 +00 +00
In = I xn e - x dx =- (xn e - X) +n j xn - le - x dx = nJn - 1 ,
0 0 0

n = 1, 2, ....
Notice that
+00
Jo = J e - x dx = 1
a

we get In = n ! ►

11.2 Integrals of Nonnegative Functions


In a variety of applications it suffices to determine whether a given
integral converges or not without having to evaluate it directly. Convergence
tests are convenient devices in many instances. Here we formulate some
valuable convergence tests for improper integrals in the forms of theorems.
Theorem 11.1. Let f(x) and c,o(x) be integrable functions on [a, b] for
any b > a and let O :::;; f(x) :::;;_ c,o(x) for all x ~ a. Then
+00 +00
(1) If J ip(x) dx converges so does j f(x) dx.
a a
+00 +00
(2) If J f(x) dx diverges so does j c,o(x) dx.
a a
+ 00
◄ First we suppose that the integral J ip(x) dx converges and prove that
a
+00
so does J f(x) dx. In other words we have to show that the function
a
b
J(b) = J f(x) dx has a finite limit as b -+ + oo.
a
Since /(x) is _a nonnegative function for all x ~ a J(b)must be a non-
decreasing function in b. Indeed, if b1 > b then
bl b bl b
J(b1) = I /(x) dx = I /(x) dx + J f(x) dx ~ I f(x) dx = J(b).
a a b a

Also, since /(x) ~ ip(x) for all x ~ a then given any b > a there holds
b b
j /(x) dx ~ j ip(x) dx.
a a
512 1I. Improper Integrals

b
The value of the integral J cp(x) dx is not larger than that of the in-
a
+ 00
tegral I cp(x) dx; the latter integral converges by the hypothesis. Hence,
a

given any b > a there holds


b + oo
J(b) = J f(x) dx ~ J <P(x) dx = L
a a
b
so that J(b) = I f(x) dx is a nondecreasing function in b which is
a
bounded above (as b-+ + oo). This means that J(b) has a finite limit as
+ 00
b -+ + oo and by virtue of the definition the integral J f(x) dx converges.
a
Now we prove the second part of the theorem. Let the integral
+oo
J f(x) dx diverge. Assume the converse, i.e., assume that the integral
a
+oo
J cp(x) dx converges. Then the first ·part of the theorem tells that the in-
a + oo
tegral J f(x) dx must converge and that contradicts to the hypothesis
a
of the theorem. Hence, our assumption is false and the integral
+oo
J <P(X) dx diverges. ►
a +oo

J 1 + x z + sin x
e- x2
Example. Investigate the improper integral 4 dx.
0
-x2
◄ Notice that for all x ~
. Ji(x) = - - -
0 the function . sue h
e - - - 1s
2 4 1 + x + sin x
that
e-x2 1
Q. < 1 + x 2 + sin4 x ~ 1 + X2 = <P(X).
+ 00

The integral J -1 -+dx~


x
converges (See Example (1) on p. 507). Thc11
2
0
by virtue of part (1) of Theorem 11.1 the integral in question also converges.
It is worth mentioning here that we fail to investigate this integral by apply
ing the definition of convergent integrals. ►
11.2 Integrals of Nonnegative Functions 513
---------------------------
Theorem 11.2. Let f(x) and cp(x) be continuous nonnegative functions
for all x ~ a and let cp(x) be distinct from zero for all sufficiently large
+oo +oo
x. Then both integrals J f(x) dx and J cp(x) dx either converge or diverge
a a
provided that there exists
lim f(x) = k ;e. 0.
x ..... + oo cp(x)

◄ Indeed, let
cpx
lim
x ..... +oo
ft~
= k > 0. By the definition of the limit this
k
means that given any number e > 0, say e = 2 > 0, there exists a number
N such that for all x ~ N
f(x) _ k k
<s=-
cp(x) 2
or, equivalently,

k < f(x) < 3 k.


2
2 cp(x)
Recall that cp(x) > O; then the above inequality becomes

~ cp(x) < f(x) < ~ kcp(x) 'dx ~ N.


Theorem 11.1 implies that
+oo +oo 3
(a) If J cp(x) dx converges so does J f(x) dx since f(x) < 2 kcp(x),
N N
+oo +oo k
(b) If J cp(x) dx diverges so does J f(x) dx since 2 cp(x) < f(x).
N N
+ 00
Proceeding in a similar way we infer that if J f(x) dx converges
N
+oo
(diverges) then J <P(X) dx also converges (diverges).
N
+oo +oo
These results remain valid for f f(x) dx and f cp(x) dx due to the fact
a a
+oo
that the integral f g(x) dx converges or diverges simultaneously with the
a
+oo
integral f g(x) dx, where p > a is any whatever large fixed number since
p
the difference of these integrals is also an improper integral. ►

33-9505
514 11. Improper Integrals

Example. Determine whether or not the improper integral


+ 00
2x2 + 1 dx
J1
x 3 + 3x + 4

converge.
2x 2 + 1
◄ The integrand f(x) = - - - - - 1s positive for all x ~ 1. Write
x 3 + 3x + 4
down f(x) as
+ 1/x2 2
f(x) = x + 3/x + 4/x2 •

Whence· we conclude that for large x the function f(x) is similar to the
function 2/x.
+oo +oo

Put ,p(x) = ~ and consider the integral J~ = J ,p(x) dx. This


1 1
integral diverges and by virtue of Theorem 11.2 so does the original integral
.
since
lim
x-. +oo
f(x) = lim
<P(X) x ➔ +oo X
\2x+ 3X+ l)x
2

+4
= 2 rt 0. ►
+ 00

Theorem 11.1 along with the results concerning the integral J ::


+ 00 1
lead to the following convergence test for the integral i f(x) dx involving
a nonnegative function f(x). 0

Theorem 11.3. Let there exist a number a > 1 such that for all suffi-
ciently large x

+oo
where Mis independent of x and M > 0. Then the integral J f(x) dx con-
a
verges.
If f(x) ~ M where M is independent of x and M > 0, whenever x is
X
+oo
sufficiently large, then the integral ) f(x) dx diverges.
a

◄ Let O ~ f(x) ~ ~(a> 1) for all x ~A > max (a, O}. Since the in-
+oo · X ·

= ~
tegral
Jf M dx where a > 1 converges then putting <P(X)
~ X
in The-
.
A
11.3 Absolutely Convergent Improper Integrals 515
+ 00
orem 11.1, we conclude that so does J f(x) dx. Whence it follows that
+00 A
J f(x) dx converges, for
a
b A b
) f(x) dx = J f(x) dx + J f(x) dx
a a A
b b
and the integrals J f(x) dx and J f(x) dx must simultaneously have finite
a A
limits as b ~ + oo.
M
Now let f(x) ~ - (M > 0) for all x ~ A > max {a, 0}. Since the in-
+ 00 X

tegral } ':dx diverges then, by virtue of Theorem 11.1, so does the in-
A
+oo +oo
tegral f f(x) dx; this implies that the integral J f(x) dx also diverges. ►
A a
+oo
x-2
Example. Investigate the integral } 3z dx.
x +x +2x+5
3
◄ For x ~ 3 we have
X - 2 X 1
0 < ~ -2 ~ - - - - <3- = -
x3 + x + 2x + 5 x x2 •
+ 00

The integral }
3
!~ converges smce "' = 2 > I; hence so does the

original integral. ►

11.3 Absolutely Convergent Improper Integrals


Let a function f(x) be defined for all x ~ a and integrable on every
[a, b] where b > a.
+00

Definition. The integral J f(x) dx is said to converge absolutely if the


a
+ 00
integral ) lf(x)I dx converges.
a +oo +oo
If the integral J f(x) dx converges and the integral J IJ(x)I dx diverges
a a
+ 00
we say that the integral J f(x) dx is conditionally convergent.
a
33*
516 11. Improper Integrals

+ 00
Theorem 11.4. If the integral J lf(x)I dx converges so does the integral
+00 a
j f(x) dx.
a
+ 00
◄ Let J lf(x)I dx converge, i.e.,
a
b
lim
b-++oo
J lf(x)I dx = L <
a
+ oo.
Since given any x in the domain of f(x) there holds
- lf(x)I ~ f(x) ~ lf(x)I
then
0 ~ lf(x)I + f(x) ~ 2lf(x)I.
+00
The integral J lf(x)I dx converges by the hypothesis; hence, so does
a
+00 +00
the integral j 2lf(x)I dx =2 j lf(x)I dx. Then by virtue of (•) Theo-
a a
+00
rem 11.1 yields that the integral J (f(x) + lf(x)I) dx converges. This means
b a
that the integral j (f(x) + lf(x)I) dx has a finite limit as b ~ + oo.
a
Evidently, we have for all x ~ a
f(x) = (f(x) + lf(x)I) - lf(x)I.
Whence for any b > a there holds
b b b
J f(x) dx = J (f(x) + lf(x)I) dx - J lf(x)I dx.
a a a
Both integrals on the right have finite limits as b ~ + oo. Hence, the integral
b + oo
J f(x) dx also has a finite limit as b ~ + oo, i.e., the integral J f(x) dx
a a
converges. ►
+ Theorem 11.1 and the results concerning the integrals of the form

I ::
00

leads to the following convergence test for the integral +r f(x) dx.
1 Theorem 11.5. Let there exist a number a > I such that for all suffi-
ciently large x there holds
M
lf<x>I ~ -
Xa.
,
11.3 Absolutely Convergent Improper Integrals 517

+oo
where Mis independent of x and M > 0. Then the integral J f(x) dx con-
a
verges absolutely.
◄ Suppose that the above condition holds for all x ≥ A > max{a, 0}. Since the integral ∫_A^{+∞} (M/x^α) dx with α > 1 converges, then by virtue of Theorem 11.1 so does the integral ∫_A^{+∞} |f(x)| dx. This implies that the integral ∫_a^{+∞} |f(x)| dx converges and, consequently, the integral ∫_a^{+∞} f(x) dx converges absolutely. ►
For example, the integral \int_1^{+\infty}\frac{\sin x}{x^2}\,dx converges absolutely since

\left|\frac{\sin x}{x^2}\right| \le \frac{1}{x^2} \quad\text{for all } x \ge 1,

and the integral ∫_1^{+∞} dx/x² converges.

Thus we infer that the integral ∫_a^{+∞} f(x) dx converges if it converges absolutely. The converse is not true: not every convergent integral converges absolutely. For example, the integral \int_1^{+\infty}\frac{\sin x}{x}\,dx, being convergent, does not converge absolutely.
◄ Indeed, applying integration by parts, we obtain

\int_1^{+\infty}\frac{\sin x}{x}\,dx = -\int_1^{+\infty}\frac{1}{x}\,d(\cos x) = -\left.\frac{\cos x}{x}\right|_1^{+\infty} - \int_1^{+\infty}\frac{\cos x}{x^2}\,dx = \cos 1 - \int_1^{+\infty}\frac{\cos x}{x^2}\,dx.

The integral \int_1^{+\infty}\frac{\cos x}{x^2}\,dx converges absolutely; hence, it is convergent. Thus both expressions on the right of the above identity are finite, so that (1) the method of integration by parts is justified and (2) the integral on the left is finite, i.e., the integral \int_1^{+\infty}\frac{\sin x}{x}\,dx converges.

Now we show that the integral \int_1^{+\infty}\frac{\sin x}{x}\,dx is conditionally convergent. In other words, we show that the integral \int_1^{+\infty}\frac{|\sin x|}{x}\,dx diverges.

Notice that for any b > 1 the inequality

|\sin x| \ge \sin^2 x = \frac{1-\cos 2x}{2}

implies that

\int_1^b \frac{|\sin x|}{x}\,dx \ge \frac12\int_1^b \frac{dx}{x} - \frac12\int_1^b \frac{\cos 2x}{x}\,dx.  (*)

The integral ∫_1^{+∞} dx/x diverges, for lim_{b→+∞} ∫_1^b dx/x = +∞. Applying integration by parts to the integral ∫_1^{+∞} (cos 2x)/x dx, we infer that it converges.

Evaluating the limit of (*) as b → +∞, we conclude that the right-hand side of (*) and, hence, the left-hand side of (*) diverge to infinity. This means that the integral \int_1^{+\infty}\frac{|\sin x|}{x}\,dx diverges, whence it follows that the original integral \int_1^{+\infty}\frac{\sin x}{x}\,dx is not absolutely convergent. ►
Now we turn our attention to Dirichlet's convergence test, which specifies sufficient conditions for an improper integral to converge.

Theorem 11.6 (Dirichlet's convergence test). If a function f(x) is continuous and has a bounded antiderivative F(x) for x ≥ a, and if a function g(x) is continuously differentiable and monotonically decreases for x ≥ a with lim_{x→+∞} g(x) = 0, then the integral ∫_a^{+∞} f(x) g(x) dx converges.

The proof of this theorem can be found in the relevant literature.

By way of illustration we apply Dirichlet's convergence test to the integral \int_1^{+\infty}\frac{\sin x}{x^{\alpha}}\,dx, α > 0.

◄ The function f(x) = sin x has the bounded antiderivative F(x) = −cos x for all x; g(x) = 1/x^α (α > 0) is a continuously differentiable function for x ≥ 1, and g(x) monotonically decreases and tends to zero as x → +∞. Hence, all the conditions of Theorem 11.6 are satisfied so that the integral in question converges. ►

Problem. Show that the Fresnel integral ∫_0^{+∞} sin x² dx converges. (Hint. Use the substitution x² = t.)
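The contrast between convergence and absolute convergence of ∫₁^{+∞} (sin x)/x dx discussed above is easy to see numerically. The sketch below (Python with NumPy; a crude midpoint rule stands in for careful quadrature, an assumption made purely for illustration) shows the truncated integral of sin x/x settling down while that of |sin x|/x keeps growing roughly like (1/2) ln b.

import numpy as np

def truncated(f, a, b, n=200_000):
    # crude midpoint rule, good enough for a qualitative picture
    x = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return np.sum(f(x)) * (b - a) / n

for b in (10.0, 100.0, 1000.0):
    conv = truncated(lambda x: np.sin(x) / x, 1.0, b)
    absv = truncated(lambda x: np.abs(np.sin(x)) / x, 1.0, b)
    print(f"b = {b:7.0f}:  int sin x/x = {conv:8.4f}   int |sin x|/x = {absv:8.4f}"
          f"   (1/2) ln b = {0.5 * np.log(b):6.4f}")
# The first column stabilizes (the integral converges); the second grows
# without bound, in step with (1/2) ln b, so the convergence is conditional.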

11.4 Cauchy Principal Value of the Improper Integral

We define the improper integral ∫_{−∞}^b f(x) dx by writing

\int_{-\infty}^b f(x)\,dx = \lim_{a\to-\infty}\int_a^b f(x)\,dx

and say that ∫_{−∞}^b f(x) dx converges if this limit exists; otherwise the improper integral is said to diverge.

If both the lower and upper limits of integration are infinite we set

\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{\substack{a\to-\infty\\ b\to+\infty}}\int_a^b f(x)\,dx

or

\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{\substack{N_1\to+\infty\\ N_2\to+\infty}}\int_{-N_1}^{N_2} f(x)\,dx,

where N₁ and N₂ tend to +∞ independently. It may happen that the improper integral thus defined has no meaning while there exists its Cauchy principal value (C.p.v.) given by the formula

\mathrm{C.p.v.}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{N\to+\infty}\int_{-N}^{N} f(x)\,dx,

i.e., when N₁ = N₂ = N. Then we say that the improper integral ∫_{−∞}^{+∞} f(x) dx converges in the sense of the Cauchy principal value.

Example. Investigate the integral \int_{-\infty}^{+\infty}\frac{x\,dx}{1+x^2}.

◄ We have

\int_{-N_1}^{N_2}\frac{x\,dx}{1+x^2} = \frac12\ln\frac{1+N_2^2}{1+N_1^2},

whence it follows that the integral \int_{-N_1}^{N_2}\frac{x\,dx}{1+x^2} has no limit as N₁ → +∞ and N₂ → +∞, so that the integral \int_{-\infty}^{+\infty}\frac{x\,dx}{1+x^2} diverges. On the other hand we have

\mathrm{C.p.v.}\int_{-\infty}^{+\infty}\frac{x\,dx}{1+x^2} = \lim_{N\to+\infty}\int_{-N}^{N}\frac{x\,dx}{1+x^2} = \lim_{N\to+\infty}\frac12\ln\frac{1+N^2}{1+N^2} = 0.

Hence, the integral in question converges in the sense of the Cauchy principal value. ►
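The effect of letting N₁ and N₂ tend to infinity at different rates, as opposed to symmetrically, is easy to reproduce numerically. The sketch below (plain Python; the antiderivative (1/2)ln(1 + x²) is used directly, so no quadrature library is needed) shows asymmetric truncations drifting to a nonzero value while symmetric ones are identically zero.

import math

def F(x):
    # antiderivative of x/(1+x^2)
    return 0.5 * math.log(1.0 + x * x)

# asymmetric truncation: N2 = 2*N1, the value tends to (1/2) ln 4, not 0
for N1 in (10.0, 100.0, 1000.0):
    N2 = 2.0 * N1
    print(f"N1 = {N1:6.0f}, N2 = {N2:6.0f}: integral = {F(N2) - F(-N1):.6f}")

# symmetric truncation: N1 = N2 = N, the value is exactly 0 for every N,
# which is the Cauchy principal value
for N in (10.0, 100.0, 1000.0):
    print(f"N = {N:6.0f}: symmetric integral = {F(N) - F(-N):.6f}")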

11.5 Improper Integrals of Unbounded Functions

A necessary condition for the definite (Riemann) integral ∫_a^b f(x) dx to exist is that the function f(x) be bounded on [a, b]. For example, if f(x) is integrable on [a, b₁], where b₁ < b, and is unbounded in a neighbourhood of x = b, then the definite (Riemann) integral of f(x) does not exist on [a, b]. Thus we need some concepts and definitions to generalize the notion of the definite integral to one involving an unbounded integrand.

Let f(x) be integrable on [a, b − ε] for any however small ε > 0 and be unbounded in (b − ε, b) (Fig. 11.3). To deduce what we mean by writing ∫_a^b f(x) dx we consider the function of ε (ε > 0)

J(\varepsilon) = \int_a^{b-\varepsilon} f(x)\,dx.

Definition. If the function J(ε) has a finite limit L as ε → 0 + 0 we say that the improper integral ∫_a^b f(x) dx converges and set

\int_a^b f(x)\,dx = \lim_{\varepsilon\to 0+0}\int_a^{b-\varepsilon} f(x)\,dx = L.

If the function J(ε) has no limit as ε → 0 + 0 the improper integral ∫_a^b f(x) dx is said to diverge; in this case no numerical value is assigned to this integral.
Example. Investigate the integral \int_0^1 \frac{dx}{\sqrt{1-x^2}}.

◄ The integrand f(x) = 1/√(1 − x²) is a continuous function on every [0, 1 − ε], where ε > 0; hence it is integrable there. On the other hand f(x) → +∞ as x → 1 − 0.

Fig. 11.3

Then we have

\lim_{\varepsilon\to 0+0}\int_0^{1-\varepsilon}\frac{dx}{\sqrt{1-x^2}} = \lim_{\varepsilon\to 0+0}\sin^{-1}(1-\varepsilon) = \sin^{-1} 1 = \frac{\pi}{2},

so that the improper integral in question converges. Notice that the substitution x = sin t reduces the original improper integral to \int_0^{\pi/2}\frac{\cos t}{\sqrt{1-\sin^2 t}}\,dt, whence we obtain the definite (Riemann) integral \int_0^{\pi/2} dt = \frac{\pi}{2} by putting the integrand equal to 1 at t = π/2. ►
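A direct numerical check mirrors the definition: evaluate ∫₀^{1−ε} dx/√(1 − x²) = sin⁻¹(1 − ε) for decreasing ε and watch it approach π/2. The short sketch below (plain Python, standard math module only) does exactly that.

import math

for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    J = math.asin(1.0 - eps)        # value of the truncated integral J(eps)
    print(f"eps = {eps:8.1e}:  J(eps) = {J:.9f}")
print(f"pi/2           = {math.pi / 2:.9f}")
# J(eps) -> pi/2 as eps -> 0+, so the improper integral converges to pi/2.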

Analogously, if a function f(x) is unbounded only in (a, a + ε), where ε > 0, we define the improper integral ∫_a^b f(x) dx as

\int_a^b f(x)\,dx = \lim_{\varepsilon\to 0+0}\int_{a+\varepsilon}^b f(x)\,dx.

This integral is called convergent if the finite limit on the right exists; otherwise it is divergent.

Example. Investigate the integral \int_0^1 \frac{dx}{x^{\alpha}}.

◄ By the definition we have

\int_0^1 \frac{dx}{x^{\alpha}} = \lim_{\varepsilon\to 0+0}\int_{\varepsilon}^1 \frac{dx}{x^{\alpha}}.

Since for α ≠ 1 there holds

\int_{\varepsilon}^1 \frac{dx}{x^{\alpha}} = \left.\frac{x^{1-\alpha}}{1-\alpha}\right|_{\varepsilon}^1 = \frac{1}{1-\alpha} - \frac{\varepsilon^{1-\alpha}}{1-\alpha},

then

\lim_{\varepsilon\to 0+0}\int_{\varepsilon}^1 \frac{dx}{x^{\alpha}} = \frac{1}{1-\alpha}

provided that α < 1.

If α > 1 the integral \int_{\varepsilon}^1 \frac{dx}{x^{\alpha}} has no finite limit as ε → 0 + 0.

Let α = 1. Then

\int_{\varepsilon}^1 \frac{dx}{x} = -\ln\varepsilon \to +\infty \quad\text{as } \varepsilon\to 0+0.

Thus the integral \int_0^1 \frac{dx}{x^{\alpha}} converges if α < 1 and diverges if α ≥ 1. ►
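The three regimes α < 1, α = 1 and α > 1 can be seen at a glance by tabulating the truncated integrals ∫_ε^1 dx/x^α from the explicit antiderivatives used above. A minimal sketch in plain Python:

import math

def J(eps, alpha):
    # truncated integral of 1/x^alpha over [eps, 1], from the antiderivative
    if alpha == 1.0:
        return -math.log(eps)
    return (1.0 - eps**(1.0 - alpha)) / (1.0 - alpha)

for alpha in (0.5, 1.0, 2.0):
    row = ", ".join(f"eps={eps:.0e}: {J(eps, alpha):12.3f}"
                    for eps in (1e-2, 1e-4, 1e-6))
    print(f"alpha = {alpha}:  {row}")
# alpha = 0.5: the values settle at 1/(1 - alpha) = 2   (convergence),
# alpha = 1  : the values grow like -ln(eps)            (divergence),
# alpha = 2  : the values blow up like 1/eps            (divergence).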

If the function f(x) on [a, b] is unbounded only in some neighbourhood of the point c, where a < c < b, then we set

\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \lim_{\substack{\varepsilon_1\to 0+0\\ \varepsilon_2\to 0+0}}\left(\int_a^{c-\varepsilon_1} f(x)\,dx + \int_{c+\varepsilon_2}^b f(x)\,dx\right).

We leave it to the reader to examine other discontinuities of f(x).

11.6 Improper Integrals of Unbounded Nonnegative Functions. Convergence Tests

Theorem 11.7. Let f(x) and φ(x) be integrable functions on [a, b − ε] for any however small positive ε and be unbounded in (b − ε, b). Let 0 ≤ f(x) ≤ φ(x) on [a, b). Then

(1) if the integral ∫_a^b φ(x) dx converges so does the integral ∫_a^b f(x) dx;

(2) if the integral ∫_a^b f(x) dx diverges so does the integral ∫_a^b φ(x) dx.

◄ Let the integral ∫_a^b φ(x) dx converge, so that there exists

\lim_{\varepsilon\to 0+0}\int_a^{b-\varepsilon}\varphi(x)\,dx = L.

We show that the integral ∫_a^b f(x) dx converges, i.e., that the function J(ε) = ∫_a^{b−ε} f(x) dx has a finite limit as ε → 0 + 0. Indeed, since f(x) ≥ 0 on [a, b), then for any ε > 0 (ε < b − a) the function J(ε) is nonnegative and nondecreasing as ε decreases. Also, since f(x) ≤ φ(x) for all x ∈ [a, b), for any ε > 0 there holds

\int_a^{b-\varepsilon} f(x)\,dx \le \int_a^{b-\varepsilon}\varphi(x)\,dx.

The value of the integral ∫_a^{b−ε} φ(x) dx does not exceed that of the convergent integral ∫_a^b φ(x) dx; hence, given any ε > 0 we have

J(\varepsilon) = \int_a^{b-\varepsilon} f(x)\,dx \le \int_a^b \varphi(x)\,dx = L.

Therefore the function J(ε) is nondecreasing as ε → 0 + 0 and is bounded above. This means that the finite limit of J(ε) exists as ε → 0 + 0 and, by virtue of the definition, the improper integral ∫_a^b f(x) dx converges.

The second part of this theorem can easily be proved by contradiction. ►
diction. ►
Theorem 11.8. Let f(x) and φ(x) be positive functions on [a, b) and let f(x) and φ(x) be unbounded only in some neighbourhood of the point x = b. Let there exist

\lim_{x\to b-0}\frac{f(x)}{\varphi(x)} = k > 0.

Then the integrals ∫_a^b f(x) dx and ∫_a^b φ(x) dx either both converge or both diverge.

We now turn to absolutely convergent improper integrals involving unbounded functions.

Definition. The integral ∫_a^b f(x) dx is said to converge absolutely if the integral ∫_a^b |f(x)| dx converges.

As in the case of improper integrals with infinite limits of integration, the following theorem is true.

Theorem 11.9. If the integral ∫_a^b |f(x)| dx converges so does the integral ∫_a^b f(x) dx.

Applying this theorem we can easily prove the following convergence test for improper integrals involving unbounded functions.

Theorem 11.10. Let f(x) be a function unbounded only in (b − ε, b), where ε > 0 is any however small number. Let there exist a positive number α < 1 such that for all x sufficiently close to b, x < b, there holds

|f(x)| \le \frac{M}{(b-x)^{\alpha}},

where M is independent of x and M > 0. Then the integral ∫_a^b f(x) dx converges absolutely.

Problem. Show that if for any x sufficiently close to b, x < b, there holds

|f(x)| \ge \frac{M}{b-x}, \quad M > 0,

then the integral ∫_a^b f(x) dx does not converge absolutely.

Remark. The substitution b − x = 1/t or x − a = 1/t reduces improper integrals of unbounded functions to those with infinite limits of integration; so the major results concerning the former integrals can be deduced from the theory of the latter.

11.7 Cauchy Principal Value of the Improper Integral Involving Unbounded Functions

Let f(x) be an integrable function on the closed intervals [a, c − ε] and [c + ε, b], where ε > 0, a < c < b, and let f(x) be unbounded in some neighbourhood of the point c. As before, we can write

\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \lim_{\substack{\varepsilon_1\to 0+0\\ \varepsilon_2\to 0+0}}\left(\int_a^{c-\varepsilon_1} f(x)\,dx + \int_{c+\varepsilon_2}^b f(x)\,dx\right).

Notice that the limit on the right must exist as ε₁ and ε₂ tend to zero independently.

Sometimes the improper integral thus defined has no meaning while there exists its Cauchy principal value given by the formula

\mathrm{C.p.v.}\int_a^b f(x)\,dx = \lim_{\varepsilon\to 0+0}\left(\int_a^{c-\varepsilon} f(x)\,dx + \int_{c+\varepsilon}^b f(x)\,dx\right),

where ε > 0 is the same in both integrals on the right. In this case we say that the improper integral converges in the sense of its Cauchy principal value.
Example. Investigate the integral \int_a^b \frac{dx}{x-c}, where c ∈ (a, b).

◄ We have

\int_a^{c-\varepsilon_1}\frac{dx}{x-c} + \int_{c+\varepsilon_2}^{b}\frac{dx}{x-c} = \ln|x-c|\Big|_a^{c-\varepsilon_1} + \ln|x-c|\Big|_{c+\varepsilon_2}^{b} = \ln\frac{b-c}{c-a} + \ln\frac{\varepsilon_1}{\varepsilon_2}.

The limit of the right-hand side as ε₁ and ε₂ tend to zero in an arbitrary manner does not exist. Put ε₁ = ε₂ = ε. Then as ε → 0 + 0 the limit of the right-hand side exists and is equal to the Cauchy principal value of the original integral, so that

\mathrm{C.p.v.}\int_a^b\frac{dx}{x-c} = \ln\frac{b-c}{c-a}, \quad a < c < b. ►
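The cancellation that produces the principal value is visible numerically: with ε₁ ≠ ε₂ the two one-sided integrals leave a residual ln(ε₁/ε₂), whereas equal ε's cancel it exactly. A small Python sketch with a = 0, b = 3, c = 1 (numbers chosen only for illustration):

import math

a, b, c = 0.0, 3.0, 1.0   # illustrative values with a < c < b

def two_sided(e1, e2):
    # int_a^{c-e1} dx/(x-c) + int_{c+e2}^b dx/(x-c), from ln|x - c|
    left = math.log(e1) - math.log(c - a)
    right = math.log(b - c) - math.log(e2)
    return left + right

print("e1 = e2 (principal value):")
for e in (1e-2, 1e-5, 1e-8):
    print(f"  e = {e:7.0e}: {two_sided(e, e):.6f}")      # -> ln((b-c)/(c-a))

print("e1 = 2*e2 (no common limit):")
for e in (1e-2, 1e-5, 1e-8):
    print(f"  e2 = {e:7.0e}: {two_sided(2*e, e):.6f}")   # offset by ln 2
print(f"ln((b-c)/(c-a)) = {math.log((b - c) / (c - a)):.6f}")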

Problem. Let a function f(x) be defined in the neighbourhood (−R, R) of the point x = 0 except possibly at x = 0 itself and let f(x) be unbounded as x → 0. Any function f(x) can be represented in a neighbourhood of x = 0 as the sum of an odd and an even function:

f(x) = \frac{f(x)-f(-x)}{2} + \frac{f(x)+f(-x)}{2} = f_1(x) + f_2(x).

Show that C.p.v. ∫_{−R}^{R} f(x) dx exists if so does the integral ∫_0^R f₂(x) dx, where f₂(x) is the even function \frac{f(x)+f(-x)}{2}.
Exercises

Using the definition of the improper integral determine whether or not the following integrals converge.

1. \int_0^{+\infty}\frac{x\,dx}{x^4+1}.  2. \int_1^{+\infty}\frac{x^2+1}{x^3}\,dx.  3. \int_0^{+\infty} x e^{-x^2}\,dx.  4. \int_2^{+\infty}\frac{dx}{x\ln x}.

Determine whether or not the following integrals converge.

5. \int_1^{+\infty}\frac{x\,dx}{x^3+x+1}.  6. \int_1^{+\infty}\frac{x+1}{x^2+x+5}\,dx.  7. \int_0^{+\infty}\frac{x^2\,dx}{x^4+x^2+1}.  8. \int_1^{+\infty}\frac{dx}{\sqrt{x^2+1}}.  9. \int_0^{+\infty}\frac{x\tan^{-1}x}{\sqrt{x^4+1}}\,dx.  10. \int_2^{+\infty}\frac{dx}{x\ln^a x} (a is a real number).  11.

12. Evaluate the improper integral \int_0^{+\infty} x^{2n+1} e^{-x^2}\,dx, where n is a positive integer.

13. Show that

\mathrm{C.p.v.}\int_{-\infty}^{+\infty}\sin x\,dx = 0 \quad\text{and}\quad \mathrm{C.p.v.}\int_{-\infty}^{+\infty}\frac{x\,dx}{1+x^2} = 0.

Using the definition of the improper integral determine whether or not the following integrals converge.

14. \int_1^2 \frac{dx}{\sqrt{x-1}}.  15. \int_0^1 x\ln x\,dx.  16. \int_0^1 \frac{dx}{\sqrt[3]{x}}.  17. \int_{-1}^0 \frac{e^{1/x}}{x^3}\,dx.  18. \int_0^1 \frac{e^{1/x}}{x^3}\,dx.  19. \int_{-1}^1 \frac{dx}{x^4}.

Investigate the following integrals.

20. \int_0^1 \frac{dx}{\sqrt{1-x^4}}.  21. \int_0^1 \frac{x^3\,dx}{\sqrt{(1-x^2)^5}}.  22. \int_0^1 \frac{dx}{e^{\sqrt{x}}-1}.  23. \int_0^1 \frac{dx}{e^x-\cos x}.  24. \int_0^{+\infty}\frac{\tan^{-1}x}{x^{\alpha}}\,dx, α ≥ 0.  25. \int_0^{+\infty}\frac{x^{\alpha}}{1+x^{\beta}}\,dx (α and β are real numbers).  26. \int_0^{+\infty}\frac{\sqrt{x}\cos x}{x+1}\,dx.
Answers

1. Converges; equal to π/4. 2. Diverges. 3. Converges; equal to 1/2. 4. Diverges. 5. Converges. 6. Diverges. 7. Converges. 8. Diverges. 9. Diverges. 10. Converges for a > 1. 11. Converges for a > 0. 12. n!/2. 14. Converges; equal to 2. 15. Converges; equal to −1/4. 16. Converges; equal to 3/2. 17. Converges; equal to −2/e. 18. Diverges. 19. Diverges. 20. Converges. 21. Diverges. 22. Converges. 23. Diverges. 24. Solution. The integral in question combines an integral with an infinite limit of integration and one involving an unbounded function. Indeed, its upper limit is infinite and its integrand f(x) = tan⁻¹x/x^α is not defined at the point x = 0 and is unbounded as x → 0 for sufficiently large α > 0. We divide the interval of integration into two parts so that the first subinterval contains the point of discontinuity of f(x) at x = 0 and the second is appropriate to examine the behaviour of f(x) as x → +∞. Consider, for example, the semiclosed intervals (0, 1] and [1, +∞). Then we have

\int_0^{+\infty}\frac{\tan^{-1}x}{x^{\alpha}}\,dx = \int_0^1\frac{\tan^{-1}x}{x^{\alpha}}\,dx + \int_1^{+\infty}\frac{\tan^{-1}x}{x^{\alpha}}\,dx.

Consider the first integral on the right. Recall that tan⁻¹x ~ x as x → 0; hence, f(x) behaves like 1/x^{α−1} near the point x = 0. This implies that for the integral ∫_0^1 (tan⁻¹x/x^α) dx to converge it is necessary that α − 1 < 1, or α < 2. The integrand f(x) = tan⁻¹x/x^α of the second integral on the right behaves like \frac{\pi}{2x^{\alpha}} as x → +∞, since tan⁻¹x → π/2 as x → +∞. For this integral to converge it is necessary that α > 1. Combining both conditions we infer that both integrals on the right converge only if 1 < α < 2; this is the condition for the original integral to converge. 25. Converges if α > −1 and β − α > 1. 26. Converges conditionally. Hint: apply the Dirichlet convergence test.
Chapter 12
Functions of Several Variables

12.1 Basic Notions and Notation

The notion of a function of one variable is somewhat inadequate to model the relations and phenomena we come across in the surrounding world. Many real-world functions depend on two or more variables. For example, the volume of the rectangular parallelepiped with edges x, y and z is given by the formula V = xyz, where x, y and z are all positive. In this case we have four quantities; the value of V is fully determined by the values of the three other variables x, y and z. This is one of the numerous applications of the notion of a function of several variables; the major topic of this chapter is the calculus of functions of several variables.

Let Rⁿ be the n-dimensional Euclidean space and let M′(x₁′, x₂′, ..., xₙ′) and M″(x₁″, x₂″, ..., xₙ″) be two points in this space. We let ρ(M′, M″) denote the distance between M′ and M″ and define

\rho(M', M'') = \sqrt{\sum_{k=1}^{n}(x_k'' - x_k')^2}.

If n = 1 this formula gives the distance ρ(M′, M″) = |x₁″ − x₁′| between the points M′(x₁′) and M″(x₁″) of a line; putting n = 2, we obtain the distance \rho(M', M'') = \sqrt{(x_1''-x_1')^2 + (x_2''-x_2')^2} between the points M′(x₁′, x₂′) and M″(x₁″, x₂″) in a plane.

Definition. Let M₀(x₁⁰, x₂⁰, ..., xₙ⁰) ∈ Rⁿ and let ε be a positive real number. The set of all M ∈ Rⁿ such that ρ(M, M₀) < ε is called the n-dimensional open ball with centre at M₀ and radius ε.

If n = 2 we have (x − x₀)² + (y − y₀)² < ε². This inequality defines a circular disk with centre at M₀(x₀, y₀) and radius ε. Notice that the circle bounding this disk is not included, as shown in Fig. 12.1.

For n = 3 we obtain a ball with radius ε and centre at M₀(x₀, y₀, z₀) specified by the inequality (x − x₀)² + (y − y₀)² + (z − z₀)² < ε². Again the bounding sphere is not included (Fig. 12.2).
We shall also consider a somewhat different neighbourhood of a point M₀(x₁⁰, x₂⁰, ..., xₙ⁰) that can be called a rectangular neighbourhood. We mean here the set of all points M(x₁, x₂, ..., xₙ) in Rⁿ such that

x_i^0 - \varepsilon_i < x_i < x_i^0 + \varepsilon_i, \quad \varepsilon_i > 0, \quad i = 1, 2, \ldots, n.

Putting n = 1, we obtain the ε-neighbourhood x₀ − ε < x < x₀ + ε of the point x₀ well familiar to us from the calculus of functions of one variable; for n = 2 we get the plane figure bounded by the rectangle with sides 2ε₁ and 2ε₂ (the sides excluded, Fig. 12.3), and for n = 3 we have the open solid bounded by the parallelepiped with centre at M₀(x₀, y₀, z₀) and edges 2ε₁, 2ε₂ and 2ε₃ (Fig. 12.4).

Fig. 12.4

Definitions. Let E ⊂ Rⁿ. The point M ∈ E is called an interior point of the set E if there exists ε > 0 such that E contains M along with its ε-neighbourhood.

The set E is called an open set if E contains only interior points. For example, for n = 2 any circular disk is an open set.

The point P, P ∈ Rⁿ, is called a boundary point of the set E ⊂ Rⁿ if any neighbourhood of P contains both points belonging to E and points not belonging to E.

The set of all boundary points of E is called the boundary of E; we denote the boundary of E by ∂E.

The union of the set E and its boundary ∂E forms the closed set Ē = E ∪ ∂E. For example, the union of a circular disk and the circle bounding the disk is a closed set.

The set E ⊂ Rⁿ is called connected if any two points of E can be joined by a continuous curve (or a polygonal line) fully contained in E (Fig. 12.5a). Otherwise the set is called unconnected (Fig. 12.5b).

An open connected set is called a domain. A domain is said to be bounded if there exists a ball that contains this domain.

Any domain that contains a given point M₀ is called a neighbourhood of M₀ (in distinction from the ε-neighbourhood of M₀).

Notion of a function of several variables. Suppose that there is known a rule by which every point M(x₁, x₂, ..., xₙ) of a set E in the n-dimensional Euclidean space Rⁿ is associated with a real number u. Then we say that a function of the point M, or a function of n variables x₁, x₂, ..., xₙ, is defined on E, and write

u = f(M) \quad\text{or}\quad u = f(x_1, x_2, \ldots, x_n), \quad M \in E.

The set E is called the domain of the function f.

Fig. 12.5. (a) A connected set; (b) an unconnected set

We shall mainly confine our analysis to functions of two variables z = f(x, y); the major results can easily be generalized to functions of more than two variables.

When a function is specified analytically by a single formula and its domain is not indicated in advance, we shall regard the set of all points M(x₁, x₂, ..., xₙ) where the formula attains a finite real value as the domain of this function. For example, the domain of the function z = x + y is the whole xy-plane, and the domain of the function z = √(1 − x² − y²) is the disk x² + y² ≤ 1.

Let a function z = f(x, y) be defined on a domain E in the xy-plane. Then every point (x, y) ∈ E corresponds to the point (x, y, f(x, y)) in three-dimensional space. The set of all points (x, y, f(x, y)), where (x, y) ∈ E, is called the graph of the function z = f(x, y). For example, the paraboloid of revolution is the graph of the function z = x² + y² (Fig. 12.6).

To investigate the behaviour and visualize the shape of the function z = f(x, y) it is helpful to turn to level curves. A level curve is the set of points in the xy-plane where the value of the function is constant, so that f(x, y) = c. It can be constructed by intersecting the surface z = f(x, y) with the plane z = c parallel to the xy-plane and taking the vertical projection of the line of intersection onto the xy-plane. A collection of level curves f(x, y) = c_m, m = 1, 2, ..., k, where c_{m+1} − c_m = h = const, provides useful information about the function. It is easily seen that the closer the level curves are to each other, the higher the rate of change of the function. For the function z = x² + y² the level curves are circles centred at the origin of coordinates (Fig. 12.7, where h = 1).

Fig. 12.6   Fig. 12.7

In the case of functions of three variables we can turn to level surfaces.


The level surface of the function u = f(x, y, z) is the set of points
M(x, y, z) in space where the value of u = f(M) is constant. For example,
the level surfaces of the function u = x 2 + y 2 + z 2 are spheres centred at
the origin of coordinates.
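Level curves are easy to generate numerically. The sketch below (Python with NumPy and Matplotlib, which is an assumption about the available plotting tools) draws the circles x² + y² = c for c = 1, 2, ..., 6; the tightening of the rings away from the origin reflects the growing rate of change of the function.

import numpy as np
import matplotlib.pyplot as plt  # assumed available for plotting

x = np.linspace(-2.5, 2.5, 400)
y = np.linspace(-2.5, 2.5, 400)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2                      # the function z = x^2 + y^2

# level curves f(x, y) = c for c = 1, 2, ..., 6 (spacing h = 1)
cs = plt.contour(X, Y, Z, levels=range(1, 7))
plt.clabel(cs, inline=True, fontsize=8)
plt.gca().set_aspect("equal")
plt.title("Level curves of z = x^2 + y^2")
plt.show()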

12.2 Limits and Continuity

Definition. Let f(M) be a function defined in some neighbourhood D of the point M₀(x₀, y₀) except possibly at M₀ itself. The number A is called the limit of f(M) at the point M₀(x₀, y₀) if given any number ε > 0 there exists a number δ > 0 such that for all points M(x, y) ∈ D distinct from M₀(x₀, y₀) there holds

|f(M) - A| < \varepsilon \quad\text{whenever}\quad 0 < \rho(M, M_0) < \delta.

In symbols we write

A = \lim_{M\to M_0} f(M) \quad\text{or}\quad A = \lim_{\substack{x\to x_0\\ y\to y_0}} f(x, y).

We assume here that the point M may tend to M₀ in an arbitrary manner (along any direction or subject to any law), and all the limiting values of f(M) thus obtained must be equal to the number A.

Examples. (1) The function f(x, y) = x² + y² is defined on the xy-plane and f(0, 0) = 0. We show that the limit of this function at the point O(0, 0) is equal to zero.

◄ Consider any ε > 0. Then the condition |f(x, y) − 0| < ε becomes |x² + y² − 0| < ε, or |x² + y²| < ε. Noting that √(x² + y²) = ρ(M, O), where M(x, y) is the point with coordinates (x, y), we may write |x² + y²| < ε as ρ²(M, O) < ε, or ρ(M, O) < √ε. Put δ = √ε. Then for any M(x, y) such that ρ(M, O) < δ = √ε we have |x² + y² − 0| < ε, or |f(x, y) − 0| < ε (Fig. 12.8). By the definition of the limit this means that the number A = 0 is the limit of the given function at the point O(0, 0). ►

Fig. 12.8

(2) The function f(x, y) = \frac{2xy}{x^2+y^2} is defined everywhere except at the point O(0, 0). We investigate the behaviour of f(x, y) as (x, y) tends to zero along the lines y = kx, x ≠ 0.

◄ The lines defined by the equation y = kx pass through the origin of coordinates. We have

f(x, kx) = \frac{2kx^2}{(1+k^2)x^2}, \quad x \ne 0.

Whence

f(x, kx) \to \frac{2k}{1+k^2} \quad\text{as } x \to 0,

so that for different values of k the limiting values of the function are different. This means that the given function has no limit at the point O(0, 0). ►
(3) The function f(x, y) = \frac{x^2 y}{x^4+y^2} is defined everywhere on the xy-plane except at the origin O(0, 0) of coordinates. Again we investigate the behaviour of this function as (x, y) approaches zero along the lines y = kx, x ≠ 0.

◄ We have

f(x, kx) = \frac{kx^3}{x^4 + k^2x^2}, \quad x \ne 0,

so that f(x, kx) → 0 as x → 0. Hence, f(x, kx) has a limit equal to zero along any line y = kx, i.e., along any straight line through the origin of coordinates. If, however, we put y = x², then f(x, x²) = 1/2 for x ≠ 0. This means that the limit also exists when the point tends to the origin along the parabola y = x², but this limit is equal to 1/2. Thus the given function has no limit at the point O(0, 0). ►
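The path dependence described in Examples (2) and (3) can be checked directly by sampling the functions along different approach paths. A small sketch (plain Python), with the path parameter t shrinking toward zero:

f2 = lambda x, y: 2 * x * y / (x**2 + y**2)      # Example (2)
f3 = lambda x, y: x**2 * y / (x**4 + y**2)       # Example (3)

for t in (1e-1, 1e-3, 1e-6):
    print(f"t = {t:7.0e}:",
          f"f2 on y=x: {f2(t, t):.4f},",
          f"f2 on y=2x: {f2(t, 2*t):.4f},",
          f"f3 on y=x: {f3(t, t):.4e},",
          f"f3 on y=x^2: {f3(t, t*t):.4f}")
# f2 approaches 1 along y = x but 0.8 along y = 2x: no limit at the origin.
# f3 approaches 0 along every straight line, yet stays at 1/2 along the
# parabola y = x^2, so it has no limit at the origin either.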
Theorem 12.1. Let f(M) and φ(M) have limits at the point M₀. Then at M₀ there exist the limits of the sum f(M) + φ(M), the difference f(M) − φ(M), the product f(M)φ(M) and the quotient \frac{f(M)}{\varphi(M)} (provided that \lim_{M\to M_0}\varphi(M) \ne 0), and

\lim_{M\to M_0}\bigl(f(M) \pm \varphi(M)\bigr) = \lim_{M\to M_0} f(M) \pm \lim_{M\to M_0}\varphi(M),

\lim_{M\to M_0}\bigl(f(M)\cdot\varphi(M)\bigr) = \lim_{M\to M_0} f(M) \cdot \lim_{M\to M_0}\varphi(M),

\lim_{M\to M_0}\frac{f(M)}{\varphi(M)} = \frac{\lim_{M\to M_0} f(M)}{\lim_{M\to M_0}\varphi(M)} \quad \Bigl(\lim_{M\to M_0}\varphi(M) \ne 0\Bigr).
It is sometimes helpful to use the following definition of the limit, which is equivalent to the preceding one.

Definition. Let a function f(M) be defined in some deleted neighbourhood Ω of a point M₀, i.e., in some neighbourhood of M₀ which does not contain M₀ itself. The number A is called the limit of the function f(M) at M₀ if for any sequence of points {Mₙ} that converges to M₀ (Mₙ ∈ Ω, Mₙ ≠ M₀) the corresponding sequence {f(Mₙ)} of the values of f(M) converges to A.
Remark. The above notion of the limit implies that all variables tend simultaneously to their limiting values, i.e., (x, y) → (x₀, y₀). Sometimes we have to deal with limits obtained by letting the variables tend to their limiting values in succession; here there are salient features peculiar to functions of several variables. By way of illustration we consider the function f(x, y) = \frac{x^2-y^2}{x^2+y^2}, defined everywhere except at the point O(0, 0). Letting x tend to zero for y = const ≠ 0, we have lim_{x→0} f(x, y) = −1. If we now put x = const ≠ 0 and let y tend to zero, then lim_{y→0} f(x, y) = 1. Thus we have

\lim_{y\to 0}\Bigl[\lim_{x\to 0} f(x, y)\Bigr] \ne \lim_{x\to 0}\Bigl[\lim_{y\to 0} f(x, y)\Bigr].

This means that the limit of f(x, y) depends upon the order in which we let the variables tend to their limiting values.
Definition. Let a function f(M) be defined at a point M₀(x₀, y₀) and in some neighbourhood Ω of M₀. The function f(M) is called continuous at M₀(x₀, y₀) if lim_{M→M₀} f(M) = f(M₀) or, equivalently, if lim_{x→x₀, y→y₀} f(x, y) = f(x₀, y₀). We assume here that the point M(x, y) approaches M₀(x₀, y₀) in an arbitrary manner, being always contained in the domain of f(M).

Using the "ε–δ" definition of continuity of a function at a point, we say that a function f(M) defined at the point M₀ and in some neighbourhood Ω of M₀ is continuous at M₀ if given any ε > 0 there exists δ > 0 such that for all points M ∈ Ω there holds |f(M) − f(M₀)| < ε whenever ρ(M, M₀) < δ.
We can also give a slightly different definition of continuity of a function at a point M₀. We let Δx and Δy denote the increments of the independent variables x and y in moving from the point M₀(x₀, y₀) to the point M(x, y). Let Δz = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀) be the increment in z = f(x, y) corresponding to Δx and Δy. Then the expression

\lim_{\substack{x\to x_0\\ y\to y_0}} f(x, y) = f(x_0, y_0)

becomes equivalent to

\lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}} \Delta z = 0,

which describes the condition for the function z = f(x, y) to be continuous at the point M₀(x₀, y₀). We assume here again that the increments Δx, Δy can approach zero in an arbitrary manner independently of each other.
The preceding notion refers to the continuity of a function with respect to all the variables simultaneously; it implies that if a function f(x, y) is continuous at some point M₀(x₀, y₀) then f(x, y) is continuous at M₀(x₀, y₀) with respect to both the variable x and the variable y.

However, the converse is not always true: if a function f(x, y) is continuous at some point M₀ with respect to both x and y separately, f(x, y) is not necessarily continuous at M₀. To illustrate this we consider the function

f(x, y) = \begin{cases} \dfrac{2xy}{x^2+y^2}, & x^2+y^2 \ne 0,\\[1mm] 0, & x = y = 0.\end{cases}

Clearly, we have f(x, 0) = 0 for all x, so that lim_{x→0} f(x, 0) = 0 = f(0, 0). Hence, f(x, 0) is continuous with respect to x at x = 0. Similarly, the function f(0, y) is continuous with respect to y at y = 0, since f(0, y) = 0 for all y, so that lim_{y→0} f(0, y) = 0 = f(0, 0). However, the function f(x, y) is not continuous at the point O(0, 0). Indeed, let y = x. Then

\lim_{x=y\to 0} f(x, y) = \lim_{x\to 0}\frac{2x^2}{x^2+x^2} = 1 \ne f(0, 0).

The function fails to be continuous because in checking continuity with respect to each variable separately we let x and y approach their limiting values only along the x- and y-axes and leave aside the infinitely many other directions from which (x, y) may approach the point O(0, 0).
Theorem 12.2. If f(M) and φ(M) are both continuous at the point M₀, then so are the sum f(M) + φ(M), the product f(M)φ(M) and the difference f(M) − φ(M), and so is the quotient \frac{f(M)}{\varphi(M)} provided that φ(M₀) ≠ 0.

If a function f(M) is continuous at every point of a domain D, f(M) is called continuous on the domain D. A point where the function f(M) is not continuous is called a discontinuity of f(M). Discontinuities of f(x, y) can either be isolated or fill some curves. For example, the function f(x, y) = \frac{1}{x^2+y^2} has the only discontinuity O(0, 0), while the discontinuities of the function f(x, y) = \frac{1}{x^2-y^2} form the lines y = x and y = −x.

Theorem 12.3. If the function f(M) is continuous on the closed bounded


domain D then f(M) is bounded on D and attains its absolute maximum
and absolute minimum in D.
This theorem is a generalization of the respective theorems for functions
of one variable (see Sec. 7.3).

12.3 Partial Derivatives and Differentials

Let a function z = f(x, y) be defined on some domain D in the xy-plane and let (x, y) be an interior point of D. Consider the increment Δx in x such that (x + Δx, y) ∈ D (Fig. 12.9).

Fig. 12.9

The increment

\Delta_x z = f(x + \Delta x, y) - f(x, y)

is called the partial increment in z caused by the increment Δx in x.

We let \frac{\Delta_x z}{\Delta x} denote the ratio of the partial increment in z to the respective increment in x; clearly, this ratio is a function of Δx.

Definition. If the ratio \frac{\Delta_x z}{\Delta x} has a finite limit as Δx → 0, this limit is called the partial derivative of the function z = f(x, y) at the given point (x, y) with respect to the independent variable x.

We shall denote the partial derivative of z = f(x, y) with respect to x by writing

\frac{\partial z}{\partial x} \quad\text{or}\quad f_x'(x, y) \quad\text{or}\quad z_x'(x, y).

Therefore we put

\frac{\partial z}{\partial x} = \lim_{\Delta x\to 0}\frac{\Delta_x z}{\Delta x}

or, equivalently,

f_x'(x, y) = \lim_{\Delta x\to 0}\frac{f(x+\Delta x, y) - f(x, y)}{\Delta x}.

Analogously, we have

\frac{\partial z}{\partial y} = \lim_{\Delta y\to 0}\frac{\Delta_y z}{\Delta y} = \lim_{\Delta y\to 0}\frac{f(x, y+\Delta y) - f(x, y)}{\Delta y}.

Let u = f(x₁, x₂, ..., xₙ) be a function of n variables. Then

\frac{\partial u}{\partial x_k} = \lim_{\Delta x_k\to 0}\frac{f(x_1,\ldots,x_{k-1},x_k+\Delta x_k,x_{k+1},\ldots,x_n) - f(x_1,\ldots,x_{k-1},x_k,\ldots,x_n)}{\Delta x_k}.

Noting that Δₓz and Δ_yz are computed by regarding y and x, respectively, as constants, we can give the following definitions of partial derivatives.

The partial derivative of a function z = f(x, y) with respect to the variable x is the ordinary derivative with respect to x computed by regarding y as a constant; similarly, the partial derivative of z = f(x, y) with respect to y is the ordinary derivative with respect to y computed by regarding x as a constant. Whence it follows that ordinary and partial derivatives are subject to the same rules of differentiation.

Example. Compute the partial derivatives of the function z = e^{xy}.

◄ We have

\frac{\partial z}{\partial x} = y e^{xy} \quad\text{and}\quad \frac{\partial z}{\partial y} = x e^{xy}. ►
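Symbolic differentiation packages apply exactly the rule just stated: differentiate with respect to one variable while treating the other as a constant. A minimal sketch using SymPy (assumed to be available) reproduces the example z = e^{xy}:

import sympy as sp

x, y = sp.symbols("x y")
z = sp.exp(x * y)

dz_dx = sp.diff(z, x)   # y is treated as a constant
dz_dy = sp.diff(z, y)   # x is treated as a constant
print(dz_dx)            # y*exp(x*y)
print(dz_dy)            # x*exp(x*y)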
Remark. If a function z = f(x, y) has partial derivatives with respect to every variable at some point, z = f(x, y) is not necessarily continuous at this point. For example, the function

f(x, y) = \begin{cases} \dfrac{xy}{x^2+y^2}, & x^2+y^2 \ne 0,\\[1mm] 0, & x = y = 0\end{cases}

is not continuous at the point O(0, 0) but has both a partial derivative with respect to x and one with respect to y, since f(x, 0) ≡ 0 and f(0, y) ≡ 0, so that

\left.\frac{\partial f}{\partial x}\right|_{(0,0)} = 0 \quad\text{and}\quad \left.\frac{\partial f}{\partial y}\right|_{(0,0)} = 0.

Geometric interpretation of partial derivatives. Consider a surface S specified by the equation z = f(x, y) in three-dimensional space, where f(x, y) is a continuous function having partial derivatives on some domain D. We wish to interpret geometrically the partial derivatives of f(x, y) at a point M₀(x₀, y₀) ∈ D, which corresponds to the point N₀(x₀, y₀, f(x₀, y₀)) on the surface z = f(x, y).

On computing the partial derivative ∂z/∂x at M₀ we think of z = f(x, y) as a function of one variable x and treat y as a constant, y = y₀, i.e., z = f(x, y₀) = f₁(x).

The function z = f₁(x) specifies the curve L obtained by intersecting the surface S with the plane y = y₀ (Fig. 12.10). Recalling the geometric interpretation of ordinary derivatives, we can write f₁′(x₀) = tan α, where α is the angle between the x-axis and the tangent to the curve L at the point N₀. Since f_1'(x_0) = \left(\frac{\partial z}{\partial x}\right)_{(x_0, y_0)}, then \left(\frac{\partial z}{\partial x}\right)_{M_0} = \tan\alpha.

Fig. 12.10

Thus \left(\frac{\partial z}{\partial x}\right)_{M_0} is the slope of the tangent at N₀ to the curve formed by intersecting the surface z = f(x, y) with the plane y = y₀.

Analogously, \left(\frac{\partial z}{\partial y}\right)_{M_0} = \tan\beta.

Differentiable functions. Let a function z = f(x, y) be defined on some domain D in the xy-plane and let (x, y) be some point in D. We let Δx and Δy denote the increments in x and in y such that (x + Δx, y + Δy) ∈ D.

Definition. The function z = f(x, y) is said to be differentiable at (x, y) ∈ D if the total increment

\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y)

corresponding to the increments Δx and Δy admits a representation of the form

\Delta z = A\,\Delta x + B\,\Delta y + \alpha(\Delta x, \Delta y)\,\Delta x + \beta(\Delta x, \Delta y)\,\Delta y,  (*)

where A and B are independent of Δx and Δy (though they depend upon x and y in general) and α(Δx, Δy) and β(Δx, Δy) tend to zero as Δx and Δy approach zero.

If a function z = f(x, y) is differentiable at a point (x, y), the linear (relative to Δx and Δy) part A Δx + B Δy of its increment is called the differential of z = f(x, y) at the point (x, y).

We denote the differential of a function z = f(x, y) by dz and write

dz = A\,\Delta x + B\,\Delta y.

Thus, Δz = dz + α Δx + β Δy.

Example. Find the differential of the function z = x² + y².

◄ Given any point (x, y) and any increments Δx and Δy there holds

\Delta z = (x + \Delta x)^2 + (y + \Delta y)^2 - x^2 - y^2 = 2x\,\Delta x + 2y\,\Delta y + \Delta x\cdot\Delta x + \Delta y\cdot\Delta y.

Here we have A = 2x, B = 2y, α(Δx, Δy) = Δx and β(Δx, Δy) = Δy. Hence α and β tend to zero as Δx → 0 and Δy → 0. By the definition of differentiable functions the given function is differentiable at every point in the xy-plane and dz = 2x Δx + 2y Δy. ►

Notice that we have not excluded the case when either Δx or Δy or both are equal to zero.

We can abbreviate formula (*) by introducing the distance between the points (x, y) and (x + Δx, y + Δy) as

\rho = \sqrt{(\Delta x)^2 + (\Delta y)^2}.

Then we can write

\alpha\,\Delta x + \beta\,\Delta y = \left(\alpha\,\frac{\Delta x}{\rho} + \beta\,\frac{\Delta y}{\rho}\right)\rho = \varepsilon\rho \qquad (\rho \ne 0),

where \varepsilon = \alpha\,\frac{\Delta x}{\rho} + \beta\,\frac{\Delta y}{\rho} depends upon Δx and Δy and tends to zero as Δx → 0 and Δy → 0 or, using the abbreviated notation, as ρ → 0.

Thus formula (*), expressing the condition for the function z = f(x, y) to be differentiable, becomes

\Delta z = A\,\Delta x + B\,\Delta y + \varepsilon\rho,

where ε = ε(ρ) → 0 as ρ → 0. For instance, turning to the previous example, we have Δz = 2x Δx + 2y Δy + ((Δx)² + (Δy)²) = 2x Δx + 2y Δy + ρ², where ε(ρ) = ρ.
Necessary conditions for a function to be differentiable. We shall prove the following important theorems.

Theorem 12.4. If a function z = f(x, y) is differentiable at some point then f(x, y) is continuous at this point.

◄ Indeed, if z = f(x, y) is differentiable at a point (x, y), then the increment in z at (x, y) corresponding to the increments Δx and Δy admits a representation of the form

\Delta z = A\,\Delta x + B\,\Delta y + \alpha\,\Delta x + \beta\,\Delta y,

where A and B are constant at (x, y) and α → 0 and β → 0 as Δx → 0 and Δy → 0, whence \lim_{\Delta x\to 0,\,\Delta y\to 0}\Delta z = 0. This means that the function z = f(x, y) is continuous at the point (x, y). ►
Theorem 12.5. If a function z = f(x, y) is differentiable at a point then f(x, y) has partial derivatives \frac{\partial z}{\partial x} and \frac{\partial z}{\partial y} at this point.

◄ Indeed, let z = f(x, y) be differentiable at a point (x, y). Then the increment Δz corresponding to the increments Δx and Δy can be written as

\Delta z = A\,\Delta x + B\,\Delta y + \alpha(\Delta x, \Delta y)\,\Delta x + \beta(\Delta x, \Delta y)\,\Delta y.

Put Δx ≠ 0 and Δy = 0. Then the above formula gives

\Delta_x z = A\,\Delta x + \alpha(\Delta x, 0)\,\Delta x.

Whence

\frac{\Delta_x z}{\Delta x} = A + \alpha(\Delta x, 0).

Since the value of A is independent of Δx and α(Δx, 0) → 0 as Δx → 0, then

\lim_{\Delta x\to 0}\frac{\Delta_x z}{\Delta x} = A.

This implies that at the point (x, y) there exists the partial derivative of z = f(x, y) with respect to x and

\frac{\partial z}{\partial x} = A.

By similar reasoning we infer that at the point (x, y) there exists the partial derivative of z = f(x, y) with respect to y and \frac{\partial z}{\partial y} = B. ►

It is worth mentioning here that Theorem 12.5 states that the partial derivatives exist at the point (x, y); however, it tells nothing about the continuity and behaviour of the partial derivatives at (x, y) and in a neighbourhood of the point (x, y).

From Theorem 12.5 it follows that

\Delta z = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y + \alpha\,\Delta x + \beta\,\Delta y.

Sufficient conditions for a function to be differentiable. Recall that for a function y = f(x) of one variable to be differentiable at a point x₀ it is necessary and sufficient that y = f(x) have a derivative f′(x₀) at x₀. However, we cannot set forth similar necessary and sufficient conditions even for a function z = f(x, y) of two variables; the sufficient conditions exist apart from the necessary conditions discussed above. The former are given by the following theorem.

Theorem 12.6. Let a function z = f(x, y) have partial derivatives f_x' and f_y' in some neighbourhood of a point (x₀, y₀) and let f_x' and f_y' be continuous at (x₀, y₀). Then z = f(x, y) is differentiable at (x₀, y₀).

The proof can be found in the relevant literature.
Example. Investigate the function f(x, y) = \sqrt[3]{xy}.

◄ This function is defined everywhere. By the definition of the partial derivative we have

f_x'(0, 0) = \lim_{\Delta x\to 0}\frac{f(\Delta x, 0) - f(0, 0)}{\Delta x} = \lim_{\Delta x\to 0}\frac{\sqrt[3]{\Delta x\cdot 0} - 0}{\Delta x} = 0

and

f_y'(0, 0) = \lim_{\Delta y\to 0}\frac{f(0, \Delta y) - f(0, 0)}{\Delta y} = \lim_{\Delta y\to 0}\frac{\sqrt[3]{0\cdot\Delta y} - 0}{\Delta y} = 0.

To decide whether f(x, y) is differentiable at the point O(0, 0) we compute the increment of f(x, y) at O(0, 0); we have

\Delta f(0, 0) = \sqrt[3]{\Delta x\,\Delta y} = \varepsilon(\Delta x, \Delta y)\,\rho.

Since ρ = \sqrt{(\Delta x)^2 + (\Delta y)^2}, then

\varepsilon(\Delta x, \Delta y) = \frac{\sqrt[3]{\Delta x\,\Delta y}}{\sqrt{(\Delta x)^2 + (\Delta y)^2}}.

For the function f(x, y) to be differentiable at O(0, 0) it is necessary that the function ε(Δx, Δy) be an infinitesimal as Δx → 0 and Δy → 0. Putting Δy = Δx > 0, we can write ε(Δx, Δy) as

\varepsilon(\Delta x, \Delta y) = \frac{(\Delta x)^{2/3}}{\sqrt{2}\,\Delta x} = \frac{1}{\sqrt{2}\,(\Delta x)^{1/3}}.

It is easy to see that ε(Δx, Δy) → ∞ as Δx → 0, so that f(x, y) = \sqrt[3]{xy} is not differentiable at O(0, 0) although f(x, y) has partial derivatives f_x' and f_y' at O(0, 0). This result is attributed to the discontinuity of f_x' and f_y' at O(0, 0). ►
Total differential and partial differentials. If a function z = f(x, y) is differentiable then its total differential is given as

dz = A\,\Delta x + B\,\Delta y.

Noting that A = \frac{\partial z}{\partial x} and B = \frac{\partial z}{\partial y}, we can write dz as

dz = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y.

To extend the notion of the differential we put the differentials of the independent variables equal to their increments, so that

dx = \Delta x \quad\text{and}\quad dy = \Delta y.

Then the total differential of f(x, y) can be written in the form

dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.

Example. Compute the differential of the function z = ln(x + y²).

◄ We have

dz = \frac{1}{x+y^2}\,dx + \frac{2y}{x+y^2}\,dy = \frac{dx + 2y\,dy}{x+y^2}. ►

Analogously, if u = f(x₁, x₂, ..., xₙ) is a differentiable function of n independent variables then

du = \frac{\partial u}{\partial x_1}\,dx_1 + \frac{\partial u}{\partial x_2}\,dx_2 + \cdots + \frac{\partial u}{\partial x_n}\,dx_n.

The relation

d_x z = f_x'(x, y)\,dx

is called the partial differential of a function z = f(x, y) with respect to the variable x, and

d_y z = f_y'(x, y)\,dy

is called the partial differential of z = f(x, y) with respect to the variable y.

Using the above formulas for dz, d_xz and d_yz, we can write the differential of z = f(x, y) as

dz = d_x z + d_y z,

i.e., the total differential of z = f(x, y) is the sum of its partial differentials. Notice that in general the total increment Δz in z = f(x, y) is not equal to the sum of the partial increments Δₓz and Δ_yz.

Suppose that a function z = f(x, y) is differentiable at the point (x, y) and dz ≠ 0 at (x, y). Then the total increment

\Delta z = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y + \alpha(\Delta x, \Delta y)\,\Delta x + \beta(\Delta x, \Delta y)\,\Delta y

differs from its linear part

dz = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y

only by the sum α Δx + β Δy, which, as Δx → 0 and Δy → 0, is an infinitesimal of higher order than the summands in dz; so if dz ≠ 0 the linear part of the total increment of a differentiable function is called the principal part of this increment, and the approximation

\Delta z \approx dz

is widely used. Notice that the smaller the absolute values of the increments in the arguments, the higher the precision of this approximation.
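The quality of the approximation Δz ≈ dz is easy to observe numerically. Taking the earlier example z = x² + y² at the point (1, 2) (a point chosen only for illustration), the sketch below compares the exact increment with the differential for shrinking increments; the error falls off faster than the increments themselves.

f = lambda x, y: x**2 + y**2
fx = lambda x, y: 2 * x          # dz/dx
fy = lambda x, y: 2 * y          # dz/dy

x0, y0 = 1.0, 2.0
for h in (1e-1, 1e-2, 1e-3):
    dx = dy = h
    dz_exact = f(x0 + dx, y0 + dy) - f(x0, y0)      # total increment
    dz_lin = fx(x0, y0) * dx + fy(x0, y0) * dy      # differential
    print(f"h = {h:6.0e}:  Dz = {dz_exact:.6f}   dz = {dz_lin:.6f}"
          f"   error = {dz_exact - dz_lin:.2e}")
# The error equals (dx)^2 + (dy)^2 here, an infinitesimal of higher order.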

12.4 Derivatives of Composite Functions

Let a function z = f(x, y) be defined on some domain D in the xy-plane and let both x and y be functions of a variable t, so that

x = \varphi(t), \quad y = \psi(t), \quad t_0 < t < t_1.

We assume that for any t in (t₀, t₁) the corresponding point (x, y) is contained in D. Then the substitution x = φ(t) and y = ψ(t) reduces z = f(x, y) to the composite function z = f[φ(t), ψ(t)] of one variable t.

Theorem 12.7. If at a point t there exist the derivatives

\frac{dx}{dt} = \varphi'(t) \quad\text{and}\quad \frac{dy}{dt} = \psi'(t),

and the function f(x, y) is differentiable at the corresponding point x = φ(t), y = ψ(t), then the composite function z = f[φ(t), ψ(t)] has a derivative \frac{dz}{dt} at t and

\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}.

◄ Let Δt be an increment in t which gives rise to the increments Δx and Δy in x and y and, hence, to the increment Δz in the function z. Recall that z = f(x, y) is differentiable at (x, y). Then if (Δx)² + (Δy)² ≠ 0 the increment Δz can be written in the form

\Delta z = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y + \alpha\,\Delta x + \beta\,\Delta y,

where α(Δx, Δy) and β(Δx, Δy) tend to zero as Δx → 0 and Δy → 0. If we put α(0, 0) = 0 and β(0, 0) = 0 for Δx = Δy = 0, then α(Δx, Δy) and β(Δx, Δy) will be continuous at Δx = Δy = 0.

Consider

\frac{\Delta z}{\Delta t} = \frac{\partial z}{\partial x}\frac{\Delta x}{\Delta t} + \frac{\partial z}{\partial y}\frac{\Delta y}{\Delta t} + \alpha\,\frac{\Delta x}{\Delta t} + \beta\,\frac{\Delta y}{\Delta t}.  (*)

Both factors of each summand on the right have limits as Δt → 0. Indeed, the partial derivatives ∂z/∂x and ∂z/∂y are constant at the given point (x, y). By the hypothesis there exist the limits

\lim_{\Delta t\to 0}\frac{\Delta x}{\Delta t} = \frac{dx}{dt} = \varphi'(t) \quad\text{and}\quad \lim_{\Delta t\to 0}\frac{\Delta y}{\Delta t} = \frac{dy}{dt} = \psi'(t).

Since dx/dt and dy/dt exist at the point t, the functions x = φ(t) and y = ψ(t) are continuous at this point. This implies that as Δt → 0 both Δx and Δy tend to zero and, hence, so do α(Δx, Δy) and β(Δx, Δy).

Thus the right-hand side of (*) has a limit as Δt → 0 equal to \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. This means that the left-hand side of (*) also has a limit as Δt → 0, so that there exists \lim_{\Delta t\to 0}\frac{\Delta z}{\Delta t} = \frac{dz}{dt}. Evaluating the limits in (*) as Δt → 0, we obtain

\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. ►

Example. Compute the derivative of the function z = x² + y² where x = sin t and y = t³.

◄ Using the above formula, we get

\frac{dz}{dt} = 2x\cos t + 2y\cdot 3t^2 = 2\sin t\cos t + 6t^5 = \sin 2t + 6t^5. ►
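A quick symbolic check of this chain-rule computation can be made with SymPy (assumed available): substitute x = sin t, y = t³ into z = x² + y², differentiate directly, and compare with the result of the formula.

import sympy as sp

t = sp.symbols("t")
x, y = sp.sin(t), t**3

# direct differentiation of the composite function
z = x**2 + y**2
direct = sp.simplify(sp.diff(z, t))

# the chain-rule formula dz/dt = (dz/dx) dx/dt + (dz/dy) dy/dt
X, Y = sp.symbols("X Y")
Z = X**2 + Y**2
via_formula = (sp.diff(Z, X).subs({X: x, Y: y}) * sp.diff(x, t)
               + sp.diff(Z, Y).subs({X: x, Y: y}) * sp.diff(y, t))

print(direct)                                # an expression equal to sin(2*t) + 6*t**5
print(sp.simplify(direct - via_formula))     # 0: both routes agree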
Let us consider a function z = f(x, y) where y = ψ(x), z being a composite function of x, so that z = f(x, ψ(x)). Then

\frac{dz}{dx} = \frac{\partial z}{\partial x} + \frac{\partial z}{\partial y}\frac{dy}{dx},

where ∂z/∂x is the partial derivative of z = f(x, y) with respect to x computed by regarding the variable y as a constant. The derivative dz/dx is the total derivative of z = f(x, y) with respect to the independent variable x computed by regarding y as a function of x, i.e., y = ψ(x), so that the relationship between y and x is taken into account.

Example. Compute \frac{\partial z}{\partial x} and \frac{dz}{dx} provided that z = tan⁻¹(y/x) and y = x².

◄ \frac{\partial z}{\partial x} = \frac{\partial}{\partial x}\left(\tan^{-1}\frac{y}{x}\right) = -\frac{y}{x^2+y^2}, \qquad \frac{\partial z}{\partial y} = \frac{x}{x^2+y^2}

and

\frac{dz}{dx} = \frac{\partial z}{\partial x} + \frac{\partial z}{\partial y}\frac{dy}{dx} = -\frac{y}{x^2+y^2} + \frac{x\cdot 2x}{x^2+y^2} = \frac{2x^2 - x^2}{x^2+x^4} = \frac{1}{1+x^2}. ►
Now we turn our attention to the differentiation of a composite function of several variables. Consider the function z = f(x, y) where x = φ(ξ, η) and y = ψ(ξ, η), so that z = z(ξ, η) = f(φ(ξ, η), ψ(ξ, η)). Let continuous partial derivatives \frac{\partial x}{\partial\xi}, \frac{\partial x}{\partial\eta}, \frac{\partial y}{\partial\xi} and \frac{\partial y}{\partial\eta} exist at a point (ξ, η) and let f(x, y) be differentiable at the corresponding point (x, y), where x = φ(ξ, η) and y = ψ(ξ, η).

We show that the composite function z = z(ξ, η) has the derivatives \frac{\partial z}{\partial\xi} and \frac{\partial z}{\partial\eta} at the point (ξ, η) and derive the respective formulas. Notice that this case is almost similar to that considered before. Indeed, on differentiating z with respect to ξ the variable η is regarded as a constant; this implies that x and y become functions of one variable ξ, i.e., x = φ(ξ, c) and y = ψ(ξ, c), where c is the fixed value of η, so that the formula of Theorem 12.7 is fully applicable here.

Using it, we obtain

\frac{\partial z}{\partial\xi} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\xi}.

Analogously,

\frac{\partial z}{\partial\eta} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\eta}.

Example. Compute the partial derivatives \frac{\partial z}{\partial\xi} and \frac{\partial z}{\partial\eta} of the function z = x²y − xy² where x = ξη and y = ξ/η.

◄ \frac{\partial z}{\partial\xi} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\xi} = (2xy - y^2)\eta + (x^2 - 2xy)\frac{1}{\eta} = \left(2\xi^2 - \frac{\xi^2}{\eta^2}\right)\eta + \left(\xi^2\eta^2 - 2\xi^2\right)\frac{1}{\eta} = 3\xi^2\left(\eta - \frac{1}{\eta}\right)

and

\frac{\partial z}{\partial\eta} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\eta} = (2xy - y^2)\xi + (x^2 - 2xy)\left(-\frac{\xi}{\eta^2}\right) = \left(2\xi^2 - \frac{\xi^2}{\eta^2}\right)\xi - \left(\xi^2\eta^2 - 2\xi^2\right)\frac{\xi}{\eta^2} = \xi^3\left(1 + \frac{1}{\eta^2}\right). ►


If a composite function u = f(x, y, z) is given by x = x(ξ, η), y = y(ξ, η) and z = z(ξ, η), so that u = u(ξ, η), then

\frac{\partial u}{\partial\xi} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial\xi} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial\xi}

and

\frac{\partial u}{\partial\eta} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial\eta} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial\eta}.

In particular, if u = f(x, y, z) where z = z(x, y), then

\frac{\partial u}{\partial x} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x} \quad\text{and}\quad \frac{\partial u}{\partial y} = \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}.

Here \frac{\partial u}{\partial x} is the total partial derivative of u with respect to the independent variable x, allowing for the dependence of z on x, while \frac{\partial f}{\partial x} is the partial derivative of u = f(x, y, z) with respect to x computed by regarding the variables y and z as constants. The meanings of \frac{\partial u}{\partial y} and \frac{\partial f}{\partial y} are similar to those of \frac{\partial u}{\partial x} and \frac{\partial f}{\partial x}, respectively.

Differentials of composite functions. If z = f(x, y) is a differentiable function of the independent variables x and y then the total differential dz of z is given by

dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy,  (*)

where dx = Δx and dy = Δy.

Now we suppose that z = f(x, y) is a composite function. Assume that x = φ(ξ, η) and y = ψ(ξ, η) and that these functions have continuous partial derivatives with respect to ξ and η at a point (ξ, η). Let there exist continuous partial derivatives \frac{\partial z}{\partial x} and \frac{\partial z}{\partial y} at the point (x, y) corresponding to (ξ, η), so that z = f(x, y) is differentiable at (x, y). Then the function z = f[φ(ξ, η), ψ(ξ, η)] has at the point (ξ, η) the derivatives

\frac{\partial z}{\partial\xi} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\xi}

and

\frac{\partial z}{\partial\eta} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\eta}.

It is easy to see that \frac{\partial z}{\partial\xi} and \frac{\partial z}{\partial\eta} are continuous at the point (ξ, η). Hence, the function z = f[φ(ξ, η), ψ(ξ, η)] is differentiable at (ξ, η), and by virtue of the formula for the total differential in the independent variables ξ and η we have

dz = \frac{\partial z}{\partial\xi}\,d\xi + \frac{\partial z}{\partial\eta}\,d\eta.

Substituting the expressions for \frac{\partial z}{\partial\xi} and \frac{\partial z}{\partial\eta} into the above formula, we get

dz = \left(\frac{\partial z}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\xi}\right)d\xi + \left(\frac{\partial z}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial\eta}\right)d\eta

or

dz = \frac{\partial z}{\partial x}\left(\frac{\partial x}{\partial\xi}\,d\xi + \frac{\partial x}{\partial\eta}\,d\eta\right) + \frac{\partial z}{\partial y}\left(\frac{\partial y}{\partial\xi}\,d\xi + \frac{\partial y}{\partial\eta}\,d\eta\right).

By the hypothesis the functions x = φ(ξ, η) and y = ψ(ξ, η) have continuous partial derivatives at the point (ξ, η). Then they are differentiable at (ξ, η) and

dx = \frac{\partial x}{\partial\xi}\,d\xi + \frac{\partial x}{\partial\eta}\,d\eta \quad\text{and}\quad dy = \frac{\partial y}{\partial\xi}\,d\xi + \frac{\partial y}{\partial\eta}\,d\eta.

From the above formulas for dz, dx and dy it easily follows that

dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.

Comparing this formula with (*) we infer that the total differential of the function z = f(x, y) is expressed by a formula of the same form both when the variables x and y are independent and when x and y are functions of some other variables. Thus the form of the expression for the total differential of a function of two or more variables remains the same (is invariant) in both cases.

Remark. Based upon the previous results it is easy to verify that the differentiation formulas

d(x \pm y) = dx \pm dy,
d(xy) = x\,dy + y\,dx,
d\left(\frac{x}{y}\right) = \frac{y\,dx - x\,dy}{y^2}

remain valid when x and y are differentiable functions of any finite number of variables, i.e., for x = φ(ξ, η, ζ, ...) and y = ψ(ξ, η, ζ, ...).

12.5 Implicit Functions

Consider the equation F(x, y) = 0, where F(x, y) is a function of two variables defined on a domain G in the xy-plane. Suppose that for every x in some interval (x₀ − h₀, x₀ + h₀) there exists exactly one value of y which satisfies the equation F(x, y) = 0; clearly, in this case the function y = y(x) such that the identity F(x, y(x)) = 0 is fulfilled for all x in (x₀ − h₀, x₀ + h₀) is fully determined. Then we say that the equation F(x, y) = 0 specifies the quantity y as an implicit function of x.

In other words, the function y = y(x) specified by an equation F(x, y) = 0 that is not solved explicitly for y is called an implicit function. To express the relationship between y and x explicitly we have to solve the original equation with respect to y.

Examples. (1) The equation y − x = 0 specifies y as a single-valued function y = x of x at every point of the x-axis.

(2) The equation y − x − ε sin y = 0, 0 < ε < 1, specifies y as a single-valued function of x.

◄ To verify this we investigate the original equation y − x − ε sin y = 0. Notice that this equation is satisfied if x = 0 and y = 0. We regard x as a parameter and consider the functions z = y and z = x + ε sin y. It is clear that if for a given x₀ there exists exactly one value y₀ such that the point (x₀, y₀) satisfies the original equation, then the line z = y and the curve z = x₀ + ε sin y intersect at a unique point, and vice versa. Let us draw the graphs of z = y and z = x₀ + ε sin y in the zy-plane (Fig. 12.11).

The graph of z = x + ε sin y, where x is thought of as a parameter, is obtained by translating the graph of the curve z = ε sin y along the z-axis. It is easy to see that the graphs of z = y and z = x + ε sin y meet at a single point, whose ordinate y is a function of x implicitly specified by the equation y − x − ε sin y = 0, 0 < ε < 1. This relationship cannot be expressed in elementary functions. ►

z = x0 + e: s[n y

z = c sin y

Fig. 12.11

(3) The equation x² + y² + 1 = 0 does not specify y as a real-valued function of x for any real value of x.

Analogously we can introduce the notion of an implicit function of several variables.

The following theorem specifies sufficient conditions for the equation F(x, y) = 0 to be solvable for y in some neighbourhood of a given point x₀.

Theorem 12.8. Let there be given an equation F(x, y) = 0 and let the following conditions be satisfied:

(i) the function F(x, y) is defined and continuous in the rectangle

D = \{x_0 - \delta_1 < x < x_0 + \delta_1,\; y_0 - \delta_2 < y < y_0 + \delta_2\}, \quad \delta_1 > 0,\; \delta_2 > 0,

with centre at (x₀, y₀);

(ii) the function F(x, y) vanishes at the point (x₀, y₀), so that F(x₀, y₀) = 0;

(iii) there exist continuous partial derivatives \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y} in D;

(iv) \frac{\partial F(x_0, y_0)}{\partial y} \ne 0.

Then for any sufficiently small positive number ε there exists a neighbourhood x₀ − δ₀ < x < x₀ + δ₀ of x₀, δ₀ > 0, in which there exists a uniquely defined*⁾ continuous function y = f(x) (Fig. 12.12) such that y₀ = f(x₀), |y − y₀| < ε and the equation F(x, y) = 0 becomes an identity, i.e.,

F(x, f(x)) \equiv 0

for all x in the given neighbourhood of x₀.

Fig. 12.12

The function y = f(x) is continuously differentiable in the neighbourhood of x₀ and

\frac{dy}{dx} = -\frac{\partial F/\partial x}{\partial F/\partial y}.  (*)

*⁾ We regard y = f(x) as "uniquely defined" in the sense that the coordinates of any point which lies on the curve F(x, y) = 0 and is contained in the neighbourhood Ω = {x₀ − δ₀ < x < x₀ + δ₀, y₀ − ε < y < y₀ + ε} of the point (x₀, y₀) are related by the equation y = f(x).

◄ We deduce (*) for an arbitrary implicit function assuming that the derivative dy/dx exists.

Let y = f(x) be a continuous implicit function given by the equation F(x, y) = 0. Then for any x in (x₀ − δ₀, x₀ + δ₀) there holds the identity F(x, f(x)) ≡ 0; hence \frac{dF(x, f(x))}{dx} = 0 at every point of this neighbourhood. By the chain rule for differentiating a composite function we have

\frac{dF(x, y(x))}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx}.

Therefore for y = f(x) we get

\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} = 0.

Whence

\frac{dy}{dx} = -\frac{\partial F/\partial x}{\partial F/\partial y}. ►

Example. Compute \frac{dy}{dx} for the function y = y(x) defined by the equation x² + y² = R².

◄ Here we have F(x, y) = x² + y² − R², so that \frac{\partial F}{\partial x} = 2x and \frac{\partial F}{\partial y} = 2y. Using (*), we obtain \frac{dy}{dx} = -\frac{x}{y} (y ≠ 0). ►
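The same computation can be delegated to a computer algebra system. A minimal SymPy sketch (assuming SymPy and its idiff helper are available) applies the rule dy/dx = −F_x/F_y to the circle and checks it against built-in implicit differentiation:

import sympy as sp

x, y, R = sp.symbols("x y R")
F = x**2 + y**2 - R**2

# dy/dx = -F_x / F_y, the formula (*) for an implicit function
dydx = -sp.diff(F, x) / sp.diff(F, y)
print(sp.simplify(dydx))          # -x/y

# the same result via SymPy's implicit differentiation helper
print(sp.idiff(F, y, x))          # -x/y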
Remark. Theorem 12.8 specifies sufficient conditions for an implicit function whose graph passes through a given point (x₀, y₀) to exist. However, it says nothing about necessary conditions. Indeed, consider the equation (y − x)² = 0. Here the function F(x, y) = (y − x)² has continuous partial derivatives F_x' and F_y', and F_y' = 2(y − x) is equal to zero at the point O(0, 0); nevertheless the equation F(x, y) = (y − x)² = 0 has the unique solution y = x, which vanishes at x = 0.

Problem. Consider the equation y² = x². Let y = y(x), −∞ < x < +∞, be a single-valued function satisfying y² = x².

Find out how many (1) single-valued functions of the form y = y(x), −∞ < x < +∞; (2) single-valued continuous functions; (3) single-valued differentiable functions; (4) single-valued continuous functions y = y(x), 1 − δ < x < 1 + δ, with y(1) = 1 and δ > 0 sufficiently small, satisfy the equation y² = x².
Similarly to Theorem 12.8 we can formulate a theorem for an implicit
function z = z(x, y) of two variables specified by the equation
F(x, y, z) = 0.

Theorem 12.9. Let there be given an equation F(x, y, z) = 0 and let the following conditions be satisfied:

(i) the function F(x, y, z) is defined and continuous in the domain

D = \{x_0 - \delta_1 < x < x_0 + \delta_1,\; y_0 - \delta_2 < y < y_0 + \delta_2,\; z_0 - \delta_3 < z < z_0 + \delta_3\},

where δ₁ > 0, δ₂ > 0 and δ₃ > 0;

(ii) F(x₀, y₀, z₀) = 0;

(iii) there exist continuous partial derivatives F_x', F_y' and F_z' in D;

(iv) F_z'(x₀, y₀, z₀) ≠ 0.

Then for any sufficiently small ε > 0 there exists a neighbourhood Ω of the point (x₀, y₀) in which there exists a uniquely defined continuous function z = f(x, y) such that z = f(x, y) attains the value z₀ at x = x₀ and y = y₀, |z − z₀| < ε, and the equation F(x, y, z) = 0 becomes an identity, i.e., F(x, y, f(x, y)) ≡ 0 for all (x, y) in Ω. The function z = f(x, y) has continuous partial derivatives f_x' and f_y' in Ω.

Let us deduce formulas for f_x' and f_y'. Suppose that the equation F(x, y, z) = 0 specifies z as a single-valued differentiable function z = f(x, y) of the independent variables x and y. If we substitute f(x, y) for z, then F(x, y, z) = 0 becomes the identity

F(x, y, f(x, y)) \equiv 0, \quad (x, y) \in \Omega.

Hence, the total partial derivatives of F(x, y, z), where z = f(x, y), with respect to x and y must also be equal to zero. On differentiating, we get

\frac{\partial F}{\partial x} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial x} = 0 \quad\text{and}\quad \frac{\partial F}{\partial y} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial y} = 0.

Whence

\frac{\partial z}{\partial x} = -\frac{\partial F/\partial x}{\partial F/\partial z} \quad\text{and}\quad \frac{\partial z}{\partial y} = -\frac{\partial F/\partial y}{\partial F/\partial z}.

These are the formulas for the partial derivatives of an implicit function of two independent variables.

Example. Compute the partial derivatives of the function z(x, y) given by the equation F(x, y, z) = x² + y² + z² − R² = 0.

◄ We have \frac{\partial F}{\partial x} = 2x, \frac{\partial F}{\partial y} = 2y and \frac{\partial F}{\partial z} = 2z; whence \frac{\partial z}{\partial x} = -\frac{x}{z} and \frac{\partial z}{\partial y} = -\frac{y}{z} (z ≠ 0). ►

To verify the results we can find an explicit formula for the function and apply the standard formulas for partial derivatives directly.

12.6 Tangent Planes and Normal Lines to a Surface


Let S be a surface given by the equation F(x, y, z) = 0.
Definition. The point M(x, y, z) on a surface S is called the regular
(nonsingular) point of S if all the three partial derivatives !~, :;
and !1;, at M exist and are continuous and at least one of them is distinct
from zero.
The point M(x, y, z) is called the singular point of S if all the three
. 1 d envatives
partla . . aF
ax , aF az
ay an d aF vams. h at M or at 1east one O f t h em
does not exist at M.

Fig. 12.13 Fig. 12.14

Example. Consider a circular cone given by the equation x² + y² − z² = 0
(Fig. 12.13).
◄ Here we have F(x, y, z) = x² + y² − z², so that ∂F/∂x = 2x, ∂F/∂y = 2y
and ∂F/∂z = −2z. The only singular point is the origin of coordinates
O(0, 0, 0), where all the three partial derivatives vanish. ►
Let L be a space curve given by the parametric equations

x = φ(t),  y = ψ(t),  z = ω(t),   α < t < β.

Suppose that φ(t), ψ(t) and ω(t) have continuous derivatives φ′(t), ψ′(t)
and ω′(t) at every t such that α < t < β. We leave aside singular points
of L, where φ′²(t) + ψ′²(t) + ω′²(t) = 0, and consider a regular
point M₀(x₀, y₀, z₀) on L specified by the value t₀ of t, t₀ ∈ (α, β). Then
the vector τ = x′(t₀)i + y′(t₀)j + z′(t₀)k lies on the tangent to the curve
L at the point M₀(x₀, y₀, z₀).
Now we choose a regular point P of a surface S and draw through P
a curve L lying in S. Let the curve be given, as before, by the parametric
equations

x = φ(t),  y = ψ(t),  z = ω(t),   α < t < β,

and let φ(t), ψ(t) and ω(t) have continuous derivatives that nowhere on (α, β)
vanish simultaneously.
By definition, the tangent to L at P is called the tangent to the sur-
face S at P.
If the above parametric equations are substituted into the equation
F(x, y, z) = 0 of the surface S, the latter becomes an identity with respect
to t, F(φ(t), ψ(t), ω(t)) = 0, since the curve L lies in the surface S.
Differentiating this identity as a composite function of t, we get

∂F/∂x dx/dt + ∂F/∂y dy/dt + ∂F/∂z dz/dt = 0.   (*)

The expression on the left is the scalar product of the vectors

n = ∂F/∂x i + ∂F/∂y j + ∂F/∂z k  and  τ = dx/dt i + dy/dt j + dz/dt k.

The vector τ is a tangent vector to the curve L at the point P. The vector
n does not depend on the shape of the curve passing through the point
P; it depends only on the coordinates of P and on the function F(x, y, z).
Since P is a regular point, the length of n,

|n| = √((∂F/∂x)² + (∂F/∂y)² + (∂F/∂z)²),

is distinct from zero.
The identity (*) implies that (n, τ) = 0. This means that the vector τ
tangent to the curve L at the point P is perpendicular to the vector n at
P (Fig. 12.14). The same reasoning is fully applicable to any curve lying
in the surface S and passing through the point P. Hence, any tangent to
the surface S at the point P is perpendicular to the vector n and thus all
such tangents lie in the same plane perpendicular to the vector n.

Definition. The plane formed by all tangents to a surface S through


a given regular point P ∈ S is called the tangent plane to the surface at P.
The vector n = (∂F/∂x|_P, ∂F/∂y|_P, ∂F/∂z|_P) is a normal vector of
the tangent plane to the surface F(x, y, z) = 0 at the point P. Whence we get the equation
of the tangent plane to the surface F(x, y, z) = 0 at the regular point
P₀(x₀, y₀, z₀) as

(∂F/∂x)|(x₀,y₀,z₀) (x − x₀) + (∂F/∂y)|(x₀,y₀,z₀) (y − y₀) + (∂F/∂z)|(x₀,y₀,z₀) (z − z₀) = 0.

If the surface S is given by the equation z = f(x, y) we can write it
as F = f(x, y) − z = 0; whence ∂F/∂x = ∂f/∂x, ∂F/∂y = ∂f/∂y and ∂F/∂z = −1.
Then the equation of the tangent plane at the point
P₀(x₀, y₀, z₀), where z₀ = f(x₀, y₀), becomes

z − z₀ = (∂f/∂x)|(x₀,y₀) (x − x₀) + (∂f/∂y)|(x₀,y₀) (y − y₀).

Geometric interpretation of the total differential. The substitution
x − x₀ = Δx and y − y₀ = Δy reduces the above equation to

z − z₀ = (∂f/∂x)|(x₀,y₀) Δx + (∂f/∂y)|(x₀,y₀) Δy.

The expression on the right is the total differential of the function
z = f(x, y) at the point M₀(x₀, y₀) in the xy-plane, so that z − z₀ = dz.
Therefore the total differential of the function z = f(x, y) of two in-
dependent variables x and y at the point M₀ corresponding to the incre-
ments Δx and Δy is equal to the increment z − z₀ of the applicate z of
the tangent plane to the surface at the point P₀(x₀, y₀, f(x₀, y₀)) obtained
in moving from M₀(x₀, y₀) to M(x₀ + Δx, y₀ + Δy).
Definition. The line that passes through the point P₀(x₀, y₀, z₀) of the
surface F(x, y, z) = 0 perpendicularly to the tangent plane to this surface
at P₀ is called the normal to the surface at P₀.
It is easy to notice that the vector n = (∂F/∂x, ∂F/∂y, ∂F/∂z)|_{P₀} is the
direction vector of the normal given by the equations

(x − x₀)/(∂F/∂x)|(x₀,y₀,z₀) = (y − y₀)/(∂F/∂y)|(x₀,y₀,z₀) = (z − z₀)/(∂F/∂z)|(x₀,y₀,z₀).


If the surface S is specified by the equation z = f(x, y) the equations
of the normal at the point P₀(x₀, y₀, z₀) become

(x − x₀)/(∂f/∂x)|(x₀,y₀) = (y − y₀)/(∂f/∂y)|(x₀,y₀) = (z − z₀)/(−1).

Example. Write down the equations of the tangent plane and the normal
to the surface z = x² + y² at the point O(0, 0, 0).
◄ We have f(x, y) = x² + y², so that ∂f/∂x = 2x and ∂f/∂y = 2y; these
derivatives vanish at the point (0, 0), i.e., f′_x(0, 0) = f′_y(0, 0) = 0. Then the
equation of the tangent plane becomes

z − 0 = 0·(x − 0) + 0·(y − 0),

i.e., z = 0 (the tangent plane is the xy-plane).
The equations of the normal are

(x − 0)/0 = (y − 0)/0 = (z − 0)/(−1),   or   x = 0, y = 0,

i.e., the normal is the z-axis. ►
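As an illustration of the general procedure, here is a small computational sketch (ours, not from the book; it assumes SymPy is available) that recovers the normal vector and the tangent plane of the example above from the implicit form F(x, y, z) = x² + y² − z = 0:

```python
# Tangent plane and normal direction of z = x**2 + y**2 at the origin (sketch, assuming SymPy).
import sympy as sp

x, y, z = sp.symbols('x y z')
F = x**2 + y**2 - z                    # the surface written as F(x, y, z) = 0
P0 = {x: 0, y: 0, z: 0}                # the point of tangency

# Normal vector n = (F_x, F_y, F_z) evaluated at P0
n = [sp.diff(F, v).subs(P0) for v in (x, y, z)]
print(n)                               # [0, 0, -1]  -> the normal is along the z-axis

# Tangent plane: F_x*(x - x0) + F_y*(y - y0) + F_z*(z - z0) = 0
plane = sum(c * (v - P0[v]) for c, v in zip(n, (x, y, z)))
print(sp.Eq(plane, 0))                 # Eq(-z, 0), i.e. the xy-plane z = 0
```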

12.7 Derivatives and Differentials of Higher Orders

Let a function z = f(x, y) have partial derivatives ∂z/∂x and ∂z/∂y at
every point in some domain D. Clearly, these derivatives ∂z/∂x = f′_x(x, y)
and ∂z/∂y = f′_y(x, y) are functions of x and y in D; they can have deriva-
tives at some or all points of D.
The partial derivatives of ∂z/∂x and ∂z/∂y, if they exist, are called the se-
cond derivatives of the function z = f(x, y) or the partial derivatives of
the second order. Given a function z = f(x, y) of two independent variables
x and y we can write the partial derivatives of the second order as

∂²z/∂x² = f″_xx(x, y),  ∂²z/∂y∂x = f″_xy(x, y),  ∂²z/∂x∂y = f″_yx(x, y),  ∂²z/∂y² = f″_yy(x, y).

The derivatives f″_xy and f″_yx are called the mixed partial derivatives; the
former is computed by differentiating the given function first with respect
to x and then with respect to y, and the latter is computed by differentiating
the function first with respect to y and then with respect to x.
The partial derivatives of the third and higher orders can be defined
in a similar way.
Example. Compute the first and second derivatives of the function
z = x³y² − xy³.
◄ We have

∂z/∂x = 3x²y² − y³,   ∂z/∂y = 2x³y − 3xy²,

∂²z/∂x² = 6xy²,   ∂²z/∂y² = 2x³ − 6xy,

∂²z/∂y∂x = 6x²y − 3y²,   ∂²z/∂x∂y = 6x²y − 3y². ►
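The derivatives in this example are easy to check mechanically; the short script below (ours, assuming SymPy) recomputes all of them, including both mixed derivatives:

```python
# Symbolic verification of the first and second partial derivatives of z = x**3*y**2 - x*y**3.
import sympy as sp

x, y = sympy_symbols = sp.symbols('x y')
z = x**3 * y**2 - x * y**3

print(sp.diff(z, x))        # 3*x**2*y**2 - y**3
print(sp.diff(z, y))        # 2*x**3*y - 3*x*y**2
print(sp.diff(z, x, 2))     # 6*x*y**2
print(sp.diff(z, y, 2))     # 2*x**3 - 6*x*y
print(sp.diff(z, x, y))     # 6*x**2*y - 3*y**2
print(sp.diff(z, y, x))     # 6*x**2*y - 3*y**2   (the mixed derivatives coincide)
```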

Notice that the mixed partial derivatives f″_xy and f″_yx
are identically equal.
The explanation of this result is given by the following theorem.
Theorem 12.10. Let a function z = f(x, y) have partial derivatives f′_x,
f′_y, f″_xy and f″_yx in some neighbourhood of a point M₀(x₀, y₀) and let f″_xy
and f″_yx be continuous at M₀(x₀, y₀).
Then f″_xy(x₀, y₀) = f″_yx(x₀, y₀).
It is important that the mixed partial derivatives f″_xy and f″_yx be continu-
ous at M₀(x₀, y₀). For example, the function

f(x, y) = xy (x² − y²)/(x² + y²)  for x² + y² ≠ 0,   f(x, y) = 0  for x = y = 0,

has mixed partial derivatives f″_xy and f″_yx which are not continuous at the
point O(0, 0), and f″_xy(0, 0) = −1 while f″_yx(0, 0) = 1.
In general, the mth mixed partial derivatives (m ≥ 2) of a function
u = f(x₁, x₂, ..., xₙ) have the same value at a given point if they are con-
tinuous at this point and differ from each other only in the order in which
the function is differentiated with respect to its variables.
Differentials of higher orders. Let z = f(x, y) be a function of variables
x and y defined on the domain D.
If z = f(x, y) is differentiable on the domain D then the total differen-
tial of z = f(x, y) at the point (x, y) ∈ D corresponding to the increments
dx and dy in the variables x and y is given by the formula

dz = ∂z/∂x dx + ∂z/∂y dy,

where dx = Δx and dy = Δy are arbitrary increments in the independent vari-
ables, i.e., arbitrary numbers independent of x and y. So we can
vary x and y keeping dx and dy constant. Given dx and dy, the total differen-
tial dz is a function of x and y, which can itself be differentiable.
Definition. The total differential of dz at a point (x, y) corresponding
to the increments of the independent variables equal to dx and dy is called
the second differential or the differential of the second order of a function
z = f(x, y) and is denoted by the symbol d²z, so that d²z = d(dz).
Let a function z = f(x, y) have continuous partial derivatives of the
first and second orders in the domain D. Then the total differential dz of
z = f(x, y) is a differentiable function, so that d²z exists in D.
Recall that dx and dy are constant. Then, using the laws of differentia-
tion, we obtain

d²z = d(dz) = d(∂z/∂x dx + ∂z/∂y dy) = d(∂z/∂x) dx + d(∂z/∂y) dy.

Applying the formula for the total differential to ∂z/∂x and ∂z/∂y, we get

d(∂z/∂x) = ∂²z/∂x² dx + ∂²z/∂y∂x dy,   d(∂z/∂y) = ∂²z/∂x∂y dx + ∂²z/∂y² dy.

Substituting these relations into the formula for d²z, we arrive at

d²z = ∂²z/∂x² (dx)² + ∂²z/∂y∂x dx dy + ∂²z/∂x∂y dx dy + ∂²z/∂y² (dy)².

Since ∂²z/∂y∂x = ∂²z/∂x∂y and both mixed partial derivatives are continu-
ous functions, we finally obtain

d²z = ∂²z/∂x² dx² + 2 ∂²z/∂x∂y dx dy + ∂²z/∂y² dy²,

where dx² = (dx)² and dy² = (dy)².
It is convenient to write the above formula by applying the notation
∂/∂x dx + ∂/∂y dy; then we can write

d²z = (∂/∂x dx + ∂/∂y dy)² z.

The symbols ∂/∂x and ∂/∂y are regarded as some "multipliers", so that by
expanding the square as is usually done in algebra we arrive at the desired
result. By way of illustration we write

d² = (∂/∂x dx + ∂/∂y dy)² = ∂²/∂x² dx² + 2 ∂²/∂x∂y dx dy + ∂²/∂y² dy².

Multiplying both sides termwise by z and inserting z into the numerators
of the "fractions" on the right, we get the formula previously deduced,
namely

d²z = ∂²z/∂x² dx² + 2 ∂²z/∂x∂y dx dy + ∂²z/∂y² dy².
Formulas for differentials of the third, fourth, etc., orders can be derived
in a similar way. In general, the total differential of the nth order, denoted
by dⁿz, is the total differential of the total differential of the (n − 1)th order,
so that dⁿz = d(dⁿ⁻¹z).
If z = f(x, y) ∈ Cⁿ(D), i.e., has continuous partial derivatives up to those of the nth
order, then the total differential of the nth order exists and is
given by the formula

dⁿz = (∂/∂x dx + ∂/∂y dy)ⁿ z.

The total nth differential of a function u = f(x₁, x₂, ..., xₘ) of m inde-
pendent variables x₁, x₂, ..., xₘ is given by the formula

dⁿu = (∂/∂x₁ dx₁ + ∂/∂x₂ dx₂ + ... + ∂/∂xₘ dxₘ)ⁿ u,

provided that the appropriate conditions are satisfied.

Remark. If x and y are not independent variables but functions of some variables
ξ and η, then the form of the second differential does not remain invariant.
◄ Indeed, let z = f(x, y), where x = φ(ξ, η) and y = ψ(ξ, η); then the first
differential can still be written as

dz = ∂z/∂x dx + ∂z/∂y dy,

but now dx and dy are themselves functions and, hence, they cannot be constant. So

d²z = d(∂z/∂x) dx + d(∂z/∂y) dy + ∂z/∂x d(dx) + ∂z/∂y d(dy)
    = (∂/∂x dx + ∂/∂y dy)² z + ∂z/∂x d²x + ∂z/∂y d²y.

Thus, the form of the second differential is not invariant in this case. ►


12.8 Taylor's Theorem


Let a function z = f(x, y) have continuous partial derivatives up
to those of the nth order at every point (x, y) in some δ-neighbourhood of
the point (x₀, y₀) and let a point (x₀ + Δx, y₀ + Δy) be contained in the
δ-neighbourhood of (x₀, y₀) (Fig. 12.15). We put

x = x₀ + t Δx,   y = y₀ + t Δy,

where t is some independent variable. Then

z = f(x, y) = f(x₀ + t Δx, y₀ + t Δy) = φ(t)

Fig. 12.15

becomes a composite function of t defined on the closed interval [0, 1]
and having derivatives up to the nth order on [0, 1]. This means that the
function z = φ(t) can be represented by Taylor's formula in powers of t,
so that

φ(t) = φ(0) + φ′(0)/1! t + φ″(0)/2! t² + ... + φ⁽ⁿ⁻¹⁾(0)/(n − 1)! tⁿ⁻¹ + φ⁽ⁿ⁾(θt)/n! tⁿ,

where 0 < θ < 1.
Put t = 1. Then

φ(1) = φ(0) + φ′(0)/1! + φ″(0)/2! + ... + φ⁽ⁿ⁻¹⁾(0)/(n − 1)! + φ⁽ⁿ⁾(θ)/n!,   (*)

where 0 < θ < 1.



We shall express the right-hand side of the above formula in terms of
the function f(x, y) and its derivatives. Notice that, being functions
of t, the variables x and y of f(x, y) have constant differentials dx = Δx dt
and dy = Δy dt, where Δx and Δy are some given numbers. This means that
we can compute the differentials of the function z = f(x, y) by using the
formula

dᵖz = (∂/∂x dx + ∂/∂y dy)ᵖ f(x, y) = (∂/∂x (Δx dt) + ∂/∂y (Δy dt))ᵖ f(x, y)
    = (∂/∂x Δx + ∂/∂y Δy)ᵖ f(x, y) (dt)ᵖ,

so that

φ⁽ᵖ⁾(t) (dt)ᵖ = dᵖz = (∂/∂x Δx + ∂/∂y Δy)ᵖ f(x, y)|x = x₀ + tΔx, y = y₀ + tΔy (dt)ᵖ.

Put t = 0. Then x = x₀ + tΔx = x₀ and y = y₀ + tΔy = y₀. The above
formula yields

φ⁽ᵖ⁾(0) = (∂/∂x Δx + ∂/∂y Δy)ᵖ f(x, y)|x = x₀, y = y₀,   p = 0, 1, ..., n − 1.

We now put t = θ. Then

φ⁽ⁿ⁾(θ) = (∂/∂x Δx + ∂/∂y Δy)ⁿ f(x, y)|x = x₀ + θΔx, y = y₀ + θΔy.

Also notice that φ(1) = f(x₀ + Δx, y₀ + Δy).
Substituting the relations for φ⁽ᵖ⁾(0), φ⁽ⁿ⁾(θ) and φ(1) into (*), we get

f(x₀ + Δx, y₀ + Δy) = f(x₀, y₀) + 1/1! (∂/∂x Δx + ∂/∂y Δy) f(x, y)|x = x₀, y = y₀
  + 1/2! (∂/∂x Δx + ∂/∂y Δy)² f(x, y)|x = x₀, y = y₀
  + ... + 1/(n − 1)! (∂/∂x Δx + ∂/∂y Δy)ⁿ⁻¹ f(x, y)|x = x₀, y = y₀
  + 1/n! (∂/∂x Δx + ∂/∂y Δy)ⁿ f(x, y)|x = x₀ + θΔx, y = y₀ + θΔy,

where 0 < θ < 1.



This is Taylor's formula for a function z = f(x, y) of two variables, and

Rₙ = 1/n! (∂/∂x Δx + ∂/∂y Δy)ⁿ f(x, y)|x = x₀ + θΔx, y = y₀ + θΔy

is the remainder as given by Lagrange.
To abbreviate the notation it is often convenient to denote f(x₀ + Δx,
y₀ + Δy) − f(x₀, y₀) by Δf|(x₀,y₀). Then Taylor's formula becomes

Δf|(x₀,y₀) = df|(x₀,y₀) + 1/2! d²f|(x₀,y₀)
  + ... + 1/(n − 1)! dⁿ⁻¹f|(x₀,y₀) + 1/n! dⁿf|(x₀ + θΔx, y₀ + θΔy).

This formula is frequently used as the approximation to the increment Δf
of the function z = f(x, y) at the point M₀(x₀, y₀).
If the absolute values of Δx and Δy are sufficiently small and df ≠ 0,
the differential df can be regarded as the approximation to Δf, so that
Δf ≈ df. In this case the above formula involves only one term. When this
approximation seems to be inadequate the desired precision is attainable
by computing further terms on the right.
Example. Expand the function f(x, y) = eˣ sin y by applying Maclau-
rin's formula with the remainder of order three.
◄ Taylor's formula with the remainder R₃ takes the form

f(x₀ + Δx, y₀ + Δy) = f(x₀, y₀) + f′_x(x₀, y₀) Δx + f′_y(x₀, y₀) Δy

  + 1/2! [f″_xx(x₀, y₀) Δx² + 2f″_xy(x₀, y₀) Δx Δy + f″_yy(x₀, y₀) Δy²]

  + 1/3! [f‴_xxx(x, y) Δx³ + 3f‴_xxy(x, y) Δx² Δy + 3f‴_xyy(x, y) Δx Δy²

  + f‴_yyy(x, y) Δy³]|x = x₀ + θΔx, y = y₀ + θΔy.

Putting x₀ = y₀ = 0, Δx = x and Δy = y, we get Maclaurin's formula

f(x, y) = f(0, 0) + f′_x(0, 0) x + f′_y(0, 0) y

  + 1/2! [f″_xx(0, 0) x² + 2f″_xy(0, 0) xy + f″_yy(0, 0) y²]

  + 1/3! [f‴_xxx(θx, θy) x³ + 3f‴_xxy(θx, θy) x²y

  + 3f‴_xyy(θx, θy) xy² + f‴_yyy(θx, θy) y³],   0 < θ < 1.

Recall that f(x, y) = eˣ sin y; then

f(x, y) = eˣ sin y,   f(0, 0) = 0;
f′_x(x, y) = eˣ sin y,   f′_x(0, 0) = 0;
f′_y(x, y) = eˣ cos y,   f′_y(0, 0) = 1;
f″_xx(x, y) = eˣ sin y,   f″_xx(0, 0) = 0;
f″_xy(x, y) = eˣ cos y,   f″_xy(0, 0) = 1;
f″_yy(x, y) = −eˣ sin y,   f″_yy(0, 0) = 0;
f‴_xxx(x, y) = eˣ sin y,   f‴_xxx(θx, θy) = e^θx sin θy;
f‴_xxy(x, y) = eˣ cos y,   f‴_xxy(θx, θy) = e^θx cos θy;
f‴_xyy(x, y) = −eˣ sin y,   f‴_xyy(θx, θy) = −e^θx sin θy;
f‴_yyy(x, y) = −eˣ cos y,   f‴_yyy(θx, θy) = −e^θx cos θy;

and Maclaurin's formula gives

eˣ sin y = y + xy + 1/3! [e^θx sin θy · x³ + 3e^θx cos θy · x²y
  − 3e^θx sin θy · xy² − e^θx cos θy · y³]. ►
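The first terms of this expansion are easy to confirm with a computer algebra system. The following sketch (ours, assuming SymPy; the auxiliary variable t is introduced only to sort terms by total degree) collects the terms of eˣ sin y of total degree at most two and reproduces y + xy:

```python
# Maclaurin terms of exp(x)*sin(y) up to total degree 2 (sketch, assuming SymPy).
import sympy as sp

x, y, t = sp.symbols('x y t')
f = sp.exp(x) * sp.sin(y)

# Expand f(t*x, t*y) in powers of t: the coefficient of t**k collects the
# homogeneous terms of degree k; keep degrees 0, 1 and 2 only.
series_t = sp.series(f.subs({x: t*x, y: t*y}), t, 0, 3).removeO()
print(sp.expand(series_t.subs(t, 1)))   # x*y + y
```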

Remark. It is easy to notice that Maclaurin's formula admits a represen-


tation of the form
f(x, y) = f(0, 0) + P₁(x, y) + P₂(x, y) + ... + Pₙ₋₁(x, y) + Rₙ,
where P_k(x, y) is a homogeneous polynomial of degree k in x and y.
Sometimes Maclaurin's formula for a given function f(x, y) can easily
be obtained by applying asymptotic relations for infinitesimals.
Example. Expand the function f(x, y) = 1/((1 − x)(1 − y)) by applying
Maclaurin's formula with the remainder as given by Peano.
◄ We have

f(x, y) = 1/((1 − x)(1 − y)) = 1/(1 − x) · 1/(1 − y)
        = (1 + x + x² + o(x²))(1 + y + y² + o(y²))
        = 1 + x + y + x² + xy + y² + o(ρ²),

where ρ² = x² + y². ►
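The same mechanical check works here. The sketch below (ours, assuming SymPy; t is again an auxiliary bookkeeping variable) expands 1/((1 − x)(1 − y)) and keeps the terms of total degree at most two:

```python
# Product of two geometric series truncated at total degree 2 (sketch, assuming SymPy).
import sympy as sp

x, y, t = sp.symbols('x y t')
f = 1 / ((1 - x) * (1 - y))

series_t = sp.series(f.subs({x: t*x, y: t*y}), t, 0, 3).removeO()
print(sp.expand(series_t.subs(t, 1)))   # x**2 + x*y + x + y**2 + y + 1
```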

12.9 Extrema of a Function of Several Variables


Let a function z = f(x, y) be defined on some region D and a point
M₀(x₀, y₀) be an internal point of this region.
Definition. If there exists a number δ > 0 such that for all Δx and Δy
meeting the conditions |Δx| < δ and |Δy| < δ we have

Δf(x₀, y₀) = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀) ≤ 0,

then M₀(x₀, y₀) is called a point of local maximum of f(x, y); if for all
Δx, Δy meeting the conditions |Δx| < δ and |Δy| < δ

Δf(x₀, y₀) = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀) ≥ 0,

then M₀(x₀, y₀) is called a point of local minimum.

Fig. 12.16    Fig. 12.17

In other words, the point M₀(x₀, y₀) is a point of maximum or
minimum of f(x, y) if there exists a δ-neighbourhood of M₀(x₀, y₀) such
that at all points M(x, y) within this neighbourhood the increment

Δf = f(x, y) − f(x₀, y₀)

retains its sign.
Examples. (1) For the function z = x² + y² the point O(0, 0) is a point
of minimum (Fig. 12.16).
(2) For the function z = 1 − x² − y² the point O(0, 0) is a point of
maximum (Fig. 12.17).

(3) For the function

J(x, y) = e' ;,y2, x2


X =
+ Y2 -;c 0,
y = 0,
the point 0(0, 0) is the point of maximum (Fig. 12.18).
◄ There exists a neighbourhood of the point 0(0, 0), e.g., a circle of radius
1/2 (see Fig. 12.18), such that at all points in it other than
0(0, 0) the value of the function f(x, y) is less than 1 = /(0, 0). ►
We will only consider points of strict maximum and minimum, i.e., ones
for which the strict inequalities Δf < 0 or Δf > 0 hold, respectively,
for all the points M(x, y) in some δ-neighbourhood of the point M₀ other
than M₀.

Fig. 12.18

The value of a function at a maximum point is called a maximum, and


the value of the function at a minimum point is called a minimum of the
function. Maximum and minimum points of the function are called extre-
mum points of the function, and the maxima and minima themselves are
called its extrema.
Theorem 12.11 (necessary condition for an extremum). If a function
z = f(x, y) has an extremum at a point M₀(x₀, y₀), then at that point each
partial derivative ∂z/∂x and ∂z/∂y either vanishes or does not exist.
◄ Suppose that the function z = f(x, y) has an extremum at the point
M₀(x₀, y₀). We assign to the variable y the value y₀. The function z =
f(x, y) then becomes a function of one variable x: z = f(x, y₀). Since at
x = x₀ it has an extremum (maximum or minimum; Fig. 12.19), its deriva-
tive with respect to x at x = x₀, i.e., (∂z/∂x)|(x₀,y₀), either vanishes or does
not exist. Likewise, we see that (∂z/∂y)|(x₀,y₀) is zero or does not exist. ►

Fig. 12.19    Fig. 12.20

Points where ∂z/∂x and ∂z/∂y are zero or do not exist are called the
critical points of the function z = f(x, y); points where ∂z/∂x and ∂z/∂y
are zero are called its stationary points.
The theorem only gives the necessary conditions for an extremum. For
example, the function z = x² − y² has derivatives ∂z/∂x = 2x, ∂z/∂y = −2y,
which become zero at x = y = 0. But at O(0, 0) this function has no ex-
tremum.

Indeed, the function f(x, y) = x² − y² vanishes at the point O(0, 0),
and at points M(x, y) arbitrarily close to O(0, 0) it assumes both
positive and negative values. For it

Δf(0, 0) = f(x, y) − f(0, 0) = x² − y²,  with Δf > 0 at points (x, 0) and Δf < 0 at points (0, y)

for arbitrarily small |x| > 0 and |y| > 0.
A point O(0, 0) of this type is called a minimax point (Fig. 12.20).
Sufficient conditions for an extremum of a function of two variables
are given by the following theorem.
Theorem 12.12. Let a point M₀(x₀, y₀) be a stationary point of a func-
tion f(x, y), i.e.,

f′_x(x₀, y₀) = 0  and  f′_y(x₀, y₀) = 0.

Suppose that in some neighbourhood of the point M₀(x₀, y₀), including
M₀ itself, the function f(x, y) has continuous partial derivatives up to the
second order. Then:
(i) the function f(x, y) has a maximum at M₀(x₀, y₀) if at that point
the determinant

D = | f″_xx(x₀, y₀)  f″_xy(x₀, y₀) |
    | f″_xy(x₀, y₀)  f″_yy(x₀, y₀) |  = f″_xx(x₀, y₀) f″_yy(x₀, y₀) − f″_xy²(x₀, y₀) > 0

and f″_xx(x₀, y₀) < 0 (f″_yy(x₀, y₀) < 0);
(ii) the function f(x, y) has a minimum at M₀(x₀, y₀) if

D = f″_xx(x₀, y₀) f″_yy(x₀, y₀) − f″_xy²(x₀, y₀) > 0

and f″_xx(x₀, y₀) > 0 (f″_yy(x₀, y₀) > 0);
(iii) the function f(x, y) has no extremum at M₀(x₀, y₀) if

D = f″_xx(x₀, y₀) f″_yy(x₀, y₀) − f″_xy²(x₀, y₀) < 0.

If D = 0, then at the point M₀(x₀, y₀) the function f(x, y) may or may
not have an extremum. Further examination is then needed.
◄ We will only prove items (i) and (ii) of the theorem. We write Taylor's
formula of the second order for f(x, y):

f(x₀ + Δx, y₀ + Δy) = f(x₀, y₀) + f′_x(x₀, y₀) Δx + f′_y(x₀, y₀) Δy
  + 1/2 [f″_xx(x, y) Δx² + 2f″_xy(x, y) Δx Δy + f″_yy(x, y) Δy²]|x = x₀ + θΔx, y = y₀ + θΔy,

where 0 < θ < 1.

As stated, f′_x(x₀, y₀) = 0 and f′_y(x₀, y₀) = 0, so that

Δf = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀)
   = 1/2 [f″_xx(x, y) Δx² + 2f″_xy(x, y) Δx Δy + f″_yy(x, y) Δy²]|x = x₀ + θΔx, y = y₀ + θΔy.   (*)

It is seen that the sign of the increment Δf is determined by that of the
trinomial on the right of (*), i.e., by the sign of the second differential d²f.
We denote for short

A = f″_xx(x, y),   B = f″_xy(x, y),   C = f″_yy(x, y)

and write (*) in the form

Δf = 1/2 (A Δx² + 2B Δx Δy + C Δy²)|x = x₀ + θΔx, y = y₀ + θΔy.   (**)

Suppose that at M₀(x₀, y₀) we have AC − B² > 0, i.e.,

f″_xx(x₀, y₀) f″_yy(x₀, y₀) − f″_xy²(x₀, y₀) > 0.

Since the partial derivatives of the second order of f(x, y) are continuous,
the inequality AC − B² > 0 will hold within some neighbourhood of
M₀(x₀, y₀).
When AC − B² > 0, A = f″_xx(x, y) ≠ 0 at the point M₀(x₀, y₀), and
therefore within some neighbourhood of M₀, since f″_xx(x, y) is continuous,
it has a constant sign that coincides with that of A at (x₀, y₀).
But in the region where A ≠ 0, we have

A Δx² + 2B Δx Δy + C Δy² = 1/A [(A Δx + B Δy)² + (AC − B²) Δy²].

It is seen that for AC − B² > 0 in some neighbourhood of the point
M₀(x₀, y₀) the sign of the trinomial A Δx² + 2B Δx Δy + C Δy² coincides
with the sign of A at the point (x₀, y₀) (and also with the sign of C, because
A and C cannot have unlike signs for AC − B² > 0).
Since by virtue of (**) the sign of A Δx² + 2B Δx Δy + C Δy² at the
point (x₀ + θΔx, y₀ + θΔy) determines the sign of the difference

Δf = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀),

we come to the following conclusion: if for the function f(x, y) at the sta-
tionary point (x₀, y₀) the conditions AC − B² > 0 and A < 0 (C < 0) hold,
then for all sufficiently small |Δx| and |Δy| we have

Δf = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀) ≤ 0,

and so at (x₀, y₀) the function has a maximum.

If at the stationary point (x₀, y₀) we have AC − B² > 0 and A > 0 (C > 0), then
for all sufficiently small |Δx| and |Δy| we have

Δf = f(x₀ + Δx, y₀ + Δy) − f(x₀, y₀) ≥ 0,

and so at (x₀, y₀) the function f(x, y) has a minimum. ►
Examples. (1) Examine for extremum the function

z = x² + 2y² − 2x + 4y − 6.

◄ Using the necessary conditions for an extremum, we find the stationary
points. For this purpose, we find the partial derivatives ∂z/∂x and ∂z/∂y
and equate them to zero.
We obtain the system of equations

∂z/∂x = 2x − 2 = 0,   ∂z/∂y = 4y + 4 = 0.

Hence x = 1, y = −1, so that M₀(1, −1) is a stationary point.
We now make use of Theorem 12.12. We have

A|M₀ = ∂²z/∂x²|M₀ = 2,   B|M₀ = ∂²z/∂x∂y|M₀ = 0,   C|M₀ = ∂²z/∂y²|M₀ = 4,

so that (AC − B²)|M₀ = 8 > 0.
Therefore, at M₀ we have an extremum. Since A|M₀ = 2 > 0, this is
a minimum.
If we transform z to the form

z = (x − 1)² + 2(y + 1)² − 9,

we can easily see that the right side will be minimal when x = 1,
y = −1. This is an absolute minimum of the function. ►
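The whole procedure of this example, i.e., solving the stationary equations and applying the test AC − B², can be reproduced by the sketch below (ours, assuming SymPy is available):

```python
# Stationary point and second-order test for z = x**2 + 2*y**2 - 2*x + 4*y - 6 (sketch, assuming SymPy).
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 + 2*y**2 - 2*x + 4*y - 6

stationary = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y], dict=True)
print(stationary)                        # [{x: 1, y: -1}]

p = stationary[0]
A = sp.diff(z, x, 2).subs(p)             # 2
B = sp.diff(z, x, y).subs(p)             # 0
C = sp.diff(z, y, 2).subs(p)             # 4
print(A*C - B**2, A)                     # 8 > 0 and A = 2 > 0  ->  minimum
print(z.subs(p))                         # the minimum value, -9
```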
(2) Examine for extremum the function z = xy.
◄ We find the stationary points of the function, for which purpose we
make up the system of equations

∂z/∂x = y = 0,   ∂z/∂y = x = 0.

Hence x = y = 0, so that M₀(0, 0) is a stationary point. Since

A|M₀ = ∂²z/∂x²|M₀ = 0,   B|M₀ = ∂²z/∂x∂y|M₀ = 1,   C|M₀ = ∂²z/∂y²|M₀ = 0,

we have (AC − B²)|M₀ = −1 < 0, and by Theorem 12.12 at the point M₀(0, 0)
there is no extremum. ►
(3) Examine for extremum the function z = x⁴ + y⁴.
◄ We find the stationary points of the function. From the system of
equations

∂z/∂x = 4x³ = 0,   ∂z/∂y = 4y³ = 0,

we get x = y = 0, so that the point M₀(0, 0) is a stationary point. Further,
we have

A|M₀ = ∂²z/∂x²|M₀ = 0,   B|M₀ = ∂²z/∂x∂y|M₀ = 0,   C|M₀ = ∂²z/∂y²|M₀ = 0,

so that (AC − B²)|M₀ = 0 and the theorem gives us no answer as to
whether or not there exists an extremum.
Therefore, we proceed as follows. For the function z = x⁴ + y⁴ at all
points M(x, y) other than M₀(0, 0), we will have

Δf(0, 0) = f(x, y) − f(0, 0) = x⁴ + y⁴ > 0,

so that, by definition, at the point M₀(0, 0) the function z has an absolute
minimum.
Reasoning along the same lines, we establish that the function
z = −x⁴ − y⁴ at the point O(0, 0) has a maximum, and the function
z = x⁴ − y⁴ at O(0, 0) has no extremum. ►
Conditional extremum. Up to this point we have been dealing with local
extrema of a function whose arguments are not subject to any additional
conditions. Such extrema are known as unconditional ones.
However, problems are quite common which deal with conditional extre-
ma. Take a function z = f(x, y) defined on a region D. Suppose that on

this region a curve L is defined and we want to find the extrema of


f(x, y) only among those values which correspond to the points of the curve
L. Such extrema are called conditional extrema of the function
z = f(x, y) on L.
Definition. We say that at the point Mo(Xo, Yo) lying on L the function
f(x, y) has a conditional maximum (minimum), if the inequality
f(x, y) < f(x₀, y₀) (or f(x, y) > f(x₀, y₀)) holds at all points M(x, y) on
the curve L that belong to some neighbourhood of the point Mo(Xo, Yo)
and are different from Mo (Fig. 12.21).

Fig. 12.21 Fig. 12.22

If the equation of the curve L is φ(x, y) = 0, then the problem of find-
ing a conditional extremum of z = f(x, y) on the curve L can be formulated
as follows: find the extrema of the function z = f(x, y) in D under the condi-
tion that φ(x, y) = 0.
Therefore, in finding conditional extrema of the function z = f(x, y)
the arguments x and y can no longer be regarded as independent variables.
They are connected by the relation φ(x, y) = 0, which is known as a con-
straint equation.
To clarify the difference between unconditional and conditional extrema
we will take an example. The unconditional maximum of the function
z = 1 − x² − y² (Fig. 12.22) is equal to unity and it is achieved at the point
(0, 0). Corresponding to it is the point M, i.e., the vertex of the paraboloid.
We now add the constraint equation y = 1/2. The conditional maximum
will, clearly, be equal to 3/4. It is achieved at the point (0, 1/2), and cor-
responding to it is the vertex M₁ of the parabola, which is the line of inter-
section of the paraboloid with the plane y = 1/2. In the case of the
unconditional maximum we have the maximum z-coordinate among all the
applicates of the surface z = 1 − x² − y²; in the case of the conditional
maximum we have the maximum z-coordinate only among the applicates at the points of the
straight line y = 1/2 in the xy-plane.
One method of finding a conditional extremum of the function

z = f(x, y)

with the constraint

φ(x, y) = 0

is as follows.
Suppose that the constraint equation φ(x, y) = 0 defines y as a
single-valued differentiable function of x: y = ψ(x). Substituting ψ(x) for
y in z = f(x, y), we obtain a function of one argument,

z = f(x, ψ(x)) = F(x),

which takes into account the constraint condition.
The extremum (unconditional) of F(x) is the desired conditional ex-
tremum.
Example. Find the extremum of the function

z = x² + y²   (*)

subject to

x + y − 1 = 0.   (**)

◄ From the constraint equation (**) we find y = 1 − x. Substituting this
value of y into (*), we obtain a function of one argument x:

z = x² + (1 − x)².

Examine it for extremum: z′ = 2x − 2(1 − x) = 4x − 2, hence x = 1/2 is a criti-
cal point; z″ = 4 > 0, so that x = 1/2 (y = 1/2) gives a conditional minimum
of z (Fig. 12.23). ►
There is another method of solving the problem on a conditional extre-
mum, called the method of Lagrange multipliers.
◄ Let M₀(x₀, y₀) be a point of conditional extremum of a function
z = f(x, y) with the constraint φ(x, y) = 0. If we take y to be the function ψ(x) defined by
the constraint equation φ(x, y) = 0, we find that the derivative with respect
to x of f(x, y) at the point M₀ must be zero or, equivalently, the differential
of f(x, y) at M₀ must be zero:

(df)|M₀ = (f′_x dx + f′_y dy)|M₀ = 0.   (***)

From the constraint equation we have

(dφ)|M₀ = (φ′_x dx + φ′_y dy)|M₀ = 0.

Multiplying the last relation by a numerical factor λ yet to be found and
adding the result term by term to (***), we obtain

(f′_x + λφ′_x)|M₀ dx + (f′_y + λφ′_y)|M₀ dy = 0.

Fig. 12.23

Suppose that λ is taken such that

(f′_y + λφ′_y)|M₀ = 0

(we assume that φ′_y ≠ 0); then, since dx is arbitrary, we have

(f′_x + λφ′_x)|M₀ = 0.

The last two relations reflect the necessary conditions for an uncondi-
tional extremum at the point M₀(x₀, y₀) of the function

F(x, y) = f(x, y) + λφ(x, y),

called the Lagrange function.
Therefore, a point of conditional extremum of f(x, y) under the constraint φ(x, y) = 0 is bound
to be a stationary point of the Lagrange function

F(x, y) = f(x, y) + λφ(x, y),

where λ is some numerical factor. ►

We thus obtain the following rule for finding conditional extrema.
To find the points that may be conditional extremum points of the function z =
f(x, y) subject to the constraint φ(x, y) = 0, we:
(1) form the Lagrange function

F(x, y) = f(x, y) + λφ(x, y);

(2) equate to zero the derivatives ∂F/∂x and ∂F/∂y of this function and,
adding to the resultant equations the constraint equation, obtain the sys-
tem of three equations

∂F/∂x = f′_x(x, y) + λφ′_x(x, y) = 0,
∂F/∂y = f′_y(x, y) + λφ′_y(x, y) = 0,   (*)
∂F/∂λ = φ(x, y) = 0,

from which we find the value of λ and the coordinates x and y of the possible
extremum points.
The question of the existence and nature of a conditional extremum
is solved by examining the sign of the second differential of the Lagrange
function,

d²F(x, y) = ∂²F/∂x² dx² + 2 ∂²F/∂x∂y dx dy + ∂²F/∂y² dy²,

for the set of values x₀, y₀, λ obtained from (*), subject to the condition

φ′_x dx + φ′_y dy = 0   ((dx)² + (dy)² ≠ 0).

If d²F < 0, then at the point (x₀, y₀) the function f(x, y) has a condi-
tional maximum; if d²F > 0, a conditional minimum.
Specifically, if at a stationary point (x₀, y₀) the determinant D for
F(x, y) is positive,

D(x₀, y₀) = | F″_xx(x₀, y₀)  F″_xy(x₀, y₀) |
            | F″_xy(x₀, y₀)  F″_yy(x₀, y₀) |  > 0,

then at the point (x₀, y₀) the function f(x, y) has a conditional maximum if A =
F″_xx(x₀, y₀) < 0, and a conditional minimum if A = F″_xx(x₀, y₀) > 0.
We now return to the example considered above, where we sought an extre-
mum of the function z = x² + y² subject to the condition x + y = 1.
We will solve the problem by the method of Lagrange multipliers.

◄ The Lagrange function will then be

F(x, y; λ) = x² + y² + λ(x + y − 1).

To find the stationary points we construct the system

F′_x = 2x + λ = 0,
F′_y = 2y + λ = 0,
F′_λ = x + y − 1 = 0.

From the first two equations of the system we obtain x = y, and then
from the third equation of the system (the constraint equation) we find
x = y = 1/2 (the coordinates of a possible extremum). It appears that
λ = −1. The Lagrange function will thus be

F(x, y; −1) = x² + y² − x − y + 1.

For it F″_xx = 2, F″_yy = 2, F″_xy = 0, so that

D = | 2  0 |
    | 0  2 |  = 4 > 0

and F″_xx = 2 > 0, i.e., the point M₀(1/2, 1/2) is a point of conditional minimum of the
function z = x² + y² subject to the condition x + y = 1. ►
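The Lagrange computation of this example can be reproduced step by step. The sketch below is ours (assuming SymPy; the symbol name lambda and the variable name lam are ours): it solves the stationary system and evaluates the determinant D:

```python
# Conditional extremum of x**2 + y**2 subject to x + y = 1 by Lagrange multipliers (sketch, assuming SymPy).
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
f = x**2 + y**2
g = x + y - 1                           # constraint g(x, y) = 0
F = f + lam * g                         # Lagrange function

eqs = [sp.diff(F, x), sp.diff(F, y), g]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)                              # [{x: 1/2, y: 1/2, lambda: -1}]

p = sol[0]
D = sp.diff(F, x, 2) * sp.diff(F, y, 2) - sp.diff(F, x, y)**2
print(D.subs(p), sp.diff(F, x, 2).subs(p))   # 4 > 0 and 2 > 0 -> conditional minimum
```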
The fact that there is no unconditional extremum for the Lagrange func-
tion F(x, y) does not yet imply that there is no conditional extremum for
the function f(x, y) in the presence of the constraint φ(x, y) = 0.
Example. Find an extremum of the function z = xy subject to the condi-
tion y − x = 0.
◄ We write the Lagrange function

F(x, y; λ) = xy + λ(y − x)

and also the system for finding λ and the coordinates of possible extremum points:

F′_x = y − λ = 0,
F′_y = x + λ = 0,
F′_λ = y − x = 0.

From the first two equations we find x + y = 0 and obtain the system

x + y = 0,   y − x = 0,

whence x = y = 0. Then λ = 0. The Lagrange function will thus be

F(x, y; 0) = xy.

At the point (0, 0) the function F(x, y; 0) has no unconditional extremum; however,
z = xy has a conditional extremum when y = x. Indeed, we then have
z = x², and so at the point (0, 0) there is a conditional minimum. ►


The method of the Lagrange multipliers can be extended to include the


case of functions of any number of arguments.
We wish to find an extremum of the function

z = f(x₁, x₂, ..., xₙ)   (*)

subject to the constraint equations

φ₁(x₁, x₂, ..., xₙ) = 0,
φ₂(x₁, x₂, ..., xₙ) = 0,
. . . . . . . . . . . . . . . . . . . .   (**)
φₘ(x₁, x₂, ..., xₙ) = 0,

where m < n.
We form the Lagrange function

F(x₁, x₂, ..., xₙ) = f(x₁, x₂, ..., xₙ) + λ₁φ₁(x₁, x₂, ..., xₙ)
  + λ₂φ₂(x₁, x₂, ..., xₙ) + ... + λₘφₘ(x₁, x₂, ..., xₙ),

where λ₁, λ₂, ..., λₘ are constant multipliers to be found.
Equating to zero all the partial derivatives of the first order of F and
adding to the resultant equations the constraint equations (**), we
obtain a system of n + m equations from which we find λ₁, λ₂, ..., λₘ
and the coordinates x₁, x₂, ..., xₙ of the possible points of conditional
extremum. The question of whether or not the points obtained by the
Lagrange method are conditional extrema can often be settled from physical
or geometrical considerations.
Absolute maximum and minimum of a continuous function. We would
like to find the largest or smallest value of a function z = f(x, y) (i.e., its
absolute maximum or minimum), the function being continuous on some
closed region D. By Theorem 12.3 in this region there is a point (x₀, y₀)
at which the function attains an absolute maximum (minimum). If the point
(x₀, y₀) lies within the region D, then the function f has a maximum (mini-
mum) there, so that in this case the point of interest is among the critical
points of f(x, y). However, the function may also attain its absolute maxi-
mum (minimum) on the boundary of the region.
Therefore, to find the absolute maximum (minimum) of the function
z = f(x, y) in the bounded closed region D we will have to find all maxima
(minima) of the function within the region, and also on the boundary of
the region. The largest (smallest) of these numbers will be the desired abso-
lute maximum (minimum) of z = f(x, y) in D. We now turn to a differentia-
ble function.

Example. Find the absolute maximum and minimum of the function


z = x² + y² in the region D = {−1 ≤ x ≤ 1, −1 ≤ y ≤ 1}.
◄ We look for the critical points of z = x² + y² inside D. To this end,
we form the system of equations

∂z/∂x = 2x = 0,   ∂z/∂y = 2y = 0.

Hence x = y = 0, so that the point O(0, 0) is a critical point of z. Since
∂²z/∂x² = 2, ∂²z/∂y² = 2 and ∂²z/∂x∂y = 0 at O(0, 0), we have AC − B² = 4 > 0 and
A = 2 > 0, and so at this point the function z = x² + y² has a minimum
equal to zero.

Fig. 12.24

We now find the largest and smallest values of the function on the
boundary Γ of D. On the part of the boundary Γ₁ = {x = 1, −1 ≤ y ≤ 1}
we have z = 1 + y², dz/dy = 2y, so that y = 0 is a critical point. Since
d²z/dy² = 2 > 0, at this point the function z = 1 + y² has a minimum equal
to unity.
At the ends of the segment Γ₁, at the points (1, −1) and (1, 1), we have
z(1, −1) = z(1, 1) = 2. Using considerations of symmetry, we can ob-
tain the same results for the other parts of the boundary: Γ₂ = {y = 1,
−1 ≤ x ≤ 1}, Γ₃ = {x = −1, −1 ≤ y ≤ 1} and Γ₄ = {y = −1, −1 ≤ x ≤ 1}.


We finally find that the absolute minimum of z = x² + y² in D is zero
and it is achieved at the internal point O(0, 0) of the region, and the abso-
lute maximum of the function, equal to two, is achieved at four points
on the boundary: M₁(1, −1), M₂(1, 1), M₃(−1, 1), M₄(−1, −1)
(Fig. 12.24). ►
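A crude numerical cross-check of this answer is also possible. The sketch below is ours (assuming NumPy is available): it evaluates z = x² + y² on a fine grid over the square D and reports where the smallest and largest values occur:

```python
# Brute-force check of the absolute extrema of x**2 + y**2 on [-1, 1] x [-1, 1] (sketch, assuming NumPy).
import numpy as np

xs = np.linspace(-1.0, 1.0, 401)
X, Y = np.meshgrid(xs, xs)
Z = X**2 + Y**2

i_min = np.unravel_index(Z.argmin(), Z.shape)
i_max = np.unravel_index(Z.argmax(), Z.shape)
print(Z.min(), (X[i_min], Y[i_min]))   # ~0 at (0.0, 0.0), the interior minimum
print(Z.max(), (X[i_max], Y[i_max]))   # 2.0 at one of the four corners of the square
```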

Exercises
Find the domains of the following functions:
1. z= sin- 1 ~ + -Jxy. 2. z = ✓ 1 - x2 + ✓ 1 -y2 . 3. z= ✓sin(x2+y 2 ).

4. z =
x-y
1 + !Y . 5. z = In (x 2 + y). 6. z = xy + Vx2 + y 2 - R2 +

/In 2R
2
2 . 7. z = cot 7r(X + y). 8. (a) z = ✓sinx siny ; (b) z =
✓ X +y
✓ sin x - 1 + ✓ sin y - 1 .

Draw the level lines of the functions


9. (a) Z = X + y; (b) z = x2 + y 2•
y
10. (a) z = \ ; (b) z = VX.
X
11. (a) z = In (x 2 + y); (b) z = sin - 1xy.
Find the level surfaces of the functions of three independent variables:
12. U = X + y + z. 13. u = x2 + Y2 - z2.
Find the limits of the functions:
2 - vxy + 4 x2y
14. (a) lim (b) lim
x--+O xy x--+O x2 + y2 .
y--+O y--+O
sin (xy) . sin (xy)
15. (a) lim (b) lim
x--+O xy x--+O X
y--+O y--+2
16. (a) lim
x--+ 00
y--+k
(1 +~)'; (b) lim
x--+ 00
y--+00
x+y
x2 + Y2 ·

(k = const)
1
x2 _ Y2 . - xz + y2
17. (a) lim x2 + Y2 , (b) lim e
x-+O x-+O x2 + y2 .
y--+O y--+O

x+y
18. Show that the function z = - - - has no limit as x --+ 0 and y --+ 0.
x-y
Consider the behaviour of the function on the straight lines y = kx.
Find the sets of discontinuity points for the following functions.
19. (a) z= 2
2
2 ; (b) z = In✓ x2 + y 2 •
X +y
1 1
20. (a) z = --------,=---------,,- ;
(b) z = (x _ y )2 .
1 - x2 - Y2
1
21. (a) z = cos - ; (b) z = xy.
xy
1
22. z = 2 2 •
sin 1rx + sin 1ry
Find partial derivatives of the functions and their total differentials:
23. z = x3 + y 2 - 2.xy. 24. z = tan - 1 yx .
25. z = e-xly_ 26. z = In (x + lny).
27. u = xy + yz + xz. 28. u = ✓x 2 + y 2 + z 2 •

29. z = cosh (x2 y + sinh y). 30. u = xz.


Find the derivatives of the composite functions:
31. (a) z = x 2 + xy + y 2, where x = t 2, y = t. Find :: .

(b) z = ylx, where x = e1, y = 1 - e21. Find !~ .


. az dz
32. (a) z = xeY, where y = tan - 1 x. Find ax and dx.
_2 2 • az dz
(b) z = In (x- - y ), where y = ex. Find ax and dx.

33. (a) z = tan - i; , where x = u sin v, y = u cos v. Find : and :~.

(b) z = x2 + y 2, where x = u + v, y = u - v. . az
Find iJu and av .
az
Using the formula for the derivative of a composite function of two
variables, find ∂z/∂x and ∂z/∂y of the functions:
34. (a) z = f(u), where u = sin - 1 xy + yx ,
. X
(b) z = f(u), where u = sin - + etanzy.
y

35. (a) z = f(u, v), where u = x 2 Iny, v = sin - 1 x


y
(b) Z = f(u, v), where U = exi+cosy, V = tan - l y ,
X

Find dy/dx of the functions given implicitly by the equations:

37. In tan Y Y = b.
X X
38 • X
2
y + .
Sill
- 1 X
- + -1 = 0.
y y
40. Find the slope of the tangent to the curve x 2 + y 2 = lOy at its intersec-
tion with the straighf line x = 3.
41. Find the points where the tangent to the curve x 2 + y 2 + 2.x -
2y - 2 = 0 is parallel to the x-axis.
Find ∂z/∂x and ∂z/∂y:
42. X COS y +yCOS Z + Z COS X = 1.
x2 Y2 z2
43. -2 + -2 -2 + = 1.
a b c
Write the equations of the tangent plane and normal to the surfaces:
44. z = x2 + 2y 2 at the point (1, 1, 3).
x2 Y2 z2 .
45. - 2 +- 2 +- 2 = 1 at the point (Xo, Yo, Zo).
a b c
46. z = sin x cosy at the point ('1r 14, 1r/4, 1/2).
47. z = x2 + y 2 + 2.xy at the point (1, 1, 4).
48. x2 + y 2 + xyz - 3 = 0 at the point (1, 1, 1).
49. Form the equations of the tangent planes to the surface x2 + 2y 2 +
3z 2 = 21 parallel to the plane x + 4y + 6z = 0.
Find three or four first terms in the Tuylor expansions of the functions:
50. f(x, y) = ex cosy in the neighbourhood of the point (0, 0).
51. f(x, y) = ex In (1 + y) in the neighbourhood of the point (0, 0).
52. f(x, y) = x" in the neighbourhood of the point (1, 1).
53. f(x, y) = tan - 1 t-
Y in the neighbourhood of the point (0, 0).
+ xy
54. f(x, y) = ex+y in
the neighbourhood of the point (1, -1).
Using the definition of the extremum of a function examine for extre-
mum the foil owing functions:
55. z = 1 - (x - 2)4 - (y - 3)4 at the point (2, 3).
56. z = (x - 2)4 + (y - 3)4 at the point (2, 3).
57. z = x4 - y 4 at the point (0, 0).

58. z = sin4 x - (y - 1)4 at the point (0, 1).


Using the sufficient conditions for the extremum of a function of two
variables, examine for extremum the functions:
59. z = 2y - x 2 - y 2• 60. z = x 2 - 2x + y 2•
61. z = 2xy - 4x - 2y. 62. z = x 3 + 8y 3 - 6xy + 1.
63. z = ex12 (x + y 2 ).
64. Find the absolute maximum and minimum of the function z = x 2 - y 2
on the closed circle x2 + y 2 ~ 1.
65. Find the absolute maximum and minimum of the function
z = x 2y(4 - x - y) in the triangle bounded by the straight lines x = 0,
y = 0, X +y = 6.
66. Find the dimensions of a rectangular open pool having the smallest
surface area, if its volume is V.
67. Find the dimensions of a rectangular parallelepiped that for a given
total surface S has a maximum volume.

Answers
0 ~ X ~ 2, [- 2 ~ X ~ 0,
1. [ and 2. The square formed by the segments of the
y ~ 0, y ~ 0.
straight lines x = ± 1 and y = ± 1, including its sides. 3. The family of concentric circles
21rk ~ x2 + y 2 ~ (2k + l)1r, k = 0, 1, 2, .... 4. The entire plane save for the points on
the straight lines y = x and y = 0. 5. The part of the plane above the parabola y = -x2 •
6. The points on the circle x2 + y 2 = R 2 • 7. The entire plane save for the straight lines
x + y = n, n = 0, ± l, ± 2, ....

8. (a) r sin x ~ 0,
.
smy ~ 0 ,
which yields

2k,r ~ x ~ (2k + 1),r, k = 0, ±1, ±2, ... ,


[
2m1r ~ y ~ (2m + l)1r, m = 0, ±l, ±2, ... .

sin x ~ 0,
or [ . which yields
smy ~ 0 ,
(2k - 1),r ~ x ~ 2k1r, k = 0, ±1, ±2, ... '
[
(2m - l)1r ~ y ~ 2m1r, m = 0, ±1, ±2, ... ,
The domain is the hatched squares (Fig. 12.25)
sin x - 1 = 0,
(b) [ . l
smy - = O, which yields

[
x, = ! + 2/c,r, k = 0, ±1, ±2, ... '
Ym =- + 2m1r, m = 0, ±1, ±2,
2
The function is defined at the points Mkm = (Xk, Ym), 9. (a) Straight lines parallel to the
line x + y = O; (b) concentric circles with centre at the origin of coordinates. 10. (a) Parabo-

!I

::r

Fig. 12.25

las y = Cx2; (b) parabolas y = Cvx. 11. (a) Parabolas y = C - x 2 (C > O); (b) Hyperbolas
xy = C, where ICI ~ 1. 12. Planes parallel to the plane x + y + z = 0. 13. For u > 0 one-
sheet hyperboloids of revolution about the z-axis; for u < 0 two-sheet hyperboloids of revolu-
tion about the z-axis; both families of surfaces are separated by the cone x 2 + y 2 -
z 2 = 0. 14. (a) -1/4; (b) 0. 15. (a) l; (b) 2. 16. (a) ek; (b) 0. 17. (a) no limit; (b) 0.
(1 + k) x l - k .
18. We put y = kx, then z = - - - - = ---, x ;c 0. Fork= -1 we have hm z = 0,
(1 - k)x 1+ k x-o
for k = 1/2 lim z = 3, and for k = 3 lim z = -2, so that the given function has no limit
at the point (0, 0). 19. (a) The point (0, O); (b) the point (0, 0). 20. (a) The discontinuities
form the circle x2 + y 2 = 1; (b) the discontinuities form the line y = x. 21. (a) The points
of discontinuities lie on the x- and y-axes; (b) 0 (empty set). 22. All points (m, n), where
. az . .2 az ·
m and n are integers. 23. ax = 3x- - 2y; ay = 2y - 2x; dz = (3x 2 - 2y) dx + 2(y -

x) dy. 24. -
oz
= -2-y -2; -az = - X
dz = y dx - X dy
25. -
oz
= - ! e - x/y ·,
2 ; 2 2 •
oy x +y oy X2 +y X +Y OX Y
!!z X e - x/y !iz 1 az 1
;y = y2 e-x/y;dz = y2 (-ydx + xdy).26. ;x = X + lny; oy = y(x + lny);

y dx + dy au au au du = (y + z) dx +
dz = Y (X + 1ny)" 27. - 0-X =y + z; -ay = X + z; -
oz
=x+y;

au X au y au
(x + z) dy + (x + y)dz. 28. ax = ✓x2 + Yz + z2 ; oy = ✓ xi + Yi + zz --
oz

z x dx + y dy + z dz oz . . az
-~~~~~~~~~~; du=--~---_-_-_-_-_-_-_-_-- 29. - = smh (x 2y+smhy) 2xy;
✓ x2 + Yi + z2 ✓ x2 + Yi + z2 ax ay
sinh (x 2y+sinh y)(x 2 + cosh y); 2
dz = sinh (x y + sinh y)[2xy dx + (x + cosh y) dy].
2

30. du = xyz - 1 [yz dx + xz In x dy + xy In x dz]. 31. (a) 4t 3 + 3t 2 + 2t; (b) - 2 cosh t.


32 _ (a) az = e>'; dz = e>' (l + x ) ; (b)
l + x2
az = . 2x
x2 - y 2
; dz = 2(x - yex) .
x2 - y2
az
33. (a)-=
au av
ax

O; - az = 1; (b)-
au
dx
az = 4u; -az = 4v. 34. (a)-=/
av
az
ox
, (u) ---;::===-+-
Y
✓ l _ x2y2
1] ;
Y
ax
l dx

jz_ = f' (u) [


qr
x
✓ 1 _ x 2y 2
- ~];
y
(b) az = f' (u) [cos~ . l +etan .zy -----~-
ax Y Y cos 2 xy
Y] '
_!z =f'(u)[-cos~ --~ + e1an.zy_l__ x]. 35. (a)_!_z__ = _af 2xy + -~[__ 1
qr Y y 2 2
cos xy ax au av ✓y2 _ x2
-~£ = aJ x2 _ aJ ==== X (b) az = aJ ~2+cosy 2x _ aJ Y oz _
ay au av ✓y2 _ x2 Y ax au av x + y2
2 1
ay
_ aJ ex 2 + cosy . sm

y + _aJ . - - - - -xc - - - 36. y' = X
37. y
I
= -y 38. y' =
au av x 2 + y 2 y X

yx!'- I - yx lny
~- -- 39. y' = - - - - . 40. At the point M1 ( - 3; 1),
(1 _ x2 y2)✓y2 _ xi + xy xyx- 1 -x!' In x
az
Y ' = 3/4; at the point M 2(3; 9), y' = -3/4. 41. M1(-l; 3), M2(-l; -1). 42. - ax =

z sin x - cos y . az x siny - cos z oz az c 2y


43. - - - = - -2- 44. 2x +
cos x - y sin z ay cos x - y sin z ax ay b z

x- 1 y - l z- 3 XXo YYo ZZo X - Xo Y - Yo


4y - z=3; ---= = 45. - - + - - + - - = 1; = --- =
2 4 -1 a2 b2 c2 Xo
a2
z-Zo x-1r/4 =-="-,r/4 = z-1/2
46. x-y-2z+l=0; 47. 4x+4y-z-4=0;
Zo 1 -1 -2
c2
x-1 y-1 z-4 X - l y-1 z-1
---=---=--- 48. 3x + 3y + z - 7 = O; = --- = ---
4 4 -1 3 3 1

49. x + 4y + 6z + 21 = _0; x + 4y + 6z - 21 = 0. 50. 1 + x + ~ (x2 - y 2) + ! (x3 - 3xy2).

1 1
51. Y + 21 (2xy - T) + 31 (3x2y - 3.xy2 + 2y 3 ). 52. 1 + (x - l)+(x - l)(y - 1) +

1 1 x-y
2 (x - 1) (y - 1). 53. Hint: Use the formula tan - - -- tan - 1 x - tan - 1 y. We
2
1 + xy

obtain: x - Y - 31 (x3 - y3) + _51 (x5 - ys). 54.1 + [(x - l)+(y+l)]+ [(x-l)+(y+l)]2 +
2!


[(x - 1) + (y + 1)) 3 x +y (x + y)2 (x + y) 3


3! = 1+ l! + 2! + 3! • 55. Zmax = 1. 56. Zmin = 0.
57. No extremum. 58. No extremum. 59. Zmax = 1 at the point (0, 1). 60. Zmin = -1 at the

point (1, 0). 61. No extremum. 62. Zmin = 0 at point (1, 1/2). 63. Zmin = - 2 at point
e
( - 2, 0). 64. The absolute maximum z = 1 at the points (1, 0) and ( -1, 0). The absolute minimum
z = -1 at the points (0, 1) and (0, -1). 65. The absolute maximum z = 4 at the point
(2, 1). The absolute minimum z = -64 at the point (4, 2). 66. x = y =½v, ½v,
z = ~ Vzv. 67. Cube with a side a= j.
Appendix I

Elementary Functions
1. Power functions. The power function y = x^α, where α is any real
number, is defined for all x > 0; this function monotonically increases if
α > 0 and monotonically decreases if α < 0, as shown in Figs. 1.1 and 1.2,
respectively.
Fig. 1.1    Fig. 1.2

If α is a positive integer the function y = x^α is defined at every point
of the number line −∞ < x < +∞. The graphs of this function for α = 3
and α = 4 are shown in Figs. 1.3 and 1.4, respectively.
If α is a negative integer the function y = x^α is defined for all values
of x except x = 0 (Figs. 1.5 and 1.6).
If α = p/q > 0, where q is odd, is a rational number, the function y = x^α
is defined everywhere on the number line; if q is even this function is de-
fined for all x ≥ 0.
2. Exponential functions. The exponential function y = aˣ, where a > 0
and a ≠ 1, is defined at every point of the number line R. The number
a is called the base of the exponential function. This function monotonically
increases if a > 1 and monotonically decreases if 0 < a < 1 (Fig. 1.7).
3. Logarithmic functions. The logarithmic function y = logₐ x to the
base a, where a > 0 and a ≠ 1, is defined on the interval (0, +∞).

Fig. 1.3    Fig. 1.4    Fig. 1.5    Fig. 1.6    Fig. 1.7

This function monotonically increases if a > 1 and monotonically decreases if
0 < a < 1 (Fig. 1.8).
The logarithmic function y = loga x is the inverse of the exponential
function y = ax and vice versa.

Fig. 1.8

The logarithmic function to the base a = e is called the natural


logarithm and is usually denoted by In x; the logarithmic function to the
base a = 10 is called the common logarithm and sometimes denoted by
log x. In short, logₑ x = ln x and log₁₀ x = log x.
4. Trigonometric functions. The sine function y = sin x is a periodic
function with period T = 21r defined for all x. The graph of the sine func-
tion is shown in Fig. 1.9.

y=sinx

Fig. 1.9
y

Fig. 1.10

The cosine function y = cos x is a periodic function with period T = 2π
defined for all x (Fig. 1.10). The graph of this function is obtained from
that of the sine function by translating the latter by −π/2 along the x-axis.
The tangent function y = tan x is a periodic function with period T = π
defined everywhere except at the points x = (2k + 1)π/2 (k = 0, ±1,
±2, ...) (Fig. 1.11).

Fig. 1.11

Fig. 1.12

The cotangent function y = cot x is a periodic function with period
T = π defined everywhere except at the points x = kπ (k = 0, ±1, ±2,
...) (Fig. 1.12).
The secant and the cosecant functions are given by the formulas
sec x = 1/cos x and cosec x = 1/sin x, respectively; they are defined every-
where except at the points where the denominators vanish.

Fig. 1.13

5. Inverse trigonometric functions. (1) y = sin - 1 x.


Let us consider the sine function y = sin x on the closed interval
[−π/2, π/2], where this function monotonically increases. Hence, the sine
function has the inverse function x = sin⁻¹ y defined on the closed interval
[−1, 1], so that the range of x = sin⁻¹ y is the closed interval [−π/2, π/2].
The graph of y = sin⁻¹ x is shown in Fig. 1.13.
(2) y = cos⁻¹ x.
Let us consider the cosine function y = cos x on the closed interval
[0, π], where this function monotonically decreases. On the closed interval
[−1, 1] there exists the inverse function x = cos⁻¹ y whose values fill
the closed interval [0, π]. The graph of y = cos⁻¹ x is shown in Fig. 1.14.

Fig. 1.14

Fig. 1.15

(3) y = tan⁻¹ x.
Let us consider the tangent function y = tan x on the open interval
(−π/2, π/2), where this function monotonically increases. Then the range
of y = tan x is the interval (−∞, +∞), so that there exists the inverse func-
tion x = tan⁻¹ y defined at every point of the number line; the values of
x = tan⁻¹ y fill the interval (−π/2, π/2). The graph of y = tan⁻¹ x is
shown in Fig. 1.15.
(4) y = cot⁻¹ x.
As before, we consider the cotangent function y = cot x on the interval
(0, π), where y = cot x monotonically decreases. The range of y = cot x
is the interval (−∞, +∞), so that there exists the inverse function
x = cot⁻¹ y defined at every point of the number line; the values of
x = cot⁻¹ y fill the interval (0, π). The graph of y = cot⁻¹ x is shown
in Fig. 1.16.
Fig. 1.16    Fig. 1.17    Fig. 1.18

6. Hyperbolic functions. The hyperbolic sine is given by

sinh x = (eˣ − e⁻ˣ)/2.   (1.1)
The domain is the interval ( - oo, oo).


The range is the interval (−∞, ∞).
The hyperbolic sine is an odd function since sinh ( - x) = - sinh x. The
graph is shown in Fig. 1.17.
The hyperbolic cosine is given by

cosh x = (eˣ + e⁻ˣ)/2.   (1.2)

The domain is the interval ( - oo, oo).

Fig. 1.19    Fig. 1.20

The range is the interval [1, ∞).
The hyperbolic cosine is an even function assuming its minimal value
at the point x = 0. The graph is shown in Fig. 1.18.
The hyperbolic tangent is given by

tanh x = sinh x / cosh x = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ).   (1.3)

The domain is the interval (−∞, ∞).
The range is the interval (−1, 1), so that |tanh x| < 1.

The hyperbolic tangent is an odd function. The graph is shown in
Fig. 1.19.
The hyperbolic cotangent is given by

coth x = cosh x / sinh x = (eˣ + e⁻ˣ)/(eˣ − e⁻ˣ).   (1.4)

The domain is the union (−∞, 0) ∪ (0, ∞) of two intervals.
The range is the union (−∞, −1) ∪ (1, ∞) of two intervals, so that
|coth x| > 1.
The hyperbolic cotangent is an odd function. The graph is shown in
Fig. 1.20.
Relations between hyperbolic functions:

cosh² x − sinh² x = 1,
cosh x = √(sinh² x + 1),   sinh x = √(cosh² x − 1),
sinh (x + y) = sinh x cosh y + cosh x sinh y,
sinh 2x = 2 sinh x cosh x,
cosh (x + y) = cosh x cosh y + sinh x sinh y,
cosh 2x = cosh² x + sinh² x,
tanh (x + y) = (tanh x + tanh y)/(1 + tanh x tanh y),
tanh 2x = 2 tanh x/(1 + tanh² x).

These relations follow from the formulas (1.1)-(1.4).
Since sinh x and tanh x are odd functions (and cosh x is an even function), the above relations yield

sinh (x − y) = sinh x cosh y − cosh x sinh y,
cosh (x − y) = cosh x cosh y − sinh x sinh y,
tanh (x − y) = (tanh x − tanh y)/(1 − tanh x tanh y).
In conclusion we shall prove the relation

(sinh x + cosh x)ⁿ = sinh nx + cosh nx   (n ∈ N).

Indeed, using (1.1) and (1.2), we get

(sinh x + cosh x)ⁿ = ((eˣ − e⁻ˣ)/2 + (eˣ + e⁻ˣ)/2)ⁿ = (eˣ)ⁿ = e^nx
  = (e^nx − e^(−nx))/2 + (e^nx + e^(−nx))/2 = sinh nx + cosh nx.
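The identity just proved can also be checked numerically at a few sample points. The short script below is ours and uses only the standard math module:

```python
# Numerical spot-check of (sinh x + cosh x)**n == sinh(n*x) + cosh(n*x).
import math

for x in (-2.0, -0.5, 0.3, 1.7):
    for n in (1, 2, 3, 7):
        lhs = (math.sinh(x) + math.cosh(x)) ** n
        rhs = math.sinh(n * x) + math.cosh(n * x)
        assert math.isclose(lhs, rhs, rel_tol=1e-12), (x, n, lhs, rhs)
print("identity verified on the sample points")
```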
Index

Abscissa, 14 Cartesian coordinate system, 14 Cylinder


Absolute value, 231 compatible with a polar elliptic, 93, 101
Adjoint operator, 210 system, 18 hyperbolic, 101
properties of, 210-211 Cauchy-Schwarz inequality, parabolic, 101
Angle 183 Cylindrical surface
dihedral, 50 Circle, 67 directing line of, 91
polar, 18 equation of, 15 generating line of, 91
Antiderivative, 409 Cofactor, 126
Applicate, 14 Column-vector, 103
Approximation Complex number(s), 294 Derivative(s)
accuracy of, 232 exponential form of, 299 . geometric interpretation of,
parabolic, 498 extracting a root of, 299 306
precision of, 232 operations on, 295 infinite, 309
Simpson's, 502 representation of, 294 left-hand, 309
trapezoidal, 498 trigonometric form of, of a constant function, 316
Area 296 of elementary functions, 319
of a curvilinear figure, Components of a vector, 30 of higher orders, 332, 559
457 Composite function, 278 of a ptocuct of functions,
of a plane figure, 476 derivatives of, 322, 545 317
Arc, 488 differentials of, 323, 549 of a quotient of functions,
Asymptotes Conditions 318
horizontal, 372 necessary, 237 of a sum of functions, 316
oblique, 369 sufficient, 237 partial, 538
vertical, 367 Cone of the second order, right-hand, 309
Axis, 11 102 Determinant(s)
coordinate, 11 Conic surface computing, 132-133
of abscissas, 12 directing line of, 93 multiplication of, 130
of applicates, 13 generating line of, 93 of a matrix of order n, 124
of ordinates, 12 vertex of, 93 of a square matrix, 122
of parabola, 78 Convexity of a curve, 362 of a triangular matrix, 125
polar, 18 Coordinate axis, 11 principal diagonal of, 19
Axes of coordinates reference point on, 11 properties of, 129-132
reflection of, 65 unit distance on, 11 second-order, 19
rotation of, 64 Coordinate plane, 14 secondary diagonal of, 19
translation of, 63 Coordinate system third-order, 20
Cartesian, 14 transposition of, 130
Basis polar, 18 Differentiable functions, 312
of linear space, 175 rectangular, 12 continuity of, 313
orthonormal, 31 Coordinates Differential(s), 314
Bernoulli's inequality, 238 of a vector, 30 invariance of form of, 323
Bilinear form, 214 origin of, 12 of a composite function, 549
Boundary point, 531 polar, 18 of higher orders, 334, 559
Bounded set, 235 Cramer's rule, 154 partial, 545
Curve(s) of the second total, 544
Cartesian coordinates, 11 order, 66 Directed line, 11
in a plane, 12 classification of, 83-89 Directed segment
in three-dimensional equation of, 67 magnitude of, 33
space, 13 standard equations of, Direction cosines, 37
origin of, 13 85-89 Dirichlet's function, 460

Discontinuity
  removable, 281
  unremovable, 282

Elementary functions
  continuity of, 275
  derivatives of, 319
Ellipse, 67
  directrices of, 70
  eccentricity of, 70
  equation of, 16, 86
  foci of, 68
  imaginary, 86
  optical property of, 81-83
  properties of, 68-71
  standard Cartesian equation of, 67
  vertices of, 68
Ellipsoid, 95
  of revolution, 95
Elliptic cylinder, 93
Equation(s)
  approximate solution of, 381
  constraint, 573
  of a circle, 15
  of a curve of the second order, 67
  of an ellipse, 16, 86
Equation of a plane
  general, 48
  normal coordinate, 49
  normal vector, 48
Error
  absolute, 232
  relative, 232
Extremum of a function, 355
Euclidean space(s)
  angle between vectors in, 184
  examples of, 183
  length of a vector in, 184
  orthogonalization in, 185-189
  orthonormal basis of, 187
  orthonormal vectors in, 185
  real, 182

Fixed vector(s), 24
  as a directed segment, 24
  equivalent, 24
  zero, 24
Free vector(s), 24
  collinear, 28
  length of, 25
Function(s)
  conditional extrema of, 573
  composite, 278, 545
  continuous on closed intervals, 283
  continuous on a domain, 537
  continuous at a point, 272, 536
  critical points of, 568
  derivatives of higher orders of, 558
  differentiable, 541
  differentials of higher orders of, 559
  discontinuities of, 280
  domain of, 247
  elementary, 275
  exponential, 587
  extrema of, 352, 566
  homogeneous of degree q, 93
  hyperbolic, 594
  implicit, 550
  infimum of, 286
  integrable, 460
  inverse trigonometric, 592
  limit of, 533
  linear combination of, 412
  logarithmic, 587
  Maclaurin's formula for, 564
  of one variable, 247
  of several variables, 305
  partial derivatives of, 538
  power, 587
  range of, 247
  representation of, 249-251
  smooth, 312
  supremum of, 286
  Taylor's formula for, 564
  trigonometric, 589
Fundamental theorems of calculus
  first, 469
  second, 470

Gaussian elimination, 61-63
General equation of a plane, 48

Homogeneous linear system(s)
  fundamental system of solutions of, 160
  properties of, 156-157
L'Hospital's rule, 345
Hyperboloid
  of one sheet, 96
  of two sheets, 97
Hyperboloid of revolution
  one-sheet, 96
  two-sheet, 97
Hyperbola, 71
  asymptotes of, 72
  branches of, 72
  conjugate, 77
  directrices of, 75
  eccentricity of, 75
  equation of, 86
  foci of, 74
  optical property of, 83
  properties of, 71-77
  standard Cartesian equation of, 71
  vertices of, 71

Implicit function, 550
Improper integral(s)
  Cauchy principal value of, 519
  Dirichlet's convergence test for, 518
  of unbounded functions, 520
  of unbounded nonnegative functions, 523
Indefinite integral(s), 410
  involving elementary functions, 412
  properties of, 411
Inequality
  Bernoulli's, 238
  Cauchy-Schwarz, 183
  triangle, 184
Infinitesimal(s), 258
  asymptotically equal, 290
  comparison of, 288
  equivalent, 289
  properties of, 259
Infinity, 262
Integral(s)
  definite, 459
  elliptic, 436
  improper, 507
  indefinite, 410
  involving irrational functions, 435
  involving trigonometric functions, 445
  of nonnegative functions, 511
Integration
  numerical, 498
  by parts, 417, 475
  by substitution, 414, 472
  of partial fractions, 429
  of rational functions, 424
Interval
  closed, 234
  half-closed, 234
  half-open, 234
  infinite, 234
  open, 234
Inverse functions, 324
  differentiation of, 324
Irrational functions, 435

Kronecker-Capelli theorem, 145

Landau symbols, 293
Limit(s) of a function
  Cauchy criterion for, 251
  geometric interpretation of, 251
  left-hand, 270
  operations on, 266
  right-hand, 270
  sequential criterion for, 253
Limit of a sequence of complex numbers, 301
Limit theorems, 254-256
Line(s), 48
  equation of, 52
  direction numbers of, 55
  direction vector of, 55
  general equation of, 51, 56
  imaginary, 86
  intercept form of equation of, 53
  normal coordinate equation of, 51
  normal vector equation of, 51
  slope intercept form of
  parametric vector equation of, 55
  point direction equation of, 56
Linear mapping(s), 192
  examples of, 192-193
  image of, 193
  kernel of, 196
  nullity of, 196
  operations on, 196-197
  product of, 196
  rank of, 194
  sum of, 196
Linear operator(s), 197
  adjoint operator of, 209
  characteristic polynomial of, 205
  eigenvalues of, 205
  eigenvectors of, 205
  inverse of, 198
  matrix of, 200
  multiplication of, 197
  symmetric (self-adjoint), 211
Linear space(s), 168
  additive inverse of a vector in, 168
  basis of, 175
  dimension of, 178
  examples of, 168-170
  linear subspace in, 170
  linearly dependent vectors in, 174
  linearly independent vectors in, 174
  n-dimensional, 180
  properties of, 170
  vectors in, 168
  zero vector in, 168
Linear span(s), 172
  examples of, 173-174
  properties of, 173
Linear subspace(s), 170
  direct sum of, 172
  examples of, 170-171
  intersection of, 172
  orthocompliment of, 189
  sum of, 171
Logical connectives, 236-237

Maclaurin's formula, 389, 564
  for elementary functions, 390
Mapping(s)
  identity, 193
  linear, 192
Mathematical induction, 238
  method of, 238
  principle of, 238
Matrix(ces), 103
  addition of, 105
  base column of, 140
  base minor of, 140
  base row of, 140
  elementary, 119
  elementary operations on, 113-122
  identity, 104
  inverse of, 134
  minor of, 125
  multiplication of, 105
  nonsingular, 133
  of linear operator, 200
  of schematic form, 116
  principal diagonal of, 104
  product of, 105
  rank of, 140
  square, 122
  transition, 182
  transpose of, 110
  transposition of, 110
  triangular, 124
  unit, 104
  zero, 124
Method of simple iteration, 163-165
Methods of solving systems of linear equations
  direct, 161
  indirect, 161
  iterative, 161
Mixed product of vectors, 43
Multiplication of matrices, 106-109
  laws of, 108

Newton-Leibniz theorem, 471
Number(s)
  irrational, 231
  natural, 230
  rational, 230
  real, 230
Number line, 234

Octant, 14
Operator
  adjoint, 209
  linear, 197
  symmetric, 211
Ordinate, 14
Orthocompliment of linear subspace, 189
  properties of, 189-190
Orthonormal basis, 31

Parabola, 77
  axis of, 78
  directrix of, 78
  focal parameter of, 78
  focus of, 78
  optical property of, 83
  properties of, 77-81
  standard Cartesian equation of, 77
  tangent to, 80
  vertex of, 77
Parabolic approximation, 498
Paraboloid
  elliptic, 98
  hyperbolic, 98
  of revolution, 97
Partial derivative(s), 538
  geometric interpretation of, 540
  mixed, 559
Partial fraction, 425
Plane, 47
  general equation of, 48
  normal coordinate equation of, 49
  normal vector equation of, 47
Plane curve
  equation of, 66
Point
  boundary, 531
  critical (stationary), 568
  deleted a-neighbourhood of, 235
  a-neighbourhood of, 235
  of inflection, 362
  of local maximum, 353
  of local minimum, 353
  of strict maximum, 354
  of strict minimum, 354
  regular (nonsingular), 555
  singular, 555
Polar axis, 18
Polar coordinate system, 18
Polar coordinates
  polar angle of, 18
  pole of, 18
Polar radius, 18
Position vector, 31

Quadratic form, 213
  associated matrix of, 213
  bilinear form associated with, 214
  diagonalization of, 219
  law of inertia for, 221
  positive-definite, 219
Quantifier
  existential, 236
  universal, 236

Rational function, 424
  proper, 425
  real, 425
Reflection of axes of coordinates, 65
Relations between Cartesian and polar coordinates, 18
Relations between infinitesimals and infinities, 265
Rotation of axes of coordinates, 64
Row-vector(s), 103
  linear combination of, 112
  linear dependence of, 112
  linearly independent, 112
  nontrivial linear combination of, 112
  trivial linear combination of, 112

Scalar product, 34
  basic properties of, 34-36
Scalar square of a vector, 36
Scalar triple product, 43
  geometric interpretation of, 43
Second-order determinant, 19
Sequence of numbers, 239
  bounded, 241
  bounded above, 241
  bounded below, 241
  Cauchy convergence criterion of, 241
  infinitely large, 242
  limit of, 239
  stationary, 240
Set(s), 229
  bounded, 235
  bounded above, 235
  bounded below, 235
  connected, 531
  countable, 230
  disjoint, 230
  empty, 230
  equal, 229
  equivalent, 230
  finite, 230
  greatest lower bound of, 236
  infimum of, 236
  infinite, 230
  least upper bound of, 236
  lower bound of, 236
  one-to-one correspondence between, 230
  open, 531
  operations on, 230
  proper subset of, 229
  supremum of, 235
  unbounded above, 235
  unbounded below, 235
  unconnected, 531
  upper bound of, 235
Significant digit, 233
  accurate, 233
Simpson's approximation, 502
Space
  n-dimensional real coordinate, 169
  Euclidean, 183
  linear, 168
  linear complex, 168
  linear real, 168
  unitary, 191
  vector, 168
Standard Cartesian coordinate system, 67
Subset, 229
  proper, 229
Surface(s)
  classification of, 90-95
  conic, 93
  cylindrical, 91
  equation of, 89
  of revolution, 90
  of the second order, 89-90
Surface(s) of the second order
  equation of, 90
  standard equations of, 95-102
Symmetric operator
  properties of, 212-213
System(s) of linear equations, 143
  augmented matrix of, 144
  coefficients of, 144
  coefficient matrix of, 144
  Cramer's rule for, 154
  method of Gaussian elimination for, 148-150
  quadratic, 154

Taylor's formula, 385, 564
  for functions, 386
  for polynomials, 385
Theorem
  Cauchy mean value, 343
  mean value, 341
  Newton-Leibniz, 471
  on construction of a linear mapping, 191
  Rolle's, 339
Third-order determinant, 20
  cofactor of, 22
  minor of, 21
Total differential
  geometric interpretation of, 557
Transition matrix, 182
  properties of, 182-183
Translation of axes of coordinates, 63
Trapezoidal approximation, 498
Triangle inequality, 184

Unit vector(s), 29
  as orthonormal basis, 31
Unitary space, 191

Vector(s)
  addition of, 26
  direction of, 557
  direction cosines of, 37
  equivalent, 29
  components of, 30
  coordinates of, 30
  coplanar, 43
  fixed, 24
  free, 24
  linear operations on, 26
  moving position, 48
  normal, 49
  orthogonal, 185
  orthonormal, 185
  position, 31
  scalar product of, 34
  scalar square of, 36
  sliding, 25
  sum of, 26
  unit, 29
Vector function, 396
  continuity of, 398
  differentiation of, 399
  limit of, 398
Vector product, 39
Vector space, 168
Vector triple product, 45

TO THE READER

Mir Publishers would be grateful for your comments on the content, translation and design of this book. We would also be pleased to receive any other suggestions you may wish to make.
Our address is:
Mir Publishers
2 Pervy Rizhsky Pereulok
I-110, GSP, Moscow, 129820
USSR
