
Applied Differential Equations

Gabriel Nagy and Tsvetanka Sendova

Mathematics Department,
Michigan State University,
East Lansing, MI, 48824.
gnagy@msu.edu
tsendova@msu.edu

April 3, 2020
Contents

Preface 1

Chapter 1. First Order Equations 3


1.1. Bacteria Reproduce Like Rabbits 4
1.1.1. The Difference Equation 4
1.1.2. Solving the Difference Equation 6
1.1.3. The Differential Equation 7
1.1.4. Solving the Differential Equation 8
1.1.5. Summary and Consistency 9
1.1.6. Exercises 13
1.2. Introduction to Modeling 15
1.2.1. The Linear Model 15
1.2.2. The Exponential Growth Equation 17
1.2.3. The Exponential Decay Equation 18
1.2.4. The Logistic Population Model 21
1.2.5. Interacting Species Model 24
1.2.6. Overview of Mathematical Models 25
1.2.7. Exercises 27
1.3. Qualitative Analysis 28
1.3.1. Existence and Uniqueness of Solutions 28
1.3.2. Direction Fields 29
1.3.3. Qualitative Solutions for Autonomous Equations 31
1.3.4. Exercises 36
1.4. Separable Equations 37
1.4.1. Separable Equations 37
1.4.2. Euler Homogeneous Equations 42
1.4.3. Solving Euler Homogeneous Equations 45
1.4.4. Exercises 47
1.5. Linear Equations 48
1.5.1. Constant Coefficients 48
1.5.2. Newton’s Cooling Law 50
1.5.3. Variable Coefficients 51
1.5.4. Mixing Problems 56
1.5.5. The Bernoulli Equation 61
1.5.6. Exercises 65
1.6. Approximate Solutions: Picard Iteration 67
1.6.1. Existence and Uniqueness of Solutions 67
1.6.2. The Picard Iteration 69
1.6.3. Picard vs Taylor Approximations 71
1.6.4. Picard Iteration: A Few More Examples 74

1.6.5. Linear vs Nonlinear Equations 79


1.6.6. Exercises 82

Chapter 2. Second Order Linear Equations 83


2.1. General Properties 84
2.1.1. Definitions and Examples 84
2.1.2. Conservation of the Energy 85
2.1.3. Existence and Uniqueness of Solutions 89
2.1.4. Properties of Homogeneous Equations 90
2.1.5. The Wronskian Function 93
2.1.6. Abel’s Theorem 94
2.1.7. Exercises 97
2.2. Homogeneous Constant Coefficients Equations 98
2.2.1. The Roots of the Characteristic Polynomial 98
2.2.2. Real Solutions for Complex Roots 102
2.2.3. Constructive Proof of Theorem 2.2.2 106
2.2.4. Note On the Repeated Root Case 107
2.2.5. Exercises 108
2.3. Nonhomogeneous Equations 109
2.3.1. The General Solution Formula 109
2.3.2. The Undetermined Coefficients Method 110
2.3.3. The Variation of Parameters Method 115
2.3.4. Exercises 119
2.4. Reduction of Order Methods 120
2.4.1. Special Second Order Equations 120
2.4.2. Conservation of the Energy 123
2.4.3. The Reduction of Order Method 127
2.4.4. Exercises 129

Chapter 3. The Laplace Transform Method 131


3.1. Introduction to the Laplace Transform 132
3.1.1. Overview of the Method 132
3.1.2. The Laplace Transform 133
3.1.3. Main Properties 136
3.1.4. Solving Differential Equations 139
3.1.5. Exercises 141
3.2. The Initial Value Problem 142
3.2.1. Solving Differential Equations 142
3.2.2. One-to-One Property 143
3.2.3. Partial Fractions 145
3.2.4. Higher Order IVP 150
3.2.5. Exercises 151
3.3. Discontinuous Sources 152
3.3.1. Step Functions 152
3.3.2. The Laplace Transform of Steps 153
3.3.3. Translation Identities 154
3.3.4. Solving Differential Equations 158
3.3.5. Exercises 162
3.4. Generalized Sources 163
3.4.1. Sequence of Functions and the Dirac Delta 163

3.4.2. Computations with the Dirac Delta 165


3.4.3. Applications of the Dirac Delta 167
3.4.4. The Impulse Response Function 168
3.4.5. Comments on Generalized Sources 170
3.4.6. Exercises 173
3.5. Convolutions and Solutions 174
3.5.1. Definition and Properties 174
3.5.2. The Laplace Transform 176
3.5.3. Solution Decomposition 178
3.5.4. Exercises 181

Chapter 4. Qualitative Aspects of Systems of ODE 183


4.1. Modeling with First Order Systems 184
4.1.1. Interacting Species Model 184
4.1.2. Predator-Prey System 186
4.1.3. Spring-Mass System as a First Order System 187
4.1.4. Exercises 189
4.2. Vector Fields and Qualitative Solutions 190
4.2.1. Equilibrium Solutions 190
4.2.2. Vector Fields and Direction Fields 191
4.2.3. Sketching Solutions 193
4.2.4. Exercises 197

Chapter 5. Overview of Linear Algebra 199


5.1. Linear Algebraic Systems 200
5.1.1. Systems of Algebraic Equations 200
5.1.2. The Row Picture 200
5.1.3. The Column Picture 203
5.1.4. The Matrix Picture 206
5.1.5. Exercises 211
5.2. Vector Spaces 212
5.2.1. Spaces and Subspaces 212
5.2.2. Linear Independence 214
5.2.3. Exercises 216
5.3. Matrix Algebra 217
5.3.1. Matrix Operations 217
5.3.2. The Inverse Matrix 222
5.3.3. Overview of Determinants 225
5.3.4. Exercises 228
5.4. Eigenvalues and Eigenvectors 229
5.4.1. Computing Eigenpairs 229
5.4.2. Eigenvalue Multiplicity 232
5.4.3. Exercises 236
5.5. Diagonalizable Matrices 237
5.5.1. Diagonalizable Matrices 237
5.5.2. Eigenvectors and Diagonalizability 238
5.5.3. The Case of Different Eigenvalues 240
5.5.4. Exercises 242
5.6. The Matrix Exponential 243
5.6.1. The Exponential Function 243

5.6.2. Diagonalizable Matrices Formula 244


5.6.3. Properties of the Exponential 246
5.6.4. Exercises 249

Chapter 6. Quantitative Aspects of Systems of ODE 251


6.1. Two-Dimensional Linear Systems 252
6.1.1. 2 × 2 Linear Systems 252
6.1.2. Diagonalizable Systems 253
6.1.3. The Case of Complex Eigenvalues 256
6.1.4. Non-Diagonalizable Systems 258
6.1.5. Exercises 261
6.2. Two-Dimensional Phase portraits 262
6.2.1. Review of Solutions Formulas 262
6.2.2. Real Distinct Eigenvalues 263
6.2.3. Complex Eigenvalues 265
6.2.4. Repeated Eigenvalues 268
6.2.5. The Stability of Linear Systems 269
6.2.6. Exercises 272
6.3. Two-Dimensional Nonlinear Systems 273
6.3.1. 2 × 2 Autonomous Systems 273
6.3.2. Critical Points and Linearization 273
6.3.3. Exercises 278
6.4. Applications of the Qualitative Analysis 279
6.4.1. Competing Species 279
6.4.2. Predator-Prey 283
6.4.3. Nonlinear Pendulum 287
6.4.4. Exercises 291

Chapter 7. Boundary Value Problems 293


7.1. Simple Eigenfunction Problems 294
7.1.1. Two-Point Boundary Value Problems 294
7.1.2. Comparison: IVP and BVP 295
7.1.3. Simple Eigenfunction Problems 297
7.1.4. Exercises 301
7.2. Overview of Fourier series 302
7.2.1. Fourier Expansion of Vectors 302
7.2.2. Fourier Expansion of Functions 305
7.2.3. Even or Odd Functions 310
7.2.4. Sine and Cosine Series 311
7.2.5. Applications 315
7.2.6. Exercises 316
7.3. The Homogeneous Heat Equation 317
7.3.1. The Heat Equation (in One-Space Dim) 317
7.3.2. The IBVP: Dirichlet Conditions 319
7.3.3. The IBVP: Neumann Conditions 322
7.3.4. Exercises 328

Chapter 8. Appendices 329


A. Overview of Complex Numbers 329
A.1. Extending the Real Numbers 331
A.2. The Imaginary Unit 331

A.3. Standard Notation 332


A.4. Useful Formulas 332
A.5. Complex Functions 335
A.6. Complex Vectors 336
Bibliography 339
Preface

These notes are an introduction to ordinary differential equations. We study a variety of


models in physics and engineering that use differential equations, and we learn how to con-
struct differential equations corresponding to different ecological or physical systems. In
addition to introducing various analytical techniques for solving basic types of ODEs, we
also study qualitative techniques for understanding the behavior of solutions. We describe
a collection of methods and techniques used to find solutions to several types of differential
equations, including first order scalar equations, second order linear equations, and systems
of linear equations. We introduce Laplace transform methods to solve constant coefficients
equations with generalized source functions. We also provide a brief introduction to bound-
ary value problems, Sturm-Liouville problems, and Fourier Series. Near the end of the
course we combine previous ideas to solve an initial boundary value problem for a particular
partial differential equation, the heat propagation equation.

CHAPTER 1

First Order Equations

This first chapter is an introduction to differential equations. We first show that differential
equations can be obtained from an appropriate limit of a difference equation. Next we show
how differential equations can be used to model the populations of different types of systems.
We focus on particular techniques—developed in the eighteenth and nineteenth centuries—
to solve certain first order differential equations. These equations include linear equations,
separable equations, Euler homogeneous equations, and exact equations. This way of
studying differential equations, however, soon reached a dead end: most differential equations
cannot be solved by any of the techniques presented in the first sections of this chapter.
Then people tried something different. Instead of solving the equations, they tried to show
whether an equation has solutions or not, and what properties such solutions may have. This
is less information than obtaining the solution, but it is better than giving up. The results of
these efforts are shown in the last sections of this chapter. We present theorems describing
the existence and uniqueness of solutions to a wide class of first order differential equations.


1.1. Bacteria Reproduce Like Rabbits


A differential equation is an equation where the unknown is a function and at least one of its
derivatives appears in the equation. Differential equations are essential for a mathematical
description of nature—they lie at the core of many physical theories. In this section we
show that differential equations can be obtained as a certain limit of difference equations.
We focus on a specific problem—a quantitative description of bacteria growth having
unlimited space and food. We first measure the bacteria population at fixed time intervals,
then we repeat the measurements at shorter and shorter time intervals. We write our
measurements in a difference equation for a discrete time interval variable. We solve this
difference equation, obtaining the bacteria population as a function of the initial population
and the number of time intervals elapsed since the start of the experiment. We then compute
a very particular limit of the difference equation, called the continuum limit. In this limit
the time interval goes to zero and the number of time intervals goes to infinity so that their
product remains constant. We will see that the continuum limit of the difference equation
in this section is a differential equation, called the population growth differential equation.

1.1.1. The Difference Equation. We want to know how bacteria grow in time when
they have unlimited space and food. To obtain such an equation we observe—very carefully—
how the bacteria grow. We put an initial amount of bacteria in a small region at the
center of a petri dish, which is full of bacteria nutrients. In this way the bacteria population
has unlimited space and food to grow for a certain time. The bacteria population is then
proportional to the area in the petri dish covered in bacteria. With this setting we will
perform several experiments in which we measure the bacteria population after regular time
intervals.

Figure 1. Bacteria growth experiment with unlimited food and space.

First Experiment:
• fix the time interval between measurements to be ∆t1 = 1 hour,
• denote the bacteria population after n time intervals as P(n∆t1) = P(n),
• introduce the initial bacteria population P(0).
Our first measurement is P (1), the bacteria population after 1 hour. It is convenient to
write P (1) as follows
P (1) = P (0) + ∆P1

where ∆P1 is what we actually have measured, and it is the increment in bacteria population.
In the same way we can write our first n measurements,
P (1) = P (0) + ∆P1 ,
P (2) = P (1) + ∆P2 ,
..
.
P(n) = P(n − 1) + ∆Pn, (1.1.1)
where ∆Pj , for j = 1, · · · , n, is the increment in bacteria population at the measurement
j relative to the measurement j − 1. If you actually do the experiment—and if you look
at the ∆Pn carefully enough—you will find the following: The increment in
the bacteria population ∆Pn is not random, but it follows the rule
∆Pn = K1 P (n − 1), (1.1.2)
where K1 depends on the type of bacteria and on the fact that we are measuring every ∆t1 = 1
hour. This last equation means that the growth of the bacteria population is proportional
to the existing bacteria population. We use Eq. (1.1.2) in Eq. (1.1.1) and we get the formula
P (n) = P (n − 1) + K1 P (n − 1), n = 1, 2, · · · , N, (1.1.3)
where N is the last time we measure, probably when the bacteria population fills the whole
petri dish. This is the end of our first experiment.

Second Experiment: We reduce the time interval ∆t when we take measurements. Now
∆t2 = 30 minutes, that is, ∆t2 = 1/2 hours. Since ∆t2 is no longer 1, we need to include it
in the argument of P . If you carry out the experiment, you will find that Eq. (1.1.3) still
holds for this case if we introduce ∆t2 = ∆t1 /2 as follows,
P (n∆t2 ) = P ((n − 1)∆t2 ) + K2 P ((n − 1)∆t2 ), n = 1, 2, · · · , N. (1.1.4)
In this experiment we have to measure the new constant K2 . You will find that K2 = K1 /2.
This is reasonable: in 30 minutes the bacteria population grows half of what it grows in one hour.
This is the end of our second experiment.

m-th Experiment: We now carry out many more similar experiments. For the m-th
experiment we use a time interval ∆tm = ∆t1 /m, where ∆t1 = 1 hour. If you carry out all
these experiments, you will find the following relation,
P (n∆tm ) = P ((n − 1)∆tm ) + Km P ((n − 1)∆tm ), n = 1, 2, · · · , N, (1.1.5)
where Km = K1 /m.

By looking at all our experiments, we can see that the constant Km is in fact proportional to
the time interval ∆tm used in the experiment, and the proportionality constant is the same
for all experiments. Indeed,
Km = K1/m   ⇒   Km = (K1/∆t1) (∆t1/m)   ⇒   Km = r ∆tm,
where the constant r = K1 /∆t1 depends only on the type of bacteria we are working with.
Since the constant Km in any of the experiments above is proportional to the time interval
∆tm used in each experiment, we can simplify the notation and discard the subindex m,
K = r ∆t.

Then, the final conclusion of all our experiments is the following: the population of bacteria
after n time intervals ∆t > 0 is given by the equation

P (n∆t) = P ((n − 1)∆t) + r ∆t P ((n − 1)∆t), (1.1.6)

where r is a constant that depends on the type of bacteria studied and n = 1, 2, · · · . This
equation is a difference equation, because the argument of the population function takes
discrete values. We call equation (1.1.6) the discrete population growth equation. The
physical meaning of the constant r follows from the equation above,

r = (∆P/∆t) (1/P),
where ∆P = P (n∆t) − P ((n − 1)∆t) and P = P ((n − 1)∆t). So r is the rate of change in
time of the bacteria population per bacteria, that is, a relative rate of change.

1.1.2. Solving the Difference Equation. The difference equation (1.1.6) relates
the bacteria population after n time intervals, P (n∆t), with the bacteria population at
the previous time interval, P ((n − 1)∆t). To solve a difference equation means to find
the bacteria population after n time intervals, P(n∆t), in terms of the initial bacteria
population, P (0). The difference equation above can be solved, and the result is in the
following statement.

Theorem 1.1.1. The difference equation

P (n∆t) = P ((n − 1)∆t) + r ∆t P ((n − 1)∆t),

relating P (n∆t) with P ((n − 1)∆t) has the solution

P (n∆t) = (1 + r ∆t)n P (0), (1.1.7)

relating P (n∆t) with P (0).

Proof of Theorem 1.1.1: Eq. (1.1.6) can be rewritten as

P (n∆t) = (1 + r ∆t) P ((n − 1)∆t),

but we can also rewrite the expression for P ((n − 1)∆t) in a similar way,

P ((n − 1)∆t) = (1 + r ∆t) P ((n − 2)∆t),

and so on till we reach P (0). Therefore,

P (n∆t) = (1 + r ∆t) P ((n − 1)∆t)


= (1 + r ∆t)2 P ((n − 2)∆t)
..
.
= (1 + r ∆t)n P (0).

So, we have solved the discrete equation for population growth, and the solution is

P (n∆t) = (1 + r ∆t)n P (0).

This establishes the Theorem. 
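The formula in Theorem 1.1.1 can also be checked numerically: iterating the difference equation step by step and evaluating the closed form (1 + r ∆t)^n P(0) give the same numbers. The short Python sketch below does this check; the values of r, ∆t, and P(0) are arbitrary choices made only for illustration.

```python
# Check Theorem 1.1.1 numerically: iterate the difference equation
# P(n*dt) = (1 + r*dt) * P((n-1)*dt) and compare with (1 + r*dt)**n * P0.
# The constants below are illustrative choices, not taken from the text.

r, dt, P0 = 0.5, 1.0, 100.0   # growth rate per unit time, time step, initial population
N = 10                        # number of time steps

P = P0
for n in range(1, N + 1):
    P = (1 + r * dt) * P                  # one step of the difference equation
    closed_form = (1 + r * dt) ** n * P0  # solution from Theorem 1.1.1
    print(n, P, closed_form)              # the two columns agree
```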



1.1.3. The Differential Equation. We want to know what happens to the difference
equation (1.1.6) and its solutions (1.1.7) in the continuum limit:

∆t → 0, n ∆t = t > 0 is constant.

We call it the continuum limit because ∆t → 0, so we look more and more often at the
bacteria population, and then n → ∞, since we are making more and more observations.
Rather than doing an experiment to find out what happens, we work directly with the
discrete equation that models our bacteria population.

Theorem 1.1.2. The continuum limit of the discrete equation

P (n∆t) = P ((n − 1)∆t) + r ∆t P ((n − 1)∆t),

is the differential equation


P′(t) = r P(t). (1.1.8)

Remark: The equation (1.1.8) is a differential equation because both P and P′ appear in
the equation. It is called the exponential growth differential equation because its solutions
are exponentials that increase with time.
Proof of Theorem 1.1.2: We start renaming n as n + 1, then Eq. (1.1.6) has the form

P((n + 1)∆t) = P(n∆t) + r ∆t P(n∆t).

From here it is simple to see that

P(n∆t + ∆t) − P(n∆t) = r ∆t P(n∆t).

We now use that n ∆t = t, then the equation above becomes

P(t + ∆t) − P(t) = r ∆t P(t).

Dividing by ∆t we get

[P(t + ∆t) − P(t)]/∆t = r P(t).

The continuum limit is given by ∆t → 0 and n → ∞ such that n ∆t = t is constant. For
each choice of t we have a particular limit. So we take such a limit in the equation above,

lim_{∆t→0} [P(t + ∆t) − P(t)]/∆t = r P(t).

Since t is held constant and ∆t → 0, the left-hand side above is the derivative of P with
respect to t,

P′(t) = lim_{∆t→0} [P(t + ∆t) − P(t)]/∆t.

So we get the differential equation

P′(t) = r P(t).

This establishes the Theorem. 



1.1.4. Solving the Differential Equation. We now find all solutions to the expo-
nential growth differential equation in (1.1.8). By a solution we mean a function P that
depends on time such that its derivative is r times the function itself.

Theorem 1.1.3. All the solutions of the differential equation P′(t) = r P(t) are
P(t) = P0 e^{rt}, (1.1.9)
where P0 is a constant.

Remark: The constant P0 in (1.1.9) is the initial population, P (0) = P0 .


Proof of Theorem 1.1.3: To find all solutions we start dividing the equation by P,

P′(t)/P(t) = r.

We now integrate both sides with respect to time,

∫ [P′(t)/P(t)] dt = ∫ r dt.

The integral on the right-hand side is simple to do, we need to integrate a constant,

∫ [P′(t)/P(t)] dt = rt + c0,

where c0 is an arbitrary constant. On the left-hand side we can introduce a substitution,

p = P(t)   ⇒   dp = P′(t) dt.

Then the equation above becomes

∫ dp/p = rt + c0.

The integral above is simple to do and the result is

ln |p| = rt + c0.

We now replace back p = P(t), and we can solve for P,

ln |P(t)| = rt + c0   ⇒   |P(t)| = e^{rt+c0} = e^{rt} e^{c0}   ⇒   P(t) = (±e^{c0}) e^{rt}.

We denote c = (±e^{c0}), then all the solutions to the exponential growth equation are

P(t) = c e^{rt},   c ∈ R.

The constant c is the initial population. Indeed, given an initial population P0, called an
initial condition, it fixes the constant c, because

P0 = P(0) = c e^0 = c   ⇒   c = P0.

Then the solution of the differential equation with an initial population P0 is

P(t) = P0 e^{rt}.
This establishes the Theorem. 

Remark: We see that the solution of the differential equation is an exponential, which is
the origin of the name for the differential equation.

1.1.5. Summary and Consistency. By carefully observing how bacteria grow when
they have unlimited space and food we came up with a difference equation, Eq. (1.1.6). We
were able to solve this difference equation and the result was Eq. (1.1.7). We then studied
what happened with the difference equation in the continuum limit—we looked at the bacteria
at infinitely short time intervals. The result is a differential equation, the exponential growth
differential equation (1.1.8). Recalling calculus ideas we were able to find all solutions of
this differential equation, given in Eq. (1.1.9). We can summarize all this as follows,
Discrete description                              ∆t → 0               Continuous description
P(n∆t) = (1 + r ∆t) P((n − 1)∆t)                    −→                  P′(t) = r P(t)
        ↓ solving the equation                                                  ↓ solving the equation
P(n∆t) = (1 + r ∆t)^n P0                            −→ (consistency)    P(t) = P0 e^{rt}
We are now going to show the consistency of the solutions. We have a solution of the
discrete equation, we have a solution of the continuum equation, and now we show that the
continuum limit of the former is the latter.

Theorem 1.1.4 (Consistency). The continuum limit of the solution of the difference equa-
tion is the solution of the differential equation,
P(n∆t) = (1 + r ∆t)^n P0   −→   P(t) = P0 e^{rt}.

Proof of Theorem 1.1.4: We start with the discrete solution given in Eq. (1.1.7),

P(n∆t) = (1 + r ∆t)^n P0, (1.1.10)

and we recall that t = n∆t, hence ∆t = t/n. So we write

P(t) = (1 + rt/n)^n P0.

Now we need to study the limit of the expression above as n → ∞ while t is constant in
that limit. This is a good time to remember that the Euler number

e = lim_{n→∞} (1 + 1/n)^n

satisfies

e^x = lim_{n→∞} (1 + x/n)^n.

Using the formula above for x = rt we get

lim_{n→∞} (1 + rt/n)^n = e^{rt}.

With all this we can write the continuum limit as

P(t) = lim_{n→∞} (1 + rt/n)^n P0 = e^{rt} P0.

But the function

P(t) = P0 e^{rt}
is the solution of the differential equation obtained using methods from calculus. This
establishes the Theorem. 
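The continuum limit in Theorem 1.1.4 can also be observed numerically: fixing t and letting ∆t = t/n shrink, the discrete value (1 + r ∆t)^n P0 approaches P0 e^{rt}. The Python sketch below illustrates this; the values of r, t, and P0 are arbitrary illustration choices.

```python
# Illustrate the continuum limit: with t fixed and dt = t/n, the discrete
# solution (1 + r*dt)**n * P0 approaches the continuum solution P0*exp(r*t)
# as n grows. The constants are illustrative choices only.
from math import exp

r, t, P0 = 2.0, 1.0, 3.0
for n in (1, 10, 100, 1000, 10000):
    dt = t / n
    discrete = (1 + r * dt) ** n * P0   # solution of the difference equation at time t
    continuum = P0 * exp(r * t)         # solution of the differential equation at time t
    print(n, discrete, continuum)       # the gap shrinks as n increases
```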

In the following examples we provide a table with data from different physical systems.
Then we find the difference equation that describes such data and its solution. After that
we compute the continuum limit, which gives us the differential equation for that system.
We finally solve the differential equation.

Example 1.1.1. The population of bees in a state, given in thousands, is given by

Year        | 2000 | 2002 | 2004 | 2006 | 2008 | 2010
Population  |    2 |   10 |   50 |  250 | 1250 | 6250

Model these data using an exponential growth model, denoting by P (t) the bee population
in thousands at time t, with time in years since the year 2000. For example, for the year
2008, the variable t is 8. Consider a discrete model for the data in the table above given by
P ((n + 1)∆t) = P (n∆t) + k ∆t P (n∆t).
(1) Determine the growth-rate coefficient k using the data for the years 2000 and 2002.
(2) Determine the growth-rate coefficient k again, this time using the data for the
years 2008 and 2010.
(3) Use the value of k found above to write the discrete equation describing the bee
population. Write ∆t instead of the time interval in the table.
(4) Solve the discrete equation for the bee population.
(5) Find the continuum differential equation satisfied by the bee population and write
the initial condition for this equation.
(6) Find all solutions of the continuum equation found in part (5).

Solution:
(1) The growth coefficient computed using the years 2000 and 2002 is
k = [(10 − 2)/(2002 − 2000)] (1/2)   ⇒   k = 2.

(2) The growth coefficient computed using the years 2008 and 2010 is
k = [(6250 − 1250)/(2010 − 2008)] (1/1250)   ⇒   k = 2.

(3) We now use k = 2 and ∆t arbitrary to write the discrete equation that describes
the data in the table. We denote
P_{n+1} = P((n + 1)∆t),   P_n = P(n∆t),
then the discrete equation is
P_{n+1} = P_n + k ∆t P_n,
which is analogous to Eq. (1.1.6).
(4) Since
P_n = (1 + k ∆t) P_{n−1}   and   P_{n−1} = (1 + k ∆t) P_{n−2}   ⇒   P_n = (1 + k ∆t)^2 P_{n−2},
repeating this argument till we reach P0 we get
P_n = (1 + k ∆t)^n P0.

(5) The continuum equation is obtained from the discrete equation by taking the continuum limit:

∆t → 0,   n → ∞   such that   n ∆t = t ∈ R.

Using the discrete equation in (3) we get

P_{n+1} − P_n = k ∆t P_n   ⇒   (P_{n+1} − P_n)/∆t = k P_n.

If we write what P_{n+1} and P_n actually are, we get

[P(n∆t + ∆t) − P(n∆t)]/∆t = k P(n∆t).

Since n ∆t = t, we replace it above,

[P(t + ∆t) − P(t)]/∆t = k P(t).

Since ∆t → 0 we get the continuum equation

P′(t) = k P(t),

with initial condition P(0) = 2, the bee population in thousands in the year 2000.
(6) To solve the continuum equation we rewrite it as follows,

P′/P = k   ⇒   ∫ [P′(t)/P(t)] dt = ∫ k dt   ⇒   ln(|P|) = kt + c0,

where c0 ∈ R is an arbitrary integration constant, and we used that ln(|P|)′ = P′/P. Then,

P(t) = ±e^{kt+c0} = ±e^{c0} e^{kt},   denote c1 = ±e^{c0}   ⇒   P(t) = c1 e^{kt},   c1 ∈ R.

The constant c1 is determined by the initial population P(0) = P0. Indeed

P0 = P(0) = c1 e^0 = c1   ⇒   c1 = P0,

therefore we get that

P(t) = P0 e^{kt}.
C
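A minimal Python sketch of the computations in this example is shown below, assuming nothing beyond the table itself: it recomputes k from consecutive table entries and then compares the discrete solution, which reproduces the table exactly, with the continuum solution P0 e^{kt}; the two agree only when k ∆t is small, which is the comparison suggested in Exercise 1.1.1 (7).

```python
# Recompute the growth coefficient k of Example 1.1.1 from the table, then
# compare the discrete solution with the continuum solution P(t) = P0*exp(k*t).
from math import exp

years = [2000, 2002, 2004, 2006, 2008, 2010]
pop = [2, 10, 50, 250, 1250, 6250]      # bee population in thousands
P0, dt = pop[0], 2                      # initial population and table time step

# k from the discrete model P(t+dt) = P(t) + k*dt*P(t), i.e. k = (dP/dt)/P.
for i in range(len(pop) - 1):
    k = (pop[i + 1] - pop[i]) / (dt * pop[i])
    print("k from", years[i], "to", years[i + 1], "is", k)   # always 2

k = 2.0
for n, P_data in enumerate(pop):
    t = n * dt
    discrete = (1 + k * dt) ** n * P0    # reproduces the table exactly
    continuum = P0 * exp(k * t)          # differs, since k*dt = 4 is not small
    print(t, P_data, discrete, continuum)
```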

Example 1.1.2. A bacteria population increases by a factor (1 + 8 ∆t) in a time period


∆t. Every ∆t we harvest an amount of bacteria 20 ∆t.
(a) Write the discrete equation that relates the bacteria population at (n + 1) ∆t with
the bacteria population at n ∆t.
(b) Find the continuum limit in the discrete equation found in part (a) above. Recall
that the continuum limit is n → ∞ and ∆t → 0 so that n ∆t = t is constant in that
limit.
(c) Solve the differential equation in part (b) in the case there is an initial population of
100 bacteria.

Solution:
(a) We know that the bacteria population P increases by a factor (1 + 8 ∆t) during the time
interval ∆t. If we forget that we harvest bacteria, then after (n + 1) time intervals
the bacteria population is
P ((n + 1)∆t) = (1 + 8 ∆t) P (n ∆t).

The equation above does not include the fact that we harvest 20 ∆t bacteria every time
interval ∆t. If we include this fact we get the equation
P ((n + 1)∆t) = (1 + 8 ∆t) P (n ∆t) − 20 ∆t,
where the negative sign in the last term indicates we reduce the population by that
amount when we harvest.
(b) The continuum limit is computed as follows: If we see a product n ∆t, we replace it by
t, that is,
P (n ∆t + ∆t) = (1 + 8 ∆t) P (n ∆t) − 20 ∆t ⇔ P (t + ∆t) = (1 + 8 ∆t) P (t) − 20 ∆t.
We now reorder terms such that we get an incremental quotient on the left-hand side,
P(t + ∆t) − P(t) = 8 ∆t P(t) − 20 ∆t   ⇒   [P(t + ∆t) − P(t)]/∆t = 8 P(t) − 20.
Now we take the limit ∆t → 0 keeping t constant, and we get the continuum equation
P′(t) = 8 P(t) − 20.
(c) We now need to solve the differential equation above. We do the same calculation we
did in the case of zero harvesting.

P′(t) = 8 P(t) − 20   ⇒   P′(t)/(8 P(t) − 20) = 1   ⇒   ∫ [P′(t)/(8 P(t) − 20)] dt = ∫ dt.

On the left-hand side above we substitute u = 8 P(t) − 20, so du = 8 P′(t) dt. Then,

(1/8) ∫ du/u = ∫ dt   ⇒   (1/8) ln |u| = t + c1   ⇒   ln |8 P(t) − 20| = 8t + 8c1.

We compute the exponential of both sides,

|8 P(t) − 20| = e^{8t+8c1} = e^{8t} e^{8c1}   ⇒   8 P(t) − 20 = (±e^{8c1}) e^{8t}.

If we call c2 = (±e^{8c1}), we get that

8 P(t) − 20 = c2 e^{8t}   ⇒   P(t) = (c2/8) e^{8t} + 20/8,

and again relabeling the constant c = c2/8 we get that

P(t) = c e^{8t} + 5/2.

We know that at time t = 0 we have P(0) = 100 bacteria, which fixes the constant c,
because

100 = P(0) = c e^0 + 5/2 = c + 5/2   ⇒   c = 100 − 5/2 = 195/2.

So the continuum formula for the bacteria population is

P(t) = (195/2) e^{8t} + 5/2.
C
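As a sanity check on this formula, the Python sketch below iterates the harvested difference equation with a small time step and compares the result with the continuum solution (195/2) e^{8t} + 5/2; the step size and final time are illustrative choices.

```python
# Compare the harvested difference equation of Example 1.1.2,
#   P((n+1)*dt) = (1 + 8*dt) * P(n*dt) - 20*dt,
# with the continuum solution P(t) = (195/2)*exp(8*t) + 5/2.
from math import exp

dt = 1e-4              # small time step (illustrative choice)
steps = 5000           # number of steps, so the final time is t = 0.5
t_final = steps * dt

P = 100.0              # initial population
for _ in range(steps):
    P = (1 + 8 * dt) * P - 20 * dt

exact = (195 / 2) * exp(8 * t_final) + 5 / 2
print(P, exact)        # the two values are close for small dt
```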

1.1.6. Exercises.

1.1.1.- The fish population in a lake, given in hundred thousands, is given by

Year        | 2000 | 2001 | 2002 |  2003  |   2004  |   2005
Population  |    3 |  4.5 | 6.75 | 10.125 | 15.1875 | 22.78125

Model these data using an exponential growth model, denoting by P (t) the fish population
in hundred thousands at time t, with time in years since the year 2000. For example, for the
year 2005, the variable t is 5. Consider a discrete model for the data in the table above given
by

P ((n + 1)∆t) = P (n∆t) + k ∆t P (n∆t).

(1) Determine the growth-rate coefficient k using the data for the years 2000 and 2001.
(2) Determine the growth-rate coefficient k again, this time using the data for the years 2004
and 2005.
(3) Use the value of k found above to write the discrete equation describing the fish popula-
tion. Write ∆t instead of the time interval in the table.
(4) Solve the discrete equation for the fish population.
(5) Find the continuum differential equation satisfied by the fish population and write the
initial condition for this equation.
(6) Solve the initial value problem found in the previous part.
(7) Use a computer to compare the solutions to the discrete equation (with any ∆t ≠ 0) and the
continuum equation for the fish population.

1.1.2.- A bacteria population increases by a factor r in a time period ∆t. Every ∆t we harvest an
amount of bacteria Ph ∆t, where Ph is a fixed constant.

(a) Write the discrete equation that relates the bacteria population at (n + 1) ∆t with the
bacteria population at n ∆t. This equation is similar, but not equal, to Eq. (1.1.6) above.
(b) Find the continuum limit in the discrete equation found in part (a) above. Recall that
the continuum limit is n → ∞ and ∆t → 0 so that n ∆t = t is constant in that limit.
Denote by P (t) the bacteria population at the time t.
(c) Solve the differential equation in part (b) in the case there is an initial population of P0
bacteria.
(d) The solution of the continuum differential equation in part (b) above also holds in the
case that the initial population of bacteria is smaller than Ph /r. So, consider the case
where P0 < Ph /r and find the time t1 such that the bacteria population vanishes.

1.1.3.- The amount of a radioactive material decreases by a factor r = 1/2 in a time period ∆t.

(a) Write the discrete equation that relates the amount of radioactive material at (n +
1) ∆t with the radioactive material at n ∆t. This equation is similar, but not equal,
to Eq. (1.1.6) above.
(b) What is the main difference between a radioactive decay system and a bacteria population
system?
(c) Take the continuum limit in the discrete equation found in part (a) above. Recall that
the continuum limit is n → ∞ and ∆t → 0 so that n ∆t = t is constant in that limit.
Denote by N (t) the amount of radioactive material at the time t. You should obtain the
radioactive decay differential equation.

(d) Solve the radioactive decay differential equation. Denote by N0 the initial amount of the
radioactive material.
(e) The half-life of a radioactive material is the time τ such that N(τ) = N(0)/2. Find the
half-life of the radioactive material in this problem. Find an equation relating the half-life τ
with the radioactive decay constant r.

1.2. Introduction to Modeling


If a system changes in time, chances are that such a system can be described by a differential
equation. In this section we present a few examples of differential equations, starting with
the simplest differential equation there is—the equation in a linear model. Next we focus on
exponential growth models, which describe population systems—how the population changes
in time under different conditions. We review the bacteria population models introduced
in the previous section—bacteria with unlimited space and food. We also show a deeply
related model, the exponential decay models, which describe the behavior of radioactive
material. After that we study the logistic equation, which describes population with finite
food as well as the behavior of certain infectious diseases. We end this section with a brief
summary of what it takes to construct a mathematical model.

1.2.1. The Linear Model. In § 1.1 we studied bacteria growth. We saw that when
the bacteria have unlimited food and space they grow exponentially in time. In this section we
study a much simpler system—a system that grows linearly in time.

Example 1.2.1 (Discrete Linear Model). Consider a tub with 1000 liters of water. Use
an 8 liter bucket to take the water out of the tub at a rate of 8 liters per second. Write a
mathematical model describing the amount of water in the tub after n > 0 seconds.
Solution: Let’s introduce W (n), the amount of water in liters after n > 0 seconds. Initially
we have 1000 liters,
W (0) = 1000.
One, two, and n seconds later we have
W (1) = 1000 − 8 = 992
W (2) = 992 − 8 = 984
..
.
W (n) = W (n − 1) − 8.
If ∆W = 8 liters, then the equation for the water in the tub after n seconds is
W (n) = W (n − 1) − ∆W.
This is a difference equation, relating the amount of water after n seconds, W (n), with
the amount of water a second earlier, W (n − 1). A solution to a difference equation is an
equation relating the amount of water after n seconds, W (n), with the initial amount of
water, W (0). The solution is simple to find in this case,
W (n) = W (n − 1) − ∆W = W (n − 2) − 2 ∆W = · · · ⇒ W (n) = W (0) − n ∆W.
In this particular example we get
W (n) = 1000 − 8n.
The solution is a linear function of the number of seconds, n, hence the name linear model.
C
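A tiny Python sketch of this linear model is shown below, only to illustrate the formula: it evaluates W(n) = 1000 − 8n at a few times and computes when the tub is empty; the sampled times are an arbitrary choice.

```python
# The linear model of Example 1.2.1: W(n) = 1000 - 8*n liters after n seconds.
for n in (0, 25, 50, 75, 100, 125):
    print(n, 1000 - 8 * n)

# The tub is empty when 1000 - 8*n = 0, that is after n = 125 seconds.
print(1000 / 8)
```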

In § 1.1 we discussed in detail the continuum limit—how to obtain a differential equation


from a difference equation. In this section we do not go over that limit again. Instead, we
show a new problem that is described by a differential equation.

Example 1.2.2 (Continuum Linear Model). Consider a tub with 1000 liters of water. Use
a hose to drain the water from the tub at a constant rate of 8 liters per second. Write a
mathematical model describing the amount of water in the tub after t > 0 seconds.
Solution: We introduce W (t), the amount of water in the tub at the time t. The rate of
change in time of the water in the tub is described by W′(t) = dW/dt, the time derivative
of W . Since the tub is drained, this derivative is negative. Since the tub is drained at a
constant rate, this derivative is constant. So the differential equation describing the water
in the tub is
W′(t) = −8.
This equation is trivial, in the sense that the function W does not appear in the equation.
All the solutions to the equation above are
W (t) = −8t + c,
where c is any real constant. We are interested in only one solution, the one that satisfies
the initial condition
W (0) = 1000.
This condition determines the constant c, since
1000 = W(0) = −8 · 0 + c = c   ⇒   c = 1000.
So, the solution of the differential equation that describes the water in the tub is
W (t) = −8t + 1000.
C

Remark: If water gets into the tub, W′ > 0; if the water gets out of the tub, W′ < 0.
Here is a summary of the results in the examples above.

Theorem 1.2.1 (Linear Model). The linear model equation for the function W , with inde-
pendent variable t,
W′(t) = r, (1.2.1)
where r is a constant, has infinitely many solutions
W(t) = rt + c, (1.2.2)
where c is any constant. Furthermore, given a constant W0, there is only one solution to
the pair of equations
W′ = r,   W(0) = W0, (1.2.3)
and the solution is
W(t) = rt + W0. (1.2.4)

Remarks:
• The equation in (1.2.1) is called the linear growth equation for r > 0, and the
linear decay equation for r < 0.
• The functions in (1.2.2) are called the general solution of the linear growth equa-
tion. The problem in Eq. (1.2.3) is called an initial value problem (IVP). The
second equation in Eq. (1.2.3) is called an initial condition.

1.2.2. The Exponential Growth Equation. In § 1.1 we focused on a particular


physical system—the population of bacteria having unlimited food and space. By doing
simple experiments we obtained a difference equation describing the bacteria population. We
then computed the continuum limit of the difference equation. The result was a differential
equation—the exponential growth differential equation.

Definition 1.2.2. The exponential growth equation for the function P , which depends
on the independent variable t, is
P′(t) = r P(t), (1.2.5)
where r > 0 is a constant called the growth constant per capita.

In the system studied in § 1.1 the function values P (t) were the bacteria population at
the time t, and the growth constant r depended on the type of bacteria. In § 1.1 we also
found all solutions to the exponential growth equation.

Theorem 1.2.3 (Exponential Growth). The exponential growth equation


P′ = r P
with r ≠ 0, constant, has infinitely many solutions,
P(t) = c e^{rt}, (1.2.6)
where c is any constant. Furthermore, given a constant P0, there is only one solution to the
pair of equations
P′ = r P,   P(0) = P0, (1.2.7)
and the solution is
P(t) = P0 e^{rt}. (1.2.8)

Remark: The functions in (1.2.6) are called the general solution of the exponential growth
equation. The problem in Eq. (1.2.7) is called an initial value problem (IVP). The second
equation in Eq. (1.2.7) is called an initial condition. We solved this equation in § 1.1, but we
write the proof here one more time.
Proof of Theorem 1.2.3: Rewrite the differential equation as follows,

P′/P = r   ⇒   ∫ [P′(t)/P(t)] dt = ∫ r dt   ⇒   ln(|P|) = rt + c0,

where c0 ∈ R is an arbitrary integration constant, and we used that ln(|P|)′ = P′/P. Now
compute the exponential on both sides,

P(t) = ±e^{rt+c0} = ±e^{c0} e^{rt},   denote c = ±e^{c0}   ⇒   P(t) = c e^{rt},   c ∈ R.

The last equation above is the general solution of the differential equation. We now focus
on the furthermore part of the theorem. We need to show that there is only one solution
that satisfies the initial condition P(0) = P0, for a given constant P0. Indeed,

P0 = P(0) = c e^0 = c   ⇒   c = P0.

Therefore, there is only one solution of the initial value problem given by

P(t) = P0 e^{rt}.
This establishes the Theorem. 
In the next example we generalize Theorem 1.2.3 to slightly more general differential
equations. This result will be useful in population models containing immigration.

Figure 2. We graph a few solutions given in Eq. (1.2.6) for different values
of P(0). Only the solutions with initial population P(0) > 0 represent
population models.

Example 1.2.3. Find all solutions to y′(t) = r y(t) + b, where r, b are constants.
Solution: Rewrite the differential equation as follows,

y′(t)/(r y(t) + b) = 1   ⇒   (1/r) ∫ [y′(t)/(y(t) + (b/r))] dt = ∫ dt.

On the left-hand side introduce the substitution

u = y + (b/r),   du = y′ dt   ⇒   (1/r) ∫ du/u = ∫ dt.

The integral is now simple to find,

(1/r) ln(|u|) = t + c0   ⇒   ln |y + (b/r)| = rt + c1,

where c1 = r c0. Compute exponentials on both sides above, and recall e^{a1+a2} = e^{a1} e^{a2},

|y + (b/r)| = e^{rt+c1} = e^{rt} e^{c1}.

We take out the absolute value,

y(t) + (b/r) = (±e^{c1}) e^{rt}.

If we rename the constant as c = (±e^{c1}), then all solutions of the differential equation are

y(t) = c e^{rt} − b/r,

where c is any constant. C
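A quick numerical check of this general solution is given below: for arbitrary illustrative values of r, b, and c, the function y(t) = c e^{rt} − b/r is compared, through a centered difference quotient, with r y(t) + b.

```python
# Check that y(t) = c*exp(r*t) - b/r satisfies y'(t) = r*y(t) + b.
# The constants r, b, c are arbitrary illustrative values.
from math import exp

r, b, c = 1.5, 4.0, 2.0

def y(t):
    return c * exp(r * t) - b / r

h = 1e-6
for t in (0.0, 0.5, 1.0):
    dydt = (y(t + h) - y(t - h)) / (2 * h)   # numerical derivative of y
    rhs = r * y(t) + b                       # right-hand side of the equation
    print(t, dydt, rhs)                      # the two columns agree
```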

1.2.3. The Exponential Decay Equation. Radioactive materials are not stable—
their nuclei break into smaller nuclei and radiation after a certain time. This means
that a radioactive material changes into different materials as time goes on. Uranium-235,
Radium-226, Radon-222, Polonium-218, Lead-214, Cobalt-60, and Carbon-14 are examples of
radioactive materials. The radioactive decay of a single atom cannot be predicted, but the
decay of a large number of atoms can. It turns out that a large amount of a radioactive material
decreases in time in a way described by the exponential decay equation.

Definition 1.2.4. The exponential decay equation for the function N , which depends
on the independent variable t, is
N′ = −r N,
where r > 0 is a constant called the decay constant.

We solve this equation in a similar way as we solved the exponential growth equation.

Theorem 1.2.5 (Exponential Decay). The exponential decay equation


N′ = −r N
with r ≠ 0, constant, has infinitely many solutions,
N(t) = c e^{−rt}, (1.2.9)
where c is any constant. Furthermore, given a constant N0, there is only one solution to
the pair of equations
N′ = −r N,   N(0) = N0, (1.2.10)
and the solution is
N(t) = N0 e^{−rt}. (1.2.11)

Remark: To prove this theorem replace r → −r in the proof of Theorem 1.2.3.


Example 1.2.4 (Radioactive Material with a Constant Term).
(a) A radioactive material has a decay rate proportional to the amount of radioactive ma-
terial present at that time, with a proportionality factor of 8 per unit time. Write a
differential equation of the form N′ = F(N), which models this situation, where N is
the amount of radioactive material in micrograms as a function of time.
(b) Suppose now that we introduce radioactive material at the constant rate of 10 micro-
grams per unit time. Write the differential equation N′ = F(N) in this case.

Solution:
(a) Since the material has a rate proportional to the amount of material present, then
N′(t) = a N(t).
Since the rate is a decay rate, that means that a < 0, so we write it as
N′(t) = −r N(t),   r > 0.
The problem gives us the decay rate, r = 8, so we get
N′(t) = −8 N(t).

(b) We now add radioactive material at a constant rate of 10 micrograms per unit time, so
the equation is
N′(t) = −8 N(t) + 10.
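The equation in part (b) has the same form as the one solved in Example 1.2.3, with r = −8 and b = 10, so its solutions should be N(t) = c e^{−8t} + 10/8. The minimal Python sketch below checks this numerically; the value of c is an arbitrary illustrative choice.

```python
# The equation N'(t) = -8*N(t) + 10 has the form of Example 1.2.3 with
# r = -8 and b = 10, so N(t) = c*exp(-8*t) + 10/8. Numerical check below;
# the constant c is an arbitrary illustrative value.
from math import exp

c = 3.0

def N(t):
    return c * exp(-8 * t) + 10 / 8

h = 1e-6
for t in (0.0, 0.25, 0.5):
    dNdt = (N(t + h) - N(t - h)) / (2 * h)   # numerical derivative
    print(t, dNdt, -8 * N(t) + 10)           # the two columns agree
```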

Remark: Radioactive materials are often characterized not by their decay constant, r, but
by their half-life, τ . The half-life is the time it takes for half the material to decay.

Definition 1.2.6. The half-life of a radioactive substance is the time τ such that

N(τ) = N(0)/2.

There is a simple relation between the decay constant and the half-life.

Theorem 1.2.7. A material decay constant r and half-life τ are related by the equation

rτ = ln(2).

Proof of Theorem 1.2.7: We know that the amount of a radioactive material as a function
of time is given by

N(t) = N0 e^{−rt}.

Then, the definition of half-life implies,

N0/2 = N0 e^{−rτ}   ⇒   −rτ = ln(1/2)   ⇒   rτ = ln(2).
This establishes the Theorem. 
One application of radioactive materials is dating remains with Carbon-14. Carbon-14
is a radioactive isotope of carbon with a half-life of τ = 5730 years. Carbon-14 is being
constantly created in the upper atmosphere—by collisions of cosmic radiation with atoms in
the atmosphere—and is accumulated by living organisms. While the organisms live, the amount
of Carbon-14 in their bodies is held constant—the decay of Carbon-14 is compensated with
new amounts when the organisms breathe or eat. When the organisms die, the amount of
Carbon-14 in their remains decays in time. The balance between normal and radioactive
carbon in the remains changes in time.

Example 1.2.5 (Carbon-14 Dating). Bone remains in an ancient excavation site contain
only 14% of the Carbon-14 found in living animals today. Estimate how old are the bone
remains. The half-life of the Carbon-14 is τ = 5730 years.
Solution: Let us set t = 0 to be the time when the organism in the site died. Then, we
write the solution of the radioactive decay equation using the initial amount of material,
N (0), as follows,
N(t) = N(0) e^{−rt},   rτ = ln(2).

The time from when the organism died until the present is called t1. So, at t = t1 we have 14% of
the original Carbon-14,

N(0) e^{−r t1} = (14/100) N(0)   ⇒   e^{−r t1} = 14/100   ⇒   −r t1 = ln(14/100),

so we get t1 = (1/r) ln(100/14). Since r and τ are related by rτ = ln(2),

t1 = τ ln(100/14)/ln(2).

We get that the organism died t1 ≈ 16,253 years ago. C
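The arithmetic in this example is easy to reproduce; the short Python sketch below computes the decay constant from the half-life and the age t1 from the 14% fraction.

```python
# Carbon-14 dating of Example 1.2.5: r*tau = ln(2), and the remains contain
# 14% of the original Carbon-14, so t1 = (1/r) * ln(100/14).
from math import log

tau = 5730.0                 # half-life of Carbon-14 in years
r = log(2) / tau             # decay constant per year
fraction = 0.14              # remaining fraction of Carbon-14

t1 = log(1 / fraction) / r   # equivalently tau * log(100/14) / log(2)
print(round(t1))             # about 16,253 years
```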

1.2.4. The Logistic Population Model. Population systems with infinite resources
are described by the exponential growth equation. But when the resources become finite the
populations cannot sustain an exponential growth. The precise way such populations grow
depends on the specific details of the system. We now discuss one of the simplest models
for a population with finite resources—the logistic equation.
Consider a population system with finite food. If the initial population is small enough,
so that the food seems unlimited for this population, then the rate of growth of the popu-
lation is proportional to its size—just as in the exponential growth model.
If P(t) is small enough, then P′(t) ≈ r P(t),   r > 0.
However, if the population becomes too large to be supported by the finite food in the
environment, then the rate of growth becomes smaller than r P(t).
If P(t) is large enough, then P′(t) < r P(t),   r > 0.
How much smaller P′ is depends on the specifics of the population model. The logistic equation is
a simple modification of the exponential growth equation that describes this new situation.

Definition 1.2.8. The logistic equation for the function P , which depends on the inde-
pendent variable t, is

P′(t) = r P(t) (1 − P(t)/Pc), (1.2.12)
with r > 0 the growth rate per capita and Pc > 0 the carrying capacity of the environment.

The equation in (1.2.12) is also called the logistic population model with growth rate r and
carrying capacity Pc . The main difference between the exponential growth and the logistic
equations is the negative term quadratic in P in the latter. The logistic equation is

P′ = r P − (r/Pc) P²,
so when 0 < P << Pc the right-hand side above is close to r P , but when P approaches
Pc the right-hand side above approaches zero. Therefore, when the population approaches
the carrying capacity of the system, which is a critical population, the rate of change of the
population P′ decreases. And P′ = 0 for P = Pc. When P > Pc, we get that P′ < 0 and
the population decreases, approaching the value Pc .

Example 1.2.6 (Finite Food and Harvesting). Consider a population of a single species
of bacteria in a petri dish, which has limited resources (in terms of space or nutrition).
Suppose that when the population is relatively small, the growth rate is proportional to the
population with a proportionality factor of 3 per unit time. There is also a carrying capacity
of 10 micrograms of bacteria, and a constant rate 7 of harvesting. Write a differential
equation of the form P 0 = F (P ), which models this situation, where P is the bacteria
population in micrograms as function of time.
Solution: We need to write the differential equation for the situation described in the
problem. When the population is small it grows at a rate, P′, proportional to the population
at that time,
P′(t) = r P(t)   for P small.
The proportionality factor is r = 3, so,
P′(t) = 3 P(t),   for P small.

When P is large, P′ is not given by the equation above, because there is a carrying capacity
of Pc = 10 micrograms of bacteria. This carrying capacity describes that the food is limited.
If we take this into account the equation for all P is

P′(t) = 3 P(t) (1 − P(t)/10).

However, the equation above does not take into account that we harvest bacteria. To harvest
bacteria at a constant rate of h = 7 means that 7 micrograms of bacteria per unit time are
removed from the dish. If we put this into the equation we get

P′(t) = 3 P(t) (1 − P(t)/10) − 7.
C
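For a first feel of this model one can look at the sign of the right-hand side F(P) = 3P(1 − P/10) − 7. The Python sketch below locates the values of P where F(P) = 0 (the equilibria) by solving the quadratic, and reports the sign of F between and outside them; this is only an illustration of the qualitative analysis developed later, and the sample points are arbitrary choices.

```python
# Equilibria of the harvested logistic model P' = F(P) = 3*P*(1 - P/10) - 7.
# F(P) = 0 is the quadratic -0.3*P**2 + 3*P - 7 = 0; solve it and sample signs.
from math import sqrt

def F(P):
    return 3 * P * (1 - P / 10) - 7

a, b, c = -0.3, 3.0, -7.0
disc = b * b - 4 * a * c
P1 = (-b + sqrt(disc)) / (2 * a)     # smaller root, about 3.71
P2 = (-b - sqrt(disc)) / (2 * a)     # larger root, about 6.29
print(P1, P2)

# Sign of F below, between, and above the equilibria: negative, positive, negative,
# so P decreases, then increases, then decreases.
for P in (2.0, 5.0, 8.0):
    print(P, F(P))
```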

The solutions of the logistic equation, with or without a harvesting term, can be obtained
in a similar way as we obtained the solutions of the exponential growth equation. The
integration is more complicated now. We will see how to solve the logistic equation in a
future section. In this section we focus on qualitative properties of the solutions of the
logistic equation, as we show in the following example.

Example 1.2.7 (Qualitative Properties of the Solutions). Suppose the function P is a


solution to the logistic equation

P′(t) = r P(t) (1 − P(t)/Pc).
(a) For what values of P is the population in equilibrium—that is, time independent?
(b) For what values of P is the population increasing in time?
(c) For what values of P is the population decreasing in time?
Solution:
(a) If P is an equilibrium solution, then P(t) = P̃ constant. Therefore, P′(t) = P̃′ = 0.
But P̃ is a solution of the logistic equation, which means

0 = P̃′ = r P̃ (1 − P̃/Pc)   ⇒   P̃ = 0 or P̃ = Pc.
So, there are only two equilibrium solutions, the trivial case of no population P = 0, or
the case when the population is equal the carrying capacity P = Pc .
(b) Any solution of the logistic equation is increasing when P′(t) > 0, but the logistic
equation implies that

r P(t) (1 − P(t)/Pc) > 0   ⇒   P(t) (1 − P(t)/Pc) > 0.

Since P(t) > 0, the population is going to be increasing for

1 − P(t)/Pc > 0   ⇒   0 < P(t) < Pc.
(c) Any solution of the logistic equation is decreasing when P′(t) < 0. But a calcula-
tion similar to the one in the previous part implies that the population is going to be
decreasing for
P (t) > Pc .
People say that in this case “The population exceeds the carrying capacity of the envi-
ronment.” C
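These qualitative conclusions can be visualized with a crude numerical experiment: integrating the logistic equation with Euler steps from several initial conditions shows solutions below Pc increasing toward Pc and solutions above Pc decreasing toward it. The sketch below uses illustrative values r = 1 and Pc = 10; the step size and initial conditions are also arbitrary choices.

```python
# Crude Euler integration of the logistic equation P' = r*P*(1 - P/Pc)
# from several initial conditions, illustrating Example 1.2.7.
# r, Pc, the step size, and the initial conditions are illustrative choices.
r, Pc = 1.0, 10.0
dt, steps = 0.01, 1000    # integrate up to t = 10

for P0 in (1.0, 5.0, 15.0):
    P = P0
    for _ in range(steps):
        P += dt * r * P * (1 - P / Pc)   # one Euler step
    print(P0, P)                         # every solution ends up near Pc = 10
```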

There is one further property of the solutions to the logistic equation: two different
solutions never intersect. We will see that this is the case when we compute all the solutions
of the logistic equation in a future section. But using this property and the properties in
found in Example 1.2.7, we can sketch a qualitative graph of the solutions to the logistic
equation. The result is given in Figure 3.

Figure 3. Qualitative graphs of solutions P of the logistic equation, with
the equilibrium solutions P = 0 and P = Pc.

The logistic equation also predicts how infectious diseases spread in certain simple cases.

Example 1.2.8 (Infectious Disease without Recovery). It was Anderson McKendrick—one


of the founders of mathematical epidemiology—who, around 1912, proposed a differential
equation to describe how an infectious disease can spread among a certain population.
He proposed that the rate of change in time of the infected individuals is proportional
to the product of the infected times the susceptible-to-be-infected populations. Write the
differential equation describing this system.
Solution: Denote by X(t) > 0 the number of susceptible-to-be-infected—healthy—individuals
at the time t, and by Y (t) > 0 the number of infected individuals at the time t. If these are
all the individuals, then denote by N = X(t) + Y (t), with N > 0 constant in time. The rate
of change in time of the infected individuals is Y 0 = dY /dt. McKendrick model says that
Y′(t) = k̃ Y(t) X(t).
Since the total population number is constant, X(t) = N − Y(t), then
Y′ = k̃ Y (N − Y).
Introduce the infected population fraction I(t) = Y(t)/N and write k̃ = k/N, then
I′(t) = k I(t) (1 − I(t)).


So infectious diseases can be described by the logistic equation. C



1.2.5. Interacting Species Model. We now describe a more complicated system


consisting of two species interacting in some way. The species could be competing for the
same finite food resources, for example rabbits and sheep, which compete for the grass on a
particular piece of land. Or the species could be cooperating, that is acting together for a
common benefit.
We now show how we can construct a simple mathematical model for competing species.
One way is to start with the case that they do not compete at all. Each species has its own,
finite, food resources. Let’s call R(t) and S(t) the populations of rabbits and sheep at the
time t. Then the independent species population model with finite food resources is nothing
more than two logistic equations, one for each species,

R′ = rr R (1 − R/Rc),
S′ = rs S (1 − S/Sc),
where rr , rs are the growth rates and Rc , Sc are the carrying capacities for each species.
The next step is to introduce competition terms into this model. Consider the first
equation above for the rabbit population. The fact that sheep are eating the same grass as
the rabbits will decrease R′, so we need to add a term of the form α R(t) S(t) on the right
hand side, that is

R′ = rr R (1 − R/Rc) + α R S,
with α < 0. The new term must be negative, because the competition must decrease R′.
And this new term must depend on the product of the populations RS, because the product
is a good measure of how often the species meet when they compete for the same grass.
If the rabbit population is very small, then the interaction must be also small. Something
similar must occur for the sheep population, so we get

S′ = rs S (1 − S/Sc) + β R S,
with β < 0.
We summarize the ideas above and we generalize from competition to any type of
interactions in the following definition.

Definition 1.2.9. The interacting species equations for the functions x and y, which
depend on the independent variable t, are

x′ = rx x (1 − x/xc) + α x y,
y′ = ry y (1 − y/yc) + β x y,

where the constants rx, ry and xc, yc are positive and α, β are real numbers. Furthermore,
we have the following particular cases:
• The species compete if and only if α < 0 and β < 0.
• The species cooperate if and only if α > 0 and β > 0.
• The species y cooperates with x when α > 0, and x competes with y when β < 0.

Unlike the exponential growth equation (1.2.5) and the logistic equation (1.2.12), for most
values of the parameters the solutions of the interacting species equations cannot be written
in terms of simple functions such as exponentials, trigonometric functions, and polynomials.
Nonetheless, we can describe the behavior of the system using qualitative approaches and
we can find numerical solutions to the system. In this course we will employ all three
approaches (analytic, qualitative, and numerical) to gain a better understanding of the
differential equations we are studying.

Example 1.2.9. The following systems are models of the populations of pair of species
that either compete for resources (an increase in one species decreases the growth rate
in the other) or cooperate (an increase in one species increases the growth rate in the
other). For each of the following systems identify the independent and dependent variables,
the parameters, such as growth rates, carrying capacities, measure of interactions between
species. Do the species compete or cooperate?
(a)
dx/dt = c1 x − c1 x²/K1 − b1 xy,
dy/dt = c2 y − c2 y²/K2 − b2 xy.

(b)
dx/dt = x − x²/5 + 5 xy,
dy/dt = 2y − y²/6 + 2 xy.
where all the coefficients above are positive.
Solution:
(a)
• The species compete.
• t is the independent variable.
• x, y are the dependent variables.
• c1, c2 are the growth rates.
• K1, K2 are the carrying capacities.
• −b1, −b2 are the competition coefficients.
(b)
• The species cooperate.
• t is the independent variable.
• x, y are the dependent variables.
• 1, 2 are the growth rates.
• 5, 12 (not 6) are the carrying capacities.
• 5, 2 are the cooperation coefficients.
C
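Since these systems generally cannot be solved with simple formulas, a numerical approach is natural. The Python sketch below integrates the competing-species system of part (a) with Euler steps, for illustrative parameter values and initial populations that are not taken from the text; it is only meant to show how such numerical solutions can be produced.

```python
# Euler integration of the competing species system of Example 1.2.9 (a):
#   x' = c1*x - c1*x**2/K1 - b1*x*y,   y' = c2*y - c2*y**2/K2 - b2*x*y.
# All parameter values and initial populations below are illustrative choices.
c1, K1, b1 = 1.0, 10.0, 0.05
c2, K2, b2 = 0.8, 8.0, 0.05

x, y = 2.0, 2.0          # initial populations
dt, steps = 0.01, 2000   # integrate up to t = 20

for n in range(steps):
    dx = c1 * x - c1 * x**2 / K1 - b1 * x * y
    dy = c2 * y - c2 * y**2 / K2 - b2 * x * y
    x, y = x + dt * dx, y + dt * dy          # one Euler step for the system
    if (n + 1) % 500 == 0:
        print(round((n + 1) * dt, 1), round(x, 3), round(y, 3))
# For these weak-competition values the populations settle near x = 8, y = 4.
```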

1.2.6. Overview of Mathematical Models. A mathematical model is a description


of a physical system using difference or differential equations. We do not try to create a
perfect copy of a real-life situation, but rather to capture certain features of the system—the
features we are interested in. To create a mathematical model usually involves following a
few basic steps:
Step 1. Clearly state all the assumptions needed to isolate the feature you are interested
in describing.
Step 2. Introduce the independent and dependent variables.
Step 3. Introduce all the parameters needed to specify the model.
Step 4. Use your assumptions to derive equations relating the variables and parameters.
Step 5. Analyze the predictions of the model—do they make physical sense, do they agree
with your data? If not, you might need to revise your assumptions, going back to
Step 1.
As an example, consider the population models we studied in this section:
• Step 1 could include: Does the system have unlimited food supplies? If yes,
we would end up with the exponential growth equation. If not, we would end up
with the logistic equation.
• Step 2 could include: What is the dependent variable? Is it a population or
the amount of a radioactive material? Is time the independent variable?
• Steps 3 and 4 are how we obtained the differential equation for each population model.
• Step 5 is the analysis of the solutions, which we carried out for the exponential
growth and the logistic population models.

We do not want to end this section without mentioning a few important differential
equations from physics.

Example 1.2.10 (Famous Differential Equations from Physics).


(a) Newton’s law: Mass times acceleration equals force, ma = f , where m is the particle
mass, a = d2 x/dt2 is the particle acceleration, and f is the force acting on the particle.
Hence Newton’s law is the differential equation
m (d²x/dt²)(t) = f(t, x(t), (dx/dt)(t)),
where the unknown is x(t)—the position of the particle in space at the time t. As we
see above, the force may depend on time, on the particle position in space, and on the
particle velocity.
Remark: This is a second order Ordinary Differential Equation (ODE).
(b) The Heat Equation: The temperature T in a solid material changes in time and in
three space dimensions—labeled by x = (x, y, z)—according to the equation
(∂T/∂t)(t, x) = k [(∂²T/∂x²)(t, x) + (∂²T/∂y²)(t, x) + (∂²T/∂z²)(t, x)],   k > 0,
where k is a positive constant representing thermal properties of the material.
Remark: This is a first order in time and second order in space PDE.
(c) The Wave Equation: A wave perturbation u propagating in time t and in three space
dimensions—labeled by x = (x, y, z)—through the media with wave speed v > 0 is
(∂²u/∂t²)(t, x) = v² [(∂²u/∂x²)(t, x) + (∂²u/∂y²)(t, x) + (∂²u/∂z²)(t, x)].

Remark: This is a second order in time and space Partial Differential Equation (PDE).
C

The equation in example (a) is called an ordinary differential equation (ODE)—the unknown function depends on a single independent variable, t. The equations in examples (b) and (c) are called partial differential equations (PDEs)—the unknown function depends on two or more independent variables, t, x, y, and z, and their partial derivatives appear in the equations.
The order of a differential equation is the highest derivative order that appears in the equation. Newton's equation in example (a) is second order, the heat equation in example (b) is first order in time and second order in the space variables, and the wave equation in example (c) is second order in both time and space variables.

Notes.
This section and the exercises below follow § 1.1 in Blanchard et al. [2].
1.2.7. Exercises.

1.2.1.- Consider the population model
dP/dt = 0.2 (1 − P/200) (P/50),
where P (t) is the population at time t.

(1) For what values of P is the population in equilibrium?


(2) For what values of P is the population increasing?
(3) For what values of P is the population decreasing?

1.2.2.- Consider a population of a single species of fish in a lake, which has limited resources (in
terms of space, nutrition, etc.). Suppose the growth rate is r = 1 per year, and the carrying capacity is 10 thousand fish. We introduce a constant rate h of fishing, that is, assume h fish are removed each year from the lake.

(1) Write down a differential equation, which models this situation.


(2) Determine the dependence of the number of critical points on the parameter h.
(3) Choose a value of h, which yields two different critical points, c1 and c2 , with c1 < c2 .
(4) Make a sketch of 3 typical solution curves, with different initial conditions at each side of
the equilibrium solutions to represent three qualitatively different scenarios.
(5) Give a biological interpretation of c1 and c2 .

1.2.3.- The following systems are models of the populations of a pair of species that either compete for resources (an increase in one species decreases the growth rate in the other) or cooperate (an increase in one species increases the growth rate in the other). For each of the following systems identify the independent and dependent variables and the parameters, such as growth rates, carrying capacities, and measures of interaction between the species. Do the species compete or cooperate?

(a)
dx/dt = c1 x − (c1/K1) x² + b1 xy,
dy/dt = c2 y − (c2/K2) y² + b2 xy.

(b)
dx/dt = c1 x − b1 xy,
dy/dt = c2 y − b2 xy.

(c)
dx/dt = c1 x − (c1/K1) x² − b1 xy,
dy/dt = c2 y − (c2/K2) y².

1.2.4.- Suppose students memorize lists according to the model


dL/dt = 2 (1 − L),
where L(t) is the portion of the list a student has learned at time t.

(a) If one student knows one-half of the list at time t = 0 and the other knows none of the
list, which student is learning most rapidly at t = 0? Explain.
(b) Will the student who starts out knowing none of the list ever catch up to the student who
starts out knowing one-half of the list? Justify your answer.
1.3. Qualitative Analysis


Over the years people have learned different techniques to find explicit or implicit solutions to differential equations. However, the differential equations people have learned to solve are a tiny part of all possible differential equations. Furthermore, the simple functions we know from calculus—quotients of polynomials, exponentials, logarithms, and trigonometric functions—are simply not enough to write the solutions of all differential equations. There are differential equations with solutions that cannot be written in terms of simple functions—these solutions define completely new functions.
So people gave up on the idea of finding formulas for the solutions to differential equations. Instead, they had two ideas. First, they focused on proving whether a differential equation has solutions or not, without finding a formula for the solutions. Second, they developed methods to find the qualitative behavior of solutions to differential equations without actually computing the explicit expression of these solutions. In this section we give a brief overview of the first idea, summarized by Theorem 1.3.1. Then we spend the rest of the section on the second idea.

1.3.1. Existence and Uniqueness of Solutions. In general, solutions of differential equations contain free constants, because solving a differential equation involves integrations, and each integration produces a free integration constant. But sometimes one is not interested in all the solutions to a differential equation, but only in those solutions satisfying extra conditions. For example, in the case of Newton's second law of motion for a point particle, one could be interested only in solutions such that the particle is at a specific position at the initial time. Such a condition is called an initial condition, and it selects a subset of solutions of the differential equation. An initial value problem means to find a solution to both a differential equation and an initial condition.
We now present the Picard-Lindelöf Theorem, which shows that a large class of differential
equations have solutions uniquely determined by appropriate initial data.

Theorem 1.3.1 (Picard-Lindelöf). Consider the initial value problem


y 0 (t) = f (t, y(t)), y(t0 ) = y0 . (1.3.1)
If the function f and its partial derivative ∂y f are continuous on some rectangle on the ty-plane
containing the point (t0 , y0 ) in its interior, then there is a unique solution y of the initial value
problem in (1.3.1) on an open interval I containing the point t0 .

Remark: There is no explicit formula for the solution to every nonlinear differential equation. Also
notice that the domain of a solution y to a nonlinear initial value problem may change when we
change the initial data y0 .

Example 1.3.1. Determine whether the functions y1 and y2 given by their graphs in Fig. 4 can be solutions of the same differential equation satisfying the hypotheses in Theorem 1.3.1.

Figure 4. The graph of two functions.
Solution: The functions defined by their graphs in Figure 4 cannot both be solutions of the same differential equation satisfying the hypotheses in Theorem 1.3.1. The reason is that their graphs intersect at some point (t0, y0). If we use that point as the initial condition to solve the differential equation, then Theorem 1.3.1 says that there exists only one solution for that initial condition. So, if y1(t) is the solution, then y2(t) cannot be a solution, and vice versa. C
1.3.2. Direction Fields. Sometimes one needs to find information about solutions of a differential equation without having to actually solve the equation. One way to do this is with the direction fields. Consider a differential equation
y′(t) = f(t, y(t)).
We interpret the right-hand side above in a new way.
(a) In the usual way, the graph of f is a surface in the tyz-space, where z = f(t, y).
(b) In the new way, f(t, y) is the value of a slope of a segment at each point (t, y) on the ty-plane.
(c) That slope is the value of y′(t), which is the derivative of a solution y at t.

Figure 5. The function f as a slope of a segment.

The ideas above suggest the following definition.

Definition 1.3.2. A direction field for the differential equation y′(t) = f(t, y(t)) is the graph on the ty-plane of the values f(t, y) as slopes of small segments.

Remark: Direction fields are also called slope fields.

Example 1.3.2. Find the direction field of the equation y 0 = y, and sketch a few solutions to the
differential equation for different initial conditions.
Solution: We use a computer to find the direction field, which is shown in Fig. 6. We have also
plotted solution curves corresponding to four solutions.
Notice that the solution curves are tangent at every point to the direction field. This is no
accident. Since the curve is a solution, its tangent vectors must be parallel to the direction field.
Also notice that the solution curves in this example agree with the uniqueness property of solutions to initial value problems shown in Theorem 1.3.1: The solution curves corresponding to different
initial conditions do not intersect. (Recall that the solutions are y(t) = y0 et .) C

Figure 6. Direction field for the equation y′ = y and a few solutions.
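Direction fields like the one in Fig. 6 are easy to generate numerically. The following sketch (not part of the original text) assumes NumPy and Matplotlib are available; it draws unit segments with slope f(t, y) = y on a grid and overlays a few exact solutions y(t) = y0 e^t.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: direction field for y' = y, with a few exact solutions y(t) = y0 e^t.
def f(t, y):
    return y                                  # right-hand side of y' = f(t, y)

T, Y = np.meshgrid(np.linspace(-3, 3, 21), np.linspace(-4, 4, 21))
S = f(T, Y)                                   # slope at each grid point
L = np.sqrt(1 + S**2)                         # normalize (1, S) to unit length
plt.quiver(T, Y, 1/L, S/L, angles='xy', color='gray', headwidth=1)

ts = np.linspace(-3, 3, 200)
for y0 in (-2.0, -0.5, 0.5, 2.0):
    plt.plot(ts, y0*np.exp(ts))               # exact solution curves
plt.xlabel('t'); plt.ylabel('y'); plt.ylim(-4, 4)
plt.show()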
Example 1.3.3. Find the direction field of the equation y′ = sin(y), and sketch a few solutions to the differential equation for different initial conditions y(0) = y0.
Solution: We first mention that the equation above can be solved, and the solutions satisfy
sin(y) / (1 + cos(y)) = [ sin(y0) / (1 + cos(y0)) ] e^t
for any y0 ∈ R. This is an equation that defines the solution function y. There are no derivatives in the equation, so this is not a differential equation; we call it an algebraic equation. However, the graphs of these solutions are not simple to draw. But the direction field is simple to plot, and it can be seen in Fig. 7. From that direction field one can see what the graphs of the solutions look like. C

Figure 7. Direction field for the equation y′ = sin(y).

Example 1.3.4. Find the direction field of the equation y 0 = 2 cos(t) cos(y), and sketch a few
solutions to the differential equation for different initial conditions.
Solution: We do not need to compute the explicit solution of y 0 = 2 cos(t) cos(y) to have a quali-
tative idea of its solutions. The direction field can be seen in Fig. 8. C

Figure 8. Direction field for the equation y′ = 2 cos(t) cos(y).

1.3.3. Qualitative Solutions for Autonomous Equations. We now introduce a second


idea to find qualitative properties of solutions to differential equations without having to actually
solve the equation. This idea works on a particular type of differential equations—autonomous
differential equations. These are differential equations where the independent variable does not
appear explicitly in the equation. For these systems we find a few qualitative properties of their
solutions without actually computing the solution—rather by studying the equation itself.

Definition 1.3.3. A first order autonomous differential equation is


y 0 = f (y), (1.3.2)
where y′ = dy/dt, and the function f does not depend explicitly on t.
The autonomous equations we study in this section are a particular type of the separable
equations we will study later on. Here we show a few examples of autonomous and non-autonomous
equations.

Example 1.3.5. The following first order equations are autonomous:
(a) y′ = 2y + 3.
(b) y′ = sin(y).
(c) y′ = r y (1 − y/K).
The independent variable t does not appear explicitly in these equations. The following equations are not autonomous.
(a) y′ = 2y + 3t.
(b) y′ = t² sin(y).
(c) y′ = t y (1 − y/K). C

Remark: Since the autonomous equation in (1.3.2) is a particular case of the equations in the Picard-Lindelöf Theorem 1.3.1, the initial value problem y′ = f(y), y(0) = y0, with f differentiable, always has a unique solution in a neighborhood of t = 0 for every value of the initial data y0.
The idea is to obtain qualitative information about solutions to an autonomous equation using the equation itself, without solving it. We now use the equation in Example 1.3.3 to show how this can be done.

Example 1.3.6. Sketch a qualitative graph of solutions to y 0 = sin(y), for different initial data
conditions y(0).
Solution: The differential equation has the form y 0 = f (y), where f (y) = sin(y). The first step in
the graphical approach is to graph the function f .

Figure 9. Graph of the function f(y) = sin(y).

The second step is to identify all the zeros of the function f . In this case,
f (y) = sin(y) = 0 ⇒ yn = nπ, where n = · · · , −2, −1, 0, 1, 2, · · · .
It is important to realize that these constants yn are solutions of the differential equation. On the
one hand, they are constants, t-independent, so yn0 = 0. On the other hand, these constants yn are
zeros of f , hence f (yn ) = 0. So yn are solutions of the differential equation
0 = yn0 = f (yn ) = 0.
The constants yn are called critical points, or fixed points. When the emphasis is on the fact that
these constants define constant functions solutions of the differential equation, then they are called
equilibrium solutions, or stationary solutions.

Figure 10. Critical points and increase/decrease information added to Fig. 9.

The third step is to identify the regions on the line where f is positive, and where f is negative.
These regions are bounded by the critical points. A solution y of y 0 = f (y) is increasing for f (y) > 0,
and it is decreasing for f (y) < 0. We indicate this behavior of the solution by drawing arrows on
the horizontal axis. In an interval where f > 0 we write a right arrow, and in the intervals where
f < 0 we write a left arrow, as shown in Fig. 10.
There are two types of critical points in Fig. 10. The points y-1 = −π, y1 = π, have arrows
on both sides pointing to them. They are called attractors, or stable points, and are pictured with
solid blue dots. The points y-2 = −2π, y0 = 0, y2 = 2π, have arrows on both sides pointing away
from them. They are called repellers, or unstable points, and are pictured with white dots.
The fourth step is to find the regions where a solution is concave up or concave down. That information is given by y″ = (y′)′ = (f(y))′ = f′(y) y′ = f′(y) f(y). So, in the regions where f(y) f′(y) > 0 a solution is concave up (CU), and in the regions where f(y) f′(y) < 0 a solution is concave down (CD). See Fig. 11.

Figure 11. Concavity information on the solution y added to Fig. 10.

This is all we need to sketch a qualitative graph of solutions to the differential equation. The
last step is to collect all this information on a ty-plane. The horizontal axis above is now the vertical
axis, and we now plot solutions y of the differential equation. See Fig. 12.
Fig. 12 contains the graph of several solutions y for different choices of initial data y(0).
Stationary solutions are in blue, t-dependent solutions in green. The stationary solutions are separated in two types. The stable solutions y-1 = −π, y1 = π, are pictured with solid blue lines. The unstable solutions y-2 = −2π, y0 = 0, y2 = 2π, are pictured with dashed blue lines. C

Figure 12. Qualitative graphs of solutions y for different initial conditions.
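The stability classification above can be double-checked numerically: a critical point y_n is stable when f′(y_n) < 0 and unstable when f′(y_n) > 0. The short sketch below, assuming NumPy is available, applies this test to f(y) = sin(y).

import numpy as np

# Sketch: classify the equilibria y_n = n*pi of y' = sin(y) by the sign of f'(y_n) = cos(y_n).
f, df = np.sin, np.cos
for n in range(-2, 3):
    yn = n*np.pi
    kind = "stable (attractor)" if df(yn) < 0 else "unstable (repeller)"
    print(f"y = {yn:7.4f}:  f'(y) = {df(yn):+.1f}  ->  {kind}")
# The test assumes f'(y_n) is not zero; degenerate critical points need a separate analysis.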

Remark: A qualitative graph of the solutions does not provide all the possible information about
the solution. For example, we know from the graph above that for some initial conditions the
corresponding solutions have inflection points at some t > 0. But we cannot know the exact value
of t where the inflection point occurs. Such information could be useful to have, since |y 0 | has its
maximum value at those points.
In the Example 1.3.6 above we have used that the second derivative of the solution function is
related to f and f 0 . This is a result that we remark here in its own statement.

Theorem 1.3.4. If y is a solution of the autonomous system y 0 = f (y), then


y 00 = f 0 (y) f (y).

Remark: This result has been used to find out the curvature of the solution y of an autonomous
system y 0 = f (y). The graph of y has positive curvature iff f 0 (y) f (y) > 0 and negative curvature
iff f 0 (y) f (y) < 0.
Proof: The differential equation relates y″ to f(y) and f′(y) because of the chain rule,
y″ = d/dt ( dy/dt ) = d/dt ( f(y(t)) ) = (df/dy)(dy/dt) ⇒ y″ = f′(y) f(y).


Example 1.3.7. Sketch a qualitative graph of solutions for different initial data conditions y(0) = y0 to the logistic equation below, where r and K are given positive constants,
y′ = r y (1 − y/K).
34 1. FIRST ORDER EQUATIONS

Solution: The logistic differential equation for population growth can be written y′ = f(y), where the function f is the polynomial
f(y) = r y (1 − y/K).
The first step in the graphical approach is to graph the function f. The result is in Fig. 13.

Figure 13. The graph of f.

The second step is to identify all critical points of the equation. The critical points are the zeros of the function f. In this case, f(y) = 0 implies
y0 = 0, y1 = K.
The third step is to find out whether the critical points are stable or unstable. Where the function f is positive, a solution will be increasing, and where the function f is negative a solution will be decreasing. These regions are bounded by the critical points. Now, in an interval where f > 0 write a right arrow, and in the intervals where f < 0 write a left arrow, as shown in Fig. 14.

Figure 14. Critical points added.

The fourth step is to find the regions where a solution is concave up or concave down. That information is given by y″. But the differential equation relates y″ to f(y) and f′(y). We have shown in Example 1.3.6 that the chain rule and the differential equation imply
y″ = f′(y) f(y).
So in the regions where f(y) f′(y) > 0 a solution is concave up (CU), and in the regions where f(y) f′(y) < 0 a solution is concave down (CD). The result is in Fig. 15.

Figure 15. Concavity information added; the graph also includes f′(y) = r − (2r/K) y.

This is all the information we need to sketch a qualitative graph of solutions to the differential
equation. So, the last step is to put all this information on a yt-plane. The horizontal axis above is
now the vertical axis, and we now plot solutions y of the differential equation. The result is given
in Fig. 16.
Fig. 16 contains the graph of several solutions y for different choices of initial data y(0). Stationary solutions are in blue, t-dependent solutions in green. The stationary solution y0 = 0 is unstable and pictured with a dashed blue line. The stationary solution y1 = K is stable and pictured with a solid blue line. C

Figure 16. Qualitative graphs of solutions y for different initial conditions.
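The qualitative graphs in Fig. 16 can be compared with a numerical integration of the logistic equation. The sketch below assumes SciPy is available and uses the hypothetical values r = 1, K = 10; every solution with positive initial data approaches the stable equilibrium y = K.

import numpy as np
from scipy.integrate import solve_ivp

# Sketch: numerical solutions of the logistic equation y' = r y (1 - y/K).
r, K = 1.0, 10.0                              # hypothetical parameter values

def logistic(t, y):
    return r*y*(1 - y/K)

t_eval = np.linspace(0, 10, 200)
for y0 in (1.0, 5.0, 15.0):                   # initial data below and above K
    sol = solve_ivp(logistic, (0, 10), [y0], t_eval=t_eval)
    print(f"y(0) = {y0:5.1f}  ->  y(10) = {sol.y[0, -1]:.4f}   (K = {K})")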

Notes.
This section follows a few parts of Chapter 2 in Steven Strogatz’s book on Nonlinear Dynamics and
Chaos [7], and also § 2.5 in the classic textbook by Boyce and DiPrima [3].
1.3.4. Exercises.

1.3.1.- Consider the differential equation


dy/dx = x y.
Sketch vectors from the corresponding slope field of this differential equation, at the points
indicated on the figure below.

1.3.2.- Match the slope fields of the following three differential equations to the three figures below.
Provide justification for your reasoning.
A. dy/dx = cos(x), B. dy/dx = sin(y), C. dy/dx = x + y.

Figure 17. I Figure 18. II Figure 19. III

1.3.3.- For the differential equations below do the following:

(a) Find all the equilibrium solutions of the differential equation.


(b) Find the open intervals where solutions are increasing.
(c) Find the open intervals where solutions are decreasing.
(d) Find the open intervals where solutions are concave up.
(e) Find the open intervals where solutions are concave down.
(f) With the information above sketch a qualitative graph of the several solutions for each
differential equation.
(i) y′ = sin(3y), (ii) y′ = cos(3y), (iii) y′ = y² − 9, (iv) y′ = 9 − y², (v) y′ = y (1 − y/2).
1.4. Separable Equations


We have seen how to find qualitative properties of solutions to differential equations without
actually solving the equation. It is time, however, to start solving differential equations, at least
those that are simple to solve. In this section we focus on a particular type of differential equations
that are especially simple to solve—separable equations. The simplest idea to solve a differential
equation actually works well with separable equations—integrate on both sides of the equation.

1.4.1. Separable Equations. We now introduce a particular type of differential equations


called separable equations. They include the autonomous equations shown in the previous sections,
such as the exponential growth and decay equations, and the logistic equation. To solve a separable
equation just integrate on both sides of the equation.

Definition 1.4.1. A separable differential equation for the function y is

h(y) y 0 = g(t),

where h, g are given functions.

Remark: A separable differential equation h(y) y′ = g(t) has the following properties:
• The left-hand side depends explicitly only on y, so any t dependence is through y.
• The right-hand side depends only on t.
• And the left-hand side is of the form (something on y) × y 0 .

Example 1.4.1.
(a) Autonomous equations can be transformed into separable equations. Indeed,
y′ = f(y) ⇒ y′ / f(y) = 1 ⇒ g(t) = 1, h(y) = 1/f(y).
(b) The differential equation y′ = t²/(1 − y²) can easily be transformed into a separable equation, since
(1 − y²) y′ = t² ⇒ g(t) = t², h(y) = 1 − y².
(c) The differential equation y′ + y² cos(2t) = 0 can be transformed into a separable equation, since
(1/y²) y′ = −cos(2t) ⇒ g(t) = −cos(2t), h(y) = 1/y².
The functions g and h are not uniquely defined; another choice in this example is:
g(t) = cos(2t), h(y) = −1/y².
(d) The exponential growth or decay equation y′ = a(t) y can be transformed into a separable equation, since
(1/y) y′ = a(t) ⇒ g(t) = a(t), h(y) = 1/y.
(e) The equation y′ = e^y + cos(t) is not separable and there is no way we can transform it into a separable equation.
(f) The constant coefficient equation y′ = a0 y + b0 can be transformed into a separable equation, since
(1/(a0 y + b0)) y′ = 1 ⇒ g(t) = 1, h(y) = 1/(a0 y + b0).
(g) The variable coefficient equation y′ = a(t) y + b(t), with a ≠ 0 and b/a nonconstant, is not separable and cannot be transformed into a separable equation.
C

Separable differential equations are simple to solve. We just integrate on both sides of the
equation with respect to the independent variable t. We show this idea in the following example.

Example 1.4.2. Find all solutions y to the differential equation
−y′/y² = cos(2t).

Solution: The differential equation above is separable, with
g(t) = cos(2t), h(y) = −1/y².
Therefore, it can be integrated as follows:
−y′/y² = cos(2t) ⇔ −∫ (y′(t)/y²(t)) dt = ∫ cos(2t) dt + c.
The integral on the right-hand side can be computed explicitly. The integral on the left-hand side can be done by substitution. The substitution is
u = y(t), du = y′(t) dt ⇒ −∫ du/u² = ∫ cos(2t) dt + c.
This notation makes clear that u is the new integration variable, while y(t) are the unknown function values we look for. However, it is common in the literature to use the same name for the variable and the unknown function. We will follow that convention, and we write the substitution as
y = y(t), dy = y′(t) dt ⇒ −∫ dy/y² = ∫ cos(2t) dt + c.
We hope this is not too confusing. Integrating on both sides above we get
1/y = (1/2) sin(2t) + c.
So, we get the implicit and explicit form of the solution,
1/y(t) = (1/2) sin(2t) + c ⇔ y(t) = 2 / (sin(2t) + 2c).
C
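The explicit solution can be verified with a computer algebra system. The sketch below, assuming SymPy is available, substitutes y(t) = 2/(sin(2t) + 2c) into −y′/y² = cos(2t) and checks that the residual simplifies to zero.

import sympy as sp

# Sketch: verify that y(t) = 2/(sin(2t) + 2c) solves -y'/y^2 = cos(2t).
t, c = sp.symbols('t c')
y = 2/(sp.sin(2*t) + 2*c)
residual = -sp.diff(y, t)/y**2 - sp.cos(2*t)
print(sp.simplify(residual))                  # prints 0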

Remark: Notice the following about the equation and its implicit solution:
−(1/y²) y′ = cos(2t) ⇔ h(y) y′ = g(t), h(y) = −1/y², g(t) = cos(2t),
1/y = (1/2) sin(2t) ⇔ H(y) = G(t), H(y) = 1/y, G(t) = (1/2) sin(2t).
• Here H is an antiderivative of h, that is, H(y) = ∫ h(y) dy.
• Here G is an antiderivative of g, that is, G(t) = ∫ g(t) dt.
Theorem 1.4.2 (Separable Equations). If h, g are continuous, with h 6= 0, then


h(y) y 0 = g(t) (1.4.1)
has infinitely many solutions y satisfying the algebraic equation
H(y(t)) = G(t) + c, (1.4.2)
where c ∈ R is arbitrary, H and G are antiderivatives of h and g.

Remark: An antiderivative of h is H(y) = ∫ h(y) dy, while an antiderivative of g is the function G(t) = ∫ g(t) dt.

Proof of Theorem 1.4.2: Integrate with respect to t on both sides in Eq. (1.4.1),
h(y) y′ = g(t) ⇒ ∫ h(y(t)) y′(t) dt = ∫ g(t) dt + c,

where c is an arbitrary constant. Introduce on the left-hand side of the second equation above the
substitution
y = y(t), dy = y 0 (t) dt.
The result of the substitution is
∫ h(y(t)) y′(t) dt = ∫ h(y) dy ⇒ ∫ h(y) dy = ∫ g(t) dt + c.

To integrate on each side of this equation means to find a function H, primitive of h, and a function
G, primitive of g. Using this notation we write
H(y) = ∫ h(y) dy, G(t) = ∫ g(t) dt.

Then the equation above can be written as follows,


H(y) = G(t) + c,
which implicitly defines a function y, which depends on t. This establishes the Theorem. 

Example 1.4.3. Find all solutions y to the differential equation
y′ = t² / (1 − y²). (1.4.3)

Solution: We write the differential equation in (1.4.3) in the form h(y) y′ = g(t),
(1 − y²) y′ = t².
In this example the functions h and g defined in Theorem 1.4.2 are given by
h(y) = (1 − y²), g(t) = t².
We now integrate with respect to t on both sides of the differential equation,
∫ (1 − y²(t)) y′(t) dt = ∫ t² dt + c,
where c is any constant. The integral on the right-hand side can be computed explicitly. The integral on the left-hand side can be done by substitution. The substitution is
y = y(t), dy = y′(t) dt ⇒ ∫ (1 − y²(t)) y′(t) dt = ∫ (1 − y²) dy.
This substitution on the left-hand side integral above gives,
∫ (1 − y²) dy = ∫ t² dt + c ⇔ y − y³/3 = t³/3 + c.
The equation above defines a function y, which depends on t. We can write it as
y(t) − y³(t)/3 = t³/3 + c.
We have solved the differential equation, since there are no derivatives in the last equation. When
the solution is given in terms of an algebraic equation, we say that the solution y is given in implicit
form. C

Definition 1.4.3. A function y is a solution in implicit form of the equation h(y) y 0 = g(t) iff
the function y is solution of the algebraic equation

H(y(t)) = G(t) + c,
where H and G are any antiderivatives of h and g. In the case that the function H is invertible, the solution y above is given in explicit form iff it is written as
y(t) = H⁻¹( G(t) + c ).


In the case that H is not invertible or H −1 is difficult to compute, we leave the solution y in
implicit form. We now solve the same example as in Example 1.4.3, but now we just use the result
of Theorem 1.4.2.

Example 1.4.4. Use the formula in Theorem 1.4.2 to find all solutions y to the equation
y′ = t² / (1 − y²). (1.4.4)

Solution: Theorem 1.4.2 tells us how to obtain the solution y. Writing Eq. (1.4.4) as
(1 − y²) y′ = t²,
we see that the functions h, g are given by
h(y) = 1 − y², g(t) = t².
Their primitive functions, H and G, respectively, are simple to compute,
h(y) = 1 − y² ⇒ H(y) = y − y³/3,
g(t) = t² ⇒ G(t) = t³/3.
Then, Theorem 1.4.2 implies that the solution y satisfies the algebraic equation
y(t) − y³(t)/3 = t³/3 + c, (1.4.5)
where c ∈ R is arbitrary. C

Remark: Sometimes it is simpler to remember ideas than formulas. So one can solve a separable
equation as we did in Example 1.4.3, instead of using the solution formulas, as in Example 1.4.4.
(Although in the case of separable equations both methods are very close.)
In the next Example we show that an initial value problem can be solved even when the
solutions of the differential equation are given in implicit form.

Example 1.4.5. Find the solution of the initial value problem
y′ = t² / (1 − y²), y(0) = 1. (1.4.6)

Solution: From Example 1.4.3 we know that all solutions to the differential equation in (1.4.6) are given by
y(t) − y³(t)/3 = t³/3 + c,
where c ∈ R is arbitrary. This constant c is now fixed with the initial condition in Eq. (1.4.6),
y(0) − y³(0)/3 = 0/3 + c ⇒ 1 − 1/3 = c ⇔ c = 2/3 ⇒ y(t) − y³(t)/3 = t³/3 + 2/3.
So we can rewrite the algebraic equation defining the solution functions y as the (time dependent) roots of a cubic (in y) polynomial,
y³(t) − 3y(t) + t³ + 2 = 0.
C

Example 1.4.6. Find the solution of the initial value problem


y 0 + y 2 cos(2t) = 0, y(0) = 1. (1.4.7)

Solution: The differential equation above can be written as
−(1/y²) y′ = cos(2t).
We know, from Example 1.4.2, that the solutions of the differential equation are
y(t) = 2 / (sin(2t) + 2c), c ∈ R.
The initial condition implies that
1 = y(0) = 2 / (0 + 2c) ⇔ c = 1.
So, the solution to the IVP is given in explicit form by
y(t) = 2 / (sin(2t) + 2).
C

Example 1.4.7. Follow the proof in Theorem 1.4.2 to find all solutions y of the equation
y′ = (4t − t³) / (4 + y³).

Solution: The differential equation above is separable, with
h(y) = 4 + y³, g(t) = 4t − t³.
Therefore, it can be integrated as follows:
(4 + y³) y′ = 4t − t³ ⇔ ∫ (4 + y³(t)) y′(t) dt = ∫ (4t − t³) dt + c.
Again the substitution
y = y(t), dy = y′(t) dt
implies that
∫ (4 + y³) dy = ∫ (4t − t³) dt + c0 ⇔ 4y + y⁴/4 = 2t² − t⁴/4 + c0.
Calling c1 = 4c0 we obtain the following implicit form for the solution,
y⁴(t) + 16y(t) − 8t² + t⁴ = c1.
C
Example 1.4.8. Find the solution of the initial value problem below in explicit form,
y′ = (2 − t) / (1 + y), y(0) = 1. (1.4.8)

Solution: The differential equation above is separable with
h(y) = 1 + y, g(t) = 2 − t.
Their primitives are respectively given by,
h(y) = 1 + y ⇒ H(y) = y + y²/2,
g(t) = 2 − t ⇒ G(t) = 2t − t²/2.
Therefore, the implicit form of all solutions y to the ODE above is given by
y(t) + y²(t)/2 = 2t − t²/2 + c,
with c ∈ R. The initial condition in Eq. (1.4.8) fixes the value of the constant c, as follows,
y(0) + y²(0)/2 = 0 + c ⇒ 1 + 1/2 = c ⇒ c = 3/2.
We conclude that the implicit form of the solution y is given by
y(t) + y²(t)/2 = 2t − t²/2 + 3/2 ⇔ y²(t) + 2y(t) + (t² − 4t − 3) = 0.
The explicit form of the solution can be obtained by realizing that y(t) is a root of the quadratic polynomial above. The two roots of that polynomial are given by
y±(t) = (1/2) ( −2 ± √(4 − 4(t² − 4t − 3)) ) ⇔ y±(t) = −1 ± √(−t² + 4t + 4).
We have obtained two functions y+ and y-. However, we know that there is only one solution to the initial value problem. We can decide which one is the solution by evaluating them at the value t = 0 given in the initial condition. We obtain
y+(0) = −1 + √4 = 1, y-(0) = −1 − √4 = −3.
Therefore, the solution is y+, that is, the explicit form of the solution is
y(t) = −1 + √(−t² + 4t + 4).
C
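One can also check the choice of root numerically. The sketch below, assuming NumPy, computes both roots of y² + 2y + (t² − 4t − 3) = 0 at a few times and confirms that the larger root, the one passing through y(0) = 1, agrees with −1 + √(−t² + 4t + 4).

import numpy as np

# Sketch: compare the roots of y^2 + 2y + (t^2 - 4t - 3) = 0 with the explicit solution.
def y_explicit(t):
    return -1 + np.sqrt(-t**2 + 4*t + 4)

for t in (0.0, 0.5, 1.0, 2.0):
    roots = np.roots([1.0, 2.0, t**2 - 4*t - 3]).real   # two real roots y-(t) <= y+(t)
    y_plus = roots.max()                                  # the branch with y(0) = 1
    print(f"t = {t:3.1f}:  root = {y_plus:.6f},  explicit = {y_explicit(t):.6f}")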

1.4.2. Euler Homogeneous Equations. We have seen several examples of differential equations that are not separable but can be transformed into a separable equation by a simple change in the equation. For example the equation
y′ = t² y³
can be changed into a separable equation simply by dividing the whole equation by y³, that is
y′ / y³ = t²,
which is separable. Sometimes a differential equation is not separable but it can be transformed into a separable equation by a more complicated transformation that involves changing the unknown function. This is the case for differential equations known as Euler homogeneous equations.

Definition 1.4.4. An Euler homogeneous differential equation has the form
y′(t) = F( y(t)/t ).
Remark:
(a) Any function F of t, y that depends only on the quotient y/t is scale invariant. This means
that F does not change when we do the transformation y → cy, t → ct,
F( (cy)/(ct) ) = F( y/t ).
For this reason the differential equations above are also called scale invariant equations.
(b) Scale invariant functions are a particular case of homogeneous functions of degree n, which are
functions f satisfying
f(ct, cy) = cⁿ f(t, y).
Scale invariant functions are the case n = 0.
(c) An example of an homogeneous function is the energy of a thermodynamical system, such as
a gas in a bottle. The energy, E, of a fixed amount of gas is a function of the gas entropy, S,
and the gas volume, V . Such energy is an homogeneous function of degree one,
E(cS, cV ) = c E(S, V ), for all c ∈ R.

Example 1.4.9. Show that the functions f1 and f2 are homogeneous and find their degree,
f1 (t, y) = t4 y 2 + ty 5 + t3 y 3 , f2 (t, y) = t2 y 2 + ty 3 .

Solution: The function f1 is homogeneous of degree 6, since


f1 (ct, cy) = c4 t4 c2 y 2 + ct c5 y 5 + c3 t3 c3 y 3 = c6 (t4 y 2 + ty 5 + t3 y 3 ) = c6 f (t, y).
Notice that the sum of the powers of t and y on every term is 6. Analogously, function f2 is
homogeneous degree 4, since
f2 (ct, cy) = c2 t2 c2 y 2 + ct c3 y 3 = c4 (t2 y 2 + ty 3 ) = c4 f2 (t, y).
And the sum of the powers of t and y on every term is 4. C

Example 1.4.10. Show that the functions below are scale invariant functions,
f1(t, y) = y/t, f2(t, y) = (t³ + t²y + t y² + y³) / (t³ + t y²).

Solution: Function f1 is scale invariant since
f1(ct, cy) = (cy)/(ct) = y/t = f1(t, y).
The function f2 is scale invariant as well, since
f2(ct, cy) = (c³t³ + c²t² cy + ct c²y² + c³y³) / (c³t³ + ct c²y²) = c³(t³ + t²y + t y² + y³) / (c³(t³ + t y²)) = f2(t, y).
C

More often than not, Euler homogeneous differential equations come from a differential equation
N y 0 + M = 0 where both N and M are homogeneous functions of the same degree.

Theorem 1.4.5. If the functions N, M of t, y are homogeneous of the same degree, then the differential equation
N(t, y) y′(t) + M(t, y) = 0
can be transformed into the Euler homogeneous equation
y′ = F( y/t ), with F( y/t ) = − M(1, y/t) / N(1, y/t).
Proof of Theorem 1.4.5: Rewrite the equation as
y′(t) = − M(t, y) / N(t, y).
The function f(t, y) = −M(t, y)/N(t, y) is scale invariant, because
f(ct, cy) = − M(ct, cy)/N(ct, cy) = − cⁿ M(t, y)/(cⁿ N(t, y)) = − M(t, y)/N(t, y) = f(t, y),
where we used that M and N are homogeneous of the same degree n. We now find a function F such that the differential equation can be written as
y′ = F( y/t ).
Since M and N are homogeneous of degree n, we multiply the differential equation by “1” in the form (1/t)ⁿ/(1/t)ⁿ, as follows,
y′(t) = − [M(t, y)(1/tⁿ)] / [N(t, y)(1/tⁿ)] = − M(t/t, y/t) / N(t/t, y/t) = − M(1, y/t) / N(1, y/t) ⇒ y′ = F( y/t ),
where
F( y/t ) = − M(1, y/t) / N(1, y/t).
This establishes the Theorem.
This establishes the Theorem. 

Example 1.4.11. Show that (t − y) y′ − 2y + 3t + y²/t = 0 is an Euler homogeneous equation.
Solution: Rewrite the equation in the standard form
(t − y) y′ = 2y − 3t − y²/t ⇒ y′ = (2y − 3t − y²/t) / (t − y).
So the function f in this case is given by
f(t, y) = (2y − 3t − y²/t) / (t − y).
This function is scale invariant, since the numerator and denominator are homogeneous of the same degree, n = 1 in this case,
f(ct, cy) = (2cy − 3ct − c²y²/(ct)) / (ct − cy) = c(2y − 3t − y²/t) / (c(t − y)) = f(t, y).
So, the differential equation is Euler homogeneous. We now write the equation in the form y′ = F(y/t). Since the numerator and denominator are homogeneous of degree n = 1, we multiply them by “1” in the form (1/t)¹/(1/t)¹, that is
y′ = [(2y − 3t − y²/t)(1/t)] / [(t − y)(1/t)].
Distribute the factors (1/t) in the numerator and denominator, and we get
y′ = (2(y/t) − 3 − (y/t)²) / (1 − (y/t)) ⇒ y′ = F( y/t ),
where
F( y/t ) = (2(y/t) − 3 − (y/t)²) / (1 − (y/t)).
So, the equation is Euler homogeneous and it is written in the standard form. C
Example 1.4.12. Determine whether the equation (1 − y³) y′ = t² is Euler homogeneous.

Solution: If we write the differential equation in the standard form, y′ = f(t, y), then we get
f(t, y) = t² / (1 − y³). But
f(ct, cy) = c²t² / (1 − c³y³) ≠ f(t, y),
hence the equation is not Euler homogeneous. C

1.4.3. Solving Euler Homogeneous Equations. Theorem 1.4.6 transforms an Euler ho-
mogeneous equation into a separable equation, which we know how to solve.

Theorem 1.4.6. The Euler homogeneous equation
y′ = F( y/t )
for the function y determines a separable equation for v = y/t, given by
v′ / (F(v) − v) = 1/t.

Remark: The original homogeneous equation for the function y is transformed into a separable
equation for the unknown function v = y/t. One solves for v, in implicit or explicit form, and then
transforms back to y = t v.
Proof of Theorem 1.4.6: Introduce the function v = y/t into the differential equation,
y 0 = F (v).
We still need to replace y 0 in terms of v. This is done as follows,
y(t) = t v(t) ⇒ y 0 (t) = v(t) + t v 0 (t).
Introducing these expressions into the differential equation for y we get
v + t v′ = F(v) ⇒ v′ = (F(v) − v)/t ⇒ v′ / (F(v) − v) = 1/t.
The equation on the far right is separable. This establishes the Theorem. 

Example 1.4.13. Find all solutions y of the differential equation y′ = (t² + 3y²) / (2ty).
Solution: The equation is Euler homogeneous, since
f(ct, cy) = (c²t² + 3c²y²) / (2(ct)(cy)) = c²(t² + 3y²) / (c² 2ty) = (t² + 3y²) / (2ty) = f(t, y).
Next we compute the function F. Since the numerator and denominator are homogeneous of degree “2” we multiply the right-hand side of the equation by “1” in the form (1/t²)/(1/t²),
y′ = [(t² + 3y²)(1/t²)] / [2ty (1/t²)] ⇒ y′ = (1 + 3(y/t)²) / (2(y/t)).
Now we introduce the change of functions v = y/t,
y′ = (1 + 3v²) / (2v).
Since y = t v, then y′ = v + t v′, which implies
v + t v′ = (1 + 3v²)/(2v) ⇒ t v′ = (1 + 3v²)/(2v) − v = (1 + 3v² − 2v²)/(2v) = (1 + v²)/(2v).
We obtained the separable equation
v′ = (1/t) (1 + v²)/(2v).
We rewrite and integrate it,
(2v/(1 + v²)) v′ = 1/t ⇒ ∫ (2v/(1 + v²)) v′ dt = ∫ (1/t) dt + c0.
The substitution u = 1 + v²(t) implies du = 2v(t) v′(t) dt, so
∫ du/u = ∫ dt/t + c0 ⇒ ln(u) = ln(t) + c0 ⇒ u = e^{ln(t)+c0}.
But e^{ln(t)+c0} = e^{ln(t)} e^{c0}, so denoting c1 = e^{c0}, then u = c1 t. So, we get
1 + v² = c1 t ⇒ 1 + (y/t)² = c1 t ⇒ y(t) = ±t √(c1 t − 1).
C
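A symbolic check of this result is straightforward. The sketch below, assuming SymPy is available, substitutes the branch y(t) = t √(c1 t − 1) into the equation y′ = (t² + 3y²)/(2ty) and verifies that the residual vanishes.

import sympy as sp

# Sketch: verify that y(t) = t*sqrt(c1*t - 1) solves y' = (t^2 + 3 y^2)/(2 t y).
t, c1 = sp.symbols('t c1', positive=True)
y = t*sp.sqrt(c1*t - 1)
residual = sp.diff(y, t) - (t**2 + 3*y**2)/(2*t*y)
print(sp.simplify(residual))                  # prints 0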

Example 1.4.14. Find all solutions y of the differential equation y′ = (t(y + 1) + (y + 1)²) / t².
Solution: This equation is Euler homogeneous when written in terms of the unknown u(t) = y(t) + 1 and the variable t. Indeed, u′ = y′, thus we obtain
y′ = (t(y + 1) + (y + 1)²)/t² ⇔ u′ = (t u + u²)/t² ⇔ u′ = u/t + (u/t)².
Therefore, we introduce the new variable v = u/t, which satisfies u = t v and u′ = v + t v′. The differential equation for v is
v + t v′ = v + v² ⇔ t v′ = v² ⇔ ∫ (v′/v²) dt = ∫ (1/t) dt + c,
with c ∈ R. The substitution w = v(t) implies dw = v′ dt, so
∫ w⁻² dw = ∫ (1/t) dt + c ⇔ −w⁻¹ = ln(|t|) + c ⇔ w = −1/(ln(|t|) + c).
Substituting back v, u and y, we obtain w = v(t) = u(t)/t = [y(t) + 1]/t, so
(y + 1)/t = −1/(ln(|t|) + c) ⇔ y(t) = −t/(ln(|t|) + c) − 1.
C

Notes. This section corresponds to Boyce-DiPrima [3] Section 2.2. Zill and Wright study separable
equations in [10] Section 2.2, and Euler homogeneous equations in Section 2.5. Zill and Wright
organize the material in a nice way, they present first separable equations, then linear equations,
and then they group Euler homogeneous and Bernoulli equations in a section called Solutions by
Substitution. Once again, a one page description is given by Simmons in [6] in Chapter 2, Section
7.
1.4.4. Exercises.

1.4.1.- Find the explicit expressions of all solutions y to the equation


y′ = t²/y.

1.4.2.- Find the explicit expressions of all solutions y to the equation


3t2 + 4y 3 y 0 − 1 + y 0 = 0.

1.4.3.- Find the solution y to the initial value problem


y 0 = t2 y 2 , y(0) = 1.

1.4.4.- Find all solutions y of the equation


t y + √(1 + t²) y′ = 0.

1.4.5.- Find every solution y of the Euler homogeneous equation


y′ = (y + t)/t.

1.4.6.- Find all solutions y to the equation


y′ = (t² + y²)/(t y).

1.4.7.- Find the explicit solution to the initial value problem


(t2 + 2ty) y 0 = y 2 , y(1) = 1.

1.4.8.- Prove that if y 0 = f (t, y) is an Euler homogeneous equation and y1 (t) is a solution, then
y(t) = (1/k) y1 (kt) is also a solution for every non-zero k ∈ R.
1.5. Linear Equations


Linear differential equations look simpler than separable equations, but they can be more complicated to solve. We will see that there are linear equations that cannot be transformed into separable equations, and therefore their solutions cannot be obtained by simply integrating on both sides with respect to the independent variable. It turns out we need a new idea to solve all linear equations. We will study one such new idea, called the integrating factor method.
In the last part of this section we turn our attention to a particular nonlinear differential
equation—the Bernoulli equation. This nonlinear equation has a particular property: it can be
transformed into a linear equation by an appropriate change of the unknown function. Then, one
solves the linear equation for the changed function using the integrating factor method. The last
step is to transform the changed function back into the original function.

1.5.1. Constant Coefficients. We start with a precise definition of linear differential equa-
tions, and then we focus on a particular type of linear equations—those with constant coefficients.

Definition 1.5.1. A linear non-homogeneous differential equation for the function y is
y′ = a(t) y + b(t). (1.5.1)
The equation is called homogeneous if b(t) = 0 for all t. The equation has constant coefficients if both a and b are constants; otherwise we say that the equation has variable coefficients.

We have seen in earlier sections examples of linear non-homogeneous equations with constant
coefficients, for example the population models with unlimited food resources.

Example 1.5.1 (Exponential Population Model with Immigration). Consider a population of


rabbits in a region with unlimited food resources. Suppose that the growth rate is proportional to
the population with a proportionality factor of 2 per month and there is an immigration rate of 25
rabbits per month. Find the differential equation that describes this population of rabbits.
Solution: The population of rabbits has unlimited food resources and immigration, so the differ-
ential equation for such system is
y 0 (t) = a y(t) + b,
where y(t) is the rabbit population at the time t, a is the growth rate per capita coefficient, and b
is the immigration rate coefficient. So in this case we have
a = 2, b = 25.
So the equation describing the rabbit population under the assumptions of this example is
y 0 (t) = 2 y(t) + 25.
C

Our next result is a formula for the solutions of all linear differential equations with constant
coefficients.

Theorem 1.5.2 (Constant Coefficients). The linear non-homogeneous differential equation


y0 = a y + b (1.5.2)
with a 6= 0, b constants, has infinitely many solutions,
y(t) = c e^{at} − b/a, c ∈ R. (1.5.3)
Furthermore, the initial value problem
y 0 = a y + b, y(0) = y0 , (1.5.4)
has a unique solution given by
y(t) = (y0 + b/a) e^{at} − b/a. (1.5.5)

Remarks:
(a) The expression in Eq. (1.5.3) is called the general solution of the differential equation.
(b) The proof is to transform the linear equation into a separable equation. Then we integrate on
both sides with respect to t. Notice if we integrate with respect to t the linear equation itself,
Eq. (1.5.2), without transforming it into separable, then we cannot solve the problem. Indeed,
∫ y′(t) dt = a ∫ y(t) dt + bt + c, c ∈ R.
The Fundamental Theorem of Calculus implies y(t) = ∫ y′(t) dt, so we get
y(t) = a ∫ y(t) dt + bt + c.

Integrating both sides of the differential equation is not enough to find a solution y. We still
need to find a primitive of y. We have only rewritten the original differential equation as an
integral equation. Simply integrating both sides of a linear equation does not solve the equation.

Proof of Theorem 1.5.2: Rewrite the differential equation as a separable equation,
y′(t) / (a y(t) + b) = 1 ⇒ (1/a) ∫ y′(t) dt / (y(t) + (b/a)) = ∫ dt.
In the left-hand side introduce the substitution
u = y + (b/a), du = y′ dt ⇒ (1/a) ∫ du/u = ∫ dt.
The integral is now simple to find,
(1/a) ln(|u|) = t + c0 ⇒ ln(|y + (b/a)|) = at + c1, c1 = a c0.
Compute exponentials on both sides above, and recall that e^{(a1+a2)} = e^{a1} e^{a2},
|y + (b/a)| = e^{at+c1} = e^{at} e^{c1}.
We take out the absolute value,
y(t) + (b/a) = (±e^{c1}) e^{at} ⇒ y(t) = c2 e^{at} − b/a,
where c2 = ±e^{c1}. We know that at t = 0 we have y(0) = y0, so
y0 = y(0) = c2 − b/a ⇒ c2 = y0 + b/a.
So the solution to the initial value problem is
y(t) = (y0 + b/a) e^{at} − b/a.
This establishes the Theorem. 
Now we know how to solve the exponential growth equation with immigration from the Exam-
ple 1.5.1.

Example 1.5.2 (Exponential Population Model with Immigration). Solve the differential equation
that describes the rabbit population in Example 1.5.1 knowing that the initial rabbit population is
100 rabbits.
Solution: We follow the calculation in the proof of Theorem 1.5.2. The population of rabbits has unlimited food resources and immigration, so the differential equation for such a system is
y′(t) = a y(t) + b,
where y(t) is the rabbit population at the time t, a is the growth rate per capita coefficient, and b is the immigration rate coefficient. So in this case we have
a = 2, b = 25.
We have seen in previous sections how to solve this differential equation, but we do it again here. Rewrite the differential equation as a separable equation,
y′(t) / (2 y(t) + 25) = 1 ⇒ (1/2) ∫ y′(t) dt / (y(t) + (25/2)) = ∫ dt.
In the left-hand side introduce the substitution
u = y + (25/2), du = y′ dt ⇒ (1/2) ∫ du/u = ∫ dt.
The integral is now simple to find,
(1/2) ln(|u|) = t + c0 ⇒ ln(|y + (25/2)|) = 2t + c1, c1 = 2 c0.
Compute exponentials on both sides above, and recall that e^{(a1+a2)} = e^{a1} e^{a2},
|y + (25/2)| = e^{2t+c1} = e^{2t} e^{c1}.
We take out the absolute value,
y(t) + (25/2) = (±e^{c1}) e^{2t} ⇒ y(t) = c2 e^{2t} − 25/2,
where c2 = ±e^{c1}. We know that at t = 0 we have 100 rabbits, so
100 = y(0) = c2 − 25/2 ⇒ c2 = 100 + 25/2 = 225/2.
So the solution to our problem is
y(t) = (225/2) e^{2t} − 25/2.
C
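The same population can be obtained directly from the solution formula (1.5.5). The sketch below, assuming NumPy, evaluates y(t) = (y0 + b/a) e^{at} − b/a with a = 2, b = 25, y0 = 100 and compares it with the expression found above.

import numpy as np

# Sketch: evaluate the constant-coefficient formula (1.5.5) for the rabbit model.
a, b, y0 = 2.0, 25.0, 100.0                   # growth rate, immigration rate, initial population

def y(t):
    return (y0 + b/a)*np.exp(a*t) - b/a       # Eq. (1.5.5) with t0 = 0

for t in (0.0, 0.5, 1.0):
    direct = 112.5*np.exp(2*t) - 12.5         # (225/2) e^{2t} - 25/2 from the example
    print(f"t = {t:3.1f}:  formula = {y(t):10.2f},  example = {direct:10.2f}")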

1.5.2. Newton’s Cooling Law. Our next example is about the Newton cooling law, which
says that the temperature T at a time t of a material placed in a surrounding medium kept at a
constant temperature Ts satisfies the linear differential equation
(∆T )0 = −k (∆T ),
with ∆T (t) = T (t) − Ts , and k > 0, constant, characterizing the material thermal properties.
Remarks:
(a) Although Newton's law is called a “Cooling Law”, it also describes objects that warm up. When the initial temperature difference, (∆T)(0) = T(0) − Ts, is positive the object cools down, but when (∆T)(0) is negative the object warms up.
(b) Newton’s cooling law for ∆T is the same as the radioactive decay equation. But now (∆T )(0)
can be either positive or negative.
(c) The solution of Newton’s cooling law equation is
(∆T )(t) = (∆T )(0) e−kt ,
for some initial temperature difference (∆T )(0) = T (0) − Ts . Since (∆T )(t) = T (t) − Ts ,
T (t) = (T (0) − Ts ) e−kt + Ts .
Example 1.5.3 (Newton’s Cooling Law). A cup with water at 45 C is placed in the cooler held at
5 C. If after 2 minutes the water temperature is 25 C, when will the water temperature be 15 C?
Solution: The solution to the Newton cooling law equation is
(∆T )(t) = (∆T )(0) e−kt ⇒ T (t) = (T (0) − Ts ) e−kt + Ts ,
and we also know that in this case we have
T (0) = 45, Ts = 5, T (2) = 25.
In this example we need to find t1 such that T (t1 ) = 15. In order to find that t1 we first need to
find the constant k,
T (t) = (45 − 5) e−kt + 5 ⇒ T (t) = 40 e−kt + 5.
Now use the fact that T(2) = 25 C, that is,
20 = T(2) = 40 e^{−2k} ⇒ ln(1/2) = −2k ⇒ k = (1/2) ln(2).
Having the constant k we can now go on and find the time t1 such that T(t1) = 15 C.
T(t) = 40 e^{−t ln(√2)} + 5 ⇒ 10 = 40 e^{−t1 ln(√2)} ⇒ t1 = 4.
C
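Both unknowns in this example, the decay constant k and the time t1, follow from two logarithms, which the sketch below (assuming NumPy) computes from the data T(0) = 45, Ts = 5, T(2) = 25, T(t1) = 15.

import numpy as np

# Sketch: recover k and t1 from T(t) = (T0 - Ts) e^{-k t} + Ts.
T0, Ts = 45.0, 5.0
k  = -np.log((25.0 - Ts)/(T0 - Ts))/2.0       # from T(2) = 25:  e^{-2k} = 20/40
t1 = -np.log((15.0 - Ts)/(T0 - Ts))/k         # from T(t1) = 15: e^{-k t1} = 10/40

print(f"k  = {k:.6f}   (ln(2)/2 = {np.log(2)/2:.6f})")
print(f"t1 = {t1:.6f}   (the example gives t1 = 4)")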

1.5.3. Variable Coefficients. We have solved constant coefficient linear equations by trans-
forming them into separable equations, and then integrating with respect to the independent vari-
able. This idea can be used for a particular type of variable coefficient linear equations, those with
b/a constant.

Example 1.5.4 (Variable Coefficient Case b = 0). Find all solutions of the linear equation
y 0 = a(t) y.

Solution: We transform the linear equation into a separable equation
y′/y = a(t) ⇒ (ln(|y|))′ = a(t) ⇒ ln(|y(t)|) = A(t) + c0,
where A = ∫ a dt is a primitive, or antiderivative, of a. Therefore,
y(t) = ±e^{A(t)+c0} = ±e^{A(t)} e^{c0},
so we get the solution y(t) = c e^{A(t)}, where c = ±e^{c0}. C

Remark: The solutions of y′ = 2t y are y(t) = c e^{t²}, where c ∈ R.
However, the case where b/a is not constant is not so simple to solve, because the linear
equation cannot be transformed into a separable equation. We need a new idea. We now show
an idea that works with all first order linear equations with variable coefficients—the integrating
factor method.
We now state our main result—the formula for the solutions of linear differential equations
with variable coefficients.

Theorem 1.5.3 (Variable Coefficients). If the functions a, b are continuous on (t1 , t2 ), then
y 0 = a(t) y + b(t), (1.5.6)
has infinitely many solutions on (t1 , t2 ) given by
y(t) = c e^{A(t)} + e^{A(t)} ∫ e^{−A(t)} b(t) dt, (1.5.7)
where A(t) = ∫ a(t) dt is any antiderivative of the function a and c ∈ R. Furthermore, for any t0 ∈ (t1, t2) and y0 ∈ R the initial value problem
y′ = a(t) y + b(t), y(t0) = y0, (1.5.8)
has the unique solution y on the same domain (t1, t2), given by
y(t) = y0 e^{Â(t)} + e^{Â(t)} ∫_{t0}^{t} e^{−Â(s)} b(s) ds, (1.5.9)
where the function Â(t) = ∫_{t0}^{t} a(s) ds is a particular antiderivative of the function a.

Remarks:
(a) The expression in Eq. (1.5.7) is called the general solution of the differential equation.
(b) The function µ(t) = e^{−A(t)} is called an integrating factor of the equation.

Example 1.5.5 (Consistency with Constant Coefficients General Solution). Show that for constant
coefficient equations the solution formula given in Eq. (1.5.7) reduces to Eq. (1.5.3).
Solution: In the particular case of constant coefficient equations, a primitive, or antiderivative, for the constant function a is A(t) = at, so
y(t) = c e^{at} + e^{at} ∫ e^{−at} b dt.
Since b is constant, the integral in the second term above can be computed explicitly,
e^{at} ∫ b e^{−at} dt = e^{at} ( −(b/a) e^{−at} ) = −b/a.
Therefore, in the case of a, b constants we obtain y(t) = c e^{at} − b/a, given in Eq. (1.5.3). C

Example 1.5.6 (Consistency with Constant Coefficients IVP). Show that for constant coefficient
equations the solution formula given in Eq. (1.5.9) reduces to Eq. (1.5.5).
Solution: In the particular case of a constant coefficient equation, where a, b ∈ R, the solution given in Eq. (1.5.9) reduces to the one given in Eq. (1.5.5). Indeed,
Â(t) = ∫_{t0}^{t} a ds = a(t − t0), ∫_{t0}^{t} e^{−a(s−t0)} b ds = −(b/a) e^{−a(t−t0)} + b/a.
Therefore, the solution y can be written as
y(t) = y0 e^{a(t−t0)} + e^{a(t−t0)} ( −(b/a) e^{−a(t−t0)} + b/a ) = (y0 + b/a) e^{a(t−t0)} − b/a.
C

Linear equations with variable coefficients having b/a nonconstant cannot be transformed into
separable equations. We need a new idea to solve such equations. We now show one way to solve
these equations—the integrating factor method.
Proof of Theorem 1.5.3: Write the differential equation with y on one side only,
y 0 − a y = b,
and then multiply the differential equation by a function µ, called an integrating factor,
µ y 0 − a µ y = µ b. (1.5.10)
The critical step is to choose a function µ such that
− a µ = µ0 . (1.5.11)
For any function µ solution of Eq. (1.5.11), the differential equation in (1.5.10) has the form
µ y 0 + µ0 y = µ b.
But the left-hand side is a total derivative of a product of two functions,
(µ y)′ = µ b. (1.5.12)
This is the property we want in an integrating factor, µ. We want to find a function µ such that
the left-hand side of the differential equation for y can be written as a total derivative, just as in
Eq. (1.5.12). We need to find just one of such functions µ. So we go back to Eq. (1.5.11), the
differential equation for µ, which is simple to solve,
µ′ = −a µ ⇒ µ′/µ = −a ⇒ (ln(|µ|))′ = −a ⇒ ln(|µ|) = −A + c0,
where A = ∫ a dt is a primitive or antiderivative of a, and c0 is an arbitrary constant. Computing the exponential of both sides we get
µ = ±e^{c0} e^{−A} ⇒ µ = c1 e^{−A}, c1 = ±e^{c0}.
Since c1 is a constant which will cancel out from Eq. (1.5.10) anyway, we choose the integration
constant c0 = 0, hence c1 = 1. The integrating factor is then
µ(t) = e−A(t) .
This function is an integrating factor, because if we start again at Eq. (1.5.10), we get
e^{−A} y′ − a e^{−A} y = e^{−A} b ⇒ e^{−A} y′ + (e^{−A})′ y = e^{−A} b,
where we used the main property of the integrating factor, −a e^{−A} = (e^{−A})′. Now the product rule for derivatives implies that the left-hand side above is a total derivative,
(e^{−A} y)′ = e^{−A} b.
Integrating on both sides we get
e^{−A} y = ∫ e^{−A} b dt + c ⇒ e^{−A} y − ∫ e^{−A} b dt = c.
The function ψ(t, y) = e^{−A} y − ∫ e^{−A} b dt is called a potential function of the differential equation. The solution of the differential equation can be computed from the second equation above, ψ = c, and the result is
y(t) = c e^{A(t)} + e^{A(t)} ∫ e^{−A(t)} b(t) dt.
This establishes the first part of the Theorem. For the furthermore part, let us use the notation K(t) = ∫ e^{−A(t)} b(t) dt, and then introduce the initial condition in (1.5.8), which fixes the constant c in the general solution,
y0 = y(t0) = c e^{A(t0)} + e^{A(t0)} K(t0).
So we get the constant c,
c = y0 e^{−A(t0)} − K(t0).
Using this expression in the general solution above,
y(t) = ( y0 e^{−A(t0)} − K(t0) ) e^{A(t)} + e^{A(t)} K(t) = y0 e^{A(t)−A(t0)} + e^{A(t)} ( K(t) − K(t0) ).
Let us introduce the particular primitives Â(t) = A(t) − A(t0) and K̂(t) = K(t) − K(t0), which vanish at t0, that is,
Â(t) = ∫_{t0}^{t} a(s) ds, K̂(t) = ∫_{t0}^{t} e^{−A(s)} b(s) ds.
Then the solution y of the IVP has the form
y(t) = y0 e^{Â(t)} + e^{A(t)} ∫_{t0}^{t} e^{−A(s)} b(s) ds,
which is equivalent to
y(t) = y0 e^{Â(t)} + e^{A(t)−A(t0)} ∫_{t0}^{t} e^{−(A(s)−A(t0))} b(s) ds,
so we conclude that
y(t) = y0 e^{Â(t)} + e^{Â(t)} ∫_{t0}^{t} e^{−Â(s)} b(s) ds.

This establishes the IVP part of the Theorem. 


In the following examples we show the main ideas used in the proof of the Theorem 1.5.3 above
to find solutions to linear equations with variable coefficients.

Example 1.5.7 (The Integrating Factor Method). Find all solutions y to the differential equation
y′ = (3/t) y + t⁵, t > 0.

Solution: Rewrite the equation with y on only one side,
y′ − (3/t) y = t⁵.
Multiply the differential equation by a function µ, which we determine later,
µ(t) ( y′ − (3/t) y ) = t⁵ µ(t) ⇒ µ(t) y′ − (3/t) µ(t) y = t⁵ µ(t).
We need to choose a positive function µ having the following property,
−(3/t) µ(t) = µ′(t) ⇒ −3/t = µ′(t)/µ(t) ⇒ −3/t = (ln(|µ|))′.
Integrating,
ln(|µ|) = −∫ (3/t) dt = −3 ln(|t|) + c0 = ln(|t|⁻³) + c0 ⇒ µ = (±e^{c0}) e^{ln(|t|⁻³)},
so we get µ = (±e^{c0}) |t|⁻³. We need only one integrating factor, so we choose
µ = t⁻³.
We now go back to the differential equation for y and we multiply it by this integrating factor,
t⁻³ ( y′ − (3/t) y ) = t⁻³ t⁵ ⇒ t⁻³ y′ − 3 t⁻⁴ y = t².
Using that −3 t⁻⁴ = (t⁻³)′ and t² = (t³/3)′, we get
t⁻³ y′ + (t⁻³)′ y = (t³/3)′ ⇒ (t⁻³ y)′ = (t³/3)′ ⇒ (t⁻³ y − t³/3)′ = 0.
This last equation is a total derivative of a potential function ψ(t, y) = t⁻³ y − t³/3. Since the equation is a total derivative, this confirms that we got a correct integrating factor. Now we need to integrate the total derivative, which is simple to do,
t⁻³ y − t³/3 = c ⇒ t⁻³ y = c + t³/3 ⇒ y(t) = c t³ + t⁶/3,
where c is an arbitrary constant. C
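A quick symbolic check of this general solution is shown below, assuming SymPy is available: substituting y(t) = c t³ + t⁶/3 into y′ − (3/t)y − t⁵ gives zero for every constant c.

import sympy as sp

# Sketch: verify that y(t) = c t^3 + t^6/3 solves y' = (3/t) y + t^5.
t, c = sp.symbols('t c', positive=True)
y = c*t**3 + t**6/3
residual = sp.diff(y, t) - (3/t)*y - t**5
print(sp.simplify(residual))                  # prints 0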

Example 1.5.8 (The Integrating Factor Method). Find all solutions y of the equation
ty 0 + 2y = 4t2 , with t > 0.
Solution: Rewrite the equation as
y′ = −(2/t) y + 4t ⇔ a(t) = −2/t, b(t) = 4t. (1.5.13)
Rewrite again,
y′ + (2/t) y = 4t.
Multiply by a function µ,
µ y′ + (2/t) µ y = 4t µ.
Choose the function µ to be a solution of
(2/t) µ = µ′ ⇒ (ln(|µ|))′ = 2/t ⇒ ln(|µ|) = 2 ln(|t|) = ln(t²) ⇒ µ(t) = ±t².
We choose the integrating factor µ = t². Multiply the differential equation by this µ,
t² y′ + 2t y = 4t t² ⇒ (t² y)′ = 4t³.
If we write the right-hand side also as a derivative,
(t² y)′ = (t⁴)′ ⇒ (t² y − t⁴)′ = 0.
So a potential function is ψ(t, y(t)) = t² y(t) − t⁴. Integrating on both sides we obtain
t² y − t⁴ = c ⇒ t² y = c + t⁴ ⇒ y(t) = c/t² + t².
C

In the next example we show how to solve an initial value problem for a linear variable coefficient
equation.

Example 1.5.9 (IVP). Find the function y solution of the initial value problem
ty 0 + 2y = 4t2 , t > 0, y(1) = 2.

Solution: In Example 1.5.8 we computed the general solution of the differential equation,
y(t) = c/t² + t², c ∈ R.
The initial condition implies that
2 = y(1) = c + 1 ⇒ c = 1 ⇒ y(t) = 1/t² + t².
C

In our next example we solve again the problem in Example 1.5.9 but this time we use
Eq. (1.5.9).

Example 1.5.10 (IVP). Find the solution of the problem given in Example 1.5.9, but this time
using Eq. (1.5.9) in Theorem 1.5.3.
Solution: We find the solution simply by using Eq. (1.5.9). First, find the integrating factor
function µ as follows:
A(t) = −∫_1^t (2/s) ds = −2 (ln(t) − ln(1)) = −2 ln(t)   ⇒   A(t) = ln(t^{−2}).
The integrating factor is µ(t) = e^{−A(t)}, that is,
µ(t) = e^{−ln(t^{−2})} = e^{ln(t^2)}   ⇒   µ(t) = t^2.
Note that Eq. (1.5.9) contains e^{A(t)} = 1/µ(t). Then, compute the solution as follows,
y(t) = (1/t^2) ( 2 + ∫_1^t s^2 · 4s ds )
     = 2/t^2 + (1/t^2) ∫_1^t 4s^3 ds
     = 2/t^2 + (1/t^2)(t^4 − 1)
     = 2/t^2 + t^2 − 1/t^2   ⇒   y(t) = 1/t^2 + t^2.
C
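The same formula can also be evaluated numerically. The sketch below (assuming NumPy and SciPy are available) implements y(t) = e^{A(t)} ( y0 + ∫_{t0}^{t} e^{−A(s)} b(s) ds ) by quadrature for the data of Example 1.5.9 and compares it with the closed form y(t) = 1/t^2 + t^2.

```python
import numpy as np
from scipy.integrate import quad

# Coefficients of y' = a(t) y + b(t) from Example 1.5.9: t y' + 2 y = 4 t^2.
a = lambda t: -2.0 / t
b = lambda t: 4.0 * t
t0, y0 = 1.0, 2.0

def A(t):
    # A(t) = integral of a(s) from t0 to t
    val, _ = quad(a, t0, t)
    return val

def y_formula(t):
    # y(t) = e^{A(t)} ( y0 + int_{t0}^{t} e^{-A(s)} b(s) ds )
    integral, _ = quad(lambda s: np.exp(-A(s)) * b(s), t0, t)
    return np.exp(A(t)) * (y0 + integral)

for t in [1.0, 1.5, 2.0, 3.0]:
    exact = 1.0 / t**2 + t**2
    print(t, y_formula(t), exact)   # the two columns should agree
```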

1.5.4. Mixing Problems. We study the system pictured in Fig. 20. A tank has a salt mass
Q(t) dissolved in a volume V (t) of water at a time t. Water is pouring into the tank at a rate
ri (t) with a salt concentration qi (t). Water is also leaving the tank at a rate ro (t) with a salt
concentration qo (t). Recall that a water rate r means water volume per unit time, and a salt
concentration q means salt mass per unit volume.
We assume that the salt entering in the tank gets instantaneously mixed. As a consequence
the salt concentration in the tank is homogeneous at every time. This property simplifies the
mathematical model describing the salt in the tank.
Before stating the problem we want to solve, we review the physical units of the main fields
involved in it. Denote by [ri ] the units of the quantity ri . Then we have
[ri] = [ro] = Volume/Time,   [qi] = [qo] = Mass/Volume,   [V] = Volume,   [Q] = Mass.

Figure 20. Description of a water tank problem: water enters at rate ri with concentration qi(t), the tank (volume V(t), salt mass Q(t)) is instantaneously mixed, and water leaves at rate ro with concentration qo(t).

Definition 1.5.4. A Mixing Problem refers to water coming into a tank at a rate ri with salt
concentration qi , and going out the tank at a rate ro and salt concentration qo , so that the water
volume V and the total amount of salt Q, which is instantaneously mixed, in the tank satisfy the
following equations,
V′(t) = ri(t) − ro(t), (1.5.14)
Q′(t) = ri(t) qi(t) − ro(t) qo(t), (1.5.15)
qo(t) = Q(t)/V(t), (1.5.16)
ri′(t) = ro′(t) = 0. (1.5.17)

Remarks:
(a) The first equation says that the variation in time of the water volume inside the tank is the
difference of volume time rates coming in and going out of the tank.
(b) The second equation above says that the variation in time of the amount of salt in the tank is the difference of the amounts of salt coming in and going out of the tank. Each of these amounts is the product of a water rate r times a salt concentration q. This product has units of mass per time and represents the amount of salt entering or leaving the tank per unit time.
(c) Eq. (1.5.16) is the consequence of the instantaneous mixing mechanism in the tank. Since
the salt in the tank is well-mixed, the salt concentration is homogeneous in the tank, with
value Q(t)/V (t). Finally the equations in (1.5.17) say that both rates in and out are time
independent, hence constants.

Theorem 1.5.5 (Mixing Problem). The amount of salt in the mixing problem above satisfies the
equation
Q0 (t) = a(t) Q(t) + b(t), (1.5.18)
where the coefficients in the equation are given by
a(t) = −ro/((ri − ro) t + V0),   b(t) = ri qi(t). (1.5.19)

Proof of Theorem 1.5.5: The equation for the salt in the tank given in (1.5.18) comes from
Eqs. (1.5.14)-(1.5.17). We start noting that Eq. (1.5.17) says that the water rates are constant. We
denote them as ri and ro . This information in Eq. (1.5.14) implies that V 0 is constant. Then we
can easily integrate this equation to obtain
V (t) = (ri − ro ) t + V0 , (1.5.20)
where V0 = V (0) is the water volume in the tank at the initial time t = 0. On the other hand,
Eqs. (1.5.15) and (1.5.16) imply that
Q′(t) = ri qi(t) − (ro/V(t)) Q(t).
Since V(t) is known from Eq. (1.5.20), we get that the function Q must be solution of the differential equation
Q′(t) = ri qi(t) − ro/((ri − ro) t + V0) Q(t).
This is a linear ODE for the function Q. Indeed, introducing the functions
a(t) = −ro/((ri − ro) t + V0),   b(t) = ri qi(t),
the differential equation for Q has the form
Q′(t) = a(t) Q(t) + b(t).
This establishes the Theorem. 

Example 1.5.11 (General Case for V (t) = V0 ). Consider a mixing problem with equal constant
water rates ri = ro = r, with constant incoming concentration qi , and with a given initial water
volume in the tank V0 . Then, find the solution to the initial value problem
Q0 (t) = a(t) Q(t) + b(t), Q(0) = Q0 ,
where function a and b are given in Eq. (1.5.19). Graph the solution function Q for different values
of the initial condition Q0 .
Solution: The assumption ri = ro = r implies that the function a is constant, while the assumption that qi is constant implies that the function b is constant too,
a(t) = −ro/((ri − ro) t + V0)   ⇒   a(t) = −r/V0 = a0,
b(t) = ri qi(t)   ⇒   b(t) = r qi = b0.
Then, we must solve the initial value problem for a constant coefficients linear equation,
Q′(t) = a0 Q(t) + b0,   Q(0) = Q0.
The integrating factor method can be used to find the solution of the initial value problem above. The formula for the solution is given in Theorem 1.5.2,
Q(t) = (Q0 + b0/a0) e^{a0 t} − b0/a0.
In our case we can evaluate the constant b0/a0, and the result is
b0/a0 = (r qi)(−V0/r)   ⇒   −b0/a0 = qi V0.
Then, the solution Q has the form
Q(t) = (Q0 − qi V0) e^{−r t/V0} + qi V0. (1.5.21)
The initial amount of salt Q0 in the tank can be any non-negative real number. The solution
behaves differently for different values of Q0 . We classify these values in three classes:
(a) The initial amount of salt in the tank is the critical value Q0 = qi V0 . In this case the solution
Q remains constant equal to this critical value, that is, Q(t) = qi V0 .
(b) The initial amount of salt in the tank is bigger than the critical value, Q0 > qi V0 . In this case
the salt in the tank Q decreases exponentially towards the critical value.
(c) The initial amount of salt in the tank is smaller than the critical value, Q0 < qi V0 . In this case
the salt in the tank Q increases exponentially towards the critical value.
The graphs of a few solutions in these three classes are plotted in Fig. 21.

Figure 21. The function Q in (1.5.21) for a few values of the initial condition Q0.
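Formula (1.5.21) can also be compared with a direct numerical integration of the mixing equation. The sketch below (assuming NumPy and SciPy are available) uses illustrative values r = 3, V0 = 100, qi = 2 — these numbers are assumptions for the test, not data from the text — and checks initial conditions below, at, and above the critical value qi V0.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical sample values: r in liters/min, V0 in liters, qi in grams/liter.
r, V0, qi = 3.0, 100.0, 2.0

def rhs(t, Q):
    # Q'(t) = r qi - (r/V0) Q(t): the mixing equation with ri = ro = r.
    return r * qi - (r / V0) * Q

for Q0 in [0.0, qi * V0, 400.0]:   # below, at, and above the critical value qi*V0
    sol = solve_ivp(rhs, (0.0, 200.0), [Q0], t_eval=[0.0, 50.0, 200.0])
    exact = (Q0 - qi * V0) * np.exp(-r * sol.t / V0) + qi * V0   # Eq. (1.5.21)
    print(Q0, np.max(np.abs(sol.y[0] - exact)))   # discrepancy should be tiny
```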

Example 1.5.12 (Find a particular time, for V (t) = V0 ). Consider a mixing problem with equal
constant water rates ri = ro = r and fresh water is coming into the tank, hence qi = 0. Then, find
the time t1 such that the salt concentration in the tank Q(t)/V (t) is 1% the initial value. Write
that time t1 in terms of the rate r and initial water volume V0 .
Solution: The first step to solve this problem is to find the solution Q of the initial value problem
Q′(t) = a(t) Q(t) + b(t),   Q(0) = Q0,
where the functions a and b are given in Eq. (1.5.19). In this case they are
a(t) = −ro/((ri − ro) t + V0)   ⇒   a(t) = −r/V0,
b(t) = ri qi(t)   ⇒   b(t) = 0.
The initial value problem we need to solve is
Q′(t) = −(r/V0) Q(t),   Q(0) = Q0.
From Section ?? we know that the solution is given by
Q(t) = Q0 e^{−r t/V0}.
We can now proceed to find the time t1. We first need to find the concentration Q(t)/V(t). We already have Q(t) and we know that V(t) = V0, since ri = ro. Therefore,
Q(t)/V(t) = Q(t)/V0 = (Q0/V0) e^{−r t/V0}.
The condition that defines t1 is
Q(t1)/V(t1) = (1/100) Q0/V0.
From these two equations above we conclude that
(1/100) Q0/V0 = Q(t1)/V(t1) = (Q0/V0) e^{−r t1/V0}.
The time t1 comes from the equation
1/100 = e^{−r t1/V0}   ⇔   ln(1/100) = −r t1/V0   ⇔   ln(100) = r t1/V0.
The final result is given by
t1 = (V0/r) ln(100).
C
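For concreteness, the formula t1 = (V0/r) ln(100) can be evaluated for sample numbers; the values below (V0 = 100, r = 5, Q0 = 25) are illustrative assumptions, not data from this example.

```python
import numpy as np

V0, r, Q0 = 100.0, 5.0, 25.0            # hypothetical tank data

t1 = (V0 / r) * np.log(100.0)           # time derived above
conc_ratio = (Q0 * np.exp(-r * t1 / V0) / V0) / (Q0 / V0)
print(t1, conc_ratio)                   # conc_ratio should be 0.01
```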

Example 1.5.13 (Variable Water Volume). A tank with a maximum capacity of 100 liters originally contains 20 liters of water with 50 grams of salt in solution. Fresh water is poured into the tank at a rate of 5 liters per minute. The well-stirred water is allowed to pour out of the tank at a rate of 3 liters per minute. Find the amount of salt in the tank at the time tc when the tank is about to overflow.
Solution: We start with the water volume conservation,
V′(t) = ri − ro = 5 − 3 = 2   ⇒   V(t) = 2t + V0.
Since V(0) = 20 we get the function volume of water in the tank,
V(t) = 2t + 20.
Right now we can compute the time when the tank overflows,
100 = V(tc) = 2 tc + 20   ⇒   tc = (100 − 20)/2   ⇒   tc = 40 min.
We now need to find the equation for the salt in the tank. We start with the salt mass conservation
Q′(t) = ri qi − ro qo(t),   qi = 0   ⇒   Q′(t) = −ro qo(t).
Since the water in the tank is well-stirred, qo(t) = Q(t)/V(t), so
Q′(t) = −(ro/V(t)) Q(t)   ⇒   Q′(t) = −3/(2t + 20) Q(t).
This is a linear differential equation with variable coefficients. But it can be converted into a separable equation,
Q′(t)/Q(t) = −3/(2t + 20)   ⇒   ∫ dQ/Q = −∫ 3 dt/(2t + 20).
Integrating we get
ln|Q(t)| = −(3/2) ∫ dt/(t + 10) = −(3/2) ln(t + 10) + c0   ⇒   ln|Q(t)| = ln((t + 10)^{−3/2}) + c0.
We can now compute exponentials on both sides,
|Q(t)| = (t + 10)^{−3/2} e^{c0}   ⇒   Q(t) = c (t + 10)^{−3/2},   c = (±e^{c0}).
The initial condition Q(0) = 50 fixes the constant c,
50 = Q(0) = c (0 + 10)^{−3/2} = c/(10)^{3/2}   ⇒   c = 50 (10)^{3/2}.
Therefore,
Q(t) = 50 (10)^{3/2}/(t + 10)^{3/2}   ⇒   Q(t) = 50/((t/10) + 1)^{3/2}.
This result is reasonable: the amount of salt decreases as a function of time, since we are adding fresh water to the tank. The amount of salt at the time the tank overflows is Q(tc),
Q(40) = 50/((40/10) + 1)^{3/2}   ⇒   Q(40) = 50/5^{3/2}.
C
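The value Q(40) = 50/5^{3/2} can be confirmed by integrating the equation Q′ = −3Q/(2t + 20) numerically. The following sketch assumes SciPy is available.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, Q):
    # Q'(t) = -3 Q(t) / (2t + 20): fresh water in at 5 l/min, mixture out at 3 l/min.
    return -3.0 * Q / (2.0 * t + 20.0)

sol = solve_ivp(rhs, (0.0, 40.0), [50.0], t_eval=[40.0], rtol=1e-9, atol=1e-12)
Q_formula = 50.0 / ((40.0 / 10.0) + 1.0) ** 1.5   # Q(40) = 50 / 5^{3/2}
print(sol.y[0, -1], Q_formula)                    # both approximately 4.47 grams
```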

Example 1.5.14 (Nonzero qi , for V (t) = V0 ). Consider a mixing problem with equal constant
water rates ri = ro = r, with only fresh water in the tank at the initial time, hence Q0 = 0 and
with a given initial volume of water in the tank V0 . Then find the function salt in the tank Q if the
incoming salt concentration is given by the function
qi (t) = 2 + sin(2t).

Solution: We need to find the solution Q to the initial value problem


Q′(t) = a(t) Q(t) + b(t),   Q(0) = 0,
where the functions a and b are given in Eq. (1.5.19). In this case we have
a(t) = −ro/((ri − ro) t + V0)   ⇒   a(t) = −r/V0 = −a0,
b(t) = ri qi(t)   ⇒   b(t) = r (2 + sin(2t)).
We are changing the sign convention for a0 so that a0 > 0. The initial value problem we need to solve is
Q′(t) = −a0 Q(t) + b(t),   Q(0) = 0.
The solution is computed using the integrating factor method and the result is
Q(t) = e^{−a0 t} ∫_0^t e^{a0 s} b(s) ds,
where we used that the initial condition is Q0 = 0. Recalling the definition of the function b, and setting r = 1 to simplify the expressions, we obtain
Q(t) = e^{−a0 t} ∫_0^t e^{a0 s} (2 + sin(2s)) ds.
This is the formula for the solution of the problem; we only need to compute the integral given in the equation above. This is not straightforward though. We start with the following integral found in an integration table,
∫ e^{ks} sin(ls) ds = e^{ks}/(k^2 + l^2) (k sin(ls) − l cos(ls)),
where k and l are constants. Therefore,
∫_0^t e^{a0 s} (2 + sin(2s)) ds = [(2/a0) e^{a0 s}]_0^t + [e^{a0 s}/(a0^2 + 2^2) (a0 sin(2s) − 2 cos(2s))]_0^t
= (2/a0)(e^{a0 t} − 1) + e^{a0 t}/(a0^2 + 2^2) (a0 sin(2t) − 2 cos(2t)) + 2/(a0^2 + 2^2).
With the integral above we can compute the solution Q as follows,
Q(t) = e^{−a0 t} [ (2/a0)(e^{a0 t} − 1) + e^{a0 t}/(a0^2 + 2^2) (a0 sin(2t) − 2 cos(2t)) + 2/(a0^2 + 2^2) ],
recalling that a0 = r/V0. We rewrite the expression above as follows,
Q(t) = 2/a0 + [ 2/(a0^2 + 2^2) − 2/a0 ] e^{−a0 t} + 1/(a0^2 + 2^2) (a0 sin(2t) − 2 cos(2t)). (1.5.22)
C

Figure 22. The graph of the function Q given in Eq. (1.5.22) for a0 = 1.
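Expression (1.5.22) can be checked against a numerical solution of the differential equation. The sketch below (assuming NumPy and SciPy are available) takes a0 = 1 and, consistently with the computation above, uses b(t) = 2 + sin(2t), which corresponds to the simplifying choice r = 1.

```python
import numpy as np
from scipy.integrate import solve_ivp

a0 = 1.0   # a0 = r/V0, with r = 1 so that b(t) = 2 + sin(2t)

def rhs(t, Q):
    return -a0 * Q + 2.0 + np.sin(2.0 * t)

def Q_closed(t):
    # Eq. (1.5.22)
    return (2.0 / a0
            + (2.0 / (a0**2 + 4.0) - 2.0 / a0) * np.exp(-a0 * t)
            + (a0 * np.sin(2.0 * t) - 2.0 * np.cos(2.0 * t)) / (a0**2 + 4.0))

ts = np.linspace(0.0, 10.0, 50)
sol = solve_ivp(rhs, (0.0, 10.0), [0.0], t_eval=ts, rtol=1e-9, atol=1e-12)
print(np.max(np.abs(sol.y[0] - Q_closed(ts))))   # should be close to zero
```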

1.5.5. The Bernoulli Equation. In 1696 Jacob Bernoulli solved what is now known as the
Bernoulli differential equation. This is a first order nonlinear differential equation. The following
year Leibniz solved this equation by transforming it into a linear equation. We now explain Leibniz’s
idea in more detail.

Definition 1.5.6. The Bernoulli equation is

y′ = p(t) y + q(t) y^n, (1.5.23)
where p, q are given functions and n ∈ ℝ.

Remarks:
(a) For n ≠ 0, 1 the equation is nonlinear.
(b) If n = 2 we get the logistic equation (we'll study it in a later chapter),
y′ = r y (1 − y/K).
(c) This is not the Bernoulli equation from fluid dynamics.

The Bernoulli equation is special in the following sense: it is a nonlinear equation that can be
transformed into a linear equation.
62 1. FIRST ORDER EQUATIONS

Theorem 1.5.7 (Bernoulli). The function y is a solution of the Bernoulli equation


y′ = p(t) y + q(t) y^n,   n ≠ 1,
iff the function v = 1/y^{n−1} is solution of the linear differential equation
v′ = −(n − 1) p(t) v − (n − 1) q(t).

Remark: This result summarizes Leibniz's idea to solve the Bernoulli equation: transform the Bernoulli equation for y, which is nonlinear, into a linear equation for v = 1/y^{n−1}. One then solves the linear equation for v using the integrating factor method. The last step is to transform back to y = (1/v)^{1/(n−1)}.
Proof of Theorem 1.5.7: Divide the Bernoulli equation by y^n,
y′/y^n = p(t)/y^{n−1} + q(t).
Introduce the new unknown v = y^{−(n−1)} and compute its derivative,
v′ = (y^{−(n−1)})′ = −(n − 1) y^{−n} y′   ⇒   −v′(t)/(n − 1) = y′(t)/y^n(t).
If we substitute v and this last equation into the Bernoulli equation we get
−v′/(n − 1) = p(t) v + q(t)   ⇒   v′ = −(n − 1) p(t) v − (n − 1) q(t).
This establishes the Theorem. 

Example 1.5.15. Find every nonzero solution of the differential equation


y′ = y + 2 y^5.

Solution: This is a Bernoulli equation for n = 5. Divide the equation by y^5,
y′/y^5 = 1/y^4 + 2.
Introduce the function v = 1/y^4 and its derivative v′ = −4 (y′/y^5) into the differential equation above,
−v′/4 = v + 2   ⇒   v′ = −4 v − 8   ⇒   v′ + 4 v = −8.
The last equation is a linear differential equation for the function v. This equation can be solved using the integrating factor method. Multiply the equation by µ(t) = e^{4t}, then
(e^{4t} v)′ = −8 e^{4t}   ⇒   e^{4t} v = −2 e^{4t} + c.
We obtain that v = c e^{−4t} − 2. Since v = 1/y^4,
1/y^4 = c e^{−4t} − 2   ⇒   y(t) = ±1/(c e^{−4t} − 2)^{1/4}.
C
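The solution just found can be verified symbolically. The following sketch (assuming SymPy is available) substitutes y(t) = (c e^{−4t} − 2)^{−1/4} into the Bernoulli equation and checks that the residual vanishes.

```python
import sympy as sp

t = sp.symbols('t')
c = sp.symbols('c', positive=True)

# Solution found above (taking the + sign): y = 1 / (c e^{-4t} - 2)^{1/4}.
y = 1 / (c*sp.exp(-4*t) - 2)**sp.Rational(1, 4)

# Residual of the Bernoulli equation y' = y + 2 y^5; it should simplify to 0.
residual = sp.simplify(y.diff(t) - y - 2*y**5)
print(residual)   # should print 0
```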

Example 1.5.16. Given any constants a0 , b0 , find every solution of the differential equation
y′ = a0 y + b0 y^3.

Solution: This is a Bernoulli equation with n = 3. Divide the equation by y^3,
y′/y^3 = a0/y^2 + b0.
Introduce the function v = 1/y^2 and its derivative v′ = −2 (y′/y^3) into the differential equation above,
−v′/2 = a0 v + b0   ⇒   v′ = −2 a0 v − 2 b0   ⇒   v′ + 2 a0 v = −2 b0.
The last equation is a linear differential equation for v. This equation can be solved using the integrating factor method. Multiply the equation by µ(t) = e^{2 a0 t},
(e^{2 a0 t} v)′ = −2 b0 e^{2 a0 t}   ⇒   e^{2 a0 t} v = −(b0/a0) e^{2 a0 t} + c.
We obtain that v = c e^{−2 a0 t} − b0/a0. Since v = 1/y^2,
1/y^2 = c e^{−2 a0 t} − b0/a0   ⇒   y(t) = ±1/(c e^{−2 a0 t} − b0/a0)^{1/2}.

Example 1.5.17. Find every solution of the equation t y′ = 3y + t^5 y^{1/3}.

Solution: Rewrite the differential equation as
y′ = (3/t) y + t^4 y^{1/3}.
This is a Bernoulli equation for n = 1/3. Divide the equation by y^{1/3},
y′/y^{1/3} = (3/t) y^{2/3} + t^4.
Define the new unknown function v = 1/y^{n−1}, that is, v = y^{2/3}, compute its derivative, v′ = (2/3)(y′/y^{1/3}), and introduce them in the differential equation,
(3/2) v′ = (3/t) v + t^4   ⇒   v′ − (2/t) v = (2/3) t^4.
This is a linear equation for v. Integrate this equation using the integrating factor method. To compute the integrating factor we need to find
A(t) = ∫ (2/t) dt = 2 ln(t) = ln(t^2).
Then, the integrating factor is µ(t) = e^{−A(t)}. In this case we get
µ(t) = e^{−ln(t^2)} = e^{ln(t^{−2})}   ⇒   µ(t) = 1/t^2.
Therefore, the equation for v can be written as a total derivative,
(1/t^2)(v′ − (2/t) v) = (2/3) t^2   ⇒   (v/t^2 − (2/9) t^3)′ = 0.
The potential function is ψ(t, v) = v/t^2 − (2/9) t^3 and the solution of the differential equation is ψ(t, v(t)) = c, that is,
v/t^2 − (2/9) t^3 = c   ⇒   v(t) = t^2 (c + (2/9) t^3)   ⇒   v(t) = c t^2 + (2/9) t^5.
Once v is known we compute the original unknown y = ±v^{3/2}, where the double sign is related to taking the square root. We finally obtain
y(t) = ±(c t^2 + (2/9) t^5)^{3/2}.
C

Notes. This section corresponds to Boyce-DiPrima [3] Section 2.1, and Simmons [6] Section 2.10. The Bernoulli equation is solved in the exercises of Section 2.4 in Boyce-DiPrima, and in the exercises of Section 2.10 in Simmons.

1.5.6. Exercises.

1.5.1.- Find all solutions of the following differential equations:

(a) y′ = 4t y.
(b) y′ = −y + e^{−2t}.
(c) y′/((t^2 + 1) y) = 4t.
(d) t y′ + n y = t^2,   n > 0.
(e) y′ = y − 2 sin(t).
(f) y′ + t y = t y^2.
(g) y′ = −x y + 6x √y.

1.5.2.- Find the solution y of the initial value problems

(a) y′ = y + 2t e^{2t},   y(0) = 0.
(b) t y′ + 2 y = sin(t)/t,   y(π/2) = 2/π,   for t > 0.
(c) 2ty − y′ = 0,   y(0) = 3.
(d) t y′ = 2 y + 4t^3 cos(4t),   y(π/8) = 0.
(e) y′ = y + 3/y^2,   y(0) = 1.

1.5.3.-

(a) A cup with some liquid is placed in a fridge held at 3 C. If k is the (positive) liquid cooling
constant, find the differential equation satisfied by the temperature of the liquid.
(b) Find the liquid temperature T as a function of time knowing that the liquid's initial temperature when it was placed in the fridge was 18 C.
(c) After 3 minutes the liquid temperature inside the fridge is 13 C. Find the liquid cooling
constant k.

1.5.4.-

(a) A pizza is placed in an oven held at 100 C. If k is the (positive) pizza cooling constant, find the differential equation satisfied by the temperature of the pizza.
(b) Find the pizza temperature T as a function of time knowing that the pizza's initial temperature when it was placed in the oven was 20 C.
(c) After 2 minutes the pizza temperature inside the oven is 80 C. Find the pizza cooling constant k.

1.5.5.- A tank initially contains V0 = 100 liters of water with Q0 = 25 grams of salt. The tank is
rinsed with fresh water flowing in at a rate of ri = 5 liters per minute and leaving the tank at
the same rate. The water in the tank is well-stirred. Find the time such that the amount of salt in the tank is Q1 = 5 grams.

1.5.6.- A tank initially contains V0 = 100 liters of pure water. Water enters the tank at a rate of
ri = 2 liters per minute with a salt concentration of q1 = 3 grams per liter. The instantaneously
mixed mixture leaves the tank at the same rate it enters the tank. Find the salt concentration
in the tank at any time t > 0. Also find the limiting amount of salt in the tank in the limit
t → ∞.

1.5.7.- A tank with a capacity of Vm = 500 liters originally contains V0 = 200 liters of water with
Q0 = 100 grams of salt in solution. Water containing salt with concentration of qi = 1 gram
per liter is poured in at a rate of ri = 3 liters per minute. The well-stirred water is allowed to
pour out the tank at a rate of ro = 2 liters per minute. Find the salt concentration in the tank
at the time when the tank is about to overflow. Compare this concentration with the limiting concentration as t → ∞ if the tank had infinite capacity.

1.6. Approximate Solutions: Picard Iteration


We know a lot about solutions of linear differential equations, and very little about solutions of
nonlinear differential equations. While we have an explicit formula for the solutions to all linear
equations—Theorem 1.5.3—there is no such formula for solutions to every nonlinear equation. In
§ 1.4 we solved several nonlinear equations and we arrived at different formulas for their solutions.
But the nonlinear equations we solved are only a tiny part of all nonlinear equations.
Sadly, we have to give up on the goal of finding a formula for solutions to all nonlinear equations.
Instead, we focus on proving whether nonlinear equations have solutions or not. Picard and Lindelöf
followed this path and obtained an interesting result. They found out that most nonlinear equations
have solutions, although they could not find a formula for their solutions.
What they found is a way to compute a sequence of approximate solutions to the differential
equation. This sequence is now called the Picard Iteration. The solutions of nonlinear differential
equations are the limit of this sequence.
In this section we first introduce the Picard-Lindelöf Theorem and the Picard iteration to find
approximate solutions. We then compare what we know about solutions to linear and to nonlinear
differential equations.

1.6.1. Existence and Uniqueness of Solutions. A first order differential equation is called nonlinear when it is neither linear homogeneous nor linear non-homogeneous. Below we have a few examples.

Example 1.6.1.
(a) The differential equation
y′(t) = t^2/y^3(t)
is nonlinear, since the function f(t, y) = t^2/y^3 is nonlinear in the second argument.
(b) The differential equation
y′(t) = 2t y(t) + ln(y(t))
is nonlinear, since the function f(t, y) = 2ty + ln(y) is nonlinear in the second argument, due to the term ln(y).
(c) The differential equation
y′(t)/y(t) = 2t^2
is linear, since the function f(t, y) = 2t^2 y is linear in the second argument.
C

The Picard-Lindelöf Theorem shows that certain nonlinear equations have solutions, uniquely
determined by appropriate initial data.

Theorem 1.6.1 (Picard-Lindelöf). Consider the initial value problem


y 0 (t) = f (t, y(t)), y(t0 ) = y0 . (1.6.1)
If the function f and its partial derivative ∂y f are continuous on some rectangle on the ty-plane
containing the point (t0 , y0 ) in its interior, then there is a unique solution y of the initial value
problem in (1.6.1) on an open interval containing t0 .

Remark: We prove this theorem rewriting the differential equation as an integral equation for the
unknown function y. Then we use this integral equation to construct a sequence of approximate
solutions {yn } to the original initial value problem. Next we show that this sequence of approximate
solutions has a unique limit as n → ∞. We end the proof showing that this limit is the only
solution of the original initial value problem. This proof follows [8] § 1.6 and Zeidler’s [9] § 1.8. It
is important to read the review on complete normed vector spaces, called Banach spaces, given in
these references.

Proof of Theorem 1.6.1: We start writing the differential equation in (1.6.1) as an integral equation, hence we integrate on both sides of that equation with respect to t,
∫_{t0}^{t} y′(s) ds = ∫_{t0}^{t} f(s, y(s)) ds   ⇒   y(t) = y0 + ∫_{t0}^{t} f(s, y(s)) ds. (1.6.2)

We have used the Fundamental Theorem of Calculus on the left-hand side of the first equation to
get the second equation. And we have introduced the initial condition y(t0 ) = y0 . We use this
integral form of the original differential equation to construct a sequence of functions {y_n}_{n=0}^∞. The
domain of every function in this sequence is Da = [t0 − a, t0 + a] for some a > 0. The sequence is
defined as follows,
y_{n+1}(t) = y0 + ∫_{t0}^{t} f(s, y_n(s)) ds,   n ≥ 0,   y0(t) = y0. (1.6.3)

We see that the first element in the sequence is the constant function determined by the initial
conditions in (1.6.1). The iteration in (1.6.3) is called the Picard iteration. The central idea of
the proof is to show that the sequence {yn } is a Cauchy sequence in the space C(Db ) of uniformly
continuous functions in the domain Db = [t0 − b, t0 + b] for a small enough b > 0. This function
space is a Banach space under the norm
‖u‖ = max_{t ∈ Db} |u(t)|.

See [8] and references therein for the definition of Cauchy sequences, Banach spaces, and the proof
that C(Db ) with that norm is a Banach space. We now show that the sequence {yn } is a Cauchy
sequence in that space. Any two consecutive elements in the sequence satisfy
‖y_{n+1} − y_n‖ = max_{t ∈ Db} | ∫_{t0}^{t} f(s, y_n(s)) ds − ∫_{t0}^{t} f(s, y_{n−1}(s)) ds |
≤ max_{t ∈ Db} ∫_{t0}^{t} | f(s, y_n(s)) − f(s, y_{n−1}(s)) | ds
≤ k max_{t ∈ Db} ∫_{t0}^{t} | y_n(s) − y_{n−1}(s) | ds
≤ k b ‖y_n − y_{n−1}‖.
Denoting r = kb, we have obtained the inequality
‖y_{n+1} − y_n‖ ≤ r ‖y_n − y_{n−1}‖   ⇒   ‖y_{n+1} − y_n‖ ≤ r^n ‖y1 − y0‖.
Using the triangle inequality for norms and the sum of a geometric series one computes the following,
‖y_n − y_{n+m}‖ = ‖y_n − y_{n+1} + y_{n+1} − y_{n+2} + ··· + y_{n+(m−1)} − y_{n+m}‖
≤ ‖y_n − y_{n+1}‖ + ‖y_{n+1} − y_{n+2}‖ + ··· + ‖y_{n+(m−1)} − y_{n+m}‖
≤ (r^n + r^{n+1} + ··· + r^{n+m−1}) ‖y1 − y0‖
= r^n (1 + r + r^2 + ··· + r^{m−1}) ‖y1 − y0‖
= r^n ((1 − r^m)/(1 − r)) ‖y1 − y0‖.
Now choose the positive constant b such that b < min{a, 1/k}, hence 0 < r < 1. In this case the
sequence {y_n} is a Cauchy sequence in the Banach space C(Db), with norm ‖·‖, hence converges. Denote the limit by y = lim_{n→∞} y_n. This function satisfies the equation
y(t) = y0 + ∫_{t0}^{t} f(s, y(s)) ds,

which says that y is not only continuous but also differentiable in the interior of Db , hence y is
solution of the initial value problem in (1.6.1). The proof of uniqueness of the solution follows the

same argument used to show that the sequence above is a Cauchy sequence. Consider two solutions
y and ỹ of the initial value problem above. That means,
Z t Z t
y(t) = y0 + f (s, y(s) ds, ỹ(t) = y0 + f (s, ỹ(s) ds.
t0 t0

Therefore, their difference satisfies


Z t Z t
ky − ỹk = max f (s, y(s)) ds − f (s, ỹ(s)) ds

t∈Db t0 t0
Z t
6 max f (s, y(s)) − f (s, ỹ(s)) ds
t∈Db t0
Z t
6 k max |y(s) − ỹ(s)| ds
t∈Db t0

6 kb ky − ỹk.
Since b is chosen so that r = kb < 1, we got that
ky − ỹk 6 r ky − ỹk, r<1 ⇒ ky − ỹk = 0 ⇒ y = ỹ.
This establishes the Theorem. 

1.6.2. The Picard Iteration. Theorem 1.6.1 above says that the initial value problem
y 0 = f (t, y), y(t0 ) = y0
has a unique solution when f and ∂y f are continuous on a domain containing (t0 , y0 ) in its interior.
A crucial step in the proof of this theorem is the construction of the Picard iteration. This is a
sequence of functions {yn }∞ n=0 with the following property: the larger n the closer yn is to the
solution of the initial value problem.
We now focus on the Picard iteration. In a few examples we show how to construct the
iteration. We also show how the sequence functions yn , called approximate solutions, approximate
the solutions of the differential equations.
In the examples below, the differential equations are very simple. They are linear equations,
which we already know how to solve—either as separable equations or with the integrating factor
method. We use simple equations because we want to show how to construct the Picard iteration,
not how to solve a new type of equations. Furthermore, because the equations are so simple, we
can actually compute the limit n → ∞ of the sequence {yn }. Usually in real life applications this
is not possible and the only thing we can do is to stop the Picard iteration for an n large enough.

Example 1.6.2. Use the proof of Picard-Lindelöf’s Theorem to find the solution to
y′ = 2 y + 3,   y(0) = 1.

Solution: We first transform the differential equation into an integral equation.


∫_0^t y′(s) ds = ∫_0^t (2 y(s) + 3) ds   ⇒   y(t) − y(0) = ∫_0^t (2 y(s) + 3) ds.
Using the initial condition, y(0) = 1,
y(t) = 1 + ∫_0^t (2 y(s) + 3) ds.
We now define the sequence of approximate solutions:
y0 = y(0) = 1,   y_{n+1}(t) = 1 + ∫_0^t (2 y_n(s) + 3) ds,   n ≥ 0.
We now compute the first elements in the sequence. We said y0 = 1, now y1 is given by
n = 0,   y1(t) = 1 + ∫_0^t (2 y0(s) + 3) ds = 1 + ∫_0^t 5 ds = 1 + 5t.
So y1 = 1 + 5t. Now we compute y2,
y2 = 1 + ∫_0^t (2 y1(s) + 3) ds = 1 + ∫_0^t (2(1 + 5s) + 3) ds   ⇒   y2 = 1 + ∫_0^t (5 + 10s) ds = 1 + 5t + 5t^2.
So we've got y2(t) = 1 + 5t + 5t^2. Now y3,
y3 = 1 + ∫_0^t (2 y2(s) + 3) ds = 1 + ∫_0^t (2(1 + 5s + 5s^2) + 3) ds,
so we have,
y3 = 1 + ∫_0^t (5 + 10s + 10s^2) ds = 1 + 5t + 5t^2 + (10/3) t^3.
So we obtained y3(t) = 1 + 5t + 5t^2 + (10/3) t^3. We now rewrite this expression so we can get a power series expansion that can be written in terms of simple functions. The first step is done already, to write the powers of t as t^n, for n = 1, 2, 3,
y3(t) = 1 + 5 t^1 + 5 t^2 + (5(2)/3) t^3.
We now multiply each term by one so we get the factorials n! in each term,
y3(t) = 1 + 5 (t^1/1!) + 5(2) (t^2/2!) + 5(2^2) (t^3/3!).
We then realize that we can rewrite the expression above in terms of powers of (2t), that is,
y3(t) = 1 + (5/2) (2t)^1/1! + (5/2) (2t)^2/2! + (5/2) (2t)^3/3! = 1 + (5/2) ( (2t) + (2t)^2/2! + (2t)^3/3! ).
From this last expression it is simple to guess the N-th approximation
yN(t) = 1 + (5/2) ( (2t) + (2t)^2/2! + (2t)^3/3! + ··· + (2t)^N/N! ) = 1 + (5/2) Σ_{k=1}^{N} (2t)^k/k!.
Recall now the power series expansion for the exponential,
e^{at} = Σ_{k=0}^{∞} (at)^k/k! = 1 + Σ_{k=1}^{∞} (at)^k/k!   ⇒   Σ_{k=1}^{∞} (at)^k/k! = e^{at} − 1.

Then, the limit N → ∞ is given by
y(t) = lim_{N→∞} yN(t) = 1 + (5/2) Σ_{k=1}^{∞} (2t)^k/k! = 1 + (5/2) (e^{2t} − 1).
One last rewriting of the solution and we obtain
y(t) = (5/2) e^{2t} − 3/2.
C

Remark: The differential equation y′ = 2 y + 3 is of course linear, so the solution to the initial value problem in Example 1.6.2 can be obtained using the methods in Section 1.5,
e^{−2t} (y′ − 2 y) = 3 e^{−2t}   ⇒   e^{−2t} y = −(3/2) e^{−2t} + c   ⇒   y(t) = c e^{2t} − 3/2;
and the initial condition implies
1 = y(0) = c − 3/2   ⇒   c = 5/2   ⇒   y(t) = (5/2) e^{2t} − 3/2.
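The Picard iteration of Example 1.6.2 can also be carried out by a computer algebra system. The following sketch (assuming SymPy is available) reproduces the iterates y1, y2, y3 computed above and compares them with the Taylor series of the exact solution.

```python
import sympy as sp

t, s = sp.symbols('t s')

f = lambda s, y: 2*y + 3        # right-hand side of y' = 2y + 3
y0 = sp.Integer(1)              # initial condition y(0) = 1

# Picard iteration: y_{n+1}(t) = y0 + int_0^t f(s, y_n(s)) ds
y_n = y0
for n in range(1, 4):
    y_n = y0 + sp.integrate(f(s, y_n.subs(t, s)), (s, 0, t))
    print(n, sp.expand(y_n))    # 1 + 5t, 1 + 5t + 5t^2, 1 + 5t + 5t^2 + 10t^3/3

exact = sp.Rational(5, 2)*sp.exp(2*t) - sp.Rational(3, 2)
print(sp.series(exact, t, 0, 4))   # Taylor series matches the iterates term by term
```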

1.6.3. Picard vs Taylor Approximations. We use the previous example to compare the
accuracy of the Picard approximation of the solution y with the Taylor approximation. Recall that
y is the solution of the initial value problem
y 0 (t) = 2 y(t) + 3, y(0) = y0 , y0 ∈ R.
We make the Picard vs Taylor comparison with an interactive graph, linked below, where we graph
the solution y of the initial value problem, together with few Picard and Taylor approximations.
Here are a few instructions to use the interactive graph.
• The slider Function turns on-off the graph of the solution y(t), displayed in purple.
• We graph in blue approximate solutions yn of the differential equation constructed with
the Picard iteration up to the order n = 10. The slider Picard-App-Blue turns on-off
the Picard approximate solution.
• We graph in green the n-order Taylor expansion centered t = 0 of the solution of the
differential equation, up to order n = 10. The slider Taylor-App-Green turns on-off
the Taylor approximation of the solution.

Picard vs Taylor Approximations: Linear Case

By looking at the interactive graph above we conclude that the Picard iteration is identical to the Taylor expansion for solutions of the linear differential equation above. This example suggests that the Taylor and Picard approximations of solutions to linear non-homogeneous equations might always be identical. This is indeed the case for constant coefficient equations, as we see in the following result.

Theorem 1.6.2. If the function y is solution of the initial value problem


y 0 (t) = a y(t) + b, y(t0 ) = y0 , (1.6.4)
for any constants a, b, t0 , y0 , then
yn (t) = τn (t),
for all t ∈ R and n = 0, 1, 2, · · · , where yn is the n-order Picard approximation and τn is the n-order
Taylor approximation centered at t = t0 of the solution y.

Proof of Theorem 1.6.2: If y is the solution of the initial value problem in (1.6.4), its n-order Taylor expansion centered at t = t0 is
τn(t) = y0 + y^{(1)}(t0) (t − t0) + (y^{(2)}(t0)/2!) (t − t0)^2 + ··· + (y^{(n)}(t0)/n!) (t − t0)^n,
where y (n) is the n-th derivative of y. In particular, τ0 = y0 and τ1 (t) = y0 + y (1) (t0 ) (t − t0 ). The
Picard iteration of y is defined as follows, y0 (t) = y0 , and
y_n(t) = y0 + ∫_{t0}^{t} (a y_{n−1}(s) + b) ds,   n = 1, 2, ···.

From here we see that the zero-order of the Picard and Taylor approximations agree. Now, for
n = 1 we get
y1(t) = y0 + ∫_{t0}^{t} (a y0 + b) ds   ⇒   y1(t) = y0 + (a y0 + b) (t − t0).
We now need to recall that y is solution of the differential equation in (1.6.4). This equation evaluated at t = t0 says that
y′(t0) = a y(t0) + b   ⇒   y^{(1)}(t0) = a y0 + b.
Therefore, the first Picard approximation y1 of y has the form
y1(t) = y0 + y^{(1)}(t0) (t − t0).

We conclude that y1 (t) = τ1 (t) for all t ∈ R, that is, the order one Picard and Taylor approximations
agree. Before we finish the proof we need a formula. Differentiate n-times the differential equation
in (1.6.4) and evaluate the result at t = t0 ; recall that a and b are constants, we get

y (n+1) (t0 ) = a y (n) (t0 ). (1.6.5)


Now we are ready to finish the proof of the Theorem, and we do it by induction. We have shown
that for n = 0 and n = 1 the Picard and Taylor approximations are the same. Now we prove the
following:
yn (t) = τn (t) ⇒ yn+1 (t) = τn+1 (t).
Indeed, if y_n(t) = τn(t), then the Picard formula says
y_{n+1}(t) = y0 + ∫_{t0}^{t} (a τn(s) + b) ds.
Using the formula for the Taylor expansion written at the beginning of the proof,
y_{n+1}(t) = y0 + ∫_{t0}^{t} ( a ( y0 + y^{(1)}(t0) (s − t0) + ··· + (y^{(n)}(t0)/n!) (s − t0)^n ) + b ) ds.
If we reorder terms inside the integral, and we recall Eq. (1.6.5), we get
y_{n+1}(t) = y0 + ∫_{t0}^{t} ( (a y0 + b) + a y^{(1)}(t0) (s − t0) + ··· + a (y^{(n)}(t0)/n!) (s − t0)^n ) ds
= y0 + ∫_{t0}^{t} ( y^{(1)}(t0) + y^{(2)}(t0) (s − t0) + ··· + (y^{(n+1)}(t0)/n!) (s − t0)^n ) ds
= y0 + y^{(1)}(t0) (t − t0) + y^{(2)}(t0) (t − t0)^2/2 + ··· + (y^{(n+1)}(t0)/n!) (t − t0)^{n+1}/(n + 1)
= τ_{n+1}(t).
We conclude that yn = τn implies that yn+1 = τn+1 . This establishes the Theorem. 
The theorem above says that for solutions of linear equations having constant coefficients the
Picard and Taylor approximations of the solution are identical. This is not true for solutions of
either linear equations with variable coefficients or nonlinear equations. In the next example we
compute the first three approximations of both the Picard and Taylor approximations of solutions
to a nonlinear differential equation, and we show that they are different.

Example 1.6.3. Show that the Picard and Taylor approximations of the solution y(t) of the initial
value problem below are different, where
y 0 (t) = y 2 (t), y(0) = −1, t > 0.

Solution: This differential equation is separable, so we can solve it as follows,


∫ (y′(t)/y^2(t)) dt = ∫ dt   ⇒   ∫ dy/y^2 = ∫ dt   ⇒   −1/y = t + c.
The initial condition y(0) = −1 fixes the constant c,
−1/(−1) = 0 + c   ⇒   c = 1,
so we get the solution
y(t) = −1/(t + 1).
We can use this explicit solution to compute the Taylor approximations centered at t = 0,
τn(t) = y0^{(0)} + y0^{(1)} t + ··· + y0^{(n)} t^n/n!,
where y0^{(n)} is the n-th derivative of y evaluated at t = 0, and the case n = 0 is just y(0). A straightforward calculation gives
τ0(t) = −1,   τ1(t) = −1 + t,   τ2(t) = −1 + t − t^2.
The Picard approximation of y is y0(t) = y(0), and then
y_{n+1}(t) = y(0) + ∫_0^t y_n^2(s) ds.
Again, a straightforward calculation gives,
y0(t) = −1 = τ0(t),   y1(t) = −1 + t = τ1(t).
But the approximation y2 is different from τ2. Indeed,
y2(t) = y(0) + ∫_0^t (y1(s))^2 ds = −1 + ∫_0^t (−1 + s)^2 ds = −1 + ∫_0^t (1 − 2s + s^2) ds,
so we conclude that
y2(t) = −1 + t − t^2 + t^3/3   ⇒   y2(t) = τ2(t) + t^3/3.
Therefore, y2 ≠ τ2. We decide which approximation is more precise in the following interactive
graph. Here are a few instructions to use the interactive graph.
• The slider Function turns on-off the graph of the solution y(t), displayed in purple.
• We graph in blue approximate solutions yn of the differential equation constructed with
the Picard iteration up to the order n = 5. The slider Picard-App-Blue turns on-off
the Picard approximate solution.
• We graph in green the n-order Taylor expansion centered t = 0 of the solution of the
differential equation, up to order n = 5. The slider Taylor-App-Green turns on-off the
Taylor approximation of the solution.

Picard vs Taylor Approximations: Non-Linear Case - Explicit Solution

We can see in the interactive graph that the Picard iteration approximates the solution values better than the Taylor series expansion of that solution does in a neighborhood of the initial condition. C
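The same comparison can be made numerically. The sketch below (assuming NumPy is available) evaluates τ2, the Picard iterate y2, and the exact solution y(t) = −1/(t + 1) at a few points; the Picard iterate stays closer to the solution.

```python
import numpy as np

t = np.array([0.25, 0.5, 1.0, 1.5])
exact  = -1.0 / (t + 1.0)
taylor = -1.0 + t - t**2                 # tau_2 from the example
picard = -1.0 + t - t**2 + t**3 / 3.0    # y_2 from the example

print(np.abs(taylor - exact))   # Taylor error
print(np.abs(picard - exact))   # Picard error, smaller at these points
```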

In the next example we study a nonlinear differential equation for which we do not know how to find an explicit formula for the solution. Yet, we can compute the Picard and Taylor approximations of the solutions and compare them.

Example 1.6.4. Show that the Picard and Taylor approximations of the solution y(t) of the initial
value problem below are different, where
y 0 (t) = y 2 (t) + t, y(0) = −1, t > 0.

Solution: This differential equation is not linear, not separable, and not Euler Homogeneous. So
we do not know how to find a solution y of that equation. But we can compute the Taylor and
Picard approximations of the solution. The Taylor approximation is
τn(t) = y0^{(0)} + y0^{(1)} t + ··· + y0^{(n)} t^n/n!,
where y0^{(n)} is the n-th derivative of y evaluated at t = 0, and the case n = 0 is just y(0). We now use the initial condition y(0) = −1 and the equation itself to find all the y0^{(n)}. Indeed,
y0^{(1)} = y′(0) = (y(0))^2 + 0 = (−1)^2 = 1   ⇒   y0^{(1)} = 1.
This coefficient, and the previous one, give us the Taylor approximation
τ1(t) = −1 + t.

To compute the coefficient y0^{(2)} we need to take one derivative of the differential equation,
y″(t) = 2 y(t) y′(t) + 1.
Therefore,
y0^{(2)} = y″(0) = 2 y(0) y′(0) + 1 = 2(−1)(1) + 1 = −1   ⇒   y0^{(2)} = −1.
This coefficient, and the previous ones, give us the Taylor approximation
τ2(t) = −1 + t − t^2/2.
On the other hand, the Picard iteration is computed in the usual way, y0(t) = y(0), and
y_{n+1}(t) = y(0) + ∫_0^t (y_n^2(s) + s) ds.
So, y0(t) = −1 = τ0(t), and then
y1(t) = −1 + ∫_0^t ((−1)^2 + s) ds   ⇒   y1(t) = −1 + t + t^2/2.
Therefore, y1 ≠ τ1. We can compute one more term in the Picard iteration,
y2(t) = −1 + ∫_0^t ( (−1 + s + s^2/2)^2 + s ) ds
= −1 + ∫_0^t ( 1 + s^2 + s^4/4 − 2s − s^2 + s^3 + s ) ds
= −1 + ∫_0^t ( 1 − s + s^3 + s^4/4 ) ds
= −1 + t − t^2/2 + t^4/4 + t^5/20.
Again, y2 ≠ τ2. We decide which approximation is more precise in the following interactive graph.
Here are a few instructions to use the interactive graph.
• Unlike the previous interactive graph, we do not have a slider Function, since we do not
have an explicit expression for the solution of the differential equation.
• We graph in blue approximate solutions yn of the differential equation constructed with
the Picard iteration up to the order n = 5. The slider Picard-App-Blue turns on-off
the Picard approximate solution.
• We graph in green the n-order Taylor expansion centered t = 0 of the solution of the
differential equation, up to order n = 5. The slider Taylor-App-Green turns on-off the
Taylor approximation of the solution.

Picard vs Taylor Approximations: Non-Linear Case - No Explicit Solution

We can see in the interactive graph that the Picard iteration is different from the Taylor
series expansion of that solution in a neighborhood of the initial condition. C

1.6.4. Picard Iteration: A Few More Examples.

Example 1.6.5. Use the proof of Picard-Lindelöf’s Theorem to find the solution to
y′ = a y + b,   y(0) = ŷ0,   a, b ∈ ℝ.

Solution: We first transform the differential equation into an integral equation.


∫_0^t y′(s) ds = ∫_0^t (a y(s) + b) ds   ⇒   y(t) − y(0) = ∫_0^t (a y(s) + b) ds.
Using the initial condition, y(0) = ŷ0,
y(t) = ŷ0 + ∫_0^t (a y(s) + b) ds.
We now define the sequence of approximate solutions:
y0 = y(0) = ŷ0,   y_{n+1}(t) = ŷ0 + ∫_0^t (a y_n(s) + b) ds,   n ≥ 0.
We now compute the first elements in the sequence. We said y0 = ŷ0, now y1 is given by
n = 0,   y1(t) = ŷ0 + ∫_0^t (a y0(s) + b) ds = ŷ0 + ∫_0^t (a ŷ0 + b) ds = ŷ0 + (a ŷ0 + b) t.
So y1 = ŷ0 + (a ŷ0 + b) t. Now we compute y2,
y2 = ŷ0 + ∫_0^t (a y1(s) + b) ds = ŷ0 + ∫_0^t ( a(ŷ0 + (a ŷ0 + b) s) + b ) ds = ŷ0 + (a ŷ0 + b) t + (a ŷ0 + b) (a t^2/2).
So we obtained y2(t) = ŷ0 + (a ŷ0 + b) t + (a ŷ0 + b) (a t^2/2). A similar calculation gives us y3,
y3(t) = ŷ0 + (a ŷ0 + b) t + (a ŷ0 + b) (a t^2/2) + (a ŷ0 + b) (a^2 t^3/3!).
We now rewrite this expression so we can get a power series expansion that can be written in terms of simple functions. The first step is done already, to write the powers of t as t^n, for n = 1, 2, 3,
y3(t) = ŷ0 + (a ŷ0 + b) t^1/1! + (a ŷ0 + b) a t^2/2! + (a ŷ0 + b) a^2 t^3/3!.
We already have the factorials n! in each term t^n. We now realize we can write the power functions as (at)^n if we multiply each term by one, as follows
y3(t) = ŷ0 + ((a ŷ0 + b)/a) (at)^1/1! + ((a ŷ0 + b)/a) (at)^2/2! + ((a ŷ0 + b)/a) (at)^3/3!.
Now we can pull out a common factor,
y3(t) = ŷ0 + (ŷ0 + b/a) ( (at)^1/1! + (at)^2/2! + (at)^3/3! ).
From this last expression it is simple to guess the N-th approximation
yN(t) = ŷ0 + (ŷ0 + b/a) ( (at)^1/1! + (at)^2/2! + ··· + (at)^N/N! ),
lim_{N→∞} yN(t) = ŷ0 + (ŷ0 + b/a) Σ_{k=1}^{∞} (at)^k/k!.

Recall now that the power series expansion for the exponential is
e^{at} = Σ_{k=0}^{∞} (at)^k/k! = 1 + Σ_{k=1}^{∞} (at)^k/k!.
Notice that the sum in the exponential starts at k = 0, while the sum in yN starts at k = 1. Then, the limit N → ∞ is given by
y(t) = lim_{N→∞} yN(t) = ŷ0 + (ŷ0 + b/a) Σ_{k=1}^{∞} (at)^k/k! = ŷ0 + (ŷ0 + b/a) (e^{at} − 1).

We have been able to add the power series and we have the solution written in terms of simple
functions. One last rewriting of the solution and we obtain
y(t) = (ŷ0 + b/a) e^{at} − b/a.
C

Remark: We reobtained Eq. (1.5.5) in Theorem 1.5.2.

Example 1.6.6. Use the Picard iteration to find the solution of


y 0 = 5t y, y(0) = 1.

Solution: We first transform the differential equation into an integral equation.


∫_0^t y′(s) ds = ∫_0^t 5s y(s) ds   ⇒   y(t) − y(0) = ∫_0^t 5s y(s) ds.
Using the initial condition, y(0) = 1,
y(t) = 1 + ∫_0^t 5s y(s) ds.
We now define the sequence of approximate solutions:
y0 = y(0) = 1,   y_{n+1}(t) = 1 + ∫_0^t 5s y_n(s) ds,   n ≥ 0.
We now compute the first four elements in the sequence. The first one is y0 = y(0) = 1, the second one y1 is given by
n = 0,   y1(t) = 1 + ∫_0^t 5s ds = 1 + (5/2) t^2.
So y1 = 1 + (5/2) t^2. Now we compute y2,
y2 = 1 + ∫_0^t 5s y1(s) ds = 1 + ∫_0^t 5s (1 + (5/2) s^2) ds = 1 + ∫_0^t (5s + (5^2/2) s^3) ds = 1 + (5/2) t^2 + (5^2/8) t^4.
So we obtained y2(t) = 1 + (5/2) t^2 + (5^2/2^3) t^4. A similar calculation gives us y3,
y3 = 1 + ∫_0^t 5s y2(s) ds = 1 + ∫_0^t 5s (1 + (5/2) s^2 + (5^2/2^3) s^4) ds = 1 + ∫_0^t (5s + (5^2/2) s^3 + (5^3/2^3) s^5) ds = 1 + (5/2) t^2 + (5^2/8) t^4 + (5^3/(2^3 · 6)) t^6.
So we obtained y3(t) = 1 + (5/2) t^2 + (5^2/2^3) t^4 + (5^3/(2^4 · 3)) t^6. We now rewrite this expression so we can get a power series expansion that can be written in terms of simple functions. The first step is to write the powers of t as (t^2)^n, for n = 1, 2, 3,
y3(t) = 1 + (5/2) (t^2)^1 + (5^2/2^3) (t^2)^2 + (5^3/(2^4 · 3)) (t^2)^3.
Now we multiply each term by one to get the right factorials n! in each term,
y3(t) = 1 + (5/2) (t^2)^1/1! + (5^2/2^2) (t^2)^2/2! + (5^3/2^3) (t^2)^3/3!.
Now we realize that the factor 5/2 can be written together with the powers of t^2,
y3(t) = 1 + ((5/2) t^2)/1! + ((5/2) t^2)^2/2! + ((5/2) t^2)^3/3!.
From this last expression it is simple to guess the N-th approximation
yN(t) = 1 + Σ_{k=1}^{N} ((5/2) t^2)^k/k!,
which can be proven by induction. Therefore,
y(t) = lim_{N→∞} yN(t) = 1 + Σ_{k=1}^{∞} ((5/2) t^2)^k/k!.
Recall now that the power series expansion for the exponential is
e^{at} = Σ_{k=0}^{∞} (at)^k/k! = 1 + Σ_{k=1}^{∞} (at)^k/k!,
so we get
y(t) = 1 + (e^{(5/2) t^2} − 1)   ⇒   y(t) = e^{(5/2) t^2}.
C

Remark: The differential equation y 0 = 5t y is of course separable, so the solution to the initial
value problem in Example 1.6.6 can be obtained using the methods in Section 1.4,
y′/y = 5t   ⇒   ln(y) = (5/2) t^2 + c   ⇒   y(t) = c̃ e^{(5/2) t^2}.
We now use the initial condition,
1 = y(0) = c̃   ⇒   c̃ = 1,
so we obtain the solution
y(t) = e^{(5/2) t^2}.

Example 1.6.7. Use the Picard iteration to find the solution of


y 0 = 2t4 y, y(0) = 1.

Solution: We first transform the differential equation into an integral equation.


∫_0^t y′(s) ds = ∫_0^t 2s^4 y(s) ds   ⇒   y(t) − y(0) = ∫_0^t 2s^4 y(s) ds.
Using the initial condition, y(0) = 1,
y(t) = 1 + ∫_0^t 2s^4 y(s) ds.
We now define the sequence of approximate solutions:
y0 = y(0) = 1,   y_{n+1}(t) = 1 + ∫_0^t 2s^4 y_n(s) ds,   n ≥ 0.
We now compute the first four elements in the sequence. The first one is y0 = y(0) = 1, the second one y1 is given by
n = 0,   y1(t) = 1 + ∫_0^t 2s^4 ds = 1 + (2/5) t^5.

So y1 = 1 + (2/5)t5 . Now we compute y2 ,


y2 = 1 + ∫_0^t 2s^4 y1(s) ds = 1 + ∫_0^t 2s^4 (1 + (2/5) s^5) ds = 1 + ∫_0^t (2s^4 + (2^2/5) s^9) ds = 1 + (2/5) t^5 + (2^2/5)(1/10) t^{10}.
So we obtained y2(t) = 1 + (2/5) t^5 + (2^2/(5^2 · 2)) t^{10}. A similar calculation gives us y3,
y3 = 1 + ∫_0^t 2s^4 y2(s) ds = 1 + ∫_0^t 2s^4 (1 + (2/5) s^5 + (2^2/(5^2 · 2)) s^{10}) ds = 1 + ∫_0^t (2s^4 + (2^2/5) s^9 + (2^3/(5^2 · 2)) s^{14}) ds = 1 + (2/5) t^5 + (2^2/5)(1/10) t^{10} + (2^3/(5^2 · 2))(1/15) t^{15}.
So we obtained y3(t) = 1 + (2/5) t^5 + (2^2/5^2)(1/2) t^{10} + (2^3/5^3)(1/(2·3)) t^{15}. We now reorder terms in this last expression so we can get a power series expansion we can write in terms of simple functions. This is what we do:
y3(t) = 1 + (2/5) (t^5) + (2^2/5^2) (t^5)^2/2 + (2^3/5^3) (t^5)^3/6
= 1 + (2/5) (t^5)/1! + (2^2/5^2) (t^5)^2/2! + (2^3/5^3) (t^5)^3/3!
= 1 + ((2/5) t^5)/1! + ((2/5) t^5)^2/2! + ((2/5) t^5)^3/3!.
From this last expression it is simple to guess the N-th approximation
yN(t) = 1 + Σ_{n=1}^{N} ((2/5) t^5)^n/n!,
which can be proven by induction. Therefore,
y(t) = lim_{N→∞} yN(t) = 1 + Σ_{n=1}^{∞} ((2/5) t^5)^n/n!.
Recall now that the power series expansion for the exponential is
e^{at} = Σ_{k=0}^{∞} (at)^k/k! = 1 + Σ_{k=1}^{∞} (at)^k/k!,
so we get
y(t) = 1 + (e^{(2/5) t^5} − 1)   ⇒   y(t) = e^{(2/5) t^5}.

C

1.6.5. Linear vs Nonlinear Equations. The main result in § 1.5 was Theorem 1.5.3, which
says that an initial value problem for a linear differential equation
y 0 = a(t) y + b(t), y(t0 ) = y0 ,
with a, b continuous functions on (t1 , t2 ), and constants t0 ∈ (t1 , t2 ) and y0 ∈ R, has the unique
solution y on (t1 , t2 ) given by
y(t) = e^{A(t)} ( y0 + ∫_{t0}^{t} e^{−A(s)} b(s) ds ),
where we introduced the function A(t) = ∫_{t0}^{t} a(s) ds.
From the result above we can see that solutions to linear differential equations satisfy the following properties:
(a) There is an explicit expression for the solutions of these differential equations.
(b) For every initial condition y0 ∈ R there exists a unique solution.
(c) For every initial condition y0 ∈ R the solution y(t) is defined for all (t1 , t2 ).

Remark: None of these properties hold for solutions of nonlinear differential equations.
From the Picard-Lindelöf Theorem one can see that solutions to nonlinear differential equations
satisfy the following properties:
(i) There is no explicit formula for the solution to every nonlinear differential equation.
(ii) Solutions to initial value problems for nonlinear equations may be non-unique when the func-
tion f does not satisfy the Lipschitz condition.
(iii) The domain of a solution y to a nonlinear initial value problem may change when we change
the initial data y0 .
The next three examples (1.6.8)-(1.6.10) are particular cases of the statements in (i)-(iii). We
start with an equation whose solutions cannot be written in explicit form.

Example 1.6.8. For every constant a1 , a2 , a3 , a4 , find all solutions y to the equation
y′(t) = t^2/( y^4(t) + a4 y^3(t) + a3 y^2(t) + a2 y(t) + a1 ). (1.6.6)

Solution: The nonlinear differential equation above is separable, so we follow § 1.4 to find its
solutions. First we rewrite the equation as
( y^4(t) + a4 y^3(t) + a3 y^2(t) + a2 y(t) + a1 ) y′(t) = t^2.
Then we integrate on both sides of the equation,
∫ ( y^4(t) + a4 y^3(t) + a3 y^2(t) + a2 y(t) + a1 ) y′(t) dt = ∫ t^2 dt + c.
Introduce the substitution u = y(t), so du = y′(t) dt,
∫ ( u^4 + a4 u^3 + a3 u^2 + a2 u + a1 ) du = ∫ t^2 dt + c.
Integrate the left-hand side with respect to u and the right-hand side with respect to t. Substitute u back by the function y, hence we obtain
(1/5) y^5(t) + (a4/4) y^4(t) + (a3/3) y^3(t) + (a2/2) y^2(t) + a1 y(t) = t^3/3 + c.
This is an implicit form for the solution y of the problem. The solution is the root of a polynomial of degree five for all possible values of the polynomial coefficients. But it has been proven that there is no formula for the roots of a general polynomial of degree five or higher. We conclude that there is no explicit expression for solutions y of Eq. (1.6.6). C

We now give an example of the statement in (ii), that is, a differential equation which does not satisfy one of the hypotheses in Theorem 1.6.1. The partial derivative ∂y f is discontinuous on a line in the (t, u) plane where the initial condition for the initial value problem is given. We then show that such an initial value problem has two solutions instead of a unique solution.

Example 1.6.9. Find every solution y of the initial value problem


y 0 (t) = y 1/3 (t), y(0) = 0. (1.6.7)

Remark: The equation above is nonlinear, separable, and f(t, u) = u^{1/3} has derivative
∂u f = 1/(3 u^{2/3}).
Since the function ∂u f is not continuous at u = 0, it does not satisfy the Lipschitz condition in Theorem 1.6.1 on any domain of the form S = [−a, a] × [−a, a] with a > 0.
Solution: The solution to the initial value problem in Eq. (1.6.7) exists but it is not unique, since we now show that it has two solutions. The first solution is
y1 (t) = 0.
The second solution can be computed using the ideas from separable equations, that is,
∫ (y(t))^{−1/3} y′(t) dt = ∫ dt + c0.
Then, the substitution u = y(t), with du = y′(t) dt, implies that
∫ u^{−1/3} du = ∫ dt + c0.
Integrate and substitute back the function y. The result is
(3/2) (y(t))^{2/3} = t + c0   ⇒   y(t) = ((2/3)(t + c0))^{3/2}.
The initial condition above implies
0 = y(0) = ((2/3) c0)^{3/2}   ⇒   c0 = 0,
so the second solution is:
y2(t) = ((2/3) t)^{3/2}.
C
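Both solutions can be verified symbolically. The following sketch (assuming SymPy is available) checks that y1(t) = 0 and y2(t) = ((2/3) t)^{3/2} satisfy y′ = y^{1/3} with y(0) = 0, for t ≥ 0.

```python
import sympy as sp

t = sp.symbols('t', nonnegative=True)

y1 = sp.Integer(0)                                 # first solution
y2 = (sp.Rational(2, 3) * t)**sp.Rational(3, 2)    # second solution

for y in (y1, y2):
    residual = sp.simplify(sp.diff(y, t) - y**sp.Rational(1, 3))
    print(residual, y.subs(t, 0))   # residual 0 and y(0) = 0 for both
```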

Finally, an example of the statement in (iii). In this example we have an equation with solutions
defined in a domain that depends on the initial data.

Example 1.6.10. Find the solution y to the initial value problem


y 0 (t) = y 2 (t), y(0) = y0 .

Solution: This is a nonlinear separable equation, so we can again apply the ideas in Sect. 1.4. We
first find all solutions of the differential equation,
∫ (y′(t)/y^2(t)) dt = ∫ dt + c0   ⇒   −1/y(t) = t + c0   ⇒   y(t) = −1/(c0 + t).
We now use the initial condition in the last expression above,
y0 = y(0) = −1/c0   ⇒   c0 = −1/y0.
So, the solution of the initial value problem above is:
y(t) = 1/((1/y0) − t).

This solution diverges at t = 1/y0, so the domain of the solution y is not the whole real line ℝ. Instead, the domain is ℝ − {1/y0}, so it depends on the value of the initial data y0. C

In the next example we consider an equation of the form y 0 (t) = f (t, y(t)), where f does not
satisfy the hypotheses in Theorem 1.6.1.

Example 1.6.11. Consider the nonlinear initial value problem
y′(t) = 1/( (t − 1)(t + 1)(y(t) − 2)(y(t) + 3) ),   y(t0) = y0. (1.6.8)
Find the regions on the plane where the hypotheses in Theorem 1.6.1 are not satisfied.
Solution: In this case the function f is given by:
f(t, u) = 1/( (t − 1)(t + 1)(u − 2)(u + 3) ), (1.6.9)
so f is not defined on the lines
t = 1,   t = −1,   u = 2,   u = −3.
See Fig. 23. For example, in the case that the initial data is t0 = 0, y0 = 1, then Theorem 1.6.1 implies that there exists a unique solution on any region R̂ contained in the rectangle R = (−1, 1) × (−3, 2). If the initial data for the initial value problem in Eq. (1.6.8) is t0 = 0, y0 = 2, then the hypotheses of Theorem 1.6.1 are not satisfied. C

Figure 23. Red regions where f in Eq. (1.6.9) is not defined.

1.6.6. Exercises.

1.6.1.- Use the Picard iteration to find the first four elements, y0 , y1 , y2 , and y3 , of the sequence
{y_n}_{n=0}^∞ of approximate solutions to the initial value problem

y 0 = 6y + 1, y(0) = 0.

1.6.2.- Use the Picard iteration to find the information required below about the sequence {y_n}_{n=0}^∞
of approximate solutions to the initial value problem
y 0 = 3y + 5, y(0) = 1.

(a) The first 4 elements in the sequence, y0 , y1 , y2 , and y3 .


(b) The general term ck(t) of the approximation
y_n(t) = 1 + Σ_{k=1}^{n} ck(t)/k!.

(c) Find the limit y(t) = limn→∞ yn (t).

1.6.3.- Find the domain where the solution of the initial value problems below is well-defined.

(a) y′ = −4t/y,   y(0) = y0 > 0.
(b) y 0 = 2ty 2 , y(0) = y0 > 0.

1.6.4.- By looking at the equation coefficients, find a domain where the solution of the initial value
problem below exists,

(a) (t2 − 4) y 0 + 2 ln(t) y = 3t, and initial condition y(1) = −2.


(b) y′ = y/(t (t − 3)), and initial condition y(−1) = 2.

1.6.5.- State where in the plane with points (t, y) the hypotheses of Theorem 1.6.1 are not satisfied.

(a) y′ = y^2/(2t − 3y).
(b) y′ = √(1 − t^2 − y^2).
CHAPTER 2

Second Order Linear Equations

Newton’s second law of motion, ma = f , is maybe one of the first differential equations written.
This is a second order equation, since the acceleration is the second time derivative of the particle
position function. Second order differential equations are more difficult to solve than first order
equations. In § 2.1 we compare results on linear first and second order equations. While there is
an explicit formula for all solutions to first order linear equations, no such formula exists for all
solutions to second order linear equations. The most one can get is the result in Theorem 2.1.8.
In § 2.2 we find explicit formulas for all solutions to linear second order equations that are both
homogeneous and with constant coefficients. These formulas are generalized to nonhomogeneous
equations in § 2.3. In § 2.4 we solve special second order equations, which include Newton’s
equations in the case that the force depends only on the position function. In this case we see that
the mechanical energy of the system is conserved. We also present in more detail the Reduction
Order Method to find a new solution of a second order equation if we already know one solution of
that equation.


2.1. General Properties


In this section we introduce second order linear differential equations. We start with a general
definition and a few examples from physics. We then show Theorem 2.1.3, which says that there are
solutions to second order linear equations when the equation coefficients are continuous functions.
Furthermore, these solutions have two free parameters that can be fixed by appropriate initial
conditions.
We then focus on a particular type of equations, which have no sources—called homogeneous
equations. We show that these equations satisfy the superposition property—the linear combination
of two solutions is also a solution. This property is important to prove our second main result,
Theorem 2.1.8, which is the closest we can get to a formula for solutions to second order linear
homogeneous equations. This theorem says that to know all solutions to these equations we only
need to know two solutions that are not proportional to each other.

2.1.1. Definitions and Examples. We start with a definition of second order linear differ-
ential equations. After a few examples we state the first of the main results, Theorem 2.1.3, about
existence and uniqueness of solutions to an initial value problem in the case that the equation
coefficients are continuous functions.

Definition 2.1.1. A second order differential equation for y(t) is

y 00 = f (t, y, y 0 ). (2.1.1)

The equation (2.1.1) is linear, non-homogeneous iff

y 00 + a1 (t) y 0 + a0 (t) y = b(t), (2.1.2)

where a1 , a0 , b are given functions on the interval I ⊂ R. The equation (2.1.2):


(a) is homogeneous iff the source b(t) = 0 for all t ∈ R;
(b) has constant coefficients iff a1 and a0 are constants;
(c) has variable coefficients iff either a1 or a0 is not constant.

Remark: The notion of a homogeneous equation presented here is different from the Euler
homogeneous equations we studied in § 1.4.

Example 2.1.1.
(a) A second order, linear, homogeneous, constant coefficients equation is

y 00 + 5y 0 + 6y = 0.

(b) A second order, linear, nonhomogeneous, constant coefficients, equation is

y 00 − 3y 0 + y = cos(3t).

(c) A second order, linear, nonhomogeneous, variable coefficients equation is

y 00 + 2t y 0 − ln(t) y = e3t .

(d) A second order, non-linear equation is

y 00 + 2t y 0 − ln(t) y 2 = e3t .

C

Maybe the most famous example of a second order differential equation is Newton’s second law
of motion.

Example 2.1.2 (Newton’s Second Law of Motion). Newton’s law of motion for a point particle
of mass m moving in one space dimension under a force f is ma = f , where the acceleration is
a = y 00 , the second time derivative of the position function y, that is
m y 00 (t) = f (t, y(t), y 0 (t)).
The force f can be any function of time t, the position y, and the velocity y 0 of the particle.
C

Example 2.1.3 (Mass-Spring. No Friction).

Consider a mass hanging at the bottom of a spring. Set y to be a vertical coordinate, with y = 0
at the equilibrium position of the mass-spring system. Then Hooke's Law states that the force
exerted by the spring on the mass is proportional to the stretching distance y and points in the
direction opposite to the stretching,
f = −k y,    k > 0.
Newton's equation for this system, m y'' = f, is
m y'' = −k y   ⇒   m y'' + k y = 0.

Figure 1. Mass-Spring System: the mass at the equilibrium position y = 0 and at a displaced position y(t).

Example 2.1.4 (Mass-Spring with Friction). Consider a mass hanging at the bottom of a spring as
described in the example above. Suppose now that the whole system is oscillating inside a water
bath. In this case an extra force appears, the friction between the oscillating mass and the water,
given by
fd = −d y',    d > 0.
This friction, or damping force, opposes the movement. Then, Newton’s equation, m y 00 = f , in
this case is
m y 00 = −k y − d y 0 ⇒ m y 00 + d y 0 + k y = 0.
Therefore, a second order linear differential equation with positive constant coefficients m, d, and
k can always be identified with Newton’s equation for a spring having spring constant k, moving
in one space dimension, having a mass m attached, in a medium with damping constant d.

2.1.2. Conservation of the Energy. In the particular case that the force acting on a particle
depends only on the particle’s position, there is an integrating factor for Newton’s equation of
motion. The integrating factor happens to be the particle’s velocity. When we multiply Newton’s
equation by the particle's velocity, Newton's equation becomes the total derivative of a function,
which is then trivial to integrate. This function, which depends on the position and velocity of the
particle, is called the mechanical energy of the particle. Therefore, this mechanical energy of the particle
remains constant along the motion.

Theorem 2.1.2 (Conservation of the Mechanical Energy). If the force on a particle depends neither
on time nor on the particle's velocity, that is,
f = f(t, y, y') = f(y),
then any solution of Newton's second law of motion
m y'' = f(y)
has the mechanical energy conserved in time, where the mechanical energy is
E(t) = (1/2) m v²(t) + V(y(t)),
with v(t) = y'(t), and where
V(y) = −∫ f(y) dy
is the potential energy of the particle.

Proof of Theorem 2.1.2: Since the force on the particle depends only on the particle position,
f = f(y), we can always compute its (negative) antiderivative,
V(y) = −∫ f(y) dy   ⇒   dV/dy = −f.
The function V is called the potential energy of the particle. If we write Newton's law of motion
m y'' = f in terms of the potential energy we get
m y'' = −V'(y),
where prime means derivative with respect to the function argument, hence
y'(t) = dy/dt,    V'(y) = dV/dy.
Now multiply Newton's equation by the particle velocity, y',
m y' y'' = −V'(y) y'.
The chain rule for derivatives of a composition of functions says that
m y'(t) y''(t) = d/dt [ (1/2) m (y'(t))² ]   and   −V'(y(t)) y'(t) = −d/dt [ V(y(t)) ].
Therefore, Newton's equation can be written as a total derivative,
d/dt [ (1/2) m (y')² + V(y) ] = 0.
If we introduce the mechanical energy
E(t) = (1/2) m (y'(t))² + V(y(t)),
then Newton's equation is
E'(t) = 0   ⇒   E(t) = E(0),
that is, the mechanical energy is conserved along the motion of the particle, and it is equal to its
initial value. This establishes the Theorem. □

Example 2.1.5 (Mass-Spring System Undamped). Consider an object of mass m hanging at the
bottom of a spring. In Fig. 1 we see the object hanging at the resting position y = 0 and we also
see the same object oscillating at a position y(t). It is known that the force created by the spring
when stretched to the position y is f(y) = −k y, where k > 0 is a constant that determines how
strong the spring is. Use this to write Newton's law of motion and the energy conservation for this
system.
Solution: Newton's second law of motion is ma = f, so in this case we get the differential equation
m y'' = −k y   ⇒   m y'' + k y = 0.
Instead of using the formula for the mechanical energy, we derive that formula again for this system.
Multiply Newton's equation by the velocity y' and recall the chain rule for derivatives,
m y' y'' + k y y' = 0   ⇒   m d/dt [ (y')²/2 ] + k d/dt [ y²/2 ] = 0   ⇒   d/dt [ (m/2)(y')² + (k/2) y² ] = 0.
Therefore, the mechanical energy for a spring is
E(t) = (1/2) m (y')² + (1/2) k y²,
and this energy is conserved along the motion,
E(t) = E(0).
The term K = (1/2) m (y')² is called the kinetic energy of the particle, while the term V = (1/2) k y²
is called the potential energy of the particle. C
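Remark: The conservation of the energy derived above can also be checked numerically. The short Python script below is only an illustrative sketch, not part of the text; it assumes that NumPy and SciPy are available, and the values of m, k and of the initial data are chosen arbitrarily.

    # Integrate m y'' + k y = 0 and verify that E(t) = (1/2) m (y')^2 + (1/2) k y^2
    # stays constant along the computed solution.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, k = 1.0, 4.0                      # illustrative values
    def rhs(t, u):                       # u = (y, y')
        y, v = u
        return [v, -(k / m) * y]

    sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], dense_output=True,
                    rtol=1e-10, atol=1e-12)
    t = np.linspace(0.0, 10.0, 200)
    y, v = sol.sol(t)
    E = 0.5 * m * v**2 + 0.5 * k * y**2
    print(E.max() - E.min())             # very close to zero: E is conserved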

Example 2.1.6 (Mass-Spring System Undamped). An object of mass 10 grams hangs from a
spring with spring constant 20 grams per second squared. If the object is initially at rest and the
spring is stretched 10 centimeters, find the maximum speed of the object when it is oscillating.
Solution: We know that the differential equation describing the object movement is
m y'' + k y = 0,    m = 10,  k = 20.
Unfortunately, we do not know how to solve this differential equation yet. Fortunately, we do
not need to solve this equation to answer the question above, because this system has a conserved
energy,
E(t) = E(0),
where
E(t) = (1/2) m (y')² + (1/2) k y².
Using the data of the problem we get
E(t) = 5 (y'(t))² + 10 (y(t))².
Since at the initial time t = 0 we have
y(0) = 10,  y'(0) = 0   ⇒   E(0) = 5 (y'(0))² + 10 (y(0))² = 0 + 1000   ⇒   E(0) = 1000.
Since the energy is conserved we get that
5 (y')² + 10 y² = 1000    for all t.
The speed is maximum when y² takes its lowest possible value, that is, when y = 0, when the object
passes through the equilibrium position. At that point the speed is given by
5 (y'max)² + 10 (0)² = 1000   ⇒   |y'max| = 10 √2.
C
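Remark: The result of Example 2.1.6 can be checked numerically, assuming NumPy and SciPy are available; the sketch below integrates the equation and compares the largest computed speed with 10 √2.

    # Integrate 10 y'' + 20 y = 0 with y(0) = 10, y'(0) = 0 and find the maximum speed.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, k = 10.0, 20.0
    sol = solve_ivp(lambda t, u: [u[1], -(k / m) * u[0]], (0.0, 20.0),
                    [10.0, 0.0], max_step=0.001)
    print(np.abs(sol.y[1]).max())        # approximately 14.14
    print(10 * np.sqrt(2.0))             # 14.142135...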

Example 2.1.7 (Mass-Spring with Damping). Consider an object of mass m > 0 hanging from a
spring with constant k > 0 oscillating in a viscous liquid with damping constant d > 0. It is known
that in this case there is a friction force on the object, also called a damping force. This force is
proportional and opposite to the object’s velocity,
fd = −d y 0 .
Write Newton’s equation of motion for this case and determine whether the energy is conserved or
not.
Solution: In this case, Newton’s second law of motion is given by the terms we had in the previous
example plus the damping force,
m y 00 = −k y − d y 0 . ⇒ m y 00 + d y 0 + k y = 0.
The mechanical energy is not conserved in this case due to the damping force. Indeed, if we multiply
Newton’s equation by the particle velocity y 0 we get
1 d 1 d
m y 0 y 00 + d (y 0 )2 + k y y 0 = 0 ⇒ m (y 0 )2 + k y 2 = −d (y 0 )2 ,
2 dt 2 dt
88 2. SECOND ORDER LINEAR EQUATIONS

that is
d 1 1 
m (y 0 )2 + k y 2 = −d (y 0 )2 6 0
dt 2 2
1 1
the mechanical energy E(t) = m (y 0 )2 + k y 2 satisfies that
2 2
E 0 (t) = −d (y 0 )2 6 0
which means that the mechanical energy is a decreasing function of time. Since the mechanical
energy is a nonnegative function, it may approach zero as time grows. C

Remark: One of the main principles in physics is that energy cannot be created nor destroyed.
This principle is not violated by our previous example. When an object oscillates in a viscous
liquid the mechanical energy decreases because it is transformed into a different type of energy,
heat. The liquid and the object increase their temperature. Since we are not taking into account
the thermal energy in our previous example, we only mention that the mechanical energy of the
object decreases.
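Remark: The decay of the mechanical energy can be observed numerically as well. The sketch below, assuming NumPy and SciPy are available and using arbitrary illustrative constants, integrates m y'' + d y' + k y = 0 and checks that the computed energy never increases.

    # Integrate m y'' + d y' + k y = 0 and monitor E(t) = (1/2) m (y')^2 + (1/2) k y^2.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, d, k = 1.0, 0.5, 4.0              # illustrative values
    sol = solve_ivp(lambda t, u: [u[1], -(d * u[1] + k * u[0]) / m], (0.0, 20.0),
                    [1.0, 0.0], max_step=0.01, rtol=1e-10, atol=1e-12)
    E = 0.5 * m * sol.y[1]**2 + 0.5 * k * sol.y[0]**2
    print(E[0], E[-1])                   # the energy has decreased
    print(np.diff(E).max())              # never increases beyond round-off errors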

Example 2.1.8 (Mass with Damping). A bullet of mass m grams is shot vertically down with an
initial speed of y'(0) = v0 meters per second into a viscous fluid with damping constant d grams
per second. How long does it take until the kinetic energy of the bullet is 1% of the initial kinetic
energy? (That is, until the bullet has practically stopped.)
Solution: At the speeds we consider in this problem the force of gravity can be discarded, and the
only force acting on the bullet is the friction force fd = −d y'. Newton's second law of motion says
that
m y'' = −d y'.
We can obtain a formula for the kinetic energy if we multiply Newton's equation by y',
m y' y'' = −d (y')²   ⇒   d/dt [ (1/2) m (y')² ] = −d (y')²,
which implies
d/dt [ (1/2) m (y')² ] = −(2d/m) [ (1/2) m (y')² ]   ⇒   dK/dt = −(2d/m) K,
where K = (m/2)(y')² is the bullet's kinetic energy. So this kinetic energy satisfies the differential
equation
K' + (2d/m) K = 0.
The solution of this equation is
K(t) = K(0) e^{−(2d/m) t}.
We need to find a time t1 such that K(t1) = K(0)/100, that is,
K(0)/100 = K(0) e^{−(2d/m) t1}   ⇒   ln(1/100) = −(2d/m) t1   ⇒   t1 = (m/(2d)) ln(100).
C
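Remark: The stopping time t1 = (m/(2d)) ln(100) can be evaluated for concrete numbers. The values of m, d and v0 below are illustrative only, not prescribed by the example; NumPy is assumed to be available.

    # Evaluate t1 = (m/(2d)) ln(100) and check that K(t1) = K(0)/100.
    import numpy as np

    m, d, v0 = 10.0, 2.0, 300.0          # illustrative values
    t1 = (m / (2 * d)) * np.log(100.0)
    K0 = 0.5 * m * v0**2
    K1 = K0 * np.exp(-(2 * d / m) * t1)
    print(t1)                            # about 11.51
    print(K1 / K0)                       # 0.01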

Example 2.1.9 (Mass Falling on Earth). An object of mass 5 kilograms is thrown vertically from the
ground under the action of the earth's gravitational acceleration g = 10 meters per second squared.
Denote by y the vertical coordinate, positive upwards, and let y = 0 be at the earth's surface. If the
initial position of the object is y(0) = 0 meters and its initial velocity is y'(0) = 5 meters per
second, find the maximum altitude ymax achieved by the object.
Solution: We know that the differential equation describing the object movement is
m y'' = −m g,    m = 5,  g = 10.
Although we know how to solve this differential equation, we can solve this problem using only the
mechanical energy of this system. Multiply Newton's equation by y',
m y' y'' + m g y' = 0   ⇒   d/dt [ (1/2) m (y')² + m g y ] = 0,
therefore the energy
E(t) = (1/2) m (y')² + m g y
is conserved along the motion, that is,
E(t) = E(0).
Using the initial data of the problem, y(0) = 0 and y'(0) = 5, we get the initial energy
E(0) = (1/2) 5 · 5² + 0   ⇒   E(0) = 125/2.
By looking at the energy we see that the maximum altitude is achieved at a time tmax when the
object velocity vanishes, therefore
E(tmax) = E(0) = 125/2   ⇒   0 + m g ymax = 125/2   ⇒   ymax = 125/100   ⇒   ymax = 5/4.
C
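Remark: Since the mass cancels in the energy balance, the maximum altitude is ymax = v0²/(2g). A one-line check in Python (plain arithmetic, no extra libraries assumed):

    # Maximum altitude from energy conservation: y_max = v0^2 / (2 g).
    v0, g = 5.0, 10.0
    print(v0**2 / (2 * g))               # 1.25, that is, 5/4 meters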

2.1.3. Existence and Uniqueness of Solutions. Here is the first of the two main results in
this section. Second order linear differential equations have solutions in the case that the equation
coefficients are continuous functions. And the solution of the equation is unique when we specify two
initial conditions. The latter means that the general solution must have two arbitrary integration
constants.

Theorem 2.1.3 (Existence and Uniqueness). If the functions a1 , a0 , b are continuous on a closed
interval I ⊂ R, the constant t0 ∈ I, and y0 , y1 ∈ R are arbitrary constants, then there is a unique
solution y, defined on I, of the initial value problem
y 00 + a1 (t) y 0 + a0 (t) y = b(t), y(t0 ) = y0 , y 0 (t0 ) = y1 . (2.1.3)

Remark: The fixed point argument used in the proof of the Picard–Lindelöf Theorem in § 1.6 can be
extended to prove Theorem 2.1.3.

Example 2.1.10. Find the domain of the solution of the initial value problem
(t − 1) y'' − 3t y' + [4(t − 1)/(t − 3)] y = t(t − 1),    y(2) = 1,  y'(2) = 0.

Solution: We first write the equation above in the form given in Theorem 2.1.3,
y'' − [3t/(t − 1)] y' + [4/(t − 3)] y = t.
The equation coefficients are defined on the domain
(−∞, 1) ∪ (1, 3) ∪ (3, ∞),
which means that the solution may not be defined at t = 1 or t = 3. That is, we know for sure
that the solution is defined on
(−∞, 1) or (1, 3) or (3, ∞).
Since the initial condition is at t0 = 2 ∈ (1, 3), the domain where we know for sure that the solution
is defined is
D = (1, 3).
C

Remark: It is not clear whether the solution in the example above can be extended to a larger
domain than (1, 3); it may or may not be. What Theorem 2.1.3 guarantees is that the solution
exists on the domain (1, 3).

2.1.4. Properties of Homogeneous Equations. We now focus on a particular type of
equation, homogeneous equations, with the hope of finding deeper properties of their solutions.
Until the end of this section we focus on homogeneous equations only. We will get back to non-
homogeneous equations in a later section. But before getting into homogeneous equations, we
introduce a new notation to write differential equations. This is a shorter, more economical, nota-
tion. Given two functions a1, a0, introduce the function L acting on a function y as follows,
L(y) = y 00 + a1 (t) y 0 + a0 (t) y. (2.1.4)
The function L acts on the function y and the result is another function, given by Eq. (2.1.4).

Example 2.1.11. Compute the operator L(y) = t y'' + 2y' − (8/t) y acting on y(t) = t³.
Solution: Since y(t) = t³, then y'(t) = 3t² and y''(t) = 6t, hence
L(t³) = t (6t) + 2(3t²) − (8/t) t³ = 6t² + 6t² − 8t²   ⇒   L(t³) = 4t².
The function L acts on the function y(t) = t³ and the result is the function L(t³) = 4t². C
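Remark: The computation above can be verified symbolically, assuming the SymPy library is available.

    # Symbolic check: L(y) = t y'' + 2 y' - (8/t) y applied to y = t^3 gives 4 t^2.
    import sympy as sp

    t = sp.symbols('t', positive=True)
    y = t**3
    L = t * sp.diff(y, t, 2) + 2 * sp.diff(y, t) - (8 / t) * y
    print(sp.simplify(L))                # 4*t**2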

The function L above is called an operator, to emphasize that L is a function that acts on other
functions, instead of acting on numbers, as the functions we are used to. The operator L above is
also called a differential operator, since L(y) contains derivatives of y. These operators are useful
to write differential equations in a compact notation, since
y 00 + a1 (t) y 0 + a0 (t) y = f (t)
can be written using the operator L(y) = y 00 + a1 (t) y 0 + a0 (t) y as
L(y) = f.
An important type of operators are the linear operators.

Definition 2.1.4. An operator L is a linear operator iff for every pair of functions y1 , y2 and
constants c1 , c2 holds
L(c1 y1 + c2 y2 ) = c1 L(y1 ) + c2 L(y2 ). (2.1.5)

In this Section we work with linear operators, as the following result shows.

Theorem 2.1.5 (Linear Operator). The operator L(y) = y 00 + a1 y 0 + a0 y, where a1 , a0 are con-
tinuous functions and y is a twice differentiable function, is a linear operator.

Proof of Theorem 2.1.5: This is a straightforward calculation:
L(c1 y1 + c2 y2) = (c1 y1 + c2 y2)'' + a1 (c1 y1 + c2 y2)' + a0 (c1 y1 + c2 y2).
Recall that differentiation is a linear operation, and then reorder terms in the following way,
L(c1 y1 + c2 y2) = (c1 y1'' + a1 c1 y1' + a0 c1 y1) + (c2 y2'' + a1 c2 y2' + a0 c2 y2).
Introduce the definition of L back on the right-hand side. We then conclude that
L(c1 y1 + c2 y2) = c1 L(y1) + c2 L(y2).
This establishes the Theorem. □
The linearity of an operator L translates into the superposition property of the solutions to
the homogeneous equation L(y) = 0.

Theorem 2.1.6 (Superposition). If L is a linear operator and y1 , y2 are solutions of the homoge-
neous equations L(y1 ) = 0, L(y2 ) = 0, then for every constants c1 , c2 holds
L(c1 y1 + c2 y2 ) = 0.

Remarks:
(a) This result is not true for nonhomogeneous equations. Prove it.
(b) The linearity of an operator L and the superposition property of solutions of the equation
L(y) = 0 are deeply connected, like two sides of the same coin.

Proof of Theorem 2.1.6: Verify that the function y = c1 y1 + c2 y2 satisfies L(y) = 0 for every
constants c1 , c2 , that is,
L(y) = L(c1 y1 + c2 y2 ) = c1 L(y1 ) + c2 L(y2 ) = c1 0 + c2 0 = 0.
This establishes the Theorem. 
We now introduce the notion of linearly dependent and linearly independent functions.

Definition 2.1.7. Two functions y1 , y2 are called linearly dependent iff they are proportional.
Otherwise, the functions are linearly independent.

Remarks:
(a) Two functions y1 , y2 are proportional iff there is a constant c such that for all t holds
y1 (t) = c y2 (t).
(b) The function y1 = 0 is proportional to every other function y2 , since
y1 = 0 = 0 y2 .

The definitions of linearly dependent or independent functions found in the literature are
equivalent to the definition given here, but they are worded in a slightly different way. Often in the
literature, two functions are called linearly dependent on the interval I iff there exist constants c1 ,
c2 , not both zero, such that for all t ∈ I holds
c1 y1 (t) + c2 y2 (t) = 0.
Two functions are called linearly independent on the interval I iff they are not linearly dependent,
that is, the only constants c1 and c2 that for all t ∈ I satisfy the equation
c1 y1 (t) + c2 y2 (t) = 0
are the constants c1 = c2 = 0. This wording makes it simple to generalize these definitions to an
arbitrary number of functions.

Example 2.1.12.
(a) Show that y1 (t) = sin(t), y2 (t) = 2 sin(t) are linearly dependent.
(b) Show that y1 (t) = sin(t), y2 (t) = t sin(t) are linearly independent.

Solution:
Part (a): This is trivial, since 2y1 (t) − y2 (t) = 0.
Part (b): Find constants c1, c2 such that for all t ∈ R holds
c1 sin(t) + c2 t sin(t) = 0.
Evaluating at t = π/2 and t = 3π/2 we obtain
c1 + (π/2) c2 = 0,    c1 + (3π/2) c2 = 0   ⇒   c1 = 0,  c2 = 0.
We conclude: The functions y1 and y2 are linearly independent. C

We now introduce the second main result in this section. If you know two linearly independent
solutions to a second order linear homogeneous differential equation, then you actually know all
possible solutions to that equation. Any other solution is just a linear combination of the previous
two solutions. We repeat that the equation must be homogeneous. This is the closest we can get to
a general formula for solutions to second order linear homogeneous differential equations.

Theorem 2.1.8 (General Solution). If y1 and y2 are linearly independent solutions of the equation
L(y) = 0 on an interval I ⊂ R, where L(y) = y'' + a1 y' + a0 y, and a1, a0 are continuous functions
on I, then there are unique constants c1 , c2 such that every solution y of the differential equation
L(y) = 0 on I can be written as a linear combination
y(t) = c1 y1 (t) + c2 y2 (t).

Before we prove Theorem 2.1.8, it is convenient to state the following definitions, which
come naturally from this Theorem.
Definition 2.1.9.
(a) The functions y1 and y2 are fundamental solutions of the equation L(y) = 0 iff y1 , y2 are
linearly independent and
L(y1 ) = 0, L(y2 ) = 0.
(b) The general solution of the homogeneous equation L(y) = 0 is a two-parameter family of
functions ygen given by
ygen (t) = c1 y1 (t) + c2 y2 (t),
where the arbitrary constants c1 , c2 are the parameters of the family, and y1 , y2 are fundamental
solutions of L(y) = 0.

Example 2.1.13. Show that y1 = et and y2 = e−2t are fundamental solutions to the equation
y 00 + y 0 − 2y = 0.

Solution: We first show that y1 and y2 are solutions to the differential equation, since
L(y1) = y1'' + y1' − 2y1 = e^t + e^t − 2e^t = (1 + 1 − 2) e^t = 0,
L(y2) = y2'' + y2' − 2y2 = 4 e^{−2t} − 2 e^{−2t} − 2 e^{−2t} = (4 − 2 − 2) e^{−2t} = 0.
It is not difficult to see that y1 and y2 are linearly independent. It is clear that they are not
proportional to each other. A proof of that statement is the following: Find the constants c1 and
c2 such that
0 = c1 y1 + c2 y2 = c1 e^t + c2 e^{−2t}   for all t ∈ R   ⇒   0 = c1 e^t − 2 c2 e^{−2t}.
The second equation is the derivative of the first one. Take t = 0 in both equations,
0 = c1 + c2,    0 = c1 − 2c2   ⇒   c1 = c2 = 0.
We conclude that y1 and y2 are fundamental solutions to the differential equation above. C
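Remark: The computations in this example can be verified with SymPy, assuming that library is available. The sketch checks that L(y1) = 0, L(y2) = 0, and that the Wronskian of y1, y2 (a function introduced in Subsection 2.1.5 below) never vanishes.

    # Check that y1 = e^t and y2 = e^{-2t} solve y'' + y' - 2y = 0 and are not proportional.
    import sympy as sp

    t = sp.symbols('t')
    y1, y2 = sp.exp(t), sp.exp(-2 * t)
    L = lambda y: sp.diff(y, t, 2) + sp.diff(y, t) - 2 * y
    W = sp.simplify(y1 * sp.diff(y2, t) - sp.diff(y1, t) * y2)
    print(sp.simplify(L(y1)), sp.simplify(L(y2)))   # 0 0
    print(W)                                        # -3*exp(-t), never zero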

Remark: The fundamental solutions to the equation above are not unique. For example, show
that another set of fundamental solutions to the equation above is given by
y1(t) = (2/3) e^t + (1/3) e^{−2t},    y2(t) = (1/3) (e^t − e^{−2t}).
To prove Theorem 2.1.8 we need to introduce the Wronskian function and to verify some of its
properties. The Wronskian function is studied in the following Subsection and Abel’s Theorem is
proved. Once that is done we can say that the proof of Theorem 2.1.8 is complete.
Proof of Theorem 2.1.8: We need to show that, given any fundamental solution pair, y1 , y2 ,
any other solution y to the homogeneous equation L(y) = 0 must be a unique linear combination
of the fundamental solutions,
y(t) = c1 y1 (t) + c2 y2 (t), (2.1.6)
for appropriately chosen constants c1 , c2 .

First, the superposition property implies that the function y above is solution of the homoge-
neous equation L(y) = 0 for every pair of constants c1 , c2 .
Second, given a function y, if there exist constants c1, c2 such that Eq. (2.1.6) holds, then these
constants are unique. The reason is that the functions y1, y2 are linearly independent. This can be
seen from the following argument. If there is another pair of constants c̃1, c̃2 so that
y(t) = c̃1 y1(t) + c̃2 y2(t),
then subtract the expression above from Eq. (2.1.6),
0 = (c1 − c̃1) y1 + (c2 − c̃2) y2   ⇒   c1 − c̃1 = 0,  c2 − c̃2 = 0,
where we used that y1, y2 are linearly independent. This second part of the proof can be obtained
from the third part below, but I think it is better to highlight it here.
So we only need to show that the expression in Eq. (2.1.6) contains all solutions. We need
to show that we are not missing any other solution. In this third part of the argument enters
Theorem 2.1.3. This Theorem says that, in the case of homogeneous equations, the initial value
problem
L(y) = 0, y(t0 ) = d1 , y 0 (t0 ) = d2 ,
always has a unique solution. That means, a good parametrization of all solutions to the differential
equation L(y) = 0 is given by the two constants, d1 , d2 in the initial condition. To finish the proof
of Theorem 2.1.8 we need to show that the constants c1 and c2 are also good to parametrize all
solutions to the equation L(y) = 0. One way to show this, is to find an invertible map from the
constants d1 , d2 , which we know parametrize all solutions, to the constants c1 , c2 . The map itself
is simple to find,
d1 = c1 y1 (t0 ) + c2 y2 (t0 )
d2 = c1 y10 (t0 ) + c2 y20 (t0 ).
We now need to show that this map is invertible. From linear algebra we know that this map acting
on c1, c2 is invertible iff the determinant of the coefficient matrix is nonzero,
det [ y1(t0)  y2(t0) ; y1'(t0)  y2'(t0) ] = y1(t0) y2'(t0) − y1'(t0) y2(t0) ≠ 0.
This leads us to investigate the function
W12(t) = y1(t) y2'(t) − y1'(t) y2(t).
This function is called the Wronskian of the two functions y1, y2. At the end of this section we
prove Theorem 2.1.14, which says the following: If y1, y2 are fundamental solutions of L(y) = 0 on
I ⊂ R, then W12(t) ≠ 0 on I. This statement establishes the Theorem. □
2.1.5. The Wronskian Function. We now introduce a function that provides important
information about the linear dependency of two functions y1, y2. This function, W, is called the
Wronskian, to honor the Polish scientist Josef Wronski, who first introduced this function in 1821
while studying a different problem.

Definition 2.1.10. The Wronskian of the differentiable functions y1, y2 is the function
W12(t) = y1(t) y2'(t) − y1'(t) y2(t).

Remark: Introducing the matrix valued function A(t) = [ y1(t)  y2(t) ; y1'(t)  y2'(t) ], the Wronskian
can be written using the determinant of that 2 × 2 matrix, W12(t) = det(A(t)). An alternative
notation is W12 = det [ y1  y2 ; y1'  y2' ].

Example 2.1.14. Find the Wronskian of the functions:


(a) y1 (t) = sin(t) and y2 (t) = 2 sin(t). (ld)
(b) y1 (t) = sin(t) and y2 (t) = t sin(t). (li)

Solution:
Part (a): By the definition of the Wronskian:
W12(t) = y1(t) y2'(t) − y1'(t) y2(t) = sin(t) 2 cos(t) − cos(t) 2 sin(t).
We conclude that W12(t) = 0. Notice that y1 and y2 are linearly dependent.
Part (b): Again, by the definition of the Wronskian:
W12(t) = sin(t) ( sin(t) + t cos(t) ) − cos(t) t sin(t).
We conclude that W12(t) = sin²(t). Notice that y1 and y2 are linearly independent. C
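Remark: The Wronskians above can also be computed with SymPy, assuming that library is available.

    # Wronskians of Example 2.1.14.
    import sympy as sp

    t = sp.symbols('t')
    def wronskian(y1, y2):
        return sp.simplify(y1 * sp.diff(y2, t) - sp.diff(y1, t) * y2)

    print(wronskian(sp.sin(t), 2 * sp.sin(t)))      # 0
    print(wronskian(sp.sin(t), t * sp.sin(t)))      # sin(t)**2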

It is simple to prove the following relation between the Wronskian of two functions and the
linear dependency of these two functions.

Theorem 2.1.11 (Wronskian I). If y1 , y2 are linearly dependent on I ⊂ R, then


W12 = 0 on I.

Proof of Theorem 2.1.11: Since the functions y1 , y2 are linearly dependent, there exists a nonzero
constant c such that y1 = c y2 ; hence holds,
W12 = y1 y20 − y10 y2 = (c y2 ) y20 − (c y2 )0 y2 = 0.
This establishes the Theorem. 
Remark: The converse statement to Theorem 2.1.11 is false. If W12 (t) = 0 for all t ∈ I, that does
not imply that y1 and y2 are linearly dependent.

Example 2.1.15. Show that the functions


y1 (t) = t2 , and y2 (t) = |t| t, for t∈R
are linearly independent and have Wronskian W12 = 0.
Solution:
First, these functions are linearly independent, since y1(t) = −y2(t) for t < 0, but y1(t) = y2(t)
for t > 0. So there is no constant c such that y1(t) = c y2(t) for all t ∈ R.
Second, their Wronskian vanishes on R. This is simple to see: since y1(t) = −y2(t) for t < 0,
then W12 = 0 for t < 0; since y1(t) = y2(t) for t > 0, then W12 = 0 for t > 0. Finally, it is not
difficult to see that W12(0) = 0. C

Remark: Often in the literature one finds the contrapositive of Theorem 2.1.11, which is equivalent
to Theorem 2.1.11, and which we summarize in the following Corollary.

Corollary 2.1.12 (Wronskian I). If the Wronskian W12 (t0 ) 6= 0 at a point t0 ∈ I, then the functions
y1 , y2 defined on I are linearly independent.

The results mentioned above provide different properties of the Wronskian of two functions.
But none of these results is what we need to finish the proof of Theorem 2.1.8. In order to finish
that proof we need one more result, Abel’s Theorem.

2.1.6. Abel’s Theorem. We now show that the Wronskian of two solutions of a differential
equation satisfies a differential equation of its own. This result is known as Abel’s Theorem.

Theorem 2.1.13 (Abel). If y1 , y2 are twice continuously differentiable solutions of


y 00 + a1 (t) y 0 + a0 (t) y = 0, (2.1.7)
where a1, a0 are continuous on I ⊂ R, then the Wronskian W12 satisfies
W12' + a1(t) W12 = 0.
Therefore, for any t0 ∈ I, the Wronskian W12 is given by the expression
W12(t) = W12(t0) e^{−A1(t)},
where A1(t) = ∫_{t0}^{t} a1(s) ds.

Proof of Theorem 2.1.13: We start computing the derivative of the Wronskian function,
W12' = (y1 y2' − y1' y2)' = y1 y2'' − y1'' y2.
Recall that both y1 and y2 are solutions to Eq. (2.1.7), meaning,
y1'' = −a1 y1' − a0 y1,    y2'' = −a1 y2' − a0 y2.
Replace these expressions in the formula for W12' above,
W12' = y1 (−a1 y2' − a0 y2) − (−a1 y1' − a0 y1) y2   ⇒   W12' = −a1 (y1 y2' − y1' y2).
So we obtain the equation
W12' + a1(t) W12 = 0.
This equation for W12 is a first order linear equation; its solution can be found using the method
of integrating factors, given in Section 1.5, which results in the expression given in Theorem 2.1.13.
This establishes the Theorem. □
We now show one application of Abel’s Theorem.

Example 2.1.16. Find the Wronskian of two solutions of the equation
t² y'' − t(t + 2) y' + (t + 2) y = 0,    t > 0.

Solution: Notice that we do not know the explicit expression for the solutions. Nevertheless,
Theorem 2.1.13 says that we can compute their Wronskian. First, we have to rewrite the differential
equation in the form given in that Theorem, namely,
y'' − (2/t + 1) y' + (2/t² + 1/t) y = 0.
Then, Theorem 2.1.13 says that the Wronskian satisfies the differential equation
W12'(t) − (2/t + 1) W12(t) = 0.
This is a first order, linear equation for W12, so its solution can be computed using the method of
integrating factors. That is, first compute the integral
∫_{t0}^{t} −(2/s + 1) ds = −2 ln(t/t0) − (t − t0) = ln(t0²/t²) − (t − t0).
Then, the integrating factor μ is given by
μ(t) = (t0²/t²) e^{−(t−t0)},
which satisfies the condition μ(t0) = 1. So the solution W12 is given by
( μ(t) W12(t) )' = 0   ⇒   μ(t) W12(t) − μ(t0) W12(t0) = 0,
so the solution is
W12(t) = W12(t0) (t²/t0²) e^{t−t0}.
If we call the constant c = W12(t0)/[t0² e^{t0}], then the Wronskian has the simpler form
W12(t) = c t² e^t.
C
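Remark: One can check symbolically, assuming SymPy is available, that W(t) = c t² e^t indeed satisfies the Wronskian equation given by Abel's Theorem for this example.

    # Check that W(t) = c t^2 e^t satisfies W' - (2/t + 1) W = 0.
    import sympy as sp

    t, c = sp.symbols('t c', positive=True)
    W = c * t**2 * sp.exp(t)
    print(sp.simplify(sp.diff(W, t) - (2 / t + 1) * W))   # 0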

We now state and prove the statement we need to complete the proof of Theorem 2.1.8.

Theorem 2.1.14 (Wronskian II). If y1 , y2 are fundamental solutions of L(y) = 0 on I ⊂ R, then


W12 (t) 6= 0 on I.

Remark: Instead of proving the Theorem above, we prove an equivalent statement, its contrapositive.

Corollary 2.1.15 (Wronskian II). If y1 , y2 are solutions of L(y) = 0 on I ⊂ R and there is a


point t1 ∈ I such that W12 (t1 ) = 0, then y1 , y2 are linearly dependent on I.

Proof of Corollary 2.1.15: We know that y1, y2 are solutions of L(y) = 0. Then, Abel's Theorem
says that their Wronskian W12 is given by
W12(t) = W12(t0) e^{−A1(t)},
for any t0 ∈ I. Choosing the point t0 to be t1, the point where by hypothesis W12(t1) = 0, we get
that
W12(t) = 0    for all t ∈ I.
Knowing that the Wronskian vanishes identically on I, we can write
y1 y2' − y1' y2 = 0
on I. If either y1 or y2 is the zero function, then the set is linearly dependent. So we can assume
that both are not identically zero. Let us assume there exists t2 ∈ I such that y1(t2) ≠ 0. By
continuity, y1 is nonzero in an open neighborhood I1 ⊂ I of t2. So in that neighborhood we can
divide the equation above by y1²,
(y1 y2' − y1' y2)/y1² = 0   ⇒   (y2/y1)' = 0   ⇒   y2/y1 = c   on I1,
where c ∈ R is a constant. So we conclude that y2 is proportional to y1 on the open set I1.
That means that the function y(t) = y2(t) − c y1(t) satisfies
L(y) = 0,    y(t2) = 0,    y'(t2) = 0.
Therefore, the existence and uniqueness Theorem 2.1.3 says that y(t) = 0 for all t ∈ I. This finally
shows that y1 and y2 are linearly dependent. This establishes the Corollary. □

2.1.7. Exercises.

2.1.1.- Find the longest interval where the solution y of the initial value problems below is defined.
(Do not try to solve the differential equations.)

(a) t2 y 00 + 6y = 2t, y(1) = 2, y 0 (1) = 3.


(b) (t − 6) y'' + 3t y' − y = 1,    y(3) = −1,  y'(3) = 2.

2.1.2.- (a) Verify that y1(t) = t² and y2(t) = 1/t are solutions to the differential equation
t² y'' − 2y = 0,    t > 0.
(b) Show that y(t) = a t² + b/t is a solution of the same equation for all constants a, b ∈ R.

2.1.3.- If the graph of y, solution to a second order linear differential equation L(y(t)) = 0 on the
interval [a, b], is tangent to the t-axis at any point t0 ∈ [a, b], then find the solution y explicitly.

2.1.4.- Can the function y(t) = sin(t2 ) be solution on an open interval containing t = 0 of a
differential equation
y 00 + a(t) y 0 + b(t)y = 0,
with continuous coefficients a and b? Explain your answer.

2.1.5.- Verify whether the functions y1 , y2 below are a fundamental set for the differential equations
given below:

(a) y1 (t) = cos(2t), y2 (t) = sin(2t),


y 00 + 4y = 0.
(b) y1 (t) = et , y2 (t) = t et ,
y 00 − 2y 0 + y = 0.
(c) y1(x) = x, y2(x) = x e^x,
x² y'' − 2x(x + 2) y' + (x + 2) y = 0.

2.1.6.- Compute the Wronskian of the following functions:

(a) f (t) = sin(t), g(t) = cos(t).


(b) f (x) = x, g(x) = x ex .
(c) f (θ) = cos2 (θ), g(θ) = 1 + cos(2θ).

2.1.7.- If the Wronskian of any two solutions of the differential equation


y 00 + p(t) y 0 + q(t) y = 0
is constant, what does this imply about the coefficients p and q?

2.1.8.- Let y(t) = c1 t + c2 t2 be the general solution of a second order linear differential equation
L(y) = 0. By eliminating the constants c1 and c2 in terms of y and y 0 , find a second order
differential equation satisfied by y.

2.2. Homogeneous Constant Coefficients Equations


All solutions to a second order linear homogeneous equation can be obtained from any pair of
nonproportional solutions. This is the main idea given in § 2.1, Theorem 2.1.8. In this section
we obtain these two linearly independent solutions in the particular case that the equation has
constant coefficients. Such a problem reduces to solving for the roots of a degree-two polynomial,
the characteristic polynomial.

2.2.1. The Roots of the Characteristic Polynomial. Thanks to the work done in § 2.1 we
only need to find two linearly independent solutions to second order linear homogeneous equations.
Then Theorem 2.1.8 says that every other solution is a linear combination of the former two. How
do we find any pair of linearly independent solutions? Since the equation is so simple, having
constant coefficients, we find such solutions by trial and error. Here is an example of this idea.

Example 2.2.1. Find solutions to the equation


y 00 + 5y 0 + 6y = 0. (2.2.1)

Solution: We try to find solutions to this equation using simple test functions. For example, it is
clear that power functions y = tn won’t work, since the equation
n(n − 1) t(n−2) + 5n t(n−1) + 6 tn = 0
cannot be satisfied for all t ∈ R. We obtained, instead, a condition on t. This rules out power
functions. A key insight is to try with a test function having a derivative proportional to the original
function,
y 0 (t) = r y(t).
Such a function would cancel out of the equation. For example, we try now with the test function
y(t) = ert . If we introduce this function in the differential equation we get
(r2 + 5r + 6) ert = 0 ⇔ r2 + 5r + 6 = 0. (2.2.2)
We have eliminated the exponential and any t dependence from the differential equation, and now
the equation is a condition on the constant r. So we look for the appropriate values of r, which are
the roots of a degree two polynomial,
r± = (1/2) ( −5 ± √(25 − 24) ) = (1/2)(−5 ± 1)   ⇒   r+ = −2,  r− = −3.
We have obtained two different roots, which implies we have two different solutions,
y1(t) = e^{−2t},    y2(t) = e^{−3t}.
These solutions are not proportional to each other, so they are fundamental solutions to the differ-
ential equation in (2.2.1). Therefore, Theorem 2.1.8 in § 2.1 implies that we have found all possible
solutions to the differential equation, and they are given by
y(t) = c1 e^{−2t} + c2 e^{−3t},    c1, c2 ∈ R.    (2.2.3)
C
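Remark: The roots of the characteristic polynomial and the general solution of this example can be double-checked with SymPy, assuming that library is available.

    # Roots of r^2 + 5r + 6 and the general solution of y'' + 5y' + 6y = 0.
    import sympy as sp

    r, t = sp.symbols('r t')
    print(sp.solve(r**2 + 5 * r + 6, r))            # [-3, -2]
    y = sp.Function('y')
    print(sp.dsolve(y(t).diff(t, 2) + 5 * y(t).diff(t) + 6 * y(t), y(t)))
    # a combination of exp(-2*t) and exp(-3*t), as in Eq. (2.2.3)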

From the example above we see that this idea will produce fundamental solutions to all constant
coefficients homogeneous equations having associated polynomials with two different roots. Such
polynomials play an important role in finding solutions to differential equations like the one above,
so we give these polynomials a name.

Definition 2.2.1. The characteristic polynomial and characteristic equation of the second
order linear homogeneous equation with constant coefficients
y'' + a1 y' + a0 y = 0,
are given by
p(r) = r2 + a1 r + a0 , p(r) = 0.

As we saw in Example 2.2.1, the roots of the characteristic polynomial are crucial to express
the solutions of the differential equation above. The characteristic polynomial is a second degree
polynomial with real coefficients, and the general expression for its roots is
r± = (1/2) ( −a1 ± √(a1² − 4a0) ).
If the discriminant (a21 − 4a0 ) is positive, zero, or negative, then the roots of p are different real
numbers, only one real number, or a complex-conjugate pair of complex numbers. For each case
the solution of the differential equation can be expressed in different forms.

Theorem 2.2.2 (Constant Coefficients). If r± are the roots of the characteristic polynomial to the
second order linear homogeneous equation with constant coefficients
y 00 + a1 y 0 + a0 y = 0, (2.2.4)
and if c+ , c- are arbitrary constants, then the following statements hold true.
(a) If r+ 6= r- , real or complex, then the general solution of Eq. (2.2.4) is given by
ygen (t) = c+ er+ t + c- er- t .
(b) If r+ = r- = r0 ∈ R, then the general solution of Eq. (2.2.4) is given by
ygen (t) = c+ er0 t + c- ter0 t .
Furthermore, given real constants t0 , y0 and y1 , there is a unique solution to the initial value problem
given by Eq. (2.2.4) and the initial conditions y(t0 ) = y0 and y 0 (t0 ) = y1 .

Remarks:
(a) The proof is to guess that functions y(t) = ert must be solutions for appropriate values of
the exponent constant r, the latter being roots of the characteristic polynomial. When the
characteristic polynomial has two different roots, Theorem 2.1.8 says we have all solutions.
When the root is repeated we use the reduction of order method to find a second solution not
proportional to the first one.
(b) At the end of the section we show a proof where we construct the fundamental solutions y1 , y2
without guessing them. We do not need to use Theorem 2.1.8 in this second proof, which is
based completely on a generalization of the reduction of order method.
Proof of Theorem 2.2.2: We guess that particular solutions to Eq. 2.2.4 must be exponential
functions of the form y(t) = ert , because the exponential will cancel out from the equation and
only a condition for r will remain. This is what happens,
r² e^{rt} + a1 r e^{rt} + a0 e^{rt} = 0   ⇒   r² + a1 r + a0 = 0.
The second equation says that the appropriate values of the exponent are the roots of the charac-
teristic polynomial. We now have two cases. If r+ ≠ r- then the solutions
y+ (t) = er+ t , y- (t) = er- t ,
are linearly independent, so the general solution to the differential equation is
ygen (t) = c+ er+ t + c- er- t .
If r+ = r- = r0 , then we have found only one solution y+ (t) = er0 t , and we need to find a second
solution not proportional to y+ . This is what the reduction of order method is perfect for. We write
the second solution as
y- (t) = v(t) y+ (t) ⇒ y- (t) = v(t) er0 t ,
and we put this expression in the differential equation (2.2.4),
( v'' + 2r0 v' + r0² v ) e^{r0 t} + a1 ( v' + r0 v ) e^{r0 t} + a0 v e^{r0 t} = 0.
We cancel the exponential out of the equation and we reorder terms,
v'' + (2r0 + a1) v' + (r0² + a1 r0 + a0) v = 0.
We now need to use that r0 is a root of the characteristic polynomial, r0² + a1 r0 + a0 = 0, so the
last term in the equation above vanishes. But we also need to use that the root r0 is repeated,
r0 = −a1/2 ± (1/2)√(a1² − 4a0) = −a1/2   ⇒   2r0 + a1 = 0.
The equation on the right side above implies that the second term in the differential equation for
v vanishes. So we get that
v 00 = 0 ⇒ v(t) = c1 + c2 t
and the second solution is y- (t) = (c1 + c2 t) y+ (t). If we choose the constant c2 = 0, the function
y- is proportional to y+ . So we definitely want c2 6= 0. The other constant, c1 , only adds a term
proportional to y+ , we can choose it zero. So the simplest choice is c1 = 0, c2 = 1, and we get the
fundamental solutions
y+ (t) = er0 t , y- (t) = t er0 t .
So the general solution for the repeated root case is
ygen (t) = c+ er0 t + c- t er0 t .
The furthermore part follows from solving a 2 × 2 linear system for the unknowns c+ and c-. The
initial conditions for the case r+ ≠ r- are the following,
y0 = c+ e^{r+ t0} + c- e^{r- t0},    y1 = r+ c+ e^{r+ t0} + r- c- e^{r- t0}.
It is not difficult to verify that this system is always solvable and the solutions are
c+ = −(r- y0 − y1) / [(r+ − r-) e^{r+ t0}],    c- = (r+ y0 − y1) / [(r+ − r-) e^{r- t0}].
The initial conditions for the case r+ = r- = r0 are the following,
y0 = (c+ + c- t0) e^{r0 t0},    y1 = c- e^{r0 t0} + r0 (c+ + c- t0) e^{r0 t0}.
It is also not difficult to verify that this system is always solvable and the solutions are
c+ = [ y0 + t0 (r0 y0 − y1) ] / e^{r0 t0},    c- = −(r0 y0 − y1) / e^{r0 t0}.
This establishes the Theorem. 

Example 2.2.2. Consider an object of mass m = 1 gram hanging from a spring with spring
constant k = 6 grams per second squared, moving in a fluid with damping constant d = 5 grams
per second. Find the movement of this object if the initial position is y(0) = 1 centimeter and the
initial velocity is y'(0) = −1 centimeter per second. The coordinate y is measured from the spring
equilibrium position, positive downwards.
Solution: The movement of the object attached to the spring in that liquid is the solution of the
initial value problem
y'' + 5y' + 6y = 0,    y(0) = 1,  y'(0) = −1.
Notice the initial velocity is negative, which means the initial velocity is in the upward direction.
We know that the general solution of the differential equation above is
ygen(t) = c+ e^{−2t} + c- e^{−3t}.
We now find the constants c+ and c- that satisfy the initial conditions above,
1 = y(0) = c+ + c-,    −1 = y'(0) = −2c+ − 3c-   ⇒   c+ = 2,  c- = −1.
Therefore, the unique solution to the initial value problem is
y(t) = 2 e^{−2t} − e^{−3t}.
C
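Remark: A symbolic check of the solution of this initial value problem, assuming SymPy is available:

    # Check that y(t) = 2 e^{-2t} - e^{-3t} solves y'' + 5y' + 6y = 0, y(0) = 1, y'(0) = -1.
    import sympy as sp

    t = sp.symbols('t')
    y = 2 * sp.exp(-2 * t) - sp.exp(-3 * t)
    print(sp.simplify(sp.diff(y, t, 2) + 5 * sp.diff(y, t) + 6 * y))   # 0
    print(y.subs(t, 0), sp.diff(y, t).subs(t, 0))                      # 1 -1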

Example 2.2.3. Find the general solution ygen of the differential equation
2y'' − 3y' + y = 0.

Solution: We look for all solutions of the form y(t) = e^{rt}, where r is a solution of the characteristic
equation
2r² − 3r + 1 = 0   ⇒   r = (1/4) ( 3 ± √(9 − 8) )   ⇒   r+ = 1,  r- = 1/2.
Therefore, the general solution of the equation above is
ygen(t) = c+ e^{t} + c- e^{t/2}.
C

Example 2.2.4. Consider an object of mass m = 9 grams hanging from a spring with spring
constant k = 1 gram per second squared, moving in a fluid with damping constant d = 6 grams
per second. Find the movement of this object if the initial position is y(0) = 1 centimeter and the
initial velocity is y'(0) = 5/3 centimeters per second.
Solution: The movement of the object is described by the solution to the initial value problem
9y'' + 6y' + y = 0,    y(0) = 1,  y'(0) = 5/3.
The characteristic polynomial is p(r) = 9r² + 6r + 1, with roots given by
r± = (1/18) ( −6 ± √(36 − 36) )   ⇒   r+ = r- = −1/3.
Theorem 2.2.2 says that the general solution has the form
ygen(t) = c+ e^{−t/3} + c- t e^{−t/3}.
We need to compute the derivative of the expression above to impose the initial conditions,
ygen'(t) = −(c+/3) e^{−t/3} + c- (1 − t/3) e^{−t/3},
then the initial conditions imply that
1 = y(0) = c+,    5/3 = y'(0) = −c+/3 + c-   ⇒   c+ = 1,  c- = 2.
So, the solution to the initial value problem above is: y(t) = (1 + 2t) e^{−t/3}. C

Example 2.2.5. Consider an object of mass m = 1 gram hanging from a spring with spring
constant k = 13 grams per second squared, moving in a fluid with damping constant d = 4 grams
per second. Find the position function of this object for arbitrary initial position and velocity.
Solution: The position function y of the object must be a solution of Newton's second law of motion
y'' + 4y' + 13y = 0.
To find the solutions we first need to find the roots of the characteristic polynomial,
r² + 4r + 13 = 0   ⇒   r± = (1/2) ( −4 ± √(16 − 52) ) = (1/2) ( −4 ± √(−36) ),
so we obtain the roots
r± = −2 ± 3i.
Since the roots of the characteristic polynomial are different, Theorem 2.2.2 says that the general
solution of the differential equation above, which includes complex-valued solutions, can be written
as follows,
ygen(t) = c̃+ e^{(−2+3i)t} + c̃- e^{(−2−3i)t},    c̃+, c̃- ∈ C.
Unfortunately, it is not clear from the expression above how the object is going to move. C

2.2.2. Real Solutions for Complex Roots. We study in more detail the solutions to the
differential equation (2.2.4),
y 00 + a1 y 0 + a0 y = 0,
in the case the characteristic polynomial has complex roots. Since these roots have the form
r± = −a1/2 ± (1/2)√(a1² − 4a0),
the roots are complex-valued in the case a1² − 4a0 < 0. We use the notation
r± = α ± iβ,    with  α = −a1/2,   β = √(a0 − a1²/4).
The fundamental solutions in Theorem 2.2.2 are the complex-valued functions
ỹ+ = e(α+iβ)t , ỹ- = e(α−iβ)t .
The general solution constructed from these solutions is
ygen (t) = c̃+ e(α+iβ)t + c̃- e(α−iβ)t , c̃+ , c̃- ∈ C.
This formula for the general solution includes real valued and complex valued solutions. But it is
not so simple to single out the real valued solutions. Knowing the real valued solutions could be
important in physical applications. If a physical system is described by a differential equation with
real coefficients, more often than not one is interested in finding real valued solutions. For that
reason we now provide a new set of fundamental solutions that are real valued. Using real valued
fundamental solutions it is simple to separate all real valued solutions from the complex valued ones.

Theorem 2.2.3 (Real Valued Fundamental Solutions). If the differential equation


y 00 + a1 y 0 + a0 y = 0, (2.2.5)
where a1 , a0 are real constants, has characteristic polynomial with complex roots r± = α ± iβ and
complex valued fundamental solutions
ỹ+ (t) = e(α+iβ)t , ỹ- (t) = e(α−iβ)t ,
then the equation also has real valued fundamental solutions given by
y+ (t) = eαt cos(βt), y- (t) = eαt sin(βt).
Furthermore, the general solution of the differential equation above can be written either as
ygen(t) = ( c1 cos(βt) + c2 sin(βt) ) e^{αt},
where c1, c2 are arbitrary constants, or as
ygen(t) = A e^{αt} cos(βt − φ),
where A > 0 is the amplitude and φ ∈ [−π, π) is the phase shift of the solution.

Proof of Theorem 2.2.3: We start with the complex valued fundamental solutions
ỹ+(t) = e^{(α+iβ)t},    ỹ-(t) = e^{(α−iβ)t}.
We take the function ỹ+ and we use a property of complex exponentials,
ỹ+(t) = e^{(α+iβ)t} = e^{αt} e^{iβt} = e^{αt} ( cos(βt) + i sin(βt) ),
where on the last step we used Euler's formula e^{iθ} = cos(θ) + i sin(θ). Repeating this calculation
for ỹ- we get,
ỹ+(t) = e^{αt} ( cos(βt) + i sin(βt) ),    ỹ-(t) = e^{αt} ( cos(βt) − i sin(βt) ).
If we recall the superposition property of linear homogeneous equations, Theorem 2.1.6, we know
that any linear combination of the two solutions above is also a solution of the differential equa-
tion (2.2.5), in particular the combinations
y+(t) = (1/2) ( ỹ+(t) + ỹ-(t) ),    y-(t) = (1/(2i)) ( ỹ+(t) − ỹ-(t) ).
A straightforward computation gives
y+(t) = e^{αt} cos(βt),    y-(t) = e^{αt} sin(βt).
Therefore, the general solution is
ygen(t) = ( c1 cos(βt) + c2 sin(βt) ) e^{αt}.
There is an equivalent way to express the general solution above, given by
ygen(t) = A e^{αt} cos(βt − φ).
These two expressions for ygen are equivalent because of the trigonometric identity
A cos(βt − φ) = A cos(βt) cos(φ) + A sin(βt) sin(φ),
which holds for all A, φ, and βt. Then it is not difficult to see that
c1 = A cos(φ),  c2 = A sin(φ)   ⇔   A = √(c1² + c2²),  tan(φ) = c2/c1.
This establishes the Theorem. □

Example 2.2.6. Describe the movement of the object in Example 2.2.5 above, which satisfies
Newton's equation
y'' + 4y' + 13y = 0,
with initial position of 2 centimeters and initial velocity of 2 centimeters per second.
Solution: We already found the roots of the characteristic polynomial,
r² + 4r + 13 = 0   ⇒   r± = (1/2) ( −4 ± √(16 − 52) )   ⇒   r± = −2 ± 3i.
So the complex valued fundamental solutions are
ỹ+(t) = e^{(−2+3i)t},    ỹ-(t) = e^{(−2−3i)t}.
Theorem 2.2.3 says that real valued fundamental solutions are given by
y+(t) = e^{−2t} cos(3t),    y-(t) = e^{−2t} sin(3t).
So the real valued general solution can be written as
ygen(t) = ( c+ cos(3t) + c- sin(3t) ) e^{−2t},    c+, c- ∈ R.
We now use the initial conditions, y(0) = 2 and y'(0) = 2,
2 = y(0) = c+,    2 = y'(0) = 3c- − 2c+   ⇒   c+ = 2,  c- = 2,
therefore the solution is
y(t) = ( 2 cos(3t) + 2 sin(3t) ) e^{−2t}.    (2.2.6)
C

Example 2.2.7. Write the solution of Example 2.2.6 above in terms of the amplitude A and
phase shift φ.
Solution: In order to understand the movement of the object in Examples 2.2.5-2.2.6 it is convenient
to write the solution in Eq. (2.2.6) in terms of amplitude and phase shift,
y(t) = A e^{−2t} cos(3t − φ)   ⇒   y'(t) = −2A e^{−2t} cos(3t − φ) − 3A e^{−2t} sin(3t − φ).
Let us use again the initial conditions y(0) = 2 and y'(0) = 2,
2 = y(0) = A cos(−φ),    2 = y'(0) = −2A cos(−φ) − 3A sin(−φ),
that is,
2 = A cos(φ),    2 = −2A cos(φ) + 3A sin(φ).
Using the first equation in the second one we get
2 = A cos(φ),    2 = −4 + 3A sin(φ)   ⇒   2 = A cos(φ),  2 = A sin(φ).
From here it is not too difficult to see that
A = √(2² + 2²) = 2√2,    tan(φ) = 1.
Since φ ∈ [−π, π), the equation tan(φ) = 1 has two solutions in that interval,
φ1 = π/4,    φ2 = π/4 − π = −3π/4.
But the φ we need satisfies cos(φ) > 0 and sin(φ) > 0, which means φ = π/4, then
y(t) = 2√2 e^{−2t} cos(3t − π/4).
C
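Remark: The amplitude-phase form obtained above can be checked symbolically, assuming SymPy is available.

    # Check that (2 cos 3t + 2 sin 3t) e^{-2t} = 2 sqrt(2) e^{-2t} cos(3t - pi/4).
    import sympy as sp

    t = sp.symbols('t')
    lhs = (2 * sp.cos(3 * t) + 2 * sp.sin(3 * t)) * sp.exp(-2 * t)
    rhs = 2 * sp.sqrt(2) * sp.exp(-2 * t) * sp.cos(3 * t - sp.pi / 4)
    print(sp.simplify(sp.expand_trig(lhs - rhs)))   # 0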
Remark: Sometimes it is difficult to remember the formula for real valued fundamental solutions.
One way to obtain those solutions without remembering the formula is to repeat the proof of
Theorem 2.2.3. Start with the complex valued solution ỹ+ and use the properties of the complex
exponential,
ỹ+(t) = e^{(−2+3i)t} = e^{−2t} e^{3it} = e^{−2t} ( cos(3t) + i sin(3t) ).
The real valued fundamental solutions are the real and imaginary parts in that expression.

Example 2.2.8. Find real valued fundamental solutions to the equation
y'' + 2y' + 6y = 0.

Solution: The roots of the characteristic polynomial p(r) = r² + 2r + 6 are
r± = (1/2) ( −2 ± √(4 − 24) ) = (1/2) ( −2 ± √(−20) )   ⇒   r± = −1 ± i√5.
These are complex-valued roots, with
α = −1,    β = √5.
Real-valued fundamental solutions are
y1(t) = e^{−t} cos(√5 t),    y2(t) = e^{−t} sin(√5 t).
Second order differential equations with characteristic polynomials having complex roots, like the
one in this example, describe physical processes related to damped oscillations. An example from
physics is a pendulum with friction. C

Figure 2. Solutions from Ex. 2.2.8: damped oscillations y1, y2 bounded by the envelopes e^{−t} and −e^{−t}.



Example 2.2.9. Find the real valued general solution of y'' + 5y = 0.

Solution: The characteristic polynomial is p(r) = r² + 5, with roots r± = ±√5 i. In this case
α = 0 and β = √5. Real valued fundamental solutions are
y+(t) = cos(√5 t),    y-(t) = sin(√5 t).
The real valued general solution is
ygen(t) = c+ cos(√5 t) + c- sin(√5 t),    c+, c- ∈ R.
C

Remark: Physical processes that oscillate in time without dissipation could be described by dif-
ferential equations like the one in this example.

Example 2.2.10. Find the movement of a 5 kg mass attached to a spring with constant
k = 5 kg/sec², moving in a medium with damping constant d = 5 kg/sec, with initial conditions
y(0) = √3 and y'(0) = 0.
Solution: Newton's second law of motion for this mass is m y'' + d y' + k y = 0 with m = 5, k = 5,
d = 5, that is,
y'' + y' + y = 0.
The roots of the characteristic polynomial are
r± = (1/2) ( −1 ± √(1 − 4) )   ⇒   r± = −1/2 ± i (√3/2).
We can write the solution in terms of an amplitude and a phase shift,
y(t) = A e^{−t/2} cos( (√3/2) t − φ ).
We now use the initial conditions to find the amplitude A and the phase shift φ. To do that we
first need to compute the derivative,
y'(t) = −(1/2) A e^{−t/2} cos( (√3/2) t − φ ) − (√3/2) A e^{−t/2} sin( (√3/2) t − φ ).
The initial conditions in the example imply
√3 = y(0) = A cos(φ),    0 = y'(0) = −(1/2) A cos(φ) + (√3/2) A sin(φ).
The second equation above allows us to compute the phase shift. Recall that φ ∈ [−π, π), and the
condition tan(φ) = 1/√3 has two solutions in that interval,
tan(φ) = 1/√3   ⇒   φ1 = π/6,  or  φ2 = π/6 − π = −5π/6.
If φ = −5π/6, then y(0) < 0, which is not our case. Hence we must choose φ = π/6. With that
phase shift, the amplitude is given by
√3 = A cos(π/6) = A (√3/2)   ⇒   A = 2.
Therefore we obtain the solution
y(t) = 2 e^{−t/2} cos( (√3/2) t − π/6 ).
C
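Remark: A symbolic check of this solution, assuming SymPy is available:

    # Check that y(t) = 2 e^{-t/2} cos(sqrt(3) t/2 - pi/6) solves y'' + y' + y = 0
    # with y(0) = sqrt(3) and y'(0) = 0.
    import sympy as sp

    t = sp.symbols('t')
    y = 2 * sp.exp(-t / 2) * sp.cos(sp.sqrt(3) * t / 2 - sp.pi / 6)
    print(sp.simplify(sp.diff(y, t, 2) + sp.diff(y, t) + y))                 # 0
    print(sp.simplify(y.subs(t, 0)), sp.simplify(sp.diff(y, t).subs(t, 0)))  # sqrt(3) 0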

2.2.3. Constructive Proof of Theorem 2.2.2. We now present an alternative proof for
Theorem 2.2.2 that does not involve guessing the fundamental solutions of the equation. Instead,
we construct these solutions using a generalization of the reduction of order method described in
§ 2.4.
Proof of Theorem 2.2.2: The proof has two main parts: First, we transform the original equation
into an equation simpler to solve for a new unknown; second, we solve this simpler problem.
In order to transform the problem into a simpler one, we express the solution y as a product
of two functions, that is, y(t) = u(t)v(t). Choosing v in an appropriate way the equation for u will
be simpler to solve than the equation for y. Hence,
y = uv ⇒ y 0 = u0 v + v 0 u ⇒ y 00 = u00 v + 2u0 v 0 + v 00 u.
Therefore, Eq. (2.2.4) implies that
(u00 v + 2u0 v 0 + v 00 u) + a1 (u0 v + v 0 u) + a0 uv = 0,
that is,
[ u'' + ( a1 + 2 (v'/v) ) u' + a0 u ] v + ( v'' + a1 v' ) u = 0.    (2.2.7)
We now choose the function v such that
a1 + 2 (v'/v) = 0   ⇔   v'/v = −a1/2.    (2.2.8)
We choose a simple solution of this equation, given by
v(t) = e−a1 t/2 .
Having this expression for v one can compute v 0 and v 00 , and it is simple to check that
v'' + a1 v' = −(a1²/4) v.    (2.2.9)
Introducing the first equation in (2.2.8) and Eq. (2.2.9) into Eq. (2.2.7), and recalling that v is
non-zero, we obtain the simplified equation for the function u, given by
u'' − k u = 0,    k = a1²/4 − a0.    (2.2.10)
Eq. (2.2.10) for u is simpler than the original equation (2.2.4) for y since in the former there is no
term with the first derivative of the unknown function.
In order to solve Eq. (2.2.10) we repeat the idea followed to obtain this equation, that is, express
function u as a product of two functions, and solve a simpler problem for one of the functions. We
first consider the harder case, which is when k ≠ 0. In this case, let us express u(t) = e^{√k t} w(t).
Hence,
u' = √k e^{√k t} w + e^{√k t} w'   ⇒   u'' = k e^{√k t} w + 2√k e^{√k t} w' + e^{√k t} w''.
Therefore, Eq. (2.2.10) for the function u implies the following equation for the function w,
0 = u'' − k u = e^{√k t} ( 2√k w' + w'' )   ⇒   w'' + 2√k w' = 0.
Only derivatives of w appear in the latter equation, so denoting x(t) = w'(t) we have to solve a
simple equation,
x' = −2√k x   ⇒   x(t) = x0 e^{−2√k t},   x0 ∈ R.
Integrating, we obtain w as follows,
w' = x0 e^{−2√k t}   ⇒   w(t) = −( x0/(2√k) ) e^{−2√k t} + c0.
Renaming c1 = −x0/(2√k), we obtain
w(t) = c1 e^{−2√k t} + c0   ⇒   u(t) = c0 e^{√k t} + c1 e^{−√k t}.
We then obtain the expression for the solution y = uv, given by
y(t) = c0 e^{(−a1/2 + √k) t} + c1 e^{(−a1/2 − √k) t}.
Since k = a1²/4 − a0, the numbers
r± = −a1/2 ± √k   ⇔   r± = (1/2) ( −a1 ± √(a1² − 4a0) )
are the roots of the characteristic polynomial
r² + a1 r + a0 = 0,
so we can express all solutions of Eq. (2.2.4) as follows,
y(t) = c0 e^{r+ t} + c1 e^{r- t},    k ≠ 0.
Finally, consider the case k = 0. Then Eq. (2.2.10) is simply given by
u'' = 0   ⇒   u(t) = c0 + c1 t,    c0, c1 ∈ R.
Then the solution y to Eq. (2.2.4) in this case is given by
y(t) = (c0 + c1 t) e^{−a1 t/2}.
Since k = 0, the characteristic equation r² + a1 r + a0 = 0 has only one root r+ = r- = −a1/2, so
the solution y above can be expressed as
y(t) = (c0 + c1 t) e^{r+ t},    k = 0.
The Furthermore part is the same as in Theorem 2.2.2. This establishes the Theorem. 
2.2.4. Note On the Repeated Root Case. In the case that the characteristic polynomial
of a differential equation has repeated roots there is an interesting argument to guess the solution
y- . The idea is to take a particular type of limit in solutions of differential equations with complex
valued roots.
Consider the equation in (2.2.4) with a characteristic polynomial having complex valued roots
given by r± = α ± iβ, with
α = −a1/2,    β = √(a0 − a1²/4).
Real valued fundamental solutions in this case are given by
ŷ+ = e^{αt} cos(βt),    ŷ- = e^{αt} sin(βt).
We now study what happens to these solutions ŷ+ and ŷ- in the following limit: the variable t is held
constant, α is held constant, and β → 0. The last two conditions are conditions on the equation
coefficients a1, a0. For example, we fix a1 and we let a0 → a1²/4 from above.
Since cos(βt) → 1 as β → 0 with t fixed, then keeping α fixed too, we obtain
ŷ+(t) = e^{αt} cos(βt)  →  e^{αt} = y+(t).
Since sin(βt)/(βt) → 1 as β → 0 with t constant, that is, sin(βt) → βt, we conclude that
ŷ-(t)/β = ( sin(βt)/β ) e^{αt} = ( sin(βt)/(βt) ) t e^{αt}  →  t e^{αt} = y-(t).
The calculation above says that the function ŷ-/β is close to the function y-(t) = t e^{αt} in the limit
β → 0, t held constant. This calculation provides a candidate, y-(t) = t y+(t), of a solution to
Eq. (2.2.4). It is simple to verify that this candidate is in fact a solution of Eq. (2.2.4). Since y- is not
proportional to y+, one then concludes the functions y+, y- are a fundamental set for the differential
equation in (2.2.4) in the case the characteristic polynomial has repeated roots.

2.2.5. Exercises.

2.2.1.- .

2.3. Nonhomogeneous Equations


All solutions of a linear homogeneous equation can be obtained from only two solutions that are
linearly independent, the fundamental solutions. Every other solution is a linear combination of
these two. This is the general solution formula for homogeneous equations, and it is the main
result in § 2.1, Theorem 2.1.8. This result is no longer true for nonhomogeneous equations. The
superposition property, Theorem 2.1.6, which played an important part in getting the general solution
formula for homogeneous equations, is not true for nonhomogeneous equations.
We start this section proving a general solution formula for nonhomogeneous equations. We
show that all the solutions of the nonhomogeneous equation are a translation by a fixed func-
tion of the solutions of the homogeneous equation. The fixed function is one solution—it doesn’t
matter which one—of the nonhomogeneous equation, and it is called a particular solution of the
nonhomogeneous equation.
Later in this section we show two different ways to compute the particular solution of a non-
homogeneous equation—the undetermined coefficients method and the variation of parameters
method. In the former method we guess a particular solution from the expression of the source
in the equation. The guess contains a few unknown constants, the undetermined coefficients, that
must be determined by the equation. The undetermined method works for constant coefficients
linear operators and simple source functions. The source functions and the associated guessed so-
lutions are collected in a small table. This table is constructed by trial and error. In the latter
method we have a formula to compute a particular solution in terms of the equation source, and
fundamental solutions of the homogeneous equation. The variation of parameters method works
with variable coefficients linear operators and general source functions. But the calculations to find
the solution are usually not so simple as in the undetermined coefficients method.

2.3.1. The General Solution Formula. The general solution formula for homogeneous
equations, Theorem 2.1.8, is no longer true for nonhomogeneous equations. But there is a general
solution formula for nonhomogeneous equations. Such formula involves three functions, two of them
are fundamental solutions of the homogeneous equation, and the third function is any solution of the
nonhomogeneous equation. Every other solution of the nonhomogeneous equation can be obtained
from these three functions.

Theorem 2.3.1 (General Solution). Every solution y of the nonhomogeneous equation


    L(y) = f,    (2.3.1)
with L(y) = y'' + a_1 y' + a_0 y, where a_1, a_0, and f are continuous functions, is given by
y = c1 y1 + c2 y2 + yp ,
where the functions y1 and y2 are fundamental solutions of the homogeneous equation, L(y1 ) = 0,
L(y2 ) = 0, and yp is any solution of the nonhomogeneous equation L(yp ) = f .

Before we prove Theorem 2.3.1 we state the following definition, which comes naturally from
this theorem.

Definition 2.3.2. The general solution of the nonhomogeneous equation L(y) = f is a two-
parameter family of functions
ygen (t) = c1 y1 (t) + c2 y2 (t) + yp (t), (2.3.2)
where the functions y1 and y2 are fundamental solutions of the homogeneous equation, L(y1 ) = 0,
L(y2 ) = 0, and yp is any solution of the nonhomogeneous equation L(yp ) = f .

Remark: The difference of any two solutions of the nonhomogeneous equation is actually a solution
of the homogeneous equation. This is the key idea to prove Theorem 2.3.1. In other words, the
solutions of the nonhomogeneous equation are a translation by a fixed function, yp , of the solutions
of the homogeneous equation.

Proof of Theorem 2.3.1: Let y be any solution of the nonhomogeneous equation L(y) = f .
Recall that we already have one solution, yp , of the nonhomogeneous equation, L(yp ) = f . We can
now subtract the second equation from the first,
    L(y) − L(y_p) = f − f = 0   ⇒   L(y − y_p) = 0.
The equation on the right is obtained from the linearity of the operator L. This last equation says
that the difference of any two solutions of the nonhomogeneous equation is a solution of the
homogeneous equation. The general solution formula for homogeneous equations says that all solutions of
the homogeneous equation can be written as linear combinations of a pair of fundamental solutions,
y_1, y_2. So there exist constants c_1, c_2 such that
y − yp = c1 y1 + c2 y2 .
Since for every y solution of L(y) = f we can find constants c1 , c2 such that the equation above holds
true, we have found a formula for all solutions of the nonhomogeneous equation. This establishes
the Theorem. 

2.3.2. The Undetermined Coefficients Method. The general solution formula in (2.3.2)
is the most useful if there is a way to find a particular solution yp of the nonhomogeneous equation
L(yp ) = f . We now present a method to find such particular solution, the undetermined coefficients
method. This method works for linear operators L with constant coefficients and for simple source
functions f . Here is a summary of the undetermined coefficients method:
(1) Find fundamental solutions y1 , y2 of the homogeneous equation L(y) = 0.
(2) Given the source functions f , guess the solutions yp following the Table 1 below.
(3) If the function y_p given by the table satisfies L(y_p) = 0, then change the guess to t y_p. If t y_p
satisfies L(t y_p) = 0 as well, then change the guess to t^2 y_p.
(4) Find the undetermined constants k in the function y_p using the equation L(y) = f, where y is
y_p, t y_p, or t^2 y_p.

    f(t) (Source)  (K, m, a, b given)                        y_p(t) (Guess)  (k not given)

    K e^{at}                                                 k e^{at}
    K_m t^m + ··· + K_0                                      k_m t^m + ··· + k_0
    K_1 cos(bt) + K_2 sin(bt)                                k_1 cos(bt) + k_2 sin(bt)
    (K_m t^m + ··· + K_0) e^{at}                             (k_m t^m + ··· + k_0) e^{at}
    (K_1 cos(bt) + K_2 sin(bt)) e^{at}                       (k_1 cos(bt) + k_2 sin(bt)) e^{at}
    (K_m t^m + ··· + K_0)(K̃_1 cos(bt) + K̃_2 sin(bt))        (k_m t^m + ··· + k_0)(k̃_1 cos(bt) + k̃_2 sin(bt))

    Table 1. List of sources f and solutions y_p to the equation L(y_p) = f.

This is the undetermined coefficients method. It is a set of simple rules to find a particular
solution y_p of a nonhomogeneous equation L(y_p) = f in the case that the source function f is one
of the entries in Table 1. There are a few formulas for particular cases and a few generalizations
of the whole method. We discuss them after a few examples.

Example 2.3.1 (First Guess Right). Find all solutions to the nonhomogeneous equation
y 00 − 3y 0 − 4y = 3 e2t .

Solution: From the problem we get L(y) = y 00 − 3y 0 − 4y and f (t) = 3 e2t .



(1): Find fundamental solutions y_+, y_- to the homogeneous equation L(y) = 0. Since the
homogeneous equation has constant coefficients, we find the characteristic equation
    r^2 − 3r − 4 = 0   ⇒   r_+ = 4, r_- = −1   ⇒   y_+(t) = e^{4t}, y_-(t) = e^{-t}.
(2): The table says: for f(t) = 3 e^{2t} guess y_p(t) = k e^{2t}. The constant k is the undetermined
coefficient we must find.
(3): Since y_p(t) = k e^{2t} is not a solution of the homogeneous equation, we do not need to modify our
guess. (Recall: L(y) = 0 iff there exist constants c_+, c_- such that y(t) = c_+ e^{4t} + c_- e^{-t}.)
(4): Introduce y_p into L(y_p) = f and find k. So we do that,
    (2^2 − 6 − 4) k e^{2t} = 3 e^{2t}   ⇒   −6k = 3   ⇒   k = −1/2.
We guessed that y_p must be proportional to the exponential e^{2t} in order to cancel out the
exponentials in the equation above. We have obtained that
    y_p(t) = −(1/2) e^{2t}.
The undetermined coefficients method gives us a way to compute a particular solution y_p of the
nonhomogeneous equation. We now use the general solution theorem, Theorem 2.3.1, to write the
general solution of the nonhomogeneous equation,
    y_gen(t) = c_+ e^{4t} + c_- e^{-t} − (1/2) e^{2t}.
C
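For readers who want to double check this computation, here is a minimal Python sketch (using sympy; an illustration, not part of the original text) that verifies the particular solution and then recovers the same general solution, with sympy's own names for the constants.

import sympy as sp

t = sp.symbols('t')
yp = -sp.Rational(1, 2) * sp.exp(2*t)

# Check that y_p solves y'' - 3y' - 4y = 3 e^{2t}.
residual = yp.diff(t, 2) - 3*yp.diff(t) - 4*yp - 3*sp.exp(2*t)
print(sp.simplify(residual))          # prints 0

# dsolve returns C1*exp(-t) + C2*exp(4*t) - exp(2*t)/2, matching y_gen above.
y = sp.Function('y')
ode = sp.Eq(y(t).diff(t, 2) - 3*y(t).diff(t) - 4*y(t), 3*sp.exp(2*t))
print(sp.dsolve(ode, y(t)))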

Remark: The step (4) in Example 2.3.1 is a particular case of the following statement.

Theorem 2.3.3. Consider the equation L(y) = f, where L(y) = y'' + a_1 y' + a_0 y has constant
coefficients and p is its characteristic polynomial. If the source function is f(t) = K e^{at}, with
p(a) ≠ 0, then a particular solution of the nonhomogeneous equation is
    y_p(t) = (K / p(a)) e^{at}.

Proof of Theorem 2.3.3: Since the linear operator L has constant coefficients, let us write L
and its associated characteristic polynomial p as follows,
    L(y) = y'' + a_1 y' + a_0 y,    p(r) = r^2 + a_1 r + a_0.
Since the source function is f(t) = K e^{at}, Table 1 says that a good guess for a particular solution
of the nonhomogeneous equation is y_p(t) = k e^{at}. Our hypothesis is that this guess is not a solution
of the homogeneous equation, since
    L(y_p) = (a^2 + a_1 a + a_0) k e^{at} = p(a) k e^{at},    and p(a) ≠ 0.
We then compute the constant k using the equation L(y_p) = f,
    (a^2 + a_1 a + a_0) k e^{at} = K e^{at}   ⇒   p(a) k e^{at} = K e^{at}   ⇒   k = K / p(a).
We get the particular solution y_p(t) = (K/p(a)) e^{at}. This establishes the Theorem.
Remark: As we said, step (4) in Example 2.3.1 is a particular case of Theorem 2.3.3,
    y_p(t) = (3/p(2)) e^{2t} = (3/(2^2 − 6 − 4)) e^{2t} = (3/(−6)) e^{2t}   ⇒   y_p(t) = −(1/2) e^{2t}.
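Theorem 2.3.3 can also be confirmed symbolically. The following sympy sketch (an illustration; the symbol names are arbitrary) substitutes y_p = (K/p(a)) e^{at} into L(y_p) − K e^{at} and simplifies the result to zero.

import sympy as sp

t, a, a0, a1, K = sp.symbols('t a a0 a1 K')
p = lambda r: r**2 + a1*r + a0          # characteristic polynomial of L
yp = K / p(a) * sp.exp(a*t)             # the particular solution of Theorem 2.3.3

residual = yp.diff(t, 2) + a1*yp.diff(t) + a0*yp - K*sp.exp(a*t)
print(sp.simplify(residual))            # prints 0 (valid whenever p(a) != 0)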

In the following example our first guess for a particular solution y_p happens to be a solution
of the homogeneous equation.

Example 2.3.2 (First Guess Wrong). Find all solutions to the nonhomogeneous equation
y 00 − 3y 0 − 4y = 3 e4t .

Solution: If we write the equation as L(y) = f , with f (t) = 3 e4t , then the operator L is the same
as in Example 2.3.1. So the solutions of the homogeneous equation L(y) = 0, are the same as in
that example,
y+ (t) = e4t , y- (t) = e−t .
The source function is f(t) = 3 e^{4t}, so Table 1 says that we need to guess y_p(t) = k e^{4t}. However,
this function y_p is a solution of the homogeneous equation, because
    y_p = k y_+   ⇒   L(y_p) = 0.
We have to change our guess, as indicated in step (3) of the undetermined coefficients method,
    y_p(t) = k t e^{4t}.
This new guess is not a solution of the homogeneous equation. So we proceed to compute the constant
k. We introduce the guess into L(y_p) = f,
    y_p' = (1 + 4t) k e^{4t},    y_p'' = (8 + 16t) k e^{4t}   ⇒   [8 − 3 + (16 − 12 − 4)t] k e^{4t} = 3 e^{4t},
therefore, we get that
    5k = 3   ⇒   k = 3/5   ⇒   y_p(t) = (3/5) t e^{4t}.
The general solution theorem for nonhomogeneous equations says that
    y_gen(t) = c_+ e^{4t} + c_- e^{-t} + (3/5) t e^{4t}.
C

In the following example the equation source is a trigonometric function.

Example 2.3.3 (First Guess Right). Find all the solutions to the nonhomogeneous equation
y 00 − 3y 0 − 4y = 2 sin(t).

Solution: If we write the equation as L(y) = f , with f (t) = 2 sin(t), then the operator L is the
same as in Example 2.3.1. So the solutions of the homogeneous equation L(y) = 0, are the same
as in that example,
y+ (t) = e4t , y- (t) = e−t .
Since the source function is f(t) = 2 sin(t), Table 1 says that we need to choose the function
y_p(t) = k_1 cos(t) + k_2 sin(t). This function y_p is not a solution of the homogeneous equation. So we
look for the constants k_1, k_2 using the differential equation,
yp0 = −k1 sin(t) + k2 cos(t), yp00 = −k1 cos(t) − k2 sin(t),
and then we obtain
[−k1 cos(t) − k2 sin(t)] − 3[−k1 sin(t) + k2 cos(t)] − 4[k1 cos(t) + k2 sin(t)] = 2 sin(t).
Reordering terms in the expression above we get
(−5k1 − 3k2 ) cos(t) + (3k1 − 5k2 ) sin(t) = 2 sin(t).
The last equation must hold for all t ∈ ℝ. In particular, it must hold for t = π/2 and for t = 0. At
these two points we obtain, respectively,
    3k_1 − 5k_2 = 2,
    −5k_1 − 3k_2 = 0,
 ⇒   k_1 = 3/17,    k_2 = −5/17.
So the particular solution to the nonhomogeneous equation is given by
    y_p(t) = (1/17) (3 cos(t) − 5 sin(t)).

The general solution theorem for nonhomogeneous equations implies
    y_gen(t) = c_+ e^{4t} + c_- e^{-t} + (1/17) (3 cos(t) − 5 sin(t)).
C
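The linear system for k_1, k_2 can also be set up and solved with sympy. The short sketch below (an illustration, not from the text) collects the cos(t) and sin(t) coefficients of the residual and solves for the undetermined coefficients.

import sympy as sp

t, k1, k2 = sp.symbols('t k1 k2')
yp = k1*sp.cos(t) + k2*sp.sin(t)
residual = sp.expand(yp.diff(t, 2) - 3*yp.diff(t) - 4*yp - 2*sp.sin(t))

# Both the cos(t) and the sin(t) coefficients of the residual must vanish.
eqs = [residual.coeff(sp.cos(t)), residual.coeff(sp.sin(t))]
print(sp.solve(eqs, [k1, k2]))     # {k1: 3/17, k2: -5/17}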

The next example collects a few nonhomogeneous equations and the guessed function yp .

Example 2.3.4. We provide few more examples of nonhomogeneous equations and the appropriate
guesses for the particular solutions.

(a) For y'' − 3y' − 4y = 3 e^{2t} sin(t), guess y_p(t) = (k_1 cos(t) + k_2 sin(t)) e^{2t}.
(b) For y'' − 3y' − 4y = 2t^2 e^{3t}, guess y_p(t) = (k_2 t^2 + k_1 t + k_0) e^{3t}.
(c) For y'' − 3y' − 4y = 2t^2 e^{4t}, guess y_p(t) = (k_2 t^2 + k_1 t + k_0) t e^{4t}.
(d) For y'' − 3y' − 4y = 3t sin(t), guess y_p(t) = (k_1 t + k_0)(k̃_1 cos(t) + k̃_2 sin(t)).
C

Remark: Suppose that the source function f does not appear in Table 1, but f can be written as
f = f1 + f2 , with f1 and f2 in the table. In such case look for a particular solution yp = yp1 + yp2 ,
where L(yp1 ) = f1 and L(yp2 ) = f2 . Since the operator L is linear,
L(yp ) = L(yp1 + yp2 ) = L(yp1 ) + L(yp2 ) = f1 + f2 = f ⇒ L(yp ) = f.

In our next example we describe the electric current flowing through an RLC-series electric
circuit, which consists of a resistor R, an inductor L, a capacitor C, and a voltage source V(t)
connected in series, as shown in Fig. 3.

    Figure 3. An RLC circuit; I(t) denotes the electric current.

This system is described by an integro-differential equation found by Kirchhoff, now called
Kirchhoff's voltage law,
    L I'(t) + R I(t) + (1/C) ∫_{t_0}^{t} I(s) ds = V(t).    (2.3.3)
If we take one time derivative in the equation above, we obtain a second order differential equation
for the electric current,
    L I''(t) + R I'(t) + (1/C) I(t) = V'(t).
This equation is usually rewritten as
    I''(t) + 2 (R/(2L)) I'(t) + (1/(LC)) I(t) = V'(t)/L.
If we introduce the damping frequency ω_d = R/(2L) and the natural frequency ω_0 = 1/√(LC),
then Kirchhoff's law can be expressed as
    I'' + 2 ω_d I' + ω_0^2 I = V'(t)/L.
We are now ready to solve the following example.

Example 2.3.5. Consider an RLC-series circuit with no resistor, capacitor C, inductor L, and
voltage source V(t) = V_0 sin(νt), where ν ≠ ω_0 = 1/√(LC). Find the electric current in the case
I(0) = 0, I'(0) = 0.

Solution: Kirchhoff's equation for this problem is
    I'' + ω_0^2 I = v_0 ν cos(νt),
where we denoted v_0 = V_0/L. We start by finding the solutions of the homogeneous equation
    I'' + ω_0^2 I = 0.

The characteristic equation is r^2 + ω_0^2 = 0, the roots are r_± = ± ω_0 i, and real valued fundamental
solutions are
    I_+ = cos(ω_0 t),    I_- = sin(ω_0 t).
For ν ≠ ω_0 the source function is not a solution of the homogeneous equation, so the correct guess
for a particular solution of the nonhomogeneous equation is
    I_p = c_1 cos(νt) + c_2 sin(νt).

If we put this function I_p into the nonhomogeneous equation we get
    −ν^2 (c_1 cos(νt) + c_2 sin(νt)) + ω_0^2 (c_1 cos(νt) + c_2 sin(νt)) = v_0 ν cos(νt).
If we reorder terms we get
    [c_1 (ω_0^2 − ν^2) − v_0 ν] cos(νt) + c_2 (ω_0^2 − ν^2) sin(νt) = 0.
From here we get that
    c_1 (ω_0^2 − ν^2) − v_0 ν = 0,    c_2 (ω_0^2 − ν^2) = 0.
Since we are studying the case ν ≠ ω_0, we conclude that
    c_1 = v_0 ν / (ω_0^2 − ν^2),    c_2 = 0.
So, the particular solution is
    I_p(t) = (v_0 ν / (ω_0^2 − ν^2)) cos(νt).
The general solution of the nonhomogeneous equation is
    I(t) = c_+ I_+ + c_- I_- + I_p   ⇒   I(t) = c_+ cos(ω_0 t) + c_- sin(ω_0 t) + (v_0 ν / (ω_0^2 − ν^2)) cos(νt).
We now look for the solution that satisfies the initial conditions I(0) = 0 and I'(0) = 0. For the
first condition we get
    0 = I(0) = c_+ + v_0 ν / (ω_0^2 − ν^2)   ⇒   c_+ = − v_0 ν / (ω_0^2 − ν^2).
The other initial condition implies
    0 = I'(0) = c_- ω_0   ⇒   c_- = 0.
So, the solution of the initial value problem for the electric current is
    I(t) = (v_0 ν / (ω_0^2 − ν^2)) (cos(νt) − cos(ω_0 t)).
C

Interactive Graph Link: Beating Phenomenon. Click on the interactive graph link here to
see how the solution I(t) changes as ν → ω_0, exhibiting the beating phenomenon shown in Fig. 4.

    Figure 4. The solution I(t) for ν close to ω_0, showing the beating phenomenon.
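If the interactive graph is not available, the solution can be plotted directly. The Python sketch below is an illustration; the numerical values of v_0, ω_0, and ν are arbitrary choices that put ν close to ω_0.

import numpy as np
import matplotlib.pyplot as plt

v0, omega0, nu = 1.0, 10.0, 9.5       # assumed values, with nu close to omega0
t = np.linspace(0, 20, 4000)
I = v0 * nu / (omega0**2 - nu**2) * (np.cos(nu * t) - np.cos(omega0 * t))

plt.plot(t, I)
plt.xlabel('t')
plt.ylabel('I(t)')
plt.title('Beating phenomenon: nu = 9.5, omega0 = 10')
plt.show()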

2.3.3. The Variation of Parameters Method. This method provides a second way to find
a particular solution y_p to a nonhomogeneous equation L(y) = f. We summarize this method in a
formula that computes y_p in terms of any pair of fundamental solutions of the homogeneous equation
L(y) = 0. The variation of parameters method works with second order linear equations having
variable coefficients and continuous but otherwise arbitrary sources. When the source function of
a nonhomogeneous equation is simple enough to appear in Table 1, the undetermined coefficients
method is a quick way to find a particular solution of the equation. When the source is more
complicated, one usually turns to the variation of parameters method, with its more involved
formula for a particular solution.

Theorem 2.3.4 (Variation of Parameters). A particular solution to the equation


L(y) = f,
with L(y) = y'' + a_1(t) y' + a_0(t) y and a_1, a_0, f continuous functions, is given by
    y_p = u_1 y_1 + u_2 y_2,
where y_1, y_2 are fundamental solutions of the homogeneous equation L(y) = 0 and the functions u_1,
u_2 are defined by
    u_1(t) = − ∫ ( y_2(t) f(t) / W_{y_1 y_2}(t) ) dt,    u_2(t) = ∫ ( y_1(t) f(t) / W_{y_1 y_2}(t) ) dt,    (2.3.4)
where W_{y_1 y_2} is the Wronskian of y_1 and y_2.

The proof is a generalization of the reduction of order method. Recall that the reduction of order
method is a way to find a second solution y_2 of a homogeneous equation if we already know one
solution y_1. One writes y_2 = u y_1 and the original equation L(y_2) = 0 provides an equation for u.
This equation for u is simpler than the original equation for y_2 because the function y_1 satisfies
L(y_1) = 0.
The formula for yp can be seen as a generalization of the reduction order method. We write
yp in terms of both fundamental solutions y1 , y2 of the homogeneous equation,
yp (t) = u1 (t) y1 (t) + u2 (t) y2 (t).
We put this yp in the equation L(yp ) = f and we find an equation relating u1 and u2 . It is important
to realize that we have added one new function to the original problem. The original problem is to
find yp . Now we need to find u1 and u2 , but we still have only one equation to solve, L(yp ) = f .
The problem for u1 , u2 cannot have a unique solution. So we are completely free to add a second
equation to the original equation L(yp ) = f . We choose the second equation so that we can solve
for u1 and u2 .

Proof of Theorem 2.3.4: We look for a particular solution yp of the form


yp = u 1 y1 + u 2 y2 .
We hope that the equations for u1 , u2 will be simpler to solve than the equation for yp . But we
started with one unknown function and now we have two unknown functions. So we are free to add
one more equation to fix u1 , u2 . We choose
    u_1' y_1 + u_2' y_2 = 0.
In other words, we choose u_2 = − ∫ (y_1/y_2) u_1' dt. Let's put this y_p into L(y_p) = f. We need y_p' (and
recall, u_1' y_1 + u_2' y_2 = 0),
    y_p' = u_1' y_1 + u_1 y_1' + u_2' y_2 + u_2 y_2'   ⇒   y_p' = u_1 y_1' + u_2 y_2',
and we also need y_p'',
    y_p'' = u_1' y_1' + u_1 y_1'' + u_2' y_2' + u_2 y_2''.
So the equation L(y_p) = f is
    (u_1' y_1' + u_1 y_1'' + u_2' y_2' + u_2 y_2'') + a_1 (u_1 y_1' + u_2 y_2') + a_0 (u_1 y_1 + u_2 y_2) = f.
We reorder a few terms and we see that
    u_1' y_1' + u_2' y_2' + u_1 (y_1'' + a_1 y_1' + a_0 y_1) + u_2 (y_2'' + a_1 y_2' + a_0 y_2) = f.
The functions y_1 and y_2 are solutions of the homogeneous equation,
    y_1'' + a_1 y_1' + a_0 y_1 = 0,    y_2'' + a_1 y_2' + a_0 y_2 = 0,
so u_1 and u_2 must satisfy the simpler equation
    u_1' y_1' + u_2' y_2' = f.    (2.3.5)
So we end up with the equations
    u_1' y_1' + u_2' y_2' = f,
    u_1' y_1 + u_2' y_2 = 0.
And this is a 2 × 2 algebraic linear system for the unknowns u_1', u_2'. It is hard to overstate the
importance of the word "algebraic" in the previous sentence. From the second equation above we
compute u_2' and we introduce it into the first equation,
    u_2' = −(y_1/y_2) u_1'   ⇒   u_1' y_1' − (y_1 y_2'/y_2) u_1' = f   ⇒   u_1' ( (y_1' y_2 − y_1 y_2') / y_2 ) = f.
Recalling that the Wronskian of two functions is W_{12} = y_1 y_2' − y_1' y_2, we get
    u_1' = − y_2 f / W_{12}   ⇒   u_2' = y_1 f / W_{12}.
These equations are the derivatives of Eq. (2.3.4). Integrate them in the variable t and choose the
integration constants to be zero. We get Eq. (2.3.4). This establishes the Theorem.

Remark: The integration constants in the expressions for u1 , u2 can always be chosen to be zero.
To understand the effect of the integration constants in the function yp , let us do the following.
Denote by u1 and u2 the functions in Eq. (2.3.4), and given any real numbers c1 and c2 define
ũ1 = u1 + c1 , ũ2 = u2 + c2 .
Then the corresponding solution ỹp is given by
ỹp = ũ1 y1 + ũ2 y2 = u1 y1 + u2 y2 + c1 y1 + c2 y2 ⇒ ỹp = yp + c1 y1 + c2 y2 .
The two solutions ỹ_p and y_p differ by a solution of the homogeneous differential equation, so
both functions are solutions of the nonhomogeneous equation. One is then free to choose the
constants c_1 and c_2 in any way. We chose them in the proof above to be zero.

Example 2.3.6. Find the general solution of the nonhomogeneous equation

y 00 − 5y 0 + 6y = 2 et .

Solution: The formula for yp in Theorem 2.3.4 requires we know fundamental solutions to the
homogeneous problem. So we start finding these solutions first. Since the equation has constant
coefficients, we compute the characteristic equation,
    r^2 − 5r + 6 = 0   ⇒   r_± = (1/2)(5 ± √(25 − 24))   ⇒   r_+ = 3,  r_- = 2.
So, the functions y_1 and y_2 in Theorem 2.3.4 are in our case given by
    y_1(t) = e^{3t},    y_2(t) = e^{2t}.

The Wronskian of these two functions is given by
    W_{y_1 y_2}(t) = (e^{3t})(2 e^{2t}) − (3 e^{3t})(e^{2t})   ⇒   W_{y_1 y_2}(t) = − e^{5t}.
We are now ready to compute the functions u_1 and u_2. Notice that Eq. (2.3.4) gives the following
differential equations,
    u_1' = − y_2 f / W_{y_1 y_2},    u_2' = y_1 f / W_{y_1 y_2}.
So, the equations for u_1 and u_2 are the following,
    u_1' = −(e^{2t}) (2 e^{t}) (−e^{−5t})   ⇒   u_1' = 2 e^{−2t}   ⇒   u_1 = − e^{−2t},
    u_2' = (e^{3t}) (2 e^{t}) (−e^{−5t})   ⇒   u_2' = −2 e^{−t}   ⇒   u_2 = 2 e^{−t},
where we have chosen the constants of integration to be zero. The particular solution we are looking
for is given by
    y_p = (−e^{−2t})(e^{3t}) + (2 e^{−t})(e^{2t})   ⇒   y_p = e^{t}.
Then, the general solution theorem for nonhomogeneous equations implies
    y_gen(t) = c_+ e^{3t} + c_- e^{2t} + e^{t},    c_+, c_- ∈ ℝ.
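The formula in Eq. (2.3.4) is easy to automate. The following Python sketch (illustrative only; the helper name particular_solution is our own) implements it with sympy and reproduces the result of Example 2.3.6.

import sympy as sp

t = sp.symbols('t')

def particular_solution(y1, y2, f):
    """Return y_p = u1*y1 + u2*y2, with u1, u2 given by Eq. (2.3.4)."""
    W = sp.simplify(y1 * y2.diff(t) - y1.diff(t) * y2)    # Wronskian W_{y1 y2}
    u1 = sp.integrate(-y2 * f / W, t)
    u2 = sp.integrate(y1 * f / W, t)
    return sp.simplify(u1 * y1 + u2 * y2)

print(particular_solution(sp.exp(3*t), sp.exp(2*t), 2*sp.exp(t)))   # exp(t)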

Example 2.3.7. Find a particular solution to the differential equation

t2 y 00 − 2y = 3t2 − 1,

knowing that y1 = t2 and y2 = 1/t are solutions to the homogeneous equation t2 y 00 − 2y = 0.


Solution: We first rewrite the nonhomogeneous equation above in the form given in Theorem 2.3.4.
In this case we must divide the whole equation by t2 ,
    y'' − (2/t^2) y = 3 − 1/t^2   ⇒   f(t) = 3 − 1/t^2.
We now proceed to compute the Wronskian of the fundamental solutions y_1, y_2,
    W_{y_1 y_2}(t) = (t^2)(−1/t^2) − (2t)(1/t)   ⇒   W_{y_1 y_2}(t) = −3.
We now use the equation in (2.3.4) to obtain the functions u_1 and u_2,
    u_1' = −(1/t)(3 − 1/t^2)(1/(−3)) = 1/t − (1/3) t^{−3}   ⇒   u_1 = ln(t) + (1/6) t^{−2},
    u_2' = (t^2)(3 − 1/t^2)(1/(−3)) = − t^2 + 1/3   ⇒   u_2 = −(1/3) t^3 + (1/3) t.

A particular solution to the nonhomogeneous equation above is ỹ_p = u_1 y_1 + u_2 y_2, that is,
    ỹ_p = [ln(t) + (1/6) t^{−2}] (t^2) + (1/3)(− t^3 + t)(t^{−1})
        = t^2 ln(t) + 1/6 − (1/3) t^2 + 1/3
        = t^2 ln(t) + 1/2 − (1/3) t^2
        = t^2 ln(t) + 1/2 − (1/3) y_1(t).
However, a simpler expression for a solution of the nonhomogeneous equation above is
    y_p = t^2 ln(t) + 1/2.
C

Remark: Sometimes it could be difficult to remember the formulas for functions u1 and u2
in (2.3.4). In such case one can always go back to the place in the proof of Theorem 2.3.4 where
these formulas come from, the system
u01 y10 + u02 y20 = f
u01 y1 + u02 y2 = 0.
The system above could be simpler to remember than the equations in (2.3.4). We end this Section
using the equations above to solve the problem in Example 2.3.7. Recall that the solutions to the
homogeneous equation in Example 2.3.7 are y1 (t) = t2 , and y2 (t) = 1/t, while the source function
is f (t) = 3 − 1/t2 . Then, we need to solve the system
    t^2 u_1' + (1/t) u_2' = 0,
    2t u_1' + (−1/t^2) u_2' = 3 − 1/t^2.
This is an algebraic linear system for u_1' and u_2', and it is simple to solve. From the equation on
top we get u_2' in terms of u_1', and we use that expression in the bottom equation,
    u_2' = − t^3 u_1'   ⇒   2t u_1' + t u_1' = 3 − 1/t^2   ⇒   u_1' = 1/t − 1/(3t^3).
Substitute the expression for u_1' back into the first equation above to get u_2'. We get,
    u_1' = 1/t − (1/3) t^{−3},
    u_2' = − t^2 + 1/3.
We should now integrate these functions to get u_1 and u_2 and then get the particular solution
ỹ_p = u_1 y_1 + u_2 y_2. We do not repeat these calculations, since they are done in Example 2.3.7.

2.3.4. Exercises.

2.3.1.- . 2.3.2.- .

2.4. Reduction of Order Methods


Sometimes a solution to a second order differential equation can be obtained solving two first
order equations, one after the other. When that happens we say we have reduced the order of the
equation. We use the ideas in Chapter 1 to solve each first order equation.
In this section we focus on three types of differential equations where such reduction of order
happens. The first two cases are usually called special second order equations and the third case is
called the conservation of the energy.
We end this section with a method that provides a second solution to a second order equation
if we already know one solution. The second solution can be chosen not proportional to the first
one. This idea is called the reduction of order method, although all four ideas we study in this
section do reduce the order of the original equation.
2.4.1. Special Second Order Equations. A second order differential equation is called
special when either the function, or its first derivative, or the independent variable does not appear
explicitly in the equation. In these cases the second order equation can be transformed into a first
order equation for a new function. The transformation to get the new function is different in each
case. Then, one solves the first order equation and transforms back solving another first order
equation to get the original function. We start with a few definitions.

Definition 2.4.1. A second order equation in the unknown function y is an equation
    y'' = f(t, y, y'),
where the function f : ℝ^3 → ℝ is given. The equation is linear iff the function f is linear in both
arguments y and y'. The second order differential equation above is special iff one of the following
conditions holds:
(a) y'' = f(t, y'), the function y does not appear explicitly in the equation;
(b) y'' = f(y, y'), the variable t does not appear explicitly in the equation;
(c) y'' = f(y), neither the variable t nor the function y' appears explicitly in the equation.

It is simpler to solve special second order equations when the function y is missing, case (a),
than when the variable t is missing, case (b), as it can be seen by comparing Theorems 2.4.2
and 2.4.3. The case (c) is well known in physics, since it applies to Newton’s second law of motion
in the case that the force on a particle depends only on the position of the particle. In such a case
one can show that the energy of the particle is conserved.
Let us start with case (a).

Theorem 2.4.2 (Function y Missing). If a second order differential equation has the form y 00 =
f (t, y 0 ), then v = y 0 satisfies the first order equation v 0 = f (t, v).

The proof is trivial, so we go directly to an example.

Example 2.4.1. Find the y solution of the second order nonlinear equation y 00 = −2t (y 0 )2 with
initial conditions y(0) = 2, y 0 (0) = −1.
Solution: Introduce v = y'. Then v' = y'', and
    v' = −2t v^2   ⇒   v'/v^2 = −2t   ⇒   −1/v = − t^2 + c.
So, 1/y' = t^2 − c, that is, y' = 1/(t^2 − c). The initial condition implies
    −1 = y'(0) = −1/c   ⇒   c = 1   ⇒   y' = 1/(t^2 − 1).
Then, y = ∫ dt/(t^2 − 1) + c. We integrate using the method of partial fractions,
    1/(t^2 − 1) = 1/((t − 1)(t + 1)) = a/(t − 1) + b/(t + 1).

Hence, 1 = a(t + 1) + b(t − 1). Evaluating at t = 1 and t = −1 we get a = 1/2, b = −1/2. So
    1/(t^2 − 1) = (1/2) [ 1/(t − 1) − 1/(t + 1) ].
Therefore, the integral is simple to do,
    y = (1/2) ( ln|t − 1| − ln|t + 1| ) + c,    2 = y(0) = (1/2)(0 − 0) + c.
We conclude y = (1/2) ( ln|t − 1| − ln|t + 1| ) + 2. C
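As a sanity check, the reduced problem can also be integrated numerically. The Python sketch below (illustrative; the tolerances and the final time 0.9 are arbitrary choices) integrates y'' = −2t (y')^2 with scipy and compares against the exact solution found above.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, Y):
    y, v = Y                        # first order system: y' = v, v' = -2 t v^2
    return [v, -2*t*v**2]

t_eval = np.linspace(0, 0.9, 10)
sol = solve_ivp(rhs, (0, 0.9), [2, -1], t_eval=t_eval, rtol=1e-9, atol=1e-12)

exact = 0.5*(np.log(np.abs(t_eval - 1)) - np.log(np.abs(t_eval + 1))) + 2
print(np.max(np.abs(sol.y[0] - exact)))   # tiny, of the order of the tolerances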

The case (b) is way more complicated to solve.

Theorem 2.4.3 (Variable t Missing). If the initial value problem
    y'' = f(y, y'),    y(0) = y_0,    y'(0) = y_1,
has an invertible solution y, then the function
    w(y) = v(t(y)),
where v(t) = y'(t) and t(y) is the inverse of y(t), satisfies the initial value problem
    ẇ = f(y, w)/w,    w(y_0) = y_1,
where we denoted ẇ = dw/dy.

Remark: The proof is based on the chain rule for the derivative of functions.
Proof of Theorem 2.4.3: The differential equation is y'' = f(y, y'). Denoting v(t) = y'(t),
    v' = f(y, v).
It is not clear how to solve this equation, since the function y still appears in the equation. On a
domain where y is invertible we can do the following. Denote by t(y) the inverse of y(t), and
introduce w(y) = v(t(y)). The chain rule implies
    ẇ(y) = dw/dy = (dv/dt)|_{t(y)} (dt/dy)|_{t(y)} = v'(t)/y'(t)|_{t(y)} = v'(t)/v(t)|_{t(y)} = f(y(t), v(t))/v(t)|_{t(y)} = f(y, w(y))/w(y),
where ẇ(y) = dw/dy and v'(t) = dv/dt. Therefore, we have obtained the equation for w, namely
    ẇ = f(y, w)/w.
Finally we need to find the initial condition for w. Recall that
    y(t = 0) = y_0   ⇔   t(y = y_0) = 0,
    y'(t = 0) = y_1   ⇔   v(t = 0) = y_1.
Therefore,
    w(y = y_0) = v(t(y = y_0)) = v(t = 0) = y_1   ⇒   w(y_0) = y_1.
This establishes the Theorem. 

Example 2.4.2. Find a solution y to the second order equation y 00 = 2y y 0 .

Solution: The variable t does not appear in the equation, so we start by introducing the function
v(t) = y'(t). The equation is now given by v'(t) = 2y(t) v(t). We look for invertible solutions y,
then introduce the function w(y) = v(t(y)). This function satisfies
    ẇ(y) = dw/dy = (dv/dt)(dt/dy)|_{t(y)} = v'/y'|_{t(y)} = v'/v|_{t(y)}.
Using the differential equation,
    ẇ(y) = 2yv/v|_{t(y)}   ⇒   dw/dy = 2y   ⇒   w(y) = y^2 + c.
Since v(t) = w(y(t)), we get v(t) = y^2(t) + c. This is a separable equation,
    y'(t)/(y^2(t) + c) = 1.
Since we only need to find one solution of the equation, and the integral depends on whether c > 0,
c = 0, or c < 0, we choose (for no special reason) only one case, c = 1,
    ∫ dy/(1 + y^2) = ∫ dt + c_0   ⇒   arctan(y) = t + c_0   ⇒   y(t) = tan(t + c_0).
Again, for no special reason, we choose c_0 = 0, and we conclude that one possible solution to our
problem is y(t) = tan(t). C

Example 2.4.3. Find the solution y to the initial value problem


y y 00 + 3(y 0 )2 = 0, y(0) = 1, y 0 (0) = 6.

Solution: We start by rewriting the equation in the standard form
    y'' = −3 (y')^2 / y.
The variable t does not appear explicitly in the equation, so we introduce the function v(t) = y'(t).
The differential equation now has the form v'(t) = −3 v^2(t)/y(t). We look for invertible solutions y,
and then we introduce the function w(y) = v(t(y)). Because of the chain rule for derivatives, this
function satisfies
    ẇ(y) = dw/dy = (dv/dt)(dt/dy)|_{t(y)} = v'/y'|_{t(y)} = v'/v|_{t(y)}   ⇒   ẇ(y) = v'(t(y))/w(y).
Using the differential equation on the factor v', we get
    ẇ(y) = (−3 v^2(t(y))/y) (1/w) = −3 w^2/(y w)   ⇒   ẇ = −3w/y.
This is a separable equation for function w. The problem for w also has initial conditions, which
can be obtained from the initial conditions from y. Recalling the definition of inverse function,
y(t = 0) = 1 ⇔ t(y = 1) = 0.
Therefore,
w(y = 1) = v(t(y = 1)) = v(0) = y 0 (0) = 6,
where in the last step above we use the initial condition y 0 (0) = 6. Summarizing, the initial value
problem for w is
    ẇ = −3w/y,    w(1) = 6.
The equation for w is separable, so the method from § 1.4 implies that
ln(w) = −3 ln(y) + c0 = ln(y −3 ) + c0 ⇒ w(y) = c1 y −3 , c1 = ec0 .
The initial condition fixes the constant c1 , since
6 = w(1) = c1 ⇒ w(y) = 6 y −3 .
We now transform from w back to v as follows,
    v(t) = w(y(t)) = 6 y^{−3}(t)   ⇒   y'(t) = 6 y^{−3}(t).
This is now a first order separable equation for y. Again, the method from § 1.4 implies that
    y^3 y' = 6   ⇒   y^4/4 = 6t + c_2.

The initial condition for y fixes the constant c_2, since
    1 = y(0)   ⇒   1/4 = 0 + c_2   ⇒   y^4/4 = 6t + 1/4.
So we conclude that the solution y to the initial value problem is
    y(t) = (24t + 1)^{1/4}.
C

2.4.2. Conservation of the Energy. We now study case (c) in Def. 2.4.1—second order
differential equations such that both the variable t and the function y 0 do not appear explicitly in
the equation. This case is important in Newtonian mechanics. For that reason we slightly change
notation we use to write the differential equation. Instead of writing the equation as y 00 = f (y), as
in Def. 2.4.1, we write it as
m y 00 = f (y),
where m is a constant. This notation matches the notation of Newton’s second law of motion for
a particle of mass m, with position function y as function of time t, acting under a force f that
depends only on the particle position y.
It turns out that solutions to the differential equation above have a particular property: There
is a function of y 0 and y, called the energy of the system, that remains conserved during the motion.
We summarize this result in the statement below.

Theorem 2.4.4 (Conservation of the Energy). Consider a particle with positive mass m and
position y, function of time t, which is a solution of Newton’s second law of motion
m y 00 = f (y),
with initial conditions y(t0 ) = y0 and y 0 (t0 ) = v0 , where f (y) is the force acting on the particle at
the position y. Then, the position function y satisfies
    (1/2) m v^2 + V(y) = E_0,
where E_0 = (1/2) m v_0^2 + V(y_0) is fixed by the initial conditions, v(t) = y'(t) is the particle velocity, and
V is the potential of the force f (the negative of a primitive of f), that is,
    V(y) = − ∫ f(y) dy   ⇔   f = − dV/dy.

Remark: The term T(v) = (1/2) m v^2 is the kinetic energy of the particle. The term V(y) is the
potential energy. The Theorem above says that the total mechanical energy
    E = T(v) + V(y)
remains constant during the motion of the particle.
Proof of Theorem 2.4.4: We write the differential equation using the potential V,
    m y'' = − dV/dy.
Multiply the equation above by y',
    m y'(t) y''(t) = − (dV/dy) y'(t).
Use the chain rule on both sides of the equation above,
    d/dt [ (1/2) m (y')^2 ] = − d/dt V(y(t)).
Introduce the velocity v = y', and rewrite the equation above as
    d/dt [ (1/2) m v^2 + V(y) ] = 0.

This means that the quantity
    E(y, v) = (1/2) m v^2 + V(y),
called the mechanical energy of the system, remains constant during the motion. Therefore, it
must match its value at the initial time t_0, which we called E_0 in the Theorem. So we arrive at the
equation
    E(y, v) = (1/2) m v^2 + V(y) = E_0.
This establishes the Theorem. 

Example 2.4.4. Find the potential energy and write the energy conservation for the following
systems:
(i) A particle attached to a spring with constant k, moving in one space dimension.
(ii) A particle moving vertically close to the Earth surface, under Earth's constant gravitational
acceleration. In this case the force on the particle of mass m is f(y) = − mg, where g = 9.81
m/s^2 and y is the vertical position measured upwards.
(iii) A particle moving along the direction vertical to the surface of a spherical planet with mass
M and radius R.

Solution:
Case (i). The force on a particle of mass m attached to a spring with spring constant k > 0, when
displaced an amount y from the equilibrium position y = 0 is f (y) = −ky. Therefore, Newton’s
second law of motion says
m y 00 = −ky.
The potential in this case is V(y) = (1/2) k y^2, since − dV/dy = −ky = f. If we introduce the particle
velocity v = y', then the total mechanical energy is
    E(y, v) = (1/2) m v^2 + (1/2) k y^2.
The conservation of the energy says that
    (1/2) m v^2 + (1/2) k y^2 = E_0,
where E0 is the energy at the initial time.
Case (ii). Newton's equation of motion says: m y'' = − m g. If we multiply Newton's equation by
y', we get
    m y' y'' = − m g y'   ⇒   d/dt [ (1/2) m (y')^2 + m g y ] = 0.
If we introduce the gravitational energy
    E(y, v) = (1/2) m v^2 + m g y,
where v = y', then Newton's law says dE/dt = 0, so the total gravitational energy is constant,
    (1/2) m v^2 + m g y = E(0).

Case (iii). Consider a particle of mass m moving on a line which is perpendicular to the surface
of a spherical planet of mass M and radius R. The force on such a particle when is at a distance y
from the surface of the planet is, according to Newton’s gravitational law,
    f(y) = − G M m / (R + y)^2,
where G = 6.67 × 10^{−11} m^3/(s^2 kg) is Newton's gravitational constant. The potential is
    V(y) = − G M m / (R + y),
since − dV/dy = f(y). The energy for this system is
    E(y, v) = (1/2) m v^2 − G M m / (R + y),
where we introduced the particle velocity v = y'. The conservation of the energy says that
    (1/2) m v^2 − G M m / (R + y) = E_0,
where E0 is the energy at the initial time. C

Example 2.4.5. Find the maximum height of a ball of mass m = 0.1 kg that is shot vertically by
a spring with spring constant k = 400 kg/s2 and compressed 0.1 m. Use g = 10 m/s2 .
Solution: This is a difficult problem to solve if one tries to find the position function y and evaluate
it at the time when its speed vanishes—maximum altitude. One has to solve two differential
equations for y, one with source f1 = −ky − mg and other with source f2 = −mg, and the solutions
must be glued together. The first source describes the particle when is pushed by the spring under
the Earth’s gravitational force. The second source describes the particle when only the Earth’s
gravitational force is acting on the particle. Also, the moment when the ball leaves the spring is
hard to describe accurately.
A simpler method is to use the conservation of the mechanical and gravitational energy. The
energy for this particle is
    E(t) = (1/2) m v^2 + (1/2) k y^2 + m g y.
This energy must be constant along the movement. In particular, the energy at the initial time
t = 0 must be the same as the energy at the time of the maximum height, t_M,
    E(t = 0) = E(t_M)   ⇒   (1/2) m v_0^2 + (1/2) k y_0^2 + m g y_0 = (1/2) m v_M^2 + m g y_M.
But at the initial time we have v_0 = 0 and y_0 = −0.1 (the negative sign is because the spring is
compressed), and at the maximum time we also have v_M = 0, hence
    (1/2) k y_0^2 + m g y_0 = m g y_M   ⇒   y_M = y_0 + (k/(2mg)) y_0^2.
We conclude that yM = 1.9 m. C
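The final arithmetic can be reproduced with a few lines of Python (an illustration; the variable names are our own).

m, k, g, y0 = 0.1, 400.0, 10.0, -0.1    # data given in the example (SI units)
yM = y0 + k * y0**2 / (2 * m * g)       # maximum height from energy conservation
print(yM)                               # 1.9 (meters)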

Example 2.4.6. Find the escape velocity from Earth—the initial velocity of a projectile moving
vertically upwards starting from the Earth surface such that it escapes Earth gravitational attrac-
tion. Recall that the acceleration of gravity at the surface of Earth is g = GM/R2 = 9.81 m/s2 ,
and that the radius of Earth is R = 6378 Km. Here M denotes the mass of the Earth, and G is
Newton’s gravitational constant.
Solution: The projectile moves in the vertical direction, so the movement is along one space
dimension. Let y be the position of the projectile, with y = 0 at the surface of the Earth. Newton’s
equation in this case is
    m y'' = − G M m / (R + y)^2.
We start by rewriting the force using the constant g instead of G,
    − G M m / (R + y)^2 = − (G M / R^2) m R^2 / (R + y)^2 = − g m R^2 / (R + y)^2.
So the equation of motion for the projectile is
    m y'' = − g m R^2 / (R + y)^2.
The projectile mass m can be canceled from the equation above (we do it later), so the result will
be independent of the projectile mass. Now we introduce the gravitational potential
    V(y) = − g m R^2 / (R + y).
We know that the motion of this particle satisfies the conservation of the energy
    (1/2) m v^2 − g m R^2 / (R + y) = E_0,
where v = y'. The initial energy is simple to compute, y(0) = 0 and v(0) = v_0, so we get
    (1/2) m v^2(t) − g m R^2 / (R + y(t)) = (1/2) m v_0^2 − g m R.
We now cancel the projectile mass from the equation, and we rewrite the equation as
    v^2(t) = v_0^2 − 2gR + 2gR^2 / (R + y(t)).
Now we choose the initial velocity v_0 to be the escape velocity v_e. The latter is the smallest initial
velocity such that v(t) is defined for all y, including y → ∞. Since
    v^2(t) ≥ 0    and    2gR^2 / (R + y(t)) → 0 as y → ∞,
the escape velocity must satisfy
    v_e^2 − 2gR ≥ 0.
Since the escape velocity is the smallest velocity satisfying the condition above, that means
    v_e = √(2gR)   ⇒   v_e = 11.2 km/s.
C
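The numerical value quoted above follows directly from the formula. A quick Python check (illustrative):

import math

g, R = 9.81, 6378e3                 # m/s^2 and m, as given in the example
print(math.sqrt(2 * g * R) / 1e3)   # about 11.2 (km/s)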

Example 2.4.7. Find the time tM for a rocket to reach the Moon, if it is launched at the escape
velocity. Use that the distance from the surface of the Earth to the Moon is d = 405, 696 Km.
Solution: From Example 2.4.6 we know that the position function y of the rocket satisfies the
differential equation
    v^2(t) = v_0^2 − 2gR + 2gR^2 / (R + y(t)),
where R is the Earth radius, g the gravitational acceleration at the Earth surface, v = y', and v_0 is
the initial velocity. Since the rocket initial velocity is the Earth escape velocity, v_0 = v_e = √(2gR),
the differential equation for y is
    (y')^2 = 2gR^2 / (R + y)   ⇒   y' = √(2g) R / √(R + y),
where we chose the positive square root because, in our coordinate system, the rocket leaving Earth
means v > 0. Now, the last equation above is a separable differential equation for y, so we can
integrate it,
    (R + y)^{1/2} y' = √(2g) R   ⇒   (2/3) (R + y)^{3/2} = √(2g) R t + c,
where c is a constant, which can be determined by the initial condition y(t = 0) = 0, since at the
initial time the projectile is on the surface of the Earth, the origin of our coordinate system. With
this initial condition we get
    c = (2/3) R^{3/2}   ⇒   (2/3) (R + y)^{3/2} = √(2g) R t + (2/3) R^{3/2}.    (2.4.1)
From the equation above we can compute an explicit form of the solution function y,
    y(t) = ( (3/2) √(2g) R t + R^{3/2} )^{2/3} − R.    (2.4.2)
To find the time to reach the Moon we need to evaluate Eq. (2.4.1) at y = d and solve for t_M,
    (2/3) (R + d)^{3/2} = √(2g) R t_M + (2/3) R^{3/2}   ⇒   t_M = (2 / (3 √(2g) R)) ( (R + d)^{3/2} − R^{3/2} ).
The formula above gives tM = 51.5 hours. C

2.4.3. The Reduction of Order Method. If we know one solution to a second order,
linear, homogeneous, differential equation, then one can find a second solution to that equation.
And this second solution can be chosen to be not proportional to the known solution. One obtains
the second solution transforming the original second order differential equation into solving two
first order differential equations.

Theorem 2.4.5 (Reduction of Order). If a nonzero function y_1 is a solution of
    y'' + a_1(t) y' + a_0(t) y = 0,    (2.4.3)
where a_1, a_0 are given functions, then a second solution not proportional to y_1 is
    y_2(t) = y_1(t) ∫ ( e^{−A_1(t)} / y_1^2(t) ) dt,    (2.4.4)
where A_1(t) = ∫ a_1(t) dt.

Remark: In the first part of the proof we write y_2(t) = v(t) y_1(t) and show that y_2 is a solution of
Eq. (2.4.3) iff the function v is a solution of
    v'' + ( 2 y_1'(t)/y_1(t) + a_1(t) ) v' = 0.    (2.4.5)
In the second part we solve the equation for v. This is a first order equation for w = v', since
v itself does not appear in the equation, hence the name reduction of order method. The equation
for w is linear and first order, so we can solve it using the integrating factor method. One more
integration gives v, which is the factor multiplying y_1 in Eq. (2.4.4).
Remark: The functions v and w in this subsection have no relation with the functions v and w
from the previous subsection.
Proof of Theorem 2.4.5: We write y_2 = v y_1 and we put this function into the differential equation
in (2.4.3), which gives us an equation for v. To start, compute y_2' and y_2'',
    y_2' = v' y_1 + v y_1',    y_2'' = v'' y_1 + 2 v' y_1' + v y_1''.
Introduce these expressions into the differential equation,
    0 = (v'' y_1 + 2 v' y_1' + v y_1'') + a_1 (v' y_1 + v y_1') + a_0 v y_1
      = y_1 v'' + (2 y_1' + a_1 y_1) v' + (y_1'' + a_1 y_1' + a_0 y_1) v.
The function y_1 is a solution of the original differential equation,
    y_1'' + a_1 y_1' + a_0 y_1 = 0,
so the equation for v is given by
    y_1 v'' + (2 y_1' + a_1 y_1) v' = 0   ⇒   v'' + ( 2 y_1'/y_1 + a_1 ) v' = 0.
This is Eq. (2.4.5). The function v does not appear explicitly in this equation, so denoting w = v'
we obtain
    w' + ( 2 y_1'/y_1 + a_1 ) w = 0.
This is a first order linear equation for w, so we solve it using the integrating factor method, with
integrating factor
    µ(t) = y_1^2(t) e^{A_1(t)},    where A_1(t) = ∫ a_1(t) dt.

Therefore, the differential equation for w can be rewritten as a total t-derivative,
    ( y_1^2 e^{A_1} w )' = 0   ⇒   y_1^2 e^{A_1} w = w_0   ⇒   w(t) = w_0 e^{−A_1(t)} / y_1^2(t).
Since v' = w, we integrate one more time with respect to t to obtain
    v(t) = w_0 ∫ ( e^{−A_1(t)} / y_1^2(t) ) dt + v_0.
We are looking for just one function v, so we choose the integration constants w_0 = 1 and v_0 = 0.
We then obtain
    v(t) = ∫ ( e^{−A_1(t)} / y_1^2(t) ) dt   ⇒   y_2(t) = y_1(t) ∫ ( e^{−A_1(t)} / y_1^2(t) ) dt.
For the furthermore part, we now need to show that the functions y_1 and y_2 = v y_1 are linearly
independent. We start computing their Wronskian,
    W_{12} = y_1 (v' y_1 + v y_1') − v y_1 y_1'   ⇒   W_{12} = v' y_1^2.
Recall that above in this proof we have computed v' = w, and the result was w = w_0 e^{−A_1}/y_1^2. So
we get v' y_1^2 = w_0 e^{−A_1}, and then the Wronskian is given by
    W_{12} = w_0 e^{−A_1}.
This is a nonzero function, therefore the functions y1 and y2 = vy1 are linearly independent. This
establishes the Theorem. 

Example 2.4.8. Find a second solution y2 linearly independent to the solution y1 (t) = t of the
differential equation
t2 y 00 + 2ty 0 − 2y = 0.

Solution: We look for a solution of the form y_2(t) = t v(t). This implies that
    y_2' = t v' + v,    y_2'' = t v'' + 2 v'.
So, the equation for v is given by
    0 = t^2 (t v'' + 2 v') + 2t (t v' + v) − 2 t v
      = t^3 v'' + (2t^2 + 2t^2) v' + (2t − 2t) v
      = t^3 v'' + 4t^2 v'   ⇒   v'' + (4/t) v' = 0.
Notice that this last equation is precisely Eq. (2.4.5), since in our case we have
    y_1 = t,    a_1(t) = 2/t   ⇒   v'' + ( 2/t + 2/t ) v' = 0.
The equation for v is a first order equation for w = v', given by
    w'/w = − 4/t   ⇒   w(t) = c_1 t^{−4},    c_1 ∈ ℝ.
Therefore, integrating once again we obtain that
    v = c_2 t^{−3} + c_3,    c_2, c_3 ∈ ℝ,
and recalling that y_2 = t v we then conclude that
    y_2 = c_2 t^{−2} + c_3 t.
Choosing c_2 = 1 and c_3 = 0 we obtain y_2(t) = t^{−2}. Therefore, a fundamental solution set for
the original differential equation is given by
    y_1(t) = t,    y_2(t) = 1/t^2.
C
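Formula (2.4.4) gives the same answer directly. Here is a small sympy sketch (illustrative, not part of the text) applied to this example.

import sympy as sp

t = sp.symbols('t', positive=True)
y1 = t
a1 = 2/t                                    # from y'' + (2/t) y' - (2/t^2) y = 0
A1 = sp.integrate(a1, t)                    # A_1(t) = 2*log(t)
y2 = y1 * sp.integrate(sp.exp(-A1) / y1**2, t)
print(sp.simplify(y2))                      # -1/(3*t**2), proportional to 1/t^2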

2.4.4. Exercises.

2.4.1.- Find the solution y to the second order, nonlinear equation
    t^2 y'' + 6t y' = 1,    t > 0.

2.4.2.- .
CHAPTER 3

The Laplace Transform Method

The Laplace Transform is a transformation, meaning that it changes a function into a new function.
Actually, it is a linear transformation, because it converts a linear combination of functions into
a linear combination of the transformed functions. Even more interesting, the Laplace Transform
converts derivatives into multiplications. These two properties make the Laplace Transform very
useful to solve linear differential equations with constant coefficients. The Laplace Transform
converts such differential equation for an unknown function into an algebraic equation for the
transformed function. Usually it is easy to solve the algebraic equation for the transformed function.
Then one converts the transformed function back into the original function. This function is the
solution of the differential equation.
Solving a differential equation using a Laplace Transform is radically different from all the
methods we have used so far. This method, as we will use it here, is relatively new. The Laplace
Transform we define here was first used in 1910, but its use grew rapidly after 1920, especially to
solve differential equations. Transformations like the Laplace Transform were known much earlier.
Pierre Simon de Laplace used a similar transformation in his studies of probability theory, published
in 1812, but analogous transformations were used even earlier by Euler around 1737.


3.1. Introduction to the Laplace Transform


The Laplace transform is a transformation—it changes a function into another function. This
transformation is an integral transformation—the original function is multiplied by an exponential
and integrated on an appropriate region. Such an integral transformation is the answer to very
interesting questions: Is it possible to transform a differential equation into an algebraic equation?
Is it possible to transform a derivative of a function into a multiplication? The answer to both
questions is yes, for example with a Laplace transform.
This is how it works. You start with the derivative of a function, y'(t), then you multiply it by
some function, we choose an exponential e^{−st}, and then you integrate in t, so we get
    y'(t)  →  ∫ e^{−st} y'(t) dt,
which is a transformation, an integral transformation. And now, because we have an integration
above, we can integrate by parts, and this is the big idea,
    y'(t)  →  ∫ e^{−st} y'(t) dt = e^{−st} y(t) + s ∫ e^{−st} y(t) dt.
So we have transformed the derivative we started with into a multiplication by the constant s
from the exponential. The idea in this calculation actually works to solve differential equations and
motivates us to define the integral transformation y(t) → Ỹ(s) as follows,
    y(t)  →  Ỹ(s) = ∫ e^{−st} y(t) dt.

The Laplace transform is a transformation similar to the one above, where we choose some appro-
priate integration limits—which are very convenient to solve initial value problems.
We dedicate this section to introducing the precise definition of the Laplace transform and how
it is used to solve differential equations. In the following sections we will see that this method can be
used to solve linear constant coefficient differential equations with very general sources, including
Dirac's delta generalized functions.

3.1.1. Overview of the Method. The Laplace transform changes a function into another
function. For example, we will show later on that the Laplace transform changes
    f(x) = sin(ax)    into    F(x) = a/(x^2 + a^2).
We will follow the notation used in the literature and use t for the variable of the original
function f, while we use s for the variable of the transformed function F. Using this notation, the
Laplace transform changes
    f(t) = sin(at)    into    F(s) = a/(s^2 + a^2).
We will show that the Laplace transform is a linear transformation and it transforms derivatives
into multiplication. Because of these properties we will use the Laplace transform to solve linear
differential equations.
We Laplace transform the original differential equation. Because of the properties above, the
result will be an algebraic equation for the transformed function. Algebraic equations are simple
to solve, so we solve the algebraic equation. Then we Laplace transform back the solution. We
summarize these steps as follows:
    (1) Apply L to the differential equation for y; the result is an algebraic equation for L[y].
    (2) Solve the algebraic equation for L[y].
    (3) Transform back to obtain y. (Use the table.)

3.1.2. The Laplace Transform. The Laplace transform is a transformation, meaning that
it converts a function into a new function. We have seen transformations earlier in these notes. In
Chapter 2 we used the transformation
L[y(t)] = y 00 (t) + a1 y 0 (t) + a0 y(t),
so that a second order linear differential equation with source f could be written as L[y] = f . There
are simpler transformations, for example the differentiation operation itself,
D[f (t)] = f 0 (t).
Not all transformations involve differentiation. There are integral transformations, for example
integration itself,
    I[f(t)] = ∫_0^x f(t) dt.
Of particular importance in many applications are integral transformations of the form
    T[f(t)] = ∫_a^b K(s, t) f(t) dt,
where K is a fixed function of two variables, called the kernel of the transformation, and a, b are
real numbers or ±∞. The Laplace transform is a transformation of this type, where the kernel is
K(s, t) = e−st , the constant a = 0, and b = ∞.

Definition 3.1.1. The Laplace transform of a function f defined on D_f = (0, ∞) is
    F(s) = ∫_0^∞ e^{−st} f(t) dt,    (3.1.1)
defined for all s ∈ D_F ⊂ ℝ where the integral converges.

In these notes we use an alternative notation for the Laplace transform that emphasizes that
the Laplace transform is a transformation: L[f] = F, that is,
    L[ · ] = ∫_0^∞ e^{−st} ( · ) dt.

So, the Laplace transform will be denoted as either L[f ] or F , depending whether we want to
emphasize the transformation itself or the result of the transformation. We will also use the notation
L[f (t)], or L[f ](s), or L[f (t)](s), whenever the independent variables t and s are relevant in any
particular context.
The Laplace transform is an improper integral, an integral on an unbounded domain. Improper
integrals are defined as a limit of definite integrals,
    ∫_{t_0}^∞ g(t) dt = lim_{N→∞} ∫_{t_0}^N g(t) dt.

An improper integral converges iff the limit exists, otherwise the integral diverges.
Now we are ready to compute our first Laplace transform.

Example 3.1.1. Compute the Laplace transform of the function f (t) = 1, that is, L[1].
Solution: Following the definition,
    L[1] = ∫_0^∞ e^{−st} dt = lim_{N→∞} ∫_0^N e^{−st} dt.
The definite integral above is simple to compute, but it depends on the values of s. For s = 0 we
get
    lim_{N→∞} ∫_0^N dt = lim_{N→∞} N = ∞.
So, the improper integral diverges for s = 0. For s ≠ 0 we get
    lim_{N→∞} ∫_0^N e^{−st} dt = lim_{N→∞} [ −(1/s) e^{−st} ]_0^N = lim_{N→∞} −(1/s) (e^{−sN} − 1).

For s < 0 we have s = −|s|, hence
    lim_{N→∞} −(1/s) (e^{−sN} − 1) = lim_{N→∞} −(1/s) (e^{|s|N} − 1) = −∞.
So, the improper integral diverges for s < 0. In the case that s > 0 we get
    lim_{N→∞} −(1/s) (e^{−sN} − 1) = 1/s.
If we put all these results together we get
    L[1] = 1/s,    s > 0.
C

Example 3.1.2. Compute L[eat ], where a ∈ R.


Solution: We start with the definition of the Laplace transform,
    L[e^{at}] = ∫_0^∞ e^{−st} e^{at} dt = ∫_0^∞ e^{−(s−a)t} dt.
In the case s = a we get
    L[e^{at}] = ∫_0^∞ 1 dt = ∞,
so the improper integral diverges. In the case s ≠ a we get
    L[e^{at}] = lim_{N→∞} ∫_0^N e^{−(s−a)t} dt,    s ≠ a,
            = lim_{N→∞} [ (−1/(s − a)) e^{−(s−a)t} ]_0^N
            = lim_{N→∞} (−1/(s − a)) (e^{−(s−a)N} − 1).
Now we have two remaining cases. The first case is:
    s − a < 0   ⇒   −(s − a) = |s − a| > 0   ⇒   lim_{N→∞} e^{−(s−a)N} = ∞,
so the integral diverges for s < a. The other case is:
    s − a > 0   ⇒   −(s − a) = −|s − a| < 0   ⇒   lim_{N→∞} e^{−(s−a)N} = 0,
so the integral converges only for s > a and the Laplace transform is given by
    L[e^{at}] = 1/(s − a),    s > a.
C

Example 3.1.3. Compute L[teat ], where a ∈ R.


Solution: In this case the calculation is more complicated than above, since we need to integrate
by parts. We start with the definition of the Laplace transform,
    L[t e^{at}] = ∫_0^∞ e^{−st} t e^{at} dt = lim_{N→∞} ∫_0^N t e^{−(s−a)t} dt.
This improper integral diverges for s = a, so L[t e^{at}] is not defined for s = a. From now on we
consider only the case s ≠ a. In this case we can integrate by parts,
    L[t e^{at}] = lim_{N→∞} [ −(1/(s − a)) t e^{−(s−a)t} |_0^N + (1/(s − a)) ∫_0^N e^{−(s−a)t} dt ],
that is,
    L[t e^{at}] = lim_{N→∞} [ −(1/(s − a)) t e^{−(s−a)t} |_0^N − (1/(s − a)^2) e^{−(s−a)t} |_0^N ].    (3.1.2)
In the case that s < a the first term above diverges,
    lim_{N→∞} −(1/(s − a)) N e^{−(s−a)N} = lim_{N→∞} −(1/(s − a)) N e^{|s−a|N} = ∞,
therefore L[t e^{at}] is not defined for s < a. In the case s > a the first term on the right hand side
in (3.1.2) vanishes, since
    lim_{N→∞} −(1/(s − a)) N e^{−(s−a)N} = 0,    (1/(s − a)) t e^{−(s−a)t} |_{t=0} = 0.
Regarding the other term, and recalling that s > a,
    lim_{N→∞} −(1/(s − a)^2) e^{−(s−a)N} = 0,    (1/(s − a)^2) e^{−(s−a)t} |_{t=0} = 1/(s − a)^2.
Therefore, we conclude that
    L[t e^{at}] = 1/(s − a)^2,    s > a.    C

Example 3.1.4. Compute L[sin(at)], where a ∈ R.


Solution: In this case we need to compute
    L[sin(at)] = ∫_0^∞ e^{−st} sin(at) dt = lim_{N→∞} ∫_0^N e^{−st} sin(at) dt.
The definite integral above can be computed integrating by parts twice,
    ∫_0^N e^{−st} sin(at) dt = −(1/s) [ e^{−st} sin(at) ]_0^N − (a/s^2) [ e^{−st} cos(at) ]_0^N − (a^2/s^2) ∫_0^N e^{−st} sin(at) dt,
which implies that
    (1 + a^2/s^2) ∫_0^N e^{−st} sin(at) dt = −(1/s) [ e^{−st} sin(at) ]_0^N − (a/s^2) [ e^{−st} cos(at) ]_0^N.
Then we get
    ∫_0^N e^{−st} sin(at) dt = (s^2/(s^2 + a^2)) [ −(1/s) [ e^{−st} sin(at) ]_0^N − (a/s^2) [ e^{−st} cos(at) ]_0^N ],
and finally we get
    ∫_0^N e^{−st} sin(at) dt = (s^2/(s^2 + a^2)) [ −(1/s) ( e^{−sN} sin(aN) − 0 ) − (a/s^2) ( e^{−sN} cos(aN) − 1 ) ].
One can check that the limit N → ∞ on the right hand side above does not exist for s ≤ 0, so
L[sin(at)] does not exist for s ≤ 0. In the case s > 0 it is not difficult to see that
    ∫_0^∞ e^{−st} sin(at) dt = (s^2/(s^2 + a^2)) [ −(1/s)(0 − 0) − (a/s^2)(0 − 1) ],
so we obtain the final result
    L[sin(at)] = a/(s^2 + a^2),    s > 0.
C

In Table 1 we present a short list of Laplace transforms. They can be computed in the same
way we computed the Laplace transforms in the examples above.

    f(t)                      F(s) = L[f(t)]                        D_F

    f(t) = 1                  F(s) = 1/s                            s > 0
    f(t) = e^{at}             F(s) = 1/(s − a)                      s > a
    f(t) = t^n                F(s) = n!/s^{n+1}                     s > 0
    f(t) = sin(at)            F(s) = a/(s^2 + a^2)                  s > 0
    f(t) = cos(at)            F(s) = s/(s^2 + a^2)                  s > 0
    f(t) = sinh(at)           F(s) = a/(s^2 − a^2)                  s > |a|
    f(t) = cosh(at)           F(s) = s/(s^2 − a^2)                  s > |a|
    f(t) = t^n e^{at}         F(s) = n!/(s − a)^{n+1}               s > a
    f(t) = e^{at} sin(bt)     F(s) = b/((s − a)^2 + b^2)            s > a
    f(t) = e^{at} cos(bt)     F(s) = (s − a)/((s − a)^2 + b^2)      s > a
    f(t) = e^{at} sinh(bt)    F(s) = b/((s − a)^2 − b^2)            s − a > |b|
    f(t) = e^{at} cosh(bt)    F(s) = (s − a)/((s − a)^2 − b^2)      s − a > |b|

    Table 1. List of a few Laplace transforms.
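Several of the entries above can be cross-checked with sympy. The sketch below is an illustration; the printed results may appear in an algebraically equivalent rearranged form.

import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)

for f in (sp.S(1), sp.exp(a*t), t*sp.exp(a*t), sp.sin(a*t), sp.cos(a*t)):
    F = sp.laplace_transform(f, t, s, noconds=True)
    print(f, '->', sp.simplify(F))
# Expected (up to rearrangement): 1/s, 1/(s - a), 1/(s - a)**2,
# a/(s**2 + a**2), s/(s**2 + a**2).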

3.1.3. Main Properties. Since we are more or less confident on how to compute a Laplace
transform, we can start asking deeper questions. For example, what type of functions have a
Laplace transform? It turns out that a large class of functions do: those that are piecewise continuous
on [0, ∞) and bounded by an exponential. This last property is particularly important and we give
it a name.

Definition 3.1.2. A function f defined on [0, ∞) is of exponential order s0 , where s0 is any real
number, iff there exist positive constants k, T such that
    |f(t)| ≤ k e^{s_0 t}    for all t > T.    (3.1.3)

Remarks:
(a) When the precise value of the constant s0 is not important we will say that f is of exponential
order.
(b) An example of a function that is not of exponential order is f(t) = e^{t^2}.

This definition helps to describe a set of functions having Laplace transform. Piecewise con-
tinuous functions on [0, ∞) of exponential order have Laplace transforms.

Theorem 3.1.3 (Convergence of LT). If a function f defined on [0, ∞) is piecewise continuous
and of exponential order s_0, then L[f] exists for all s > s_0 and there exists a positive constant
k such that
    |L[f]| ≤ k/(s − s_0),    s > s_0.

Proof of Theorem 3.1.3: From the definition of the Laplace transform we know that
L[f] = \lim_{N\to\infty}\int_0^N e^{-st} f(t)\,dt.
The definite integral on the interval [0, N] exists for every N > 0 since f is piecewise continuous on that interval, no matter how large N is. We only need to check whether the integral converges as N → ∞. This is the case for functions of exponential order, because
\Big|\int_0^N e^{-st} f(t)\,dt\Big| ≤ \int_0^N e^{-st}|f(t)|\,dt ≤ \int_0^N e^{-st}\,k\,e^{s_0 t}\,dt = k\int_0^N e^{-(s-s_0)t}\,dt.
Therefore, for s > s_0 we can take the limit as N → ∞,
\big|L[f]\big| ≤ \lim_{N\to\infty}\Big|\int_0^N e^{-st} f(t)\,dt\Big| ≤ k\,L[e^{s_0 t}] = \frac{k}{(s - s_0)}.
Therefore, the comparison test for improper integrals implies that the Laplace transform L[f] exists at least for s > s_0, and it also holds that
\big|L[f]\big| ≤ \frac{k}{s - s_0}, \qquad s > s_0.
This establishes the Theorem. □
The next result says that the Laplace transform is a linear transformation. This means that
the Laplace transform of a linear combination of functions is the linear combination of their Laplace
transforms.

Theorem 3.1.4 (Linearity). If L[f ] and L[g] exist, then for all a, b ∈ R holds
L[af + bg] = a L[f ] + b L[g].

Proof of Theorem 3.1.4: Since integration is a linear operation, so is the Laplace transform, as this calculation shows,
L[af + bg] = \int_0^\infty e^{-st}\big[a f(t) + b g(t)\big]\,dt = a\int_0^\infty e^{-st} f(t)\,dt + b\int_0^\infty e^{-st} g(t)\,dt = a\,L[f] + b\,L[g].
This establishes the Theorem. □

Example 3.1.5. Compute L[3t2 + 5 cos(4t)].


Solution: From the Theorem above and the Laplace transforms in Table 1 we know that
L[3t^2 + 5\cos(4t)] = 3\,L[t^2] + 5\,L[\cos(4t)] = 3\,\frac{2}{s^3} + 5\,\frac{s}{s^2 + 4^2}, \qquad s > 0,
that is,
L[3t^2 + 5\cos(4t)] = \frac{6}{s^3} + \frac{5s}{s^2 + 4^2}.
Therefore,
L[3t^2 + 5\cos(4t)] = \frac{5s^4 + 6s^2 + 96}{s^3(s^2 + 16)}, \qquad s > 0.    C

The Laplace transform can be used to solve differential equations. The Laplace transform
converts a differential equation into an algebraic equation. This is so because the Laplace transform
converts derivatives into multiplications. Here is the precise result.

Theorem 3.1.5 (Derivative into Multiplication). If a function f is continuously differentiable on [0, ∞) and of exponential order s_0, then L[f'] exists for s > s_0 and
L[f'] = s\,L[f] - f(0), \qquad s > s_0.    (3.1.4)

Proof of Theorem 3.1.5: The main calculation in this proof is to compute
L[f'] = \lim_{N\to\infty}\int_0^N e^{-st} f'(t)\,dt.
We start computing the definite integral above. Since f' is continuous on [0, ∞), that definite integral exists for all positive N, and we can integrate by parts,
\int_0^N e^{-st} f'(t)\,dt = \Big[e^{-st} f(t)\Big]_0^N - \int_0^N (-s)\,e^{-st} f(t)\,dt = e^{-sN} f(N) - f(0) + s\int_0^N e^{-st} f(t)\,dt.
We now compute the limit of this expression above as N → ∞. Since f is continuous on [0, ∞) and of exponential order s_0, we know that
\lim_{N\to\infty}\int_0^N e^{-st} f(t)\,dt = L[f], \qquad s > s_0.
Let us use one more time that f is of exponential order s_0. This means that there exist positive constants k and T such that |f(t)| ≤ k\,e^{s_0 t} for t ≥ T. Therefore,
\lim_{N\to\infty}\big|e^{-sN} f(N)\big| ≤ \lim_{N\to\infty} k\,e^{-sN} e^{s_0 N} = \lim_{N\to\infty} k\,e^{-(s-s_0)N} = 0, \qquad s > s_0.
These two results together imply that L[f'] exists and
L[f'] = s\,L[f] - f(0), \qquad s > s_0.
This establishes the Theorem. □

Example 3.1.6. Verify the result in Theorem 3.1.5 for the function f (t) = cos(bt).
Solution: We need to compute the left-hand side and the right-hand side of Eq. (3.1.4) and verify that we get the same result. We start with the left-hand side,
L[f'] = L[-b\sin(bt)] = -b\,L[\sin(bt)] = -b\,\frac{b}{s^2 + b^2} \quad\Rightarrow\quad L[f'] = -\frac{b^2}{s^2 + b^2}.
We now compute the right-hand side,
s\,L[f] - f(0) = s\,L[\cos(bt)] - 1 = s\,\frac{s}{s^2 + b^2} - 1 = \frac{s^2 - s^2 - b^2}{s^2 + b^2},
so we get
s\,L[f] - f(0) = -\frac{b^2}{s^2 + b^2}.
We conclude that L[f'] = s\,L[f] - f(0).    C
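The identity just verified by hand can also be checked symbolically. The following sketch is not part of the original notes; it assumes sympy and uses our own symbol names.

# Sketch: checking Eq. (3.1.4), L[f'] = s L[f] - f(0), for f(t) = cos(b*t).
from sympy import symbols, cos, diff, laplace_transform, simplify

t, s, b = symbols('t s b', positive=True)
f = cos(b*t)
lhs = laplace_transform(diff(f, t), t, s, noconds=True)          # L[f']
rhs = s*laplace_transform(f, t, s, noconds=True) - f.subs(t, 0)  # s L[f] - f(0)
print(simplify(lhs - rhs))  # expect 0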

It is not difficult to generalize Theorem 3.1.5 to higher order derivatives.



Theorem 3.1.6 (Higher Derivatives into Multiplication). If a function f is n-times continuously differentiable on [0, ∞) and of exponential order s_0, then L[f''], ..., L[f^{(n)}] exist for s > s_0 and
L[f''] = s^2\,L[f] - s\,f(0) - f'(0),    (3.1.5)
  ⋮
L[f^{(n)}] = s^n\,L[f] - s^{n-1} f(0) - ··· - f^{(n-1)}(0).    (3.1.6)

Proof of Theorem 3.1.6: We need to use Eq. (3.1.4) n times. We start with the Laplace transform of a second derivative,
L[f''] = L[(f')'] = s\,L[f'] - f'(0) = s\big(s\,L[f] - f(0)\big) - f'(0) = s^2\,L[f] - s\,f(0) - f'(0).
The formula for the Laplace transform of an nth derivative is computed by induction on n. We assume that the formula is true for n - 1,
L[f^{(n-1)}] = s^{n-1}\,L[f] - s^{n-2} f(0) - ··· - f^{(n-2)}(0).
Since L[f^{(n)}] = L[(f')^{(n-1)}], the formula above applied to f' gives
L[(f')^{(n-1)}] = s^{n-1}\,L[f'] - s^{n-2} f'(0) - ··· - (f')^{(n-2)}(0)
               = s^{n-1}\big(s\,L[f] - f(0)\big) - s^{n-2} f'(0) - ··· - f^{(n-1)}(0)
               = s^n\,L[f] - s^{n-1} f(0) - s^{n-2} f'(0) - ··· - f^{(n-1)}(0).
This establishes the Theorem. □

Example 3.1.7. Verify Theorem 3.1.6 for f'', where f(t) = \cos(bt).

Solution: We need to compute the left-hand side and the right-hand side of the first equation in Theorem 3.1.6, and verify that we get the same result. We start with the left-hand side,
L[f''] = L[-b^2\cos(bt)] = -b^2\,L[\cos(bt)] = -b^2\,\frac{s}{s^2 + b^2} \quad\Rightarrow\quad L[f''] = -\frac{b^2 s}{s^2 + b^2}.
We now compute the right-hand side,
s^2\,L[f] - s\,f(0) - f'(0) = s^2\,L[\cos(bt)] - s - 0 = s^2\,\frac{s}{s^2 + b^2} - s = \frac{s^3 - s^3 - b^2 s}{s^2 + b^2},
so we get
s^2\,L[f] - s\,f(0) - f'(0) = -\frac{b^2 s}{s^2 + b^2}.
We conclude that L[f''] = s^2\,L[f] - s\,f(0) - f'(0).    C

3.1.4. Solving Differential Equations. The Laplace transform can be used to solve differential equations. We Laplace transform the whole equation, which converts the differential equation for y into an algebraic equation for L[y]. We solve the algebraic equation and we transform back:
differential eq. for y  --(1) L-->  algebraic eq. for L[y]  --(2)-->  solve the algebraic eq. for L[y]  --(3)-->  transform back to obtain y (use the table).

Example 3.1.8. Use the Laplace transform to find the solution y of
y'' + 9y = 0, \qquad y(0) = y_0, \quad y'(0) = y_1.

Remark: Notice we already know what the solution of this problem is. Following § 2.2 we need to find the roots of
p(r) = r^2 + 9 \quad\Rightarrow\quad r_± = ±3i,
and then we get the general solution
y(t) = c_+\cos(3t) + c_-\sin(3t).
Then the initial conditions say that
y(t) = y_0\cos(3t) + \frac{y_1}{3}\sin(3t).
We now solve this problem using the Laplace transform method.
Solution: We now use the Laplace transform method:
L[y'' + 9y] = L[0] = 0.
The Laplace transform is a linear transformation,
L[y''] + 9\,L[y] = 0.
But the Laplace transform converts derivatives into multiplications,
s^2\,L[y] - s\,y(0) - y'(0) + 9\,L[y] = 0.
This is an algebraic equation for L[y]. It can be solved by rearranging terms and using the initial conditions,
(s^2 + 9)\,L[y] = s\,y_0 + y_1 \quad\Rightarrow\quad L[y] = y_0\,\frac{s}{(s^2 + 9)} + y_1\,\frac{1}{(s^2 + 9)}.
But from the Laplace transform table we see that
L[\cos(3t)] = \frac{s}{s^2 + 3^2}, \qquad L[\sin(3t)] = \frac{3}{s^2 + 3^2},
therefore,
L[y] = y_0\,L[\cos(3t)] + \frac{y_1}{3}\,L[\sin(3t)].
Once again, the Laplace transform is a linear transformation,
L[y] = L\Big[y_0\cos(3t) + \frac{y_1}{3}\sin(3t)\Big].
We obtain that
y(t) = y_0\cos(3t) + \frac{y_1}{3}\sin(3t).
C
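As a cross-check, the same initial value problem can be handed to a symbolic solver. The sketch below is not part of the original notes; it assumes sympy.

# Sketch: solving y'' + 9y = 0, y(0) = y0, y'(0) = y1, directly with dsolve.
from sympy import symbols, Function, Eq, dsolve

t, y0, y1 = symbols('t y0 y1')
y = Function('y')
ode = Eq(y(t).diff(t, 2) + 9*y(t), 0)
ics = {y(0): y0, y(t).diff(t).subs(t, 0): y1}
print(dsolve(ode, y(t), ics=ics))  # expect y(t) = y0*cos(3*t) + y1*sin(3*t)/3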

3.1.5. Exercises.

3.1.1.- . 3.1.2.- .

3.2. The Initial Value Problem


We will use the Laplace transform to solve differential equations. The main idea is:
differential eq. for y(t)  --(1) L-->  algebraic eq. for L[y(t)]  --(2)-->  solve the algebraic eq. for L[y(t)]  --(3)-->  transform back to obtain y(t) (use the table).

We will use the Laplace transform to solve differential equations with constant coefficients.
Although the method can be used with variable coefficients equations, the calculations could be
very complicated in such a case.
The Laplace transform method works with very general source functions, including step func-
tions, which are discontinuous, and Dirac’s deltas, which are generalized functions.

3.2.1. Solving Differential Equations. As we see in the sketch above, we start with a dif-
ferential equation for a function y. We first compute the Laplace transform of the whole differential
equation. Then we use the linearity of the Laplace transform, Theorem 3.1.4, and the property that
derivatives are converted into multiplications, Theorem 3.1.5, to transform the differential equation
into an algebraic equation for L[y]. Let us see how this works in a simple example, a first order
linear equation with constant coefficients—we already solved it in § ??.

Example 3.2.1. Use the Laplace transform to find the solution y to the initial value problem
y' + 2y = 0, \qquad y(0) = 3.

Solution: In § 1.5 we saw one way to solve this problem, using the integrating factor method. One can check that the solution is y(t) = 3e^{-2t}. We now use the Laplace transform. First, compute the Laplace transform of the differential equation,
L[y' + 2y] = L[0] = 0.
Theorem 3.1.4 says the Laplace transform is a linear operation, that is,
L[y'] + 2\,L[y] = 0.
Theorem 3.1.5 relates derivatives and multiplications, as follows,
\big[s\,L[y] - y(0)\big] + 2\,L[y] = 0 \quad\Rightarrow\quad (s + 2)\,L[y] = y(0).
In the last equation we have been able to transform the original differential equation for y into an algebraic equation for L[y]. We can solve for the unknown L[y] as follows,
L[y] = \frac{y(0)}{s + 2} \quad\Rightarrow\quad L[y] = \frac{3}{s + 2},
where in the last step we introduced the initial condition y(0) = 3. From the list of Laplace transforms given in § 3.1 we know that
L[e^{at}] = \frac{1}{s - a} \quad\Rightarrow\quad \frac{3}{s + 2} = 3\,L[e^{-2t}] = L[3 e^{-2t}].
So we arrive at L[y(t)] = L[3 e^{-2t}]. Here is where we need one more property of the Laplace transform. We show right after this example that
L[y(t)] = L[3 e^{-2t}] \quad\Rightarrow\quad y(t) = 3 e^{-2t}.
This property is called one-to-one. Hence the only solution is y(t) = 3 e^{-2t}.    C
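The inversion step can also be done by a computer algebra system. This sketch is not part of the original notes; it assumes sympy. Note that sympy writes inverse transforms with an explicit Heaviside(t) factor, which equals one on the domain t > 0 used here.

# Sketch: inverting L[y] = 3/(s + 2) from Example 3.2.1.
from sympy import symbols, inverse_laplace_transform

t, s = symbols('t s', positive=True)
Y = 3/(s + 2)
print(inverse_laplace_transform(Y, s, t))  # expect 3*exp(-2*t) (times Heaviside(t))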

3.2.2. One-to-One Property. Let us repeat the method we used to solve the differential
equation in Example 3.2.1. We first computed the Laplace transform of the whole differential
equation. Then we use the linearity of the Laplace transform, Theorem 3.1.4, and the property
that derivatives are converted into multiplications, Theorem 3.1.5, to transform the differential
equation into an algebraic equation for L[y]. We solved the algebraic equation and we got an
expression of the form
L[y(t)] = H(s),
where we have collected all the terms that come from the Laplace transformed differential equation
into the function H. We then used a Laplace transform table to find a function h such that
L[h(t)] = H(s).
We arrived to an equation of the form
L[y(t)] = L[h(t)].
Clearly, y = h is one solution of the equation above, hence a solution to the differential equation.
We now show that there are no solutions to the equation L[y] = L[h] other than y = h. The
reason is that the Laplace transform on continuous functions of exponential order is an one-to-one
transformation, also called injective.

Theorem 3.2.1 (One-to-One). If f , g are continuous on [0, ∞) of exponential order, then


L[f ] = L[g] ⇒ f = g.

Remarks:
(a) The result above holds for continuous functions f and g. But it can be extended to piecewise continuous functions. In the case of piecewise continuous functions f and g satisfying L[f] = L[g] one can prove that f = g + h, where h is a null function, meaning that \int_0^T h(t)\,dt = 0 for all T > 0. See Churchill's textbook [4], page 14.
(b) Once we know that the Laplace transform is a one-to-one transformation, we can define the
inverse transformation in the usual way.

Definition 3.2.2. The inverse Laplace transform, denoted L−1 , of a function F is


L−1 [F (s)] = f (t) ⇔ F (s) = L[f (t)].

Remarks: There is an explicit formula for the inverse Laplace transform, which involves an integral on the complex plane,
L^{-1}[F(s)]\Big|_t = \frac{1}{2\pi i}\lim_{c\to\infty}\int_{a-ic}^{a+ic} e^{st} F(s)\,ds.
See for example Churchill's textbook [4], page 176. However, we do not use this formula in these notes, since it involves integration on the complex plane.
Proof of Theorem 3.2.1: The proof is based on a clever change of variables and on Weierstrass
Approximation Theorem of continuous functions by polynomials. Before we get to the change of
variable we need to do some rewriting. Introduce the function u = f − g, then the linearity of the
Laplace transform implies
L[u] = L[f − g] = L[f ] − L[g] = 0.
What we need to show is that the function u vanishes identically. Let us start with the definition of the Laplace transform,
L[u] = \int_0^\infty e^{-st} u(t)\,dt.
We know that f and g are of exponential order, say s_0, therefore u is of exponential order s_0, meaning that there exist positive constants k and T such that
|u(t)| < k\,e^{s_0 t}, \qquad t ≥ T.

Evaluate L[u] at \tilde s = s_1 + n + 1, where s_1 is any real number such that s_1 > s_0, and n is any positive integer. We get
L[u]\big|_{\tilde s} = \int_0^\infty e^{-(s_1+n+1)t}\,u(t)\,dt = \int_0^\infty e^{-s_1 t}\,e^{-(n+1)t}\,u(t)\,dt.
We now do the substitution y = e^{-t}, so dy = -e^{-t}\,dt,
L[u]\big|_{\tilde s} = \int_1^0 y^{s_1} y^n\,u(-\ln(y))\,(-dy) = \int_0^1 y^{s_1} y^n\,u(-\ln(y))\,dy.
Introduce the function v(y) = y^{s_1}\,u(-\ln(y)), so the integral is
L[u]\big|_{\tilde s} = \int_0^1 y^n\,v(y)\,dy.    (3.2.1)
We know that L[u] exists because u is continuous and of exponential order, so the function v does not diverge at y = 0. To double check this, recall that t = -\ln(y) → ∞ as y → 0^+, and u is of exponential order s_0, hence
\lim_{y\to 0^+}|v(y)| = \lim_{t\to\infty} e^{-s_1 t}|u(t)| < \lim_{t\to\infty} k\,e^{-(s_1-s_0)t} = 0.
Our main hypothesis is that L[u] = 0 for all values of s such that L[u] is defined, in particular \tilde s. By looking at Eq. (3.2.1) this means that
\int_0^1 y^n\,v(y)\,dy = 0, \qquad n = 1, 2, 3, \ldots.
0
The equation above and the linearity of the integral imply that this function v is perpendicular to every polynomial p, that is,
\int_0^1 p(y)\,v(y)\,dy = 0,    (3.2.2)
for every polynomial p. Knowing that, we can do the following calculation,
\int_0^1 v^2(y)\,dy = \int_0^1 \big[v(y) - p(y)\big]\,v(y)\,dy + \int_0^1 p(y)\,v(y)\,dy.
The last term in the equation above vanishes because of Eq. (3.2.2), therefore
\int_0^1 v^2(y)\,dy = \int_0^1 \big[v(y) - p(y)\big]\,v(y)\,dy ≤ \int_0^1 \big|v(y) - p(y)\big|\,|v(y)|\,dy ≤ \max_{y\in[0,1]}|v(y)|\int_0^1 \big|v(y) - p(y)\big|\,dy.    (3.2.3)
We remark that the inequality above is true for every polynomial p. Here is where we use the Weierstrass Approximation Theorem, which essentially says that every continuous function on a closed interval can be approximated by a polynomial.

Theorem 3.2.3 (Weierstrass Approximation). If f is a continuous function on a closed interval [a, b], then for every ε > 0 there exists a polynomial q_ε such that
\max_{y\in[a,b]}|f(y) - q_ε(y)| < ε.

The proof of this theorem can be found in a real analysis textbook. The Weierstrass result implies that, given v and ε > 0, there exists a polynomial p_ε such that the inequality in (3.2.3) has the form
\int_0^1 v^2(y)\,dy ≤ \max_{y\in[0,1]}|v(y)|\int_0^1 \big|v(y) - p_ε(y)\big|\,dy ≤ \max_{y\in[0,1]}|v(y)|\;ε.
Since ε can be chosen as small as we please, we get
\int_0^1 v^2(y)\,dy = 0.

But v is continuous, hence v = 0, meaning that f = g. This establishes the Theorem. 

3.2.3. Partial Fractions. We are now ready to start using the Laplace transform to solve
second order linear differential equations with constant coefficients. The differential equation for y
will be transformed into an algebraic equation for L[y]. We will then arrive to an equation of the
form L[y(t)] = H(s). We will see, already in the first example below, that usually this function
H does not appear in Table 1. We will need to rewrite H as a linear combination of simpler
functions, each one appearing in Table 1. One of the more used techniques to do that is called
Partial Fractions. Let us solve the next example.

Example 3.2.2. Use the Laplace transform to find the solution y to the initial value problem
y'' - y' - 2y = 0, \qquad y(0) = 1, \quad y'(0) = 0.

Solution: First, compute the Laplace transform of the differential equation,
L[y'' - y' - 2y] = L[0] = 0.
Theorem 3.1.4 says that the Laplace transform is a linear operation,
L[y''] - L[y'] - 2\,L[y] = 0.
Then, Theorem 3.1.5 relates derivatives and multiplications,
\big[s^2\,L[y] - s\,y(0) - y'(0)\big] - \big[s\,L[y] - y(0)\big] - 2\,L[y] = 0,
which is equivalent to the equation
(s^2 - s - 2)\,L[y] = (s - 1)\,y(0) + y'(0).
Once again we have transformed the original differential equation for y into an algebraic equation for L[y]. Introduce the initial conditions into the last equation above, that is,
(s^2 - s - 2)\,L[y] = (s - 1).
Solve for the unknown L[y] as follows,
L[y] = \frac{(s - 1)}{(s^2 - s - 2)}.
The function on the right-hand side above does not appear in Table 1. We now use partial fractions to find a function whose Laplace transform is the right-hand side of the equation above. First find the roots of the polynomial in the denominator,
s^2 - s - 2 = 0 \quad\Rightarrow\quad s_± = \frac{1}{2}\big(1 ± \sqrt{1 + 8}\,\big) \quad\Rightarrow\quad s_+ = 2, \quad s_- = -1,
that is, the polynomial has two real roots. In this case we factorize the denominator,
L[y] = \frac{(s - 1)}{(s - 2)(s + 1)}.
The partial fraction decomposition of the right-hand side in the equation above is the following: find constants a and b such that
\frac{(s - 1)}{(s - 2)(s + 1)} = \frac{a}{s - 2} + \frac{b}{s + 1}.
A simple calculation shows
\frac{(s - 1)}{(s - 2)(s + 1)} = \frac{a}{s - 2} + \frac{b}{s + 1} = \frac{a(s + 1) + b(s - 2)}{(s - 2)(s + 1)} = \frac{s(a + b) + (a - 2b)}{(s - 2)(s + 1)}.
Hence the constants a and b must be solutions of the equations
(s - 1) = s(a + b) + (a - 2b) \quad\Rightarrow\quad a + b = 1, \quad a - 2b = -1.

The solution is a = \frac{1}{3} and b = \frac{2}{3}. Hence,
L[y] = \frac{1}{3}\,\frac{1}{(s - 2)} + \frac{2}{3}\,\frac{1}{(s + 1)}.
From the list of Laplace transforms given in § 3.1, Table 1, we know that
L[e^{at}] = \frac{1}{s - a} \quad\Rightarrow\quad \frac{1}{s - 2} = L[e^{2t}], \quad \frac{1}{s + 1} = L[e^{-t}].
So we arrive at the equation
L[y] = \frac{1}{3}\,L[e^{2t}] + \frac{2}{3}\,L[e^{-t}] = L\Big[\frac{1}{3}\big(e^{2t} + 2 e^{-t}\big)\Big].
We conclude that
y(t) = \frac{1}{3}\big(e^{2t} + 2 e^{-t}\big).
C
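The partial fraction step, which is the only labor-intensive part of the example, can be automated. The sketch below is not part of the original notes; it assumes sympy.

# Sketch: partial fractions and inversion for Example 3.2.2.
from sympy import symbols, apart, inverse_laplace_transform

t, s = symbols('t s', positive=True)
Y = (s - 1)/(s**2 - s - 2)
print(apart(Y, s))                         # expect 1/(3*(s - 2)) + 2/(3*(s + 1))
print(inverse_laplace_transform(Y, s, t))  # expect exp(2*t)/3 + 2*exp(-t)/3 (times Heaviside(t))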

The Partial Fraction Method is usually introduced in a second course of Calculus to integrate rational functions. We need it here to use Table 1 to find inverse Laplace transforms. The method applies to rational functions
R(s) = \frac{Q(s)}{P(s)},
where P, Q are polynomials and the degree of the numerator is less than the degree of the denominator. In the example above,
R(s) = \frac{(s - 1)}{(s^2 - s - 2)}.
One starts by rewriting the polynomial in the denominator as a product of polynomials of degree two or one. In the example above,
R(s) = \frac{(s - 1)}{(s - 2)(s + 1)}.
One then rewrites the rational function as an addition of simpler rational functions. In the example above,
R(s) = \frac{a}{(s - 2)} + \frac{b}{(s + 1)}.
We now solve a few examples to recall the different partial fraction cases that can appear when solving differential equations.

Example 3.2.3. Use the Laplace transform to find the solution y to the initial value problem
y'' - 4y' + 4y = 0, \qquad y(0) = 1, \quad y'(0) = 1.

Solution: First, compute the Laplace transform of the differential equation,
L[y'' - 4y' + 4y] = L[0] = 0.
Theorem 3.1.4 says that the Laplace transform is a linear operation,
L[y''] - 4\,L[y'] + 4\,L[y] = 0.
Theorem 3.1.5 relates derivatives with multiplication,
\big[s^2\,L[y] - s\,y(0) - y'(0)\big] - 4\big[s\,L[y] - y(0)\big] + 4\,L[y] = 0,
which is equivalent to the equation
(s^2 - 4s + 4)\,L[y] = (s - 4)\,y(0) + y'(0).
Introduce the initial conditions y(0) = 1 and y'(0) = 1 into the equation above,
(s^2 - 4s + 4)\,L[y] = s - 3.

Solve the algebraic equation for L[y],
L[y] = \frac{(s - 3)}{(s^2 - 4s + 4)}.
We now want to find a function y whose Laplace transform is the right-hand side in the equation above. In order to see if partial fractions will be needed, we now find the roots of the polynomial in the denominator,
s^2 - 4s + 4 = 0 \quad\Rightarrow\quad s_± = \frac{1}{2}\big(4 ± \sqrt{16 - 16}\,\big) \quad\Rightarrow\quad s_+ = s_- = 2,
that is, the polynomial has a single real root, so L[y] can be written as
L[y] = \frac{(s - 3)}{(s - 2)^2}.
This expression is already in the partial fraction decomposition. We now rewrite the right-hand side of the equation above in a way that makes it simple to use the Laplace transform table in § 3.1,
L[y] = \frac{(s - 2) + 2 - 3}{(s - 2)^2} = \frac{(s - 2)}{(s - 2)^2} - \frac{1}{(s - 2)^2} \quad\Rightarrow\quad L[y] = \frac{1}{s - 2} - \frac{1}{(s - 2)^2}.
From the list of Laplace transforms given in Table 1, § 3.1, we know that
L[e^{at}] = \frac{1}{s - a} \quad\Rightarrow\quad \frac{1}{s - 2} = L[e^{2t}],
L[t e^{at}] = \frac{1}{(s - a)^2} \quad\Rightarrow\quad \frac{1}{(s - 2)^2} = L[t e^{2t}].
So we arrive at the equation
L[y] = L[e^{2t}] - L[t e^{2t}] = L\big[e^{2t} - t e^{2t}\big] \quad\Rightarrow\quad y(t) = e^{2t} - t e^{2t}.
C

Example 3.2.4. Use the Laplace transform to find the solution y to the initial value problem
y'' - 4y' + 4y = 3 e^t, \qquad y(0) = 0, \quad y'(0) = 0.

Solution: First, compute the Laplace transform of the differential equation,
L[y'' - 4y' + 4y] = L[3 e^t] = \frac{3}{s - 1}.
The Laplace transform is a linear operation,
L[y''] - 4\,L[y'] + 4\,L[y] = \frac{3}{s - 1}.
The Laplace transform relates derivatives with multiplication,
\big[s^2\,L[y] - s\,y(0) - y'(0)\big] - 4\big[s\,L[y] - y(0)\big] + 4\,L[y] = \frac{3}{s - 1}.
But the initial conditions are y(0) = 0 and y'(0) = 0, so
(s^2 - 4s + 4)\,L[y] = \frac{3}{s - 1}.
Solve the algebraic equation for L[y],
L[y] = \frac{3}{(s - 1)(s^2 - 4s + 4)}.
We use partial fractions to simplify the right-hand side above. We start finding the roots of the polynomial in the denominator,
s^2 - 4s + 4 = 0 \quad\Rightarrow\quad s_± = \frac{1}{2}\big(4 ± \sqrt{16 - 16}\,\big) \quad\Rightarrow\quad s_+ = s_- = 2,
that is, the polynomial has a single real root, so L[y] can be written as
L[y] = \frac{3}{(s - 1)(s - 2)^2}.

The partial fraction decomposition of the right-hand side above is
\frac{3}{(s - 1)(s - 2)^2} = \frac{a}{(s - 1)} + \frac{b s + c}{(s - 2)^2} = \frac{a(s - 2)^2 + (b s + c)(s - 1)}{(s - 1)(s - 2)^2}.
From the far right and left expressions above we get
3 = a(s - 2)^2 + (b s + c)(s - 1) = a(s^2 - 4s + 4) + b s^2 - b s + c s - c.
Expanding all terms above, and reordering terms, we get
(a + b)\,s^2 + (-4a - b + c)\,s + (4a - c - 3) = 0.
Since this polynomial in s vanishes for all s ∈ R, we get that
a + b = 0, \quad -4a - b + c = 0, \quad 4a - c - 3 = 0 \quad\Rightarrow\quad a = 3, \quad b = -3, \quad c = 9.
So we get
L[y] = \frac{3}{(s - 1)(s - 2)^2} = \frac{3}{s - 1} + \frac{-3s + 9}{(s - 2)^2}.
One last trick is needed on the last term above,
\frac{-3s + 9}{(s - 2)^2} = \frac{-3(s - 2 + 2) + 9}{(s - 2)^2} = \frac{-3(s - 2)}{(s - 2)^2} + \frac{-6 + 9}{(s - 2)^2} = -\frac{3}{(s - 2)} + \frac{3}{(s - 2)^2}.
So we finally get
L[y] = \frac{3}{s - 1} - \frac{3}{(s - 2)} + \frac{3}{(s - 2)^2}.
From our Laplace transforms Table we know that
L[e^{at}] = \frac{1}{s - a} \quad\Rightarrow\quad \frac{1}{s - 2} = L[e^{2t}],
L[t e^{at}] = \frac{1}{(s - a)^2} \quad\Rightarrow\quad \frac{1}{(s - 2)^2} = L[t e^{2t}].
So we arrive at the formula
L[y] = 3\,L[e^t] - 3\,L[e^{2t}] + 3\,L[t e^{2t}] = L\big[3(e^t - e^{2t} + t e^{2t})\big].
So we conclude that y(t) = 3(e^t - e^{2t} + t e^{2t}).    C
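One way to gain confidence in the final formula is to substitute it back into the equation. The sketch below is not part of the original notes; it assumes sympy.

# Sketch: checking the solution of Example 3.2.4 by direct substitution.
from sympy import symbols, Function, Eq, exp, checkodesol

t = symbols('t')
y = Function('y')
ode = Eq(y(t).diff(t, 2) - 4*y(t).diff(t) + 4*y(t), 3*exp(t))
sol = Eq(y(t), 3*(exp(t) - exp(2*t) + t*exp(2*t)))
print(checkodesol(ode, sol))  # expect (True, 0)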

Example 3.2.5. Use the Laplace transform to find the solution y to the initial value problem
y'' - 4y' + 4y = 3\sin(2t), \qquad y(0) = 1, \quad y'(0) = 1.

Solution: First, compute the Laplace transform of the differential equation,
L[y'' - 4y' + 4y] = L[3\sin(2t)].
The right-hand side above can be expressed as follows,
L[3\sin(2t)] = 3\,L[\sin(2t)] = 3\,\frac{2}{s^2 + 2^2} = \frac{6}{s^2 + 4}.
Theorem 3.1.4 says that the Laplace transform is a linear operation,
L[y''] - 4\,L[y'] + 4\,L[y] = \frac{6}{s^2 + 4},
and Theorem 3.1.5 relates derivatives with multiplications,
\big[s^2\,L[y] - s\,y(0) - y'(0)\big] - 4\big[s\,L[y] - y(0)\big] + 4\,L[y] = \frac{6}{s^2 + 4}.
Reorder terms,
(s^2 - 4s + 4)\,L[y] = (s - 4)\,y(0) + y'(0) + \frac{6}{s^2 + 4}.

Introduce the initial conditions y(0) = 1 and y'(0) = 1,
(s^2 - 4s + 4)\,L[y] = s - 3 + \frac{6}{s^2 + 4}.
Solve this algebraic equation for L[y], that is,
L[y] = \frac{(s - 3)}{(s^2 - 4s + 4)} + \frac{6}{(s^2 - 4s + 4)(s^2 + 4)}.
From the example above we know that s^2 - 4s + 4 = (s - 2)^2, so we obtain
L[y] = \frac{1}{s - 2} - \frac{1}{(s - 2)^2} + \frac{6}{(s - 2)^2(s^2 + 4)}.    (3.2.4)
From the previous example we know that
L[e^{2t} - t e^{2t}] = \frac{1}{s - 2} - \frac{1}{(s - 2)^2}.    (3.2.5)
We now use partial fractions to simplify the third term on the right-hand side of Eq. (3.2.4). The appropriate partial fraction decomposition for this term is the following: find constants a, b, c, d such that
\frac{6}{(s - 2)^2(s^2 + 4)} = \frac{a s + b}{s^2 + 4} + \frac{c}{(s - 2)} + \frac{d}{(s - 2)^2}.
Take common denominator on the right-hand side above, and one obtains the system
a + c = 0, \quad -4a + b - 2c + d = 0, \quad 4a - 4b + 4c = 0, \quad 4b - 8c + 4d = 6.
The solution for this linear system of equations is the following:
a = \frac{3}{8}, \quad b = 0, \quad c = -\frac{3}{8}, \quad d = \frac{3}{4}.
Therefore,
\frac{6}{(s - 2)^2(s^2 + 4)} = \frac{3}{8}\,\frac{s}{s^2 + 4} - \frac{3}{8}\,\frac{1}{(s - 2)} + \frac{3}{4}\,\frac{1}{(s - 2)^2}.
We can rewrite this expression above in terms of the Laplace transforms given in Table 1, in § 3.1, as follows,
\frac{6}{(s - 2)^2(s^2 + 4)} = \frac{3}{8}\,L[\cos(2t)] - \frac{3}{8}\,L[e^{2t}] + \frac{3}{4}\,L[t e^{2t}],
and using the linearity of the Laplace transform,
\frac{6}{(s - 2)^2(s^2 + 4)} = L\Big[\frac{3}{8}\cos(2t) - \frac{3}{8}\,e^{2t} + \frac{3}{4}\,t e^{2t}\Big].    (3.2.6)
Finally, introducing Eqs. (3.2.5) and (3.2.6) into Eq. (3.2.4) we obtain
L[y(t)] = L\Big[(1 - t)\,e^{2t} + \frac{3}{8}(-1 + 2t)\,e^{2t} + \frac{3}{8}\cos(2t)\Big].
Since the Laplace transform is an invertible transformation, we conclude that
y(t) = (1 - t)\,e^{2t} + \frac{3}{8}(2t - 1)\,e^{2t} + \frac{3}{8}\cos(2t).
C
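The same substitution check works for this longer computation, including the initial conditions. This sketch is not part of the original notes; it assumes sympy.

# Sketch: checking the solution of Example 3.2.5 and its initial data.
from sympy import symbols, Function, Eq, sin, cos, exp, Rational, checkodesol

t = symbols('t')
y = Function('y')
expr = (1 - t)*exp(2*t) + Rational(3, 8)*(2*t - 1)*exp(2*t) + Rational(3, 8)*cos(2*t)
ode = Eq(y(t).diff(t, 2) - 4*y(t).diff(t) + 4*y(t), 3*sin(2*t))
print(checkodesol(ode, Eq(y(t), expr)))          # expect (True, 0)
print(expr.subs(t, 0), expr.diff(t).subs(t, 0))  # expect 1 1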

3.2.4. Higher Order IVP. The Laplace transform method can be used with linear differen-
tial equations of higher order than second order, as long as the equation coefficients are constant.
Below we show how we can solve a fourth order equation.

Example 3.2.6. Use the Laplace transform to find the solution y to the initial value problem
y^{(4)} - 4y = 0, \qquad y(0) = 1, \quad y'(0) = 0, \quad y''(0) = -2, \quad y'''(0) = 0.

Solution: Compute the Laplace transform of the differential equation,
L[y^{(4)} - 4y] = L[0] = 0.
The Laplace transform is a linear operation,
L[y^{(4)}] - 4\,L[y] = 0,
and the Laplace transform relates derivatives with multiplications,
\big[s^4\,L[y] - s^3\,y(0) - s^2\,y'(0) - s\,y''(0) - y'''(0)\big] - 4\,L[y] = 0.
From the initial conditions we get
s^4\,L[y] - s^3 + 2s - 4\,L[y] = 0 \quad\Rightarrow\quad (s^4 - 4)\,L[y] = s^3 - 2s \quad\Rightarrow\quad L[y] = \frac{(s^3 - 2s)}{(s^4 - 4)}.
In this case we are lucky, because
L[y] = \frac{s(s^2 - 2)}{(s^2 - 2)(s^2 + 2)} = \frac{s}{(s^2 + 2)}.
Since
L[\cos(at)] = \frac{s}{s^2 + a^2},
we get that
L[y] = L[\cos(\sqrt{2}\,t)] \quad\Rightarrow\quad y(t) = \cos(\sqrt{2}\,t).
C
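Higher order constant coefficient problems can also be checked with a symbolic solver. The sketch below is not part of the original notes; it assumes sympy.

# Sketch: Example 3.2.6 solved directly with dsolve.
from sympy import symbols, Function, Eq, dsolve

t = symbols('t')
y = Function('y')
ode = Eq(y(t).diff(t, 4) - 4*y(t), 0)
ics = {y(0): 1,
       y(t).diff(t).subs(t, 0): 0,
       y(t).diff(t, 2).subs(t, 0): -2,
       y(t).diff(t, 3).subs(t, 0): 0}
print(dsolve(ode, y(t), ics=ics))  # expect y(t) = cos(sqrt(2)*t)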

3.2.5. Exercises.

3.2.1.- . 3.2.2.- .

3.3. Discontinuous Sources


The Laplace transform method can be used to solve linear differential equations with discontinuous
sources. In this section we review the simplest discontinuous function—the step function—and we
use steps to construct more general piecewise continuous functions. Then, we compute the Laplace
transform of a step function. But the main results in this section are the translation identities,
Theorem 3.3.3. These identities, together with the Laplace transform table in § 3.1, can be very
useful to solve differential equations with discontinuous sources.

3.3.1. Step Functions. We start with a definition of a step function.

Definition 3.3.1. The step function at t = 0 is denoted by u and given by
u(t) = \begin{cases} 0, & t < 0, \\ 1, & t ≥ 0. \end{cases}    (3.3.1)

Example 3.3.1. Graph the step u, uc (t) = u(t − c), and u−c (t) = u(t + c), for c > 0.
Solution: The step function u and its right and left translations are plotted in Fig. 1.


Figure 1. The graph of the step function given in Eq. (3.3.1), a right and
a left translation by a constant c > 0, respectively, of this step function.

Recall that given a function with values f (t) and a positive constant c, then f (t−c) and f (t+c)
are the function values of the right translation and the left translation, respectively, of the original
function f . In Fig. 2 we plot the graph of functions f (t) = eat , g(t) = u(t) eat and their respective
right translations by c > 0.

Figure 2. The function f(t) = e^t, its right translation by c > 0, the function f(t) = u(t)\,e^{at}, and its right translation by c.

Right and left translations of step functions are useful to construct bump functions.

Example 3.3.2. Graph the bump function b(t) = u(t − a) − u(t − b), where a < b.

Solution: The bump function we need to graph is
b(t) = u(t - a) - u(t - b) \quad\Leftrightarrow\quad b(t) = \begin{cases} 0, & t < a, \\ 1, & a ≤ t < b, \\ 0, & t ≥ b. \end{cases}    (3.3.2)
The graph of a bump function is given in Fig. 3, constructed from two step functions. Step and bump functions are useful to construct more general piecewise continuous functions.

Figure 3. A bump function b constructed with translated step functions.

Example 3.3.3. Graph the function
f(t) = \big[u(t - 1) - u(t - 2)\big]\,e^{at}.
Solution: Recall that the function
b(t) = u(t - 1) - u(t - 2)
is a bump function with sides at t = 1 and t = 2. Then, the function
f(t) = b(t)\,e^{at}
is nonzero where b is nonzero, that is on [1, 2), and on that domain it takes values e^{at}. The graph of f is given in Fig. 4.

Figure 4. Function f.
C

3.3.2. The Laplace Transform of Steps. We compute the Laplace transform of a step
function using the definition of the Laplace transform.

Theorem 3.3.2. For every number c ∈ R and every s > 0 holds
L[u(t - c)] = \begin{cases} \dfrac{e^{-cs}}{s} & for c > 0, \\[1ex] \dfrac{1}{s} & for c < 0. \end{cases}

Proof of Theorem 3.3.2: Consider the case c > 0. The Laplace transform is
L[u(t - c)] = \int_0^\infty e^{-st}\,u(t - c)\,dt = \int_c^\infty e^{-st}\,dt,
where we used that the step function vanishes for t < c. Now compute the improper integral,
L[u(t - c)] = \lim_{N\to\infty}\Big[-\frac{1}{s}\big(e^{-Ns} - e^{-cs}\big)\Big] = \frac{e^{-cs}}{s} \quad\Rightarrow\quad L[u(t - c)] = \frac{e^{-cs}}{s}.

Consider now the case c < 0. The step function is identically equal to one in the domain of integration of the Laplace transform, which is [0, ∞), hence
L[u(t - c)] = \int_0^\infty e^{-st}\,u(t - c)\,dt = \int_0^\infty e^{-st}\,dt = L[1] = \frac{1}{s}.
This establishes the Theorem. □

Example 3.3.4. Compute L[3\,u(t - 2)].

Solution: The Laplace transform is a linear operation, so
L[3\,u(t - 2)] = 3\,L[u(t - 2)],
and Theorem 3.3.2 above implies that L[3\,u(t - 2)] = \frac{3 e^{-2s}}{s}.    C

Remarks:
(a) The Laplace transform is an invertible transformation on the set of functions we work with in this class.
(b) L[f] = F ⇔ L^{-1}[F] = f.

Example 3.3.5. Compute L^{-1}\Big[\dfrac{e^{-3s}}{s}\Big].
Solution: Theorem 3.3.2 says that \dfrac{e^{-3s}}{s} = L[u(t - 3)], so L^{-1}\Big[\dfrac{e^{-3s}}{s}\Big] = u(t - 3).    C
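Step functions are available in computer algebra systems as well, so the two computations above can be reproduced mechanically. This sketch is not part of the original notes; it assumes sympy, whose Heaviside plays the role of our step u.

# Sketch: Laplace transforms of steps, as in Examples 3.3.4 and 3.3.5.
from sympy import symbols, Heaviside, exp, laplace_transform, inverse_laplace_transform

t, s = symbols('t s', positive=True)
print(laplace_transform(3*Heaviside(t - 2), t, s, noconds=True))  # expect 3*exp(-2*s)/s
print(inverse_laplace_transform(exp(-3*s)/s, s, t))               # expect Heaviside(t - 3)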

3.3.3. Translation Identities. We now introduce two properties relating the Laplace trans-
form and translations. The first property relates the Laplace transform of a translation with a
multiplication by an exponential. The second property can be thought as the inverse of the first
one.

Theorem 3.3.3 (Translation Identities). If L[f (t)](s) exists for s > a, then
L[u(t − c)f (t − c)] = e−cs L[f (t)], s > a, c>0 (3.3.3)
L[ect f (t)] = L[f (t)](s − c), s > a + c, c ∈ R. (3.3.4)

Example 3.3.6. Take f(t) = \cos(t) and write the equations given by the Theorem above.
Solution: Since
L[\cos(t)] = \frac{s}{s^2 + 1},
the translation identities give
L[u(t - c)\cos(t - c)] = e^{-cs}\,\frac{s}{s^2 + 1}, \qquad L[e^{ct}\cos(t)] = \frac{(s - c)}{(s - c)^2 + 1}.
C

Remarks:
(a) We can highlight the main idea in the theorem above as follows:
  
L right-translation (uf ) = (exp) L[f ] ,
  
L (exp) (f ) = translation L[f ] .
(b) Denoting F (s) = L[f (t)], then an equivalent expression for Eqs. (3.3.3)-(3.3.4) is
L[u(t − c)f (t − c)] = e−cs F (s),
L[ect f (t)] = F (s − c).

(c) The inverse form of Eqs. (3.3.3)-(3.3.4) is given by,


L−1 [e−cs F (s)] = u(t − c)f (t − c), (3.3.5)
−1 ct
L [F (s − c)] = e f (t). (3.3.6)
(d) Eq. (3.3.4) holds for all c ∈ R, while Eq. (3.3.3) holds only for c > 0.
(e) Show that in the case c < 0 the following equation holds,
L[u(t + |c|)\,f(t + |c|)] = e^{|c|s}\Big(L[f(t)] - \int_0^{|c|} e^{-st} f(t)\,dt\Big).

Proof of Theorem 3.3.3: The proof is again based on a change of the integration variable. We start with Eq. (3.3.3), as follows,
L[u(t - c)\,f(t - c)] = \int_0^\infty e^{-st}\,u(t - c)\,f(t - c)\,dt
                      = \int_c^\infty e^{-st}\,f(t - c)\,dt, \qquad τ = t - c, \quad dτ = dt, \quad c > 0,
                      = \int_0^\infty e^{-s(τ + c)}\,f(τ)\,dτ
                      = e^{-cs}\int_0^\infty e^{-sτ}\,f(τ)\,dτ
                      = e^{-cs}\,L[f(t)], \qquad s > a.
The proof of Eq. (3.3.4) is a bit simpler, since
L[e^{ct} f(t)] = \int_0^\infty e^{-st}\,e^{ct}\,f(t)\,dt = \int_0^\infty e^{-(s-c)t}\,f(t)\,dt = L[f(t)](s - c),
which holds for s - c > a. This establishes the Theorem. □


 
Example 3.3.7. Compute L\big[u(t - 2)\sin(a(t - 2))\big].
Solution: Both L[\sin(at)] = \frac{a}{s^2 + a^2} and L[u(t - c)\,f(t - c)] = e^{-cs}\,L[f(t)] imply
L\big[u(t - 2)\sin(a(t - 2))\big] = e^{-2s}\,L[\sin(at)] = e^{-2s}\,\frac{a}{s^2 + a^2}.
We conclude: L\big[u(t - 2)\sin(a(t - 2))\big] = \frac{a\,e^{-2s}}{s^2 + a^2}.    C

Example 3.3.8. Compute L\big[e^{3t}\sin(at)\big].
Solution: Since L[e^{ct} f(t)] = L[f](s - c), then we get
L\big[e^{3t}\sin(at)\big] = \frac{a}{(s - 3)^2 + a^2}, \qquad s > 3.
C

Example 3.3.9. Compute both L\big[u(t - 2)\cos(a(t - 2))\big] and L\big[e^{3t}\cos(at)\big].
Solution: Since L[\cos(at)] = \frac{s}{s^2 + a^2}, then
L\big[u(t - 2)\cos(a(t - 2))\big] = e^{-2s}\,\frac{s}{(s^2 + a^2)}, \qquad L\big[e^{3t}\cos(at)\big] = \frac{(s - 3)}{(s - 3)^2 + a^2}.
C

Example 3.3.10. Find the Laplace transform of the function
f(t) = \begin{cases} 0, & t < 1, \\ t^2 - 2t + 2, & t ≥ 1. \end{cases}    (3.3.7)

Solution: The idea is to rewrite the function f so we can use the Laplace transform Table 1, in § 3.1, to compute its Laplace transform. Since the function f vanishes for all t < 1, we use step functions to write f as
f(t) = u(t - 1)(t^2 - 2t + 2).
Now, we would like to use the translation identity in Eq. (3.3.3), but we cannot, since the step is a function of (t - 1) while the polynomial is a function of t. We need to rewrite the polynomial as a function of (t - 1). So we add and subtract 1 in the appropriate places,
t^2 - 2t + 2 = (t - 1 + 1)^2 - 2(t - 1 + 1) + 2.
Recall the identity (a + b)^2 = a^2 + 2ab + b^2, and use it in the quadratic term above for a = (t - 1) and b = 1. We get
(t - 1 + 1)^2 = (t - 1)^2 + 2(t - 1) + 1.
This identity into the polynomial above implies
t^2 - 2t + 2 = (t - 1)^2 + 2(t - 1) + 1 - 2(t - 1) - 2 + 2 \quad\Rightarrow\quad t^2 - 2t + 2 = (t - 1)^2 + 1.
The polynomial is the parabola t^2 translated to the right and up by one; the function f, which vanishes for t < 1, is discontinuous at t = 1, as can be seen in Fig. 5. So the function f can be written as follows,
f(t) = u(t - 1)(t - 1)^2 + u(t - 1).
Since we know that L[t^2] = \frac{2}{s^3}, Eq. (3.3.3) implies
L[f(t)] = L[u(t - 1)(t - 1)^2] + L[u(t - 1)] = e^{-s}\,\frac{2}{s^3} + e^{-s}\,\frac{1}{s},
so we get
L[f(t)] = \frac{e^{-s}}{s^3}\,(2 + s^2).

Figure 5. Function f given in Eq. (3.3.7).
C
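The rewriting in powers of (t - 1) is exactly what a computer algebra system does internally when it applies the translation identity. The sketch below is not part of the original notes; it assumes sympy and that its integrator handles this product of a step and a polynomial.

# Sketch: checking Example 3.3.10.
from sympy import symbols, Heaviside, laplace_transform, simplify

t, s = symbols('t s', positive=True)
f = Heaviside(t - 1)*(t**2 - 2*t + 2)
print(simplify(laplace_transform(f, t, s, noconds=True)))  # expect exp(-s)*(s**2 + 2)/s**3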

Example 3.3.11. Find the function f such that L[f(t)] = \frac{e^{-4s}}{s^2 + 5}.
Solution: Notice that
L[f(t)] = e^{-4s}\,\frac{1}{s^2 + 5} \quad\Rightarrow\quad L[f(t)] = \frac{1}{\sqrt{5}}\,e^{-4s}\,\frac{\sqrt{5}}{s^2 + (\sqrt{5})^2}.
Recall that L[\sin(at)] = \frac{a}{(s^2 + a^2)}, then
L[f(t)] = \frac{1}{\sqrt{5}}\,e^{-4s}\,L[\sin(\sqrt{5}\,t)].
But the translation identity
e^{-cs}\,L[f(t)] = L[u(t - c)\,f(t - c)]
implies
L[f(t)] = \frac{1}{\sqrt{5}}\,L\big[u(t - 4)\sin\big(\sqrt{5}\,(t - 4)\big)\big],
hence we obtain
f(t) = \frac{1}{\sqrt{5}}\,u(t - 4)\sin\big(\sqrt{5}\,(t - 4)\big).
C

Example 3.3.12. Find the function f(t) such that L[f(t)] = \frac{(s - 1)}{(s - 2)^2 + 3}.
Solution: We first rewrite the right-hand side above as follows,
L[f(t)] = \frac{(s - 2) + 1}{(s - 2)^2 + 3}
        = \frac{(s - 2)}{(s - 2)^2 + 3} + \frac{1}{(s - 2)^2 + 3}
        = \frac{(s - 2)}{(s - 2)^2 + (\sqrt{3})^2} + \frac{1}{\sqrt{3}}\,\frac{\sqrt{3}}{(s - 2)^2 + (\sqrt{3})^2}
        = L[\cos(\sqrt{3}\,t)](s - 2) + \frac{1}{\sqrt{3}}\,L[\sin(\sqrt{3}\,t)](s - 2).
But the translation identity L[f(t)](s - c) = L[e^{ct} f(t)] implies
L[f(t)] = L\big[e^{2t}\cos(\sqrt{3}\,t)\big] + \frac{1}{\sqrt{3}}\,L\big[e^{2t}\sin(\sqrt{3}\,t)\big].
So, we conclude that
f(t) = \frac{e^{2t}}{\sqrt{3}}\big[\sqrt{3}\cos(\sqrt{3}\,t) + \sin(\sqrt{3}\,t)\big].
C
Example 3.3.13. Find L^{-1}\Big[\dfrac{2e^{-3s}}{s^2 - 4}\Big].
Solution: Since L^{-1}\Big[\dfrac{a}{s^2 - a^2}\Big] = \sinh(at) and L^{-1}\big[e^{-cs}\hat f(s)\big] = u(t - c)\,f(t - c), then
L^{-1}\Big[\frac{2e^{-3s}}{s^2 - 4}\Big] = L^{-1}\Big[e^{-3s}\,\frac{2}{s^2 - 4}\Big] \quad\Rightarrow\quad L^{-1}\Big[\frac{2e^{-3s}}{s^2 - 4}\Big] = u(t - 3)\sinh\big(2(t - 3)\big).
C

Example 3.3.14. Find a function f such that L[f(t)] = \frac{e^{-2s}}{s^2 + s - 2}.
Solution: Since the right-hand side above does not appear in the Laplace transform Table in § 3.1, we need to simplify it in an appropriate way. The plan is to rewrite the denominator of the rational function 1/(s^2 + s - 2), so we can use partial fractions to simplify this rational function. We first find out whether this denominator has real or complex roots:
s_± = \frac{1}{2}\big(-1 ± \sqrt{1 + 8}\,\big) \quad\Rightarrow\quad s_+ = 1, \quad s_- = -2.
We are in the case of real roots, so we rewrite
s^2 + s - 2 = (s - 1)(s + 2).
The partial fraction decomposition in this case is given by
\frac{1}{(s - 1)(s + 2)} = \frac{a}{(s - 1)} + \frac{b}{(s + 2)} = \frac{(a + b)\,s + (2a - b)}{(s - 1)(s + 2)} \quad\Rightarrow\quad a + b = 0, \quad 2a - b = 1.
The solution is a = 1/3 and b = -1/3, so we arrive at the expression
L[f(t)] = \frac{1}{3}\,e^{-2s}\,\frac{1}{s - 1} - \frac{1}{3}\,e^{-2s}\,\frac{1}{s + 2}.
Recalling that
L[e^{at}] = \frac{1}{s - a},
and Eq. (3.3.3), we obtain the equation
L[f(t)] = \frac{1}{3}\,L\big[u(t - 2)\,e^{(t-2)}\big] - \frac{1}{3}\,L\big[u(t - 2)\,e^{-2(t-2)}\big],

which leads to the conclusion:
f(t) = \frac{1}{3}\,u(t - 2)\big[e^{(t-2)} - e^{-2(t-2)}\big].
C

3.3.4. Solving Differential Equations. The last three examples in this section show how to
use the methods presented above to solve differential equations with discontinuous source functions.

Example 3.3.15. Use the Laplace transform to find the solution of the initial value problem
y' + 2y = u(t - 4), \qquad y(0) = 3.

Solution: We compute the Laplace transform of the whole equation,
L[y'] + 2\,L[y] = L[u(t - 4)] = \frac{e^{-4s}}{s}.
From the previous section we know that
\big[s\,L[y] - y(0)\big] + 2\,L[y] = \frac{e^{-4s}}{s} \quad\Rightarrow\quad (s + 2)\,L[y] = y(0) + \frac{e^{-4s}}{s}.
We introduce the initial condition y(0) = 3 into the equation above,
L[y] = \frac{3}{(s + 2)} + e^{-4s}\,\frac{1}{s(s + 2)} \quad\Rightarrow\quad L[y] = 3\,L[e^{-2t}] + e^{-4s}\,\frac{1}{s(s + 2)}.
We need to invert the Laplace transform of the last term on the right-hand side in the equation above. We use the partial fraction decomposition on the rational function above, as follows,
\frac{1}{s(s + 2)} = \frac{a}{s} + \frac{b}{(s + 2)} = \frac{a(s + 2) + bs}{s(s + 2)} = \frac{(a + b)\,s + 2a}{s(s + 2)} \quad\Rightarrow\quad a + b = 0, \quad 2a = 1.
We conclude that a = 1/2 and b = -1/2, so
\frac{1}{s(s + 2)} = \frac{1}{2}\Big[\frac{1}{s} - \frac{1}{(s + 2)}\Big].
We then obtain
L[y] = 3\,L[e^{-2t}] + \frac{1}{2}\Big[e^{-4s}\,\frac{1}{s} - e^{-4s}\,\frac{1}{(s + 2)}\Big] = 3\,L[e^{-2t}] + \frac{1}{2}\Big(L[u(t - 4)] - L\big[u(t - 4)\,e^{-2(t-4)}\big]\Big).
Hence, we conclude that
y(t) = 3e^{-2t} + \frac{1}{2}\,u(t - 4)\big[1 - e^{-2(t-4)}\big].
C
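The partial fraction and translation steps above can be bypassed by inverting L[y] directly. This sketch is not part of the original notes; it assumes sympy.

# Sketch: inverting L[y] from Example 3.3.15.
from sympy import symbols, exp, inverse_laplace_transform, simplify

t, s = symbols('t s', positive=True)
Y = 3/(s + 2) + exp(-4*s)/(s*(s + 2))
print(simplify(inverse_laplace_transform(Y, s, t)))
# expect 3*exp(-2*t) + Heaviside(t - 4)*(1 - exp(-2*(t - 4)))/2, up to a Heaviside(t) factor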

Example 3.3.16. Use the Laplace transform to find the solution to the initial value problem
y'' + y' + \frac{5}{4}\,y = b(t), \qquad y(0) = 0, \quad y'(0) = 0, \qquad b(t) = \begin{cases} 1, & 0 ≤ t < π, \\ 0, & t ≥ π. \end{cases}    (3.3.8)

Solution: From Fig. 6, the source function b can be written as
b(t) = u(t) - u(t - π).
The last expression for b is particularly useful to find its Laplace transform,
L[b(t)] = L[u(t)] - L[u(t - π)] = \frac{1}{s} - e^{-πs}\,\frac{1}{s} \quad\Rightarrow\quad L[b(t)] = (1 - e^{-πs})\,\frac{1}{s}.
Now Laplace transform the whole equation,
L[y''] + L[y'] + \frac{5}{4}\,L[y] = L[b].

Figure 6. The graphs of u, its translation, and b as given in Eq. (3.3.8).

Since the initial conditions are y(0) = 0 and y'(0) = 0, we obtain
\Big(s^2 + s + \frac{5}{4}\Big)\,L[y] = (1 - e^{-πs})\,\frac{1}{s} \quad\Rightarrow\quad L[y] = (1 - e^{-πs})\,\frac{1}{s\big(s^2 + s + \frac{5}{4}\big)}.
Introduce the function
H(s) = \frac{1}{s\big(s^2 + s + \frac{5}{4}\big)} \quad\Rightarrow\quad y(t) = L^{-1}[H(s)] - L^{-1}[e^{-πs} H(s)].
That is, we only need to find the inverse Laplace transform of H. We use partial fractions to simplify the expression of H. We first find out whether the denominator has real or complex roots:
s^2 + s + \frac{5}{4} = 0 \quad\Rightarrow\quad s_± = \frac{1}{2}\big(-1 ± \sqrt{1 - 5}\,\big),
so the roots are complex valued. An appropriate partial fraction decomposition is
H(s) = \frac{1}{s\big(s^2 + s + \frac{5}{4}\big)} = \frac{a}{s} + \frac{(b s + c)}{\big(s^2 + s + \frac{5}{4}\big)}.
Therefore, we get
1 = a\Big(s^2 + s + \frac{5}{4}\Big) + s(b s + c) = (a + b)\,s^2 + (a + c)\,s + \frac{5}{4}\,a.
This equation implies that a, b, and c satisfy the equations
a + b = 0, \qquad a + c = 0, \qquad \frac{5}{4}\,a = 1.
The solution is a = \frac{4}{5}, b = -\frac{4}{5}, c = -\frac{4}{5}. Hence, we have found that
H(s) = \frac{1}{s\big(s^2 + s + \frac{5}{4}\big)} = \frac{4}{5}\Big[\frac{1}{s} - \frac{(s + 1)}{\big(s^2 + s + \frac{5}{4}\big)}\Big].
Complete the square in the denominator,
s^2 + s + \frac{5}{4} = \Big[s^2 + 2\Big(\frac{1}{2}\Big)s + \frac{1}{4}\Big] - \frac{1}{4} + \frac{5}{4} = \Big(s + \frac{1}{2}\Big)^2 + 1.
Replace this expression in the definition of H, that is,
H(s) = \frac{4}{5}\Big[\frac{1}{s} - \frac{(s + 1)}{\big(s + \frac{1}{2}\big)^2 + 1}\Big].
Rewrite the polynomial in the numerator,
(s + 1) = \Big(s + \frac{1}{2}\Big) + \frac{1}{2},
hence we get
H(s) = \frac{4}{5}\Big[\frac{1}{s} - \frac{\big(s + \frac{1}{2}\big)}{\big(s + \frac{1}{2}\big)^2 + 1} - \frac{1}{2}\,\frac{1}{\big(s + \frac{1}{2}\big)^2 + 1}\Big].
Use the Laplace transform table to see that H(s) equals
H(s) = \frac{4}{5}\Big[L[1] - L\big[e^{-t/2}\cos(t)\big] - \frac{1}{2}\,L\big[e^{-t/2}\sin(t)\big]\Big],
equivalently,
H(s) = L\Big[\frac{4}{5}\Big(1 - e^{-t/2}\cos(t) - \frac{1}{2}\,e^{-t/2}\sin(t)\Big)\Big].
Denote
h(t) = \frac{4}{5}\Big[1 - e^{-t/2}\cos(t) - \frac{1}{2}\,e^{-t/2}\sin(t)\Big] \quad\Rightarrow\quad H(s) = L[h(t)].
Recalling L[y(t)] = H(s) - e^{-πs} H(s), we obtain L[y(t)] = L[h(t)] - e^{-πs} L[h(t)], that is,
y(t) = h(t) - u(t - π)\,h(t - π).
C
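A quick way to check the function h found above is to verify that it solves the constant source problem with zero initial data; the box source solution then follows from the translation identity exactly as in the text. This sketch is not part of the original notes; it assumes sympy.

# Sketch: checking h for Example 3.3.16.
from sympy import symbols, Function, Eq, exp, sin, cos, Rational, checkodesol

t = symbols('t')
y = Function('y')
h = Rational(4, 5)*(1 - exp(-t/2)*cos(t) - exp(-t/2)*sin(t)/2)
ode = Eq(y(t).diff(t, 2) + y(t).diff(t) + Rational(5, 4)*y(t), 1)
print(checkodesol(ode, Eq(y(t), h)))       # expect (True, 0)
print(h.subs(t, 0), h.diff(t).subs(t, 0))  # expect 0 0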

Example 3.3.17. Use the Laplace transform to find the solution to the initial value problem
y'' + y' + \frac{5}{4}\,y = g(t), \qquad y(0) = 0, \quad y'(0) = 0, \qquad g(t) = \begin{cases} \sin(t), & 0 ≤ t < π, \\ 0, & t ≥ π. \end{cases}    (3.3.9)

Solution: From Fig. 7, the source function g can be written as the following product,
g(t) = \big[u(t) - u(t - π)\big]\sin(t),
since u(t) - u(t - π) is a box function, taking value one in the interval [0, π] and zero on the complement. Finally, notice that the identity \sin(t) = -\sin(t - π) implies that the function g can be expressed as follows,
g(t) = u(t)\sin(t) - u(t - π)\sin(t) \quad\Rightarrow\quad g(t) = u(t)\sin(t) + u(t - π)\sin(t - π).
The last expression for g is particularly useful to find its Laplace transform,

Figure 7. The graph of the sine function, a box function u(t) - u(t - π), and the source function g given in Eq. (3.3.9).

L[g(t)] = \frac{1}{(s^2 + 1)} + e^{-πs}\,\frac{1}{(s^2 + 1)}.
With this last transform it is not difficult to solve the differential equation. As usual, Laplace transform the whole equation,
L[y''] + L[y'] + \frac{5}{4}\,L[y] = L[g].
Since the initial conditions are y(0) = 0 and y'(0) = 0, we obtain
\Big(s^2 + s + \frac{5}{4}\Big)\,L[y] = (1 + e^{-πs})\,\frac{1}{(s^2 + 1)} \quad\Rightarrow\quad L[y] = (1 + e^{-πs})\,\frac{1}{\big(s^2 + s + \frac{5}{4}\big)(s^2 + 1)}.
Introduce the function
H(s) = \frac{1}{\big(s^2 + s + \frac{5}{4}\big)(s^2 + 1)} \quad\Rightarrow\quad y(t) = L^{-1}[H(s)] + L^{-1}[e^{-πs} H(s)].

That is, we only need to find the inverse Laplace transform of H. We use partial fractions to simplify the expression of H. We first find out whether the denominator has real or complex roots:
s^2 + s + \frac{5}{4} = 0 \quad\Rightarrow\quad s_± = \frac{1}{2}\big(-1 ± \sqrt{1 - 5}\,\big),
so the roots are complex valued. An appropriate partial fraction decomposition is
H(s) = \frac{1}{\big(s^2 + s + \frac{5}{4}\big)(s^2 + 1)} = \frac{(a s + b)}{\big(s^2 + s + \frac{5}{4}\big)} + \frac{(c s + d)}{(s^2 + 1)}.
Therefore, we get
1 = (a s + b)(s^2 + 1) + (c s + d)\Big(s^2 + s + \frac{5}{4}\Big),
equivalently,
1 = (a + c)\,s^3 + (b + c + d)\,s^2 + \Big(a + \frac{5}{4}\,c + d\Big)\,s + \Big(b + \frac{5}{4}\,d\Big).
This equation implies that a, b, c, and d are solutions of
a + c = 0, \qquad b + c + d = 0, \qquad a + \frac{5}{4}\,c + d = 0, \qquad b + \frac{5}{4}\,d = 1.
Here is the solution to this system:
a = \frac{16}{17}, \qquad b = \frac{12}{17}, \qquad c = -\frac{16}{17}, \qquad d = \frac{4}{17}.
We have found that
H(s) = \frac{4}{17}\Big[\frac{(4s + 3)}{\big(s^2 + s + \frac{5}{4}\big)} + \frac{(-4s + 1)}{(s^2 + 1)}\Big].
Complete the square in the denominator,
s^2 + s + \frac{5}{4} = \Big[s^2 + 2\Big(\frac{1}{2}\Big)s + \frac{1}{4}\Big] - \frac{1}{4} + \frac{5}{4} = \Big(s + \frac{1}{2}\Big)^2 + 1,
so that
H(s) = \frac{4}{17}\Big[\frac{(4s + 3)}{\big(s + \frac{1}{2}\big)^2 + 1} + \frac{(-4s + 1)}{(s^2 + 1)}\Big].
Rewrite the polynomial in the numerator,
(4s + 3) = 4\Big(s + \frac{1}{2} - \frac{1}{2}\Big) + 3 = 4\Big(s + \frac{1}{2}\Big) + 1,
hence we get
H(s) = \frac{4}{17}\Big[4\,\frac{\big(s + \frac{1}{2}\big)}{\big(s + \frac{1}{2}\big)^2 + 1} + \frac{1}{\big(s + \frac{1}{2}\big)^2 + 1} - 4\,\frac{s}{(s^2 + 1)} + \frac{1}{(s^2 + 1)}\Big].
Use the Laplace transform Table 1 to see that H(s) equals
H(s) = \frac{4}{17}\Big[4\,L\big[e^{-t/2}\cos(t)\big] + L\big[e^{-t/2}\sin(t)\big] - 4\,L[\cos(t)] + L[\sin(t)]\Big],
equivalently,
H(s) = L\Big[\frac{4}{17}\Big(4e^{-t/2}\cos(t) + e^{-t/2}\sin(t) - 4\cos(t) + \sin(t)\Big)\Big].
Denote
h(t) = \frac{4}{17}\Big[4e^{-t/2}\cos(t) + e^{-t/2}\sin(t) - 4\cos(t) + \sin(t)\Big] \quad\Rightarrow\quad H(s) = L[h(t)].
Recalling L[y(t)] = H(s) + e^{-πs} H(s), we obtain L[y(t)] = L[h(t)] + e^{-πs} L[h(t)], that is,
y(t) = h(t) + u(t - π)\,h(t - π).
C

3.3.5. Exercises.

3.3.1.- . 3.3.2.- .

3.4. Generalized Sources


We introduce a generalized function—the Dirac delta. We define the Dirac delta as a limit n → ∞
of a particular sequence of functions, {δn }. We will see that this limit is a function on the domain
R − {0}, but it is not a function on R. For that reason we call this limit a generalized function—the
Dirac delta generalized function.
We will show that each element in the sequence {δn } has a Laplace transform, and this sequence
of Laplace transforms {L[δn ]} has a limit as n → ∞. We use this limit of Laplace transforms to
define the Laplace transform of the Dirac delta.
We will solve differential equations having the Dirac delta generalized function as a source. Such differential equations appear often when one describes physical systems with impulsive forces, that is, forces acting over a very short time but transferring a finite momentum to the system. Dirac's delta is tailored to model impulsive forces.
3.4.1. Sequence of Functions and the Dirac Delta. A sequence of functions is a sequence
whose elements are functions. If each element in the sequence is a continuous function, we say that
this is a sequence of continuous functions. Given a sequence of functions {yn }, we compute the
limn→∞ yn (t) for a fixed t. The limit depends on t, so it is a function of t, and we write it as
lim yn (t) = y(t).
n→∞

The domain of the limit function y is smaller or equal to the domain of the yn . The limit of a
sequence of continuous functions may or may not be a continuous function.

Example 3.4.1. The limit of the sequence below is a continuous function,
f_n(t) = \sin\Big(\Big(1 + \frac{1}{n}\Big)t\Big) \to \sin(t) \quad as \; n → ∞.
As usual in this section, the limit is computed for each fixed value of t.    C

However, not every sequence of continuous functions has a continuous function as a limit.

Example 3.4.2. Consider now the following sequence, {u_n}, for n ≥ 1,
u_n(t) = \begin{cases} 0, & t < 0, \\ nt, & 0 ≤ t ≤ \frac{1}{n}, \\ 1, & t > \frac{1}{n}. \end{cases}    (3.4.1)
This is a sequence of continuous functions whose limit is a discontinuous function. From the few graphs in Fig. 8 we can see that the limit n → ∞ of the sequence above is a step function; indeed, \lim_{n→∞} u_n(t) = \tilde u(t), where
\tilde u(t) = \begin{cases} 0 & for t ≤ 0, \\ 1 & for t > 0. \end{cases}
We used a tilde in the name \tilde u because this step function is not the same we defined in the previous section. The step u in § 3.3 satisfied u(0) = 1.

Figure 8. A few functions in the sequence {u_n}.
C

Exercise: Find a sequence {un } so that its limit is the step function u defined in § 3.3.
Although every function in the sequence {un } is continuous, the limit ũ is a discontinuous
function. It is not difficult to see that one can construct sequences of continuous functions having
no limit at all. A similar situation happens when one considers sequences of piecewise discontinuous

functions. In this case the limit could be a continuous function, a piecewise discontinuous function,
or not a function at all.
We now introduce a particular sequence of piecewise discontinuous functions with domain R
such that the limit as n → ∞ does not exist for all values of the independent variable t. The limit
of the sequence is not a function with domain R. In this case, the limit is a new type of object that
we will call Dirac’s delta generalized function. Dirac’s delta is the limit of a sequence of particular
bump functions.

Definition 3.4.1. The Dirac delta generalized function is the limit
δ(t) = \lim_{n→∞} δ_n(t),
for every fixed t ∈ R, of the sequence of functions {δ_n}_{n=1}^∞,
δ_n(t) = n\Big[u(t) - u\Big(t - \frac{1}{n}\Big)\Big].    (3.4.2)

The sequence of bump functions introduced above can be rewritten as follows,
δ_n(t) = \begin{cases} 0, & t < 0, \\ n, & 0 ≤ t < \frac{1}{n}, \\ 0, & t ≥ \frac{1}{n}. \end{cases}
We then obtain the equivalent expression,
δ(t) = \begin{cases} 0 & for t ≠ 0, \\ ∞ & for t = 0. \end{cases}
Remark: It can be shown that there exist infinitely many sequences {\tilde δ_n} such that their limit as n → ∞ is Dirac's delta. For example, another sequence is
\tilde δ_n(t) = n\Big[u\Big(t + \frac{1}{2n}\Big) - u\Big(t - \frac{1}{2n}\Big)\Big] = \begin{cases} 0, & t < -\frac{1}{2n}, \\ n, & -\frac{1}{2n} ≤ t ≤ \frac{1}{2n}, \\ 0, & t > \frac{1}{2n}. \end{cases}

Figure 9. A few functions in the sequence {δ_n}.

The Dirac delta generalized function is the function identically zero on the domain R − {0}.
Dirac’s delta is not defined at t = 0, since the limit diverges at that point. If we shift each element
in the sequence by a real number c, then we define
δ(t − c) = lim δn (t − c), c ∈ R.
n→∞

This shifted Dirac’s delta is identically zero on R − {c} and diverges at t = c. If we shift the graphs
given in Fig. 9 by any real number c, one can see that
Z c+1
δn (t − c) dt = 1
c

for every n > 1. Therefore, the sequence of integrals is the constant sequence, {1, 1, · · · }, which
has a trivial limit, 1, as n → ∞. This says that the divergence at t = c of the sequence {δn } is of
a very particular type. The area below the graph of the sequence elements is always the same. We
3.4. GENERALIZED SOURCES 165

can say that this property of the sequence provides the main defining property of the Dirac delta
generalized function.
Using a limit procedure one can generalize several operations from a sequence to its limit.
For example, translations, linear combinations, and multiplications of a function by a generalized
function, integration and Laplace transforms.

Definition 3.4.2. We introduce the following operations on the Dirac delta:
f(t)\,δ(t - c) + g(t)\,δ(t - c) = \lim_{n→∞}\big[f(t)\,δ_n(t - c) + g(t)\,δ_n(t - c)\big],
\int_a^b δ(t - c)\,dt = \lim_{n→∞}\int_a^b δ_n(t - c)\,dt,
L[δ(t - c)] = \lim_{n→∞} L[δ_n(t - c)].

Remark: The notation in the definitions above could be misleading. On the left-hand sides above we use the same notation as we use on functions, although Dirac's delta is not a function on R. Take the integral, for example. When we integrate a function f, the integration symbol means "take a limit of Riemann sums", that is,
\int_a^b f(t)\,dt = \lim_{n→∞}\sum_{i=0}^{n} f(x_i)\,Δx, \qquad x_i = a + i\,Δx, \quad Δx = \frac{b - a}{n}.
However, when f is a generalized function in the sense of a limit of a sequence of functions {f_n}, then by the integration symbol we mean to compute a different limit,
\int_a^b f(t)\,dt = \lim_{n→∞}\int_a^b f_n(t)\,dt.
We use the same symbol, the integration, to mean two different things, depending on whether we integrate a function or a generalized function. This remark also holds for all the operations we introduce on generalized functions, especially the Laplace transform, which will be used often in the rest of this section.

3.4.2. Computations with the Dirac Delta. Once we have the definitions of operations involving the Dirac delta, we can actually compute these limits. The following statement summarizes a few interesting results. The first formula below says that the infinity we found in the definition of Dirac's delta is of a very particular type; that infinity is such that Dirac's delta is integrable, in the sense defined above, with integral equal to one.

Theorem 3.4.3. For every c ∈ R and ε > 0 holds \int_{c-ε}^{c+ε} δ(t - c)\,dt = 1.

Proof of Theorem 3.4.3: The integral of a Dirac's delta generalized function is computed as a limit of integrals,
\int_{c-ε}^{c+ε} δ(t - c)\,dt = \lim_{n→∞}\int_{c-ε}^{c+ε} δ_n(t - c)\,dt.
If we choose n > 1/ε, equivalently 1/n < ε, then the domain of the functions in the sequence is inside the interval (c - ε, c + ε), and we can write
\int_{c-ε}^{c+ε} δ(t - c)\,dt = \lim_{n→∞}\int_c^{c+\frac{1}{n}} n\,dt, \qquad for \; \frac{1}{n} < ε.
Then it is simple to compute
\int_{c-ε}^{c+ε} δ(t - c)\,dt = \lim_{n→∞} n\Big(c + \frac{1}{n} - c\Big) = \lim_{n→∞} 1 = 1.
This establishes the Theorem. □



The next result is also deeply related to the defining property of the Dirac delta: the functions in the sequence all have graphs of unit area.

Theorem 3.4.4. If f is continuous on (a, b) and c ∈ (a, b), then \int_a^b f(t)\,δ(t - c)\,dt = f(c).

Proof of Theorem 3.4.4: We again compute the integral of a Dirac's delta as a limit of a sequence of integrals,
\int_a^b δ(t - c)\,f(t)\,dt = \lim_{n→∞}\int_a^b δ_n(t - c)\,f(t)\,dt
                           = \lim_{n→∞}\int_a^b n\Big[u(t - c) - u\Big(t - c - \frac{1}{n}\Big)\Big] f(t)\,dt
                           = \lim_{n→∞} n\int_c^{c+\frac{1}{n}} f(t)\,dt, \qquad \frac{1}{n} < (b - c).
To get the last line we used that c ∈ [a, b]. Let F be any primitive of f, so F(t) = \int f(t)\,dt. Then we can write,
\int_a^b δ(t - c)\,f(t)\,dt = \lim_{n→∞} n\Big[F\Big(c + \frac{1}{n}\Big) - F(c)\Big] = \lim_{n→∞}\frac{1}{\big(\frac{1}{n}\big)}\Big[F\Big(c + \frac{1}{n}\Big) - F(c)\Big] = F'(c) = f(c).
This establishes the Theorem. □
In our next result we compute the Laplace transform of the Dirac delta. We give two proofs of this result. In the first proof we use the previous theorem. In the second proof we use the same idea used to prove the previous theorem.

Theorem 3.4.5. For all s ∈ R holds
L[δ(t - c)] = \begin{cases} e^{-cs} & for c ≥ 0, \\ 0 & for c < 0. \end{cases}

First Proof of Theorem 3.4.5: We use the previous theorem on the integral that defines a Laplace transform. Although the previous theorem applies to definite integrals, not to improper integrals, it can be extended to cover improper integrals. In this case we get
L[δ(t - c)] = \int_0^\infty e^{-st}\,δ(t - c)\,dt = \begin{cases} e^{-cs} & for c ≥ 0, \\ 0 & for c < 0. \end{cases}
This establishes the Theorem. □
Second Proof of Theorem 3.4.5: The Laplace transform of a Dirac's delta is computed as a limit of Laplace transforms,
L[δ(t - c)] = \lim_{n→∞} L[δ_n(t - c)] = \lim_{n→∞} L\Big[n\Big(u(t - c) - u\Big(t - c - \frac{1}{n}\Big)\Big)\Big] = \lim_{n→∞}\int_0^\infty n\Big[u(t - c) - u\Big(t - c - \frac{1}{n}\Big)\Big] e^{-st}\,dt.
The case c < 0 is simple. For \frac{1}{n} < |c| holds
L[δ(t - c)] = \lim_{n→∞}\int_0^\infty 0\,dt \quad\Rightarrow\quad L[δ(t - c)] = 0, \qquad for \; s ∈ R, \; c < 0.

Consider now the case c > 0. We then have,
L[δ(t - c)] = \lim_{n→∞}\int_c^{c+\frac{1}{n}} n\,e^{-st}\,dt.
For s = 0 we get
L[δ(t - c)] = \lim_{n→∞}\int_c^{c+\frac{1}{n}} n\,dt = 1 \quad\Rightarrow\quad L[δ(t - c)] = 1 \quad for \; s = 0, \; c > 0.
In the case that s ≠ 0 we get
L[δ(t - c)] = \lim_{n→∞}\int_c^{c+\frac{1}{n}} n\,e^{-st}\,dt = \lim_{n→∞}\Big[-\frac{n}{s}\big(e^{-(c+\frac{1}{n})s} - e^{-cs}\big)\Big] = e^{-cs}\lim_{n→∞}\frac{\big(1 - e^{-\frac{s}{n}}\big)}{\frac{s}{n}}.
The limit on the last line above is a singular limit of the form \frac{0}{0}, so we can use the l'Hôpital rule to compute it, that is,
\lim_{n→∞}\frac{\big(1 - e^{-\frac{s}{n}}\big)}{\frac{s}{n}} = \lim_{n→∞}\frac{-\frac{s}{n^2}\,e^{-\frac{s}{n}}}{-\frac{s}{n^2}} = \lim_{n→∞} e^{-\frac{s}{n}} = 1.
We then obtain,
L[δ(t - c)] = e^{-cs} \quad for \; s ≠ 0, \; c > 0.
This establishes the Theorem. □

3.4.3. Applications of the Dirac Delta. Dirac's delta generalized functions describe impulsive forces in mechanical systems, such as the force exerted by a stick hitting a marble. An impulsive force acts over an infinitely short time and transmits a finite momentum to the system.

Example 3.4.3. Use Newton's equation of motion and Dirac's delta to describe the change of momentum when a particle is hit by a hammer.
Solution: A point particle with mass m, moving in one space direction, x, with a force F acting on it is described by
m a = F \quad\Leftrightarrow\quad m x''(t) = F(t, x(t)),
where x(t) is the particle position as a function of time, a(t) = x''(t) is the particle acceleration, and we will denote v(t) = x'(t) the particle velocity. We saw earlier that Newton's second law of motion is a second order differential equation for the position function x. Now it is more convenient to use the particle momentum, p = mv, to write Newton's equation,
m x'' = m v' = (m v)' = F \quad\Rightarrow\quad p' = F.
So the force F changes the momentum p. If we integrate on an interval [t_1, t_2] we get
Δp = p(t_2) - p(t_1) = \int_{t_1}^{t_2} F(t, x(t))\,dt.
Suppose that an impulsive force is acting on a particle at t_0, transmitting a finite momentum, say p_0. This is what the Dirac delta is useful for, because we can write the force as
F(t) = p_0\,δ(t - t_0),
then F = 0 on R − {t_0} and the momentum transferred to the particle by the force is
Δp = \int_{t_0-Δt}^{t_0+Δt} p_0\,δ(t - t_0)\,dt = p_0.
The momentum transferred is Δp = p_0, but the force is identically zero on R − {t_0}. We have transferred a finite momentum to the particle by an interaction at a single time t_0.    C

3.4.4. The Impulse Response Function. We now want to solve differential equations with
the Dirac delta as a source. But there is a particular type of solutions that will be important later
on—solutions to initial value problems with the Dirac delta source and zero initial conditions. We
give these solutions a particular name.

Definition 3.4.6. The impulse response function at the point c > 0 of the constant coefficients linear operator L(y) = y'' + a_1 y' + a_0 y is the solution y_δ of
L(y_δ) = δ(t - c), \qquad y_δ(0) = 0, \quad y_δ'(0) = 0.

Remark: Impulse response functions are also called fundamental solutions.

Theorem 3.4.7. The function y_δ is the impulse response function at c > 0 of the constant coefficients operator L(y) = y'' + a_1 y' + a_0 y iff holds
y_δ = L^{-1}\Big[\frac{e^{-cs}}{p(s)}\Big],
where p is the characteristic polynomial of L.

Remark: The impulse response function y_δ at c = 0 satisfies
y_δ = L^{-1}\Big[\frac{1}{p(s)}\Big].

Proof of Theorem 3.4.7: Compute the Laplace transform of the differential equation for the impulse response function y_δ,
L[y''] + a_1\,L[y'] + a_0\,L[y] = L[δ(t - c)] = e^{-cs}.
Since the initial data for y_δ are trivial, we get
(s^2 + a_1 s + a_0)\,L[y] = e^{-cs}.
Since p(s) = s^2 + a_1 s + a_0 is the characteristic polynomial of L, we get
L[y] = \frac{e^{-cs}}{p(s)} \quad\Leftrightarrow\quad y(t) = L^{-1}\Big[\frac{e^{-cs}}{p(s)}\Big].
All the steps in this calculation are "if and only if"s. This establishes the Theorem. □

Example 3.4.4. Find the impulse response function at t = 0 of the linear operator
L(y) = y 00 + 2y 0 + 2y.

Solution: We need to find the solution yδ of the initial value problem


yδ'' + 2yδ' + 2yδ = δ(t),  yδ(0) = 0,  yδ'(0) = 0.
Since the source is a Dirac delta, we have to use the Laplace transform to solve this problem. So we
compute the Laplace transform on both sides of the differential equation,
L[yδ''] + 2 L[yδ'] + 2 L[yδ] = L[δ(t)] = 1  ⇒  (s² + 2s + 2) L[yδ] = 1,
where we have introduced the initial conditions in the last equation above. So we obtain
L[yδ] = 1/(s² + 2s + 2).
The denominator in the equation above has complex valued roots, since
s± = (−2 ± √(4 − 8))/2,
therefore we complete the square, s² + 2s + 2 = (s + 1)² + 1. We need to solve the equation
L[yδ] = 1/((s + 1)² + 1) = L[e^{−t} sin(t)]  ⇒  yδ(t) = e^{−t} sin(t).
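Remark: The impulse response function just found can be checked numerically by replacing the Dirac
delta with one of the bump functions δn used to define it. The following Python sketch (an illustration,
not part of the text; the bump height n, the time grid, and the solver step size are arbitrary choices)
solves the regularized problem yδ'' + 2yδ' + 2yδ = δn(t) with SciPy and compares the result with e^{−t} sin(t).

```python
# Minimal numerical check (a sketch, not from the text): approximate the Dirac
# delta by the bump delta_n(t) = n*[u(t) - u(t - 1/n)] and solve
#   y'' + 2 y' + 2 y = delta_n(t),  y(0) = 0, y'(0) = 0,
# comparing the result with the impulse response y_delta(t) = e^{-t} sin(t).
import numpy as np
from scipy.integrate import solve_ivp

n = 200.0                                   # bump height; bump width is 1/n

def delta_n(t):
    return n if 0.0 <= t <= 1.0 / n else 0.0

def rhs(t, z):
    y, v = z                                # z = (y, y')
    return [v, delta_n(t) - 2.0 * v - 2.0 * y]

t_eval = np.linspace(0.0, 10.0, 500)
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0], t_eval=t_eval, max_step=1e-3)

exact = np.exp(-t_eval) * np.sin(t_eval)    # impulse response from the text
print("max error:", np.max(np.abs(sol.y[0] - exact)))   # small for large n
```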

Example 3.4.5. Find the impulse response function at t = c > 0 of the linear operator
L(y) = y 00 + 2 y 0 + 2 y.

Solution: We need to find the solution yδ of the initial value problem


yδ'' + 2 yδ' + 2 yδ = δ(t − c),  yδ(0) = 0,  yδ'(0) = 0.
We have to use the Laplace transform to solve this problem because the source is a Dirac delta
generalized function. So, compute the Laplace transform of the differential equation,
L[yδ''] + 2 L[yδ'] + 2 L[yδ] = L[δ(t − c)].
Since the initial conditions are all zero and c > 0, we get
(s² + 2s + 2) L[yδ] = e^{−cs}  ⇒  L[yδ] = e^{−cs}/(s² + 2s + 2).
Find the roots of the denominator,
s² + 2s + 2 = 0  ⇒  s± = (−2 ± √(4 − 8))/2.
The denominator has complex roots. Then, it is convenient to complete the square in the denominator,
s² + 2s + 2 = [s² + 2s + 1] − 1 + 2 = (s + 1)² + 1.
Therefore, we obtain the expression
L[yδ] = e^{−cs}/((s + 1)² + 1).
Recall that L[sin(t)] = 1/(s² + 1), and L[f](s − c) = L[e^{ct} f(t)]. Then,
1/((s + 1)² + 1) = L[e^{−t} sin(t)]  ⇒  L[yδ] = e^{−cs} L[e^{−t} sin(t)].
Since for c > 0 we have e^{−cs} L[f](s) = L[u(t − c) f(t − c)], we conclude that
yδ(t) = u(t − c) e^{−(t−c)} sin(t − c).
C

Example 3.4.6. Find the solution y to the initial value problem


y'' − y = −20 δ(t − 3),  y(0) = 1,  y'(0) = 0.

Solution: The source is a generalized function, so we need to solve this problem using the Laplace
transform. So we compute the Laplace transform of the differential equation,
L[y''] − L[y] = −20 L[δ(t − 3)]  ⇒  (s² − 1) L[y] − s = −20 e^{−3s},
where in the second equation we have already introduced the initial conditions. We arrive at the
equation
L[y] = s/(s² − 1) − 20 e^{−3s} · 1/(s² − 1) = L[cosh(t)] − 20 L[u(t − 3) sinh(t − 3)],
which leads to the solution
y(t) = cosh(t) − 20 u(t − 3) sinh(t − 3).
C
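Remark: The inverse transform computed above can also be checked with a computer algebra system.
The following sketch (an illustration, not part of the text) uses SymPy's inverse_laplace_transform to
invert L[y] = s/(s² − 1) − 20 e^{−3s}/(s² − 1); the result should match the solution above for t > 0.

```python
# A sketch (not from the text): invert the transform found in Example 3.4.6 with
# SymPy and inspect the result. It should agree with
#   y(t) = cosh(t) - 20 u(t-3) sinh(t-3)  for t > 0.
import sympy as sp

s, t = sp.symbols('s t', positive=True)
Y = s / (s**2 - 1) - 20 * sp.exp(-3 * s) / (s**2 - 1)

y = sp.inverse_laplace_transform(Y, s, t)
print(sp.simplify(y))
# Expected (up to Heaviside factors that equal 1 for t > 0):
#   cosh(t) - 20*Heaviside(t - 3)*sinh(t - 3)
```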

Example 3.4.7. Find the solution to the initial value problem


y 00 + 4y = δ(t − π) − δ(t − 2π), y(0) = 0, y 0 (0) = 0.

Solution: We again Laplace transform both sides of the differential equation,


L[y''] + 4 L[y] = L[δ(t − π)] − L[δ(t − 2π)]  ⇒  (s² + 4) L[y] = e^{−πs} − e^{−2πs},
where in the second equation above we have introduced the initial conditions. Then,
L[y] = e^{−πs}/(s² + 4) − e^{−2πs}/(s² + 4)
     = (1/2) e^{−πs} · 2/(s² + 4) − (1/2) e^{−2πs} · 2/(s² + 4)
     = (1/2) L[u(t − π) sin(2(t − π))] − (1/2) L[u(t − 2π) sin(2(t − 2π))].
The last equation can be rewritten as follows,
y(t) = (1/2) u(t − π) sin(2(t − π)) − (1/2) u(t − 2π) sin(2(t − 2π)),
which leads to the conclusion that
y(t) = (1/2) [u(t − π) − u(t − 2π)] sin(2t).
C

3.4.5. Comments on Generalized Sources. We have used the Laplace transform to solve
differential equations with the Dirac delta as a source function. It may be convenient to understand
a bit more clearly what we have done, since the Dirac delta is not an ordinary function but a
generalized function defined by a limit. Consider the following example.

Example 3.4.8. Find the impulse response function at t = c > 0 of the linear operator
L(y) = y 0 .

Solution: We need to solve the initial value problem


y 0 (t) = δ(t − c), y(0) = 0.
In other words, we need to find a primitive of the Dirac delta. However, Dirac’s delta is not even a
function. Anyway, let us compute the Laplace transform of the equation, as we did in the previous
examples,
L[y'(t)] = L[δ(t − c)]  ⇒  s L[y(t)] − y(0) = e^{−cs}  ⇒  L[y(t)] = e^{−cs}/s.
But we know that
e^{−cs}/s = L[u(t − c)]  ⇒  L[y(t)] = L[u(t − c)]  ⇒  y(t) = u(t − c).
C

Looking at the differential equation y'(t) = δ(t − c) and at the solution y(t) = u(t − c) one would
like to write them together as
u'(t − c) = δ(t − c). (3.4.3)
But this is not correct, because the step function is a discontinuous function at t = c, hence not
differentiable. What we have done is something different. We have found a sequence of functions
un with the properties,
lim un (t − c) = u(t − c), lim u0n (t − c) = δ(t − c),
n→∞ n→∞

and we have called y(t) = u(t−c). This is what we actually do when we solve a differential equation
with a source defined as a limit of a sequence of functions, such as the Dirac delta. The Laplace
transform method used on differential equations with generalized sources allows us to solve these

equations without the need to write any sequence, which are hidden in the definitions of the Laplace
transform of generalized functions. Let us solve the problem in the Example 3.4.8 one more time,
but this time let us show where all the sequences actually are.

Example 3.4.9. Find the solution to the initial value problem


y 0 (t) = δ(t − c), y(0) = 0, c > 0, (3.4.4)

Solution: Recall that the Dirac delta is defined as a limit of a sequence of bump functions,
δ(t − c) = lim_{n→∞} δn(t − c),   δn(t − c) = n [u(t − c) − u(t − c − 1/n)],   n = 1, 2, · · · .
The problem we are actually solving involves a sequence and a limit,
y 0 (t) = lim δn (t − c), y(0) = 0.
n→∞

We start computing the Laplace transform of the differential equation,


L[y 0 (t)] = L[ lim δn (t − c)].
n→∞

We have defined the Laplace transform of the limit as the limit of the Laplace transforms,
L[y 0 (t)] = lim L[δn (t − c)].
n→∞

If the solution is at least piecewise differentiable, we can use the property


L[y'(t)] = s L[y(t)] − y(0).
Assuming that property, and the initial condition y(0) = 0, we get
L[y(t)] = (1/s) lim_{n→∞} L[δn(t − c)]  ⇒  L[y(t)] = lim_{n→∞} L[δn(t − c)]/s.
Introduce now the function yn (t) = un (t − c), given in Eq. (3.4.1), which for each n is the only
continuous, piecewise differentiable, solution of the initial value problem
yn'(t) = δn(t − c),  yn(0) = 0.
It is not hard to see that this function un satisfies
L[un(t)] = L[δn(t − c)]/s.
Therefore, using this formula back in the equation for y we get,
L[y(t)] = lim L[un (t)].
n→∞

For continuous functions we can interchange the Laplace transform and the limit,
L[y(t)] = L[ lim un (t)].
n→∞

So we get the result,


y(t) = lim un (t) ⇒ y(t) = u(t − c).
n→∞

We see above that we have found something more than just y(t) = u(t − c). We have found
y(t) = lim un (t − c),
n→∞

where the sequence elements un are continuous functions with un (0) = 0 and
lim un (t − c) = u(t − c), lim u0n (t − c) = δ(t − c),
n→∞ n→∞

Finally, derivatives and limits cannot be interchanged for un,
lim_{n→∞} u'n(t − c) ≠ (lim_{n→∞} un(t − c))',
so it makes no sense to talk about y'. C
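Remark: The sequences hidden in this calculation are easy to visualize numerically. The sketch below
(an illustration, not part of the text; the grid and the values of n are arbitrary choices) builds the
bumps δn(t − c) and their primitives un, and shows that each δn integrates to one while un approaches
the step u(t − c).

```python
# A sketch (not from the text): the bump sequence delta_n(t-c) = n*[u(t-c)-u(t-c-1/n)]
# and its primitive u_n.  As n grows, u_n approaches the step u(t-c) while each
# delta_n still integrates to (approximately) 1.
import numpy as np

c = 1.0
t = np.linspace(0.0, 3.0, 3001)
dt = t[1] - t[0]

for n in (2, 10, 100):
    delta_n = n * ((t >= c) & (t < c + 1.0 / n)).astype(float)
    u_n = np.cumsum(delta_n) * dt            # primitive of delta_n, with u_n(0) = 0
    print(n, np.trapz(delta_n, t), u_n[-1])  # area stays ~1, and u_n(3) -> 1
```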

When the Dirac delta is defined by a sequence of functions, as we did in this section, the
calculation needed to find impulse response functions must involve sequence of functions and limits.
The Laplace transform method used on generalized functions allows us to hide all the sequences
and limits. This is true not only for the derivative operator L(y) = y 0 but for any second order
differential operator with constant coefficients.

Definition 3.4.8. A solution of the initial value problem with a Dirac’s delta source
y'' + a1 y' + a0 y = δ(t − c),  y(0) = y0,  y'(0) = y1, (3.4.5)
where a1, a0, y0, y1, and c ∈ R are given constants, is a function
y(t) = lim_{n→∞} yn(t),
where the functions yn, with n ≥ 1, are the unique solutions to the initial value problems
yn'' + a1 yn' + a0 yn = δn(t − c),  yn(0) = y0,  yn'(0) = y1, (3.4.6)
and the sources δn satisfy lim_{n→∞} δn(t − c) = δ(t − c).

The definition above makes clear what we mean by a solution to an initial value problem having
a generalized function as source, when the generalized function is defined as the limit of a sequence
of functions. The following result says that the Laplace transform method used with generalized
functions hides all the sequence computations.

Theorem 3.4.9. The function y is solution of the initial value problem


y'' + a1 y' + a0 y = δ(t − c),  y(0) = y0,  y'(0) = y1,  c > 0,
iff its Laplace transform satisfies the equation
(s² L[y] − s y0 − y1) + a1 (s L[y] − y0) + a0 L[y] = e^{−cs}.

This Theorem tells us that to find the solution y to an initial value problem when the source is
a Dirac’s delta we have to apply the Laplace transform to the equation and perform the same
calculations as if the Dirac delta were a function. This is the calculation we did when we computed
the impulse response functions.
Proof of Theorem 3.4.9: Compute the Laplace transform on Eq. (3.4.6),
L[yn''] + a1 L[yn'] + a0 L[yn] = L[δn(t − c)].
Recall the relations between the Laplace transform and derivatives and use the initial conditions,
L[yn''] = s² L[yn] − s y0 − y1,   L[yn'] = s L[yn] − y0,
and use these relations in the differential equation,
(s² + a1 s + a0) L[yn] − s y0 − y1 − a1 y0 = L[δn(t − c)].
Since δn satisfies lim_{n→∞} δn(t − c) = δ(t − c), an argument like the one in the proof of
Theorem 3.4.5 shows that for c > 0,
lim_{n→∞} L[δn(t − c)] = L[δ(t − c)] = e^{−cs}.

Then
(s² + a1 s + a0) lim_{n→∞} L[yn] − s y0 − y1 − a1 y0 = e^{−cs}.
Interchanging limits and Laplace transforms we get
(s² + a1 s + a0) L[y] − s y0 − y1 − a1 y0 = e^{−cs},
which is equivalent to
(s² L[y] − s y0 − y1) + a1 (s L[y] − y0) + a0 L[y] = e^{−cs}.
This establishes the Theorem. □



3.4.6. Exercises.

3.4.1.- . 3.4.2.- .

3.5. Convolutions and Solutions


Solutions of initial value problems for linear nonhomogeneous differential equations can be decom-
posed in a nice way. The part of the solution coming from the initial data can be separated from
the part of the solution coming from the nonhomogeneous source function. Furthermore, the latter
is a kind of product of two functions, the source function itself and the impulse response function
of the differential operator. This kind of product of two functions, called the convolution of two
functions, is the subject of this section.

3.5.1. Definition and Properties. One can say that the convolution is a generalization of
the pointwise product of two functions. In a convolution one multiplies the two functions evaluated
at different points and then integrates the result. Here is a precise definition.

Definition 3.5.1. The convolution of functions f and g is a function f ∗ g given by


(f ∗ g)(t) = ∫₀ᵗ f(τ) g(t − τ) dτ. (3.5.1)

Remark: The convolution is defined for functions f and g such that the integral in (3.5.1) is
defined. For example for f and g piecewise continuous functions, or one of them continuous and
the other a Dirac’s delta generalized function.

Example 3.5.1. Find f ∗ g the convolution of the functions f (t) = e−t and g(t) = sin(t).
Solution: The definition of convolution is,
(f ∗ g)(t) = ∫₀ᵗ e^{−τ} sin(t − τ) dτ.

This integral is not difficult to compute. Integrate by parts twice,
∫₀ᵗ e^{−τ} sin(t − τ) dτ = [e^{−τ} cos(t − τ)]₀ᵗ − [e^{−τ} sin(t − τ)]₀ᵗ − ∫₀ᵗ e^{−τ} sin(t − τ) dτ,
that is,
2 ∫₀ᵗ e^{−τ} sin(t − τ) dτ = [e^{−τ} cos(t − τ)]₀ᵗ − [e^{−τ} sin(t − τ)]₀ᵗ = e^{−t} − cos(t) + sin(t).
We then conclude that
(f ∗ g)(t) = (1/2) [e^{−t} + sin(t) − cos(t)]. (3.5.2)
C
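Remark: The closed form (3.5.2) can be checked by evaluating the convolution integral numerically.
The sketch below (an illustration, not part of the text; the sample times are arbitrary choices) uses
SciPy's quad for the quadrature.

```python
# A sketch (not from the text): evaluate the convolution of Example 3.5.1 by
# numerical quadrature and compare with the closed form (3.5.2),
#   (f*g)(t) = (exp(-t) + sin(t) - cos(t)) / 2.
import numpy as np
from scipy.integrate import quad

def conv_fg(t):
    # (f*g)(t) = integral_0^t exp(-tau) * sin(t - tau) d tau
    val, _ = quad(lambda tau: np.exp(-tau) * np.sin(t - tau), 0.0, t)
    return val

for t in (0.5, 1.0, 2.0, 5.0):
    exact = 0.5 * (np.exp(-t) + np.sin(t) - np.cos(t))
    print(t, conv_fg(t), exact)          # the two columns should agree
```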

Example 3.5.2. Graph the convolution of


f(τ) = u(τ) − u(τ − 1),
g(τ) = 2e^{−2τ} for τ > 0,  and  g(τ) = 0 for τ < 0.

Solution: Notice that
g(−τ) = 2e^{2τ} for τ ≤ 0,  and  g(−τ) = 0 for τ > 0.
Then we have that
g(t − τ) = g(−(τ − t)) = 2e^{2(τ−t)} for τ ≤ t,  and  g(t − τ) = 0 for τ > t.
In the graphs below we can see that the values of the convolution function f ∗ g measure the overlap
of the functions f and g when one function slides over the other.

Figure 10. The graphs of f , g, and f ∗ g.

A few properties of the convolution operation are summarized in the Theorem below. But we
save the most important property for the next subsection.

Theorem 3.5.2 (Properties). For all piecewise continuous functions f, g, and h, the following hold:

(i) Commutativity: f ∗ g = g ∗ f;
(ii) Associativity: f ∗ (g ∗ h) = (f ∗ g) ∗ h;
(iii) Distributivity: f ∗ (g + h) = f ∗ g + f ∗ h;
(iv) Neutral element: f ∗ 0 = 0;
(v) Identity element: f ∗ δ = f .

Proof of Theorem 3.5.2: We only prove properties (i) and (v), the rest are left as an exercise
and they are not so hard to obtain from the definition of convolution. The first property can be
obtained by a change of the integration variable as follows,

(f ∗ g)(t) = ∫₀ᵗ f(τ) g(t − τ) dτ.

Now introduce the change of variables, τ̂ = t − τ , which implies dτ̂ = −dτ , then
(f ∗ g)(t) = ∫ₜ⁰ f(t − τ̂) g(τ̂) (−1) dτ̂ = ∫₀ᵗ g(τ̂) f(t − τ̂) dτ̂,

so we conclude that

(f ∗ g)(t) = (g ∗ f )(t).

We now move to property (v), which is essentially a property of the Dirac delta,
(f ∗ δ)(t) = ∫₀ᵗ f(τ) δ(t − τ) dτ = f(t).

This establishes the Theorem. 

3.5.2. The Laplace Transform. The Laplace transform of a convolution of two functions
is the pointwise product of their corresponding Laplace transforms. This result will be a key part
in the solution decomposition result we show at the end of the section.

Theorem 3.5.3 (Laplace Transform). If both L[f] and L[g] exist, including the case where either
f or g is a Dirac delta, then
L[f ∗ g] = L[f] L[g]. (3.5.3)

Remark: It is not an accident that the convolution of two functions satisfies Eq. (3.5.3). The
definition of convolution is chosen so that it has this property. One can see that this is the case
by looking at the proof of Theorem 3.5.3. One starts with the expression L[f ] L[g], then changes
the order of integration, and one ends up with the Laplace transform of some quantity. Because
this quantity appears naturally in that expression, it deserves a name. This is how the convolution
operation was created.
Proof of Theorem 3.5.3: We start writing the right-hand side of Eq. (3.5.3), the product L[f] L[g].
We write the two integrals coming from the individual Laplace transforms and we rewrite them in
an appropriate way.
L[f] L[g] = [∫₀^∞ e^{−st} f(t) dt] [∫₀^∞ e^{−st̃} g(t̃) dt̃]
          = ∫₀^∞ e^{−st̃} g(t̃) [∫₀^∞ e^{−st} f(t) dt] dt̃
          = ∫₀^∞ g(t̃) [∫₀^∞ e^{−s(t+t̃)} f(t) dt] dt̃,
where we only introduced the integral in t as a constant inside the integral in t̃. Introduce the
change of variables in the inside integral, τ = t + t̃, hence dτ = dt. Then we get
L[f] L[g] = ∫₀^∞ g(t̃) [∫_{t̃}^∞ e^{−sτ} f(τ − t̃) dτ] dt̃ (3.5.4)
          = ∫₀^∞ ∫_{t̃}^∞ e^{−sτ} g(t̃) f(τ − t̃) dτ dt̃. (3.5.5)

Here is the key step. We must switch the order of integration. From Fig. 11 we see that changing
the order of integration gives the following expression,
L[f] L[g] = ∫₀^∞ ∫₀^τ e^{−sτ} g(t̃) f(τ − t̃) dt̃ dτ.
Then, it is straightforward to check that
L[f] L[g] = ∫₀^∞ e^{−sτ} [∫₀^τ g(t̃) f(τ − t̃) dt̃] dτ
          = ∫₀^∞ e^{−sτ} (g ∗ f)(τ) dτ
          = L[g ∗ f]  ⇒  L[f] L[g] = L[f ∗ g].
This establishes the Theorem. □

Figure 11. Domain of integration in (3.5.5).

Example 3.5.3. Compute the Laplace transform of the function u(t) = ∫₀ᵗ e^{−τ} sin(t − τ) dτ.

Solution: The function u above is the convolution of the functions


f (t) = e−t , g(t) = sin(t),
that is, u = f ∗ g. Therefore, Theorem 3.5.3 says that
L[u] = L[f ∗ g] = L[f ] L[g].
Since
L[f] = L[e^{−t}] = 1/(s + 1),   L[g] = L[sin(t)] = 1/(s² + 1),
we then conclude that L[u] = L[f ∗ g] is given by
L[f ∗ g] = 1/((s + 1)(s² + 1)).
C
Example 3.5.4. Use the Laplace transform to compute u(t) = ∫₀ᵗ e^{−τ} sin(t − τ) dτ.

Solution: Since u = f ∗ g, with f (t) = e−t and g(t) = sin(t), then from Example 3.5.3,
L[u] = L[f ∗ g] = 1/((s + 1)(s² + 1)).
A partial fraction decomposition of the right-hand side above implies that
L[u] = (1/2) [1/(s + 1) + (1 − s)/(s² + 1)]
     = (1/2) [1/(s + 1) + 1/(s² + 1) − s/(s² + 1)]
     = (1/2) [L[e^{−t}] + L[sin(t)] − L[cos(t)]].
This says that
u(t) = (1/2) [e^{−t} + sin(t) − cos(t)].
So we recover Eq. (3.5.2) in Example 3.5.1, that is,
(f ∗ g)(t) = (1/2) [e^{−t} + sin(t) − cos(t)].
C
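Remark: The identity L[f ∗ g] = L[f] L[g] for this pair of functions can be verified with a computer
algebra system. The sketch below (an illustration, not part of the text) uses SymPy's laplace_transform
on the closed form (3.5.2) and compares it with the product of the individual transforms.

```python
# A sketch (not from the text): verify L[f*g] = L[f] L[g] for f(t) = e^{-t},
# g(t) = sin(t), using the closed form of the convolution found in Example 3.5.1.
import sympy as sp

t, s = sp.symbols('t s', positive=True)
conv = (sp.exp(-t) + sp.sin(t) - sp.cos(t)) / 2           # (f*g)(t) from (3.5.2)

lhs = sp.laplace_transform(conv, t, s, noconds=True)      # L[f*g]
rhs = 1 / ((s + 1) * (s**2 + 1))                           # L[f] L[g]
print(sp.simplify(lhs - rhs))                              # expect 0
```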

Example 3.5.5. Find the function g such that f(t) = ∫₀ᵗ sin(4τ) g(t − τ) dτ has the Laplace
transform L[f] = s/((s² + 16)((s − 1)² + 9)).
Solution: Since f(t) = (sin(4t) ∗ g)(t), we can write
s/((s² + 16)((s − 1)² + 9)) = L[f] = L[sin(4t) ∗ g(t)] = L[sin(4t)] L[g] = 4/(s² + 4²) L[g],
so we get that
4/(s² + 4²) L[g] = s/((s² + 16)((s − 1)² + 9))  ⇒  L[g] = (1/4) s/((s − 1)² + 3²).
We now rewrite the right-hand side of the last equation,
L[g] = (1/4) (s − 1 + 1)/((s − 1)² + 3²)  ⇒  L[g] = (1/4) [(s − 1)/((s − 1)² + 3²) + (1/3) · 3/((s − 1)² + 3²)],
that is,
L[g] = (1/4) [L[cos(3t)](s − 1) + (1/3) L[sin(3t)](s − 1)] = (1/4) [L[eᵗ cos(3t)] + (1/3) L[eᵗ sin(3t)]],
which leads us to
g(t) = (1/4) eᵗ [cos(3t) + (1/3) sin(3t)].
C

3.5.3. Solution Decomposition. The Solution Decomposition Theorem is the main result of
this section. Theorem 3.5.4 shows one way to write the solution to a general initial value problem for
a linear second order differential equation with constant coefficients. The solution to such a problem
can always be divided into two terms. The first term contains information only about the initial
data. The second term contains information only about the source function. This second term
is a convolution of the source function itself and the impulse response function of the differential
operator.

Theorem 3.5.4 (Solution Decomposition). Given constants a0 , a1 , y0 , y1 and a piecewise contin-


uous function g, the solution y to the initial value problem
y'' + a1 y' + a0 y = g(t),  y(0) = y0,  y'(0) = y1, (3.5.6)
can be decomposed as
y(t) = yh(t) + (yδ ∗ g)(t), (3.5.7)
where yh is the solution of the homogeneous initial value problem
yh'' + a1 yh' + a0 yh = 0,  yh(0) = y0,  yh'(0) = y1, (3.5.8)
and yδ is the impulse response solution, that is,
yδ'' + a1 yδ' + a0 yδ = δ(t),  yδ(0) = 0,  yδ'(0) = 0.

Remark: The solution decomposition in Eq. (3.5.7) can be written in the equivalent way
y(t) = yh(t) + ∫₀ᵗ yδ(τ) g(t − τ) dτ.
Also, recall that the impulse response function can be written in the equivalent way
yδ = L⁻¹[e^{−cs}/p(s)] for c ≠ 0,   and   yδ = L⁻¹[1/p(s)] for c = 0.

Proof of Theorem 3.5.4: Compute the Laplace transform of the differential equation,
L[y''] + a1 L[y'] + a0 L[y] = L[g(t)].
Recalling the relations between Laplace transforms and derivatives,
L[y''] = s² L[y] − s y0 − y1,   L[y'] = s L[y] − y0,
we rewrite the differential equation for y as an algebraic equation for L[y],
(s² + a1 s + a0) L[y] − s y0 − y1 − a1 y0 = L[g(t)].
As usual, it is simple to solve the algebraic equation for L[y],
L[y] = ((s + a1) y0 + y1)/(s² + a1 s + a0) + L[g(t)]/(s² + a1 s + a0).
Now, the function yh is the solution of Eq. (3.5.8), that is,
L[yh] = ((s + a1) y0 + y1)/(s² + a1 s + a0).
And by the definition of the impulse response solution yδ we have that
L[yδ] = 1/(s² + a1 s + a0).
These last three equations imply
L[y] = L[yh] + L[yδ] L[g(t)].
This is the Laplace transform version of Eq. (3.5.7). Inverting the Laplace transform above,
y(t) = yh(t) + L⁻¹[L[yδ] L[g(t)]].
Using the result in Theorem 3.5.3 in the last term above we conclude that
y(t) = yh(t) + (yδ ∗ g)(t). □


Example 3.5.6. Use the Solution Decomposition Theorem to express the solution of
y 00 + 2 y 0 + 2 y = g(t), y(0) = 1, y 0 (0) = −1.

Solution: We first find the impulse response function
yδ(t) = L⁻¹[1/p(s)],   p(s) = s² + 2s + 2.
Since p has complex roots, we complete the square,
s² + 2s + 2 = s² + 2s + 1 − 1 + 2 = (s + 1)² + 1,
so we get
yδ(t) = L⁻¹[1/((s + 1)² + 1)]  ⇒  yδ(t) = e^{−t} sin(t).
We now compute the solution to the homogeneous problem
yh'' + 2 yh' + 2 yh = 0,  yh(0) = 1,  yh'(0) = −1.
Using Laplace transforms we get
L[yh''] + 2 L[yh'] + 2 L[yh] = 0,
and recalling the relations between the Laplace transform and derivatives,
(s² L[yh] − s yh(0) − yh'(0)) + 2 (s L[yh] − yh(0)) + 2 L[yh] = 0.
Using our initial conditions we get (s² + 2s + 2) L[yh] − s + 1 − 2 = 0, so
L[yh] = (s + 1)/(s² + 2s + 2) = (s + 1)/((s + 1)² + 1),

so we obtain
yh(t) = e^{−t} cos(t).
Therefore, the solution to the original initial value problem is
y(t) = yh(t) + (yδ ∗ g)(t)  ⇒  y(t) = e^{−t} cos(t) + ∫₀ᵗ e^{−τ} sin(τ) g(t − τ) dτ.
C

Example 3.5.7. Use the Laplace transform to solve the same IVP as above.
y 00 + 2 y 0 + 2 y = g(t), y(0) = 1, y 0 (0) = −1.

Solution: Compute the Laplace transform of the differential equation above,


L[y''] + 2 L[y'] + 2 L[y] = L[g(t)],
and recall the relations between the Laplace transform and derivatives,
L[y''] = s² L[y] − s y(0) − y'(0),   L[y'] = s L[y] − y(0).
Introduce the initial conditions in the equations above,
L[y''] = s² L[y] − s (1) − (−1),   L[y'] = s L[y] − 1,
and these two equations into the differential equation,
(s² + 2s + 2) L[y] − s + 1 − 2 = L[g(t)].
Reorder terms to get
L[y] = (s + 1)/(s² + 2s + 2) + L[g(t)]/(s² + 2s + 2).
Now, the function yh is the solution of the homogeneous initial value problem with the same initial
conditions as y, that is,
L[yh] = (s + 1)/(s² + 2s + 2) = (s + 1)/((s + 1)² + 1) = L[e^{−t} cos(t)].
Now, the function yδ is the impulse response solution for the differential equation in this Example,
that is,
L[yδ] = 1/(s² + 2s + 2) = 1/((s + 1)² + 1) = L[e^{−t} sin(t)].
If we put all this information together we get
L[y] = L[yh] + L[yδ] L[g(t)]  ⇒  y(t) = yh(t) + (yδ ∗ g)(t).
More explicitly, we get
y(t) = e^{−t} cos(t) + ∫₀ᵗ e^{−τ} sin(τ) g(t − τ) dτ.
C
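Remark: The decomposition can be tested numerically for any particular source. The sketch below
(an illustration, not part of the text) chooses the source g(t) = 1, an assumption made only for this
check, and compares the convolution formula with a direct numerical solution of the initial value problem.

```python
# A sketch (not from the text): check y = y_h + y_delta * g from Examples 3.5.6-3.5.7
# for the particular source g(t) = 1, against a direct numerical solution of
#   y'' + 2 y' + 2 y = g(t),  y(0) = 1, y'(0) = -1.
import numpy as np
from scipy.integrate import quad, solve_ivp

g = lambda t: 1.0                                    # chosen source, not from the text
y_h = lambda t: np.exp(-t) * np.cos(t)               # homogeneous part
y_d = lambda t: np.exp(-t) * np.sin(t)               # impulse response

def y_decomp(t):
    conv, _ = quad(lambda tau: y_d(tau) * g(t - tau), 0.0, t)
    return y_h(t) + conv

sol = solve_ivp(lambda t, z: [z[1], g(t) - 2*z[1] - 2*z[0]],
                (0.0, 6.0), [1.0, -1.0], dense_output=True, rtol=1e-8)

for t in (1.0, 2.0, 4.0):
    print(t, y_decomp(t), sol.sol(t)[0])             # the two columns should agree
```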

3.5.4. Exercises.

3.5.1.- . 3.5.2.- .
CHAPTER 4

Qualitative Aspects of Systems of ODE

Newton’s second law of motion for point particles is one of the first differential equations ever
written. Even this early example of a differential equation consists not of a single equation but
of a system of three equations in three unknowns. The unknown functions are the particle's three
coordinates in space as functions of time. One important difficulty in solving a differential system is
that the equations in a system are usually coupled. One cannot solve for one unknown function
without knowing the other unknowns. In this chapter we study how to solve the system in the
particular case that the equations can be uncoupled. We call such systems diagonalizable. Explicit
formulas for the solutions can be written in this case. Later we generalize this idea to systems that
cannot be uncoupled.


4.1. Modeling with First Order Systems


So far we have studied physical systems described by a single function, solution of a single differential
equation. Now we study more complicated systems described by several functions, solutions of
several differential equations. These equations are called a system of differential equations. In this
section we go over a few of these physical systems—the interacting species, the predator-prey, and
the spring-mass system. The last system is usually described by a single, second order, differential
equation. However, any second order equation for a single function can be rewritten as a first order
system consisting of two equations for two functions—the original function and its derivative.

4.1.1. Interacting Species Model. Consider two species interacting in some way to survive.
For example, rabbits and sheep competing for the same food. Or badgers and coyotes teaming up
to dig out and hunt rodents. In the first example the species compete for resources, in the last
example the species cooperate—act together for a common benefit. A mathematical description of
any of these two cases is called an interacting species model.
One way to construct a mathematical model of a complicated physical system is to start with
a much simpler system, and then slowly add features to it until we get to the situation we are
interested in. For example, suppose we want to construct a mathematical model for rabbits and
sheep competing for the same finite food resources. We can start with a much simpler case—
only one species, rabbits, having unlimited food. In that case we know, from the experiment with
bacteria in § 1.1, that the rabbit’s population will increase exponentially, according to solutions of
the differential equation
R0 (t) = rR R(t), rR > 0,
where rR is the growth rate per capita coefficient. Then we add one new feature to this simple
system—we require that the food is finite. That means there is a maximum population, KR > 0,
that can be sustained by the food resources. When the population is larger than KR , the rabbit’s
population should decrease. A simple model with this property is given by the logistic equation,
R'(t) = rR R(t) (1 − R(t)/KR).
Now we can add the second species—sheep—to our system, but we assume that the two species do
not interact. One way this could happen is when the region where they live is so big the species
do not notice each other. Such a system is described by two logistic equations, one for the rabbits,
the other for the sheep,
R' = rR R (1 − R/KR),
S' = rS S (1 − S/KS),
where rR , rS are the growth rates per capita and KR , KS are the carrying capacities.
Finally, we introduce the competition terms into this model. Consider the first equation above
for the rabbit population. The fact that sheep are eating the same grass as the rabbits will decrease
R', so we need to add a term of the form α R(t) S(t) on the right-hand side, that is
R' = rR R (1 − R/Rc) + α R S,  α < 0.
The new term must be negative, because the competition must decrease R0 . And this new term
must depend on the product of the populations RS, because the product is a good measure of how
often the species meet when they compete for the same grass. If the rabbit population is very small,
then the interaction must be also small. Something similar must occur for the sheep population,
so we get
S' = rS S (1 − S/Sc) + β R S,  β < 0.
We summarize the ideas above and we generalize from competition to any type of interactions in
the following definition.

Definition 4.1.1. The interacting species system for the functions x and y, which depend on
the independent variable t, is
x' = rx x (1 − x/xc) + α x y,
y' = ry y (1 − y/yc) + β x y,
where the constants rx, ry and xc, yc are positive and α, β are real numbers. Furthermore, we have
the following particular cases:
• The species compete if and only if α < 0 and β < 0.
• The species cooperate if and only if α > 0 and β > 0.
• The species y cooperates with x when α > 0, and x competes with y when β < 0.

Example 4.1.1 (Interacting Species). The following systems are models of the populations of pair
of species that either compete for resources (an increase in one species decreases the growth rate
in the other) or cooperate (an increase in one species increases the growth rate in the other). For
each of the following systems identify the independent and dependent variables, the parameters,
such as growth rates, carrying capacities, and measures of interaction between species. Do the species
compete or cooperate?
(a)
dx/dt = c1 x − c1 x²/K1 − b1 x y,
dy/dt = c2 y − c2 y²/K2 − b2 x y,
where all the coefficients above are positive.
(b)
dx/dt = x − x²/5 + 5 x y,
dy/dt = 2y − y²/6 + 2 x y.
Solution:
(a)
• The species compete.
• t is the independent variable.
• x, y are the dependent variables.
• c1, c2 are the growth rates.
• K1, K2 are the carrying capacities.
• −b1, −b2 are the competition coefficients.
(b)
• The species cooperate.
• t is the independent variable.
• x, y are the dependent variables.
• 1, 2 are the growth rates.
• 5, 12 (not 6) are the carrying capacities.
• 5, 2 are the cooperation coefficients.
C

Example 4.1.2 (Competing Species). Consider a system of elephants and rabbits in an environ-
ment with limited resources. Suppose the system is described by the following equations
dx/dt = x − (1/20) x² − x y,
dy/dt = 3y − (1/300) y² − 20 x y.
Which unknown function represents the elephants and which one represents the rabbits? And what
are the growth rate and the carrying capacity coefficients for each species?
Solution: From the first term in the first equation we see that the growth coefficient for the x
species is rx = 1. Therefore, the carrying capacity for that species is Kx = 20. And the effect on x
of the interaction with y is αx = −1, which is negative, so we have a competition.
Analogously, from the first term in the second equation we see that the growth coefficient for
the y species is ry = 3. Therefore, the carrying capacity for that species is Ky = 900, because
1/300 = 3/900 = ry/Ky. And the effect on y of the interaction with x is αy = −20, which is negative, so
we again have a competition.
From these numbers we see that the species x grows slower than the species y, which suggests
that x are elephants and y are rabbits. Rabbits reproduce faster than elephants.

The carrying capacities say that the critical x-population for the environment is 20 individuals,
while the critical population for the y species is 900 individuals for the same environment. This
also suggests that x are elephants and y are rabbits, the environment can hold 20 elephants or 900
rabbits. Rabbits eat less than elephants.
The competition coefficients tell us a similar story: x are elephants and y are rabbits.
Rabbits affect elephants little, while elephants affect rabbits a lot; this is why |αx| < |αy|.
C
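Remark: Systems like the one in this example are easy to explore numerically. The sketch below (an
illustration, not part of the text; the initial populations and the time span are arbitrary choices)
integrates the equations of Example 4.1.2 with SciPy and prints the sampled populations.

```python
# A sketch (not from the text): integrate the competition system of Example 4.1.2,
#   x' = x - x^2/20 - x y,   y' = 3 y - y^2/300 - 20 x y,
# with arbitrarily chosen initial populations, and inspect the trajectory.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, z):
    x, y = z                                  # x: elephants, y: rabbits
    dx = x - x**2 / 20.0 - x * y
    dy = 3.0 * y - y**2 / 300.0 - 20.0 * x * y
    return [dx, dy]

sol = solve_ivp(rhs, (0.0, 10.0), [2.0, 10.0], t_eval=np.linspace(0.0, 10.0, 6))
print(sol.t)
print(np.round(sol.y, 3))     # each row is one population sampled at the times in sol.t
```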

Species can interact in more general ways than those described by the system in Def. 4.1.1. In the
following example we show a system of equations that describes how a parasite interacts with its host.

Example 4.1.3 (Parasite-Host Interaction). Suppose that D(t) is the dog population at time t
and F(t) is the flea population living on these dogs. Assume that these populations have infinitely
many food resources. Then, the following system describes the dog-flea populations,
D' = a D + b F,
F' = c D + d F.

Find the sign of all coefficients.


Solution: Since dogs and fleas reproduce, the coefficients a and d must be positive. Since the more
fleas there are, the worse the dog's health, the coefficient b must be negative. Since the more dogs
there are, the more fleas there are, the coefficient c must be positive. So the system looks like
D' = |a| D − |b| F,
F' = |c| D + |d| F,

where we have written the signs explicitly. C

4.1.2. Predator-Prey System. In this case we have two species where one species preys on
the other and the preys have unlimited food resources. For example, cats prey on mice, foxes prey
on rabbits. Once again, we construct the mathematical model starting with the simplest system we
can model and then we add features to that system until we get to the system we want to describe.
In this case we start with the prey alone, say rabbits, having unlimited food resources. There-
fore, the rabbit population grows exponentially, and their population function R satisfies the dif-
ferential equation
R0 (t) = rR R(t), rR > 0.
If we now look at the predator population alone, say foxes, something different happens. Foxes
alone do not have any food resources, so their population function F has to decrease to zero. So,
foxes starve following the equation

F 0 (t) = −rF F (t), rF > 0,

where the negative sign is explicitly written to emphasize that the foxes population decreases. If
we now let the foxes and the rabbits live together, the rabbits begin to be hunted by foxes and the
foxes start to get some food. This means R0 has to be lower than before and F 0 has to be larger
than before. Also notice that it is more probable that a fox meets a rabbit when both populations
are large. One can see that one of the simpler models describing this situation is

R'(t) = rR R(t) − αR R(t) F(t),
F'(t) = −rF F(t) + αF R(t) F(t),

where αR and αF are positive. This is the predator-prey system, also known as the Lotka-Volterra
equations. We summarize our discussion in the following definition.

Definition 4.1.2. The predator-prey system for the predator function x and the prey function
y, which depend on the independent variable t, is
x0 = −ax x + bx x y
y 0 = ay y − by x y,
where the coefficients ax , bx , ay , and by are nonnegative.

The coefficients ax and ay are the decay and growth rate per capita coefficients, respectively.
Their magnitude indicates how fast their populations decrease or grow, respectively, when they live
alone. The coefficients bx and by measure how much one species is affected by the interaction with
the other.
In the following example we show how the magnitude of the equation coefficients describes the
type of predators and preys.

Example 4.1.4 (Predator-Prey). Consider the two systems of differential equations.


x' = 0.3 x − 0.1 x y,        x' = 0.3 x − 3 x ỹ,
y' = −0.1 y + 2 x y,         ỹ' = −2 ỹ + 0.1 x ỹ.
The prey in both systems is the same—find out why. One of these systems has a lethargic predator—
such as a boa constrictor. The other has an active predator—such as a bobcat. Identify each system.
Solution: First, the prey is the same in both systems because the growth rate per capita for the
prey in both systems is the same, ax = 0.3. This is the only coefficient that identifies the prey alone,
so the prey is the same in both systems.
In the system on the left we have an interaction coefficient dx = 0.1, which means that the
predator y does not affect the prey population too much. The opposite happens with the system
on the right, where dx = 3, so the predator ỹ hunts many more prey than y. This suggests y are
boas and ỹ are bobcats.
From the second equations in both systems we draw similar conclusions. The decay coefficient
ay = 0.1 is smaller than aỹ = 2, which means that the predator y can live longer without food
than the predator ỹ. This agrees with the fact that boas are cold blooded and more energy efficient
in warm weather than the warm blooded bobcats.
Finally, the interaction coefficient dy = 2 means that the population of the predator y gains
a lot more from each prey than the population of the predator ỹ, which has a smaller coefficient
dỹ = 0.1. Again this suggests that y are boas and ỹ are bobcats. C
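Remark: The following sketch (an illustration, not part of the text; the initial data and the time span
are arbitrary choices) integrates the lethargic-predator system on the left with SciPy; for a
predator-prey system one expects the populations to oscillate.

```python
# A sketch (not from the text): integrate the "boa" system of Example 4.1.4,
#   x' = 0.3 x - 0.1 x y,   y' = -0.1 y + 2 x y,
# with arbitrarily chosen initial populations near the coexistence equilibrium.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, z):
    x, y = z                              # x: prey, y: predator
    return [0.3 * x - 0.1 * x * y, -0.1 * y + 2.0 * x * y]

sol = solve_ivp(rhs, (0.0, 60.0), [0.1, 2.0],
                t_eval=np.linspace(0.0, 60.0, 601), rtol=1e-8)
print(np.round(sol.y[:, :5], 4))          # first few samples of (prey, predator)
```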

Remark: From the example above it looks like boas are far more energy efficient predators
than bobcats. So one may ask, why do we have bobcats at all? Why aren't all predators cold
blooded, just like reptiles? It turns out that in regions where the weather is warm and
without much temperature variation this is the case: there are more reptiles than warm blooded
predators. But in colder regions only warm blooded predators can survive. It is also interesting to
look at the past, when dinosaurs roamed the Earth. The weather then was warm with homogeneous
temperatures. But when the weather changed, the cold blooded predators were wiped out, and only
the warm blooded survived. In our model above we have not included the weather in the system,
so we assumed that the environment was fine for both predators to live. In these conditions, boas
have an energy management advantage over bobcats.

4.1.3. Spring-Mass System as a First Order System. A spring-mass system, as intro-


duced in § 2.1, consists of a spring with stiffness constant k hanging from a roof and having an
object of mass m attached to it. If there is no damping or friction, Newton's equation for that
system is
m y'' + k y = 0,
where y(t) denotes the displacement of the body away from the rest position at time t, positive
downwards. Now, let y'(t) = v(t) denote the velocity at time t. Then, the above second order differential

equation can be rewritten as a system of two first order differential equations,


y'(t) = v(t),
v'(t) = −(k/m) y(t).
m
We have converted the second order scalar equation for the unknown y into a first order system
of two equations in two unknowns, y and v, which are linked by the first equation y 0 = v. This is
called a first order reduction of the original second order equation.

Theorem 4.1.3 (First Order Reduction). A function y solves the second order equation
y'' + a1(t) y' + a0(t) y = b(t), (4.1.1)
if and only if the functions x1 = y and x2 = y' are solutions to the 2 × 2 first order differential system
x1' = x2, (4.1.2)
x2' = −a0(t) x1 − a1(t) x2 + b(t). (4.1.3)

Proof of Theorem 4.1.3:


(⇒) Given a solution y of Eq. (4.1.1), introduce the functions x1 = y and x2 = y'. Then
Eq. (4.1.2) holds, due to the relation
x1' = y' = x2.
Also Eq. (4.1.3) holds, because of the equation
x2' = y'' = −a0(t) y − a1(t) y' + b(t)  ⇒  x2' = −a0(t) x1 − a1(t) x2 + b(t).
(⇐) Differentiate Eq. (4.1.2) and introduce the result into Eq. (4.1.3), that is,
x1'' = x2'  ⇒  x1'' = −a0(t) x1 − a1(t) x1' + b(t).
Denoting y = x1, we obtain
y'' + a1(t) y' + a0(t) y = b(t).
This establishes the Theorem. 

Example 4.1.5 (First Order Reduction). Express the initial value problem for the second order
equation
y 00 + 2y 0 + 2y = 7, y(0) = 1, y 0 (0) = 3,
as an initial value problem for the associated first order reduction introduced above.
Solution: Let us introduce the new variables
x1 = y, x2 = y 0 ⇒ x01 = x2 .
Then, the differential equation can be written as
x02 + 2x2 + 2x1 = 7.
We conclude that
x01 = x2 ,
x02 = −2x1 − 2x2 + 7.
Finally, the initial conditions for y and y 0 define initial conditions for x1 and x2 ,
x1 (0) = 1, x2 (0) = 3.
C
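Remark: A first order reduction is also the form required by most numerical solvers. The sketch below
(an illustration, not part of the text; the time span is an arbitrary choice) solves the reduced system
of this example with SciPy and checks the long-time value against the constant solution y = 7/2 of the
original equation.

```python
# A sketch (not from the text): solve the IVP of Example 4.1.5 through its first
# order reduction, x1' = x2, x2' = -2 x1 - 2 x2 + 7, with x1(0) = 1, x2(0) = 3.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, x):
    x1, x2 = x
    return [x2, -2.0 * x1 - 2.0 * x2 + 7.0]

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 3.0], t_eval=np.linspace(0.0, 10.0, 101))
print(sol.y[0, -1])    # y(10): should be close to 7/2, the equilibrium value
```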

4.1.4. Exercises.

4.1.1.- Write a mathematical model describing the populations of ravens, wolves, and deer. Use
that the unkindness of ravens guides the packs of wolves to the deer.

4.2. Vector Fields and Qualitative Solutions


In this section we start looking at solutions to systems of differential equations. We will restrict
our attention to autonomous systems. We will find qualitative properties of the solutions without
actually solving the system of equations. We start finding a special type of solutions, constant
solutions. These solutions do not depend on the independent variable, say time, therefore they are
called equilibrium solutions. Then we introduce the vector field associated to a given system of
equations. This is a generalization of the slope field studied in section 1.3. Using this vector field
we can obtain qualitative properties of the solution to that system of differential equations.

4.2.1. Equilibrium Solutions. The first order systems we have seen in § 4.1 are particular
cases of 2 × 2 first order system of differential equations, in particular, autonomous differential
equations.

Definition 4.2.1. A 2 × 2 first order system of differential equations for x1 (t), x2 (t) is

x01 = f1 (t, x1 , x2 ), (4.2.1)


x02 = f2 (t, x1 , x2 ). (4.2.2)

The system above is called autonomous when the functions f1 , f2 do not depend explicitly on the
independent variable t, that is,

x01 = f1 (x1 , x2 ), (4.2.3)


x02 = f2 (x1 , x2 ). (4.2.4)

Remark: The interacting species system, the predator-prey system, and the first order reduction
of the mass-spring system are examples of autonomous systems.

Definition 4.2.2. An equilibrium solution of the autonomous system in (4.2.3)-(4.2.4) is a


constant solution, that is, a solution that does not depend on the variable t.

Remarks:
• Equilibrium solutions x1⁰, x2⁰ are constants, therefore their t-derivatives vanish,
(x1⁰)' = 0,   (x2⁰)' = 0.
But these constants are solutions of the differential equations in (4.2.3)-(4.2.4), therefore
these constants must satisfy the equations
f1(x1⁰, x2⁰) = 0, (4.2.5)
f2(x1⁰, x2⁰) = 0. (4.2.6)
• Equilibrium solutions x1⁰, x2⁰ determine a point on the plane x1x2, which is called the
phase plane or phase space. Because equilibrium solutions are points in the phase plane
they are also called equilibrium points or critical points.
• A pair of functions x1 , x2 solutions of equations (4.2.1)-(4.2.2) determine a curve r(t) =
(x1 (t), x2 (t)) on the phase plane, where the parameter of the curve is the independent
variable t. This curve is called a solution curve.

Example 4.2.1. Find the equilibrium solutions of the predator-prey system

R0 = 5R − 2RF,
F 0 = −F + 0.5RF.

Discuss what the equilibrium solutions represent in the physical situation of the model.

Solution: Using the definition of equilibrium solutions we need to solve the following algebraic
system of equations:
5R − 2RF = 0
−F + 0.5RF = 0.
Equivalently,
R(5 − 2F ) = 0
F (−1 + 0.5R) = 0.
Thus, the equilibrium points are
(R0 , F0 ) = (0, 0), (R1 , F1 ) = (2, 2.5).
If R(t) = 0 and F (t) = 0, that is there are no rabbits and no foxes, the system is in perfect balance
(their numbers will not change). Similarly, R(t) = 2 and F (t) = 2.5 for all t ≥ 0 is a solution to
the system of differential equations that does not change with time—the system is in balance, or
in equilibrium. C

Example 4.2.2. Find the equilibrium solutions of the following competing species system.
dR/dt = 3R (1 − H − R),
dH/dt = 2H (2 − H − 3R).

Solution: We can easily find three of the equilibrium solutions, (0, 0), (0, 2), (1, 0). In addition we
could have
1 − H − R = 0,
2 − H − 3R = 0,
which yields an additional equilibrium point, (1/2, 1/2), which we can interpret as a coexistence of both
species. C
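Remark: Equilibrium points are solutions of the algebraic system f1 = 0, f2 = 0, so they can be found
with a computer algebra system. The sketch below (an illustration, not part of the text) uses SymPy's
solve on the system of this example.

```python
# A sketch (not from the text): find the equilibrium points of Example 4.2.2 by
# solving f1 = 0, f2 = 0 symbolically.
import sympy as sp

R, H = sp.symbols('R H', real=True)
f1 = 3 * R * (1 - H - R)
f2 = 2 * H * (2 - H - 3 * R)

print(sp.solve([f1, f2], [R, H]))
# Expected (in some order): [(0, 0), (0, 2), (1, 0), (1/2, 1/2)]
```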

4.2.2. Vector Fields and Direction Fields. We now introduce the vector field associated
with a system of differential equations. The vector field is constructed using the equation itself, not
with the solutions of the differential equation.

Definition 4.2.3. The vector field of the autonomous system


x01 = f1 (x1 , x2 )
x02 = f2 (x1 , x2 ),
is the vector function F which at each point (x1 , x2 ) in the phase plane associates a vector
F(x1 , x2 ) = hf1 , f2 i.

Remark: The usual way to plot a two-dimensional vector field is to use the same plane for both the
domain points and the image vectors. For example, for F(x, y) = ⟨Fx, Fy⟩ we use the xy-plane,
and at the point (x, y) we place the origin of a vector with components ⟨Fx, Fy⟩.

Example 4.2.3. Plot the vector field associated with the first order reduction of the spring-mass
system with mass m = 1 kg and k = 1 kg/s2 .
Solution: The differential equation for the spring-mass system above is
y 00 + y = 0,

where y is the coordinate with origin at the spring-mass equilibrium position and positive in the
direction we stretch the spring. A first order reduction of this equation is obtained by defining
x1 = y, x2 = y'. We then arrive at the following system
x1' = x2,
x2' = −x1.
The vector field associated with this equation is
F(x1, x2) = ⟨x2, −x1⟩.
We now evaluate the vector field at several points on the plane,
F(1/2, 0) = ⟨0, −1/2⟩,  F(0, 1/2) = ⟨1/2, 0⟩,  F(−1/2, 0) = ⟨0, 1/2⟩,  F(0, −1/2) = ⟨−1/2, 0⟩,
F(1, 0) = ⟨0, −1⟩,  F(0, 1) = ⟨1, 0⟩,  F(−1, 0) = ⟨0, 1⟩,  F(0, −1) = ⟨−1, 0⟩.
We plot these eight vectors in Fig. 1. With the help of a computer one can plot the vector field at
more sample points and get a picture as in Fig. 2 C
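Remark: Vector fields like the one above are usually plotted with software. The sketch below (an
illustration, not part of the text; the grid is an arbitrary choice) uses matplotlib's quiver to plot
F(x1, x2) = ⟨x2, −x1⟩ at many sample points.

```python
# A sketch (not from the text): plot the vector field F(x1, x2) = <x2, -x1> of
# Example 4.2.3 on a grid of sample points.
import numpy as np
import matplotlib.pyplot as plt

x1, x2 = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
f1, f2 = x2, -x1                       # components of the vector field

plt.quiver(x1, x2, f1, f2)
plt.xlabel('x1'); plt.ylabel('x2')
plt.title('Vector field <x2, -x1>')
plt.show()
```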

Figure 1. The vector field F(x1, x2) = ⟨x2, −x1⟩ evaluated at a few sample points.

Figure 2. The vector field F(x1, x2) = ⟨x2, −x1⟩.

Remark: The length of the vector carries important information that we will use later on, when
we construct qualitative graphs of solutions to the differential equation. However, more often than
not, the vector field of a differential equation can be difficult to read when the vectors become too
long, as can be seen in Fig. 3. This is the reason it is useful to plot the direction field, which is a
normalized version of the vector field (all vectors are scaled to unit length).

Definition 4.2.4. The direction field of the autonomous system


x1' = f1(x1, x2),
x2' = f2(x1, x2),
is the unit vector field in the phase space given by
FD(x1, x2) = ⟨f1, f2⟩ / √((f1)² + (f2)²).

Figure 3. The vector field F(x1, x2) = ⟨x2, −x1⟩.

Figure 4. Direction field of F(x1, x2) = ⟨x2, −x1⟩.

Remark: If we denote ‖F‖ = √((f1)² + (f2)²), the length of the vector F, then the direction field
is given by
FD = F/‖F‖.
In Fig. 4 we plot the direction field of the vector field in Fig. 3. Note that no matter how
complicated the system of equation is, we can always plot the associated vector or direction fields.

Example 4.2.4. Use a computer to plot the vector field and the direction field of the predator-prey
system
x' = 5x − 2xy,
y' = −y + (1/2) xy.
Solution: In Fig. 5, 6, we show the vector field and direction field for the predator-prey system
above.
We can see that the direction field gives a clearer image of the rotation in the field, in
Fig. 6. However, we lose all the information about the length of the vector field. In Fig. 5
we can see that the vectors in the vector field get longer the farther they are from the rotation
center. The length of the vector field will give us important information about the solution of the
differential equation. C

4.2.3. Sketching Solutions. The vector and direction fields of a system of differential equa-
tions provide a simple way to obtain a qualitative picture in the phase plane of the solutions of
the differential system. The reason is that solution curves and vector fields are deeply connected—
solution curves must be tangent to the vector fields.

Theorem 4.2.5. If x1 (t), x2 (t) are solutions of the autonomous differential system
x1' = f1(x1, x2),
x2' = f2(x1, x2),
then the vector field F(x1, x2) = ⟨f1, f2⟩ and the solution curve r(t) = (x1(t), x2(t)) satisfy
r'(t) = F(r(t)). (4.2.7)

Figure 5. The vector field F(x, y) = ⟨5x − 2xy, −y + xy/2⟩ for the predator-prey system.

Figure 6. The direction field FD = F/‖F‖ for the predator-prey system.

Example 4.2.5. In Fig. 7 we show the direction field for the competing species system,
x' = x(3 − x − 2y),
y' = y(2 − y − x).

Solution: The vector field of the system above is given by
F = ⟨x(3 − x − 2y), y(2 − y − x)⟩,
while the direction field is given by FD = F/‖F‖. Once again, we can see that the direction field
gives a clearer picture than the vector field. We do not plot the vector field; it is too difficult to see
the behavior of the solution based on this vector field. C

Figure 7. The direction field for the competing species system.

Remark: In particular, equation (4.2.7) implies that the vector field F is tangent to the solution
curve r. Also, the vector field F must vanish at equilibrium solutions.
Proof of Theorem 4.2.5: The derivative of the solution curve is a vector tangent to the curve,
given by
r'(t) = ⟨x1'(t), x2'(t)⟩.

But the curve is a solution of the system above, so
x1'(t) = f1(x1(t), x2(t)),
x2'(t) = f2(x1(t), x2(t)),
therefore we get that
r'(t) = ⟨f1(x1(t), x2(t)), f2(x1(t), x2(t))⟩ = ⟨f1(r(t)), f2(r(t))⟩ = F(r(t)).

This establishes the Theorem. 

Remark: Once we have a direction or vector field of a system of differential equations, we can see
what the solutions of the equation look like. We only need to draw the curves in the x1 x2 -plane,
the phase plane, that are always tangent to any vector in the vector or direction field.

Example 4.2.6. Find qualitative solutions of the predator-prey system

x' = 5x − 2xy,
y' = −y + (1/2) xy.

Solution: We have already shown the direction field of the predator-prey system above. We show
it again in Fig. 8. In this case the vectors point in the direction where the segment gets thinner.
On top of that direction field we have plotted in red a curve that is always tangent to any vector
in the direction field. This is a solution curve

r(t) = (x(t), y(t)),

where x(t) and y(t) are solutions to the predator-prey system with initial data given by the red dot
in the picture.
A different way to represent the solution to the system of ordinary differential equations is to
graph the population functions x and y as functions of time. The resulting graphs are given in
Fig. 9. C

Figure 8. The horizontal x-axis represents the prey while the vertical y-
axis represents the predator. The red dot highlights the initial condition.
The vectors point in the direction where the segments get thinner.

Figure 9. The horizontal t-axis represents time, and on the vertical axis
we plot the prey population (in purple) and the predator population (in blue).

Example 4.2.7. Find qualitative solutions of the predator-prey system


x' = 2x − (1/5) x² − xy,
y' = −5y + (3/2) xy.
Solution: In Fig. 10 we show the direction field for this predator-prey system, where the prey has
finite food resources. (Note that in the absence of predators (y = 0) the prey satisfies a logistic
equation.) The direction field looks similar to the previous example, where the prey had unlimited
food resources. However, when we draw a solution curve in the xy−plane, it is clear that the
solution behaves in a qualitatively different manner. In the case of unlimited food resources we saw
that the solution curves were closed. In this case, with limited food resources, the solution curve is
not closed, it is a spiral that spirals towards the equilibrium solution (x0 = 10/3, y0 = 4/3).
Now, if we plot the population functions x and y as functions of time, the resulting graphs are
given in Fig. 11. In this case of limited food resources we see that the oscillations in the predator
and prey populations fade away, and both populations approach the equilibrium populations. The
predator population (in blue) initially oscillates in time and then approaches the constant value y0 =
4/3. The prey population (in purple) also initially oscillates, and then approaches the equilibrium
population x0 = 10/3. C
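Remark: The spiral described above can be reproduced numerically. The sketch below (an illustration,
not part of the text; the initial data are an arbitrary choice) integrates the system with SciPy and
plots the solution curve in the phase plane together with the equilibrium point (10/3, 4/3).

```python
# A sketch (not from the text): integrate the predator-prey system of Example 4.2.7
# and plot the solution curve; it should spiral towards the equilibrium (10/3, 4/3).
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def rhs(t, z):
    x, y = z
    return [2*x - x**2/5 - x*y, -5*y + 1.5*x*y]

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 1.0],
                t_eval=np.linspace(0.0, 20.0, 2000), rtol=1e-8)

plt.plot(sol.y[0], sol.y[1])               # solution curve in the phase plane
plt.plot(10/3, 4/3, 'ro')                  # equilibrium point
plt.xlabel('x (prey)'); plt.ylabel('y (predator)')
plt.show()
```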

Figure 10. The horizontal x-axis represents the prey while the vertical
y-axis represents the predator. The red dot highlights the initial condition.
The vectors point in the direction where the segments get thinner.

Figure 11. The horizontal t-axis represents time, and on the vertical axis we plot the prey
population (in purple) and the predator population (in blue).

4.2.4. Exercises.

4.2.1.- .
CHAPTER 5

Overview of Linear Algebra

We overview a few concepts of linear algebra, such as the Gauss operations to solve linear systems
of algebraic equations, matrix operations, determinants, inverse matrix formulas, eigenvalues and
eigenvectors of a matrix, diagonalizable matrices, and the exponential of a matrix.


5.1. Linear Algebraic Systems


We start our overview of a few main concepts of linear algebra needed to solve systems of linear
differential equations. In this section we introduce an algebraic linear system and we show different
ways to understand such systems. These ways are called the row picture, the column picture, and
the matrix picture. From each picture we get new ideas that later on will form what is called linear
algebra. Among these ideas we highlight the concepts of vectors, linear combination of vectors,
vector space, matrices, matrix-vector product, and matrices as functions on vector spaces.
5.1.1. Systems of Algebraic Equations. A central problem in linear algebra is to find
solutions of a system of linear equations.

Definition 5.1.1. An n × n system of linear algebraic equations is the following: Given


constants aij and bi, where i, j = 1, · · · , n, with n ≥ 1, find the constants xj solutions of
a11 x1 + · · · + a1n xn = b1, (5.1.1)
⋮
an1 x1 + · · · + ann xn = bn. (5.1.2)
The system is called homogeneous iff all sources vanish, that is, b1 = · · · = bn = 0.

These equations are called a system because there is more than one equation, and they are
called linear because the unknowns xj appear as multiplicative factors with power zero or one—for
example, there are no terms proportional to (xj )2 .

Example 5.1.1.
(a) A 2 × 2 linear system on the unknowns x1 and x2 is the following:
2x1 − x2 = 0,
−x1 + 2x2 = 3.
(b) A 3 × 3 linear system on the unknowns x1 , x2 and x3 is the following:
x1 + 2x2 + x3 = 1,
−3x1 + x2 + 3x3 = 24,
x2 − 4x3 = −1. C

5.1.2. The Row Picture. One way to find a solution to an n × n linear system is by sub-
stitution. Compute x1 from the first equation and introduce it into all the other equations. Then
compute x2 from this new second equation and introduce it into all the remaining equations. Re-
peat this procedure till the last equation, where one finally obtains xn . Then substitute back and
find all the xi , for i = 1, · · · , n − 1.

Example 5.1.2. Find all the numbers x1 , x2 solutions of the 2 × 2 algebraic linear system
2x1 − x2 = 0,
−x1 + 2x2 = 3.

Solution: We find the solution by substitution; from the first equation we get
2x1 − x2 = 0  ⇒  x2 = 2x1,
and this information in the second equation gives
−x1 + 2(2x1) = 3  ⇒  3x1 = 3  ⇒  x1 = 1,  x2 = 2.
The substitution method has a nice geometrical interpretation. One first finds all the solutions of
each equation, separately, and then one looks for the intersection of these two sets of solutions, see
Fig. 1.

Figure 1. The solution of a 2 × 2 linear system in the row picture is the intersection of the
two lines, which are the solutions of each row equation.

An interesting property of the solutions to any 2 × 2 linear system is simple to prove using the
row picture, and it is the following result.

Theorem 5.1.2. Given any 2 × 2 linear system, only one of the following statements holds:
(i) There is a unique solution;
(ii) There is no solution;
(iii) There are infinitely many solutions.

Remark: What cannot happen, for example, is a 2 × 2 linear system having only two solutions.
Unlike the quadratic equation x2 − 5x + 6 = 0, which has two solutions given by x = 2 and x = 3,
a 2 × 2 linear system has only one solution, or infinitely many solutions, or no solution at all.
Proof of Theorem 5.1.2: The solutions of each equation in a 2 × 2 linear system form a line in R2 .
Two lines in R2 can intersect at a point, or can be parallel but not coincident, or can be coincident.
These are the cases given in (i)-(iii) and can be seen in Fig. 2. This establishes the Theorem. 

Figure 2. An example of the cases given in Theorem 5.1.2, cases (i)-(iii).

Example 5.1.3. Find all numbers x1 , x2 and x3 solutions of the 2 × 3 linear system
x1 + 2x2 + x3 = 1
(5.1.3)
−3x1 − x2 − 8x3 = 2

Solution: Compute x1 from the first equation, x1 = 1 − 2x2 − x3 , and substitute it in the second
equation,
−3 (1 − 2x2 − x3 ) − x2 − 8x3 = 2 ⇒ 5x2 − 5x3 = 5 ⇒ x2 = 1 + x3 .

Substitute the expression for x2 in the equation above for x1 , and we obtain

x1 = 1 − 2 (1 + x3 ) − x3 = 1 − 2 − 2x3 − x3 ⇒ x1 = −1 − 3x3 .

Since there is no condition on x3 , the system above has infinitely many solutions parametrized by
the number x3 . We conclude that x1 = −1 − 3x3 , and x2 = 1 + x3 , while x3 is free. C

Example 5.1.4. Find all numbers x1 , x2 and x3 solutions of the 3 × 3 linear system
2x1 + x2 + x3 = 2
−x1 + 2x2 = 1 (5.1.4)
x1 − x2 + 2x3 = −2.

Solution: While the row picture is appropriate to solve small systems of linear equations, it becomes
difficult to carry out on 3 × 3 and bigger linear systems. The solution x1 , x2 , x3 of the system
above can be found as follows: Substitute the second equation into the first,

x1 = −1 + 2x2 ⇒ x3 = 2 − 2x1 − x2 = 2 + 2 − 4x2 − x2 ⇒ x3 = 4 − 5x2 ;

then, substitute the second equation and x3 = 4 − 5x2 into the third equation,

(−1 + 2x2 ) − x2 + 2(4 − 5x2 ) = −2 ⇒ x2 = 1,

and then, substituting backwards, x1 = 1 and x3 = −1. We conclude that the solution is a single
point in space given by (1, 1, −1). C
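Remark: Linear systems of this size are routinely solved with software. The sketch below (an
illustration, not part of the text) solves the 3 × 3 system of this example with NumPy and recovers
the solution (1, 1, −1).

```python
# A sketch (not from the text): solve the 3x3 system of Example 5.1.4 numerically
# and confirm the solution found by substitution.
import numpy as np

A = np.array([[ 2.0,  1.0,  1.0],
              [-1.0,  2.0,  0.0],
              [ 1.0, -1.0,  2.0]])
b = np.array([2.0, 1.0, -2.0])

print(np.linalg.solve(A, b))    # expected: [ 1.  1. -1.]
```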

The solution of each separate equation in the examples above represents a plane in R3 . A
solution to the whole system is a point that belongs to the three planes. In the 3 × 3 system in
Example 5.1.4 above there is a unique solution, the point (1, 1, −1), which means that the three
planes intersect at a single point. In the general case, a 3 × 3 system can have a unique solution,
infinitely many solutions or no solutions at all, depending on how the three planes in space intersect
among them. The case with unique solution was represented in Fig. 3, while two possible situations
corresponding to no solution are given in Fig. 4. Finally, two cases of 3 × 3 linear system having
infinitely many solutions are pictured in Fig 5, where in the first case the solutions form a line, and
in the second case the solutions form a plane because the three planes coincide.

Figure 3. Planes representing the solutions of each row equation in a 3 × 3 linear system having a unique solution.

Solutions of linear systems with more than three unknowns can not be represented in the
three dimensional space. Besides, the substitution method becomes more involved to solve. As a
consequence, alternative ideas are needed to solve such systems. We now discuss one of such ideas,
the use of vectors to interpret and find solutions of linear systems.

Figure 4. Two cases of planes representing the solutions of each row equation in 3 × 3 linear systems having no solutions.
Figure 5. Two cases of planes representing the solutions of each row equation in 3 × 3 linear systems having infinitely many solutions.

5.1.3. The Column Picture. We want to introduce a new way to interpret an algebraic
system of linear equations. We start with a simple example, which will lead us to the definition of
a new type of objects—vectors— and operations on these objects—linear combinations. After we
learn how to work with vectors we introduce the formal definition of the column picture of a linear
algebraic system.
Consider again the linear system in Example 5.1.2, that is, to find the numbers x1 , x2 solutions
of
2x1 − x2 = 0,        (2)x1 + (−1)x2 = 0,
−x1 + 2x2 = 3,        (−1)x1 + (2)x2 = 3.
We know that the answer is x1 = 1, x2 = 2. The row picture consisted in solving each row
separately. The main idea in the column picture is to interpret the 2 × 2 linear system as an
addition of new objects, column vectors, in the following way,
     
x_1 \begin{bmatrix} 2 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}.
The new objects are called column vectors and they are denoted as follows,
     
a_1 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad a_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 3 \end{bmatrix}.
We can represent these vectors in the plane, as it is shown in Fig. 6.
The column vector interpretation of a 2 × 2 linear system determines an addition law of vectors
and a multiplication law of a vector by a number. In the example above, we know that the solution
is given by x1 = 1 and x2 = 2, therefore in the column picture interpretation the following equation
must hold      
\begin{bmatrix} 2 \\ -1 \end{bmatrix} + 2 \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}.
The study of this example suggests that the multiplication law of a vector by numbers and the
addition law of two vectors can be defined by the following equations, respectively,
         
\begin{bmatrix} -1 \\ 2 \end{bmatrix} 2 = \begin{bmatrix} (-1)2 \\ (2)2 \end{bmatrix}, \qquad \begin{bmatrix} 2 \\ -1 \end{bmatrix} + \begin{bmatrix} -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 2-2 \\ -1+4 \end{bmatrix}.

Figure 6. Graphical representation of the equation coefficients as column vectors in the plane.
Figure 7. Graphical representation of the equation solution as column vectors in the plane.

The graphical representation of these operations can be seen in Fig. 6 and Fig. 7. From the study of
this and several other examples of 2 × 2 linear systems, the column picture determines the following
definition of linear combination of vectors.
   
Definition 5.1.3. The linear combination of the n-vectors u = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} and v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}, with the real numbers a and b, is defined as follows,
au + bv = a \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} + b \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} au_1 + bv_1 \\ \vdots \\ au_n + bv_n \end{bmatrix}.

A linear combination includes the particular cases of addition (a = b = 1), and multiplication
of a vector by a number (b = 0), respectively given by,
         
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{bmatrix}, \qquad a \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} au_1 \\ \vdots \\ au_n \end{bmatrix}.

The addition law in terms of components is represented graphically by the parallelogram law, as it
can be seen in Fig. 8. The multiplication of a vector by a number a affects the length and direction of
the vector. The product au stretches the vector u when a > 1 and it compresses u when 0 < a < 1.
If a < 0 then it reverses the direction of u and it stretches when a < −1 and compresses when
−1 < a < 0. Fig. 9 represents some of these possibilities. Notice that the difference of two vectors
is a particular case of the parallelogram law, as it can be seen in Fig. 10 and Fig. 11.
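Remark: Linear combinations as in Definition 5.1.3 are easy to compute with software. The short Python sketch below (not part of these notes; it assumes NumPy) reuses the vectors a1, a2 and b of the 2 × 2 example above.

```python
# Hypothetical check of a linear combination of column vectors (not from the text).
import numpy as np

a1 = np.array([2.0, -1.0])   # first coefficient vector
a2 = np.array([-1.0, 2.0])   # second coefficient vector
b  = np.array([0.0, 3.0])    # source vector

# The linear combination 1*a1 + 2*a2 reproduces the source vector b.
print(1*a1 + 2*a2)                    # [0. 3.]
print(np.allclose(1*a1 + 2*a2, b))    # True
```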
The set of all column vectors together with the operation linear combination is called a vector
space. Since we can define real or complex vector spaces, let us use the following notation: Let
F ∈ {R, C}, that is F is either F = R or F = C.

Definition 5.1.4. The vector space Fn over the field F is the set of all n-vectors u ∈ Fn together
with the operation linear combination, that is, for all n-vectors u, v ∈ Fn , and all numbers a, b ∈ F,

holds
au + bv = a \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} + b \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} au_1 + bv_1 \\ \vdots \\ au_n + bv_n \end{bmatrix}.

Figure 8. The addition of vectors can be computed with the parallelogram law.
Figure 9. The multiplication of a vector by a number.

So the column picture of an algebraic linear system leads us to the concept of a vector space,
that is, a set of vectors together with the operation of linear combination. And the operation
of linear combination means to add vectors with the parallelogram rule and to stretch-compress
vectors or reverse the vector direction.
Since we now know how to compute linear combinations of vectors, we can formally introduce
the column picture of an algebraic linear system.

Definition 5.1.5. The column picture of an n × n linear system
a_{11} x_1 + \cdots + a_{1n} x_n = b_1,
\vdots
a_{n1} x_1 + \cdots + a_{nn} x_n = b_n,
is given by the vector equation
a_1 x_1 + \cdots + a_n x_n = b,
where we have introduced the vectors
a_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix}, \quad \cdots, \quad a_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.

Theorem 5.1.2 can also be proven in the column picture. In Fig. 12 we present these three
cases in both row and column pictures. In the latter case the proof is to study all possible relative
positions of the column vectors a1 , a2 , and b on the plane.
We see in Fig. 12 that the first case corresponds to a system with unique solution. There is
only one linear combination of the coefficient vectors a1 and a2 which adds up to b. The reason is
that the coefficient vectors are not proportional to each other.
The second case corresponds to the case of infinitely many solutions. The coefficient vectors
are proportional to each other and the source vector b is also proportional to them. So, there are
infinitely many linear combinations of the coefficient vectors that add up to the source vector.

Figure 10. The difference of vectors also can be computed with the parallelogram law.
Figure 11. The difference of two vectors as the addition of a vector and its opposite vector.

The last case corresponds to the case of no solutions. While the coefficient vectors are propor-
tional to each other, the source vector is not proportional to them. So, there is no linear combination
of the coefficient vectors that add up to the source vector.

Figure 12. Examples of solutions of general 2 × 2 linear systems having a unique solution, no solution, and infinitely many solutions, represented in the row picture and in the column picture.

5.1.4. The Matrix Picture. We now introduce yet another way to interpret a system of
algebraic linear equations. Once again, consider the algebraic linear system
2x1 − x2 = 0,
−x1 + 2x2 = 3.

We now interpret this system in the following way,


    
\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix},
where the array of coefficients, called a matrix, acts on an unknown vector as follows,
\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 - x_2 \\ -x_1 + 2x_2 \end{bmatrix},    (5.1.5)
This operation is called the matrix-vector product. If we introduce the notation
     
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 3 \end{bmatrix},
where A is the coefficient matrix, x is the unknown vector, and b is the source vector, then the
algebraic linear system is
Ax = b.
This example and the study of other similar examples suggest the following definition.

Definition 5.1.6. An m × n matrix, A, is an array of numbers


A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \qquad (m \ \text{rows}, \ n \ \text{columns}),
where aij ∈ F, for i = 1, · · · , m, j = 1, · · · , n, are the matrix coefficients. A square matrix is an
n × n matrix, and the diagonal coefficients in a square matrix are aii .

Remark: A square matrix is upper (lower) triangular iff all the matrix coefficients below (above)
the diagonal vanish. For example, the 3 × 3 matrix A below is upper triangular while B is lower
triangular.
   
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}.

Remark: Vectors can now be seen as the particular case of an m × 1 matrix, which are called an
m-vector.

Example 5.1.5.
(a) Examples of 2 × 2, 2 × 3, 3 × 2 real-valued matrices, and a 2 × 2 complex-valued matrix:
 
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad D = \begin{bmatrix} 1+i & 2-i \\ 3 & 4i \end{bmatrix}.
(b) The coefficients and the unknowns in the algebraic linear system below can be grouped in a
coefficient matrix and an unknown vector as follows,

x_1 + 2x_2 + x_3 = 1,
-3x_1 + x_2 + 3x_3 = 24,
x_2 - 4x_3 = -1,
\qquad \Rightarrow \qquad A = \begin{bmatrix} 1 & 2 & 1 \\ -3 & 1 & 3 \\ 0 & 1 & -4 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.

Once we have the notion of a matrix and a vector, we need to introduce an operation between
these two objects so we can write linear algebraic systems. The operation is the matrix-vector
product.

Definition 5.1.7. The matrix-vector product of an m × n matrix A and an n-vector x is an m-vector given by
Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{bmatrix}.

Remark: As we said above, this product is useful to express linear systems of algebraic equations
in terms of matrices and vectors.

Example 5.1.6. Find the matrix-vector products for the matrices A and vectors x in Exam-
ple 5.1.5(b).
Solution: For the matrix and vector in Example 5.1.5(b) we get
    
Ax = \begin{bmatrix} 1 & 2 & 1 \\ -3 & 1 & 3 \\ 0 & 1 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + 2x_2 + x_3 \\ -3x_1 + x_2 + 3x_3 \\ x_2 - 4x_3 \end{bmatrix}.
C

Example 5.1.7. Use the matrix-vector product to express the algebraic linear system below,
2x1 − x2 = 0,
−x1 + 2x2 = 3.

Solution: Let the coefficient matrix A, the unknown vector x, and the source vector b be
     
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 3 \end{bmatrix}.
Since the matrix-vector product Ax is given by
Ax = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 - x_2 \\ -x_1 + 2x_2 \end{bmatrix},
then we conclude that
2x_1 - x_2 = 0, \quad -x_1 + 2x_2 = 3 \quad \Leftrightarrow \quad \begin{bmatrix} 2x_1 - x_2 \\ -x_1 + 2x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \quad \Leftrightarrow \quad Ax = b.
C
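Remark: The matrix-vector product is a single operator in most numerical software. A minimal Python/NumPy sketch (not part of these notes) checking the identity Ax = b of Example 5.1.7 follows.

```python
# Hypothetical check of Example 5.1.7 (not from the text).
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
b = np.array([0.0, 3.0])
x = np.array([1.0, 2.0])     # the known solution x1 = 1, x2 = 2

print(A @ x)                 # [0. 3.]
print(np.allclose(A @ x, b)) # True, so Ax = b holds
```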

It is simple to see that the result found in the Example above can be generalized to every n × n
algebraic linear system.

Theorem 5.1.8 (Matrix Notation). The algebraic linear system
a_{11} x_1 + \cdots + a_{1n} x_n = b_1,
\vdots
a_{n1} x_1 + \cdots + a_{nn} x_n = b_n,
can be written as
Ax = b,
where the coefficient matrix A, the unknown vector x, and the source vector b are
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.

Proof of Theorem 5.1.8: From the definition of the matrix-vector product we have that
Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n \end{bmatrix}.
Then, we conclude that
a_{11}x_1 + \cdots + a_{1n}x_n = b_1, \quad \cdots, \quad a_{n1}x_1 + \cdots + a_{nn}x_n = b_n
\quad \Leftrightarrow \quad
\begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
\quad \Leftrightarrow \quad Ax = b.
This establishes the Theorem. □


The most interesting idea coming from the matrix picture of an algebraic linear system is that
a matrix can be thought as a function on the space of vectors. Indeed, the matrix-vector product
says that an m × n matrix is a function that takes an n-vector and produces another m-vector.

Example 5.1.8. Describe the action on R2 of the function given by the 2 × 2 matrix
 
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.    (5.1.6)

Solution: The action of this matrix on an arbitrary element x ∈ R2 is given below,


    
Ax = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}.
Therefore, this matrix interchanges the components x1 and x2 of the vector x. It can be seen in
the first picture in Fig. 13 that this action can be interpreted as a reflection on the plane along the
line x1 = x2 . C

Figure 13. Geometrical meaning of the function determined by the matrix in Eq. (5.1.6) and the matrix in Eq. (5.1.7), respectively.

Example 5.1.9. Describe the action on R2 of the function given by the 2 × 2 matrix
 
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.    (5.1.7)

Solution: The action of this matrix on an arbitrary element x ∈ R2 is given below,


    
Ax = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}.

In order to understand the action of this matrix, we give the following particular cases:
              
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}.
These cases are plotted in the second figure on Fig. 13, and the vectors are called x, y and z,
respectively. We therefore conclude that this matrix produces a ninety degree counterclockwise
rotation of the plane. C
We know that a scalar-valued function is a map that assigns to every number in a subset of R a unique number, also in R. We now know that an m × n matrix A is a function too, one that assigns to any n-vector x an m-vector y = Ax. Therefore, one can think of an m × n matrix as a generalization of the concept of function from the spaces R → R to the spaces R^n → R^m, for any positive integers n, m.
It is well-known how to define several operations on scalar-valued functions, like linear combi-
nations, compositions, and the inverse function. Therefore, it is reasonable to ask if these operation
on scalar-valued functions can be generalized as well to matrices. The answer is yes, and the study
of these and other operations is the subject of the rest of this Chapter.

5.1.5. Exercises.

5.1.1.- . 5.1.2.- .

5.2. Vector Spaces


In this section we develop the main concepts introduced in the column picture of an algebraic linear
system. From the column picture we obtained the idea of vectors, linear combination of vectors,
and the concept of a vector space. In this section we go deeper on this idea and we introduce the
notion of subspace of a vector space, the span of a set of vectors, the linear dependence of a set of
vectors, and a basis of either a vector space or a subspace.

5.2.1. Spaces and Subspaces. The idea of a vector space is the most important concept
coming from the column picture of an algebraic linear system, see § 5.1. We said that a vector
space is the set of all n-vectors, real or complex, with the operation of linear combination. Recall
the notation F ∈ {R, C}, meaning F is either R or C.

Definition 5.2.1. A vector space Fn over the field F is the set of all n-vectors such that for all
u, v ∈ Fn and all a, b ∈ F holds that (au + bv) ∈ Fn .

Remark: One can go way further than this. Actually, one can define a vector space as any set such that there is a linear combination defined for all the elements in the set. The linear combination operation means both the addition of two elements and the multiplication of an element by a number (stretching-compressing or reversing an element). What the actual elements are in a vector space is not important. What is important is the operation of linear combination. The elements in a vector space could be arrows in R^2 or R^3 or C^2 or C^3; or the elements could be m × n matrices; or the elements could be polynomial functions; or the elements could be continuous functions on a given domain. It is for this reason that the definition of a vector space is given in terms of the properties of the operation linear combination instead of the space elements. However, in these notes we restrict ourselves to the vector spaces F^n, and the definition above is enough for us.
A subspace is a region inside a vector space such that this region is itself a vector space. In
other words, a subspace is a smaller vector space inside the original vector space.

Definition 5.2.2. A subspace of the vector space Fn over the field F is a subset W ⊂ Fn such
that for all u, v ∈ W and all a, b ∈ F holds that
(au + bv) ∈ W.

Remarks:
(a) A subspace is a particular type of set in a vector space. It is a set where every possible linear combination of two elements in the set results in another element in the same set. In other words, elements outside the set cannot be reached by linear combinations of elements within the set. For this reason a subspace is said to be a set closed under linear combinations.
(b) It is simple to show that if W is a subspace, then 0 ∈ W . In other words, if 0 ∈ / W , then W
cannot be a subspace.
(c) Lines and planes containing the zero vector are subspaces of Fn .

Example 5.2.1. Show that the set W ⊂ R^3 given by W = \{ u = [u_i] ∈ R^3 : u_3 = 0 \} is a subspace of R^3.
Solution: Given two arbitrary elements u, v ∈ W we must show that au + bv ∈ W for all a, b ∈ R.
Since u, v ∈ W we know that
u = \begin{bmatrix} u_1 \\ u_2 \\ 0 \end{bmatrix}, \qquad v = \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix}.
Therefore
au + bv = \begin{bmatrix} au_1 + bv_1 \\ au_2 + bv_2 \\ 0 \end{bmatrix} \in W,

since the third component vanishes, which makes the linear combination an element in W . Hence,
W is a subspace of R3 . In Fig. 14 we see the plane u3 = 0. It is a subspace, since not only 0 ∈ W ,
but any linear combination of vectors on the plane stays on the plane. C


Figure 14. The horizontal plane u3 = 0 is a subspace of R3 .

Example 5.2.2. Show that the set W = \{ u = [u_i] ∈ R^2 : u_2 = 1 \} is not a subspace of R^2.




Solution: The set W is not a subspace, since 0 ∉ W. This is enough to show that W is not a subspace. Another proof is that the addition of two vectors in the set is a vector outside the set, as can be seen by the following calculation,
u = \begin{bmatrix} u_1 \\ 1 \end{bmatrix} \in W, \quad v = \begin{bmatrix} v_1 \\ 1 \end{bmatrix} \in W \quad \Rightarrow \quad u + v = \begin{bmatrix} u_1 + v_1 \\ 2 \end{bmatrix} \notin W.
The second component in the addition above is 2, not 1, hence this addition does not belong to W .
An example of this calculation is given in Fig. 15. C


Figure 15. The horizontal line u2 = 1 is not a subspace of R2 .

If a set is not a subspace, there is a way to increase it into a subspace including all possible
linear combinations of elements in the old set.

Definition 5.2.3. The span of set S = {u1 , · · · , uk } in a vector space Fn over F, is


Span(S) = {u ∈ Fn : u = c1 u1 + · · · + ck uk , c1 , · · · , ck ∈ F}.

The span of a finite set of vectors is the set of all possible linear combinations of the original vectors. The following result states that the span of a set is a subspace.

Theorem 5.2.4. Given a finite set S in a vector space Fn , Span(S) is a subspace of Fn .



Proof of Theorem 5.2.4: Span(S) is closed under linear combinations because it contains all
possible linear combinations of the elements in S. This establishes the Theorem. 

Example 5.2.3. Consider the following examples in R3 .


(a) The span of a single, nonzero, vector is the whole line containing the vector.
(b) The span of two parallel vectors is the whole line containing both vectors.
(c) The span of two non-parallel vectors is the whole plane containing both vectors.
(d) The span of three vectors lying on a plane is that same plane.
(e) The span of the vectors i = h1, 0, 0i, j = h0, 1, 0i, k = h0, 0, 1i, is the whole space R3 .
C

5.2.2. Linear Independence. We know that two vectors are parallel if one vector is pro-
portional to the other. If one tries to compute the span of these two vectors, one realizes that we
have redundant information. In fact,
Span({u, v = au}) = Span({u}),
since the second vector v does not bring any useful information to the span that is not already
contained in u, because v = a u. So, when it comes to compute the span of a set of vectors one
tries to eliminate all vectors that are not important. And that is why it is important to generalize
the notion of proportional vectors to a set of any number of vectors.

Definition 5.2.5. A finite set of vectors v1 , · · · , vk in a vector space is linearly dependent iff
there exists a set of scalars {c1 , · · · , ck }, not all of them zero, such that,
c1 v1 + · · · + ck vk = 0. (5.2.1)

The set v1 , · · · , vk is called linearly independent iff Eq. (5.2.1) implies that every scalar van-
ishes, c1 = · · · = ck = 0.

Remark: The wording in this definition is carefully chosen to cover the case of the empty set. The
result is that the empty set is linearly independent. It might seem strange, but this result fits well
with the rest of the theory. On the other hand, the set {0} is linearly dependent, since c1 0 = 0 for
any c1 6= 0. Moreover, any set containing the zero vector is also linearly dependent.

Example 5.2.4. Determine whether set S ⊂ R2 below is linearly dependent or independent,


S = \Big\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \ \begin{bmatrix} 2 \\ 3 \end{bmatrix} \Big\}.

Solution: It is clear that


             
\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad \Rightarrow \quad 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3 \begin{bmatrix} 0 \\ 1 \end{bmatrix} - \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
Since c1 = 2, c2 = 3, and c3 = −1 are non-zero, the set S is linearly dependent. C
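Remark: Linear dependence can also be detected with the rank of the matrix whose columns are the given vectors: the set is linearly independent exactly when that rank equals the number of vectors. A minimal Python/NumPy sketch (not part of these notes) applied to Example 5.2.4 follows.

```python
# Hypothetical rank test for linear dependence (not from the text).
import numpy as np

S = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])      # columns are the vectors of Example 5.2.4

rank = np.linalg.matrix_rank(S)
print(rank, S.shape[1])              # 2 3
print(rank == S.shape[1])            # False -> the set is linearly dependent
```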

A particular type of finite sets in a subspace that is small enough to be linearly independent
and big enough to span the whole subspace is called a basis of that subspace.

Definition 5.2.6. A basis of a subspace V ⊆ Fn is a finite set S ⊂ V such that S is linearly


independent and Span(S) = V .

Notice that the definition of basis also applies to the whole vector space, that is, the case when
V = Fn . The existence of a finite basis is the property that defines the size of the space.

Definition 5.2.7. A subspace V ⊆ Fn is finite dimensional iff V has a finite basis or V is one
of the following two extreme cases: V = ∅ or V = {0}.

Example 5.2.5. We now present several examples.
(a) Let V = R^2; then the set S_2 = \{ e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \} is a basis of R^2. Notice that e_i is the i-th column of the identity matrix I. This basis is called the standard basis of R^2.
(b) A vector space can have infinitely many bases. For example, a second basis for R^2 is the set U = \{ u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \ u_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \}. It is not difficult to verify that this set is a basis of R^2, since U is linearly independent, and Span(U) = R^2.
C

One can prove that the number of vectors in a basis of a vector space or subspace is always the
same as in any other basis of that space. Therefore, the number of elements in a basis of a vector
space or subspace is a characteristic of the space. We then give that characteristic number a name.

Definition 5.2.8. The dimension of a finite dimensional subspace V ⊆ Fn with a basis is denoted
as dim V and it is the number of elements in any basis of V . The extreme cases of V = ∅ and
V = {0} are defined as zero dimensional.

Example 5.2.6.
(a) Lines through the origin in R2 are subspaces of dimension 1.
(b) Lines through the origin in R3 are subspaces of dimension 1.
(c) Planes through the origin in R3 are subspaces of dimension 2.
(d) The vector space Fn has dimension n.
C

5.2.3. Exercises.

5.2.1.- . 5.2.2.- .

5.3. Matrix Algebra


In this section we develop the main concept from the matrix picture of an algebraic linear system.
In § 5.1 we discovered that a matrix is a function on vector spaces. More precisely, an m × n matrix
is a function from the vector space of n-vectors, Fn , to the vector space of m-vectors, Fm .
Since a matrix is a function, we can introduce operations on matrices just as we do it for regular
functions on real numbers. In this section we focus on square matrices and we show how to compute
linear combinations of matrices, the composition of matrices—also called matrix multiplication, the
trace, determinant, and the inverse of a square matrix. These operations will be needed later on
when we study systems of differential equations.
5.3.1. Matrix Operations. The first operation we study is the linear combination of ma-
trices. Recall that we define the linear combination of two scalar functions, say f and g, in the
following way: given arbitrary numbers a and b, the function (af + bg) is
(af + bg)(x) = a f (x) + b g(x),
for all x where both f and g are defined. We can do the same for matrices. The linear combination
of two m × n matrices, A and B, is the following: for all scalar numbers a and b we define the
matrix (aA + bB) as follows,
(aA + bB)x = a Ax + b Bx,
for all x ∈ Rn . We can see how this definition works in the particular case of 2 × 2 matrices.

Example 5.3.1. Compute the linear combination aA + bB for


   
A11 A12 B11 B12
A= , B= .
A21 A22 B21 B22

Solution: Let’s find the components of the linear combination matrix. On the one hand we have,
  
(aA + bB)11 (aA + bB)12 x1
(aA + bB)x = .
(aA + bB)21 (aA + bB)22 x2
On the other hand we have
(aA + bB)x = a Ax + b Bx
     
A11 A12 x1 B11 B12 x1
=a +b
A21 A22 x2 B21 B22 x2
   
aA11 x1 + aA12 x2 bB11 x1 + bB12 x2
= +
aA21 x1 + aA22 x2 bB21 x1 + bB22 x2
 
(aA11 + bB11 )x1 + (aA12 + bB12 )x2
= .
(aA21 + bB21 )x1 + (aA22 + bB22 )x2
Therefore, we obtain
  
(aA11 + bB11 ) (aA12 + bB12 ) x1
(aA + bB)x = .
(aA21 + bB21 ) (aA22 + bB22 ) x2
Since the calculation above holds for all x ∈ R2 , we get a formula for the matrix representing the
linear combination
   
(aA + bB)11 (aA + bB)12 (aA11 + bB11 ) (aA12 + bB12 )
= .
(aA + bB)21 (aA + bB)22 (aA21 + bB21 ) (aA22 + bB22 )
which means that
(
(aA + bB)11 = (aA11 + bB11 ), (aA + bB)12 = (aA12 + bB12 ),
(aA + bB)21 = (aA21 + bB21 ), (aA + bB)22 = (aA22 + bB22 ).
One can write all these four equations in one line, using component notation, saying that for all
i, j = 1, 2 holds
(aA + bB)ij = a Aij + b Bij .
C

The calculation done in the example above for 2 × 2 matrices can be easily generalized to m × n
matrices. For these arbitrary matrices it is important to use index notation, to keep the equations
small enough. For this reason we introduce the component notation for matrices and vectors. We
denote an m × n matrix by A = [Aij ], where Aij are the components of matrix A, with i = 1, · · · , m
and j = 1, · · · , n. Analogously, an n-vector is denoted by v = [vj ], where vj are the components of
the vector v. We also introduce the notation F ∈ {R, C}, that is, the set F can be either the real or
the complex numbers. Finally, the space of all m × n matrices, real or complex, is called Fm,n .

Definition 5.3.1. Let A = [Aij ] and B = [Bij ] be m × n matrices in Fm,n and a, b be numbers in
F. The linear combination of A and B is also an m × n matrix in Fm,n , denoted as a A + b B,
and given by
a A + b B = [a Aij + b Bij ].

The particular case where a = b = 1 corresponds to the addition of two matrices, and the particular
case of b = 0 corresponds to the multiplication of a matrix by a number, that is,
A + B = [Aij + Bij ], a A = [a Aij ].
   
Example 5.3.2. Find A + B, where A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, B = \begin{bmatrix} 2 & 3 \\ 5 & 1 \end{bmatrix}.
Solution: The addition of two equal-size matrices is performed component-wise:
A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 2 & 3 \\ 5 & 1 \end{bmatrix} = \begin{bmatrix} (1+2) & (2+3) \\ (3+5) & (4+1) \end{bmatrix} = \begin{bmatrix} 3 & 5 \\ 8 & 5 \end{bmatrix}. C
   
Example 5.3.3. Find A + B, where A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.
Solution: The matrices have different sizes, so their addition is not defined. C
 
Example 5.3.4. Find 2A and A/3, where A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.
Solution: The multiplication of a matrix by a number is done component-wise, therefore
2A = 2 \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 6 & 10 \\ 4 & 8 & 12 \end{bmatrix}, \qquad \frac{A}{3} = \frac{1}{3} \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 1/3 & 1 & 5/3 \\ 2/3 & 4/3 & 2 \end{bmatrix}. C
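Remark: These entrywise operations are exactly what numerical libraries implement. A short Python/NumPy sketch (not part of these notes) reproducing Examples 5.3.2 and 5.3.4 is shown below.

```python
# Hypothetical check of matrix linear combinations (not from the text).
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [2.0, 4.0, 6.0]])
print(2 * A)       # [[ 2.  6. 10.] [ 4.  8. 12.]]
print(A / 3)       # entrywise division by 3

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
C = np.array([[2.0, 3.0],
              [5.0, 1.0]])
print(B + C)       # [[3. 5.] [8. 5.]], as in Example 5.3.2
# Adding matrices of different sizes, as in Example 5.3.3, raises an error.
```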

Since matrices are generalizations of scalar-valued functions, one can define operations on
matrices that, unlike linear combinations, have no analogs on scalar-valued functions. One of
such operations is the transpose of a matrix, which is a new matrix with the rows and columns
interchanged.
Definition 5.3.2. The transpose of a matrix A = [A_{ij}] ∈ F^{m,n} is the matrix denoted as A^T = [(A^T)_{kl}] ∈ F^{n,m}, with its components given by (A^T)_{kl} = A_{lk}.
 
Example 5.3.5. Find the transpose of the 2 × 3 matrix A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.
Solution: Matrix A has components A_{ij} with i = 1, 2 and j = 1, 2, 3. Therefore, its transpose has components (A^T)_{ji} = A_{ij}, that is, A^T has three rows and two columns,
A^T = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.
C

If a matrix has complex-valued coefficients, then the conjugate of a matrix can be defined as
the conjugate of each component.

Definition 5.3.3. The complex conjugate of a matrix A = [Aij ] ∈ Fm,n is the matrix
\overline{A} = \big[\,\overline{A_{ij}}\,\big] ∈ F^{m,n}.
 

Example 5.3.6. A matrix A and its conjugate is given below,


   
A = \begin{bmatrix} 1 & 2+i \\ -i & 3-4i \end{bmatrix} \quad \Leftrightarrow \quad \overline{A} = \begin{bmatrix} 1 & 2-i \\ i & 3+4i \end{bmatrix}.
C

Example 5.3.7. A matrix A has real coefficients iff A = A; It has purely imaginary coefficients iff
A = −A. Here are examples of these two situations:
   
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \Rightarrow \quad \overline{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = A; \qquad A = \begin{bmatrix} i & 2i \\ 3i & 4i \end{bmatrix} \quad \Rightarrow \quad \overline{A} = \begin{bmatrix} -i & -2i \\ -3i & -4i \end{bmatrix} = -A.
C

Definition 5.3.4. The adjoint of a matrix A ∈ F^{m,n} is the matrix
A^* = \overline{A}^{\,T} ∈ F^{n,m}.

Remark: Since \overline{(A^T)} = (\overline{A})^T, the order of the operations does not change the result; that is why there is no parenthesis in the definition of A^*.

Example 5.3.8. A matrix A and its adjoint is given below,


   
A = \begin{bmatrix} 1 & 2+i \\ -i & 3-4i \end{bmatrix} \quad \Leftrightarrow \quad A^* = \begin{bmatrix} 1 & i \\ 2-i & 3+4i \end{bmatrix}.
C
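Remark: The transpose, conjugate and adjoint have direct counterparts in numerical software. The Python/NumPy sketch below (not part of these notes) reproduces Example 5.3.8.

```python
# Hypothetical check of transpose, conjugate and adjoint (not from the text).
import numpy as np

A = np.array([[1, 2 + 1j],
              [-1j, 3 - 4j]])

print(A.T)          # transpose
print(A.conj())     # complex conjugate
print(A.conj().T)   # adjoint A* (conjugate transpose), as in Example 5.3.8
```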

The transpose, conjugate and adjoint operations are useful to specify certain classes of matrices
with particular symmetries. Here we introduce few of these classes.

Definition 5.3.5. An n × n matrix A is called:


(a) symmetric iff A = AT ;
(b) skew-symmetric iff A = −AT ;
(c) Hermitian iff A = A∗ ;
(d) skew-Hermitian iff A = −A∗ .

Example 5.3.9. We present examples of each of the classes introduced in Def. 5.3.5.
Part (a): Matrices A and B are symmetric. Notice that A is also Hermitian, while B is not
Hermitian,
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 7 & 4 \\ 3 & 4 & 8 \end{bmatrix} = A^T, \qquad B = \begin{bmatrix} 1 & 2+3i & 3 \\ 2+3i & 7 & 4i \\ 3 & 4i & 8 \end{bmatrix} = B^T.
Part (b): Matrix C is skew-symmetric,
   
C = \begin{bmatrix} 0 & -2 & 3 \\ 2 & 0 & -4 \\ -3 & 4 & 0 \end{bmatrix} \quad \Rightarrow \quad C^T = \begin{bmatrix} 0 & 2 & -3 \\ -2 & 0 & 4 \\ 3 & -4 & 0 \end{bmatrix} = -C.
Notice that the diagonal elements in a skew-symmetric matrix must vanish, since Cij = −Cji in
the case i = j means Cii = −Cii , that is, Cii = 0.

Part (c): Matrix D is Hermitian but is not symmetric:


   
D = \begin{bmatrix} 1 & 2+i & 3 \\ 2-i & 7 & 4+i \\ 3 & 4-i & 8 \end{bmatrix} \quad \Rightarrow \quad D^T = \begin{bmatrix} 1 & 2-i & 3 \\ 2+i & 7 & 4-i \\ 3 & 4+i & 8 \end{bmatrix} \neq D,
however,
D^* = \overline{D^T} = \begin{bmatrix} 1 & 2+i & 3 \\ 2-i & 7 & 4+i \\ 3 & 4-i & 8 \end{bmatrix} = D.
Notice that the diagonal elements in a Hermitian matrix must be real numbers, since the condition A_{ij} = \overline{A_{ji}} in the case i = j implies A_{ii} = \overline{A_{ii}}, that is, 2i\,\mathrm{Im}(A_{ii}) = A_{ii} − \overline{A_{ii}} = 0. We can also verify what we said in part (a): matrix A is Hermitian, since A^* = \overline{A}^{\,T} = A^T = A.
Part (d): The following matrix E is skew-Hermitian:
   
E = \begin{bmatrix} i & 2+i & -3 \\ -2+i & 7i & 4+i \\ 3 & -4+i & 8i \end{bmatrix} \quad \Rightarrow \quad E^T = \begin{bmatrix} i & -2+i & 3 \\ 2+i & 7i & -4+i \\ -3 & 4+i & 8i \end{bmatrix},
therefore,
E^* = \overline{E^T} = \begin{bmatrix} -i & -2-i & 3 \\ 2-i & -7i & -4-i \\ -3 & 4-i & -8i \end{bmatrix} = -E.
A skew-Hermitian matrix has purely imaginary elements in its diagonal, and the off diagonal ele-
ments have skew-symmetric real parts with symmetric imaginary parts. C

Another operation on matrices that has no analog on scalar functions is the trace of a square
matrix. The trace is a number—the sum of all the diagonal elements of the matrix.

Definition 5.3.6. The trace of a square matrix A = Aij ∈ Fn,n , denoted as tr (A) ∈ F, is the
 

sum of its diagonal elements, that is, the scalar given by


tr (A) = A11 + · · · + Ann .

 
Example 5.3.10. Find the trace of the matrix A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
Solution: We only have to add up the diagonal elements:
tr (A) = 1 + 5 + 9 ⇒ tr (A) = 15.
C

The operation of matrix multiplication originates in the composition of functions. We call it


matrix multiplication instead of matrix composition because it reduces to the multiplication of real
numbers in the case of 1 × 1 real matrices. Unlike the multiplication of real numbers, the product of
general matrices is not commutative, that is, AB 6= BA in the general case. This property comes
from the fact that the composition of two functions is a non-commutative operation.

Definition 5.3.7. The matrix multiplication of the m × n matrix A = [Aij ] and the n × ` matrix
B = [Bjk ], where i = 1, · · · , m, j = 1, · · · , n and k = 1, · · · , `, is the m × ` matrix AB given by
(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.    (5.3.1)

The product is not defined for two arbitrary matrices, since the sizes of the matrices matter: the number of columns in the first matrix must match the number of rows in the second matrix. An m × n matrix A times an n × ℓ matrix B defines an m × ℓ matrix AB.
   
Example 5.3.11. Compute AB, where A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} and B = \begin{bmatrix} 3 & 0 \\ 2 & -1 \end{bmatrix}.
Solution: The component (AB)_{11} = 4 is obtained from the first row in matrix A and the first column in matrix B as follows:
\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 1 & -2 \end{bmatrix}, \qquad (2)(3) + (-1)(2) = 4.
The component (AB)_{12} = 1 is obtained from the first row of A and the second column of B, (2)(0) + (-1)(-1) = 1. The component (AB)_{21} = 1 is obtained from the second row of A and the first column of B, (-1)(3) + (2)(2) = 1. And finally, the component (AB)_{22} = -2 is obtained from the second row of A and the second column of B, (-1)(0) + (2)(-1) = -2.
C
   
Example 5.3.12. Compute BA, where A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} and B = \begin{bmatrix} 3 & 0 \\ 2 & -1 \end{bmatrix}.
Solution: We find that BA = \begin{bmatrix} 6 & -3 \\ 5 & -4 \end{bmatrix}. Notice that in this case AB ≠ BA. C

We can see from the previous two examples that the matrix product is not commutative, since
we have computed both AB and BA and AB 6= BA. It is even more interesting to note that
sometimes the matrix multiplication of matrices A and B may be defined for AB but it may not
be even defined for BA.
   
Example 5.3.13. Compute AB and BA, where A = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.
Solution: The product AB is
AB = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 16 & 23 & 30 \\ 6 & 9 & 12 \end{bmatrix}.
The product BA is not possible. C
   
Example 5.3.14. Compute AB and BA, where A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} and B = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}.
Solution: We find that
AB = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, \qquad BA = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.

Remarks:
(a) Notice that in this case AB ≠ BA.
(b) Notice that BA = 0 but A ≠ 0 and B ≠ 0.
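Remark: The sketch below (not part of these notes; it assumes NumPy) reproduces Example 5.3.14 and shows numerically that matrix multiplication is not commutative.

```python
# Hypothetical check of Example 5.3.14 (not from the text).
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 2.0]])
B = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])

print(A @ B)   # [[ 1. -1.] [ 1. -1.]]
print(B @ A)   # [[ 0.  0.] [ 0.  0.]]  -> BA = 0 although A and B are nonzero
```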

The matrix multiplication can be written in terms of matrix-vector products.

Theorem 5.3.8. The matrix multiplication of an m × n matrix A with an n × ℓ matrix B = [b_1, \cdots, b_ℓ], where the b_i, for i = 1, \cdots, ℓ, are n-vectors, can be written as
AB = [A b_1, \cdots, A b_ℓ].

Proof of Theorem 5.3.8: If we write the matrix B in components and as column vectors,
B = \begin{bmatrix} B_{11} & \cdots & B_{1ℓ} \\ \vdots & & \vdots \\ B_{n1} & \cdots & B_{nℓ} \end{bmatrix} = \begin{bmatrix} (b_1)_1 & \cdots & (b_ℓ)_1 \\ \vdots & & \vdots \\ (b_1)_n & \cdots & (b_ℓ)_n \end{bmatrix},
then the matrix multiplication formula implies
(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = \sum_{k=1}^{n} A_{ik} (b_j)_k = (A b_j)_i,
that is, (AB)_{ij} is the component i of the vector A b_j. We then conclude that AB = [A b_1, \cdots, A b_ℓ]. This establishes the Theorem. □

5.3.2. The Inverse Matrix. Since matrices are functions, one can try to find the inverse of
a given matrix. We show how this can be done for 2 × 2 matrices.
 
Example 5.3.15. Find, when possible, the inverse of a general 2 × 2 matrix A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
 
Solution: The matrix A defines a function by which every vector x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} is mapped to a vector y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, as follows,
Ax = y \quad \Rightarrow \quad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad \Rightarrow \quad a x_1 + b x_2 = y_1, \quad c x_1 + d x_2 = y_2.
The calculation we should do depends on which coefficients of matrix A are nonzero. Since A has four coefficients, we have four possibilities. Here we consider only the case b ≠ 0. The other cases are left as an exercise; when one studies them, one finds that the final conclusion is the same in all cases.
So, consider the case b ≠ 0. Then, from the first equation we get x_2 in terms of x_1,
x_2 = \frac{1}{b}(-a x_1 + y_1).
If we put this expression for x_2 in the second equation above we get
c x_1 + \frac{d}{b}(-a x_1 + y_1) = y_2 \quad \Rightarrow \quad (ad - bc)\, x_1 = d y_1 - b y_2.
Now we need to focus on ∆ = ad − bc. If ∆ = 0, we cannot compute x_1. So we assume that the coefficients of matrix A are such that ∆ ≠ 0. In this case we get
x_1 = \frac{1}{∆}(d y_1 - b y_2).
Using the expression for x_1 in the formula for x_2 we get
x_2 = \frac{1}{∆}(-c y_1 + a y_2).

These formulas for x_1, x_2 say that
x_1 = \frac{1}{∆}(d y_1 - b y_2), \quad x_2 = \frac{1}{∆}(-c y_1 + a y_2) \quad \Rightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.
In matrix notation we can summarize our calculations as follows. The system
Ax = y, \quad \text{with} \quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad ∆ = ad - bc ≠ 0,
has the solution
x = A^{-1} y \quad \text{with} \quad A^{-1} = \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
The matrix A^{-1} is called the inverse of matrix A. The inverse exists if and only if the number ∆ = ad − bc ≠ 0. C

The calculation in the example above is the proof of the following result.

Theorem 5.3.9. Given a 2 × 2 matrix A, introduce the number ∆ as follows,
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad ∆ = ad - bc.
The matrix A is invertible iff ∆ ≠ 0. Furthermore, if A is invertible, its inverse is given by
A^{-1} = \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.    (5.3.2)

The number ∆ is called the determinant of A, since it is the number that determines whether A
is invertible or not. Notice that the inverse matrix we found above satisfies the following property.

Theorem 5.3.10. Given a 2 × 2 invertible matrix A, its inverse matrix satisfies
(A^{-1}) A = I, \qquad A (A^{-1}) = I, \qquad \text{where} \quad I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Remark: The matrix I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} is called the 2 × 2 identity matrix, since it satisfies Ix = x for all x ∈ R^2.
Proof of Theorem 5.3.10: Let's write matrix A as
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
Since A is invertible, ∆ = ad − bc ≠ 0. Then the inverse matrix is
A^{-1} = \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
Now a straightforward calculation says that
A^{-1} A = \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
It is also straightforward to check that
A A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \frac{1}{∆} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
This establishes the Theorem. □

Example 5.3.16. Verify that the matrix and its inverse are given by
A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}, \qquad A^{-1} = \frac{1}{4} \begin{bmatrix} 3 & -2 \\ -1 & 2 \end{bmatrix}.
Solution: We have to compute the products,
A \, A^{-1} = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix} \frac{1}{4} \begin{bmatrix} 3 & -2 \\ -1 & 2 \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix} \quad \Rightarrow \quad A \, A^{-1} = I_2.
It is simple to check that the equation A^{-1} A = I_2 also holds.

C
 
Example 5.3.17. Compute the inverse of the matrix A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}, given in Example 5.3.16.
Solution: Following Theorem 5.3.9 we first compute ∆ = 6 − 2 = 4. Since ∆ ≠ 0, the inverse A^{-1} exists and it is given by
A^{-1} = \frac{1}{4} \begin{bmatrix} 3 & -2 \\ -1 & 2 \end{bmatrix}.
C
 
Example 5.3.18. Compute the inverse of the matrix A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}.
Solution: Following Theorem 5.3.9 we first compute ∆ = 6 − 6 = 0. Since ∆ = 0, then matrix A
is not invertible. C
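Remark: Determinants and inverses of small matrices can be checked numerically. The Python/NumPy sketch below (not part of these notes) reproduces Examples 5.3.17 and 5.3.18.

```python
# Hypothetical check of Examples 5.3.17 and 5.3.18 (not from the text).
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])
print(np.linalg.det(A))     # approximately 4.0, nonzero, so A is invertible
print(np.linalg.inv(A))     # [[ 0.75 -0.5 ] [-0.25  0.5 ]] = (1/4)[[3,-2],[-1,2]]

B = np.array([[1.0, 2.0],
              [3.0, 6.0]])
print(np.linalg.det(B))     # 0.0, so B is not invertible
# np.linalg.inv(B)          # would raise a LinAlgError (singular matrix)
```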

The matrix operations we have introduced are useful to solve matrix equations, where the
unknown is a matrix. We now show an example of a matrix equation.

Example 5.3.19. Find a matrix X such that AXB = I, where


     
A = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
Solution: There are many ways to solve a matrix equation. We choose to multiply the equation by the inverses of the matrices A and B, if they exist. So first we check whether A is invertible. But
det(A) = \begin{vmatrix} 1 & 3 \\ 2 & 1 \end{vmatrix} = 1 - 6 = -5 ≠ 0,
so A is indeed invertible. Regarding matrix B we get
det(B) = \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = 4 - 1 = 3 ≠ 0,
so B is also invertible. We then compute their inverses,
A^{-1} = \frac{1}{-5} \begin{bmatrix} 1 & -3 \\ -2 & 1 \end{bmatrix}, \qquad B^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.
We can now compute X,
AXB = I \quad \Rightarrow \quad A^{-1}(AXB)B^{-1} = A^{-1} I B^{-1} \quad \Rightarrow \quad X = A^{-1} B^{-1}.
Therefore,
X = \frac{1}{-5} \begin{bmatrix} 1 & -3 \\ -2 & 1 \end{bmatrix} \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} = -\frac{1}{15} \begin{bmatrix} 5 & -7 \\ -5 & 4 \end{bmatrix},
so we obtain
X = \begin{bmatrix} -1/3 & 7/15 \\ 1/3 & -4/15 \end{bmatrix}.
C
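Remark: The matrix equation of Example 5.3.19 can also be solved and verified numerically, as in the sketch below (not part of these notes; it assumes NumPy).

```python
# Hypothetical check of Example 5.3.19 (not from the text).
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])

X = np.linalg.inv(A) @ np.linalg.inv(B)   # X = A^{-1} B^{-1}
print(X)                                  # approx [[-0.333  0.467] [ 0.333 -0.267]]
print(np.allclose(A @ X @ B, np.eye(2)))  # True
```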

The concept of the inverse matrix can be generalized from 2 × 2 matrices to n × n matrices.
One generalizes first the concept of the identity matrix.

Definition 5.3.11. The matrix I ∈ Fn,n is the identity matrix iff Ix = x for all x ∈ Fn .

It is simple to see that the components of the n × n identity matrix are
I = [I_{ij}] \quad \text{with} \quad I_{ii} = 1, \quad I_{ij} = 0 \ \text{for} \ i ≠ j.
The cases n = 2, 3 are given by
I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Once we have the n × n identity matrix we can define the inverse of an n × n matrix.

Definition 5.3.12. A matrix A ∈ Fn,n is called invertible iff there exists a matrix, denoted as
A−1 , such that
A−1 A = In and A A−1 = In .
 

Similarly to the 2 × 2 case, not every n × n matrix is invertible. It turns out one can define a
number computed out of the matrix components that determines whether the matrix is invertible
or not. That number is called the determinant, and we already introduced it for 2 × 2 matrices.
We now give a formula for the determinant of a 3 × 3 matrix, and we mention that this number
can be also defined on n × n matrices.
5.3.3. Overview of Determinants. A determinant is a number computed from a square
matrix that gives important information about the matrix, for example if the matrix is invertible
or not. We have already defined the determinant for 2 × 2 matrices.
 
Definition 5.3.13. The determinant of a 2 × 2 matrix A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} is given by
det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}.

Remark: The absolute value of the determinant of a 2 × 2 matrix


 
A = a1 , a2
has a geometrical meaning—it is the area of the parallelogram whose sides are given by the columns
of the matrix A, as shown in Fig. 16.


Figure 16. Geometrical meaning of the determinant of a 2 × 2 matrix.



It turns out that this geometrical meaning of the determinant is crucial to extend the definition
of the determinant from 2 × 2 to 3 × 3 matrices. The result is given in the following definition.
 
Definition 5.3.14. The determinant of a 3 × 3 matrix A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} is
det(A) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.

Remarks:
(1) This is a recursive definition. The determinant of a 3 × 3 matrix is written in terms of
three determinants of 2 × 2 matrices.
(2) The absolute value of the determinant of a 3 × 3 matrix
 
A = a1 , a2 , a3
is the volume of the parallelepiped formed by the columns of matrix A, as pictured in
Fig. 17.


Figure 17. Geometrical meaning of the determinant of a 3 × 3 matrix.

Example 5.3.20. The following three examples show that the determinant can be a negative, zero
or positive number.

\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 4 - 6 = -2, \qquad \begin{vmatrix} 2 & 1 \\ 3 & 4 \end{vmatrix} = 8 - 3 = 5, \qquad \begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} = 4 - 4 = 0.

The following example shows how to compute the determinant of a 3 × 3 matrix,
\begin{vmatrix} 1 & 3 & -1 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{vmatrix} = (1) \begin{vmatrix} 1 & 1 \\ 2 & 1 \end{vmatrix} - 3 \begin{vmatrix} 2 & 1 \\ 3 & 1 \end{vmatrix} + (-1) \begin{vmatrix} 2 & 1 \\ 3 & 2 \end{vmatrix} = (1 - 2) - 3(2 - 3) - (4 - 3) = -1 + 3 - 1 = 1. C

Remark: The determinant of upper or lower triangular matrices is the product of the diagonal
coefficients.

Example 5.3.21. Compute the determinant of a 3 × 3 upper triangular matrix.


Solution:
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ 0 & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} 0 & a_{23} \\ 0 & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} 0 & a_{22} \\ 0 & 0 \end{vmatrix} = a_{11} a_{22} a_{33}.
C

Remark: Recall that one can prove that a 3 × 3 matrix is invertible if and only if its determinant is nonzero.
 
Example 5.3.22. Is the matrix A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \\ 3 & 7 & 9 \end{bmatrix} invertible?
Solution: We only need to compute the determinant of A.

det(A) = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \\ 3 & 7 & 9 \end{vmatrix} = (1) \begin{vmatrix} 5 & 7 \\ 7 & 9 \end{vmatrix} - (2) \begin{vmatrix} 2 & 7 \\ 3 & 9 \end{vmatrix} + (3) \begin{vmatrix} 2 & 5 \\ 3 & 7 \end{vmatrix}.
Using the definition of determinant for 2 × 2 matrices we obtain
det(A) = (45 − 49) − 2(18 − 21) + 3(14 − 15) = −4 + 6 − 3 = −1.
Since det(A) = −1, that is, non-zero, matrix A is invertible. C
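Remark: The invertibility test of Example 5.3.22 takes one line in numerical software, as the sketch below (not part of these notes; it assumes NumPy) shows.

```python
# Hypothetical check of Example 5.3.22 (not from the text).
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 7.0],
              [3.0, 7.0, 9.0]])

d = np.linalg.det(A)
print(d)                         # approximately -1.0
print(not np.isclose(d, 0.0))    # True -> nonzero determinant, so A is invertible
```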

The determinant of an n × n matrix can be defined generalizing the properties that areas of
parallelograms have in two dimensions and volumes of parallelepipeds have in three dimensions.
One can find a recursive formula for the determinant of an n × n matrix in terms of n determinants
of (n − 1) × (n − 1) matrices. And the determinant so defined has its most important property—an
n × n matrix is invertible if and only if its determinant is nonzero.

5.3.4. Exercises.

5.3.1.- . 5.3.2.- .

5.4. Eigenvalues and Eigenvectors


In this section we keep our attention on matrices. We know that a matrix is a function on vector
spaces—a matrix acts on a vector and the result is another vector. In this section we see that,
given an n × n matrix, there may exist subspaces in Fn that are left invariant under the action of
the matrix. This means that such a matrix acting on any vector in these subspaces is a vector on
the same subspace. The vector is called an eigenvector of the matrix, the proportionality factor is
called an eigenvalue, and the subspace is called an eigenspace.

5.4.1. Computing Eigenpairs. When a square matrix acts on a vector the result is another
vector that, more often than not, points in a different direction from the original vector. However,
there may exist vectors whose direction is not changed by the matrix. These will be important for
us, so we give them a name.

Definition 5.4.1. A number λ and a nonzero n-vector v are an eigenvalue with corresponding
eigenvector (eigenpair) of an n × n matrix A iff they satisfy the equation
Av = λv.

Remark: We see that an eigenvector v determines a particular direction in the space Rn , given
by (av) for a ∈ R, that remains invariant under the action of the matrix A. That is, the result of
matrix A acting on any vector (av) on the line determined by v is again a vector on the same line,
since
A(av) = a Av = aλv = λ(av).

Example 5.4.1. Verify that the pair λ_1, v_1 and the pair λ_2, v_2 are eigenvalue and eigenvector pairs of matrix A given below,
A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}, \qquad λ_1 = 4, \ v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad λ_2 = -2, \ v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.

Solution: We just have to verify the definition of eigenvalue and eigenvector given above. We start with the first pair,
A v_1 = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = λ_1 v_1 \quad \Rightarrow \quad A v_1 = λ_1 v_1.
A similar calculation for the second pair implies,
A v_2 = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix} = -2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = λ_2 v_2 \quad \Rightarrow \quad A v_2 = λ_2 v_2.
C
 
Example 5.4.2. Find the eigenvalues and eigenvectors of the matrix A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
Solution: This is the matrix given in Example 5.1.8. The action of this matrix on the plane is a
reflection along the line x1 = x2 , as it was shown in Fig. 13. Therefore, this line x1 = x2 is left
invariant under the action of this matrix. This property suggests that an eigenvector is any vector
on that line, for example
      
v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \Rightarrow \quad λ_1 = 1.
So, we have found one eigenvalue-eigenvector pair: λ_1 = 1, with v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. We remark that any nonzero vector proportional to v_1 is also an eigenvector. Another choice of eigenvalue-eigenvector

 
pair is λ_1 = 1 with v_1 = \begin{bmatrix} 3 \\ 3 \end{bmatrix}. It is not so easy to find a second eigenvector which does not belong to the line determined by v_1. One way to find such an eigenvector is noticing that the line perpendicular to the line x_1 = x_2 is also left invariant by matrix A. Therefore, any nonzero vector on that line must be an eigenvector. For example, the vector v_2 below, since
v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = (-1) \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \Rightarrow \quad λ_2 = -1.
So, we have found a second eigenvalue-eigenvector pair: λ_2 = −1, with v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. These two eigenvectors are displayed in Fig. 18. C

Figure 18. The first picture shows the eigenvalues and eigenvectors of the matrix in Example 5.4.2. The second picture shows that the matrix in Example 5.4.3 makes a counterclockwise rotation by an angle θ, which proves that this matrix does not have real eigenvalues or eigenvectors.

There exist matrices that do not have real eigenvalues and eigenvectors, as is shown in the example below.
 
Example 5.4.3. Fix any number θ ∈ (0, 2π), with θ ≠ π, and define the matrix A = \begin{bmatrix} \cos(θ) & -\sin(θ) \\ \sin(θ) & \cos(θ) \end{bmatrix}. Show that A has no real eigenvalues.
Solution: One can compute the action of matrix A on several vectors and verify that the action of this matrix on the plane is a counterclockwise rotation by an angle θ, as shown in Fig. 18. A particular case of this matrix was shown in Example 5.1.9, where θ = π/2. Since eigenvectors of a matrix determine directions which are left invariant by the action of the matrix, and a rotation by an angle θ ∈ (0, 2π) with θ ≠ π has no such invariant directions, we conclude that the matrix A above does not have real eigenvectors, and so it does not have real eigenvalues either. C

Remark: We will show that matrix A in Example 5.4.3 has complex-valued eigenvalues.
We now describe a method to find eigenvalue-eigenvector pairs of a matrix, if they exist. In
other words, we are going to solve the eigenvalue-eigenvector problem: Given an n × n matrix A
find, if possible, all its eigenvalues and eigenvectors, that is, all pairs λ and v 6= 0 solutions of the
equation
Av = λv.
This problem is more complicated than finding the solution x to a linear system Ax = b, where A
and b are known. In the eigenvalue-eigenvector problem above neither λ nor v are known. To solve
the eigenvalue-eigenvector problem for a matrix A we proceed as follows:

(a) First, find the eigenvalues λ;


(b) Second, for each eigenvalue λ, find the corresponding eigenvectors v.
The following result summarizes a way to solve the steps above.

Theorem 5.4.2 (Eigenvalues-Eigenvectors).


(a) All the eigenvalues λ of an n × n matrix A are the solutions of
det(A − λI) = 0. (5.4.1)
(b) Given an eigenvalue λ of an n × n matrix A, the corresponding eigenvectors v are the nonzero
solutions to the homogeneous linear system
(A − λI)v = 0. (5.4.2)

Proof of Theorem 5.4.2: The number λ and the nonzero vector v are an eigenvalue-eigenvector
pair of matrix A iff holds
Av = λv ⇔ (A − λI)v = 0,
where I is the n × n identity matrix. Since v 6= 0, the last equation above says that the columns
of the matrix (A − λI) are linearly dependent. This last property is equivalent, by Theorem ??, to
the equation
det(A − λI) = 0,
which is the equation that determines the eigenvalues λ. Once this equation is solved, substitute
each solution λ back into the original eigenvalue-eigenvector equation
(A − λI)v = 0.
Since λ is known, this is a linear homogeneous system for the eigenvector components. It always
has nonzero solutions, since λ is precisely the number that makes the coefficient matrix (A − λI)
not invertible. This establishes the Theorem. 
 
Example 5.4.4. Find the eigenvalues λ and eigenvectors v of the matrix A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}.
Solution: We first find the eigenvalues as the solutions of Eq. (5.4.1). Compute
A - λI = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} - λ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} - \begin{bmatrix} λ & 0 \\ 0 & λ \end{bmatrix} = \begin{bmatrix} (1-λ) & 3 \\ 3 & (1-λ) \end{bmatrix}.
Then we compute its determinant,
0 = det(A - λI) = \begin{vmatrix} (1-λ) & 3 \\ 3 & (1-λ) \end{vmatrix} = (λ - 1)^2 - 9 \quad \Rightarrow \quad λ_{+} = 4, \quad λ_{-} = -2.
We have obtained two eigenvalues, so now we introduce λ_{+} = 4 into Eq. (5.4.2), that is,
A - 4I = \begin{bmatrix} 1-4 & 3 \\ 3 & 1-4 \end{bmatrix} = \begin{bmatrix} -3 & 3 \\ 3 & -3 \end{bmatrix}.
Then we solve for v^{+} the equation
(A - 4I)v^{+} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} -3 & 3 \\ 3 & -3 \end{bmatrix} \begin{bmatrix} v_1^{+} \\ v_2^{+} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
The solution can be found using Gauss elimination operations, as follows,
\begin{bmatrix} -3 & 3 \\ 3 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 \\ 3 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{+} = v_2^{+}, \quad v_2^{+} \ \text{free}.
All solutions to the equation above are then given by
v^{+} = \begin{bmatrix} v_2^{+} \\ v_2^{+} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} v_2^{+} \quad \Rightarrow \quad v^{+} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},

where we have chosen v_2^{+} = 1. A similar calculation provides the eigenvector v^{-} associated with the eigenvalue λ_{-} = -2; that is, first compute the matrix
A + 2I = \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix},
then we solve for v^{-} the equation
(A + 2I)v^{-} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} v_1^{-} \\ v_2^{-} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
The solution can be found using Gauss elimination operations, as follows,
\begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 \\ 3 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{-} = -v_2^{-}, \quad v_2^{-} \ \text{free}.
All solutions to the equation above are then given by
v^{-} = \begin{bmatrix} -v_2^{-} \\ v_2^{-} \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} v_2^{-} \quad \Rightarrow \quad v^{-} = \begin{bmatrix} -1 \\ 1 \end{bmatrix},
where we have chosen v_2^{-} = 1. We therefore conclude that the eigenvalues and eigenvectors of the matrix A above are given by
λ_{+} = 4, \ v^{+} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad λ_{-} = -2, \ v^{-} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
C
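Remark: Eigenpairs can be computed numerically as well. The Python/NumPy sketch below (not part of these notes) reproduces Example 5.4.4; note that the routine returns eigenvectors normalized to unit length and in no guaranteed order, so they are only proportional to the vectors found above.

```python
# Hypothetical numerical check of Example 5.4.4 (not from the text).
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 1.0]])

w, v = np.linalg.eig(A)   # eigenvalues in w, eigenvectors in the columns of v
print(w)                  # e.g. [ 4. -2.]
print(v)                  # columns proportional to (1, 1) and (-1, 1)
print(np.allclose(A @ v[:, 0], w[0] * v[:, 0]))   # True: A v = lambda v
```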

5.4.2. Eigenvalue Multiplicity. It is useful to introduce a few more concepts that are common in the literature.

Definition 5.4.3. The characteristic polynomial of an n × n matrix A is the function


p(λ) = det(A − λI).

 
Example 5.4.5. Find the characteristic polynomial of the matrix A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}.
Solution: We need to compute the determinant
p(λ) = det(A - λI) = \begin{vmatrix} (1-λ) & 3 \\ 3 & (1-λ) \end{vmatrix} = (1-λ)^2 - 9 = λ^2 - 2λ + 1 - 9.
We conclude that the characteristic polynomial is p(λ) = λ^2 - 2λ - 8. C

Since the matrix A in this example is 2 × 2, its characteristic polynomial has degree two. One
can show that the characteristic polynomial of an n × n matrix has degree n. The eigenvalues of the
matrix are the roots of the characteristic polynomial. Different matrices may have different types
of roots, so we try to classify these roots in the following definition.

Definition 5.4.4. Given an n × n matrix A with real eigenvalues λ_i, where i = 1, · · · , k ≤ n, it is


always possible to express the characteristic polynomial of A as
p(λ) = (λ − λ1 )r1 · · · (λ − λk )rk .
The number ri is called the algebraic multiplicity of the eigenvalue λi . Furthermore, the geo-
metric multiplicity of an eigenvalue λi , denoted as si , is the maximum number of eigenvectors of
λi that form a linearly independent set.

Example 5.4.6. Find the algebraic and geometric multiplicities of the eigenvalues of the matrix A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}.

Solution: In order to find the algebraic multiplicity of the eigenvalues we need first to find the eigenvalues. We know that the characteristic polynomial of this matrix is given by
p(λ) = \begin{vmatrix} (1-λ) & 3 \\ 3 & (1-λ) \end{vmatrix} = (λ - 1)^2 - 9.
The roots of this polynomial are λ_1 = 4 and λ_2 = -2, so we know that p(λ) can be rewritten in the following way,
p(λ) = (λ - 4)(λ + 2).
We conclude that the algebraic multiplicities of the eigenvalues are both one, that is,
λ_1 = 4, \ r_1 = 1, \quad \text{and} \quad λ_2 = -2, \ r_2 = 1.
In order to find the geometric multiplicities of the eigenvalues we need first to find the eigenvectors. This part of the work was already done in Example 5.4.4 above and the result is
λ_1 = 4, \ v^{(1)} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad λ_2 = -2, \ v^{(2)} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
From this expression we conclude that the geometric multiplicities of the eigenvalues are both one, that is,
λ_1 = 4, \ s_1 = 1, \quad \text{and} \quad λ_2 = -2, \ s_2 = 1.
C

The following example shows that two matrices can have the same eigenvalues, and so the same
algebraic multiplicities, but different eigenvectors with different geometric multiplicities.
 
Example 5.4.7. Find the eigenvalues and eigenvectors of the matrix A = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \end{bmatrix}.
Solution: We start finding the eigenvalues, the roots of the characteristic polynomial
p(λ) = \begin{vmatrix} (3-λ) & 0 & 1 \\ 0 & (3-λ) & 2 \\ 0 & 0 & (1-λ) \end{vmatrix} = -(λ - 1)(λ - 3)^2 \quad \Rightarrow \quad λ_1 = 1, \ r_1 = 1; \quad λ_2 = 3, \ r_2 = 2.
We now compute the eigenvector associated with the eigenvalue λ1 = 1, which is the solution of
the linear system
   (1)   
(A - I)v^{(1)} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_1^{(1)} \\ v_2^{(1)} \\ v_3^{(1)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
After a few Gauss elimination operations we obtain the following,
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 1/2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{(1)} = -\tfrac{1}{2} v_3^{(1)}, \quad v_2^{(1)} = -v_3^{(1)}, \quad v_3^{(1)} \ \text{free}.
Therefore, choosing v_3^{(1)} = 2 we obtain that
v^{(1)} = \begin{bmatrix} -1 \\ -2 \\ 2 \end{bmatrix}, \qquad λ_1 = 1, \ r_1 = 1, \ s_1 = 1.
In a similar way we now compute the eigenvectors for the eigenvalue λ2 = 3, which are all solutions
of the linear system
   (2)   
(A - 3I)v^{(2)} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} v_1^{(2)} \\ v_2^{(2)} \\ v_3^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.

After a few Gauss elimination operations we obtain the following,
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & -2 \end{bmatrix} \rightarrow \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{(2)} \ \text{free}, \quad v_2^{(2)} \ \text{free}, \quad v_3^{(2)} = 0.
Therefore, we obtain two linearly independent solutions, the first one v^{(2)} with the choice v_1^{(2)} = 1, v_2^{(2)} = 0, and the second one w^{(2)} with the choice v_1^{(2)} = 0, v_2^{(2)} = 1, that is,
   
v^{(2)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad w^{(2)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad λ_2 = 3, \ r_2 = 2, \ s_2 = 2.
Summarizing, the matrix in this example has three linearly independent eigenvectors. C
 
Example 5.4.8. Find the eigenvalues and eigenvectors of the matrix A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \end{bmatrix}.
Solution: Notice that this matrix has only the coefficient a12 different from the previous example.
Again, we start finding the eigenvalues, which are the roots of the characteristic polynomial

p(λ) = \begin{vmatrix} (3-λ) & 1 & 1 \\ 0 & (3-λ) & 2 \\ 0 & 0 & (1-λ) \end{vmatrix} = -(λ - 1)(λ - 3)^2 \quad \Rightarrow \quad λ_1 = 1, \ r_1 = 1; \quad λ_2 = 3, \ r_2 = 2.
So this matrix has the same eigenvalues and algebraic multiplicities as the matrix in the previous
example. We now compute the eigenvector associated with the eigenvalue λ1 = 1, which is the
solution of the linear system
   (1)   
(A - I)v^{(1)} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} 2 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_1^{(1)} \\ v_2^{(1)} \\ v_3^{(1)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
After a few Gauss elimination operations we obtain the following,
\begin{bmatrix} 2 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{(1)} = 0, \quad v_2^{(1)} = -v_3^{(1)}, \quad v_3^{(1)} \ \text{free}.
Therefore, choosing v_3^{(1)} = 1 we obtain that
v^{(1)} = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \qquad λ_1 = 1, \ r_1 = 1, \ s_1 = 1.
In a similar way we now compute the eigenvectors for the eigenvalue λ2 = 3. However, in this case
we obtain only one solution, as this calculation shows,
   (2)   
(A - 3I)v^{(2)} = 0 \quad \Leftrightarrow \quad \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} v_1^{(2)} \\ v_2^{(2)} \\ v_3^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.

After a few Gauss elimination operations we obtain the following,
\begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & -2 \end{bmatrix} \rightarrow \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \Rightarrow \quad v_1^{(2)} \ \text{free}, \quad v_2^{(2)} = 0, \quad v_3^{(2)} = 0.

Therefore, we obtain only one linearly independent solution, which corresponds to the choice v_1^{(2)} = 1, that is,
v^{(2)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad λ_2 = 3, \ r_2 = 2, \ s_2 = 1.
Summarizing, the matrix in this example has only two linearly independent eigenvectors, and in
the case of the eigenvalue λ2 = 3 we have the strict inequality
1 = s2 < r2 = 2.
C
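Remark: The two examples above can be checked with a computer algebra system. The following is a minimal sketch, assuming the Python package sympy is available; it prints the eigenvalues with their algebraic and geometric multiplicities for both matrices.

# A sketch using sympy (assumed installed) to verify the multiplicities
# found in Examples 5.4.7 and 5.4.8.
import sympy as sp

A1 = sp.Matrix([[3, 0, 1], [0, 3, 2], [0, 0, 1]])   # Example 5.4.7
A2 = sp.Matrix([[3, 1, 1], [0, 3, 2], [0, 0, 1]])   # Example 5.4.8

for name, A in [("A1", A1), ("A2", A2)]:
    # eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenvectors)
    for lam, alg_mult, vecs in A.eigenvects():
        geo_mult = len(vecs)   # geometric multiplicity = number of independent eigenvectors
        print(name, "lambda =", lam, " r =", alg_mult, " s =", geo_mult)

The output shows s2 = 2 for the first matrix and s2 = 1 for the second, in agreement with the computations above.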

5.4.3. Exercises.

5.4.1.- . 5.4.2.- .

5.5. Diagonalizable Matrices


We focus on a particular type of matrix, called diagonalizable matrices. These matrices are simple
enough that certain computations can be performed exactly, yet rich enough to describe several
physical problems.

5.5.1. Diagonalizable Matrices. We first introduce the notion of a diagonal matrix. Later
on we define a diagonalizable matrix as a matrix that can be transformed into a diagonal matrix
by a simple transformation.

Definition 5.5.1. An n × n matrix A is called diagonal iff
A = \begin{bmatrix} a_{11} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_{nn} \end{bmatrix}.

That is, a matrix is diagonal iff every nondiagonal coefficient vanishes. From now on we use the
following notation for a diagonal matrix A:
A = \mathrm{diag}\bigl[a_{11}, \cdots, a_{nn}\bigr] = \begin{bmatrix} a_{11} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_{nn} \end{bmatrix}.
This notation says that the matrix is diagonal and shows only the diagonal coefficients, since any
other coefficient vanishes. The next result says that the eigenvalues of a diagonal matrix are the
matrix diagonal elements, and it gives the corresponding eigenvectors.

Theorem 5.5.2. If D = diag[d_{11}, \cdots, d_{nn}], then the eigenpairs of D are
\lambda_1 = d_{11}, \; v^{(1)} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad
\lambda_n = d_{nn}, \; v^{(n)} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.

Diagonal matrices are simple to manipulate since they share many properties with numbers.
For example the product of two diagonal matrices is commutative. It is simple to compute power
functions of a diagonal matrix. It is also simple to compute more involved functions of a diagonal
matrix, like the exponential function.
 
Example 5.5.1. For every positive integer n find A^n, where A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.
Solution: We start computing A^2 as follows,
A^2 = AA = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2^2 & 0 \\ 0 & 3^2 \end{bmatrix}.
We now compute A^3,
A^3 = A^2 A = \begin{bmatrix} 2^2 & 0 \\ 0 & 3^2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2^3 & 0 \\ 0 & 3^3 \end{bmatrix}.
Using induction, it is simple to see that A^n = \begin{bmatrix} 2^n & 0 \\ 0 & 3^n \end{bmatrix}. C
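Remark: A quick numerical check of this pattern, as a sketch assuming the numpy package is available:

# Verify A^n = diag(2^n, 3^n) for a few values of n (assumes numpy).
import numpy as np

A = np.diag([2.0, 3.0])
for n in range(1, 5):
    An = np.linalg.matrix_power(A, n)
    assert np.allclose(An, np.diag([2.0**n, 3.0**n]))
print("A^n = diag(2^n, 3^n) verified for n = 1,...,4")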

Many properties of diagonal matrices are shared by diagonalizable matrices. These are matrices
that can be transformed into a diagonal matrix by a simple transformation.

Definition 5.5.3. An n × n matrix A is called diagonalizable iff there exists an invertible matrix
P and a diagonal matrix D such that
A = P DP −1 .

Remarks:
(a) Systems of linear differential equations are simple to solve in the case that the coefficient matrix
is diagonalizable. One decouples the differential equations, solves the decoupled equations, and
transforms the solutions back to the original unknowns.
(b) Not every square matrix is diagonalizable. For example, matrix A below is diagonalizable while
B is not,
   
A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}, \qquad
B = \frac{1}{2}\begin{bmatrix} 3 & 1 \\ -1 & 5 \end{bmatrix}.

 
Example 5.5.2. Show that matrix A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} is diagonalizable, where
P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}.

Solution: That matrix P is invertible can be verified by computing its determinant, det(P) =
1 - (-1) = 2. Since the determinant is nonzero, P is invertible. Using linear algebra methods one
can find out that the inverse matrix is
P^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.
Now we only need to verify that P D P^{-1} is indeed A. A straightforward calculation shows
P D P^{-1} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 2 \\ 4 & -2 \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}
\quad\Rightarrow\quad P D P^{-1} = A.
C
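Remark: A minimal numerical sketch of this verification, assuming the numpy package is available:

# Check numerically that P D P^{-1} reproduces A (assumes numpy).
import numpy as np

A = np.array([[1.0, 3.0], [3.0, 1.0]])
P = np.array([[1.0, -1.0], [1.0, 1.0]])
D = np.diag([4.0, -2.0])

assert np.allclose(P @ D @ np.linalg.inv(P), A)
print("A = P D P^{-1} verified")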

5.5.2. Eigenvectors and Diagonalizability. There is a deep relation between the eigen-
pairs of a matrix and whether that matrix is diagonalizable.

Theorem 5.5.4 (Diagonalizable Matrix). An n × n matrix A is diagonalizable iff A has a linearly
independent set of n eigenvectors. Furthermore, if λ_i, v_i, for i = 1, ..., n, are eigenpairs of A, then
A = P D P^{-1}, \qquad P = [v_1, \cdots, v_n], \qquad D = \mathrm{diag}\bigl[\lambda_1, \cdots, \lambda_n\bigr].

Proof of Theorem 5.5.4:
(⇒) Since matrix A is diagonalizable, there exists an invertible matrix P and a diagonal matrix
D such that A = P D P^{-1}. Multiplying this equation by P^{-1} on the left and by P on the right, we get
D = P^{-1} A P.     (5.5.1)
Since the n × n matrix D = diag[d_{11}, \cdots, d_{nn}] is diagonal, it has a linearly independent set of n
eigenvectors, given by the column vectors of the identity matrix, that is,
D e^{(i)} = d_{ii}\, e^{(i)}, \qquad I = \bigl[e^{(1)}, \cdots, e^{(n)}\bigr].

So, the pair d_{ii}, e^{(i)} is an eigenvalue-eigenvector pair of D, for i = 1, ..., n. Using this information
in Eq. (5.5.1) we get
d_{ii}\, e^{(i)} = D e^{(i)} = P^{-1} A P e^{(i)} \quad\Rightarrow\quad A \bigl(P e^{(i)}\bigr) = d_{ii} \bigl(P e^{(i)}\bigr),
where the last equation comes from multiplying the former equation by P on the left. This last
equation says that the vectors v(i) = P e(i) are eigenvectors of A with eigenvalue dii . By definition,
v(i) is the i-th column of matrix P , that is,
P = \bigl[v^{(1)}, \cdots, v^{(n)}\bigr].

Since matrix P is invertible, the eigenvectors set {v(1) , · · · , v(n) } is linearly independent. This
establishes this part of the Theorem.
(⇐) Let λ_i, v^{(i)} be eigenvalue-eigenvector pairs of matrix A, for i = 1, ..., n. Now use the
eigenvectors to construct the matrix P = [v^{(1)}, \cdots, v^{(n)}]. This matrix is invertible, since the eigenvector
set {v^{(1)}, \cdots, v^{(n)}} is linearly independent. We now show that matrix P^{-1} A P is diagonal. We start
computing the product
A P = A \bigl[v^{(1)}, \cdots, v^{(n)}\bigr] = \bigl[A v^{(1)}, \cdots, A v^{(n)}\bigr] = \bigl[\lambda_1 v^{(1)}, \cdots, \lambda_n v^{(n)}\bigr],
therefore,
P^{-1} A P = P^{-1} \bigl[\lambda_1 v^{(1)}, \cdots, \lambda_n v^{(n)}\bigr] = \bigl[\lambda_1 P^{-1} v^{(1)}, \cdots, \lambda_n P^{-1} v^{(n)}\bigr].

At this point it is useful to recall that P^{-1} is the inverse of P,
I = P^{-1} P \quad\Leftrightarrow\quad \bigl[e^{(1)}, \cdots, e^{(n)}\bigr] = P^{-1} \bigl[v^{(1)}, \cdots, v^{(n)}\bigr] = \bigl[P^{-1} v^{(1)}, \cdots, P^{-1} v^{(n)}\bigr].
So, e^{(i)} = P^{-1} v^{(i)}, for i = 1, ..., n. Using these equations in the equation for P^{-1} A P,
P^{-1} A P = \bigl[\lambda_1 e^{(1)}, \cdots, \lambda_n e^{(n)}\bigr] = \mathrm{diag}\bigl[\lambda_1, \cdots, \lambda_n\bigr].

Denoting D = diag[\lambda_1, \cdots, \lambda_n] we conclude that P^{-1} A P = D, or equivalently
A = P D P^{-1}, \qquad P = \bigl[v^{(1)}, \cdots, v^{(n)}\bigr], \qquad D = \mathrm{diag}\bigl[\lambda_1, \cdots, \lambda_n\bigr].
This means that A is diagonalizable. This establishes the Theorem. 
 
Example 5.5.3. Show that matrix A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} is diagonalizable.
Solution: We know that the eigenvalue-eigenvector pairs are
\lambda_1 = 4, \; v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda_2 = -2, \; v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
Introduce P and D as follows,
P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \;\Rightarrow\; P^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}.

We must show that A = P D P^{-1}. This is indeed the case, since
P D P^{-1} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 2 \\ 4 & -2 \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.
We conclude, P D P^{-1} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} \;\Rightarrow\; P D P^{-1} = A, that is, A is diagonalizable. C

With Theorem 5.5.4 we can show that a matrix is not diagonalizable.



 
Example 5.5.4. Show that matrix B = \frac{1}{2}\begin{bmatrix} 3 & 1 \\ -1 & 5 \end{bmatrix} is not diagonalizable.
Solution: We first compute the matrix eigenvalues. The characteristic polynomial is
p(\lambda) = \begin{vmatrix} \bigl(\tfrac{3}{2} - \lambda\bigr) & \tfrac{1}{2} \\ -\tfrac{1}{2} & \bigl(\tfrac{5}{2} - \lambda\bigr) \end{vmatrix}
= \Bigl(\frac{3}{2} - \lambda\Bigr)\Bigl(\frac{5}{2} - \lambda\Bigr) + \frac{1}{4} = \lambda^2 - 4\lambda + 4.
The roots of the characteristic polynomial are computed in the usual way,
\lambda = \frac{1}{2}\bigl(4 \pm \sqrt{16 - 16}\bigr) \quad\Rightarrow\quad \lambda = 2, \quad r = 2.
We have obtained a single eigenvalue with algebraic multiplicity r = 2. The associated eigenvectors
are computed as the solutions to the equation (B - 2I)v = 0. Then,
(B - 2I) = \begin{bmatrix} \bigl(\tfrac{3}{2} - 2\bigr) & \tfrac{1}{2} \\ -\tfrac{1}{2} & \bigl(\tfrac{5}{2} - 2\bigr) \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{2} & \tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}
\rightarrow \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}
\quad\Rightarrow\quad v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad s = 1.
We conclude that the largest linearly independent set of eigenvectors of the 2 × 2 matrix B contains
only one vector, instead of two. Therefore, matrix B is not diagonalizable. C
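Remark: A short sympy sketch, assuming the package is available, confirms this conclusion:

# Check that B has a repeated eigenvalue with only one eigenvector (assumes sympy).
import sympy as sp

B = sp.Rational(1, 2) * sp.Matrix([[3, 1], [-1, 5]])
print(B.eigenvects())          # one eigenvalue lambda = 2 with a single eigenvector
print(B.is_diagonalizable())   # False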

5.5.3. The Case of Different Eigenvalues. Theorem 5.5.4 shows the importance of know-
ing whether an n × n matrix has a linearly independent set of n eigenvectors. However, more
often than not, there is no simple way to check this property other than to compute all the matrix
eigenvectors. But there is a simpler particular case, the case when an n × n matrix has n different
eigenvalues. Then, we do not need to compute the eigenvectors. The following result says that
such a matrix always has a linearly independent set of n eigenvectors, hence, by Theorem 5.5.4, it
is diagonalizable.

Theorem 5.5.5 (Different Eigenvalues). If an n × n matrix has n different eigenvalues, then this
matrix has a linearly independent set of n eigenvectors.

Proof of Theorem 5.5.5: Let λ1 , · · · , λn be the eigenvalues of an n × n matrix A, all different


from each other. Let v^{(1)}, · · · , v^{(n)} be the corresponding eigenvectors, that is, A v^{(i)} = λ_i v^{(i)}, with
i = 1, · · · , n. We have to show that the set {v(1) , · · · , v(n) } is linearly independent. We assume
that the opposite is true and we obtain a contradiction. Let us assume that the set above is linearly
dependent, that is, there are constants c1 , · · · , cn , not all zero, such that,
c1 v(1) + · · · + cn v(n) = 0. (5.5.2)
Let us relabel the eigenvalues and eigenvectors such that c1 ≠ 0. Now, multiply the equation above
by the matrix A, the result is,
c1 λ1 v(1) + · · · + cn λn v(n) = 0.
Multiply Eq. (5.5.2) by the eigenvalue λn , the result is,
c1 λn v(1) + · · · + cn λn v(n) = 0.
Subtract the second from the first of the equations above, then the last term on the right-hand
sides cancels out, and we obtain,
c1 (λ1 − λn )v(1) + · · · + cn−1 (λn−1 − λn )v(n−1) = 0. (5.5.3)
Repeat the whole procedure starting with Eq. (5.5.3), that is, multiply this later equation by matrix
A and also by λn−1 , then subtract the second from the first, the result is,
c1 (λ1 − λn )(λ1 − λn−1 )v(1) + · · · + cn−2 (λn−2 − λn )(λn−2 − λn−1 )v(n−2) = 0.

Repeat the whole procedure a total of n − 1 times, in the last step we obtain the equation
c1 (λ1 − λn )(λ1 − λn−1 ) · · · (λ1 − λ3 )(λ1 − λ2 )v(1) = 0.
Since all the eigenvalues are different, we conclude that c1 = 0, however this contradicts our
assumption that c1 ≠ 0. Therefore, the set of n eigenvectors must be linearly independent. This
establishes the Theorem. 
 
Example 5.5.5. Is matrix A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} diagonalizable?
Solution: We compute the matrix eigenvalues, starting with the characteristic polynomial,
p(\lambda) = \begin{vmatrix} (1-\lambda) & 1 \\ 1 & (1-\lambda) \end{vmatrix} = (1-\lambda)^2 - 1 = \lambda^2 - 2\lambda
\quad\Rightarrow\quad p(\lambda) = \lambda(\lambda - 2).
The roots of the characteristic polynomial are the matrix eigenvalues,
λ1 = 0, λ2 = 2.
The eigenvalues are different, so by Theorem 5.5.5, matrix A is diagonalizable. C

5.5.4. Exercises.

5.5.1.- . 5.5.2.- .

5.6. The Matrix Exponential


When we multiply two square matrices the result is another square matrix. This property allows us
to define power functions and polynomials of a square matrix. In this section we go one step further
and define the exponential of a square matrix. We will show that the derivative of the exponential
function on matrices, as the one defined on real numbers, is proportional to itself.

5.6.1. The Exponential Function. The exponential function defined on real numbers,
f (x) = eax , where a is a constant and x ∈ R, satisfies f 0 (x) = a f (x). We want to find a function of
a square matrix with a similar property. Since the exponential on real numbers can be defined in
several equivalent ways, we start with a short review of three ways to define the exponential e^x.
(a) The exponential function can be defined as a generalization of the power function from the
positive integers to the real numbers. One starts with positive integers n, defining
e^n = e \cdots e, \quad n\text{-times}.
Then one defines e^0 = 1, and for negative integers -n,
e^{-n} = \frac{1}{e^n}.
The next step is to define the exponential for rational numbers m/n, with m, n integers,
e^{m/n} = \sqrt[n]{e^m}.
The difficult part in this definition of the exponential is the generalization to irrational numbers
x, which is done by a limit,
e^x = \lim_{m/n \to x} e^{m/n}.

It is nontrivial to define that limit precisely, which is why many calculus textbooks do not
show it. Anyway, it is not clear how to generalize this definition from real numbers x to square
matrices X.
(b) The exponential function can be defined as the inverse of the natural logarithm function g(x) =
ln(x), which in turn is defined as the area under the graph of the function h(x) = 1/x from 1
to x, that is,
\ln(x) = \int_1^x \frac{1}{y}\, dy, \qquad x \in (0, \infty).

Again, it is not clear how to extend to matrices this definition of the exponential function on
real numbers.
(c) The exponential function can be defined also by its Taylor series expansion,
e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots .
Most calculus textbooks show this series expansion, a Taylor expansion, as a result from the
exponential definition, not as a definition itself. But one can define the exponential using this
series and prove that the function so defined satisfies the properties in (a) and (b). It turns
out, this series expression can be generalized to square matrices.
We now use the idea in (c) to define the exponential function on square matrices. We start
with the power function of a square matrix, f (X) = X n = X · · · X, n-times, for X a square matrix
and n a positive integer. Then we define a polynomial of a square matrix,
p(X) = an X n + an−1 X n−1 + · · · + a0 I.
Now we are ready to define the exponential of a square matrix.

Definition 5.6.1. The exponential of a square matrix A is the infinite sum
e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!}.     (5.6.1)

This definition makes sense, because the infinite sum in Eq. (5.6.1) converges.

Theorem 5.6.2. The infinite sum in Eq. (5.6.1) converges for all n × n matrices.

Proof: See Section 2.1.2 and 4.5 in Hassani [5] for a proof using the Spectral Theorem. 
The infinite sum in the definition of the exponential of a matrix is in general difficult to
compute. However, when the matrix is diagonal, the exponential is remarkably simple.
 
Theorem 5.6.3 (Exponential of Diagonal Matrices). If D = diag[d_1, \cdots, d_n], then
e^D = \mathrm{diag}\bigl[e^{d_1}, \cdots, e^{d_n}\bigr].


 

Proof of Theorem 5.6.3: We start from the definition of the exponential,
e^D = \sum_{k=0}^{\infty} \frac{1}{k!} \bigl(\mathrm{diag}[d_1, \cdots, d_n]\bigr)^k
= \sum_{k=0}^{\infty} \frac{1}{k!}\, \mathrm{diag}\bigl[(d_1)^k, \cdots, (d_n)^k\bigr],
where in the second equality we used that the matrix D is diagonal. Then,
e^D = \sum_{k=0}^{\infty} \mathrm{diag}\Bigl[\frac{(d_1)^k}{k!}, \cdots, \frac{(d_n)^k}{k!}\Bigr]
= \mathrm{diag}\Bigl[\sum_{k=0}^{\infty}\frac{(d_1)^k}{k!}, \cdots, \sum_{k=0}^{\infty}\frac{(d_n)^k}{k!}\Bigr].
Each sum in the diagonal of the matrix above satisfies \sum_{k=0}^{\infty} \frac{(d_i)^k}{k!} = e^{d_i}. Therefore, we arrive at the
equation e^D = \mathrm{diag}\bigl[e^{d_1}, \cdots, e^{d_n}\bigr]. This establishes the Theorem.

 
Example 5.6.1. Compute e^A, where A = \begin{bmatrix} 2 & 0 \\ 0 & 7 \end{bmatrix}.
Solution: We follow the proof of Theorem 5.6.3 to get this result. We start with the definition of
the exponential
e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!}\begin{bmatrix} 2 & 0 \\ 0 & 7 \end{bmatrix}^n.
Since the matrix A is diagonal, we have that
\begin{bmatrix} 2 & 0 \\ 0 & 7 \end{bmatrix}^n = \begin{bmatrix} 2^n & 0 \\ 0 & 7^n \end{bmatrix}.
Therefore,
e^A = \sum_{n=0}^{\infty} \frac{1}{n!}\begin{bmatrix} 2^n & 0 \\ 0 & 7^n \end{bmatrix}
= \begin{bmatrix} \sum_{n=0}^{\infty} \frac{2^n}{n!} & 0 \\ 0 & \sum_{n=0}^{\infty} \frac{7^n}{n!} \end{bmatrix}.
Since \sum_{n=0}^{\infty} \frac{a^n}{n!} = e^a, for a = 2, 7, we obtain that e^A = \begin{bmatrix} e^2 & 0 \\ 0 & e^7 \end{bmatrix}. C
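Remark: This result can be checked numerically; a minimal sketch, assuming the numpy and scipy packages are available:

# Compare the matrix exponential of a diagonal matrix with the
# element-wise exponential of its diagonal (assumes numpy and scipy).
import numpy as np
from scipy.linalg import expm

A = np.diag([2.0, 7.0])
assert np.allclose(expm(A), np.diag(np.exp([2.0, 7.0])))
print("exp(diag(2,7)) = diag(e^2, e^7) verified")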

5.6.2. Diagonalizable Matrices Formula. The exponential of a diagonalizable matrix is


simple to compute, although not as simple as for diagonal matrices. The infinite sum in the
exponential of a diagonalizable matrix reduces to a product of three matrices. We start with the
following result, the nth-power of a diagonalizable matrix.

Theorem 5.6.4 (Powers of Diagonalizable Matrices). If an n × n matrix A is diagonalizable, with
invertible matrix P and diagonal matrix D satisfying A = P D P^{-1}, then for every integer k ≥ 1
holds
A^k = P D^k P^{-1}.     (5.6.2)
Proof of Theorem 5.6.4: For any diagonal matrix D = diag[d_1, \cdots, d_n] we know that
D^k = \mathrm{diag}\bigl[d_1^k, \cdots, d_n^k\bigr].

We use this result and induction in k to prove Eq. (5.6.2). Since the case k = 1 is trivially true, we
start computing the case k = 2. We get
A^2 = \bigl(P D P^{-1}\bigr)^2 = P D P^{-1} P D P^{-1} = P D D P^{-1} \quad\Rightarrow\quad A^2 = P D^2 P^{-1},
that is, Eq. (5.6.2) holds for k = 2. Now assume that Eq. (5.6.2) is true for k. This equation also
holds for k + 1, since
A^{k+1} = A^k A = \bigl(P D^k P^{-1}\bigr)\bigl(P D P^{-1}\bigr) = P D^k \bigl(P^{-1} P\bigr) D P^{-1} = P D^k D P^{-1}.
We conclude that A^{k+1} = P D^{k+1} P^{-1}. This establishes the Theorem. 


We are ready to compute the exponential of a diagonalizable matrix.

Theorem 5.6.5 (Exponential of Diagonalizable Matrices). If an n × n matrix A is diagonalizable,


with invertible matrix P and diagonal matrix D satisfying A = P DP −1 , then the exponential of
matrix A is given by
eA = P eD P −1 . (5.6.3)

Remark: Theorem 5.6.5 says that the infinite sum in the definition of eA reduces to a product of
three matrices when the matrix A is diagonalizable. This Theorem also says that to compute the
exponential of a diagonalizable matrix we need to compute the eigenvalues and eigenvectors of that
matrix.
Proof of Theorem 5.6.5: We start with the definition of the exponential,
e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k = \sum_{k=0}^{\infty} \frac{1}{k!} \bigl(P D P^{-1}\bigr)^k = \sum_{k=0}^{\infty} \frac{1}{k!} \bigl(P D^k P^{-1}\bigr),
where the last step comes from Theorem 5.6.4. Now, in the expression on the far right we can take
the common factor P on the left and P^{-1} on the right, that is,
e^A = P \Bigl(\sum_{k=0}^{\infty} \frac{1}{k!} D^k\Bigr) P^{-1}.
The sum in between parentheses is the exponential of the diagonal matrix D, which we computed
in Theorem 5.6.3,
e^A = P e^D P^{-1}.
This establishes the Theorem. 
We have defined the exponential function F̃(A) = e^A : R^{n×n} → R^{n×n}, which is a function
from the space of square matrices into the space of square matrices. However, when one studies
solutions to linear systems of differential equations, one needs a slightly different type of functions.
One needs functions of the form F (t) = eAt : R → Rn×n , where A is a constant square matrix
and the independent variable is t ∈ R. That is, one needs to generalize the real constant a in the
function f (t) = eat to an n × n matrix A. In the case that the matrix A is diagonalizable, with
A = P DP −1 , so is matrix At, and At = P (Dt)P −1 . Therefore, the formula for the exponential of
At is simply
eAt = P eDt P −1 .
We use this formula in the following example.

 
Example 5.6.2. Compute e^{At}, where A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} and t ∈ R.
Solution: To compute e^{At} we need the decomposition A = P D P^{-1}, which in turn implies that
At = P (Dt) P^{-1}. Matrices P and D are constructed with the eigenvectors and eigenvalues of matrix
A. We computed them in a previous example,
\lambda_1 = 4, \; v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda_2 = -2, \; v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
Introduce P and D as follows,
P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \;\Rightarrow\; P^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}.
Then, the exponential function is given by
e^{At} = P e^{Dt} P^{-1} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} e^{4t} & 0 \\ 0 & e^{-2t} \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.
Usually one leaves the function in this form. If we multiply the three matrices out we get
e^{At} = \frac{1}{2}\begin{bmatrix} (e^{4t} + e^{-2t}) & (e^{4t} - e^{-2t}) \\ (e^{4t} - e^{-2t}) & (e^{4t} + e^{-2t}) \end{bmatrix}.
C
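Remark: A numerical spot check of this closed form, as a sketch assuming numpy and scipy are available:

# Compare e^{At} computed three ways: scipy's expm, the eigen-decomposition
# P e^{Dt} P^{-1}, and the closed-form entries above (assumes numpy, scipy).
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 3.0], [3.0, 1.0]])
P = np.array([[1.0, -1.0], [1.0, 1.0]])
t = 0.3

eig_form = P @ np.diag([np.exp(4 * t), np.exp(-2 * t)]) @ np.linalg.inv(P)
closed_form = 0.5 * np.array([
    [np.exp(4 * t) + np.exp(-2 * t), np.exp(4 * t) - np.exp(-2 * t)],
    [np.exp(4 * t) - np.exp(-2 * t), np.exp(4 * t) + np.exp(-2 * t)],
])

assert np.allclose(expm(A * t), eig_form)
assert np.allclose(expm(A * t), closed_form)
print("e^{At} agrees with P e^{Dt} P^{-1} and with the closed form")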

5.6.3. Properties of the Exponential. We summarize some simple properties of the expo-
nential function in the following result. We leave the proof as an exercise.

Theorem 5.6.6 (Algebraic Properties). If A is an n × n matrix, then


(a) If 0 is the n × n zero matrix, then e0 = I.
(b) \bigl(e^A\bigr)^T = e^{(A^T)}, where T means transpose.
(c) For all nonnegative integers k holds Ak eA = eA Ak .
(d) If AB = BA, then A eB = eB A and eA eB = eB eA .

An important property of the exponential on real numbers is not true for the exponential on
matrices. We know that e^a e^b = e^{a+b} for all real numbers a, b. However, there exist n × n matrices
A, B such that e^A e^B ≠ e^{A+B}. We now prove a weaker property.

Theorem 5.6.7 (Group Property). If A is an n × n matrix and s, t are real numbers, then
eAs eAt = eA(s+t) .

Proof of Theorem 5.6.7: We start with the definition of the exponential function,
e^{As} e^{At} = \Bigl(\sum_{j=0}^{\infty} \frac{A^j s^j}{j!}\Bigr)\Bigl(\sum_{k=0}^{\infty} \frac{A^k t^k}{k!}\Bigr) = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \frac{A^{j+k} s^j t^k}{j!\, k!}.
We now introduce the new label n = j + k, then j = n - k, and we reorder the terms,
e^{As} e^{At} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{A^n s^{n-k} t^k}{(n-k)!\, k!} = \sum_{n=0}^{\infty} \frac{A^n}{n!} \Bigl(\sum_{k=0}^{n} \frac{n!}{(n-k)!\, k!}\, s^{n-k} t^k\Bigr).
If we recall the binomial theorem, (s + t)^n = \sum_{k=0}^{n} \frac{n!}{(n-k)!\, k!}\, s^{n-k} t^k, we get
e^{As} e^{At} = \sum_{n=0}^{\infty} \frac{A^n}{n!} (s + t)^n = e^{A(s+t)}.

This establishes the Theorem. 



If we set s = 1 and t = −1 in the Theorem 5.6.7 we get that


eA e−A = eA(1−1) = e0 = I,
so we have a formula for the inverse of the exponential.

Theorem 5.6.8 (Inverse Exponential). If A is an n × n matrix, then


\bigl(e^A\bigr)^{-1} = e^{-A}.

 
Example 5.6.3. Verify Theorem 5.6.8 for e^{At}, where A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} and t ∈ R.
Solution: In Example 5.6.2 we found that
e^{At} = \frac{1}{2}\begin{bmatrix} (e^{4t} + e^{-2t}) & (e^{4t} - e^{-2t}) \\ (e^{4t} - e^{-2t}) & (e^{4t} + e^{-2t}) \end{bmatrix}.
A 2 × 2 matrix is invertible iff its determinant is nonzero. In our case,
\det\bigl(e^{At}\bigr) = \frac{1}{2}(e^{4t} + e^{-2t})\,\frac{1}{2}(e^{4t} + e^{-2t}) - \frac{1}{2}(e^{4t} - e^{-2t})\,\frac{1}{2}(e^{4t} - e^{-2t}) = e^{2t},
hence e^{At} is invertible. The inverse is
\bigl(e^{At}\bigr)^{-1} = \frac{1}{e^{2t}}\,\frac{1}{2}\begin{bmatrix} (e^{4t} + e^{-2t}) & (-e^{4t} + e^{-2t}) \\ (-e^{4t} + e^{-2t}) & (e^{4t} + e^{-2t}) \end{bmatrix},
that is,
\bigl(e^{At}\bigr)^{-1} = \frac{1}{2}\begin{bmatrix} (e^{2t} + e^{-4t}) & (-e^{2t} + e^{-4t}) \\ (-e^{2t} + e^{-4t}) & (e^{2t} + e^{-4t}) \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} (e^{-4t} + e^{2t}) & (e^{-4t} - e^{2t}) \\ (e^{-4t} - e^{2t}) & (e^{-4t} + e^{2t}) \end{bmatrix} = e^{-At}.
C
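Remark: The same identity can be spot-checked numerically; a sketch assuming numpy and scipy:

# Check that e^{At} e^{-At} = I for this matrix (assumes numpy and scipy).
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 3.0], [3.0, 1.0]])
t = 0.7
assert np.allclose(expm(A * t) @ expm(-A * t), np.eye(2))
print("expm(At) @ expm(-At) = I verified")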

We now want to compute the derivative of the function F (t) = eAt , where A is a constant n × n
matrix and t ∈ R. It is not difficult to show the following result.

Theorem 5.6.9 (Derivative of the Exponential). If A is an n × n matrix, and t ∈ R, then


\frac{d}{dt}\, e^{At} = A\, e^{At}.

Remark: Recall that Theorem 5.6.6 says that A e^{A} = e^{A} A, so we have that
\frac{d}{dt}\, e^{At} = A\, e^{At} = e^{At} A.

First Proof of Theorem 5.6.9: We use the definition of the exponential,
\frac{d}{dt}\, e^{At} = \frac{d}{dt} \sum_{n=0}^{\infty} \frac{A^n t^n}{n!} = \sum_{n=0}^{\infty} \frac{A^n}{n!} \frac{d}{dt}(t^n) = \sum_{n=1}^{\infty} \frac{A^n t^{n-1}}{(n-1)!} = A \sum_{n=1}^{\infty} \frac{A^{n-1} t^{n-1}}{(n-1)!},
therefore we get
\frac{d}{dt}\, e^{At} = A\, e^{At}.
This establishes the Theorem. 
Second Proof of Theorem 5.6.9: We use the definition of derivative and Theorem 5.6.7,
F'(t) = \lim_{h \to 0} \frac{e^{A(t+h)} - e^{At}}{h} = \lim_{h \to 0} \frac{e^{At} e^{Ah} - e^{At}}{h} = e^{At} \lim_{h \to 0} \Bigl(\frac{e^{Ah} - I}{h}\Bigr),
and using now the power series definition of the exponential we get
F'(t) = e^{At} \lim_{h \to 0} \frac{1}{h}\Bigl[A h + \frac{A^2 h^2}{2!} + \cdots\Bigr] = e^{At} A.

This establishes the Theorem. 


 
Example 5.6.4. Verify Theorem 5.6.9 for F(t) = e^{At}, where A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} and t ∈ R.
Solution: In Example 5.6.2 we found that
e^{At} = \frac{1}{2}\begin{bmatrix} (e^{4t} + e^{-2t}) & (e^{4t} - e^{-2t}) \\ (e^{4t} - e^{-2t}) & (e^{4t} + e^{-2t}) \end{bmatrix}.
Therefore, if we differentiate component by component we get
\frac{d}{dt}\, e^{At} = \frac{1}{2}\begin{bmatrix} (4\, e^{4t} - 2\, e^{-2t}) & (4\, e^{4t} + 2\, e^{-2t}) \\ (4\, e^{4t} + 2\, e^{-2t}) & (4\, e^{4t} - 2\, e^{-2t}) \end{bmatrix}.
On the other hand, if we compute
A\, e^{At} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\,\frac{1}{2}\begin{bmatrix} (e^{4t} + e^{-2t}) & (e^{4t} - e^{-2t}) \\ (e^{4t} - e^{-2t}) & (e^{4t} + e^{-2t}) \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} (4\, e^{4t} - 2\, e^{-2t}) & (4\, e^{4t} + 2\, e^{-2t}) \\ (4\, e^{4t} + 2\, e^{-2t}) & (4\, e^{4t} - 2\, e^{-2t}) \end{bmatrix}.
Therefore, \frac{d}{dt}\, e^{At} = A\, e^{At}. The relation \frac{d}{dt}\, e^{At} = e^{At} A is shown in a similar way. C
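Remark: A rough numerical check of the derivative identity, as a sketch assuming numpy and scipy are available:

# Approximate d/dt e^{At} with a centered difference and compare with A e^{At}
# (assumes numpy and scipy; the tolerance allows for the finite-difference error).
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 3.0], [3.0, 1.0]])
t, h = 0.5, 1e-5

numeric_derivative = (expm(A * (t + h)) - expm(A * (t - h))) / (2 * h)
assert np.allclose(numeric_derivative, A @ expm(A * t), atol=1e-4)
print("d/dt expm(At) is close to A expm(At)")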

We end this review of the matrix exponential by showing when the formula e^{A+B} = e^A e^B holds.

Theorem 5.6.10 (Exponent Rule). If A, B are n × n matrices such that AB = BA, then
eA+B = eA eB .

Proof of Theorem 5.6.10: Introduce the function


F (t) = e(A+B)t e−Bt e−At ,
where t ∈ R. Compute the derivative of F ,
F 0 (t) = (A + B) e(A+B)t e−Bt e−At + e(A+B)t (−B) e−Bt e−At + e(A+B)t e−Bt (−A) e−At .
Since AB = BA, we know that e^{-Bt} A = A e^{-Bt}, so we get
F 0 (t) = (A + B) e(A+B)t e−Bt e−At − e(A+B)t B e−Bt e−At − e(A+B)t A e−Bt e−At .
Now AB = BA also implies that (A + B) B = B (A + B), therefore Theorem 5.6.6 implies
e(A+B)t B = B e(A+B)t .
Analogously, we have that (A + B) A = A (A + B), therefore Theorem 5.6.6 implies that
e(A+B)t A = A e(A+B)t .
Using these equations in F' we get
F 0 (t) = (A + B) F (t) − B F (t) − A F (t) ⇒ F 0 (t) = 0.
Therefore, F(t) is a constant matrix, F(t) = F(0) = I. So we get
e^{(A+B)t} e^{-Bt} e^{-At} = I \quad\Rightarrow\quad e^{(A+B)t} = e^{At} e^{Bt}.
Setting t = 1 establishes the Theorem. 

5.6.4. Exercises.

5.6.1.- Use the definition of the matrix exponential to prove Theorem 5.6.6. Do not use any other
theorems in this Section.
5.6.2.- If A^2 = A, find a formula for e^A which does not contain an infinite sum.
5.6.3.- Compute e^A for the following matrices:
(a) A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
(b) A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.
(c) A = \begin{bmatrix} a & b \\ 0 & 1 \end{bmatrix}.
5.6.4.- Show that, if A is diagonalizable, \det\bigl(e^A\bigr) = e^{\mathrm{tr}(A)}.
Remark: This result is true for all square matrices, but it is hard to prove for nondiagonalizable matrices.
5.6.5.- Compute e^A for the following matrices:
(a) A = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}.
(b) A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}.
5.6.6.- If A^2 = I, show that 2\, e^A = \Bigl(e + \frac{1}{e}\Bigr) I + \Bigl(e - \frac{1}{e}\Bigr) A.
5.6.7.- If λ and v are an eigenvalue and eigenvector of A, then show that e^A v = e^{\lambda} v.
5.6.8.- By direct computation show that e^{(A+B)} ≠ e^A e^B for
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
CHAPTER 6

Quantitative Aspects of Systems of ODE

In this chapter we find formulas for solutions to linear differential systems. We start with homo-
geneous linear systems with constant coefficients. We then use these solutions to characterize the
solutions of nonlinear systems near their critical points. Near the end of this chapter we go back to
linear systems. We show how to solve homogeneous and non-homogeneous n × n systems of
differential equations with constant coefficients.


6.1. Two-Dimensional Linear Systems


In this section we focus on the simplest system of differential equations—2 × 2 linear constant
coefficients homogeneous systems of differential equations. These systems are simple enough that their
solutions can be computed and classified. But they are non-trivial enough that their solutions describe
several situations including exponential decays and oscillations. In later sections we will use these
systems as approximations of more complicated nonlinear systems.
6.1.1. 2 × 2 Linear Systems. The simplest systems of differential equations are the linear
systems. The simplest linear systems are the 2 × 2 linear systems.

Definition 6.1.1. A 2 × 2 first order linear differential system is the equation
x'(t) = A(t) x(t) + b(t),     (6.1.1)
where the 2 × 2 coefficient matrix A, the source 2-vector b, and the unknown 2-vector x are given
in components by
     
A(t) = \begin{bmatrix} a_{11}(t) & a_{12}(t) \\ a_{21}(t) & a_{22}(t) \end{bmatrix}, \qquad
b(t) = \begin{bmatrix} b_1(t) \\ b_2(t) \end{bmatrix}, \qquad
x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}.
The system in 6.1.1 is called homogeneous iff the source vector b = 0, of constant coefficients
iff the matrix A is constant, and diagonalizable iff the matrix A is diagonalizable.

Remarks:
(a) The derivative of a vector valued function is defined as x'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix}.
(b) By the definition of the matrix-vector product, Eq. (6.1.1) can be written as
x01 (t) = a11 (t) x1 (t) + a12 (t) x2 (t) + b1 (t),
x02 (t) = a21 (t) x1 (t) + a22 (t) x2 (t) + b2 (t).
(c) We recall that in § 5.5 we say that a square matrix A is diagonalizable iff there exists an
invertible matrix P and a diagonal matrix D such that A = P DP −1 .
A solution of a 2 × 2 linear differential system is a 2-vector valued function x, that is, a 2-vector
with components x1, x2, that satisfies both differential equations in the system. When we write down
the equations we will usually write x instead of x(t).

Example 6.1.1. A single linear differential equation can be written using a notation similar to
the one used above: Find a solution x1 of
x01 = a11 (t) x1 + b1 (t).

Solution: This is a linear first order equation, and solutions can be found with the integrating
factor method described in Section 1.5. C

Example 6.1.2. Use matrix notation to write down the 2 × 2 system given by
x01 = x1 − x2 + e2t ,
x02 = −x1 + x2 − t2 e−t .

Solution: In this case, the matrix of coefficients, the unknown and source vectors are
A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad
x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}, \qquad
b(t) = \begin{bmatrix} e^{2t} \\ -t^2 e^{-t} \end{bmatrix}.
The differential equation can be written as follows,
x_1' = x_1 - x_2 + e^{2t},
x_2' = -x_1 + x_2 - t^2 e^{-t},
\quad\Leftrightarrow\quad
\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e^{2t} \\ -t^2 e^{-t} \end{bmatrix}
\quad\Leftrightarrow\quad x' = A x + b.
C

Example 6.1.3. Find the explicit expression for the linear system x' = A x + b, where
A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}, \qquad
b(t) = \begin{bmatrix} e^{t} \\ 2 e^{3t} \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
Solution: The 2 × 2 linear system is given by
\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e^{t} \\ 2 e^{3t} \end{bmatrix}
\quad\Leftrightarrow\quad
x_1' = x_1 + 3 x_2 + e^{t}, \qquad x_2' = 3 x_1 + x_2 + 2 e^{3t}.
C
   
Example 6.1.4. Show that the vector valued functions x^{(1)} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} e^{2t} and x^{(2)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} e^{-t} are
solutions to the 2 × 2 linear system x' = A x, where A = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}.
Solution: We compute the left-hand side and the right-hand side of the differential equation above
for the function x^{(1)} and we see that both sides match, that is,
A x^{(1)} = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} e^{2t} = \begin{bmatrix} 4 \\ 2 \end{bmatrix} e^{2t} = 2 \begin{bmatrix} 2 \\ 1 \end{bmatrix} e^{2t}; \qquad
x^{(1)\prime} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \bigl(e^{2t}\bigr)' = 2 \begin{bmatrix} 2 \\ 1 \end{bmatrix} e^{2t},
so we conclude that x^{(1)\prime} = A x^{(1)}. Analogously,
A x^{(2)} = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} e^{-t} = \begin{bmatrix} -1 \\ -2 \end{bmatrix} e^{-t} = -\begin{bmatrix} 1 \\ 2 \end{bmatrix} e^{-t}; \qquad
x^{(2)\prime} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \bigl(e^{-t}\bigr)' = -\begin{bmatrix} 1 \\ 2 \end{bmatrix} e^{-t},
so we conclude that x^{(2)\prime} = A x^{(2)}. C

6.1.2. Diagonalizable Systems. Linear systems are the simplest systems to solve, but do
not think they are simple to solve—that would be a mistake. In systems, the differential equa-
tions are coupled. To understand what coupled means, consider a 2 × 2, constant coefficients,
homogeneous, linear system
x01 = a11 x1 + a12 x2
x02 = a21 x1 + a22 x2 .
We cannot simply integrate the first equation to obtain x1 , because x2 is on the right-hand side,
and we do not know what this function is. Analogously, we cannot simply integrate the second
equation to obtain x2 , because x1 is on the right-hand side, and we do not know what this function
is either. This is what we mean by the system being coupled: one cannot solve for one variable at
a time, one must solve for both variables together.
In the particular case that the coefficient matrix is diagonalizable, it is possible to decouple the
system. In the following example we show how this can be done.

Example 6.1.5. Find functions x1 , x2 solutions of the first order, 2 × 2, constant coefficients,
homogeneous differential system
x01 = x1 + 3x2 ,
x02 = 3x1 + x2 .

Solution: If we write this system in matrix form we get
\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
Notice that the coefficient matrix is
A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix},

which is diagonalizable. We know that this matrix A has eigenpairs
\lambda_+ = 4, \; v_+ = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda_- = -2, \; v_- = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
This means that A can be written as
A = P D P^{-1}, \qquad P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} 4 & 0 \\ 0 & -2 \end{bmatrix}, \qquad P^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.
If we add this information into the differential equation we get
x0 = P DP −1 x.
We now want to do linear combinations among the equations in the system. One way to do that
in an efficient way is to multiply the whole system by a matrix. We choose to multiply the system
by P −1 , that is,
P −1 x0 = P −1 P DP −1 x ⇒ (P −1 x)0 = D (P −1 x),
where we used that P −1 is a constant matrix, so its t-derivative is zero, hence we get P −1 x0 =
(P −1 x)0 . If we introduce the new variable y = P −1 x, we get the system
y0 = Dy,
which is a diagonal system. We now repeat the steps above, but a bit slower, so we can show
explicitly what is the effect on the system when we multiply it by P −1 . What we did is
P^{-1} x' = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \frac{1}{2}\begin{bmatrix} (x_1 + x_2)' \\ (-x_1 + x_2)' \end{bmatrix}.
So, multiplying the system by P −1 means, in this case, to add and subtract the two equations in
the original system,
(x1 + x2 )0 = 4x1 + 4x2 ⇒ (x1 + x2 )0 = 4(x1 + x2 ).
(x2 − x1 )0 = 2x1 − 2x2 ⇒ (x2 − x1 )0 = −2(x2 − x1 ).
Introduce the new variables y1 = (x2 +x1 )/2, and y2 = (x2 −x1 )/2, which written in matrix notation
is    
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} (x_1 + x_2) \\ (-x_1 + x_2) \end{bmatrix} \quad\Leftrightarrow\quad y = P^{-1} x.
In terms of this new variable y, the system is
y_1' = 4 y_1, \quad y_2' = -2 y_2 \quad\Leftrightarrow\quad y' = D y.
We have decoupled the original system. The original system for x is coupled, but the new
system for y is diagonal, so decoupled. The solution is
y_1' = 4 y_1 \;\Rightarrow\; y_1 = c_1 e^{4t}, \qquad
y_2' = -2 y_2 \;\Rightarrow\; y_2 = c_2 e^{-2t}
\quad\Rightarrow\quad y(t) = \begin{bmatrix} c_1 e^{4t} \\ c_2 e^{-2t} \end{bmatrix},

with c1 , c2 ∈ R. Now we go back to the original variables. Since y = P −1 x, then


x = P y \quad\Rightarrow\quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad\Rightarrow\quad x_1 = y_1 - y_2, \quad x_2 = y_1 + y_2.
So the general solution is
x1 (t) = c1 e4t − c2 e−2t , x2 (t) = c1 e4t + c2 e−2t .
In vector notation we get
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} (c_1 e^{4t} - c_2 e^{-2t}) \\ (c_1 e^{4t} + c_2 e^{-2t}) \end{bmatrix} = \begin{bmatrix} c_1 e^{4t} \\ c_1 e^{4t} \end{bmatrix} + \begin{bmatrix} -c_2 e^{-2t} \\ c_2 e^{-2t} \end{bmatrix}.
Therefore, we get all the solutions for the 2 × 2 linear differential system,
x(t) = c_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{4t} + c_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} e^{-2t}.

In the example above we solved the differential equation
x' = A x, \qquad A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix},
and all the solutions to that equation are
x(t) = c_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{4t} + c_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} e^{-2t},
where c1, c2 ∈ R are arbitrary constants. So we can write all the solutions above as arbitrary linear
combinations of two solutions,
x_1(t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{4t}, \qquad x_2(t) = \begin{bmatrix} -1 \\ 1 \end{bmatrix} e^{-2t}.
Notice that these solutions are constructed with the eigenpairs of the coefficient matrix A. We have
found in previous sections that the eigenpairs of A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} are
\lambda_1 = 4, \; v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad \lambda_2 = -2, \; v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
So, we can write the solutions above as

x1 (t) = v1 eλ1 t , x2 (t) = v2 eλ2 t .


We now show that this is not a coincidence. We have actually discovered a fundamental property
of diagonalizable systems of linear differential equations.

Theorem 6.1.2 (Diagonalizable Systems). If the 2 × 2 constant matrix A is diagonalizable with


eigenpairs λ1 , v1 , and λ2 , v2 , then all solutions of x0 = A x are given by

x(t) = c1 eλ1 t v1 + c2 eλ2 t v2 . (6.1.2)

Remark: It is simple to check that the functions given in Eq. (6.1.2) are in fact solutions of the
differential equation. The reason is that each function xi = eλi t vi , for i = 1, 2, is solution of the
system x0 = A x. Indeed

x0i = λi eλi t vi , and Axi = A eλi t vi = eλi t A vi = λi eλi t vi ,




hence x_i' = A x_i. What is not so easy to prove is that Eq. (6.1.2) contains all the solutions.
Proof of Theorem 6.1.2: Since the coefficient matrix A is diagonalizable, there exists an invertible
matrix P and a diagonal matrix D such that A = P D P^{-1}. Introduce this expression into the
differential equation and multiply the whole equation by P^{-1},
P −1 x0 (t) = P −1 P DP −1 x(t).


Notice that to multiply the differential system by the matrix P −1 means to perform a very particular
type of linear combinations among the equations in the system. This is the linear combination
that decouples the system. Indeed, since matrix A is constant, so are P and D. In particular
P^{-1} x' = \bigl(P^{-1} x\bigr)', hence
\bigl(P^{-1} x\bigr)' = D \bigl(P^{-1} x\bigr).


Define the new variable y = P −1 x . The differential equation is now given by




y0 (t) = D y(t).

Since matrix D is diagonal, the system above is decoupled for the variable y. Solve the decoupled
initial value problem y'(t) = D y(t),
y_1'(t) = \lambda_1 y_1(t), \quad y_2'(t) = \lambda_2 y_2(t)
\quad\Rightarrow\quad
y_1(t) = c_1 e^{\lambda_1 t}, \quad y_2(t) = c_2 e^{\lambda_2 t}
\quad\Rightarrow\quad
y(t) = \begin{bmatrix} c_1 e^{\lambda_1 t} \\ c_2 e^{\lambda_2 t} \end{bmatrix}.
Once y is found, we transform back to x,
x(t) = P y(t) = [v_1, v_2] \begin{bmatrix} c_1 e^{\lambda_1 t} \\ c_2 e^{\lambda_2 t} \end{bmatrix} = c_1 e^{\lambda_1 t} v_1 + c_2 e^{\lambda_2 t} v_2.
This establishes the Theorem. 

Remark: Since the eigenvalues are the roots of the characteristic polynomial, we will often use the
notation λ± to denote these roots, where λ+ > λ- , for real eigenvalues.
We classify the diagonalizable 2 × 2 linear differential systems by the eigenvalues of their
coefficient matrix.
(i) The eigenvalues λ+ , λ- are real and distinct;
(ii) The eigenvalues λ± = α ± βi are distinct and complex, with λ+ the complex conjugate of λ- ;
(iii) The eigenvalue λ+ = λ- = λ0 is repeated and real.
We now provide a few examples of systems on each of the cases above, starting with an example
of case (i).

Example 6.1.6. Find the general solution of the 2 × 2 linear system


 
x' = A x, \qquad A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}.
Solution: It is not difficult to show that the eigenpairs of the coefficient matrix are
\lambda_+ = 3, \; v_+ = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad \lambda_- = -1, \; v_- = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
This coefficient matrix has distinct real eigenvalues, so all solutions to the differential equation are
given by
x(t) = c_+ \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{3t} + c_- \begin{bmatrix} -1 \\ 1 \end{bmatrix} e^{-t}.
C
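Remark: The formula in Theorem 6.1.2 can be compared against a numerical integrator; the following is a sketch assuming numpy and scipy are available:

# Integrate x' = A x numerically and compare with the eigenpair formula
# x(t) = c+ e^{3t} v+ + c- e^{-t} v-  (assumes numpy and scipy).
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[1.0, 2.0], [2.0, 1.0]])
v_plus, v_minus = np.array([1.0, 1.0]), np.array([-1.0, 1.0])
c_plus, c_minus = 0.5, -0.25
x0 = c_plus * v_plus + c_minus * v_minus   # initial condition at t = 0

sol = solve_ivp(lambda t, x: A @ x, (0.0, 1.0), x0, rtol=1e-9, atol=1e-12)
t_end = sol.t[-1]
exact = c_plus * np.exp(3 * t_end) * v_plus + c_minus * np.exp(-t_end) * v_minus
assert np.allclose(sol.y[:, -1], exact, rtol=1e-6)
print("solve_ivp agrees with the eigenvalue formula at t =", t_end)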

6.1.3. The Case of Complex Eigenvalues. We now focus on case (ii). The coefficient
matrix is real-valued with complex-valued eigenvalues. In this case each eigenvalue is the
complex conjugate of the other. A similar result is true for n × n real-valued matrices. When such an
n × n matrix has a complex eigenvalue λ, the complex conjugate λ̄ is also an eigenvalue. A similar result
holds for the respective eigenvectors.

Theorem 6.1.3 (Conjugate Pairs). If an n × n real-valued matrix A has a complex eigenpair λ,
v, then the complex conjugate pair λ̄, v̄ is also an eigenpair of matrix A.

Proof of Theorem 6.1.3: Complex conjugate the eigenvalue eigenvector equation for λ and v,
and recall that matrix A is real-valued, hence Ā = A. We obtain,
A v = \lambda v \quad\Leftrightarrow\quad A \bar{v} = \bar{\lambda}\, \bar{v}.
This establishes the Theorem. 
Complex eigenvalues of a matrix with real coefficients are always complex conjugate pairs.
The same is true for their respective eigenvectors. So they can be written in terms of their real and
imaginary parts as follows,
λ± = α ± iβ, v± = a ± ib, (6.1.3)
where α, β ∈ R and a, b ∈ Rn .

The general solution formula in Eq. (6.1.2) still holds in the case that A has complex eigenvalues
and eigenvectors. The main drawback of this formula is similar to what we found in Chapter 2. It
is difficult to separate real-valued from complex-valued solutions. The fix to that problem is also
similar to the one found in Chapter 2—find a real-valued fundamental set of solutions.

Theorem 6.1.4 (Complex and Real Solutions). If λ± = α ± iβ are eigenvalues of a 2 × 2 constant
matrix A with eigenvectors v± = a ± ib, where α, β ∈ R and a, b ∈ R2 , then a linearly independent
set of two complex-valued solutions to x0 = A x is
x_+(t) = e^{\lambda_+ t} v_+, \qquad x_-(t) = e^{\lambda_- t} v_-.     (6.1.4)
Furthermore, a linearly independent set of two real-valued solutions to x' = A x is given by
x_1(t) = \bigl(a \cos(\beta t) - b \sin(\beta t)\bigr) e^{\alpha t}, \qquad x_2(t) = \bigl(a \sin(\beta t) + b \cos(\beta t)\bigr) e^{\alpha t}.     (6.1.5)

Proof of Theorem 6.1.4: Theorem 6.1.2 implies the set in (6.1.4) is a linearly independent set.
The new information in Theorem 6.1.4 above is the real-valued solutions in Eq. (6.1.5). They are
obtained from Eq. (6.1.4) as follows:
x_\pm = (a \pm i b)\, e^{(\alpha \pm i\beta) t}
= e^{\alpha t} (a \pm i b)\, e^{\pm i\beta t}
= e^{\alpha t} (a \pm i b)\bigl(\cos(\beta t) \pm i \sin(\beta t)\bigr)
= e^{\alpha t} \bigl(a \cos(\beta t) - b \sin(\beta t)\bigr) \pm i\, e^{\alpha t} \bigl(a \sin(\beta t) + b \cos(\beta t)\bigr).
Since the differential equation x' = A x is linear, the functions below are also solutions,
x_1 = \frac{1}{2}\bigl(x_+ + x_-\bigr) = \bigl(a \cos(\beta t) - b \sin(\beta t)\bigr) e^{\alpha t},
x_2 = \frac{1}{2i}\bigl(x_+ - x_-\bigr) = \bigl(a \sin(\beta t) + b \cos(\beta t)\bigr) e^{\alpha t}.
This establishes the Theorem. 

Example 6.1.7. Find a real-valued set of fundamental solutions to the differential equation
 
x' = A x, \qquad A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.     (6.1.6)

Solution: First find the eigenvalues of matrix A above,
0 = \begin{vmatrix} (2-\lambda) & 3 \\ -3 & (2-\lambda) \end{vmatrix} = (\lambda - 2)^2 + 9 \quad\Rightarrow\quad \lambda_\pm = 2 \pm 3i.
Then find the respective eigenvectors. The one corresponding to λ+ is the solution of the homoge-
neous linear system with coefficients given by
\begin{bmatrix} 2-(2+3i) & 3 \\ -3 & 2-(2+3i) \end{bmatrix} = \begin{bmatrix} -3i & 3 \\ -3 & -3i \end{bmatrix}
\rightarrow \begin{bmatrix} -i & 1 \\ -1 & -i \end{bmatrix}
\rightarrow \begin{bmatrix} 1 & i \\ -1 & -i \end{bmatrix}
\rightarrow \begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}.
Therefore the eigenvector v_+ = \begin{bmatrix} v_{+1} \\ v_{+2} \end{bmatrix} is given by
v_{+1} = -i\, v_{+2} \quad\Rightarrow\quad v_{+2} = 1, \; v_{+1} = -i \quad\Rightarrow\quad v_+ = \begin{bmatrix} -i \\ 1 \end{bmatrix}, \quad \lambda_+ = 2 + 3i.
The second eigenvector is the complex conjugate of the eigenvector found above, that is,
v_- = \begin{bmatrix} i \\ 1 \end{bmatrix}, \quad \lambda_- = 2 - 3i.
Notice that
v_\pm = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \pm \begin{bmatrix} -1 \\ 0 \end{bmatrix} i.

Then, the real and imaginary parts of the eigenvalues and of the eigenvectors are given by
\alpha = 2, \quad \beta = 3, \qquad a = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad b = \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
So a real-valued expression for a fundamental set of solutions is given by
x_1 = \Bigl(\begin{bmatrix} 0 \\ 1 \end{bmatrix} \cos(3t) - \begin{bmatrix} -1 \\ 0 \end{bmatrix} \sin(3t)\Bigr) e^{2t} \quad\Rightarrow\quad x_1 = \begin{bmatrix} \sin(3t) \\ \cos(3t) \end{bmatrix} e^{2t},
x_2 = \Bigl(\begin{bmatrix} 0 \\ 1 \end{bmatrix} \sin(3t) + \begin{bmatrix} -1 \\ 0 \end{bmatrix} \cos(3t)\Bigr) e^{2t} \quad\Rightarrow\quad x_2 = \begin{bmatrix} -\cos(3t) \\ \sin(3t) \end{bmatrix} e^{2t}.
C
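Remark: A short numerical sketch, assuming numpy is available, extracts a and b from the complex eigenvector returned by numpy and checks that x1(t) solves the system. The normalization of the eigenvector differs from the one above, but any complex eigenvector produces a valid real solution.

# Build x1(t) = (a cos(beta t) - b sin(beta t)) e^{alpha t} from a complex eigenpair
# of A and check x1'(t) = A x1(t) by a centered difference (assumes numpy).
import numpy as np

A = np.array([[2.0, 3.0], [-3.0, 2.0]])
lam, V = np.linalg.eig(A)
k = np.argmax(lam.imag)              # pick the eigenvalue with positive imaginary part
alpha, beta = lam[k].real, lam[k].imag
a, b = V[:, k].real, V[:, k].imag

def x1(t):
    return (a * np.cos(beta * t) - b * np.sin(beta * t)) * np.exp(alpha * t)

t, h = 0.4, 1e-6
derivative = (x1(t + h) - x1(t - h)) / (2 * h)
assert np.allclose(derivative, A @ x1(t), atol=1e-5)
print("x1(t) satisfies x' = A x")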

We end with case (iii). There are not many possibilities left for a 2 × 2 real matrix that both
is diagonalizable and has a repeated eigenvalue. Such a matrix must be proportional to the identity
matrix.

Theorem 6.1.5. Every 2 × 2 diagonalizable matrix with repeated eigenvalue λ0 has the form
A = λ0 I.

Proof of Theorem 6.1.5: Since matrix A is diagonalizable, there exists an invertible matrix P such
that A = P D P^{-1}. Since A is 2 × 2 with a repeated eigenvalue λ0, then
D = \begin{bmatrix} \lambda_0 & 0 \\ 0 & \lambda_0 \end{bmatrix} = \lambda_0 I.
Put these two facts together,
A = P D P^{-1} = P \lambda_0 I P^{-1} = \lambda_0 P P^{-1} = \lambda_0 I \quad\Rightarrow\quad A = \begin{bmatrix} \lambda_0 & 0 \\ 0 & \lambda_0 \end{bmatrix}.

Remark: The general solution x of x' = λ0 I x is simple to write. Since any non-zero 2-vector is
an eigenvector of λ0 I, we choose the linearly independent set
\Bigl\{ v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \; v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Bigr\}.
Using these eigenvectors we can write the general solution,
x(t) = c_1 e^{\lambda_0 t} v_1 + c_2 e^{\lambda_0 t} v_2 = c_1 e^{\lambda_0 t} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2 e^{\lambda_0 t} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\Rightarrow\quad x(t) = e^{\lambda_0 t} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.
6.1.4. Non-Diagonalizable Systems. A 2 × 2 linear system might not be diagonalizable.
This can happen only when the coefficient matrix has a repeated eigenvalue and all eigenvectors
are proportional to each other. If we denote by λ the repeated eigenvalue of a 2 × 2 matrix A, and
by v an associated eigenvector, then one solution to the differential system x0 = A x is
x1 (t) = eλt v.
Every other eigenvector ṽ associated with λ is proportional to v. So any solution of the form ṽ eλt
is proportional to the solution above. The next result provides a linearly independent set of two
solutions to the system x0 = Ax associated with the repeated eigenvalue λ.

Theorem 6.1.6 (Repeated Eigenvalue). If a 2 × 2 matrix A has a repeated eigenvalue λ with only
one associated eigen-direction, given by the eigenvector v, then the differential system x0 (t) = A x(t)
has a linearly independent set of solutions
x_1(t) = e^{\lambda t} v, \qquad x_2(t) = e^{\lambda t} \bigl(v\, t + w\bigr),

where the vector w is one of infinitely many solutions of the algebraic linear system
(A − λI)w = v. (6.1.7)

Remark: The eigenvalue λ is the precise number that makes matrix (A − λI) not invertible, that
is, det(A − λI) = 0. This implies that an algebraic linear system with coefficient matrix (A − λI)
may or may not have solutions, depending on what the source vector is. The Theorem above says
that Eq. (6.1.7) has solutions when the source vector is v. The reason that this system has solutions
is that v is an eigenvector of A.
Proof of Theorem 6.1.6: One solution to the differential system is x1 (t) = eλt v. Inspired by the
reduction order method we look for a second solution of the form
x2 (t) = eλt u(t).
Inserting this function into the differential equation x0 = A x we get
u0 + λ u = A u ⇒ (A − λI) u = u0 .
We now introduce a power series expansion of the vector-valued function u,
u(t) = u0 + u1 t + u2 t2 + · · · ,
into the differential equation above,
(A − λI)(u0 + u1 t + u2 t2 + · · · ) = (u1 + 2u2 t + · · · ).
If we evaluate the equation above at t = 0, and then its derivative at t = 0, and so on, we get the
following infinite set of linear algebraic equations
(A − λI)u0 = u1 ,
(A − λI)u1 = 2u2 ,
(A − λI)u2 = 3u3
..
.
Here is where we use Cayley-Hamilton’s Theorem. Recall that the characteristic polynomial p(λ̃) =
det(A − λ̃I) has the form
p(λ̃) = λ̃2 − tr (A) λ̃ + det(A).
Cayley-Hamilton Theorem says that the matrix-valued polynomial p(A) = 0, that is,
A2 − tr (A) A + det(A) I = 0.
Since, in the case we are interested in, matrix A has a repeated root λ, then
p(λ̃) = (λ̃ − λ)2 = λ̃2 − 2λ λ̃ + λ2 .
Therefore, Cayley-Hamilton Theorem for the matrix in this Theorem has the form
0 = A2 − 2λ A + λ2 I ⇒ (A − λI)2 = 0.
This last equation is the one we need to solve the system for the vector-valued u. Multiply the first
equation in the system by (A − λI) and use that (A − λI)2 = 0, then we get
0 = (A − λI)2 u0 = (A − λI) u1 ⇒ (A − λI)u1 = 0.
This implies that u1 is an eigenvector of A with eigenvalue λ. We can denote it as u1 = v. Using
this information in the rest of the system we get
(A − λI)u0 = v,
(A − λI)v = 2u2 ⇒ u2 = 0,
(A − λI)u2 = 3u3 ⇒ u3 = 0,
..
.
We conclude that all terms u2 = u3 = · · · = 0. Denoting u0 = w we obtain the following system of
algebraic equations,
(A − λI)w = v,
(A − λI)v = 0.

For vectors v and w solution of the system above we get u(t) = w + tv. This means that the second
solution to the differential equation is
x2 (t) = eλt (tv + w).
This establishes the Theorem. 

Example 6.1.8. Find the fundamental solutions of the differential equation


 
x' = A x, \qquad A = \frac{1}{4}\begin{bmatrix} -6 & 4 \\ -1 & -2 \end{bmatrix}.
Solution: As usual, we start by finding the eigenvalues and eigenvectors of matrix A. The former are
the solutions of the characteristic equation
0 = \begin{vmatrix} \bigl(-\tfrac{3}{2} - \lambda\bigr) & 1 \\ -\tfrac{1}{4} & \bigl(-\tfrac{1}{2} - \lambda\bigr) \end{vmatrix}
= \Bigl(\lambda + \frac{3}{2}\Bigr)\Bigl(\lambda + \frac{1}{2}\Bigr) + \frac{1}{4} = \lambda^2 + 2\lambda + 1 = (\lambda + 1)^2.
Therefore, the solution is the repeated eigenvalue λ = −1. The associated eigenvectors are the
vectors v solution to the linear system (A + I)v = 0,
(A + I) = \begin{bmatrix} -\tfrac{1}{2} & 1 \\ -\tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix}
\rightarrow \begin{bmatrix} 1 & -2 \\ 1 & -2 \end{bmatrix}
\rightarrow \begin{bmatrix} 1 & -2 \\ 0 & 0 \end{bmatrix}
\quad\Rightarrow\quad v_1 = 2 v_2.
Choosing v_2 = 1, then v_1 = 2, and we obtain
\lambda = -1, \qquad v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.
Any other eigenvector associated to λ = −1 is proportional to the eigenvector above. The matrix
A above is not diagonalizable. So we follow Theorem 6.1.6 and we solve for a vector w the linear
system (A + I)w = v. The augmented matrix for this system is given by
\left[\begin{array}{cc|c} -\tfrac{1}{2} & 1 & 2 \\ -\tfrac{1}{4} & \tfrac{1}{2} & 1 \end{array}\right]
\rightarrow \left[\begin{array}{cc|c} 1 & -2 & -4 \\ 1 & -2 & -4 \end{array}\right]
\rightarrow \left[\begin{array}{cc|c} 1 & -2 & -4 \\ 0 & 0 & 0 \end{array}\right]
\quad\Rightarrow\quad w_1 = 2 w_2 - 4.
We have obtained infinitely many solutions given by
w = \begin{bmatrix} 2 \\ 1 \end{bmatrix} w_2 + \begin{bmatrix} -4 \\ 0 \end{bmatrix}.
As one could have imagined, given any solution w, then cv + w is also a solution for any c ∈ R. We
choose the simplest solution given by
w = \begin{bmatrix} -4 \\ 0 \end{bmatrix}.
Therefore, a fundamental set of solutions to the differential equation above is formed by
x_1(t) = \begin{bmatrix} 2 \\ 1 \end{bmatrix} e^{-t}, \qquad
x_2(t) = \Bigl(\begin{bmatrix} 2 \\ 1 \end{bmatrix} t + \begin{bmatrix} -4 \\ 0 \end{bmatrix}\Bigr) e^{-t}.     (6.1.8)
C
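Remark: The generalized eigenvector w can also be obtained with a computer algebra system; a minimal sketch assuming sympy is available:

# Solve (A - lambda I) w = v for the generalized eigenvector of Example 6.1.8
# (assumes sympy; the system is singular, so sympy returns a one-parameter family).
import sympy as sp

A = sp.Rational(1, 4) * sp.Matrix([[-6, 4], [-1, -2]])
lam = -1
v = sp.Matrix([2, 1])

w1, w2 = sp.symbols('w1 w2')
w = sp.Matrix([w1, w2])
family = sp.solve(list((A - lam * sp.eye(2)) * w - v), [w1, w2], dict=True)
print(family)   # w1 = 2*w2 - 4, with w2 free; choosing w2 = 0 gives w = (-4, 0)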

6.1.5. Exercises.

6.1.1.- . 6.1.2.- .

6.2. Two-Dimensional Phase Portraits


Figures are easier to understand than words. Words are easier to understand than equations. The
qualitative behavior of a function is often simpler to visualize from a graph than from an explicit or
implicit expression of the function. In this section we show graphical representations of the solutions
found in the previous section, which are solutions of 2 × 2 linear systems of differential equations.
We focus on phase portraits of these solutions and we characterize the trivial solution x = 0 as
stable, unstable, saddle, or center.

6.2.1. Review of Solutions Formulas. In the previous section we found explicit formu-
las for the solutions of 2 × 2 linear homogeneous systems of differential equations with constant
coefficients. We summarize the results from the previous section in the following theorem.

Theorem 6.2.1. A pair of fundamental solutions of a 2 × 2 system x0 = A x, where A is a constant


matrix, depends on the eigenpairs of A, say λ±, v±, as follows.
(a) If λ+ ≠ λ- and real, then A is diagonalizable and a pair of fundamental solutions is
x_+ = v_+ e^{\lambda_+ t}, \qquad x_- = v_- e^{\lambda_- t}.
(b) If λ± = α ± βi and v± = a ± bi, then A is diagonalizable and a pair of fundamental solutions is
x_1 = \bigl(a \cos(\beta t) - b \sin(\beta t)\bigr) e^{\alpha t},
x_2 = \bigl(a \sin(\beta t) + b \cos(\beta t)\bigr) e^{\alpha t}.
(c) If λ+ = λ- = λ0 and A is diagonalizable, then A = λ0 I and a pair of fundamental solutions is
x_+ = \begin{bmatrix} 1 \\ 0 \end{bmatrix} e^{\lambda_0 t}, \qquad x_- = \begin{bmatrix} 0 \\ 1 \end{bmatrix} e^{\lambda_0 t}.

(d) If λ+ = λ- = λ0 and A is not diagonalizable, then a pair of fundamental solutions is


x+ = v eλ0 t , x- = (t v + w) eλ0 t ,
where
(A − λ0 I)v = 0, (A − λ0 I)w = v.

This theorem has been proven in the previous section. We see that the formulas for fundamental
solutions of a 2×2 system x0 = Ax change considerably depending on the eigenpairs of the coefficient
matrix A. The main reason for this property of the solutions is that—unlike the eigenvalues, which
depend continuously on the matrix coefficients—the eigenvectors do not depend continuously on
the matrix coefficients. This means that two matrices with very similar coefficients will have very
similar eigenvalues but they might have very different eigenvectors.
We are interested in graphical representations of solutions to 2 × 2 systems of differential
equations. One possible graphical representation of a solution vector
 
x1 (t)
x(t) =
x2 (t)
is to graph each component, x1 and x2 , as functions of t. We have seen another way to graph a
2-vector-valued function. We plot the whole vector x(t) at t on the plane x1 , x2 . Each vector x(t)
is represented by its end point, while the whole solution x is represented by a line with arrows
pointing in the direction of increasing t. Such a figure is called a phase portrait or phase diagram
of a solution.
Phase portraits have a simple physical interpretation in the case that the solution vector x(t)
is the position function of a particle moving in a plane at the time t. In this case the phase portrait
is the trajectory of the particle. The arrows added to this trajectory indicate the motion of the
particle as time increases.

6.2.2. Real Distinct Eigenvalues. We now focus on solutions to the system x0 = Ax, in
the case that matrix A has two real eigenvalues λ+ 6= λ- . We study the case where both eigenvalues
are non-zero. (The case where one eigenvalue vanishes is left as an exercise.) So, the eigenvalues
belong to one of the following cases:
(i) λ+ > λ- > 0, both eigenvalues positive;
(ii) λ+ > 0 > λ- , one eigenvalue negative and the other positive;
(iii) 0 > λ+ > λ- , both eigenvalues negative.
The phase portrait of several solutions x(t) can be displayed in the same picture. If the
fundamental solutions are x+ and x- , then any solution is given by
x = c+ x+ + c- x- .
We indicate what solution we are plotting by specifying the values for the constants c+ and c- . A
phase diagram can be sketched by following these few steps.
(a) Plot the eigenvectors v+ and v- corresponding to the eigenvalues λ+ and λ- .
(b) Draw the whole lines parallel to these vectors and passing through the origin. These straight
lines correspond to solutions with either c+ or c- zero.
(c) Draw arrows on these lines to indicate how the solution changes as the variable t increases.
If t is interpreted as time, the arrows indicate how the solution changes into the future. The
arrows point towards the origin if the corresponding eigenvalue λ is negative, and they point
away from the origin if the eigenvalue is positive.
(d) Find the non-straight curves, which correspond to solutions with both coefficients c+ and c- non-zero.
Again, arrows on these curves indicate how the solution moves into the future.
Case λ+ > λ- > 0, Source, (Unstable Point).

Example 6.2.1. Sketch the phase diagram of the solutions to the differential equation
 
x' = A x, \qquad A = \frac{1}{4}\begin{bmatrix} 11 & 3 \\ 1 & 9 \end{bmatrix}.     (6.2.1)
Solution: The characteristic equation for this matrix A is given by
\det(A - \lambda I) = \lambda^2 - 5\lambda + 6 = 0 \quad\Rightarrow\quad \lambda_+ = 3, \quad \lambda_- = 2.
One can show that the corresponding eigenvectors are given by
v_+ = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad v_- = \begin{bmatrix} -2 \\ 2 \end{bmatrix}.
So the general solution to the differential equation above is given by
x(t) = c_+ v_+ e^{\lambda_+ t} + c_- v_- e^{\lambda_- t} \quad\Leftrightarrow\quad x(t) = c_+ \begin{bmatrix} 3 \\ 1 \end{bmatrix} e^{3t} + c_- \begin{bmatrix} -2 \\ 2 \end{bmatrix} e^{2t}.
In Fig. 1 we have sketched four curves, each representing a solution x corresponding to a particular
choice of the constants c+ and c- . These curves actually represent eight different solutions, for eight
different choices of the constants c+ and c- , as is described below. The arrows on these curves
represent the change in the solution as the variable t grows. Since both eigenvalues are positive,
the length of the solution vector always increases as t increases. The straight lines correspond to
the following four solutions:
c+ = 1, c- = 0; c+ = 0, c- = 1; c+ = −1, c- = 0; c+ = 0, c- = −1.
The curved lines on each quadrant correspond to the following four solutions:
c+ = 1, c- = 1; c+ = 1, c- = −1; c+ = −1, c- = 1; c+ = −1, c- = −1.
C
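Remark: A phase portrait like the one described here can be produced with a few lines of plotting code; the following is a sketch assuming numpy and matplotlib are available:

# Sketch the phase portrait of x' = A x for A = (1/4)[[11, 3], [1, 9]]
# by plotting x(t) = c+ v+ e^{3t} + c- v- e^{2t} for several (c+, c-)
# (assumes numpy and matplotlib).
import numpy as np
import matplotlib.pyplot as plt

v_plus, v_minus = np.array([3.0, 1.0]), np.array([-2.0, 2.0])
t = np.linspace(-2.0, 1.0, 400)

for c_plus in (-1, 0, 1):
    for c_minus in (-1, 0, 1):
        if c_plus == 0 and c_minus == 0:
            continue   # skip the trivial solution x = 0
        x = (c_plus * np.exp(3 * t)[:, None] * v_plus
             + c_minus * np.exp(2 * t)[:, None] * v_minus)
        plt.plot(x[:, 0], x[:, 1])

plt.xlabel('x1'); plt.ylabel('x2'); plt.title('Phase portrait: source at the origin')
plt.show()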

Figure 1. Eight solutions to Eq. (6.2.1), where λ+ > λ- > 0. The trivial solution x = 0 is called a source, or an unstable point.

Case λ+ > 0 > λ- , Saddle, (Unstable Point).

Example 6.2.2. Sketch the phase diagram of the solutions to the differential equation
 
x' = A x, \qquad A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}.     (6.2.2)
Solution: In a previous section we have computed the eigenvalues and eigenvectors of this coefficient
matrix, and the result is
\lambda_+ = 4, \; v_+ = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda_- = -2, \; v_- = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
Therefore, all solutions of the differential equation above are
x(t) = c_+ v_+ e^{\lambda_+ t} + c_- v_- e^{\lambda_- t} \quad\Leftrightarrow\quad x(t) = c_+ \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{4t} + c_- \begin{bmatrix} -1 \\ 1 \end{bmatrix} e^{-2t},
In Fig. 2 we have sketched four curves, each representing a solution x corresponding to a particular
choice of the constants c+ and c- . These curves actually represent eight different solutions, for
eight different choices of the constants c+ and c- , as is described below. The arrows on these
curves represent the change in the solution as the variable t grows. The part of the solution with
positive eigenvalue increases exponentially when t grows, while the part of the solution with negative
eigenvalue decreases exponentially when t grows. The straight lines correspond to the following four
solutions:
c+ = 1, c- = 0; c+ = 0, c- = 1; c+ = −1, c- = 0; c+ = 0, c- = −1.
The curved lines on each quadrant correspond to the following four solutions:
c+ = 1, c- = 1; c+ = 1, c- = −1; c+ = −1, c- = 1; c+ = −1, c- = −1.
C

Case 0 > λ+ > λ- , Sink, (Stable Point).

Example 6.2.3. Sketch the phase diagram of the solutions to the differential equation
 
x' = A x, \qquad A = \frac{1}{4}\begin{bmatrix} -9 & 3 \\ 1 & -11 \end{bmatrix}.     (6.2.3)

Figure 2. Several solutions to Eq. (6.2.2), λ+ > 0 > λ- . The trivial solution x = 0 is called a saddle or an unstable point.

Solution: The characteristic equation for this matrix A is given by
\det(A - \lambda I) = \lambda^2 + 5\lambda + 6 = 0 \quad\Rightarrow\quad \lambda_+ = -2, \quad \lambda_- = -3.
One can show that the corresponding eigenvectors are given by
v_+ = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad v_- = \begin{bmatrix} -2 \\ 2 \end{bmatrix}.
So the general solution to the differential equation above is given by
x(t) = c_+ v_+ e^{\lambda_+ t} + c_- v_- e^{\lambda_- t} \quad\Leftrightarrow\quad x(t) = c_+ \begin{bmatrix} 3 \\ 1 \end{bmatrix} e^{-2t} + c_- \begin{bmatrix} -2 \\ 2 \end{bmatrix} e^{-3t}.
In Fig. 3 we have sketched four curves, each representing a solution x corresponding to a particular
choice of the constants c+ and c- . These curves actually represent eight different solutions, for
eight different choices of the constants c+ and c- , as is described below. The arrows on these curves
represent the change in the solution as the variable t grows. Since both eigenvalues are negative, the
length of the solution vector always decreases as t grows and the solution vector always approaches
zero. The straight lines correspond to the following four solutions:
c+ = 1, c- = 0; c+ = 0, c- = 1; c+ = −1, c- = 0; c+ = 0, c- = −1.
The curved lines on each quadrant correspond to the following four solutions:
c+ = 1, c- = 1; c+ = 1, c- = −1; c+ = −1, c- = 1; c+ = −1, c- = −1.
C

6.2.3. Complex Eigenvalues. A real-valued matrix may have complex-valued eigenvalues.


These complex eigenvalues come in pairs, because the matrix is real-valued. If λ is one of these
complex eigenvalues, then its complex conjugate λ̄ is also an eigenvalue. A usual notation is λ± = α ± iβ, with α, β ∈ R.
The same happens with their eigenvectors, which are written as v± = a ± ib, with a, b ∈ R2 . When
the matrix is the coefficient matrix of a differential equation,
x0 = A x,

Figure 3. Several solutions to Eq. (6.2.3), where 0 > λ+ > λ- . The trivial solution x = 0 is called a sink or a stable point.

the solutions x+ (t) = v+ eλ+ t and x- (t) = v- eλ- t are complex-valued. We know that real-valued
fundamental solutions can be constructed with the real part and the imaginary part of the solution
x+ . The resulting formulas are
x_1(t) = \bigl(a \cos(\beta t) - b \sin(\beta t)\bigr) e^{\alpha t}, \qquad x_2(t) = \bigl(a \sin(\beta t) + b \cos(\beta t)\bigr) e^{\alpha t}.     (6.2.4)
These real-valued solutions are used to draw the phase portraits. We recall how to obtain real-valued
fundamental solutions in the following example.

Example 6.2.4. Find a real-valued set of fundamental solutions to the differential equation below
and sketch a phase portrait, where
 
x' = A x, \qquad A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.

Solution: We know from a previous section that the eigenpairs of the coefficient matrix are
 
\lambda_\pm = 2 \pm 3i, \qquad v_\pm = \begin{bmatrix} \mp i \\ 1 \end{bmatrix}.
Writing them in real and imaginary parts, λ± = α ± iβ and v± = a ± ib, we get
\alpha = 2, \quad \beta = 3, \qquad a = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad b = \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
These eigenvalues and eigenvectors imply the following real-valued fundamental solutions,
\Bigl\{ x_1(t) = \begin{bmatrix} \sin(3t) \\ \cos(3t) \end{bmatrix} e^{2t}, \quad x_2(t) = \begin{bmatrix} -\cos(3t) \\ \sin(3t) \end{bmatrix} e^{2t} \Bigr\}.     (6.2.5)
Before plotting the phase portrait of these solutions it is useful to plot the phase portrait of the
following functions
\tilde{x}_1(t) = \begin{bmatrix} \sin(3t) \\ \cos(3t) \end{bmatrix}, \qquad \tilde{x}_2(t) = \begin{bmatrix} -\cos(3t) \\ \sin(3t) \end{bmatrix}.
These functions x̃1 and x̃2 are not solutions of the differential equation. But they are simple to
plot, because both vector functions have length one for all t,
\|\tilde{x}_1(t)\| = \sqrt{\sin^2(3t) + \cos^2(3t)} = 1, \qquad \|\tilde{x}_2(t)\| = \sqrt{\cos^2(3t) + \sin^2(3t)} = 1.

Therefore, one can see that these functions describe a circle in the x1x2-plane, represented by the dashed line in Fig. 4. Once we know that, we realize that the fundamental solutions of the differential equation can be written as
x1(t) = x̃1(t) e^{2t},   x2(t) = x̃2(t) e^{2t}.
Therefore, the solutions x1 and x2 must be spirals going away from the origin as t increases, see
Fig. 4. C

Figure 4. The graph of the fundamental solutions x1 and x2 in Eq. (6.2.5). The trivial solution x = 0 is an unstable spiral.

In general, when a coefficient matrix A has complex eigenpairs,
λ± = α ± iβ,   v± = a ± ib,
real-valued fundamental solutions of the differential equation x′ = Ax are given by
x1(t) = (a cos(βt) − b sin(βt)) e^{αt},   x2(t) = (a sin(βt) + b cos(βt)) e^{αt}.

The phase portrait of these solutions can be constructed as follows. First plot the vectors a, b. Then plot the auxiliary functions
x̃1(t) = a cos(βt) − b sin(βt),   x̃2(t) = a sin(βt) + b cos(βt).
It is simple to see that the phase portrait of the auxiliary functions is an ellipse with axes given by the vectors a and b. These ellipses are plotted below in dashed lines. Since
x1(t) = x̃1(t) e^{αt},   x2(t) = x̃2(t) e^{αt},
the phase portraits of the solutions x1 and x2 are spirals. As t increases, the solution spirals out for α > 0, stays on the ellipse for α = 0, and spirals into the origin for α < 0.
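Remark: The construction above is easy to reproduce numerically. The sketch below (an illustration assuming NumPy and Matplotlib) uses the data of Example 6.2.4, α = 2, β = 3, a = (0, 1)^T, b = (−1, 0)^T, to plot the auxiliary curve and the spiral x1(t) = x̃1(t) e^{αt}.

```python
# Sketch: real-valued fundamental solution built from a complex eigenpair.
# The numbers below are those of Example 6.2.4.
import numpy as np
import matplotlib.pyplot as plt

alpha, beta = 2.0, 3.0                       # lambda_{+-} = alpha +- i beta
a = np.array([0.0, 1.0])                     # real part of v_+
b = np.array([-1.0, 0.0])                    # imaginary part of v_+

t = np.linspace(0.0, 2.0, 600)
aux = np.outer(a, np.cos(beta * t)) - np.outer(b, np.sin(beta * t))  # auxiliary curve (alpha = 0)
x1 = aux * np.exp(alpha * t)                                         # spiral solution

plt.plot(aux[0], aux[1], "--", label="auxiliary curve (alpha = 0)")
plt.plot(x1[0], x1[1], label="x1(t), alpha > 0")
plt.legend(); plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```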
We now choose arbitrary vectors a and b and we sketch phase portraits of x1 and x2 for a few
choices of α. The result is given in Fig. 5.
We now make a different choice for the vector b, and we repeat the three phase portraits given
above; for α > 0, α = 0, and α < 0. The result is given in Fig. 6. Comparing Figs. 5 and 6 shows

Figure 5. Fundamental solutions x1 and x2 in Eq. (6.2.4) for α > 0, α = 0, and α < 0. The relative positions of a and b determine the rotation direction. Compare with Fig. 6. The trivial solution x = 0 is called an unstable spiral for α > 0, a center for α = 0, and a stable spiral for α < 0.

that the relative directions of the vectors a and b determine the rotation direction of the solutions as t increases.

Figure 6. Fundamental solutions x1 and x2 in Eq. (6.2.4) for α > 0, α = 0, and α < 0. The relative positions of a and b determine the rotation direction. Compare with Fig. 5. The trivial solution x = 0 is called an unstable spiral for α > 0, a center for α = 0, and a stable spiral for α < 0.

6.2.4. Repeated Eigenvalues. A matrix with repeated eigenvalues may or may not be di-
agonalizable. If a 2 × 2 matrix A is diagonalizable with repeated eigenvalues, then we know that
this matrix is proportional to the identity matrix, A = λ0 I, with λ0 the repeated eigenvalue. We
saw in a previous section that the general solution of a differential system with such coefficient
matrix is
x(t) = (c1, c2)^T e^{λ0 t}.
Phase portraits of these solutions are just straight lines, starting from the origin for λ0 > 0, or
ending at the origin for λ0 < 0.
Non-diagonalizable 2 × 2 differential systems are more interesting. If x0 = A x is such a system,
it has fundamental solutions
x1(t) = v e^{λ0 t},   x2(t) = (v t + w) e^{λ0 t},   (6.2.6)
where λ0 is the repeated eigenvalue of A with eigenvector v, and vector w is any solution of the
linear algebraic system
(A − λ0 I)w = v.
The phase portrait of these fundamental solutions is given in Fig. 7. To construct this figure, start by drawing the vectors v and w. The solution x1 is simpler to draw than x2, since the former is
a straight semi-line starting at the origin and parallel to v.

Figure 7. Functions x1, x2 in Eq. (6.2.6) for the cases λ0 > 0 and λ0 < 0. The trivial solution x = 0 is a source or an unstable point for λ0 > 0, and a sink or a stable point for λ0 < 0.

The solution x2 is more difficult to draw. One way is to first draw the trajectory of the auxiliary
function
x̃2 = v t + w.
This is a straight line parallel to v passing through w, one of the dashed lines in Fig. 7. The solution x2 can be written as
x2(t) = x̃2(t) e^{λ0 t}.
Consider the case λ0 > 0. For t > 0 the point x2(t) lies farther from the origin than x̃2(t), and the opposite happens for t < 0. In the limit t → −∞ the solution values x2(t) approach the origin, since the exponential factor e^{λ0 t} vanishes faster than the linear factor t grows. The result is the curved trajectory in the first picture of Fig. 7. The other picture, for λ0 < 0, can be constructed following similar ideas.
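Remark: The generalized vector w can be computed with a least-squares solve, since A − λ0 I is singular. The sketch below (assuming NumPy; the matrix A is a hypothetical example, not taken from the text) illustrates the construction of x1 and x2 in Eq. (6.2.6).

```python
# Sketch: fundamental solutions for a repeated, non-diagonalizable eigenvalue.
# The matrix A below is a hypothetical example, chosen only for illustration.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # repeated eigenvalue lambda_0 = 1, one eigenvector
lam0 = 1.0
v = np.array([1.0, 0.0])                   # eigenvector of A for lambda_0

# (A - lam0 I) is singular, so use least squares to pick one solution w of (A - lam0 I) w = v.
w, *_ = np.linalg.lstsq(A - lam0 * np.eye(2), v, rcond=None)

def x1(t):
    return v * np.exp(lam0 * t)

def x2(t):
    return (v * t + w) * np.exp(lam0 * t)

# Quick consistency check: x2 should satisfy x' = A x.
for t in (0.0, 0.5, 1.0):
    h = 1e-6
    assert np.allclose((x2(t + h) - x2(t - h)) / (2 * h), A @ x2(t), atol=1e-4)
```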

6.2.5. The Stability of Linear Systems. In the last part of this section we focus on a particular solution of the system x′ = Ax, the trivial solution x0 = 0. First notice that x0 = 0 is indeed a solution of the differential equation, since x0′ = 0 = A 0 = A x0. Second, this is a solution that is t-independent, that is, a constant solution. Third, this solution is not very interesting in itself, which is why we have not mentioned it at all until now.
However, we are now interested in the behavior of solutions with initial data near the trivial solution x0 = 0. We introduce two main characterizations of this behavior. The first characterization is less precise than the second, but because of that it can be extended to a larger class of systems, including n × n systems and nonlinear systems.

Definition 6.2.2. The solution x0 = 0 of a 2 × 2 linear system x′ = Ax is:


(1) stable iff all solutions x(t) with x(0) sufficiently close to x0 = 0 remain close to it for all t > 0.
(2) asymptotically stable iff all solutions x(t) with x(0) sufficiently close to x0 = 0 satisfy that
x(t) → x0 as t → ∞.
Otherwise, the trivial solution x0 is called unstable.

This characterization of the trivial solution is usually called the stability of the trivial solution.
It is not difficult to relate the stability of the trivial solution with the sign of the eigenvalues of the
coefficient matrix of the linear system.

Theorem 6.2.3 (Stability I). Let x(t) be the solution of a 2 × 2 linear system x′ = Ax, with det(A) ≠ 0 and initial condition x(0) = x1. Then the trivial solution x0 = 0 is:
(1) stable if both eigenvalues of A have non-positive real parts.
(2) asymptotically stable if both eigenvalues of A have negative real parts.

Remark: If the trivial solution is asymptotically stable, then it is also stable. The converse state-
ment is not true.
Proof of Theorem 6.2.3: We start with part (2). If both eigenvalues of the coefficient matrix A have negative real parts, then we have the following possibilities.
(a) If the eigenvalues are real and λ- < λ+ < 0, then the general solution is
x(t) = c+ v+ e^{λ+ t} + c- v- e^{λ- t}.
Therefore, the analysis done in this section says that x(t) → x0 = 0 as t → ∞.
(b) If the eigenvalues are real and λ+ = λ- = λ0 < 0, then the general solution is
x(t) = c1 v0 e^{λ0 t} + c2 (t v0 + w) e^{λ0 t}.
Therefore, the analysis done in this section says that x(t) → x0 = 0 as t → ∞.
(c) If the eigenvalues are complex, λ± = α ± iβ, with α < 0, then
x1(t) = (a cos(βt) − b sin(βt)) e^{αt},   x2(t) = (a sin(βt) + b cos(βt)) e^{αt},
are fundamental solutions of the differential equation. Since α < 0, it is simple to see that
x(t) → x0 = 0 as t → ∞.
Regarding part (1), it includes all cases in part (2). But part (1) also includes the following
two cases.
(a) Suppose that one eigenvalue of A is λ1 = 0, with eigenvector v1 . In this case one fundamental
solution is x1 = v1 , a constant, nonzero solution. Then a picture helps to understand that for
initial data x(0) close enough to x0 = 0, the solution x(t) remains close to x0 for t > 0.
(b) The last case is when the coefficient matrix A has pure imaginary eigenvalues, λ± = ±βi. In this case the solutions are ellipses with x0 = 0 in their interior.

The second characterization of the trivial solution x0 = 0 is more precise, but some parts of
this characterization cannot be extended to more general systems. For example, the saddle point
characterization below applies only to 2 × 2 systems.

Definition 6.2.4. The solution x0 = 0 of a 2 × 2 linear system x′ = Ax is:


(a) a source iff for any initial condition x(0) ≠ 0 the corresponding solution x(t) satisfies lim_{t→∞} ‖x(t)‖ = ∞.
(b) a sink iff for any initial condition x(0) the corresponding solution x(t) satisfies lim_{t→∞} x(t) = 0.

(c) a saddle iff for some initial data x(0) the corresponding solution x(t) behaves as in (a) and for
other initial data the solution behaves as in (b).
(d) a center iff for any initial data x(0) ≠ 0 the corresponding solution x(t) describes a closed periodic trajectory around x0 = 0.

Recall that in Theorem 6.2.1 we have explicit formulas for all solutions of the 2 × 2 system
x′ = Ax, for any matrix A. From those formulas it is not difficult to prove the following result.

Theorem 6.2.5 (Stability II). Let x(t) be the solution of a 2 × 2 linear system x′ = Ax, with det(A) ≠ 0 and initial condition x(0) = x1. Then the trivial solution x0 = 0 is:
(i) a source if both eigenvalues of A have positive real parts.
(ii) a sink if both eigenvalues of A have negative real parts.
(iii) a saddle if one eigenvalue of A is positive and the other is negative.
(iv) a center if both eigenvalues of A are purely imaginary.

Proof of Theorem 6.2.5: Parts (i) and (ii) are simple to prove. From the solution formulas given
above in this section one can see the following. If the coefficient matrix A has both eigenvalues
with positive real parts, then for all initial data x(0) 6= 0 the corresponding solutions satisfy that
kx(t)k → ∞ as t → ∞. This shows part (i). If the eigenvalues have negative real parts, then
x(t) → 0 as t → ∞.
Part (iii) is also simple to prove using the solution formulas from the beginning of this section.
If the initial data x(0) lies on the eigenspace of A with negative eigenvalue, then the corresponding
solution satisfies x(t) → 0 as t → ∞. For any other initial data, the corresponding solution satisfies
that kx(t)k → ∞ as t → ∞.
Finally, part (iv) is simple to see from the solution formulas at the beginning of this section.
If A has pure imaginary eigenvalues, then the solutions describe ellipses around x0 = 0, which are
closed periodic trajectories. 
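Remark: The classification in Theorem 6.2.5 is easy to automate. The sketch below (a minimal helper assuming NumPy, shown only as an illustration) reads off the type of the trivial solution from the real and imaginary parts of the eigenvalues of A.

```python
# Sketch: classify the trivial solution of x' = A x by the eigenvalues of A,
# following Theorem 6.2.5 (assumes det(A) != 0).
import numpy as np

def classify(A, tol=1e-12):
    l1, l2 = np.linalg.eigvals(np.asarray(A, dtype=float))
    re1, re2 = l1.real, l2.real
    if re1 > tol and re2 > tol:
        return "source"
    if re1 < -tol and re2 < -tol:
        return "sink"
    if re1 * re2 < -tol**2:
        return "saddle"
    if abs(re1) <= tol and abs(re2) <= tol and abs(l1.imag) > tol:
        return "center"
    return "borderline (not covered by Theorem 6.2.5)"

print(classify([[2, 3], [-3, 2]]))    # eigenvalues 2 +- 3i -> source (spiral)
print(classify([[0, 1], [-1, 0]]))    # eigenvalues +- i   -> center
```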

6.2.6. Exercises.

6.2.1.- . 6.2.2.- .

6.3. Two-Dimensional Nonlinear Systems


We now consider two-dimensional autonomous nonlinear systems of differential equations. We have already seen these systems in § 4.1 and § 4.2, where we introduced them, defined their equilibrium solutions, and described the vector and direction fields determined by the equations. In this section we show
that solutions to certain nonlinear systems near equilibrium solutions behave in a similar way as
solutions of appropriately chosen linear systems. Therefore, the qualitative behavior of solutions to
certain nonlinear systems can be obtained by studying solutions of several linear systems.
6.3.1. 2 × 2 Autonomous Systems. We start by reviewing the definition of a 2 × 2, first order, autonomous system of differential equations. This time we use vector notation to write the system.

Definition 6.3.1. A 2×2 first order autonomous differential system of equations in the variable
x(t) = (x1 (t), x2 (t)) is
x′ = F(x),   (6.3.1)
where x′ = dx/dt, and the vector field F does not depend explicitly on t.

Remarks:
• A solution of equation (6.3.1) is a curve x(t) = (x1 (t), x2 (t)) in the x1 x2 -plane, called the
phase space, where the independent variable t is the parameter of the curve.
• The vector tangent to the solution curve x(t) is its t-derivative, which in components is given by
x′ = ⟨x1′, x2′⟩ = (x1′, x2′)^T.
We will use both notations, rows and columns, to denote vector components.
• The right-hand side in equation (6.3.1) defines the vector field of the equation,
F = ⟨f1, f2⟩ = (f1, f2)^T.
• In components, equation (6.3.1) is given by
x1′ = f1(x1, x2),
x2′ = f2(x1, x2).
6.3.2. Critical Points and Linearization. It is hard to find formulas for solutions of non-
linear systems of differential equations. And even in the case that one can find such formulas, they
often are too complicated to provide much insight into the solution behavior. Instead, we will try
to get a qualitative understanding of the solutions of nonlinear systems. As we did in a previous
section, we will try to get an idea of the phase portrait of solutions without solving the differential
equation.
In § 4.2 we saw one way to do this: plot the vector field or the direction field of the differential
equation. In this section we do something else. We first find the constant solutions, called equilib-
rium solutions, or equilibrium points, or critical points, just as we did in § 4.2. Then we study the
behavior of solutions near the equilibrium solutions. It turns out that, sometimes, this behavior
can be obtained from studying solutions of linear systems of differential equations. In this section
we study when this is possible and how to obtain the appropriate linear system to study.
We start by recalling the definition of equilibrium solutions from § 4.2.

Definition 6.3.2. An equilibrium solution of the autonomous system x′ = F(x) is a constant solution, that is, a point x0 in the phase space satisfying
F(x0) = 0.

Remarks:
• Equilibrium solutions are also called equilibrium points, or critical points.

• In components, an equilibrium solution has the form x0 = (x01 , x02 ), and the field is
F = hf1 , f2 i, so the equilibrium solutions must satisfy the equations
f1 (x01 , x02 ) = 0
f2 (x01 , x02 ) = 0.

Example 6.3.1. Find all the critical points of the two-dimensional (decoupled) system
x1′ = −x1 + (x1)³,
x2′ = −2 x2.

Solution: We need to find all constant solutions x = (x1, x2) of
−x1 + (x1)³ = 0,
−2 x2 = 0.
From the second equation we get x2 = 0. From the first equation we get
x1 ((x1)² − 1) = 0  ⇒  x1 = 0, or x1 = ±1.
Therefore, we get three critical points, x0 = (0, 0), x1 = (1, 0), x2 = (−1, 0). C
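Remark: Critical points can also be found symbolically. The snippet below is a sketch using SymPy (an assumption about the tooling, not part of the text) applied to the decoupled system of this example.

```python
# Sketch: solve f1 = f2 = 0 symbolically for the system of Example 6.3.1.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f1 = -x1 + x1**3
f2 = -2*x2

critical_points = sp.solve([f1, f2], [x1, x2], dict=True)
print(critical_points)   # three points: (0, 0), (1, 0), (-1, 0)
```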

Once we know the constant solutions of a nonlinear differential system we study the equation
itself near these constant solutions. Consider the two-dimensional system
x1′ = f1(x1, x2),
x2′ = f2(x1, x2).
Assume that f1 , f2 have Taylor expansions at x0 = (x01 , x02 ). We denote
u1 = (x1 − x01 ), u2 = (x2 − x02 ),
and
f1^0 = f1(x01, x02),   f2^0 = f2(x01, x02).
Then, by the Taylor expansion theorem,
f1(x1, x2) = f1^0 + (∂f1/∂x1)|_{x0} u1 + (∂f1/∂x2)|_{x0} u2 + O((u1)², (u2)², u1 u2),
f2(x1, x2) = f2^0 + (∂f2/∂x1)|_{x0} u1 + (∂f2/∂x2)|_{x0} u2 + O((u1)², (u2)², u1 u2),
where O((u1)², (u2)², u1 u2) denotes quadratic terms in u1 and u2. Let us simplify the notation a bit further. Let us denote
∂1 f1 = (∂f1/∂x1)|_{x0},   ∂2 f1 = (∂f1/∂x2)|_{x0},   ∂1 f2 = (∂f2/∂x1)|_{x0},   ∂2 f2 = (∂f2/∂x2)|_{x0}.
Then the Taylor expansion of F has the form
f1(x1, x2) = f1^0 + (∂1 f1) u1 + (∂2 f1) u2 + O((u1)², (u2)², u1 u2),
f2(x1, x2) = f2^0 + (∂1 f2) u1 + (∂2 f2) u2 + O((u1)², (u2)², u1 u2).
We now use this Taylor expansion of the field F in the differential equation x′ = F(x). Recall that
x1 = x01 + u1,   x2 = x02 + u2,
and that x01 and x02 are constants; then
u1′ = f1^0 + (∂1 f1) u1 + (∂2 f1) u2 + O((u1)², (u2)², u1 u2),
u2′ = f2^0 + (∂1 f2) u1 + (∂2 f2) u2 + O((u1)², (u2)², u1 u2).




Let us write this differential equation using vector notation. If we introduce the vectors and the matrix
u = (u1, u2)^T,   F0 = (f1^0, f2^0)^T,   DF0 = [∂1 f1  ∂2 f1; ∂1 f2  ∂2 f2],
then we have that
x′ = F(x)  ⇔  u′ = F0 + (DF0) u + O((u1)², (u2)², u1 u2).
In the case that x0 is a critical point, then F0 = 0. In this case we have that
x′ = F(x)  ⇔  u′ = (DF0) u + O((u1)², (u2)², u1 u2).
The relation above says that, when x is close to x0, the equation coefficients of x′ = F(x) are close to the coefficients of the linear differential equation u′ = (DF0) u. For this reason, we give this linear differential equation a name.

Definition 6.3.3. The linearization of a 2 × 2 system x′ = F(x) at a critical point x0 is the 2 × 2 linear system
u′ = (DF0) u,
where u = x − x0, and we have introduced the Jacobian, or derivative, matrix at x0,
DF0 = [ (∂f1/∂x1)|_{x0}  (∂f1/∂x2)|_{x0} ; (∂f2/∂x1)|_{x0}  (∂f2/∂x2)|_{x0} ] = [∂1 f1  ∂2 f1; ∂1 f2  ∂2 f2].

Remark: In components, the nonlinear system is
x1′ = f1(x1, x2),
x2′ = f2(x1, x2),
and the linearization at x0 is
(u1′, u2′)^T = [∂1 f1  ∂2 f1; ∂1 f2  ∂2 f2] (u1, u2)^T.

Once we have these definitions, we can summarize the calculation we did above.

Theorem 6.3.4. If a 2 × 2 autonomous system x′ = F(x) has a critical point x0, then in a neighborhood of x0 the equation coefficients of x′ = F(x) are close to the equation coefficients of its linearization at x0, given by u′ = (DF0) u.

Remark: The proof is the calculation given above the Definition 6.3.3. This result says that near a
critical point, the equation coefficients of the nonlinear system are close to the equation coefficients
of its linearization at the critical point. This is a result about the equations, not their solutions.
Any relation between solutions is hard to prove. This is the main subject of the Hartman-Grobman
Theorem 6.3.6, which can be established for only a particular type of nonlinear systems.

Example 6.3.2. Find the linearization at every critical point of the nonlinear system
x1′ = −x1 + (x1)³,
x2′ = −2 x2.

Solution: We found earlier that this system has three critical points,
x0 = (0, 0),   x1 = (1, 0),   x2 = (−1, 0).
This means we need to compute three linearizations, one for each critical point. We start by computing the derivative matrix at an arbitrary point x,
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ ∂(−x1 + x1³)/∂x1   ∂(−x1 + x1³)/∂x2 ; ∂(−2x2)/∂x1   ∂(−2x2)/∂x2 ],

so we get that
DF(x) = [−1 + 3 x1²  0; 0  −2].
We only need to evaluate this matrix DF at the critical points. We start with x0,
x0 = (0, 0)  ⇒  DF0 = [−1 0; 0 −2]  ⇒  (u1′, u2′)^T = [−1 0; 0 −2] (u1, u2)^T.
The Jacobian at x1 and x2 is the same, so we get the same linearization at these points,
x1 = (1, 0)  ⇒  DF1 = [2 0; 0 −2]  ⇒  (u1′, u2′)^T = [2 0; 0 −2] (u1, u2)^T,
x2 = (−1, 0)  ⇒  DF2 = [2 0; 0 −2]  ⇒  (u1′, u2′)^T = [2 0; 0 −2] (u1, u2)^T.
C

We classify critical points of nonlinear systems by the eigenvalues of their linearizations.

Definition 6.3.5. A critical point x0 of a two-dimensional system x 0 = F(x) is a:


(a) source node iff both eigenvalues of DF0 are real and positive;
(b) source spiral iff both eigenvalues of DF0 are complex with positive real parts;
(c) sink node iff both eigenvalues of DF0 are real and negative;
(d) sink spiral iff both eigenvalues of DF0 are complex with negative real part;
(e) saddle node, iff both eigenvalues of DF0 are real, one positive, the other negative;
(f ) center, iff both eigenvalues of DF0 are pure imaginary;
(g) higher order critical point iff at least one eigenvalue of DF0 is zero.
A critical point x0 is called hyperbolic iff it belongs to cases (a)-(e), that is, iff the real parts of all eigenvalues of DF0 are nonzero.

Example 6.3.3. Classify all the critical points of the nonlinear system
x1′ = −x1 + (x1)³,
x2′ = −2 x2.

Solution: We already know that this system has three critical points,
x0 = (0, 0), x1 = (1, 0), x2 = (−1, 0).
We have already computed the linearizations at these critical points too.
   
−1 0 2 0
DF0 = , DF1 = DF2 = .
0 −2 0 −2
We now need to compute the eigenvalues of the Jacobian matrices above.
• For x0 we have λ+ = −1, λ- = −2, so x0 is a sink node.
• For x1 and x2 we have λ+ = 2, λ- = −2, so x1 and x2 are saddle nodes.
C
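Remark: The linearizations and their eigenvalues in Examples 6.3.2 and 6.3.3 can be checked with a computer algebra system. The sketch below (using SymPy, as an illustration only) builds the Jacobian matrix DF and evaluates its eigenvalues at each critical point.

```python
# Sketch: Jacobian of the field and its eigenvalues at each critical point,
# for the system x1' = -x1 + x1^3, x2' = -2 x2 of Examples 6.3.2-6.3.3.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
F = sp.Matrix([-x1 + x1**3, -2*x2])
DF = F.jacobian([x1, x2])                     # [[-1 + 3*x1**2, 0], [0, -2]]

for point in [(0, 0), (1, 0), (-1, 0)]:
    DF0 = DF.subs({x1: point[0], x2: point[1]})
    print(point, DF0.eigenvals())             # {-1, -2} at (0,0); {2, -2} at (+-1, 0)
```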

We have seen that in a neighborhood of a critical point the equation coefficients are close
to the equation coefficients of its linearization at that critical point. This is a relation between
equation coefficients. Now we relate their corresponding solutions. It turns out that such relation
between solutions can be established when the Jacobian matrix at a critical point has eigenvalues
with nonzero real part. In this case, the solutions of the nonlinear system near that critical point
are close to the solutions of the linearization at that critical point. We summarize this result in the
following statement.

Theorem 6.3.6 (Hartman-Grobman). Consider a two-dimensional nonlinear autonomous sys-


tem with a continuously differentiable field F,
x′ = F(x),
and consider its linearization at a hyperbolic critical point x0,
u′ = (DF0) u.
Then, there is a neighborhood of the hyperbolic critical point x0 where all the solutions of the
linear system can be transformed into solutions of the nonlinear system by a continuous, invertible,
transformation.

Remark: The Hartman-Grobman theorem implies that the phase portrait of the linear system
in a neighborhood of a hyperbolic critical point can be transformed into the phase portrait of the
nonlinear system by a continuous, invertible, transformation. When that happens we say that the
two phase portraits are topologically equivalent.
Remark: This theorem says that, for hyperbolic critical points, the phase portrait of the lineariza-
tion at the critical point is enough to determine the phase portrait of the nonlinear system near
that critical point.

Example 6.3.4. Use the Hartman-Grobman theorem to sketch the phase portrait of
x1′ = −x1 + (x1)³,
x2′ = −2 x2.

Solution: We have found before that the critical points are


x0 = (0, 0), x1 = (1, 0), x2 = (−1, 0),
where x0 is a sink node and x1 , x2 are saddle nodes.

The phase portrait of the linearized systems at the critical points is given in Fig. 8. These critical points all have linearizations with eigenvalues having nonzero real parts. This means that the critical points are hyperbolic, so we can use the Hartman-Grobman theorem. This theorem says that the phase portrait in Fig. 8 is precisely the phase portrait of the nonlinear system in this example.

Figure 8. Phase portraits of the linear systems at x0, x1, and x2.

Since we now know that Fig. 8 is also the phase portrait of the nonlinear system, we only need to fill in the gaps in that phase portrait. In this example, a decoupled system, we can complete the phase portrait from the symmetries of the solution. Indeed, in the x2 direction all trajectories must decay exponentially to the x2 = 0 line. In the x1 direction, all trajectories are attracted to x1 = 0 and repelled from x1 = ±1. The vertical lines x1 = 0 and x1 = ±1 are invariant, since x1′ = 0 on these lines; hence any trajectory that starts on one of these lines stays on it. Similarly, x2 = 0 is an invariant horizontal line. We also note that the phase portrait must be symmetric in both the x1 and x2 axes, since the equations are invariant under the transformations x1 → −x1 and x2 → −x2. Putting all this extra information together we arrive at the phase portrait in Fig. 9.
C
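Remark: The gaps in the phase portrait can also be filled in numerically by plotting the direction field of the nonlinear system. The sketch below (Matplotlib's streamplot, an assumption about the tooling) reproduces the qualitative picture of Fig. 9.

```python
# Sketch: numerically drawn phase portrait of x1' = -x1 + x1^3, x2' = -2 x2.
import numpy as np
import matplotlib.pyplot as plt

x1, x2 = np.meshgrid(np.linspace(-1.5, 1.5, 30), np.linspace(-1.5, 1.5, 30))
u = -x1 + x1**3          # x1' component of the vector field
v = -2 * x2              # x2' component of the vector field

plt.streamplot(x1, x2, u, v, density=1.2)
plt.plot([0, 1, -1], [0, 0, 0], 'ko')        # the three critical points
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```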
Figure 9. Phase portrait of the nonlinear system in Example 6.3.4.

6.3.3. Exercises.

6.3.1.- . 6.3.2.- .
6.4. APPLICATIONS OF THE QUALITATIVE ANALYSIS 279

6.4. Applications of the Qualitative Analysis


In this section we carry out the analysis outlined in § 6.3 in a few physical systems. These systems
include the competing species, the predator-prey, and the nonlinear pendulum. We use the analysis
in § 6.3 to find qualitative properties of solutions to nonlinear autonomous systems.
6.4.1. Competing Species. The physical system consists of two species that compete for the same, limited, food resources. For example, rabbits and sheep, which compete for the grass on a particular piece of land. If x1 and x2 are the competing species populations, the differential equations, also called Lotka-Volterra equations for competition, are
x1′ = r1 x1 (1 − x1/K1 − α x2),
x2′ = r2 x2 (1 − x2/K2 − β x1).
The constants r1, r2, α, β are all nonnegative, and K1, K2 are positive. The constants r1, r2 are the growth rates per capita, the constants K1, K2 are the carrying capacities, and α, β are the competition constants for the respective species.

Example 6.4.1 (Competing Species: Extinction). Sketch in the phase space all the critical points and several solution curves in order to get a qualitative understanding of the behavior of all solutions to the competing species system (found in Strogatz book [7]),
x1′ = x1 (3 − x1 − 2 x2),   (6.4.1)
x2′ = x2 (2 − x2 − x1),   (6.4.2)
where x1(t) is the population of one of the species, say rabbits, and x2(t) is the population of the other species, say sheep, at the time t. We restrict our study to nonnegative functions x1, x2.
Solution: We start by finding all the critical points of the rabbits-sheep system. We need to find all constant solutions (x1, x2) of
x1 (3 − x1 − 2 x2 ) = 0, (6.4.3)
x2 (2 − x2 − x1 ) = 0. (6.4.4)
• One solution is
x1 = 0 and x2 = 0,
which gives the critical point x0 = (0, 0).
• A second solution is
x1 = 0 and (2 − x2 − x1 ) = 0,
but since x1 = 0, then x2 = 2, which gives the critical point x1 = (0, 2).
• A third solution is
(3 − x1 − 2 x2 ) = 0 and x2 = 0,
but since x2 = 0, then x1 = 3, which gives the critical point x2 = (3, 0).
• The fourth solution is
(2 − x2 − x1 ) = 0 and (3 − x1 − 2 x2 ) = 0
which gives x1 = 1 and x2 = 1, so we get the critical point x3 = (1, 1).
We now compute the linearization of the rabbits-sheep system in Eqs. (6.4.1)-(6.4.2). We first compute the derivative of the field F, where
F(x) = (f1, f2)^T = (x1 (3 − x1 − 2 x2),  x2 (2 − x2 − x1))^T.
The derivative of F at an arbitrary point x is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ (3 − 2 x1 − 2 x2)  −2 x1 ; −x2  (2 − x1 − 2 x2) ].

We now evaluate the matrix DF(x) at each of the critical points we found.
At x0 = (0, 0) we get DF0 = [3 0; 0 2].
This coefficient matrix has eigenvalues λ0+ = 3 and λ0- = 2, both positive, which means that the critical point x0 is a Source Node. To sketch the phase portrait we will need the corresponding eigenvectors, v0+ = (1, 0)^T and v0- = (0, 1)^T.
At x1 = (0, 2) we get DF1 = [−1 0; −2 −2].
This coefficient matrix has eigenvalues λ1+ = −1 and λ1- = −2, both negative, which means that the critical point x1 is a Sink Node. One can check that the corresponding eigenvectors are v1+ = (1, −2)^T and v1- = (0, 1)^T.
At x2 = (3, 0) we get DF2 = [−3 −6; 0 −1].
This coefficient matrix has eigenvalues λ2+ = −1 and λ2- = −3, both negative, which means that the critical point x2 is a Sink Node. One can check that the corresponding eigenvectors are v2+ = (−3, 1)^T and v2- = (1, 0)^T.
At x3 = (1, 1) we get DF3 = [−1 −2; −1 −1].
It is not difficult to check that this coefficient matrix has eigenvalues λ3+ = −1 + √2 and λ3- = −1 − √2, which means that the critical point x3 is a Saddle Node. One can check that the corresponding eigenvectors are v3+ = (−√2, 1)^T and v3- = (√2, 1)^T. We summarize this information about the linearized systems in Fig. 10.

Figure 10. The linearizations of the rabbits-sheep system in Eqs. (6.4.1)-(6.4.2).

We would like to have the complete phase portrait for the nonlinear system, that is, we would like to fill the gaps in Fig. 11. This is difficult to do analytically, in this example as well as in general nonlinear autonomous systems. At this point we need to turn to computer-generated solutions to fill the gaps in Fig. 11. The result is in Fig. 12.
We can now study the phase portrait in Fig. 12 to obtain some biological insight on the rabbits-
sheep system. The picture on the right says that most of the time one species drives the other to
extinction. If the initial data for the system is a point on the blue region, called the rabbit basin,
then the solution evolves in time toward the critical point x2 = (3, 0). This means that the sheep
become extinct. If the initial data for the system is a point on the green region, called the sheep
basin, then the solution evolves in time toward the critical point x1 = (0, 2). This means that the
rabbits become extinct.
We now notice that the linearizations at all these critical points have eigenvalues with nonzero real parts, which means they are hyperbolic critical points. Then we can use the Hartman-Grobman Theorem 6.3.6 to construct the phase portrait of the nonlinear system in (6.4.1)-(6.4.2) around these critical points. The Hartman-Grobman theorem says that the qualitative structure of the phase portrait for the linearized system is the same as that of the phase portrait of the nonlinear system around the critical point. So we get the picture in Fig. 11.

Figure 11. Phase Portrait for Eqs. (6.4.1)-(6.4.2).

Figure 12. The phase portrait of the rabbits-sheep system in Eqs. (6.4.1)-(6.4.2).

The two basins of attraction are separated by a curve, called the basin boundary. Only when the initial data lies on that curve do the rabbits and sheep coexist, with neither becoming extinct. In that case the solution moves towards the critical point x3 = (1, 1), so the populations of rabbits and sheep become equal to each other as time goes to infinity. But if we pick initial data off this basin boundary, no matter how close to the boundary, one of the species becomes extinct. C
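Remark: The extinction behavior can be verified numerically by integrating the system from two initial conditions, one on each side of the basin boundary. The sketch below uses SciPy's solve_ivp (an assumption about the tools, not part of the text); the two starting points are chosen only for illustration.

```python
# Sketch: integrate the rabbits-sheep system (6.4.1)-(6.4.2) from two initial
# conditions and observe which species survives in each case.
import numpy as np
from scipy.integrate import solve_ivp

def field(t, x):
    x1, x2 = x
    return [x1 * (3 - x1 - 2 * x2), x2 * (2 - x2 - x1)]

for x0 in ([2.0, 0.5], [0.5, 2.0]):          # one point on each side of the boundary (illustrative)
    sol = solve_ivp(field, (0.0, 50.0), x0, rtol=1e-8, atol=1e-10)
    print(x0, "->", np.round(sol.y[:, -1], 3))
# Expected: the first tends to (3, 0) (sheep extinct), the second to (0, 2) (rabbits extinct).
```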

Our next example is also a competing species system. The coefficients in this model are slightly different from those in the previous example, but the behavior of the species populations predicted by this new model is very different. The main prediction of the previous example is that one species goes extinct. We will see that the main prediction of the next example is that both species can coexist.

Example 6.4.2 (Competing Species: Coexistence). Sketch in the phase space all the critical points and several solution curves in order to get a qualitative understanding of the behavior of all solutions to the competing species system (found in Strogatz book [7]),
x1′ = x1 (1 − x1 − x2),   (6.4.5)
x2′ = x2 (3/4 − x2 − (1/2) x1),   (6.4.6)
where x1(t) is the population of one of the species, say rabbits, and x2(t) is the population of the other species, say sheep, at the time t. We restrict our study to nonnegative functions x1, x2.

Solution: We start by computing the critical points. We need to find all constant solutions (x1, x2) of

x1 (1 − x1 − x2 ) = 0, (6.4.7)
x2 (3 − 4 x2 − 2 x1 ) = 0. (6.4.8)

• One solution is
x1 = 0 and x2 = 0,
which gives the critical point x0 = (0, 0).
• A second solution is
x1 = 0 and (3 − 4 x2 − 2 x1) = 0,
but since x1 = 0, then x2 = 3/4, which gives the critical point x1 = (0, 3/4).
• A third solution is
(1 − x1 − x2) = 0 and x2 = 0,
but since x2 = 0, then x1 = 1, which gives the critical point x2 = (1, 0).
• The fourth solution is
(1 − x1 − x2) = 0 and (3 − 4 x2 − 2 x1) = 0,
which gives x1 = 1/2 and x2 = 1/2, so we get the critical point x3 = (1/2, 1/2).
We now compute the linearization of Eqs. (6.4.5)-(6.4.6). We first compute the derivative of the field F, where
F(x) = (f1, f2)^T = (x1 (1 − x1 − x2),  x2 (3/4 − x2 − (1/2) x1))^T.
The derivative of F at an arbitrary point x is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ (1 − 2 x1 − x2)  −x1 ; −(1/2) x2  (3/4 − 2 x2 − (1/2) x1) ].

We now evaluate the matrix DF(x) at each of the critical points; we find the following.
At x0 = (0, 0),  DF0 = [1 0; 0 3/4],  so u = 0 is a Source Node.
At x1 = (0, 3/4),  DF1 = [1/4 0; −3/8 −3/4],  so u = 0 is a Saddle Node.
At x2 = (1, 0),  DF2 = [−1 −1; 0 1/4],  so u = 0 is a Saddle Node.
At x3 = (1/2, 1/2),  DF3 = [−1/2 −1/2; −1/4 −1/2],  so u = 0 is a Sink Node.
The first three matrices above are triangular, so their eigenvalues are their diagonal entries. For the last linearization, the eigenvalues are
λ3± = (1/4)(−2 ± √2) < 0.
If we put all this information together in a phase diagram, and we use the Hartman-Grobman
Theorem 6.3.6, we obtain the picture in Fig. 13.
We can see that this competing species system predicts that both species will coexist, in this
case with population values given by the critical point x3 = (1/2, 1/2). No matter what the initial
conditions are, the solution curve will move towards the critical point x3 .
C

Figure 13. The phase portrait of the rabbits-sheep system in Eqs. (6.4.5)-(6.4.6).

6.4.2. Predator-Prey. In this section we construct the phase diagram of predator-prey systems, first introduced in § 4.2. These are systems of equations that model physical systems consisting of two biological species where one species preys on the other. For example, cats prey on mice, and foxes prey on rabbits. If we call x1 the predator population and x2 the prey population, then the predator-prey equations, also known as Lotka-Volterra equations for predator-prey, are
x1′ = −a x1 + α x1 x2,   (6.4.9)
x2′ = b x2 − β x1 x2.   (6.4.10)
The constants a, b, α, and β are all positive. The vector field, F, of the equation is
F(x) = (f1, f2)^T = (−a x1 + α x1 x2,  b x2 − β x1 x2)^T.
The equilibrium solutions, or critical points, are the solutions of F(x) = 0, which in components read
x1 (−a + α x2) = 0,   (6.4.11)
x2 (b − β x1) = 0.   (6.4.12)
There are two solutions of the equations above, which give us the critical points
x0 = (0, 0),   x1 = (b/β, a/α).
The derivative matrix for the predator-prey system is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ (−a + α x2)  α x1 ; −β x2  (b − β x1) ].
At the critical point x0 = (0, 0) we get
DF0 = [−a 0; 0 b],
which has eigenpairs
λ0+ = b,  v0+ = (0, 1)^T,   and   λ0− = −a,  v0− = (1, 0)^T,
which means that x0 = (0, 0) is a Saddle Node. At the critical point x1 = (b/β, a/α) we get
DF1 = [0  α b/β; −β a/α  0],

which has eigenpairs
λ1± = ±√(ab) i,   v1± = (α/β, ±√(a/b) i)^T,
which means that x1 = (b/β, a/α) is a Center. In particular, notice that we can write the eigenvector v1+ above as
v1+ = (α/β, 0)^T + (0, √(a/b))^T i = a + b i  ⇒  a = (α/β, 0)^T,   b = (0, √(a/b))^T.
These vectors a and b determine the direction in which a point on a solution curve moves as time increases in the x1 x2-plane, which in this system is clockwise.
It could be useful to write an approximate expression for the solutions x(t) of the nonlinear predator-prey system near the critical point x1. From the eigenpairs of the linearization matrix DF1 we can compute the fundamental solutions of the linearization,
u′ = DF1 u,
at the critical point x1 = (b/β, a/α). If we recall the formulas from § 6.2, then we can see that these fundamental solutions are given by
u1(t) = cos(√(ab) t) (α/β, 0)^T − sin(√(ab) t) (0, √(a/b))^T = ((α/β) cos(√(ab) t),  −√(a/b) sin(√(ab) t))^T,
u2(t) = sin(√(ab) t) (α/β, 0)^T + cos(√(ab) t) (0, √(a/b))^T = ((α/β) sin(√(ab) t),  √(a/b) cos(√(ab) t))^T.
Therefore, the general solution of the linearization is
u(t) = c1 u1(t) + c2 u2(t),
where c1, c2 are arbitrary constants that could be determined by appropriate initial conditions. Then, the solutions of the nonlinear system near the critical point x1 are given by x(t) ≃ x1 + u(t), that is,
x(t) ≃ (b/β, a/α)^T + c1 ((α/β) cos(√(ab) t),  −√(a/b) sin(√(ab) t))^T + c2 ((α/β) sin(√(ab) t),  √(a/b) cos(√(ab) t))^T.

Remarks:
• Near the critical point x1 the populations of predators and prey oscillate in time with frequency ω = √(ab), that is, with period T = 2π/√(ab). This period is the same for all solutions near the equilibrium solution x1, hence independent of the initial conditions of the solutions.
• The populations of predators and prey are out of phase by −T/4.
• The amplitude of the oscillations in the predator and prey populations does depend on the initial conditions of the solutions.
• It can be shown that the average populations of predators and prey are, respectively, b/β and a/α, the same as the equilibrium populations at x1.
We can now put together all the information we found in the discussion and sketch a phase
portrait for the solutions of the Predator-Prey system in Equations (6.4.9)-(6.4.10). The phase space is the x1 x2-plane, where the horizontal axis shows the predator population and the vertical axis the prey population. We only consider the region x1 > 0, x2 > 0 in the phase space, since populations cannot
be negative. We choose some arbitrary length for the vectors a, b, to be able to sketch the phase
portrait, although we faithfully represent their directions, horizontal to the right for a and vertical
upwards for b. The result is given in Fig. 14.

Figure 14. The phase portrait of a predator-prey system.

Example 6.4.3 (Predator-Prey: Infinite Food). Sketch the phase diagram of the predator-prey system
x1′ = −x1 + (1/2) x1 x2,
x2′ = (3/4) x2 − (1/4) x1 x2.
Solution: We start by computing the critical points, the constant solutions (x1, x2) of
x1 (−1 + (1/2) x2) = 0,   (6.4.13)
x2 ((3/4) − (1/4) x1) = 0.   (6.4.14)
• One solution is
x1 = 0 and x2 = 0,
which gives the critical point x0 = (0, 0).
• The only other solution is when
(−1 + (1/2) x2) = 0   and   ((3/4) − (1/4) x1) = 0,
which means x1 = 3 and x2 = 2. This gives us the critical point x1 = (3, 2).
We now compute the linearization of the predator-prey system above. We first compute the derivative of the field F, where
F(x) = (f1, f2)^T = (x1 (−1 + (1/2) x2),  x2 ((3/4) − (1/4) x1))^T.
The derivative of F at an arbitrary point x is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ (−1 + (1/2) x2)  (1/2) x1 ; −(1/4) x2  ((3/4) − (1/4) x1) ].

We now evaluate the matrix DF(x) at each of the critical points; we find the following.
At x0 = (0, 0),  DF0 = [−1 0; 0 3/4],  so x0 is a Saddle Node.
At x1 = (3, 2),  DF1 = [0 3/2; −1/2 0],  so x1 is a Center.
The critical point x0 is a Saddle Node, because the eigenvalues of the linearization, the diagonal entries of DF0, satisfy λ− = −1 < 0 < λ+ = 3/4. The corresponding eigenvectors are
v− = (1, 0)^T,   v+ = (0, 1)^T.

The critical point x1 is a Center, because the eigenpairs of the linearization are
λ± = ±(√3/2) i,   v± = (√3, ±i)^T.
The eigenvectors above imply that as time increases the solution moves clockwise around the center critical point. If we put all this information together, we obtain a phase diagram similar to the one given in Fig. 14. C
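Remark: The oscillation period predicted by the linearization, T = 2π/√(ab) with a = 1 and b = 3/4 here, can be checked by integrating the nonlinear system close to the center. The sketch below (SciPy-based, for illustration only) estimates the period from the times at which the prey population crosses its equilibrium value.

```python
# Sketch: estimate the oscillation period of the predator-prey system of
# Example 6.4.3 near the center (3, 2) and compare with T = 2*pi/sqrt(a*b).
import numpy as np
from scipy.integrate import solve_ivp

def field(t, x):
    x1, x2 = x
    return [-x1 + 0.5 * x1 * x2, 0.75 * x2 - 0.25 * x1 * x2]

sol = solve_ivp(field, (0.0, 60.0), [3.1, 2.0], max_step=0.01)   # start near the center
x2 = sol.y[1]
# Times where the prey population crosses its equilibrium value 2 from below.
crossings = sol.t[1:][(x2[:-1] < 2.0) & (x2[1:] >= 2.0)]
print("measured period:", np.diff(crossings).mean())
print("predicted 2*pi/sqrt(a*b):", 2 * np.pi / np.sqrt(1.0 * 0.75))
```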

It is simple to modify the Predator-Prey system above to consider the case where the prey has
access to finite food resources.

Example 6.4.4 (Predator-Prey: Finite Food). Characterize the critical points of the predator-prey system
x1′ = −(3/4) x1 + (1/4) x1 x2,   (6.4.15)
x2′ = x2 − σ x2² − (1/2) x1 x2,   (6.4.16)
where σ is a constant satisfying 0 < σ < 1/3.
Solution: We start by computing the critical points, the constant solutions (x1, x2) of
x1 (−(3/4) + (1/4) x2) = 0,
x2 (1 − σ x2 − (1/2) x1) = 0.
• One solution is
x1 = 0 and x2 = 0,
which gives the critical point x0 = (0, 0).
• Another solution is
x1 = 0 and x2 = 1/σ,
which gives the critical point x1 = (0, 1/σ).
• The last solution is when
(−(3/4) + (1/4) x2) = 0   and   (1 − σ x2 − (1/2) x1) = 0,
which gives us the critical point x2 = (2(1 − 3σ), 3).
We now compute the linearization of Eqs. (6.4.15)-(6.4.16). We first compute the derivative of the field F, where
F(x) = (f1, f2)^T = (−(3/4) x1 + (1/4) x1 x2,  x2 − σ x2² − (1/2) x1 x2)^T.
The derivative of F at an arbitrary point x is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ (−(3/4) + (1/4) x2)  (1/4) x1 ; −(1/2) x2  (1 − 2σ x2 − (1/2) x1) ].

We now evaluate the matrix DF(x) at each of the critical points; we find the following.
At x0 = (0, 0),  DF0 = [−3/4 0; 0 1],  so x0 is a Saddle Node.
At x1 = (0, 1/σ),  DF1 = [(1/4)(1/σ − 3)  0; −1/(2σ)  −1],  so x1 is a Saddle Node.
At x2 = (2(1 − 3σ), 3),  DF2 = [0  (1 − 3σ)/2; −3/2  −3σ],  and the type of x2 depends on σ.

The critical point x0 is a Saddle Node, because the eigenvalues of the linearization, the diagonal entries of DF0, satisfy λ− = −3/4 < 0 < λ+ = 1. The corresponding eigenvectors are
v− = (1, 0)^T,   v+ = (0, 1)^T.
The critical point x1 is also a Saddle Node, because the eigenvalues of the linearization, the diagonal entries of DF1, satisfy λ− = −1 < 0 and λ+ = (1/σ − 3)/4 > 0, since we assumed σ < 1/3. The corresponding eigenvectors are
v− = (0, 1)^T,   v+ = (σ + 1, −2)^T.
Finally, we focus on the critical point x2. It takes some algebra to compute the eigenvalues of the linearization, which are
λ± = (3/2)(−σ ± √(σ² + σ − 1/3)).
Then, one can see that for 0 < σ < 1/3 we have two cases:
• For σ ∈ (0, σ0), with σ0 = (−1 + √(7/3))/2 ≈ 0.2638, the radicand (σ² + σ − 1/3) < 0, therefore the critical point x2 is a Sink Spiral.
• For σ ∈ (σ0, 1/3), with σ0 as above, the radicand (σ² + σ − 1/3) > 0, and also λ+ < 0, therefore the critical point x2 is a Sink Node.
C
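Remark: The threshold σ0 can be computed symbolically. The short sketch below (SymPy, for illustration only) solves σ² + σ − 1/3 = 0 and evaluates λ+ on one side of σ0.

```python
# Sketch: find the threshold sigma_0 where the radicand sigma^2 + sigma - 1/3 changes sign.
import sympy as sp

sigma = sp.symbols('sigma', real=True)
roots = sp.solve(sigma**2 + sigma - sp.Rational(1, 3), sigma)
sigma0 = max(roots)                      # positive root, equal to (-1 + sqrt(7/3))/2 ~ 0.2638
print(sigma0, float(sigma0))

# For sigma > sigma0 the eigenvalue lambda_+ is real and negative (Sink Node).
lam_plus = sp.Rational(3, 2) * (-sigma + sp.sqrt(sigma**2 + sigma - sp.Rational(1, 3)))
print(sp.simplify(lam_plus.subs(sigma, sp.Rational(3, 10))))   # negative number
```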

6.4.3. Nonlinear Pendulum. A pendulum of mass m, length ℓ, oscillating under the gravity acceleration g around a pivot point, in a medium with damping constant d, see Fig. 15, moves according to Newton's second law of motion
m (ℓ θ)″ = −m g sin(θ) − d ℓ θ′,
where the angle θ depends on time t. If we rearrange terms we get a second order scalar equation for the variable θ(t), given by
θ″ + (d/m) θ′ + (g/ℓ) sin(θ) = 0.   (6.4.17)

Figure 15. Pendulum of mass m, length ℓ, oscillating around the pivot.

We want to construct a phase diagram for the pendulum. The first step is to write Eq. (6.4.17) as a 2 × 2 first order system. For this reason we introduce x1 = θ and x2 = θ′; then the second order Eq. (6.4.17) can be written as
x1′ = x2,   (6.4.18)
x2′ = −(g/ℓ) sin(x1) − (d/m) x2.   (6.4.19)

Remark: This is a nonlinear system. In the case of small oscillations we have the approximation sin(x1) ≈ x1, and we obtain the linear system
x1′ = x2,
x2′ = −(g/ℓ) x1 − (d/m) x2,
which we have studied in previous sections.
The phase space for this system is the x1 x2-plane, where x1 = θ and x2 = θ′.

Figure 16. On the left we have the phase space of a pendulum.

If the pendulum does one complete turn clockwise and then stops at the downwards vertical
position, this position is not θ = 0, but θ = 2π. Similarly, if the pendulum does one complete turn
counterclockwise and then stops at the downwards vertical position, this position is not θ = 0, but
θ = −2π.

Example 6.4.5. Sketch a phase diagram for a pendulum. Consider the particular case of g/ℓ = 1 and mass m = 1. Make separate diagrams for the case of no damping, d = 0, and for the case with damping, d > 0.
Solution: The first order differential equations for the pendulum in this case are
x1′ = x2,
x2′ = −sin(x1) − d x2.
We start by finding the critical points, which are the constant solutions (x1, x2) of
x2 = 0,   −sin(x1) − d x2 = 0   ⇒   sin(x1) = 0, x2 = 0.
Therefore, we get infinitely many critical points of the form
xn = (nπ, 0),   n = 0, ±1, ±2, . . . .
We now compute the linearization of Eqs. (6.4.18)-(6.4.19). We first compute the derivative of the field F, where
F(x) = (f1, f2)^T = (x2,  −sin(x1) − d x2)^T.
The derivative of F at an arbitrary point x is
DF(x) = [ ∂f1/∂x1  ∂f1/∂x2 ; ∂f2/∂x1  ∂f2/∂x2 ] = [ 0  1 ; −cos(x1)  −d ].
When we evaluate the matrix DF(x) at the critical points we find that
DF(xn) = [ 0  1 ; (−1)^{n+1}  −d ],   n = 0, ±1, ±2, . . . .

From here we see that we have two types of critical points: when n is even, n = 2k, and when n is odd, n = 2k + 1. The corresponding linearizations are
DF(x2k) = [ 0 1 ; −1 −d ],   DF(x2k+1) = [ 0 1 ; 1 −d ].   (6.4.20)
In order to sketch the phase diagram, it is useful to consider three cases: (1) no damping, d = 0, (2) small damping, and (3) large damping.
In the case of no damping, d = 0, we have
x2k = (2kπ, 0),   DF2k = [ 0 1 ; −1 0 ],   λ± = ±i,   v± = (∓i, 1)^T,   so the x2k are Centers;
x2k+1 = ((2k + 1)π, 0),   DF2k+1 = [ 0 1 ; 1 0 ],   λ± = ±1,   v± = (1, ±1)^T,   so the x2k+1 are Saddles.
Notice that from the center critical points, x2k, we can get the direction of the solution curves. Since the eigenvector v+ at these center critical points can be written as
v+ = (−i, 1)^T = (0, 1)^T + (−1, 0)^T i = a + b i  ⇒  a = (0, 1)^T,   b = (−1, 0)^T,
and the direction of increasing time is a → −b, we get that around the centers the solution moves clockwise as time increases. If we put all this information together, we get the phase diagram in Fig. 17.
Fig. 17.

Figure 17. Phase diagram of a pendulum without damping, d = 0.

In the case that there is damping, either small or large, we need to go back to the linearizations in Eq. (6.4.20) and compute the eigenvalues and eigenvectors at the critical points x2k and x2k+1. In the case of the odd critical points, x2k+1, we get that they are Saddle Nodes, just as in the case d = 0, because the eigenvalues are
λ± = (1/2)(−d ± √(d² + 4))  ⇒  λ− < 0 < λ+  ⇒  Saddle Nodes.
So, the odd critical points are saddle nodes, no matter the value of the damping constant d.
The even critical points behave in a different way. In the case of the even critical points, x2k, the eigenvalues of the linearizations are
λ± = (1/2)(−d ± √(d² − 4)).
We see that we have two different types of eigenvalues, depending on whether d² − 4 < 0 or d² − 4 > 0. Since we assume here that d > 0, these two cases are, respectively,
0 < d < 2, or d > 2.

The first case above is called small damping, and in this case the eigenvalues of the linearization are complex valued,
λ± = (1/2)(−d ± √(4 − d²) i),   0 < d < 2  ⇒  Sink Spirals.
The second case above is called large damping, and in this case the eigenvalues of the linearizations are real and negative,
λ± = (1/2)(−d ± √(d² − 4)),   d > 2  ⇒  Sink Nodes.
In Fig. 18 we sketch the phase diagram of a pendulum with small damping.

Figure 18. Phase diagram of a pendulum with small damping, 0 < d < 2.

C
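Remark: A quick numerical experiment illustrates the difference between the phase diagrams in Figs. 17 and 18. The sketch below (SciPy, shown only as an illustration) integrates the pendulum equations with d = 0 and with a small damping d = 0.2 from the same initial condition.

```python
# Sketch: pendulum x1' = x2, x2' = -sin(x1) - d x2 with and without damping.
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(d):
    return lambda t, x: [x[1], -np.sin(x[0]) - d * x[1]]

x0 = [2.5, 0.0]                       # initial angle below the saddle at pi
for d in (0.0, 0.2):
    sol = solve_ivp(pendulum(d), (0.0, 40.0), x0, max_step=0.01)
    print(f"d = {d}: final state {np.round(sol.y[:, -1], 3)}")
# Expected: for d = 0 the orbit keeps oscillating around (0, 0);
# for d = 0.2 the solution spirals into the sink at (0, 0).
```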

6.4.4. Exercises.

6.4.1.- . 6.4.2.- .
CHAPTER 7

Boundary Value Problems

In this chapter we focus on boundary value problems, both for ordinary differential equations and for a particular partial differential equation, the heat equation. We start by introducing boundary value problems for ordinary differential equations. We then introduce a particular type of boundary value problem, eigenfunction problems. Later on we introduce more general eigenfunction problems, the Sturm-Liouville problem. We show that solutions to Sturm-Liouville problems have two important properties, completeness and orthogonality. We use these results to introduce the Fourier series expansion of continuous and discontinuous functions. We end this chapter by introducing the separation of variables method to find solutions of a partial differential equation, the heat equation.


7.1. Simple Eigenfunction Problems


In this section we consider second order, linear, ordinary differential equations. In the first half of the section we study boundary value problems for these equations, and in the second half we focus on a particular type of boundary value problem, called the eigenvalue-eigenfunction problem, for these equations.

7.1.1. Two-Point Boundary Value Problems. We start with the definition of a two-point
boundary value problem.

Definition 7.1.1. A two-point boundary value problem (BVP) is the following: Find solutions to the differential equation
y″ + a1(x) y′ + a0(x) y = b(x)
satisfying the boundary conditions (BC)
b1 y(x1) + b2 y′(x1) = y1,
b̃1 y(x2) + b̃2 y′(x2) = y2,
where b1, b2, b̃1, b̃2, x1, x2, y1, and y2 are given and x1 ≠ x2. The boundary conditions are homogeneous iff y1 = 0 and y2 = 0.

Remarks:
(a) The two boundary conditions are held at different points, x1 6= x2 .
(b) Both y and y 0 may appear in the boundary condition.

Example 7.1.1. We now show five examples of boundary value problems that differ only in the boundary conditions: Solve the differential equation
y″ + a1 y′ + a0 y = e^{−2x}
with the boundary conditions at x1 = 0 and x2 = 1 given below.


(a) Boundary Condition: y(0) = y1, y(1) = y2, which is the case b1 = 1, b2 = 0, b̃1 = 1, b̃2 = 0.
(b) Boundary Condition: y(0) = y1, y′(1) = y2, which is the case b1 = 1, b2 = 0, b̃1 = 0, b̃2 = 1.
(c) Boundary Condition: y′(0) = y1, y(1) = y2, which is the case b1 = 0, b2 = 1, b̃1 = 1, b̃2 = 0.
(d) Boundary Condition: y′(0) = y1, y′(1) = y2, which is the case b1 = 0, b2 = 1, b̃1 = 0, b̃2 = 1.
(e) Boundary Condition: 2 y(0) + y′(0) = y1, y(1) + 3 y′(1) = y2, which is the case b1 = 2, b2 = 1, b̃1 = 1, b̃2 = 3.
C

7.1.2. Comparison: IVP and BVP. We now review the initial value problem for the equation above, which was discussed in Sect. 2.1, where we showed in Theorem 2.1.3 that this initial value problem always has a unique solution.

Definition 7.1.2 (IVP). Find all solutions of the differential equation y″ + a1 y′ + a0 y = 0 satisfying the initial condition (IC)
y(t0) = y0,   y′(t0) = y1.   (7.1.1)

Remarks: In an initial value problem, usually the following happens.


• The variable t represents time.
• The variable y represents position.
• The IC are position and velocity at the initial time.

A typical boundary value problem that appears in many applications is the following.

Definition 7.1.3 (BVP). Find all solutions of the differential equation y″ + a1 y′ + a0 y = 0 satisfying the boundary condition (BC)
y(0) = y0,   y(L) = y1,   L ≠ 0.   (7.1.2)

Remarks: In a boundary value problem, usually the following happens.
• The variable x represents position.
• The variable y may represent a physical quantity such as temperature.
• The BC are the temperatures at two different positions.

The names “initial value problem” and “boundary value problem” come from physics. An
example of the former is to solve Newton’s equations of motion for the position function of a point
particle that starts at a given initial position and velocity. An example of the latter is to find the
equilibrium temperature of a cylindrical bar with thermal insulation on the round surface and held
at constant temperatures at the top and bottom sides.
Let’s recall an important result we saw in § 2.1 about solutions to initial value problems.

Theorem 7.1.4 (IVP). The equation y″ + a1 y′ + a0 y = 0 with IC y(t0) = y0 and y′(t0) = y1 has a unique solution y for each choice of the IC.

The solutions to boundary value problems are more complicated to describe. A boundary
value problem may have a unique solution, or may have infinitely many solutions, or may have
no solution, depending on the boundary conditions. In the case of the boundary value problem in
Def. 7.1.3 we get the following.

Theorem 7.1.5 (BVP). The equation y″ + a1 y′ + a0 y = 0 with BC y(0) = y0 and y(L) = y1, with L ≠ 0 and with r± the roots of the characteristic polynomial p(r) = r² + a1 r + a0, satisfies the following.
(A) If r+ ≠ r- are real, then the BVP above has a unique solution for all y0, y1 ∈ R.
(B) If r± = α ± i β are complex, with α, β ∈ R, then the solution of the BVP above belongs to one of the following three possibilities:
(i) There exists a unique solution;
(ii) There exist infinitely many solutions;
(iii) There exists no solution.

Proof of Theorem 7.1.5:
Part (A): If r+ ≠ r- are real, then the general solution of the differential equation is
y(x) = c+ e^{r+ x} + c- e^{r- x}.



The boundary conditions are
y0 = y(0) = c+ + c-,
y1 = y(L) = c+ e^{r+ L} + c- e^{r- L},
which is equivalent to the linear system
[1 1; e^{r+ L} e^{r- L}] (c+, c-)^T = (y0, y1)^T.
This system for c+, c- has a unique solution iff the coefficient matrix is invertible. But its determinant is
det [1 1; e^{r+ L} e^{r- L}] = e^{r- L} − e^{r+ L}.
Therefore, if the roots r+ ≠ r- are real, then e^{r- L} ≠ e^{r+ L}, hence there is a unique solution c+, c-, which in turn fixes a unique solution y of the BVP.
In the case that r+ = r- = r0, we have to start over, since the general solution of the differential equation is
y(x) = (c1 + c2 x) e^{r0 x},   c1, c2 ∈ R.
Again, the boundary conditions in Eq. (7.1.2) determine the values of the constants c1 and c2 as follows:
y0 = y(0) = c1,
y1 = y(L) = c1 e^{r0 L} + c2 L e^{r0 L},
that is,
[1 0; e^{r0 L} L e^{r0 L}] (c1, c2)^T = (y0, y1)^T.
This system for c1, c2 has a unique solution iff the coefficient matrix is invertible. But its determinant is
det [1 0; e^{r0 L} L e^{r0 L}] = L e^{r0 L}.
So, for L ≠ 0 the determinant above is nonzero, hence there is a unique solution c1, c2, which in turn fixes a unique solution y of the BVP.
Part (B): If r± = α ± iβ are complex, then
e^{r± L} = e^{(α ± iβ)L} = e^{αL} (cos(βL) ± i sin(βL)),
therefore
e^{r- L} − e^{r+ L} = e^{αL} ((cos(βL) − i sin(βL)) − (cos(βL) + i sin(βL))) = −2i e^{αL} sin(βL).
We conclude that
e^{r- L} − e^{r+ L} = −2i e^{αL} sin(βL) = 0  ⇔  βL = nπ.
So for βL ≠ nπ the BVP has a unique solution, case (Bi). But for βL = nπ the BVP has either no solution or infinitely many solutions, cases (Bii) and (Biii). This establishes the Theorem. □

Example 7.1.2. Find all solutions to the BVPs y″ + y = 0 with the BCs:
(a) y(0) = 1, y(π) = 0;   (b) y(0) = 1, y(π/2) = 1;   (c) y(0) = 1, y(π) = −1.

Solution: We first find the roots of the characteristic polynomial r2 + 1 = 0, that is, r± = ±i. So
the general solution of the differential equation is
y(x) = c1 cos(x) + c2 sin(x).
BC (a):
1 = y(0) = c1 ⇒ c1 = 1.
0 = y(π) = −c1 ⇒ c1 = 0.
Therefore, there is no solution.
BC (b):
1 = y(0) = c1 ⇒ c1 = 1.
1 = y(π/2) = c2 ⇒ c2 = 1.
So there is a unique solution y(x) = cos(x) + sin(x).

BC (c):
1 = y(0) = c1  ⇒  c1 = 1.
−1 = y(π) = −c1  ⇒  c1 = 1.
Therefore, c2 is arbitrary, so we have infinitely many solutions
y(x) = cos(x) + c2 sin(x),   c2 ∈ R.
C
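Remark: Case (b), which has a unique solution, can also be solved numerically. The sketch below (SciPy's solve_bvp, shown only as an illustration) recovers y(x) = cos(x) + sin(x) on [0, π/2].

```python
# Sketch: solve y'' + y = 0 with y(0) = 1, y(pi/2) = 1 numerically and compare
# with the exact solution cos(x) + sin(x) found in case (b).
import numpy as np
from scipy.integrate import solve_bvp

def ode(x, y):
    # y[0] = y, y[1] = y'; the first order system is y0' = y1, y1' = -y0.
    return np.vstack([y[1], -y[0]])

def bc(ya, yb):
    return np.array([ya[0] - 1.0, yb[0] - 1.0])   # y(0) = 1, y(pi/2) = 1

x = np.linspace(0.0, np.pi / 2, 11)
y_guess = np.ones((2, x.size))
sol = solve_bvp(ode, bc, x, y_guess)

x_fine = np.linspace(0.0, np.pi / 2, 5)
err = np.max(np.abs(sol.sol(x_fine)[0] - (np.cos(x_fine) + np.sin(x_fine))))
print(err)   # should be small: the solver reproduces cos(x) + sin(x)
```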

Example 7.1.3. Find all solutions to the BVPs y″ + 4 y = 0 with the BCs:
(a) y(0) = 1, y(π/4) = −1;   (b) y(0) = 1, y(π/2) = −1;   (c) y(0) = 1, y(π/2) = 1.

Solution: We first find the roots of the characteristic polynomial r2 + 4 = 0, that is, r± = ±2i. So
the general solution of the differential equation is
y(x) = c1 cos(2x) + c2 sin(2x).
BC (a):
1 = y(0) = c1 ⇒ c1 = 1.
−1 = y(π/4) = c2 ⇒ c2 = −1.
Therefore, there is a unique solution y(x) = cos(2x) − sin(2x).
BC (b):
1 = y(0) = c1 ⇒ c1 = 1.
−1 = y(π/2) = −c1 ⇒ c1 = 1.
So, c2 is arbitrary and we have infinitely many solutions
y(x) = cos(2x) + c2 sin(2x), c2 ∈ R.
BC (c):
1 = y(0) = c1  ⇒  c1 = 1.
1 = y(π/2) = −c1  ⇒  c1 = −1.
Therefore, we have no solution. C

7.1.3. Simple Eigenfunction Problems. We now focus on boundary value problems that have infinitely many solutions. A particular type of these problems is called an eigenfunction problem. They are similar to the eigenvector problems we studied in § 5.4. Recall that the eigenvector problem is the following: Given an n × n matrix A, find all numbers λ and nonzero vectors v solutions of the algebraic linear system
A v = λ v.
We saw that for each λ there are infinitely many solutions v, because if v is a solution so is any multiple a v. An eigenfunction problem is something similar.

Definition 7.1.6. An eigenfunction problem is the following: Given a linear operator L(y) = a2 y″ + a1 y′ + a0 y, find a number λ and a nonzero function y solution of
L(y) = λ y,
with homogeneous boundary conditions
b1 y(x1) + b2 y′(x1) = 0,
b̃1 y(x2) + b̃2 y′(x2) = 0.

Remarks:
• Notice that y = 0 is always a solution of the BVP above.
• Eigenfunctions are the nonzero solutions of the BVP above.

• Hence, the eigenfunction problem is a BVP with infinitely many solutions.


• So, we look for λ such that the operator L(y) − λ y has characteristic polynomial with
complex roots.
• So, λ is such that L(y) − λ y has oscillatory solutions.
• Our examples focus on the linear operator L(y) = −y 00 .

Example 7.1.4. Find the eigenvalues and eigenfunctions of the operator L(y) = −y″, with y(0) = 0 and y(L) = 0. This is equivalent to finding all numbers λ and nonzero functions y solutions of the BVP
−y″ = λ y,   with   y(0) = 0,   y(L) = 0,   L > 0.

Solution: We divide the problem in three cases: (a) λ < 0, (b) λ = 0, and (c) λ > 0.
Case (a): λ = −µ² < 0, so the equation is y″ − µ² y = 0. The characteristic equation is
r² − µ² = 0  ⇒  r± = ±µ.
The general solution is y = c+ e^{µx} + c- e^{−µx}. The BC imply
0 = y(0) = c+ + c-,   0 = y(L) = c+ e^{µL} + c- e^{−µL}.
So from the first equation we get c+ = −c-, so
0 = −c- e^{µL} + c- e^{−µL}  ⇒  −c- (e^{µL} − e^{−µL}) = 0  ⇒  c- = 0, c+ = 0.
So the only solution is y = 0; hence there are no eigenfunctions with negative eigenvalues.
Case (b): λ = 0, so the differential equation is
y 00 = 0 ⇒ y = c0 + c1 x.
The BC imply
0 = y(0) = c0 , 0 = y(L) = c1 L ⇒ c1 = 0.
So the only solution is y = 0, then there are no eigenfunctions with eigenvalue λ = 0.
Case (c): λ = µ² > 0, so the equation is y'' + µ² y = 0. The characteristic equation is
r² + µ² = 0 ⇒ r± = ±µ i.
The general solution is y = c+ cos(µx) + c- sin(µx). The BC imply
0 = y(0) = c+,  0 = y(L) = c+ cos(µL) + c- sin(µL).
Since c+ = 0, the second equation above is
c- sin(µL) = 0, c- ≠ 0 ⇒ sin(µL) = 0 ⇒ µn L = nπ.
So we get µn = nπ/L, hence the eigenvalue-eigenfunction pairs are
λn = (nπ/L)²,  yn(x) = cn sin(nπx/L).
Since we need only one eigenfunction for each eigenvalue, we choose cn = 1, and we get
λn = (nπ/L)²,  yn(x) = sin(nπx/L),  n ≥ 1.
C
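Remark: The following short numerical sketch (not part of the text; it assumes the NumPy library is available) approximates the eigenvalues of this problem with a finite-difference matrix and compares them with the exact values λn = (nπ/L)² found above.

# Approximate the eigenvalues of -y'' = lambda y, y(0) = y(L) = 0,
# with a finite-difference matrix and compare with (n pi / L)^2.
import numpy as np

L, N = 1.0, 200                       # interval length and number of interior grid points
h = L / (N + 1)
# Tridiagonal matrix for -d^2/dx^2 with Dirichlet boundary conditions
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2
evals = np.sort(np.linalg.eigvalsh(A))

for n in range(1, 4):
    exact = (n * np.pi / L)**2
    print(f"n={n}: exact {exact:.4f}, finite differences {evals[n-1]:.4f}")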

Example 7.1.5. Find the numbers λ and the nonzero functions y solutions of the BVP
−y 00 = λy, y(0) = 0, y 0 (L) = 0, L > 0.

Solution: We divide the problem in three cases: (a) λ < 0, (b) λ = 0, and (c) λ > 0.
Case (a): Let λ = −µ², with µ > 0, so the equation is y'' − µ² y = 0. The characteristic equation is
r² − µ² = 0 ⇒ r± = ±µ.
The general solution is y(x) = c1 e^(−µx) + c2 e^(µx). The BC imply
0 = y(0) = c1 + c2,
0 = y'(L) = −µ c1 e^(−µL) + µ c2 e^(µL),
a 2 × 2 linear system for c1, c2. Its coefficient matrix, with rows (1, 1) and (−µ e^(−µL), µ e^(µL)), is invertible, because its determinant is µ(e^(µL) + e^(−µL)) ≠ 0.
So, the linear system above for c1 , c2 has a unique solution c1 = c2 = 0. Hence, we get the only
solution y = 0. This means there are no eigenfunctions with negative eigenvalues.
Case (b): Let λ = 0, so the differential equation is
y 00 = 0 ⇒ y(x) = c1 + c2 x, c1 , c2 ∈ R.
The boundary conditions imply the following conditions on c1 and c2 ,
0 = y(0) = c1 , 0 = y 0 (L) = c2 .
So the only solution is y = 0. This means there are no eigenfunctions with eigenvalue λ = 0.
Case (c): Let λ = µ², with µ > 0, so the equation is y'' + µ² y = 0. The characteristic equation is
r² + µ² = 0 ⇒ r± = ±µ i.
The general solution is y(x) = c1 cos(µx) + c2 sin(µx). The BC imply
0 = y(0) = c1,
0 = y'(L) = −µ c1 sin(µL) + µ c2 cos(µL) ⇒ c2 cos(µL) = 0.
Since we are interested in non-zero solutions y, we look for solutions with c2 ≠ 0. This implies that µ cannot be arbitrary but must satisfy the equation
cos(µL) = 0 ⇔ µn L = (2n − 1) π/2,  n ≥ 1.
We therefore conclude that the eigenvalues and eigenfunctions are given by
λn = (2n − 1)²π²/(4L²),  yn(x) = cn sin((2n − 1)πx/(2L)),  n ≥ 1.
Since we only need one eigenfunction for each eigenvalue, we choose cn = 1, and we get
λn = (2n − 1)²π²/(4L²),  yn(x) = sin((2n − 1)πx/(2L)),  n ≥ 1.
C

Example 7.1.6. Find the numbers λ and the nonzero functions y solutions of the BVP
−x² y'' + x y' = λ y,  y(1) = 0,  y(ℓ) = 0,  ℓ > 1.
Solution: Let us rewrite the equation as
x² y'' − x y' + λ y = 0.
This equation is called an Euler equidimensional equation; by trial and error one finds that solutions are given by y(x) = x^r, where the power r is a root of the indicial polynomial
r(r − 1) − r + λ = 0 ⇒ r² − 2r + λ = 0 ⇒ r± = 1 ± √(1 − λ).

Case (a): Let 1 − λ = 0, so we have a repeated root r+ = r- = 1. The general solution to the differential equation is
y(x) = (c1 + c2 ln(x)) x.
The boundary conditions imply the following conditions on c1 and c2,
0 = y(1) = c1,
0 = y(ℓ) = (c1 + c2 ln(ℓ)) ℓ ⇒ c2 ℓ ln(ℓ) = 0 ⇒ c2 = 0.
So the only solution is y = 0. This means there are no eigenfunctions with eigenvalue λ = 1.
Case (b): Let 1 − λ > 0, so we can rewrite it as 1 − λ = µ², with µ > 0. Then r± = 1 ± µ, and so the general solution to the differential equation is
y(x) = c1 x^(1−µ) + c2 x^(1+µ).
The boundary conditions imply the following conditions on c1 and c2,
0 = y(1) = c1 + c2,
0 = y(ℓ) = c1 ℓ^(1−µ) + c2 ℓ^(1+µ),
a 2 × 2 linear system for c1, c2. Its coefficient matrix, with rows (1, 1) and (ℓ^(1−µ), ℓ^(1+µ)), is invertible, because its determinant is ℓ(ℓ^µ − ℓ^(−µ)) ≠ 0 ⇔ ℓ ≠ ±1.
Since ℓ > 1, the matrix above is invertible, and the linear system for c1, c2 has a unique solution given by c1 = c2 = 0. Hence we get the only solution y = 0. This means there are no eigenfunctions with eigenvalues λ < 1.
Case (c): Let 1 − λ < 0, so we can rewrite it as 1 − λ = −µ², with µ > 0. Then r± = 1 ± iµ, and so the general solution to the differential equation is
y(x) = x [ c1 cos(µ ln(x)) + c2 sin(µ ln(x)) ].
The boundary conditions imply the following conditions on c1 and c2,
0 = y(1) = c1,
0 = y(ℓ) = c1 ℓ cos(µ ln(ℓ)) + c2 ℓ sin(µ ln(ℓ)) ⇒ c2 ℓ sin(µ ln(ℓ)) = 0.
Since we are interested in nonzero solutions y, we look for solutions with c2 ≠ 0. This implies that µ cannot be arbitrary but must satisfy the equation
sin(µ ln(ℓ)) = 0 ⇔ µn ln(ℓ) = nπ,  n ≥ 1.
Recalling that 1 − λn = −µn², we get λn = 1 + µn², hence
λn = 1 + n²π²/ln²(ℓ),  yn(x) = cn x sin(nπ ln(x)/ln(ℓ)),  n ≥ 1.
Since we only need one eigenfunction for each eigenvalue, we choose cn = 1, and we get
λn = 1 + n²π²/ln²(ℓ),  yn(x) = x sin(nπ ln(x)/ln(ℓ)),  n ≥ 1.
C
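Remark: The following symbolic sketch (not part of the text; it assumes the SymPy library is available) checks that the eigenpairs found above satisfy the Euler equation and both boundary conditions.

# Verify that y_n(x) = x sin(n pi ln(x)/ln(ell)) satisfies
# -x^2 y'' + x y' = lambda_n y, with y(1) = y(ell) = 0.
import sympy as sp

x, ell = sp.symbols('x ell', positive=True)
n = sp.symbols('n', positive=True, integer=True)
lam = 1 + n**2 * sp.pi**2 / sp.log(ell)**2
y = x * sp.sin(n * sp.pi * sp.log(x) / sp.log(ell))

residual = -x**2 * sp.diff(y, x, 2) + x * sp.diff(y, x) - lam * y
print(sp.simplify(residual))                         # 0
print(y.subs(x, 1), sp.simplify(y.subs(x, ell)))     # 0 and 0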

7.1.4. Exercises.

7.1.1.- . 7.1.2.- .

7.2. Overview of Fourier series


A vector in three dimensional space can be decomposed as a linear combination of its components
in a vector basis. If the basis vectors are all mutually perpendicular—an orthogonal basis—then
there is a simple formula for the vector components. This is the Fourier expansion theorem for
vectors in space. In this section we generalize these ideas from the three dimensional space to
the infinite dimensional space of continuous functions. We introduce a notion of orthogonality
among functions and we choose a particular orthogonal basis in this space. Then we state that
any continuous function can be decomposed as a linear combination of its components in that
orthogonal basis. This is the Fourier series expansion theorem for continuous functions.

7.2.1. Fourier Expansion of Vectors. We review the basic concepts about vectors in R3
we will need to generalize to the space of functions. We start with the dot product of two vectors.

Definition 7.2.1. The dot product of vectors u = hu1 , u2 , u3 i, v = hv1 , v2 , v3 i in R3 is


u · v = u1 v1 + u2 v2 + u3 v3 .

So the dot product of two vectors is a number, which can be positive, zero or negative. The
dot product satisfies the following properties.

Theorem 7.2.2. For every u, v, w ∈ R3 and every a, b ∈ R holds,


(a) Positivity: u · u = 0 iff u = 0; and u · u > 0 for u 6= 0.
(b) Symmetry: u · v = v · u.
(c) Linearity: (au + bv) · w = a (u · w) + b (v · w).

Remark: The proof of these properties is simple and left as an exercise.


The dot product of two vectors is deeply related with the projections of one vector onto the
other. To see this geometrical meaning of the dot product we need the following result.

Theorem 7.2.3. The dot product of vectors u, v ∈ R3 can be written as


u · v = ku k kv k cos(θ),
where ku k, kv k are the length of the vectors u, v, and θ ∈ [0, π) is the angle between the vectors.

Remark: Recall that the length of a vector u is ‖u‖ = √((u1)² + (u2)² + (u3)²).
Proof of Theorem 7.2.3: The law of cosines for a triangle with sides given by vectors u, v, and (u − v), as shown in Fig. 1, is
‖u − v‖² = ‖u‖² + ‖v‖² − 2 ‖u‖ ‖v‖ cos(θ).
If we write this formula in components we get
(u1 − v1)² + (u2 − v2)² + (u3 − v3)² = (u1² + u2² + u3²) + (v1² + v2² + v3²) − 2 ‖u‖ ‖v‖ cos(θ).
If we expand the squares on the left-hand side above and we cancel terms we get
−2 u1 v1 − 2 u2 v2 − 2 u3 v3 = −2 ‖u‖ ‖v‖ cos(θ).
We now use on the left-hand side above the definition of the dot product of vectors u, v, so
u · v = ‖u‖ ‖v‖ cos(θ).
This establishes the Theorem. 
From the dot product expression in Theorem 7.2.3 we can see that the dot product of two
vectors vanishes if and only if the angle in between the vectors is θ = π/2. So the dot product is
a good way to determine whether two vectors are perpendicular. We summarize this result in a
theorem.
Figure 1. Vectors used in the proof of Theorem 7.2.3.

Theorem 7.2.4. The nonzero vectors u, v are orthogonal (perpendicular) iff u · v = 0.

Proof of Theorem 7.2.4: Suppose that u, v satisfy u · v = 0. This is equivalent to saying that ‖u‖ ‖v‖ cos(θ) = 0. Since the vectors are nonzero, this is equivalent to cos(θ) = 0. But for
θ ∈ [0, π) this is equivalent to θ = π/2. This establishes the Theorem. 
We can go a little deeper in the geometrical meaning of the dot product. In fact, the dot
product of two vectors is related to the scalar projection of one vector onto the other.

Figure 2. The scalar projection of u onto v is the length of the solid segment in green, while the scalar projection of v onto u is the length of the solid segment in purple.
For example, if we denote by Pu→v the scalar projection of u onto v, then
Pu→v = ‖u‖ cos(θ) = (u · v)/‖v‖.
Analogously, the scalar projection of v onto u is
Pv→u = ‖v‖ cos(θ) = (u · v)/‖u‖.
Notice that the length of a vector u can also be written in terms of the dot product,
‖u‖ = √(u · u).
A vector u is a unit vector if and only if
u · u = 1.
And any vector v can be rescaled into a unit vector by dividing by its magnitude. So, the vector w below is a unit vector,
w = v/‖v‖.

A set of vectors is an orthogonal set if all the vectors in the set are mutually perpendicular. An
orthonormal set is an orthogonal set where all the vectors are unit vectors.

Example 7.2.1. The set of vectors {i, j , k } used in physics is an orthonormal set in R3 .
Solution: These are the vectors
i = ⟨1, 0, 0⟩,  j = ⟨0, 1, 0⟩,  k = ⟨0, 0, 1⟩.
As we can see in Fig. 3, they are mutually perpendicular and have unit magnitude, that is,
i · j = 0,  i · k = 0,  j · k = 0,  i · i = 1,  j · j = 1,  k · k = 1.
C
Figure 3. Vectors i, j, k.

Any rotation of the vectors i, j , k (that is the three vectors are rotated in the same way) is
also an orthonormal set of vectors.

Example 7.2.2. Show that {u1 = ⟨1, 1, 1⟩, u2 = ⟨−2, 1, 1⟩, u3 = ⟨0, −3, 3⟩} is an orthogonal set. Is the set orthonormal? If not, normalize it.
Solution: Let's check the orthogonality conditions:
u1 · u2 = ⟨1, 1, 1⟩ · ⟨−2, 1, 1⟩ = −2 + 1 + 1 = 0,
u1 · u3 = ⟨1, 1, 1⟩ · ⟨0, −3, 3⟩ = −3 + 3 = 0,
u2 · u3 = ⟨−2, 1, 1⟩ · ⟨0, −3, 3⟩ = −3 + 3 = 0,
so this set is orthogonal. This set is not orthonormal, since
‖u1‖ = √3,  ‖u2‖ = √6,  ‖u3‖ = √18.
Therefore, an orthonormal set is
{ ũ1 = (1/√3)⟨1, 1, 1⟩,  ũ2 = (1/√6)⟨−2, 1, 1⟩,  ũ3 = (1/√18)⟨0, −3, 3⟩ }.
C

Remark: Given two vectors, say u1 = ⟨1, 1, 1⟩ and u2 = ⟨−2, 1, 1⟩, how do you find a third vector, u3, perpendicular to these two vectors? Easy, with the cross product:
u3 = u1 × u2 = det[ i, j, k; 1, 1, 1; −2, 1, 1 ] = (1 − 1) i − (1 + 2) j + (1 + 2) k = ⟨0, −3, 3⟩.

The Fourier expansion theorem says that the set above is not just a set, it is a basis—any
vector in R3 can be decomposed as a linear combination of the basis vectors. Furthermore, there is
a simple formula for the vector components.

Theorem 7.2.5. Any orthogonal set {u1 , u2 , u3 } is an orthogonal basis, that is, every vector v ∈ R3
can be decomposed as
v = v1 u1 + v2 u2 + v3 u3 .
Furthermore, there is a formula for the vector components,
v1 = (v · u1)/(u1 · u1),  v2 = (v · u2)/(u2 · u2),  v3 = (v · u3)/(u3 · u3).

Proof of Theorem 7.2.5: Since the vectors u1 , u2 , u3 are mutually perpendicular, that means
they are linearly independent, so the span of these vectors is the whole space R3 . Therefore, given
any vector v ∈ R3 , there exists constants v1 , v2 , and v3 such that
v = v1 u1 + v2 u2 + v3 u3 .
Since the vectors u1 , u2 , u3 are mutually orthogonal, we can compute the dot product of the equation
above with u1 , and we get
u1 · v = v1 u1 · u1 + v2 u1 · u2 + v3 u1 · u3 = v1 u1 · u1 + 0 + 0,
therefore, we get a formula for the component v1 ,
v1 = (v · u1)/(u1 · u1).
A similar calculation provides the formulas for v2 and v3 . This establishes the Theorem. 

Example 7.2.3. Find the Fourier expansion of the vector v = ⟨3, −2, 4⟩ on the set
{u1 = ⟨1, 1, 1⟩, u2 = ⟨−2, 1, 1⟩, u3 = ⟨0, −3, 3⟩}.
Solution: We know that the set above is orthogonal. So there exist coefficients v1, v2, v3 such that
v = v1 u1 + v2 u2 + v3 u3.
The formula for the coefficient vi, i = 1, 2, 3, is vi = (v · ui)/(ui · ui). So we get
v1 = (v · u1)/(u1 · u1) = (⟨3, −2, 4⟩ · ⟨1, 1, 1⟩)/(⟨1, 1, 1⟩ · ⟨1, 1, 1⟩) = 5/3,
v2 = (v · u2)/(u2 · u2) = (⟨3, −2, 4⟩ · ⟨−2, 1, 1⟩)/(⟨−2, 1, 1⟩ · ⟨−2, 1, 1⟩) = −4/6 = −2/3,
v3 = (v · u3)/(u3 · u3) = (⟨3, −2, 4⟩ · ⟨0, −3, 3⟩)/(⟨0, −3, 3⟩ · ⟨0, −3, 3⟩) = 18/18 = 1.
Therefore,
v = (5/3)⟨1, 1, 1⟩ − (2/3)⟨−2, 1, 1⟩ + ⟨0, −3, 3⟩.
C
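Remark: The following short sketch (not part of the text; it assumes NumPy is available) reproduces this computation with the component formula vi = (v · ui)/(ui · ui).

# Fourier expansion of a vector on an orthogonal set in R^3.
import numpy as np

v  = np.array([3.0, -2.0, 4.0])
u1 = np.array([1.0, 1.0, 1.0])
u2 = np.array([-2.0, 1.0, 1.0])
u3 = np.array([0.0, -3.0, 3.0])

coeffs = [np.dot(v, u) / np.dot(u, u) for u in (u1, u2, u3)]
print(coeffs)                                        # [5/3, -2/3, 1]
print(coeffs[0]*u1 + coeffs[1]*u2 + coeffs[2]*u3)    # recovers [3, -2, 4]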

7.2.2. Fourier Expansion of Functions. The Fourier expansion of a vector in an orthogonal


set in R3 is based in the notion of perpendicular vectors in three-dimensional space. If we can
introduce a notion of orthogonal functions in the space of functions, then we can introduce a
Fourier expansion of a function in an orthogonal set of functions.
The main difference between R3 and the space of functions is that we have a geometric intuition
of perpendicular vectors in three-dimensional space but we do not have such geometric intuition of
perpendicular functions in the space of functions. So, instead of geometric intuition we use analytic
methods to define the notion of perpendicular functions.
One first notices that the geometric properties of vector projections in three-dimensional space
can be captured by the dot product of vectors in R3 , more precisely, by the following three analytic
properties of the dot product: positivity, symmetry, and linearity. Then, inspired by the dot
product of vectors, we introduce a dot product of functions such that it has these same three
analytic properties: positivity, symmetry, and linearity. Once we have a dot product of functions,
we use it to define what perpendicular functions are.
Notice that there are infinitely many products of two vectors in R3 such that the result is
a number and the product has the positivity, symmetry, and linearity properties. Our geometric
intuition leads us to select one, the one we called the dot product. In the space of functions there
are also infinitely many products of two functions such that the result is a number and the product
has the positivity, symmetry, and linearity properties. The lack of geometric intuition in the space
of functions leads us to choose the simplest looking product, and here it is.

Definition 7.2.6. The dot product of two functions f, g on [−L, L] is
f · g = ∫_{−L}^{L} f(x) g(x) dx.

The dot product above takes two functions and produces a number. And one can verify that
the product has the following properties.

Theorem 7.2.7. For every functions f , g, h and every a, b ∈ R holds,


(a) Positivity: f · f = 0 iff f = 0; and f · f > 0 for f 6= 0.
(b) Symmetry: f · g = g · f .
(c) Linearity: (a f + b g) · h = a (f · h) + b (g · h).

Remarks: The magnitude of a function f is the nonnegative number
‖f‖ = √(f · f) = ( ∫_{−L}^{L} f(x)² dx )^(1/2).

We use a double bar to denote the magnitude so we do not confuse it with |f |, which means the
absolute value. A function f is a unit function iff f · f = 1.
Since we do not have a geometric intuition of perpendicular functions, we define perpendicular
functions using the dot product.

Definition 7.2.8. Two functions f , g are orthogonal (perpendicular) iff f · g = 0.

A set of functions is an orthogonal set if all the functions in the set are mutually perpendicular.
An orthonormal set is an orthogonal set where all the functions are unit functions.

Theorem 7.2.9. An orthogonal set in the space of continuous functions on [−L, L] is
{ u0 = 1/2,  un(x) = cos(nπx/L),  vn(x) = sin(nπx/L) }_{n=1}^{∞}.

Remark: To show that the set above is orthogonal we need to show that the dot product of any
two different functions in the set vanishes—the three equations below.

um · un = 0, m ≠ n;   vm · vn = 0, m ≠ n;   um · vn = 0, for all m, n.

The orthogonality of the set above can be obtained from the following statement about the
functions sine and cosine.
Theorem 7.2.10. The following relations hold for all n, m = 0, 1, 2, 3, · · · :
∫_{−L}^{L} cos(nπx/L) cos(mπx/L) dx = 0 if n ≠ m,  L if n = m ≠ 0,  2L if n = m = 0;
∫_{−L}^{L} sin(nπx/L) sin(mπx/L) dx = 0 if n ≠ m,  L if n = m;
∫_{−L}^{L} cos(nπx/L) sin(mπx/L) dx = 0.

Proof of Theorem 7.2.10: Recall the following trigonometric identities:
cos(θ) cos(φ) = (1/2) [ cos(θ + φ) + cos(θ − φ) ],
sin(θ) sin(φ) = (1/2) [ cos(θ − φ) − cos(θ + φ) ],
sin(θ) cos(φ) = (1/2) [ sin(θ + φ) + sin(θ − φ) ].
To prove the first formula in the theorem we use the first trigonometric identity above,
∫_{−L}^{L} cos(nπx/L) cos(mπx/L) dx = (1/2) ∫_{−L}^{L} cos((n + m)πx/L) dx + (1/2) ∫_{−L}^{L} cos((n − m)πx/L) dx.
First, assume either n > 0 or m > 0; then the first term vanishes, since
(1/2) ∫_{−L}^{L} cos((n + m)πx/L) dx = [ L/(2(n + m)π) ] sin((n + m)πx/L) |_{−L}^{L} = 0.
Still for either n > 0 or m > 0, assume that n ≠ m; then the second term above is
(1/2) ∫_{−L}^{L} cos((n − m)πx/L) dx = [ L/(2(n − m)π) ] sin((n − m)πx/L) |_{−L}^{L} = 0.
Again, still for either n > 0 or m > 0, assume that n = m ≠ 0; then
(1/2) ∫_{−L}^{L} cos((n − m)πx/L) dx = (1/2) ∫_{−L}^{L} dx = L.
Finally, in the case n = m = 0 it is simple to see that
∫_{−L}^{L} cos(nπx/L) cos(mπx/L) dx = ∫_{−L}^{L} dx = 2L.

The second and third formulas in the Theorem are proven in a similar way, using the second and
third trigonometric identities above. This establishes the Theorem. 
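Remark: The following numerical sketch (not part of the text; it assumes NumPy and SciPy are available) checks a few of the orthogonality relations of Theorem 7.2.10 by quadrature.

# Numerical check of the orthogonality relations on [-L, L].
import numpy as np
from scipy.integrate import quad

L = 2.0
cos_n = lambda n: (lambda x: np.cos(n * np.pi * x / L))
sin_n = lambda n: (lambda x: np.sin(n * np.pi * x / L))

print(quad(lambda x: cos_n(2)(x) * cos_n(3)(x), -L, L)[0])  # ~0   (n != m)
print(quad(lambda x: cos_n(2)(x) * cos_n(2)(x), -L, L)[0])  # ~L   (n = m != 0)
print(quad(lambda x: sin_n(1)(x) * sin_n(1)(x), -L, L)[0])  # ~L
print(quad(lambda x: cos_n(2)(x) * sin_n(5)(x), -L, L)[0])  # ~0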

Remark: The theorem above in the case m = n gives us the lengths of the functions u0, un, vn, so we can normalize them. The result is the following orthonormal set, which is sometimes used in the literature on Fourier series,
{ ũ0 = 1/√(2L),  ũn(x) = (1/√L) cos(nπx/L),  ṽn(x) = (1/√L) sin(nπx/L) }_{n=1}^{∞}.

Theorem 7.2.11 (Fourier Expansion). The orthogonal set
{ u0 = 1/2,  un(x) = cos(nπx/L),  vn(x) = sin(nπx/L) }_{n=1}^{∞}    (7.2.1)
is an orthogonal basis of the space of continuous functions on [−L, L], that is, any continuous function on [−L, L] can be decomposed as
f(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ].
Moreover, the coefficients above are given by the formulas
a0 = (1/L) ∫_{−L}^{L} f(x) dx,
an = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx,
bn = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx.
Furthermore, if the function f is piecewise continuous, then the function
fF(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ]
satisfies fF(x) = f(x) for all x where f is continuous, while at every x0 where f is discontinuous it holds that
fF(x0) = (1/2) [ lim_{x→x0+} f(x) + lim_{x→x0−} f(x) ].

Remark: We have chosen the basis function u0 = 1/2 instead of just u0 = 1, because for this
constant 1/2 the formula for the coefficient a0 has exactly the same factor in front, 1/L, as the
formulas for an and bn for n > 1.
Idea of the Proof of Theorem 7.2.11: It is not simple to prove that the set in (7.2.1) is a basis, that is, that every continuous function on [−L, L] can be written as a linear combination
f(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ].
We skip that part of the proof. But once we have the expansion above, it is not difficult to find a formula for the coefficients a0, an, and bn, for n ≥ 1. To find a coefficient bm we just multiply the expansion above by sin(mπx/L) and integrate on [−L, L], that is,
f(x) · sin(mπx/L) = ( a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ] ) · sin(mπx/L).
The linearity property of the dot product implies
f(x) · sin(mπx/L) = (a0/2) · sin(mπx/L) + Σ_{n=1}^{∞} an cos(nπx/L) · sin(mπx/L) + Σ_{n=1}^{∞} bn sin(nπx/L) · sin(mπx/L).
But (1/2) · sin(mπx/L) = 0, since the sine functions above are perpendicular to the constant functions. Also cos(nπx/L) · sin(mπx/L) = 0, since all sine functions above are perpendicular to all cosine functions above. Finally sin(nπx/L) · sin(mπx/L) = 0 for m ≠ n, since sine functions with different values of m and n are mutually perpendicular. So, on the right hand side above only one term survives, n = m,
f(x) · sin(mπx/L) = bm sin(mπx/L) · sin(mπx/L).
But on the right hand side we got the magnitude squared of the sine function above,
sin(mπx/L) · sin(mπx/L) = ‖sin(mπx/L)‖² = L.
Therefore,
f(x) · sin(mπx/L) = bm L ⇒ bm = (1/L) ∫_{−L}^{L} f(x) sin(mπx/L) dx.
To get the coefficient am, multiply the series expansion of f by cos(mπx/L) and integrate on [−L, L], that is,
f(x) · cos(mπx/L) = ( a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ] ) · cos(mπx/L).
As before, the linearity of the dot product together with the orthogonality properties of the basis implies that only one term survives,
f(x) · cos(mπx/L) = am cos(mπx/L) · cos(mπx/L).
Since
cos(mπx/L) · cos(mπx/L) = ‖cos(mπx/L)‖² = L,
we get that
f(x) · cos(mπx/L) = am L ⇒ am = (1/L) ∫_{−L}^{L} f(x) cos(mπx/L) dx.
The coefficient a0 is obtained by integrating the series expansion for f on [−L, L] and using that all sine and cosine functions above are perpendicular to the constant functions; then we get
∫_{−L}^{L} f(x) dx = (a0/2) ∫_{−L}^{L} dx = (a0/2) (2L) = a0 L,
so we get the formula
a0 = (1/L) ∫_{−L}^{L} f(x) dx.

We also skip the part of the proof about the values of the Fourier series of discontinuous functions
at the point of the discontinuity. 
We now use the formulas in the Theorem above to compute the Fourier series expansion of a
continuous function.
Example 7.2.4. Find the Fourier expansion of f(x) = x/3 for x ∈ [0, 3], and f(x) = 0 for x ∈ [−3, 0).
Solution: The Fourier expansion of f is
fF(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ].
In our case L = 3. We start computing bn for n ≥ 1,
bn = (1/3) ∫_{−3}^{3} f(x) sin(nπx/3) dx
   = (1/3) ∫_{0}^{3} (x/3) sin(nπx/3) dx
   = (1/9) [ −(3x/(nπ)) cos(nπx/3) + (9/(n²π²)) sin(nπx/3) ]_{0}^{3}
   = (1/9) [ −(9/(nπ)) cos(nπ) + 0 + 0 − 0 ],
therefore we get
bn = (−1)^(n+1)/(nπ).
A similar calculation gives us the an for n ≥ 1,
an = (1/3) ∫_{−3}^{3} f(x) cos(nπx/3) dx
   = (1/3) ∫_{0}^{3} (x/3) cos(nπx/3) dx
   = (1/9) [ (3x/(nπ)) sin(nπx/3) + (9/(n²π²)) cos(nπx/3) ]_{0}^{3}
   = (1/9) [ 0 + (9/(n²π²)) cos(nπ) − 0 − 9/(n²π²) ],
therefore we get
an = ((−1)^n − 1)/(n²π²).
Finally, we compute a0,
a0 = (1/3) ∫_{0}^{3} (x/3) dx = (1/9) (x²/2) |_{0}^{3} = 1/2 ⇒ a0 = 1/2.
Therefore, we get
fF(x) = 1/4 + Σ_{n=1}^{∞} [ ((−1)^n − 1)/(n²π²) cos(nπx/3) + ((−1)^(n+1)/(nπ)) sin(nπx/3) ].
C
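Remark: The following sketch (not part of the text; it assumes NumPy and SciPy are available) recomputes the first few coefficients of Example 7.2.4 numerically and compares them with the closed formulas for an and bn found above.

# Numerical Fourier coefficients for f(x) = x/3 on [0,3], 0 on [-3,0).
import numpy as np
from scipy.integrate import quad

L = 3.0
f = lambda x: x / 3.0 if x >= 0 else 0.0

for n in range(1, 4):
    an = quad(lambda x: f(x) * np.cos(n * np.pi * x / L), -L, L)[0] / L
    bn = quad(lambda x: f(x) * np.sin(n * np.pi * x / L), -L, L)[0] / L
    an_exact = ((-1)**n - 1) / (n**2 * np.pi**2)
    bn_exact = (-1)**(n + 1) / (n * np.pi)
    print(n, round(an, 6), round(an_exact, 6), round(bn, 6), round(bn_exact, 6))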
4 n=1 n2 π 2 3 nπ 3

7.2.3. Even or Odd Functions. The Fourier series expansion of a function takes a simpler
form in case the function is either even or odd. More interestingly, given a function on [0, L] one
can extend such function to [−L, L] requiring that the extension be either even or odd.

Definition 7.2.12. A function f on [−L, L] is:

• even iff f (−x) = f (x) for all x ∈ [−L, L];


• odd iff f (−x) = −f (x) for all x ∈ [−L, L].

Remark: Not every function is either odd or even. The function y = ex is neither even nor odd.
And in the case that a function is even, such as y = cos(x), or odd, such as y = sin(x), it is very
simple to break that symmetry, for example the functions y = cos(x − π/4) and y = 1 + sin(x) are
neither even nor odd.
Below we now show that the graph of a typical even function is symmetrical about the vertical
axis, while the graph of a typical odd function is symmetrical about the origin.

Example 7.2.5. The function y = x² is even, while the function y = x³ is odd.
Figure 4. y = x² is even.   Figure 5. y = x³ is odd.   C

We now summarize a few properties of even functions and odd functions.

Theorem 7.2.13. If fe , ge are even and ho , `o are odd functions, then:


(1) a fe + b ge is even for all a, b ∈ R.
(2) a ho + b `o is odd for all a, b ∈ R.
(3) fe ge is even.
(4) ho `o is even.
(5) fe ho is odd.
(6) ∫_{−L}^{L} fe dx = 2 ∫_{0}^{L} fe dx.
(7) ∫_{−L}^{L} ho dx = 0.
Remark: We leave the proof as an exercise. Notice that the last two equations above are simple to understand, just by looking at the figures below.
Figure 6. Integral of an even function.   Figure 7. Integral of an odd function.

7.2.4. Sine and Cosine Series. In the case that a function is either even or odd, half of its
Fourier series expansion coefficients vanish. In this case the Fourier series is called either a sine or
a cosine series.
Theorem 7.2.14. Let f be a function on [−L, L] with a Fourier expansion
f(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ].
(a) If f is even, then bn = 0. The Fourier series is called a cosine series,
f(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx/L).
(b) If f is odd, then an = 0. The Fourier series is called a sine series,
f(x) = Σ_{n=1}^{∞} bn sin(nπx/L).

Proof of Theorem 7.2.14:


Part (a): Suppose that f is even, then for n ≥ 1 we get
bn = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx,
but f is even and the sine is odd, so the integrand is odd. Therefore bn = 0.
Part (b): Suppose that f is odd, then for n ≥ 1 we get
an = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx,
but f is odd and the cosine is even, so the integrand is odd. Therefore an = 0. Finally
a0 = (1/L) ∫_{−L}^{L} f(x) dx,
but f is odd, hence a0 = 0. This establishes the Theorem. 
Example 7.2.6. Find the Fourier expansion of f(x) = 1 for x ∈ [0, 3], and f(x) = −1 for x ∈ [−3, 0).
Solution: The function f is odd, so its Fourier series expansion
fF(x) = a0/2 + Σ_{n=1}^{∞} [ an cos(nπx/L) + bn sin(nπx/L) ]
is actually a sine series. Therefore, all the coefficients an = 0 for n ≥ 0. So we only need to compute the coefficients bn. Since in our case L = 3, we have
bn = (1/3) ∫_{−3}^{3} f(x) sin(nπx/3) dx
   = (1/3) [ ∫_{−3}^{0} (−1) sin(nπx/3) dx + ∫_{0}^{3} sin(nπx/3) dx ]
   = (2/3) ∫_{0}^{3} sin(nπx/3) dx
   = (2/3) (−3/(nπ)) cos(nπx/3) |_{0}^{3}
   = (2/(nπ)) [ −(−1)^n + 1 ] ⇒ bn = (2/(nπ)) ((−1)^(n+1) + 1).
Therefore, we get
fF(x) = Σ_{n=1}^{∞} (2/(nπ)) ((−1)^(n+1) + 1) sin(nπx/3).
C
Example 7.2.7. Find the Fourier series expansion of the function f(x) = x for x ∈ [0, 1], and f(x) = −x for x ∈ [−1, 0).
Solution: Since f is even, then bn = 0. And since L = 1, we get
f(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx).
We start with a0. Since f is even, a0 is given by
a0 = 2 ∫_{0}^{1} f(x) dx = 2 ∫_{0}^{1} x dx = 2 (x²/2) |_{0}^{1} ⇒ a0 = 1.
Now we compute the an for n ≥ 1. Since f and the cosines are even, so is their product,
an = 2 ∫_{0}^{1} x cos(nπx) dx
   = 2 [ (x/(nπ)) sin(nπx) + (1/(n²π²)) cos(nπx) ]_{0}^{1}
   = (2/(n²π²)) (cos(nπ) − 1) ⇒ an = (2/(n²π²)) ((−1)^n − 1).
Therefore, the Fourier expansion of the function f above is
f(x) = 1/2 + Σ_{n=1}^{∞} (2/(n²π²)) ((−1)^n − 1) cos(nπx).
C
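Remark: The following short sketch (not part of the text; it assumes NumPy is available) evaluates partial sums of the cosine series above and shows they approach f(x) = |x| on [−1, 1].

# Partial sums of the cosine series for f(x) = |x| on [-1, 1].
import numpy as np

def partial_sum(x, N):
    s = 0.5 * np.ones_like(x)                      # a0/2
    for n in range(1, N + 1):
        s += 2.0 / (n**2 * np.pi**2) * ((-1)**n - 1) * np.cos(n * np.pi * x)
    return s

x = np.linspace(-1, 1, 5)
print(np.abs(x))               # exact values of f
print(partial_sum(x, 50))      # already close to |x| with 50 terms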

Example 7.2.8. Find the Fourier series expansion of the function f(x) = 1 − x for x ∈ [0, 1], and f(x) = 1 + x for x ∈ [−1, 0).
Solution: Since f is even, then bn = 0. And since L = 1, we get
f(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx).
We start computing a0,
a0 = ∫_{−1}^{1} f(x) dx
   = ∫_{−1}^{0} (1 + x) dx + ∫_{0}^{1} (1 − x) dx
   = (x + x²/2) |_{−1}^{0} + (x − x²/2) |_{0}^{1}
   = (1 − 1/2) + (1 − 1/2) ⇒ a0 = 1.
Similarly,
an = ∫_{−1}^{1} f(x) cos(nπx) dx
   = ∫_{−1}^{0} (1 + x) cos(nπx) dx + ∫_{0}^{1} (1 − x) cos(nπx) dx.
Recalling the integrals
∫ cos(nπx) dx = (1/(nπ)) sin(nπx),
∫ x cos(nπx) dx = (x/(nπ)) sin(nπx) + (1/(n²π²)) cos(nπx),
it is not difficult to see that
an = [ 1/(n²π²) − (1/(n²π²)) cos(−nπ) ] + [ 1/(n²π²) − (1/(n²π²)) cos(nπ) ],
and we then conclude that
an = (2/(n²π²)) (1 − cos(nπ)) = (2/(n²π²)) (1 − (−1)^n).
Therefore, the Fourier expansion of the function f above is
f(x) = 1/2 + Σ_{n=1}^{∞} (2/(n²π²)) (1 − (−1)^n) cos(nπx).
C

Example 7.2.9. Find the Fourier expansion of the odd extension of the function
f (x) = 1 − x for x ∈ (0, 1].

Solution: Since the function is f(x) = 1 − x for x ∈ (0, 1], its odd extension is
fo(x) = 1 − x for x ∈ (0, 1],  fo(0) = 0,  fo(x) = −1 − x for x ∈ [−1, 0).
Since fo is a piecewise function defined on [−1, 1], it has a Fourier expansion foF. Since fo is odd, its Fourier expansion is a sine series,
foF(x) = Σ_{n=1}^{∞} bn sin(nπx),
where we used that L = 1. And the coefficients bn can be obtained from the formula
bn = ∫_{−1}^{1} fo(x) sin(nπx) dx = 2 ∫_{0}^{1} f(x) sin(nπx) dx = 2 ∫_{0}^{1} (1 − x) sin(nπx) dx.
Recalling that ∫ x sin(ax) dx = −(x/a) cos(ax) + (1/a²) sin(ax), then we get
bn = 2 [ (−1/(nπ)) cos(nπx) + (x/(nπ)) cos(nπx) − (1/(n²π²)) sin(nπx) ]_{0}^{1}
   = (2/(nπ)) [ −cos(nπ) + cos(nπ) − 0 + 1 − 0 − 0 ]
   = 2/(nπ).
Therefore, the Fourier expansion of the odd extension of the function f above is
foF(x) = (2/π) Σ_{n=1}^{∞} sin(nπx)/n.
C

7.2.5. Applications. The Fourier series expansion is a powerful tool for signal analysis. It
allows us to view any signal in a different way, where several difficult problems are very simple to
solve. Take sound, for example. Sounds can be transformed into electrical currents by a microphone.
There are electric circuits that compute the Fourier series expansion of the currents, and the result
is the frequencies and their corresponding amplitude present on that signal. Then it is possible
to manipulate a precise frequency and then recombine the result into a current. That current is
transformed into sound by a speaker.
This type of sound manipulation is very common. You might remember the annoying sound of the vuvuzelas—loud, very cheap plastic trumpets—at the 2010 soccer World Cup. Their sound drowned out the TV commentators during that tournament. But by the 2014 World Cup you could still see vuvuzelas in the stadiums, yet you did not hear them. It turns out vuvuzelas produce an essentially single-frequency sound, at about 235 Hz. The TV equipment had incorporated a circuit that eliminated that sound, just as we described above: Fourier-expand the sound, remove that annoying frequency, and recombine the sound.
A similar, although more elaborate, sound manipulation is done constantly by sound editors in
any film. Suppose you like an actor but you do not like his voice. You record the movie, then take
the actor’s voice, compute its Fourier series expansion, increase the amplitudes of the frequencies
you like, kill the frequencies you do not like, and recombine the resulting sound. Now the actor has
a new voice in the movie.

Fourier transforms are used in image analysis too. A black and white image can be thought of as a function from a rectangle into the set {0, 1}, that is, pixel on or pixel off. We can now write this function of two variables as a linear combination of sine and cosine functions in two space dimensions. Then one can manipulate the individual frequencies, enhancing some, decreasing others, and then recombine the result. In Fig. 8 we have an image and its Fourier transform. In Figs. 9 and 10 we see the effect on the image when we erase high or low frequency modes.
Figure 8. An image and its Fourier transform.
Figure 9. Image after erasing high frequency terms.
Figure 10. Image after erasing low frequency terms.

7.2.6. Exercises.

7.2.1.- . 7.2.2.- .

7.3. The Homogeneous Heat Equation


We now solve our first partial differential equation—the heat equation—which describes the tem-
perature of a solid material as function of time and space. This is a partial differential equation
because it contains partial derivatives of both time and space variables. Partial differential equa-
tions have infinitely many solutions, but one can find appropriate boundary conditions and initial
conditions so that the solution is unique. Here we solve the heat equation with two types of boundary conditions, called Dirichlet conditions and Neumann conditions. The Dirichlet condition keeps the temperature constant on the sides of the material, while the Neumann condition prevents heat from entering or leaving the material. For either type of boundary condition, the initial condition is the same—the initial temperature of the material.
We use the separation of variables method to solve the heat equation. In this method one
transforms the problem with the partial differential equation into two problems for ordinary differ-
ential equations. One of these problems is an initial value problem for a first order equation and
the other problem is an eigenfunction problem for a second order equation.
7.3.1. The Heat Equation (in One-Space Dim). We start introducing the heat equation.
For simplicity we study only the one-space dimension case.

Definition 7.3.1. The heat equation in one-space dimension, for the function T depending on t
and x is
∂t T (t, x) = k ∂x2 T (t, x), for t ∈ (0, ∞), x ∈ (0, L),
where k > 0 is a constant and ∂t , ∂x are partial derivatives with respect to t and x.

Remarks:
• T is the temperature of a solid material, see Fig. 11.
• t is a time coordinate, while x is a space coordinate.
• k > 0 is the heat conductivity, with units [k] = [x]2 /[t].
• The partial differential equation above has infinitely many solutions.
• We look for solutions satisfying both boundary conditions and initial conditions.

Figure 11. A solid bar thermally insulated on the four blue sides.
Figure 12. Sketch of the initial-boundary value problem on the tx-plane.

The heat equation describes the temperature of a solid material. Fluid materials are not described by this equation. Warmer parts of a fluid have different density than colder parts, which originates movement inside the fluid. These movements, called convection currents, are not described by the heat equation above, which describes solid materials only.
The heat equation contains partial derivatives with respect to time and space. Solving the equation means to integrate in the space and time variables, which in turn means we have arbitrary integration constants in our solution. From here we conclude we will have infinitely many solutions to the heat equation. We are going to look for solutions that satisfy some additional conditions, known as boundary conditions and initial conditions.
Boundary Conditions: u(t, 0) = 0, u(t, L) = 0.    Initial Conditions: u(0, x) = f(x), with f(0) = f(L) = 0.
We are going to try to understand the qualitative behavior of the solutions to the heat equation
before we start any detailed calculation. Recall that the heat equation is
∂t u = k ∂x2 u.
The meaning of the left- and right-hand sides of the equation is the following:
[ how fast the temperature increases or decreases ] = k (> 0) × [ the concavity of the graph of u in the variable x at a given time ].

Suppose that at a fixed time t > 0 the graph of the temperature u as function of x is given by
Fig. 13. We assume that the boundary conditions are u(t, 0) = T0 = 0 and u(t, L) = TL > 0. Then
the temperature will evolve in time following the red arrows in that figure.
The heat equation relates the time variation of the temperature, ∂t u, to the curvature of the function u in the x variable, ∂x² u. In the regions where the function u is concave up, hence ∂x² u > 0, the heat equation says that the temperature must increase, ∂t u > 0. In the regions where the function u is concave down, hence ∂x² u < 0, the heat equation says that the temperature must decrease, ∂t u < 0.
Therefore, the heat equation tries to make the temperature along the material vary as little as possible, consistent with the boundary conditions. In the case of the figure below, the temperature will try to approach the dashed line.

Figure 13. Qualitative behavior of a solution to the heat equation.

Before we start solving the heat equation we mention one generalization and a couple of similar equations.
• The heat equation in three space dimensions is
∂t u = k (∂x2 u + ∂y2 u + ∂z2 u).
The method we use in this section to solve the one-space dimensional equation can be
generalized to solve the three-space dimensional equation.
• The wave equation in three space dimensions is
∂t2 u = v 2 (∂x2 u + ∂y2 u + ∂z2 u).
This equation describes how waves propagate in a medium. The constant v has units of
velocity, and it is the wave speed.
• The Schrödinger equation of Quantum Mechanics is
i ℏ ∂t u = −(ℏ²/(2m)) (∂x² u + ∂y² u + ∂z² u) + V(t, x) u,
where m is the mass of a particle and ℏ is the Planck constant divided by 2π, while i² = −1. The solutions of this equation behave more like the solutions of the wave equation than the solutions of the heat equation.

7.3.2. The IBVP: Dirichlet Conditions. We now find solutions of the one-space dimen-
sional heat equation that satisfy a particular type of boundary conditions, called Dirichlet boundary
conditions. These conditions fix the values of the temperature at two sides of the bar.

Theorem 7.3.2. The boundary value problem for the one space dimensional heat equation,
∂t u = k ∂x² u,  BC: u(t, 0) = 0, u(t, L) = 0,
where k > 0, L > 0 are constants, has infinitely many solutions
u(t, x) = Σ_{n=1}^{∞} cn e^(−k(nπ/L)² t) sin(nπx/L),  cn ∈ R.
Furthermore, for every continuous function f on [0, L] satisfying f(0) = f(L) = 0, there is a unique solution u of the boundary value problem above that also satisfies the initial condition
u(0, x) = f(x).
This solution u is given by the expression above, where the coefficients cn are
cn = (2/L) ∫_{0}^{L} f(x) sin(nπx/L) dx.

Remarks: This is an Initial-Boundary Value Problem (IBVP). The boundary conditions are called
Dirichlet boundary conditions. The physical meaning of the initial-boundary conditions is simple.
(a) The boundary conditions keep the temperature at the sides of the bar constant.
(b) The initial condition is the initial temperature on the whole bar.

The proof of the IBVP above is based on the separation of variables method:
(1) Look for simple solutions of the boundary value problem.
(2) Any linear combination of simple solutions is also a solution. (Superposition.)
(3) Determine the free constants with the initial condition.
Proof of the Theorem 7.3.2: Look for simple solutions of the heat equation given by
u(t, x) = v(t) w(x).
So we look for solutions having the variables separated into two functions. Introduce this particular
function in the heat equation,
v̇(t) w(x) = k v(t) w''(x) ⇒ (1/k) (v̇(t)/v(t)) = w''(x)/w(x),
where we used the notation v̇ = dv/dt and w' = dw/dx. The separation of variables in the function u implies a separation of variables in the heat equation. The left hand side in the last equation above depends only on t and the right hand side depends only on x. The only possible solution is that both sides are equal to the same constant, call it −λ. So we end up with two equations
(1/k) (v̇(t)/v(t)) = −λ, and w''(x)/w(x) = −λ.
The equation on the left is first order and simple to solve. The solution depends on λ,
vλ (t) = cλ e−kλt , cλ = vλ (0).
The second equation leads to an eigenfunction problem for w once boundary conditions are provided.
These boundary conditions come from the heat equation boundary conditions,
u(t, 0) = v(t) w(0) = 0 and u(t, L) = v(t) w(L) = 0 for all t > 0 ⇒ w(0) = w(L) = 0.
So we need to solve the following BVP for w:
w'' + λ w = 0,  w(0) = w(L) = 0.
This is the eigenfunction problem we solved in § 7.1 (Example 7.1.4); the solution is
λn = (nπ/L)²,  wn(x) = sin(nπx/L),  n = 1, 2, · · · .
Since we now know the values of λn, we introduce them in vλ,
vn(t) = cn e^(−k(nπ/L)² t).
Therefore, we get a simple solution of the heat equation BVP,
un(t, x) = cn e^(−k(nπ/L)² t) sin(nπx/L),
where n = 1, 2, · · · . Since the boundary conditions for un are homogeneous, any linear combination of the solutions un is also a solution of the heat equation with homogeneous boundary conditions. Hence the function
u(t, x) = Σ_{n=1}^{∞} cn e^(−k(nπ/L)² t) sin(nπx/L)
is a solution of the heat equation with homogeneous Dirichlet boundary conditions. Here the cn are arbitrary constants. Notice that at t = 0 we have
u(0, x) = Σ_{n=1}^{∞} cn sin(nπx/L).
If we prescribe the cn we get a solution u that at t = 0 is given by the previous formula. Is the converse true? The answer is "yes". Given f(x) = u(0, x), where f(0) = f(L) = 0, we can find all the coefficients cn. Here is how: Given f on [0, L], extend it to the domain [−L, L] as an odd function,
fodd(x) = f(x) and fodd(−x) = −f(x),  x ∈ [0, L].
Since f(0) = 0, we get that fodd is continuous on [−L, L]. So fodd has a Fourier series expansion. Since fodd is odd, the Fourier series is a sine series
fodd(x) = Σ_{n=1}^{∞} bn sin(nπx/L)
and the coefficients are given by the formula
bn = (1/L) ∫_{−L}^{L} fodd(x) sin(nπx/L) dx = (2/L) ∫_{0}^{L} f(x) sin(nπx/L) dx.
Since fodd (x) = f (x) for x ∈ [0, L], then cn = bn . This establishes the Theorem. 

Example 7.3.1 (Dirichlet). Find the solution to the initial-boundary value problem
4 ∂t u = ∂x2 u, t > 0, x ∈ [0, 2],
with initial and boundary conditions given by
IC: u(0, x) = 0 for x ∈ [0, 2/3), u(0, x) = 5 for x ∈ [2/3, 4/3], u(0, x) = 0 for x ∈ (4/3, 2];
BC: u(t, 0) = 0, u(t, 2) = 0.
Solution: We look for simple solutions of the form u(t, x) = v(t) w(x),
4 w(x) v̇(t) = v(t) w''(x) ⇒ 4 v̇(t)/v(t) = w''(x)/w(x) = −λ.
So, the equations for v and w are
v̇(t) = −(λ/4) v(t),  w''(x) + λ w(x) = 0.
The solution for v depends on λ, and is given by
vλ(t) = cλ e^(−(λ/4) t),  cλ = vλ(0).
Next we turn to the equation for w, and we solve the BVP
w''(x) + λ w(x) = 0, with BC w(0) = w(2) = 0.
This is an eigenfunction problem for w and λ. This problem has solutions only for λ > 0, since only in that case the characteristic polynomial has complex roots. Let λ = µ², then
p(r) = r² + µ² = 0 ⇒ r± = ±µ i.
The general solution of the differential equation is
w(x) = c1 cos(µx) + c2 sin(µx).
The first boundary condition on w implies
0 = w(0) = c1 ⇒ w(x) = c2 sin(µx).
The second boundary condition on w implies
0 = w(2) = c2 sin(2µ), c2 ≠ 0 ⇒ sin(2µ) = 0.
Then 2µn = nπ, that is, µn = nπ/2. Choosing c2 = 1, we conclude
λn = (nπ/2)²,  wn(x) = sin(nπx/2),  n = 1, 2, · · · .
Using the values of λn found above in the formula for vλ we get
vn(t) = cn e^(−(nπ/4)² t),  cn = vn(0).
Therefore, we get
u(t, x) = Σ_{n=1}^{∞} cn e^(−(nπ/4)² t) sin(nπx/2).
The initial condition is
f(x) = u(0, x) = 0 for x ∈ [0, 2/3), f(x) = 5 for x ∈ [2/3, 4/3], f(x) = 0 for x ∈ (4/3, 2].
We extend this function to [−2, 2] as an odd function,
fodd(x) = f(x) and fodd(−x) = −f(x), where x ∈ [0, 2].
The Fourier expansion of fodd on [−2, 2] is a sine series
fodd(x) = Σ_{n=1}^{∞} bn sin(nπx/2).
The coefficients bn are given by
bn = (2/2) ∫_{0}^{2} f(x) sin(nπx/2) dx = ∫_{2/3}^{4/3} 5 sin(nπx/2) dx = −(10/(nπ)) cos(nπx/2) |_{2/3}^{4/3}.
So we get
bn = −(10/(nπ)) [ cos(2nπ/3) − cos(nπ/3) ].
Since fodd(x) = f(x) for x ∈ [0, 2] we get that cn = bn. So, the solution of the initial-boundary value problem for the heat equation is
u(t, x) = (10/π) Σ_{n=1}^{∞} (1/n) [ cos(nπ/3) − cos(2nπ/3) ] e^(−(nπ/4)² t) sin(nπx/2).
C
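Remark: The following numerical sketch (not part of the text; it assumes NumPy is available) evaluates the truncated series solution of Example 7.3.1 and checks that at t = 0 it reproduces the initial condition.

# Evaluate the truncated series solution of Example 7.3.1.
import numpy as np

def u(t, x, N=200):
    s = 0.0
    for n in range(1, N + 1):
        bn = (10.0 / (n * np.pi)) * (np.cos(n * np.pi / 3) - np.cos(2 * n * np.pi / 3))
        s += bn * np.exp(-(n * np.pi / 4)**2 * t) * np.sin(n * np.pi * x / 2)
    return s

print(u(0.0, 1.0))   # near 5, the initial temperature at x = 1 (inside [2/3, 4/3])
print(u(0.0, 0.2))   # near 0, outside that interval
print(u(2.0, 1.0))   # much smaller: the temperature decays in time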

7.3.3. The IBVP: Neumann Conditions. We now find solutions of the one-space dimen-
sional heat equation that satisfy a particular type of boundary conditions, called Neumann boundary
conditions. These conditions fix the values of the heat flux at two sides of the bar.

Theorem 7.3.3. The boundary value problem for the one space dimensional heat equation,
∂t u = k ∂x² u,  BC: ∂x u(t, 0) = 0, ∂x u(t, L) = 0,
where k > 0, L > 0 are constants, has infinitely many solutions
u(t, x) = c0/2 + Σ_{n=1}^{∞} cn e^(−k(nπ/L)² t) cos(nπx/L),  cn ∈ R.
Furthermore, for every continuous function f on [0, L] satisfying f'(0) = f'(L) = 0, there is a unique solution u of the boundary value problem above that also satisfies the initial condition
u(0, x) = f(x).
This solution u is given by the expression above, where the coefficients cn are
cn = (2/L) ∫_{0}^{L} f(x) cos(nπx/L) dx,  n = 0, 1, 2, · · · .

Remarks: This is an Initial-Boundary Value Problem (IBVP). The boundary conditions are called
Neumann boundary conditions. The physical meaning of the initial-boundary conditions is simple.
(a) The boundary conditions keep the heat flux (proportional to ∂x u) at the sides of the bar constant.
(b) The initial condition is the initial temperature on the whole bar.

One can use Dirichlet conditions on one side and Neumann on the other side. This is called a
mixed boundary condition. The proof, in all cases, is based on the separation of variables method.
Proof of the Theorem 7.3.3: Look for simple solutions of the heat equation given by
u(t, x) = v(t) w(x).
So we look for solutions having the variables separated into two functions. Introduce this particular
function in the heat equation,
v̇(t) w(x) = k v(t) w''(x) ⇒ (1/k) (v̇(t)/v(t)) = w''(x)/w(x),
where we used the notation v̇ = dv/dt and w' = dw/dx. The separation of variables in the function u implies a separation of variables in the heat equation. The left hand side in the last equation above depends only on t and the right hand side depends only on x. The only possible solution is that both sides are equal to the same constant, call it −λ. So we end up with two equations
(1/k) (v̇(t)/v(t)) = −λ, and w''(x)/w(x) = −λ.
The equation on the left is first order and simple to solve. The solution depends on λ,
vλ (t) = cλ e−kλt , cλ = vλ (0).
The second equation leads to an eigenfunction problem for w once boundary conditions are provided.
These boundary conditions come from the heat equation boundary conditions,
∂x u(t, 0) = v(t) w'(0) = 0 and ∂x u(t, L) = v(t) w'(L) = 0 for all t > 0 ⇒ w'(0) = w'(L) = 0.
So we need to solve the following BVP for w:
w'' + λ w = 0,  w'(0) = w'(L) = 0.
This is an eigenfunction problem, which has nonconstant solutions only for λ > 0, because in that case the associated characteristic polynomial has complex roots; the case λ = 0 gives the constant eigenfunctions, which we add below. If we write λ = µ², for µ > 0, we get the general solution
w(x) = c1 cos(µx) + c2 sin(µx).
The boundary conditions apply on the derivative,
w'(x) = −µ c1 sin(µx) + µ c2 cos(µx).
The first boundary condition is
0 = w'(0) = µ c2 ⇒ c2 = 0.
So the function is w(x) = c1 cos(µx). The second boundary condition is
0 = w'(L) = −µ c1 sin(µL) ⇒ sin(µL) = 0 ⇒ µn L = nπ, n = 1, 2, · · · .
So we get the eigenvalues and eigenfunctions
λn = (nπ/L)²,  wn(x) = cos(nπx/L),  n = 1, 2, · · · .
Since we now know the values of λn, we introduce them in vλ,
vn(t) = cn e^(−k(nπ/L)² t).
Therefore, we get a simple solution of the heat equation BVP,
un(t, x) = cn e^(−k(nπ/L)² t) cos(nπx/L),
where n = 1, 2, · · · . Since the boundary conditions for un are homogeneous, any linear combination of the solutions un is also a solution of the heat equation with homogeneous boundary conditions. Hence the function
u(t, x) = c0/2 + Σ_{n=1}^{∞} cn e^(−k(nπ/L)² t) cos(nπx/L)
is a solution of the heat equation with homogeneous Neumann boundary conditions. Notice that the constant solution c0/2 is a trivial solution of the Neumann boundary value problem, which was not present in the Dirichlet boundary value problem. Here the cn are arbitrary constants. Notice that at t = 0 we have
u(0, x) = c0/2 + Σ_{n=1}^{∞} cn cos(nπx/L).
If we prescribe the cn we get a solution u that at t = 0 is given by the previous formula. Is the converse true? The answer is "yes". Given f(x) = u(0, x), where f'(0) = f'(L) = 0, we can find all the coefficients cn. Here is how: Given f on [0, L], extend it to the domain [−L, L] as an even function,
feven(x) = f(x) and feven(−x) = f(x),  x ∈ [0, L].
We get that feven is continuous on [−L, L]. So feven has a Fourier series expansion. Since feven is even, the Fourier series is a cosine series
feven(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx/L)
and the coefficients are given by the formula
an = (1/L) ∫_{−L}^{L} feven(x) cos(nπx/L) dx = (2/L) ∫_{0}^{L} f(x) cos(nπx/L) dx,  n = 0, 1, 2, · · · .
Since feven (x) = f (x) for x ∈ [0, L], then cn = an . This establishes the Theorem. 

Example 7.3.2 (Neumann). Find the solution to the initial-boundary value problem
∂t u = ∂x2 u, t > 0, x ∈ [0, 3],
with initial and boundary conditions given by
IC: u(0, x) = 7 for x ∈ [3/2, 3], u(0, x) = 0 for x ∈ [0, 3/2);
BC: ∂x u(t, 0) = 0, ∂x u(t, 3) = 0.
Solution: We look for simple solutions of the form u(t, x) = v(t) w(x),
w(x) v̇(t) = v(t) w''(x) ⇒ v̇(t)/v(t) = w''(x)/w(x) = −λ.
So, the equations for v and w are
v̇(t) = −λ v(t),  w''(x) + λ w(x) = 0.
The solution for v depends on λ, and is given by
vλ(t) = cλ e^(−λ t),  cλ = vλ(0).
Next we turn to the equation for w, and we solve the BVP
w''(x) + λ w(x) = 0, with BC w'(0) = w'(3) = 0.
This is an eigenfunction problem for w and λ. This problem has nonconstant solutions only for λ > 0, since only in that case the characteristic polynomial has complex roots. Let λ = µ², then
p(r) = r² + µ² = 0 ⇒ r± = ±µ i.
The general solution of the differential equation is
w(x) = c1 cos(µx) + c2 sin(µx).
Its derivative is
w'(x) = −µ c1 sin(µx) + µ c2 cos(µx).
The first boundary condition on w implies
0 = w'(0) = µ c2 ⇒ c2 = 0 ⇒ w(x) = c1 cos(µx).
The second boundary condition on w implies
0 = w'(3) = −µ c1 sin(3µ), c1 ≠ 0 ⇒ sin(3µ) = 0.
Then 3µn = nπ, that is, µn = nπ/3. Choosing c1 = 1, we conclude
λn = (nπ/3)²,  wn(x) = cos(nπx/3),  n = 0, 1, 2, 3, · · · .
Notice that n = 0 is included because it gives us λ0 = 0 and w0 ≠ 0 constant. This is an eigenpair because it satisfies the differential equation w'' + 0 · w = 0 and the boundary conditions w'(0) = 0, w'(3) = 0, and also w0 is nonzero. We choose w0 = 1/2.
Using the values of λn found above in the formula for vλ we get
vn(t) = cn e^(−(nπ/3)² t),  cn = vn(0).
Therefore, we get
u(t, x) = c0/2 + Σ_{n=1}^{∞} cn e^(−(nπ/3)² t) cos(nπx/3),
where we have added the trivial constant solution written as c0/2. The initial condition is
f(x) = u(0, x) = 7 for x ∈ [3/2, 3], f(x) = 0 for x ∈ [0, 3/2).
We extend f to [−3, 3] as an even function,
feven(x) = 7 for x ∈ [3/2, 3],  feven(x) = 0 for x ∈ (−3/2, 3/2),  feven(x) = 7 for x ∈ [−3, −3/2].
Since feven is even, its Fourier expansion is a cosine series
feven(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx/3).
The coefficient a0 is given by
a0 = (2/3) ∫_{0}^{3} f(x) dx = (2/3) ∫_{3/2}^{3} 7 dx = (2/3) · 7 · (3/2) ⇒ a0 = 7.
Now the coefficients an for n ≥ 1 are given by
an = (2/3) ∫_{0}^{3} f(x) cos(nπx/3) dx
   = (2/3) ∫_{3/2}^{3} 7 cos(nπx/3) dx
   = 7 (2/3) (3/(nπ)) sin(nπx/3) |_{3/2}^{3}
   = 7 (2/3) (3/(nπ)) [ 0 − sin(nπ/2) ]
   = −7 (2/(nπ)) sin(nπ/2).
But for n = 2k we have that sin(2kπ/2) = sin(kπ) = 0, while for n = 2k − 1 we have that sin((2k − 1)π/2) = (−1)^(k−1). Therefore
a_{2k} = 0,  a_{2k−1} = 7 · 2(−1)^k / ((2k − 1)π),  k = 1, 2, · · · .
We then obtain the Fourier series expansion of feven,
feven(x) = 7/2 + Σ_{k=1}^{∞} 7 ( 2(−1)^k / ((2k − 1)π) ) cos((2k − 1)πx/3).
But the function f has exactly the same Fourier expansion on [0, 3], which means that
c0 = 7,  c_{2k} = 0,  c_{2k−1} = 7 · 2(−1)^k / ((2k − 1)π).
So the solution of the initial-boundary value problem for the heat equation is
u(t, x) = 7/2 + Σ_{k=1}^{∞} 7 ( 2(−1)^k / ((2k − 1)π) ) e^(−((2k−1)π/3)² t) cos((2k − 1)πx/3).
C
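Remark: The following sketch (not part of the text; it assumes NumPy is available) checks that the Neumann series of Example 7.3.2 reproduces the initial condition at t = 0 and approaches the average value 7/2 for large times.

# Evaluate the truncated Neumann series solution of Example 7.3.2.
import numpy as np

def u(t, x, K=200):
    s = 7.0 / 2.0
    for k in range(1, K + 1):
        ck = 7.0 * 2.0 * (-1)**k / ((2 * k - 1) * np.pi)
        s += ck * np.exp(-((2 * k - 1) * np.pi / 3)**2 * t) * np.cos((2 * k - 1) * np.pi * x / 3)
    return s

print(u(0.0, 2.5))   # near 7 (inside [3/2, 3])
print(u(0.0, 0.5))   # near 0 (inside [0, 3/2))
print(u(5.0, 0.5))   # approaching the average value 7/2 as t grows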

Example 7.3.3 (Dirichlet). Find the solution to the initial-boundary value problem
∂t u = 4 ∂x2 u, t > 0, x ∈ [0, 2],
with initial and boundary conditions given by
u(0, x) = 3 sin(πx/2), u(t, 0) = 0, u(t, 2) = 0.
Solution: We look for simple solutions
u(t, x) = v(t) w(x).
We put this function into the heat equation and we get
w(x) v̇(t) = 4 v(t) w''(x) ⇒ v̇(t)/(4 v(t)) = w''(x)/w(x) = −λ.
The equations for v and w are
v̇(t) = −4λ v(t),  w''(x) + λ w(x) = 0.
We solve for v, which depends on the constant λ, and we get
vλ(t) = cλ e^(−4λ t),
where cλ = vλ(0). Next we turn to the boundary value problem for w. We need to find the solution of
w''(x) + λ w(x) = 0, with w(0) = w(2) = 0.
This is an eigenfunction problem for w and λ. From § 7.1 we know that this problem has solutions only for λ > 0, which is when the characteristic polynomial of the equation for w has complex roots. So we write λ = µ² for µ > 0. The characteristic polynomial of the differential equation for w is
p(r) = r² + µ² = 0 ⇒ r± = ±µ i.
The general solution of the differential equation is
w(x) = c̃1 cos(µx) + c̃2 sin(µx).
The first boundary condition on w implies
0 = w(0) = c̃1 ⇒ w(x) = c̃2 sin(µx).
The second boundary condition on w implies
0 = w(2) = c̃2 sin(2µ), c̃2 ≠ 0 ⇒ sin(2µ) = 0.
Then 2µn = nπ, that is, µn = nπ/2, for n ≥ 1. Choosing c̃2 = 1, we conclude
λn = (nπ/2)²,  wn(x) = sin(nπx/2),  n = 1, 2, · · · .
Using these λn in the expression for vλ we get
vn(t) = cn e^(−4(nπ/2)² t) = cn e^(−(nπ)² t).
The expressions for vn and wn imply that the simple solution un has the form
un(t, x) = cn e^(−(nπ)² t) sin(nπx/2).
Since any linear combination of the functions above is also a solution, we get
u(t, x) = Σ_{n=1}^{∞} cn e^(−(nπ)² t) sin(nπx/2).
The initial condition is
3 sin(πx/2) = Σ_{n=1}^{∞} cn sin(nπx/2).
We now consider this function on the interval [−2, 2], where it is an odd function. Then, the orthogonality of the sine functions above implies
∫_{−2}^{2} 3 sin(πx/2) sin(mπx/2) dx = Σ_{n=1}^{∞} cn ∫_{−2}^{2} sin(nπx/2) sin(mπx/2) dx.
Only the term n = m survives on the right hand side, while the left hand side vanishes for m ≠ 1, so
cm = 0 for m ≠ 1.
Therefore we get
3 sin(πx/2) = c1 sin(πx/2) ⇒ c1 = 3.
So the solution of the initial-boundary value problem for the heat equation is
u(t, x) = 3 e^(−π² t) sin(πx/2).
C
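Remark: The following sketch (not part of the text; it assumes NumPy is available) compares the exact decay factor e^(−π² t) of the solution above with the decay produced by an explicit finite-difference scheme for ∂t u = 4 ∂x² u.

# Finite-difference check of the decay rate for Example 7.3.3.
import numpy as np

L, N, k = 2.0, 200, 4.0
x = np.linspace(0.0, L, N + 1)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / k                     # small step for stability
u = 3.0 * np.sin(np.pi * x / 2.0)        # initial data 3 sin(pi x / 2)

t, T = 0.0, 0.1
while t < T:
    u[1:-1] += k * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    u[0] = u[-1] = 0.0                   # Dirichlet boundary conditions
    t += dt

print(u[N // 2] / 3.0)                   # numerical decay factor at x = 1
print(np.exp(-np.pi**2 * t))             # exact factor e^{-pi^2 t}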
7.3.4. Exercises.

7.3.1.- . 7.3.2.- .
CHAPTER 8

Appendices

A. Overview of Complex Numbers


The first public appearance of complex numbers was in 1545, in Gerolamo Cardano's Ars Magna, when he published a way to find solutions of a cubic equation ax³ + bx + c = 0. The solution formula was not his own but given to him sometime earlier by Scipione del Ferro. In order to get such a formula there was a step in the calculation involving a √−1, which was a mystery for the mathematicians of that time. There is no real number whose square is −1, so what the heck does this symbol √−1 mean? More intriguing, a few steps later during the calculation, this √−1 cancels out, and it does not appear in the final formula for the roots of the cubic equation. It was like a ghost entered your calculation and walked out of it without leaving a trace. Maybe we should call them magic numbers.
Everything in nature is magic until we understand how it works, then knowledge advances and
magic retreats, one step at a time. It took a while, until the beginning of the 19th century with
the—independent but almost simultaneous—works of Karl Gauss and William Hamilton, but our
magic numbers were finally understood and they became the complex numbers.
In spite of their name, there is nothing complex about complex numbers. Planar numbers is
a better fit to what they are—the set of all ordered pairs of real numbers together with specific
addition and multiplication rules. Complex numbers can be identified with points on a plane, in
the same way that real numbers can be identified with points on a line.

Definition A.1. Complex numbers are numbers of the form


(a, b),
where a and b are real numbers, together with the operations of addition,
(a, b) + (c, d) = (a + c, b + d), (A.1)
and multiplication,
(a, b)(c, d) = (ac − bd, ad + bc). (A.2)

The operation of addition is simple to understand because it is exactly how we add vectors on a plane,
⟨a, b⟩ + ⟨c, d⟩ = ⟨(a + c), (b + d)⟩.
It is the multiplication that distinguishes complex numbers from vectors on the plane. To understand these operations it is useful to start with the following properties.

Theorem A.2. The addition and multiplication of complex number are commutative, associative,
and distributive. That is, given arbitrary complex numbers x, y, and z holds
(a) Commutativity: x + y = y + x and x y = y x.
(b) Associativity: x + (y + z) = (x + y) + z and x (y z) = (x y) z.
(c) Distributivity: x (y + z) = x y + x z.

Proof of Theorem A.2: We show how to prove one of these properties, the proof for the rest is
similar. Let’s see the commutativity of multiplication. Given the complex numbers x = (a, b) and

y = (c, d) we have
x y = (a, b)(c, d) = ((ac − bd), (ad + bc))
and
y x = (c, d)(a, b) = ((ca − db), (cb + da)),
therefore we get that x y = y x. The rest of the properties can be proven in a similar way. This
establishes the Theorem. 

We now mention a few more properties of complex numbers which are straightforward from
the definitions above. For all complex numbers (a, b) we have that
(0, 0) + (a, b) = (a, b)
(−a, −b) + (a, b) = (0, 0)
(a, b)(1, 0) = (a, b).
From the first equation above the complex number (0, 0) is called the zero complex number. From
the second equation above the complex number (−a, −b) is called the negative of (a, b), and we
write
−(a, b) = (−a, −b).
From the last equation above the complex number (1, 0) is called the identity for the multiplication.
The inverse of a complex number (a, b), denoted as (a, b)⁻¹, is the complex number satisfying
(a, b) (a, b)⁻¹ = (1, 0).
Since the inverse of a complex number is itself a complex number, it can be written as
(a, b)⁻¹ = (c, d)
for appropriate components c and d. The next result says that every nonzero complex number has
an inverse, and it gives us a formula for its components.

Theorem A.3. The inverse of (a, b), with either a ≠ 0 or b ≠ 0, is
(a, b)⁻¹ = ( a/(a² + b²), −b/(a² + b²) ).        (A.3)

Proof of Theorem A.3: A complex number (a, b)⁻¹ is the inverse of (a, b) iff
(a, b) (a, b)⁻¹ = (1, 0).
When we write (a, b)⁻¹ = (c, d), the equation above is
(a, b)(c, d) = (1, 0).
If we compute explicitly the left-hand side above we get
(ac − bd, ad + bc) = (1, 0).
The equation above implies two equations for real numbers,
ac − bd = 1,    ad + bc = 0.
This is a linear system for c and d. In the case that either a ≠ 0 or b ≠ 0, its unique solution is
c = a/(a² + b²),    d = −b/(a² + b²).
Therefore, the inverse of (a, b) is
(a, b)⁻¹ = ( a/(a² + b²), −b/(a² + b²) ).
This establishes the Theorem. □



Example A.1. Find the inverse of (2, 3). Then, verify your result.
Solution: The formula above says that (2, 3)⁻¹ is given by
(2, 3)⁻¹ = ( 2/(2² + 3²), −3/(2² + 3²) )   ⇒   (2, 3)⁻¹ = ( 2/13, −3/13 ).
This is correct, since
(2, 3)( 2/13, −3/13 ) = ( 4/13 − (−9/13), −6/13 + 6/13 ) = ( 13/13, 0/13 ) = (1, 0).
C

A.1. Extending the Real Numbers. The set of all complex numbers of the form (a, 0)
satisfies the same properties as the set of all real numbers a. Indeed, for all real numbers a, c we have
(a, 0) + (c, 0) = (a + c, 0),    (a, 0)(c, 0) = (ac, 0).
We also have that
−(a, 0) = (−a, 0),
and the formula above for the inverse of a complex number says that, for a ≠ 0,
(a, 0)⁻¹ = ( 1/a, 0 ).
From here it is natural to identify a complex number (a, 0) with the real number a, that is,
(a, 0) ←→ a.
This identification suggests the following definition.

Definition A.4. The real part of z = (a, b) is a, and the imaginary part of z is b. We also use
the notation
a = Re(z),    b = Im(z).
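For example (a small illustration added here), if z = (3, −5) then Re(z) = 3 and Im(z) = −5, while for z = (0, 7) the real part vanishes, Re(z) = 0 and Im(z) = 7.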

A.2. The Imaginary Unit. We now understand complex numbers of the form (a, 0); they are
nothing more than the real numbers. Next we study complex numbers of the form (0, b), that is,
complex numbers with no real part. In particular, we focus on the complex number (0, 1), which we
call the imaginary unit. Let us compute its square,
(0, 1)² = (0, 1)(0, 1) = (0·0 − 1·1, 0·1 + 1·0) = (−1, 0) = −(1, 0).
Within the complex numbers we do have a number whose square is negative one, and that number
is the imaginary unit (0, 1). Actually, there are two complex numbers whose square is negative one,
one is (0, 1) and the other is −(0, 1), because
(0, −1)² = (0, −1)(0, −1) = (0 − (−1)(−1), 0 + 0) = (−1, 0) = −(1, 0).
So, in the set of complex numbers we do have solutions of the equation z² = −(1, 0), given by
√(−(1, 0)) = ±(0, 1).
Notice that √−1 has no solutions among the real numbers, but √(−(1, 0)) has two solutions among
the complex numbers. This is the origin of the confusion with del Ferro's calculation. Most of his
calculation used numbers of the form (a, 0), written simply as a, except at one tiny spot where a
number (0, 1) shows up and later on cancels out. Del Ferro's calculation makes perfect sense in the
complex realm, and almost all of it can be reproduced with real numbers, but not all.

A.3. Standard Notation. We can now relate the ordered pair notation we have been using
for complex numbers with the notation used by the early mathematicians. We start by noticing that
(a, b) = (a, 0) + (0, b) = (a, 0) + (b, 0)(0, 1).
Therefore, if we write a for (a, 0), b for (b, 0), and we use i = (0, 1), we get that every complex
number (a, b) can be written as
(a, b) = a + bi.
Recall, a and b are the real and imaginary parts of (a, b). And the equation
(0, 1)² = −(1, 0)
in the new notation is
i² = −1.
This notation (a + bi) is useful to manipulate formulas involving addition and multiplication. If we
multiply (a + bi) by (c + di) and use the distributive and associative properties we get
(a + bi)(c + di) = ac + adi + cbi + bdi²,
and if we recall that i² = −1 and we reorder terms, we get
(a + bi)(c + di) = ac − bd + (ad + bc)i.
So, we do not need to remember the formula for the product of two complex numbers. With the
new notation, this formula comes from the distributive and associative properties. Similarly, to
compute the inverse of a complex number a + bi we may write
1/(a + bi) = (1/(a + bi)) ((a − bi)/(a − bi)) = (a − bi)/((a + bi)(a − bi)).
Notice that
(a + bi)(a − bi) = a² + b²,
which has only a real part. Then we can write
1/(a + bi) = (a − bi)/(a² + b²)   ⇒   1/(a + bi) = a/(a² + b²) − ( b/(a² + b²) ) i,
which agrees with the formula we got in Theorem A.3.
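As a quick worked illustration of both computations (numbers chosen here for illustration, not from the original text),
(2 + 3i)(1 − 4i) = 2 − 8i + 3i − 12i² = 2 − 5i + 12 = 14 − 5i,
and, for an inverse,
1/(1 + 2i) = (1 − 2i)/((1 + 2i)(1 − 2i)) = (1 − 2i)/5 = 1/5 − (2/5) i.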

A.4. Useful Formulas. The powers of i can have only four possible results.

Theorem A.5. The integer powers of i can have only four results: 1, i, −1, and −i.

Proof of Theorem A.5: We just show that this is the case for the first few powers. By definition of
the zeroth and first power we know that
i⁰ = 1,    i¹ = i.
We also know that
i² = −1.
We can compute the next powers, using that (a + bi)^{m+n} = (a + bi)^m (a + bi)^n, so we get
i³ = i² i = (−1) i = −i,
i⁴ = i³ i = (−i) i = −i² = 1,
i⁵ = i⁴ i = (1) i = i,
i⁶ = i⁵ i = i i = −1,
i⁷ = i⁶ i = (−1) i = −i,
and so on. An argument using induction would prove this Theorem. □
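Since the values repeat with period four, any power of i can be read off from the remainder of the exponent upon division by 4. For example (an added illustration),
i²⁰²³ = i^{4·505 + 3} = (i⁴)⁵⁰⁵ i³ = i³ = −i.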

The conjugate of a complex number a + bi is the complex number
\overline{a + bi} = a − bi.
For example,
\overline{1 + 2i} = 1 − 2i,    \overline{a} = a,    \overline{i} = −i,    \overline{4i} = −4i.
If we conjugate twice we get the original complex number, that is, \overline{\overline{a + bi}} = a + bi.
The modulus or absolute value of a complex number a + bi is the real number
|a + bi| = √(a² + b²).
For example,
|3 + 4i| = √(9 + 16) = √25 = 5,    |a + 0i| = |a|,    |i| = 1,    |1 + i| = √2.
Using these definitions it is simple to see that
(a + bi) \overline{(a + bi)} = (a + bi)(a − bi) = a² + b² = |a + bi|².
Using these definitions we can rewrite the formula in Eq. (A.3) for the inverse of a complex number
as follows,
1/(a + bi) = (a − bi)/(a² + b²).
If we call z = a + bi, then the formula for z⁻¹ reduces to
z⁻¹ = \overline{z}/|z|².

Example A.2. Write 1/(3 + 4i) in the form c + di.
Solution: You multiply numerator and denominator by 3 − 4i,
1/(3 + 4i) = (1/(3 + 4i)) ((3 − 4i)/(3 − 4i))
           = (3 − 4i)/(3² + 4²)
           = (3 − 4i)/25
           = 3/25 − (4/25) i.
So, we have found that the inverse of (3 + 4i) is ( 3/25 − (4/25) i ). C

The absolute value of complex numbers satisfies the triangle inequality.

Theorem A.6. For all complex numbers z₁, z₂ we have |z₁ + z₂| ≤ |z₁| + |z₂|.

Remark: The idea of the proof of Theorem A.6 is to use the graphical representation of complex
numbers as vectors on a plane. Then |z₁| is the length of the vector given by z₁, and the same
holds for the vectors associated with z₂ and z₁ + z₂, the latter being the diagonal of the parallelogram
formed by z₁ and z₂. Then it is clear that the triangle inequality holds.
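For instance (a numerical check added here), take z₁ = 3 and z₂ = 4i. Then |z₁ + z₂| = |3 + 4i| = 5, while |z₁| + |z₂| = 3 + 4 = 7, and indeed 5 ≤ 7. Equality holds only when one of the numbers is a nonnegative real multiple of the other.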
The absolute value of a complex number also satisfies the following properties.

Theorem A.7. For all complex numbers z₁, z₂ we have |z₁ z₂| = |z₁| |z₂|.

Proof of Theorem A.7: For arbitrary complex numbers z₁ = a + bi and z₂ = c + di, we have
z₁ z₂ = (ac − bd) + (ad + bc)i,
therefore,
|z₁ z₂|² = (ac − bd)² + (ad + bc)²
         = (ac)² + (bd)² − 2acbd + (ad)² + (bc)² + 2adbc
         = a²c² + b²d² + a²d² + b²c²
         = a²(c² + d²) + b²(d² + c²)
         = (a² + b²)(d² + c²)
         = |z₁|² |z₂|².
Taking the square root we get
|z₁ z₂| = |z₁| |z₂|.
This establishes the Theorem. □
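As a quick check with numbers (added here for illustration), take z₁ = 3 + 4i and z₂ = 1 + i. Then
z₁ z₂ = (3 + 4i)(1 + i) = 3 + 3i + 4i − 4 = −1 + 7i,
so |z₁ z₂| = √(1 + 49) = √50 = 5√2, which agrees with |z₁| |z₂| = 5 · √2.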

Theorem A.8. Every complex number z satisfies |zⁿ| = |z|ⁿ, for every nonnegative integer n.

Proof of Theorem A.8: One proof uses the previous Theorem A.7 and induction on n. For n = 2
the statement follows from the theorem above,
|z²| = |z z| = |z| |z| = |z|².
Now, suppose the statement is true for n − 1, so |zⁿ⁻¹| = |z|ⁿ⁻¹. Then
|zⁿ| = |zⁿ⁻¹ z| = |zⁿ⁻¹| |z|,
where we used the previous Theorem A.7. In the first factor we now use the inductive hypothesis,
|zⁿ⁻¹| |z| = |z|ⁿ⁻¹ |z| = |z|ⁿ.
So we have proven that |zⁿ| = |z|ⁿ. This establishes the Theorem. □

Remark: A second proof, independent of the previous theorem, is that for an arbitrary nonnegative
integer n we have
|zⁿ| = √(zⁿ \overline{zⁿ}) = √(zⁿ (\overline{z})ⁿ) = √((z \overline{z})ⁿ) = (√(z \overline{z}))ⁿ = |z|ⁿ.

Example A.3. Verify the result in Theorem A.8 for n = 3 and z = 3 + 4i.
Solution: First we compute |z| and then its cube,
|z| = |3 + 4i| = √(9 + 16) = 5   ⇒   |z|³ = 125.
We now compute z³, and then its absolute value,
z³ = (3 + 4i)(3 + 4i)(3 + 4i) = −117 + 44 i   ⇒   |z³| = √(117² + 44²) = 125.
Therefore, |z|³ = |z³|. As an extra bonus, we found another Pythagorean triple, besides the famous
3² + 4² = 5², namely
44² + 117² = 125².
C

A.5. Complex Functions. We know how to add and multiply complex numbers,
(a + bi) + (c + di) = (a + c) + (b + d)i,
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
This means we can extend to complex arguments any real-valued function of a real variable that has
a Taylor series expansion: we use the function's Taylor series as the definition of the function for
complex numbers. For example, the real-valued exponential function has the Taylor series expansion
e^{at} = Σ_{n=0}^{∞} aⁿ tⁿ / n!.

Therefore, we define the complex-valued exponential as follows.

Definition A.9. The complex-valued exponential function is given by
e^z = Σ_{n=0}^{∞} zⁿ / n!.        (A.4)

Remark: We are particularly interested in the case that the argument of the exponential function
is of the form z = (a ± bi)t, where r± = a ± bi are the roots of the characteristic polynomial of
a second order linear differential equation with constant coefficients. In this case, the exponential
function has the form
e^{(a+bi)t} = Σ_{n=0}^{∞} (a + bi)ⁿ tⁿ / n!.

The infinite sum on the right-hand side of equation (A.4) makes sense, since we know how to
multiply complex numbers, hence how to compute their powers, and we know how to add complex
numbers. Furthermore, one can prove that the infinite series above converges, because the series
converges in absolute value, which implies that the series itself converges. Also important, the name
we chose for the function above, the exponential, is well chosen, because this function satisfies the
exponential property.

Theorem A.10 (Exp. Property). For all complex numbers z₁, z₂ we have e^{z₁+z₂} = e^{z₁} e^{z₂}.

Proof of Theorem A.10: A straightforward calculation using the binomial formula implies
e^{z₁+z₂} = Σ_{n=0}^{∞} (z₁ + z₂)ⁿ / n!
         = Σ_{n=0}^{∞} Σ_{k=0}^{n} C(n, k) z₁ᵏ z₂ⁿ⁻ᵏ / n!
         = Σ_{n=0}^{∞} Σ_{k=0}^{n} z₁ᵏ z₂ⁿ⁻ᵏ / (k! (n − k)!),
where we used the notation C(n, k) = n! / (k! (n − k)!) for the binomial coefficients. This double sum
is over the triangular region in the n-k plane given by
0 ≤ n < ∞,    0 ≤ k ≤ n.
We now interchange the order of the sums; the indices are then given by
0 ≤ k < ∞,    k ≤ n < ∞,
so we get
Σ_{n=0}^{∞} Σ_{k=0}^{n} z₁ᵏ z₂ⁿ⁻ᵏ / (k! (n − k)!) = Σ_{k=0}^{∞} Σ_{n=k}^{∞} z₁ᵏ z₂ⁿ⁻ᵏ / (k! (n − k)!).
If we introduce the variable m = n − k we get
Σ_{k=0}^{∞} Σ_{n=k}^{∞} z₁ᵏ z₂ⁿ⁻ᵏ / (k! (n − k)!) = Σ_{k=0}^{∞} Σ_{m=0}^{∞} z₁ᵏ z₂ᵐ / (k! m!)
                                                  = ( Σ_{k=0}^{∞} z₁ᵏ / k! ) ( Σ_{m=0}^{∞} z₂ᵐ / m! )
                                                  = e^{z₁} e^{z₂}.
So we have shown that e^{z₁+z₂} = e^{z₁} e^{z₂}. This establishes the Theorem. □
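One immediate consequence worth noting (a remark added here): taking z₂ = −z₁ gives
e^{z₁} e^{−z₁} = e⁰ = 1,
so the complex exponential is never zero and its reciprocal is (e^{z})⁻¹ = e^{−z}.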
The exponential property in the case that the exponent is z = (a + bi)t has the form
e^{(a+bi)t} = e^{at} e^{ibt}.
The first factor on the right-hand side above is a real exponential, which, for a given value of
a ≠ 0, is either a decreasing (a < 0) or an increasing (a > 0) function of t. The second factor above
is an exponential with a purely imaginary exponent. These exponentials can be summed in a closed
form.

Theorem A.11 (Euler Formula). For any real number θ we have e^{iθ} = cos(θ) + i sin(θ).

Proof of Theorem A.11: Recall that iⁿ can take only four values, 1, i, −1, −i. This result can be
summarized as
i²ⁿ = (−1)ⁿ,    i²ⁿ⁺¹ = (−1)ⁿ i.
If we split the sum in the definition of the exponential into even and odd terms in the sum index,
we get
e^{iθ} = Σ_{n=0}^{∞} iⁿ θⁿ / n! = Σ_{n=0}^{∞} i²ⁿ θ²ⁿ / (2n)! + Σ_{n=0}^{∞} i²ⁿ⁺¹ θ²ⁿ⁺¹ / (2n + 1)!,
and using the property above on the powers of i we get
Σ_{n=0}^{∞} i²ⁿ θ²ⁿ / (2n)! + Σ_{n=0}^{∞} i²ⁿ⁺¹ θ²ⁿ⁺¹ / (2n + 1)! = Σ_{n=0}^{∞} (−1)ⁿ θ²ⁿ / (2n)! + i Σ_{n=0}^{∞} (−1)ⁿ θ²ⁿ⁺¹ / (2n + 1)!.
Recall the Taylor series expansions of the sine and cosine functions,
sin(θ) = Σ_{n=0}^{∞} (−1)ⁿ θ²ⁿ⁺¹ / (2n + 1)!,    cos(θ) = Σ_{n=0}^{∞} (−1)ⁿ θ²ⁿ / (2n)!.
Therefore, we have shown that
e^{iθ} = cos(θ) + i sin(θ).
This establishes the Theorem. □
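Two quick evaluations, added as illustrations: θ = π gives e^{iπ} = cos(π) + i sin(π) = −1, and θ = π/2 gives e^{iπ/2} = i. Combined with the exponential property above, Euler's formula also gives the form needed when the exponent comes from complex roots of a characteristic polynomial,
e^{(a+bi)t} = e^{at} ( cos(bt) + i sin(bt) ).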
A.6. Complex Vectors. We can extend the notion of vectors with real components to vectors
with complex components. For example, complex-valued vectors on a plane are vectors of the form
v = ⟨a + bi, c + di⟩,
where a, b, c, d are real numbers. We can add two complex-valued vectors component-wise. So,
given
v₁ = ⟨a₁ + b₁i, c₁ + d₁i⟩,    v₂ = ⟨a₂ + b₂i, c₂ + d₂i⟩,
we have that
v₁ + v₂ = ⟨(a₁ + a₂) + (b₁ + b₂)i, (c₁ + c₂) + (d₁ + d₂)i⟩.
For example,
⟨2 + 3i, 4 + 5i⟩ + ⟨6 + 7i, 8 + 9i⟩ = ⟨8 + 10i, 12 + 14i⟩.

We can also multiply a complex-valued vector by a scalar, which now is a complex number. So,
given v = ⟨a + bi, c + di⟩ and z = z₁ + z₂i, then
z v = (z₁ + z₂i)⟨a + bi, c + di⟩ = ⟨(z₁ + z₂i)(a + bi), (z₁ + z₂i)(c + di)⟩.
For example,
i ⟨2 + 3i, 4 + 5i⟩ = ⟨2i − 3, 4i − 5⟩ = ⟨−3 + 2i, −5 + 4i⟩.
The only non-intuitive calculation with complex-valued vectors is how to find the length of a
complex vector. Recall that in the case of a real-valued vector v = ⟨a, b⟩, the length of the vector
is defined as
‖v‖ = √(v · v) = √(a² + b²),
where · is the dot product of vectors; that is, given the real-valued vectors v₁ = ⟨a₁, b₁⟩, v₂ = ⟨a₂, b₂⟩,
their dot product is the real number
v₁ · v₂ = a₁a₂ + b₁b₂.
We want to generalize the notion of length from real-valued vectors to complex-valued vectors.
Notice that the length of a vector, real or complex, must be a real number. Unfortunately, in
the case of a complex-valued vector v = ⟨a + bi, c + di⟩ the formula √(v · v) is not always a real
number; it may have a nonzero imaginary part. In order to get a real number for the length of a
complex-valued vector we define
‖v‖ = √(\overline{v} · v),
where the conjugate of a vector means conjugating all of its components, that is,
\overline{v} = \overline{⟨a + bi, c + di⟩} = ⟨a − bi, c − di⟩.
We needed to introduce the conjugate in the first vector of the formula above so that the result is
a real number. Indeed, we have the following result.

Theorem A.12. The length of a complex-valued vector v = ⟨a + bi, c + di⟩ is
‖v‖ = √(\overline{v} · v) = √(a² + b² + c² + d²).

Proof of Theorem A.12: This is a straightforward calculation,
‖v‖² = \overline{v} · v
     = ⟨a − bi, c − di⟩ · ⟨a + bi, c + di⟩
     = (a − bi)(a + bi) + (c − di)(c + di)
     = a² + b² + c² + d².
So we get the formula
‖v‖ = √(a² + b² + c² + d²).
This establishes the Theorem. □

Example A.4. Find the length of v = ⟨1 + 2i, 3 + 4i⟩.
Solution: The length of this vector is
‖v‖ = √(1² + 2² + 3² + 4²) = √30.
C

A unit vector is a vector with length one; that is, u is a unit vector iff ‖u‖ = 1. Sometimes
one needs to find a unit vector parallel to some vector v. For both real-valued and complex-valued
vectors we have the same formula: a unit vector u parallel to v ≠ 0 is
u = (1/‖v‖) v.

Example A.5. Find a unit vector in the direction of v = ⟨3 + 2i, 1 − 2i⟩.
Solution: First we check that v is not a unit vector. Indeed,
‖v‖² = \overline{v} · v
     = ⟨3 − 2i, 1 + 2i⟩ · ⟨3 + 2i, 1 − 2i⟩
     = (3 − 2i)(3 + 2i) + (1 + 2i)(1 − 2i)
     = 3² + 2² + 1² + 2²
     = 18.
Since ‖v‖ = √18, the vector v is not a unit vector. A unit vector in its direction is
u = (1/√18) ⟨3 + 2i, 1 − 2i⟩,
or, more explicitly,
u = ⟨ 3/√18 + (2/√18) i , 1/√18 − (2/√18) i ⟩.
C

Notes.
This appendix is inspired by Tom Apostol's overview of complex numbers given in his outstanding
Calculus textbook, [1], Volume I, § 9.
Bibliography

[1] T. Apostol. Calculus. John Wiley & Sons, New York, 1967. Volume I, Second edition.
[2] P. Blanchard, R. Devaney, and G. Hall. Differential Equations. Thomson, Brooks/Cole, California,
2006. 3rd edition.
[3] W. Boyce and R. DiPrima. Elementary differential equations and boundary value problems. Wiley, New
Jersey, 2012. 10th edition.
[4] R. Churchill. Operational Mathematics. McGraw-Hill, New York, 1958. Second Edition.
[5] S. Hassani. Mathematical physics. Springer, New York, 2006. Corrected second printing, 2000.
[6] G. Simmons. Differential equations with applications and historical notes. McGraw-Hill, New York,
1991. 2nd edition.
[7] S. Strogatz. Nonlinear Dynamics and Chaos. Perseus Books Publishing, Cambridge, USA, 1994. Pa-
perback printing, 2000.
[8] E. Zeidler. Nonlinear Functional Analysis and its Applications I, Fixed-Point Theorems. Springer, New
York, 1986.
[9] E. Zeidler. Applied functional analysis: applications to mathematical physics. Springer, New York,
1995.
[10] D. Zill and W. Wright. Differential equations and boundary value problems. Brooks/Cole, Boston, 2013.
8th edition.

