
SCHWARZ’ THEOREM ABOUT MIXED PARTIAL DERIVATIVES,
WITH SOME PRELIMINARY COMMENTS AND SOME REMARKS
ABOUT WORKING WITH DERIVATIVES ON A COMPUTER.

New version. December 2008.
The theorem of H. A. Schwarz gives sufficient conditions for the two “mixed” derivatives $\frac{\partial^2 f}{\partial x\,\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$ of a function $f$ to be equal. Most functions which are defined by some “nice” formula satisfy these conditions, and so they satisfy $\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$. But there exist functions which do not satisfy the conditions, and for which $\frac{\partial^2 f}{\partial x\,\partial y} \ne \frac{\partial^2 f}{\partial y\,\partial x}$ at least at some points.
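One well known textbook example of such a function is
\[
f(x, y) =
\begin{cases}
\dfrac{xy\,(x^2 - y^2)}{x^2 + y^2}, & (x, y) \ne (0, 0), \\[1.5ex]
0, & (x, y) = (0, 0),
\end{cases}
\]
for which $\frac{\partial^2 f}{\partial y\,\partial x}(0, 0) = -1$ whereas $\frac{\partial^2 f}{\partial x\,\partial y}(0, 0) = 1$.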
Before we state and prove the theorem, let us try to get some intuitive feeling
for these mixed derivatives. Let us also be explicit about the notation for them.
If we differentiate $f$ first with respect to $x$ and then with respect to $y$ we get the derivative $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$ (if it exists). It is more usually denoted by $\frac{\partial^2 f}{\partial y\,\partial x}$. Another logical notation for this same derivative is $(f'_x)'_y$ or $f''_{xy}$. Alternatively, if we differentiate first with respect to $y$ and then $x$ we get $\frac{\partial^2 f}{\partial x\,\partial y} = f''_{yx}$ (if it exists).
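For instance, for the function $f(x, y) = x^2 y^3$ (an arbitrary “nice” example chosen here only for illustration), both orders of differentiation give the same result:
\[
f'_x = 2xy^3, \qquad f''_{xy} = (f'_x)'_y = 6xy^2, \qquad f'_y = 3x^2 y^2, \qquad f''_{yx} = (f'_y)'_x = 6xy^2.
\]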
Let $f$ be a function defined in some open subset $E$ of $\mathbb{R}^2$. Suppose that the points $A = (x_0, y_0)$, $H = (x_0 + h, y_0)$, $K = (x_0, y_0 + k)$ and $B = (x_0 + h, y_0 + k)$ are all in $E$. Let us try to approximate the values of the derivatives $f'_x$, $f'_y$, $f''_{xy}$ and $f''_{yx}$ at the point $A$ in terms of the values of $f$ at $A$, $B$, $H$ and $K$. We suppose (at least for now) that $x_0$, $y_0$, $h$ and $k$ are constants. We think of $h$ and $k$ as being “small” and positive.
Here please draw a picture of the four points $A$, $B$, $H$ and $K$ and the rectangle they form, of width $h$ and height $k$. Things will happen later, at the vertices and also inside this rectangle.
Let us think a little about the meeting between computers and derivatives of functions: when we work with a computer, it cannot deal completely with functions defined on infinite subsets of $\mathbb{R}$ or $\mathbb{R}^n$. The computer has finite memory, so it can store and process the values of our function $f$ only on some finite subset of $E$.
One natural subset to use here is a “network” of points of the form $(x_0 + Nh, y_0 + Mk)$, where $N$ and $M$ are integers.
Since the computer “knows” the values of the function $f$ only at a finite set of discrete points, there is no way it can calculate its derivatives exactly. But if $h$ is small, it is reasonable to expect that $f'_x(A) = f'_x(x_0, y_0)$ might be approximately equal to $\frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$. In other words, we can write
\[
f'_x(A) \approx \frac{f(H) - f(A)}{h}. \tag{0.1}
\]
Similarly, if $k$ also is small, we can also write
\[
f'_y(A) \approx \frac{f(K) - f(A)}{k}. \tag{0.2}
\]
Furthermore, by the same reasoning,
\[
f'_x(K) \approx \frac{f(B) - f(K)}{h} \tag{0.3}
\]
and
\[
f'_y(H) \approx \frac{f(B) - f(H)}{k}. \tag{0.4}
\]
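These approximations are easy to try out on a computer. The following minimal Python sketch evaluates the right hand sides of (0.1)–(0.4); the sample function, the point $A$ and the step sizes are arbitrary choices made only for this illustration.

    # A minimal numerical illustration of the approximations (0.1)-(0.4).
    # The sample function f, the point A and the step sizes are arbitrary
    # choices made only for this illustration.

    def f(x, y):
        return x**2 * y**3  # a "nice" function, so its mixed partials agree

    x0, y0 = 1.0, 1.0       # the point A
    h, k = 0.01, 0.01       # "small" positive steps

    fA = f(x0, y0)          # f(A)
    fH = f(x0 + h, y0)      # f(H)
    fK = f(x0, y0 + k)      # f(K)
    fB = f(x0 + h, y0 + k)  # f(B)

    fx_at_A = (fH - fA) / h  # (0.1): approximates f'_x(A); exact value is 2
    fy_at_A = (fK - fA) / k  # (0.2): approximates f'_y(A); exact value is 3
    fx_at_K = (fB - fK) / h  # (0.3): approximates f'_x(K)
    fy_at_H = (fB - fH) / k  # (0.4): approximates f'_y(H)

    print(fx_at_A, fy_at_A, fx_at_K, fy_at_H)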
Now let us make some similar approximate calculations when, instead of the function $f$, we consider the function $f'_x$ or the function $f'_y$. We get, for example,
\[
(f'_x)'_y(A) \approx \frac{f'_x(K) - f'_x(A)}{k} \tag{0.5}
\]
and also,
\[
(f'_y)'_x(A) \approx \frac{f'_y(H) - f'_y(A)}{h}. \tag{0.6}
\]
Now let us be very, very optimistic, and hope that the approximations in the above calculations are so good that, if in (0.5) and (0.6) we substitute approximate values from the previous calculations, we will still get expressions which are not too far from the actual values of the derivatives $f''_{xy}$ and $f''_{yx}$ at $A$. If this optimism is justified we will have
\[
(f'_x)'_y(A) \approx \frac{f'_x(K) - f'_x(A)}{k} \approx \frac{\frac{f(B) - f(K)}{h} - \frac{f(H) - f(A)}{h}}{k}, \tag{0.7}
\]
and also
\[
(f'_y)'_x(A) \approx \frac{f'_y(H) - f'_y(A)}{h} \approx \frac{\frac{f(B) - f(H)}{k} - \frac{f(K) - f(A)}{k}}{h}. \tag{0.8}
\]
But now we can see that the right hand side of both of these expressions is the same. It equals
\[
\frac{f(A) + f(B) - f(H) - f(K)}{hk}.
\]
This suggests that maybe the two derivatives $f''_{xy}$ and $f''_{yx}$ might be equal at $A$, and maybe even at every other point where they both exist. We are now ready to check this precisely.
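That the two right hand sides coincide is a purely algebraic fact, and it is easy to check numerically. Here is a minimal Python sketch; the sample function, point and step sizes are arbitrary illustrative choices.

    # Check numerically that the right hand sides of (0.7) and (0.8) coincide,
    # and that both equal (f(A) + f(B) - f(H) - f(K)) / (h*k).
    # The sample function, point and step sizes are arbitrary illustrative choices.

    def f(x, y):
        return x**3 * y**2 + x * y

    x0, y0, h, k = 1.0, 2.0, 0.1, 0.2

    fA = f(x0, y0)
    fH = f(x0 + h, y0)
    fK = f(x0, y0 + k)
    fB = f(x0 + h, y0 + k)

    rhs_07 = ((fB - fK) / h - (fH - fA) / h) / k   # right hand side of (0.7)
    rhs_08 = ((fB - fH) / k - (fK - fA) / k) / h   # right hand side of (0.8)
    symmetric = (fA + fB - fH - fK) / (h * k)

    print(rhs_07, rhs_08, symmetric)  # all three agree up to rounding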
Remark: It should be stressed that all the preceding “calculations” are only
approximate and intuitive. If someone asks you or me “What does the symbol ≈
really mean?” how can we answer them?
As we already said above, there are some “exotic” functions which satisfy $f''_{xy} \ne f''_{yx}$ at at least some of the points where these derivatives exist. Let us now consider the big class of functions for which such problems do not arise.
THEOREM (H. A. Schwarz). Suppose that $f$ is a function of two variables such that $f''_{xy}$ and $f''_{yx}$ both exist and are continuous at some point $(x_0, y_0)$. Then $f''_{xy}(x_0, y_0) = f''_{yx}(x_0, y_0)$.
Remark: Note that the conditions of the theorem imply that $f''_{xy}$ and $f''_{yx}$ must be defined in some neighbourhood of $(x_0, y_0)$, and so this of course also implies that $f'_x$, $f'_y$ and $f$ itself must also be defined at every point of some neighbourhood of $(x_0, y_0)$. But we do not require $f'_x$ or $f'_y$ or even $f$ to be continuous.
Proof. We have to do a more precise version of the intuitive and approximate calculations which appear above. Let us look again more carefully at the two approximate formulae (0.7) and (0.8). We can rewrite them as
\[
(f'_x)'_y(x_0, y_0) \approx \frac{\frac{f(x_0 + h, y_0 + k) - f(x_0, y_0 + k)}{h} - \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}}{k}
\]
and
\[
(f'_y)'_x(x_0, y_0) \approx \frac{\frac{f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)}{k} - \frac{f(x_0, y_0 + k) - f(x_0, y_0)}{k}}{h}.
\]
It turns out that we can prove EXACT versions of these two formulae. These are
\[
(f'_x)'_y(\alpha, \beta) = \frac{\frac{f(x_0 + h, y_0 + k) - f(x_0, y_0 + k)}{h} - \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}}{k} \tag{0.9}
\]
and
\[
(f'_y)'_x(\gamma, \delta) = \frac{\frac{f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)}{k} - \frac{f(x_0, y_0 + k) - f(x_0, y_0)}{k}}{h} \tag{0.10}
\]
where $(\alpha, \beta)$ and $(\gamma, \delta)$, which now replace $(x_0, y_0)$ on the left side of the formulae, are unknown points. But each of them is somewhere inside the rectangle $AHBK$. Also the numbers $h$ and $k$ have to be sufficiently small for this rectangle to be contained in the neighbourhood of $A = (x_0, y_0)$ where the derivatives $f'_x$, $f'_y$, $f''_{xy}$ and $f''_{yx}$ all exist.
Let us now prove (0.9). The idea is to apply Lagrange’s theorem twice, first to an auxiliary function of $x$ defined by $u(x) = \frac{1}{k}\left(f(x, y_0 + k) - f(x, y_0)\right)$. In fact it is quite natural to choose to work with the function $u$ because the right side of (0.9), after regrouping the four values of $f$ in its numerator, is just
\[
\frac{u(x_0 + h) - u(x_0)}{h}.
\]
By Lagrange’s theorem there exists $\theta \in [0, 1]$ such that this expression equals $u'(x_0 + \theta h)$. But $u'(x) = \frac{1}{k}\left(f'_x(x, y_0 + k) - f'_x(x, y_0)\right)$, so now we know that the right side of (0.9) equals $\frac{1}{k}\left(f'_x(x_0 + \theta h, y_0 + k) - f'_x(x_0 + \theta h, y_0)\right) = \frac{1}{k}\left(g(y_0 + k) - g(y_0)\right)$, where now we have defined the function $g$ by $g(y) = f'_x(x_0 + \theta h, y)$. Now we apply Lagrange’s theorem a second time, this time to the function $g$, so that, for some $\theta' \in [0, 1]$, the last expression equals $g'(y_0 + \theta' k)$. But $g'(y) = (f'_x)'_y(x_0 + \theta h, y)$, and so we have proved (0.9) for $\alpha = x_0 + \theta h$ and $\beta = y_0 + \theta' k$.
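In summary, the two applications of Lagrange’s theorem give (with the same $\theta$ and $\theta'$ as above)
\[
\frac{u(x_0 + h) - u(x_0)}{h} = u'(x_0 + \theta h) = \frac{g(y_0 + k) - g(y_0)}{k} = g'(y_0 + \theta' k) = (f'_x)'_y(x_0 + \theta h,\, y_0 + \theta' k).
\]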
The next step would be to use a very similar and analogous argument to prove
(0.10). This will be left as an exercise.
We shall now use (0.9) and (0.10) in the case where $h = k = \frac{1}{n}$, where $n$ is a sufficiently large integer. Remember that the right hand sides of these two formulae are equal to each other. So we get points $(\alpha_n, \beta_n)$ and $(\gamma_n, \delta_n)$, both lying in a square of side length $\frac{1}{n}$ whose bottom left corner is $(x_0, y_0)$, such that
\[
(f'_x)'_y(\alpha_n, \beta_n) = (f'_y)'_x(\gamma_n, \delta_n). \tag{0.11}
\]
The sequences of points $\{(\alpha_n, \beta_n)\}_{n \in \mathbb{N}}$ and $\{(\gamma_n, \delta_n)\}_{n \in \mathbb{N}}$ both converge to $(x_0, y_0)$ as $n$ tends to $\infty$. So by Heine’s theorem we can take limits in (0.11) as $n \to \infty$ and, at last, we can use the continuity of $f''_{xy}$ and $f''_{yx}$ at $(x_0, y_0)$ to show that
\[
(f'_x)'_y(x_0, y_0) = \lim_{n \to \infty} (f'_x)'_y(\alpha_n, \beta_n) = \lim_{n \to \infty} (f'_y)'_x(\gamma_n, \delta_n) = (f'_y)'_x(x_0, y_0).
\]
This completes the proof. $\square$
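The limiting step can also be seen numerically. The following minimal Python sketch evaluates the common right hand side of (0.9) and (0.10) for $h = k = \frac{1}{n}$ and increasing $n$; the smooth sample function and base point are, again, arbitrary choices made only for illustration, and the computed values approach the common value of the two mixed derivatives.

    from math import exp, sin, cos

    # Evaluate the symmetric quotient (f(A)+f(B)-f(H)-f(K))/(h*k) with h = k = 1/n
    # for a few increasing values of n, and compare with the exact mixed derivative.
    # The sample function and base point are arbitrary illustrative choices.

    def f(x, y):
        return exp(x) * sin(y)

    def mixed_exact(x, y):
        # For this f, both mixed partials equal exp(x) * cos(y).
        return exp(x) * cos(y)

    x0, y0 = 0.3, 0.7

    for n in (10, 100, 1000, 10000):
        h = k = 1.0 / n
        quotient = (f(x0, y0) + f(x0 + h, y0 + k)
                    - f(x0 + h, y0) - f(x0, y0 + k)) / (h * k)
        print(n, quotient)

    print("exact value:", mixed_exact(x0, y0))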

Historical Notes: For more details see: Donald H. Trahan, The Mixed Partial Derivatives and the Double Derivative, American Mathematical Monthly, Vol. 76, No. 1 (Jan. 1969), pp. 76–77. The theorem is apparently due to H. A. Schwarz, or maybe Alexis Claude Clairaut, or both, or someone else. (Note the spelling of Schwarz!) The biography of Clairaut on the st-andrews.ac.uk site makes no mention of this theorem.
