
THERMODYNAMICS AND STATISTICAL MECHANICS

Daniel Arovas
UC San Diego

Book: Thermodynamics and Statistical Mechanics (Arovas)

This text is disseminated via the Open Education Resource (OER) LibreTexts Project (https://LibreTexts.org) and like the hundreds
of other texts available within this powerful platform, it is freely available for reading, printing and "consuming." Most, but not all,
pages in the library have licenses that may allow individuals to make changes, save, and print this book. Carefully
consult the applicable license(s) before pursuing such effects.
Instructors can adopt existing LibreTexts texts or Remix them to quickly build course-specific resources to meet the needs of their
students. Unlike traditional textbooks, LibreTexts’ web based origins allow powerful integration of advanced features and new
technologies to support learning.

The LibreTexts mission is to unite students, faculty and scholars in a cooperative effort to develop an easy-to-use online platform
for the construction, customization, and dissemination of OER content to reduce the burdens of unreasonable textbook costs to our
students and society. The LibreTexts project is a multi-institutional collaborative venture to develop the next generation of open-
access texts to improve postsecondary education at all levels of higher learning by developing an Open Access Resource
environment. The project currently consists of 14 independently operating and interconnected libraries that are constantly being
optimized by students, faculty, and outside experts to supplant conventional paper-based books. These free textbook alternatives are
organized within a central environment that is both vertically (from advanced to basic level) and horizontally (across different fields)
integrated.
The LibreTexts libraries are Powered by NICE CXOne and are supported by the Department of Education Open Textbook Pilot
Project, the UC Davis Office of the Provost, the UC Davis Library, the California State University Affordable Learning Solutions
Program, and Merlot. This material is based upon work supported by the National Science Foundation under Grant Nos. 1246120, 1525057, and 1413739.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Science Foundation nor the US Department of Education.
Have questions or comments? For information about adoptions or adaptations contact info@LibreTexts.org. More information on our
activities can be found via Facebook (https://facebook.com/Libretexts), Twitter (https://twitter.com/libretexts), or our blog
(http://Blog.Libretexts.org).
This text was compiled on 02/09/2024
TABLE OF CONTENTS
Licensing

1: Fundamentals of Probability
1.1: Statistical Properties of Random Walks
1.2: Basic Concepts in Probability Theory
1.3: Entropy and Probability
1.4: General Aspects of Probability Distributions
1.5: Bayesian Statistical Inference
1.S: Summary

2: Thermodynamics
2.2: The Zeroth Law of Thermodynamics
2.3: Mathematical Interlude - Exact and Inexact Differentials
2.4: The First Law of Thermodynamics
2.5: Heat Engines and the Second Law of Thermodynamics
2.6: The Entropy
2.7: Thermodynamic Potentials
2.8: Maxwell Relations
2.9: Equilibrium and Stability
2.10: Applications of Thermodynamics
2.11: Phase Transitions and Phase Equilibria
2.12: Entropy of Mixing and the Gibbs Paradox
2.13: Some Concepts in Thermochemistry
2.14: Appendix I- Integrating Factors
2.15: Appendix II- Legendre Transformations
2.16: Appendix III- Useful Mathematical Relations
2.S: Summary

3: Ergodicity and the Approach to Equilibrium


3.1: Modeling the Approach to Equilibrium
3.2: Phase Flows in Classical Mechanics
3.3: Irreversibility and Poincaré Recurrence
3.4: Remarks on Ergodic Theory
3.5: Thermalization of Quantum Systems
3.6: Appendices
3.S: Summary

4: Statistical Ensembles
4.1: Microcanonical Ensemble (μCE)
4.2: The Quantum Mechanical Trace
4.3: Thermal Equilibrium
4.4: Ordinary Canonical Ensemble (OCE)
4.5: Grand Canonical Ensemble (GCE)
4.6: Statistical Ensembles from Maximum Entropy
4.7: Ideal Gas Statistical Mechanics
4.8: Selected Examples

1 https://phys.libretexts.org/@go/page/25262
4.9: Statistical Mechanics of Molecular Gases
4.10: Appendix I- Additional Examples
4.S: Summary

5: Noninteracting Quantum Systems


5.1: Statistical Mechanics of Noninteracting Quantum Systems
5.2: Quantum Ideal Gases - Low Density Expansions
5.3: Entropy and Counting States
5.4: Lattice Vibrations - Einstein and Debye Models
5.5: Photon Statistics
5.6: The Ideal Bose Gas
5.7: The Ideal Fermi Gas
5.8: The Ideal Fermi Gas
5.9: Appendix I- Second Quantization
5.10: Appendix II- Ideal Bose Gas Condensation
5.11: Appendix III- Example Bose Condensation Problem
5.S: Summary

6: Classical Interacting Systems


6.1: Ising Model
6.2: Nonideal Classical Gases
6.3: Lee-Yang Theory
6.4: Liquid State Physics
6.5: Coulomb Systems - Plasmas and the Electron Gas
6.6: Polymers
6.7: Appendix I- Potts Model in One Dimension
6.S: Summary

7: Mean Field Theory of Phase Transitions


7.1: The van der Waals system
7.2: Fluids, Magnets, and the Ising Model
7.3: Mean Field Theory
7.4: Variational Density Matrix Method
7.5: Landau Theory of Phase Transitions
7.6: Mean Field Theory of Fluctuations
7.7: Global Symmetries
7.8: Ginzburg-Landau Theory
7.9: Appendix I- Equivalence of the Mean Field Descriptions
7.10: Appendix II- Additional Examples
7.S: Summary

8: Nonequilibrium Phenomena
8.1: Equilibrium, Nonequilibrium and Local Equilibrium
8.2: Boltzmann Transport Theory
8.3: Weakly Inhomogeneous Gas
8.4: Relaxation Time Approximation
8.5: Diffusion and the Lorentz model
8.6: Linearized Boltzmann Equation
8.7: The Equations of Hydrodynamics
8.8: Nonequilibrium Quantum Transport

8.9: Stochastic Processes
8.10: Appendix I- Boltzmann Equation and Collisional Invariants
8.11: Appendix II- Distributions and Functionals
8.12: Appendix III- General Linear Autonomous Inhomogeneous ODEs
8.13: Appendix IV- Correlations in the Langevin formalism
8.14: Appendix V- Kramers-Krönig Relations
8.S: Summary

9: Renormalization
9.1: The Program of Renormalization
9.2: Real Space Renormalization
9.3: Block Spin Transformation
9.4: Scaling Variables
9.5: RSRG on Hierarchical Lattices
9.S: Summary

Index

Glossary
Detailed Licensing

Licensing
A detailed breakdown of this resource's licensing can be found in Back Matter/Detailed Licensing.

1 https://phys.libretexts.org/@go/page/65411
CHAPTER OVERVIEW

1: Fundamentals of Probability
1.1: Statistical Properties of Random Walks
1.2: Basic Concepts in Probability Theory
1.3: Entropy and Probability
1.4: General Aspects of Probability Distributions
1.5: Bayesian Statistical Inference
1.S: Summary

Thumbnail: The falling ball system, which mimics a one-dimensional random walk.

This page titled 1: Fundamentals of Probability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

1.1: Statistical Properties of Random Walks
One-Dimensional Random Walk
Consider the mechanical system depicted in Fig. 1.1.1, a version of which is often sold in novelty shops. A ball is released from the
top, which cascades consecutively through N levels. The details of each ball’s motion are governed by Newton’s laws of motion.
However, to predict where any given ball will end up in the bottom row is difficult, because the ball’s trajectory depends sensitively
on its initial conditions, and may even be influenced by random vibrations of the entire apparatus. We therefore abandon all hope of
integrating the equations of motion and treat the system statistically. That is, we assume, at each level, that the ball moves to the
right with probability p and to the left with probability q = 1 − p. If there is no bias in the system, then p = q = 1/2. The position X after N steps may be written

X = \sum_{j=1}^{N} \sigma_j\ , \qquad (1.1.1)

where σ_j = +1 if the ball moves to the right at level j, and σ_j = −1 if the ball moves to the left at level j. At each level, the probability for these two outcomes is given by

P_\sigma = p\,\delta_{\sigma,+1} + q\,\delta_{\sigma,-1} = \begin{cases} p & \text{if } \sigma = +1 \\ q & \text{if } \sigma = -1\ . \end{cases} \qquad (1.1.2)

This is a normalized discrete probability distribution of the type discussed in section 4 below. The multivariate distribution for all
the steps is then
P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^{N} P(\sigma_j)\ . \qquad (1.1.3)

Our system is equivalent to a one-dimensional random walk. Imagine an inebriated pedestrian on a sidewalk taking steps to the
right and left at random. After N steps, the pedestrian’s location is X.

Figure 1.1.1 : The falling ball system, which mimics a one-dimensional random walk.
Now let’s compute the average of X:
\langle X \rangle = \Big\langle \sum_{j=1}^{N} \sigma_j \Big\rangle = N \langle \sigma \rangle = N \sum_{\sigma = \pm 1} \sigma\, P(\sigma) = N(p-q) = N(2p-1)\ . \qquad (1.1.4)

This could be identified as an equation of state for our system, as it relates a measurable quantity X to the number of steps N and the local bias p. Next, let's compute the average of X^2:

1.1.1 https://phys.libretexts.org/@go/page/18541
\langle X^2 \rangle = \sum_{j=1}^{N} \sum_{j'=1}^{N} \langle \sigma_j \sigma_{j'} \rangle = N^2 (p-q)^2 + 4Npq\ . \qquad (1.1.5)

Here we have used

\langle \sigma_j \sigma_{j'} \rangle = \delta_{jj'} + (1 - \delta_{jj'})(p-q)^2 = \begin{cases} 1 & \text{if } j = j' \\ (p-q)^2 & \text{if } j \neq j'\ . \end{cases} \qquad (1.1.6)

Note that ⟨X^2⟩ ≥ ⟨X⟩^2, which must be so because

\mathrm{Var}(X) = \langle (\Delta X)^2 \rangle \equiv \langle (X - \langle X \rangle)^2 \rangle = \langle X^2 \rangle - \langle X \rangle^2\ . \qquad (1.1.7)

This is called the variance of X. We have Var(X) = 4Npq. The root mean square deviation, ΔX_rms, is the square root of the variance: ΔX_rms = \sqrt{\mathrm{Var}(X)}. Note that the mean value of X is linearly proportional to N, but the RMS fluctuations ΔX_rms are proportional to N^{1/2}. In the limit N → ∞ then, the ratio ΔX_rms/⟨X⟩ vanishes as N^{−1/2}. This is a consequence of the central limit theorem (see §4.2 below), and we shall meet up with it again on several occasions.

We can do even better. We can find the complete probability distribution for X. It is given by

P_{N,X} = \binom{N}{N_R}\, p^{N_R}\, q^{N_L}\ ,

where N_{R/L} are the numbers of steps taken to the right/left, with N = N_R + N_L, and X = N_R − N_L. There are many independent ways to take N_R steps to the right. For example, our first N_R steps could all be to the right, and the remaining N_L = N − N_R steps would then all be to the left. Or our final N_R steps could all be to the right. For each of these independent possibilities, the probability is p^{N_R} q^{N_L}. How many possibilities are there? Elementary combinatorics tells us this number is

\binom{N}{N_R} = \frac{N!}{N_R!\, N_L!}\ .

Note that N ± X = 2N_{R/L}, so we can replace N_{R/L} = \frac{1}{2}(N \pm X). Thus,

P_{N,X} = \frac{N!}{\big(\frac{N+X}{2}\big)!\,\big(\frac{N-X}{2}\big)!}\; p^{(N+X)/2}\, q^{(N-X)/2}\ . \qquad (1.1.8)

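The moments and the exact distribution above are easy to check numerically. The following sketch (function names and parameter choices are ours, not from the text) simulates the walk and compares with ⟨X⟩ = N(2p − 1), Var(X) = 4Npq, and Equation 1.1.8:

```python
import math
import random

def walk_stats(N, p, trials=50_000, seed=1):
    """Simulate `trials` N-step walks with right-step probability p.

    Returns the sample mean and variance of X = sum_j sigma_j.
    """
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(trials):
        X = sum(1 if rng.random() < p else -1 for _ in range(N))
        total += X
        total_sq += X * X
    mean = total / trials
    var = total_sq / trials - mean * mean
    return mean, var

def P_exact(N, X, p):
    """Exact distribution P_{N,X} of Equation 1.1.8, with N_R = (N+X)/2."""
    if (N + X) % 2:            # X must have the same parity as N
        return 0.0
    NR = (N + X) // 2
    return math.comb(N, NR) * p**NR * (1 - p)**(N - NR)

N, p = 100, 0.6
mean, var = walk_stats(N, p)
# compare with <X> = N(2p-1) = 20 and Var(X) = 4Npq = 96
```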
Thermodynamic Limit
Consider the limit N → ∞ but with x ≡ X/N finite. This is analogous to what is called the thermodynamic limit in statistical
mechanics. Since N is large, x may be considered a continuous variable. We evaluate ln P_{N,X} using Stirling's asymptotic expansion
ln N ! ≃ N ln N − N + O(ln N ) . (1.1.9)

We then have
\ln P_{N,X} \simeq N \ln N - N - \tfrac{1}{2} N (1+x) \ln\!\big[\tfrac{1}{2} N (1+x)\big] + \tfrac{1}{2} N (1+x)
\qquad\qquad - \tfrac{1}{2} N (1-x) \ln\!\big[\tfrac{1}{2} N (1-x)\big] + \tfrac{1}{2} N (1-x) + \tfrac{1}{2} N (1+x) \ln p + \tfrac{1}{2} N (1-x) \ln q
\qquad = -N \Big[ \big(\tfrac{1+x}{2}\big) \ln\!\big(\tfrac{1+x}{2}\big) + \big(\tfrac{1-x}{2}\big) \ln\!\big(\tfrac{1-x}{2}\big) \Big] + N \Big[ \big(\tfrac{1+x}{2}\big) \ln p + \big(\tfrac{1-x}{2}\big) \ln q \Big]\ .

Notice that the terms proportional to N ln N have all cancelled, leaving us with a quantity which is linear in N. We may therefore write ln P_{N,X} = −N f(x) + O(ln N), where

f(x) = \Big[ \big(\tfrac{1+x}{2}\big) \ln\!\big(\tfrac{1+x}{2}\big) + \big(\tfrac{1-x}{2}\big) \ln\!\big(\tfrac{1-x}{2}\big) \Big] - \Big[ \big(\tfrac{1+x}{2}\big) \ln p + \big(\tfrac{1-x}{2}\big) \ln q \Big]\ . \qquad (1.1.10)

Figure 1.1.2: Comparison of the exact distribution of Equation 1.1.8 (red squares) with the Gaussian distribution of Equation 1.1.16 (blue line).
We have just shown that in the large N limit we may write

P_{N,X} = C\, e^{-N f(X/N)}\ , \qquad (1.1.11)

where C is a normalization constant2. Since N is by assumption large, the function P_{N,X} is dominated by the minimum (or minima) of f(x), where the probability is maximized. To find the minimum of f(x), we set f'(x) = 0, where

f'(x) = \frac{1}{2} \ln\!\Big( \frac{q}{p} \cdot \frac{1+x}{1-x} \Big)\ . \qquad (1.1.12)

Setting f'(x) = 0, we obtain

\frac{1+x}{1-x} = \frac{p}{q} \quad \Rightarrow \quad \bar{x} = p - q\ . \qquad (1.1.13)

We also have

f''(x) = \frac{1}{1 - x^2}\ , \qquad (1.1.14)

so invoking Taylor's theorem,

f(x) = f(\bar{x}) + \tfrac{1}{2} f''(\bar{x})\, (x - \bar{x})^2 + \ldots\ . \qquad (1.1.15)

Putting it all together, we have

P_{N,X} \approx C \exp\!\Big[ -\frac{N (x - \bar{x})^2}{8pq} \Big] = C \exp\!\Big[ -\frac{(X - \bar{X})^2}{8Npq} \Big]\ , \qquad (1.1.16)

where \bar{X} = ⟨X⟩ = N(p − q) = N\bar{x}. The constant C is determined by the normalization condition,



\sum_{X=-\infty}^{\infty} P_{N,X} \approx \frac{1}{2} \int_{-\infty}^{\infty}\! dX\; C \exp\!\Big[ -\frac{(X - \bar{X})^2}{8Npq} \Big] = \sqrt{2\pi N p q}\; C\ , \qquad (1.1.17)

and thus C = 1/\sqrt{2\pi N p q}. Why don't we go beyond second order in the Taylor expansion of f(x)? We will find out in §4.2 below.

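The agreement shown in Figure 1.1.2 can be reproduced directly. Here is a minimal sketch (function names are ours) comparing Equation 1.1.8 with the Gaussian of Equation 1.1.16; the relative error near the peak is O(1/N):

```python
import math

def P_exact(N, X, p):
    """Exact distribution of Equation 1.1.8 (zero for X of the wrong parity)."""
    if (N + X) % 2:
        return 0.0
    NR = (N + X) // 2
    return math.comb(N, NR) * p**NR * (1 - p)**(N - NR)

def P_gauss(N, X, p):
    """Gaussian approximation, Equation 1.1.16, with C = 1/sqrt(2 pi N p q)."""
    q = 1 - p
    Xbar = N * (p - q)
    return math.exp(-(X - Xbar) ** 2 / (8 * N * p * q)) / math.sqrt(2 * math.pi * N * p * q)

N, p = 1000, 0.5
err_peak = abs(P_exact(N, 0, p) - P_gauss(N, 0, p)) / P_gauss(N, 0, p)
```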
Entropy and energy


The function f(x) can be written as a sum of two contributions, f(x) = e(x) − s(x), where

s(x) = -\big(\tfrac{1+x}{2}\big) \ln\!\big(\tfrac{1+x}{2}\big) - \big(\tfrac{1-x}{2}\big) \ln\!\big(\tfrac{1-x}{2}\big)
e(x) = -\tfrac{1}{2} \ln(pq) - \tfrac{1}{2}\, x \ln(p/q)\ .

The function S(N, x) ≡ N s(x) is analogous to the statistical entropy of our system3. We have

S(N, x) = N s(x) = \ln \binom{N}{N_R} = \ln \binom{N}{\frac{1}{2} N (1+x)}\ .

Thus, the statistical entropy is the logarithm of the number of ways the system can be configured so as to yield the same value of X
(at fixed N). The second contribution to f(x) is the energy term. We write

E(N, x) = N e(x) = -\tfrac{1}{2} N \ln(pq) - \tfrac{1}{2} N x \ln(p/q)\ . \qquad (1.1.18)

The energy term biases the probability P_{N,X} = exp(S − E) so that low energy configurations are more probable than high energy configurations. For our system, we see that when p < q (p < 1/2), the energy is minimized by taking x as small as possible (meaning as negative as possible). The smallest possible allowed value of x = X/N is x = −1. Conversely, when p > q (p > 1/2), the energy is minimized by taking x as large as possible, which means x = 1. The average value of x, as we have computed explicitly, is x̄ = p − q = 2p − 1, which falls somewhere in between these two extremes.
In actual thermodynamic systems, entropy and energy are not dimensionless. What we have called S here is really S/k_B, which is the entropy in units of Boltzmann's constant. And what we have called E here is really E/k_B T, which is energy in units of Boltzmann's constant times temperature.

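One can verify numerically that the decomposition f(x) = e(x) − s(x) reproduces Equation 1.1.10, and that f vanishes at its minimum x̄ = p − q. A sketch (function names and the value of p are ours):

```python
import math

def s(x):
    """Entropy per step s(x), natural logarithm."""
    return -((1 + x) / 2) * math.log((1 + x) / 2) - ((1 - x) / 2) * math.log((1 - x) / 2)

def e(x, p):
    """Energy per step e(x) = -(1/2) ln(pq) - (1/2) x ln(p/q)."""
    q = 1 - p
    return -0.5 * math.log(p * q) - 0.5 * x * math.log(p / q)

def f(x, p):
    """Rate function of Equation 1.1.10."""
    q = 1 - p
    return ((1 + x) / 2) * math.log((1 + x) / (2 * p)) \
         + ((1 - x) / 2) * math.log((1 - x) / (2 * q))

p = 0.7
xbar = p - (1 - p)      # minimum of f(x) at xbar = p - q
```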
This page titled 1.1: Statistical Properties of Random Walks is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

1.2: Basic Concepts in Probability Theory
Fundamental definitions
The natural mathematical setting is set theory. Sets are generalized collections of objects. The basics: ω ∈ A is a binary relation
which says that the object ω is an element of the set A . Another binary relation is set inclusion. If all members of A are in B , we
write A ⊆ B . The union of sets A and B is denoted A ∪ B and the intersection of A and B is denoted A ∩ B . The Cartesian
product of A and B, denoted A × B, is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
Some details: If ω is not in A , we write ω ∉ A . Sets may also be objects, so we may speak of sets of sets, but typically the sets
which will concern us are simple discrete collections of numbers, such as the possible rolls of a die {1,2,3,4,5,6}, or the real
numbers ℝ, or Cartesian products such as ℝ^N. If A ⊆ B but A ≠ B, we say that A is a proper subset of B and write A ⊂ B. Another binary operation is the set difference A∖B, which contains all ω such that ω ∈ A and ω ∉ B.
In probability theory, each object ω is identified as an event. We denote by Ω the set of all events, and ∅ denotes the set of no
events. There are three basic axioms of probability:
To each set A is associated a non-negative real number P (A), which is called the probability of A .
P (Ω) = 1 .

If {A_i} is a collection of disjoint sets, that is, if A_i ∩ A_j = ∅ for all i ≠ j, then

P\Big( \bigcup_i A_i \Big) = \sum_i P(A_i)\ . \qquad (1.2.1)

From these axioms follow a number of conclusions. Among them, let ¬A = Ω∖A be the complement of A , the set of all events not
in A . Then since A ∪ ¬A = Ω , we have P (¬A) = 1 − P (A) . Taking A = Ω , we conclude P (∅) = 0 .
The meaning of P (A) is that if events ω are chosen from Ω at random, then the relative frequency for ω ∈ A approaches P (A) as
the number of trials tends to infinity. But what do we mean by ’at random’? One meaning we can impart to the notion of
randomness is that a process is random if its outcomes can be accurately modeled using the axioms of probability. This entails the
identification of a probability space Ω as well as a probability measure P . For example, in the microcanonical ensemble of
classical statistical physics, the space Ω is the collection of phase space points φ = {q_1, …, q_n, p_1, …, p_n} and the probability measure is dμ = \Sigma^{-1}(E) \prod_{i=1}^{n} dq_i\, dp_i\, \delta(E - H(q,p)), so that for A ∈ Ω the probability of A is P(A) = ∫ dμ χ_A(φ), where χ_A(φ) = 1 if φ ∈ A and χ_A(φ) = 0 if φ ∉ A is the characteristic function of A. The quantity Σ(E) is determined by normalization: ∫ dμ = 1.

Bayesian Statistics
We now introduce two additional probabilities. The joint probability for sets A and B together is written P (A ∩ B) . That is,
P (A ∩ B) = P rob[ω ∈ A and ω ∈ B] . For example, A might denote the set of all politicians, B the set of all American citizens,

and C the set of all living humans with an IQ greater than 60. Then A ∩ B would be the set of all politicians who are also
American citizens. Exercise: estimate P(A ∩ B ∩ C).
The conditional probability of B given A is written P (B|A) . We can compute the joint probability P (A ∩ B) = P (B ∩ A) in
two ways:
P (A ∩ B) = P (A|B) ⋅ P (B) = P (B|A) ⋅ P (A) . (1.2.2)

Thus,
P (B|A) P (A)
P (A|B) = , (1.2.3)
P (B)

a result known as Bayes' theorem. Now suppose the 'event space' is partitioned as {A_i}. Then

P(B) = \sum_i P(B|A_i)\, P(A_i)\ . \qquad (1.2.4)

We then have

1.2.1 https://phys.libretexts.org/@go/page/18542
P(A_i|B) = \frac{P(B|A_i)\, P(A_i)}{\sum_j P(B|A_j)\, P(A_j)}\ , \qquad (1.2.5)

a result sometimes known as the extended form of Bayes’ theorem. When the event space is a ‘binary partition’ {A, ¬A}, we have
P (B|A) P (A)
P (A|B) = . (1.2.6)
P (B|A) P (A) + P (B|¬A) P (¬A)

Note that P (A|B) + P (¬A|B) = 1 (which follows from ¬¬A = A ).


As an example, consider the following problem in epidemiology. Suppose there is a rare but highly contagious disease A which
occurs in 0.01% of the general population. Suppose further that there is a simple test for the disease which is accurate 99.99% of
the time. That is, out of every 10,000 tests, the correct answer is returned 9,999 times, and the incorrect answer is returned only
once. Now let us administer the test to a large group of people from the general population. Those who test positive are
quarantined. Question: what is the probability that someone chosen at random from the quarantine group actually has the disease?
We use Bayes’ theorem with the binary partition {A, ¬A}. Let B denote the event that an individual tests positive. Anyone from
the quarantine group has tested positive. Given this datum, we want to know the probability that that person has the disease. That
is, we want P(A|B). Applying Equation 1.2.6 with

P(A) = 0.0001\ , \quad P(\neg A) = 0.9999\ , \quad P(B|A) = 0.9999\ , \quad P(B|\neg A) = 0.0001\ , \qquad (1.2.7)

we find P(A|B) = 1/2. That is, there is only a 50% chance that someone who tested positive actually has the disease, despite the test being 99.99% accurate! The reason is that, given the rarity of the disease in the general population, the number of false positives is statistically equal to the number of true positives.
In the above example, we had P (B|A) + P (B|¬A) = 1 , but this is not generally the case. What is true instead is
P (B|A) + P (¬B|A) = 1 . Epidemiologists define the sensitivity of a binary classification test as the fraction of actual positives

which are correctly identified, and the specificity as the fraction of actual negatives that are correctly identified. Thus,
se = P (B|A) is the sensitivity and sp = P (¬B|¬A) is the specificity. We then have P (B|¬A) = 1 − P (¬B|¬A) . Therefore,

P (B|A) + P (B|¬A) = 1 + P (B|A) − P (¬B|¬A) = 1 + se − sp . (1.2.8)

In our previous example, se = sp = 0.9999, in which case the RHS above gives 1. In general, if P(A) ≡ f is the fraction of the population which is afflicted, then

P(\text{infected}\,|\,\text{positive}) = \frac{f \cdot se}{f \cdot se + (1-f) \cdot (1-sp)}\ . \qquad (1.2.9)

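The worked epidemiology example can be reproduced in a few lines. The function below (our name) implements Equation 1.2.9:

```python
def posterior(f, se, sp):
    """P(infected | positive) per Equation 1.2.9.

    f  : prevalence P(A)
    se : sensitivity P(B|A)
    sp : specificity P(not-B|not-A)
    """
    return f * se / (f * se + (1 - f) * (1 - sp))

# the example above: 0.01% prevalence, 99.99% accurate test
p_disease = posterior(f=0.0001, se=0.9999, sp=0.9999)
# -> 0.5: only half of the quarantine group is actually infected
```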
For continuous distributions, we speak of a probability density. We then have

P (y) = ∫ dx P (y|x) P (x) (1.2.10)

and
P(x|y) = \frac{P(y|x)\, P(x)}{\int dx'\, P(y|x')\, P(x')}\ . \qquad (1.2.11)

The range of integration may depend on the specific application.


The quantities P(A_i) are called the prior distribution. Clearly in order to compute P(B) or P(A_i|B) we must know the priors, and this is usually the weakest link in the Bayesian chain of reasoning. If our prior distribution is not accurate, Bayes' theorem will generate incorrect results. One approach to approximating prior probabilities P(A_i) is to derive them from a maximum entropy construction.

Random variables and their averages


Consider an abstract probability space X whose elements (events) are labeled by x. The average of any function f(x) is denoted
as Ef or ⟨f ⟩, and is defined for discrete sets as

Ef = ⟨f ⟩ = ∑ f (x) P (x) , (1.2.12)

x∈X

where P (x) is the probability of x. For continuous sets, we have

Ef = ⟨f ⟩ = ∫ dx f (x) P (x) . (1.2.13)

Typically for continuous sets we have X = ℝ or X = ℝ_{≥0}. Gardiner and other authors introduce an extra symbol, X, to denote a random variable, with X(x) = x being its value. This is formally useful but notationally confusing, so we'll avoid it here and speak loosely of x as a random variable.
When there are two random variables x ∈ X and y ∈ Y , we have Ω = X × Y is the product space, and

Ef (x, y) = ⟨f (x, y)⟩ = ∑ ∑ f (x, y) P (x, y) , (1.2.14)

x∈X y∈Y

with the obvious generalization to continuous sets. This generalizes to higher rank products, x_i ∈ X_i with i ∈ {1, …, N}. The covariance of x_i and x_j is defined as

C_{ij} \equiv \big\langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \big\rangle = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle\ . \qquad (1.2.15)

If f(x) is a convex function then one has

E f(x) \geq f(E x)\ . \qquad (1.2.16)

For continuous functions, f(x) is convex if f''(x) ≥ 0 everywhere4. If f(x) is convex on some interval [a, b] then for x_{1,2} ∈ [a, b] we must have

f\big( \lambda x_1 + (1-\lambda)\, x_2 \big) \leq \lambda f(x_1) + (1-\lambda) f(x_2)\ , \qquad (1.2.17)

where λ ∈ [0, 1]. This is easily generalized to

f\Big( \sum_n p_n x_n \Big) \leq \sum_n p_n f(x_n)\ , \qquad (1.2.18)

where p_n = P(x_n), a result known as Jensen's theorem.

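Jensen's theorem is easy to check numerically for a specific convex function. This sketch (values and weights chosen arbitrarily) uses f(x) = x², for which the gap Ef − f(Ex) is exactly the variance of x:

```python
# numeric check of Equation 1.2.18 for the convex function f(x) = x^2
xs = [-2.0, -0.5, 0.0, 1.0, 3.0]      # values x_n
ps = [0.1, 0.2, 0.3, 0.25, 0.15]      # probabilities p_n (sum to 1)

f = lambda x: x * x                   # f''(x) = 2 >= 0, so f is convex
Ex = sum(p * x for p, x in zip(ps, xs))
Ef = sum(p * f(x) for p, x in zip(ps, xs))
gap = Ef - f(Ex)                      # equals Var(x), hence non-negative
```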
This page titled 1.2: Basic Concepts in Probability Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

1.3: Entropy and Probability

Entropy and Information Theory


It was shown in the classic 1948 work of Claude Shannon that entropy is in fact a measure of information5. Suppose we observe
that a particular event occurs with probability p. We associate with this observation an amount of information I (p). The
information I (p) should satisfy certain desiderata:
Information is non-negative, I(p) ≥ 0.
If two events occur independently so their joint probability is p_1 p_2, then their information is additive, I(p_1 p_2) = I(p_1) + I(p_2).
I(p) is a continuous function of p.
There is no information content to an event which is always observed, I(1) = 0.


From these four properties, it is easy to show that the only possible function I (p) is

I (p) = −A ln p , (1.3.1)

where A is an arbitrary constant that can be absorbed into the base of the logarithm, since log_b x = ln x / ln b. We will take A = 1 and use e as the base, so I(p) = −ln p. Another common choice is to take the base of the logarithm to be 2, so I(p) = −log_2 p. In this latter case, the units of information are known as bits. Note that I(0) = ∞. This means that the observation of an extremely rare event carries a great deal of information6.
Now suppose we have a set of events labeled by an integer n which occur with probabilities {p_n}. What is the expected amount of information in N observations? Since event n occurs an average of N p_n times, and the information content in p_n is −ln p_n, we have that the average information per observation is

S = \frac{\langle I_N \rangle}{N} = -\sum_n p_n \ln p_n\ , \qquad (1.3.2)

which is known as the entropy of the distribution. Thus, maximizing S is equivalent to maximizing the information content per
observation.
Consider, for example, the information content of course grades. As we shall see, if the only constraint on the probability distribution is that of overall normalization, then S is maximized when all the probabilities p_n are equal. The binary entropy is then S = log_2 Γ, since p_n = 1/Γ. Thus, for pass/fail grading, the maximum average information per grade is −log_2(1/2) = log_2 2 = 1 bit. If only A, B, C, D, and F grades are assigned, then the maximum average information per grade is log_2 5 = 2.32 bits. If we expand the grade options to include {A+, A, A-, B+, B, B-, C+, C, C-, D, F}, then the maximum average information per grade is log_2 11 = 3.46 bits.

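The grade-entropy numbers quoted above follow from a one-line entropy function; a sketch (our function and variable names):

```python
import math

def entropy_bits(ps):
    """Shannon entropy S = -sum_n p_n log2 p_n, in bits (Equation 1.3.2, base 2)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# uniform distributions over Gamma grading options give S = log2 Gamma
pass_fail     = entropy_bits([1/2] * 2)      # 1 bit
letter_grades = entropy_bits([1/5] * 5)      # log2 5, about 2.32 bits
plus_minus    = entropy_bits([1/11] * 11)    # log2 11, about 3.46 bits
```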
Equivalently, consider, following the discussion in vol. 1 of Kardar, a random sequence {n_1, n_2, …, n_N} where each element n_j takes one of K possible values. There are then K^N such possible sequences, and to specify one of them requires log_2(K^N) = N log_2 K bits of information. However, if the value n occurs with probability p_n, then on average it will occur N_n = N p_n times in a sequence of length N, and the total number of such sequences will be

g(N) = \frac{N!}{\prod_{n=1}^{K} N_n!}\ . \qquad (1.3.3)

In general, this is far less than the total possible number K^N, and the number of bits necessary to specify one from among these g(N) possibilities is

\log_2 g(N) = \log_2 (N!) - \sum_{n=1}^{K} \log_2 (N_n!) \approx -N \sum_{n=1}^{K} p_n \log_2 p_n\ , \qquad (1.3.4)

up to terms of order unity. Here we have invoked Stirling's approximation. If the distribution is uniform, then we have p_n = 1/K for all n ∈ {1, …, K}, and log_2 g(N) = N log_2 K.

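Equation 1.3.4 can be checked against exact counting for a modest N; a sketch (the alphabet size and occupation numbers are illustrative choices of ours):

```python
import math

def bits_to_specify(counts):
    """log2 g(N) = log2(N!) - sum_n log2(N_n!), the exact form of Equation 1.3.4."""
    N = sum(counts)
    return math.log2(math.factorial(N)) - sum(math.log2(math.factorial(c)) for c in counts)

# illustrative K = 3 alphabet with N_n = N p_n occurrences of each symbol
counts = [500, 300, 200]
N = sum(counts)
ps = [c / N for c in counts]

exact  = bits_to_specify(counts)                   # log2 g(N)
approx = -N * sum(p * math.log2(p) for p in ps)    # -N sum_n p_n log2 p_n
# the two agree up to O(log N) corrections from Stirling's approximation
```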
1.3.1 https://phys.libretexts.org/@go/page/18543
Probability distributions from maximum entropy
We have shown how one can proceed from a probability distribution and compute various averages. We now seek to go in the other
direction, and determine the full probability distribution based on a knowledge of certain averages.
At first, this seems impossible. Suppose we want to reproduce the full probability distribution for an N-step random walk from knowledge of the average ⟨X⟩ = (2p − 1)N, where p is the probability of moving to the right at each step (see §1 above). The problem seems ridiculously underdetermined, since there are 2^N possible configurations for an N-step random walk: σ_j = ±1 for j = 1, …, N. Overall normalization requires

\sum_{\{\sigma_j\}} P(\sigma_1, \ldots, \sigma_N) = 1\ , \qquad (1.3.5)

but this just imposes one constraint on the 2^N probabilities P(σ_1, …, σ_N), leaving 2^N − 1 overall parameters. What principle allows us to reconstruct the full probability distribution

P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^{N} \big( p\, \delta_{\sigma_j,1} + q\, \delta_{\sigma_j,-1} \big) = \prod_{j=1}^{N} p^{(1+\sigma_j)/2}\, q^{(1-\sigma_j)/2}\ , \qquad (1.3.6)

corresponding to N independent steps?

The principle of maximum entropy


The entropy of a discrete probability distribution {p_n} is defined as

S = -\sum_n p_n \ln p_n\ , \qquad (1.3.7)

where here we take e as the base of the logarithm. The entropy may therefore be regarded as a function of the probability distribution: S = S({p_n}). One special property of the entropy is the following. Suppose we have two independent normalized distributions {p^A_a} and {p^B_b}. The joint probability for events a and b is then P_{a,b} = p^A_a p^B_b. The entropy of the joint distribution is then

S = -\sum_a \sum_b P_{a,b} \ln P_{a,b} = -\sum_a \sum_b p^A_a p^B_b \ln\!\big( p^A_a p^B_b \big) = -\sum_a \sum_b p^A_a p^B_b \big( \ln p^A_a + \ln p^B_b \big)
\quad = -\sum_a p^A_a \ln p^A_a \cdot \sum_b p^B_b - \sum_b p^B_b \ln p^B_b \cdot \sum_a p^A_a = -\sum_a p^A_a \ln p^A_a - \sum_b p^B_b \ln p^B_b
\quad = S^A + S^B\ .

Thus, the entropy of a joint distribution formed from two independent distributions is additive.
Suppose all we knew about {p_n} was that it was normalized. Then ∑_n p_n = 1. This is a constraint on the values {p_n}. Let us now extremize the entropy S with respect to the distribution {p_n}, but subject to the normalization constraint. We do this using Lagrange's method of undetermined multipliers. We define

S^*(\{p_n\}, \lambda) = -\sum_n p_n \ln p_n - \lambda \Big( \sum_n p_n - 1 \Big) \qquad (1.3.8)

and we freely extremize S^* over all its arguments. Thus, for all n we have

0 = \frac{\partial S^*}{\partial p_n} = -\big( \ln p_n + 1 + \lambda \big)
0 = \frac{\partial S^*}{\partial \lambda} = \sum_n p_n - 1\ .

From the first of these equations, we obtain p_n = e^{-(1+\lambda)}, and from the second we obtain

\sum_n p_n = e^{-(1+\lambda)} \cdot \sum_n 1 = \Gamma\, e^{-(1+\lambda)}\ , \qquad (1.3.9)

where Γ ≡ ∑_n 1 is the total number of possible events. Thus, p_n = 1/Γ, which says that all events are equally probable.

1.3.2 https://phys.libretexts.org/@go/page/18543
Now suppose we know one other piece of information, which is the average value X = ∑_n X_n p_n of some quantity. We now extremize S subject to two constraints, and so we define

S^*(\{p_n\}, \lambda_0, \lambda_1) = -\sum_n p_n \ln p_n - \lambda_0 \Big( \sum_n p_n - 1 \Big) - \lambda_1 \Big( \sum_n X_n p_n - X \Big)\ . \qquad (1.3.10)

We then have

\frac{\partial S^*}{\partial p_n} = -\big( \ln p_n + 1 + \lambda_0 + \lambda_1 X_n \big) = 0\ , \qquad (1.3.11)

which yields the two-parameter distribution

p_n = e^{-(1+\lambda_0)}\, e^{-\lambda_1 X_n}\ . \qquad (1.3.12)

To fully determine the distribution {p_n} we need to invoke the two equations ∑_n p_n = 1 and ∑_n X_n p_n = X, which come from extremizing S^* with respect to λ_0 and λ_1, respectively:

1 = e^{-(1+\lambda_0)} \sum_n e^{-\lambda_1 X_n}
X = e^{-(1+\lambda_0)} \sum_n X_n\, e^{-\lambda_1 X_n}\ .

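The pair of constraint equations above can be solved numerically for the Lagrange multipliers. A sketch (our function; λ_1 is found by bisection, and λ_0 is absorbed into the normalization Z; the example target, die faces constrained to a mean of 4.5, is an arbitrary illustration, not from the text):

```python
import math

def maxent_dist(xvals, target, lam_lo=-50.0, lam_hi=50.0, tol=1e-12):
    """Maximum-entropy distribution p_n proportional to exp(-lam1 * X_n),
    with lam1 chosen so that <X> = target (bisection; <X>(lam1) is monotone
    decreasing since d<X>/dlam1 = -Var(X))."""
    def mean(lam):
        ws = [math.exp(-lam * x) for x in xvals]
        Z = sum(ws)
        return sum(w * x for w, x in zip(ws, xvals)) / Z

    lo, hi = lam_lo, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid        # mean too large: need bigger lam1
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    ws = [math.exp(-lam * x) for x in xvals]
    Z = sum(ws)             # Z = e^{1 + lam0} fixes normalization
    return [w / Z for w in ws], lam

# illustrative example: die faces with a prescribed mean of 4.5
probs, lam = maxent_dist([1, 2, 3, 4, 5, 6], target=4.5)
```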
General formulation
The generalization to K extra pieces of information (plus normalization) is immediately apparent. We have

X^a = \sum_n X^a_n\, p_n\ , \qquad (1.3.13)

and therefore we define

S^*(\{p_n\}, \{\lambda_a\}) = -\sum_n p_n \ln p_n - \sum_{a=0}^{K} \lambda_a \Big( \sum_n X^a_n\, p_n - X^a \Big)\ , \qquad (1.3.14)

with X^{(a=0)}_n \equiv X^{(a=0)} = 1. Then the optimal distribution which extremizes S^* subject to the K + 1 constraints is

p_n = \exp\!\Big\{ -1 - \sum_{a=0}^{K} \lambda_a X^a_n \Big\} = \frac{1}{Z} \exp\!\Big\{ -\sum_{a=1}^{K} \lambda_a X^a_n \Big\}\ ,

where Z = e^{1+\lambda_0} is determined by normalization: ∑_n p_n = 1. This is a (K + 1)-parameter distribution, with {λ_0, λ_1, …, λ_K} determined by the K + 1 constraints in Equation 1.3.13.

Example
As an example, consider the random walk problem. We have two pieces of information:

\sum_{\sigma_1} \cdots \sum_{\sigma_N} P(\sigma_1, \ldots, \sigma_N) = 1
\sum_{\sigma_1} \cdots \sum_{\sigma_N} P(\sigma_1, \ldots, \sigma_N) \sum_{j=1}^{N} \sigma_j = X\ .

Here the discrete label n from §3.2 ranges over 2^N possible values, and may be written as an N-digit binary number r_N ⋯ r_1, where r_j = \frac{1}{2}(1 + \sigma_j) is 0 or 1. Extremizing S subject to these constraints, we obtain

P(\sigma_1, \ldots, \sigma_N) = C \exp\!\Big\{ -\lambda \sum_j \sigma_j \Big\} = C \prod_{j=1}^{N} e^{-\lambda \sigma_j}\ , \qquad (1.3.15)

where C ≡ e^{-(1+\lambda_0)} and λ ≡ λ_1. Normalization then requires

\mathrm{Tr}\, P \equiv \sum_{\{\sigma_j\}} P(\sigma_1, \ldots, \sigma_N) = C\, \big( e^{\lambda} + e^{-\lambda} \big)^N\ , \qquad (1.3.16)

hence C = (2 \cosh\lambda)^{-N}. We then have

P(\sigma_1, \ldots, \sigma_N) = \prod_{j=1}^{N} \frac{e^{-\lambda \sigma_j}}{e^{\lambda} + e^{-\lambda}} = \prod_{j=1}^{N} \big( p\, \delta_{\sigma_j,1} + q\, \delta_{\sigma_j,-1} \big)\ , \qquad (1.3.17)

where

p = \frac{e^{-\lambda}}{e^{\lambda} + e^{-\lambda}}\ , \qquad q = 1 - p = \frac{e^{\lambda}}{e^{\lambda} + e^{-\lambda}}\ . \qquad (1.3.18)

We then have X = (2p − 1)N, which determines p = \frac{1}{2}(1 + X/N), and we have recovered the Bernoulli distribution.
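Inverting Equation 1.3.18 for λ given a target X provides a quick numerical check that the Bernoulli distribution is recovered; a sketch (the values of N and X are illustrative):

```python
import math

N, X = 100, 40                       # prescribed average displacement <X>
p = 0.5 * (1 + X / N)                # from X = (2p - 1)N
lam = 0.5 * math.log((1 - p) / p)    # inverts p = e^{-lam}/(e^{lam} + e^{-lam})

# per-step probabilities from Equation 1.3.18
Z1 = math.exp(lam) + math.exp(-lam)
p_step = math.exp(-lam) / Z1         # probability of sigma = +1
q_step = math.exp(+lam) / Z1         # probability of sigma = -1
```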
Of course there are no miracles7, and there are an infinite family of distributions for which X = (2p − 1)N that are not Bernoulli.
For example, we could have imposed another constraint, such as E = ∑ σ σ . This would result in the distribution N −1

j=1 j j+1

N N −1
1
P (σ , … , σ ) = exp {− λ ∑σ −λ ∑ σ σ } , (1.3.19)
1 N 1 j 2 j j+1
Z
j=1 j=1

with Z(λ , λ ) determined by normalization: ∑


1 2 σ
P (σ) = 1 . This is the one-dimensional Ising chain of classical equilibrium
statistical physics. Defining the transfer matrix R with s, s = ±1 ,
′ ′
−λ (s+s )/2 −λ ss ′

=e 1
e 2

ss

−λ1 −λ2 λ2
e e
R =( )
λ2 λ1 −λ2
e e

−λ2 λ2 x −λ2 z
=e cosh λ I+e τ −e sinh λ τ ,
1 1

where τ and τ are Pauli matrices, we have that


x z

N N −1
Z = Tr (R ) , Z = Tr (R S) , (1.3.20)
ring chain

where S ,

−λ1 (s+s )/2

=e
ss

−λ
e 1
1
S =( )
λ1
1 e
x z
= cosh λ I+τ − sinh λ τ .
1 1

The appropriate case here is that of the chain, but in the thermodynamic limit N → ∞ both chain and ring yield identical results,
so we will examine here the results for the ring, which are somewhat easier to obtain. Clearly Z =ζ +ζ , where ζ are the ring +
N N
− ±

eigenvalues of R :
−−−−−−−−−−−−−−−
−λ2 −2λ2 2 2λ2
ζ =e cosh λ ± √e sinh λ +e . (1.3.21)
± 1 1

In the thermodynamic limit, the ζ eigenvalue dominates, and Z


+ ring
≃ ζ+
N
. We now have
N
∂ ln Z N sinh λ
1
X = ⟨∑σ ⟩ = − =− −−−−−−−−−−− . (1.3.22)
j
∂λ 2 4λ2
j=1 1 √ sinh λ +e
1

We also have E = −∂ ln Z/∂λ . These two equations determine the Lagrange multipliers λ (X, E, N ) and λ (X, E, N ). In the
2 1 2

thermodynamic limit, we have λ = λ (X/N , E/N ) . Thus, if we fix X/N = 2p − 1 alone, there is a continuous one-parameter
i i

family of distributions, parametrized ε = E/N , which satisfy the constraint on X.
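The transfer matrix identities are easy to verify numerically for small $N$, where the $2^N$-term configuration sum can be done by brute force. A sketch (NumPy assumed; the values of $\lambda_1$, $\lambda_2$, and $N$ are arbitrary test inputs):

```python
import numpy as np
from itertools import product

l1, l2, N = 0.3, -0.2, 8        # arbitrary test values for λ1, λ2, and N

# Transfer matrix R_{ss'} = exp(-λ1 (s+s')/2 - λ2 s s'), with s, s' = ±1
s = np.array([1.0, -1.0])
R = np.exp(-l1 * (s[:, None] + s[None, :]) / 2 - l2 * s[:, None] * s[None, :])
# Boundary matrix S_{ss'} = exp(-λ1 (s+s')/2) for the open chain
S = np.exp(-l1 * (s[:, None] + s[None, :]) / 2)

Z_ring_tm = np.trace(np.linalg.matrix_power(R, N))
Z_chain_tm = np.trace(np.linalg.matrix_power(R, N - 1) @ S)

# Eigenvalue form: Z_ring = ζ_+^N + ζ_-^N
disc = np.sqrt(np.exp(-2 * l2) * np.sinh(l1) ** 2 + np.exp(2 * l2))
zeta = np.exp(-l2) * np.cosh(l1) + np.array([1.0, -1.0]) * disc
Z_ring_eig = (zeta ** N).sum()

# Brute force over all 2^N spin configurations
Z_ring_bf = Z_chain_bf = 0.0
for config in product([1, -1], repeat=N):
    sig = np.array(config)
    bonds_open = (sig[:-1] * sig[1:]).sum()      # Σ_{j<N} σ_j σ_{j+1}
    Z_chain_bf += np.exp(-l1 * sig.sum() - l2 * bonds_open)
    Z_ring_bf += np.exp(-l1 * sig.sum() - l2 * (bonds_open + sig[-1] * sig[0]))
```

All three expressions for the ring, and both expressions for the chain, agree to machine precision.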


So what is it about the maximum entropy approach that is so compelling? Maximum entropy gives us a calculable distribution
which is consistent with maximum ignorance given our known constraints. In that sense, it is as unbiased as possible, from an
information theoretic point of view. As a starting point, a maximum entropy distribution may be improved upon, using Bayesian
methods for example (see §5.2 below).

Continuous probability distributions
Suppose we have a continuous probability density $P(\varphi)$ defined over some set $\Omega$. We have observables

$$X^a = \int_\Omega\! d\mu\; X^a(\varphi)\,P(\varphi)\ , \tag{1.3.23}$$

where $d\mu$ is the appropriate integration measure. We assume $d\mu = \prod_{j=1}^{D} d\varphi_j$, where $D$ is the dimension of $\Omega$. Then we extremize the functional

$$S^*\big[P(\varphi),\{\lambda_a\}\big] = -\int_\Omega\! d\mu\; P(\varphi)\ln P(\varphi) - \sum_{a=0}^{K}\lambda_a\Big(\int_\Omega\! d\mu\; P(\varphi)\,X^a(\varphi) - X^a\Big) \tag{1.3.24}$$

with respect to $P(\varphi)$ and with respect to $\{\lambda_a\}$. Again, $X^0(\varphi) \equiv X^0 \equiv 1$. This yields the following result:

$$\ln P(\varphi) = -1 - \sum_{a=0}^{K}\lambda_a X^a(\varphi)\ . \tag{1.3.25}$$

The $K+1$ Lagrange multipliers $\{\lambda_a\}$ are then determined from the $K+1$ constraint equations in Equation [constcont].

As an example, consider a distribution $P(x)$ over the real numbers $\mathbb{R}$. We constrain

$$\int_{-\infty}^{\infty}\!\!dx\,P(x) = 1\ ,\qquad \int_{-\infty}^{\infty}\!\!dx\,x\,P(x) = \mu\ ,\qquad \int_{-\infty}^{\infty}\!\!dx\,x^2\,P(x) = \mu^2+\sigma^2\ . \tag{1.3.26}$$

Extremizing the entropy, we then obtain

$$P(x) = C\,e^{-\lambda_1 x - \lambda_2 x^2}\ , \tag{1.3.27}$$

where $C = e^{-(1+\lambda_0)}$. We already know the answer:

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\;e^{-(x-\mu)^2/2\sigma^2}\ . \tag{1.3.28}$$

In other words, $\lambda_1 = -\mu/\sigma^2$ and $\lambda_2 = 1/2\sigma^2$, with $C = (2\pi\sigma^2)^{-1/2}\exp(-\mu^2/2\sigma^2)$.

This page titled 1.3: Entropy and Probability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

1.4: General Aspects of Probability Distributions
Discrete and Continuous Distributions
Consider a system whose possible configurations $|\,n\,\rangle$ can be labeled by a discrete variable $n\in{\mathcal C}$, where ${\mathcal C}$ is the set of possible configurations. The total number of possible configurations, which is to say the order of the set ${\mathcal C}$, may be finite or infinite. Next, consider an ensemble of such systems, and let $P_n$ denote the probability that a given random element from that ensemble is in the state (configuration) $|\,n\,\rangle$. The collection $\{P_n\}$ forms a discrete probability distribution. We assume that the distribution is normalized, meaning

$$\sum_{n\in{\mathcal C}} P_n = 1\ . \tag{1.4.1}$$

Now let $A_n$ be a quantity which takes values depending on $n$. The average of $A$ is given by

$$\langle A\rangle = \sum_{n\in{\mathcal C}} P_n\,A_n\ . \tag{1.4.2}$$

Typically, ${\mathcal C}$ is the set of integers ($\mathbb{Z}$) or some subset thereof, but it could be any countable set. As an example, consider the throw of a single six-sided die. Then $P_n = \frac{1}{6}$ for each $n\in\{1,\ldots,6\}$. Let $A_n = 0$ if $n$ is even and $1$ if $n$ is odd. Then we find $\langle A\rangle = \frac{1}{2}$, i.e. on average half the throws of the die will result in an odd number.
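This bookkeeping is a one-liner to check; a trivial sketch in Python:

```python
# Fair six-sided die: P_n = 1/6 for n = 1..6; A_n = 1 for odd n, 0 for even n
P = {n: 1 / 6 for n in range(1, 7)}
A = {n: n % 2 for n in range((1), 7)}

norm = sum(P.values())                          # normalization check: Σ_n P_n = 1
avg_A = sum(P[n] * A[n] for n in range(1, 7))   # <A> = Σ_n P_n A_n = 1/2
```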
It may be that the system's configurations are described by several discrete variables $\{n_1, n_2, n_3, \ldots\}$. We can combine these into a vector $n$ and then we write $P_n$ for the discrete distribution, with $\sum_n P_n = 1$.

Another possibility is that the system's configurations are parameterized by a collection of continuous variables, $\varphi = \{\varphi_1,\ldots,\varphi_n\}$. We write $\varphi\in\Omega$, where $\Omega$ is the phase space (or configuration space) of the system. Let $d\mu$ be a measure on this space. In general, we can write

$$d\mu = W(\varphi_1,\ldots,\varphi_n)\,d\varphi_1\,d\varphi_2\cdots d\varphi_n\ . \tag{1.4.3}$$

The phase space measure used in classical statistical mechanics gives equal weight $W$ to equal phase space volumes:

$$d\mu = C\prod_{\sigma=1}^{r} dq_\sigma\,dp_\sigma\ , \tag{1.4.4}$$

where $C$ is a constant we shall discuss later on below$^8$.


Any continuous probability distribution $P(\varphi)$ is normalized according to

$$\int_\Omega\! d\mu\; P(\varphi) = 1\ . \tag{1.4.5}$$

The average of a function $A(\varphi)$ on configuration space is then

$$\langle A\rangle = \int_\Omega\! d\mu\; P(\varphi)\,A(\varphi)\ . \tag{1.4.6}$$

For example, consider the Gaussian distribution

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\;e^{-(x-\mu)^2/2\sigma^2}\ . \tag{1.4.7}$$

From the result$^9$

$$\int_{-\infty}^{\infty}\!\!dx\;e^{-\alpha x^2}\,e^{-\beta x} = \sqrt{\frac{\pi}{\alpha}}\;e^{\beta^2/4\alpha}\ , \tag{1.4.8}$$

we see that $P(x)$ is normalized. One can then compute

1.4.1 https://phys.libretexts.org/@go/page/18544
$$\langle x\rangle = \mu\ ,\qquad \langle x^2\rangle - \langle x\rangle^2 = \sigma^2\ .$$

We call $\mu$ the mean and $\sigma$ the standard deviation of the distribution, Equation [pgauss].

The quantity $P(\varphi)$ is called the distribution or probability density. One has

$$P(\varphi)\,d\mu = \text{probability that configuration lies within volume } d\mu \text{ centered at } \varphi\ . \tag{1.4.9}$$

For example, consider the probability density $P = 1$ normalized on the interval $x\in[0,1]$. The probability that some $x$ chosen at random will be exactly $\frac{1}{2}$, say, is infinitesimal – one would have to specify each of the infinitely many digits of $x$. However, we can say that $x\in[0.45, 0.55]$ with probability $\frac{1}{10}$.

If $x$ is distributed according to $P_1(x)$, then the probability distribution on the product space $(x_1,x_2)$ is simply the product of the distributions: $P_2(x_1,x_2) = P_1(x_1)\,P_1(x_2)$. Suppose we have a function $\phi(x_1,\ldots,x_N)$. How is it distributed? Let $P(\phi)$ be the distribution for $\phi$. We then have

$$P(\phi) = \int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P_N(x_1,\ldots,x_N)\;\delta\big(\phi(x_1,\ldots,x_N)-\phi\big)$$
$$\hphantom{P(\phi)} = \int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P_1(x_1)\cdots P_1(x_N)\;\delta\big(\phi(x_1,\ldots,x_N)-\phi\big)\ ,$$

where the second line is appropriate if the $\{x_j\}$ are themselves distributed independently. Note that

$$\int_{-\infty}^{\infty}\!\!d\phi\; P(\phi) = 1\ , \tag{1.4.10}$$

so $P(\phi)$ is itself normalized.

Central limit theorem


In particular, consider the distribution function of the sum $X = \sum_{i=1}^{N} x_i$. We will be particularly interested in the case where $N$ is large. For general $N$, though, we have

$$P_N(X) = \int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P_1(x_1)\cdots P_1(x_N)\;\delta\big(x_1+x_2+\ldots+x_N - X\big)\ . \tag{1.4.11}$$

It is convenient to compute the Fourier transform$^{10}$ of $P_N(X)$:

$$\hat P_N(k) = \int_{-\infty}^{\infty}\!\!dX\; P_N(X)\,e^{-ikX}
= \int_{-\infty}^{\infty}\!\!dX\!\int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P_1(x_1)\cdots P_1(x_N)\;\delta\big(x_1+\ldots+x_N - X\big)\,e^{-ikX} = \big[\hat P_1(k)\big]^N\ ,$$

where

$$\hat P_1(k) = \int_{-\infty}^{\infty}\!\!dx\; P_1(x)\,e^{-ikx} \tag{1.4.12}$$

is the Fourier transform of the single variable distribution $P_1(x)$. The distribution $P_N(X)$ is a convolution of the individual $P_1(x_i)$ distributions. We have therefore proven that the Fourier transform of a convolution is the product of the Fourier transforms.

OK, now we can write for $\hat P_1(k)$

$$\hat P_1(k) = \int_{-\infty}^{\infty}\!\!dx\; P_1(x)\,\Big(1 - ikx - \tfrac{1}{2}k^2x^2 + \tfrac{1}{6}ik^3x^3 + \ldots\Big)
= 1 - ik\langle x\rangle - \tfrac{1}{2}k^2\langle x^2\rangle + \tfrac{1}{6}ik^3\langle x^3\rangle + \ldots\ .$$

Thus,

$$\ln\hat P_1(k) = -i\mu k - \tfrac{1}{2}\sigma^2 k^2 + \tfrac{1}{6}i\gamma^3 k^3 + \ldots\ , \tag{1.4.13}$$

where

$$\mu = \langle x\rangle\ ,\qquad \sigma^2 = \langle x^2\rangle - \langle x\rangle^2\ ,\qquad \gamma^3 = \langle x^3\rangle - 3\langle x^2\rangle\langle x\rangle + 2\langle x\rangle^3\ .$$

We can now write

$$\big[\hat P_1(k)\big]^N = e^{-iN\mu k}\,e^{-N\sigma^2 k^2/2}\,e^{iN\gamma^3 k^3/6}\cdots \tag{1.4.14}$$

Now for the inverse transform. In computing $P_N(X)$, we will expand the term $e^{iN\gamma^3 k^3/6}$ and all subsequent terms in the above product as a power series in $k$. We then have

$$P_N(X) = \int_{-\infty}^{\infty}\!\frac{dk}{2\pi}\;e^{ik(X-N\mu)}\,e^{-N\sigma^2 k^2/2}\,\Big\{1 + \tfrac{1}{6}iN\gamma^3 k^3 + \ldots\Big\}$$
$$\hphantom{P_N(X)} = \Big(1 - \frac{\gamma^3}{6}\,N\,\frac{\partial^3}{\partial X^3} + \ldots\Big)\,\frac{1}{\sqrt{2\pi N\sigma^2}}\;e^{-(X-N\mu)^2/2N\sigma^2}$$
$$\hphantom{P_N(X)} = \Big(1 - \frac{\gamma^3}{6}\,N^{-1/2}\,\frac{\partial^3}{\partial\xi^3} + \ldots\Big)\,\frac{1}{\sqrt{2\pi N\sigma^2}}\;e^{-\xi^2/2\sigma^2}\ .$$

In going from the second line to the third, we have written $X = N\mu + \sqrt{N}\,\xi$, in which case $\partial_X = N^{-1/2}\,\partial_\xi$, and the non-Gaussian terms give a subleading contribution which vanishes in the $N\to\infty$ limit. We have just proven the central limit theorem: in the limit $N\to\infty$, the distribution of a sum of $N$ independent random variables $x_i$ is a Gaussian with mean $N\mu$ and standard deviation $\sqrt{N}\,\sigma$. Our only assumptions are that the mean $\mu$ and standard deviation $\sigma$ exist for the distribution $P_1(x)$. Note that $P_1(x)$ itself need not be a Gaussian – it could be a very peculiar distribution indeed, but so long as its first and second moment exist, where the $k^{\rm th}$ moment is simply $\langle x^k\rangle$, the distribution of the sum $X = \sum_{i=1}^{N}x_i$ is a Gaussian.
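The theorem is easy to see in a simulation. The sketch below (NumPy assumed; the exponential distribution and the sample sizes are arbitrary choices) sums $N$ exponential variables, which are strongly skewed, and checks that the standardized sum has nearly Gaussian low moments; the residual skewness of the sum scales as $N^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 400, 20000
# x_i ~ Exponential(1): mean μ = 1, standard deviation σ = 1, skewness 2 (very non-Gaussian)
X = rng.exponential(1.0, size=(trials, N)).sum(axis=1)

# Standardize: ξ = (X - Nμ)/(√N σ) should approach a unit normal
xi = (X - N * 1.0) / np.sqrt(N * 1.0)
mean, std = xi.mean(), xi.std()
skew = np.mean(((xi - mean) / std) ** 3)   # theory: skewness of the sum is 2/√N = 0.1
```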

Moments and cumulants


Consider a general multivariate distribution $P(x_1,\ldots,x_N)$ and define the multivariate Fourier transform

$$\hat P(k_1,\ldots,k_N) = \int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P(x_1,\ldots,x_N)\,\exp\Big(\!-i\sum_{j=1}^{N}k_j x_j\Big)\ . \tag{1.4.15}$$

The inverse relation is

$$P(x_1,\ldots,x_N) = \int_{-\infty}^{\infty}\!\frac{dk_1}{2\pi}\cdots\int_{-\infty}^{\infty}\!\frac{dk_N}{2\pi}\;\hat P(k_1,\ldots,k_N)\,\exp\Big(\!+i\sum_{j=1}^{N}k_j x_j\Big)\ . \tag{1.4.16}$$

Acting on $\hat P(k)$, the differential operator $i\,\frac{\partial}{\partial k_i}$ brings down from the exponential a factor of $x_i$ inside the integral. Thus,

$$\bigg[\Big(i\frac{\partial}{\partial k_1}\Big)^{\!m_1}\!\cdots\Big(i\frac{\partial}{\partial k_N}\Big)^{\!m_N}\hat P(k)\bigg]_{k=0} = \big\langle x_1^{m_1}\cdots x_N^{m_N}\big\rangle\ . \tag{1.4.17}$$

Similarly, we can reconstruct the distribution from its moments, viz.

$$\hat P(k) = \sum_{m_1=0}^{\infty}\cdots\sum_{m_N=0}^{\infty}\frac{(-ik_1)^{m_1}}{m_1!}\cdots\frac{(-ik_N)^{m_N}}{m_N!}\,\big\langle x_1^{m_1}\cdots x_N^{m_N}\big\rangle\ . \tag{1.4.18}$$

The cumulants $\langle\langle x_1^{m_1}\cdots x_N^{m_N}\rangle\rangle$ are defined by the Taylor expansion of $\ln\hat P(k)$:

$$\ln\hat P(k) = \sum_{m_1=0}^{\infty}\cdots\sum_{m_N=0}^{\infty}\frac{(-ik_1)^{m_1}}{m_1!}\cdots\frac{(-ik_N)^{m_N}}{m_N!}\,\big\langle\!\big\langle x_1^{m_1}\cdots x_N^{m_N}\big\rangle\!\big\rangle\ . \tag{1.4.19}$$

There is no general form for the cumulants. It is straightforward to derive the following low order results:

$$\langle\langle x_i\rangle\rangle = \langle x_i\rangle$$
$$\langle\langle x_i x_j\rangle\rangle = \langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle$$
$$\langle\langle x_i x_j x_k\rangle\rangle = \langle x_i x_j x_k\rangle - \langle x_i x_j\rangle\langle x_k\rangle - \langle x_j x_k\rangle\langle x_i\rangle - \langle x_k x_i\rangle\langle x_j\rangle + 2\langle x_i\rangle\langle x_j\rangle\langle x_k\rangle\ .$$
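The single-variable specialization of these formulas can be checked against a case where the cumulants are known in closed form: for a Bernoulli($p$) variable the first three cumulants are $p$, $p(1-p)$, and $p(1-p)(1-2p)$. A sketch (the choice $p = 0.3$ is arbitrary):

```python
import numpy as np

p = 0.3
x = np.array([0.0, 1.0])
w = np.array([1 - p, p])                     # Bernoulli(p) pmf

m1 = (w * x).sum()                           # <x>
m2 = (w * x**2).sum()                        # <x^2>
m3 = (w * x**3).sum()                        # <x^3>

c1 = m1                                      # <<x>>
c2 = m2 - m1**2                              # <<x^2>>  (the variance)
c3 = m3 - 3 * m2 * m1 + 2 * m1**3            # <<x^3>>  (third cumulant)
```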

Multidimensional Gaussian integral


Consider the multivariable Gaussian distribution,

$$P(x) \equiv \bigg(\frac{\det A}{(2\pi)^n}\bigg)^{\!1/2}\exp\Big(\!-\tfrac{1}{2}\,x_i A_{ij} x_j\Big)\ , \tag{1.4.20}$$

where $A$ is a positive definite matrix of rank $n$. A mathematical result which is extremely important throughout physics is the following:

$$Z(b) = \bigg(\frac{\det A}{(2\pi)^n}\bigg)^{\!1/2}\int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_n\,\exp\Big(\!-\tfrac{1}{2}\,x_i A_{ij} x_j + b_i x_i\Big) = \exp\Big(\tfrac{1}{2}\,b_i A^{-1}_{ij} b_j\Big)\ . \tag{1.4.21}$$

Here, the vector $b = (b_1,\ldots,b_n)$ is identified as a source. Since $Z(0) = 1$, we have that the distribution $P(x)$ is normalized. Now consider averages of the form

$$\langle x_{j_1}\!\cdots x_{j_{2k}}\rangle = \int\! d^n\!x\; P(x)\,x_{j_1}\!\cdots x_{j_{2k}} = \frac{\partial^{2k} Z(b)}{\partial b_{j_1}\cdots\partial b_{j_{2k}}}\bigg|_{b=0} = \sum_{\rm contractions} A^{-1}_{j_{\sigma(1)} j_{\sigma(2)}}\cdots A^{-1}_{j_{\sigma(2k-1)} j_{\sigma(2k)}}\ .$$

The sum in the last term is over all contractions of the indices $\{j_1,\ldots,j_{2k}\}$. A contraction is an arrangement of the $2k$ indices into $k$ pairs. There are $C_{2k} = (2k)!/2^k k!$ possible such contractions. To obtain this result for $C_{2k}$, we start with the first index and then find a mate among the remaining $2k-1$ indices. Then we choose the next unpaired index and find a mate among the remaining $2k-3$ indices. Proceeding in this manner, we have

$$C_{2k} = (2k-1)\cdot(2k-3)\cdots 3\cdot 1 = \frac{(2k)!}{2^k\,k!}\ . \tag{1.4.22}$$

Equivalently, we can take all possible permutations of the $2k$ indices, and then divide by $2^k k!$ since permutation within a given pair results in the same contraction and permutation among the $k$ pairs results in the same contraction. For example, for $k=2$, we have $C_4 = 3$, and

$$\langle x_{j_1} x_{j_2} x_{j_3} x_{j_4}\rangle = A^{-1}_{j_1 j_2}A^{-1}_{j_3 j_4} + A^{-1}_{j_1 j_3}A^{-1}_{j_2 j_4} + A^{-1}_{j_1 j_4}A^{-1}_{j_2 j_3}\ . \tag{1.4.23}$$

If we define $b_i = ik_i$, we have

$$\hat P(k) = \exp\Big(\!-\tfrac{1}{2}\,k_i A^{-1}_{ij} k_j\Big)\ , \tag{1.4.24}$$

from which we read off the cumulants $\langle\langle x_i x_j\rangle\rangle = A^{-1}_{ij}$, with all higher order cumulants vanishing.
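The counting of contractions can be verified by enumerating the pairings directly; a short sketch:

```python
from math import factorial

def pairings(idx):
    """Return all ways of arranging the indices in idx into unordered pairs."""
    if not idx:
        return [[]]
    first, rest = idx[0], idx[1:]
    out = []
    for i, mate in enumerate(rest):              # pair the first index with each mate
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            out.append([(first, mate)] + tail)
    return out

# C_{2k} = (2k-1)!! = (2k)!/(2^k k!): 1, 3, 15, 105, ...
counts = [len(pairings(tuple(range(2 * k)))) for k in (1, 2, 3, 4)]
```

For $k=2$ the three pairings are exactly the three contractions appearing in Equation (1.4.23).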

This page titled 1.4: General Aspects of Probability Distributions is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

1.5: Bayesian Statistical Inference
Frequentists and Bayesians
The field of statistical inference is roughly divided into two schools of practice: frequentism and Bayesianism. You can find
several articles on the web discussing the differences in these two approaches. In both cases we would like to model observable
data x by a distribution. The distribution in general depends on one or more parameters θ . The basic worldviews of the two
approaches are as follows:

Frequentism: Data x are a random sample drawn from an infinite pool at some
frequency. The underlying parameters θ , which are to be estimated, remain fixed during
this process. There is no information prior to the model specification. The experimental
conditions under which the data are collected are presumed to be controlled and
repeatable. Results are generally expressed in terms of confidence intervals and
confidence levels, obtained via statistical hypothesis testing. Probabilities have meaning
only for data yet to be collected. Calculations generally are computationally
straightforward.
Bayesianism: The only data x which matter are those which have been observed. The
parameters θ are unknown and described probabilistically using a prior distribution,
which is generally based on some available information but which also may be at least
partially subjective. The priors are then to be updated based on observed data x. Results
are expressed in terms of posterior distributions and credible intervals. Calculations can
be computationally intensive.
In essence, frequentists say the data are random and the parameters are fixed, while Bayesians say the data are fixed and the parameters are random$^{11}$. Overall, frequentism has dominated over the past several hundred years, but Bayesianism has been
coming on strong of late, and many physicists seem naturally drawn to the Bayesian perspective.

Updating Bayesian priors


Given data $D$ and a hypothesis $H$, Bayes' theorem tells us

$$P(H|D) = \frac{P(D|H)\,P(H)}{P(D)}\ . \tag{1.5.1}$$

Typically the data is in the form of a set of values $x = \{x_1,\ldots,x_N\}$, and the hypothesis in the form of a set of parameters $\theta = \{\theta_1,\ldots,\theta_K\}$. It is notationally helpful to express distributions of $x$ and distributions of $x$ conditioned on $\theta$ using the symbol $f$, and distributions of $\theta$ and distributions of $\theta$ conditioned on $x$ using the symbol $\pi$, rather than using the symbol $P$ everywhere. We then have

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\int_\Theta\! d\theta'\,f(x|\theta')\,\pi(\theta')}\ , \tag{1.5.2}$$

where $\Theta\ni\theta$ is the space of parameters. Note that $\int_\Theta\! d\theta\;\pi(\theta|x) = 1$. The denominator of the RHS is simply $f(x)$, which is independent of $\theta$, hence $\pi(\theta|x) \propto f(x|\theta)\,\pi(\theta)$. We call $\pi(\theta)$ the prior for $\theta$, $f(x|\theta)$ the likelihood of $x$ given $\theta$, and $\pi(\theta|x)$ the posterior for $\theta$ given $x$. The idea here is that while our initial guess at the $\theta$ distribution is given by the prior $\pi(\theta)$, after taking data, we should update this distribution to the posterior $\pi(\theta|x)$. The likelihood $f(x|\theta)$ is entailed by our model for the phenomenon which produces the data. We can use the posterior to find the distribution of new data points $y$, called the posterior predictive distribution,

$$f(y|x) = \int_\Theta\! d\theta\; f(y|\theta)\,\pi(\theta|x)\ . \tag{1.5.3}$$
1.5.1 https://phys.libretexts.org/@go/page/18545
This is the update of the prior predictive distribution,

$$f(x) = \int_\Theta\! d\theta\; f(x|\theta)\,\pi(\theta)\ . \tag{1.5.4}$$

Example 1.5.1 : Coin Flipping


Consider a model of coin flipping based on a standard Bernoulli distribution, where $\theta\in[0,1]$ is the probability for heads ($x=1$) and $1-\theta$ the probability for tails ($x=0$). That is,

$$f(x_1,\ldots,x_N|\theta) = \prod_{j=1}^{N}\big[(1-\theta)\,\delta_{x_j,0} + \theta\,\delta_{x_j,1}\big] = \theta^X(1-\theta)^{N-X}\ ,$$

where $X = \sum_{j=1}^{N}x_j$ is the observed total number of heads, and $N-X$ the corresponding number of tails. We now need a prior $\pi(\theta)$. We choose the Beta distribution,

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\ , \tag{1.5.5}$$

where $B(\alpha,\beta) = \Gamma(\alpha)\,\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the Beta function. One can check that $\pi(\theta)$ is normalized on the unit interval: $\int_0^1\! d\theta\;\pi(\theta) = 1$ for all positive $\alpha,\beta$. Even if we limit ourselves to this form of the prior, different Bayesians might bring different assumptions about the values of $\alpha$ and $\beta$. Note that if we choose $\alpha=\beta=1$, the prior distribution for $\theta$ is flat, with $\pi(\theta)=1$.

We now compute the posterior distribution for $\theta$:

$$\pi(\theta|x_1,\ldots,x_N) = \frac{f(x_1,\ldots,x_N|\theta)\,\pi(\theta)}{\int_0^1\! d\theta'\,f(x_1,\ldots,x_N|\theta')\,\pi(\theta')} = \frac{\theta^{X+\alpha-1}(1-\theta)^{N-X+\beta-1}}{B(X+\alpha,\,N-X+\beta)}\ . \tag{1.5.6}$$

Thus, we retain the form of the Beta distribution, but with updated parameters,

$$\alpha' = X+\alpha\ ,\qquad \beta' = N-X+\beta\ .$$

The fact that the functional form of the prior is retained by the posterior is generally not the case in Bayesian updating. We can also compute the prior predictive,

$$f(x_1,\ldots,x_N) = \int_0^1\! d\theta\; f(x_1,\ldots,x_N|\theta)\,\pi(\theta)
= \frac{1}{B(\alpha,\beta)}\int_0^1\! d\theta\;\theta^{X+\alpha-1}(1-\theta)^{N-X+\beta-1} = \frac{B(X+\alpha,\,N-X+\beta)}{B(\alpha,\beta)}\ .$$

The posterior predictive is then

$$f(y_1,\ldots,y_M|x_1,\ldots,x_N) = \int_0^1\! d\theta\; f(y_1,\ldots,y_M|\theta)\,\pi(\theta|x_1,\ldots,x_N)$$
$$= \frac{1}{B(X+\alpha,\,N-X+\beta)}\int_0^1\! d\theta\;\theta^{X+Y+\alpha-1}(1-\theta)^{N-X+M-Y+\beta-1} = \frac{B(X+Y+\alpha,\,N-X+M-Y+\beta)}{B(X+\alpha,\,N-X+\beta)}\ .$$
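These updates are one-liners in code. A sketch using only the standard library (the numbers are illustrative); the posterior predictive here is for a specific sequence of $M$ new flips containing $Y$ heads, as in the formula above, and for $M=Y=1$ it reduces to the familiar $(X+\alpha)/(N+\alpha+\beta)$:

```python
from math import lgamma, exp

def log_beta(a, b):
    """log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def update(alpha, beta, X, N):
    """Beta prior (alpha, beta) -> posterior parameters after X heads in N flips."""
    return X + alpha, N - X + beta

def posterior_predictive(alpha, beta, X, N, Y, M):
    """f(y|x) for a particular new sequence of M flips containing Y heads."""
    a1, b1 = update(alpha, beta, X, N)
    return exp(log_beta(a1 + Y, b1 + M - Y) - log_beta(a1, b1))

# Flat prior (alpha = beta = 1), then observe X = 7 heads in N = 10 flips
a1, b1 = update(1, 1, 7, 10)                            # -> (8, 4)
p_heads_next = posterior_predictive(1, 1, 7, 10, 1, 1)  # (X+alpha)/(N+alpha+beta) = 8/12
```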

Hyperparameters and conjugate priors
In Example 1.5.1, $\theta$ is a parameter of the Bernoulli distribution, the likelihood, while the quantities $\alpha$ and $\beta$ are hyperparameters which enter the prior $\pi(\theta)$. Accordingly, we could have written $\pi(\theta|\alpha,\beta)$ for the prior. We then have for the posterior

$$\pi(\theta|x,\alpha) = \frac{f(x|\theta)\,\pi(\theta|\alpha)}{\int_\Theta\! d\theta'\,f(x|\theta')\,\pi(\theta'|\alpha)}\ , \tag{1.5.7}$$

replacing Equation [BayesPost], where $\alpha\in A$ is the vector of hyperparameters. The hyperparameters can also be distributed, according to a hyperprior $\rho(\alpha)$, and the hyperpriors can further be parameterized by hyperhyperparameters, which can have their own distributions, ad nauseum.

What use is all this? We've already seen a compelling example: when the posterior is of the same form as the prior, the Bayesian update can be viewed as an automorphism of the hyperparameter space $A$, i.e. one set of hyperparameters $\alpha$ is mapped to a new set of hyperparameters $\tilde\alpha$.

Definition: A parametric family of distributions ${\mathcal P} = \big\{\pi(\theta|\alpha)\,\big|\,\theta\in\Theta,\ \alpha\in A\big\}$ is called a conjugate family for a family of distributions $\big\{f(x|\theta)\,\big|\,x\in{\mathcal X},\ \theta\in\Theta\big\}$ if, for all $x\in{\mathcal X}$ and $\alpha\in A$,

$$\pi(\theta|x,\alpha) \equiv \frac{f(x|\theta)\,\pi(\theta|\alpha)}{\int_\Theta\! d\theta'\,f(x|\theta')\,\pi(\theta'|\alpha)} \in {\mathcal P}\ . \tag{1.5.8}$$

That is, $\pi(\theta|x,\alpha) = \pi(\theta|\tilde\alpha)$ for some $\tilde\alpha\in A$, with $\tilde\alpha = \tilde\alpha(\alpha,x)$.


As an example, consider the conjugate Bayesian analysis of the Gaussian distribution. We assume a likelihood

$$f(x|u,s) = (2\pi s^2)^{-N/2}\exp\Big\{\!-\frac{1}{2s^2}\sum_{j=1}^{N}(x_j-u)^2\Big\}\ . \tag{1.5.9}$$

The parameters here are $\theta = \{u,s\}$. Now consider the prior distribution

$$\pi(u,s|\mu_0,\sigma_0) = (2\pi\sigma_0^2)^{-1/2}\exp\Big\{\!-\frac{(u-\mu_0)^2}{2\sigma_0^2}\Big\}\ . \tag{1.5.10}$$

Note that the prior distribution is independent of the parameter $s$ and only depends on $u$ and the hyperparameters $\alpha = (\mu_0,\sigma_0)$. We now compute the posterior:

$$\pi(u,s|x,\mu_0,\sigma_0) \propto f(x|u,s)\,\pi(u,s|\mu_0,\sigma_0)
= \exp\Big\{\!-\Big(\frac{1}{2\sigma_0^2}+\frac{N}{2s^2}\Big)u^2 + \Big(\frac{\mu_0}{\sigma_0^2}+\frac{N\langle x\rangle}{s^2}\Big)u - \Big(\frac{\mu_0^2}{2\sigma_0^2}+\frac{N\langle x^2\rangle}{2s^2}\Big)\Big\}\ ,$$

with $\langle x\rangle = \frac{1}{N}\sum_{j=1}^{N}x_j$ and $\langle x^2\rangle = \frac{1}{N}\sum_{j=1}^{N}x_j^2$. This is also a Gaussian distribution for $u$, and after supplying the appropriate normalization one finds

$$\pi(u,s|x,\mu_0,\sigma_0) = (2\pi\sigma_1^2)^{-1/2}\exp\Big\{\!-\frac{(u-\mu_1)^2}{2\sigma_1^2}\Big\}\ , \tag{1.5.11}$$

with

$$\mu_1 = \mu_0 + \frac{N\big(\langle x\rangle-\mu_0\big)\,\sigma_0^2}{s^2+N\sigma_0^2}\ ,\qquad
\sigma_1^2 = \frac{s^2\,\sigma_0^2}{s^2+N\sigma_0^2}\ .$$

Thus, the posterior is among the same family as the prior, and we have derived the update rule for the hyperparameters $(\mu_0,\sigma_0)\to(\mu_1,\sigma_1)$. Note that $\sigma_1 < \sigma_0$, so the updated Gaussian prior is sharper than the original. The updated mean $\mu_1$ shifts in the direction of $\langle x\rangle$ obtained from the data set.
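The update rule is simple to code and to sanity-check: the posterior precision should be the sum of prior and data precisions, $1/\sigma_1^2 = 1/\sigma_0^2 + N/s^2$. A sketch (the numbers are illustrative):

```python
def gaussian_update(mu0, sig0, s, xs):
    """Conjugate update (mu0, sig0) -> (mu1, sig1^2) for a Gaussian prior on the
    mean u, given a likelihood of known width s and data xs."""
    N = len(xs)
    xbar = sum(xs) / N
    denom = s**2 + N * sig0**2
    mu1 = mu0 + N * (xbar - mu0) * sig0**2 / denom   # mean shifts toward the data
    sig1_sq = s**2 * sig0**2 / denom                 # prior sharpens
    return mu1, sig1_sq

mu1, sig1_sq = gaussian_update(mu0=0.0, sig0=1.0, s=2.0, xs=[1.0, 2.0, 3.0])
# Precisions add: 1/sig1_sq = 1/sig0^2 + N/s^2, and mu1 lies between mu0 and xbar
```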

The problem with priors


We might think that for the coin flipping problem, the flat prior $\pi(\theta)=1$ is an appropriate initial one, since it does not privilege any value of $\theta$. This prior therefore seems 'objective' or 'unbiased', also called 'uninformative'. But suppose we make a change of variables, mapping the interval $\theta\in[0,1]$ to the entire real line according to $\zeta = \ln\big[\theta/(1-\theta)\big]$. In terms of the new parameter $\zeta$, we write the prior as $\tilde\pi(\zeta)$. Clearly $\pi(\theta)\,d\theta = \tilde\pi(\zeta)\,d\zeta$, so $\tilde\pi(\zeta) = \pi(\theta)\,d\theta/d\zeta$. For our example, we find $\tilde\pi(\zeta) = \frac{1}{4}\,{\rm sech}^2(\zeta/2)$, which is not flat. Thus what was uninformative in terms of $\theta$ has become very informative in terms of the new parameter $\zeta$. Is there any truly unbiased way of selecting a Bayesian prior?
One approach, advocated by E. T. Jaynes, is to choose the prior distribution π(θ) according to the principle of maximum entropy.
For continuous parameter spaces, we must first define a parameter space metric so as to be able to ’count’ the number of different
parameter states. The entropy of a distribution π(θ) is then dependent on this metric: S = − ∫ dμ(θ) π(θ) ln π(θ) .
Another approach, due to Jeffreys, is to derive a parameterization-independent prior from the likelihood $f(x|\theta)$ using the so-called Fisher information matrix,

$$I_{ij}(\theta) = -E_\theta\bigg(\frac{\partial^2\ln f(x|\theta)}{\partial\theta_i\,\partial\theta_j}\bigg) = -\int\! dx\; f(x|\theta)\,\frac{\partial^2\ln f(x|\theta)}{\partial\theta_i\,\partial\theta_j}\ .$$

The Jeffreys prior $\pi_{\rm J}(\theta)$ is defined as

$$\pi_{\rm J}(\theta) \propto \sqrt{\det I(\theta)}\ .$$

One can check that the Jeffreys prior is invariant under reparameterization. As an example, consider the Bernoulli process, for which $\ln f(x|\theta) = X\ln\theta + (N-X)\ln(1-\theta)$, where $X = \sum_{j=1}^{N}x_j$. Then

$$-\frac{d^2\ln f(x|\theta)}{d\theta^2} = \frac{X}{\theta^2} + \frac{N-X}{(1-\theta)^2}\ , \tag{1.5.12}$$

and since $E_\theta X = N\theta$, we have

$$I(\theta) = \frac{N}{\theta(1-\theta)}\qquad\Rightarrow\qquad \pi_{\rm J}(\theta) = \frac{1}{\pi}\,\frac{1}{\sqrt{\theta(1-\theta)}}\ ,$$

which felicitously corresponds to a Beta distribution with $\alpha = \beta = \frac{1}{2}$. In this example the Jeffreys prior turned out to be a conjugate prior, but in general this is not the case.
We can try to implement the Jeffreys procedure for a two-parameter family where each $x_j$ is normally distributed with mean $\mu$ and standard deviation $\sigma$. Let the parameters be $(\theta_1,\theta_2) = (\mu,\sigma)$. Then

$$-\ln f(x|\theta) = N\ln\sqrt{2\pi} + N\ln\sigma + \frac{1}{2\sigma^2}\sum_{j=1}^{N}(x_j-\mu)^2\ , \tag{1.5.13}$$

and the Fisher information matrix is

$$I(\theta) = -\frac{\partial^2\ln f(x|\theta)}{\partial\theta_i\,\partial\theta_j} =
\begin{pmatrix} N\sigma^{-2} & 2\sigma^{-3}\sum_j(x_j-\mu) \\[4pt]
2\sigma^{-3}\sum_j(x_j-\mu) & -N\sigma^{-2} + 3\sigma^{-4}\sum_j(x_j-\mu)^2 \end{pmatrix}\ . \tag{1.5.14}$$

Taking the expectation value, we have $E\,(x_j-\mu) = 0$ and $E\,(x_j-\mu)^2 = \sigma^2$, hence

$$E\,I(\theta) = \begin{pmatrix} N\sigma^{-2} & 0 \\ 0 & 2N\sigma^{-2}\end{pmatrix} \tag{1.5.15}$$

and the Jeffreys prior is $\pi_{\rm J}(\mu,\sigma)\propto\sigma^{-2}$. This is problematic because if we choose a flat metric on the $(\mu,\sigma)$ upper half plane, the Jeffreys prior is not normalizable. Note also that the Jeffreys prior no longer resembles a Gaussian, and hence is not a conjugate prior.

1. The exception is the unbiased case $p = q = \frac{1}{2}$, where $\langle X\rangle = 0$.↩
2. The origin of $C$ lies in the ${\mathcal O}(\ln N)$ and ${\mathcal O}(N^0)$ terms in the asymptotic expansion of $\ln N!$. We have ignored these terms here. Accounting for them carefully reproduces the correct value of $C$ in Equation [normC].↩
3. The function s(x) is the specific entropy.↩
4. A function g(x) is concave if −g(x) is convex.↩
5. See ‘An Introduction to Information Theory and Entropy’ by T. Carter, Santa Fe Complex Systems Summer School, June 2011. Available online at astarte.csustan.edu/~tom/SFI-CSSS/info-theory/info-lec.pdf.↩
6. My colleague John McGreevy refers to $I(p)$ as the surprise of observing an event which occurs with probability $p$. I like this
very much.↩
7. See §10 of An Enquiry Concerning Human Understanding by David Hume (1748).↩
8. Such a measure is invariant with respect to canonical transformations, which are the broad class of transformations among
coordinates and momenta which leave Hamilton’s equations of motion invariant, and which preserve phase space volumes
under Hamiltonian evolution. For this reason dμ is called an invariant phase space measure.↩
9. Memorize this!↩
10. Jean Baptiste Joseph Fourier (1768-1830) had an illustrious career. The son of a tailor, and orphaned at age eight, Fourier’s
ignoble status rendered him ineligible to receive a commission in the scientific corps of the French army. A Benedictine
minister at the École Royale Militaire of Auxerre remarked, "Fourier, not being noble, could not enter the artillery, although he
were a second Newton." Fourier prepared for the priesthood but his affinity for mathematics proved overwhelming, and so he
left the abbey and soon thereafter accepted a military lectureship position. Despite his initial support for revolution in France, in
1794 Fourier ran afoul of a rival sect while on a trip to Orleans and was arrested and very nearly guillotined. Fortunately the
Reign of Terror ended soon after the death of Robespierre, and Fourier was released. He went on Napoleon Bonaparte’s 1798
expedition to Egypt, where he was appointed governor of Lower Egypt. His organizational skills impressed Napoleon, and upon
return to France he was appointed to a position of prefect in Grenoble. It was in Grenoble that Fourier performed his landmark
studies of heat, and his famous work on partial differential equations and Fourier series. It seems that Fourier’s fascination with
heat began in Egypt, where he developed an appreciation of desert climate. His fascination developed into an obsession, and he
became convinced that heat could promote a healthy body. He would cover himself in blankets, like a mummy, in his heated
apartment, even during the middle of summer. On May 4, 1830, Fourier, so arrayed, tripped and fell down a flight of stairs. This
aggravated a developing heart condition, which he refused to treat with anything other than more heat. Two weeks later, he
died. Fourier’s is one of the 72 names of scientists, engineers and other luminaries which are engraved on the Eiffel Tower.↩
11. "A frequentist is a person whose long-run ambition is to be wrong 5% of the time. A Bayesian is one who, vaguely expecting a
horse, and catching glimpse of a donkey, strongly believes he has seen a mule." – Charles Annis.↩

This page titled 1.5: Bayesian Statistical Inference is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

1.S: Summary
References
C. Gardiner, Stochastic Methods (4th edition, Springer-Verlag, 2010) Very clear and complete text on stochastic methods with many applications.
J. M. Bernardo and A. F. M. Smith, Bayesian Theory (Wiley, 2000) A thorough textbook on Bayesian methods.
D. Williams, Weighing the Odds: A Course in Probability and Statistics (Cambridge, 2001) A good overall statistics textbook,
according to a mathematician colleague.
E. T. Jaynes, Probability Theory (Cambridge, 2007) An extensive, descriptive, and highly opinionated presentation, with a
strongly Bayesian approach.
A. N. Kolmogorov, Foundations of the Theory of Probability (Chelsea, 1956) The Urtext of mathematical probability theory.

Summary
∙ Discrete distributions: Let $n$ label the distinct possible outcomes of a discrete random process, and let $p_n$ be the probability for outcome $n$. Let $A$ be a quantity which takes values which depend on $n$, with $A_n$ being the value of $A$ under the outcome $n$. Then the expected value of $A$ is $\langle A\rangle = \sum_n p_n\,A_n$, where the sum is over all possible allowed values of $n$. We must have that the distribution is normalized, $\langle 1\rangle = \sum_n p_n = 1$.

∙ Continuous distributions: When the random variable $\varphi$ takes a continuum of values, we define the probability density $P(\varphi)$ to be such that $P(\varphi)\,d\mu$ is the probability for the outcome to lie within a differential volume $d\mu$ of $\varphi$, where $d\mu = W(\varphi)\prod_{i=1}^{n}d\varphi_i$, where $\varphi$ is an $n$-component vector in the configuration space $\Omega$, and where the function $W(\varphi)$ accounts for the possibility of different configuration space measures. Then if $A(\varphi)$ is any function on $\Omega$, the expected value of $A$ is $\langle A\rangle = \int_\Omega\! d\mu\; P(\varphi)\,A(\varphi)$.

∙ Central limit theorem: If $\{x_1,\ldots,x_N\}$ are each independently distributed according to $P(x)$, then the distribution of the sum $X = \sum_{i=1}^{N}x_i$ is

$$P_N(X) = \int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_N\; P(x_1)\cdots P(x_N)\,\delta\Big(X-\sum_{i=1}^{N}x_i\Big) = \int_{-\infty}^{\infty}\!\frac{dk}{2\pi}\,\big[\hat P(k)\big]^N e^{ikX}\ , \tag{1.S.1}$$

where $\hat P(k) = \int\! dx\,P(x)\,e^{-ikx}$ is the Fourier transform of $P(x)$. Assuming that the lowest moments of $P(x)$ exist, $\ln[\hat P(k)] = -i\mu k - \frac{1}{2}\sigma^2 k^2 + {\mathcal O}(k^3)$, where $\mu = \langle x\rangle$ and $\sigma^2 = \langle x^2\rangle - \langle x\rangle^2$ are the mean and variance. Then for $N\to\infty$,

$$P_N(X) = (2\pi N\sigma^2)^{-1/2}\,e^{-(X-N\mu)^2/2N\sigma^2}\ , \tag{1.S.2}$$

which is a Gaussian with mean $\langle X\rangle = N\mu$ and standard deviation $\sqrt{\langle X^2\rangle - \langle X\rangle^2} = \sqrt{N}\,\sigma$. Thus, $X$ is distributed as a Gaussian, even if $P(x)$ is not a Gaussian itself.
∙ Entropy: The entropy of a statistical distribution $\{p_n\}$ is $S = -\sum_n p_n\ln p_n$. (Sometimes the base 2 logarithm is used, in which case the entropy is measured in bits.) This has the interpretation of the information content per element of a random sequence.
∙ Distributions from maximum entropy: Given a distribution $\{p_n\}$ subject to $(K+1)$ constraints of the form $X^a = \sum_n X^a_n\,p_n$ with $a\in\{0,\ldots,K\}$, where $X^0_n = X^0 = 1$ (normalization), the distribution consistent with these constraints which maximizes the entropy function is obtained by extremizing the multivariable function

$$S^*\big(\{p_n\},\{\lambda_a\}\big) = -\sum_n p_n\ln p_n - \sum_{a=0}^{K}\lambda_a\Big(\sum_n X^a_n\,p_n - X^a\Big)\ , \tag{1.S.3}$$

with respect to the probabilities $\{p_n\}$ and the Lagrange multipliers $\{\lambda_a\}$. This results in a Gibbs distribution,

$$p_n = \frac{1}{Z}\,\exp\Big\{\!-\sum_{a=1}^{K}\lambda_a X^a_n\Big\}\ , \tag{1.S.4}$$
1.S.1 https://phys.libretexts.org/@go/page/18727
where $Z = e^{1+\lambda_0}$ is determined by normalization, $\sum_n p_n = 1$ (the $a=0$ constraint), and the $K$ remaining multipliers are determined by the $K$ additional constraints.
∙ Multidimensional Gaussian integral:

$$\int_{-\infty}^{\infty}\!\!dx_1\cdots\int_{-\infty}^{\infty}\!\!dx_n\,\exp\Big(\!-\tfrac{1}{2}\,x_i A_{ij} x_j + b_i x_i\Big) = \bigg(\frac{(2\pi)^n}{\det A}\bigg)^{\!1/2}\exp\Big(\tfrac{1}{2}\,b_i A^{-1}_{ij} b_j\Big)\ . \tag{1.S.5}$$

∙ Bayes' theorem: Let the conditional probability for $B$ given $A$ be $P(B|A)$. Then Bayes' theorem says $P(A|B) = P(A)\cdot P(B|A)\,/\,P(B)$. If the 'event space' is partitioned as $\{A_i\}$, then we have the extended form,

$$P(A_i|B) = \frac{P(B|A_i)\cdot P(A_i)}{\sum_j P(B|A_j)\cdot P(A_j)}\ . \tag{1.S.6}$$

When the event space is a 'binary partition' $\{A,\neg A\}$, as is often the case in fields like epidemiology (test positive or test negative), we have

$$P(A|B) = \frac{P(B|A)\cdot P(A)}{P(B|A)\cdot P(A) + P(B|\neg A)\cdot P(\neg A)}\ . \tag{1.S.7}$$

Note that $P(A|B) + P(\neg A|B) = 1$ (which follows from $\neg\neg A = A$).
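The binary-partition form is the workhorse of base-rate calculations. A sketch (the prevalence and test characteristics below are hypothetical numbers chosen for illustration):

```python
def bayes_binary(p_b_given_a, p_b_given_not_a, p_a):
    """P(A|B) for a binary partition {A, not-A}."""
    num = p_b_given_a * p_a
    return num / (num + p_b_given_not_a * (1 - p_a))

# Hypothetical screening test: prevalence 1%, sensitivity 99%, false-positive rate 5%
p_pos = bayes_binary(0.99, 0.05, 0.01)   # P(disease | positive test), roughly 1/6
p_neg = bayes_binary(0.05, 0.99, 0.99)   # P(no disease | positive test), its complement
```

Despite the accurate test, most positives are false positives because the prior $P(A)$ is so small, and $P(A|B) + P(\neg A|B) = 1$ holds by construction.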


∙ Updating Bayesian priors: Given data in the form of observed values x = {x_1, …, x_N} ∈ X and a hypothesis in the form of parameters θ = {θ_1, …, θ_K} ∈ Θ, we write the conditional probability (density) for observing x given θ as f(x|θ). Bayes’ theorem says that the corresponding distribution π(θ|x) for θ conditioned on x is

π(θ|x) = f(x|θ) π(θ) / ∫_Θ dθ′ f(x|θ′) π(θ′) .  (1.S.8)

We call π(θ) the prior for θ, f(x|θ) the likelihood of x given θ, and π(θ|x) the posterior for θ given x. We can use the posterior to find the distribution of new data points y, called the posterior predictive distribution, f(y|x) = ∫_Θ dθ f(y|θ) π(θ|x). This is the update of the prior predictive distribution, f(x) = ∫_Θ dθ f(x|θ) π(θ).

As an example, consider coin flipping with f(x|θ) = θ^X (1−θ)^{N−X}, where N is the number of flips, and X = ∑_{j=1}^{N} x_j with x_j a discrete variable which is 0 for tails and 1 for heads. The parameter θ ∈ [0,1] is the probability to flip heads. We choose a prior π(θ) = θ^{α−1} (1−θ)^{β−1} / B(α,β), where B(α,β) = Γ(α)Γ(β)/Γ(α+β) is the Beta function; this prior is normalized, ∫_0^1 dθ π(θ) = 1. The posterior

distribution for θ is then

π(θ | x_1, …, x_N) = f(x_1, …, x_N | θ) π(θ) / ∫_0^1 dθ′ f(x_1, …, x_N | θ′) π(θ′) = θ^{X+α−1} (1−θ)^{N−X+β−1} / B(X+α, N−X+β) .  (1.S.9)

The prior predictive is f(x) = ∫_0^1 dθ f(x|θ) π(θ) = B(X+α, N−X+β) / B(α,β), and the posterior predictive for the total number of heads Y in M flips is

f(y|x) = ∫_0^1 dθ f(y|θ) π(θ|x) = B(X+Y+α, N−X+M−Y+β) / B(X+α, N−X+β) .  (1.S.10)
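These Beta-function ratios are easy to verify numerically. The sketch below (with invented values of N, X, α, β, M, Y) compares the closed form for f(y|x) in Equation 1.S.10 against direct numerical integration of ∫dθ f(y|θ) π(θ|x):

```python
import math

def log_beta(a, b):
    # log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# Observed: X = 7 heads in N = 10 flips; prior Beta(alpha=2, beta=2).
N, X, alpha, beta = 10, 7, 2.0, 2.0
# New data: a specific sequence with Y = 3 heads in M = 5 flips.
M, Y = 5, 3

# Closed form of Eq. 1.S.10
f_closed = math.exp(log_beta(X + Y + alpha, N - X + M - Y + beta)
                    - log_beta(X + alpha, N - X + beta))

# Direct Riemann-sum integration of f(y|theta) * pi(theta|x)  (Eq. 1.S.9)
K = 20000
h = 1.0 / K
total = 0.0
for i in range(1, K):
    t = i * h
    log_post = ((X + alpha - 1.0) * math.log(t)
                + (N - X + beta - 1.0) * math.log(1.0 - t)
                - log_beta(X + alpha, N - X + beta))
    total += (t ** Y) * ((1.0 - t) ** (M - Y)) * math.exp(log_post) * h

print(f_closed, total)  # the closed form and the integral agree
```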

This page titled 1.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

2: Thermodynamics
2.1: What is Thermodynamics?
2.2: The Zeroth Law of Thermodynamics
2.3: Mathematical Interlude - Exact and Inexact Differentials
2.4: The First Law of Thermodynamics
2.5: Heat Engines and the Second Law of Thermodynamics
2.6: The Entropy
2.7: Thermodynamic Potentials
2.8: Maxwell Relations
2.9: Equilibrium and Stability
2.10: Applications of Thermodynamics
2.11: Phase Transitions and Phase Equilibria
2.12: Entropy of Mixing and the Gibbs Paradox
2.13: Some Concepts in Thermochemistry
2.14: Appendix I- Integrating Factors
2.15: Appendix II- Legendre Transformations
2.16: Appendix III- Useful Mathematical Relations
2.S: Summary

Thumbnail: Heat Engine.

This page titled 2: Thermodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.1: What is Thermodynamics?
Thermodynamics is the study of relations among the state variables describing a thermodynamic system, and of transformations of
heat into work and vice versa.

Thermodynamic systems and state variables


Thermodynamic systems contain large numbers of constituent particles, and are described by a set of state variables which
describe the system’s properties in an average sense. State variables are classified as being either extensive or intensive.
Extensive variables, such as volume V, particle number N, total internal energy E, magnetization M, etc., scale linearly with the system size, i.e. as the first power of the system volume. If we take two identical thermodynamic systems, place them next to each other, and remove any barriers between them, then all the extensive variables will double in size.
Intensive variables, such as the pressure p, the temperature T, the chemical potential μ, the electric field E, etc., are independent of system size, scaling as the zeroth power of the volume. They are the same throughout the system, if that system is in an appropriate state of equilibrium. The ratio of any two extensive variables is an intensive variable. For example, we write n = N/V for the number density, which scales as V⁰. Intensive variables may also be inhomogeneous. For example, n(r) is the number density at position r, and is defined as the limit of ΔN/ΔV of the number of particles ΔN inside a volume ΔV which contains the point r, in the limit V ≫ ΔV ≫ V/N.
Classically, the full motion of a system of N point particles requires 6N variables to fully describe it (3N positions and 3N velocities or momenta, in three space dimensions). Since the constituents are very small, N is typically very large. A typical solid or liquid, for example, has a mass density on the order of ϱ ∼ 1 g/cm³; for gases, ϱ ∼ 10⁻³ g/cm³. The constituent atoms have masses of 10⁰ to 10² grams per mole, where one mole of X contains N_A of X, and N_A = 6.0221415 × 10²³ is Avogadro’s number. Thus, for solids and liquids we roughly expect number densities n of 10⁻² − 10⁰ mol/cm³, and 10⁻⁵ − 10⁻³ mol/cm³ for gases. Clearly we are dealing with fantastically large numbers of constituent particles in a typical thermodynamic system. The underlying theoretical basis for thermodynamics, where we use a small number of state variables to describe a system, is provided by the microscopic theory of statistical mechanics, which we shall study in the weeks ahead.
Intensive quantities such as p, T, and n ultimately involve averages over both space and time. Consider for example the case of a gas enclosed in a container. We can measure the pressure (relative to atmospheric pressure) by attaching a spring to a moveable wall, as shown in Fig. [pressure]. From the displacement of the spring and the value of its spring constant k we determine the force F. This force is due to the difference in pressures, so p = p₀ + F/A. Microscopically, the gas consists of constituent atoms or molecules, which are constantly undergoing collisions with each other and with the walls of the container. When a particle bounces off a wall, it imparts an impulse 2n̂ (n̂ · p), where p is the particle’s momentum and n̂ is the unit vector normal to the wall. (Only particles with p · n̂ > 0 will hit the wall.) Multiply this by the number of particles colliding with the wall per unit time, and one finds the net force on the wall; dividing by the area gives the pressure p. Within the gas, each particle travels for a distance ℓ, called the mean free path, before it undergoes a collision. We can write ℓ = v̄τ, where v̄ is the average particle speed and τ is the mean free time. When we study the kinetic theory of gases, we will derive formulas for ℓ and v̄ (and hence τ). For now it is helpful to quote some numbers to get an idea of the relevant distance and time scales. For O₂ gas at standard temperature and pressure (T = 0°C, p = 1 atm), the mean free path is ℓ ≈ 1.1 × 10⁻⁵ cm, the average speed is v̄ ≈ 480 m/s, and the mean free time is τ ≈ 2.5 × 10⁻¹⁰ s. Thus, particles in the gas undergo collisions at a rate τ⁻¹ ≈ 4.0 × 10⁹ s⁻¹. A measuring device, such as our spring, or a thermometer, effectively performs time and space averages. If there are N_c collisions with a particular patch of wall during some time interval on which our measurement device responds, then the root mean square relative fluctuations in the local pressure will be on the order of N_c^{−1/2} times the average. Since N_c is a very large number, the fluctuations are negligible.
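The quoted O₂ numbers can be checked for mutual consistency using ℓ = v̄τ:

```python
# Consistency check of the O2 figures quoted above via ell = vbar * tau.
vbar = 480.0       # m/s, average particle speed
tau = 2.5e-10      # s, mean free time

ell = vbar * tau   # mean free path in meters
print(ell * 100)   # in cm: about 1.2e-5, consistent with ell ~ 1.1e-5 cm
print(1.0 / tau)   # collision rate: 4.0e9 per second
```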

[politics] From microscale to macroscale : physical versus social sciences.
If the system is in steady state, the state variables do not change with time. If furthermore there are no macroscopic currents of
energy or particle number flowing through the system, the system is said to be in equilibrium. A continuous succession of
equilibrium states is known as a thermodynamic path, which can be represented as a smooth curve in a multidimensional space
whose axes are labeled by state variables. A thermodynamic process is any change or succession of changes which results in a
change of the state variables. In a cyclic process, the initial and final states are the same. In a quasistatic process, the system passes
through a continuous succession of equilibria. A reversible process is one where the external conditions and the thermodynamic path of the system can be reversed; it is both quasistatic and non-dissipative (no friction). The slow expansion of a gas against a piston head, whose counter-force is always infinitesimally less than the force pA exerted by the gas, is reversible. To reverse this process, we simply add infinitesimally more force to pA and the gas compresses. An example of a quasistatic process which is not reversible: slowly dragging a block across the floor, or the slow leak of air from a tire. Irreversible processes, as a rule, are dissipative. Other special processes include isothermal (dT = 0), isobaric (dp = 0), isochoric (dV = 0), and adiabatic (đQ = 0, no heat exchange):

reversible: đQ = T dS        isothermal: dT = 0
spontaneous: đQ < T dS       isochoric: dV = 0
adiabatic: đQ = 0            isobaric: dp = 0 .

We shall discuss later the entropy S and its connection with irreversibility.

[pressure] The pressure p of a gas is due to an average over space and time of the impulses due to the constituent particles.
How many state variables are necessary to fully specify the equilibrium state of a thermodynamic system? For a single component
system, such as water which is composed of one constituent molecule, the answer is three. These can be taken to be T , p, and V .
One always must specify at least one extensive variable, else we cannot determine the overall size of the system. For a
multicomponent system with g different species, we must specify g + 2 state variables, which may be {T, p, N_1, …, N_g}, where N_a is the number of particles of species a. Another possibility is the set {T, p, V, x_1, …, x_{g−1}}, where the concentration of species a is x_a = N_a/N. Here, N = ∑_{a=1}^{g} N_a is the total number of particles. Note that ∑_{a=1}^{g} x_a = 1.

It then follows that if we specify more than g + 2 state variables, there must exist a relation among them. Such relations are known as equations of state. The most famous example is the ideal gas law,

pV = N k_B T ,  (2.1.1)

relating the four state variables T, p, V, and N. Here k_B = 1.3806503 × 10⁻¹⁶ erg/K is Boltzmann’s constant. Another example is the van der Waals equation,

( p + aN²/V² ) ( V − bN ) = N k_B T ,  (2.1.2)

where a and b are constants which depend on the molecule which forms the gas. For a third example, consider a paramagnet, where

M/V = CH/T ,  (2.1.3)

where M is the magnetization, H the magnetic field, and C the Curie constant.
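To get a feel for the size of the van der Waals corrections in Equation 2.1.2, one can compare the resulting pressure with the ideal gas value. The sketch below uses one mole in one liter at 300 K with illustrative CO₂-like values of a and b, expressed per mole rather than per particle, so that (p + aν²/V²)(V − νb) = νRT with ν = 1 mol:

```python
# Compare ideal-gas and van der Waals pressures for 1 mol in 1 L at 300 K.
# a, b below are illustrative CO2-like molar constants, not from the text.
R = 8.314        # J / (mol K)
a = 0.364        # Pa m^6 / mol^2  (attraction)
b = 4.27e-5      # m^3 / mol       (excluded volume)
nu, V, T = 1.0, 1.0e-3, 300.0   # mol, m^3, K

p_ideal = nu * R * T / V
p_vdw = nu * R * T / (V - nu * b) - a * nu**2 / V**2

print(p_ideal, p_vdw)  # the attraction term lowers p by roughly 10% here
```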
Any quantity which, in equilibrium, depends only on the state variables is called a state function. For example, the total internal energy E of a thermodynamic system is a state function, and we may write E = E(T, p, V). State functions can also serve as state variables, although the most natural state variables are those which can be directly measured.

Heat
Once thought to be a type of fluid, heat is now understood in terms of the kinetic theory of gases, liquids, and solids as a form of energy stored in the disordered motion of constituent particles. The units of heat are therefore units of energy, and it is appropriate to speak of heat energy, which we shall simply abbreviate as heat:

1 J = 10⁷ erg = 6.242 × 10¹⁸ eV = 2.390 × 10⁻⁴ kcal = 9.478 × 10⁻⁴ BTU .  (2.1.4)

We will use the symbol Q to denote the amount of heat energy absorbed by a system during some given thermodynamic process, and đQ to denote a differential amount of heat energy. The symbol đ indicates an ‘inexact differential’, about which we shall have more to say presently. This means that heat is not a state function: there is no ‘heat function’ Q(T, p, V).

Work

In general we will write the differential element of work đW done by the system as

đW = ∑_i F_i dX_i ,  (2.1.5)

where F_i is a generalized force and dX_i a generalized displacement. The generalized forces and displacements are themselves state variables, and by convention we will take the generalized forces to be intensive and the generalized displacements to be extensive. As an example, in a simple one-component system, we have đW = p dV. More generally, we write

đW = ( p dV − H · dM − E · dP − σ dA + … ) − ( μ_1 dN_1 + μ_2 dN_2 + … ) ,  (2.1.6)

where the first group of terms is −∑_j y_j dX_j and the second is ∑_a μ_a dN_a. Here we distinguish between two types of work. The first involves changes in quantities such as volume, magnetization, electric polarization, area, etc. The conjugate forces y_i applied to the system are then −p, the magnetic field H, the electric field E, the surface tension σ, respectively. The second type of work involves changes in the number of constituents of a given species. For example, energy is required in order to dissociate two hydrogen atoms in an H₂ molecule. The effect of such a process is dN_{H₂} = −1 and dN_H = +2.
As with heat, đW is an inexact differential, and work W is not a state variable, since it is path-dependent. There is no ‘work function’ W(T, p, V).

Pressure and Temperature
The units of pressure (p) are force per unit area. The SI unit is the pascal (Pa): 1 Pa = 1 N/m² = 1 kg/(m s²). Other units of pressure we will encounter:

1 bar ≡ 10⁵ Pa
1 atm ≡ 1.01325 × 10⁵ Pa
1 torr ≡ 133.3 Pa .

Temperature (T ) has a very precise definition from the point of view of statistical mechanics, as we shall see. Many physical
properties depend on the temperature – such properties are called thermometric properties. For example, the resistivity of a metal
ρ(T , p) or the number density of a gas n(T , p) are both thermometric properties, and can be used to define a temperature scale.

Consider the device known as the ‘constant volume gas thermometer’ depicted in Fig. [CVGTa], in which the volume or pressure of a gas may be used to measure temperature. The gas is assumed to be in equilibrium at some pressure p, volume V, and temperature T. An incompressible fluid of density ϱ is used to measure the pressure difference Δp = p − p₀, where p₀ is the ambient pressure at the top of the reservoir:

p − p₀ = ϱg(h₂ − h₁) ,  (2.1.7)

where g is the acceleration due to gravity. The height h₁ of the left column of fluid in the U-tube provides a measure of the change in the volume of the gas:

V(h₁) = V(0) − Ah₁ ,  (2.1.8)

where A is the (assumed constant) cross-sectional area of the left arm of the U-tube. The device can operate in two modes:

Constant pressure mode : The height of the reservoir is adjusted so that the height difference h₂ − h₁ is held constant. This fixes the pressure p of the gas. The gas volume still varies with temperature T, and we can define

T/T_ref = V/V_ref ,  (2.1.9)

where T_ref and V_ref are the reference temperature and volume, respectively.

Constant volume mode : The height of the reservoir is adjusted so that h₁ = 0, hence the volume of the gas is held fixed, and the pressure varies with temperature. We then define

T/T_ref = p/p_ref ,  (2.1.10)

where T_ref and p_ref are the reference temperature and pressure, respectively.

[CVGTa] The constant volume gas thermometer. The gas is placed in thermal contact with an object of temperature T. An incompressible fluid of density ϱ is used to measure the pressure difference Δp = p_gas − p₀.

What should we use for a reference? One might think that a pot of boiling water will do, but anyone who has gone camping in the mountains knows that water boils at lower temperatures at high altitude (lower pressure). This phenomenon is reflected in the phase diagram for H₂O, depicted in Fig. [H2Opd]. There are two special points in the phase diagram, however. One is the triple point, where the solid, liquid, and vapor (gas) phases all coexist. The second is the critical point, which is the terminus of the curve separating liquid from gas. At the critical point, the latent heat of transition between liquid and gas phases vanishes (more on this later on). The triple point temperature T_t is thus unique and is by definition T_t = 273.16 K. The pressure at the triple point is 611.7 Pa = 6.056 × 10⁻³ atm.

[H2Opd] A sketch of the phase diagram of H₂O (water). Two special points are identified: the triple point (T_t, p_t), at which there is three-phase coexistence, and the critical point (T_c, p_c), where the latent heat of transformation from liquid to gas vanishes. Not shown are transitions between several different solid phases.


A question remains: are the two modes of the thermometer compatible? If we boil water at p = p₀ = 1 atm, do they yield the same value for T? And what if we use a different gas in our measurements? In fact, all these measurements will in general be incompatible, yielding different results for the temperature T. However, in the limit that we use a very low density gas, all the results converge. This is because all low density gases behave as ideal gases, and obey the ideal gas equation of state pV = N k_B T.

Standard temperature and pressure
It is customary in the physical sciences to define certain standard conditions with respect to which any arbitrary conditions may be
compared. In thermodynamics, there is a notion of standard temperature and pressure, abbreviated STP. Unfortunately, there are
two different definitions of STP currently in use, one from the International Union of Pure and Applied Chemistry (IUPAC), and
the other from the U.S. National Institute of Standards and Technology (NIST). The two standards are:
IUPAC : T₀ = 0°C = 273.15 K , p₀ = 10⁵ Pa
NIST : T₀ = 20°C = 293.15 K , p₀ = 1 atm = 1.01325 × 10⁵ Pa

To make matters worse, in the past it was customary to define STP as T₀ = 0°C and p₀ = 1 atm. We will use the NIST definition in this course. Unless I slip and use the IUPAC definition. Figuring out what I mean by STP will keep you on your toes.
The volume of one mole of ideal gas at STP is then

V = N_A k_B T₀ / p₀ = { 22.711 ℓ (IUPAC) ; 24.219 ℓ (NIST) } ,  (2.1.11)

where 1 ℓ = 10³ cm³ = 10⁻³ m³ is one liter. Under the old definition of STP as T₀ = 0°C and p₀ = 1 atm, the volume of one mole of gas at STP is 22.414 ℓ, which is a figure I remember from my 10th grade chemistry class with Mr. Lawrence.

[CVGTb] As the gas density tends to zero, the readings of the constant volume gas thermometer converge.

This page titled 2.1: What is Thermodynamics? is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

2.2: The Zeroth Law of Thermodynamics
Equilibrium is established by the exchange of energy, volume, or particle number between different systems or subsystems:

energy exchange ⟹ T = constant ⟹ thermal equilibrium
volume exchange ⟹ p/T = constant ⟹ mechanical equilibrium
particle exchange ⟹ μ/T = constant ⟹ chemical equilibrium

Equilibrium is transitive, so

If A is in equilibrium with B, and B is in equilibrium with C, then A is in equilibrium with C.

This is known as the Zeroth Law of Thermodynamics.

This page titled 2.2: The Zeroth Law of Thermodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

2.3: Mathematical Interlude - Exact and Inexact Differentials
The differential

dF = ∑_{i=1}^{k} A_i dx_i  (2.3.1)

is called exact if there is a function F(x_1, …, x_k) whose differential gives the right hand side of Equation 2.3.1. In this case, we have

A_i = ∂F/∂x_i  ⟺  ∂A_i/∂x_j = ∂A_j/∂x_i  ∀ i, j .  (2.3.2)

For exact differentials, the integral between fixed endpoints is path-independent:

∫_A^B dF = F(x_1^B, …, x_k^B) − F(x_1^A, …, x_k^A) ,  (2.3.3)

from which it follows that the integral of dF around any closed path must vanish:

∮ dF = 0 .  (2.3.4)

When the cross derivatives are not identical, i.e. when ∂A_i/∂x_j ≠ ∂A_j/∂x_i, the differential is inexact. In this case, the integral of dF is path-dependent, and does not depend solely on the endpoints.

Figure [work_path] Two distinct paths with identical endpoints.


As an example, consider the differential

dF = K_1 y dx + K_2 x dy .  (2.3.5)

Let’s evaluate the integral of dF, which is the work done, along each of the two paths in Fig. [work_path]:

W^(I) = K_1 ∫_{x_A}^{x_B} dx y_A + K_2 ∫_{y_A}^{y_B} dy x_B = K_1 y_A (x_B − x_A) + K_2 x_B (y_B − y_A)
W^(II) = K_2 ∫_{y_A}^{y_B} dy x_A + K_1 ∫_{x_A}^{x_B} dx y_B = K_2 x_A (y_B − y_A) + K_1 y_B (x_B − x_A) .

Note that in general W^(I) ≠ W^(II). Thus, if we start at point A, the kinetic energy at point B will depend on the path taken, since the work done is path-dependent. The difference between the work done along the two paths is

W^(I) − W^(II) = ∮ dF = (K_2 − K_1)(x_B − x_A)(y_B − y_A) .

Thus, we see that if K_1 = K_2, the work is the same for the two paths. In fact, if K_1 = K_2, the work would be path-independent, and would depend only on the endpoints. This is true for any path, and not just piecewise linear paths of the type depicted in Fig. [work_path]. Thus, if K_1 = K_2, we are justified in using the notation dF for the differential in Equation [dFe]; explicitly, we then have F = K_1 xy. However, if K_1 ≠ K_2, the differential is inexact, and we will henceforth write đF in such cases.
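The two-path computation is easy to reproduce numerically; with illustrative endpoints and K_1 ≠ K_2, the difference W^(I) − W^(II) matches (K_2 − K_1)(x_B − x_A)(y_B − y_A):

```python
# Work along the two piecewise-linear paths from A = (xA, yA) to B = (xB, yB)
# for dF = K1 y dx + K2 x dy. Endpoints and constants are illustrative.
K1, K2 = 1.0, 3.0                # K1 != K2 makes dF inexact
xA, yA, xB, yB = 0.0, 0.0, 2.0, 1.0

# Path I: first along x at fixed y = yA, then along y at fixed x = xB
W_I = K1 * yA * (xB - xA) + K2 * xB * (yB - yA)
# Path II: first along y at fixed x = xA, then along x at fixed y = yB
W_II = K2 * xA * (yB - yA) + K1 * yB * (xB - xA)

print(W_I - W_II)                          # path-dependent difference
print((K2 - K1) * (xB - xA) * (yB - yA))   # matches the closed-loop formula
```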

This page titled 2.3: Mathematical Interlude - Exact and Inexact Differentials is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.4: The First Law of Thermodynamics
Conservation of energy
The first law is a statement of energy conservation, and is depicted in Fig. [firstlaw]. It says, quite simply, that during a thermodynamic process, the
change in a system’s internal energy E is given by the heat energy Q added to the system, minus the work W done by the system:
ΔE = Q − W . (2.4.1)

The differential form of this, the First Law of Thermodynamics, is

dE = đQ − đW .  (2.4.2)

We use the symbol đ in the differentials đQ and đW to remind us that these are inexact differentials. The energy E, however, is a state function, hence dE is an exact differential.
Consider a volume V of fluid held in a flask, initially at temperature T₀, and held at atmospheric pressure. The internal energy is then E₀ = E(T₀, p, V). Now let us contemplate changing the temperature in two different ways. The first method (A) is to place the flask on a hot plate until the temperature of the fluid rises to a value T₁. The second method (B) is to stir the fluid vigorously. In the first case, we add heat Q_A > 0 but no work is done, so W_A = 0. In the second case, if we thermally insulate the flask and use a stirrer of very low thermal conductivity, then no heat is added, Q_B = 0. However, the stirrer does work −W_B > 0 on the fluid (remember W is the work done by the system). If we end up at the same temperature T₁, then the final energy is E₁ = E(T₁, p, V) in both cases. We then have

ΔE = E₁ − E₀ = Q_A = −W_B .

[firstlaw] The first law of thermodynamics is a statement of energy conservation.


It also follows that for any cyclic transformation, where the state variables are the same at the beginning and the end, we have

ΔE = Q − W = 0 ⟹ Q = W (cyclic) .  (2.4.3)

Single component systems


A single component system is specified by three state variables. In many applications, the total number of particles N is conserved, so it is useful to take N as one of the state variables. The remaining two can be (T, V) or (T, p) or (p, V). The differential form of the first law says

dE = đQ − đW = đQ − p dV + μ dN .

The quantity μ is called the chemical potential. We ask: how much heat is required in order to make an infinitesimal change in temperature, pressure, volume, or particle number? We start by rewriting Equation [DFL] as

đQ = dE + p dV − μ dN .  (2.4.4)

We now must roll up our sleeves and do some work with partial derivatives.
We now must roll up our sleeves and do some work with partial derivatives.
∙ (T , V , N ) systems : If the state variables are (T , V , N ), we write

∂E ∂E ∂E
dE = ( ) dT + ( ) dV + ( ) dN . (2.4.5)
∂T ∂V ∂N
V ,N T ,N T ,V

Then


∂E ∂E ∂E
\mathchar 26
dQ = ( ) dT + [ ( ) + p]dV + [ ( ) − μ]dN . (2.4.6)
∂T ∂V ∂N
V ,N T ,N T ,V

∙ (T, p, N) systems : If the state variables are (T, p, N), we write

dE = (∂E/∂T)_{p,N} dT + (∂E/∂p)_{T,N} dp + (∂E/∂N)_{T,p} dN .  (2.4.7)

We also write

dV = (∂V/∂T)_{p,N} dT + (∂V/∂p)_{T,N} dp + (∂V/∂N)_{T,p} dN .  (2.4.8)

Then

đQ = [ (∂E/∂T)_{p,N} + p (∂V/∂T)_{p,N} ] dT + [ (∂E/∂p)_{T,N} + p (∂V/∂p)_{T,N} ] dp + [ (∂E/∂N)_{T,p} + p (∂V/∂N)_{T,p} − μ ] dN .

∙ (p, V, N) systems : If the state variables are (p, V, N), we write

dE = (∂E/∂p)_{V,N} dp + (∂E/∂V)_{p,N} dV + (∂E/∂N)_{p,V} dN .  (2.4.9)

Then

đQ = (∂E/∂p)_{V,N} dp + [ (∂E/∂V)_{p,N} + p ] dV + [ (∂E/∂N)_{p,V} − μ ] dN .  (2.4.10)

The heat capacity of a body, C, is by definition the ratio đQ/dT of the amount of heat absorbed by the body to the associated infinitesimal change in temperature dT. The heat capacity will in general be different if the body is heated at constant volume or at constant pressure. Setting dV = 0 gives, from Equation [QTVN],

C_{V,N} = ( đQ/dT )_{V,N} = ( ∂E/∂T )_{V,N} .  (2.4.11)

Similarly, if we set dp = 0, then Equation [QTpN] yields

C_{p,N} = ( đQ/dT )_{p,N} = ( ∂E/∂T )_{p,N} + p ( ∂V/∂T )_{p,N} .  (2.4.12)

Unless explicitly stated otherwise, we shall assume that N is fixed, and will write C_V for C_{V,N} and C_p for C_{p,N}.
[cptab] Specific heat (at 25°C, unless otherwise noted) of some common substances. (Source: Wikipedia.)

SUBSTANCE     c_p (J/mol K)  c̃_p (J/g K) | SUBSTANCE       c_p (J/mol K)  c̃_p (J/g K)
Air            29.07   1.01   | H₂O (25°C)      75.34   4.181
Aluminum       24.2    0.897  | H₂O (100°C)     37.47   2.08
Copper         24.47   0.385  | Iron            25.1    0.450
CO₂            36.94   0.839  | Lead            26.4    0.127
Diamond         6.115  0.509  | Lithium         24.8    3.58
Ethanol       112      2.44   | Neon            20.786  1.03
Gold           25.42   0.129  | Oxygen          29.38   0.918
Helium         20.786  5.193  | Paraffin (wax) 900      2.5
Hydrogen       28.82   5.19   | Uranium         27.7    0.116
H₂O (−10°C)    38.09   2.05   | Zinc            25.3    0.387

The units of heat capacity are energy divided by temperature, J/K. The heat capacity is an extensive quantity, scaling with the size of the system. If we divide by the number of moles N/N_A, we obtain the molar heat capacity, sometimes called the molar specific heat: c = C/ν, where ν = N/N_A is the number of moles of substance. Specific heat is also sometimes quoted in units of heat capacity per gram of substance. We shall define

c̃ = C/(mN) = c/M = (heat capacity per mole)/(mass per mole) .  (2.4.13)

Here m is the mass per particle and M is the mass per mole: M = N_A m.
Suppose we raise the temperature of a body from T = T_A to T = T_B. How much heat is required? We have

Q = ∫_{T_A}^{T_B} dT C(T) ,

where C = C_V or C = C_p depending on whether volume or pressure is held constant. For ideal gases, as we shall discuss below, C(T) is constant, and thus

Q = C(T_B − T_A) ⟹ T_B = T_A + Q/C .

In metals at very low temperatures one finds C = γT, where γ is a constant. We then have

Q = ∫_{T_A}^{T_B} dT C(T) = ½ γ (T_B² − T_A²)  ⟹  T_B = √(T_A² + 2γ⁻¹Q) .
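For instance, with an illustrative γ and heat input, T_B follows from the formula above, and integrating C(T) back from T_A to T_B recovers Q:

```python
import math

# Low-temperature metal: C(T) = gamma * T. All values are illustrative.
gamma = 0.01   # J / K^2
TA = 1.0       # K
Q = 0.5        # J of heat added

# Q = (gamma/2)(TB^2 - TA^2)  =>  TB = sqrt(TA^2 + 2 Q / gamma)
TB = math.sqrt(TA**2 + 2.0 * Q / gamma)

# Check by midpoint-rule integration of C(T) dT from TA back up to TB
steps = 100000
h = (TB - TA) / steps
Q_back = sum(gamma * (TA + (i + 0.5) * h) * h for i in range(steps))
print(TB, Q_back)  # Q_back recovers the heat input Q
```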

Ideal gases
The ideal gas equation of state is pV = N k_B T. In order to invoke the formulae in Equations 2.4.6, its (T, p, N) analogue, and 2.4.10, we need to know the state function E(T, V, N). A landmark experiment by Joule in the mid-19th century established that the energy of a low density gas is independent of its volume. Essentially, a gas at temperature T was allowed to freely expand from one volume V to a larger volume V′ > V, with no added heat Q and no work W done. Therefore the energy cannot change. What Joule found was that the temperature also did not change. This means that E(T, V, N) = E(T, N) cannot be a function of the volume.

[CVH2] Heat capacity C_V for one mole of hydrogen (H₂) gas. At the lowest temperatures, only translational degrees of freedom are relevant, and f = 3. At around 200 K, two rotational modes are excitable and f = 5. Above 1000 K, the vibrational excitations begin to contribute. Note the logarithmic temperature scale. (Data from H. W. Wooley et al., Jour. Natl. Bureau of Standards, 41, 379 (1948).)
Since E is extensive, we conclude that

E(T, V, N) = ν ε(T) ,  (2.4.14)

where ν = N/N_A is the number of moles of substance. Note that ν is an extensive variable. From Eqns. [cveqn] and [cpeqn], we conclude

C_V(T) = ν ε′(T) ,  C_p(T) = C_V(T) + νR ,  (2.4.15)

where we invoke the ideal gas law to obtain the second of these. Empirically it is found that C_V(T) is temperature independent over a wide range of T, far enough from the boiling point. We can then write C_V = ν c_V, where ν ≡ N/N_A is the number of moles, and where c_V is the molar heat capacity. We then have

c_p = c_V + R ,  (2.4.16)

where R = N_A k_B = 8.31457 J/mol K is the gas constant. We denote by γ = c_p/c_V the ratio of specific heat at constant pressure and at constant volume.
From the kinetic theory of gases, one can show that

monatomic gases: c_V = (3/2)R , c_p = (5/2)R , γ = 5/3
diatomic gases: c_V = (5/2)R , c_p = (7/2)R , γ = 7/5
polyatomic gases: c_V = 3R , c_p = 4R , γ = 4/3 .

Digression : kinetic theory of gases


We will conclude in general from noninteracting classical statistical mechanics that the specific heat of a substance is c_V = ½ f R, where f is the number of phase space coordinates, per particle, for which there is a quadratic kinetic or potential energy function. For example, a point particle has three translational degrees of freedom, and the kinetic energy is a quadratic function of their conjugate momenta: H₀ = (p_x² + p_y² + p_z²)/2m. Thus, f = 3. Diatomic molecules have two additional rotational degrees of freedom – we don’t count rotations about the symmetry axis – and their conjugate momenta also appear quadratically in the kinetic energy, leading to f = 5. For polyatomic molecules, all three Euler angles and their conjugate momenta are in play, and f = 6.

[CVsolids] Molar heat capacities c_V for three solids. The solid curves correspond to the predictions of the Debye model, which we shall discuss later.

The reason that f = 5 for diatomic molecules rather than f = 6 is due to quantum mechanics. While translational eigenstates form a continuum, or are quantized in a box with Δk_α = 2π/L_α being very small since the dimensions L_α are macroscopic, angular momentum, and hence rotational kinetic energy, is quantized. For rotations about a principal axis with very low moment of inertia I, the corresponding energy scale ℏ²/2I is very large, and a high temperature is required in order to thermally populate these states. Thus, degrees of freedom with a quantization energy on the order of or greater than ε₀ are ‘frozen out’ for temperatures T ≲ ε₀/k_B.

In solids, each atom is effectively connected to its neighbors by springs; such a potential arises from quantum mechanical and electrostatic considerations of the interacting atoms. Thus, each degree of freedom contributes to the potential energy, and its conjugate momentum contributes to the kinetic energy. This results in f = 6. Assuming only lattice vibrations, then, the high temperature limit for c_V(T) for any solid is predicted to be 3R = 24.944 J/mol K. This is called the Dulong-Petit law. The high temperature limit is reached above the so-called Debye temperature, which is roughly proportional to the melting temperature of the solid.


In Table [cptab], we list c_p and c̃_p for some common substances at T = 25°C (unless otherwise noted). Note that c_p for the monatomic gases He and Ne is to high accuracy given by the value from kinetic theory, c_p = (5/2)R = 20.7864 J/mol K. For the diatomic gases oxygen (O₂) and air (mostly N₂ and O₂), kinetic theory predicts c_p = (7/2)R = 29.10, which is close to the measured values. Kinetic theory predicts c_p = 4R = 33.258 for polyatomic gases; the measured values for CO₂ and H₂O are both about 10% higher.

Adiabatic transformations of ideal gases


Assuming dN = 0 and E = ν ε(T), Equation [QTVN] tells us that

đQ = C_V dT + p dV .   (2.4.17)

Invoking the ideal gas law to write p = νRT/V, and remembering C_V = ν c_V, we have, setting đQ = 0,

dT/T + (R/c_V) (dV/V) = 0 .   (2.4.18)

We can immediately integrate to obtain


đQ = 0  ⟹  T V^{γ−1} = constant ,  p V^γ = constant′ ,  T^γ p^{1−γ} = constant′′ ,   (2.4.19)

where the second two equations are obtained from the first by invoking the ideal gas law. These are all adiabatic equations of state. Note the difference between the adiabatic equation of state d(pV^γ) = 0 and the isothermal equation of state d(pV) = 0. Equivalently, we can write these three conditions as

V T^{f/2} = V₀ T₀^{f/2} ,  p V^{(f+2)/f} = p₀ V₀^{(f+2)/f} ,  T^{f+2} p^{−2} = T₀^{f+2} p₀^{−2} .   (2.4.20)
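These invariants are easy to check numerically. The sketch below (an illustration, not part of the text) generates states along an adiabat of a diatomic ideal gas from T V^{γ−1} = constant and verifies that all three combinations in Equation 2.4.19 stay fixed; the initial state values are made up.

```python
# Generate states along an adiabat for one mole of a diatomic ideal gas
# (f = 5, gamma = 7/5) and verify the three adiabatic invariants.
R, f = 8.314, 5
gamma = 1 + 2 / f                       # = 7/5

def adiabat(T0, V0, V):
    """Temperature along the adiabat through (T0, V0): T V^(gamma-1) = const."""
    return T0 * (V0 / V) ** (gamma - 1)

T0, V0 = 300.0, 1.0e-3                  # illustrative: 300 K, one liter
p0 = R * T0 / V0                        # ideal gas law, nu = 1 mol
for V in (1.5e-3, 2.0e-3, 4.0e-3):
    T = adiabat(T0, V0, V)
    p = R * T / V
    # Equation 2.4.19: all three combinations are constant along the adiabat
    assert abs(T * V ** (gamma - 1) / (T0 * V0 ** (gamma - 1)) - 1) < 1e-12
    assert abs(p * V ** gamma / (p0 * V0 ** gamma) - 1) < 1e-12
    assert abs(T ** gamma * p ** (1 - gamma)
               / (T0 ** gamma * p0 ** (1 - gamma)) - 1) < 1e-12
print("all three adiabatic invariants hold")
```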

It turns out that air is a rather poor conductor of heat. This suggests the following model for an adiabatic atmosphere. The hydrostatic pressure decrease
associated with an increase dz in height is dp = −ϱg dz , where ϱ is the density and g the acceleration due to gravity. Assuming the gas is ideal, the
density can be written as ϱ = M p/RT , where M is the molar mass. Thus,

2.4.4 https://phys.libretexts.org/@go/page/18551
dp/p = −(Mg/RT) dz .   (2.4.21)

If the height changes are adiabatic, then, from d(T^γ p^{1−γ}) = 0, we have

dT = ((γ−1)/γ) T dp/p = −((γ−1)/γ) (Mg/R) dz ,   (2.4.22)

with the solution

T(z) = T₀ − ((γ−1)/γ) (Mg/R) z = (1 − ((γ−1)/γ) (z/λ)) T₀ ,   (2.4.23)

where T₀ = T(0) is the temperature at the earth’s surface, and

λ = R T₀ / Mg .   (2.4.24)

With M = 28.88 g and γ = 7/5 for air, and assuming T₀ = 293 K, we find λ = 8.6 km, and dT/dz = −(1 − γ^{−1}) T₀/λ = −9.7 K/km. Note that in this model the atmosphere ends at a height z_max = γλ/(γ−1) = 30 km.
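The quoted numbers follow directly from Equations 2.4.23 and 2.4.24. A short check, using the same inputs as the text (M = 28.88 g/mol, γ = 7/5, T₀ = 293 K) and taking g = 9.8 m/s²:

```python
# Evaluate the adiabatic-atmosphere numbers quoted in the text.
R, g = 8.314, 9.8            # gas constant (J/mol K), gravity (m/s^2)
M = 28.88e-3                 # molar mass of air (kg/mol)
gamma, T0 = 7 / 5, 293.0     # adiabatic exponent, surface temperature (K)

lam = R * T0 / (M * g)                 # Equation 2.4.24
lapse = -(1 - 1 / gamma) * T0 / lam    # dT/dz from Equation 2.4.23
z_max = gamma * lam / (gamma - 1)      # height at which T(z) reaches zero

print(f"lambda = {lam / 1e3:.1f} km")        # 8.6 km
print(f"dT/dz  = {lapse * 1e3:.1f} K/km")    # -9.7 K/km
print(f"z_max  = {z_max / 1e3:.0f} km")      # 30 km
```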

Again invoking the adiabatic equation of state, we can find p(z):

p(z)/p₀ = (T/T₀)^{γ/(γ−1)} = (1 − ((γ−1)/γ) (z/λ))^{γ/(γ−1)} .   (2.4.25)

Recall that

e^x = lim_{k→∞} (1 + x/k)^k .   (2.4.26)

Thus, in the limit γ → 1, where k = γ/(γ−1) → ∞, we have p(z) = p₀ exp(−z/λ). Finally, since ϱ ∝ p/T from the ideal gas law, we have

ϱ(z)/ϱ₀ = (1 − ((γ−1)/γ) (z/λ))^{1/(γ−1)} .   (2.4.27)

Adiabatic free expansion


Consider the situation depicted in Fig. [AFE]. A quantity (ν moles) of gas in equilibrium at temperature T and volume V₁ is allowed to expand freely into an evacuated chamber of volume V₂ by the removal of a barrier. Clearly no work is done on or by the gas during this process, hence W = 0. If the walls are everywhere insulating, so that no heat can pass through them, then Q = 0 as well. The First Law then gives ΔE = Q − W = 0, and there is no change in energy.

If the gas is ideal, then since E(T, V, N) = N c_V T, ΔE = 0 gives ΔT = 0, and there is no change in temperature. (If the walls are insulating against the passage of heat, they must also prevent the passage of particles, so ΔN = 0.) There is of course a change in volume: ΔV = V₂, hence there is a change in pressure. The initial pressure is p_i = N k_B T/V₁ and the final pressure is p_f = N k_B T/(V₁ + V₂).

[AFE] In the adiabatic free expansion of a gas, there is volume expansion with no work or heat exchange with the environment: ΔE = Q = W = 0.

If the gas is nonideal, then the temperature will in general change. Suppose E(T, V, N) = α V^x N^{1−x} T^y, where α, x, and y are constants. This form is properly extensive: if V and N double, then E doubles. If the volume changes from V to V′ under an adiabatic free expansion, then we must have, from ΔE = 0,

V^x T^y = V′^x T′^y  ⟹  T′ = T · (V/V′)^{x/y} .   (2.4.28)

If x/y > 0, the temperature decreases upon the expansion. If x/y < 0, the temperature increases. Without an equation of state, we can’t say precisely what happens to the pressure, although we know on general grounds that it must decrease because, as we shall see, thermodynamic stability entails a positive isothermal compressibility: κ_T = −(1/V)(∂V/∂p)_{T,N} > 0.

Adiabatic free expansion of a gas is a spontaneous process, arising due to the natural internal dynamics of the system. It is also irreversible. If we wish to take the gas back to its original state, we must do work on it to compress it. If the gas is ideal, then the initial and final temperatures are identical, so we can place the system in thermal contact with a reservoir at temperature T and follow a thermodynamic path along an isotherm. The work done on the gas during compression is then

𝒲 = −N k_B T ∫_{V_f}^{V_i} dV/V = N k_B T ln(V_f/V_i) = N k_B T ln(1 + V₂/V₁) .   (2.4.29)

The work done by the gas is W = ∫ p dV = −𝒲. During the compression, heat energy Q = W < 0 is transferred to the gas from the reservoir. Thus, 𝒬 = 𝒲 > 0 is given off by the gas to its environment.
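As a concrete illustration of Equation 2.4.29, the work needed to isothermally recompress one mole after a free expansion can be evaluated directly; the temperature and volume ratios below are made-up illustrative values.

```python
import math

# Work done on one mole of ideal gas to isothermally recompress it after a
# free expansion:  W = nu R T ln(1 + V2/V1)   (Equation 2.4.29).
nuR, T = 8.314, 300.0               # nu R in J/mol K, temperature in K
for ratio in (1.0, 2.0, 10.0):      # ratio = V2/V1
    W_on = nuR * T * math.log(1 + ratio)
    print(f"V2/V1 = {ratio:4.1f}:  work done on the gas = {W_on:7.1f} J")
```

Doubling the volume (V₂ = V₁) costs about νRT ln 2 ≈ 1.7 kJ per mole to undo.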

This page titled 2.4: The First Law of Thermodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.5: Heat Engines and the Second Law of Thermodynamics
There’s no free lunch so quit asking
A heat engine is a device which takes a thermodynamic system through a repeated cycle which can be represented as a succession of equilibrium states: A → B → C ⋯ → A . The net result of such
a cyclic process is to convert heat into mechanical work, or vice versa.

Figure [perfect]: A perfect engine would extract heat Q from a thermal reservoir at some temperature T and convert it into useful mechanical work W . This process is alas impossible, according to
the Second Law of thermodynamics. The inverse process, where work W is converted into heat Q , is always possible.
For a system in equilibrium at temperature T, there is a thermodynamically large amount of internal energy stored in the random internal motion of its constituent particles. Later, when we study statistical mechanics, we will see how each ‘quadratic’ degree of freedom in the Hamiltonian contributes (1/2) k_B T to the total internal energy. An immense body in equilibrium at temperature T has an enormous heat capacity C, hence extracting a finite quantity of heat Q from it results in a temperature change ΔT = −Q/C which is utterly negligible. Such a body is called a heat bath, or thermal reservoir. A perfect engine would, in each cycle, extract an amount of heat Q from the bath and convert it into work. Since ΔE = 0 for a cyclic process, the First Law then gives W = Q. This situation is depicted schematically in Fig. [perfect]. One could imagine running this process virtually indefinitely, slowly sucking energy out of an immense heat bath, converting the random thermal motion of its constituent molecules into useful mechanical work. Sadly, this is not possible:

A transformation whose only final result is to extract heat from a source at fixed temperature and transform that heat into work is impossible.

This is known as the Postulate of Lord Kelvin. It is equivalent to the postulate of Clausius,

A transformation whose only result is to transfer heat from a body at a given temperature to a body at higher temperature is impossible.

These postulates, which have been repeatedly validated by empirical observations, constitute the Second Law of Thermodynamics.

Engines and refrigerators


While it is not possible to convert heat into work with 100% efficiency, it is possible to transfer heat from one thermal reservoir to another one, at lower temperature, and to convert some of that heat into work. This is what an engine does. The energy accounting for one cycle of the engine is depicted in the left hand panel of Fig. [engref]. An amount of heat Q₂ > 0 is extracted from the reservoir at temperature T₂. Since the reservoir is assumed to be enormous, its temperature change ΔT₂ = −Q₂/C₂ is negligible, and its temperature remains constant – this is what it means for an object to be a reservoir. A lesser amount of heat, Q₁, with 0 < Q₁ < Q₂, is deposited in a second reservoir at a lower temperature T₁. Its temperature change ΔT₁ = +Q₁/C₁ is also negligible. The difference W = Q₂ − Q₁ is extracted as useful work. We define the efficiency, η, of the engine as the ratio of the work done to the heat extracted from the upper reservoir, per cycle:

η = W/Q₂ = 1 − Q₁/Q₂ .   (2.5.1)

This is a natural definition of efficiency, since it will cost us fuel to maintain the temperature of the upper reservoir over many cycles of the engine. Thus, the efficiency is proportional to the ratio of
the work done to the cost of the fuel.
A refrigerator works according to the same principles, but the process runs in reverse. An amount of heat Q₁ is extracted from the lower reservoir – the inside of our refrigerator – and is pumped into the upper reservoir. As Clausius’ form of the Second Law asserts, it is impossible for this to be the only result of our cycle. Some amount of work W must be performed on the refrigerator in order for it to extract the heat Q₁. Since ΔE = 0 for the cycle, a heat Q₂ = W + Q₁ must be deposited into the upper reservoir during each cycle. The analog of efficiency here is called the coefficient of refrigeration, κ, defined as

κ = Q₁/W = Q₁/(Q₂ − Q₁) .   (2.5.2)

Thus, κ is proportional to the ratio of the heat extracted to the cost of electricity, per cycle.
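The two figures of merit can be packaged in a few lines; the heat values below are made up purely for illustration.

```python
# Efficiency (Equation 2.5.1) and coefficient of refrigeration (Equation 2.5.2)
# for one cycle of an engine / refrigerator.
def engine_efficiency(Q2, Q1):
    """eta = W / Q2, with W = Q2 - Q1."""
    return 1 - Q1 / Q2

def refrigeration_coefficient(Q2, Q1):
    """kappa = Q1 / W = Q1 / (Q2 - Q1)."""
    return Q1 / (Q2 - Q1)

Q2, Q1 = 1000.0, 600.0       # J per cycle (illustrative numbers)
print(engine_efficiency(Q2, Q1))          # 0.4: 400 J of work per cycle
print(refrigeration_coefficient(Q2, Q1))  # 1.5: 600 J moved per 400 J of work
```

Note that κ, unlike η, can exceed unity; a good refrigerator moves more heat than the work it consumes.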

[engref] An engine (left) extracts heat Q₂ from a reservoir at temperature T₂ and deposits a smaller amount of heat Q₁ into a reservoir at a lower temperature T₁, during each cycle. The difference W = Q₂ − Q₁ is transformed into mechanical work. A refrigerator (right) performs the inverse process, drawing heat Q₁ from a low temperature reservoir and depositing heat Q₂ = Q₁ + W into a high temperature reservoir, where W is the mechanical (or electrical) work done per cycle.

Please note the deliberate notation here. I am using symbols Q and W to denote the heat supplied to the engine (or refrigerator) and the work done by the engine, respectively, and 𝒬 and 𝒲 to denote the heat taken from the engine and the work done on the engine.
A perfect engine has Q₁ = 0 and η = 1; a perfect refrigerator has Q₁ = Q₂ and κ = ∞. Both violate the Second Law. Sadi Carnot⁸ (1796 – 1832) realized that a reversible cyclic engine operating between two thermal reservoirs must produce the maximum amount of work W, and that the amount of work produced is independent of the material properties of the engine. We call any such engine a Carnot engine.

The efficiency of a Carnot engine may be used to define a temperature scale. We know from Carnot’s observations that the efficiency η_C can only be a function of the temperatures T₁ and T₂: η_C = η_C(T₁, T₂). We can then define

T₁/T₂ ≡ 1 − η_C(T₁, T₂) .

Below, in §6.4, we will see how, using an ideal gas as the ‘working substance’ of the Carnot engine, this temperature scale coincides precisely with the ideal gas temperature scale from §2.4.

Nothing beats a Carnot engine


The Carnot engine is the most efficient engine possible operating between two thermal reservoirs. To see this, let’s suppose that an amazing wonder engine has an efficiency even greater than that of the Carnot engine. A key feature of the Carnot engine is its reversibility – we can just go around its cycle in the opposite direction, creating a Carnot refrigerator. Let’s use our notional wonder engine to drive a Carnot refrigerator, as depicted in Fig. [NBC].

We assume that

W/Q₂ = η_wonder > η_Carnot = 𝒲′/𝒬₂′ .

But from the figure, we have W = 𝒲′, and therefore the heat energy 𝒬₂′ − Q₂ transferred to the upper reservoir is positive. From

W = Q₂ − Q₁ = 𝒬₂′ − 𝒬₁′ = 𝒲′ ,   (2.5.3)

we see that this is equal to the heat energy extracted from the lower reservoir, since no external work is done on the system:

𝒬₂′ − Q₂ = 𝒬₁′ − Q₁ > 0 .   (2.5.4)

Therefore, the existence of the wonder engine entails a violation of the Second Law. Since the Second Law is correct – Lord Kelvin articulated it, and who are we to argue with a Lord? – the wonder engine cannot exist.

[NBC] A wonder engine driving a Carnot refrigerator.

We further conclude that all reversible engines running between two thermal reservoirs have the same efficiency, which is the efficiency of a Carnot engine. For an irreversible engine, we must have

η = W/Q₂ = 1 − 𝒬₁/Q₂ ≤ 1 − T₁/T₂ = η_C .

Thus,

Q₂/T₂ − 𝒬₁/T₁ ≤ 0 .   (2.5.5)

The Carnot cycle


Let us now consider a specific cycle, known as the Carnot cycle, depicted in Fig. [carnot]. The cycle consists of two adiabats and two isotherms. The work done per cycle is simply the area inside the curve on our p − V diagram:

W = ∮ p dV .   (2.5.6)

The gas inside our Carnot engine is called the ‘working substance’. Whatever it may be, the system obeys the First Law,

dE = đQ − đW = đQ − p dV .   (2.5.7)

We will now assume that the working material is an ideal gas, and we compute W as well as Q₁ and Q₂ to find the efficiency of this cycle. In order to do this, we will rely upon the ideal gas equations,

E = νRT/(γ − 1) ,  pV = νRT ,   (2.5.8)

where γ = c_p/c_V = 1 + 2/f, where f is the effective number of molecular degrees of freedom contributing to the internal energy. Recall f = 3 for monatomic gases, f = 5 for diatomic gases, and f = 6 for polyatomic gases. The finite difference form of the first law is

ΔE = E_f − E_i = Q_if − W_if ,   (2.5.9)

where i denotes the initial state and f the final state.
[carnot] The Carnot cycle consists of two adiabats (dark red) and two isotherms (blue).

AB: This stage is an isothermal expansion at temperature T₂. It is the ‘power stroke’ of the engine. We have

W_AB = ∫_{V_A}^{V_B} dV νRT₂/V = νRT₂ ln(V_B/V_A) ,  E_A = E_B = νRT₂/(γ − 1) ,

hence

Q_AB = ΔE_AB + W_AB = νRT₂ ln(V_B/V_A) .

BC: This stage is an adiabatic expansion. We have

Q_BC = 0 ,  ΔE_BC = E_C − E_B = (νR/(γ − 1)) (T₁ − T₂) .

The energy change is negative, and the heat exchange is zero, so the engine still does some work during this stage:

W_BC = Q_BC − ΔE_BC = (νR/(γ − 1)) (T₂ − T₁) .

CD: This stage is an isothermal compression, and we may apply the analysis of the isothermal expansion, mutatis mutandis:

W_CD = ∫_{V_C}^{V_D} dV νRT₁/V = νRT₁ ln(V_D/V_C) ,  E_C = E_D = νRT₁/(γ − 1) ,

hence

Q_CD = ΔE_CD + W_CD = νRT₁ ln(V_D/V_C) .

DA: This last stage is an adiabatic compression, and we may draw on the results from the adiabatic expansion in BC:

Q_DA = 0 ,  ΔE_DA = E_A − E_D = (νR/(γ − 1)) (T₂ − T₁) .

The energy change is positive, and the heat exchange is zero, so work is done on the engine:

W_DA = Q_DA − ΔE_DA = (νR/(γ − 1)) (T₁ − T₂) .

We now add up all the work values from the individual stages to get for the cycle

W = W_AB + W_BC + W_CD + W_DA = νRT₂ ln(V_B/V_A) + νRT₁ ln(V_D/V_C) .

Since we are analyzing a cyclic process, we must have ΔE = 0, and hence Q = W, which can of course be verified explicitly by computing Q = Q_AB + Q_BC + Q_CD + Q_DA. To finish up, recall the adiabatic ideal gas equation of state, d(T V^{γ−1}) = 0. This tells us that

T₂ V_B^{γ−1} = T₁ V_C^{γ−1} ,  T₂ V_A^{γ−1} = T₁ V_D^{γ−1} .

Dividing these two equations, we find

V_B/V_A = V_C/V_D ,

and therefore

W = νR(T₂ − T₁) ln(V_B/V_A) ,  Q_AB = νRT₂ ln(V_B/V_A) .

Finally, the efficiency is given by the ratio of these two quantities:

η = W/Q_AB = 1 − T₁/T₂ .
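The full cycle can be traced numerically. The sketch below (illustrative reservoir temperatures and volumes, not from the text) constructs C and D from the adiabatic condition and confirms that η = 1 − T₁/T₂ regardless of the volumes chosen.

```python
import math

# Carnot cycle for a diatomic ideal gas: pick T2 > T1 and the isothermal
# expansion volumes, build C and D from T V^(gamma-1) = const, then verify
# that the efficiency is 1 - T1/T2.
nu, R, gamma = 1.0, 8.314, 7 / 5
T2, T1 = 600.0, 300.0                    # reservoir temperatures (illustrative)
VA, VB = 1.0e-3, 3.0e-3                  # A -> B isotherm at T2

# adiabats B -> C and D -> A:  T2 V^(gamma-1) = T1 V'^(gamma-1)
VC = VB * (T2 / T1) ** (1 / (gamma - 1))
VD = VA * (T2 / T1) ** (1 / (gamma - 1))

W_AB = nu * R * T2 * math.log(VB / VA)   # isothermal expansion (power stroke)
W_BC = nu * R * (T2 - T1) / (gamma - 1)  # adiabatic expansion
W_CD = nu * R * T1 * math.log(VD / VC)   # isothermal compression (negative)
W_DA = -W_BC                             # adiabatic compression
W = W_AB + W_BC + W_CD + W_DA

eta = W / W_AB                           # Q_AB = W_AB along an isotherm
assert abs(eta - (1 - T1 / T2)) < 1e-12
print(f"eta = {eta:.3f}")                # 0.500 for T1/T2 = 1/2
```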

The Stirling cycle


Many other engine cycles are possible. The Stirling cycle, depicted in Fig. [stirling], consists of two isotherms and two isochores. Recall the isothermal ideal gas equation of state, d(pV) = 0. Thus, for an ideal gas Stirling cycle, we have

p_A V₁ = p_B V₂ ,  p_D V₁ = p_C V₂ ,

which says

p_B/p_A = p_C/p_D = V₁/V₂ .

AB: This isothermal expansion is the power stroke. Assuming ν moles of ideal gas throughout, we have pV = νRT₂ = p_A V₁, hence

W_AB = ∫_{V₁}^{V₂} dV νRT₂/V = νRT₂ ln(V₂/V₁) .

Since AB is an isotherm, we have E_A = E_B, and from ΔE_AB = 0 we conclude Q_AB = W_AB.

BC: Isochoric cooling. Since dV = 0 we have W_BC = 0. The energy change is given by

ΔE_BC = E_C − E_B = νR(T₁ − T₂)/(γ − 1) ,

which is negative. Since W_BC = 0, we have Q_BC = ΔE_BC.

[stirling] A Stirling cycle consists of two isotherms (blue) and two isochores (green).

CD: Isothermal compression. Clearly

W_CD = ∫_{V₂}^{V₁} dV νRT₁/V = −νRT₁ ln(V₂/V₁) .

Since CD is an isotherm, we have E_C = E_D, and from ΔE_CD = 0 we conclude Q_CD = W_CD.

DA: Isochoric heating. Since dV = 0 we have W_DA = 0. The energy change is given by

ΔE_DA = E_A − E_D = νR(T₂ − T₁)/(γ − 1) ,

which is positive, and opposite to ΔE_BC. Since W_DA = 0, we have Q_DA = ΔE_DA.

We now add up all the work contributions to obtain

W = W_AB + W_BC + W_CD + W_DA = νR(T₂ − T₁) ln(V₂/V₁) .

The cycle efficiency is once again

η = W/Q_AB = 1 − T₁/T₂ .

The Otto and Diesel cycles


The Otto cycle is a rough approximation to the physics of a gasoline engine. It consists of two adiabats and two isochores, and is depicted in Fig. [otto]. Assuming an ideal gas, along the adiabats we have d(pV^γ) = 0. Thus,

p_A V₁^γ = p_B V₂^γ ,  p_D V₁^γ = p_C V₂^γ ,

which says

p_B/p_A = p_C/p_D = (V₁/V₂)^γ .

[otto] An Otto cycle consists of two adiabats (dark red) and two isochores (green).

AB: Adiabatic expansion, the power stroke. The heat transfer is Q_AB = 0, so from the First Law we have W_AB = −ΔE_AB = E_A − E_B, thus

W_AB = (p_A V₁ − p_B V₂)/(γ − 1) = (p_A V₁/(γ − 1)) [1 − (V₁/V₂)^{γ−1}] .

Note that this result can also be obtained from the adiabatic equation of state pV^γ = p_A V₁^γ:

W_AB = ∫_{V₁}^{V₂} p dV = p_A V₁^γ ∫_{V₁}^{V₂} dV V^{−γ} = (p_A V₁/(γ − 1)) [1 − (V₁/V₂)^{γ−1}] .

BC: Isochoric cooling (exhaust); dV = 0 hence W_BC = 0. The heat Q_BC absorbed is then

Q_BC = E_C − E_B = (V₂/(γ − 1)) (p_C − p_B) .

In a realistic engine, this is the stage in which the old burned gas is ejected and new gas is inserted.

CD: Adiabatic compression; Q_CD = 0 and W_CD = E_C − E_D:

W_CD = (p_C V₂ − p_D V₁)/(γ − 1) = −(p_D V₁/(γ − 1)) [1 − (V₁/V₂)^{γ−1}] .

DA: Isochoric heating, the combustion of the gas. As with BC we have dV = 0, and thus W_DA = 0. The heat Q_DA absorbed by the gas is then

Q_DA = E_A − E_D = (V₁/(γ − 1)) (p_A − p_D) .

[diesel] A Diesel cycle consists of two adiabats (dark red), one isobar (light blue), and one isochore (green).

The total work done per cycle is then

W = W_AB + W_BC + W_CD + W_DA = ((p_A − p_D) V₁/(γ − 1)) [1 − (V₁/V₂)^{γ−1}] ,

and the efficiency is defined to be

η ≡ W/Q_DA = 1 − (V₁/V₂)^{γ−1} .

The ratio V₂/V₁ is called the compression ratio. We can make our Otto cycle more efficient simply by increasing the compression ratio. The problem with this scheme is that if the fuel mixture becomes too hot, it will spontaneously ‘preignite’, and the pressure will jump up before point D in the cycle is reached. A Diesel engine avoids preignition by compressing the air only, and then later spraying the fuel into the cylinder when the air temperature is sufficient for fuel ignition. The rate at which fuel is injected is adjusted so that the ignition process takes place at constant pressure. Thus, in a Diesel engine, step DA is an isobar. The compression ratio is r ≡ V_B/V_D, and the cutoff ratio is s ≡ V_A/V_D. This refinement of the Otto cycle allows for higher compression ratios (of about 20) in practice, and greater engine efficiency.

For the Diesel cycle, we have, briefly,

W = p_A (V_A − V_D) + (p_A V_A − p_B V_B)/(γ − 1) + (p_C V_C − p_D V_D)/(γ − 1)
  = γ p_A (V_A − V_D)/(γ − 1) − (p_B − p_C) V_B/(γ − 1) ,

and

Q_DA = γ p_A (V_A − V_D)/(γ − 1) .

To find the efficiency, we will need to eliminate p_B and p_C in favor of p_A using the adiabatic equation of state d(pV^γ) = 0. Thus,

p_B = p_A · (V_A/V_B)^γ ,  p_C = p_A · (V_D/V_B)^γ ,

where we’ve used p_D = p_A and V_C = V_B. Putting it all together, the efficiency of the Diesel cycle is

η = W/Q_DA = 1 − (1/γ) · r^{1−γ} (s^γ − 1)/(s − 1) .
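Both efficiency formulas are one-liners. The sketch below (illustrative compression ratios) also checks that the Diesel result reduces to the Otto form as the cutoff ratio s → 1, since (s^γ − 1)/(s − 1) → γ in that limit.

```python
# Otto and Diesel cycle efficiencies; the Diesel formula reduces to the
# Otto one as the cutoff ratio s -> 1.
gamma = 7 / 5

def eta_otto(r):
    """r = V2/V1, the compression ratio."""
    return 1 - r ** (1 - gamma)

def eta_diesel(r, s):
    """r = VB/VD (compression ratio), s = VA/VD (cutoff ratio)."""
    return 1 - (1 / gamma) * r ** (1 - gamma) * (s ** gamma - 1) / (s - 1)

print(f"Otto,   r = 10:         eta = {eta_otto(10):.3f}")        # 0.602
print(f"Diesel, r = 20, s = 2:  eta = {eta_diesel(20, 2):.3f}")   # 0.647
# s -> 1 limit recovers the Otto value at the same r
assert abs(eta_diesel(10, 1 + 1e-9) - eta_otto(10)) < 1e-6
```

The numbers illustrate the point made in the text: the Diesel cycle's higher admissible compression ratio more than compensates for the (1/γ)(s^γ − 1)/(s − 1) penalty factor.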

The Joule-Brayton cycle


Our final example is the Joule-Brayton cycle, depicted in Fig. [jbray], consisting of two adiabats and two isobars. Along the adiabats we have d(pV^γ) = 0. Thus,

p₂ V_A^γ = p₁ V_D^γ ,  p₂ V_B^γ = p₁ V_C^γ ,

which says

V_D/V_A = V_C/V_B = (p₂/p₁)^{γ^{−1}} .

AB: This isobaric expansion at p = p₂ is the power stroke. We have

W_AB = ∫_{V_A}^{V_B} dV p₂ = p₂ (V_B − V_A)
ΔE_AB = E_B − E_A = p₂ (V_B − V_A)/(γ − 1)
Q_AB = ΔE_AB + W_AB = γ p₂ (V_B − V_A)/(γ − 1) .

BC: Adiabatic expansion; Q_BC = 0 and W_BC = E_B − E_C. The work done by the gas is

W_BC = (p₂ V_B − p₁ V_C)/(γ − 1) = (p₂ V_B/(γ − 1)) (1 − (p₁/p₂)·(V_C/V_B)) = (p₂ V_B/(γ − 1)) [1 − (p₁/p₂)^{1−γ^{−1}}] .

[jbray] A Joule-Brayton cycle consists of two adiabats (dark red) and two isobars (light blue).

CD: Isobaric compression at p = p₁.

W_CD = ∫_{V_C}^{V_D} dV p₁ = p₁ (V_D − V_C) = −p₂ (V_B − V_A) (p₁/p₂)^{1−γ^{−1}}
ΔE_CD = E_D − E_C = p₁ (V_D − V_C)/(γ − 1)
Q_CD = ΔE_CD + W_CD = γ p₁ (V_D − V_C)/(γ − 1) .

DA: Adiabatic compression; Q_DA = 0 and W_DA = E_D − E_A. The work done by the gas is

W_DA = (p₁ V_D − p₂ V_A)/(γ − 1) = −(p₂ V_A/(γ − 1)) (1 − (p₁/p₂)·(V_D/V_A)) = −(p₂ V_A/(γ − 1)) [1 − (p₁/p₂)^{1−γ^{−1}}] .

The total work done per cycle is then

W = W_AB + W_BC + W_CD + W_DA = (γ p₂ (V_B − V_A)/(γ − 1)) [1 − (p₁/p₂)^{1−γ^{−1}}] ,

and the efficiency is defined to be

η ≡ W/Q_AB = 1 − (p₁/p₂)^{1−γ^{−1}} .
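As a quick illustration (arbitrary pressure ratios, chosen for this example), the Brayton efficiency depends only on p₂/p₁ and γ, with diminishing returns as the pressure ratio grows.

```python
# Brayton (Joule) cycle efficiency as a function of the pressure ratio p2/p1.
gamma = 7 / 5

def eta_brayton(pratio):
    """pratio = p2/p1 > 1."""
    return 1 - pratio ** -(1 - 1 / gamma)

for pratio in (5, 10, 20):
    print(f"p2/p1 = {pratio:2d}:  eta = {eta_brayton(pratio):.3f}")
```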

Carnot engine at maximum power output


While the Carnot engine described above in §6.4 has maximum efficiency, it is practically useless, because the isothermal processes must take place infinitely slowly in order for the working material to remain in thermal equilibrium with each reservoir. Thus, while the work done per cycle is finite, the cycle period is infinite, and the engine power is zero.

A modification of the ideal Carnot cycle is necessary to create a practical engine. The idea⁹ is as follows. During the isothermal expansion stage, the working material is maintained at a temperature T₂w < T₂. The temperature difference between the working material and the hot reservoir drives a thermal current,

đQ₂/dt = κ₂ (T₂ − T₂w) .   (2.5.10)

Here, κ₂ is a transport coefficient which describes the thermal conductivity of the chamber walls, multiplied by a geometric parameter (which is the ratio of the total wall area to its thickness). Similarly, during the isothermal compression, the working material is maintained at a temperature T₁w > T₁, which drives a thermal current to the cold reservoir,

đQ₁/dt = κ₁ (T₁w − T₁) .   (2.5.11)

Now let us assume that the upper isothermal stage requires a duration Δt₂ and the lower isotherm a duration Δt₁. Then

Q₂ = κ₂ Δt₂ (T₂ − T₂w)
Q₁ = κ₁ Δt₁ (T₁w − T₁) .

Since the engine is reversible, we must have

Q₁/T₁w = Q₂/T₂w ,   (2.5.12)

which says

Δt₁/Δt₂ = κ₂ T₁w (T₂ − T₂w) / [κ₁ T₂w (T₁w − T₁)] .   (2.5.13)

The power is

P = (Q₂ − Q₁) / [(1 + α)(Δt₁ + Δt₂)] ,   (2.5.14)

where we assume that the adiabatic stages require a combined time of α (Δt₁ + Δt₂). Thus, we find

P = (κ₁κ₂/(1 + α)) · (T₂w − T₁w)(T₁w − T₁)(T₂ − T₂w) / [κ₁ T₂w (T₁w − T₁) + κ₂ T₁w (T₂ − T₂w)] .   (2.5.15)

[pptab] Observed performances of real heat engines, taken from table 1 from Curzon and Ahlborn (1975).

Power source                                      T₁ (°C)   T₂ (°C)   η_Carnot   η (theor.)   η (obs.)
West Thurrock (UK) Coal Fired Steam Plant          ∼ 25       565       0.641       0.40         0.36
CANDU (Canada) PHW Nuclear Reactor                 ∼ 25       300       0.480       0.28         0.30
Larderello (Italy) Geothermal Steam Plant          ∼ 80       250       0.323       0.175        0.16

We optimize the engine by maximizing P with respect to the temperatures T₁w and T₂w. This yields

T₂w = T₂ − (T₂ − √(T₁T₂)) / (1 + √(κ₂/κ₁))
T₁w = T₁ + (√(T₁T₂) − T₁) / (1 + √(κ₁/κ₂)) .

The efficiency at maximum power is then

η = (Q₂ − Q₁)/Q₂ = 1 − T₁w/T₂w = 1 − √(T₁/T₂) .   [MCeff]

One also finds at maximum power

Δt₂/Δt₁ = √(κ₁/κ₂) .

Finally, the maximized power is

P_max = (κ₁κ₂/(1 + α)) · ((√T₂ − √T₁)/(√κ₁ + √κ₂))² .   (2.5.16)

Table [pptab], taken from the article of Curzon and Ahlborn (1975), shows how the efficiency of this practical Carnot cycle, given by Equation [MCeff], rather accurately predicts the efficiencies of functioning power plants.
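The comparison in the table is easy to reproduce. The sketch below evaluates both 1 − T₁/T₂ and the maximum-power result 1 − √(T₁/T₂) for the three plants; small differences from the tabulated values reflect rounding of the reservoir temperatures.

```python
import math

# Compare eta_Carnot = 1 - T1/T2 with the Curzon-Ahlborn maximum-power
# efficiency eta = 1 - sqrt(T1/T2) for the three plants in table [pptab].
plants = [                    # (name, T1 in deg C, T2 in deg C, observed eta)
    ("West Thurrock coal",     25, 565, 0.36),
    ("CANDU nuclear",          25, 300, 0.30),
    ("Larderello geothermal",  80, 250, 0.16),
]
for name, t1, t2, eta_obs in plants:
    T1, T2 = t1 + 273.15, t2 + 273.15        # convert to kelvin
    eta_carnot = 1 - T1 / T2
    eta_maxP = 1 - math.sqrt(T1 / T2)
    print(f"{name:22s}  Carnot {eta_carnot:.3f}   "
          f"max power {eta_maxP:.3f}   observed {eta_obs:.2f}")
```

In each case the maximum-power estimate tracks the observed efficiency far better than the Carnot bound does.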

This page titled 2.5: Heat Engines and the Second Law of Thermodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.6: The Entropy
Entropy and heat
The Second Law guarantees us that an engine operating between two heat baths at temperatures T₁ and T₂ must satisfy

Q₁/T₁ + Q₂/T₂ ≤ 0 ,   (2.6.1)

with the equality holding for reversible processes. This is a restatement of Equation (2.5.5), after writing Q₁ = −𝒬₁ for the heat transferred to the engine from reservoir #1. Consider now an arbitrary curve in the p − V plane. We can describe such a curve, to arbitrary accuracy, as a combination of Carnot cycles, as shown in Fig. [mcarnot]. Each little Carnot cycle consists of two adiabats and two isotherms. We then conclude

∑ᵢ Qᵢ/Tᵢ ⟶ ∮_C đQ/T ≤ 0 ,   (2.6.2)

with equality holding if all the cycles are reversible. Rudolf Clausius, in 1865, realized that one could then define a new state function, which he called the entropy, S, that depended only on the initial and final states of a reversible process:

dS = đQ/T  ⟹  S_B − S_A = ∫_A^B đQ/T .   [dseqn]

Since Q is extensive, so is S; the units of entropy are [S] = J/K.

[mcarnot] An arbitrarily shaped cycle in the p − V plane can be decomposed into a number of smaller Carnot cycles. Red curves indicate isotherms and blue curves adiabats, with γ = 5/3.

The Third Law of Thermodynamics


Equation [dseqn] determines the entropy up to a constant. By choosing a standard state Υ, we can define S_Υ = 0, and then by taking A = Υ in the above equation, we can define the absolute entropy S for any state. However, it turns out that this seemingly arbitrary constant S_Υ in the entropy does have consequences, for example in the theory of gaseous equilibrium. The proper definition of entropy, from the point of view of statistical mechanics, will lead us to understand how the zero temperature entropy of a system is related to its quantum mechanical ground state degeneracy. Walther Nernst, in 1906, articulated a principle which is sometimes called the Third Law of Thermodynamics:

The entropy of every system at absolute zero temperature always vanishes.

Again, this is not quite correct, and quantum mechanics tells us that S(T = 0) = k_B ln g, where g is the ground state degeneracy. Nernst’s law holds when g = 1.

We can combine the First and Second laws to write

dE + đW = đQ ≤ T dS ,   (2.6.3)

where the equality holds for reversible processes.

Entropy changes in cyclic processes


For a cyclic process, whether reversible or not, the change in entropy around a cycle is zero: ΔS_CYC = 0. This is because the entropy S is a state function, with a unique value for every equilibrium state. A cyclical process returns to the same equilibrium state, hence S must return as well to its corresponding value from the previous cycle.

Consider now a general engine, as in Fig. [engref]. Let us compute the total entropy change in the entire Universe over one cycle. We have

(ΔS)_TOTAL = (ΔS)_ENGINE + (ΔS)_HOT + (ΔS)_COLD ,

written as a sum over entropy changes of the engine itself, the hot reservoir, and the cold reservoir¹⁰. Clearly (ΔS)_ENGINE = 0. The changes in the reservoir entropies are

(ΔS)_HOT = ∫_{T=T₂} đQ_HOT/T = −Q₂/T₂ < 0
(ΔS)_COLD = ∫_{T=T₁} đQ_COLD/T = +𝒬₁/T₁ > 0 ,

because the hot reservoir loses heat Q₂ > 0 to the engine, and the cold reservoir gains heat 𝒬₁ = −Q₁ > 0 from the engine. Therefore,

(ΔS)_TOTAL = −(Q₁/T₁ + Q₂/T₂) ≥ 0 .

Thus, for a reversible cycle, the net change in the total entropy of the engine plus reservoirs is zero. For an irreversible cycle, there is an increase in total entropy, due to spontaneous processes.

Gibbs-Duhem relation
Recall Equation [dwork]:

\mathchar 26
dW = − ∑ y dX − ∑ μa dNa . (2.6.4)
j j

j a

For reversible systems, we can therefore write

dE = T dS + ∑ y dX + ∑ μa dNa . (2.6.5)
j j

j a

This says that the energy $E$ is a function of the entropy $S$, the generalized displacements $\{X_j\}$, and the particle numbers $\{N_a\}$:

$$E = E\big(S, \{X_j\}, \{N_a\}\big)\ . \qquad (2.6.6)$$

Furthermore, we have

$$T = \bigg({\partial E\over\partial S}\bigg)_{\{X_j,N_a\}} \quad,\quad y_j = \bigg({\partial E\over\partial X_j}\bigg)_{S,\{X_{i(\ne j)},N_a\}} \quad,\quad \mu_a = \bigg({\partial E\over\partial N_a}\bigg)_{S,\{X_j,N_{b(\ne a)}\}}\ . \qquad (2.6.7)$$

2.6.1 https://phys.libretexts.org/@go/page/18858

Since E and all its arguments are extensive, we have

$$\lambda E = E\big(\lambda S, \{\lambda X_j\}, \{\lambda N_a\}\big)\ . \qquad (2.6.8)$$

We now differentiate the LHS and RHS above with respect to λ , setting λ = 1 afterward. The result is
$$E = S\,{\partial E\over\partial S} + \sum_j X_j\,{\partial E\over\partial X_j} + \sum_a N_a\,{\partial E\over\partial N_a}
= TS + \sum_j y_j\,X_j + \sum_a \mu_a\,N_a\ .$$

Mathematically astute readers will recognize this result as an example of Euler’s theorem for homogeneous functions. Taking the differential of Equation [ETS], and then subtracting Equation
[dErev], we obtain

$$S\,dT + \sum_j X_j\,dy_j + \sum_a N_a\,d\mu_a = 0\ . \qquad (2.6.9)$$

This is called the Gibbs-Duhem relation. It says that there is one equation of state which may be written in terms of all the intensive quantities alone. For example, for a single component system, we
must have p = p(T , μ) , which follows from
$$S\,dT - V\,dp + N\,d\mu = 0\ . \qquad (2.6.10)$$
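The Euler relation $E = TS - pV + \mu N$ (whose differential yields Gibbs-Duhem) can be checked symbolically for the ideal gas entropy of Equation 2.6.14. The sketch below uses sympy; the constant $b$ of that equation is written `s0` here.

```python
import sympy as sp

E, V, N, kB, f, s0 = sp.symbols('E V N k_B f s_0', positive=True)

# Extensive ideal gas entropy of Eq. 2.6.14 (the constant b is written s0 here)
S = N*kB*(sp.log(V/N) + (f/2)*sp.log(E/N) + s0)

T  = 1/sp.diff(S, E)        # 1/T  =  (dS/dE)_{V,N}
p  = T*sp.diff(S, V)        # p/T  =  (dS/dV)_{E,N}
mu = -T*sp.diff(S, N)       # mu/T = -(dS/dN)_{E,V}

# Euler's theorem for extensive E(S,V,N): E = T S - p V + mu N,
# whose differential is the Gibbs-Duhem relation S dT - V dp + N dmu = 0
print(sp.simplify(T*S - p*V + mu*N - E))   # -> 0
```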

Entropy for an ideal gas


For an ideal gas, we have $E = \frac{1}{2} f N k_B T$, and

$$dS = {1\over T}\,dE + {p\over T}\,dV - {\mu\over T}\,dN
= \tfrac{1}{2} f N k_B\,{dT\over T} + {p\over T}\,dV + \Big(\tfrac{1}{2} f k_B - {\mu\over T}\Big)\,dN\ .$$

Invoking the ideal gas equation of state pV = N kB T , we have


$$dS\big|_N = \tfrac{1}{2} f N k_B\,d\ln T + N k_B\,d\ln V\ . \qquad (2.6.11)$$

Integrating, we obtain
$$S(T,V,N) = \tfrac{1}{2} f N k_B \ln T + N k_B \ln V + \varphi(N)\ , \qquad (2.6.12)$$

where φ(N ) is an arbitrary function. Extensivity of S places restrictions on φ(N ), so that the most general case is
$$S(T,V,N) = \tfrac{1}{2} f N k_B \ln T + N k_B \ln\!\Big({V\over N}\Big) + N a\ , \qquad (2.6.13)$$

where a is a constant. Equivalently, we could write


$$S(E,V,N) = \tfrac{1}{2} f N k_B \ln\!\Big({E\over N}\Big) + N k_B \ln\!\Big({V\over N}\Big) + N b\ , \qquad (2.6.14)$$

where $b = a - \tfrac{1}{2} f k_B \ln\big(\tfrac{1}{2} f k_B\big)$ is another constant. When we study statistical mechanics, we will find that for the monatomic ideal gas the entropy is

$$S(T,V,N) = N k_B\bigg[{5\over 2} + \ln\!\bigg({V\over N\lambda_T^3}\bigg)\bigg]\ , \qquad (2.6.15)$$

where $\lambda_T = \sqrt{2\pi\hbar^2/m k_B T}$ is the thermal wavelength, which involves Planck's constant. Let's now contrast two illustrative cases.
Adiabatic free expansion – Suppose the volume freely expands from $V_i$ to $V_f = r V_i$, with $r > 1$. Such an expansion can be effected by a removal of a partition between two chambers that are otherwise thermally insulated (see Fig. [AFE]). We have already seen how this process entails

ΔE = Q = W = 0 . (2.6.16)

But the entropy changes! According to Equation [SEVN], we have

$$\Delta S = S_f - S_i = N k_B \ln r\ . \qquad (2.6.17)$$

Reversible adiabatic expansion – If the gas expands quasistatically and reversibly, then $S = S(E,V,N)$ holds everywhere along the thermodynamic path. We then have, assuming $dN = 0$,

$$0 = dS = \tfrac{1}{2} f N k_B\,{dE\over E} + N k_B\,{dV\over V} = N k_B\, d\ln\!\big(V E^{f/2}\big)\ .$$

Integrating, we find
$${E\over E_0} = \bigg({V_0\over V}\bigg)^{\!2/f}\ . \qquad (2.6.18)$$

Thus,
$$E_f = r^{-2/f}\,E_i \quad\Longleftrightarrow\quad T_f = r^{-2/f}\,T_i\ . \qquad (2.6.19)$$
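The two cases can be contrasted numerically. The sketch below uses illustrative values ($f = 3$, $r = 2$, units where $N k_B = 1$) together with Eqs. 2.6.17 and 2.6.19.

```python
import math

f, r = 3, 2.0      # degrees of freedom per particle, volume ratio V_f/V_i
N_kB = 1.0         # work in units where N k_B = 1
T_i = 300.0        # initial temperature (K), illustrative

# Adiabatic free expansion: E (hence T) is unchanged, but S grows (Eq. 2.6.17)
dS_free = N_kB * math.log(r)

# Reversible adiabatic expansion: S is constant, but T drops (Eq. 2.6.19)
T_f = r**(-2/f) * T_i

print(f"free expansion:     dS = {dS_free:.4f} N k_B, T stays {T_i:.0f} K")
print(f"reversible adiabat: dS = 0, T falls to {T_f:.1f} K")
```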

Example system
Consider a model thermodynamic system for which
$$E(S,V,N) = {a S^3\over N V}\ , \qquad (2.6.20)$$

where a is a constant. We have


dE = T dS − p dV + μ dN , (2.6.21)

and therefore
$$\begin{aligned}
T &= \bigg({\partial E\over\partial S}\bigg)_{V,N} = {3 a S^2\over N V}\\
p &= -\bigg({\partial E\over\partial V}\bigg)_{S,N} = {a S^3\over N V^2}\\
\mu &= \bigg({\partial E\over\partial N}\bigg)_{S,V} = -{a S^3\over N^2 V}\ .
\end{aligned}$$

Choosing any two of these equations, we can eliminate $S$, which is inconvenient for experimental purposes. This yields three equations of state,

$${T^3 N\over p^2\, V} = 27\,a \quad,\quad {T^3 V\over \mu^2 N} = 27\,a \quad,\quad {p\,V\over \mu\,N} = -1\ , \qquad (2.6.22)$$

only two of which are independent.


What about $C_V$ and $C_p$? To find $C_V$, we recast Equation [EXA] as

$$S = \bigg({N V T\over 3a}\bigg)^{\!1/2}\ . \qquad (2.6.23)$$

We then have
$$C_V = T\bigg({\partial S\over\partial T}\bigg)_{V,N} = {1\over 2}\bigg({N V T\over 3a}\bigg)^{\!1/2} = {N T^2\over 18\,a\,p}\ , \qquad (2.6.24)$$

where the last equality on the RHS follows upon invoking the first of the equations of state in Equation [TEOS]. To find $C_p$, we eliminate $V$ from eqns. [EXA] and [EXB], obtaining $T^2/p = 9aS/N$. From this we obtain

$$C_p = T\bigg({\partial S\over\partial T}\bigg)_{p,N} = {2 N T^2\over 9\,a\,p}\ . \qquad (2.6.25)$$

Thus, $C_p/C_V = 4$.
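The algebra leading to $C_p/C_V = 4$ can be verified symbolically. This is a sketch with sympy, reproducing Eqs. 2.6.24 and 2.6.25 for the model system of Eq. 2.6.20.

```python
import sympy as sp

S, V, N, T, p, a = sp.symbols('S V N T p a', positive=True)

E = a*S**3/(N*V)                      # the model system, Eq. 2.6.20

# C_V: invert T = (dE/dS)_{V,N} = 3aS^2/(NV) for S(T,V), then C_V = T dS/dT
S_TV = sp.sqrt(N*V*T/(3*a))
C_V = sp.simplify(T*sp.diff(S_TV, T))            # = (1/2) sqrt(NVT/3a)

# C_p: eliminate V via T^2/p = 9aS/N, then C_p = T dS/dT at fixed p
S_Tp = N*T**2/(9*a*p)
C_p = sp.simplify(T*sp.diff(S_Tp, T))            # = 2NT^2/(9ap)

# Express C_V in (T,p) using the equation of state, V = T^3 N/(27 a p^2)
C_V_Tp = sp.simplify(C_V.subs(V, T**3*N/(27*a*p**2)))
print(sp.simplify(C_p/C_V_Tp))                   # -> 4
```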
We can derive still more. To find the isothermal compressibility $\kappa_T = -{1\over V}\big({\partial V\over\partial p}\big)_{T,N}$, use the first of the equations of state in Equation [TEOS]. To derive the adiabatic compressibility $\kappa_S = -{1\over V}\big({\partial V\over\partial p}\big)_{S,N}$, use Equation [EXB], and then eliminate the inconvenient variable $S$.

Suppose we use this system as the working substance for a Carnot engine. Let's compute the work done and the engine efficiency. To do this, it is helpful to eliminate $S$ in the expression for the energy, and to rewrite the equation of state:

$$E = pV = \sqrt{N\over 27\,a}\;V^{1/2}\,T^{3/2} \qquad,\qquad p = \sqrt{N\over 27\,a}\;{T^{3/2}\over V^{1/2}}\ .$$

We assume $dN = 0$ throughout. We now see that for isotherms,

$$dT = 0:\qquad {E\over\sqrt{V}} = \hbox{constant}\ . \qquad (2.6.26)$$

Furthermore, since

$$\dbar W\big|_T = \sqrt{N\over 27\,a}\;T^{3/2}\,{dV\over V^{1/2}} = 2\,dE\big|_T\ ,$$

we conclude that

$$dT = 0:\qquad W_{if} = 2(E_f - E_i) \quad,\quad Q_{if} = E_f - E_i + W_{if} = 3(E_f - E_i)\ . \qquad (2.6.27)$$

For adiabats, Equation [EXA] says $d(TV) = 0$, and therefore

$$\dbar Q = 0:\qquad TV = \hbox{constant} \quad,\quad {E\over T} = \hbox{constant} \quad,\quad EV = \hbox{constant}\ , \qquad (2.6.28)$$

as well as $W_{if} = E_i - E_f$. We can use these relations to derive the following:

$$E_{\rm B} = \sqrt{V_{\rm B}\over V_{\rm A}}\;E_{\rm A} \quad,\quad E_{\rm C} = {T_1\over T_2}\sqrt{V_{\rm B}\over V_{\rm A}}\;E_{\rm A} \quad,\quad E_{\rm D} = {T_1\over T_2}\,E_{\rm A}\ .$$

Now we can write

$$\begin{aligned}
W_{\rm AB} &= 2(E_{\rm B} - E_{\rm A}) = 2\Bigg(\sqrt{V_{\rm B}\over V_{\rm A}}-1\Bigg)E_{\rm A}\\
W_{\rm BC} &= E_{\rm B} - E_{\rm C} = \sqrt{V_{\rm B}\over V_{\rm A}}\Bigg(1-{T_1\over T_2}\Bigg)E_{\rm A}\\
W_{\rm CD} &= 2(E_{\rm D} - E_{\rm C}) = 2\,{T_1\over T_2}\Bigg(1-\sqrt{V_{\rm B}\over V_{\rm A}}\Bigg)E_{\rm A}\\
W_{\rm DA} &= E_{\rm D} - E_{\rm A} = \Bigg({T_1\over T_2}-1\Bigg)E_{\rm A}\ .
\end{aligned}$$

Adding up all the work, we obtain

$$W = W_{\rm AB}+W_{\rm BC}+W_{\rm CD}+W_{\rm DA} = 3\Bigg(\sqrt{V_{\rm B}\over V_{\rm A}}-1\Bigg)\Bigg(1-{T_1\over T_2}\Bigg)E_{\rm A}\ .$$

Since

$$Q_{\rm AB} = 3(E_{\rm B}-E_{\rm A}) = \tfrac{3}{2}\,W_{\rm AB} = 3\Bigg(\sqrt{V_{\rm B}\over V_{\rm A}}-1\Bigg)E_{\rm A}\ ,$$

we find once again

$$\eta = {W\over Q_{\rm AB}} = 1 - {T_1\over T_2}\ .$$
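A quick numerical traversal of the cycle confirms the Carnot efficiency. The values below are illustrative, with units chosen so that $\sqrt{N/27a} = 1$.

```python
import math

T2, T1 = 400.0, 300.0       # hot and cold reservoir temperatures (K)
VA, VB = 1.0, 4.0           # isothermal expansion A -> B at T2

E = lambda V, T: math.sqrt(V)*T**1.5    # E = pV = sqrt(N/27a) V^(1/2) T^(3/2)

# adiabats obey T V = constant, fixing states C and D
VC, VD = VB*T2/T1, VA*T2/T1
EA, EB = E(VA, T2), E(VB, T2)
EC, ED = E(VC, T1), E(VD, T1)

# isotherm AB, adiabat BC, isotherm CD, adiabat DA
W = 2*(EB - EA) + (EB - EC) + 2*(ED - EC) + (ED - EA)
Q_AB = 3*(EB - EA)

print(W/Q_AB, 1 - T1/T2)    # both equal 0.25
```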

Measuring the entropy of a substance


If we can measure the heat capacity $C_V(T)$ or $C_p(T)$ of a substance as a function of temperature down to the lowest temperatures, then we can measure the entropy. At constant pressure, for example, we have $T\,dS = C_p\,dT$, hence

$$S(p,T) = S(p,T=0) + \int\limits_0^T\! dT'\,{C_p(T')\over T'}\ . \qquad (2.6.29)$$

The zero temperature entropy is S(p, T = 0) = kB ln g where g is the quantum ground state degeneracy at pressure p. In all but highly unusual cases, g = 1 and S(p, T = 0) = 0 .
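Equation 2.6.29 is how entropies are obtained in practice from calorimetry. As a sketch, assume a Debye-like low-temperature heat capacity $C_p = A T^3$ (a stand-in for measured data), for which the integral can be checked in closed form: $S(T) = A T^3/3$.

```python
import numpy as np

A, T_max = 1.0e-4, 50.0                  # assumed C_p = A T^3 (Debye-like)
T = np.linspace(1e-8, T_max, 200001)
integrand = A*T**2                       # C_p(T)/T

# trapezoidal integration of C_p/T dT' from ~0 up to T_max
S_num = np.sum(0.5*(integrand[1:] + integrand[:-1])*np.diff(T))
S_exact = A*T_max**3/3

print(S_num, S_exact)                    # agree to many digits
```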

This page titled 2.6: The Entropy is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.7: Thermodynamic Potentials
Thermodynamic systems may do work on their environments. Under certain constraints, the work done may be bounded from
above by the change in an appropriately defined thermodynamic potential.

Energy E
Suppose we wish to create a thermodynamic system from scratch. Let’s imagine that we create it from scratch in a thermally
insulated box of volume V . The work we must do to assemble the system is then W = E . After we bring all the constituent
particles together, pulling them in from infinity (say), the system will have total energy E . After we finish, the system may not be
in thermal equilibrium. Spontaneous processes will then occur so as to maximize the system’s entropy, but the internal energy
remains at E .
We have, from the First Law, $dE = \dbar Q - \dbar W$, and combining this with the Second Law in the form $\dbar Q \le T\,dS$ yields

$$dE \le T\,dS - \dbar W\ . \qquad (2.7.1)$$

Rearranging terms, we have $\dbar W \le T\,dS - dE$. Hence, the work done by a thermodynamic system under conditions of constant entropy is bounded above by $-dE$, and the maximum $\dbar W$ is achieved for a reversible process. It is sometimes useful to define the quantity

$$\dbar W_{\rm free} = \dbar W - p\,dV\ , \qquad (2.7.2)$$

which is the differential work done by the system other than that required to change its volume. Then we have

$$\dbar W_{\rm free} \le T\,dS - p\,dV - dE\ , \qquad (2.7.3)$$

and we conclude for systems at fixed $(S,V)$ that $\dbar W_{\rm free} \le -dE$.
In equilibrium, the equality in Equation [dEeqn] holds, and for single component systems where $\dbar W = p\,dV - \mu\,dN$ we have $E = E(S,V,N)$ with

$$T = \bigg({\partial E\over\partial S}\bigg)_{V,N} \quad,\quad -p = \bigg({\partial E\over\partial V}\bigg)_{S,N} \quad,\quad \mu = \bigg({\partial E\over\partial N}\bigg)_{S,V}\ . \qquad (2.7.4)$$

These expressions are easily generalized to multicomponent systems, magnetic systems, and so on.


Now consider a single component system at fixed (S, V , N ). We conclude that dE ≤ 0 , which says that spontaneous processes in
a system with dS = dV = dN = 0 always lead to a reduction in the internal energy E . Therefore, spontaneous processes drive
the internal energy E to a minimum in systems at fixed (S, V , N ).

Helmholtz free energy F


Suppose we spontaneously create our system while it is in constant contact with a thermal reservoir at temperature T .
Then as we create our system, it will absorb heat from the reservoir. Therefore, we don’t have to supply the full internal energy E ,
but rather only E − Q , since the system receives heat energy Q from the reservoir. In other words, we must perform work
W = E − T S to create our system, if it is constantly in equilibrium at temperature T . The quantity E − T S is known as the

Helmholtz free energy, F , which is related to the energy E by a Legendre transformation,

F = E −TS . (2.7.5)

The general properties of Legendre transformations are discussed in Appendix II, §16.
Again invoking the Second Law, we have

$$dF \le -S\,dT - \dbar W\ . \qquad (2.7.6)$$

Rearranging terms, we have $\dbar W \le -S\,dT - dF$, which says that the work done by a thermodynamic system under conditions of constant temperature is bounded above by $-dF$, and the maximum $\dbar W$ is achieved for a reversible process. We also have the general result
2.7.1 https://phys.libretexts.org/@go/page/18859

$$\dbar W_{\rm free} \le -S\,dT - p\,dV - dF\ , \qquad (2.7.7)$$

and we conclude, for systems at fixed $(T,V)$, that $\dbar W_{\rm free} \le -dF$.
Under equilibrium conditions, the equality in Equation [dFeqn] holds, and for single component systems where $\dbar W = p\,dV - \mu\,dN$ we have $dF = -S\,dT - p\,dV + \mu\,dN$. This says that $F = F(T,V,N)$ with

$$-S = \bigg({\partial F\over\partial T}\bigg)_{V,N} \quad,\quad -p = \bigg({\partial F\over\partial V}\bigg)_{T,N} \quad,\quad \mu = \bigg({\partial F\over\partial N}\bigg)_{T,V}\ . \qquad (2.7.8)$$

For spontaneous processes, dF ≤ −S dT − p dV + μ dN says that spontaneous processes drive the Helmholtz free energy F to a
minimum in systems at fixed (T , V , N ).

Enthalpy H
Suppose we spontaneously create our system while it is thermally insulated, but in constant mechanical contact with a
‘volume bath’ at pressure p. For example, we could create our system inside a thermally insulated chamber with one movable wall
where the external pressure is fixed at p. Thus, when creating the system, in addition to the system’s internal energy E , we must
also perform work pV in order to make room for it. In other words, we must perform work W = E + pV . The quantity E + pV is
known as the enthalpy, H. (We use the calligraphic font for H for enthalpy to avoid confusing it with magnetic field, H .) The
enthalpy is obtained from the energy via a different Legendre transformation than that used to obtain the Helmholtz free energy F ,
H = E + pV . (2.7.9)

Again invoking the Second Law, we have

$$dH \le T\,dS - \dbar W + p\,dV + V\,dp\ , \qquad (2.7.10)$$

hence with $\dbar W_{\rm free} = \dbar W - p\,dV$, we have in general

$$\dbar W_{\rm free} \le T\,dS + V\,dp - dH\ , \qquad (2.7.11)$$

and we conclude, for systems at fixed $(S,p)$, that $\dbar W_{\rm free} \le -dH$.
In equilibrium, for single component systems,

dH = T dS + V dp + μ dN , (2.7.12)

which says H = H(S, p, N ) , with

$$T = \bigg({\partial H\over\partial S}\bigg)_{p,N} \quad,\quad V = \bigg({\partial H\over\partial p}\bigg)_{S,N} \quad,\quad \mu = \bigg({\partial H\over\partial N}\bigg)_{S,p}\ . \qquad (2.7.13)$$

For spontaneous processes, dH ≤ T dS + V dp + μ dN , which says that spontaneous processes drive the enthalpy H to a
minimum in systems at fixed (S, p, N ).

Gibbs free energy G


If we create a thermodynamic system at conditions of constant temperature T and constant pressure p, then it absorbs heat energy
Q = T S from the reservoir and we must expend work energy pV in order to make room for it. Thus, the total amount of work we

must do in assembling our system is W = E − T S + pV . This is the Gibbs free energy, G. The Gibbs free energy is obtained
from E after two Legendre transformations,
G = E − T S + pV (2.7.14)

Note that $G = F + pV = H - TS$. The Second Law says that

$$dG \le -S\,dT + V\,dp + p\,dV - \dbar W\ , \qquad (2.7.15)$$

which we may rearrange as $\dbar W_{\rm free} \le -S\,dT + V\,dp - dG$. Accordingly, we conclude, for systems at fixed $(T,p)$, that $\dbar W_{\rm free} \le -dG$.

For equilibrium one-component systems, the differential of G is

dG = −S dT + V dp + μ dN , (2.7.16)

therefore G = G(T , p, N ) , with


$$-S = \bigg({\partial G\over\partial T}\bigg)_{p,N} \quad,\quad V = \bigg({\partial G\over\partial p}\bigg)_{T,N} \quad,\quad \mu = \bigg({\partial G\over\partial N}\bigg)_{T,p}\ . \qquad (2.7.17)$$

Recall that Euler's theorem for single component systems requires $E = TS - pV + \mu N$, which says $G = \mu N$. Thus, the chemical potential $\mu$ is the Gibbs free energy per particle. For spontaneous processes, $dG \le -S\,dT + V\,dp + \mu\,dN$, hence spontaneous processes drive the Gibbs free energy $G$ to a minimum in systems at fixed $(T,p,N)$.
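The identity $G = \mu N$ can be checked explicitly for the monatomic ideal gas, whose Helmholtz free energy (consistent with the entropy of Eq. 2.6.15) is $F = -N k_B T\,[\ln(V/N\lambda_T^3) + 1]$. A sympy sketch:

```python
import sympy as sp

T, V, N, kB, m, hbar = sp.symbols('T V N k_B m hbar', positive=True)

lamT = sp.sqrt(2*sp.pi*hbar**2/(m*kB*T))          # thermal wavelength
F = -N*kB*T*(sp.log(V/(N*lamT**3)) + 1)           # monatomic ideal gas

p  = -sp.diff(F, V)       # = N k_B T / V
mu =  sp.diff(F, N)       # chemical potential
G  = F + p*V              # Gibbs free energy

print(sp.simplify(G - mu*N))   # -> 0, i.e. G = mu N
```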

Grand potential Ω
The grand potential, sometimes called the Landau free energy, is defined by
Ω = E − T S − μN . (2.7.18)

Under equilibrium conditions, its differential is


dΩ = −S dT − p dV − N dμ , (2.7.19)

hence
$$-S = \bigg({\partial\Omega\over\partial T}\bigg)_{V,\mu} \quad,\quad -p = \bigg({\partial\Omega\over\partial V}\bigg)_{T,\mu} \quad,\quad -N = \bigg({\partial\Omega\over\partial\mu}\bigg)_{T,V}\ . \qquad (2.7.20)$$

Again invoking Equation [ETS], we find Ω = −pV , which says that the pressure is the negative of the grand potential per unit
volume.
The Second Law tells us

$$d\Omega \le -\dbar W - S\,dT - \mu\,dN - N\,d\mu\ , \qquad (2.7.21)$$

hence

$$\dbar \widetilde W_{\rm free} \equiv \dbar W_{\rm free} + \mu\,dN \le -S\,dT - p\,dV - N\,d\mu - d\Omega\ . \qquad (2.7.22)$$

We conclude, for systems at fixed $(T,V,\mu)$, that $\dbar \widetilde W_{\rm free} \le -d\Omega$.

This page titled 2.7: Thermodynamic Potentials is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

2.8: Maxwell Relations
Maxwell relations are conditions equating certain derivatives of state variables which follow from the exactness of the differentials
of the various state functions.

Relations deriving from E(S, V , N )


The energy E(S, V , N ) is a state function, with

dE = T dS − p dV + μ dN , (2.8.1)

and therefore

$$T = \bigg({\partial E\over\partial S}\bigg)_{V,N} \quad,\quad -p = \bigg({\partial E\over\partial V}\bigg)_{S,N} \quad,\quad \mu = \bigg({\partial E\over\partial N}\bigg)_{S,V}\ . \qquad (2.8.2)$$

Taking the mixed second derivatives, we find


$$\begin{aligned}
{\partial^2\! E\over\partial S\,\partial V} &= \bigg({\partial T\over\partial V}\bigg)_{S,N} = -\bigg({\partial p\over\partial S}\bigg)_{V,N}\\
{\partial^2\! E\over\partial S\,\partial N} &= \bigg({\partial T\over\partial N}\bigg)_{S,V} = \bigg({\partial \mu\over\partial S}\bigg)_{V,N}\\
{\partial^2\! E\over\partial V\,\partial N} &= -\bigg({\partial p\over\partial N}\bigg)_{S,V} = \bigg({\partial \mu\over\partial V}\bigg)_{S,N}\ .
\end{aligned}$$

Relations deriving from F (T , V , N )


The Helmholtz free energy F (T , V , N ) is a state function, with
dF = −S dT − p dV + μ dN , (2.8.3)

and therefore
$$-S = \bigg({\partial F\over\partial T}\bigg)_{V,N} \quad,\quad -p = \bigg({\partial F\over\partial V}\bigg)_{T,N} \quad,\quad \mu = \bigg({\partial F\over\partial N}\bigg)_{T,V}\ . \qquad (2.8.4)$$

Taking the mixed second derivatives, we find


$$\begin{aligned}
{\partial^2\! F\over\partial T\,\partial V} &= -\bigg({\partial S\over\partial V}\bigg)_{T,N} = -\bigg({\partial p\over\partial T}\bigg)_{V,N}\\
{\partial^2\! F\over\partial T\,\partial N} &= -\bigg({\partial S\over\partial N}\bigg)_{T,V} = \bigg({\partial \mu\over\partial T}\bigg)_{V,N}\\
{\partial^2\! F\over\partial V\,\partial N} &= -\bigg({\partial p\over\partial N}\bigg)_{T,V} = \bigg({\partial \mu\over\partial V}\bigg)_{T,N}\ .
\end{aligned}$$
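Any differentiable $F(T,V,N)$ obeys these relations automatically; their content is that they constrain measurable quantities. As a sketch, take an ideal-gas-like free energy (the constant $c$, with suitable units, is an assumption for illustration) and verify $\big({\partial S\over\partial V}\big)_T = \big({\partial p\over\partial T}\big)_V$:

```python
import sympy as sp

T, V, N, kB, c = sp.symbols('T V N k_B c', positive=True)

# illustrative Helmholtz free energy; c is a hypothetical constant
F = -N*kB*T*(sp.log(c*T**sp.Rational(3, 2)*V/N) + 1)

S = -sp.diff(F, T)
p = -sp.diff(F, V)

# Maxwell relation from the mixed second derivative of F
print(sp.simplify(sp.diff(S, V) - sp.diff(p, T)))   # -> 0 (both = N k_B / V)
```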

Relations deriving from H(S, p, N )


The enthalpy H(S, p, N ) satisfies
dH = T dS + V dp + μ dN , (2.8.5)

which says H = H(S, p, N ) , with


$$T = \bigg({\partial H\over\partial S}\bigg)_{p,N} \quad,\quad V = \bigg({\partial H\over\partial p}\bigg)_{S,N} \quad,\quad \mu = \bigg({\partial H\over\partial N}\bigg)_{S,p}\ . \qquad (2.8.6)$$

Taking the mixed second derivatives, we find

2.8.1 https://phys.libretexts.org/@go/page/18860
$$\begin{aligned}
{\partial^2\! H\over\partial S\,\partial p} &= \bigg({\partial T\over\partial p}\bigg)_{S,N} = \bigg({\partial V\over\partial S}\bigg)_{p,N}\\
{\partial^2\! H\over\partial S\,\partial N} &= \bigg({\partial T\over\partial N}\bigg)_{S,p} = \bigg({\partial \mu\over\partial S}\bigg)_{p,N}\\
{\partial^2\! H\over\partial p\,\partial N} &= \bigg({\partial V\over\partial N}\bigg)_{S,p} = \bigg({\partial \mu\over\partial p}\bigg)_{S,N}\ .
\end{aligned}$$

Relations deriving from G(T , p, N )


The Gibbs free energy G(T , p, N ) satisfies

dG = −S dT + V dp + μ dN , (2.8.7)

therefore G = G(T , p, N ) , with

$$-S = \bigg({\partial G\over\partial T}\bigg)_{p,N} \quad,\quad V = \bigg({\partial G\over\partial p}\bigg)_{T,N} \quad,\quad \mu = \bigg({\partial G\over\partial N}\bigg)_{T,p}\ . \qquad (2.8.8)$$

Taking the mixed second derivatives, we find


$$\begin{aligned}
{\partial^2\! G\over\partial T\,\partial p} &= -\bigg({\partial S\over\partial p}\bigg)_{T,N} = \bigg({\partial V\over\partial T}\bigg)_{p,N}\\
{\partial^2\! G\over\partial T\,\partial N} &= -\bigg({\partial S\over\partial N}\bigg)_{T,p} = \bigg({\partial \mu\over\partial T}\bigg)_{p,N}\\
{\partial^2\! G\over\partial p\,\partial N} &= \bigg({\partial V\over\partial N}\bigg)_{T,p} = \bigg({\partial \mu\over\partial p}\bigg)_{T,N}\ .
\end{aligned}$$

Relations deriving from Ω(T , V , μ)


The grand potential Ω(T , V , μ) satisfies
dΩ = −S dT − p dV − N dμ , (2.8.9)

hence
$$-S = \bigg({\partial\Omega\over\partial T}\bigg)_{V,\mu} \quad,\quad -p = \bigg({\partial\Omega\over\partial V}\bigg)_{T,\mu} \quad,\quad -N = \bigg({\partial\Omega\over\partial\mu}\bigg)_{T,V}\ . \qquad (2.8.10)$$

Taking the mixed second derivatives, we find


$$\begin{aligned}
{\partial^2\!\Omega\over\partial T\,\partial V} &= -\bigg({\partial S\over\partial V}\bigg)_{T,\mu} = -\bigg({\partial p\over\partial T}\bigg)_{V,\mu}\\
{\partial^2\!\Omega\over\partial T\,\partial \mu} &= -\bigg({\partial S\over\partial \mu}\bigg)_{T,V} = -\bigg({\partial N\over\partial T}\bigg)_{V,\mu}\\
{\partial^2\!\Omega\over\partial V\,\partial \mu} &= -\bigg({\partial p\over\partial \mu}\bigg)_{T,V} = -\bigg({\partial N\over\partial V}\bigg)_{T,\mu}\ .
\end{aligned}$$

Relations deriving from S(E, V , N )


We can also derive Maxwell relations based on the entropy S(E, V , N ) itself. For example, we have
$$dS = {1\over T}\,dE + {p\over T}\,dV - {\mu\over T}\,dN\ . \qquad (2.8.11)$$

Therefore S = S(E, V , N ) and

$${\partial^2\! S\over\partial E\,\partial V} = \bigg({\partial (T^{-1})\over\partial V}\bigg)_{E,N} = \bigg({\partial (p\,T^{-1})\over\partial E}\bigg)_{V,N}\ , \qquad (2.8.12)$$

et cetera.

Generalized thermodynamic potentials


We have up until now assumed a generalized force-displacement pair (y, X) = (−p, V ) . But the above results also generalize to
magnetic systems, where (y, X) = (H , M ) . In general, we have

$$\begin{aligned}
&\hbox{\tt THIS SPACE AVAILABLE} & dE &= T\,dS + y\,dX + \mu\,dN\\
F &= E - TS & dF &= -S\,dT + y\,dX + \mu\,dN\\
H &= E - yX & dH &= T\,dS - X\,dy + \mu\,dN\\
G &= E - TS - yX & dG &= -S\,dT - X\,dy + \mu\,dN\\
\Omega &= E - TS - \mu N & d\Omega &= -S\,dT + y\,dX - N\,d\mu\ .
\end{aligned}$$

Generalizing (−p, V ) → (y, X) , we also obtain, mutatis mutandis, the following Maxwell relations:

$$\begin{aligned}
\bigg({\partial T\over\partial X}\bigg)_{S,N} &= \bigg({\partial y\over\partial S}\bigg)_{X,N} &
\bigg({\partial T\over\partial N}\bigg)_{S,X} &= \bigg({\partial \mu\over\partial S}\bigg)_{X,N} &
\bigg({\partial y\over\partial N}\bigg)_{S,X} &= \bigg({\partial \mu\over\partial X}\bigg)_{S,N}\\
\bigg({\partial T\over\partial y}\bigg)_{S,N} &= -\bigg({\partial X\over\partial S}\bigg)_{y,N} &
\bigg({\partial T\over\partial N}\bigg)_{S,y} &= \bigg({\partial \mu\over\partial S}\bigg)_{y,N} &
\bigg({\partial X\over\partial N}\bigg)_{S,y} &= -\bigg({\partial \mu\over\partial y}\bigg)_{S,N}\\
\bigg({\partial S\over\partial X}\bigg)_{T,N} &= -\bigg({\partial y\over\partial T}\bigg)_{X,N} &
\bigg({\partial S\over\partial N}\bigg)_{T,X} &= -\bigg({\partial \mu\over\partial T}\bigg)_{X,N} &
\bigg({\partial y\over\partial N}\bigg)_{T,X} &= \bigg({\partial \mu\over\partial X}\bigg)_{T,N}\\
\bigg({\partial S\over\partial y}\bigg)_{T,N} &= \bigg({\partial X\over\partial T}\bigg)_{y,N} &
\bigg({\partial S\over\partial N}\bigg)_{T,y} &= -\bigg({\partial \mu\over\partial T}\bigg)_{y,N} &
\bigg({\partial X\over\partial N}\bigg)_{T,y} &= -\bigg({\partial \mu\over\partial y}\bigg)_{T,N}\\
\bigg({\partial S\over\partial X}\bigg)_{T,\mu} &= -\bigg({\partial y\over\partial T}\bigg)_{X,\mu} &
\bigg({\partial S\over\partial \mu}\bigg)_{T,X} &= \bigg({\partial N\over\partial T}\bigg)_{X,\mu} &
\bigg({\partial y\over\partial \mu}\bigg)_{T,X} &= -\bigg({\partial N\over\partial X}\bigg)_{T,\mu}\ .
\end{aligned}$$

This page titled 2.8: Maxwell Relations is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.9: Equilibrium and Stability
Equilibrium
Suppose we have two systems, A and B, which are free to exchange energy, volume, and particle number, subject to overall
conservation rules
EA + EB = E , VA + VB = V , NA + NB = N , (2.9.1)

where E , V , and N are fixed. Now let us compute the change in the total entropy of the combined systems when they are allowed
to exchange energy, volume, or particle number. We assume that the entropy is additive,

$$\begin{aligned}
dS &= \Bigg[\bigg({\partial S_A\over\partial E_A}\bigg)_{V_A,N_A} - \bigg({\partial S_B\over\partial E_B}\bigg)_{V_B,N_B}\Bigg] dE_A
+ \Bigg[\bigg({\partial S_A\over\partial V_A}\bigg)_{E_A,N_A} - \bigg({\partial S_B\over\partial V_B}\bigg)_{E_B,N_B}\Bigg] dV_A\\
&\quad + \Bigg[\bigg({\partial S_A\over\partial N_A}\bigg)_{E_A,V_A} - \bigg({\partial S_B\over\partial N_B}\bigg)_{E_B,V_B}\Bigg] dN_A\ .
\end{aligned}$$

Note that we have used $dE_B = -dE_A$, $dV_B = -dV_A$, and $dN_B = -dN_A$. Now we know from the Second Law that spontaneous processes result in $T\,dS > 0$, which means that $S$ tends to a maximum. If $S$ is a maximum, it must be that the coefficients of $dE_A$, $dV_A$, and $dN_A$ all vanish, else we could increase the total entropy of the system by a judicious choice of these three differentials. From $T\,dS = dE + p\,dV - \mu\,dN$, we have

$${1\over T} = \bigg({\partial S\over\partial E}\bigg)_{V,N} \quad,\quad {p\over T} = \bigg({\partial S\over\partial V}\bigg)_{E,N} \quad,\quad {\mu\over T} = -\bigg({\partial S\over\partial N}\bigg)_{E,V}\ . \qquad (2.9.2)$$

Thus, we conclude that in order for the system to be in equilibrium, so that S is maximized and can increase no further under
spontaneous processes, we must have
$$\begin{aligned}
T_A &= T_B & &\hbox{(thermal equilibrium)}\\
{p_A\over T_A} &= {p_B\over T_B} & &\hbox{(mechanical equilibrium)}\\
{\mu_A\over T_A} &= {\mu_B\over T_B} & &\hbox{(chemical equilibrium)}
\end{aligned}$$

Stability
Next, consider a uniform system with energy $2E$, volume $2V$, and particle number $2N$. We wish to check that this system is not unstable with respect to spontaneously becoming inhomogeneous. To that end, we imagine dividing the system in half. Each half would have energy $E$, volume $V$, and particle number $N$. But suppose we divided up these quantities differently, so that the left half had slightly different energy, volume, and particle number than the right, as depicted in Figure 2.9.1. Does the
entropy increase or decrease? We have
ΔS = S(E + ΔE, V + ΔV , N + ΔN ) + S(E − ΔE, V − ΔV , N − ΔN ) − S(2E, 2V , 2N )

$$\begin{aligned}
\Delta S &= {\partial^2\! S\over\partial E^2}\,(\Delta E)^2 + {\partial^2\! S\over\partial V^2}\,(\Delta V)^2 + {\partial^2\! S\over\partial N^2}\,(\Delta N)^2\\
&\quad + 2\,{\partial^2\! S\over\partial E\,\partial V}\,\Delta E\,\Delta V + 2\,{\partial^2\! S\over\partial E\,\partial N}\,\Delta E\,\Delta N + 2\,{\partial^2\! S\over\partial V\,\partial N}\,\Delta V\,\Delta N\ .
\end{aligned}$$

Thus, we can write

$$\Delta S = \sum_{i,j} Q_{ij}\,\Psi_i\,\Psi_j\ , \qquad (2.9.3)$$

where

2.9.1 https://phys.libretexts.org/@go/page/18861
$$Q = \begin{pmatrix}
{\partial^2\! S\over\partial E^2} & {\partial^2\! S\over\partial E\,\partial V} & {\partial^2\! S\over\partial E\,\partial N}\\[1ex]
{\partial^2\! S\over\partial E\,\partial V} & {\partial^2\! S\over\partial V^2} & {\partial^2\! S\over\partial V\,\partial N}\\[1ex]
{\partial^2\! S\over\partial E\,\partial N} & {\partial^2\! S\over\partial V\,\partial N} & {\partial^2\! S\over\partial N^2}
\end{pmatrix} \qquad (2.9.4)$$

is the matrix of second derivatives, known in mathematical parlance as the Hessian, and Ψ = (ΔE, ΔV , ΔN ) . Note that Q is a
symmetric matrix.

Figure 2.9.1 : To check for an instability, we compare the energy of a system to its total energy when we reapportion its energy,
volume, and particle number slightly unequally.
Since S must be a maximum in order for the system to be in equilibrium, we are tempted to conclude that the homogeneous system
is stable if and only if all three eigenvalues of Q are negative. If one or more of the eigenvalues is positive, then it is possible to
choose a set of variations Ψ such that ΔS > 0 , which would contradict the assumption that the homogeneous state is one of
maximum entropy. A matrix with this restriction is said to be negative definite. While it is true that Q can have no positive
eigenvalues, it is clear from homogeneity of S(E, V , N ) that one of the three eigenvalues must be zero, corresponding to the
eigenvector Ψ = (E, V , N ) . Homogeneity means S(λE, λV , λN ) = λS(E, V , N ) . Now let us take λ = 1 + η , where η is
infinitesimal. Then ΔE = ηE , ΔV = ηV , and ΔN = ηN , and homogeneity says
S(E ± ΔE, V ± ΔV , N ± ΔN ) = (1 ± η) S(E, V , N ) and ΔS = (1 + η)S + (1 − η)S − 2S = 0 . We then have a slightly
weaker characterization of Q as negative semidefinite.
However, if we fix one of the components of (ΔE, ΔV , ΔN ) to be zero, then Ψ must have some component orthogonal to the
zero eigenvector, in which case ΔS < 0 . Suppose we set ΔN = 0 and we just examine the stability with respect to
inhomogeneities in energy and volume. We then restrict our attention to the upper left 2 × 2 submatrix of Q. A general symmetric
2 × 2 matrix may be written

$$Q = \begin{pmatrix} a & b\\ b & c \end{pmatrix} \qquad (2.9.5)$$

It is easy to solve for the eigenvalues of Q. One finds


$$\lambda_\pm = {a+c\over 2} \pm \sqrt{\bigg({a-c\over 2}\bigg)^{\!2} + b^2}\ . \qquad (2.9.6)$$

In order for $Q$ to be negative definite, we require $\lambda_+ < 0$ and $\lambda_- < 0$. Thus, ${\rm Tr}\,Q = a + c = \lambda_+ + \lambda_- < 0$ and $\det Q = ac - b^2 = \lambda_+\lambda_- > 0$. Taken together, these conditions require

$$a < 0 \quad,\quad c < 0 \quad,\quad ac > b^2\ . \qquad (2.9.7)$$

Going back to thermodynamic variables, this requires


$${\partial^2\! S\over\partial E^2} < 0 \quad,\quad {\partial^2\! S\over\partial V^2} < 0 \quad,\quad {\partial^2\! S\over\partial E^2}\cdot{\partial^2\! S\over\partial V^2} > \bigg({\partial^2\! S\over\partial E\,\partial V}\bigg)^{\!2}\ . \qquad (2.9.8)$$

Thus the entropy is a concave function of $E$ and $V$ at fixed $N$. Had we set $\Delta E = 0$ and considered the lower right $2\times 2$ submatrix of $Q$, we'd have concluded that $S(V,N)$ is concave at fixed $E$. Since $\big({\partial S\over\partial E}\big)_V = T^{-1}$, we have ${\partial^2\! S\over\partial E^2} = -{1\over T^2}\big({\partial T\over\partial E}\big)_V = -{1\over T^2 C_V} < 0$, and we conclude $C_V > 0$ for stability.
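These spectral properties of $Q$ can be seen numerically. The sketch below builds the Hessian of the ideal gas entropy (constants dropped, $k_B = 1$, $f = 3$, illustrative state point) by finite differences and checks that it is negative semidefinite, with $(E,V,N)$ as the zero eigenvector.

```python
import numpy as np

f = 3.0
S = lambda E, V, N: N*(np.log(V/N) + 0.5*f*np.log(E/N))

def hessian(E, V, N, h=1e-5):
    # central finite-difference Hessian of S at (E, V, N)
    x0 = np.array([E, V, N])
    H = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            di, dj = np.eye(3)[i]*h, np.eye(3)[j]*h
            H[i, j] = (S(*(x0+di+dj)) - S(*(x0+di-dj))
                       - S(*(x0-di+dj)) + S(*(x0-di-dj)))/(4*h*h)
    return H

Q = hessian(2.0, 3.0, 1.0)
evals = np.linalg.eigvalsh(Q)
print(evals)                            # two negative eigenvalues, one ~ 0
print(Q @ np.array([2.0, 3.0, 1.0]))   # ~ (0,0,0): (E,V,N) is the null vector
```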

Many thermodynamic systems are held at fixed $(T,p,N)$, which suggests we examine the stability criteria for $G(T,p,N)$. Suppose our system is in equilibrium with a reservoir at temperature $T_0$ and pressure $p_0$. Then, suppressing $N$ (which is assumed constant), we have

$$G(T_0, p_0) = E - T_0\,S + p_0\,V\ . \qquad (2.9.9)$$

Now suppose there is a fluctuation in the entropy and the volume of our system, which is held at fixed particle number. Going to
second order in ΔS and ΔV , we have

$$\begin{aligned}
\Delta G &= \Bigg[\bigg({\partial E\over\partial S}\bigg)_{\!V} - T_0\Bigg]\Delta S + \Bigg[\bigg({\partial E\over\partial V}\bigg)_{\!S} + p_0\Bigg]\Delta V\\
&\quad + {1\over 2}\Bigg[{\partial^2\! E\over\partial S^2}\,(\Delta S)^2 + 2\,{\partial^2\! E\over\partial S\,\partial V}\,\Delta S\,\Delta V + {\partial^2\! E\over\partial V^2}\,(\Delta V)^2\Bigg] + \ldots\ .
\end{aligned}$$

Equilibrium requires that the coefficients of $\Delta S$ and $\Delta V$ both vanish, that $T = \big({\partial E\over\partial S}\big)_{V,N} = T_0$ and $p = -\big({\partial E\over\partial V}\big)_{S,N} = p_0$. The condition for stability is that $\Delta G > 0$ for all $(\Delta S, \Delta V)$. Stability therefore requires that the Hessian matrix $Q$ be positive definite, with

$$Q = \begin{pmatrix}
{\partial^2\! E\over\partial S^2} & {\partial^2\! E\over\partial S\,\partial V}\\[1ex]
{\partial^2\! E\over\partial S\,\partial V} & {\partial^2\! E\over\partial V^2}
\end{pmatrix}\ . \qquad (2.9.10)$$

Thus, we have the following three conditions:


$$\begin{aligned}
{\partial^2\! E\over\partial S^2} &= \bigg({\partial T\over\partial S}\bigg)_{\!V} = {T\over C_V} > 0\\
{\partial^2\! E\over\partial V^2} &= -\bigg({\partial p\over\partial V}\bigg)_{\!S} = {1\over V\kappa_S} > 0\\
{\partial^2\! E\over\partial S^2}\cdot{\partial^2\! E\over\partial V^2} - \bigg({\partial^2\! E\over\partial S\,\partial V}\bigg)^{\!2} &= {T\over V\kappa_S\,C_V} - \bigg({\partial T\over\partial V}\bigg)_{\!S}^{2} > 0\ .
\end{aligned}$$

As we shall discuss below, the quantity $\alpha_S \equiv {1\over V}\big({\partial V\over\partial T}\big)_{S,N}$ is the adiabatic thermal expansivity coefficient. We therefore conclude that stability of any thermodynamic system requires

$$C_V > 0 \quad,\quad \kappa_S > 0 \quad,\quad \alpha_S > \sqrt{\kappa_S\,C_V\over T\,V}\ . \qquad (2.9.11)$$

This page titled 2.9: Equilibrium and Stability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

2.10: Applications of Thermodynamics
A discussion of various useful mathematical relations among partial derivatives may be found in the appendix in §17. Some facility
with the differential multivariable calculus is extremely useful in the analysis of thermodynamics problems.

Adiabatic free expansion revisited


Consider once again the adiabatic free expansion of a gas from initial volume $V_i$ to final volume $V_f = r V_i$. Since the system is not in equilibrium during the free expansion process, the initial and final states do not lie along an adiabat; they do not have the same entropy. Rather, as we found, from $Q = W = 0$, we have that $E_i = E_f$, which means they have the same energy, and, in the case of an ideal gas, the same temperature (assuming $N$ is constant). Thus, the initial and final states lie along an isotherm. The situation is depicted in Fig. [AFEgraph]. Now let us compute the change in entropy $\Delta S = S_f - S_i$ by integrating along this isotherm. Note that the actual dynamics are irreversible and do not quasistatically follow any continuous thermodynamic path. However, we can use a fictitious thermodynamic path as a means of comparing $S$ in the initial and final states.

[AFEgraph] Adiabatic free expansion via a thermal path. The initial and final states do not lie along an adiabat! Rather, for an ideal gas, the initial and final states lie along an isotherm.
We have
$$\Delta S = S_f - S_i = \int\limits_{V_i}^{V_f}\! dV\,\bigg({\partial S\over\partial V}\bigg)_{T,N}\ . \qquad (2.10.1)$$

But from a Maxwell equation deriving from F , we have


∂S ∂p
( ) =( ) , (2.10.2)
∂V T ,N
∂T V ,N

hence
$$\Delta S = \int\limits_{V_i}^{V_f}\! dV\,\bigg({\partial p\over\partial T}\bigg)_{V,N}\ . \qquad (2.10.3)$$

For an ideal gas, we can use the equation of state pV = N kB T to obtain

2.10.1 https://phys.libretexts.org/@go/page/18862
$$\bigg({\partial p\over\partial T}\bigg)_{V,N} = {N k_B\over V}\ . \qquad (2.10.4)$$

The integral can now be computed:


$$\Delta S = \int\limits_{V_i}^{r V_i}\! dV\,{N k_B\over V} = N k_B \ln r\ , \qquad (2.10.5)$$

as we found before, in Equation [AFEdS]. What is different about this derivation? Previously, we derived the entropy change from the explicit formula for $S(E,V,N)$. Here, we did not need to know this function. The Maxwell relation allowed us to compute the entropy change using only the equation of state.

Energy and volume


We saw how $E(T,V,N) = \frac{1}{2} f N k_B T$ for an ideal gas, independent of the volume. In general we should have

$$E(T,V,N) = N\,\phi\Big(T, {V\over N}\Big)\ . \qquad (2.10.6)$$

For the ideal gas, $\phi\big(T, {V\over N}\big) = \frac{1}{2} f k_B T$ is a function of $T$ alone and is independent of the other intensive quantity $V/N$. How does the energy vary with volume? At fixed temperature and particle number, we have, from $E = F + TS$,

$$\bigg({\partial E\over\partial V}\bigg)_{T,N} = \bigg({\partial F\over\partial V}\bigg)_{T,N} + T\bigg({\partial S\over\partial V}\bigg)_{T,N} = -p + T\bigg({\partial p\over\partial T}\bigg)_{V,N}\ , \qquad (2.10.7)$$

where we have used the Maxwell relation $\big({\partial S\over\partial V}\big)_{T,N} = \big({\partial p\over\partial T}\big)_{V,N}$, derived from the mixed second derivative ${\partial^2\! F\over\partial T\,\partial V}$. Another way to derive this result is as follows. Write $dE = T\,dS - p\,dV + \mu\,dN$ and then express $dS$ in terms of $dT$, $dV$, and $dN$, resulting in

$$dE = T\bigg({\partial S\over\partial T}\bigg)_{V,N} dT + \Bigg[T\bigg({\partial S\over\partial V}\bigg)_{T,N} - p\Bigg] dV - \Bigg[T\bigg({\partial \mu\over\partial T}\bigg)_{V,N} - \mu\Bigg] dN\ . \qquad (2.10.8)$$

Now read off $\big({\partial E\over\partial V}\big)_{T,N}$ and use the same Maxwell relation as before to recover Equation [pEVTN]. Applying this result to the ideal gas law $pV = N k_B T$ results in the vanishing of the RHS, hence for any substance obeying the ideal gas law we must have

$$E(T,V,N) = \nu\,\varepsilon(T) = N\varepsilon(T)/N_A\ . \qquad (2.10.9)$$

van der Waals equation of state


It is clear that the same conclusion follows for any equation of state of the form $p(T,V,N) = T\cdot f(V/N)$, where $f(V/N)$ is an arbitrary function of its argument: the ideal gas law remains valid. This is not true, however, for the van der Waals equation of state,

$$\Big(p + {a\over v^2}\Big)(v - b) = RT\ , \qquad (2.10.10)$$

where $v = N_A V/N$ is the molar volume. We then find (always assuming constant $N$),

$$\bigg({\partial E\over\partial V}\bigg)_{\!T} = \bigg({\partial \varepsilon\over\partial v}\bigg)_{\!T} = T\bigg({\partial p\over\partial T}\bigg)_{\!V} - p = {a\over v^2}\ , \qquad (2.10.11)$$

where $E(T,V,N) \equiv \nu\,\varepsilon(T,v)$. We can integrate this to obtain

$$\varepsilon(T,v) = \omega(T) - {a\over v}\ , \qquad (2.10.12)$$

where ω(T ) is arbitrary. From Equation [cveqn], we immediately have

$$c_V = \bigg({\partial \varepsilon\over\partial T}\bigg)_{\!v} = \omega'(T)\ . \qquad (2.10.13)$$

[vdwab] Van der Waals parameters for some common gases. (Source: Wikipedia.)
gas                 a (L²·bar/mol²)    b (L/mol)    p_c (bar)    T_c (K)    v_c (L/mol)

Acetone 14.09 0.0994 52.82 505.1 0.2982

Argon 1.363 0.03219 48.72 150.9 0.0966

Carbon dioxide 3.640 0.04267 74.04 304.0 0.1280

Ethanol 12.18 0.08407 63.83 516.3 0.2522

Freon 10.78 0.0998 40.09 384.9 0.2994

Helium 0.03457 0.0237 2.279 5.198 0.0711

Hydrogen 0.2476 0.02661 12.95 33.16 0.0798

Mercury 8.200 0.01696 1055 1723 0.0509

Methane 2.283 0.04278 46.20 190.2 0.1283

Nitrogen 1.408 0.03913 34.06 128.2 0.1174

Oxygen 1.378 0.03183 50.37 154.3 0.0955

Water 5.536 0.03049 220.6 647.0 0.0915
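The tabulated critical parameters are not independent of $a$ and $b$: the van der Waals equation (2.10.10) gives $p_c = a/27b^2$, $T_c = 8a/27bR$, $v_c = 3b$, a standard result derived later in the text. A quick sketch checking a few rows of the table:

```python
R = 0.083145    # L bar / (mol K)

gases = {       # a (L^2 bar/mol^2), b (L/mol), from Table [vdwab]
    "Argon":   (1.363, 0.03219),
    "Methane": (2.283, 0.04278),
    "Water":   (5.536, 0.03049),
}
for name, (a, b) in gases.items():
    # van der Waals critical point in terms of a and b
    pc, Tc, vc = a/(27*b**2), 8*a/(27*b*R), 3*b
    print(f"{name:8s} p_c = {pc:6.2f} bar  T_c = {Tc:5.1f} K  v_c = {vc:.4f} L/mol")
```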

What about $c_p$? This requires a bit of work. We start with Equation [cpeqn],

$$c_p = \bigg({\partial \varepsilon\over\partial T}\bigg)_{\!p} + p\bigg({\partial v\over\partial T}\bigg)_{\!p}
= \omega'(T) + \Big(p + {a\over v^2}\Big)\bigg({\partial v\over\partial T}\bigg)_{\!p}\ .$$

We next take the differential of the equation of state (at constant $N$):

$$R\,dT = \Big(p + {a\over v^2}\Big)\,dv + (v-b)\Big(dp - {2a\over v^3}\,dv\Big)
= \Big(p - {a\over v^2} + {2ab\over v^3}\Big)\,dv + (v-b)\,dp\ .$$

We can now read off the result for the volume expansion coefficient,

$$\alpha_p = {1\over v}\bigg({\partial v\over\partial T}\bigg)_{\!p} = {1\over v}\cdot{R\over p - {a\over v^2} + {2ab\over v^3}}\ . \qquad (2.10.14)$$

We now have for $c_p$,

$$c_p = \omega'(T) + {\big(p + {a\over v^2}\big)\,R\over p - {a\over v^2} + {2ab\over v^3}}
= \omega'(T) + {R^2 T v^3\over RT v^3 - 2a(v-b)^2}\ ,$$

where $v = V N_A/N$ is the molar volume.

To fix $\omega(T)$, we consider the $v\to\infty$ limit, where the density of the gas vanishes. In this limit, the gas must be ideal, hence Equation [EVT] says that $\omega(T) = \frac{1}{2} f RT$. Therefore $c_V(T,v) = \frac{1}{2} f R$, just as in the case of an ideal gas. However, rather than $c_p = c_V + R$, which holds for ideal gases, $c_p(T,v)$ is given by Equation [cpvdw]. Thus,

$$c_V^{\rm VDW} = \tfrac{1}{2} f R \qquad,\qquad c_p^{\rm VDW} = \tfrac{1}{2} f R + {R^2 T v^3\over RT v^3 - 2a(v-b)^2}\ .$$

Note that $c_p(a\to 0) = c_V + R$, which is the ideal gas result.

As we shall see in chapter 7, the van der Waals system is unstable throughout a region of parameters, where it undergoes phase separation between high density (liquid) and low density (gas) phases. The above results are valid only in the stable regions of the phase diagram.
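The size of the correction to $c_p - c_V = R$ is easy to gauge numerically from Equation [cpvdw]. With argon parameters at illustrative state points, the ideal-gas value is recovered at large molar volume, while the difference grows markedly at small $v$ near the critical temperature:

```python
R = 0.083145            # L bar / (mol K)
a, b = 1.363, 0.03219   # argon, Table [vdwab]
T = 150.0               # K, close to argon's critical temperature

for v in (0.2, 1.0, 10.0):   # molar volume in L/mol
    # c_p - c_V for a van der Waals gas, Eq. [cpvdw]
    cp_minus_cV = R**2*T*v**3/(R*T*v**3 - 2*a*(v - b)**2)
    print(f"v = {v:5.1f} L/mol: (c_p - c_V)/R = {cp_minus_cV/R:.3f}")
```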

Thermodynamic response functions


Consider the entropy $S$ expressed as a function of $T$, $V$, and $N$:

$$dS = \bigg({\partial S\over\partial T}\bigg)_{V,N} dT + \bigg({\partial S\over\partial V}\bigg)_{T,N} dV + \bigg({\partial S\over\partial N}\bigg)_{T,V} dN\ . \qquad (2.10.15)$$

Dividing by $dT$, multiplying by $T$, and assuming $dN = 0$ throughout, we have

$$C_p - C_V = T\bigg({\partial S\over\partial V}\bigg)_{\!T}\bigg({\partial V\over\partial T}\bigg)_{\!p}\ . \qquad (2.10.16)$$

Appealing to a Maxwell relation derived from $F(T,V,N)$, and then appealing to Equation [boxtwob], we have

$$\bigg({\partial S\over\partial V}\bigg)_{\!T} = \bigg({\partial p\over\partial T}\bigg)_{\!V} = -\bigg({\partial p\over\partial V}\bigg)_{\!T}\bigg({\partial V\over\partial T}\bigg)_{\!p}\ . \qquad (2.10.17)$$

This allows us to write

$$C_p - C_V = -T\bigg({\partial p\over\partial V}\bigg)_{\!T}\bigg({\partial V\over\partial T}\bigg)_{\!p}^{\!2}\ . \qquad (2.10.18)$$

We define the response functions,

$$\begin{aligned}
\hbox{isothermal compressibility:}\quad \kappa_T &= -{1\over V}\bigg({\partial V\over\partial p}\bigg)_{\!T} = -{1\over V}\,{\partial^2\! G\over\partial p^2}\\
\hbox{adiabatic compressibility:}\quad \kappa_S &= -{1\over V}\bigg({\partial V\over\partial p}\bigg)_{\!S} = -{1\over V}\,{\partial^2\! \mathcal{H}\over\partial p^2}\\
\hbox{thermal expansivity:}\quad \alpha_p &= {1\over V}\bigg({\partial V\over\partial T}\bigg)_{\!p}\ .
\end{aligned}$$

Thus,
$$C_p - C_V = V\,{T\alpha_p^2\over\kappa_T}\ , \qquad (2.10.19)$$

or, in terms of intensive quantities,

$$c_p - c_V = {v\,T\alpha_p^2\over\kappa_T}\ , \qquad (2.10.20)$$

where, as always, $v = V N_A/N$ is the molar volume.
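As a consistency check, one can verify symbolically that $v T \alpha_p^2/\kappa_T$, evaluated for the van der Waals equation of state, reproduces the $c_p - c_V$ of Equation [cpvdw]. A sympy sketch:

```python
import sympy as sp

T, v, a, b, R = sp.symbols('T v a b R', positive=True)

p = R*T/(v - b) - a/v**2          # van der Waals p(T, v)
dpdv, dpdT = sp.diff(p, v), sp.diff(p, T)

alpha_p = -dpdT/(v*dpdv)          # (1/v)(dv/dT)_p by implicit differentiation
kappa_T = -1/(v*dpdv)             # -(1/v)(dv/dp)_T

cp_minus_cv = sp.simplify(v*T*alpha_p**2/kappa_T)          # Eq. 2.10.20
target = R**2*T*v**3/(R*T*v**3 - 2*a*(v - b)**2)           # Eq. [cpvdw]
print(sp.simplify(cp_minus_cv - target))                   # -> 0
```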
The above relation generalizes to any conjugate force-displacement pair $(-p, V) \to (y, X)$:

$$C_y - C_X = -T\bigg({\partial y\over\partial T}\bigg)_{\!X}\bigg({\partial X\over\partial T}\bigg)_{\!y}
= T\bigg({\partial y\over\partial X}\bigg)_{\!T}\bigg({\partial X\over\partial T}\bigg)_{\!y}^{\!2}\ .$$

For example, we could have $(y, X) = (H^\alpha, M^\alpha)$.
A similar relationship can be derived between the compressibilities κ_T and κ_S. We then clearly must start with the volume, writing

\[ dV = \Big({\partial V\over\partial p}\Big)_{S,N} dp + \Big({\partial V\over\partial S}\Big)_{p,N} dS + \Big({\partial V\over\partial N}\Big)_{S,p} dN\ . \quad (2.10.21) \]

Dividing by dp, multiplying by −V⁻¹, and keeping N constant, we have

\[ \kappa_T - \kappa_S = -{1\over V}\Big({\partial V\over\partial S}\Big)_p\Big({\partial S\over\partial p}\Big)_T\ . \quad (2.10.22) \]

Again we appeal to a Maxwell relation, writing

\[ \Big({\partial S\over\partial p}\Big)_T = -\Big({\partial V\over\partial T}\Big)_p\ , \quad (2.10.23) \]

and after invoking the chain rule,

\[ \Big({\partial V\over\partial S}\Big)_p = \Big({\partial V\over\partial T}\Big)_p\Big({\partial T\over\partial S}\Big)_p = {T\over C_p}\Big({\partial V\over\partial T}\Big)_p\ , \quad (2.10.24) \]

we obtain

\[ \kappa_T - \kappa_S = {v\,T\alpha_p^2\over c_p}\ . \quad (2.10.25) \]

Comparing eqns. [inta] and [intb], we find

\[ (c_p - c_V)\,\kappa_T = (\kappa_T - \kappa_S)\,c_p = v\,T\alpha_p^2\ . \quad (2.10.26) \]

This result entails

\[ {c_p\over c_V} = {\kappa_T\over\kappa_S}\ . \quad (2.10.27) \]

The corresponding result for magnetic systems is

\[ (c_H - c_M)\,\chi_T = (\chi_T - \chi_S)\,c_H = T\Big({\partial m\over\partial T}\Big)_H^{\!2}\ , \quad (2.10.28) \]

where m = M/ν is the magnetization per mole of substance, and

\[\begin{aligned} \text{isothermal susceptibility:}\quad \chi_T &= \Big({\partial M\over\partial H}\Big)_T = -{1\over\nu}\,{\partial^2 G\over\partial H^2} \\ \text{adiabatic susceptibility:}\quad \chi_S &= \Big({\partial M\over\partial H}\Big)_S = -{1\over\nu}\,{\partial^2 H\over\partial H^2}\ . \end{aligned}\]

Here the enthalpy and Gibbs free energy are

\[\begin{aligned} H &= E - HM\ , & dH &= T\,dS - M\,dH \\ G &= E - TS - HM\ , & dG &= -S\,dT - M\,dH\ . \end{aligned}\]

Remark: The previous discussion has assumed an isotropic magnetic system where M and H are collinear, hence H⋅M = HM. In the more general anisotropic case, one defines the susceptibility tensors

\[\begin{aligned} \chi_T^{\alpha\beta} &= \Big({\partial M^\alpha\over\partial H^\beta}\Big)_T = -{1\over\nu}\,{\partial^2 G\over\partial H^\alpha\,\partial H^\beta} \\ \chi_S^{\alpha\beta} &= \Big({\partial M^\alpha\over\partial H^\beta}\Big)_S = -{1\over\nu}\,{\partial^2 H\over\partial H^\alpha\,\partial H^\beta}\ . \end{aligned}\]

In this case, the enthalpy and Gibbs free energy are

\[\begin{aligned} H &= E - \mathbf{H}\cdot\mathbf{M}\ , & dH &= T\,dS - \mathbf{M}\cdot d\mathbf{H} \\ G &= E - TS - \mathbf{H}\cdot\mathbf{M}\ , & dG &= -S\,dT - \mathbf{M}\cdot d\mathbf{H}\ . \end{aligned}\]

Joule effect: free expansion of a gas


Previously we considered the adiabatic free expansion of an ideal gas. We found that Q = W = 0, hence ΔE = 0, which means the process is isothermal, since E = νε(T) is volume-independent. The entropy changes, however, since S(E,V,N) = N k_B ln(V/N) + ½ f N k_B ln(E/N) + N s₀. Thus,

\[ S_f = S_i + N k_B\ln\!\Big({V_f\over V_i}\Big)\ . \quad (2.10.29) \]

What happens if the gas is nonideal?


We integrate along a fictitious thermodynamic path connecting initial and final states, where dE = 0 along the path. We have

\[ 0 = dE = \Big({\partial E\over\partial V}\Big)_T dV + \Big({\partial E\over\partial T}\Big)_V dT\ , \quad (2.10.30) \]

hence

\[ \Big({\partial T\over\partial V}\Big)_E = -{(\partial E/\partial V)_T\over(\partial E/\partial T)_V} = -{1\over C_V}\Big({\partial E\over\partial V}\Big)_T\ . \quad (2.10.31) \]

We also have

\[ \Big({\partial E\over\partial V}\Big)_T = T\Big({\partial S\over\partial V}\Big)_T - p = T\Big({\partial p\over\partial T}\Big)_V - p\ . \quad (2.10.32) \]

Thus,

\[ \Big({\partial T\over\partial V}\Big)_E = {1\over C_V}\left[p - T\Big({\partial p\over\partial T}\Big)_V\right]\ . \quad (2.10.33) \]

Note that the term in square brackets vanishes for any system obeying the ideal gas law. For a nonideal gas,

\[ \Delta T = \int\limits_{V_i}^{V_f}\! dV\,\Big({\partial T\over\partial V}\Big)_E\ , \quad (2.10.34) \]

which is in general nonzero.


Now consider a van der Waals gas, for which

\[ \Big(p + {a\over v^2}\Big)(v - b) = RT\ . \quad (2.10.35) \]

We then have

\[ p - T\Big({\partial p\over\partial T}\Big)_V = -{a\over v^2} = -{a\nu^2\over V^2}\ . \quad (2.10.36) \]

In §11.3 we concluded that C_V = ½fνR for the van der Waals gas, hence

\[ \Delta T = -{2a\nu\over fR}\int\limits_{V_i}^{V_f}\!{dV\over V^2} = {2a\over fR}\Big({1\over v_f} - {1\over v_i}\Big)\ . \quad (2.10.37) \]

Thus, if V_f > V_i, we have T_f < T_i and the gas cools upon expansion.

Consider O₂ gas with an initial specific volume of v_i = 22.4 L/mol, which is the STP value for an ideal gas, freely expanding to a volume v_f = ∞ for maximum cooling. According to table [vdwab], a = 1.378 L²·bar/mol², and we have ΔT = −2a/(fRv_i) = −0.296 K, which is a pitifully small amount of cooling. Adiabatic free expansion is a very inefficient way to cool a gas.
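The arithmetic of this estimate is a one-liner; the sketch below simply reproduces the quoted figure:

```python
# Free-expansion cooling of O2: Delta T = -2a/(f R v_i) for v_f -> infinity.
R = 0.08314          # L bar / (mol K)
a_O2 = 1.378         # L^2 bar / mol^2 (table value quoted in the text)
f, v_i = 5, 22.4     # diatomic degrees of freedom; molar volume in L/mol

dT = -2 * a_O2 / (f * R * v_i)
assert abs(dT - (-0.296)) < 0.005   # matches the -0.296 K quoted above
```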

[throttle] In a throttle, a gas is pushed through a porous plug separating regions of different pressure. The change in energy is the
work done, hence enthalpy is conserved during the throttling process.

Throttling: the Joule-Thomson effect


In a throttle, depicted in Fig. [throttle], a gas is forced through a porous plug which separates regions of different pressures. According to the figure, the work done on a given element of gas is

\[ W = \int\limits_0^{V_f}\! dV\,p_f - \int\limits_0^{V_i}\! dV\,p_i = p_f V_f - p_i V_i\ . \quad (2.10.38) \]

Now we assume that the system is thermally isolated so that the gas exchanges no heat with its environment, nor with the plug. Then Q = 0, so ΔE = −W, and

\[\begin{aligned} E_i + p_i V_i &= E_f + p_f V_f \\ H_i &= H_f\ , \end{aligned}\]

where H is enthalpy. Thus, the throttling process is isenthalpic. We can therefore study it by defining a fictitious thermodynamic path along which dH = 0. Then, choosing T and p as state variables,

\[ 0 = dH = \Big({\partial H\over\partial T}\Big)_p dT + \Big({\partial H\over\partial p}\Big)_T dp\ , \quad (2.10.39) \]

hence

\[ \Big({\partial T\over\partial p}\Big)_H = -{(\partial H/\partial p)_T\over(\partial H/\partial T)_p}\ . \quad (2.10.40) \]

The numerator on the RHS is computed by writing dH = T dS + V dp and then dividing by dp, to obtain

\[ \Big({\partial H\over\partial p}\Big)_T = V + T\Big({\partial S\over\partial p}\Big)_T = V - T\Big({\partial V\over\partial T}\Big)_p\ . \quad (2.10.41) \]

The denominator is

\[ \Big({\partial H\over\partial T}\Big)_p = \Big({\partial H\over\partial S}\Big)_p\Big({\partial S\over\partial T}\Big)_p = T\Big({\partial S\over\partial T}\Big)_p = C_p\ . \]

Thus,

\[\begin{aligned} \Big({\partial T\over\partial p}\Big)_H &= {1\over c_p}\left[T\Big({\partial v\over\partial T}\Big)_p - v\right] \\ &= {v\over c_p}\,\big(T\alpha_p - 1\big)\ , \end{aligned}\]

where α_p = (1/V)(∂V/∂T)_p is the volume expansion coefficient.

From the van der Waals equation of state, we obtain, from Equation 2.10.14,

\[ T\alpha_p = {T\over v}\Big({\partial v\over\partial T}\Big)_p = {RT/v\over p - {a\over v^2} + {2ab\over v^3}} = {v - b\over v - {2a\over RT}\big({v-b\over v}\big)^{\!2}}\ . \quad (2.10.42) \]

Assuming v ≫ a/RT and v ≫ b, we have

\[ \Big({\partial T\over\partial p}\Big)_H = {1\over c_p}\Big({2a\over RT} - b\Big)\ . \quad (2.10.43) \]

Thus, for T > T^* = 2a/Rb, we have (∂T/∂p)_H < 0 and the gas heats up upon an isenthalpic pressure decrease. For T < T^*, the gas cools under such conditions.

[itvdw] Inversion temperature T^*(p) for the van der Waals gas. Pressure and temperature are given in terms of p_c = a/27b² and T_c = 8a/27bR, respectively.

In fact, there are two inversion temperatures T^*_{1,2} for the van der Waals gas. To see this, we set Tα_p = 1, which is the criterion for inversion. From Equation 2.10.42 it is easy to derive

\[ {b\over v} = 1 - \sqrt{bRT\over 2a}\ . \]

We insert this into the van der Waals equation of state to derive a relationship T = T^*(p) at which Tα_p = 1 holds. After a little work, we find

\[ p = -{3RT\over 2b} + \sqrt{8aRT\over b^3} - {a\over b^2}\ . \quad (2.10.44) \]

This is a quadratic equation for √T, the solution of which is

\[ T^*(p) = {2a\over 9bR}\left(2 \pm \sqrt{1 - {3b^2 p\over a}}\,\right)^{\!2}\ . \quad (2.10.45) \]

In Fig. [itvdw] we plot pressure versus temperature in scaled units, showing the curve along which (∂T/∂p)_H = 0. The volume, pressure, and temperature scales defined are

\[ v_c = 3b\ ,\qquad p_c = {a\over 27\,b^2}\ ,\qquad T_c = {8a\over 27\,bR}\ . \quad (2.10.46) \]

Values for p_c, T_c, and v_c are provided in table [vdwab]. If we define v̄ = v/v_c, p̄ = p/p_c, and T̄ = T/T_c, then the van der Waals equation of state may be written in dimensionless form:

\[ \Big(\bar p + {3\over\bar v^2}\Big)\big(3\bar v - 1\big) = 8\bar T\ . \quad (2.10.47) \]
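One can verify directly that the reduced equation of state puts the critical point at v̄ = p̄ = T̄ = 1, where the first two volume derivatives of p̄ vanish. A minimal numerical check:

```python
# Reduced van der Waals EOS: p = 8T/(3v - 1) - 3/v^2 (bars dropped in the code).
def p_red(v, T):
    return 8 * T / (3 * v - 1) - 3 / v**2

def dp_dv(v, T):
    return -24 * T / (3 * v - 1)**2 + 6 / v**3

def d2p_dv2(v, T):
    return 144 * T / (3 * v - 1)**3 - 18 / v**4

# At the critical point (v, T) = (1, 1): p = 1 and the isotherm has an inflection.
assert abs(p_red(1.0, 1.0) - 1.0) < 1e-12
assert abs(dp_dv(1.0, 1.0)) < 1e-12
assert abs(d2p_dv2(1.0, 1.0)) < 1e-12
```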

In terms of the scaled parameters, the equation for the inversion curve (∂T/∂p)_H = 0 becomes

\[ \bar p = 9 - 36\left(1 - \sqrt{\tfrac{1}{3}\bar T}\,\right)^{\!2} \quad\Longleftrightarrow\quad \bar T = 3\left(1 \pm \tfrac{1}{2}\sqrt{1 - \tfrac{1}{9}\bar p}\,\right)^{\!2}\ . \quad (2.10.48) \]

Thus, there is no inversion for p > 9p_c. We are usually interested in the upper inversion temperature, T^*_2, corresponding to the upper sign in Equation [invtemp]. The maximum inversion temperature occurs for p = 0, where T^*_{max} = 2a/bR = (27/4)T_c. For H₂, from the data in table [vdwab], we find T^*_{max}(H₂) = 224 K, which is within 10% of the experimentally measured value of 205 K.
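The H₂ estimate follows from T^*_{max} = 2a/bR alone. The sketch below uses van der Waals constants taken from a standard table (assumed values, not quoted in this excerpt):

```python
# Maximum Joule-Thomson inversion temperature for H2: T*_max = 2a/(bR).
R = 0.08314                    # L bar / (mol K)
a_H2, b_H2 = 0.2453, 0.02651   # L^2 bar/mol^2 and L/mol (assumed table values)

T_max = 2 * a_H2 / (b_H2 * R)
T_c = 8 * a_H2 / (27 * b_H2 * R)

assert 215 < T_max < 230                          # text quotes ~224 K; expt ~205 K
assert abs(T_max - 27 / 4 * T_c) < 1e-9           # T*_max = (27/4) T_c identically
```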

What happens when H₂ gas leaks from a container with T > T^*_2? Since (∂T/∂p)_H < 0 and Δp < 0, we have ΔT > 0. The gas warms up, and the heat facilitates the reaction 2H₂ + O₂ ⟶ 2H₂O, which releases energy, and we have a nice explosion.

This page titled 2.10: Applications of Thermodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

2.11: Phase Transitions and Phase Equilibria
A typical phase diagram of a p-V-T system is shown in Fig. [pdiaga](a). The solid lines delineate boundaries between distinct thermodynamic phases. These lines are called coexistence curves. Along these curves, we can have coexistence of two phases, and the thermodynamic potentials are singular. The order of the singularity is often taken as a classification of the phase transition. If the thermodynamic potentials E, F, G, and H have discontinuous or divergent m-th derivatives, the transition between the respective phases is said to be m-th order. Modern theories of phase transitions generally only recognize two possibilities: first order transitions, where the order parameter changes discontinuously through the transition, and second order transitions, where the order parameter vanishes continuously at the boundary from ordered to disordered phases. We'll discuss order parameters during Physics 140B.
For a more interesting phase diagram, see Fig. [pdiaga](b,c), which displays the phase diagrams for ³He and ⁴He. The only difference between these two atoms is that the former has one fewer neutron: (2p + 1n + 2e) in ³He versus (2p + 2n + 2e) in ⁴He. As we shall learn when we study quantum statistics, this extra neutron makes all the difference, because ³He is a fermion while ⁴He is a boson.

[pdiaga] (a) Typical thermodynamic phase diagram of a single component p-V-T system, showing triple point (three phase coexistence) and critical point. (Source: Univ. of Helsinki.) Also shown: phase diagrams for ³He (b) and ⁴He (c). What a difference a neutron makes! (Source: Britannica.)

p-v-T surfaces
The equation of state for a single component system may be written as

f (p, v, T ) = 0 . (2.11.1)

This may in principle be inverted to yield p = p(v, T ) or v = v(T , p) or T = T (p, v) . The single constraint f (p, v, T ) on the three state variables defines a surface in {p, v, T } space. An example of
such a surface is shown in Fig. [PVTideal], for the ideal gas.
Real p-v-T surfaces are much richer than that for the ideal gas, because real systems undergo phase transitions in which thermodynamic properties are singular or discontinuous along certain curves on the p-v-T surface. An example is shown in Fig. [PVTa]. The high temperature isotherms resemble those of the ideal gas, but as one cools below the critical temperature T_c, the isotherms become singular. Precisely at T = T_c, the isotherm p = p(v, T_c) becomes perfectly horizontal at v = v_c, which is the critical molar volume. This means that the isothermal compressibility, κ_T = −(1/v)(∂v/∂p)_T, diverges at T = T_c. Below T_c, the isotherms have a flat portion, as shown in Fig. [PVTb], corresponding to a two-phase region where liquid and vapor coexist. In the (p,T) plane, sketched for H₂O in Fig. [H2Opd] and shown for CO₂ in Fig. [PTCO2], this liquid-vapor phase coexistence occurs along a curve, called the vaporization (or boiling) curve. The density changes discontinuously across this curve; for H₂O, the liquid is approximately 1000 times denser than the vapor at atmospheric pressure. The density discontinuity vanishes at the critical point. Note that one can continuously transform between liquid and vapor phases, without encountering any phase transitions, by going around the critical point and avoiding the two-phase region.

[PVTideal] The surface p(v, T ) = RT /v corresponding to the ideal gas equation of state, and its projections onto the (p, T ), (p, v), and (T , v) planes.
In addition to liquid-vapor coexistence, solid-liquid and solid-vapor coexistence also occur, as shown in Fig. [PVTa]. The triple point (T_t, p_t) lies at the confluence of these three coexistence regions. For H₂O, the locations of the triple point and critical point are given by

\[\begin{aligned} T_t &= 273.16\ {\rm K}\ , & T_c &= 647\ {\rm K} \\ p_t &= 611.7\ {\rm Pa} = 6.037\times10^{-3}\ {\rm atm}\ , & p_c &= 22.06\ {\rm MPa} = 217.7\ {\rm atm}\ . \end{aligned}\]

2.11.1 https://phys.libretexts.org/@go/page/18863
[PVTa] A p-v -T surface for a substance which contracts upon freezing. The red dot is the critical point and the red dashed line is the critical isotherm. The yellow dot is the triple point at which there
is three phase coexistence of solid, liquid, and vapor.

The Clausius-Clapeyron relation


Recall that the homogeneity of E(S,V,N) guaranteed E = TS − pV + μN, from Euler's theorem. It also guarantees a relation between the intensive variables T, p, and μ, according to Equation [GDRa]. Let us define g ≡ G/ν = N_A μ, the Gibbs free energy per mole. Then

\[ dg = -s\,dT + v\,dp\ , \quad (2.11.2) \]

where s = S/ν and v = V/ν are the molar entropy and molar volume, respectively. Along a coexistence curve between phase #1 and phase #2, we must have g₁ = g₂, since the phases are free to exchange energy and particle number; hence they are in thermal and chemical equilibrium. This means

\[ dg_1 = -s_1\,dT + v_1\,dp = -s_2\,dT + v_2\,dp = dg_2\ . \quad (2.11.3) \]

Therefore, along the coexistence curve we must have

\[ \Big({dp\over dT}\Big)_{\rm coex} = {s_2 - s_1\over v_2 - v_1} = {\ell\over T\,\Delta v}\ , \quad (2.11.4) \]

where

\[ \ell \equiv T\,\Delta s = T\,(s_2 - s_1) \quad (2.11.5) \]

is the molar latent heat of transition. A heat ℓ must be supplied in order to change from phase #1 to phase #2, even without changing p or T. If ℓ is the latent heat per mole, then we write ℓ̃ as the latent heat per gram: ℓ̃ = ℓ/M, where M is the molar mass.

[PVTc] Equation of state for a substance which expands upon freezing, projected to the (v, T ) and (v, p) and (T , p) planes.
Along the liquid-gas coexistence curve, we typically have v_gas ≫ v_liquid, and assuming the vapor is ideal, we may write Δv ≈ v_gas ≈ RT/p. Thus,

\[ \Big({dp\over dT}\Big)_{\rm liq-gas} = {\ell\over T\,\Delta v} \approx {p\,\ell\over RT^2}\ . \quad (2.11.6) \]

If ℓ remains constant throughout a section of the liquid-gas coexistence curve, we may integrate the above equation to get

\[ {dp\over p} = {\ell\over R}\,{dT\over T^2} \quad\Longrightarrow\quad p(T) = p(T_0)\,e^{\ell/RT_0}\,e^{-\ell/RT}\ . \quad (2.11.7) \]
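As an illustration of Equation (2.11.7), the sketch below applies the integrated curve to water vapor, with an assumed constant latent heat ℓ ≈ 40.7 kJ/mol (not from the text) and the normal boiling point as the reference state:

```python
# Integrated Clausius-Clapeyron curve: p(T) = p0 exp(l/R T0) exp(-l/R T).
from math import exp

R = 8.314              # J / (mol K)
l = 40660.0            # J/mol, latent heat of vaporization of water (assumed constant)
T0, p0 = 373.15, 1.0   # reference: boiling at 1 atm

def p_vap(T):
    return p0 * exp(l / (R * T0)) * exp(-l / (R * T))

assert abs(p_vap(T0) - 1.0) < 1e-12      # curve passes through the reference point
assert 0.65 < p_vap(363.15) < 0.75       # ~0.70 atm at 90 C, near the measured value
```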

Liquid-solid line in H₂O

Life on planet earth owes much of its existence to a peculiar property of water: the solid is less dense than the liquid along the coexistence curve. For example, at T = 273.1 K and p = 1 atm,

\[ \tilde v_{\rm water} = 1.00013\ {\rm cm}^3/{\rm g}\ ,\qquad \tilde v_{\rm ice} = 1.0907\ {\rm cm}^3/{\rm g}\ . \quad (2.11.8) \]

The latent heat of the transition is ℓ̃ = 333 J/g = 79.5 cal/g. Thus,

\[ \Big({dp\over dT}\Big)_{\rm liq-sol} = {\tilde\ell\over T\,\Delta\tilde v} = {333\ {\rm J/g}\over(273.1\ {\rm K})\,(-9.05\times10^{-2}\ {\rm cm}^3/{\rm g})} = -1.35\times10^{8}\,{{\rm dyn}\over{\rm cm}^2\,{\rm K}} = -134\,{{\rm atm}\over{}^\circ{\rm C}}\ . \]

The negative slope of the melting curve is invoked to explain the movement of glaciers: as glaciers slide down a rocky slope, they generate enormous pressure at obstacles. Due to this pressure, the story goes, the melting temperature decreases, and the glacier melts around the obstacle, so it can flow past it, after which it refreezes. But it is not the case that the bottom of the glacier melts under the pressure, for consider a glacier of height h = 1 km. The pressure at the bottom is p ∼ gh/ṽ ∼ 10⁷ Pa, which is only about 100 atmospheres. Such a pressure can produce only a small shift in the melting temperature of about ΔT_melt = −0.75°C.
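The quoted slope follows directly from the numbers above; a minimal check (using the conversion 1 J/cm³ = 10 bar):

```python
# Slope of the ice-water coexistence curve from the quoted data.
l_tilde = 333.0           # J/g, latent heat of melting
T = 273.1                 # K
dv = 1.00013 - 1.0907     # cm^3/g (negative: ice is less dense than water)

dpdT_J_per_cm3K = l_tilde / (T * dv)          # J/(cm^3 K)
dpdT_atm_per_K = dpdT_J_per_cm3K * 10 / 1.01325   # 1 J/cm^3 = 10 bar; bar -> atm

assert -140 < dpdT_atm_per_K < -128           # text quotes about -134 atm/C
```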

[PVTb] Projection of the p-v -T surface of Fig. [PVTa] onto the (v, p) plane.
Does the Clausius-Clapeyron relation explain how we can skate on ice? When my daughter was seven years old, she had a mass of about M = 20 kg. Her ice skates had blades of width about 5 mm and length about 10 cm. Thus, even on one foot, she imparted an additional pressure of only

\[ \Delta p = {Mg\over A} \approx {20\ {\rm kg}\times 9.8\ {\rm m/s}^2\over(5\times10^{-3}\ {\rm m})\times(10^{-1}\ {\rm m})} = 3.9\times10^{5}\ {\rm Pa} = 3.9\ {\rm atm}\ . \quad (2.11.9) \]

The corresponding change in the melting temperature is thus minuscule: ΔT_melt ≈ −0.03°C.
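The skating estimate is likewise elementary; the sketch below reproduces both the pressure and the melting-point shift, using the −134 atm/°C slope quoted above:

```python
# Extra pressure from a 20 kg skater on one blade, and the melting-point shift.
M, g = 20.0, 9.8                 # kg, m/s^2
A = 5e-3 * 0.10                  # blade area: 5 mm x 10 cm, in m^2

dp_Pa = M * g / A
dp_atm = dp_Pa / 1.01325e5
dT_melt = dp_atm / (-134.0)      # slope of the melting curve, atm per degree C

assert 3.5 < dp_atm < 4.2        # ~3.9 atm, as quoted
assert -0.05 < dT_melt < -0.02   # ~ -0.03 C: far too small to explain skating
```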
So why could my daughter skate so nicely? The answer isn't so clear! There seem to be two relevant issues in play. First, friction generates heat which can locally melt the surface of the ice. Second, the surface of ice, and of many solids, is naturally slippery. Indeed, this is the case for ice even if one is standing still, generating no frictional forces. Why is this so? It turns out that the Gibbs free energy of the ice-air interface is larger than the sum of free energies of ice-water and water-air interfaces. That is to say, ice, as well as many simple solids, prefers to have a thin layer of liquid on its surface, even at temperatures well below its bulk melting point. If the intermolecular interactions are not short-ranged, theory predicts a surface melt thickness d ∝ (T_m − T)^{−1/3}. In Fig. [surfmelt] we show measurements by Gilpin (1980) of the surface melt on ice, down to about −50°C. Near 0°C the melt layer thickness is about 40 nm, but this decreases to ∼1 nm at T = −35°C. At very low temperatures, skates stick rather than glide. Of course, the skate material is also important, since that will affect the energetics of the second interface. The 19th century novel Hans Brinker, or The Silver Skates by Mary Mapes Dodge tells the story of the poor but stereotypically decent and hardworking Dutch boy Hans Brinker, who dreams of winning an upcoming ice skating race, along with the top prize: a pair of silver skates. All he has are some lousy wooden skates, which won't do him any good in the race. He has money saved to buy steel skates, but of course his father desperately needs an operation because – I am not making this up – he fell off a dike and lost his mind. The family has no other way to pay for the doctor. What a story! At this point, I imagine the suspense must be too much for you to bear, but this isn't an American Literature class, so you can use Google to find out what happens (or rent the 1958 movie, directed by Sidney Lumet). My point here is that Hans' crappy wooden skates can't compare to the metal ones, even though the surface melt between the ice and the air is the same. The skate blade material also makes a difference, both for the interface energy and, perhaps more importantly, for the generation of friction as well.

Slow melting of ice: a quasistatic but irreversible process

Suppose we have an ice cube initially at temperature T₀ < Θ ≡ 273.15 K (Θ = 0°C) and we toss it into a pond of water. We regard the pond as a heat bath at some temperature T₁ > Θ. Let the mass of the ice be M. How much heat Q is absorbed by the ice in order to raise its temperature to T₁? Clearly

\[ Q = M\tilde c_{\rm S}\,(\Theta - T_0) + M\tilde\ell + M\tilde c_{\rm L}\,(T_1 - \Theta)\ , \]

where c̃_S and c̃_L are the specific heats of ice (solid) and water (liquid), respectively, and ℓ̃ is the latent heat of melting per unit mass. The pond must give up this much heat to the ice, hence the entropy of the pond, discounting the new water which will come from the melted ice, must decrease:

\[ \Delta S_{\rm pond} = -{Q\over T_1}\ . \quad (2.11.10) \]

[PTCO2] Phase diagram for CO₂ in the (p,T) plane. (Source: www.scifun.org.)

Now we ask what is the entropy change of the H₂O in the ice. We have

\[ \Delta S_{\rm ice} = \int\!{\delta Q\over T} = \int\limits_{T_0}^{\Theta}\!\!dT\,{M\tilde c_{\rm S}\over T} + {M\tilde\ell\over\Theta} + \int\limits_{\Theta}^{T_1}\!\!dT\,{M\tilde c_{\rm L}\over T} = M\tilde c_{\rm S}\ln\!\Big({\Theta\over T_0}\Big) + {M\tilde\ell\over\Theta} + M\tilde c_{\rm L}\ln\!\Big({T_1\over\Theta}\Big)\ . \]

The total entropy change of the system is then

\[\begin{aligned} \Delta S_{\rm total} &= \Delta S_{\rm pond} + \Delta S_{\rm ice} \\ &= M\tilde c_{\rm S}\ln\!\Big({\Theta\over T_0}\Big) - M\tilde c_{\rm S}\Big({\Theta - T_0\over T_1}\Big) + M\tilde\ell\,\Big({1\over\Theta} - {1\over T_1}\Big) + M\tilde c_{\rm L}\ln\!\Big({T_1\over\Theta}\Big) - M\tilde c_{\rm L}\Big({T_1 - \Theta\over T_1}\Big)\ . \end{aligned}\]

Now since T₀ < Θ < T₁, we have

\[ M\tilde c_{\rm S}\Big({\Theta - T_0\over T_1}\Big) < M\tilde c_{\rm S}\Big({\Theta - T_0\over\Theta}\Big)\ . \]

Therefore,

\[ \Delta S > M\tilde\ell\,\Big({1\over\Theta} - {1\over T_1}\Big) + M\tilde c_{\rm S}\,f\big(T_0/\Theta\big) + M\tilde c_{\rm L}\,f\big(\Theta/T_1\big)\ , \]

where f(x) = x − 1 − ln x. Clearly f′(x) = 1 − x⁻¹ is negative on the interval (0,1), which means that the maximum of f(x) occurs at x = 0 and the minimum at x = 1. But f(0) = ∞ and f(1) = 0, which means that f(x) ≥ 0 for x ∈ [0,1]. Since T₀ < Θ < T₁, we conclude ΔS_total > 0.
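The inequality ΔS_total > 0 can also be checked numerically. The sketch below uses assumed textbook values for the specific heats of ice and water (not quoted in this excerpt):

```python
# Total entropy change for ice at T0 tossed into a pond at T1, per the formula above.
from math import log

c_S, c_L = 2.09, 4.18    # J/(g K): specific heats of ice and water (assumed values)
l = 333.0                # J/g, latent heat of melting
Theta = 273.15           # K

def dS_total(M, T0, T1):
    dS_ice = M * (c_S * log(Theta / T0) + l / Theta + c_L * log(T1 / Theta))
    Q = M * (c_S * (Theta - T0) + l + c_L * (T1 - Theta))
    return dS_ice - Q / T1   # pond loses Q at temperature T1

assert dS_total(10.0, 263.15, 283.15) > 0.0
assert dS_total(1.0, 272.15, 274.15) > 0.0   # positive even close to reversibility
```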

[surfmelt] Left panel: data from R. R. Gilpin, J. Colloid Interface Sci. 77, 435 (1980) showing measured thickness of the surface melt on ice at temperatures below 0°C. The straight line has slope −1/3, as predicted by theory. Right panel: phase diagram of H₂O, showing various high pressure solid phases. (Source: Physics Today, December 2005.)

Gibbs phase rule


Equilibrium between two phases means that p, T, and μ(p,T) are identical. From

\[ \mu_1(p,T) = \mu_2(p,T)\ , \quad (2.11.11) \]

we derive an equation for the slope of the coexistence curve, the Clausius-Clapeyron relation. Note that we have one equation in two unknowns (T,p), so the solution set is a curve. For three phase coexistence, we have

\[ \mu_1(p,T) = \mu_2(p,T) = \mu_3(p,T)\ , \quad (2.11.12) \]

which gives us two equations in two unknowns. The solution is then a point (or a set of points). A critical point also is a solution of two simultaneous equations:

\[ \text{critical point} \quad\Longrightarrow\quad v_1(p,T) = v_2(p,T)\ ,\qquad \mu_1(p,T) = \mu_2(p,T)\ . \quad (2.11.13) \]

Recall v = N_A (∂μ/∂p)_T. Note that there can be no four phase coexistence for a simple p-V-T system.

Now for the general result. Suppose we have σ species, with particle numbers N_a, where a = 1,…,σ. It is useful to briefly recapitulate the derivation of the Gibbs-Duhem relation. The energy E(S,V,N₁,…,N_σ) is a homogeneous function of degree one:

\[ E(\lambda S, \lambda V, \lambda N_1, \ldots, \lambda N_\sigma) = \lambda\,E(S, V, N_1, \ldots, N_\sigma)\ . \quad (2.11.14) \]

From Euler's theorem for homogeneous functions (just differentiate with respect to λ and then set λ = 1), we have

\[ E = TS - pV + \sum_{a=1}^{\sigma}\mu_a N_a\ . \quad (2.11.15) \]

Taking the differential, and invoking the First Law,

\[ dE = T\,dS - p\,dV + \sum_{a=1}^{\sigma}\mu_a\,dN_a\ , \quad (2.11.16) \]

we arrive at the relation

\[ S\,dT - V\,dp + \sum_{a=1}^{\sigma} N_a\,d\mu_a = 0\ , \quad (2.11.17) \]

of which Equation [GDR] is a generalization to additional internal 'work' variables. This says that the σ + 2 quantities (T, p, μ₁, …, μ_σ) are not all independent. We can therefore write

\[ \mu_\sigma = \mu_\sigma\big(T, p, \mu_1, \ldots, \mu_{\sigma-1}\big)\ . \quad (2.11.18) \]

If there are φ different phases, then in each phase j, with j = 1,…,φ, there is a chemical potential μ_a^{(j)} for each species a. We then have

\[ \mu_\sigma^{(j)} = \mu_\sigma^{(j)}\big(T, p, \mu_1^{(j)}, \ldots, \mu_{\sigma-1}^{(j)}\big)\ . \quad (2.11.19) \]

Here μ_a^{(j)} is the chemical potential of the a-th species in the j-th phase. Thus, there are φ such equations relating the 2 + φσ variables (T, p, {μ_a^{(j)}}), meaning that only 2 + φ(σ−1) of them may be chosen as independent. This, then, is the dimension of 'thermodynamic space' containing a maximal number of intensive variables:

\[ d_{\rm TD}(\sigma,\varphi) = 2 + \varphi\,(\sigma - 1)\ . \]

To completely specify the state of our system, we of course introduce a single extensive variable, such as the total volume V. Note that the total particle number N = Σ_{a=1}^σ N_a may not be conserved in the presence of chemical reactions!
Now suppose we have equilibrium among φ phases. We have implicitly assumed thermal and mechanical equilibrium among all the phases, meaning that p and T are constant. Chemical equilibrium applies on a species-by-species basis. This means

\[ \mu_a^{(j)} = \mu_a^{(j')}\ , \quad (2.11.20) \]

where j, j′ ∈ {1,…,φ}. This gives σ(φ−1) independent equations. Thus, we can have phase equilibrium among the φ phases of σ species over a region of dimension

\[\begin{aligned} d_{\rm PE}(\sigma,\varphi) &= 2 + \varphi\,(\sigma - 1) - \sigma\,(\varphi - 1) \\ &= 2 + \sigma - \varphi\ . \end{aligned}\]

Since d_PE ≥ 0, we must have φ ≤ σ + 2. Thus, with two species (σ = 2), we could have at most four phase coexistence.
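The counting above can be wrapped in a two-line function (a sketch; the function name is ours):

```python
# Gibbs phase rule: dimension of the phase-equilibrium region.
def d_PE(sigma, phi, rho=0):
    """Degrees of freedom for sigma species, phi coexisting phases, rho reactions."""
    return 2 + sigma - phi - rho

assert d_PE(1, 2) == 1   # single-component coexistence curve (e.g. the boiling line)
assert d_PE(1, 3) == 0   # the triple point is an isolated point
assert d_PE(2, 4) == 0   # two species: at most four-phase coexistence
```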
If the various species can undergo ρ distinct chemical reactions of the form

\[ \zeta_1^{(r)} A_1 + \zeta_2^{(r)} A_2 + \cdots + \zeta_\sigma^{(r)} A_\sigma = 0\ , \quad (2.11.21) \]

where A_a is the chemical formula for species a, and ζ_a^{(r)} is the stoichiometric coefficient for the a-th species in the r-th reaction, with r = 1,…,ρ, then we have an additional ρ constraints of the form

\[ \sum_{a=1}^{\sigma}\zeta_a^{(r)}\,\mu_a^{(j)} = 0\ . \quad (2.11.22) \]

Therefore,

\[ d_{\rm PE}(\sigma,\varphi,\rho) = 2 + \sigma - \varphi - \rho\ . \]

One might ask what value of j are we to use in Equation 2.11.22, or do we in fact have φ such equations for each r? The answer is that Equation [phaseq] guarantees that the chemical potential of species a is the same in all the phases, hence it doesn't matter what value one chooses for j in Equation [reacon].

Let us assume that no reactions take place, ρ = 0, so the total number of particles Σ_{b=1}^σ N_b is conserved. Instead of choosing (T, p, μ₁, …, μ_{σ−1}) as d_TD intensive variables, we could have chosen (T, p, μ₁, x₁, …, x_{σ−1}), where x_a = N_a/N is the concentration of species a.
Why do phase diagrams in the (p,v) and (T,v) plane look different than those in the (p,T) plane? For example, Fig. [PVTc] shows projections of the p-v-T surface of a typical single component substance onto the (T,v), (p,v), and (p,T) planes. Coexistence takes place along curves in the (p,T) plane, but in extended two-dimensional regions in the (T,v) and (p,v) planes. The reason that p and T are special is that temperature, pressure, and chemical potential must be equal throughout an equilibrium phase if it is truly in thermal, mechanical, and chemical equilibrium. This is not the case for an intensive variable such as specific volume v = N_A V/N or chemical concentration x_a = N_a/N.

This page titled 2.11: Phase Transitions and Phase Equilibria is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.12: Entropy of Mixing and the Gibbs Paradox
Computing the entropy of mixing
Entropy is widely understood as a measure of disorder. Of course, such a definition should be supplemented by a more precise definition of disorder – after all, one man's trash is another man's treasure. To gain some intuition about entropy, let us explore the mixing of a multicomponent ideal gas. Let N = Σ_a N_a be the total number of particles of all species, and let x_a = N_a/N be the concentration of species a. Note that Σ_a x_a = 1.

For any substance obeying the ideal gas law pV = Nk_BT, the entropy is

\[ S(T,V,N) = N k_B\ln(V/N) + N\phi(T)\ , \quad (2.12.1) \]

since (∂S/∂V)_{T,N} = (∂p/∂T)_{V,N} = Nk_B/V. Note that in Equation [STVNideal] we have divided V by N before taking the logarithm. This is essential if the entropy is to be an extensive function (see §7.5). One might think that the configurational entropy of an ideal gas should scale as ln(V^N) = N ln V, since each particle can be anywhere in the volume V. However, if the particles are indistinguishable, then permuting the particle labels does not result in a distinct configuration, and so the configurational entropy is proportional to ln(V^N/N!) ∼ N ln(V/N) + N. The origin of this indistinguishability factor will become clear when we discuss the quantum mechanical formulation of statistical mechanics. For now, note that such a correction is necessary in order that the entropy be an extensive function.
If we did not include this factor and instead wrote S^*(T,V,N) = Nk_B ln V + Nφ(T), then we would find S^*(T,V,N) − 2S^*(T, ½V, ½N) = Nk_B ln 2: the total entropy of two identical systems of particles separated by a barrier will increase if the barrier is removed and they are allowed to mix. This seems absurd, though, because we could just as well regard the barriers as invisible. This is known as the Gibbs paradox. The resolution of the Gibbs paradox is to include the indistinguishability correction, which renders S extensive, in which case S(T,V,N) = 2S(T, ½V, ½N).
Consider now the situation in Fig. [boxes], where we have separated the different components into their own volumes V_a. Let the pressure and temperature be the same everywhere, so p V_a = N_a k_B T. The entropy of the unmixed system is then

\[ S_{\rm unmixed} = \sum_a S_a = \sum_a\Big[N_a k_B\ln(V_a/N_a) + N_a\phi_a(T)\Big]\ . \quad (2.12.2) \]

[boxes] A multicomponent system consisting of isolated gases, each at temperature T and pressure p. The system entropy increases when all the walls between the different subsystems are removed.
Now let us imagine removing all the barriers separating the different gases and letting the particles mix thoroughly. The result is that each component gas occupies the full volume V, so the entropy is

\[ S_{\rm mixed} = \sum_a S_a = \sum_a\Big[N_a k_B\ln(V/N_a) + N_a\phi_a(T)\Big]\ . \quad (2.12.3) \]

Thus, the entropy of mixing is

\[\begin{aligned} \Delta S_{\rm mix} &= S_{\rm mixed} - S_{\rm unmixed} \\ &= \sum_a N_a k_B\ln(V/V_a) = -N k_B\sum_a x_a\ln x_a\ , \end{aligned}\]

where x_a = N_a/N = V_a/V is the fraction of species a. Note that ΔS_mix ≥ 0.
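A minimal numerical illustration of −k_B Σ_a x_a ln x_a (per particle, with k_B = 1):

```python
# Ideal entropy of mixing per particle, -sum_a x_a ln x_a, in units of k_B.
from math import log

def s_mix(concentrations):
    assert abs(sum(concentrations) - 1.0) < 1e-12   # the x_a must sum to 1
    return -sum(x * log(x) for x in concentrations if x > 0)

assert abs(s_mix([0.5, 0.5]) - log(2)) < 1e-12   # equimolar binary: ln 2 per particle
assert s_mix([1.0]) == 0.0                       # a single species: no mixing entropy
assert s_mix([0.25] * 4) > s_mix([0.7, 0.1, 0.1, 0.1])  # uniform mix maximizes it
```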
What if all the components were initially identical? It seems absurd that the entropy should increase simply by removing some invisible barriers. This is again the Gibbs paradox. In this case, the resolution of the paradox is to note that the sum in the expression for S_mixed is a sum over distinct species. Hence if the particles are all identical, we have S_mixed = Nk_B ln(V/N) + Nφ(T) = S_unmixed, hence ΔS_mix = 0.

Entropy and combinatorics


As we shall learn when we study statistical mechanics, the entropy may be interpreted in terms of the number of ways W(E,V,N) a system at fixed energy and volume can arrange itself. One has

\[ S(E,V,N) = k_B\ln W(E,V,N)\ . \quad (2.12.4) \]

Consider a system consisting of σ different species of particles. Now let it be that for each species label a, N_a particles of that species are confined among Q_a little boxes such that at most one particle can fit in a box (see Fig. [Smix]). How many ways W are there to configure N identical particles among Q boxes? Clearly

\[ W = \binom{Q}{N} = {Q!\over N!\,(Q-N)!}\ . \quad (2.12.5) \]

Were the particles distinct, we'd have W_distinct = Q!/(Q−N)!, which is N! times greater. This is because permuting distinct particles results in a different configuration, and there are N! ways to permute N particles.
N particles.
Q
The entropy for species a is then S a = kB ln Wa = kB ln(
Na
a
) . We then use Stirling’s approximation,

1 1 −1
ln(K!) = K ln K − K + ln K + ln(2π) + O(K ) , (2.12.6)
2 2

which is an asymptotic expansion valid for K ≫ 1 . One then finds for Q, N ≫ 1 , with x = N /Q ∈ [0, 1],
Q
ln ( ) = (Q ln Q − Q) − (xQ ln(xQ) − xQ) − ((1 − x)Q ln ((1 − x)Q) − (1 − x)Q)
N

= −Q[x ln x + (1 − x) ln(1 − x)] .
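The quality of this leading Stirling estimate is easy to test against the exact log-binomial (via lgamma); a sketch:

```python
# Compare ln C(Q, N) with the leading estimate -Q[x ln x + (1-x) ln(1-x)].
from math import lgamma, log

def ln_binom(Q, N):
    return lgamma(Q + 1) - lgamma(N + 1) - lgamma(Q - N + 1)

def stirling_estimate(Q, N):
    x = N / Q
    return -Q * (x * log(x) + (1 - x) * log(1 - x))

Q, N = 10**6, 3 * 10**5
exact, approx = ln_binom(Q, N), stirling_estimate(Q, N)
# The neglected terms are O(ln Q), hence a tiny relative error for large Q
assert abs(exact - approx) / exact < 1e-4
```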

This is valid up to terms of order ln Q in Stirling's expansion. Since ln Q ≪ Q, the next term is small and we are safe to stop here. Summing up the contributions from all the species, we get

\[ S_{\rm unmixed} = k_B\sum_{a=1}^{\sigma}\ln W_a = -k_B\sum_{a=1}^{\sigma} Q_a\big[x_a\ln x_a + (1-x_a)\ln(1-x_a)\big]\ , \quad (2.12.7) \]

2.12.1 https://phys.libretexts.org/@go/page/18864
where x a = Na / Qa is the initial dimensionless density of species a .

[Smix] Mixing among three different species of particles. The mixed configuration has an additional entropy, ΔS mix .
Now let’s remove all the partitions between the different species so that each of the particles is free to explore all of the boxes. There are Q = ∑ a
Qa boxes in all. The total number of ways of placing
N particles of species a = 1 through N particles of species σ is
σ
1

Q!
W = , (2.12.8)
mixed
N ! N ! ⋯ Nσ !
0 1

where N 0
= Q −∑
σ

a=1
Na is the number of vacant boxes. Again using Stirling’s rule, we find
\[ S_{\rm mixed} = -k_B\,Q\sum_{a=0}^{\sigma}\tilde x_a\ln\tilde x_a\ , \quad (2.12.9) \]

where x̃_a = N_a/Q is the fraction of all boxes containing a particle of species a, and N₀ is the number of empty boxes. Note that

\[ \tilde x_a = {N_a\over Q} = {N_a\over Q_a}\cdot{Q_a\over Q} = x_a f_a\ , \quad (2.12.10) \]

where f_a ≡ Q_a/Q. Note that Σ_{a=1}^σ f_a = 1.
Qa Na
Let’s assume all the densities are initially the same, so x a = x∀a , so x̃ a
= x fa . In this case, f a =
Q
=
N
is the fraction of species a among all the particles. We then have x̃ 0
= 1 −x , and
σ

S = −kB Q ∑ x fa ln(x fa ) − kB Q x̃0 ln x̃0


mixed

a=1

= −kB Q[x ln x + (1 − x) ln(1 − x)] − kB x Q ∑ fa ln fa .

a=1

Thus, the entropy of mixing is

\[ \Delta S_{\rm mix} = -N k_B\sum_{a=1}^{\sigma} f_a\ln f_a\ , \quad (2.12.11) \]

where N = Σ_{a=1}^σ N_a is the total number of particles among all species (excluding vacancies) and f_a = N_a/N is the fraction of all particles which are of species a.

Weak solutions and osmotic pressure


Suppose one of the species is much more plentiful than all the others, and label it with a = 0 . We will call this the solvent. The entropy of mixing is then
$$\Delta S_{\rm mix} = -k_B\left[\,N_0\ln\!\bigg(\frac{N_0}{N_0+N'}\bigg) + \sum_{a=1}^\sigma N_a\ln\!\bigg(\frac{N_a}{N_0+N'}\bigg)\right]\ , \tag{2.12.12}$$

where $N' = \sum_{a=1}^\sigma N_a$ is the total number of solute molecules, summed over all solute species. We assume the solution is weak, which means $N_a \le N' \ll N_0$. Expanding in powers of $N'/N_0$ and $N_a/N_0$, we find
$$\Delta S_{\rm mix} = -k_B\sum_{a=1}^\sigma\left[\,N_a\ln\!\bigg(\frac{N_a}{N_0}\bigg) - N_a\right] + \mathcal{O}\big(N'^2/N_0\big)\ . \tag{2.12.13}$$

Consider now a solution consisting of $N_0$ molecules of a solvent and $N_a$ molecules of species $a$ of solute, where $a = 1,\ldots,\sigma$. We begin by expanding the Gibbs free energy $G(T,p,N_0,N_1,\ldots,N_\sigma)$, where there are $\sigma$ species of solutes, as a power series in the small quantities $N_a$. We have

$$G\big(T,p,N_0,\{N_a\}\big) = N_0\,g_0(T,p) + k_BT\sum_a N_a\ln\!\bigg(\frac{N_a}{eN_0}\bigg) + \sum_a N_a\,\psi_a(T,p) + \frac{1}{2N_0}\sum_{a,b} A_{ab}(T,p)\,N_a N_b\ .$$

The first term on the RHS corresponds to the Gibbs free energy of the solvent. The second term is due to the entropy of mixing. The third term is the contribution to the total free energy from the
individual species. Note the factor of e in the denominator inside the logarithm, which accounts for the second term in the brackets on the RHS of Equation 2.12.13. The last term is due to interactions
between the species; it is truncated at second order in the solute numbers.
The chemical potential for the solvent is
$$\mu_0(T,p) = \frac{\partial G}{\partial N_0} = g_0(T,p) - k_BT\sum_a x_a - \frac{1}{2}\sum_{a,b} A_{ab}(T,p)\,x_a x_b\ , \tag{2.12.14}$$

and the chemical potential for species a is


$$\mu_a(T,p) = \frac{\partial G}{\partial N_a} = k_BT\ln x_a + \psi_a(T,p) + \sum_b A_{ab}(T,p)\,x_b\ , \tag{2.12.15}$$

where $x_a = N_a/N_0$ is the concentration of solute species $a$. By assumption, the last term on the RHS of each of these equations is small, since $N_{\rm solute} \ll N_0$, where $N_{\rm solute} = \sum_{a=1}^\sigma N_a$ is the total number of solute molecules. To lowest order, then, we have

$$\begin{aligned} \mu_0(T,p) &= g_0(T,p) - x\,k_BT \\ \mu_a(T,p) &= k_BT\ln x_a + \psi_a(T,p)\ , \end{aligned}$$
where $x = \sum_a x_a$ is the total solute concentration.

[osmotic] Osmotic pressure causes the column on the right side of the U-tube to rise higher than the column on the left by an amount Δh = π/ϱ g .
If we add sugar to a solution confined by a semipermeable membrane, the pressure increases! To see why, consider a situation where a rigid semipermeable membrane separates a solution (solvent plus solutes) from a pure solvent. There is energy exchange through the membrane, so the temperature is $T$ throughout. There is no volume exchange, however: $dV = dV' = 0$, hence the pressure need not be the same. Since the membrane is permeable to the solvent, we have that the chemical potential $\mu_0$ is the same on each side. This means

$$g_0(T,p_{\rm R}) - x\,k_BT = g_0(T,p_{\rm L})\ ,$$
where $p_{\rm L,R}$ is the pressure on the left and right sides of the membrane, and $x = N'/N_0$ is again the total solute concentration. This equation once again tells us that the pressure $p$ cannot be the same on both sides of the membrane. If the pressure difference is small, we can expand in powers of the osmotic pressure, $\pi \equiv p_{\rm R} - p_{\rm L}$, and we find
$$\pi = x\,k_BT\bigg/\bigg(\frac{\partial \mu_0}{\partial p}\bigg)_{\!T}\ . \tag{2.12.16}$$

But a Maxwell relation (§9) guarantees


$$\bigg(\frac{\partial \mu_0}{\partial p}\bigg)_{\!T,N} = \bigg(\frac{\partial V}{\partial N}\bigg)_{\!T,p} = v(T,p)/N_{\rm A}\ , \tag{2.12.17}$$
where $v(T,p)$ is the molar volume of the solvent.


Substituting into Equation 2.12.16, we obtain
$$\pi v = xRT\ , \tag{2.12.18}$$

which looks very much like the ideal gas law, even though we are talking about dense (but 'weak') solutions! The resulting pressure has a demonstrable effect, as sketched in Fig. [osmotic]. Consider a solution containing $\nu$ moles of sucrose (C$_{12}$H$_{22}$O$_{11}$) per kilogram (55.52 mol) of water at $30^\circ$C. We find $\pi = 2.5\,$atm when $\nu = 0.1$.

One might worry about the expansion in powers of $\pi$ when $\pi$ is much larger than the ambient pressure. But in fact the next term in the expansion is smaller than the first term by a factor of $\pi\kappa_T$, where $\kappa_T$ is the isothermal compressibility. For water one has $\kappa_T \approx 4.4\times 10^{-5}\,({\rm atm})^{-1}$, hence we can safely ignore the higher order terms in the Taylor expansion.

Effect of impurities on boiling and freezing points


Along the coexistence curve separating liquid and vapor phases, the chemical potentials of the two phases are identical:

$$\mu^0_{\rm L}(T,p) = \mu^0_{\rm V}(T,p)\ .$$

Here we write $\mu^0$ for $\mu$ to emphasize that we are talking about a phase with no impurities present. This equation provides a single constraint on the two variables $T$ and $p$, hence one can, in principle, solve to obtain $T^*_0(p)$, which is the equation of the liquid-vapor coexistence curve in the $(T,p)$ plane. Now suppose there is a solute present in the liquid. We then have
$$\mu_{\rm L}(T,p,x) = \mu^0_{\rm L}(T,p) - x\,k_BT\ ,$$

where x is the dimensionless solute concentration, summed over all species. The condition for liquid-vapor coexistence now becomes

$$\mu^0_{\rm L}(T,p) - x\,k_BT = \mu^0_{\rm V}(T,p)\ .$$

This will lead to a shift in the boiling temperature at fixed $p$. Assuming this shift is small, let us expand to lowest order in $\big(T - T^*_0(p)\big)$, writing

$$\mu^0_{\rm L}(T^*_0,p) + \bigg(\frac{\partial \mu^0_{\rm L}}{\partial T}\bigg)_{\!p}\big(T-T^*_0\big) - x\,k_BT = \mu^0_{\rm V}(T^*_0,p) + \bigg(\frac{\partial \mu^0_{\rm V}}{\partial T}\bigg)_{\!p}\big(T-T^*_0\big)\ .$$

Note that
$$\bigg(\frac{\partial \mu}{\partial T}\bigg)_{\!p,N} = -\bigg(\frac{\partial S}{\partial N}\bigg)_{\!T,p} \tag{2.12.19}$$

from a Maxwell relation deriving from the exactness of $dG$. Since $S$ is extensive, we can write $S = (N/N_{\rm A})\,s(T,p)$, where $s(T,p)$ is the molar entropy. Solving for $T^*$, we obtain
$$T^*(p,x) = T^*_0(p) + \frac{xR\,\big[T^*_0(p)\big]^2}{\ell_v(p)}\ , \tag{2.12.20}$$

where $\ell_v = T^*_0\cdot(s_{\rm V} - s_{\rm L})$ is the latent heat of the liquid-vapor transition. The shift $\Delta T^* = T^* - T^*_0$ is called the boiling point elevation.
[latentheats] Latent heats of fusion and vaporization at p = 1 atm.
Substance   Latent Heat of Fusion ℓ̃_f (J/g)   Melting Point (°C)   Latent Heat of Vaporization ℓ̃_v (J/g)   Boiling Point (°C)
C2H5OH                 108                        -114                            855                            78.3
NH3                    339                         -75                           1369                           -33.34
CO2                    184                         -57                            574                           -78
He                      –                           –                              21                          -268.93
H                       58                        -259                            455                          -253
Pb                    24.5                       372.3                            871                          1750
N2                    25.7                        -210                            200                          -196
O2                    13.9                        -219                            213                          -183
H2O                    334                           0                           2270                           100

As an example, consider seawater, which contains approximately 35 g of dissolved Na⁺Cl⁻ per kilogram of H₂O. The atomic masses of Na and Cl are 23.0 and 35.4, respectively, hence the total ionic concentration in seawater (neglecting everything but sodium and chlorine) is given by
$$x = \frac{2\cdot 35}{23.0 + 35.4}\bigg/\frac{1000}{18} \approx 0.022\ . \tag{2.12.21}$$

The latent heat of vaporization of H₂O at atmospheric pressure is $\ell = 40.7\,$kJ/mol, hence
$$\Delta T^* = \frac{(0.022)(8.3\,{\rm J/mol\,K})(373\,{\rm K})^2}{4.1\times 10^4\,{\rm J/mol}} \approx 0.6\,{\rm K}\ . \tag{2.12.22}$$

Put another way, the boiling point elevation of H₂O at atmospheric pressure is about $0.28^\circ$C per percent solute. We can express this as $\Delta T^* = Km$, where the molality $m$ is the number of moles of solute per kilogram of solvent. For H₂O, we find $K = 0.51\,^\circ$C kg/mol.
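The seawater estimate follows directly from Equation 2.12.20. A short sketch, using the input values quoted above:

```python
# Boiling-point elevation of seawater, Eq. 2.12.20: dT* = x R [T0*]^2 / ell_v.
R = 8.314        # J / (mol K)
T0 = 373.0       # boiling point of pure water, K
ell_v = 40.7e3   # latent heat of vaporization of water, J/mol

# 35 g NaCl (two ions per formula unit) per kilogram (1000/18 mol) of water
x = (2 * 35 / (23.0 + 35.4)) / (1000 / 18.0)
dT = x * R * T0**2 / ell_v
print(f"x = {x:.4f}, dT* = {dT:.2f} K")   # about 0.6 K
```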

Similar considerations apply at the freezing point, when we equate the chemical potential of the solvent plus solute to that of the pure solid. The latent heat of fusion for H₂O is $\ell_f = T^0_f\cdot(s_{\rm LIQUID} - s_{\rm SOLID}) = 6.01\,$kJ/mol. We thus predict a freezing point depression of $\Delta T^* = -xR\big[T^0_f\big]^2/\ell_f = -1.03\,^\circ{\rm C}\cdot x[\%]$. This can be expressed once again as $\Delta T^* = -Km$, with $K = 1.86\,^\circ$C kg/mol.

Binary solutions
Consider a binary solution, and write the Gibbs free energy $G(T,p,N_{\rm A},N_{\rm B})$ as
$$G(T,p,N_{\rm A},N_{\rm B}) = N_{\rm A}\,\mu^0_{\rm A}(T,p) + N_{\rm B}\,\mu^0_{\rm B}(T,p) + N_{\rm A} k_BT\ln\!\bigg(\frac{N_{\rm A}}{N_{\rm A}+N_{\rm B}}\bigg) + N_{\rm B} k_BT\ln\!\bigg(\frac{N_{\rm B}}{N_{\rm A}+N_{\rm B}}\bigg) + \lambda\,\frac{N_{\rm A} N_{\rm B}}{N_{\rm A}+N_{\rm B}}\ .$$

The first four terms on the RHS represent the free energy of the individual component fluids and the entropy of mixing. The last term is an interaction contribution. With λ > 0 , the interaction term
prefers that the system be either fully A or fully B. The entropy contribution prefers a mixture, so there is a competition. What is the stable thermodynamic state?

[bing] Gibbs free energy per particle for a binary solution as a function of concentration $x = x_{\rm B}$ of the B species (pure A at the left end $x=0$; pure B at the right end $x=1$), in units of the interaction parameter $\lambda$. Dark red curve: $T = 0.65\,\lambda/k_B > T_{\rm c}$; green curve: $T = \lambda/2k_B = T_{\rm c}$; blue curve: $T = 0.40\,\lambda/k_B < T_{\rm c}$. We have chosen $\mu^0_{\rm A} = 0.60\,\lambda - 0.50\,k_BT$ and $\mu^0_{\rm B} = 0.50\,\lambda - 0.50\,k_BT$. Note that the free energy $g(T,p,x)$ is not convex in $x$ for $T < T_{\rm c}$, indicating an instability and necessitating a Maxwell construction.


It is useful to write the Gibbs free energy per particle, $g(T,p,x) = G/(N_{\rm A}+N_{\rm B})$, in terms of $T$, $p$, and the concentration $x \equiv x_{\rm B} = N_{\rm B}/(N_{\rm A}+N_{\rm B})$ of species B (hence $x_{\rm A} = 1-x$ is the concentration of species A). Then
$$g(T,p,x) = (1-x)\,\mu^0_{\rm A} + x\,\mu^0_{\rm B} + k_BT\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda\,x(1-x)\ . \tag{2.12.23}$$

In order for the system to be stable against phase separation into relatively A -rich and B-rich regions, we must have that g(T , p, x) be a convex function of x. Our first check should be for a local
instability, spinodal decomposition. We have
$$\frac{\partial g}{\partial x} = \mu^0_{\rm B} - \mu^0_{\rm A} + k_BT\ln\!\bigg(\frac{x}{1-x}\bigg) + \lambda\,(1-2x) \tag{2.12.24}$$

and
$$\frac{\partial^2\! g}{\partial x^2} = \frac{k_BT}{x} + \frac{k_BT}{1-x} - 2\lambda\ . \tag{2.12.25}$$

The spinodal is given by the solution to the equation $\partial^2\! g/\partial x^2 = 0$, which is
$$T^*(x) = \frac{2\lambda}{k_B}\,x\,(1-x)\ . \tag{2.12.26}$$

Since $x(1-x)$ achieves its maximum value of $\frac{1}{4}$ at $x = \frac{1}{2}$, we have $T^* \le \lambda/2k_B$.
In Fig. [bing] we sketch the free energy $g(T,p,x)$ versus $x$ for three representative temperatures. For $T > \lambda/2k_B$, the free energy is everywhere convex in $x$. When $T < \lambda/2k_B$, the free energy resembles the blue curve in Fig. [bing], and the system is unstable to phase separation. The two phases are said to be immiscible, or, equivalently, there exists a solubility gap. To determine the coexistence curve, we perform a Maxwell construction, writing
$$\frac{g(x_2) - g(x_1)}{x_2 - x_1} = \frac{\partial g}{\partial x}\bigg|_{x_1} = \frac{\partial g}{\partial x}\bigg|_{x_2}\ . \tag{2.12.27}$$

Here, $x_1$ and $x_2$ are the boundaries of the two-phase region. These equations admit a symmetry of $x \leftrightarrow 1-x$, hence we can set $x_1 = x$ and $x_2 = 1-x$. We find
$$g(1-x) - g(x) = (1-2x)\,\big(\mu^0_{\rm B} - \mu^0_{\rm A}\big)\ , \tag{2.12.28}$$

and invoking eqns. [binmax] and [gpmax] we obtain the solution
$$T_{\rm coex}(x) = \frac{\lambda}{k_B}\cdot\frac{1-2x}{\ln\!\big(\frac{1-x}{x}\big)}\ . \tag{2.12.29}$$
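The spinodal (Equation 2.12.26) and coexistence (Equation 2.12.29) curves can be tabulated directly. A sketch in units where $\lambda = k_B = 1$ (the sample concentrations are arbitrary):

```python
import math

# Spinodal and coexistence temperatures for the symmetric binary solution,
# in units where lambda = k_B = 1; the critical point is at (x, T) = (1/2, 1/2).
def T_spinodal(x):
    return 2.0 * x * (1.0 - x)

def T_coex(x):
    if abs(x - 0.5) < 1e-12:
        return 0.5               # limiting value at the critical concentration
    return (1.0 - 2.0 * x) / math.log((1.0 - x) / x)

for x in (0.10, 0.25, 0.50):
    print(f"x = {x:.2f}:  T_coex = {T_coex(x):.3f},  T_spinodal = {T_spinodal(x):.3f}")
```

Note that $T_{\rm coex}(x) \ge T^*(x)$ everywhere, with equality only at the critical point; the region between the two curves is metastable.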

[binmupd] Upper panels: chemical potential shifts $\Delta\mu_\pm = \Delta\mu_{\rm A} \pm \Delta\mu_{\rm B}$ versus concentration $x = x_{\rm B}$. The dashed black line is the spinodal, and the solid black line the coexistence boundary. Temperatures range from $T = 0$ (dark blue) to $T = 0.6\,\lambda/k_B$ (red) in units of $0.1\,\lambda/k_B$. Lower panels: phase diagram in the $(T,\Delta\mu_\pm)$ planes. The black dot is the critical point.
The phase diagram for the binary system is shown in Fig. [binary]. For $T < T^*(x)$, the system is unstable, and spinodal decomposition occurs. For $T^*(x) < T < T_{\rm coex}(x)$, the system is metastable, just like the van der Waals gas in its corresponding regime. Real binary solutions behave qualitatively like the model discussed here, although the coexistence curve is generally not symmetric under $x \leftrightarrow 1-x$, and the single phase region extends down to low temperatures for $x \approx 0$ and $x \approx 1$. If $\lambda$ itself is temperature-dependent, there can be multiple solutions to eqns. [TSPINO] and [TCOEX]. For example, one could take


$$\lambda(T) = \frac{\lambda_0\,T^2}{T^2 + T_0^2}\ . \tag{2.12.30}$$

In this case, $k_BT > \lambda(T)$ at both high and low temperatures, and we expect the single phase region to be reentrant. Such a phenomenon occurs in water-nicotine mixtures, for example.

[binary] Phase diagram for the binary system. The black curve is the coexistence curve, and the dark red curve is the spinodal. A-rich material is to the left and B-rich to the right.
It is instructive to consider the phase diagram in the (T , μ) plane. We define the chemical potential shifts,
$$\begin{aligned} \Delta\mu_{\rm A} &\equiv \mu_{\rm A} - \mu^0_{\rm A} = k_BT\ln(1-x) + \lambda x^2 \\ \Delta\mu_{\rm B} &\equiv \mu_{\rm B} - \mu^0_{\rm B} = k_BT\ln x + \lambda\,(1-x)^2\ , \end{aligned}$$

and their sum and difference, $\Delta\mu_\pm \equiv \Delta\mu_{\rm A} \pm \Delta\mu_{\rm B}$. From the Gibbs-Duhem relation, we know that we can write $\mu_{\rm B}$ as a function of $T$, $p$, and $\mu_{\rm A}$. Alternately, we could write $\Delta\mu_\pm$ in terms of $T$, $p$, and $\Delta\mu_\mp$, so we can choose which among $\Delta\mu_+$ and $\Delta\mu_-$ we wish to use in our phase diagram. The results are plotted in Fig. [binmupd]. It is perhaps easiest to understand the phase diagram in the $(T,\Delta\mu_-)$ plane. At low temperatures, below $T = T_{\rm c} = \lambda/2k_B$, there is a first order phase transition at $\Delta\mu_- = 0$. For $T < T_{\rm c}$ and $\Delta\mu_-$ infinitesimally positive, the system is in the A-rich phase, but for $\Delta\mu_-$ infinitesimally negative, it is B-rich. The concentration $x = x_{\rm B}$ changes discontinuously across the phase boundary. The critical point lies at $(T,\Delta\mu_-) = (\lambda/2k_B\,,\,0)$.

If we choose $N = N_{\rm A} + N_{\rm B}$ to be the extensive variable, then fixing $N$ means $dN_{\rm A} + dN_{\rm B} = 0$. So at fixed $T$ and $p$,
$$dG\big|_{T,p} = \mu_{\rm A}\,dN_{\rm A} + \mu_{\rm B}\,dN_{\rm B} \qquad\Rightarrow\qquad dg\big|_{T,p} = -\Delta\mu_-\,dx\ . \tag{2.12.31}$$

Since $\Delta\mu_-(x,T) = \varphi(x,T) - \varphi(1-x,T) = -\Delta\mu_-(1-x,T)$, where $\varphi(x,T) = \lambda x^2 - k_BT\ln x$, we have that the coexistence boundary in the $(x,\Delta\mu_-)$ plane is simply the line $\Delta\mu_- = 0$, because $\int_x^{1-x}\!dx'\,\Delta\mu_-(x',T) = 0$.

Note also that there is no two-phase region in the (T , Δμ) plane; the phase boundary in this plane is a curve which terminates at a critical point. As we saw in §12, the same situation pertains in single
component (p, v, T ) systems. That is, the phase diagram in the (p, v) or (T , v) plane contains two-phase regions, but in the (p, T ) plane the boundaries between phases are one-dimensional curves.
Any two-phase behavior is confined to these curves, where the thermodynamic potentials are singular.
The phase separation can be seen in a number of systems. A popular example involves mixtures of water and ouzo or other anise-based liqueurs, such as arak and absinthe. Starting with the pure liqueur ($x = 1$), and at a temperature below the coexistence curve maximum, the concentration is diluted by adding water. Follow along on Fig. [binary] by starting at the point $(x = 1\,,\,k_BT/\lambda = 0.4)$ and moving to the left. Eventually, one hits the boundary of the two-phase region. At this point, the mixture turns milky, due to the formation of large droplets of the pure phases on either side of the coexistence region which scatter light, a process known as spontaneous emulsification. As one continues to dilute the solution with more water, eventually one passes all the way through the coexistence region, at which point the solution becomes clear once again and is described as a single phase.
What happens if $\lambda < 0$? In this case, both the entropy and the interaction energy prefer a mixed phase, and there is no instability to phase separation. The two fluids are said to be completely miscible. An example would be benzene, C₆H₆, and toluene, C₇H₈ (C₆H₅CH₃). The phase diagram would be blank, with no phase boundaries below the boiling transition, because the fluid could exist as a mixture in any proportion.

Figure [LVcoex] Gibbs free energy per particle $g$ for an ideal binary solution for temperatures $T \in [T^*_{\rm A}, T^*_{\rm B}]$. The Maxwell construction is shown for the case $T^*_{\rm A} < T < T^*_{\rm B}$. Right: phase diagram, showing two-phase region and distillation sequence in $(x,T)$ space.
Any fluid will eventually boil if the temperature is raised sufficiently high. Let us assume that the boiling points of our A and B fluids are $T^*_{\rm A,B}$, and without loss of generality let us take $T^*_{\rm A} < T^*_{\rm B}$ at some given fixed pressure. This means $\mu^{\rm L}_{\rm A}(T^*_{\rm A},p) = \mu^{\rm V}_{\rm A}(T^*_{\rm A},p)$ and $\mu^{\rm L}_{\rm B}(T^*_{\rm B},p) = \mu^{\rm V}_{\rm B}(T^*_{\rm B},p)$. What happens to the mixture? We begin by writing the free energies of the mixed liquid and mixed vapor phases as

$$\begin{aligned} g_{\rm L}(T,p,x) &= (1-x)\,\mu^{\rm L}_{\rm A}(T,p) + x\,\mu^{\rm L}_{\rm B}(T,p) + k_BT\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda_{\rm L}\,x(1-x) \\ g_{\rm V}(T,p,x) &= (1-x)\,\mu^{\rm V}_{\rm A}(T,p) + x\,\mu^{\rm V}_{\rm B}(T,p) + k_BT\big[x\ln x + (1-x)\ln(1-x)\big] + \lambda_{\rm V}\,x(1-x) \end{aligned}$$

Typically $\lambda_{\rm V} \approx 0$. Consider these two free energies as functions of the concentration $x$, at fixed $T$ and $p$. If the curves never cross, and $g_{\rm L}(x) < g_{\rm V}(x)$ for all $x \in [0,1]$, then the liquid is always the state of lowest free energy. This is the situation in the first panel of Fig. [LVcoex]. Similarly, if $g_{\rm V}(x) < g_{\rm L}(x)$ over this range, then the mixture is in the vapor phase throughout. What happens if the two curves cross at some value of $x$? This situation is depicted in the second panel of Fig. [LVcoex]. In this case, there is always a Maxwell construction which lowers the free energy throughout some range of concentration, and the system undergoes phase separation.

[FAZEO] Negative (left) and positive (right) azeotrope phase diagrams. From Wikipedia.
In an ideal fluid, we have $\lambda_{\rm L} = \lambda_{\rm V} = 0$, and setting $g_{\rm L} = g_{\rm V}$ requires
$$(1-x)\,\Delta\mu_{\rm A}(T,p) + x\,\Delta\mu_{\rm B}(T,p) = 0\ , \tag{2.12.32}$$

where $\Delta\mu_{\rm A/B}(T,p) = \mu^{\rm L}_{\rm A/B}(T,p) - \mu^{\rm V}_{\rm A/B}(T,p)$. Expanding the chemical potential about a given temperature $T^*$,

$$\mu(T,p) = \mu(T^*,p) - s(T^*,p)\,(T-T^*) - \frac{c_p(T^*,p)}{2T^*}\,(T-T^*)^2 + \ldots\ , \tag{2.12.33}$$

where we have used $\big(\frac{\partial\mu}{\partial T}\big)_{p,N} = -\big(\frac{\partial S}{\partial N}\big)_{T,p} = -s(T,p)$, the entropy per particle, and $\big(\frac{\partial s}{\partial T}\big)_{p,N} = c_p/T$. Thus, expanding $\Delta\mu_{\rm A/B}$ about $T^*_{\rm A/B}$, we have

$$\begin{aligned} \Delta\mu_{\rm A} \equiv \mu^{\rm L}_{\rm A} - \mu^{\rm V}_{\rm A} &= \big(s^{\rm V}_{\rm A} - s^{\rm L}_{\rm A}\big)\big(T - T^*_{\rm A}\big) + \frac{c^{\rm V}_{p{\rm A}} - c^{\rm L}_{p{\rm A}}}{2T^*_{\rm A}}\,\big(T - T^*_{\rm A}\big)^2 + \ldots \\ \Delta\mu_{\rm B} \equiv \mu^{\rm L}_{\rm B} - \mu^{\rm V}_{\rm B} &= \big(s^{\rm V}_{\rm B} - s^{\rm L}_{\rm B}\big)\big(T - T^*_{\rm B}\big) + \frac{c^{\rm V}_{p{\rm B}} - c^{\rm L}_{p{\rm B}}}{2T^*_{\rm B}}\,\big(T - T^*_{\rm B}\big)^2 + \ldots \end{aligned}$$

We assume $s^{\rm V}_{\rm A/B} > s^{\rm L}_{\rm A/B}$, the vapor phase has greater entropy per particle. Thus, $\Delta\mu_{\rm A/B}(T)$ changes sign from negative to positive as $T$ rises through $T^*_{\rm A/B}$. If we assume that these are the only sign changes for $\Delta\mu_{\rm A/B}(T)$ at fixed $p$, then Equation [muABbin] can only be solved for $T \in [T^*_{\rm A}, T^*_{\rm B}]$. This immediately leads to the phase diagram in the rightmost panel of Fig. [LVcoex].

[FAZZ] Free energies before Maxwell constructions for a binary fluid mixture in equilibrium with a vapor ($\lambda_{\rm V} = 0$). Panels show (a) $\lambda_{\rm L} = 0$ (ideal fluid), (b) $\lambda_{\rm L} < 0$ (miscible fluid; negative azeotrope), (c) $\lambda_{\rm L} > 0$ (positive azeotrope), (d) $\lambda_{\rm L} > 0$ (heteroazeotrope). Thick blue and red lines correspond to temperatures $T^*_{\rm A}$ and $T^*_{\rm B}$, respectively, with $T^*_{\rm A} < T^*_{\rm B}$. Thin blue and red curves are for temperatures outside the range $[T^*_{\rm A}, T^*_{\rm B}]$. The black curves show the locus of points where $g$ is discontinuous, where the liquid and vapor free energy curves cross. The yellow curve in (d) corresponds to the coexistence temperature for the fluid mixture. In this case the azeotrope forms within the coexistence region.
According to the Gibbs phase rule, with $\sigma = 2$, two-phase equilibrium ($\varphi = 2$) occurs along a subspace of dimension $d_{\rm PE} = 2 + \sigma - \varphi = 2$. Thus, if we fix the pressure $p$ and the concentration $x = x_{\rm B}$, liquid-gas equilibrium occurs at a particular temperature $T^*$, known as the boiling point. Since the liquid and the vapor with which it is in equilibrium at $T^*$ may have different composition, different values of $x$, one may distill the mixture to separate the two pure substances, as follows. First, given a liquid mixture of A and B, we bring it to boiling, as shown in the rightmost panel of Fig. [LVcoex]. The vapor is at a different concentration $x$ than the liquid (a lower value of $x$ if the boiling point of pure A is less than that of pure B, as shown). If we collect the vapor, the remaining fluid is at a higher value of $x$. The collected vapor is then condensed, forming a liquid at the lower $x$ value. This is then brought to a boil, and the resulting vapor is drawn off and condensed, and so on. The result is a purified A state. The remaining liquid is then at a higher B concentration. By repeated boiling and condensation, A and B can be separated. For liquid-vapor transitions, the upper curve, representing the lowest temperature at a given concentration for which the mixture is a homogeneous vapor, is called the dew point curve. The lower curve, representing the highest temperature at a given concentration for which the mixture is a homogeneous liquid, is called the bubble point curve. The same phase diagram applies to liquid-solid mixtures where both phases are completely miscible. In that case, the upper curve is called the liquidus, and the lower curve the solidus.
When a homogeneous liquid or vapor at concentration $x$ is heated or cooled to a temperature $T$ such that $(x,T)$ lies within the two-phase region, the mixture phase separates into the two end components $(x^*_{\rm L},T)$ and $(x^*_{\rm V},T)$, which lie on opposite sides of the boundary of the two-phase region, at the same temperature. The locus of points at constant $T$ joining these two points is called the tie line. To determine how much of each of these two homogeneous phases separates out, we use particle number conservation. If $\eta_{\rm L,V}$ is the fraction of the homogeneous liquid and homogeneous vapor phases present, then $\eta_{\rm L}\,x^*_{\rm L} + \eta_{\rm V}\,x^*_{\rm V} = x$, which says $\eta_{\rm L} = (x - x^*_{\rm V})/(x^*_{\rm L} - x^*_{\rm V})$ and $\eta_{\rm V} = (x - x^*_{\rm L})/(x^*_{\rm V} - x^*_{\rm L})$. This is known as the lever rule.
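The lever rule is a one-line computation. A minimal sketch (the tie-line endpoints below are hypothetical, chosen only for illustration):

```python
# Lever rule: phase fractions of a mixture of overall concentration x whose
# tie line at temperature T has endpoints x_L (liquid) and x_V (vapor).
def lever_rule(x, x_L, x_V):
    eta_L = (x - x_V) / (x_L - x_V)
    eta_V = (x - x_L) / (x_V - x_L)
    return eta_L, eta_V

eta_L, eta_V = lever_rule(0.30, 0.20, 0.60)
print(eta_L, eta_V)   # 0.75 of the material is liquid, 0.25 is vapor
```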

For many binary mixtures, the boiling point curve is as shown in Fig. [FAZEO]. Such cases are called azeotropes. For negative azeotropes, the maximum of the boiling curve lies above both $T^*_{\rm A,B}$. The free energy curves for this case are shown in panel (b) of Fig. [FAZZ]. For $x < x^*$, where $x^*$ is the azeotropic composition, one can distill A but not B. Similarly, for $x > x^*$ one can distill B but not A. The situation is different for positive azeotropes, where the minimum of the boiling curve lies below both $T^*_{\rm A,B}$, corresponding to the free energy curves in panel (c) of Fig. [FAZZ]. In this case, distillation (condensing and reboiling the collected vapor) from either side of $x^*$ results in the azeotrope. One can of course collect the fluid instead of the vapor. In general, for both positive and negative azeotropes, starting from a given concentration $x$, one can only arrive at pure A plus azeotrope (if $x < x^*$) or pure B plus azeotrope (if $x > x^*$). Ethanol (C₂H₅OH) and water (H₂O) form a positive azeotrope which is 95.6% ethanol and 4.4% water by weight. The individual boiling points are $T^*_{\rm C_2H_5OH} = 78.4^\circ$C and $T^*_{\rm H_2O} = 100^\circ$C, while the azeotrope boils at $T^*_{\rm AZ} = 78.2^\circ$C. No amount of distillation of this mixture can purify ethanol beyond the 95.6% level. To go beyond this level of purity, one must resort to azeotropic distillation, which involves introducing another component, such as benzene (or a less carcinogenic additive), which alters the molecular interactions.
To model the azeotrope system, we need to take $\lambda_{\rm L} \ne 0$, in which case one can find two solutions to the energy crossing condition $g_{\rm V}(x) = g_{\rm L}(x)$. With two such crossings come two Maxwell constructions, hence the phase diagrams in Fig. [FAZEO]. Generally, negative azeotropes are found in systems with $\lambda_{\rm L} < 0$, whereas positive azeotropes are found when $\lambda_{\rm L} > 0$. As we've seen, such repulsive interactions between the A and B components in general lead to a phase separation below a coexistence temperature $T_{\rm COEX}(x)$ given by Equation 2.12.29. What happens if the minimum boiling point lies within the coexistence region? This is the situation depicted in panel (d) of Fig. [FAZZ]. The system is then a liquid/vapor version of the solid/liquid eutectic (see Fig. [Feutectic]), and the minimum boiling point mixture is called a heteroazeotrope.

[Feutectic] Phase diagram for a eutectic mixture in which a liquid is in equilibrium
with two solid phases α and β. The same phase diagram holds for heteroazeotropes, where a vapor is in equilibrium with two liquid phases.

This page titled 2.12: Entropy of Mixing and the Gibbs Paradox is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

2.13: Some Concepts in Thermochemistry
Chemical reactions and the law of mass action
Suppose we have a chemical reaction among σ species, written as
$$\zeta_1 A_1 + \zeta_2 A_2 + \cdots + \zeta_\sigma A_\sigma = 0\ , \tag{2.13.1}$$

where

$A_a$ = chemical formula, $\zeta_a$ = stoichiometric coefficient.

For example, we could have


−3 H −N +2 NH =0 (3 H +N ⇌ 2 NH ) (2.13.2)
2 2 3 2 2 3

for which

$$\zeta({\rm H}_2) = -3\ , \quad \zeta({\rm N}_2) = -1\ , \quad \zeta({\rm NH}_3) = 2\ . \tag{2.13.3}$$

When $\zeta_a > 0$, the corresponding $A_a$ is a product; when $\zeta_a < 0$, the corresponding $A_a$ is a reactant. The bookkeeping of the coefficients $\zeta_a$ which ensures conservation of each individual species of atom in the reaction(s) is known as stoichiometry.

Now we ask: what are the conditions for equilibrium? At constant T and p, which is typical for many chemical reactions, the
conditions are that G(T , p, {N }) be a minimum. Now
a

$$dG = -S\,dT + V\,dp + \sum_a \mu_a\,dN_a\ , \tag{2.13.4}$$

so if we let the reaction go forward, we have dN a = ζa , and if it runs in reverse we have dN a = −ζa . Thus, setting dT = dp = 0 ,
we have the equilibrium condition
$$\sum_{a=1}^\sigma \zeta_a\,\mu_a = 0\ . \tag{2.13.5}$$

Let us investigate the consequences of this relation for ideal gases. The chemical potential of the $a^{\rm th}$ species is
$$\mu_a(T,p) = k_BT\,\phi_a(T) + k_BT\ln p_a\ . \tag{2.13.6}$$

Here $p_a = p\,x_a$ is the partial pressure of species $a$, where $x_a = N_a/\sum_b N_b$ is the dimensionless concentration of species $a$. Chemists sometimes write $x_a = [A_a]$ for the concentration of species $a$. In equilibrium we must have

$$\sum_a \zeta_a\Big[\ln p + \ln x_a + \phi_a(T)\Big] = 0\ , \tag{2.13.7}$$

which says

$$\sum_a \zeta_a\ln x_a = -\sum_a \zeta_a\ln p - \sum_a \zeta_a\,\phi_a(T)\ . \tag{2.13.8}$$

Exponentiating, we obtain the law of mass action:

$$\prod_a x_a^{\zeta_a} = p^{-\sum_a \zeta_a}\exp\!\bigg(-\sum_a \zeta_a\,\phi_a(T)\bigg) \equiv \kappa(p,T)\ . \tag{2.13.9}$$

The quantity $\kappa(p,T)$ is called the equilibrium constant. When $\kappa$ is large, the LHS of the above equation is large. This favors maximal concentration $x_a$ for the products ($\zeta_a > 0$) and minimal concentration $x_a$ for the reactants ($\zeta_a < 0$). This means that the equation REACTANTS $\rightleftharpoons$ PRODUCTS is shifted to the right, the products are plentiful and the reactants are scarce. When $\kappa$ is small, the LHS is small and the reaction is shifted to the left, the reactants are plentiful and the products are scarce. Remember we are describing equilibrium conditions here. Now we observe that reactions for which $\sum_a \zeta_a > 0$ shift to the left with increasing pressure and to the right with decreasing pressure, while for reactions with $\sum_a \zeta_a < 0$ the situation is reversed: they shift to
2.13.1 https://phys.libretexts.org/@go/page/18865
the right with increasing pressure and to the left with decreasing pressure. When $\sum_a \zeta_a = 0$ there is no shift upon increasing or decreasing pressure.
The rate at which the equilibrium constant changes with temperature is given by
$$\bigg(\frac{\partial \ln\kappa}{\partial T}\bigg)_{\!p} = -\sum_a \zeta_a\,\phi_a'(T)\ . \tag{2.13.10}$$

Now from Equation [mui] we have that the enthalpy per particle for species $a$ is
$$h_a = \mu_a - T\bigg(\frac{\partial \mu_a}{\partial T}\bigg)_{\!p}\ , \tag{2.13.11}$$
since $H = G + TS$ and $S = -\big(\frac{\partial G}{\partial T}\big)_p$. We find

$$h_a = -k_BT^2\,\phi_a'(T)\ , \tag{2.13.12}$$

and thus
$$\bigg(\frac{\partial \ln\kappa}{\partial T}\bigg)_{\!p} = \frac{\sum_a \zeta_a h_a}{k_BT^2} = \frac{\Delta h}{k_BT^2}\ , \tag{2.13.13}$$

where Δh is the enthalpy of the reaction, which is the heat absorbed or emitted as a result of the reaction.
When Δh > 0 the reaction is endothermic and the yield increases with increasing T . When Δh < 0 the reaction is exothermic and
the yield decreases with increasing T .
As an example, consider the reaction ${\rm H}_2 + {\rm I}_2 \rightleftharpoons 2\,{\rm HI}$. We have
$$\zeta({\rm H}_2) = -1\ , \quad \zeta({\rm I}_2) = -1\ , \quad \zeta({\rm HI}) = 2\ . \tag{2.13.14}$$
2 2

Suppose our initial system consists of $\nu_1^0$ moles of ${\rm H}_2$, $\nu_2^0 = 0$ moles of ${\rm I}_2$, and $\nu_3^0$ moles of undissociated ${\rm HI}$. These mole numbers determine the initial concentrations $x_a^0$, where $x_a = \nu_a/\sum_b \nu_b$. Define
$$\alpha \equiv \frac{x_3^0 - x_3}{x_3^0}\ , \tag{2.13.15}$$

in which case we have


$$x_1 = x_1^0 + \tfrac{1}{2}\alpha\,x_3^0\ , \qquad x_2 = \tfrac{1}{2}\alpha\,x_3^0\ , \qquad x_3 = (1-\alpha)\,x_3^0\ . \tag{2.13.16}$$

Then the law of mass action gives


$$\frac{4\,(1-\alpha)^2}{\alpha\,(\alpha+2r)} = \kappa\ , \tag{2.13.17}$$

where $r \equiv x_1^0/x_3^0$. This yields a quadratic equation, which can be solved to find $\alpha(\kappa,r)$. Note that $\kappa = \kappa(T)$ for this reaction since $\sum_a \zeta_a = 0$. The enthalpy of this reaction is positive: $\Delta h > 0$.
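The quadratic can be solved explicitly. A sketch (the value $\kappa = 50$ is an arbitrary illustrative choice):

```python
import math

# Dissociated fraction alpha for H2 + I2 <=> 2 HI from Eq. 2.13.17:
#   4 (1 - a)^2 = kappa * a * (a + 2 r),   i.e.
#   (4 - kappa) a^2 - (8 + 2 kappa r) a + 4 = 0.
def dissociated_fraction(kappa, r):
    A = 4.0 - kappa
    B = -(8.0 + 2.0 * kappa * r)
    C = 4.0
    if abs(A) < 1e-12:                      # kappa = 4: equation becomes linear
        return -C / B
    disc = math.sqrt(B * B - 4.0 * A * C)
    roots = [(-B - disc) / (2.0 * A), (-B + disc) / (2.0 * A)]
    return next(a for a in roots if 0.0 <= a <= 1.0)   # physical root

alpha = dissociated_fraction(kappa=50.0, r=0.0)
print(f"alpha = {alpha:.4f}")
```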

Enthalpy of formation
Most chemical reactions take place under constant pressure. The heat $Q_{if}$ associated with a given isobaric process is
$$Q_{if} = \int_i^f\!dE + \int_i^f\!p\,dV = (E_f - E_i) + p\,(V_f - V_i) = H_f - H_i\ , \tag{2.13.18}$$

where H is the enthalpy,

H = E + pV . (2.13.19)

2.13.2 https://phys.libretexts.org/@go/page/18865
Note that the enthalpy $H$ is a state function, since $E$ is a state function and $p$ and $V$ are state variables. Hence, we can meaningfully speak of changes in enthalpy: $\Delta H = H_f - H_i$. If $\Delta H < 0$ for a given reaction, we call it exothermic – this is the case when $Q_{if} < 0$, and thus heat is transferred to the surroundings. Such reactions can occur spontaneously, and, in really fun cases, can produce explosions. The combustion of fuels is always exothermic. If $\Delta H > 0$, the reaction is called endothermic. Endothermic reactions require that heat be supplied in order for the reaction to proceed. Photosynthesis is an example of an endothermic reaction.
[dhtab] Enthalpies of formation of some common substances.

Formula   Name            State     ΔH_f^0 (kJ/mol)     Formula   Name                  State     ΔH_f^0 (kJ/mol)
Ag        Silver          crystal        0.0            NiSO4     Nickel sulfate        crystal       -872.9
C         Graphite        crystal        0.0            Al2O3     Aluminum oxide        crystal      -1657.7
C         Diamond         crystal        1.9            Ca3P2O8   Calcium phosphate     gas          -4120.8
O3        Ozone           gas          142.7            HCN       Hydrogen cyanide      liquid         108.9
H2O       Water           liquid      -285.8            SF6       Sulfur hexafluoride   gas          -1220.5
H3BO3     Boric acid      crystal    -1094.3            CaF2      Calcium fluoride      crystal      -1228.0
ZnSO4     Zinc sulfate    crystal     -982.8            CaCl2     Calcium chloride      crystal       -795.4

Suppose we have two reactions
$$A + B \xrightarrow{\ (\Delta H)_1\ } C \tag{2.13.20}$$
and
$$C + D \xrightarrow{\ (\Delta H)_2\ } E\ . \tag{2.13.21}$$
Then we may write
$$A + B + D \xrightarrow{\ (\Delta H)_3\ } E\ , \tag{2.13.22}$$
with
$$(\Delta H)_1 + (\Delta H)_2 = (\Delta H)_3\ . \tag{2.13.23}$$

We can use this additivity of reaction enthalpies to define a standard molar enthalpy of formation. We first define the standard state of a pure substance at a given temperature to be its state (gas, liquid, or solid) at a pressure $p = 1\,$bar. The standard reaction enthalpies at a given temperature are then defined to be the reaction enthalpies when the reactants and products are all in their standard states. Finally, we define the standard molar enthalpy of formation $\Delta H^0_f({\rm X})$ of a compound X at temperature $T$ as the reaction enthalpy for the compound X to be produced by its constituents when they are in their standard state. For example, if ${\rm X} = {\rm SO}_2$, then we write
$${\rm S} + {\rm O}_2 \xrightarrow{\ \Delta H^0_f[{\rm SO}_2]\ } {\rm SO}_2\ . \tag{2.13.24}$$

The enthalpy of formation of any substance in its standard state is zero at all temperatures, by definition:
$$\Delta H^0_f[{\rm O}_2] = \Delta H^0_f[{\rm He}] = \Delta H^0_f[{\rm K}] = \Delta H^0_f[{\rm Mn}] = 0\ .$$

[rxnenthalpy] Left panel: reaction enthalpy and activation energy (exothermic case shown). Right panel: reaction enthalpy as a
difference between enthalpy of formation of reactants and products.
Suppose now we have a reaction
$$a\,{\rm A} + b\,{\rm B} \xrightarrow{\ \Delta H\ } c\,{\rm C} + d\,{\rm D}\ . \tag{2.13.25}$$

To compute the reaction enthalpy $\Delta H$, we can imagine forming the components A and B from their standard state constituents. Similarly, we can imagine doing the same for C and D. Since the number of atoms of a given kind is conserved in the process, the constituents of the reactants must be the same as those of the products, and we have
$$\Delta H = -a\,\Delta H^0_f({\rm A}) - b\,\Delta H^0_f({\rm B}) + c\,\Delta H^0_f({\rm C}) + d\,\Delta H^0_f({\rm D})\ . \tag{2.13.26}$$

A list of a few enthalpies of formation is provided in table [dhtab]. Note that the reaction enthalpy is independent of the actual
reaction path. That is, the difference in enthalpy between A and B is the same whether the reaction is A ⟶ B or
A ⟶ X ⟶ (Y + Z) ⟶ B . This statement is known as Hess’s Law.
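Equation 2.13.26, combined with Hess's law, reduces any reaction enthalpy to a signed sum over a table of formation enthalpies. A minimal sketch in Python (the species and ΔH⁰_f values below are hypothetical placeholders, not entries from table [dhtab]):

```python
# Reaction enthalpy from standard molar enthalpies of formation,
# ΔH = Σ_products ν ΔH⁰_f − Σ_reactants ν ΔH⁰_f  (Eq. 2.13.26).
# The ΔH⁰_f values below are placeholders for illustration only.
dHf = {"A": -100.0, "B": -50.0, "C": -300.0, "D": -75.0}   # kJ/mol, hypothetical

def reaction_enthalpy(reactants, products, dHf):
    """reactants/products: dict of species -> stoichiometric coefficient."""
    dH_prod = sum(nu * dHf[X] for X, nu in products.items())
    dH_reac = sum(nu * dHf[X] for X, nu in reactants.items())
    return dH_prod - dH_reac

# a A + b B -> c C + d D with a=1, b=2, c=1, d=1:
dH = reaction_enthalpy({"A": 1, "B": 2}, {"C": 1, "D": 1}, dHf)
print(dH)   # (-300 - 75) - (-100 - 100) = -175.0 kJ
```

Because ΔH⁰_f is a state function, the same sum is obtained along any reaction path, which is Hess's Law in computational form.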

Note that

dH = dE + p dV + V dp = đQ + V dp ,  (2.13.27)

hence

C_p = ( đQ/dT )_p = ( ∂H/∂T )_p .  (2.13.28)

We therefore have

H(T, p, ν) = H(T₀, p, ν) + ν ∫_{T₀}^{T} dT′ c_p(T′) .  (2.13.29)

For ideal gases, we have c_p(T) = (1 + ½ f) R. For real gases, over a range of temperatures, there are small variations:

c_p(T) = α + β T + γ T² .  (2.13.30)

Two examples (300 K < T < 1500 K , p = 1 atm):

O₂ :   α = 25.503 J/mol K ,  β = 13.612 × 10⁻³ J/mol K² ,  γ = −42.553 × 10⁻⁷ J/mol K³
H₂O :  α = 30.206 J/mol K ,  β = 9.936 × 10⁻³ J/mol K² ,  γ = 11.14 × 10⁻⁷ J/mol K³
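With c_p(T) = α + βT + γT², the integral in Equation 2.13.29 can be done in closed form. A quick check in Python, using the O₂ coefficients above (per mole, so ν = 1):

```python
# ΔH = ∫_{T0}^{T} c_p(T') dT' with c_p(T) = α + β T + γ T²  (per mole).
# O₂ coefficients from the text (valid for 300 K < T < 1500 K, p = 1 atm):
alpha, beta, gamma = 25.503, 13.612e-3, -42.553e-7   # J/mol K, J/mol K², J/mol K³

def delta_h(T0, T):
    """Closed-form integral of c_p from T0 to T, in J/mol."""
    return (alpha * (T - T0)
            + beta / 2 * (T**2 - T0**2)
            + gamma / 3 * (T**3 - T0**3))

print(delta_h(300.0, 1000.0))   # ≈ 2.27e4 J/mol, i.e. about 22.7 kJ/mol
```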

If all the gaseous components in a reaction can be approximated as ideal, then we may write

(ΔH)rxn = (ΔE )rxn + ∑ ζa RT , (2.13.31)


a

where the subscript ‘rxn’ stands for ‘reaction’. Here (ΔE) rxn is the change in energy from reactants to products.

2.13.4 https://phys.libretexts.org/@go/page/18865
[enthtab] Average bond enthalpies for some common bonds. (Source: L. Pauling, The Nature of the Chemical Bond (Cornell Univ. Press, NY,
1960).)
enthalpy enthalpy enthalpy enthalpy

bond (kJ/mol) bond (kJ/mol) bond (kJ/mol) bond (kJ/mol)

H−H 436 C −C 348 C −S 259 F −F 155

H−C 412 C = C 612 N −N 163 F − Cl 254

H−N 388 C ≡ C 811 N = N 409 Cl − Br 219

H−O 463 C −N 305 N ≡ N 945 Cl − I 210

H −F 565 C = N 613 N −O 157 Cl − S 250

H − Cl 431 C ≡ N 890 N −F 270 Br − Br 193

H − Br 366 C −O 360 N − Cl 200 Br − I 178

H−I 299 C = O 743 N − Si 374 Br − S 212

H−S 338 C−F 484 O −O 146 I −I 151

H−P 322 C − Cl 338 O = O 497 S −S 264

H − Si 318 C − Br 276 O −F 185 P −P 172

C −I 238 O − Cl 203 Si − Si 176

Bond enthalpies
The enthalpy needed to break a chemical bond is called the bond enthalpy, h[ ∙ ]. The bond enthalpy is the energy required to
dissociate one mole of gaseous bonds to form gaseous atoms. A table of bond enthalpies is given in Tab. [enthtab]. Bond enthalpies
are endothermic, since energy is required to break chemical bonds. Of course, the actual bond energies can depend on the location
of a bond in a given molecule, and the values listed in the table reflect averages over the possible bond environment.
The bond enthalpies in Tab. [enthtab] may be used to compute reaction enthalpies. Consider, for example, the reaction
2 H₂(g) + O₂(g) ⟶ 2 H₂O(l). We then have, from the table,

(ΔH)_rxn = 2 h[H−H] + h[O=O] − 4 h[H−O] = −483 kJ/mol O₂ .

Thus, 483 kJ of heat would be released for every two moles of H₂O produced, if the H₂O were in the gaseous phase. Since H₂O is liquid at STP, we should also include the condensation energy of the gaseous water vapor into liquid water. At T = 100°C the latent heat of vaporization is ℓ̃ = 2270 J/g, but at T = 20°C one has ℓ̃ = 2450 J/g, hence with M = 18 g/mol we have ℓ = 44.1 kJ/mol. Therefore, the heat produced by the reaction 2 H₂(g) + O₂(g) ⇌ 2 H₂O(l) is (ΔH)_rxn = −571.2 kJ/mol O₂. Since the reaction produces two moles of water, we conclude that the enthalpy of formation of liquid water at STP is half this value: ΔH⁰_f[H₂O] = −285.6 kJ/mol.
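The arithmetic above can be reproduced directly from the bond-enthalpy table; a short Python check (values taken from Tab. [enthtab] and the latent heat quoted above):

```python
# Reaction enthalpy of 2 H₂(g) + O₂(g) → 2 H₂O from bond enthalpies (kJ/mol):
h = {"H-H": 436, "O=O": 497, "H-O": 463}   # values from Tab. [enthtab]

# Bonds broken (endothermic, +) minus bonds formed (exothermic, −):
dH_gas = 2 * h["H-H"] + h["O=O"] - 4 * h["H-O"]
print(dH_gas)          # -483 kJ per mole O₂, for gaseous product

# Include condensation to liquid water: ℓ = 44.1 kJ/mol at 20 °C.
ell = 44.1
dH_liq = dH_gas - 2 * ell
print(dH_liq)          # ≈ -571.2 kJ per mole O₂
print(-dH_liq / 2)     # ≈ 285.6 kJ/mol released per mole of liquid H₂O formed
```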

2.13.5 https://phys.libretexts.org/@go/page/18865
[ethene] Calculation of reaction enthalpy for the hydrogenation of ethene (ethylene), C₂H₄.
Consider next the hydrogenation of ethene (ethylene): C₂H₄ + H₂ ⇌ C₂H₆. The product is known as ethane. The energy accounting is shown in Fig. [ethene]. To compute the enthalpies of formation of ethene and ethane from the bond enthalpies, we need one more bit of information, which is the standard enthalpy of formation of C(g) from C(s), since the solid is the standard state at STP. This value is ΔH⁰_f[C(g)] = 718 kJ/mol. We may now write

2 C(g) + 4 H(g) ⟶ C₂H₄(g)    (−2260 kJ)
2 C(s) ⟶ 2 C(g)              (1436 kJ)
2 H₂(g) ⟶ 4 H(g)             (872 kJ) .

Thus, using Hess's law, adding up these reaction equations, we have

2 C(s) + 2 H₂(g) ⟶ C₂H₄(g)   (48 kJ) .  (2.13.32)

Thus, the formation of ethene is endothermic. For ethane,

2 C(g) + 6 H(g) ⟶ C₂H₆(g)    (−2820 kJ)
2 C(s) ⟶ 2 C(g)              (1436 kJ)
3 H₂(g) ⟶ 6 H(g)             (1308 kJ) .

Adding these, we obtain

2 C(s) + 3 H₂(g) ⟶ C₂H₆(g)   (−76 kJ) ,  (2.13.33)

which is exothermic.
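The three-step bookkeeping above is just Hess's law: summing the step enthalpies gives the net formation enthalpy. A sketch in Python, using the bond enthalpies of Tab. [enthtab] and ΔH⁰_f[C(g)] = 718 kJ/mol from the text:

```python
# Hess's law for the formation of ethene and ethane (kJ), using
# bond enthalpies from Tab. [enthtab] and ΔH⁰_f[C(g)] = 718 kJ/mol.
h = {"C-C": 348, "C=C": 612, "C-H": 412, "H-H": 436}
dHf_C_gas = 718

# Ethene C₂H₄: one C=C bond and four C−H bonds.
steps_ethene = [
    -(h["C=C"] + 4 * h["C-H"]),   # 2 C(g) + 4 H(g) → C₂H₄(g): bonds formed
    2 * dHf_C_gas,                # 2 C(s) → 2 C(g)
    2 * h["H-H"],                 # 2 H₂(g) → 4 H(g): bonds broken
]
print(sum(steps_ethene))   # 48 kJ: endothermic

# Ethane C₂H₆: one C−C bond and six C−H bonds.
steps_ethane = [
    -(h["C-C"] + 6 * h["C-H"]),   # 2 C(g) + 6 H(g) → C₂H₆(g)
    2 * dHf_C_gas,                # 2 C(s) → 2 C(g)
    3 * h["H-H"],                 # 3 H₂(g) → 6 H(g)
]
print(sum(steps_ethane))   # -76 kJ: exothermic
```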

This page titled 2.13: Some Concepts in Thermochemistry is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

2.14: Appendix I- Integrating Factors
Suppose we have an inexact differential

đW = A_i dx_i .  (2.14.1)

Here I am adopting the ‘Einstein convention’ where we sum over repeated indices unless otherwise explicitly stated; A_i dx_i = ∑_i A_i dx_i. An integrating factor e^{L(x)} is a function which, when divided into đW, yields an exact differential:

dU = e^{−L} đW = ( ∂U/∂x_i ) dx_i .  (2.14.2)

Clearly we must have

∂²U/∂x_i ∂x_j = (∂/∂x_i)( e^{−L} A_j ) = (∂/∂x_j)( e^{−L} A_i ) .  (2.14.3)

Applying the Leibniz rule and then multiplying by e^{L} yields

∂A_j/∂x_i − A_j ∂L/∂x_i = ∂A_i/∂x_j − A_i ∂L/∂x_j .  (2.14.4)

If there are K independent variables {x₁, …, x_K}, then there are ½ K(K − 1) independent equations of the above form – one for each distinct (i, j) pair. These equations can be written compactly as

Ω_{ijk} ∂L/∂x_k = F_{ij} ,  (2.14.5)

where

Ω_{ijk} = A_j δ_{ik} − A_i δ_{jk}
F_{ij} = ∂A_j/∂x_i − ∂A_i/∂x_j .

Note that F_{ij} is antisymmetric, and resembles a field strength tensor, and that Ω_{ijk} = −Ω_{jik} is antisymmetric in the first two indices (but is not totally antisymmetric in all three).
Can we solve these ½ K(K − 1) coupled equations to find an integrating factor L? In general the answer is no. However, when K = 2 we can always find an integrating factor. To see why, let’s call x ≡ x₁ and y ≡ x₂. Consider now the ODE

dy/dx = − A_x(x, y) / A_y(x, y) .  (2.14.6)

This equation can be integrated to yield a one-parameter set of integral curves, indexed by an initial condition. The equation for these curves may be written as U_c(x, y) = 0, where c labels the curves. Then along each curve we have

0 = dU_c/dx = ∂U_c/∂x + ( ∂U_c/∂y )( dy/dx ) = ∂U_c/∂x − (A_x/A_y)( ∂U_c/∂y ) .

Thus,

A_y ∂U_c/∂x = A_x ∂U_c/∂y ≡ e^{−L} A_x A_y .  (2.14.7)

This equation defines the integrating factor L :

2.14.1 https://phys.libretexts.org/@go/page/18866
L = −ln( (1/A_x) ∂U_c/∂x ) = −ln( (1/A_y) ∂U_c/∂y ) .  (2.14.8)

We now have that

A_x = e^{L} ∂U_c/∂x ,   A_y = e^{L} ∂U_c/∂y ,  (2.14.9)

and hence

e^{−L} đW = ( ∂U_c/∂x ) dx + ( ∂U_c/∂y ) dy = dU_c .  (2.14.10)
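As a concrete illustration (my own example, not from the text): take đW = y dx + 2x dy, which is inexact since ∂A_x/∂y = 1 ≠ 2 = ∂A_y/∂x. The integrating factor e^{−L} = y makes it exact, with U_c = x y². A finite-difference check:

```python
# Check that e^{-L} đW = dU with đW = y dx + 2x dy, e^{-L} = y, U = x y².
# (This example is constructed for illustration; it is not from the text.)
def U(x, y):
    return x * y**2

def check(x, y, eps=1e-6):
    # Numerical partial derivatives of U by central differences:
    dU_dx = (U(x + eps, y) - U(x - eps, y)) / (2 * eps)
    dU_dy = (U(x, y + eps) - U(x, y - eps)) / (2 * eps)
    # Compare with e^{-L} A_x = y·y = y² and e^{-L} A_y = y·2x = 2xy:
    return abs(dU_dx - y**2) < 1e-6 and abs(dU_dy - 2 * x * y) < 1e-6

print(check(1.3, 0.7), check(2.0, -1.5))   # True True
```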

This page titled 2.14: Appendix I- Integrating Factors is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

2.15: Appendix II- Legendre Transformations
A convex function of a single variable f(x) is one for which f″(x) > 0 everywhere. The Legendre transform of a convex function f(x) is a function g(p) defined as follows. Let p be a real number, and consider the line y = px, as shown in Figure 2.15.1. We define the point x(p) as the value of x for which the difference F(x, p) = px − f(x) is greatest. Then define g(p) = F(x(p), p). The value x(p) is unique if f(x) is convex, since x(p) is determined by the equation

f′(x(p)) = p .  (2.15.1)

Note that from p = f′(x(p)) we have, according to the chain rule,

(d/dp) f′(x(p)) = f″(x(p)) x′(p)  ⟹  x′(p) = [ f″(x(p)) ]⁻¹ .  (2.15.2)

From this, we can prove that g(p) is itself convex:

g′(p) = (d/dp)[ p x(p) − f(x(p)) ] = p x′(p) + x(p) − f′(x(p)) x′(p) = x(p) ,

hence

g″(p) = x′(p) = [ f″(x(p)) ]⁻¹ > 0 .  (2.15.3)

Figure 2.15.1 : Construction for the Legendre transformation of a function f(x).
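A quick numerical illustration (own example): for f(x) = x², the construction gives x(p) = p/2 and g(p) = p²/4, which a brute-force maximization of F(x, p) = px − f(x) reproduces:

```python
# Legendre transform of f(x) = x² by brute-force maximization of
# F(x, p) = p x − f(x); the exact result is g(p) = p²/4 with x(p) = p/2.
def legendre(f, p, xs):
    return max(p * x - f(x) for x in xs)

f = lambda x: x * x
xs = [i * 1e-3 for i in range(-4000, 4001)]   # grid on [-4, 4]

for p in (-2.0, 0.5, 3.0):
    g = legendre(f, p, xs)
    print(p, g, p * p / 4)   # g(p) agrees with p²/4 to grid accuracy
```

Repeating the maximization on g(p) recovers f(x), illustrating the self-duality noted below.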


In higher dimensions, the generalization of the definition f
′′
(x) > 0 is that a function F (x , … , xn )
1
is convex if the matrix of
second derivatives, called the Hessian,
2
∂ F
Hij (x) = (2.15.4)
∂x ∂x
i j

is positive definite. That is, all the eigenvalues of Hij (x) must be positive for every x. We then define the Legendre transform
G(p) as

G(p) = p ⋅ x − F (x) (2.15.5)

where
p = ∇F . (2.15.6)

Note that
dG = x ⋅ dp + p ⋅ dx − ∇F ⋅ dx = x ⋅ dp , (2.15.7)

2.15.1 https://phys.libretexts.org/@go/page/18867
which establishes that G is a function of p and that

∂G/∂p_j = x_j .  (2.15.8)

Note also that the Legendre transformation is self dual, which is to say that the Legendre transform of G(p) is F (x) :
F → G → F under successive Legendre transformations.

We can also define a partial Legendre transformation as follows. Consider a function of q variables F(x, y), where x = {x₁, …, x_m} and y = {y₁, …, y_n}, with q = m + n. Define p = {p₁, …, p_m}, and

G(p, y) = p · x − F(x, y) ,  (2.15.9)

where

p_a = ∂F/∂x_a   (a = 1, …, m) .  (2.15.10)

These equations are then to be inverted to yield

x_a = x_a(p, y) = ∂G/∂p_a .  (2.15.11)

Note that

p_a = ( ∂F/∂x_a )( x(p, y), y ) .  (2.15.12)

Thus, from the chain rule,

δ_{ab} = ∂p_a/∂p_b = ( ∂²F/∂x_a ∂x_c )( ∂x_c/∂p_b ) = ( ∂²F/∂x_a ∂x_c )( ∂²G/∂p_c ∂p_b ) ,  (2.15.13)

which says

∂²G/∂p_a ∂p_b = ∂x_a/∂p_b = (K⁻¹)_{ab} ,  (2.15.14)

where the m × m partial Hessian is

∂²F/∂x_a ∂x_b = ∂p_a/∂x_b = K_{ab} .  (2.15.15)

Note that K_{ab} = K_{ba} is symmetric. And with respect to the y coordinates,
∂²G/∂y_μ ∂y_ν = −∂²F/∂y_μ ∂y_ν = −L_{μν} ,  (2.15.16)

where

L_{μν} = ∂²F/∂y_μ ∂y_ν  (2.15.17)

is the partial Hessian in the y coordinates. Now it is easy to see that if the full q × q Hessian matrix H_{ij} is positive definite, then any submatrix such as K_{ab} or L_{μν} must also be positive definite. In this case, the partial Legendre transform is convex in {p₁, …, p_m} and concave in {y₁, …, y_n}.

This page titled 2.15: Appendix II- Legendre Transformations is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

2.16: Appendix III- Useful Mathematical Relations
Consider a set of n independent variables {x₁, …, x_n}, which can be thought of as a point in n-dimensional space. Let {y₁, …, y_n} and {z₁, …, z_n} be other choices of coordinates. Then

∂x_i/∂z_k = ( ∂x_i/∂y_j )( ∂y_j/∂z_k ) .  (2.16.1)

Note that this entails a matrix multiplication: A_{ik} = B_{ij} C_{jk}, where A_{ik} = ∂x_i/∂z_k, B_{ij} = ∂x_i/∂y_j, and C_{jk} = ∂y_j/∂z_k. We define the determinant

det( ∂x_i/∂z_k ) ≡ ∂(x₁, …, x_n)/∂(z₁, …, z_n) .  (2.16.2)

Such a determinant is called a Jacobian. Now if A = BC, then det(A) = det(B) · det(C). Thus,

∂(x₁, …, x_n)/∂(z₁, …, z_n) = [ ∂(x₁, …, x_n)/∂(y₁, …, y_n) ] · [ ∂(y₁, …, y_n)/∂(z₁, …, z_n) ] .  (2.16.3)

Recall also that

∂x_i/∂x_k = δ_{ik} .  (2.16.4)

Consider the case n = 2. We have

∂(x, y)/∂(u, v) = det( (∂x/∂u)_v  (∂x/∂v)_u ; (∂y/∂u)_v  (∂y/∂v)_u ) = (∂x/∂u)_v (∂y/∂v)_u − (∂x/∂v)_u (∂y/∂u)_v .  (2.16.5)

We also have

[ ∂(x, y)/∂(u, v) ] · [ ∂(u, v)/∂(r, s) ] = ∂(x, y)/∂(r, s) .  (2.16.6)

From this simple mathematics follow several very useful results.

1) First, write

∂(x, y)/∂(u, v) = [ ∂(u, v)/∂(x, y) ]⁻¹ .  (2.16.7)

Now let v = y :

∂(x, y)/∂(u, y) = (∂x/∂u)_y = 1/(∂u/∂x)_y .  (2.16.8)

Thus,

(∂x/∂u)_y = 1 / (∂u/∂x)_y .  (2.16.9)

2) Second, we have

2.16.1 https://phys.libretexts.org/@go/page/18868
∂(x, y)/∂(u, y) = (∂x/∂u)_y = [ ∂(x, y)/∂(x, u) ] · [ ∂(x, u)/∂(u, y) ] = −(∂y/∂u)_x (∂x/∂y)_u ,

which is to say

(∂x/∂y)_u (∂y/∂u)_x = −(∂x/∂u)_y .  (2.16.10)

Invoking Equation [boxone], we conclude that

(∂x/∂y)_u (∂y/∂u)_x (∂u/∂x)_y = −1 .  (2.16.11)
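Equation 2.16.11 is the familiar triple-product rule. As a numerical sanity check (my own example, not from the text), take the ideal gas p = Nk_B T/V in units with Nk_B = 1 and verify (∂p/∂V)_T (∂V/∂T)_p (∂T/∂p)_V = −1 by finite differences:

```python
# Triple-product rule (∂p/∂V)_T (∂V/∂T)_p (∂T/∂p)_V = −1, checked
# numerically for an ideal gas p = N k_B T / V (units with N k_B = 1).
def p_of(T, V):  return T / V          # p(T, V)
def V_of(T, p):  return T / p          # V(T, p)
def T_of(p, V):  return p * V          # T(p, V)

def d(f, a, b, wrt, eps=1e-6):
    # Central difference of f(a, b) with respect to its first or second slot.
    if wrt == 0:
        return (f(a + eps, b) - f(a - eps, b)) / (2 * eps)
    return (f(a, b + eps) - f(a, b - eps)) / (2 * eps)

T, V = 300.0, 2.0
p = p_of(T, V)
prod = d(p_of, T, V, 1) * d(V_of, T, p, 0) * d(T_of, p, V, 0)
print(round(prod, 6))   # -1.0
```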

3) Third, we have

∂(x, v)/∂(u, v) = [ ∂(x, v)/∂(y, v) ] · [ ∂(y, v)/∂(u, v) ] ,  (2.16.12)

which says

(∂x/∂u)_v = (∂x/∂y)_v (∂y/∂u)_v .  (2.16.13)

This is simply the chain rule of partial differentiation.


4) Fourth, we have

∂(x, y)/∂(u, y) = [ ∂(x, y)/∂(u, v) ] · [ ∂(u, v)/∂(u, y) ] = (∂x/∂u)_v (∂y/∂v)_u (∂v/∂y)_u − (∂x/∂v)_u (∂y/∂u)_v (∂v/∂y)_u ,

which says

(∂x/∂u)_y = (∂x/∂u)_v − (∂x/∂y)_u (∂y/∂u)_v .  (2.16.14)

5) Fifth, whenever we differentiate one extensive quantity with respect to another, holding only intensive quantities constant, the
result is simply the ratio of those extensive quantities. For example,

(∂S/∂V)_{p,T} = S/V .  (2.16.15)

The reason should be obvious. In the above example, S(p, V , T ) = V ϕ(p, T ) , where ϕ is a function of the two intensive quantities
p and T . Hence differentiating S with respect to V holding p and T constant is the same as dividing S by V . Note that this implies

(∂S/∂V)_{p,T} = (∂S/∂V)_{p,μ} = (∂S/∂V)_{n,T} = S/V ,  (2.16.16)

where n = N /V is the particle density.


6) Sixth, suppose we have a function Φ(y, v) and we write

dΦ = x dy + u dv . (2.16.17)

That is,

2.16.2 https://phys.libretexts.org/@go/page/18868
x = ( ∂Φ/∂y )_v ≡ Φ_y ,   u = ( ∂Φ/∂v )_y ≡ Φ_v .  (2.16.18)

Now we may write

dx = Φ_yy dy + Φ_yv dv
du = Φ_vy dy + Φ_vv dv .

If we demand du = 0, this yields

(∂x/∂u)_v = Φ_yy / Φ_vy .  (2.16.19)

Note that Φ_vy = Φ_yv. From the equation du = 0 we also derive

(∂y/∂v)_u = −Φ_vv / Φ_vy .  (2.16.20)

Next, we use Equation [due] with du = 0 to eliminate dy in favor of dv, and then substitute into Equation [dxe]. This yields

(∂x/∂v)_u = Φ_yv − Φ_yy Φ_vv / Φ_vy .  (2.16.21)

Finally, Equation [due] with dv = 0 yields

(∂y/∂u)_v = 1 / Φ_vy .  (2.16.22)

Combining the results of eqns. [pxuv], [pyvu], [pxvu], and [pyuv], we have

∂(x, y)/∂(u, v) = (∂x/∂u)_v (∂y/∂v)_u − (∂x/∂v)_u (∂y/∂u)_v
= ( Φ_yy/Φ_vy )( −Φ_vv/Φ_vy ) − ( Φ_yv − Φ_yy Φ_vv/Φ_vy )( 1/Φ_vy ) = −1 .

Thus, if Φ = E(S, V), then (x, y) = (T, S) and (u, v) = (−p, V), and we have

∂(T, S)/∂(−p, V) = −1 .  (2.16.23)

Nota bene: It is important to understand what other quantities are kept constant, otherwise we can run into trouble. For example, it
would seem that Equation [jacob] would also yield
∂(μ, N)/∂(p, V) = 1 .  (2.16.24)

But then we should have

∂(T, S)/∂(μ, N) = [ ∂(T, S)/∂(−p, V) ] · [ ∂(−p, V)/∂(μ, N) ] = +1   (WRONG!)  (2.16.25)

when according to Equation [jacob] it should be −1. What has gone wrong?
The problem is that we have not properly specified what else is being held constant. In Equation [detTSpV] it is N (or μ ) which is
being held constant, while in Equation [detmuNpV] it is S (or T ) which is being held constant. Therefore a naive application of the
chain rule for determinants yields the wrong result, as we have seen.
Let’s be more careful. Applying the same derivation to dE = x dy + u dv + r ds and holding s constant, we conclude
∂(x, y, s)/∂(u, v, s) = (∂x/∂u)_{v,s} (∂y/∂v)_{u,s} − (∂x/∂v)_{u,s} (∂y/∂u)_{v,s} = −1 .  (2.16.26)

2.16.3 https://phys.libretexts.org/@go/page/18868
Thus, if

dE = T dS + y dX + μ dN ,  (2.16.27)

where (y, X) = (−p, V) or (H^α, M^α) or (E^α, P^α), the appropriate thermodynamic relations are

∂(T, S, N)/∂(y, X, N) = −1        ∂(T, S, μ)/∂(y, X, μ) = −1
∂(μ, N, X)/∂(T, S, X) = −1        ∂(μ, N, y)/∂(T, S, y) = −1
∂(y, X, S)/∂(μ, N, S) = −1        ∂(y, X, T)/∂(μ, N, T) = −1

For example,

∂(T, S, N)/∂(−p, V, N) = ∂(−p, V, S)/∂(μ, N, S) = ∂(μ, N, V)/∂(T, S, V) = −1  (2.16.28)

and

∂(T, S, μ)/∂(−p, V, μ) = ∂(−p, V, T)/∂(μ, N, T) = ∂(μ, N, −p)/∂(T, S, −p) = −1 .  (2.16.29)

If we are careful, then the results in eq. [TSyXN] can be quite handy, especially when used in conjunction with Equation [chain]. For example, we have

(∂S/∂V)_{T,N} = ∂(T, S, N)/∂(T, V, N) = [ ∂(T, S, N)/∂(p, V, N) ] · [ ∂(p, V, N)/∂(T, V, N) ] = (∂p/∂T)_{V,N} ,

where the first factor on the right is unity; this is one of the Maxwell relations derived from the exactness of dF(T, V, N). Some other examples include

(∂V/∂S)_{p,N} = ∂(V, p, N)/∂(S, p, N) = [ ∂(V, p, N)/∂(S, T, N) ] · [ ∂(S, T, N)/∂(S, p, N) ] = (∂T/∂p)_{S,N}
(∂S/∂N)_{T,p} = ∂(S, T, p)/∂(N, T, p) = [ ∂(S, T, p)/∂(μ, N, p) ] · [ ∂(μ, N, p)/∂(N, T, p) ] = −(∂μ/∂T)_{p,N} ,

where again the first factor in each product is unity. These are Maxwell relations deriving from dH(S, p, N) and dG(T, p, N), respectively. Note that due to the alternating nature of the determinant – it is antisymmetric under interchange of any two rows or columns – we have

∂(x, y, z)/∂(u, v, w) = −∂(y, x, z)/∂(u, v, w) = ∂(y, x, z)/∂(w, v, u) = … .  (2.16.30)

In general, it is usually advisable to eliminate S from a Jacobian. If we have a Jacobian involving T, S, and N, we can write

∂(T, S, N)/∂(∙, ∙, N) = [ ∂(T, S, N)/∂(p, V, N) ] · [ ∂(p, V, N)/∂(∙, ∙, N) ] = ∂(p, V, N)/∂(∙, ∙, N) ,  (2.16.31)

where each ∙ is a distinct arbitrary state variable other than N.

If our Jacobian involves S, V, and N, we write

∂(S, V, N)/∂(∙, ∙, N) = [ ∂(S, V, N)/∂(T, V, N) ] · [ ∂(T, V, N)/∂(∙, ∙, N) ] = (C_V/T) · ∂(T, V, N)/∂(∙, ∙, N) .  (2.16.32)

If our Jacobian involves S, p, and N, we write

∂(S, p, N)/∂(∙, ∙, N) = [ ∂(S, p, N)/∂(T, p, N) ] · [ ∂(T, p, N)/∂(∙, ∙, N) ] = (C_p/T) · ∂(T, p, N)/∂(∙, ∙, N) .  (2.16.33)

For example,

(∂T/∂p)_{S,N} = ∂(T, S, N)/∂(p, S, N) = [ ∂(T, S, N)/∂(p, V, N) ] · [ ∂(p, V, N)/∂(p, T, N) ] · [ ∂(p, T, N)/∂(p, S, N) ] = (T/C_p)(∂V/∂T)_{p,N}
(∂V/∂p)_{S,N} = ∂(V, S, N)/∂(p, S, N) = [ ∂(V, S, N)/∂(V, T, N) ] · [ ∂(V, T, N)/∂(p, T, N) ] · [ ∂(p, T, N)/∂(p, S, N) ] = (C_V/C_p)(∂V/∂p)_{T,N} ,

where in the first line the factor ∂(T, S, N)/∂(p, V, N) = 1. With κ ≡ −(1/V)(∂V/∂p) the compressibility, we see that the second of these equations says κ_S = (c_V/c_p) κ_T, relating the isothermal and adiabatic compressibilities and the molar heat capacities at constant volume and constant pressure. This relation was previously established in Equation [cpcvktks].
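These Jacobian reductions can be spot-checked numerically. For a monatomic ideal gas (an assumption of this sketch; units with Nk_B = 1, so V = T/p and C_p = 5/2), the first relation (∂T/∂p)_{S,N} = (T/C_p)(∂V/∂T)_{p,N} must reproduce the slope of the adiabat T ∝ p^{(γ−1)/γ}:

```python
# Check (∂T/∂p)_S = (T/C_p)(∂V/∂T)_p for a monatomic ideal gas,
# in units with N k_B = 1, so V = T/p, C_p = 5/2, γ = 5/3.
Cp, gamma = 2.5, 5.0 / 3.0
T, p = 300.0, 1.0

rhs = (T / Cp) * (1.0 / p)               # (T/C_p)(∂V/∂T)_p with (∂V/∂T)_p = 1/p
lhs = T * (gamma - 1.0) / (gamma * p)    # slope of the adiabat T ∝ p^{(γ−1)/γ}
print(rhs, lhs)   # both ≈ 120.0
```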

This page titled 2.16: Appendix III- Useful Mathematical Relations is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

2.S: Summary
References
E. Fermi, Thermodynamics (Dover, 1956) This outstanding and inexpensive little book is a model of clarity.
A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000) A very relaxed treatment appropriate for
undergraduate physics majors.
H. B. Callen, Thermodynamics and an Introduction to Thermostatistics (2nd edition, Wiley, 1985) A comprehensive text
appropriate for an extended course on thermodynamics.


D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000) An excellent thermodynamics text appropriate for
upper division undergraduates. Contains many illustrative practical applications.
D. Kondepudi and I. Prigogine, Modern Thermodynamics: From Heat Engines to Dissipative Structures (Wiley, 1998) Lively
modern text with excellent choice of topics and good historical content. More focus on chemical and materials applications than
in Callen.
L. E. Reichl, A Modern Course in Statistical Physics (2nd edition, Wiley, 1998) A graduate level text with an excellent and
crisp section on thermodynamics.

Summary
∙ Extensive and intensive variables: The equilibrium state of a thermodynamic system is characterized by specifying a number of
state variables which can be either extensive (scaling linearly with system size), or intensive (scaling as the zeroth power of system
size). Extensive quantities include: energy E, entropy S, particle number N, magnetization M, etc. Intensive quantities include temperature T, pressure p, number density n, magnetic field H, etc. The ratio of two extensive quantities is intensive, n = N/V. In
the thermodynamic limit, all extensive state variables tend to infinity (in whatever units are appropriate), while their various ratios
are all finite. A full description of the state of any thermodynamic system must involve at least one extensive variable (but may or
may not include intensive variables).
∙ Work: The internal energy of a thermodynamic system can change as a result of a generalized displacement dX_i, as a result of work W done by the system. We write the differential form of W as

đW = −∑_i y_i dX_i − ∑_a μ_a dN_a ,  (2.S.1)

where −y_i is the generalized force conjugate to the generalized displacement X_i, and μ_a is the chemical potential of species a, which is conjugate to the number of particles of that species, N_a. Think of chemical work as the work required to assemble particles out of infinitely remote constituents. The slash through the differential symbol indicates that đW is an inexact differential: there is no function W(T, p, V, …).


∙ Heat: Aside from work done by or on the system, there is another way of changing the system’s internal energy, which is by transferring heat, Q. Heat is a form of energy contained in the random microscopic motions of the constituent particles. Like đW, the differential đQ is also inexact, and there is no heat function Q(T, p, V, …). Transfer of heat under conditions of constant volume or pressure and constant particle number results in a change of the thermodynamic state via a change in temperature: dT = đQ/C, where C is the heat capacity of the system at fixed volume/pressure and particle number.
∙ First Law: The First Law of Thermodynamics is a statement of energy conservation which accounts for both types of energies: ΔE = Q − W, or in differential form dE = đQ − đW.

∙ Single component systems: A single component system is completely specified by three state variables, which can be taken to be E, V, and N. Writing đW = p dV − μ dN, we have

đQ = dE + p dV − μ dN .  (2.S.2)

If, for example, we want to use variables (T, V, N), we write

dE = ( ∂E/∂T )_{V,N} dT + ( ∂E/∂V )_{T,N} dV + ( ∂E/∂N )_{T,V} dN .  (2.S.3)

Proceeding in this way, one can derive expressions like

2.S.1 https://phys.libretexts.org/@go/page/18728
C_{V,N} = ( đQ/dT )_{V,N} = ( ∂E/∂T )_{V,N} ,   C_{p,N} = ( đQ/dT )_{p,N} = ( ∂E/∂T )_{p,N} + p ( ∂V/∂T )_{p,N} .  (2.S.4)

∙ Equations of state: An equation of state is a relation among state variables. Examples include the familiar ideal gas law, pV = N k_B T, and the van der Waals equation of state, (p + aN²/V²)(V − Nb) = N k_B T.

∙ Ideal gases: For ideal gases, one has pV = N k_B T and E = ½ f N k_B T, where f is the number of kinetic degrees of freedom (f = 3 for monatomic, f = 5 for diatomic, and f = 6 for polyatomic gases, assuming only translational and rotational freedoms are excited).
∙ Special thermodynamic processes: Remember adiabatic (đQ = 0), isothermal (dT = 0), isobaric (dp = 0), and isochoric (dV = 0). A quasistatic process is one which follows a continuous path in a space of state variables infinitely slowly, so that the system is in equilibrium at any instant. A reversible process is necessarily quasistatic, and moreover is nondissipative (no friction), so that its thermodynamic path may be followed in reverse.

∙ Heat engines and the Second Law: A heat engine takes a thermodynamic system through a repeated cycle of equilibrium states A → B → C → ⋯ → A, the net result of which is to convert heat into mechanical work, or vice versa. A perfect engine, which would extract heat Q from a large thermal reservoir¹, such as the ocean, and convert it into work W = Q each cycle, is not possible, according to the Second Law of Thermodynamics. Real engines extract heat Q₂ from an upper reservoir at temperature T₂, dump heat Q₁ into a lower reservoir at temperature T₁, and transform the difference into useful mechanical work W = Q₂ − Q₁. A refrigerator is simply an engine operating in reverse: work is done in order to extract heat Q₁ from the lower reservoir, and Q₂ = W + Q₁ is dumped into the upper reservoir in each cycle. The efficiency of the engine cycle is defined to be η = 1 − Q₁/Q₂. The engine efficiency is bounded from above by the efficiency of a reversible cycle operating between those two reservoirs, such as the Carnot cycle (two adiabats and two isotherms). Thus, η ≤ η_C = 1 − T₁/T₂.
∙ Entropy: The Second Law guarantees that an engine operating between two reservoirs must satisfy Q₁/T₁ + Q₂/T₂ ≤ 0, with the equality holding for reversible cycles. Here Q₁ is the (negative) heat transferred to the engine from reservoir #1. Since an arbitrary curve in the p-V plane (at fixed N) can be composed of a combination of Carnot cycles, one concludes ∮ đQ/T ≤ 0, again with equality holding for reversible cycles. Clausius, in 1865, realized that one could thereby define a new state function, the entropy, S, with dS = đQ/T. Thus, đQ ≤ T dS, with equality holding for reversible processes. The entropy is extensive, with units [S] = J/K.
∙ Gibbs-Duhem relation: For reversible processes, we now have

dE = T dS + ∑_i y_i dX_i + ∑_a μ_a dN_a ,  (2.S.5)

which says E = E(S, {X_i}, {N_a}), which is to say E is a function of all the extensive variables. It therefore must be homogeneous of degree one, λE = E(λS, {λX_i}, {λN_a}), and from Euler’s theorem it then follows that
E = T S + ∑_i y_i X_i + ∑_a μ_a N_a ,
0 = S dT + ∑_i X_i dy_i + ∑_a N_a dμ_a .

This means that there is one equation of state which can be written as a function of all the ’proper’ intensive variables.
∙ Thermodynamic potentials: Under equilibrium conditions, one can make Legendre transforms to an appropriate or convenient
system of thermodynamic variables. Some common examples:
E(S, V , N ) = E dE = T dS − p dV + μ dN

F (T , V , N ) = E − T S dF = −S dT − p dV + μ dN

H(S, p, N ) = E + pV dH = T dS + V dp + μ dN

G(T , p, N ) = E − T S + pV dG = −S dT + V dp + μ dN

Ω(T , V , μ) = E − T S − μN dΩ = −S dT − p dV − N dμ .

Under general nonequilibrium conditions, the Second Law says that each of the equalities on the right is replaced by an inequality,
dG ≤ −S dT + V dp + μ dN . Thus, under conditions of constant temperature, pressure, and particle number, the Gibbs free

energy G will achieve its minimum possible value via spontaneous processes. Note that Gibbs-Duhem says that G = μN and
Ω = −pV .

∙ Maxwell relations: Since the various thermodynamic potentials are state variables, we have that the mixed second derivatives can
each be expressed in two ways. This leads to relations of the form
∂²G/∂T∂p = −( ∂S/∂p )_{T,N} = ( ∂V/∂T )_{p,N} .  (2.S.6)

∙ Thermodynamic stability: Suppose T, p, and N are fixed. Then

ΔG = ½ [ ( ∂²E/∂S² )(ΔS)² + 2 ( ∂²E/∂S∂V ) ΔS ΔV + ( ∂²E/∂V² )(ΔV)² ] + … ,  (2.S.7)

and since in equilibrium G is at a minimum, ΔG > 0 requires that the corresponding Hessian matrix of second derivatives be positive definite:

∂²E/∂S² = ( ∂T/∂S )_V = T/C_V > 0
∂²E/∂V² = −( ∂p/∂V )_S = 1/(V κ_S) > 0
( ∂²E/∂S² )( ∂²E/∂V² ) − ( ∂²E/∂S∂V )² = T/(V κ_S C_V) − ( ∂T/∂V )²_S > 0 .

∙ Response coefficients: In addition to heat capacities C_V = T( ∂S/∂T )_V and C_p = T( ∂S/∂T )_p, one defines the isothermal compressibility κ_T = −(1/V)( ∂V/∂p )_T and the adiabatic compressibility κ_S = −(1/V)( ∂V/∂p )_S, as well as the thermal expansion coefficient α_p = (1/V)( ∂V/∂T )_p. Invoking the Maxwell relations, one derives certain identities, such as

C_p − C_V = V T α_p² / κ_T ,   κ_T − κ_S = V T α_p² / C_p .  (2.S.8)
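As a check of the first identity (own example): for an ideal gas α_p = 1/T and κ_T = 1/p, so VTα_p²/κ_T = pV/T = Nk_B, recovering the familiar C_p − C_V = Nk_B:

```python
# Check C_p − C_V = V T α_p² / κ_T for an ideal gas (units with N k_B = 1,
# so the ideal gas law reads pV = T and the result should be 1).
T, p = 400.0, 2.0
V = T / p
alpha_p = 1.0 / T      # (1/V)(∂V/∂T)_p with V = T/p
kappa_T = 1.0 / p      # −(1/V)(∂V/∂p)_T with V = T/p
print(V * T * alpha_p**2 / kappa_T)   # ≈ 1.0 = N k_B
```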

∙ Entropy of mixing: The entropy of any substance obeying the ideal gas law is given by the expression S(T, V, N) = N k_B ln(V/N) + N ϕ(T). If different ideal gases at the same p and T were separated via physical barriers, and the barriers were then removed, the change in entropy would be ΔS = −N k_B ∑_a x_a ln x_a, where x_a = N_a/N with N = ∑_a N_a being the total number of particles over all species. This is called the entropy of mixing.
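The mixing entropy is easy to evaluate; for an equimolar binary mixture, ΔS/(Nk_B) = ln 2. A two-line sketch:

```python
from math import log

# Entropy of mixing per particle, ΔS / (N k_B) = −Σ_a x_a ln x_a :
def mixing_entropy(fractions):
    return -sum(x * log(x) for x in fractions if x > 0)

print(mixing_entropy([0.5, 0.5]))        # ln 2 ≈ 0.6931
print(mixing_entropy([0.25] * 4))        # ln 4 ≈ 1.3863
```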
∙ Weak solutions and osmotic pressure: If one species is much more plentiful than the others, we give it a particle label a = 0 and
call it the solvent. The Gibbs free energy of a weak solution is then

G(T, p, N₀, {N_a}) = N₀ g₀(T, p) + ∑_a N_a { k_B T ln( N_a / e N₀ ) + ψ_a(T, p) } + (1/2N₀) ∑_{a,b} A_{ab}(T, p) N_a N_b .  (2.S.9)

Assuming x_a = N_a/N₀ ≪ 1 for a > 0, we have μ₀ = g₀ − x k_B T and μ_a = k_B T ln x_a + ψ_a, where x = ∑_{a>0} x_a. If x > 0 on the right side of a semipermeable membrane and x = 0 on the left, then assuming the membrane is permeable to the solvent, we must have μ₀ = g₀(T, p_L) = g₀(T, p_R) − x k_B T. This leads to a pressure difference, π, called the osmotic pressure, given by π = p_R − p_L = x k_B T / ( ∂μ₀/∂p )_{T,N}. Since a Maxwell relation guarantees ( ∂μ₀/∂p )_{T,N} = ( ∂V/∂N₀ )_{T,p}, we have the equation of state πv = xRT, where v is the molar volume of the solvent.
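Plugging representative numbers into πv = xRT (the solute fraction and molar volume below are assumed for illustration):

```python
# Osmotic pressure from π v = x R T, with assumed values:
R = 8.314          # J/mol K
x = 0.01           # solute mole fraction (assumed, dilute)
T = 300.0          # K
v = 18e-6          # m³/mol, approximate molar volume of water

pi = x * R * T / v
print(pi)          # ≈ 1.39e6 Pa, i.e. roughly 14 atm for a 1% solution
```

The large result for such a dilute solution is why osmotic pressures are easy to measure and why cell membranes must actively regulate solute concentrations.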

∙ Binary solutions: In a mixture of A and B species, let x = N_B/(N_A + N_B). The Gibbs free energy per particle is

g(T, p, x) = (1 − x) μ⁰_A(T, p) + x μ⁰_B(T, p) + k_B T [ x ln x + (1 − x) ln(1 − x) ] + λ_AB x(1 − x) .  (2.S.10)

If λ_AB > 0, the A and B components repel, and the mixture becomes unstable. There is a local instability, corresponding to spinodal decomposition, when g″(x) = 0. This occurs at a temperature k_B T* = 2λ_AB x(1 − x). But for a given x, an instability toward phase separation survives to even higher temperature, and is described by the Maxwell construction. The coexistence boundary is obtained from [g(x₂) − g(x₁)]/(x₂ − x₁) = g′(x₁) = g′(x₂), and from the symmetry under x ↔ 1 − x, one finds k_B T_coex = λ_AB (1 − 2x)/ln(x⁻¹ − 1), where nucleation of the minority phase sets in.

∙ Miscible fluids and liquid-vapor coexistence: If λ_AB < 0, there is no instability toward phase separation, and the A and B fluids are said to be completely miscible. Example: benzene C₆H₆ and toluene C₆H₅CH₃. At higher temperatures, near the liquid-gas transition, there is an instability toward phase separation. In the vapor phase, λ^V_AB ≈ 0, while for the liquid λ^L_AB < 0. The free energy curves g^L(T, p, x) and g^V(T, p, x) are then both convex as a function of x, but choosing the minimum g(x) = min( g^L(x), g^V(x) ), one is forced toward a Maxwell construction, hence phase coexistence. In the case of ’ideal liquids’ with different boiling points, we can even take λ^L_AB ≈ 0. By successively boiling and then separating and condensing the resulting vapor, the mixture may be distilled (see Fig. [FIG3]). When λ^L_AB ≠ 0, the mixture may be azeotropic, in which case the extremum of the boiling point occurs at an intermediate concentration (see Fig. [FIG4]).


∙ Thermochemistry: A chemical reaction among σ species may be represented

ζ₁ A₁ + ζ₂ A₂ + ⋯ + ζ_σ A_σ = 0 ,  (2.S.11)

where A_a is a chemical formula, and ζ_a is a stoichiometric coefficient. If ζ_a > 0, then A_a is a product, while for ζ_a < 0, A_a is a reactant. Chemical equilibrium requires ∑_{a=1}^{σ} ζ_a μ_a = 0. For a mixture of ideal gases, one has the law of mass action,

κ(T, p) ≡ ∏_{a=1}^{σ} x_a^{ζ_a} = ∏_{a=1}^{σ} ( k_B T ξ_a(T) / p λ_a³ )^{ζ_a} ,  (2.S.12)

where ξ_a(T) is the internal coordinate partition function for molecular species a. κ(T, p) is the equilibrium constant of the reaction. When κ is large, products are favored over reactants. When κ is small, reactants are favored over products. One may further show

( ∂ ln κ/∂T )_p = Δh / k_B T² ,  (2.S.13)

where Δh is the enthalpy of the reaction. When Δh < 0 , the reaction is exothermic. At finite pressure, this means that heat is
transferred to the environment: Q = ΔE + p ΔV = ΔH < 0 , where H = E + pV . When Δh > 0 , the reaction is endothermic,
and requires heat be transferred from the environment.
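Integrating Eq. 2.S.13 at constant pressure, with Δh taken constant over the temperature window, gives κ(T₂)/κ(T₁) = exp[ −(Δh/k_B)(1/T₂ − 1/T₁) ]. A minimal numerical sketch (the reaction enthalpy below is an illustrative number, not from the text):

```python
import numpy as np

# Integrate d(ln kappa)/dT = dh/(k_B T^2) at constant p, with dh constant.
# Closed form: kappa(T2)/kappa(T1) = exp[-(dh/k_B)(1/T2 - 1/T1)].
kB = 1.380649e-23      # J/K
dh = -1.0e-19          # J per reaction event (illustrative); dh < 0: exothermic
T1, T2 = 300.0, 350.0  # K

ratio_closed = np.exp(-(dh / kB) * (1.0 / T2 - 1.0 / T1))

# trapezoidal quadrature of the same integral as a cross-check
T = np.linspace(T1, T2, 100001)
y = dh / (kB * T**2)
ratio_quad = np.exp(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(T)))

print(ratio_closed, ratio_quad)
```

For an exothermic reaction the ratio comes out less than one: raising T shifts the equilibrium back toward reactants, consistent with Le Chatelier.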
∙ Clapeyron relation: Across a coexistence curve p(T) separating two phases, the chemical potential μ is continuous. This says dg₁ = −s₁ dT + v₁ dp = −s₂ dT + v₂ dp = dg₂, where g, s, and v are the Gibbs free energy, entropy, and volume per mole, respectively. Then
( ∂p/∂T )_coex = (s₂ − s₁)/(v₂ − v₁) = ℓ/(T Δv) , (2.S.14)

where ℓ = T Δs = T(s₂ − s₁) is the molar latent heat of transition which must be supplied in order to change from phase #1 to phase #2, even without changing T or p.
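As a numerical illustration of Eq. 2.S.14 (the inputs are standard handbook values, not from the text): for liquid-vapor water at the normal boiling point, treating the vapor as ideal and neglecting the liquid molar volume gives the familiar Clausius-Clapeyron estimate of the coexistence-curve slope.

```python
# Clausius-Clapeyron estimate of (dp/dT)_coex for liquid-vapor water at 1 atm.
# Approximations: vapor treated as an ideal gas, liquid volume neglected,
# so dv ~ RT/p and (dp/dT)_coex = ell/(T dv) ~ ell p/(R T^2).
R = 8.314        # J/(mol K)
ell = 4.066e4    # J/mol, molar latent heat of vaporization (handbook value)
T = 373.15       # K, normal boiling point
p = 1.013e5      # Pa

dv = R * T / p            # molar volume of the vapor, ~3.1e-2 m^3/mol
slope = ell / (T * dv)    # Pa/K
print(slope)              # roughly 3.6e3 Pa/K: about 3.6 kPa per kelvin
```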


∙ Gibbs phase rule: For a system with σ species, Gibbs-Duhem says μ_σ = μ_σ(T, p, μ₁, …, μ_{σ−1}), so a maximum of σ + 1 intensive quantities may be specified. If a system with σ species has equilibrium among φ phases, then there are σ(φ − 1) independent equilibrium conditions μ_a^(j) = μ_a^(j′), where a labels species and j labels phases, among the 2 + φ(σ − 1) intensive variables, and so φ-phase equilibrium can exist over a space of dimension d = 2 + σ − φ. Since this cannot be negative, we have φ ≤ 2 + σ. Thus, for a single species, we can at most have three-phase coexistence, which would then occur on a set of dimension zero, as is the case for the triple point of water, for example.
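The counting above can be packaged as a one-line function (a trivial sketch; the function name is my own):

```python
def coexistence_dim(sigma, phi):
    """Dimension d = 2 + sigma - phi of the region of phi-phase coexistence
    for a system with sigma species; d < 0 means such coexistence is
    impossible, i.e. phi > 2 + sigma."""
    d = 2 + sigma - phi
    if d < 0:
        raise ValueError("at most 2 + sigma phases can coexist")
    return d

print(coexistence_dim(1, 2))  # 1: a coexistence curve p(T), e.g. melting
print(coexistence_dim(1, 3))  # 0: an isolated point, e.g. the triple point of water
```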

1. A thermal reservoir, or heat bath, is any very large object with a fixed temperature. Because it is so large, the change in
temperature ΔT = Q/C which results from a heat transfer Q is negligible, since the heat capacity C is an extensive quantity.↩

Endnotes
1. For a system of N molecules which can freely rotate, we must then specify 3N additional orientational variables – the Euler
angles – and their 3N conjugate momenta. The dimension of phase space is then 12N .↩
2. Hence, 1 guacamole = 6.0221415 × 10²³ guacas.↩

3. One calorie (cal) is the amount of heat needed to raise 1 g of H₂O from T₀ = 14.5 °C to T₁ = 15.5 °C at a pressure of p = 1 atm. One British Thermal Unit (BTU) is the amount of heat needed to raise 1 lb. of H₂O from T₀ = 63 °F to T₁ = 64 °F at a pressure of p = 1 atm.↩

4. We use the symbol đ in the differential đW to indicate that this is not an exact differential. More on this in section 4 below.↩


5. As we shall see further below, thermomechanical equilibrium in fact leads to constant p/T , and thermochemical equilibrium to
constant μ/T . If there is thermal equilibrium, then T is already constant, and so thermomechanical and thermochemical
equilibria then guarantee the constancy of p and μ .↩
6. In most metals, the difference between C_V and C_p is negligible.↩

7. See the description in E. Fermi, Thermodynamics, pp. 22-23.↩


8. Carnot died during the cholera epidemic of 1832. His is one of the 72 names engraved on the Eiffel Tower.↩
9. See F. L. Curzon and B. Ahlborn, Am. J. Phys. 43, 22 (1975). I am grateful to Professor Asle Sudbø for correcting a typo in one
expression and providing a simplified form of another.↩
10. We neglect any interfacial contributions to the entropy change, which will be small compared with the bulk entropy change in
the thermodynamic limit of large system size.↩
11. Note V/N = v/N_A.↩

12. Some exotic phase transitions in quantum matter, which do not quite fit the usual classification schemes, have recently been
proposed.↩
13. The melting curve has a negative slope at relatively low pressures, where the solid has the so-called Ih hexagonal crystal
structure. At pressures above about 2500 atmospheres, the crystal structure changes, and the slope of the melting curve becomes
positive.↩
14. For a recent discussion, see R. Rosenberg, Physics Today 58, 50 (2005).↩
15. For example, they could be of the van der Waals form, due to virtual dipole fluctuations, with an attractive 1/r⁶ tail.↩

16. We assume c̃_S(T) and c̃_L(T) have no appreciable temperature dependence, and we regard them both as constants.↩
17. Set j = 1 and let j′ range over the φ − 1 values 2, …, φ.↩

18. The same can be said for multicomponent systems: the phase diagram in the (T , x) plane at constant p looks different than the
phase diagram in the (T , μ) plane at constant p.↩
19. ‘Semipermeable’ in this context means permeable to the solvent but not the solute(s).↩
20. We shall discuss latent heat again in §12.2 below.↩
21. See table [latentheats], and recall M = 18 g is the molar mass of H₂O.↩

22. It is more customary to write ΔT = T*_{pure solvent} − T*_{solution} in the case of the freezing point depression, in which case ΔT is positive.↩
23. An emulsion is a mixture of two or more immiscible liquids.↩
24. We assume the boiling temperatures are not exactly equal!↩
25. Antoine Lavoisier, the "father of modern chemistry", made pioneering contributions in both chemistry and biology. In
particular, he is often credited as the progenitor of stoichiometry. An aristocrat by birth, Lavoisier was an administrator of the
Ferme générale, an organization in pre-revolutionary France which collected taxes on behalf of the king. At the age of 28,
Lavoisier married Marie-Anne Pierette Paulze, the 13-year-old daughter of one of his business partners. She would later join her
husband in his research, and she played a role in his disproof of the phlogiston theory of combustion. The phlogiston theory was
superseded by Lavoisier’s work, where, based on contemporary experiments by Joseph Priestley, he correctly identified the
pivotal role played by oxygen in both chemical and biological processes ( respiration). Despite his fame as a scientist, Lavoisier
succumbed to the Reign of Terror. His association with the Ferme générale, which collected taxes from the poor and the
downtrodden, was a significant liability in revolutionary France (think Mitt Romney vis-a-vis Bain Capital). Furthermore – and
let this be a lesson to all of us – Lavoisier had unwisely ridiculed a worthless pseudoscientific pamphlet, ostensibly on the
physics of fire, and its author, Jean-Paul Marat. Marat was a journalist with scientific pretensions, but apparently little in the
way of scientific talent or acumen. Lavoisier effectively blackballed Marat’s candidacy to the French Academy of Sciences, and
the time came when Marat sought revenge. Marat was instrumental in getting Lavoisier and other members of the Ferme
générale arrested on charges of counterrevolutionary activities, and on May 8, 1794, after a trial lasting less than a day,
Lavoisier was guillotined. Along with Fourier and Carnot, Lavoisier’s name is one of the 72 engraved on the Eiffel Tower.
Source: www.vigyanprasar.gov.in/scientists/ALLavoisier.htm.↩
26. Note that g(p) may be a negative number, if the line y = px lies everywhere below f (x).↩

This page titled 2.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

3: Ergodicity and the Approach to Equilibrium


3.1: Modeling the Approach to Equilibrium
3.2: Phase Flows in Classical Mechanics
3.3: Irreversibility and Poincaré Recurrence
3.4: Remarks on Ergodic Theory
3.5: Thermalization of Quantum Systems
3.6: Appendices
3.S: Summary

Thumbnail: Time evolution of two immiscible fluids. The local density remains constant.

This page titled 3: Ergodicity and the Approach to Equilibrium is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

3.1: Modeling the Approach to Equilibrium
Equilibrium
A thermodynamic system typically consists of an enormously large number of constituent particles, a typical ‘large number’ being Avogadro’s number, N_A = 6.02 × 10²³. Nevertheless, in equilibrium, such a system is characterized by a relatively small number of thermodynamic state variables. Thus, while a complete description of a (classical) system would require us to account for O(10²³) evolving degrees of freedom, with respect to the physical quantities in which we are interested, the details of the initial conditions are effectively forgotten over some microscopic time scale τ, called the collision time, and over some microscopic distance scale, ℓ, called the mean free path1. The equilibrium state is time-independent.

The Master Equation


Relaxation to equilibrium is often modeled with something called the master equation. Let P_i(t) be the probability that the system is in a quantum or classical state i at time t. Then write

dP_i/dt = ∑_j ( W_ij P_j − W_ji P_i ) . (3.1.1)

Here, W_ij is the rate at which j makes a transition to i. Note that we can write this equation as

dP_i/dt = −∑_j Γ_ij P_j , (3.1.2)

where

Γ_ij = { −W_ij if i ≠ j ; ∑′_k W_kj if i = j } , (3.1.3)

where the prime on the sum indicates that k = j is to be excluded. The constraints on the W_ij are that W_ij ≥ 0 for all i, j, and we may take W_ii ≡ 0 (no sum on i). Fermi’s Golden Rule of quantum mechanics says that

W_ij = (2π/ℏ) |⟨ i | V̂ | j ⟩|² ρ(E_j) , (3.1.4)

where Ĥ₀ | i ⟩ = E_i | i ⟩, V̂ is an additional potential which leads to transitions, and ρ(E_j) is the density of final states at energy E_j.

The fact that W_ij ≥ 0 means that if each P_i(t = 0) ≥ 0, then P_i(t) ≥ 0 for all t ≥ 0. To see this, suppose that at some time t > 0 one of the probabilities P_i is crossing zero and about to become negative. But then Equation 3.1.1 says that Ṗ_i(t) = ∑_j W_ij P_j(t) ≥ 0. So P_i(t) can never become negative.

Equilibrium distribution and detailed balance


If the transition rates W are themselves time-independent, then we may formally write
ij

−Γt
P (t) = (e ) P (0) . (3.1.5)
i ij j

Here we have used the Einstein ‘summation convention’ in which repeated indices are summed over (in this case, the j index).
Note that

∑_i Γ_ij = 0 , (3.1.6)

which says that the total probability ∑_i P_i is conserved:

(d/dt) ∑_i P_i = −∑_{i,j} Γ_ij P_j = −∑_j ( P_j ∑_i Γ_ij ) = 0 . (3.1.7)

We conclude that ϕ⃗ = (1, 1, …, 1) is a left eigenvector of Γ with eigenvalue λ = 0. The corresponding right eigenvector, which we write as P_i^eq, satisfies Γ_ij P_j^eq = 0, and is a stationary (time-independent) solution to the master equation. Generally, there is only one right/left eigenvector pair corresponding to λ = 0, in which case any initial probability distribution P_i(0) converges to P_i^eq as t → ∞, as shown in Appendix I (§7).
In equilibrium, the net rate of transitions into a state | i ⟩ is equal to the rate of transitions out of | i ⟩. If, for each state | j ⟩, the transition rate from | i ⟩ to | j ⟩ is equal to the transition rate from | j ⟩ to | i ⟩, we say that the rates satisfy the condition of detailed balance. In other words,

W_ij P_j^eq = W_ji P_i^eq . (3.1.8)

Assuming W_ij ≠ 0 and P_j^eq ≠ 0, we can divide to obtain

W_ji / W_ij = P_j^eq / P_i^eq . (3.1.9)

Note that detailed balance is a stronger condition than that required for a stationary solution to the master equation.
If Γ = Γᵗ is symmetric, then the right eigenvectors and left eigenvectors are transposes of each other, hence P^eq = 1/N, where N is the dimension of Γ. The system then satisfies the conditions of detailed balance. See Appendix II (§8) for an example of this formalism applied to a model of radioactive decay.
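As a quick numerical illustration (a sketch with made-up energies; Metropolis-type rates are one standard way, not the text's, to build W_ij obeying detailed balance with respect to a chosen Boltzmann distribution):

```python
import numpy as np

# Build rates W[i, j] (the rate for j -> i) satisfying detailed balance
# W_ij P_j^eq = W_ji P_i^eq for a chosen equilibrium distribution P^eq.
E = np.array([0.0, 0.5, 1.3, 2.0])        # illustrative state energies, k_B T = 1
Peq = np.exp(-E) / np.exp(-E).sum()

n = len(E)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            W[i, j] = min(1.0, Peq[i] / Peq[j])   # Metropolis-type rates

# detailed balance: the matrix of fluxes W_ij P_j^eq is symmetric
flux = W * Peq[None, :]
print(np.allclose(flux, flux.T))          # True

# Gamma from Eq. 3.1.3; Gamma @ Peq = 0, so P^eq is stationary
Gamma = -W + np.diag(W.sum(axis=0))
print(np.abs(Gamma @ Peq).max())          # ~0 to machine precision
```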

Boltzmann’s H-theorem
Suppose for the moment that Γ is a symmetric matrix, Γ_ij = Γ_ji. Then construct the function

H(t) = ∑_i P_i(t) ln P_i(t) . (3.1.10)

Then

dH/dt = ∑_i (dP_i/dt)(1 + ln P_i) = ∑_i (dP_i/dt) ln P_i
      = −∑_{i,j} Γ_ij P_j ln P_i
      = ∑_{i,j} Γ_ij P_j ( ln P_j − ln P_i ) ,

where we have used ∑_i Γ_ij = 0. Now switch i ↔ j in the above sum and add the terms to get

dH/dt = ½ ∑_{i,j} Γ_ij ( P_i − P_j )( ln P_i − ln P_j ) . (3.1.11)

Note that the i = j term does not contribute to the sum. For i ≠ j we have Γ_ij = −W_ij ≤ 0, and using the result

(x − y)(ln x − ln y) ≥ 0 , (3.1.12)

we conclude

dH/dt ≤ 0 . (3.1.13)

In equilibrium, P_i^eq is a constant, independent of i. We write

P_i^eq = 1/Ω , Ω = ∑_i 1 ⟹ H = −ln Ω . (3.1.14)

If Γ_ij ≠ Γ_ji, we can still prove a version of the H-theorem. Define a new symmetric matrix

W̄_ij ≡ W_ij P_j^eq = W_ji P_i^eq = W̄_ji , (3.1.15)

and the generalized H-function,

H(t) ≡ ∑_i P_i(t) ln( P_i(t) / P_i^eq ) . (3.1.16)

Then

dH/dt = −½ ∑_{i,j} W̄_ij ( P_i/P_i^eq − P_j/P_j^eq ) [ ln( P_i/P_i^eq ) − ln( P_j/P_j^eq ) ] ≤ 0 . (3.1.17)
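The monotone decrease of the generalized H-function is easy to check numerically. A minimal sketch (the energies, initial distribution, and Metropolis-type rates below are my own illustrative choices, not from the text):

```python
import numpy as np

# Evolve dP/dt = -Gamma P by small explicit Euler steps and verify that
# H(t) = sum_i P_i ln(P_i / P_i^eq) never increases (Eq. 3.1.17).
E = np.array([0.0, 0.7, 1.5])                 # illustrative energies, k_B T = 1
Peq = np.exp(-E) / np.exp(-E).sum()

n = len(E)
W = np.array([[0.0 if i == j else min(1.0, Peq[i] / Peq[j])
               for j in range(n)] for i in range(n)])   # detailed-balance rates
Gamma = -W + np.diag(W.sum(axis=0))

P = np.array([0.80, 0.15, 0.05])              # arbitrary initial distribution
dt, steps = 0.01, 2000
H = []
for _ in range(steps):
    H.append(np.sum(P * np.log(P / Peq)))
    P = P - dt * (Gamma @ P)                  # one Euler step of the master equation

H = np.array(H)
print(H[0], H[-1])                            # H starts positive, relaxes toward zero
print(np.all(np.diff(H) <= 1e-12))            # H is nonincreasing along the flow
```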

This page titled 3.1: Modeling the Approach to Equilibrium is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

3.2: Phase Flows in Classical Mechanics
Hamiltonian evolution
The master equation provides us with a semi-phenomenological description of a dynamical system’s relaxation to equilibrium. It
explicitly breaks time reversal symmetry. Yet the microscopic laws of Nature are (approximately) time-reversal symmetric. How
can a system which obeys Hamilton’s equations of motion come to equilibrium?
Let’s start our investigation by reviewing the basics of Hamiltonian dynamics. Recall the Lagrangian L = L(q, q̇ , t) = T − V .
The Euler-Lagrange equations of motion for the action S[q(t)] = ∫ dt L are
ṗ_σ = (d/dt)( ∂L/∂q̇_σ ) = ∂L/∂q_σ , (3.2.1)

where p_σ is the canonical momentum conjugate to the generalized coordinate q_σ:

p_σ = ∂L/∂q̇_σ . (3.2.2)

The Hamiltonian, H(q, p), is obtained by a Legendre transformation,

H(q, p) = ∑_{σ=1}^{r} p_σ q̇_σ − L . (3.2.3)

Note that

dH = ∑_{σ=1}^{r} ( p_σ dq̇_σ + q̇_σ dp_σ − (∂L/∂q_σ) dq_σ − (∂L/∂q̇_σ) dq̇_σ ) − (∂L/∂t) dt
   = ∑_{σ=1}^{r} ( q̇_σ dp_σ − (∂L/∂q_σ) dq_σ ) − (∂L/∂t) dt .

Thus, we obtain Hamilton’s equations of motion,

∂H/∂p_σ = q̇_σ , ∂H/∂q_σ = −∂L/∂q_σ = −ṗ_σ , (3.2.4)

and

dH/dt = ∂H/∂t = −∂L/∂t . (3.2.5)

Define the rank-2r vector φ by its components,

φ_i = { q_i if 1 ≤ i ≤ r ; p_{i−r} if r < i ≤ 2r } . (3.2.6)

Then we may write Hamilton’s equations compactly as

φ̇_i = J_ij ∂H/∂φ_j , (3.2.7)

where

J = ( 0_{r×r}  1_{r×r} ; −1_{r×r}  0_{r×r} ) (3.2.8)

is a rank-2r matrix. Note that Jᵗ = −J, i.e. J is antisymmetric, and that J² = −1_{2r×2r}.

Dynamical systems and the evolution of phase space volumes
Consider a general dynamical system,

dφ/dt = V(φ) , (3.2.9)

where φ(t) is a point in an n-dimensional phase space. Consider now a compact2 region R₀ in phase space, and consider its evolution under the dynamics. That is, R₀ consists of a set of points {φ | φ ∈ R₀}, and if we regard each φ ∈ R₀ as an initial condition, we can define the time-dependent set R(t) as the set of points φ(t) that were in R₀ at time t = 0:

R(t) = {φ(t) | φ(0) ∈ R₀} . (3.2.10)

Now consider the volume Ω(t) of the set R(t). We have

Ω(t) = ∫_{R(t)} dμ , (3.2.11)

where

dμ = dφ₁ dφ₂ ⋯ dφ_n , (3.2.12)

for an n-dimensional phase space. We then have

Ω(t + dt) = ∫_{R(t+dt)} dμ′ = ∫_{R(t)} dμ | ∂φ_i(t + dt)/∂φ_j(t) | , (3.2.13)

where

| ∂φ_i(t + dt)/∂φ_j(t) | ≡ ∂(φ′₁, …, φ′_n)/∂(φ₁, …, φ_n) (3.2.14)

is a determinant, which is the Jacobian of the transformation from the set of coordinates {φ_i = φ_i(t)} to the coordinates {φ′_i = φ_i(t + dt)}. But according to the dynamics, we have

φ_i(t + dt) = φ_i(t) + V_i(φ(t)) dt + O(dt²) (3.2.15)

and therefore

∂φ_i(t + dt)/∂φ_j(t) = δ_ij + (∂V_i/∂φ_j) dt + O(dt²) . (3.2.16)

We now make use of the equality

ln det M = Tr ln M , (3.2.17)

for any matrix M, which gives us3, for small ε,

det(1 + εA) = exp Tr ln(1 + εA) = 1 + ε Tr A + ½ ε² ( (Tr A)² − Tr(A²) ) + … (3.2.18)

Thus,

Ω(t + dt) = Ω(t) + ∫_{R(t)} dμ ∇⋅V dt + O(dt²) , (3.2.19)

which says

dΩ/dt = ∫_{R(t)} dμ ∇⋅V = ∮_{∂R(t)} dS n̂⋅V (3.2.20)

Here, the divergence is the phase space divergence,

∇⋅V = ∑_{i=1}^{n} ∂V_i/∂φ_i , (3.2.21)

and we have used the divergence theorem to convert the volume integral of the divergence to a surface integral of n̂⋅V, where n̂ is the surface normal and dS is the differential element of surface area, and ∂R denotes the boundary of the region R. We see that if ∇⋅V = 0 everywhere in phase space, then Ω(t) is a constant, and phase space volumes are preserved by the evolution of the system.
For an alternative derivation, consider a function ϱ(φ, t) which is defined to be the density of some collection of points in phase
space at phase space position φ and time t . This must satisfy the continuity equation,
∂ϱ/∂t + ∇⋅(ϱV) = 0 . (3.2.22)

This is called the continuity equation. It says that ‘nobody gets lost’. If we integrate it over a region of phase space R , we have
(d/dt) ∫_R dμ ϱ = −∫_R dμ ∇⋅(ϱV) = −∮_{∂R} dS n̂⋅(ϱV) . (3.2.23)

It is perhaps helpful to think of ϱ as a charge density, in which case J = ϱV is the current density. The above equation then says

dQ_R/dt = −∮_{∂R} dS n̂⋅J , (3.2.24)

where Q_R is the total charge contained inside the region R. In other words, the rate of increase or decrease of the charge within the region R is equal to the total integrated current flowing in or out of R at its boundary.

Figure 3.2.1 : Time evolution of two immiscible fluids. The local density remains constant.
The Leibniz rule lets us write the continuity equation as

∂ϱ/∂t + V⋅∇ϱ + ϱ ∇⋅V = 0 . (3.2.25)

But now suppose that the phase flow is divergenceless, ∇⋅V = 0. Then we have

Dϱ/Dt ≡ ( ∂/∂t + V⋅∇ )ϱ = 0 . (3.2.26)

The combination inside the brackets above is known as the convective derivative. It tells us the total rate of change of ϱ for an
observer moving with the phase flow. That is

(d/dt) ϱ(φ(t), t) = ∑_i (∂ϱ/∂φ_i)(dφ_i/dt) + ∂ϱ/∂t = ∑_{i=1}^{n} V_i ∂ϱ/∂φ_i + ∂ϱ/∂t = Dϱ/Dt .

If Dϱ/Dt = 0, the local density remains the same during the evolution of the system. If we consider the ‘characteristic function’

ϱ(φ, t = 0) = { 1 if φ ∈ R₀ ; 0 otherwise } , (3.2.27)

then the vanishing of the convective derivative means that the image of the set R₀ under time evolution will always have the same volume.
Hamiltonian evolution in classical mechanics is volume preserving. The equations of motion are

q̇_i = +∂H/∂p_i , ṗ_i = −∂H/∂q_i . (3.2.28)

A point in phase space is specified by r positions q_i and r momenta p_i, hence the dimension of phase space is n = 2r:

φ = ( q ; p ) , V = ( q̇ ; ṗ ) = ( ∂H/∂p ; −∂H/∂q ) . (3.2.29)

Hamilton’s equations of motion guarantee that the phase space flow is divergenceless:

∇⋅V = ∑_{i=1}^{r} { ∂q̇_i/∂q_i + ∂ṗ_i/∂p_i }
    = ∑_{i=1}^{r} { (∂/∂q_i)(∂H/∂p_i) + (∂/∂p_i)(−∂H/∂q_i) } = 0 .

Thus, we have that the convective derivative vanishes, viz.

Dϱ/Dt ≡ ∂ϱ/∂t + V⋅∇ϱ = 0 , (3.2.30)

for any distribution ϱ(φ, t) on phase space. Thus, the value of the density ϱ(φ(t), t) is constant, which tells us that the phase flow is incompressible. In particular, phase space volumes are preserved.
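A concrete check of this statement (a sketch of my own, using the harmonic oscillator H = (p² + q²)/2 in units where m = ω = 1): the exact Hamiltonian flow over a time dt is a rotation in the (q, p) plane, with unit Jacobian determinant, while a naive explicit-Euler discretization inflates phase space volume by the factor 1 + dt².

```python
import numpy as np

# One time step of the harmonic oscillator, H = (p^2 + q^2)/2.
# Exact flow: rotation by dt -> Jacobian determinant exactly 1 (volume preserved).
# Explicit Euler: (q, p) -> (q + p dt, p - q dt) -> determinant 1 + dt^2 > 1.
dt = 0.1
M_exact = np.array([[np.cos(dt), np.sin(dt)],
                    [-np.sin(dt), np.cos(dt)]])
M_euler = np.array([[1.0, dt],
                    [-dt, 1.0]])

print(np.linalg.det(M_exact))   # 1.0: Hamiltonian flow preserves phase space volume
print(np.linalg.det(M_euler))   # 1.01 = 1 + dt^2: the discretization does not
```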

Liouville’s equation and the microcanonical distribution


Let ϱ(φ) = ϱ(q, p) be a distribution on phase space. Assuming the evolution is Hamiltonian, we can write
r
∂ϱ ∂ ∂
^
= −φ̇ ⋅ ∇ϱ = − ∑ ( q̇ k
+ ṗ k ) ϱ = −i Lϱ , (3.2.31)
∂t ∂q ∂p
k=1 k k

where L
^
is a differential operator known as the Liouvillian:
r
∂H ∂ ∂H ∂
^
L = −i ∑ { − } . (3.2.32)
∂p ∂q ∂q ∂p
k=1 k k k k

Equation 3.2.31, known as Liouville’s equation, bears an obvious resemblance to the Schrödinger equation from quantum
mechanics.
Suppose that Λ_a(φ) is conserved by the dynamics of the system. Typical conserved quantities include the components of the total linear momentum (if there is translational invariance), the components of the total angular momentum (if there is rotational invariance), and the Hamiltonian itself (if the Lagrangian is not explicitly time-dependent). Now consider a distribution ϱ(φ, t) = ϱ(Λ₁, Λ₂, …, Λ_k) which is a function only of these various conserved quantities. Then from the chain rule, we have

φ̇⋅∇ϱ = ∑_a (∂ϱ/∂Λ_a) φ̇⋅∇Λ_a = 0 , (3.2.33)

since for each a we have

dΛ_a/dt = ∑_{σ=1}^{r} ( (∂Λ_a/∂q_σ) q̇_σ + (∂Λ_a/∂p_σ) ṗ_σ ) = φ̇⋅∇Λ_a = 0 . (3.2.34)

We conclude that any distribution ϱ(φ, t) = ϱ(Λ₁, Λ₂, …, Λ_k) which is a function solely of conserved dynamical quantities is a stationary solution to Liouville’s equation.
Clearly the microcanonical distribution,

ϱ_E(φ) = δ(E − H(φ)) / D(E) = δ(E − H(φ)) / ∫ dμ δ(E − H(φ)) , (3.2.35)

is a fixed point solution of Liouville’s equation.

This page titled 3.2: Phase Flows in Classical Mechanics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

3.3: Irreversibility and Poincaré Recurrence
The dynamics of the master equation describe an approach to equilibrium. These dynamics are irreversible: dH/dt ≤ 0, where H is Boltzmann’s H-function. However, the microscopic laws of physics are (almost) time-reversal invariant4, so how can we understand the emergence of irreversibility? Furthermore, any dynamics which are deterministic and volume-preserving in a finite phase space exhibit the phenomenon of Poincaré recurrence, which guarantees that phase space trajectories are arbitrarily close to periodic if one waits long enough.

Poincaré Recurrence Theorem


The proof of the recurrence theorem is simple. Let g_τ be the ‘τ-advance mapping’ which evolves points in phase space according to Hamilton’s equations. Assume that g_τ is invertible and volume-preserving, as is the case for Hamiltonian flow. Further assume that phase space volume is finite. Since energy is preserved in the case of time-independent Hamiltonians, we simply ask that the volume of phase space at fixed total energy E be finite,

∫ dμ δ(E − H(q, p)) < ∞ , (3.3.1)

where dμ = dq dp is the phase space uniform integration measure.

In any finite neighborhood R₀ of phase space there exists a point φ₀ which will return to R₀ after m applications of g_τ, where m is finite.

Figure 3.3.1: Successive images of a set R₀ under the τ-advance mapping g_τ, projected onto a two-dimensional phase plane. The Poincaré recurrence theorem guarantees that if phase space has finite volume, and g_τ is invertible and volume preserving, then for any set R₀ there exists an integer m such that R₀ ∩ g_τ^m R₀ ≠ ∅.

Assume the theorem fails; we will show this assumption results in a contradiction. Consider the set Υ formed from the union of all sets g_τ^k R₀ for all k:

Υ = ⋃_{k=0}^{∞} g_τ^k R₀ (3.3.2)

We assume that the set {g_τ^k R₀ | k ∈ ℕ} is disjoint5. The volume of a union of disjoint sets is the sum of the individual volumes. Thus,

vol(Υ) = ∑_{k=0}^{∞} vol(g_τ^k R₀) = vol(R₀) ⋅ ∑_{k=0}^{∞} 1 = ∞ ,

since vol(g_τ^k R₀) = vol(R₀) from volume preservation. But clearly Υ is a subset of the entire phase space, hence we have a contradiction, because by assumption phase space is of finite volume.

Thus, the assumption that the set {g_τ^k R₀ | k ∈ ℤ₊} is disjoint fails. This means that there exists some pair of integers k and l, with k ≠ l, such that g_τ^k R₀ ∩ g_τ^l R₀ ≠ ∅. Without loss of generality we may assume k < l. Apply the inverse g_τ^{−1} to this relation k times to get g_τ^{l−k} R₀ ∩ R₀ ≠ ∅. Now choose any point φ₁ ∈ g_τ^m R₀ ∩ R₀, where m = l − k, and define φ₀ = g_τ^{−m} φ₁. Then by construction both φ₀ and g_τ^m φ₀ lie within R₀ and the theorem is proven.

Poincaré recurrence has remarkable implications. Consider a bottle of perfume which is opened in an otherwise evacuated room, as depicted in Figure 3.3.2. The perfume molecules evolve according to Hamiltonian evolution. The positions are bounded because physical space is finite. The momenta are bounded because the total energy is conserved, hence no single particle can have a momentum such that T(p) > E_TOT, where T(p) is the single particle kinetic energy function6. Thus, phase space, however large, is still bounded. Hamiltonian evolution, as we have seen, is invertible and volume preserving, therefore the system is recurrent. All the molecules must eventually return to the bottle. What’s more, they all must return with momenta arbitrarily close to their initial momenta!7 In this case, we could define the region R₀ as

R₀ = {(q₁, …, q_r, p₁, …, p_r) | |q_i − q_i⁰| ≤ Δq and |p_j − p_j⁰| ≤ Δp ∀ i, j} , (3.3.3)

which specifies a hypercube in phase space centered about the point (q⁰, p⁰).

Figure 3.3.2 : Poincaré recurrence guarantees that if we remove the cap from a bottle of perfume in an otherwise evacuated room,
all the perfume molecules will eventually return to the bottle! (Here H is the Hubble constant.)
Each of the three central assumptions – finite phase space, invertibility, and volume preservation – is crucial. If any one of these
assumptions does not hold, the proof fails. Obviously if phase space is infinite the flow needn’t be recurrent since it can keep
moving off in a particular direction. Consider next a volume-preserving map which is not invertible. An example might be a
mapping f : R → R which takes any real number to its fractional part. Thus, f (π) = 0.14159265 …. Let us restrict our attention to
intervals of width less than unity. Clearly f is then volume preserving. The action of f on the interval [2, 3) is to map it to the
interval [0, 1). But [0, 1) remains fixed under the action of f , so no point within the interval [2, 3) will ever return under repeated
iterations of f . Thus, f does not exhibit Poincaré recurrence.
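The failure of recurrence for this map is easy to see in a few lines (a sketch; the map and starting point are those of the example above):

```python
import math

# The non-invertible, volume-preserving map f(x) = fractional part of x.
# Points in [2, 3) are mapped into [0, 1) and can never return.
def f(x):
    return x - math.floor(x)

x = 2.5
orbit = [x]
for _ in range(10):
    x = f(x)
    orbit.append(x)

print(orbit[:3])                                # [2.5, 0.5, 0.5]
print(all(0.0 <= y < 1.0 for y in orbit[1:]))   # True: trapped in [0, 1) forever
```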
Consider next the case of the damped harmonic oscillator. In this case, phase space volumes contract. For a one-dimensional oscillator obeying ẍ + 2βẋ + Ω₀²x = 0 one has ∇⋅V = −2β < 0, since β > 0 for physical damping. Thus the convective derivative is D_t ϱ = −(∇⋅V)ϱ = 2βϱ, which says that the density increases exponentially in the comoving frame, as ϱ(t) = e^{2βt} ϱ(0). Thus, phase space volumes collapse: Ω(t) = e^{−2βt} Ω(0), and are not preserved by the dynamics. The proof of recurrence therefore fails. In this case, it is possible for the set Υ to be of finite volume, even if it is the union of an infinite number of sets g_τ^k R₀, because the volumes of these component sets themselves decrease exponentially, as vol(g_τ^n R₀) = e^{−2nβτ} vol(R₀). A damped pendulum, released from rest at some small angle θ₀, will not return arbitrarily close to these initial conditions.

Kac ring model


The implications of the Poincaré recurrence theorem are surprising – even shocking. If one takes a bottle of perfume in a sealed,
evacuated room and opens it, the perfume molecules will diffuse throughout the room. The recurrence theorem guarantees that after
some finite time T all the molecules will go back inside the bottle (and arbitrarily close to their initial velocities as well). The hitch
is that this could take a very long time, much much longer than the age of the Universe.

On less absurd time scales, we know that most systems come to thermodynamic equilibrium. But how can a system both exhibit
equilibration and Poincaré recurrence? The two concepts seem utterly incompatible!
A beautifully simple model due to Kac shows how a recurrent system can exhibit the phenomenon of equilibration. Consider a ring
with N sites. On each site, place a ‘spin’ which can be in one of two states: up or down. Along the N links of the system, F of
them contain ‘flippers’. The configuration of the flippers is set at the outset and never changes. The dynamics of the system are as
follows: during each time step, every spin moves clockwise a distance of one lattice spacing. Spins which pass through flippers
reverse their orientation: up becomes down, and down becomes up.

Figure 3.3.3 : Left: A configuration of the Kac ring with N = 16 sites and F = 4 flippers. The flippers, which live on the links, are
represented by blue dots. Right: The ring system after one time step. Evolution proceeds by clockwise rotation. Spins passing
through flippers are flipped.
The ‘phase space’ for this system consists of 2^N discrete configurations. Since each configuration maps onto a unique image under the evolution of the system, phase space ‘volume’ is preserved. The evolution is invertible; the inverse is obtained simply by rotating the spins counterclockwise. Figure 3.3.3 depicts an example configuration for the system, and its first iteration under the dynamics.

Figure 3.3.4 : Three simulations of the Kac ring model with N = 2500 sites and three different concentrations of flippers. The red
line shows the magnetization as a function of time, starting from an initial configuration in which 100% of the spins are up. The
blue line shows the prediction of the Stosszahlansatz, which yields an exponentially decaying magnetization with time constant τ .

Suppose the flippers were not fixed, but moved about randomly. In this case, we could focus on a single spin and determine its configuration probabilistically. Let p_n be the probability that a given spin is in the up configuration at time n. The probability that it is up at time (n + 1) is then

p_{n+1} = (1 − x) p_n + x (1 − p_n) , (3.3.4)

where x = F/N is the fraction of flippers in the system. In words: a spin will be up at time (n + 1) if it was up at time n and did not pass through a flipper, or if it was down at time n and did pass through a flipper. If the flipper locations are randomized at each time step, then the probability of flipping is simply x = F/N. Equation 3.3.4 can be solved immediately:

p_n = ½ + (1 − 2x)^n ( p₀ − ½ ) , (3.3.5)

which decays exponentially to the equilibrium value of p_eq = ½ with time scale

τ(x) = −1 / ln|1 − 2x| . (3.3.6)

We identify τ(x) as the microscopic relaxation time over which local equilibrium is established. If we define the magnetization m ≡ (N_↑ − N_↓)/N, then m = 2p − 1, so m_n = (1 − 2x)^n m₀. The equilibrium magnetization is m_eq = 0. Note that for ½ < x < 1 the magnetization reverses sign each time step, as well as decreasing exponentially in magnitude.

Figure 3.3.5: Simulations of the Kac ring model. Top: N = 2500 sites with F = 201 flippers. After 2500 iterations, each spin has flipped an odd number of times, so the recurrence time is 2N. Middle: N = 2500 with F = 2400, resulting in a near-complete reversal of the population with every iteration. Bottom: N = 25000 with F = 1000, showing long time equilibration and dramatic resurgence of the spin population.
The assumption that leads to equation 3.3.4 is called the Stosszahlansatz8, a long German word meaning, approximately,
‘assumption on the counting of hits’. The resulting dynamics are irreversible: the magnetization inexorably decays to zero.
However, the Kac ring model is purely deterministic, and the Stosszahlansatz can at best be an approximation to the true dynamics.
Clearly the Stosszahlansatz fails to account for correlations such as the following: if spin i is flipped at time n , then spin i + 1 will
have been flipped at time n − 1 . Also if spin i is flipped at time n , then it also will be flipped at time n + N . Indeed, since the
dynamics of the Kac ring model are invertible and volume preserving, it must exhibit Poincaré recurrence. We see this most vividly
in Figures 3.3.4 and 3.3.5.

The model is trivial to simulate. The results of such a simulation are shown in Figure 3.3.4 for a ring of N = 1000 sites, with F = 100 and F = 24 flippers. Note how the magnetization decays and fluctuates about the equilibrium value m_eq = 0, but that after N iterations m recovers its initial value: m_N = m₀. The recurrence time for this system is simply N if F is even, and 2N if F is odd, since every spin will then have flipped an even number of times.

In Figure 3.3.5 we plot two other simulations. The top panel shows what happens when x > ½, so that the magnetization wants to reverse its sign with every iteration. The bottom panel shows a simulation for a larger ring, with N = 25000 sites. Note that the fluctuations in m about equilibrium are smaller than in the cases with N = 1000 sites. Why?
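The Kac ring dynamics are easy to reproduce. Below is a minimal Python sketch (the parameters N = 1000, F = 100, and the all-up initial condition are illustrative choices, not taken from the figures):

```python
import random

def kac_ring(N=1000, F=100, steps=2500, seed=0):
    """Simulate the Kac ring: N sites, F flipper links, all spins up at n = 0.
    Returns the magnetization history m_n = (N_up - N_down)/N."""
    rng = random.Random(seed)
    flippers = [False] * N              # quenched disorder: flipper on link i
    for i in rng.sample(range(N), F):
        flippers[i] = True
    spins = [1] * N
    mags = [sum(spins) / N]
    for _ in range(steps):
        # a spin moving from site i to site i+1 crosses link i and may flip
        spins = [-s if flippers[i] else s for i, s in enumerate(spins)]
        spins = spins[-1:] + spins[:-1]  # advance clockwise by one site
        mags.append(sum(spins) / N)
    return mags

m = kac_ring()
print(m[10], (1 - 2 * 0.1) ** 10)   # Stosszahlansatz: m_n ≈ (1 - 2x)^n, x = F/N
print(m[1000])                      # → 1.0: recurrence after N steps (F even)
```

After N steps every spin has traversed each of the F flippers exactly once; since F = 100 is even here, m_N = m_0 exactly, illustrating Poincaré recurrence despite the Stosszahlansatz decay.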

This page titled 3.3: Irreversibility and Poincaré Recurrence is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

3.4: Remarks on Ergodic Theory
Definition of Ergodicity
A mechanical system evolves according to Hamilton’s equations of motion. We have seen how such a system is recurrent in the
sense of Poincaré.
There is a level beyond recurrence called ergodicity. In an ergodic system, time averages over intervals [0, T ] with T → ∞ may be
replaced by phase space averages. The time average of a function f (φ) is defined as
⟨f(φ)⟩_t = lim_{T→∞} (1/T) ∫₀ᵀ dt f(φ(t)) .   (3.4.1)

For a Hamiltonian system, the phase space average of the same function is defined by

⟨f(φ)⟩_S = ∫dμ f(φ) δ(E − H(φ)) / ∫dμ δ(E − H(φ)) ,   (3.4.2)

where H(φ) = H(q,p) is the Hamiltonian, and where δ(x) is the Dirac δ-function. Thus,

ergodicity ⟺ ⟨f(φ)⟩_t = ⟨f(φ)⟩_S ,   (3.4.3)

for all smooth functions f(φ) for which ⟨f(φ)⟩_S exists and is finite. Note that we do not average over all of phase space. Rather,

we average only over a hypersurface along which H (φ) = E is fixed, over one of the level sets of the Hamiltonian function. This
is because the dynamics preserves the energy. Ergodicity means that almost all points φ will, upon Hamiltonian evolution, move in
such a way as to eventually pass through every finite neighborhood on the energy surface, and will spend equal time in equal
regions of phase space.
Let χ_R(φ) be the characteristic function of a region R:

χ_R(φ) = { 1 if φ ∈ R ; 0 otherwise } ,   (3.4.4)

where H(φ) = E for all φ ∈ R. Then

⟨χ_R(φ)⟩_t = lim_{T→∞} ( time spent in R / T ) .   (3.4.5)

If the system is ergodic, then

⟨χ_R(φ)⟩_t = P(R) = D_R(E)/D(E) ,   (3.4.6)

where P(R) is the a priori probability to find φ ∈ R, based solely on the relative volumes of R and of the entire phase space. The latter is given by

D(E) = ∫dμ δ(E − H(φ)) ,   (3.4.7)

called the density of states, the surface area of phase space at energy E, and

D_R(E) = ∫_R dμ δ(E − H(φ))   (3.4.8)

is the density of states for the phase space subset R. Note that

D(E) ≡ ∫dμ δ(E − H(φ)) = ∫_{S_E} dS/|∇H|
     = (d/dE) ∫dμ Θ(E − H(φ)) = dΩ(E)/dE .

Here, dS is the differential surface element, S_E is the constant-H hypersurface H(φ) = E, and Ω(E) is the volume of phase space over which H(φ) < E. Note also that we may write

dμ = dE dΣ_E ,   (3.4.9)

where

dΣ_E = dS/|∇H| |_{H(φ)=E}   (3.4.10)

is the invariant surface element.
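As a concrete check of these definitions, consider the one-dimensional harmonic oscillator H = (q² + p²)/2 with m = ω = 1 — an example not taken from the text. The level set is a circle of radius √(2E), so Ω(E) = 2πE and D(E) = dΩ/dE = 2π. A quick Monte Carlo sketch:

```python
import math, random

def omega(E, samples=200_000, seed=1):
    """Monte Carlo estimate of Ω(E) = ∫ dμ Θ(E - H(φ)) for H = (q² + p²)/2."""
    rng = random.Random(seed)
    R = math.sqrt(2 * E)               # the energy surface is a circle of radius R
    hits = sum(1 for _ in range(samples)
               if rng.uniform(-R, R) ** 2 + rng.uniform(-R, R) ** 2 < 2 * E)
    return (2 * R) ** 2 * hits / samples

E0 = 1.5
print(omega(E0), 2 * math.pi * E0)     # Ω(E) = 2πE exactly

dE = 0.1                               # finite-difference density of states
D = (omega(E0 + dE) - omega(E0 - dE)) / (2 * dE)
print(D, 2 * math.pi)                  # D(E) = dΩ/dE = 2π, independent of E
```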

The Microcanonical Ensemble


The distribution,

ϱ_E(φ) = δ(E − H(φ)) / D(E) = δ(E − H(φ)) / ∫dμ δ(E − H(φ)) ,   (3.4.11)

defines the microcanonical ensemble (μCE) of Gibbs.


We could also write
⟨f(φ)⟩_S = (1/D(E)) ∫_{S_E} dΣ_E f(φ) ,   (3.4.12)

integrating over the hypersurface S_E rather than the entire phase space.

Figure 3.4.1 : Constant phase space velocity at an irrational angle over a toroidal phase space is ergodic, but not mixing. A circle
remains a circle, and a blob remains a blob.

Ergodicity and Mixing


Just because a system is ergodic does not necessarily mean that ϱ(φ,t) → ϱ_eq(φ). Consider the following motion on the toroidal space {φ = (q,p) | 0 ≤ q < 1 , 0 ≤ p < 1}, where we identify opposite edges, i.e. we impose periodic boundary conditions. We also take q and p to be dimensionless, for simplicity of notation. Let the dynamics be given by

q̇ = 1 , ṗ = α .   (3.4.13)

The solution is

q(t) = q_0 + t , p(t) = p_0 + αt ,   (3.4.14)

hence the phase curves are given by

p = p_0 + α(q − q_0) .   (3.4.15)

Now consider the average of some function f (q, p). We can write f (q, p) in terms of its Fourier transform,
f(q,p) = Σ_{m,n} f̂_{mn} e^{2πi(mq+np)} .   (3.4.16)

We have, then,

f(q(t), p(t)) = Σ_{m,n} f̂_{mn} e^{2πi(mq_0+np_0)} e^{2πi(m+αn)t} .   (3.4.17)

We can now perform the time average of f:

⟨f(q,p)⟩_t = f̂_00 + lim_{T→∞} (1/T) Σ′_{m,n} f̂_{mn} e^{2πi(mq_0+np_0)} [e^{2πi(m+αn)T} − 1] / [2πi(m+αn)]
           = f̂_00  if α is irrational,

where the prime on the sum indicates that the (m,n) = (0,0) term is excluded.

Clearly,

⟨f(q,p)⟩_S = ∫₀¹dq ∫₀¹dp f(q,p) = f̂_00 = ⟨f(q,p)⟩_t ,   (3.4.18)
so the system is ergodic.
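This ergodic average can be checked numerically. The sketch below uses the mode combination f(q,p) = cos(2πq)cos(2πp), whose phase space average f̂_00 vanishes; the choices of f, α = √2, and the initial point are illustrative:

```python
import math

alpha = math.sqrt(2)       # irrational winding number
q0, p0 = 0.3, 0.7          # arbitrary initial point on the torus

def f(q, p):
    # a combination of Fourier modes with zero phase space average (f_00 = 0)
    return math.cos(2 * math.pi * q) * math.cos(2 * math.pi * p)

T, dt = 2000.0, 0.01
steps = int(T / dt)
time_avg = sum(f(q0 + n * dt, p0 + alpha * n * dt) for n in range(steps)) * dt / T
print(time_avg)            # → ≈ 0, equal to the phase space average
```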

Figure 3.4.2 : The baker’s transformation is a successive stretching, cutting, and restacking.
The situation is depicted in Figure 3.4.1. If we start with the characteristic function of a disc,
ϱ(q, p, t=0) = Θ(a² − (q − q_0)² − (p − p_0)²) ,   (3.4.19)

then it remains the characteristic function of a disc:

ϱ(q, p, t) = Θ(a² − (q − q_0 − t)² − (p − p_0 − αt)²) .   (3.4.20)

For an example of a transition to ergodicity in a simple dynamical Hamiltonian model, see §9.
A stronger condition one could impose is the following. Let A and B be subsets of S_E. Define the measure

ν(A) = ∫dΣ_E χ_A(φ) / ∫dΣ_E = D_A(E)/D(E) ,   (3.4.21)

where χ_A(φ) is the characteristic function of A. The measure of a set A is the fraction of the energy surface S_E covered by A. This means ν(S_E) = 1, since S_E is the entire phase space at energy E. Now let g be a volume-preserving map on phase space.

Given two measurable sets A and B, we say that a system is mixing if

mixing ⟺ lim_{n→∞} ν(gⁿA ∩ B) = ν(A) ν(B) .   (3.4.22)

In other words, the fraction of B covered by the nth iterate of A, gⁿA, is, as n → ∞, simply the fraction of S_E covered by A. The iterated map gⁿ distorts the region A so severely that it eventually spreads out 'evenly' over the entire energy hypersurface. Of course by 'evenly' we mean 'with respect to any finite length scale', because at the very smallest scales, the phase space density is still locally constant as one evolves with the dynamics.

Figure 3.4.3 : The multiply iterated baker's transformation. The set A covers half the phase space and its area is preserved under the map. Initially, the fraction of B covered by A is zero. After many iterations, the fraction of B covered by gⁿA approaches ½.

Mixing means that

⟨f(φ)⟩ = ∫dμ ϱ(φ,t) f(φ)  ⟶(t→∞)  ∫dμ f(φ) δ(E − H(φ)) / ∫dμ δ(E − H(φ))
        ≡ Tr[f(φ) δ(E − H(φ))] / Tr[δ(E − H(φ))] .

Physically, we can imagine regions of phase space being successively stretched and folded. During the stretching process, the
volume is preserved, so the successive stretch and fold operations map phase space back onto itself.
An example of a mixing system is the baker’s transformation, depicted in Figure 3.4.2. The baker map is defined by
g(q,p) = { (2q , ½p)          if 0 ≤ q < ½
         { (2q − 1 , ½p + ½)  if ½ ≤ q < 1 .   (3.4.23)

Note that g is invertible and volume-preserving. The baker’s transformation consists of an initial stretch in which q is expanded by
a factor of two and p is contracted by a factor of two, which preserves the total volume. The system is then mapped back onto the
original area by cutting and restacking, which we can call a ‘fold’. The inverse transformation is accomplished by stretching first in
the vertical (p) direction and squashing in the horizontal (q) direction, followed by a slicing and restacking. Explicitly,
g⁻¹(q,p) = { (½q , 2p)          if 0 ≤ p < ½
           { (½q + ½ , 2p − 1)  if ½ ≤ p < 1 .   (3.4.24)
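Both maps are two-line functions, which makes invertibility and mixing easy to test numerically. In the sketch below, the grid of sample points and the set choices A = {q < ½}, B = {p < ½} are illustrative:

```python
def baker(q, p):
    """One iteration of the baker's transformation, Equation 3.4.23."""
    if q < 0.5:
        return 2 * q, 0.5 * p
    return 2 * q - 1, 0.5 * p + 0.5

def baker_inv(q, p):
    """The inverse map, Equation 3.4.24."""
    if p < 0.5:
        return 0.5 * q, 2 * p
    return 0.5 * q + 0.5, 2 * p - 1

pt = (0.3141, 0.5926)
print(baker_inv(*baker(*pt)) == pt)   # → True: g is invertible

# mixing: iterate a grid filling A = {q < 1/2} (measure 1/2) and estimate
# the measure of g^n A ∩ B for B = {p < 1/2}
n_side = 200
pts = [(i / (2 * n_side), j / n_side) for i in range(n_side) for j in range(n_side)]
for _ in range(12):
    pts = [baker(q, p) for q, p in pts]
measure = 0.5 * sum(1 for q, p in pts if p < 0.5) / len(pts)
print(measure)   # → close to ν(A) ν(B) = 1/4
```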

Figure 3.4.4 : The Arnold cat map applied to an image of 150 × 150 pixels. After 300 iterations, the image repeats itself. (Source:
Wikipedia)

Another example of a mixing system is Arnold's 'cat map',

g(q,p) = ( [q + p] , [q + 2p] ) ,   (3.4.25)

where [x] denotes the fractional part of x. One can write this in matrix form, with M = (1 1 ; 1 2), as

(q′)   (1 1)(q)
(  ) = (   )( )  mod Z² .   (3.4.26)
(p′)   (1 2)(p)

The matrix M is very special because it has integer entries and its determinant is det M = 1. This means that the inverse M⁻¹ also has integer entries. The inverse transformation is then

(q′)   ( 2 −1)(q)
(  ) = (     )( )  mod Z² .   (3.4.27)
(p′)   (−1  1)(p)

Now for something cool. Suppose that our image consists of a set of discrete points located at (n_1/k , n_2/k), where the denominator k ∈ Z is fixed, and where n_1 and n_2 range over the set {1, … , k}. Clearly g and its inverse preserve this set, since the entries of M and M⁻¹ are integers. If there are two possibilities for each pixel (say off and on, or black and white), then there are 2^(k²) possible images, and the cat map will map us invertibly from one image to another. Therefore it must exhibit Poincaré recurrence! This phenomenon is demonstrated vividly in Figure 3.4.4, which shows a k = 150 pixel (square) image of a cat subjected to the iterated cat map. The image is stretched and folded with each successive application of the cat map, but after 300 iterations the image is restored! How can this be if the cat map is mixing? The point is that only the discrete set of points (n_1/k , n_2/k) is periodic. Points with different denominators will exhibit a different periodicity, and points with irrational coordinates will in general never return to their exact initial conditions, although recurrence says they will come arbitrarily close, given enough iterations. The baker's transformation is also different in this respect, since the denominator of the p coordinate is doubled upon each successive iteration.
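The recurrence of the k = 150 image can be verified directly by computing the order of M modulo k, i.e. the smallest n with Mⁿ ≡ 1 mod k:

```python
def cat(n1, n2, k):
    """Arnold cat map restricted to the k x k grid of points (n1/k, n2/k)."""
    return (n1 + n2) % k, (n1 + 2 * n2) % k

def period(k):
    """Smallest n with M^n ≡ identity mod k, M = [[1,1],[1,2]]: the number of
    iterations after which every k-pixel image recurs."""
    a, b, c, d = 1, 1, 1, 2
    m, n = (a, b, c, d), 1
    while m != (1 % k, 0, 0, 1 % k):
        m = ((m[0] * a + m[1] * c) % k, (m[0] * b + m[1] * d) % k,
             (m[2] * a + m[3] * c) % k, (m[2] * b + m[3] * d) % k)
        n += 1
    return n

p150 = period(150)
print(p150)           # → 300, matching Figure 3.4.4

pt = (7, 11)          # an arbitrary grid point
x = pt
for _ in range(p150):
    x = cat(*x, 150)
print(x == pt)        # → True: the point has returned exactly
```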
The student should now contemplate the hierarchy of dynamical systems depicted in Figure 3.4.5, understanding the characteristic features of each successive refinement.

Figure 3.4.5 : The hierarchy of dynamical systems.

This page titled 3.4: Remarks on Ergodic Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

3.5: Thermalization of Quantum Systems
Quantum Dephasing
Thermalization of quantum systems is fundamentally different from that of classical systems. Whereas time evolution in classical
mechanics is in general a nonlinear dynamical system, the Schrödinger equation for time evolution in quantum mechanics is linear:
iℏ ∂Ψ/∂t = Ĥ Ψ ,   (3.5.1)

where Ĥ is a many-body Hamiltonian. In classical mechanics, the thermal state is constructed by time evolution – this is the content of the ergodic theorem. In quantum mechanics, as we shall see, the thermal distribution must be encoded in the eigenstates themselves.
Let us assume an initial condition at t = 0,

|Ψ(0)⟩ = Σ_α C_α |Ψ_α⟩ ,   (3.5.2)

where {|Ψ_α⟩} is an orthonormal eigenbasis for Ĥ satisfying Ĥ|Ψ_α⟩ = E_α|Ψ_α⟩. The expansion coefficients satisfy C_α = ⟨Ψ_α|Ψ(0)⟩. Normalization requires

⟨Ψ(0) | Ψ(0)⟩ = Σ_α |C_α|² = 1 .   (3.5.3)

The time evolution of |Ψ⟩ is then given by

|Ψ(t)⟩ = Σ_α C_α e^{−iE_α t/ℏ} |Ψ_α⟩ .   (3.5.4)

The energy is distributed according to the time-independent function

P(E) = ⟨Ψ(t) | δ(E − Ĥ) | Ψ(t)⟩ = Σ_α |C_α|² δ(E − E_α) .   (3.5.5)

Thus, the average energy is time-independent and is given by

⟨E⟩ = ⟨Ψ(t) | Ĥ | Ψ(t)⟩ = ∫_{−∞}^{∞} dE P(E) E = Σ_α |C_α|² E_α .   (3.5.6)

The root mean square fluctuations of the energy are given by

(ΔE)_rms = ⟨(E − ⟨E⟩)²⟩^{1/2} = [ Σ_α |C_α|² E_α² − ( Σ_α |C_α|² E_α )² ]^{1/2} .   (3.5.7)

Typically we assume that the distribution P(E) is narrowly peaked about ⟨E⟩, such that (ΔE)_rms ≪ E − E_0, where E_0 is the ground state energy. Note that P(E) = 0 for E < E_0, since the eigenspectrum of Ĥ is bounded from below.

Now consider a general quantum observable described by an operator A. We have

⟨A(t)⟩ = ⟨Ψ(t) | A | Ψ(t)⟩ = Σ_{α,β} C∗_α C_β e^{i(E_α − E_β)t/ℏ} A_αβ ,   (3.5.8)

where A_αβ = ⟨Ψ_α | A | Ψ_β⟩. In the limit of large times, we have

⟨A⟩_t ≡ lim_{T→∞} (1/T) ∫₀ᵀ dt ⟨A(t)⟩ = Σ_α |C_α|² A_αα .   (3.5.9)

Note that this implies that all coherence between different eigenstates is lost in the long time limit, due to dephasing.
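This dephasing is easy to see numerically. The sketch below uses a random dense symmetric matrix standing in for a generic many-body Ĥ (the dimension, observable, and initial state are all illustrative assumptions), and compares the long-time average of Equation 3.5.8 with the diagonal sum of Equation 3.5.9:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 60                                   # illustrative Hilbert space dimension

H = rng.normal(size=(D, D)); H = (H + H.T) / 2    # random symmetric "Hamiltonian"
A = rng.normal(size=(D, D)); A = (A + A.T) / 2    # random observable
E, V = np.linalg.eigh(H)                 # E_a and eigenvector columns V[:, a]

psi0 = rng.normal(size=D) + 1j * rng.normal(size=D)
psi0 /= np.linalg.norm(psi0)
C = V.T @ psi0                           # expansion coefficients C_a (V is real)
Aab = V.T @ A @ V                        # matrix elements A_ab in the eigenbasis

def expect_A(t):
    psi_t = C * np.exp(-1j * E * t)      # C_a e^{-i E_a t}  (units with hbar = 1)
    return (psi_t.conj() @ Aab @ psi_t).real

ts = np.linspace(0.0, 4000.0, 8001)
time_avg = np.mean([expect_A(t) for t in ts])
diag = float(np.sum(np.abs(C) ** 2 * np.diag(Aab)))
print(time_avg, diag)   # the time average settles onto the dephased diagonal sum
```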

Eigenstate Thermalization Hypothesis
The essential ideas behind the eigenstate thermalization hypothesis (ETH) were described independently by J. Deutsch (1991) and
by M. Srednicki (1994). The argument goes as follows. If the total energy is the only conserved quantity, and if A is a local,
translationally-invariant, few-body operator, then the time average ⟨A⟩_t is given by its microcanonical value,

⟨A⟩_t = Σ_α |C_α|² A_αα = Σ_α A_αα Θ(E_α ∈ I) / Σ_α Θ(E_α ∈ I) ≡ ⟨A⟩_E ,   (3.5.10)

where I = [E, E + ΔE] is an energy interval of width ΔE. So once again, time averages are microcanonical averages.
But how is it that this is the case? The hypothesis of Deutsch and of Srednicki is that thermalization in isolated and bounded
quantum systems occurs at the level of individual eigenstates. That is, for all eigenstates |Ψ_α⟩ with E_α ∈ I, one has

A_αα = ⟨A⟩_{E_α} .   (3.5.11)
This means that thermal information is encoded in each eigenstate. This is called the eigenstate thermalization hypothesis (ETH).
An equivalent version of the ETH is the following scenario. Suppose we have an infinite or extremely large quantum system U (the 'universe') fixed in an eigenstate |Ψ_α⟩. Then form the projection operator P_α = |Ψ_α⟩⟨Ψ_α|. Projection operators satisfy P_α² = P_α, and their eigenspectrum consists of one eigenvalue 1, with all remaining eigenvalues zero. Now consider a partition of U = W ∪ S, where W ≫ S. We imagine S to be the 'system' and W the 'world'. We can always decompose the state |Ψ_α⟩ in a complete product basis for W and S, viz.

|Ψ_α⟩ = Σ_{p=1}^{N_W} Σ_{j=1}^{N_S} Q^α_{pj} |ψ^W_p⟩ ⊗ |ψ^S_j⟩ .   (3.5.12)

Here N_{W/S} is the size of the basis for W/S. The reduced density matrix for S is defined as

ρ_S = Tr_W P_α = Σ_{j,j′=1}^{N_S} ( Σ_{p=1}^{N_W} Q^α_{pj} Q^{α∗}_{pj′} ) |ψ^S_j⟩⟨ψ^S_{j′}| .   (3.5.13)

The claim is that ρ_S approximates a thermal density matrix on S,

ρ_S ≈ (1/Z_S) e^{−βĤ_S} ,   (3.5.14)

where Ĥ_S is some Hamiltonian on S, and Z_S = Tr e^{−βĤ_S}, so that Tr ρ_S = 1 and ρ_S is properly normalized. A number of issues remain to be clarified:

What do we mean by "approximates"?
What do we mean by Ĥ_S?
What do we mean by the temperature T?


We address these in reverse order. The temperature T of an eigenstate |Ψ_α⟩ of a Hamiltonian Ĥ is defined by setting its energy density E_α/V to the thermal energy density,

E_α / V = (1/V) Tr Ĥ e^{−βĤ} / Tr e^{−βĤ} .   (3.5.15)

Here, Ĥ = Ĥ_U is the full Hamiltonian of the universe U = W ∪ S. Our intuition is that Ĥ_S should reflect a restriction of the original Hamiltonian Ĥ_U to the system S. What should be done, though, about the interface parts of Ĥ_U which link S and W? For lattice Hamiltonians, we can simply but somewhat arbitrarily cut all the bonds coupling S and W. But we could easily imagine some other prescription, such as halving the coupling strength along all such interface bonds. Indeed, the definition of Ĥ_S is somewhat arbitrary. However, so long as we use ρ_S to compute averages of local operators which lie sufficiently far from the boundary of S, the precise details of how we truncate Ĥ_U to Ĥ_S are unimportant. This brings us to the first issue: the approximation of ρ_S by its Gibbs form in Equation 3.5.14 is only valid when we consider averages of local operators lying within the bulk of S. This means that we must only examine operators whose support is confined to regions greater than some distance ξ_T from ∂S, where ξ_T is a thermal correlation length. This, in turn, requires that L_S ≫ ξ_T, i.e. the region S is very large on the scale of ξ_T. How do we define ξ_T? For a model such as the Ising model, it can be taken to be the usual correlation length obtained from the spin-spin correlation function ⟨σ_r σ_{r′}⟩_T. More generally, we may choose the largest correlation length from among the correlators of all the independent local operators in our system. Again, the requirement is that exp(−d_∂(r)/ξ_T) ≪ 1, where d_∂(r) is the shortest distance from the location of our local operator O_r to the boundary of S. At criticality, the exponential is replaced by a power law (d_∂(r)/ξ_T)^{−p}, where p is a critical exponent. Another implicit assumption here is that V_S ≪ V_W.
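The algebra of Equation 3.5.13 is easy to exercise numerically. The sketch below builds a random normalized state on W ⊗ S and checks that ρ_S, formed from the coefficient matrix Q, is a valid density matrix. The basis sizes are arbitrary, and checking thermality of ρ_S would require an actual eigenstate of a physical Ĥ, which this random state is not:

```python
import numpy as np

rng = np.random.default_rng(42)
NW, NS = 64, 4        # illustrative basis sizes for the world W and system S

# random normalized state |Psi> = sum_{p,j} Q[p,j] |psi_p^W> (x) |psi_j^S>
Q = rng.normal(size=(NW, NS)) + 1j * rng.normal(size=(NW, NS))
Q /= np.linalg.norm(Q)

# Equation 3.5.13: (rho_S)_{jj'} = sum_p Q_{pj} Q*_{pj'}
rho_S = Q.T @ Q.conj()

print(np.trace(rho_S).real)                  # → 1.0 up to rounding
print(np.allclose(rho_S, rho_S.conj().T))    # → True: Hermitian
print(np.linalg.eigvalsh(rho_S).min())       # positive: rho_S is a valid mixed state
```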

When is the ETH true?


There is no rigorous proof of the ETH. Deutsch showed that the ETH holds for the case of an integrable Hamiltonian weakly
perturbed by a single Gaussian random matrix. Horoi (1995) showed that nuclear shell model wavefunctions reproduce
thermodynamic predictions. Recent numerical work by M. Rigol and collaborators has verified the applicability of the ETH in
small interacting boson systems. ETH fails for so-called integrable models, where there are a large number of conserved quantities,
which commute with the Hamiltonian. Integrable models are, however, quite special, and as Deutsch showed, integrability is
spoiled by weak perturbations, in which case ETH then applies.
ETH also fails in the case of noninteracting disordered systems which exhibit Anderson localization. Single particle energy eigenstates ψ_j whose energies ε_j lie in the localized portion of the eigenspectrum decay exponentially, as |ψ_j(r)|² ∼ exp(−|r − r_j|/ξ(ε_j)), where r_j is some position in space associated with ψ_j and ξ(ε_j) is the localization length.

Within the localized portion of the spectrum, ξ(ε) is finite. As ε approaches a mobility edge, ξ(ε) diverges as a power law. In the
delocalized regime, eigenstates are spatially extended and typically decay at worst as a power law. Exponentially localized states
are unable to thermalize with other distantly removed localized states. Of course, all noninteracting systems will violate ETH,
because they are integrable. The interacting version of this phenomenon, many-body localization (MBL), is a topic of intense
current interest in condensed matter and statistical physics. MBL systems also exhibit a large number of conserved quantities, but
in contrast to the case of integrable systems, where each conserved quantity is in general expressed in terms of an integral of a local
density, in MBL systems the conserved quantities are themselves local, although emergent. The emergent nature of locally
conserved quantities in MBL systems means that they are not simply expressed in terms of the original local operators of the
system, but rather are arrived at via a sequence of local unitary transformations.
Note again that in contrast to the classical case, time evolution of a quantum state does not create the thermal state. Rather, it reveals the thermal distribution which is encoded in all eigenstates after sufficient time for dephasing to occur, so that correlations between all the wavefunction expansion coefficients {C_α} for α ≠ α′ are lost.

This page titled 3.5: Thermalization of Quantum Systems is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

3.6: Appendices
Appendix I: Formal Solution of the Master Equation
Recall the master equation Ṗ_i = −Γ_ij P_j. The matrix Γ_ij is real but not necessarily symmetric. For such a matrix, the left eigenvectors ϕ^α_i and the right eigenvectors ψ^β_i are in general different:

ϕ^α_i Γ_ij = λ_α ϕ^α_j
Γ_ij ψ^β_j = λ_β ψ^β_i .

Note that the eigenvalue equation for the right eigenvectors is Γψ = λψ while that for the left eigenvectors is Γᵗϕ = λϕ. The characteristic polynomial is the same in both cases:

F(λ) ≡ det(λ − Γ) = det(λ − Γᵗ) ,   (3.6.1)

which means that the left and right eigenvalues are the same. Note also that [F(λ)]∗ = F(λ∗), hence the eigenvalues are either real or appear in complex conjugate pairs. Multiplying the eigenvector equation for ϕ^α on the right by ψ^β_j and summing over j, and multiplying the eigenvector equation for ψ^β on the left by ϕ^α_i and summing over i, and subtracting the two results yields

(λ_α − λ_β) ⟨ϕ^α | ψ^β⟩ = 0 ,   (3.6.2)

where the inner product is

⟨ϕ | ψ⟩ = Σ_i ϕ_i ψ_i .   (3.6.3)

We can now demand

⟨ϕ^α | ψ^β⟩ = δ_αβ ,   (3.6.4)

in which case we can write

Γ = Σ_α λ_α |ψ^α⟩⟨ϕ^α|  ⟺  Γ_ij = Σ_α λ_α ψ^α_i ϕ^α_j .   (3.6.5)

We have seen that ϕ⃗ = (1, 1, … , 1) is a left eigenvector with eigenvalue λ = 0, since Σ_i Γ_ij = 0. We do not know a priori the corresponding right eigenvector, which depends on other details of Γ_ij. Now let's expand P_i(t) in the right eigenvectors of Γ, writing

P_i(t) = Σ_α C_α(t) ψ^α_i .   (3.6.6)

Then

dP_i/dt = Σ_α (dC_α/dt) ψ^α_i
        = −Γ_ij P_j = −Σ_α C_α Γ_ij ψ^α_j
        = −Σ_α λ_α C_α ψ^α_i .

This allows us to write

dC_α/dt = −λ_α C_α  ⟹  C_α(t) = C_α(0) e^{−λ_α t} .   (3.6.7)

Hence, we can write

P_i(t) = Σ_α C_α(0) e^{−λ_α t} ψ^α_i .   (3.6.8)

It is now easy to see that Re(λ_α) ≥ 0 for all λ_α, or else the probabilities will become negative. For suppose Re(λ_α) < 0 for some α. Then as t → ∞, the sum in Equation 3.6.8 will be dominated by the term for which λ_α has the largest negative real part; all other contributions will be subleading. But we must have Σ_i ψ^α_i = 0 since |ψ^α⟩ must be orthogonal to the left eigenvector ϕ^{α=0} = (1, 1, … , 1). Therefore, at least one component of ψ^α_i (for some value of i) must have a negative real part, which means a negative probability! As we have already proven that an initial nonnegative distribution {P_i(t = 0)} will remain nonnegative under the evolution of the master equation, we conclude that P_i(t) → P_i^eq as t → ∞, relaxing to the λ = 0 right eigenvector, with Re(λ_α) ≥ 0 for all α.
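The spectral structure derived above — a zero eigenvalue whose right eigenvector is the equilibrium distribution, with all other eigenvalues having nonnegative real part — can be confirmed on a random rate matrix (the rates and state count below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

# random nonnegative rates W_ij (i != j); Gamma_ij = -W_ij + delta_ij sum_k W_kj,
# so every column of Gamma sums to zero
W = rng.uniform(0.1, 1.0, size=(n, n))
np.fill_diagonal(W, 0.0)
Gamma = -W + np.diag(W.sum(axis=0))

lam, R = np.linalg.eig(Gamma)
print(sorted(lam.real))          # one eigenvalue ≈ 0, the rest have Re(lam) > 0

# the zero-mode right eigenvector is the equilibrium distribution
k = int(np.argmin(np.abs(lam)))
Peq = np.real(R[:, k]); Peq /= Peq.sum()

# direct (Euler) evolution of dP/dt = -Gamma P relaxes onto Peq
P = np.zeros(n); P[0] = 1.0
dt = 0.001
for _ in range(20000):
    P = P - dt * (Gamma @ P)
print(np.abs(P - Peq).max())     # → small
```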

Appendix II: Radioactive Decay


Consider a group of atoms, some of which are in an excited state which can undergo nuclear decay. Let P_n(t) be the probability that n atoms are excited at some time t. We then model the decay dynamics by

W_mn = { 0    if m ≥ n
       { nγ   if m = n − 1
       { 0    if m < n − 1 .   (3.6.9)

Here, γ is the decay rate of an individual atom, which can be determined from quantum mechanics. The master equation then tells us

dP_n/dt = (n + 1) γ P_{n+1} − n γ P_n .   (3.6.10)

The interpretation here is as follows: let |n⟩ denote a state in which n atoms are excited. Then P_n(t) = |⟨ψ(t)|n⟩|². Then P_n(t) will increase due to spontaneous transitions from |n+1⟩ to |n⟩, and will decrease due to spontaneous transitions from |n⟩ to |n−1⟩.

The average number of particles in the system is


N(t) = Σ_{n=0}^∞ n P_n(t) .   (3.6.11)

Note that

dN/dt = Σ_{n=0}^∞ n [(n + 1) γ P_{n+1} − n γ P_n]
      = γ Σ_{n=0}^∞ [n(n − 1) P_n − n² P_n]
      = −γ Σ_{n=0}^∞ n P_n = −γ N .

Thus,

N(t) = N(0) e^{−γt} .   (3.6.12)

The relaxation time is τ = γ⁻¹, and the equilibrium distribution is

P_n^eq = δ_{n,0} .   (3.6.13)

Note that this satisfies detailed balance.


We can go a bit farther here. Let us define

P(z, t) ≡ Σ_{n=0}^∞ zⁿ P_n(t) .   (3.6.14)

This is sometimes called a generating function. Then


∂P/∂t = γ Σ_{n=0}^∞ zⁿ [(n + 1) P_{n+1} − n P_n]
      = γ ∂P/∂z − γ z ∂P/∂z .

Thus,

(1/γ) ∂P/∂t − (1 − z) ∂P/∂z = 0 .   (3.6.15)

We now see that any function f(ξ) satisfies the above equation, where ξ = γt − ln(1 − z). Thus, we can write

P(z, t) = f(γt − ln(1 − z)) .   (3.6.16)

Setting t = 0 we have P(z, 0) = f(−ln(1 − z)), and inverting this result we obtain f(u) = P(1 − e^{−u}, 0), hence

P(z, t) = P(1 + (z − 1) e^{−γt}, 0) .   (3.6.17)

The total probability is P(z=1, t) = Σ_{n=0}^∞ P_n, which clearly is conserved: P(1, t) = P(1, 0). The average particle number is

N(t) = Σ_{n=0}^∞ n P_n(t) = ∂P/∂z |_{z=1} = e^{−γt} P′(1, 0) = N(0) e^{−γt} .   (3.6.18)
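Equation 3.6.17 can be checked against a direct integration of the master equation. If exactly n₀ atoms are initially excited, P(z,0) = z^{n₀} and Equation 3.6.17 gives a binomial distribution in which each atom survives independently with probability e^{−γt}. The values of n₀, γ, and the Euler step below are illustrative:

```python
import math

n0, gamma = 20, 0.7
t_final, dt = 1.0, 1e-4

# forward-Euler integration of dP_n/dt = (n+1) gamma P_{n+1} - n gamma P_n,
# starting from exactly n0 excited atoms: P_n(0) = delta_{n,n0}
P = [0.0] * (n0 + 1)
P[n0] = 1.0
for _ in range(int(t_final / dt)):
    P = [P[n] + dt * gamma * (((n + 1) * P[n + 1] if n < n0 else 0.0) - n * P[n])
         for n in range(n0 + 1)]

# Equation 3.6.17 with P(z,0) = z^n0 gives P(z,t) = (1 + (z-1) e^{-gamma t})^n0,
# i.e. a binomial distribution with survival probability s = e^{-gamma t}
s = math.exp(-gamma * t_final)
binom = [math.comb(n0, n) * s ** n * (1 - s) ** (n0 - n) for n in range(n0 + 1)]
err = max(abs(a - b) for a, b in zip(P, binom))

N = sum(n * p for n, p in enumerate(P))
print(err)            # → small: the two distributions agree
print(N, n0 * s)      # N(t) = N(0) e^{-gamma t}
```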

Appendix III: Transition to Ergodicity in a Simple Model


A ball of mass m executes perfect one-dimensional motion along the symmetry axis of a piston. Above the ball lies a mobile piston
head of mass M which slides frictionlessly inside the piston. Both the ball and piston head execute ballistic motion, with two types
of collision possible: (i) the ball may bounce off the floor, which is assumed to be infinitely massive and fixed in space, and (ii) the
ball and piston head may engage in a one-dimensional elastic collision. The Hamiltonian is
H = P²/2M + p²/2m + MgX + mgx ,   (3.6.19)

where X is the height of the piston head and x the height of the ball. Another quantity is conserved by the dynamics: Θ(X − x), i.e. the ball always lies below the piston head.

Choose an arbitrary length scale L, and then energy scale E_0 = MgL, momentum scale P_0 = M√(gL), and time scale τ_0 = √(L/g). Show that the dimensionless Hamiltonian becomes

H̄ = ½P̄² + X̄ + p̄²/2r + r x̄ ,   (3.6.20)

with r = m/M, and with equations of motion dX̄/dt̄ = ∂H̄/∂P̄, etc. (Here the bar indicates dimensionless variables: P̄ = P/P_0, t̄ = t/τ_0, etc.) What special dynamical consequences hold for r = 1?

Compute the microcanonical average piston height ⟨X⟩. The analogous dynamical average is

⟨X⟩_t = lim_{T→∞} (1/T) ∫₀ᵀ dt X(t) .   (3.6.21)

When computing microcanonical averages, it is helpful to use the Laplace transform, discussed toward the end of §3.3 of the
notes. (It is possible to compute the microcanonical average by more brute force methods as well.)
Compute the microcanonical average of the rate of collisions between the ball and the floor. Show that this is given by

⟨ Σ_i δ(t − t_i) ⟩ = ⟨ Θ(v) v δ(x − 0⁺) ⟩ .   (3.6.22)

The analogous dynamical average is

⟨γ⟩_t = lim_{T→∞} (1/T) ∫₀ᵀ dt Σ_i δ(t − t_i) ,   (3.6.23)

where {t_i} is the set of times at which the ball hits the floor.

How do your results change if you do not enforce the dynamical constraint X ≥ x ?
Write a computer program to simulate this system. The only input should be the mass ratio r (set Ē = 10 to fix the energy).
You also may wish to input the initial conditions, or perhaps to choose the initial conditions randomly (all satisfying energy
conservation, of course!). Have your program compute the microcanonical as well as dynamical averages in parts (b) and (c).
Plot out the Poincaré section of P vs. X for those times when the ball hits the floor. Investigate this for several values of r. Just
to show you that this is interesting, I’ve plotted some of my own numerical results in Figure 3.6.1.

Figure 3.6.1 : Poincaré sections for the ball and piston head problem. Each color corresponds to a different initial condition. When
the mass ratio r = m/M exceeds unity, the system apparently becomes ergodic.

r X(0) ⟨X(t)⟩ ⟨X⟩μce ⟨γ(t)⟩ ⟨γ⟩μce r X(0) ⟨X(t)⟩ ⟨X⟩μce ⟨γ(t)⟩ ⟨γ⟩μce

0.3 0.1 6.1743 5.8974 0.5283 0.4505 1.2 0.1 4.8509 4.8545 0.3816 0.3812

0.3 1.0 5.7303 5.8974 0.4170 0.4505 1.2 1.0 4.8479 4.8545 0.3811 0.3812

0.3 3.0 5.7876 5.8974 0.4217 0.4505 1.2 3.0 4.8493 4.8545 0.3813 0.3812

0.3 5.0 5.8231 5.8974 0.4228 0.4505 1.2 5.0 4.8482 4.8545 0.3813 0.3812

0.3 7.0 5.8227 5.8974 0.4228 0.4505 1.2 7.0 4.8472 4.8545 0.3808 0.3812

0.3 9.0 5.8016 5.8974 0.4234 0.4505 1.2 9.0 4.8466 4.8545 0.3808 0.3812

0.3 9.9 6.1539 5.8974 0.5249 0.4505 1.2 9.9 4.8444 4.8545 0.3807 0.3812

r     X(0)   N_b    ⟨X(t)⟩       ⟨X⟩μce       ⟨γ(t)⟩        ⟨γ⟩μce

1.2   7.0    10⁴    4.8054892    4.8484848    0.37560388    0.38118510
1.2   7.0    10⁵    4.8436969    4.8484848    0.38120356    0.38118510
1.2   7.0    10⁶    4.8479414    4.8484848    0.38122778    0.38118510
1.2   7.0    10⁷    4.8471686    4.8484848    0.38083749    0.38118510
1.2   7.0    10⁸    4.8485825    4.8484848    0.38116282    0.38118510
1.2   7.0    10⁹    4.8486682    4.8484848    0.38120259    0.38118510
1.2   1.0    10⁹    4.8485381    4.8484848    0.38118069    0.38118510
1.2   9.9    10⁹    4.8484886    4.8484848    0.38116295    0.38118510
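A minimal event-driven version of the simulation requested above can be sketched as follows. In the dimensionless units of Equation 3.6.20 both the ball and the piston head have acceleration −1, so between events the gap X − x closes at a constant rate; the initial condition below (piston at rest, ball momentum fixed by Ē = 10) is one arbitrary choice:

```python
import math

def simulate(r=1.2, E=10.0, X0=7.0, x0=1.0, events=20000):
    """Event-driven ball + piston head in the dimensionless units of Equation
    3.6.20: piston head mass 1, ball mass r, both with acceleration -1.
    Returns (time-averaged X, floor-collision rate, final energy)."""
    v = math.sqrt(2 * (E - X0 - r * x0) / r)   # ball velocity from energy, V = 0
    x, X, V = x0, X0, 0.0
    t = intX = 0.0
    hits = 0
    for _ in range(events):
        s_floor = v + math.sqrt(v * v + 2 * x)      # root of x + v s - s^2/2 = 0
        # equal accelerations: the gap X - x closes at the constant rate v - V
        s_coll = (X - x) / (v - V) if v > V else math.inf
        s = min(s_floor, s_coll)
        intX += X * s + V * s * s / 2 - s ** 3 / 6  # integral of X over the flight
        x += v * s - s * s / 2
        X += V * s - s * s / 2
        v -= s
        V -= s
        t += s
        if s_floor <= s_coll:                       # bounce off the floor
            x, v = 0.0, -v
            hits += 1
        else:                                       # 1D elastic collision
            v, V = ((r - 1) * v + 2 * V) / (r + 1), ((1 - r) * V + 2 * r * v) / (1 + r)
    E_final = V * V / 2 + X + r * (v * v / 2 + x)
    return intX / t, hits / t, E_final

avgX, rate, E_final = simulate()
print(avgX, rate)   # compare with the r = 1.2 entries of the tables above
```

The closed-form flight updates conserve the energy to rounding error, which is a useful internal check on the event logic.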

This page titled 3.6: Appendices is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

3.S: Summary
References
R. Balescu, Equilibrium and Nonequilibrium Statistical Mechanics (Wiley, 1975). An advanced text with an emphasis on fluids and kinetics.
R. Balian, From Macrophysics to Microphysics (2 vols., Springer-Verlag, 2006). A very detailed discussion of the fundamental postulates of statistical mechanics and their implications.

Summary
∙ Distributions: Equilibrium statistical mechanics describes systems of particles in terms of time-independent statistical
distributions. Where do these distributions come from? How does a system with a given set of initial conditions come to have time-
independent properties which can be described in this way?
∙ Master equation: Let P_i(t) be the probability that a system is in state |i⟩ at time t. The evolution of the P_i(t) is given by dP_i/dt = Σ_j (W_ij P_j − W_ji P_i) = −Σ_j Γ_ij P_j, where the rates W_ij ≥ 0 are nonnegative. Conservation of probability means Σ_i Γ_ij = 0 for all j, hence ψᵗ = (1, 1, … , 1) is a left eigenvector with eigenvalue zero. The corresponding right eigenvector is the equilibrium distribution: Γ_ij P_j^eq = 0. Detailed balance, W_ij P_j^eq = W_ji P_i^eq, is a more stringent condition than the requirement of a stationary distribution alone. Boltzmann's H-theorem: Ḣ ≤ 0, where H = Σ_i P_i ln(P_i/P_i^eq). Thus, the ME dynamics are irreversible. But the underlying microscopic laws are reversible!
∙ Hamiltonian evolution: φ̇_i = J_ij ∂H/∂φ_j, where φ = (q_1, … , q_r, p_1, … , p_r) is a point in 2r-dimensional phase space, and

J = (  0  I )
    ( −I  0 ) .

Phase space flow is then incompressible: ∇·φ̇ = 0, hence phase space densities ϱ(φ, t) obey Liouville's equation, ∂_t ϱ + φ̇·∇ϱ = 0 (which follows from continuity and incompressibility). Any function ϱ(Λ_1, … , Λ_k), where each Λ_i is conserved by the phase space dynamics, will be a stationary solution to Liouville's equation. In particular, the microcanonical distribution ϱ_E(φ) = δ(E − H(φ))/D(E) is such a solution, where D(E) = Tr δ(E − H(φ)) is the density of states.
∙ Poincaré Recurrence: Let g_τ φ(t) = φ(t + τ) be the τ-advance mapping for a dynamical system φ̇ = V(φ). If (i) g_τ is invertible, (ii) g_τ preserves phase space volumes, and (iii) the volume of phase space accessible given the dynamics and initial conditions is finite, then in any finite neighborhood R_0 of phase space there exists a point φ_0 ∈ R_0 such that g_τⁿ φ_0 ∈ R_0 with n finite. This means all the perfume molecules eventually go back inside the bottle (if it is opened in a sealed room).
∙ Kac ring model: Normally the recurrence time is orders of magnitude greater than the age of the Universe, but for the Kac ring model, one can simulate the recurrence phenomenon easily. The model consists of a ring of N sites, and a quenched (fixed) random distribution of flippers on F of the links (F ≤ N). On each site lies a discrete spin variable which is polarized either up or down. The system evolves discretely, with all spins advancing clockwise by one site during a given time step. All spins which pass through a flipper reverse their polarization. Viewed probabilistically, if p_n is the probability any given spin is up at time n, then under the assumptions of the Stosszahlansatz p_{n+1} = (1 − x) p_n + x (1 − p_n), where x = F/N is the flipper density. This leads to exponential relaxation with a time scale τ = −1/ln|1 − 2x|, but the recurrence time is clearly N (if F is even) or 2N (if F is odd).
∙ Ergodicity and mixing: A dynamical system is ergodic if

⟨f(φ)⟩_t = lim_{T→∞} (1/T) ∫₀ᵀ dt f(φ(t)) = Tr[f(φ) δ(E − H(φ))] / Tr[δ(E − H(φ))] = ⟨f(φ)⟩_S .   (3.S.1)
This means long time averages are equal to phase space averages. This does not necessarily mean that the phase space distribution
will converge to the microcanonical distribution. A stronger condition, known as mixing, means that the distribution spreads out
'evenly' over the phase space hypersurface consistent with all conservation laws. Thus, if $g$ is a phase space map, and if $\nu(A) \equiv D_A(E)/D(E)$ is the fraction of the energy hypersurface (assume no conserved quantities other than $H = E$) contained in $A$, then $g$ is mixing if $\lim_{n\to\infty} \nu(g^n A \cap B) = \nu(A)\,\nu(B)$. An example of a mixing map on a two-dimensional torus is the Arnold 'cat map',

3.S.1 https://phys.libretexts.org/@go/page/18729

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\!\begin{pmatrix} q \\ p \end{pmatrix}\ \ \mathrm{mod}\ \mathbb{Z}^2\ . \tag{3.S.2}$$
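On rational points the cat map can be iterated exactly with integer arithmetic, which also illustrates Poincaré recurrence: the map is invertible and area-preserving (the matrix has determinant 1), and a finite lattice of points $(x/N, y/N)$ means every orbit is periodic. A small sketch of our own (not from the text):

```python
def cat_map(pt, N):
    # (q, p) -> (q + p, q + 2p) mod 1, restricted to lattice points (x/N, y/N);
    # the matrix ((1,1),(1,2)) has det = 1, so phase space area is preserved
    x, y = pt
    return ((x + y) % N, (x + 2 * y) % N)

N = 5
p0 = (1, 0)                         # the lattice point (1/5, 0)
p, period = cat_map(p0, N), 1
while p != p0:                      # orbit must close: finite invertible dynamics
    p = cat_map(p, N)
    period += 1
```

For $N = 5$ the orbit of $(1/5, 0)$ closes after 10 iterations, even though nearby irrational points separate exponentially under the same map.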

∙ Thermalization of quantum systems: This is a current research topic. One proposal, due to Deutsch (1991) and Srednicki (1994), is the eigenstate thermalization hypothesis (ETH). This says that thermal information is encoded in each eigenstate, such that if $E_\alpha \in [E, E + \Delta E]$, then

$$\langle\, \Psi_\alpha\, |\, A\, |\, \Psi_\alpha\, \rangle = \langle A \rangle\ , \tag{3.S.3}$$

the expectation value of some local, translationally-invariant, few-body operator $A$ in the state $|\,\Psi_\alpha\,\rangle$, is given by its average over a small energy window containing $E_\alpha$. If this is the case, then so long as we prepare an initial state such that the spread of energies is within $\Delta E$ of some value $E$, where $\Delta E \ll E - E_0$ with $E_0$ the ground state energy, then $\langle A \rangle_T = \langle A \rangle_E$, and time averages become energy averages. Equivalently, the reduced density matrix $\rho_S$ corresponding to a system $S$ which is a subset of a universe $U$, with $W \cup S = U$ ($W$ is the 'world'), is a thermal density matrix: $\rho_S = Z_S^{-1}\, e^{-\beta \hat H_S}$, where $\hat H_S$ is the Hamiltonian restricted to $S$, and with temperature fixed by the requirement $\mathrm{Tr}\,(\rho_S\, \hat H_S) = E \cdot (V_S / V_U)$, where the last factor is a ratio of volumes. ETH

does not hold for so-called integrable models with an extensive number of independent conserved quantities. But it has been
shown, both perturbatively as well as numerically, to hold for certain model nonintegrable systems. An interesting distinction
between classical and quantum thermalization: in the quantum case, time evolution does not create the thermal state. Rather, it
reveals the thermal distribution which is encoded in each eigenstate after sufficient time that dephasing has occurred and all
correlations between the different wavefunction expansion coefficients are lost.

Endnotes
1. Exceptions involve quantities which are conserved by collisions, such as overall particle number, momentum, and energy. These
quantities relax to equilibrium in a special way called hydrodynamics.↩
2. ‘Compact’ in the parlance of mathematical analysis means ‘closed and bounded’.↩
3. The equality ln det M = Tr ln M is most easily proven by bringing the matrix to diagonal form via a similarity
transformation, and proving the equality for diagonal matrices.↩
4. Actually, the microscopic laws of physics are not time-reversal invariant, but rather are invariant under the product P C T ,
where P is parity, C is charge conjugation, and T is time reversal.↩
5. The natural numbers ℕ are the set of non-negative integers {0, 1, 2, …}.↩
6. In the nonrelativistic limit, $T = p^2/2m$. For relativistic particles, we have $T = (p^2 c^2 + m^2 c^4)^{1/2} - mc^2$.↩

7. Actually, what the recurrence theorem guarantees is that there is a configuration arbitrarily close to the initial one which recurs,
to within the same degree of closeness.↩
8. Unfortunately, many important physicists were German and we have to put up with a legacy of long German words like Gedankenexperiment, Zitterbewegung, Bremsstrahlung, Stosszahlansatz, Kartoffelsalat,↩
9. The cat map gets its name from its initial application, by Arnold, to the image of a cat’s face.↩
10. There is something beyond mixing, called a K -system. A K -system has positive Kolmogorov-Sinai entropy. For such a system,
closed orbits separate exponentially in time, and consequently the Liouvillian L has a Lebesgue spectrum with denumerably
infinite multiplicity.↩
11. More generally, we could project onto a K -dimensional subspace, in which case there would be K eigenvalues of +1 and
N − K eigenvalues of 0 , where N is the dimension of the entire vector space.↩

12. Recall that in systems with no disorder, eigenstates exhibit Bloch periodicity in space.↩
13. Since the probability $P_i(t)$ is real, if the eigenvalue with the smallest (largest negative) real part is complex, there will be a corresponding complex conjugate eigenvalue, and summing over all eigenvectors will result in a real value for $P_i(t)$.↩

This page titled 3.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

4: Statistical Ensembles
4.1: Microcanonical Ensemble (μCE)
4.2: The Quantum Mechanical Trace
4.3: Thermal Equilibrium
4.4: Ordinary Canonical Ensemble (OCE)
4.5: Grand Canonical Ensemble (GCE)
4.6: Statistical Ensembles from Maximum Entropy
4.7: Ideal Gas Statistical Mechanics
4.8: Selected Examples
4.9: Statistical Mechanics of Molecular Gases
4.10: Appendix I- Additional Examples
4.S: Summary

Thumbnail: A model of noninteracting spin dimers on a lattice. Each red dot represents a classical spin for which $\sigma_j = \pm 1$.

This page titled 4: Statistical Ensembles is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

1
4.1: Microcanonical Ensemble (μCE)
The microcanonical distribution function
We have seen how in an ergodic dynamical system, time averages can be replaced by phase space averages:

$$\text{ergodicity} \iff \langle f(\varphi) \rangle_t = \langle f(\varphi) \rangle_S\ , \tag{4.1.1}$$

where

$$\langle f(\varphi) \rangle_t = \lim_{T\to\infty} \frac{1}{T} \int\limits_0^T\!\! dt\ f(\varphi(t))\ . \tag{4.1.2}$$

and

$$\langle f(\varphi) \rangle_S = \int\! d\mu\ f(\varphi)\, \delta(E - \hat H(\varphi)) \Big/ \! \int\! d\mu\ \delta(E - \hat H(\varphi))\ . \tag{4.1.3}$$

Here $\hat H(\varphi) = H(q,p)$ is the Hamiltonian, and where $\delta(x)$ is the Dirac $\delta$-function¹. Thus, averages are taken over a constant energy hypersurface which is a subset of the entire phase space.


We've also seen how any phase space distribution $\varrho(\Lambda_1, \ldots, \Lambda_k)$ which is a function of conserved quantities $\Lambda_a(\varphi)$ is automatically a stationary (time-independent) solution to Liouville's equation. Note that the microcanonical distribution,

$$\varrho_E(\varphi) = \delta(E - \hat H(\varphi)) \Big/ \! \int\! d\mu\ \delta(E - \hat H(\varphi))\ , \tag{4.1.4}$$

is of this form, since $\hat H(\varphi)$ is conserved by the dynamics. Linear and angular momentum conservation generally are broken by elastic scattering off the walls of the sample.


So averages in the microcanonical ensemble are computed by evaluating the ratio
$$\langle A \rangle = \frac{\mathrm{Tr}\ A\, \delta(E - \hat H)}{\mathrm{Tr}\ \delta(E - \hat H)}\ , \tag{4.1.5}$$

where $\mathrm{Tr}$ means 'trace', which entails an integration over all phase space:

$$\mathrm{Tr}\, A(q,p) \equiv \frac{1}{N!} \prod_{i=1}^N \int\! \frac{d^d\!p_i\ d^d\!q_i}{(2\pi\hbar)^d}\ A(q,p)\ . \tag{4.1.6}$$

Here N is the total number of particles and d is the dimension of physical space in which each particle moves. The factor of 1/N !,
which cancels in the ratio between numerator and denominator, is present for indistinguishable particles2. The normalization factor
$(2\pi\hbar)^{-Nd}$ renders the trace dimensionless. Again, this cancels between numerator and denominator. These factors may then seem arbitrary in the definition of the trace, but we'll see how they in fact are required from quantum mechanical considerations. So we now adopt the following metric for classical phase space integration:

$$d\mu = \frac{1}{N!} \prod_{i=1}^N \frac{d^d\!p_i\ d^d\!q_i}{(2\pi\hbar)^d}\ . \tag{4.1.7}$$

Density of States
The denominator,

$$D(E) = \mathrm{Tr}\, \delta(E - \hat H)\ , \tag{4.1.8}$$

is called the density of states. It has dimensions of inverse energy, such that

$$\begin{aligned}
D(E)\, \Delta E &= \int\limits_E^{E + \Delta E}\!\! dE' \int\! d\mu\ \delta(E' - \hat H) = \!\!\!\int\limits_{E < \hat H < E + \Delta E}\!\!\! d\mu \\
&= \#\ \text{of states with energies between}\ E\ \text{and}\ E + \Delta E\ .
\end{aligned} \tag{4.1.9}$$

Let us now compute $D(E)$ for the nonrelativistic ideal gas. The Hamiltonian is

$$\hat H(q,p) = \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m}\ . \tag{4.1.10}$$

We assume that the gas is enclosed in a region of volume $V$, and we'll do a purely classical calculation, neglecting discreteness of its quantum spectrum. We must compute

$$D(E) = \frac{1}{N!} \int\! \prod_{i=1}^N \frac{d^d\!p_i\ d^d\!q_i}{(2\pi\hbar)^d}\ \delta\Big(E - \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m}\Big)\ . \tag{4.1.11}$$

We shall calculate $D(E)$ in two ways. The first method utilizes the Laplace transform, $Z(\beta)$:

$$Z(\beta) = \mathcal L[D(E)] \equiv \int\limits_0^\infty\!\! dE\ e^{-\beta E}\, D(E) = \mathrm{Tr}\, e^{-\beta \hat H}\ . \tag{4.1.12}$$

The inverse Laplace transform is then

$$D(E) = \mathcal L^{-1}[Z(\beta)] \equiv \int\limits_{c - i\infty}^{c + i\infty}\!\! \frac{d\beta}{2\pi i}\ e^{\beta E}\, Z(\beta)\ , \tag{4.1.13}$$

where c is such that the integration contour is to the right of any singularities of Z(β) in the complex β-plane. We then have
$$\begin{aligned}
Z(\beta) &= \frac{1}{N!} \prod_{i=1}^N \int\! \frac{d^d\!x_i\ d^d\!p_i}{(2\pi\hbar)^d}\ e^{-\beta \mathbf{p}_i^2/2m} \\
&= \frac{V^N}{N!} \left(\, \int\limits_{-\infty}^{\infty}\!\! \frac{dp}{2\pi\hbar}\ e^{-\beta p^2/2m} \right)^{\!Nd} \\
&= \frac{V^N}{N!} \left(\frac{m}{2\pi\hbar^2}\right)^{\!Nd/2} \beta^{-Nd/2}\ .
\end{aligned}$$

The inverse Laplace transform is then

$$\begin{aligned}
D(E) &= \frac{V^N}{N!} \left(\frac{m}{2\pi\hbar^2}\right)^{\!Nd/2} \oint\limits_{\mathcal C} \frac{d\beta}{2\pi i}\ e^{\beta E}\, \beta^{-Nd/2} \\
&= \frac{V^N}{N!} \left(\frac{m}{2\pi\hbar^2}\right)^{\!Nd/2} \frac{E^{\frac{1}{2}Nd - 1}}{\Gamma(Nd/2)}\ ,
\end{aligned}$$

exactly as before. The integration contour for the inverse Laplace transform is extended in an infinite semicircle in the left half $\beta$-plane. When $Nd$ is even, the function $\beta^{-Nd/2}$ has a pole of order $Nd/2$ at the origin. When $Nd$ is odd, there is a branch cut extending along the negative $\mathrm{Re}\,\beta$ axis, and the integration contour must avoid the cut, as shown in Figure 4.1.1. One can check that this results in the same expression as above; we may analytically continue from even values of $Nd$ to all positive values of $Nd$.

For a general system, the Laplace transform, Z(β) = L[D(E)] also is called the partition function. We shall again meet up with
Z(β) when we discuss the ordinary canonical ensemble.

Figure 4.1.1: Complex integration contours $\mathcal C$ for the inverse Laplace transform $\mathcal L^{-1}[Z(\beta)] = D(E)$. When the product $Nd$ is odd, there is a branch cut along the negative $\mathrm{Re}\,\beta$ axis.
Our final result, then, is

$$D(E,V,N) = \frac{V^N}{N!} \left(\frac{m}{2\pi\hbar^2}\right)^{\!Nd/2} \frac{E^{\frac{1}{2}Nd - 1}}{\Gamma(Nd/2)}\ . \tag{4.1.14}$$

Here we have emphasized that the density of states is a function of E , V , and N . Using Stirling’s approximation,
$$\ln N! = N \ln N - N + \tfrac{1}{2} \ln N + \tfrac{1}{2} \ln(2\pi) + \mathcal{O}(N^{-1})\ , \tag{4.1.15}$$

we may define the statistical entropy,


$$S(E,V,N) \equiv k_{\rm B} \ln D(E,V,N) = N k_{\rm B}\, \phi\Big(\frac{E}{N}, \frac{V}{N}\Big) + \mathcal{O}(\ln N)\ , \tag{4.1.16}$$

where

$$\phi\Big(\frac{E}{N}, \frac{V}{N}\Big) = \frac{d}{2} \ln\Big(\frac{E}{N}\Big) + \ln\Big(\frac{V}{N}\Big) + \frac{d}{2} \ln\Big(\frac{m}{d\pi\hbar^2}\Big) + \big(1 + \tfrac{1}{2}d\big)\ . \tag{4.1.17}$$

Recall $k_{\rm B} = 1.3806503 \times 10^{-16}\ \mathrm{erg/K}$ is Boltzmann's constant.
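Equations 4.1.14, 4.1.16, and 4.1.17 can be checked against each other numerically: the exact $\ln D(E,V,N)$ (via log-Gamma functions) should approach $N\,\phi(E/N, V/N)$ as $N$ grows, with the relative $\mathcal{O}(\ln N)$ correction shrinking. A small sketch of our own, with $k_{\rm B} = m = \hbar = 1$ and $d = 3$:

```python
from math import lgamma, log, pi

def ln_D(N, eps, v, d=3):
    """Exact log of Eq. 4.1.14 with m = hbar = 1, E = N*eps, V = N*v."""
    E, V = N * eps, N * v
    return (N * log(V) - lgamma(N + 1)
            + (N * d / 2) * log(1.0 / (2 * pi))   # (m / 2 pi hbar^2)^(Nd/2)
            + (N * d / 2 - 1) * log(E)
            - lgamma(N * d / 2))

def phi(eps, v, d=3):
    """Entropy per particle, Eq. 4.1.17, with m = hbar = 1."""
    return (d / 2) * log(eps) + log(v) + (d / 2) * log(1.0 / (d * pi)) + (1 + d / 2)

eps, v = 1.7, 2.3
err_small = abs(ln_D(200, eps, v) / 200 - phi(eps, v))
err_big = abs(ln_D(20000, eps, v) / 20000 - phi(eps, v))
```

The residual per particle shrinks roughly like $\ln N / N$, consistent with the $\mathcal{O}(\ln N)$ term in Equation 4.1.16.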

Second method
The second method invokes a mathematical trick. First, let's rescale $p_i^\alpha \equiv \sqrt{2mE}\ u_i^\alpha$. We then have

$$D(E) = \frac{V^N}{N!} \left(\frac{\sqrt{2mE}}{h}\right)^{\!Nd} \frac{1}{E} \int\! d^M\!u\ \delta\big(u_1^2 + u_2^2 + \ldots + u_M^2 - 1\big)\ . \tag{4.1.18}$$

Here we have written $\mathbf{u} = (u_1, u_2, \ldots, u_M)$ with $M = Nd$ as an $M$-dimensional vector. We've also used the rule $\delta(Ex) = E^{-1}\,\delta(x)$ for $\delta$-functions. We can now write

$$d^M\!u = u^{M-1}\, du\ d\Omega_M\ , \tag{4.1.19}$$

where $d\Omega_M$ is the $M$-dimensional differential solid angle. We now have our answer:³

$$D(E) = \frac{V^N}{N!} \left(\frac{\sqrt{2m}}{h}\right)^{\!Nd} E^{\frac{1}{2}Nd - 1} \cdot \tfrac{1}{2}\,\Omega_{Nd}\ . \tag{4.1.20}$$
What remains is for us to compute $\Omega_M$, the total solid angle in $M$ dimensions. We do this by a nifty mathematical trick. Consider the integral

$$\begin{aligned}
I_M = \int\! d^M\!u\ e^{-u^2} &= \Omega_M \int\limits_0^\infty\!\! du\ u^{M-1}\, e^{-u^2} \\
&= \tfrac{1}{2}\,\Omega_M \int\limits_0^\infty\!\! ds\ s^{\frac{1}{2}M - 1}\, e^{-s} = \tfrac{1}{2}\,\Omega_M\ \Gamma\big(\tfrac{1}{2}M\big)\ ,
\end{aligned}$$

where $s = u^2$, and where

$$\Gamma(z) = \int\limits_0^\infty\!\! dt\ t^{z-1}\, e^{-t} \tag{4.1.21}$$

is the Gamma function, which satisfies $z\,\Gamma(z) = \Gamma(z+1)$.⁴ On the other hand, we can compute $I_M$ in Cartesian coordinates, writing
$$I_M = \left(\, \int\limits_{-\infty}^{\infty}\!\! du_1\ e^{-u_1^2} \right)^{\!M} = \big(\sqrt{\pi}\,\big)^M\ . \tag{4.1.22}$$

Therefore

$$\Omega_M = \frac{2\pi^{M/2}}{\Gamma(M/2)}\ . \tag{4.1.23}$$

We thereby obtain $\Omega_2 = 2\pi$, $\Omega_3 = 4\pi$, $\Omega_4 = 2\pi^2$, $\ldots$, the first two of which are familiar.
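Equation 4.1.23 is easy to verify, both against the familiar values and by Monte Carlo: the unit ball has volume $V_M = \Omega_M/M$, so it fills a fraction $(\Omega_M/M)/2^M$ of the cube $[-1,1]^M$. A sketch of our own (function names are ours):

```python
import random
from math import pi, gamma

def total_solid_angle(M):
    # Omega_M = 2 pi^(M/2) / Gamma(M/2), Eq. 4.1.23
    return 2 * pi ** (M / 2) / gamma(M / 2)

# Monte Carlo check for M = 3: sample points uniformly in the cube [-1, 1]^3
random.seed(0)
M, samples = 3, 200_000
inside = sum(
    1 for _ in range(samples)
    if sum(random.uniform(-1, 1) ** 2 for _ in range(M)) <= 1.0
)
# ball volume = 2^M * (inside fraction); Omega_M = M * V_M
omega3_mc = M * (2 ** M) * inside / samples
```

The Monte Carlo estimate converges to $\Omega_3 = 4\pi \approx 12.57$ as the sample count grows.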

Arbitrariness in the definition of S(E)


Note that $D(E)$ has dimensions of inverse energy, so one might ask how we are to take the logarithm of a dimensionful quantity in Equation 4.1.16. We must introduce an energy scale, such as $\Delta E$ in Equation 4.1.9, and define $\tilde D(E;\, \Delta E) = D(E)\,\Delta E$ and $\tilde S(E;\, \Delta E) \equiv k_{\rm B} \ln \tilde D(E;\, \Delta E)$. The definition of statistical entropy then involves the arbitrary parameter $\Delta E$; however, this only affects $S(E)$ in an additive way. That is,

$$S(E, V, N;\, \Delta E_1) = S(E, V, N;\, \Delta E_2) + k_{\rm B} \ln\Big(\frac{\Delta E_1}{\Delta E_2}\Big)\ . \tag{4.1.24}$$

Note that the difference between the two definitions of $S$ depends only on the ratio $\Delta E_1 / \Delta E_2$, and is independent of $E$, $V$, and $N$.

Ultra-relativistic ideal gas


Consider an ultrarelativistic ideal gas, with single particle dispersion $\varepsilon(p) = cp$. We then have

$$\begin{aligned}
Z(\beta) &= \frac{V^N}{N!}\, \frac{\Omega_d^N}{h^{Nd}} \left(\, \int\limits_0^\infty\!\! dp\ p^{d-1}\, e^{-\beta c p} \right)^{\!N} \\
&= \frac{V^N}{N!} \left( \frac{\Gamma(d)\, \Omega_d}{c^d h^d \beta^d} \right)^{\!N}\ .
\end{aligned}$$

The statistical entropy is $S(E,V,N) = k_{\rm B} \ln D(E,V,N) = N k_{\rm B}\, \phi\big(\frac{E}{N}, \frac{V}{N}\big)$, with

$$\phi\Big(\frac{E}{N}, \frac{V}{N}\Big) = d \ln\Big(\frac{E}{N}\Big) + \ln\Big(\frac{V}{N}\Big) + \ln\!\left(\frac{\Omega_d\, \Gamma(d)}{(dhc)^d}\right) + (d+1) \tag{4.1.25}$$

Discrete systems
For classical systems where the energy levels are discrete, the states of the system $|\,\sigma\,\rangle$ are labeled by a set of discrete quantities $\{\sigma_1, \sigma_2, \ldots\}$, where each variable $\sigma_i$ takes discrete values. The number of ways of configuring the system at fixed energy $E$ is then

$$\Omega(E,N) = \sum_\sigma \delta_{\hat H(\sigma),\,E}\ , \tag{4.1.26}$$

where the sum is over all possible configurations. Here $N$ labels the total number of particles. For example, if we have $N$ spin-$\frac{1}{2}$ particles on a lattice which are placed in a magnetic field $H$, so the individual particle energy is $\varepsilon_i = -\mu_0 H \sigma_i$, where $\sigma_i = \pm 1$, then in a configuration in which $N_\uparrow$ particles have $\sigma_i = +1$ and $N_\downarrow = N - N_\uparrow$ particles have $\sigma_i = -1$, the energy is $E = (N_\downarrow - N_\uparrow)\, \mu_0 H$. The number of configurations at fixed energy $E$ is

$$\Omega(E,N) = \binom{N}{N_\uparrow} = \frac{N!}{\big(\frac{N}{2} - \frac{E}{2\mu_0 H}\big)!\ \big(\frac{N}{2} + \frac{E}{2\mu_0 H}\big)!}\ , \tag{4.1.27}$$

since $N_{\uparrow/\downarrow} = \frac{N}{2} \mp \frac{E}{2\mu_0 H}$. The statistical entropy is $S(E,N) = k_{\rm B} \ln \Omega(E,N)$.
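For small $N$, the binomial count of Equation 4.1.27 can be checked by brute-force enumeration of all $2^N$ spin configurations. A sketch of our own, with $\mu_0 H = 1$:

```python
from itertools import product
from math import factorial

N, mu0H = 8, 1

# enumerate all 2^N configurations and histogram their energies
counts = {}
for cfg in product([+1, -1], repeat=N):
    N_up = cfg.count(+1)
    E = (N - 2 * N_up) * mu0H           # E = (N_down - N_up) * mu0 * H
    counts[E] = counts.get(E, 0) + 1

def Omega(E):
    # Eq. 4.1.27: N_up = N/2 - E / (2 mu0 H)
    N_up = N // 2 - E // (2 * mu0H)
    return factorial(N) // (factorial(N_up) * factorial(N - N_up))
```

The enumerated degeneracies match the binomial formula for every allowed energy.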

This page titled 4.1: Microcanonical Ensemble (μCE) is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

4.2: The Quantum Mechanical Trace
Thus far our understanding of ergodicity is rooted in the dynamics of classical mechanics. A Hamiltonian flow which is ergodic is
one in which time averages can be replaced by phase space averages using the microcanonical ensemble. What happens, though, if
our system is quantum mechanical, as all systems ultimately are?

The Density Matrix


First, let us consider that our system $S$ will in general be in contact with a world $W$. We call the union of $S$ and $W$ the universe, $U = W \cup S$. Let $|\,N\,\rangle$ denote a quantum mechanical state of $W$, and let $|\,n\,\rangle$ denote a quantum mechanical state of $S$. Then the most general wavefunction we can write is of the form

$$|\,\Psi\,\rangle = \sum_{N,n} \Psi_{N,n}\ |\,N\,\rangle \otimes |\,n\,\rangle\ . \tag{4.2.1}$$

Now let us compute the expectation value of some operator $\hat{\mathcal A}$ which acts as the identity within $W$, meaning $\langle\, N\, |\, \hat{\mathcal A}\, |\, N'\,\rangle = \hat A\, \delta_{NN'}$, where $\hat A$ is the 'reduced' operator which acts within $S$ alone. We then have

$$\langle\, \Psi\, |\, \hat{\mathcal A}\, |\, \Psi\, \rangle = \sum_{N,N'} \sum_{n,n'} \Psi^*_{N,n}\, \Psi_{N',n'}\ \delta_{NN'}\ \langle\, n\, |\, \hat A\, |\, n'\,\rangle = \mathrm{Tr}\,\big(\hat\varrho\, \hat A\big)\ ,$$

where

$$\hat\varrho = \sum_N \sum_{n,n'} \Psi^*_{N,n}\, \Psi_{N,n'}\ |\,n'\,\rangle\langle\, n\,| \tag{4.2.2}$$

is the density matrix. The time-dependence of $\hat\varrho$ is easily found:

$$\hat\varrho(t) = \sum_N \sum_{n,n'} \Psi^*_{N,n}\, \Psi_{N,n'}\ |\,n'(t)\,\rangle\langle\, n(t)\,| = e^{-i\hat H t/\hbar}\ \hat\varrho\ e^{+i\hat H t/\hbar}\ ,$$

where $\hat H$ is the Hamiltonian for the system $S$. Thus, we find

$$i\hbar\, \frac{\partial\hat\varrho}{\partial t} = \big[\hat H, \hat\varrho\,\big]\ . \tag{4.2.3}$$

Note that the density matrix evolves according to a slightly different equation than an operator in the Heisenberg picture, for which

$$\hat A(t) = e^{+i\hat H t/\hbar}\, \hat A\, e^{-i\hat H t/\hbar} \implies i\hbar\, \frac{\partial \hat A}{\partial t} = \big[\hat A, \hat H\big] = -\big[\hat H, \hat A\big]\ . \tag{4.2.4}$$

Figure 4.2.1: A system $S$ in contact with a 'world' $W$. The union of the two, $U = W \cup S$, is the 'universe'.
For Hamiltonian systems, we found that the phase space distribution $\varrho(q,p,t)$ evolved according to the Liouville equation,

$$i\, \frac{\partial\varrho}{\partial t} = \hat L\, \varrho\ , \tag{4.2.5}$$

where the Liouvillian $\hat L$ is the differential operator

$$\hat L = -i \sum_{j=1}^{Nd} \left\{ \frac{\partial\hat H}{\partial p_j}\, \frac{\partial}{\partial q_j} - \frac{\partial\hat H}{\partial q_j}\, \frac{\partial}{\partial p_j} \right\}\ . \tag{4.2.6}$$

Accordingly, any distribution $\varrho(\Lambda_1, \ldots, \Lambda_k)$ which is a function of constants of the motion $\Lambda_a(q,p)$ is a stationary solution to the Liouville equation: $\partial_t\, \varrho(\Lambda_1, \ldots, \Lambda_k) = 0$. Similarly, any quantum mechanical density matrix which commutes with the Hamiltonian is a stationary solution to Equation 4.2.3. The corresponding microcanonical distribution is

$$\hat\varrho_E = \delta(E - \hat H)\ . \tag{4.2.7}$$

Averaging the DOS


If our quantum mechanical system is placed in a finite volume, the energy levels will be discrete, rather than continuous, and the density of states (DOS) will be of the form

$$D(E) = \mathrm{Tr}\, \delta(E - \hat H) = \sum_l \delta(E - E_l)\ , \tag{4.2.8}$$

where $\{E_l\}$ are the eigenvalues of the Hamiltonian $\hat H$. In the thermodynamic limit, $V \to \infty$, and the discrete spectrum of kinetic energies remains discrete for all finite $V$ but must approach the continuum result. To recover the continuum result, we average the DOS over a window of width $\Delta E$:

$$\overline{D(E)} = \frac{1}{\Delta E} \int\limits_E^{E + \Delta E}\!\! dE'\, D(E')\ . \tag{4.2.9}$$

If we take the limit $\Delta E \to 0$ but with $\Delta E \gg \delta E$, where $\delta E$ is the spacing between successive quantized levels, we recover a smooth function, as shown in Figure 4.2.2. We will in general drop the bar and refer to this function as $D(E)$. Note that $\delta E \sim 1/D(E) = e^{-N\phi(\varepsilon, v)}$ is (typically) exponentially small in the size of the system, hence if we took $\Delta E \propto V^{-1}$, which vanishes in the thermodynamic limit, there are still exponentially many energy levels within an interval of width $\Delta E$.

Figure 4.2.2 : Averaging the quantum mechanical discrete density of states yields a continuous curve.

Coherent States
The quantum-classical correspondence is elucidated with the use of coherent states. Recall that the one-dimensional harmonic oscillator Hamiltonian may be written

$$\hat H_0 = \frac{p^2}{2m} + \tfrac{1}{2}\, m\omega_0^2\, q^2 = \hbar\omega_0\, \big(a^\dagger a + \tfrac{1}{2}\big)\ ,$$

where $a$ and $a^\dagger$ are ladder operators satisfying $[a, a^\dagger] = 1$, which can be taken to be

$$a = \ell\, \frac{\partial}{\partial q} + \frac{q}{2\ell} \qquad,\qquad a^\dagger = -\ell\, \frac{\partial}{\partial q} + \frac{q}{2\ell}\ , \tag{4.2.10}$$

with $\ell = \sqrt{\hbar / 2m\omega_0}$. Note that

$$q = \ell\, (a + a^\dagger) \qquad,\qquad p = \frac{\hbar}{2i\ell}\, (a - a^\dagger)\ . \tag{4.2.11}$$

The ground state satisfies $a\, \psi_0(q) = 0$, which yields

$$\psi_0(q) = (2\pi\ell^2)^{-1/4}\, e^{-q^2/4\ell^2}\ . \tag{4.2.12}$$

The normalized coherent state $|\,z\,\rangle$ is defined as

$$|\,z\,\rangle = e^{-\frac{1}{2}|z|^2}\, e^{z a^\dagger}\, |\,0\,\rangle = e^{-\frac{1}{2}|z|^2} \sum_{n=0}^\infty \frac{z^n}{\sqrt{n!}}\ |\,n\,\rangle\ . \tag{4.2.13}$$

The overlap of coherent states is given by

$$\langle\, z_1\, |\, z_2\, \rangle = e^{-\frac{1}{2}|z_1|^2}\, e^{-\frac{1}{2}|z_2|^2}\, e^{\bar z_1 z_2}\ , \tag{4.2.14}$$

hence different coherent states are not orthogonal. Despite this nonorthogonality, the coherent states allow a simple resolution of the identity,

$$1 = \int\! \frac{d^2\!z}{2\pi i}\ |\,z\,\rangle\langle\, z\,| \qquad;\qquad \frac{d^2\!z}{2\pi i} \equiv \frac{d\,\mathrm{Re}\,z\ \, d\,\mathrm{Im}\,z}{\pi} \tag{4.2.15}$$

which is straightforward to establish.


To gain some physical intuition about the coherent states, define

$$z \equiv \frac{Q}{2\ell} + \frac{i\ell P}{\hbar} \tag{4.2.16}$$

and write $|\,z\,\rangle \equiv |\,Q, P\,\rangle$. One finds (exercise!)

$$\psi_{Q,P}(q) = \langle\, q\, |\, z\, \rangle = (2\pi\ell^2)^{-1/4}\, e^{-iPQ/2\hbar}\, e^{iPq/\hbar}\, e^{-(q-Q)^2/4\ell^2}\ , \tag{4.2.17}$$

hence the coherent state $\psi_{Q,P}(q)$ is a wavepacket Gaussianly localized about $q = Q$, but oscillating with average momentum $P$.

For example, we can compute

$$\begin{aligned}
\langle\, Q,P\, |\, q\, |\, Q,P\, \rangle &= \langle\, z\, |\, \ell\,(a + a^\dagger)\, |\, z\, \rangle = 2\ell\, \mathrm{Re}\, z = Q \\
\langle\, Q,P\, |\, p\, |\, Q,P\, \rangle &= \langle\, z\, |\, \frac{\hbar}{2i\ell}\,(a - a^\dagger)\, |\, z\, \rangle = \frac{\hbar}{\ell}\, \mathrm{Im}\, z = P
\end{aligned}$$

as well as

$$\begin{aligned}
\langle\, Q,P\, |\, q^2\, |\, Q,P\, \rangle &= \langle\, z\, |\, \ell^2\,(a + a^\dagger)^2\, |\, z\, \rangle = Q^2 + \ell^2 \\
\langle\, Q,P\, |\, p^2\, |\, Q,P\, \rangle &= -\langle\, z\, |\, \frac{\hbar^2}{4\ell^2}\,(a - a^\dagger)^2\, |\, z\, \rangle = P^2 + \frac{\hbar^2}{4\ell^2}\ .
\end{aligned}$$

Thus, the root mean square fluctuations in the coherent state $|\,Q,P\,\rangle$ are

$$\Delta q = \ell = \sqrt{\frac{\hbar}{2m\omega_0}} \qquad,\qquad \Delta p = \frac{\hbar}{2\ell} = \sqrt{\frac{m\hbar\omega_0}{2}}\ , \tag{4.2.18}$$

and $\Delta q \cdot \Delta p = \tfrac{1}{2}\hbar$. Thus we learn that the coherent state $\psi_{Q,P}(q)$ is localized in phase space, in both position and momentum. If we have a general operator $\hat A(q,p)$, we can then write

$$\langle\, Q,P\, |\, \hat A(q,p)\, |\, Q,P\, \rangle = A(Q,P) + \mathcal{O}(\hbar)\ , \tag{4.2.19}$$

where $A(Q,P)$ is formed from $\hat A(q,p)$ by replacing $q \to Q$ and $p \to P$.
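The coherent state expectation values above are easy to confirm numerically by expanding $|\,z\,\rangle$ in a truncated Fock basis. A sketch of our own (with $\hbar = \ell = 1$; helper names are ours):

```python
from math import sqrt

hbar, ell = 1.0, 1.0
z = 0.8 + 0.5j
nmax = 80

# coherent state coefficients c_n ∝ z^n / sqrt(n!), normalized numerically
c = [1.0 + 0j]
for n in range(1, nmax):
    c.append(c[-1] * z / sqrt(n))
norm = sqrt(sum(abs(a) ** 2 for a in c))
c = [a / norm for a in c]

def lower(psi):      # (a psi)_n = sqrt(n+1) psi_{n+1}
    return [sqrt(n + 1) * psi[n + 1] for n in range(len(psi) - 1)] + [0j]

def raise_(psi):     # (a† psi)_n = sqrt(n) psi_{n-1}
    return [0j] + [sqrt(n) * psi[n - 1] for n in range(1, len(psi))]

def apply_q(psi):    # q = ell (a + a†)
    return [ell * (u + v) for u, v in zip(lower(psi), raise_(psi))]

def apply_p(psi):    # p = (hbar / 2 i ell)(a - a†)
    return [(hbar / (2j * ell)) * (u - v) for u, v in zip(lower(psi), raise_(psi))]

def braket(phi, psi):
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

q_avg = braket(c, apply_q(c)).real             # expect Q = 2 ell Re z
p_avg = braket(c, apply_p(c)).real             # expect P = (hbar / ell) Im z
q2_avg = braket(apply_q(c), apply_q(c)).real   # q Hermitian: expect Q^2 + ell^2
```

The truncation error at $n_{\max} = 80$ is negligible for $|z| \sim 1$, so the numbers reproduce $Q$, $P$, and $Q^2 + \ell^2$ to high precision.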

Since

$$\frac{d^2\!z}{2\pi i} \equiv \frac{d\,\mathrm{Re}\,z\ \, d\,\mathrm{Im}\,z}{\pi} = \frac{dQ\, dP}{2\pi\hbar}\ , \tag{4.2.20}$$

we can write the trace using coherent states as

$$\mathrm{Tr}\, \hat A = \frac{1}{2\pi\hbar} \int\limits_{-\infty}^{\infty}\!\! dQ \int\limits_{-\infty}^{\infty}\!\! dP\ \langle\, Q,P\, |\, \hat A\, |\, Q,P\, \rangle\ . \tag{4.2.21}$$

We now can understand the origin of the factor $2\pi\hbar$ in the denominator of each $(q_i, p_i)$ integral over classical phase space in Equation 4.1.6.
Note that $\omega_0$ is arbitrary in our discussion. By increasing $\omega_0$, the states become more localized in $q$ and more plane wave like in $p$. However, so long as $\omega_0$ is finite, the width of the coherent state in each direction is proportional to $\hbar^{1/2}$, and thus vanishes in the classical limit.

This page titled 4.2: The Quantum Mechanical Trace is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

4.3: Thermal Equilibrium
Two Systems in Thermal Contact
Consider two systems in thermal contact, as depicted in Figure 4.3.1. The two subsystems #1 and #2 are free to exchange energy,
but their respective volumes and particle numbers remain fixed. We assume the contact is made over a surface, and that the energy
associated with that surface is negligible when compared with the bulk energies $E_1$ and $E_2$. Let the total energy be $E = E_1 + E_2$. Then the density of states $D(E)$ for the combined system is

$$D(E) = \int\! dE_1\ D_1(E_1)\, D_2(E - E_1)\ . \tag{4.3.1}$$

The probability density for system #1 to have energy $E_1$ is then

$$P_1(E_1) = \frac{D_1(E_1)\, D_2(E - E_1)}{D(E)}\ . \tag{4.3.2}$$

Note that $P_1(E_1)$ is normalized: $\int\! dE_1\, P_1(E_1) = 1$. We now ask: what is the most probable value of $E_1$? We find out by differentiating $P_1(E_1)$ with respect to $E_1$ and setting the result to zero. This requires

$$0 = \frac{1}{P_1(E_1)}\, \frac{dP_1(E_1)}{dE_1} = \frac{\partial}{\partial E_1} \ln P_1(E_1) = \frac{\partial}{\partial E_1} \ln D_1(E_1) + \frac{\partial}{\partial E_1} \ln D_2(E - E_1)\ .$$

We conclude that the maximally likely partition of energy between systems #1 and #2 is realized when

$$\frac{\partial S_1}{\partial E_1} = \frac{\partial S_2}{\partial E_2}\ . \tag{4.3.3}$$

This guarantees that

$$S(E, E_1) = S_1(E_1) + S_2(E - E_1) \tag{4.3.4}$$

is a maximum with respect to the energy $E_1$, at fixed total energy $E$.

Figure 4.3.1 : Two systems in thermal contact.


The temperature $T$ is defined as

$$\frac{1}{T} = \Big(\frac{\partial S}{\partial E}\Big)_{\!V,N}\ , \tag{4.3.5}$$

a result familiar from thermodynamics. The difference is that now we have a more rigorous definition of the entropy. When the total entropy $S$ is maximized, we have that $T_1 = T_2$. Once again, two systems that are in thermal contact and can exchange energy will in equilibrium have equal temperatures.


According to Equations \ref{phinrel} and \ref{phiurel}, the entropies of nonrelativistic and ultrarelativistic ideal gases in d space
dimensions are given by

4.3.1 https://phys.libretexts.org/@go/page/18564
1 E V
S NR = N d kB ln( ) + N kB ln( ) + const. (4.3.6)
2 N N

E V
S = N d kB ln( ) + N kB ln( ) + const. . (4.3.7)
UR
N N

Invoking Equation 4.3.5, we then have

$$E_{\rm NR} = \tfrac{1}{2} N d\, k_{\rm B} T \qquad,\qquad E_{\rm UR} = N d\, k_{\rm B} T\ . \tag{4.3.8}$$

We saw that the probability distribution $P_1(E_1)$ is maximized when $T_1 = T_2$, but how sharp is the peak in the distribution? Let us write $E_1 = E_1^* + \Delta E_1$, where $E_1^*$ is the solution to Equation 4.3.3. We then have

$$\ln P_1(E_1^* + \Delta E_1) = \ln P_1(E_1^*) + \frac{1}{2k_{\rm B}}\frac{\partial^2 S_1}{\partial E_1^2}\bigg|_{E_1^*}\!(\Delta E_1)^2 + \frac{1}{2k_{\rm B}}\frac{\partial^2 S_2}{\partial E_2^2}\bigg|_{E_2^*}\!(\Delta E_1)^2 + \ldots\ , \tag{4.3.9}$$

where $E_2^* = E - E_1^*$. We must now evaluate

$$\frac{\partial^2 S}{\partial E^2} = \frac{\partial}{\partial E}\Big(\frac{1}{T}\Big) = -\frac{1}{T^2}\Big(\frac{\partial T}{\partial E}\Big)_{\!V,N} = -\frac{1}{T^2 C_V}\ , \tag{4.3.10}$$

where $C_V = (\partial E / \partial T)_{V,N}$ is the heat capacity. Thus,

$$P_1 = P_1^*\, e^{-(\Delta E_1)^2 / 2 k_{\rm B} T^2 \bar C_V}\ , \tag{4.3.11}$$

where

$$\bar C_V = \frac{C_{V,1}\, C_{V,2}}{C_{V,1} + C_{V,2}}\ . \tag{4.3.12}$$

The distribution is therefore a Gaussian, and the fluctuations in $\Delta E_1$ can now be computed:

$$\big\langle (\Delta E_1)^2 \big\rangle = k_{\rm B} T^2\, \bar C_V \implies (\Delta E_1)_{\rm RMS} = k_{\rm B} T\, \sqrt{\bar C_V / k_{\rm B}}\ . \tag{4.3.13}$$

The individual heat capacities $C_{V,1}$ and $C_{V,2}$ scale with the volumes $V_1$ and $V_2$, respectively. If $V_2 \gg V_1$, then $C_{V,2} \gg C_{V,1}$, in which case $\bar C_V \approx C_{V,1}$. Therefore the RMS fluctuations in $\Delta E_1$ are proportional to the square root of the system size, whereas $E_1$ itself is extensive. Thus, the ratio $(\Delta E_1)_{\rm RMS}/E_1 \propto V^{-1/2}$ scales as the inverse square root of the volume. The distribution $P_1(E_1)$ is thus extremely sharp.
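The width predicted by Equation 4.3.13 can be verified numerically for two ideal gases in contact. With $k_{\rm B} = 1$ and $D_i(E) \propto E^{a_i - 1}$, where $a_i = N_i d / 2$, the exact variance of $E_1$ under $P_1(E_1) \propto E_1^{a_1-1}(E - E_1)^{a_2-1}$ should match $k_{\rm B} T^2 \bar C_V$ up to $\mathcal{O}(1/N)$ corrections. A sketch of our own:

```python
from math import exp, log

# two nonrelativistic ideal gases in thermal contact, d = 3, k_B = 1
a1, a2 = 750.0, 750.0              # a_i = N_i d / 2 with N_i = 500
E = 1.0                            # total (conserved) energy

# P_1(E_1) ∝ E_1^(a1-1) (E - E_1)^(a2-1); use logs to avoid underflow
M = 4000
xs = [E * i / M for i in range(1, M)]
lw = [(a1 - 1) * log(x) + (a2 - 1) * log(E - x) for x in xs]
shift = max(lw)
w = [exp(l - shift) for l in lw]

Z = sum(w)
mean = sum(wi * x for wi, x in zip(w, xs)) / Z
var = sum(wi * x * x for wi, x in zip(w, xs)) / Z - mean ** 2

# prediction: E = (a1 + a2) k_B T, C_V,i = a_i k_B, var = k_B T^2 C_V1 C_V2/(C_V1+C_V2)
kT = E / (a1 + a2)
var_pred = kT ** 2 * (a1 * a2) / (a1 + a2)
```

The relative width $\sqrt{\mathrm{var}}/E_1 \approx 1.3\%$ here, shrinking like $N^{-1/2}$ as the subsystems grow.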

Thermal, mechanical and chemical equilibrium


We have $dS\big|_{V,N} = \frac{1}{T}\, dE$, but in general $S = S(E,V,N)$. Equivalently, we may write $E = E(S,V,N)$. The full differential of $E(S,V,N)$ is then $dE = T\,dS - p\,dV + \mu\,dN$, with $T = \big(\frac{\partial E}{\partial S}\big)_{V,N}$, $p = -\big(\frac{\partial E}{\partial V}\big)_{S,N}$, and $\mu = \big(\frac{\partial E}{\partial N}\big)_{S,V}$. As we shall discuss in more detail, $p$ is the pressure and $\mu$ is the chemical potential. We may thus write the total differential $dS$ as

$$dS = \frac{1}{T}\, dE + \frac{p}{T}\, dV - \frac{\mu}{T}\, dN\ . \tag{4.3.14}$$

Employing the same reasoning as in the previous section, we conclude that entropy maximization for two systems in contact
requires the following:
If two systems can exchange energy, then $T_1 = T_2$. This is thermal equilibrium.
If two systems can exchange volume, then $p_1/T_1 = p_2/T_2$. This is mechanical equilibrium.
If two systems can exchange particle number, then $\mu_1/T_1 = \mu_2/T_2$. This is chemical equilibrium.

Gibbs-Duhem Relation
The energy $E(S,V,N)$ is an extensive function of extensive variables; it is homogeneous of degree one in its arguments. Therefore $E(\lambda S, \lambda V, \lambda N) = \lambda E$, and taking the derivative with respect to $\lambda$ and then setting $\lambda = 1$ yields

$$E = S\Big(\frac{\partial E}{\partial S}\Big)_{\!V,N} + V\Big(\frac{\partial E}{\partial V}\Big)_{\!S,N} + N\Big(\frac{\partial E}{\partial N}\Big)_{\!S,V} = TS - pV + \mu N\ .$$

Taking the differential of each side, using the Leibniz rule on the RHS, and plugging in dE = T dS − p dV + μ dN , we arrive at
the Gibbs-Duhem relation5,
S dT − V dp + N dμ = 0 . (4.3.15)

This, in turn, says that any one of the intensive quantities (T , p, μ) can be written as a function of the other two, in the case of a
single component system.

This page titled 4.3: Thermal Equilibrium is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

4.4: Ordinary Canonical Ensemble (OCE)
Canonical Distribution and Partition Function
Consider a system $S$ in contact with a world $W$, and let their union $U = W \cup S$ be called the 'universe'. The situation is depicted in Figure 4.2.1. The volume $V_S$ and particle number $N_S$ of the system are held fixed, but the energy is allowed to fluctuate by exchange with the world $W$. We are interested in the limit $N_S \to \infty$, $N_W \to \infty$, with $N_S \ll N_W$, with similar relations holding for the respective volumes and energies. We now ask what is the probability that $S$ is in a state $|\,n\,\rangle$ with energy $E_n$. This is given by the ratio

$$P_n = \lim_{\Delta E \to 0} \frac{D_W(E_U - E_n)\, \Delta E}{D_U(E_U)\, \Delta E} = \frac{\#\ \text{of states accessible to}\ W\ \text{given that}\ E_S = E_n}{\text{total}\ \#\ \text{of states in}\ U}\ . \tag{4.4.1}$$

Then

$$\begin{aligned}
\ln P_n &= \ln D_W(E_U - E_n) - \ln D_U(E_U) \\
&= \ln D_W(E_U) - \ln D_U(E_U) - E_n\, \frac{\partial \ln D_W(E)}{\partial E}\bigg|_{E = E_U} + \ldots \\
&\equiv -\alpha - \beta E_n\ .
\end{aligned}$$

The constant $\beta$ is given by

$$\beta = \frac{\partial \ln D_W(E)}{\partial E}\bigg|_{E = E_U} = \frac{1}{k_{\rm B} T}\ . \tag{4.4.6}$$

Thus, we find $P_n = e^{-\alpha}\, e^{-\beta E_n}$. The constant $\alpha$ is fixed by the requirement that $\sum_n P_n = 1$:

$$P_n = \frac{1}{Z}\, e^{-\beta E_n} \qquad,\qquad Z(T,V,N) = \sum_n e^{-\beta E_n} = \mathrm{Tr}\, e^{-\beta \hat H}\ . \tag{4.4.7}$$

We've already met $Z(\beta)$ in Equation 4.1.12 – it is the Laplace transform of the density of states. It is also called the partition function of the system $S$. Quantum mechanically, we can write the ordinary canonical density matrix as

$$\hat\varrho = \frac{e^{-\beta \hat H}}{\mathrm{Tr}\, e^{-\beta \hat H}}\ , \tag{4.4.8}$$

which is known as the Gibbs distribution. Note that $[\hat\varrho, \hat H] = 0$, hence the ordinary canonical distribution is a stationary solution to the evolution equation for the density matrix. Note that the OCE is specified by three parameters: $T$, $V$, and $N$.

The difference between $P(E_n)$ and $P_n$

Let the total energy of the Universe be fixed at $E_U$. The joint probability density $P(E_S, E_W)$ for the system to have energy $E_S$ and the world to have energy $E_W$ is

$$P(E_S, E_W) = D_S(E_S)\, D_W(E_W)\ \delta(E_U - E_S - E_W) \,\big/\, D_U(E_U)\ , \tag{4.4.9}$$

where

$$D_U(E_U) = \int\limits_{-\infty}^{\infty}\!\! dE_S\ D_S(E_S)\, D_W(E_U - E_S)\ , \tag{4.4.10}$$

which ensures that $\int\! dE_S \int\! dE_W\, P(E_S, E_W) = 1$. The probability density $P(E_S)$ is defined such that $P(E_S)\, dE_S$ is the (differential) probability for the system to have an energy in the range $[E_S, E_S + dE_S]$. The units of $P(E_S)$ are $E^{-1}$. To obtain $P(E_S)$, we simply integrate the joint probability density $P(E_S, E_W)$ over all possible values of $E_W$, obtaining

$$P(E_S) = \frac{D_S(E_S)\, D_W(E_U - E_S)}{D_U(E_U)}\ , \tag{4.4.11}$$

as we have in Equation 4.4.1.


Now suppose we wish to know the probability $P_n$ that the system is in a particular state $|\,n\,\rangle$ with energy $E_n$. Clearly

$$P_n = \lim_{\Delta E \to 0} \frac{\text{probability that}\ E_S \in [E_n, E_n + \Delta E]}{\#\ \text{of}\ S\ \text{states with}\ E_S \in [E_n, E_n + \Delta E]} = \frac{P(E_n)\, \Delta E}{D_S(E_n)\, \Delta E} = \frac{D_W(E_U - E_n)}{D_U(E_U)}\ . \tag{4.4.12}$$

Additional remarks
The formula of Equation 4.4.1 is quite general and holds in the case where $N_S/N_W = \mathcal{O}(1)$, so long as we are in the thermodynamic limit, where the energy associated with the interface between $S$ and $W$ may be neglected. In this case, however, one is not licensed to perform the subsequent Taylor expansion, and the distribution $P_n$ is no longer of the Gibbs form. It is also valid for quantum systems⁶, in which case we interpret $P_n = \langle n | \varrho_S | n \rangle$ as a diagonal element of the density matrix $\varrho_S$. The density of states functions may then be replaced by


E −E +ΔE
U n

S (E −E , ΔE) ^
D (E − En ) ΔE → e W U n
≡ Tra ∫ dE δ(E − H W )
W U
W

E −En
U

E +ΔE
U

S (E , ΔE) ^
D (E ) ΔE → e U U ≡ Tra ∫ dE δ(E − H ) .
U U U
U

E
U

The off-diagonal matrix elements of $\varrho_S$ are negligible in the thermodynamic limit.

Averages within the OCE


To compute averages within the OCE,

$$\langle \hat A \rangle = \mathrm{Tr}\,(\hat\varrho\, \hat A) = \frac{\sum_n \langle n | \hat A | n \rangle\, e^{-\beta E_n}}{\sum_n e^{-\beta E_n}}\ , \tag{4.4.13}$$

where we have conveniently taken the trace in a basis of energy eigenstates. In the classical limit, we have

$$\varrho(\varphi) = \frac{1}{Z}\, e^{-\beta \hat H(\varphi)} \qquad,\qquad Z = \mathrm{Tr}\, e^{-\beta \hat H} = \int\! d\mu\ e^{-\beta \hat H(\varphi)}\ , \tag{4.4.14}$$

with $d\mu = \frac{1}{N!} \prod_{j=1}^N \big(d^d\!q_j\, d^d\!p_j / h^d\big)$ for identical particles ('Maxwell-Boltzmann statistics'). Thus,

$$\langle A \rangle = \mathrm{Tr}\,(\varrho A) = \frac{\int\! d\mu\ A(\varphi)\, e^{-\beta \hat H(\varphi)}}{\int\! d\mu\ e^{-\beta \hat H(\varphi)}}\ . \tag{4.4.15}$$
−βH (φ)
∫dμ e

Entropy and Free Energy


The Boltzmann entropy is defined by

$$S = -k_{\rm B}\, \mathrm{Tr}\,(\hat\varrho\, \ln \hat\varrho) = -k_{\rm B} \sum_n P_n \ln P_n\ . \tag{4.4.16}$$

The Boltzmann entropy and the statistical entropy S = kB ln D(E) are identical in the thermodynamic limit. We define the
Helmholtz free energy F (T , V , N ) as

$$F(T,V,N) = -k_{\rm B} T\, \ln Z(T,V,N)\ , \tag{4.4.17}$$

hence
$$P_n = e^{\beta F}\, e^{-\beta E_n} \qquad,\qquad \ln P_n = \beta F - \beta E_n\ . \tag{4.4.18}$$

Therefore the entropy is

$$S = -k_{\rm B} \sum_n P_n\, (\beta F - \beta E_n) = -\frac{F}{T} + \frac{\langle \hat H \rangle}{T}\ , \tag{4.4.19}$$

which is to say $F = E - TS$, where

$$E = \sum_n P_n E_n = \frac{\mathrm{Tr}\, \hat H\, e^{-\beta \hat H}}{\mathrm{Tr}\, e^{-\beta \hat H}} \tag{4.4.20}$$

is the average energy. We also see that


$$Z = \mathrm{Tr}\, e^{-\beta \hat H} = \sum_n e^{-\beta E_n} \implies E = \frac{\sum_n E_n\, e^{-\beta E_n}}{\sum_n e^{-\beta E_n}} = -\frac{\partial}{\partial \beta} \ln Z = \frac{\partial}{\partial \beta}\, (\beta F)\ . \tag{4.4.21}$$

Thus, $F(T,V,N)$ is a Legendre transform of $E(S,V,N)$, with

$$dF = -S\, dT - p\, dV + \mu\, dN\ , \tag{4.4.22}$$

which means

$$S = -\Big(\frac{\partial F}{\partial T}\Big)_{\!V,N} \qquad,\qquad p = -\Big(\frac{\partial F}{\partial V}\Big)_{\!T,N} \qquad,\qquad \mu = +\Big(\frac{\partial F}{\partial N}\Big)_{\!T,V}\ . \tag{4.4.23}$$
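The chain $Z \to F \to S$ and the identity $F = E - TS$ can be checked on any small spectrum. A minimal sketch with an invented three-level spectrum (our own example, $k_{\rm B} = 1$):

```python
from math import exp, log

kB, T = 1.0, 1.0
beta = 1.0 / (kB * T)
levels = [0.0, 1.0, 2.0]                 # hypothetical spectrum E_n

Z = sum(exp(-beta * En) for En in levels)        # partition function
P = [exp(-beta * En) / Z for En in levels]       # Gibbs probabilities

E_avg = sum(p * En for p, En in zip(P, levels))  # <H>
S = -kB * sum(p * log(p) for p in P)             # Boltzmann entropy
F = -kB * T * log(Z)                             # Helmholtz free energy
```

As expected from the relations above, $F = \langle \hat H \rangle - TS$ holds to machine precision.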

Fluctuations in the OCE


In the OCE, the energy is not fixed. It therefore fluctuates about its average value E = ⟨H
^
⟩ . Note that

2
∂E 2
∂E ∂ ln Z
− = kB T =
∂β ∂T ∂β 2

2 2
^ ^
^ −βH ^ −βH
Tr H e Tr H e
= −( )
^ ^
−βH −βH
Tr e Tr e

2
^ ^ 2
= ⟨H ⟩ − ⟨H ⟩ .

Thus, the heat capacity is related to the fluctuations in the energy, just as we saw at the end of §4:

$$C_V = \Big(\frac{\partial E}{\partial T}\Big)_{\!V,N} = \frac{1}{k_{\rm B} T^2}\, \Big(\langle \hat H^2 \rangle - \langle \hat H \rangle^2\Big) \tag{4.4.24}$$

For the nonrelativistic ideal gas, we found $C_V = \frac{d}{2}\, N k_{\rm B}$, hence the ratio of RMS fluctuations in the energy to the energy itself is

$$\frac{\sqrt{\big\langle (\Delta \hat H)^2 \big\rangle}}{\langle \hat H \rangle} = \frac{\sqrt{k_{\rm B} T^2\, C_V}}{\frac{d}{2}\, N k_{\rm B} T} = \sqrt{\frac{2}{Nd}}\ , \tag{4.4.25}$$

and the ratio of the RMS fluctuations to the mean value vanishes in the thermodynamic limit.
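The fluctuation–response relation of Equation 4.4.24 can be checked on a simple discrete spectrum by comparing the variance of $\hat H$ in the Gibbs state with a finite-difference heat capacity. A sketch of our own ($k_{\rm B} = 1$, harmonic-oscillator-like levels $E_n = n$):

```python
from math import exp

kB = 1.0
levels = range(400)                      # E_n = n, truncated spectrum

def thermal(T):
    """Return (<H>, <H^2>) in the Gibbs state at temperature T."""
    w = [exp(-En / (kB * T)) for En in levels]
    Z = sum(w)
    E1 = sum(wi * En for wi, En in zip(w, levels)) / Z
    E2 = sum(wi * En * En for wi, En in zip(w, levels)) / Z
    return E1, E2

T = 1.0
E1, E2 = thermal(T)
var_H = E2 - E1 ** 2

# C_V = dE/dT by central finite difference
h = 1e-4
C_V = (thermal(T + h)[0] - thermal(T - h)[0]) / (2 * h)
lhs = kB * T ** 2 * C_V                  # should equal <H^2> - <H>^2
```

The truncation at 400 levels is harmless at $T = 1$, and the two sides agree to the accuracy of the finite difference.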
The full distribution function for the energy is

$$P(E) = \big\langle\, \delta(E - \hat H)\, \big\rangle = \frac{\mathrm{Tr}\, \delta(E - \hat H)\, e^{-\beta \hat H}}{\mathrm{Tr}\, e^{-\beta \hat H}} = \frac{1}{Z}\, D(E)\, e^{-\beta E}\ . \tag{4.4.26}$$

Thus,

$$P(E) = \frac{e^{-\beta [E - T S(E)]}}{\int\! dE'\ e^{-\beta [E' - T S(E')]}}\ , \tag{4.4.27}$$

where $S(E) = k_{\rm B} \ln D(E)$ is the statistical entropy. Let's write $E = E^* + \delta E$, where $E^*$ extremizes the combination $E - T\,S(E)$, the solution to $T\,S'(E^*) = 1$, where the energy derivative of $S$ is performed at fixed volume $V$ and particle number $N$. We now expand $S(E^* + \delta E)$ to second order in $\delta E$, obtaining

$$S(E^* + \delta E) = S(E^*) + \frac{\delta E}{T} - \frac{(\delta E)^2}{2\, T^2 C_V} + \ldots \tag{4.4.28}$$

Recall that S''(Ē) = (∂/∂E)(1/T) = -1/(T²C_V). Thus,

E - TS(E) = Ē - TS(Ē) + (δE)²/(2TC_V) + O((δE)³) .  (4.4.29)

Applying this to both numerator and denominator of Equation 4.4.27, we obtain


P(E) = 𝒩 exp[ -(δE)²/(2k_B T²C_V) ] ,  (4.4.30)

where 𝒩 = (2πk_B T²C_V)^{-1/2} is a normalization constant which guarantees ∫ dE P(E) = 1. Once again, we see that the distribution is a Gaussian centered at ⟨E⟩ = Ē, and of width (ΔE)_RMS = √(k_B T²C_V). This is a consequence of the Central Limit Theorem.

Thermodynamics revisited
The average energy within the OCE is

E = ∑_n E_n P_n ,  (4.4.31)

and therefore

dE = ∑_n E_n dP_n + ∑_n P_n dE_n = đQ - đW ,

where

đW = -∑_n P_n dE_n  and  đQ = ∑_n E_n dP_n .

Finally, from P_n = Z^{-1} e^{-E_n/k_B T}, we can write

E_n = -k_B T ln Z - k_B T ln P_n ,  (4.4.32)

with which we obtain



đQ = ∑_n E_n dP_n = -k_B T ln Z ∑_n dP_n - k_B T ∑_n ln P_n dP_n = T d( -k_B ∑_n P_n ln P_n ) = T dS ,

since normalization ∑_n P_n = 1 entails ∑_n dP_n = 0.

Note also that


đW = -∑_n P_n dE_n = -∑_n P_n ∑_i (∂E_n/∂X_i) dX_i = -∑_{n,i} P_n ⟨n| ∂Ĥ/∂X_i |n⟩ dX_i ≡ ∑_i F_i dX_i ,

so the generalized force F_i conjugate to the generalized displacement dX_i is

F_i = -∑_n P_n (∂E_n/∂X_i) = -⟨∂Ĥ/∂X_i⟩ .  (4.4.36)

This is the force acting on the system. In the chapter on thermodynamics, we defined the generalized force conjugate to X_i as y_i ≡ -F_i.

Figure 4.4.1 : Microscopic, statistical interpretation of the First Law of Thermodynamics.


Thus we see from the decomposition dE = đQ - đW that there are two ways that the average energy can change; these are depicted in the sketch of Figure 4.4.1. Starting from a set of energy levels {E_n} and probabilities {P_n}, we can shift the energies to {E'_n}. The resulting change in energy (ΔE)_I = -W is identified with the work done on the system. We could also modify the probabilities to {P'_n} without changing the energies. The energy change in this case is the heat absorbed by the system: (ΔE)_II = Q. This provides us with a statistical and microscopic interpretation of the First Law of Thermodynamics.

Generalized Susceptibilities
Suppose our Hamiltonian is of the form
Ĥ = Ĥ(λ) = Ĥ_0 - λQ̂ ,  (4.4.37)

where λ is an intensive parameter, such as magnetic field. Then


Z(λ) = Tr e^{-β(Ĥ_0 - λQ̂)}  (4.4.38)

and

(1/Z) ∂Z/∂λ = β · (1/Z) Tr( Q̂ e^{-βĤ(λ)} ) = β⟨Q̂⟩ .  (4.4.39)

But then from Z = e^{-βF} we have

Q(λ, T) = ⟨Q̂⟩ = -(∂F/∂λ)_T .  (4.4.40)

Typically we will take Q to be an extensive quantity. We can now define the susceptibility χ as

χ = (1/V) ∂Q/∂λ = -(1/V) ∂²F/∂λ² .  (4.4.41)

The volume factor in the denominator ensures that χ is intensive.

It is important to realize that we have assumed here that [Ĥ_0, Q̂] = 0, i.e. that the 'bare' Hamiltonian Ĥ_0 and the operator Q̂ commute. If they do not commute, then the response functions must be computed within a proper quantum mechanical formalism, which we shall not discuss here.

Note also that we can imagine an entire family of observables {Q̂_k} satisfying [Q̂_k, Q̂_{k'}] = 0 and [Ĥ_0, Q̂_k] = 0, for all k and k'. Then for the Hamiltonian

Ĥ(λ⃗) = Ĥ_0 - ∑_k λ_k Q̂_k ,  (4.4.42)

we have that

Q_k(λ⃗, T) = ⟨Q̂_k⟩ = -(∂F/∂λ_k)_{T, N_a, λ_{k'≠k}}  (4.4.43)

and we may define an entire matrix of susceptibilities,

χ_kl = (1/V) ∂Q_k/∂λ_l = -(1/V) ∂²F/∂λ_k∂λ_l .  (4.4.44)
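In the commuting case these relations reduce to ordinary calculus on the eigenvalues, E_n(λ) = ε_n - λq_n, which makes them easy to check. A minimal sketch (hypothetical eigenvalues, units with k_B = 1, and the volume factor set to V = 1) comparing -∂F/∂λ and -∂²F/∂λ² against the Boltzmann averages:

```python
import math

# Hypothetical commuting case: H0 and Q are diagonal in the same basis.
e0 = [0.0, 0.4, 1.1, 1.6]    # eigenvalues of H0
q  = [1.0, -1.0, 2.0, 0.5]   # eigenvalues of Q
beta = 1.3
lam = 0.2

def free_energy(l):
    Z = sum(math.exp(-beta * (e - l * qq)) for e, qq in zip(e0, q))
    return -math.log(Z) / beta

def moments(l):
    w = [math.exp(-beta * (e - l * qq)) for e, qq in zip(e0, q)]
    Z = sum(w)
    Q1 = sum(wi * qq for wi, qq in zip(w, q)) / Z
    Q2 = sum(wi * qq * qq for wi, qq in zip(w, q)) / Z
    return Q1, Q2

d = 1e-4
Q_thermo = -(free_energy(lam + d) - free_energy(lam - d)) / (2 * d)
sus = -(free_energy(lam + d) - 2 * free_energy(lam) + free_energy(lam - d)) / d**2

Q1, Q2 = moments(lam)
print(Q_thermo, Q1)              # <Q> = -dF/dlam
print(sus, beta * (Q2 - Q1**2))  # -d^2F/dlam^2 = beta * var(Q)
```

The second line also illustrates the fluctuation form of the susceptibility, χV = β(⟨Q̂²⟩ - ⟨Q̂⟩²), valid in this commuting case.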

This page titled 4.4: Ordinary Canonical Ensemble (OCE) is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

4.5: Grand Canonical Ensemble (GCE)
Grand canonical distribution and partition function
Consider once again the situation depicted in Figure [universe], where a system S is in contact with a world W, their union U = W ∪ S being called the 'universe'. We assume that the system's volume V_S is fixed, but otherwise it is allowed to exchange energy and particle number with W. Hence, the system's energy E_S and particle number N_S will fluctuate. We ask what is the probability that S is in a state |n⟩ with energy E_n and particle number N_n. This is given by the ratio

P_n = lim_{ΔE→0} D_W(E_U - E_n, N_U - N_n) ΔE / D_U(E_U, N_U) ΔE
    = (# of states accessible to W given that E_S = E_n and N_S = N_n) / (total # of states in U) .

Then

ln P_n = ln D_W(E_U - E_n, N_U - N_n) - ln D_U(E_U, N_U)
       = ln D_W(E_U, N_U) - ln D_U(E_U, N_U) - E_n ∂ln D_W(E, N)/∂E |_{E=E_U, N=N_U} - N_n ∂ln D_W(E, N)/∂N |_{E=E_U, N=N_U} + …
       ≡ -α - βE_n + βμN_n .

The constants β and μ are given by

β = ∂ln D_W(E, N)/∂E |_{E=E_U, N=N_U} = 1/(k_B T) ,

μ = -k_B T ∂ln D_W(E, N)/∂N |_{E=E_U, N=N_U} .

The quantity μ has dimensions of energy and is called the chemical potential. Nota bene: some texts define the 'grand canonical Hamiltonian' K̂ as

K̂ ≡ Ĥ - μN̂ .  (4.5.1)

Thus, P_n = e^{-α} e^{-β(E_n - μN_n)}. Once again, the constant α is fixed by the requirement that ∑_n P_n = 1:

P_n = (1/Ξ) e^{-β(E_n - μN_n)} ,  Ξ(β, V, μ) = ∑_n e^{-β(E_n - μN_n)} = Tr e^{-β(Ĥ - μN̂)} = Tr e^{-βK̂} .  (4.5.2)

Thus, the quantum mechanical grand canonical density matrix is given by

ϱ̂ = e^{-βK̂} / Tr e^{-βK̂} .  (4.5.3)

Note that [ϱ̂, K̂] = 0. The quantity Ξ(T, V, μ) is called the grand partition function. It stands in relation to a corresponding free energy in the usual way:

Ξ(T, V, μ) ≡ e^{-βΩ(T,V,μ)}  ⟺  Ω = -k_B T ln Ξ ,  (4.5.4)

where Ω(T, V, μ) is the grand potential, also known as the Landau free energy. The dimensionless quantity z ≡ e^{βμ} is called the fugacity.
If [Ĥ, N̂] = 0, the grand potential may be expressed as a sum over contributions from each N sector, viz.
4.5.1 https://phys.libretexts.org/@go/page/18566
Ξ(T, V, μ) = ∑_N e^{βμN} Z(T, V, N) .  (4.5.5)

When there is more than one species, we have several chemical potentials {μ_a}, and accordingly we define

K̂ = Ĥ - ∑_a μ_a N̂_a ,  (4.5.6)

with Ξ = Tr e^{-βK̂} as before.

Entropy and Gibbs-Duhem relation


In the GCE, the Boltzmann entropy is

S = -k_B ∑_n P_n ln P_n = -k_B ∑_n P_n (βΩ - βE_n + βμN_n) = -Ω/T + ⟨Ĥ⟩/T - μ⟨N̂⟩/T ,

which says
Ω = E − T S − μN , (4.5.7)

where

E = ∑_n E_n P_n = Tr( ϱ̂ Ĥ ) ,  N = ∑_n N_n P_n = Tr( ϱ̂ N̂ ) .

Therefore, Ω(T, V, μ) is a double Legendre transform of E(S, V, N), with

dΩ = -S dT - p dV - N dμ ,  (4.5.8)

which entails

S = -(∂Ω/∂T)_{V,μ} ,  p = -(∂Ω/∂V)_{T,μ} ,  N = -(∂Ω/∂μ)_{T,V} .  (4.5.9)

Since Ω(T, V, μ) is an extensive quantity, we must be able to write Ω = V ω(T, μ). We identify the function ω(T, μ) as the negative of the pressure:

(∂Ω/∂V)_{T,μ} = -(k_B T/Ξ)(∂Ξ/∂V)_{T,μ} = (1/Ξ) ∑_n (∂E_n/∂V) e^{-β(E_n - μN_n)} = (∂E/∂V)_{T,μ} = -p(T, μ) .

Therefore,

Ω = -pV ,  p = p(T, μ)  (equation of state) .  (4.5.10)

This is consistent with the result from thermodynamics that G = E - TS + pV = μN. Taking the differential, we recover the Gibbs-Duhem relation,

dΩ = -S dT - p dV - N dμ = -p dV - V dp  ⇒  S dT - V dp + N dμ = 0 .  (4.5.11)

Generalized Susceptibilities in the GCE


We can appropriate the results from §5.8 and apply them, mutatis mutandis, to the GCE. Suppose we have a family of observables {Q̂_k} satisfying [Q̂_k, Q̂_{k'}] = 0, [Ĥ_0, Q̂_k] = 0, and [N̂_a, Q̂_k] = 0, for all k, k', and a. Then for the grand canonical Hamiltonian

K̂(λ⃗) = Ĥ_0 - ∑_a μ_a N̂_a - ∑_k λ_k Q̂_k ,  (4.5.12)

we have that

Q_k(λ⃗, T) = ⟨Q̂_k⟩ = -(∂Ω/∂λ_k)_{T, μ_a, λ_{k'≠k}}  (4.5.13)

and we may define the matrix of generalized susceptibilities,

χ_kl = (1/V) ∂Q_k/∂λ_l = -(1/V) ∂²Ω/∂λ_k∂λ_l .  (4.5.14)

Fluctuations in the GCE


Both energy and particle number fluctuate in the GCE. Let us compute the fluctuations in particle number. We have

N = ⟨N̂⟩ = Tr( N̂ e^{-β(Ĥ-μN̂)} ) / Tr e^{-β(Ĥ-μN̂)} = (1/β) ∂(ln Ξ)/∂μ .  (4.5.15)

Therefore,

(1/β) ∂N/∂μ = Tr( N̂² e^{-β(Ĥ-μN̂)} ) / Tr e^{-β(Ĥ-μN̂)} - [ Tr( N̂ e^{-β(Ĥ-μN̂)} ) / Tr e^{-β(Ĥ-μN̂)} ]² = ⟨N̂²⟩ - ⟨N̂⟩² .

Note now that

( ⟨N̂²⟩ - ⟨N̂⟩² ) / ⟨N̂⟩² = (k_B T/N²) (∂N/∂μ)_{T,V} = (k_B T/V) κ_T ,  (4.5.16)

where κ_T is the isothermal compressibility. Note:

(∂N/∂μ)_{T,V} = ∂(N,T,V)/∂(μ,T,V) = -∂(N,T,V)/∂(V,T,μ)
             = -[∂(N,T,V)/∂(N,T,p)] · [∂(N,T,p)/∂(V,T,p)] · [∂(V,T,p)/∂(N,T,μ)] · [∂(N,T,μ)/∂(V,T,μ)]
             = -(N²/V²) (∂V/∂p)_{T,N} = (N²/V) κ_T .

Thus,

(ΔN)_RMS / N = √(k_B T κ_T / V) ,  (4.5.17)

which again scales as V^{-1/2}.
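For the ideal gas (treated in the grand canonical ensemble below), Ξ is a sum of Poisson weights x^N/N!, so the GCE number distribution is Poisson and ⟨N̂²⟩ - ⟨N̂⟩² = ⟨N̂⟩ exactly. A short numerical check of this and of the V^{-1/2} scaling, with a hypothetical mean occupation:

```python
import math

# Ideal-gas GCE: P(N) is Poisson with mean x = V e^{mu/k_B T} / lam_T^3.
x, Nmax = 40.0, 400   # hypothetical mean; Nmax truncates a negligible tail

logw = [N * math.log(x) - math.lgamma(N + 1) for N in range(Nmax)]
m = max(logw)
w = [math.exp(lw - m) for lw in logw]
Z = sum(w)
P = [wi / Z for wi in w]

N_avg = sum(N * p for N, p in enumerate(P))
var = sum(N * N * p for N, p in enumerate(P)) - N_avg**2
rel = math.sqrt(var) / N_avg
print(N_avg, var, rel)   # var = <N>, so rel = 1/sqrt(<N>)
```

Since ⟨N⟩ is proportional to V at fixed T and μ, the relative fluctuation 1/√⟨N⟩ indeed falls off as V^{-1/2}.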

Gibbs ensemble
Let the system's particle number N be fixed, but let it exchange energy and volume with the world W. Mutatis mutandis, we have

P_n = lim_{ΔE→0} lim_{ΔV→0} D_W(E_U - E_n, V_U - V_n) ΔE ΔV / D_U(E_U, V_U) ΔE ΔV .  (4.5.18)

Then

ln P_n = ln D_W(E_U - E_n, V_U - V_n) - ln D_U(E_U, V_U)
       = ln D_W(E_U, V_U) - ln D_U(E_U, V_U) - E_n ∂ln D_W(E, V)/∂E |_{E=E_U, V=V_U} - V_n ∂ln D_W(E, V)/∂V |_{E=E_U, V=V_U} + …
       ≡ -α - βE_n - βp V_n .

The constants β and p are given by

β = ∂ln D_W(E, V)/∂E |_{E=E_U, V=V_U} = 1/(k_B T) ,

p = k_B T ∂ln D_W(E, V)/∂V |_{E=E_U, V=V_U} .

The corresponding partition function is

Y(T, p, N) = (1/V_0) ∫₀^∞ dV e^{-βpV} Z(T, V, N) ≡ e^{-βG(T,p,N)} ,  (4.5.19)

where V_0 is a constant which has dimensions of volume. The factor V_0^{-1} in front of the integral renders Y dimensionless. Note that G(V_0) = G(V_0') + k_B T ln(V_0/V_0'), so the difference is not extensive and can be neglected in the thermodynamic limit. In other words, it doesn't matter what constant we choose for V_0, since it contributes subextensively to G. Moreover, in computing averages, the constant V_0 divides out in the ratio of numerator and denominator. Like the Helmholtz free energy, the Gibbs free energy G(T, p, N) is also a double Legendre transform of the energy E(S, V, N), viz.

G = E - TS + pV ,  dG = -S dT + V dp + μ dN ,

which entails

S = -(∂G/∂T)_{p,N} ,  V = +(∂G/∂p)_{T,N} ,  μ = +(∂G/∂N)_{T,p} .  (4.5.20)

This page titled 4.5: Grand Canonical Ensemble (GCE) is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

4.6: Statistical Ensembles from Maximum Entropy
The basic principle: maximize the entropy,

S = -k_B ∑_n P_n ln P_n .  (4.6.1)

μCE

We maximize S subject to the single constraint

C = ∑_n P_n - 1 = 0 .  (4.6.2)

We implement the constraint C = 0 with a Lagrange multiplier λ̄ ≡ k_B λ, writing

S* = S - k_B λ C ,  (4.6.3)

and freely extremizing over the distribution {P_n} and the Lagrange multiplier λ. Thus,

δS* = δS - k_B λ δC - k_B C δλ = -k_B ∑_n [ ln P_n + 1 + λ ] δP_n - k_B C δλ ≡ 0 .

We conclude that C = 0 and that

ln P_n = -(1 + λ) ,  (4.6.4)

and we fix λ by the normalization condition ∑_n P_n = 1. This gives

P_n = 1/Ω ,  Ω = ∑_n Θ(E + ΔE - E_n) Θ(E_n - E) .  (4.6.5)

Note that Ω is the number of states with energies between E and E + ΔE.

OCE
We maximize S subject to the two constraints

C_1 = ∑_n P_n - 1 = 0 ,  C_2 = ∑_n E_n P_n - E = 0 .  (4.6.6)

We now have two Lagrange multipliers. We write

S* = S - k_B ∑_{j=1}^{2} λ_j C_j ,  (4.6.7)

and we freely extremize over {P_n} and {λ_j}. We therefore have

δS* = δS - k_B ∑_n (λ_1 + λ_2 E_n) δP_n - k_B ∑_{j=1}^{2} C_j δλ_j
    = -k_B ∑_n [ ln P_n + 1 + λ_1 + λ_2 E_n ] δP_n - k_B ∑_{j=1}^{2} C_j δλ_j ≡ 0 .

Thus, C_1 = C_2 = 0 and

ln P_n = -(1 + λ_1 + λ_2 E_n) .  (4.6.8)

We define λ_2 ≡ β and we fix λ_1 by normalization. This yields

P_n = (1/Z) e^{-βE_n} ,  Z = ∑_n e^{-βE_n} .  (4.6.9)
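In practice the multiplier λ_2 = β is fixed implicitly by the energy constraint ∑_n E_n P_n = E. Since the canonical mean energy is a monotonically decreasing function of β, one can solve for β by bisection; a sketch with a hypothetical four-level spectrum (units with k_B = 1):

```python
import math

levels = [0.0, 0.5, 1.0, 2.0]   # hypothetical spectrum (k_B = 1)
E_target = 0.6                  # must lie between min(levels) and the beta=0 mean

def E_of_beta(beta):
    w = [math.exp(-beta * e) for e in levels]
    return sum(wi * e for wi, e in zip(w, levels)) / sum(w)

lo, hi = -50.0, 50.0            # E_of_beta is monotonically decreasing
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if E_of_beta(mid) > E_target:
        lo = mid                # too much energy: raise beta
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(beta, E_of_beta(beta))
```

Here E_target is below the infinite-temperature average (0.875 for this spectrum), so the solution has β > 0; targets above that value would yield a negative β, anticipating the negative-temperature discussion in §8.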

4.6.1 https://phys.libretexts.org/@go/page/18755
GCE
We maximize S subject to the three constraints

C_1 = ∑_n P_n - 1 = 0 ,  C_2 = ∑_n E_n P_n - E = 0 ,  C_3 = ∑_n N_n P_n - N = 0 .  (4.6.10)

We now have three Lagrange multipliers. We write

S* = S - k_B ∑_{j=1}^{3} λ_j C_j ,  (4.6.11)

and hence

δS* = δS - k_B ∑_n (λ_1 + λ_2 E_n + λ_3 N_n) δP_n - k_B ∑_{j=1}^{3} C_j δλ_j
    = -k_B ∑_n [ ln P_n + 1 + λ_1 + λ_2 E_n + λ_3 N_n ] δP_n - k_B ∑_{j=1}^{3} C_j δλ_j ≡ 0 .

Thus, C_1 = C_2 = C_3 = 0 and

ln P_n = -(1 + λ_1 + λ_2 E_n + λ_3 N_n) .  (4.6.12)

We define λ_2 ≡ β and λ_3 ≡ -βμ, and we fix λ_1 by normalization. This yields

P_n = (1/Ξ) e^{-β(E_n - μN_n)} ,  Ξ = ∑_n e^{-β(E_n - μN_n)} .  (4.6.13)

This page titled 4.6: Statistical Ensembles from Maximum Entropy is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

4.7: Ideal Gas Statistical Mechanics
The ordinary canonical partition function for the ideal gas was computed in Equation [zideal]. We found

Z(T, V, N) = (1/N!) ∏_{i=1}^{N} ∫ ( d^dx_i d^dp_i/(2πℏ)^d ) e^{-βp_i²/2m}
           = (V^N/N!) ( ∫_{-∞}^{∞} (dp/2πℏ) e^{-βp²/2m} )^{Nd}
           = (1/N!) (V/λ_T^d)^N ,

where λ_T is the thermal wavelength:

λ_T = √(2πℏ²/m k_B T) .  (4.7.1)

The physical interpretation of λ_T is that it is the de Broglie wavelength for a particle of mass m which has a kinetic energy of k_B T.
In the GCE, we have

Ξ(T, V, μ) = ∑_{N=0}^{∞} e^{βμN} Z(T, V, N) = ∑_{N=0}^{∞} (1/N!) ( V e^{μ/k_B T}/λ_T^d )^N = exp( V e^{μ/k_B T}/λ_T^d ) .

From Ξ = e^{-Ω/k_B T}, the grand potential is

Ω(T, V, μ) = -V k_B T e^{μ/k_B T} / λ_T^d .  (4.7.2)

Since Ω = -pV (see §6.2), we have

p(T, μ) = k_B T λ_T^{-d} e^{μ/k_B T} .  (4.7.3)

The number density can also be calculated:

n = N/V = -(1/V) (∂Ω/∂μ)_{T,V} = λ_T^{-d} e^{μ/k_B T} .  (4.7.4)

Combined, the last two equations recapitulate the ideal gas law, pV = N k_B T.
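A quick numerical sanity check of Equations 4.7.3 and 4.7.4: with SI constants and hypothetical values for the mass and chemical potential (an argon-like atom at room temperature), the ratio p/(n k_B T) comes out exactly 1:

```python
import math

hbar = 1.054571817e-34          # J s
kB = 1.380649e-23               # J/K

m = 39.95 * 1.66053906660e-27   # hypothetical: argon-like mass (kg)
T = 300.0                       # K
mu = -0.3 * 1.602176634e-19     # hypothetical chemical potential (J)

lam = math.sqrt(2 * math.pi * hbar**2 / (m * kB * T))  # thermal wavelength, d = 3
p = kB * T * lam**-3 * math.exp(mu / (kB * T))         # Eq. 4.7.3
n = lam**-3 * math.exp(mu / (kB * T))                  # Eq. 4.7.4
print(lam, p / (n * kB * T))                           # second number is 1.0
```

The thermal wavelength works out to roughly 0.016 nm here, far smaller than the interparticle spacing, consistent with the classical treatment.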

Maxwell velocity distribution


The distribution function for momenta is given by

g(p) = ⟨ (1/N) ∑_{i=1}^{N} δ(p_i - p) ⟩ .  (4.7.5)

Note that g(p) = ⟨δ(p_i - p)⟩ is the same for every particle, independent of its label i. We compute the average ⟨Â⟩ = Tr( Â e^{-βĤ} ) / Tr e^{-βĤ}. Setting i = 1, all the integrals other than that over p_1 divide out between numerator and denominator. We then have

g(p) = ∫ d³p_1 δ(p_1 - p) e^{-βp_1²/2m} / ∫ d³p_1 e^{-βp_1²/2m} = (2πm k_B T)^{-3/2} e^{-βp²/2m} .

Textbooks commonly refer to the velocity distribution f(v), which is related to g(p) by
4.7.1 https://phys.libretexts.org/@go/page/18756
f(v) d³v = g(p) d³p .  (4.7.6)

Hence,

f(v) = ( m/2πk_B T )^{3/2} e^{-mv²/2k_B T} .  (4.7.7)

This is known as the Maxwell velocity distribution. Note that the distributions are normalized, viz.

∫ d³p g(p) = ∫ d³v f(v) = 1 .  (4.7.8)


Figure 4.7.1: Maxwell distribution of speeds φ(v/v_0). The most probable speed is v_MAX = √2 v_0. The average speed is v_AVG = √(8/π) v_0. The RMS speed is v_RMS = √3 v_0.

If we are only interested in averaging functions of v = |v| which are isotropic, then we can define the Maxwell speed distribution, f̃(v), as

f̃(v) = 4πv² f(v) = 4π ( m/2πk_B T )^{3/2} v² e^{-mv²/2k_B T} .  (4.7.9)

Note that f̃(v) is normalized according to

∫₀^∞ dv f̃(v) = 1 .  (4.7.10)

It is convenient to represent v in units of v_0 = √(k_B T/m), in which case

f̃(v) = (1/v_0) φ(v/v_0) ,  φ(s) = √(2/π) s² e^{-s²/2} .  (4.7.11)

The distribution φ(s) is shown in Figure 4.7.1. Computing averages, we have

C_k ≡ ⟨s^k⟩ = ∫₀^∞ ds s^k φ(s) = 2^{k/2} · (2/√π) Γ(k/2 + 3/2) .  (4.7.12)

Thus, C_0 = 1, C_1 = √(8/π), C_2 = 3, and so on. The speed averages are

⟨v^k⟩ = C_k ( k_B T/m )^{k/2} .  (4.7.13)

Note that the average velocity is ⟨v⃗⟩ = 0, but the average speed is ⟨v⟩ = √(8k_B T/πm). The speed distribution is plotted in Figure 4.7.1.
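The moment formula (4.7.12) can be checked by direct numerical integration of φ(s); the sketch below compares a midpoint-rule quadrature against the Γ-function expression (using Python's math.gamma):

```python
import math

def phi(s):
    # Maxwell speed distribution in units of v0: sqrt(2/pi) s^2 e^{-s^2/2}
    return math.sqrt(2 / math.pi) * s * s * math.exp(-s * s / 2)

def moment_numeric(k, smax=20.0, nsteps=200000):
    # midpoint rule; the integrand decays like e^{-s^2/2}, so smax = 20 suffices
    ds = smax / nsteps
    return sum(((i + 0.5) * ds) ** k * phi((i + 0.5) * ds) * ds
               for i in range(nsteps))

def moment_exact(k):
    # C_k = 2^{k/2} * (2/sqrt(pi)) * Gamma(k/2 + 3/2)   (Eq. 4.7.12)
    return 2 ** (k / 2) * (2 / math.sqrt(math.pi)) * math.gamma(k / 2 + 1.5)

for k in range(4):
    print(k, moment_numeric(k), moment_exact(k))
```

The k = 0, 1, 2 rows reproduce C_0 = 1, C_1 = √(8/π) ≈ 1.5958, and C_2 = 3.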

Equipartition
The Hamiltonian for ballistic (massive nonrelativistic) particles is quadratic in the individual components of each momentum p_i. There are other cases in which a classical degree of freedom appears quadratically in Ĥ as well. For example, an individual normal mode ξ of a system of coupled oscillators has the Lagrangian

L = ½ ξ̇² - ½ ω_0² ξ² ,  (4.7.14)

where the dimensions of ξ are [ξ] = M^{1/2} L by convention. The Hamiltonian for this normal mode is then

Ĥ = p²/2 + ½ ω_0² ξ² ,  (4.7.15)

from which we see that both the kinetic as well as potential energy terms enter quadratically into the Hamiltonian. The classical rotational kinetic energy is also quadratic in the angular momentum components.
Let us compute the contribution of a single quadratic degree of freedom in Ĥ to the partition function. We'll call this degree of freedom ζ – it may be a position or momentum or angular momentum – and we'll write its contribution to Ĥ as

Ĥ_ζ = ½ K ζ² ,  (4.7.16)

where K is some constant. Integrating over ζ yields the following factor in the partition function:

∫_{-∞}^{∞} dζ e^{-βKζ²/2} = ( 2π/βK )^{1/2} .  (4.7.17)

The contribution to the Helmholtz free energy is then

ΔF_ζ = ½ k_B T ln( K/2πk_B T ) ,  (4.7.18)

and therefore the contribution to the internal energy E is

ΔE_ζ = ∂(β ΔF_ζ)/∂β = 1/(2β) = ½ k_B T .  (4.7.19)
We have thus derived what is commonly called the equipartition theorem of classical statistical mechanics: each degree of freedom which appears quadratically in the Hamiltonian contributes ½ k_B T to the internal energy.

We now see why the internal energy of a classical ideal gas with f degrees of freedom per molecule is E = ½ f N k_B T, and C_V = ½ f N k_B. This result also has applications in the theory of solids. The atoms in a solid possess kinetic energy due to their motion, and potential energy due to the spring-like interatomic potentials which tend to keep the atoms in their preferred crystalline positions. Thus, for a three-dimensional crystal, there are six quadratic degrees of freedom (three positions and three momenta) per atom, and the classical energy should be E = 3N k_B T, with heat capacity C_V = 3N k_B. As we shall see, quantum mechanics modifies this result considerably at temperatures below the highest normal mode (phonon) frequency, but the high temperature limit is given by the classical value C_V = 3νR (where ν = N/N_A is the number of moles) derived here, known as the Dulong-Petit limit.
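A Monte Carlo illustration of equipartition: for Ĥ_ζ = ½Kζ² the Boltzmann distribution of ζ is Gaussian with variance k_B T/K, so sampling it and averaging ½Kζ² should return ½k_B T. (Hypothetical K and T, units with k_B = 1, fixed seed for reproducibility:)

```python
import math, random

random.seed(42)
K, T = 3.7, 1.4              # hypothetical stiffness and temperature, k_B = 1
sigma = math.sqrt(T / K)     # Boltzmann weight e^{-K z^2/2T} is Gaussian

samples = [random.gauss(0.0, sigma) for _ in range(200000)]
E_avg = sum(0.5 * K * z * z for z in samples) / len(samples)
print(E_avg, 0.5 * T)        # agree to about 1%
```

Note that the result ½T is independent of K, which is the content of the theorem.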

This page titled 4.7: Ideal Gas Statistical Mechanics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

4.8: Selected Examples
Spins in an External Magnetic Field
Consider a system of N_S spins, each of which can be either up (σ = +1) or down (σ = -1). The Hamiltonian for this system is

Ĥ = -μ_0 H ∑_{j=1}^{N_S} σ_j ,  (4.8.1)

where now we write Ĥ for the Hamiltonian, to distinguish it from the external magnetic field H, and μ_0 is the magnetic moment per particle. We treat this system within the ordinary canonical ensemble. The partition function is

Z = ∑_{σ_1} ⋯ ∑_{σ_{N_S}} e^{-βĤ} = ζ^{N_S} ,  (4.8.2)

where ζ is the single particle partition function:

ζ = ∑_{σ=±1} e^{μ_0 H σ/k_B T} = 2 cosh( μ_0 H/k_B T ) .  (4.8.3)

The Helmholtz free energy is then

F(T, H, N_S) = -k_B T ln Z = -N_S k_B T ln[ 2 cosh( μ_0 H/k_B T ) ] .  (4.8.4)

The magnetization is

M = -(∂F/∂H)_{T,N_S} = N_S μ_0 tanh( μ_0 H/k_B T ) .  (4.8.5)

The energy is

E = ∂(βF)/∂β = -N_S μ_0 H tanh( μ_0 H/k_B T ) .  (4.8.6)

Hence, E = -HM, which we already knew, from the form of Ĥ itself.
Each spin here is independent. The probability that a given spin has polarization σ is

P_σ = e^{βμ_0 H σ} / ( e^{βμ_0 H} + e^{-βμ_0 H} ) .  (4.8.7)

The total probability is unity, and the average polarization is a weighted average of σ = +1 and σ = -1 contributions:

P_↑ + P_↓ = 1 ,  ⟨σ⟩ = P_↑ - P_↓ = tanh( μ_0 H/k_B T ) .  (4.8.8)

At low temperatures T ≪ μ_0 H/k_B, we have P_↑ ≈ 1 - e^{-2μ_0 H/k_B T}. At high temperatures T ≫ μ_0 H/k_B, the two polarizations are equally likely, and P_σ ≈ ½ ( 1 + σμ_0 H/k_B T ).

The isothermal magnetic susceptibility is defined as

χ_T = (1/N_S) (∂M/∂H)_T = (μ_0²/k_B T) sech²( μ_0 H/k_B T ) .  (4.8.9)

(Typically this is computed per unit volume rather than per particle.) At H = 0, we have χ_T = μ_0²/k_B T, which is known as the Curie law.
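The two-state averages above are simple enough to check directly; the sketch below recomputes ⟨σ⟩ from the Boltzmann weights and confirms both the tanh form of M and the Curie value χ_T → μ_0²/k_B T at H = 0 (hypothetical parameters, units with k_B = 1):

```python
import math

mu0, H, T, NS = 1.0, 0.35, 1.2, 1000   # hypothetical values

w_up = math.exp(mu0 * H / T)           # Boltzmann weight for sigma = +1
w_dn = math.exp(-mu0 * H / T)          # Boltzmann weight for sigma = -1
sigma_avg = (w_up - w_dn) / (w_up + w_dn)
M_enum = NS * mu0 * sigma_avg
M_formula = NS * mu0 * math.tanh(mu0 * H / T)

# Curie law: chi_T -> mu0^2 / k_B T as H -> 0 (finite-difference check)
dH = 1e-6
chi0 = mu0 * math.tanh(mu0 * dH / T) / dH
print(M_enum, M_formula, chi0, mu0**2 / T)
```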

Aside
The energy E = -HM here is not the same quantity we discussed in our study of thermodynamics. In fact, the thermodynamic energy for this problem vanishes! Here is why. To avoid confusion, we'll need to invoke a new symbol for the thermodynamic energy, ℰ. Recall that the thermodynamic energy ℰ is a function of extensive quantities, meaning ℰ = ℰ(S, M, N_S). It is obtained from the free energy F(T, H, N_S) by a double Legendre transform:

ℰ(S, M, N_S) = F(T, H, N_S) + TS + HM .  (4.8.10)

Now from Equation 4.8.4 we derive the entropy

S = -∂F/∂T = N_S k_B ln[ 2 cosh( μ_0 H/k_B T ) ] - N_S (μ_0 H/T) tanh( μ_0 H/k_B T ) .  (4.8.11)

Thus, using Equations 4.8.4 and 4.8.5, we obtain ℰ(S, M, N_S) = 0.

The potential confusion here arises from our use of the expression F(T, H, N_S). In thermodynamics, it is the Gibbs free energy G(T, p, N) which is a double Legendre transform of the energy: G = ℰ - TS + pV. By analogy, with magnetic systems we should perhaps write G = ℰ - TS - HM, but in keeping with
4.8.1 https://phys.libretexts.org/@go/page/18757
many textbooks we shall use the symbol F and refer to it as the Helmholtz free energy. The quantity we've called E in Equation 4.8.6 is in fact E = ℰ - HM, which means ℰ = 0. The energy ℰ(S, M, N_S) vanishes here because the spins are noninteracting.

Negative Temperature (!)

Consider again a system of N_S spins, each of which can be either up (+) or down (-). Let N_σ be the number of sites with spin σ, where σ = ±1. Clearly N_+ + N_- = N_S. We now treat this system within the microcanonical ensemble.

Figure 4.8.1: When entropy decreases with increasing energy, the temperature is negative. Typically, kinetic degrees of freedom prevent this peculiarity from manifesting in physical systems.

The energy of the system is

E = -HM ,  (4.8.12)

where H is an external magnetic field, and M = (N_+ - N_-)μ_0 is the total magnetization. We now compute the entropy S(E). The number of ways of arranging the system with N_+ up spins is
Ω = (N_S choose N_+) = N_S! / ( N_+! N_-! ) ,  (4.8.13)

hence the entropy is

S = k_B ln Ω = -N_S k_B { x ln x + (1-x) ln(1-x) }  (4.8.14)

in the thermodynamic limit: N_S → ∞, N_+ → ∞, with x = N_+/N_S constant. Now the magnetization is M = (N_+ - N_-)μ_0 = (2N_+ - N_S)μ_0, hence if we define the maximum energy E_0 ≡ N_S μ_0 H, then

E/E_0 = -M/(N_S μ_0) = 1 - 2x  ⟹  x = (E_0 - E)/(2E_0) .  (4.8.15)

We therefore have

S(E, N_S) = -N_S k_B [ ( (E_0-E)/2E_0 ) ln( (E_0-E)/2E_0 ) + ( (E_0+E)/2E_0 ) ln( (E_0+E)/2E_0 ) ] .  (4.8.16)

We now have

1/T = (∂S/∂E)_{N_S} = (∂S/∂x)(∂x/∂E) = (N_S k_B/2E_0) ln( (E_0-E)/(E_0+E) ) .  (4.8.17)

We see that the temperature is positive for -E_0 ≤ E < 0 and is negative for 0 < E ≤ E_0.

What has gone wrong? The answer is that nothing has gone wrong – all our calculations are perfectly correct. This system does exhibit the possibility of negative temperature. It is, however, unphysical in that we have neglected kinetic degrees of freedom, which result in an entropy function S(E, N_S) which is an increasing function of energy. In this system, S(E, N_S) achieves a maximum of S_max = N_S k_B ln 2 at E = 0 (x = ½), and then turns over and starts decreasing. In fact, our results are completely consistent with Equation 4.8.6: the energy E is an odd function of temperature. Positive energy requires negative temperature! Another example of this peculiarity is provided in the appendix in §11.2.
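Equation 4.8.17 makes the sign structure explicit: the logarithm is positive for E < 0 and negative for E > 0. A two-line check (units with k_B = 1 and energies in units of E_0; N_S hypothetical):

```python
import math

# 1/T = (N_S k_B / 2 E_0) ln[(E_0 - E)/(E_0 + E)]   (Eq. 4.8.17, k_B = 1)
NS, E0 = 100, 1.0   # hypothetical spin count; E measured in units of E_0

def inv_T(E):
    return (NS / (2 * E0)) * math.log((E0 - E) / (E0 + E))

print(inv_T(-0.5), inv_T(0.5))   # 1/T > 0 for E < 0, 1/T < 0 for E > 0
```

The antisymmetry inv_T(-E) = -inv_T(E) reflects the fact that S(E) is even in E for this model.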

Adsorption
PROBLEM: A surface containing N_S adsorption sites is in equilibrium with a monatomic ideal gas. Atoms adsorbed on the surface have an energy -Δ and no kinetic energy. Each adsorption site can accommodate at most one atom. Calculate the fraction f of occupied adsorption sites as a function of the gas density n, the temperature T, the binding energy Δ, and physical constants.

The grand partition function for the surface is

Ξ_surf = e^{-Ω_surf/k_B T} = ∑_{j=0}^{N_S} (N_S choose j) e^{j(μ+Δ)/k_B T} = ( 1 + e^{μ/k_B T} e^{Δ/k_B T} )^{N_S} .
The fraction of occupied sites is

f = ⟨N̂_surf⟩/N_S = -(1/N_S) ∂Ω_surf/∂μ = e^{μ/k_B T} / ( e^{μ/k_B T} + e^{-Δ/k_B T} ) .  (4.8.18)

Since the surface is in equilibrium with the gas, its fugacity z = exp(μ/k_B T) and temperature T are the same as in the gas.
SOLUTION: For a monatomic ideal gas, the single particle partition function is ζ = V λ_T^{-3}, where λ_T = √(2πℏ²/m k_B T) is the thermal wavelength. Thus, the grand partition function, for indistinguishable particles, is

Ξ_gas = exp( V λ_T^{-3} e^{μ/k_B T} ) .  (4.8.19)

The gas density is

n = ⟨N̂_gas⟩/V = -(1/V) ∂Ω_gas/∂μ = λ_T^{-3} e^{μ/k_B T} .  (4.8.20)

We can now solve for the fugacity: z = e^{μ/k_B T} = nλ_T³. Thus, the fraction of occupied adsorption sites is

f = nλ_T³ / ( nλ_T³ + e^{-Δ/k_B T} ) .  (4.8.21)

Interestingly, the solution for f involves the constant ℏ.

It is always advisable to check that the solution makes sense in various limits. First of all, if the gas density tends to zero at fixed T and Δ, we have f → 0. On the other hand, if n → ∞ we have f → 1, which also makes sense. At fixed n and T, if the adsorption energy is (-Δ) → -∞, then once again f = 1, since every adsorption site wants to be occupied. Conversely, taking (-Δ) → +∞ results in f → 0, since the energetic cost of adsorption is infinitely high.
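These limits can be checked mechanically from Equation 4.8.21 (hypothetical parameters; units with k_B = 1 and λ_T³ absorbed into the density scale):

```python
import math

# f = n*lam3 / (n*lam3 + exp(-Delta/T))   (Eq. 4.8.21, k_B = 1)
def f_occupied(n, T, Delta, lam3=1.0):
    return n * lam3 / (n * lam3 + math.exp(-Delta / T))

print(f_occupied(1e-9, 1.0, 2.0))   # dilute gas: f -> 0
print(f_occupied(1e9, 1.0, 2.0))    # dense gas: f -> 1
print(f_occupied(1.0, 1.0, 50.0))   # strong binding, (-Delta) -> -inf: f -> 1
print(f_occupied(1.0, 1.0, -50.0))  # costly adsorption, (-Delta) -> +inf: f -> 0
```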

Figure 4.8.2 : The monomers in wool are modeled as existing in one of two states. The low energy undeformed state is A, and the higher energy deformed state is
B. Applying tension induces more monomers to enter the B state.

Elasticity of wool
Wool consists of interlocking protein molecules which can stretch into an elongated configuration, but reversibly so. This feature gives wool its very useful elasticity. Let us model a chain of these proteins by assuming they can exist in one of two states, which we will call A and B, with energies ε_A and ε_B and lengths ℓ_A and ℓ_B. The situation is depicted in Figure 4.8.2. We model these conformational degrees of freedom by a spin variable σ = ±1 for each molecule, where σ = +1 in the A state and σ = -1 in the B state. Suppose a chain consisting of N monomers is placed under a tension τ. We then have

Ĥ = ∑_{j=1}^{N} [ ε_A δ_{σ_j,+1} + ε_B δ_{σ_j,-1} ] .

Similarly, the length is

L̂ = ∑_{j=1}^{N} [ ℓ_A δ_{σ_j,+1} + ℓ_B δ_{σ_j,-1} ] .

The Gibbs partition function is Y = Tr e^{-K̂/k_B T}, with K̂ = Ĥ - τL̂:

K̂ = ∑_{j=1}^{N} [ ε̃_A δ_{σ_j,+1} + ε̃_B δ_{σ_j,-1} ] ,

where ε̃_A ≡ ε_A - τℓ_A and ε̃_B ≡ ε_B - τℓ_B. At τ = 0 the A state is preferred for each monomer, but when τ exceeds τ*, defined by the relation ε̃_A = ε̃_B, the B state is preferred. One finds

τ* = (ε_B - ε_A) / (ℓ_B - ℓ_A) .

Figure 4.8.3: Upper panel: length L(τ, T) for k_B T/ε̃ = 0.01 (blue), 0.1 (green), 0.5 (dark red), and 1.0 (red). Bottom panel: dimensionless force constant k/N(Δℓ)² versus temperature.
Once again, we have a set of N noninteracting spins. The partition function is Y = ζ^N, where ζ is the single monomer partition function, ζ = Tr e^{-βĥ}, where

ĥ = ε̃_A δ_{σ_j,+1} + ε̃_B δ_{σ_j,-1}

is the single "spin" Hamiltonian. Thus,

ζ = Tr e^{-βĥ} = e^{-βε̃_A} + e^{-βε̃_B} .  (4.8.22)

It is convenient to define the differences

Δε = ε_B - ε_A ,  Δℓ = ℓ_B - ℓ_A ,  Δε̃ = ε̃_B - ε̃_A ,

in which case the partition function Y is

Y(T, τ, N) = e^{-Nβε̃_A} [ 1 + e^{-βΔε̃} ]^N ,
G(T, τ, N) = N ε̃_A - N k_B T ln[ 1 + e^{-Δε̃/k_B T} ] .

The average length is

L = ⟨L̂⟩ = -(∂G/∂τ)_{T,N} = N ℓ_A + N Δℓ / ( e^{(Δε - τΔℓ)/k_B T} + 1 ) .

The polymer behaves as a spring, and for small τ the spring constant is

k = ∂τ/∂L |_{τ=0} = ( 4k_B T / N(Δℓ)² ) cosh²( Δε/2k_B T ) .  (4.8.23)

The results are shown in Figure 4.8.3. Note that length increases with temperature for τ < τ* and decreases with temperature for τ > τ*. Note also that k diverges at both low and high temperatures. At low T, the energy gap Δε dominates and L = N ℓ_A, while at high temperatures k_B T dominates and L = ½ N(ℓ_A + ℓ_B).
L=\half N(\ell\ns_\ssr{A}+\ell\ns_\ssr{B}) .

Noninteracting spin dimers

Consider a system of noninteracting spin dimers as depicted in Figure 4.8.4. Each dimer contains two spins, and is described by the Hamiltonian

Ĥ_dimer = -J σ_1 σ_2 - μ_0 H (σ_1 + σ_2) .  (4.8.24)

Here, J is an interaction energy between the spins which comprise the dimer. If J > 0 the interaction is ferromagnetic, which prefers that the spins are aligned. That is, the lowest energy states are |↑↑⟩ and |↓↓⟩. If J < 0 the interaction is antiferromagnetic, which prefers that spins be anti-aligned: |↑↓⟩ and |↓↑⟩.

Suppose there are N_d dimers. Then the OCE partition function is Z = ζ^{N_d}, where ζ(T, H) is the single dimer partition function. To obtain ζ(T, H), we sum over the four possible states of the two spins, obtaining

ζ = Tr e^{-Ĥ_dimer/k_B T} = 2 e^{-J/k_B T} + 2 e^{J/k_B T} cosh( 2μ_0 H/k_B T ) .

Thus, the free energy is

F(T, H, N_d) = -N_d k_B T ln 2 - N_d k_B T ln[ e^{-J/k_B T} + e^{J/k_B T} cosh( 2μ_0 H/k_B T ) ] .  (4.8.25)

The magnetization is

M = -(∂F/∂H)_{T,N_d} = 2N_d μ_0 · e^{J/k_B T} sinh( 2μ_0 H/k_B T ) / ( e^{-J/k_B T} + e^{J/k_B T} cosh( 2μ_0 H/k_B T ) ) .  (4.8.26)

It is instructive to consider the zero field isothermal susceptibility per spin,

χ_T = (1/2N_d) ∂M/∂H |_{H=0} = (μ_0²/k_B T) · 2e^{J/k_B T} / ( e^{J/k_B T} + e^{-J/k_B T} ) .  (4.8.27)

The quantity μ_0²/k_B T is simply the Curie susceptibility for noninteracting classical spins. Note that we correctly recover the Curie result when J = 0, since then the individual spins comprising each dimer are in fact noninteracting. For the ferromagnetic case, if J ≫ k_B T, then we obtain

χ_T(J ≫ k_B T) ≈ 2μ_0²/k_B T .  (4.8.28)

This has the following simple interpretation. When J ≫ k_B T, the spins of each dimer are effectively locked in parallel. Thus, each dimer has an effective magnetic moment μ_eff = 2μ_0. On the other hand, there are only half as many dimers as there are spins, so the resulting Curie susceptibility per spin is ½ × (2μ_0)²/k_B T.

Figure 4.8.4: A model of noninteracting spin dimers on a lattice. Each red dot represents a classical spin for which σ_j = ±1.
When -J ≫ k_B T, the spins of each dimer are effectively locked in one of the two antiparallel configurations. We then have

χ_T(-J ≫ k_B T) ≈ (2μ_0²/k_B T) e^{-2|J|/k_B T} .  (4.8.29)

In this case, the individual dimers have essentially zero magnetic moment.
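Since each dimer has only four states, every closed form above can be checked by brute-force enumeration. The sketch below (hypothetical J and T, with k_B = 1 and μ_0 = 1) verifies ζ and the zero-field susceptibility per spin:

```python
import math

def zeta(J, H, T):
    # sum over the four dimer states (s1, s2); k_B = 1, mu_0 = 1
    return sum(math.exp((J * s1 * s2 + H * (s1 + s2)) / T)
               for s1 in (1, -1) for s2 in (1, -1))

J, H, T = 0.8, 0.3, 1.1         # hypothetical parameters
z_closed = 2 * math.exp(-J / T) + 2 * math.exp(J / T) * math.cosh(2 * H / T)

# chi_T per spin at H = 0: (1/2) T d^2 ln(zeta)/dH^2, by finite differences
dH = 1e-3
d2 = (math.log(zeta(J, dH, T)) - 2 * math.log(zeta(J, 0.0, T))
      + math.log(zeta(J, -dH, T))) / dH**2
chi_fd = 0.5 * T * d2
chi_closed = (1.0 / T) * 2 * math.exp(J / T) / (math.exp(J / T) + math.exp(-J / T))
print(zeta(J, H, T), z_closed, chi_fd, chi_closed)
```

Flipping the sign of J in the same script reproduces the exponentially suppressed antiferromagnetic result (4.8.29).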

This page titled 4.8: Selected Examples is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

4.9: Statistical Mechanics of Molecular Gases
Separation of translational and internal degrees of freedom
The states of a noninteracting atom or molecule are labeled by its total momentum p and its internal quantum numbers, which we will simply write with a collective index α, specifying rotational, vibrational, and electronic degrees of freedom. The single particle Hamiltonian is then

ĥ = p²/2m + ĥ_int ,  (4.9.1)

with

ĥ |k, α⟩ = ( ℏ²k²/2m + ε_α ) |k, α⟩ .  (4.9.2)

The partition function is

ζ = Tr e^{-βĥ} = ∑_p e^{-βp²/2m} ∑_j g_j e^{-βε_j} .  (4.9.3)

Here we have replaced the internal label α with a label j of energy eigenvalues, with g_j being the degeneracy of the internal state with energy ε_j. To do the p sum, we quantize in a box of dimensions L_1 × L_2 × ⋯ × L_d, using periodic boundary conditions. Then

p = ( 2πℏn_1/L_1 , 2πℏn_2/L_2 , … , 2πℏn_d/L_d ) ,  (4.9.4)

where each n_i is an integer. Since the differences between neighboring quantized p vectors are very tiny, we can replace the sum over p by an integral:

∑_p ⟶ ∫ d^dp / (Δp_1 ⋯ Δp_d) ,  (4.9.5)

where the volume in momentum space of an elementary rectangle is

Δp_1 ⋯ Δp_d = (2πℏ)^d / (L_1 ⋯ L_d) = (2πℏ)^d / V .  (4.9.6)

Thus,
$$\zeta = V\!\int\!\frac{d^d\!p}{(2\pi\hbar)^d}\; e^{-\mathbf p^2/2mk_BT}\sum_j g_j\, e^{-\varepsilon_j/k_BT} = V\,\lambda_T^{-d}\,\xi\ ,\qquad \xi(T) = \sum_j g_j\, e^{-\varepsilon_j/k_BT}\ .$$

Here, ξ(T) is the internal coordinate partition function. The full N-particle ordinary canonical partition function is then

$$Z_N = \frac{1}{N!}\left(\frac{V}{\lambda_T^d}\right)^{\!N}\xi^N(T)\ . \qquad (4.9.7)$$

Using Stirling’s approximation, we find the Helmholtz free energy F = −kB T ln Z is

$$F(T,V,N) = -Nk_BT\left[\ln\!\left(\frac{V}{N\lambda_T^d}\right) + 1 + \ln\xi(T)\right] = -Nk_BT\left[\ln\!\left(\frac{V}{N\lambda_T^d}\right) + 1\right] + N\varphi(T)\ ,$$

where

$$\varphi(T) = -k_BT\,\ln\xi(T) \qquad (4.9.8)$$

is the internal coordinate contribution to the single particle free energy. We could also compute the partition function in the Gibbs
(T , p, N ) ensemble:

$$Y(T,p,N) = e^{-\beta G(T,p,N)} = \frac{1}{V_0}\int\limits_0^\infty\!\!dV\, e^{-\beta pV}\,Z(T,V,N) = \left(\frac{k_BT}{p\,V_0}\right)\left(\frac{k_BT}{p\,\lambda_T^d}\right)^{\!N}\xi^N(T)\ .$$

Thus, in the thermodynamic limit,


$$\mu(T,p) = \frac{G(T,p,N)}{N} = k_BT\ln\!\left(\frac{p\,\lambda_T^d}{k_BT}\right) - k_BT\ln\xi(T) = k_BT\ln\!\left(\frac{p\,\lambda_T^d}{k_BT}\right) + \varphi(T)\ .$$

Ideal gas law


Since the internal coordinate contribution to the free energy is volume-independent, we have
$$V = \left(\frac{\partial G}{\partial p}\right)_{\!T,N} = \frac{Nk_BT}{p}\ , \qquad (4.9.9)$$

and the ideal gas law applies. The entropy is

$$S = -\left(\frac{\partial G}{\partial T}\right)_{\!p,N} = Nk_B\!\left[\ln\!\left(\frac{k_BT}{p\,\lambda_T^d}\right) + 1 + \tfrac12 d\right] - N\varphi'(T)\ , \qquad (4.9.10)$$

and therefore the heat capacity is


$$C_p = T\left(\frac{\partial S}{\partial T}\right)_{\!p,N} = \left(\tfrac12 d + 1\right)Nk_B - NT\varphi''(T)\ ,\qquad C_V = T\left(\frac{\partial S}{\partial T}\right)_{\!V,N} = \tfrac12 d\,Nk_B - NT\varphi''(T)\ .$$

Thus, any temperature variation in C_p must be due to the internal degrees of freedom.
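The relation ΔC = −NTφ″(T) can be checked numerically against the equivalent energy-fluctuation formula ΔC/N = (⟨ε²⟩ − ⟨ε⟩²)/k_BT². The sketch below is mine, not the text's, and uses units k_B = 1 with a made-up three-level internal spectrum:

```python
import math

# Hypothetical internal spectrum: (degeneracy g_j, energy eps_j). Units: k_B = 1.
levels = [(1, 0.0), (3, 1.0), (5, 2.5)]

def xi(T):
    # internal coordinate partition function xi(T) = sum_j g_j exp(-eps_j / T)
    return sum(g * math.exp(-e / T) for g, e in levels)

def phi(T):
    # internal free energy per particle, phi(T) = -k_B T ln xi(T)
    return -T * math.log(xi(T))

def dC_numeric(T, h=1e-4):
    # Delta C per particle = -T phi''(T), second derivative by central differences
    return -T * (phi(T + h) - 2 * phi(T) + phi(T - h)) / h**2

def dC_exact(T):
    # same quantity from energy fluctuations: (<eps^2> - <eps>^2) / T^2
    Z = xi(T)
    e1 = sum(g * e * math.exp(-e / T) for g, e in levels) / Z
    e2 = sum(g * e * e * math.exp(-e / T) for g, e in levels) / Z
    return (e2 - e1**2) / T**2
```

At T = 0.7, say, the two expressions agree to better than 10⁻⁵.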

The internal coordinate partition function


At energy scales of interest we can separate the internal degrees of freedom into distinct classes, writing
$$\hat h_{\rm int} = \hat h_{\rm rot} + \hat h_{\rm vib} + \hat h_{\rm elec} \qquad (4.9.11)$$

as a sum over internal Hamiltonians governing rotational, vibrational, and electronic degrees of freedom. Then

$$\xi_{\rm int} = \xi_{\rm rot}\cdot\xi_{\rm vib}\cdot\xi_{\rm elec}\ . \qquad (4.9.12)$$

Associated with each class of excitation is a characteristic temperature Θ. Rotational and vibrational temperatures of a few common molecules are listed in Table [rvftab].

Rotations
Consider a class of molecules which can be approximated as an axisymmetric top. The rotational Hamiltonian is then
$$\hat h_{\rm rot} = \frac{L_a^2 + L_b^2}{2I_1} + \frac{L_c^2}{2I_3} = \frac{\hbar^2 L(L+1)}{2I_1} + \left(\frac{1}{2I_3} - \frac{1}{2I_1}\right)L_c^2\ ,$$

where n̂_{a,b,c}(t) are the principal axes, with n̂_c the symmetry axis, and L_{a,b,c} are the components of the angular momentum vector 𝐋 about these instantaneous body-fixed principal axes. The components of 𝐋 along the space-fixed axes {x, y, z} are written as L_{x,y,z}. Note that

$$[L^\mu, L_c] = n_c^\nu\,[L^\mu, L^\nu] + [L^\mu, n_c^\nu]\,L^\nu = i\epsilon_{\mu\nu\lambda}\,n_c^\nu L^\lambda + i\epsilon_{\mu\nu\lambda}\,n_c^\lambda L^\nu = 0\ , \qquad (4.9.13)$$

which is equivalent to the statement that L_c = n̂_c · 𝐋 is a rotational scalar. We can therefore simultaneously specify the eigenvalues of {𝐋², L_z, L_c}, which form a complete set of commuting observables (CSCO)¹⁰. The eigenvalues of L_z are mℏ with m ∈ {−L, …, L}, while those of L_c are kℏ with k ∈ {−L, …, L}. There is a (2L+1)-fold degeneracy associated with the L_z quantum number.

We assume the molecule is prolate, so that I₃ < I₁. We can then define two temperature scales,

$$\Theta = \frac{\hbar^2}{2I_1 k_B}\ ,\qquad \widetilde\Theta = \frac{\hbar^2}{2I_3 k_B}\ . \qquad (4.9.14)$$

Prolateness then means Θ̃ > Θ. We conclude that the rotational partition function for an axisymmetric molecule is given by

$$\xi_{\rm rot}(T) = \sum_{L=0}^\infty (2L+1)\, e^{-L(L+1)\,\Theta/T}\sum_{k=-L}^{L} e^{-k^2(\widetilde\Theta-\Theta)/T} \qquad (4.9.15)$$

Table [rvftab]: Some rotational and vibrational temperatures of common molecules.

molecule    Θ_rot (K)            Θ_vib (K)
H₂          85.4                 6100
N₂          2.86                 3340
H₂O         13.7, 21.0, 39.4     2290, 5180, 5400

In diatomic molecules, I₃ is extremely small, and Θ̃ ≫ T at all relevant temperatures. Only the k = 0 term contributes to the partition sum, and we have

$$\xi_{\rm rot}(T) = \sum_{L=0}^\infty (2L+1)\, e^{-L(L+1)\,\Theta/T}\ . \qquad (4.9.16)$$

When T ≪ Θ, only the first few terms contribute, and

$$\xi_{\rm rot}(T) = 1 + 3\,e^{-2\Theta/T} + 5\,e^{-6\Theta/T} + \ldots \qquad (4.9.17)$$

In the high temperature limit, we have a slowly varying summand. The Euler–Maclaurin summation formula may be used to evaluate such a series:

$$\sum_{k=0}^n F_k = \int\limits_0^n\!dk\,F(k) + \tfrac12\big[F(0) + F(n)\big] + \sum_{j=1}^\infty \frac{B_{2j}}{(2j)!}\left[F^{(2j-1)}(n) - F^{(2j-1)}(0)\right] \qquad (4.9.18)$$

where B_j is the j-th Bernoulli number, with

$$B_0 = 1\ ,\quad B_1 = -\tfrac12\ ,\quad B_2 = \tfrac16\ ,\quad B_4 = -\tfrac{1}{30}\ ,\quad B_6 = \tfrac{1}{42}\ . \qquad (4.9.19)$$

Thus,

$$\sum_{k=0}^\infty F_k = \int\limits_0^\infty\!dx\,F(x) + \tfrac12\,F(0) - \tfrac{1}{12}\,F'(0) + \tfrac{1}{720}\,F'''(0) + \ldots\ . \qquad (4.9.20)$$

We have F(x) = (2x+1)\,e^{−x(x+1)Θ/T}, for which ∫₀^∞ dx F(x) = T/Θ, hence

$$\xi_{\rm rot} = \frac{T}{\Theta} + \frac13 + \frac{1}{15}\,\frac{\Theta}{T} + \frac{4}{315}\left(\frac{\Theta}{T}\right)^{\!2} + \ldots\ . \qquad (4.9.21)$$

Recall that φ(T) = −k_B T ln ξ(T). We conclude that φ_rot(T) ≈ −3k_B T e^{−2Θ/T} for T ≪ Θ and φ_rot(T) ≈ −k_B T ln(T/Θ) for T ≫ Θ. We have seen that the internal coordinate contribution to the heat capacity is ΔC_V = −NTφ″(T). For diatomic molecules, then, this contribution is exponentially suppressed for T ≪ Θ, while for high temperatures we have ΔC_V = Nk_B. One says that the rotational excitations are 'frozen out' at temperatures much below Θ. Including the first few terms, we have

$$\Delta C_V(T\ll\Theta) = 12\,Nk_B\left(\frac{\Theta}{T}\right)^{\!2} e^{-2\Theta/T} + \ldots$$

$$\Delta C_V(T\gg\Theta) = Nk_B\left\{1 + \frac{1}{45}\left(\frac{\Theta}{T}\right)^{\!2} + \frac{16}{945}\left(\frac{\Theta}{T}\right)^{\!3} + \ldots\right\}\ .$$

Note that ΔC_V overshoots its limiting value of Nk_B and asymptotically approaches it from above.
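Both the Euler–Maclaurin expansion (4.9.21) and the high-temperature overshoot can be verified by brute force. A minimal sketch (mine, not the text's), in Python with k_B = 1 and Θ = 1:

```python
import math

# Units: k_B = 1, Theta = 1; heat capacity is per molecule, in units of k_B.

def xi_rot(T, Lmax=200):
    # direct evaluation of the sum (4.9.16)
    return sum((2*L + 1) * math.exp(-L*(L + 1)/T) for L in range(Lmax + 1))

def xi_series(T):
    # the Euler-Maclaurin expansion (4.9.21), through (Theta/T)^2
    return T + 1/3 + 1/(15*T) + 4/(315*T**2)

def dC_rot(T, h=1e-3):
    # rotational Delta C_V / N k_B = -T phi''(T), with phi(T) = -T ln xi_rot(T)
    phi = lambda t: -t * math.log(xi_rot(t))
    return -T * (phi(T + h) - 2*phi(T) + phi(T - h)) / h**2
```

At T = 3Θ the truncated expansion already matches the direct sum to about 10⁻⁴, and ΔC_V/Nk_B ≈ 1.003, slightly above its asymptotic value, as claimed.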

Special care must be taken in the case of homonuclear diatomic molecules, for then only even or odd L states are allowed,
depending on the total nuclear spin. This is discussed below in §10.7.
For polyatomic molecules, the moments of inertia generally are large enough that the molecule’s rotations can be considered
classically. We then have
$$\varepsilon(L_a, L_b, L_c) = \frac{L_a^2}{2I_1} + \frac{L_b^2}{2I_2} + \frac{L_c^2}{2I_3}\ . \qquad (4.9.22)$$

We then have

$$\xi_{\rm rot}(T) = \frac{1}{g_{\rm rot}}\int\!\frac{dL_a\,dL_b\,dL_c\,d\phi\,d\theta\,d\psi}{(2\pi\hbar)^3}\; e^{-\varepsilon(L_a,L_b,L_c)/k_BT}\ , \qquad (4.9.23)$$

where (φ, θ, ψ) are the Euler angles. Recall φ ∈ [0, 2π], θ ∈ [0, π], and ψ ∈ [0, 2π]. The factor g_rot accounts for physically indistinguishable orientations of the molecule brought about by rotations, which can happen when more than one of the nuclei is the same. We then have
$$\xi_{\rm rot}(T) = \left(\frac{2k_BT}{\hbar^2}\right)^{\!3/2}\sqrt{\pi I_1 I_2 I_3}\ . \qquad (4.9.24)$$

This leads to ΔC_V = (3/2) Nk_B.

Vibrations
Vibrational frequencies are often given in units of inverse wavelength, such as cm⁻¹, called a wavenumber. To convert to a temperature scale T*, we write k_B T* = hν = hc/λ, hence T* = (hc/k_B)·λ⁻¹, and we multiply by

$$\frac{hc}{k_B} = 1.436\ {\rm K\cdot cm}\ . \qquad (4.9.25)$$

For example, infrared absorption (∼ 50 cm⁻¹ to 10⁴ cm⁻¹) reveals that the 'asymmetric stretch' mode of the H₂O molecule has a vibrational frequency of ν = 3756 cm⁻¹. The corresponding temperature scale is T* = 5394 K.

Vibrations are normal modes of oscillations. A single normal mode Hamiltonian is of the form
$$\hat h = \frac{p^2}{2m} + \tfrac12\,m\omega^2 q^2 = \hbar\omega\left(a^\dagger a + \tfrac12\right)\ . \qquad (4.9.26)$$

In general there are many vibrational modes, hence many normal mode frequencies ω_α. We then must sum over all of them, resulting in

$$\xi_{\rm vib} = \prod_\alpha \xi_{\rm vib}^{(\alpha)}\ . \qquad (4.9.27)$$

For each such normal mode, the contribution is

$$\xi = \sum_{n=0}^\infty e^{-(n+\frac12)\hbar\omega/k_BT} = e^{-\hbar\omega/2k_BT}\sum_{n=0}^\infty\left(e^{-\hbar\omega/k_BT}\right)^{\!n} = \frac{e^{-\hbar\omega/2k_BT}}{1 - e^{-\hbar\omega/k_BT}} = \frac{1}{2\sinh(\Theta/2T)}\ ,$$

where Θ = ℏω/k_B. Then

$$\varphi = k_BT\ln\big(2\sinh(\Theta/2T)\big) = \tfrac12 k_B\Theta + k_BT\ln\big(1 - e^{-\Theta/T}\big)\ .$$

The contribution to the heat capacity is

$$\Delta C_V = Nk_B\left(\frac{\Theta}{T}\right)^{\!2}\frac{e^{\Theta/T}}{\left(e^{\Theta/T}-1\right)^2} = \begin{cases} Nk_B\,(\Theta/T)^2\,e^{-\Theta/T} & (T\to 0)\\ Nk_B & (T\to\infty)\end{cases}$$
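The two limits of the vibrational (Einstein) heat capacity can be checked in a couple of lines of Python (a sketch with k_B = 1; the function name is mine):

```python
import math

def c_vib(T, Theta=1.0):
    # vibrational heat capacity per mode in units of N k_B, with k_B = 1:
    # c = (Theta/T)^2 e^{Theta/T} / (e^{Theta/T} - 1)^2
    x = Theta / T
    return x**2 * math.exp(x) / (math.exp(x) - 1)**2
```

For T ≫ Θ the function approaches 1 (the classical value Nk_B per mode), while for T ≪ Θ it is exponentially suppressed as (Θ/T)²e^{−Θ/T}.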

Two-level systems: Schottky anomaly


Consider now a two-level system, with energies ε₀ and ε₁. We define Δ ≡ ε₁ − ε₀ and assume without loss of generality that Δ > 0. The partition function is

$$\zeta = e^{-\beta\varepsilon_0} + e^{-\beta\varepsilon_1} = e^{-\beta\varepsilon_0}\left(1 + e^{-\beta\Delta}\right)\ . \qquad (4.9.28)$$

The free energy is

$$f = -k_BT\ln\zeta = \varepsilon_0 - k_BT\ln\left(1 + e^{-\Delta/k_BT}\right)\ . \qquad (4.9.29)$$

The entropy for a given two level system is then

$$s = -\frac{\partial f}{\partial T} = k_B\ln\left(1 + e^{-\Delta/k_BT}\right) + \frac{\Delta}{T}\cdot\frac{1}{e^{\Delta/k_BT}+1} \qquad (4.9.30)$$

and the heat capacity is c = T(∂s/∂T),

$$c(T) = \frac{\Delta^2}{k_BT^2}\cdot\frac{e^{\Delta/k_BT}}{\left(e^{\Delta/k_BT}+1\right)^2}\ . \qquad (4.9.31)$$

Thus,

$$c\,(T\ll\Delta/k_B) = \frac{\Delta^2}{k_BT^2}\,e^{-\Delta/k_BT}\ ,\qquad c\,(T\gg\Delta/k_B) = \frac{\Delta^2}{4k_BT^2}\ .$$

We find that c(T) has a characteristic peak at T ≈ 0.42 Δ/k_B. The heat capacity vanishes in both the low temperature and high temperature limits. At low temperatures, the gap to the excited state is much greater than k_B T, and it is not possible to populate it and store energy. At high temperatures, both ground state and excited state are equally populated, and once again there is no way to store energy.
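The quoted peak position can be recovered numerically from (4.9.31). The following sketch (mine; units k_B = 1, Δ = 1) scans the Schottky function for its maximum:

```python
import math

def c_schottky(T):
    # Schottky heat capacity (4.9.31) with k_B = 1 and Delta = 1
    x = 1.0 / T
    return x**2 * math.exp(x) / (math.exp(x) + 1)**2

# crude scan for the peak over T in [0.2, 0.8]
Ts = [0.2 + 1e-4 * i for i in range(6001)]
Tpeak = max(Ts, key=c_schottky)
```

The scan lands at T_peak ≈ 0.417 Δ/k_B, with a peak height of about 0.44 k_B per two-level system.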

4.9.5 https://phys.libretexts.org/@go/page/18758
Figure [molgas]: Heat capacity per molecule as a function of temperature for (a) heteronuclear diatomic gases, (b) a single
vibrational mode, and (c) a single two-level system.
If we have a distribution of independent two-level systems, the heat capacity of such a system is a sum over the individual Schottky
functions:

$$C(T) = \sum_i \tilde c\,(\Delta_i/k_BT) = N\!\int\limits_0^\infty\!\!d\Delta\,P(\Delta)\;\tilde c\,(\Delta/k_BT)\ , \qquad (4.9.32)$$

where N is the number of two level systems, c̃(x) = k_B x² eˣ/(eˣ+1)², and where P(Δ) is the normalized distribution function, which satisfies the normalization condition

$$\int\limits_0^\infty\!\!d\Delta\,P(\Delta) = 1\ . \qquad (4.9.33)$$

If P(Δ) ∝ Δ^r for Δ → 0, then the low temperature heat capacity behaves as C(T) ∝ T^{1+r}. Many amorphous or glassy systems contain such a distribution of two level systems, with r ≈ 0 for glasses, leading to a linear low-temperature heat capacity. The origin of these two-level systems is not always so clear, but it is generally believed to be associated with local atomic configurations for which there are two low-lying states which are close in energy. The paradigmatic example is the mixed crystalline solid (KBr)₁₋ₓ(KCN)ₓ, which over the range 0.1 ≲ x ≲ 0.6 forms an 'orientational glass' at low temperatures. The two level systems are associated with different orientations of the cyanide (CN) dipoles.
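The C(T) ∝ T^{1+r} scaling can be seen directly by integrating (4.9.32) for a flat distribution (r = 0). A sketch, with k_B = 1 and a hypothetical cutoff Dmax:

```python
import math

Dmax = 1.0   # flat distribution P(Delta) = 1/Dmax on [0, Dmax], i.e. r = 0

def c_tilde(x):
    # Schottky function c~(x) with k_B = 1, where x = Delta / T
    return x**2 * math.exp(x) / (math.exp(x) + 1)**2 if x < 500 else 0.0

def C(T, n=20000):
    # heat capacity per two-level system: integral dDelta P(Delta) c~(Delta/T),
    # evaluated by the midpoint rule
    h = Dmax / n
    return sum(c_tilde((i + 0.5) * h / T) for i in range(n)) * h / Dmax
```

For T well below Dmax/k_B, doubling T doubles C(T): the heat capacity is linear in T, as stated for glasses.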

Electronic and Nuclear Excitations


For a monatomic gas, the internal coordinate partition function arises due to electronic and nuclear degrees of freedom. Let's first consider the electronic degrees of freedom. We assume that k_B T is small compared with energy differences between successive electronic shells. The atomic ground state is then computed by filling up the hydrogenic orbitals until all the electrons are used up. If the atomic number is a 'magic number' (A = 2 (He), 10 (Ne), 18 (Ar), 36 (Kr), 54 (Xe)), then the atom has all shells filled, and L = 0 and S = 0. Otherwise the last shell is partially filled and one or both of L and S will be nonzero. The atomic ground state configuration $^{2S+1}L_J$ is then determined by Hund's rules:

1. The LS multiplet with the largest S has the lowest energy.
2. If the largest value of S is associated with several multiplets, the multiplet with the largest L has the lowest energy.
3. If an incomplete shell is not more than half-filled, then the lowest energy state has J = |L − S|. If the shell is more than half-filled, then J = L + S.
The last of Hund's rules distinguishes between the (2S+1)(2L+1) states which result upon fixing S and L as per rules #1 and #2. It arises due to the atomic spin-orbit coupling, whose effective Hamiltonian may be written Ĥ = Λ 𝐋·𝐒, where Λ is the Russell-Saunders coupling. If the last shell is less than or equal to half-filled, then Λ > 0 and the ground state has J = |L − S|. If the last shell is more than half-filled, the coupling is inverted, Λ < 0, and the ground state has J = L + S.¹¹
The electronic contribution to ξ is then

$$\xi_{\rm elec} = \sum_{J=|L-S|}^{L+S} (2J+1)\, e^{-\Delta\varepsilon(L,S,J)/k_BT} \qquad (4.9.34)$$

where

$$\Delta\varepsilon(L,S,J) = \tfrac12\Lambda\big[J(J+1) - L(L+1) - S(S+1)\big]\ . \qquad (4.9.35)$$

At high temperatures, k_B T is larger than the energy difference between the different J multiplets, and we have ξ_elec ∼ (2L+1)(2S+1) e^{−βε₀}, where ε₀ is the ground state energy. At low temperatures, a particular value of J is selected – that determined by Hund's third rule – and we have ξ_elec ∼ (2J+1) e^{−βε₀}. If, in addition, there is a nonzero nuclear spin I, then we also must include a factor ξ_nuc = (2I+1), neglecting the small hyperfine splittings due to the coupling of nuclear and electronic angular momenta.
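The interpolation of ξ_elec between its two limits can be illustrated with purely hypothetical values of L, S, and Λ (these numbers are mine, chosen only so that the J-sum is small):

```python
import math

# Illustrative (hypothetical) multiplet: L = 1, S = 1, Lambda = 1 > 0,
# so Hund's third rule picks the ground multiplet J = |L - S| = 0.
L, S, Lam = 1, 1, 1.0

def de(J):
    # Delta-eps(L, S, J) from (4.9.35), with k_B = 1
    return 0.5 * Lam * (J*(J + 1) - L*(L + 1) - S*(S + 1))

def xi_elec(T):
    # the J-sum of (4.9.34), measuring energies from the ground multiplet
    e0 = de(abs(L - S))
    return sum((2*J + 1) * math.exp(-(de(J) - e0) / T)
               for J in range(abs(L - S), L + S + 1))
```

At high T the sum reproduces the full degeneracy (2L+1)(2S+1) = 9; at low T only the ground multiplet survives, giving 2J+1 = 1.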
For heteronuclear diatomic molecules, molecules composed from two different atomic nuclei, the internal partition function simply receives a factor of ξ_elec · ξ_nuc^(1) · ξ_nuc^(2), where the first term is a sum over molecular electronic states, and the second two terms arise from the spin degeneracies of the two nuclei. For homonuclear diatomic molecules, the exchange of nuclear centers is a symmetry operation, and does not represent a distinct quantum state. To correctly count the electronic states, we first assume that the total electronic spin is S = 0. This is generally a very safe assumption. Exchange symmetry now puts restrictions on the possible values of the molecular angular momentum L, depending on the total nuclear angular momentum I_tot. If I_tot is even, then the molecular angular momentum L must also be even. If the total nuclear angular momentum is odd, then L must be odd. This is so because the molecular ground state configuration is $^1\Sigma_g^+$.¹²

The total number of nuclear states for the molecule is (2I+1)², of which some are even under nuclear exchange, and some are odd. The number of even states, corresponding to even total nuclear angular momentum, is written as g_g, where the subscript conventionally stands for the (mercifully short) German word gerade, meaning 'even'. The number of odd (Ger. ungerade) states is written g_u. Table [nucspin] gives the values of g_{g,u} corresponding to half-odd-integer I and integer I.

The final answer for the rotational component of the internal molecular partition function is then

$$\xi_{\rm rot}(T) = g_g\,\zeta_g + g_u\,\zeta_u\ , \qquad (4.9.36)$$

where

$$\zeta_g = \sum_{L\,{\rm even}} (2L+1)\, e^{-L(L+1)\,\Theta_{\rm rot}/T}\ ,\qquad \zeta_u = \sum_{L\,{\rm odd}} (2L+1)\, e^{-L(L+1)\,\Theta_{\rm rot}/T}\ .$$

For hydrogen, the molecules with the larger nuclear statistical weight are called orthohydrogen and those with the smaller statistical weight are called parahydrogen. For H₂, we have I = ½, hence the ortho state has g_u = 3 and the para state has g_g = 1. In D₂, we have I = 1 and the ortho state has g_g = 6 while the para state has g_u = 3. In equilibrium, the ratio of ortho to para states is then

$$\frac{N^{\rm ortho}_{\rm H_2}}{N^{\rm para}_{\rm H_2}} = \frac{g_u\,\zeta_u}{g_g\,\zeta_g} = \frac{3\,\zeta_u}{\zeta_g}\ ,\qquad \frac{N^{\rm ortho}_{\rm D_2}}{N^{\rm para}_{\rm D_2}} = \frac{g_g\,\zeta_g}{g_u\,\zeta_u} = \frac{2\,\zeta_g}{\zeta_u}\ . \qquad (4.9.37)$$
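Equation (4.9.37) is easy to evaluate numerically using Θ_rot = 85.4 K for H₂ from Table [rvftab] (the code and its truncation at L = 60 are mine):

```python
import math

Theta = 85.4   # rotational temperature of H2 in kelvin, from Table [rvftab]

def zeta(T, parity):
    # zeta_g (parity = 0, L even) or zeta_u (parity = 1, L odd)
    return sum((2*L + 1) * math.exp(-L*(L + 1)*Theta/T)
               for L in range(parity, 60, 2))

def ortho_para(T):
    # equilibrium N_ortho / N_para for H2, Equation (4.9.37)
    return 3 * zeta(T, 1) / zeta(T, 0)
```

At 300 K the equilibrium ratio is already ≈ 2.99, close to the high-temperature value g_u/g_g = 3, while at 20 K almost all molecules are para (L = 0).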

Table [nucspin]: Number of even (g_g) and odd (g_u) total nuclear angular momentum states for a homonuclear diatomic molecule. I is the ground state nuclear spin.

2I      g_g                  g_u
odd     I(2I+1)              (I+1)(2I+1)
even    (I+1)(2I+1)          I(2I+1)

Incidentally, how do we derive the results in Table [nucspin]? The total nuclear angular momentum I_tot is the quantum mechanical sum of the two individual nuclear angular momenta, each of which is of magnitude I. From elementary addition of angular momenta, we have

$$I \otimes I = 0 \oplus 1 \oplus 2 \oplus \cdots \oplus 2I\ . \qquad (4.9.38)$$

The right hand side of the above equation lists all the possible multiplets. Thus, I_tot ∈ {0, 1, …, 2I}. Now let us count the total number of states with even I_tot. If 2I is even, which is to say if I is an integer, we have

$$g_g^{(2I={\rm even})} = \sum_{n=0}^{I}\big\{2\cdot(2n)+1\big\} = (I+1)(2I+1)\ , \qquad (4.9.39)$$

because the degeneracy of each multiplet is 2I_tot + 1. It follows that

$$g_u^{(2I={\rm even})} = (2I+1)^2 - g_g = I\,(2I+1)\ . \qquad (4.9.40)$$

On the other hand, if 2I is odd, which is to say I is a half odd integer, then

$$g_g^{(2I={\rm odd})} = \sum_{n=0}^{I-\frac12}\big\{2\cdot(2n)+1\big\} = I\,(2I+1)\ . \qquad (4.9.41)$$

It follows that

$$g_u^{(2I={\rm odd})} = (2I+1)^2 - g_g = (I+1)(2I+1)\ . \qquad (4.9.42)$$
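The counting above can be confirmed by brute force, summing the multiplet degeneracies 2I_tot + 1 over even and odd I_tot (a sketch; function name mine):

```python
from fractions import Fraction

def g_even_odd(I):
    # count states with even / odd I_tot in I (x) I = 0 (+) 1 (+) ... (+) 2I,
    # where the multiplet I_tot contributes 2*I_tot + 1 states
    twoI = int(2 * I)
    gg = sum(2*It + 1 for It in range(twoI + 1) if It % 2 == 0)
    gu = sum(2*It + 1 for It in range(twoI + 1) if It % 2 == 1)
    return gg, gu

# H2 has I = 1/2: one even (para) state and three odd (ortho) states
gg_H2, gu_H2 = g_even_odd(Fraction(1, 2))
```

For I = ½ this returns (1, 3), reproducing g_g = 1 and g_u = 3 for H₂; looping over integer and half-odd-integer I reproduces both rows of Table [nucspin].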

This page titled 4.9: Statistical Mechanics of Molecular Gases is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

4.10: Appendix I- Additional Examples
Three state system
Consider a spin-1 particle where σ = −1, 0, +1. We model this with the single particle Hamiltonian

$$\hat h = -\mu_0 H\,\sigma + \Delta\,(1-\sigma^2)\ . \qquad (4.10.1)$$

We can also interpret this as describing a spin if σ = ±1 and a vacancy if σ = 0. The parameter Δ then represents the vacancy formation energy. The single particle partition function is

$$\zeta = {\rm Tr}\;e^{-\beta\hat h} = e^{-\beta\Delta} + 2\cosh(\beta\mu_0 H)\ . \qquad (4.10.2)$$

With N_S distinguishable noninteracting spins (at different sites in a crystalline lattice), we have Z = ζ^{N_S} and

$$F \equiv N_S\,f = -k_BT\ln Z = -N_S\,k_BT\ln\!\left[e^{-\beta\Delta} + 2\cosh(\beta\mu_0 H)\right]\ , \qquad (4.10.3)$$

where f = −k_B T ln ζ is the free energy of a single particle. Note that

$$\hat n_V = 1 - \sigma^2 = \frac{\partial\hat h}{\partial\Delta}\ ,\qquad \hat m = \mu_0\,\sigma = -\frac{\partial\hat h}{\partial H}$$

are the vacancy number and magnetization, respectively. Thus,

$$n_V = \langle\hat n_V\rangle = \frac{\partial f}{\partial\Delta} = \frac{e^{-\Delta/k_BT}}{e^{-\Delta/k_BT} + 2\cosh(\mu_0 H/k_BT)} \qquad (4.10.4)$$

and

$$m = \langle\hat m\rangle = -\frac{\partial f}{\partial H} = \frac{2\mu_0\sinh(\mu_0 H/k_BT)}{e^{-\Delta/k_BT} + 2\cosh(\mu_0 H/k_BT)}\ . \qquad (4.10.5)$$

At weak fields we can compute

$$\chi_T = \frac{\partial m}{\partial H}\bigg|_{H=0} = \frac{\mu_0^2}{k_BT}\cdot\frac{2}{2 + e^{-\Delta/k_BT}}\ . \qquad (4.10.6)$$

We thus obtain a modified Curie law. At temperatures T ≪ Δ/k_B, the vacancies are frozen out and we recover the usual Curie behavior. At high temperatures, where T ≫ Δ/k_B, the low temperature result is reduced by a factor of ⅔, which accounts for the fact that one third of the time the particle is in a nonmagnetic state with σ = 0.
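The modified Curie law (4.10.6) can be checked by differentiating m(H) from (4.10.5) numerically (a sketch, in units k_B = μ₀ = 1):

```python
import math

Delta = 1.0   # vacancy formation energy; units k_B = mu_0 = 1

def m(H, T):
    # magnetization, Equation (4.10.5)
    return 2*math.sinh(H/T) / (math.exp(-Delta/T) + 2*math.cosh(H/T))

def chi_numeric(T, h=1e-6):
    # weak-field susceptibility by central differences
    return (m(h, T) - m(-h, T)) / (2*h)

def chi_formula(T):
    # Equation (4.10.6)
    return (1.0/T) * 2.0 / (2.0 + math.exp(-Delta/T))
```

The product Tχ_T approaches 1 (pure Curie) for T ≪ Δ/k_B and ⅔ of that for T ≫ Δ/k_B.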

Spins and vacancies on a surface


A collection of spin-½ particles is confined to a surface with N sites. For each site, let σ = 0 if there is a vacancy, σ = +1 if there is a particle present with spin up, and σ = −1 if there is a particle present with spin down. The particles are non-interacting, and the energy for each site is given by ε = −Wσ², where −W < 0 is the binding energy.

Let Q = N_↑ + N_↓ be the number of spins, and N₀ be the number of vacancies. The surface magnetization is M = N_↑ − N_↓. Compute, in the microcanonical ensemble, the statistical entropy S(Q, M).


Let q = Q/N and m = M/N be the dimensionless particle density and magnetization density, respectively. Assuming that we are in the thermodynamic limit, where N, Q, and M all tend to infinity, but with q and m finite, find the temperature T(q, m). Recall Stirling's formula

$$\ln(N!) = N\ln N - N + \mathcal O(\ln N)\ . \qquad (4.10.7)$$

Show explicitly that T can be negative for this system. What does negative T mean? What physical degrees of freedom have been left out that would avoid this strange property?
There is a constraint on N_↑, N₀, and N_↓:

$$N_\uparrow + N_0 + N_\downarrow = Q + N_0 = N\ . \qquad (4.10.8)$$

The total energy of the system is E = −WQ.
The number of states available to the system is

$$\Omega = \frac{N!}{N_\uparrow!\,N_0!\,N_\downarrow!}\ . \qquad (4.10.9)$$

Fixing Q and M, along with the above constraint, is enough to completely determine {N_↑, N₀, N_↓}:

$$N_\uparrow = \tfrac12(Q+M)\ ,\qquad N_0 = N - Q\ ,\qquad N_\downarrow = \tfrac12(Q-M)\ , \qquad (4.10.10)$$

whence

$$\Omega(Q,M) = \frac{N!}{\big[\tfrac12(Q+M)\big]!\,\big[\tfrac12(Q-M)\big]!\,(N-Q)!}\ . \qquad (4.10.11)$$

The statistical entropy is S = k_B ln Ω:

$$S(Q,M) = k_B\ln(N!) - k_B\ln\!\Big[\big(\tfrac12(Q+M)\big)!\Big] - k_B\ln\!\Big[\big(\tfrac12(Q-M)\big)!\Big] - k_B\ln\big[(N-Q)!\big]\ . \qquad (4.10.12)$$

Now we invoke Stirling's rule,

$$\ln(N!) = N\ln N - N + \mathcal O(\ln N)\ , \qquad (4.10.13)$$

to obtain

$$\ln\Omega(Q,M) = N\ln N - N - \tfrac12(Q+M)\ln\!\big[\tfrac12(Q+M)\big] + \tfrac12(Q+M) - \tfrac12(Q-M)\ln\!\big[\tfrac12(Q-M)\big] + \tfrac12(Q-M) - (N-Q)\ln(N-Q) + (N-Q)$$

$$= N\ln N - \tfrac12\,Q\ln\!\big[\tfrac14(Q^2-M^2)\big] - \tfrac12\,M\ln\!\left(\frac{Q+M}{Q-M}\right) - (N-Q)\ln(N-Q)\ .$$

Combining terms,

$$\ln\Omega(Q,M) = -Nq\ln\!\Big[\tfrac12\sqrt{q^2-m^2}\,\Big] - \tfrac12 Nm\ln\!\left(\frac{q+m}{q-m}\right) - N(1-q)\ln(1-q)\ , \qquad (4.10.14)$$

where Q = Nq and M = Nm. Note that the entropy S = k_B ln Ω is extensive. The statistical entropy per site is thus

$$s(q,m) = -k_B\,q\ln\!\Big[\tfrac12\sqrt{q^2-m^2}\,\Big] - \tfrac12 k_B\,m\ln\!\left(\frac{q+m}{q-m}\right) - k_B(1-q)\ln(1-q)\ . \qquad (4.10.15)$$

The temperature is obtained from the relation

$$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{\!M} = \frac{1}{W}\left(\frac{\partial s}{\partial q}\right)_{\!m} = \frac{k_B}{W}\,\ln(1-q) - \frac{k_B}{W}\,\ln\!\Big[\tfrac12\sqrt{q^2-m^2}\,\Big]\ .$$

Thus,

$$T = \frac{W/k_B}{\ln\!\big[2(1-q)\big/\!\sqrt{q^2-m^2}\,\big]}\ . \qquad (4.10.16)$$

We have 0 ≤ q ≤ 1 and −q ≤ m ≤ q, so T is real (thank heavens!). But it is easy to choose {q, m} such that T < 0. For example, when m = 0 we have T = W/[k_B ln(2q⁻¹ − 2)], and T < 0 for all q ∈ (⅔, 1]. The reason for this strange state of affairs is that the entropy S is bounded, and is not a monotonically increasing function of the energy E (or the dimensionless quantity Q). The entropy is maximized for N_↑ = N₀ = N_↓ = ⅓N, which says m = 0 and q = ⅔. Increasing q beyond this point (with m = 0 fixed) starts to reduce the entropy, and hence (∂S/∂E) < 0 in this range, which immediately gives T < 0. What we've left out are kinetic degrees of freedom, such as vibrations and rotations, whose energies are unbounded, and which result in an increasing S(E) function.
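The location of the entropy maximum is easy to confirm numerically from (4.10.15) at m = 0 (a sketch, with k_B = 1):

```python
import math

def s(q):
    # entropy per site at m = 0, from (4.10.15) with k_B = 1:
    # s(q, 0) = -q ln(q/2) - (1 - q) ln(1 - q)
    return -q * math.log(q / 2) - (1 - q) * math.log(1 - q)

# scan q on a fine grid in (0, 1)
qs = [i / 10000 for i in range(1, 10000)]
qstar = max(qs, key=s)   # should land at q = 2/3, where s = ln 3
```

The maximizer is q* = ⅔ with s(q*) = ln 3, i.e. N_↑ = N₀ = N_↓ = N/3, as stated.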

Fluctuating Interface
Consider an interface between two dissimilar fluids. In equilibrium, in a uniform gravitational field, the denser fluid is on the bottom. Let z = z(x, y) be the height of the interface between the fluids, relative to equilibrium. The potential energy is a sum of gravitational and surface tension terms, with

$$U_{\rm grav} = \int\!d^2x\!\int\limits_0^{z}\!dz'\;\Delta\rho\,g\,z'\ ,\qquad U_{\rm surf} = \tfrac12\int\!d^2x\;\sigma\,(\nabla z)^2\ .$$

We won't need the kinetic energy in our calculations, but we can include it just for completeness. It isn't so clear how to model it a priori, so we will assume a rather general form

$$T = \tfrac12\int\!d^2x\!\int\!d^2x'\;\mu(\mathbf x,\mathbf x')\,\frac{\partial z(\mathbf x,t)}{\partial t}\,\frac{\partial z(\mathbf x',t)}{\partial t}\ . \qquad (4.10.17)$$

We assume that the (x, y) plane is a rectangle of dimensions L_x × L_y. We also assume μ(𝐱, 𝐱′) = μ(|𝐱 − 𝐱′|). We can then Fourier transform

$$z(\mathbf x) = (L_xL_y)^{-1/2}\sum_{\mathbf k} z_{\mathbf k}\;e^{i\mathbf k\cdot\mathbf x}\ , \qquad (4.10.18)$$

where the wavevectors 𝐤 are quantized according to

$$\mathbf k = \frac{2\pi n_x}{L_x}\,\hat{\mathbf x} + \frac{2\pi n_y}{L_y}\,\hat{\mathbf y}\ , \qquad (4.10.19)$$

with integer n_x and n_y, if we impose periodic boundary conditions (for calculational convenience). The Lagrangian is then

$$\mathcal L = \tfrac12\sum_{\mathbf k}\Big[\,\mu_{\mathbf k}\,\big|\dot z_{\mathbf k}\big|^2 - \big(g\,\Delta\rho + \sigma k^2\big)\,\big|z_{\mathbf k}\big|^2\,\Big]\ , \qquad (4.10.20)$$

where

$$\mu_{\mathbf k} = \int\!d^2x\;\mu(|\mathbf x|)\,e^{-i\mathbf k\cdot\mathbf x}\ . \qquad (4.10.21)$$

Since z(𝐱, t) is real, we have the relation z_{−𝐤} = z*_{𝐤}, therefore the Fourier coefficients at 𝐤 and −𝐤 are not independent. The canonical momenta are given by

$$p_{\mathbf k} = \frac{\partial\mathcal L}{\partial\dot z^*_{\mathbf k}} = \mu_{\mathbf k}\,\dot z_{\mathbf k}\ ,\qquad p^*_{\mathbf k} = \frac{\partial\mathcal L}{\partial\dot z_{\mathbf k}} = \mu_{\mathbf k}\,\dot z^*_{\mathbf k} \qquad (4.10.22)$$

The Hamiltonian is then

$$\hat H = {\sum_{\mathbf k}}'\Big[\,p_{\mathbf k}\,\dot z^*_{\mathbf k} + p^*_{\mathbf k}\,\dot z_{\mathbf k}\,\Big] - \mathcal L = {\sum_{\mathbf k}}'\left[\frac{|p_{\mathbf k}|^2}{\mu_{\mathbf k}} + \big(g\,\Delta\rho + \sigma k^2\big)\,|z_{\mathbf k}|^2\right]\ ,$$

where the prime on the 𝐤 sum indicates that only one of the pair {𝐤, −𝐤} is to be included, for each 𝐤.
We may now compute the ordinary canonical partition function:

$$Z = {\prod_{\mathbf k}}'\int\!\frac{d^2\!p_{\mathbf k}\;d^2\!z_{\mathbf k}}{(2\pi\hbar)^2}\;e^{-|p_{\mathbf k}|^2/\mu_{\mathbf k}k_BT}\;e^{-(g\,\Delta\rho+\sigma k^2)\,|z_{\mathbf k}|^2/k_BT} = {\prod_{\mathbf k}}'\left(\frac{k_BT}{2\hbar}\right)^{\!2}\left(\frac{\mu_{\mathbf k}}{g\,\Delta\rho+\sigma k^2}\right)\ .$$

Thus,

$$F = -k_BT\sum_{\mathbf k}\ln\!\left(\frac{k_BT}{2\hbar\,\Omega_{\mathbf k}}\right)\ , \qquad (4.10.23)$$

where¹³

$$\Omega_{\mathbf k} = \left(\frac{g\,\Delta\rho + \sigma k^2}{\mu_{\mathbf k}}\right)^{\!1/2} \qquad (4.10.24)$$

is the normal mode frequency for surface oscillations at wavevector 𝐤. For deep water waves, it is appropriate to take μ_𝐤 = Δρ/|𝐤|, where Δρ = ρ_L − ρ_G ≈ ρ_L is the difference between the densities of water and air.

It is now easy to compute the thermal average

$$\big\langle\,|z_{\mathbf k}|^2\,\big\rangle = \int\!d^2\!z_{\mathbf k}\,|z_{\mathbf k}|^2\;e^{-(g\,\Delta\rho+\sigma k^2)\,|z_{\mathbf k}|^2/k_BT}\bigg/\!\int\!d^2\!z_{\mathbf k}\;e^{-(g\,\Delta\rho+\sigma k^2)\,|z_{\mathbf k}|^2/k_BT} = \frac{k_BT}{g\,\Delta\rho+\sigma k^2}\ .$$

Note that this result does not depend on μ_𝐤, i.e. on our choice of kinetic energy. One defines the correlation function

$$C(\mathbf x) \equiv \big\langle\,z(\mathbf x)\,z(0)\,\big\rangle = \frac{1}{L_xL_y}\sum_{\mathbf k}\big\langle\,|z_{\mathbf k}|^2\,\big\rangle\,e^{i\mathbf k\cdot\mathbf x} = \int\!\frac{d^2\!k}{(2\pi)^2}\left(\frac{k_BT}{g\,\Delta\rho+\sigma k^2}\right)e^{i\mathbf k\cdot\mathbf x} = \frac{k_BT}{4\pi\sigma}\int\limits_0^\infty\!\!dq\;\frac{e^{iq|\mathbf x|}}{\sqrt{q^2+\xi^{-2}}} = \frac{k_BT}{4\pi\sigma}\,K_0\big(|\mathbf x|/\xi\big)\ ,$$

where ξ = √(σ/g Δρ) is the correlation length, and where K₀(z) is the Bessel function of imaginary argument. The asymptotic behavior of K₀(z) for small z is K₀(z) ∼ ln(2/z), whereas for large z one has K₀(z) ∼ (π/2z)^{1/2} e^{−z}. We see that on large length scales the correlations decay exponentially, but on small length scales they diverge. This divergence is due to the improper energetics we have assigned to short wavelength fluctuations of the interface. Roughly speaking, it can be cured by imposing a cutoff on the integral, or by insisting that the shortest distance scale is a molecular diameter.
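The two quoted asymptotics of K₀ can be checked from the integral representation K₀(z) = ∫₀^∞ e^{−z cosh t} dt, evaluated by simple quadrature (a sketch, valid for moderate z; the small-z law ln(2/z) is the leading logarithm, the full small-z expansion being ln(2/z) − γ with γ the Euler–Mascheroni constant):

```python
import math

def K0(z, n=4000):
    # K_0(z) = integral_0^inf exp(-z cosh t) dt, by composite Simpson's rule;
    # the range is truncated where z cosh t reaches 700 (integrand ~ e^-700)
    tmax = math.acosh(700.0 / z)
    h = tmax / n
    f = lambda t: math.exp(-z * math.cosh(t))
    acc = f(0.0) + f(tmax)
    for i in range(1, n):
        acc += f(i * h) * (4 if i % 2 else 2)
    return acc * h / 3

gamma = 0.5772156649   # Euler-Mascheroni constant
```

Numerically, K0(0.01) ≈ ln(2/0.01) − γ, while K0(8) matches √(π/16) e⁻⁸ to within about 1.5% (the asymptotic series carries a 1/(8z) correction).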

Dissociation of Molecular Hydrogen


Consider the reaction

$$\mathrm{H} \;\rightleftharpoons\; \mathrm{p}^+ + \mathrm{e}^-\ . \qquad (4.10.25)$$

In equilibrium, we have

$$\mu_{\rm H} = \mu_{\rm p} + \mu_{\rm e}\ . \qquad (4.10.26)$$

What is the relationship between the temperature T and the fraction x of hydrogen which is dissociated?

Let us assume a fraction x of the hydrogen is dissociated. Then the densities of H, p, and e are

$$n_{\rm H} = (1-x)\,n\ ,\qquad n_{\rm p} = xn\ ,\qquad n_{\rm e} = xn\ . \qquad (4.10.27)$$

The partition function for each species is

$$\zeta = \frac{1}{N!}\left(\frac{gV}{\lambda_T^3}\right)^{\!N} e^{-N\varepsilon_{\rm int}/k_BT}\ , \qquad (4.10.28)$$

where g is the degeneracy and ε_int the internal energy for a given species. We have ε_int = 0 for p and e, and ε_int = −Δ for H, where Δ = e²/2a_B = 13.6 eV, the binding energy of hydrogen. Neglecting hyperfine splittings¹⁴, we have g_H = 4, while g_e = g_p = 2 because each has spin S = ½. Thus, the associated grand potentials are

$$\Omega_{\rm H} = -g_{\rm H}\,Vk_BT\,\lambda_{T,{\rm H}}^{-3}\;e^{(\mu_{\rm H}+\Delta)/k_BT}$$

$$\Omega_{\rm p} = -g_{\rm p}\,Vk_BT\,\lambda_{T,{\rm p}}^{-3}\;e^{\mu_{\rm p}/k_BT}$$

$$\Omega_{\rm e} = -g_{\rm e}\,Vk_BT\,\lambda_{T,{\rm e}}^{-3}\;e^{\mu_{\rm e}/k_BT}\ ,$$

where

$$\lambda_{T,a} = \sqrt{\frac{2\pi\hbar^2}{m_a k_BT}} \qquad (4.10.29)$$

for species a. The corresponding number densities are

$$n = -\frac{1}{V}\left(\frac{\partial\Omega}{\partial\mu}\right)_{\!T,V} = g\,\lambda_T^{-3}\;e^{(\mu-\varepsilon_{\rm int})/k_BT}\ , \qquad (4.10.30)$$

and the fugacity z = e^{μ/k_BT} of a given species is given by

$$z = g^{-1}\,n\,\lambda_T^3\;e^{\varepsilon_{\rm int}/k_BT}\ . \qquad (4.10.31)$$

We now invoke μ_H = μ_p + μ_e, which says z_H = z_p z_e, or

$$g_{\rm H}^{-1}\,n_{\rm H}\,\lambda_{T,{\rm H}}^3\;e^{-\Delta/k_BT} = \Big(g_{\rm p}^{-1}\,n_{\rm p}\,\lambda_{T,{\rm p}}^3\Big)\Big(g_{\rm e}^{-1}\,n_{\rm e}\,\lambda_{T,{\rm e}}^3\Big)\ , \qquad (4.10.32)$$

which yields

$$\left(\frac{x^2}{1-x}\right)n\,\tilde\lambda_T^3 = e^{-\Delta/k_BT}\ , \qquad (4.10.33)$$

where λ̃_T = √(2πℏ²/m* k_B T), with m* = m_p m_e/m_H ≈ m_e. Note that

$$\tilde\lambda_T = a_{\rm B}\sqrt{\frac{4\pi m_{\rm H}}{m_{\rm p}}}\;\sqrt{\frac{\Delta}{k_BT}}\ , \qquad (4.10.34)$$

where a_B = 0.529 Å is the Bohr radius. Thus, we have
$$\left(\frac{x^2}{1-x}\right)\cdot(4\pi)^{3/2}\,\nu = \left(\frac{T}{T_0}\right)^{\!3/2}e^{-T_0/T}\ , \qquad (4.10.35)$$

where T₀ = Δ/k_B = 1.578 × 10⁵ K and ν = na_B³. Consider for example a temperature T = 3000 K, for which T₀/T = 52.6, and assume that x = ½. We then find ν = 1.69 × 10⁻²⁷, corresponding to a density of n = 1.14 × 10⁻² cm⁻³. At this temperature, the fraction of hydrogen atoms in their first excited (2s) state is x′ ∼ e^{−T₀/2T} = 3.8 × 10⁻¹². This is quite striking: half the hydrogen atoms are completely dissociated, which requires an energy of Δ, yet the number in their first excited state, requiring energy ½Δ, is twelve orders of magnitude smaller. The student should reflect on why this can be the case.
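The numbers quoted in this paragraph follow directly from (4.10.35) and can be reproduced in a few lines (a sketch; cgs units for the density):

```python
import math

# Worked numbers from (4.10.35): T0/T = 52.6, x = 1/2, nu = n a_B^3
T0_over_T = 52.6
x = 0.5

rhs = (1 / T0_over_T)**1.5 * math.exp(-T0_over_T)    # (T/T0)^{3/2} e^{-T0/T}
nu = rhs / ((x**2 / (1 - x)) * (4 * math.pi)**1.5)   # dimensionless density

aB = 0.529e-8                        # Bohr radius in cm
n = nu / aB**3                       # number density in cm^-3
xprime = math.exp(-T0_over_T / 2)    # fraction in the first excited state
```

This reproduces ν ≈ 1.69 × 10⁻²⁷, n ≈ 1.14 × 10⁻² cm⁻³, and x′ ≈ 3.8 × 10⁻¹².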

This page titled 4.10: Appendix I- Additional Examples is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

4.S: Summary
References
F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987) This has been perhaps the most popular
undergraduate text since it first appeared in 1967, and with good reason.
A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000) A very relaxed treatment appropriate for
undergraduate physics majors.
D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000) This is the best undergraduate thermodynamics
book I’ve come across, but only 40% of the book treats statistical mechanics.
C. Kittel, Elementary Statistical Physics (Dover, 2004) Remarkably crisp, though dated, this text is organized as a series of brief
discussions of key concepts and examples. Published by Dover, so you can’t beat the price.
M. Kardar, Statistical Physics of Particles (Cambridge, 2007) A superb modern text, with many insightful presentations of key
concepts.
M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006) An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.
E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980) This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated, it still contains a wealth of information and physical insight.

Summary
∙ Distributions: Let ϱ(φ) be a normalized distribution on phase space. Then

$$\langle f(\varphi)\rangle = {\rm Tr}\,\big[\varrho(\varphi)\,f(\varphi)\big] = \int\!d\mu\;\varrho(\varphi)\,f(\varphi)\ , \qquad (4.S.1)$$

where dμ = W(φ)∏_i dφ_i is the phase space measure. For a Hamiltonian system of N identical indistinguishable point particles in d space dimensions, we have

$$d\mu = \frac{1}{N!}\prod_{i=1}^N\frac{d^d\!p_i\,d^d\!q_i}{(2\pi\hbar)^d}\ . \qquad (4.S.2)$$

The 1/N! prefactor accounts for indistinguishability. Normalization means Tr ϱ = 1.

∙ Microcanonical ensemble (μCE): ϱ(φ) = δ(E − Ĥ(φ))/D(E), where D(E) = Tr δ(E − Ĥ(φ)) is the density of states and Ĥ(φ) = Ĥ(q, p) is the Hamiltonian. The energy E, volume V, and particle number N are held fixed. Thus, the density of states D(E, V, N) is a function of all three variables. The statistical entropy is S(E, V, N) = k_B ln D(E, V, N), where k_B is Boltzmann's constant. Since D has dimensions of E⁻¹, an arbitrary energy scale is necessary to convert D to a dimensionless quantity before taking the log. In the thermodynamic limit, one has

$$S(E,V,N) = Nk_B\,\phi\!\left(\frac{E}{N},\frac{V}{N}\right)\ . \qquad (4.S.3)$$

The differential of E is defined to be dE = T dS − p dV + μ dN, thus T = (∂E/∂S)_{V,N} is the temperature, p = −(∂E/∂V)_{S,N} is the pressure, and μ = (∂E/∂N)_{S,V} is the chemical potential. Note that E, S, V, and N are all extensive quantities: they are halved when the system itself is halved.


∙ Ordinary canonical ensemble (OCE): In the OCE, energy fluctuates, while V, N, and the temperature T are fixed. The distribution is ϱ = Z⁻¹ e^{−βĤ}, where β = 1/k_B T and Z = Tr e^{−βĤ} is the partition function. Note that Z is the Laplace transform of the density of states: Z = ∫dE D(E) e^{−βE}. The Boltzmann entropy is S = −k_B Tr(ϱ ln ϱ). This entails F = E − TS, where F = −k_B T ln Z is the Helmholtz free energy, a Legendre transform of the energy E. From this we derive dF = −S dT − p dV + μ dN.

∙ Grand canonical ensemble (GCE): In the GCE, both E and N fluctuate, while T, V, and chemical potential μ remain fixed. Then ϱ = Ξ⁻¹ e^{−β(Ĥ−μN̂)}, where Ξ = Tr e^{−β(Ĥ−μN̂)} is the grand partition function and Ω = −k_B T ln Ξ is the grand potential.

Assuming [Ĥ, N̂] = 0, we can label states |n⟩ by both energy and particle number. Then P_n = Ξ⁻¹ e^{−β(E_n−μN_n)}. We also have Ω = E − TS − μN, hence dΩ = −S dT − p dV − N dμ.

∙ Thermodynamics: From E = Tr(ϱĤ), we have dE = Tr(Ĥ dϱ) + Tr(ϱ dĤ) = đQ − đW, where đQ = T dS and

$$đW = -{\rm Tr}\,(\varrho\,d\hat H) = -\sum_n P_n\sum_i\frac{\partial E_n}{\partial X_i}\,dX_i = \sum_i F_i\,dX_i\ , \qquad (4.S.4)$$

with P_n = Z⁻¹ e^{−E_n/k_BT}. Here F_i = −⟨∂Ĥ/∂X_i⟩ is the generalized force conjugate to the generalized displacement X_i.

∙ Thermal contact: In equilibrium, two systems which can exchange energy satisfy T₁ = T₂. Two systems which can exchange volume satisfy p₁/T₁ = p₂/T₂. Two systems which can exchange particle number satisfy μ₁/T₁ = μ₂/T₂.

∙ Gibbs-Duhem relation: Since E(S, V , N ) is extensive, Euler’s theorem for homogeneous functions guarantees that
E = T S − pV + μN . Taking the differential, we obtain the equation S dT − V dp + N dμ = 0 , so there must be a relation

among any two of the intensive quantities T , p, and μ .

∙ Generalized susceptibilities: Within the OCE¹, let Ĥ(λ) = Ĥ₀ − ∑_i λ_i Q̂_i, where the Q̂_i are observables with [Q̂_i, Q̂_j] = 0. Then

$$Q_k(T,V,N;\lambda) = \langle\hat Q_k\rangle = -\frac{\partial F}{\partial\lambda_k}\ ,\qquad \chi_{kl}(T,V,N;\lambda) = \frac{1}{V}\,\frac{\partial Q_k}{\partial\lambda_l} = -\frac{1}{V}\,\frac{\partial^2\!F}{\partial\lambda_k\,\partial\lambda_l}\ . \qquad (4.S.5)$$

The quantities χ_{kl} are the generalized susceptibilities.

∙ Ideal gases: For Ĥ = ∑_{i=1}^{N} p_i²/2m, one finds Z(T, V, N) = (1/N!)(V/λ_T^d)^N, where λ_T = √(2πℏ²/m k_B T) is the thermal wavelength. Thus F = N k_B T ln(N/V) − (1/2) d N k_B T ln T + N a, where a is a constant. From this one finds p = −(∂F/∂V)_{T,N} = n k_B T, which is the ideal gas law, with n = N/V the number density. The distribution of velocities in d = 3 dimensions is given by

f(v) = ⟨(1/N) ∑_{i=1}^{N} δ(v − v_i)⟩ = (m/2π k_B T)^{3/2} e^{−mv²/2k_B T} ,   (4.S.6)

and this leads to a speed distribution f̄(v) = 4π v² f(v).
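As a quick numerical aside, one can check that the speed distribution f̄(v) = 4πv² f(v) is normalized and reproduces equipartition, ⟨v²⟩ = 3k_B T/m. This is a sketch in reduced units m = k_B T = 1, which is an assumption of the snippet and not part of the text:

```python
import math

# Speed distribution fbar(v) = 4 pi v^2 (m/2 pi kB T)^{3/2} exp(-m v^2/2 kB T),
# evaluated in reduced units m = kB T = 1 (an assumption of this sketch).
def fbar(v):
    return 4*math.pi*v*v*(1.0/(2*math.pi))**1.5*math.exp(-0.5*v*v)

dv = 1e-3
vs = [(i + 0.5)*dv for i in range(12000)]      # midpoint rule out to v = 12
norm  = sum(fbar(v) for v in vs)*dv            # normalization: should be 1
v2avg = sum(v*v*fbar(v) for v in vs)*dv        # equipartition: <v^2> = 3 kB T/m
print(round(norm, 4), round(v2avg, 4))         # 1.0 3.0
```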
∙ Example: For N noninteracting spins in an external magnetic field H, the Hamiltonian is Ĥ = −μ_0 H ∑_{i=1}^{N} σ_i, where σ_i = ±1. The spins, if on a lattice, are regarded as distinguishable. Then Z = ζ^N, where ζ = ∑_{σ=±1} e^{βμ_0 Hσ} = 2 cosh(βμ_0 H). The magnetization and magnetic susceptibility are then

M = −(∂F/∂H)_{T,N} = N μ_0 tanh(μ_0 H/k_B T) ,   χ = ∂M/∂H = (N μ_0²/k_B T) sech²(μ_0 H/k_B T) .   (4.S.7)
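The tanh form of M can be checked against the definition M = −(∂F/∂H)_{T,N}, with F = −N k_B T ln ζ, by finite differences. The numerical values of N, μ_0, T, H below are arbitrary test inputs (units k_B = 1), not taken from the text:

```python
import math

# Check M = -dF/dH for F = -N kB T ln[2 cosh(mu0 H / kB T)], with kB = 1.
# N, mu0, T, H are arbitrary test values.
N, mu0, T, H = 100, 0.5, 2.0, 0.3

def F(H):
    return -N*T*math.log(2.0*math.cosh(mu0*H/T))

dH = 1e-6
M_fd = -(F(H + dH) - F(H - dH))/(2*dH)   # numerical -dF/dH
M_an = N*mu0*math.tanh(mu0*H/T)          # Equation (4.S.7)
print(abs(M_fd - M_an) < 1e-6)           # True
```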

∙ Example: For noninteracting particles with kinetic energy p²/2m and internal degrees of freedom, Z_N = (1/N!)(V/λ_T^d)^N ξ^N(T), where ξ(T) = Tr e^{−βĥ_int} is the partition function for the internal degrees of freedom, which include rotational, vibrational, and electronic excitations. One still has pV = N k_B T, but the heat capacities at constant V and p are

C_V = T (∂S/∂T)_{V,N} = (1/2) d N k_B − N T φ″(T) ,   C_p = T (∂S/∂T)_{p,N} = C_V + N k_B ,   (4.S.8)

where φ(T) = −k_B T ln ξ(T).

1. The generalization to the GCE is straightforward.↩

Endnotes
1. We write the Hamiltonian as H
^
(classical or quantum) in order to distinguish it from magnetic field (H ) or enthalpy (H).↩
2. More on this in chapter 5.↩

3. The factor of 1/2 preceding Ω_M in Equation [nrdos] appears because δ(u² − 1) = (1/2) δ(u − 1) + (1/2) δ(u + 1). Since u = |u| ≥ 0, the second term can be dropped.↩


4. Note that for integer argument, Γ(k) = (k − 1)! ↩
5. See §2.7.4.↩
6. See T.-C. Lu and T. Grover, arXiv 1709.08784.↩
7. In applying Equation [EminusTS] to the denominator of Equation [PEOCE], we shift E′ by E and integrate over the difference δE′ ≡ E′ − E, retaining terms up to quadratic order in δE′ in the argument of the exponent.↩
8. In deriving Equation [thermforce], we have used the so-called Feynman-Hellmann theorem of quantum mechanics: d⟨n| Ĥ |n⟩ = ⟨n| dĤ |n⟩, if |n⟩ is an energy eigenstate.↩

9. Nota bene we are concerned with classical spin configurations only – there is no superposition of states allowed in this model!

10. Note that while we cannot simultaneously specify the eigenvalues of two components of L along axes fixed in space, we can
simultaneously specify the components of L along one axis fixed in space and one axis rotating with a body. See Landau and
Lifshitz, Quantum Mechanics, §103.↩
11. See §72 of Landau and Lifshitz, Quantum Mechanics, which, in my humble estimation, is the greatest physics book ever
written.↩
12. See Landau and Lifshitz, Quantum Mechanics, §86.↩
13. Note that there is no prime on the k sum for F , as we have divided the logarithm of Z by two and replaced the half sum by the
whole sum.↩
14. The hyperfine splitting in hydrogen is on the order of (m_e/m_p) α⁴ m_e c² ∼ 10⁻⁶ eV, which is on the order of 0.01 K. Here α = e²/ℏc is the fine structure constant.↩

This page titled 4.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

5: Noninteracting Quantum Systems


5.1: Statistical Mechanics of Noninteracting Quantum Systems
5.2: Quantum Ideal Gases - Low Density Expansions
5.3: Entropy and Counting States
5.4: Lattice Vibrations - Einstein and Debye Models
5.5: Photon Statistics
5.6: The Ideal Bose Gas
5.7: The Ideal Fermi Gas
5.8: The Ideal Fermi Gas
5.9: Appendix I- Second Quantization
5.10: Appendix II- Ideal Bose Gas Condensation
5.11: Appendix III- Example Bose Condensation Problem
5.S: Summary

Thumbnail: Fermi distribution in the presence of an external Zeeman-coupled magnetic field.

This page titled 5: Noninteracting Quantum Systems is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

5.1: Statistical Mechanics of Noninteracting Quantum Systems
Bose and Fermi systems in the grand canonical ensemble
A noninteracting many-particle quantum Hamiltonian may be written as¹

Ĥ = ∑_α ε_α n̂_α ,   (5.1.1)

where n̂_α is the number of particles in the quantum state α with energy ε_α. This form is called the second quantized representation of the Hamiltonian. The number eigenbasis is therefore also an energy eigenbasis. Any eigenstate of Ĥ may be labeled by the integer eigenvalues of the n̂_α number operators, and written as | n_1, n_2, … ⟩. We then have

n̂_α | n⃗ ⟩ = n_α | n⃗ ⟩   (5.1.2)

and

Ĥ | n⃗ ⟩ = ∑_α n_α ε_α | n⃗ ⟩ .   (5.1.3)

The eigenvalues n_α take on different possible values depending on whether the constituent particles are bosons or fermions, viz.

bosons: nα ∈ {0 , 1 , 2 , 3 , … }

fermions: nα ∈ {0 , 1} .

In other words, for bosons, the occupation numbers are nonnegative integers. For fermions, the occupation numbers are either 0 or 1 due to the Pauli principle, which says that at most one fermion
can occupy any single particle quantum state. There is no Pauli principle for bosons.
The N-particle partition function Z_N is then

Z_N = ∑_{n_α} e^{−β ∑_α n_α ε_α} δ_{N, ∑_α n_α} ,   (5.1.4)

where the sum is over all allowed values of the set {n_α}, which depends on the statistics of the particles. Bosons satisfy Bose-Einstein (BE) statistics, in which n_α ∈ {0, 1, 2, …}. Fermions satisfy Fermi-Dirac (FD) statistics, in which n_α ∈ {0, 1}.

The OCE partition sum is difficult to perform, owing to the constraint ∑_α n_α = N on the total number of particles. This constraint is relaxed in the GCE, where

Ξ = ∑_N e^{βμN} Z_N = ∑_{n_α} e^{−β ∑_α n_α ε_α} e^{βμ ∑_α n_α} = ∏_α ( ∑_{n_α} e^{−β(ε_α−μ) n_α} ) .

Note that the grand partition function Ξ takes the form of a product over contributions from the individual single particle states.
We now perform the single particle sums:

∑_{n=0}^{∞} e^{−β(ε−μ)n} = 1/(1 − e^{−β(ε−μ)})   (bosons)

∑_{n=0}^{1} e^{−β(ε−μ)n} = 1 + e^{−β(ε−μ)}   (fermions) .

Therefore we have

Ξ_BE = ∏_α 1/(1 − e^{−(ε_α−μ)/k_B T}) ,   Ω_BE = k_B T ∑_α ln(1 − e^{−(ε_α−μ)/k_B T})

and

Ξ_FD = ∏_α (1 + e^{−(ε_α−μ)/k_B T}) ,   Ω_FD = −k_B T ∑_α ln(1 + e^{−(ε_α−μ)/k_B T}) .

We can combine these expressions into one, writing

Ω(T, V, μ) = ±k_B T ∑_α ln(1 ∓ e^{−(ε_α−μ)/k_B T}) ,   (5.1.5)

where we take the upper sign for Bose-Einstein statistics and the lower sign for Fermi-Dirac statistics. Note that the average occupancy of single particle state α is

⟨n̂_α⟩ = ∂Ω/∂ε_α = 1/(e^{(ε_α−μ)/k_B T} ∓ 1) ,   (5.1.6)

and the total particle number is then

N(T, V, μ) = ∑_α 1/(e^{(ε_α−μ)/k_B T} ∓ 1) .   (5.1.7)

We will henceforth write n_α(μ, T) = ⟨n̂_α⟩ for the thermodynamic average of this occupancy.

Quantum statistics and the Maxwell-Boltzmann limit

Consider a system composed of N noninteracting particles. The Hamiltonian is

Ĥ = ∑_{j=1}^{N} ĥ_j .   (5.1.8)

The single particle Hamiltonian ĥ has eigenstates |α⟩ with corresponding energy eigenvalues ε_α. What is the partition function? Is it

Z ≟ ∑_{α_1} ⋯ ∑_{α_N} e^{−β(ε_{α_1} + ε_{α_2} + ⋯ + ε_{α_N})} = ζ^N ,   (5.1.9)

where ζ is the single particle partition function,

ζ = ∑_α e^{−βε_α} .   (5.1.10)
For systems where the individual particles are distinguishable, such as spins on a lattice which have fixed positions, this is indeed correct. But for particles free to move in a gas, this equation is wrong. The reason is that for indistinguishable particles the many particle quantum mechanical states are specified by a collection of occupation numbers n_α, which tell us how many particles are in the single-particle state | α ⟩. The energy is

E = ∑_α n_α ε_α   (5.1.11)

and the total number of particles is

N = ∑_α n_α .   (5.1.12)

That is, each collection of occupation numbers {n_α} labels a unique many particle state | {n_α} ⟩. In the product ζ^N, the collection {n_α} occurs many times. We have therefore overcounted the contribution to Z_N due to this state. By what factor have we overcounted? It is easy to see that the overcounting factor is

degree of overcounting = N!/∏_α n_α ! ,

which is the number of ways we can rearrange the labels α_j to arrive at the same collection {n_α}. This follows from the multinomial theorem,

(∑_{α=1}^{K} x_α)^N = ∑_{n_1} ∑_{n_2} ⋯ ∑_{n_K} (N!/(n_1! n_2! ⋯ n_K!)) x_1^{n_1} x_2^{n_2} ⋯ x_K^{n_K} δ_{N, n_1 + … + n_K} .   (5.1.13)

Thus, the correct expression for Z_N is

Z_N = ∑_{n_α} e^{−β ∑_α n_α ε_α} δ_{N, ∑_α n_α} = ∑_{α_1} ∑_{α_2} ⋯ ∑_{α_N} (∏_α n_α!/N!) e^{−β(ε_{α_1} + ε_{α_2} + ⋯ + ε_{α_N})} .

In the high temperature limit, almost all the n_α are either 0 or 1, hence

Z_N ≈ ζ^N/N! .   (5.1.14)

This is the classical Maxwell-Boltzmann limit of quantum statistical mechanics. We now see the origin of the 1/N! term which is so important in the thermodynamics of the entropy of mixing.
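The overcounting argument can be checked directly on a toy spectrum by comparing the exact bosonic Z_N of Equation (5.1.4) with ζ^N/N!. The level spacing, temperature, and particle number below are arbitrary test choices, not values from the text:

```python
import math
from itertools import combinations_with_replacement

# Toy spectrum: 30 equally spaced levels (an arbitrary test choice).
eps = [0.1*j for j in range(30)]
beta, N = 0.2, 2

# exact bosonic Z_N: sum over multisets alpha_1 <= ... <= alpha_N
ZN = sum(math.exp(-beta*sum(eps[i] for i in idx))
         for idx in combinations_with_replacement(range(len(eps)), N))

zeta = sum(math.exp(-beta*e) for e in eps)   # single particle partition function
ZMB  = zeta**N/math.factorial(N)             # Maxwell-Boltzmann approximation
print(ZN > ZMB, round(ZN/ZMB, 3))
# True: multiply occupied states are undercounted by the uniform 1/N!,
# so ZN slightly exceeds zeta^N/N! when occupancies are not dilute
```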
Finally, starting with the expressions for the grand partition function for Bose-Einstein or Fermi-Dirac particles, and working in the low density limit where n_α(μ, T) ≪ 1, we have ε_α − μ ≫ k_B T, and consequently

Ω_{BE/FD} = ±k_B T ∑_α ln(1 ∓ e^{−(ε_α−μ)/k_B T}) ⟶ −k_B T ∑_α e^{−(ε_α−μ)/k_B T} ≡ Ω_MB .

This is the Maxwell-Boltzmann limit of quantum statistical mechanics. The occupation number average in the Maxwell-Boltzmann limit is then

⟨n̂_α⟩ = e^{−(ε_α−μ)/k_B T} .   (5.1.15)
Single particle density of states


The single particle density of states per unit volume g(ε) is defined as

g(ε) = (1/V) ∑_α δ(ε − ε_α) .   (5.1.16)

We can then write

Ω(T, V, μ) = ±V k_B T ∫_{−∞}^{∞} dε g(ε) ln(1 ∓ e^{−(ε−μ)/k_B T}) .   (5.1.17)

For particles with a dispersion ε(k), with p = ℏk, we have

g(ε) = g ∫ d^d k/(2π)^d δ(ε − ε(k)) = (g Ω_d/(2π)^d) k^{d−1}/(dε/dk) ,

where g = 2S+1 is the spin degeneracy, and where we assume that ε(k) is both isotropic and a monotonically increasing function of k. Thus, we have

g(ε) = (g Ω_d/(2π)^d) k^{d−1}/(dε/dk) = { (g/π) dk/dε   (d = 1) ;  (g/2π) k dk/dε   (d = 2) ;  (g/2π²) k² dk/dε   (d = 3) } .   (5.1.18)

In order to obtain g(ε) as a function of the energy ε one must invert the dispersion relation ε = ε(k) to obtain k = k(ε).
Note that we can equivalently write

g(ε) dε = g d^d k/(2π)^d = (g Ω_d/(2π)^d) k^{d−1} dk   (5.1.19)

to derive g(ε).
For a spin-S particle with ballistic dispersion ε(k) = ℏ²k²/2m, we have

g(ε) = ((2S+1)/Γ(d/2)) (m/2πℏ²)^{d/2} ε^{d/2−1} Θ(ε) ,   (5.1.20)

where Θ(ε) is the step function, which takes the value 0 for ε < 0 and 1 for ε ≥ 0. The appearance of Θ(ε) simply says that all the single particle energy eigenvalues are nonnegative. Note that we are assuming a box of volume V but we are ignoring the quantization of kinetic energy, and assuming that the difference between successive quantized single particle energy eigenvalues is negligible so that g(ε) can be replaced by the average in the above expression. Note that

n(ε, T, μ) = 1/(e^{(ε−μ)/k_B T} ∓ 1) .   (5.1.21)

This result holds true independent of the form of g(ε). The average total number of particles is then

N(T, V, μ) = V ∫_{−∞}^{∞} dε g(ε)/(e^{(ε−μ)/k_B T} ∓ 1) ,   (5.1.22)

which does depend on g(ε).
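The ballistic density of states can be checked by brute-force counting of plane-wave states in a periodic box. This sketch uses units ℏ = m = 1 with no spin degeneracy; the box size and energy cutoff are arbitrary choices of the snippet:

```python
import math

# Count plane-wave states k = (2 pi/L) n with eps(k) = k^2/2 <= E, and compare
# with V * Integral_0^E g(eps) d eps, using g(eps) = sqrt(2 eps)/(2 pi^2)
# from Equation (5.1.20) with d = 3, 2S+1 = 1, hbar = m = 1.
L, E = 50.0, 1.0
dk = 2*math.pi/L
nmax = int(math.sqrt(2*E)/dk) + 1

N_exact = sum(1 for nx in range(-nmax, nmax + 1)
                for ny in range(-nmax, nmax + 1)
                for nz in range(-nmax, nmax + 1)
              if 0.5*dk*dk*(nx*nx + ny*ny + nz*nz) <= E)

N_dos = L**3 * (2*E)**1.5 / (6*math.pi**2)   # V * (2mE/hbar^2)^{3/2}/(6 pi^2)
print(N_exact, round(N_dos))                 # the two counts agree to ~1%
```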

This page titled 5.1: Statistical Mechanics of Noninteracting Quantum Systems is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.2: Quantum Ideal Gases - Low Density Expansions
Expansion in powers of the fugacity
From Equation [numeqn], we have that the number density n = N/V is

n(T, z) = ∫_{−∞}^{∞} dε g(ε)/(z^{−1} e^{ε/k_B T} ∓ 1) = ∑_{j=1}^{∞} (±1)^{j−1} C_j(T) z^j ,

where z = exp(μ/k_B T) is the fugacity and

C_j(T) = ∫_{−∞}^{∞} dε g(ε) e^{−jε/k_B T} .   (5.2.1)
From Ω = −pV and our expression above for Ω(T, V, μ), we have

p(T, z) = ∓k_B T ∫_{−∞}^{∞} dε g(ε) ln(1 ∓ z e^{−ε/k_B T}) = k_B T ∑_{j=1}^{∞} (±1)^{j−1} j^{−1} C_j(T) z^j .
Virial expansion of the equation of state


Eqns. ??? and ??? express n(T, z) and p(T, z) as power series in the fugacity z, with T-dependent coefficients. In principle, we can eliminate z using Equation ???, writing z = z(T, n) as a power series in the number density n, and substitute this into Equation ??? to obtain an equation of state p = p(T, n) of the form

p(T, n) = n k_B T (1 + B_2(T) n + B_3(T) n² + …) .   (5.2.2)

Note that the low density limit n → 0 yields the ideal gas law independent of the density of states g(ε). This follows from expanding n(T, z) and p(T, z) to lowest order in z, yielding n = C_1 z + O(z²) and p = k_B T C_1 z + O(z²). Dividing the second of these equations by the first yields p = n k_B T + O(n²), which is the ideal gas law. Note that z = n/C_1 + O(n²) can formally be written as a power series in n.


Unfortunately, there is no general analytic expression for the virial coefficients B_j(T) in terms of the expansion coefficients C_j(T). The only way is to grind things out order by order in our expansions. Let's roll up our sleeves and see how this is done. We start by formally writing z(T, n) as a power series in the density n with T-dependent coefficients A_j(T):

z = A_1 n + A_2 n² + A_3 n³ + … .   (5.2.3)

We then insert this into the series for n(T, z):

n = C_1 z ± C_2 z² + C_3 z³ + …
  = C_1 (A_1 n + A_2 n² + A_3 n³ + …) ± C_2 (A_1 n + A_2 n² + A_3 n³ + …)² + C_3 (A_1 n + A_2 n² + A_3 n³ + …)³ + … .

Let's expand the RHS to order n³. Collecting terms, we have

n = C_1 A_1 n + (C_1 A_2 ± C_2 A_1²) n² + (C_1 A_3 ± 2 C_2 A_1 A_2 + C_3 A_1³) n³ + … .   (5.2.4)

In order for this equation to be true we require that the coefficient of n on the RHS be unity, and that the coefficients of n^j for all j > 1 must vanish. Thus,

C_1 A_1 = 1
C_1 A_2 ± C_2 A_1² = 0
C_1 A_3 ± 2 C_2 A_1 A_2 + C_3 A_1³ = 0 .

The first of these yields A_1:

A_1 = 1/C_1 .   (5.2.5)

We now insert this into the second equation to obtain A_2:

A_2 = ∓ C_2/C_1³ .   (5.2.6)

Next, insert the expressions for A_1 and A_2 into the third equation to obtain A_3:

A_3 = 2 C_2²/C_1⁵ − C_3/C_1⁴ .   (5.2.7)

This procedure rapidly gets tedious!
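The inversion can be spot-checked numerically. In this sketch the coefficients C_1, C_2, C_3 are arbitrary test values (not tied to any physical g(ε)), and the upper (Bose) signs are used throughout:

```python
# Verify A_1..A_3 and B_2, B_3 for arbitrary test coefficients (Bose signs).
C1, C2, C3 = 1.3, 0.7, 0.4

A1 = 1.0/C1
A2 = -C2/C1**3                           # upper sign in A2 = -/+ C2/C1^3
A3 = 2*C2**2/C1**5 - C3/C1**4

B2 = -C2/(2*C1**2)                       # upper sign in B2 = -/+ C2/(2 C1^2)
B3 = C2**2/C1**4 - 2*C3/(3*C1**3)

n = 1e-3                                 # small density: O(n^4) terms negligible
z = A1*n + A2*n**2 + A3*n**3
n_back   = C1*z + C2*z**2 + C3*z**3      # n(T,z) series, Bose case
p_series = C1*z + C2*z**2/2 + C3*z**3/3  # p/(kB T) series, Bose case
p_virial = n + B2*n**2 + B3*n**3

print(abs(n_back - n) < 1e-9, abs(p_series - p_virial) < 1e-9)   # True True
```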


And we're only half way done. We still must express p in terms of n:

p/k_B T = C_1 (A_1 n + A_2 n² + A_3 n³ + …) ± (1/2) C_2 (A_1 n + A_2 n² + A_3 n³ + …)² + (1/3) C_3 (A_1 n + A_2 n² + A_3 n³ + …)³ + …
        = C_1 A_1 n + (C_1 A_2 ± (1/2) C_2 A_1²) n² + (C_1 A_3 ± C_2 A_1 A_2 + (1/3) C_3 A_1³) n³ + …
        = n + B_2 n² + B_3 n³ + … .

We can now write

B_2 = C_1 A_2 ± (1/2) C_2 A_1² = ∓ C_2/(2C_1²)

B_3 = C_1 A_3 ± C_2 A_1 A_2 + (1/3) C_3 A_1³ = C_2²/C_1⁴ − (2/3) C_3/C_1³ .

It is easy to derive the general result that B_j^F = (−1)^{j−1} B_j^B, where the superscripts denote Fermi (F) or Bose (B) statistics.
We remark that the equation of state for classical (and quantum) interacting systems also can be expanded in terms of virial coefficients. Consider, for example, the van der Waals equation of state,

(p + aN²/V²)(V − Nb) = N k_B T .   (5.2.8)

This may be recast as

p = n k_B T/(1 − bn) − a n²
  = n k_B T + (b k_B T − a) n² + k_B T b² n³ + k_B T b³ n⁴ + … ,

where n = N/V. Thus, for the van der Waals system, in the convention of Equation (5.2.2), we have B_2 = b − a/k_B T and B_k = b^{k−1} for all k ≥ 3.

Ballistic Dispersion
For the ballistic dispersion ε(p) = p²/2m we computed the density of states in Equation ???. One finds

C_j(T) = (g_S λ_T^{−d}/Γ(d/2)) ∫_0^∞ dt t^{d/2−1} e^{−jt} = g_S λ_T^{−d} j^{−d/2} .   (5.2.9)

We then have

B_2(T) = ∓ 2^{−(d/2+1)} · g_S^{−1} λ_T^d

B_3(T) = (2^{−(d+1)} − 3^{−(d/2+1)}) · 2 g_S^{−2} λ_T^{2d} .
Note that B_2(T) is negative for bosons and positive for fermions. This is because bosons have a tendency to bunch, and under certain circumstances may exhibit a phenomenon known as Bose-Einstein condensation (BEC). Fermions, on the other hand, obey the Pauli principle, which results in an extra positive correction to the pressure in the low density limit.
We may also write

n(T, z) = ±g_S λ_T^{−d} Li_{d/2}(±z)   (5.2.10)

and

p(T, z) = ±g_S k_B T λ_T^{−d} Li_{d/2+1}(±z) ,   (5.2.11)

where

Li_q(z) ≡ ∑_{n=1}^{∞} z^n/n^q   (5.2.12)

is the polylogarithm function². Note that Li_q(z) obeys a recursion relation in its index, viz.

z (∂/∂z) Li_q(z) = Li_{q−1}(z) ,   (5.2.13)

and that

Li_q(1) = ∑_{n=1}^{∞} 1/n^q = ζ(q) .   (5.2.14)
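The defining series, the index recursion, and Li_q(1) = ζ(q) are easy to check with a partial sum; the truncation at 2000 terms below is an arbitrary choice of this sketch, adequate for |z| ≤ 1:

```python
import math

# Partial-sum polylogarithm Li_q(z) = sum_{n>=1} z^n / n^q
def Li(q, z, terms=2000):
    return sum(z**n / n**q for n in range(1, terms + 1))

q, z, dz = 2.5, 0.4, 1e-6
recursion = z*(Li(q, z + dz) - Li(q, z - dz))/(2*dz)   # z d/dz Li_q(z)
print(abs(recursion - Li(q - 1, z)) < 1e-6)            # True
print(abs(Li(4, 1.0) - math.pi**4/90) < 1e-6)          # True: Li_4(1) = zeta(4)
```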

This page titled 5.2: Quantum Ideal Gases - Low Density Expansions is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

5.3: Entropy and Counting States
Suppose we are to partition N particles among J possible distinct single particle states. How many ways Ω are there of accomplishing this task? The answer depends on the statistics of the particles. If the particles are fermions, the answer is easy: Ω_FD = (J choose N). For bosons, the number of possible partitions can be evaluated via the following argument. Imagine that we line up all the N particles in a row, and we place J − 1 barriers among the particles, as shown below in Figure [BEcount]. The number of partitions is then the total number of ways of placing the N particles among these N + J − 1 objects (particles plus barriers), hence we have Ω_BE = (N+J−1 choose N). For Maxwell-Boltzmann statistics, we take Ω_MB = J^N/N!. Note that Ω_MB is not necessarily an integer, so Maxwell-Boltzmann statistics does not represent any actual state counting. Rather, it manifests itself as a common limit of the Bose and Fermi distributions, as we have seen and shall see again shortly.

[BEcount] Partitioning N bosons into J possible states (N = 14 and J = 5 shown). The N black dots represent bosons, while the J − 1 white dots represent markers separating the different single particle populations. Here n_1 = 3, n_2 = 1, n_3 = 4, n_4 = 2, and n_5 = 4.

The entropy in each case is simply S = k_B ln Ω. We assume N ≫ 1 and J ≫ 1, with n ≡ N/J finite. Then using Stirling's approximation, ln(K!) = K ln K − K + O(ln K), we have

S_MB = −J k_B n ln n
S_BE = −J k_B [n ln n − (1+n) ln(1+n)]
S_FD = −J k_B [n ln n + (1−n) ln(1−n)] .

In the Maxwell-Boltzmann limit, n ≪ 1, and all three expressions agree. Note that

(∂S_MB/∂N)_J = −k_B (1 + ln n)
(∂S_BE/∂N)_J = k_B ln(n^{−1} + 1)
(∂S_FD/∂N)_J = k_B ln(n^{−1} − 1) .

Now let's imagine grouping the single particle spectrum into intervals of J consecutive energy states. If J is finite and the spectrum is continuous and we are in the thermodynamic limit, then these states will all be degenerate. Therefore, using α as a label for the energies, we have that the grand potential Ω = E − TS − μN is given in each case by

Ω_MB = J ∑_α [(ε_α − μ) n_α + k_B T n_α ln n_α]
Ω_BE = J ∑_α [(ε_α − μ) n_α + k_B T {n_α ln n_α − (1+n_α) ln(1+n_α)}]
Ω_FD = J ∑_α [(ε_α − μ) n_α + k_B T {n_α ln n_α + (1−n_α) ln(1−n_α)}] .

Now - lo and behold! - treating Ω as a function of the distribution {n_α} and extremizing in each case, subject to the constraint of total particle number N = J ∑_α n_α, one obtains the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac distributions, respectively:

δ/δn_α (Ω − λ J ∑_{α′} n_{α′}) = 0  ⟹  n_α^MB = e^{(μ−ε_α)/k_B T} ,   n_α^BE = [e^{(ε_α−μ)/k_B T} − 1]^{−1} ,   n_α^FD = [e^{(ε_α−μ)/k_B T} + 1]^{−1} .

As long as J is finite, so the states in each block all remain at the same energy, the results are independent of J.
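The Stirling expressions for S/k_B can be compared with exact state counts via the log-gamma function; the values of N and J below are arbitrary test inputs:

```python
import math

# Exact ln(Omega) via lgamma vs the Stirling formulas for S/kB.
N, J = 3000, 10000                       # arbitrary test values; n = N/J = 0.3
n = N/J
lnC = lambda a, b: math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

S_FD = lnC(J, N)                         # ln binom(J, N)
S_FD_stirling = -J*(n*math.log(n) + (1 - n)*math.log(1 - n))
S_BE = lnC(N + J - 1, N)                 # ln binom(N+J-1, N)
S_BE_stirling = -J*(n*math.log(n) - (1 + n)*math.log(1 + n))

# the O(ln K) corrections are subextensive, so the ratios approach 1
print(abs(S_FD/S_FD_stirling - 1) < 5e-3, abs(S_BE/S_BE_stirling - 1) < 5e-3)
```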

This page titled 5.3: Entropy and Counting States is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.5: Photon Statistics
Thermodynamics of the photon gas
There exists a certain class of particles, including photons and certain elementary excitations in solids such as phonons (lattice vibrations) and magnons (spin waves), which obey bosonic statistics but with zero chemical potential. This is because their overall number is not conserved (under typical conditions): photons can be emitted and absorbed by the atoms in the wall of a container, and phonon and magnon number is also not conserved due to various processes. In such cases, the free energy attains its minimum value with respect to particle number when

μ = (∂F/∂N)_{T,V} = 0 .   (5.5.1)

The number distribution, from Equation ???, is then

n(ε) = 1/(e^{βε} − 1) .   (5.5.2)

The grand partition function for a system of particles with μ = 0 is

Ω(T, V) = V k_B T ∫_{−∞}^{∞} dε g(ε) ln(1 − e^{−ε/k_B T}) ,   (5.5.3)

where g(ε) is the density of states per unit volume.


Suppose the particle dispersion is ε(p) = A|p|^σ. We can compute the density of states g(ε):

g(ε) = g ∫ d^d p/h^d δ(ε − A|p|^σ) = (g Ω_d/h^d) ∫_0^∞ dp p^{d−1} δ(ε − A p^σ)
     = (g Ω_d/σ h^d) A^{−d/σ} ∫_0^∞ dx x^{d/σ−1} δ(ε − x) = (2g/σ) (π^{d/2}/Γ(d/2)) (h A^{1/σ})^{−d} ε^{d/σ−1} Θ(ε) ,

where g is the internal degeneracy, due, for example, to different polarization states of the photon. We have used the result Ω_d = 2π^{d/2}/Γ(d/2) for the solid angle in d dimensions. The step function Θ(ε) is perhaps overly formal, but it reminds us that the energy spectrum is bounded from below by ε = 0; there are no negative energy states.
For the photon, we have ε(p) = cp, hence σ = 1 and

g(ε) = (2g π^{d/2}/Γ(d/2)) ε^{d−1}/(hc)^d Θ(ε) .   (5.5.4)

In d = 3 dimensions the degeneracy is g = 2, the number of independent polarization states. The pressure p(T) is then obtained using Ω = −pV. We have

p(T) = −k_B T ∫_{−∞}^{∞} dε g(ε) ln(1 − e^{−ε/k_B T})
     = −(2g π^{d/2}/Γ(d/2)) (hc)^{−d} k_B T ∫_0^∞ dε ε^{d−1} ln(1 − e^{−ε/k_B T})
     = −(2g π^{d/2}/Γ(d/2)) ((k_B T)^{d+1}/(hc)^d) ∫_0^∞ dt t^{d−1} ln(1 − e^{−t}) .

We can make some progress with the dimensionless integral:

I_d ≡ −∫_0^∞ dt t^{d−1} ln(1 − e^{−t}) = ∑_{n=1}^{∞} (1/n) ∫_0^∞ dt t^{d−1} e^{−nt} = Γ(d) ∑_{n=1}^{∞} 1/n^{d+1} = Γ(d) ζ(d+1) .

Finally, we invoke a result from the mathematics of the gamma function known as the doubling formula,

Γ(z) = (2^{z−1}/√π) Γ(z/2) Γ((z+1)/2) .   (5.5.5)

Putting it all together, we find

p(T) = g π^{−(d+1)/2} Γ((d+1)/2) ζ(d+1) (k_B T)^{d+1}/(ℏc)^d .   (5.5.6)

The number density is found to be

n(T) = ∫_{−∞}^{∞} dε g(ε)/(e^{ε/k_B T} − 1) = g π^{−(d+1)/2} Γ((d+1)/2) ζ(d) (k_B T/ℏc)^d .

For photons in d = 3 dimensions, we have g = 2 and thus

n(T) = (2ζ(3)/π²) (k_B T/ℏc)³ ,   p(T) = (2ζ(4)/π²) (k_B T)⁴/(ℏc)³ .   (5.5.7)

It turns out that ζ(4) = π⁴/90.
Note that ℏc/k_B = 0.22855 cm·K, so

k_B T/ℏc = 4.3755 T[K] cm⁻¹  ⟹  n(T) = 20.405 × T³[K³] cm⁻³ .   (5.5.8)
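The numerical coefficient in Equation (5.5.8) follows directly from n(T) = (2ζ(3)/π²)(k_B T/ℏc)³ together with ℏc/k_B = 0.22855 cm·K:

```python
import math

# n(T)/T^3 in cm^-3 K^-3 for the photon gas
zeta3 = sum(1.0/k**3 for k in range(1, 200001))   # partial sum for zeta(3)
coeff = (2*zeta3/math.pi**2) * (1.0/0.22855)**3   # (kB/hbar c)^3 in cm^-3 K^-3
print(round(coeff, 2))   # 20.4, matching n(T) = 20.405 T^3[K^3] cm^-3
```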

To find the entropy, we use Gibbs-Duhem:

dμ = 0 = −s dT + v dp  ⟹  s = v dp/dT ,   (5.5.9)

where s is the entropy per particle and v = n^{−1} is the volume per particle. We then find

s(T) = (d+1) (ζ(d+1)/ζ(d)) k_B .   (5.5.10)

The entropy per particle is constant. The internal energy is

E = −∂ln Ξ/∂β = −∂(βpV)/∂β = d · pV ,   (5.5.11)

and hence the energy per particle is

ε = E/N = d · pv = d · (ζ(d+1)/ζ(d)) k_B T .   (5.5.12)

Classical arguments for the photon gas
A number of thermodynamic properties of the photon gas can be determined from purely classical arguments. Here we recapitulate
a few important ones.
Suppose our photon gas is confined to a rectangular box of dimensions L_x × L_y × L_z. Suppose further that the dimensions are all expanded by a factor λ^{1/3}, so the volume is isotropically expanded by a factor of λ. The cavity modes of the electromagnetic radiation have quantized wavevectors, even within classical electromagnetic theory, given by

k = (2πn_x/L_x , 2πn_y/L_y , 2πn_z/L_z) .   (5.5.13)

Since the energy for a given mode is ε(k) = ℏc|k|, we see that the energy changes by a factor λ^{−1/3} under an adiabatic volume expansion V → λV, where the distribution of different electromagnetic mode occupancies remains fixed. Thus,

V (∂E/∂V)_S = λ (∂E/∂λ)_S = −(1/3) E .   (5.5.14)

Thus,

p = −(∂E/∂V)_S = E/3V ,   (5.5.15)

as we found in Equation [photE]. Since E = E(T, V) is extensive, we must have p = p(T) alone.
Since p = p(T) alone, we have

(∂E/∂V)_T = (∂E/∂V)_p = 3p
          = T (∂p/∂T)_V − p ,

where the second line follows from the Maxwell relation (∂S/∂V)_T = (∂p/∂T)_V, after invoking the First Law dE = T dS − p dV. Thus,

T dp/dT = 4p  ⟹  p(T) = A T⁴ ,   (5.5.16)

where A is a constant. Thus, we recover the temperature dependence found microscopically in Equation [photp].
Given an energy density E/V, the differential energy flux emitted in a direction θ relative to a surface normal is

dj_ε = c · (E/V) · cos θ · (dΩ/4π) ,   (5.5.17)

where dΩ is the differential solid angle. Thus, the power emitted per unit area is

dP/dA = (cE/4πV) ∫_0^{π/2} dθ ∫_0^{2π} dϕ sin θ cos θ = cE/4V = (3/4) c p(T) ≡ σ T⁴ ,   (5.5.18)

where σ = (3/4) cA, with p(T) = A T⁴ as we found above. From quantum statistical mechanical considerations, we have

σ = π² k_B⁴/(60 c² ℏ³) = 5.67 × 10⁻⁸ W/m²K⁴ ,   [stefan]

which is Stefan's constant.

Surface temperature of the earth


We derived the result P = σT⁴ · A, where σ = 5.67 × 10⁻⁸ W/m²K⁴, for the power emitted by an electromagnetic 'black body'. Let's apply this result to the earth-sun system. We'll need three lengths: the radius of the sun R_⊙ = 6.96 × 10⁸ m, the radius of the earth R_e = 6.38 × 10⁶ m, and the radius of the earth's orbit a_e = 1.50 × 10¹¹ m. Let's assume that the earth has achieved a steady state temperature of T_e. We balance the total power incident upon the earth with the power radiated by the earth. The power incident upon the earth is

P_incident = (πR_e²/4πa_e²) · σT_⊙⁴ · 4πR_⊙² = (R_e² R_⊙²/a_e²) · πσT_⊙⁴ .   (5.5.19)

The power radiated by the earth is

P_radiated = σT_e⁴ · 4πR_e² .   (5.5.20)

Setting P_incident = P_radiated, we obtain

T_e = (R_⊙/2a_e)^{1/2} T_⊙ .   (5.5.21)

Thus, we find T_e = 0.04817 T_⊙, and with T_⊙ = 5780 K, we obtain T_e = 278.4 K. The mean surface temperature of the earth is T̄_e = 287 K, which is only about 10 K higher. The difference is due to the fact that the earth is not a perfect blackbody, an object which absorbs all incident radiation upon it and emits radiation according to Stefan's law. As you know, the earth's atmosphere retraps a fraction of the emitted radiation – a phenomenon known as the greenhouse effect.
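The arithmetic of the radiative balance is easy to reproduce:

```python
# Steady-state earth temperature, Equation (5.5.21)
R_sun = 6.96e8        # m, solar radius
a_e   = 1.50e11       # m, orbital radius of the earth
T_sun = 5780.0        # K
T_e = (R_sun/(2*a_e))**0.5 * T_sun
print(round(T_e, 1))  # 278.4
```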
[planck] Spectral density ρ ε (ν, T) for blackbody radiation at three temperatures.

Distribution of blackbody radiation


Recall that the frequency of an electromagnetic wave of wavevector k is ν = c/λ = ck/2π. Therefore the number of photons N(ν, T) per unit frequency in thermodynamic equilibrium is (recall there are two polarization states)

N(ν, T) dν = (2V/8π³) · d³k/(e^{ℏck/k_B T} − 1) = (V/π²) · k² dk/(e^{ℏck/k_B T} − 1) .   (5.5.22)

We therefore have

N(ν, T) = (8πV/c³) · ν²/(e^{hν/k_B T} − 1) .   (5.5.23)

Since a photon of frequency ν carries energy hν, the energy per unit frequency E(ν) is

E(ν, T) = (8πhV/c³) · ν³/(e^{hν/k_B T} − 1) .   (5.5.24)

Note what happens if Planck's constant h vanishes, as it does in the classical limit. The denominator can then be written

e^{hν/k_B T} − 1 = hν/k_B T + O(h²)   (5.5.25)

and

E_CL(ν, T) = lim_{h→0} E(ν) = V · (8π k_B T/c³) ν² .

In classical electromagnetic theory, then, the total energy integrated over all frequencies diverges. This is known as the ultraviolet catastrophe, since the divergence comes from the large ν part of the integral, which in the optical spectrum is the ultraviolet portion. With quantization, the Bose-Einstein factor imposes an effective ultraviolet cutoff k_B T/h on the frequency integral, and the total energy, as we found above, is finite:

E(T) = ∫_0^∞ dν E(ν) = 3pV = V · π²(k_B T)⁴/(15 (ℏc)³) .   (5.5.26)

We can define the spectral density ρ_ε(ν) of the radiation as

ρ_ε(ν, T) ≡ E(ν, T)/E(T) = (15/π⁴) (h/k_B T) (hν/k_B T)³/(e^{hν/k_B T} − 1)   (5.5.27)

so that ρ_ε(ν, T) dν is the fraction of the electromagnetic energy, under equilibrium conditions, between frequencies ν and ν + dν, i.e. ∫_0^∞ dν ρ_ε(ν, T) = 1. We plot this in Figure [planck] for three different temperatures. The maximum occurs when s ≡ hν/k_B T satisfies

(d/ds)(s³/(e^s − 1)) = 0  ⟹  s/(1 − e^{−s}) = 3  ⟹  s = 2.82144 .   (5.5.28)
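The transcendental condition for the peak can be solved by the fixed-point iteration s ← 3(1 − e^{−s}), which converges because the map's slope 3e^{−s} is below one near the root:

```python
import math

# Solve s/(1 - exp(-s)) = 3 for the maximum of the spectral density
s = 3.0
for _ in range(100):
    s = 3.0*(1.0 - math.exp(-s))
print(round(s, 5))   # 2.82144
```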

What if the sun emitted ferromagnetic spin waves?

We saw in Equation [jephoton] that the power emitted per unit surface area by a blackbody is σT⁴. The power law here follows from the ultrarelativistic dispersion ε = ℏck of the photons. Suppose that we replace this dispersion with the general form ε = ε(k). Now consider a large box in equilibrium at temperature T. The energy current incident on a differential area dA of surface normal to ẑ is

dP = dA · ∫ d³k/(2π)³ Θ(cos θ) · ε(k) · (1/ℏ)(∂ε(k)/∂k_z) · 1/(e^{ε(k)/k_B T} − 1) .   (5.5.29)

Let us assume an isotropic power law dispersion of the form ε(k) = C k^α. Then after a straightforward calculation we obtain

dP/dA = σ T^{2 + 2/α} ,   (5.5.30)

where

σ = ζ(2 + 2/α) Γ(2 + 2/α) · (g k_B^{2+2/α} C^{−2/α})/(8π² ℏ) .   (5.5.31)

One can check that for g = 2, C = ℏc, and α = 1 that this result reduces to that of Equation [stefan].

This page titled 5.5: Photon Statistics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.6: The Ideal Bose Gas
Crystalline solids support propagating waves called phonons, which are quantized vibrations of the lattice. Recall that the quantum mechanical Hamiltonian for a single harmonic oscillator, Ĥ = p²/2m + (1/2) m ω_0² q², may be written as Ĥ = ℏω_0 (a†a + 1/2), where a and a† are 'ladder operators' satisfying commutation relations [a, a†] = 1.

One-dimensional chain
Consider the linear chain of masses and springs depicted in Figure [lchain]. We assume that our system consists of N mass points on a large ring of circumference L. In equilibrium, the masses are spaced evenly by a distance b = L/N. That is, x_n⁰ = nb is the equilibrium position of particle n. We define u_n = x_n − x_n⁰ to be the difference between the position of mass n and its equilibrium value. The Hamiltonian is then

Ĥ = ∑_n [ p_n²/2m + (1/2) κ (x_{n+1} − x_n − a)² ]
  = ∑_n [ p_n²/2m + (1/2) κ (u_{n+1} − u_n)² ] + (1/2) N κ (b − a)² ,

where a is the unstretched length of each spring, m is the mass of each mass point, κ is the force constant of each spring, and N is the total number of mass points. If b ≠ a the springs are under tension in equilibrium, but as we see this only leads to an additive constant in the Hamiltonian, and hence does not enter the equations of motion.
The classical equations of motion are

\[
\dot u_n = \frac{\partial \hat H}{\partial p_n} = \frac{p_n}{m}
\qquad,\qquad
\dot p_n = -\frac{\partial \hat H}{\partial u_n} = \kappa\,\big(u_{n+1}+u_{n-1}-2u_n\big)\ .
\]

Taking the time derivative of the first equation and substituting into the second yields

\[
\ddot u_n = \frac{\kappa}{m}\,\big(u_{n+1}+u_{n-1}-2u_n\big)\ . \tag{5.6.1}
\]

We now write

\[
u_n = \frac{1}{\sqrt{N}}\sum_k \tilde u_k\, e^{ikna}\ , \tag{5.6.2}
\]

where periodicity \(u_{N+n} = u_n\) requires that the k values are quantized so that \(e^{ikNa}=1\), i.e. \(k = 2\pi j/Na\) where j ∈ {0, 1, …, N−1}. The inverse of this discrete Fourier transform is

\[
\tilde u_k = \frac{1}{\sqrt{N}}\sum_n u_n\, e^{-ikna}\ . \tag{5.6.3}
\]

Note that \(\tilde u_k\) is in general complex, but that \(\tilde u_k^* = \tilde u_{-k}\). In terms of the \(\tilde u_k\), the equations of motion take the form

\[
\ddot{\tilde u}_k = -\frac{2\kappa}{m}\,\big(1-\cos(ka)\big)\,\tilde u_k \equiv -\omega_k^2\,\tilde u_k\ . \tag{5.6.4}
\]

Thus, each \(\tilde u_k\) is a normal mode, and the normal mode frequencies are

\[
\omega_k = 2\sqrt{\frac{\kappa}{m}}\ \Big|\sin\!\big(\tfrac{1}{2}ka\big)\Big|\ . \tag{5.6.5}
\]

The density of states for this band of phonon excitations is

\[
g(\varepsilon) = \int\limits_{-\pi/a}^{\pi/a}\!\frac{dk}{2\pi}\ \delta\big(\varepsilon-\hbar\omega_k\big)
= \frac{2}{\pi a}\,\big(J^2-\varepsilon^2\big)^{-1/2}\,\Theta(\varepsilon)\,\Theta(J-\varepsilon)\ ,
\]

where \(J = 2\hbar\sqrt{\kappa/m}\) is the phonon bandwidth. The step functions require 0 ≤ ε ≤ J; outside this range there are no phonon energy levels and the density of states accordingly vanishes.
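The normal-mode frequencies above are easy to verify numerically: diagonalize the dynamical matrix of a finite ring and compare with Equation 5.6.5. A minimal sketch with numpy (N = 64 and unit constants are arbitrary choices):

```python
import numpy as np

# Equation of motion u_ddot_n = -(kappa/m)(2 u_n - u_{n+1} - u_{n-1}) on an N-site ring;
# the eigenvalues of the dynamical matrix D are the squared frequencies omega_k^2.
N, kappa, m = 64, 1.0, 1.0
eye = np.eye(N)
D = (kappa/m) * (2*eye - np.roll(eye, 1, axis=0) - np.roll(eye, -1, axis=0))

# clip guards against tiny negative eigenvalues from floating-point roundoff
omega_numeric = np.sort(np.sqrt(np.clip(np.linalg.eigvalsh(D), 0, None)))

# Analytic result omega_k = 2 sqrt(kappa/m) |sin(ka/2)| at k = 2 pi j / (N a)
j = np.arange(N)
omega_exact = np.sort(2*np.sqrt(kappa/m) * np.abs(np.sin(np.pi*j/N)))

assert np.allclose(omega_numeric, omega_exact, atol=1e-10)
```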
[lchain] A linear chain of masses and springs. The black circles represent the equilibrium positions of the masses. The displacement of mass n relative to its equilibrium value is \(u_n\).

The entire theory can be quantized, taking \([p_n\,,u_{n'}] = -i\hbar\,\delta_{nn'}\). We then define

\[
p_n = \frac{1}{\sqrt{N}}\sum_k \tilde p_k\, e^{ikna}
\qquad,\qquad
\tilde p_k = \frac{1}{\sqrt{N}}\sum_n p_n\, e^{-ikna}\ , \tag{5.6.6}
\]

in which case \([\tilde p_k\,,\tilde u_{k'}] = -i\hbar\,\delta_{kk'}\). Note that \(\tilde u^\dagger_k = \tilde u_{-k}\) and \(\tilde p^\dagger_k = \tilde p_{-k}\). We then define the ladder operator

\[
a_k = \bigg(\frac{1}{2m\hbar\omega_k}\bigg)^{\!1/2}\tilde p_k - i\,\bigg(\frac{m\omega_k}{2\hbar}\bigg)^{\!1/2}\tilde u_k \tag{5.6.7}
\]

and its Hermitean conjugate \(a^\dagger_k\), in terms of which the Hamiltonian is

\[
\hat H = \sum_k \hbar\omega_k\,\Big(a^\dagger_k a_k + \tfrac{1}{2}\Big)\ , \tag{5.6.8}
\]

which is a sum over independent harmonic oscillator modes. Note that the sum over k is restricted to an interval of width 2π/a, namely k ∈ [−π/a, π/a], which is the first Brillouin zone for the one-dimensional chain structure. The state at wavevector k + 2π/a is identical to that at k, as we see from Equation [uFT].

General theory of lattice vibrations


The most general model of a harmonic solid is described by a Hamiltonian of the form

\[
\hat H = \sum_{\mathbf{R},i} \frac{\mathbf{p}_i^2(\mathbf{R})}{2M_i} + \frac{1}{2}\sum_{i,j}\sum_{\alpha,\beta}\sum_{\mathbf{R},\mathbf{R}'} u_i^\alpha(\mathbf{R})\,\Phi_{ij}^{\alpha\beta}(\mathbf{R}-\mathbf{R}')\,u_j^\beta(\mathbf{R}')\ , \tag{5.6.9}
\]

where the dynamical matrix is

\[
\Phi_{ij}^{\alpha\beta}(\mathbf{R}-\mathbf{R}') = \frac{\partial^2 U}{\partial u_i^\alpha(\mathbf{R})\,\partial u_j^\beta(\mathbf{R}')}\ , \tag{5.6.10}
\]

where U is the potential energy of interaction among all the atoms. Here we have simply expanded the potential to second order in the local displacements \(u_i^\alpha(\mathbf{R})\). The lattice sites R are elements of a Bravais lattice. The indices i and j specify basis elements with respect to this lattice, and the indices α and β range over {1, …, d}, the number of possible directions in space. The subject of crystallography is beyond the scope of these notes, but, very briefly, a Bravais lattice in d dimensions is specified by a set of d linearly independent primitive direct lattice vectors \(\mathbf{a}_l\), such that any point in the Bravais lattice may be written as a sum over the primitive vectors with integer coefficients: \(\mathbf{R} = \sum_{l=1}^d n_l\,\mathbf{a}_l\). The set of all such vectors {R} is called the direct lattice. The direct lattice is closed under the operation of vector addition: if R and R′ are points in a Bravais lattice, then so is R + R′.

[basis] A crystal structure with an underlying square Bravais lattice and a three element basis.
A crystal is a periodic arrangement of lattice sites. The fundamental repeating unit is called the unit cell. Not every crystal is a Bravais lattice, however. Indeed, Bravais lattices are special crystals in which there is only one atom per unit cell. Consider, for example, the structure in Figure [basis]. The blue dots form a square Bravais lattice with primitive direct lattice vectors \(\mathbf{a}_1 = a\,\hat x\) and \(\mathbf{a}_2 = a\,\hat y\), where a is the lattice constant, which is the distance between any neighboring pair of blue dots. The red squares and green triangles, along with the blue dots, form a three element basis for the crystal structure, with each type of symbol labeling a distinct sublattice. Our crystal in Figure [basis] is formally classified as a square Bravais lattice with a three element basis. To specify an arbitrary site in the crystal, we must specify both a direct lattice vector R as well as a basis index j ∈ {1, …, r}, so that the location is \(\mathbf{R} + \boldsymbol{\eta}_j\). The vectors \(\{\boldsymbol{\eta}_j\}\) are the basis vectors for our crystal structure. We see that a general crystal structure consists of a repeating unit, known as a unit cell. The centers (or corners, if one prefers) of the unit cells form a Bravais lattice. Within a given unit cell, the individual sublattice sites are located at positions \(\boldsymbol{\eta}_j\) with respect to the unit cell position R.

Upon diagonalization, the Hamiltonian of Equation [Hcrystal] takes the form

\[
\hat H = \sum_{\mathbf{k},a} \hbar\omega_a(\mathbf{k})\,\Big(A^\dagger_a(\mathbf{k})\,A_a(\mathbf{k}) + \tfrac{1}{2}\Big)\ , \tag{5.6.11}
\]

where

\[
\big[A_a(\mathbf{k})\,,\,A^\dagger_b(\mathbf{k}')\big] = \delta_{ab}\,\delta_{\mathbf{k}\mathbf{k}'}\ . \tag{5.6.12}
\]

The eigenfrequencies are solutions to the eigenvalue equation

\[
\sum_{j,\beta} \tilde\Phi_{ij}^{\alpha\beta}(\mathbf{k})\,e_{j\beta}^{(a)}(\mathbf{k}) = M_i\,\omega_a^2(\mathbf{k})\,e_{i\alpha}^{(a)}(\mathbf{k})\ , \tag{5.6.13}
\]

where

\[
\tilde\Phi_{ij}^{\alpha\beta}(\mathbf{k}) = \sum_{\mathbf{R}} \Phi_{ij}^{\alpha\beta}(\mathbf{R})\,e^{-i\mathbf{k}\cdot\mathbf{R}}\ . \tag{5.6.14}
\]

Here, k lies within the first Brillouin zone, which is the unit cell of the reciprocal lattice, i.e. the lattice of points G satisfying \(e^{i\mathbf{G}\cdot\mathbf{R}} = 1\) for all G and R. The reciprocal lattice is also a Bravais lattice, with primitive reciprocal lattice vectors \(\mathbf{b}_l\), such that any point on the reciprocal lattice may be written \(\mathbf{G} = \sum_{l=1}^d m_l\,\mathbf{b}_l\). One also has \(\mathbf{a}_l\cdot\mathbf{b}_{l'} = 2\pi\,\delta_{ll'}\). The index a ranges from 1 to d·r and labels the mode of oscillation at wavevector k. The vector \(e_{i\alpha}^{(a)}(\mathbf{k})\) is the polarization vector for the a-th phonon branch. In solids of high symmetry, phonon modes can be classified as longitudinal or transverse excitations.
For a crystalline lattice with an r-element basis, there are then d·r phonon modes for each wavevector k lying in the first Brillouin zone. If we impose periodic boundary conditions, then the k points within the first Brillouin zone are themselves quantized, as in the d = 1 case where we found k = 2πn/Na. There are N distinct k points in the first Brillouin zone – one for every direct lattice site. The total number of modes is then d·r·N, which is the total number of translational degrees of freedom in our system: rN total atoms (N unit cells each with an r atom basis), each free to vibrate in d dimensions. Of the d·r branches of phonon excitations, d of them will be acoustic modes whose frequency vanishes as k → 0. The remaining d(r−1) branches are optical modes and oscillate at finite frequencies. Basically, in an acoustic mode, for k close to the (Brillouin) zone center k = 0, all the atoms in each unit cell move together in the same direction at any moment of time. In an optical mode, the different basis atoms move in different directions.
There is no number conservation law for phonons – they may be freely created or destroyed in anharmonic processes, where two phonons with wavevectors k and q can combine into a single phonon with wavevector k + q, and vice versa. Therefore the chemical potential for phonons is μ = 0. We define the density of states \(g_a(\omega)\) for the a-th phonon mode as

\[
g_a(\omega) = \frac{1}{N}\sum_{\mathbf{k}} \delta\big(\omega-\omega_a(\mathbf{k})\big) = V_0\!\int\limits_{BZ}\!\frac{d^d\!k}{(2\pi)^d}\ \delta\big(\omega-\omega_a(\mathbf{k})\big)\ , \tag{5.6.15}
\]

where N is the number of unit cells, \(V_0\) is the unit cell volume of the direct lattice, and the k sum and integral are over the first Brillouin zone only. Note that ω here has dimensions of frequency. The function \(g_a(\omega)\) is normalized to unity:

\[
\int\limits_0^\infty\! d\omega\ g_a(\omega) = 1\ . \tag{5.6.16}
\]

The total phonon density of states per unit cell is given by

\[
g(\omega) = \sum_{a=1}^{dr} g_a(\omega)\ . \tag{5.6.17}
\]

[phonons] Upper panel: phonon spectrum in elemental rhodium (Rh) at T = 297 K measured by high precision inelastic neutron scattering (INS) by A. Eichler et al., Phys. Rev. B 57, 324 (1998). Note the three acoustic branches and no optical branches, corresponding to d = 3 and r = 1. Lower panel: phonon spectrum in gallium arsenide (GaAs) at T = 12 K, comparing theoretical lattice-dynamical calculations with INS results of D. Strauch and B. Dorner, J. Phys.: Condens. Matter 2, 1457 (1990). Note the three acoustic branches and three optical branches, corresponding to d = 3 and r = 2. The Greek letters along the x-axis indicate points of high symmetry in the Brillouin zone.
The grand potential for the phonon gas is

\[
\begin{aligned}
\Omega(T,V) &= -k_B T \ln \prod_{\mathbf{k},a}\,\sum_{n_a(\mathbf{k})=0}^{\infty} e^{-\beta\hbar\omega_a(\mathbf{k})\left(n_a(\mathbf{k})+\frac{1}{2}\right)} \\
&= k_B T \sum_{\mathbf{k},a} \ln\!\bigg[2\sinh\!\bigg(\frac{\hbar\omega_a(\mathbf{k})}{2k_B T}\bigg)\bigg] \\
&= N k_B T\!\int\limits_0^\infty\! d\omega\ g(\omega)\,\ln\!\bigg[2\sinh\!\bigg(\frac{\hbar\omega}{2k_B T}\bigg)\bigg]\ .
\end{aligned}
\]

Note that \(V = NV_0\), since there are N unit cells, each of volume \(V_0\). The entropy is given by \(S = -\big(\frac{\partial\Omega}{\partial T}\big)_V\), and thus the heat capacity is

\[
C_V = -T\,\frac{\partial^2\Omega}{\partial T^2} = N k_B\!\int\limits_0^\infty\! d\omega\ g(\omega)\,\bigg(\frac{\hbar\omega}{2k_B T}\bigg)^{\!2}\mathrm{csch}^2\!\bigg(\frac{\hbar\omega}{2k_B T}\bigg)\ . \tag{5.6.18}
\]

Note that as T → ∞ we have \(\mathrm{csch}\big(\frac{\hbar\omega}{2k_B T}\big) \to \frac{2k_B T}{\hbar\omega}\), and therefore

\[
\lim_{T\to\infty} C_V(T) = N k_B\!\int\limits_0^\infty\! d\omega\ g(\omega) = r d\,N k_B\ . \tag{5.6.19}
\]

This is the classical Dulong-Petit limit of \(\frac{1}{2}k_B\) per quadratic degree of freedom; there are rN atoms moving in d dimensions, hence d·rN positions and an equal number of momenta, resulting in a high temperature limit of \(C_V = rdN k_B\).

Einstein and Debye models


Historically, two models of lattice vibrations have received wide attention. First is the so-called Einstein model, in which there is no dispersion to the individual phonon modes. We approximate \(g_a(\omega) \approx \delta(\omega-\omega_a)\), in which case

\[
C_V(T) = N k_B \sum_a \bigg(\frac{\hbar\omega_a}{2k_B T}\bigg)^{\!2}\mathrm{csch}^2\!\bigg(\frac{\hbar\omega_a}{2k_B T}\bigg)\ . \tag{5.6.20}
\]

At low temperatures, the contribution from each branch vanishes exponentially, since \(\mathrm{csch}^2\big(\frac{\hbar\omega_a}{2k_B T}\big) \simeq 4\,e^{-\hbar\omega_a/k_B T} \to 0\). Real solids don't behave this way.

A more realistic model, due to Debye, accounts for the low-lying acoustic phonon branches. Since the acoustic phonon dispersion vanishes linearly with |k| as k → 0, there is no temperature at which the acoustic phonons 'freeze out' exponentially, as in the case of Einstein phonons. Indeed, the Einstein model is appropriate in describing the d(r−1) optical phonon branches, though it fails miserably for the acoustic branches.

In the vicinity of the zone center k = 0 (also called Γ in crystallographic notation) the d acoustic modes obey a linear dispersion, with \(\omega_a(\mathbf{k}) = c_a(\hat k)\,k\). This results in an acoustic phonon density of states in d = 3 dimensions of

\[
\tilde g(\omega) = \frac{V_0\,\omega^2}{2\pi^2}\sum_a \int\!\frac{d\hat k}{4\pi}\,\frac{1}{c_a^3(\hat k)}\ \Theta(\omega_D-\omega)
= \frac{3V_0}{2\pi^2\bar c^3}\,\omega^2\,\Theta(\omega_D-\omega)\ ,
\]

where c̄ is an average acoustic phonon velocity (i.e. the speed of sound) defined by

\[
\frac{3}{\bar c^3} = \sum_a \int\!\frac{d\hat k}{4\pi}\,\frac{1}{c_a^3(\hat k)} \tag{5.6.21}
\]

and \(\omega_D\) is a cutoff known as the Debye frequency. The cutoff is necessary because the phonon branch does not extend forever, but only to the boundaries of the Brillouin zone. Thus, \(\omega_D\) should roughly be equal to the energy of a zone boundary phonon. Alternatively, we can define \(\omega_D\) by the normalization condition

\[
\int\limits_0^\infty\! d\omega\ \tilde g(\omega) = 3 \quad\Longrightarrow\quad \omega_D = \big(6\pi^2/V_0\big)^{1/3}\,\bar c\ .
\]

This allows us to write \(\tilde g(\omega) = \big(9\omega^2/\omega_D^3\big)\,\Theta(\omega_D-\omega)\).


The specific heat due to the acoustic phonons is then

\[
\begin{aligned}
C_V(T) &= \frac{9Nk_B}{\omega_D^3}\!\int\limits_0^{\omega_D}\! d\omega\ \omega^2\,\bigg(\frac{\hbar\omega}{2k_B T}\bigg)^{\!2}\mathrm{csch}^2\!\bigg(\frac{\hbar\omega}{2k_B T}\bigg) \\
&= 9Nk_B\,\bigg(\frac{2T}{\Theta_D}\bigg)^{\!3}\phi\big(\Theta_D/2T\big)\ ,
\end{aligned}
\]

where \(\Theta_D = \hbar\omega_D/k_B\) is the Debye temperature and

\[
\phi(x) = \int\limits_0^x\! dt\ t^4\,\mathrm{csch}^2 t =
\begin{cases} \tfrac{1}{3}\,x^3 & x\to 0 \\[6pt] \tfrac{\pi^4}{30} & x\to\infty\ . \end{cases} \tag{5.6.22}
\]

Therefore,

\[
C_V(T) =
\begin{cases} \dfrac{12\pi^4}{5}\,Nk_B\,\bigg(\dfrac{T}{\Theta_D}\bigg)^{\!3} & T \ll \Theta_D \\[10pt] 3Nk_B & T \gg \Theta_D\ . \end{cases} \tag{5.6.23}
\]

Thus, the heat capacity due to acoustic phonons obeys the Dulong-Petit rule in that \(C_V(T\to\infty) = 3Nk_B\), corresponding to the three acoustic degrees of freedom per unit cell. The remaining contribution of \(3(r-1)Nk_B\) to the high temperature heat capacity comes from the optical modes not considered in the Debye model. The low temperature \(T^3\) behavior of the heat capacity of crystalline solids is a generic feature, and its detailed description is a triumph of the Debye model.
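The crossover between the two limits of Equation 5.6.23 can be seen by evaluating the Debye integral directly. A sketch using scipy quadrature (the Debye temperature of copper, Θ_D = 347 K, is just an illustrative choice):

```python
import numpy as np
from scipy.integrate import quad

def debye_cv(T, theta_D):
    """Acoustic-phonon heat capacity per unit cell in units of k_B:
    C_V/(N k_B) = 9 (2T/Theta_D)^3 phi(Theta_D/2T), phi(x) = int_0^x t^4 csch^2(t) dt."""
    x = theta_D / (2*T)
    phi, _ = quad(lambda t: t**4 / np.sinh(t)**2, 0, x)
    return 9 * (2*T/theta_D)**3 * phi

theta_D = 347.0  # copper
# Low-T limit: (12 pi^4/5)(T/Theta_D)^3 ; high-T Dulong-Petit limit: 3
assert np.isclose(debye_cv(5.0, theta_D), (12*np.pi**4/5)*(5.0/theta_D)**3, rtol=1e-3)
assert np.isclose(debye_cv(100*theta_D, theta_D), 3.0, rtol=1e-3)
```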

Melting and the Lindemann criterion


Atomic fluctuations in a crystal
For the one-dimensional chain, Equation [siva] gives

\[
\tilde u_k = i\,\bigg(\frac{\hbar}{2m\omega_k}\bigg)^{\!1/2}\big(a_k - a^\dagger_{-k}\big)\ . \tag{5.6.24}
\]

Therefore the RMS fluctuations at each site are given by

\[
\begin{aligned}
\langle u_n^2\rangle &= \frac{1}{N}\sum_k \langle\,\tilde u_k\,\tilde u_{-k}\,\rangle \\
&= \frac{1}{N}\sum_k \frac{\hbar}{m\omega_k}\,\Big(n(k)+\tfrac{1}{2}\Big)\ ,
\end{aligned}
\]

where \(n(k,T) = \big[\exp(\hbar\omega_k/k_B T)-1\big]^{-1}\) is the Bose occupancy function.
[debtab] Debye temperatures (at T = 0) and melting points for some common elements (carbon is assumed to be diamond and not graphite). (Source: the internet!)

Element       Ag    Al    Au    C     Cd    Cr    Cu    Fe    Mn
Θ_D (K)       227   433   162   2250  210   606   347   477   409
T_melt (K)    962   660   1064  3500  321   1857  1083  1535  1245

Element       Ni    Pb    Pt    Si    Sn    Ta    Ti    W     Zn
Θ_D (K)       477   105   237   645   199   246   420   383   329
T_melt (K)    1453  327   1772  1410  232   2996  1660  3410  420

Let us now generalize this expression to the case of a d-dimensional solid. The appropriate expression for the RMS position fluctuations of the i-th basis atom in each unit cell is

\[
\langle\,\mathbf{u}_i^2(\mathbf{R})\,\rangle = \frac{1}{N}\sum_{\mathbf{k}}\sum_{a=1}^{dr} \frac{\hbar}{M_{ia}(\mathbf{k})\,\omega_a(\mathbf{k})}\,\Big(n_a(\mathbf{k})+\tfrac{1}{2}\Big)\ . \tag{5.6.25}
\]

Here we sum over all wavevectors k in the first Brillouin zone, and over all normal modes a. There are d·r normal modes per unit cell, i.e. d·r branches of the phonon dispersion \(\omega_a(\mathbf{k})\). (For the one-dimensional chain with d = 1 and r = 1 there was only one such branch to consider.) Note also the quantity \(M_{ia}(\mathbf{k})\), which has units of mass and is defined in terms of the polarization vectors \(e_{i\mu}^{(a)}(\mathbf{k})\) as

\[
\frac{1}{M_{ia}(\mathbf{k})} = \sum_{\mu=1}^{d} \big|\,e_{i\mu}^{(a)}(\mathbf{k})\,\big|^2\ . \tag{5.6.26}
\]

The dimensions of the polarization vector are \([\mathrm{mass}]^{-1/2}\), since the generalized orthonormality condition on the normal modes is

\[
\sum_{i,\mu} M_i\,e_{i\mu}^{(a)*}(\mathbf{k})\,e_{i\mu}^{(b)}(\mathbf{k}) = \delta^{ab}\ , \tag{5.6.27}
\]

where \(M_i\) is the mass of the atom of species i within the unit cell (i ∈ {1, …, r}). For our purposes we can replace \(M_{ia}(\mathbf{k})\) by an appropriately averaged quantity which we call \(M_i\); this 'effective mass' is then independent of the mode index a as well as the wavevector k. We may then write

\[
\langle\,\mathbf{u}_i^2\,\rangle \approx \int\limits_0^\infty\! d\omega\ g(\omega)\,\frac{\hbar}{M_i\,\omega}\cdot\bigg\{\frac{1}{e^{\hbar\omega/k_B T}-1}+\frac{1}{2}\bigg\}\ , \tag{5.6.28}
\]

where we have dropped the site label R since translational invariance guarantees that the fluctuations are the same from one unit cell to the next. Note that the fluctuations \(\langle\mathbf{u}_i^2\rangle\) can be divided into a temperature-dependent part \(\langle\mathbf{u}_i^2\rangle_{th}\) and a temperature-independent quantum contribution \(\langle\mathbf{u}_i^2\rangle_{qu}\), where

\[
\begin{aligned}
\langle\,\mathbf{u}_i^2\,\rangle_{th} &= \frac{\hbar}{M_i}\!\int\limits_0^\infty\! d\omega\ \frac{g(\omega)}{\omega}\cdot\frac{1}{e^{\hbar\omega/k_B T}-1} \\
\langle\,\mathbf{u}_i^2\,\rangle_{qu} &= \frac{\hbar}{2M_i}\!\int\limits_0^\infty\! d\omega\ \frac{g(\omega)}{\omega}\ .
\end{aligned}
\]

Let's evaluate these contributions within the Debye model, where we replace g(ω) by

\[
\bar g(\omega) = \frac{d^2\,\omega^{d-1}}{\omega_D^d}\,\Theta(\omega_D-\omega)\ .
\]

We then find

\[
\begin{aligned}
\langle\,\mathbf{u}_i^2\,\rangle_{th} &= \frac{d^2\hbar}{M_i\,\omega_D}\,\bigg(\frac{k_B T}{\hbar\omega_D}\bigg)^{\!d-1} F_d\big(\hbar\omega_D/k_B T\big) \\
\langle\,\mathbf{u}_i^2\,\rangle_{qu} &= \frac{d^2}{d-1}\cdot\frac{\hbar}{2M_i\,\omega_D}\ ,
\end{aligned}
\]

where

\[
F_d(x) = \int\limits_0^x\! ds\ \frac{s^{d-2}}{e^s-1} =
\begin{cases} \dfrac{x^{d-2}}{d-2} & x\to 0 \\[10pt] \zeta(d-1) & x\to\infty\ . \end{cases} \tag{5.6.29}
\]
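Both limits of \(F_d(x)\) are easy to confirm numerically; a minimal sketch using scipy:

```python
import numpy as np
from scipy.integrate import quad

def F(d, x):
    """F_d(x) = int_0^x ds s^(d-2)/(e^s - 1), as in Equation 5.6.29."""
    val, _ = quad(lambda s: s**(d-2)/np.expm1(s), 0, x)
    return val

# d = 3: F_3(x) -> x as x -> 0, and F_3(x) -> zeta(2) = pi^2/6 as x -> infinity
assert np.isclose(F(3, 1e-4), 1e-4, rtol=1e-2)
assert np.isclose(F(3, 60.0), np.pi**2/6, rtol=1e-6)
```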

We can now extract from these expressions several important conclusions:

- The T = 0 contribution to the fluctuations, \(\langle\mathbf{u}_i^2\rangle_{qu}\), diverges in d = 1 dimensions. Therefore there are no one-dimensional quantum solids.

- The thermal contribution to the fluctuations, \(\langle\mathbf{u}_i^2\rangle_{th}\), diverges for any T > 0 whenever d ≤ 2. This is because the integrand of \(F_d(x)\) goes as \(s^{d-3}\) as s → 0. Therefore, there are no two-dimensional classical solids.

- Both the above conclusions are valid in the thermodynamic limit. Finite size imposes a cutoff on the frequency integrals, because there is a smallest wavevector \(k_{min} \sim 2\pi/L\), where L is the (finite) linear dimension of the system. This leads to a low frequency cutoff \(\omega_{min} = 2\pi\bar c/L\), where c̄ is the appropriately averaged acoustic phonon velocity from Equation [DMcave], which mitigates any divergences.

Lindemann melting criterion


An old phenomenological theory of melting due to Lindemann says that a crystalline solid melts when the RMS fluctuations in the atomic positions exceed a certain fraction η of the lattice constant a. We therefore define the ratios

\[
\begin{aligned}
x_{i,th}^2 &\equiv \frac{\langle\,\mathbf{u}_i^2\,\rangle_{th}}{a^2} = d^2\cdot\bigg(\frac{\hbar^2}{M_i\,a^2\,k_B}\bigg)\cdot\frac{T^{d-1}}{\Theta_D^d}\cdot F_d(\Theta_D/T) \\
x_{i,qu}^2 &\equiv \frac{\langle\,\mathbf{u}_i^2\,\rangle_{qu}}{a^2} = \frac{d^2}{d-1}\cdot\frac{\hbar^2}{2M_i\,a^2\,k_B\,\Theta_D}\ ,
\end{aligned}
\]

with \(x_i = \sqrt{x_{i,th}^2 + x_{i,qu}^2} = \sqrt{\langle\,\mathbf{u}_i^2\,\rangle}\big/a\).

Let's now work through an example of a three-dimensional solid. We'll assume a single element basis (r = 1). We have that

\[
\frac{9\hbar^2/4k_B}{1\ \mathrm{amu}\ \mathrm{\AA}^2} = 109\ \mathrm{K}\ . \tag{5.6.30}
\]

According to table [debtab], the melting temperature always exceeds the Debye temperature, and often by a great amount. We therefore assume \(T \gg \Theta_D\), which puts us in the small-x limit of \(F_d(x)\). We then find

\[
x_{qu}^2 = \frac{\Theta^\star}{\Theta_D}
\qquad,\qquad
x_{th}^2 = \frac{\Theta^\star}{\Theta_D}\cdot\frac{4T}{\Theta_D}
\qquad,\qquad
x = \sqrt{\bigg(1+\frac{4T}{\Theta_D}\bigg)\frac{\Theta^\star}{\Theta_D}}\ ,
\]

where

\[
\Theta^\star = \frac{109\ \mathrm{K}}{M[\mathrm{amu}]\cdot\big(a[\mathrm{\AA}]\big)^2}\ . \tag{5.6.31}
\]

The total position fluctuation is of course the sum \(x_i^2 = x_{i,th}^2 + x_{i,qu}^2\). Consider for example the case of copper, with M = 56 amu and a = 2.87 Å. The Debye temperature is \(\Theta_D = 347\) K. From this we find \(x_{qu} = 0.026\), which says that at T = 0 the RMS fluctuations of the atomic positions are not quite three percent of the lattice spacing (i.e. the distance between neighboring copper atoms). At room temperature, T = 293 K, one finds \(x_{th} = 0.048\), which is about twice as large as the quantum contribution. How big are the atomic position fluctuations at the melting point? According to our table, \(T_{melt} = 1083\) K for copper, and from our formulae we obtain \(x(T_{melt}) = 0.096\). The Lindemann criterion says that solids melt when \(x(T_{melt}) \approx 0.1\).

We were very lucky to hit the magic number \(x(T_{melt}) = 0.1\) with copper. Let's try another example. Lead has M = 208 amu and a = 4.95 Å. The Debye temperature is \(\Theta_D = 105\) K ('soft phonons'), and the melting point is \(T_{melt} = 327\) K. From these data we obtain x(T = 0) = 0.014, x(293 K) = 0.050 and x(T = 327 K) = 0.053. Same ballpark.
We can turn the analysis around and predict a melting temperature based on the Lindemann criterion \(x(T_{melt}) = \eta\), where η ≈ 0.1. We obtain

\[
T_L = \bigg(\frac{\eta^2\,\Theta_D}{\Theta^\star}-1\bigg)\cdot\frac{\Theta_D}{4}\ .
\]

We call \(T_L\) the Lindemann temperature. Most treatments of the Lindemann criterion ignore the quantum correction, which gives the −1 contribution inside the above parentheses. But if we are more careful and include it, we see that it may be possible to have \(T_L < 0\). This occurs for any crystal where \(\Theta_D < \Theta^\star/\eta^2\).
Consider for example the case of ⁴He, which at atmospheric pressure condenses into a liquid at \(T_c = 4.2\) K and remains in the liquid state down to absolute zero. At p = 1 atm, it never solidifies! Why? The number density of liquid ⁴He at p = 1 atm and T = 0 K is \(2.2\times 10^{22}\ \mathrm{cm}^{-3}\). Let's say the helium atoms want to form a crystalline lattice. We don't know a priori what the lattice structure will be, so let's for the sake of simplicity assume a simple cubic lattice. From the number density we obtain a lattice spacing of a = 3.57 Å. OK, now what do we take for the Debye temperature? Theoretically this should depend on the microscopic force constants which enter the small oscillations problem (i.e. the spring constants between pairs of helium atoms in equilibrium). We'll use the expression we derived for the Debye frequency, \(\omega_D = (6\pi^2/V_0)^{1/3}\,\bar c\), where \(V_0\) is the unit cell volume. We'll take c̄ = 238 m/s, which is the speed of sound in liquid helium at T = 0. This gives \(\Theta_D = 19.8\) K. We find \(\Theta^\star = 2.13\) K, and if we take η = 0.1 this gives \(\Theta^\star/\eta^2 = 213\) K, which significantly exceeds \(\Theta_D\). Thus, the solid should melt because the RMS fluctuations in the atomic positions at absolute zero are huge: \(x_{qu} = (\Theta^\star/\Theta_D)^{1/2} = 0.33\). By applying pressure, one can get ⁴He to crystallize above p = 25 atm (at absolute zero). Under pressure, the unit cell volume \(V_0\) decreases and the phonon velocity c̄ increases, so the Debye temperature itself increases.
It is important to recognize that the Lindemann criterion does not provide us with a theory of melting per se. Rather it provides us with a heuristic which allows us to predict roughly when a solid
should melt.

Goldstone bosons
The vanishing of the acoustic phonon dispersion at k = 0 is a consequence of Goldstone's theorem, which says that associated with every broken generator of a continuous symmetry there is an associated bosonic gapless excitation (i.e. one whose frequency ω vanishes in the long wavelength limit). In the case of phonons, the 'broken generators' are the symmetries under spatial translation in the x, y, and z directions. The crystal selects a particular location for its center-of-mass, which breaks this symmetry. There are, accordingly, three gapless acoustic phonons.
Magnetic materials support another branch of elementary excitations known as spin waves, or magnons. In isotropic magnets, there is a global symmetry associated with rotations in internal spin
space, described by the group SU (2). If the system spontaneously magnetizes, meaning there is long-ranged ferromagnetic order (↑↑↑ ⋯), or long-ranged antiferromagnetic order (↑↓↑↓ ⋯), then
global spin rotation symmetry is broken. Typically a particular direction is chosen for the magnetic moment (or staggered moment, in the case of an antiferromagnet). Symmetry under rotations about
this axis is then preserved, but rotations which do not preserve the selected axis are ‘broken’. In the most straightforward case, that of the antiferromagnet, there are two such rotations for SU (2), and
concomitantly two gapless magnon branches, with linearly vanishing dispersions ω (k) . The situation is more subtle in the case of ferromagnets, because the total magnetization is conserved by the
a

dynamics (unlike the total staggered magnetization in the case of antiferromagnets). Another wrinkle arises if there are long-ranged interactions present.
For our purposes, we can safely ignore the deep physical reasons underlying the gaplessness of Goldstone bosons and simply posit a gapless dispersion relation of the form \(\omega(\mathbf{k}) = A\,|\mathbf{k}|^\sigma\). The density of states for this excitation branch is then

\[
g(\omega) = \mathcal{C}\,\omega^{\frac{d}{\sigma}-1}\,\Theta(\omega_c-\omega)\ , \tag{5.6.32}
\]

where \(\mathcal{C}\) is a constant and \(\omega_c\) is the cutoff, which is the bandwidth for this excitation branch. Normalizing the density of states for this branch results in the identification \(\omega_c = (d/\sigma\mathcal{C})^{\sigma/d}\).
The heat capacity is then found to be

\[
\begin{aligned}
C_V &= N k_B\,\mathcal{C}\!\int\limits_0^{\omega_c}\! d\omega\ \omega^{\frac{d}{\sigma}-1}\,\bigg(\frac{\hbar\omega}{2k_B T}\bigg)^{\!2}\mathrm{csch}^2\!\bigg(\frac{\hbar\omega}{2k_B T}\bigg) \\
&= N k_B\,\frac{d}{\sigma}\,\bigg(\frac{2T}{\Theta}\bigg)^{\!d/\sigma}\phi\big(\Theta/2T\big)\ ,
\end{aligned}
\]
where \(\Theta = \hbar\omega_c/k_B\) and

\[
\phi(x) = \int\limits_0^x\! dt\ t^{\frac{d}{\sigma}+1}\,\mathrm{csch}^2 t =
\begin{cases} \dfrac{\sigma}{d}\,x^{d/\sigma} & x\to 0 \\[10pt] 2^{-d/\sigma}\,\Gamma\big(2+\tfrac{d}{\sigma}\big)\,\zeta\big(1+\tfrac{d}{\sigma}\big) & x\to\infty\ , \end{cases} \tag{5.6.33}
\]

which is a generalization of our earlier results. Once again, we recover Dulong-Petit for \(k_B T \gg \hbar\omega_c\), with \(C_V(T \gg \hbar\omega_c/k_B) = N k_B\).

In an isotropic ferromagnet, i.e. a ferromagnetic material where there is full SU(2) symmetry in internal 'spin' space, the magnons have a \(k^2\) dispersion. Thus, a bulk three-dimensional isotropic ferromagnet will exhibit a heat capacity due to spin waves which behaves as \(T^{3/2}\) at low temperatures. For sufficiently low temperatures this will overwhelm the phonon contribution, which behaves as \(T^3\).

This page titled 5.6: The Ideal Bose Gas is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.7: The Ideal Fermi Gas
General formulation for noninteracting systems
Recall that the grand partition function for noninteracting bosons is given by

\[
\Xi = \prod_\alpha \Bigg(\sum_{n_\alpha=0}^{\infty} e^{\beta(\mu-\varepsilon_\alpha)\,n_\alpha}\Bigg) = \prod_\alpha \Big(1-e^{\beta(\mu-\varepsilon_\alpha)}\Big)^{-1}\ . \tag{5.7.1}
\]

In order for the sum to converge to the RHS above, we must have μ < ε_α for all single-particle states |α⟩. The density of particles is then

\[
n(T,\mu) = -\frac{1}{V}\bigg(\frac{\partial\Omega}{\partial\mu}\bigg)_{\!T,V} = \frac{1}{V}\sum_\alpha \frac{1}{e^{\beta(\varepsilon_\alpha-\mu)}-1} = \int\limits_{\varepsilon_0}^\infty\! d\varepsilon\ \frac{g(\varepsilon)}{e^{\beta(\varepsilon-\mu)}-1}\ , \tag{5.7.2}
\]

where \(g(\varepsilon) = \frac{1}{V}\sum_\alpha \delta(\varepsilon-\varepsilon_\alpha)\) is the density of single particle states per unit volume. We assume that g(ε) = 0 for ε < ε₀; typically ε₀ = 0, as is the case for any dispersion of the form \(\varepsilon(\mathbf{k}) = A|\mathbf{k}|^r\), for example. However, in the presence of a magnetic field, we could have \(\varepsilon(\mathbf{k},\sigma) = A|\mathbf{k}|^r - g\mu_0 H\sigma\), in which case \(\varepsilon_0 = -g\mu_0|H|\).

Clearly n(T, μ) is an increasing function of both T and μ. At fixed T, the maximum possible value for n(T, μ), called the critical density \(n_c(T)\), is achieved for μ = ε₀:

\[
n_c(T) = \int\limits_{\varepsilon_0}^\infty\! d\varepsilon\ \frac{g(\varepsilon)}{e^{\beta(\varepsilon-\varepsilon_0)}-1}\ . \tag{5.7.3}
\]

The above integral converges provided g(ε₀) = 0, assuming g(ε) is continuous. If g(ε₀) > 0, the integral diverges, and \(n_c(T) = \infty\). In this latter case, one can always invert the equation for n(T, μ) to obtain the chemical potential μ(T, n). In the former case, where \(n_c(T)\) is finite, we have a problem – what happens if \(n > n_c(T)\)?

In the former case, where \(n_c(T)\) is finite, we can equivalently restate the problem in terms of a critical temperature \(T_c(n)\), defined by the equation \(n_c(T_c) = n\). For \(T < T_c\), we apparently can no longer invert to obtain μ(T, n), so clearly something has gone wrong.

wrong. The remedy is to recognize that the single particle energy levels are discrete, and separate out the contribution from the
lowest energy state ε . we write \[n(T,\mu) = \stackrel{n\ns_0}{\overbrace
0

} + \stackrel{n'}{\overbrace{\int\limits_{\ve\ns_0}^\infty\!\! d\ve\ {g(\ve)\over e^{\beta(\ve-\mu)}-1}}}\ ,\] where g is the 0

degeneracy of the single particle state with energy ε . We assume that n is finite, which means that N = V n is extensive. We
0 0 0 0

say that the particles have condensed into the state with energy ε . The quantity n is the condensate density. The remaining
0 0

particles, with density n , are said to comprise the overcondensate. With the total density n fixed, we have n = n + n . Note that

0

n finite means that μ is infinitesimally close to ε :


0 0

g g kB T
0 0
μ =ε − kB T ln(1 + ) ≈ε − . (5.7.4)
0 0
V n V n
0 0

Note also that if \(\varepsilon_0-\mu\) is finite, then \(n_0 \propto V^{-1}\) is infinitesimal.

Thus, for \(T < T_c(n)\), we have μ = ε₀ with \(n_0 > 0\), and

\[
n(T,n_0) = n_0 + \int\limits_{\varepsilon_0}^\infty\! d\varepsilon\ \frac{g(\varepsilon)}{e^{(\varepsilon-\varepsilon_0)/k_B T}-1}\ . \tag{5.7.5}
\]

For \(T > T_c(n)\), we have \(n_0 = 0\) and

\[
n(T,\mu) = \int\limits_{\varepsilon_0}^\infty\! d\varepsilon\ \frac{g(\varepsilon)}{e^{(\varepsilon-\mu)/k_B T}-1}\ . \tag{5.7.6}
\]

The equation for \(T_c(n)\) is

\[
n = \int\limits_{\varepsilon_0}^\infty\! d\varepsilon\ \frac{g(\varepsilon)}{e^{(\varepsilon-\varepsilon_0)/k_B T_c}-1}\ . \tag{5.7.7}
\]

For another take on ideal Bose gas condensation see the appendix in §10.

Ballistic dispersion
We already derived, in §3.3, expressions for n(T, z) and p(T, z) for the ideal Bose gas (IBG) with ballistic dispersion \(\varepsilon(\mathbf{p}) = \mathbf{p}^2/2m\). We found

\[
\begin{aligned}
n(T,z) &= g\,\lambda_T^{-d}\,\mathrm{Li}_{\frac{d}{2}}(z) \\
p(T,z) &= g\,k_B T\,\lambda_T^{-d}\,\mathrm{Li}_{\frac{d}{2}+1}(z)\ ,
\end{aligned}
\]

where g is the internal (e.g. spin) degeneracy of each single particle energy level. Here \(z = e^{\mu/k_B T}\) is the fugacity and

\[
\mathrm{Li}_s(z) = \sum_{m=1}^\infty \frac{z^m}{m^s} \tag{5.7.8}
\]

is the polylogarithm function. For bosons with a spectrum bounded below by ε₀ = 0, the fugacity takes values on the interval z ∈ [0, 1].

Clearly \(n(T,z) = g\,\lambda_T^{-d}\,\mathrm{Li}_{d/2}(z)\) is an increasing function of z for fixed T. In Figure [zeta] we plot the function \(\mathrm{Li}_s(z)\) versus z for three different values of s. We note that the maximum value \(\mathrm{Li}_s(z=1) = \zeta(s)\) is finite if s > 1. Thus, for d > 2, there is a maximum density \(n_{max}(T) = g\,\zeta(\frac{d}{2})\,\lambda_T^{-d}\), which is an increasing function of temperature T. Put another way, if we fix the density n, then there is a critical temperature \(T_c\) below which there is no solution to the equation n = n(T, z). The critical temperature \(T_c(n)\) is then determined by the relation

\[
n = g\,\zeta\!\big(\tfrac{d}{2}\big)\bigg(\frac{m k_B T_c}{2\pi\hbar^2}\bigg)^{\!d/2}
\quad\Longrightarrow\quad
k_B T_c = \frac{2\pi\hbar^2}{m}\,\bigg(\frac{n}{g\,\zeta(\frac{d}{2})}\bigg)^{\!2/d}\ . \tag{5.7.9}
\]

What happens for \(T < T_c\)?

[zeta] The polylogarithm function \(\mathrm{Li}_s(z)\) versus z for \(s = \frac{1}{2}\), \(s = \frac{3}{2}\), and \(s = \frac{5}{2}\). Note that \(\mathrm{Li}_s(1) = \zeta(s)\) diverges for s ≤ 1.

As shown above in §7, we must separate out the contribution from the lowest energy single particle mode, which for ballistic dispersion lies at ε₀ = 0. Thus we write

\[
n = \frac{1}{V}\,\frac{1}{z^{-1}-1} + \frac{1}{V}\sum_{\substack{\alpha \\ (\varepsilon_\alpha>0)}} \frac{1}{z^{-1}e^{\varepsilon_\alpha/k_B T}-1}\ , \tag{5.7.10}
\]

where we have taken g = 1. Now \(V^{-1}\) is of course very small, since V is thermodynamically large, but if μ → 0 then \(z^{-1}-1\) is also very small and their ratio can be finite, as we have seen. Indeed, if the density of k = 0 bosons \(n_0\) is finite, then their total number \(N_0\) satisfies

\[
N_0 = V n_0 = \frac{1}{z^{-1}-1} \quad\Longrightarrow\quad z = \frac{1}{1+N_0^{-1}}\ . \tag{5.7.11}
\]

The chemical potential is then

\[
\mu = k_B T \ln z = -k_B T \ln\!\big(1+N_0^{-1}\big) \approx -\frac{k_B T}{N_0} \to 0^-\ . \tag{5.7.12}
\]

In other words, the chemical potential is infinitesimally negative, because \(N_0\) is assumed to be thermodynamically large.

According to Equation [Oqsm], the contribution to the pressure from the k = 0 states is

\[
p_0 = -\frac{k_B T}{V}\,\ln(1-z) = \frac{k_B T}{V}\,\ln\!\big(1+N_0\big) \to 0^+\ . \tag{5.7.13}
\]

So the k = 0 bosons, which we identify as the condensate, contribute nothing to the pressure.
Having separated out the k = 0 mode, we can now replace the remaining sum over α by the usual integral over k. We then have

\[
T < T_c\ :\qquad
\begin{aligned}
n &= n_0 + g\,\zeta\!\big(\tfrac{d}{2}\big)\,\lambda_T^{-d} \\
p &= g\,\zeta\!\big(\tfrac{d}{2}+1\big)\,k_B T\,\lambda_T^{-d}
\end{aligned}
\]

and

\[
T > T_c\ :\qquad
\begin{aligned}
n &= g\,\mathrm{Li}_{\frac{d}{2}}(z)\,\lambda_T^{-d} \\
p &= g\,\mathrm{Li}_{\frac{d}{2}+1}(z)\,k_B T\,\lambda_T^{-d}\ .
\end{aligned}
\]
The condensate fraction \(n_0/n\) is unity at T = 0, when all particles are in the condensate with k = 0, and decreases with increasing T until \(T = T_c\), at which point it vanishes identically. Explicitly, we have

\[
\frac{n_0(T)}{n} = 1 - \frac{g\,\zeta(\frac{d}{2})}{n\,\lambda_T^d} = 1 - \bigg(\frac{T}{T_c(n)}\bigg)^{\!d/2}\ . \tag{5.7.14}
\]

Let us compute the internal energy E for the ideal Bose gas. We have

\[
\frac{\partial}{\partial\beta}(\beta\Omega) = \Omega + \beta\,\frac{\partial\Omega}{\partial\beta} = \Omega - T\,\frac{\partial\Omega}{\partial T} = \Omega + TS \tag{5.7.15}
\]

and therefore

\[
\begin{aligned}
E &= \Omega + TS + \mu N = \mu N + \frac{\partial}{\partial\beta}(\beta\Omega) \\
&= V\bigg(\mu\,n - \frac{\partial}{\partial\beta}(\beta p)\bigg) \\
&= \tfrac{1}{2}\,d\,g\,V\,k_B T\,\lambda_T^{-d}\,\mathrm{Li}_{\frac{d}{2}+1}(z)\ .
\end{aligned}
\]

This expression is valid at all temperatures, both above and below $T_c$. Note that the condensate particles do not contribute to $E$, because the $k=0$ state carries zero energy.

We now investigate the heat capacity $C_{V,N} = \big(\frac{\partial E}{\partial T}\big)_{V,N}$. Since we have been working in the GCE, it is very important to note that $N$ is held constant when computing $C_{V,N}$. We'll also restrict our attention to the case $d=3$, since the ideal Bose gas does not condense at finite $T$ for $d \le 2$ and $d > 3$ is unphysical. While we're at it, we'll also set $g=1$.
[ibgcv] Molar heat capacity of the ideal Bose gas (units of $R$). Note the cusp at $T = T_c$.
The number of particles is

$$N = \begin{cases} N_0 + \zeta\big(\tfrac{3}{2}\big)\,V\lambda_T^{-3} & (T < T_c) \\[1ex] V\lambda_T^{-3}\,\mathrm{Li}_{3/2}(z) & (T > T_c)\ , \end{cases} \tag{5.7.16}$$

and the energy is

$$E = \tfrac{3}{2}\,k_BT\,\frac{V}{\lambda_T^3}\,\mathrm{Li}_{5/2}(z)\ . \tag{5.7.17}$$

For $T < T_c$, we have $z=1$ and

$$C_{V,N} = \bigg(\frac{\partial E}{\partial T}\bigg)_{\!V,N} = \frac{15}{4}\,\zeta\big(\tfrac{5}{2}\big)\,k_B\,\frac{V}{\lambda_T^3}\ . \tag{5.7.18}$$

The molar heat capacity is therefore

$$c_{V,N}(T,n) = \frac{N_A}{N}\cdot C_{V,N} = \frac{15}{4}\,\zeta\big(\tfrac{5}{2}\big)\,R\cdot\big(n\lambda_T^3\big)^{-1}\ . \tag{5.7.19}$$

For $T > T_c$, we have

$$dE\big|_V = \frac{15}{4}\,k_BT\,\frac{V}{\lambda_T^3}\,\mathrm{Li}_{5/2}(z)\cdot\frac{dT}{T} + \frac{3}{2}\,k_BT\,\frac{V}{\lambda_T^3}\,\mathrm{Li}_{3/2}(z)\cdot\frac{dz}{z}\ , \tag{5.7.20}$$

where we have invoked Equation [zetarec]. Taking the differential of $N$, we have

$$dN\big|_V = \frac{3}{2}\,\frac{V}{\lambda_T^3}\,\mathrm{Li}_{3/2}(z)\cdot\frac{dT}{T} + \frac{V}{\lambda_T^3}\,\mathrm{Li}_{1/2}(z)\cdot\frac{dz}{z}\ . \tag{5.7.21}$$

We set $dN = 0$, which fixes $dz$ in terms of $dT$, resulting in

$$c_{V,N}(T,z) = \frac{3}{2}\,R\cdot\Bigg[\,\frac{\frac{5}{2}\,\mathrm{Li}_{5/2}(z)}{\mathrm{Li}_{3/2}(z)} - \frac{\frac{3}{2}\,\mathrm{Li}_{3/2}(z)}{\mathrm{Li}_{1/2}(z)}\,\Bigg]\ . \tag{5.7.22}$$

To obtain $c_{V,N}(T,n)$, we must invert the relation

$$n(T,z) = \lambda_T^{-3}\,\mathrm{Li}_{3/2}(z) \tag{5.7.23}$$

in order to obtain $z(T,n)$, and then insert this into Equation [ibgcg]. The results are shown in Figure [ibgcv]. There are several noteworthy features of this plot. First of all, by dimensional analysis the function $c_{V,N}(T,n)$ is $R$ times a function of the dimensionless ratio $T/T_c(n) \propto T\,n^{-2/3}$. Second, the high temperature limit is $\frac{3}{2}R$, which is the classical value. Finally, there is a cusp at $T = T_c(n)$.

For another example, see §11.
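Both limits of the molar heat capacity quoted above can be verified numerically. The sketch below evaluates the polylogarithms by direct series summation (adequate for $0 \le z < 1$) and implements the $T > T_c$ expression for $c_{V,N}(T,z)$; the function names are ours:

```python
import math

def polylog(nu, z, kmax=500000):
    """Li_nu(z) = sum_{k>=1} z^k / k^nu, by direct summation (0 <= z < 1)."""
    s, zk = 0.0, 1.0
    for k in range(1, kmax + 1):
        zk *= z
        term = zk / k ** nu
        s += term
        if term < 1e-16 * (1.0 + s):
            break
    return s

def c_over_R(z):
    """c_{V,N}/R above Tc: (3/2)[(5/2)Li_{5/2}/Li_{3/2} - (3/2)Li_{3/2}/Li_{1/2}]."""
    return 1.5 * (2.5 * polylog(2.5, z) / polylog(1.5, z)
                  - 1.5 * polylog(1.5, z) / polylog(0.5, z))

# z -> 0 recovers the classical value 3/2; z -> 1 approaches the cusp
# maximum (15/4) zeta(5/2)/zeta(3/2), which is about 1.93:
print(c_over_R(1e-8), c_over_R(0.999))
```

The rise of $c_{V,N}$ as $z \to 1^-$ is the approach, from above, to the cusp at $T = T_c$.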

Isotherms for the ideal Bose gas

Let $a$ be some length scale and define

$$v_a = a^3\ ,\qquad p_a = \frac{2\pi\hbar^2}{ma^5}\ ,\qquad T_a = \frac{2\pi\hbar^2}{ma^2 k_B}\ . \tag{5.7.24}$$

Then we have

$$\frac{v_a}{v} = \bigg(\frac{T}{T_a}\bigg)^{\!3/2}\mathrm{Li}_{3/2}(z) + v_a\,n_0\ ,\qquad \frac{p}{p_a} = \bigg(\frac{T}{T_a}\bigg)^{\!5/2}\mathrm{Li}_{5/2}(z)\ ,$$

where $v = V/N$ is the volume per particle and $n_0$ is the condensate number density; $n_0$ vanishes for $T \ge T_c$, where $z=1$. One identifies a critical volume $v_c(T)$ by setting $z=1$ and $n_0=0$, leading to $v_c(T) = \zeta^{-1}\big(\tfrac{3}{2}\big)\,v_a\,(T_a/T)^{3/2}$. For $v < v_c(T)$, we set $z=1$ in Equation [BGIa] to find a relation between $v$, $T$, and $n_0$. For $v > v_c(T)$, we set $n_0=0$ in Equation [BGIa] to relate $v$, $T$, and $z$. Note that the pressure is independent of volume for $T < T_c$: the isotherms in the $(p,v)$ plane are flat for $v < v_c$. This resembles the coexistence region familiar from our study of the thermodynamics of the liquid-gas transition. The situation is depicted in Figure [ibgpd]. In the $(T,p)$ plane, we identify $p_c(T) = \zeta\big(\tfrac{5}{2}\big)\,p_a\,(T/T_a)^{5/2}$ as the critical pressure at which condensation occurs.
[ibgpd] Phase diagrams for the ideal Bose gas. Left panel: $(p,v)$ plane. The solid blue curves are isotherms, and the green hatched region denotes $v < v_c(T)$, where the system is partially condensed. Right panel: $(p,T)$ plane. The solid red curve is the coexistence curve $p_c(T)$, along which Bose condensation occurs. No distinct thermodynamic phase exists in the yellow hatched region above $p = p_c(T)$.

Recall the Gibbs-Duhem equation,

$$d\mu = -s\,dT + v\,dp\ . \tag{5.7.25}$$

Along a coexistence curve, we have the Clausius-Clapeyron relation,

$$\bigg(\frac{dp}{dT}\bigg)_{\!\mathrm{coex}} = \frac{s_2 - s_1}{v_2 - v_1} = \frac{\ell}{T\,\Delta v}\ , \tag{5.7.26}$$

where $\ell = T(s_2 - s_1)$ is the latent heat per mole, and $\Delta v = v_2 - v_1$. For ideal gas Bose condensation, the coexistence curve resembles the red curve in the right hand panel of Figure [ibgpd]. There is no meaning to the shaded region where $p > p_c(T)$. Nevertheless, it is tempting to associate the curve $p = p_c(T)$ with the coexistence of the $k=0$ condensate and the remaining uncondensed ($k \ne 0$) bosons.
The entropy in the coexistence region is given by

$$s = -\frac{1}{N}\bigg(\frac{\partial\Omega}{\partial T}\bigg)_{\!V} = \frac{5}{2}\,\zeta\big(\tfrac{5}{2}\big)\,k_B\,v\,\lambda_T^{-3} = \frac{5}{2}\,\frac{\zeta\big(\frac{5}{2}\big)}{\zeta\big(\frac{3}{2}\big)}\,k_B\,\bigg(1 - \frac{n_0}{n}\bigg)\ . \tag{5.7.27}$$

All the entropy is thus carried by the uncondensed bosons, and the condensate carries zero entropy. The Clausius-Clapeyron relation can then be interpreted as describing a phase equilibrium between the condensate, for which $s_0 = v_0 = 0$, and the uncondensed bosons, for which $s' = s(T)$ and $v' = v_c(T)$. So this identification forces us to conclude that the specific volume of the condensate is zero. This is certainly false in an interacting Bose gas!


While one can identify, by analogy, a 'latent heat' $\ell = T\,\Delta s = Ts$ in the Clapeyron equation, it is important to understand that there is no distinct thermodynamic phase associated with the region $p > p_c(T)$. Ideal Bose gas condensation is a second order transition, and not a first order transition.


[He4PD] Phase diagram of $^4$He. All phase boundaries are first order transition lines, with the exception of the normal liquid-superfluid transition, which is second order. (Source: University of Helsinki)

The $\lambda$-transition in liquid $^4$He

Helium has two stable isotopes. $^4$He is a boson, consisting of two protons, two neutrons, and two electrons (hence an even number of fermions). $^3$He is a fermion, with one less neutron than $^4$He. Each $^4$He atom can be regarded as a tiny hard sphere of mass $m = 6.65\times 10^{-24}\,$g and diameter $a = 2.65\,$Å. A sketch of the phase diagram is shown in Figure [He4PD]. At atmospheric pressure, helium liquefies at $T_\ell = 4.2\,$K. The gas-liquid transition is first order, as usual. However, as one continues to cool, a second transition sets in at $T = T_\lambda = 2.17\,$K (at $p = 1\,$atm). The $\lambda$-transition, so named for the $\lambda$-shaped anomaly in the specific heat in the vicinity of the transition, as shown in Figure [cphelium], is continuous (second order).
If we pretend that $^4$He is a noninteracting Bose gas, then from the density of the liquid, $n = 2.2\times 10^{22}\,\mathrm{cm}^{-3}$, we obtain a Bose-Einstein condensation temperature $T_c = \frac{2\pi\hbar^2}{m k_B}\big(n/\zeta(\tfrac{3}{2})\big)^{2/3} = 3.16\,$K, which is in the right ballpark. The specific heat $C_p(T)$ is found to be singular at $T = T_\lambda$, with

$$C_p(T) = A\,\big|T - T_\lambda(p)\big|^{-\alpha}\ . \tag{5.7.28}$$

$\alpha$ is an example of a critical exponent. We shall study the physics of critical phenomena later on in this course. For now, note that a cusp singularity of the type found in Figure [ibgcv] corresponds to $\alpha = -1$. The behavior of $C_p(T)$ in $^4$He is very nearly logarithmic in $|T - T_\lambda|$. In fact, both theory (renormalization group on the O(2) model) and experiment concur that $\alpha$ is almost zero but in fact slightly negative, with $\alpha = -0.0127 \pm 0.0003$ in the best experiments (Lipa et al., 2003). The $\lambda$-transition is most definitely not an ideal Bose gas condensation. Theoretically, in the parlance of critical phenomena, IBG condensation and the $\lambda$-transition in $^4$He lie in different universality classes. Unlike the IBG, the condensed phase in $^4$He is a distinct thermodynamic phase, known as a superfluid.
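The $T_c$ estimate quoted above is easy to reproduce. The sketch below evaluates it in SI units (the physical constants are standard values; $\zeta(3/2)$ is hard-coded):

```python
import math

# Ideal-Bose-gas estimate of Tc for liquid 4He (SI units).
hbar = 1.0545718e-34            # J s
kB = 1.380649e-23               # J/K
m = 6.65e-27                    # kg, mass of a 4He atom (6.65e-24 g)
n = 2.2e28                      # m^-3 (= 2.2e22 cm^-3)
zeta_3_2 = 2.612375348685488    # zeta(3/2)

Tc = (2 * math.pi * hbar ** 2 / (m * kB)) * (n / zeta_3_2) ** (2.0 / 3.0)
print(round(Tc, 2), "K")        # ~3.1-3.2 K, vs. T_lambda = 2.17 K
```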


[cphelium] Specific heat of liquid $^4$He in the vicinity of the $\lambda$-transition. Data from M. J. Buckingham and W. M. Fairbank, in Progress in Low Temperature Physics, C. J. Gortner, ed. (North-Holland, 1961). Inset at upper right: more recent data of J. A. Lipa et al., Phys. Rev. B 68, 174518 (2003), performed in zero gravity earth orbit, to within $\Delta T = 2\,$nK of the transition.
Note that Cp (T < Tc ) for the IBG is not even defined, since for T < Tc we have p = p(T ) and therefore dp = 0 requires
dT = 0 .

Fountain effect in superfluid $^4$He

At temperatures $T < T_\lambda$, liquid $^4$He has a superfluid component which is a type of Bose condensate. In fact, there is an important difference between the condensate fraction $N_{k=0}/N$ and the superfluid density, which is denoted by the symbol $\rho_s$. In $^4$He, for example, at $T=0$ the condensate fraction is only about 8%, while the superfluid fraction is $\rho_s/\rho = 1$. The distinction between $N_0$ and $\rho_s$ is very interesting but lies beyond the scope of this course.


One aspect of the superfluid state is its complete absence of viscosity. For this reason, superfluids can flow through tiny cracks called microleaks that will not pass normal fluid. Consider then a porous plug which permits the passage of superfluid but not of normal fluid. The key feature of the superfluid component is that it has zero energy density. Therefore even though there is a transfer of particles across the plug, there is no energy exchange, and therefore a temperature gradient across the plug can be maintained.

The elementary excitations in the superfluid state are sound waves called phonons. They are compressional waves, just like longitudinal phonons in a solid, but here in a liquid. Their dispersion is acoustic, given by $\omega(k) = ck$, where $c = 238\,$m/s. They have no internal degrees of freedom, hence $g=1$. Like phonons in a solid, the phonons in liquid helium are not conserved. Hence their chemical potential vanishes and these excitations are described by photon statistics. We can now compute the height difference $\Delta h$ in a U-tube experiment.
[fountain] The fountain effect. In each case, a temperature gradient is maintained across a porous plug through which only superfluid can flow. This results in a pressure gradient which can result in a fountain or an elevated column in a U-tube.
Clearly $\Delta h = \Delta p/\rho g$, so we must find $p(T)$ for the helium. In the grand canonical ensemble, we have

$$\begin{aligned} p = -\Omega/V &= -k_BT \int\!\frac{d^3\!k}{(2\pi)^3}\,\ln\big(1 - e^{-\hbar ck/k_BT}\big) \\ &= -\frac{(k_BT)^4}{(\hbar c)^3}\,\frac{4\pi}{8\pi^3}\int\limits_0^\infty\! du\,u^2 \ln\big(1 - e^{-u}\big) = \frac{\pi^2}{90}\,\frac{(k_BT)^4}{(\hbar c)^3}\ . \end{aligned}$$

Let's assume $T = 1\,$K. We'll need the density of liquid helium, $\rho = 148\,\mathrm{kg/m^3}$. Then

$$\begin{aligned} \frac{dh}{dT} &= \frac{2\pi^2}{45}\bigg(\frac{k_BT}{\hbar c}\bigg)^{\!3}\frac{k_B}{\rho g} \\ &= \frac{2\pi^2}{45}\Bigg(\frac{(1.38\times 10^{-23}\,\mathrm{J/K})(1\,\mathrm{K})}{(1.055\times 10^{-34}\,\mathrm{J\cdot s})(238\,\mathrm{m/s})}\Bigg)^{\!3} \times \frac{1.38\times 10^{-23}\,\mathrm{J/K}}{(148\,\mathrm{kg/m^3})(9.8\,\mathrm{m/s^2})} \simeq 32\,\mathrm{cm/K}\ , \end{aligned}$$

a very noticeable effect!
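The scale of this effect can be checked numerically (a sketch using the sound speed and helium density quoted above; SI units):

```python
import math

hbar = 1.0545718e-34    # J s
kB = 1.380649e-23       # J/K
c = 238.0               # m/s, first-sound (phonon) speed in He II
rho = 148.0             # kg/m^3, density of liquid helium
g = 9.8                 # m/s^2
T = 1.0                 # K

# dp/dT = (2 pi^2/45) kB (kB T / hbar c)^3, from p = (pi^2/90)(kB T)^4/(hbar c)^3
dp_dT = (2 * math.pi ** 2 / 45) * kB * (kB * T / (hbar * c)) ** 3
dh_dT = dp_dT / (rho * g)
print(dp_dT, "Pa/K;", dh_dT, "m/K")
```

With these inputs the column-height slope $dh/dT$ comes out to a substantial fraction of a meter per kelvin — easily visible in a U-tube.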

Bose condensation in optical traps

The 2001 Nobel Prize in Physics was awarded to Wieman, Cornell, and Ketterle for the experimental observation of Bose condensation in dilute atomic gases. The experimental techniques required to trap and cool such systems are a true tour de force, and we shall not enter into a discussion of the details here.
The optical trapping of neutral bosonic atoms, such as $^{87}$Rb, results in a confining potential $V(\mathbf{r})$ which is quadratic in the atomic positions. Thus, the single particle Hamiltonian for a given atom is written

$$\hat{H} = -\frac{\hbar^2}{2m}\nabla^2 + \frac{1}{2}m\big(\omega_1^2 x^2 + \omega_2^2 y^2 + \omega_3^2 z^2\big)\ , \tag{5.7.29}$$

where $\omega_{1,2,3}$ are the angular frequencies of the trap. This is an anisotropic three-dimensional harmonic oscillator, the solution of which is separable into a product of one-dimensional harmonic oscillator wavefunctions. The eigenspectrum is then given by a sum of one-dimensional spectra, viz.

$$E_{n_1,n_2,n_3} = \big(n_1 + \tfrac{1}{2}\big)\hbar\omega_1 + \big(n_2 + \tfrac{1}{2}\big)\hbar\omega_2 + \big(n_3 + \tfrac{1}{2}\big)\hbar\omega_3\ . \tag{5.7.30}$$

According to Equation [Ntot], the number of particles in the system is

$$\begin{aligned} N &= \sum_{n_1=0}^\infty \sum_{n_2=0}^\infty \sum_{n_3=0}^\infty \Big[\,y^{-1}\,e^{n_1\hbar\omega_1/k_BT}\,e^{n_2\hbar\omega_2/k_BT}\,e^{n_3\hbar\omega_3/k_BT} - 1\Big]^{-1} \\ &= \sum_{k=1}^\infty y^k \bigg(\frac{1}{1 - e^{-k\hbar\omega_1/k_BT}}\bigg) \bigg(\frac{1}{1 - e^{-k\hbar\omega_2/k_BT}}\bigg) \bigg(\frac{1}{1 - e^{-k\hbar\omega_3/k_BT}}\bigg)\ , \end{aligned}$$

where we’ve defined


μ/ kB T −ℏω /2 kB T −ℏω /2 kB T −ℏω /2 kB T
y ≡e e 1
e 2
e 3
. (5.7.31)

Note that y ∈ [0, 1].


Let's assume that the trap is approximately isotropic, which entails that the frequency ratios $\omega_1/\omega_2$, etc. are all numbers on the order of one. Let us further assume that $k_BT \gg \hbar\omega_{1,2,3}$. Then

$$\frac{1}{1 - e^{-k\hbar\omega_j/k_BT}} \approx \begin{cases} \dfrac{k_BT}{k\hbar\omega_j} & k < k^*(T) \\[1.5ex] 1 & k > k^*(T)\ , \end{cases} \tag{5.7.32}$$

where $k^*(T) = k_BT/\hbar\bar\omega \gg 1$, with

$$\bar\omega = \big(\omega_1\,\omega_2\,\omega_3\big)^{1/3}\ . \tag{5.7.33}$$

We then have

$$N(T,y) \approx \frac{y^{k^*+1}}{1-y} + \bigg(\frac{k_BT}{\hbar\bar\omega}\bigg)^{\!3} \sum_{k=1}^{k^*} \frac{y^k}{k^3}\ , \tag{5.7.34}$$

where the first term on the RHS is due to $k > k^*$ and the second term from $k \le k^*$ in the previous sum. Since $k^* \gg 1$ and since the sum of inverse cubes is convergent, we may safely extend the limit on the above sum to infinity. To help make more sense of the first term, write $N_0 = \big(y^{-1} - 1\big)^{-1}$ for the number of particles in the $(n_1,n_2,n_3) = (0,0,0)$ state. Then

$$y = \frac{N_0}{N_0 + 1}\ . \tag{5.7.35}$$

This is true always. The issue vis-a-vis Bose-Einstein condensation is whether $N_0 \gg 1$. At any rate, we now see that we can write

$$N \approx N_0\big(1 + N_0^{-1}\big)^{-k^*} + \bigg(\frac{k_BT}{\hbar\bar\omega}\bigg)^{\!3}\mathrm{Li}_3(y)\ . \tag{5.7.36}$$

As for the first term, we have

$$N_0\big(1 + N_0^{-1}\big)^{-k^*} = \begin{cases} 0 & N_0 \ll k^* \\[1ex] N_0 & N_0 \gg k^*\ . \end{cases} \tag{5.7.37}$$

Thus, as in the case of IBG condensation of ballistic particles, we identify the critical temperature by the condition $y = N_0/(N_0+1) \approx 1$, and we have

$$T_c = \frac{\hbar\bar\omega}{k_B}\,\bigg(\frac{N}{\zeta(3)}\bigg)^{\!1/3} = 4.5\,\bigg(\frac{\bar\nu}{100\,\mathrm{Hz}}\bigg)\,N^{1/3}\ \ [\,\mathrm{nK}\,]\ ,$$

where $\bar\nu = \bar\omega/2\pi$. We see that $k_B T_c \gg \hbar\bar\omega$ if the number of particles in the trap is large: $N \gg 1$.
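This trap $T_c$ formula is simple to evaluate; the sketch below (function name ours) reproduces the $\sim 4.5\,$nK$\,\cdot N^{1/3}$ scale for $\bar\nu = 100\,$Hz:

```python
import math

hbar = 1.0545718e-34          # J s
kB = 1.380649e-23             # J/K
zeta3 = 1.2020569031595943    # zeta(3)

def Tc_trap(N, nu_bar):
    """Tc = (hbar omega_bar / kB) (N / zeta(3))^(1/3), omega_bar = 2 pi nu_bar."""
    omega_bar = 2.0 * math.pi * nu_bar
    return (hbar * omega_bar / kB) * (N / zeta3) ** (1.0 / 3.0)

# 10^6 atoms in a nu_bar = 100 Hz trap condense at roughly 450 nK:
print(Tc_trap(1e6, 100.0), "K")
```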
In this regime, we have

$$\begin{aligned} T < T_c:&\qquad N = N_0 + \zeta(3)\bigg(\frac{k_BT}{\hbar\bar\omega}\bigg)^{\!3} \\ T > T_c:&\qquad N = \bigg(\frac{k_BT}{\hbar\bar\omega}\bigg)^{\!3}\mathrm{Li}_3(y)\ . \end{aligned}$$

It is interesting to note that BEC can also occur in two-dimensional traps, which is to say traps which are very anisotropic, with oblate equipotential surfaces $V(\mathbf{r}) = V_0$. This happens when $\hbar\omega_3 \gg k_BT \gg \hbar\omega_{1,2}$. We then have

$$T_c^{(d=2)} = \frac{\hbar\bar\omega}{k_B}\cdot\bigg(\frac{6N}{\pi^2}\bigg)^{\!1/2} \tag{5.7.38}$$

with $\bar\omega = (\omega_1\,\omega_2)^{1/2}$. The particle number then obeys a set of equations like those in Equations [trapab], mutatis mutandis.

For extremely prolate traps, with $\omega_3 \ll \omega_{1,2}$, the situation is different because $\mathrm{Li}_1(y)$ diverges for $y=1$. We then have

$$N = N_0 + \frac{k_BT}{\hbar\omega_3}\,\ln\big(1 + N_0\big)\ . \tag{5.7.39}$$

Here we have simply replaced $y$ by the equivalent expression $N_0/(N_0+1)$. If our criterion for condensation is that $N_0 = \alpha N$, where $\alpha$ is some fractional value, then we have

$$T_c(\alpha) = (1-\alpha)\,\frac{\hbar\omega_3}{k_B}\,\frac{N}{\ln N}\ . \tag{5.7.40}$$

This page titled 5.7: The Ideal Bose Gas is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.8: The Ideal Fermi Gas
Grand potential and particle number
The grand potential of the ideal Fermi gas is, per Equation [Oqsm],

$$\begin{aligned} \Omega(T,V,\mu) &= -V k_BT \sum_\alpha \ln\big(1 + e^{\mu/k_BT}\,e^{-\varepsilon_\alpha/k_BT}\big) \\ &= -V k_BT \int\limits_{-\infty}^\infty\! d\varepsilon\, g(\varepsilon)\,\ln\big(1 + e^{(\mu-\varepsilon)/k_BT}\big)\ . \end{aligned}$$

The average number of particles in a state with energy $\varepsilon$ is

$$n(\varepsilon) = \frac{1}{e^{(\varepsilon-\mu)/k_BT} + 1}\ , \tag{5.8.1}$$

hence the total number of particles is

$$N = V\!\int\limits_{-\infty}^\infty\! d\varepsilon\, g(\varepsilon)\,\frac{1}{e^{(\varepsilon-\mu)/k_BT} + 1}\ . \tag{5.8.2}$$

The Fermi distribution

We define the function

$$f(\epsilon) \equiv \frac{1}{e^{\epsilon/k_BT} + 1}\ , \tag{5.8.3}$$

known as the Fermi distribution. In the $T \to \infty$ limit, $f(\epsilon) \to \frac{1}{2}$ for all finite values of $\epsilon$. As $T \to 0$, $f(\epsilon)$ approaches a step function $\Theta(-\epsilon)$. The average number of particles in a state of energy $\varepsilon$ in a system at temperature $T$ and chemical potential $\mu$ is $n(\varepsilon) = f(\varepsilon - \mu)$. In Figure [fermidist] we plot $f(\varepsilon - \mu)$ versus $\varepsilon$ for three representative temperatures.

$T = 0$ and the Fermi surface

At $T=0$, we therefore have $n(\varepsilon) = \Theta(\mu - \varepsilon)$, which says that all single particle energy states up to $\varepsilon = \mu$ are filled, and all energy states above $\varepsilon = \mu$ are empty. We call $\mu(T=0)$ the Fermi energy: $\varepsilon_F = \mu(T=0)$. If the single particle dispersion $\varepsilon(\mathbf{k})$ depends only on the wavevector $\mathbf{k}$, then the locus of points in $k$-space for which $\varepsilon(\mathbf{k}) = \varepsilon_F$ is called the Fermi surface. For isotropic systems, $\varepsilon(\mathbf{k}) = \varepsilon(k)$ is a function only of the magnitude $k = |\mathbf{k}|$, and the Fermi surface is a sphere in $d=3$ or a circle in $d=2$. The radius of this sphere or circle is the Fermi wavevector, $k_F$. When there is an internal (spin) degree of freedom, there is a Fermi surface and Fermi wavevector (for isotropic systems) for each polarization state of the internal degree of freedom.
[fermidist] The Fermi distribution, $f(\epsilon) = \big[\exp(\epsilon/k_BT) + 1\big]^{-1}$. Here we have set $k_B = 1$ and taken $\mu = 2$, with $T = \frac{1}{20}$ (blue), $T = \frac{3}{4}$ (green), and $T = 2$ (red). In the $T \to 0$ limit, $f(\epsilon)$ approaches a step function $\Theta(-\epsilon)$.


Let’s compute the Fermi wavevector k and Fermi energy ε for the IFG with a ballistic dispersion ε(k) = ℏ
F F
2
k /2m
2
. The number density is

⎧ g k /π (d = 1)
⎪ F


d


d
d k gΩ k
d F 2
n = g∫ Θ(k − k) = ⋅ =⎨gk /4π (d = 2) (5.8.4)
F F
(2π)d (2π)d d





⎪ 3 2
gk /6 π (d = 3) ,
F

where Ω = 2π /Γ(d/2) is the area of the unit sphere in


d
d/2
d space dimensions. Note that the form of n(k
F
) is independent of the dispersion relation, so long as it remains isotropic. Inverting the
above expressions, we obtain k (n) : F

⎧ πn/g (d = 1)




1/d ⎪
dn
1/2
k = 2π ( ) = ⎨ (4πn/g) (d = 2) (5.8.5)
F
gΩ ⎪
d ⎪



⎪ 2 1/3
(6 π n/g ) (d = 3) .
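The density formulas and their inversions can be checked against each other programmatically (a minimal sketch; the dictionaries simply transcribe the three cases):

```python
import math

def n_of_kF(kF, d, g=2.0):
    """n(kF) for ballistic fermions in d = 1, 2, 3."""
    return {1: g * kF / math.pi,
            2: g * kF ** 2 / (4 * math.pi),
            3: g * kF ** 3 / (6 * math.pi ** 2)}[d]

def kF_of_n(n, d, g=2.0):
    """Inverse relations kF(n) quoted in the text."""
    return {1: math.pi * n / g,
            2: math.sqrt(4 * math.pi * n / g),
            3: (6 * math.pi ** 2 * n / g) ** (1.0 / 3.0)}[d]

# The two maps are mutual inverses in every dimension:
for d in (1, 2, 3):
    assert abs(n_of_kF(kF_of_n(1.7, d), d) - 1.7) < 1e-12
```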

The Fermi energy in each case, for ballistic dispersion, is therefore

$$\varepsilon_F = \frac{\hbar^2 k_F^2}{2m} = \frac{2\pi^2\hbar^2}{m}\bigg(\frac{d\,n}{g\,\Omega_d}\bigg)^{\!2/d} = \begin{cases} \dfrac{\pi^2\hbar^2 n^2}{2g^2 m} & (d=1) \\[2ex] \dfrac{2\pi\hbar^2 n}{g\,m} & (d=2) \\[2ex] \dfrac{\hbar^2}{2m}\bigg(\dfrac{6\pi^2 n}{g}\bigg)^{\!2/3} & (d=3)\ . \end{cases} \tag{5.8.6}$$

Another useful result for the ballistic dispersion, which follows from the above, is that the density of states at the Fermi level is given by

$$g(\varepsilon_F) = \frac{g\,\Omega_d}{(2\pi)^d}\cdot\frac{m\,k_F^{d-2}}{\hbar^2} = \frac{d}{2}\cdot\frac{n}{\varepsilon_F}\ .$$


For the electron gas, we have $g=2$. In a metal, one typically has $k_F \sim 0.5\,\text{Å}^{-1}$ to $2\,\text{Å}^{-1}$, and $\varepsilon_F \sim 1\,\mathrm{eV}-10\,\mathrm{eV}$. Due to the effects of the crystalline lattice, electrons in a solid behave as if they had an effective mass $m^*$ which is typically on the order of the electron mass but very often about an order of magnitude smaller, particularly in semiconductors.
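For a concrete number, the $d=3$ formulas give $k_F$ and $\varepsilon_F$ at a typical metallic density (the density below is a copper-like illustrative input, not a value taken from the text):

```python
import math

hbar = 1.0545718e-34    # J s
me = 9.109e-31          # kg, bare electron mass
eV = 1.602e-19          # J

n = 8.5e28              # m^-3, copper-like conduction-electron density (assumed)
kF = (3 * math.pi ** 2 * n) ** (1.0 / 3.0)    # g = 2 already folded into 3 pi^2 n
eF = hbar ** 2 * kF ** 2 / (2 * me)

print(kF * 1e-10, "1/Angstrom")    # lands in the 0.5-2 range quoted above
print(eF / eV, "eV")               # lands in the 1-10 eV range quoted above
```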

Nonisotropic dispersions $\varepsilon(\mathbf{k})$ are more interesting in that they give rise to non-spherical Fermi surfaces. The simplest example is that of a two-dimensional 'tight-binding' model of electrons hopping on a square lattice, as may be appropriate in certain layered materials. The dispersion relation is then

$$\varepsilon(k_x,k_y) = -2t\cos(k_x a) - 2t\cos(k_y a)\ , \tag{5.8.7}$$

where $k_x$ and $k_y$ are confined to the interval $\big[-\frac{\pi}{a},\frac{\pi}{a}\big]$. The quantity $t$ has dimensions of energy and is known as the hopping integral. The Fermi surface is the set of points $(k_x,k_y)$ which satisfies $\varepsilon(k_x,k_y) = \varepsilon_F$. When $\varepsilon_F$ achieves its minimum value of $\varepsilon_F^{\mathrm{min}} = -4t$, the Fermi surface collapses to a point at $(k_x,k_y) = (0,0)$. For energies just above this minimum value, we can expand the dispersion in a power series, writing

$$\varepsilon(k_x,k_y) = -4t + ta^2\big(k_x^2 + k_y^2\big) - \frac{1}{12}\,ta^4\big(k_x^4 + k_y^4\big) + \ldots\ . \tag{5.8.8}$$

If we only work to quadratic order in $k_x$ and $k_y$, the dispersion is isotropic, and the Fermi surface is a circle, with $k_F^2 = (\varepsilon_F + 4t)/ta^2$. As the energy increases further, the continuous O(2) rotational invariance is broken down to the discrete group of rotations of the square, $C_{4v}$. The Fermi surfaces distort and eventually, at $\varepsilon_F = 0$, the Fermi surface is itself a square. As $\varepsilon_F$ increases further, the square turns back into a circle, but centered about the point $\big(\frac{\pi}{a},\frac{\pi}{a}\big)$. Note that everything is periodic in $k_x$ and $k_y$ modulo $\frac{2\pi}{a}$. The Fermi surfaces for this model are depicted in the upper right panel of Figure [fermisurfs].

5.8.1 https://phys.libretexts.org/@go/page/18811

[fermisurfs] Fermi surfaces for two and three-dimensional structures. Upper left: free particles in two dimensions. Upper right: ‘tight binding’ electrons on a square lattice. Lower left: Fermi surface
for cesium, which is predominantly composed of electrons in the 6s orbital shell. Lower right: the Fermi surface of yttrium has two parts. One part (yellow) is predominantly due to 5s electrons,
while the other (pink) is due to 4d electrons. (Source: www.phys.ufl.edu/fermisurface/)
Fermi surfaces in three dimensions can be very interesting indeed, and of great importance in understanding the electronic properties of solids. Two examples are shown in the bottom panels of Figure [fermisurfs]. The electronic configuration of cesium (Cs) is [Xe] 6s$^1$. The 6s electrons 'hop' from site to site on a body centered cubic (BCC) lattice, a generalization of the simple two-dimensional square lattice hopping model discussed above. The elementary unit cell in $k$ space, known as the first Brillouin zone, turns out to be a dodecahedron. In yttrium, the electronic structure is [Kr] 5s$^2$ 4d$^1$, and there are two electronic energy bands at the Fermi level, meaning two Fermi surfaces. Yttrium forms a hexagonal close packed (HCP) crystal structure, and its first Brillouin zone is shaped like a hexagonal pillbox.


Spin-split Fermi surfaces

Consider an electron gas in an external magnetic field $H$. The single particle Hamiltonian is then

$$\hat{H} = \frac{\mathbf{p}^2}{2m} + \mu_B H\,\sigma\ ,$$

where $\mu_B$ is the Bohr magneton,

$$\mu_B = \frac{e\hbar}{2mc} = 5.788\times 10^{-9}\,\mathrm{eV/G}\ ,\qquad \mu_B/k_B = 6.717\times 10^{-5}\,\mathrm{K/G}\ ,$$

where $m$ is the electron mass. What happens at $T=0$ to a noninteracting electron gas in a magnetic field? Electrons of each spin polarization form their own Fermi surfaces. That is, there is an up spin Fermi surface, with Fermi wavevector $k_{F\uparrow}$, and a down spin Fermi surface, with Fermi wavevector $k_{F\downarrow}$. The individual Fermi energies, on the other hand, must be equal, hence

$$\frac{\hbar^2 k_{F\uparrow}^2}{2m} + \mu_B H = \frac{\hbar^2 k_{F\downarrow}^2}{2m} - \mu_B H\ ,$$

which says

$$k_{F\downarrow}^2 - k_{F\uparrow}^2 = \frac{2eH}{\hbar c}\ . \tag{5.8.9}$$

The total density is

$$n = \frac{k_{F\uparrow}^3}{6\pi^2} + \frac{k_{F\downarrow}^3}{6\pi^2} \quad\Longrightarrow\quad k_{F\uparrow}^3 + k_{F\downarrow}^3 = 6\pi^2 n\ . \tag{5.8.10}$$

Clearly the down spin Fermi surface grows and the up spin Fermi surface shrinks with increasing $H$. Eventually, the minority spin Fermi surface vanishes altogether. This happens for the up spins when $k_{F\uparrow} = 0$. Solving for the critical field, we obtain

$$H_c = \frac{\hbar c}{2e}\cdot\big(6\pi^2 n\big)^{2/3}\ . \tag{5.8.11}$$

In real magnetic solids, like cobalt and nickel, the spin-split Fermi surfaces are not spheres, just like the case of the (spin degenerate) Fermi surfaces for Cs and Y shown in Figure [fermisurfs].
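The two conditions above — fixed total density together with the field-dependent splitting — form a one-variable root-finding problem. The sketch below (dimensionless units, with $b$ standing for $2eH/\hbar c$; all names ours) solves it by bisection:

```python
import math

def split_fermi_surfaces(n, b, tol=1e-12):
    """Solve k_up^3 + k_dn^3 = 6 pi^2 n with k_dn^2 - k_up^2 = b.

    Returns (k_up, k_dn). Beyond the critical field, b >= (6 pi^2 n)^(2/3),
    the up-spin surface is gone and all electrons are spin down.
    """
    K3 = 6 * math.pi ** 2 * n

    def excess(k_up):       # monotonically increasing in k_up
        k_dn = math.sqrt(k_up ** 2 + b)
        return k_up ** 3 + k_dn ** 3 - K3

    if excess(0.0) >= 0.0:  # fully polarized
        return 0.0, K3 ** (1.0 / 3.0)

    lo, hi = 0.0, K3 ** (1.0 / 3.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    k_up = 0.5 * (lo + hi)
    return k_up, math.sqrt(k_up ** 2 + b)
```

At $b=0$ the two surfaces coincide; as $b$ grows, $k_{F\uparrow}$ shrinks and vanishes at the critical field.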

The Sommerfeld expansion

In dealing with the ideal Fermi gas, we will repeatedly encounter integrals of the form

$$I(T,\mu) \equiv \int\limits_{-\infty}^\infty\! d\varepsilon\, f(\varepsilon - \mu)\,\phi(\varepsilon)\ . \tag{5.8.12}$$

The Sommerfeld expansion provides a systematic way of expanding these expressions in powers of $T$ and is an important analytical tool in analyzing the low temperature properties of the ideal Fermi gas (IFG).

We start by defining

$$\Phi(\varepsilon) \equiv \int\limits_{-\infty}^\varepsilon\! d\varepsilon'\,\phi(\varepsilon') \tag{5.8.13}$$

so that $\phi(\varepsilon) = \Phi'(\varepsilon)$. We then have



$$I = \int\limits_{-\infty}^\infty\! d\varepsilon\, f(\varepsilon-\mu)\,\Phi'(\varepsilon) = -\int\limits_{-\infty}^\infty\! d\varepsilon\, f'(\varepsilon)\,\Phi(\mu+\varepsilon)\ ,$$

where we assume $\Phi(-\infty) = 0$. Next, we invoke Taylor's theorem, to write

$$\Phi(\mu+\varepsilon) = \sum_{n=0}^\infty \frac{\varepsilon^n}{n!}\,\frac{d^n\Phi}{d\mu^n} = \exp\bigg(\varepsilon\,\frac{d}{d\mu}\bigg)\,\Phi(\mu)\ .$$

This last expression involving the exponential of a differential operator may appear overly formal but it proves extremely useful. Since

$$f'(\varepsilon) = -\frac{1}{k_BT}\,\frac{e^{\varepsilon/k_BT}}{\big(e^{\varepsilon/k_BT}+1\big)^2}\ , \tag{5.8.14}$$

we can write

$$I = \int\limits_{-\infty}^\infty\! dv\,\frac{e^{vD}}{(e^v+1)(e^{-v}+1)}\,\Phi(\mu)\ , \tag{5.8.15}$$

with $v = \varepsilon/k_BT$, where

$$D = k_BT\,\frac{d}{d\mu} \tag{5.8.16}$$

is a dimensionless differential operator. The integral can now be done using the methods of complex integration:

$$\begin{aligned} \int\limits_{-\infty}^\infty\! dv\,\frac{e^{vD}}{(e^v+1)(e^{-v}+1)} &= 2\pi i \sum_{n=0}^\infty \mathrm{Res}\Bigg[\frac{e^{vD}}{(e^v+1)(e^{-v}+1)}\Bigg]_{v=(2n+1)i\pi} \\ &= -2\pi i \sum_{n=0}^\infty D\,e^{(2n+1)i\pi D} \\ &= -\frac{2\pi i D\,e^{i\pi D}}{1 - e^{2\pi i D}} = \pi D \csc \pi D\ . \end{aligned}$$

[vcontour] Deformation of the complex integration contour in Equation [vcon].


Thus,

$$I(T,\mu) = \pi D \csc(\pi D)\,\Phi(\mu)\ , \tag{5.8.17}$$

which is to be understood as the differential operator $\pi D\csc(\pi D) = \pi D/\sin(\pi D)$ acting on the function $\Phi(\mu)$. Appealing once more to Taylor's theorem, we have

$$\pi D \csc(\pi D) = 1 + \frac{\pi^2}{6}\,(k_BT)^2\,\frac{d^2}{d\mu^2} + \frac{7\pi^4}{360}\,(k_BT)^4\,\frac{d^4}{d\mu^4} + \ldots\ . \tag{5.8.18}$$

Thus,

$$I(T,\mu) = \int\limits_{-\infty}^\infty\! d\varepsilon\, f(\varepsilon-\mu)\,\phi(\varepsilon) = \int\limits_{-\infty}^\mu\! d\varepsilon\,\phi(\varepsilon) + \frac{\pi^2}{6}\,(k_BT)^2\,\phi'(\mu) + \frac{7\pi^4}{360}\,(k_BT)^4\,\phi'''(\mu) + \ldots\ .$$
−∞

If $\phi(\varepsilon)$ is a polynomial function of its argument, then each derivative effectively reduces the order of the polynomial by one degree, and the dimensionless parameter of the expansion is $(k_BT/\mu)^2$. This procedure is known as the Sommerfeld expansion.
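The expansion is easy to test against direct quadrature. The sketch below takes $\phi(\varepsilon) = \sqrt{\varepsilon}$ (the ballistic density-of-states shape), sets $k_B = 1$, and compares the numerically integrated $I(T,\mu)$ with the first two Sommerfeld terms; all names are ours:

```python
import math

def fermi(x):
    # Fermi function with overflow guards for large |x|.
    if x > 500.0:
        return 0.0
    if x < -500.0:
        return 1.0
    return 1.0 / (math.exp(x) + 1.0)

def I_numeric(T, mu, npts=40001):
    """Composite-Simpson quadrature of int_0^emax sqrt(e) f((e - mu)/T) de."""
    emax = mu + 40.0 * T          # integrand is negligible beyond this
    h = emax / (npts - 1)         # npts must be odd for Simpson's rule
    s = 0.0
    for i in range(npts):
        w = 1 if i in (0, npts - 1) else (4 if i % 2 else 2)
        s += w * math.sqrt(i * h) * fermi((i * h - mu) / T)
    return s * h / 3.0

def I_sommerfeld(T, mu):
    """Phi(mu) + (pi^2/6) T^2 phi'(mu) for phi(e) = sqrt(e)."""
    return (2.0 / 3.0) * mu ** 1.5 + (math.pi ** 2 / 6.0) * T ** 2 * 0.5 / math.sqrt(mu)

print(abs(I_numeric(0.05, 1.0) - I_sommerfeld(0.05, 1.0)))   # residual is O(T^4)
```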

Chemical potential shift

As our first application of the Sommerfeld expansion formalism, let us compute $\mu(n,T)$ for the ideal Fermi gas. The number density $n(T,\mu)$ is

$$n = \int\limits_{-\infty}^\infty\! d\varepsilon\, g(\varepsilon)\,f(\varepsilon-\mu) = \int\limits_{-\infty}^\mu\! d\varepsilon\, g(\varepsilon) + \frac{\pi^2}{6}\,(k_BT)^2\,g'(\mu) + \ldots\ .$$

Let us write $\mu = \varepsilon_F + \delta\mu$, where $\varepsilon_F = \mu(T=0,n)$ is the Fermi energy, which is the chemical potential at $T=0$. We then have

$$\begin{aligned} n &= \int\limits_{-\infty}^{\varepsilon_F+\delta\mu}\! d\varepsilon\, g(\varepsilon) + \frac{\pi^2}{6}\,(k_BT)^2\,g'(\varepsilon_F + \delta\mu) + \ldots \\ &= \int\limits_{-\infty}^{\varepsilon_F}\! d\varepsilon\, g(\varepsilon) + g(\varepsilon_F)\,\delta\mu + \frac{\pi^2}{6}\,(k_BT)^2\,g'(\varepsilon_F) + \ldots\ , \end{aligned}$$

from which we derive

$$\delta\mu = -\frac{\pi^2}{6}\,(k_BT)^2\,\frac{g'(\varepsilon_F)}{g(\varepsilon_F)} + \mathcal{O}(T^4)\ . \tag{5.8.19}$$

Note that $g'/g = (\ln g)'$. For a ballistic dispersion, assuming $g = 2$,

$$g(\varepsilon) = 2\!\int\!\frac{d^3\!k}{(2\pi)^3}\,\delta\bigg(\varepsilon - \frac{\hbar^2 k^2}{2m}\bigg) = \frac{m\,k(\varepsilon)}{\pi^2\hbar^2}\,\bigg|_{k(\varepsilon) = \frac{1}{\hbar}\sqrt{2m\varepsilon}}\ . \tag{5.8.20}$$

Thus, $g(\varepsilon) \propto \varepsilon^{1/2}$ and $(\ln g)' = \frac{1}{2}\,\varepsilon^{-1}$, so

$$\mu(n,T) = \varepsilon_F - \frac{\pi^2}{12}\,\frac{(k_BT)^2}{\varepsilon_F} + \ldots\ , \tag{5.8.21}$$

where $\varepsilon_F(n) = \frac{\hbar^2}{2m}\,\big(3\pi^2 n\big)^{2/3}$.

Specific heat

The energy of the electron gas is

$$\begin{aligned} \frac{E}{V} &= \int\limits_{-\infty}^\infty\! d\varepsilon\, g(\varepsilon)\,\varepsilon\,f(\varepsilon-\mu) \\ &= \int\limits_{-\infty}^\mu\! d\varepsilon\, g(\varepsilon)\,\varepsilon + \frac{\pi^2}{6}\,(k_BT)^2\,\frac{d}{d\mu}\big(\mu\,g(\mu)\big) + \ldots \\ &= \int\limits_{-\infty}^{\varepsilon_F}\! d\varepsilon\, g(\varepsilon)\,\varepsilon + g(\varepsilon_F)\,\varepsilon_F\,\delta\mu + \frac{\pi^2}{6}\,(k_BT)^2\,\varepsilon_F\,g'(\varepsilon_F) + \frac{\pi^2}{6}\,(k_BT)^2\,g(\varepsilon_F) + \ldots \\ &= \varepsilon_0 + \frac{\pi^2}{6}\,(k_BT)^2\,g(\varepsilon_F) + \ldots\ , \end{aligned}$$

where $\varepsilon_0 = \int_{-\infty}^{\varepsilon_F}\! d\varepsilon\, g(\varepsilon)\,\varepsilon$ is the ground state energy density (ground state energy per unit volume). Thus, to order $T^2$,

$$C_{V,N} = \bigg(\frac{\partial E}{\partial T}\bigg)_{\!V,N} = \frac{\pi^2}{3}\,V\,k_B^2\,T\,g(\varepsilon_F) \equiv V\gamma\,T\ ,$$

where $\gamma(n) = \frac{\pi^2}{3}\,k_B^2\,g\big(\varepsilon_F(n)\big)$. Note that the molar heat capacity is

$$c_V = \frac{N_A}{N}\cdot C_V = \frac{\pi^2}{3}\,R\cdot\frac{k_BT\,g(\varepsilon_F)}{n} = \frac{\pi^2}{2}\bigg(\frac{k_BT}{\varepsilon_F}\bigg)R\ , \tag{5.8.22}$$

where in the last expression on the RHS we have assumed a ballistic dispersion, for which

$$\frac{g(\varepsilon_F)}{n} = \frac{g\,m\,k_F}{2\pi^2\hbar^2}\cdot\frac{6\pi^2}{g\,k_F^3} = \frac{3}{2\,\varepsilon_F}\ .$$

The molar heat capacity in Equation [mhceg] is to be compared with the classical ideal gas value of $\frac{3}{2}R$. Relative to the classical ideal gas, the IFG value is reduced by a fraction of $(\pi^2/3)\times(k_BT/\varepsilon_F)$, which in most metals is very small and even at room temperature is only on the order of $10^{-2}$. Most of the heat capacity of metals at room temperature is due to the energy stored in lattice vibrations.
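The size of this reduction is worth a one-line estimate (a sketch; $\varepsilon_F = 7\,$eV is an assumed, metal-typical value):

```python
import math

kB_eV = 8.617e-5    # Boltzmann constant, eV/K

def electronic_c_over_R(T, eF_eV):
    """(pi^2/2)(kB T / eF): molar electronic heat capacity over R."""
    return (math.pi ** 2 / 2.0) * kB_eV * T / eF_eV

print(electronic_c_over_R(300.0, 7.0))    # ~0.02 at room temperature
```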


A niftier way to derive the heat capacity: starting with Equation [dmueqn] for $\mu(T) - \varepsilon_F \equiv \delta\mu(T)$, note that $g(\varepsilon_F) = dn/d\varepsilon_F$, so we may write $\delta\mu = -\frac{\pi^2}{6}\,(k_BT)^2\,(dg/dn) + \mathcal{O}(T^4)$. Next, use the Maxwell relation $(\partial S/\partial N)_{T,V} = -(\partial\mu/\partial T)_{N,V}$ to arrive at

$$\bigg(\frac{\partial s}{\partial n}\bigg)_{\!T} = \frac{\pi^2}{3}\,k_B^2\,T\,\frac{\partial g(\varepsilon_F)}{\partial n} + \mathcal{O}(T^3)\ ,$$

where $s = S/V$ is the entropy per unit volume. Now use $S(T=0) = 0$ and integrate with respect to the density $n$ to arrive at $S(T,V,N) = V\gamma T$, where $\gamma(n)$ is defined above.

Magnetic susceptibility

Pauli paramagnetism

Magnetism has two origins: (i) orbital currents of charged particles, and (ii) intrinsic magnetic moment. The intrinsic magnetic moment $\mathbf{m}$ of a particle is related to its quantum mechanical spin via

$$\mathbf{m} = g\mu_0\,\mathbf{S}/\hbar\ ,\qquad \mu_0 = \frac{q\hbar}{2mc} = \text{magneton}\ , \tag{5.8.23}$$

where $g$ is the particle's $g$-factor, $\mu_0$ its magneton, and $\mathbf{S}$ is the vector of quantum mechanical spin operators satisfying the SU(2) commutation relations $\big[S^\alpha, S^\beta\big] = i\hbar\epsilon^{\alpha\beta\gamma}S^\gamma$. The Hamiltonian for a single particle is then

$$\begin{aligned} \hat{H} &= \frac{1}{2m^*}\Big(\mathbf{p} - \frac{q}{c}\mathbf{A}\Big)^{\!2} - \mathbf{H}\cdot\mathbf{m} \\ &= \frac{1}{2m^*}\Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^{\!2} + \frac{g}{2}\,\mu_B H\,\sigma\ , \end{aligned}$$

where in the last line we've restricted our attention to the electron, for which $q = -e$. The $g$-factor for an electron is $g = 2$ at tree level, and when radiative corrections are accounted for using quantum electrodynamics (QED) one finds $g = 2.0023193043617(15)$. For our purposes we can take $g = 2$, although we can always absorb the small difference into the definition of $\mu_B$, writing $\mu_B \to \tilde\mu_B = ge\hbar/4mc$. We've chosen the $\hat{z}$-axis in spin space to point in the direction of the magnetic field, and we wrote the eigenvalues of $S^z$ as $\frac{1}{2}\hbar\sigma$, where $\sigma = \pm 1$. The quantity $m^*$ is the effective mass of the electron, which we mentioned earlier. An important distinction is that it is $m^*$ which enters into the kinetic energy term $\mathbf{p}^2/2m^*$, but it is the electron mass $m$ itself ($mc^2 = 511\,$keV) which enters into the definition of the Bohr magneton. We shall discuss the consequences of this further below.
In the absence of orbital magnetic coupling, the single particle dispersion is

$$\varepsilon_\sigma(\mathbf{k}) = \frac{\hbar^2 k^2}{2m^*} + \tilde\mu_B H\sigma\ . \tag{5.8.24}$$

At $T = 0$, we have the results of §8.3.1. At finite $T$, we once again use the Sommerfeld expansion. We then have

$$\begin{aligned} n &= \int\limits_{-\infty}^\infty\! d\varepsilon\, g_\uparrow(\varepsilon)\,f(\varepsilon-\mu) + \int\limits_{-\infty}^\infty\! d\varepsilon\, g_\downarrow(\varepsilon)\,f(\varepsilon-\mu) \\ &= \frac{1}{2}\int\limits_{-\infty}^\infty\! d\varepsilon\, \big\{g(\varepsilon - \tilde\mu_B H) + g(\varepsilon + \tilde\mu_B H)\big\}\,f(\varepsilon-\mu) \\ &= \int\limits_{-\infty}^\infty\! d\varepsilon\, \Big\{g(\varepsilon) + \tfrac{1}{2}\big(\tilde\mu_B H\big)^2 g''(\varepsilon) + \ldots\Big\}\,f(\varepsilon-\mu)\ . \end{aligned}$$

We now invoke the Sommerfeld expansion to find the temperature dependence:

$$\begin{aligned} n &= \int\limits_{-\infty}^\mu\! d\varepsilon\, g(\varepsilon) + \frac{\pi^2}{6}\,(k_BT)^2\,g'(\mu) + \tfrac{1}{2}\big(\tilde\mu_B H\big)^2 g'(\mu) + \ldots \\ &= \int\limits_{-\infty}^{\varepsilon_F}\! d\varepsilon\, g(\varepsilon) + g(\varepsilon_F)\,\delta\mu + \frac{\pi^2}{6}\,(k_BT)^2\,g'(\varepsilon_F) + \tfrac{1}{2}\big(\tilde\mu_B H\big)^2 g'(\varepsilon_F) + \ldots\ . \end{aligned}$$

Note that the density of states for spin species $\sigma$ is

$$g_\sigma(\varepsilon) = \tfrac{1}{2}\,g(\varepsilon - \tilde\mu_B H\sigma)\ , \tag{5.8.25}$$

where $g(\varepsilon)$ is the total density of states per unit volume, for both spin species, in the absence of a magnetic field. We conclude that the chemical potential shift in an external field is

$$\delta\mu(T,n,H) = -\bigg\{\frac{\pi^2}{6}\,(k_BT)^2 + \tfrac{1}{2}\big(\tilde\mu_B H\big)^2\bigg\}\,\frac{g'(\varepsilon_F)}{g(\varepsilon_F)} + \ldots\ . \tag{5.8.26}$$

[FSmag] Fermi distributions in the presence of an external Zeeman-coupled magnetic field.
We next compute the difference n ↑
−n

in the densities of up and down spin electrons:

n −n = ∫ dε {g (ε) − g (ε)} f (ε − μ)
↑ ↓ ↑ ↓

−∞

1
~ ~
= ∫ dε {g(ε − μH ) − g(ε + μH )} f (ε − μ)
2
−∞

~ 3
= −μH ⋅ πD csc(πD) g(μ) + O(H ) .

We needn’t go beyond the trivial lowest order term in the Sommerfeld expansion, because H is already assumed to be small. Thus, the magnetization density is
M = −μ̃_B (n_↑ − n_↓) = μ̃_B² g(ε_F) H ,

and the magnetic susceptibility is

χ = (∂M/∂H)_{T,N} = μ̃_B² g(ε_F) .

This is called the Pauli paramagnetic susceptibility.
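For orientation, a back-of-envelope evaluation of the Pauli susceptibility (a sketch, not from the text: it assumes the standard free-electron density of sodium, n ≈ 2.65 × 10²² cm⁻³, and uses g(ε_F) = 3n/2ε_F for a free electron gas):

```python
import math

hbar = 1.0546e-27   # erg s
m_e  = 9.109e-28    # g
muB  = 9.274e-21    # erg / G
n    = 2.65e22      # cm^-3 (assumed free-electron density of sodium)

kF    = (3.0 * math.pi ** 2 * n) ** (1.0 / 3.0)
eF    = hbar ** 2 * kF ** 2 / (2.0 * m_e)    # Fermi energy, erg
eF_eV = eF / 1.602e-12
g_eF  = 3.0 * n / (2.0 * eF)                 # DOS per volume at eF, both spins
chi   = muB ** 2 * g_eF                      # dimensionless (cgs) susceptibility

print(eF_eV, chi)   # ~3.2 eV, chi of order 1e-6
```

The result, χ ∼ 10⁻⁶, is the familiar weak paramagnetism of simple metals.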

Landau Diamagnetism
When orbital effects are included, the single particle energy levels are given by
ε(n, k_z, σ) = (n + ½) ℏω_c + ℏ²k_z²/2m* + μ̃Hσ . (5.8.27)

Here n is a Landau level index, and ω_c = eH/m*c is the cyclotron frequency. Note that

μ̃H/ℏω_c = (g eℏH/4mc) · (m*c/ℏeH) = (g/4) · (m*/m) . (5.8.28)

Accordingly, we define the ratio r ≡ (g/2) × (m*/m). We can then write

ε(n, k_z, σ) = (n + ½ + ½rσ) ℏω_c + ℏ²k_z²/2m* . (5.8.29)

The grand potential is then given by




Ω = −(HA/ϕ₀) · L_z · k_BT ∫_{−∞}^∞ (dk_z/2π) Σ_{n=0}^∞ Σ_{σ=±1} ln[ 1 + e^{μ/k_BT} e^{−(n+½+½rσ)ℏω_c/k_BT} e^{−ℏ²k_z²/2m*k_BT} ] . (5.8.30)

A few words are in order here regarding the prefactor. In the presence of a uniform magnetic field, the energy levels of a two-dimensional ballistic charged particle collapse into Landau levels. The number of states per Landau level scales with the area of the system, and is equal to the number of flux quanta through the system: N_ϕ = HA/ϕ₀, where ϕ₀ = hc/e is the Dirac flux quantum. Note that

(HA/ϕ₀) · L_z · k_BT = ℏω_c · V/λ_T² , (5.8.31)

hence we can write



Ω(T, V, μ, H) = ℏω_c Σ_{n=0}^∞ Σ_{σ=±1} Q((n + ½ + ½rσ)ℏω_c − μ) , (5.8.32)

where

Q(ε) = −(V/λ_T²) ∫_{−∞}^∞ (dk_z/2π) ln[ 1 + e^{−ε/k_BT} e^{−ℏ²k_z²/2m*k_BT} ] . (5.8.33)

We now invoke the Euler-MacLaurin formula,

Σ_{n=0}^∞ F(n) = ∫₀^∞ dx F(x) + ½F(0) − (1/12)F′(0) + … , (5.8.34)

resulting in

Ω = Σ_{σ=±1} { ∫_{½(1+rσ)ℏω_c}^∞ dε Q(ε − μ) + ½ℏω_c Q(½(1+rσ)ℏω_c − μ) − (1/12)(ℏω_c)² Q′(½(1+rσ)ℏω_c − μ) + … } .
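Before expanding in H, the Euler-MacLaurin formula itself can be sanity-checked with a function whose sum is known in closed form. This sketch uses F(x) = e^{−x}, for which Σ_{n≥0} F(n) is a geometric series; the −F′(0)/12 term supplies the first derivative correction:

```python
import math

# F(x) = exp(-x): the sum over n >= 0 is a geometric series
exact = 1.0 / (1.0 - math.exp(-1.0))

integral = 1.0          # int_0^infty e^{-x} dx
F0, Fp0 = 1.0, -1.0     # F(0) and F'(0)
approx = integral + 0.5 * F0 - Fp0 / 12.0

print(exact, approx)    # 1.5820 vs 1.5833
```

The remaining discrepancy is of order F‴(0)/720, the next term in the series.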

We next expand in powers of the magnetic field H to obtain

Ω(T, V, μ, H) = 2∫₀^∞ dε Q(ε − μ) + (¼r² − 1/12)(ℏω_c)² Q′(−μ) + … . (5.8.35)

Thus, the magnetic susceptibility is

χ = −(1/V) ∂²Ω/∂H² = (r² − ⅓) · μ̃_B² · (m/m*)² · (−(2/V) Q′(−μ))
 = (g²/4 − m²/3m*²) · μ̃_B² · n² κ_T ,

where κ_T is the isothermal compressibility¹⁶. In most metals we have m* ≈ m and the term in brackets is positive (recall g ≈ 2). In semiconductors, however, we can have m* ≪ m; for example in GaAs we have m* = 0.067 m. Thus, semiconductors can have a diamagnetic response. If we take g = 2 and m* = m, we see that the orbital currents give rise to a diamagnetic contribution to the magnetic susceptibility which is exactly −⅓ times as large as the contribution arising from Zeeman coupling. The net result is then paramagnetic (χ > 0) and ⅔ as large as the Pauli susceptibility.
The orbital currents can be understood within the context of Lenz’s law.
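The sign of the bracket (g²/4 − m²/3m*²) is easy to tabulate; this snippet simply evaluates it for the two cases quoted above (m* = m for a simple metal, and the GaAs value m* = 0.067 m):

```python
g = 2.0

def bracket(mstar_over_m):
    # g^2/4 - (1/3)(m/m*)^2, the factor multiplying mu_B^2 n^2 kappa_T above
    return g ** 2 / 4.0 - (1.0 / 3.0) / mstar_over_m ** 2

metal = bracket(1.0)     # m* = m : 1 - 1/3 = 2/3 (net paramagnetic)
gaas  = bracket(0.067)   # m* = 0.067 m : large and negative (diamagnetic)

print(metal, gaas)
```

The light GaAs mass makes the Landau term overwhelm the Pauli term by nearly two orders of magnitude.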
Exercise: Show that −(2/V) Q′(−μ) = n² κ_T .

Moment formation in interacting itinerant electron systems


The Hubbard model
A noninteracting electron gas exhibits paramagnetism or diamagnetism, depending on the sign of χ, but never develops a spontaneous magnetic moment: M(H = 0) = 0 . What gives rise to
magnetism in solids? Overwhelmingly, the answer is that Coulomb repulsion between electrons is responsible for magnetism, in those instances in which magnetism arises. At first thought this might
seem odd, since the Coulomb interaction is spin-independent. How then can it lead to a spontaneous magnetic moment?
To understand how Coulomb repulsion leads to magnetism, it is useful to consider a model interacting system, described by the Hamiltonian

Ĥ = −t Σ_{⟨ij⟩,σ} (c†_{iσ} c_{jσ} + c†_{jσ} c_{iσ}) + U Σ_i n_{i↑} n_{i↓} + μ_B H · Σ_{i,α,β} c†_{iα} σ_{αβ} c_{iβ} .

This is none other than the famous Hubbard model, which has served as a kind of Rosetta stone for interacting electron systems. The first term describes hopping of electrons along the links of some regular lattice (the symbol ⟨ij⟩ denotes a link between sites i and j). The second term describes the local (on-site) repulsion of electrons. This is a single orbital model, so the repulsion exists when one tries to put two electrons in the orbital, with opposite spin polarization. Typically the Hubbard U parameter is on the order of electron volts. The last term is the Zeeman interaction of the electron spins with an external magnetic field. Orbital effects can be modeled by associating a phase exp(iA_ij) to the hopping matrix element t between sites i and j, where the directed sum of A_ij around a plaquette yields the total magnetic flux through the plaquette in units of ϕ₀ = hc/e. We will ignore orbital effects here. Note that the interaction term is short-ranged, whereas the Coulomb interaction falls off as 1/|R_i − R_j|. The Hubbard model is thus unrealistic, although screening effects in metals do effectively render the interaction to be short-ranged.

Within the Hubbard model, the interaction term is local and written as U n_{i↑} n_{i↓} on any given site. This term favors a local moment. This is because the chemical potential will fix the mean value of the total occupancy n_{i↑} + n_{i↓}, in which case it always pays to maximize the difference |n_{i↑} − n_{i↓}|.
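A minimal numerical illustration of moment formation (a sketch, not part of the text's development: it assumes the standard two-site Hubbard model at half filling, restricted to the S_z = 0 sector, whose exact ground state energy is ½(U − √(U² + 16t²))):

```python
import numpy as np

t, U = 1.0, 4.0

# S_z = 0 basis: |u1,d2>, |d1,u2>, doublon on site 1, doublon on site 2
H = np.array([[0.0, 0.0,  -t,  -t],
              [0.0, 0.0,  +t,  +t],
              [ -t,  +t,   U, 0.0],
              [ -t,  +t, 0.0,   U]])

E = np.linalg.eigvalsh(H)                     # ascending eigenvalues
E0_exact = 0.5 * (U - np.sqrt(U ** 2 + 16 * t ** 2))

print(E[0], E0_exact)                         # both ~ -0.8284 for t=1, U=4
```

As U grows, the ground state expels double occupancy, leaving a singlet of two well-formed local moments, in line with the argument above.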

Stoner mean field theory


There are no general methods available to solve for even the ground state of an interacting many-body Hamiltonian. We’ll solve this problem using a mean field theory due to Stoner. The idea is to
write the occupancy n as a sum of average and fluctuating terms:

n_{iσ} = ⟨n_{iσ}⟩ + δn_{iσ} . (5.8.36)

Here, ⟨n_{iσ}⟩ is the thermodynamic average; the above equation may then be taken as a definition of the fluctuating piece, δn_{iσ}. We assume that the average is site-independent. This is a significant assumption, for while we understand why each site should favor developing a moment, it is not clear that all these local moments should want to line up parallel to each other. Indeed, on a bipartite lattice, it is possible that the individual local moments on neighboring sites will be antiparallel, corresponding to an antiferromagnetic ordering of the spins. Our mean field theory will be one for ferromagnetic states.
We now write the interaction term as

n_{i↑} n_{i↓} = ⟨n_↑⟩⟨n_↓⟩ + ⟨n_↑⟩ δn_{i↓} + ⟨n_↓⟩ δn_{i↑} + δn_{i↑} δn_{i↓}

 = −⟨n_↑⟩⟨n_↓⟩ + ⟨n_↑⟩ n_{i↓} + ⟨n_↓⟩ n_{i↑} + O((δn)²)

 = ¼(m² − n²) + ½n (n_{i↑} + n_{i↓}) + ½m (n_{i↑} − n_{i↓}) + O((δn)²) ,

where n and m are the average occupancy and the average spin polarization, each per unit cell:

n = ⟨n_↓⟩ + ⟨n_↑⟩

m = ⟨n_↓⟩ − ⟨n_↑⟩ ,

⟨n_σ⟩ = ½(n − σm). The mean field grand canonical Hamiltonian K = Ĥ − μN̂ may then be written as

K^MF = −½ Σ_{i,j,σ} t_{ij} (c†_{iσ} c_{jσ} + c†_{jσ} c_{iσ}) − (μ − ½Un) Σ_{iσ} c†_{iσ} c_{iσ} + (μ_B H + ½Um) Σ_{iσ} σ c†_{iσ} c_{iσ} + ¼ N_sites U (m² − n²) ,

where we've quantized spins along the direction of H, defined as ẑ. You should take note of two things here. First, the chemical potential is shifted downward (or the electron energies shifted upward) by an amount ½Un, corresponding to the average energy of repulsion with the background. Second, the effective magnetic field has been shifted by an amount ½Um/μ_B, so the effective field is

H_eff = H + Um/2μ_B .

The bare single particle dispersions are given by ε_σ(k) = −t̂(k) + σμ_B H, where

t̂(k) = Σ_R t(R) e^{−ik·R} , (5.8.37)

and t_{ij} = t(R_i − R_j). For nearest neighbor hopping on a d-dimensional cubic lattice, t̂(k) = 2t Σ_{μ=1}^d cos(k_μ a), where a is the lattice constant. Including the mean field effects, the effective single particle dispersions become

ε̃_σ(k) = −t̂(k) − ½Un + (μ_B H + ½Um) σ .

We now solve the mean field theory by obtaining the free energy per site, φ(n, T, H). First, note that φ = ω + μn, where ω = Ω/N_sites is the Landau, or grand canonical, free energy per site. This follows from the general relation Ω = F − μN; note that the total electron number is N = n N_sites, since n is the electron number per unit cell (including both spin species). If g(ε) is the density of states per unit cell (rather than per unit volume), then we have¹⁷

φ = ¼U(m² + n²) + μ̄n − ½ k_BT ∫_{−∞}^∞ dε g(ε) { ln(1 + e^{(μ̄−ε−Δ)/k_BT}) + ln(1 + e^{(μ̄−ε+Δ)/k_BT}) } , (5.8.38)

where μ̄ ≡ μ − ½Un and Δ ≡ μ_B H + ½Um. From this free energy we derive two self-consistent equations for μ and m. The first comes from demanding that φ be a function of n and not of μ, ∂φ/∂μ = 0, which leads to


n = ½ ∫_{−∞}^∞ dε g(ε) { f(ε − Δ − μ̄) + f(ε + Δ − μ̄) } , (5.8.39)

where f(y) = [exp(y/k_BT) + 1]^{−1} is the Fermi function. The second equation comes from minimizing φ with respect to the average moment m:

m = ½ ∫_{−∞}^∞ dε g(ε) { f(ε − Δ − μ̄) − f(ε + Δ − μ̄) } . (5.8.40)

Here, we will solve the first equation, Equation 5.8.39, and use the results to generate a Landau expansion of the free energy φ in powers of m². We assume that Δ is small, in which case we may write


n = ∫_{−∞}^∞ dε g(ε) { f(ε − μ̄) + ½Δ² f″(ε − μ̄) + (1/24)Δ⁴ f⁗(ε − μ̄) + … } . (5.8.41)

We write μ̄(Δ) = μ̄₀ + δμ̄ and expand in δμ̄. Since n is fixed in our (canonical) ensemble, we have

n = ∫_{−∞}^∞ dε g(ε) f(ε − μ̄₀) , (5.8.42)

which defines μ̄₀(n, T).¹⁸ The remaining terms in the δμ̄ expansion of Equation 5.8.41 must sum to zero. This yields

D(μ̄₀) δμ̄ + ½Δ² D′(μ̄₀) + ½(δμ̄)² D′(μ̄₀) + ½ D″(μ̄₀) Δ² δμ̄ + (1/24) D‴(μ̄₀) Δ⁴ + O(Δ⁶) = 0 , (5.8.43)

where


D(μ) = −∫_{−∞}^∞ dε g(ε) f′(ε − μ) (5.8.44)

is the thermally averaged bare density of states at energy μ. Note that the kth derivative is

D^(k)(μ) = −∫_{−∞}^∞ dε g^(k)(ε) f′(ε − μ) . (5.8.45)

Solving for δμ̄, we obtain


δμ̄ = −½ a₁ Δ² − (1/24)(3a₁³ − 6a₁a₂ + a₃) Δ⁴ + O(Δ⁶) , (5.8.46)

where
a_k ≡ D^(k)(μ̄₀) / D(μ̄₀) . (5.8.47)

After integrating by parts and inserting this result for δμ̄ into our expression for the free energy φ, we obtain the expansion

φ(n, T, m) = φ₀(n, T) + ¼Um² − ½ D(μ̄₀) Δ² + ⅛ ( [D′(μ̄₀)]²/D(μ̄₀) − ⅓ D″(μ̄₀) ) Δ⁴ + … ,

where prime denotes differentiation with respect to argument, at m = 0 , and


φ₀(n, T) = ¼Un² + n μ̄₀ − ∫_{−∞}^∞ dε N(ε) f(ε − μ̄₀) , (5.8.48)


where g(ε) = N′(ε), so N(ε) is the integrated bare density of states per unit cell in the absence of any magnetic field (including both spin species).
We assume that H and m are small, in which case

φ = φ₀ + ½am² + ¼bm⁴ − ½χ₀H² − (Uχ₀/2μ_B) Hm + … ,

where χ₀ = μ_B² D(μ̄₀) is the Pauli susceptibility, and


a = ½U(1 − ½UD) , b = (1/32)( (D′)²/D − ⅓D″ ) U⁴ , (5.8.49)

where the argument of each D^(k) above is μ̄₀(n, T). The magnetization density (per unit cell, rather than per unit volume) is given by

M = −∂φ/∂H = χ₀ H + (Uχ₀/2μ_B) m .

Minimizing with respect to m yields

am + bm³ − (Uχ₀/2μ_B) H = 0 ,

which gives, for small m,

m = (χ₀/μ_B) · H/(1 − ½UD) .

We therefore obtain M = χH with


χ = χ₀ / (1 − U/U_c) , (5.8.50)

where
U_c = 2/D(μ̄₀) . (5.8.51)

The denominator of χ increases the susceptibility above the bare Pauli value χ₀, and is referred to as – I kid you not – the Stoner enhancement (see Fig. [stenfig]).

[stenfig] A graduate student experiences the Stoner enhancement.


It is worth emphasizing that the magnetization per unit cell is given by

M = −(1/N_sites) δĤ/δH = μ_B m .

This is an operator identity and is valid for any value of m, and not only small m.
When H = 0 we can still get a magnetic moment, provided U > U_c. This is a consequence of the simple Landau theory we have derived. Solving for m when H = 0 gives m = 0 when U < U_c and

m(U) = ± (U/2bU_c)^{1/2} √(U − U_c) , (5.8.52)

when U > U_c, and assuming b > 0. Thus we have the usual mean field order parameter exponent of β = ½.

Antiferromagnetic solution
In addition to ferromagnetism, there may be other ordered states which solve the mean field theory. One such example is antiferromagnetism. On a bipartite lattice, the antiferromagnetic mean field
theory is obtained from
⟨n_{iσ}⟩ = ½n + ½σ e^{iQ·R_i} m , (5.8.53)

where Q = (π/a, π/a, … , π/a) is the antiferromagnetic ordering wavevector. The grand canonical Hamiltonian is then

K^MF = −½ Σ_{i,j,σ} t_{ij} (c†_{iσ} c_{jσ} + c†_{jσ} c_{iσ}) − (μ − ½Un) Σ_{iσ} c†_{iσ} c_{iσ} − Δ Σ_{i,σ} σ e^{iQ·R_i} c†_{iσ} c_{iσ} + ¼ N_sites U (m² − n²) ,

which couples each pair of wavevectors (k, k+Q) through the 2×2 matrix

( ε(k) − μ̄ −σΔ ; −σΔ ε(k+Q) − μ̄ ) ,

where ε(k) = −t̂(k), as before. On a bipartite lattice, with nearest neighbor hopping only, we have ε(k+Q) = −ε(k). The above matrix is diagonalized by a unitary transformation, yielding the eigenvalues

λ± = ±√(ε²(k) + Δ²) − μ̄ (5.8.54)

with Δ = ½Um and μ̄ = μ − ½Un as before. The free energy per unit cell is then

φ = ¼U(m² + n²) + μ̄n − ½ k_BT ∫_{−∞}^∞ dε g(ε) { ln(1 + e^{(μ̄−√(ε²+Δ²))/k_BT}) + ln(1 + e^{(μ̄+√(ε²+Δ²))/k_BT}) } .

The mean field equations are then

n = ½ ∫_{−∞}^∞ dε g(ε) { f(−√(ε²+Δ²) − μ̄) + f(√(ε²+Δ²) − μ̄) }

2/U = ½ ∫_{−∞}^∞ dε (g(ε)/√(ε²+Δ²)) { f(−√(ε²+Δ²) − μ̄) − f(√(ε²+Δ²) − μ̄) } .

As in the case of the ferromagnet, a paramagnetic solution with m = 0 always exists, in which case the second of the above equations is no longer valid.
Mean field phase diagram of the Hubbard model
Let us compare the mean field theories for the ferromagnetic and antiferromagnetic states at T =0 and H = 0 . Due to particle-hole symmetry, we may assume 0 ≤ n ≤ 1 without loss of generality.
(The solutions repeat themselves under n → 2 − n .) For the paramagnet, we have
n = ∫_{−∞}^{μ̄} dε g(ε)

φ = ¼Un² + ∫_{−∞}^{μ̄} dε g(ε) ε ,

with μ̄ = μ − ½Un the 'renormalized' Fermi energy and g(ε) the density of states per unit cell in the absence of any explicit (H) or implicit (m) symmetry breaking, including both spin polarizations.
For the ferromagnet,

n = ½ ∫_{−∞}^{μ̄−Δ} dε g(ε) + ½ ∫_{−∞}^{μ̄+Δ} dε g(ε)

4Δ/U = ∫_{μ̄−Δ}^{μ̄+Δ} dε g(ε)

φ = ¼Un² − Δ²/U + ½ ∫_{−∞}^{μ̄−Δ} dε g(ε) ε + ½ ∫_{−∞}^{μ̄+Δ} dε g(ε) ε .

Here, Δ = ½Um is nonzero in the ordered phase.
Finally, the antiferromagnetic mean field equations are

n = ∫_{ε₀}^∞ dε g(ε)

2/U = ∫_{ε₀}^∞ dε g(ε)/√(ε² + Δ²)   ([dela])

φ = ¼Un² + Δ²/U − ∫_{ε₀}^∞ dε g(ε) √(ε² + Δ²) ,

where ε₀ = √(μ̄² − Δ²) and Δ = ½Um as before. Note that |μ̄| ≥ Δ for these solutions. Exactly at half-filling, we have n = 1 and μ̄ = 0. We then set ε₀ = 0.
The paramagnet to ferromagnet transition may be first or second order, depending on the details of g(ε). If second order, it occurs at U_c^F = 1/g(μ̄_P), where μ̄_P(n) is the paramagnetic solution for μ̄. The paramagnet to antiferromagnet transition is always second order in this mean field theory, since the RHS of Equation ([dela]) is a monotonic function of Δ. This transition occurs at U_c^A = 2/∫_{μ̄_P}^∞ dε g(ε) ε^{−1}. Note that U_c^A → 0 logarithmically for n → 1, since μ̄_P = 0 at half-filling.
For large U, the ferromagnetic solution always has the lowest energy, and therefore if U_c^A < U_c^F, there will be a first-order antiferromagnet to ferromagnet transition at some value U* > U_c^F. In Figure [hpd], I plot the phase diagram obtained by solving the mean field equations assuming a semicircular density of states g(ε) = (2/π) W^{−2} √(W² − ε²). Also shown is the phase diagram for the d = 2 square lattice Hubbard model obtained by J. Hirsch (1985).
[hpd] Mean field phase diagram of the Hubbard model, including paramagnetic (P), ferromagnetic (F), and antiferromagnetic (A) phases. Left panel: results using a semicircular density of states function of half-bandwidth W. Right panel: results using a two-dimensional square lattice density of states with nearest neighbor hopping t, from J. E. Hirsch, Phys. Rev. B 31, 4403 (1985). The phase boundary between F and A phases is first order.
How well does Stoner theory describe the physics of the Hubbard model? Quantum Monte Carlo calculations by J. Hirsch (1985) found that the actual phase diagram of the d = 2 square lattice Hubbard model exhibits no ferromagnetism for any n up to U = 10. Furthermore, he found the antiferromagnetic phase to be entirely confined to the vertical line n = 1. For n ≠ 1 and 0 ≤ U ≤ 10, the system is a paramagnet¹⁹. These results were state-of-the-art at the time, but both computing power as well as numerical algorithms for interacting quantum systems have advanced considerably since 1985. Yet as of 2018, we still don't have a clear understanding of the d = 2 Hubbard model's T = 0 phase diagram! There is an emerging body of numerical evidence²⁰ that in the underdoped (n < 1) regime, there are portions of the phase diagram which exhibit a stripe ordering, in which antiferromagnetic order is interrupted by a parallel array of line defects containing excess holes (the absence of an electron)²¹. This problem has turned out to be unexpectedly rich, complex, and numerically difficult to resolve due to the presence of competing ordered states, such as d-wave superconductivity and spiral magnetic phases, which lie nearby in energy with respect to the putative stripe ground state.
In order to achieve a ferromagnetic solution, it appears necessary to introduce geometric frustration, either by including a next-nearest-neighbor hopping amplitude t′ or by defining the model on non-bipartite lattices. Numerical work by M. Ulmke (1997) showed the existence of a ferromagnetic phase at T = 0 on the FCC lattice Hubbard model for U = 6 and n ∈ [0.15, 0.87] (approximately).

White dwarf stars


There is a nice discussion of this material in R. K. Pathria, Statistical Mechanics. As a model, consider a mass M ∼ 10³³ g of helium at nuclear densities of ρ ∼ 10⁷ g/cm³ and temperature T ∼ 10⁷ K. This temperature is much larger than the ionization energy of He, hence we may safely assume that all helium atoms are ionized. If there are N electrons, then the number of α particles (⁴He nuclei) must be ½N. The mass of the α particle is m_α ≈ 4m_p. The total stellar mass M is almost completely due to α particle cores.

The electron density is then

n = N/V = (2 · M/4m_p)/V = ρ/2m_p ≈ 10³⁰ cm⁻³ , (5.8.55)

since M = N · m_e + ½N · 4m_p. From the number density n we find for the electrons
k_F = (3π²n)^{1/3} = 2.14 × 10¹⁰ cm⁻¹

p_F = ℏk_F = 2.26 × 10⁻¹⁷ g cm/s

mc = (9.1 × 10⁻²⁸ g)(3 × 10¹⁰ cm/s) = 2.7 × 10⁻¹⁷ g cm/s .
Since p_F ∼ mc, we conclude that the electrons are relativistic. The Fermi temperature will then be T_F ∼ mc² ∼ 10⁶ eV ∼ 10¹² K. Thus, T ≪ T_F, which says that the electron gas is degenerate and may be considered to be at T ∼ 0. So we need to understand the ground state properties of the relativistic electron gas.
The kinetic energy is given by
ε(p) = √(p²c² + m²c⁴) − mc² . (5.8.56)

The velocity is
v = ∂ε/∂p = pc²/√(p²c² + m²c⁴) . (5.8.57)

The pressure in the ground state is

p₀ = ⅓ n⟨p·v⟩

 = (1/3π²ℏ³) ∫₀^{p_F} dp p² · p²c²/√(p²c² + m²c⁴)

 = (m⁴c⁵/3π²ℏ³) ∫₀^{θ_F} dθ sinh⁴θ

 = (m⁴c⁵/96π²ℏ³) ( sinh(4θ_F) − 8 sinh(2θ_F) + 12θ_F ) ,

where we use the substitution

p = mc sinhθ , v = c tanhθ ⟹ θ = ½ ln((c + v)/(c − v)) . (5.8.58)

Note that p_F = ℏk_F = ℏ(3π²n)^{1/3}, and that

n = M/2m_pV ⟹ 3π²n = (9π/8)(M/R³m_p) . (5.8.59)

Now in equilibrium the pressure p₀ is balanced by gravitational pressure. We have

dE₀ = −p₀ dV = −p₀(R) · 4πR² dR . (5.8.60)

This must be balanced by gravity:

dE_g = γ · (GM²/R²) dR , (5.8.61)

where γ depends on the radial mass distribution. Equilibrium then implies

p₀(R) = (γ/4π) · GM²/R⁴ . (5.8.62)

[whitedwarf] Mass-radius relationship for white dwarf stars. (Source: Wikipedia).


To find the relation R = R(M), we must solve

(γ/4π) · GM²/R⁴ = (m⁴c⁵/96π²ℏ³) ( sinh(4θ_F) − 8 sinh(2θ_F) + 12θ_F ) . (5.8.63)

Note that

sinh(4θ_F) − 8 sinh(2θ_F) + 12θ_F = { (96/15) θ_F⁵ , θ_F → 0 ; ½ e^{4θ_F} , θ_F → ∞ } . (5.8.64)
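Both limits are easy to confirm numerically before proceeding (a quick sketch; the ratios below tend to unity in the respective limits):

```python
import math

def F(th):
    # the combination appearing in the ground state pressure
    return math.sinh(4 * th) - 8 * math.sinh(2 * th) + 12 * th

ratio_small = F(0.01) / ((96.0 / 15.0) * 0.01 ** 5)   # theta_F -> 0 limit
ratio_large = F(10.0) / (0.5 * math.exp(40.0))        # theta_F -> infinity limit

print(ratio_small, ratio_large)   # both close to 1
```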

Thus, we may write

p₀(R) = (γ/4π) GM²/R⁴ = { (ℏ²/15π²m)(9πM/8R³m_p)^{5/3} , θ_F → 0 ; (ℏc/12π²)(9πM/8R³m_p)^{4/3} , θ_F → ∞ } . (5.8.65)

In the limit θ_F → 0, we solve for R(M) and find

R = (3/40γ)(9π)^{2/3} · ℏ²/(G m_p^{5/3} m M^{1/3}) ∝ M^{−1/3} . (5.8.66)

In the opposite limit θ_F → ∞, the R factors divide out and we obtain

M = M₀ = (9/64)(3π/γ³)^{1/2} (ℏc/G)^{3/2} (1/m_p²) . (5.8.67)

To find the R dependence, we must go beyond the lowest order expansion of Equation 5.8.64, in which case we find

R = (9π/8)^{1/3} (ℏ/mc) (M/m_p)^{1/3} [ 1 − (M/M₀)^{2/3} ]^{1/2} . (5.8.68)

The value M₀ is the limiting mass for a white dwarf. It is called the Chandrasekhar limit.
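Plugging in cgs values gives the scale of M₀ (a sketch: the order-unity shape factor γ is set to 1 here, so only the order of magnitude is meaningful; the full treatment with the physical density profile gives the famous ≈ 1.4 solar masses):

```python
import math

hbar = 1.0546e-27     # erg s
c    = 2.998e10       # cm / s
G    = 6.674e-8       # cm^3 g^-1 s^-2
m_p  = 1.673e-24      # g
Msun = 1.989e33       # g
gamma = 1.0           # order-unity factor from the radial mass profile (assumed)

# Equation 5.8.67
M0 = (9.0 / 64.0) * math.sqrt(3.0 * math.pi / gamma ** 3) * (hbar * c / G) ** 1.5 / m_p ** 2
M0_over_Msun = M0 / Msun

print(M0_over_Msun)   # of order 1 solar mass for gamma = 1
```

The appearance of only ℏ, c, G and m_p shows that the Chandrasekhar mass is fixed by fundamental constants alone.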

This page titled 5.8: The Ideal Fermi Gas is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

5.9: Appendix I- Second Quantization
Basis States and Creation/Annihilation Operators
Second quantization is a convenient scheme to label basis states of a many particle quantum system. We are ultimately interested in
solutions of the many-body Schrödinger equation,
Ĥ Ψ(x₁, …, x_N) = E Ψ(x₁, …, x_N) (5.9.1)

where the Hamiltonian is


Ĥ = −(ℏ²/2m) Σ_{i=1}^N ∇ᵢ² + Σ_{j<k} v(x_j − x_k) . (5.9.2)

To the coordinate labels {x₁, … x_N} we may also append labels for internal degrees of freedom, such as spin polarization, denoted {ζ₁, …, ζ_N}. Since [Ĥ, σ] = 0 for all permutations σ ∈ S_N, the many-body wavefunctions may be chosen to transform according to irreducible representations of the symmetric group S_N. Thus, for any σ ∈ S_N,

Ψ(x_{σ(1)}, …, x_{σ(N)}) = {1 / sgn(σ)} Ψ(x₁, …, x_N) , (5.9.3)

where the upper choice is for Bose-Einstein statistics and the lower sign for Fermi-Dirac statistics. Here x_j may include not only the spatial coordinates of particle j, but its internal quantum number(s) as well, such as ζ_j.

A convenient basis for the many body states is obtained from the single-particle eigenstates {|α⟩} of some single-particle Hamiltonian Ĥ₀, with ⟨x|α⟩ = φ_α(x) and Ĥ₀|α⟩ = ε_α|α⟩. The basis may be taken as orthonormal, ⟨α|α′⟩ = δ_{αα′}. Now define

Ψ_{α₁,…,α_N}(x₁, …, x_N) = (1/√(N! ∏_α n_α!)) Σ_{σ∈S_N} {1 / sgn(σ)} φ_{α_{σ(1)}}(x₁) ⋯ φ_{α_{σ(N)}}(x_N) . (5.9.4)

Here n_α is the number of times the index α appears among the set {α₁, …, α_N}. For BE statistics, n_α ∈ {0, 1, 2, …}, whereas for FD statistics, n_α ∈ {0, 1}. Note that the above states are normalized²²:
α

∫d^dx₁ ⋯ ∫d^dx_N |Ψ_{α₁⋯α_N}(x₁, …, x_N)|² = (1/(N! ∏_α n_α!)) Σ_{σ,μ∈S_N} {1 / sgn(σμ)} ∏_{j=1}^N ∫d^dx_j φ*_{α_{σ(j)}}(x_j) φ_{α_{μ(j)}}(x_j)

 = (1/∏_α n_α!) Σ_{σ∈S_N} ∏_{j=1}^N δ_{α_j, α_{σ(j)}} = 1 .

Note that

Σ_{σ∈S_N} φ_{α_{σ(1)}}(x₁) ⋯ φ_{α_{σ(N)}}(x_N) ≡ per{φ_{α_i}(x_j)}

Σ_{σ∈S_N} sgn(σ) φ_{α_{σ(1)}}(x₁) ⋯ φ_{α_{σ(N)}}(x_N) ≡ det{φ_{α_i}(x_j)} ,

which stand for permanent and determinant, respectively. We may now write

Ψ_{α₁⋯α_N}(x₁, …, x_N) = ⟨x₁, ⋯, x_N | α₁ ⋯ α_N⟩ , (5.9.5)

where
|α₁ ⋯ α_N⟩ = (1/√(N! ∏_α n_α!)) Σ_{σ∈S_N} {1 / sgn(σ)} |α_{σ(1)}⟩ ⊗ |α_{σ(2)}⟩ ⊗ ⋯ ⊗ |α_{σ(N)}⟩ . (5.9.6)

Note that |α_{σ(1)} ⋯ α_{σ(N)}⟩ = (±1)^σ |α₁ ⋯ α_N⟩, where by (±1)^σ we mean 1 in the case of BE statistics and sgn(σ) in the case of FD statistics.
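These symmetry properties can be verified directly for a small system. The sketch below uses random complex "orbitals" (an illustration only): numpy's determinant supplies the FD case, and a brute-force permanent the BE case, under exchange of two particle positions.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
N = 3
# phi[i, j] = phi_{alpha_i}(x_j): orbital i evaluated at position j
phi = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))

def permanent(M):
    # brute-force permanent: like det, but without sgn(sigma)
    n = M.shape[0]
    return sum(np.prod([M[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

swap = [1, 0, 2]                       # exchange x_1 <-> x_2 (swap two columns)
d0, d1 = np.linalg.det(phi), np.linalg.det(phi[:, swap])
p0, p1 = permanent(phi), permanent(phi[:, swap])

print(np.isclose(d1, -d0), np.isclose(p1, p0))
```

The determinant flips sign under the exchange (fermions), while the permanent is unchanged (bosons).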

We may express |α₁ ⋯ α_N⟩ as a product of creation operators acting on a vacuum |0⟩ in Fock space. For bosons,

|α₁ ⋯ α_N⟩ = ∏_α ( (b†_α)^{n_α}/√(n_α!) ) |0⟩ ≡ |{n_α}⟩ , (5.9.7)

with
[b_α, b_β] = 0 , [b†_α, b†_β] = 0 , [b_α, b†_β] = δ_{αβ} , (5.9.8)

where [ ∙ , ∙ ] is the commutator. For fermions,


|α₁ ⋯ α_N⟩ = c†_{α₁} c†_{α₂} ⋯ c†_{α_N} |0⟩ ≡ |{n_α}⟩ , (5.9.9)

with
{c_α, c_β} = 0 , {c†_α, c†_β} = 0 , {c_α, c†_β} = δ_{αβ} , (5.9.10)

where {∙ , ∙} is the anticommutator.

Second Quantized Operators


Now consider the action of permutation-symmetric first quantized operators such as T̂ = −(ℏ²/2m) Σ_{i=1}^N ∇ᵢ² and V̂ = Σ_{i<j} v(xᵢ − x_j). For a one-body operator such as T̂, we have
⟨α′₁ ⋯ α′_N| T̂ |α₁ ⋯ α_N⟩ = (∏_α n_α!)^{−1/2} (∏_α n′_α!)^{−1/2} ∫d^dx₁ ⋯ ∫d^dx_N ×

 Σ_{σ∈S_N} (±1)^σ φ*_{α_1}(x₁) ⋯ φ*_{α_N}(x_N) Σ_{i=1}^N T̂ᵢ φ_{α′_{σ(1)}}(x₁) ⋯ φ_{α′_{σ(N)}}(x_N)

 = Σ_{σ∈S_N} (±1)^σ (∏_α n_α! n′_α!)^{−1/2} Σ_{i=1}^N ∏_{j (≠i)} δ_{α_j, α′_{σ(j)}} ∫d^dx₁ φ*_{α_i}(x₁) T̂₁ φ_{α′_{σ(i)}}(x₁) .

One may verify that any permutation-symmetric one-body operator such as T̂ is faithfully represented by the second quantized expression,

T̂ = Σ_{α,β} ⟨α|T̂|β⟩ ψ†_α ψ_β , (5.9.11)

where ψ†_α is b†_α or c†_α as the application determines, and

⟨α|T̂|β⟩ = ∫d^dx₁ φ*_α(x₁) T̂₁ φ_β(x₁) . (5.9.12)

Similarly, two-body operators such as V̂ are represented as

V̂ = ½ Σ_{α,β,γ,δ} ⟨αβ|V̂|γδ⟩ ψ†_α ψ†_β ψ_δ ψ_γ , (5.9.13)

where

⟨αβ|V̂|γδ⟩ = ∫d^dx₁ ∫d^dx₂ φ*_α(x₁) φ*_β(x₂) v(x₁ − x₂) φ_δ(x₂) φ_γ(x₁) . (5.9.14)

The general form for an n-body operator is then

R̂ = (1/n!) Σ_{α₁⋯α_n, β₁⋯β_n} ⟨α₁ ⋯ α_n|R̂|β₁ ⋯ β_n⟩ ψ†_{α₁} ⋯ ψ†_{α_n} ψ_{β_n} ⋯ ψ_{β₁} . (5.9.15)

Finally, if the Hamiltonian is noninteracting, consisting solely of one-body operators Ĥ = Σ_{i=1}^N ĥᵢ, then


Ĥ = Σ_α ε_α ψ†_α ψ_α , (5.9.16)

where {ε_α} is the spectrum of each single particle Hamiltonian ĥᵢ.

This page titled 5.9: Appendix I- Second Quantization is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

5.10: Appendix II- Ideal Bose Gas Condensation
We begin with the grand canonical Hamiltonian K = H − μN for the ideal Bose gas,
K = Σ_k (ε_k − μ) b†_k b_k − √N Σ_k (ν_k b†_k + ν̄_k b_k) . (5.10.1)

Here b†_k is the creation operator for a boson in a state of wavevector k, hence [b_k, b†_{k′}] = δ_{kk′}. The dispersion relation is given by the function ε_k, which is the energy of a particle with wavevector k. We must have ε_k − μ ≥ 0 for all k, lest the spectrum of K be unbounded from below. The fields {ν_k, ν̄_k} break a global O(2) symmetry.

Students who have not taken a course in solid state physics can skip the following paragraph, and be aware that N = V/v₀ is the total volume of the system in units of a fundamental "unit cell" volume. The thermodynamic limit is then N → ∞. Note that N is not the boson particle number, which we'll call N_b.

Solid state physics boilerplate: We presume a setting in which the real space Hamiltonian is defined by some boson hopping model on a Bravais lattice. The wavevectors k are then restricted to the first Brillouin zone, Ω̂, and assuming periodic boundary conditions are quantized according to the condition exp(i N_l k·a_l) = 1 for all l ∈ {1, …, d}, where a_l is the lth fundamental direct lattice vector and N_l is the size of the system in the a_l direction; d is the dimension of space. The total number of unit cells is N ≡ ∏_l N_l. Thus, quantization entails k = Σ_l (2πn_l/N_l) b_l, where b_l is the lth elementary reciprocal lattice vector (a_l · b_{l′} = 2π δ_{ll′}) and n_l ranges over N_l distinct integers such that the allowed k points form a discrete approximation to Ω̂.

To solve, we first shift the boson creation and annihilation operators, writing

K = Σ_k (ε_k − μ) β†_k β_k − N Σ_k |ν_k|²/(ε_k − μ) , (5.10.2)

where

β_k = b_k − √N ν_k/(ε_k − μ) , β†_k = b†_k − √N ν̄_k/(ε_k − μ) . (5.10.3)

Note that [β_k, β†_{k′}] = δ_{kk′}, so the above transformation is canonical. The Landau free energy Ω = −k_BT ln Ξ, where Ξ = Tr e^{−K/k_BT}, is given by

Ω = N k_BT ∫_{−∞}^∞ dε g(ε) ln(1 − e^{(μ−ε)/k_BT}) − N Σ_k |ν_k|²/(ε_k − μ) , (5.10.4)

where g(ε) is the density of energy states per unit cell,

g(ε) = (1/N) Σ_k δ(ε − ε_k) → ∫_Ω̂ (d^dk/(2π)^d) δ(ε − ε_k) as N → ∞ . (5.10.5)

Note that

ψ_k ≡ (1/√N) ⟨b_k⟩ = −(1/N) ∂Ω/∂ν̄_k = ν_k/(ε_k − μ) . (5.10.6)

In the condensed phase, ψ_k is nonzero.

The Landau free energy (grand potential) is a function Ω(T, N, μ, ν, ν̄). We now make a Legendre transformation,

Y(T, N, μ, ψ, ψ̄) = Ω(T, N, μ, ν, ν̄) + N Σ_k (ν_k ψ̄_k + ν̄_k ψ_k) . (5.10.7)

Note that

∂Y/∂ν̄_k = ∂Ω/∂ν̄_k + N ψ_k = 0 , (5.10.8)

by the definition of ψ_k. Similarly, ∂Y/∂ν_k = 0. We now have

Y(T, N, μ, ψ, ψ̄) = N k_BT ∫_{−∞}^∞ dε g(ε) ln(1 − e^{(μ−ε)/k_BT}) + N Σ_k (ε_k − μ) |ψ_k|² . (5.10.9)

Therefore, the boson particle number per unit cell is given by the dimensionless density,

n = N_b/N = −(1/N) ∂Y/∂μ = Σ_k |ψ_k|² + ∫_{−∞}^∞ dε g(ε)/(e^{(ε−μ)/k_BT} − 1) , (5.10.10)

and the condensate amplitude at wavevector k is

ν_k = (1/N) ∂Y/∂ψ̄_k = (ε_k − μ) ψ_k . (5.10.11)

Recall that ν_k acts as an external field. Let the dispersion ε_k be minimized at k = K. Without loss of generality, we may assume this minimum value is ε_K = 0. We see that if ν_k = 0 then one of two must be true:

ψ_k = 0 for all k
μ = ε_K, in which case ψ_K can be nonzero.
K

Thus, for ν = ν̄ = 0 and μ < 0, we have the usual equation of state,

n(T, μ) = ∫_{−∞}^∞ dε g(ε)/(e^{(ε−μ)/k_BT} − 1) , (5.10.12)

which relates the intensive variables n, T, and μ. When μ = 0, the equation of state becomes

n(T, μ = 0) = Σ_K |ψ_K|² + ∫_{−∞}^∞ dε g(ε)/(e^{ε/k_BT} − 1) ≡ n₀ + n_>(T) , (5.10.13)

where now the sum is over only those K for which ε_K = 0. Typically this set has only one member, K = 0, but it is quite possible, due to symmetry reasons, that there are more such K values. This last equation of state is one which relates the intensive variables n, T, and n₀, where

n₀ = Σ_K |ψ_K|² (5.10.14)
is the dimensionless condensate density. If the integral $n_>(T)$ in Equation 5.10.13 is finite, then for $n > n_>(T)$ we must have $n_0 > 0$. Note that, for any $T$, $n_>(T)$ diverges logarithmically whenever $g(0)$ is finite. This means that Equation 5.10.12 can always be inverted to yield a finite $\mu(n,T)$, no matter how large the value of $n$, in which case there is no condensation and $n_0 = 0$. If $g(\varepsilon)\propto\varepsilon^\alpha$ with $\alpha > 0$, the integral converges and $n_>(T)$ is finite and monotonically increasing for all $T$. Thus, for fixed dimensionless number $n$, there will be a critical temperature $T_c$ for which $n = n_>(T_c)$. For $T < T_c$, Equation 5.10.12 has no solution for any $\mu$ and we must appeal to Equation 5.10.13. The condensate density, given by $n_0(n,T) = n - n_>(T)$, is then finite for $T < T_c$, and vanishes for $T \geq T_c$.
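The convergence claim for a power-law density of states is easy to verify numerically. The following Python sketch (an illustration, not part of the original text; the helper names are invented here) checks that for $g(\varepsilon)\propto\varepsilon^{1/2}$ the dimensionless integral equals $\Gamma(3/2)\,\zeta(3/2)$, using the substitution $\varepsilon = u^2$ to tame the integrable singularity at the origin:

```python
from math import exp, gamma

def n_gt_integral(umax=9.0, steps=20000):
    """Midpoint rule for the convergent integral of eps^(1/2)/(e^eps - 1),
    rewritten via eps = u^2 so the integrand is smooth at the origin."""
    h = umax / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += 2.0 * u**2 / (exp(u * u) - 1.0)  # integrand after substitution
    return total * h

# Compare with Gamma(3/2) * zeta(3/2); zeta(3/2) from a tail-corrected sum
K = 10000
zeta_32 = sum(k**-1.5 for k in range(1, K)) + 2.0 * K**-0.5 + 0.5 * K**-1.5
print(n_gt_integral(), gamma(1.5) * zeta_32)
```

Repeating the exercise with $\alpha = 0$ (a finite $g(0)$) and a lower cutoff $\eta$ shows the logarithmic growth $\sim\ln(1/\eta)$ described above.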

In the condensed phase, the phase of the order parameter ψ inherits its phase from the external field ν , which is taken to zero, in the
same way the magnetization in the symmetry-broken phase of an Ising ferromagnet inherits its direction from an applied field h
which is taken to zero. The important feature is that in both cases the applied field is taken to zero after the approach to the
thermodynamic limit.

This page titled 5.10: Appendix II- Ideal Bose Gas Condensation is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

5.10.2 https://phys.libretexts.org/@go/page/18767
5.11: Appendix III- Example Bose Condensation Problem
A three-dimensional gas of noninteracting bosonic particles obeys the dispersion relation $\varepsilon(k) = A\,|k|^{1/2}$.

Obtain an expression for the density $n(T,z)$, where $z = \exp(\mu/k_B T)$ is the fugacity. Simplify your expression as best you can, adimensionalizing any integral or infinite sum which may appear. You may find it convenient to define

$$\mathrm{Li}_\nu(z) \equiv \frac{1}{\Gamma(\nu)}\int_0^\infty\!\!dt\,\frac{t^{\nu-1}}{z^{-1}e^t - 1} = \sum_{k=1}^\infty \frac{z^k}{k^\nu}\ . \tag{5.11.1}$$

Note $\mathrm{Li}_\nu(1) = \zeta(\nu)$, the Riemann zeta function.
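As an aside, the two representations of $\mathrm{Li}_\nu(z)$ in Equation 5.11.1 can be checked against each other numerically, along with $\mathrm{Li}_\nu(1)=\zeta(\nu)$. The Python sketch below is purely illustrative (the helper names are ours, not the text's):

```python
from math import exp, gamma, pi

def polylog_series(nu, z, kmax=2000):
    """Li_nu(z) from the power series sum_k z^k / k^nu (needs |z| <= 1)."""
    return sum(z**k / k**nu for k in range(1, kmax + 1))

def polylog_integral(nu, z, tmax=60.0, steps=100000):
    """Li_nu(z) from the integral representation in Equation 5.11.1."""
    h = tmax / steps
    s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        s += t**(nu - 1) / (exp(t) / z - 1.0)
    return s * h / gamma(nu)

print(polylog_series(6, 0.5), polylog_integral(6, 0.5))  # should agree
print(polylog_series(6, 1.0), pi**6 / 945)               # Li_6(1) = zeta(6)
```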

Find the critical temperature for Bose condensation, $T_c(n)$. Your expression should only include the density $n$, the constant $A$, physical constants, and numerical factors (which may be expressed in terms of integrals or infinite sums).

What is the condensate density $n_0$ when $T = \frac{1}{2}T_c$?

Do you expect the second virial coefficient to be positive or negative? Explain your reasoning. (You don’t have to do any calculation.)
We work in the grand canonical ensemble, using Bose-Einstein statistics.
The density of Bose-Einstein particles is given by
$$\begin{aligned}
n(T,z) &= \int\!\frac{d^3\!k}{(2\pi)^3}\,\frac{1}{z^{-1}\exp(Ak^{1/2}/k_B T) - 1}\\
&= \frac{1}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\int_0^\infty\!\!ds\,\frac{s^5}{z^{-1}e^s - 1}\\
&= \frac{120}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\mathrm{Li}_6(z)\ ,
\end{aligned}$$
where we have changed integration variables from $k$ to $s = Ak^{1/2}/k_B T$, and we have defined the functions $\mathrm{Li}_\nu(z)$ as above, in Equation 5.11.1. Note $\mathrm{Li}_\nu(1) = \zeta(\nu)$, the Riemann zeta function.

Bose condensation sets in for $z = 1$, $\mu = 0$. Thus, the critical temperature $T_c$ and the density $n$ are related by
$$n = \frac{120\,\zeta(6)}{\pi^2}\left(\frac{k_B T_c}{A}\right)^{\!6}\ , \tag{5.11.2}$$
or
$$T_c(n) = \frac{A}{k_B}\left(\frac{\pi^2\,n}{120\,\zeta(6)}\right)^{\!1/6}\ . \tag{5.11.3}$$

For $T < T_c$, we have
$$\begin{aligned}
n &= n_0 + \frac{120\,\zeta(6)}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\\
&= n_0 + \left(\frac{T}{T_c}\right)^{\!6} n\ ,
\end{aligned}$$
where $n_0$ is the condensate density. Thus, at $T = \frac{1}{2}T_c$,
$$n_0\big(T = \tfrac{1}{2}T_c\big) = \frac{63}{64}\,n\ . \tag{5.11.4}$$
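The arithmetic of the last step is trivial to confirm (illustrative Python, not part of the solution): since $n_0/n = 1 - (T/T_c)^6$ below $T_c$, setting $T = T_c/2$ gives $1 - 1/64 = 63/64$.

```python
from fractions import Fraction

def condensate_fraction(t_over_tc):
    """n0/n = 1 - (T/Tc)^6 for T < Tc, and zero above Tc."""
    return max(0.0, 1.0 - t_over_tc**6)

print(condensate_fraction(0.5))          # 0.984375
print(Fraction(1) - Fraction(1, 2)**6)   # 63/64
```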

The virial expansion of the equation of state is
$$p = n k_B T\,\big(1 + B_2(T)\,n + B_3(T)\,n^2 + \ldots\big)\ . \tag{5.11.5}$$
We expect $B_2(T) < 0$ for noninteracting bosons, reflecting the tendency of the bosons to condense. (Correspondingly, for noninteracting fermions we expect $B_2(T) > 0$.)

For the curious, we compute $B_2(T)$ by eliminating the fugacity $z$ from the equations for $n(T,z)$ and $p(T,z)$. First, we find $p(T,z)$:
$$\begin{aligned}
p(T,z) &= -k_B T\int\!\frac{d^3\!k}{(2\pi)^3}\,\ln\!\big(1 - z\exp(-Ak^{1/2}/k_B T)\big)\\
&= -\frac{k_B T}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\int_0^\infty\!\!ds\,s^5\,\ln\!\big(1 - z\,e^{-s}\big)\\
&= \frac{120\,k_B T}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\mathrm{Li}_7(z)\ .
\end{aligned}$$

Expanding in powers of the fugacity, we have
$$\begin{aligned}
n &= \frac{120}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\left\{z + \frac{z^2}{2^6} + \frac{z^3}{3^6} + \ldots\right\}\\
\frac{p}{k_B T} &= \frac{120}{\pi^2}\left(\frac{k_B T}{A}\right)^{\!6}\left\{z + \frac{z^2}{2^7} + \frac{z^3}{3^7} + \ldots\right\}\ .
\end{aligned}$$
Solving for $z(n)$ using the first equation, we obtain, to order $n^2$,
$$z = \frac{\pi^2 A^6\,n}{120\,(k_B T)^6} - \frac{1}{2^6}\left(\frac{\pi^2 A^6\,n}{120\,(k_B T)^6}\right)^{\!2} + \mathcal{O}(n^3)\ . \tag{5.11.6}$$

Plugging this into the equation for $p(T,z)$, we obtain the first nontrivial term in the virial expansion, with
$$B_2(T) = -\frac{\pi^2}{15360}\left(\frac{A}{k_B T}\right)^{\!6}\ , \tag{5.11.7}$$
which is negative, as expected. Note that the ideal gas law is recovered for $T\to\infty$, for fixed $n$.
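This value of $B_2$ can be confirmed without redoing the series inversion by hand. In units where the common prefactor $C \equiv (120/\pi^2)(k_B T/A)^6$ is set to one, the claim becomes $B_2 C = -120/15360 = -1/128$; the sketch below (Python, illustrative only) eliminates $z$ numerically at small fugacity:

```python
def n_of_z(z, jmax=60):
    """Dimensionless density n/C = Li_6(z)."""
    return sum(z**j / j**6 for j in range(1, jmax + 1))

def p_of_z(z, jmax=60):
    """Dimensionless pressure p/(C kB T) = Li_7(z)."""
    return sum(z**j / j**7 for j in range(1, jmax + 1))

# p/(n kB T) - 1 -> B2*n as n -> 0; in these units B2 = -1/128
for z in (1e-3, 1e-4, 1e-5):
    n, p = n_of_z(z), p_of_z(z)
    print((p / n - 1.0) / n)
```

The printed ratios approach $-1/128 = -0.0078125$ as $z\to 0$.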

1. For a review of the formalism of second quantization, see the appendix in §9.↩
2. Several texts, such as Pathria and Reichl, write $g_q(z)$ for $\mathrm{Li}_q(z)$. I adopt the latter notation since we are already using the symbol $g$ for the density of states function $g(\varepsilon)$ and for the internal degeneracy $g$.↩
3. Note the dimensions of $g(\omega)$ are $(\mathrm{frequency})^{-1}$. By contrast, the dimensions of $g(\varepsilon)$ in Equation [BDOS] are $(\mathrm{energy})^{-1}\cdot(\mathrm{volume})^{-1}$. The difference lies in a factor of $V_0\cdot\hbar$, where $V_0$ is the unit cell volume.↩
4. If $\omega(k) = Ak^\sigma$, then $C = 2^{1-d}\,\pi^{-d/2}\,\sigma^{-1}A^{-d/\sigma}\,g\,/\,\Gamma(d/2)$.↩
5. OK, that isn’t quite true. For example, if $g(\varepsilon)\sim 1/\ln\varepsilon$, then the integral has a very weak $\ln\ln(1/\eta)$ divergence, where $\eta$ is the lower cutoff. But for any power law density of states $g(\varepsilon)\propto\varepsilon^r$ with $r > 0$, the integral converges.↩
6. It is easy to see that the chemical potential for noninteracting bosons can never exceed the minimum value $\varepsilon_0$ of the single particle dispersion.↩
7. Note that in the thermodynamics chapter we used $v$ to denote the molar volume, $N_A V/N$.↩
8. The $k\neq 0$ particles are sometimes called the overcondensate.↩
9. IBG condensation is in the universality class of the spherical model. The $\lambda$-transition is in the universality class of the XY model.↩
10. Recall that two bodies in thermal equilibrium will have identical temperatures if they are free to exchange energy.↩
11. The phonon velocity $c$ is slightly temperature dependent.↩
12. Many reliable descriptions may be found on the web. Check Wikipedia, for example.↩
13. Explicitly, one replaces $\zeta(3)$ with $\zeta(2) = \frac{\pi^2}{6}$, $\mathrm{Li}_3(y)$ with $\mathrm{Li}_2(y)$, and $(k_B T/\hbar\bar\omega)^3$ with $(k_B T/\hbar\bar\omega)^2$.↩
14. Note that writing $v = (2n+1)\,i\pi + \epsilon$ we have $e^{\pm v} = -1 \mp \epsilon - \frac{1}{2}\epsilon^2 + \ldots$, so $(e^v + 1)(e^{-v} + 1) = -\epsilon^2 + \ldots$ We then expand $e^{vD} = e^{(2n+1)i\pi D}\,(1 + \epsilon D + \ldots)$ to find the residue: $\mathrm{Res} = -D\,e^{(2n+1)i\pi D}$.↩
15. I thank my colleague Tarun Grover for this observation.↩

16. We’ve used $-\frac{2}{V}\,Q''(\mu) = -\frac{1}{V}\,\frac{\partial^2\Omega}{\partial\mu^2} = n^2\kappa_T$.↩
17. Note that we have written $\mu n = \bar\mu n + \frac{1}{2}U n^2$, which explains the sign of the coefficient of $n^2$.↩
18. The Gibbs-Duhem relation guarantees that such an equation of state exists, relating any three intensive thermodynamic quantities.↩
19. A theorem due to Nagaoka establishes that the ground state is ferromagnetic for the case of a single hole in the $U = \infty$ system on bipartite lattices.↩
20. See J. P. F. LeBlanc et al., Phys. Rev. X 5, 041041 (2015) and B. Zheng et al., Science 358, 1155 (2017).↩
21. The best case for stripe order has been made at $T = 0$, $U/t = 8$, and hole doping $x = \frac{1}{8}$ ($n = \frac{7}{8}$).↩
22. In the normalization integrals, each $\int\!d^d\!x$ implicitly includes a sum $\sum_\zeta$ over any internal indices that may be present.↩

This page titled 5.11: Appendix III- Example Bose Condensation Problem is shared under a CC BY-NC-SA license and was authored, remixed,
and/or curated by Daniel Arovas.

5.S: Summary
References
F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987) This has been perhaps the most popular
undergraduate text since it first appeared in 1967, and with good reason.
A. H. Carter, Classical and Statistical Thermodynamics (Benjamin Cummings, 2000) A very relaxed treatment appropriate for
undergraduate physics majors.
D. V. Schroeder, An Introduction to Thermal Physics (Addison-Wesley, 2000) This is the best undergraduate thermodynamics
book I’ve come across, but only 40% of the book treats statistical mechanics.
C. Kittel, Elementary Statistical Physics (Dover, 2004) Remarkably crisp, though dated, this text is organized as a series of brief
discussions of key concepts and examples. Published by Dover, so you can’t beat the price.
R. K. Pathria, Statistical Mechanics (2nd edition, Butterworth-Heinemann, 1996) This popular graduate level text contains many detailed derivations which are helpful for the student.
M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006) An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.
E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980) This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated, it still contains a wealth of information and physical insight.

Summary
$\bullet$ Second-quantized Hamiltonians: A noninteracting quantum system is described by a Hamiltonian $\hat H = \sum_\alpha \varepsilon_\alpha\,\hat n_\alpha$, where $\varepsilon_\alpha$ is the energy eigenvalue for the single particle state $\psi_\alpha$ (possibly degenerate), and $\hat n_\alpha$ is the number operator. Many-body eigenstates $|\vec n\,\rangle$ are labeled by the set of occupancies $\vec n = \{n_\alpha\}$, with $\hat n_\alpha|\vec n\,\rangle = n_\alpha|\vec n\,\rangle$. Thus, $\hat H\,|\vec n\,\rangle = E_{\vec n}\,|\vec n\,\rangle$, where $E_{\vec n} = \sum_\alpha n_\alpha\,\varepsilon_\alpha$.
$\bullet$ Bosons and fermions: The allowed values for $n_\alpha$ are $n_\alpha\in\{0,1,2,\ldots,\infty\}$ for bosons and $n_\alpha\in\{0,1\}$ for fermions.
$\bullet$ Grand canonical ensemble: Because of the constraint $\sum_\alpha n_\alpha = N$, the ordinary canonical ensemble is inconvenient. Rather, we use the grand canonical ensemble, in which case
$$\Omega(T,V,\mu) = \pm k_B T\sum_\alpha \ln\!\big(1 \mp e^{-(\varepsilon_\alpha-\mu)/k_B T}\big)\ ,$$
where the upper sign corresponds to bosons and the lower sign to fermions. The average number of particles occupying the single particle state $\psi_\alpha$ is then
$$\langle\hat n_\alpha\rangle = \frac{\partial\Omega}{\partial\varepsilon_\alpha} = \frac{1}{e^{(\varepsilon_\alpha-\mu)/k_B T} \mp 1}\ .$$

In the Maxwell-Boltzmann limit, $\mu\ll -k_B T$ and $\langle n_\alpha\rangle = z\,e^{-\varepsilon_\alpha/k_B T}$, where $z = e^{\mu/k_B T}$ is the fugacity. Note that this low-density limit is common to both bosons and fermions.
$\bullet$ Single particle density of states: The single particle density of states per unit volume is defined to be
$$g(\varepsilon) = \frac{1}{V}\,\mathrm{Tr}\,\delta(\varepsilon - \hat h) = \frac{1}{V}\sum_\alpha \delta(\varepsilon - \varepsilon_\alpha)\ ,$$
where $\hat h$ is the one-body Hamiltonian. If $\hat h$ is isotropic, then $\varepsilon = \varepsilon(k)$, where $k = |\mathbf{k}|$ is the magnitude of the wavevector, and
$$g(\varepsilon) = \frac{\mathsf{g}\,\Omega_d}{(2\pi)^d}\,\frac{k^{d-1}}{d\varepsilon/dk}\ ,$$
where $\mathsf{g}$ is the degeneracy of each single particle energy state (due to spin, for example).
$\bullet$ Quantum virial expansion: From $\Omega = -pV$, we have
expansion\/}: From $\Omega=-pV$, we have



$$\begin{aligned}
n(T,z) &= \int_{-\infty}^{\infty}\!\!d\varepsilon\,\frac{g(\varepsilon)}{z^{-1}e^{\varepsilon/k_B T}\mp 1} = \sum_{j=1}^{\infty}(\pm 1)^{j-1}\,z^j\,C_j(T)\\
\frac{p(T,z)}{k_B T} &= \mp\int_{-\infty}^{\infty}\!\!d\varepsilon\,g(\varepsilon)\,\ln\!\big(1\mp z\,e^{-\varepsilon/k_B T}\big) = \sum_{j=1}^{\infty}(\pm 1)^{j-1}\,\frac{z^j}{j}\,C_j(T)\ ,
\end{aligned}$$
where
$$C_j(T) = \int_{-\infty}^{\infty}\!\!d\varepsilon\,g(\varepsilon)\,e^{-j\varepsilon/k_B T}\ .$$
One now inverts $n = n(T,z)$ to obtain $z = z(T,n)$, then substitutes this into $p = p(T,z)$ to obtain a series expansion for the equation of state,
$$p(T,n) = n k_B T\,\big(1 + B_2(T)\,n + B_3(T)\,n^2 + \ldots\big)\ .$$
The coefficients $B_j(T)$ are the virial coefficients. One finds
$$B_2 = \mp\frac{C_2}{2C_1^2}\qquad,\qquad B_3 = \frac{C_2^2}{C_1^4} - \frac{2\,C_3}{3\,C_1^3}\ .$$
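These coefficient formulas can be spot-checked numerically. The Python sketch below is illustrative only (the values of $C_{1,2,3}$ are arbitrary test inputs, and the upper, bosonic sign is chosen): it builds truncated series for $n(z)$ and $p(z)$ and verifies that $p/(n k_B T) - 1 - B_2 n - B_3 n^2$ vanishes to the expected order.

```python
# Check B2 = -C2/(2 C1^2) and B3 = C2^2/C1^4 - 2 C3/(3 C1^3) for bosons.
C1, C2, C3 = 1.3, 0.41, 0.17   # arbitrary test inputs

def n_of_z(z):
    """Truncated bosonic series n = C1 z + C2 z^2 + C3 z^3."""
    return C1*z + C2*z**2 + C3*z**3

def p_of_z(z):
    """Truncated bosonic series p/kBT = C1 z + C2 z^2/2 + C3 z^3/3."""
    return C1*z + C2*z**2/2 + C3*z**3/3

B2 = -C2 / (2*C1**2)
B3 = C2**2 / C1**4 - 2*C3 / (3*C1**3)
for z in (1e-3, 1e-4):
    n, p = n_of_z(z), p_of_z(z)
    # residual should be O(n^3)
    print(p/n - (1 + B2*n + B3*n**2))
```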

$\bullet$ Photon statistics: Photons are bosonic excitations whose number is not conserved, hence $\mu = 0$. The number distribution for photon statistics is then $n(\varepsilon) = 1/(e^{\beta\varepsilon}-1)$. Examples of particles obeying photon statistics include phonons (lattice vibrations), magnons (spin waves), and of course photons themselves, for which $\varepsilon(k) = \hbar c k$ with $\mathsf{g} = 2$. The pressure and number density for the photon gas obey $p(T) = A_d\,T^{d+1}$ and $n(T) = A'_d\,T^d$, where $d$ is the dimension of space and $A_d$ and $A'_d$ are constants.
$\bullet$ Blackbody radiation: The energy density per unit frequency of a three-dimensional blackbody is given by
$$\varepsilon(\nu,T) = \frac{8\pi h}{c^3}\cdot\frac{\nu^3}{e^{h\nu/k_B T}-1}\ .$$
The total power emitted per unit area of a blackbody is $\frac{dP}{dA} = \sigma T^4$, where $\sigma = \pi^2 k_B^4/60\hbar^3 c^2 = 5.67\times 10^{-8}\,\mathrm{W/m^2\,K^4}$ is Stefan's constant.
$\bullet$ Ideal Bose gas: For Bose systems, we must have $\varepsilon_\alpha > \mu$ for all single particle states. The number density is
$$n(T,\mu) = \int_{-\infty}^{\infty}\!\!d\varepsilon\,\frac{g(\varepsilon)}{e^{\beta(\varepsilon-\mu)}-1}\ .$$
This is an increasing function of $\mu$ and an increasing function of $T$. For fixed $T$, the largest value $n(T,\mu)$ can attain is $n(T,\varepsilon_0)$, where $\varepsilon_0$ is the lowest possible single particle energy, for which $g(\varepsilon) = 0$ for $\varepsilon < \varepsilon_0$. If $n_c(T)\equiv n(T,\varepsilon_0) < \infty$, this establishes a critical density above which there is Bose condensation into the energy $\varepsilon_0$ state. Conversely, for a given density $n$ there is a critical temperature $T_c(n)$ such that $n_0$ is finite for $T < T_c$. For $T > T_c$, $n(T,\mu)$ is given by the integral formula above, with $n_0 = 0$. For a ballistic dispersion $\varepsilon(\mathbf{k}) = \hbar^2\mathbf{k}^2/2m$, one finds $n\,\lambda_{T_c}^d = \mathsf{g}\,\zeta(d/2)$, i.e. $k_B T_c = \frac{2\pi\hbar^2}{m}\big(n/\mathsf{g}\,\zeta(d/2)\big)^{2/d}$. For $T > T_c(n)$, one has $n = \mathsf{g}\,\mathrm{Li}_{d/2}(z)\,\lambda_T^{-d}$ and $p = \mathsf{g}\,\mathrm{Li}_{\frac{d}{2}+1}(z)\,k_B T\,\lambda_T^{-d}$, where
$$\mathrm{Li}_q(z) \equiv \sum_{n=1}^{\infty}\frac{z^n}{n^q}\ .$$
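The numerical value of Stefan's constant quoted above follows directly from $\sigma = \pi^2 k_B^4/60\hbar^3 c^2$. A quick check with CODATA values (Python, illustrative only):

```python
from math import pi

# CODATA / SI values
k_B  = 1.380649e-23      # J/K (exact, SI definition)
hbar = 1.054571817e-34   # J s
c    = 2.99792458e8      # m/s (exact)

sigma = pi**2 * k_B**4 / (60 * hbar**3 * c**2)
print(sigma)  # ~5.67e-8 W m^-2 K^-4
```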

$\bullet$ Ideal Fermi gas: The Fermi distribution is $n(\varepsilon) = f(\varepsilon-\mu) = 1/\big(e^{(\varepsilon-\mu)/k_B T}+1\big)$. At $T = 0$, this is a step function: $n(\varepsilon) = \Theta(\mu-\varepsilon)$, and $n = \int_{-\infty}^{\mu}\!d\varepsilon\,g(\varepsilon)$. The chemical potential at $T = 0$ is called the Fermi energy: $\mu(T=0,n) = \varepsilon_F(n)$. If the dispersion is $\varepsilon(\mathbf{k})$, the locus of $\mathbf{k}$ values satisfying $\varepsilon(\mathbf{k}) = \varepsilon_F$ is called the Fermi surface. For an isotropic and monotonic dispersion $\varepsilon(k)$, the Fermi surface is a sphere of radius $k_F$, the Fermi wavevector. For isotropic three-dimensional systems, $k_F = (6\pi^2 n/\mathsf{g})^{1/3}$.
$\bullet$ Sommerfeld expansion: Let $\phi(\varepsilon) = \frac{d\Phi}{d\varepsilon}$. Then
$$\begin{aligned}
\int_{-\infty}^{\infty}\!\!d\varepsilon\,f(\varepsilon-\mu)\,\phi(\varepsilon) &= \pi D\,\csc(\pi D)\,\Phi(\mu)\\
&= \left\{1 + \frac{\pi^2}{6}\,(k_B T)^2\,\frac{d^2}{d\mu^2} + \frac{7\pi^4}{360}\,(k_B T)^4\,\frac{d^4}{d\mu^4} + \ldots\right\}\Phi(\mu)\ ,
\end{aligned}$$
where $D = k_B T\,\frac{d}{d\mu}$. One then finds, for example, $C_V = \gamma V T$ with $\gamma = \frac{1}{3}\pi^2 k_B^2\,g(\varepsilon_F)$.
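The Sommerfeld expansion converges rapidly at low temperature. As an illustration (Python sketch, not part of the text), take $\Phi(\varepsilon) = \varepsilon^3\,\Theta(\varepsilon)$, so $\phi(\varepsilon) = 3\varepsilon^2\,\Theta(\varepsilon)$; the series then truncates after the $(k_B T)^2$ term, and the remaining discrepancy is exponentially small in $\mu/k_B T$:

```python
from math import exp, pi

def fermi_integral(mu, kT, emax=4.0, steps=200000):
    """Midpoint rule for the integral of 3 eps^2 f(eps - mu) over eps > 0."""
    h = emax / steps
    s = 0.0
    for i in range(steps):
        e = (i + 0.5) * h
        s += 3.0 * e**2 / (exp((e - mu) / kT) + 1.0)
    return s * h

mu, kT = 1.0, 0.05
numeric = fermi_integral(mu, kT)
sommerfeld = mu**3 + pi**2 * kT**2 * mu   # d^4(eps^3)/d mu^4 = 0
print(numeric, sommerfeld)
```

At $k_B T/\mu = 0.05$ the two numbers agree to better than a part in $10^6$.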

This page titled 5.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

6: Classical Interacting Systems


6.1: Ising Model
6.2: Nonideal Classical Gases
6.3: Lee-Yang Theory
6.4: Liquid State Physics
6.5: Coulomb Systems - Plasmas and the Electron Gas
6.6: Polymers
6.7: Appendix I- Potts Model in One Dimension
6.S: Summary

In a scattering experiment, a beam of particles interacts with a sample and the beam particles scatter off the sample particles. A momentum $\hbar\mathbf{q}$ and energy $\hbar\omega$ are transferred to the beam particle during such a collision. If $\omega = 0$, the scattering is said to be elastic. For $\omega\neq 0$, the scattering is inelastic.

This page titled 6: Classical Interacting Systems is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

6.1: Ising Model
Definition
The simplest model of an interacting system consists of a lattice $\mathcal{L}$ of sites, each of which contains a spin $\sigma_i$ which may be either up ($\sigma_i = +1$) or down ($\sigma_i = -1$). The Hamiltonian is
$$\hat H = -J\sum_{\langle ij\rangle}\sigma_i\,\sigma_j - \mu_0 H\sum_i\sigma_i\ . \tag{6.1.1}$$

When $J > 0$, the preferred (lowest energy) configuration of neighboring spins is that they are aligned, $\sigma_i\sigma_j = +1$. The interaction is then called ferromagnetic. When $J < 0$ the preference is for anti-alignment, $\sigma_i\sigma_j = -1$, which is antiferromagnetic.

This model is not exactly solvable in general. In one dimension, the solution is quite straightforward. In two dimensions, Onsager’s solution of the model (with H = 0 ) is among the most
celebrated results in statistical physics. In higher dimensions the system has been studied by numerical simulations (the Monte Carlo method) and by field theoretic calculations
(renormalization group), but no exact solutions exist.

Ising Model in One Dimension


Consider a one-dimensional ring of $N$ sites. The ordinary canonical partition function is then
$$\begin{aligned}
Z_{\rm ring} &= \mathrm{Tr}\,e^{-\beta\hat H}\\
&= \sum_{\{\sigma_n\}}\prod_{n=1}^{N} e^{\beta J\sigma_n\sigma_{n+1}}\,e^{\beta\mu_0 H\sigma_n}\\
&= \mathrm{Tr}\,\big(R^N\big)\ ,
\end{aligned}$$
where $\sigma_{N+1}\equiv\sigma_1$ owing to periodic (ring) boundary conditions, and where $R$ is a $2\times 2$ transfer matrix,
$$\begin{aligned}
R_{\sigma\sigma'} &= e^{\beta J\sigma\sigma'}\,e^{\beta\mu_0 H(\sigma+\sigma')/2}\\
&= \begin{pmatrix} e^{\beta J}\,e^{\beta\mu_0 H} & e^{-\beta J}\\ e^{-\beta J} & e^{\beta J}\,e^{-\beta\mu_0 H}\end{pmatrix}\\
&= e^{\beta J}\cosh(\beta\mu_0 H) + e^{\beta J}\sinh(\beta\mu_0 H)\,\tau^z + e^{-\beta J}\,\tau^x\ ,
\end{aligned}$$
where $\tau^\alpha$ are the Pauli matrices. Since the trace of a matrix is invariant under a similarity transformation, we have
$$Z(T,H,N) = \lambda_+^N + \lambda_-^N\ , \tag{6.1.2}$$
where
$$\lambda_\pm(T,H) = e^{\beta J}\cosh(\beta\mu_0 H) \pm \sqrt{e^{2\beta J}\sinh^2(\beta\mu_0 H) + e^{-2\beta J}} \tag{6.1.3}$$
are the eigenvalues of $R$. In the thermodynamic limit, $N\to\infty$, and the $\lambda_+^N$ term dominates exponentially. We therefore have
$$F(T,H,N) = -N k_B T\,\ln\lambda_+(T,H)\ . \tag{6.1.4}$$
From the free energy, we can compute the magnetization,
$$M = -\left(\frac{\partial F}{\partial H}\right)_{\!T,N} = \frac{N\mu_0\,\sinh(\beta\mu_0 H)}{\sqrt{\sinh^2(\beta\mu_0 H) + e^{-4\beta J}}} \tag{6.1.5}$$
and the zero field isothermal susceptibility,
$$\chi(T) = \frac{1}{N}\,\frac{\partial M}{\partial H}\bigg|_{H=0} = \frac{\mu_0^2}{k_B T}\,e^{2J/k_B T}\ . \tag{6.1.6}$$
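The transfer-matrix result is easy to validate by brute force on a small ring. The Python sketch below (illustrative, not from the text) enumerates all $2^N$ spin configurations and compares with $Z = \lambda_+^N + \lambda_-^N$:

```python
from math import exp, cosh, sinh, sqrt
from itertools import product

def Z_exact(N, betaJ, betamuH):
    """Brute-force partition function of the N-site Ising ring."""
    Z = 0.0
    for spins in product((1, -1), repeat=N):
        E = sum(betaJ * spins[n] * spins[(n + 1) % N] + betamuH * spins[n]
                for n in range(N))
        Z += exp(E)
    return Z

def Z_tm(N, betaJ, betamuH):
    """Transfer-matrix result, Eqs. 6.1.2 and 6.1.3."""
    disc = sqrt(exp(2*betaJ) * sinh(betamuH)**2 + exp(-2*betaJ))
    lp = exp(betaJ) * cosh(betamuH) + disc
    lm = exp(betaJ) * cosh(betamuH) - disc
    return lp**N + lm**N

print(Z_exact(8, 0.7, 0.3), Z_tm(8, 0.7, 0.3))  # identical to machine precision
```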

Note that in the noninteracting limit J → 0 we recover the familiar result for a free spin. The effect of the interactions at low temperature is to vastly increase the susceptibility. Rather than
a set of independent single spins, the system effectively behaves as if it were composed of large blocks of spins, where the block size ξ is the correlation length, to be derived below.
The physical properties of the system are often elucidated by evaluation of various correlation functions. In this case, we define
$$C(n) \equiv \langle\sigma_1\,\sigma_{n+1}\rangle = \frac{\mathrm{Tr}\,\big(\sigma_1\,R_{\sigma_1\sigma_2}\cdots R_{\sigma_n\sigma_{n+1}}\,\sigma_{n+1}\,R_{\sigma_{n+1}\sigma_{n+2}}\cdots R_{\sigma_N\sigma_1}\big)}{\mathrm{Tr}\,\big(R^N\big)} = \frac{\mathrm{Tr}\,\big(\Sigma\,R^n\,\Sigma\,R^{N-n}\big)}{\mathrm{Tr}\,\big(R^N\big)}\ ,$$
where $0 < n < N$, and where
$$\Sigma = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}\ . \tag{6.1.7}$$
To compute this ratio, we decompose $R$ in terms of its eigenvectors, writing
$$R = \lambda_+\,|+\rangle\langle+| + \lambda_-\,|-\rangle\langle-|\ . \tag{6.1.8}$$
Then
$$C(n) = \frac{\lambda_+^N\,\Sigma_{++}^2 + \lambda_-^N\,\Sigma_{--}^2 + \big(\lambda_+^{N-n}\lambda_-^n + \lambda_+^n\lambda_-^{N-n}\big)\,\Sigma_{+-}\,\Sigma_{-+}}{\lambda_+^N + \lambda_-^N}\ , \tag{6.1.9}$$
where
$$\Sigma_{\mu\mu'} = \langle\mu|\Sigma|\mu'\rangle\ . \tag{6.1.10}$$

Zero External Field
Consider the case $H = 0$, where $R = e^{\beta J} + e^{-\beta J}\,\tau^x$, where $\tau^x$ is the Pauli matrix. Writing
$$|\pm\rangle = \frac{1}{\sqrt 2}\,\big(|\!\uparrow\,\rangle \pm |\!\downarrow\,\rangle\big)\ , \tag{6.1.11}$$
the eigenvectors of $R$ are
$$\psi_\pm = \frac{1}{\sqrt 2}\begin{pmatrix} 1\\ \pm 1\end{pmatrix}\ , \tag{6.1.12}$$
and $\Sigma_{++} = \Sigma_{--} = 0$, while $\Sigma_{+-} = \Sigma_{-+} = 1$. The corresponding eigenvalues are
$$\lambda_+ = 2\cosh(\beta J)\qquad,\qquad \lambda_- = 2\sinh(\beta J)\ . \tag{6.1.13}$$

The correlation function is then found to be
$$\begin{aligned}
C(n) \equiv \langle\sigma_1\,\sigma_{n+1}\rangle &= \frac{\lambda_+^{N-|n|}\,\lambda_-^{|n|} + \lambda_+^{|n|}\,\lambda_-^{N-|n|}}{\lambda_+^N + \lambda_-^N}\\
&= \frac{\tanh^{|n|}(\beta J) + \tanh^{N-|n|}(\beta J)}{1 + \tanh^N(\beta J)}\\
&\approx \tanh^{|n|}(\beta J)\qquad (N\to\infty)\ .
\end{aligned}$$
This result is also valid for $n < 0$, provided $|n|\le N$. We see that we may write
$$C(n) = e^{-|n|/\xi(T)}\ , \tag{6.1.14}$$
where the correlation length is
$$\xi(T) = \frac{1}{\ln\coth(J/k_B T)}\ . \tag{6.1.15}$$
Note that $\xi(T)$ grows as $T\to 0$ as $\xi\approx\frac{1}{2}\,e^{2J/k_B T}$.
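The closed form for $C(n)$ at finite $N$ can likewise be checked by direct enumeration (Python, illustrative only; the helper names are ours):

```python
from math import exp, tanh
from itertools import product

def corr(N, betaJ, n):
    """<sigma_1 sigma_{1+n}> on an N-site zero-field Ising ring, brute force."""
    Z = num = 0.0
    for s in product((1, -1), repeat=N):
        w = exp(betaJ * sum(s[i] * s[(i + 1) % N] for i in range(N)))
        Z += w
        num += s[0] * s[n] * w
    return num / Z

betaJ, N = 0.6, 14
t = tanh(betaJ)
for n in (1, 2, 3):
    exact = corr(N, betaJ, n)
    formula = (t**n + t**(N - n)) / (1 + t**N)
    print(n, exact, formula)  # the two columns agree
```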

Chain with free ends


When the chain has free ends, there are $(N-1)$ links, and the partition function is
$$\begin{aligned}
Z_{\rm chain} &= \sum_{\sigma,\sigma'}\big(R^{N-1}\big)_{\sigma\sigma'}\\
&= \sum_{\sigma,\sigma'}\Big\{\lambda_+^{N-1}\,\psi_+(\sigma)\,\psi_+(\sigma') + \lambda_-^{N-1}\,\psi_-(\sigma)\,\psi_-(\sigma')\Big\}\ ,
\end{aligned}$$
where $\psi_\pm(\sigma) = \langle\,\sigma\,|\pm\rangle$. When $H = 0$, we make use of Equation 6.1.12 to obtain
$$R^{N-1} = \frac{1}{2}\begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix}\big(2\cosh\beta J\big)^{N-1} + \frac{1}{2}\begin{pmatrix} 1 & -1\\ -1 & 1\end{pmatrix}\big(2\sinh\beta J\big)^{N-1}\ , \tag{6.1.16}$$
and therefore
$$Z_{\rm chain} = 2^N\cosh^{N-1}(\beta J)\ . \tag{6.1.17}$$

There’s a nifty trick to obtaining the partition function for the Ising chain which amounts to a change of variables. We define
$$\nu_n \equiv \sigma_n\,\sigma_{n+1}\qquad (n = 1,\ldots,N-1)\ . \tag{6.1.18}$$
Thus, $\nu_1 = \sigma_1\sigma_2$, $\nu_2 = \sigma_2\sigma_3$, etc. Note that each $\nu_j$ takes the values $\pm 1$. The Hamiltonian for the chain is
$$H_{\rm chain} = -J\sum_{n=1}^{N-1}\sigma_n\,\sigma_{n+1} = -J\sum_{n=1}^{N-1}\nu_n\ . \tag{6.1.19}$$
The state of the system is defined by the $N$ Ising variables $\{\sigma_1,\nu_1,\ldots,\nu_{N-1}\}$. Note that $\sigma_1$ doesn’t appear in the Hamiltonian. Thus, the interacting model is recast as $N-1$ noninteracting Ising spins, and the partition function is
$$\begin{aligned}
Z_{\rm chain} &= \mathrm{Tr}\,e^{-\beta H_{\rm chain}}\\
&= \sum_{\sigma_1}\sum_{\nu_1}\cdots\sum_{\nu_{N-1}} e^{\beta J\nu_1}\,e^{\beta J\nu_2}\cdots e^{\beta J\nu_{N-1}}\\
&= \sum_{\sigma_1}\Big(\sum_\nu e^{\beta J\nu}\Big)^{\!N-1} = 2^N\cosh^{N-1}(\beta J)\ .
\end{aligned}$$
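Either route to $Z_{\rm chain}$ can be verified by brute force (illustrative Python sketch, not part of the text):

```python
from math import cosh, exp
from itertools import product

def Z_chain_brute(N, betaJ):
    """Open-chain Ising partition function by direct enumeration."""
    return sum(exp(betaJ * sum(s[n] * s[n + 1] for n in range(N - 1)))
               for s in product((1, -1), repeat=N))

N, betaJ = 10, 0.45
print(Z_chain_brute(N, betaJ), 2**N * cosh(betaJ)**(N - 1))  # identical
```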

Ising model in two dimensions: Peierls’ argument


We have just seen how in one dimension, the Ising model never achieves long-ranged spin order. That is, the spin-spin correlation function decays asymptotically as an exponential function of the distance, with a correlation length $\xi(T)$ which is finite for all $T > 0$. Only for $T = 0$ does the correlation length diverge. At $T = 0$, there are two ground states, $|\!\uparrow\uparrow\uparrow\uparrow\cdots\uparrow\,\rangle$ and $|\!\downarrow\downarrow\downarrow\downarrow\cdots\downarrow\,\rangle$. To choose between these ground states, we can specify a boundary condition at the ends of our one-dimensional chain, where we demand that the spins are up. Equivalently, we can apply a magnetic field $H$ of order $1/N$, which vanishes in the thermodynamic limit, but which at zero temperature will select the ‘all up’ ground state. At finite temperature, there is always a finite probability for any consecutive pair of sites $(n,n+1)$ to be in a high energy state, either $|\!\uparrow\downarrow\,\rangle$ or $|\!\downarrow\uparrow\,\rangle$. Such a configuration is called a domain wall, and in one-dimensional systems domain walls live on individual links. Relative to the configurations $|\!\uparrow\uparrow\,\rangle$ and $|\!\downarrow\downarrow\,\rangle$, a domain wall costs energy $2J$. For a system with $M = xN$ domain walls, the free energy is
$$\begin{aligned}
F &= 2MJ - k_B T\,\ln\binom{N}{M}\\
&= N\cdot\Big\{2Jx + k_B T\,\big[x\ln x + (1-x)\ln(1-x)\big]\Big\}\ ,
\end{aligned}$$

Minimizing the free energy with respect to $x$, one finds $x = 1/\big(e^{2J/k_B T}+1\big)$, so the equilibrium concentration of domain walls is finite, meaning there can be no long-ranged spin order. In one dimension, entropy wins and there is always a thermodynamically large number of domain walls in equilibrium. And since the correlation length for $T > 0$ is finite, any boundary conditions imposed at spatial infinity will have no thermodynamic consequences since they will only be ‘felt’ over a finite range.
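The minimization is a one-liner to check numerically (Python sketch, illustrative only; units with $J = 1$): a crude scan of $F/N$ over $x\in(0,\frac12)$ reproduces the analytic minimizer $x = 1/(e^{2J/k_B T}+1)$.

```python
from math import exp, log

J = 1.0

def f(x, kT):
    """Free energy per site, F/N = 2Jx + kT [x ln x + (1-x) ln(1-x)]."""
    return 2*J*x + kT * (x*log(x) + (1 - x)*log(1 - x))

kT = 0.8
xstar = 1.0 / (exp(2*J/kT) + 1.0)           # claimed minimizer
xs = [i / 200000 for i in range(1, 100000)]  # grid on (0, 0.5)
xmin = min(xs, key=lambda x: f(x, kT))
print(xstar, xmin)  # the scan lands on the analytic minimum
```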
As we shall discuss in the following chapter, this consideration is true for any system with sufficiently short-ranged interactions and a discrete global symmetry. Another example is the $q$-state Potts model,
$$H = -J\sum_{\langle ij\rangle}\delta_{\sigma_i,\sigma_j} - h\sum_i\delta_{\sigma_i,1}\ . \tag{6.1.20}$$
Here, the spin variables $\sigma_i$ take values in the set $\{1,2,\ldots,q\}$ on each site. The equivalent of an external magnetic field in the Ising case is a field $h$ which prefers a particular value of $\sigma$ ($\sigma = 1$ in the above Hamiltonian). See the appendix in §8 for a transfer matrix solution of the one-dimensional Potts model.

What about higher dimensions? A nifty argument due to R. Peierls shows that there will be a finite temperature phase transition for the Ising model on the square lattice¹. Consider the Ising model, in zero magnetic field, on an $N_x\times N_y$ square lattice, with $N_{x,y}\to\infty$ in the thermodynamic limit. Along the perimeter of the system we impose the boundary condition $\sigma_i = +1$. Any configuration of the spins may then be represented uniquely in the following manner. Start with a configuration in which all spins are up. Next, draw a set of closed loops on the lattice. By definition, the loops cannot share any links along their boundaries; each link on the lattice is associated with at most one such loop. Now flip all the spins inside each loop from up to down. Identify each such loop configuration with a label $\Gamma$. The partition function is
$$Z = \mathrm{Tr}\,e^{-\beta\hat H} = \sum_\Gamma e^{-2\beta J L_\Gamma}\ , \tag{6.1.21}$$
where $L_\Gamma$ is the total perimeter of the loop configuration $\Gamma$. The domain walls are now loops, rather than individual links, but as in the one-dimensional case, each link of each domain wall contributes an energy $+2J$ relative to the ground state.

Figure 6.1.1 : Clusters and boundaries for the square lattice Ising model. Left panel: a configuration Γ where the central spin is up. Right panel: a configuration Cγ ∘ Γ where the interior
spins of a new loop γ containing the central spin have been flipped.
Now we wish to compute the average magnetization of the central site (assume $N_{x,y}$ are both odd, so there is a unique central site). This is given by the difference $P_+(0) - P_-(0)$, where $P_\mu(0) = \langle\delta_{\sigma_0,\mu}\rangle$ is the probability that the central spin has spin polarization $\mu$. If $P_+(0) > P_-(0)$, then the magnetization per site $m = P_+(0) - P_-(0)$ is finite in the thermodynamic limit, and the system is ordered. Clearly
$$P_+(0) = \frac{1}{Z}\sum_{\Gamma\in\Sigma_+} e^{-2\beta J L_\Gamma}\ , \tag{6.1.22}$$

where the restriction on the sum indicates that only those configurations where the central spin is up ($\sigma_0 = +1$) are to be included (Figure 6.1.1a). Similarly,
$$P_-(0) = \frac{1}{Z}\sum_{\tilde\Gamma\in\Sigma_-} e^{-2\beta J L_{\tilde\Gamma}}\ , \tag{6.1.23}$$
where only configurations in which $\sigma_0 = -1$ are included in the sum. Here we have defined
$$\Sigma_\pm = \big\{\Gamma\ \big|\ \sigma_0 = \pm\big\}\ . \tag{6.1.24}$$

$\Sigma_+$ ($\Sigma_-$) is the set of configurations $\Gamma$ in which the central spin is always up (down). Consider now the construction in Figure 6.1.1b. Any loop configuration $\tilde\Gamma\in\Sigma_-$ may be associated with a unique loop configuration $\Gamma\in\Sigma_+$ by reversing all the spins within the loop of $\tilde\Gamma$ which contains the origin. Note that the map from $\tilde\Gamma$ to $\Gamma$ is many-to-one. That is, we can write $\tilde\Gamma = C_\gamma\circ\Gamma$, where $C_\gamma$ overturns the spins within the loop $\gamma$, with the conditions that (i) $\gamma$ contains the origin, and (ii) none of the links in the perimeter of $\gamma$ coincide with any of the links from the constituent loops of $\Gamma$. Let us denote this set of loops as $\Upsilon_\Gamma$:
$$\Upsilon_\Gamma = \big\{\gamma : 0\in\mathrm{int}(\gamma)\ \text{and}\ \gamma\cap\Gamma = \emptyset\big\}\ . \tag{6.1.25}$$

Then
$$m = P_+(0) - P_-(0) = \frac{1}{Z}\sum_{\Gamma\in\Sigma_+} e^{-2\beta J L_\Gamma}\Bigg(1 - \sum_{\gamma\in\Upsilon_\Gamma} e^{-2\beta J L_\gamma}\Bigg)\ . \tag{6.1.26}$$

If we can prove that $\sum_{\gamma\in\Upsilon_\Gamma} e^{-2\beta J L_\gamma} < 1$, then we will have established that $m > 0$. Let us ask: how many loops $\gamma$ are there in $\Upsilon_\Gamma$ with perimeter $L$? We cannot answer this question exactly, but we can derive a rigorous upper bound for this number, which, following Peliti, we call $g(L)$. We claim that
$$g(L) < \frac{2}{3L}\cdot 3^L\cdot\left(\frac{L}{4}\right)^{\!2} = \frac{L}{24}\cdot 3^L\ . \tag{6.1.27}$$

To establish this bound, consider any site on such a loop $\gamma$. Initially we have 4 possible directions to proceed to the next site, but thereafter there are only 3 possibilities for each subsequent step, since the loop cannot run into itself. This gives $4\cdot 3^{L-1}$ possibilities. But we are clearly overcounting, since any point on the loop could have been chosen as the initial point, and

moreover we could have started by proceeding either clockwise or counterclockwise. So we are justified in dividing this by 2L. We are still overcounting, because we have not accounted
for the constraint that γ is a closed loop, nor that γ ∩ Γ = ∅ . We won’t bother trying to improve our estimate to account for these constraints. However, we are clearly undercounting due to

the fact that a given loop can be translated in space so long as the origin remains within it. To account for this, we multiply by the area of a square of side length L/4, which is the maximum
area that can be enclosed by a loop of perimeter L. We therefore arrive at Equation 6.1.27. Finally, we note that the smallest possible value of L is L = 4 , corresponding to a square
enclosing the central site alone. Therefore
$$\sum_{\gamma\in\Upsilon_\Gamma} e^{-2\beta J L_\gamma} < \sum_{k=2}^{\infty}\frac{k}{12}\,\big(3\,e^{-2\beta J}\big)^{2k} = \frac{x^4\,(2-x^2)}{12\,(1-x^2)^2} \equiv r\ , \tag{6.1.28}$$
where $x = 3\,e^{-2\beta J}$. Note that we have accounted for the fact that the perimeter $L_\gamma$ of each loop $\gamma$ must be an even integer. The sum is smaller than unity provided $x < x_0 = 0.869756\ldots$, hence the system is ordered provided
$$\frac{k_B T}{J} < \frac{2}{\ln(3/x_0)} = 1.61531\ . \tag{6.1.29}$$
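The threshold $x_0$ and the resulting temperature bound can be reproduced numerically. Since $r(x)$ increases monotonically on the relevant interval, a simple bisection suffices (Python, illustrative only):

```python
from math import log

def r(x):
    """The geometric bound r(x) = x^4 (2 - x^2) / (12 (1 - x^2)^2) of Eq. 6.1.28."""
    return x**4 * (2 - x**2) / (12 * (1 - x**2)**2)

# Bisect r(x) = 1 on [0.5, 0.99], where r is monotonically increasing
lo, hi = 0.5, 0.99
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if r(mid) < 1.0:
        lo = mid
    else:
        hi = mid
x0 = 0.5 * (lo + hi)
print(x0, 2 / log(3 / x0))  # ~0.869756 and ~1.61531
```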

The exact result is $k_B T_c = 2J/\sinh^{-1}(1) = 2.26918\ldots J$. The Peierls argument has been generalized to higher dimensional lattices as well².
With a little more work we can derive a bound for the magnetization. We have shown that
$$P_-(0) = \frac{1}{Z}\sum_{\Gamma\in\Sigma_+} e^{-2\beta J L_\Gamma}\sum_{\gamma\in\Upsilon_\Gamma} e^{-2\beta J L_\gamma} < r\cdot\frac{1}{Z}\sum_{\Gamma\in\Sigma_+} e^{-2\beta J L_\Gamma} = r\,P_+(0)\ . \tag{6.1.30}$$
Thus,
$$1 = P_+(0) + P_-(0) < (1+r)\,P_+(0) \tag{6.1.31}$$
and therefore
$$m = P_+(0) - P_-(0) > (1-r)\,P_+(0) > \frac{1-r}{1+r}\ , \tag{6.1.32}$$
where $r(T)$ is given in Equation 6.1.28.

Figure 6.1.2 : A two-dimensional square lattice mapped onto a one-dimensional chain.

Two dimensions or one?


We showed that the one-dimensional Ising model has no finite temperature phase transition, and is disordered at any finite temperature $T$, but in two dimensions on the square lattice there is a finite critical temperature $T_c$ below which there is long-ranged order. Consider now the construction depicted in Figure 6.1.2, where the sites of a two-dimensional square lattice are mapped onto those of a linear chain³. Clearly we can elicit a one-to-one mapping between the sites of a two-dimensional square lattice and those of a one-dimensional chain. That is, the two-dimensional square lattice Ising model may be written as a one-dimensional Ising model,
$$\hat H = -J\!\!\!\sum_{\langle ij\rangle}^{\substack{\text{square}\\ \text{lattice}}}\!\!\!\sigma_i\,\sigma_j = -\!\!\!\sum_{n,n'}^{\substack{\text{linear}\\ \text{chain}}}\!\!\! J_{nn'}\,\sigma_n\,\sigma_{n'}\ . \tag{6.1.33}$$

How can this be consistent with the results we have just proven?
The fly in the ointment here is that the interaction along the chain $J_{nn'}$ is long-ranged. This is apparent from inspecting the site labels in Figure 6.1.2. Note that site $n = 15$ is linked to sites $n' = 14$ and $n' = 16$, but also to sites $n' = -6$ and $n' = -28$. With each turn of the concentric spirals in the figure, the range of the interaction increases. To complicate matters further, the interactions are no longer translationally invariant, $J_{nn'}\neq J(n-n')$. But it is the long-ranged nature of the interactions on our contrived one-dimensional chain which spoils our previous energy-entropy argument, because now the domain walls themselves interact via a long-ranged potential. Consider for example the linear chain with $J_{nn'} = J\,|n-n'|^{-\alpha}$, where $\alpha > 0$. Let us compute the energy of a domain wall configuration where $\sigma_n = +1$ if $n > 0$ and $\sigma_n = -1$ if $n\le 0$. The domain wall energy is then
$$\Delta = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty}\frac{2J}{|m+n|^\alpha}\ . \tag{6.1.34}$$

Here we have written one of the sums in terms of $m = -n'$. For asymptotically large $m$ and $n$, we can write $\mathbf{R} = (m,n)$ and we obtain an integral over the upper right quadrant of the plane:
$$\Delta \simeq 2J\int_1^\infty\!\!dR\,R\int_0^{\pi/2}\!\!\frac{d\phi}{R^\alpha\,(\cos\phi+\sin\phi)^\alpha} = 2^{1-\alpha/2}\,J\int_{-\pi/4}^{\pi/4}\!\!\frac{d\phi}{\cos^\alpha\!\phi}\int_1^\infty\!\!\frac{dR}{R^{\alpha-1}}\ . \tag{6.1.35}$$
The $\phi$ integral is convergent, but the $R$ integral diverges for $\alpha\le 2$. For a finite system, the upper bound on the $R$ integral becomes the system size $L$. For $\alpha > 2$ the domain wall energy is finite in the thermodynamic limit $L\to\infty$. In this case, entropy again wins. The entropy associated with a single domain wall is $k_B\ln L$, and therefore $F = E - k_B T\ln L$ is always lowered by

having a finite density of domain walls. For $\alpha < 2$, the energy of a single domain wall scales as $L^{2-\alpha}$. It was first proven by F. J. Dyson in 1969 that this model has a finite temperature phase transition provided $1 < \alpha < 2$. There is no transition for $\alpha < 1$ or $\alpha > 2$. The case $\alpha = 2$ is special, and is discussed as a special case in the beautiful renormalization group analysis by J. M. Kosterlitz in Phys. Rev. Lett. 37, 1577 (1976).

High temperature expansion


Consider once again the ferromagnetic Ising model in zero field ($H = 0$), but on an arbitrary lattice. The partition function is
$$Z = \mathrm{Tr}\;e^{\beta J\sum_{\langle ij\rangle}\sigma_i\sigma_j} = \big(\!\cosh\beta J\big)^{N_L}\,\mathrm{Tr}\,\Bigg\{\prod_{\langle ij\rangle}\big(1 + x\,\sigma_i\,\sigma_j\big)\Bigg\}\ ,$$
where $x = \tanh\beta J$ and $N_L$ is the number of links. For regular lattices, $N_L = \frac{1}{2}zN$, where $N$ is the number of lattice sites and $z$ is the lattice coordination number, the number of nearest neighbors for each site. We have used
$$e^{\beta J\sigma\sigma'} = \cosh\beta J\cdot\big\{1 + \sigma\sigma'\tanh\beta J\big\} = \begin{cases} e^{+\beta J} & \text{if}\ \sigma\sigma' = +1\\ e^{-\beta J} & \text{if}\ \sigma\sigma' = -1\ . \end{cases} \tag{6.1.36}$$
We expand the above expression for $Z$ in powers of $x$, resulting in a sum of $2^{N_L}$ terms, each of which can be represented graphically in terms of so-called lattice animals. A lattice animal is a distinct (including reflections and rotations) arrangement of adjacent plaquettes on a lattice. In order that the trace not vanish, only such configurations and their compositions are permitted. This is because each $\sigma_i$ for every given site $i$ must occur an even number of times in order for a given term in the sum not to vanish. For all such terms, the trace is $2^N$. Let $\Gamma$ represent a collection of lattice animals, and $g_\Gamma$ the multiplicity of $\Gamma$. Then
$$Z = 2^N\big(\!\cosh\beta J\big)^{N_L}\sum_\Gamma g_\Gamma\,\big(\!\tanh\beta J\big)^{L_\Gamma}\ ,$$
where $L_\Gamma$ is the total number of links in the diagram $\Gamma$. Since $x$ vanishes as $T\to\infty$, this procedure is known as the high temperature expansion (HTE).

Figure 6.1.3: HTE diagrams on the square lattice and their multiplicities.

For the square lattice, the enumeration of all lattice animals with up to order eight is given in Figure 6.1.3. For the diagram represented as a single elementary plaquette, there are $N$ possible locations for the lower left vertex. For the $2\times 1$ plaquette animal, one has $g = 2N$, because there are two inequivalent orientations as well as $N$ translations. For two disjoint elementary squares, one has $g = \frac12 N(N-5)$, which arises from subtracting $5N$ 'illegal' configurations involving double lines (remember each link in the partition sum appears only once!), shown in the figure, and finally dividing by two because the individual squares are identical. Note that $N(N-5)$ is always even for any integer value of $N$. Thus, to lowest interesting order on the square lattice,

$$Z = 2^N\big(\cosh\beta J\big)^{2N}\Big\{1+N\,x^4+2N\,x^6+\big(7-\tfrac52\big)N\,x^8+\tfrac12 N^2 x^8+\mathcal O(x^{10})\Big\}\ .\tag{6.1.37}$$

The free energy is therefore

$$\begin{aligned}
F &= -N k_B T\ln 2 + N k_B T\ln\big(1-x^2\big) - N k_B T\,\big[x^4+2\,x^6+\tfrac92\,x^8+\mathcal O(x^{10})\big]\\
&= -N k_B T\ln 2 - N k_B T\,\Big\{x^2+\tfrac32\,x^4+\tfrac73\,x^6+\tfrac{19}{4}\,x^8+\mathcal O(x^{10})\Big\}\ ,
\end{aligned}$$

again with $x = \tanh\beta J$. Note that we've substituted $\cosh^2\!\beta J = 1/(1-x^2)$ to write the final result as a power series in $x$. Notice that the $\mathcal O(N^2)$ factor in $Z$ has cancelled upon taking the logarithm, so the free energy is properly extensive.
Note that the high temperature expansion for the one-dimensional Ising chain yields

$$Z_{\rm chain}(T,N) = 2^N\cosh^{N-1}\!\beta J\quad,\quad Z_{\rm ring}(T,N) = 2^N\cosh^N\!\beta J\ ,\tag{6.1.38}$$

in agreement with the transfer matrix calculations. In higher dimensions, where there is a finite temperature phase transition, one typically computes the specific heat $c(T)$ and tries to extract its singular behavior in the vicinity of $T_c$, where $c(T)\sim A\,(T-T_c)^{-\alpha}$. Since $x(T) = \tanh(J/k_BT)$ is analytic in $T$, we have $c(x)\sim A'\,(x-x_c)^{-\alpha}$, where $x_c = x(T_c)$. One assumes $x_c$ is the singularity closest to the origin and corresponds to the radius of convergence of the high temperature expansion. If we write

$$c(x) = \sum_{n=0}^\infty a_n\,x^n\sim A''\,\bigg(1-\frac{x}{x_c}\bigg)^{\!-\alpha}\ ,\tag{6.1.39}$$

then according to the binomial theorem we should expect

$$\frac{a_n}{a_{n-1}} = \frac{1}{x_c}\bigg[1-\frac{1-\alpha}{n}\bigg]\ .\tag{6.1.40}$$

Thus, by plotting $a_n/a_{n-1}$ versus $1/n$, one extracts $1/x_c$ as the intercept, and $(\alpha-1)/x_c$ as the slope.
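The ratio extrapolation is easy to demonstrate on a toy series whose singularity is known. The sketch below (not from the text; all parameters are arbitrary) builds the Taylor coefficients of $c(x) = (1-x/x_c)^{-\alpha}$ and recovers $x_c$ and $\alpha$ from a linear fit of $a_n/a_{n-1}$ against $1/n$:

```python
# Ratio method for locating a series singularity (toy example).
# For c(x) = (1 - x/xc)**(-alpha), the binomial theorem gives
# a_n/a_{n-1} = (1/xc) * (1 - (1 - alpha)/n), i.e. a straight line in 1/n.
xc, alpha = 0.25, 1.75            # hypothetical "true" values to recover

a = [1.0]
for n in range(1, 41):
    a.append(a[-1] * (n + alpha - 1) / (n * xc))   # binomial coefficient recursion

# least-squares line through the points (1/n, a_n/a_{n-1})
pts = [(1.0 / n, a[n] / a[n - 1]) for n in range(20, 41)]
mx = sum(x for x, _ in pts) / len(pts)
my = sum(y for _, y in pts) / len(pts)
slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
intercept = my - slope * mx

xc_est = 1.0 / intercept            # intercept -> 1/xc
alpha_est = slope * xc_est + 1.0    # slope -> (alpha - 1)/xc
print(xc_est, alpha_est)
```

For a genuine HTE the ratios only approach the line asymptotically, so in practice one fits the large-$n$ tail, as done here.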

High temperature expansion for correlation functions

Can we also derive a high temperature expansion for the spin-spin correlation function $C_{kl} = \langle\sigma_k\sigma_l\rangle$? Yes we can. We have

$$C_{kl} = \frac{\text{Tr}\,\big[\sigma_k\,\sigma_l\;e^{\beta J\sum_{\langle ij\rangle}\sigma_i\sigma_j}\big]}{\text{Tr}\,\big[e^{\beta J\sum_{\langle ij\rangle}\sigma_i\sigma_j}\big]}\equiv\frac{Y_{kl}}{Z}\ .\tag{6.1.41}$$

Recall our analysis of the partition function $Z$. We concluded that in order for the trace not to vanish, the spin variable $\sigma_i$ on each site $i$ must occur an even number of times in the expansion of the product. Similar considerations hold for $Y_{kl}$, except now due to the presence of $\sigma_k$ and $\sigma_l$, those variables now must occur an odd number of times when expanding the product. It is clear that the only nonvanishing diagrams will be those in which there is a finite string connecting sites $k$ and $l$, in addition to the usual closed HTE loops. See Figure 6.1.4 for an instructive sketch. One then expands both $Y_{kl}$ as well as $Z$ in powers of $x = \tanh\beta J$, taking the ratio to obtain the correlator $C_{kl}$. At high temperatures ($x\to 0$), both numerator and denominator are dominated by the configurations $\Gamma$ with the shortest possible total perimeter. For $Z$, this means the trivial path $\Gamma = \{\varnothing\}$, while for $Y_{kl}$ this means finding the shortest length path from $k$ to $l$. (If there is no straight line path from $k$ to $l$, there will in general be several such minimizing paths.) Note, however, that the presence of the string between sites $k$ and $l$ complicates the analysis of $g_\Gamma$ for the closed loops, since none of the links of $\Gamma$ can intersect the string. It is worth stressing that this does not mean that the string and the closed loops cannot intersect at isolated sites, but only that they share no common links; see once again Figure 6.1.4.

Figure 6.1.4: HTE diagrams for the numerator $Y_{kl}$ of the correlation function $C_{kl}$. The blue path connecting sites $k$ and $l$ is the string. The remaining red paths are all closed loops.
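In one dimension the string picture can be checked exactly: for the open chain there are no closed loops and a single shortest path, so $C_{kl} = x^{|k-l|}$. A brute-force verification against the trace, for a small chain (parameters arbitrary):

```python
import math
from itertools import product

def chain_stats(N, bJ):
    """Exact trace over all 2^N configurations of the open Ising chain."""
    Z = 0.0
    corr = {}   # corr[(k, l)] accumulates sigma_k * sigma_l * Boltzmann weight
    for s in product([-1, 1], repeat=N):
        w = math.exp(bJ * sum(s[i] * s[i + 1] for i in range(N - 1)))
        Z += w
        for k in range(N):
            for l in range(k + 1, N):
                corr[(k, l)] = corr.get((k, l), 0.0) + s[k] * s[l] * w
    return Z, {kl: y / Z for kl, y in corr.items()}

N, bJ = 6, 0.7
Z, C = chain_stats(N, bJ)
x = math.tanh(bJ)

assert abs(Z - 2**N * math.cosh(bJ)**(N - 1)) < 1e-9 * Z   # Eq. 6.1.38, open chain
for (k, l), c in C.items():
    assert abs(c - x**(l - k)) < 1e-12                      # C_kl = x^{|k-l|}
print("OK")
```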

This page titled 6.1: Ising Model is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

6.2: Nonideal Classical Gases
Let’s switch gears now and return to the study of continuous classical systems described by a Hamiltonian H ^ ({ x }, { p }). In the
i i

next chapter, we will see how the critical properties of classical fluids can in fact be modeled by an appropriate lattice gas Ising
model, and we’ll derive methods for describing the liquid-gas phase transition in such a model.

The Configuration Integral


Consider the ordinary canonical partition function for a nonideal system of identical point particles interacting via a central two-body potential $u(r)$. We work in the ordinary canonical ensemble. The $N$-particle partition function is

$$\begin{aligned}
Z(T,V,N) &= \frac{1}{N!}\int\prod_{i=1}^N\frac{d^dp_i\,d^dx_i}{h^d}\;e^{-\hat H/k_BT}\\
&= \frac{\lambda_T^{-Nd}}{N!}\int\prod_{i=1}^N d^dx_i\,\exp\bigg(\!-\frac{1}{k_BT}\sum_{i<j}u\big(|x_i-x_j|\big)\bigg)\ .
\end{aligned}$$

Here, we have assumed a many body Hamiltonian of the form

$$\hat H = \sum_{i=1}^N\frac{p_i^2}{2m}+\sum_{i<j}u\big(|x_i-x_j|\big)\ ,\tag{6.2.1}$$

in which massive nonrelativistic particles interact via a two-body central potential. As before, $\lambda_T = \sqrt{2\pi\hbar^2/mk_BT}$ is the thermal wavelength. We can now write

$$Z(T,V,N) = \lambda_T^{-Nd}\,Q_N(T,V)\ ,\tag{6.2.2}$$

where the configuration integral $Q_N(T,V)$ is given by

$$Q_N(T,V) = \frac{1}{N!}\int d^dx_1\cdots\int d^dx_N\,\prod_{i<j}e^{-\beta u(r_{ij})}\ .\tag{6.2.3}$$

There are no general methods for evaluating the configurational integral exactly.

One-dimensional Tonks gas


The Tonks gas is a one-dimensional generalization of the hard sphere gas. Consider a one-dimensional gas of indistinguishable particles of mass $m$ interacting via the potential

$$u(x-x') = \begin{cases}\infty & \text{if}\ |x-x'|<a\\ 0 & \text{if}\ |x-x'|\geq a\ .\end{cases}\tag{6.2.4}$$

Thus, the Tonks gas may be considered to be a gas of hard rods. The above potential guarantees that the portion of configuration space in which any rods overlap is forbidden in this model$^4$. Let the gas be placed in a finite volume $L$. The hard sphere nature of the particles means that no particle can get within a distance $\frac12 a$ of the ends at $x=0$ and $x=L$. That is, there is a one-body potential $v(x)$ acting as well, where

$$v(x) = \begin{cases}\infty & \text{if}\ x<\frac12 a\\ 0 & \text{if}\ \frac12 a\leq x\leq L-\frac12 a\\ \infty & \text{if}\ x>L-\frac12 a\ .\end{cases}\tag{6.2.5}$$

The configuration integral of the 1D Tonks gas is given by

$$Q_N(T,L) = \frac{1}{N!}\int\limits_0^L\!dx_1\cdots\int\limits_0^L\!dx_N\;\chi(x_1,\ldots,x_N)\ ,\tag{6.2.6}$$

where $\chi = e^{-U/k_BT}$ is zero if any two 'rods' (of length $a$) overlap, or if any rod overlaps with either boundary at $x=0$ and $x=L$, and $\chi=1$ otherwise. Note that $\chi$ does not depend on temperature. Without loss of generality, we can integrate over the subspace where $x_1<x_2<\cdots<x_N$ and then multiply the result by $N!$. Clearly $x_j$ must lie to the right of $x_{j-1}+a$ and to the left of $Y_j\equiv L-(N-j)\,a-\frac12 a$. Thus, the configurational integral is

$$\begin{aligned}
Q_N(T,L) &= \int\limits_{a/2}^{Y_1}\!dx_1\int\limits_{x_1+a}^{Y_2}\!dx_2\cdots\int\limits_{x_{N-1}+a}^{Y_N}\!dx_N\\
&= \int\limits_{a/2}^{Y_1}\!dx_1\int\limits_{x_1+a}^{Y_2}\!dx_2\cdots\int\limits_{x_{N-2}+a}^{Y_{N-1}}\!dx_{N-1}\,\big(Y_{N-1}-x_{N-1}\big)\\
&= \int\limits_{a/2}^{Y_1}\!dx_1\int\limits_{x_1+a}^{Y_2}\!dx_2\cdots\int\limits_{x_{N-3}+a}^{Y_{N-2}}\!dx_{N-2}\;\tfrac12\big(Y_{N-2}-x_{N-2}\big)^2 = \cdots\\
&= \frac{1}{N!}\,\big(Y_1-\tfrac12 a\big)^N = \frac{1}{N!}\,(L-Na)^N\ .
\end{aligned}$$

The partition function is $Z(T,L,N) = \lambda_T^{-N}\,Q_N(T,L)$, and so the free energy is

$$F = -k_BT\ln Z = -Nk_BT\,\Big\{\!-\ln\lambda_T+1+\ln\Big(\frac{L}{N}-a\Big)\Big\}\ ,\tag{6.2.7}$$

where we have used Stirling's rule to write $\ln N!\approx N\ln N-N$. The pressure is

$$p = -\frac{\partial F}{\partial L} = \frac{k_BT}{\frac{L}{N}-a} = \frac{nk_BT}{1-na}\ ,\tag{6.2.8}$$

where $n = N/L$ is the one-dimensional density. Note that the pressure diverges as $n$ approaches $1/a$. The usual one-dimensional ideal gas law, $pL = Nk_BT$, is replaced by $pL_{\rm eff} = Nk_BT$, where $L_{\rm eff} = L-Na$ is the 'free' volume obtained by subtracting the total "excluded volume" $Na$ from the original volume $L$. Note the similarity here to the van der Waals equation of state, $(p+av^{-2})(v-b) = RT$, where $v = N_AV/N$ is the molar volume. Defining $\tilde a\equiv a/N_A^2$ and $\tilde b\equiv b/N_A$, we have

$$p+\tilde a\,n^2 = \frac{nk_BT}{1-\tilde b\,n}\ ,\tag{6.2.9}$$

where $n = N_A/v$ is the number density. The term involving the constant $\tilde a$ is due to the long-ranged attraction of atoms due to their mutual polarizability. The term involving $\tilde b$ is an excluded volume effect. The Tonks gas models only the latter.
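The closed form $Q_N(T,L) = (L-Na)^N/N!$ implies that the probability for $N$ points dropped uniformly into $[0,L]$ to form an allowed hard-rod configuration is $N!\,Q_N/L^N = \big((L-Na)/L\big)^N$. A Monte Carlo sketch of this check (all parameters arbitrary):

```python
import random

random.seed(42)

L, a, N = 10.0, 1.0, 3          # box length, rod length, particle number (arbitrary)
trials, hits = 200_000, 0

for _ in range(trials):
    xs = sorted(random.uniform(0.0, L) for _ in range(N))
    ok = xs[0] >= a / 2 and xs[-1] <= L - a / 2                 # wall exclusion
    ok = ok and all(xs[i + 1] - xs[i] >= a for i in range(N - 1))  # no rod overlap
    hits += ok

p_mc = hits / trials
p_exact = ((L - N * a) / L) ** N    # = N! Q_N / L^N with Q_N = (L - N a)^N / N!
print(p_mc, p_exact)
assert abs(p_mc - p_exact) < 0.01
```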

Mayer Cluster Expansion


Let us return to the general problem of computing the configuration integral. Consider the function $e^{-\beta u_{ij}}$, where $u_{ij}\equiv u(|x_i-x_j|)$. We assume that at very short distances there is a strong repulsion between particles, $u_{ij}\to\infty$ as $r_{ij} = |x_i-x_j|\to 0$, and that $u_{ij}\to 0$ as $r_{ij}\to\infty$. Thus, $e^{-\beta u_{ij}}$ vanishes as $r_{ij}\to 0$ and approaches unity as $r_{ij}\to\infty$. For our purposes, it will prove useful to define the function

$$f(r) = e^{-\beta u(r)}-1\ ,\tag{6.2.10}$$

called the Mayer function after Josef Mayer. We may now write

$$Q_N(T,V) = \frac{1}{N!}\int d^dx_1\cdots\int d^dx_N\,\prod_{i<j}\big(1+f_{ij}\big)\ .\tag{6.2.11}$$

Figure 6.2.1: Bottom panel: Lennard-Jones potential $u(r) = 4\epsilon\,(x^{-12}-x^{-6})$, with $x = r/\sigma$ and $\epsilon = 1$. Note the weak attractive tail and the strong repulsive core. Top panel: Mayer function $f(r,T) = e^{-u(r)/k_BT}-1$ for $k_BT = 0.8\,\epsilon$ (blue), $k_BT = 1.5\,\epsilon$ (green), and $k_BT = 5\,\epsilon$ (red).
A typical potential we might consider is the semi-phenomenological Lennard-Jones potential,

$$u(r) = 4\epsilon\,\bigg\{\Big(\frac{\sigma}{r}\Big)^{\!12}-\Big(\frac{\sigma}{r}\Big)^{\!6}\bigg\}\ .\tag{6.2.12}$$

This accounts for a long-distance attraction due to mutually induced electric dipole fluctuations, and a strong short-ranged repulsion, phenomenologically modelled with a $r^{-12}$ potential, which mimics a hard core due to overlap of the atomic electron distributions. Setting $u'(r) = 0$ we obtain $r^* = 2^{1/6}\,\sigma\approx 1.12246\,\sigma$ at the minimum, where $u(r^*) = -\epsilon$. In contrast to the Boltzmann weight $e^{-\beta u(r)}$, the Mayer function $f(r)$ vanishes as $r\to\infty$, behaving as $f(r)\sim-\beta u(r)$. The Mayer function also depends on temperature. Sketches of $u(r)$ and $f(r)$ for the Lennard-Jones model are shown in Figure 6.2.1.
The Lennard-Jones potential5 is realistic for certain simple fluids, but it leads to a configuration integral which is in general
impossible to evaluate. Indeed, even a potential as simple as that of the hard sphere gas is intractable in more than one space
dimension. We can however make progress by deriving a series expansion for the equation of state in powers of the particle density.
This is known as the virial expansion. As was the case when we investigated noninteracting quantum statistics, it is convenient to
work in the grand canonical ensemble and to derive series expansions for the density n(T , z) and the pressure p(T , z) in terms of
the fugacity z , then solve for z(T , n) to obtain p(T , n). These expansions in terms of fugacity have a nifty diagrammatic
interpretation, due to Mayer.
We begin by expanding the product in Equation 6.2.11 as

$$\prod_{i<j}\big(1+f_{ij}\big) = 1+\sum_{i<j}f_{ij}+\!\!\sum_{\substack{i<j\,,\,k<l\\ (ij)\neq(kl)}}\!\!f_{ij}\,f_{kl}+\ldots\ .\tag{6.2.13}$$

As there are $\frac12 N(N-1)$ possible pairings, there are $2^{N(N-1)/2}$ terms in the expansion of the above product. Each such term may be represented by a graph, as shown in Figure 6.2.2. For each such term, we draw a connection between dots representing different particles $i$ and $j$ if the factor $f_{ij}$ appears in the term under consideration. The contribution for any given graph may be written as a product over contributions from each of its disconnected component clusters. For example, in the case of the term in Figure 6.2.2, the contribution to the configurational integral would be

$$\begin{aligned}
\Delta Q = \frac{V^{N-11}}{N!}&\int d^dx_1\,d^dx_4\,d^dx_7\,d^dx_9\;f_{1,4}\,f_{4,7}\,f_{4,9}\,f_{7,9}\\
&\times\int d^dx_2\,d^dx_5\,d^dx_6\;f_{2,5}\,f_{2,6}\times\int d^dx_3\,d^dx_{10}\;f_{3,10}\times\int d^dx_8\,d^dx_{11}\;f_{8,11}\ .
\end{aligned}$$

We will refer to a given product of Mayer functions which arises from this expansion as a term.

Figure 6.2.2 : Diagrammatic interpretation of a term involving a product of eight Mayer functions.

Figure 6.6: Left: John Lennard-Jones. Center: Catherine Zeta-Jones. Right: James Earl Jones.
The particular labels we assign to each vertex of a given graph don't affect the overall value of the graph. Now a given unlabeled graph consists of a certain number of connected subgraphs. For a system with $N$ particles, we may then write

$$N = \sum_\gamma m_\gamma\,n_\gamma\ ,\tag{6.2.14}$$

where $\gamma$ ranges over all possible connected subgraphs, and

$$\begin{aligned}
m_\gamma &= \text{number of connected subgraphs of type}\ \gamma\ \text{in the unlabeled graph}\\
n_\gamma &= \text{number of vertices in the connected subgraph}\ \gamma\ .
\end{aligned}$$

Note that the single vertex $\bullet$ counts as a connected subgraph, with $n_\bullet = 1$. We now ask: how many ways are there of assigning the $N$ labels to the $N$ vertices of a given unlabeled graph? One might first think the answer is simply $N!$, however this is too big, because different assignments of the labels to the vertices may not result in a distinct graph. To see this, consider the examples in Figure 6.2.3. In the first example, an unlabeled graph with four vertices consists of two identical connected subgraphs. Given any assignment of labels to the vertices, then, we can simply exchange the two subgraphs and get the same term. So we should divide $N!$ by the product $\prod_\gamma m_\gamma!$. But even this is not enough, because within each connected subgraph $\gamma$ there may be permutations which leave the integrand unchanged, as shown in the second and third examples in Figure 6.2.3. We define the symmetry factor $s_\gamma$ as the number of permutations of the labels which leaves a given connected subgraph $\gamma$ invariant. Examples of symmetry factors are shown in Figure 6.2.4. Consider, for example, the third subgraph in the top row. Clearly one can rotate the figure about its horizontal symmetry axis to obtain a new labeling which represents the same term. This twofold axis is the only symmetry the diagram possesses, hence $s_\gamma = 2$. For the first diagram in the second row, one can rotate either of the triangles about the horizontal symmetry axis. One can also rotate the figure in the plane by $180^\circ$ so as to exchange the two triangles. Thus, there are $2\times 2\times 2 = 8$ symmetry operations which result in the same term, and $s_\gamma = 8$. Finally, the last subgraph in the second row consists of five vertices each of which is connected to the other four. Therefore any permutation of the labels results in the same term, and $s_\gamma = 5! = 120$. In addition to dividing by the product $\prod_\gamma m_\gamma!$, we must then also divide by $\prod_\gamma s_\gamma^{m_\gamma}$.

Figure 6.2.3: Different assignations of labels to vertices may not result in a distinct term in the expansion of the configuration integral.

We can now write the partition function as

$$\begin{aligned}
Z &= \frac{\lambda_T^{-Nd}}{N!}\sum_{\{m_\gamma\}}\frac{N!}{\prod_\gamma m_\gamma!\;s_\gamma^{m_\gamma}}\cdot\prod_\gamma\bigg(\int d^dx_1\cdots d^dx_{n_\gamma}\prod_{i<j}^\gamma f_{ij}\bigg)^{\!m_\gamma}\cdot\delta_{N,\sum_\gamma m_\gamma n_\gamma}\\
&= \lambda_T^{-Nd}\sum_{\{m_\gamma\}}\prod_\gamma\frac{\big(V\,b_\gamma(T)\big)^{m_\gamma}}{m_\gamma!}\cdot\delta_{N,\sum_\gamma m_\gamma n_\gamma}\ ,
\end{aligned}$$

where the product $\prod_{i<j}^\gamma f_{ij}$ is over all links in the subgraph $\gamma$. The final Kronecker delta enforces the constraint $N = \sum_\gamma m_\gamma n_\gamma$. We have defined the cluster integrals $b_\gamma$ as

$$b_\gamma(T)\equiv\frac{1}{s_\gamma}\cdot\frac{1}{V}\int d^dx_1\cdots d^dx_{n_\gamma}\,\prod_{i<j}^\gamma f_{ij}\ ,\tag{6.2.15}$$

where we assume the limit $V\to\infty$. Since $f_{ij} = f(|x_i-x_j|)$, the product $\prod_{i<j}^\gamma f_{ij}$ is invariant under simultaneous translation of all the coordinate vectors by any constant vector, and hence the integral over the $n_\gamma$ position variables contains exactly one factor of the volume, which cancels with the prefactor in the above definition of $b_\gamma$. Thus, each cluster integral is intensive$^6$, scaling as $V^0$.

If we compute the grand partition function, then the fixed $N$ constraint is relaxed, and we can do the sums:

$$\begin{aligned}
\Xi = e^{-\beta\Omega} &= \sum_{\{m_\gamma\}}\big(e^{\beta\mu}\lambda_T^{-d}\big)^{\sum_\gamma m_\gamma n_\gamma}\,\prod_\gamma\frac{1}{m_\gamma!}\,\big(V\,b_\gamma\big)^{m_\gamma}\\
&= \prod_\gamma\,\sum_{m_\gamma=0}^\infty\frac{1}{m_\gamma!}\Big(\big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\,V\,b_\gamma\Big)^{\!m_\gamma}\\
&= \exp\bigg(V\sum_\gamma\big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma\bigg)\ .
\end{aligned}$$

Thus,

$$\Omega(T,V,\mu) = -Vk_BT\sum_\gamma\big(e^{\beta\mu}\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma(T)\ ,\tag{6.2.16}$$

and we can write

$$\begin{aligned}
p &= k_BT\sum_\gamma\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma(T)\\
n &= \sum_\gamma n_\gamma\,\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma(T)\ ,
\end{aligned}$$

where $z = \exp(\beta\mu)$ is the fugacity, and where $b_\bullet\equiv 1$. As in the case of ideal quantum gas statistical mechanics, we can systematically invert the relation $n = n(z,T)$ to obtain $z = z(n,T)$, and then insert this into the equation for $p(z,T)$ to obtain the equation of state $p = p(n,T)$. This yields the virial expansion of the equation of state,

$$p = nk_BT\,\big\{1+B_2(T)\,n+B_3(T)\,n^2+\ldots\big\}\ .\tag{6.2.17}$$

Figure 6.2.4: The symmetry factor $s_\gamma$ for a connected subgraph $\gamma$ is the number of permutations of its indices which leaves the term $\prod_{(ij)\in\gamma}f_{ij}$ invariant.

Lowest order expansion


We have

$$b_-(T) = \frac{1}{2V}\int d^dx_1\!\int d^dx_2\;f\big(|x_1-x_2|\big) = \frac12\int d^dr\,f(r)$$

and

$$b_\wedge(T) = \frac{1}{2V}\int d^dx_1\!\int d^dx_2\!\int d^dx_3\;f\big(|x_1-x_2|\big)\,f\big(|x_1-x_3|\big) = \frac12\int d^dr\!\int d^dr'\,f(r)\,f(r') = 2\big(b_-\big)^2$$

and

$$b_\triangle(T) = \frac{1}{6V}\int d^dx_1\!\int d^dx_2\!\int d^dx_3\;f\big(|x_1-x_2|\big)\,f\big(|x_1-x_3|\big)\,f\big(|x_2-x_3|\big) = \frac16\int d^dr\!\int d^dr'\,f(r)\,f(r')\,f\big(|r-r'|\big)\ .$$

We may now write

$$\begin{aligned}
p &= k_BT\,\Big\{z\lambda_T^{-d}+\big(z\lambda_T^{-d}\big)^2\,b_-(T)+\big(z\lambda_T^{-d}\big)^3\cdot\big(b_\wedge+b_\triangle\big)+\mathcal O(z^4)\Big\}\\
n &= z\lambda_T^{-d}+2\big(z\lambda_T^{-d}\big)^2\,b_-(T)+3\big(z\lambda_T^{-d}\big)^3\,\big(b_\wedge+b_\triangle\big)+\mathcal O(z^4)
\end{aligned}$$

We invert by writing

$$z\lambda_T^{-d} = n+\alpha_2\,n^2+\alpha_3\,n^3+\ldots\tag{6.2.18}$$

and substituting into the equation for $n(z,T)$, yielding

$$n = \big(n+\alpha_2\,n^2+\alpha_3\,n^3\big)+2\big(n+\alpha_2\,n^2\big)^2\,b_-+3n^3\,\big(b_\wedge+b_\triangle\big)+\mathcal O(n^4)\ .\tag{6.2.19}$$

Thus,

$$0 = \big(\alpha_2+2b_-\big)\,n^2+\big(\alpha_3+4\alpha_2\,b_-+3\,b_\wedge+3\,b_\triangle\big)\,n^3+\ldots\ .\tag{6.2.20}$$

We therefore conclude

$$\begin{aligned}
\alpha_2 &= -2\,b_-\\
\alpha_3 &= -4\,\alpha_2\,b_--3\,b_\wedge-3\,b_\triangle = 8\,b_-^2-6\,b_-^2-3\,b_\triangle = 2\,b_-^2-3\,b_\triangle\ .
\end{aligned}$$

We now insert Equation 6.2.18 with the determined values of $\alpha_{2,3}$ into the equation for $p(z,T)$, obtaining

$$\begin{aligned}
\frac{p}{k_BT} &= n-2\,b_-\,n^2+\big(2\,b_-^2-3\,b_\triangle\big)\,n^3+\big(n-2\,b_-\,n^2\big)^2\,b_-+n^3\,\big(2\,b_-^2+b_\triangle\big)+\mathcal O(n^4)\\
&= n-b_-\,n^2-2\,b_\triangle\,n^3+\mathcal O(n^4)\ .
\end{aligned}$$

Thus,

$$B_2(T) = -b_-(T)\quad,\quad B_3(T) = -2\,b_\triangle(T)\ .\tag{6.2.21}$$

Note that $b_\wedge$ does not contribute to $B_3$ – only $b_\triangle$ appears. As we shall see, this is because the virial coefficients $B_j$ involve only cluster integrals $b_\gamma$ for one-particle irreducible clusters, those clusters which remain connected if any of the vertices plus all its links are removed.

One-particle irreducible clusters and the virial expansion


We start with our previous expressions for $p(T,z)$ and $n(T,z)$,

$$p = k_BT\sum_\gamma\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma(T)\quad,\quad n = \sum_\gamma n_\gamma\,\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma(T)\ ,$$

where $b_\gamma(T)$ for the connected cluster $\gamma$ is given by

$$b_\gamma(T)\equiv\frac{1}{s_\gamma}\cdot\frac{1}{V}\int d^dx_1\cdots d^dx_{n_\gamma}\,\prod_{i<j}^\gamma f_{ij}\ .\tag{6.2.22}$$

It is convenient to work with dimensionless quantities, using $\lambda_T^d$ as the unit of volume. To this end, define

$$\nu\equiv n\lambda_T^d\quad,\quad \pi\equiv p\lambda_T^d\quad,\quad c_\gamma(T)\equiv b_\gamma(T)\,\big(\lambda_T^d\big)^{n_\gamma-1}\ ,\tag{6.2.23}$$

so that

$$\beta\pi = \sum_\gamma c_\gamma\,z^{n_\gamma} = \sum_{\ell=1}^\infty d_\ell\,z^\ell\quad,\quad \nu = \sum_\gamma n_\gamma\,c_\gamma\,z^{n_\gamma} = \sum_{\ell=1}^\infty\ell\,d_\ell\,z^\ell\ ,\tag{6.2.24}$$

where

$$d_\ell = \sum_\gamma c_\gamma\,\delta_{n_\gamma,\ell}\tag{6.2.25}$$

is the sum over all connected clusters with $\ell$ vertices. Here and henceforth, the functional dependence on $T$ is implicit; $\pi$ and $\nu$ are regarded here as explicit functions of $z$. We can, in principle, invert to obtain $z(\nu)$. Let us write this inverse as

$$z(\nu) = \nu\,\exp\bigg(\!-\sum_{k=1}^\infty\beta_k\,\nu^k\bigg)\ .\tag{6.2.26}$$

Ultimately we need to obtain expressions for the coefficients $\beta_k$, but let us first assume the above form and use it to write $\pi$ in terms of $\nu$. We have

$$\begin{aligned}
\beta\pi &= \sum_{\ell=1}^\infty d_\ell\,z^\ell = \int\limits_0^z\!d\tilde z\,\sum_{\ell=1}^\infty\ell\,d_\ell\,\tilde z^{\,\ell-1} = \int\limits_0^\nu\!d\tilde\nu\,\frac{d\tilde z}{d\tilde\nu}\,\frac{\tilde\nu}{\tilde z} = \int\limits_0^\nu\!d\tilde\nu\,\frac{d\ln\tilde z}{d\ln\tilde\nu}\\
&= \int\limits_0^\nu\!d\tilde\nu\,\bigg(1-\sum_{k=1}^\infty k\,\beta_k\,\tilde\nu^k\bigg) = \nu-\sum_{k=1}^\infty\frac{k}{k+1}\,\beta_k\,\nu^{k+1}\equiv\sum_{k=1}^\infty\hat B_k\,\nu^k\ ,
\end{aligned}$$

where $\hat B_k = B_k\,\lambda_T^{-d(k-1)}$ is the dimensionless $k^{\rm th}$ virial coefficient. Thus, $\hat B_1 = 1$ and

$$\hat B_k = -\frac{k-1}{k}\,\beta_{k-1}\tag{6.2.27}$$

for $k>1$. We may also obtain the cluster integrals $d_\ell$ in terms of the $\beta_k$. To this end, note that $\ell^2 d_\ell$ is the coefficient of $z^\ell$ in the function $z\,d\nu/dz$, hence

$$\begin{aligned}
\ell^2 d_\ell &= \oint\frac{dz}{2\pi iz}\,z^{-\ell}\,\Big(z\,\frac{d\nu}{dz}\Big) = \oint\frac{d\nu}{2\pi i}\,z^{-\ell} = \oint\frac{d\nu}{2\pi i}\,\frac{1}{\nu^\ell}\,\prod_{k=1}^\infty e^{\ell\beta_k\nu^k}\\
&= \oint\frac{d\nu}{2\pi i}\,\frac{1}{\nu^\ell}\sum_{\{m_k\}}\prod_{k=1}^\infty\frac{\big(\ell\beta_k\big)^{m_k}}{m_k!}\,\nu^{km_k} = \sum_{\{m_k\}}\delta_{\sum_k km_k\,,\,\ell-1}\,\prod_{k=1}^\infty\frac{\big(\ell\beta_k\big)^{m_k}}{m_k!}\ .
\end{aligned}$$

Irreducible clusters
The clusters which contribute to $d_\ell$ are all connected, by definition. However, it is useful to make a further distinction based on the topology of connected clusters and define a connected cluster $\gamma$ to be irreducible if, upon removing any site in $\gamma$ and all the links connected to that site, the remaining sites of the cluster are still connected. The situation is depicted in Figure 6.2.5.

Figure 6.2.5: Connected versus irreducible clusters. Clusters (a) through (d) are irreducible in that they remain connected if any component site and its connecting links are removed. Cluster (e) is connected, but is reducible. Its integral $c_\gamma$ may be reduced to a product over its irreducible components, each shown in a unique color.

For a reducible cluster $\gamma$, the integral $c_\gamma$ is proportional to a product of cluster integrals over its irreducible components. Let us define the set $\Gamma_\ell$ as the set of all irreducible clusters of $\ell$ vertices. It turns out that

$$\beta_k(T) = \frac{1}{k!}\cdot\frac{1}{V\lambda_T^{kd}}\sum_{\gamma\in\Gamma_{k+1}}\int d^dx_1\cdots\int d^dx_{k+1}\,\prod_{\langle ij\rangle}^\gamma f_{ij}\ .\tag{6.2.28}$$

Thus, the virial coefficients $B_j(T)$ are obtained by summing a restricted set of cluster integrals, viz.

$$B_k(T) = -\frac{k-1}{k}\,\beta_{k-1}(T)\,\lambda_T^{(k-1)d}\ .\tag{6.2.29}$$

In the end, it turns out we don't need the symmetry factors at all!

Cookbook Recipe
Just follow these simple steps!
The pressure and number density are written as an expansion over unlabeled connected clusters $\gamma$, viz.

$$\beta p = \sum_\gamma\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma\quad,\quad n = \sum_\gamma n_\gamma\,\big(z\lambda_T^{-d}\big)^{n_\gamma}\,b_\gamma\ .$$

For each term in each of these sums, draw the unlabeled connected cluster $\gamma$.

Assign labels $1,2,\ldots,n_\gamma$ to the vertices, where $n_\gamma$ is the total number of vertices in the cluster $\gamma$. It doesn't matter how you assign the labels.

Write down the product $\prod_{i<j}^\gamma f_{ij}$. The factor $f_{ij}$ appears in the product if there is a link in your (now labeled) cluster between sites $i$ and $j$.

The symmetry factor $s_\gamma$ is the number of elements of the symmetric group $S_{n_\gamma}$ which leave the product $\prod_{i<j}^\gamma f_{ij}$ invariant. The identity permutation leaves the product invariant, so $s_\gamma\geq 1$.

The cluster integral is

$$b_\gamma(T)\equiv\frac{1}{s_\gamma}\cdot\frac{1}{V}\int d^dx_1\cdots d^dx_{n_\gamma}\,\prod_{i<j}^\gamma f_{ij}\ .\tag{6.2.30}$$

Due to translation invariance, $b_\gamma(T)\propto V^0$. One can therefore set $x_{n_\gamma}\equiv 0$, eliminate the volume factor from the denominator, and perform the integral over the remaining $n_\gamma-1$ coordinates.

This procedure generates expansions for $p(T,z)$ and $n(T,z)$ in powers of the fugacity $z = e^{\beta\mu}$. To obtain something useful like $p(T,n)$, we invert the equation $n = n(T,z)$ to find $z = z(T,n)$, and then substitute into the equation $p = p(T,z)$ to obtain $p = p(T,z(T,n)) = p(T,n)$. The result is the virial expansion,

$$p = nk_BT\,\big\{1+B_2(T)\,n+B_3(T)\,n^2+\ldots\big\}\ ,\tag{6.2.31}$$

where

$$B_k(T) = -\frac{1}{k\,(k-2)!}\sum_{\gamma\in\Gamma_k}\int d^dx_1\cdots\int d^dx_{k-1}\,\prod_{\langle ij\rangle}^\gamma f_{ij}\tag{6.2.32}$$

with $\Gamma_k$ the set of all one-particle irreducible $k$-site clusters.
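The symmetry-factor step can be automated by brute force: $s_\gamma$ is the number of vertex permutations that map the edge set of $\gamma$ onto itself. A sketch checking the values implied by the prefactors of $b_-$, $b_\wedge$, $b_\triangle$, and the five-site complete graph quoted earlier:

```python
from itertools import permutations

def symmetry_factor(n, edges):
    """Number of permutations of {0,...,n-1} mapping the edge set onto itself."""
    E = {frozenset(e) for e in edges}
    count = 0
    for p in permutations(range(n)):
        if {frozenset((p[i], p[j])) for i, j in E} == E:
            count += 1
    return count

# single link (b_-), 3-site chain (b_wedge), triangle (b_triangle)
assert symmetry_factor(2, [(0, 1)]) == 2
assert symmetry_factor(3, [(0, 1), (0, 2)]) == 2
assert symmetry_factor(3, [(0, 1), (0, 2), (1, 2)]) == 6
# complete graph on 5 vertices: s = 5! = 120, as in the text
K5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
assert symmetry_factor(5, K5) == 120
print("OK")
```

Brute force over $n_\gamma!$ permutations is only practical for small clusters, which is exactly the regime where the low-order virial coefficients live.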

Hard sphere gas in three dimensions


The hard sphere potential is given by

$$u(r) = \begin{cases}\infty & \text{if}\ r\leq a\\ 0 & \text{if}\ r>a\ .\end{cases}\tag{6.2.33}$$

Here $a$ is the diameter of the spheres. The corresponding Mayer function is then temperature independent, and given by

$$f(r) = \begin{cases}-1 & \text{if}\ r\leq a\\ 0 & \text{if}\ r>a\ .\end{cases}\tag{6.2.34}$$

We can change variables to obtain

$$b_-(T) = \frac12\int d^3r\,f(r) = -\frac{2}{3}\pi a^3\ .\tag{6.2.35}$$

The calculation of $b_\triangle$ is more challenging. We have

$$b_\triangle = \frac16\int d^3\!\rho\int d^3r\;f(\rho)\,f(r)\,f\big(|r-\rho|\big)\ .\tag{6.2.36}$$
We must first compute the volume of overlap for spheres of radius $a$ (recall $a$ is the diameter of the constituent hard sphere particles) centered at $0$ and at $\rho$:

$$\mathcal V = \int d^3r\,f(r)\,f\big(|r-\rho|\big) = 2\!\int\limits_{\rho/2}^{a}\!dz\;\pi\big(a^2-z^2\big) = \frac{4\pi}{3}\,a^3-\pi a^2\rho+\frac{\pi}{12}\,\rho^3\ .$$

We then integrate over the region $|\rho|<a$, to obtain

$$b_\triangle = -\frac16\cdot 4\pi\!\int\limits_0^a\!d\rho\,\rho^2\cdot\Big\{\frac{4\pi}{3}\,a^3-\pi a^2\rho+\frac{\pi}{12}\,\rho^3\Big\} = -\frac{5\pi^2}{36}\,a^6\ .\tag{6.2.37}$$

Thus,

$$p = nk_BT\,\Big\{1+\frac{2\pi}{3}\,a^3\,n+\frac{5\pi^2}{18}\,a^6\,n^2+\mathcal O(n^3)\Big\}\ .\tag{6.2.38}$$

Figure 6.2.6: The overlap of hard sphere Mayer functions. The shaded volume is $\mathcal V$.
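The integral leading to Equation 6.2.37 is a one-dimensional quadrature once the overlap volume is known, so the result $-5\pi^2a^6/36$ is easy to confirm numerically. A sketch using Simpson's rule, in units where $a = 1$:

```python
import math

a = 1.0

def overlap(rho):
    """Lens-shaped overlap volume of two spheres of radius a at separation rho."""
    return (4 * math.pi / 3) * a**3 - math.pi * a**2 * rho + (math.pi / 12) * rho**3

def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule; n must be even."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return s * h / 3

# b_triangle = -(1/6) * 4*pi * Int_0^a drho rho^2 * overlap(rho)   (Eq. 6.2.37)
b_tri = -(2 * math.pi / 3) * simpson(lambda r: r * r * overlap(r), 0.0, a)
exact = -5 * math.pi**2 * a**6 / 36
print(b_tri, exact)
assert abs(b_tri - exact) < 1e-9
```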

Weakly attractive tail


Suppose

$$u(r) = \begin{cases}\infty & \text{if}\ r\leq a\\ -u_0(r) & \text{if}\ r>a\ .\end{cases}\tag{6.2.39}$$

Then the corresponding Mayer function is

$$f(r) = \begin{cases}-1 & \text{if}\ r\leq a\\ e^{\beta u_0(r)}-1 & \text{if}\ r>a\ .\end{cases}\tag{6.2.40}$$

Thus,

$$b_-(T) = \frac12\int d^3r\,f(r) = -\frac{2\pi}{3}\,a^3+2\pi\!\int\limits_a^\infty\!dr\,r^2\,\big[e^{\beta u_0(r)}-1\big]\ .\tag{6.2.41}$$

Thus, the second virial coefficient is

$$B_2(T) = -b_-(T)\approx\frac{2\pi}{3}\,a^3-\frac{2\pi}{k_BT}\int\limits_a^\infty\!dr\,r^2\,u_0(r)\ ,\tag{6.2.42}$$

where we have assumed $k_BT\gg u_0(r)$. We see that the second virial coefficient changes sign at some temperature $T_0$, from a negative low temperature value to a positive high temperature value.

Spherical Potential Well


Consider an attractive spherical well potential with an infinitely repulsive core,

$$u(r) = \begin{cases}\infty & \text{if}\ r\leq a\\ -\epsilon & \text{if}\ a<r<R\\ 0 & \text{if}\ r>R\ .\end{cases}\tag{6.2.43}$$

Then the corresponding Mayer function is

$$f(r) = \begin{cases}-1 & \text{if}\ r\leq a\\ e^{\beta\epsilon}-1 & \text{if}\ a<r<R\\ 0 & \text{if}\ r>R\ .\end{cases}\tag{6.2.44}$$

Writing $s\equiv R/a$, we have

$$\begin{aligned}
B_2(T) = -b_-(T) &= -\frac12\int d^3r\,f(r)\\
&= -\frac12\,\Big\{(-1)\cdot\frac{4\pi}{3}\,a^3+\big(e^{\beta\epsilon}-1\big)\cdot\frac{4\pi}{3}\,a^3\big(s^3-1\big)\Big\}\\
&= \frac{2\pi}{3}\,a^3\,\Big\{1-\big(s^3-1\big)\big(e^{\beta\epsilon}-1\big)\Big\}\ .
\end{aligned}$$

To find the temperature $T_0$ where $B_2(T)$ changes sign, we set $B_2(T_0) = 0$ and obtain

$$k_BT_0 = \epsilon\Big/\!\ln\bigg(\frac{s^3}{s^3-1}\bigg)\ .\tag{6.2.45}$$
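Equation 6.2.45 can be checked by locating the zero of $B_2(T)$ numerically. A sketch in units where $k_B = 1$, with arbitrarily chosen well parameters:

```python
import math

a, eps, s = 1.0, 1.0, 1.5        # core diameter, well depth, s = R/a (arbitrary)

def B2(T):
    """Second virial coefficient of the square-well gas (units k_B = 1)."""
    return (2 * math.pi / 3) * a**3 * (1 - (s**3 - 1) * (math.exp(eps / T) - 1))

# B2 is monotonically increasing in T, so bracket the sign change and bisect
lo, hi = 0.5, 50.0
assert B2(lo) < 0 < B2(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if B2(mid) < 0 else (lo, mid)

T0_exact = eps / math.log(s**3 / (s**3 - 1))   # Eq. 6.2.45
print(0.5 * (lo + hi), T0_exact)
assert abs(0.5 * (lo + hi) - T0_exact) < 1e-8
```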

Recall in our study of the thermodynamics of the Joule-Thomson effect in §1.10.6 that the throttling process is isenthalpic. The temperature change, when a gas is pushed (or escapes) through a porous plug from a high pressure region to a low pressure one is

$$\Delta T = \int\limits_{p_1}^{p_2}\!dp\,\bigg(\frac{\partial T}{\partial p}\bigg)_{\!\!H}\ ,\tag{6.2.46}$$

where

$$\bigg(\frac{\partial T}{\partial p}\bigg)_{\!\!H} = \frac{1}{C_p}\bigg[T\bigg(\frac{\partial V}{\partial T}\bigg)_{\!\!p}-V\bigg]\ .\tag{6.2.47}$$

Appealing to the virial expansion, and working to lowest order in corrections to the ideal gas law, we have

$$p = \frac{N}{V}\,k_BT+\frac{N^2}{V^2}\,k_BT\,B_2(T)+\ldots\tag{6.2.48}$$

and we compute $\big(\frac{\partial V}{\partial T}\big)_p$ by setting

$$0 = dp = -\frac{Nk_BT}{V^2}\,dV+\frac{Nk_B}{V}\,dT-\frac{2N^2}{V^3}\,k_BT\,B_2(T)\,dV+\frac{N^2}{V^2}\,d\big(k_BT\,B_2(T)\big)+\ldots\ .\tag{6.2.49}$$
Figure 6.2.7 : An attractive spherical well with a repulsive core u(r) and its associated Mayer function f (r).
Dividing by $dT$, we find

$$T\bigg(\frac{\partial V}{\partial T}\bigg)_{\!\!p}-V = N\,\bigg[T\,\frac{\partial B_2}{\partial T}-B_2\bigg]\ .\tag{6.2.50}$$

The temperature where $\big(\frac{\partial T}{\partial p}\big)_H$ changes sign is called the inversion temperature $T^*$. To find the inversion point, we set $T^*B_2'(T^*) = B_2(T^*)$, i.e.

$$\frac{d\ln B_2}{d\ln T}\bigg|_{T^*} = 1\ .\tag{6.2.51}$$

If we approximate $B_2(T)\approx A-\frac{B}{T}$, then the inversion temperature follows simply:

$$\frac{B}{T^*} = A-\frac{B}{T^*}\implies T^* = \frac{2B}{A}\ .\tag{6.2.52}$$

Hard spheres with a hard wall


Consider a hard sphere gas in three dimensions in the presence of a hard wall at $z=0$. The gas is confined to the region $z>0$. The total potential energy is now

$$W(x_1,\ldots,x_N) = \sum_i v(x_i)+\sum_{i<j}u(x_i-x_j)\ ,\tag{6.2.53}$$

where

$$v(r) = v(z) = \begin{cases}\infty & \text{if}\ z\leq\frac12 a\\ 0 & \text{if}\ z>\frac12 a\ ,\end{cases}\tag{6.2.54}$$

and $u(r)$ is given in Equation 6.2.33. The grand potential is written as a series in the total particle number $N$, and is given by

$$\Xi = e^{-\beta\Omega} = 1+\xi\int d^3r\,e^{-\beta v(z)}+\tfrac12\,\xi^2\!\int d^3r\!\int d^3r'\,e^{-\beta v(z)}\,e^{-\beta v(z')}\,e^{-\beta u(r-r')}+\ldots\ ,\tag{6.2.55}$$

where $\xi = z\lambda_T^{-3}$, with $z = e^{\mu/k_BT}$ the fugacity. Taking the logarithm, and invoking the Taylor series $\ln(1+\delta) = \delta-\frac12\delta^2+\frac13\delta^3-\ldots$, we obtain

$$-\beta\Omega = \xi\!\!\int\limits_{z>\frac a2}\!\!d^3r+\frac{\xi^2}{2}\!\!\int\limits_{z>\frac a2}\!\!d^3r\!\!\int\limits_{z'>\frac a2}\!\!d^3r'\,\big[e^{-\beta u(r-r')}-1\big]+\ldots\tag{6.2.56}$$

The volume is $V = \int_{z>0}d^3r$. Dividing by $V$, we have, in the thermodynamic limit,

$$\begin{aligned}
-\frac{\beta\Omega}{V} = \beta p &= \xi+\frac12\,\xi^2\,\frac1V\!\!\int\limits_{z>\frac a2}\!\!d^3r\!\!\int\limits_{z'>\frac a2}\!\!d^3r'\,\big[e^{-\beta u(r-r')}-1\big]+\ldots\\
&= \xi-\frac{2}{3}\,\pi a^3\,\xi^2+\mathcal O(\xi^3)\ .
\end{aligned}$$

The number density is

$$n = \xi\,\frac{\partial}{\partial\xi}\,\big(\beta p\big) = \xi-\frac{4}{3}\,\pi a^3\,\xi^2+\mathcal O(\xi^3)\ ,\tag{6.2.57}$$

and inverting to obtain $\xi(n)$ and then substituting into the pressure equation, we obtain the lowest order virial expansion for the equation of state,

$$p = k_BT\,\Big\{n+\frac{2}{3}\,\pi a^3\,n^2+\ldots\Big\}\ .\tag{6.2.58}$$

As expected, the presence of the wall does not affect a bulk property such as the equation of state.

Figure 6.2.8: In the presence of a hard wall, the Mayer sphere is cut off on the side closest to the wall. The resulting density $n(z)$ vanishes for $z<\frac12 a$ since the center of each sphere must be at least one radius ($\frac12 a$) away from the wall. Between $z=\frac12 a$ and $z=\frac32 a$ there is a density enhancement. If the calculation were carried out to higher order, $n(z)$ would exhibit damped spatial oscillations with wavelength $\lambda\sim a$.


Next, let us compute the number density $n(z)$, given by

$$n(z) = \Big\langle\sum_i\delta(r-r_i)\Big\rangle\ .\tag{6.2.59}$$

Due to translational invariance in the $(x,y)$ plane, we know that the density must be a function of $z$ alone. The presence of the wall at $z=0$ breaks translational symmetry in the $z$ direction. The number density is

$$\begin{aligned}
n(z) &= \text{Tr}\,\Big[e^{\beta(\mu\hat N-\hat H)}\sum_{i=1}^N\delta(r-r_i)\Big]\Big/\text{Tr}\,e^{\beta(\mu\hat N-\hat H)}\\
&= \Xi^{-1}\,\Big\{\xi\,e^{-\beta v(z)}+\xi^2\,e^{-\beta v(z)}\!\int\!d^3r'\,e^{-\beta v(z')}\,e^{-\beta u(r-r')}+\ldots\Big\}\\
&= \xi\,e^{-\beta v(z)}+\xi^2\,e^{-\beta v(z)}\!\int\!d^3r'\,e^{-\beta v(z')}\,\big[e^{-\beta u(r-r')}-1\big]+\ldots\ .
\end{aligned}$$

Note that the term in square brackets in the last line is the Mayer function $f(r-r') = e^{-\beta u(r-r')}-1$. Consider the function

$$e^{-\beta v(z)}\,e^{-\beta v(z')}\,f(r-r') = \begin{cases}0 & \text{if}\ z<\frac12 a\ \text{or}\ z'<\frac12 a\\ 0 & \text{if}\ |r-r'|>a\\ -1 & \text{if}\ z>\frac12 a\ \text{and}\ z'>\frac12 a\ \text{and}\ |r-r'|<a\ .\end{cases}\tag{6.2.60}$$

Now consider the integral of the above function with respect to $r'$. Clearly the result depends on the value of $z$. If $z>\frac32 a$, then there is no excluded region in $r'$ and the integral is $(-1)$ times the full Mayer sphere volume, $-\frac43\pi a^3$. If $z<\frac12 a$ the integral vanishes due to the $e^{-\beta v(z)}$ factor. For $z$ infinitesimally larger than $\frac12 a$, the integral is $(-1)$ times half the Mayer sphere volume, $-\frac23\pi a^3$. For $z\in\big[\frac12 a\,,\frac32 a\big]$ the integral interpolates between $-\frac23\pi a^3$ and $-\frac43\pi a^3$. Explicitly, one finds by elementary integration,

$$\int d^3r'\,e^{-\beta v(z)}\,e^{-\beta v(z')}\,f(r-r') = \begin{cases}0 & \text{if}\ z<\frac12 a\\[1ex] \Big[-1-\frac32\big(\frac za-\frac12\big)+\frac12\big(\frac za-\frac12\big)^3\Big]\cdot\frac23\pi a^3 & \text{if}\ \frac12 a<z<\frac32 a\\[1ex] -\frac43\pi a^3 & \text{if}\ z>\frac32 a\ .\end{cases}\tag{6.2.61}$$
After substituting $\xi = n+\frac43\pi a^3 n^2+\mathcal O(n^3)$ to relate $\xi$ to the bulk density $n = n_\infty$, we obtain the desired result:

$$n(z) = \begin{cases}0 & \text{if}\ z<\frac12 a\\[1ex] n+\Big[1-\frac32\big(\frac za-\frac12\big)+\frac12\big(\frac za-\frac12\big)^3\Big]\cdot\frac23\pi a^3\,n^2 & \text{if}\ \frac12 a<z<\frac32 a\\[1ex] n & \text{if}\ z>\frac32 a\ .\end{cases}\tag{6.2.62}$$

A sketch is provided in the right hand panel of Figure 6.2.8. Note that the density $n(z)$ vanishes identically for $z<\frac12 a$ due to the exclusion of the hard spheres by the wall. For $z$ between $\frac12 a$ and $\frac32 a$, there is a density enhancement, the origin of which has a simple physical interpretation. Since the wall excludes particles from the region $z<\frac12 a$, there is an empty slab of thickness $\frac12 a$ coating the interior of the wall. There are then no particles in this region to exclude neighbors to their right, hence the density builds up just on the other side of this slab. The effect vanishes to the order of the calculation past $z = \frac32 a$, where $n(z) = n$ returns to its bulk value. Had we calculated to higher order, we'd have found damped oscillations with spatial period $\lambda\sim a$.
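Equation 6.2.62 is simple to tabulate. A sketch (with arbitrary diameter and bulk density) verifying that the profile vanishes inside the slab, is enhanced just outside it, and rejoins the bulk value continuously at $z = \frac32 a$:

```python
import math

a, n = 1.0, 0.2    # sphere diameter and bulk density (arbitrary choices)

def n_z(z):
    """Density profile near a hard wall, to second order in n (Eq. 6.2.62)."""
    if z < a / 2:
        return 0.0
    if z > 3 * a / 2:
        return n
    u = z / a - 0.5
    return n + (1 - 1.5 * u + 0.5 * u**3) * (2 * math.pi / 3) * a**3 * n**2

# zero inside the excluded slab, continuous at z = 3a/2, enhanced in between
assert n_z(0.25 * a) == 0.0
assert abs(n_z(1.5 * a) - n_z(1.5 * a + 1e-9)) < 1e-6
assert n_z(a / 2 + 1e-9) > n          # contact enhancement just past the slab
print(n_z(a / 2 + 1e-9), n)
```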

This page titled 6.2: Nonideal Classical Gases is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

6.3: Lee-Yang Theory
Analytic Properties of the Partition Function
How can statistical mechanics describe phase transitions? This question was addressed in some beautiful mathematical analysis by
Lee and Yang7. Consider the grand partition function Ξ,

$$\Xi(T,V,z) = \sum_{N=0}^{\infty} z^N\, Q_N(T,V)\,\lambda_T^{-dN}, \tag{6.3.1}$$

where
$$Q_N(T,V) = \frac{1}{N!}\int\! d^dx_1 \cdots \int\! d^dx_N\; e^{-U(x_1,\dots,x_N)/k_BT} \tag{6.3.2}$$

is the contribution to the N -particle partition function from the potential energy U (assuming no momentum-dependent potentials).
For two-body central potentials, we have

$$U(x_1,\dots,x_N) = \sum_{i<j} v\big(|x_i - x_j|\big). \tag{6.3.3}$$

Suppose further that these classical particles have hard cores. Then for any finite volume, there must be some maximum number N_V such that Q_N(T, V) vanishes for N > N_V. This is because if N > N_V at least two spheres must overlap, in which case the potential energy is infinite. The theoretical maximum packing density for hard spheres is achieved for a hexagonal close packed (HCP) lattice⁸, for which f_HCP = π/(3√2) = 0.74048. If the spheres have radius r₀, then N_V = V/(4√2 r₀³) is the maximum particle number.

Thus, if V itself is finite, then Ξ(T, V, z) is a finite degree polynomial in z, and may be factorized as

$$\Xi(T,V,z) = \sum_{N=0}^{N_V} z^N\, Q_N(T,V)\,\lambda_T^{-dN} = \prod_{k=1}^{N_V}\Big(1 - \frac{z}{z_k}\Big), \tag{6.3.4}$$

where z_k(T, V) is one of the N_V zeros of the grand partition function. Note that the O(z⁰) term is fixed to be unity. Note also that since the configuration integrals Q_N(T, V) are all positive, Ξ(z) is an increasing function along the positive real z axis. In addition, since the coefficients of z^N in the polynomial Ξ(z) are all real, one has Ξ(z̄) = \overline{Ξ(z)}, so Ξ(z) = 0 implies Ξ(z̄) = 0 as well; the zeros of Ξ(z) are therefore either real and negative or else come in complex conjugate pairs.

Figure 6.3.1 : In the thermodynamic limit, the grand partition function can develop a singularity at positive real fugacity z . The set
of discrete zeros fuses into a branch cut.
For finite N_V, the situation is roughly as depicted in the left panel of Figure 6.3.1, with a set of N_V zeros arranged in complex conjugate pairs (or negative real values). The zeros aren't necessarily distributed along a circle as shown in the figure, though.

6.3.1 https://phys.libretexts.org/@go/page/18578
They could be anywhere, so long as they are symmetrically distributed about the Re(z) axis, and no zeros occur for z real and
nonnegative.
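This zero structure can be seen concretely by factoring the toy grand partition function Ξ(z) = (1+z)^M (1−z^M)/(1−z) studied in the Example below. A short numerical sketch (using numpy's polynomial root finder, with the illustrative choice M = 6):

```python
import numpy as np
from numpy.polynomial import polynomial as P

M = 6
xi = [1.0]
for _ in range(M):
    xi = P.polymul(xi, [1.0, 1.0])      # build the factor (1+z)^M
xi = P.polymul(xi, [1.0] * M)           # times 1 + z + ... + z^{M-1}
roots = P.polyroots(xi)                 # the N_V = 2M-1 zeros of Ξ(z)

# no zeros on the positive real axis (where physical fugacities live)
assert not any(abs(z.imag) < 1e-8 and z.real > 0 for z in roots)
# the set of zeros is closed under complex conjugation
for z in roots:
    assert np.min(np.abs(roots - np.conj(z))) < 1e-8
```

Here the zeros sit at z = −1 (with high multiplicity) and on the unit circle at the M-th roots of unity other than z = 1, illustrating how, as M → ∞, they accumulate toward the positive real axis at z = 1 without ever touching it at finite M.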
Lee and Yang proved the existence of the limits
$$\begin{aligned} \frac{p}{k_BT} &= \lim_{V\to\infty} \frac{1}{V}\,\ln\Xi(T,V,z)\\ n &= \lim_{V\to\infty} z\,\frac{\partial}{\partial z}\bigg[\frac{1}{V}\,\ln\Xi(T,V,z)\bigg], \end{aligned}$$

and notably the result

$$n = z\,\frac{\partial}{\partial z}\bigg(\frac{p}{k_BT}\bigg), \tag{6.3.5}$$

which amounts to the commutativity of the thermodynamic limit V → ∞ with the differential operator z ∂/∂z. In particular, p(T, z) is a smooth function of z in regions free of roots. If the roots do coalesce and pinch the positive real axis, then the density n can be discontinuous, as in a first order phase transition, or a higher derivative ∂ʲp/∂nʲ can be discontinuous or divergent, as in a second order phase transition.

Electrostatic Analogy
There is a beautiful analogy to the theory of two-dimensional electrostatics. We write
$$\begin{aligned} \frac{p}{k_BT} &= \frac{1}{V}\sum_{k=1}^{N_V} \ln\Big(1 - \frac{z}{z_k}\Big)\\ &= -\sum_{k=1}^{N_V}\Big[\phi(z - z_k) - \phi(0 - z_k)\Big], \end{aligned}$$

where

$$\phi(z) = -\frac{1}{V}\,\ln(z) \tag{6.3.6}$$

is the complex potential due to a line charge of linear density λ = V⁻¹ located at the origin. The number density is then

$$n = z\,\frac{\partial}{\partial z}\bigg(\frac{p}{k_BT}\bigg) = -z\,\frac{\partial}{\partial z}\sum_{k=1}^{N_V} \phi(z - z_k), \tag{6.3.7}$$

to be evaluated for physical values of z, z ∈ ℝ₊. Since ϕ(z) is analytic,

$$\frac{\partial\phi}{\partial\bar z} = \frac12\,\frac{\partial\phi}{\partial x} + \frac{i}{2}\,\frac{\partial\phi}{\partial y} = 0. \tag{6.3.8}$$

If we decompose the complex potential ϕ = ϕ₁ + iϕ₂ into real and imaginary parts, the condition of analyticity is recast as the Cauchy-Riemann equations,

$$\frac{\partial\phi_1}{\partial x} = \frac{\partial\phi_2}{\partial y}\qquad,\qquad \frac{\partial\phi_1}{\partial y} = -\frac{\partial\phi_2}{\partial x}. \tag{6.3.9}$$

Thus,

$$\begin{aligned} -\frac{\partial\phi}{\partial z} &= -\frac12\,\frac{\partial\phi}{\partial x} + \frac{i}{2}\,\frac{\partial\phi}{\partial y}\\ &= -\frac12\left(\frac{\partial\phi_1}{\partial x} + \frac{\partial\phi_2}{\partial y}\right) + \frac{i}{2}\left(\frac{\partial\phi_1}{\partial y} - \frac{\partial\phi_2}{\partial x}\right)\\ &= -\frac{\partial\phi_1}{\partial x} + i\,\frac{\partial\phi_1}{\partial y} = E_x - iE_y\ , \end{aligned}$$

where E = −∇ϕ₁ is the electric field. Suppose, then, that as V → ∞ a continuous charge distribution develops, which crosses the positive real z axis at a point x ∈ ℝ₊. Then

$$\frac{n_+ - n_-}{x} = E_x(x^+) - E_x(x^-) = 4\pi\sigma(x), \tag{6.3.10}$$

where σ is the linear charge density (assuming logarithmic two-dimensional potentials), or the two-dimensional charge density (if we extend the distribution along a third axis).

Example
As an example, consider the function
$$\begin{aligned} \Xi(z) &= \frac{(1+z)^M\,(1-z^M)}{1-z}\\ &= (1+z)^M\,\big(1 + z + z^2 + \dots + z^{M-1}\big). \end{aligned}$$

This polynomial, of degree 2M − 1, has an M-th order zero at z = −1 and (M−1) simple zeros at z = e^{2πik/M}, where k ∈ {1, …, M−1}. Since M serves as the maximum particle number N_V, we may assume that V = Mv₀, and the V → ∞ limit may be taken as M → ∞. We then have


$$\begin{aligned} \frac{p}{k_BT} &= \lim_{V\to\infty}\frac{1}{V}\,\ln\Xi(z) = \frac{1}{v_0}\lim_{M\to\infty}\frac{1}{M}\,\ln\Xi(z)\\ &= \frac{1}{v_0}\lim_{M\to\infty}\frac{1}{M}\Big[M\ln(1+z) + \ln\big(1-z^M\big) - \ln(1-z)\Big]. \end{aligned}$$

The limit depends on whether |z| > 1 or |z| < 1, and we obtain

$$\frac{p\,v_0}{k_BT} = \begin{cases} \ln(1+z) & \text{if } |z| < 1\\[4pt] \ln(1+z) + \ln z & \text{if } |z| > 1. \end{cases} \tag{6.3.11}$$

Figure 6.3.2 : Fugacity z and pv₀/k_BT versus dimensionless specific volume v/v₀ for the example problem discussed in the text.

Thus,
$$n = z\,\frac{\partial}{\partial z}\bigg(\frac{p}{k_BT}\bigg) = \begin{cases} \dfrac{1}{v_0}\cdot\dfrac{z}{1+z} & \text{if } |z| < 1\\[10pt] \dfrac{1}{v_0}\cdot\bigg[\dfrac{z}{1+z} + 1\bigg] & \text{if } |z| > 1. \end{cases} \tag{6.3.12}$$

If we solve for z(v), where v = n⁻¹, we find

$$z = \begin{cases} \dfrac{v_0}{v - v_0} & \text{if } v > 2v_0\\[10pt] \dfrac{v_0 - v}{2v - v_0} & \text{if } \tfrac12 v_0 < v < \tfrac23 v_0. \end{cases} \tag{6.3.13}$$

We then obtain the equation of state,

$$\frac{p\,v_0}{k_BT} = \begin{cases} \ln\!\Big(\dfrac{v}{v - v_0}\Big) & \text{if } v > 2v_0\\[10pt] \ln 2 & \text{if } \tfrac23 v_0 < v < 2v_0\\[10pt] \ln\!\bigg(\dfrac{v\,(v_0 - v)}{(2v - v_0)^2}\bigg) & \text{if } \tfrac12 v_0 < v < \tfrac23 v_0. \end{cases} \tag{6.3.14}$$
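The two limiting branches of the pressure in this example can be confirmed at large but finite M. A minimal numerical sketch (M = 4000 is an arbitrary large value; the ln|1 − z^M| factor is evaluated in log space to avoid overflow at z > 1):

```python
import numpy as np

def ln_Xi_over_M(z, M):
    """(1/M) ln Ξ(z) for Ξ(z) = (1+z)^M (1-z^M)/(1-z), z real, z > 0, z != 1."""
    if z < 1:
        log_mid = np.log1p(-z**M)                   # ln(1 - z^M), z^M -> 0
    else:
        log_mid = M * np.log(z) + np.log1p(-z**(-M))  # ln(z^M - 1) in log space
    return np.log(1 + z) + (log_mid - np.log(abs(1 - z))) / M

M = 4000
# |z| < 1 branch:  p v0 / kB T -> ln(1+z)
assert abs(ln_Xi_over_M(0.5, M) - np.log(1.5)) < 1e-3
# |z| > 1 branch:  p v0 / kB T -> ln(1+z) + ln z
assert abs(ln_Xi_over_M(2.0, M) - (np.log(3.0) + np.log(2.0))) < 1e-3
```

The two branches meet at z = 1, where the circle of zeros pinches the positive real axis in the M → ∞ limit; this is the first order transition visible as the flat ln 2 segment of Equation 6.3.14.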

This page titled 6.3: Lee-Yang Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

6.4: Liquid State Physics
The Many-Particle Distribution Function
The virial expansion is typically applied to low-density systems. When the density is high, when na³ ∼ 1, where a is a typical molecular or atomic length scale, the virial expansion is impractical. There are too many terms to compute, and to make progress one must use sophisticated resummation techniques to investigate the high density regime.
To elucidate the physics of liquids, it is useful to consider the properties of various correlation functions. These objects are derived from the general N -body
Boltzmann distribution,
$$f(x_1,\dots,x_N\,;\,p_1,\dots,p_N) = \begin{cases} Z_N^{-1}\cdot\dfrac{1}{N!}\,e^{-\beta\hat H_N(p,x)} & \text{OCE}\\[10pt] \Xi^{-1}\cdot\dfrac{1}{N!}\,e^{\beta\mu N}\,e^{-\beta\hat H_N(p,x)} & \text{GCE}. \end{cases} \tag{6.4.1}$$

We assume a Hamiltonian of the form


$$\hat H_N = \sum_{i=1}^{N}\frac{p_i^2}{2m} + W(x_1,\dots,x_N). \tag{6.4.2}$$

The quantity

$$f(x_1,\dots,x_N\,;\,p_1,\dots,p_N)\;\frac{d^dx_1\,d^dp_1}{h^d}\cdots\frac{d^dx_N\,d^dp_N}{h^d} \tag{6.4.3}$$

is the probability of finding N particles in the system, with particle #1 lying within d^dx₁ of x₁ and having momentum within d^dp₁ of p₁, etc. If we compute averages of quantities which only depend on the positions {x_j} and not on the momenta {p_j}, then we may integrate out the momenta to obtain, in the OCE,

$$P(x_1,\dots,x_N) = Q_N^{-1}\cdot\frac{1}{N!}\,e^{-\beta W(x_1,\dots,x_N)}, \tag{6.4.4}$$

where W is the total potential energy,

$$W(x_1,\dots,x_N) = \sum_i v(x_i) + \sum_{i<j} u(x_i - x_j) + \sum_{i<j<k} w(x_i - x_j\,,\,x_j - x_k) + \dots, \tag{6.4.5}$$

and Q_N is the configuration integral,

$$Q_N(T,V) = \frac{1}{N!}\int\! d^dx_1\cdots\int\! d^dx_N\; e^{-\beta W(x_1,\dots,x_N)}. \tag{6.4.6}$$

We will, for the most part, consider only two-body central potentials as contributing to W, which is to say we will only retain the middle term on the RHS. Note that P(x₁, …, x_N) is invariant under any permutation of the particle labels.

Averages over the Distribution


To compute an average, one integrates over the distribution:

$$\big\langle F(x_1,\dots,x_N)\big\rangle = \int\! d^dx_1\cdots\int\! d^dx_N\; P(x_1,\dots,x_N)\,F(x_1,\dots,x_N). \tag{6.4.7}$$

The overall N-particle probability density is normalized according to

$$\int\! d^dx_1\cdots\int\! d^dx_N\; P(x_1,\dots,x_N) = 1. \tag{6.4.8}$$

The average local density is

$$\begin{aligned} n_1(r) &= \Big\langle\sum_i \delta(r - x_i)\Big\rangle\\ &= N\int\! d^dx_2\cdots\int\! d^dx_N\; P(r, x_2,\dots,x_N). \end{aligned}$$

Note that the local density obeys the sum rule

$$\int\! d^dr\; n_1(r) = N. \tag{6.4.9}$$

In a translationally invariant system, n₁ = n = N/V is a constant independent of position. The boundaries of a system will in general break translational invariance, so in order to maintain the notion of a translationally invariant system of finite total volume, one must impose periodic boundary conditions.

The two-particle density matrix n₂(r₁, r₂) is defined by

$$\begin{aligned} n_2(r_1, r_2) &= \Big\langle\sum_{i\ne j}\delta(r_1 - x_i)\,\delta(r_2 - x_j)\Big\rangle\\ &= N(N-1)\int\! d^dx_3\cdots\int\! d^dx_N\; P(r_1, r_2, x_3,\dots,x_N). \end{aligned}$$

6.4.1 https://phys.libretexts.org/@go/page/18579
As in the case of the local density n₁(r), the two-particle density matrix satisfies a sum rule:

$$\int\! d^dr_1\int\! d^dr_2\; n_2(r_1, r_2) = N(N-1). \tag{6.4.10}$$

Generalizing further, one defines the k-particle density matrix as

$$\begin{aligned} n_k(r_1,\dots,r_k) &= \Big\langle{\sum_{i_1\cdots i_k}}'\;\delta(r_1 - x_{i_1})\cdots\delta(r_k - x_{i_k})\Big\rangle\\ &= \frac{N!}{(N-k)!}\int\! d^dx_{k+1}\cdots\int\! d^dx_N\; P(r_1,\dots,r_k, x_{k+1},\dots,x_N), \end{aligned}$$

where the prime on the sum indicates that all the indices i₁, …, i_k are distinct. The corresponding sum rule is then

$$\int\! d^dr_1\cdots\int\! d^dr_k\; n_k(r_1,\dots,r_k) = \frac{N!}{(N-k)!}. \tag{6.4.11}$$

The average potential energy can be expressed in terms of the distribution functions. Assuming only two-body interactions, we have

$$\begin{aligned} \langle W\rangle &= \Big\langle\sum_{i<j} u(x_i - x_j)\Big\rangle\\ &= \frac12\int\! d^dr_1\int\! d^dr_2\; u(r_1 - r_2)\,\Big\langle\sum_{i\ne j}\delta(r_1 - x_i)\,\delta(r_2 - x_j)\Big\rangle\\ &= \frac12\int\! d^dr_1\int\! d^dr_2\; u(r_1 - r_2)\, n_2(r_1, r_2). \end{aligned}$$

As the separations r_{ij} = |r_i − r_j| get large, we expect the correlations to vanish, in which case

$$\begin{aligned} n_k(r_1,\dots,r_k) &= \Big\langle{\sum_{i_1\cdots i_k}}'\;\delta(r_1 - x_{i_1})\cdots\delta(r_k - x_{i_k})\Big\rangle\\ &\xrightarrow{\;r_{ij}\to\infty\;} {\sum_{i_1\cdots i_k}}'\;\big\langle\delta(r_1 - x_{i_1})\big\rangle\cdots\big\langle\delta(r_k - x_{i_k})\big\rangle\\ &= \frac{N!}{(N-k)!}\cdot\frac{1}{N^k}\; n_1(r_1)\cdots n_1(r_k)\\ &= \Big(1 - \frac{1}{N}\Big)\Big(1 - \frac{2}{N}\Big)\cdots\Big(1 - \frac{k-1}{N}\Big)\; n_1(r_1)\cdots n_1(r_k). \end{aligned}$$

The k-particle distribution function is defined as the ratio

$$g_k(r_1,\dots,r_k) \equiv \frac{n_k(r_1,\dots,r_k)}{n_1(r_1)\cdots n_1(r_k)}. \tag{6.4.12}$$

For large separations, then,

$$g_k(r_1,\dots,r_k)\;\xrightarrow{\;r_{ij}\to\infty\;}\;\prod_{j=1}^{k-1}\Big(1 - \frac{j}{N}\Big). \tag{6.4.13}$$

For isotropic systems, the two-particle distribution function g₂(r₁, r₂) depends only on the magnitude |r₁ − r₂|. As a function of this scalar separation, the function is known as the radial distribution function:

$$\begin{aligned} g(r) \equiv g_2(r) &= \frac{1}{n^2}\,\Big\langle\sum_{i\ne j}\delta(r - x_i)\,\delta(x_j)\Big\rangle\\ &= \frac{1}{Vn^2}\,\Big\langle\sum_{i\ne j}\delta(r - x_i + x_j)\Big\rangle. \end{aligned}$$

The radial distribution function is of great importance in the physics of liquids because
- thermodynamic properties of the system can be related to g(r)
- g(r) is directly measurable by scattering experiments

For example, in an isotropic system the average potential energy is given by


$$\begin{aligned} \langle W\rangle &= \frac12\int\! d^dr_1\int\! d^dr_2\; u(r_1 - r_2)\, n_2(r_1, r_2)\\ &= \frac12\, n^2\int\! d^dr_1\int\! d^dr_2\; u(r_1 - r_2)\, g(|r_1 - r_2|)\\ &= \frac{N^2}{2V}\int\! d^dr\; u(r)\, g(r). \end{aligned}$$

For a three-dimensional system, the average internal (potential) energy per particle is

$$\frac{\langle W\rangle}{N} = 2\pi n\int_0^\infty\! dr\, r^2\, g(r)\, u(r). \tag{6.4.14}$$

Intuitively, f(r) dr ≡ 4πr² n g(r) dr is the average number of particles lying at a radial distance between r and r + dr from a given reference particle. The total potential energy of interaction with the reference particle is then f(r) u(r) dr. Now integrate over all r and divide by two to avoid double-counting. This recovers Equation 6.4.14.
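As a numerical sanity check on Equation 6.4.14, the integral can be evaluated directly. The sketch below uses the uncorrelated toy case g(r) = 1 with a Gaussian pair potential u(r) = u₀ e^{−r²/σ²} (an arbitrary illustration, not a model of any real liquid), for which the energy per particle has the closed form 2πn u₀ (√π/4)σ³:

```python
import numpy as np

def trap(y, x):
    """Simple trapezoidal quadrature on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def w_per_particle(r, g, u, n):
    """<W>/N = 2π n ∫ dr r² g(r) u(r)   (Equation 6.4.14)."""
    return 2 * np.pi * n * trap(r**2 * g * u, r)

n, u0, sigma = 0.3, 1.7, 0.9            # illustrative values
r = np.linspace(0.0, 12.0, 200001)
g = np.ones_like(r)                     # no correlations: g(r) = 1
u = u0 * np.exp(-(r / sigma)**2)

exact = 2 * np.pi * n * u0 * (np.sqrt(np.pi) / 4) * sigma**3
assert abs(w_per_particle(r, g, u, n) - exact) < 1e-6 * abs(exact)
```

For a realistic liquid one would supply tabulated g(r) data (e.g. from simulation or scattering) on the same grid.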

Figure 6.4.1 : Pair distribution functions for hard spheres of diameter a at filling fraction η = (π/6)a³n = 0.49 (left) and for liquid Argon at T = 85 K (right). Molecular dynamics data for hard spheres (points) is compared with the result of the Percus-Yevick approximation (see below in §5.8). Reproduced (without permission) from J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, fig 5.5. Experimental data on liquid argon are from the neutron scattering work of J. L. Yarnell et al., Phys. Rev. A 7, 2130 (1973). The data (points) are compared with molecular dynamics calculations by Verlet (1967) for a Lennard-Jones fluid.
In the OCE, g(r) obeys the sum rule

$$\int\! d^dr\; g(r) = \frac{V}{N^2}\cdot N(N-1) = V - \frac{V}{N}, \tag{6.4.15}$$

hence

$$n\int\! d^dr\; \big[g(r) - 1\big] = -1 \qquad\text{(OCE)}. \tag{6.4.16}$$

The function h(r) ≡ g(r) − 1 is called the pair correlation function.


In the grand canonical formulation, we have

$$\begin{aligned} n\int\! d^3r\; h(r) &= \frac{\langle N\rangle}{V}\cdot\bigg[\frac{\langle N(N-1)\rangle}{\langle N\rangle^2}\,V - V\bigg]\\ &= \frac{\langle N^2\rangle - \langle N\rangle^2}{\langle N\rangle} - 1\\ &= n\,k_BT\,\kappa_T - 1 \qquad\text{(GCE)}, \end{aligned}$$

where κ_T is the isothermal compressibility. Note that in an ideal gas we have h(r) = 0 and κ_T = κ_T⁰ ≡ 1/nk_BT. Self-condensed systems, such as liquids and solids far from criticality, are nearly incompressible, hence 0 < nk_BT κ_T ≪ 1, and therefore n∫d³r h(r) ≈ −1. For incompressible systems, where κ_T = 0, this becomes an equality.

As we shall see below in §5.4, the function h(r), or rather its Fourier transform ĥ(k), is directly measured in a scattering experiment. The question then arises as to which result applies: the OCE result from Equation 6.4.16 or the GCE result just derived. The answer is that under almost all experimental conditions it is the GCE result which applies. The reason for this is that the scattering experiment typically illuminates only a subset of the entire system. This subsystem is in particle equilibrium with the remainder of the system, hence it is appropriate to use the grand canonical ensemble. The OCE results would only apply if the scattering experiment were to measure the entire system.

Figure 6.4.2 : Monte Carlo pair distribution functions for liquid water. From A. K. Soper, Chem Phys. 202, 295 (1996).

Virial Equation of State


The virial of a mechanical system is defined to be

$$G = \sum_i x_i\cdot F_i, \tag{6.4.17}$$

where F_i is the total force acting on particle i. If we average G over time, we obtain

$$\begin{aligned} \langle G\rangle &= \lim_{T\to\infty}\frac{1}{T}\int_0^T\!\! dt\,\sum_i x_i\cdot F_i\\ &= -\lim_{T\to\infty}\frac{1}{T}\int_0^T\!\! dt\,\sum_i m\,\dot x_i^2\\ &= -3Nk_BT. \end{aligned}$$

Here, we have made use of

$$x_i\cdot F_i = m\, x_i\cdot\ddot x_i = -m\,\dot x_i^2 + \frac{d}{dt}\big(m\, x_i\cdot\dot x_i\big), \tag{6.4.18}$$

as well as ergodicity and equipartition of kinetic energy. We have also assumed three space dimensions. In a bounded system, there are two contributions to the force F_i. One contribution is from the surfaces which enclose the system. This is given by⁹

$$\langle G\rangle_{\rm surfaces} = \Big\langle\sum_i x_i\cdot F_i^{\rm (surf)}\Big\rangle = -3pV. \tag{6.4.19}$$

The remaining contribution is due to the interparticle forces. Thus,

$$\frac{p}{k_BT} = \frac{N}{V} - \frac{1}{3Vk_BT}\,\Big\langle\sum_i x_i\cdot\nabla_{\!i}\, W\Big\rangle. \tag{6.4.20}$$

Invoking the definition of g(r) , we have




$$p = nk_BT\left\{1 - \frac{2\pi n}{3\,k_BT}\int_0^\infty\! dr\, r^3\, g(r)\, u'(r)\right\}. \tag{6.4.21}$$

As an alternate derivation, consider the First Law of Thermodynamics,


dΩ = −S dT − p dV − N dμ , (6.4.22)

from which we derive


$$p = -\left(\frac{\partial\Omega}{\partial V}\right)_{\!T,\mu} = -\left(\frac{\partial F}{\partial V}\right)_{\!T,N}. \tag{6.4.23}$$

Now let V → ℓ³V, where ℓ is a scale parameter. Then

$$p = -\frac{\partial\Omega}{\partial V} = -\frac{1}{3V}\,\frac{\partial}{\partial\ell}\bigg|_{\ell=1}\,\Omega(T, \ell^3 V, \mu). \tag{6.4.24}$$

Now

$$\begin{aligned} \Xi(T,\ell^3V,\mu) &= \sum_{N=0}^{\infty}\frac{1}{N!}\, e^{\beta\mu N}\,\lambda_T^{-3N}\int_{\ell^3V}\!\! d^3x_1\cdots\!\int_{\ell^3V}\!\! d^3x_N\; e^{-\beta W(x_1,\dots,x_N)}\\ &= \sum_{N=0}^{\infty}\frac{1}{N!}\,\big(e^{\beta\mu}\lambda_T^{-3}\big)^N\,\ell^{3N}\!\int_{V}\! d^3x_1\cdots\!\int_{V}\! d^3x_N\; e^{-\beta W(\ell x_1,\dots,\ell x_N)}. \end{aligned}$$

Thus,

$$\begin{aligned} p = -\frac{1}{3V}\,\frac{\partial\Omega(\ell^3V)}{\partial\ell}\bigg|_{\ell=1} &= \frac{k_BT}{3V}\,\frac{1}{\Xi}\,\frac{\partial\Xi(\ell^3V)}{\partial\ell}\bigg|_{\ell=1}\\ &= \frac{k_BT}{3V}\,\frac{1}{\Xi}\sum_{N=0}^{\infty}\frac{1}{N!}\big(z\lambda_T^{-3}\big)^N \left\{\int_{V}\! d^3x_1\cdots\!\int_{V}\! d^3x_N\; e^{-\beta W(x_1,\dots,x_N)}\bigg[3N - \beta\sum_i x_i\cdot\frac{\partial W}{\partial x_i}\bigg]\right\}\\ &= nk_BT - \frac{1}{3V}\left\langle\frac{\partial W}{\partial\ell}\right\rangle_{\!\ell=1}. \end{aligned}$$

Finally, from W = Σ_{i<j} u(ℓ x_{ij}) we have

$$\begin{aligned} \left\langle\frac{\partial W}{\partial\ell}\right\rangle_{\!\ell=1} &= \Big\langle\sum_{i<j} x_{ij}\cdot\nabla u(x_{ij})\Big\rangle\\ &= \frac{2\pi N^2}{V}\int_0^\infty\! dr\, r^3\, g(r)\, u'(r), \end{aligned}$$

and hence

$$p = nk_BT - \frac{2}{3}\,\pi n^2\!\int_0^\infty\! dr\, r^3\, g(r)\, u'(r). \tag{6.4.25}$$

Note that the density n enters the equation of state explicitly on the RHS of the above equation, but also implicitly through the pair distribution function g(r) ,
which has implicit dependence on both n and T .
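One consistency check: at low density, where g(r) ≈ e^{−βu(r)}, the virial equation of state (Equation 6.4.25) must reduce to p = nk_BT(1 + B₂n) with B₂ = −2π∫dr r² f(r), f(r) = e^{−βu(r)} − 1, as follows by an integration by parts. The sketch below verifies this numerically for an arbitrary smooth repulsive Gaussian potential (all parameter values are illustrative; units with k_BT = 1):

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

beta, n = 1.0, 0.01
r = np.linspace(1e-6, 10.0, 400001)
u = 4.0 * np.exp(-r**2)              # arbitrary smooth repulsive potential
up = -8.0 * r * np.exp(-r**2)        # its derivative u'(r)
g = np.exp(-beta * u)                # low-density approximation g ~ e^{-beta u}

# virial route (Equation 6.4.25, kB T = 1)
p_virial = n / beta - (2 * np.pi / 3) * n**2 * trap(r**3 * g * up, r)
# second-virial route
B2 = -2 * np.pi * trap(r**2 * (np.exp(-beta * u) - 1), r)
p_b2 = (n / beta) * (1 + B2 * n)

assert abs(p_virial - p_b2) < 1e-6 * p_b2
```

At higher densities the two expressions differ, because g(r) acquires its own density dependence.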

Correlations and Scattering


Consider the scattering of a light or particle beam (photons or neutrons) from a liquid. We label the states of the beam particles by their wavevector k and we assume a general dispersion ε_k. For photons, ε_k = ℏc|k|, while for neutrons ε_k = ℏ²k²/2m_n. We assume a single scattering process with the liquid, during which the total momentum and energy of the liquid plus beam are conserved. We write

$$\begin{aligned} k' &= k + q\\ \varepsilon_{k'} &= \varepsilon_k + \hbar\omega, \end{aligned}$$

where k′ is the final state of the scattered beam particle. Thus, the fluid transfers momentum Δp = ℏq and energy ℏω to the beam.

Figure 6.4.3 : In a scattering experiment, a beam of particles interacts with a sample and the beam particles scatter off the sample particles. A momentum ℏq and
energy ℏω are transferred to the beam particle during such a collision. If ω = 0 , the scattering is said to be elastic. For ω ≠ 0 , the scattering is inelastic.
Now consider the scattering process between an initial state |i, k⟩ and a final state |j, k′⟩, where these states describe both the beam and the liquid. According to Fermi's Golden Rule, the scattering rate is

$$\Gamma_{ik\to jk'} = \frac{2\pi}{\hbar}\,\big|\langle\, j,k'\,|\,\mathcal{V}\,|\, i,k\,\rangle\big|^2\,\delta(E_j - E_i + \hbar\omega), \tag{6.4.26}$$

where 𝒱 is the scattering potential and E_i is the initial internal energy of the liquid. If r is the position of the beam particle and {x_l} are the positions of the liquid particles, then

$$\mathcal{V}(r) = \sum_{l=1}^{N} v(r - x_l). \tag{6.4.27}$$

The differential scattering cross section (per unit frequency per unit solid angle) is
$$\frac{\partial^2\sigma}{\partial\Omega\,\partial\omega} = \frac{\hbar}{4\pi}\,\frac{g(\varepsilon_{k'})}{|v_k|}\sum_{i,j} P_i\,\Gamma_{ik\to jk'}, \tag{6.4.28}$$

where

$$g(\varepsilon) = \int\!\frac{d^dk}{(2\pi)^d}\;\delta(\varepsilon - \varepsilon_k) \tag{6.4.29}$$

is the density of states for the beam particle and


$$P_i = \frac{1}{Z}\,e^{-\beta E_i}. \tag{6.4.30}$$

Consider now the matrix element


N
1 ′
′ d i(k−k )⋅r
⟨ j, k ∣
∣V∣
∣ i, k ⟩ = ⟨ j ∣
∣ ∑ ∫ d re v(r − x ) ∣
∣i⟩
l
V
l=1

N
1 −iq⋅x
= ^(q) ⟨ j ∣
v ∣ ∑e
l ∣
∣i⟩ ,
V
l=1

where we have assumed that the incident and scattered beams are plane waves. We then have
$$\begin{aligned} \frac{\partial^2\sigma}{\partial\Omega\,\partial\omega} &= \frac{\hbar\, g(\varepsilon_{k+q})}{2\,|\nabla_{\!k}\varepsilon_k|}\,\frac{|\hat v(q)|^2}{V^2}\sum_i P_i\sum_j\Big|\Big\langle j\,\Big|\sum_{l=1}^{N} e^{-iq\cdot x_l}\Big|\, i\Big\rangle\Big|^2\,\delta(E_j - E_i + \hbar\omega)\\ &= \frac{g(\varepsilon_{k+q})}{4\pi\,|\nabla_{\!k}\varepsilon_k|}\,|\hat v(q)|^2\,\frac{N}{V^2}\,S(q,\omega), \end{aligned}$$

where S(q, ω) is the dynamic structure factor,


$$S(q,\omega) = \frac{2\pi\hbar}{N}\sum_i P_i\sum_j\Big|\Big\langle j\,\Big|\sum_{l=1}^{N} e^{-iq\cdot x_l}\Big|\, i\Big\rangle\Big|^2\,\delta(E_j - E_i + \hbar\omega) \tag{6.4.31}$$

Note that for an arbitrary operator A,

$$\begin{aligned} \sum_j \big|\langle\, j\,|\,A\,|\, i\,\rangle\big|^2\,\delta(E_j - E_i + \hbar\omega) &= \sum_j\frac{1}{2\pi\hbar}\int_{-\infty}^{\infty}\!\! dt\; e^{i(E_j - E_i + \hbar\omega)t/\hbar}\,\langle\, i\,|\,A^\dagger|\, j\,\rangle\langle\, j\,|\,A\,|\, i\,\rangle\\ &= \frac{1}{2\pi\hbar}\int_{-\infty}^{\infty}\!\! dt\; e^{i\omega t}\,\sum_j\langle\, i\,|\,A^\dagger|\, j\,\rangle\langle\, j\,|\, e^{i\hat Ht/\hbar}\,A\, e^{-i\hat Ht/\hbar}|\, i\,\rangle\\ &= \frac{1}{2\pi\hbar}\int_{-\infty}^{\infty}\!\! dt\; e^{i\omega t}\,\langle\, i\,|\,A^\dagger(0)\,A(t)\,|\, i\,\rangle. \end{aligned}$$

Thus,

$$\begin{aligned} S(q,\omega) &= \frac{1}{N}\int_{-\infty}^{\infty}\!\! dt\; e^{i\omega t}\,\sum_i P_i\,\Big\langle i\,\Big|\sum_{l,l'} e^{iq\cdot x_{l'}(0)}\, e^{-iq\cdot x_l(t)}\Big|\, i\Big\rangle\\ &= \frac{1}{N}\int_{-\infty}^{\infty}\!\! dt\; e^{i\omega t}\,\Big\langle\sum_{l,l'} e^{iq\cdot x_{l'}(0)}\, e^{-iq\cdot x_l(t)}\Big\rangle, \end{aligned}$$

where the angular brackets in the last line denote a thermal expectation value of a quantum mechanical operator. If we integrate over all frequencies, we obtain the equal time correlator,

$$\begin{aligned} S(q) &= \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\; S(q,\omega) = \frac{1}{N}\sum_{l,l'}\big\langle e^{iq\cdot(x_l - x_{l'})}\big\rangle\\ &= N\,\delta_{q,0} + 1 + n\int\! d^dr\; e^{-iq\cdot r}\,\big[g(r) - 1\big], \end{aligned}$$

known as the static structure factor¹⁰. Note that S(q = 0) = N, since all the phases e^{iq·(x_i−x_j)} are then unity. As q → ∞, the phases oscillate rapidly with changes in the distances |x_i − x_j|, and average out to zero. However, the 'diagonal' terms in the sum, those with i = j, always contribute a total of 1 to S(q). Therefore in the q → ∞ limit we have S(q → ∞) = 1.
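For an isotropic system the angular integral can be done, giving S(q) = 1 + 4πn∫dr r² h(r) sin(qr)/(qr), which is easy to check numerically. The sketch below uses a toy Gaussian pair correlation h(r) = −e^{−r²} (an arbitrary choice, not taken from any real fluid), whose transform is known in closed form:

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def S_of_q(q, r, h, n):
    """S(q) = 1 + 4π n ∫ dr r² h(r) sin(qr)/(qr) for an isotropic system."""
    kernel = np.sinc(q * r / np.pi)     # np.sinc(x) = sin(πx)/(πx)
    return 1 + 4 * np.pi * n * trap(r**2 * h * kernel, r)

n = 0.2
r = np.linspace(1e-8, 10.0, 200001)
h = -np.exp(-r**2)                      # toy pair correlation function

# Gaussian h has the closed form  S(q) = 1 - n π^{3/2} exp(-q²/4)
for q in (0.0, 1.0, 3.0):
    exact = 1 - n * np.pi**1.5 * np.exp(-q**2 / 4)
    assert abs(S_of_q(q, r, h, n) - exact) < 1e-6

# the q -> ∞ limit is 1, as argued above
assert abs(S_of_q(40.0, r, h, n) - 1.0) < 1e-3
```

Note that S(0) < 1 here, consistent with the compressibility sum rule for a system with net repulsive correlations.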


In general, the detectors used in a scattering experiment are sensitive to the energy of the scattered beam particles, although there is always a finite experimental
resolution, both in q and ω. This means that what is measured is actually something like

$$S_{\rm meas}(q,\omega) = \int\! d^dq'\int\! d\omega'\; F(q - q')\, G(\omega - \omega')\, S(q',\omega'), \tag{6.4.32}$$

where F and G are essentially Gaussian functions of their argument, with width given by the experimental resolution. If one integrates over all frequencies ω, if one simply counts scattered particles as a function of q but without any discrimination of their energies, then one measures the static structure factor S(q). Elastic scattering is determined by S(q, ω = 0), i.e. no energy transfer.

Figure 6.4.4 : Comparison of the static structure factor as determined by neutron scattering work of J. L. Yarnell et al., Phys. Rev. A 7, 2130 (1973) with molecular dynamics calculations by Verlet (1967) for a Lennard-Jones fluid.

Correlation and Response


Suppose an external potential v(x) is also present. Then
$$P(x_1,\dots,x_N) = \frac{1}{Q_N[v]}\cdot\frac{1}{N!}\, e^{-\beta W(x_1,\dots,x_N)}\, e^{-\beta\sum_i v(x_i)}, \tag{6.4.33}$$

where

$$Q_N[v] = \frac{1}{N!}\int\! d^dx_1\cdots\int\! d^dx_N\; e^{-\beta W(x_1,\dots,x_N)}\, e^{-\beta\sum_i v(x_i)}. \tag{6.4.34}$$

The Helmholtz free energy is then


$$F = -\frac{1}{\beta}\,\ln\Big(\lambda_T^{-dN}\, Q_N[v]\Big). \tag{6.4.35}$$

Now consider the functional derivative

$$\frac{\delta F}{\delta v(r)} = -\frac{1}{\beta}\cdot\frac{1}{Q_N}\cdot\frac{\delta Q_N}{\delta v(r)}. \tag{6.4.36}$$
N

Using

$$\sum_i v(x_i) = \int\! d^dr\; v(r)\sum_i\delta(r - x_i), \tag{6.4.37}$$

we find

$$\frac{\delta F}{\delta v(r)} = \int\! d^dx_1\cdots\int\! d^dx_N\; P(x_1,\dots,x_N)\,\sum_i\delta(r - x_i) = n_1(r),$$

which is the local density at r .


Next, consider the response function,
$$\begin{aligned} \chi(r, r') \equiv \frac{\delta n_1(r)}{\delta v(r')} &= \frac{\delta^2\! F[v]}{\delta v(r)\,\delta v(r')}\\ &= \frac{1}{\beta}\cdot\frac{1}{Q_N^2}\,\frac{\delta Q_N}{\delta v(r)}\,\frac{\delta Q_N}{\delta v(r')} - \frac{1}{\beta}\cdot\frac{1}{Q_N}\,\frac{\delta^2 Q_N}{\delta v(r)\,\delta v(r')}\\ &= \beta\, n_1(r)\, n_1(r') - \beta\, n_1(r)\,\delta(r - r') - \beta\, n_2(r, r'). \end{aligned}$$

In an isotropic system, χ(r, r′) = χ(r − r′) is a function of the coordinate separation, and

$$\begin{aligned} -k_BT\,\chi(r - r') &= -n^2 + n\,\delta(r - r') + n^2\, g(|r - r'|)\\ &= n^2\, h(|r - r'|) + n\,\delta(r - r'). \end{aligned}$$

Taking the Fourier transform,
$$-k_BT\,\hat\chi(q) = n + n^2\,\hat h(q) = n\, S(q). \tag{6.4.38}$$

We may also write

$$\frac{\kappa_T}{\kappa_T^0} = 1 + n\,\hat h(0) = -\frac{k_BT}{n}\,\hat\chi(0)\ , \qquad\text{i.e.}\qquad n^2\,\kappa_T = -\hat\chi(0). \tag{6.4.39}$$
What does this all mean? Suppose we have an isotropic system which is subjected to a weak, spatially inhomogeneous potential v(r). We expect that the density n(r) in the presence of the inhomogeneous potential will itself be inhomogeneous. The first corrections to the v = 0 value n = n₀ are linear in v, and given by

$$\begin{aligned} \delta n(r) &= \int\! d^dr'\;\chi(r, r')\, v(r')\\ &= -\beta\, n_0\, v(r) - \beta\, n_0^2\int\! d^dr'\; h(|r - r'|)\, v(r'). \end{aligned}$$

Note that if v(r) > 0 it becomes energetically more costly for a particle to be at r. Accordingly, the density response is negative, and proportional to the ratio v(r)/k_BT – this is the first term in the above equation. If there were no correlations between the particles, then h = 0 and this would be the entire story. However, the particles in general are correlated. Consider, for example, the case of hard spheres of diameter a, and let there be a repulsive potential at r = 0. This means that it is less likely for a particle to be centered anywhere within a distance a of the origin. But then it will be more likely to find a particle in the next 'shell' of radial thickness a.

BBGKY Hierarchy
The distribution functions satisfy a hierarchy of integro-differential equations known as the BBGKY hierarchy11. In homogeneous systems, we have
$$g_k(r_1,\dots,r_k) = \frac{N!}{(N-k)!}\,\frac{1}{n^k}\int\! d^dx_{k+1}\cdots\int\! d^dx_N\; P(r_1,\dots,r_k, x_{k+1},\dots,x_N), \tag{6.4.40}$$

where

$$P(x_1,\dots,x_N) = \frac{1}{Q_N}\cdot\frac{1}{N!}\, e^{-\beta W(x_1,\dots,x_N)}. \tag{6.4.41}$$

Taking the gradient with respect to r₁, we have

$$\frac{\partial}{\partial r_1}\, g_k(r_1,\dots,r_k) = \frac{1}{Q_N}\cdot\frac{n^{-k}}{(N-k)!}\int\! d^dx_{k+1}\cdots\int\! d^dx_N\; e^{-\beta\sum_{k<i<j} u(x_{ij})}\;\times\;\frac{\partial}{\partial r_1}\Big[e^{-\beta\sum_{i<j\le k} u(r_{ij})}\cdot e^{-\beta\sum_{i\le k<j} u(r_i - x_j)}\Big],$$

where Σ_{k<i<j} means to sum on indices i and j such that i < j and k < i, i.e.

$$\begin{aligned} \sum_{k<i<j} u(x_{ij}) &\equiv \sum_{i=k+1}^{N-1}\sum_{j=i+1}^{N} u(x_i - x_j)\\ \sum_{i<j\le k} u(r_{ij}) &\equiv \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} u(r_i - r_j)\\ \sum_{i\le k<j} u(r_i - x_j) &= \sum_{i=1}^{k}\sum_{j=k+1}^{N} u(r_i - x_j). \end{aligned}$$

Now

$$\frac{\partial}{\partial r_1}\Big[e^{-\beta\sum_{i<j\le k} u(r_{ij})}\cdot e^{-\beta\sum_{i\le k<j} u(r_i - x_j)}\Big] = -\beta\left\{\sum_{1<j\le k}\frac{\partial u(r_1 - r_j)}{\partial r_1} + \sum_{k<j}\frac{\partial u(r_1 - x_j)}{\partial r_1}\right\}\cdot\Big[e^{-\beta\sum_{i<j\le k} u(r_{ij})}\cdot e^{-\beta\sum_{i\le k<j} u(r_i - x_j)}\Big],$$

hence

$$\begin{aligned} \frac{\partial}{\partial r_1}\, g_k(r_1,\dots,r_k) &= -\beta\sum_{j=2}^{k}\frac{\partial u(r_1 - r_j)}{\partial r_1}\; g_k(r_1,\dots,r_k)\\ &\quad - \beta\,(N-k)\,\frac{N!\, n^{-k}}{(N-k)!}\int\! d^dx_{k+1}\cdots\int\! d^dx_N\;\frac{\partial u(r_1 - x_{k+1})}{\partial r_1}\; P(r_1,\dots,r_k, x_{k+1},\dots,x_N)\\ &= -\beta\sum_{j=2}^{k}\frac{\partial u(r_1 - r_j)}{\partial r_1}\; g_k(r_1,\dots,r_k) - \beta\, n\int\! d^dx_{k+1}\;\frac{\partial u(r_1 - x_{k+1})}{\partial r_1}\; g_{k+1}(r_1,\dots,r_k, x_{k+1}). \end{aligned}$$

Thus, we obtain the BBGKY hierarchy:

$$-k_BT\,\frac{\partial}{\partial r_1}\, g_k(r_1,\dots,r_k) = \sum_{j=2}^{k}\frac{\partial u(r_1 - r_j)}{\partial r_1}\; g_k(r_1,\dots,r_k) + n\int\! d^dr'\;\frac{\partial u(r_1 - r')}{\partial r_1}\; g_{k+1}(r_1,\dots,r_k, r').$$

The BBGKY hierarchy is an infinite tower of coupled integro-differential equations, relating g_k to g_{k+1} for all k. If we approximate g_k at some level k in terms of equal or lower order distributions, then we obtain a closed set of equations which in principle can be solved, at least numerically. For example, the Kirkwood approximation closes the hierarchy at order k = 2 by imposing the condition

$$g_3(r_1, r_2, r_3) \equiv g(r_1 - r_2)\, g(r_1 - r_3)\, g(r_2 - r_3). \tag{6.4.42}$$

This results in the single integro-differential equation

$$-k_BT\,\nabla g(r) = g(r)\,\nabla u(r) + n\int\! d^dr'\; g(r)\, g(r')\, g(r - r')\,\nabla u(r - r'). \tag{6.4.43}$$

This is known as the Born-Green-Yvon (BGY) equation. In practice, the BGY equation, which is solved numerically, gives adequate results only at low densities.

Ornstein-Zernike Theory
The direct correlation function c(r) is defined by the equation

$$h(r) = c(r) + n\int\! d^3r'\; h(r - r')\, c(r'), \tag{6.4.44}$$

where h(r) = g(r) − 1 and we assume an isotropic system. This is called the Ornstein-Zernike equation. The first term, c(r) , accounts for local correlations,
which are then propagated in the second term to account for long-ranged correlations.
The OZ equation is an integral equation, but it becomes a simple algebraic one upon Fourier transforming:
$$\hat h(q) = \hat c(q) + n\,\hat h(q)\,\hat c(q), \tag{6.4.45}$$

the solution of which is


$$\hat h(q) = \frac{\hat c(q)}{1 - n\,\hat c(q)}. \tag{6.4.46}$$

The static structure factor is then


1
^
S(q) = 1 + n h(q) = . (6.4.47)
1 −n c
^(q)

In the grand canonical ensemble, we can write


^ 0
1 + n h(0) 1 1 κ
T
κ = = ⋅ ⟹ ^(0) = 1 −
n c , (6.4.48)
T
nkB T nkB T 1 −n c
^(0) κ
T

where κ_T⁰ = 1/nk_BT is the ideal gas isothermal compressibility.
At this point, we have merely substituted one unknown function, h(r) , for another, namely c(r) . To close the system, we need to relate c(r) to h(r) again in some
way. There are various approximation schemes which do just this.
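Before turning to specific closures, the algebraic OZ solution in Fourier space can itself be verified against the real-space convolution form of Equation 6.4.44. The sketch below assumes a toy Gaussian direct correlation function (an arbitrary choice, not a physical closure), transforms it radially, applies ĥ = ĉ/(1 − nĉ), transforms back, and checks the OZ equation at r = 0:

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def ft3(x, f, k):
    """Radial 3D Fourier transform: 4π ∫ dx x² f(x) sin(kx)/(kx)."""
    return 4 * np.pi * trap(x**2 * f * np.sinc(k * x / np.pi), x)

n = 0.1
r = np.linspace(1e-8, 12.0, 2001)
c = -0.5 * np.exp(-r**2)                 # toy direct correlation function

q = np.linspace(1e-8, 40.0, 2001)
c_hat = np.array([ft3(r, c, qi) for qi in q])
h_hat = c_hat / (1 - n * c_hat)          # OZ solved in Fourier space (6.4.46)

# inverse transform: h(r) = (1/2π²) ∫ dq q² ĥ(q) sin(qr)/(qr)
h = np.array([trap(q**2 * h_hat * np.sinc(ri * q / np.pi), q)
              for ri in r]) / (2 * np.pi**2)

# real-space OZ equation h(r) = c(r) + n ∫d³r' h(r-r') c(r'), checked at r = 0,
# where the convolution reduces to 4π ∫ ds s² h(s) c(s)
lhs = h[0]
rhs = c[0] + n * 4 * np.pi * trap(r**2 * h * c, r)
assert abs(lhs - rhs) < 1e-3
```

For negative c (repulsive-like correlations), note that |h| > |c| at this density: the convolution term propagates the local correlation outward, as described above.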

Percus-Yevick Equation
In the Percus-Yevick approximation, we take
$$c(r) = \big[1 - e^{\beta u(r)}\big]\cdot g(r). \tag{6.4.49}$$

Note that c(r) vanishes whenever the potential u(r) itself vanishes. This results in the following integro-differential equation for the pair distribution function
g(r) :


$$g(r) = e^{-\beta u(r)} + n\, e^{-\beta u(r)}\int\! d^3r'\;\big[g(r - r') - 1\big]\cdot\big[1 - e^{\beta u(r')}\big]\, g(r'). \tag{6.4.50}$$

This is the Percus-Yevick equation. Remarkably, the Percus-Yevick (PY) equation can be solved analytically for the case of hard spheres, where u(r) = ∞ for r ≤ a and u(r) = 0 for r > a, where a is the hard sphere diameter. Define the function y(r) = e^{βu(r)} g(r), in which case

$$c(r) = y(r)\, f(r) = \begin{cases} -y(r) & ,\ r \le a\\ 0 & ,\ r > a. \end{cases} \tag{6.4.51}$$

Here, f(r) = e^{−βu(r)} − 1 is the Mayer function. We remark that the definition of y(r) may cause some concern for the hard sphere system, because of the e^{βu(r)} term, which diverges severely for r ≤ a. However, g(r) vanishes in this limit, and their product y(r) is in fact finite! The PY equation may then be written for the function y(r) as

$$y(r) = 1 + n\!\!\int\limits_{r'<a}\!\! d^3r'\; y(r') \;-\; n\!\!\int\limits_{\substack{r'<a\\ |r-r'|>a}}\!\!\! d^3r'\; y(r')\, y(r - r'). \tag{6.4.52}$$

This has been solved using Laplace transform methods by M. S. Wertheim, J. Math. Phys. 5, 643 (1964). The final result for c(r) is
$$c(r) = -\left\{\lambda_1 + 6\eta\,\lambda_2\,\frac{r}{a} + \frac12\,\eta\,\lambda_1\,\Big(\frac{r}{a}\Big)^{\!3}\right\}\cdot\Theta(a - r), \tag{6.4.53}$$

where η = (π/6)a³n is the packing fraction and

$$\lambda_1 = \frac{(1 + 2\eta)^2}{(1 - \eta)^4}\qquad,\qquad \lambda_2 = -\frac{\big(1 + \tfrac12\eta\big)^2}{(1 - \eta)^4}. \tag{6.4.54}$$

This leads to the equation of state


$$p = nk_BT\cdot\frac{1 + \eta + \eta^2}{(1 - \eta)^3}. \tag{6.4.55}$$

This gets B₂ and B₃ exactly right. The accuracy of the PY approximation for higher order virial coefficients is shown in Table [pytab].

To obtain the equation of state from Equation 6.4.53, we invoke the compressibility equation,

$$nk_BT\,\kappa_T = k_BT\left(\frac{\partial n}{\partial p}\right)_{\!T} = \frac{1}{1 - n\,\hat c(0)}. \tag{6.4.56}$$

We therefore need

$$\begin{aligned} \hat c(0) &= \int\! d^3r\; c(r)\\ &= -4\pi a^3\int_0^1\! dx\, x^2\Big[\lambda_1 + 6\eta\,\lambda_2\, x + \tfrac12\,\eta\,\lambda_1\, x^3\Big]\\ &= -4\pi a^3\Big[\tfrac13\lambda_1 + \tfrac32\,\eta\,\lambda_2 + \tfrac1{12}\,\eta\,\lambda_1\Big]. \end{aligned}$$

With η = (π/6)a³n and using the definitions of λ_{1,2} in Equation 6.4.54, one finds

$$1 - n\,\hat c(0) = \frac{1 + 4\eta + 4\eta^2}{(1 - \eta)^4}. \tag{6.4.57}$$

We then have, from the compressibility equation,

$$\frac{\pi a^3}{6\, k_BT}\,\frac{\partial p}{\partial\eta} = \frac{1 + 4\eta + 4\eta^2}{(1 - \eta)^4}. \tag{6.4.58}$$
Integrating, we obtain p(η) up to a constant. The constant is set so that p = 0 when n = 0 . The result is Equation 6.4.55.
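Wertheim's closed-form c(r) makes this route easy to verify numerically. The sketch below checks Equation 6.4.57 by direct integration of c(r), and checks that integrating Equation 6.4.58 reproduces the PY pressure of Equation 6.4.55 (η = 0.3 and a = 1 are illustrative values; units with k_BT = 1):

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

a, eta = 1.0, 0.3
n = 6 * eta / (np.pi * a**3)
lam1 = (1 + 2 * eta)**2 / (1 - eta)**4
lam2 = -(1 + 0.5 * eta)**2 / (1 - eta)**4

# Wertheim's direct correlation function (Equation 6.4.53), supported on r < a
r = np.linspace(0.0, a, 100001)
c = -(lam1 + 6 * eta * lam2 * (r / a) + 0.5 * eta * lam1 * (r / a)**3)

# compressibility route: 1 - n c_hat(0) = (1+2η)²/(1-η)⁴  (Equation 6.4.57)
c0_hat = 4 * np.pi * trap(r**2 * c, r)
assert abs((1 - n * c0_hat) - (1 + 2 * eta)**2 / (1 - eta)**4) < 1e-6

# integrating dp/dη (Equation 6.4.58) reproduces the PY pressure (6.4.55)
e = np.linspace(0.0, eta, 100001)
p_int = trap((6 / (np.pi * a**3)) * (1 + 4 * e + 4 * e**2) / (1 - e)**4, e)
p_py = n * (1 + eta + eta**2) / (1 - eta)**3
assert abs(p_int - p_py) < 1e-6 * p_py
```

Note that (1 + 4η + 4η²)/(1−η)⁴ = (1+2η)²/(1−η)⁴, so the inverse compressibility never vanishes for 0 ≤ η < 1: the PY hard-sphere fluid shows no compressibility-route phase transition.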
Table [pytab]: Comparison of exact (Monte Carlo) results to those of the Percus-Yevick (PY) and hypernetted chains (HNC) approximations for hard spheres in three dimensions. Sources: Hansen and McDonald (1990) and Reichl (1998)

quantity    exact     PY       HNC
B₄/B₂³      0.28695   0.2969   0.2092
B₅/B₂⁴      0.1103    0.1211   0.0493
B₆/B₂⁵      0.0386    0.0281   0.0449
B₇/B₂⁶      0.0138    0.0156   –

Another commonly used scheme is the hypernetted chains (HNC) approximation, for which

c(r) = −βu(r) + h(r) − ln(1 + h(r)) . (6.4.59)

The rationale behind the HNC and other such approximation schemes is rooted in diagrammatic approaches, which are extensions of the Mayer cluster expansion
to the computation of correlation functions. For details and references to their application in the literature, see Hansen and McDonald (1990) and Reichl (1998).

Ornstein-Zernike Approximation at Long Wavelengths


Let's expand the direct correlation function ĉ(q) in powers of the wavevector q, viz.

$$\hat c(q) = \hat c(0) + c_2\, q^2 + c_4\, q^4 + \dots. \tag{6.4.60}$$

Here we have assumed spatial isotropy. Then

$$\begin{aligned} 1 - n\,\hat c(q) = \frac{1}{S(q)} &= 1 - n\,\hat c(0) - n\, c_2\, q^2 + \dots\\ &\equiv \xi^{-2}R^2 + q^2 R^2 + \mathcal{O}(q^4), \end{aligned}$$

where

$$R^2 = -n\, c_2 = 2\pi n\int_0^\infty\! dr\, r^4\, c(r) \tag{6.4.61}$$

and

$$\xi^{-2} = \frac{1 - n\,\hat c(0)}{R^2} = \frac{1 - 4\pi n\int_0^\infty\! dr\, r^2\, c(r)}{2\pi n\int_0^\infty\! dr\, r^4\, c(r)}. \tag{6.4.62}$$

The quantity R(T) tells us something about the effective range of the interactions, while ξ(T) is the correlation length. As we approach a critical point, the correlation length diverges as a power law:

$$\xi(T) \sim A\,|T - T_c|^{-\nu}. \tag{6.4.63}$$

The susceptibility is given by

$$\hat\chi(q) = -n\beta\, S(q) = -\frac{n\beta\, R^{-2}}{\xi^{-2} + q^2 + \mathcal{O}(q^4)}. \tag{6.4.64}$$

In the Ornstein-Zernike approximation, one drops the O(q⁴) terms in the denominator and retains only the long wavelength behavior in the direct correlation function. Thus,

$$\hat\chi^{\rm OZ}(q) = -\frac{n\beta\, R^{-2}}{\xi^{-2} + q^2}. \tag{6.4.65}$$

We now apply the inverse Fourier transform back to real space to obtain χ^{OZ}(r). In d = 1 dimension the result can be obtained exactly:

$$\chi^{\rm OZ}_{d=1}(x) = -\frac{n}{k_BT\, R^2}\int\limits_{-\infty}^{\infty}\!\frac{dq}{2\pi}\;\frac{e^{iqx}}{\xi^{-2} + q^2} = -\frac{n\,\xi}{2\, k_BT\, R^2}\, e^{-|x|/\xi}.$$
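The d = 1 Fourier integral above is a standard Lorentzian transform and can be checked by direct quadrature. A minimal sketch (ξ and x are arbitrary illustrative values; the overall prefactor −n/k_BTR² is stripped off):

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

xi, x = 1.3, 0.7
q = np.linspace(0.0, 400.0, 400001)
# ∫_{-∞}^{∞} (dq/2π) e^{iqx}/(ξ⁻² + q²) = (1/π) ∫_0^∞ dq cos(qx)/(ξ⁻² + q²)
integral = trap(np.cos(q * x) / (xi**-2 + q**2), q) / np.pi
# closed form: (ξ/2) e^{-|x|/ξ}
assert abs(integral - 0.5 * xi * np.exp(-abs(x) / xi)) < 1e-4
```

The same quadrature with larger x shows the exponential decay on the scale ξ directly.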

In higher dimensions d > 1 we can obtain the result asymptotically in two limits:

Take r → ∞ with ξ fixed. Then

$$\chi^{\rm OZ}_{d}(r) \simeq -C_d\, n\cdot\frac{\xi^{(3-d)/2}}{k_BT\, R^2}\cdot\frac{e^{-r/\xi}}{r^{(d-1)/2}}\cdot\left\{1 + \mathcal{O}\bigg(\frac{d-3}{r/\xi}\bigg)\right\},$$

where the C_d are dimensionless constants.

Take ξ → ∞ with r fixed; this is the limit T → T_c at fixed r. In dimensions d > 2 we obtain

$$\chi^{\rm OZ}_{d}(r) \simeq -\frac{C'_d\, n}{k_BT\, R^2}\cdot\frac{e^{-r/\xi}}{r^{d-2}}\cdot\left\{1 + \mathcal{O}\bigg(\frac{d-3}{r/\xi}\bigg)\right\}.$$

In d = 2 dimensions we obtain

$$\chi^{\rm OZ}_{d=2}(r) \simeq -\frac{C'_2\, n}{k_BT\, R^2}\cdot\ln\!\bigg(\frac{r}{\xi}\bigg)\, e^{-r/\xi}\cdot\left\{1 + \mathcal{O}\bigg(\frac{1}{\ln(r/\xi)}\bigg)\right\},$$

where the C′_d are dimensionless constants.

At criticality, ξ → ∞ , and clearly our results in d = 1 and d = 2 dimensions are nonsensical, as they are divergent. To correct this behavior, M. E. Fisher in 1963
suggested that the OZ correlation functions in the r ≪ ξ limit be replaced by
$$\chi(r) \simeq -C''_d\, n\cdot\frac{\xi^{\eta}}{k_BT\, R^2}\cdot\frac{e^{-r/\xi}}{r^{d-2+\eta}}, \tag{6.4.66}$$
a result known as anomalous scaling. Here, η is the anomalous scaling exponent.


Recall that the isothermal compressibility is given by \(\kappa_T = -\hat\chi(0)\). Near criticality, the integral in \(\hat\chi(0)\) is dominated by the r ≪ ξ part, since ξ → ∞. Thus, using Fisher's anomalous scaling,
\[ \begin{split} \kappa_T = -\hat\chi(0) &= -\int\! d^d r\ \chi(r) \\ &\sim A \int\! d^d r\ \frac{e^{-r/\xi}}{r^{d-2+\eta}} \sim B\, \xi^{2-\eta} \sim C\, \big| T - T_c \big|^{-(2-\eta)\nu}\ , \end{split} \]
where A, B, and C are temperature-dependent constants which are nonsingular at \(T = T_c\). Thus, since \(\kappa_T \propto |T-T_c|^{-\gamma}\), we conclude
\[ \gamma = (2-\eta)\,\nu\ , \tag{6.4.67} \]
a scaling relation due to Fisher.

This page titled 6.4: Liquid State Physics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

6.4.11 https://phys.libretexts.org/@go/page/18579
6.5: Coulomb Systems - Plasmas and the Electron Gas
Electrostatic Potential
Coulomb systems are particularly interesting in statistical mechanics because of their long-ranged forces, which result in the
phenomenon of screening. Long-ranged forces wreak havoc with the Mayer cluster expansion, since the Mayer function is no
longer integrable. Thus, the virial expansion fails, and new techniques need to be applied to reveal the physics of plasmas.
The potential energy of a Coulomb system is
\[ U = \frac{1}{2} \int\! d^d r \int\! d^d r'\ \rho(\mathbf{r})\, u(\mathbf{r}-\mathbf{r}')\, \rho(\mathbf{r}')\ , \tag{6.5.1} \]
where ρ(r) is the charge density and u(r), which has the dimensions of (energy)/(charge)\(^2\), satisfies
\[ \nabla^2 u(\mathbf{r}-\mathbf{r}') = -4\pi\, \delta(\mathbf{r}-\mathbf{r}')\ . \tag{6.5.2} \]

Thus,
\[ u(\mathbf{r}-\mathbf{r}') = \begin{cases} -2\pi\, |x-x'| & ,\quad d=1 \\ -2 \ln |\mathbf{r}-\mathbf{r}'| & ,\quad d=2 \\ |\mathbf{r}-\mathbf{r}'|^{-1} & ,\quad d=3\ . \end{cases} \tag{6.5.3} \]

For discrete particles, the charge density ρ(r) is given by
\[ \rho(\mathbf{r}) = \sum_i q_i\, \delta(\mathbf{r}-\mathbf{x}_i)\ , \tag{6.5.4} \]
where \(q_i\) is the charge of the \(i^{\rm th}\) particle. We will assume two types of charges: q = ±e, with e > 0. The electric potential is
\[ \phi(\mathbf{r}) = \int\! d^d r'\ u(\mathbf{r}-\mathbf{r}')\, \rho(\mathbf{r}') = \sum_i q_i\, u(\mathbf{r}-\mathbf{x}_i)\ . \tag{6.5.5} \]

This satisfies the Poisson equation,
\[ \nabla^2 \phi(\mathbf{r}) = -4\pi\, \rho(\mathbf{r})\ . \tag{6.5.6} \]
The total potential energy can be written as
\[ U = \frac{1}{2} \int\! d^d r\ \phi(\mathbf{r})\, \rho(\mathbf{r}) = \frac{1}{2} \sum_i q_i\, \phi(\mathbf{x}_i)\ . \tag{6.5.7} \]

Debye-Hückel theory
We now write the grand partition function:
\[ \begin{split} \Xi(T,V,\mu_+,\mu_-) &= \sum_{N_+=0}^\infty \sum_{N_-=0}^\infty \frac{1}{N_+!}\, \frac{1}{N_-!}\ e^{\beta\mu_+ N_+}\, \lambda_+^{-N_+ d}\ e^{\beta\mu_- N_-}\, \lambda_-^{-N_- d} \\ &\qquad\qquad \cdot \int\! d^d r_1 \cdots \int\! d^d r_{N_+ + N_-}\ e^{-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_{N_+ + N_-})}\ . \end{split} \]
We now adopt a mean field approach, known as Debye-Hückel theory, writing
\[ \begin{split} \rho(\mathbf{r}) &= \rho^{\rm av}(\mathbf{r}) + \delta\rho(\mathbf{r}) \\ \phi(\mathbf{r}) &= \phi^{\rm av}(\mathbf{r}) + \delta\phi(\mathbf{r})\ . \end{split} \]

We then have
\[ \begin{split} U &= \frac{1}{2} \int\! d^d r\ \big[ \rho^{\rm av}(\mathbf{r}) + \delta\rho(\mathbf{r}) \big] \cdot \big[ \phi^{\rm av}(\mathbf{r}) + \delta\phi(\mathbf{r}) \big] \\ &= \underbrace{-\frac{1}{2} \int\! d^d r\ \rho^{\rm av}(\mathbf{r})\, \phi^{\rm av}(\mathbf{r})}_{\equiv\, U_0} + \int\! d^d r\ \phi^{\rm av}(\mathbf{r})\, \rho(\mathbf{r}) + \underbrace{\frac{1}{2} \int\! d^d r\ \delta\rho(\mathbf{r})\, \delta\phi(\mathbf{r})}_{\text{ignore fluctuation term}}\ . \end{split} \]

We apply the mean field approximation in each region of space, which leads to
\[ \begin{split} \Omega(T,V,\mu_+,\mu_-) = &- k_B T\, \lambda_+^{-d}\, z_+ \int\! d^d r\ \exp\!\left( -\frac{e\phi^{\rm av}(\mathbf{r})}{k_B T} \right) \\ &- k_B T\, \lambda_-^{-d}\, z_- \int\! d^d r\ \exp\!\left( +\frac{e\phi^{\rm av}(\mathbf{r})}{k_B T} \right)\ , \end{split} \]

where
\[ \lambda_\pm = \left( \frac{2\pi\hbar^2}{m_\pm\, k_B T} \right)^{\!1/2} \quad , \quad z_\pm = \exp\!\left( \frac{\mu_\pm}{k_B T} \right)\ . \tag{6.5.8} \]

The charge density is therefore
\[ \rho(\mathbf{r}) = \frac{\delta\Omega}{\delta\phi^{\rm av}(\mathbf{r})} = e\, \lambda_+^{-d}\, z_+\, \exp\!\left( -\frac{e\,\phi(\mathbf{r})}{k_B T} \right) - e\, \lambda_-^{-d}\, z_-\, \exp\!\left( +\frac{e\,\phi(\mathbf{r})}{k_B T} \right)\ , \tag{6.5.9} \]
where we have now dropped the superscript on \(\phi^{\rm av}(\mathbf{r})\) for convenience. As r → ∞, we assume charge neutrality and ϕ(∞) = 0.
Thus
\[ \lambda_+^{-d}\, z_+ = n_+(\infty) = \lambda_-^{-d}\, z_- = n_-(\infty) \equiv n_\infty\ , \tag{6.5.10} \]
where \(n_\infty\) is the ionic density of either species at infinity. Therefore,
\[ \rho(\mathbf{r}) = -2e\, n_\infty \sinh\!\left( \frac{e\,\phi(\mathbf{r})}{k_B T} \right)\ . \tag{6.5.11} \]

We now invoke Poisson's equation,
\[ \nabla^2\phi = 8\pi e\, n_\infty \sinh(\beta e \phi) - 4\pi\, \rho_{\rm ext}\ , \tag{6.5.12} \]
where \(\rho_{\rm ext}\) is an externally imposed charge density.

If \(e\phi \ll k_B T\), we can expand the sinh function and obtain
\[ \nabla^2\phi = \kappa_D^2\, \phi - 4\pi\, \rho_{\rm ext}\ , \tag{6.5.13} \]
where
\[ \kappa_D = \left( \frac{8\pi n_\infty e^2}{k_B T} \right)^{\!1/2} \quad , \quad \lambda_D = \left( \frac{k_B T}{8\pi n_\infty e^2} \right)^{\!1/2}\ . \tag{6.5.14} \]

The quantity \(\lambda_D\) is known as the Debye screening length. Consider, for example, a point charge Q located at the origin. We then solve Poisson's equation in the weak field limit,
\[ \nabla^2\phi = \kappa_D^2\, \phi - 4\pi Q\, \delta(\mathbf{r})\ . \tag{6.5.15} \]

Fourier transforming, we obtain
\[ -q^2\, \hat\phi(\mathbf{q}) = \kappa_D^2\, \hat\phi(\mathbf{q}) - 4\pi Q \quad \Longrightarrow \quad \hat\phi(\mathbf{q}) = \frac{4\pi Q}{q^2 + \kappa_D^2}\ . \tag{6.5.16} \]
Transforming back to real space, we obtain, in three dimensions, the Yukawa potential,
\[ \phi(\mathbf{r}) = \int\! \frac{d^3 q}{(2\pi)^3}\ \frac{4\pi Q\, e^{i\mathbf{q}\cdot\mathbf{r}}}{q^2 + \kappa_D^2} = \frac{Q}{r} \cdot e^{-\kappa_D r}\ . \tag{6.5.17} \]
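One can verify directly that the Yukawa form solves the linearized equation away from the origin. Writing u(r) = r ϕ(r), the radial part of \(\nabla^2\phi = \kappa_D^2\,\phi\) becomes \(u'' = \kappa_D^2\, u\), which a finite-difference check confirms (Q and \(\kappa_D\) below are arbitrary illustrative values):

```python
# Finite-difference check that ϕ(r) = (Q/r) e^{-κ_D r} satisfies ∇²ϕ = κ_D² ϕ
# for r > 0.  With u = r ϕ, the radial Laplacian gives u'' = κ_D² u.
# Q and κ_D are arbitrary illustrative values.
import numpy as np

Q, kappa = 2.0, 1.3
r = np.linspace(0.5, 10.0, 4001)          # avoid r = 0, where the point charge sits
h = r[1] - r[0]
u = Q * np.exp(-kappa * r)                # u(r) = r ϕ(r)

u_pp = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2   # centered second derivative
residual = np.max(np.abs(u_pp - kappa**2 * u[1:-1]))
```

The residual vanishes up to the \(O(h^2)\) discretization error.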

This solution must break down sufficiently close to r = 0 , since the assumption eϕ(r) ≪ kB T is no longer valid there. However,
for larger r, the Yukawa form is increasingly accurate.
For another example, consider an electrolyte held between two conducting plates, one at potential ϕ(x = 0) = 0 and the other at potential ϕ(x = L) = V, where \(\hat{\mathbf{x}}\) is normal to the plane of the plates. Again assuming a weak field \(e\phi \ll k_B T\), we solve \(\nabla^2\phi = \kappa_D^2\, \phi\) and obtain
\[ \phi(x) = A\, e^{\kappa_D x} + B\, e^{-\kappa_D x}\ . \tag{6.5.18} \]
We fix the constants A and B by invoking the boundary conditions, which results in
\[ \phi(x) = V \cdot \frac{\sinh(\kappa_D x)}{\sinh(\kappa_D L)}\ . \tag{6.5.19} \]
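A minimal finite-difference solution of \(\phi'' = \kappa_D^2\,\phi\) with these boundary conditions reproduces the sinh profile. The parameter values below are arbitrary illustrative choices (units with \(\kappa_D = 2\), L = 1, V = 1):

```python
# Solve ϕ'' = κ² ϕ on [0, L] with ϕ(0) = 0, ϕ(L) = V by finite differences,
# and compare against the closed form ϕ(x) = V sinh(κx)/sinh(κL).
# κ, L, V are arbitrary illustrative values.
import numpy as np

kappa, L, V = 2.0, 1.0, 1.0
M = 801
x = np.linspace(0.0, L, M)
h = x[1] - x[0]

# Tridiagonal system for the interior points
A = np.zeros((M - 2, M - 2))
idx = np.arange(M - 2)
A[idx, idx] = -2.0 - (kappa * h)**2
A[idx[:-1], idx[:-1] + 1] = 1.0
A[idx[1:], idx[1:] - 1] = 1.0
b = np.zeros(M - 2)
b[-1] = -V                                # boundary value ϕ(L) = V enters the last row

phi = np.concatenate(([0.0], np.linalg.solve(A, b), [V]))
exact = V * np.sinh(kappa * x) / np.sinh(kappa * L)
err = np.max(np.abs(phi - exact))
```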

Debye-Hückel theory is valid provided \(n_\infty \lambda_D^3 \gg 1\), so that the statistical assumption of many charges in a screening volume is justified.

The Electron Gas: Thomas-Fermi Screening


Assuming \(k_B T \ll \varepsilon_F\), thermal fluctuations are unimportant and we may assume T = 0. In the same spirit as the Debye-Hückel approach, we assume a slowly varying mean electrostatic potential ϕ(r). Locally, we can write
\[ \varepsilon_F = \frac{\hbar^2 k_F^2}{2m} - e\phi(\mathbf{r})\ . \tag{6.5.20} \]
Thus, the Fermi wavevector \(k_F\) is spatially varying, according to the relation
\[ k_F(\mathbf{r}) = \left[ \frac{2m}{\hbar^2}\, \big( \varepsilon_F + e\phi(\mathbf{r}) \big) \right]^{1/2}\ . \tag{6.5.21} \]
The local electron number density is
\[ n(\mathbf{r}) = \frac{k_F^3(\mathbf{r})}{3\pi^2} = n_\infty \left( 1 + \frac{e\phi(\mathbf{r})}{\varepsilon_F} \right)^{\!3/2}\ . \tag{6.5.22} \]

In the presence of a uniform compensating positive background charge \(\rho_+ = e\, n_\infty\), Poisson's equation takes the form
\[ \nabla^2\phi = 4\pi e\, n_\infty \cdot \left[ \left( 1 + \frac{e\phi(\mathbf{r})}{\varepsilon_F} \right)^{\!3/2} - 1 \right] - 4\pi\, \rho_{\rm ext}(\mathbf{r})\ . \tag{6.5.23} \]
If \(e\phi \ll \varepsilon_F\), we may expand in powers of the ratio, obtaining
\[ \nabla^2\phi = \frac{6\pi n_\infty e^2}{\varepsilon_F}\, \phi \equiv \kappa_{\rm TF}^2\, \phi - 4\pi\, \rho_{\rm ext}(\mathbf{r})\ . \tag{6.5.24} \]
Here, \(\kappa_{\rm TF}\) is the Thomas-Fermi wavevector,
\[ \kappa_{\rm TF} = \left( \frac{6\pi n_\infty e^2}{\varepsilon_F} \right)^{\!1/2}\ . \tag{6.5.25} \]

Thomas-Fermi theory is valid provided \(n_\infty \lambda_{\rm TF}^3 \gg 1\), where \(\lambda_{\rm TF} = \kappa_{\rm TF}^{-1}\), so that the statistical assumption of many electrons in a screening volume is justified.
One important application of Thomas-Fermi screening is to the theory of metals. In a metal, the outer, valence electrons of each
atom are stripped away from the positively charged ionic core and enter into itinerant, plane-wave-like states. These states disperse
with some ε(k) function (that is periodic in the Brillouin zone, under k → k + G , where G is a reciprocal lattice vector), and at
T = 0 this energy band is filled up to the Fermi level ε , as Fermi statistics dictates. (In some cases, there may be several bands at
F

the Fermi level, as we saw in the case of yttrium.) The set of ionic cores then acts as a neutralizing positive background. In a
perfect crystal, the ionic cores are distributed periodically, and the positive background is approximately uniform. A charged
impurity in a metal, such as a zinc atom in a copper matrix, has a different nuclear charge and a different valency than the host. The
charge of the ionic core, when valence electrons are stripped away, differs from that of the host ions, and therefore the impurity acts

as a local charge impurity. For example, copper has an electronic configuration of \([\mathrm{Ar}]\, 3d^{10}\, 4s^1\). The 4s electron forms an energy band which contains the Fermi surface. Zinc has a configuration of \([\mathrm{Ar}]\, 3d^{10}\, 4s^2\), and in a Cu matrix the Zn gives up its two 4s electrons into the 4s conduction band, leaving behind a charge +2 ionic core. The Cu cores have charge +1 since each copper atom contributed only one 4s electron to the conduction band. The conduction band electrons neutralize the uniform positive background of the Cu ion cores. What is left is an extra Q = +e nuclear charge at the Zn site, and one extra 4s conduction band electron. The Q = +e impurity is, however, screened by the electrons, and at distances greater than an atomic radius the potential that a given electron sees due to the Zn core is of the Yukawa form,
\[ \phi(\mathbf{r}) = \frac{Q}{r} \cdot e^{-\kappa_{\rm TF} r}\ . \tag{6.5.26} \]

We should take care, however, that the dispersion ε(k) for the conduction band in a metal is not necessarily of the free electron form \(\varepsilon(\mathbf{k}) = \hbar^2 k^2 / 2m\). To linear order in the potential, however, the change in the local electronic density is
\[ \delta n(\mathbf{r}) = e\phi(\mathbf{r})\, g(\varepsilon_F)\ , \tag{6.5.27} \]
where \(g(\varepsilon_F)\) is the density of states at the Fermi energy. Thus, in a metal, we should write
\[ \begin{split} \nabla^2\phi &= (-4\pi)(-e\, \delta n) \\ &= 4\pi e^2 g(\varepsilon_F)\, \phi = \kappa_{\rm TF}^2\, \phi\ , \end{split} \]
where
\[ \kappa_{\rm TF} = \sqrt{4\pi e^2\, g(\varepsilon_F)}\ . \tag{6.5.28} \]

The value of \(g(\varepsilon_F)\) will depend on the form of the dispersion. For ballistic bands with an effective mass \(m^*\), the formula in Equation 6.5.24 still applies.

The Thomas-Fermi atom


Consider an ion formed of a nucleus of charge +Ze and an electron cloud of charge −Ne. The net ionic charge is then (Z − N)e. Since we will be interested in atomic scales, we can no longer assume a weak field limit and we must retain the full nonlinear screening theory, for which
\[ \nabla^2\phi(\mathbf{r}) = 4\pi e \cdot \frac{(2m)^{3/2}}{3\pi^2 \hbar^3}\, \big( \varepsilon_F + e\phi(\mathbf{r}) \big)^{3/2} - 4\pi Z e\, \delta(\mathbf{r})\ . \tag{6.5.29} \]
We assume an isotropic solution. It is then convenient to define
\[ \varepsilon_F + e\phi(\mathbf{r}) = \frac{Z e^2}{r} \cdot \chi(r/r_0)\ , \tag{6.5.30} \]
where \(r_0\) is yet to be determined. As r → 0 we expect χ → 1 since the nuclear charge is then unscreened. We then have
\[ \nabla^2 \left\{ \frac{Z e^2}{r} \cdot \chi(r/r_0) \right\} = \frac{1}{r_0^2}\, \frac{Z e^2}{r}\, \chi''(r/r_0)\ , \tag{6.5.31} \]
thus we arrive at the Thomas-Fermi equation,
\[ \chi''(t) = \frac{1}{\sqrt{t}}\, \chi^{3/2}(t)\ , \tag{6.5.32} \]
with \(r = t\, r_0\), provided we take
\[ r_0 = \frac{\hbar^2}{2 m e^2} \left( \frac{3\pi}{4\sqrt{Z}} \right)^{\!2/3} = 0.885\, Z^{-1/3}\, a_B\ , \tag{6.5.33} \]
where \(a_B = \hbar^2 / m e^2 = 0.529\) Å is the Bohr radius. The TF equation is subject to the following boundary conditions:
Figure 6.5.1: The Thomas-Fermi atom consists of a nuclear charge +Ze surrounded by N electrons distributed in a cloud. The electric potential ϕ(r) felt by any electron at position r is screened by the electrons within this radius, resulting in a self-consistent potential \(\phi(r) = \phi_0 + (Z e^2/r)\, \chi(r/r_0)\).

At short distances, the nucleus is unscreened,
\[ \chi(0) = 1\ . \tag{6.5.34} \]
For positive ions, with N < Z, there is perfect screening at the ionic boundary \(R = t^*\, r_0\), where \(\chi(t^*) = 0\). This requires
\[ \mathbf{E} = -\nabla\phi = \left[ \frac{Z e}{R^2}\, \chi(R/r_0) - \frac{Z e}{R\, r_0}\, \chi'(R/r_0) \right] \hat{\mathbf{r}} = \frac{(Z-N)\, e}{R^2}\, \hat{\mathbf{r}}\ . \tag{6.5.35} \]
This requires
\[ -t^*\, \chi'(t^*) = 1 - \frac{N}{Z}\ . \tag{6.5.36} \]

For an atom, with N = Z, the asymptotic solution to the TF equation is a power law, and by inspection is found to be \(\chi(t) \sim C\, t^{-3}\), where C is a constant. The constant follows from the TF equation, which yields \(12\, C = C^{3/2}\), hence C = 144. Thus, a neutral TF atom has a density with a power law tail, with \(\rho \sim r^{-6}\). TF ions with N > Z are unstable.
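The value C = 144 is easy to verify: inserting the trial solution \(\chi = C\, t^{-3}\) into the TF equation gives \(12\, C\, t^{-5} = C^{3/2}\, t^{-5}\), so the condition \(12\,C = C^{3/2}\) must hold, and the residual of the trial solution vanishes identically:

```python
# Check that χ(t) = C t⁻³ with C = 144 solves the Thomas-Fermi equation
# χ''(t) = χ^{3/2}(t) / √t on a sample grid of t values.
import numpy as np

C = 144.0
t = np.linspace(5.0, 50.0, 1001)
chi = C / t**3
chi_pp = 12 * C / t**5                 # exact second derivative of C t⁻³
rhs = chi**1.5 / np.sqrt(t)            # equals C^{3/2} t⁻⁵
residual = np.max(np.abs(chi_pp - rhs))
```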

This page titled 6.5: Coulomb Systems - Plasmas and the Electron Gas is shared under a CC BY-NC-SA license and was authored, remixed,
and/or curated by Daniel Arovas.

6.6: Polymers
Basic concepts
Linear chain polymers are repeating structures with the chemical formula \((A)_x\), where A is the formula unit and x is the degree of polymerization. In many cases (polystyrene), \(x \approx 10^5\) is not uncommon. For a very readable introduction to the subject, see P. G. de Gennes, Scaling Concepts in Polymer Physics.


Quite often a given polymer solution will contain a distribution of x values; this is known as polydispersity. Various preparation
techniques, such as chromatography, can mitigate the degree of polydispersity. Another morphological feature of polymers is
branching, in which the polymers do not form linear chains.

Figure 6.6.1 : Some examples of linear chain polymers.


Polymers exhibit a static flexibility which can be understood as follows. Consider a long chain hydrocarbon with a −C−C−C− backbone. The angle between successive C−C bonds is fixed at θ ≈ 68°, but the azimuthal angle φ can take one of three possible low-energy values, as shown in the right panel of Figure 6.6.2. Thus, the relative probabilities of gauche and trans orientations are
\[ \frac{\text{Prob (gauche)}}{\text{Prob (trans)}} = 2\, e^{-\Delta\varepsilon / k_B T}\ , \tag{6.6.1} \]
where Δε is the energy difference between trans and gauche configurations. This means that the polymer chain is in fact a random coil with a persistence length
\[ \ell_p = \ell_0\, e^{\Delta\varepsilon / k_B T}\ , \tag{6.6.2} \]

where \(\ell_0\) is a microscopic length scale, roughly given by the length of a formula unit, which is approximately a few Ångstroms (see Figure 6.6.2). Let L be the total length of the polymer when it is stretched into a straight line. If \(\ell_p > L\), the polymer is rigid. If \(\ell_p \ll L\), the polymer is rigid on the length scale \(\ell_p\) but flexible on longer scales. We have
\[ \frac{\ell_p}{L} = \frac{1}{N}\, e^{\Delta\varepsilon / k_B T}\ , \tag{6.6.3} \]
where we now use N (rather than x) for the degree of polymerization.


In the time domain, the polymer exhibits a dynamical flexibility on scales longer than a persistence time. The persistence time \(\tau_p\) is the time required for a trans-gauche transition. The rate for such transitions is set by the energy barrier B separating trans from gauche configurations:
\[ \tau_p = \tau_0\, e^{B / k_B T}\ , \tag{6.6.4} \]
where \(\tau_0 \sim 10^{-11}\) s. On frequency scales \(\omega \ll \tau_p^{-1}\) the polymer is dynamically flexible. If \(\Delta\varepsilon \sim k_B T \ll B\) the polymer is flexible from a static point of view, but dynamically rigid. That is, there are many gauche orientations of successive carbon bonds which reflect a quenched disorder. The polymer then forms a frozen random coil, like a twisted coat hanger.

Figure 6.6.2: Left: trans and gauche orientations in carbon chains. Right: energy as a function of azimuthal angle φ. There are three low energy states: trans (φ = 0) and gauche (φ = ±φ₀).

Polymers as random walks


A polymer can be modeled by a self-avoiding random walk (SAW). That is, on scales longer than \(\ell_p\), it twists about randomly in space subject to the constraint that it doesn't overlap itself. Before we consider the mathematics of SAWs, let's first recall some aspects of ordinary random walks which are not self-avoiding.
We'll simplify matters further by considering random walks on a hypercubic lattice of dimension d. Such a lattice has coordination number 2d; there are 2d nearest neighbor separation vectors, given by \(\boldsymbol{\delta} = \pm a\, \hat{\mathbf{e}}_1\,,\ \pm a\, \hat{\mathbf{e}}_2\,,\ \ldots\,,\ \pm a\, \hat{\mathbf{e}}_d\), where a is the lattice spacing. Consider now a random walk of N steps starting at the origin. After N steps the position is
\[ \mathbf{R}_N = \sum_{j=1}^N \boldsymbol{\delta}_j\ , \tag{6.6.5} \]
where \(\boldsymbol{\delta}_j\) takes on one of 2d possible values. Now N is no longer the degree of polymerization, but something approximating \(L/\ell_p\), which is the number of persistence lengths in the chain. We assume each step is independent, hence \(\langle \delta_j^\alpha\, \delta_{j'}^\beta \rangle = (a^2/d)\, \delta_{jj'}\, \delta^{\alpha\beta}\) and \(\langle \mathbf{R}_N^2 \rangle = N a^2\). The full distribution \(P_N(\mathbf{R})\) is given by
\[ \begin{split} P_N(\mathbf{R}) &= (2d)^{-N} \sum_{\boldsymbol{\delta}_1} \cdots \sum_{\boldsymbol{\delta}_N} \delta_{\mathbf{R}, \sum_j \boldsymbol{\delta}_j} \\ &= a^d \int\limits_{-\pi/a}^{\pi/a} \frac{dk_1}{2\pi} \cdots \int\limits_{-\pi/a}^{\pi/a} \frac{dk_d}{2\pi}\ e^{-i\mathbf{k}\cdot\mathbf{R}} \left[ \frac{1}{d} \sum_{\mu=1}^d \cos(k_\mu a) \right]^N \\ &= a^d \int_{\hat\Omega} \frac{d^d k}{(2\pi)^d}\ e^{-i\mathbf{k}\cdot\mathbf{R}}\ \exp\!\left[ N \ln\!\left( 1 - \frac{1}{2d}\, \mathbf{k}^2 a^2 + \ldots \right) \right] \\ &\approx \left( \frac{a}{2\pi} \right)^{\!d} \int\! d^d k\ e^{-N \mathbf{k}^2 a^2 / 2d}\ e^{-i\mathbf{k}\cdot\mathbf{R}} = \left( \frac{d}{2\pi N} \right)^{\!d/2} e^{-d \mathbf{R}^2 / 2 N a^2}\ . \end{split} \]
This is a simple Gaussian, with width \(\langle \mathbf{R}^2 \rangle = d \cdot (N a^2 / d) = N a^2\), as we have already computed. The quantity \(\mathbf{R}\) defined here is the end-to-end vector of the chain. The RMS end-to-end distance is then \(\langle \mathbf{R}^2 \rangle^{1/2} = \sqrt{N}\, a \equiv R_0\).
A related figure of merit is the radius of gyration, \(R_g\), defined by
\[ R_g^2 = \frac{1}{N} \left\langle \sum_{n=1}^N \big( \mathbf{R}_n - \mathbf{R}_{CM} \big)^2 \right\rangle\ , \tag{6.6.6} \]
where \(\mathbf{R}_{CM} = \frac{1}{N} \sum_{j=1}^N \mathbf{R}_j\) is the center of mass position. A brief calculation yields
\[ R_g^2 = \tfrac{1}{6} \big( N + 3 - 4 N^{-1} \big)\, a^2 \sim \frac{N a^2}{6}\ , \tag{6.6.7} \]
in all dimensions.
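Both results, \(\langle \mathbf{R}_N^2 \rangle = N a^2\) and \(R_g^2 \approx N a^2/6\), are easy to confirm by direct simulation of lattice random walks. The chain length, number of walks, and tolerances below are illustrative choices reflecting statistical error:

```python
# Monte Carlo check of ⟨R_N²⟩ = N a² and R_g² ≈ N a²/6 for random walks
# on a d-dimensional hypercubic lattice.  Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, a, N, walks = 3, 1.0, 200, 5000

# Each step is ±a along one of the d axes, chosen uniformly.
axes = rng.integers(0, d, size=(walks, N))
signs = rng.choice([-1.0, 1.0], size=(walks, N))
steps = np.zeros((walks, N, d))
steps[np.arange(walks)[:, None], np.arange(N)[None, :], axes] = signs * a

R = steps.cumsum(axis=1)                               # positions R_1 … R_N
R2 = np.mean(np.sum(R[:, -1, :]**2, axis=-1))          # ⟨R_N²⟩
Rcm = R.mean(axis=1, keepdims=True)                    # center of mass per walk
Rg2 = np.mean(np.sum((R - Rcm)**2, axis=-1))           # ⟨R_g²⟩
```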
The total number of random walk configurations with end-to-end vector \(\mathbf{R}\) is then \((2d)^N P_N(\mathbf{R})\), so the entropy of a chain at fixed elongation is
\[ S(\mathbf{R}, N) = k_B \ln\!\big[ (2d)^N P_N(\mathbf{R}) \big] = S(\mathbf{0}, N) - \frac{d\, k_B\, \mathbf{R}^2}{2 N a^2}\ . \tag{6.6.8} \]
If we assume that the energy of the chain is conformation independent, then \(E = E_0(N)\) and
\[ F(\mathbf{R}, N) = F(\mathbf{0}, N) + \frac{d\, k_B T\, \mathbf{R}^2}{2 N a^2}\ . \tag{6.6.9} \]

In the presence of an external force \(\mathbf{F}_{\rm ext}\), the Gibbs free energy is the Legendre transform
\[ G(\mathbf{F}_{\rm ext}, N) = F(\mathbf{R}, N) - \mathbf{F}_{\rm ext} \cdot \mathbf{R}\ , \tag{6.6.10} \]
and \(\partial G / \partial \mathbf{R} = 0\) then gives the relation
\[ \big\langle \mathbf{R}(\mathbf{F}_{\rm ext}, N) \big\rangle = \frac{N a^2}{d\, k_B T}\, \mathbf{F}_{\rm ext}\ . \tag{6.6.11} \]
This may be considered an equation of state for the polymer.


Following de Gennes, consider a chain with charges ±e at each end, placed in an external electric field of magnitude
E = 30, 000 V /cm. Let N = 10 , a = 2 Å, and d = 3 . What is the elongation? From the above formula, we have
4

R eER
0
= = 0.8 , (6.6.12)
R 3 kB T
0



with R 0
= √N a as before.
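This estimate can be reproduced numerically in Gaussian (cgs) units. The temperature T = 300 K is an assumption on our part (the text does not state it), and 1 statvolt ≈ 299.79 V:

```python
# de Gennes' electric-field elongation estimate in Gaussian (cgs) units.
# T = 300 K is an assumed room temperature, not stated in the text.
e = 4.803e-10          # electron charge [esu]
kB = 1.381e-16         # Boltzmann constant [erg/K]
T = 300.0              # temperature [K] (assumption)
E = 30000.0 / 299.79   # 30,000 V/cm converted to statvolt/cm
N, a = 1.0e4, 2.0e-8   # degree of polymerization; monomer size 2 Å in cm

R0 = N**0.5 * a                    # unperturbed size R0 = √N a = 200 Å
elong = e * E * R0 / (3 * kB * T)  # R/R0, should come out near 0.8
```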

Figure 6.6.3 : The polymer chain as a random coil.

Structure factor
We can also compute the structure factor,
\[ \begin{split} S(\mathbf{k}) &= \frac{1}{N} \left\langle \sum_{m=1}^N \sum_{n=1}^N e^{i\mathbf{k}\cdot(\mathbf{R}_m - \mathbf{R}_n)} \right\rangle \\ &= 1 + \frac{2}{N} \sum_{m=1}^N \sum_{n=1}^{m-1} \big\langle e^{i\mathbf{k}\cdot(\mathbf{R}_m - \mathbf{R}_n)} \big\rangle\ . \end{split} \]

For averages with respect to a Gaussian distribution,
\[ \big\langle e^{i\mathbf{k}\cdot(\mathbf{R}_m - \mathbf{R}_n)} \big\rangle = \exp\left\{ -\frac{1}{2} \Big\langle \big( \mathbf{k} \cdot (\mathbf{R}_m - \mathbf{R}_n) \big)^2 \Big\rangle \right\}\ . \tag{6.6.13} \]
Now for m > n we have \(\mathbf{R}_m - \mathbf{R}_n = \sum_{j=n+1}^m \boldsymbol{\delta}_j\), and therefore
\[ \Big\langle \big( \mathbf{k} \cdot (\mathbf{R}_m - \mathbf{R}_n) \big)^2 \Big\rangle = \sum_{j=n+1}^m \big\langle (\mathbf{k} \cdot \boldsymbol{\delta}_j)^2 \big\rangle = \frac{1}{d}\, (m-n)\, \mathbf{k}^2 a^2\ , \tag{6.6.14} \]
since \(\langle \delta_j^\alpha\, \delta_{j'}^\beta \rangle = (a^2/d)\, \delta_{jj'}\, \delta^{\alpha\beta}\). We then have
\[ \begin{split} S(\mathbf{k}) &= 1 + \frac{2}{N} \sum_{m=1}^N \sum_{n=1}^{m-1} e^{-(m-n)\, \mathbf{k}^2 a^2 / 2d} \\ &= \frac{N\, \big( e^{2\mu_k} - 1 \big) - 2\, e^{\mu_k} \big( 1 - e^{-N\mu_k} \big)}{N\, \big( e^{\mu_k} - 1 \big)^2}\ , \end{split} \tag{6.6.15} \]
where \(\mu_k = \mathbf{k}^2 a^2 / 2d\). In the limit where N → ∞ and a → 0 with \(N a^2 = R_0^2\) constant, the structure factor has a scaling form, \(S(\mathbf{k}) = N f(N\mu_k) = (R_0/a)^2\, f(k^2 R_0^2 / 2d)\), where
\[ f(x) = \frac{2}{x^2} \big( e^{-x} - 1 + x \big) = 1 - \frac{x}{3} + \frac{x^2}{12} + \ldots\ . \tag{6.6.16} \]
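The scaling form can be checked by comparing the exact double sum for S(k) in Equation 6.6.15 against \(N f(N\mu_k)\) with the Debye function f of Equation 6.6.16. The chain length and wavevector below are illustrative:

```python
# Compare the exact discrete structure factor with its N → ∞ scaling form
# S(k) ≈ N f(N μ_k), where f(x) = 2(e^{-x} - 1 + x)/x² is the Debye function.
# N, a, d, k are illustrative values.
import numpy as np

N, a, d = 400, 1.0, 3
k = 0.35
mu = k**2 * a**2 / (2 * d)

# Direct double sum: S(k) = (1/N) Σ_{m,n} exp(-|m-n| μ_k)
m = np.arange(1, N + 1)
S_direct = np.exp(-np.abs(m[:, None] - m[None, :]) * mu).sum() / N

# Scaling limit
x = N * mu
f = 2 * (np.exp(-x) - 1 + x) / x**2
S_scaling = N * f
```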

Rouse model
Consider next a polymer chain subjected to stochastic forcing. We model the chain as a collection of mass points connected by springs, with a potential energy \(U = \frac{1}{2} k \sum_n \big( \mathbf{x}_n - \mathbf{x}_{n+1} \big)^2\). This reproduces the Gaussian end-to-end distribution derived above if we take the spring constant to be \(k = 3 k_B T / a^2\) and set the equilibrium length of each spring to zero. The equations of motion are then
\[ M\, \ddot{\mathbf{x}}_n + \gamma\, \dot{\mathbf{x}}_n = -k\, \big( 2\mathbf{x}_n - \mathbf{x}_{n-1} - \mathbf{x}_{n+1} \big) + \mathbf{f}_n(t)\ , \tag{6.6.17} \]
where \(n \in \{1, \ldots, N\}\) and \(\{f_n^\mu(t)\}\) is a set of Gaussian white noise forcings, each with zero mean, and
\[ \big\langle f_n^\mu(t)\, f_{n'}^\nu(t') \big\rangle = 2\gamma\, k_B T\, \delta_{nn'}\, \delta^{\mu\nu}\, \delta(t-t')\ . \tag{6.6.18} \]
We define \(\mathbf{x}_0 \equiv \mathbf{x}_1\) and \(\mathbf{x}_{N+1} \equiv \mathbf{x}_N\), so that the end mass points n = 1 and n = N experience a restoring force from only one neighbor. We assume the chain is overdamped and set M → 0. We then have
\[ \gamma\, \dot{\mathbf{x}}_n = -k \sum_{n'=1}^N A_{nn'}\, \mathbf{x}_{n'} + \mathbf{f}_n(t)\ , \tag{6.6.19} \]

where
\[ A = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ 0 & 0 & -1 & \ddots & \cdots & \vdots \\ \vdots & & \ddots & \ddots & 2 & -1 \\ 0 & \cdots & \cdots & 0 & -1 & 1 \end{pmatrix}\ . \tag{6.6.20} \]

The matrix A is real and symmetric. Its eigenfunctions are labeled \(\psi_j(n)\), with \(j \in \{0, \ldots, N-1\}\):
\[ \begin{split} \psi_0(n) &= \frac{1}{\sqrt{N}} \\ \psi_j(n) &= \sqrt{\frac{2}{N}}\ \cos\!\left( \frac{(2n-1)\, j\pi}{2N} \right) \quad , \quad j \in \{1, \ldots, N-1\}\ . \end{split} \]
The completeness and orthonormality relations are
\[ \sum_{j=0}^{N-1} \psi_j(n)\, \psi_j(n') = \delta_{nn'} \qquad , \qquad \sum_{n=1}^N \psi_j(n)\, \psi_{j'}(n) = \delta_{jj'}\ , \tag{6.6.21} \]
with eigenvalues \(\lambda_j = 4 \sin^2(\pi j / 2N)\). Note that \(\lambda_0 = 0\).
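The spectrum \(\lambda_j = 4 \sin^2(\pi j/2N)\) of the free-end Rouse matrix A can be confirmed numerically:

```python
# Verify the eigenvalues of the free-end Rouse connectivity matrix A.
import numpy as np

N = 50
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A[0, 0] = A[-1, -1] = 1.0          # free ends: n = 1 and n = N couple to one neighbor

evals = np.sort(np.linalg.eigvalsh(A))
j = np.arange(N)
predicted = 4 * np.sin(np.pi * j / (2 * N))**2
err = np.max(np.abs(evals - predicted))
```

The cosine eigenvectors quoted above diagonalize A exactly, so the agreement is at machine precision.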
We now work in the basis of normal modes \(\{\eta_j^\mu\}\), where
\[ \eta_j^\mu(t) = \sum_{n=1}^N \psi_j(n)\, x_n^\mu(t) \qquad , \qquad x_n^\mu(t) = \sum_{j=0}^{N-1} \psi_j(n)\, \eta_j^\mu(t)\ . \tag{6.6.22} \]
We then have
\[ \frac{d\boldsymbol{\eta}_j}{dt} = -\frac{1}{\tau_j}\, \boldsymbol{\eta}_j + \mathbf{g}_j(t)\ , \tag{6.6.23} \]
where the \(j^{\rm th}\) relaxation time is
\[ \tau_j = \frac{\gamma}{4\, k \sin^2(\pi j / 2N)} \tag{6.6.24} \]
and
\[ g_j^\mu(t) = \gamma^{-1} \sum_{n=1}^N \psi_j(n)\, f_n^\mu(t)\ . \tag{6.6.25} \]
Note that
\[ \big\langle g_j^\mu(t)\, g_{j'}^\nu(t') \big\rangle = 2\gamma^{-1}\, k_B T\, \delta_{jj'}\, \delta^{\mu\nu}\, \delta(t-t')\ . \tag{6.6.26} \]
Integrating Equation 6.6.23, we have, for j = 0,
\[ \boldsymbol{\eta}_0(t) = \boldsymbol{\eta}_0(0) + \int_0^t dt'\ \mathbf{g}_0(t')\ . \tag{6.6.27} \]
For the j > 0 modes,
\[ \boldsymbol{\eta}_j(t) = \boldsymbol{\eta}_j(0)\, e^{-t/\tau_j} + \int_0^t dt'\ \mathbf{g}_j(t')\, e^{(t'-t)/\tau_j}\ . \tag{6.6.28} \]
Thus,
μ ν ′ −1 μν ′
⟨η (t) η (t )⟩ = 2γ kB T δ min(t, t )
0 0 c

′ ′
μ ν ′ −1 μν −|t−t |/ τ −(t+t )/ τ
⟨η (t) η (t )⟩ =γ kB T δ τ (e j
−e j
) ,
j j c j

where the ‘connected average’ is defined to be ⟨A(t) B(t )⟩ ′


c ≡ ⟨A(t) B(t )⟩ − ⟨A(t)⟩⟨B(t )⟩
′ ′
. Transforming back to the original
real space basis, we then have
N −1
μ ′
2 kB T μν ′
kB T μν ′ −|t−t |/ τ
′ ′
−(t+t )/ τ
ν
⟨xn (t) x ′ (t )⟩ = δ min(t, t ) + δ ∑τ ψ (n) ψ (n ) (e j
−e j
) . (6.6.29)
n c j j j
Nγ γ
j=1

In particular, the 'connected variance' of \(\mathbf{x}_n(t)\) is
\[ \text{CVar}\big[ \mathbf{x}_n(t) \big] \equiv \big\langle [\mathbf{x}_n(t)]^2 \big\rangle_c = \frac{6 k_B T}{N\gamma}\, t + \frac{3 k_B T}{\gamma} \sum_{j=1}^{N-1} \tau_j\, \big[ \psi_j(n) \big]^2 \Big( 1 - e^{-2t/\tau_j} \Big)\ . \tag{6.6.30} \]

From this we see that at long times, when \(t \gg \tau_1\), the motion of \(\mathbf{x}_n(t)\) is diffusive, with diffusion constant \(D = k_B T / N\gamma \propto N^{-1}\), which is inversely proportional to the chain length. Recall the Stokes result \(\gamma = 6\pi \eta R / M\) for a sphere of radius R and mass M moving in a fluid of dynamical viscosity η. From \(D = k_B T / \gamma M\), shouldn't we expect the diffusion constant to be \(D = k_B T / 6\pi \eta R_g \propto N^{-1/2}\), since the radius of gyration of the polymer is \(R_g \propto N^{1/2}\)? This argument smuggles in the assumption that the only dissipation is taking place at the outer surface of the polymer, modeled as a ball of radius \(R_g\). In fact, for a Gaussian random walk in three space dimensions, the density for \(r < R_g\) is \(\rho \propto N^{-1/2}\), since there are N monomers inside a region of volume \((\sqrt{N}\, a)^3\). Accounting for Flory swelling due to steric interactions (see below), the density is \(\rho \sim N^{-4/5}\), which is even smaller. So as N → ∞, the density within the \(r = R_g\) effective sphere gets small, which means water molecules can easily penetrate, in which case the entire polymer chain should be considered to be in a dissipative environment, which is what the Rouse model says – each monomer executes overdamped motion.
A careful analysis of Equation 6.6.30 reveals that there is a subdiffusive regime\(^{12}\) where \(\text{CVar}[\mathbf{x}_n(t)] \propto t^{1/2}\). To see this, first take the N ≫ 1 limit, in which case we may write \(\tau_j = N^2 \tau_0 / j^2\), where \(\tau_0 \equiv \gamma / \pi^2 k\) and \(j \in \{1, \ldots, N-1\}\). Let \(s \equiv (n - \frac{1}{2})/N \in [0, 1]\) be the scaled coordinate along the chain. The second term in Equation 6.6.30 is then
\[ S(s, t) \equiv \frac{6 k_B T}{\gamma} \cdot \frac{\tau_1}{N} \sum_{j=1}^{N-1} \frac{\cos^2(\pi j s)}{j^2}\, \Big( 1 - e^{-2 j^2 t / \tau_1} \Big)\ . \tag{6.6.31} \]

Let \(\sigma \equiv (t/\tau_1)^{1/2}\). When \(t \ll \tau_1\), σ ≪ 1, we have
\[ S(s, t) \simeq \frac{6 k_B T}{\gamma} \cdot \frac{\tau_1}{N} \cdot \sigma \int\limits_0^\infty du\ \frac{\cos^2(\pi u s / \sigma)}{u^2}\, \Big( 1 - e^{-2u^2} \Big)\ . \tag{6.6.32} \]
Since s/σ ≫ 1, we may replace the cosine squared term by its average \(\frac{1}{2}\). If we further assume Nσ ≫ 1, which means we are in the regime \(1 \ll t/\tau_0 \ll N^2\), after performing the integral we obtain the result
\[ S(s, t) = \frac{3 k_B T}{\gamma}\, \sqrt{2\pi\, \tau_0\, t}\ , \tag{6.6.33} \]

provided s = O(1), so that the site n is not on either end of the chain. The result in Equation 6.6.33 dominates the first term on the RHS of Equation 6.6.30, since \(\tau_0 \ll t \ll \tau_1\). This is the subdiffusive regime.

When \(t \gg \tau_1 = N^2 \tau_0\), the exponential on the RHS of Equation 6.6.31 is negligible. If we again approximate \(\cos^2(\pi j s) \simeq \frac{1}{2}\) and extend the upper limit on the sum to infinity, we find \(S(t) = (3 k_B T / \gamma)(\tau_1 / N)(\pi^2/6) \propto t^0\), which is dominated by the leading term on the RHS of Equation 6.6.30. This is the diffusive regime, with \(D = k_B T / N\gamma\).

Finally, when \(t \ll \tau_0\), the factor \(1 - \exp(-2t/\tau_j)\) may be expanded to first order in t. One then obtains \(\text{CVar}[\mathbf{x}_n(t)] = (6 k_B T / \gamma)\, t\), which is independent of the force constant k. In this regime, the monomers don't have time to respond to the force from their neighbors, hence they each diffuse independently. On such short time scales, however, one should check to make sure that inertial effects can be ignored, i.e. that \(t \gg M/\gamma\).
One serious defect of the Rouse model is its prediction of the relaxation time of the j = 1 mode, \(\tau_1 \propto N^2\). The experimentally observed result is \(\tau_1 \propto N^{3/2}\). We should stress here that the Rouse model applies to ideal chains. In the theory of polymer solutions, a theta solvent is one in which polymer coils act as ideal chains. An extension of the Rouse model, due to my former UCSD colleague Bruno Zimm, accounts for hydrodynamically-mediated interactions between any pair of 'beads' along the chain. Specifically, the Zimm model is given by
\[ \frac{d x_n^\mu}{dt} = \sum_{n'} H^{\mu\nu}(\mathbf{x}_n - \mathbf{x}_{n'}) \Big[ k\, \big( x_{n'+1}^\nu + x_{n'-1}^\nu - 2 x_{n'}^\nu \big) + f_{n'}^\nu(t) \Big]\ , \tag{6.6.34} \]
where
\[ H^{\mu\nu}(\mathbf{R}) = \frac{1}{6\pi \eta R} \big( \delta^{\mu\nu} + \hat{R}^\mu \hat{R}^\nu \big) \tag{6.6.35} \]
is known as the Oseen hydrodynamic tensor (1927) and arises when computing the velocity in a fluid at position \(\mathbf{R}\) when a point force \(\mathbf{F} = \mathbf{f}\, \delta(\mathbf{r})\) is applied at the origin. Typically one replaces \(H(\mathbf{R})\) by its average over the equilibrium distribution of polymer configurations. Zimm's model more correctly reproduces the behavior of polymers in θ-solvents.

Flory Theory of Self-Avoiding Walks
What is missing from the random walk free energy is the effect of steric interactions. An argument due to Flory takes these interactions into account in a mean field treatment. Suppose we have a chain of radius R. Then the average monomer density within the chain is \(c = N/R^d\). Assuming short-ranged interactions, we should then add a term to the free energy which effectively counts the number of near self-intersections of the chain. This number should be roughly N c. Thus, we write
\[ F(R, N) = F_0 + \frac{1}{2}\, u(T)\, \frac{N^2}{R^d} + \frac{1}{2}\, d\, k_B T\, \frac{R^2}{N a^2}\ . \tag{6.6.36} \]
The effective interaction u(T) is positive in the case of a so-called 'good solvent'.

The free energy is minimized when
\[ 0 = \frac{\partial F}{\partial R} = -\frac{d\, u\, N^2}{2\, R^{d+1}} + d\, k_B T\, \frac{R}{N a^2}\ , \tag{6.6.37} \]
which yields the result
\[ R_F(N) = \left( \frac{u\, a^2}{2\, k_B T} \right)^{\!1/(d+2)} N^{3/(d+2)} \propto N^\nu\ . \tag{6.6.38} \]

Thus, we obtain \(\nu = 3/(d+2)\). In d = 1 this says ν = 1, which is exactly correct because a SAW in d = 1 has no option but to keep going in the same direction. In d = 2, Flory theory predicts \(\nu = \frac{3}{4}\), which is also exact. In d = 3, we have \(\nu_{d=3} = \frac{3}{5}\), which is extremely close to the numerical value ν = 0.5880. Flory theory is again exact at the SAW upper critical dimension, which is d = 4, where \(\nu = \frac{1}{2}\), corresponding to a Gaussian random walk\(^{13}\). Best. Mean. Field. Theory. Ever.
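The Flory exponent can be extracted numerically by minimizing the mean field free energy F(R) over R for several chain lengths and fitting the slope of \(\ln R_F\) versus \(\ln N\). The units \(u = a = k_B T = 1\) below are an arbitrary illustrative choice:

```python
# Extract ν = 3/(d+2) by minimizing the Flory free energy on a grid of R
# for several N and fitting the log-log slope.  Units u = a = kBT = 1.
import numpy as np

d, u, a, kBT = 3, 1.0, 1.0, 1.0

def R_flory(N):
    # minimize F(R) = (u/2) N²/R^d + (d kBT/2) R²/(N a²) over a dense R grid
    R = np.linspace(0.5 * N**0.5, 20 * N**0.5, 200_001)
    F = 0.5 * u * N**2 / R**d + 0.5 * d * kBT * R**2 / (N * a**2)
    return R[np.argmin(F)]

Ns = np.array([200, 400, 800, 1600])
Rs = np.array([R_flory(N) for N in Ns])
nu = np.polyfit(np.log(Ns), np.log(Rs), 1)[0]   # fitted exponent; expect ν = 3/5
```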

How well are polymers described as SAWs? Figure 6.6.4 shows the radius of gyration \(R_g\) versus molecular weight M for polystyrene chains in a toluene and benzene solvent. The slope is \(\nu = d \ln R_g / d \ln M = 0.5936\). Experimental results can vary with concentration and temperature, but generally confirm the validity of the SAW model.
For a SAW under an external force, we compute the Gibbs partition function,
\[ Y(\mathbf{F}_{\rm ext}, N) = \int\! d^d R\ P_N(\mathbf{R})\, e^{\mathbf{F}_{\rm ext} \cdot \mathbf{R} / k_B T} = \int\! d^d x\ f(\mathbf{x})\, e^{s\, \hat{\mathbf{n}} \cdot \mathbf{x}}\ , \tag{6.6.39} \]
where \(\mathbf{x} = \mathbf{R}/R_F\), \(s = F_{\rm ext}\, R_F / k_B T\), and \(\hat{\mathbf{n}} = \hat{\mathbf{F}}_{\rm ext}\). One then has \(R(F_{\rm ext}) = R_F\, \Phi(R_F / \xi)\), where \(\xi = k_B T / F_{\rm ext}\). For small values of its argument one has Φ(u) ∝ u, so that \(R(F_{\rm ext}) = F_{\rm ext}\, R_F^2 / k_B T\). For large u it can be shown that \(R(F_{\rm ext}) \propto (F_{\rm ext}\, R_F / k_B T)^{2/3}\).

Figure 6.6.4: Radius of gyration \(R_g\) of polystyrene in a toluene and benzene solvent, plotted as a function of molecular weight of the polystyrene. The best fit corresponds to a power law \(R_g \propto M^\nu\) with ν = 0.5936. From J. Des Cloizeaux and G. Jannink, Polymers in Solution: Their Modeling and Structure (Oxford, 1990).


On a lattice of coordination number z, the number of N-step random walks starting from the origin is \(\Omega_N = z^N\). If we constrain our random walks to be self-avoiding, the number is reduced to
\[ \Omega_N^{\rm SAW} = \mathcal{C}\, N^{\gamma-1}\, y^N\ , \tag{6.6.40} \]
where \(\mathcal{C}\) and γ are dimension-dependent constants, and we expect \(y \lesssim z - 1\), since at the very least a SAW cannot immediately double back on itself. In fact, on the cubic lattice one has z = 6 but y = 4.68, slightly less than z − 1. One finds \(\gamma_{d=2} \simeq \frac{43}{32}\) and \(\gamma_{d=3} \simeq \frac{7}{6}\). The RMS end-to-end distance of the SAW is
\[ R_F = a\, N^\nu\ , \tag{6.6.41} \]

where a and ν are d-dependent constants, with \(\nu_{d=1} = 1\), \(\nu_{d=2} = \frac{3}{4}\), and \(\nu_{d=3} \simeq \frac{3}{5}\). The distribution \(P_N(\mathbf{R})\) has a scaling form,
\[ P_N(\mathbf{R}) = \frac{1}{R_F^d}\ f\!\left( \frac{R}{R_F} \right) \qquad (a \ll R \ll N a)\ . \tag{6.6.42} \]
One finds
\[ f(x) \sim \begin{cases} x^g & x \ll 1 \\ \exp(-x^\delta) & x \gg 1\ , \end{cases} \tag{6.6.43} \]
with \(g = (\gamma - 1)/\nu\) and \(\delta = 1/(1-\nu)\).

Polymers and Solvents


Consider a solution of monodisperse polymers of length N in a solvent. Let ϕ be the dimensionless monomer concentration, so ϕ/N is the dimensionless polymer concentration and \(\phi_s = 1 - \phi\) is the dimensionless solvent concentration. (Dimensionless concentrations are obtained by dividing the corresponding dimensionful concentration by the overall density.) The entropy of mixing for such a system is given by Equation 2.352. We have
\[ S_{\rm mix} = -\frac{V k_B}{v_0} \cdot \left\{ \frac{1}{N}\, \phi \ln\phi + (1-\phi) \ln(1-\phi) \right\}\ , \tag{6.6.44} \]
where \(v_0 \propto a^3\) is the volume per monomer. Accounting for an interaction between the monomer and the solvent, we have that the free energy of mixing is
\[ \frac{v_0\, F_{\rm mix}}{V k_B T} = \frac{1}{N}\, \phi \ln\phi + (1-\phi) \ln(1-\phi) + \chi\, \phi\, (1-\phi)\ , \tag{6.6.45} \]
where χ is the dimensionless polymer-solvent interaction, called the Flory parameter. This provides a mean field theory of the polymer-solvent system.
The osmotic pressure Π is defined by
\[ \Pi = -\frac{\partial F_{\rm mix}}{\partial V}\bigg|_{N_p}\ , \tag{6.6.46} \]
which is the variation of the free energy of mixing with respect to volume holding the number of polymers constant. The monomer concentration is \(\phi = N N_p v_0 / V\), so
\[ \frac{\partial}{\partial V}\bigg|_{N_p} = -\frac{\phi^2}{N N_p\, v_0}\, \frac{\partial}{\partial \phi}\bigg|_{N_p}\ . \tag{6.6.47} \]
Now we have
\[ F_{\rm mix} = N N_p\, k_B T \left\{ \frac{1}{N} \ln\phi + \big( \phi^{-1} - 1 \big) \ln(1-\phi) + \chi\, (1-\phi) \right\}\ , \tag{6.6.48} \]
and therefore
\[ \Pi = \frac{k_B T}{v_0} \Big[ \big( N^{-1} - 1 \big)\, \phi - \ln(1-\phi) - \chi\, \phi^2 \Big]\ . \tag{6.6.49} \]

In the limit of vanishing monomer concentration ϕ → 0, we recover
\[ \Pi = \frac{\phi\, k_B T}{N v_0}\ , \tag{6.6.50} \]
which is the ideal gas law for polymers.
For \(N^{-1} \ll \phi \ll 1\), we expand the logarithm and obtain
\[ \frac{v_0\, \Pi}{k_B T} = \frac{\phi}{N} + \frac{1}{2}\, (1 - 2\chi)\, \phi^2 + \mathcal{O}(\phi^3) \approx \frac{1}{2}\, (1 - 2\chi)\, \phi^2\ . \]
Note that Π > 0 only if \(\chi < \frac{1}{2}\), which is the condition for a 'good solvent'.
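A quick numerical comparison of the full mean field osmotic pressure with its expansion shows the ϕ² form taking over in the regime \(N^{-1} \ll \phi \ll 1\). The values of N, χ, and the ϕ grid below are illustrative:

```python
# Compare the exact mean-field osmotic pressure (in units of kBT/v0) with its
# small-ϕ expansion ϕ/N + (1/2)(1 - 2χ)ϕ².  N, χ, ϕ-grid are illustrative.
import numpy as np

N, chi = 100, 0.3
phi = np.linspace(0.02, 0.1, 9)

Pi_exact = (1.0 / N - 1.0) * phi - np.log(1.0 - phi) - chi * phi**2
Pi_approx = phi / N + 0.5 * (1.0 - 2.0 * chi) * phi**2

rel = np.max(np.abs(Pi_exact - Pi_approx) / Pi_exact)   # O(ϕ³/3) correction
```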
In fact, the expansion above is only qualitatively correct. In the limit where \(\chi \ll \frac{1}{2}\), Flory showed that the individual polymer coils behave much as hard spheres of radius \(R_F\). The osmotic pressure then satisfies something analogous to a virial equation of state:
\[ \begin{split} \frac{\Pi}{k_B T} &= \frac{\phi}{N v_0} + A \left( \frac{\phi}{N v_0} \right)^{\!2} R_F^3 + \ldots \\ &= \frac{\phi}{N v_0}\ h(\phi / \phi^*)\ . \end{split} \]
This is generalized to a scaling form in the second line, where h(x) is a scaling function, and \(\phi^* = N v_0 / R_F^3 \propto N^{-4/5}\), assuming d = 3 and \(\nu = \frac{3}{5}\) from Flory theory. As \(x = \phi/\phi^* \to 0\), we must recover the ideal gas law, so h(x) = 1 + O(x) in this limit. For x → ∞, we require that the result be independent of the degree of polymerization N. Since \(\phi^* \propto N^{-4/5}\), this means \(h(x) \propto x^p\) with \(p = \frac{5}{4}\). The result is known as the des Cloizeaux law:
\[ \frac{v_0\, \Pi}{k_B T} = C\, \phi^{9/4}\ , \tag{6.6.51} \]
where C is a constant. This is valid for what is known as semi-dilute solutions, where \(\phi^* \ll \phi \ll 1\). In the dense limit ϕ ∼ 1, the results do not exhibit this universality, and we must appeal to liquid state theory, which is no fun at all.

This page titled 6.6: Polymers is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

6.7: Appendix I- Potts Model in One Dimension
Definition
The Potts model is defined by the Hamiltonian

\[ H = -J \sum_{\langle ij \rangle} \delta_{\sigma_i, \sigma_j} - h \sum_i \delta_{\sigma_i, 1}\ . \tag{6.7.1} \]

Here, the spin variables \(\sigma_i\) take values in the set {1, 2, …, q} on each site. The equivalent of an external magnetic field in the Ising case is a field h which prefers a particular value of σ (σ = 1 in the above Hamiltonian). Once again, it is not possible to compute the partition function on general lattices, however in one dimension we may once again find Z using the transfer matrix method.

Transfer matrix
On a ring of N sites, we have
\[ \begin{split} Z &= \mathrm{Tr}\, e^{-\beta H} \\ &= \sum_{\{\sigma_n\}} e^{\beta h \delta_{\sigma_1, 1}}\, e^{\beta J \delta_{\sigma_1, \sigma_2}} \cdots e^{\beta h \delta_{\sigma_N, 1}}\, e^{\beta J \delta_{\sigma_N, \sigma_1}} \\ &= \mathrm{Tr}\,\big( R^N \big)\ , \end{split} \]

where the q × q transfer matrix R is given by
\[ R_{\sigma\sigma'} = e^{\beta J \delta_{\sigma\sigma'}}\, e^{\beta h \delta_{\sigma,1}/2}\, e^{\beta h \delta_{\sigma',1}/2} = \begin{cases} e^{\beta(J+h)} & \text{if } \sigma = \sigma' = 1 \\ e^{\beta J} & \text{if } \sigma = \sigma' \neq 1 \\ e^{\beta h/2} & \text{if } \sigma = 1 \text{ and } \sigma' \neq 1 \\ e^{\beta h/2} & \text{if } \sigma \neq 1 \text{ and } \sigma' = 1 \\ 1 & \text{if } \sigma \neq 1 \text{ and } \sigma' \neq 1 \text{ and } \sigma \neq \sigma'\ . \end{cases} \tag{6.7.2} \]

In matrix form,
\[ R = \begin{pmatrix} e^{\beta(J+h)} & e^{\beta h/2} & e^{\beta h/2} & \cdots & e^{\beta h/2} \\ e^{\beta h/2} & e^{\beta J} & 1 & \cdots & 1 \\ e^{\beta h/2} & 1 & e^{\beta J} & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e^{\beta h/2} & 1 & 1 & \cdots & e^{\beta J} \end{pmatrix}\ . \tag{6.7.3} \]

The matrix R has q eigenvalues \(\lambda_j\), with j = 1, …, q. The partition function for the Potts chain is then
\[ Z = \sum_{j=1}^q \lambda_j^N\ . \tag{6.7.4} \]

We can actually find the eigenvalues of R analytically. To this end, consider the vectors

\[ \phi = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad , \quad \psi = \left(q - 1 + e^{\beta h}\right)^{-1/2} \begin{pmatrix} e^{\beta h/2} \\ 1 \\ \vdots \\ 1 \end{pmatrix}\ . \tag{6.7.5} \]

Then R may be written as

\[ R = \left(e^{\beta J} - 1\right) \mathbb{I} + \left(q - 1 + e^{\beta h}\right) |\,\psi\,\rangle\langle\,\psi\,| + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right) |\,\phi\,\rangle\langle\,\phi\,|\ , \tag{6.7.6} \]

where \(\mathbb{I}\) is the q × q identity matrix. When h = 0, we have a simpler form,

\[ R = \left(e^{\beta J} - 1\right) \mathbb{I} + q\, |\,\psi\,\rangle\langle\,\psi\,|\ . \tag{6.7.7} \]

From this we can read off the eigenvalues:

\[ \begin{aligned} \lambda_1 &= e^{\beta J} + q - 1 \\ \lambda_j &= e^{\beta J} - 1 \quad , \quad j \in \{2, \ldots, q\}\ , \end{aligned} \]

since |ψ⟩ is an eigenvector with eigenvalue λ = e^{βJ} + q − 1, and any vector orthogonal to |ψ⟩ has eigenvalue λ = e^{βJ} − 1. The partition function is then

\[ Z = \left(e^{\beta J} + q - 1\right)^N + (q - 1)\left(e^{\beta J} - 1\right)^N\ . \tag{6.7.8} \]
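As a quick numerical sanity check (a sketch, with arbitrarily chosen values of q, N, and βJ), one can compare this closed form against a direct evaluation of Tr(R^N) at h = 0:

```python
import math

# Build the q x q Potts transfer matrix at h = 0 and compare Tr(R^N)
# with the closed form (e^{bJ} + q - 1)^N + (q - 1)(e^{bJ} - 1)^N.
q, N, beta, J = 4, 10, 0.7, 1.3
x = math.exp(beta * J)

# R_{ss'} = exp(beta * J * delta_{ss'}) when h = 0
R = [[x if s == sp else 1.0 for sp in range(q)] for s in range(q)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(q)) for j in range(q)]
            for i in range(q)]

P = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]  # identity
for _ in range(N):
    P = matmul(P, R)
Z_numeric = sum(P[i][i] for i in range(q))          # Tr(R^N)

Z_analytic = (x + q - 1)**N + (q - 1) * (x - 1)**N  # Eq. 6.7.8
print(abs(Z_numeric / Z_analytic - 1.0) < 1e-10)    # True
```

The agreement is exact up to floating-point roundoff, since the two expressions are algebraically identical.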

In the thermodynamic limit N → ∞, only the λ₁ eigenvalue contributes, and we have

\[ F(T, N, h = 0) = -N k_B T \ln\!\left(e^{J/k_B T} + q - 1\right) \qquad \text{for } N \to \infty\ . \tag{6.7.9} \]

When h is nonzero, the calculation becomes somewhat more tedious, but still relatively easy. The problem is that |ψ⟩ and |φ⟩ are not orthogonal, so we define

\[ |\,\chi\,\rangle = \frac{|\,\phi\,\rangle - |\,\psi\,\rangle\langle\,\psi\,|\,\phi\,\rangle}{\sqrt{1 - \langle\,\phi\,|\,\psi\,\rangle^2}}\ , \tag{6.7.10} \]

where

\[ x \equiv \langle\,\phi\,|\,\psi\,\rangle = \left(\frac{e^{\beta h}}{q - 1 + e^{\beta h}}\right)^{\!1/2}\ . \tag{6.7.11} \]

Now we have ⟨χ|ψ⟩ = 0, with ⟨χ|χ⟩ = 1 and ⟨ψ|ψ⟩ = 1, with

\[ |\,\phi\,\rangle = \sqrt{1 - x^2}\; |\,\chi\,\rangle + x\, |\,\psi\,\rangle\ , \tag{6.7.12} \]

and the transfer matrix is then

\[ \begin{aligned} R &= \left(e^{\beta J} - 1\right) \mathbb{I} + \left(q - 1 + e^{\beta h}\right) |\,\psi\,\rangle\langle\,\psi\,| \\ &\quad + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right) \Big[ \left(1 - x^2\right) |\,\chi\,\rangle\langle\,\chi\,| + x^2\, |\,\psi\,\rangle\langle\,\psi\,| + x \sqrt{1 - x^2}\, \big(|\,\chi\,\rangle\langle\,\psi\,| + |\,\psi\,\rangle\langle\,\chi\,|\big) \Big] \\ &= \left(e^{\beta J} - 1\right) \mathbb{I} + \left[ \left(q - 1 + e^{\beta h}\right) + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right) \left(\frac{e^{\beta h}}{q - 1 + e^{\beta h}}\right) \right] |\,\psi\,\rangle\langle\,\psi\,| \\ &\quad + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right) \left(\frac{q - 1}{q - 1 + e^{\beta h}}\right) |\,\chi\,\rangle\langle\,\chi\,| \\ &\quad + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right) \frac{\left[(q - 1)\, e^{\beta h}\right]^{1/2}}{q - 1 + e^{\beta h}} \,\big(|\,\chi\,\rangle\langle\,\psi\,| + |\,\psi\,\rangle\langle\,\chi\,|\big)\ , \end{aligned} \]

which in the two-dimensional subspace spanned by |χ⟩ and |ψ⟩ is of the form

\[ R = \begin{pmatrix} a & c \\ c & b \end{pmatrix}\ . \tag{6.7.13} \]

Recall that for any 2 × 2 Hermitian matrix,

\[ M = a_0\, \mathbb{I} + \mathbf{a} \cdot \boldsymbol{\tau} = \begin{pmatrix} a_0 + a_3 & a_1 - i a_2 \\ a_1 + i a_2 & a_0 - a_3 \end{pmatrix}\ , \]

the characteristic polynomial is

\[ P(\lambda) = \det\left(\lambda\, \mathbb{I} - M\right) = (\lambda - a_0)^2 - a_1^2 - a_2^2 - a_3^2\ , \tag{6.7.14} \]

and hence the eigenvalues are

\[ \lambda_\pm = a_0 \pm \sqrt{a_1^2 + a_2^2 + a_3^2}\ . \tag{6.7.15} \]

For the transfer matrix of Equation 6.7.13, we obtain, after a little work,

\[ \begin{aligned} \lambda_{1,2} &= e^{\beta J} - 1 + \tfrac{1}{2}\left[\,q - 1 + e^{\beta h} + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right)\right] \\ &\quad \pm \tfrac{1}{2}\sqrt{\left[\,q - 1 + e^{\beta h} + \left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right)\right]^2 - 4\,(q - 1)\left(e^{\beta J} - 1\right)\left(e^{\beta h} - 1\right)}\ . \end{aligned} \]

There are q − 2 other eigenvalues, however, associated with the (q−2)-dimensional subspace orthogonal to |χ⟩ and |ψ⟩. Clearly all these eigenvalues are given by

\[ \lambda_j = e^{\beta J} - 1 \quad , \quad j \in \{3, \ldots, q\}\ . \tag{6.7.16} \]

The partition function is then

\[ Z = \lambda_1^N + \lambda_2^N + (q - 2)\, \lambda_3^N\ , \tag{6.7.17} \]

and in the thermodynamic limit N → ∞ the maximum eigenvalue λ₁ dominates. Note that we recover the correct limit as h → 0.

This page titled 6.7: Appendix I- Potts Model in One Dimension is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

6.S: Summary
References
M. Kardar, Statistical Physics of Particles (Cambridge, 2007) A superb modern text, with many insightful presentations of key
concepts.
L. E. Reichl, A Modern Course in Statistical Physics (2nd edition, Wiley, 1998) A comprehensive graduate level text with an
emphasis on nonequilibrium phenomena.
M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006) An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.
E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics (part I, 3rd edition, Pergamon, 1980) This is volume 5 in the famous Landau and Lifshitz Course of Theoretical Physics. Though dated, it still contains a wealth of information and physical insight.
J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids (Academic Press, 1990) An advanced, detailed discussion of liquid state physics.

Summary
∙ Lattice-based models: Amongst the many lattice-based models of physical interest are

\[ \begin{aligned} \hat H_{\rm Ising} &= -J \sum_{\langle ij \rangle} \sigma_i\, \sigma_j - H \sum_i \sigma_i \quad &;& \quad \sigma_i \in \{-1, +1\} \\ \hat H_{\rm Potts} &= -J \sum_{\langle ij \rangle} \delta_{\sigma_i, \sigma_j} - H \sum_i \delta_{\sigma_i, 1} \quad &;& \quad \sigma_i \in \{1, \ldots, q\} \\ \hat H_{{\rm O}(n)} &= -J \sum_{\langle ij \rangle} \hat{\mathbf{n}}_i \cdot \hat{\mathbf{n}}_j - \mathbf{H} \cdot \sum_i \hat{\mathbf{n}}_i \quad &;& \quad \hat{\mathbf{n}}_i \in S^{n-1}\ . \end{aligned} \]

Here J is the coupling between neighboring sites and H (or **H**) is a polarizing field which breaks a global symmetry (groups Z₂, S_q, and O(n), respectively). J > 0 describes a ferromagnet and J < 0 an antiferromagnet. One can generalize to include further neighbor interactions, described by a matrix of couplings J_ij. When J = 0, the degrees of freedom at each site are independent, and Z(T, N, J = 0, H) = ζ^N, where ζ(T, H) is the single site partition function. When J ≠ 0 it is in general impossible to compute the partition function analytically, except in certain special cases.


∙ Transfer matrix solution in d = 1: One such special case is that of one-dimensional systems. In that case, one can write Z = Tr(R^N), where R is the transfer matrix. Consider a general one-dimensional model with nearest-neighbor interactions and Hamiltonian

\[ \hat H = -\sum_n U(\alpha_n, \alpha_{n+1}) - \sum_n W(\alpha_n)\ , \tag{6.S.1} \]

where α_n describes the local degree of freedom, which could be discrete or continuous, single component or multi-component. Then

\[ R_{\alpha\alpha'} = e^{U(\alpha, \alpha')/k_B T}\, e^{W(\alpha')/k_B T}\ . \tag{6.S.2} \]

The form of the transfer matrix is not unique, although its eigenvalues are. We could have taken \(R_{\alpha\alpha'} = e^{W(\alpha)/2 k_B T}\, e^{U(\alpha, \alpha')/k_B T}\, e^{W(\alpha')/2 k_B T}\), for example. The interaction matrix U(α, α′) may or may not be symmetric itself. On a ring of N sites, one has \(Z = \sum_{i=1}^K \lambda_i^N\), where {λ_i} are the eigenvalues and K the rank of R. In the thermodynamic limit, the partition function is dominated by the eigenvalue with the largest magnitude.


∙ Higher dimensions: For one-dimensional classical systems with finite range interactions, the thermodynamic properties vary smoothly with temperature for all T > 0. The lower critical dimension d_ℓ of a model is the dimension at or below which there is no finite temperature phase transition. For models with discrete global symmetry groups, d_ℓ = 1, while for continuous global symmetries d_ℓ = 2. In zero external field the (d = 2) square lattice Ising model has a critical temperature T_c = 2.269 J. On the honeycomb lattice, T_c = 1.519 J. For the O(3) model on the cubic lattice, T_c = 4.515 J. In general, for unfrustrated systems, one expects for d > d_ℓ that T_c ∝ z, where z is the lattice coordination number (the number of nearest neighbors).
∙ Nonideal classical gases: For \(\hat H = \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m} + \sum_{i<j} u(|\mathbf{x}_i - \mathbf{x}_j|)\), one has \(Z(T, V, N) = \lambda_T^{-Nd}\, Q_N(T, V)\), where

\[ Q_N(T, V) = \frac{1}{N!} \int\! d^d x_1 \cdots \int\! d^d x_N \prod_{i<j} e^{-u(r_{ij})/k_B T} \tag{6.S.3} \]

is the configuration integral. For the one-dimensional Tonks gas of N hard rods of length a confined to the region x ∈ [0, L], one finds Q_N(T, L) = (L − Na)^N / N!, whence the equation of state p = nk_BT/(1 − na). For more complicated interactions, or in higher dimensions, the configuration integral is analytically intractable.
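The Tonks gas equation of state can be spot-checked numerically. A minimal sketch (arbitrary parameter values): since p = k_BT ∂ln Q_N/∂L, numerical differentiation of ln Q_N should reproduce p = nk_BT/(1 − na):

```python
import math

# Tonks gas: Q_N = (L - N a)^N / N!.  Numerically differentiate ln Q_N
# with respect to L and compare with p = n kT / (1 - n a).
kT, N, a, L = 1.0, 50, 0.1, 20.0
n = N / L

def lnQ(L):
    # lgamma(N + 1) = ln(N!)
    return N * math.log(L - N * a) - math.lgamma(N + 1)

dL = 1e-6
p_numeric = kT * (lnQ(L + dL) - lnQ(L - dL)) / (2 * dL)
p_analytic = n * kT / (1 - n * a)
print(abs(p_numeric - p_analytic) < 1e-6)   # True
```

Note that the N! factor drops out of the pressure, since it does not depend on L.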


∙ Mayer cluster expansion: Writing the Mayer function \(f_{ij} \equiv e^{-u_{ij}/k_B T} - 1\), and assuming \(\int\! d^d r\, f(r)\) is finite, one can expand the pressure p(T, z) and n(T, z) as power series in the fugacity z = exp(μ/k_BT), viz.

\[ \begin{aligned} p / k_B T &= \sum_\gamma \left(z \lambda_T^{-d}\right)^{n_\gamma} b_\gamma \\ n &= \sum_\gamma n_\gamma \left(z \lambda_T^{-d}\right)^{n_\gamma} b_\gamma\ . \end{aligned} \]

The sum is over unlabeled connected clusters γ, and n_γ is the number of vertices in γ. The cluster integral b_γ(T) is obtained by assigning labels {1, …, n_γ} to all the vertices, and computing

\[ b_\gamma(T) \equiv \frac{1}{s_\gamma} \cdot \frac{1}{V} \int\! d^d x_1 \cdots d^d x_{n_\gamma} \prod_{i<j}^{\gamma} f_{ij}\ , \tag{6.S.4} \]

where f_ij appears in the product if there is a link between vertices i and j. s_γ is the symmetry factor of the cluster, defined to be the number of elements from the symmetric group \(S_{n_\gamma}\) which, acting on the labels, would leave the product \(\prod_{i<j}^\gamma f_{ij}\) invariant. By definition, a cluster consisting of a single site has b_∙ = 1. Translational invariance implies b_γ(T) ∝ V⁰. One then inverts n(T, z) to obtain z(T, n), and inserting the result into the equation for p(T, z) one obtains the virial expansion of the equation of state,

\[ p = n k_B T \left\{ 1 + B_2(T)\, n + B_3(T)\, n^2 + \ldots \right\}\ , \tag{6.S.5} \]

where

\[ B_k(T) = -\frac{1}{k\,(k-2)!} \sum_{\gamma \in \Gamma_k} \int\! d^d x_1 \cdots \int\! d^d x_{k-1} \prod_{\langle ij \rangle}^{\gamma} f_{ij} \tag{6.S.6} \]

with Γ_k the set of all one-particle irreducible k-site clusters. An irreducible cluster is a connected cluster which does not break apart into more than one piece if any of its sites and all of that site's connecting links are removed from the graph. Any site whose removal, along with all its connecting links, would result in a disconnected graph is called an articulation point. Irreducible clusters have no articulation points.
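The k = 2 case of Equation 6.S.6 gives \(B_2(T) = -\tfrac{1}{2}\int\! d^3r\, f(r)\). A short sketch for hard spheres of diameter a, where f(r) = −1 for r < a and zero otherwise, recovers the standard result B₂ = 4ω₀, four times the volume ω₀ = πa³/6 of one sphere:

```python
import math

# B_2 = -(1/2) * 4*pi * \int_0^a r^2 f(r) dr for a hard-sphere Mayer function
# f(r) = -1 (r < a), 0 (r > a); midpoint-rule quadrature.
a = 1.0
M = 200000
dr = a / M
B2 = -0.5 * 4 * math.pi * sum(((i + 0.5) * dr)**2 * (-1.0) for i in range(M)) * dr

omega0 = math.pi * a**3 / 6           # volume of one sphere of diameter a
print(abs(B2 - 4 * omega0) < 1e-6)    # True
```

This is the same B₂ = 4ω₀ quoted later in the discussion of the van der Waals b parameter.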
∙ Liquids: In the ordinary canonical ensemble,

\[ P(\mathbf{x}_1, \ldots, \mathbf{x}_N) = Q_N^{-1} \cdot \frac{1}{N!}\, e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)}\ , \tag{6.S.7} \]

where W is the total potential energy, and Q_N is the configuration integral,

\[ Q_N(T, V) = \frac{1}{N!} \int\! d^d x_1 \cdots \int\! d^d x_N\; e^{-\beta W(\mathbf{x}_1, \ldots, \mathbf{x}_N)}\ . \tag{6.S.8} \]

We can use P, or its grand canonical generalization, to compute thermal averages, such as the average local density

\[ n_1(\mathbf{r}) = \Big\langle \sum_i \delta(\mathbf{r} - \mathbf{x}_i) \Big\rangle = N \int\! d^d x_2 \cdots \int\! d^d x_N\; P(\mathbf{r}, \mathbf{x}_2, \ldots, \mathbf{x}_N)\ , \]

and the two-particle density n₂(r₁, r₂), defined by

\[ n_2(\mathbf{r}_1, \mathbf{r}_2) = \Big\langle \sum_{i \neq j} \delta(\mathbf{r}_1 - \mathbf{x}_i)\, \delta(\mathbf{r}_2 - \mathbf{x}_j) \Big\rangle = N(N-1) \int\! d^d x_3 \cdots \int\! d^d x_N\; P(\mathbf{r}_1, \mathbf{r}_2, \mathbf{x}_3, \ldots, \mathbf{x}_N)\ . \]
∙ Pair distribution function: For translationally invariant simple fluids consisting of identical point particles interacting by a two-body central potential u(r), the thermodynamic properties follow from the behavior of the pair distribution function (pdf),

\[ g(r) = \frac{1}{V n^2} \Big\langle \sum_{i \neq j} \delta(\mathbf{r} - \mathbf{x}_i + \mathbf{x}_j) \Big\rangle\ , \tag{6.S.9} \]

where V is the total volume and n = N/V the average density. The average energy per particle is then

\[ \varepsilon(n, T) = \frac{\langle E \rangle}{N} = \frac{3}{2}\, k_B T + 2\pi n \int_0^\infty\! dr\, r^2\, g(r)\, u(r)\ . \tag{6.S.10} \]

Here g(r) is implicitly dependent on n and T as well. In the grand canonical ensemble, the pdf satisfies the compressibility sum rule, \(\int\! d^3 r\, [g(r) - 1] = k_B T\, \kappa_T - n^{-1}\), where κ_T is the isothermal compressibility. Note g(∞) = 1. The pdf also implies the virial equation of state,

\[ p = n k_B T - \frac{2}{3}\, \pi n^2 \int_0^\infty\! dr\, r^3\, g(r)\, u'(r)\ . \tag{6.S.11} \]

∙ Scattering: Scattering experiments are sensitive to momentum transfer ℏq and energy transfer ℏω, and allow determination of the dynamic structure factor

\[ \begin{aligned} S(\mathbf{q}, \omega) &= \frac{1}{N} \int_{-\infty}^{\infty}\! dt\; e^{i\omega t} \Big\langle \sum_{l, l'} e^{i \mathbf{q} \cdot \mathbf{x}_l(0)}\, e^{-i \mathbf{q} \cdot \mathbf{x}_{l'}(t)} \Big\rangle_T \\ &= \frac{2\pi\hbar}{N} \sum_i P_i \sum_j \Big| \Big\langle\, j\, \Big|\, \sum_{l=1}^N e^{-i\mathbf{q} \cdot \mathbf{x}_l}\, \Big|\, i\, \Big\rangle \Big|^2\, \delta(E_i - E_j + \hbar\omega)\ , \end{aligned} \]

where |i⟩ and |j⟩ are (quantum) states of the system being studied, and P_i is the equilibrium probability for state i.¹ Integrating over all frequency, one obtains the static structure factor,

\[ \begin{aligned} S(\mathbf{q}) &= \int_{-\infty}^{\infty}\! \frac{d\omega}{2\pi}\; S(\mathbf{q}, \omega) = \frac{1}{N} \sum_{l, l'} \big\langle e^{i\mathbf{q} \cdot (\mathbf{x}_l - \mathbf{x}_{l'})} \big\rangle \\ &= N\, \delta_{\mathbf{q}, 0} + 1 + n \int\! d^d r\; e^{-i\mathbf{q} \cdot \mathbf{r}}\, [g(r) - 1]\ . \end{aligned} \]

∙ Theories of fluid structure – The BBGKY hierarchy is a set of coupled integrodifferential equations relating k- and (k+1)-particle distribution functions. In order to make progress, a truncation must be performed, expressing higher order distributions in terms of lower order ones. This results in various theories of fluids, known by their defining equations for the pdf g(r). Examples include the Born-Green-Yvon equation, the Percus-Yevick equation, the hypernetted chains equation, and the Ornstein-Zernike approximation. Except in the simplest cases (such as the OZ approximation), these equations must be solved numerically. The OZ approximation deserves special mention. There we write

\[ S(q) \approx \frac{1}{(R/\xi)^2 + R^2 q^2} \]

for small q, where ξ(T) is the correlation length and R(T) is related to the range of interactions.


∙ Debye-Hückel theory – Due to the long-ranged nature of the Coulomb interaction, the Mayer function decays so slowly as r → ∞ that it is not integrable, so the virial expansion is problematic. Progress can be made by a self-consistent mean field approach. For a system consisting of charges ±e, one assumes a local electrostatic potential ϕ(r). Boltzmann statistics then gives a charge density

\[ \rho(\mathbf{r}) = e \lambda_+^{-d} z_+\, e^{-e\phi(\mathbf{r})/k_B T} - e \lambda_-^{-d} z_-\, e^{e\phi(\mathbf{r})/k_B T}\ , \tag{6.S.12} \]

where λ_± and z_± are the thermal de Broglie wavelengths and fugacities for the + and − species. Assuming overall charge neutrality at infinity, one has \(\lambda_+^{-d} z_+ = \lambda_-^{-d} z_- = n_\infty\), and we have ρ(r) = −2en_∞ sinh(eϕ(r)/k_BT). The local potential is then determined self-consistently, using Poisson's equation:

\[ \nabla^2 \phi = 8\pi e\, n_\infty \sinh(e\phi / k_B T) - 4\pi \rho_{\rm ext}\ . \tag{6.S.13} \]

If eϕ ≪ k_BT, we can expand the sinh function to obtain \(\nabla^2 \phi = \kappa_D^2\, \phi - 4\pi \rho_{\rm ext}\), where the Debye screening wavevector is \(\kappa_D = \left(8\pi n_\infty e^2 / k_B T\right)^{1/2}\). The self-consistent potential arising from a point charge ρ_ext(r) = Q δ(r) is then of the Yukawa form ϕ(r) = Q exp(−κ_D r)/r in three space dimensions.
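The Yukawa form can be verified directly. A minimal sketch, using the fact that for a spherically symmetric function ∇²ϕ = (1/r) d²(rϕ)/dr², together with a central finite difference:

```python
import math

# Verify that phi(r) = Q exp(-kappa r)/r obeys the linearized equation
# laplacian(phi) = kappa^2 phi away from the origin (where rho_ext lives).
Q, kappa = 1.0, 2.0

def u(r):                 # u = r * phi, so laplacian(phi) = u''(r) / r
    return Q * math.exp(-kappa * r)

r, h = 1.5, 1e-4
lap = (u(r + h) - 2 * u(r) + u(r - h)) / (h * h) / r   # (1/r) u''(r)
phi = u(r) / r
print(abs(lap - kappa**2 * phi) < 1e-5)   # True
```

Here u″ = κ²u exactly, so the only discrepancy is finite-difference error.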

∙ Thomas-Fermi screening – In an electron gas with k_BT ≪ ε_F, we may take T = 0. If the Fermi energy is constant, we write

\[ \varepsilon_F = \frac{\hbar^2 k_F^2(\mathbf{r})}{2m} - e\phi(\mathbf{r})\ , \]

and the local electron number density is \(n(\mathbf{r}) = k_F^3(\mathbf{r})/3\pi^2\). Assuming a compensating smeared positive charge background ρ₊ = en_∞, Poisson's equation takes the form

\[ \nabla^2 \phi = 4\pi e\, n_\infty \cdot \left\{ \left(1 + \frac{e\phi(\mathbf{r})}{\varepsilon_F}\right)^{\!3/2} - 1 \right\} - 4\pi \rho_{\rm ext}(\mathbf{r})\ . \tag{6.S.14} \]

If eϕ ≪ ε_F, we expand in the presence of external sources to obtain \(\nabla^2 \phi = \kappa_{TF}^2\, \phi - 4\pi \rho_{\rm ext}\), where \(\kappa_{TF} = \left(6\pi n_\infty e^2 / \varepsilon_F\right)^{1/2}\) is the Thomas-Fermi screening wavevector. In metals, where the electron dispersion is a more general function of crystal momentum, the density response to a local potential ϕ(r) is δn(r) = eϕ(r) g(ε_F) to lowest order, where g(ε_F) is the density of states at the Fermi energy. One then finds \(\kappa_{TF} = \sqrt{4\pi e^2 g(\varepsilon_F)}\).

1. In practice, what is measured is S(q, ω) convolved with spatial and energy resolution filters appropriate to the measuring
apparatus.↩
1. Here we modify slightly the discussion in chapter 5 of the book by L. Peliti.
2. See J. L. Lebowitz and A. E. Mazel, J. Stat. Phys. 90, 1051 (1998).
3. A corresponding mapping can be found between a cubic lattice and the linear chain as well.
4. Not that I personally think there’s anything wrong with that.
5. Disambiguation footnote: Take care not to confuse Philipp Lenard (Hungarian-German, cathode ray tubes, Nazi), Alfred-Marie
Liénard (French, Liénard-Wiechert potentials, not a Nazi), John Lennard-Jones (British, molecular structure, definitely not a
Nazi), and Lynyrd Skynyrd (American, "Free Bird”, possibly killed by Nazis in 1977 plane crash). I thank my colleague Oleg
Shpyrko for setting me straight on this.
6. We assume that the long-ranged behavior of f (r) ≈ −βu(r) is integrable.
7. See C. N. Yang and T. D. Lee, Phys. Rev. 87, 404 (1952) and ibid, p. 410.
8. See http://en.Wikipedia.org/wiki/Close-packing. For randomly close-packed hard spheres, one finds, from numerical simulations, f_RCP = 0.644.

9. To derive this expression, note that F^(surf) is directed inward and vanishes away from the surface. Each Cartesian direction α = (x, y, z) then contributes −F_α^(surf) L_α, where L_α is the corresponding linear dimension. But F_α^(surf) = p A_α, where A_α is the area of the corresponding face and p is the pressure. Summing over the three possibilities for α, one obtains Equation ???.
10. We may write δ_{q,0} = (1/V)(2π)^d δ(q).
11. So named after Bogoliubov, Born, Green, Kirkwood, and Yvon.


12. I am grateful to Jonathan Lam and Olga Dudko for explaining this to me.
13. There are logarithmic corrections to the SAW result exactly at d = 4, but for all d > 4 one has ν = 1/2.

This page titled 6.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

7: Mean Field Theory of Phase Transitions


7.1: The van der Waals system
7.2: Fluids, Magnets, and the Ising Model
7.3: Mean Field Theory
7.4: Variational Density Matrix Method
7.5: Landau Theory of Phase Transitions
7.6: Mean Field Theory of Fluctuations
7.7: Global Symmetries
7.8: Ginzburg-Landau Theory
7.9: Appendix I- Equivalence of the Mean Field Descriptions
7.10: Appendix II- Additional Examples
7.S: Summary

Thumbnail: Domain walls in the two-dimensional Ising model.

This page titled 7: Mean Field Theory of Phase Transitions is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

7.1: The van der Waals system
Equation of state
Recall the van der Waals equation of state,
\[ \left(p + \frac{a}{v^2}\right)(v - b) = RT\ , \tag{7.1.1} \]

where v = N_A V/N is the molar volume. Solving for p(v, T), we have

\[ p = \frac{RT}{v - b} - \frac{a}{v^2}\ . \tag{7.1.2} \]

Let us fix the temperature T and examine the function p(v). Clearly p(v) is a decreasing function of volume for v just above the minimum allowed value v = b, as well as for v → ∞. But is p(v) a monotonic function for all v ∈ [b, ∞]?

We can answer this by computing the derivative,

\[ \left(\frac{\partial p}{\partial v}\right)_T = \frac{2a}{v^3} - \frac{RT}{(v - b)^2}\ . \tag{7.1.3} \]

Setting this expression to zero for finite v, we obtain the equation¹

\[ \frac{2a}{bRT} = \frac{u^3}{(u - 1)^2}\ , \tag{7.1.4} \]

where u ≡ v/b is dimensionless. It is easy to see that the function f(u) = u³/(u−1)² has a unique minimum for u > 1. Setting f′(u*) = 0 yields u* = 3, and so f_min = f(3) = 27/4. Thus, for T > T_c = 8a/27bR, the LHS of Equation 7.1.4 lies below the minimum value of the RHS, and there is no solution. This means that p(v, T > T_c) is a monotonically decreasing function of v.

At T = T_c there is a saddle-node bifurcation. Setting v_c = b u* = 3b and evaluating p_c = p(v_c, T_c), we have that the location of the critical point for the van der Waals system is

\[ p_c = \frac{a}{27\, b^2} \quad , \quad v_c = 3b \quad , \quad T_c = \frac{8a}{27\, bR}\ . \tag{7.1.5} \]
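These critical values can be confirmed numerically: at (v_c, T_c), both the first and second volume derivatives of p must vanish, since the critical point is an inflection point with horizontal tangent. A sketch, using argon-like a and b from Table 7.1.1 (any positive values would do):

```python
# Check that p(v_c, T_c) = p_c and that p_v = p_vv = 0 at the van der Waals
# critical point p_c = a/27b^2, v_c = 3b, T_c = 8a/27bR.
a, b, R = 1.363, 0.03219, 0.083145      # argon-like values, R in L*bar/(mol*K)

def p(v, T):
    return R * T / (v - b) - a / v**2

vc = 3 * b
Tc = 8 * a / (27 * b * R)
pc = a / (27 * b**2)

h = 1e-4 * vc
pv  = (p(vc + h, Tc) - p(vc - h, Tc)) / (2 * h)           # first derivative
pvv = (p(vc + h, Tc) - 2 * p(vc, Tc) + p(vc - h, Tc)) / h**2  # second derivative
print(abs(p(vc, Tc) - pc) < 1e-9, abs(pv) < 1e-3, abs(pvv) < 1.0)
```

The derivative tolerances are loose compared to the natural scales of p_v and p_vv away from the critical point, which are of order 10³ to 10⁴ in these units.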

For T < T_c, there are two solutions to Equation 7.1.4, corresponding to a local minimum and a local maximum of the function p(v). The locus of points in the (v, p) plane for which (∂p/∂v)_T = 0 is obtained by setting Equation 7.1.3 to zero and solving for T, then substituting this into Equation 7.1.2. The result is

\[ p^*(v) = \frac{a}{v^2} - \frac{2ab}{v^3}\ . \tag{7.1.6} \]

Expressed in terms of dimensionless quantities p̄ = p/p_c and v̄ = v/v_c, this equation becomes

\[ \bar p^*(\bar v) = \frac{3}{\bar v^2} - \frac{2}{\bar v^3}\ . \tag{7.1.7} \]

Figure 7.1.1: Pressure versus molar volume for the van der Waals gas at temperatures in equal intervals from T = 1.10 T_c (red) to T = 0.85 T_c (blue). The purple curve is p̄*(v̄).

Along the curve p = p*(v), the isothermal compressibility, κ_T = −(1/v)(∂v/∂p)_T, diverges, heralding a thermodynamic instability. To understand better, let us compute the free energy of the van der Waals system, F = E − TS. Regarding the energy E, we showed back in chapter 2 that

\[ \left(\frac{\partial \varepsilon}{\partial v}\right)_T = T \left(\frac{\partial p}{\partial T}\right)_V - p = \frac{a}{v^2}\ , \tag{7.1.8} \]

which entails
\[ \varepsilon(T, v) = \frac{1}{2}\, f R T - \frac{a}{v}\ , \tag{7.1.9} \]

where ε = E/ν is the molar internal energy. The first term is the molar energy of an ideal gas, where f is the number of molecular freedoms, which is the appropriate low density limit. The molar specific heat is then c_V = (∂ε/∂T)_v = (f/2) R, which means that the molar entropy is

\[ s(T, v) = \int^T\! dT'\; \frac{c_V}{T'} = \frac{f}{2}\, R \ln(T/T_c) + s_1(v)\ . \tag{7.1.10} \]

We then write f = ε − Ts, and we fix the function s₁(v) by demanding that p = −(∂f/∂v)_T. This yields s₁(v) = R ln(v − b) + s₀, where s₀ is a constant. Thus²,

\[ f(T, v) = \frac{1}{2}\, f R T \left(1 - \ln(T/T_c)\right) - \frac{a}{v} - RT \ln(v - b) - T s_0\ . \tag{7.1.11} \]

Table 7.1.1 : van der Waals parameters for some common gases. (Source: Wikipedia)

gas | a (L²·bar/mol²) | b (L/mol) | p_c (bar) | T_c (K) | v_c (L/mol)
Acetone 14.09 0.0994 52.82 505.1 0.2982

Argon 1.363 0.03219 48.72 150.9 0.0966

Carbon dioxide 3.640 0.04267 74.04 304.0 0.1280

Ethanol 12.18 0.08407 63.83 516.3 0.2522

Freon 10.78 0.0998 40.09 384.9 0.2994

Helium 0.03457 0.0237 2.279 5.198 0.0711

Hydrogen 0.2476 0.02661 12.95 33.16 0.0798

Mercury 8.200 0.01696 1055 1723 0.0509

Methane 2.283 0.04278 46.20 190.2 0.1283

Nitrogen 1.408 0.03913 34.06 128.2 0.1174

Oxygen 1.378 0.03183 50.37 154.3 0.0955

Water 5.536 0.03049 220.6 647.0 0.0915
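Any row of the table can be reproduced from Equation 7.1.5. A sketch using the argon entry, with R = 0.083145 L·bar/(mol·K):

```python
# Recompute the critical constants in Table 7.1.1 from the quoted a and b.
R = 0.083145                  # gas constant in L*bar/(mol*K)
a, b = 1.363, 0.03219         # argon row of the table

pc = a / (27 * b**2)          # critical pressure  -> ~48.72 bar
vc = 3 * b                    # critical molar volume -> ~0.0966 L/mol
Tc = 8 * a / (27 * b * R)     # critical temperature -> ~150.9 K
print(f"{pc:.2f} {vc:.4f} {Tc:.1f}")
```

The same three formulas reproduce the other rows as well, up to rounding of the tabulated a and b.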

We know that under equilibrium conditions, f is driven to a minimum by spontaneous processes. Now suppose that ∂²f/∂v²|_T < 0 over some range of v at a given temperature T. This would mean that one mole of the system at volume v and temperature T could lower its energy by rearranging into two half-moles, with respective molar volumes v ± δv, each at temperature T. The total volume and temperature thus remain fixed, but the free energy changes by an amount \(\Delta f = \tfrac{1}{2}\, \frac{\partial^2\! f}{\partial v^2}\big|_T\, (\delta v)^2 < 0\). This means that the system is unstable – it can lower its energy by dividing up into two subsystems each with different densities (molar volumes). Note that the onset of stability occurs when

\[ \frac{\partial^2\! f}{\partial v^2}\bigg|_T = -\frac{\partial p}{\partial v}\bigg|_T = \frac{1}{v\, \kappa_p} = 0\ , \tag{7.1.12} \]

which is to say when κ_p = ∞. As we saw, this occurs at p = p*(v), given in Equation 7.1.6.
However, this condition, ∂²f/∂v²|_T < 0, is in fact too strong. That is, the system can be unstable even at molar volumes where ∂²f/∂v²|_T > 0. The reason is shown graphically in Figure 7.1.2. At the fixed temperature T, for any molar volume v between v₁ ≡ v_liquid and v₂ ≡ v_gas, the system can lower its free energy by phase separating into regions of different molar volumes. In general we can write

\[ v = (1 - x)\, v_1 + x\, v_2\ , \tag{7.1.13} \]

so v = v₁ when x = 0 and v = v₂ when x = 1. The free energy upon phase separation is simply

\[ f = (1 - x)\, f_1 + x\, f_2\ , \tag{7.1.14} \]

where f_j = f(v_j, T). This function is given by the straight black line connecting the points at volumes v₁ and v₂ in Figure 7.1.2.

Figure 7.1.2: Molar free energy f(T, v) of the van der Waals system at T = 0.85 T_c, with dot-dashed black line showing the Maxwell construction connecting molar volumes v_{1,2} on opposite sides of the coexistence curve.

The two equations which give us v₁ and v₂ are

\[ \frac{\partial f}{\partial v}\bigg|_{v_1, T} = \frac{\partial f}{\partial v}\bigg|_{v_2, T} = \frac{f(T, v_2) - f(T, v_1)}{v_2 - v_1}\ . \tag{7.1.15} \]

Equivalently, in terms of the pressure, p = −(∂f/∂v)_T, these equations are equivalent to

\[ p(T, v_1) = p(T, v_2) = \frac{1}{v_2 - v_1} \int_{v_1}^{v_2}\! dv\; p(T, v)\ . \tag{7.1.16} \]

This procedure is known as the Maxwell construction, and is depicted graphically in Figure 7.1.3. When the Maxwell construction is enforced, the isotherms resemble the curves in Figure 7.1.4. In that figure, all points within the purple shaded region have ∂²f/∂v² < 0, hence this region is unstable to infinitesimal fluctuations. The boundary of this region is called the spinodal, and the spontaneous phase separation into two phases is a process known as spinodal decomposition. The dot-dashed orange curve, called the coexistence curve, marks the instability boundary for nucleation. In a nucleation process, an energy barrier must be overcome in order to achieve the lower free energy state. There is no energy barrier for spinodal decomposition – it is a spontaneous process.

Analytic form of the coexistence curve near the critical point


We write v_L = v_c + w_L and v_G = v_c + w_G. One of our equations is p(v_c + w_L, T) = p(v_c + w_G, T). Taylor expanding in powers of w_L and w_G, we have

\[ 0 = p_v(v_c, T)\, (w_G - w_L) + \tfrac{1}{2}\, p_{vv}(v_c, T)\, \big(w_G^2 - w_L^2\big) + \tfrac{1}{6}\, p_{vvv}(v_c, T)\, \big(w_G^3 - w_L^3\big) + \ldots\ , \tag{7.1.17} \]

where

\[ p_v \equiv \frac{\partial p}{\partial v} \quad , \quad p_{vv} \equiv \frac{\partial^2\! p}{\partial v^2} \quad , \quad p_{vvv} \equiv \frac{\partial^3\! p}{\partial v^3} \quad , \quad p_{vT} \equiv \frac{\partial^2\! p}{\partial v\, \partial T} \quad , \quad \text{etc.} \tag{7.1.18} \]

The second equation we write as

\[ \int_{w_L}^{w_G}\! dw\; p(v_c + w, T) = \tfrac{1}{2}\, (w_G - w_L)\, \big( p(v_c + w_L, T) + p(v_c + w_G, T) \big)\ . \tag{7.1.19} \]

Expanding in powers of w_L and w_G, this becomes

\[ \begin{aligned} p(v_c, T)\, (w_G - w_L) &+ \tfrac{1}{2}\, p_v(v_c, T)\, \big(w_G^2 - w_L^2\big) + \tfrac{1}{6}\, p_{vv}(v_c, T)\, \big(w_G^3 - w_L^3\big) \\ &+ \tfrac{1}{24}\, p_{vvv}(v_c, T)\, \big(w_G^4 - w_L^4\big) + \tfrac{1}{120}\, p_{vvvv}(v_c, T)\, \big(w_G^5 - w_L^5\big) + \ldots \\ = \tfrac{1}{2}\, (w_G - w_L) \Big\{ 2\, p(v_c, T) &+ p_v(v_c, T)\, (w_G + w_L) + \tfrac{1}{2}\, p_{vv}(v_c, T)\, \big(w_G^2 + w_L^2\big) \\ &+ \tfrac{1}{6}\, p_{vvv}(v_c, T)\, \big(w_G^3 + w_L^3\big) + \tfrac{1}{24}\, p_{vvvv}(v_c, T)\, \big(w_G^4 + w_L^4\big) + \ldots \Big\} \end{aligned} \]
Subtracting the LHS from the RHS, we find that we can then divide by \(\tfrac{1}{6}\, \big(w_G^2 - w_L^2\big)\), resulting in

\[ 0 = p_{vv}(v_c, T) + \tfrac{1}{2}\, p_{vvv}(v_c, T)\, (w_G + w_L) + \tfrac{1}{20}\, p_{vvvv}(v_c, T)\, \big( 3 w_G^2 + 4 w_G w_L + 3 w_L^2 \big) + O\big(w_{G,L}^3\big)\ . \tag{7.1.20} \]

We now define w_± ≡ w_G ± w_L. In terms of these variables, Equations 7.1.17 and 7.1.20 become

\[ \begin{aligned} 0 &= p_v(v_c, T) + \tfrac{1}{2}\, p_{vv}(v_c, T)\, w_+ + \tfrac{1}{8}\, p_{vvv}(v_c, T)\, \big( w_+^2 + \tfrac{1}{3}\, w_-^2 \big) + O\big(w_\pm^3\big) \\ 0 &= p_{vv}(v_c, T) + \tfrac{1}{2}\, p_{vvv}(v_c, T)\, w_+ + \tfrac{1}{8}\, p_{vvvv}(v_c, T)\, \big( w_+^2 + \tfrac{1}{5}\, w_-^2 \big) + O\big(w_\pm^3\big)\ . \end{aligned} \]

Figure 7.1.3: Maxwell construction in the (v, p) plane. The system is absolutely unstable between volumes v_d and v_e. For v ∈ [v_a, v_d] or v ∈ [v_e, v_c], the solution is unstable with respect to phase separation. Source: Wikipedia.

We now evaluate w_± to leading order. Note that p_v(v_c, T_c) = p_vv(v_c, T_c) = 0, since the critical point is an inflection point in the (v, p) plane. Thus, we have p_v(v_c, T) = p_vT Θ + O(Θ²), where T = T_c + Θ and p_vT = p_vT(v_c, T_c). We can then see that w_− is of leading order √(−Θ), while w_+ is of leading order Θ. This allows us to write

\[ \begin{aligned} 0 &= p_{vT}\, \Theta + \tfrac{1}{24}\, p_{vvv}\, w_-^2 + O\big(\Theta^2\big) \\ 0 &= p_{vvT}\, \Theta + \tfrac{1}{2}\, p_{vvv}\, w_+ + \tfrac{1}{40}\, p_{vvvv}\, w_-^2 + O\big(\Theta^2\big)\ . \end{aligned} \]

Thus,

\[ \begin{aligned} w_- &= \left( \frac{24\, p_{vT}}{p_{vvv}} \right)^{\!1/2} \sqrt{-\Theta} + \ldots \\ w_+ &= \left( \frac{6\, p_{vT}\, p_{vvvv} - 10\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2} \right) \Theta + \ldots\ . \end{aligned} \]

We then have

\[ \begin{aligned} w_L &= -\left( \frac{6\, p_{vT}}{p_{vvv}} \right)^{\!1/2} \sqrt{-\Theta} + \left( \frac{3\, p_{vT}\, p_{vvvv} - 5\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2} \right) \Theta + O\big(\Theta^{3/2}\big) \\ w_G &= +\left( \frac{6\, p_{vT}}{p_{vvv}} \right)^{\!1/2} \sqrt{-\Theta} + \left( \frac{3\, p_{vT}\, p_{vvvv} - 5\, p_{vvv}\, p_{vvT}}{5\, p_{vvv}^2} \right) \Theta + O\big(\Theta^{3/2}\big)\ . \end{aligned} \]

Suppose we follow along an isotherm starting from the high molar volume (gas) phase. If T > T_c, the volume v decreases continuously as the pressure p increases. If T < T_c, then at the instant the isotherm first intersects the orange boundary curve in Figure 7.1.4, there is a discontinuous change in the molar volume from high (gas) to low (liquid). This discontinuous change is the hallmark of a first order phase transition. Note that the volume discontinuity, Δv = w_− ∝ (T_c − T)^{1/2}. This is an example of critical behavior in which the order parameter ϕ, which in this case may be taken to be the difference ϕ = v_G − v_L, behaves as a power law in |T − T_c|, where T_c is the critical temperature. In this case, we have ϕ(T) ∝ (T_c − T)_+^β, where β = 1/2 is the exponent, and where (T_c − T)_+ is defined to be T_c − T if T < T_c and 0 otherwise. The isothermal compressibility is κ_T = −1/[v p_v(v, T)]. This is finite along the coexistence curve – it diverges only along the spinodal. It therefore diverges at the critical point, which lies at the intersection of the spinodal and the coexistence curve.

Figure 7.1.4: Pressure-volume isotherms for the van der Waals system, as in Figure 7.1.1, but corrected to account for the Maxwell construction. The boundary of the purple shaded region is the spinodal line p̄*(v̄). The boundary of the orange shaded region is the stability boundary with respect to phase separation.


It is convenient to express the equation of state and the coexistence curve in terms of dimensionless variables. Write

\[ \bar p = \frac{p}{p_c} \quad , \quad \bar v = \frac{v}{v_c} \quad , \quad \bar T = \frac{T}{T_c}\ . \tag{7.1.21} \]

The van der Waals equation of state then becomes

\[ \bar p = \frac{8 \bar T}{3 \bar v - 1} - \frac{3}{\bar v^2}\ . \tag{7.1.22} \]

Further expressing these dimensionless quantities in terms of distance from the critical point, we write

\[ \bar p = 1 + \pi \quad , \quad \bar v = 1 + \epsilon \quad , \quad \bar T = 1 + t\ . \tag{7.1.23} \]

Thus,

\[ \pi(\epsilon, t) = \frac{8(1 + t)}{2 + 3\epsilon} - \frac{3}{(1 + \epsilon)^2} - 1\ . \tag{7.1.24} \]

Note that the LHS and the RHS of this equation vanish identically for (π, ϵ, t) = (0, 0, 0). We can then write

\[ \epsilon_{L,G} = \mp \left( \frac{6\, \pi_{\epsilon t}}{\pi_{\epsilon\epsilon\epsilon}} \right)^{\!1/2} (-t)^{1/2} + \left( \frac{3\, \pi_{\epsilon t}\, \pi_{\epsilon\epsilon\epsilon\epsilon} - 5\, \pi_{\epsilon\epsilon\epsilon}\, \pi_{\epsilon\epsilon t}}{5\, \pi_{\epsilon\epsilon\epsilon}^2} \right) t + O\big((-t)^{3/2}\big)\ . \tag{7.1.25} \]
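The derivatives entering the leading term are easily evaluated for the reduced van der Waals equation. A sketch using finite differences (the exact values are π_ϵt = −6 and π_ϵϵϵ = −9, so the leading coexistence amplitude is (6π_ϵt/π_ϵϵϵ)^{1/2} = 2, i.e. ϵ_{L,G} ≈ ∓2(−t)^{1/2}):

```python
# Evaluate pi_et and pi_eee for pi(eps, t) = 8(1+t)/(2+3 eps) - 3/(1+eps)^2 - 1
# at the critical point (eps, t) = (0, 0), by finite differences.
def pi(e, t):
    return 8 * (1 + t) / (2 + 3 * e) - 3 / (1 + e)**2 - 1

h = 1e-3
# mixed second derivative d^2 pi / (d eps d t)
pi_et = (pi(h, h) - pi(h, -h) - pi(-h, h) + pi(-h, -h)) / (4 * h * h)
# third derivative d^3 pi / d eps^3
pi_eee = (pi(2*h, 0) - 2*pi(h, 0) + 2*pi(-h, 0) - pi(-2*h, 0)) / (2 * h**3)

amp = (6 * pi_et / pi_eee) ** 0.5      # leading amplitude in Eq. 7.1.25
print(round(pi_et, 3), round(pi_eee, 2), round(amp, 3))   # -6.0 -9.0 2.0
```

The amplitude 2 agrees with the mean-field result Δv̄ = ϵ_G − ϵ_L ≈ 4(−t)^{1/2} shown in the Guggenheim comparison below.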

History of the van der Waals equation
The van der Waals equation of state first appears in van der Waals’ 1873 PhD thesis3, “Over de Continuïteit van den Gas - en
Vloeistoftoestand” (“On the continuity of the gas and liquid state”). In his Nobel lecture4, van der Waals writes of how he was
inspired by Rudolf Clausius’ 1857 treatise on the nature of heat, where it is posited that a gas in fact consists of microscopic
particles whizzing around at high velocities. van der Waals reasoned that liquids, which result when gases are compressed, also
consist of ’small moving particles’: "Thus I conceived the idea that there is no essential difference between the gaseous and the
liquid state of matter…"

Figure [Gugg1945]: 'Universality' of the liquid-gas transition for eight different atomic and molecular fluids, from E. A. Guggenheim, J. Chem. Phys. 13, 253 (1945). Dimensionless temperature T/T_c versus dimensionless density ρ/ρ_c = v_c/v is shown. The van der Waals / mean field theory gives Δv = v_gas − v_liquid ∝ (−t)^{1/2}, while experiments show a result closer to Δv ∝ (−t)^{1/3}. Here t ≡ T̄ − 1 = (T − T_c)/T_c is the dimensionless temperature deviation with respect to the critical point. Image used without permission.


Clausius’ treatise showed how his kinetic theory of heat was consistent with Boyle’s law for gases (pV = constant at fixed temperature). van der Waals pondered why this might fail for the non-dilute liquid phase, and he reasoned that there were two principal differences: inter-particle attraction and excluded volume. These considerations prompted him to posit his famous equation,

\[ p = \frac{RT}{v - b} - \frac{a}{v^2}\ . \tag{7.1.26} \]

The first term on the RHS accounts for excluded volume effects, and the second for mutual attractions.
In the limiting case of p → ∞, the molar volume approaches v = b. On physical grounds, one might expect b = v₀/ζ, where v₀ = N_A ω₀ is N_A times the volume ω₀ of a single molecule, and the packing fraction is ζ = Nω₀/V = v₀/v, which is the ratio of the total molecular volume to the total system volume. In three dimensions, the maximum possible packing fraction is for fcc and hcp lattices, each of which have coordination number 12, with \(\zeta_{\rm max} = \frac{\pi}{3\sqrt{2}} = 0.74078\). Dense random packing results in ζ_drp = 0.634. Expanding the vdW equation of state in inverse powers of v yields
\[ p = \frac{RT}{v} + \left( b - \frac{a}{RT} \right) \cdot \frac{RT}{v^2} + O\big(v^{-3}\big)\ , \tag{7.1.27} \]

and we read off the second virial coefficient B₂ = (b − a/RT)/N_A. For hard spheres, a = 0, and the result B₂ = 4ω₀ from the Mayer cluster expansion corresponds to b_Mayer = 4v₀, which is larger than the result from even the loosest regular sphere packing, that for a cubic lattice, with ζ_cub = π/6.
Another of van der Waals’ great achievements was his articulation of the law of corresponding states. Recall that the van der Waals equation of state, when written in terms of dimensionless quantities p̄ = p/p_c, v̄ = v/v_c, and T̄ = T/T_c, takes the form of Equation 7.1.22. Thus, while the a and b parameters are specific to each fluid – see Table 7.1.1 – when written in terms of these scaled dimensionless variables, the equation of state and all its consequent properties (e.g. the liquid-gas phase transition) are universal.
The van der Waals equation is best viewed as semi-phenomenological. Interaction and excluded volume effects surely are present, but the van der Waals equation itself only captures them in a very approximate way. It is applicable to gases, where it successfully predicts features that are not present in ideal systems (e.g. throttling). It is of only qualitative and pedagogical use in the study of fluids, the essential physics of which lies in the behavior of quantities like the pair distribution function g(r). As we saw in chapter 6, any adequate first principles derivation of g(r) – a function which can be measured in scattering experiments – involves rather complicated approximation schemes to close the BBGKY hierarchy. Else one must resort to numerical simulations such as the Monte Carlo method. Nevertheless, the lessons learned from the van der Waals system are invaluable and they provide us with a first glimpse of what is going on in the vicinity of a phase transition, and how nonanalytic behavior, such as v_G − v_L ∝ (T_c − T)^β with noninteger exponent β, may result due to singularities in the free energy at the critical point.

This page titled 7.1: The van der Waals system is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

7.2: Fluids, Magnets, and the Ising Model
Lattice Gas Description of a Fluid
The usual description of a fluid follows from a continuum Hamiltonian of the form
Ĥ(p, x) = ∑_{i=1}^{N} p_i²/2m + ∑_{i<j} u(x_i − x_j) .   (7.2.1)

The potential u(r) is typically central, depending only on the magnitude |r|, and short-ranged. Now consider a discretized version of the fluid, in which we divide up space into cells (cubes, say), each of which can accommodate at most one fluid particle (due to excluded volume effects). That is, each cube has a volume on the order of a³, where a is the diameter of the fluid particles. In a given cube i we set the occupancy n_i = 1 if a fluid particle is present and n_i = 0 if there is no fluid particle present. We then have that the potential energy is

U = ∑_{i<j} u(x_i − x_j) = ½ ∑_{R≠R′} V_{RR′} n_R n_{R′} ,   (7.2.2)

where V_{RR′} ≈ v(R − R′), where R_k is the position at the center of cube k. The grand partition function is then approximated as

Ξ(T, V, μ) ≈ ∑_{{n_R}} ( ∏_R ξ^{n_R} ) exp( −½ β ∑_{R≠R′} V_{RR′} n_R n_{R′} ) ,   (7.2.3)

where

ξ = e^{βμ} λ_T^{−d} a^d ,   (7.2.4)

where a is the side length of each cube (chosen to be on the order of the hard sphere diameter). The λ_T^{−d} factor arises from the integration over the momenta. Note ∑_R n_R = N is the total number of fluid particles, so

∏_R ξ^{n_R} = ξ^N = e^{βμN} λ_T^{−Nd} a^{Nd} .   (7.2.5)

Figure 7.2.1 : The lattice gas model. An occupied cell corresponds to n = 1 (σ = +1 ), and a vacant cell to n = 0 (σ = −1 ).
Thus, we can write a lattice Hamiltonian,

Ĥ = ½ ∑_{R≠R′} V_{RR′} n_R n_{R′} − k_B T ln ξ ∑_R n_R   (7.2.6)

  = −½ ∑_{R≠R′} J_{RR′} σ_R σ_{R′} − H ∑_R σ_R + E_0 ,   (7.2.7)

7.2.1 https://phys.libretexts.org/@go/page/18584
where σ_R ≡ 2n_R − 1 is a spin variable taking the possible values {−1, +1}, and

J_{RR′} = −¼ V_{RR′}   (7.2.8)

H = ½ k_B T ln ξ − ¼ ∑′_{R′} V_{RR′} ,   (7.2.9)

where the prime on the sum indicates that R′ = R is to be excluded. For the Lennard-Jones system, V_{RR′} = v(R − R′) < 0 is due to the attractive tail of the potential, hence J_{RR′} is positive, which prefers alignment of the spins σ_R and σ_{R′}. This interaction is therefore ferromagnetic. The spin Hamiltonian in Equation 7.2.7 is known as the Ising model.
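The algebra behind Equations 7.2.8 and 7.2.9 can be checked on a single bond. A minimal sketch (the bond energy V below is an arbitrary illustrative number) verifying that V n n′, with n = (1 + σ)/2, reproduces the Ising form with J = −V/4:

```python
# Check the lattice gas -> Ising mapping on a single bond. V is an arbitrary
# illustrative value (negative, like the attractive Lennard-Jones tail).
V = -1.7
J = -V/4.0          # Equation 7.2.8, per bond

for s1 in (-1, 1):
    for s2 in (-1, 1):
        n1, n2 = (1 + s1)/2, (1 + s2)/2
        lhs = V*n1*n2
        # expanding V n1 n2 gives -J s1 s2 + (V/4)(s1 + s2) + V/4
        rhs = -J*s1*s2 + (V/4)*(s1 + s2) + V/4
        assert abs(lhs - rhs) < 1e-12
print("bond mapping verified")
```

The terms linear in σ and the constant are what feed into the effective field H and the energy shift E_0 when summed over the lattice.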

Phase diagrams and critical exponents

The physics of the liquid-gas transition in fact has a great deal in common with that of the transition between a magnetized and unmagnetized state of a magnetic system. The correspondences are⁵

p ⟷ H ,   v ⟷ m ,   (7.2.10)

where m is the magnetization density, defined here to be the total magnetization M divided by the number of lattice sites N_S:⁶

m = M/N_S = (1/N_S) ∑_R ⟨σ_R⟩ .   (7.2.11)

Sketches of the phase diagrams are reproduced in Figure 7.2.2. Of particular interest is the critical point, which occurs at (T_c, p_c) in the fluid system and (T_c, H_c) in the magnetic system, with H_c = 0 by symmetry.

In the fluid, the coexistence curve in the (p, T) plane separates high density (liquid) and low density (vapor) phases. The specific volume v (or the density n = v⁻¹) jumps discontinuously across the coexistence curve. In the magnet, the coexistence curve in the (H, T) plane separates positive magnetization and negative magnetization phases. The magnetization density m jumps discontinuously across the coexistence curve. For T > T_c, the latter system is a paramagnet, in which the magnetization varies smoothly as a function of H. This behavior is most apparent in the bottom panel of the figure, where v(p) and m(H) curves are shown.

For T < T_c, the fluid exists in a two phase region, which is spatially inhomogeneous, supporting local regions of high and low density. There is no stable homogeneous thermodynamic phase for (T, v) within the two phase region shown in the middle left panel. Similarly, for the magnet, there is no stable homogeneous thermodynamic phase at fixed temperature T and magnetization m if (T, m) lies within the coexistence region. Rather, the system consists of blobs where the spin is predominantly up, and blobs where the spin is predominantly down.


Note also the analogy between the isothermal compressibility κ_T and the isothermal susceptibility χ_T:

κ_T = −(1/v) (∂v/∂p)_T ,   κ_T(T_c, p_c) = ∞

χ_T = (∂m/∂H)_T ,   χ_T(T_c, H_c) = ∞

7.2.2 https://phys.libretexts.org/@go/page/18584
Figure 7.2.2 : Comparison of the liquid-gas phase diagram with that of the Ising ferromagnet.
The ‘order parameter’ for a second order phase transition is a quantity which vanishes in the disordered phase and is finite in the ordered phase. For the fluid, the order parameter can be chosen to be Ψ ∝ (v_vap − v_liq), the difference in the specific volumes of the vapor and liquid phases. In the vicinity of the critical point, the system exhibits power law behavior in many physical quantities, viz.

m(T, H_c) ∼ (T_c − T)_+^β

χ(T, H_c) ∼ |T − T_c|^{−γ}

C_M(T, H_c) ∼ |T − T_c|^{−α}

m(T_c, H) ∼ ±|H|^{1/δ} .

The quantities α, β, γ, and δ are the critical exponents associated with the transition. These exponents satisfy certain equalities, such as the Rushbrooke and Griffiths relations and hyperscaling,⁷

α + 2β + γ = 2   (Rushbrooke)

β + γ = βδ   (Griffiths)

2 − α = dν   (hyperscaling) .

Originally such relations were derived as inequalities, and only after the advent of scaling and renormalization group theories was it realized that they hold as equalities. We shall have much more to say about critical behavior later on, when we discuss scaling and renormalization.
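The mean field exponents derived later in this chapter (α = 0, β = ½, γ = 1, δ = 3) can be checked against the first two relations directly; a small sketch (the value ν = ½ for the mean field correlation length exponent is quoted here for illustration, not derived in this section):

```python
# Mean field critical exponents; nu = 1/2 (correlation length exponent) is
# quoted for illustration, as it is not derived in this section.
alpha, beta, gamma, delta, nu = 0.0, 0.5, 1.0, 3.0, 0.5

assert alpha + 2*beta + gamma == 2.0     # Rushbrooke
assert beta + gamma == beta*delta        # Griffiths

# hyperscaling 2 - alpha = d*nu then picks out the upper critical dimension
print((2 - alpha)/nu)  # 4.0
```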

Gibbs-Duhem relation for magnetic systems

Homogeneity of E(S, M, N_S) means E = TS + HM + μN_S, and, after invoking the First Law dE = T dS + H dM + μ dN_S, we have

S dT + M dH + N_S dμ = 0 .   (7.2.12)

Now consider two magnetic phases in coexistence. We must have dμ_1 = dμ_2, hence

dμ_1 = −s_1 dT − m_1 dH = −s_2 dT − m_2 dH = dμ_2 ,   (7.2.13)

where m = M/N_S is the magnetization per site and s = S/N_S is the specific entropy. Thus, we obtain the Clapeyron equation for magnetic systems,

(dH/dT)_coex = − (s_1 − s_2)/(m_1 − m_2) .   (7.2.14)

Thus, if m_1 ≠ m_2 and (dH/dT)_coex = 0, then we must have s_1 = s_2, which says that there is no latent heat associated with the transition. This absence of latent heat is a consequence of the symmetry which guarantees that F(T, H, N_S) = F(T, −H, N_S).

Order-disorder transitions

Another application of the Ising model lies in the theory of order-disorder transitions in alloys. Examples include Cu₃Au, CuZn, and other compounds. In CuZn, the Cu and Zn atoms occupy sites of a body centered cubic (BCC) lattice, forming an alloy known as β-brass. Below T_c ≃ 740 K, the atoms are ordered, with the Cu preferentially occupying one simple cubic sublattice and the Zn preferentially occupying the other.


The energy is a sum of pairwise interactions, with a given link contributing ε_AA, ε_BB, or ε_AB, depending on whether it is an A-A, B-B, or A-B/B-A link. Here A and B represent Cu and Zn, respectively. Thus, we can write the energy of the link ⟨ij⟩ as

E_ij = ε_AA P_i^A P_j^A + ε_BB P_i^B P_j^B + ε_AB ( P_i^A P_j^B + P_i^B P_j^A ) ,   (7.2.15)

where

P_i^A = ½ (1 + σ_i) = { 1 if site i contains Cu ; 0 if site i contains Zn }

P_i^B = ½ (1 − σ_i) = { 1 if site i contains Zn ; 0 if site i contains Cu } .

The Hamiltonian is then

Ĥ = ∑_⟨ij⟩ E_ij

  = ∑_⟨ij⟩ { ¼ (ε_AA + ε_BB − 2ε_AB) σ_i σ_j + ¼ (ε_AA − ε_BB) (σ_i + σ_j) + ¼ (ε_AA + ε_BB + 2ε_AB) }

  = −J ∑_⟨ij⟩ σ_i σ_j − H ∑_i σ_i + E_0 ,

where the exchange constant J and the magnetic field H are given by

J = ¼ (2ε_AB − ε_AA − ε_BB)

H = ¼ z (ε_BB − ε_AA) ,

and E_0 = ⅛ N z (ε_AA + ε_BB + 2ε_AB), where N is the total number of lattice sites and z = 8 is the lattice coordination number, which is the number of nearest neighbors of any given site. (The factor of z in H and E_0 arises because each site participates in z links.)
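The link-by-link identification above can be verified numerically. A sketch with arbitrary illustrative values of the ε's (chosen so that 2ε_AB < ε_AA + ε_BB, i.e. the antiferromagnetic case relevant to β-brass):

```python
# Check the alloy -> Ising mapping on a single link; epsilon values are
# arbitrary illustrative numbers (here 2*eAB < eAA + eBB, so J < 0).
eAA, eBB, eAB = -1.0, -0.6, -1.2
J = 0.25*(2*eAB - eAA - eBB)

def E_link(s1, s2):
    # direct pair energy, with A (Cu) for sigma = +1 and B (Zn) for sigma = -1
    PA = lambda s: 0.5*(1 + s)
    PB = lambda s: 0.5*(1 - s)
    return (eAA*PA(s1)*PA(s2) + eBB*PB(s1)*PB(s2)
            + eAB*(PA(s1)*PB(s2) + PB(s1)*PA(s2)))

for s1 in (-1, 1):
    for s2 in (-1, 1):
        ising = (-J*s1*s2 + 0.25*(eAA - eBB)*(s1 + s2)
                 + 0.25*(eAA + eBB + 2*eAB))
        assert abs(E_link(s1, s2) - ising) < 1e-12
print("J =", J)  # J = -0.2 < 0: antiferromagnetic
```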

7.2.4 https://phys.libretexts.org/@go/page/18584
Figure 7.2.3: Order-disorder transition on the square lattice. Below T = T_c, order develops spontaneously on the two √2 × √2 sublattices. There is perfect sublattice order at T = 0 (left panel).

Note that

2ε_AB > ε_AA + ε_BB  ⟹  J > 0   (ferromagnetic)

2ε_AB < ε_AA + ε_BB  ⟹  J < 0   (antiferromagnetic) .

The antiferromagnetic case is depicted in Figure 7.2.3.

This page titled 7.2: Fluids, Magnets, and the Ising Model is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

7.3: Mean Field Theory
Consider the Ising model Hamiltonian,

Ĥ = −J ∑_⟨ij⟩ σ_i σ_j − H ∑_i σ_i ,   (7.3.1)

where the first sum on the RHS is over all links of the lattice. Each spin can be either ‘up’ (σ = +1) or ‘down’ (σ = −1). We further assume that the spins are located on a Bravais lattice⁸ and that the coupling J_ij = J(|R_i − R_j|), where R_i is the position of the i-th spin.

On each site i we decompose σ_i into a contribution from its thermodynamic average and a fluctuation term,

σ_i = ⟨σ_i⟩ + δσ_i .   (7.3.2)

We will write ⟨σ_i⟩ ≡ m, the local magnetization (dimensionless), and assume that m is independent of position i. Then

σ_i σ_j = (m + δσ_i)(m + δσ_j)
        = m² + m (δσ_i + δσ_j) + δσ_i δσ_j
        = −m² + m (σ_i + σ_j) + δσ_i δσ_j .

The last term on the RHS of the second equation above is quadratic in the fluctuations, and we assume this to be negligibly small. Thus, we obtain the mean field Hamiltonian

Ĥ_MF = ½ N z J m² − (H + zJm) ∑_i σ_i ,   (7.3.3)

where N is the total number of lattice sites. The first term is a constant, although the value of m is yet to be determined. The Boltzmann weights are then completely determined by the second term, which is just what we would write down for a Hamiltonian of noninteracting spins in an effective ‘mean field’

H_eff = H + zJm .   (7.3.4)

In other words, H_eff = H_ext + H_int, where the external field is the applied field H_ext = H, and the ‘internal field’ is H_int = zJm. The internal field accounts for the interaction with the average values of all other spins coupled to a spin at a given site, hence it is often called the ‘mean field’. Since the spins are noninteracting, we have

m = ( e^{βH_eff} − e^{−βH_eff} ) / ( e^{βH_eff} + e^{−βH_eff} ) = tanh( (H + zJm)/k_B T ) .   (7.3.5)

It is a simple matter to solve for the free energy, given the noninteracting Hamiltonian Ĥ_MF. The partition function is

Z = Tr e^{−βĤ_MF} = e^{−½βNzJm²} ( ∑_σ e^{β(H+zJm)σ} )^N = e^{−βF} .   (7.3.6)

We now define dimensionless variables:

f ≡ F/NzJ ,   θ ≡ k_B T/zJ ,   h ≡ H/zJ ,   (7.3.7)

and obtain the dimensionless free energy

f(m, h, θ) = ½ m² − θ ln( e^{(m+h)/θ} + e^{−(m+h)/θ} ) .   (7.3.8)

Differentiating with respect to m gives the mean field equation,

m = tanh( (m + h)/θ ) ,   (7.3.9)

which is equivalent to the self-consistency requirement, m = ⟨σ_i⟩.

h = 0

When h = 0 the mean field equation becomes

m = tanh(m/θ) .   (7.3.10)

This nonlinear equation can be solved graphically, as in the top panel of Figure 7.3.1. The RHS is a tanh function which gets steeper with decreasing θ. If, at m = 0, the slope of tanh(m/θ) is smaller than unity, then the curve y = tanh(m/θ) will intersect y = m only at m = 0. However, if the slope is larger than unity, there will be three such intersections. Since the slope is 1/θ, we identify θ_c = 1 as the mean field transition temperature.
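The graphical solution can also be mimicked numerically. A minimal sketch (not from the text) that solves the self-consistency equation by fixed point iteration:

```python
import numpy as np

def solve_m(theta, h=0.0, tol=1e-12, max_iter=100000):
    """Solve m = tanh((m + h)/theta) by fixed point iteration, starting from
    m = 1 so that for theta < theta_c = 1 we land on the positive branch."""
    m = 1.0
    for _ in range(max_iter):
        m_new = np.tanh((m + h)/theta)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new

print(solve_m(1.5))   # paramagnetic side: m -> 0
print(solve_m(0.65))  # ordered side: nonzero spontaneous magnetization
```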

Figure 7.3.1: Results for h = 0. Upper panels: graphical solution to self-consistency equation m = tanh(m/θ) at temperatures θ = 0.65 (blue) and θ = 1.5 (dark red). Lower panel: mean field free energy, with energy shifted by θ ln 2 so that f(m = 0, θ) = 0.
In the low temperature phase θ < 1, there are three solutions to the mean field equations. One solution is always at m = 0. The other two solutions must be related by the m ↔ −m symmetry of the free energy (when h = 0). The exact free energies are plotted in the bottom panel of Figure 7.3.1, but it is possible to make analytical progress by assuming m is small and Taylor expanding the free energy f(m, θ) in powers of m:

f(m, θ) = ½ m² − θ ln 2 − θ ln cosh(m/θ)
        = −θ ln 2 + ½ (1 − θ⁻¹) m² + m⁴/(12θ³) − m⁶/(45θ⁵) + … .

Note that the sign of the quadratic term is positive for θ > 1 and negative for θ < 1. Thus, the shape of the free energy f(m, θ) as a function of m qualitatively changes at this point, θ_c = 1, the mean field transition temperature, also known as the critical temperature.

For θ > θ_c, the free energy f(m, θ) has a single minimum at m = 0. Below θ_c, the curvature at m = 0 reverses, and m = 0 becomes a local maximum. There are then two equivalent minima symmetrically displaced on either side of m = 0. Differentiating with respect to m, we find these local minima. For θ < θ_c, the local minima are found at

m² = 3θ²(1 − θ) = 3(1 − θ) + O((1 − θ)²) .   (7.3.11)

Thus, we find for |θ − 1| ≪ 1,

m(θ, h = 0) = ±√3 (1 − θ)_+^{1/2} ,   (7.3.12)

where the + subscript indicates that this solution is only for 1 − θ > 0. For θ > 1 the only solution is m = 0. The exponent with which m(θ) vanishes as θ → θ_c⁻ is denoted β: m(θ, h = 0) ∝ (θ_c − θ)_+^β.

Specific heat

We can now expand the free energy f(θ, h = 0). We find

f(θ, h = 0) = { −θ ln 2                                  if θ > θ_c
             { −θ ln 2 − ¾ (1 − θ)² + O((1 − θ)⁴)        if θ < θ_c .   (7.3.13)

Thus, if we compute the heat capacity, we find in the vicinity of θ = θ_c

c_V = −θ ∂²f/∂θ² = { 0     if θ > θ_c
                   { 3/2   if θ < θ_c .   (7.3.14)

Thus, the specific heat is discontinuous at θ = θ_c. We emphasize that this is only valid near θ = θ_c = 1. The general result valid for all θ is⁹

c_V(θ) = (1/θ) · ( m²(θ) − m⁴(θ) ) / ( θ − 1 + m²(θ) ) .   (7.3.15)

With this expression one can check both limits θ → 0 and θ → θ_c. As θ → 0 the magnetization saturates and one has m²(θ) ≃ 1 − 4e^{−2/θ}. The numerator then vanishes as e^{−2/θ}, which overwhelms the denominator, which itself vanishes as θ². As a result, c_V(θ → 0) = 0, as expected. As θ → 1⁻, invoking m² ≃ 3(1 − θ) we recover c_V(θ_c⁻) = 3/2.

In the theory of critical phenomena, c_V(θ) ∝ |θ − θ_c|^{−α} as θ → θ_c. We see that mean field theory yields α = 0.
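Both limits of Equation 7.3.15 can be checked numerically. A sketch (assuming the fixed point iteration converges to the positive branch of the magnetization):

```python
import numpy as np

def m2_of_theta(theta, tol=1e-13, max_iter=300000):
    # square of the positive solution of m = tanh(m/theta)
    m = 1.0
    for _ in range(max_iter):
        m_new = np.tanh(m/theta)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new**2

def c_V(theta):
    # Equation 7.3.15
    m2 = m2_of_theta(theta)
    return (m2 - m2**2)/(theta*(theta - 1.0 + m2))

print(c_V(0.999))  # approaches 3/2 as theta -> 1^-
print(c_V(0.2))    # nearly zero at low temperature
```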

Figure 7.3.2 : Results for h = 0.1 . Upper panels: graphical solution to self-consistency equation m = tanh((m + h)/θ) at
temperatures θ = 0.65 (blue), θ = 0.9 (dark green), and θ = 1.5 (dark red). Lower panel: mean field free energy, with energy
shifted by θ ln 2 so that f (m = 0, θ) = 0.

h ≠ 0

Consider without loss of generality the case h > 0. The minimum of the free energy f(m, h, θ) now lies at m > 0 for any θ. At low temperatures, the double well structure we found in the h = 0 case is tilted so that the right well lies lower in energy than the left well. This is depicted in Figure 7.3.2. As the temperature is raised, the local minimum at m < 0 vanishes, annihilating with the local maximum in a saddle-node bifurcation. To find where this happens, one sets ∂f/∂m = 0 and ∂²f/∂m² = 0 simultaneously, resulting in

h*(θ) = √(1 − θ) − (θ/2) ln( (1 + √(1 − θ)) / (1 − √(1 − θ)) ) .   (7.3.16)

The solutions lie at h = ±h*(θ). For θ < θ_c = 1 and h ∈ [−h*(θ), +h*(θ)], there are three solutions to the mean field equation. Equivalently we could in principle invert the above expression to obtain θ*(h). For θ > θ*(h), there is only a single global minimum in the free energy f(m) and there is no local minimum. Note θ*(h = 0) = 1.
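Equation 7.3.16 can be cross-checked against the inverted mean field equation h(θ, m) of Equation 7.3.24: the spinodal field is the value of h at the extremum m* = −√(1 − θ) of h(θ, m), where dh/dm = 0. A sketch:

```python
import numpy as np

def h_star(theta):
    # Equation 7.3.16
    s = np.sqrt(1.0 - theta)
    return s - 0.5*theta*np.log((1.0 + s)/(1.0 - s))

def h_of_m(theta, m):
    # Equation 7.3.24: field at which m solves the mean field equation
    return 0.5*theta*np.log((1.0 + m)/(1.0 - m)) - m

theta = 0.9
m_star = -np.sqrt(1.0 - theta)   # extremum of h(theta, m)
print(h_star(theta))             # ~0.0215, the value quoted in Figure 7.3.3
print(h_of_m(theta, m_star))     # identical, by construction
```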

Assuming h ≪ |θ − 1| ≪ 1, the mean field solution for m(θ, h) will also be small, and we expand the free energy in m, and to linear order in h:

f(m, h, θ) = −θ ln 2 + ½ (1 − θ⁻¹) m² + m⁴/(12θ³) − hm/θ
           = f_0 + ½ (θ − 1) m² + (1/12) m⁴ − hm + … .

Setting ∂f/∂m = 0, we obtain

⅓ m³ + (θ − 1) m − h = 0 .   (7.3.17)

If θ > 1 then we have a solution m = h/(θ − 1). The m³ term can be ignored because it is higher order in h, and we have assumed h ≪ |θ − 1| ≪ 1. This is known as the Curie-Weiss law¹⁰. The magnetic susceptibility behaves as

χ(θ) = ∂m/∂h = 1/(θ − 1) ∝ |θ − 1|^{−γ} ,   (7.3.18)

where the susceptibility critical exponent γ is γ = 1. If θ < 1 then while there is still a solution at m = h/(θ − 1), it lies at a local maximum of the free energy, as shown in Figure 7.3.2. The minimum of the free energy occurs close to the h = 0 solution m = m_0(θ) ≡ √3 (1 − θ)^{1/2}, and writing m = m_0 + δm we find δm to linear order in h as δm(θ, h) = h/[2(1 − θ)]. Thus,

m(θ, h) = √3 (1 − θ)^{1/2} + h/[2(1 − θ)] .   (7.3.19)

Once again, we find that χ(θ) diverges as |θ − 1|^{−γ} with γ = 1. The exponent γ on either side of the transition is the same.
Finally, we can set θ = θ_c and examine m(h). We find, from Equation 7.3.17,

m(θ = θ_c, h) = (3h)^{1/3} ∝ h^{1/δ} ,   (7.3.20)

where δ is a new critical exponent. Mean field theory gives δ = 3. Note that at θ = θ_c = 1 we have m = tanh(m + h), and inverting we find

h(m, θ = θ_c) = ½ ln( (1 + m)/(1 − m) ) − m = m³/3 + m⁵/5 + … ,   (7.3.21)

which is consistent with what we just found for m(h, θ = θ_c).
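The prediction m(θ_c, h) ≈ (3h)^{1/3} is easy to test numerically; a sketch using fixed point iteration at θ = 1:

```python
import numpy as np

def solve_m(theta, h, tol=1e-14, max_iter=1000000):
    # fixed point iteration for m = tanh((m + h)/theta)
    m = 0.5
    for _ in range(max_iter):
        m_new = np.tanh((m + h)/theta)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new

h = 1e-4
m_c = solve_m(1.0, h)
print(m_c, (3*h)**(1/3))  # the two values agree to about 0.1%
```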

Table 7.3.1: Critical exponents from mean field theory as compared with exact results for the two-dimensional Ising model, numerical results for the three-dimensional Ising model, and experiments on the liquid-gas transition in CO₂. Source: H. E. Stanley, Phase Transitions and Critical Phenomena.

Exponent | MFT | 2D Ising (exact) | 3D Ising (numerical) | CO₂ (expt.)
α        | 0   | 0                | 0.125                | 0.1
β        | 1/2 | 1/8              | 0.313                | 0.35
γ        | 1   | 7/4              | 1.25                 | 1.26
δ        | 3   | 15               | 5                    | 4.2

How well does mean field theory do in describing the phase transition of the Ising model? In Table 7.3.1 we compare our mean field results for the exponents α, β, γ, and δ with exact values for the two-dimensional Ising model, numerical work on the three-dimensional Ising model, and experiments on the liquid-gas transition in CO₂. The first thing to note is that the exponents are dependent on the dimension of space, and this is something that mean field theory completely misses. In fact, it turns out that the mean field exponents are exact provided d > d_u, where d_u is the upper critical dimension of the theory. For the Ising model, d_u = 4, and above four dimensions (which is of course unphysical) the mean field exponents are in fact exact. We see that all in all the MFT results compare better with the three dimensional exponent values than with the two-dimensional ones – this makes sense since MFT does better in higher dimensions. The reason for this is that higher dimensions means more nearest neighbors, which has the effect of reducing the relative importance of the fluctuations we neglected to include.

Magnetization dynamics

Dissipative processes drive physical systems to minimum energy states. We can crudely model the dissipative dynamics of a magnet by writing the phenomenological equation

dm/ds = −∂f/∂m ,   (7.3.22)

where s is a dimensionless time variable. Under these dynamics, the free energy is never increasing:

df/ds = (∂f/∂m)(∂m/∂s) = −(∂f/∂m)² ≤ 0 .   (7.3.23)

Clearly the fixed point of these dynamics, where ṁ = 0, is a solution to the mean field equation ∂f/∂m = 0.
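These dynamics are straightforward to simulate. A minimal sketch (forward Euler, using f′(m) = m − tanh((m + h)/θ), which follows from Equation 7.3.8):

```python
import numpy as np

def fprime(m, h, theta):
    # f'(m) = m - tanh((m + h)/theta), from Equation 7.3.8
    return m - np.tanh((m + h)/theta)

def relax(m0, h, theta, ds=0.01, steps=100000):
    # forward Euler integration of dm/ds = -f'(m)  (Equation 7.3.22)
    m = m0
    for _ in range(steps):
        m -= ds*fprime(m, h, theta)
    return m

# below theta_c at h = 0, initial conditions of either sign flow to the
# corresponding degenerate minimum
print(relax(+0.1, 0.0, 0.65))
print(relax(-0.1, 0.0, 0.65))
```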

7.3.5 https://phys.libretexts.org/@go/page/18585
Figure 7.3.3: Dissipative magnetization dynamics ṁ = −f′(m). Bottom panel shows h*(θ) from Equation 7.3.16. For (θ, h) within the blue shaded region, the free energy f(m) has a global minimum plus a local minimum and a local maximum. Otherwise f(m) has only a single global minimum. Top panels show an imperfect bifurcation in the magnetization dynamics at h = 0.0215, for which θ* = 0.90. Temperatures shown: θ = 0.65 (blue), θ = θ*(h) = 0.90 (green), and θ = 1.2. The rightmost stable fixed point corresponds to the global minimum of the free energy. The bottom of the middle two upper panels shows h = 0, where both of the attractive fixed points and the repulsive fixed point coalesce into a single attractive fixed point (supercritical pitchfork bifurcation).

The phase flow for the equation ṁ = −f′(m) is shown in Figure 7.3.3. As we have seen, for any value of h there is a temperature θ* below which the free energy f(m) has two local minima and one local maximum. When h = 0 the minima are degenerate, but at finite h one of the minima is a global minimum. Thus, for θ < θ*(h) there are three solutions to the mean field equations. In the language of dynamical systems, under the dynamics of Equation 7.3.22, minima of f(m) correspond to attractive fixed points and maxima to repulsive fixed points. If h > 0, the rightmost of these fixed points corresponds to the global minimum of the free energy. As θ is increased, this fixed point evolves smoothly. At θ = θ*, the (metastable) local minimum and the local maximum coalesce and annihilate in a saddle-node bifurcation. However at h = 0 all three fixed points coalesce at θ = θ_c and the bifurcation is a supercritical pitchfork. As a function of θ at finite h, the dynamics are said to exhibit an imperfect bifurcation, which is a deformed supercritical pitchfork.

7.3.6 https://phys.libretexts.org/@go/page/18585
Figure 7.3.4: Top panel: hysteresis as a function of ramping the dimensionless magnetic field h at θ = 0.40. Dark red arrows below the curve follow evolution of the magnetization on slow increase of h. Dark grey arrows above the curve follow evolution of the magnetization on slow decrease of h. Bottom panel: solution set for m(θ, h) as a function of h at temperatures θ = 0.40 (blue), θ = θ_c = 1.0 (dark green), and θ = 1.25 (red).

The solution set for the mean field equation is simply expressed by inverting the tanh function to obtain h(θ, m). One readily finds

h(θ, m) = (θ/2) ln( (1 + m)/(1 − m) ) − m .   (7.3.24)

As we see in the bottom panel of Figure 7.3.4, m(h) becomes multivalued for h ∈ [−h*(θ), +h*(θ)], where h*(θ) is given in Equation 7.3.16. Now imagine that θ < θ_c and we slowly ramp the field h from a large negative value to a large positive value, and then slowly back down to its original value. On the time scale of the magnetization dynamics, we can regard h(s) as a constant. (Remember the time variable is s here.) Thus, m(s) will flow to the nearest stable fixed point. Initially the system starts with m = −1 and h large and negative, and there is only one fixed point, at m* ≈ −1. As h slowly increases, the fixed point value m* also slowly increases. As h exceeds −h*(θ), a saddle-node bifurcation occurs, and two new fixed points are created at positive m, one stable and one unstable. The global minimum of the free energy still lies at the fixed point with m* < 0. However, when h crosses h = 0, the global minimum of the free energy lies at the most positive fixed point m*. The dynamics, however, keep the system stuck in what is a metastable phase. This persists until h = +h*(θ), at which point another saddle-node bifurcation occurs, and the attractive fixed point at m* < 0 annihilates with the repulsive fixed point. The dynamics then act quickly to drive m to the only remaining fixed point. This process is depicted in the top panel of Figure 7.3.4. As one can see from the figure, the system follows a stable fixed point until the fixed point disappears, even though that fixed point may not always correspond to a global minimum of the free energy. The resulting m(h) curve is then not reversible as a function of time, and it possesses a characteristic shape known as a hysteresis loop. Etymologically, the word hysteresis derives from the Greek ὑστέρησις, which means ‘lagging behind’. Systems which are hysteretic exhibit a history-dependence to their status, which is not uniquely determined by external conditions. Hysteresis may be exhibited with respect to changes in applied magnetic field, changes in temperature, or changes in other externally determined parameters.

Beyond nearest neighbors

Suppose we had started with the more general model,

Ĥ = −∑_{i<j} J_ij σ_i σ_j − H ∑_i σ_i

  = −½ ∑_{i≠j} J_ij σ_i σ_j − H ∑_i σ_i ,

where J_ij is the coupling between spins on sites i and j. In the top equation above, each pair (ij) is counted once in the interaction term; this may be replaced by a sum over all i and j if we include a factor of ½.¹¹ The resulting mean field Hamiltonian is then

Ĥ_MF = ½ N Ĵ(0) m² − (H + Ĵ(0) m) ∑_i σ_i .   (7.3.25)

Here, Ĵ(q) is the Fourier transform of the interaction matrix J_ij:¹²

Ĵ(q) = ∑_R J(R) e^{−iq·R} .   (7.3.26)

For nearest neighbor interactions only, one has Ĵ(0) = zJ, where z is the lattice coordination number, the number of nearest neighbors of any given site. The scaled free energy is as in Equation 7.3.8, with f = F/NĴ(0), θ = k_B T/Ĵ(0), and h = H/Ĵ(0). The analysis proceeds precisely as before, and we conclude θ_c = 1, i.e. k_B T_c^MF = Ĵ(0).

Ising model with long-ranged forces

Consider an Ising model where J_ij = J/N for all i and j, so that there is a very weak interaction between every pair of spins. The Hamiltonian is then

Ĥ = −(J/2N) ( ∑_i σ_i )² − H ∑_k σ_k .   (7.3.27)

The partition function is

Z = Tr_{σ_i} exp[ (βJ/2N) ( ∑_i σ_i )² + βH ∑_i σ_i ] .   (7.3.28)

We now invoke the Gaussian integral,

∫_{−∞}^{∞} dx e^{−αx² − βx} = √(π/α) e^{β²/4α} .   (7.3.29)

Thus,

exp[ (βJ/2N) ( ∑_i σ_i )² ] = ( NβJ/2π )^{1/2} ∫_{−∞}^{∞} dm e^{−½NβJm² + βJm ∑_i σ_i} ,   (7.3.30)

and we can write the partition function as

Z = ( NβJ/2π )^{1/2} ∫_{−∞}^{∞} dm e^{−½NβJm²} ( ∑_σ e^{β(H+Jm)σ} )^N

  = ( N/2πθ )^{1/2} ∫_{−∞}^{∞} dm e^{−NA(m)/θ} ,

where θ = k_B T/J, h = H/J, and

A(m) = ½ m² − θ ln[ 2 cosh( (h + m)/θ ) ] .   (7.3.31)

Since N → ∞, we can perform the integral using the method of steepest descents. Thus, we must set

dA/dm |_{m*} = 0  ⟹  m* = tanh( (m* + h)/θ ) .   (7.3.32)

Expanding about m = m*, we write

A(m) = A(m*) + ½ A″(m*) (m − m*)² + ⅙ A‴(m*) (m − m*)³ + … .   (7.3.33)

Performing the integrations, we obtain

Z = ( N/2πθ )^{1/2} e^{−NA(m*)/θ} ∫_{−∞}^{∞} dν exp[ −(N A″(m*)/2θ) ν² − (N A‴(m*)/6θ) ν³ + … ]

  = ( A″(m*) )^{−1/2} e^{−NA(m*)/θ} · { 1 + O(N⁻¹) } ,

where ν = m − m*. The corresponding free energy per site is

f = F/NJ = A(m*) + (θ/2N) ln A″(m*) + O(N⁻²) ,   (7.3.34)

where m* is the solution to the mean field equation which minimizes A(m). Mean field theory is exact for this model!
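One can watch this exactness emerge already at modest N by comparing the exact free energy per site (by direct enumeration of all 2^N spin states) against the saddle point value A(m*). A sketch with illustrative parameter values:

```python
import numpy as np
from itertools import product

J0, H0, T0 = 1.0, 0.1, 0.8       # illustrative values, with k_B = 1
theta, h = T0/J0, H0/J0

def A(m):
    # Equation 7.3.31
    return 0.5*m**2 - theta*np.log(2.0*np.cosh((h + m)/theta))

# exact free energy per site for small N by enumeration
N = 12
Z = 0.0
for spins in product((-1, 1), repeat=N):
    s = sum(spins)
    E = -J0*s*s/(2.0*N) - H0*s
    Z += np.exp(-E/T0)
f_exact = -T0*np.log(Z)/(N*J0)

# saddle point estimate: minimum of A(m) on a dense grid
f_saddle = A(np.linspace(-1.5, 1.5, 30001)).min()
print(f_exact, f_saddle)  # differ only by the O(ln N / N) correction
```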

This page titled 7.3: Mean Field Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

7.4: Variational Density Matrix Method
The variational principle
Suppose we are given a Hamiltonian Ĥ. From this we construct the free energy, F:

F = E − TS = Tr(ϱĤ) + k_B T Tr(ϱ ln ϱ) .

Here, ϱ is the density matrix¹³. A physical density matrix must be (i) normalized (Tr ϱ = 1), (ii) Hermitian, and (iii) non-negative definite (i.e. all the eigenvalues of ϱ must be non-negative).
Our goal is to extremize the free energy subject to the various constraints on ϱ. Let us assume that ϱ is diagonal in the basis of eigenstates of Ĥ,

ϱ = ∑_γ P_γ |γ⟩⟨γ| ,   (7.4.1)

where P_γ is the probability that the system is in state |γ⟩. Then

F = ∑_γ E_γ P_γ + k_B T ∑_γ P_γ ln P_γ .   (7.4.2)

Thus, the free energy is a function of the set {P_γ}. We now extremize F subject to the normalization constraint. This means we form the extended function

F*({P_γ}, λ) = F({P_γ}) + λ( ∑_γ P_γ − 1 ) ,   (7.4.3)

and then freely extremize over both the probabilities {P_γ} as well as the Lagrange multiplier λ. This yields the Boltzmann distribution,

P_γ^eq = (1/Z) exp(−E_γ/k_B T) ,   (7.4.4)

where Z = ∑_γ e^{−E_γ/k_B T} = Tr e^{−Ĥ/k_B T} is the canonical partition function, which is related to λ through

λ = k_B T (ln Z − 1) .   (7.4.5)

Note that the Boltzmann weights are, appropriately, all positive.
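The claim that the Boltzmann distribution minimizes F over all normalized distributions can be illustrated numerically. A sketch with an arbitrary three-level spectrum (illustrative values, k_B = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
E = np.array([0.0, 1.0, 2.0])   # illustrative spectrum, k_B = 1
T = 0.7

def free_energy(P):
    # F = sum_g E_g P_g + T sum_g P_g ln P_g   (Equation 7.4.2)
    return np.sum(E*P) + T*np.sum(P*np.log(P))

P_eq = np.exp(-E/T)
P_eq /= P_eq.sum()
F_eq = free_energy(P_eq)

# random normalized distributions never beat the Boltzmann distribution
for _ in range(1000):
    P = rng.random(3) + 1e-12   # tiny shift keeps the logarithm finite
    P /= P.sum()
    assert free_energy(P) >= F_eq - 1e-12
print("Boltzmann distribution minimizes F")
```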


If the spectrum of Ĥ is bounded from below, our extremum should in fact yield a minimum for the free energy F. Furthermore, since we have freely minimized over all the probabilities, subject to the single normalization constraint, any distribution {P_γ} other than the equilibrium one must yield a greater value of F.

Alas, the Boltzmann distribution, while exact, is often intractable to evaluate. For one-dimensional systems, there are general methods such as the transfer matrix approach which do permit an exact evaluation of the free energy. However, beyond one dimension the situation is in general hopeless. A family of solvable (“integrable”) models exists in two dimensions, but their solutions require specialized techniques and are extremely difficult. The idea behind the variational density matrix approximation is to construct a tractable trial density matrix ϱ which depends on a set of variational parameters {x_α}, and to minimize with respect to this (finite) set.

Variational density matrix for the Ising model

Consider once again the Ising model Hamiltonian,

Ĥ = −∑_{i<j} J_ij σ_i σ_j − H ∑_i σ_i .   (7.4.6)

The states of the system |γ⟩ may be labeled by the values of the spin variables: |γ⟩ ⟷ |σ_1, σ_2, …⟩. We assume the density matrix is diagonal in this basis,

ϱ_N(γ | γ′) ≡ ϱ(γ) δ_{γ,γ′} ,   (7.4.7)

where

δ_{γ,γ′} = ∏_i δ_{σ_i, σ_i′} .   (7.4.8)

Indeed, this is the case for the exact density matrix, which is to say the Boltzmann weight,

ϱ_N(σ_1, σ_2, …) = (1/Z) e^{−βĤ(σ_1,…,σ_N)} .   (7.4.9)

We now write a trial density matrix which is a product over contributions from independent single sites:

ϱ_N(σ_1, σ_2, …) = ∏_i ϱ(σ_i) ,   (7.4.10)

where

ϱ(σ) = ( (1 + m)/2 ) δ_{σ,1} + ( (1 − m)/2 ) δ_{σ,−1} .   (7.4.11)

Note that we’ve changed our notation slightly. We are denoting by ϱ(σ) the corresponding diagonal element of the matrix

ϱ = ( (1+m)/2      0
         0      (1−m)/2 ) ,   (7.4.12)

and the full density matrix is a tensor product over the single site matrices:

ϱ_N = ϱ ⊗ ϱ ⊗ ⋯ ⊗ ϱ .   (7.4.13)

Note that ϱ and hence ϱ_N are appropriately normalized. The variational parameter here is m, which, if ϱ is to be non-negative definite, must satisfy −1 ≤ m ≤ 1. The quantity m has the physical interpretation of the average spin on any given site, since

⟨σ_i⟩ = ∑_σ ϱ(σ) σ = m .   (7.4.14)

We may now evaluate the average energy:

E = Tr(ϱ_N Ĥ) = −∑_{i<j} J_ij m² − H ∑_i m = −½ N Ĵ(0) m² − N H m ,

where once again Ĵ(0) is the discrete Fourier transform of J(R) at wavevector q = 0. The entropy is given by

S = −k_B Tr(ϱ_N ln ϱ_N) = −N k_B Tr(ϱ ln ϱ)
  = −N k_B { ( (1+m)/2 ) ln( (1+m)/2 ) + ( (1−m)/2 ) ln( (1−m)/2 ) } .

We now define the dimensionless free energy per site: f ≡ F/NĴ(0). We have

f(m, h, θ) = −½ m² − hm + θ { ( (1+m)/2 ) ln( (1+m)/2 ) + ( (1−m)/2 ) ln( (1−m)/2 ) } ,   (7.4.15)

where θ ≡ k_B T/Ĵ(0) is the dimensionless temperature, and h ≡ H/Ĵ(0) the dimensionless magnetic field, as before. We extremize f(m) by setting

∂f/∂m = 0 = −m − h + (θ/2) ln( (1 + m)/(1 − m) ) .   (7.4.16)

Solving for m, we obtain

m = tanh( (m + h)/θ ) ,   (7.4.17)

which is precisely the mean field equation we found earlier (Equation 7.3.9).
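As a numerical sanity check (a sketch, not from the text), one can minimize the variational free energy of Equation 7.4.15 on a dense grid and confirm that the minimizer satisfies the self-consistency condition:

```python
import numpy as np

def f_var(m, h, theta):
    # Equation 7.4.15, the variational free energy per site
    return (-0.5*m**2 - h*m
            + theta*((1 + m)/2*np.log((1 + m)/2)
                     + (1 - m)/2*np.log((1 - m)/2)))

theta, h = 0.75, 0.0
m_grid = np.linspace(-0.999, 0.999, 200001)
m_min = m_grid[np.argmin(f_var(m_grid, h, theta))]

# the grid minimizer obeys m = tanh((m + h)/theta) up to grid resolution;
# at h = 0 either of the two degenerate minima +/- m may be found
print(m_min, np.tanh((m_min + h)/theta))
```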

[ferg] Variational field free energy Δf = f (m, h, θ) + θ ln 2 versus magnetization m at six equally spaced temperatures
interpolating between ‘high’ (θ = 1.25 , red) and ‘low’ (θ = 0.75 , blue) values. Top panel: h = 0 . Bottom panel: h = 0.06 .
Note that the optimal value of $m$ indeed satisfies the requirement $|m| \le 1$ of non-negative probability. This nonlinear equation may be solved graphically. For $h = 0$, the unmagnetized solution $m = 0$ always applies. However, for $\theta < 1$ there are two additional solutions at $m = \pm m_A(\theta)$, with $m_A(\theta) = \sqrt{3(1-\theta)} + \mathcal{O}\big((1-\theta)^{3/2}\big)$ for $\theta$ close to (but less than) one. These solutions, which are related by the $\mathbb{Z}_2$ symmetry of the $h = 0$ model, are in fact the low energy solutions. This is shown clearly in figure [ferg], where the variational free energy $f(m, \theta)$ is plotted as a function of $m$ for a range of temperatures interpolating between 'high' and 'low' values. At the critical temperature $\theta_c = 1$, the lowest energy state changes from being unmagnetized (high temperature) to magnetized (low temperature).


For $h > 0$, there is no longer a $\mathbb{Z}_2$ symmetry ($\sigma_i \to -\sigma_i \ \forall\, i$). The high temperature solution now has $m > 0$ (or $m < 0$ if $h < 0$), and this smoothly varies as $\theta$ is lowered, approaching the completely polarized limit $m = 1$ as $\theta \to 0$. At very high temperatures, the argument of the tanh function is small, and we may approximate $\tanh(x) \simeq x$, in which case

$$m(h, \theta) = \frac{h}{\theta - \theta_c} \quad . \quad (7.4.18)$$

This is called the Curie-Weiss law. One can infer $\theta_c$ from the high temperature susceptibility $\chi(\theta) = (\partial m / \partial h)_{h=0}$ by plotting $\chi^{-1}$ versus $\theta$ and extrapolating to obtain the $\theta$-intercept. In our case, $\chi(\theta) = (\theta - \theta_c)^{-1}$. For low $\theta$ and weak $h$, there are two inequivalent minima in the free energy.
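The Curie-Weiss form of the susceptibility can be checked numerically by differencing the converged mean field solution at small $\pm h$; this is an illustrative sketch (the helper name and parameters are invented), not from the text.

```python
import numpy as np

def m_of_h(theta, h, n_iter=400):
    """Converged solution of m = tanh((m + h)/theta), valid for theta > 1."""
    m = 0.0
    for _ in range(n_iter):
        m = np.tanh((m + h) / theta)
    return m

eps = 1e-6
for theta in (1.5, 2.0, 3.0):
    chi = (m_of_h(theta, eps) - m_of_h(theta, -eps)) / (2 * eps)
    print(theta, chi, 1.0 / (theta - 1.0))   # chi matches (theta - theta_c)^{-1}
```

Plotting the computed $\chi^{-1}$ against $\theta$ and extrapolating the straight line to zero recovers the intercept $\theta_c = 1$.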


When $m$ is small, it is appropriate to expand $f(m, h, \theta)$, obtaining

$$f(m, h, \theta) = -\theta \ln 2 - h m + \tfrac{1}{2}\, (\theta - 1)\, m^2 + \frac{\theta}{12}\, m^4 + \frac{\theta}{30}\, m^6 + \frac{\theta}{56}\, m^8 + \ldots \quad . \quad (7.4.19)$$

This is known as the Landau expansion of the free energy in terms of the order parameter $m$. An order parameter is a thermodynamic variable $\phi$ which distinguishes ordered and disordered phases. Typically $\phi = 0$ in the disordered (high temperature) phase, and $\phi \ne 0$ in the ordered (low temperature) phase. When the order sets in continuously, when $\phi$ is continuous across $\theta_c$, the phase transition is said to be second order. When $\phi$ changes abruptly, the transition is first order. It is also quite
commonplace to observe phase transitions between two ordered states. For example, a crystal, which is an ordered state, may
change its lattice structure, say from a high temperature tetragonal phase to a low temperature orthorhombic phase. When the high
T phase possesses the same symmetries as the low T phase, as in the tetragonal-to-orthorhombic example, the transition may be

second order. When the two symmetries are completely unrelated, for example in a hexagonal-to-tetragonal transition, or in a
transition between a ferromagnet and an antiferromagnet, the transition is in general first order.
Throughout this discussion, we have assumed that the interactions $J_{ij}$ are predominantly ferromagnetic, $J_{ij} > 0$, so that all the spins prefer to align. When $J_{ij} < 0$, the interaction is said to be antiferromagnetic and prefers anti-alignment of the spins ($\sigma_i \sigma_j = -1$). Clearly not every pair of spins can be anti-aligned – there are two possible spin states and a thermodynamically extensive number of spins. But on the square lattice, for example, if the only interactions $J_{ij}$ are between nearest neighbors and the interactions are antiferromagnetic, then the lowest energy configuration ($T = 0$ ground state) will be one in which spins on opposite sublattices are anti-aligned. The square lattice is bipartite – it breaks up into two interpenetrating sublattices A and B (which are themselves square lattices, rotated by $45^\circ$ with respect to the original, and with a larger lattice constant by a factor of $\sqrt{2}$), such that any site in A has nearest neighbors in B, and vice versa. The honeycomb lattice is another example of a bipartite

lattice. So is the simple cubic lattice. The triangular lattice, however, is not bipartite (it is tripartite). Consequently, with nearest
neighbor antiferromagnetic interactions, the triangular lattice Ising model is highly frustrated. The moral of the story is this:
antiferromagnetic interactions can give rise to complicated magnetic ordering, and, when frustrated by the lattice geometry, may
have finite specific entropy even at T = 0 .
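Frustration can be made explicit on the smallest non-bipartite cluster, a single antiferromagnetic triangle. The enumeration below is a toy illustration (not from the text): one of the three bonds is always frustrated, and the ground state is six-fold degenerate rather than two-fold.

```python
from itertools import product

# One antiferromagnetic triangle: E = -J (s1 s2 + s2 s3 + s3 s1) with J = -1.
J = -1.0
energies = {s: -J * (s[0]*s[1] + s[1]*s[2] + s[2]*s[0])
            for s in product([-1, +1], repeat=3)}
E0 = min(energies.values())
ground = [s for s, E in energies.items() if E == E0]
print(E0, len(ground))   # -1.0 6
```

A fully satisfied triangle would have $E = -3|J|$; the best achievable is $E = -|J|$, and the macroscopic version of this degeneracy is the finite $T = 0$ entropy mentioned above.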

Mean Field Theory of the Potts Model


The Hamiltonian for the Potts model is

$$\hat H = -\sum_{i<j} J_{ij}\, \delta_{\sigma_i, \sigma_j} - H \sum_i \delta_{\sigma_i, 1} \quad . \quad (7.4.20)$$

Here, $\sigma_i \in \{1, \ldots, q\}$, with integer $q$. This is the so-called 'q-state Potts model'. The quantity $H$ is analogous to an external magnetic field, and preferentially aligns (for $H > 0$) the local spins in the $\sigma = 1$ direction. We will assume $H \ge 0$.
The q-component set is conveniently taken to be the integers from 1 to q, but it could be anything, such as
$$\sigma_i \in \{\text{tomato, penny, ostrich, Grateful Dead ticket from 1987}, \ldots\} \quad . \quad (7.4.21)$$

The interaction energy is $-J_{ij}$ if sites $i$ and $j$ contain the same object ($q$ possibilities), and $0$ if $i$ and $j$ contain different objects ($q^2 - q$ possibilities).

The two-state Potts model is equivalent to the Ising model. Let the allowed values of $\sigma$ be $\pm 1$. Then the quantity

$$\delta_{\sigma,\sigma'} = \tfrac{1}{2} + \tfrac{1}{2}\, \sigma \sigma'$$

equals $1$ if $\sigma = \sigma'$, and is zero otherwise. The three-state Potts model cannot be written as a simple three-state Ising model, one with a bilinear interaction $\sigma \sigma'$ where $\sigma \in \{-1, 0, +1\}$. However, it is straightforward to verify the identity

$$\delta_{\sigma,\sigma'} = 1 + \tfrac{1}{2}\, \sigma \sigma' + \tfrac{3}{2}\, \sigma^2 \sigma'^2 - (\sigma^2 + \sigma'^2) \quad . \quad (7.4.22)$$

Thus, the $q = 3$-state Potts model is equivalent to a $S = 1$ (three-state) Ising model which includes both bilinear ($\sigma \sigma'$) and biquadratic ($\sigma^2 \sigma'^2$) interactions, as well as a local field term which couples to the square of the spin, $\sigma^2$. In general one can find such correspondences for higher $q$ Potts models, but, as should be expected, the interactions become increasingly complex, with bi-cubic, bi-quartic, bi-quintic, terms. Such a formulation, however, obscures the beautiful $S_q$ symmetry inherent in the model, where $S_q$ is the permutation group on $q$ symbols, which has $q!$ elements.

Getting back to the mean field theory, we write the single site variational density matrix $\varrho$ as a diagonal matrix with entries

$$\varrho(\sigma) = x\, \delta_{\sigma,1} + \Big(\frac{1-x}{q-1}\Big)\, (1 - \delta_{\sigma,1}) \quad , \quad (7.4.23)$$

with $\varrho_N(\sigma_1, \ldots, \sigma_N) = \varrho(\sigma_1) \cdots \varrho(\sigma_N)$. Note that $\mathrm{Tr}\,(\varrho) = 1$. The variational parameter is $x$. When $x = q^{-1}$, all states are equally probable. But for $x > q^{-1}$, the state $\sigma = 1$ is preferred, and the other $(q-1)$ states have identical but smaller probabilities.

It is a simple matter to compute the energy and entropy:

$$E = \mathrm{Tr}\,(\varrho_N \hat H) = -\tfrac{1}{2} N \hat J(0) \left\{ x^2 + \frac{(1-x)^2}{q-1} \right\} - N H x$$

$$S = -k_B\, \mathrm{Tr}\,(\varrho_N \ln \varrho_N) = -N k_B \left\{ x \ln x + (1-x) \ln\Big(\frac{1-x}{q-1}\Big) \right\} \quad .$$

The dimensionless free energy per site is then

$$f(x, \theta, h) = -\tfrac{1}{2} \left\{ x^2 + \frac{(1-x)^2}{q-1} \right\} + \theta \left\{ x \ln x + (1-x) \ln\Big(\frac{1-x}{q-1}\Big) \right\} - h x \quad , \quad (7.4.24)$$

where $h = H / \hat J(0)$. We now extremize with respect to $x$ to obtain the mean field equation,

$$\frac{\partial f}{\partial x} = 0 = -x + \frac{1-x}{q-1} + \theta \ln x - \theta \ln\Big(\frac{1-x}{q-1}\Big) - h \quad . \quad (7.4.25)$$
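The first order character of the $q = 3$ transition can be seen directly by minimizing the free energy (7.4.24) on a grid of $x$ values; the temperatures below are chosen for illustration only, bracketing the transition (which for $q = 3$ sits near $\theta \approx 0.36$ in these units).

```python
import numpy as np

def f_potts(x, theta, q, h=0.0):
    """Dimensionless Potts mean field free energy per site, Eq. 7.4.24."""
    return (-0.5 * (x**2 + (1 - x)**2 / (q - 1))
            + theta * (x * np.log(x) + (1 - x) * np.log((1 - x) / (q - 1)))
            - h * x)

q = 3
xs = np.linspace(1e-6, 1 - 1e-6, 200001)
for theta in (0.40, 0.36, 0.32):
    s = xs[np.argmin(f_potts(xs, theta, q))] - 1.0 / q
    print(theta, s)   # the order parameter s = x - 1/q turns on discontinuously
```

Unlike the Ising case, $s$ jumps from zero to a finite value as $\theta$ is lowered, rather than growing continuously.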

Note that for $h = 0$, $x = q^{-1}$ is a solution, corresponding to a disordered state in which all states are equally probable. At high temperatures, for small $h$, we expect $x - q^{-1} \propto h$. Indeed, using Mathematica one can set

$$x \equiv q^{-1} + s \quad , \quad (7.4.26)$$

and expand the mean field equation in powers of s . One obtains


$$h = \frac{q\,(q\theta - 1)}{q-1}\, s + \frac{q^3\, (q-2)\, \theta}{2\,(q-1)^2}\, s^2 + \mathcal{O}(s^3) \quad . \quad (7.4.27)$$

For weak fields, $|h| \ll 1$, and we have

$$s(\theta) = \frac{(q-1)\, h}{q\,(q\theta - 1)} + \mathcal{O}(h^2) \quad , \quad (7.4.28)$$

which again is of the Curie-Weiss form. The difference $s = x - q^{-1}$ is the order parameter for the transition.
Finally, one can expand the free energy in powers of $s$, obtaining the Landau expansion,

$$f(s, \theta, h) = -\frac{2h+1}{2q} - \theta \ln q - h s + \frac{q\,(q\theta - 1)}{2\,(q-1)}\, s^2 - \frac{(q-2)\, q^3\, \theta}{6\,(q-1)^2}\, s^3$$
$$\qquad + \frac{q^3 \theta}{12} \left[ 1 + (q-1)^{-3} \right] s^4 - \frac{q^4 \theta}{20} \left[ 1 - (q-1)^{-4} \right] s^5 + \frac{q^5 \theta}{30} \left[ 1 + (q-1)^{-5} \right] s^6 + \ldots \quad .$$
30

Note that, for $q = 2$, the coefficients of $s^3$, $s^5$, and higher order odd powers of $s$ vanish in the Landau expansion. This is consistent with what we found for the Ising model, and is related to the $\mathbb{Z}_2$ symmetry of that model. For $q \ge 3$, there is a cubic term in the mean field free energy, and thus we generically expect a first order transition, as we shall see below when we discuss Landau theory.

Mean Field Theory of the XY Model


Consider the so-called XY model, in which each site contains a continuous planar spin, represented by an angular variable $\phi_i \in [-\pi, \pi]$:

$$\hat H = -\tfrac{1}{2} \sum_{i \ne j} J_{ij} \cos(\phi_i - \phi_j) - H \sum_i \cos \phi_i \quad . \quad (7.4.29)$$

We write the (diagonal elements of the) full density matrix once again as a product:

$$\varrho_N(\phi_1, \phi_2, \ldots) = \prod_i \varrho(\phi_i) \quad . \quad (7.4.30)$$

Our goal will be to extremize the free energy with respect to the function $\varrho(\phi)$. To this end, we compute

$$E = \mathrm{Tr}\,(\varrho_N \hat H) = -\tfrac{1}{2} N \hat J(0)\, \big| \mathrm{Tr}\,(\varrho\, e^{i\phi}) \big|^2 - N H\, \mathrm{Tr}\,(\varrho \cos \phi) \quad . \quad (7.4.31)$$

The entropy is

$$S = -N k_B\, \mathrm{Tr}\,(\varrho \ln \varrho) \quad . \quad (7.4.32)$$

Note that for any function $A(\phi)$, we have$^{14}$

$$\mathrm{Tr}\,(\varrho A) \equiv \int\limits_{-\pi}^{\pi} \frac{d\phi}{2\pi}\, \varrho(\phi)\, A(\phi) \quad . \quad (7.4.33)$$

We now extremize the functional $F[\varrho(\phi)] = E - TS$ with respect to $\varrho(\phi)$, under the condition that $\mathrm{Tr}\, \varrho = 1$. We therefore use Lagrange's method of undetermined multipliers, writing

$$\tilde F = F - N k_B T\, \lambda\, (\mathrm{Tr}\, \varrho - 1) \quad . \quad (7.4.34)$$

Note that $\tilde F$ is a function of the Lagrange multiplier $\lambda$ and a functional of the density matrix $\varrho(\phi)$. The prefactor $N k_B T$ which multiplies $\lambda$ is of no mathematical consequence – we could always redefine the multiplier to be $\lambda' \equiv N k_B T \lambda$. It is present only to maintain homogeneity and proper dimensionality of $\tilde F$, with $\lambda$ itself dimensionless and of order $N^0$. We now have

$$\frac{\delta \tilde F}{\delta \varrho(\phi)} = \frac{\delta}{\delta \varrho(\phi)} \left\{ -\tfrac{1}{2} N \hat J(0)\, \big| \mathrm{Tr}\,(\varrho\, e^{i\phi}) \big|^2 - N H\, \mathrm{Tr}\,(\varrho \cos \phi) + N k_B T\, \mathrm{Tr}\,(\varrho \ln \varrho) - N k_B T\, \lambda\, (\mathrm{Tr}\, \varrho - 1) \right\} \quad .$$

To this end, we note that

$$\frac{\delta}{\delta \varrho(\phi)}\, \mathrm{Tr}\,(\varrho A) = \frac{\delta}{\delta \varrho(\phi)} \int\limits_{-\pi}^{\pi} \frac{d\phi'}{2\pi}\, \varrho(\phi')\, A(\phi') = \frac{1}{2\pi}\, A(\phi) \quad . \quad (7.4.35)$$

Thus, we have

$$\frac{\delta \tilde F}{\delta \varrho(\phi)} = -\tfrac{1}{2} N \hat J(0) \cdot \frac{1}{2\pi} \Big[ \mathrm{Tr}_{\phi'}\big(\varrho\, e^{i\phi'}\big)\, e^{-i\phi} + \mathrm{Tr}_{\phi'}\big(\varrho\, e^{-i\phi'}\big)\, e^{i\phi} \Big] - N H \cdot \frac{\cos \phi}{2\pi} + N k_B T \cdot \frac{1}{2\pi} \big[ \ln \varrho(\phi) + 1 \big] - N k_B T \cdot \frac{\lambda}{2\pi} \quad .$$

Now let us define

$$\mathrm{Tr}_\phi\big(\varrho\, e^{i\phi}\big) = \int\limits_{-\pi}^{\pi} \frac{d\phi}{2\pi}\, \varrho(\phi)\, e^{i\phi} \equiv m\, e^{i\phi_0} \quad . \quad (7.4.36)$$

We then have

$$\ln \varrho(\phi) = \frac{\hat J(0)}{k_B T}\, m \cos(\phi - \phi_0) + \frac{H}{k_B T} \cos \phi + \lambda - 1 \quad . \quad (7.4.37)$$

Clearly the free energy will be reduced if $\phi_0 = 0$ so that the mean field is maximal and aligns with the external field, which prefers $\phi = 0$. Thus, we conclude

$$\varrho(\phi) = C \exp\Big(\frac{H_{\rm eff}}{k_B T} \cos \phi\Big) \quad , \quad (7.4.38)$$

where

$$H_{\rm eff} = \hat J(0)\, m + H \quad (7.4.39)$$

and $C = e^{\lambda - 1}$. The value of $\lambda$ is then determined by invoking the constraint,
$$\mathrm{Tr}\, \varrho = 1 = C \int\limits_{-\pi}^{\pi} \frac{d\phi}{2\pi} \exp\Big(\frac{H_{\rm eff}}{k_B T} \cos \phi\Big) = C\, I_0\big(H_{\rm eff} / k_B T\big) \quad , \quad (7.4.40)$$

where $I_0(z)$ is the modified Bessel function. We are free to define $\varepsilon \equiv H_{\rm eff} / k_B T$, and treat $\varepsilon$ as our single variational parameter. We then have the normalized single site density matrix

$$\varrho(\phi) = \frac{\exp(\varepsilon \cos \phi)}{I_0(\varepsilon)} = \frac{\exp(\varepsilon \cos \phi)}{\int_{-\pi}^{\pi} \frac{d\phi'}{2\pi} \exp(\varepsilon \cos \phi')} \quad . \quad (7.4.41)$$

We next compute the following averages:

$$\langle e^{\pm i\phi} \rangle = \int\limits_{-\pi}^{\pi} \frac{d\phi}{2\pi}\, \varrho(\phi)\, e^{\pm i\phi} = \frac{I_1(\varepsilon)}{I_0(\varepsilon)}$$

$$\langle \cos(\phi - \phi') \rangle = \mathrm{Re}\, \big\langle e^{i\phi}\, e^{-i\phi'} \big\rangle = \left( \frac{I_1(\varepsilon)}{I_0(\varepsilon)} \right)^{\!2} \quad ,$$

as well as

$$\mathrm{Tr}\,(\varrho \ln \varrho) = \int\limits_{-\pi}^{\pi} \frac{d\phi}{2\pi}\, \frac{e^{\varepsilon \cos \phi}}{I_0(\varepsilon)} \left\{ \varepsilon \cos \phi - \ln I_0(\varepsilon) \right\} = \varepsilon\, \frac{I_1(\varepsilon)}{I_0(\varepsilon)} - \ln I_0(\varepsilon) \quad . \quad (7.4.42)$$

The dimensionless free energy per site is therefore

$$f(\varepsilon, h, \theta) = -\frac{1}{2} \left( \frac{I_1(\varepsilon)}{I_0(\varepsilon)} \right)^{\!2} + (\theta \varepsilon - h)\, \frac{I_1(\varepsilon)}{I_0(\varepsilon)} - \theta \ln I_0(\varepsilon) \quad , \quad (7.4.43)$$

with $\theta = k_B T / \hat J(0)$ and $h = H / \hat J(0)$ and $f = F / N \hat J(0)$ as before. Note that the mean field equation is $m = \theta \varepsilon - h = \langle e^{i\phi} \rangle$, i.e.

$$\theta \varepsilon - h = \frac{I_1(\varepsilon)}{I_0(\varepsilon)} \quad . \quad (7.4.44)$$

For small $\varepsilon$, we may expand the Bessel functions, using

$$I_\nu(z) = \big(\tfrac{1}{2} z\big)^{\nu} \sum_{k=0}^{\infty} \frac{\big(\tfrac{1}{4} z^2\big)^k}{k!\, \Gamma(k + \nu + 1)} \quad , \quad (7.4.45)$$

to obtain

$$f(\varepsilon, h, \theta) = \tfrac{1}{4} \big(\theta - \tfrac{1}{2}\big)\, \varepsilon^2 + \tfrac{1}{64}\, (2 - 3\theta)\, \varepsilon^4 - \tfrac{1}{2}\, h \varepsilon + \tfrac{1}{16}\, h \varepsilon^3 + \ldots \quad . \quad (7.4.46)$$
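As a numerical cross-check (leaning on scipy's modified Bessel functions, an assumption about available tooling), the small-$\varepsilon$ series (7.4.46) tracks the full Bessel-function free energy (7.4.43) to the stated order:

```python
import numpy as np
from scipy.special import i0, i1

def f_exact(e, h, theta):
    """Full variational free energy per site, Eq. 7.4.43."""
    g = i1(e) / i0(e)
    return -0.5 * g**2 + (theta * e - h) * g - theta * np.log(i0(e))

def f_series(e, h, theta):
    """Small-epsilon Landau expansion, Eq. 7.4.46."""
    return (0.25 * (theta - 0.5) * e**2 + (2 - 3*theta) * e**4 / 64
            - 0.5 * h * e + h * e**3 / 16)

e, h, theta = 0.1, 0.02, 0.7
print(f_exact(e, h, theta) - f_series(e, h, theta))   # residual is higher order in e
```

The residual shrinks rapidly as $\varepsilon \to 0$, consistent with the omitted $\mathcal{O}(\varepsilon^5)$ terms.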

This predicts a second order phase transition at $\theta_c = \tfrac{1}{2}$.$^{15}$ Note also the Curie-Weiss form of the susceptibility at high $\theta$:

$$\frac{\partial f}{\partial \varepsilon} = 0 \implies \varepsilon = \frac{h}{\theta - \theta_c} + \ldots \quad . \quad (7.4.47)$$
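The mean field equation (7.4.44) can be iterated directly using scipy's Bessel functions (again an assumption about available tooling; the function name is invented), confirming $\theta_c = \tfrac{1}{2}$ at $h = 0$:

```python
import numpy as np
from scipy.special import i0, i1

def m_xy(theta, h=0.0, n_iter=800):
    """Fixed-point iteration of theta*eps - h = I1(eps)/I0(eps)."""
    eps = 1.0
    for _ in range(n_iter):
        eps = (i1(eps) / i0(eps) + h) / theta
    return theta * eps - h   # m = <cos(phi)>

print(m_xy(0.6), m_xy(0.4))   # m = 0 above theta_c = 1/2, m > 0 below
```

Note that the XY order parameter saturates more slowly than the Ising one, since even the fully converged $I_1/I_0$ ratio approaches unity only as $\varepsilon \to \infty$.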

XY model via neglect of fluctuations method


Consider again the Hamiltonian of Equation [XYmodel]. Define $z_i \equiv \exp(i \phi_i)$ and write

$$z_i = w + \delta z_i \quad , \quad (7.4.48)$$

where $w \equiv \langle z_i \rangle$ and $\delta z_i \equiv z_i - w$. Of course we also have the complex conjugate relations $z_i^* = w^* + \delta z_i^*$ and $w^* = \langle z_i^* \rangle$. Writing $\cos(\phi_i - \phi_j) = \mathrm{Re}\,(z_i z_j^*)$, by neglecting the terms proportional to $\delta z_i\, \delta z_j^*$ in $\hat H$ we arrive at the mean field Hamiltonian,

$$\hat H^{\rm MF} = \tfrac{1}{2} N \hat J(0)\, |w|^2 - \tfrac{1}{2} \hat J(0) \sum_i \big( w^* z_i + w z_i^* \big) - \tfrac{1}{2} H \sum_i \big( z_i^* + z_i \big) \quad .$$

It is clear that the free energy will be minimized if the mean field $w$ breaks the O(2) symmetry in the same direction as the external field $H$, which means $w \in \mathbb{R}$ and

$$\hat H^{\rm MF} = \tfrac{1}{2} N \hat J(0)\, |w|^2 - \big( H + \hat J(0)\, |w| \big) \sum_i \cos \phi_i \quad .$$

The dimensionless free energy per site is then

$$f = \tfrac{1}{2}\, |w|^2 - \theta \ln I_0\Big(\frac{h + |w|}{\theta}\Big) \quad . \quad (7.4.49)$$

Differentiating with respect to $|w|$, one obtains

7.4.7 https://phys.libretexts.org/@go/page/18586
$$|w| \equiv m = \frac{I_1\big(\frac{h+m}{\theta}\big)}{I_0\big(\frac{h+m}{\theta}\big)} \quad , \quad (7.4.50)$$

which is the same equation as Equation [XYvdm]. The two mean field theories yield the same results in every detail (see §10).

This page titled 7.4: Variational Density Matrix Method is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

7.5: Landau Theory of Phase Transitions
Landau's theory of phase transitions is based on an expansion of the free energy of a thermodynamic system in terms of an order parameter, which is nonzero in an ordered phase and zero in a disordered phase. For example, the magnetization $M$ of a ferromagnet in zero external field but at finite temperature typically vanishes for temperatures $T > T_c$, where $T_c$ is the critical temperature, also called the Curie temperature in a ferromagnet. A low order expansion in powers of the order parameter is appropriate sufficiently close to the phase transition, at temperatures such that the order parameter, if nonzero, is still small.

[Landau_a] Phase diagram for the quartic Landau free energy $f = f_0 + \tfrac{1}{2} a m^2 + \tfrac{1}{4} b m^4 - hm$, with $b > 0$. There is a first order line at $h = 0$ extending from $a = -\infty$ and terminating in a critical point at $a = 0$. For $|h| < h^*(a)$ (dashed red line) there are three solutions to the mean field equation, corresponding to one global minimum, one local minimum, and one local maximum. Insets show behavior of the free energy $f(m)$.

Quartic free energy with Ising symmetry


The simplest example is the quartic free energy,

$$f(m, h = 0, \theta) = f_0 + \tfrac{1}{2}\, a m^2 + \tfrac{1}{4}\, b m^4 \quad , \quad (7.5.1)$$

where $f_0 = f_0(\theta)$, $a = a(\theta)$, and $b = b(\theta)$. Here, $\theta$ is a dimensionless measure of the temperature. If for example the local exchange energy in the ferromagnet is $J$, then we might define $\theta = k_B T / zJ$, as before. Let us assume $b > 0$, which is necessary if the free energy is to be bounded from below.$^{16}$ The equation of state,

$$\frac{\partial f}{\partial m} = 0 = a m + b m^3 \quad , \quad (7.5.2)$$

has three solutions in the complex $m$ plane: (i) $m = 0$, (ii) $m = \sqrt{-a/b}$, and (iii) $m = -\sqrt{-a/b}$. The latter two solutions lie along the (physical) real axis if $a < 0$. We assume that there exists a unique temperature $\theta_c$ where $a(\theta_c) = 0$. Minimizing $f$, we find

$$\theta < \theta_c: \quad f(\theta) = f_0 - \frac{a^2}{4b}$$
$$\theta > \theta_c: \quad f(\theta) = f_0 \quad .$$
θ > θc : f (θ) = f0 .

The free energy is continuous at $\theta_c$ since $a(\theta_c) = 0$. The specific heat, however, is discontinuous across the transition, with

$$c(\theta_c^+) - c(\theta_c^-) = -\theta_c\, \frac{\partial^2}{\partial \theta^2} \bigg( \frac{a^2}{4b} \bigg) \bigg|_{\theta = \theta_c} = -\frac{\theta_c\, \big[ a'(\theta_c) \big]^2}{2\, b(\theta_c)} \quad . \quad (7.5.3)$$

The presence of a magnetic field $h$ breaks the $\mathbb{Z}_2$ symmetry of $m \to -m$. The free energy becomes

$$f(m, h, \theta) = f_0 + \tfrac{1}{2}\, a m^2 + \tfrac{1}{4}\, b m^4 - hm \quad , \quad (7.5.4)$$

and the mean field equation is

$$b m^3 + a m - h = 0 \quad . \quad (7.5.5)$$

This is a cubic equation for $m$ with real coefficients, and as such it can either have three real solutions or one real solution and two complex solutions related by complex conjugation. Clearly we must have $a < 0$ in order to have three real roots, since $b m^3 + a m$ is monotonically increasing otherwise. The boundary between these two classes of solution sets occurs when two roots coincide, which means $f''(m) = 0$ as well as $f'(m) = 0$. Simultaneously solving these two equations, we find

$$h^*(a) = \pm\, \frac{2}{3^{3/2}}\, \frac{(-a)^{3/2}}{b^{1/2}} \quad , \quad (7.5.6)$$

or, equivalently,

$$a^*(h) = -\frac{3}{2^{2/3}}\, b^{1/3}\, |h|^{2/3} \quad . \quad (7.5.7)$$

If, for fixed $h$, we have $a < a^*(h)$, then there will be three real solutions to the mean field equation $f'(m) = 0$, one of which is a global minimum (the one for which $m \cdot h > 0$). For $a > a^*(h)$ there is only a single global minimum, at which $m$ also has the same sign as $h$. If we solve the mean field equation perturbatively in $h/a$, we find

$$m(a, h) = \frac{h}{a} - \frac{b}{a^4}\, h^3 + \mathcal{O}(h^5) \qquad (a > 0)$$
$$\phantom{m(a, h)} = \pm \frac{|a|^{1/2}}{b^{1/2}} + \frac{h}{2\,|a|} \pm \frac{3\, b^{1/2}}{8\, |a|^{5/2}}\, h^2 + \mathcal{O}(h^3) \qquad (a < 0) \quad .$$
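A quick numerical comparison (illustrative values, not from the text) of the exact cubic roots of Eq. 7.5.5 with the perturbative expansion for $a > 0$:

```python
import numpy as np

a, b, h = 2.0, 1.0, 0.1                  # a > 0: expect a single real root
roots = np.roots([b, 0.0, a, -h])        # b m^3 + 0 m^2 + a m - h = 0
real = roots[np.abs(roots.imag) < 1e-9].real
m_pert = h / a - b * h**3 / a**4         # perturbative solution through O(h^3)
print(real, m_pert)
```

For these parameters the single real root and the truncated series agree to roughly the neglected $\mathcal{O}(h^5)$ accuracy.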

Cubic terms in Landau theory : first order transitions


Next, consider a free energy with a cubic term,

$$f = f_0 + \tfrac{1}{2}\, a m^2 - \tfrac{1}{3}\, y m^3 + \tfrac{1}{4}\, b m^4 \quad , \quad (7.5.8)$$

with $b > 0$ for stability. Without loss of generality, we may assume $y > 0$ (else send $m \to -m$). Note that we no longer have $m \to -m$ ($\mathbb{Z}_2$) symmetry. The cubic term favors positive $m$. What is the phase diagram in the $(a, y)$ plane?

Extremizing the free energy with respect to $m$, we obtain

$$\frac{\partial f}{\partial m} = 0 = a m - y m^2 + b m^3 \quad . \quad (7.5.9)$$

This cubic equation factorizes into a linear and quadratic piece, and hence may be solved simply. The three solutions are $m = 0$ and

$$m = m_\pm \equiv \frac{y}{2b} \pm \sqrt{ \Big(\frac{y}{2b}\Big)^{\!2} - \frac{a}{b} } \quad . \quad (7.5.10)$$

We now see that for $y^2 < 4ab$ there is only one real solution, at $m = 0$, while for $y^2 > 4ab$ there are three real solutions. Which solution has lowest free energy? To find out, we compare the energy $f(0)$ with $f(m_+)$.$^{17}$ Thus, we set

$$f(m) = f(0) \implies \tfrac{1}{2}\, a m^2 - \tfrac{1}{3}\, y m^3 + \tfrac{1}{4}\, b m^4 = 0 \quad , \quad (7.5.11)$$

and we now have two quadratic equations to solve simultaneously:

$$0 = a - y m + b m^2$$
$$0 = \tfrac{1}{2}\, a - \tfrac{1}{3}\, y m + \tfrac{1}{4}\, b m^2 \quad .$$
2 3 4

[quartic] Behavior of the quartic free energy $f(m) = \tfrac{1}{2} a m^2 - \tfrac{1}{3} y m^3 + \tfrac{1}{4} b m^4$. A: $y^2 < 4ab$; B: $4ab < y^2 < \tfrac{9}{2} ab$; C and D: $y^2 > \tfrac{9}{2} ab$. The thick black line denotes a line of first order transitions, where the order parameter is discontinuous across the transition.
Eliminating the quadratic term gives $m = 3a/y$. Finally, substituting $m = m_+$ gives us a relation between $a$, $b$, and $y$:

$$y^2 = \tfrac{9}{2}\, a b \quad . \quad (7.5.12)$$
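The degeneracy condition $y^2 = \tfrac{9}{2} ab$ can be verified directly for sample values (a minimal numerical check, with arbitrary illustrative coefficients):

```python
import numpy as np

a, b = 1.0, 2.0
y = np.sqrt(4.5 * a * b)                          # enforce y^2 = 9ab/2
f = lambda m: 0.5*a*m**2 - y*m**3/3 + 0.25*b*m**4
m_plus = y/(2*b) + np.sqrt((y/(2*b))**2 - a/b)
print(m_plus, f(m_plus))   # 1.0 0.0: f(m+) = f(0), and m+ = 3a/y = 1
```

At exactly this $y$ the two minima are degenerate, which is why it marks the first order transition line.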
Thus, we have the following:

$$a > \frac{y^2}{4b}: \quad \text{1 real root } m = 0$$
$$\frac{y^2}{4b} > a > \frac{2y^2}{9b}: \quad \text{3 real roots; minimum at } m = 0$$
$$\frac{2y^2}{9b} > a: \quad \text{3 real roots; minimum at } m = m_+ = \frac{y}{2b} + \sqrt{\Big(\frac{y}{2b}\Big)^{\!2} - \frac{a}{b}}$$

The solution $m = 0$ lies at a local minimum of the free energy for $a > 0$ and at a local maximum for $a < 0$. Over the range $\frac{y^2}{4b} > a > \frac{2y^2}{9b}$, then, there is a global minimum at $m = 0$, a local minimum at $m = m_+$, and a local maximum at $m = m_-$, with $m_+ > m_- > 0$. For $\frac{2y^2}{9b} > a > 0$, there is a local minimum at $m = 0$, a global minimum at $m = m_+$, and a local maximum at $m = m_-$, again with $m_+ > m_- > 0$. For $a < 0$, there is a local maximum at $m = 0$, a local minimum at $m = m_-$, and a global minimum at $m = m_+$, with $m_+ > 0 > m_-$. See Figure [quartic].

With $y = 0$, we have a second order transition at $a = 0$. With $y \ne 0$, there is a discontinuous (first order) transition at $a_c = 2y^2/9b > 0$ and $m_c = 2y/3b$. This occurs before $a$ reaches the value $a = 0$ where the curvature at $m = 0$ turns negative. If we write $a = \alpha(T - T_0)$, then the expected second order transition at $T = T_0$ is preempted by a first order transition at $T_c = T_0 + 2y^2/9\alpha b$.

Magnetization dynamics
Suppose we now impose some dynamics on the system, of the simple relaxational type

$$\frac{\partial m}{\partial t} = -\Gamma\, \frac{\partial f}{\partial m} \quad , \quad (7.5.13)$$

where $\Gamma$ is a phenomenological kinetic coefficient. Assuming $y > 0$ and $b > 0$, it is convenient to adimensionalize by writing

$$m \equiv \frac{y}{b} \cdot u \quad , \qquad a \equiv \frac{y^2}{b} \cdot r \quad , \qquad t \equiv \frac{b}{\Gamma y^2} \cdot s \quad . \quad (7.5.14)$$

Then we obtain

$$\frac{\partial u}{\partial s} = -\frac{\partial \varphi}{\partial u} \quad , \quad (7.5.15)$$

where the dimensionless free energy function is

$$\varphi(u) = \tfrac{1}{2}\, r u^2 - \tfrac{1}{3}\, u^3 + \tfrac{1}{4}\, u^4 \quad . \quad (7.5.16)$$

We see that there is a single control parameter, $r$. The fixed points of the dynamics are then the stationary points of $\varphi(u)$, where $\varphi'(u) = 0$, with

$$\varphi'(u) = u\, (r - u + u^2) \quad . \quad (7.5.17)$$

The solutions to $\varphi'(u) = 0$ are then given by

$$u^* = 0 \quad , \qquad u^* = \tfrac{1}{2} \pm \sqrt{\tfrac{1}{4} - r} \quad . \quad (7.5.18)$$

For $r > \tfrac{1}{4}$ there is one fixed point at $u = 0$, which is attractive under the dynamics $\dot u = -\varphi'(u)$ since $\varphi''(0) = r$. At $r = \tfrac{1}{4}$ there occurs a saddle-node bifurcation and a pair of fixed points is generated, one stable and one unstable. As we see from Figure [Landau_a], the interior fixed point is always unstable and the two exterior fixed points are always stable. At $r = 0$ there is a transcritical bifurcation where two fixed points of opposite stability collide and bounce off one another (metaphorically speaking).

[Landau_b] Fixed points for $\varphi(u) = \tfrac{1}{2} r u^2 - \tfrac{1}{3} u^3 + \tfrac{1}{4} u^4$ and flow under the dynamics $\dot u = -\varphi'(u)$. Solid curves represent stable fixed points and dashed curves unstable fixed points. Magenta arrows show behavior under slowly increasing control parameter $r$ and dark blue arrows show behavior under slowly decreasing $r$. For $u > 0$ there is a hysteresis loop. The thick black curve shows the equilibrium thermodynamic value of $u(r)$, that value which minimizes the free energy $\varphi(u)$. There is a first order phase transition at $r = \tfrac{2}{9}$, where the thermodynamic value of $u$ jumps from $u = 0$ to $u = \tfrac{2}{3}$.
At the saddle-node bifurcation, $r = \tfrac{1}{4}$ and $u = \tfrac{1}{2}$, and we find $\varphi\big(u = \tfrac{1}{2}; r = \tfrac{1}{4}\big) = \tfrac{1}{192}$, which is positive. Thus, the thermodynamic state of the system remains at $u = 0$ until the value of $\varphi(u_+)$ crosses zero. This occurs when $\varphi(u) = 0$ and $\varphi'(u) = 0$, the simultaneous solution of which yields $r = \tfrac{2}{9}$ and $u = \tfrac{2}{3}$.

Suppose we slowly ramp the control parameter $r$ up and down as a function of the dimensionless time $s$. Under the dynamics of Equation [LBdyn], $u(s)$ flows to the first stable fixed point encountered – this is always the case for a dynamical system with a one-dimensional phase space. Then as $r$ is further varied, $u$ follows the position of whatever locally stable fixed point it initially encountered. Thus, $u(r(s))$ evolves smoothly until a bifurcation is encountered. The situation is depicted by the arrows in Figure [Landau_b]. The equilibrium thermodynamic value for $u(r)$ is discontinuous; there is a first order phase transition at $r = \tfrac{2}{9}$, as we've already seen. As $r$ is increased, $u(r)$ follows a trajectory indicated by the magenta arrows. For a negative initial value of $u$, the evolution as a function of $r$ will be reversible. However, if $u(0)$ is initially positive, then the system exhibits hysteresis, as
shown. Starting with a large positive value of $r$, $u(s)$ quickly evolves to $u = 0^+$, which means a positive infinitesimal value. Then as $r$ is decreased, the system remains at $u = 0^+$ even through the first order transition, because $u = 0$ is an attractive fixed point. However, once $r$ begins to go negative, the $u = 0$ fixed point becomes repulsive, and $u(s)$ quickly flows to the stable fixed point $u_+ = \tfrac{1}{2} + \sqrt{\tfrac{1}{4} - r}$. Further decreasing $r$, the system remains on this branch. If $r$ is later increased, then $u(s)$ remains on the upper branch past $r = 0$, until the $u_+$ fixed point annihilates with the unstable fixed point at $u_- = \tfrac{1}{2} - \sqrt{\tfrac{1}{4} - r}$, at which time $u(s)$ quickly flows down to $u = 0^+$ again.
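The hysteresis loop just described can be reproduced with a crude Euler integration of $\dot u = -\varphi'(u)$ under a slow ramp of $r$; the small positive floor stands in for the text's $u = 0^+$, and all numbers here (step sizes, ramp range) are illustrative choices.

```python
import numpy as np

def relax(u, r, dt=0.01, n=2000):
    """Euler steps of du/ds = -phi'(u) = -u (r - u + u^2) at fixed r."""
    for _ in range(n):
        u -= dt * u * (r - u + u * u)
        u = max(u, 1e-4)   # keep u at 0+, a positive infinitesimal
    return u

rs = np.round(np.linspace(0.5, -0.2, 71), 3)
u, down = 0.01, {}
for r in rs:                   # slowly decreasing ramp
    u = down[r] = relax(u, r)
up = {}
for r in rs[::-1]:             # slowly increasing ramp, continuing from below
    u = up[r] = relax(u, r)
print(down[0.1], up[0.1])      # lower branch vs upper branch at the same r
```

At $r = 0.1$ the two sweeps disagree: the downward sweep is still pinned near $u = 0^+$, while the upward sweep rides the $u_+$ branch, which is the hysteresis loop of Figure [Landau_b].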

[fsextic] Behavior of the sextic free energy $f(m) = \tfrac{1}{2} a m^2 + \tfrac{1}{4} b m^4 + \tfrac{1}{6} c m^6$. A: $a > 0$ and $b > 0$; B: $a < 0$ and $b > 0$; C: $a < 0$ and $b < 0$; D: $a > 0$ and $b < -\tfrac{4}{\sqrt{3}} \sqrt{ac}$; E: $a > 0$ and $-\tfrac{4}{\sqrt{3}} \sqrt{ac} < b < -2\sqrt{ac}$; F: $a > 0$ and $-2\sqrt{ac} < b < 0$. The thick dashed line is a line of second order transitions, which meets the thick solid line of first order transitions at the tricritical point, $(a, b) = (0, 0)$.

Sixth order Landau theory: tricritical point


Finally, consider a model with $\mathbb{Z}_2$ symmetry, with the Landau free energy
$$f = f_0 + \tfrac{1}{2}\, a m^2 + \tfrac{1}{4}\, b m^4 + \tfrac{1}{6}\, c m^6 \quad , \quad (7.5.19)$$

with $c > 0$ for stability. We seek the phase diagram in the $(a, b)$ plane. Extremizing $f$ with respect to $m$, we obtain

$$\frac{\partial f}{\partial m} = 0 = m\, (a + b m^2 + c m^4) \quad , \quad (7.5.20)$$

which is a quintic with five solutions over the complex $m$ plane. One solution is obviously $m = 0$. The other four are

$$m = \pm \sqrt{ -\frac{b}{2c} \pm \sqrt{ \Big(\frac{b}{2c}\Big)^{\!2} - \frac{a}{c} } } \quad . \quad (7.5.21)$$

For each $\pm$ symbol in the above equation, there are two options, hence four roots in all.
If $a > 0$ and $b > 0$, then four of the roots are imaginary and there is a unique minimum at $m = 0$.

For $a < 0$, there are only three solutions to $f'(m) = 0$ for real $m$, since the $-$ choice for the $\pm$ sign under the radical leads to imaginary roots. One of the solutions is $m = 0$. The other two are

$$m = \pm \sqrt{ -\frac{b}{2c} + \sqrt{ \Big(\frac{b}{2c}\Big)^{\!2} - \frac{a}{c} } } \quad . \quad (7.5.22)$$

The most interesting situation is $a > 0$ and $b < 0$. If $a > 0$ and $b < -2\sqrt{ac}$, all five roots are real. There must be three minima, separated by two local maxima. Clearly if $m^*$ is a solution, then so is $-m^*$. Thus, the only question is whether the outer minima are of lower energy than the minimum at $m = 0$. We assess this by demanding $f(m^*) = f(0)$, where $m^*$ is the position of the largest root (the rightmost minimum). This gives a second quadratic equation,

$$0 = \tfrac{1}{2}\, a + \tfrac{1}{4}\, b m^2 + \tfrac{1}{6}\, c m^4 \quad , \quad (7.5.23)$$

which together with equation [quintic] gives

$$b = -\frac{4}{\sqrt{3}}\, \sqrt{ac} \quad . \quad (7.5.24)$$

Thus, we have the following, for fixed $a > 0$:

$$b > -2\sqrt{ac}: \quad \text{1 real root } m = 0$$
$$-2\sqrt{ac} > b > -\tfrac{4}{\sqrt{3}}\sqrt{ac}: \quad \text{5 real roots; minimum at } m = 0$$
$$-\tfrac{4}{\sqrt{3}}\sqrt{ac} > b: \quad \text{5 real roots; minima at } m = \pm \sqrt{ -\frac{b}{2c} + \sqrt{ \Big(\frac{b}{2c}\Big)^{\!2} - \frac{a}{c} } }$$

The point $(a, b) = (0, 0)$, which lies at the confluence of a first order line and a second order line, is known as a tricritical point.

[sexfree] Free energy $\varphi(u) = \tfrac{1}{2} r u^2 - \tfrac{1}{4} u^4 + \tfrac{1}{6} u^6$ for several different values of the control parameter $r$.

Hysteresis for the sextic potential

Once again, we consider the dissipative dynamics $\dot m = -\Gamma f'(m)$. We adimensionalize by writing

$$m \equiv \sqrt{\frac{|b|}{c}} \cdot u \quad , \qquad a \equiv \frac{b^2}{c} \cdot r \quad , \qquad t \equiv \frac{c}{\Gamma b^2} \cdot s \quad . \quad (7.5.25)$$

Then we obtain once again the dimensionless equation

$$\frac{\partial u}{\partial s} = -\frac{\partial \varphi}{\partial u} \quad , \quad (7.5.26)$$

where

$$\varphi(u) = \tfrac{1}{2}\, r u^2 \pm \tfrac{1}{4}\, u^4 + \tfrac{1}{6}\, u^6 \quad . \quad (7.5.27)$$

In the above equation, the coefficient of the quartic term is positive if $b > 0$ and negative if $b < 0$; that is, the coefficient is $\mathrm{sgn}(b)$. When $b > 0$ we can ignore the sextic term for sufficiently small $u$, and we recover the quartic free energy studied earlier. There is then a second order transition at $r = 0$.


New and interesting behavior occurs for $b < 0$. The fixed points of the dynamics are obtained by setting $\varphi'(u) = 0$. We have

$$\varphi(u) = \tfrac{1}{2}\, r u^2 - \tfrac{1}{4}\, u^4 + \tfrac{1}{6}\, u^6$$
$$\varphi'(u) = u\, (r - u^2 + u^4) \quad .$$

Thus, the equation $\varphi'(u) = 0$ factorizes into a linear factor $u$ and a quartic factor $u^4 - u^2 + r$ which is quadratic in $u^2$. Thus, we can easily obtain the roots:

$$r < 0: \quad u^* = 0 \quad , \quad u^* = \pm \sqrt{ \tfrac{1}{2} + \sqrt{\tfrac{1}{4} - r} }$$
$$0 < r < \tfrac{1}{4}: \quad u^* = 0 \quad , \quad u^* = \pm \sqrt{ \tfrac{1}{2} + \sqrt{\tfrac{1}{4} - r} } \quad , \quad u^* = \pm \sqrt{ \tfrac{1}{2} - \sqrt{\tfrac{1}{4} - r} }$$
$$r > \tfrac{1}{4}: \quad u^* = 0 \quad .$$

In Figure [Landau_c], we plot the fixed points and the hysteresis loops for this system. At $r = \tfrac{1}{4}$, there are two symmetrically located saddle-node bifurcations at $u = \pm\tfrac{1}{\sqrt{2}}$. We find $\varphi\big(u = \pm\tfrac{1}{\sqrt{2}},\, r = \tfrac{1}{4}\big) = \tfrac{1}{48}$, which is positive, indicating that the stable fixed point $u^* = 0$ remains the thermodynamic minimum for the free energy $\varphi(u)$ as $r$ is decreased through $r = \tfrac{1}{4}$. Setting $\varphi(u) = 0$ and $\varphi'(u) = 0$ simultaneously, we obtain $r = \tfrac{3}{16}$ and $u = \pm\tfrac{\sqrt{3}}{2}$. The thermodynamic value for $u$ therefore jumps discontinuously from $u = 0$ to $u = \pm\tfrac{\sqrt{3}}{2}$ (either branch) at $r = \tfrac{3}{16}$; this is a first order transition.
Under the dissipative dynamics considered here, the system exhibits hysteresis, as indicated in the figure, where the arrows show the evolution of $u(s)$ for very slowly varying $r(s)$. When the control parameter $r$ is large and positive, the flow is toward the sole fixed point at $u^* = 0$. At $r = \tfrac{1}{4}$, two simultaneous saddle-node bifurcations take place at $u^* = \pm\tfrac{1}{\sqrt{2}}$; the outer branch is stable and the inner branch unstable in both cases. At $r = 0$ there is a subcritical pitchfork bifurcation, and the fixed point at $u^* = 0$ becomes unstable.

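The special values of the sextic potential quoted above can be confirmed in exact rational arithmetic by writing $v = u^2$ (a minimal sketch, not from the text):

```python
from fractions import Fraction as F

# phi(u) = r u^2/2 - u^4/4 + u^6/6, expressed in terms of v = u^2
phi = lambda v, r: r*v/2 - v**2/4 + v**3/6
dphi_over_u = lambda v, r: r - v + v**2          # phi'(u)/u = r - u^2 + u^4

print(phi(F(1, 2), F(1, 4)))                     # 1/48 at the saddle-node
print(phi(F(3, 4), F(3, 16)),
      dphi_over_u(F(3, 4), F(3, 16)))            # 0 0 at the first order point
```

The first line confirms $\varphi > 0$ at the saddle-node ($u^2 = \tfrac{1}{2}$, $r = \tfrac{1}{4}$); the second confirms that $u^2 = \tfrac{3}{4}$, $r = \tfrac{3}{16}$ solves $\varphi = \varphi' = 0$ simultaneously.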
[Landau_c] Fixed points $\varphi'(u^*) = 0$ for the sextic potential $\varphi(u) = \tfrac{1}{2} r u^2 - \tfrac{1}{4} u^4 + \tfrac{1}{6} u^6$, and corresponding dynamical flow (arrows) under $\dot u = -\varphi'(u)$. Solid curves show stable fixed points and dashed curves show unstable fixed points. The thick solid black and solid grey curves indicate the equilibrium thermodynamic values for $u$; note the overall $u \to -u$ symmetry. Within the region $r \in [0, \tfrac{1}{4}]$ the dynamics are irreversible and the system exhibits the phenomenon of hysteresis. There is a first order phase transition at $r = \tfrac{3}{16}$.

Suppose one starts off with $r \gg \tfrac{1}{4}$ with some value $u > 0$. The flow $\dot u = -\varphi'(u)$ then rapidly results in $u \to 0^+$. This is the 'high temperature phase' in which there is no magnetization. Now let $r$ decrease slowly, using $s$ as the dimensionless time variable. The scaled magnetization $u(s) = u^*(r(s))$ will remain pinned at the fixed point $u^* = 0^+$. As $r$ passes through $r = \tfrac{1}{4}$, two new stable values of $u^*$ appear, but our system remains at $u = 0^+$, since $u^* = 0$ is a stable fixed point. But after the subcritical pitchfork, $u^* = 0$ becomes unstable. The magnetization $u(s)$ then flows rapidly to the stable fixed point at $u^* = \tfrac{1}{\sqrt{2}}$, and follows the curve $u^*(r) = \big( \tfrac{1}{2} + (\tfrac{1}{4} - r)^{1/2} \big)^{1/2}$ for all $r < 0$.
Now suppose we start increasing $r$ (increasing temperature). The magnetization follows the stable fixed point $u^*(r) = \big( \tfrac{1}{2} + (\tfrac{1}{4} - r)^{1/2} \big)^{1/2}$ past $r = 0$, beyond the first order phase transition point at $r = \tfrac{3}{16}$, and all the way up to $r = \tfrac{1}{4}$, at which point this fixed point is annihilated at a saddle-node bifurcation. The flow then rapidly takes $u \to u^* = 0^+$, where it remains as $r$ continues to be increased further.


Within the region $r \in [0, \tfrac{1}{4}]$ of control parameter space, the dynamics are said to be irreversible and the behavior of $u(s)$ is said to be hysteretic.

This page titled 7.5: Landau Theory of Phase Transitions is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

7.6: Mean Field Theory of Fluctuations
Correlation and response in mean field theory
Consider the Ising model,

$$\hat H = -\tfrac{1}{2} \sum_{i,j} J_{ij}\, \sigma_i \sigma_j - \sum_k H_k\, \sigma_k \quad , \quad (7.6.1)$$

where the local magnetic field on site $k$ is now $H_k$. We assume without loss of generality that the diagonal terms vanish: $J_{ii} = 0$. Now consider the partition function $Z = \mathrm{Tr}\, e^{-\beta \hat H}$ as a function of the temperature $T$ and the local field values $\{H_i\}$. We have

$$\frac{\partial Z}{\partial H_i} = \beta\, \mathrm{Tr}\big[ \sigma_i\, e^{-\beta \hat H} \big] = \beta Z \cdot \langle \sigma_i \rangle$$

$$\frac{\partial^2 Z}{\partial H_i\, \partial H_j} = \beta^2\, \mathrm{Tr}\big[ \sigma_i \sigma_j\, e^{-\beta \hat H} \big] = \beta^2 Z \cdot \langle \sigma_i \sigma_j \rangle \quad .$$

Thus,

$$m_i = -\frac{\partial F}{\partial H_i} = \langle \sigma_i \rangle$$

$$\chi_{ij} = \frac{\partial m_i}{\partial H_j} = -\frac{\partial^2 F}{\partial H_i\, \partial H_j} = \frac{1}{k_B T} \cdot \big\{ \langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle \big\} \quad .$$

Expressions such as $\langle \sigma_i \rangle$, $\langle \sigma_i \sigma_j \rangle$, are in general called correlation functions. For example, we define the spin-spin correlation function $C_{ij}$ as

$$C_{ij} \equiv \langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle \quad . \quad (7.6.2)$$

Expressions such as $\frac{\partial F}{\partial H_i}$ and $\frac{\partial^2 F}{\partial H_i\, \partial H_j}$ are called response functions. The above relation between correlation functions and response functions, $C_{ij} = k_B T\, \chi_{ij}$, is valid only for the equilibrium distribution. In particular, this relationship is invalid if one uses an approximate distribution, such as the variational density matrix formalism of mean field theory.


The question then arises: within mean field theory, which is more accurate: correlation functions or response functions? A simple argument suggests that the response functions are more accurate representations of the real physics. To see this, let's write the variational density matrix ϱ^var as the sum of the exact equilibrium (Boltzmann) distribution ϱ^eq = Z⁻¹ exp(−βĤ) plus a deviation δϱ:
\[ \varrho^{\rm var} = \varrho^{\rm eq} + \delta\varrho\ . \tag{7.6.3} \]

Then if we calculate a correlator using the variational distribution, we have
\[ \langle\sigma_i\,\sigma_j\rangle_{\rm var} = \mathrm{Tr}\big[\varrho^{\rm var}\,\sigma_i\,\sigma_j\big] = \mathrm{Tr}\big[\varrho^{\rm eq}\,\sigma_i\,\sigma_j\big] + \mathrm{Tr}\big[\delta\varrho\,\sigma_i\,\sigma_j\big]\ . \]

Thus, the variational density matrix gets the correlator right to first order in δϱ. On the other hand, the free energy is given by
\[ F^{\rm var} = F^{\rm eq} + \sum_\sigma \frac{\partial F}{\partial\varrho_\sigma}\bigg|_{\varrho^{\rm eq}}\!\delta\varrho_\sigma + \frac{1}{2}\sum_{\sigma,\sigma'} \frac{\partial^2\! F}{\partial\varrho_\sigma\,\partial\varrho_{\sigma'}}\bigg|_{\varrho^{\rm eq}}\!\delta\varrho_\sigma\,\delta\varrho_{\sigma'} + \ldots\ . \tag{7.6.4} \]
Here σ denotes a state of the system, |σ⟩ = |σ_1, …, σ_N⟩, where every spin polarization is specified. Since the free energy is an extremum (and in fact an absolute minimum) with respect to the distribution, the second term on the RHS vanishes. This means that the free energy is accurate to second order in the deviation δϱ.

Calculation of the response functions


Consider the variational density matrix
\[ \varrho(\boldsymbol{\sigma}) = \prod_i \varrho_i(\sigma_i)\ , \tag{7.6.5} \]
where
\[ \varrho_i(\sigma_i) = \Big(\frac{1+m_i}{2}\Big)\,\delta_{\sigma_i,1} + \Big(\frac{1-m_i}{2}\Big)\,\delta_{\sigma_i,-1}\ . \tag{7.6.6} \]

The variational energy E = Tr(ϱ Ĥ) is
\[ E = -\frac{1}{2}\sum_{ij} J_{ij}\, m_i\, m_j - \sum_i H_i\, m_i \tag{7.6.7} \]
and the entropy S = −k_B Tr(ϱ ln ϱ) is
\[ S = -k_B \sum_i \bigg\{\Big(\frac{1+m_i}{2}\Big)\ln\!\Big(\frac{1+m_i}{2}\Big) + \Big(\frac{1-m_i}{2}\Big)\ln\!\Big(\frac{1-m_i}{2}\Big)\bigg\}\ . \tag{7.6.8} \]

Setting the variation ∂F/∂m_i = 0, with F = E − TS, we obtain the mean field equations,
\[ m_i = \tanh\big(\beta J_{ij}\, m_j + \beta H_i\big)\ , \tag{7.6.9} \]
where we use the summation convention: J_ij m_j ≡ Σ_j J_ij m_j. Suppose T > T_c and m_i is small. Then we can expand the RHS of the above mean field equations, obtaining
\[ \big(\delta_{ij} - \beta J_{ij}\big)\, m_j = \beta H_i\ . \tag{7.6.10} \]

Thus, the susceptibility tensor χ is the inverse of the matrix (k_B T · I − J):
\[ \chi_{ij} = \frac{\partial m_i}{\partial H_j} = \big(k_B T\cdot\mathbb{I} - J\big)^{-1}_{ij}\ , \tag{7.6.11} \]

where I is the identity. Note also that so-called connected averages of the kind in Equation 7.6.2 vanish identically if we compute them using our variational density matrix, since all the sites are independent, hence
\[ \langle\sigma_i\,\sigma_j\rangle = \mathrm{Tr}\big(\varrho^{\rm var}\,\sigma_i\,\sigma_j\big) = \mathrm{Tr}\big(\varrho_i\,\sigma_i\big)\cdot\mathrm{Tr}\big(\varrho_j\,\sigma_j\big) = \langle\sigma_i\rangle\cdot\langle\sigma_j\rangle\ , \tag{7.6.12} \]
and therefore χ_ij = 0 if we compute the correlation functions themselves from the variational density matrix, rather than from the free energy F. As we have argued above, the latter approximation is more accurate.
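Equations 7.6.9 and 7.6.11 can be checked against each other numerically. The following sketch (an illustrative eight-site ring with nearest-neighbor coupling, k_B = 1) solves the mean field equations iteratively in a weak local field and compares the finite-difference susceptibility with the matrix inverse:

```python
import numpy as np

# Mean field on an 8-site ring with nearest-neighbor J = 1 (k_B = 1), T > T_c^MF = 2J.
N, J, T = 8, 1.0, 5.0
Jmat = np.zeros((N, N))
for i in range(N):
    Jmat[i, (i + 1) % N] = Jmat[i, (i - 1) % N] = J

def solve_mf(H):
    m = np.zeros(N)
    for _ in range(200):                 # contraction mapping: converges for T > 2J
        m = np.tanh((Jmat @ m + H) / T)
    return m

chi_exact = np.linalg.inv(T * np.eye(N) - Jmat)   # Eq. 7.6.11
eps = 1e-6
chi_num = np.empty((N, N))
for j in range(N):
    H = np.zeros(N)
    H[j] = eps
    chi_num[:, j] = solve_mf(H) / eps             # dm_i / dH_j at H = 0
assert np.allclose(chi_num, chi_exact, atol=1e-8)
```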

Assuming J_ij = J(R_i − R_j), where R_i is a Bravais lattice site, we can Fourier transform the above equation, resulting in
\[ \hat m(\mathbf{q}) = \frac{\hat H(\mathbf{q})}{k_B T - \hat J(\mathbf{q})} \equiv \hat\chi(\mathbf{q})\, \hat H(\mathbf{q})\ . \tag{7.6.13} \]

Once again, our definition of the lattice Fourier transform of a function ϕ(R) is
\[ \hat\phi(\mathbf{q}) \equiv \sum_{\mathbf{R}} \phi(\mathbf{R})\, e^{-i\mathbf{q}\cdot\mathbf{R}} \]
\[ \phi(\mathbf{R}) = \Omega \int_{\hat\Omega} \frac{d^d\! q}{(2\pi)^d}\, \hat\phi(\mathbf{q})\, e^{i\mathbf{q}\cdot\mathbf{R}}\ , \]
where Ω is the unit cell in real space, called the Wigner-Seitz cell, and Ω̂ is the first Brillouin zone, which is the unit cell in reciprocal space.
Similarly, we have
\[ \hat J(\mathbf{q}) = \sum_{\mathbf{R}} J(\mathbf{R})\Big(1 - i\mathbf{q}\cdot\mathbf{R} - \tfrac{1}{2}(\mathbf{q}\cdot\mathbf{R})^2 + \ldots\Big) = \hat J(0)\cdot\Big\{1 - q^2 R_*^2 + \mathcal{O}(q^4)\Big\}\ , \]
where
\[ R_*^2 = \frac{\sum_{\mathbf{R}} \mathbf{R}^2\, J(\mathbf{R})}{2d\,\sum_{\mathbf{R}} J(\mathbf{R})}\ . \tag{7.6.14} \]

Here we have assumed inversion symmetry for the lattice, in which case
\[ \sum_{\mathbf{R}} R^\mu\, R^\nu\, J(\mathbf{R}) = \frac{1}{d}\cdot\delta^{\mu\nu} \sum_{\mathbf{R}} \mathbf{R}^2\, J(\mathbf{R})\ . \tag{7.6.15} \]
On cubic lattices with nearest neighbor interactions only, one has R_* = a/√(2d), where a is the lattice constant and d is the dimension of space.

Thus, with the identification k_B T_c = Ĵ(0), we have
\[ \hat\chi(\mathbf{q}) = \frac{1}{k_B(T - T_c) + k_B T_c\, R_*^2\, q^2 + \mathcal{O}(q^4)} = \frac{1}{k_B T_c\, R_*^2}\cdot\frac{1}{\xi^{-2} + q^2 + \mathcal{O}(q^4)}\ , \]

where
\[ \xi = R_*\cdot\bigg(\frac{T - T_c}{T_c}\bigg)^{\!-1/2} \tag{7.6.16} \]
is the correlation length. With the definition
\[ \xi(T) \propto |T - T_c|^{-\nu} \tag{7.6.17} \]
as T → T_c, we obtain the mean field correlation length exponent ν = 1/2. The exact result for the two-dimensional Ising model is ν = 1, whereas ν ≈ 0.6 for the d = 3 Ising model. Note that χ̂(q = 0, T) diverges as (T − T_c)⁻¹ for T > T_c.

In real space, we have
\[ m_i = \sum_j \chi_{ij}\, H_j\ , \tag{7.6.18} \]
where
\[ \chi_{ij} = \Omega \int \frac{d^d\! q}{(2\pi)^d}\, \hat\chi(\mathbf{q})\, e^{i\mathbf{q}\cdot(\mathbf{R}_i - \mathbf{R}_j)}\ . \tag{7.6.19} \]

Note that χ̂(q) is properly periodic under q → q + G, where G is a reciprocal lattice vector, which satisfies e^{iG·R} = 1 for any direct Bravais lattice vector R. Indeed, we have
\[ \hat\chi^{-1}(\mathbf{q}) = k_B T - \hat J(\mathbf{q}) = k_B T - J\sum_{\boldsymbol{\delta}} e^{i\mathbf{q}\cdot\boldsymbol{\delta}}\ , \]
where δ is a nearest neighbor separation vector, and where in the second line we have assumed nearest neighbor interactions only. On cubic lattices in d dimensions, there are 2d nearest neighbor separation vectors, δ = ±a ê_μ, where μ ∈ {1, …, d}. The real space susceptibility is then
\[ \chi(\mathbf{R}) = \int_{-\pi}^{\pi}\!\frac{d\theta_1}{2\pi} \cdots \int_{-\pi}^{\pi}\!\frac{d\theta_d}{2\pi}\; \frac{e^{i n_1\theta_1} \cdots e^{i n_d\theta_d}}{k_B T - \big(2J\cos\theta_1 + \ldots + 2J\cos\theta_d\big)}\ , \tag{7.6.20} \]
where R = a Σ_{μ=1}^d n_μ ê_μ is a general direct lattice vector for the cubic Bravais lattice in d dimensions, and the {n_μ} are integers.
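In d = 1 the integral in Equation 7.6.20 can be done by contour methods, giving χ(n) ∝ xⁿ with x = (T − √(T² − 4J²))/2J, so successive ratios are constant (this closed form is a standard residue computation, quoted here as an aside). A quick numerical check with illustrative values (k_B = 1):

```python
import math

# Evaluate Eq. 7.6.20 in d = 1 on a uniform grid (spectrally accurate for
# smooth periodic integrands) and check the exponential decay in n.
J, T, M = 1.0, 4.0, 4096
def chi(n):
    s = 0.0
    for k in range(M):
        theta = -math.pi + 2 * math.pi * k / M
        s += math.cos(n * theta) / (T - 2 * J * math.cos(theta))
    return s / M

r1, r2 = chi(2) / chi(1), chi(3) / chi(2)
x_exact = (T - math.sqrt(T**2 - 4 * J**2)) / (2 * J)   # decay factor e^{-1/xi}
assert abs(r1 - r2) < 1e-9
assert abs(r1 - x_exact) < 1e-9
```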
The long distance behavior was discussed in chapter 6 (see §6.5.9 on Ornstein-Zernike theory). For convenience we reiterate those results:
In d = 1,
\[ \chi_{d=1}(x) = \bigg(\frac{\xi}{2 k_B T_c\, R_*^2}\bigg)\, e^{-|x|/\xi}\ . \tag{7.6.21} \]

In d > 1, with r → ∞ and ξ fixed,
\[ \chi^{\rm OZ}_d(\mathbf{r}) \simeq C_d\cdot\frac{\xi^{(3-d)/2}}{k_B T\, R_*^2}\cdot\frac{e^{-r/\xi}}{r^{(d-1)/2}}\cdot\bigg\{1 + \mathcal{O}\Big(\frac{d-3}{r/\xi}\Big)\bigg\}\ , \]
where the C_d are dimensionless constants.

In d > 2, with ξ → ∞ and r fixed (i.e. T → T_c at fixed separation r),
\[ \chi_d(\mathbf{r}) \simeq \frac{C'_d}{k_B T\, R_*^2}\cdot\frac{e^{-r/\xi}}{r^{d-2}}\cdot\bigg\{1 + \mathcal{O}\Big(\frac{d-3}{r/\xi}\Big)\bigg\}\ . \tag{7.6.22} \]

In d = 2 dimensions we obtain
\[ \chi_{d=2}(\mathbf{r}) \simeq \frac{C_2}{k_B T\, R_*^2}\cdot\ln\!\Big(\frac{r}{\xi}\Big)\, e^{-r/\xi}\cdot\bigg\{1 + \mathcal{O}\Big(\frac{1}{\ln(r/\xi)}\Big)\bigg\}\ , \tag{7.6.23} \]

where the C_d are dimensionless constants.

Beyond the Ising model


Consider a general spin model, and a variational density matrix ϱ_var which is a product of single site density matrices:
\[ \varrho_{\rm var}\big[\{\mathbf{S}_i\}\big] = \prod_i \varrho^{(i)}_1(\mathbf{S}_i)\ , \tag{7.6.24} \]
where Tr(ϱ_var S_i) = m_i is the local magnetization and S_i, which may be a scalar (e.g. σ_i in the Ising model previously discussed), is the local spin operator. Note that ϱ^{(i)}_1(S_i) depends parametrically on the variational parameter(s) m_i. Let the Hamiltonian be
\[ \hat H = -\frac{1}{2}\sum_{i,j} J^{\mu\nu}_{ij}\, S^\mu_i\, S^\nu_j + \sum_i h(\mathbf{S}_i) - \sum_i \mathbf{H}_i\cdot\mathbf{S}_i\ . \tag{7.6.25} \]

The variational free energy is then
\[ F_{\rm var} = -\frac{1}{2}\sum_{i,j} J^{\mu\nu}_{ij}\, m^\mu_i\, m^\nu_j + \sum_i \varphi(\mathbf{m}_i, T) - \sum_i H^\mu_i\, m^\mu_i\ , \tag{7.6.26} \]
where the single site free energy φ(m_i, T) in the absence of an external field is given by
\[ \varphi(\mathbf{m}_i, T) = \mathrm{Tr}\big[\varrho^{(i)}_1(\mathbf{S})\, h(\mathbf{S})\big] + k_B T\, \mathrm{Tr}\big[\varrho^{(i)}_1(\mathbf{S}) \ln \varrho^{(i)}_1(\mathbf{S})\big]\ . \tag{7.6.27} \]

We then have
\[ \frac{\partial F_{\rm var}}{\partial m^\mu_i} = -\sum_j J^{\mu\nu}_{ij}\, m^\nu_j - H^\mu_i + \frac{\partial\varphi(\mathbf{m}_i, T)}{\partial m^\mu_i}\ . \tag{7.6.28} \]

For the noninteracting system, we have J^{μν}_ij = 0, and the weak field response must be linear. In this limit we may write m^μ_i = χ⁰_{μν}(T) H^ν_i + O(H_i³), and we conclude
\[ \frac{\partial\varphi(\mathbf{m}_i, T)}{\partial m^\mu_i} = \big[\chi^0(T)\big]^{-1}_{\mu\nu}\, m^\nu_i + \mathcal{O}(m_i^3)\ . \tag{7.6.29} \]

Note that this entails the following expansion for the single site free energy in zero field:
\[ \varphi(\mathbf{m}_i, T) = \frac{1}{2}\big[\chi^0(T)\big]^{-1}_{\mu\nu}\, m^\mu_i\, m^\nu_i + \mathcal{O}(m^4)\ . \tag{7.6.30} \]

Finally, we restore the interaction term and extremize F_var by setting ∂F_var/∂m^μ_i = 0. To linear order, then,
\[ m^\mu_i = \chi^0_{\mu\nu}(T)\,\bigg(H^\nu_i + \sum_j J^{\nu\lambda}_{ij}\, m^\lambda_j\bigg)\ . \tag{7.6.31} \]

Typically the local susceptibility is a scalar in the internal spin space, χ⁰_{μν}(T) = χ⁰(T) δ_{μν}, in which case we obtain
\[ \big(\delta^{\mu\nu}\,\delta_{ij} - \chi^0(T)\, J^{\mu\nu}_{ij}\big)\, m^\nu_j = \chi^0(T)\, H^\mu_i\ . \tag{7.6.32} \]

In Fourier space, then,
\[ \hat\chi_{\mu\nu}(\mathbf{q}, T) = \chi^0(T)\,\Big(1 - \chi^0(T)\, \hat J(\mathbf{q})\Big)^{-1}_{\mu\nu}\ , \tag{7.6.33} \]

where Ĵ(q) is the matrix whose elements are Ĵ^{μν}(q). If Ĵ^{μν}(q) = Ĵ(q) δ^{μν}, then the susceptibility is isotropic in spin space, with
\[ \hat\chi(\mathbf{q}, T) = \frac{1}{\big[\chi^0(T)\big]^{-1} - \hat J(\mathbf{q})}\ . \tag{7.6.34} \]

Consider now the following illustrative examples:

Quantum spin S with h(S) = 0: We take the ẑ axis to be that of the local external magnetic field, H_i ẑ. Write ϱ_1(S) = z⁻¹ exp(u S^z/k_B T), where u = u(m, T) is obtained implicitly from the relation m(u, T) = Tr(ϱ_1 S^z). The normalization constant is
\[ z = \mathrm{Tr}\, e^{u S^z/k_B T} = \sum_{j=-S}^{S} e^{j u/k_B T} = \frac{\sinh\!\big[(S + \tfrac{1}{2})\, u/k_B T\big]}{\sinh\!\big[u/2 k_B T\big]} \tag{7.6.35} \]

The relation between m, u, and T is then given by
\[ m = \langle S^z\rangle = k_B T\,\frac{\partial\ln z}{\partial u} = \big(S + \tfrac{1}{2}\big)\,\mathrm{ctnh}\big[(S + \tfrac{1}{2})\, u/k_B T\big] - \tfrac{1}{2}\,\mathrm{ctnh}\big[u/2 k_B T\big] = \frac{S(S+1)}{3 k_B T}\, u + \mathcal{O}(u^3)\ . \]

The free-field single-site free energy is then
\[ \varphi(m, T) = k_B T\,\mathrm{Tr}\,(\varrho_1 \ln \varrho_1) = u m - k_B T \ln z\ , \tag{7.6.36} \]

whence
\[ \frac{\partial\varphi}{\partial m} = u + m\,\frac{\partial u}{\partial m} - k_B T\,\frac{\partial\ln z}{\partial u}\,\frac{\partial u}{\partial m} = u \equiv \big[\chi^0(T)\big]^{-1} m + \mathcal{O}(m^3)\ , \tag{7.6.37} \]

and we thereby obtain the result
\[ \chi^0(T) = \frac{S(S+1)}{3 k_B T}\ , \tag{7.6.38} \]
which is the Curie susceptibility.
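The Curie result can be verified directly from the single-site sum for z (a sketch with illustrative S and T, setting k_B = 1):

```python
import math

# m(u) = <S^z> from the exact single-site distribution; for small u the slope
# m/u should equal the Curie susceptibility S(S+1)/(3T).
def m_of_u(u, S, T):
    num = sum(j * math.exp(j * u / T) for j in range(-S, S + 1))
    den = sum(math.exp(j * u / T) for j in range(-S, S + 1))
    return num / den

S, T, u = 2, 1.5, 1e-5
chi0 = m_of_u(u, S, T) / u          # linear response: m ~ chi0 * u
assert abs(chi0 - S * (S + 1) / (3 * T)) < 1e-6
```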


Classical spin S = S n̂ with h = 0 and n̂ an N-component unit vector: We take the single site density matrix to be ϱ_1(S) = z⁻¹ exp(u·S/k_B T). The single site field-free partition function is then
\[ z = \int \frac{d\hat n}{\Omega_N}\,\exp\big(\mathbf{u}\cdot\mathbf{S}/k_B T\big) = 1 + \frac{S^2 u^2}{2N\,(k_B T)^2} + \mathcal{O}(u^4) \tag{7.6.39} \]

and therefore
\[ m = k_B T\,\frac{\partial\ln z}{\partial u} = \frac{S^2}{N k_B T}\, u + \mathcal{O}(u^3)\ , \tag{7.6.40} \]
from which we read off χ⁰(T) = S²/N k_B T. Note that this agrees in the classical (S → ∞) limit, for N = 3, with our previous result.
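For N = 3 the angular integral can be done in closed form, z = sinh(uS/k_BT)/(uS/k_BT), so that m = k_BT ∂ln z/∂u = S ctnh(uS/k_BT) − k_BT/u (the Langevin function). A quick numerical check of χ⁰ = S²/3k_BT (illustrative values, k_B = 1):

```python
import math

# Langevin function slope at small u reproduces chi0 = S^2 / (3T) for N = 3.
S, T, u = 1.5, 2.0, 1e-4
x = u * S / T
m = S / math.tanh(x) - T / u
chi0 = m / u
assert abs(chi0 - S**2 / (3 * T)) < 1e-6
```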

Quantum spin S with h(S) = Δ(S^z)²: This corresponds to so-called easy plane anisotropy, meaning that the single site energy h(S) is minimized when the local spin vector S lies in the (x, y) plane. As in example (i), we write ϱ_1(S) = z⁻¹ exp(u S^z/k_B T), yielding the same expression for z and the same relation between z and u. What is different is that we must evaluate the local energy,
\[ e(u, T) = \mathrm{Tr}\big(\varrho_1\, h(\mathbf{S})\big) = \Delta\,(k_B T)^2\,\frac{\partial^2\ln z}{\partial u^2} = \frac{\Delta}{4}\Bigg[\frac{1}{\sinh^2[u/2 k_B T]} - \frac{(2S+1)^2}{\sinh^2[(2S+1)\, u/2 k_B T]}\Bigg] = \frac{S(S+1)\,\Delta\, u^2}{6\,(k_B T)^2} + \mathcal{O}(u^4)\ . \]

We now have φ = e + um − k_B T ln z, from which we obtain the susceptibility
\[ \chi^0(T) = \frac{S(S+1)}{3\,(k_B T + \Delta)}\ . \tag{7.6.41} \]
Note that the local susceptibility no longer diverges as T → 0, because there is always a gap in the spectrum of h(S).

This page titled 7.6: Mean Field Theory of Fluctuations is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

7.7: Global Symmetries
Symmetries and symmetry groups
Interacting systems can be broadly classified according to their global symmetry group. Consider the following five examples:
\[ \hat H_{\rm Ising} = -\sum_{i<j} J_{ij}\,\sigma_i\,\sigma_j \qquad \sigma_i \in \{-1, +1\} \]
\[ \hat H_{p\text{-clock}} = -\sum_{i<j} J_{ij}\,\cos\!\bigg(\frac{2\pi(n_i - n_j)}{p}\bigg) \qquad n_i \in \{1, 2, \ldots, p\} \]
\[ \hat H_{q\text{-Potts}} = -\sum_{i<j} J_{ij}\,\delta_{\sigma_i,\sigma_j} \qquad \sigma_i \in \{1, 2, \ldots, q\} \]
\[ \hat H_{\rm XY} = -\sum_{i<j} J_{ij}\,\cos(\phi_i - \phi_j) \qquad \phi_i \in [0, 2\pi] \]
\[ \hat H_{{\rm O}(n)} = -\sum_{i<j} J_{ij}\,\hat{\boldsymbol{\Omega}}_i\cdot\hat{\boldsymbol{\Omega}}_j \qquad \hat{\boldsymbol{\Omega}}_i \in S^{n-1}\ . \]

The Ising Hamiltonian is left invariant by the global symmetry group Z₂, which has two elements, I and η, with
\[ \eta\,\sigma_i = -\sigma_i\ . \tag{7.7.1} \]
I is the identity, and η² = I. By simultaneously reversing all the spins σ_i → −σ_i, the interactions remain invariant.
The degrees of freedom of the p-state clock model are integer variables n_i, each of which ranges from 1 to p. The Hamiltonian is invariant under the discrete group Z_p, whose p elements are generated by the single operation η, where
\[ \eta\, n_i = \begin{cases} n_i + 1 & \text{if } n_i \in \{1, 2, \ldots, p-1\} \\ 1 & \text{if } n_i = p\ . \end{cases} \tag{7.7.2} \]
Think of a clock with one hand and p 'hour' markings consecutively spaced by an angle 2π/p. At each site i, a hand points to one of the p hour marks; this determines n_i. The operation η simply advances all the hours by one tick, with hour p advancing to hour 1, just as 23:00 military time is followed one hour later by 00:00. The interaction cos(2π(n_i − n_j)/p) is invariant under such an operation. The p elements of the group Z_p are then
\[ I,\ \eta,\ \eta^2,\ \ldots,\ \eta^{p-1}\ . \tag{7.7.3} \]
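The invariance is easy to verify mechanically: the interaction depends on n_i − n_j only modulo p, which the shift η preserves. A minimal check (p = 5 chosen for illustration, J_ij = 1):

```python
import math

p = 5
def pair_energy(ni, nj):                 # clock interaction for one bond
    return -math.cos(2 * math.pi * (ni - nj) / p)
def eta(n):                              # Z_p generator: hour p advances to hour 1
    return n + 1 if n < p else 1

for ni in range(1, p + 1):
    for nj in range(1, p + 1):
        assert abs(pair_energy(ni, nj) - pair_energy(eta(ni), eta(nj))) < 1e-12
```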

We've already met up with the q-state Potts model, where each site supports a 'spin' σ_i which can be in any of q possible states, which we may label by integers {1, …, q}. The energy of two interacting sites i and j is −J_ij if σ_i = σ_j and zero otherwise. This energy function is invariant under global operations of the symmetric group on q characters, S_q, which is the group of permutations of the sequence {1, 2, 3, …, q}. The group S_q has q! elements. Note the difference between a Z_q symmetry and an S_q symmetry. In the former case, the Hamiltonian is invariant only under the q-element cyclic permutations,
\[ \eta \equiv \begin{pmatrix} 1 & 2 & \cdots & q-1 & q \\ 2 & 3 & \cdots & q & 1 \end{pmatrix} \tag{7.7.4} \]
and its powers η^l with l = 0, …, q−1.

All these models – the Ising, p-state clock, and q-state Potts models – possess a global symmetry group which is discrete. That is, each of the symmetry groups Z₂, Z_p, S_q is a discrete group, with a finite number of elements. The XY Hamiltonian Ĥ_XY on the other hand is invariant under a continuous group of transformations ϕ_i → ϕ_i + α, where ϕ_i is the angle variable on site i. More to the point, we could write the interaction term cos(ϕ_i − ϕ_j) as ½(z_i z̄_j + z̄_i z_j), where z_i = e^{iϕ_i} is a phase which lives on the unit circle, and z̄_i is the complex conjugate of z_i. The model is then invariant under the global transformation z_i → e^{iα} z_i. The phases e^{iα} form a group under multiplication, called U(1), which is the same as O(2). Equivalently, we could write the interaction as Ω̂_i · Ω̂_j, where Ω̂_i = (cos ϕ_i, sin ϕ_i), which explains the O(2) symmetry, since the symmetry operations are global rotations in the plane, which is to say the two-dimensional orthogonal group. This last representation generalizes nicely to unit vectors in n dimensions, where
\[ \hat{\boldsymbol{\Omega}} = \big(\Omega^1, \Omega^2, \ldots, \Omega^n\big) \tag{7.7.5} \]
with Ω̂² = 1. The dot product Ω̂_i · Ω̂_j is then invariant under global rotations in this n-dimensional space, which is the group O(n).

[DWonedim] A domain wall in a one-dimensional Ising model.

Lower critical dimension


Depending on whether the global symmetry group of a model is discrete or continuous, there exists a lower critical dimension d_ℓ at or below which no phase transition may take place at finite temperature. That is, for d ≤ d_ℓ, the critical temperature is T_c = 0. Owing to its neglect of fluctuations, mean field theory generally overestimates the value of T_c because it overestimates the stability of the ordered phase. Indeed, there are many examples where mean field theory predicts a finite T_c when the actual critical temperature is T_c = 0. This happens whenever d ≤ d_ℓ.

Let's test the stability of the ordered (ferromagnetic) state of the one-dimensional Ising model at low temperatures. We consider order-destroying domain wall excitations which interpolate between regions of degenerate, symmetry-related ordered phase, ↑↑↑↑↑ and ↓↓↓↓↓. For a system with a discrete symmetry at low temperatures, the domain wall is abrupt, on the scale of a single lattice spacing. If the exchange energy is J, then the energy of a single domain wall is 2J, since a link of energy −J is replaced with one of energy +J. However, there are N possible locations for the domain wall, hence its entropy is k_B ln N. For a system with M domain walls, the free energy is
\[ F = 2MJ - k_B T \ln\binom{N}{M} = N\cdot\Big\{2Jx + k_B T\big[x\ln x + (1-x)\ln(1-x)\big]\Big\}\ , \]
where x = M/N is the density of domain walls, and where we have used Stirling's approximation for k! when k is large. Extremizing with respect to x, we find
\[ \frac{x}{1-x} = e^{-2J/k_B T} \quad\Longrightarrow\quad x = \frac{1}{e^{2J/k_B T} + 1}\ . \tag{7.7.6} \]
The average distance between domain walls is x⁻¹, which is finite for finite T. Thus, the thermodynamic state of the system is disordered, with no net average magnetization.
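The extremization can be confirmed numerically (illustrative J and T, k_B = 1):

```python
import math

# Minimize f(x) = 2Jx + T [x ln x + (1-x) ln(1-x)] over the wall density x
# on a fine grid, and compare with Eq. 7.7.6.
J, T = 1.0, 0.5
def f(x):
    return 2 * J * x + T * (x * math.log(x) + (1 - x) * math.log(1 - x))

n = 200_000
x_min = min((k / n for k in range(1, n // 2)), key=f)
x_exact = 1.0 / (math.exp(2 * J / T) + 1)
assert abs(x_min - x_exact) < 1e-5
```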

[DWIsing] Domain walls in the two-dimensional (left) and three-dimensional (right) Ising model.
Consider next an Ising domain wall in d dimensions. Let the linear dimension of the system be L·a, where L is a real number and a is the lattice constant. Then the energy of a single domain wall which partitions the entire system is 2J·L^{d−1}. The domain wall entropy is difficult to compute, because the wall can fluctuate significantly, but for a single domain wall we have S ≈ k_B ln L. Thus, the free energy F = 2J L^{d−1} − k_B T ln L is dominated by the energy term if d > 1, suggesting that the system may be ordered. We can do a slightly better job in d = 2 by writing

\[ Z \approx \exp\bigg(L^d \sum_P N_P\, e^{-2PJ/k_B T}\bigg)\ , \tag{7.7.7} \]
where the sum is over all closed loops of perimeter P, and N_P is the number of such loops. An example of such a loop circumscribing a domain is depicted in the left panel of Figure [DWIsing]. It turns out that
\[ N_P \simeq \kappa^P\, P^{-\theta}\cdot\big\{1 + \mathcal{O}(P^{-1})\big\}\ , \tag{7.7.8} \]

where κ = z − 1 with z the lattice coordination number, and θ is some exponent. We can understand the κ^P factor in the following way. At each step along the perimeter of the loop, there are κ = z − 1 possible directions to go (since one doesn't backtrack). The fact that the loop must avoid overlapping itself and must return to its original position to be closed leads to the power law term P^{−θ}, which is subleading since κ^P P^{−θ} = exp(P ln κ − θ ln P) and P ≫ ln P for P ≫ 1. Thus,
\[ F \approx -\frac{1}{\beta}\, L^d \sum_P P^{-\theta}\, e^{(\ln\kappa - 2\beta J)P}\ , \tag{7.7.9} \]

which diverges if ln κ > 2βJ, i.e. if T > 2J/k_B ln(z − 1). We identify this singularity with the phase transition. The high temperature phase involves a proliferation of such loops. The excluded volume effects between the loops, which we have not taken into account, then enter in an essential way so that the sum converges. Thus, we have the following picture:
ln κ < 2βJ : large loops suppressed ; ordered phase
ln κ > 2βJ : large loops proliferate ; disordered phase .

On the square lattice, we obtain
\[ k_B T_c^{\rm approx} = \frac{2J}{\ln 3} = 1.82\, J \]
\[ k_B T_c^{\rm exact} = \frac{2J}{\sinh^{-1}(1)} = 2.27\, J\ . \]

The agreement is better than we should reasonably expect from such a crude argument.
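The two numbers quoted can be reproduced directly (J = 1, k_B = 1):

```python
import math

Tc_approx = 2.0 / math.log(3.0)       # 2J / ln(z - 1), with z = 4 on the square lattice
Tc_exact = 2.0 / math.asinh(1.0)      # 2J / sinh^{-1}(1), the Onsager result
assert abs(Tc_approx - 1.8205) < 1e-3
assert abs(Tc_exact - 2.2692) < 1e-3
```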
Nota bene: Beware of arguments which allegedly prove the existence of an ordered phase. Generally speaking, any approximation will underestimate the entropy, and thus will overestimate the stability of the putative ordered phase.

Continuous symmetries
When the global symmetry group is continuous, the domain walls interpolate smoothly between ordered phases. The energy generally involves a stiffness term,
\[ E = \frac{1}{2}\,\rho_s \int d^d r\, (\nabla\theta)^2\ , \tag{7.7.10} \]
where θ(r) is the angle of a local rotation about a single axis and where ρ_s is the spin stiffness. Of course, in O(n) models, the rotations can be with respect to several different axes simultaneously.

[XYdomainwall] A domain wall in an XY ferromagnet.

In the ordered phase, we have θ(r) = θ₀, a constant. Now imagine a domain wall in which θ(r) rotates by 2πn across the width of the sample. We write θ(r) = 2πnx/L, where L is the linear size of the sample (here with dimensions of length) and n is an integer telling us how many complete twists the order parameter field makes. The domain wall then resembles that in Figure [XYdomainwall]. The gradient energy is
\[ E = \frac{1}{2}\,\rho_s\, L^{d-1} \int_0^L\! dx\,\bigg(\frac{2\pi n}{L}\bigg)^{\!2} = 2\pi^2 n^2\,\rho_s\, L^{d-2}\ . \tag{7.7.11} \]

Recall that in the case of discrete symmetry, the domain wall energy scaled as E ∝ L^{d−1}. Thus, with S ≈ k_B ln L for a single wall, we see that the entropy term dominates if d ≤ 2, in which case there is no finite temperature phase transition. Thus, the lower critical dimension d_ℓ depends on whether the global symmetry is discrete or continuous, with
discrete global symmetry ⟹ d_ℓ = 1
continuous global symmetry ⟹ d_ℓ = 2 .
Note that all along we have assumed local, short-ranged interactions. Long-ranged interactions can enhance order and thereby suppress d_ℓ.

Thus, we expect that for models with discrete symmetries, d_ℓ = 1 and there is no finite temperature phase transition for d ≤ 1. For models with continuous symmetries, d_ℓ = 2, and we expect T_c = 0 for d ≤ 2. In this context we should emphasize that the two-dimensional XY model does exhibit a phase transition at finite temperature, called the Kosterlitz-Thouless transition. However, this phase transition is not associated with the breaking of the continuous global O(2) symmetry and rather has to do with the unbinding of vortices and antivortices. So there is still no true long-ranged order below the critical temperature T_KT, even though there is a phase transition!

Random systems : Imry-Ma argument


Oftentimes, particularly in condensed matter systems, intrinsic randomness exists due to quenched impurities, grain boundaries, immobile vacancies, etc. How does this quenched randomness affect a system's attempt to order at T = 0? This question was taken up in a beautiful and brief paper by J. Imry and S.-K. Ma, Phys. Rev. Lett. 35, 1399 (1975). Imry and Ma considered models in which there are short-ranged interactions and a random local field coupling to the local order parameter:
\[ \hat H_{\rm RFI} = -J \sum_{\langle ij\rangle} \sigma_i\,\sigma_j - \sum_i H_i\,\sigma_i \]
\[ \hat H_{{\rm RF\,O}(n)} = -J \sum_{\langle ij\rangle} \hat{\boldsymbol{\Omega}}_i\cdot\hat{\boldsymbol{\Omega}}_j - \sum_i H^\alpha_i\,\Omega^\alpha_i\ , \]
where
\[ \langle\langle H^\alpha_i \rangle\rangle = 0 \qquad,\qquad \langle\langle H^\alpha_i\, H^\beta_j \rangle\rangle = \Gamma\,\delta^{\alpha\beta}\,\delta_{ij}\ , \tag{7.7.12} \]

where ⟨⟨ · ⟩⟩ denotes a configurational average over the disorder. Imry and Ma reasoned that a system could try to lower its free energy by forming domains in which the order parameter takes advantage of local fluctuations in the random field. The size of these domains is assumed to be L_d, a length scale to be determined. See the sketch in the left panel of Figure [ImryMa].

[ImryMa] Left panel: Imry-Ma domains for an O(2) model. The arrows point in the direction of the local order parameter field ⟨Ω̂(r)⟩. Right panel: free energy density as a function of domain size L_d. Keep in mind that the minimum possible value for L_d is the lattice spacing a.

There are two contributions to the energy of a given domain: bulk and surface terms. The bulk energy is
\[ E_{\rm bulk} = -H_{\rm rms}\,(L_d/a)^{d/2}\ , \tag{7.7.13} \]
where a is the lattice spacing. This is because when we add together (L_d/a)^d random fields, the magnitude of the result is proportional to the square root of the number of terms, i.e. to (L_d/a)^{d/2}. The quantity H_rms = √Γ is the root-mean-square fluctuation in the random field at a given site. The surface energy is


\[ E_{\rm surface} \propto \begin{cases} J\,(L_d/a)^{d-1} & \text{(discrete symmetry)} \\ J\,(L_d/a)^{d-2} & \text{(continuous symmetry)}\ . \end{cases} \tag{7.7.14} \]

We compute the critical dimension d_c by balancing the bulk and surface energies,
\[ d - 1 = \tfrac{1}{2} d \quad\Longrightarrow\quad d_c = 2 \quad \text{(discrete)} \]
\[ d - 2 = \tfrac{1}{2} d \quad\Longrightarrow\quad d_c = 4 \quad \text{(continuous)}\ . \]

The total free energy is F = (V/L_d^d)·ΔE, where ΔE = E_bulk + E_surf. Thus, the free energy per unit cell is
\[ f = \frac{F}{V/a^d} \approx J\,\bigg(\frac{a}{L_d}\bigg)^{\!\frac{1}{2} d_c} - H_{\rm rms}\,\bigg(\frac{a}{L_d}\bigg)^{\!\frac{1}{2} d}\ . \tag{7.7.15} \]

If d < d_c, the surface term dominates for small L_d and the bulk term dominates for large L_d. There is a global minimum at
\[ \frac{L_d}{a} = \bigg(\frac{d_c}{d}\cdot\frac{J}{H_{\rm rms}}\bigg)^{\!\frac{2}{d_c - d}}\ . \tag{7.7.16} \]
For d > d_c, the relative dominance of the bulk and surface terms is reversed, and there is a global maximum at this value of L_d.
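Equation 7.7.16 can be checked by brute-force minimization of the free energy per cell (illustrative values: continuous symmetry, so d_c = 4, in d = 2 with weak disorder):

```python
import math

# Minimize f(u) = J u^{dc/2} - H u^{d/2}, with u = a/L_d (Eq. 7.7.15), on a grid,
# and compare the optimal domain size with Eq. 7.7.16.
J, H, d, dc = 1.0, 0.1, 2.0, 4.0
def f(u):
    return J * u ** (dc / 2) - H * u ** (d / 2)

n = 200_000
u_min = min((k / n for k in range(1, n)), key=f)
L_opt = 1.0 / u_min
L_exact = ((dc / d) * (J / H)) ** (2.0 / (dc - d))   # = 20 lattice spacings here
assert abs(L_opt - L_exact) / L_exact < 1e-3
```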

Sketches of the free energy f(L_d) in both cases are provided in the right panel of Figure [ImryMa]. We must keep in mind that the domain size L_d cannot become smaller than the lattice spacing a. Hence we should draw a vertical line on the graph at L_d = a and discard the portion L_d < a as unphysical. For d < d_c, we see that the state with L_d = ∞, the ordered state, is never the state of lowest free energy. In dimensions d < d_c, the ordered state is always unstable to domain formation in the presence of a random field.
For d > d_c, there are two possibilities, depending on the relative size of J and H_rms. We can see this by evaluating f(L_d = a) = J − H_rms and f(L_d = ∞) = 0. Thus, if J > H_rms, the minimum energy state occurs for L_d = ∞. In this case, the system has an ordered ground state, and we expect a finite temperature transition to a disordered state at some critical temperature T_c > 0. If, on the other hand, J < H_rms, then the fluctuations in H overwhelm the exchange energy at T = 0, and the ground state is disordered down to the very smallest length scale (i.e. the lattice spacing a).
Please read the essay, “Memories of Shang-Keng Ma,” at sip.clarku.edu/skma.html.

This page titled 7.7: Global Symmetries is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

7.8: Ginzburg-Landau Theory
Ginzburg-Landau free energy
Including gradient terms in the free energy, we write
\[ F\big[m(\mathbf{x}),\, h(\mathbf{x})\big] = \int d^d x\,\Big\{ f_0 + \tfrac{1}{2} a\, m^2 + \tfrac{1}{4} b\, m^4 + \tfrac{1}{6} c\, m^6 - h\, m + \tfrac{1}{2}\kappa\,(\nabla m)^2 + \ldots \Big\}\ . \tag{7.8.1} \]

In principle, any term which does not violate the appropriate global symmetry will turn up in such an expansion of the free energy, with some coefficient. Examples include hm³ (both m and h are odd under time reversal), m²(∇m)², etc. We now ask: what function m(x) extremizes the free energy functional F[m(x), h(x)]? The answer is that m(x) must satisfy the corresponding Euler-Lagrange equation, which for the above functional is
\[ a\, m + b\, m^3 + c\, m^5 - h - \kappa\,\nabla^2 m = 0\ . \tag{7.8.2} \]

If a > 0 and h is small (we assume b > 0 and c > 0), we may neglect the m³ and m⁵ terms and write
\[ \big(a - \kappa\,\nabla^2\big)\, m = h\ , \tag{7.8.3} \]
whose solution is obtained by Fourier transform as
\[ \hat m(\mathbf{q}) = \frac{\hat h(\mathbf{q})}{a + \kappa\, q^2}\ , \tag{7.8.4} \]
which, with h(x) appropriately defined, recapitulates the result in Equation 7.6.13. Thus, we conclude that
\[ \hat\chi(\mathbf{q}) = \frac{1}{a + \kappa\, q^2}\ , \tag{7.8.5} \]
which should be compared with Equation 7.6.34. For continuous functions, we have

\[ \hat m(\mathbf{q}) = \int d^d x\; m(\mathbf{x})\, e^{-i\mathbf{q}\cdot\mathbf{x}} \]
\[ m(\mathbf{x}) = \int \frac{d^d q}{(2\pi)^d}\, \hat m(\mathbf{q})\, e^{i\mathbf{q}\cdot\mathbf{x}}\ . \]

We can then derive the result
\[ m(\mathbf{x}) = \int d^d x'\;\chi(\mathbf{x} - \mathbf{x}')\, h(\mathbf{x}')\ , \tag{7.8.6} \]
where
\[ \chi(\mathbf{x} - \mathbf{x}') = \frac{1}{\kappa} \int \frac{d^d q}{(2\pi)^d}\,\frac{e^{i\mathbf{q}\cdot(\mathbf{x} - \mathbf{x}')}}{q^2 + \xi^{-2}}\ , \tag{7.8.7} \]
where the correlation length is ξ = √(κ/a) ∝ (T − T_c)^{−1/2}, as before.
If a < 0 then there is a spontaneous magnetization and we write m(x) = m₀ + δm(x). Assuming h is weak, we then have two equations
\[ a + b\, m_0^2 + c\, m_0^4 = 0 \]
\[ \big(a + 3b\, m_0^2 + 5c\, m_0^4 - \kappa\,\nabla^2\big)\,\delta m = h\ . \]
If −a > 0 is small, we have m₀² = −a/b and
\[ \widehat{\delta m}(\mathbf{q}) = \frac{\hat h(\mathbf{q})}{-2a + \kappa\, q^2}\ , \tag{7.8.8} \]

Domain wall profile
A particularly interesting application of Ginzburg-Landau theory is its application toward modeling the spatial profile of defects such as vortices and domain walls. Consider, for example, the case of Ising (Z₂) symmetry with h = 0. We expand the free energy density to order m⁴:
\[ F[m(\mathbf{x})] = \int d^d x\,\Big\{ f_0 + \tfrac{1}{2} a\, m^2 + \tfrac{1}{4} b\, m^4 + \tfrac{1}{2}\kappa\,(\nabla m)^2 \Big\}\ . \tag{7.8.9} \]

We assume a < 0, corresponding to T < T_c. Consider now a domain wall, where m(x → −∞) = −m₀ and m(x → +∞) = +m₀, where m₀ is the equilibrium magnetization, which we obtain from the Euler-Lagrange equation,
\[ a\, m + b\, m^3 - \kappa\,\nabla^2 m = 0\ , \tag{7.8.10} \]
assuming a uniform solution where ∇m = 0. This gives m₀ = √(|a|/b). It is useful to scale m(x) by m₀, writing m(x) = m₀ ϕ(x). The scaled order parameter function ϕ(x) interpolates between ϕ(−∞) = −1 and ϕ(+∞) = 1.
It also proves useful to rescale position, writing x = (2κ/|a|)^{1/2} ζ. Then we obtain
\[ \tfrac{1}{2}\,\nabla^2\phi = -\phi + \phi^3\ . \tag{7.8.11} \]

We assume ϕ(ζ) is only a function of one coordinate, ζ ≡ ζ₁. Then the Euler-Lagrange equation becomes
\[ \frac{d^2\phi}{d\zeta^2} = -2\phi + 2\phi^3 \equiv -\frac{\partial U}{\partial\phi}\ , \tag{7.8.12} \]
where
\[ U(\phi) = -\frac{1}{2}\big(\phi^2 - 1\big)^2\ . \tag{7.8.13} \]

The 'potential' U(ϕ) is an inverted double well, with maxima at ϕ = ±1. The equation ϕ̈ = −U′(ϕ), where dot denotes differentiation with respect to ζ, is simply Newton's second law with time replaced by space. In order to have a stationary solution at ζ → ±∞ where ϕ = ±1, the total energy must be E = U(ϕ = ±1) = 0, where E = ½ϕ̇² + U(ϕ). This leads to the first order differential equation
\[ \frac{d\phi}{d\zeta} = 1 - \phi^2\ , \tag{7.8.14} \]
with solution
\[ \phi(\zeta) = \tanh(\zeta)\ . \tag{7.8.15} \]
Restoring the dimensionful constants,
\[ m(x) = m_0 \tanh\!\bigg(\frac{x}{\sqrt{2}\,\xi}\bigg)\ , \tag{7.8.16} \]
where the coherence length ξ ≡ (κ/|a|)^{1/2} diverges at the Ising transition a = 0.
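A finite-difference spot check that tanh indeed solves Equation 7.8.12:

```python
import math

# Verify phi'' = -2 phi + 2 phi^3 for phi(zeta) = tanh(zeta) at a few points.
h = 1e-3
def residual(zeta):
    phi = math.tanh(zeta)
    phi_pp = (math.tanh(zeta + h) - 2 * phi + math.tanh(zeta - h)) / h**2
    return phi_pp - (-2 * phi + 2 * phi**3)

max_err = max(abs(residual(z)) for z in [-2.0, -0.5, 0.0, 0.3, 1.7])
assert max_err < 1e-5
```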

Derivation of Ginzburg-Landau free energy


We can make some progress in systematically deriving the Ginzburg-Landau free energy. Consider the Ising model,
\[ \frac{\hat H}{k_B T} = -\frac{1}{2}\sum_{i,j} K_{ij}\,\sigma_i\,\sigma_j - \sum_i h_i\,\sigma_i + \frac{1}{2}\sum_i K_{ii}\ , \tag{7.8.17} \]
where now K_ij = J_ij/k_B T and h_i = H_i/k_B T are the interaction energies and local magnetic fields in units of k_B T. The last term on the RHS above cancels out any contribution from diagonal elements of K_ij. Our derivation makes use of a generalization of the Gaussian integral,


\[ \int_{-\infty}^{\infty}\! dx\; e^{-\frac{1}{2} a x^2 - b x} = \Big(\frac{2\pi}{a}\Big)^{\!1/2} e^{b^2/2a}\ . \tag{7.8.18} \]

The generalization is
\[ \int_{-\infty}^{\infty}\!\! dx_1 \cdots \int_{-\infty}^{\infty}\!\! dx_N\; e^{-\frac{1}{2} A_{ij} x_i x_j - b_i x_i} = \frac{(2\pi)^{N/2}}{\sqrt{\det A}}\; e^{\frac{1}{2} A^{-1}_{ij} b_i b_j}\ , \]
valid for positive definite A. Applying this (a Hubbard-Stratonovich transformation) to the Boltzmann weight, we obtain
\[ \begin{aligned} Z = \mathrm{Tr}\, e^{-\hat H/k_B T} &= {\det}^{-1/2}(2\pi K)\; e^{-\frac{1}{2} K_{ii}} \int_{-\infty}^{\infty}\!\! d\phi_1 \cdots \int_{-\infty}^{\infty}\!\! d\phi_N\; e^{-\frac{1}{2} K^{-1}_{ij}\phi_i\phi_j}\;\mathrm{Tr}\, e^{(\phi_i + h_i)\sigma_i} \\ &= {\det}^{-1/2}(2\pi K)\; e^{-\frac{1}{2} K_{ii}} \int_{-\infty}^{\infty}\!\! d\phi_1 \cdots \int_{-\infty}^{\infty}\!\! d\phi_N\; e^{-\frac{1}{2} K^{-1}_{ij}\phi_i\phi_j}\; e^{\sum_i \ln[2\cosh(\phi_i + h_i)]} \\ &\equiv \int_{-\infty}^{\infty}\!\! d\phi_1 \cdots \int_{-\infty}^{\infty}\!\! d\phi_N\; e^{-\Phi(\phi_1, \ldots, \phi_N)}\ , \end{aligned} \]
where
\[ \Phi = \frac{1}{2}\sum_{i,j} K^{-1}_{ij}\,\phi_i\,\phi_j - \sum_i \ln\cosh(\phi_i + h_i) + \frac{1}{2}\ln\det(2\pi K) + \frac{1}{2}\,\mathrm{Tr}\, K - N\ln 2\ . \tag{7.8.19} \]
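The N = 2 case of the generalized Gaussian integral above can be verified by brute-force quadrature (illustrative A and b; a plain Riemann sum, which is effectively exact here because the integrand is a rapidly decaying analytic function):

```python
import math

# LHS: direct 2D quadrature. RHS: (2 pi)^{N/2} det(A)^{-1/2} exp(b.A^{-1}.b / 2).
a11, a12, a22 = 2.0, 0.5, 1.0        # positive definite A
b1, b2 = 0.3, -0.2
det = a11 * a22 - a12 * a12
c1 = (a22 * b1 - a12 * b2) / det     # components of A^{-1} b
c2 = (a11 * b2 - a12 * b1) / det
rhs = 2 * math.pi / math.sqrt(det) * math.exp(0.5 * (b1 * c1 + b2 * c2))

n, R = 600, 9.0
h = 2 * R / n
lhs = 0.0
for i in range(n + 1):
    x = -R + i * h
    for j in range(n + 1):
        y = -R + j * h
        lhs += math.exp(-0.5 * (a11*x*x + 2*a12*x*y + a22*y*y) - b1*x - b2*y) * h * h
assert abs(lhs - rhs) / rhs < 1e-6
```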

We assume the model is defined on a Bravais lattice, in which case we can write ϕ_i = ϕ_{R_i}. We can then define the Fourier transforms,
\[ \phi_{\mathbf{R}} = \frac{1}{\sqrt{N}} \sum_{\mathbf{q}} \hat\phi_{\mathbf{q}}\, e^{i\mathbf{q}\cdot\mathbf{R}} \]
\[ \hat\phi_{\mathbf{q}} = \frac{1}{\sqrt{N}} \sum_{\mathbf{R}} \phi_{\mathbf{R}}\, e^{-i\mathbf{q}\cdot\mathbf{R}} \]
and
\[ \hat K(\mathbf{q}) = \sum_{\mathbf{R}} K(\mathbf{R})\, e^{-i\mathbf{q}\cdot\mathbf{R}}\ . \tag{7.8.20} \]

A few remarks about the lattice structure and periodic boundary conditions are in order. For a Bravais lattice, we can write each direct lattice vector R as a sum over d basis vectors with integer coefficients, viz.
\[ \mathbf{R} = \sum_{\mu=1}^{d} n_\mu\,\mathbf{a}_\mu\ , \tag{7.8.21} \]
where d is the dimension of space. The reciprocal lattice vectors b_μ satisfy
\[ \mathbf{a}_\mu\cdot\mathbf{b}_\nu = 2\pi\,\delta_{\mu\nu}\ , \tag{7.8.22} \]
and any wavevector q may be expressed as
\[ \mathbf{q} = \frac{1}{2\pi}\sum_{\mu=1}^{d} \theta_\mu\,\mathbf{b}_\mu\ . \tag{7.8.23} \]

We can impose periodic boundary conditions on a system of size M₁ × M₂ × ⋯ × M_d by requiring
\[ \phi_{\mathbf{R} + \sum_{\mu=1}^{d} l_\mu M_\mu \mathbf{a}_\mu} = \phi_{\mathbf{R}}\ . \tag{7.8.24} \]
This leads to the quantization of the wavevectors, which must then satisfy
\[ e^{i M_\mu\,\mathbf{q}\cdot\mathbf{a}_\mu} = e^{i M_\mu \theta_\mu} = 1\ , \tag{7.8.25} \]
and therefore θ_μ = 2π m_μ/M_μ, where m_μ is an integer. There are then M₁ M₂ ⋯ M_d = N independent values of q, which can be taken to be those corresponding to m_μ ∈ {1, …, M_μ}.

Let's now expand the function Φ(ϕ) in powers of the ϕ_i, and to first order in the external fields h_i. We obtain
\[ \begin{aligned} \Phi &= \frac{1}{2}\sum_{\mathbf{q}} \big(\hat K^{-1}(\mathbf{q}) - 1\big)\,|\hat\phi_{\mathbf{q}}|^2 + \frac{1}{12}\sum_{\mathbf{R}} \phi_{\mathbf{R}}^4 - \sum_{\mathbf{R}} h_{\mathbf{R}}\,\phi_{\mathbf{R}} + \mathcal{O}\big(\phi^6,\, h^2\big) \\ &\qquad + \frac{1}{2}\,\mathrm{Tr}\, K + \frac{1}{2}\,\mathrm{Tr}\ln(2\pi K) - N\ln 2 \end{aligned} \]

On a d-dimensional lattice, for a model with nearest neighbor interactions K₁ only, we have K̂(q) = K₁ Σ_δ e^{iq·δ}, where δ is a nearest neighbor separation vector. These are the eigenvalues of the matrix K_ij. We note that K_ij is then not positive definite, since there are negative eigenvalues. To fix this, we can add a term K₀ everywhere along the diagonal. We then have
\[ \hat K(\mathbf{q}) = K_0 + K_1 \sum_{\boldsymbol{\delta}} \cos(\mathbf{q}\cdot\boldsymbol{\delta})\ . \tag{7.8.26} \]
Here we have used the inversion symmetry of the Bravais lattice to eliminate the imaginary term. The eigenvalues are all positive so long as K₀ > zK₁, where z is the lattice coordination number. We can therefore write K̂(q) = K̂(0) − α q² for small q, with α > 0. Thus, we can write
\[ \hat K^{-1}(\mathbf{q}) - 1 = a + \kappa\, q^2 + \ldots\ . \tag{7.8.27} \]

To lowest order in $q$ the RHS is isotropic if the lattice has cubic symmetry, but anisotropy will enter in higher order terms. We’ll assume isotropy at this level. This is not necessary, but it makes the discussion somewhat less involved. We can now write down our Ginzburg-Landau free energy density:
\[
\mathcal{F} = \frac{1}{2}\, a\, \phi^2 + \frac{1}{2}\, \kappa\, |\nabla\phi|^2 + \frac{1}{12}\, \phi^4 - h\, \phi \ , \qquad (7.8.28)
\]
valid to lowest nontrivial order in derivatives, and to sixth order in $\phi$.


One might wonder what we have gained over the inhomogeneous variational density matrix treatment, where we found
\[
\begin{aligned}
F &= -\frac{1}{2} \sum_q \hat J(q)\, \hat m(-q)\, \hat m(q) - \sum_q \hat H(-q)\, \hat m(q)\\
&\qquad + k_B T \sum_i \bigg\{ \Big( \frac{1+m_i}{2} \Big) \ln\!\Big( \frac{1+m_i}{2} \Big) + \Big( \frac{1-m_i}{2} \Big) \ln\!\Big( \frac{1-m_i}{2} \Big) \bigg\} \ .
\end{aligned}
\]
Surely we could expand $\hat J(q) = \hat J(0) - \frac{1}{2}\, a\, q^2 + \ldots$ and obtain a similar expression for $F$. However, such a derivation using the variational density matrix is only approximate. The method outlined in this section is exact.
Let’s return to our complete expression for $\Phi$:
\[
\Phi(\boldsymbol\phi) = \Phi_0(\boldsymbol\phi) + \sum_R v(\phi_R) \ , \qquad (7.8.29)
\]
where
\[
\Phi_0(\boldsymbol\phi) = \frac{1}{2} \sum_q G^{-1}(q)\, \big| \hat\phi(q) \big|^2 + \frac{1}{2}\, \mathrm{Tr} \bigg[ \frac{1}{1 + G^{-1}} \bigg] + \frac{1}{2}\, \mathrm{Tr} \ln\! \bigg[ \frac{2\pi}{1 + G^{-1}} \bigg] - N \ln 2 \ . \qquad (7.8.30)
\]

Here we have defined
\[
v(\phi) = \frac{1}{2}\, \phi^2 - \ln \cosh \phi = \frac{1}{12}\, \phi^4 - \frac{1}{45}\, \phi^6 + \frac{17}{2520}\, \phi^8 + \ldots
\]
and
\[
G(q) = \frac{\hat K(q)}{1 - \hat K(q)} \ . \qquad (7.8.31)
\]

We now want to compute
\[
Z = \int\! D\boldsymbol\phi \; e^{-\Phi_0(\boldsymbol\phi)}\, e^{-\sum_R v(\phi_R)} \ , \qquad (7.8.32)
\]
where
\[
D\boldsymbol\phi \equiv d\phi_1\, d\phi_2 \cdots d\phi_N \ . \qquad (7.8.33)
\]

We expand the second exponential factor in a Taylor series, allowing us to write
\[
Z = Z_0 \bigg( 1 - \sum_R \big\langle v(\phi_R) \big\rangle + \frac{1}{2} \sum_R \sum_{R'} \big\langle v(\phi_R)\, v(\phi_{R'}) \big\rangle + \ldots \bigg) \ , \qquad (7.8.34)
\]

where
\[
\begin{aligned}
Z_0 &= \int\! D\boldsymbol\phi \; e^{-\Phi_0(\boldsymbol\phi)}\\
\ln Z_0 &= \frac{1}{2}\, \mathrm{Tr} \bigg[ \ln(1+G) - \frac{G}{1+G} \bigg] + N \ln 2 \ ,
\end{aligned}
\]
and
\[
\big\langle F(\boldsymbol\phi) \big\rangle = \frac{\int\! D\boldsymbol\phi \; F\, e^{-\Phi_0}}{\int\! D\boldsymbol\phi \; e^{-\Phi_0}} \ . \qquad (7.8.35)
\]

To evaluate the various terms in the expansion of Equation (7.8.34), we invoke Wick’s theorem, which says
\[
\begin{aligned}
\big\langle x_{i_1} x_{i_2} \cdots x_{i_{2L}} \big\rangle &= \int\limits_{-\infty}^{\infty}\!\! dx_1 \cdots \int\limits_{-\infty}^{\infty}\!\! dx_N \; e^{-\frac{1}{2} G^{-1}_{ij} x_i x_j}\, x_{i_1} x_{i_2} \cdots x_{i_{2L}} \bigg/ \int\limits_{-\infty}^{\infty}\!\! dx_1 \cdots \int\limits_{-\infty}^{\infty}\!\! dx_N \; e^{-\frac{1}{2} G^{-1}_{ij} x_i x_j}\\
&= \sum_{\substack{\text{all distinct}\\ \text{pairings}}} G_{j_1 j_2}\, G_{j_3 j_4} \cdots G_{j_{2L-1} j_{2L}} \ ,
\end{aligned}
\]
where the sets $\{j_1, \ldots, j_{2L}\}$ are all permutations of the set $\{i_1, \ldots, i_{2L}\}$. In particular, we have
\[
\big\langle x_i^4 \big\rangle = 3\, (G_{ii})^2 \ . \qquad (7.8.36)
\]
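Wick’s theorem is easy to spot-check by Monte Carlo. The sketch below samples a single zero-mean Gaussian variable of variance $G$ and verifies $\langle x^4 \rangle = 3\,G^2$ to statistical accuracy (the sample size and seed are arbitrary choices):

```python
import random

# Monte Carlo check of the Wick result <x^4> = 3 (G_ii)^2 for one Gaussian
# variable of variance G; the statistical error here is well below 5%.
random.seed(42)
G = 1.7                                   # variance G_ii
sigma = G ** 0.5
N = 200_000
x4 = sum(random.gauss(0.0, sigma) ** 4 for _ in range(N)) / N
exact = 3.0 * G * G
assert abs(x4 / exact - 1.0) < 0.05       # agrees to a few percent
```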

In our case, we have
\[
\big\langle \phi_R^4 \big\rangle = 3\, \bigg( \frac{1}{N} \sum_q G(q) \bigg)^{\!2} \ . \qquad (7.8.37)
\]
Thus, if we write $v(\phi) \approx \frac{1}{12}\, \phi^4$ and retain only the quartic term in $v(\phi)$, we obtain
\[
\begin{aligned}
\frac{F}{k_B T} = -\ln Z &= \frac{1}{2}\, \mathrm{Tr} \bigg[ \frac{G}{1+G} - \ln(1+G) \bigg] + \frac{1}{4N}\, \big( \mathrm{Tr}\, G \big)^2 - N \ln 2\\
&= -N \ln 2 + \frac{1}{4N}\, \big( \mathrm{Tr}\, G \big)^2 - \frac{1}{4}\, \mathrm{Tr}\big( G^2 \big) + O\big( G^3 \big) \ .
\end{aligned}
\]

Note that if we set $K_{ij}$ to be diagonal, then $\hat K(q)$ and hence $G(q)$ are constant functions of $q$. The $O(G^2)$ term then vanishes, which is required since the free energy cannot depend on the diagonal elements of $K_{ij}$.
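This cancellation can be verified directly: for constant $G(q)$ the combination $\frac{1}{4N}(\mathrm{Tr}\,G)^2 - \frac{1}{4}\mathrm{Tr}(G^2)$ vanishes identically, while for any dispersive $G(q)$ it is strictly negative by the Cauchy-Schwarz inequality. A minimal sketch:

```python
# O(G^2) piece of the free energy, (1/4N)(Tr G)^2 - (1/4) Tr G^2, evaluated
# over a list of G(q) values: zero when G(q) is constant, negative otherwise.
def o_g2(G_vals):
    N = len(G_vals)
    trG = sum(G_vals)
    trG2 = sum(g * g for g in G_vals)
    return trG * trG / (4.0 * N) - trG2 / 4.0

assert abs(o_g2([0.37] * 10)) < 1e-12                   # constant G(q): vanishes
assert o_g2([0.1 * (k + 1) for k in range(10)]) < 0.0   # dispersive G(q): negative
```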

Ginzburg criterion
Let us define $A(T, H, V, N)$ to be the usual (thermodynamic) Helmholtz free energy. Then
\[
e^{-\beta A} = \int\! Dm \; e^{-\beta F[m(x)]} \ , \qquad (7.8.38)
\]
where the functional $F[m(x)]$ is of the Ginzburg-Landau form, given in Equation (7.8.28). The integral above is a functional integral. We can give it a more precise meaning by defining its measure in the case of periodic functions $m(x)$ confined to a rectangular box. Then we can expand
\[
m(x) = \frac{1}{\sqrt{V}} \sum_q \hat m_q\, e^{iq\cdot x} \ , \qquad (7.8.39)
\]
and we define the measure
\[
Dm \equiv d\hat m_0 \prod_{\substack{q\\ q_x > 0}} d\,\mathrm{Re}\, \hat m_q \; d\,\mathrm{Im}\, \hat m_q \ . \qquad (7.8.40)
\]


Note that the fact that $m(x) \in \mathbb{R}$ means that $\hat m_{-q} = \hat m_q^*$. We’ll assume $T > T_c$ and $H = 0$, and we’ll explore the limit $T \to T_c^+$ from above to analyze the properties of the critical region close to $T_c$. In this limit we can ignore all but the quadratic terms in $m$, and we have
\[
\begin{aligned}
e^{-\beta A} &= \int\! Dm \; \exp\bigg( -\frac{1}{2}\, \beta \sum_q \big( a + \kappa\, q^2 \big)\, |\hat m_q|^2 \bigg)\\
&= \prod_q \bigg( \frac{\pi k_B T}{a + \kappa\, q^2} \bigg)^{\!1/2} \ .
\end{aligned}
\]
Thus,
\[
A = \frac{1}{2}\, k_B T \sum_q \ln\! \bigg( \frac{a + \kappa\, q^2}{\pi k_B T} \bigg) \ . \qquad (7.8.41)
\]

We now assume that $a(T) = \alpha t$, where $t$ is the dimensionless quantity
\[
t = \frac{T - T_c}{T_c} \ , \qquad (7.8.42)
\]
known as the reduced temperature.


We now compute the heat capacity $C_V = -T\, \partial^2\! A / \partial T^2$. We are really only interested in the singular contributions to $C_V$, which means that we’re only interested in differentiating with respect to $T$ as it appears in $a(T)$. We divide by $N_S\, k_B$, where $N_S$ is the number of unit cells of our system, which we presume is a lattice-based model. Note $N_S \sim V/a^d$, where $V$ is the volume and $a$ the lattice constant. The dimensionless heat capacity per lattice site is then
\[
c \equiv \frac{C_V}{N_S\, k_B} = \frac{\alpha^2 a^d}{2\kappa^2} \int\limits^{\Lambda}\!\! \frac{d^d\!q}{(2\pi)^d}\, \frac{1}{\big( \xi^{-2} + q^2 \big)^2} \ , \qquad (7.8.43)
\]
where $\xi = (\kappa/\alpha t)^{1/2} \propto |t|^{-1/2}$ is the correlation length, and where $\Lambda \sim a^{-1}$ is an ultraviolet cutoff. We define $R_* \equiv (\kappa/\alpha)^{1/2}$, in which case
\[
c = \frac{1}{2}\, R_*^{-4}\, a^d\, \xi^{4-d} \int\limits^{\Lambda\xi}\!\! \frac{d^d\!\bar q}{(2\pi)^d}\, \frac{1}{\big( 1 + \bar q^2 \big)^2} \ , \qquad (7.8.44)
\]

where $\bar q \equiv q\xi$. Thus,
\[
c(t) \sim \begin{cases} \text{const.} & \text{if } d > 4\\ -\ln t & \text{if } d = 4\\ t^{\frac{d}{2} - 2} & \text{if } d < 4 \ . \end{cases} \qquad (7.8.45)
\]

For $d > 4$, mean field theory is qualitatively accurate, with finite corrections. In dimensions $d \le 4$, the mean field result is overwhelmed by fluctuation contributions as $t \to 0^+$ (as $T \to T_c^+$). We see that MFT is sensible provided the fluctuation contributions are small, provided
\[
R_*^{-4}\, a^d\, \xi^{4-d} \ll 1 \ , \qquad (7.8.46)
\]
which entails $t \gg t_G$, where
\[
t_G = \bigg( \frac{a}{R_*} \bigg)^{\!\frac{2d}{4-d}}
\]
is the Ginzburg reduced temperature. The criterion for the sufficiency of mean field theory, namely $t \gg t_G$, is known as the Ginzburg criterion. The region $|t| < t_G$ is known as the critical region.
In a lattice ferromagnet, as we have seen, $R_* \sim a$ is on the scale of the lattice spacing itself, hence $t_G \sim 1$ and the critical regime is very large. Mean field theory then fails quickly as $T \to T_c$. In a (conventional) three-dimensional superconductor, $R_*$ is on the order of the Cooper pair size, and $R_*/a \sim 10^2 - 10^3$, hence $t_G = (a/R_*)^6 \sim 10^{-18} - 10^{-12}$ is negligibly narrow. The mean field theory of the superconducting transition – BCS theory – is then valid essentially all the way to $T = T_c$.
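In numbers: a short sketch evaluating $t_G = (a/R_*)^{2d/(4-d)}$ in $d = 3$ for the two cases just described (the $R_*/a$ ratios used are the order-of-magnitude values quoted above):

```python
# Ginzburg reduced temperature t_G = (a/R_*)^(2d/(4-d)), evaluated in d = 3.
def t_ginzburg(a_over_Rstar, d=3):
    return a_over_Rstar ** (2.0 * d / (4.0 - d))

# Lattice ferromagnet: R_* ~ a, so t_G ~ 1 and the critical region is wide.
assert t_ginzburg(1.0) == 1.0

# Conventional superconductor: R_*/a ~ 10^3, so t_G = (a/R_*)^6 ~ 10^-18.
t_sc = t_ginzburg(1.0e-3)
assert 0.9e-18 < t_sc < 1.1e-18
```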

This page titled 7.8: Ginzburg-Landau Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

7.9: Appendix I- Equivalence of the Mean Field Descriptions
In both the variational density matrix and mean field Hamiltonian methods as applied to the Ising model, we obtained the same result $m = \tanh\big( (m+h)/\theta \big)$. What is perhaps not obvious is whether these theories are in fact the same, whether their respective free energies agree. Indeed, the two free energy functions,
\[
\begin{aligned}
f_A(m,h,\theta) &= -\frac{1}{2}\, m^2 - hm + \theta\, \bigg\{ \Big( \frac{1+m}{2} \Big) \ln\!\Big( \frac{1+m}{2} \Big) + \Big( \frac{1-m}{2} \Big) \ln\!\Big( \frac{1-m}{2} \Big) \bigg\}\\
f_B(m,h,\theta) &= +\frac{1}{2}\, m^2 - \theta\, \ln\!\Big( e^{(m+h)/\theta} + e^{-(m+h)/\theta} \Big) \ ,
\end{aligned}
\]
where $f_A$ is the variational density matrix result and $f_B$ is the mean field Hamiltonian result, clearly are different functions of their arguments. However, it turns out that upon minimizing with respect to $m$ in each case, the resulting free energies obey $f_A(h,\theta) = f_B(h,\theta)$. This agreement may seem surprising. The first method utilizes an approximate (variational) density matrix applied to the exact Hamiltonian $\hat H$. The second method approximates the Hamiltonian as $\hat H_{MF}$, but otherwise treats it exactly. The two Landau expansions seem hopelessly different:
\[
\begin{aligned}
f_A(m,h,\theta) &= -\theta\, \ln 2 - hm + \frac{1}{2}\, (\theta - 1)\, m^2 + \frac{\theta}{12}\, m^4 + \frac{\theta}{30}\, m^6 + \ldots\\
f_B(m,h,\theta) &= -\theta\, \ln 2 + \frac{1}{2}\, m^2 - \frac{(m+h)^2}{2\,\theta} + \frac{(m+h)^4}{12\,\theta^3} - \ldots \ .
\end{aligned}
\]
We shall now prove that these two methods, the variational density matrix and the mean field approach, are in fact equivalent, and yield the same free energy $f(h,\theta)$.
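Before the general proof, here is a quick numerical check of this claim for the Ising case. The form of $f_B$ used below, $\frac{1}{2}m^2 - \theta \ln\big( 2\cosh((m+h)/\theta) \big)$, is the standard mean-field-Hamiltonian result, consistent with the Landau expansion quoted above:

```python
import math

# Minimize f_A and f_B over m on a fine grid and compare the minima; they
# should agree (the functions differ, but their minima coincide).
def f_A(m, h, th):
    s = ((1 + m) / 2) * math.log((1 + m) / 2) + ((1 - m) / 2) * math.log((1 - m) / 2)
    return -0.5 * m * m - h * m + th * s

def f_B(m, h, th):                  # 1/2 m^2 - theta ln(2 cosh((m+h)/theta))
    return 0.5 * m * m - th * math.log(2.0 * math.cosh((m + h) / th))

def fmin(f, h, th, npts=10000):     # grid search on (-1, 1)
    return min(f(-0.9999 + 1.9998 * i / npts, h, th) for i in range(npts + 1))

for h, th in [(0.1, 0.5), (0.0, 1.3), (0.3, 0.8)]:
    assert abs(fmin(f_A, h, th) - fmin(f_B, h, th)) < 1e-4
```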
Let us generalize the Ising model and write
Let us generalize the Ising model and write
\[
\hat H = -\sum_{i<j} J_{ij}\, \varepsilon(\sigma_i, \sigma_j) - \sum_i \Phi(\sigma_i) \ . \qquad (7.9.1)
\]
Here, each ‘spin’ $\sigma_i$ may take on any of $K$ possible values, $\{s_1, \ldots, s_K\}$. For the $S = 1$ Ising model, we would have $K = 3$ possibilities, with $s_1 = -1$, $s_2 = 0$, and $s_3 = +1$. But the set $\{s_\alpha\}$, with $\alpha \in \{1, \ldots, K\}$, is completely arbitrary. The ‘local field’ term $\Phi(\sigma)$ is also a completely arbitrary function. It may be linear, with $\Phi(\sigma) = H\sigma$, for example, but it could also contain terms quadratic in $\sigma$, or whatever one desires.

The symmetric, dimensionless interaction function $\varepsilon(\sigma, \sigma') = \varepsilon(\sigma', \sigma)$ is a real symmetric $K \times K$ matrix. According to the singular value decomposition theorem, any such matrix may be written in the form
\[
\varepsilon(\sigma, \sigma') = \sum_{p=1}^{N_s} A_p\, \lambda_p(\sigma)\, \lambda_p(\sigma') \ , \qquad (7.9.2)
\]
where the $\{A_p\}$ are coefficients (the singular values), and the $\{\lambda_p(\sigma)\}$ are the singular vectors. The number of terms $N_s$ in this decomposition is such that $N_s \le K$. This treatment can be generalized to account for continuous $\sigma$.

Variational Density Matrix


The most general single-site variational density matrix is written
\[
\varrho(\sigma) = \sum_{\alpha=1}^{K} x_\alpha\, \delta_{\sigma, s_\alpha} \ . \qquad (7.9.3)
\]
Thus, $x_\alpha$ is the probability for a given site to be in state $\alpha$, with $\sigma = s_\alpha$. The $\{x_\alpha\}$ are the $K$ variational parameters, subject to the single normalization constraint $\sum_\alpha x_\alpha = 1$. We now have
\[
\begin{aligned}
f &= \frac{1}{N \hat J(0)}\, \Big\{ \mathrm{Tr}\, (\varrho \hat H) + k_B T\, \mathrm{Tr}\, (\varrho \ln \varrho) \Big\}\\
&= -\frac{1}{2} \sum_p \sum_{\alpha,\alpha'} A_p\, \lambda_p(s_\alpha)\, \lambda_p(s_{\alpha'})\, x_\alpha\, x_{\alpha'} - \sum_\alpha \varphi(s_\alpha)\, x_\alpha + \theta \sum_\alpha x_\alpha \ln x_\alpha \ ,
\end{aligned}
\]
where $\varphi(\sigma) = \Phi(\sigma)/\hat J(0)$. We extremize in the usual way, introducing a Lagrange undetermined multiplier $\zeta$ to enforce the constraint. This means we extend the function $f\big( \{x_\alpha\} \big)$, writing
\[
f^*(x_1, \ldots, x_K, \zeta) = f(x_1, \ldots, x_K) + \zeta\, \bigg( \sum_{\alpha=1}^{K} x_\alpha - 1 \bigg) \ , \qquad (7.9.4)
\]

and freely extremizing with respect to the $(K+1)$ parameters $\{x_1, \ldots, x_K, \zeta\}$. This yields $K$ nonlinear equations,
\[
0 = \frac{\partial f^*}{\partial x_\alpha} = -\sum_p \sum_{\alpha'} A_p\, \lambda_p(s_\alpha)\, \lambda_p(s_{\alpha'})\, x_{\alpha'} - \varphi(s_\alpha) + \theta \ln x_\alpha + \zeta + \theta \ , \qquad (7.9.5)
\]
for each $\alpha$, and one linear equation, which is the normalization condition,
\[
0 = \frac{\partial f^*}{\partial \zeta} = \sum_\alpha x_\alpha - 1 \ . \qquad (7.9.6)
\]
We cannot solve these nonlinear equations analytically, but they may be recast, by exponentiating them, as
\[
x_\alpha = \frac{1}{Z} \exp\bigg\{ \frac{1}{\theta} \bigg[ \sum_p \sum_{\alpha'} A_p\, \lambda_p(s_\alpha)\, \lambda_p(s_{\alpha'})\, x_{\alpha'} + \varphi(s_\alpha) \bigg] \bigg\} \ , \qquad (7.9.7)
\]
with
\[
Z = e^{(\zeta/\theta) + 1} = \sum_\alpha \exp\bigg\{ \frac{1}{\theta} \bigg[ \sum_p \sum_{\alpha'} A_p\, \lambda_p(s_\alpha)\, \lambda_p(s_{\alpha'})\, x_{\alpha'} + \varphi(s_\alpha) \bigg] \bigg\} \ . \qquad (7.9.8)
\]
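These self-consistency equations invite solution by fixed-point iteration. A sketch for the simplest case — the two-state Ising model, with $K = 2$, $s = (-1, +1)$, a single term $A_1 = 1$ with $\lambda_1(\sigma) = \sigma$, and $\varphi(\sigma) = h\sigma$ (all choices made here for illustration):

```python
import math

# Fixed-point iteration of x_alpha = Z^{-1} exp{(1/theta)[...]} for the
# two-state Ising case. The bracket reduces to (lam_bar + h) * s_alpha, where
# lam_bar = <sigma> = m, so the iteration should reproduce m = tanh((m+h)/theta).
s = (-1.0, +1.0)
theta, h = 0.5, 0.1

x = [0.5, 0.5]                            # start from the symmetric point
for _ in range(200):
    lam_bar = sum(sa * xa for sa, xa in zip(s, x))      # <sigma> = m
    w = [math.exp((lam_bar + h) * sa / theta) for sa in s]
    Z = sum(w)
    x = [wi / Z for wi in w]

m = sum(sa * xa for sa, xa in zip(s, x))
assert abs(sum(x) - 1.0) < 1e-12                        # normalized
assert abs(m - math.tanh((m + h) / theta)) < 1e-10      # mean field equation
```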

From the logarithm of $x_\alpha$, we may compute the entropy, and, finally, the free energy:
\[
f(\theta) = \frac{1}{2} \sum_p \sum_{\alpha,\alpha'} A_p\, \lambda_p(s_\alpha)\, \lambda_p(s_{\alpha'})\, x_\alpha\, x_{\alpha'} - \theta \ln Z \ , \qquad (7.9.9)
\]
which is to be evaluated at the solution $\{x^*_\alpha(h,\theta)\}$ of Equation (7.9.7).

Mean Field Approximation


We now derive a mean field approximation in the spirit of that used in the Ising model above. We write
\[
\lambda_p(\sigma) = \big\langle \lambda_p(\sigma) \big\rangle + \delta\lambda_p(\sigma) \ , \qquad (7.9.10)
\]
and abbreviate $\bar\lambda_p = \langle \lambda_p(\sigma) \rangle$, the thermodynamic average of $\lambda_p(\sigma)$ on any given site. We then have
\[
\begin{aligned}
\lambda_p(\sigma)\, \lambda_p(\sigma') &= \bar\lambda_p^2 + \bar\lambda_p\, \delta\lambda_p(\sigma) + \bar\lambda_p\, \delta\lambda_p(\sigma') + \delta\lambda_p(\sigma)\, \delta\lambda_p(\sigma')\\
&= -\bar\lambda_p^2 + \bar\lambda_p\, \big( \lambda_p(\sigma) + \lambda_p(\sigma') \big) + \delta\lambda_p(\sigma)\, \delta\lambda_p(\sigma') \ .
\end{aligned}
\]

The product $\delta\lambda_p(\sigma)\, \delta\lambda_p(\sigma')$ is of second order in fluctuations, and we neglect it. This leads us to the mean field Hamiltonian,
\[
\hat H_{MF} = +\frac{1}{2}\, N \hat J(0) \sum_p A_p\, \bar\lambda_p^2 - \sum_i \bigg[ \hat J(0) \sum_p A_p\, \bar\lambda_p\, \lambda_p(\sigma_i) + \Phi(\sigma_i) \bigg] \ .
\]
The free energy is then
\[
f\big( \{\bar\lambda_p\}, \theta \big) = \frac{1}{2} \sum_p A_p\, \bar\lambda_p^2 - \theta \ln \sum_\alpha \exp\bigg\{ \frac{1}{\theta} \bigg[ \sum_p A_p\, \bar\lambda_p\, \lambda_p(s_\alpha) + \varphi(s_\alpha) \bigg] \bigg\} \ . \qquad (7.9.11)
\]

The variational parameters are the mean field values $\{\bar\lambda_p\}$. The single site probabilities $\{x_\alpha\}$ are then
\[
x_\alpha = \frac{1}{Z} \exp\bigg\{ \frac{1}{\theta} \bigg[ \sum_p A_p\, \bar\lambda_p\, \lambda_p(s_\alpha) + \varphi(s_\alpha) \bigg] \bigg\} \ , \qquad (7.9.12)
\]
with $Z$ implied by the normalization $\sum_\alpha x_\alpha = 1$. These results reproduce exactly what we found in Equation (7.9.7), since the mean field equation here, $\partial f / \partial \bar\lambda_p = 0$, yields
\[
\bar\lambda_p = \sum_{\alpha=1}^{K} \lambda_p(s_\alpha)\, x_\alpha \ . \qquad (7.9.13)
\]
The free energy is immediately found to be
\[
f(\theta) = \frac{1}{2} \sum_p A_p\, \bar\lambda_p^2 - \theta \ln Z \ , \qquad (7.9.14)
\]
which again agrees with what we found using the variational density matrix.
Thus, whether one extremizes with respect to the set $\{x_1, \ldots, x_K, \zeta\}$, or with respect to the set $\{\bar\lambda_p\}$, the results are the same, in terms of all these parameters, as well as the free energy $f(\theta)$. Generically, both approaches may be termed ‘mean field theory’ since the variational density matrix corresponds to a mean field which acts on each site independently.

This page titled 7.9: Appendix I- Equivalence of the Mean Field Descriptions is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

7.10: Appendix II- Additional Examples
Blume-Capel model
The Blume-Capel model provides a simple and convenient way to model systems with vacancies. The simplest version of the model is written
\[
\hat H = -\frac{1}{2} \sum_{i,j} J_{ij}\, S_i\, S_j + \Delta \sum_i S_i^2 \ . \qquad (7.10.1)
\]
The spin variables $S_i$ range over the values $\{-1, 0, +1\}$, so this is an extension of the $S = 1$ Ising model. We explicitly separate out the diagonal terms, writing $J_{ii} \equiv 0$, and placing them in the second term on the RHS above. We say that site $i$ is occupied if $S_i = \pm 1$ and vacant if $S_i = 0$, and we identify $-\Delta$ as the vacancy creation energy, which may be positive or negative, depending on whether vacancies are disfavored or favored in our system.


We make the mean field Ansatz, writing $S_i = m + \delta S_i$. This results in the mean field Hamiltonian,
\[
\hat H_{MF} = \frac{1}{2}\, N \hat J(0)\, m^2 - \hat J(0)\, m \sum_i S_i + \Delta \sum_i S_i^2 \ .
\]
Once again, we adimensionalize, writing $f \equiv F / N \hat J(0)$, $\theta = k_B T / \hat J(0)$, and $\delta = \Delta / \hat J(0)$. We assume $\hat J(0) > 0$. The free energy per site is then
\[
f(\theta, \delta, m) = \frac{1}{2}\, m^2 - \theta \ln\!\Big( 1 + 2\, e^{-\delta/\theta} \cosh(m/\theta) \Big) \ . \qquad (7.10.2)
\]

Extremizing with respect to $m$, we obtain the mean field equation,
\[
m = \frac{2 \sinh(m/\theta)}{\exp(\delta/\theta) + 2 \cosh(m/\theta)} \ . \qquad (7.10.3)
\]
Note that $m = 0$ is always a solution. Finding the slope of the RHS at $m = 0$ and setting it to unity gives us the critical temperature:
\[
\theta_c = \frac{2}{\exp(\delta/\theta_c) + 2} \ . \qquad (7.10.4)
\]
This is an implicit equation for $\theta_c$ in terms of the vacancy energy $\delta$.

[blume] Mean field phase diagram for the Blume-Capel model. The black dot signifies a tricritical point, where the coefficients of $m^2$ and $m^4$ in the Landau free energy expansion both vanish. The dashed curve denotes a first order transition, and the solid curve a second order transition. The thin dotted line is the continuation of the $\theta_c(\delta)$ relation to zero temperature.

Let’s now expand the free energy in terms of the magnetization $m$. We find, to fourth order,
\[
f = -\theta \ln\!\big( 1 + 2\, e^{-\delta/\theta} \big) + \frac{1}{2\theta} \bigg( \theta - \frac{2}{2 + \exp(\delta/\theta)} \bigg) m^2 + \frac{1}{12\, \big( 2 + \exp(\delta/\theta) \big)\, \theta^3} \bigg( \frac{6}{2 + \exp(\delta/\theta)} - 1 \bigg) m^4 + \ldots \ .
\]
Note that setting the coefficient of the $m^2$ term to zero yields the equation for $\theta_c$. However, upon further examination, we see that the coefficient of the $m^4$ term can also vanish. As we have seen, when both the coefficients of the $m^2$ and the $m^4$ terms vanish, we have a tricritical point. Setting both coefficients to zero, we obtain
\[
\theta_t = \frac{1}{3} \qquad,\qquad \delta_t = \frac{2}{3}\, \ln 2 \ . \qquad (7.10.5)
\]
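One can confirm numerically that both Landau coefficients vanish at this point:

```python
import math

# Verify that the m^2 and m^4 Landau coefficients of the Blume-Capel free
# energy both vanish at theta_t = 1/3, delta_t = (2/3) ln 2 (where e^{delta/theta} = 4).
def c2(th, d):
    return (th - 2.0 / (2.0 + math.exp(d / th))) / (2.0 * th)

def c4(th, d):
    E = 2.0 + math.exp(d / th)
    return (6.0 / E - 1.0) / (12.0 * E * th ** 3)

th_t, d_t = 1.0 / 3.0, (2.0 / 3.0) * math.log(2.0)
assert abs(c2(th_t, d_t)) < 1e-12
assert abs(c4(th_t, d_t)) < 1e-12
```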

At $\theta = 0$, it is easy to see we have a first order transition, simply by comparing the energies of the paramagnetic ($S_i = 0$) and ferromagnetic ($S_i = +1$ or $S_i = -1$) states. We have
\[
\frac{E_{MF}}{N \hat J(0)} = \begin{cases} 0 & \text{if } m = 0\\ \delta - \frac{1}{2} & \text{if } m = \pm 1 \ . \end{cases}
\]
These results are in fact exact, and not only valid for the mean field theory. Mean field theory is approximate because it neglects fluctuations, but at zero temperature, there are no fluctuations to neglect!

The phase diagram is shown in Figure [blume]. Note that for $\delta$ large and negative, vacancies are strongly disfavored, hence the only allowed states on each site have $S_i = \pm 1$, which is our old friend the two-state Ising model. Accordingly, the phase boundary there approaches the vertical line $\theta_c = 1$, which is the mean field transition temperature for the two-state Ising model.

Ising antiferromagnet in an external field


Consider the following model:
\[
\hat H = J \sum_{\langle ij \rangle} \sigma_i\, \sigma_j - H \sum_i \sigma_i \ , \qquad (7.10.6)
\]
with $J > 0$ and $\sigma_i = \pm 1$. We’ve solved for the mean field phase diagram of the Ising ferromagnet; what happens if the interactions are antiferromagnetic?
It turns out that under certain circumstances, the ferromagnet and the antiferromagnet behave exactly the same in terms of their phase diagram, response functions, and so on. This occurs when $H = 0$, and when the interactions are between nearest neighbors on a bipartite lattice. A bipartite lattice is one which can be divided into two sublattices, which we call A and B, such that an A site has only B neighbors, and a B site has only A neighbors. The square, honeycomb, and body centered cubic (BCC) lattices are bipartite. The triangular and face centered cubic lattices are non-bipartite. Now if the lattice is bipartite and the interaction matrix $J_{ij}$ is nonzero only when $i$ and $j$ are from different sublattices (they needn’t be nearest neighbors only), then we can simply redefine the spin variables such that
\[
\sigma'_j = \begin{cases} +\sigma_j & \text{if } j \in \mathrm{A}\\ -\sigma_j & \text{if } j \in \mathrm{B} \ . \end{cases} \qquad (7.10.7)
\]
Then $\sigma'_i\, \sigma'_j = -\sigma_i\, \sigma_j$, and in terms of the new spin variables the exchange constant has reversed. The thermodynamic properties are invariant under such a redefinition of the spin variables.

We can see why this trick doesn’t work in the presence of a magnetic field, because the field $H$ would have to be reversed on the B sublattice. In other words, the thermodynamics of an Ising ferromagnet on a bipartite lattice in a uniform applied field is identical to that of the Ising antiferromagnet, with the same exchange constant (in magnitude), in the presence of a staggered field $H_A = +H$ and $H_B = -H$.

We treat this problem using the variational density matrix method, using two independent variational parameters $m_A$ and $m_B$ for the two sublattices:
\[
\begin{aligned}
\varrho_A(\sigma) &= \frac{1+m_A}{2}\; \delta_{\sigma,1} + \frac{1-m_A}{2}\; \delta_{\sigma,-1}\\
\varrho_B(\sigma) &= \frac{1+m_B}{2}\; \delta_{\sigma,1} + \frac{1-m_B}{2}\; \delta_{\sigma,-1} \ .
\end{aligned}
\]
With the usual adimensionalization, $f = F/NzJ$, $\theta = k_B T/zJ$, and $h = H/zJ$, we have the free energy
\[
f(m_A, m_B) = \frac{1}{2}\, m_A\, m_B - \frac{1}{2}\, h\, (m_A + m_B) - \frac{1}{2}\, \theta\, s(m_A) - \frac{1}{2}\, \theta\, s(m_B) \ ,
\]
where the entropy function is
\[
s(m) = -\bigg[ \frac{1+m}{2}\, \ln\!\Big( \frac{1+m}{2} \Big) + \frac{1-m}{2}\, \ln\!\Big( \frac{1-m}{2} \Big) \bigg] \ . \qquad (7.10.8)
\]
Note that
\[
\frac{ds}{dm} = -\frac{1}{2}\, \ln\!\Big( \frac{1+m}{1-m} \Big) \qquad,\qquad \frac{d^2\!s}{dm^2} = -\frac{1}{1-m^2} \ . \qquad (7.10.9)
\]

[affgraph] Graphical solution to the mean field equations for the Ising antiferromagnet in an external field, here for θ = 0.6 . Clockwise from upper left: (a) h = 0.1 , (b) h = 0.5 , (c) h = 1.1 , (d)
h = 1.4 .

Differentiating $f(m_A, m_B)$ with respect to the variational parameters, we obtain two coupled mean field equations:
\[
\begin{aligned}
\frac{\partial f}{\partial m_A} = 0 &\quad \Longrightarrow \quad m_B = h - \frac{\theta}{2}\, \ln\!\Big( \frac{1+m_A}{1-m_A} \Big)\\
\frac{\partial f}{\partial m_B} = 0 &\quad \Longrightarrow \quad m_A = h - \frac{\theta}{2}\, \ln\!\Big( \frac{1+m_B}{1-m_B} \Big) \ .
\end{aligned}
\]
Recognizing $\tanh^{-1}(x) = \frac{1}{2} \ln\big[ (1+x)/(1-x) \big]$, we may write these equations in an equivalent but perhaps more suggestive form:
\[
m_A = \tanh\!\Big( \frac{h - m_B}{\theta} \Big) \qquad,\qquad m_B = \tanh\!\Big( \frac{h - m_A}{\theta} \Big) \ .
\]
In other words, the A sublattice sites see an internal field $H_{A,\mathrm{int}} = -zJ\, m_B$ from their B neighbors, and the B sublattice sites see an internal field $H_{B,\mathrm{int}} = -zJ\, m_A$ from their A neighbors.
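The coupled equations can also be solved by simple (Gauss-Seidel) iteration; started from an asymmetric initial guess, the iteration lands on a broken-symmetry solution. A sketch for the parameters of panel (a) of Figure [affgraph]:

```python
import math

# Gauss-Seidel iteration of the coupled mean field equations
#   m_A = tanh((h - m_B)/theta),  m_B = tanh((h - m_A)/theta),
# started from an asymmetric guess so as to land on a broken-symmetry
# (antiferromagnet-like) solution; theta = 0.6, h = 0.1 as in panel (a).
theta, h = 0.6, 0.1
mA, mB = 0.9, -0.9
for _ in range(500):
    mA = math.tanh((h - mB) / theta)
    mB = math.tanh((h - mA) / theta)

assert abs(mA - math.tanh((h - mB) / theta)) < 1e-12   # self-consistent
assert mA > 0.5 and mB < -0.5                          # staggered solution
assert abs(mA + mB) > 1e-3                             # field h breaks mA = -mB
```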
We can solve these equations graphically, as in Figure [affgraph]. Note that there is always a paramagnetic solution with $m_A = m_B = m$, where
\[
m = h - \frac{\theta}{2}\, \ln\!\Big( \frac{1+m}{1-m} \Big) \qquad \Longleftrightarrow \qquad m = \tanh\!\Big( \frac{h-m}{\theta} \Big) \ . \qquad (7.10.10)
\]
However, we can see from the figure that there will be three solutions to the mean field equations provided that $\partial m_A / \partial m_B < -1$ at the point of the solution where $m_A = m_B = m$. This gives us two equations with which to eliminate $m_A$ and $m_B$, resulting in the curve
\[
h^*(\theta) = m + \frac{\theta}{2}\, \ln\!\Big( \frac{1+m}{1-m} \Big) \qquad \text{with} \qquad m = \sqrt{1 - \theta} \ . \qquad (7.10.11)
\]
Thus, for $\theta < 1$ and $|h| < h^*(\theta)$ there are three solutions to the mean field equations. It is usually the case that the broken symmetry solutions, which here are those for which $m_A \ne m_B$, are of lower energy than the symmetric solution(s). We show the curve $h^*(\theta)$ in Figure [affpd].
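A sketch evaluating this boundary curve:

```python
import math

# Boundary curve h*(theta) = m + (theta/2) ln((1+m)/(1-m)) with m = sqrt(1-theta),
# below which three mean field solutions coexist.
def h_star(theta):
    m = math.sqrt(1.0 - theta)
    return m + 0.5 * theta * math.log((1.0 + m) / (1.0 - m))

assert abs(h_star(1.0)) < 1e-12          # the boundary closes at theta = 1
assert abs(h_star(1e-10) - 1.0) < 1e-6   # and approaches h = 1 as theta -> 0
assert 1.0 < h_star(0.6) < 1.2           # e.g. h*(0.6) ~ 1.08
```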

[affpd] Mean field phase diagram for the Ising antiferromagnet in an external field. The phase diagram is symmetric under reflection in the h = 0 axis.
We can make additional progress by defining the average and staggered magnetizations $m$ and $m_s$,
\[
m \equiv \frac{1}{2}\, (m_A + m_B) \qquad,\qquad m_s \equiv \frac{1}{2}\, (m_A - m_B) \ .
\]
We expand the free energy in terms of $m_s$:
\[
\begin{aligned}
f(m, m_s) &= \frac{1}{2}\, m^2 - \frac{1}{2}\, m_s^2 - h\, m - \frac{1}{2}\, \theta\, s(m + m_s) - \frac{1}{2}\, \theta\, s(m - m_s)\\
&= \frac{1}{2}\, m^2 - h\, m - \theta\, s(m) - \frac{1}{2}\, \big( 1 + \theta\, s''(m) \big)\, m_s^2 - \frac{1}{24}\, \theta\, s''''(m)\, m_s^4 + \ldots \ .
\end{aligned}
\]

The term quadratic in $m_s$ vanishes when $\theta\, s''(m) = -1$, when $m = \sqrt{1 - \theta}$. It is easy to obtain
\[
\frac{d^3\!s}{dm^3} = -\frac{2\,m}{(1 - m^2)^2} \qquad,\qquad \frac{d^4\!s}{dm^4} = -\frac{2\, (1 + 3m^2)}{(1 - m^2)^3} \ , \qquad (7.10.12)
\]
from which we learn that the coefficient of the quartic term, $-\frac{1}{24}\, \theta\, s''''(m)$, never vanishes. Therefore the transition remains second order down to $\theta = 0$, where it finally becomes first order.

We can confirm the $\theta \to 0$ limit directly. The two competing states are the ferromagnet, with $m_A = m_B = \pm 1$, and the antiferromagnet, with $m_A = -m_B = \pm 1$. The free energies of these states are
\[
f^{FM} = \frac{1}{2} - h \qquad,\qquad f^{AFM} = -\frac{1}{2} \ .
\]
There is a first order transition when $f^{FM} = f^{AFM}$, which yields $h = 1$.

Canted quantum antiferromagnet


Consider the following model for quantum $S = \frac{1}{2}$ spins:
\[
\hat H = \sum_{\langle ij \rangle} \Big[ -J\, \big( \sigma^x_i \sigma^x_j + \sigma^y_i \sigma^y_j \big) + \Delta\, \sigma^z_i \sigma^z_j \Big] + \frac{1}{4}\, K \sum_{\langle ijkl \rangle} \sigma^z_i\, \sigma^z_j\, \sigma^z_k\, \sigma^z_l \ , \qquad (7.10.13)
\]

where $\sigma_i$ is the vector of Pauli matrices on site $i$. The spins live on a square lattice. The second sum is over all square plaquettes. All the constants $J$, $\Delta$, and $K$ are positive.

Let’s take a look at the Hamiltonian for a moment. The $J$ term clearly wants the spins to align ferromagnetically in the $(x,y)$ plane (in internal spin space). The $\Delta$ term prefers antiferromagnetic alignment along the $\hat z$ axis. The $K$ term discourages any kind of moment along $\hat z$, and works against the $\Delta$ term. We’d like our mean field theory to capture the physics behind this competition.

Accordingly, we break up the square lattice into two interpenetrating $\sqrt{2} \times \sqrt{2}$ square sublattices (each rotated by $45^\circ$ with respect to the original), in order to be able to describe an antiferromagnetic state. In addition, we include a parameter $\alpha$ which describes the canting angle that the spins on these sublattices make with respect to the $\hat x$-axis. That is, we write
\[
\begin{aligned}
\varrho_A &= \tfrac{1}{2} + \tfrac{1}{2}\, m\, \big( \sin\alpha\; \sigma^x + \cos\alpha\; \sigma^z \big)\\
\varrho_B &= \tfrac{1}{2} + \tfrac{1}{2}\, m\, \big( \sin\alpha\; \sigma^x - \cos\alpha\; \sigma^z \big) \ .
\end{aligned}
\]
Note that $\mathrm{Tr}\, \varrho_A = \mathrm{Tr}\, \varrho_B = 1$, so these density matrices are normalized. Note also that the mean direction for a spin on the A and B sublattices is given by
\[
m_{A,B} = \mathrm{Tr}\, \big( \varrho_{A,B}\, \boldsymbol\sigma \big) = \pm\, m \cos\alpha\; \hat z + m \sin\alpha\; \hat x \ .
\]
Thus, when $\alpha = 0$, the system is an antiferromagnet with its staggered moment lying along the $\hat z$ axis. When $\alpha = \frac{1}{2}\pi$, the system is a ferromagnet with its moment lying along the $\hat x$ axis.

Finally, the eigenvalues of $\varrho_{A,B}$ are still $\lambda_\pm = \frac{1}{2}\, (1 \pm m)$, hence
\[
\begin{aligned}
s(m) &\equiv -\mathrm{Tr}\, (\varrho_A \ln \varrho_A) = -\mathrm{Tr}\, (\varrho_B \ln \varrho_B)\\
&= -\bigg[ \frac{1+m}{2}\, \ln\!\Big( \frac{1+m}{2} \Big) + \frac{1-m}{2}\, \ln\!\Big( \frac{1-m}{2} \Big) \bigg] \ .
\end{aligned}
\]

Note that we have taken $m_A = m_B = m$, unlike the case of the antiferromagnet in a uniform field. The reason is that there remains in our model a symmetry between the A and B sublattices.

The free energy is now easily calculated:
\[
\begin{aligned}
F &= \mathrm{Tr}\, (\varrho \hat H) + k_B T\, \mathrm{Tr}\, (\varrho \ln \varrho)\\
&= -2N\, \big( J \sin^2\!\alpha + \Delta \cos^2\!\alpha \big)\, m^2 + \frac{1}{4}\, N K\, m^4 \cos^4\!\alpha - N k_B T\, s(m) \ .
\end{aligned}
\]
We can adimensionalize by defining $\delta \equiv \Delta/J$, $\kappa \equiv K/4J$, and $\theta \equiv k_B T/4J$. Then the free energy per site, $f \equiv F/4NJ$, is
\[
f(m, \alpha) = -\frac{1}{2}\, m^2 + \frac{1}{2}\, (1 - \delta)\, m^2 \cos^2\!\alpha + \frac{1}{4}\, \kappa\, m^4 \cos^4\!\alpha - \theta\, s(m) \ . \qquad (7.10.14)
\]

There are two variational parameters: $m$ and $\alpha$. We thus obtain two coupled mean field equations,
\[
\begin{aligned}
\frac{\partial f}{\partial m} = 0 &= -m + (1 - \delta)\, m \cos^2\!\alpha + \kappa\, m^3 \cos^4\!\alpha + \frac{\theta}{2}\, \ln\!\Big( \frac{1+m}{1-m} \Big)\\
\frac{\partial f}{\partial \alpha} = 0 &= \big( 1 - \delta + \kappa\, m^2 \cos^2\!\alpha \big)\, m^2 \sin\alpha \cos\alpha \ .
\end{aligned}
\]

Let’s start with the second of the mean field equations. Assuming $m \ne 0$, it is clear that
\[
\cos^2\!\alpha = \begin{cases} 0 & \text{if } \delta < 1\\ (\delta - 1)/\kappa m^2 & \text{if } 1 \le \delta \le 1 + \kappa m^2\\ 1 & \text{if } \delta \ge 1 + \kappa m^2 \ . \end{cases} \qquad (7.10.15)
\]
Suppose $\delta < 1$. Then we have $\cos\alpha = 0$ and the first mean field equation yields the familiar result
\[
m = \tanh(m/\theta) \ . \qquad (7.10.16)
\]
Along the $\theta$ axis, then, we have the usual ferromagnet-paramagnet transition at $\theta_c = 1$.

[cantpd] Mean field phase diagram for the model of Equation (7.10.13) for the case $\kappa = 1$.
For $1 < \delta < 1 + \kappa m^2$ we have canting with an angle
\[
\alpha = \alpha^*(m) = \cos^{-1}\! \sqrt{ \frac{\delta - 1}{\kappa m^2} } \ .
\]
Substituting this into the first mean field equation, we once again obtain the relation $m = \tanh(m/\theta)$. However, eventually, as $\theta$ is increased, the magnetization will dip below the value $m_0 \equiv \sqrt{(\delta - 1)/\kappa}$. This occurs at a dimensionless temperature
\[
\theta_0 = \frac{m_0}{\tanh^{-1}(m_0)} < 1 \qquad;\qquad m_0 = \sqrt{ \frac{\delta - 1}{\kappa} } \ .
\]
For $\theta > \theta_0$, we have $\delta > 1 + \kappa m^2$, and we must take $\cos^2\!\alpha = 1$. The first mean field equation then becomes
\[
\delta\, m - \kappa\, m^3 = \frac{\theta}{2}\, \ln\!\Big( \frac{1+m}{1-m} \Big) \ , \qquad (7.10.17)
\]
or, equivalently, $m = \tanh\big( (\delta m - \kappa m^3)/\theta \big)$. A simple graphical analysis shows that a nontrivial solution exists provided $\theta < \delta$. Since $\cos\alpha = \pm 1$, this solution describes an antiferromagnet, with $m_A = \pm m\, \hat z$ and $m_B = \mp m\, \hat z$. The resulting mean field phase diagram is then as depicted in Figure [cantpd].

Coupled order parameters


Consider the Landau free energy
\[
f(m, \phi) = \frac{1}{2}\, a_m\, m^2 + \frac{1}{4}\, b_m\, m^4 + \frac{1}{2}\, a_\phi\, \phi^2 + \frac{1}{4}\, b_\phi\, \phi^4 + \frac{1}{2}\, \Lambda\, m^2 \phi^2 \ . \qquad (7.10.18)
\]

We write
\[
a_m \equiv \alpha_m\, \theta_m \qquad,\qquad a_\phi \equiv \alpha_\phi\, \theta_\phi \ , \qquad (7.10.19)
\]
where
\[
\theta_m = \frac{T - T_{c,m}}{T_0} \qquad,\qquad \theta_\phi = \frac{T - T_{c,\phi}}{T_0} \ , \qquad (7.10.20)
\]
where $T_0$ is some temperature scale. We assume without loss of generality that $T_{c,m} > T_{c,\phi}$. We begin by rescaling:

\[
m \equiv \bigg( \frac{\alpha_m}{b_m} \bigg)^{\!1/2} \tilde m \qquad,\qquad \phi \equiv \bigg( \frac{\alpha_\phi}{b_\phi} \bigg)^{\!1/2} \tilde\phi \ . \qquad (7.10.21)
\]
We then have
\[
f = \varepsilon_0\, \bigg\{ r\, \Big( \frac{1}{2}\, \theta_m\, \tilde m^2 + \frac{1}{4}\, \tilde m^4 \Big) + r^{-1}\, \Big( \frac{1}{2}\, \theta_\phi\, \tilde\phi^2 + \frac{1}{4}\, \tilde\phi^4 \Big) + \frac{1}{2}\, \lambda\, \tilde m^2\, \tilde\phi^2 \bigg\} \ , \qquad (7.10.22)
\]
where
\[
\varepsilon_0 = \frac{\alpha_m\, \alpha_\phi}{(b_m\, b_\phi)^{1/2}} \qquad,\qquad r = \frac{\alpha_m}{\alpha_\phi}\, \bigg( \frac{b_\phi}{b_m} \bigg)^{\!1/2} \qquad,\qquad \lambda = \frac{\Lambda}{(b_m\, b_\phi)^{1/2}} \ . \qquad (7.10.23)
\]

It proves convenient to perform one last rescaling, writing
\[
\tilde m \equiv r^{-1/4}\, m \qquad,\qquad \tilde\phi \equiv r^{1/4}\, \varphi \ . \qquad (7.10.24)
\]
Then
\[
f = \varepsilon_0\, \bigg\{ \frac{1}{2}\, q\, \theta_m\, m^2 + \frac{1}{4}\, m^4 + \frac{1}{2}\, q^{-1}\, \theta_\phi\, \varphi^2 + \frac{1}{4}\, \varphi^4 + \frac{1}{2}\, \lambda\, m^2\, \varphi^2 \bigg\} \ , \qquad (7.10.25)
\]
where
\[
q = \sqrt{r} = \bigg( \frac{\alpha_m}{\alpha_\phi} \bigg)^{\!1/2} \bigg( \frac{b_\phi}{b_m} \bigg)^{\!1/4} \ . \qquad (7.10.26)
\]

Note that we may write
\[
f(m, \varphi) = \frac{\varepsilon_0}{4}\, \big( m^2 \;\; \varphi^2 \big) \begin{pmatrix} 1 & \lambda\\ \lambda & 1 \end{pmatrix} \begin{pmatrix} m^2\\ \varphi^2 \end{pmatrix} + \frac{\varepsilon_0}{2}\, \big( m^2 \;\; \varphi^2 \big) \begin{pmatrix} q\, \theta_m\\ q^{-1}\, \theta_\phi \end{pmatrix} \ . \qquad (7.10.27)
\]
The eigenvalues of the above $2 \times 2$ matrix are $1 \pm \lambda$, with corresponding eigenvectors $\binom{1}{\pm 1}$. Since $\varphi^2 > 0$, we are only interested in the first eigenvector $\binom{1}{1}$, corresponding to the eigenvalue $1 + \lambda$. Clearly when $\lambda < -1$ the free energy is unbounded from below, which is unphysical.
We now set
\[
\frac{\partial f}{\partial m} = 0 \qquad,\qquad \frac{\partial f}{\partial \varphi} = 0 \ , \qquad (7.10.28)
\]
and identify four possible phases:


Phase I : $m = 0$, $\varphi = 0$. The free energy is $f_I = 0$.

Phase II : $m \ne 0$ with $\varphi = 0$. The free energy is
\[
f = \frac{\varepsilon_0}{2}\, \Big( q\, \theta_m\, m^2 + \frac{1}{2}\, m^4 \Big) \ , \qquad (7.10.29)
\]
hence we require $\theta_m < 0$ in this phase, in which case
\[
m_{II} = \sqrt{ -q\, \theta_m } \qquad,\qquad f_{II} = -\frac{\varepsilon_0}{4}\, q^2\, \theta_m^2 \ .
\]
Phase III : $m = 0$ with $\varphi \ne 0$. The free energy is
\[
f = \frac{\varepsilon_0}{2}\, \Big( q^{-1}\, \theta_\phi\, \varphi^2 + \frac{1}{2}\, \varphi^4 \Big) \ , \qquad (7.10.30)
\]
hence we require $\theta_\phi < 0$ in this phase, in which case
\[
\varphi_{III} = \sqrt{ -q^{-1}\, \theta_\phi } \qquad,\qquad f_{III} = -\frac{\varepsilon_0}{4}\, q^{-2}\, \theta_\phi^2 \ .
\]

Phase IV : $m \ne 0$ and $\varphi \ne 0$. Varying $f$ yields
\[
\begin{pmatrix} 1 & \lambda\\ \lambda & 1 \end{pmatrix} \begin{pmatrix} m^2\\ \varphi^2 \end{pmatrix} = -\begin{pmatrix} q\, \theta_m\\ q^{-1}\, \theta_\phi \end{pmatrix} \ , \qquad (7.10.31)
\]
with solution
\[
\begin{aligned}
m^2 &= \frac{q\, \theta_m - q^{-1}\, \theta_\phi\, \lambda}{\lambda^2 - 1}\\
\varphi^2 &= \frac{q^{-1}\, \theta_\phi - q\, \theta_m\, \lambda}{\lambda^2 - 1} \ .
\end{aligned}
\]
Since $m^2$ and $\varphi^2$ must each be nonnegative, phase IV exists only over a yet-to-be-determined subset of the entire parameter space. The free energy is
\[
f_{IV} = \frac{\varepsilon_0\, \big( q^2\, \theta_m^2 + q^{-2}\, \theta_\phi^2 - 2\lambda\, \theta_m\, \theta_\phi \big)}{4\, (\lambda^2 - 1)} \ .
\]
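Solving the linear system for $(m^2, \varphi^2)$ and testing positivity is straightforward; a sketch (the parameter values are arbitrary illustrations):

```python
# Solve [[1, lam], [lam, 1]] (m^2, phi^2)^T = -(q*th_m, th_p/q)^T by Cramer's
# rule, keeping the solution only when both squares are positive (phase IV).
def phase_iv(q, th_m, th_p, lam):
    det = lam * lam - 1.0
    m2 = (q * th_m - lam * th_p / q) / det
    p2 = (th_p / q - lam * q * th_m) / det
    return (m2, p2) if m2 > 0.0 and p2 > 0.0 else None

# Illustrative values with theta_phi > 0 > theta_m and q^2|theta_m| > theta_phi:
q, th_m, th_p = 1.2, -0.5, 0.3
lam_hi = -th_p / (q * q * abs(th_m))      # upper edge of the lambda window
assert -1.0 < -0.8 < lam_hi
assert phase_iv(q, th_m, th_p, -0.8) is not None   # inside window: both positive
assert phase_iv(q, th_m, th_p, -0.2) is None       # outside window: phi^2 < 0
```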

We now define $\theta \equiv \theta_m$ and $\tau \equiv \theta_\phi - \theta_m = (T_{c,m} - T_{c,\phi})/T_0$. Note that $\tau > 0$. There are three possible temperature ranges to consider.

$\theta_\phi > \theta_m > 0$ . The only possible phases are I and IV. For phase IV, we must impose the conditions $m^2 > 0$ and $\varphi^2 > 0$. If $\lambda^2 > 1$, then the numerators in the expressions for $m^2$ and $\varphi^2$ above must each be positive:
\[
\lambda < \frac{q^2\, \theta_m}{\theta_\phi} \quad,\quad \lambda < \frac{\theta_\phi}{q^2\, \theta_m} \qquad \Rightarrow \qquad \lambda < \min\!\bigg( \frac{q^2\, \theta_m}{\theta_\phi}\,,\, \frac{\theta_\phi}{q^2\, \theta_m} \bigg) \ . \qquad (7.10.32)
\]
But since either $q^2\, \theta_m/\theta_\phi$ or its inverse must be less than or equal to unity, this requires $\lambda < -1$, which is unphysical.

If on the other hand we assume $\lambda^2 < 1$, the non-negativeness of $m^2$ and $\varphi^2$ requires
\[
\lambda > \frac{q^2\, \theta_m}{\theta_\phi} \quad,\quad \lambda > \frac{\theta_\phi}{q^2\, \theta_m} \qquad \Rightarrow \qquad \lambda > \max\!\bigg( \frac{q^2\, \theta_m}{\theta_\phi}\,,\, \frac{\theta_\phi}{q^2\, \theta_m} \bigg) > 1 \ . \qquad (7.10.33)
\]
Thus, $\lambda > 1$ and we have a contradiction.

Therefore, the only allowed phase for $\theta > 0$ is phase I.
$\theta_\phi > 0 > \theta_m$ . Now the possible phases are I, II, and IV. We can immediately rule out phase I because $f_{II} < f_I$. To compare phases II and IV, we compute
\[
\Delta f = f_{IV} - f_{II} = \frac{\varepsilon_0\, \big( q\, \lambda\, \theta_m - q^{-1}\, \theta_\phi \big)^2}{4\, (\lambda^2 - 1)} \ .
\]
Thus, phase II has the lower energy if $\lambda^2 > 1$. For $\lambda^2 < 1$, phase IV has the lower energy, but the conditions $m^2 > 0$ and $\varphi^2 > 0$ then entail
\[
\frac{q^2\, \theta_m}{\theta_\phi} < \lambda < \frac{\theta_\phi}{q^2\, \theta_m} \qquad \Rightarrow \qquad q^2\, |\theta_m| > \theta_\phi > 0 \ . \qquad (7.10.34)
\]
Thus, $\lambda$ is restricted to the range
\[
\lambda \in \bigg[ -1 \,,\, -\frac{\theta_\phi}{q^2\, |\theta_m|} \bigg] \ . \qquad (7.10.35)
\]
With $\theta_m \equiv \theta < 0$ and $\theta_\phi \equiv \theta + \tau > 0$, the condition $q^2\, |\theta_m| > \theta_\phi$ is found to be
\[
-\tau < \theta < -\frac{\tau}{q^2 + 1} \ . \qquad (7.10.36)
\]
Thus, phase IV exists and has lower energy when
\[
-\tau < \theta < -\frac{\tau}{r + 1} \qquad \text{and} \qquad -1 < \lambda < -\frac{\theta + \tau}{r\, |\theta|} \ , \qquad (7.10.37)
\]
where $r = q^2$.

$0 > \theta_\phi > \theta_m$ . In this regime, any phase is possible, however once again phase I can be ruled out since phases II and III are of lower free energy. The condition that phase II have lower free energy than phase III is
\[
f_{II} - f_{III} = \frac{\varepsilon_0}{4}\, \big( q^{-2}\, \theta_\phi^2 - q^2\, \theta_m^2 \big) < 0 \ ,
\]
i.e. $|\theta_\phi| < r\, |\theta_m|$, which means $r\, |\theta| > |\theta| - \tau$. If $r > 1$ this is true for all $\theta < 0$, while if $r < 1$ phase II is lower in energy only for $|\theta| < \tau/(1-r)$.

[FcoupledLandau] Phase diagram for τ = 0.5 , r = 1.5 (top) and τ = 0.5 , r = 0.25 (bottom). The hatched purple region is unphysical, with a free energy unbounded from below. The blue lines
denote second order transitions. The thick red line separating phases II and III is a first order line.
We next need to test whether phase IV has an even lower energy than the lower of phases II and III. We have
\[
\begin{aligned}
f_{IV} - f_{II} &= \frac{\varepsilon_0\, \big( q\, \lambda\, \theta_m - q^{-1}\, \theta_\phi \big)^2}{4\, (\lambda^2 - 1)}\\
f_{IV} - f_{III} &= \frac{\varepsilon_0\, \big( q\, \theta_m - q^{-1}\, \lambda\, \theta_\phi \big)^2}{4\, (\lambda^2 - 1)} \ .
\end{aligned}
\]
In both cases, phase IV can only be the true thermodynamic phase if $\lambda^2 < 1$. We then require $m^2 > 0$ and $\varphi^2 > 0$, which fixes
\[
\lambda \in \bigg[ -1 \,,\, \min\!\bigg( \frac{q^2\, \theta_m}{\theta_\phi}\,,\, \frac{\theta_\phi}{q^2\, \theta_m} \bigg) \bigg] \ . \qquad (7.10.38)
\]
The upper limit will be the first term inside the rounded brackets if $q^2\, |\theta_m| < |\theta_\phi|$, if $r\, |\theta| < |\theta| - \tau$. This is impossible if $r > 1$, hence the upper limit is given by the second term in the rounded brackets:
\[
r > 1 \ :\quad \lambda \in \bigg[ -1 \,,\, \frac{\theta + \tau}{r\, \theta} \bigg] \qquad \text{(condition for phase IV)} \ . \qquad (7.10.39)
\]

If $r < 1$, then the upper limit will be $q^2\, \theta_m / \theta_\phi = r\theta/(\theta + \tau)$ if $|\theta| > \tau/(1-r)$, and will be $\theta_\phi / (q^2\, \theta_m) = (\theta + \tau)/r\theta$ if $|\theta| < \tau/(1-r)$. Thus,
\[
\begin{aligned}
r < 1 \,,\; -\frac{\tau}{1-r} < \theta < -\tau \ &:\quad \lambda \in \bigg[ -1 \,,\, \frac{\theta + \tau}{r\, \theta} \bigg] \qquad \text{(phase IV)}\\
r < 1 \,,\; \theta < -\frac{\tau}{1-r} \ &:\quad \lambda \in \bigg[ -1 \,,\, \frac{r\, \theta}{\theta + \tau} \bigg] \qquad \text{(phase IV)} \ .
\end{aligned}
\]

Representative phase diagrams for the cases r > 1 and r < 1 are shown in Figure [FcoupledLandau].

1. There is always a solution to $(\partial p/\partial v)_T = 0$ at $v = \infty$.
2. Don’t confuse the molar free energy ($f$) with the number of molecular degrees of freedom ($f$)!
3. Johannes Diderik van der Waals, the eldest of ten children, was the son of a carpenter. As a child he received only a primary school education. He worked for a living until age 25, and was able to
enroll in a three-year industrial evening school for working class youth. Afterward he continued his studies independently, in his spare time, working as a teacher. By the time he obtained his PhD,
he was 36 years old. He received the Nobel Prize for Physics in 1910.↩
4. See www.nobelprize.org/nobel_prizes/physics/laureates/1910/waals-lecture.pdf↩
5. One could equally well identify the second correspondence as n ⟷ m between density (rather than specific volume) and magnetization. One might object that H is more properly analogous to μ . However, since μ = μ(p, T ) it can equally be regarded as analogous to p . Note also that βp = zλ_T⁻ᵈ for the ideal gas, in which case ξ = z(a/λ_T)ᵈ is proportional to p .↩

6. Note the distinction between the number of lattice sites N_S and the number of occupied cells N . According to our definitions, N = ½(M + N_S) .↩

7. In the third of the following exponent equalities, d is the dimension of space and ν is the correlation length exponent.↩
8. A Bravais lattice is one in which any site is equivalent to any other site through an appropriate discrete translation. Examples of Bravais lattices include the linear chain, square, triangular, simple cubic, and face-centered cubic lattices. The honeycomb lattice is not a Bravais lattice, because there are two sets of inequivalent sites – those in the center of a Y and those in the center of an upside-down Y.↩
9. To obtain this result, one writes f = f(θ, m(θ)) and then differentiates twice with respect to θ , using the chain rule. Along the way, any naked (undifferentiated) term proportional to ∂f/∂m may be dropped, since this vanishes at any θ by the mean field equation.↩


10. Pierre Curie was a pioneer in the fields of crystallography, magnetism, and radiation physics. In 1880, Pierre and his older brother Jacques discovered piezoelectricity. He was 21 years old at the
time. It was in 1895 that Pierre made the first systematic studies of the effects of temperature on magnetic materials, and he formulated what is known as Curie’s Law, χ = C /T , where C is a
constant. Curie married Marie Sklodowska in the same year. Their research turned toward radiation, recently discovered by Becquerel and Röntgen. In 1898, Pierre and Marie Curie discovered
radium. They shared the 1903 Nobel Prize in Physics with Becquerel. Marie went on to win the 1911 Nobel Prize in Chemistry and was the first person ever awarded two Nobel Prizes. Their
daughter Irène Joliot-Curie shared the 1935 Prize in Chemistry (with her husband), also for work on radioactivity. Pierre Curie met an untimely and unfortunate end in the Spring of 1906. Walking across the Place Dauphine, he slipped and fell under a heavy horse-drawn wagon carrying military uniforms. His skull was crushed by one of the wagon wheels, killing him instantly. Later on that year, Pierre-Ernest Weiss proposed a modification of Curie’s Law to account for ferromagnetism. This became known as the Curie-Weiss law, χ = C/(T − T_c) .↩

7.10.5 https://phys.libretexts.org/@go/page/18785

11. The self-interaction terms with i = j contribute a constant to Ĥ and may be either included or excluded. However, this property only pertains to the σ_i = ±1 model. For higher spin versions of the Ising model, say where S_i ∈ {−1, 0, +1}, then S_i² is not constant and we should explicitly exclude the self-interaction terms.↩

12. The sum in the discrete Fourier transform is over all ‘direct Bravais lattice vectors’ and the wavevector q may be restricted to the ‘first Brillouin zone’. These terms are familiar from elementary
solid state physics.↩
13. How do we take the logarithm of a matrix? The rule is this: A = ln B if B = exp(A) . The exponential of a matrix may be evaluated via its Taylor expansion.↩
14. The denominator of 2π in the measure is not necessary, and in fact it is even slightly cumbersome. It divides out whenever we take a ratio to compute a thermodynamic average. I introduce this
factor to preserve the relation Tr 1 = 1 . I personally find unnormalized traces to be profoundly unsettling on purely aesthetic grounds.↩
15. Note that the coefficient of the quartic term in ε is negative for θ > 2/3 . At θ = θ_c = 1/2 , the coefficient is positive, but for larger θ one must include higher order terms in the Landau expansion.↩

16. It is always the case that f is bounded from below, on physical grounds. Were b negative, we’d have to consider higher order terms in the Landau expansion.↩
17. We needn’t waste our time considering the m = m₋ solution, since the cubic term prefers positive m .↩

18. There is a sign difference between the particle susceptibility defined in chapter 6 and the spin susceptibility defined here. The origin of the difference is that the single particle potential v as
defined was repulsive for v > 0 , meaning the local density response δn should be negative, while in the current discussion a positive magnetic field H prefers m > 0 .↩
19. To evoke a negative eigenvalue on a d-dimensional cubic lattice, set q_μ = π/a for all μ . The eigenvalue is then −2dK₁ .↩

20. It needn’t be an equally spaced sequence, for example.↩


21. The function Φ(σ) may involve one or more adjustable parameters which could correspond, for example, to an external magnetic field h . We suppress these parameters when we write the free
energy as f (θ) .↩
22. We should really check that the coefficient of the sixth order term is positive, but that is left as an exercise to the eager student.↩

This page titled 7.10: Appendix II- Additional Examples is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

7.S: Summary
References
M. Kardar, Statistical Physics of Particles (Cambridge, 2007) A superb modern text, with many insightful presentations of key
concepts.
M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006) An excellent graduate level
text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.
G. Parisi, Statistical Field Theory (Addison-Wesley, 1988) An advanced text focusing on field theoretic approaches, covering
mean field and Landau-Ginzburg theories before moving on to renormalization group and beyond.
J. P. Sethna, Entropy, Order Parameters, and Complexity (Oxford, 2006) An excellent introductory text with a very modern set
of topics and exercises. Available online at http://www.physics.cornell.edu/sethna/StatMech

Summary
∙ van der Waals system: The van der Waals equation of state may be written p = RT/(v − b) − a/v² , where v is the molar volume. Comparing with the ideal gas law p = RT/v , the vdW equation accounts for (i) an excluded volume effect due to finite molecular size, and (ii) a long-distance attraction between molecules. The energy per mole is ε(T, v) = ½fRT − a/v , where f is the number of independent quadratic terms in the individual molecular Hamiltonian.



At fixed T , p(v) is monotonic and decreasing for T > T_c = 8a/27bR . For T < T_c , the pressure is no longer monotonic, and p′(v) vanishes at two points v_±(T) . For v ∈ [v₋, v₊] , the isothermal compressibility κ_T = −(1/v)(∂v/∂p)_T is negative, indicating an absolute thermodynamic instability. From p(v, T) and ε(v, T) , one can derive the molar free energy

f(T, v) = −RT ln(T^{f/2}(v − b)) − a/v − Ts₀ (7.S.1)

where s₀ is a constant. Analyzing f(T, v) , one finds an even wider range of instability applies, with v_ℓ < v₋ < v₊ < v_g , where the extremal liquid and gas volumes are determined by the coupled equations

p(T, v_ℓ) = p(T, v_g) , ∫_{v_ℓ}^{v_g} dv p(T, v) = (v_g − v_ℓ) p(T, v_ℓ) . (7.S.2)

The Maxwell construction extends f(T, v) by a straight line connecting f(T, v_ℓ) and f(T, v_g) , resulting in the isotherms in Fig. [vdwiso]. This corresponds to a two phase region in which the homogeneous phase is unstable, either to nucleation, which requires surmounting an energy barrier, or to spinodal decomposition, which is a spontaneous process.
∙ Lattice gas model: For interactions consisting of a hard core and a weakly attractive tail, such as the Lennard-Jones potential, one
can imagine discretizing space into unit cells on the scale of the core size a . Each cell i can then accommodate either zero or one
particle. The resulting Hamiltonian is an Ising ferromagnet,
Ĥ = −∑_{i<j} J_ij σ_i σ_j − H ∑_i σ_i , (7.S.3)
7.S.1 https://phys.libretexts.org/@go/page/18733
with σ_i = ±1 , J_ij = −¼ V(R_i − R_j) , and H = ½ k_B T ln(e^{μ/k_B T} λ_T⁻ᵈ aᵈ) . The correspondences between the ferromagnet and the liquid-gas system are then v (or n ) ↔ m , with m = M/N the magnetization per site, and p (or μ ) ↔ H . The isothermal compressibility κ_T is analogous to the isothermal magnetic susceptibility χ_T = (∂m/∂H)_T . At the critical point, κ_T(T_c, p_c) = ∞ ↔ χ_T(T_c, H_c) = ∞ . See Fig. [magPD].


∙ Mean field theory: Consider the Ising model, Ĥ = −J ∑_⟨ij⟩ σ_i σ_j − H ∑_i σ_i . On each site i , write σ_i = m + δσ_i , where m = ⟨σ_i⟩ . Then σ_i σ_j = −m² + m(σ_i + σ_j) + δσ_i δσ_j , and neglecting the term quadratic in the fluctuations, we arrive at the mean field Hamiltonian,

Ĥ_MF = ½NzJ m² − (H + zJm) ∑_i σ_i ,

where z is the lattice coordination number. This corresponds to independent spins in an effective field H_eff = H + zJm . For noninteracting spins in an external field, we have m = tanh(H_eff/k_B T) , hence

m = tanh( (H + zJm)/k_B T ) , (7.S.4)

which is a self-consistent equation for m(T, H) . This equation also follows from extremizing the mean field free energy, given by F = −k_B T ln Tr e^{−Ĥ_MF/k_B T} . It is convenient to adimensionalize by writing f = F/zJN , h = H/zJ , and θ = k_B T/zJ . Then

f(m, θ, h) = ½m² − θ ln cosh( (m + h)/θ ) − θ ln 2
           = f₀ + ½(θ − 1) m² + (1/12) m⁴ − hm + … ,

where the second line is an expansion for small m and h . The dimensionless mean field equation is m = tanh((m + h)/θ) .
When h = 0 , we have m = tanh(m/θ) , and for θ > θ_c , where θ_c = 1 ( T_c = zJ/k_B ), there is only one solution at m = 0 . For θ < θ_c , there are two additional broken symmetry solutions at m = ±m₀ , and one can check that they correspond to minima in the free energy, whereas m = 0 is a local maximum. Just below θ_c , one finds m(θ) = √(3(1 − θ)) ∝ (θ_c − θ)^β , where β = ½ is the mean field order parameter exponent.
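The self-consistent equation has no closed-form solution, but it is easily solved by fixed-point iteration. The sketch below (not from the text; a minimal Python illustration with arbitrary tolerances) verifies the β = ½ behavior m ≈ √(3(1 − θ)) just below θ_c = 1:

```python
import math

def mf_magnetization(theta, h=0.0, tol=1e-12, max_iter=100000):
    """Solve m = tanh((m + h)/theta) by fixed-point iteration,
    starting from m = 1 to select the positive branch."""
    m = 1.0
    for _ in range(max_iter):
        m_new = math.tanh((m + h) / theta)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new

theta = 0.99                            # just below theta_c = 1
m = mf_magnetization(theta)
m_landau = math.sqrt(3 * (1 - theta))   # leading Landau prediction
rel_err = abs(m - m_landau) / m_landau  # corrections are O(1 - theta)
```

The generous iteration cap is deliberate: the fixed-point map slows down dramatically as θ → θ_c (critical slowing down).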


An order parameter is a quantity which vanishes throughout a disordered phase, usually at high temperature, but which
spontaneously breaks a global symmetry to take a finite value in the ordered phase. For the Ising ferromagnet, the order parameter
is m , the local magnetization. The global symmetry of the Ising model in zero external field is the Z₂ symmetry associated with flipping all the spins: σ_i → −σ_i for all i . An external field explicitly breaks this symmetry. For a given system, there may be

several distinct ordered phases and a cascade of symmetry-breaking transitions as temperature is lowered.
Again setting h = 0 , we see that f(θ > θ_c) = f₀ , while f(θ < θ_c) = f₀ − ¾(θ_c − θ)² just below the transition. Thus, there is a jump in the specific heat c = −θ ∂²f/∂θ² at the transition, with Δc = −3/2 . Very close to the transition, we therefore have c(T) ∝ |θ − θ_c|^{−α} , where the mean field value of the exponent is α = 0 .
As we increase |h| from zero, two of the solutions merge and eventually annihilate at h*(θ) , leaving a unique solution for h > h*(θ) , as depicted in Fig. [IPD]. For small m and h , setting ∂f/∂m = 0 , we obtain ⅓m³ + (θ − 1)m − h = 0 . Thus, when θ is just above θ_c = 1 , we have m = h/(θ − 1) , hence the susceptibility is χ = ∂m/∂h ∝ |θ − θ_c|^{−γ} , where γ = 1 is the mean field susceptibility exponent. The same power law behavior is found for θ < θ_c ; one finds m(θ, h) = m₀(θ) + h/2(1 − θ) . Finally, if we fix θ = θ_c , we have m(θ_c, h) ∝ h^{1/δ} with δ = 3 . The quantities α , β , γ , and δ are critical exponents for the Ising transition. Mean field theory becomes exact when the number of neighbors is infinite, which arises in two hypothetical settings: (i) infinite range interactions, or (ii) infinite spatial dimension.
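The critical isotherm exponent δ = 3 can be checked directly from the full equation m = tanh(m + h) at θ = 1. A minimal sketch (illustrative field values; plain bisection rather than any particular root-finder):

```python
import math

def m_at_thetac(h, iters=200):
    """Bisect g(m) = m - tanh(m + h) = 0 on [0, 2], i.e. the theta = 1 isotherm."""
    lo, hi = 0.0, 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(mid + h) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For small h the expansion gives m^3/3 ~ h, i.e. m proportional to h^{1/3}.
h1, h2 = 1e-6, 1e-5
delta_inv = math.log(m_at_thetac(h2) / m_at_thetac(h1)) / math.log(h2 / h1)
```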
A phenomenological model for magnetization dynamics takes ∂m/∂t = −∂f/∂m , so m is dissipatively driven to a local minimum of the free energy. This is a simple dynamical system with control parameters (θ, h) . For h = 0 , the point θ = θ_c corresponds to a supercritical pitchfork bifurcation, and more generally there is an imperfect bifurcation everywhere along the curve h = h*(θ) , defined by the simultaneous vanishing of both ∂f/∂m and ∂²f/∂m² , corresponding to the dashed green curve in Fig. [IPD]. This leads to the phenomenon of hysteresis: a protocol in which the control parameters cross both branches of this curve is irreversible.

Phase diagram for the Ising ferromagnet. In the hatched blue region, the mean field equations have three solutions. Along the boundary dashed green line there is a saddle-node bifurcation, so that there is a unique solution to the MF equations in the white region. The thermodynamic properties are singular, with discontinuous magnetization, along the solid black line, which terminates in the critical point at (θ, h) = (1, 0) .

∙ Variational density matrix: The free energy is given by F = Tr(ϱĤ) + k_B T Tr(ϱ ln ϱ) . Extremizing F with respect to ϱ subject to the normalization condition Tr ϱ = 1 yields the equilibrium Gibbs distribution ϱ = Z⁻¹ e^{−βĤ} . Any distribution other than that of Gibbs will yield a larger value of F . Therefore, we can construct a variational Ansatz for ϱ and minimize F with respect to its variational parameters. For example, in the case of the Ising model Ĥ = −∑_{i<j} J_ij σ_i σ_j − H ∑_i σ_i , then assuming translational invariance J_ij = J(|R_i − R_j|) , we write ϱ_var(σ₁, …, σ_N) = ∏_{i=1}^N ϱ̃(σ_i) , with

ϱ̃(σ) = ( (1 + m)/2 ) δ_{σ,1} + ( (1 − m)/2 ) δ_{σ,−1} . (7.S.5)

Adimensionalizing by writing θ = k_B T/Ĵ(0) and h = H/Ĵ(0) with Ĵ(0) = ∑_j J_ij , one finds the variational free energy is

f(m, θ, h) = −½m² − hm + θ { ((1 + m)/2) ln((1 + m)/2) + ((1 − m)/2) ln((1 − m)/2) }
           = −θ ln 2 − hm + ½(θ − 1) m² + (θ/12) m⁴ + (θ/30) m⁶ + …

Extremizing with respect to m yields the same equation as before: m = tanh((m + h)/θ) . One can prove that this variational density matrix formulation of mean field theory yields identical results to the "neglect of fluctuations" method described above.
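The small-m series in the second line is an ordinary Taylor expansion of the entropy term and can be verified directly; a minimal numerical check (mine, not the book's):

```python
import math

def s_exact(m):
    """The single-site entropy term ((1+m)/2) ln((1+m)/2) + ((1-m)/2) ln((1-m)/2)."""
    return 0.5*(1 + m)*math.log(0.5*(1 + m)) + 0.5*(1 - m)*math.log(0.5*(1 - m))

def s_series(m):
    """Its small-m expansion: -ln 2 + m^2/2 + m^4/12 + m^6/30."""
    return -math.log(2.0) + m**2/2 + m**4/12 + m**6/30

err = abs(s_exact(0.1) - s_series(0.1))   # next term is m^8/56 ~ 2e-10
```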
∙ Landau theory of phase transitions: The basic idea is to write a phenomenological expansion of the free energy in powers of the order parameter(s) of a system, with coefficients depending on quantities such as temperature and field, and keeping terms only up to some low order. One then analyzes how the minima of the resulting finite degree polynomial behave as a function of these coefficients. The simplest case is that of a model with Ising symmetry, where the order parameter is a real scalar quantity m . One writes

f = f₀ + ½am² + ¼bm⁴ − hm , (7.S.6)

with b > 0 for stability. Extremizing with respect to m yields am + bm³ − h = 0 . For a > 0 there is a unique solution to this equation for m(h) , but for a < 0 there are three roots when |h| < h*(a) , with h*(a) = (2/3^{3/2}) b^{−1/2} (−a)^{3/2} . For h = 0 , one has m(a > 0) = 0 and m(a < 0) = ±√(−a/b) . Thus, a_c = 0 is the critical point in zero field.
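Counting the real roots of am + bm³ = h is a standard cubic-discriminant computation; a short sketch (parameter values arbitrary) confirming that three roots exist precisely for |h| < h*(a):

```python
def n_real_roots(a, b, h):
    """Distinct real roots of b m^3 + a m - h = 0, via the discriminant of the
    depressed cubic m^3 + p m + q with p = a/b, q = -h/b."""
    p, q = a / b, -h / b
    disc = -4.0 * p**3 - 27.0 * q**2   # > 0: three real roots; < 0: one
    return 3 if disc > 0 else 1

a, b = -1.0, 1.0                               # a < 0: ordered side
h_star = 2.0 / 3**1.5 * b**-0.5 * (-a)**1.5    # h*(a) from the text
n_in = n_real_roots(a, b, 0.5 * h_star)        # inside |h| < h*
n_out = n_real_roots(a, b, 2.0 * h_star)       # outside
```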

For certain systems, such as the liquid-gas transition, there is no true Ising symmetry between the two homogeneous phases. The order parameter, which can be taken to be proportional to the density relative to that at the critical point, is again a real scalar. With no Z₂ symmetry, we write

f = f₀ + ½am² − ⅓ym³ + ¼bm⁴ , (7.S.7)

with b > 0 and y > 0 . Extremizing yields (a − ym + bm²) m = 0 , which has three roots, one at m = 0 and the other two at

m = m_± = y/2b ± [ (y/2b)² − a/b ]^{1/2} .

The situation is as depicted in Fig. [quartic]. For y² < 4ab only the m = 0 root is real. For 4ab < y² < (9/2)ab , all three roots are real, but the minimum of f remains at m = 0 . For y² > (9/2)ab , all three roots are real, with a global minimum at m = m₊ and a local one at m = m₋ . Thus, along the curve y² = (9/2)ab , there is a discontinuous change in the order parameter, between m = 0 and m = 3a/y , which is the hallmark of a first order phase transition. Note that this occurs for a > 0 , before the coefficient of the quadratic term in f(m) has changed sign. One says in this case that the first order transition preempts the second order one.
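At the first order line y² = (9/2)ab the two minima m = 0 and m = m₊ are exactly degenerate, with m₊ = 3a/y. A quick numerical confirmation (the values of a and b are arbitrary):

```python
def f_landau(m, a, y, b):
    """f - f0 for the asymmetric quartic free energy."""
    return 0.5*a*m**2 - y*m**3/3.0 + 0.25*b*m**4

a, b = 1.0, 1.0
y = (4.5 * a * b) ** 0.5                         # y^2 = 9ab/2
m_plus = y/(2*b) + ((y/(2*b))**2 - a/b) ** 0.5   # larger nonzero root

gap = f_landau(m_plus, a, y, b) - f_landau(0.0, a, y, b)  # degenerate minima
jump = abs(m_plus - 3*a/y)                                # m_+ = 3a/y
```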


∙ Mean field theory of fluctuations: For the Ising model, Ĥ = −∑_{i<j} J_ij σ_i σ_j − ∑_i H_i σ_i , now with local fields H_i , the local magnetization is m_i = ⟨σ_i⟩ = −∂F/∂H_i . The susceptibility, given by χ_ij = ∂m_i/∂H_j , is an example of a thermodynamic response function. In equilibrium, it is related to the correlation function,

C_ij ≡ ⟨σ_i σ_j⟩ − ⟨σ_i⟩⟨σ_j⟩ , (7.S.8)

with C_ij = k_B T χ_ij . Within mean field theory, this relation no longer applies, and it is the response functions which are more accurately represented: the usual MF description treats each site as independent, hence C^MF_ij = 0 (!) To compute χ^MF_ij , take a variational density matrix which is a product of single-site ones, as above, where the local magnetization is m_i . Extremizing the resulting free energy with respect to each m_i yields a set of coupled nonlinear equations,

m_i = tanh( (∑_j J_ij m_j + H_i)/k_B T ) . (7.S.9)

Expanding for small fields and magnetizations, one obtains ∑_j (k_B T δ_ij − J_ij) m_j = H_i , hence χ_ij = ∂m_i/∂H_j = (k_B T · I − J)⁻¹_ij .
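The matrix relation can be verified on a small example. The sketch below (my own construction: an 8-site periodic ring with nearest-neighbor coupling, in units where k_B = 1) inverts k_BT·I − J directly and compares a diagonal element with the plane-wave (Fourier) diagonalization:

```python
import math

N, J, T = 8, 1.0, 4.0   # ring size, coupling, temperature (k_B = 1); T > J_hat(0) = 2J

# A_{ij} = T delta_{ij} - J_{ij} with J_{ij} = J for nearest neighbors on the ring
A = [[(T if i == j else 0.0) - (J if (i - j) % N in (1, N - 1) else 0.0)
      for j in range(N)] for i in range(N)]

def inverse(M):
    """Gauss-Jordan inverse of a small dense matrix (partial pivoting)."""
    n = len(M)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c:
                aug[r] = [x - aug[r][c] * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

chi = inverse(A)

# In the plane-wave basis: chi_{00} = (1/N) sum_q 1/(T - 2 J cos q)
chi00_fourier = sum(1.0/(T - 2*J*math.cos(2*math.pi*k/N)) for k in range(N)) / N
err = abs(chi[0][0] - chi00_fourier)
```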
For translationally invariant systems, the eigenvectors of the matrix J_ij are plane waves ψ_{q,i} = e^{iq·R_i} , and one has

m̂(q) = Ĥ(q)/(k_B T − Ĵ(q)) ⇒ χ̂(q) = ∂m̂(q)/∂Ĥ(q) = 1/(k_B T − Ĵ(q)) , (7.S.10)

where Ĵ(q) = ∑_R J(R) e^{−iq·R} . The mean field value of k_B T_c is then Ĵ(Q) , where q = Q is the ordering wavevector which maximizes Ĵ(q) . For a ferromagnet, which is dominated by positive values of J_ij , one has Q = 0 , and expanding about this point one may write Ĵ(q) = k_B T_c − C q² + … , in which case χ(q) ∝ (ξ⁻² + q²)⁻¹ at long wavelengths, which is of the Ornstein-Zernike (OZ) form.
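For a concrete Ĵ(q), take a nearest-neighbor chain, where Ĵ(q) = 2J cos(qa), so k_BT_c = 2J and C = Ja². A small sketch (one dimension, k_B = 1, parameter values mine) checking that χ(q) approaches the OZ form at small q:

```python
import math

J, a = 1.0, 1.0
Tc = 2.0 * J                          # J_hat(0) = 2J in units k_B = 1
T = 2.2                               # slightly above Tc
xi = (J * a**2 / (T - Tc)) ** 0.5     # correlation length from the q^2 expansion

q = 0.05
chi_exact = 1.0 / (T - 2.0*J*math.cos(q*a))
chi_oz = (1.0/(J*a**2)) / (xi**-2 + q**2)   # Ornstein-Zernike form
rel_err = abs(chi_exact - chi_oz) / chi_exact
```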
∙ Global symmetries: A global symmetry is an operation carried out equally at every point in space (continuous systems) or in every unit cell of the lattice (discrete systems) such that the Hamiltonian is left invariant. The symmetry operations comprise a group G . In the absence of a symmetry-breaking external field, Ising systems have symmetry group Z₂ . The p-state clock model has symmetry group Z_p . The q-state Potts model has symmetry group S_q (the permutation group on q elements). In each of these cases, the group G is discrete. Examples of models with continuous symmetries include the XY model ( G = O(2) ), the Heisenberg model ( G = O(3) or O(n) ), and the Standard Model of particle physics ( G = SU(3) × SU(2) × U(1) ). Depending on whether G is discrete or continuous, and on the dimension of space, there may be no ordered phase possible. The lower critical dimension d_ℓ of a model is the dimension at or below which there is no spontaneous symmetry breaking at any finite temperature. For systems with discrete global symmetries, d_ℓ = 1 . For systems with continuous global symmetries, d_ℓ = 2 . The upper critical dimension d_u is the dimension above which mean field exponents are exact. This depends on the structure of the model itself, and not all models have a finite upper critical dimension.



∙ Random systems: A system with quenched randomness orders in a different way than a pure one. Typically the randomness may be modeled as a weak symmetry breaking field that is spatially varying, but averages to zero on large scales. Imry and Ma (1975) reasoned that such a system could try to lower its energy by forming domains in which the order parameter takes advantage of local fluctuations in the random field. If the size of these domains is L_d , then the rms fluctuations of the random field integrated over a single domain are proportional to L_d^{d/2} , where d is the dimension of space. By aligning the order parameter in each domain with the direction of the average field therein, one lowers the energy by E_bulk ≈ −H_rms (L_d/a)^{d/2} per domain, where a is a microscopic length. The surface energy of a single domain is E_surf ≈ J (L_d/a)^{d−σ} , where σ = 1 if the global symmetry is discrete and σ = 2 if it is continuous. This follows from a simple calculation of the associated domain wall energy. Dividing by the number of atoms (or unit cells) in a domain (L_d/a)^d , one obtains the energy density,

f ≈ J (a/L_d)^σ − H_rms (a/L_d)^{d/2} . (7.S.11)

For d < 2σ the surface term (∝ J) dominates for small L_d and the bulk term for large L_d . The energy has a minimum at L_d ≈ a (2σJ/dH_rms)^{2/(2σ−d)} . Thus, for d < 2σ the ordered state is always unstable to domain formation in the presence of a random field. For d > 2σ , the relative dominance of the two terms is reversed, and the minimum becomes a maximum. There are then two possibilities, depending on the relative size of J and H_rms . The smallest allowed value for L_d is the lattice scale a , in which case f(L_d = a) ≈ J − H_rms . Comparing with f(L_d = ∞) = 0 , we see that if the random field is weak, so J > H_rms , the minimum energy state occurs for L_d = ∞ , and the system has an ordered ground state. We then expect a finite critical temperature T_c > 0 for a transition to a high T disordered state. If on the other hand the random field is strong and J < H_rms , then the energy is minimized for L_d = a , meaning the ground state of the system is disordered down to the scale of the lattice spacing. In this case there is no longer any finite temperature phase transition, because there is no ordered phase.
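The Imry-Ma minimum can be checked numerically. In this sketch (d = 1, σ = 1, and all parameter values are my own choices) a crude grid search over domain sizes reproduces L_d ≈ a(2σJ/dH_rms)^{2/(2σ−d)} and confirms that domain formation lowers the energy:

```python
J, H_rms, a = 1.0, 0.1, 1.0
d, sigma = 1, 1                     # d < 2 sigma: domains always win

def f_density(L):
    """Domain energy density J (a/L)^sigma - H_rms (a/L)^(d/2)."""
    return J*(a/L)**sigma - H_rms*(a/L)**(d/2)

L_pred = a * (2.0*sigma*J/(d*H_rms)) ** (2.0/(2*sigma - d))   # = 400 a here

# crude minimisation over a logarithmic grid of domain sizes
grid = [1.05**k for k in range(1, 400)]
L_min = min(grid, key=f_density)

rel_err = abs(L_min - L_pred) / L_pred
ordered_unstable = f_density(L_pred) < 0.0   # forming domains lowers the energy
```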
∙ Ginzburg-Landau theory: Allow the order parameter to vary in space. The free energy is then a functional of m(x) :

F[m(x), h(x)] = ∫dᵈx { f₀ + ½am² + ¼bm⁴ − hm + ½κ(∇m)² + … } . (7.S.12)

Extremize F by setting the functional derivative δF/δm(x) to zero, resulting in

am + bm³ − h − κ∇²m = 0 . (7.S.13)

For a > 0 and small h (take b, κ > 0 ) then m is small, and one has (a − κ∇²)m = h , hence m̂(q) = ĥ(q)/(a + κq²) , which is of the OZ form. If a < 0 , write m(x) = m₀ + δm(x) , and for small |a| find m₀² = −a/b and δm̂(q) = ĥ(q)/(−2a + κq²) . Deeper in the ordered ( a < 0 ) phase, and for h = 0 , one can envisage a situation where m(x) interpolates between the two degenerate values ±m₀ . Assuming the variation occurs only along one direction, one can solve am + bm³ − κ d²m/dx² = 0 to obtain m(x) = m₀ tanh(x/√2 ξ) , where the coherence length is ξ = (κ/|a|)^{1/2} .
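One can verify the domain wall solution directly; the sketch below (arbitrary a, b, κ) evaluates the residual of the Euler-Lagrange equation on the tanh profile, using a finite-difference second derivative:

```python
import math

a, b, kappa = -1.0, 2.0, 0.5        # a < 0: ordered phase (arbitrary values)
m0 = math.sqrt(-a/b)
xi = math.sqrt(kappa/abs(a))

def m(x):
    """Domain wall profile m0 tanh(x / (sqrt(2) xi))."""
    return m0 * math.tanh(x/(math.sqrt(2.0)*xi))

def residual(x, dx=1e-4):
    """a m + b m^3 - kappa m'' with a central-difference second derivative."""
    m2 = (m(x + dx) - 2.0*m(x) + m(x - dx)) / dx**2
    return a*m(x) + b*m(x)**3 - kappa*m2

worst = max(abs(residual(0.1*k)) for k in range(-30, 31))
```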

∙ Ginzburg criterion: The actual Helmholtz free energy, which we will here call A(T, H, V, N) , is obtained by performing a functional integral over the order parameter field. The partition function is Z = e^{−βA} = ∫Dm e^{−βF[m(x)]} . Near T_c , we are licensed to keep only up to quadratic terms in m and its gradients in F[m] , resulting in

A = ½ k_B T ∑_q ln( (a + κq²)/(π k_B T) ) . (7.S.14)

Let a(t) = αt with t ≡ (T − T_c)/T_c , and let Λ⁻¹ be the microscopic (lattice) cutoff. The specific heat is then (for t > 0 ):

c = −(Λ⁻ᵈ/V) T ∂²A/∂T² = (α²/2κ²) Λ⁻ᵈ ∫^Λ (dᵈq/(2π)ᵈ) 1/(ξ⁻² + q²)² ∼ { const. if d > 4 ; −ln t if d = 4 ; t^{d/2 − 2} if d < 4 , (7.S.15)

with ξ = (κ/α|t|)^{1/2} ∝ |t|^{−1/2} .
The upper critical dimension is d = 4 . For d > 4 , mean field theory is qualitatively accurate, with finite corrections. In dimensions d ≤ 4 , the mean field result is overwhelmed by fluctuation contributions as t → 0⁺ ( as T → T_c⁺ ). We see that MFT is sensible provided the fluctuation contributions are small, i.e. provided

R⁻⁴ aᵈ ξ^{4−d} ≪ 1 , (7.S.16)

with R = (κ/α)^{1/2} , which entails t ≫ t_G , where

t_G = (a/R)^{2d/(4−d)}

is the Ginzburg reduced temperature. The criterion for the sufficiency of mean field theory, namely t ≫ t_G , is known as the Ginzburg criterion. The region |t| < t_G is known as the critical region. In a lattice ferromagnet, R ∼ a is on the scale of the lattice spacing itself, hence t_G ∼ 1 and the critical regime is very large. Mean field theory then fails quickly as T → T_c . In a (conventional) three-dimensional superconductor, R is on the order of the Cooper pair size, and R/a ∼ 10² – 10³ , hence t_G = (a/R)⁶ ∼ 10⁻¹⁸ – 10⁻¹² is negligibly narrow. The mean field theory of the superconducting transition – BCS theory – is then valid essentially all the way to T = T_c .
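The closing estimate is simple arithmetic, but worth making explicit (the ratios R/a are the order-of-magnitude values quoted above):

```python
d = 3
exponent = 2*d/(4 - d)            # 2d/(4-d) = 6 in three dimensions
tG_ferromagnet = 1.0**exponent    # R ~ a: critical region of order unity
tG_sc = [(1.0/ratio)**exponent for ratio in (1e2, 1e3)]  # R/a ~ 10^2 - 10^3
```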

This page titled 7.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

8: Nonequilibrium Phenomena
8.1: Equilibrium, Nonequilibrium and Local Equilibrium
8.2: Boltzmann Transport Theory
8.3: Weakly Inhomogeneous Gas
8.4: Relaxation Time Approximation
8.5: Diffusion and the Lorentz model
8.6: Linearized Boltzmann Equation
8.7: The Equations of Hydrodynamics
8.8: Nonequilibrium Quantum Transport
8.9: Stochastic Processes
8.10: Appendix I- Boltzmann Equation and Collisional Invariants
8.11: Appendix II- Distributions and Functionals
8.12: Appendix III- General Linear Autonomous Inhomogeneous ODEs
8.13: Appendix IV- Correlations in the Langevin formalism
8.14: Appendix V- Kramers-Krönig Relations
8.S: Summary

Thumbnail: Discretization of a continuous function.

This page titled 8: Nonequilibrium Phenomena is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

8.1: Equilibrium, Nonequilibrium and Local Equilibrium
Classical equilibrium statistical mechanics is described by the full N-body distribution,

f⁰(x₁, …, x_N ; p₁, …, p_N) = { Z_N⁻¹ · (1/N!) e^{−βĤ_N(p,x)}   OCE
                              { Ξ⁻¹ · (1/N!) e^{βμN} e^{−βĤ_N(p,x)}   GCE . (8.1.1)

We assume a Hamiltonian of the form


Ĥ_N = ∑_{i=1}^N p_i²/2m + ∑_{i=1}^N v(x_i) + ∑_{i<j} u(x_i − x_j) , (8.1.2)

typically with v = 0 , only two-body interactions. The quantity

f⁰(x₁, …, x_N ; p₁, …, p_N) (dᵈx₁ dᵈp₁/hᵈ) ⋯ (dᵈx_N dᵈp_N/hᵈ) (8.1.3)

is the probability, under equilibrium conditions, of finding N particles in the system, with particle #1 lying within d³x₁ of x₁ and having momentum within d³p₁ of p₁ . The temperature T and chemical potential μ are constants, independent of position. Note that f⁰({x_i}, {p_i}) is dimensionless.

Nonequilibrium statistical mechanics seeks to describe thermodynamic systems which are out of equilibrium, meaning that the
distribution function is not given by the Boltzmann distribution above. For a general nonequilibrium setting, it is hopeless to make
progress – we’d have to integrate the equations of motion for all the constituent particles. However, typically we are concerned
with situations where external forces or constraints are imposed over some macroscopic scale. Examples would include the
imposition of a voltage drop across a metal, or a temperature differential across any thermodynamic sample. In such cases,
scattering at microscopic length and time scales described by the mean free path ℓ and the collision time τ work to establish local
equilibrium throughout the system. A local equilibrium is a state described by a space and time varying temperature T (r, t) and
chemical potential μ(r, t) . As we will see, the Boltzmann distribution with T = T (r, t) and μ = μ(r, t) will not be a solution to
the evolution equation governing the distribution function. Rather, the distribution for systems slightly out of equilibrium will be of
the form f = f + δf , where f describes a state of local equilibrium.
0 0

We will mainly be interested in the one-body distribution

f(r, p; t) = ∑_{i=1}^N ⟨ δ(x_i(t) − r) δ(p_i(t) − p) ⟩
           = N ∫ ∏_{i=2}^N dᵈx_i dᵈp_i f(r, x₂, …, x_N ; p, p₂, …, p_N ; t) .

In this chapter, we will drop the 1/hᵈ normalization for phase space integration. Thus, f(r, p, t) has dimensions of h⁻ᵈ , and f(r, p, t) d³r d³p is the average number of particles found within d³r of r and d³p of p at time t .

In the GCE, we sum the RHS above over N . Assuming v = 0 so that there is no one-body potential to break translational symmetry, the equilibrium distribution is time-independent and space-independent:

f⁰(r, p) = n (2πm k_B T)^{−3/2} e^{−p²/2m k_B T} , (8.1.4)

where n = N/V or n = n(T, μ) is the particle density in the OCE or GCE. From the one-body distribution we can compute things like the particle current, j , and the energy current, j_ε :

j(r, t) = ∫dᵈp (p/m) f(r, p; t)
j_ε(r, t) = ∫dᵈp (p/m) ε(p) f(r, p; t) ,

where ε(p) = p²/2m . Clearly these currents both vanish in equilibrium, when f = f⁰ , since f⁰(r, p) depends only on p² and not on the direction of p . In a steady state nonequilibrium situation, the above quantities are time-independent.
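The Maxwell-Boltzmann form can be checked by quadrature; since the Gaussian factorizes, it suffices to integrate one Cartesian component. A minimal sketch (illustrative units m = k_BT = 1) verifying the normalization and the vanishing of the particle current:

```python
import math

n, mass, kT = 1.0, 1.0, 1.0     # illustrative units

def g(p):
    """One Cartesian factor of f0/n; the 3d Gaussian factorizes."""
    return (2.0*math.pi*mass*kT)**-0.5 * math.exp(-p*p/(2.0*mass*kT))

dp, P = 0.01, 10.0
grid = [-P + i*dp for i in range(int(round(2*P/dp)) + 1)]   # symmetric grid
norm_1d = sum(g(p) for p in grid) * dp            # should equal 1
jx = n * sum((p/mass)*g(p) for p in grid) * dp    # odd integrand: vanishes
```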

8.1.1 https://phys.libretexts.org/@go/page/18590
Thermodynamics says that

dq = T ds = dε − μ dn , (8.1.5)

where s , ε , and n are entropy density, energy density, and particle density, respectively, and dq is the differential heat density. This relation may be cast as one among the corresponding current densities:

j_q = T j_s = j_ε − μ j . (8.1.6)

Thus, in a system with no particle flow, j = 0 and the heat current j_q is the same as the energy current j_ε .

When the individual particles are not point particles, they possess angular momentum as well as linear momentum. Following Lifshitz and Pitaevskii, we abbreviate Γ = (p, L) for these two variables for the case of diatomic molecules, and Γ = (p, L, n̂ · L) in the case of spherical top molecules, where n̂ is the symmetry axis of the top. We then have, in d = 3 dimensions,

dΓ = { d³p   (point particles)
     { d³p L dL dΩ_L   (diatomic molecules) (8.1.7)
     { d³p L² dL dΩ_L d cos ϑ   (symmetric tops) ,

where ϑ = cos⁻¹(n̂ · L̂) . We will call the set Γ the ‘kinematic variables’. The instantaneous number density at r is then

n(r, t) = ∫dΓ f(r, Γ; t) . (8.1.8)

One might ask why we do not also keep track of the angular orientation of the individual molecules. There are two reasons. First, the rotations of the molecules are generally extremely rapid, so we are justified in averaging over these motions. Second, the orientation of, say, a rotor does not enter into its energy. While the same can be said of the spatial position in the absence of external fields, (i) in the presence of external fields one must keep track of the position coordinate r since there is physical transport of particles from one region of space to another, and (ii) the collision process, which as we shall see enters the dynamics of the distribution function, takes place in real space.

This page titled 8.1: Equilibrium, Nonequilibrium and Local Equilibrium is shared under a CC BY-NC-SA license and was authored, remixed,
and/or curated by Daniel Arovas.

8.2: Boltzmann Transport Theory
Derivation of the Boltzmann equation
For simplicity of presentation, we assume point particles. Recall that

f(r, p, t) d³r d³p ≡ { # of particles with positions within d³r of r and momenta within d³p of p at time t } . (8.2.1)

We now ask how the distribution function f(r, p, t) evolves in time. It is clear that in the absence of collisions, the distribution function must satisfy the continuity equation,

∂f/∂t + ∇·(uf) = 0 .   (8.2.2)

This is just the condition of number conservation for particles. Take care to note that ∇ and u are six-dimensional phase space vectors:

u = ( ẋ , ẏ , ż , ṗ_x , ṗ_y , ṗ_z )
∇ = ( ∂/∂x , ∂/∂y , ∂/∂z , ∂/∂p_x , ∂/∂p_y , ∂/∂p_z ) .

The continuity equation describes a distribution in which each constituent particle evolves according to a prescribed dynamics, which for a mechanical system is specified by

dr/dt = ∂H/∂p = v(p)  ,  dp/dt = −∂H/∂r = F_ext ,   (8.2.3)

where F_ext is an external applied force. Here,

H(p, r) = ε(p) + U_ext(r) .   (8.2.4)

For example, if the particles are under the influence of gravity, then U_ext(r) = mg·r and F_ext = −∇U_ext = −mg.
Note that as a consequence of the dynamics, we have ∇·u = 0, i.e. phase space flow is incompressible, provided that ε(p) is a function of p alone, and not of r. Thus, in the absence of collisions, we have

∂f/∂t + u·∇f = 0 .   (8.2.5)

The differential operator D_t ≡ ∂_t + u·∇ is sometimes called the ‘convective derivative’, because D_t f is the time derivative of f in a comoving frame of reference.
Next we must consider the effect of collisions, which are not accounted for by the semiclassical dynamics. In a collision process, a particle with momentum p and one with momentum p̃ can instantaneously convert into a pair with momenta p′ and p̃′, provided total momentum is conserved: p + p̃ = p′ + p̃′. This means that D_t f ≠ 0. Rather, we should write

∂f/∂t + ṙ·∂f/∂r + ṗ·∂f/∂p = (∂f/∂t)_coll   (8.2.6)

where the right side is known as the collision integral. The collision integral is in general a function of r, p, and t and a functional of the distribution f.
After a trivial rearrangement of terms, we can write the Boltzmann equation as

∂f/∂t = (∂f/∂t)_str + (∂f/∂t)_coll ,   (8.2.7)

where

(∂f/∂t)_str ≡ −ṙ·∂f/∂r − ṗ·∂f/∂p   (8.2.8)

is known as the streaming term. Thus, there are two contributions to ∂f /∂t : streaming and collisions.

Collisionless Boltzmann equation


In the absence of collisions, the Boltzmann equation is given by

∂f/∂t + ∂ε/∂p · ∂f/∂r − ∇U_ext · ∂f/∂p = 0 .   (8.2.9)

In order to gain some intuition about how the streaming term affects the evolution of the distribution f(r, p, t), consider a case where F_ext = 0. We then have

∂f/∂t + (p/m)·∂f/∂r = 0 .   (8.2.10)

Clearly, then, any function of the form

f(r, p, t) = φ(r − v(p) t , p)   (8.2.11)

will be a solution to the collisionless Boltzmann equation, where v(p) = ∂ε/∂p. One possible solution would be the Boltzmann distribution,

f(r, p, t) = e^{μ/k_BT} e^{−p²/2mk_BT} ,   (8.2.12)

which is time-independent1. Here we have assumed a ballistic dispersion, ε(p) = p²/2m.
For a slightly less trivial example, let the initial distribution be φ(r, p) = A e^{−r²/2σ²} e^{−p²/2κ²}, so that

f(r, p, t) = A e^{−(r − pt/m)²/2σ²} e^{−p²/2κ²} .   (8.2.13)
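As a numerical sanity check (not part of the original derivation), the following sketch verifies by central finite differences that a Gaussian of this streaming form satisfies the one-dimensional collisionless Boltzmann equation ∂f/∂t + (p/m) ∂f/∂x = 0. The parameter values and the sample point are arbitrary choices.

```python
import numpy as np

# f(x, p, t) = exp(-(x - p t/m)^2 / 2 sigma^2) * exp(-p^2 / 2 kappa^2)
# should satisfy df/dt + (p/m) df/dx = 0 exactly.
m, sigma, kappa = 1.0, 1.0, 1.0   # arbitrary units

def f(x, p, t):
    return np.exp(-(x - p * t / m)**2 / (2 * sigma**2)) * np.exp(-p**2 / (2 * kappa**2))

x0, p0, t0, h = 0.3, 0.7, 0.5, 1e-5
df_dt = (f(x0, p0, t0 + h) - f(x0, p0, t0 - h)) / (2 * h)   # central differences
df_dx = (f(x0 + h, p0, t0) - f(x0 - h, p0, t0)) / (2 * h)
residual = df_dt + (p0 / m) * df_dx
print(residual)   # vanishes to finite-difference accuracy
```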

Consider the one-dimensional version, and rescale position, momentum, and time so that

f(x, p, t) = A e^{−(x̄ − p̄ t̄)²/2} e^{−p̄²/2} .   (8.2.14)

Consider the level sets of f, where f(x, p, t) = A e^{−α²/2}. The equation for these sets is

x̄ = p̄ t̄ ± √(α² − p̄²) .   (8.2.15)

For fixed t̄, these level sets describe the loci in phase space of equal probability densities, with the probability density decreasing exponentially in the parameter α. For t̄ = 0, the initial distribution describes a Gaussian cloud of particles with a Gaussian momentum distribution. As t̄ increases, the distribution widens in x̄ but not in p̄ – each particle moves with a constant momentum, so the set of momentum values never changes. However, the level sets in the (x̄, p̄) plane become elliptical, with a semimajor axis oriented at an angle θ = cot⁻¹(t̄) with respect to the x̄ axis. For t̄ > 0, the particles at the outer edges of the cloud are more likely to be moving away from the center. See the sketches in Figure [Fstreaming].
Suppose we add in a constant external force F_ext. Then it is easy to show (and left as an exercise to the reader to prove) that any function of the form

f(r, p, t) = A φ( r − pt/m + F_ext t²/2m , p − F_ext t )   (8.2.16)

satisfies the collisionless Boltzmann equation (ballistic dispersion assumed).

[Fstreaming] Level sets for a sample f(x̄, p̄, t̄) = A e^{−(x̄ − p̄ t̄)²/2} e^{−p̄²/2}, for values f = A e^{−α²/2} with α in equally spaced intervals from α = 0.2 (red) to α = 1.2 (blue). The time variable t̄ is taken
to be t̄ = 0.0 (upper left), 0.2 (upper right), 0.8 (lower right), and 1.3 (lower left).

Collisional invariants
Consider a function A(r, p) of position and momentum. Its average value at time t is

A(t) = ∫d³r d³p A(r, p) f(r, p, t) .   (8.2.17)

Taking the time derivative,

dA/dt = ∫d³r d³p A(r, p) ∂f/∂t
      = ∫d³r d³p A(r, p) { −∂/∂r·(ṙ f) − ∂/∂p·(ṗ f) + (∂f/∂t)_coll }
      = ∫d³r d³p { ( ∂A/∂r · dr/dt + ∂A/∂p · dp/dt ) f + A(r, p) (∂f/∂t)_coll } .

Hence, if A is preserved by the dynamics between collisions, then2

dA/dt = ∂A/∂r · dr/dt + ∂A/∂p · dp/dt = 0 .   (8.2.18)

We therefore have that the rate of change of A is determined wholly by the collision integral

dA/dt = ∫d³r d³p A(r, p) (∂f/∂t)_coll .   (8.2.19)

Quantities which are then conserved in the collisions satisfy Ȧ = 0. Such quantities are called collisional invariants. Examples of collisional invariants include the particle number (A = 1), the components of the total momentum (A = p_μ) (in the absence of broken translational invariance, due to the presence of walls), and the total energy (A = ε(p)).

Scattering processes
What sort of processes contribute to the collision integral? There are two broad classes to consider. The first involves potential scattering, where a particle in state |Γ⟩ scatters, in the presence of an external potential, to a state |Γ′⟩. Recall that Γ is an abbreviation for the set of kinematic variables, Γ = (p, L) in the case of a diatomic molecule. For point particles, Γ = (p_x, p_y, p_z) and dΓ = d³p.

We now define the function w(Γ′|Γ) such that

w(Γ′|Γ) f(r, Γ; t) dΓ dΓ′ = rate at which a particle within dΓ of (r, Γ) scatters to within dΓ′ of (r, Γ′) at time t.   (8.2.20)

The units of w dΓ are therefore 1/T. The differential scattering cross section for particle scattering is then

dσ = w(Γ′|Γ)/(n |v|) dΓ′ ,   (8.2.21)

where v = p/m is the particle’s velocity and n the density.


The second class is that of two-particle scattering processes, |ΓΓ 1
⟩ → |Γ Γ ⟩
′ ′
1
. We define the scattering function w(Γ Γ ′ ′
1
| ΓΓ )
1
by

⎧ rate at which two particles within dΓ of (r, Γ)



′ ′ ′ ′
w(Γ Γ | ΓΓ ) f (r, Γ; r, Γ ; t) dΓ dΓ dΓ dΓ = ⎨ and within dΓ of (r, Γ ) scatter into states within (8.2.22)
1 1 2 1 1 1 1 1

⎪ ′ ′ ′ ′
dΓ of (r, Γ ) and dΓ of (r, Γ ) at time t\,,
1 1

where

f₂(r, p; r′, p′; t) = ⟨ Σ_{i,j} δ(x_i(t) − r) δ(p_i(t) − p) δ(x_j(t) − r′) δ(p_j(t) − p′) ⟩   (8.2.23)

is the nonequilibrium two-particle distribution for point particles. The differential scattering cross section is

dσ = w(Γ′Γ′₁|ΓΓ₁)/|v − v₁| dΓ′ dΓ′₁ .   (8.2.24)

We assume, in both cases, that any scattering occurs locally, i.e. that the particles attain their asymptotic kinematic states on distance scales small compared to the mean interparticle separation. In this case we can treat each scattering process independently. This assumption is particular to rarefied systems, i.e. gases, and is not appropriate for dense liquids. The two types of scattering processes are depicted in Figure [FCIscatt].

[FCIscatt] Left: single particle scattering process |Γ⟩ → |Γ′⟩. Right: two-particle scattering process |ΓΓ₁⟩ → |Γ′Γ′₁⟩.
In computing the collision integral for the state |r, Γ⟩, we must take care to sum over contributions from transitions out of this state, |Γ⟩ → |Γ′⟩, which reduce f(r, Γ), and transitions into this state, |Γ′⟩ → |Γ⟩, which increase f(r, Γ). Thus, for one-body scattering, we have

(D/Dt) f(r, Γ; t) = (∂f/∂t)_coll = ∫dΓ′ { w(Γ|Γ′) f(r, Γ′; t) − w(Γ′|Γ) f(r, Γ; t) } .   (8.2.25)

For two-body scattering, we have

(D/Dt) f(r, Γ; t) = (∂f/∂t)_coll
= ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ { w(ΓΓ₁|Γ′Γ′₁) f₂(r, Γ′; r, Γ′₁; t) − w(Γ′Γ′₁|ΓΓ₁) f₂(r, Γ; r, Γ₁; t) } .

Unlike the one-body scattering case, the kinetic equation for two-body scattering does not close, since the LHS involves the one-body distribution f ≡ f₁ and the RHS involves the two-body distribution f₂. To close the equations, we make the approximation

f₂(r, Γ; r, Γ̃; t) ≈ f(r, Γ; t) f(r, Γ̃; t) .   (8.2.26)

We then have

(D/Dt) f(r, Γ; t) = ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ { w(ΓΓ₁|Γ′Γ′₁) f(r, Γ′; t) f(r, Γ′₁; t) − w(Γ′Γ′₁|ΓΓ₁) f(r, Γ; t) f(r, Γ₁; t) } .

Detailed balance
Classical mechanics places some restrictions on the form of the kernel w(Γ′Γ′₁|ΓΓ₁). In particular, if Γᵀ = (−p, −L) denotes the kinematic variables under time reversal, then

w(Γ′Γ′₁|ΓΓ₁) = w(ΓᵀΓᵀ₁|Γ′ᵀΓ′₁ᵀ) .   ([TRw])

This is because the time reverse of the process |ΓΓ₁⟩ → |Γ′Γ′₁⟩ is |Γ′ᵀΓ′₁ᵀ⟩ → |ΓᵀΓᵀ₁⟩.
In equilibrium, we must have

w(Γ′Γ′₁|ΓΓ₁) f⁰(Γ) f⁰(Γ₁) d⁴Γ = w(ΓᵀΓᵀ₁|Γ′ᵀΓ′₁ᵀ) f⁰(Γ′ᵀ) f⁰(Γ′₁ᵀ) d⁴Γᵀ ,

where

d⁴Γ ≡ dΓ dΓ₁ dΓ′ dΓ′₁  ,  d⁴Γᵀ ≡ dΓᵀ dΓᵀ₁ dΓ′ᵀ dΓ′₁ᵀ .

Since dΓ = dΓᵀ, we may cancel the differentials above, and after invoking Equation [TRw] and suppressing the common r label, we find

f⁰(Γ) f⁰(Γ₁) = f⁰(Γ′ᵀ) f⁰(Γ′₁ᵀ) .

This is the condition of detailed balance. For the Boltzmann distribution, we have

f⁰(Γ) = A e^{−ε/k_BT} ,   (8.2.27)

where A is a constant and where ε = ε(Γ) is the kinetic energy, ε(Γ) = p²/2m in the case of point particles. Note that ε(Γᵀ) = ε(Γ). Detailed balance is satisfied because the kinematics of the collision requires energy conservation:

ε + ε₁ = ε′ + ε′₁ .   (8.2.28)

Since momentum is also kinematically conserved,

p + p₁ = p′ + p′₁ ,   (8.2.29)

any distribution of the form

f⁰(Γ) = A e^{−(ε − p·V)/k_BT}   (8.2.30)

also satisfies detailed balance, for any velocity parameter V. This distribution is appropriate for gases which are flowing with average particle velocity V.
In addition to time-reversal, parity is also a symmetry of the microscopic mechanical laws. Under the parity operation P, we have r → −r and p → −p. Note that a pseudovector such as L = r × p is unchanged under P. Thus, Γᴾ = (−p, L). Under the combined operation of C = PT, we have Γᶜ = (p, −L). If the microscopic Hamiltonian is invariant under C, then we must have

w(Γ′Γ′₁|ΓΓ₁) = w(ΓᶜΓᶜ₁|Γ′ᶜΓ′₁ᶜ) .

For point particles, invariance under T and P then means

w(p′, p′₁|p, p₁) = w(p, p₁|p′, p′₁) ,   (8.2.31)

and therefore the collision integral takes the simplified form,

Df(p)/Dt = (∂f/∂t)_coll = ∫d³p₁ ∫d³p′ ∫d³p′₁ w(p′, p′₁|p, p₁) { f(p′) f(p′₁) − f(p) f(p₁) } ,   ([BEwp])

where we have suppressed both r and t variables.


The most general statement of detailed balance is

f⁰(Γ′) f⁰(Γ′₁) / [ f⁰(Γ) f⁰(Γ₁) ] = w(Γ′Γ′₁|ΓΓ₁) / w(ΓΓ₁|Γ′Γ′₁) .   (8.2.32)

Under this condition, the collision term vanishes for f = f⁰, which is the equilibrium distribution.

Kinematics and cross section


We can rewrite Equation [BEwp] in the form

Df(p)/Dt = ∫d³p₁ ∫dΩ |v − v₁| (∂σ/∂Ω) { f(p′) f(p′₁) − f(p) f(p₁) } ,   (8.2.33)

where ∂σ/∂Ω is the differential scattering cross section. If we recast the scattering problem in terms of center-of-mass and relative coordinates, we conclude that the total momentum is conserved by the collision, and furthermore that the energy in the CM frame is conserved, which means that the magnitude of the relative momentum is conserved. Thus, we may write p′ − p′₁ = |p − p₁| Ω̂, where Ω̂ is a unit vector. Then p′ and p′₁ are determined to be

p′  = ½ ( p + p₁ + |p − p₁| Ω̂ )
p′₁ = ½ ( p + p₁ − |p − p₁| Ω̂ ) .
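A short numerical check (with arbitrary incoming momenta and an arbitrary unit vector Ω̂ — illustrative values, not from the text) confirms that these formulas conserve total momentum and, for equal-mass particles, total kinetic energy:

```python
import numpy as np

# p'  = (p + p1 + |p - p1| Ω̂)/2 ,  p1' = (p + p1 - |p - p1| Ω̂)/2
rng = np.random.default_rng(0)
p, p1 = rng.normal(size=3), rng.normal(size=3)   # arbitrary incoming momenta
Omega = rng.normal(size=3)
Omega /= np.linalg.norm(Omega)                   # random unit vector

rel = np.linalg.norm(p - p1)
pp  = 0.5 * (p + p1 + rel * Omega)
pp1 = 0.5 * (p + p1 - rel * Omega)

dP = np.linalg.norm((pp + pp1) - (p + p1))          # total momentum change
dE = (pp @ pp + pp1 @ pp1) - (p @ p + p1 @ p1)      # 2m * kinetic energy change
print(dP, dE)   # both vanish
```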

H -theorem
Let’s consider the Boltzmann equation with two particle collisions. We define the local ( r -dependent) quantity

ρφ (r, t) ≡ ∫ dΓ φ(Γ, f ) f (Γ, r, t) . (8.2.34)

At this point, φ(Γ, f) is arbitrary. Note that the φ(Γ, f) factor has r and t dependence through its dependence on f, which itself is a function of r, Γ, and t. We now compute

∂ρ_φ/∂t = ∫dΓ ∂(φf)/∂t = ∫dΓ (∂(φf)/∂f) (∂f/∂t)
        = −∫dΓ u·∇(φf) + ∫dΓ (∂(φf)/∂f) (∂f/∂t)_coll
        = −∮dΣ n̂·(u φf) + ∫dΓ (∂(φf)/∂f) (∂f/∂t)_coll .

The first term on the last line follows from the divergence theorem, and vanishes if we assume f = 0 for infinite values of the kinematic variables, which is the only physical possibility. Thus, the rate of change of ρ_φ is entirely due to the collision term:

∂ρ_φ/∂t = ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ { w(ΓΓ₁|Γ′Γ′₁) f′f′₁ χ − w(Γ′Γ′₁|ΓΓ₁) f f₁ χ }
        = ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w(Γ′Γ′₁|ΓΓ₁) f f₁ (χ′ − χ) ,

where f ≡ f(Γ), f′ ≡ f(Γ′), f₁ ≡ f(Γ₁), f′₁ ≡ f(Γ′₁), and χ = χ(Γ), with

χ = ∂(φf)/∂f = φ + f ∂φ/∂f .   (8.2.35)

We now invoke the symmetry

w(Γ′Γ′₁|ΓΓ₁) = w(Γ′₁Γ′|Γ₁Γ) ,   (8.2.36)

which allows us to write

∂ρ_φ/∂t = ½ ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w(Γ′Γ′₁|ΓΓ₁) f f₁ (χ′ + χ′₁ − χ − χ₁) .   (8.2.37)

This shows that ρ_φ is preserved by the collision term if χ(Γ) is a collisional invariant.

Now let us consider φ(f) = ln f. We define h ≡ ρ_φ|_{φ = ln f}. We then have

∂h/∂t = −½ ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w f′f′₁ · x ln x ,   (8.2.38)

where w ≡ w(Γ′Γ′₁|ΓΓ₁) and x ≡ f f₁ / f′f′₁
. We next invoke the result

∫dΓ′ ∫dΓ′₁ w(Γ′Γ′₁|ΓΓ₁) = ∫dΓ′ ∫dΓ′₁ w(ΓΓ₁|Γ′Γ′₁) ,   (8.2.39)

which is a statement of unitarity of the scattering matrix3. Multiplying both sides by f(Γ) f(Γ₁), then integrating over Γ and Γ₁, and finally changing variables (Γ, Γ₁) ↔ (Γ′, Γ′₁), we find

0 = ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w (f f₁ − f′f′₁) = ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w f′f′₁ (x − 1) .   (8.2.40)

Multiplying this result by ½ and adding it to the previous equation for ḣ, we arrive at our final result,

∂h/∂t = −½ ∫dΓ ∫dΓ₁ ∫dΓ′ ∫dΓ′₁ w f′f′₁ (x ln x − x + 1) .   (8.2.41)

Note that w, f′, and f′₁ are all nonnegative. It is then easy to prove that the function g(x) = x ln x − x + 1 is nonnegative for all positive x values4, which therefore entails the important result

∂h(r, t)/∂t ≤ 0 .   (8.2.42)
Boltzmann’s H function is the space integral of the h density: H = ∫d³r h.
Thus, everywhere in space, the function h(r, t) is monotonically decreasing or constant, due to collisions. In equilibrium, ḣ = 0 everywhere, which requires x = 1,

f⁰(Γ) f⁰(Γ₁) = f⁰(Γ′) f⁰(Γ′₁) ,   (8.2.43)

or, taking the logarithm,

ln f⁰(Γ) + ln f⁰(Γ₁) = ln f⁰(Γ′) + ln f⁰(Γ′₁) .   (8.2.44)

But this means that ln f⁰ is itself a collisional invariant, and if 1, p, and ε are the only collisional invariants, then ln f⁰ must be expressible in terms of them. Thus,

ln f⁰ = μ/k_BT + V·p/k_BT − ε/k_BT ,   (8.2.45)

where μ, V, and T are constants which parameterize the equilibrium distribution f⁰(p), corresponding to the chemical potential, flow velocity, and temperature, respectively.

This page titled 8.2: Boltzmann Transport Theory is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

8.3: Weakly Inhomogeneous Gas
Consider a gas which is only weakly out of equilibrium. We follow the treatment in Lifshitz and Pitaevskii, §6. As the gas is only
slightly out of equilibrium, we seek a solution to the Boltzmann equation of the form f = f⁰ + δf, where f⁰ describes a local
equilibrium. Recall that such a distribution function is annihilated by the collision term in the Boltzmann equation but not by the
streaming term, hence a correction δf must be added in order to obtain a solution.
The most general form of local equilibrium is described by the distribution

f⁰(r, Γ) = C exp( (μ − ε(Γ) + V·p)/k_BT ) ,   (8.3.1)

where μ = μ(r, t), T = T(r, t), and V = V(r, t) vary in both space and time. Note that
df⁰ = ( dμ + p·dV + (ε − μ − V·p) dT/T − dε ) ( −∂f⁰/∂ε )
    = ( (1/n) dp + p·dV + (ε − h) dT/T − dε ) ( −∂f⁰/∂ε ) ,

where we have assumed V = 0 on average, and used

dμ = (∂μ/∂T)_p dT + (∂μ/∂p)_T dp = −s dT + (1/n) dp ,

where s is the entropy per particle and n is the number density. We have further written h = μ + Ts, which is the enthalpy per
particle. Here, c_p is the heat capacity per particle at constant pressure5. Finally, note that when f⁰ is the Maxwell-Boltzmann
distribution, we have

−∂f⁰/∂ε = f⁰/k_BT .   (8.3.2)

The Boltzmann equation is written

( ∂/∂t + (p/m)·∂/∂r + F·∂/∂p )( f⁰ + δf ) = (∂f/∂t)_coll .   (8.3.3)

The RHS of this equation must be of order δf because the local equilibrium distribution f⁰ is annihilated by the collision integral.
We therefore wish to evaluate one of the contributions to the LHS of this equation,

∂f⁰/∂t + (p/m)·∂f⁰/∂r + F·∂f⁰/∂p = ( −∂f⁰/∂ε ) { (1/n) ∂p/∂t + (ε − h)/T ∂T/∂t + m v·[(v·∇)V]
    + v·( m ∂V/∂t + (1/n) ∇p ) + (ε − h)/T v·∇T − F·v } .

To simplify this, first note that Newton’s laws applied to an ideal fluid give ρV̇ = −∇p, where ρ = mn is the mass density.
Corrections to this result, e.g. viscosity and nonlinearity in V, are of higher order.
Next, continuity for particle number means ṅ + ∇⋅(nV) = 0 . We assume V is zero on average and that all derivatives are small,
hence ∇⋅(nV) = V⋅∇n + n ∇⋅V ≈ n ∇⋅V . Thus,
∂ln n/∂t = ∂ln p/∂t − ∂ln T/∂t = −∇·V ,   (8.3.4)

where we have invoked the ideal gas law n = p/k_BT above.
Next, we invoke conservation of entropy. If s is the entropy per particle, then ns is the entropy per unit volume, in which case we
have the continuity equation

8.3.1 https://phys.libretexts.org/@go/page/18592
∂(ns)/∂t + ∇·(nsV) = n( ∂s/∂t + V·∇s ) + s( ∂n/∂t + ∇·(nV) ) = 0 .   (8.3.5)

The second bracketed term on the RHS vanishes because of particle continuity, leaving us with ṡ + V⋅∇s ≈ ṡ =0 (since V = 0
on average, and any gradient is first order in smallness). Now thermodynamics says

ds = (∂s/∂T)_p dT + (∂s/∂p)_T dp = (c_p/T) dT − (k_B/p) dp ,

since T (∂s/∂T)_p = c_p and (∂s/∂p)_T = −(∂v/∂T)_p, where v = V/N. Thus,

(c_p/k_B) ∂ln T/∂t − ∂ln p/∂t = 0 .   (8.3.6)

We now have in Equations [ptea] and [pteb] two equations in the two unknowns ∂ln T/∂t and ∂ln p/∂t, yielding

∂ln T/∂t = −(k_B/c_V) ∇·V
∂ln p/∂t = −(c_p/c_V) ∇·V .
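The elimination step can be checked by brute force: treating ∂ln T/∂t and ∂ln p/∂t as the unknowns in the two linear equations above and solving numerically reproduces the quoted coefficients, using the ideal-gas relation c_p = c_V + k_B. The numerical values below (k_B set to 1, and illustrative c_V and ∇·V) are arbitrary.

```python
import numpy as np

# Equations (kB = 1):
#   dlnp/dt - dlnT/dt = -div V              (continuity + ideal gas law)
#   (c_p/kB) dlnT/dt - dlnp/dt = 0          (entropy conservation)
kB, cV, divV = 1.0, 1.5, 0.37               # illustrative values
cp = cV + kB                                # ideal gas: c_p = c_V + kB

# unknowns u = (dlnT/dt, dlnp/dt)
A = np.array([[-1.0, 1.0],
              [cp / kB, -1.0]])
b = np.array([-divV, 0.0])
dlnT_dt, dlnp_dt = np.linalg.solve(A, b)

print(dlnT_dt, -kB / cV * divV)    # agree
print(dlnp_dt, -cp / cV * divV)    # agree
```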

Thus Equation [LHSA] becomes

∂f⁰/∂t + (p/m)·∂f⁰/∂r + F·∂f⁰/∂p = ( −∂f⁰/∂ε ) { (ε(Γ) − h)/T v·∇T + m v_α v_β Q_αβ
    + (h − T c_p − ε(Γ))/(c_V/k_B) ∇·V − F·v } ,

where

Q_αβ = ½ ( ∂V_α/∂x_β + ∂V_β/∂x_α ) .   (8.3.7)

Therefore, the Boltzmann equation takes the form

{ (ε(Γ) − h)/T v·∇T + m v_α v_β Q_αβ − (ε(Γ) − h + T c_p)/(c_V/k_B) ∇·V − F·v } f⁰/k_BT + ∂δf/∂t = (∂f/∂t)_coll .   (8.3.8)

Notice we have dropped the terms v·∂δf/∂r and F·∂δf/∂p, since δf must already be first order in smallness, and both the ∂/∂r operator as well as F add a second order of smallness, which is negligible. Typically ∂δf/∂t is nonzero if the applied force F(t) is time-dependent. We use the convention of summing over repeated indices. Note that δ_αβ Q_αβ = Q_αα = ∇·V. For ideal gases in which only translational and rotational degrees of freedom are excited, h = c_p T.

This page titled 8.3: Weakly Inhomogeneous Gas is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

8.4: Relaxation Time Approximation
Approximation of Collision Integral
We now consider a very simple model of the collision integral,
(∂f/∂t)_coll = − (f − f⁰)/τ = − δf/τ .   (8.4.1)

This model is known as the relaxation time approximation. Here, f⁰ = f⁰(r, p, t) is a distribution function which describes a local equilibrium at each position r and time t. The quantity τ is the relaxation time, which can in principle be momentum-dependent, but which we shall first consider to be constant. In the absence of streaming terms, we have
constant. In the absence of streaming terms, we have
∂δf/∂t = − δf/τ  ⟹  δf(r, p, t) = δf(r, p, 0) e^{−t/τ} .   (8.4.2)

The distribution f then relaxes to the equilibrium distribution f⁰ on a time scale τ. We note that this approximation is obviously flawed in that all quantities – even the collisional invariants – relax to their equilibrium values on the scale τ. In the Appendix, we consider a model for the collision integral in which the collisional invariants are all preserved, but everything else relaxes to local equilibrium at a single rate.

Computation of the scattering time


Consider two particles with velocities v and v′. The average of their relative speed is

⟨ |v − v′| ⟩ = ∫d³v ∫d³v′ P(v) P(v′) |v − v′| ,   (8.4.3)

where P(v) is the Maxwell velocity distribution,

P(v) = ( m/2πk_BT )^{3/2} exp( −mv²/2k_BT ) ,   (8.4.4)

which follows from the Boltzmann form of the equilibrium distribution f⁰(p). It is left as an exercise for the student to verify that

v̄_rel ≡ ⟨ |v − v′| ⟩ = (4/√π) (k_BT/m)^{1/2} .   (8.4.5)
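This average is easy to confirm by Monte Carlo: drawing two independent Maxwellian velocities (units with k_BT = m = 1, sample size arbitrary) gives ⟨|v − v′|⟩ ≈ 4/√π ≈ 2.257, which is also √2 times the mean single-particle speed v̄:

```python
import numpy as np

# Two independent Maxwellian velocities: each Cartesian component ~ N(0, kBT/m).
rng = np.random.default_rng(1)
N = 400_000
v  = rng.normal(size=(N, 3))
vp = rng.normal(size=(N, 3))

vbar_rel = np.linalg.norm(v - vp, axis=1).mean()   # <|v - v'|>
vbar     = np.linalg.norm(v, axis=1).mean()        # mean single-particle speed

print(vbar_rel, 4 / np.sqrt(np.pi))   # ≈ 2.257
print(vbar_rel / vbar)                # ≈ sqrt(2)
```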


Note that v̄_rel = √2 v̄, where v̄ is the average particle speed. Let σ be the total scattering cross section, which for hard spheres is σ = πd², where d is the hard sphere diameter. Then the rate at which particles scatter is
1/τ = n v̄_rel σ .   (8.4.6)

The particle mean free path is simply

ℓ = v̄ τ = 1/(√2 n σ) .   (8.4.7)

While the scattering length is not temperature-dependent within this formalism, the scattering time is T-dependent, with

τ(T) = 1/(n v̄_rel σ) = (√π/4nσ) (m/k_BT)^{1/2} .   (8.4.8)

As T → 0, the collision time diverges as τ ∝ T^{−1/2}, because the particles on average move more slowly at lower temperatures. The mean free path, however, is independent of T, and is given by ℓ = 1/(√2 n σ).
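Plugging in numbers gives a feel for the scales. The sketch below uses illustrative, roughly nitrogen-like values for the molecular mass and hard-sphere diameter (these specific numbers are assumptions, not from the text); at room temperature and atmospheric pressure it yields τ of order 10⁻¹⁰ s and ℓ of order 10⁻⁷ m:

```python
import numpy as np

kB = 1.380649e-23        # J/K
T  = 300.0               # K
P  = 1.013e5             # Pa
m  = 4.65e-26            # kg   (~28 amu, nitrogen-like; assumed)
d  = 3.7e-10             # m    (assumed hard-sphere diameter)

n = P / (kB * T)                               # ideal gas number density
sigma = np.pi * d**2                           # hard-sphere total cross section
vbar_rel = (4 / np.sqrt(np.pi)) * np.sqrt(kB * T / m)
tau = 1.0 / (n * sigma * vbar_rel)             # scattering time, Eq. 8.4.8
ell = 1.0 / (np.sqrt(2) * n * sigma)           # mean free path, Eq. 8.4.7

print(f"n = {n:.3e} m^-3, tau = {tau:.3e} s, ell = {ell:.3e} m")
```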

Figure 8.4.1: Graphic representation of the equation n σ v̄_rel τ = 1, which yields the scattering time τ in terms of the number density n, average particle pair relative velocity v̄_rel, and two-particle total scattering cross section σ. The equation says that on average there must be one particle within the tube.

Thermal conductivity
We consider a system with a temperature gradient ∇T and seek a steady state (time-independent) solution to the Boltzmann equation. We assume F_α = Q_αβ = 0. Appealing to Equation 8.3.8, and using the relaxation time approximation for the collision integral, we have

δf = − τ (ε − c_p T)/(k_B T²) (v·∇T) f⁰ .   (8.4.9)

We are now ready to compute the energy and particle currents. In order to compute the local density of any quantity A(r, p), we multiply by the distribution f(r, p) and integrate over momentum:

ρ_A(r, t) = ∫d³p A(r, p) f(r, p, t) .   (8.4.10)

For the energy (thermal) current, we let A = ε v_α = ε p_α/m, in which case ρ_A = j^α_ε. Note that ∫d³p p f⁰ = 0 since f⁰ is isotropic in p even when μ and T depend on r. Thus, only δf enters into the calculation of the various currents. Thus, the energy (thermal) current is

j^α_ε(r) = ∫d³p ε v^α δf = − nτ/(k_B T²) ⟨ v^α v^β ε (ε − c_p T) ⟩ ∂T/∂x^β ,

where the repeated index β is summed over, and where momentum averages are defined relative to the equilibrium distribution,

⟨ ϕ(p) ⟩ = ∫d³p ϕ(p) f⁰(p) / ∫d³p f⁰(p) = ∫d³v P(v) ϕ(mv) .   (8.4.11)

In this context, it is useful to point out the identity

d³p f⁰(p) = n d³v P(v) ,   (8.4.12)

where

P(v) = ( m/2πk_BT )^{3/2} e^{−m(v−V)²/2k_BT}   (8.4.13)

is the Maxwell velocity distribution.


Note that if ϕ = ϕ(ε) is a function of the energy, and if V = 0, then

d³p f⁰(p) = n d³v P(v) = n P̃(ε) dε ,   (8.4.14)

where

P̃(ε) = (2/√π) (k_BT)^{−3/2} ε^{1/2} e^{−ε/k_BT}   (8.4.15)

is the Maxwellian distribution of single particle energies. This distribution is normalized with ∫₀^∞ dε P̃(ε) = 1. Averages with respect to this distribution are given by

⟨ ϕ(ε) ⟩ = ∫₀^∞ dε ϕ(ε) P̃(ε) = (2/√π) (k_BT)^{−3/2} ∫₀^∞ dε ε^{1/2} ϕ(ε) e^{−ε/k_BT} .   (8.4.16)

If ϕ(ε) is homogeneous, then for any α we have

⟨ ε^α ⟩ = (2/√π) Γ(α + 3/2) (k_BT)^α .   (8.4.17)

Due to spatial isotropy, it is clear that we can replace

v^α v^β → (1/3) v² δ_αβ = (2ε/3m) δ_αβ   (8.4.18)

in the expression for the thermal current above. We then have j_ε = −κ∇T, with
κ = 2nτ/(3mk_BT²) ⟨ ε² (ε − c_pT) ⟩ = 5nτk_B²T/2m = (π/8) n ℓ v̄ c_p ,   (8.4.19)

where we have used c_p = (5/2) k_B and v̄² = 8k_BT/πm. The quantity κ is called the thermal conductivity. Note that κ ∝ T^{1/2}.

Viscosity
Consider the situation depicted in Figure 8.4.2. A fluid filling the space between two large flat plates at z = 0 and z = d is set in motion by a force F = F x̂ applied to the upper plate; the lower plate is fixed. It is assumed that the fluid’s velocity locally matches that of the plates. Fluid particles at the top have an average x-component of their momentum ⟨p_x⟩ = mV. As these particles move downward toward lower z values, they bring their x-momenta with them. Therefore there is a downward (−ẑ-directed) flow of ⟨p_x⟩. Since x-momentum is constantly being drawn away from the z = d plane, this means that there is a −x-directed viscous drag on the upper plate. The viscous drag force per unit area is given by F_drag/A = −ηV/d, where V/d = ∂V_x/∂z is the velocity gradient and η is the shear viscosity. In steady state, the applied force balances the drag force, F + F_drag = 0. Clearly in the steady state the net momentum density of the fluid does not change, and is given by ½ρV x̂, where ρ is the fluid mass density. The momentum per unit time injected into the fluid by the upper plate at z = d is then extracted by the lower plate at z = 0. The momentum flux density Π_xz = n⟨p_x v_z⟩ is the drag force on the upper surface per unit area: Π_xz = −η ∂V_x/∂z. The units of viscosity are [η] = M/LT.
We now provide some formal definitions of viscosity. As we shall see presently, there is in fact a second type of viscosity, called second viscosity or bulk viscosity, which is measurable although not by the type of experiment depicted in Figure 8.4.2.
The momentum flux tensor Π_αβ = n⟨p_α v_β⟩ is defined to be the current of momentum component p_α in the direction of increasing x_β. For a gas in motion with average velocity V, we have

Π_αβ = nm ⟨ (V_α + v′_α)(V_β + v′_β) ⟩
     = nm V_α V_β + nm ⟨ v′_α v′_β ⟩
     = nm V_α V_β + (1/3) nm ⟨ v′² ⟩ δ_αβ
     = ρ V_α V_β + p δ_αβ ,

where v′ is the particle velocity in a frame moving with velocity V, and where we have invoked the ideal gas law p = nk_BT. The mass density is ρ = nm.

Figure 8.4.2: Gedankenexperiment to measure shear viscosity η in a fluid. The lower plate is fixed. The viscous drag force per unit area on the upper plate is F_drag/A = −ηV/d. This must be balanced by an applied force F.

When V is spatially varying,

Π_αβ = p δ_αβ + ρ V_α V_β − σ̃_αβ ,   (8.4.20)

where σ̃_αβ is the viscosity stress tensor. Any symmetric tensor, such as σ̃_αβ, can be decomposed into a sum of (i) a traceless component, and (ii) a component proportional to the identity matrix. Since σ̃_αβ should be, to first order, linear in the spatial derivatives of the components of the velocity field V, there is a unique two-parameter decomposition:

σ̃_αβ = η ( ∂V_α/∂x_β + ∂V_β/∂x_α − (2/3) ∇·V δ_αβ ) + ζ ∇·V δ_αβ
      = 2η ( Q_αβ − (1/3) Tr(Q) δ_αβ ) + ζ Tr(Q) δ_αβ .

The coefficient of the traceless component is η, known as the shear viscosity. The coefficient of the component proportional to the identity is ζ, known as the bulk viscosity. The full stress tensor σ_αβ contains a contribution from the pressure:

σ_αβ = −p δ_αβ + σ̃_αβ .   (8.4.21)

The differential force dF_α that a fluid exerts on a surface element n̂ dA is

dF_α = −σ_αβ n_β dA ,   (8.4.22)

where we are using the Einstein summation convention and summing over the repeated index β. We will now compute the shear viscosity η using the
Boltzmann equation in the relaxation time approximation.
Appealing again to Equation 8.3.8, with F = 0 and h = c_pT, we find

δf = − τ/k_BT { m v_α v_β Q_αβ + (ε − c_pT)/T v·∇T − ε/(c_V/k_B) ∇·V } f⁰ .   (8.4.23)

We assume ∇T = ∇·V = 0, and we compute the momentum flux:

Π_xz = n ∫d³p p_x v_z δf
     = − nm²τ/k_BT Q_αβ ⟨ v_x v_z v_α v_β ⟩
     = − nτ/k_BT ( ∂V_x/∂z + ∂V_z/∂x ) ⟨ m v_x² · m v_z² ⟩
     = − nτ k_BT ( ∂V_z/∂x + ∂V_x/∂z ) .

Thus, if V_x = V_x(z), we have

Π_xz = − nτ k_BT ∂V_x/∂z ,   (8.4.24)

from which we read off the viscosity,

η = n k_BT τ = (π/8) n m ℓ v̄ .   (8.4.25)

Note that η(T) ∝ T^{1/2}.
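The Gaussian moment used in the Π_xz computation, ⟨m v_x² · m v_z²⟩ = (k_BT)², follows from the statistical independence of the velocity components; a quick Monte Carlo check (k_BT = m = 1, arbitrary sample size and placeholder n, τ) together with a check that the two forms of η in Equation 8.4.25 agree:

```python
import numpy as np

# Maxwell distribution: each velocity component is an independent Gaussian
# with variance kBT/m = 1, so <v_x^2 v_z^2> = <v_x^2><v_z^2> = 1.
rng = np.random.default_rng(2)
v = rng.normal(size=(1_000_000, 3))
mom = (v[:, 0]**2 * v[:, 2]**2).mean()
print(mom)   # ≈ 1 = (kBT/m)^2

# Consistency of eta = n kB T tau = (pi/8) n m ell vbar,
# using ell = vbar tau and vbar^2 = 8 kB T / (pi m):
n, tau = 1.7, 0.4                      # arbitrary placeholder values
vbar = np.sqrt(8 / np.pi)
eta1 = n * 1.0 * tau                   # n kB T tau
eta2 = (np.pi / 8) * n * 1.0 * (vbar * tau) * vbar
print(eta1, eta2)   # agree
```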

Figure 8.4.3: Left: thermal conductivity (λ in figure) of Ar between T = 800 K and T = 2600 K. The best fit to a single power law λ = aT^b results in b = 0.651. Source: G. S. Springer and E. W. Wingeier, J. Chem Phys. 59, 1747 (1972). Right: log-log plot of shear viscosity (μ in figure) of He between T ≈ 15 K and T ≈ 1000 K. The red line has slope 1/2. The slope of the data is approximately 0.633. Source: J. Kestin and W. Leidenfrost, Physica 25, 537 (1959).


How well do these predictions hold up? In Figure 8.4.3, we plot data for the thermal conductivity of argon and the shear viscosity of helium. Both show a clear sublinear behavior as a function of temperature, but the slope d ln κ/d ln T is approximately 0.65 and d ln η/d ln T is approximately 0.63. Clearly the simple model is not even getting the functional dependence on T right, let alone its coefficient. Still, our crude theory is at least qualitatively correct.
Why do both κ(T) as well as η(T) decrease at low temperatures? The reason is that the heat current which flows in response to ∇T as well as the momentum current which flows in response to ∂V_x/∂z are due to the presence of collisions, which result in momentum and energy transfer between
particles. This is true even when total energy and momentum are conserved, which they are not in the relaxation time approximation. Intuitively, we
might think that the viscosity should increase as the temperature is lowered, since common experience tells us that fluids ‘gum up’ as they get colder –
think of honey as an extreme example. But of course honey is nothing like an ideal gas, and the physics behind the crystallization or glass transition
which occurs in real fluids when they get sufficiently cold is completely absent from our approach. In our calculation, viscosity results from collisions,
and with no collisions there is no momentum transfer and hence no viscosity. If, for example, the gas particles were to simply pass through each other,
as though they were ghosts, then there would be no opposition to maintaining an arbitrary velocity gradient.

Oscillating External Force


Suppose a uniform oscillating external force F (t) = F e ext
is applied. For a system of charged particles, this force would arise from an external
−iωt

electric field F = qE e
ext
, where q is the charge of each particle. We’ll assume ∇T = 0 . The Boltzmann equation is then written
−iωt

0
∂f p ∂f −iωt
∂f f −f
+ ⋅ + Fe ⋅ =− . (8.4.26)
∂t m ∂r ∂p τ

We again write f = f⁰ + δf, and we assume δf is spatially constant. Thus,

∂δf/∂t + F e^{−iωt}·v (∂f⁰/∂ε) = −δf/τ . (8.4.27)

If we assume δf(t) = δf(ω) e^{−iωt}, then the above differential equation is converted to an algebraic equation, with solution

δf(t) = −(τ e^{−iωt}/(1 − iωτ)) F·v (∂f⁰/∂ε) . (8.4.28)

We now compute the particle current:

j_α(r, t) = ∫d³p v_α δf
= (τ e^{−iωt}/(1 − iωτ)) · (F_β/k_BT) ∫d³p f⁰(p) v_α v_β
= (τ e^{−iωt}/(1 − iωτ)) · (n F_α/3k_BT) ∫d³v P(v) v²
= (nτ/m) · (F_α e^{−iωt}/(1 − iωτ)) .

If the particles are electrons, with charge q = −e, then the electrical current is (−e) times the particle current. We then obtain

j_α^{(elec)}(t) = (ne²τ/m) · (E_α e^{−iωt}/(1 − iωτ)) ≡ σ_{αβ}(ω) E_β e^{−iωt} ,

where

σ_{αβ}(ω) = (ne²τ/m) · δ_{αβ}/(1 − iωτ) (8.4.29)

is the frequency-dependent electrical conductivity tensor. Of course for fermions such as electrons, we should be using the Fermi distribution in place of
the Maxwell-Boltzmann distribution for f⁰(p). This affects the relation between n and μ only, and the final result for the conductivity tensor σ_{αβ}(ω) is
unchanged.
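As a numerical sanity check on the Drude form σ(ω) = (ne²τ/m)/(1 − iωτ) (a sketch; the carrier density and relaxation time below are illustrative, roughly metallic values, not taken from the text):

```python
# Evaluate the frequency-dependent Drude conductivity and check that its
# real part falls off as 1/(1 + (omega*tau)^2) from the DC value.
n   = 8.5e28      # carriers per m^3 (illustrative, copper-like)
e   = 1.602e-19   # elementary charge, C
m   = 9.109e-31   # electron mass, kg
tau = 2.5e-14     # relaxation time, s (illustrative)

def sigma(omega):
    """sigma(omega) = sigma0 / (1 - i omega tau), with sigma0 = n e^2 tau / m."""
    return (n * e**2 * tau / m) / (1 - 1j * omega * tau)

sigma0 = n * e**2 * tau / m            # DC conductivity
s = sigma(1.0e14)                      # evaluate at omega = 1e14 rad/s
re_expected = sigma0 / (1 + (1.0e14 * tau)**2)
```

At any finite frequency |σ(ω)| is suppressed below σ₀, recovering the DC result as ω → 0.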

Quick and Dirty Treatment of Transport

Suppose we have some averaged intensive quantity ϕ which is spatially dependent through T(r) or μ(r) or V(r). For simplicity we will write
ϕ = ϕ(z). We wish to compute the current of ϕ across some surface whose equation is dz = 0. If the mean free path is ℓ, then the value of ϕ for
particles crossing this surface in the +ẑ direction is ϕ(z − ℓ cos θ), where θ is the angle the particle's velocity makes with respect to ẑ, cos θ = v_z/v.
We perform the same analysis for particles moving in the −ẑ direction, for which ϕ = ϕ(z + ℓ cos θ). The current of ϕ through this surface is then

j_ϕ = n ẑ ∫_{v_z>0} d³v P(v) v_z ϕ(z − ℓ cos θ) + n ẑ ∫_{v_z<0} d³v P(v) v_z ϕ(z + ℓ cos θ)
= −nℓ (∂ϕ/∂z) ẑ ∫d³v P(v) (v_z²/v) = −(1/3) n v̄ ℓ (∂ϕ/∂z) ẑ ,

where v̄ = √(8k_BT/πm) is the average particle speed. If the z-dependence of ϕ comes through the dependence of ϕ on the local temperature T, then we have

j_ϕ = −(1/3) n ℓ v̄ (∂ϕ/∂T) ∇T ≡ −K ∇T , (8.4.30)

where

K = (1/3) n ℓ v̄ (∂ϕ/∂T) (8.4.31)

is the transport coefficient. If ϕ = ⟨ε⟩, then ∂ϕ/∂T = c_p, where c_p is the heat capacity per particle at constant pressure. We then find j_ε = −κ ∇T with
thermal conductivity

κ = (1/3) n ℓ v̄ c_p . (8.4.32)

Our Boltzmann equation calculation yielded the same result, but with a prefactor of π/8 instead of 1/3.
We can make a similar argument for the viscosity. In this case ϕ = ⟨p_x⟩ is spatially varying through its dependence on the flow velocity V(r). Clearly
∂ϕ/∂V_x = m, hence

j_{p_x}^z = Π_{xz} = −(1/3) n m ℓ v̄ (∂V_x/∂z) , (8.4.33)

from which we identify the viscosity, η = (1/3) n m ℓ v̄. Once again, this agrees in its functional dependences with the Boltzmann equation calculation in the
relaxation time approximation. Only the coefficients differ. The ratio of the coefficients is K_QDC/K_BRT = 8/3π = 0.849 in both cases6.
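The two prefactors can be compared in one line:

```python
import math

# Prefactor of n*ell*vbar*c_p in the two calculations of kappa (and of
# n*m*ell*vbar in eta):
pref_qdc = 1.0 / 3.0      # quick-and-dirty ("QDC") estimate
pref_brt = math.pi / 8.0  # Boltzmann equation, relaxation time ("BRT")

ratio = pref_qdc / pref_brt   # = 8/(3*pi) ~ 0.849
```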

Thermal diffusivity, kinematic viscosity, and Prandtl number

Suppose, under conditions of constant pressure, we add heat q per unit volume to an ideal gas. We know from thermodynamics that its temperature will
then increase by an amount ΔT = q/nc_p. If a heat current j_q flows, then the continuity equation for energy flow requires

n c_p (∂T/∂t) + ∇·j_q = 0 . (8.4.34)

In a system where there is no net particle current, the heat current j_q is the same as the energy current j_ε, and since j_ε = −κ ∇T, we obtain a diffusion
equation for temperature,

∂T/∂t = (κ/nc_p) ∇²T . (8.4.35)

The combination

a ≡ κ/nc_p (8.4.36)

is known as the thermal diffusivity. Our Boltzmann equation calculation in the relaxation time approximation yielded the result κ = n k_B T τ c_p/m. Thus,
we find a = k_B T τ/m via this method. Note that the dimensions of a are the same as for any diffusion constant D, namely [a] = L²/T.

[Prandtl] Viscosities, thermal conductivities, and Prandtl numbers for some common gases at T = 293 K and p = 1 atm. (Source: Table 1.1 of Smith and Jensen,
with data for triatomic gases added.)

Gas  | η (μPa·s) | κ (mW/m·K) | c_p/k_B | Pr
He   | 19.5      | 149        | 2.50    | 0.682
Ar   | 22.3      | 17.4       | 2.50    | 0.666
Xe   | 22.7      | 5.46       | 2.50    | 0.659
H₂   | 8.67      | 179        | 3.47    | 0.693
N₂   | 17.6      | 25.5       | 3.53    | 0.721
O₂   | 20.3      | 26.0       | 3.50    | 0.711
CH₄  | 11.2      | 33.5       | 4.29    | 0.74
CO₂  | 14.8      | 18.1       | 4.47    | 0.71
NH₃  | 10.1      | 24.6       | 4.50    | 0.90

Another quantity with dimensions of L²/T is the kinematic viscosity, ν = η/ρ, where ρ = nm is the mass density. We found η = n k_B T τ from the
relaxation time approximation calculation, hence ν = k_B T τ/m. The ratio ν/a, called the Prandtl number, Pr = η c_p/mκ, is dimensionless. According
to our calculations, Pr = 1. According to table [Prandtl], most monatomic gases have Pr ≈ 2/3.
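The tabulated values can be used to confirm the definition Pr = η c_p/mκ directly; the molar masses below are standard values, not part of the table:

```python
# Verify Pr = eta * c_p / (m * kappa) from the tabulated values for two
# monatomic gases.  Molar masses (in amu) are standard values.
kB  = 1.380649e-23     # J/K
amu = 1.66053907e-27   # kg

# gas: (eta [Pa s], kappa [W/m K], c_p/kB, molar mass [amu])
data = {
    "He": (19.5e-6, 149e-3, 2.50, 4.0026),
    "Ar": (22.3e-6, 17.4e-3, 2.50, 39.948),
}

computed = {}
for gas, (eta, kappa, cp_over_kB, mass_amu) in data.items():
    m = mass_amu * amu
    cp = cp_over_kB * kB
    computed[gas] = eta * cp / (m * kappa)
```

Both come out within a percent of the tabulated Prandtl numbers (0.682 and 0.666), and close to the hard-sphere value 2/3.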

This page titled 8.4: Relaxation Time Approximation is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

8.5: Diffusion and the Lorentz model
Failure of the relaxation time approximation
As we remarked above, the relaxation time approximation fails to conserve any of the collisional invariants. It is therefore
unsuitable for describing hydrodynamic phenomena such as diffusion. To see this, let f(r, v, t) be the distribution function, here
written in terms of position, velocity, and time rather than position, momentum, and time as before7. In the absence of external
forces, the Boltzmann equation in the relaxation time approximation is

∂f/∂t + v·(∂f/∂r) = −(f − f⁰)/τ . (8.5.1)

The density of particles in velocity space is given by

ñ(v, t) = ∫d³r f(r, v, t) . (8.5.2)

In equilibrium, this is the Maxwell distribution times the total number of particles: ñ₀(v) = N P_M(v). The
number of particles as a function of time, N(t) = ∫d³v ñ(v, t), should be a constant.

Integrating the Boltzmann equation one has

∂ñ/∂t = −(ñ − ñ₀)/τ . (8.5.3)

Thus, with δñ(v, t) = ñ(v, t) − ñ₀(v), we have

δñ(v, t) = δñ(v, 0) e^{−t/τ} . (8.5.4)

Thus, δñ(v, t) decays exponentially to zero with time constant τ, from which it follows that the total particle number exponentially
relaxes to N₀. This is physically incorrect; local density perturbations can't just vanish. Rather, they diffuse.

Modified Boltzmann equation and its solution

To remedy this unphysical aspect, consider the modified Boltzmann equation,

∂f/∂t + v·(∂f/∂r) = (1/τ)[−f + ∫(dv̂/4π) f] ≡ (1/τ)(P − 1) f , (8.5.5)

where P is a projector onto a space of isotropic functions of v: PF = ∫(dv̂/4π) F(v) for any function F(v). Note that PF is a
function of the speed v = |v|. For this modified equation, known as the Lorentz model, one finds ∂_t ñ = 0.

The model in Equation [Lormod] is known as the Lorentz model8. To solve it, we consider the Laplace transform,

f̂(k, v, s) = ∫₀^∞ dt e^{−st} ∫d³r e^{−ik·r} f(r, v, t) . (8.5.6)

Taking the Laplace transform of Equation [Lormod], we find

(s + iv·k + τ⁻¹) f̂(k, v, s) = τ⁻¹ P f̂(k, v, s) + f(k, v, t = 0) . (8.5.7)

We now solve for P f̂(k, v, s):

f̂(k, v, s) = (τ⁻¹/(s + iv·k + τ⁻¹)) P f̂(k, v, s) + f(k, v, t = 0)/(s + iv·k + τ⁻¹) , (8.5.8)

which entails

P f̂(k, v, s) = [∫(dv̂/4π) τ⁻¹/(s + iv·k + τ⁻¹)] P f̂(k, v, s) + ∫(dv̂/4π) f(k, v, t = 0)/(s + iv·k + τ⁻¹) . (8.5.9)

Now we have

∫(dv̂/4π) τ⁻¹/(s + iv·k + τ⁻¹) = (τ⁻¹/2) ∫₋₁¹ dx 1/(s + ivkx + τ⁻¹)
= (1/vkτ) tan⁻¹(vkτ/(1 + τs)) .

Thus,

P f̂(k, v, s) = [1 − (1/vkτ) tan⁻¹(vkτ/(1 + τs))]⁻¹ ∫(dv̂/4π) f(k, v, t = 0)/(s + iv·k + τ⁻¹) . (8.5.10)

We now have the solution to Lorentz's modified Boltzmann equation:

f̂(k, v, s) = (τ⁻¹/(s + iv·k + τ⁻¹)) [1 − (1/vkτ) tan⁻¹(vkτ/(1 + τs))]⁻¹ ∫(dv̂/4π) f(k, v, t = 0)/(s + iv·k + τ⁻¹)
+ f(k, v, t = 0)/(s + iv·k + τ⁻¹) .

Let us assume an initial distribution which is perfectly localized in both r and v:

f(r, v, t = 0) = δ(r) δ(v − v₀) . (8.5.11)

For these initial conditions, we find

∫(dv̂/4π) f(k, v, t = 0)/(s + iv·k + τ⁻¹) = (1/(s + iv₀·k + τ⁻¹)) · δ(v − v₀)/4πv₀² . (8.5.12)

We further have that

1 − (1/vkτ) tan⁻¹(vkτ/(1 + τs)) = sτ + (1/3) k²v²τ² + … , (8.5.13)

and therefore

f̂(k, v, s) = (τ⁻¹/(s + iv·k + τ⁻¹)) · (τ⁻¹/(s + iv₀·k + τ⁻¹)) · (1/(s + (1/3)v₀²k²τ + …)) · δ(v − v₀)/4πv₀²
+ δ(v − v₀)/(s + iv₀·k + τ⁻¹) .

We are interested in the long time limit t ≫ τ for f(r, v, t). This is dominated by s ∼ t⁻¹, and we assume that τ⁻¹ is dominant
over s and iv·k. We then have

f̂(k, v, s) ≈ (1/(s + (1/3)v₀²k²τ)) · δ(v − v₀)/4πv₀² . (8.5.14)

Performing the inverse Laplace and Fourier transforms, we obtain

f(r, v, t) = (4πDt)^{−3/2} e^{−r²/4Dt} · δ(v − v₀)/4πv₀² , (8.5.15)

where the diffusion constant is

D = (1/3) v₀² τ . (8.5.16)

The units are [D] = L²/T. Integrating over velocities, we have the density

n(r, t) = ∫d³v f(r, v, t) = (4πDt)^{−3/2} e^{−r²/4Dt} . (8.5.17)

Note that

∫d³r n(r, t) = 1 (8.5.18)

for all time. Total particle number is conserved!
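The diffusion constant D = v₀²τ/3 can be checked by simulating the underlying stochastic process directly (a sketch with made-up parameters, not part of the text): a particle moves at fixed speed v₀, and its direction is fully randomized at Poisson-distributed collision times with mean τ. At times t ≫ τ one expects ⟨r²⟩ ≈ 6Dt.

```python
import math, random

random.seed(1)

v0, tau = 1.0, 1.0       # illustrative units
t_final = 50.0 * tau     # long-time limit t >> tau
N = 4000                 # number of independent walkers

def random_direction():
    """Uniform random unit vector on the sphere."""
    z = random.uniform(-1.0, 1.0)
    phi = random.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)

msd = 0.0
for _ in range(N):
    x = y = z = 0.0
    t = 0.0
    while t < t_final:
        # free flight of exponentially distributed duration, truncated at t_final
        dt = min(random.expovariate(1.0 / tau), t_final - t)
        ux, uy, uz = random_direction()
        x += v0 * ux * dt
        y += v0 * uy * dt
        z += v0 * uz * dt
        t += dt
    msd += (x * x + y * y + z * z) / N

D = v0 * v0 * tau / 3.0
msd_theory = 6.0 * D * t_final
```

With the seed fixed, the sampled mean-square displacement lands within a few percent of 6Dt (the exact finite-time result for this process is 6Dt − 2v₀²τ², a 2% correction at t = 50τ).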

This page titled 8.5: Diffusion and the Lorentz model is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

8.6: Linearized Boltzmann Equation
Linearizing the collision integral
We now return to the classical Boltzmann equation and consider a more formal treatment of the collision term in the linear
approximation. We will assume time-reversal symmetry, in which case

(∂f/∂t)_coll = ∫d³p₁ ∫d³p′ ∫d³p₁′ w(p′, p₁′ | p, p₁) {f(p′) f(p₁′) − f(p) f(p₁)} . (8.6.1)

The collision integral is nonlinear in the distribution f. We linearize by writing

f(p) = f⁰(p) + f⁰(p) ψ(p) , (8.6.2)

where we assume ψ(p) is small. We then have, to first order in ψ,

(∂f/∂t)_coll = f⁰(p) L̂ψ + O(ψ²) , (8.6.3)

where the action of the linearized collision operator is given by

L̂ψ = ∫d³p₁ ∫d³p′ ∫d³p₁′ w(p′, p₁′ | p, p₁) f⁰(p₁) {ψ(p′) + ψ(p₁′) − ψ(p) − ψ(p₁)}
= ∫d³p₁ ∫dΩ (∂σ/∂Ω) |v − v₁| f⁰(p₁) {ψ(p′) + ψ(p₁′) − ψ(p) − ψ(p₁)} ,

where we have invoked Equation [BEsig] to write the RHS in terms of the differential scattering cross section. In deriving the
above result, we have made use of the detailed balance relation,

f⁰(p) f⁰(p₁) = f⁰(p′) f⁰(p₁′) . (8.6.4)

We have also suppressed the r dependence in writing f(p), f⁰(p), and ψ(p).
From Equation [bwig], we then have the linearized equation

(L̂ − ∂_t) ψ = Y , (8.6.5)

where, for point particles,

Y = (1/k_BT) { ((ε(p) − c_pT)/T) v·∇T + m v_α v_β Q_{αβ} − (k_B ε(p)/c_V) ∇·V − F·v } . (8.6.6)

Equation [LBE] is an inhomogeneous linear equation, which can be solved by inverting the operator L̂ − ∂_t.

Linear algebraic properties of L̂

Although L̂ is an integral operator, it shares many properties with other linear operators with which you are familiar, such as
matrices and differential operators. We can define an inner product9,

⟨ψ₁ | ψ₂⟩ ≡ ∫d³p f⁰(p) ψ₁(p) ψ₂(p) . (8.6.7)

Note that this is not the usual Hilbert space inner product from quantum mechanics, since the factor f⁰(p) is included in the
metric. This is necessary in order that L̂ be self-adjoint:

⟨ψ₁ | L̂ψ₂⟩ = ⟨L̂ψ₁ | ψ₂⟩ . (8.6.8)

We can now define the spectrum of normalized eigenfunctions of L̂, which we write as ϕₙ(p). The eigenfunctions satisfy the
eigenvalue equation,

L̂ϕₙ = −λₙ ϕₙ , (8.6.9)

and may be chosen to be orthonormal,

⟨ϕₘ | ϕₙ⟩ = δₘₙ . (8.6.10)

Of course, in order to obtain the eigenfunctions ϕₙ we must have detailed knowledge of the function w(p′, p₁′ | p, p₁).
.
Recall that there are five collisional invariants, which are the particle number, the three components of the total particle
momentum, and the particle energy. To each collisional invariant, there is an associated eigenfunction ϕₙ with eigenvalue λₙ = 0.
One can check that these normalized eigenfunctions are

ϕₙ(p) = 1/√n

ϕ_{p_α}(p) = p_α/√(n m k_BT)

ϕ_ε(p) = √(2/3n) (ε(p)/k_BT − 3/2) .

If there are no temperature, chemical potential, or bulk velocity gradients, and there are no external forces, then Y = 0 and the only
changes to the distribution are from collisions. The linearized Boltzmann equation becomes

∂ψ/∂t = L̂ψ . (8.6.11)

We can therefore write the most general solution in the form

ψ(p, t) = Σ′ₙ Cₙ ϕₙ(p) e^{−λₙt} , (8.6.12)

where the prime on the sum reminds us that collisional invariants are to be excluded. All the eigenvalues λₙ, aside from the five
zero eigenvalues for the collisional invariants, must be positive. Any negative eigenvalue would cause ψ(p, t) to increase without
bound, and an initial nonequilibrium distribution would not relax to the equilibrium f⁰(p), which we regard as unphysical.
Henceforth we will drop the prime on the sum but remember that Cₙ = 0 for the five collisional invariants.

Recall also the particle, energy, and thermal (heat) currents,

j = ∫d³p v f(p) = ∫d³p f⁰(p) v ψ(p) = ⟨v | ψ⟩
j_ε = ∫d³p v ε f(p) = ∫d³p f⁰(p) v ε ψ(p) = ⟨v ε | ψ⟩
j_q = ∫d³p v (ε − μ) f(p) = ∫d³p f⁰(p) v (ε − μ) ψ(p) = ⟨v (ε − μ) | ψ⟩ .

Note j_q = j_ε − μj.

Steady state solution to the linearized Boltzmann equation

Under steady state conditions, there is no time dependence, and the linearized Boltzmann equation takes the form

L̂ψ = Y . (8.6.13)

We may expand ψ in the eigenfunctions ϕₙ and write ψ = Σₙ Cₙ ϕₙ. Applying L̂ and taking the inner product with ϕⱼ, we have

Cⱼ = −(1/λⱼ) ⟨ϕⱼ | Y⟩ . (8.6.14)

Thus, the formal solution to the linearized Boltzmann equation is

ψ(p) = −Σₙ (1/λₙ) ⟨ϕₙ | Y⟩ ϕₙ(p) . (8.6.15)

This solution is applicable provided | Y⟩ is orthogonal to the five collisional invariants.
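The structure of this formal solution — expand in eigenvectors, drop the zero modes, divide by the eigenvalues — is the same as solving a singular symmetric linear system by spectral decomposition. A finite-dimensional toy sketch (the 5×5 operator below is made up, standing in for L̂, and the f⁰-weighted inner product becomes an ordinary dot product):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal "eigenfunctions" phi_n (columns of Q) and eigenvalues lambda_n,
# with one zero mode standing in for a collisional invariant.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
lam = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
L = Q @ np.diag(-lam) @ Q.T          # L phi_n = -lambda_n phi_n

# Inhomogeneity Y, projected orthogonal to the zero mode
# (the solvability condition stated in the text):
Y = rng.standard_normal(5)
zero_mode = Q[:, 0]
Y = Y - (zero_mode @ Y) * zero_mode

# psi = -sum_n' <phi_n|Y>/lambda_n phi_n, excluding the zero mode:
psi = np.zeros(5)
for nn in range(1, 5):
    psi -= ((Q[:, nn] @ Y) / lam[nn]) * Q[:, nn]

residual = np.linalg.norm(L @ psi - Y)
```

The residual vanishes to machine precision, and the solution has no component along the zero mode.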

Thermal conductivity
For the thermal conductivity, we take ∇T = ∂ₓT x̂, and

Y = (1/k_BT²) (∂T/∂x) X_κ , (8.6.16)

where X_κ ≡ (ε − c_pT) v_x. Under the conditions of no particle flow (j = 0), we have j_q = −κ ∂ₓT x̂. Then we have

⟨X_κ | ψ⟩ = −κ (∂T/∂x) . (8.6.17)

Viscosity
For the viscosity, we take

Y = (m/k_BT) (∂V_x/∂y) X_η , (8.6.18)

with X_η = v_x v_y. We then have

Π_{xy} = ⟨m v_x v_y | ψ⟩ = −η (∂V_x/∂y) . (8.6.19)

Thus,

⟨X_η | ψ⟩ = −(η/m) (∂V_x/∂y) . (8.6.20)

Variational approach
Following the treatment in chapter 1 of Smith and Jensen, define Ĥ ≡ −L̂. We have that Ĥ is a positive semidefinite operator,
whose only zero eigenvalues correspond to the collisional invariants. We then have the Schwarz inequality,

⟨ψ | Ĥ | ψ⟩ · ⟨ϕ | Ĥ | ϕ⟩ ≥ ⟨ϕ | Ĥ | ψ⟩² , (8.6.21)

for any two Hilbert space vectors | ψ⟩ and | ϕ⟩. Consider now the above calculation of the thermal conductivity. We have

Ĥψ = −(1/k_BT²) (∂T/∂x) X_κ (8.6.22)

and therefore

κ = (k_BT²/(∂T/∂x)²) ⟨ψ | Ĥ | ψ⟩ ≥ (1/k_BT²) ⟨ϕ | X_κ⟩²/⟨ϕ | Ĥ | ϕ⟩ . (8.6.23)

Similarly, for the viscosity, we have

Ĥψ = −(m/k_BT) (∂V_x/∂y) X_η , (8.6.24)

from which we derive

η = (k_BT/(∂V_x/∂y)²) ⟨ψ | Ĥ | ψ⟩ ≥ (m²/k_BT) ⟨ϕ | X_η⟩²/⟨ϕ | Ĥ | ϕ⟩ . (8.6.25)

In order to get a good lower bound, we want ϕ in each case to have a good overlap with X_{κ,η}. One approach then is to take
ϕ = X_{κ,η}, which guarantees that the overlap will be finite (and not zero due to symmetry, for example). We illustrate this method
with the viscosity calculation. We have

η ≥ (m²/k_BT) ⟨v_x v_y | v_x v_y⟩²/⟨v_x v_y | Ĥ | v_x v_y⟩ . (8.6.26)

Now the linearized collision operator L̂ acts as

⟨ϕ | L̂ | ψ⟩ = ∫d³p f⁰(p) ϕ(p) ∫d³p₁ ∫dΩ (∂σ/∂Ω) |v − v₁| f⁰(p₁) {ψ(p) + ψ(p₁) − ψ(p′) − ψ(p₁′)} . (8.6.27)

Here the kinematics of the collision guarantee total energy and momentum conservation, so p′ and p₁′ are determined as in
Equation [finalps].
Now we have

dΩ = sin χ dχ dφ , (8.6.28)

where χ is the scattering angle depicted in Fig. [scat_impact] and φ is the azimuthal angle of the scattering. The differential
scattering cross section is obtained by elementary mechanics and is known to be

∂σ/∂Ω = | d(b²/2)/d cos χ | , (8.6.29)

where b is the impact parameter. The scattering angle is

χ(b, u) = π − 2 ∫_{r_p}^{∞} dr b/√(r⁴ − b²r² − 2U(r)r⁴/m̃u²) , (8.6.30)

where m̃ = (1/2)m is the reduced mass, and r_p is the relative coordinate separation at periapsis, the distance of closest approach,
which occurs when ṙ = 0,

(1/2) m̃ u² = ℓ²/2m̃r_p² + U(r_p) , (8.6.31)

where ℓ = m̃ub is the relative coordinate angular momentum.

[scat_impact] Scattering in the CM frame. O is the force center and P is the point of periapsis. The impact parameter is b, and χ is
the scattering angle. ϕ₀ is the angle through which the relative coordinate moves between periapsis and infinity.

We work in center-of-mass coordinates, so the velocities are

v = V + (1/2)u    v′ = V + (1/2)u′
v₁ = V − (1/2)u    v₁′ = V − (1/2)u′ ,

with |u| = |u′| and û·û′ = cos χ. Then if ψ(p) = v_x v_y, we have

Δ(ψ) ≡ ψ(p) + ψ(p₁) − ψ(p′) − ψ(p₁′) = (1/2)(u_x u_y − u_x′ u_y′) . (8.6.32)

We may write

u′ = u (sin χ cos φ ê₁ + sin χ sin φ ê₂ + cos χ ê₃) , (8.6.33)

where ê₃ = û. With this parameterization, we have

∫₀^{2π} dφ (u_α u_β − u_α′ u_β′) = −π sin²χ (u² δ_{αβ} − 3 u_α u_β) . (8.6.34)

Note that we have used here the relation

ê₁α ê₁β + ê₂α ê₂β + ê₃α ê₃β = δ_{αβ} , (8.6.35)

which holds since the LHS is a projector Σᵢ₌₁³ |êᵢ⟩⟨êᵢ|.
It is convenient to define the following integral:

R(u) ≡ ∫₀^∞ db b sin²χ(b, u) . (8.6.36)

Since the Jacobian

| det ∂(v, v₁)/∂(V, u) | = 1 , (8.6.37)

we have

⟨v_x v_y | L̂ | v_x v_y⟩ = n² (m/2πk_BT)³ ∫d³V ∫d³u e^{−mV²/k_BT} e^{−mu²/4k_BT} · u · (3π/2) u_x u_y R(u) · v_x v_y . (8.6.38)

This yields

⟨v_x v_y | L̂ | v_x v_y⟩ = (π/40) n² ⟨u⁵ R(u)⟩ , (8.6.39)

where
∞ ∞

2 2
2 −mu /4 kB T 2 −mu /4 kB T
⟨F (u)⟩ ≡ ∫ du u e F (u)/ ∫ du u e . (8.6.40)

0 0

It is easy to compute the term in the numerator of Equation [varvisc]:

⟨v_x v_y | v_x v_y⟩ = n (m/2πk_BT)^{3/2} ∫d³v e^{−mv²/2k_BT} v_x² v_y² = n (k_BT/m)² . (8.6.41)

Putting it all together, we find

η ≥ (40/π) (k_BT)³ / (m² ⟨u⁵ R(u)⟩) . (8.6.42)

The computation for κ is a bit more tedious. One has ψ(p) = (ε − c_pT) v_x, in which case

Δ(ψ) = (m/2) [(V·u) u_x − (V·u′) u_x′] . (8.6.43)

Ultimately, one obtains the lower bound

κ ≥ (150/π) k_B (k_BT)³ / (m³ ⟨u⁵ R(u)⟩) . (8.6.44)

Thus, independent of the potential, this variational calculation yields a Prandtl number of

Pr = ν/a = η c_p/mκ = 2/3 , (8.6.45)

which is very close to what is observed in dilute monatomic gases (see Tab. [Prandtl]).
While the variational expressions for η and κ are complicated functions of the potential, for hard sphere scattering the calculation
is simple, because b = d sin ϕ₀ = d cos(½χ), where d is the hard sphere diameter. Thus, the impact parameter b is independent of
the relative speed u, and one finds R(u) = (1/3)d². Then

⟨u⁵ R(u)⟩ = (1/3) d² ⟨u⁵⟩ = (128/√π) (k_BT/m)^{5/2} d² (8.6.46)

and one finds

η ≥ (5/16√π) (m k_BT)^{1/2}/d² ,    κ ≥ (75/64√π) (k_B/d²) (k_BT/m)^{1/2} . (8.6.47)
This page titled 8.6: Linearized Boltzmann Equation is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

8.7: The Equations of Hydrodynamics
We now derive the equations governing fluid flow. The equations of mass and momentum balance are

∂ρ/∂t + ∇·(ρV) = 0

∂(ρV_α)/∂t + ∂Π_{αβ}/∂x^β = 0 ,

where

Π_{αβ} = ρ V_α V_β + p δ_{αβ} − {η (∂V_α/∂x^β + ∂V_β/∂x^α − (2/3) ∇·V δ_{αβ}) + ζ ∇·V δ_{αβ}} ,

and the term in braces is the viscous stress tensor σ̃_{αβ}.
Substituting the continuity equation into the momentum balance equation, one arrives at

ρ ∂V/∂t + ρ (V·∇)V = −∇p + η ∇²V + (ζ + (1/3)η) ∇(∇·V) , (8.7.1)

which, together with continuity, are known as the Navier-Stokes equations. These equations are supplemented by an equation
describing the conservation of energy,

T (∂s/∂t) + T ∇·(sV) = σ̃_{αβ} (∂V_α/∂x^β) + ∇·(κ∇T) . (8.7.2)

Note that the LHS of Equation [NSB] is ρ DV/Dt, where D/Dt is the convective derivative. Multiplying by a differential
volume, this gives the mass times the acceleration of a differential local fluid element. The RHS, multiplied by the same differential
volume, gives the differential force on this fluid element in a frame instantaneously moving with constant velocity V . Thus, this is
Newton’s Second Law for the fluid.

This page titled 8.7: The Equations of Hydrodynamics is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

8.8: Nonequilibrium Quantum Transport
Boltzmann equation for quantum systems
Almost everything we have derived thus far can be applied, mutatis mutandis, to quantum systems. The main difference is that the
distribution f⁰ corresponding to local equilibrium is no longer of the Maxwell-Boltzmann form, but rather of the Bose-Einstein or
Fermi-Dirac form,

f⁰(r, k, t) = {exp((ε(k) − μ(r, t))/k_BT(r, t)) ∓ 1}⁻¹ , (8.8.1)

where the top sign applies to bosons and the bottom sign to fermions. Here we shift to the more common notation for quantum
systems in which we write the distribution in terms of the wavevector k = p/ℏ rather than the momentum p. The quantum
distributions satisfy detailed balance with respect to the quantum collision integral

(∂f/∂t)_coll = ∫(d³k₁/(2π)³) ∫(d³k′/(2π)³) ∫(d³k₁′/(2π)³) w {f′ f₁′ (1 ± f)(1 ± f₁) − f f₁ (1 ± f′)(1 ± f₁′)} (8.8.2)

where w = w(k, k₁ | k′, k₁′), f = f(k), f₁ = f(k₁), f′ = f(k′), and f₁′ = f(k₁′), and where we have assumed time-reversal and
parity symmetry. Detailed balance requires

(f/(1 ± f)) · (f₁/(1 ± f₁)) = (f′/(1 ± f′)) · (f₁′/(1 ± f₁′)) , (8.8.3)

where f = f⁰ is the equilibrium distribution. One can check that

f = 1/(e^{β(ε−μ)} ∓ 1) ⟹ f/(1 ± f) = e^{β(μ−ε)} , (8.8.4)

which is the Boltzmann distribution, which we have already shown to satisfy detailed balance. For the streaming term, we have

df⁰ = k_BT (∂f⁰/∂ε) d((ε − μ)/k_BT)
= k_BT (∂f⁰/∂ε) {−dμ/k_BT − (ε − μ)dT/k_BT² + dε/k_BT} .

For elastic scattering from a one-body potential U, the collision integral takes the form

(∂f/∂t)_coll = (2π/ℏ) Σ_{k′} |⟨k′|U|k⟩|² (f(k′) − f(k)) δ(ε(k) − ε(k′))
= (2π/ℏV) ∫_{Ω̂} (d³k′/(2π)³) |Û(k − k′)|² (f(k′) − f(k)) δ(ε(k) − ε(k′)) .

The wavevectors are now restricted to the first Brillouin zone, and the dispersion ε(k) is no longer the
ballistic form ε = ℏ²k²/2m but rather the dispersion for electrons in a particular energy band (typically the valence band) of a
solid10. Note that f = f⁰ satisfies detailed balance with respect to one-body collisions as well11.
0

In the presence of a weak electric field E and a (not necessarily weak) magnetic field B, we have, within the relaxation time
approximation, f = f⁰ + δf with

∂δf/∂t − (e/ℏc) v×B · (∂δf/∂k) − v·[e𝓔 + ((ε − μ)/T) ∇T] (∂f⁰/∂ε) = −δf/τ , (8.8.5)

where 𝓔 = −∇(ϕ − μ/e) = E + e⁻¹∇μ is minus the gradient of the 'electrochemical potential' ϕ − e⁻¹μ. In deriving
the above equation, we have worked to lowest order in small quantities. This entails dropping terms like v·(∂δf/∂r) (higher order in
spatial derivatives) and E·(∂δf/∂k) (both E and δf are assumed small). Typically τ is energy-dependent, τ = τ(ε(k)).
We can use Equation [qlbe] to compute the electrical current j and the thermal current j_q,

j = −2e ∫_{Ω̂} (d³k/(2π)³) v δf

j_q = 2 ∫_{Ω̂} (d³k/(2π)³) (ε − μ) v δf .

Here the factor of 2 is from spin degeneracy of the electrons (we neglect Zeeman splitting).

In the presence of a time-independent temperature gradient and electric field, the linearized Boltzmann equation in the relaxation time
approximation has the solution

δf = −τ(ε) v·(e𝓔 + ((ε − μ)/T) ∇T) (−∂f⁰/∂ε) . (8.8.6)

We now consider both the electrical current12 j as well as the thermal current density j_q. One readily obtains

j = −2e ∫_{Ω̂} (d³k/(2π)³) v δf ≡ L₁₁ 𝓔 − L₁₂ ∇T

j_q = 2 ∫_{Ω̂} (d³k/(2π)³) (ε − μ) v δf ≡ L₂₁ 𝓔 − L₂₂ ∇T

where the transport coefficients L_{ij} are matrices:

L₁₁^{αβ} = (e²/4π³ℏ) ∫dε τ(ε) (−∂f⁰/∂ε) ∫dS_ε (v^α v^β/|v|)

L₂₁^{αβ} = T L₁₂^{αβ} = −(e/4π³ℏ) ∫dε τ(ε) (ε − μ) (−∂f⁰/∂ε) ∫dS_ε (v^α v^β/|v|)

L₂₂^{αβ} = (1/4π³ℏT) ∫dε τ(ε) (ε − μ)² (−∂f⁰/∂ε) ∫dS_ε (v^α v^β/|v|) .

If we define the hierarchy of integral expressions

J_n^{αβ} ≡ (1/4π³ℏ) ∫dε τ(ε) (ε − μ)ⁿ (−∂f⁰/∂ε) ∫dS_ε (v^α v^β/|v|) (8.8.7)

then we may write

L₁₁^{αβ} = e² J₀^{αβ} ,    L₂₁^{αβ} = T L₁₂^{αβ} = −e J₁^{αβ} ,    L₂₂^{αβ} = (1/T) J₂^{αβ} . (8.8.8)

The linear relations in Equation ([linrel1]) may be recast in the following form:

𝓔 = ρ j + Q ∇T
j_q = ⊓ j − κ ∇T ,

where the matrices ρ, Q, ⊓, and κ are given by

ρ = L₁₁⁻¹    Q = L₁₁⁻¹ L₁₂
⊓ = L₂₁ L₁₁⁻¹    κ = L₂₂ − L₂₁ L₁₁⁻¹ L₁₂ ,

or, in terms of the Jₙ,

ρ = (1/e²) J₀⁻¹    Q = −(1/eT) J₀⁻¹ J₁
⊓ = −(1/e) J₁ J₀⁻¹    κ = (1/T) (J₂ − J₁ J₀⁻¹ J₁) .

[thermocouple] A thermocouple is a junction formed of two dissimilar metals. With no electrical current passing, an electric field is
generated in the presence of a temperature gradient, resulting in a voltage V = V_A − V_B.

These equations describe a wealth of transport phenomena:

(∇T = B = 0) An electrical current j will generate an electric field 𝓔 = ρj, where ρ is the electrical resistivity.
(∇T = B = 0) An electrical current j will generate a heat current j_q = ⊓j, where ⊓ is the Peltier coefficient.
(j = B = 0) A temperature gradient ∇T gives rise to a heat current j_q = −κ∇T, where κ is the thermal conductivity.
(j = B = 0) A temperature gradient ∇T gives rise to an electric field 𝓔 = Q∇T, where Q is the Seebeck coefficient.
One practical way to measure the thermopower is to form a junction between two dissimilar metals, A and B. The junction is held
at temperature T₁ and the other ends of the metals are held at temperature T₀. One then measures a voltage difference between the
free ends of the metals – this is known as the Seebeck effect. Integrating the electric field from the free end of A to the free end of
B gives

V_A − V_B = −∫_A^B 𝓔·dl = (Q_B − Q_A)(T₁ − T₀) . (8.8.9)

What one measures here is really the difference in thermopowers of the two metals. For an absolute measurement of Q_A, replace B
by a superconductor (Q = 0 for a superconductor). A device which converts a temperature gradient into an emf is known as a
thermocouple.
The Peltier effect has practical applications in refrigeration technology. Suppose an electrical current I is passed through a junction
between two dissimilar metals, A and B. Due to the difference in Peltier coefficients, there will be a net heat current into the
junction of W = (⊓_A − ⊓_B) I. Note that this is proportional to I, rather than the familiar I² result
from Joule heating. The sign of W depends on the direction of the current. If a second junction is added, to make an ABA
configuration, then heat absorbed at the first junction will be liberated at the second.13
[peltier] A sketch of a Peltier effect refrigerator. An electrical current I is passed through a junction between two dissimilar metals.
If the dotted line represents the boundary of a thermally well-insulated body, then the body cools when ⊓_B > ⊓_A, in order to
maintain a heat current balance at the junction.

The Heat Equation


We begin with the continuity equations for charge density ρ and energy density ε:

∂ρ/∂t + ∇·j = 0

∂ε/∂t + ∇·j_ε = j·E ,

where E is the electric field14. Now we invoke local thermodynamic equilibrium and write

∂ε/∂t = (∂ε/∂n)(∂n/∂t) + (∂ε/∂T)(∂T/∂t)
= −(μ/e)(∂ρ/∂t) + c_V (∂T/∂t) ,

where n is the electron number density (n = −ρ/e) and c_V is the specific heat. We may now write

c_V (∂T/∂t) = ∂ε/∂t + (μ/e)(∂ρ/∂t)
= j·E − ∇·j_ε − (μ/e) ∇·j
= j·𝓔 − ∇·j_q .

Invoking j_q = ⊓j − κ∇T, we see that if there is no electrical current (j = 0), we obtain the heat equation

c_V (∂T/∂t) = κ_{αβ} (∂²T/∂x^α ∂x^β) . (8.8.10)

This results in a time scale τ_T for temperature diffusion τ_T = CL²c_V/κ, where L is a typical length scale and C is a numerical
constant. For a cube of size L subjected to a sudden external temperature change, L is the side length and C = 1/3π² (solve by
separation of variables).
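The separation-of-variables decay rate behind that estimate can be illustrated with a small explicit finite-difference integration of the one-dimensional heat equation ∂T/∂t = a ∂²T/∂x² (arbitrary units, not tied to any material): with fixed ends, the fundamental mode sin(πx/L) decays as e^{−aπ²t/L²}.

```python
import math

# 1D heat equation on [0, L] with T = 0 at both ends; the fundamental
# mode sin(pi x / L) should decay as exp(-a pi^2 t / L^2).
a, L = 1.0, 1.0
N = 50                       # interior resolution
dx = L / N
dt = 0.2 * dx * dx / a       # stable explicit time step
steps = int(round(0.05 / dt))

u = [math.sin(math.pi * i * dx / L) for i in range(N + 1)]
for _ in range(steps):
    new = u[:]
    for i in range(1, N):
        new[i] = u[i] + a * dt / dx**2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    u = new

t = steps * dt
decay_expected = math.exp(-a * math.pi**2 * t / L**2)
decay_measured = max(u)      # amplitude of the (still sinusoidal) profile
```

The measured amplitude agrees with the analytic decay factor to well under a percent; in three dimensions the slowest cube mode has three such factors, giving the 1/3π² quoted above.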

Calculation of Transport Coefficients
We will henceforth assume that sufficient crystalline symmetry exists (cubic symmetry) to render all the transport coefficients
multiples of the identity matrix. Under such conditions, we may write J_n^{αβ} = J_n δ_{αβ} with

J_n = (1/12π³ℏ) ∫dε τ(ε) (ε − μ)ⁿ (−∂f⁰/∂ε) ∫dS_ε |v| . (8.8.11)

The low-temperature behavior is extracted using the Sommerfeld expansion,

I ≡ ∫_{−∞}^{∞} dε H(ε) (−∂f⁰/∂ε) = πD csc(πD) H(ε)|_{ε=μ}
= H(μ) + (π²/6)(k_BT)² H″(μ) + … ,

where D ≡ k_BT (∂/∂ε) is a dimensionless differential operator.15
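The expansion is easy to verify numerically for a concrete choice such as H(ε) = ε^{3/2} (the values of μ and k_BT below are illustrative, with μ ≫ k_BT):

```python
import math

mu, kT = 1.0, 0.02     # illustrative values, mu >> kT

def minus_df0_de(e):
    """-df0/de for the Fermi function: beta / (4 cosh^2(beta (e - mu)/2))."""
    x = (e - mu) / (2.0 * kT)
    return 1.0 / (4.0 * kT * math.cosh(x) ** 2)

# trapezoidal rule over a window of +/- 40 kT about mu (tails are negligible)
npts, half = 200000, 40.0 * kT
h = 2.0 * half / npts
I = 0.0
for i in range(npts + 1):
    e = mu - half + i * h
    wgt = 0.5 if i in (0, npts) else 1.0
    I += wgt * h * e ** 1.5 * minus_df0_de(e)

H_mu  = mu ** 1.5                 # H(mu)  for H(e) = e^{3/2}
H2_mu = 0.75 / math.sqrt(mu)      # H''(mu)
sommerfeld = H_mu + (math.pi ** 2 / 6.0) * kT ** 2 * H2_mu
```

The numerical integral matches the two-term Sommerfeld result to the expected O((k_BT)⁴) accuracy.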
Let us now perform some explicit calculations in the case of a parabolic band with an energy-independent scattering time τ. In this
case, one readily finds

J_n = (σ₀/e²) μ^{−3/2} πD csc(πD) ε^{3/2} (ε − μ)ⁿ |_{ε=μ} , (8.8.12)

where σ₀ = ne²τ/m*. Thus,

J₀ = (σ₀/e²) [1 + (π²/8)(k_BT)²/μ² + …]

J₁ = (σ₀/e²) (π²/2)(k_BT)²/μ + …

J₂ = (σ₀/e²) (π²/3)(k_BT)² + … ,

from which we obtain the low-T results ρ = σ₀⁻¹,

Q = −(π²/2) k_B²T/(e ε_F)        κ = (π²/3) (nτ/m*) k_B²T ,

and of course ⊓ = TQ. The predicted universal ratio

κ/σT = (π²/3) (k_B/e)² = 2.45 × 10⁻⁸ V² K⁻² ,

is known as the Wiedemann-Franz law. Note also that our result for the thermopower is unambiguously negative. In actuality,
several nearly free electron metals have positive low-temperature thermopowers (Cs and Li, for example). What went wrong? We
have neglected electron-phonon scattering!
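The Lorenz number itself is a pure combination of constants and can be evaluated in one line:

```python
import math

kB = 1.380649e-23     # Boltzmann constant, J/K
ee = 1.602176634e-19  # elementary charge, C

# Wiedemann-Franz ratio kappa/(sigma T) = (pi^2/3)(kB/e)^2, in V^2/K^2
lorenz = (math.pi ** 2 / 3.0) * (kB / ee) ** 2
```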

Onsager Relations
Transport phenomena are described in general by a set of linear relations,

Jᵢ = Lᵢₖ Fₖ , (8.8.13)

where the {Fₖ} are generalized forces and the {Jᵢ} are generalized currents. Moreover, to each force Fᵢ corresponds a unique
conjugate current Jᵢ, such that the rate of internal entropy production is

Ṡ = Σᵢ Fᵢ Jᵢ ⟹ Fᵢ = ∂Ṡ/∂Jᵢ . (8.8.14)

The Onsager relations (also known as Onsager reciprocity) state that

$$L_{ik}(\mathbf{B}) = \eta_i\,\eta_k\, L_{ki}(-\mathbf{B})\ , \tag{8.8.15}$$

where $\eta_i$ describes the parity of $J_i$ under time reversal:

$$J_i^{\mathsf T} = \eta_i\, J_i\ , \tag{8.8.16}$$

where $J_i^{\mathsf T}$ is the time reverse of $J_i$. To justify the Onsager relations requires a microscopic description of our nonequilibrium system.
The Onsager relations have some remarkable consequences. For example, they require, for $\mathbf{B} = 0$, that the thermal conductivity tensor $\kappa_{ij}$ of any crystal must be symmetric, independent of the crystal structure. In general, this result does not follow from considerations of crystalline symmetry. It also requires that for every 'off-diagonal' transport phenomenon (e.g. the Seebeck effect) there exists a distinct corresponding phenomenon (e.g. the Peltier effect).
For the transport coefficients studied, Onsager reciprocity means that in the presence of an external magnetic field,

$$\rho_{\alpha\beta}(\mathbf{B}) = \rho_{\beta\alpha}(-\mathbf{B}) \qquad\quad \kappa_{\alpha\beta}(\mathbf{B}) = \kappa_{\beta\alpha}(-\mathbf{B}) \qquad\quad \Pi_{\alpha\beta}(\mathbf{B}) = T\, Q_{\beta\alpha}(-\mathbf{B})\ .$$

Let's consider an isotropic system in a weak magnetic field, and expand the transport coefficients to first order in $\mathbf{B}$:

$$\rho_{\alpha\beta}(\mathbf{B}) = \rho\,\delta_{\alpha\beta} + \nu\,\epsilon_{\alpha\beta\gamma}\, B^\gamma \qquad\quad \kappa_{\alpha\beta}(\mathbf{B}) = \kappa\,\delta_{\alpha\beta} + \varpi\,\epsilon_{\alpha\beta\gamma}\, B^\gamma$$
$$Q_{\alpha\beta}(\mathbf{B}) = Q\,\delta_{\alpha\beta} + \zeta\,\epsilon_{\alpha\beta\gamma}\, B^\gamma \qquad\quad \Pi_{\alpha\beta}(\mathbf{B}) = \Pi\,\delta_{\alpha\beta} + \theta\,\epsilon_{\alpha\beta\gamma}\, B^\gamma\ .$$

Onsager reciprocity requires $\Pi = TQ$ and $\theta = T\zeta$. We can now write

$$\mathbf{E} = \rho\,\mathbf{j} + \nu\,\mathbf{j}\times\mathbf{B} + Q\,\nabla T + \zeta\,\nabla T\times\mathbf{B}$$
$$\mathbf{j}_q = \Pi\,\mathbf{j} + \theta\,\mathbf{j}\times\mathbf{B} - \kappa\,\nabla T - \varpi\,\nabla T\times\mathbf{B}\ .$$

There are several new phenomena lurking:

- ($\partial T/\partial x = \partial T/\partial y = j_y = 0$) An electrical current $\mathbf{j} = j_x\,\hat{\mathbf{x}}$ and a field $\mathbf{B} = B_z\,\hat{\mathbf{z}}$ yield an electric field $\mathbf{E}$. The Hall coefficient is $R_H = E_y/j_x B_z = -\nu$.
- ($\partial T/\partial x = j_y = j_{q,y} = 0$) An electrical current $\mathbf{j} = j_x\,\hat{\mathbf{x}}$ and a field $\mathbf{B} = B_z\,\hat{\mathbf{z}}$ yield a temperature gradient $\partial T/\partial y$. The Ettingshausen coefficient is $P = \frac{\partial T}{\partial y}\big/ j_x B_z = -\theta/\kappa$.
- ($j_x = j_y = \partial T/\partial y = 0$) A temperature gradient $\nabla T = \frac{\partial T}{\partial x}\,\hat{\mathbf{x}}$ and a field $\mathbf{B} = B_z\,\hat{\mathbf{z}}$ yield an electric field $\mathbf{E}$. The Nernst coefficient is $\Lambda = E_y\big/\frac{\partial T}{\partial x} B_z = -\zeta$.
- ($j_x = j_y = E_y = 0$) A temperature gradient $\nabla T = \frac{\partial T}{\partial x}\,\hat{\mathbf{x}}$ and a field $\mathbf{B} = B_z\,\hat{\mathbf{z}}$ yield an orthogonal temperature gradient $\partial T/\partial y$. The Righi-Leduc coefficient is $\mathcal{L} = \frac{\partial T}{\partial y}\big/\frac{\partial T}{\partial x} B_z = \zeta/Q$.

This page titled 8.8: Nonequilibrium Quantum Transport is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

8.9: Stochastic Processes
A stochastic process is one which is partially random, i.e. not wholly deterministic. Typically the randomness is due to phenomena at the microscale, such as the effect of fluid molecules on a small particle, like a piece of dust in the air. The resulting motion (called Brownian motion in the case of particles moving in a fluid) can be described only in a statistical sense. That is, the full motion of the system is a functional of one or more independent random variables. The motion is then described by its averages with respect to the various random distributions.

Langevin equation and Brownian motion


Consider a particle of mass M subjected to dissipative and random forcing. We’ll examine this system in one dimension to gain an
understanding of the essential physics. We write
$$\dot p + \gamma p = F + \eta(t)\ . \tag{8.9.1}$$

Here, γ is the damping rate due to friction, $F$ is a constant external force, and $\eta(t)$ is a stochastic random force. This equation, known as the Langevin equation, describes a ballistic particle being buffeted by random forcing events. Think of a particle of dust as it moves in the atmosphere; $F$ would then represent the external force due to gravity and $\eta(t)$ the random forcing due to interaction with the air molecules. For a sphere of radius $a$ moving with velocity $v$ in a fluid, the Stokes drag is given by $F_{\rm drag} = -6\pi\eta a v$, where $a$ is the radius. Thus,

$$\gamma_{\rm Stokes} = \frac{6\pi\eta a}{M}\ , \tag{8.9.2}$$

where $M$ is the mass of the particle. It is illustrative to compute γ in some setting. Consider a micron sized droplet ($a = 10^{-4}$ cm) of some liquid of density $\rho \sim 1.0\ \mathrm{g/cm^3}$ moving in air at $T = 20^\circ$ C. The viscosity of air is $\eta = 1.8\times 10^{-4}\ \mathrm{g/cm\cdot s}$ at this temperature.16 If the droplet density is constant, then $\gamma = 9\eta/2\rho a^2 = 8.1\times 10^4\ \mathrm{s^{-1}}$, hence the time scale for viscous relaxation of the particle is $\tau = \gamma^{-1} = 12\ \mu\mathrm{s}$. We should stress that the viscous damping on the particle is of course due to the fluid molecules, in some average 'coarse-grained' sense. The random component to the force $\eta(t)$ would then represent the fluctuations with respect to this average.
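The droplet estimate above is easy to reproduce numerically (CGS units, values as quoted in the text):

```python
# Droplet damping rate gamma = 9*eta/(2*rho*a^2) and relaxation time tau = 1/gamma
# for a micron-sized droplet in air, in CGS units.
eta = 1.8e-4   # g/(cm s), viscosity of air at 20 C
rho = 1.0      # g/cm^3, droplet density
a   = 1.0e-4   # cm, droplet radius

gamma = 9 * eta / (2 * rho * a**2)   # s^-1
tau   = 1 / gamma                    # s
print(f"gamma = {gamma:.2e} s^-1, tau = {tau*1e6:.1f} microseconds")
```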
We can easily integrate this equation:

$$\frac{d}{dt}\big(p\, e^{\gamma t}\big) = F\, e^{\gamma t} + \eta(t)\, e^{\gamma t} \qquad\Longrightarrow\qquad p(t) = p(0)\, e^{-\gamma t} + \frac{F}{\gamma}\big(1 - e^{-\gamma t}\big) + \int_0^t\! ds\;\eta(s)\, e^{\gamma(s-t)}\ .$$

Note that $p(t)$ is indeed a functional of the random function $\eta(t)$. We can therefore only compute averages in order to describe the motion of the system.

The first average we will compute is that of $p$ itself. In so doing, we assume that $\eta(t)$ has zero mean: $\langle\eta(t)\rangle = 0$. Then

$$\langle p(t)\rangle = p(0)\, e^{-\gamma t} + \frac{F}{\gamma}\big(1 - e^{-\gamma t}\big)\ . \tag{8.9.3}$$

On the time scale $\gamma^{-1}$, the initial conditions $p(0)$ are effectively forgotten, and asymptotically for $t \gg \gamma^{-1}$ we have $\langle p(t)\rangle \to F/\gamma$, which is the terminal momentum.

Next, consider

$$\langle p^2(t)\rangle = \langle p(t)\rangle^2 + \int_0^t\! ds_1\!\int_0^t\! ds_2\; e^{\gamma(s_1-t)}\, e^{\gamma(s_2-t)}\,\langle\eta(s_1)\,\eta(s_2)\rangle\ .$$

We now need to know the two-time correlator $\langle\eta(s_1)\,\eta(s_2)\rangle$. We assume that the correlator is a function only of the time difference $\Delta s = s_1 - s_2$, so that the random force $\eta(s)$ satisfies

$$\langle\eta(s)\rangle = 0 \qquad\qquad \langle\eta(s_1)\,\eta(s_2)\rangle = \phi(s_1 - s_2)\ .$$

The function $\phi(s)$ is the autocorrelation function of the random force. A macroscopic object moving in a fluid is constantly buffeted by fluid particles over its entire perimeter. These different fluid particles are almost completely uncorrelated, hence $\phi(s)$ is basically zero except on a very small time scale $\tau_\phi$, which is the time a single fluid particle spends interacting with the object. We can take $\tau_\phi \to 0$ and approximate

$$\phi(s) \approx \Gamma\,\delta(s)\ . \tag{8.9.4}$$

We shall determine the value of Γ from equilibrium thermodynamic considerations below.


With this form for $\phi(s)$, we can easily calculate the equal time momentum autocorrelation:

$$\langle p^2(t)\rangle = \langle p(t)\rangle^2 + \Gamma\int_0^t\! ds\; e^{2\gamma(s-t)} = \langle p(t)\rangle^2 + \frac{\Gamma}{2\gamma}\big(1 - e^{-2\gamma t}\big)\ .$$

Consider the case where $F = 0$ and the limit $t \gg \gamma^{-1}$. We demand that the object thermalize at temperature $T$. Thus, we impose the condition

$$\left\langle \frac{p^2(t)}{2M}\right\rangle = \frac{1}{2}\, k_B T \qquad\Longrightarrow\qquad \Gamma = 2\gamma M k_B T\ , \tag{8.9.5}$$

where $M$ is the particle's mass. This determines the value of Γ.


We can now compute the general momentum autocorrelator:

$$\langle p(t)\, p(t')\rangle - \langle p(t)\rangle\langle p(t')\rangle = \int_0^t\! ds\int_0^{t'}\! ds'\; e^{\gamma(s-t)}\, e^{\gamma(s'-t')}\,\langle\eta(s)\,\eta(s')\rangle = M k_B T\, e^{-\gamma|t-t'|} \qquad \big(t, t' \to \infty\ ,\ |t-t'|\ \text{finite}\big)\ .$$

The full expressions for this and subsequent expressions, including subleading terms, are contained in an appendix, §14.
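The thermalization condition $\Gamma = 2\gamma M k_B T$ can be checked with a minimal Euler-Maruyama simulation (a sketch of ours, not from the text; all parameter values are illustrative):

```python
# Integrate p' = -gamma*p + eta(t) with <eta(t) eta(t')> = Gamma delta(t-t')
# and Gamma = 2 gamma M kB T; after many relaxation times, <p^2> -> M kB T.
import numpy as np

rng = np.random.default_rng(0)
gamma, M, kBT = 1.0, 1.0, 1.0
Gamma = 2 * gamma * M * kBT
dt, nsteps, nsamples = 0.01, 2000, 5000   # total time 20 >> 1/gamma

p = np.zeros(nsamples)
for _ in range(nsteps):
    # white noise integrated over dt has variance Gamma*dt
    p += -gamma * p * dt + np.sqrt(Gamma * dt) * rng.standard_normal(nsamples)

print(np.mean(p**2))   # should be close to M kB T = 1
```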
Let's now compute the position $x(t)$. We find

$$x(t) = \langle x(t)\rangle + \frac{1}{M}\int_0^t\! ds\int_0^s\! ds_1\;\eta(s_1)\, e^{\gamma(s_1-s)}\ , \tag{8.9.6}$$

where

$$\langle x(t)\rangle = x(0) + \frac{Ft}{\gamma M} + \frac{1}{\gamma}\left(v(0) - \frac{F}{\gamma M}\right)\big(1 - e^{-\gamma t}\big)\ . \tag{8.9.7}$$

Note that for $\gamma t \ll 1$ we have $\langle x(t)\rangle = x(0) + v(0)\, t + \frac{1}{2} M^{-1} F t^2 + O(t^3)$, as is appropriate for ballistic particles moving under the influence of a constant force. The long time limit of course agrees with our earlier evaluation for the terminal velocity, $v_\infty = \langle p(\infty)\rangle/M = F/\gamma M$. We next compute the position autocorrelation:

$$\langle x(t)\, x(t')\rangle - \langle x(t)\rangle\langle x(t')\rangle = \frac{1}{M^2}\int_0^t\! ds\int_0^{t'}\! ds'\; e^{-\gamma(s+s')}\int_0^s\! ds_1\int_0^{s'}\! ds_2\; e^{\gamma(s_1+s_2)}\,\langle\eta(s_1)\,\eta(s_2)\rangle = \frac{2 k_B T}{\gamma M}\,\min(t,t') + O(1)\ .$$

In particular, the equal time autocorrelator is

$$\langle x^2(t)\rangle - \langle x(t)\rangle^2 = \frac{2 k_B T\, t}{\gamma M} \equiv 2 D t\ , \tag{8.9.8}$$

at long times, up to terms of order unity. Here,

$$D = \frac{k_B T}{\gamma M} \tag{8.9.9}$$

is the diffusion constant. For a liquid droplet of radius $a = 1\ \mu\mathrm{m}$ moving in air at $T = 293\ \mathrm{K}$, for which $\eta = 1.8\times 10^{-4}\ \mathrm{P}$, we have

$$D = \frac{k_B T}{6\pi\eta a} = \frac{(1.38\times 10^{-16}\ \mathrm{erg/K})\,(293\ \mathrm{K})}{6\pi\,(1.8\times 10^{-4}\ \mathrm{P})\,(10^{-4}\ \mathrm{cm})} = 1.19\times 10^{-7}\ \mathrm{cm^2/s}\ . \tag{8.9.10}$$

This result presumes that the droplet is large enough compared to the intermolecular distance in the fluid that one can adopt a continuum approach and use the Navier-Stokes equations, and then assuming a laminar flow.
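Equation 8.9.10 is a one-liner to reproduce in CGS units:

```python
# Stokes-Einstein diffusion constant D = kB*T/(6*pi*eta*a), CGS units as in the text.
import math

kB  = 1.38e-16   # erg/K
T   = 293.0      # K
eta = 1.8e-4     # poise = g/(cm s)
a   = 1.0e-4     # cm (1 micron)

D = kB * T / (6 * math.pi * eta * a)
print(f"D = {D:.3e} cm^2/s")
```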
If we consider molecular diffusion, the situation is quite a bit different. As we shall derive below in §10.3, the molecular diffusion constant is $D = \ell^2/2\tau$, where $\ell$ is the mean free path and $\tau$ is the collision time. As we found in Equation [nutaueqn], the mean free path $\ell$, collision time $\tau$, number density $n$, and total scattering cross section $\sigma$ are related by

$$\ell = \bar v\,\tau = \frac{1}{\sqrt{2}\, n\sigma}\ , \tag{8.9.11}$$

where $\bar v = \sqrt{8 k_B T/\pi m}$ is the average particle speed. Approximating the particles as hard spheres, we have $\sigma = 4\pi a^2$, where $a$ is the hard sphere radius. At $T = 293\ \mathrm{K}$ and $p = 1\ \mathrm{atm}$, we have $n = p/k_B T = 2.51\times 10^{19}\ \mathrm{cm^{-3}}$. Since air is predominantly composed of $\mathrm{N}_2$ molecules, we take $a = 1.90\times 10^{-8}\ \mathrm{cm}$ and $m = 28.0\ \mathrm{amu} = 4.65\times 10^{-23}\ \mathrm{g}$, which are appropriate for $\mathrm{N}_2$. We find an average speed of $\bar v = 471\ \mathrm{m/s}$ and a mean free path of $\ell = 6.21\times 10^{-6}\ \mathrm{cm}$. Thus, $D = \frac{1}{2}\,\ell\,\bar v = 0.146\ \mathrm{cm^2/s}$. Though much larger than the diffusion constant for large droplets, this is still too small to explain common experiences. Suppose we set the characteristic distance scale at $d = 10\ \mathrm{cm}$ and we ask how much time a point source would take to diffuse out to this radius. The answer is $\Delta t = d^2/2D = 343\ \mathrm{s}$, which is between five and six minutes. Yet if someone in the next seat emits a foul odor, you sense the offending emission on the order of a second. What this tells us is that diffusion isn't the only transport process involved in these and like phenomena. More important are convection currents which distribute the scent much more rapidly.
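The chain of kinetic-theory estimates above can be reproduced in a few lines (CGS units, values as quoted in the text):

```python
# Mean free path, mean speed, D = l*vbar/2 for N2 at 293 K and 1 atm, and the
# diffusion time over d = 10 cm, all in CGS units.
import math

kB, T = 1.38e-16, 293.0          # erg/K, K
n = 2.51e19                      # cm^-3 at 1 atm
a, m = 1.90e-8, 4.65e-23         # cm, g (N2 hard-sphere radius and mass)

sigma = 4 * math.pi * a**2                      # hard-sphere cross section, cm^2
ell   = 1 / (math.sqrt(2) * n * sigma)          # mean free path, cm
vbar  = math.sqrt(8 * kB * T / (math.pi * m))   # mean speed, cm/s
D     = 0.5 * ell * vbar                        # cm^2/s
dt    = 10.0**2 / (2 * D)                       # diffusion time over 10 cm, s
print(ell, vbar / 100, D, dt)   # l in cm, vbar in m/s, D in cm^2/s, dt in s
```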

Langevin equation for a particle in a harmonic well


Consider next the equation

$$M\ddot X + \gamma M\dot X + M\omega_0^2\, X = F_0 + \eta(t)\ , \tag{8.9.12}$$

where $F_0$ is a constant force. We write $X = \dfrac{F_0}{M\omega_0^2} + x$ and measure $x$ relative to the potential minimum, yielding

$$\ddot x + \gamma\dot x + \omega_0^2\, x = \frac{1}{M}\,\eta(t)\ . \tag{8.9.13}$$

At this point there are several ways to proceed.

Perhaps the most straightforward is by use of the Laplace transform. Recall:

$$\hat x(\nu) = \int_0^\infty\! dt\; e^{-\nu t}\, x(t) \qquad\qquad x(t) = \int_C \frac{d\nu}{2\pi i}\; e^{+\nu t}\,\hat x(\nu)\ ,$$

where the contour $C$ proceeds from $a - i\infty$ to $a + i\infty$ such that all poles of the integrand lie to the left of $C$. We then have

$$\frac{1}{M}\int_0^\infty\! dt\; e^{-\nu t}\,\eta(t) = \frac{1}{M}\int_0^\infty\! dt\; e^{-\nu t}\,\big(\ddot x + \gamma\dot x + \omega_0^2\, x\big) = -(\nu+\gamma)\, x(0) - \dot x(0) + \big(\nu^2 + \gamma\nu + \omega_0^2\big)\,\hat x(\nu)\ .$$

Thus, we have

$$\hat x(\nu) = \frac{(\nu+\gamma)\, x(0) + \dot x(0)}{\nu^2 + \gamma\nu + \omega_0^2} + \frac{1}{M}\cdot\frac{1}{\nu^2 + \gamma\nu + \omega_0^2}\int_0^\infty\! dt\; e^{-\nu t}\,\eta(t)\ . \tag{8.9.14}$$

Now we may write

$$\nu^2 + \gamma\nu + \omega_0^2 = (\nu - \nu_+)(\nu - \nu_-)\ , \tag{8.9.15}$$

where

$$\nu_\pm = -\tfrac{1}{2}\gamma \pm \sqrt{\tfrac{1}{4}\gamma^2 - \omega_0^2}\ . \tag{8.9.16}$$

Note that $\mathrm{Re}(\nu_\pm) \le 0$ and that $\gamma + \nu_\pm = -\nu_\mp$.

Performing the inverse Laplace transform, we obtain

$$x(t) = \frac{x(0)}{\nu_+ - \nu_-}\big(\nu_+\, e^{\nu_- t} - \nu_-\, e^{\nu_+ t}\big) + \frac{\dot x(0)}{\nu_+ - \nu_-}\big(e^{\nu_+ t} - e^{\nu_- t}\big) + \int_0^\infty\! ds\; K(t-s)\,\eta(s)\ ,$$

where

$$K(t-s) = \frac{\Theta(t-s)}{M(\nu_+ - \nu_-)}\big(e^{\nu_+(t-s)} - e^{\nu_-(t-s)}\big) \tag{8.9.17}$$

is the response kernel and $\Theta(t-s)$ is the step function which is unity for $t > s$ and zero otherwise. The response is causal: $x(t)$ depends on $\eta(s)$ for all previous times $s < t$, but not for future times $s > t$. Note that $K(\tau)$ decays exponentially for $\tau \to \infty$, if $\mathrm{Re}(\nu_\pm) < 0$. The marginal case where $\omega_0 = 0$ and $\nu_+ = 0$ corresponds to the diffusion calculation we performed in the previous section.
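With the thermal noise strength Γ = 2γMk_BT found earlier, the harmonic well should equilibrate so that $\langle x^2\rangle = k_B T/M\omega_0^2$ (equipartition). A short simulation checks this (a sketch of ours, not from the text; parameter values are illustrative):

```python
# Simulate x'' + gamma x' + omega0^2 x = eta(t)/M with Gamma = 2 gamma M kB T,
# and check equipartition: Var(x) -> kB T / (M omega0^2).
import numpy as np

rng = np.random.default_rng(1)
M, gamma, omega0, kBT = 1.0, 1.0, 2.0, 1.0
Gamma = 2 * gamma * M * kBT
dt, nsteps, nsamples = 0.005, 20000, 2000   # total time 100 >> 1/gamma

x = np.zeros(nsamples)
v = np.zeros(nsamples)
for _ in range(nsteps):
    noise = np.sqrt(Gamma * dt) / M * rng.standard_normal(nsamples)
    v += (-gamma * v - omega0**2 * x) * dt + noise   # semi-implicit Euler step
    x += v * dt

print(np.var(x))   # should be close to kB T / (M omega0^2) = 0.25
```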

Discrete random walk


Consider an object moving on a one-dimensional lattice in such a way that every time step it moves either one unit to the right or left, at random. If the lattice spacing is $\ell$, then after $n$ time steps the position will be

$$x_n = \ell\sum_{j=1}^{n}\sigma_j\ , \tag{8.9.18}$$

where

$$\sigma_j = \begin{cases} +1 & \text{if motion is one unit to right at time step } j\\ -1 & \text{if motion is one unit to left at time step } j\ . \end{cases} \tag{8.9.19}$$

Clearly $\langle\sigma_j\rangle = 0$, so $\langle x_n\rangle = 0$. Now let us compute

$$\langle x_n^2\rangle = \ell^2\sum_{j=1}^{n}\sum_{j'=1}^{n}\langle\sigma_j\,\sigma_{j'}\rangle = n\ell^2\ , \tag{8.9.20}$$

where we invoke

$$\langle\sigma_j\,\sigma_{j'}\rangle = \delta_{jj'}\ . \tag{8.9.21}$$

If the length of each time step is $\tau$, then we have, with $t = n\tau$,

$$\langle x^2(t)\rangle = \frac{\ell^2}{\tau}\, t\ , \tag{8.9.22}$$

and we identify the diffusion constant

$$D = \frac{\ell^2}{2\tau}\ . \tag{8.9.23}$$

Suppose, however, the random walk is biased, so that the probability for each independent step is given by

$$P(\sigma) = p\,\delta_{\sigma,1} + q\,\delta_{\sigma,-1}\ , \tag{8.9.24}$$

where $p + q = 1$. Then

$$\langle\sigma_j\rangle = p - q = 2p - 1 \tag{8.9.25}$$

and

$$\langle\sigma_j\,\sigma_{j'}\rangle = (p-q)^2\,\big(1 - \delta_{jj'}\big) + \delta_{jj'} = (2p-1)^2 + 4p(1-p)\,\delta_{jj'}\ .$$

Then

$$\langle x_n\rangle = (2p-1)\,\ell n \qquad\qquad \langle x_n^2\rangle - \langle x_n\rangle^2 = 4p(1-p)\,\ell^2 n\ .$$
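The biased-walk moments can be checked exactly, since the number $k$ of rightward steps is binomial and $x_n = \ell(2k - n)$ (an example of ours, with illustrative values of $p$ and $n$):

```python
# Exact moments of the biased random walk from the binomial distribution of k,
# the number of rightward steps: x_n = l*(2k - n), k ~ Binomial(n, p).
from math import comb

p, q, ell, n = 0.7, 0.3, 1.0, 50
probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
xs    = [ell * (2 * k - n) for k in range(n + 1)]

mean = sum(P * x for P, x in zip(probs, xs))
var  = sum(P * x**2 for P, x in zip(probs, xs)) - mean**2
print(mean, var)   # (2p-1)*l*n = 20 and 4*p*(1-p)*l^2*n = 42
```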

Fokker-Planck equation
Suppose $x(t)$ is a stochastic variable. We define the quantity

$$\delta x(t) \equiv x(t+\delta t) - x(t)\ , \tag{8.9.26}$$

and we assume

$$\langle\delta x(t)\rangle = F_1\big(x(t)\big)\,\delta t \qquad\qquad \big\langle[\delta x(t)]^2\big\rangle = F_2\big(x(t)\big)\,\delta t$$

but $\langle[\delta x(t)]^n\rangle = O\big((\delta t)^2\big)$ for $n > 2$. The $n = 1$ term is due to drift and the $n = 2$ term is due to diffusion. Now consider the conditional probability density, $P(x, t\,|\, x_0, t_0)$, defined to be the probability distribution for $x \equiv x(t)$ given that $x(t_0) = x_0$. The conditional probability density satisfies the composition rule,

$$P(x_2, t_2\,|\, x_0, t_0) = \int_{-\infty}^{\infty}\! dx_1\; P(x_2, t_2\,|\, x_1, t_1)\, P(x_1, t_1\,|\, x_0, t_0)\ , \tag{8.9.27}$$

for any value of $t_1$. This is also known as the Chapman-Kolmogorov equation. In words, what it says is that the probability density for a particle being at $x_2$ at time $t_2$, given that it was at $x_0$ at time $t_0$, is given by the product of the probability density for being at $x_2$ at time $t_2$ given that it was at $x_1$ at $t_1$, multiplied by that for being at $x_1$ at $t_1$ given it was at $x_0$ at $t_0$, integrated over $x_1$. This should be intuitively obvious, since if we pick any time $t_1 \in [t_0, t_2]$, then the particle had to be somewhere at that time. Indeed, one wonders how Chapman and Kolmogorov got their names attached to a result that is so obvious. At any rate, a picture is worth a thousand words: see Figure [FChaKol].

[FChaKol] Interpretive sketch of the mathematics behind the Chapman-Kolmogorov equation.


Proceeding, we may write

$$P(x, t+\delta t\,|\, x_0, t_0) = \int_{-\infty}^{\infty}\! dx'\; P(x, t+\delta t\,|\, x', t)\, P(x', t\,|\, x_0, t_0)\ . \tag{8.9.28}$$

Now

$$P(x, t+\delta t\,|\, x', t) = \big\langle\delta\big(x - \delta x(t) - x'\big)\big\rangle = \left\{1 + \langle\delta x(t)\rangle\,\frac{d}{dx'} + \frac{1}{2}\,\big\langle[\delta x(t)]^2\big\rangle\,\frac{d^2}{dx'^2} + \ldots\right\}\delta(x - x')$$
$$= \delta(x-x') + F_1(x')\,\delta t\,\frac{d\,\delta(x-x')}{dx'} + \frac{1}{2}\, F_2(x')\,\delta t\,\frac{d^2\delta(x-x')}{dx'^2} + O\big((\delta t)^2\big)\ ,$$

where the average is over the random variables. We now insert this result into Equation [CGEFPE], integrate by parts, divide by $\delta t$, and then take the limit $\delta t \to 0$. The result is the Fokker-Planck equation,

$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\big[F_1(x)\, P(x,t)\big] + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\big[F_2(x)\, P(x,t)\big]\ . \tag{8.9.29}$$

Brownian motion redux


Let's apply our Fokker-Planck equation to a description of Brownian motion. From our earlier results, we have

$$F_1(x) = \frac{F}{\gamma M}\ ,\qquad F_2(x) = 2D\ . \tag{8.9.30}$$

A formal proof of these results is left as an exercise for the reader. The Fokker-Planck equation is then

$$\frac{\partial P}{\partial t} = -u\,\frac{\partial P}{\partial x} + D\,\frac{\partial^2 P}{\partial x^2}\ , \tag{8.9.31}$$

where $u = F/\gamma M$ is the average terminal velocity. If we make a Galilean transformation and define

$$y = x - ut\ ,\qquad s = t \tag{8.9.32}$$

then our Fokker-Planck equation takes the form

$$\frac{\partial P}{\partial s} = D\,\frac{\partial^2 P}{\partial y^2}\ . \tag{8.9.33}$$

This is known as the diffusion equation. Equation [FPEBM] is also a diffusion equation, rendered in a moving frame.
While the Galilean transformation is illuminating, we can easily solve Equation [FPEBM] without it. Let's take a look at this equation after Fourier transforming from $x$ to $q$:

$$P(x,t) = \int_{-\infty}^{\infty}\!\frac{dq}{2\pi}\; e^{iqx}\,\hat P(q,t) \qquad\qquad \hat P(q,t) = \int_{-\infty}^{\infty}\! dx\; e^{-iqx}\, P(x,t)\ .$$

Then as should be well known to you by now, we can replace the operator $\frac{\partial}{\partial x}$ with multiplication by $iq$, resulting in

$$\frac{\partial}{\partial t}\,\hat P(q,t) = -\big(Dq^2 + iqu\big)\,\hat P(q,t)\ , \tag{8.9.34}$$

with solution

$$\hat P(q,t) = e^{-Dq^2 t}\, e^{-iqut}\,\hat P(q,0)\ . \tag{8.9.35}$$

We now apply the inverse transform to get back to $x$-space:

$$P(x,t) = \int_{-\infty}^{\infty}\!\frac{dq}{2\pi}\; e^{iqx}\, e^{-Dq^2 t}\, e^{-iqut}\int_{-\infty}^{\infty}\! dx'\; e^{-iqx'}\, P(x',0) = \int_{-\infty}^{\infty}\! dx'\; K(x-x', t)\, P(x', 0)\ ,$$

where

$$K(x,t) = \frac{1}{\sqrt{4\pi Dt}}\; e^{-(x-ut)^2/4Dt} \tag{8.9.36}$$

is the diffusion kernel. We now have a recipe for obtaining $P(x,t)$ given the initial conditions $P(x,0)$. If $P(x,0) = \delta(x)$, describing a particle confined to an infinitesimal region about the origin, then $P(x,t) = K(x,t)$ is the probability distribution for finding the particle at $x$ at time $t$. There are two aspects to $K(x,t)$ which merit comment. The first is that the center of the distribution moves with velocity $u$. This is due to the presence of the external force. The second is that the standard deviation $\sigma = \sqrt{2Dt}$ is increasing in time, so the distribution is not only shifting its center but it is also getting broader as time evolves. This movement of the center and broadening are what we have called drift and diffusion, respectively.
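The drift and diffusion of the kernel can be verified numerically: the mean of $K(x,t)$ should be $ut$ and the variance $2Dt$ (a sketch of ours; the values of $D$, $u$, $t$ are illustrative):

```python
# Numerically integrate the diffusion kernel K(x,t) and check its moments:
# normalization 1, mean u*t (drift), variance 2*D*t (diffusion).
import numpy as np

D, u, t = 0.5, 1.5, 2.0
x = np.linspace(-30, 40, 200001)
dx = x[1] - x[0]
K = np.exp(-(x - u * t)**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

norm = np.sum(K) * dx
mean = np.sum(x * K) * dx
var  = np.sum(x**2 * K) * dx - mean**2
print(norm, mean, var)   # 1, u*t = 3, 2*D*t = 2
```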

Master Equation
Another way to model stochastic processes is via the master equation, which was discussed in chapter 3. Recall that if $P_i(t)$ is the probability for a system to be in state $|\,i\,\rangle$ at time $t$ and $W_{ij}$ is the transition rate from state $|\,j\,\rangle$ to state $|\,i\,\rangle$, then

$$\frac{dP_i}{dt} = \sum_j\big(W_{ij}\, P_j - W_{ji}\, P_i\big)\ . \tag{8.9.37}$$

Consider a birth-death process in which the states $|\,n\,\rangle$ are labeled by nonnegative integers. Let $\alpha_n$ denote the rate of transitions from $|\,n\,\rangle \to |\,n+1\,\rangle$ and let $\beta_n$ denote the rate of transitions from $|\,n\,\rangle \to |\,n-1\,\rangle$. The master equation then takes the form17

$$\frac{dP_n}{dt} = \alpha_{n-1}\, P_{n-1} + \beta_{n+1}\, P_{n+1} - (\alpha_n + \beta_n)\, P_n\ . \tag{8.9.38}$$

Let us assume we can write $\alpha_n = K\bar\alpha(n/K)$ and $\beta_n = K\bar\beta(n/K)$, where $K \gg 1$. We assume the distribution $P_n(t)$ has a time-dependent maximum at $n = K\phi(t)$ and a width proportional to $\sqrt{K}$. We expand relative to this maximum, writing $n \equiv K\phi(t) + \sqrt{K}\,\xi$ and we define $P_n(t) \equiv \Pi(\xi, t)$. We now rewrite the master equation in Equation [MEPab] in terms of $\Pi(\xi, t)$. Since $n$ is an independent variable, we set

$$dn = K\dot\phi\, dt + \sqrt{K}\, d\xi \qquad\Rightarrow\qquad d\xi\,\big|_n = -\sqrt{K}\,\dot\phi\, dt\ . \tag{8.9.39}$$

Therefore

$$\frac{dP_n}{dt} = -\sqrt{K}\,\dot\phi\,\frac{\partial\Pi}{\partial\xi} + \frac{\partial\Pi}{\partial t}\ . \tag{8.9.40}$$

Next, we write, for any function $f_n$,

$$f_n = Kf\big(\phi + K^{-1/2}\xi\big) = Kf(\phi) + K^{1/2}\,\xi\, f'(\phi) + \tfrac{1}{2}\,\xi^2 f''(\phi) + \ldots\ .$$

Similarly,

$$f_{n\pm 1} = Kf\big(\phi + K^{-1/2}\xi \pm K^{-1}\big) = Kf(\phi) + K^{1/2}\,\xi\, f'(\phi) \pm f'(\phi) + \tfrac{1}{2}\,\xi^2 f''(\phi) + \ldots\ .$$

Dividing both sides of Equation [MEPab] by $\sqrt{K}$, we have

$$-\dot\phi\,\frac{\partial\Pi}{\partial\xi} + K^{-1/2}\,\frac{\partial\Pi}{\partial t} = (\bar\beta - \bar\alpha)\,\frac{\partial\Pi}{\partial\xi} + K^{-1/2}\left\{(\bar\beta' - \bar\alpha')\,\xi\,\frac{\partial\Pi}{\partial\xi} + \frac{1}{2}\,(\bar\alpha + \bar\beta)\,\frac{\partial^2\Pi}{\partial\xi^2} + (\bar\beta' - \bar\alpha')\,\Pi\right\} + \ldots\ . \tag{8.9.41}$$

Equating terms of order $K^0$ yields the equation

$$\dot\phi = f(\phi) \equiv \bar\alpha(\phi) - \bar\beta(\phi)\ . \tag{8.9.42}$$

Equating terms of order $K^{-1/2}$ yields the Fokker-Planck equation,

$$\frac{\partial\Pi}{\partial t} = -f'\big(\phi(t)\big)\,\frac{\partial}{\partial\xi}\big(\xi\,\Pi\big) + \frac{1}{2}\, g\big(\phi(t)\big)\,\frac{\partial^2\Pi}{\partial\xi^2}\ , \tag{8.9.43}$$

where $g(\phi) \equiv \bar\alpha(\phi) + \bar\beta(\phi)$. If in the limit $t \to \infty$, Equation [Dphieqn] evolves to a stable fixed point $\phi^*$, then the stationary solution of the Fokker-Planck Equation [FPEPi], $\Pi_{\rm eq}(\xi) = \Pi(\xi, t=\infty)$ must satisfy

$$-f'(\phi^*)\,\frac{\partial}{\partial\xi}\big(\xi\,\Pi_{\rm eq}\big) + \frac{1}{2}\, g(\phi^*)\,\frac{\partial^2\Pi_{\rm eq}}{\partial\xi^2} = 0 \qquad\Rightarrow\qquad \Pi_{\rm eq}(\xi) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\xi^2/2\sigma^2}\ , \tag{8.9.44}$$

where

$$\sigma^2 = -\frac{g(\phi^*)}{2 f'(\phi^*)}\ . \tag{8.9.45}$$

Now both $\alpha_n$ and $\beta_n$ are rates, hence both are positive and thus $g(\phi) > 0$. We see that the condition $\sigma^2 > 0$, which is necessary for a normalizable equilibrium distribution, requires $f'(\phi^*) < 0$, which is saying that the fixed point in Equation [Dphieqn] is stable.
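As a concrete check (an example of ours, not from the text), take $\bar\alpha(\phi) = 1$ and $\bar\beta(\phi) = \phi$, i.e. $\alpha_n = K$, $\beta_n = n$. Then $\phi^* = 1$, $f'(\phi^*) = -1$, $g(\phi^*) = 2$, so $\sigma^2 = 1$ and $\mathrm{Var}(n) \approx K\sigma^2 = K$. For this one-dimensional chain, detailed balance gives the exact stationary distribution $P_{n+1}/P_n = \alpha_n/\beta_{n+1} = K/(n+1)$, a Poisson distribution with mean and variance $K$:

```python
# Stationary distribution of the birth-death chain alpha_n = K, beta_n = n,
# built from detailed balance: P_{n+1} = P_n * alpha_n / beta_{n+1}.
K = 100.0
nmax = 400                     # truncation; Poisson(100) tail beyond 400 is negligible
P = [1.0]
for n in range(nmax):
    P.append(P[-1] * K / (n + 1))
Z = sum(P)
P = [p / Z for p in P]

mean = sum(n * p for n, p in enumerate(P))
var  = sum(n * n * p for n, p in enumerate(P)) - mean**2
print(mean, var)   # both close to K = 100, matching Var(n) = K*sigma^2 with sigma^2 = 1
```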

This page titled 8.9: Stochastic Processes is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

8.10: Appendix I- Boltzmann Equation and Collisional Invariants
Problem : The linearized Boltzmann operator $L\psi$ is a complicated functional. Suppose we replace $L$ by $\mathcal{L}$, where

$$\mathcal{L}\psi = -\gamma\,\psi(\mathbf{v}, t) + \gamma\left(\frac{m}{2\pi k_B T}\right)^{\!3/2}\!\int\! d^3\!u\;\exp\left(-\frac{m u^2}{2 k_B T}\right)\times\left\{1 + \frac{m}{k_B T}\,\mathbf{u}\cdot\mathbf{v} + \frac{2}{3}\left(\frac{m u^2}{2 k_B T} - \frac{3}{2}\right)\left(\frac{m v^2}{2 k_B T} - \frac{3}{2}\right)\right\}\psi(\mathbf{u}, t)\ .$$

Show that $\mathcal{L}$ shares all the important properties of $L$. What is the meaning of γ? Expand $\psi(\mathbf{v}, t)$ in spherical harmonics and Sonine polynomials,

$$\psi(\mathbf{v}, t) = \sum_{r\ell m} a_{r\ell m}(t)\, S^r_{\ell+\frac{1}{2}}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}})\ , \tag{8.10.1}$$

with $x = m v^2/2 k_B T$, and thus express the action of the linearized Boltzmann operator algebraically on the expansion coefficients $a_{r\ell m}(t)$.

The Sonine polynomials $S^n_\alpha(x)$ are a complete, orthogonal set which are convenient to use in the calculation of transport coefficients. They are defined as

$$S^n_\alpha(x) = \sum_{m=0}^{n}\frac{\Gamma(\alpha+n+1)\,(-x)^m}{\Gamma(\alpha+m+1)\,(n-m)!\; m!}\ , \tag{8.10.2}$$

and satisfy the generalized orthogonality relation

$$\int_0^\infty\! dx\; e^{-x}\, x^\alpha\, S^n_\alpha(x)\, S^{n'}_\alpha(x) = \frac{\Gamma(\alpha+n+1)}{n!}\,\delta_{nn'}\ . \tag{8.10.3}$$
0

Solution : The 'important properties' of $L$ are that it annihilates the five collisional invariants, $1$, $\mathbf{v}$, and $v^2$, and that all other eigenvalues are negative. That this is true for $\mathcal{L}$ can be verified by an explicit calculation.
Plugging the conveniently parameterized form of $\psi(\mathbf{v}, t)$ into $\mathcal{L}$, we have

$$\mathcal{L}\psi = -\gamma\sum_{r\ell m} a_{r\ell m}(t)\, S^r_{\ell+\frac12}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}) + \frac{\gamma}{2\pi^{3/2}}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\; x_1^{1/2}\, e^{-x_1}$$
$$\times\int\! d\hat{\mathbf{n}}_1\left[1 + 2\, x^{1/2} x_1^{1/2}\,\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}_1 + \frac{2}{3}\left(x - \frac{3}{2}\right)\left(x_1 - \frac{3}{2}\right)\right] S^r_{\ell+\frac12}(x_1)\, x_1^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}_1)\ ,$$

where we've used

$$u = \sqrt{\frac{2 k_B T}{m}}\; x_1^{1/2}\ ,\qquad du = \sqrt{\frac{k_B T}{2m}}\; x_1^{-1/2}\, dx_1\ . \tag{8.10.4}$$

Now recall $Y^0_0(\hat{\mathbf{n}}) = \frac{1}{\sqrt{4\pi}}$ and

$$Y^1_1(\hat{\mathbf{n}}) = -\sqrt{\frac{3}{8\pi}}\,\sin\theta\; e^{i\varphi} \qquad Y^1_0(\hat{\mathbf{n}}) = \sqrt{\frac{3}{4\pi}}\,\cos\theta \qquad Y^1_{-1}(\hat{\mathbf{n}}) = +\sqrt{\frac{3}{8\pi}}\,\sin\theta\; e^{-i\varphi}$$

$$S^0_{1/2}(x) = 1 \qquad S^0_{3/2}(x) = 1 \qquad S^1_{1/2}(x) = \frac{3}{2} - x\ ,$$

which allows us to write

$$1 = 4\pi\, Y^0_0(\hat{\mathbf{n}})\, Y^{0*}_0(\hat{\mathbf{n}}_1)$$

$$\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}_1 = \frac{4\pi}{3}\left[Y^1_0(\hat{\mathbf{n}})\, Y^{1*}_0(\hat{\mathbf{n}}_1) + Y^1_1(\hat{\mathbf{n}})\, Y^{1*}_1(\hat{\mathbf{n}}_1) + Y^1_{-1}(\hat{\mathbf{n}})\, Y^{1*}_{-1}(\hat{\mathbf{n}}_1)\right]\ .$$

We can do the integrals by appealing to the orthogonality relations for the spherical harmonics and Sonine polynomials:

$$\int\! d\hat{\mathbf{n}}\; Y^\ell_m(\hat{\mathbf{n}})\, Y^{l'*}_{m'}(\hat{\mathbf{n}}) = \delta_{ll'}\,\delta_{mm'}$$

$$\int_0^\infty\! dx\; e^{-x}\, x^\alpha\, S^n_\alpha(x)\, S^{n'}_\alpha(x) = \frac{\Gamma(n+\alpha+1)}{\Gamma(n+1)}\,\delta_{nn'}\ .$$

Integrating first over the direction vector $\hat{\mathbf{n}}_1$,

$$\mathcal{L}\psi = -\gamma\sum_{r\ell m} a_{r\ell m}(t)\, S^r_{\ell+\frac12}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}) + \frac{2\gamma}{\sqrt{\pi}}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\; x_1^{1/2}\, e^{-x_1}\int\! d\hat{\mathbf{n}}_1\,\Big[Y^0_0(\hat{\mathbf{n}})\, Y^{0*}_0(\hat{\mathbf{n}}_1)\, S^0_{1/2}(x)\, S^0_{1/2}(x_1)$$
$$+ \frac{2}{3}\, x^{1/2} x_1^{1/2}\sum_{m'=-1}^{1} Y^1_{m'}(\hat{\mathbf{n}})\, Y^{1*}_{m'}(\hat{\mathbf{n}}_1)\, S^0_{3/2}(x)\, S^0_{3/2}(x_1) + \frac{2}{3}\, Y^0_0(\hat{\mathbf{n}})\, Y^{0*}_0(\hat{\mathbf{n}}_1)\, S^1_{1/2}(x)\, S^1_{1/2}(x_1)\Big]\, S^r_{\ell+\frac12}(x_1)\, x_1^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}_1)\ ,$$

we obtain the intermediate result

$$\mathcal{L}\psi = -\gamma\sum_{r\ell m} a_{r\ell m}(t)\, S^r_{\ell+\frac12}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}) + \frac{2\gamma}{\sqrt{\pi}}\sum_{r\ell m} a_{r\ell m}(t)\int_0^\infty\! dx_1\; x_1^{1/2}\, e^{-x_1}\,\Big[Y^0_0(\hat{\mathbf{n}})\,\delta_{\ell 0}\,\delta_{m0}\, S^0_{1/2}(x)\, S^0_{1/2}(x_1)$$
$$+ \frac{2}{3}\, x^{1/2} x_1^{1/2}\sum_{m'=-1}^{1} Y^1_{m'}(\hat{\mathbf{n}})\,\delta_{\ell 1}\,\delta_{mm'}\, S^0_{3/2}(x)\, S^0_{3/2}(x_1) + \frac{2}{3}\, Y^0_0(\hat{\mathbf{n}})\,\delta_{\ell 0}\,\delta_{m0}\, S^1_{1/2}(x)\, S^1_{1/2}(x_1)\Big]\, S^r_{\ell+\frac12}(x_1)\, x_1^{\ell/2}\ .$$
Appealing now to the orthogonality of the Sonine polynomials, and recalling that

$$\Gamma\big(\tfrac12\big) = \sqrt{\pi}\ ,\qquad \Gamma(1) = 1\ ,\qquad \Gamma(z+1) = z\,\Gamma(z)\ , \tag{8.10.5}$$

we integrate over $x_1$. For the first term in brackets, we invoke the orthogonality relation with $n = 0$ and $\alpha = \frac12$, giving $\Gamma\big(\frac32\big) = \frac12\sqrt{\pi}$. For the second bracketed term, we have $n = 0$ but $\alpha = \frac32$, and we obtain $\Gamma\big(\frac52\big) = \frac32\,\Gamma\big(\frac32\big)$, while the third bracketed term leads to $n = 1$ and $\alpha = \frac12$, also yielding $\Gamma\big(\frac52\big) = \frac32\,\Gamma\big(\frac32\big)$. Thus, we obtain the simple and pleasing result
3


$$\mathcal{L}\psi = -\gamma\ {\sum_{r\ell m}}'\, a_{r\ell m}(t)\, S^r_{\ell+\frac12}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}}) \tag{8.10.6}$$

where the prime on the sum indicates that the set

$$CI = \big\{(0,0,0)\ ,\ (1,0,0)\ ,\ (0,1,1)\ ,\ (0,1,0)\ ,\ (0,1,-1)\big\} \tag{8.10.7}$$

are to be excluded from the sum. But these are just the functions which correspond to the five collisional invariants! Thus, we learn that

$$\psi_{r\ell m}(\mathbf{v}) = N_{r\ell m}\, S^r_{\ell+\frac12}(x)\, x^{\ell/2}\, Y^\ell_m(\hat{\mathbf{n}})\ , \tag{8.10.8}$$

is an eigenfunction of $\mathcal{L}$ with eigenvalue $-\gamma$ if $(r, \ell, m)$ does not correspond to one of the five collisional invariants. In the latter case, the eigenvalue is zero. Thus, the algebraic action of $\mathcal{L}$ on the coefficients $a_{r\ell m}$ is

$$(\mathcal{L}a)_{r\ell m} = \begin{cases} -\gamma\, a_{r\ell m} & \text{if } (r,\ell,m)\notin CI\\ 0 & \text{if } (r,\ell,m)\in CI \end{cases} \tag{8.10.9}$$

The quantity $\tau = \gamma^{-1}$ is the relaxation time.
It is pretty obvious that $\mathcal{L}$ is self-adjoint, since

$$\langle\,\phi\,|\,\mathcal{L}\psi\,\rangle \equiv \int\! d^3\!v\; f^0(\mathbf{v})\,\phi(\mathbf{v})\,\mathcal{L}[\psi(\mathbf{v})]$$
$$= -\gamma\, n\left(\frac{m}{2\pi k_B T}\right)^{\!3/2}\!\int\! d^3\!v\;\exp\left(-\frac{m v^2}{2 k_B T}\right)\phi(\mathbf{v})\,\psi(\mathbf{v})$$
$$+ \gamma\, n\left(\frac{m}{2\pi k_B T}\right)^{\!3}\!\int\! d^3\!v\!\int\! d^3\!u\;\exp\left(-\frac{m u^2}{2 k_B T}\right)\exp\left(-\frac{m v^2}{2 k_B T}\right)\times\phi(\mathbf{v})\left[1 + \frac{m}{k_B T}\,\mathbf{u}\cdot\mathbf{v} + \frac{2}{3}\left(\frac{m u^2}{2 k_B T} - \frac{3}{2}\right)\left(\frac{m v^2}{2 k_B T} - \frac{3}{2}\right)\right]\psi(\mathbf{u})$$
$$= \langle\,\mathcal{L}\phi\,|\,\psi\,\rangle\ ,$$

where $n$ is the bulk number density and $f^0(\mathbf{v})$ is the Maxwellian velocity distribution.

This page titled 8.10: Appendix I- Boltzmann Equation and Collisional Invariants is shared under a CC BY-NC-SA license and was authored,
remixed, and/or curated by Daniel Arovas.

8.11: Appendix II- Distributions and Functionals
Let $x \in \mathbb{R}$ be a random variable, and $P(x)$ a probability distribution for $x$. The average of any function $\phi(x)$ is then

$$\langle\phi(x)\rangle = \int_{-\infty}^{\infty}\! dx\; P(x)\,\phi(x)\bigg/\int_{-\infty}^{\infty}\! dx\; P(x)\ . \tag{8.11.1}$$

Let $\eta(t)$ be a random function of $t$, with $\eta(t) \in \mathbb{R}$, and let $P[\eta(t)]$ be the probability distribution functional for $\eta(t)$. Then if $\Phi[\eta(t)]$ is a functional of $\eta(t)$, the average of $\Phi$ is given by

$$\int\! D\eta\; P[\eta(t)]\,\Phi[\eta(t)]\bigg/\int\! D\eta\; P[\eta(t)] \tag{8.11.2}$$

The expression $\int\! D\eta\; P[\eta]\,\Phi[\eta]$ is a functional integral. A functional integral is a continuum limit of a multivariable integral. Suppose $\eta(t)$ were defined on a set of $t$ values $t_n = n\tau$. A functional of $\eta(t)$ becomes a multivariable function of the values $\eta_n \equiv \eta(t_n)$. The metric then becomes

$$D\eta \longrightarrow \prod_n d\eta_n\ . \tag{8.11.3}$$

In fact, for our purposes we will not need to know any details about the functional measure $D\eta$; we will finesse this delicate issue.18 Consider the generating functional,

$$Z[J(t)] = \int\! D\eta\; P[\eta]\,\exp\left(\int_{-\infty}^{\infty}\! dt\; J(t)\,\eta(t)\right)\ . \tag{8.11.4}$$

It is clear that

$$\frac{1}{Z[J]}\,\frac{\delta^n Z[J]}{\delta J(t_1)\cdots\delta J(t_n)}\bigg|_{J(t)=0} = \big\langle\eta(t_1)\cdots\eta(t_n)\big\rangle\ . \tag{8.11.5}$$

The function $J(t)$ is an arbitrary source function. We differentiate with respect to it in order to find the $\eta$-field correlators.

[Fdiscretize] Discretization of a continuous function $\eta(t)$. Upon discretization, a functional $\Phi[\eta(t)]$ becomes an ordinary multivariable function $\Phi(\{\eta_j\})$.

Let's compute the generating function for a class of distributions of the Gaussian form,

$$P[\eta] = \exp\left(-\frac{1}{2\Gamma}\int_{-\infty}^{\infty}\! dt\;\big(\tau^2\dot\eta^2 + \eta^2\big)\right) = \exp\left(-\frac{1}{2\Gamma}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\,\big(1 + \omega^2\tau^2\big)\,\big|\hat\eta(\omega)\big|^2\right)\ .$$

Then Fourier transforming the source function $J(t)$, it is easy to see that

$$Z[J] = Z[0]\cdot\exp\left(\frac{\Gamma}{2}\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\,\frac{\big|\hat J(\omega)\big|^2}{1 + \omega^2\tau^2}\right)\ . \tag{8.11.6}$$

Note that with $\eta(t) \in \mathbb{R}$ and $J(t) \in \mathbb{R}$ we have $\hat\eta^*(\omega) = \hat\eta(-\omega)$ and $\hat J^*(\omega) = \hat J(-\omega)$. Transforming back to real time, we have

$$Z[J] = Z[0]\cdot\exp\left(\frac{1}{2}\int_{-\infty}^{\infty}\! dt\int_{-\infty}^{\infty}\! dt'\; J(t)\, G(t-t')\, J(t')\right)\ , \tag{8.11.7}$$

where

$$G(s) = \frac{\Gamma}{2\tau}\, e^{-|s|/\tau}\ ,\qquad \hat G(\omega) = \frac{\Gamma}{1 + \omega^2\tau^2} \tag{8.11.8}$$

is the Green's function, in real and Fourier space. Note that

$$\int_{-\infty}^{\infty}\! ds\; G(s) = \hat G(0) = \Gamma\ . \tag{8.11.9}$$

We can now compute

$$\langle\eta(t_1)\,\eta(t_2)\rangle = G(t_1 - t_2)$$
$$\langle\eta(t_1)\,\eta(t_2)\,\eta(t_3)\,\eta(t_4)\rangle = G(t_1-t_2)\, G(t_3-t_4) + G(t_1-t_3)\, G(t_2-t_4) + G(t_1-t_4)\, G(t_2-t_3)\ .$$

The generalization is now easy to prove, and is known as Wick's theorem:

$$\langle\eta(t_1)\cdots\eta(t_{2n})\rangle = \sum_{\rm contractions} G(t_{i_1}-t_{i_2})\cdots G(t_{i_{2n-1}}-t_{i_{2n}})\ , \tag{8.11.10}$$

where the sum is over all distinct contractions of the sequence $1\text{-}2\cdots 2n$ into products of pairs. How many terms are there? Some simple combinatorics answers this question. Choose the index 1. There are $(2n-1)$ other time indices with which it can be contracted. Now choose another index. There are $(2n-3)$ indices with which that index can be contracted. And so on. We thus obtain

$$C(n) \equiv \#\ \text{of contractions of } 1\text{-}2\text{-}3\cdots 2n = (2n-1)(2n-3)\cdots 3\cdot 1 = \frac{(2n)!}{2^n\, n!}\ . \tag{8.11.11}$$
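The counting argument above can be confirmed by brute force: enumerate all pairings of $2n$ indices recursively and compare with $(2n)!/2^n n!$ (a small check of ours):

```python
# Count all distinct pairings (Wick contractions) of 2n indices by recursion:
# pair the first remaining index with each other index, then recurse.
from math import factorial

def pairings(items):
    if not items:
        return 1
    first, rest = items[0], items[1:]
    total = 0
    for i in range(len(rest)):          # partner choices for `first`
        total += pairings(rest[:i] + rest[i+1:])
    return total

for n in range(1, 5):
    count = pairings(list(range(2 * n)))
    print(n, count, factorial(2 * n) // (2**n * factorial(n)))   # 1, 3, 15, 105
```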

This page titled 8.11: Appendix II- Distributions and Functionals is shared under a CC BY-NC-SA license and was authored, remixed, and/or
curated by Daniel Arovas.

8.12: Appendix III- General Linear Autonomous Inhomogeneous ODEs
We can also solve general autonomous linear inhomogeneous ODEs of the form

$$\frac{d^n x}{dt^n} + a_{n-1}\,\frac{d^{n-1} x}{dt^{n-1}} + \ldots + a_1\,\frac{dx}{dt} + a_0\, x = \xi(t)\ . \tag{8.12.1}$$

We can write this as

$$\mathcal{L}_t\, x(t) = \xi(t)\ , \tag{8.12.2}$$

where $\mathcal{L}_t$ is the $n^{\rm th}$ order differential operator

$$\mathcal{L}_t = \frac{d^n}{dt^n} + a_{n-1}\,\frac{d^{n-1}}{dt^{n-1}} + \ldots + a_1\,\frac{d}{dt} + a_0\ . \tag{8.12.3}$$

The general solution to the inhomogeneous equation is given by

$$x(t) = x_h(t) + \int_{-\infty}^{\infty}\! dt'\; G(t,t')\,\xi(t')\ , \tag{8.12.4}$$

where $G(t,t')$ is the Green's function. Note that $\mathcal{L}_t\, x_h(t) = 0$. Thus, in order for eqns. [Leqn] and [time] to be true, we must have

$$\mathcal{L}_t\, x(t) = \mathcal{L}_t\, x_h(t) + \int_{-\infty}^{\infty}\! dt'\;\mathcal{L}_t\, G(t,t')\,\xi(t') = \xi(t)\ ,$$

which means that

$$\mathcal{L}_t\, G(t,t') = \delta(t-t')\ , \tag{8.12.5}$$

where $\delta(t-t')$ is the Dirac δ-function.


If the differential equation L x(t) = ξ(t) is defined over some finite or semi-infinite t interval with prescribed boundary
t

conditions on x(t) at the endpoints, then G(t, t ) will depend on t and t separately. For the case we are now considering, let the
′ ′

interval be the entire real line t ∈ (−∞, ∞) . Then G(t, t ) = G(t − t ) is a function of the single variable t − t .
′ ′ ′

Note that Lt = L(
d

dt
) may be considered a function of the differential operator dt
d
. If we now Fourier transform the equation
L x(t) = ξ(t)
t
, we obtain
∞ ∞
n n−1
iωt iωt
d d d
∫ dt e ξ(t) = ∫ dt e { +a +… +a + a } x(t)
n n−1 n−1 1 0
dt dt dt
−∞ −∞

iωt n n−1
= ∫ dt e {(−iω) +a (−iω) +… +a (−iω) + a } x(t) .
n−1 1 0

−∞

Thus, if we define

$$\hat{\mathcal{L}}(\omega) = \sum_{k=0}^{n} a_k\,(-i\omega)^k\ , \tag{8.12.6}$$

then we have

$$\hat{\mathcal{L}}(\omega)\,\hat x(\omega) = \hat\xi(\omega)\ , \tag{8.12.7}$$

where $a_n \equiv 1$. According to the Fundamental Theorem of Algebra, the $n^{\rm th}$ degree polynomial $\hat{\mathcal{L}}(\omega)$ may be uniquely factored over the complex ω plane into a product over $n$ roots:

$$\hat{\mathcal{L}}(\omega) = (-i)^n\,(\omega - \omega_1)(\omega - \omega_2)\cdots(\omega - \omega_n)\ . \tag{8.12.8}$$

If the $\{a_k\}$ are all real, then $\big[\hat{\mathcal{L}}(\omega)\big]^* = \hat{\mathcal{L}}(-\omega^*)$, hence if $\Omega$ is a root then so is $-\Omega^*$. Thus, the roots appear in pairs which are symmetric about the imaginary axis. That is, if $\Omega = a + ib$ is a root, then so is $-\Omega^* = -a + ib$.

The general solution to the homogeneous equation is

$$x_h(t) = \sum_{\sigma=1}^{n} A_\sigma\, e^{-i\omega_\sigma t}\ , \tag{8.12.9}$$

which involves $n$ arbitrary complex constants $A_\sigma$. The susceptibility, or Green's function in Fourier space, $\hat G(\omega)$ is then

$$\hat G(\omega) = \frac{1}{\hat{\mathcal{L}}(\omega)} = \frac{i^n}{(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)}\ , \tag{8.12.10}$$

Note that $\big[\hat G(\omega)\big]^* = \hat G(-\omega)$, which is equivalent to the statement that $G(t-t')$ is a real function of its argument. The general solution to the inhomogeneous equation is then

$$x(t) = x_h(t) + \int_{-\infty}^{\infty}\! dt'\; G(t-t')\,\xi(t')\ , \tag{8.12.11}$$

where $x_h(t)$ is the solution to the homogeneous equation, with zero forcing, and where

$$G(t-t') = \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\; e^{-i\omega(t-t')}\,\hat G(\omega) = i^n\!\int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{e^{-i\omega(t-t')}}{(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)} = \sum_{\sigma=1}^{n}\frac{e^{-i\omega_\sigma(t-t')}}{i\,\hat{\mathcal{L}}'(\omega_\sigma)}\;\Theta(t-t')\ ,$$

where we assume that $\mathrm{Im}\,\omega_\sigma < 0$ for all σ. This guarantees causality – the response $x(t)$ to the influence $\xi(t')$ is nonzero only for $t > t'$.

As an example, consider the familiar case

$$\hat{\mathcal{L}}(\omega) = -\omega^2 - i\gamma\omega + \omega_0^2 = -(\omega - \omega_+)(\omega - \omega_-)\ ,$$

with $\omega_\pm = -\frac{i}{2}\gamma \pm \beta$, and $\beta = \sqrt{\omega_0^2 - \frac{1}{4}\gamma^2}$. This yields

$$\hat{\mathcal{L}}'(\omega_\pm) = \mp(\omega_+ - \omega_-) = \mp 2\beta\ . \tag{8.12.12}$$

Then according to equation [gfun],

$$G(s) = \left\{\frac{e^{-i\omega_+ s}}{i\,\hat{\mathcal{L}}'(\omega_+)} + \frac{e^{-i\omega_- s}}{i\,\hat{\mathcal{L}}'(\omega_-)}\right\}\Theta(s) = \left\{\frac{e^{-\gamma s/2}\, e^{-i\beta s}}{-2i\beta} + \frac{e^{-\gamma s/2}\, e^{i\beta s}}{2i\beta}\right\}\Theta(s) = \beta^{-1}\, e^{-\gamma s/2}\,\sin(\beta s)\,\Theta(s)\ .$$
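A quick finite-difference check (a sketch of ours, with illustrative parameter values) confirms that this $G(s)$ solves the homogeneous equation $G'' + \gamma G' + \omega_0^2\, G = 0$ for $s > 0$:

```python
# Finite-difference verification that G(s) = beta^{-1} e^{-gamma s/2} sin(beta s)
# satisfies G'' + gamma G' + omega0^2 G = 0 away from s = 0.
import numpy as np

gamma, omega0 = 0.5, 2.0
beta = np.sqrt(omega0**2 - gamma**2 / 4)

G = lambda s: np.exp(-gamma * s / 2) * np.sin(beta * s) / beta

h = 1e-4
s = np.linspace(1.0, 10.0, 2001)          # stay away from the kink at s = 0
d1 = (G(s + h) - G(s - h)) / (2 * h)      # central first derivative
d2 = (G(s + h) - 2 * G(s) + G(s - h)) / h**2

residual = d2 + gamma * d1 + omega0**2 * G(s)
print(np.max(np.abs(residual)))   # O(h^2) discretization error, essentially zero
```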

Now let us evaluate the two-point correlation function $\langle x(t)\, x(t')\rangle$, assuming the noise is correlated according to $\langle\xi(s)\,\xi(s')\rangle = \phi(s-s')$. We assume $t, t' \to \infty$ so the transient contribution $x_h$ is negligible. We then have

$$\langle x(t)\, x(t')\rangle = \int_{-\infty}^{\infty}\! ds\int_{-\infty}^{\infty}\! ds'\; G(t-s)\, G(t'-s')\,\langle\xi(s)\,\xi(s')\rangle = \int_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\hat\phi(\omega)\,\big|\hat G(\omega)\big|^2\, e^{i\omega(t-t')}\ .$$

Higher order ODEs
Note that any n-th order ODE, of the general form
$$\frac{d^n x}{dt^n} = F\bigg(x\,,\,\frac{dx}{dt}\,,\,\ldots\,,\,\frac{d^{n-1}x}{dt^{n-1}}\bigg)\ , \tag{8.12.13}$$
may be represented by the first order system φ̇ = V(φ). To see this, define φ_k = d^{k−1}x/dt^{k−1}, with k = 1, …, n. Thus, for k < n we have φ̇_k = φ_{k+1}, and φ̇_n = F. In other words,
$$\frac{d}{dt}\begin{pmatrix}\varphi_1\\ \vdots\\ \varphi_{n-1}\\ \varphi_n\end{pmatrix} = \begin{pmatrix}\varphi_2\\ \vdots\\ \varphi_n\\ F(\varphi_1,\ldots,\varphi_n)\end{pmatrix}\ , \tag{8.12.14}$$
where the left side is dφ/dt and the right side is V(φ).
An inhomogeneous linear n-th order ODE,
$$\frac{d^n x}{dt^n} + a_{n-1}\,\frac{d^{n-1}x}{dt^{n-1}} + \ldots + a_1\,\frac{dx}{dt} + a_0\,x = \xi(t) \tag{8.12.16}$$
may be written in matrix form, as
$$\frac{d}{dt}\begin{pmatrix}\varphi_1\\ \varphi_2\\ \vdots\\ \varphi_n\end{pmatrix} = \overset{Q}{\overbrace{\begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}\end{pmatrix}}}\begin{pmatrix}\varphi_1\\ \varphi_2\\ \vdots\\ \varphi_n\end{pmatrix} + \overset{\xi}{\overbrace{\begin{pmatrix}0\\ 0\\ \vdots\\ \xi(t)\end{pmatrix}}}\ . \tag{8.12.17}$$
Thus,
$$\dot\varphi = Q\,\varphi + \xi\ , \tag{8.12.18}$$
and if the coefficients a_k are time-independent, the ODE is autonomous.
For the homogeneous case where ξ(t) = 0, the solution is obtained by exponentiating the constant matrix Qt:
$$\varphi(t) = \exp(Qt)\,\varphi(0)\ ; \tag{8.12.19}$$
the exponential of a matrix may be given meaning by its Taylor series expansion. If the ODE is not autonomous, then Q = Q(t) is time-dependent, and the solution is given by the path-ordered exponential,
$$\varphi(t) = \mathcal{P}\exp\Bigg\{\int\limits_0^t\!dt'\,Q(t')\Bigg\}\,\varphi(0)\ , \tag{8.12.20}$$
where 𝒫 is the path ordering operator which places earlier times to the right. As defined, the equation φ̇ = V(φ) is autonomous, since the t-advance mapping g^t depends only on t and on no other time variable. However, by extending the phase space M ∋ φ from M → M × ℝ, which is of dimension n + 1, one can describe arbitrary time-dependent ODEs.
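For constant Q, the propagator exp(Qt) can be evaluated and verified directly. The sketch below (Python with SciPy; the damped-oscillator companion matrix is an assumed illustrative choice) cross-checks the matrix exponential against direct numerical integration of φ̇ = Qφ:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# phi = (x, dx/dt) for xdd + gamma*xd + w0^2 x = 0, written as phidot = Q phi
w0, gamma = 2.0, 0.6
Q = np.array([[0.0, 1.0], [-w0**2, -gamma]])
phi0 = np.array([1.0, 0.0])
t = 3.0

phi_expm = expm(Q * t) @ phi0        # phi(t) = exp(Qt) phi(0)

# Independent check: integrate the ODE system with a high-accuracy RK method
sol = solve_ivp(lambda s, y: Q @ y, (0.0, t), phi0, rtol=1e-10, atol=1e-12)
assert np.allclose(phi_expm, sol.y[:, -1], atol=1e-6)
```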

In general, path ordered exponentials are difficult to compute analytically. We will henceforth consider the autonomous case where Q is a constant matrix in time. We will assume the matrix Q is real, but other than that it has no helpful symmetries. We can however decompose it into left and right eigenvectors:
$$Q_{ij} = \sum_{\sigma=1}^{n}\nu_\sigma\,R_{\sigma,i}\,L_{\sigma,j}\ . \tag{8.12.21}$$
Or, in bra-ket notation, Q = Σ_σ ν_σ |R_σ⟩⟨L_σ|. The normalization condition we use is
$$\big\langle\,L_\sigma\,\big|\,R_{\sigma'}\,\big\rangle = \delta_{\sigma\sigma'}\ , \tag{8.12.22}$$
where {ν_σ} are the eigenvalues of Q. The eigenvalues may be real or complex. Since the characteristic polynomial P(ν) = det(νI − Q) has real coefficients, we know that the eigenvalues of Q are either real or come in complex conjugate pairs. Consider, for example, the n = 2 system we studied earlier. Then
$$Q = \begin{pmatrix} 0 & 1\\ -\omega_0^2 & -\gamma\end{pmatrix}\ . \tag{8.12.23}$$
The eigenvalues are as before: ν_± = −γ/2 ± √(γ²/4 − ω₀²). The left and right eigenvectors are
$$L_\pm = \frac{\pm 1}{\nu_+ - \nu_-}\,\begin{pmatrix}-\nu_\mp & 1\end{pmatrix} \qquad,\qquad R_\pm = \begin{pmatrix}1\\ \nu_\pm\end{pmatrix}\ . \tag{8.12.24}$$

The utility of working in a left-right eigenbasis is apparent once we reflect upon the result
$$f(Q) = \sum_{\sigma=1}^{n} f(\nu_\sigma)\,\big|\,R_\sigma\,\big\rangle\big\langle\,L_\sigma\,\big| \tag{8.12.25}$$
for any function f. Thus, the solution to the general autonomous homogeneous case is
$$\begin{aligned}
\big|\,\varphi(t)\,\big\rangle &= \sum_{\sigma=1}^{n} e^{\nu_\sigma t}\,\big|\,R_\sigma\,\big\rangle\big\langle\,L_\sigma\,\big|\,\varphi(0)\,\big\rangle\\
\varphi_i(t) &= \sum_{\sigma=1}^{n} e^{\nu_\sigma t}\,R_{\sigma,i}\sum_{j=1}^{n} L_{\sigma,j}\,\varphi_j(0)\ .
\end{aligned}$$
If Re(ν_σ) ≤ 0 for all σ, then the initial conditions φ(0) are forgotten on time scales τ_σ = ν_σ⁻¹. Physicality demands that this is the case.
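The spectral formula f(Q) = Σ_σ f(ν_σ)|R_σ⟩⟨L_σ| is easy to verify numerically. In this minimal sketch (illustrative ω₀ and γ), the rows of the inverse of the right-eigenvector matrix supply the left eigenvectors, automatically normalized so that ⟨L_σ|R_σ′⟩ = δ_σσ′:

```python
import numpy as np
from scipy.linalg import expm

w0, gamma = 2.0, 0.6
Q = np.array([[0.0, 1.0], [-w0**2, -gamma]])

nu, R = np.linalg.eig(Q)        # columns of R are right eigenvectors |R_sigma>
L = np.linalg.inv(R)            # rows of L are left eigenvectors <L_sigma|

t = 1.7
# f(Q) with f(nu) = exp(nu t), built from the left-right eigenbasis
f_Q = sum(np.exp(nu[s] * t) * np.outer(R[:, s], L[s, :]) for s in range(2))
assert np.allclose(f_Q, expm(Q * t), atol=1e-12)
```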
Now let’s consider the inhomogeneous case where ξ(t) ≠ 0. We begin by recasting Equation [phiQeqn] in the form
$$\frac{d}{dt}\big(e^{-Qt}\,\varphi\big) = e^{-Qt}\,\xi(t)\ . \tag{8.12.26}$$
We can integrate this directly:
$$\varphi(t) = e^{Qt}\,\varphi(0) + \int\limits_0^t\!ds\;e^{Q(t-s)}\,\xi(s)\ . \tag{8.12.27}$$
In component notation,
$$\varphi_i(t) = \sum_{\sigma=1}^{n} e^{\nu_\sigma t}\,R_{\sigma,i}\,\big\langle\,L_\sigma\,\big|\,\varphi(0)\,\big\rangle + \sum_{\sigma=1}^{n} R_{\sigma,i}\int\limits_0^t\!ds\;e^{\nu_\sigma(t-s)}\,\big\langle\,L_\sigma\,\big|\,\xi(s)\,\big\rangle\ . \tag{8.12.28}$$
Note that the first term on the RHS is the solution to the homogeneous equation, as must be the case when ξ(s) = 0.
The solution in Equation [CNsoln] holds for general Q and ξ(s). For the particular form of Q and ξ(s) in Equation [Qxieqn], we can proceed further. For starters, ⟨L_σ|ξ(s)⟩ = L_{σ,n} ξ(s). We can further exploit a special feature of the Q matrix to analytically determine all its left and right eigenvectors. Applying Q to the right eigenvector |R_σ⟩, we obtain
$$R_{\sigma,j} = \nu_\sigma\,R_{\sigma,j-1} \qquad (j > 1)\ . \tag{8.12.29}$$
We are free to choose R_{σ,1} = 1 for all σ and defer the issue of normalization to the derivation of the left eigenvectors. Thus, we obtain the pleasingly simple result,
$$R_{\sigma,k} = \nu_\sigma^{k-1}\ . \tag{8.12.30}$$

Applying Q to the left eigenvector ⟨L_σ|, we obtain
$$\begin{aligned}
-a_0\,L_{\sigma,n} &= \nu_\sigma\,L_{\sigma,1}\\
L_{\sigma,j-1} - a_{j-1}\,L_{\sigma,n} &= \nu_\sigma\,L_{\sigma,j} \qquad (j > 1)\ .
\end{aligned}$$
From these equations we may derive
$$L_{\sigma,k} = -L_{\sigma,n}\sum_{j=0}^{k-1} a_j\,\nu_\sigma^{\,j-k} = L_{\sigma,n}\sum_{j=k}^{n} a_j\,\nu_\sigma^{\,j-k}\ . \tag{8.12.31}$$
The second equality in the above equation is derived using the result P(ν_σ) = Σ_{j=0}^{n} a_j ν_σ^j = 0. Recall also that a_n ≡ 1. We now impose the normalization condition,
$$\sum_{k=1}^{n} L_{\sigma,k}\,R_{\sigma,k} = 1\ . \tag{8.12.32}$$

This condition determines our last remaining unknown quantity (for a given σ), L_{σ,n}:
$$\big\langle\,L_\sigma\,\big|\,R_\sigma\,\big\rangle = L_{\sigma,n}\sum_{k=1}^{n} k\,a_k\,\nu_\sigma^{k-1} = P'(\nu_\sigma)\,L_{\sigma,n}\ , \tag{8.12.33}$$
where P′(ν) is the first derivative of the characteristic polynomial. Thus, we obtain another neat result,
$$L_{\sigma,n} = \frac{1}{P'(\nu_\sigma)}\ . \tag{8.12.34}$$
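These companion-matrix results (R_{σ,k} = ν_σ^{k−1}, the expression for L_{σ,k}, and L_{σ,n} = 1/P′(ν_σ)) can be checked numerically for a concrete case. A minimal sketch, using an assumed n = 4 example with arbitrarily chosen coefficients:

```python
import numpy as np

# Companion matrix for x'''' + a3 x''' + a2 x'' + a1 x' + a0 x = 0 (n = 4)
a = np.array([0.3, 1.1, 0.7, 0.9])          # (a0, a1, a2, a3), arbitrary; a4 = 1
n = len(a)
Q = np.diag(np.ones(n - 1), k=1)
Q[-1, :] = -a

coeffs = np.append(a, 1.0)                  # P(nu) = sum_j a_j nu^j with a_n = 1
nu = np.roots(coeffs[::-1])                 # np.roots wants highest power first
dP = np.polyval(np.polyder(coeffs[::-1]), nu)   # P'(nu_sigma)

for s in range(n):
    R = nu[s] ** np.arange(n)               # R_{sigma,k} = nu_sigma^{k-1}
    L = (1.0 / dP[s]) * np.array([np.sum(coeffs[k:] * nu[s] ** np.arange(n - k + 1))
                                  for k in range(1, n + 1)])
    assert np.allclose(Q @ R, nu[s] * R, atol=1e-9)    # right eigenvector
    assert np.allclose(L @ Q, nu[s] * L, atol=1e-9)    # left eigenvector
    assert abs(L @ R - 1.0) < 1e-9                     # <L_sigma|R_sigma> = 1
```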

Now let us evaluate the general two-point correlation function,
$$C_{jj'}(t,t') \equiv \big\langle\,\varphi_j(t)\,\varphi_{j'}(t')\,\big\rangle - \big\langle\,\varphi_j(t)\,\big\rangle\big\langle\,\varphi_{j'}(t')\,\big\rangle\ . \tag{8.12.35}$$
We write
$$\langle\xi(s)\,\xi(s')\rangle = \phi(s-s') = \int\limits_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\hat\phi(\omega)\,e^{-i\omega(s-s')}\ . \tag{8.12.36}$$

When ϕ̂(ω) = ϕ̂ is constant, we have ⟨ξ(s) ξ(s′)⟩ = ϕ̂ δ(s − s′). This is the case of so-called white noise, when all frequencies contribute equally. The more general case when ϕ̂(ω) is frequency-dependent is known as colored noise. Appealing to Equation [CNsoln], we have
$$\begin{aligned}
C_{jj'}(t,t') &= \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int\limits_0^t\!ds\;e^{\nu_\sigma(t-s)}\!\int\limits_0^{t'}\!ds'\;e^{\nu_{\sigma'}(t'-s')}\,\phi(s-s')\\
&= \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int\limits_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{\hat\phi(\omega)\,\big(e^{-i\omega t}-e^{\nu_\sigma t}\big)\big(e^{i\omega t'}-e^{\nu_{\sigma'} t'}\big)}{(\omega - i\nu_\sigma)(\omega + i\nu_{\sigma'})}\ .
\end{aligned}$$

In the limit t, t′ → ∞, assuming Re(ν_σ) < 0 for all σ (no diffusion), the exponentials e^{ν_σ t} and e^{ν_σ′ t′} may be neglected, and we then have
$$C_{jj'}(t,t') = \sum_{\sigma,\sigma'}\frac{\nu_\sigma^{j-1}\,\nu_{\sigma'}^{j'-1}}{P'(\nu_\sigma)\,P'(\nu_{\sigma'})}\int\limits_{-\infty}^{\infty}\!\frac{d\omega}{2\pi}\;\frac{\hat\phi(\omega)\,e^{-i\omega(t-t')}}{(\omega - i\nu_\sigma)(\omega + i\nu_{\sigma'})}\ . \tag{8.12.37}$$

This page titled 8.12: Appendix III- General Linear Autonomous Inhomogeneous ODEs is shared under a CC BY-NC-SA license and was
authored, remixed, and/or curated by Daniel Arovas.

8.13: Appendix IV- Correlations in the Langevin formalism
As shown above, integrating the Langevin equation ṗ + γp = F + η(t) yields
$$p(t) = p(0)\,e^{-\gamma t} + \frac{F}{\gamma}\big(1 - e^{-\gamma t}\big) + \int\limits_0^t\!ds\;\eta(s)\,e^{\gamma(s-t)}\ . \tag{8.13.1}$$
Thus, the momentum autocorrelator is
$$\begin{aligned}
\langle p(t)\,p(t')\rangle - \langle p(t)\rangle\langle p(t')\rangle &= \int\limits_0^t\!ds\!\int\limits_0^{t'}\!ds'\;e^{\gamma(s-t)}\,e^{\gamma(s'-t')}\,\langle\eta(s)\,\eta(s')\rangle\\
&= \Gamma\,e^{-\gamma(t+t')}\!\int\limits_0^{t_{\rm min}}\!\!ds\;e^{2\gamma s} = M k_{\rm B} T\,\big(e^{-\gamma|t-t'|} - e^{-\gamma(t+t')}\big)\ ,
\end{aligned}$$
where
$$t_{\rm min} = {\rm min}(t,t') = \begin{cases} t & {\rm if}\ t < t'\\ t' & {\rm if}\ t' < t\end{cases} \tag{8.13.2}$$
is the lesser of t and t′. Here we have used the result



$$\begin{aligned}
\int\limits_0^{t}\!ds\!\int\limits_0^{t'}\!ds'\;e^{\gamma(s+s')}\,\delta(s-s') &= \int\limits_0^{t_{\rm min}}\!\!ds\!\int\limits_0^{t_{\rm min}}\!\!ds'\;e^{\gamma(s+s')}\,\delta(s-s')\\
&= \int\limits_0^{t_{\rm min}}\!\!ds\;e^{2\gamma s} = \frac{1}{2\gamma}\big(e^{2\gamma t_{\rm min}} - 1\big)\ .
\end{aligned}$$

One way to intuitively understand this result is as follows. The double integral over s and s′ is over a rectangle of dimensions t × t′. Since the δ-function can only be satisfied when s = s′, there can be no contribution to the integral from regions where s > t′ or s′ > t. Thus, the only contributions can arise from integration over the square of dimensions t_min × t_min. Note also that
$$t + t' - 2\,{\rm min}(t,t') = |t-t'|\ . \tag{8.13.3}$$

[Figure: Regions for some of the double integrals encountered in the text.]
Let’s now compute the position x(t). We have

8.13.1 https://phys.libretexts.org/@go/page/18747

$$\begin{aligned}
x(t) &= x(0) + \frac{1}{M}\int\limits_0^t\!ds\;p(s)\\
&= x(0) + \int\limits_0^t\!ds\,\bigg[\Big(v(0) - \frac{F}{\gamma M}\Big)e^{-\gamma s} + \frac{F}{\gamma M}\bigg] + \frac{1}{M}\int\limits_0^t\!ds\!\int\limits_0^s\!ds_1\;\eta(s_1)\,e^{\gamma(s_1-s)}\\
&= \langle x(t)\rangle + \frac{1}{M}\int\limits_0^t\!ds\!\int\limits_0^s\!ds_1\;\eta(s_1)\,e^{\gamma(s_1-s)}\ ,
\end{aligned}$$
with v = p/M . Since ⟨η(t)⟩ = 0 , we have


$$\begin{aligned}
\langle x(t)\rangle &= x(0) + \int\limits_0^t\!ds\,\bigg[\Big(v(0) - \frac{F}{\gamma M}\Big)e^{-\gamma s} + \frac{F}{\gamma M}\bigg]\\
&= x(0) + \frac{Ft}{\gamma M} + \frac{1}{\gamma}\Big(v(0) - \frac{F}{\gamma M}\Big)\big(1 - e^{-\gamma t}\big)\ .
\end{aligned}$$

Note that for γt ≪ 1 we have ⟨x(t)⟩ = x(0) + v(0) t + ½ M⁻¹F t² + O(t³), as is appropriate for ballistic particles moving under the influence of a constant force. The long time limit, ⟨x(t)⟩ ≃ x(0) + Ft/γM, of course agrees with our earlier evaluation for the terminal velocity, v_∞ = ⟨p(∞)⟩/M = F/γM.

We next compute the position autocorrelation:
$$\begin{aligned}
\langle x(t)\,x(t')\rangle - \langle x(t)\rangle\langle x(t')\rangle &= \frac{1}{M^2}\int\limits_0^t\!ds\!\int\limits_0^{t'}\!ds'\;e^{-\gamma(s+s')}\!\int\limits_0^s\!ds_1\!\int\limits_0^{s'}\!ds_2\;e^{\gamma(s_1+s_2)}\,\langle\eta(s_1)\,\eta(s_2)\rangle\\
&= \frac{\Gamma}{2\gamma M^2}\int\limits_0^t\!ds\!\int\limits_0^{t'}\!ds'\;\Big(e^{-\gamma|s-s'|} - e^{-\gamma(s+s')}\Big)\ .
\end{aligned}$$

We have to be careful in computing the double integral of the first term in brackets on the RHS. We can assume, without loss of generality, that t ≥ t′. Then
$$\begin{aligned}
\int\limits_0^t\!ds\!\int\limits_0^{t'}\!ds'\;e^{-\gamma|s-s'|} &= \int\limits_0^{t'}\!ds'\;e^{\gamma s'}\!\int\limits_{s'}^t\!ds\;e^{-\gamma s} + \int\limits_0^{t'}\!ds'\;e^{-\gamma s'}\!\int\limits_0^{s'}\!ds\;e^{\gamma s}\\
&= 2\gamma^{-1}\,t' + \gamma^{-2}\Big(e^{-\gamma t} + e^{-\gamma t'} - 1 - e^{-\gamma(t-t')}\Big)\ .
\end{aligned}$$

We then find, for t > t′,
$$\langle x(t)\,x(t')\rangle - \langle x(t)\rangle\langle x(t')\rangle = \frac{2 k_{\rm B} T}{\gamma M}\,t' + \frac{k_{\rm B} T}{\gamma^2 M}\Big(2 e^{-\gamma t} + 2 e^{-\gamma t'} - 2 - e^{-\gamma(t-t')} - e^{-\gamma(t+t')}\Big)\ . \tag{8.13.4}$$

In particular, the equal time autocorrelator is
$$\langle x^2(t)\rangle - \langle x(t)\rangle^2 = \frac{2 k_{\rm B} T}{\gamma M}\,t + \frac{k_{\rm B} T}{\gamma^2 M}\Big(4 e^{-\gamma t} - 3 - e^{-2\gamma t}\Big)\ . \tag{8.13.5}$$

We see that for long times
$$\langle x^2(t)\rangle - \langle x(t)\rangle^2 \sim 2Dt\ , \tag{8.13.6}$$
where D = k_B T/γM is the diffusion constant.
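The same result can be observed in a direct simulation of the Langevin equation. The sketch below is a minimal Euler-Maruyama integration in units where M = γ = k_B T = 1 (an assumed convenient choice), with the ensemble variance of x compared against the equal-time autocorrelator (8.13.5):

```python
import numpy as np

rng = np.random.default_rng(0)
M = gamma = kT = 1.0                       # units: M = gamma = k_B T = 1
Gamma = 2 * gamma * M * kT                 # noise strength, <eta eta'> = Gamma delta
N, dt, t_tot = 40000, 0.005, 5.0           # ensemble size, time step, total time

p = np.zeros(N)
x = np.zeros(N)
for _ in range(int(t_tot / dt)):
    # Euler-Maruyama step for pdot = -gamma p + eta(t), with F = 0
    p += -gamma * p * dt + np.sqrt(Gamma * dt) * rng.standard_normal(N)
    x += (p / M) * dt

# Exact equal-time position variance from Eq. (8.13.5)
var_exact = (2 * kT / (gamma * M)) * t_tot + (kT / (gamma**2 * M)) * (
    4 * np.exp(-gamma * t_tot) - 3 - np.exp(-2 * gamma * t_tot))
assert abs(x.var() - var_exact) / var_exact < 0.05
```

With 40,000 trajectories the sample variance agrees with the exact expression to within a few percent; at large γt it approaches the diffusive form 2Dt.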

This page titled 8.13: Appendix IV- Correlations in the Langevin formalism is shared under a CC BY-NC-SA license and was authored, remixed,
and/or curated by Daniel Arovas.

8.14: Appendix V- Kramers-Krönig Relations
Suppose χ̂(ω) ≡ Ĝ(ω) is analytic in the UHP¹⁹. Then for all ν, we must have
$$\int\limits_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{\hat\chi(\nu)}{\nu - \omega + i\epsilon} = 0\ , \tag{8.14.1}$$
where ϵ is a positive infinitesimal. The reason is simple: just close the contour in the UHP, assuming χ̂(ω) vanishes sufficiently rapidly that Jordan’s lemma can be applied. Clearly this is an extremely weak restriction on χ̂(ω), given the fact that the denominator already causes the integrand to vanish as |ω|⁻¹.

Let us examine the function
$$\frac{1}{\nu - \omega + i\epsilon} = \frac{\nu-\omega}{(\nu-\omega)^2 + \epsilon^2} - \frac{i\epsilon}{(\nu-\omega)^2 + \epsilon^2}\ , \tag{8.14.2}$$
which we have separated into real and imaginary parts. Under an integral sign, the first term, in the limit ϵ → 0, is equivalent to taking a principal part of the integral. That is, for any function F(ν) which is regular at ν = ω,
$$\lim_{\epsilon\to 0}\int\limits_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{\nu-\omega}{(\nu-\omega)^2+\epsilon^2}\;F(\nu) \equiv \wp\!\int\limits_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\;\frac{F(\nu)}{\nu-\omega}\ . \tag{8.14.3}$$
The principal part symbol ℘ means that the singularity at ν = ω is elided, either by smoothing out the function 1/(ν − ω) as above, or by simply cutting out a region of integration of width ϵ on either side of ν = ω.
The imaginary part is more interesting. Let us write
$$h(u) \equiv \frac{\epsilon}{u^2 + \epsilon^2}\ . \tag{8.14.4}$$
For |u| ≫ ϵ, h(u) ≃ ϵ/u², which vanishes as ϵ → 0. For u = 0, h(0) = 1/ϵ, which diverges as ϵ → 0. Thus, h(u) has a huge peak at u = 0 and rapidly decays to 0 as one moves off the peak in either direction a distance greater than ϵ. Finally, note that
$$\int\limits_{-\infty}^{\infty}\!du\;h(u) = \pi\ , \tag{8.14.5}$$
a result which itself is easy to show using contour integration. Putting it all together, this tells us that
$$\lim_{\epsilon\to 0}\,\frac{\epsilon}{u^2+\epsilon^2} = \pi\,\delta(u)\ . \tag{8.14.6}$$

Thus, for positive infinitesimal ϵ,
$$\frac{1}{u \pm i\epsilon} = \frac{\wp}{u} \mp i\pi\,\delta(u)\ , \tag{8.14.7}$$
a most useful result.


We now return to our initial result [kka], and we separate χ̂(ω) into real and imaginary parts:
$$\hat\chi(\omega) = \hat\chi'(\omega) + i\,\hat\chi''(\omega)\ . \tag{8.14.8}$$
(In this equation, the primes do not indicate differentiation with respect to argument.) We therefore have, for every real value of ω,
$$0 = \int\limits_{-\infty}^{\infty}\!\frac{d\nu}{2\pi}\,\big[\hat\chi'(\nu) + i\,\hat\chi''(\nu)\big]\bigg[\frac{\wp}{\nu-\omega} - i\pi\,\delta(\nu-\omega)\bigg]\ . \tag{8.14.9}$$

Taking the real and imaginary parts of this equation, we derive the Kramers-Krönig relations:
$$\begin{aligned}
\hat\chi'(\omega) &= +\wp\!\int\limits_{-\infty}^{\infty}\!\frac{d\nu}{\pi}\;\frac{\hat\chi''(\nu)}{\nu-\omega}\\
\hat\chi''(\omega) &= -\wp\!\int\limits_{-\infty}^{\infty}\!\frac{d\nu}{\pi}\;\frac{\hat\chi'(\nu)}{\nu-\omega}\ .
\end{aligned}$$

8.14.1 https://phys.libretexts.org/@go/page/18748
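These relations can be verified numerically for any concrete causal response, e.g. the damped-oscillator susceptibility χ̂(ω) = 1/(ω₀² − ω² − iγω) of §8.12, whose poles lie in the lower half plane. The sketch below (illustrative ω₀ and γ) evaluates the principal-value integral on a uniform grid whose nodes straddle the singularity:

```python
import numpy as np

w0, gamma = 1.0, 0.5
chi = lambda w: 1.0 / (w0**2 - w**2 - 1j * gamma * w)    # analytic in the UHP

# Grid for the PV integral; chi' is evaluated halfway between grid nodes,
# so the nu = omega singularity is never sampled and the PV emerges by symmetry.
nu = np.linspace(-200.0, 200.0, 1_000_001)
dnu = nu[1] - nu[0]

for w in (0.0 + dnu / 2, 0.7 + dnu / 2, 1.5 + dnu / 2):
    kk = np.sum(chi(nu).imag / (nu - w)) * dnu / np.pi   # chi'(w) from chi''
    assert abs(kk - chi(w).real) < 1e-3
```

The reconstruction of χ̂′ from χ̂″ agrees with the exact real part to better than a part in a thousand on this grid.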

This page titled 8.14: Appendix V- Kramers-Krönig Relations is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated
by Daniel Arovas.

8.S: Summary
References
H. Smith and H. H. Jensen, Transport Phenomena (Oxford, 1989) An outstanding, thorough, and pellucid presentation of the theory of
Boltzmann transport in classical and quantum systems.
P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic View of Statistical Physics (Cambridge, 2010) Superb, modern discussion of a
broad variety of issues and models in nonequilibrium statistical physics.
E. M. Lifshitz and L. P. Pitaevskii, Physical Kinetics (Pergamon, 1981) Volume 10 in the famous Landau and Lifshitz Course of
Theoretical Physics. Surprisingly readable, and with many applications (some advanced).
M. Kardar, Statistical Physics of Particles (Cambridge, 2007) A superb modern text, with many insightful presentations of key concepts.
Includes a very instructive derivation of the Boltzmann equation starting from the BBGKY hierarchy.
J. A. McLennan, Introduction to Non-equilibrium Statistical Mechanics (Prentice-Hall, 1989) Though narrow in scope, this book is a good
resource on the Boltzmann equation.
F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, 1987) This has been perhaps the most popular undergraduate text
since it first appeared in 1967, and with good reason. The later chapters discuss transport phenomena at an undergraduate level.
N. G. Van Kampen, Stochastic Processes in Physics and Chemistry (3rd edition, North-Holland, 2007) This is a very readable and useful text. A relaxed but meaty presentation.

Summary
∙ Boltzmann equation: The full phase space distribution for a Hamiltonian system, ϱ(φ, t), where φ = ({q_σ}, {p_σ}), satisfies ϱ̇ + φ̇·∇ϱ = 0. This is not true, however, for the one-particle distribution f(q, p, t). Rather, ḟ is related to two-, three-, and higher order particle number distributions in a chain of integrodifferential equations known as the BBGKY hierarchy. We can lump our ignorance of these other terms into a collision integral and write
$$\frac{\partial f}{\partial t} = \overset{\rm streaming}{\overbrace{-\dot{\boldsymbol r}\cdot\frac{\partial f}{\partial\boldsymbol r} - \dot{\boldsymbol p}\cdot\frac{\partial f}{\partial\boldsymbol p}}} + \overset{\rm collision}{\overbrace{\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll}}}\ .$$
In the absence of collisions, the distribution evolves solely due to the streaming term with ṙ = p/m and ṗ = −∇U_ext. If ṗ = F_ext is constant, we have the general solution
$$f(\boldsymbol r,\boldsymbol p,t) = \phi\bigg(\boldsymbol r - \frac{\boldsymbol p\,t}{m} + \frac{\boldsymbol F_{\rm ext}\,t^2}{2m}\ ,\ \boldsymbol p - \boldsymbol F_{\rm ext}\,t\bigg)\ , \tag{8.S.1}$$
valid for any initial condition f(r, p, t = 0) = ϕ(r, p). We write the convective derivative as D/Dt = ∂/∂t + ṙ·∂/∂r + ṗ·∂/∂p. Then the Boltzmann equation may be written Df/Dt = (∂f/∂t)_coll.
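That the streaming solution annihilates the collisionless equation ∂_t f + (p/m) ∂_r f + F ∂_p f = 0 can be confirmed symbolically. A minimal one-dimensional check with SymPy, using an assumed concrete Gaussian for the initial distribution ϕ:

```python
import sympy as sp

r, p, t = sp.symbols('r p t', real=True)
m, F = sp.symbols('m F', positive=True)

# An arbitrary initial distribution; a Gaussian is assumed here for concreteness
phi = lambda R, P: sp.exp(-R**2 - P**2 / 2)

# Streaming solution: f(r, p, t) = phi(r - p t/m + F t^2/2m, p - F t)
f = phi(r - p * t / m + F * t**2 / (2 * m), p - F * t)

# Collisionless Boltzmann equation in 1d: f_t + (p/m) f_r + F f_p = 0
residual = sp.diff(f, t) + (p / m) * sp.diff(f, r) + F * sp.diff(f, p)
assert sp.simplify(residual) == 0
```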
∙ Collisions: We are concerned with two types of collision processes: single-particle scattering, due to a local potential, and two-particle scattering, due to interparticle forces. Let Γ denote the set of single particle kinematic variables, Γ = (p_x, p_y, p_z) for point particles and Γ = (p, L) for diatomic molecules. Then
$$\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll} = \int\!d\Gamma'\,\Big\{w(\Gamma\,|\,\Gamma')\,f(\boldsymbol r,\Gamma';t) - w(\Gamma'\,|\,\Gamma)\,f(\boldsymbol r,\Gamma;t)\Big\} \tag{8.S.2}$$
for single particle scattering, and
$$\begin{aligned}
\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll} &= \int\!d\Gamma_1\!\int\!d\Gamma'\!\int\!d\Gamma'_1\,\Big\{w(\Gamma\Gamma_1\,|\,\Gamma'\Gamma'_1)\,f_2(\boldsymbol r,\Gamma';\boldsymbol r,\Gamma'_1;t) - w(\Gamma'\Gamma'_1\,|\,\Gamma\Gamma_1)\,f_2(\boldsymbol r,\Gamma;\boldsymbol r,\Gamma_1;t)\Big\}\\
&\approx \int\!d\Gamma_1\!\int\!d\Gamma'\!\int\!d\Gamma'_1\,\Big\{w(\Gamma\Gamma_1\,|\,\Gamma'\Gamma'_1)\,f(\boldsymbol r,\Gamma';t)\,f(\boldsymbol r,\Gamma'_1;t) - w(\Gamma'\Gamma'_1\,|\,\Gamma\Gamma_1)\,f(\boldsymbol r,\Gamma;t)\,f(\boldsymbol r,\Gamma_1;t)\Big\}
\end{aligned}$$
for two-body scattering, where f_2 is the two-body distribution, and where the approximation f_2(r, Γ; r′, Γ′; t) ≈ f(r, Γ; t) f(r′, Γ′; t) in the second line closes the equation. A quantity A(r, Γ) which is preserved by the dynamics between collisions then satisfies
$$\frac{dA}{dt} \equiv \frac{d}{dt}\int\!d^d\!r\!\int\!d\Gamma\;A(\boldsymbol r,\Gamma)\,f(\boldsymbol r,\Gamma,t) = \int\!d^d\!r\!\int\!d\Gamma\;A(\boldsymbol r,\Gamma)\,\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll}\ . \tag{8.S.3}$$
Quantities which are conserved by collisions satisfy Ȧ = 0 and are called collisional invariants. Examples include A = 1 (particle number), A = p (linear momentum, if translational invariance applies), and A = ε_p (energy).

∙ Time reversal, parity, and detailed balance: With Γ = (p, L) , we define the actions of time reversal and parity as

8.S.1 https://phys.libretexts.org/@go/page/18734
$$\Gamma^{\rm T} = (-\boldsymbol p, -\boldsymbol L) \qquad,\qquad \Gamma^{\rm P} = (-\boldsymbol p, \boldsymbol L) \qquad,\qquad \Gamma^{\rm C} = (\boldsymbol p, -\boldsymbol L)\ ,$$
where C = PT is the combined operation. Time reversal symmetry of the underlying equations of motion requires w(Γ′Γ′₁ | ΓΓ₁) = w(Γ^T Γ₁^T | Γ′^T Γ′₁^T). Under conditions of detailed balance, this leads to f⁰(Γ) f⁰(Γ₁) = f⁰(Γ′^T) f⁰(Γ′₁^T), where f⁰ is the equilibrium distribution. For systems with both P and T symmetries, w(Γ′Γ′₁ | ΓΓ₁) = w(Γ^C Γ₁^C | Γ′^C Γ′₁^C), whence w(p′, p′₁ | p, p₁) = w(p, p₁ | p′, p′₁) for point particles.
∙ Boltzmann’s H-theorem: Let h(r, t) = ∫dΓ f(r, Γ, t) ln f(r, Γ, t). Invoking the Boltzmann equation, it can be shown that ∂h/∂t ≤ 0, which means dH/dt ≤ 0, where H(t) = ∫d^d r h(r, t) is Boltzmann’s H-function. h(r, t) is everywhere decreasing or constant, due to collisions.
∙ Weakly inhomogeneous gas: Under equilibrium conditions, f⁰ can be a function only of collisional invariants, and takes the Gibbs form f⁰(r, p) = C e^{(μ + V·p − ε_Γ)/k_B T}. Assume now that μ, V, and T are all weakly dependent on r and t. f⁰ then describes a local equilibrium and as such is annihilated by the collision term in the Boltzmann equation, but not by the streaming term. Accordingly, we seek a solution f = f⁰ + δf. A lengthy derivation results in
$$\bigg\{\frac{\varepsilon_\Gamma - h}{T}\,\boldsymbol v\cdot\nabla T + m\,v_\alpha v_\beta\,\mathcal{Q}_{\alpha\beta} - \frac{\varepsilon_\Gamma - h + T c_p}{c_V/k_{\rm B}}\,\nabla\!\cdot\!\boldsymbol V - \boldsymbol F_{\rm ext}\cdot\boldsymbol v\bigg\}\,\frac{f^0}{k_{\rm B} T} + \frac{\partial\,\delta\!f}{\partial t} = \bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll}\ , \tag{8.S.4}$$

where v = ∂ε/∂p is the particle velocity, h is the enthalpy per particle, 𝒬_αβ = ½(∂V_α/∂x_β + ∂V_β/∂x_α), and F_ext is an external force. For an ideal gas, h = c_p T. The RHS is to be evaluated to first order in δf. The simplest model for the collision integral is the relaxation time approximation, where (∂f/∂t)_coll = −δf/τ. Note that this form does not preserve any collisional invariants. The scattering time is obtained from the relation n v̄_rel σ τ = 1, where σ is the two particle total scattering cross section and v̄_rel is the average relative speed of a pair of particles. This says that there is on average one collision within a tube of cross sectional area σ and length v̄_rel τ. For the Maxwellian distribution, v̄_rel = √2 v̄ = √(16 k_B T/πm), so τ(T) ∝ T^{−1/2}. The mean free path is defined as ℓ = v̄τ = 1/(√2 nσ).
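Plugging in numbers gives a feel for these scales. The sketch below evaluates ℓ and τ for a nitrogen-like gas at room temperature and atmospheric pressure; the molecular diameter and mass are assumed textbook-typical values, not quantities taken from the text:

```python
import numpy as np

kB = 1.380649e-23            # J/K
T, P = 300.0, 101325.0       # room temperature, 1 atm
d = 3.7e-10                  # m, assumed N2 hard-sphere diameter
m = 4.65e-26                 # kg, assumed N2 molecular mass

n = P / (kB * T)                             # number density from ideal gas law
sigma = np.pi * d**2                         # total hard-sphere cross section
vbar = np.sqrt(8 * kB * T / (np.pi * m))     # mean speed of the Maxwellian
ell = 1.0 / (np.sqrt(2) * n * sigma)         # mean free path
tau = ell / vbar                             # scattering time

assert 1e-8 < ell < 2e-7     # tens of nanometers
assert 1e-11 < tau < 1e-9    # a fraction of a nanosecond
```

The mean free path comes out near 70 nm, hundreds of times the molecular diameter, which is what justifies the dilute-gas treatment of collisions.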

∙ Transport coefficients: Assuming F_ext = 0, 𝒬_αβ = 0, and steady state, Eq. [bwig] yields
$$\delta\!f = -\frac{\tau\,(\varepsilon - c_p T)}{k_{\rm B} T^2}\,(\boldsymbol v\cdot\nabla T)\,f^0\ . \tag{8.S.5}$$
The energy current is given by
$$j^\alpha_\varepsilon = \int\!d\Gamma\;\varepsilon_\Gamma\,v^\alpha\,\delta\!f = -\overset{{\rm thermal\ conductivity}\ \kappa^{\alpha\beta}}{\overbrace{\frac{n\tau}{k_{\rm B} T^2}\,\big\langle v^\alpha v^\beta\,\varepsilon_\Gamma\,(\varepsilon_\Gamma - c_p T)\big\rangle}}\;\frac{\partial T}{\partial x^\beta}\ . \tag{8.S.6}$$
For a monatomic gas, one finds κ^{αβ} = κ δ^{αβ} with κ(T) = (π/8) nℓv̄ c_p ∝ T^{1/2}. A similar result follows by considering any intensive quantity ϕ which is spatially dependent through the temperature T(r). The ϕ-current across the surface z = 0 is
$$\boldsymbol j_\phi = n\,\hat{\boldsymbol z}\!\!\int\limits_{v_z>0}\!\!d^3\!v\;P(\boldsymbol v)\,v_z\,\phi(z - \ell\cos\theta) + n\,\hat{\boldsymbol z}\!\!\int\limits_{v_z<0}\!\!d^3\!v\;P(\boldsymbol v)\,v_z\,\phi(z + \ell\cos\theta) = -\tfrac{1}{3}\,n\,\bar v\,\ell\,\frac{\partial\phi}{\partial z}\,\hat{\boldsymbol z}\ . \tag{8.S.7}$$
Thus, j_ϕ = −K∇T, with K = (1/3) nℓv̄ (∂ϕ/∂T) the associated transport coefficient. If ϕ = ⟨ε_Γ⟩, then ∂ϕ/∂T = c_p, yielding κ = (1/3) nℓv̄ c_p. If ϕ = ⟨p_x⟩, then j^z_{p_x} = Π_{xz} = −(1/3) nmℓv̄ ∂V_x/∂z ≡ −η ∂V_x/∂z, where η is the shear viscosity. Using the Boltzmann equation in the relaxation time approximation, one obtains η = (π/8) nmℓv̄. From κ and η, we can form a dimensionless quantity Pr = η c_p/mκ, known as the Prandtl number. Within the relaxation time approximation, Pr = 1. Most monatomic gases have Pr ≈ 2/3.

∙ Linearized Boltzmann equation: To go beyond the phenomenological relaxation time approximation, one must grapple with the collision integral,
$$\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll} = \int\!d^3\!p_1\!\int\!d^3\!p'\!\int\!d^3\!p'_1\;w(\boldsymbol p',\boldsymbol p'_1\,|\,\boldsymbol p,\boldsymbol p_1)\,\Big\{f(\boldsymbol p')\,f(\boldsymbol p'_1) - f(\boldsymbol p)\,f(\boldsymbol p_1)\Big\}\ , \tag{8.S.8}$$
which is a nonlinear functional of the distribution f(p, t) (we suppress the t index here). Writing f(p) = f⁰(p) + f⁰(p) ψ(p), we have (∂f/∂t)_coll = f⁰(p) L̂ψ + O(ψ²), with
$$\hat L\psi(\boldsymbol p) = \int\!d^3\!p_1\!\int\!d\Omega\;|\boldsymbol v - \boldsymbol v_1|\,\frac{\partial\sigma}{\partial\Omega}\,f^0(\boldsymbol p_1)\,\Big\{\psi(\boldsymbol p') + \psi(\boldsymbol p'_1) - \psi(\boldsymbol p) - \psi(\boldsymbol p_1)\Big\}\ . \tag{8.S.9}$$

The linearized Boltzmann equation (LBE) then takes the form (L̂ − ∂_t)ψ = Y, where
$$Y = \frac{1}{k_{\rm B} T}\bigg\{\frac{\varepsilon(\boldsymbol p) - \tfrac{5}{2}k_{\rm B} T}{T}\,\boldsymbol v\cdot\nabla T + m\,v_\alpha v_\beta\,\mathcal{Q}_{\alpha\beta} - \frac{k_{\rm B}\,\varepsilon(\boldsymbol p)}{c_V}\,\nabla\!\cdot\!\boldsymbol V - \boldsymbol F\cdot\boldsymbol v\bigg\} \tag{8.S.10}$$
for point particles. To solve the LBE, we must invert the operator L̂ − ∂_t. Various useful properties follow from defining the inner product ⟨ψ₁|ψ₂⟩ ≡ ∫d³p f⁰(p) ψ₁(p) ψ₂(p), such as the self-adjointness of L̂: ⟨ψ₁|L̂ψ₂⟩ = ⟨L̂ψ₁|ψ₂⟩. We then have L̂|ϕₙ⟩ = −λₙ|ϕₙ⟩, with ⟨ϕₘ|ϕₙ⟩ = δₘₙ and real eigenvalues λₙ. There are five zero eigenvalues corresponding to the collisional invariants:
$$\phi_1(\boldsymbol p) = \frac{1}{\sqrt{n}} \quad,\quad \phi_{2,3,4}(\boldsymbol p) = \frac{p_\alpha}{\sqrt{n\,m\,k_{\rm B} T}} \quad,\quad \phi_5(\boldsymbol p) = \sqrt{\frac{2}{3n}}\,\bigg(\frac{\varepsilon(\boldsymbol p)}{k_{\rm B} T} - \frac{3}{2}\bigg)\ . \tag{8.S.11}$$

When Y = 0, the formal solution to ∂ψ/∂t = L̂ψ is ψ(p, t) = Σₙ Cₙ ϕₙ(p) e^{−λₙt}. Aside from the collisional invariants, all the eigenvalues λₙ must be positive, corresponding to relaxation to the equilibrium state. One can check that the particle, energy, and heat currents are given by j = ⟨v|ψ⟩, j_ε = ⟨vε|ψ⟩, and j_q = ⟨v(ε − μ)|ψ⟩.

In steady state, the solution to L̂ψ = Y is ψ = L̂⁻¹Y. This is valid provided Y is orthogonal to each of the collisional invariants, in which case
$$\psi(\boldsymbol p) = \sum_{n\notin{\rm CI}} \lambda_n^{-1}\,\langle\,\phi_n\,|\,Y\,\rangle\,\phi_n(\boldsymbol p)\ . \tag{8.S.12}$$

Once we have |ψ⟩, we may obtain the various transport coefficients by computing the requisite currents. For example, to find the thermal conductivity κ and shear viscosity η,
$$\kappa\,:\quad Y = \frac{1}{k_{\rm B} T^2}\,\frac{\partial T}{\partial x}\,X_\kappa\ ,\quad X_\kappa \equiv \big(\varepsilon - \tfrac{5}{2}\,k_{\rm B} T\big)\,v_x \quad\Rightarrow\quad \kappa = -\frac{\langle\,X_\kappa\,|\,\psi\,\rangle}{\partial T/\partial x}$$
$$\eta\,:\quad Y = \frac{m}{k_{\rm B} T}\,\frac{\partial V_x}{\partial y}\,X_\eta\ ,\quad X_\eta \equiv v_x\,v_y \quad\Rightarrow\quad \eta = -\frac{m\,\langle\,X_\eta\,|\,\psi\,\rangle}{\partial V_x/\partial y}\ .$$

∙ Variational approach: The Schwarz inequality, ⟨ψ|−L̂|ψ⟩ · ⟨ϕ|Ĥ|ϕ⟩ ≥ ⟨ϕ|Ĥ|ψ⟩², holds for the positive semidefinite operator Ĥ ≡ −L̂. One therefore has
$$\kappa \geq \frac{1}{k_{\rm B} T^2}\,\frac{\langle\,\phi\,|\,X_\kappa\,\rangle^2}{\langle\,\phi\,|\,\hat H\,|\,\phi\,\rangle} \qquad,\qquad \eta \geq \frac{m^2}{k_{\rm B} T}\,\frac{\langle\,\phi\,|\,X_\eta\,\rangle^2}{\langle\,\phi\,|\,\hat H\,|\,\phi\,\rangle}\ . \tag{8.S.13}$$
Using variational functions ϕ_κ = (ε − (5/2) k_B T) v_x and ϕ_η = v_x v_y, one finds, after tedious calculations,
$$\kappa \geq \frac{75}{64}\,\frac{k_{\rm B}}{\sqrt{\pi}\,d^2}\,\bigg(\frac{k_{\rm B} T}{m}\bigg)^{\!1/2} \qquad,\qquad \eta \geq \frac{5}{16}\,\frac{(m\,k_{\rm B} T)^{1/2}}{\sqrt{\pi}\,d^2}\ . \tag{8.S.14}$$
Taking the lower limit in each case, we obtain a Prandtl number Pr = η c_p/mκ = 2/3, which is close to what is observed for monatomic gases.
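Evaluating the viscosity bound numerically shows it lands at the right order of magnitude. The sketch below uses assumed values for argon at 300 K (the hard-sphere diameter d is an illustrative choice); note that the bound, like η itself in kinetic theory, is independent of the density:

```python
import numpy as np

# Lower bound: eta >= 5 (m kB T)^{1/2} / (16 sqrt(pi) d^2)
kB = 1.380649e-23    # J/K
T = 300.0            # K
m = 6.63e-26         # kg, argon atomic mass (assumed)
d = 3.4e-10          # m, argon hard-sphere diameter (assumed)

eta_bound = 5 * np.sqrt(m * kB * T) / (16 * np.sqrt(np.pi) * d**2)
assert 1e-5 < eta_bound < 5e-5   # ~2.5e-5 Pa s, the observed order of magnitude
```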
∙ Quantum transport: For quantum systems, the local equilibrium distribution is of the Bose-Einstein or Fermi-Dirac form,
$$f^0(\boldsymbol r,\boldsymbol k,t) = \bigg\{\exp\bigg(\frac{\varepsilon(\boldsymbol k) - \mu(\boldsymbol r,t)}{k_{\rm B} T(\boldsymbol r,t)}\bigg) \mp 1\bigg\}^{\!-1}\ , \tag{8.S.15}$$
with k = p/ℏ, and
$$\bigg(\frac{\partial f}{\partial t}\bigg)_{\!\rm coll} = \int\!\frac{d^3\!k_1}{(2\pi)^3}\!\int\!\frac{d^3\!k'}{(2\pi)^3}\!\int\!\frac{d^3\!k'_1}{(2\pi)^3}\;w\,\Big\{f'f'_1\,(1\pm f)(1\pm f_1) - f f_1\,(1\pm f')(1\pm f'_1)\Big\}\ , \tag{8.S.16}$$
where w = w(k, k₁ | k′, k′₁), f = f(k), f₁ = f(k₁), f′ = f(k′), and f′₁ = f(k′₁), and where we have assumed time-reversal and parity symmetry. The most important application is to electron transport in metals and semiconductors, in which case f⁰ is the Fermi distribution.

With f = f⁰ + δf, one has, within the relaxation time approximation,
$$\frac{\partial\,\delta\!f}{\partial t} - \frac{e}{\hbar c}\,\boldsymbol v\times\boldsymbol B\cdot\frac{\partial\,\delta\!f}{\partial\boldsymbol k} - \boldsymbol v\cdot\bigg[e\,\boldsymbol{\mathcal E} + \frac{\varepsilon-\mu}{T}\,\nabla T\bigg]\frac{\partial f^0}{\partial\varepsilon} = -\frac{\delta\!f}{\tau}\ , \tag{8.S.17}$$
where ℰ = −∇(ϕ − μ/e) = E − e⁻¹∇μ is the gradient of the ‘electrochemical potential’ ϕ − e⁻¹μ. For steady state transport with B = 0, one has
$$\begin{aligned}
\boldsymbol j &= -2e\!\int\limits_{\hat\Omega}\!\frac{d^3\!k}{(2\pi)^3}\;\boldsymbol v\,\delta\!f \equiv L_{11}\,\boldsymbol{\mathcal E} - L_{12}\,\nabla T\\
\boldsymbol j_q &= 2\!\int\limits_{\hat\Omega}\!\frac{d^3\!k}{(2\pi)^3}\;(\varepsilon-\mu)\,\boldsymbol v\,\delta\!f \equiv L_{21}\,\boldsymbol{\mathcal E} - L_{22}\,\nabla T
\end{aligned}$$
where L₁₁^{αβ} = e² J₀^{αβ}, L₂₁^{αβ} = T L₁₂^{αβ} = −e J₁^{αβ}, and L₂₂^{αβ} = (1/T) J₂^{αβ}, with
$$J_n^{\alpha\beta} \equiv \frac{1}{4\pi^3\hbar}\int\!d\varepsilon\;\tau(\varepsilon)\,(\varepsilon-\mu)^n\,\bigg(\!-\frac{\partial f^0}{\partial\varepsilon}\bigg)\!\int\!dS_\varepsilon\;\frac{v^\alpha\,v^\beta}{|v|}\ . \tag{8.S.18}$$
These results entail
$$\boldsymbol{\mathcal E} = \rho\,\boldsymbol j + Q\,\nabla T \qquad,\qquad \boldsymbol j_q = \sqcap\,\boldsymbol j - \kappa\,\nabla T\ , \tag{8.S.19}$$
or, in terms of the Jₙ,
$$\rho = \frac{1}{e^2}\,J_0^{-1} \quad,\quad Q = -\frac{1}{eT}\,J_0^{-1}\,J_1 \quad,\quad \sqcap = -\frac{1}{e}\,J_0^{-1}\,J_1 \quad,\quad \kappa = \frac{1}{T}\,\Big(J_2 - J_1\,J_0^{-1}\,J_1\Big)\ . \tag{8.S.20}$$
These results describe the following physical phenomena:
(∇T = B = 0): An electrical current j will generate an electric field ℰ = ρj, where ρ is the electrical resistivity.
(∇T = B = 0): An electrical current j will generate a heat current j_q = ⊓j, where ⊓ is the Peltier coefficient.
(j = B = 0): A temperature gradient ∇T gives rise to a heat current j_q = −κ∇T, where κ is the thermal conductivity.
(j = B = 0): A temperature gradient ∇T gives rise to an electric field ℰ = Q∇T, where Q is the Seebeck coefficient.
For a parabolic band with effective electron mass m*, one finds
$$\rho = \frac{m^*}{n e^2 \tau} \quad,\quad Q = -\frac{\pi^2 k_{\rm B}^2\,T}{2e\,\varepsilon_{\rm F}} \quad,\quad \kappa = \frac{\pi^2 n\tau\,k_{\rm B}^2\,T}{3m^*}$$
with ⊓ = TQ, where ε_F is the Fermi energy. The ratio κ/σT = (π²/3)(k_B/e)² = 2.45 × 10⁻⁸ V² K⁻² is then predicted to be universal, a result known as the Wiedemann-Franz law. This also predicts all metals to have negative thermopower, which is not the case. In the presence of an external magnetic field B, additional transport effects arise:
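The Lorenz number quoted above follows directly from the fundamental constants; a minimal check:

```python
import numpy as np

kB = 1.380649e-23      # J/K
e = 1.602176634e-19    # C

# Wiedemann-Franz: kappa/(sigma T) = (pi^2/3)(kB/e)^2
L0 = (np.pi**2 / 3) * (kB / e)**2
assert abs(L0 - 2.44e-8) < 1e-10    # ~2.44e-8 V^2 K^-2
```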
(∂T/∂x = ∂T/∂y = j_y = 0): An electrical current j = j_x x̂ and a field B = B_z ẑ yield an electric field ℰ. The Hall coefficient is R_H = ℰ_y / j_x B_z.
(∂T/∂x = j_y = j_{q,y} = 0): An electrical current j = j_x x̂ and a field B = B_z ẑ yield a temperature gradient ∂T/∂y. The Ettingshausen coefficient is P = (∂T/∂y) / j_x B_z.
(j_x = j_y = ∂T/∂y = 0): A temperature gradient ∇T = (∂T/∂x) x̂ and a field B = B_z ẑ yield an electric field ℰ. The Nernst coefficient is Λ = ℰ_y / (∂T/∂x) B_z.
(j_x = j_y = ℰ_y = 0): A temperature gradient ∇T = (∂T/∂x) x̂ and a field B = B_z ẑ yield an orthogonal gradient ∂T/∂y. The Righi-Leduc coefficient is 𝓛 = (∂T/∂y) / (∂T/∂x) B_z.

∙ Stochastic processes: Stochastic processes involve a random element, hence they are not wholly deterministic. The simplest example is the Langevin equation for Brownian motion, ṗ + γp = F + η(t), where p is a particle’s momentum, γ a damping rate due to friction, F an external force, and η(t) a stochastic random force. We can integrate this first order equation to obtain
$$p(t) = p(0)\,e^{-\gamma t} + \frac{F}{\gamma}\big(1 - e^{-\gamma t}\big) + \int\limits_0^t\!ds\;\eta(s)\,e^{\gamma(s-t)}\ . \tag{8.S.21}$$

We assume that the random force η(t) has zero mean, and furthermore that
$$\langle\eta(s)\,\eta(s')\rangle = \phi(s-s') \approx \Gamma\,\delta(s-s')\ , \tag{8.S.22}$$
in which case one finds ⟨p²(t)⟩ = ⟨p(t)⟩² + (Γ/2γ)(1 − e^{−2γt}). If there is no external force, we expect the particle thermalizes at long times, ⟨p²/2m⟩ = ½ k_B T. This fixes Γ = 2γm k_B T, where m is the particle’s mass. One can integrate again to find the position. At late times t ≫ γ⁻¹, one finds ⟨x(t)⟩ = const. + Ft/γm, corresponding to a mean velocity ⟨p/m⟩ = F/γm. The RMS fluctuations in position, however, grow as
$$\langle x^2(t)\rangle - \langle x(t)\rangle^2 = \frac{2 k_{\rm B} T\,t}{\gamma m} \equiv 2Dt\ , \tag{8.S.23}$$
where D = k_B T/γm is the diffusion constant. Thus, after the memory of the initial conditions is lost (t ≫ γ⁻¹), the mean position advances linearly in time due to the external force, and the RMS fluctuations in position also increase linearly.
∙ Fokker-Planck equation: Suppose x(t) is a stochastic variable, and define
$$\delta x(t) \equiv x(t+\delta t) - x(t)\ . \tag{8.S.24}$$
Furthermore, assume ⟨δx(t)⟩ = F₁(x(t)) δt and ⟨[δx(t)]²⟩ = F₂(x(t)) δt, but that ⟨[δx(t)]ⁿ⟩ = O(δt²) for n > 2. One can then show that the probability density P(x, t) = ⟨δ(x − x(t))⟩ satisfies the Fokker-Planck equation,
$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\Big[F_1(x)\,P(x,t)\Big] + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\Big[F_2(x)\,P(x,t)\Big]\ . \tag{8.S.25}$$
For Brownian motion, F₁(x) = F/γm ≡ u and F₂(x) = 2D. The resulting Fokker-Planck equation is then P_t = −u P_x + D P_xx, where P_t = ∂P/∂t, P_x = ∂P/∂x, and P_xx = ∂²P/∂x². The Galilean transformation x → x − ut then results in P_t = D P_xx, which is known as the diffusion equation, a general solution to which is given by
$$P(x,t) = \int\limits_{-\infty}^{\infty}\!dx'\;K(x-x',t-t')\,P(x',t')\ ,$$
where
$$K(\Delta x,\Delta t) = (4\pi D\,\Delta t)^{-1/2}\,e^{-(\Delta x)^2/4D\Delta t} \tag{8.S.26}$$
is the diffusion kernel. Thus, Δx_RMS = √(2D Δt).
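Both defining properties of the diffusion kernel, that it solves P_t = D P_xx and that it conserves total probability, can be confirmed symbolically with SymPy:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
t, D = sp.symbols('t D', positive=True)

# Diffusion kernel K(x, t) = (4 pi D t)^{-1/2} exp(-x^2/4Dt)
K = sp.exp(-x**2 / (4 * D * t)) / sp.sqrt(4 * sp.pi * D * t)

# It satisfies the diffusion equation K_t = D K_xx ...
assert sp.simplify(sp.diff(K, t) - D * sp.diff(K, x, 2)) == 0

# ... and carries unit total probability at all times
assert sp.simplify(sp.integrate(K, (x, -sp.oo, sp.oo)) - 1) == 0
```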

Endnotes
1. Indeed, any arbitrary function of p alone would be a solution. Ultimately, we require some energy exchanging processes, such as
collisions, in order for any initial nonequilibrium distribution to converge to the Boltzmann distribution.↩
2. Recall from classical mechanics the definition of the Poisson bracket, {A, B} = (∂A/∂r)·(∂B/∂p) − (∂B/∂r)·(∂A/∂p). Then from Hamilton’s equations ṙ = ∂H/∂p and ṗ = −∂H/∂r, where H(p, r, t) is the Hamiltonian, we have dA/dt = {A, H}. Invariants have zero Poisson bracket with the Hamiltonian.↩
3. See Lifshitz and Pitaevskii, Physical Kinetics, §2.↩
4. The function g(x) = x ln x − x + 1 satisfies g′(x) = ln x, hence g′(x) < 0 on the interval x ∈ [0, 1) and g′(x) > 0 on x ∈ (1, ∞). Thus, g(x) monotonically decreases from g(0) = 1 to g(1) = 0, and then monotonically increases to g(∞) = ∞, never becoming negative.↩
5. In the chapter on thermodynamics, we adopted a slightly different definition of c_p as the heat capacity per mole. In this chapter c_p is the heat capacity per particle.↩


6. Here we abbreviate QDC for ‘quick and dirty calculation’ and BRT for ‘Boltzmann equation in the relaxation time approximation’.↩
7. The difference is trivial, since p = mv .↩
8. See the excellent discussion in the book by Krapivsky, Redner, and Ben-Naim, cited in §8.1.↩
9. The requirements of an inner product ⟨f |g⟩ are symmetry, linearity, and non-negative definiteness.↩
10. We neglect interband scattering here, which can be important in practical applications, but which is beyond the scope of these notes.↩
11. The transition rate from |k′⟩ to |k⟩ is proportional to the matrix element and to the product f′(1 − f). The reverse process is proportional to f(1 − f′). Subtracting these factors, one obtains f′ − f, and therefore the nonlinear terms felicitously cancel in Equation [qobc].↩

12. In this section we use j to denote electrical current, rather than particle number current as before.↩
13. To create a refrigerator, stick the cold junction inside a thermally insulated box and the hot junction outside the box.↩
14. Note that it is E ⋅ j and not ℰ ⋅ j which is the source term in the energy continuity equation.↩
15. Remember that physically the fixed quantities are temperature and total carrier number density (or charge density, in the case of electron
and hole bands), and not temperature and chemical potential. An equation of state relating n , μ , and T is then inverted to obtain μ(n, T ),
so that all results ultimately may be expressed in terms of n and T .↩
16. The cgs unit of viscosity is the Poise (P). 1 P = 1 g/cm⋅s .↩
17. We further demand β = 0 and P
n=0
(t) = 0 at all times.↩
−1

18. A discussion of measure for functional integrals is found in R. P. Feynman and A. R. Hibbs, Quantum Mechanics and Path Integrals.↩
19. In this section, we use the notation χ̂(ω) for the susceptibility, rather than Ĝ(ω).↩

This page titled 8.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

CHAPTER OVERVIEW

9: Renormalization
9.1: The Program of Renormalization
9.2: Real Space Renormalization
9.3: Block Spin Transformation
9.4: Scaling Variables
9.5: RSRG on Hierarchical Lattices
9.S: Summary

Thumbnail: RG flow for the Ising model with nearest and next-nearest neighbor interactions.

This page titled 9: Renormalization is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

9.1: The Program of Renormalization
A statistical mechanical system is defined by a set of degrees of freedom and by a set of coupling constants {K_α}. The degrees of freedom can be discrete, such as Ising spins σ_i, or continuous, such as a field ϕ(r). Additionally, each such system possesses a microscopic length scale ℓ. For discrete, lattice-based systems, this length scale is simply the lattice spacing: ℓ = a. For continuous systems, we can define a microscopic length scale by imposing a cutoff Λ on the wavevectors we integrate over in all Fourier transforms. That is, we replace

$$\int\!\frac{d^d\!k}{(2\pi)^d}\,F(k)\ \longrightarrow\ \int\!\frac{d^d\!k}{(2\pi)^d}\,F(k)\,g_\Lambda(k)\ , \qquad (9.1.1)$$

where F(k) is any function and g_Λ(k) is the cutoff function. The simplest such case to imagine is a sharp cutoff which is isotropic in wavevector, g_Λ(k) = Θ(Λ − |k|). Other cutoff schemes, however, are possible, including 'soft cutoffs' where g_Λ(k) is smooth. The microscopic length scale is then ℓ ∼ Λ⁻¹, which is the smallest distance in real space over which the system can independently fluctuate.
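The two cutoff schemes can be compared concretely. The sketch below is illustrative (function names are mine, units chosen so that Λ = 1): it contrasts the sharp Θ-function cutoff named above with a Gaussian standing in for any smooth alternative.

```python
import numpy as np

def g_sharp(k, Lam=1.0):
    """Sharp isotropic cutoff: g_Lambda(k) = Theta(Lambda - |k|)."""
    return np.where(np.abs(k) < Lam, 1.0, 0.0)

def g_soft(k, Lam=1.0):
    """A smooth ('soft') cutoff; a Gaussian is one common choice."""
    return np.exp(-(k / Lam) ** 2)

# Modes with k << Lambda pass essentially unchanged, modes with k >> Lambda
# are suppressed; the two schemes differ only in the crossover near k ~ Lambda.
print(g_sharp(0.1), g_soft(0.1))
print(g_sharp(3.0), g_soft(3.0))
```

Both schemes act identically in the two limits k ≪ Λ and k ≫ Λ; the choice of scheme matters only for intermediate wavevectors.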
The idea behind renormalization is that we can successively winnow degrees of freedom from a system in some exact or approximate way, and in so doing we generate a new version of the system, at a different length scale ℓ′ > ℓ, and with different couplings {K′_α}. We then iterate this procedure. The result is a set of equations which tell us how the couplings behave under a change of the microscopic length scale. As we shall see, the fixed points of this procedure – where couplings do not change under a change of length scale – are critical points. Such a fixed point is defined by a set of couplings {K*_α}.

If we denote by R_b the renormalization procedure

$$R_b\big(\ell\,,\,\{K_\alpha\}\big) = \big(\ell'\,,\,\{K'_\alpha\}\big)\ , \qquad (9.1.2)$$

where ℓ′ = bℓ, then we have the composition law R_{b′} R_b = R_{b′b}. The set of transformations {R_b} is collectively referred to as the renormalization group (RG) because of this mathematical structure. It is somewhat of a misnomer, however, since the transformations are only defined for b ≥ 1, which means that there is no inverse operation, and hence no true group structure1. Nevertheless we shall use the RG terminology because it has become universally accepted in the literature.
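Anticipating the d = 1 Ising decimation result of §9.2 (one b = 2 step acts as tanh K → tanh²K), the composition law can be checked numerically. This is a sketch with names of my own choosing:

```python
import math

def decimate(K):
    """One b = 2 decimation step for the d = 1 Ising chain:
    tanh K' = tanh^2 K, i.e. K' = atanh(tanh^2 K)."""
    return math.atanh(math.tanh(K) ** 2)

K0 = 0.7
# Two successive b = 2 steps send tanh K -> (tanh K)^4, which is exactly
# one b = 4 step: the scale factors compose multiplicatively, R_b' R_b = R_b'b.
K_two_steps = decimate(decimate(K0))
K_one_b4_step = math.atanh(math.tanh(K0) ** 4)
print(abs(K_two_steps - K_one_b4_step) < 1e-12)  # True
```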

This page titled 9.1: The Program of Renormalization is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by
Daniel Arovas.

9.2: Real Space Renormalization
As alluded to previously, there are two different classes of renormalization. One class, called real space renormalization group
(RSRG), eliminates local lattice-based degrees of freedom at each step in the RG process. The second class, called momentum
space renormalization group (MSRG), is implemented by systematically lowering the cutoff Λ in the wavevector integrals. It turns
out that the RSRG process, for reasons we shall see, is uncontrolled, and for ‘professional’ results one resorts to MSRG.
Nevertheless RSRG provides us with perhaps the most vivid and intuitive understanding of what renormalization is all about, so we
shall begin there.

Figure 9.2.1 : Real space renormalization of a one-dimensional lattice by ‘integrating out’ the degrees of freedom on half the lattice
sites.

RSRG for the Ising chain


Consider a d = 1 Ising model with Hamiltonian

$$\hat H = -J \sum_n \sigma_n\,\sigma_{n+1}\ . \qquad (9.2.1)$$

Our goal is to compute the partition function $Z = \mathrm{Tr}\,e^{-\beta\hat H}$. We do this by first tracing over the degrees of freedom on all the odd index sites. We have

$$\begin{aligned}
\sum_{\sigma_{2n+1}} e^{\beta J\sigma_{2n}\sigma_{2n+1}}\,e^{\beta J\sigma_{2n+1}\sigma_{2n+2}}
&= e^{\beta J(\sigma_{2n}+\sigma_{2n+2})} + e^{-\beta J(\sigma_{2n}+\sigma_{2n+2})} \\
&= \begin{cases} 2\cosh(2\beta J) & \text{if } |\,\sigma_{2n}\,\sigma_{2n+2}\,\rangle = |\uparrow\uparrow\,\rangle \text{ or } |\downarrow\downarrow\,\rangle \\ 2 & \text{if } |\,\sigma_{2n}\,\sigma_{2n+2}\,\rangle = |\uparrow\downarrow\,\rangle \text{ or } |\downarrow\uparrow\,\rangle \end{cases} \\
&\equiv e^{\beta J'\sigma_{2n}\sigma_{2n+2}}\,e^{\beta\Delta\varepsilon}\ ,
\end{aligned}$$

where

$$e^{\beta J'}\,e^{\beta\Delta\varepsilon} = 2\cosh(2\beta J) \qquad,\qquad e^{-\beta J'}\,e^{\beta\Delta\varepsilon} = 2\ ,$$

from which we obtain

$$e^{2\beta J'} = \cosh(2\beta J) \qquad,\qquad e^{\beta\Delta\varepsilon} = 2\sqrt{\cosh(2\beta J)}\ .$$

Thus, if we write our original Hamiltonian as

$$\frac{\hat H}{k_B T} = \sum_n \big(c - K\,\sigma_n\,\sigma_{n+1}\big)\ , \qquad (9.2.2)$$

where K = βJ, then the RSRG transformation in which we trace out over every other site results in

$$c' = 2c - \ln 2 - \tfrac{1}{2}\ln\cosh(2K) \qquad,\qquad K' = \tfrac{1}{2}\ln\cosh(2K) \qquad,\qquad a' = 2a\ ,$$

where the last equation describes the change in the lattice constant. The second of these equations may be written

$$\tanh K' = \tanh^2\! K\ . \qquad (9.2.3)$$

Suppose we perform this procedure n times. We then have, with ℓ₀ ≡ a,

$$\ell_n = 2^n\,\ell_0 \qquad,\qquad \ln\tanh K_n = 2^n\,\ln\tanh K_0\ . \qquad (9.2.4)$$
At this point, we can imagine ℓ to be a continuous variable. We can now write down the behavior of the coupling constant K as a function of the microscopic length scale ℓ:

$$\tanh K(\ell) = \big(\tanh K_0\big)^{\ell/\ell_0}\ . \qquad (9.2.5)$$

Let's define g ≡ tanh K. We then have the RG flow equation

$$\beta(g) \equiv \frac{\partial \ln g}{\partial \ln b} = b\,\ln\tanh K_0 = \ln g < 0\ , \qquad (9.2.6)$$

where b = ℓ/ℓ₀. Thus, as ℓ increases, ln g flows to increasingly negative values, meaning g → 0, which entails K → 0. So as ℓ flows to larger and larger values, the coupling K(ℓ) gets smaller and smaller.
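This flow is easy to iterate directly. The sketch below (variable names are mine) checks the closed form of Equation 9.2.4 against repeated decimation and shows K shrinking toward the trivial fixed point K* = 0:

```python
import math

def rg_step(K):
    """One decimation step: K' = (1/2) ln cosh(2K), equivalently tanh K' = tanh^2 K."""
    return 0.5 * math.log(math.cosh(2.0 * K))

K = 1.5                    # start from a fairly strong coupling
g0 = math.tanh(K)          # g = tanh K at the initial scale
for n in range(1, 6):
    K = rg_step(K)
    # Closed form from Equation 9.2.4: ln tanh K_n = 2^n ln tanh K_0
    K_closed_form = math.atanh(g0 ** (2 ** n))
    assert abs(K - K_closed_form) < 1e-9
    print(n, round(K, 6))
# The printed couplings decrease monotonically, consistent with beta(g) = ln g < 0.
```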

Figure 9.2.2 : Real space renormalization of a two-dimensional square lattice.

Two-dimensional square lattice


Consider next a RSRG transformation of the two-dimensional square lattice Ising model. As depicted in Figure 9.2.2, the square lattice is bipartite, consisting of two interpenetrating √2 × √2 square sublattices. Let's try to do the same as for the one-dimensional Ising model and trace out over the degrees of freedom of one of the sublattices. To this end, let us trace out over a single site, which has four neighbors on the square lattice, as shown in Figure 9.2.3. We have2

$$\begin{aligned}
\sum_{\sigma_0} e^{K\sigma_0(\sigma_1+\sigma_2+\sigma_3+\sigma_4)}
&= e^{K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} + e^{-K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} \\
&\equiv e^{K'(\sigma_1\sigma_2+\sigma_2\sigma_3+\sigma_3\sigma_4+\sigma_4\sigma_1)}\,e^{\tilde K'(\sigma_1\sigma_3+\sigma_2\sigma_4)}\,e^{L'\sigma_1\sigma_2\sigma_3\sigma_4}\,e^{\alpha'}\ .
\end{aligned}$$

It should be clear that K̃′ = K′, because the spin σ₀ couples to the sum (σ₁ + σ₂ + σ₃ + σ₄), so there can be no distinction between induced nearest neighbor interactions (K′σ₁σ₂) and induced next-nearest neighbor interactions (K̃′σ₁σ₃) at this stage. By setting |σ₁σ₂σ₃σ₄⟩ to |↑↑↑↑⟩, |↑↑↑↓⟩, and |↑↑↓↓⟩, respectively, we obtain the relations

$$2\cosh(4K) = e^{6K' + L' + \alpha'} \qquad,\qquad 2\cosh(2K) = e^{-L' + \alpha'} \qquad,\qquad 2 = e^{-2K' + L' + \alpha'}\ .$$
The solution is

$$K' = \tfrac{1}{8}\ln\cosh(4K) \qquad,\qquad L' = \tfrac{1}{8}\ln\cosh(4K) - \tfrac{1}{2}\ln\cosh(2K) \qquad,\qquad e^{\alpha'} = 2\cosh^{1/8}(4K)\,\cosh^{1/2}(2K)\ .$$
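These expressions can be sanity-checked numerically against the three relations above (a sketch; the value of K is arbitrary and the function name is mine):

```python
import math

def decimate_site(K):
    """Couplings generated by tracing out a single site of the square lattice."""
    Kp = 0.125 * math.log(math.cosh(4 * K))                 # K'
    Lp = Kp - 0.5 * math.log(math.cosh(2 * K))              # L'
    alpha = math.log(2 * math.cosh(4 * K) ** 0.125 * math.cosh(2 * K) ** 0.5)  # alpha'
    return Kp, Lp, alpha

K = 0.4
Kp, Lp, alpha = decimate_site(K)
# With Ktilde' = K', the four NN and two NNN bonds contribute 6K' in total:
assert abs(2 * math.cosh(4 * K) - math.exp(6 * Kp + Lp + alpha)) < 1e-10
assert abs(2 * math.cosh(2 * K) - math.exp(-Lp + alpha)) < 1e-10
assert abs(2.0 - math.exp(-2 * Kp + Lp + alpha)) < 1e-10
print("all three relations satisfied")
```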

Figure 9.2.3 : Tracing out over the central site results in three different interactions: nearest neighbor (blue), next-nearest neighbor
(red), and a four-site plaquette term (green).

Note that new couplings have been generated at this very first step of the RSRG procedure. It now becomes very difficult to iterate this transformation a second time, since the presence of second neighbor and plaquette couplings K̃ and L means that we cannot exactly integrate out one of the sublattices as before. Still we could imagine iterating this RSRG procedure, if only perturbatively in certain couplings. We see, though, that rather than considering the effect of R_b on a single coupling K or the pair (K, α), we should instead consider, if only formally, the iteration of an infinite set of all possible couplings, {K_α}. Writing this as a vector K, we can write the RSRG transformation in the form

$$K' = R_b(K)\ . \qquad (9.2.7)$$

Figure 9.2.4 : Spin blocking on the triangular lattice.

This page titled 9.2: Real Space Renormalization is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

9.3: Block Spin Transformation
Spin blocking refers to a process in which we replace a group of spins by a single spin whose direction is determined by 'majority rule'. That is, if most of the spins in the group are up, then the block spin is said to be up; if most spins are down, then the block spin is down. The block spins interact with a different set of couplings {K′_α}.

Consider a b × b × ⋯ × b block of spins, and define the block spin projector,

$$T\big(\sigma'_a\,,\,\{\sigma_{a,i}\}\big) = \begin{cases} 1 & \text{if } \sigma'_a = \mathrm{sgn}\Big(\sum_{i=1}^{b^d}\sigma_{a,i}\Big) \\ 0 & \text{otherwise}\ . \end{cases} \qquad (9.3.1)$$

Note that

$$\sum_{\sigma'_a} T\big(\sigma'_a\,,\,\{\sigma_{a,i}\}\big) = 1\ . \qquad (9.3.2)$$

Here a indexes the blocks, and σ_{a,i} denotes the i-th spin within the a-th block. The block spin projector effects the 'majority rule' operation, assigning σ′_a to ±1 depending on whether the majority of the spins in the block a are up (σ′_a = +1) or down (σ′_a = −1). Note that such a procedure presumes an odd number of spins in each block. Then

$$\begin{aligned}
Z &= \sum_{\{\sigma_i\}} e^{-\beta\hat H[\{\sigma_i\}]} \\
&= \sum_{\{\sigma'_a\}} \Bigg\{ \sum_{\{\sigma_i\}} e^{-\beta\hat H[\{\sigma_{a,i}\}]}\,\prod_a T\big(\sigma'_a\,,\,\{\sigma_{a,i}\}\big) \Bigg\} \\
&= \sum_{\{\sigma'_a\}} e^{-\beta\hat H'[\{\sigma'_a\}]}\ ,
\end{aligned}$$

where

$$e^{-\beta\hat H'[\{\sigma'_a\}]} = \sum_{\{\sigma_i\}} e^{-\beta\hat H[\{\sigma_i\}]}\,\prod_a T\big(\sigma'_a\,,\,\{\sigma_{a,i}\}\big)\ . \qquad (9.3.3)$$
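The majority-rule operation is simple to sketch in code. Here blocks of b = 3 spins on a chain (an odd number, as required) are replaced by the sign of their sum; all names are illustrative.

```python
import numpy as np

def block_spins(spins, b=3):
    """Majority rule on a 1d chain: each block of b spins (b odd) is
    replaced by sgn(sum of the spins in that block)."""
    assert b % 2 == 1, "majority rule presumes an odd number of spins per block"
    assert len(spins) % b == 0
    return np.sign(spins.reshape(-1, b).sum(axis=1)).astype(int)

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=27)       # a chain of 27 microscopic spins
sigma1 = block_spins(spins)                # 9 block spins
sigma2 = block_spins(sigma1)               # blocking again: 3 block spins
print(sigma1, sigma2)
# For each microscopic configuration exactly one block-spin assignment has
# T = 1, which is the normalization sum over sigma'_a of T(...) = 1.
```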

This page titled 9.3: Block Spin Transformation is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel
Arovas.

9.4: Scaling Variables
We’ve seen how an RG transformation acts on the (infinite) set of couplings which define the Hamiltonian of a system. We found
K = \boldmath{R} (K). If ξ(K) is the correlation length in units of the lattice spacing, then since each RG step involves a

rescaling by a factor b , we must have


′ 2 ′′
ξ(K) = b ξ(K ) = b ξ(K ) = ⋯ . (9.4.1)

A fixed point of the transformation \boldmath{R} is a set of couplings K such that


b

∗ ∗
\boldmath{R} (K ) = K . (9.4.2)
b

Linearizing \boldmath{R} (K) about the fixed point, we have


b


∂Kα ∣
′ ∗ ∗
Kα − Kα = ∑ Q (K −K ) , Q = ∣ . (9.4.3)
αβ β β αβ
∂K ∣ K

β β

The matrix Q αβ
is real but not necessarily symmetric. We define the left eigenvectors of Q, ϕ , such that (i)
α

(i) (i)
∑ ϕα Q =λ ϕ . (9.4.4)
αβ i β
α

The scaling variable u is then defined as


i

(i)

u ≡ ∑ ϕα (Kα − Kα ) . (9.4.5)
i

It should now be apparent that under an RG transformation, we have


(i) (i) (i)
′ ′ ∗ ∗ ∗
u = ∑ ϕα (Kα − Kα ) = ∑ ϕα ,Q (K −K ) = λ ∑ ϕα (Kα − Kα ) = λ u . (9.4.6)
i αβ β β i i i

α α,β α

We say that

⎪ \it relevant
⎧ if λ
i
>1

u is ⎨ \it irrelevant if λ <1 (9.4.7)


i i


\it marginal if λ =1 .
i

Under renormalization, relevant scaling variables flow away from the fixed point, while irrelevant scaling variables flow toward
the fixed point. For marginal variables, one must go to higher order, beyond the above linearization, to determine whether the flow
is away from (marginally relevant) or toward (marginally irrelevant) the fixed point.
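As a toy illustration (my own, not from the text), the one-dimensional flow g′ = g² of §9.2 can be linearized at its fixed points g* = 0 and g* = 1. Here Q is just the 1 × 1 derivative dg′/dg, estimated by finite differences, and the deviation u = g − g* is classified according to the rule above.

```python
def rg_map(g):
    """d = 1 Ising decimation in the variable g = tanh K:  g' = g^2."""
    return g * g

def eigenvalue(gstar, eps=1e-6):
    """Finite-difference estimate of Q = dg'/dg at a fixed point g*."""
    return (rg_map(gstar + eps) - rg_map(gstar - eps)) / (2 * eps)

for gstar in (0.0, 1.0):
    lam = eigenvalue(gstar)
    kind = "relevant" if lam > 1 else ("marginal" if abs(lam - 1) < 1e-9 else "irrelevant")
    print(f"g* = {gstar}:  lambda = {lam:.6f}  ({kind})")
# At g* = 1 (zero temperature) lambda = 2 > 1: the deviation is relevant, and the
# flow runs away toward g = 0.  At g* = 0, lambda = 0 < 1: irrelevant (attractive).
```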

This page titled 9.4: Scaling Variables is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

9.5: RSRG on Hierarchical Lattices
This page titled 9.5: RSRG on Hierarchical Lattices is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

9.S: Summary
References
M. Kardar, Statistical Physics of Fields (Cambridge, 2007) A superb modern text, with many insightful presentations of key
concepts.
M. Plischke and B. Bergersen, Equilibrium Statistical Physics (3rd edition, World Scientific, 2006) An excellent graduate level text. Less insightful than Kardar but still a good modern treatment of the subject. Good discussion of mean field theory.
G. Parisi, Statistical Field Theory (Addison-Wesley, 1988) An advanced text focusing on field theoretic approaches, covering
mean field and Landau-Ginzburg theories before moving on to renormalization group and beyond.
J. P. Sethna, Entropy, Order Parameters, and Complexity (Oxford, 2006) An excellent introductory text with a very modern set
of topics and exercises.

Endnotes
1. A more mathematically rigorous name would be the renormalization monoid.↩
2. Our choice of what terms to put in the exponent in the second line below is dictated by global Z₂ symmetry. Once we sum over σ₀, the result should be invariant under simultaneous reversal of σ_{1−4}.↩

This page titled 9.S: Summary is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Daniel Arovas.

Index
A ferromagnetic momentum space renormalization group
Annihilation operator 6.1: Ising Model 9.2: Real Space Renormalization
5.9: Appendix I- Second Quantization first Brillouin zone
antiferromagnetic 5.8: The Ideal Fermi Gas N
6.1: Ising Model Flory Theory Negative temperature
6.6: Polymers 4.8: Selected Examples
B
baker’s transformation G O
3.4: Remarks on Ergodic Theory generating function orthohydrogen
Bayesian Statistical Inference 3.6: Appendices 4.9: Statistical Mechanics of Molecular Gases
1.5: Bayesian Statistical Inference Gibbs distribution Oseen hydrodynamic tensor
BBGKY Hierarchy 4.4: Ordinary Canonical Ensemble (OCE) 6.6: Polymers
6.4: Liquid State Physics grand partition function
block spin projector 4.5: Grand Canonical Ensemble (GCE) P
9.3: Block Spin Transformation pair correlation function
Boltzmann entropy H 6.4: Liquid State Physics
4.4: Ordinary Canonical Ensemble (OCE) Hessian partition function
Bose condensation 2.9: Equilibrium and Stability 4.1: Microcanonical Ensemble (μCE)
5.11: Appendix III- Example Bose Condensation hyperscaling 4.4: Ordinary Canonical Ensemble (OCE)
Problem 6.4: Liquid State Physics phase space average
Brownian motion 3.4: Remarks on Ergodic Theory
8.9: Stochastic Processes I photon gas
imperfect bifurcation 5.5: Photon Statistics
C 7.3: Mean Field Theory Plasmas
central limit theorem Information Theory 6.5: Coulomb Systems - Plasmas and the Electron
Gas
1.4: General Aspects of Probability Distributions 1.3: Entropy and Probability
chemical potential Integrating Factors Poincaré Recurrence
3.3: Irreversibility and Poincaré Recurrence
4.5: Grand Canonical Ensemble (GCE) 2.14: Appendix I- Integrating Factors
correlation functions Ising Model Poincaré recurrence theorem
3.3: Irreversibility and Poincaré Recurrence
6.4: Liquid State Physics 6.1: Ising Model
Creation operator 7.2: Fluids, Magnets, and the Ising Model polydispersity
6.6: Polymers
5.9: Appendix I- Second Quantization
Curie temperature L Polymers
6.6: Polymers
7.5: Landau Theory of Phase Transitions Landau Diamagnetism
5.8: The Ideal Fermi Gas
Potts Model
D Landau free energy 6.7: Appendix I- Potts Model in One Dimension

density matrix 4.5: Grand Canonical Ensemble (GCE)


4.2: The Quantum Mechanical Trace Landau Theory Q
density of states 7.5: Landau Theory of Phase Transitions Quantum Dephasing
Langevin equation 3.5: Thermalization of Quantum Systems
4.1: Microcanonical Ensemble (μCE)
dephasing 8.9: Stochastic Processes
3.5: Thermalization of Quantum Systems lattice coordination number R
des Cloiseaux law 7.2: Fluids, Magnets, and the Ising Model radial distribution function
6.6: Polymers Liouville’s equation 6.4: Liquid State Physics
3.2: Phase Flows in Classical Mechanics random walk
E 1.1: Statistical Properties of Random Walks
eigenstate thermalization hypothesis M Real Space Renormalization
Master Equation 9.2: Real Space Renormalization
3.5: Thermalization of Quantum Systems
Electron Gas 3.1: Modeling the Approach to Equilibrium real space renormalization group
3.6: Appendices 9.2: Real Space Renormalization
6.5: Coulomb Systems - Plasmas and the Electron
Gas Maxwell relations Renormalization
equipartition theorem 2.8: Maxwell Relations 9.1: The Program of Renormalization
4.7: Ideal Gas Statistical Mechanics Mayer cluster expansion renormalization group (RG)
ergodicity 6.2: Nonideal Classical Gases 9.1: The Program of Renormalization
3.4: Remarks on Ergodic Theory Mean field theory response function
7.3: Mean Field Theory 6.4: Liquid State Physics
F microcanonical distribution function Rouse model
Fermi Distribution 3.4: Remarks on Ergodic Theory 6.6: Polymers
4.1: Microcanonical Ensemble (μCE)
5.8: The Ideal Fermi Gas
Microcanonical Ensemble
3.4: Remarks on Ergodic Theory

S T V
scattering time thermal conductivity van der Waals equation of state
8.4: Relaxation Time Approximation 8.4: Relaxation Time Approximation 7.1: The van der Waals system
second quantization thermal equilibrium virial expansion
5.9: Appendix I- Second Quantization 4.3: Thermal Equilibrium 5.2: Quantum Ideal Gases - Low Density Expansions
second quantized representation thermal wavelength 6.2: Nonideal Classical Gases
5.1: Statistical Mechanics of Noninteracting 4.7: Ideal Gas Statistical Mechanics viscosity
Quantum Systems Tonks gas 8.4: Relaxation Time Approximation
second viscosity 6.2: Nonideal Classical Gases
8.4: Relaxation Time Approximation Z
Stochastic Processes U zero field isothermal susceptibility
8.9: Stochastic Processes 6.1: Ising Model
Ultrarelativistic ideal gas
4.1: Microcanonical Ensemble (μCE) zeroth law of thermodynamics
2.2: The Zeroth Law of Thermodynamics

Detailed Licensing
Overview
Title: Book: Thermodynamics and Statistical Mechanics (Arovas)
Webpages: 113
Applicable Restrictions: Noncommercial
All licenses found:
CC BY-NC-SA 4.0: 94.7% (107 pages)
Undeclared: 5.3% (6 pages)

By Page
Book: Thermodynamics and Statistical Mechanics (Arovas) 2.12: Entropy of Mixing and the Gibbs Paradox - CC
- CC BY-NC-SA 4.0 BY-NC-SA 4.0
Front Matter - CC BY-NC-SA 4.0 2.13: Some Concepts in Thermochemistry - CC BY-
TitlePage - CC BY-NC-SA 4.0 NC-SA 4.0
InfoPage - CC BY-NC-SA 4.0 2.14: Appendix I- Integrating Factors - CC BY-NC-SA
Table of Contents - Undeclared 4.0
Licensing - Undeclared 2.15: Appendix II- Legendre Transformations - CC
BY-NC-SA 4.0
1: Fundamentals of Probability - CC BY-NC-SA 4.0
2.16: Appendix III- Useful Mathematical Relations -
1.1: Statistical Properties of Random Walks - CC BY- CC BY-NC-SA 4.0
NC-SA 4.0 2.S: Summary - CC BY-NC-SA 4.0
1.2: Basic Concepts in Probability Theory - CC BY-
3: Ergodicity and the Approach to Equilibrium - CC BY-
NC-SA 4.0
NC-SA 4.0
1.3: Entropy and Probability - CC BY-NC-SA 4.0
3.1: Modeling the Approach to Equilibrium - CC BY-
1.4: General Aspects of Probability Distributions -
NC-SA 4.0
CC BY-NC-SA 4.0
3.2: Phase Flows in Classical Mechanics - CC BY-
1.5: Bayesian Statistical Inference - CC BY-NC-SA
NC-SA 4.0
4.0
3.3: Irreversibility and Poincaré Recurrence - CC BY-
1.S: Summary - CC BY-NC-SA 4.0
NC-SA 4.0
2: Thermodynamics - CC BY-NC-SA 4.0
3.4: Remarks on Ergodic Theory - CC BY-NC-SA 4.0
2.1: What is Thermodynamics? - Undeclared 3.5: Thermalization of Quantum Systems - CC BY-
2.2: The Zeroth Law of Thermodynamics - CC BY- NC-SA 4.0
NC-SA 4.0 3.6: Appendices - CC BY-NC-SA 4.0
2.3: Mathematical Interlude - Exact and Inexact 3.S: Summary - CC BY-NC-SA 4.0
Differentials - CC BY-NC-SA 4.0
4: Statistical Ensembles - CC BY-NC-SA 4.0
2.4: The First Law of Thermodynamics - CC BY-NC-
SA 4.0 4.1: Microcanonical Ensemble (μCE) - CC BY-NC-SA
2.5: Heat Engines and the Second Law of 4.0
Thermodynamics - CC BY-NC-SA 4.0 4.2: The Quantum Mechanical Trace - CC BY-NC-SA
2.6: The Entropy - CC BY-NC-SA 4.0 4.0
2.7: Thermodynamic Potentials - CC BY-NC-SA 4.0 4.3: Thermal Equilibrium - CC BY-NC-SA 4.0
2.8: Maxwell Relations - CC BY-NC-SA 4.0 4.4: Ordinary Canonical Ensemble (OCE) - CC BY-
2.9: Equilibrium and Stability - CC BY-NC-SA 4.0 NC-SA 4.0
2.10: Applications of Thermodynamics - CC BY-NC- 4.5: Grand Canonical Ensemble (GCE) - CC BY-NC-
SA 4.0 SA 4.0
2.11: Phase Transitions and Phase Equilibria - CC BY- 4.6: Statistical Ensembles from Maximum Entropy -
NC-SA 4.0 CC BY-NC-SA 4.0
4.7: Ideal Gas Statistical Mechanics - CC BY-NC-SA
4.0

4.8: Selected Examples - CC BY-NC-SA 4.0 7.8: Ginzburg-Landau Theory - CC BY-NC-SA 4.0
4.9: Statistical Mechanics of Molecular Gases - CC 7.9: Appendix I- Equivalence of the Mean Field
BY-NC-SA 4.0 Descriptions - CC BY-NC-SA 4.0
4.10: Appendix I- Additional Examples - CC BY-NC- 7.10: Appendix II- Additional Examples - CC BY-
SA 4.0 NC-SA 4.0
4.S: Summary - CC BY-NC-SA 4.0 7.S: Summary - CC BY-NC-SA 4.0
5: Noninteracting Quantum Systems - CC BY-NC-SA 4.0 8: Nonequilibrium Phenomena - CC BY-NC-SA 4.0
5.1: Statistical Mechanics of Noninteracting Quantum 8.1: Equilibrium, Nonequilibrium and Local
Systems - CC BY-NC-SA 4.0 Equilibrium - CC BY-NC-SA 4.0
5.2: Quantum Ideal Gases - Low Density Expansions 8.2: Boltzmann Transport Theory - CC BY-NC-SA 4.0
- CC BY-NC-SA 4.0 8.3: Weakly Inhomogeneous Gas - CC BY-NC-SA 4.0
5.3: Entropy and Counting States - CC BY-NC-SA 4.0 8.4: Relaxation Time Approximation - CC BY-NC-SA
5.4: Lattice Vibrations - Einstein and Debye Models - 4.0
CC BY-NC-SA 4.0 8.5: Diffusion and the Lorentz model - CC BY-NC-SA
5.5: Photon Statistics - CC BY-NC-SA 4.0 4.0
5.6: The Ideal Bose Gas - CC BY-NC-SA 4.0 8.6: Linearized Boltzmann Equation - CC BY-NC-SA
5.7: The Ideal Fermi Gas - CC BY-NC-SA 4.0 4.0
5.8: The Ideal Fermi Gas - CC BY-NC-SA 4.0 8.7: The Equations of Hydrodynamics - CC BY-NC-
5.9: Appendix I- Second Quantization - CC BY-NC- SA 4.0
SA 4.0 8.8: Nonequilibrium Quantum Transport - CC BY-
5.10: Appendix II- Ideal Bose Gas Condensation - NC-SA 4.0
CC BY-NC-SA 4.0 8.9: Stochastic Processes - CC BY-NC-SA 4.0
5.11: Appendix III- Example Bose Condensation 8.10: Appendix I- Boltzmann Equation and
Problem - CC BY-NC-SA 4.0 Collisional Invariants - CC BY-NC-SA 4.0
5.S: Summary - CC BY-NC-SA 4.0 8.11: Appendix II- Distributions and Functionals - CC
6: Classical Interacting Systems - CC BY-NC-SA 4.0 BY-NC-SA 4.0
8.12: Appendix III- General Linear Autonomous
6.1: Ising Model - CC BY-NC-SA 4.0
Inhomogeneous ODEs - CC BY-NC-SA 4.0
6.2: Nonideal Classical Gases - CC BY-NC-SA 4.0
8.13: Appendix IV- Correlations in the Langevin
6.3: Lee-Yang Theory - CC BY-NC-SA 4.0
formalism - CC BY-NC-SA 4.0
6.4: Liquid State Physics - CC BY-NC-SA 4.0
8.14: Appendix V- Kramers-Krönig Relations - CC
6.5: Coulomb Systems - Plasmas and the Electron
BY-NC-SA 4.0
Gas - CC BY-NC-SA 4.0
8.S: Summary - CC BY-NC-SA 4.0
6.6: Polymers - CC BY-NC-SA 4.0
6.7: Appendix I- Potts Model in One Dimension - CC 9: Renormalization - CC BY-NC-SA 4.0
BY-NC-SA 4.0 9.1: The Program of Renormalization - CC BY-NC-SA
6.S: Summary - CC BY-NC-SA 4.0 4.0
7: Mean Field Theory of Phase Transitions - CC BY-NC- 9.2: Real Space Renormalization - CC BY-NC-SA 4.0
SA 4.0 9.3: Block Spin Transformation - CC BY-NC-SA 4.0
7.1: The van der Waals system - CC BY-NC-SA 4.0 9.4: Scaling Variables - CC BY-NC-SA 4.0
7.2: Fluids, Magnets, and the Ising Model - CC BY- 9.5: RSRG on Hierarchical Lattices The - CC BY-NC-
NC-SA 4.0 SA 4.0
7.3: Mean Field Theory - CC BY-NC-SA 4.0 9.S: Summary - CC BY-NC-SA 4.0
7.4: Variational Density Matrix Method - CC BY-NC- Back Matter - CC BY-NC-SA 4.0
SA 4.0 Index - CC BY-NC-SA 4.0
7.5: Landau Theory of Phase Transitions - CC BY- Index - Undeclared
NC-SA 4.0 Glossary - Undeclared
7.6: Mean Field Theory of Fluctuations - CC BY-NC- Detailed Licensing - Undeclared
SA 4.0
7.7: Global Symmetries - CC BY-NC-SA 4.0

