
Statistical Mechanics

Lecture Notes
Fall 2021

Rizwan Khalid
Department of Physics
Syed Babar Ali School of Science and
Engineering (SBASSE)
Lahore University of Management Sciences (LUMS)
E-mail: rizwan khalid@lums
Contents

1 Statistical Basis of Thermodynamics
1.1 Pressure and Chemical Potential

2 Essentials of Probability Theory
2.1 Binomial Distribution
2.2 The 1D Random Walk
2.3 Mean and Standard Deviation
2.4 Mean and Standard Deviation of the Binomial Distribution
2.5 Gaussian Limit of the Binomial Distribution
2.6 The Continuous Case

3 The Microcanonical Ensemble
3.1 Paramagnetism
3.2 Einstein's Model of Solids

4 The Canonical Ensemble
4.1 The Canonical Distribution
4.1.1 A system in thermal contact with a heat bath
4.2 Thermodynamics from the Canonical Distribution
4.3 The Partition Function
4.4 Assorted Examples
4.4.1 Two State System
4.4.2 Independent Simple Harmonic Oscillators
Chapter 1

Statistical Basis of
Thermodynamics

The thermodynamic limit is defined as the limit in which N, V → ∞ such that N/V
is kept at a constant value. In this limit, the extensive variables become proportional
to N or V and the intensive variables become independent of system size.
Consider a non-interacting system of N particles. Let there be $n_i$ particles with energy $\epsilon_i$ each; then the total energy can be written as
\[ U = \sum_i n_i\,\epsilon_i \quad\text{with}\quad N = \sum_i n_i \]

From quantum mechanics we know that the $\epsilon_i$ are discrete and depend crucially
on V . In the limit of large V , the spacing between the different energy levels decreases
to the extent that we can consider a continuum of allowed energy states.
Similar remarks hold for an interacting system even though the total energy can’t
be written as a sum of individual particle energies.
A macrostate may be specified by giving U, V, N . The microstate may be specified
by giving the various solutions of the time-independent Schrödinger equation. This
is because such solutions are automatically consistent with the system having energy
U . One may assume equal a priori probabilities for all microstates consistent with
the same macrostate.
Let g = g(U, V, N ) be the number of microstates consistent with U, V, N . The
dependence on N and U is obvious. g depends on V because the allowed energy
states depend on the size of the system.
Consider two systems A1 and A2 which are separately in equilibrium. They are
assumed to be separated by walls that don’t allow any kind of an exchange between
the two systems. The mathematical form for g1 and g2 may be different as it depends
on the details of the two systems. All thermodynamic properties of Ai may be derived
from gi .

Let’s now assume that the wall separating the two systems becomes conducting
while still remaining rigid and impermeable. This allows the two systems to exchange
energy. The Ai will henceforth be referred to as subsystems of the full system A(0) .
With the possibility of energy exchange, the two subsystem energies $U_i$ will change subject to
\[ U_1 + U_2 = U^{(0)} = \text{constant}, \]

where $U^{(0)}$ is the energy of the full system $A^{(0)}$. Any energy of interaction between $A_1$ and $A_2$ is being neglected right now. At any given time t, the subsystem $A_i$ is equally likely to be in any one of its $g_i(U_i, N_i, V_i)$ microstates. The composite system is, therefore, in any one of its $g^{(0)}$ microstates, with
\[ g^{(0)}(U_1, U_2) = g_1(U_1)\,g_2(U_2) = g_1(U_1)\,g_2(U^{(0)} - U_1) \equiv g^{(0)}(U^{(0)}, U_1), \qquad (1.1) \]
where the dependence on $N_i$, $V_i$ is suppressed as these are fixed for the moment.


The pertinent question to ask is: At what value of the variable U1 will the composite
system be in equilibrium? This, consistent with our previous discussion, is the state
that maximizes g (0) (U1 , U2 ).

\[ \Rightarrow \frac{\partial g^{(0)}}{\partial U_1} = 0 = \left(\frac{\partial g_1(U_1)}{\partial U_1}\right)_{U_1=\bar U_1}\! g_2(\bar U_2) + g_1(\bar U_1)\left(\frac{\partial g_2(U_2)}{\partial U_2}\right)_{U_2=\bar U_2}\frac{\partial U_2}{\partial U_1}, \qquad (1.2) \]

where the overlines indicate equilibrium values. Now, since $\partial U_2/\partial U_1 = -1$, the above condition reduces to
\[ \frac{1}{g_1(\bar U_1)}\left(\frac{\partial g_1(U_1)}{\partial U_1}\right)_{U_1=\bar U_1} = \frac{1}{g_2(\bar U_2)}\left(\frac{\partial g_2(U_2)}{\partial U_2}\right)_{U_2=\bar U_2} \qquad (1.3) \]

A more convenient way of rewriting the above condition is
\[ \left(\frac{\partial \ln g_1(U_1)}{\partial U_1}\right)_{U_1=\bar U_1} = \left(\frac{\partial \ln g_2(U_2)}{\partial U_2}\right)_{U_2=\bar U_2} \qquad (1.4) \]

One may now define a parameter β via
\[ \beta \equiv \left(\frac{\partial \ln g(U, N, V)}{\partial U}\right)_{N,V,\,U=\bar U}, \qquad (1.5) \]

and the condition for thermal equilibrium between A1 and A2 may be stated in terms
of equality of β1 and β2 . It is, therefore, natural to assume that β is related to the

thermodynamic temperature. Recall further that U = U (N, S, V ) or S = S(U, N, V )
and from the 1st law of thermodynamics,

\begin{align*}
dU &= T\,dS - P\,dV + \mu\,dN\\
\Rightarrow dS &= \frac{1}{T}\,dU + \frac{P}{T}\,dV - \frac{\mu}{T}\,dN \qquad (1.6)
\end{align*}
From the above, it is clear that
\[ \frac{1}{T} = \left(\frac{\partial S}{\partial U}\right)_{N,V} \qquad (1.7) \]

It should now be obvious that the entropy is related to $\ln g$. Planck used the constant k to relate the two,
\[ S = k\ln\Omega, \qquad (1.8) \]
without any additive constant (here $\Omega \equiv g$ is the number of microstates).
very clear. If only one microstate is possible for a given macrostate, S = 0. The
larger the number of microstates that are possible, the more randomness associated
with a given macrostate. The connection with information theory is also clear.
Finally, it should be obvious that
\[ \beta = \frac{1}{kT} \qquad (1.9) \]

1.1 Pressure and Chemical Potential


Assume now that the wall separating $A_1$ and $A_2$ is movable. Similar considerations will lead to
\[ \left(\frac{\partial \ln g_1(U_1, N_1, V_1)}{\partial V_1}\right)_{N_1,U_1,\,V_1=\bar V_1} = \left(\frac{\partial \ln g_2(U_2, N_2, V_2)}{\partial V_2}\right)_{N_2,U_2,\,V_2=\bar V_2} \qquad (1.10) \]

as the condition for mechanical equilibrium. In analogy with β we define η,
\[ \eta \equiv \left(\frac{\partial \ln g(U, N, V)}{\partial V}\right)_{N,U,\,V=\bar V}. \qquad (1.11) \]

Considerations of particle exchange lead us to define
\[ \xi \equiv \left(\frac{\partial \ln g(U, N, V)}{\partial N}\right)_{U,V,\,N=\bar N}. \qquad (1.12) \]

Full thermodynamic equilibrium demands equality of β, η, ξ. As earlier, it is easy
to see that,
\[ \eta = \frac{P}{kT} \quad\text{and}\quad \xi = -\frac{\mu}{kT} \qquad (1.13) \]
In summary, S = S(N, V, U ) = k ln g(U, N, V ). If we can, somehow, determine g,
the rest of thermodynamics follows from it. We shall now turn to the subject of how
this might be accomplished. Certainly, counting methods and probability theory will
play a role.
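As a preview of how such counting arguments work in practice, the following minimal Python sketch shows numerically that the composite multiplicity $g^{(0)}(U_1) = g_1(U_1)g_2(U^{(0)} - U_1)$ is overwhelmingly peaked at the division of energy for which $\beta_1 = \beta_2$. It borrows the Einstein-solid multiplicity derived later in Section 3.2 as a concrete example; the subsystem sizes and the total number of quanta are arbitrary illustrative choices.

```python
from math import comb, log

# Two subsystems exchanging energy quanta (Einstein-solid-like multiplicity,
# derived in Section 3.2): g(M, N) = C(M + N - 1, M).
N1, N2 = 300, 500          # oscillators in each subsystem (illustrative)
M_total = 400              # total number of energy quanta shared

def g(M, N):
    return comb(M + N - 1, M)

# Composite multiplicity as a function of how many quanta sit in subsystem 1.
g0 = [g(M1, N1) * g(M_total - M1, N2) for M1 in range(M_total + 1)]
M1_star = max(range(M_total + 1), key=lambda M1: g0[M1])

# beta is d ln g / dU; estimate it by a finite difference (units of 1/(hbar*omega)).
def beta(M, N):
    return log(g(M + 1, N)) - log(g(M, N))

print("most probable M1:", M1_star)
print("beta_1 at the peak:", beta(M1_star, N1))
print("beta_2 at the peak:", beta(M_total - M1_star, N2))
print("fraction of all microstates within +/- 20 quanta of the peak:",
      sum(g0[max(0, M1_star - 20):M1_star + 21]) / sum(g0))
```

The two finite-difference estimates of β agree at the most probable division of energy, and essentially all microstates lie in a narrow band around it, which is the content of the equilibrium condition (1.4).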

Chapter 2

Essentials of Probability Theory

The probability of an event is simply defined as the number of times the event occurs
divided by the total number of trials, provided the number of trials is large (→ ∞).

\[ \Pr(A) = \lim_{N_{\rm total}\to\infty}\frac{N_A}{N_{\rm total}} \qquad (2.1) \]

Have a look at Figure 2.1 that illustrates the essential ideas of set theory as needed
in probability theory. The Venn diagram illustrates a Universal set X with A and B
as subsets that have a non-empty intersection. The following is immediately obvious,

\[ \Pr(A) = \frac{N_A}{N_X}, \qquad \Pr(B) = \frac{N_B}{N_X} \]
\[ \Pr(A\cup B) = \Pr(A) + \Pr(B) - \Pr(A\cap B) \qquad (2.2) \]
If A and B are disjoint events, then $\Pr(A\cup B) = \Pr(A) + \Pr(B)$.

Figure 2.1: Illustration of Pr(A and/or B).

Also, we have

\[ \Pr(A\cap B) = \Pr(A|B)\,\Pr(B) = \Pr(B|A)\,\Pr(A), \qquad (2.3) \]

and, so, if A and B are statistically independent, we have P r(A ∩ B) = P r(A)P r(B)
as statistical independence implies P r(A|B) = P r(A).

2.1 Binomial Distribution


We can now apply these ideas to a binomial process. The following considerations
apply to a binomial process,
(i) We perform N independent trials.
(ii) The probability p of success S on any given trial is the same. In other words,
the trials are identical.
(iii) We want to probe the question of the probability of obtaining N1 successes in
N trials.
N1 successes, of course, imply N − N1 failures F. Non-interacting spins on lattice sites fulfill the above criteria, as does the 1D random walk.
One way to obtain N1 successes and N − N1 failures is
\[ \underbrace{SSS\cdots S}_{N_1\ \text{times}}\;\underbrace{FF\cdots F}_{N_2\ \text{times}} \qquad (2.4) \]

The probability of the above microstate is $p^{N_1}q^{N_2}$, with $q = 1-p$ and $N_2 = N - N_1$. This is because the different trials are independent. Furthermore, since the different orderings of successes and failures are mutually exclusive (disjoint) events, we should count how many different ways there are of achieving N1 successes and N2 failures. There are N! permutations of N objects. However, the N1! permutations of the successes and the N2! permutations of the failures amongst themselves should be taken into account. All in all, we get
\[ \underbrace{\Pr\nolimits_N(X = N_1)}_{\text{Riley}} = \frac{N!}{N_1!\,N_2!}\,p^{N_1}q^{N_2} \equiv \underbrace{W_N(N_1)}_{\text{Salinas}} \qquad (2.5) \]
(the labels under the braces indicate the notation used by Riley and by Salinas, respectively).

A standard check on any probability function is to see if it is properly normalized.


\[ \sum_{N_1=0}^{N} W_N(N_1) = \sum_{N_1=0}^{N}\frac{N!}{N_1!\,N_2!}\,p^{N_1}q^{N_2} = (p+q)^N = 1 \qquad (2.6) \]
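A quick numerical sanity check of Eq. (2.5) and the normalization (2.6); the values of N and p below are arbitrary and purely illustrative.

```python
from math import comb

def W(N, N1, p):
    """Binomial probability W_N(N1) of N1 successes in N trials, Eq. (2.5)."""
    q = 1.0 - p
    return comb(N, N1) * p**N1 * q**(N - N1)

N, p = 50, 0.3   # illustrative values
probs = [W(N, N1, p) for N1 in range(N + 1)]

print("sum of W_N(N1) over N1:", sum(probs))   # should be 1 to machine precision
print("W_50(15) =", W(N, 15, p))               # probability of exactly 15 successes
```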

Figure 2.2: The 1D random walk.

2.2 The 1D Random Walk


The formalism of the binomial distribution can be applied to the 1D random walk
in a straight-forward fashion. We assume that (see Figure 2.2) each step is of equal
length l. Therefore, the position can be expressed as x = ml with −N ≤ m ≤ N .
Also, clearly m = N1 − N2 , and so,
\[ P_N(m) = \frac{N!}{\left(\frac{N+m}{2}\right)!\,\left(\frac{N-m}{2}\right)!}\;p^{(N+m)/2}\,q^{(N-m)/2} \qquad (2.7) \]

The random walk is an example of a Markov process. The probability of step N+1
depends only on step N . In our notation, consider the diffusion of a particle, such
that,
(a) each time a step of length l is taken either to the right or to the left, and

(b) τ is the time between successive steps.

\[ \Rightarrow P_{N+1}(m) = p\,P_N(m-1) + q\,P_N(m+1), \qquad (2.8) \]
where $t = N\tau$ can be thought of as the time parameter whereas $x = ml$ gives the displacement at the corresponding time.
The above is a stochastic difference equation whose solution is the binomial dis-
tribution.
If we consider p = q = 1/2 and the limit τ, l → 0, we can convert this into the
well-known diffusion equation. Consider,

\[ \left.\frac{\partial P(x,t)}{\partial t}\right|_{m,N} = \frac{P_{N+1}(m) - P_N(m)}{\tau} \qquad (2.9) \]
\begin{align*}
\left.\frac{\partial^2 P(x,t)}{\partial x^2}\right|_{m,N} &= \frac{1}{l}\left[\left.\frac{\partial P(x,t)}{\partial x}\right|_{m+1,N} - \left.\frac{\partial P(x,t)}{\partial x}\right|_{m,N}\right]\\
&= \frac{1}{l^2}\left[P_N(m+1) - P_N(m) - P_N(m) + P_N(m-1)\right]\\
&= \frac{1}{l^2}\left[P_N(m+1) + P_N(m-1) - 2P_N(m)\right] \qquad (2.10)
\end{align*}

From Eq.(2.8), for p = q = 1/2, we get,

\begin{align*}
2P_{N+1}(m) &= P_N(m-1) + P_N(m+1)\\
\Rightarrow 2\left[P_{N+1}(m) - P_N(m)\right] &= P_N(m-1) + P_N(m+1) - 2P_N(m)\\
\Rightarrow \frac{2}{\tau l^2}\left[P_{N+1}(m) - P_N(m)\right] &= \frac{1}{\tau l^2}\left[P_N(m-1) + P_N(m+1) - 2P_N(m)\right] \qquad (2.11)
\end{align*}

Now using Eq. (2.9) and Eq. (2.10) in the above equation, we get
\[ \frac{\partial P}{\partial t} = \frac{l^2}{2\tau}\,\frac{\partial^2 P}{\partial x^2}, \qquad (2.12) \]
where we may define $D \equiv l^2/2\tau$ as the diffusion coefficient.
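The correspondence between the unbiased random walk and diffusion can be checked by direct simulation. The sketch below (step length, time step, walker count, and the specific times sampled are arbitrary choices) verifies that the mean-square displacement grows as $\langle x^2\rangle = 2Dt$ with $D = l^2/2\tau$.

```python
import numpy as np

rng = np.random.default_rng(0)

l, tau = 1.0, 1.0             # step length and time between steps (illustrative units)
D = l**2 / (2 * tau)          # diffusion coefficient as defined in the text
n_walkers, n_steps = 20_000, 400

# p = q = 1/2: each step is +l or -l with equal probability.
steps = rng.choice([-l, l], size=(n_walkers, n_steps))
x = np.cumsum(steps, axis=1)              # positions of all walkers at every step

for N in (100, 200, 400):
    t = N * tau
    msd = np.mean(x[:, N - 1] ** 2)       # <x^2> estimated over the walkers
    print(f"N={N:4d}  <x^2> = {msd:8.1f}   2Dt = {2 * D * t:8.1f}")
```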

2.3 Mean and Standard Deviation


Let u be a discrete random variable with allowed values $u_i$ appearing with probability $P(u_i) \equiv P_i$. This implies that $\sum_i P_i = 1$. We define the mean as the weighted average
\[ \langle u\rangle = \bar u = \sum_i u_i P_i, \qquad (2.13) \]

and, likewise, for a stochastic function f (u) of the random variable u, the expected
value or mean is defined as,
\[ \langle f(u)\rangle = \overline{f(u)} = \sum_i f(u_i)\,P_i, \qquad (2.14) \]

Clearly,

\begin{align*}
\langle f(u) + g(u)\rangle &= \langle f(u)\rangle + \langle g(u)\rangle\\
\langle c\,f(u)\rangle &= c\,\langle f(u)\rangle, \quad\text{for } c \text{ a constant.} \qquad (2.15)
\end{align*}

The deviation from the mean is defined as ∆u = u − hui.


Clearly h∆ui = hu − huii = 0.
A much more meaningful measure of dispersion in a random variable is given by the square of the deviation from the mean,
\[ (\Delta u)^2 = (u - \langle u\rangle)^2 = u^2 + \langle u\rangle^2 - 2u\langle u\rangle \qquad (2.16) \]

The variance $\sigma^2$ is defined as the mean of the square of the deviation from the mean, while the standard deviation σ is the square root of the variance. So,
\begin{align*}
\sigma_u^2 \equiv \langle(\Delta u)^2\rangle &= \langle u^2\rangle + \langle u\rangle^2 - 2\langle u\rangle^2\\
&= \langle u^2\rangle - \langle u\rangle^2\\
\sigma_u &\equiv \sqrt{\sigma_u^2} \qquad (2.17)
\end{align*}

Since $(\Delta u)^2 \ge 0$, we get $\langle u^2\rangle \ge \langle u\rangle^2$. Higher central moments may be defined similarly.

2.4 Mean and Standard Deviation of the Binomial Distribution
\begin{align*}
\langle N_1\rangle &= \sum_{N_1=0}^{N} N_1\,\frac{N!}{N_1!(N-N_1)!}\,p^{N_1}q^{N-N_1}\\
&= p\frac{\partial}{\partial p}\sum_{N_1=0}^{N}\frac{N!}{N_1!(N-N_1)!}\,p^{N_1}q^{N-N_1}\\
&= p\frac{\partial}{\partial p}(p+q)^N = pN(p+q)^{N-1}\\
&= pN \qquad\text{(using } p+q=1\text{)} \qquad (2.18)
\end{align*}

$\Rightarrow \langle N_2\rangle = \langle N - N_1\rangle = N - pN = qN$, just as expected.


One can similarly find the variance and standard deviation. We start by finding $\langle N_1^2\rangle$:
\begin{align*}
\langle N_1^2\rangle &= \sum_{N_1=0}^{N} N_1^2\,\frac{N!}{N_1!(N-N_1)!}\,p^{N_1}q^{N-N_1}\\
&= \left(p\frac{\partial}{\partial p}\right)\left(p\frac{\partial}{\partial p}\right)\left[\sum_{N_1=0}^{N}\frac{N!}{N_1!(N-N_1)!}\,p^{N_1}q^{N-N_1}\right]\\
&= \left(p\frac{\partial}{\partial p}\right)\left(p\frac{\partial}{\partial p}\right)(p+q)^N\\
&= p\frac{\partial}{\partial p}\left[pN(p+q)^{N-1}\right]\\
&= p\left[N(p+q)^{N-1} + pN(N-1)(p+q)^{N-2}\right]\\
&= Np + N(N-1)p^2 \qquad (2.19)
\end{align*}

We can now use Eq.(2.17) to find the variance as,

\begin{align*}
\langle(\Delta N_1)^2\rangle &= \langle N_1^2\rangle - \langle N_1\rangle^2\\
&= Np + N^2p^2 - Np^2 - N^2p^2\\
&= Np(1-p) = Npq\\
\Rightarrow \sigma_{N_1} &= \sqrt{Npq} \qquad (2.20)
\end{align*}

Finally, the relative width is an important parameter and may be defined as
\[ \frac{\sqrt{\langle(\Delta N_1)^2\rangle}}{\langle N_1\rangle} = \sqrt{\frac{q}{p}}\,\frac{1}{\sqrt N} \qquad (2.21) \]
for the binomial distribution. This implies that the distribution is sharply peaked
about the mean for large values of N , as discussed previously.
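These results are easy to confirm numerically. The sketch below (the sample size and the value of p are arbitrary) draws binomial samples and compares the empirical mean, variance, and relative width with Np, Npq, and $\sqrt{q/p}/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.4, 0.6
n_samples = 200_000   # illustrative sample size

for N in (10, 100, 1000):
    N1 = rng.binomial(N, p, size=n_samples)
    rel_width = N1.std() / N1.mean()
    print(f"N={N:5d}  mean {N1.mean():8.2f} (Np={N*p:8.2f})  "
          f"var {N1.var():8.2f} (Npq={N*p*q:8.2f})  "
          f"rel. width {rel_width:.4f} (theory {np.sqrt(q/p)/np.sqrt(N):.4f})")
```

The printed relative widths fall off as $1/\sqrt{N}$, which is the sharp-peaking behaviour discussed above.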

2.5 Gaussian Limit of the Binomial Distribution


As seen above, for large N the binomial distribution is sharply peaked about the mean. For this reason, consider the more slowly varying function
\begin{align*}
f(N_1) &= \ln W_N(N_1)\\
&= \ln N! - \ln N_1! - \ln(N-N_1)! + N_1\ln p + (N-N_1)\ln q \qquad (2.22)
\end{align*}

Further, let us assume a maximum for $f(N_1)$ at $N_1 = \tilde N_1$. We can show that $\tilde N_1 = \langle N_1\rangle$. Close to $\tilde N_1$, both $N_1$ and $N - N_1$ are of order N. Then, using Stirling's approximation $\ln N! \approx N\ln N - N + O(\ln N)$, we get
\begin{align*}
f(N_1) = {}& N\ln N - N - N_1\ln N_1 + N_1 - (N-N_1)\ln(N-N_1) + N - N_1\\
&+ N_1\ln p + (N-N_1)\ln q + O(\ln N, \ln N_1, \ln(N-N_1)) \qquad (2.23)
\end{align*}

We wish to find the location of the maximum of f (N1 ) so we now compute the
partial derivative,
\[ \frac{\partial f}{\partial N_1} = -\ln N_1 + \ln(N-N_1) + \ln p - \ln q + O\!\left(\frac{1}{N}, \frac{1}{N_1}, \frac{1}{N-N_1}\right), \qquad (2.24) \]
which in the large N limit reduces to,
\[ \frac{\partial f}{\partial N_1} = -\ln N_1 + \ln(N-N_1) + \ln p - \ln q \qquad (2.25) \]

Setting $\partial f/\partial N_1 = 0$ at $N_1 = \tilde N_1$ to find the maximum gives us
\begin{align*}
0 &= -\ln\tilde N_1 + \ln(N - \tilde N_1) + \ln p - \ln q\\
\Rightarrow \ln\frac{N-\tilde N_1}{\tilde N_1} &= \ln\frac{q}{p}\\
\Rightarrow p(N - \tilde N_1) &= q\tilde N_1\\
\Rightarrow \tilde N_1 &= Np, \qquad (2.26)
\end{align*}

just as expected.
Let’s now compute the second derivative of f at the maximum.

\begin{align*}
\left.\frac{\partial^2 f}{\partial N_1^2}\right|_{N_1=\tilde N_1} &= \left.\left[-\frac{1}{N_1} - \frac{1}{N-N_1} + O\!\left(\frac{1}{N_1^2}, \frac{1}{(N-N_1)^2}\right)\right]\right|_{N_1=\tilde N_1}\\
&= -\frac{1}{Np} - \frac{1}{N - Np} = -\frac{1}{Np} - \frac{1}{Nq}\\
&= \frac{-q-p}{Npq} = -\frac{1}{Npq} \qquad (2.27)
\end{align*}

Since $f(N_1)$ is sharply peaked about $N_1 = \tilde N_1$, it is sensible to Taylor expand it about this point. We get
\[ f(N_1) = \ln W_N(N_1) = \ln W_N(\tilde N_1) - \frac{1}{2Npq}(N_1 - \tilde N_1)^2 + \ldots \qquad (2.28) \]

If we can ignore the higher-order terms, we get the following Gaussian approximation for $W_N(N_1)$,
\[ W_N(N_1) \approx p_0\exp\!\left[-\frac{(N_1 - \langle N_1\rangle)^2}{2\langle(\Delta N_1)^2\rangle}\right] \equiv p_G(N_1), \qquad (2.29) \]
where $\langle N_1\rangle = Np$ and $\langle(\Delta N_1)^2\rangle = Npq$. $p_0$ is a constant that can be determined by requiring the Gaussian to be normalized, and hence
\[ p_0 = \frac{1}{\sqrt{2\pi\langle(\Delta N_1)^2\rangle}}. \qquad (2.30) \]

As we analyze this normal distribution, it becomes clear that $\langle N_1\rangle_G = \langle N_1\rangle_{\rm binomial}$ and that $\langle(\Delta N_1)^2\rangle_G = \langle(\Delta N_1)^2\rangle_{\rm binomial}$.
It is a good idea to check the limits of applicability of the Gaussian approximation
to the binomial distribution. Clearly, the approximation will be good so long as the

terms that we ignored are small in comparison with the one we have kept. For this,
find the third derivative of f at the mean,
\begin{align*}
\frac{\partial^3 f}{\partial N_1^3} &= \frac{1}{N_1^2} - \frac{1}{(N-N_1)^2} + O\!\left(\frac{1}{N^3}\right)\\
\Rightarrow \left.\frac{\partial^3 f}{\partial N_1^3}\right|_{N_1=\tilde N_1} &= \frac{1}{N^2p^2} - \frac{1}{N^2q^2} = \frac{q^2 - p^2}{N^2p^2q^2}\\
&= \frac{q-p}{N^2p^2q^2} \qquad (2.31)
\end{align*}
For the Gaussian approximation to work, we must have
\begin{align*}
\frac{|q-p|}{3!\,N^2p^2q^2}\,|N_1 - \tilde N_1|^3 &\ll \frac{1}{2!\,Npq}\,|N_1 - \tilde N_1|^2\\
\Rightarrow |N_1 - \tilde N_1| &\ll \frac{3Npq}{|p-q|} \qquad (2.32)
\end{align*}
This is the interval over which the Gaussian approximation to the binomial distribution works. Outside of this interval, the Gaussian distribution goes to zero for large N. This can be seen by evaluating $p_G$ for $|N_1 - \tilde N_1| \sim 3Npq/|p-q|$:
\[ p_G \sim p_0\exp\!\left[-\frac{1}{2Npq}\,\frac{9N^2p^2q^2}{(q-p)^2}\right] \to 0, \quad\text{for large } N. \qquad (2.33) \]
The binomial also goes to zero over the same range for large N. Therefore, where the Gaussian approximation ceases to be accurate, both the Gaussian and the binomial are already negligibly small for large N. This is a consequence of the very general central limit theorem.
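The quality of the Gaussian approximation (2.29)-(2.30) is easy to inspect numerically; the values of N and p below are arbitrary illustrative choices.

```python
from math import comb, exp, pi, sqrt

N, p = 100, 0.3
q = 1.0 - p
mean, var = N * p, N * p * q

def W(N1):                       # exact binomial, Eq. (2.5)
    return comb(N, N1) * p**N1 * q**(N - N1)

def p_G(N1):                     # Gaussian approximation, Eqs. (2.29)-(2.30)
    return exp(-(N1 - mean)**2 / (2 * var)) / sqrt(2 * pi * var)

# Near the peak the two agree well; far in the tails both are tiny, although
# their ratio drifts, exactly as argued around Eq. (2.32).
for N1 in (30, 35, 40, 50, 60):
    print(f"N1={N1:3d}  binomial {W(N1):.3e}  gaussian {p_G(N1):.3e}")
```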

2.6 The Continuous Case


The random variable in question need not be discrete. In this case, we cannot give a probability function. We instead give a probability density function p(x) such that p(x)dx gives the probability of occurrence of the relevant event around the value x. The normalization condition reads $\int_{-\infty}^{\infty} p(x)\,dx = 1$.
The expectation values are similarly defined. For instance,
\[ \langle X\rangle \equiv \int_{-\infty}^{\infty} x\,p(x)\,dx, \qquad (2.34) \]
where p(x) is the probability density function for X. Likewise, we have
\[ \langle f(X)\rangle = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx. \qquad (2.35) \]

Do note that the probability density function has the dimensions of $[X]^{-1}$, the inverse of the dimensions of X.
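As a small illustration of Eqs. (2.34)-(2.35), the sketch below integrates a Gaussian density on a grid (the chosen density, parameters, and grid are arbitrary) and recovers its normalization, mean, and variance.

```python
import numpy as np

mu, sigma = 2.0, 0.5                      # illustrative parameters
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = x[1] - x[0]
p = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

print("normalization:", np.sum(p) * dx)                  # ~1
print("<X>          :", np.sum(x * p) * dx)              # ~mu
print("<(X - mu)^2> :", np.sum((x - mu)**2 * p) * dx)    # ~sigma^2
```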

Chapter 3

The Microcanonical Ensemble

We are now about to take a first step into understanding how to apply statistical
arguments to physical systems. In particular, in this chapter we will study systems
that have a fixed energy. We will go straight to the examples, relying on concepts explained in the first two chapters.

3.1 Paramagnetism
The first example that we encounter is that of a system of uncorrelated spin-1/2
particles in the microcanonical ensemble. By this we mean that the system has a
given total energy and number N .
We first need to know the Hamiltonian for the system. Since the spins are uncorrelated and we assume that the N spins are pinned to their respective sites, we may treat the N particles as being distinguishable. Also, we are interested only in the thermodynamics of the spins. We assume for the purposes of this analysis that the system is subjected to a uniform magnetic field. The Hamiltonian H can be written as
\[ H = -\sum_{i=1}^{N}\boldsymbol{\mu}_i\cdot\mathbf{H} = -\sum_{i=1}^{N}\mu_{iz}H, \qquad (3.1) \]

assuming that the magnetic field H is in the z-direction. The µi are the magnetic
moments of the N particles. We may simplify the above by noting that each spin-1/2 particle has only two allowed spin states, so $\mu_z = \pm\mu_0$, where $\mu_0$ is some fundamental unit of magnetic moment. With N1 moments aligned along the field and N2 = N − N1 anti-aligned, the energy of the system can, therefore, be written as
\[ E = -\mu_0(N_1 - N_2)H = -\mu_0 N_1 H + \mu_0(N - N_1)H \qquad (3.2) \]

The number of microstates can be recognized to be,

\[ \Omega(N, N_1) = \frac{N!}{N_1!\,(N-N_1)!} \qquad (3.3) \]

We need the number of microstates to be a function of E and N. This is easily achieved by writing N1 in terms of E from Eq. (3.2) as
\begin{align*}
N_1 &= \frac{1}{2}\left(N - \frac{E}{\mu_0 H}\right)\\
\Rightarrow N_2 = N - N_1 &= \frac{1}{2}\left(N + \frac{E}{\mu_0 H}\right) \qquad (3.4)
\end{align*}

Finally, we can write the number of microstates as
\[ \Omega(E, N) = \frac{N!}{\left[\frac{1}{2}\left(N - \frac{E}{\mu_0 H}\right)\right]!\;\left[\frac{1}{2}\left(N + \frac{E}{\mu_0 H}\right)\right]!} \qquad (3.5) \]

Now the entropy is related to ln Ω in the thermodynamic limit, E, N → ∞ with u ≡ E/N fixed. We use the Stirling approximation in the thermodynamic limit to obtain
\begin{align*}
\ln\Omega(E,N) = {}& N\ln N - N - \frac{1}{2}\left(N - \frac{E}{\mu_0 H}\right)\ln\left[\frac{1}{2}\left(N - \frac{E}{\mu_0 H}\right)\right] + \frac{1}{2}\left(N - \frac{E}{\mu_0 H}\right)\\
& - \frac{1}{2}\left(N + \frac{E}{\mu_0 H}\right)\ln\left[\frac{1}{2}\left(N + \frac{E}{\mu_0 H}\right)\right] + \frac{1}{2}\left(N + \frac{E}{\mu_0 H}\right) \qquad (3.6)
\end{align*}

We now take the thermodynamic limit E, N → ∞ with u = E/N fixed to find s ≡ S/N, the entropy per particle,
\begin{align*}
s/k_B &= \lim_{E,N\to\infty}\frac{1}{N}\ln\Omega(E,N)\\
&= \ln N - \frac{1}{2}\left(1 - \frac{u}{\mu_0 H}\right)\ln\left[\frac{N}{2}\left(1 - \frac{u}{\mu_0 H}\right)\right] - \frac{1}{2}\left(1 + \frac{u}{\mu_0 H}\right)\ln\left[\frac{N}{2}\left(1 + \frac{u}{\mu_0 H}\right)\right]\\
&= \ln N - \frac{1}{2}\left(1 - \frac{u}{\mu_0 H}\right)\ln\frac{N}{2} - \frac{1}{2}\left(1 - \frac{u}{\mu_0 H}\right)\ln\left(1 - \frac{u}{\mu_0 H}\right)\\
&\quad - \frac{1}{2}\left(1 + \frac{u}{\mu_0 H}\right)\ln\frac{N}{2} - \frac{1}{2}\left(1 + \frac{u}{\mu_0 H}\right)\ln\left(1 + \frac{u}{\mu_0 H}\right) \qquad (3.7)
\end{align*}

Figure 3.1: The entropy per particle $s(u)/k_B$ plotted as a function of the energy per particle $u/\mu_0 H$ for an ideal paramagnet. The region with u < 0 corresponds to T ≥ 0; the region with u > 0 corresponds to T < 0.

We finally arrive at the result that
\[ s(u) = k_B\ln 2 - \frac{k_B}{2}\left(1 - \frac{u}{\mu_0 H}\right)\ln\left(1 - \frac{u}{\mu_0 H}\right) - \frac{k_B}{2}\left(1 + \frac{u}{\mu_0 H}\right)\ln\left(1 + \frac{u}{\mu_0 H}\right) \qquad (3.8) \]

The other thermodynamic quantities can now be determined. In particular, the temperature is given by
\[ \frac{1}{T} = \frac{\partial s}{\partial u} = \frac{k_B}{2\mu_0 H}\ln\left(1 - \frac{u}{\mu_0 H}\right) - \frac{k_B}{2\mu_0 H}\ln\left(1 + \frac{u}{\mu_0 H}\right) \qquad (3.9) \]

Now is a good time to analyze these results. See Figure 3.1 where s has been
plotted against u. It can be seen clearly from the above Eq. (3.9) that the region
with positive temperature corresponds to the region with negative energy. This is
reasonable, as in equilibrium, the paramagnetic spins will be aligned in the direction
of H and will lead to a negative energy for the system. The maximum of the entropy lies at u = 0, as can be clearly seen from the figure. This is reasonable because for u = 0, half the spins are aligned with the field and the other half are aligned opposite to the field, hence yielding the largest number of microstates for the macrostate corresponding to zero energy. So, for a fixed field H, as T → 0 the energy per particle approaches −µ0H and the entropy approaches zero. Thermal excitations tend to disorder the spins and increase the entropy, whereas the magnetic field H tends to align them and reduce the entropy.
One may now invert Eq. (3.9) to obtain the energy density as a function of temperature.

Figure 3.2: The magnetization per particle $m/\mu_0$ plotted against $\mu_0 H/k_B T$ for a paramagnet.

In order to do so, we begin by realizing
\begin{align*}
\frac{\mu_0 H}{k_B T} &= \ln\frac{(1 - u/\mu_0 H)^{1/2}}{(1 + u/\mu_0 H)^{1/2}}\\
\Rightarrow e^{\mu_0 H/k_B T} &= \frac{(1 - u/\mu_0 H)^{1/2}}{(1 + u/\mu_0 H)^{1/2}}\\
\Rightarrow e^{\mu_0 H/k_B T} - e^{-\mu_0 H/k_B T} &= \frac{(1 - u/\mu_0 H) - (1 + u/\mu_0 H)}{\sqrt{(1 - u/\mu_0 H)(1 + u/\mu_0 H)}}\\
\text{and}\quad e^{\mu_0 H/k_B T} + e^{-\mu_0 H/k_B T} &= \frac{(1 - u/\mu_0 H) + (1 + u/\mu_0 H)}{\sqrt{(1 - u/\mu_0 H)(1 + u/\mu_0 H)}}\\
\Rightarrow u &= -\mu_0 H\tanh\left(\frac{\mu_0 H}{k_B T}\right) \qquad (3.10)
\end{align*}
The magnetization per particle m = M/N may be defined as
\begin{align*}
m &= \frac{M}{N} = \frac{\mu_0 N_1 - \mu_0 N_2}{N} = \frac{-E/H}{N}\\
&= -\frac{u}{H} = \mu_0\tanh\left(\frac{\mu_0 H}{k_B T}\right) \qquad (3.11)
\end{align*}
This is the ‘famous’ equation of the magnetization of a paramagnet. The mag-
netization saturates for low enough temperatures (kB T  |µ0 H|) to ±µ0 depending
on the sign of H. For high temperatures, the system gets randomized due to ther-
mal vibrations and the magnetization approaches zero. For a fixed temperature, the
magnetization depends linearly on the field for low fields. This can clearly be seen in
Figure 3.2.

The magnetization m = m(H, T) can be used to define the magnetic susceptibility,
\[ \chi(H, T) = \frac{\partial m}{\partial H} = \frac{\mu_0^2}{k_B T}\cosh^{-2}\left(\frac{\mu_0 H}{k_B T}\right) \qquad (3.12) \]

This, in the zero-field limit, gives the Curie law of paramagnetism, i.e. $\lim_{H\to 0}\chi = C/T$, where $C = \mu_0^2/k_B$. This law has been experimentally verified for a range of
paramagnetic compounds. The constant C gives information about the magnetic
moments of the molecules of the compound. This is also sometimes used as a low
temperature thermometer.
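The closed-form results (3.8)-(3.12) are simple to tabulate numerically. Below is a minimal sketch; working in units where $\mu_0 = k_B = H = 1$, and the particular temperature grid, are illustrative assumptions.

```python
import numpy as np

mu0, kB, H = 1.0, 1.0, 1.0        # units with mu0 = kB = H = 1 (illustrative)

T = np.array([0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
x = mu0 * H / (kB * T)

m = mu0 * np.tanh(x)                               # magnetization per particle, Eq. (3.11)
u = -mu0 * H * np.tanh(x)                          # energy per particle, Eq. (3.10)
chi = (mu0**2 / (kB * T)) * np.cosh(x)**-2         # susceptibility, Eq. (3.12)
# entropy per particle, Eq. (3.8), in units of kB
s = (np.log(2)
     - 0.5 * (1 - u / (mu0 * H)) * np.log(1 - u / (mu0 * H))
     - 0.5 * (1 + u / (mu0 * H)) * np.log(1 + u / (mu0 * H)))

for Ti, mi, si, ci in zip(T, m, s, chi):
    print(f"T={Ti:5.1f}  m/mu0={mi:+.4f}  s/kB={si:.4f}  chi*T={ci*Ti:.4f}")
# Low T: m -> mu0 and s -> 0.  High T: chi*T -> mu0^2/kB = C (Curie law).
```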

3.2 Einstein’s Model of Solids


The Dulong-Petit law states that the molar heat capacity (heat capacity per mole) of solids is a constant. Dulong and Petit arrived at this law empirically after Dalton had proposed atomic weights. The law, of course, does not explain the low-temperature behavior of solids, which was discovered later.
In 1907, Einstein used the ideas of quantization of energy given by Planck to derive, in a simple setting, both the Dulong-Petit law and the low-temperature behavior of the specific heat of solids. Einstein thought of the solid as composed of N′ non-interacting 3D oscillators of the same fundamental frequency. Therefore, one could imagine the solid to be equivalent to 3N′ = N non-interacting 1D simple harmonic oscillators. The energy of a simple harmonic oscillator is given by $E = \hbar\omega(n + 1/2)$.
The microstates of this model can be thought of as $(n_1, n_2, n_3, \ldots, n_N)$, where $n_i = 0, 1, 2, \ldots$ gives the number of energy quanta of the $i$th oscillator. The energy is now given by
\[ E_{n_1, n_2, \ldots} = M\hbar\omega + \frac{N}{2}\hbar\omega, \qquad (3.13) \]
where $M = \sum_i n_i$ is the total number of quanta of energy in the system. The number of microstates corresponding to the macrostate with energy E can be determined by finding out the number of ways of distributing M quanta amongst N oscillators.
The problem is identical to the combinatorial problem of determining the number
of ways of distributing M identical balls in N boxes. In order to visualize this,
focus on Figure 3.3. The N boxes are separated by N − 1 vertical lines. The M
quanta are represented as blue dots. This configuration represents one of the possible
microstates. Other microstates can be obtained by inequivalent permutations. There
are (M + N − 1)! permutations of the symbols (lines and dots) all of which don’t
lead to different microstates. In particular, M ! permutations amongst the dots and
(N − 1)! permutations of the lines give the same microstate. So, the total number of

Figure 3.3: The Einstein model of a solid. The N boxes are separated by N − 1 vertical lines. The M quanta are represented as blue dots.

microstates is,

\[ \Omega(M, N) = \frac{(M+N-1)!}{M!\,(N-1)!} \qquad (3.14) \]

Deriving the heat capacity per particle is now left as an exercise.
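Eq. (3.14) is easy to check against brute-force enumeration for small systems; the (M, N) pairs below are arbitrary small test cases.

```python
from math import comb
from itertools import product

def omega_formula(M, N):
    return comb(M + N - 1, M)          # Eq. (3.14)

def omega_bruteforce(M, N):
    # Count tuples (n_1, ..., n_N) of non-negative integers summing to M.
    return sum(1 for ns in product(range(M + 1), repeat=N) if sum(ns) == M)

for M, N in [(3, 2), (4, 3), (5, 4)]:
    print(M, N, omega_formula(M, N), omega_bruteforce(M, N))
```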

Chapter 4

The Canonical Ensemble

The microcanonical ensemble developed in the last chapter can become extremely
tedious to work with from a mathematical standpoint. The mathematical challenge
alone in trying to determine the number of microstates compatible with N particles
confined to a volume V with energy between U ± ∆/2 is enough cause to look for
an alternative formulation. From a physical standpoint, also, systems with a fixed
energy are not the norm. What is easier to implement is to maintain a system at a
fixed temperature by keeping it in thermal contact with a heat bath (which ideally
has an infinite heat capacity).
Such considerations lead us to consider what is referred to as the canonical en-
semble in statistical mechanics - essentially a large number of mental copies of our
system that may share energy but have the same temperature, volume and number
of particles. This is variously called the N, V, T ensemble.

4.1 The Canonical Distribution


The canonical distribution of energies refers to the probability of a system in the
canonical ensemble of having some given value of energy. Our stereotypical system
in the canonical ensemble is one that has a fixed number of particles and volume but
is allowed to exchange energy with a heat bath. Alternatively, we think of a large
number of mental copies of our system that are in thermal contact with each other
and share an energy U0 amongst themselves. These mental copies do not exchange
either volume or particles and are expected to attain thermal equilibrium at a fixed
temperature T. The canonical distribution is then the probability that one of the mental copies has an energy $E_s$. Naturally, there are several ways to obtain the canonical distribution. We shall focus only on the system in contact with the heat bath, as that is simpler from the point of view of computing the canonical distribution of energies.

4.1.1 A system in thermal contact with a heat bath
A heat bath, alternatively referred to as an energy reservoir, is a thermodynamic system that has a near-infinite heat capacity. This, of course, is an idealization. What we are really after is a reservoir large enough that its temperature does not change when it is in thermal contact with our system S, defined by having N particles and volume V. In equilibrium, the system S, of course, attains the same temperature T as our reservoir R.
The composite system S + R is idealized to not interact with the environment
and so has a fixed energy U0 . This means that this composite system as a whole can
be thought of as a microcanonical ensemble. Let the system S have energy Es . We
would like to find the probability of this happening. It is reasonable to expect that
Es  U0 . When S has energy Es , the reservoir has energy UR = U0 − Es .
With the state of system S having been specified (S is assumed to be in some particular eigenstate of the N-particle system, i.e. some definite microstate), the reservoir R may still be in
a large number of states gR (UR ) = gR (U0 − Es ). Since all of these states with a given
energy are equally likely to occur, the probability that R is in one of these states, and
correspondingly S is in a state with energy Es , is proportional to this number gR (UR ).

Ps ∝ gR (UR ) = gR (U0 − Es ). (4.1)

We may Taylor expand $\ln g_R(U_R)$ about $U_0$ to get
\[ \ln g_R(U_R) = \ln g_R(U_0) + \left(\frac{\partial \ln g_R}{\partial U_R}\right)_{U_R=U_0}(U_R - U_0) + \ldots \simeq \text{const} - \frac{E_s}{\tau}, \qquad (4.2) \]
where 1/τ = (∂ ln g(UR )/∂UR )N,V defines the common ‘fundamental’ temperature of
our system and reservoir. Recall that τ = kT with T being the thermodynamic
temperature. It will be convenient to define β = 1/τ = 1/kT . We now arrive at the
desired canonical distribution,

\[ P_s \propto e^{-\beta E_s} \;\Rightarrow\; P_s = \frac{e^{-\beta E_s}}{\sum_s e^{-\beta E_s}}, \qquad (4.3) \]

where the constant of proportionality has been computed by normalizing the proba-
bility function.
In deriving these results we have tacitly assumed that the different microstates all have different energies. However, the generalization to the energy-degenerate case is

straight-forward. Assuming that the energy level $E_s$ has degeneracy $g_s$, we obtain:
\[ \Pr(E = E_s) = \frac{g_s\,e^{-\beta E_s}}{\sum_s g_s\,e^{-\beta E_s}} \qquad (4.4) \]

The denominator of this expression is known as the partition function $Z_N(V, T) \equiv \sum_s g_s\,e^{-\beta E_s}$.
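A minimal numerical illustration of Eq. (4.4) for a toy level scheme; the energies, degeneracies, and temperatures below are invented purely for illustration.

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])     # energy levels (arbitrary units)
g = np.array([1, 3, 5])           # their degeneracies (illustrative)

for kT in (0.1, 1.0, 10.0):
    beta = 1.0 / kT
    weights = g * np.exp(-beta * E)
    Z = weights.sum()              # the partition function
    P = weights / Z                # Eq. (4.4)
    print(f"kT={kT:5.1f}  P = {np.round(P, 4)}  (sum = {P.sum():.3f})")
# Low kT: only the ground level is occupied.  High kT: P_s -> g_s / sum(g_s).
```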

4.2 Thermodynamics from the Canonical Distribution
Note that in the N, V, T ensemble point of view, our system S will not have a fixed
energy. Rather it will have the probability Ps to occupy the energy Es . In the
thermodynamic limit, we expect (and indeed it does happen) that the width of energy
is extremely small and the system is observed to have the energy:

\[ U = \langle E_s\rangle = \sum_s E_s P_s = \frac{\sum_s E_s\,g_s\,e^{-\beta E_s}}{\sum_s g_s\,e^{-\beta E_s}} = \frac{\sum_s E_s\,g_s\,e^{-\beta E_s}}{Z_N(V,T)} \qquad (4.5) \]
It is interesting to note that U may actually be computed as a derivative of the
partition function, namely:
\[ U = -\frac{\partial}{\partial\beta}\ln Z_N(V, T) \qquad (4.6) \]
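Eq. (4.6) can be checked by comparing a finite-difference derivative of ln Z with the direct ensemble average (4.5). The level scheme below is the same illustrative toy model as above.

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])     # toy levels (arbitrary units)
g = np.array([1, 3, 5])           # illustrative degeneracies

def lnZ(beta):
    return np.log(np.sum(g * np.exp(-beta * E)))

beta, h = 0.7, 1e-6
U_direct = np.sum(E * g * np.exp(-beta * E)) / np.sum(g * np.exp(-beta * E))   # Eq. (4.5)
U_deriv = -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h)                           # Eq. (4.6)

print("U from direct average :", U_direct)
print("U from -d lnZ / d beta:", U_deriv)
```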
Let’s switch gears for a few moments and talk about the Helmholtz free energy,
later referred to just as the free energy. The free energy is a function of N, V, T , and
so we suspect there might be some relation with the canonical ensemble. We know
that,

\begin{align*}
F &= F(T, V, N) \equiv U - TS\\
\Rightarrow dF &= -S\,dT - p\,dV + \mu\,dN\\
\Rightarrow S &= -\left(\frac{\partial F}{\partial T}\right)_{N,V} \qquad p = -\left(\frac{\partial F}{\partial V}\right)_{N,T} \qquad \mu = \left(\frac{\partial F}{\partial N}\right)_{T,V} \qquad (4.7)
\end{align*}

The energy U may be constructed from the free energy via
\[ U = F + TS = F - T\left(\frac{\partial F}{\partial T}\right)_{N,V} = \left(\frac{\partial(F/T)}{\partial(1/T)}\right)_{N,V} = \left(\frac{\partial(\beta F)}{\partial\beta}\right)_{N,V} \qquad (4.8) \]

We may now make the following identification looking at the two expressions above
for U :

βF = − ln(ZN (V, T ))
=⇒ F = −kT ln (ZN (V, T )) , (4.9)
We are now in a position to define the partition function $Z \equiv \sum_r e^{-\beta E_r}$, with $F = -kT\ln Z$. This is the connection between thermodynamics and the canonical ensemble. All we need to do is construct the “sum over states”, or the partition function. After this, we can directly find the free energy which we can then use to
compute all of thermodynamics. For instance,
\[ C_V = \left(\frac{\partial U}{\partial T}\right)_{V,N} = -T\left(\frac{\partial^2 F}{\partial T^2}\right)_{V,N} \qquad (4.10) \]

The free energy, of course, directly gives S, p, and µ.


It is also instructive to rewrite the entropy in a slightly different form. We start
with,
\begin{align*}
P_s &= g_s\,\frac{e^{-\beta E_s}}{Z}\\
\Rightarrow \ln\frac{P_s}{g_s} &= -\ln Z - \beta E_s\\
\Rightarrow \left\langle\ln\frac{P_s}{g_s}\right\rangle &= -\ln Z - \beta\langle E_s\rangle \qquad (4.11)
\end{align*}
Now F = −β −1 ln Z and hEs i is the average energy in the ensemble identified with
U . This means that − ln Z − β hEs i = β(F − U ) = −S/k from the definition of the
free energy. All of this gives us the simple result that the entropy depends only on
the occupation probabilities of the various energy levels accessible to the system,
 
\[ S = -k\left\langle\ln\frac{P_s}{g_s}\right\rangle = -k\sum_s P_s\ln\frac{P_s}{g_s}. \qquad (4.12) \]

It is easy to see that this definition of entropy makes sense from the point of view of information theory. For instance, as the temperature approaches zero, all particles shall occupy the lowest energy states possible, which leads the $P_s$ corresponding to the ground level to tend to 1 while $P_s \to 0$ for the unoccupied levels. This would mean that the entropy becomes zero (assuming a non-degenerate ground state).
It is also straight-forward to apply this result to the microcanonical ensemble.
For the microcanonical ensemble, we only need to consider the one state with energy
Es and multiplicity gs for which Ps = 1 with the understanding that all states with
different energies have zero probability. Then,
\[ S = -k\sum_s P_s\ln\frac{P_s}{g_s} = -k\ln\frac{1}{g_s} = k\ln g_s \qquad (4.13) \]

At this point, I would like to point out that many books will write the entropy as:
\[ S = -k\sum_r P_r\ln P_r, \qquad P_r = \frac{e^{-\beta E_r}}{Z}. \qquad (4.14) \]
The difference between this approach and the one taken by us is that we are summing
over states that have different energies all the while appropriately using the (multi-
plicity) degeneracy factor gs . This, in my opinion, is the more logical way of thinking
about this problem.
Regardless of how the formalism is developed, it is almost always used in the way that we have done here.

4.3 The Partition Function


Just to recap: the partition function Z, as defined in the previous section, is of utmost importance in the canonical ensemble approach to statistical mechanics. In many applications of physical interest, the energy levels are degenerate. If the degeneracy of the level $E_i$ is $g_i$, the partition function may then be written as
\[ Z_N(V, T) = \sum_i g_i\,e^{-\beta E_i}. \qquad (4.15) \]

The dependence on T is obvious through β, whereas the dependence on N and V is


there, as before, because the energy levels depend on these parameters. There is a

special reason for writing ZN (V, T ) rather than Z(N, V, T ) which will become clear
later. Furthermore, since the free energy must be an extensive2 variable, so must ln Z.
The probability that the system inhabits a state with energy Ei is,

\[ P_i = \frac{g_i\,e^{-\beta E_i}}{\sum_i g_i\,e^{-\beta E_i}}, \qquad (4.16) \]

with states of the same energy Ei equally likely to occur. In many applications, the
energy becomes an essentially continuous variable. In such cases we may define a
density of states g(E) such that g(E)dE is the number of states between energies
E and E + dE. The partition function and corresponding probability function then
become,

\begin{align*}
P(E)\,dE &= \frac{g(E)\,e^{-\beta E}\,dE}{\int_0^{\infty} g(E)\,e^{-\beta E}\,dE},\\
Z_N(V, T) &= \int_0^{\infty} g(E)\,e^{-\beta E}\,dE \qquad (4.17)
\end{align*}

Finally, the expectation value of an observable f is given by
\[ \langle f\rangle = \sum_i f_i P_i = \frac{\sum_i f_i\,g_i\,e^{-\beta E_i}}{\sum_i g_i\,e^{-\beta E_i}} \;\to\; \frac{\int_0^{\infty} f(E)\,g(E)\,e^{-\beta E}\,dE}{\int_0^{\infty} g(E)\,e^{-\beta E}\,dE} \qquad (4.18) \]

It is interesting to note that for the physical case (β > 0) the partition function is
nothing but the Laplace transform of the density of states. Therefore, if we know the
partition function somehow, we can find the density of states by Laplace inversion by
a suitable means3 .
Finally, consider the case of finding the partition function for a system of N particles, such that the N particles can be tagged in a classical sense (for example, by being fixed in space) and are all independent. Assume that for the one-particle partition function the set of allowed energies is $\{\epsilon_r\}$ with associated multiplicities $\{g_r\}$. Then, the N-particle partition function may be written as:
\[ Z_N(V, T) = \sum_r g_r\,e^{-\beta E_r}, \qquad (4.19) \]
where $g_r = g_{r_1}g_{r_2}\ldots g_{r_N}$ is the multiplicity factor for the N-particle state having energy $E_r = \epsilon_{r_1} + \epsilon_{r_2} + \cdots + \epsilon_{r_N}$. Also, the sum over all states takes the form $\sum_r \to \sum_{r_1}\sum_{r_2}\cdots\sum_{r_N}$;
\begin{align*}
Z_N(V, T) &= \sum_{r_1}\sum_{r_2}\cdots\sum_{r_N} g_{r_1}g_{r_2}\ldots g_{r_N}\,e^{-\beta(\epsilon_{r_1} + \epsilon_{r_2} + \cdots + \epsilon_{r_N})},\\
&= \left(\sum_{r_1} g_{r_1}e^{-\beta\epsilon_{r_1}}\right)\left(\sum_{r_2} g_{r_2}e^{-\beta\epsilon_{r_2}}\right)\ldots\left(\sum_{r_N} g_{r_N}e^{-\beta\epsilon_{r_N}}\right)\\
&= \left(Z_1(V, T)\right)^N \qquad (4.20)
\end{align*}

²An extensive variable is one that scales with the size of the system. For instance, the number of particles N is an extensive variable. Intensive variables are those that do not scale with system size, for instance temperature, pressure, etc.
³See, for example, the Mellin inversion formula.

It should now be clear that if the N particles are identical and interchangeable
(as in a gas), then this procedure is over-counting the partition function by N !, the
number of permutations of the N identical particles. So, when the N particles cannot
be tagged in the classical sense, then for independent particles we get:

\[ Z_N(V, T) = \frac{\left(Z_1(V, T)\right)^N}{N!} \qquad (4.21) \]
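The factorization (4.20) for distinguishable, independent units can be verified by brute force for a small system; the single-particle spectrum, β, and N below are arbitrary illustrative choices.

```python
import numpy as np
from itertools import product

eps = np.array([0.0, 0.7, 1.3])        # single-particle energies (illustrative)
beta, N = 0.9, 4                        # inverse temperature and particle number

Z1 = np.sum(np.exp(-beta * eps))

# Brute force: sum exp(-beta * E_r) over every N-particle state (r1, ..., rN).
Z_brute = sum(np.exp(-beta * sum(combo)) for combo in product(eps, repeat=N))

print("Z1^N        :", Z1**N)
print("brute force :", Z_brute)         # equals Z1^N for tagged (distinguishable) units
```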

4.4 Assorted Examples


In this section I will briefly show how the partition function may be constructed for
various systems. Use this as a starting point for determining the full thermodynamics
of these example systems as well as a guide to understand how to deal with other
systems.
In all these cases I shall simply assume that the systems are composed of N
identical copies of the same basic unit. However, the N copies can be tagged in the classical sense and so cannot be exchanged. The result is that the complete workup of these systems can be obtained from a study of the 1-component partition function. As usual, lowercase letters will be used for the 1-component extensive parameters, e.g. f for the Helmholtz free energy per particle.

4.4.1 Two State System


Let's assume that the two possible energies are 0 and $\epsilon$. The one-particle partition function is
\[ Z_1 = 1 + e^{-\beta\epsilon}, \qquad (4.22) \]
from which follows the Helmholtz free energy per particle,
\[ f = -kT\ln\left(1 + e^{-\beta\epsilon}\right) \qquad (4.23) \]

It is interesting to investigate the behavior of the probability of finding this system in a given state. Denote this by $P_r$, with r = 0 corresponding to the zero-energy state and r = 1 corresponding to the state with energy $\epsilon$. Then,
\[ P_0 = \frac{1}{1 + e^{-\beta\epsilon}}, \qquad P_1 = \frac{e^{-\beta\epsilon}}{1 + e^{-\beta\epsilon}} \qquad (4.24) \]

In order to make sense of these results, the two limits T → 0 and T → ∞ are interesting to investigate. As T → 0, β → ∞ and $e^{-\beta\epsilon} \to 0$. Therefore, $P_0 \to 1$ and $P_1 \to 0$ in the limit T → 0. This is entirely consistent with our expectation of the system settling in the lowest energy state at low temperatures. You should check that the entropy goes to zero in this limit as well - this is, after all, a highly ordered configuration. As T increases, the thermal agitation in the system will bring about a loss of information, i.e. randomness and an increase in entropy. In the limit T → ∞, we expect the configuration to be completely random. This is indeed the case, as in this limit $\beta\epsilon \to 0$ with the result that $P_0 = P_1 = 1/2$.
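A short numerical check of Eqs. (4.24) and the limits just discussed, together with the entropy per particle from Eq. (4.12); the value of ε and the temperature list are illustrative, in units where k = 1.

```python
import numpy as np

eps = 1.0                               # energy gap (arbitrary units), k = 1

for kT in (0.05, 0.5, 1.0, 5.0, 50.0):
    beta = 1.0 / kT
    Z1 = 1.0 + np.exp(-beta * eps)
    P0 = 1.0 / Z1
    P1 = np.exp(-beta * eps) / Z1
    s = -(P0 * np.log(P0) + P1 * np.log(P1))     # entropy per particle / k, Eq. (4.12)
    print(f"kT={kT:6.2f}  P0={P0:.4f}  P1={P1:.4f}  s/k={s:.4f}")
# kT -> 0: P0 -> 1 and s -> 0.  kT -> infinity: P0 = P1 = 1/2 and s -> ln 2.
```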

4.4.2 Independent Simple Harmonic Oscillators


The simple harmonic oscillator is one of the most studied models in physics. We turn to it as an application of the ideas developed in this chapter.
We will treat the quantum mechanical problem by explicitly computing the partition function as a sum over the allowed energy states. We know that the eigenenergies $\epsilon_n$ are equal to $(n + 1/2)\hbar\omega$, where n is any non-negative integer. Then,


\begin{align*}
Z_1(\beta) &= \sum_{n=0}^{\infty} e^{-\beta(n+1/2)\hbar\omega}\\
&= e^{-\beta\hbar\omega/2}\sum_{n=0}^{\infty} e^{-\beta n\hbar\omega}\\
&= \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}\\
\Rightarrow Z_N(\beta) &= e^{-N\beta\hbar\omega/2}\left(1 - e^{-\beta\hbar\omega}\right)^{-N} \qquad (4.25)
\end{align*}

Do note that in writing the above we have not divided by N !. This is because the
N -oscillators are assumed to be enumerable in principle. This happens for the case of
solids where the oscillators may be thought of as being associated with lattice sites.

From this we can directly compute all the thermodynamics,
\begin{align*}
F &= -kT\ln Z_N = NkT\left[\frac{\beta\hbar\omega}{2} + \ln\left(1 - e^{-\beta\hbar\omega}\right)\right]\\
&= N\left[\frac{1}{2}\hbar\omega + kT\ln\left(1 - e^{-\beta\hbar\omega}\right)\right]\\
\Rightarrow \mu &= F/N\\
p &= 0\\
\text{and}\quad S &= Nk\left[\frac{\beta\hbar\omega}{e^{\beta\hbar\omega} - 1} - \ln\left(1 - e^{-\beta\hbar\omega}\right)\right] \qquad (4.26)
\end{align*}
Finally we can say,
\begin{align*}
U &= N\left[\frac{\hbar\omega}{2} + \frac{\hbar\omega}{e^{\beta\hbar\omega} - 1}\right]\\
C_p &= C_V = Nk(\beta\hbar\omega)^2\,\frac{e^{\beta\hbar\omega}}{\left(e^{\beta\hbar\omega} - 1\right)^2} \qquad (4.27)
\end{align*}

The energy per oscillator u = U/N can be seen to not follow the equipartition theorem in the low temperature case, with the equipartition theorem being valid at high temperatures. The equipartition theorem is a general result of classical thermodynamics which states that, in thermal equilibrium, every quadratic degree of freedom contributes the same energy, equal to kT/2. For the oscillators that we have taken here (1-dimensional), there are two such degrees of freedom (momentum and position), and hence we expect the energy per particle to be kT, at least in the high temperature limit when quantum effects can be ignored. The easiest way to see this is to compute u/kT in the limits T → 0 (β → ∞) and T → ∞ (β → 0). The calculation for T → ∞ follows:
 
\begin{align*}
\lim_{T\to\infty}\frac{u}{kT} &= \lim_{T\to\infty}\frac{\hbar\omega}{kT}\left[\frac{1}{2} + \frac{1}{e^{\beta\hbar\omega} - 1}\right]\\
&= \hbar\omega\lim_{\beta\to 0}\left[\frac{\beta}{2} + \frac{\beta}{e^{\beta\hbar\omega} - 1}\right] = \hbar\omega\lim_{\beta\to 0}\frac{1}{\hbar\omega\,e^{\beta\hbar\omega}}\\
&= \hbar\omega\left(\frac{1}{\hbar\omega}\right) = 1, \qquad (4.28)
\end{align*}

where in the second last line we have used L’Hopital’s rule to simplify the expression.
This shows that in the limit of high temperatures the equipartition theorem is valid.
This is consistent with our understanding of thermal agitation wiping out quantum
effects. As will be seen shortly, the primary quantum effect that arises in this system
is the ground state energy which becomes irrelevant when thermal energy is large.

Thus, we expect the system to show maximum deviation from classical behavior in the limit T → 0. In this limit, the system should, on physical grounds, settle into the ground state. So we expect u → ℏω/2 as T → 0. This can be seen from the above expression for u as well. To recall, classically the energy per oscillator goes to zero as T → 0 because it is simply given by kT.
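A minimal numerical check of Eqs. (4.27)-(4.28); working in units where ℏω = k = 1, and the particular temperature grid, are illustrative assumptions. At high T the energy per oscillator approaches kT (equipartition) and C_V/Nk → 1, while at low T, u → ℏω/2 and C_V → 0.

```python
import numpy as np

hw = 1.0                                 # hbar*omega in units where k = 1 (illustrative)

for kT in (0.05, 0.2, 1.0, 5.0, 20.0):
    beta = 1.0 / kT
    x = beta * hw
    u = hw * (0.5 + 1.0 / np.expm1(x))              # energy per oscillator, Eq. (4.27)
    cv = x**2 * np.exp(x) / np.expm1(x)**2          # C_V / (N k), Eq. (4.27)
    print(f"kT={kT:6.2f}  u={u:8.4f}  u/kT={u/kT:7.4f}  C_V/Nk={cv:.4f}")
# kT -> 0: u -> hw/2 and C_V -> 0.  kT -> infinity: u/kT -> 1 and C_V/Nk -> 1.
```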
