
Lecture Notes: Thermal Physics

S Chaturvedi
August 18, 2021

Contents

1 Rudiments of Probability Theory
  1.1 Compound events and associated probabilities
  1.2 Mutual exclusiveness of events, independence of events
  1.3 Conditional probability, Bayes' theorem

2 The statistical concept of uncertainty: Entropy

3 Discrete random variables, probability distributions

4 Some important discrete probability distributions
  4.1 Bernoulli trials, Binomial distribution
  4.2 Poisson distribution
  4.3 Geometric distribution
  4.4 Multinomial distribution

5 Continuous probability distributions

6 Some important continuous probability distributions
  6.1 Uniform distribution
  6.2 Gaussian or the normal distribution
  6.3 Lorentzian distribution
  6.4 Dirac delta function distribution

7 Intuitive notions of temperature, heat, thermal equilibrium

8 Macrostates and Microstates

9 Boltzmann probabilities

10 Maxwell distribution of velocities

11 Doppler broadening

1 Rudiments of Probability Theory
Probability theory deals with results of experiments performed on a system
assumed to exist in certain well defined ’states’. For simplicity we will assume
that the states can be labelled by a discrete index i taking values from 1
to k. By an experiment one means performing a specific operation on the
system and recording the outcome. Some typical examples of systems and
experiments performed on them are:
1. System: A coin, States: Heads (H) or Tails (T), Experiment: Tossing
the coin and recording the outcome H or T.
2. System: A die, States: 1, 2, 3, 4, 5, 6, Experiment: Casting the die and
recording the outcome i, i = 1, ..., 6.
3. System: A deck of cards, States: 1, 2, ..., 52, Experiment: Pulling out
a card from the deck and recording the outcome i, i = 1, ..., 52.
4. System: Two coins, States: HH, HT, TH, TT, Experiment: Tossing the coins
and recording which of the four outcomes is realized.
The use of the term ‘state’ here is inspired by quantum mechanics. In con-
ventional probability theory what we call a state is referred to as a simple
event and the set of all possible states, the state space, as the sample space.
Remark: It is appropriate to note that here we are dealing with 'classical'
probability theory. Important and fundamental differences arise when
we pass to quantum probabilities. This happens because the notions of
specification of a quantum system, the act of measurement and the
specification of the state of a composite quantum system are radically
different from their classical counterparts.
If an experiment is performed a number of times, say N times, a particular
simple event i may occur ni times. The number ni may of course depend
on N . To make that explicit we write ni as ni (N ). We may now define the
frequency of occurrence of the event i as
ni (N )
fi =
N
Clearly
X
fi ≥ 0; fi = 1;
i

Remark: Performing N experiments on the same system at different in-
stants of time may also be viewed as performing a single experiment at the
same instant of time on N copies of the same system. While the process
of computing the frequencies of outcomes from the first point of view may
be regarded as having been obtained by a ‘time averaging’ procedure, those
computed from the second point of view may appropriately be regarded as
‘ensemble averages’.
To remove the dependence on N , we consider the limit N → ∞ and arrive
at the notion of the probability pi of the event i :

p_i = \lim_{N \to \infty} \frac{n_i(N)}{N}

Again, as with the f_i's, we have

p_i \geq 0, \qquad \sum_i p_i = 1

The probabilities p_i thus obtained are referred to as 'empirical' or 'a posteriori'
probabilities.
If, on the other hand there is no apparent reason for one event to occur
more frequently than another, then their respective probabilities are assumed
to be equal and one has

pi = 1/Ω, i = 1, 2, · · · , Ω

where Ω gives the size of the sample space, the total number of simple events.
The probabilities thus assigned are referred to as ‘a priori’ probabilities.
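A quick numerical experiment makes the passage from frequencies to probabilities concrete. The sketch below is only an illustration (it assumes Python with NumPy is available, and the sample sizes are arbitrary): it simulates casts of a true die and prints the frequencies f_i, which approach the a priori values p_i = 1/6 as N grows.

    import numpy as np

    rng = np.random.default_rng(0)

    # A true die: a priori probabilities p_i = 1/6 for i = 1, ..., 6.
    for N in (100, 10_000, 1_000_000):
        outcomes = rng.integers(1, 7, size=N)            # N casts of the die
        counts = np.bincount(outcomes, minlength=7)[1:]  # n_i(N) for i = 1, ..., 6
        print(N, np.round(counts / N, 3))                # f_i = n_i(N)/N -> 1/6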

1.1 Compound events and associated probabilities


Recall that the sample space is defined as the set of all simple events. A
simple event may thus be regarded as a subset of the sample space consisting
of a singleton. A subset A of the sample space containing two or more points
may therefore be appropriately called a compound event and may be pictured
as follows:

[Figure: the sample space of simple events, with a compound event shown as a subset.]

The probability p(A) of a compound event A is obtained by summing up the
probabilities of the simple events contained in A:

p(A) = \sum_{i \in A} p(i)

Given two compound events A and B, the probability p(A ∪ B) that A or B
(or both) occur is given by

p(A \cup B) = p(A) + p(B) - p(A \cap B)

as should be evident from the figure below:

[Figure: two overlapping compound events A and B within the sample space.]

It is also clear that p(A ∩ B) can be understood as the probability that
both the events A and B occur and can rightly be referred to as the joint
probability of occurrence of A and B. One often uses the notation p(A; B)
for the joint probability of occurrence of A and B and by construction, it is
symmetric in its arguments A and B i.e. p(A; B) = p(B; A).

1.2 Mutual exclusiveness of events, independence of
events
Two compound events A and B are said to be mutually exclusive if they
have no simple events in common i.e. A ∩ B is empty and in that case we
have
p(A ∩ B) = 0
and as a result
p(A ∪ B) = p(A) + p(B)
(Since simple events are, by definition, mutually exclusive, we immediately see
that this is consistent with the way we associate probabilities with compound
events.) Thus, with reference to the figure below,

[Figure: three compound events A, B and C in the sample space; A overlaps both B and C, while B and C do not overlap.]

we see that while the events B and C are mutually exclusive, the events A
and B and the events A and C are not. As a result only in the first case do
we have the situation that the probability that B or C occur is obtained by
adding up the probability of occurrence of B and that of C.
Two events A and B are said to be independent if
p(A ∩ B) = p(A)p(B)
i.e. if the probability that A and B occur is the product of the respective
probabilities.

1.3 Conditional probability, Bayes' theorem
Having defined the notion of a joint probability we now introduce the notion
of a conditional probability p(A|B), the probability that A occurs given that
B has occurred. It is defined as follows

p(A|B) = \frac{p(A \cap B)}{p(B)}

A moment's thought reveals that p(A|B) can simply be understood as the
'fraction of B contained in A'. It is also evident that, unlike the joint probability,
the conditional probability is, in general, not symmetric in its arguments:

p(A|B) \neq p(B|A)

Further, if two events A and B are independent, then for two such events
we have

p(A|B) = p(A) and p(B|A) = p(B)

justifying the use of the term ‘independent’ in that the occurrence of one
does not affect the chances of occurrence of the other.
Further from the definition of conditional probability and the symmetry
of the joint probability it follows that

p(A|B)p(B) = p(B|A)p(A)

This result goes by the name of Bayes' theorem.
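The relation is easy to verify by direct enumeration on a small sample space. The sketch below is purely illustrative (plain Python; the events A and B for a true die are chosen arbitrarily) and checks that p(A|B) p(B) = p(B|A) p(A) = p(A ∩ B).

    from fractions import Fraction

    # Sample space of a true die; every simple event has probability 1/6.
    omega = {1, 2, 3, 4, 5, 6}

    def prob(event):
        return Fraction(len(event), len(omega))

    A = {2, 4, 6}   # "the outcome is even"   (arbitrary illustrative event)
    B = {5, 6}      # "the outcome is >= 5"   (arbitrary illustrative event)

    p_A_given_B = prob(A & B) / prob(B)   # conditional probability p(A|B)
    p_B_given_A = prob(A & B) / prob(A)   # conditional probability p(B|A)

    # Bayes' theorem: p(A|B) p(B) = p(B|A) p(A) (= the joint probability p(A ∩ B))
    assert p_A_given_B * prob(B) == p_B_given_A * prob(A) == prob(A & B)
    print(p_A_given_B, p_B_given_A)       # 1/2 and 1/3: not equal in general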


Exercise 1
Consider the experiment of drawing a single card from a deck of cards.
The possible simple events are 52 in number and can be labelled by i =
1, 2, · · · , 52. The sample space in this case consists of 52 points each repre-
senting one of the cards that may be drawn. We now define three compound
events A, B, C as follows :

• A: All points that represent hearts.

• B: All points that represent a number three card.

• C: All points that represent a one-eyed jack.

[a] Compute the probability of occurrence of

[1] A.
[2] B.
[3] C.
[4] A or B.
[5] A or C.
[6] B or C.
[7] A and B.
[8] A and C.
[9] B and C.
[10] A given that B has occurred.
[11] A given that C has occurred.
[12] B given that A has occurred.
[13] B given that C has occurred.
[14] C given that A has occurred.
[15] C given that B has occurred.

[b] List the pairs of events which are mutually exclusive.

[c] List the pairs of events which are independent.

2 The statistical concept of uncertainty: Entropy
Consider four experiments I-IV with the first one involving a true die and the
remaining three involving a die which is not true. The sample space in each
of these experiments consists of six points labelled by i with i taking values
from 1 to 6. Let the probabilities for the six possible outcomes in each of the
four experiments be as given below:
I:   p_i = 1/6 for i = 1, ..., 6

II:  p_i = 1/8 for i ≠ 6, p_6 = 3/8

III: p_i = 0 for i ≠ 6, p_6 = 1

IV:  p_i = 1/8 for i ≠ 3, p_3 = 3/8

Consider now cases I and II. If one were to select one of these dice for purposes
of gambling, one would obviously select the die in II and bet on number
six. The reason for choosing die II is that compared to the die in I one is
less uncertain about the outcome. Indeed, if one wanted to be absolutely
certain about the outcome one would choose the die in case III. Based on
these intuitively obvious considerations we can draw the following comparison
table:
I Most uncertain

II Less uncertain than I

III No uncertainty

IV Same uncertainty as in II
We can see that the uncertainty does not depend on which particular event
carries a given probability but rather on the values of all the probabilities.
Consider now an experiment involving rolling two dice. The sample space
now consists of 36 points which can be labelled as (i, j) with i, j = 1, · · · 6
by associating i and j with the first and the second die respectively. Sup-
pose we choose I for the first die and III for the second. Since the dice are
independent pij = pi (I)pj (III). The uncertainty in the present case is clearly
the uncertainty in I as the uncertainty in III is zero and we can tentatively
stipulate that the uncertainty in this combination (I,III) = uncertainty(I) +
uncertainty(III) as uncertainty(III) =0. Consider now the combination I and
II. In this combination all we can say is that the uncertainty must be at least
as large as the uncertainty in I and the uncertainty in II separately. Clearly
uncertainty(I)+uncertainty(II) satisfies this requirement.
Based on these considerations we may now list the desired properties for
a quantifier of the uncertainty associated with an experiment:
• The uncertainty of an experiment consisting of two independent events
equals the sum of their individual uncertainties.

• The uncertainty of an experiment depends on the probabilities of all
the events p_i and is thus an average property of the experiment.

• The maximum uncertainty of an experiment occurs if all the probabil-
ities are equal. If one event has probability 1 then the uncertainty is
zero.

• The uncertainty must be a symmetric function of the p_i's: a permutation
of the p_i must not alter its value.

A function satisfying all these requirements, first considered by Boltzmann
way back in 1872 and later by Gibbs, and by Shannon in the context of
information theory, is

S = -k_B \sum_i p_i \ln p_i

where kB is a constant.
The most mysterious part of this expression is the logarithmic term. One
can easily check that it is needed to satisfy the first requirement in the list
above.
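One way to see this explicitly: for two independent experiments with probabilities p_i and q_j, the joint probabilities are the products p_i q_j, and the logarithm turns these products into sums:

S_{12} = -k_B \sum_{i,j} p_i q_j \ln(p_i q_j)
       = -k_B \sum_{i,j} p_i q_j (\ln p_i + \ln q_j)
       = -k_B \sum_i p_i \ln p_i - k_B \sum_j q_j \ln q_j = S_1 + S_2,

where \sum_i p_i = \sum_j q_j = 1 has been used.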
Exercise 2
Verify that the above expression for a quantitative measure of uncertainty in
an experiment does indeed bear out the intuitive conclusions drawn above in
the context of experiments I, II, III, IV, (I,III) and (I,II).
Exercise 3
Show that S attains its maximum value for pi = 1/Ω where Ω is the size of
the sample space and that the maximum value of S is kB ln Ω.
Exercise 4
Use the method of Lagrange multipliers to maximize S subject to the con-
straints \sum_i p_i = 1 and \sum_i p_i E_i = U to show that the maximum occurs
when the p_i's are given by

p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i} \qquad \text{(Boltzmann probabilities)}

and that S for such values of pi is given by

S = kB (ln Z + βU )

3 Discrete random variables, probability distributions
There are many circumstances in which we can associate numerical values
x1 , x2 , · · · xn to the n points i = 1, 2, · · · , n in the sample space. In such
circumstances one speaks of a random variable X taking values in the set
{x1 , x2 , · · · , xn } and interprets the probability pi as the probability p(X = xi )
that the random variable X has a numerical value xi . With this understand-
ing, for notational convenience, it is customary to abbreviate p(X = xi ) as
p(xi ) and refer to the collection {p(xi ), i = 1, 2, · · · , n} as the probability
distribution of the discrete random variable X. The statements
p_i \geq 0, \qquad \sum_i p_i = 1

translate into

p(x_i) \geq 0, \qquad \sum_i p(x_i) = 1

It now becomes meaningful to talk about moments \langle X^k \rangle of a random variable,
defined as follows:

\langle X^k \rangle = \sum_i x_i^k\, p(x_i)

and associated quantities such as the average value of X, the variance of X, the
standard deviation etc. In the case when the x_i take values in the set {i, i = 0, 1, ..., N},
so that we can put x_i = i and p(x_i) ≡ p(i), computation of the moments (or,
better, factorial moments) is greatly facilitated by the use of the corresponding
'generating' function:

F(z) = \sum_i p(i)\, z^i
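For instance, successive derivatives of F(z) evaluated at z = 1 yield the factorial moments directly (a standard property, quoted here without derivation):

F'(1) = \sum_i i\, p(i) = \langle X \rangle, \qquad
\frac{d^k F}{dz^k}\bigg|_{z=1} = \langle X(X-1)\cdots(X-k+1) \rangle.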

4 Some important discrete probability distributions
4.1 Bernoulli trials, Binomial distribution
Consider a coin toss in which the outcomes can be heads or tails with proba-
bilities p and q with p + q = 1. We ask the question : what is the probability

that we get n heads if the coin is tossed N times. Here n is the random
variable taking values in the set {0, 1, ..., N}. Elementary considerations
based on what we have already learnt show that

p(n) = \binom{N}{n} p^n q^{N-n}

(each particular sequence of n heads and N - n tails occurs with probability
p^n q^{N-n}, and there are \binom{N}{n} such sequences).

A physical application of this distribution arises in the context of number


fluctuations in a (classical ideal) gas consisting of N molecules in a volume
V . We ask the question : what is the probability that n molecules are present
in a subvolume v? The a priori probability that a molecule is present in a
subvolume v is clearly v/V . Putting p = v/V in the formula above we get
   
p(n) = \binom{N}{n} \left(\frac{v}{V}\right)^n \left(1 - \frac{v}{V}\right)^{N-n}

4.2 Poisson distribution


In the limit N → ∞, p → 0 with N p finite the binomial distribution goes
over to the Poisson distribution:
p(n) = e^{-\mu} \frac{\mu^n}{n!}, \qquad n = 0, 1, 2, \ldots
where

\mu = \lim_{\substack{N \to \infty \\ p \to 0}} (Np)
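The quality of this limit is easy to probe numerically. The sketch below is purely illustrative (plain Python; the values of N and p are arbitrary choices) and tabulates the binomial and Poisson probabilities side by side for µ = Np = 3.

    from math import comb, exp, factorial

    N, p = 1000, 0.003        # large N, small p (illustrative values); mu = N p = 3
    mu = N * p

    for n in range(8):
        binomial = comb(N, n) * p**n * (1 - p)**(N - n)
        poisson = exp(-mu) * mu**n / factorial(n)
        print(n, round(binomial, 5), round(poisson, 5))   # the two columns nearly agree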

4.3 Geometric distribution


Consider again a coin with probabilities p and q for heads and tails respec-
tively. The experiment now consists of tossing the coin repeatedly till heads
is obtained for the first time. The probability p(n) that this happens on the
(n + 1)th toss is clearly

p(n) = q n p

This distribution is called a geometric distribution.
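As a quick consistency check, these probabilities sum to one, and the mean number of tails preceding the first head follows from the same geometric series:

\sum_{n=0}^{\infty} q^n p = \frac{p}{1-q} = 1, \qquad
\langle n \rangle = p \sum_{n=0}^{\infty} n\, q^n = \frac{p\, q}{(1-q)^2} = \frac{q}{p}.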

4.4 Multinomial distribution
Setting n = n1 , n2 = N − n and p = p1 , q = p2 with n1 + n2 = N and
p1 + p2 = 1, the binomial distribution can be written as
p(n_1, n_2) = \frac{N!}{n_1!\, n_2!}\, p_1^{n_1} p_2^{n_2}.

This immediately generalises to a multinomial distribution:

p(n_1, n_2, \ldots, n_m) = \frac{N!}{n_1!\, n_2! \cdots n_m!}\, p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m}

with n_1 + n_2 + \cdots + n_m = N and p_1 + p_2 + \cdots + p_m = 1.

5 Continuous probability distributions


If the random variable X takes values in (−∞, ∞) we associate with it a
probability density function p(x) with the understanding that p(x)dx gives
the probability that the random variable has a value in the range x to
x + dx. Of course, as with probabilities,

p(x) \geq 0 \text{ for all } x, \qquad \int_{-\infty}^{\infty} dx\, p(x) = 1

[In the mathematics literature one often deals with what is called the cumulative
distribution function

\Pr(X \leq x) = \int_{-\infty}^{x} dx'\, p(x')

to accommodate cases where p(x) may not be well defined. The Dirac delta
function distribution is a case in point.]
A useful analogue of the generating function in the context of the dis-
crete probability distribution function is the characteristic function F (k) of
a continuous probability distribution function p(x) defined as the Fourier
transform of p(x)
F(k) = \int_{-\infty}^{\infty} dx\, e^{-ikx}\, p(x)
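With this sign convention, the moments of X (when they exist) are obtained from derivatives of F(k) at the origin:

\frac{d^n F}{dk^n}\bigg|_{k=0} = (-i)^n \int_{-\infty}^{\infty} dx\, x^n\, p(x)
\quad \Longrightarrow \quad
\langle X^n \rangle = i^n\, \frac{d^n F}{dk^n}\bigg|_{k=0}.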

6 Some important continuous probability distributions
6.1 Uniform distribution
p(x) = \begin{cases} 1 & \text{for } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}

Here one is dealing with a random variable X which is uniformly distributed
in the interval [0, 1].

6.2 Gaussian or the normal distribution

p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where µ is any real number and σ is a positive number. The parameter µ is
the mean value of X, σ^2 its variance and σ its standard deviation:

\langle X \rangle = \mu; \qquad \langle X^2 \rangle - \langle X \rangle^2 = \sigma^2

The importance of the Gaussian distribution in all walks of life stems from
the central limit theorem :
Let X1 , X2 , · · · , Xn be independent and identically distributed random vari-
ables with mean µ and variance σ 2 . They need not be Gaussian random
variables. Then in the limit n → ∞ the probability distribution of the random variable
Z_n = (X_1 + X_2 + \cdots + X_n - n\mu)/(\sigma\sqrt{n}) tends to a Gaussian distribution
with zero mean and unit variance.
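A small simulation illustrates the theorem even when the X_i are far from Gaussian. The sketch below (assuming Python with NumPy; n and the number of trials are arbitrary) builds Z_n from uniform random variables and checks that its mean and variance are close to 0 and 1; a histogram of the samples (not shown) closely follows the standard normal curve.

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 50, 100_000

    # X_i uniform on [0, 1]: mean mu = 1/2, variance sigma^2 = 1/12.
    mu, sigma = 0.5, (1 / 12) ** 0.5

    x = rng.random(size=(trials, n))
    z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # the variable Z_n

    print(z.mean(), z.var())   # close to 0 and 1, as the central limit theorem predicts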
The Gaussian distribution generalises to multivariate situations as fol-
lows:
p(x_1, \ldots, x_n) = \sqrt{\frac{\mathrm{Det}\, A}{\pi^n}}\; e^{-(x-\mu)^T A (x-\mu)}
where x and µ denote n-dimensional columns with entries (x1 , · · · , xn ) and
(µ1 , · · · , µn ) respectively and A is an n × n real symmetric matrix with pos-
itive eigenvalues.

6.3 Lorentzian distribution
Besides the ubiquitous Gaussian distribution, another continuous probability
distribution that one comes across in physics is the Lorentzian distribution:
p(x) = \frac{1}{\pi}\, \frac{\lambda}{(x-\mu)^2 + \lambda^2}

where µ is any real number and λ > 0.

6.4 Dirac delta function distribution

p(x) = δ(x − µ)

This may be viewed as a limiting case of the Gaussian (Lorentzian) distribu-
tion in the limit σ → 0 (λ → 0). The delta function δ(x) is zero everywhere
except at x = 0 where it has an infinite spike.
Some important properties of the Dirac delta function are
(i) \int_{-\infty}^{\infty} dx\, f(x)\, \delta(x) = f(0) \qquad
(ii) \delta(ax) = \frac{1}{|a|}\, \delta(x) \qquad
(iii) \delta(f(x)) = \sum_i \frac{\delta(x - x_i)}{|f'(x_i)|}

In (iii) the prime denotes differentiation with respect to x and xi ’s are the
zeros of f (x).
Exercise 5
Compute the integrals
(a) \int_{-\infty}^{\infty} dx\, e^{-\alpha x^2} \qquad
(b) \int_{-\infty}^{\infty} dx\, x^4\, \delta(x^2 - 4) \qquad
(c) \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\, e^{-(x_1^2 + x_2^2 + x_1 x_2)}

Exercise 6
Compute hXi and hX 2 i−hXi2 for the Poisson and the Gaussian distributions.
Exercise 7
Compute the generating function and the characteristic function for the Pois-
son and the Gaussian distributions respectively and hence deduce the quan-
tities in Ex. 6.

7 Intuitive notions of temperature, heat, thermal equilibrium
Temperature is a measure of ‘hotness’ or ‘coldness’. When a hot object is
brought in contact with a cold object, there is a net flow of ‘heat’ from the
hot object to the cold object. After some time the flow of heat stops and
we say that the two objects are in thermal equilibrium - they are at the
same temperature. Heat is thus to be thought of as the ‘energy in transit’.
Quantification of the notion of temperature culminating in the advent of the
Kelvin scale has a long and fascinating history as you might have learnt in
courses earlier. The notion of being in thermal equilibrium is transitive: If
A is in thermal equilibrium with B (i.e. there is no net flow of heat when
the two are brought in thermal contact with each other) and B is in thermal
equilibrium with C then A and C are also in thermal equilibrium with each
other. The zeroth law of thermodynamics expresses the transitivity of the
notion of being in thermal equilibrium.

8 Macrostates and Microstates


Historically the subject of thermodynamics began with the study of relations
between macroscopic quantities such as pressure, volume, internal energy,
temperature etc. of a gas which were taken to be the gross variables ade-
quate to describe the ‘state’ of the system at the chosen level of description.
A microscopic description of a mole of a gas, on the other hand, would require
specification of the positions and momenta of each of the roughly 10^{23} molecules. A
given macrostate may be consistent with a very large number of microstates.
As a typical example consider the toss of N coins. Specification of a mi-
crostate would correspond to specifying the state of the first coin, the second
coin etc.. A macrostate for this system would be, for instance, one in which
one specifies only the number n of coins with heads (and therefore N − n
coins with tails) without regard to which coin is in which state. The number of
microstates corresponding to this macrostate is clearly

\binom{N}{n}

which for N = 100 and n = 50 is of the order of 10^{29}, indeed a very large
number. As another example, consider a gas consisting of N particles and

let us assume that the total energy
E = \frac{1}{2m}\left(p_1^2 + \cdots + p_N^2\right)
is taken to specify the macrostates of the system. The number of microstates
with energies lying between E and E + dE will be given by the number of
points lying between two shells of spheres in a 6N dimensional space.
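The size of such numbers is easy to confirm directly; the one-line check below (plain Python) evaluates the binomial coefficient quoted above for N = 100, n = 50.

    from math import comb

    print(comb(100, 50))   # roughly 1.0e29 -- an enormous number of microstates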

9 Boltzmann probabilities
Consider two macroscopic systems A and B which are in thermal equilib-
rium with a reservoir (or a bath) at a temperature T. Let pA(EA) denote
the probability of finding the system A in a particular microstate cor-
responding to the energy EA. Let pB(EB) denote the corresponding quantity
for the system B.
Consider now the composite system made up of A and B. Let pA+B (EA+B )
denote the probability that the composite system is in a microstate corre-
sponding to the energy EA+B. Assuming that A and B are non-interacting,
so that EA+B = EA + EB, we have

pA+B (EA+B ) = pA+B (EA + EB )

The RHS denotes the probability that the composite system is in a microstate
corresponding to the energy EA + EB . In view of the independence of A and
B we have
pA+B (EA + EB ) = pA (EA )pB (EB )
Further, using the fact that the derivative of a function f(x + y) with respect
to x is the same as its derivative with respect to y, we get

p'_A(E_A)\, p_B(E_B) = p_A(E_A)\, p'_B(E_B)

or

\frac{p'_A(E_A)}{p_A(E_A)} = \frac{p'_B(E_B)}{p_B(E_B)} = -\beta
where β is a parameter independent of A or B and therefore depends only
on the properties of the reservoir, viz. its temperature.

This relation then gives the structure of the probabilities associated with
a microstate corresponding to a given energy:

pA (EA ) = CA e−βEA

With the identification β = 1/kB T, as will be done later, one is led to the
probabilities associated with microstates corresponding to energy E.
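For a system with a small discrete set of energy levels the recipe is straightforward to carry out numerically. The sketch below is only illustrative (Python with NumPy; the energy levels and temperature are made up, and units with k_B = 1 are used): it computes the Boltzmann probabilities, the partition function Z, the mean energy U and the entropy S = k_B(ln Z + βU) quoted in Exercise 4.

    import numpy as np

    k_B = 1.0                  # work in units where k_B = 1
    T = 2.0                    # illustrative temperature
    beta = 1.0 / (k_B * T)

    E = np.array([0.0, 1.0, 2.0, 5.0])    # made-up microstate energies
    weights = np.exp(-beta * E)
    Z = weights.sum()                      # partition function Z = sum_i exp(-beta E_i)
    p = weights / Z                        # Boltzmann probabilities p_i

    U = np.sum(p * E)                      # mean energy U = sum_i p_i E_i
    S = k_B * (np.log(Z) + beta * U)       # entropy, S = k_B (ln Z + beta U)
    print(p, U, S)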

10 Maxwell distribution of velocities


A straightforward application of the above result leads to the velocity and
the speed distribution function of molecules in a gas at a temperature T as
you would have learnt in the context of kinetic theory of gases:

The fraction of molecules with velocities between (v_x, v_y, v_z)
and (v_x + dv_x, v_y + dv_y, v_z + dv_z)

= \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m v^2 / 2 k_B T}\, dv_x\, dv_y\, dv_z

The fraction of molecules with speeds between v and v + dv

= \frac{4}{\sqrt{\pi}} \left(\frac{m}{2 k_B T}\right)^{3/2} v^2\, e^{-m v^2 / 2 k_B T}\, dv

From the speed distribution given above it easily follows that

\langle v \rangle = \sqrt{\frac{8 k_B T}{\pi m}}

\langle v^2 \rangle = \frac{3 k_B T}{m}

v_{\mathrm{rms}} = \sqrt{\langle v^2 \rangle} = \sqrt{\frac{3 k_B T}{m}}

\langle E \rangle = \frac{1}{2} m \langle v^2 \rangle = \frac{3}{2} k_B T
The last result is consistent with the equipartition theorem : mean energy
per ‘degree of freedom’ = (1/2)kB T .
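These averages can be checked by numerically integrating the speed distribution. The sketch below is a rough illustration (Python with NumPy; m, T and the integration grid are arbitrary choices, and units with k_B = 1 are used):

    import numpy as np

    m, k_B, T = 1.0, 1.0, 1.0                  # illustrative values, k_B = 1
    a = m / (2 * k_B * T)

    v = np.linspace(0.0, 20.0, 200_001)        # fine grid; f(v) is negligible beyond v = 20 here
    dv = v[1] - v[0]
    f = 4 / np.sqrt(np.pi) * a**1.5 * v**2 * np.exp(-a * v**2)   # speed distribution

    print(np.sum(f) * dv)                                              # normalization, close to 1
    print(np.sum(v * f) * dv, np.sqrt(8 * k_B * T / (np.pi * m)))      # <v> vs sqrt(8 k_B T / pi m)
    print(np.sqrt(np.sum(v**2 * f) * dv), np.sqrt(3 * k_B * T / m))    # v_rms vs sqrt(3 k_B T / m)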
