arXiv:physics/9805024v1 [physics.ed-ph] 18 May 1998
Statistical Mechanics in a Nutshell∗
Jochen Rau
Max-Planck-Institut für Physik komplexer Systeme
Nöthnitzer Straße 38, 01187 Dresden, Germany
April 2, 2007
Contents

1 Some Probability Theory
  1.1 Constrained distributions
  1.2 Concentration theorem
  1.3 Frequency estimation
  1.4 Hypothesis testing
  1.5 Jaynes’ analysis of Wolf’s die data
  1.6 Conclusion
2 Macroscopic Systems in Equilibrium
  2.1 Macrostate
  2.2 First law of thermodynamics
  2.3 Example: Ideal quantum gas
  2.4 Thermodynamic potentials
  2.5 Correlations
3 Linear Response
  3.1 Liouvillian and Evolution
  3.2 Kubo formula
  3.3 Example: Electrical conductivity
∗ Part I of a course on “Transport Theory” taught at Dresden University of Technology, Spring 1997. For more information see the course web site http://www.mpipks-dresden.mpg.de/~jochen/transport/intro.html
1 Some Probability Theory
1.1 Constrained distributions
A random experiment has $n$ possible results at each trial; so in $N$ trials there are $n^N$ conceivable outcomes. (We use the word “result” for a single trial, while “outcome” refers to the experiment as a whole; thus one outcome consists of an enumeration of $N$ results, including their order. For instance, ten tosses of a die ($n = 6$, $N = 10$) might have the outcome “1326642335.”) Each outcome yields a set of sample numbers $\{N_i\}$ and relative frequencies $\{f_i = N_i/N,\ i = 1 \ldots n\}$. In many situations the outcome of a random experiment is not known completely: One does not know the order in which the individual results occurred, and often one does not even know all $n$ relative frequencies $\{f_i\}$ but only a smaller number $m$ ($m < n$) of linearly independent constraints
\[
\sum_{i=1}^{n} G^i_a f_i = g_a , \qquad a = 1 \ldots m . \tag{1}
\]
As a simple example consider a loaded die. Observations on this badly balanced die have shown that 6 occurs twice as often as 1; nothing peculiar was observed for the other faces. Given this information only and nothing else, i.e., not making use of any additional information that we might get from inspection of the die or from past experience with dice in general, all we know is a single constraint of the form (1) with
\[
G^i_1 =
\begin{cases}
2 & : i = 1 \\
0 & : i = 2 \ldots 5 \\
-1 & : i = 6
\end{cases}
\tag{2}
\]
and $g_1 = 0$.
The available data –in the form of linear constraints– are generally not sufficient to reconstruct unambiguously the relative frequencies $\{f_i\}$. These frequencies may be regarded as Cartesian coordinates of a point in an $n$-dimensional vector space. The $m$ linear constraints, together with $f_i \in [0, 1]$ and the normalization condition $\sum_i f_i = 1$, then just restrict the allowed points to some portion of an $(n - m - 1)$-dimensional hyperplane.
1.2 Concentration theorem
Given an a priori probability distribution $\{p_i\}$ for the results $i = 1 \ldots n$, the probability that $N$ trials will yield the –generally different– relative frequencies $\{f_i\}$ is
\[
\mathrm{prob}(\{f_i\}\,|\,\{p_i\}, N) = \frac{N!}{N_1! \cdots N_n!}\, p_1^{N_1} \cdots p_n^{N_n} . \tag{3}
\]
Here the second factor is the probability for one specific outcome with sample numbers $\{N_i\}$, and the first factor counts the number of all outcomes that give rise to the same set of sample numbers. With the definition
\[
I_p(f) := -\sum_i f_i \ln \frac{f_i}{p_i} \tag{4}
\]
and the shorthand notations $f = \{f_i\}$, $p = \{p_i\}$ we can also write
\[
\mathrm{prob}(f\,|\,p, N) = \mathrm{prob}(f\,|\,f, N)\, \exp[N I_p(f)] . \tag{5}
\]
In particular, for two different data sets $\{f_i\}$ and $\{f'_i\}$ the ratio of their respective probabilities is given by
\[
\frac{\mathrm{prob}(f\,|\,p, N)}{\mathrm{prob}(f'\,|\,p, N)} = \frac{\mathrm{prob}(f\,|\,f, N)}{\mathrm{prob}(f'\,|\,f', N)}\, \exp[N(I_p(f) - I_p(f'))] \tag{6}
\]
where, by virtue of Stirling’s formula
\[
x! \approx \sqrt{2\pi x}\, x^x e^{-x} , \tag{7}
\]
it is asymptotically
\[
\frac{\mathrm{prob}(f\,|\,f, N)}{\mathrm{prob}(f'\,|\,f', N)} \approx \prod_i \sqrt{\frac{f'_i}{f_i}} . \tag{8}
\]
As the latter ratio is independent of $N$, for large $N$ and nearby distributions $f' \approx f$ the variation of $\mathrm{prob}(f\,|\,p, N)/\mathrm{prob}(f'\,|\,p, N)$ is completely dominated by the exponential:
\[
\frac{\mathrm{prob}(f\,|\,p, N)}{\mathrm{prob}(f'\,|\,p, N)} \approx \exp[N(I_p(f) - I_p(f'))] . \tag{9}
\]
Hence the probability with which any given frequency distribution $f$ is realized is essentially determined by the quantity $I_p(f)$: The larger this quantity, the more likely the frequency distribution is realized.
Consider now all frequency distributions allowed by $m$ linearly independent constraints. As we discussed earlier, the allowed distributions can be visualized as points in some portion of an $(n - m - 1)$-dimensional hyperplane. In this hyperplane portion there is a unique point at which the quantity $I_p(f)$ attains a maximum $I^{\max}_p$; we call this point the “maximal point” $f^{\max}$. (That the maximal point is indeed unique can be seen as follows: Suppose there were not one but two maximal points corresponding to frequency distributions $f^{(1)}$ and $f^{(2)}$. Then the mixture $\bar f = (f^{(1)} + f^{(2)})/2$ would have $I_p(\bar f) > I^{\max}_p$, which would be a contradiction.) It is possible to define new coordinates $\{x_1 \ldots x_{n-m-1}\}$ in the hyperplane such that
• they are linear functions of the $\{f_i\}$;
• the origin ($x = 0$) is at the maximal point; and
• in the vicinity of the maximal point
\[
I_p(x) = I^{\max}_p - a r^2 + O(r^3) , \qquad a > 0 , \tag{10}
\]
where
\[
r := \sqrt{\sum_{j=1}^{n-m-1} x_j^2} . \tag{11}
\]
Frequency distributions that satisfy the given constraints (1) and whose $I_p(x)$ differs from $I^{\max}_p$ by more than $\Delta I$ thus lie outside a hypersphere around the maximal point, the sphere’s radius $R$ being given by $aR^2 = \Delta I$. The probability that $N$ trials will yield such a frequency distribution outside the hypersphere is
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I)\,|\,m\ \text{constraints}) =
\frac{\int_R^\infty dr\, r^{n-m-2} \exp(-N a r^2)}{\int_0^\infty dr'\, r'^{\,n-m-2} \exp(-N a r'^{\,2})} . \tag{12}
\]
Here the factors $r^{n-m-2}$ in the integrand are due to the volume element, while the exponentials $\exp(-N a r^2) = \exp(N(I_p(x) - I^{\max}_p))$ stem from the ratio (9). Substituting $t = N a r^2$, defining
\[
s := (n - m - 3)/2 \tag{13}
\]
and using
\[
\Gamma(s + 1) = \int_0^\infty dt\, t^s \exp(-t) \tag{14}
\]
one may also write
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I)\,|\,m\ \text{constraints}) =
\frac{1}{\Gamma(s + 1)} \int_{N \Delta I}^\infty dt\, t^s \exp(-t) ; \tag{15}
\]
which for large $N$ ($N \gg s/\Delta I$) can be approximated by
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I)\,|\,m\ \text{constraints}) \approx
\frac{1}{\Gamma(s + 1)}\, (N \Delta I)^s \exp(-N \Delta I) . \tag{16}
\]
As the number $N$ of trials increases, this probability rapidly tends to zero for any finite $\Delta I$. As $N \to \infty$, therefore, it becomes virtually certain that the (aside from $m$ constraints) unknown frequency distribution has an $I_p$ very close to $I^{\max}_p$. Hence not only does the maximal point represent the frequency distribution that is the most likely to be realized (cf. Eq. (9)); but in addition, as $N$ increases, all other –theoretically allowed– frequency distributions become more and more concentrated near this maximal point. Any frequency distribution other than the maximal point becomes highly atypical of those allowed by the constraints.
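The tail probability (15) is, up to normalization, an upper incomplete gamma function, so it can be evaluated in one line. Here is a minimal numerical sketch (Python with SciPy assumed; the function name is our own, and the example values anticipate the die data of Section 1.5):

    # Evaluate the concentration-theorem tail probability (15)/(16):
    # prob(I_p < I_p^max - dI | m constraints) = Gamma(s+1, N*dI)/Gamma(s+1).
    from scipy.special import gammaincc

    def concentration_tail(n, m, N, delta_I):
        """Probability that N trials land outside the Delta_I hypersphere."""
        s = (n - m - 3) / 2.0                  # Eq. (13)
        return gammaincc(s + 1.0, N * delta_I)

    # a six-sided die, no constraints, 20000 tosses, Delta_I = 0.006769:
    print(concentration_tail(n=6, m=0, N=20000, delta_I=0.006769))  # ~ 1e-56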
1.3 Frequency estimation
We have seen that the knowledge of $m$ ($m < n$) “averages” (1) constrains, but fails to specify uniquely, the relative frequencies $\{f_i\}$. In view of this incomplete information the relative frequencies must be estimated. Our previous considerations suggest that the most reasonable estimate is the maximal point: that distribution which, while satisfying all the constraints, maximizes $I_p(f)$. This leads to a variational equation
\[
\delta \left[ \sum_i f_i \ln \frac{f_i}{p_i} + \eta \sum_i f_i + \sum_a \lambda_a \sum_i G^i_a f_i \right] = 0 \tag{17}
\]
where the constraints, as well as the normalization condition $\sum_i f_i = 1$, have been implemented by means of Lagrange multipliers. Its solution is of the form
\[
f^{\max}_i = \frac{1}{Z} \exp\left[ \ln p_i - \langle \ln p \rangle_p - \sum_a \lambda_a G^i_a \right] \tag{18}
\]
with
\[
Z = \sum_i \exp\left[ \ln p_i - \langle \ln p \rangle_p - \sum_a \lambda_a G^i_a \right] . \tag{19}
\]
The term
\[
\langle \ln p \rangle_p := \sum_j p_j \ln p_j \tag{20}
\]
has been introduced by convention; it cancels from the ratio in (18) and so does not affect the frequency estimate. The expression in the exponent simplifies if and only if the a priori distribution $\{p_i\}$ is uniform: In this case,
\[
\ln p_i - \langle \ln p \rangle_p = 0 . \tag{21}
\]
The $m$ Lagrange parameters $\{\lambda_a\}$ must be adjusted such as to yield the correct prescribed averages $\{g_a\}$. They can be determined from
\[
\frac{\partial}{\partial \lambda_a} \ln Z = -g_a , \tag{22}
\]
a set of $m$ simultaneous equations for $m$ unknowns. Finally, inserting (18) into the definition of $I_p(f)$ gives
\[
I^{\max}_p = \langle \ln p \rangle_p + \ln Z + \sum_a \lambda_a g_a . \tag{23}
\]
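As an illustration of how Eqs. (18), (19) and (22) work together in practice, here is a minimal numerical sketch (Python with NumPy and SciPy assumed; the helper name maxent_estimate is our own) that determines the Lagrange parameters with a generic root finder:

    # Maximum-entropy frequency estimate: given a priori probabilities p_i,
    # constraint coefficients G[a][i] and prescribed averages g[a], solve
    # d(ln Z)/d(lambda_a) = -g_a, Eq. (22), and return the estimate (18).
    import numpy as np
    from scipy.optimize import root

    def maxent_estimate(p, G, g):
        p, G, g = np.asarray(p, float), np.atleast_2d(G), np.asarray(g, float)

        def estimate(lam):
            # exponent of Eq. (18); the constant <ln p>_p cancels on normalizing
            w = p * np.exp(-G.T @ lam)   # w_i ~ p_i exp(-sum_a lambda_a G_a^i)
            return w / w.sum()           # division by w.sum() plays the role of 1/Z

        def residual(lam):
            # Eq. (22) is equivalent to <G_a> = g_a under the estimate
            return estimate(lam) @ G.T - g

        lam = root(residual, x0=np.zeros(len(g))).x
        return estimate(lam), lam

The same routine applies unchanged to the iterated estimates discussed below: one simply supplies the previous estimate as the new prior $p$.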
There remains the task of specifying the –possibly nonuniform– a priori probability distribution $\{p_i\}$. The $\{p_i\}$ are those probabilities one would assign before having asserted the existence of the constraints (1); i.e., being still in a state of ignorance. This “ignorance distribution” can usually be determined on the basis of symmetry considerations: If the problem at hand is a priori invariant under some characteristic group then the $\{p_i\}$, too, must exhibit this same group invariance.¹ For example, if a priori we do not know anything about the properties of a given die then our prior ignorance extends to all faces equally. The problem is therefore invariant under a relabelling of the faces, which trivially implies $\{p_i = 1/6\}$. In more complicated random experiments, especially those involving continuous and hence coordinate-dependent distributions, the task of specifying the a priori distribution may be less straightforward.²
For illustration let us return to the example of the loaded die, characterized solely by the single constraint (2). What estimates should we make of the relative frequencies $\{f_i\}$ with which the different faces appeared? Taking the a priori probability distribution –assigned to the various faces before one has asserted the die’s imperfection– to be uniform, $\{p_i = 1/6\}$, the best estimate (18) for the frequency distribution reads
\[
f^{\max}_i =
\begin{cases}
Z^{-1} \exp(-2\lambda_1) & : i = 1 \\
Z^{-1} & : i = 2 \ldots 5 \\
Z^{-1} \exp(\lambda_1) & : i = 6
\end{cases}
\tag{24}
\]
with only a single Lagrange parameter $\lambda_1$ and
\[
Z = \exp(-2\lambda_1) + 4 + \exp(\lambda_1) . \tag{25}
\]
The Lagrange parameter is readily determined from
\[
\frac{\partial}{\partial \lambda_1} \ln Z = -g_1 = 0 , \tag{26}
\]
with solution
\[
\lambda_1 = (\ln 2)/3 . \tag{27}
\]
This in turn gives the numerical estimates
\[
f^{\max}_i =
\begin{cases}
0.107 & : i = 1 \\
0.170 & : i = 2 \ldots 5 \\
0.214 & : i = 6
\end{cases}
\tag{28}
\]
with an associated
\[
I^{\max}_p = \ln(1/6) + \ln Z = -0.019 . \tag{29}
\]
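These numbers are easily verified; a short check (reusing the hypothetical maxent_estimate helper sketched above, with NumPy assumed):

    import numpy as np

    # constraint (2): 2*f_1 - f_6 = 0, i.e. face 6 twice as frequent as face 1
    p = np.full(6, 1.0 / 6.0)
    f, lam = maxent_estimate(p, G=[[2, 0, 0, 0, 0, -1]], g=[0.0])

    print(np.round(f, 3))              # [0.107 0.17 0.17 0.17 0.17 0.214], Eq. (28)
    print(lam[0], np.log(2) / 3)       # both ~ 0.2310, Eq. (27)
    print(-(f * np.log(f / p)).sum())  # ~ -0.019, the I_p^max of Eq. (29)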
¹ The rationale underlying this consistency requirement has historically been called the “Principle of Insufficient Reason” (J. Bernoulli, Ars Conjectandi, 1713).
² See for example E. T. Jaynes, Prior probabilities, IEEE Trans. Systems Sci. Cyb. 4, 227 (1968).
The above algorithm for estimating frequencies can be iterated. Suppose that beyond the $m$ constraints (1) we learn of $l$ additional, linearly independent constraints
\[
\sum_{i=1}^{n} G^i_a f_i = g_a , \qquad a = (m + 1) \ldots (m + l) . \tag{30}
\]
In order to make an improved estimate that takes these additional data into account we can either, (i) starting from the same a priori distribution $p$ as before, apply the algorithm to the total set of $(m + l)$ constraints; or (ii) iterate: use the previous estimate (18), which was based on the first $m$ constraints only, as a new a priori distribution $f^{\max} \to p'$, and then repeat the algorithm just for the $l$ additional constraints. Both procedures give the same improved estimate $f^{\max\prime}$. Associated with this improved estimate is
\[
I^{\max}_{p'} = I^{\max}_p + I_{f^{\max}}(f^{\max\prime}) . \tag{31}
\]
1.4 Hypothesis testing
Now we consider random experiments for which complete frequency data are available. Suppose that, based on some insight we have into the systematic influences affecting the experiment, we conjecture that the observed relative frequencies can be fully characterized by a set of constraints of the –by now familiar– form (1), and that hence the observed relative frequencies can be fitted with a maximal distribution (18). This maximal distribution contains $m$ fit parameters $\{\lambda_a\}$ (the Lagrange parameters) whose specific values depend on the averages $\{g_a\}$, which in turn are extracted from the data. It represents our theoretical model or hypothesis.

In general, the experimental frequencies $f$ and the theoretical fit $f^{\max}$ do not agree exactly. Must the hypothesis therefore be rejected, or is the deviation merely a statistical fluctuation? The answer is furnished by the concentration theorem: Let $N$ be the number of trials performed to establish the experimental distribution, let
\[
\Delta I = I^{\max}_p - I_p(f) \tag{32}
\]
and $s = (n - m - 3)/2$. For large $N$ ($N \gg s/\Delta I$) the probability that statistical fluctuations alone yield an $I_p$ difference as large as $\Delta I$ is given by (16); typically the hypothesis is rejected whenever this probability is below 5%,³
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I)\,|\,m\ \text{constraints}) < 5\% . \tag{33}
\]
Rejecting a hypothesis means that the chosen set of constraints was not complete, and hence that important systematic effects have been overlooked. These must be incorporated in the form of additional constraints. In this fashion one can proceed iteratively from simple to ever more sophisticated models until the deviation of the fit from the experimental data ceases to be statistically significant.
³ The hypothesis test presented here is closely related to the better-known χ² test.
 i      f_i        ∆_i
 1    0.16230   −0.00437
 2    0.17245   +0.00578
 3    0.14485   −0.02182
 4    0.14205   −0.02464
 5    0.18175   +0.01508
 6    0.19660   +0.02993

Table 1: Wolf’s die data: frequency distribution f and its deviation ∆ from the uniform distribution.
1.5 Jaynes’ analysis of Wolf’s die data
The above prescription for testing hypotheses and –if rejected– for iteratively improving them by enlarging the set of constraints has been lucidly illustrated by E. T. Jaynes in his analysis of Wolf’s die data.⁴ Rudolph Wolf (1816–1893), a Swiss astronomer, had performed a number of random experiments, presumably to check the validity of statistical theory. In one of these experiments a die (actually two dice, but only one of them is of interest here) was tossed 20,000 times in a way that precluded any systematic favoring of any face over any other. The observed relative frequencies $\{f_i\}$ and their deviations $\{\Delta_i = f_i - p_i\}$ from the a priori probabilities $\{p_i = 1/6\}$ are given in Table 1. Associated with the observed distribution is
\[
I_p(f) = -0.006769 . \tag{34}
\]
Our “null hypothesis” H0 is that the die is ideal and hence that there are no constraints needed to characterize any imperfection ($m = 0$); the deviation of the experimental from the uniform distribution, with associated
\[
I^{\max(H0)}_p = I_p(p) = 0 , \tag{35}
\]
is merely a statistical fluctuation. However, the probability that statistical fluctuations alone yield an $I_p$ difference as large as
\[
\Delta I_{H0} = I^{\max(H0)}_p - I_p(f) = 0.006769 \tag{36}
\]
is practically zero: Using Eq. (16) with $N = 20{,}000$ and $s = 3/2$ we find
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I_{H0})\,|\,0\ \text{constraints}) \sim 10^{-56} . \tag{37}
\]
⁴ E. T. Jaynes, Concentration of distributions at entropy maxima, in: E. T. Jaynes, Papers on Probability, Statistics and Statistical Mechanics, ed. by R. D. Rosenkrantz, Kluwer Academic, Dordrecht (1989).
Therefore, the null hypothesis is rejected: The die cannot be perfect.

Our analysis need not stop here. Not knowing the mechanical details of the die we can still formulate and test hypotheses as to the nature of its imperfections. Jaynes argued that the two most likely imperfections are:
• a shift of the center of gravity due to the mass of ivory excavated from the spots, which, being proportional to the number of spots on any side, should make the “observable”
\[
G^i_1 = i - 3.5 \tag{38}
\]
have a nonzero average $g_1 \neq 0$; and
• errors in trying to machine a perfect cube, which will tend to make one dimension (the last side cut) slightly different from the other two. It is clear from the data that Wolf’s die gave a lower frequency for the faces (3,4); and therefore that the (3-4) dimension was greater than the (1-6) or (2-5) ones. The effect of this is that the “observable”
\[
G^i_2 =
\begin{cases}
1 & : i = 1, 2, 5, 6 \\
-2 & : i = 3, 4
\end{cases}
\tag{39}
\]
has a nonzero average $g_2 \neq 0$.
Our hypothesis H2 is that these are the only two imperfections present. More specifically, we conjecture that the observed relative frequencies are characterized by just two constraints ($m = 2$) imposed by the measured averages
\[
g_1 = 0.0983 \quad \text{and} \quad g_2 = 0.1393 ; \tag{40}
\]
and that hence the observed relative frequencies can be fitted with a maximal distribution
\[
f^{\max(H2)}_i = \frac{1}{Z} \exp\left[ -\sum_{a=1}^{2} \lambda_a G^i_a \right] . \tag{41}
\]
In order to test our hypothesis we determine
\[
Z = \sum_{i=1}^{6} \exp\left[ -\sum_{a=1}^{2} \lambda_a G^i_a \right] , \tag{42}
\]
fix the Lagrange parameters by requiring
\[
\frac{\partial}{\partial \lambda_a} \ln Z = -g_a \tag{43}
\]
and then calculate
\[
I^{\max(H2)}_p = \ln(1/6) + \ln Z + \sum_{a=1}^{2} \lambda_a g_a . \tag{44}
\]
With this algorithm Jaynes found
\[
I^{\max(H2)}_p = -0.006534 \tag{45}
\]
and thus
\[
\Delta I_{H2} = I^{\max(H2)}_p - I_p(f) = 0.000235 . \tag{46}
\]
The probability for such an $I_p$ difference to occur as a result of statistical fluctuations is (with now $s = 1/2$)
\[
\mathrm{prob}(I_p < (I^{\max}_p - \Delta I_{H2})\,|\,2\ \text{constraints}) \approx 2.5\% , \tag{47}
\]
much larger than the previous $10^{-56}$ but still below the usual acceptance bound of 5%. The more sophisticated model H2 is therefore a major improvement over the null hypothesis H0 and captures the principal features of Wolf’s die; yet there are indications that an additional very tiny imperfection may have been present.
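The entire analysis fits in a few lines; the following sketch (Python with NumPy and SciPy assumed; all helper names are ours) reproduces the H2 fit and the significance test:

    import numpy as np
    from scipy.optimize import root
    from scipy.special import gammaincc

    f = np.array([0.16230, 0.17245, 0.14485, 0.14205, 0.18175, 0.19660])
    p = np.full(6, 1.0 / 6.0)
    I_p = lambda q: -(q * np.log(q / p)).sum()     # Eq. (4)

    G = np.array([np.arange(1, 7) - 3.5,           # observable (38)
                  [1, 1, -2, -2, 1, 1]], float)    # observable (39)
    g = G @ f                                      # measured averages, Eq. (40)

    def fit(lam):                                  # maximal distribution (41)
        w = np.exp(-G.T @ lam)
        return w / w.sum()

    lam = root(lambda l: G @ fit(l) - g, x0=[0.0, 0.0]).x
    dI = I_p(fit(lam)) - I_p(f)                    # Delta I_H2 of Eq. (46)

    print(I_p(f), I_p(fit(lam)), dI)   # ~ -0.006769, -0.006534, 0.000235
    print(gammaincc(1.5, 20000 * dI))  # ~ 0.025, cf. Eq. (47) (s = 1/2)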
Jaynes’ analysis of Wolf’s die data furnishes a useful paradigm for the experimental method in general. All modern experiments at particle colliders (CERN, Desy, Fermilab. . . ), for example, yield data in the form of frequency distributions over discrete “bins” in momentum space, for each of the various end products of the collision. The search for interesting signals in the data (new particles, new interactions, etc.) essentially proceeds in the same manner in which Jaynes revealed the imperfections of Wolf’s die: by formulating physically motivated hypotheses and testing them against the data. Such a test is always statistical in nature. Conclusions (say, about the presence of a top quark, or about the presence of a certain imperfection of Wolf’s die) can never be drawn with absolute certainty but only at some –quantifiable– confidence level.
1.6 Conclusion
In all our considerations a crucial role has been played by the quantity $I_p$: The algorithm that yields the best estimate for an unknown frequency distribution is based on the maximization of $I_p$; and hypotheses can be tested with the help of Eq. (16), i.e., by simply comparing the experimental and theoretical values of $I_p$. We shall soon encounter the quantity $I_p$ again and see how it is related to one of the most fundamental concepts in statistical mechanics: the “entropy.”
2 Macroscopic Systems in Equilibrium
2.1 Macrostate
For complex systems with many degrees of freedom (like a gas, fluid or plasma) the exact microstate is usually not known. It is therefore impossible to assign to the system a unique point in phase space (classical) or a unique wave function (quantal), respectively. Instead one must resort to a statistical description: The system is described by a classical phase space distribution $\rho(\pi)$ or an incoherent mixture
\[
\hat\rho = \sum_i f_i\, |i\rangle\langle i| \tag{48}
\]
of mutually orthogonal quantum microstates $\{|i\rangle\}$, respectively. (Where the distinction between classical and quantal does not matter we shall use the generic symbol $\rho$.) Probabilities must be real, nonnegative, and normalized to one; which implies the respective properties
\[
\rho(\pi)^* = \rho(\pi) , \quad \rho(\pi) \geq 0 , \quad \int d\pi\, \rho(\pi) = 1 \tag{49}
\]
or
\[
\hat\rho^\dagger = \hat\rho , \quad \hat\rho \geq 0 , \quad \mathrm{tr}\,\hat\rho = 1 . \tag{50}
\]
In this statistical description every observable $A$ (real phase space function or Hermitian operator, respectively) is assigned an expectation value
\[
\langle A \rangle_\rho = \int d\pi\, \rho(\pi) A(\pi) \tag{51}
\]
or
\[
\langle A \rangle_\rho = \mathrm{tr}(\hat\rho \hat A) , \tag{52}
\]
respectively.
Typically, not even the distribution $\rho$ is a priori known. Rather, the state of a complex physical system is characterized by very few macroscopic data. These data may come in different forms:
• as data given with certainty, such as the type of particles that make up the system, or the shape and volume of the box in which they are enclosed. These exact data we take into account through the definition of the phase space or Hilbert space in which we are working;
• as prescribed expectation values
\[
\langle G_a \rangle_\rho = g_a , \qquad a = 1 \ldots m \tag{53}
\]
of some set $\{G_a\}$ of selected macroscopic observables. Examples might be the average total energy, average angular momentum, or average magnetization. Such data, which are of a statistical nature, impose constraints of the type (1) on the distribution $\rho$; or
• as additional control parameters on which the selected observables $\{G_a\}$ may explicitly depend, such as an external electric or magnetic field.
According to our general considerations in Section 1.3 the best estimate for the thus characterized macrostate is a distribution of the form (18). In the classical case this implies
\[
\rho(\pi) = \frac{1}{Z} \exp\left[ \ln\sigma(\pi) - \langle \ln\sigma \rangle_\sigma - \sum_a \lambda_a G_a(\pi) \right] \tag{54}
\]
with
\[
Z = \int d\pi\, \exp\left[ \ln\sigma(\pi) - \langle \ln\sigma \rangle_\sigma - \sum_a \lambda_a G_a(\pi) \right] ; \tag{55}
\]
while for a quantum system
\[
\hat\rho = \frac{1}{Z} \exp\left[ \ln\hat\sigma - \langle \ln\sigma \rangle_\sigma - \sum_a \lambda_a \hat G_a \right] \tag{56}
\]
and
\[
Z = \mathrm{tr}\, \exp\left[ \ln\hat\sigma - \langle \ln\sigma \rangle_\sigma - \sum_a \lambda_a \hat G_a \right] . \tag{57}
\]
In both cases $\sigma$ denotes the a priori distribution. The auxiliary quantity $Z$ is referred to as the partition function.⁵

⁵ Readers already familiar with statistical mechanics might be disturbed by the appearance of $\sigma$ in the definitions of $\rho$ and $Z$. Yet this is essential for a consistent formulation of the theory: see, for instance, our remarks at the end of Section 1.3 on the possibility of iterating the frequency estimation algorithm. In most practical applications $\sigma$ is uniform and hence $\ln\sigma - \langle \ln\sigma \rangle_\sigma = 0$. Our definitions of $\rho$ and $Z$ then reduce to the conventional expressions.
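For a uniform prior $\sigma$ and a single constrained observable, the Hamiltonian, Eq. (56) reduces to the familiar canonical density operator. A minimal numerical sketch (Python with NumPy and SciPy assumed; the two-level Hamiltonian and the value of $\beta$ are illustrative):

    # Canonical macrostate (56) with uniform sigma: rho = exp(-beta*H)/Z.
    import numpy as np
    from scipy.linalg import expm

    H = np.diag([0.0, 1.0])       # toy two-level Hamiltonian
    beta = 2.0                    # Lagrange multiplier conjugate to <H>

    rho = expm(-beta * H)
    Z = np.trace(rho)             # partition function, Eq. (57)
    rho /= Z

    print(np.trace(rho @ H))      # the prescribed average g = <H>_rho
    print(np.trace(rho))          # normalization check: 1.0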
The phase space integral or trace in the respective expressions for $Z$ depend on the specific choice of the phase space or Hilbert space; hence they may depend on parameters like the volume or particle number. Furthermore, there may be an explicit dependence of the observables $\{G_a\}$ or of the a priori distribution $\sigma$ on additional control parameters. Therefore, the partition function generally depends not just on the Lagrange multipliers $\{\lambda_a\}$ but also on some other parameters $\{h_b\}$. In analogy with the relation (22) one then defines new variables
\[
\gamma_b := \frac{\partial}{\partial h_b} \ln Z . \tag{58}
\]
(In contrast to (22) there is no minus sign.) The $\{g_a\}$, $\{\lambda_a\}$, $\{h_b\}$ and $\{\gamma_b\}$ are called the thermodynamic variables of the system; together they specify the system’s macrostate. The thermodynamic variables are not all independent: Rather, they are related by (22) and (58), that is, via partial derivatives of $\ln Z$. One says that $h_b$ and $\gamma_b$, or $g_a$ and $\lambda_a$, are conjugate to each other.
Some combinations of thermodynamic variables are of particular importance, which is why the associated distributions go by special names. If the observables that characterize the macrostate –in the form of sharp values given with certainty, or in the form of expectation values– are all constants of the motion then the system is said to be in equilibrium. Associated is an equilibrium distribution of the form (54) or (56), with all $\{G_a\}$ being constants of the motion. Such an equilibrium distribution is itself constant in time, and so are all expectation values calculated from it.⁶ The set of constants of the motion always includes the Hamiltonian (Hamilton function or Hamilton operator, respectively) provided it is not explicitly time-dependent. If its value for a specific system, the internal energy, and the other macroscopic data are all given with certainty then the resulting equilibrium distribution is called microcanonical; if just the energy is given on average, while all other data are given with certainty, canonical; and if both energy and total particle number are given on average, while all other data are given with certainty, grand canonical.

⁶ Here we have assumed that there is no time-dependence of the a priori distribution $\sigma$.
Strictly speaking, every description of the macrostate in terms of thermodynamic variables represents a hypothesis: namely, the hypothesis that the sets $\{G_a\}$ and $\{h_b\}$ are actually complete. This is analogous to Jaynes’ model for Wolf’s die, which assumes that just two imperfections (associated with two observables $G_1$, $G_2$) suffice to characterize the experimental data. Such a hypothesis may well be rejected by experiment. If so, this does not mean that our rationale for constructing $\rho$ –maximizing $I_\sigma$ under given constraints– was wrong. Rather, it means that important macroscopic observables or control parameters (such as “hidden” constants of the motion, or further imperfections of Wolf’s die) have been overlooked, and that the correct description of the macrostate requires additional thermodynamic variables.
2.2 First law of thermodynamics
Changing the values of the thermodynamic variables alters the distribution $\rho$ and with it the associated
\[
I^{\max}_\sigma \equiv I_\sigma(\rho) = \langle \ln\sigma \rangle_\sigma + \ln Z + \sum_a \lambda_a g_a . \tag{59}
\]
By virtue of Eqs. (22) and (58) its infinitesimal variation is given by
\[
dI^{\max}_\sigma = d\langle \ln\sigma \rangle_\sigma + \sum_a \lambda_a\, dg_a + \sum_b \gamma_b\, dh_b . \tag{60}
\]
As the set of constants of the motion always contains the Hamiltonian, its value for the given system, the internal energy $U$, and the associated conjugate parameter, which we denote by $\beta$, play a particularly important role. Depending on whether the energy is given with certainty or on average, the pair $(U, \beta)$ corresponds to a pair $(h, \gamma)$ or $(g, \lambda)$. For all remaining variables one then defines new conjugate parameters
\[
l_a := \lambda_a/\beta , \qquad m_a := \gamma_a/\beta \tag{61}
\]
such that in terms of these new parameters the energy differential reads
\[
dU = \beta^{-1}\, d(I^{\max}_\sigma - \langle \ln\sigma \rangle_\sigma) - \sum_a l_a\, dg_a - \sum_b m_b\, dh_b . \tag{62}
\]
A change in internal energy that is effected solely by a variation of the parameters $\{g_a\}$ or $\{h_b\}$ is defined as work
\[
\delta W := -\sum_a l_a\, dg_a - \sum_b m_b\, dh_b ; \tag{63}
\]
some commonly used pairs $(g, l)$ and $(h, m)$ of thermodynamic variables are listed in Table 2. If, on the other hand, these parameters are held fixed ($dg_a = dh_b = 0$) then the internal energy can still change through the addition or subtraction of heat
\[
\delta Q := \frac{1}{k\beta}\, k\, d(I^{\max}_\sigma - \langle \ln\sigma \rangle_\sigma) . \tag{64}
\]
Here we have introduced an arbitrary constant $k$. Provided we choose this constant to be the Boltzmann constant
\[
k = 1.381 \times 10^{-23}\ \mathrm{J/K} , \tag{65}
\]
we can identify the temperature
\[
T := \frac{1}{k\beta} \tag{66}
\]
and the entropy
\[
S := k\, (I^{\max}_\sigma - \langle \ln\sigma \rangle_\sigma) \tag{67}
\]
to write $\delta Q$ in the more familiar form
\[
\delta Q = T\, dS . \tag{68}
\]
The entropy is related to the other thermodynamic variables via Eq. (59), i.e.,⁷
\[
S = k \ln Z + k \sum_a \lambda_a g_a . \tag{69}
\]
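For the two-level sketch of Section 2.1 the identity (69) can be checked directly: with a uniform prior, $S/k$ must coincide with the mixture entropy $-\sum_i f_i \ln f_i$. A minimal check (Python with NumPy assumed, units with $k = 1$):

    import numpy as np

    E, beta = np.array([0.0, 1.0]), 2.0
    w = np.exp(-beta * E)
    Z, f = w.sum(), w / w.sum()

    S_69 = np.log(Z) + beta * (f @ E)   # S = k ln Z + k sum_a lambda_a g_a
    S_mix = -(f * np.log(f)).sum()      # mixture entropy for uniform sigma
    print(S_69, S_mix)                  # identical up to rounding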
The relation
\[
dU = \delta Q + \delta W , \tag{70}
\]
which reflects nothing but energy conservation, is known as the first law of thermodynamics.
⁷ Even though the entropy, like the partition function, is related to measurable quantities, it is essentially an auxiliary concept and does not itself constitute a physical observable: In quantum mechanics, for example, there is nothing like a Hermitian “entropy operator.”
 (g, l)      (h, m)     names
             (V, p)     volume, pressure
 (N, −µ)     (N, −µ)    particle number, chemical potential
 (M, −B)     (B, M)     magnetic induction, magnetization
 (P, −E)     (E, P)     electric field, electric polarization
 (p, −v)                momentum, velocity
 (L, −ω)                angular momentum, angular velocity

Table 2: Some commonly used pairs of thermodynamic variables. In cases where two pairs are given, e.g., (M, −B) and (B, M), the proper choice depends on the specific situation: For example, the pair (M, −B) is adequate if the magnetization M is a constant of the motion whose value is given on average; while the pair (B, M) should be used if there is an externally applied magnetic field B which plays the role of a control parameter.
2.3 Example: Ideal quantum gas
We consider a gas of noninteracting bosons or fermions. We suppose that the total particle number is not given with certainty (but possibly on average, as in the grand canonical ensemble) so the system must be described in Fock space. We further suppose that the observables $\{\hat G_a\}$ whose expectation values are furnished as macroscopic data are all of the single-particle form
\[
\hat G_a = \sum_i G^i_a \hat N_i , \tag{71}
\]
where the $\{G^i_a\}$ are arbitrary (c-number) coefficients and the $\{\hat N_i\}$ denote number operators pertaining to some orthonormal basis $\{|i\rangle\}$ of single-particle states. Provided the a priori distribution $\sigma$ is uniform, the best estimate for the macrostate has the form
\[
\hat\rho = \frac{1}{Z} \exp\left[ -\sum_i \alpha_i \hat N_i \right] \tag{72}
\]
with
\[
\alpha_i = \sum_a \lambda_a G^i_a . \tag{73}
\]
For example, in the grand canonical ensemble (energy and total particle number given on average) the parameters $\{\alpha_i\}$ are functions of the single-particle energies $\{\epsilon_i\}$, the inverse temperature $\beta$ and the chemical potential $\mu$:
\[
\alpha_i = \beta(\epsilon_i - \mu) . \tag{74}
\]
The partition function
\[
Z = \mathrm{tr}\, \exp\left[ -\sum_i \alpha_i \hat N_i \right]
  = \sum_{\text{configurations}\ \{N_1, N_2, \ldots\}} \prod_i e^{-\alpha_i N_i} \tag{75}
\]
factorizes, for we work in Fock space where we sum freely over each $N_i$:
\[
Z = \prod_i \sum_{N_i} e^{-\alpha_i N_i} =: \prod_i Z_i . \tag{76}
\]
The sum over $N_i$ extends from 0 to the maximum value allowed by particle statistics: $\infty$ for bosons, 1 for fermions. Consequently, each factor $Z_i$ reads
\[
Z_i = \left( 1 \mp e^{-\alpha_i} \right)^{\mp 1} , \tag{77}
\]
the upper sign pertaining to bosons and the lower sign to fermions. This gives
\[
\ln Z = \mp \sum_i \ln\left( 1 \mp e^{-\alpha_i} \right) \tag{78}
\]
and hence the average occupation
\[
n_i \equiv \langle N_i \rangle_\rho = -\frac{\partial}{\partial \alpha_i} \ln Z
  = \left( e^{\alpha_i} \mp 1 \right)^{-1} \tag{79}
\]
of any single-particle state $i$. Using the inverse relation
\[
\alpha_i = \ln(1 \pm n_i) - \ln n_i \tag{80}
\]
together with the specific realization of Eq. (69),
\[
S = k \ln Z + k \sum_i \alpha_i n_i , \tag{81}
\]
we find for the entropy
\[
S = -k \sum_i \left[ n_i \ln n_i \mp (1 \pm n_i) \ln(1 \pm n_i) \right] . \tag{82}
\]
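The occupations (79) and the entropy (82) are straightforward to tabulate; a minimal numerical sketch (Python with NumPy assumed; the single-particle spectrum and the values of β and µ are illustrative, with k = 1):

    # Grand-canonical occupations (79) and entropy (82) of an ideal quantum gas.
    import numpy as np

    def occupations(eps, beta, mu, bose=True):
        alpha = beta * (np.asarray(eps) - mu)                    # Eq. (74)
        return 1.0 / (np.exp(alpha) - (1.0 if bose else -1.0))   # Eq. (79)

    def entropy(n, bose=True):
        s = 1.0 if bose else -1.0    # upper/lower sign in Eq. (82)
        return -np.sum(n * np.log(n) - s * (1 + s * n) * np.log(1 + s * n))

    eps = np.linspace(0.1, 5.0, 50)  # toy single-particle energies
    n_B = occupations(eps, beta=1.0, mu=-0.5, bose=True)
    n_F = occupations(eps, beta=1.0, mu=2.0, bose=False)
    print(entropy(n_B), entropy(n_F, bose=False))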
2.4 Thermodynamic potentials
Like the partition function, thermodynamic potentials are auxiliary quantities used to facilitate calculations. One example is the (generalized) grand potential
\[
\Omega(T, l_a, h_b) := -\frac{1}{\beta} \ln Z , \tag{83}
\]
related to the internal energy $U$ via
\[
\Omega = U - TS + \sum_a l_a g_a . \tag{84}
\]
Its differential
\[
d\Omega = -S\, dT + \sum_a g_a\, dl_a - \sum_b m_b\, dh_b \tag{85}
\]
shows that $S$, $g_a$ and $m_b$ can be obtained from the grand potential by partial differentiation; e.g.,
\[
S = -\left( \frac{\partial \Omega}{\partial T} \right)_{l_a, h_b} , \tag{86}
\]
where the subscript means that the partial derivative is to be taken at fixed $l_a$, $h_b$.
In addition to the grand potential there are many other thermodynamic potentials: Their definition and properties are best summarized in a Born diagram (Fig. 1). In a given physical situation it is most convenient to work with that potential which depends on the variables being controlled or measured in the experiment. For example, if a chemical reaction takes place at constant temperature and pressure (controlled variables $T$, $\{m_b\} = \{p\}$), and the observables of interest are the particle numbers of the various reactants (measured variables $\{g_a\} = \{N_i\}$) then the reaction is most conveniently described by the free enthalpy $G(T, N_i, p)$.
When a large system is physically divided into several subsystems then in these subsystems the thermodynamic variables generally take values that differ from those of the total system. In the special case of a homogeneous system all variables of interest can be classified either as extensive –varying proportionally to the volume of the respective subsystem– or intensive –remaining invariant under the subdivision of the system. Examples for the former are the volume itself, the internal energy or the number of particles; whereas amongst the latter are the pressure, the temperature or the chemical potential. In general, if a thermodynamic variable is extensive then its conjugate is intensive, and vice versa. If we assume that the temperature and the $\{l_a\}$ are intensive, while the $\{h_b\}$ and the grand potential are extensive, then
\[
\Omega_{\mathrm{hom}}(T, l_a, \tau h_b) = \tau\, \Omega_{\mathrm{hom}}(T, l_a, h_b) \quad \forall\ \tau > 0 \tag{87}
\]
and hence
\[
\Omega_{\mathrm{hom}} = -\sum_b m_b h_b . \tag{88}
\]
This implies the Gibbs-Duhem relation
\[
S\, dT - \sum_a g_a\, dl_a - \sum_b h_b\, dm_b = 0 . \tag{89}
\]
For an ideal gas in the grand canonical ensemble, for instance, we have the temperature $T$ and the chemical potential $\{l_a\} = \{-\mu\}$ intensive, whereas the volume $\{h_b\} = \{V\}$ and the grand potential $\Omega$ are extensive; hence
\[
\Omega_{\mathrm{i.gas}}(T, \mu, V) = -p(T, \mu)\, V . \tag{90}
\]
[Figure 1: the Born diagram, a cube whose corners carry the thermodynamic potentials Ω, F, U, H, G, Ξ (= 0 for a homogeneous system), χ₁ and χ₂, and whose sides carry the variables T, S, g, l, h and m; see the caption below.]
Figure 1: Born diagram. Corners correspond to thermodynamic potentials: the grand potential Ω, the free energy F, the internal energy U, the enthalpy H, the free enthalpy G, the potential Ξ (which vanishes for a homogeneous system), as well as two rarely used potentials χ₁ and χ₂. Sides of the cube correspond to thermodynamic variables: T, S, g, l, h and m. Opposite sides are conjugate to each other, and associated with each conjugate pair is a dotted “basis vector.” Each corner is a function of the adjacent sides; e.g., the enthalpy H is a function of {S, g, m}. Their conjugates {T, l, h} can be obtained from H by partial differentiation, the sign depending on whether the requested conjugate variable is at the head (−) or tail (+) of a basis vector; e.g., T = +∂H/∂S. One can go from one corner to the next by moving parallel or antiparallel to a basis vector, thereby (i) changing variables such as to get the correct dependence of the new potential, and (ii) adding (if moving parallel) or subtracting (if moving antiparallel) the product of the conjugate variables that are associated with the basis vector. For instance, in order to obtain the free enthalpy G from the enthalpy H one (i) uses T = +∂H/∂S to solve for S(T, g, m), since the free enthalpy will be a function of {T, g, m} rather than {S, g, m}; and then (ii) subtracts the product TS to get G(T, g, m) = H(S(T, g, m), g, m) − TS(T, g, m). This procedure is known as a Legendre transformation. Successive application allows one to calculate all thermodynamic potentials from the grand potential Ω and hence, ultimately, from the partition function Z.
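The Legendre-transformation recipe of the caption can be carried out symbolically; a minimal sketch (Python with SymPy assumed; the toy enthalpy H(S) = S²/2C is our own choice, with the g and m dependence suppressed for brevity):

    # Legendre transformation H -> G as described in the Fig. 1 caption.
    import sympy as sp

    S, T, C = sp.symbols('S T C', positive=True)

    H = S**2 / (2 * C)                                 # toy enthalpy H(S)
    S_of_T = sp.solve(sp.Eq(T, sp.diff(H, S)), S)[0]   # (i) invert T = +dH/dS
    G = (H - T * S).subs(S, S_of_T)                    # (ii) subtract T*S

    print(sp.simplify(G))                              # -> -C*T**2/2, i.e. G(T)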
2.5 Correlations
Arbitrary expectation values $\langle A \rangle_\rho$ in the macrostate (54) or (56), respectively, depend on the Lagrange multipliers $\{\lambda_a\}$ as well as –possibly– on other parameters $\{h_b\}$. If the Lagrange multipliers vary infinitesimally while the $\{h_b\}$ are held fixed, the expectation value $\langle A \rangle_\rho$ changes according to
\[
d\langle A \rangle_\rho = -\sum_a \langle \delta G_a ; A \rangle_\rho\, d\lambda_a . \tag{91}
\]
Here $\langle\, ;\, \rangle_\rho$ is the canonical correlation function with respect to the state $\rho$:
\[
\langle A; B \rangle_\rho := \int d\pi\, \rho(\pi) A(\pi)^* B(\pi) \tag{92}
\]
in the classical case or
\[
\langle A; B \rangle_\rho := \int_0^1 d\nu\, \mathrm{tr}\left( \hat\rho^{\,\nu} \hat A^\dagger \hat\rho^{\,1-\nu} \hat B \right) \tag{93}
\]
in the quantum case, respectively. The observable $\delta G_a$ is defined as
\[
\delta G_a := G_a - \langle G_a \rangle_\rho . \tag{94}
\]
The correlation matrix
\[
C_{ab} := \langle \delta G_a ; \delta G_b \rangle_\rho
        = -\left( \frac{\partial g_b}{\partial \lambda_a} \right)_{\lambda, h}
        = \left( \frac{\partial^2}{\partial \lambda_a \partial \lambda_b} \ln Z \right)_{\lambda, h} \tag{95}
\]
thus relates infinitesimal variations of $\lambda$ and $g$:
\[
dg_b = -\sum_a d\lambda_a\, C_{ab} , \qquad d\lambda_a = -\sum_b dg_b\, (C^{-1})_{ba} . \tag{96}
\]
The subscripts $\lambda, h$ of the partial derivatives indicate that they must be taken with all other $\{\lambda_a\}$ and all $\{h_b\}$ held fixed. Returning to our example of the ideal quantum gas, we immediately obtain from (79) the correlation of occupation numbers
\[
\langle \delta N_i ; \delta N_j \rangle_\rho = -\frac{\partial n_j}{\partial \alpha_i} = \delta_{ij}\, n_i (1 \pm n_i) . \tag{97}
\]
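Equation (97) is easy to check against a finite-difference derivative of (79); a small sketch (Python with NumPy assumed; the Bose case, with a toy value of α):

    import numpy as np

    alpha, h = 1.3, 1e-6
    n = lambda a: 1.0 / (np.exp(a) - 1.0)           # boson occupation, Eq. (79)

    lhs = -(n(alpha + h) - n(alpha - h)) / (2 * h)  # -dn/dalpha, numerically
    rhs = n(alpha) * (1 + n(alpha))                 # upper sign in Eq. (97)
    print(lhs, rhs)                                 # agree to high accuracy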
3 Linear Response
3.1 Liouvillian and Evolution
The dynamics of an expectation value $\langle A \rangle_\rho$ is governed by the equation of motion
\[
\frac{d\langle A \rangle_\rho}{dt} = \langle iLA \rangle_\rho + \left\langle \frac{\partial A}{\partial t} \right\rangle_\rho . \tag{98}
\]
Here we have allowed for an explicit time-dependence of the observable $A$. Classically, the Liouvillian $L$ takes the Poisson bracket with the Hamilton function $H(\pi)$,
\[
iL = \sum_j \left( \frac{\partial H}{\partial P_j} \frac{\partial}{\partial Q_j} - \frac{\partial H}{\partial Q_j} \frac{\partial}{\partial P_j} \right) \tag{99}
\]
in canonical coordinates $\pi = \{Q_j, P_j\}$; whereas in the quantum case it takes the commutator with the Hamilton operator $\hat H$,
\[
iL = (i/\hbar)\, [\hat H, *] . \tag{100}
\]
An observable $A$ for which $iLA + \partial A/\partial t = 0$ is called a constant of the motion; a state $\rho$ for which $L\rho = 0$ is called stationary. Only for a stationary $\rho$ is the Liouvillian Hermitian with respect to the canonical correlation function,
\[
\langle A; LB \rangle_\rho = \langle LA; B \rangle_\rho \quad \forall\ A, B . \tag{101}
\]
The evolver $\mathcal{U}$ is defined as the solution of the differential equation
\[
\frac{\partial}{\partial t}\, \mathcal{U}(t_0, t) = i\, \mathcal{U}(t_0, t) L \tag{102}
\]
with initial condition $\mathcal{U}(t_0, t_0) = 1$. As long as the Liouvillian $L$ is not explicitly time-dependent, the solution has the simple exponential form
\[
\mathcal{U}(t_0, t) = \exp[i(t - t_0)L] ; \tag{103}
\]
however, we shall not assume this in the following. The evolver determines –at least formally– the evolution of expectation values via
\[
\langle A \rangle_\rho(t) = \langle \mathcal{U}(t_0, t) A \rangle_\rho(t_0) . \tag{104}
\]
Multiplication with a step function
\[
\theta(t - t_0) =
\begin{cases}
0 & : t \leq t_0 \\
1 & : t > t_0
\end{cases}
\tag{105}
\]
yields the so-called causal evolver
\[
\mathcal{U}_<(t_0, t) := \mathcal{U}(t_0, t)\, \theta(t - t_0) \tag{106}
\]
(where ‘<’ symbolizes ‘$t_0 < t$’) which satisfies another differential equation
\[
\frac{\partial}{\partial t}\, \mathcal{U}_<(t_0, t) = i\, \mathcal{U}_<(t_0, t) L + \delta(t - t_0) . \tag{107}
\]
If a (possibly time-dependent) perturbation is added to the Liouvillian,
\[
L^{(V)} := L + V , \tag{108}
\]
then the perturbed causal evolver $\mathcal{U}^{(V)}_<$ is related to the unperturbed $\mathcal{U}_<$ by an integral equation
\[
\mathcal{U}^{(V)}_<(t_0, t) = \mathcal{U}_<(t_0, t) + \int_{-\infty}^{\infty} dt'\, \mathcal{U}^{(V)}_<(t_0, t')\, iV(t')\, \mathcal{U}_<(t', t) . \tag{109}
\]
Iteration of this integral equation –re-expressing the $\mathcal{U}^{(V)}_<(t_0, t')$ in the integrand in terms of another sum of the form (109), and so on– yields an infinite series, the terms being of increasing order in $V$. Truncating this series after the term of order $V^n$ gives an approximation to the exact causal evolver in $n$th order perturbation theory.
3.2 Kubo formula
The Kubo formula describes the response of a system to weak time-dependent external fields $\phi_\alpha(t)$. Before $t = 0$ the external fields are zero and the system is assumed to be in an initial equilibrium state
\[
\rho(0) = \frac{1}{Z} \exp\left[ -\sum_a \lambda_a G_a[0] \right] \tag{110}
\]
characterized by some set $\{G_a[0]\}$ of constants of the motion at zero field (and with the a priori distribution $\sigma$ taken to be uniform). Then the external fields are switched on:
\[
\phi_\alpha(t) =
\begin{cases}
0 & : t \leq 0 \\
\phi_\alpha(t) & : t > 0
\end{cases} . \tag{111}
\]
How does an arbitrary expectation value $\langle A \rangle(t)$ evolve in response to this external perturbation? The general solution is
\[
\langle A \rangle(t) = \langle \mathcal{U}^{[\phi]}_<(0, t) A \rangle_0 , \tag{112}
\]
where $\langle\ \rangle_0$ stands for the expectation value in the initial equilibrium state $\rho(0)$. We assume that the observable $A$ does not depend explicitly on time or on the fields $\phi_\alpha(t)$. The Hamiltonian $H[\phi]$ and with it the Liouvillian $L[\phi]$, on the other hand, generally do depend on the external fields. Provided the fields are sufficiently weak, the Liouvillian may be expanded linearly:
\[
L[\phi(t)] \approx L[0] + \sum_\alpha \left. \frac{\partial L[\phi]}{\partial \phi_\alpha} \right|_{\phi=0} \phi_\alpha(t) . \tag{113}
\]
The zero-field Liouvillian $L[0]$ is assumed to be not explicitly time-dependent; the linear correction to it generally is, and may be regarded as a time-dependent perturbation $V(t)$. Application of first order time-dependent perturbation theory then yields the evolver $\mathcal{U}^{[\phi]}_<$ in terms of $V(t)$ and the zero-field evolver $\mathcal{U}_<$.
Assuming for simplicity that $\langle A \rangle_0 = 0$ we thus find
\[
\langle A \rangle(t) = \sum_\alpha \int_{-\infty}^{\infty} dt' \left\langle i \left. \frac{\partial L[\phi]}{\partial \phi_\alpha} \right|_{\phi=0} \mathcal{U}_<(t', t) A \right\rangle_0 \phi_\alpha(t') . \tag{114}
\]
With the help of the mathematical identity (prove it!)
\[
\langle iL[\phi] B \rangle_0 = \sum_a \langle iL[\phi] G_a[0]; B \rangle_0\, \lambda_a \quad \forall\ B \tag{115}
\]
we can also write
\[
\langle A \rangle(t) = \sum_\alpha \int_{-\infty}^{\infty} dt' \sum_a \lambda_a \left\langle i \left. \frac{\partial L[\phi]}{\partial \phi_\alpha} \right|_{\phi=0} G_a[0];\ \mathcal{U}_<(t', t) A \right\rangle_0 \phi_\alpha(t') . \tag{116}
\]
In general, the constants of the motion depend explicitly on the external fields. They satisfy
\[
L[\phi] G_a[\phi] = 0 \quad \forall\ \phi , \tag{117}
\]
yet generally $L[\phi'] G_a[\phi] \neq 0$ for $\phi' \neq \phi$. Together with the Leibniz rule this implies
\[
\left. \frac{\partial L[\phi]}{\partial \phi_\alpha} \right|_{\phi=0} G_a[0] = -L[0] \left. \frac{\partial G_a[\phi]}{\partial \phi_\alpha} \right|_{\phi=0} , \tag{118}
\]
which we use to obtain
\[
\langle A \rangle(t) = -\sum_\alpha \int_{-\infty}^{\infty} dt' \sum_a \lambda_a \left\langle iL[0] \left. \frac{\partial G_a[\phi]}{\partial \phi_\alpha} \right|_{\phi=0};\ \mathcal{U}_<(t', t) A \right\rangle_0 \phi_\alpha(t') . \tag{119}
\]
The right-hand side of this equation has the structure of a convolution, so in the frequency representation we obtain an ordinary product
\[
\langle A \rangle(\omega) = \sum_\alpha \chi_{A\alpha}(\omega)\, \phi_\alpha(\omega) . \tag{120}
\]
The coefficient
\[
\chi_{A\alpha}(\omega) = -\sum_a \lambda_a \int_0^\infty dt\, \exp(i\omega t) \left\langle iL[0] \left. \frac{\partial G_a[\phi]}{\partial \phi_\alpha} \right|_{\phi=0};\ A(t) \right\rangle_0 \tag{121}
\]
with $A(t) := \mathcal{U}_<(0, t) A$ is called the dynamical susceptibility. The above expression for the dynamical susceptibility is known as the Kubo formula.
3.3 Example: Electrical conductivity
The conductivity $\sigma_{ik}(\omega)$ determines the linear response of the current density $\vec{j}$ to a (possibly time-dependent) homogeneous external electric field $\vec{E}$. We identify
\[
\phi_\alpha \to E_i , \quad A \to j_k , \quad \chi_{A\alpha}(\omega) \to \sigma_{ik}(\omega) . \tag{122}
\]
Since a conductor is an open system with the number of electrons fixed only on average, its initial state must be described by a grand canonical ensemble: $\{G_a[\phi]\} \to \{H[\vec{E}], N\}$, with associated Lagrange parameters $\{\lambda_a\} \to \{\beta, -\beta\mu\}$. In principle, the formula for the conductivity then contains both $\partial H/\partial E_i$ and $\partial N/\partial E_i$; but the latter vanishes, and there remains only
\[
\frac{\partial H}{\partial E_i} = -e Q_i , \tag{123}
\]
with $Q_i$ denoting the $i$th component of the position observable and $e$ the electron charge. We use the general formula (121) for the susceptibility to obtain
\[
\sigma_{ik}(\omega) = e\beta \int_0^\infty dt\, \exp(i\omega t)\, \langle iL[0] Q_i ; j_k(t) \rangle_0 . \tag{124}
\]
The current density is related to the velocity $V_k$ by
\[
j_k = e n V_k , \tag{125}
\]
where $n$ is the number density of electrons. Furthermore, $iL[0] Q_i = V_i$. Hence the conductivity is proportional to the velocity-velocity correlation:
\[
\sigma_{ik}(\omega) = e^2 n \beta \int_0^\infty dt\, \exp(i\omega t)\, \langle V_i ; V_k(t) \rangle_0 . \tag{126}
\]
This result is rather intuitive. In a dirty metal or semiconductor, for instance, the electrons will often scatter off impurities, thereby changing their velocities. As a result, the velocity-velocity correlation function will decay rapidly, leading to a small conductivity. In a clean metal with fewer impurities, on the other hand, the velocity-velocity correlation function will decay more slowly, giving rise to a correspondingly larger conductivity.
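This picture is captured by the simplest model: assume an exponentially decaying, isotropic velocity autocorrelation $\langle V_i ; V_k(t) \rangle_0 = \delta_{ik}\, e^{-t/\tau}/(\beta m)$ with relaxation time $\tau$ (the prefactor is the classical equipartition value). Inserting it into (126) gives the Drude form $\sigma(\omega) = n e^2 \tau/[m(1 - i\omega\tau)]$, which the following minimal sketch confirms numerically (Python with NumPy assumed; all parameter values are illustrative, in units with $e = n = m = 1$):

    import numpy as np

    e, n, m, beta, tau, omega = 1.0, 1.0, 1.0, 1.0, 2.0, 0.7

    # model velocity autocorrelation and the Kubo integral (126), truncated
    # at t = 50*tau where the correlation is negligible; trapezoidal rule
    t, dt = np.linspace(0.0, 50.0 * tau, 200001, retstep=True)
    integrand = np.exp(1j * omega * t) * np.exp(-t / tau) / (beta * m)
    sigma = e**2 * n * beta * dt * (integrand.sum()
                                    - 0.5 * (integrand[0] + integrand[-1]))

    drude = n * e**2 * tau / (m * (1.0 - 1j * omega * tau))
    print(sigma, drude)   # agree to integration accuracy

In this toy model the dc limit $\omega \to 0$ reproduces the familiar Drude conductivity $n e^2 \tau / m$.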
1
1.1
Some Probability Theory
Constrained distributions
A random experiment has n possible results at each trial; so in N trials there are nN conceivable outcomes. (We use the word “result” for a single trial, while “outcome” refers to the experiment as a whole; thus one outcome consists of an enumeration of N results, including their order. For instance, ten tosses of a die (n = 6, N = 10) might have the outcome “1326642335.”) Each outcome yields a set of sample numbers {Ni } and relative frequencies {fi = Ni /N, i = 1 . . . n}. In many situations the outcome of a random experiment is not known completely: One does not know the order in which the individual results occurred, and often one does not even know all n relative frequencies {fi } but only a smaller number m (m < n) of linearly independent constraints
n
Gi fi = ga a
i=1
,
a = 1...m.
(1)
As a simple example consider a loaded die. Observations on this badly balanced die have shown that 6 occurs twice as often as 1; nothing peculiar was observed for the other faces. Given this information only and nothing else, i.e., not making use of any additional information that we might get from inspection of the die or from past experience with dice in general, all we know is a single constraint of the form (1) with Gi = 1
2: i=1 0 : i = 2...5 −1 : i = 6
(2)
and g1 = 0. The available data –in the form of linear constraints– are generally not suﬃcient to reconstruct unambiguously the relative frequencies {fi }. These frequencies may be regarded as Cartesian coordinates of a point in an ndimensional vector space. The m linear constraints, together with fi ∈ [0, 1] and the normalization condition fi = 1, then just restrict the allowed points to some portion of an (n − m − 1)dimensional hyperplane.
1.2
Concentration theorem
Given an a priori probability distribution {pi } for the results i = 1 . . . n, the probability that N trials will yield the –generally diﬀerent– relative frequencies {fi } is N! prob({fi }{pi }, N) = (3) p N1 . . . p Nn . n N1 ! . . . Nn ! 1 2
which would be a contradiction. by virtue of Stirling’s formula √ x! ≈ 2πx xx e−x it is asymptotically prob(f f. In this hyperplane portion there is a unique point at which the quantity Ip (f ) attains a max maximum Ip . (8) As the latter ratio is independent of N.) It is possible to deﬁne new coordinates {x1 . (That the maximal point is indeed unique can be seen as follows: Suppose there were not one but two maximal points corresponding to frequency distributions f (1) and f (2) . (5) In particular. for large N and nearby distributions f ′ ≈ f the variation of prob(f p. N) ≈ prob(f ′ f ′. . prob(f ′ p. Then max ¯ ¯ the mixture f = (f (1) + f (2) )/2 would have Ip (f) > Ip . N) i (6) . With the deﬁnition Ip (f ) := − fi ln i fi pi (4) and the shorthand notations f = {fi }. we call this point the “maximal point” f max . N) exp[NIp (f )] . xn−m−1 } in the hyperplane such that 3 . N)/prob(f ′ p. As we discussed earlier. N) = exp[N(Ip (f ) − Ip (f ′ ))] ′ p. N) = prob(f f. for two diﬀerent data sets {fi } and {fi′ } the ratio of their respective probabilities is given by prob(f p. N) (9) Hence the probability with which any given frequency distribution f is realized is essentially determined by the quantity Ip (f ): The larger this quantity. N) prob(f f.Here the second factor is the probability for one speciﬁc outcome with sample numbers {Ni }. p = {pi } we can also write prob(f p. the more likely the frequency distribution is realized. N) ≈ exp[N(Ip (f ) − Ip (f ′ ))] . . fi′ fi (7) . N) is completely dominated by the exponential: prob(f p. Consider now all frequency distributions allowed by m linearly independent constraints. and the ﬁrst factor counts the number of all outcomes that give rise to the same set of sample numbers. the allowed distributions can be visualized as points in some portion of an (n − m − 1)dimensional hyperplane. N) prob(f prob(f where. N) ′ f ′ .
it becomes virtually certain that the (aside max from m constraints) unknown frequency distribution has an Ip very close to Ip . this probability rapidly tends to zero for any ﬁnite ∆I. (12) Here the factors r n−m−2 in the integrand are due to the volume element. as N increases. (11) Frequency distributions that satisfy the given constraints (1) and whose Ip (x) max diﬀers from Ip by more than ∆I thus lie outside a hypersphere around the maximal point. all other –theoretically allowed– frequency distributions become more and more concentrated near this maximal point. Eq. the sphere’s radius R being given by aR2 = ∆I. Hence not only does the maximal point represent the frequency distribution that is the most likely to be realized (cf. deﬁning s := (n − m − 3)/2 and using Γ(s + 1) = one may also write max prob(Ip < (Ip − ∆I)m constraints) = 0 ∞ (13) dt ts exp(−t) (14) 1 Γ(s + 1) ∞ N ∆I dt ts exp(−t) . As N → ∞. The probability that N trials will yield such a frequency distribution outside the hypersphere is max prob(Ip < (Ip − ∆I)m constraints) = ∞ n−m−2 exp(−Nar 2 ) R dr r ∞ ′ ′ n−m−2 exp(−Nar ′ 2 ) 0 dr r . • the origin (x = 0) is at the maximal point. but in addition. (10) where n−m−1 r := j=1 x2 j . 4 . Any frequency distribution other than the maximal point becomes highly atypical of those allowed by the constraints. Γ(s + 1) (16) As the number N of trials increases. (9)). Substituting t = Nar 2 . a>0 . and • in the vicinity of the maximal point max Ip (x) = Ip − ar 2 + O(r 3 ) . while max the exponentials exp(−Nar 2 ) = exp(N(Ip (x) − Ip )) stem from the ratio (9). (15) which for large N (N ≫ s/∆I) can be approximated by max prob(Ip < (Ip − ∆I)m constraints) ≈ 1 (N∆I)s exp(−N∆I) . therefore.• they are linear functions of the {fi }.
1. ln pi − ln p p =0 . Its solution is of the form fimax = with Z= i 1 exp ln pi − ln p Z p − λa Gi a a (18) exp ln pi − ln p ln p p p − i λa Ga a . Our previous considerations suggest that the most reasonable estimate is the maximal point: that distribution which. have been implemented by means of Lagrange multipliers. as well as the normalization condition i fi = 1. being still in a state of ignorance. inserting (18) into the deﬁnition of Ip (f ) gives max Ip = ln p p + ln Z + a λa g a . the relative frequencies {fi }. The expression in the exponent simpliﬁes if and only if the a priori distribution {pi } is uniform: In this case. it cancels from the ratio in (18) and so does not aﬀect the frequency estimate. This leads to a variational equation δ i fi ln fi +η pi fi + i a λa i Gi fi = 0 a (17) where the constraints. maximizes Ip (f ). (23) There remains the task of specifying the –possibly nonuniform– a priori probability distribution {pi }. (21) The m Lagrange parameters {λa } must be adjusted such as to yield the correct prescribed averages {ga }.. In view of this incomplete information the relative frequencies must be estimated. The {pi } are those probabilities one would assign before having asserted the existence of the constraints (1).3 Frequency estimation We have seen that the knowledge of m (m < n) “averages” (1) constrains. i. while satisfying all the constraints. This “ignorance distribution” can usually be determined on the basis 5 .e. but fails to specify uniquely. (22) a set of m simultaneous equations for m unknowns. Finally. They can be determined from ∂ ln Z = −ga ∂λa . (19) The term := j pj ln pj (20) has been introduced by convention.
characterized solely by the single constraint (2). (29) 6 . {pi = 1/6}. must exhibit this same group invariance. which trivially implies {pi = 1/6}. the task of specifying the a priori distribution may be less straightforward. too. .of symmetry considerations: If the problem at hand is a priori invariant under some characteristic group then the {pi }. The problem is therefore invariant under a relabelling of the faces. 2 see for example E. the best estimate (18) for the frequency distribution reads fimax Z −1 exp(−2λ1 ) : i = 1 Z −1 : i = 2 . T. . In more complicated random experiments. What estimates should we make of the relative frequencies {fi } with which the diﬀerent faces appeared? Taking the a priori probability distribution –assigned to the various faces before one has asserted the die’s imperfection– to be uniform.170 : i = 2 .107 : i = 1 0. The Lagrange parameter is readily determined from ∂ ln Z = −g1 = 0 . 227 (1968) 1 max Ip = ln(1/6) + ln Z = −0. Jaynes. 1713). This in turn gives the numerical estimates fimax = with an associated (25) (26) (27) 0. if a priori we do not know anything about the properties of a given die then our prior ignorance extends to all faces equally.214 : i = 6 (28) The rationale underlying this consistency requirement has historically been called the “Principle of Insuﬃcient Reason” (J. 5 = −1 Z exp(λ1 ) : i = 6 (24) with only a single Lagrange parameter λ1 and Z = exp(−2λ1 ) + 4 + exp(λ1 ) . Ars Conjectandi. IEEE Trans. Cyb. especially those involving continuous and hence coordinatedependent distributions. 4. Bernoulli. . .1 For example. 5 0. Prior probabilities.2 For illustration let us return to the example of the loaded die. Systems Sci.019 . ∂λ1 with solution λ1 = (ln 2)/3 .
Must the hypothesis therefore be rejected. a = (m + 1) . and hence that important systematic eﬀects have been overlooked. and that hence the observed relative frequencies can be ﬁtted with a maximal distribution (18). In this fashion one can proceed iteratively from simple to ever more sophisticated models until the deviation of the ﬁt from the experimental data ceases to be statistically signiﬁcant. or is the deviation merely a statistical ﬂuctuation? The answer is furnished by the concentration theorem: Let N be the number of trials performed to establish the experimental distribution. 7 . (m + l) .The above algorithm for estimating frequencies can be iterated. the experimental frequencies f and the theoretical ﬁt f max do not agree exactly. typically the hypothesis is rejected whenever this probability is below 5%. These must be incorporated in the form of additional constraints. . let max ∆I = Ip − Ip (f ) (32) and s = (n − m − 3)/2. 3 The hypothesis test presented here is closely related to the betterknown χ2 test. based on some insight we have into the systematic inﬂuences aﬀecting the experiment.4 Hypothesis testing Now we consider random experiments for which complete frequency data are available. as a new a priori distribution f max → p′ . we conjecture that the observed relative frequencies can be fully characterized by a set of constraints of the –by now familiar– form (1). linearly independent constraints n Gi fi = ga a i=1 . This maximal distribution contains m ﬁt parameters {λa } (the Lagrange parameters) whose speciﬁc values depend on the averages {ga }. (i) starting from the same a priori distribution p as before. or (ii) iterate: use the previous estimate (18). . which in turn are extracted from the data. It represents our theoretical model or hypothesis. Suppose that. Associated with this improved estimate is max max Ip ′ = Ip + If max (f max′ ) . In general. apply the algorithm to the total set of (m + l) constraints.3 max prob(Ip < (Ip − ∆I)m constraints) < 5% . (31) 1. For large N (N ≫ s/∆I) the probability that statistical ﬂuctuations alone yield an Ip diﬀerence as large as ∆I is given by (16). and then repeat the algorithm just for the l additional constraints. which was based on the ﬁrst m constraints only. (33) Rejecting a hypothesis means that the chosen set of constraints was not complete. (30) In order to make an improved estimate that takes these additional data into account we can either. Suppose that beyond the m constraints (1) we learn of l additional. Both procedures give the same improved estimate f max ′ .
the deviation of the experimental from the uniform distribution. Rosenkrantz. (34) Our “null hypothesis” H0 is that the die is ideal and hence that there are no constraints needed to characterize any imperfection (m = 0).5 Jaynes’ analysis of Wolf’s die data The above prescription for testing hypotheses and –if rejected– for iteratively improving them by enlarging the set of constraints has been lucidly illustrated by E.02993 Table 1: Wolf’s die data: frequency distribution f and its deviation ∆ from the uniform distribution. the probability that statistical ﬂuctuations alone yield an Ip diﬀerence as large as max(H0) − Ip (f ) = 0.17245 +0. Concentration of distributions at entropy maxima. Jaynes in his analysis of Wolf’s die data. T. (37) E.4 Rudolph Wolf (1816–1893).00578 0. ed.02464 0. in: E. Papers on Probability. a Swiss astronomer. Jaynes. 000 and s = 3/2 we ﬁnd max prob(Ip < (Ip − ∆I H0 )0 constraints) ∼ 10−56 .01508 0. T. Statistics and Statistical Mechanics. with associated max(H0) Ip = Ip (p) = 0 . Jaynes.14485 0. Associated with the observed distribution is Ip (f ) = −0.006769 .14205 0.00437 0. 1. 4 8 .006769 ∆I H0 = Ip (36) is practically zero: Using Eq. by R. (16) with N = 20. The observed relative frequencies {fi } and their deviations {∆i = fi − pi } from the a priori probabilities {pi = 1/6} are given in Table 1. presumably to check the validity of statistical theory. In one of these experiments a die (actually two dice.16230 0. Kluwer Academic. but only one of them is of interest here) was tossed 20. Dordrecht (1989).19960 +0. T. However.i 1 2 3 4 5 6 fi ∆i 0.02182 0. D. had performed a number of random experiments. 000 times in a way that precluded any systematic favoring of any face over any other. (35) is merely a statistical ﬂuctuation.18175 +0.
(42) ﬁx the Lagrange parameters by requiring ∂ ln Z = −ga ∂λa and then calculate max(H2) Ip = ln(1/6) + ln Z + 2 (43) λa g a a=1 . which will tend to make one dimension (the last side cut) slightly diﬀerent from the other two. 2. (41) λa Gi a Z a=1 In order to test our hypothesis we determine 6 2 Z= i=1 exp − λa Gi a a=1 . The eﬀect of this is that the “observable” Gi = 2 has a nonzero average g2 = 0.1393 . (44) 9 . More speciﬁcally. Our analysis need not stop here. Jaynes argued that the two most likely imperfections are: • a shift of the center of gravity due to the mass of ivory excavated from the spots. which being proportional to the number of spots on any side. It is clear from the data that Wolf’s die gave a lower frequency for the faces (3.Therefore. 4 (39) and that hence the observed relative frequencies can be ﬁtted with a maximal distribution 2 1 max(H2) fi = exp − . and • errors in trying to machine a perfect cube. the null hypothesis is rejected: The die cannot be perfect. Our hypothesis H2 is that these are the only two imperfections present. should make the “observable” Gi = i − 3.4).5 (38) 1 have a nonzero average g1 = 0. we conjecture that the observed relative frequencies are characterized by just two constraints (m = 2) imposed by the measured averages g1 = 0. 6 −2 : i = 3. (40) 1 : i = 1. 5. and therefore that the (34) dimension was greater than the (16) or (25) ones.0983 and g2 = 0. Not knowing the mechanical details of the die we can still formulate and test hypotheses as to the nature of its imperfections.
(16).e. etc. by simply comparing the experimental and theoretical values of Ip . and hypotheses can be tested with the help of Eq. . Such a test is always statistical in nature.006534 (45) (46) and thus max(H2) ∆I H2 = Ip − Ip (f ) = 0.” 2 2. ). new interactions. yet there are indications that an additional very tiny imperfection may have been present. We shall soon encounter the quantity Ip again and see how it is related to one of the most fundamental concepts in statistical mechanics: the “entropy.6 Conclusion In all our considerations a crucial role has been played by the quantity Ip : The algorithm that yields the best estimate for an unknown frequency distribution is based on the maximization of Ip . i. or about the presence of a certain imperfection of Wolf’s die) can never be drawn with absolute certainty but only at some –quantiﬁable– conﬁdence level.1 Macroscopic Systems in Equilibrium Macrostate For complex systems with many degrees of freedom (like a gas. Jaynes’ analysis of Wolf’s die data furnishes a useful paradigm for the experimental method in general.) essentially proceeds in the same manner in which Jaynes revealed the imperfections of Wolf’s die: by formulating physically motivated hypotheses and testing them against the data. yield data in the form of frequency distributions over discrete “bins” in momentum space.000235 . The search for interesting signals in the data (new particles. Conclusions (say. for each of the various end products of the collision. about the presence of a top quark. Fermilab. 1. ﬂuid or plasma) the exact microstate is usually not known. The probability for such an Ip diﬀerence to occur as a result of statistical ﬂuctuations is (with now s = 1/2) max prob(Ip < (Ip − ∆I H2 )2 constraints) ≈ 2. Desy. The more sophisticated model H2 is therefore a major improvement over the null hypothesis H0 and captures the principal features of Wolf’s die. All modern experiments at particle colliders (CERN.5% . It is therefore impossible to assign to the system a unique point in phase space (classical) or a unique wave function 10 .. (47) much larger than the previous 10−56 but still below the usual acceptance bound of 5%. .With this algorithm Jaynes found max(H2) Ip = −0. for example.
(quantal). Instead one must resort to a statistical description: The system is described by a classical phase space distribution ρ(π) or by an incoherent mixture

    ρ̂ = Σ_i f_i |i⟩⟨i|                                                  (48)

of mutually orthogonal quantum microstates {|i⟩}, respectively. (Where the distinction between classical and quantal does not matter we shall use the generic symbol ρ.) Probabilities must be real, nonnegative, and normalized to one, which implies the respective properties

    ρ(π)* = ρ(π) ,   ρ(π) ≥ 0 ,   ∫ dπ ρ(π) = 1                         (49)

or

    ρ̂† = ρ̂ ,   ρ̂ ≥ 0 ,   tr ρ̂ = 1 .                                    (50)

In this statistical description every observable A (real phase space function or Hermitian operator, respectively) is assigned an expectation value

    ⟨A⟩_ρ = ∫ dπ ρ(π) A(π)                                              (51)

or

    ⟨A⟩_ρ = tr(ρ̂ Â) ,                                                   (52)

respectively.

Typically, not even the distribution ρ is a priori known. Rather, the state of a complex physical system is characterized by very few macroscopic data. These data may come in different forms:

• as data given with certainty, such as the type of particles that make up the system, or the shape and volume of the box in which they are enclosed. These exact data we take into account through the definition of the phase space or Hilbert space in which we are working;

• as prescribed expectation values

    ⟨G_a⟩_ρ = g_a ,   a = 1 … m                                         (53)

of some set {G_a} of selected macroscopic observables. Examples might be the average total energy, average angular momentum, or average magnetization. Such data, which are of a statistical nature, impose constraints of the type (1) on the distribution ρ; or

• as additional control parameters on which the selected observables {G_a} may explicitly depend, such as an external electric or magnetic field.
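As a concrete illustration of the statistical description (48)–(52), here is a minimal sketch for a two-level system; the weights, basis states and observables are arbitrary illustrative choices of mine:

```python
import numpy as np

# Incoherent mixture, Eq. (48): rho = sum_i f_i |i><i|
f = np.array([0.7, 0.3])                      # probabilities f_i
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
rho = sum(fi * np.outer(v, v.conj()) for fi, v in zip(f, basis))

# Properties (50): Hermitian, nonnegative, unit trace
assert np.allclose(rho, rho.conj().T)
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)
assert np.isclose(np.trace(rho).real, 1.0)

# Expectation values of Hermitian observables, Eq. (52)
sigma_z = np.array([[1, 0], [0, -1]])
sigma_x = np.array([[0, 1], [1, 0]])
print(np.trace(rho @ sigma_z).real)  # 0.7 - 0.3 = 0.4
print(np.trace(rho @ sigma_x).real)  # 0.0: the mixture carries no coherence
```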
According to our general considerations in Section 1.3, the best estimate for the thus characterized macrostate is a distribution of the form (18). In the classical case this implies

    ρ(π) = (1/Z) exp( ln σ(π) − ⟨ln σ⟩_σ − Σ_a λ_a G_a(π) )             (54)

with

    Z = ∫ dπ exp( ln σ(π) − ⟨ln σ⟩_σ − Σ_a λ_a G_a(π) ) ,               (55)

while for a quantum system

    ρ̂ = (1/Z) exp( ln σ̂ − ⟨ln σ⟩_σ − Σ_a λ_a Ĝ_a )                      (56)

and

    Z = tr exp( ln σ̂ − ⟨ln σ⟩_σ − Σ_a λ_a Ĝ_a ) .                       (57)

In both cases σ denotes the a priori distribution. (Readers already familiar with statistical mechanics might be disturbed by the appearance of σ in the definitions of ρ and Z. Yet this is essential for a consistent formulation of the theory: see, for instance, our remarks at the end of Section 1.3 on the possibility of iterating the frequency estimation algorithm. In most practical applications σ is uniform and hence ln σ − ⟨ln σ⟩_σ = 0; our definitions of ρ and Z then reduce to the conventional expressions.) The auxiliary quantity Z is referred to as the partition function.

The phase space integral or trace in the respective expressions for Z depends on the specific choice of the phase space or Hilbert space; hence Z may depend on parameters like the volume or particle number. Furthermore, there may be an explicit dependence of the observables {G_a} or of the a priori distribution σ on additional control parameters. Therefore, the partition function generally depends not just on the Lagrange multipliers {λ_a} but also on some other parameters {h_b}. In analogy with the relation (22) one then defines new variables

    γ_b := ∂ ln Z / ∂h_b .                                              (58)

(In contrast to (22) there is no minus sign.) The {g_a}, {λ_a}, {h_b} and {γ_b} are called the thermodynamic variables of the system; together they specify the system's macrostate. The thermodynamic variables are not all independent: Rather, they are related by (22) and (58), i.e., via partial derivatives of ln Z. One says that h_b and γ_b, or g_a and λ_a, are conjugate to each other.
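For a uniform prior the quantum expressions (56) and (57) reduce to ρ̂ = Z⁻¹ exp(−Σ_a λ_a Ĝ_a) and Z = tr exp(−Σ_a λ_a Ĝ_a). The following sketch (a toy three-level system of my own choosing, with a single observable G_1 = H) checks numerically that the expectation value g_1 then follows from ln Z via the differentiation rule (22), g_a = −∂ ln Z/∂λ_a:

```python
import numpy as np
from scipy.linalg import expm

H = np.diag([0.0, 1.0, 2.5])   # toy "energies" of a three-level system
lam = 0.8                      # Lagrange multiplier conjugate to <H>

def lnZ(l):
    # Eq. (57) with uniform prior: Z = tr exp(-lambda H)
    return np.log(np.trace(expm(-l * H)).real)

rho = expm(-lam * H) / np.trace(expm(-lam * H)).real   # Eq. (56)

# Relation (22), g = -d(ln Z)/d(lambda), checked by a central difference
eps = 1e-6
g_numeric = -(lnZ(lam + eps) - lnZ(lam - eps)) / (2 * eps)
g_direct = np.trace(rho @ H).real
print(g_numeric, g_direct)     # the two values agree
```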
Some combinations of thermodynamic variables are of particular importance, which is why the associated distributions go by special names. If the observables that characterize the macrostate –whether in the form of sharp values given with certainty or in the form of expectation values– are all constants of the motion, then the system is said to be in equilibrium. (Here we have assumed that there is no time-dependence of the a priori distribution σ.) Associated with it is an equilibrium distribution of the form (54) or (56), with all {G_a} being constants of the motion. Such an equilibrium distribution is itself constant in time, and so are all expectation values calculated from it. The set of constants of the motion always includes the Hamiltonian (Hamilton function or Hamilton operator, respectively), provided it is not explicitly time-dependent. If the energy and the other macroscopic data are all given with certainty, the resulting equilibrium distribution is called microcanonical; if just the energy is given on average, while all other data are given with certainty, canonical; and if both energy and total particle number are given on average, while all other data are given with certainty, grand canonical.

Strictly speaking, every description of the macrostate in terms of thermodynamic variables represents a hypothesis: namely, the hypothesis that the sets {G_a} and {h_b} are actually complete. This is analogous to Jaynes' model for Wolf's die, which assumes that just two imperfections (associated with two observables G1, G2) suffice to characterize the experimental data. Such a hypothesis may well be rejected by experiment. If so, this does not mean that our rationale for constructing ρ –maximizing I_σ under given constraints– was wrong; it means that important macroscopic observables or control parameters (such as "hidden" constants of the motion, or further imperfections of Wolf's die) have been overlooked, and that the correct description of the macrostate requires additional thermodynamic variables.

2.2 First law of thermodynamics

Changing the values of the thermodynamic variables alters the distribution ρ and with it the associated

    I_σ^max ≡ I_σ(ρ) = ⟨ln σ⟩_σ + ln Z + Σ_a λ_a g_a .                  (59)

By virtue of Eqs. (22) and (58) its infinitesimal variation is given by

    dI_σ^max = d⟨ln σ⟩_σ + Σ_a λ_a dg_a + Σ_b γ_b dh_b .                (60)

As the set of constants of the motion always contains the Hamiltonian, its value for the given system –the internal energy U– and the associated conjugate parameter, which we denote by β, play a particularly important role. Depending on whether the energy is given with certainty or on average, the pair (U, β) corresponds to a pair (h, γ) or (g, λ), respectively. For all remaining variables one then defines new conjugate parameters

    l_a := λ_a / β ,   m_a := γ_a / β ,                                 (61)
such that in terms of these new parameters the energy differential reads

    dU = β⁻¹ d(I_σ^max − ⟨ln σ⟩_σ) − Σ_a l_a dg_a − Σ_b m_b dh_b .      (62)

A change in internal energy that is effected solely by a variation of the parameters {g_a} or {h_b} is defined as work

    δW := − Σ_a l_a dg_a − Σ_b m_b dh_b ;                               (63)

some commonly used pairs (g, l) and (h, m) of thermodynamic variables are listed in Table 2. If, on the other hand, these parameters are held fixed (dg_a = dh_b = 0), then the internal energy can still change through the addition or subtraction of heat

    δQ := (1/kβ) · k d(I_σ^max − ⟨ln σ⟩_σ) .                            (64)

Here we have introduced an arbitrary constant k. Provided we choose this constant to be the Boltzmann constant

    k = 1.381 × 10^−23 J/K ,                                            (65)

we can identify the temperature

    T := 1/(kβ)                                                         (66)

and the entropy

    S := k (I_σ^max − ⟨ln σ⟩_σ)                                         (67)

to write δQ in the more familiar form

    δQ = T dS .                                                         (68)

The entropy is related to the other thermodynamic variables via Eq. (59),

    S = k ln Z + k Σ_a λ_a g_a .                                        (69)

(Even though the entropy, like the partition function, is related to measurable quantities, it is essentially an auxiliary concept and does not itself constitute a physical observable: In quantum mechanics, for example, there is nothing like a Hermitian "entropy operator.") The relation

    dU = δQ + δW ,                                                      (70)

which reflects nothing but energy conservation, is known as the first law of thermodynamics.
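For a uniform prior, the entropy (69) coincides with the von Neumann expression −k tr(ρ̂ ln ρ̂) evaluated in the state (56). A quick numerical check (my own sketch, reusing the toy Hamiltonian from the example above and setting k = 1):

```python
import numpy as np
from scipy.linalg import expm, logm

k = 1.0
H = np.diag([0.0, 1.0, 2.5])      # same toy Hamiltonian as before
beta = 0.8                        # multiplier conjugate to the energy

Z = np.trace(expm(-beta * H)).real
rho = expm(-beta * H) / Z
U = np.trace(rho @ H).real        # internal energy g = <H>

S_thermo = k * np.log(Z) + k * beta * U            # Eq. (69)
S_vonNeumann = -k * np.trace(rho @ logm(rho)).real
print(S_thermo, S_vonNeumann)     # both expressions give the same entropy
```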
    (g, l)      (h, m)      names
      —         (V, p)      volume, pressure
    (N, −µ)     (N, −µ)     particle number, chemical potential
    (M, −B)     (B, M)      magnetic induction, magnetization
    (P, −E)     (E, −P)     electric field, electric polarization
    (p, −v)       —         momentum, velocity
    (L, −ω)       —         angular momentum, angular velocity

Table 2: Some commonly used pairs of thermodynamic variables.

In cases where two pairs are given, the proper choice depends on the specific situation: For example, the pair (M, −B) is adequate if the magnetization M is a constant of the motion whose value is given on average, while the pair (B, M) should be used if there is an externally applied magnetic field B which plays the role of a control parameter.

2.3 Example: Ideal quantum gas

We consider a gas of noninteracting bosons or fermions. We suppose that the total particle number is not given with certainty (but possibly on average, as in the grand canonical ensemble), so the system must be described in Fock space. We further suppose that the observables {Ĝ_a} whose expectation values are furnished as macroscopic data are all of the single-particle form

    Ĝ_a = Σ_i G^i_a N̂_i ,                                               (71)

where the {G^i_a} are arbitrary (c-number) coefficients and the {N̂_i} denote number operators pertaining to some orthonormal basis {|i⟩} of single-particle states. Provided the a priori distribution σ is uniform, the best estimate for the macrostate has the form

    ρ̂ = (1/Z) exp( − Σ_i α_i N̂_i )                                      (72)

with

    α_i = Σ_a λ_a G^i_a .                                               (73)

For example, in the grand canonical ensemble (energy and total particle number given on average) the parameters {α_i} are functions of the single-particle energies {ε_i}, the inverse temperature β and the chemical potential µ:

    α_i = β (ε_i − µ) .                                                 (74)
The partition function

    Z = tr exp( − Σ_i α_i N̂_i ) = Σ_{configurations {N_1, N_2, …}} exp( − Σ_i α_i N_i )   (75)

factorizes, for we work in Fock space where we sum freely over each N_i:

    Z = Π_i Σ_{N_i} e^{−α_i N_i} =: Π_i Z_i .                           (76)

The sum over N_i extends from 0 to the maximum value allowed by particle statistics: ∞ for bosons, 1 for fermions. Consequently, each factor Z_i reads

    Z_i = (1 ∓ e^{−α_i})^{∓1} ,                                         (77)

the upper sign pertaining to bosons and the lower sign to fermions. This gives

    ln Z = ∓ Σ_i ln(1 ∓ e^{−α_i})                                       (78)

and hence the average occupation

    n_i ≡ ⟨N̂_i⟩_ρ = − ∂ ln Z/∂α_i = (e^{α_i} ∓ 1)^{−1}                  (79)

of any single-particle state |i⟩. Using the inverse relation

    α_i = ln(1 ± n_i) − ln n_i                                          (80)

together with the specific realization of Eq. (69),

    S = k ln Z + k Σ_i α_i n_i ,                                        (81)

we find for the entropy

    S = − k Σ_i [ n_i ln n_i ∓ (1 ± n_i) ln(1 ± n_i) ] .                (82)
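Equations (74), (79) and (82) are easily evaluated numerically. The sketch below (single-particle energies, β and µ are illustrative choices of mine; for bosons µ must lie below the lowest level so that all α_i > 0) computes the occupations and the entropy for both statistics:

```python
import numpy as np

k = 1.0
eps = np.array([0.0, 0.5, 1.0, 2.0])   # illustrative single-particle energies
beta, mu = 2.0, -0.2                   # for bosons mu must lie below min(eps)
alpha = beta * (eps - mu)              # Eq. (74)

def occupation(alpha, bosons):
    # Eq. (79): minus sign in the denominator for bosons, plus for fermions
    return 1.0 / (np.exp(alpha) - (1.0 if bosons else -1.0))

def entropy(n, bosons):
    # Eq. (82): the upper sign (s = +1) refers to bosons
    s = 1.0 if bosons else -1.0
    return -k * np.sum(n * np.log(n) - s * (1 + s * n) * np.log(1 + s * n))

for bosons in (True, False):
    n = occupation(alpha, bosons)
    print("bosons" if bosons else "fermions", n, entropy(n, bosons))
```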
2.4 Thermodynamic potentials

Like the partition function, thermodynamic potentials are auxiliary quantities used to facilitate calculations. One example is the (generalized) grand potential

    Ω(T, l_a, h_b) := − (1/β) ln Z ,                                    (83)

related to the internal energy U via

    Ω = U − TS + Σ_a l_a g_a .                                          (84)

Its differential

    dΩ = − S dT + Σ_a g_a dl_a − Σ_b m_b dh_b                           (85)

shows that S, g_a and m_b can be obtained from the grand potential by partial differentiation; for example,

    S = − (∂Ω/∂T)_{l_a, h_b} ,                                          (86)

where the subscript means that the partial derivative is to be taken at fixed l_a and h_b.

When a large system is physically divided into several subsystems, the thermodynamic variables in these subsystems generally take values that differ from those of the total system. In the special case of a homogeneous system all variables of interest can be classified either as extensive –varying proportionally to the volume of the respective subsystem– or intensive –remaining invariant under the subdivision of the system. Examples of the former are the volume itself, the internal energy or the number of particles, whereas amongst the latter are the pressure, the temperature or the chemical potential. In general, if a thermodynamic variable is extensive then its conjugate is intensive, and vice versa. If we assume that the temperature and the {l_a} are intensive, while the {h_b} and the grand potential are extensive, then

    Ω_hom(T, l_a, τ h_b) = τ · Ω_hom(T, l_a, h_b)   ∀ τ > 0             (87)

and hence

    Ω_hom = − Σ_b m_b h_b .                                             (88)

This implies the Gibbs–Duhem relation

    S dT − Σ_a g_a dl_a − Σ_b h_b dm_b = 0 .                            (89)

For an ideal gas in the grand canonical ensemble, for instance, we have the temperature T and the chemical potential ({l_a} = {−µ}) intensive, whereas the volume ({h_b} = {V}, with {m_b} = {p}) and the grand potential Ω are extensive; hence

    Ω_i.gas(T, µ, V) = − p(T, µ) V .                                    (90)

In addition to the grand potential there are many other thermodynamic potentials; their definition and properties are best summarized in a Born diagram (Fig. 1). In a given physical situation it is most convenient to work with that potential which depends on the variables being controlled or measured in the experiment. For example, if a chemical reaction takes place at constant temperature and pressure (controlled variables T, p), and the observables of interest are the particle numbers of the various reactants (measured variables {g_a} = {N_i}), then the reaction is most conveniently described by the free enthalpy G(T, {N_i}, p).
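As a consistency check of Eqs. (83) and (84), the following sketch of mine evaluates both sides for an ideal Fermi gas with the same illustrative levels as above; with the single pair (g, l) = (N, −µ), Eq. (84) reads Ω = U − TS − µN:

```python
import numpy as np

k = 1.0
eps = np.array([0.0, 0.5, 1.0, 2.0])   # illustrative single-particle energies
beta, mu = 2.0, -0.2
T = 1.0 / (k * beta)
alpha = beta * (eps - mu)              # Eq. (74)

n = 1.0 / (np.exp(alpha) + 1.0)        # Fermi occupations, Eq. (79)
lnZ = np.sum(np.log(1.0 + np.exp(-alpha)))   # Eq. (78), lower sign
U = np.sum(eps * n)
N = np.sum(n)
S = -k * np.sum(n * np.log(n) + (1 - n) * np.log(1 - n))   # Eq. (82)

Omega_direct = -lnZ / beta             # Eq. (83)
Omega_legendre = U - T * S - mu * N    # Eq. (84)
print(Omega_direct, Omega_legendre)    # the two expressions agree
```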
[Figure 1: a cube whose corners carry the potentials Ω, F, U, H, G, Ξ(= 0), χ1 and χ2, and whose sides carry the variables T, S, l, g, h and m.]

Figure 1: Born diagram. Corners correspond to thermodynamic potentials: the grand potential Ω, the free energy F, the internal energy U, the enthalpy H, the free enthalpy G, the potential Ξ (which vanishes for a homogeneous system), as well as two rarely used potentials χ1 and χ2. Sides of the cube correspond to thermodynamic variables: T, S, l, g, h and m. Opposite sides are conjugate to each other, and associated with each conjugate pair is a dotted "basis vector." Each corner is a function of the adjacent sides; e.g., the enthalpy H is a function of {S, g, m}. Their conjugates {T, l, h} can be obtained from H by partial differentiation, e.g., T = +∂H/∂S, the sign depending on whether the requested conjugate variable is at the head (−) or tail (+) of a basis vector.

One can go from one corner to the next by moving parallel or antiparallel to a basis vector, thereby (i) changing variables such as to get the correct dependence of the new potential, and (ii) adding (if moving parallel) or subtracting (if moving antiparallel) the product of the conjugate variables that are associated with the basis vector. This procedure is known as a Legendre transformation. For instance, in order to obtain the free enthalpy G from the enthalpy H one (i) uses T = +∂H/∂S to solve for S(T, g, m), since the free enthalpy will be a function of {T, g, m} rather than {S, g, m}, and then (ii) subtracts the product TS to get

    G(T, g, m) = H(S(T, g, m), g, m) − T S(T, g, m) .

Successive application of such Legendre transformations allows one to calculate all thermodynamic potentials from the grand potential Ω and hence, ultimately, from the partition function Z.
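The two-step recipe of the Legendre transformation can be mimicked numerically. Below is a minimal sketch for a toy enthalpy H(S) = S² (a function I chose purely for illustration; the dependence on g and m is suppressed), for which the exact transform is G(T) = −T²/4:

```python
import numpy as np
from scipy.optimize import brentq

def H(S):
    # Toy enthalpy; its exact Legendre transform is G(T) = -T**2 / 4
    return S**2

def G(T):
    # Step (i): solve T = dH/dS (= 2S) for S(T)
    S_of_T = brentq(lambda S: 2.0 * S - T, -100.0, 100.0)
    # Step (ii): subtract the product T*S
    return H(S_of_T) - T * S_of_T

for T in (0.5, 1.0, 2.0):
    print(T, G(T), -T**2 / 4.0)   # numerical and exact values agree
```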
2.5 Correlations

Arbitrary expectation values ⟨A⟩_ρ in the macrostate (54) or (56) depend on the Lagrange multipliers {λ_a} as well as –possibly– on other parameters {h_b}. If the Lagrange multipliers vary infinitesimally while the {h_b} are held fixed, the expectation value ⟨A⟩_ρ changes according to

    d⟨A⟩_ρ = − Σ_a ⟨δG_a; A⟩_ρ dλ_a .                                   (91)

Here

    ⟨A; B⟩_ρ := ∫ dπ ρ(π) A(π)* B(π)                                    (92)

in the classical case, or

    ⟨A; B⟩_ρ := ∫_0^1 dν tr( ρ̂^ν Â† ρ̂^{1−ν} B̂ )                         (93)

in the quantum case, is the canonical correlation function with respect to the state ρ; and the observable δG_a is defined as δG_a := G_a − ⟨G_a⟩_ρ. The correlation matrix

    C_ab := ⟨δG_a; δG_b⟩_ρ = − (∂g_b/∂λ_a)_{λ,h}                        (94)

          = (∂² ln Z / ∂λ_a ∂λ_b)_{λ,h}                                 (95)

thus relates infinitesimal variations of λ and g:

    dg_b = − Σ_a dλ_a C_ab ,   dλ_a = − Σ_b dg_b (C⁻¹)_{ba} .           (96)

The subscripts λ, h of the partial derivatives indicate that they must be taken with all other {λ_a} and all {h_b} held fixed. Returning to our example of the ideal quantum gas, we immediately obtain from (79) the correlation of occupation numbers

    ⟨δN_i; δN_j⟩_ρ = − ∂n_j/∂α_i = δ_ij n_i (1 ± n_i) .                 (97)
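Equation (97) is easy to verify numerically: the sketch below (fermionic case, with arbitrary illustrative values of α_i) compares −∂n_j/∂α_i, obtained by a central finite difference, with δ_ij n_i(1 − n_i):

```python
import numpy as np

alpha = np.array([0.4, 1.1, 2.0])   # arbitrary illustrative values

def n(alpha):
    return 1.0 / (np.exp(alpha) + 1.0)   # Eq. (79), fermions

h = 1e-6
for i in range(len(alpha)):
    da = np.zeros_like(alpha)
    da[i] = h
    deriv = (n(alpha + da) - n(alpha - da)) / (2.0 * h)
    # Eq. (97), lower sign: -dn_j/dalpha_i = delta_ij n_i (1 - n_i)
    print(-deriv, n(alpha)[i] * (1.0 - n(alpha)[i]))
```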
3 Linear Response

3.1 Liouvillian and Evolution

The dynamics of an expectation value ⟨A⟩_ρ is governed by the equation of motion

    d⟨A⟩_ρ / dt = ⟨iLA⟩_ρ + ⟨∂A/∂t⟩_ρ .                                 (98)

Here we have allowed for an explicit time-dependence of the observable A. Classically, the Liouvillian L takes the Poisson bracket with the Hamilton function H(π),

    iL = Σ_j ( (∂H/∂P_j)(∂/∂Q_j) − (∂H/∂Q_j)(∂/∂P_j) )                  (99)

in canonical coordinates π = {Q_j, P_j}; whereas in the quantum case it takes the commutator with the Hamilton operator Ĥ,

    iL = (i/ℏ) [Ĥ, ∗] .                                                 (100)

An observable A for which iLA + ∂A/∂t = 0 is called a constant of the motion; a state ρ for which Lρ = 0 is called stationary. Only for a stationary ρ is the Liouvillian Hermitian with respect to the canonical correlation function:

    ⟨A; LB⟩_ρ = ⟨LA; B⟩_ρ   ∀ A, B .                                    (101)

The evolver U is defined as the solution of the differential equation

    (∂/∂t) U(t_0, t) = i U(t_0, t) L                                    (102)

with initial condition U(t_0, t_0) = 1. As long as the Liouvillian L is not explicitly time-dependent, the solution has the simple exponential form

    U(t_0, t) = exp[ i (t − t_0) L ] ;                                  (103)

we shall not assume this in the following, however. The evolver determines –at least formally– the evolution of expectation values via

    ⟨A⟩_ρ(t) = ⟨U(t_0, t) A⟩_ρ(t_0) .                                   (104)

Multiplication with a step function

    θ(t − t_0) = { 0 : t ≤ t_0 ;  1 : t > t_0 }                         (105)

yields the so-called causal evolver

    U_<(t_0, t) := U(t_0, t) · θ(t − t_0)                               (106)

(where '<' symbolizes 't_0 < t'), which satisfies another differential equation

    (∂/∂t) U_<(t_0, t) = i U_<(t_0, t) L + δ(t − t_0) .                 (107)
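For a time-independent Liouvillian the exponential form (103) can be made concrete in a small quantum example. The sketch below (Hamiltonian, observable and initial state are toy choices of mine, with ℏ = 1) propagates the observable and evaluates the expectation value as in (104):

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = 0.5 * np.array([[1.0, 0.0], [0.0, -1.0]])   # toy two-level Hamiltonian
A = np.array([[0.0, 1.0], [1.0, 0.0]])          # observable (sigma_x)
rho0 = np.array([[0.5, 0.4], [0.4, 0.5]])       # initial state

def U_evolve(A, t):
    # Quantum evolver: with iL = (i/hbar)[H, *], Eq. (100), the exponential
    # form (103) gives A(t) = e^{iHt/hbar} A e^{-iHt/hbar}
    V = expm(1j * H * t / hbar)
    return V @ A @ V.conj().T

for t in (0.0, 1.0, 2.0):
    A_t = U_evolve(A, t)
    # Eq. (104): <A>(t) = tr(rho0 A(t)); here it equals 0.8 cos(t)
    print(t, np.trace(rho0 @ A_t).real)
```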
If a (possibly time-dependent) perturbation is added to the Liouvillian,

    L^(V) := L + V ,                                                    (108)

then the perturbed causal evolver U_<^(V) is related to the unperturbed U_< by an integral equation

    U_<^(V)(t_0, t) = U_<(t_0, t) + ∫_{−∞}^{∞} dt′ U_<(t_0, t′) iV(t′) U_<^(V)(t′, t) .   (109)

Iteration of this integral equation –re-expressing the U_<^(V)(t′, t) in the integrand in terms of another sum of the form (109), and so on– yields an infinite series, the terms being of increasing order in V. Truncating this series after the term of order V^n gives an approximation to the exact causal evolver in nth order perturbation theory.

3.2 Kubo formula

The Kubo formula describes the response of a system to weak time-dependent external fields φ^α(t). Before t = 0 the external fields are zero and the system is assumed to be in an initial equilibrium state

    ρ(0) = (1/Z) exp( − Σ_a λ_a G_a[0] )                                (110)

characterized by some set {G_a[0]} of constants of the motion at zero field (and with the a priori distribution σ taken to be uniform). Then the external fields are switched on:

    φ^α(t) = { 0 : t ≤ 0 ;  φ^α(t) : t > 0 } .                          (111)

How does an arbitrary expectation value ⟨A⟩(t) evolve in response to this external perturbation? The general solution is

    ⟨A⟩(t) = ⟨ U_<^[φ](0, t) A ⟩_0 ,                                    (112)

where ⟨·⟩_0 stands for the expectation value in the initial equilibrium state ρ(0). We assume that the observable A does not depend explicitly on time or on the fields φ^α(t). The Hamiltonian H[φ], and with it the Liouvillian L[φ], on the other hand, generally do depend on the external fields; this field dependence may be regarded as a time-dependent perturbation V(t). Provided the fields are sufficiently weak, the Liouvillian may be expanded linearly:

    L[φ(t)] ≈ L[0] + Σ_α (∂L[φ]/∂φ^α)|_{φ=0} φ^α(t) .                   (113)

The zero-field Liouvillian L[0] is assumed to be not explicitly time-dependent; the linear correction to it, however, generally is. Application of first order time-dependent perturbation theory then yields the evolver U_<^[φ] in terms of V(t) and the zero-field evolver U_<.
Assuming for simplicity that ⟨A⟩_0 = 0, we thus find

    ⟨A⟩(t) = Σ_α ∫_{−∞}^{∞} dt′ ⟨ i (∂L[φ]/∂φ^α)|_{φ=0} U_<(t′, t) A ⟩_0 φ^α(t′) .   (114)

With the help of the mathematical identity (prove it!)

    ⟨ iL[φ] B ⟩_0 = Σ_a λ_a ⟨ iL[φ] G_a[0]; B ⟩_0   ∀ B                 (115)

we can also write

    ⟨A⟩(t) = Σ_α ∫_{−∞}^{∞} dt′ Σ_a λ_a ⟨ i (∂L[φ]/∂φ^α)|_{φ=0} G_a[0]; U_<(t′, t) A ⟩_0 φ^α(t′) .   (116)

In general, the constants of the motion depend explicitly on the external fields. They satisfy

    L[φ] G_a[φ] = 0   ∀ φ ,                                             (117)

yet generally L[φ′] G_a[φ] ≠ 0 for φ′ ≠ φ. Together with the Leibniz rule this implies

    (∂L[φ]/∂φ^α)|_{φ=0} G_a[0] = − L[0] (∂G_a[φ]/∂φ^α)|_{φ=0} ,         (118)

which we use to obtain

    ⟨A⟩(t) = − Σ_α ∫_{−∞}^{∞} dt′ Σ_a λ_a ⟨ iL[0] (∂G_a[φ]/∂φ^α)|_{φ=0}; U_<(t′, t) A ⟩_0 φ^α(t′) .   (119)

The right-hand side of this equation has the structure of a convolution, so in the frequency representation we obtain an ordinary product

    ⟨A⟩(ω) = Σ_α χ_A^α(ω) φ^α(ω) .                                      (120)

The coefficient

    χ_A^α(ω) = − Σ_a λ_a ∫_0^∞ dt exp(iωt) ⟨ iL[0] (∂G_a[φ]/∂φ^α)|_{φ=0}; A(t) ⟩_0   (121)

with A(t) := U_<(0, t) A is called the dynamical susceptibility. The above expression for the dynamical susceptibility is known as the Kubo formula.
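The passage from (119) to (120) is just the convolution theorem. The following self-contained illustration uses an arbitrary decaying kernel standing in for the susceptibility (it is not obtained from Eq. (121)) and checks that the Fourier transform of the time-domain response equals the product of the individual transforms:

```python
import numpy as np

dt = 0.02
t = np.arange(0.0, 100.0, dt)
chi = np.exp(-t)                      # causal stand-in response kernel
phi = np.exp(-0.5 * (t - 20.0)**2)    # external field: a pulse at t = 20

# Time domain: the response is a convolution, as in Eq. (119)
A_t = np.convolve(chi, phi)[:len(t)] * dt

# Frequency domain: the same response is an ordinary product, Eq. (120)
A_w = np.fft.rfft(A_t)
prod = np.fft.rfft(chi) * np.fft.rfft(phi) * dt
print(np.abs(A_w - prod).max())       # tiny, up to round-off
```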
3.3 Example: Electrical conductivity

The conductivity σ^ik(ω) determines the linear response of the current density j to a (possibly time-dependent) homogeneous external electric field E. We identify

    φ^α → E^i ,   A → j^k ,   χ_A^α(ω) → σ^ik(ω) .                      (122)

Since a conductor is an open system with the number of electrons fixed only on average, its initial state must be described by a grand canonical ensemble: {G_a[φ]} → {H[E], N}, with associated Lagrange parameters {λ_a} → {β, −βµ}. In principle, the formula for the conductivity then contains both ∂H/∂E^i and ∂N/∂E^i; but the latter vanishes, and there remains only

    ∂H/∂E^i = − e Q^i ,                                                 (123)

with Q^i denoting the ith component of the position observable and e the electron charge. We use the general formula (121) for the susceptibility to obtain

    σ^ik(ω) = e β ∫_0^∞ dt exp(iωt) ⟨ iL[0] Q^i; j^k(t) ⟩_0 .           (124)

The current density is related to the velocity V^k by

    j^k = e n V^k ,                                                     (125)

where n is the number density of electrons; furthermore, iL[0] Q^i = V^i. Hence the conductivity is proportional to the velocity-velocity correlation:

    σ^ik(ω) = e² n β ∫_0^∞ dt exp(iωt) ⟨ V^i; V^k(t) ⟩_0 .              (126)

This result is rather intuitive. In a dirty metal or semiconductor, for instance, the electrons will often scatter off impurities, thereby changing their velocities. As a result, the velocity-velocity correlation function will decay rapidly, leading to a small conductivity. In a clean metal with fewer impurities, on the other hand, the velocity-velocity correlation function will decay more slowly, giving rise to a correspondingly larger conductivity.
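Equation (126) can be made quantitative under the common Drude-like assumption (not made in the text above) that impurity scattering lets the velocity autocorrelation decay exponentially, ⟨V^i; V^k(t)⟩_0 = δ^ik e^{−t/τ}/(βm). The integral then gives σ^ik(ω) = δ^ik ne²τ/[m(1 − iωτ)], which the sketch below (in illustrative units of my own choosing) confirms numerically; a shorter scattering time τ, i.e. a dirtier sample, indeed yields a smaller dc conductivity ne²τ/m:

```python
import numpy as np

e, n, m = 1.0, 1.0, 1.0      # charge, electron density, mass (toy units)
beta, tau = 1.0, 2.0         # inverse temperature and scattering time

def sigma(omega):
    # Eq. (126) with <V^i; V^k(t)> = delta_ik exp(-t/tau)/(beta m)
    t = np.linspace(0.0, 50.0 * tau, 100001)
    corr = np.exp(-t / tau) / (beta * m)
    return e**2 * n * beta * np.trapz(np.exp(1j * omega * t) * corr, t)

for omega in (0.0, 0.5, 1.0):
    drude = n * e**2 * tau / (m * (1.0 - 1j * omega * tau))
    print(omega, sigma(omega), drude)   # numerical and Drude values agree
```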