Brownian Motion and Diffusion Explained
Harish Srinivasan
1 The Brownian motion - treatment of displacement
The notion of random motion has been a topic of great philosophical and scientific discourse throughout the history of humanity. One of the earliest proponents of this idea, the Roman poet and philosopher Lucretius, wrote about the behaviour of dust particles in his scientific poem "On the Nature of Things". The idea he proposed to reinforce the atomistic models in Roman philosophy bears a rough resemblance to the random motion of particles suspended in a fluid. However, the first true discovery of the random motion of particles in a fluid is credited to the botanist Robert Brown. In 1827, during a study on the suspension of pollen grains in water, he observed numerous particles ejected by pollen grains performing a zig-zag motion. Through a series of observations on different materials, Brown noted that this motion was not related to life, but rather was a physical phenomenon with no biological origin. After Brown's notable work, very little progress was made in understanding the physical principles of this process.
In the second half of the 19th century, the molecular kinetic theory of gases was making significant strides, pioneered in particular by Maxwell, Boltzmann and Gibbs. This inspired Gouy to suggest that the nature of Brownian motion could be linked to molecular kinetic theory. However, a more quantitative formalism of this relationship between kinetic theory and Brownian motion was developed by Einstein in 1905. Around the same period, Smoluchowski and Sutherland also made similar observations about the nature of Brownian motion.
In his groundbreaking 1905 paper, Einstein built upon the progress in molecular kinetic theory, primarily pioneered
by the efforts of Boltzmann, Maxwell, and Gibbs. By employing these fundamental principles, Einstein addressed the
phenomenon of Brownian motion, elucidating the molecular or atomic underpinnings of the persistent and irregular
motion displayed by microscopic particles suspended in a stationary fluid.
Owing to the probabilistic nature of particle dynamics, a probability distribution function (PDF), P (x, t), char-
acterizing the probability of finding the particle at some position x at a time t, is considered along with certain
initial conditions P (x, 0). In Einstein’s model, the Brownian motion of particles is propagated by making small
displacements, of length ∆. These displacements essentially occur due to random molecular collisions on the particle
and are characterized by a probability distribution, f (∆). An equation governing the time-evolution of the PDF
P(x, t + τ) can then be given as,
\[ P(x, t+\tau) = \int_{-\infty}^{\infty} d\Delta\, f(\Delta)\, P(x-\Delta, t) \qquad (1) \]
The above equation quantitatively relates the probability of detecting particles at position x at time t + τ to two
factors: (i) the likelihood of encountering the particle at x − ∆ at time t and (ii) the probability of the particle under-
going a displacement of length ∆. It is crucial to emphasize several key assumptions that form the foundation of the
aforementioned equation. Foremost among these assumptions is that it considers the occurrence of a single displace-
ment within the time interval τ . Additionally, the equation assumes that the probability of the jump’s magnitude
remains unaffected by the displacements at an earlier time. These assumptions align naturally with the meaningful
scales inherent to the time interval τ . This interval is considerably small when compared to the macroscopic time
frame of experimental measurements, a factor that allows it to be treated as a single displacement. However, it
retains a reasonably large magnitude when compared to average time-interval between molecular collisions, thereby
maintaining the assumption of independence between consecutive jumps. In order to obtain a rate equation governing
the probability density P(x, t), we consider the Taylor expansion of (1) with respect to τ on the LHS and Δ on the RHS, yielding,
\[ P(x,t) + \tau \frac{\partial P(x,t)}{\partial t} + O(\tau^2) = \int_{-\infty}^{\infty} d\Delta\, f(\Delta)\left[ P(x,t) - \Delta\frac{\partial P}{\partial x} + \frac{\Delta^2}{2}\frac{\partial^2 P}{\partial x^2} + O(\Delta^3)\right] \qquad (2) \]
The first term on each side cancels, owing to the normalization of f(Δ). Further, considering that the suspended particles experience isotropic collisions enforces that the mean jump length is zero (⟨Δ⟩ = 0). Therefore we have,
\[ \frac{\partial P(x,t)}{\partial t} = \frac{\langle \Delta^2 \rangle}{2\tau}\, \frac{\partial^2 P(x,t)}{\partial x^2} \qquad (3) \]
where we have retained terms of only lowest order with non-zero contribution in the Taylor expansion. This truncation
is equivalent to the physical assumption that the observation length x ≫ ∆ and time scales t ≫ τ .
It is easy to recognize the structure of eq. (3) as the diffusion equation, with the diffusivity D given by ⟨Δ²⟩/(2τ).
Therefore, we have,
\[ \frac{\partial P(x,t)}{\partial t} = D\, \frac{\partial^2 P(x,t)}{\partial x^2} \qquad (4) \]
The solutions to this equation can be obtained easily in Fourier space by calculating the characteristic function¹, which is the Fourier transform of the probability distribution,
\[ \frac{d\tilde{P}(k,t)}{dt} = -D k^2\, \tilde{P}(k,t) \qquad (5) \]
Solving the above equation, we have
\[ \tilde{P}(k,t) = \tilde{P}(k,0)\, \exp\!\left(-D k^2 t\right) \qquad (6) \]
To get the solutions to eq. (3) for an initial condition P(x, 0) = P₀(x), we can take the inverse Fourier transform of (6),
\[ P(x,t) = \frac{1}{\sqrt{4\pi D t}}\, e^{-x^2/(4Dt)} \otimes P_0(x) \qquad (7) \]
where ⊗ refers to the convolution operation. In the case of the trivial initial condition, P₀(x) = δ(x), we have that
\[ P(x,t) = \frac{1}{\sqrt{4\pi D t}}\, e^{-x^2/(4Dt)} \qquad (8) \]
Since we have the solution for P (x, t), we can now calculate all the moments/cumulants of the probability distribution.
As we already have the characteristic function, we can get the moment/cumulant generating function from it². The cumulant generating function, K(k, t), in our case can be written from (6) (for the delta-function initial condition, for which P̃(k, 0) = 1) as,
\[ K(k,t) = \ln \tilde{P}(k,t) = -D k^2 t \qquad (9) \]
From this, all the cumulants can be calculated by the formula,
\[ \kappa_n(t) = \frac{(-1)^n}{i^n}\, \frac{d^n K(k,t)}{dk^n}\bigg|_{k=0} \qquad (10) \]
1 The characteristic function of a probability distribution is the Fourier transform of the distribution, \( \tilde{P}(k,t) = \int_{-\infty}^{\infty} e^{-ikx}\, P(x,t)\, dx \).
2 The characteristic function is related to the moment generating function \( M(u) = \langle e^{ux} \rangle \) through \( \tilde{P}(k) = M(-ik) \). From this, we have the cumulant generating function as \( K(u) = \ln M(u) \implies K(k) = \ln M(-ik) \).
Evidently, it can be seen that the first cumulant κ₁(t), which is the mean displacement ⟨x(t)⟩, is equal to zero. On the other hand, the variance of the displacement can be found to be
\[ \langle \Delta x^2(t) \rangle = \langle x^2(t) \rangle - \langle x(t) \rangle^2 = 2Dt. \]
A major consequence of (7) is therefore the nature of the average particle displacements. Due to the isotropic nature of the motion, the mean displacement ⟨x⟩ is zero. However, the mean-squared displacement (MSD), ⟨x²⟩ = 2Dt, exhibits a linear relationship with time.
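As a quick numerical sanity check of the Fourier-space solution (6) and the Gaussian propagator (8), the sketch below (a Python/NumPy illustration; the grid and parameter values are assumptions made for this example, not part of the original text) evolves a narrow initial pulse by multiplying its Fourier transform by exp(−Dk²t), compares the result with eq. (8), and verifies ⟨x⟩ ≈ 0 and ⟨x²⟩ ≈ 2Dt.

```python
import numpy as np

# Illustrative parameters (arbitrary choices)
D, t = 1.0, 2.0
L, N = 200.0, 4096                        # domain half-width and grid size
x = np.linspace(-L, L, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # angular wavenumbers of the FFT grid

# Narrow Gaussian as a stand-in for the delta-function initial condition
sigma0 = 0.05
P0 = np.exp(-x**2 / (2 * sigma0**2)) / np.sqrt(2 * np.pi * sigma0**2)

# Propagate in Fourier space: P~(k,t) = P~(k,0) exp(-D k^2 t), cf. eq. (6)
Pt = np.fft.ifft(np.fft.fft(P0) * np.exp(-D * k**2 * t)).real

# Analytic propagator, eq. (8), valid when the initial width is negligible
P_exact = np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

print("max |numerical - analytic| :", np.max(np.abs(Pt - P_exact)))
print("mean <x>  :", np.sum(x * Pt) * dx)               # ~ 0
print("MSD <x^2> :", np.sum(x**2 * Pt) * dx, " vs 2Dt =", 2 * D * t)
```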
The diffusion equation can also be derived from a simple one-dimensional random walk by considering the probability
distribution of a particle moving in discrete time steps and then taking the continuum limit.
Consider a particle performing a random walk on a one-dimensional lattice with lattice spacing ∆x. At each time
step ∆t, the particle can move to the left or right with equal probability 1/2. Let P (x, t) denote the probability of
finding the particle at position x at time t.
At each step the probability evolves according to
\[ P(x, t+\Delta t) = \tfrac{1}{2} P(x+\Delta x, t) + \tfrac{1}{2} P(x-\Delta x, t). \]
This equation states that the probability of the particle being at position x at time t + Δt is the average of the probabilities of it being at the neighboring positions x + Δx and x − Δx at time t.
In the continuum limit, we let Δx → 0 and Δt → 0 such that the ratio (Δx)²/(2Δt) remains constant and is defined as the diffusion coefficient D:
\[ D = \frac{(\Delta x)^2}{2\Delta t} \qquad (16) \]
Taylor-expanding the recursion above in Δx and Δt and taking this limit then yields the diffusion equation, ∂P/∂t = D ∂²P/∂x².
Thus, starting from a simple 1D random walk model, we have derived the diffusion equation. This equation
describes the evolution of the probability distribution of a diffusing particle over time.
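A minimal simulation of the unbiased lattice walk described above (a Python/NumPy sketch; the step sizes and ensemble size are illustrative assumptions) checks that the mean-squared displacement indeed grows as 2Dt with D = (Δx)²/(2Δt):

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dt = 0.1, 0.01            # lattice spacing and time step (illustrative)
D = dx**2 / (2 * dt)          # eq. (16)
n_steps, n_walkers = 2000, 10000

# +dx or -dx with equal probability at every step, for every walker
steps = rng.choice([-dx, dx], size=(n_steps, n_walkers))
x = np.cumsum(steps, axis=0)              # positions after each step
t = dt * np.arange(1, n_steps + 1)

msd = np.mean(x**2, axis=1)               # ensemble-averaged MSD
print("MSD / (2Dt) at late times:", msd[-1] / (2 * D * t[-1]))   # ~ 1
```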
1.3.1 Introducing Drift in the Random Walk
To introduce drift, we make the random walk probabilities asymmetric. Let the probability of moving to the right
be p and to the left be q = 1 − p. The evolution of the probability now becomes
\[ P(x, t+\Delta t) = p\, P(x - \Delta x, t) + q\, P(x + \Delta x, t). \]
Simplifying (via a Taylor expansion in Δx and Δt), we obtain:
\[ D = \frac{(\Delta x)^2}{2\Delta t} \quad \text{and} \quad v = \frac{(q-p)\Delta x}{\Delta t} \qquad (23) \]
\[ P(x, t+\Delta t) \approx P(x,t) + v\,\Delta t\, \frac{\partial P(x,t)}{\partial x} + D\,\Delta t\, \frac{\partial^2 P(x,t)}{\partial x^2} \qquad (24) \]
Dividing by Δt and taking the continuum limit gives
\[ \frac{\partial P(x,t)}{\partial t} = v\, \frac{\partial P(x,t)}{\partial x} + D\, \frac{\partial^2 P(x,t)}{\partial x^2} \qquad (25) \]
The equation presented above (eq. (25)) is of the general form of a Fokker-Planck equation, which we will derive in a more general context in the later sections.
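The biased walk can be checked numerically in the same way (a Python/NumPy sketch with illustrative parameter choices): the mean position drifts at rate (p − q)Δx/Δt, while the variance still grows diffusively, up to an O((p − q)²) correction that vanishes in the continuum limit.

```python
import numpy as np

rng = np.random.default_rng(1)
dx, dt, p = 0.1, 0.01, 0.6                # p = probability of a right step
q = 1 - p
D = dx**2 / (2 * dt)                      # eq. (23)
drift = (p - q) * dx / dt                 # rate of <x>; eq. (23) defines v = (q-p)dx/dt = -drift
n_steps, n_walkers = 2000, 10000

steps = np.where(rng.random((n_steps, n_walkers)) < p, dx, -dx)
x = np.cumsum(steps, axis=0)
t = dt * np.arange(1, n_steps + 1)

print("<x>/t        :", np.mean(x[-1]) / t[-1], " expected", drift)
print("Var(x)/(2Dt) :", np.var(x[-1]) / (2 * D * t[-1]))   # ~ 4*p*q = 0.96 here; -> 1 as p -> 1/2
```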
2 Stochastic Processes
In the previous chapter, we explored the foundations of Brownian motion through Einstein’s model and the random
walk problem, illustrating how the seemingly erratic motion of particles can be understood statistically. However,
to delve deeper into the mathematical description of such phenomena, it is essential to introduce the concept of
stochastic processes. A stochastic process provides a rigorous framework to describe systems that evolve over time
under the influence of random forces. By recognizing that the displacement of a particle in Brownian motion is
not just a fixed value but a random variable that changes with time, we can accurately capture its behavior. This
approach not only enhances our understanding of particle displacement but also sets the stage for analyzing other
dynamic quantities, such as velocity, within a probabilistic framework. The precise mathematical treatment of these
random variables is crucial for developing predictive models and deriving key results in statistical mechanics and
beyond.
A stochastic process is a mathematical object that describes a collection of random variables indexed by time
or another parameter. Formally, a stochastic process {X(t)}t∈T is a family of random variables, where t belongs to
an index set T (often representing time).
For each fixed t ∈ T , X(t) is a random variable that takes values in a state space S. The index set T can be
discrete (e.g., {0, 1, 2, . . .}) or continuous (e.g., [0, ∞)).
The fundamental way to define a stochastic process is through its finite-dimensional distributions, which describe
the joint behavior of the process at a finite set of times. For any finite collection of times t1 , t2 , . . . , tn ∈ T , the joint
probability distribution is given by:
\[ P(x_1, t_1;\, x_2, t_2;\, \ldots;\, x_i, t_i;\, \ldots;\, x_n, t_n) \qquad (26) \]
which gives the probability that the process takes the value x_i at each time t_i.
In the case where the collection of random variables is continuously varying (i.e. all the x_i are now continuous random variables), we define a probability density function,
\[ p(x_1, t_1;\, \ldots;\, x_i, t_i;\, \ldots;\, x_n, t_n)\, dx_1 \ldots dx_i \ldots dx_n \qquad (27) \]
which gives the probability that the stochastic process takes a value between x_i and x_i + dx_i at each instant of time t_i.
The use of joint probability distributions to define a stochastic process is motivated by the need to capture the
dependencies and relationships between random variables at different times. This approach allows us to understand
how the process evolves over time and how the values at different times are correlated. By specifying joint distribu-
tions, we can derive important properties such as conditional probabilities, moments, and correlations, which provide
deeper insights into the behavior of the process.
The hierarchy of joint probability densities/distributions completely defines all the statistical properties of the
stochastic process. In this hierarchy, we consider marginal probability distributions, which we obtain by integrating out a portion of the family of random variables in the stochastic process.
The marginal density function of a single random variable X(t₁) at time t₁, p₍₁₎(x₁, t₁), describes the probability distribution of the process at a single time point and is obtained by integrating out all the other variables,
\[ p_{(1)}(x_1, t_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x_1, t_1;\, \ldots;\, x_i, t_i;\, \ldots;\, x_n, t_n)\; dx_2\, dx_3 \cdots dx_n \qquad (28) \]
Here the subscript (1) in p(1) denotes that it’s a one-time PDF. In a similar fashion, we can also define a two-time
PDF, p(2) ,
\[ p_{(2)}(x_1, t_1;\, x_2, t_2) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x_1, t_1;\, \ldots;\, x_i, t_i;\, \ldots;\, x_n, t_n)\; dx_3 \cdots dx_n \qquad (29) \]
By defining such multi-time PDFs, p₍ₗ₎, we can build the complete hierarchy of probability distributions, which all together completely defines the stochastic process,
\[ p_{(1)}(x_1, t_1),\; p_{(2)}(x_1, t_1; x_2, t_2),\; \ldots,\; p_{(l)}(x_1, t_1; \ldots; x_l, t_l),\; \ldots,\; p(x_1, t_1; \ldots; x_n, t_n) \qquad (30) \]
where in our notation, the n-time PDF is the complete joint PDF (i.e. p₍ₙ₎ = p).
In the context of stochastic processes, while the marginal density functions p(l) provide important information about
the distribution of the random variables at various time points, they do not capture the dependencies and relationships
between these variables. This is where conditional probability densities come into play.
The conditional density function represents the probability density of one random variable given the values of
others. For example, the conditional PDF of X(t1 ) given X(t2 ), X(t3 ), . . . , X(tn ) is:
\[ p(x_1, t_1 \,|\, x_2, t_2;\, x_3, t_3;\, \ldots;\, x_n, t_n) = \frac{p(x_1, t_1;\, x_2, t_2;\, \ldots;\, x_n, t_n)}{p_{(n-1)}(x_2, t_2;\, \ldots;\, x_n, t_n)} \qquad (31) \]
with the condition that t1 ≥ t2 ≥ . . . ti ≥ . . . tn holds. By specifying these densities, one can capture the dependencies
and causal structure of the stochastic process.
Yet another example of a conditional density involving two different times can be given as,
\[ p(x_1, t_1;\, x_2, t_2 \,|\, x_3, t_3;\, \ldots;\, x_n, t_n) = \frac{p(x_1, t_1;\, x_2, t_2;\, \ldots;\, x_n, t_n)}{p_{(n-2)}(x_3, t_3;\, \ldots;\, x_n, t_n)} \qquad (32) \]
By utilizing conditional probability densities, we can capture the relationships between different time points. For
instance, knowing the state of the system at time t1 might significantly affect the behavior at time t2 . This is crucial
for understanding time-dependent phenomena in stochastic processes.
In a completely independent stochastic process, the random variables X(t1 ), X(t2 ), . . . , X(tn ) are independent for
all t1 , t2 , . . . , tn ∈ T . This means that the occurrence of an event at one time does not influence the occurrence of
events at any other time. Mathematically, the joint PDF factorizes as:
\[ p(x_1, t_1;\, x_2, t_2;\, \ldots;\, x_n, t_n) = p_{(1)}(x_1, t_1)\, p_{(1)}(x_2, t_2) \cdots p_{(1)}(x_n, t_n) = \prod_{i=1}^{n} p_{(1)}(x_i, t_i) \qquad (33) \]
Each random variable has its own distribution, and the process lacks any form of memory or dependence structure.
A stochastic process is called a Markov process if the future state of the process depends only on the present state
and not on the sequence of events that preceded it. This property is known as the Markov property. Formally, for
any times t₁ > t₂ > · · · > tₙ (with t₁ the latest time), the Markov condition implies that,
p(x1 , t1 |x2 , t2 ; x3 , t3 ; . . . , xn , tn ) = p(x1 , t1 |x2 , t2 ) (34)
which means that, in principle, in the hierarchy of PDFs, only one-time and two-time conditional PDFs are sufficient to
describe a Markov process. The Markov property simplifies the analysis and modeling of stochastic processes since
it reduces the complexity of conditional probabilities.
To write it explicitly, in a Markov process the total joint-PDF can be simply given as a product of individual
two-time and one-time conditional probabilities
p(x1 , t1 ; . . . xn , tn ) = p(x1 , t1 |x2 , t2 )p(x2 , t2 |x3 , t3 ) . . . p(xn−1 , tn−1 |xn , tn )p(xn , tn ) (35)
A stochastic process is referred to as stationary if its statistical properties do not change over time. More formally, for a process to be stationary, the joint probability distribution must be invariant under time translation. This means that the joint probability distribution of the process at times t₁, t₂, . . . , tₙ should be the same as the joint distribution at t₁ + τ, t₂ + τ, . . . , tₙ + τ for any time shift τ,
\[ p(x_1, t_1 + \tau;\, x_2, t_2 + \tau;\, \ldots;\, x_n, t_n + \tau) = p(x_1, t_1;\, x_2, t_2;\, \ldots;\, x_n, t_n) \qquad (36) \]
It is quite straightforward to show that if the joint PDF is time-translationally invariant, the conditional PDFs will also have that property.
Within the context of Markov processes, we can use time-translational invariance to write the two-time conditional
probability as,
p(x1 , t1 |x2 , t2 ) = p(x1 , t1 − t2 |x2 ) (37)
For a Markov process, the Chapman-Kolmogorov equation provides a relationship between the conditional prob-
abilities at different times. Consider three times t1 > t2 > t3 , in such case, the joint PDF over the three times
is,
p(3) (x1 , t1 ; x2 , t2 ; x3 , t3 ) = p(x1 , t1 |x2 , t2 )p(x2 , t2 |x3 , t3 )p(x3 , t3 ) (38)
where we have used the Markov assumption to rewrite them as conditional two-time PDFs. After integrating both
sides of the equation over x₂,
\[ \int dx_2\, p_{(3)}(x_1, t_1;\, x_2, t_2;\, x_3, t_3) = \int dx_2\, p(x_1, t_1|x_2, t_2)\, p(x_2, t_2|x_3, t_3)\, p(x_3, t_3) \]
\[ \implies p_{(2)}(x_1, t_1;\, x_3, t_3) = p(x_3, t_3) \int dx_2\, p(x_1, t_1|x_2, t_2)\, p(x_2, t_2|x_3, t_3) \qquad (39) \]
\[ \implies \frac{p_{(2)}(x_1, t_1;\, x_3, t_3)}{p(x_3, t_3)} = \int dx_2\, p(x_1, t_1|x_2, t_2)\, p(x_2, t_2|x_3, t_3) \]
By definition of the conditional probability, the LHS is equal to p(x₁, t₁|x₃, t₃). Therefore, we have the final form of the Chapman-Kolmogorov equation (CKE) for Markov processes as,
\[ p(x_1, t_1 \,|\, x_3, t_3) = \int dx_2\, p(x_1, t_1 \,|\, x_2, t_2)\, p(x_2, t_2 \,|\, x_3, t_3) \qquad (40) \]
This equation expresses the idea that the probability of transitioning from state x3 to state x1 can be obtained by
considering all possible intermediate states x2 and summing (integrating) over their contributions. It is a fundamental
equation in the theory of Markov processes and plays a key role in the analysis of such systems.
For discrete stochastic processes (i.e. processes whose values are discretely spaced), we can write the CKE as,
\[ P(j, t_1 \,|\, i, t_3) = \sum_{k=1}^{N} P(j, t_1 \,|\, k, t_2)\, P(k, t_2 \,|\, i, t_3) \qquad (41) \]
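For a discrete Markov chain, the CKE (41) is simply the composition of transition matrices through any intermediate time. A small numerical check (a Python/NumPy sketch with an arbitrarily chosen 3-state chain) is shown below:

```python
import numpy as np

# Column-stochastic one-step transition matrix: T[j, k] = P(j, t+1 | k, t); columns sum to 1
T = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# CKE: the 5-step propagator equals a 2-step propagator composed with a 3-step one,
# P(j, t1 | i, t3) = sum_k P(j, t1 | k, t2) P(k, t2 | i, t3), for the intermediate time t2.
lhs = np.linalg.matrix_power(T, 5)
rhs = np.linalg.matrix_power(T, 2) @ np.linalg.matrix_power(T, 3)

print(np.allclose(lhs, rhs))   # True
```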
Figure 1: Schematic highlighting the Chapman-Kolmogorov equation used for the Master equation shown in eq. (46). First there is an x₀ → x′ transition in the interval (t₀, t); then, in the interval (t, t + Δt), another transition x′ → x.
In the case of stationary Markov processes, we already saw that two-time conditional PDFs simply become functions of the time-interval rather than of the two time instances separately. This can provide a further simplification in the CKE. In order to observe this effectively, consider the transition from x₀ [at t = 0] → x [at t] in a time-interval t, through an intermediate state x′ at some time t′, such that 0 < t′ < t,
\[ p(x, t \,|\, x_0) = \int dx'\, p(x, t - t' \,|\, x')\, p(x', t' \,|\, x_0) \qquad (42) \]
The probability of ‘propagating’ from an initial value x0 to a final value x is the product of the probabilities of
propagating from x0 to any intermediate value x′ , and subsequently from x′ to x, summed over all possible values of
the intermediate state x′ .
Important Remarks
1. The probability p(x1 , t1 |x3 , t3 ) is not merely a conditional probability, but also a transition probability for
x3 → x1
2. The conditional probability satisfies the normalization condition \( \int dx_3\, p(x_3, t_3 \,|\, x_1, t_1) = 1 \)
3. Getting the form of p(x, t|x0 , t0 ) is the central problem to be solved for studying Markov processes.
4. However, CKE is a non-linear integral equation, which is quite hard to solve. We will modify CKE into a
differential equation on p(x, t|x0 , t0 )
In order to obtain a differential form (in time, t)3 for CKE, firstly we need to ensure that time is continuous and
the Markov process is a continuous time Markov process. So far, in any of our discussions, we have not specified the
necessity of continuity in time.
3 Keep in mind that we are currently only interested in making the time parameter continuous. Whether the values of stochastic
process xi span a continuous space is a different question and we will deal with it later. In fact, such a consideration will lead to the
derivation of Fokker-Planck equation.
2.4.1 Mathematical definition of Continuous Markov Process
A Markov process has continuous sample paths (i.e. is a continuous Markov process) if, for every ϵ > 0, we have,
\[ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x - x'| > \epsilon} dx\; p(x, t + \Delta t \,|\, x', t) = 0 \qquad (43) \]
The choice of integration region |x − x′| > ϵ indicates that we are integrating over final states x that differ from x′ by a finite (non-vanishing) amount. In other words:
The probability for the final state x to be finitely different from x′ goes to zero faster than Δt, for very small values of Δt.
The integrand in eq. (43) can be defined as an independent quantity, provided the limit exists,
\[ W(x \,|\, x'; t) = \lim_{\Delta t \to 0} \frac{p(x, t + \Delta t \,|\, x', t)}{\Delta t} \qquad (44) \]
Here, W(x|x′; t)⁴ is the rate at which the transition x′ → x occurs in the system. It is important to note that this transition rate can, in general, be a function of time, t.
However, the transition rate W becomes independent of time for a stationary process. To observe this, notice that p(x, t + Δt|x′, t) = p(x, Δt|x′) from the condition of stationarity. Imposing these conditions on eq. (44), we have,
\[ W(x \,|\, x'; t) = \lim_{\Delta t \to 0} \frac{p(x, t + \Delta t \,|\, x', t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{p(x, \Delta t \,|\, x')}{\Delta t} = W(x \,|\, x') \qquad (45) \]
In order to derive the Master equation, let’s consider the transition x0 → x between times t0 to t + ∆t, through an
intermediate state x′ at a time t. The CKE for this setup can be written as,
\[ p(x, t + \Delta t \,|\, x_0, t_0) = \int dx'\, p(x, t + \Delta t \,|\, x', t)\, p(x', t \,|\, x_0, t_0) \qquad (46) \]
The partial derivative of the conditional probability density p(x, t|x0 , t0 ) can be defined as,
\[ \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \lim_{\Delta t \to 0} \frac{p(x, t + \Delta t \,|\, x_0, t_0) - p(x, t \,|\, x_0, t_0)}{\Delta t} \qquad (47) \]
Substituting eq. (46) into (47), we have,
\[ \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left[ \int dx'\, p(x, t + \Delta t \,|\, x', t)\, p(x', t \,|\, x_0, t_0) - p(x, t \,|\, x_0, t_0) \right] \qquad (48) \]
Let us use the normalization condition \( 1 = \int dx'\, p(x', t + \Delta t \,|\, x, t) \) and insert it into the second term in eq. (48),
\[ \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left[ \int dx'\, p(x, t + \Delta t \,|\, x', t)\, p(x', t \,|\, x_0, t_0) - \int dx'\, p(x', t + \Delta t \,|\, x, t)\, p(x, t \,|\, x_0, t_0) \right] \]
\[ \implies \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \lim_{\Delta t \to 0} \int dx' \left[ \frac{p(x, t + \Delta t \,|\, x', t)}{\Delta t}\, p(x', t \,|\, x_0, t_0) - \frac{p(x', t + \Delta t \,|\, x, t)}{\Delta t}\, p(x, t \,|\, x_0, t_0) \right] \qquad (49) \]
Using the definition of the transition rate from eq. (44), we finally have the differential form of CKE,
\[ \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \int dx' \Big[ \underbrace{W(x|x'; t)\, p(x', t \,|\, x_0, t_0)}_{\text{Gain}} - \underbrace{W(x'|x; t)\, p(x, t \,|\, x_0, t_0)}_{\text{Loss}} \Big] \qquad (50) \]
4 Remember that W(x|x′; t) is not equal to the derivative of the conditional probability density: \( W(x|x'; t) \neq \frac{\partial p(x, t | x', t')}{\partial t} \).
This equation is also called the Master equation, as it describes the evolution of the conditional PDF p(x, t|x0 , t0 ).
• W(x|x′): The first term is the Gain term, as it corresponds to transitions x′ → x, which lead to accumulation of probability in state x.
• −W(x′|x): The second term gives the rate at which the transition x → x′ occurs, corresponding to a loss of probability from state x.
From eq. (50), we can also write down the discrete version of the equation, in which case,
\[ \frac{dP(j, t|i)}{dt} = \sum_{\substack{k=1 \\ k \neq j}}^{N} \Big[ \underbrace{W(j|k)\, P(k, t|i)}_{\text{Gain}} - \underbrace{W(k|j)\, P(j, t|i)}_{\text{Loss}} \Big] \qquad (52) \]
where W (j|k) denotes the transition rate for k → j and likewise W (k|j) indicates transition for process j → k. The
above equation can be written in a matrix form, by considering P (j, t|i) as a column matrix (for any given i),
\[ \mathbf{P}(t) \equiv \begin{pmatrix} P(1, t|i) \\ P(2, t|i) \\ \vdots \\ P(N, t|i) \end{pmatrix} \qquad (53) \]
so that eq. (52) can be written as a sum over matrix elements,
\[ \frac{dP(j, t|i)}{dt} = \sum_{k=1}^{N} W_{jk}\, P(k, t|i) \qquad (54) \]
where the transition matrix W has the elements
\[ W_{jk} = W(j|k) \;\; \forall\, j \neq k, \qquad W_{jj} = -\sum_{k \neq j}^{N} W(k|j) \qquad (55) \]
Therefore, the discrete master equation (eq. (52)) becomes a matrix equation of the form,
\[ \frac{d\mathbf{P}(t)}{dt} = \mathsf{W}\, \mathbf{P}(t) \qquad (56) \]
with the formal solution to the above equation being given by,
\[ \mathbf{P}(t) = e^{\mathsf{W} t}\, \mathbf{P}(0) \qquad (57) \]
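The formal solution can be evaluated directly by matrix exponentiation. The sketch below (Python with NumPy/SciPy; the 3-state rates are arbitrary values chosen only for illustration) builds a transition matrix whose columns sum to zero, propagates an initial distribution, and checks that normalization is preserved while the distribution relaxes towards a stationary state.

```python
import numpy as np
from scipy.linalg import expm

# Transition-rate matrix for a 3-state system (illustrative rates):
# off-diagonal W[j, k] = W(j|k); diagonal W[j, j] = -sum_k W(k|j), so each column sums to zero.
W = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [0.5, 1.0, 0.0]])
np.fill_diagonal(W, -W.sum(axis=0))

P0 = np.array([1.0, 0.0, 0.0])            # start in state 1 with certainty

for t in (0.1, 1.0, 10.0):
    Pt = expm(W * t) @ P0                 # formal solution P(t) = exp(Wt) P(0), eq. (57)
    print(f"t = {t:5.1f}   P(t) = {np.round(Pt, 4)}   sum = {Pt.sum():.6f}")
# The probabilities stay normalized and relax towards the stationary distribution of W.
```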
Figure 2: Schematic showing Gain and Loss probabilities when considering a 3-state system with the initial condition
that the system is in state 1 at t0 and finally in state 3 at t + ∆t
In the 3-state example, let's consider the case where the system is in state 1 at t₀; we need to find the rate of change of the probability of being in state 3, dP(3, t|1)/dt. For this we make the substitution N = 3, j = 3, i = 1 in eq. (52),
\[ \frac{dP(3, t|1)}{dt} = \sum_{k=1}^{2} \big[ W(3|k)\, P(k, t|1) - W(k|3)\, P(3, t|1) \big] \]
\[ = \underbrace{W(3|1)\, P(1, t|1) + W(3|2)\, P(2, t|1)}_{\text{Gain}} - \underbrace{\big[ W(1|3) + W(2|3) \big]\, P(3, t|1)}_{\text{Loss}} \qquad (59) \]
where
• the first two terms on the RHS are the Gain terms, in which the system, initially in state 1 at t₀, goes to state 3 at t + Δt via the intermediate states 1 or 2 at time t;
• the last two terms are the Loss terms: if the system has already reached state 3 at time t (from state 1 at t₀), they account for the loss from state 3 to states 1 and 2 during the interval (t, t + Δt).
2.5 Stationary State Distributions
From the matrix form of the master equation (eq. (56)), it is clear that there can exist a distribution whose time-derivative is zero; it is referred to as the stationary state distribution P_st, such that,
\[ \frac{d\mathbf{P}_{st}}{dt} = \mathsf{W}\, \mathbf{P}_{st} = 0 \;\implies\; \sum_{k \neq j}^{N} \big[ W(j|k)\, P_{st}(k|i) - W(k|j)\, P_{st}(j|i) \big] = 0 \qquad (60) \]
The important takeaway from this section is that the time-independence of the transition matrix leads to the existence of a stationary state solution P_st(j|i) which satisfies eq. (60). This equation can be solved exactly by employing the additional constraint that the total conditional probability should be one,
\[ \sum_{j=1}^{N} P_{st}(j|i) = 1 \qquad (61) \]
Combining eqs (60) and (61), we can exactly solve for all the values of Pst (j|i).
While eq. (60) only requires that the total sum over k vanishes, there is a stronger condition, of great importance, in which each term of the sum vanishes independently,
\[ W(j|k)\, P_{st}(k|i) = W(k|j)\, P_{st}(j|i), \quad k \neq j \;\implies\; \frac{W(j|k)}{W(k|j)} = \frac{P_{st}(j|i)}{P_{st}(k|i)}, \quad k \neq j \qquad (62) \]
which is called the detailed balance condition. Under the detailed balance condition, the stationary conditional probability distribution P_st(j|i) can be calculated explicitly,
\[ P_{st}(j|i) = \left[ 1 + \sum_{k \neq j}^{N} \frac{W(k|j)}{W(j|k)} \right]^{-1} \qquad (63) \]
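The formula (63) can be checked numerically against the stationary distribution obtained directly as the null vector of W. The sketch below (Python/NumPy; the rates are arbitrary but constructed so that detailed balance holds) compares the two routes.

```python
import numpy as np

# Target stationary weights and a symmetric "attempt rate" matrix (arbitrary illustration)
w = np.array([1.0, 2.0, 3.0])
a = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])

# Rates W(j|k) = a[j,k] * w[j] satisfy detailed balance: W(j|k) w[k] = W(k|j) w[j]
W = a * w[:, None]
np.fill_diagonal(W, -W.sum(axis=0))

# Stationary distribution from the null space of W ...
vals, vecs = np.linalg.eig(W)
p_null = np.real(vecs[:, np.argmin(np.abs(vals))])
p_null /= p_null.sum()

# ... and from the detailed-balance formula, eq. (63)
p_db = np.array([1.0 / (1.0 + sum(W[k, j] / W[j, k]
                                  for k in range(3) if k != j))
                 for j in range(3)])

print(p_null, p_db)   # both equal w / w.sum() = [1/6, 1/3, 1/2]
```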
There is also a condition that is even stronger than the detailed balance condition, referred to as microscopic reversibility, wherein
\[ W(j|k) = W(k|j) \qquad (64) \]
In this scenario, the transition rates for j → k and k → j are exactly the same. Using the above equation in eq. (62), we get that
\[ P_{st}(k|i) = P_{st}(j|i), \quad k \neq j \qquad (65) \]
which indicates that the probability to be in any state is exactly the same. Therefore, applying the normalization
constraint, we can immediately note that the probability distribution is trivially given by,
\[ P_{st}(k|i) = \frac{1}{N} \quad \forall\, k = 1, \cdots, N \qquad (66) \]
It is important to note that this is precisely the distribution that is obtained from the maximum entropy principle
without any additional constraints. If the stationary state Pst essentially maximizes entropy of the system, can we
say more about the time-evolution of entropy in these systems?
Now we seek to understand the rate of change of the entropy, S(t) = −Σ_j P(j, t) ln P(j, t) (eq. (67)), by taking its time-derivative,
\[ \frac{dS}{dt} = -\sum_{j=1}^{N} \left[ \frac{dP(j,t)}{dt} \ln P(j,t) + \frac{dP(j,t)}{dt} \right] \qquad (68) \]
The second term goes to zero, as time-derivative can be brought outside the summation and the total sum is essentially
normalization condition, which is equal to 1 and hence the time-derivative is zero. Therefore, we have the rate of
change of entropy to be,
\[ \frac{dS}{dt} = -\sum_{j=1}^{N} \frac{dP(j,t)}{dt}\, \ln P(j,t) \qquad (69) \]
Let’s substitute the Master equation (eq. (46)) for dP (j, t)/dt in the above equation, then we have,
\[ \frac{dS}{dt} = -\sum_{j=1}^{N} \sum_{\substack{k=1 \\ k \neq j}}^{N} \ln[P(j,t)] \big[ W(j|k)P(k,t) - W(k|j)P(j,t) \big] = \sum_{j=1}^{N} \sum_{\substack{k=1 \\ k \neq j}}^{N} \Big[ \underbrace{\ln[P(j,t)]\, W(k|j)P(j,t)}_{\text{Term-1}} - \underbrace{\ln[P(j,t)]\, W(j|k)P(k,t)}_{\text{Term-2}} \Big] \qquad (70) \]
Relabelling the dummy indices j ↔ k in eq. (70) gives an equivalent expression, eq. (71). Adding equations (70) and (71) and collecting Term-1 and Term-2 together, we get,
\[ \frac{dS}{dt} = \frac{1}{2} \sum_{\substack{j,k \\ j \neq k}} \Big[ \ln P(j,t) - \ln P(k,t) \Big] \Big[ W(k|j)P(j,t) - W(j|k)P(k,t) \Big] \qquad (72) \]
If the process has microscopic reversibility, such that W (k|j) = W (j|k), then we can rewrite the above equation such
that,
\[ \frac{dS}{dt} = \frac{1}{2} \sum_{\substack{j,k \\ j \neq k}} \Big[ \ln P(j,t) - \ln P(k,t) \Big]\, W(k|j)\, \Big[ P(j,t) - P(k,t) \Big] \qquad (73) \]
In the above equation, W(k|j) > 0 (∀ k ≠ j); therefore we can make the following comments on dS/dt:
• If P(j, t) > P(k, t), then ln P(j, t) > ln P(k, t), so both factors in each term are positive ⟹ dS/dt ≥ 0.
• Even if P(j, t) < P(k, t), then ln P(j, t) < ln P(k, t), so both factors are negative and their product is again positive ⟹ dS/dt ≥ 0.
Therefore, in any circumstance we can say that dS/dt ≥ 0, indicating that entropy is a non-decreasing function of time.
Can dS/dt ever be equal to zero? Of course it can! When the distribution is a stationary distribution, P(j, t) → P_st(j), then from eq. (69) dP_st/dt = 0, suggesting that dS/dt = 0. In this case, the system has reached the maximum-entropy distribution. We have already seen that in the symmetric case W(j|k) = W(k|j), the stationary distribution is P_st = 1/N (which is indeed the maximum-entropy distribution).
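As a quick numerical illustration of these statements (a Python/NumPy + SciPy sketch with an arbitrary symmetric rate matrix), evolving the master equation with W(j|k) = W(k|j) and monitoring S(t) = −Σ P ln P shows a monotonically non-decreasing entropy that saturates at ln N.

```python
import numpy as np
from scipy.linalg import expm

# Symmetric transition rates (microscopic reversibility); values are arbitrary illustrations
W = np.array([[0.0, 1.0, 0.4],
              [1.0, 0.0, 2.0],
              [0.4, 2.0, 0.0]])
np.fill_diagonal(W, -W.sum(axis=0))

P = np.array([0.9, 0.05, 0.05])           # a non-equilibrium initial distribution
for t in np.linspace(0.0, 5.0, 11):
    Pt = expm(W * t) @ P
    S = -np.sum(Pt * np.log(Pt))
    print(f"t = {t:4.1f}   S = {S:.4f}")
# S(t) increases monotonically and saturates at ln(3) ~ 1.0986, the maximum-entropy value.
```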
While we have the master equation for continuous Markov processes, it is still an integral equation. Can we recast it into a differential form? This differential form is called the Fokker-Planck equation. To derive the Fokker-Planck equation from the master equation given in equation (74), we consider the small increment Δx = x′ − x and perform a Taylor expansion of the transition rate W(x|x′) = W(x, Δx).
Starting with the master equation:
\[ \frac{\partial}{\partial t} p(x, t \,|\, x_0, t_0) = \int dx' \Big[ \underbrace{W(x|x')\, p(x', t \,|\, x_0, t_0)}_{\text{Gain}} - \underbrace{W(x'|x)\, p(x, t \,|\, x_0, t_0)}_{\text{Loss}} \Big] \qquad (74) \]
We can transform the integral from x′ to Δx and expand the integrand in a Taylor series about x′ = x,
\[ p(x', t \,|\, x_0, t_0) = p(x, t \,|\, x_0, t_0) + \Delta x\, \frac{\partial p}{\partial x'}\bigg|_{x'=x} + \frac{(\Delta x)^2}{2}\, \frac{\partial^2 p}{\partial x'^2}\bigg|_{x'=x} + \cdots \qquad (75) \]
3 Langevin equation - velocity of the Brownian particle
In the previous chapter we derived a Fokker-Planck equation (FPE) through a simple random walk problem. This FPE dictates the evolution of the probability density of the position of the Brownian particle. But what about the velocity of the Brownian particle, and the nature of the forces that keep it in perpetual motion?
The answer to all these questions starts from the equation given by Langevin (1908) for the velocity of a Brownian
particle diffusing in a fluid. In this section, we will present the Langevin equation, study its behaviour and understand the nature of the velocity from a statistical viewpoint.
Along this pursuit, we will discover a new method to solve continuous stochastic processes, called the Stochastic
Differential Equations (SDEs). In the previous section, we showed that evolution of joint PDFs define the stochastic
processes completely. We had shown that under certain approximations, we can arrive at a differential equation for
Markov processes, which is called the Fokker-Planck equation (FPEs). However, in many instances, FPEs might be
very difficult to solve, in which case, we shall aim to solve the corresponding SDEs.
We shall show in the next chapter that there exists a one-to-one correspondence between FPEs and SDEs for Markov processes.
The key difference between the study using FPEs and SDEs is that, while FPEs give the conditional PDFs of the stochastic processes, the SDEs typically provide the moments/time-correlations of the stochastic process and not the entire PDFs.
In his original work in 1908, Langevin wrote down the equation of motion for a particle in a fluid as given by,
\[ m \frac{dv}{dt} = -\gamma v + \eta(t) \qquad (76) \]
where m is the mass of the particle, −γv corresponds to the frictional/viscous force experienced by the particle
and η(t) is a random force that is due to the cumulative effect of collisions on the Brownian particle due to the
atoms/molecules in the fluid. The constant γ indicates the strength of frictional/viscous force.
In the absence of the random force (η(t) = 0), if we solve the Langevin equation,
\[ m \frac{dv}{dt} = -\gamma v \;\implies\; v(t) = v(0)\, e^{-(\gamma/m)t} \;\implies\; \lim_{t \to \infty} v(t) = 0 \qquad (77) \]
which implies that after a sufficiently long time, the Brownian particle that is inside the fluid will eventually come
to rest. But this is in stark contrast to the observation of Brownian motion, where the particle is always executing
a continuous zig-zag motion. Therefore, this itself indicates that there is necessarily another force, apart from the
viscous/frictional force acting on the Brownian particle, which is keeping it continuously in motion.
The random force, η(t), is statistical in nature, as it emerges from averaging over all the effects of the system on the
Brownian particle. We make the following assumptions for the Brownian particle:
• The random force experienced by the Brownian particle is isotropic in nature — meaning the collisions are
occurring equally in opposing directions, effectively canceling each other out. This leads to the fact that the
average of the random force is zero.
• The collisions occurring on the Brownian particle due to different atoms/molecules are completely uncorrelated.
This can be realized by considering that the random force has no time-correlation at different instants of time.
These assumptions can be put into mathematical form as,
\[ \langle \eta(t) \rangle = 0, \qquad \langle \eta(t)\, \eta(t') \rangle = \Gamma\, \delta(t - t') \qquad (78) \]
where the first equation indicates that the mean force due to collisions is zero, and the second equation shows that the forces due to collisions are not correlated at different instants of time. The parameter Γ corresponds to the strength (variance) of the random force η(t).
In the literature of stochastic processes, η(t) is generally referred to as Stationary Gaussian white noise. Let’s
delve deeper into what this term means. Stationary Gaussian white noise is a specific type of random process
characterized by the following properties:
1. Gaussian Distribution: The values of η(t) at any given time t are normally (or Gaussian) distributed. This
means that the probability density function of η(t) follows a Gaussian curve, often specified by a mean (which is zero
in our case) and a variance (given by Γ). Mathematically, this can be expressed as:
η(t) ∼ N (0, Γ)
2. White Noise: The term ”white” in white noise implies that the power spectral density of the noise (which
is the Fourier transform of correlation function ⟨η(t)η(t′ )⟩) is constant across all frequencies. This is analogous to
white light in optics, which contains all frequencies (colors) in equal intensity.
3. Stationarity: The statistical properties of η(t) do not change over time. This means that the mean, variance,
and autocorrelation (or more generally, all moments) of η(t) are constant over time. The stationarity property
simplifies the analysis of stochastic processes as it allows for time-invariant statistical descriptions. This can also be
seen from the fact that the autocorrelation of the white noise, ⟨η(t)η(t′)⟩, depends only on the time-interval t − t′ and does not depend explicitly on t and t′ individually.
Physical Interpretation In the context of Brownian motion, Gaussian white noise represents the random,
uncorrelated impacts from the surrounding molecules on the Brownian particle. Each collision is independent of the
others, and over time, these collisions average out to create a net zero effect in any particular direction. However,
the variance Γ represents the intensity of these fluctuations.
In the framework of SDEs, the typical exercise is to solve for the moments of the stochastic process (here it is v(t)) and obtain its limiting behaviour in time. At this point, we shall assume that the velocity v(t) is a stationary process and therefore has a stationary distribution, which corresponds to the equilibrium Maxwell-Boltzmann (MB) distribution. This is a reasonable assumption, considering the fact that the velocity in the Langevin equation is driven by the white noise η(t), which is itself a stationary process; therefore it is likely that v(t) is also a stationary process.
Let’s begin by writing down the formal solution to Langevin equation (eq. (76)) (here we are using particular
integrals)
1 t ′ −γ(t−t′ )/m ′
Z
v(t) = v(0)e−γt/m + dt e η(t ) (79)
m 0
White noise η(t) is stationary → velocity v(t) is a stationary process → existence of an equilibrium PDF for v, p_eq(v) → equilibrium distribution = MB distribution.
If the velocity is a stationary process and the stationary distribution corresponds to the MB distribution, then it must be ensured that, in the long-time limit, the mean velocity and mean-square velocity satisfy,
\[ \lim_{t \to \infty} \langle v(t) \rangle = 0, \qquad \lim_{t \to \infty} \langle v(t)^2 \rangle = \frac{k_B T}{m} \qquad (80) \]
where T is the temperature and k_B is the Boltzmann constant.
Let’s consider the average of the formal solution given eq. (79) to calculate the limits described eq. (80),
1 t ′ −γ(t−t′ )/m
Z
⟨v(t)⟩ = ⟨v(0)⟩ e−γt/m + dt e ⟨η(t′ )⟩ = ⟨v(0)⟩ e−γt/m (81)
m 0
we have used average noise is zero, ⟨η(t)⟩ = 0, to arrive at the final equality. Now considering the long-time limit
(t → ∞), we clearly get ⟨v(t)⟩ → 0. Therefore, we have shown that the average velocity satisfies the condition for
thermal equilibrium where velocity follows a Maxwell-Boltzmann distribution.
Now, for ⟨v²(t)⟩, let's square eq. (79) and take its average,
\[ \langle v^2(t) \rangle = \langle v(0)^2 \rangle\, e^{-2\gamma t/m} + \frac{1}{m^2} \int_0^t dt' \int_0^t dt''\, e^{-\gamma(t-t')/m}\, e^{-\gamma(t-t'')/m}\, \underbrace{\langle \eta(t')\, \eta(t'') \rangle}_{\Gamma\, \delta(t'-t'')} \]
\[ = \langle v(0)^2 \rangle\, e^{-2\gamma t/m} + \frac{\Gamma}{m^2} \int_0^t dt'\, e^{-2\gamma(t-t')/m} \]
\[ \implies \langle v^2(t) \rangle = \langle v(0)^2 \rangle\, e^{-2\gamma t/m} + \frac{\Gamma}{2\gamma m} \left( 1 - e^{-2\gamma t/m} \right) \qquad (82) \]
Let’s consider this expression in the long-time limit, where the exponential terms would vanish, and compare it to
the MB distribution value
Γ kB T
lim ⟨v(t)2 ⟩ = =
t→∞ 2γm m (83)
=⇒ Γ = 2γkB T
where the last expression is also a form of fluctuation-dissipation relationship. This equality is neccessary for the
Langevin equation to give the correct stationary state distribution which corresponds to the MB velocity distribution.
Fluctuation-Dissipation Relationship
Γ = 2γkB T
On the LHS we have Γ, which quantifies the fluctuations of the noise, and on the RHS we have γ, which is essentially the dissipation due to the friction experienced by the particle in the liquid.
• The larger the fluctuation in the noise, the larger the dissipation through the frictional force.
• This expression also shows that the fluctuation in the noise increases with increasing temperature T.
• Remember the relationship between the specific heat and energy fluctuations that we studied in the canonical ensemble. Such fluctuation-dissipation relationships show up recurrently, in different forms and at different junctures, when studying systems reaching thermodynamic equilibrium.
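As a numerical illustration of the fluctuation-dissipation relation (a Python/NumPy sketch; the Euler-Maruyama discretization and all parameter values are assumptions made for this example, not part of the derivation above), integrating the Langevin equation with Γ = 2γk_BT indeed drives ⟨v²⟩ to k_BT/m starting from rest:

```python
import numpy as np

rng = np.random.default_rng(2)
m, gamma, kBT = 1.0, 1.0, 1.0             # illustrative units
Gamma = 2 * gamma * kBT                   # fluctuation-dissipation relation
dt, n_steps, n_part = 2e-3, 10000, 5000   # total time = 20 tau_B

v = np.zeros(n_part)                      # all particles start at rest
for _ in range(n_steps):
    # integral of eta over dt has zero mean and variance Gamma*dt
    noise = rng.normal(0.0, np.sqrt(Gamma * dt), size=n_part)
    v += (-gamma * v * dt + noise) / m    # Euler-Maruyama step of m dv = -gamma v dt + eta dt

print("<v>   :", v.mean())                # ~ 0
print("<v^2> :", (v**2).mean(), " vs kBT/m =", kBT / m)
```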
While we have established that at long-times, the mean-square velocity will reach the Maxwell-Boltzmann limit
and shown that the white noise is sufficient to allow the system to evolve from non-equilibrium to equilibrium, we
haven’t discussed anything about the nature of particle displacements in this model. Unless we do that, we can’t
make a comparison with the Brownian model presented by Einstein (or random walk), in which we have shown that
mean-squared displacement (MSD) explicitly follows a linear time-dependence.
But in order to study the displacement, we first calculate the velocity autocorrelation function (VACF), because the MSD is linked to the VACF (as we briefly show below):
\[ x(t) = \int_0^t dt'\, v(t') \]
\[ \implies \langle x(t)^2 \rangle = \int_0^t dt' \int_0^t dt''\, \langle v(t')\, v(t'') \rangle \]
\[ \implies \langle x(t)^2 \rangle = \int_0^t dt' \int_0^t dt''\, \langle v(t' - t'')\, v(0) \rangle \]
\[ \implies \langle x(t)^2 \rangle = 2 \int_0^t d\tau\, (t - \tau)\, \langle v(\tau)\, v(0) \rangle \qquad (84) \]
where in the last expression we have shown that the MSD is related to the VACF, C_v(τ) ≡ ⟨v(τ)v(0)⟩, by using the fact that v(t) is a stationary stochastic process, which makes the VACF time-translationally invariant.
In order to calculate the VACF, multiply the Langevin equation (eq. (76)) by v(0) and take an ensemble average,
\[ \frac{d}{dt} \langle v(t)\, v(0) \rangle = -\frac{\gamma}{m} \langle v(t)\, v(0) \rangle + \frac{1}{m} \langle \eta(t)\, v(0) \rangle \qquad (86) \]
where the last term vanishes, as the noise at time t > 0 cannot be correlated with the initial velocity; such a correlation would break the causality structure. Therefore, we have the equation,
\[ \frac{d}{dt} C_v(t) = -\frac{\gamma}{m}\, C_v(t) \;\implies\; C_v(t) = C_v(0)\, e^{-\gamma t/m} = C_v(0)\, e^{-t/\tau_B} \qquad (87) \]
finally suggesting that the VACF decays exponentially for a particle following the Langevin equation. The exponential
decay rate is governed by a timescale τB = m/γ, which is linked to both the mass of the particle and the frictional
constant.
Using the VACF we have just calculated in eq. (87), we can calculate the MSD from eq. (84),
\[ \langle x(t)^2 \rangle = 2 \int_0^t d\tau\, (t - \tau)\, \langle v(\tau)\, v(0) \rangle = 2 C_v(0) \int_0^t d\tau\, (t - \tau)\, e^{-\tau/\tau_B} = 2 C_v(0)\, \tau_B^2 \left[ \frac{t}{\tau_B} - 1 + e^{-t/\tau_B} \right] \qquad (88) \]
Since v(t) is a stationary process, we can substitute C_v(0) = k_B T/m, and therefore we obtain the formula for the MSD of a particle following the Langevin equation as,
\[ \langle x(t)^2 \rangle = 2 \underbrace{\frac{k_B T}{\gamma}}_{D}\, t - \frac{2 m k_B T}{\gamma^2} \left( 1 - e^{-\gamma t/m} \right) \]
\[ \langle x(t)^2 \rangle = 2 D t - 2 D \tau_B \left( 1 - e^{-t/\tau_B} \right) \qquad (89) \]
where we have used D = k_B T/γ, defining the diffusivity through the ratio of the temperature and the frictional constant.
Here the time-constant τ_B plays a crucial role in separating the different regimes of the diffusion process. One can interpret τ_B as the typical timescale over which the velocity of the large Brownian particle relaxes (loses memory of its initial value) due to the cumulative effect of molecular collisions. Employing two distinct limits, we will obtain two distinct regimes and show the connection between Einstein's model and Langevin's model.
Long-time limit (t ≫ τ_B)
In the long-time limit, the MSD reduces to
\[ \langle x(t)^2 \rangle \xrightarrow{\; t \gg \tau_B \;} 2 D t \qquad (90) \]
which is exactly the expression for the MSD obtained from Einstein's model of the diffusion process. Therefore, at sufficiently long times t ≫ τ_B, the particle has undergone a large number of collisions and the behaviour of Langevin's model mimics that of Einstein's model.
Short-time limit (t ≪ τ_B)
In the short-time limit, the MSD obtained from the Langevin equation gives a very different expression,
\[ \langle x(t)^2 \rangle \xrightarrow{\; t \ll \tau_B \;} 2 D t - 2 D \tau_B \left[ 1 - \left( 1 - \frac{t}{\tau_B} + \frac{1}{2}\left(\frac{t}{\tau_B}\right)^2 - \cdots \right) \right] \]
\[ \implies \langle x(t)^2 \rangle \xrightarrow{\; t \ll \tau_B \;} \frac{D}{\tau_B}\, t^2 = \langle v^2 \rangle_{eq}\, t^2 = \frac{k_B T}{m}\, t^2 \qquad (91) \]
which indicates that the displacement grows linearly with time (x ∼ vt), reminiscent of a particle moving at a constant velocity equal to the RMS thermal velocity \( \sqrt{k_B T/m} \); this is the ballistic regime.
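The crossover between the two regimes can be made explicit by evaluating eq. (89) directly (a short Python/NumPy sketch; the parameter values are illustrative assumptions):

```python
import numpy as np

kBT, m, gamma = 1.0, 1.0, 1.0             # illustrative units
D, tauB = kBT / gamma, m / gamma

def msd(t):
    """Mean-squared displacement of the Langevin particle, eq. (89)."""
    return 2 * D * t - 2 * D * tauB * (1 - np.exp(-t / tauB))

for t in (1e-3 * tauB, 1e-2 * tauB, 10 * tauB, 100 * tauB):
    ballistic = (kBT / m) * t**2          # short-time prediction, eq. (91)
    diffusive = 2 * D * t                 # long-time prediction, eq. (90)
    print(f"t/tauB = {t/tauB:7.3f}  MSD = {msd(t):.3e}  "
          f"ballistic = {ballistic:.3e}  diffusive = {diffusive:.3e}")
```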
As we have established that there is indeed a connection between Langevin's model and Einstein's model through the MSD, we seek to find out whether there is a more direct way to establish the connection. To achieve this, we go to the overdamped (high-friction) limit, in which the inertial term m dv/dt is negligible compared to the frictional term, so that the acceleration can be dropped from the Langevin equation (eq. (76)). Therefore, we have,
\[ \frac{dx}{dt} = \frac{1}{\gamma}\, \eta(t) \qquad (92) \]
which suggests that the displacement of the particle x(t) is nothing but the integral of the noise. It is convenient to introduce the dimensionless (unit-strength) Gaussian white noise ξ(t) through η(t) = γ√(2D) ξ(t), so that
\[ \langle \xi(t) \rangle = 0, \qquad \langle \xi(t)\, \xi(t') \rangle = \delta(t - t') \qquad (93) \]
However, remember that we do not yet know how to do calculus with quantities like ξ(t); we shall see more about that in the next section. But in the meantime, we can easily get the behaviour of the moments of x(t). It is trivial to show that ⟨x(t)⟩ = 0, as the mean of ξ(t) is zero. The MSD can be calculated using,
\[ \langle x(t)^2 \rangle = 2D \int_0^t dt' \int_0^t dt''\, \underbrace{\langle \xi(t')\, \xi(t'') \rangle}_{\delta(t'-t'')} = 2 D t \qquad (96) \]
and it exactly corresponds to the behaviour of MSD obtained from the random walk/Einstein’s model (where we
obtained it from FPEs).
3.4.1 Brownian motion & Wiener’s process
Using Brownian motion, we can define what is called the Wiener process, W(t), and rewrite the Langevin equation in terms of dW(t). Recall that the stochastic differential equation (SDE) for the displacement in Brownian motion is,
\[ \frac{dx}{dt} = \sqrt{2D}\, \xi(t) \qquad (97) \]
where ξ(t) is the dimensionless Gaussian white noise with the properties defined in equation (93). This can be rewritten as,
\[ dx = \sqrt{2D}\, \xi(t)\, dt = \sqrt{2D}\, dW(t) \qquad (98) \]
where we have written ξ(t)dt = dW(t). This is the formal definition of the Wiener process. Therefore, the Wiener process is essentially nothing but Brownian motion with D = 1/2.
Also, Wiener’s process is essentially the integral of Standard Gaussian white noise,
Z t
dW (t) = ξ(t)dt =⇒ W (t) = dt′ ξ(t′ ) (99)
0
Other way around, ξ(t) can be regarded as the derivative of the Wiener process, ξ(t) = dW (t)/dt.
As we shall see in the next chapter, we will develop special methods to describe the differential of Wiener’s process,
dW (t). Therefore, we would like to rewrite the Brownian motion equation (eq. (97)) and the Langevin equation (eq.
(76)) in terms of Wiener’s process differential.
Displacement in Brownian motion. As shown already in the last few lines, the displacement of a particle executing Brownian motion is given by,
\[ \frac{dx(t)}{dt} = \sqrt{2D}\, \xi(t) \;\implies\; dx(t) = \sqrt{2D}\, dW(t) \qquad (100) \]
Velocity in the Langevin equation. Using the fluctuation-dissipation relationship, the Langevin equation can be explicitly written as,
\[ \frac{dv(t)}{dt} = -\frac{\gamma}{m} v + \frac{\sqrt{2\gamma k_B T}}{m}\, \xi(t) \]
\[ dv(t) = -\frac{\gamma}{m} v\, dt + \frac{\sqrt{2\gamma k_B T}}{m}\, dW(t) = -\frac{1}{\tau_B} v\, dt + \frac{\sqrt{2D}}{\tau_B}\, dW(t) \qquad (101) \]
The above two equations suggest that there may be a general structure to stochastic differential equations, in which the differential of the process is typically a sum of a term proportional to dt and a term proportional to dW(t). More on this follows in the next chapter.
4 Stochastic Differential Equations and Itô Calculus
The Wiener process, also known as (dimensionless) Brownian motion (D = 1/2), is a fundamental concept in the
theory of stochastic processes and serves as a cornerstone for stochastic differential equations (SDEs). It is central to
the development of Itô calculus, which provides the mathematical framework to handle stochastic integrals involving
Wiener processes. In principle, the Wiener process W(t) is essentially defined as,
\[ dW(t) = \xi(t)\, dt \;\implies\; W(t) = \int_0^t dt'\, \xi(t') \qquad (102) \]
The Wiener process has the following defining properties:
• W(0) = 0.
• Independent increments: increments over non-overlapping time intervals are mutually independent.
• Gaussian increments: W(t) − W(s) ∼ N(0, t − s) for t > s.
• W(t) has continuous paths: with probability 1, the function t ↦ W(t) is continuous.
Mathematically, these properties can be expressed as:
\[ \langle W(t) \rangle = 0, \qquad \langle W(t)\, W(t') \rangle = \min(t, t'), \qquad dW(t) \sim \mathcal{N}(0, dt) \qquad (103) \]
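A Wiener path can be generated directly from the increment property dW ∼ N(0, dt), and the statistics above checked numerically (a Python/NumPy sketch; the discretization and ensemble size are assumptions of the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n_steps, n_paths = 1e-3, 1000, 20000
t = dt * np.arange(1, n_steps + 1)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_paths))   # independent Gaussian increments
W = np.cumsum(dW, axis=0)                                    # W(t) as the running sum of increments

print("<W(T)>      :", W[-1].mean())                         # ~ 0
print("<W(T)^2>    :", (W[-1]**2).mean(), " vs T =", t[-1])
print("<W(T)W(T/2)>:", (W[-1] * W[n_steps//2 - 1]).mean(), " vs min =", t[n_steps//2 - 1])
```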
After an elaborate discussion of the Langevin equation and the introduction of the properties of white noise and the Wiener process, we are in a position to introduce stochastic differential equations (SDEs). SDEs are differential equations in which one or more terms are stochastic processes. The Wiener process plays a central role in SDEs (particularly those involving Markov processes), and its properties lead to the development of Itô calculus, which is essential for handling stochastic integrals (integrals of stochastic processes).
An SDE for a stochastic process y(t) can generally be written as (compare the last section of the previous chapter),
\[ dy(t) = \mu(y, t)\, dt + \sigma(y, t)\, dW(t) \qquad (104) \]
where µ(y, t) is the drift term and σ(y, t) is the diffusion term. The integral with respect to the Wiener process, W(t), is interpreted in the Itô sense, which accounts for the non-differentiable nature of W(t).
Itô’s lemma is a fundamental result in stochastic calculus that provides a way to differentiate functions of stochastic
processes. It extends the rules of ordinary calculus to functions of processes driven by Wiener processes, accounting
for the additional complexity introduced by the stochastic nature of these processes.
Let y(t) be a stochastic process governed by the stochastic differential equation (SDE)
\[ dy(t) = \mu(y, t)\, dt + \sigma(y, t)\, dW(t) \qquad (105) \]
where W(t) is the Wiener process, µ(y, t) is the drift term, and σ(y, t) is the diffusion term.
Now, consider a twice-differentiable function f(y, t). For the stochastic process y(t), Itô's lemma says that the differential df becomes,
\[ df(y, t) = \left( \frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial y} + \frac{1}{2} \sigma^2 \frac{\partial^2 f}{\partial y^2} \right) dt + \sigma \frac{\partial f}{\partial y}\, dW(t) \qquad (106) \]
Here, the additional term \( \tfrac{1}{2} \sigma^2 \frac{\partial^2 f}{\partial y^2}\, dt \), which does not exist in ordinary calculus, accounts for the contribution of the stochastic component.
This lemma is fundamental in stochastic calculus, providing a way to find the differential of a function of a stochas-
tic process and enabling further applications such as deriving the Fokker-Planck equation from the corresponding
SDE. Below, we show how to derive Itô's lemma using a simple Taylor expansion.
Derivation
To derive Itô’s lemma using Taylor’s series expansion, we start with a stochastic process y(t) that follows a stochastic
differential equation (SDE) of the form:
For a function f(y, t), the Taylor series expansion up to O(dt) is considered, ignoring all higher-order terms in dt,
\[ df(y, t) = \frac{\partial f}{\partial t}\, dt + \frac{\partial f}{\partial y}\, dy + \frac{1}{2} \frac{\partial^2 f}{\partial y^2}\, (dy)^2 + \cdots \]
The reason we have retained the (dy)² term will become clear as we substitute the SDE into the Taylor expansion.
We substitute the SDE dy(t) = µ(y, t) dt + σ(y, t) dW (t) into the Taylor expansion.
First-Order Terms
The term \( \frac{\partial f}{\partial t}\, dt \) remains unchanged, whereas the term \( \frac{\partial f}{\partial y}\, dy \) becomes:
\[ \frac{\partial f}{\partial y} \big( \mu(y,t)\, dt + \sigma(y,t)\, dW(t) \big) = \mu(y,t) \frac{\partial f}{\partial y}\, dt + \sigma(y,t) \frac{\partial f}{\partial y}\, dW(t). \]
Second-Order Terms
The term \( \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (dy)^2 \) needs careful consideration because (dy)² includes the contribution from the Wiener process,
\[ \big( \mu(y,t)\, dt + \sigma(y,t)\, dW(t) \big)^2 = \mu^2(y,t)\, (dt)^2 + 2 \mu(y,t)\sigma(y,t)\, dt\, dW(t) + \sigma^2(y,t)\, (dW(t))^2. \]
Since (dt)² ≈ 0, dt dW(t) ≈ 0 and (dW(t))² = dt, we are left with:
\[ \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (dy)^2 = \frac{1}{2} \frac{\partial^2 f}{\partial y^2}\, \sigma^2(y,t)\, dt. \]
Combining all terms, the differential df(y, t) according to Itô's lemma is:
\[ df(y,t) = \frac{\partial f}{\partial t}\, dt + \mu(y,t) \frac{\partial f}{\partial y}\, dt + \sigma(y,t) \frac{\partial f}{\partial y}\, dW(t) + \frac{1}{2} \sigma^2(y,t) \frac{\partial^2 f}{\partial y^2}\, dt \]
\[ df(y,t) = \left( \frac{\partial f}{\partial t} + \mu(y,t) \frac{\partial f}{\partial y} + \frac{1}{2} \sigma^2(y,t) \frac{\partial^2 f}{\partial y^2} \right) dt + \sigma(y,t) \frac{\partial f}{\partial y}\, dW(t). \qquad (107) \]
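Itô's lemma can also be checked numerically. The sketch below (Python/NumPy; the constant drift and diffusion, and all parameter values, are illustrative assumptions) takes f(y) = y² for dy = µ dt + σ dW with y(0) = 0: Itô's lemma predicts ⟨y(T)²⟩ = (µT)² + σ²T, whereas naive calculus (without the ½σ² ∂²f/∂y² term) would miss the σ²T contribution.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 0.5, 1.0                      # constant drift and diffusion (illustration)
dt, n_steps, n_paths = 1e-3, 1000, 20000
T = dt * n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_paths))
y = np.cumsum(mu * dt + sigma * dW, axis=0)       # dy = mu dt + sigma dW, y(0) = 0

# f(y) = y^2: Ito gives d<f>/dt = 2 mu <y> + sigma^2, hence <y(T)^2> = (mu T)^2 + sigma^2 T
print("simulated <y(T)^2> :", (y[-1]**2).mean())
print("Ito prediction     :", (mu * T)**2 + sigma**2 * T)
print("naive (no Ito term):", (mu * T)**2)
```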
4.4 Derivation of the Fokker-Planck Equation Using Itô’s Lemma
To derive the Fokker-Planck equation, consider an arbitrary function f(y) (note that we take f to have no explicit time-dependence) and take its expectation value using the conditional probability density p(y, t|y₀, t₀). The expectation value of f(y) is given by:
\[ \langle f(y) \rangle = \int_{-\infty}^{\infty} f(y)\, p(y, t \,|\, y_0, t_0)\, dy \qquad (108) \]
Using Itô’s lemma, the differential df (y) can be written as (note that ∂f /∂t = 0),
∂2f
∂f 1 2 ∂f
df (y) = µ(y, t) + σ (y, t) 2 dt + σ(y, t) dW (t) (109)
∂y 2 ∂y ∂y
Taking the expectation value and noting that the expectation of the stochastic integral term \( \sigma(y,t) \frac{\partial f}{\partial y}\, dW(t) \) is zero, we get
\[ \frac{d \langle f(y) \rangle}{dt} = \left\langle \mu(y,t) \frac{\partial f}{\partial y} + \frac{1}{2} \sigma^2(y,t) \frac{\partial^2 f}{\partial y^2} \right\rangle \qquad (110) \]
Using the definition of the expectation value and moving the time-derivative outside the integral, we have
\[ \frac{d}{dt} \int_{-\infty}^{\infty} f(y)\, p(y, t \,|\, y_0, t_0)\, dy = \int_{-\infty}^{\infty} \left[ \mu(y,t) \frac{\partial f}{\partial y} + \frac{1}{2} \sigma^2(y,t) \frac{\partial^2 f}{\partial y^2} \right] p(y, t \,|\, y_0, t_0)\, dy \qquad (111) \]
Using integration by parts on the RHS, and assuming that the boundary terms vanish, while on the LHS the time-derivative acts only on p(y, t|y₀, t₀), we obtain
\[ \int_{-\infty}^{\infty} \frac{\partial p}{\partial t}\, f(y)\, dy = \int_{-\infty}^{\infty} \left[ -\frac{\partial}{\partial y} \big( \mu(y,t)\, p \big) + \frac{1}{2} \frac{\partial^2}{\partial y^2} \big( \sigma^2(y,t)\, p \big) \right] f(y)\, dy \qquad (112) \]
Since f (y) is arbitrary, the integrands must be equal, leading to the Fokker-Planck equation:
\[ \frac{\partial p(y, t \,|\, y_0, t_0)}{\partial t} = -\frac{\partial}{\partial y} \big[ \mu(y,t)\, p(y, t \,|\, y_0, t_0) \big] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \big[ \sigma^2(y,t)\, p(y, t \,|\, y_0, t_0) \big] \qquad (113) \]
In summary, Itô’s lemma provides a crucial link between the differential equations governing stochastic processes
and the partial differential equations describing their probability densities. This connection allows us to derive the
Fokker-Planck equation, offering a powerful tool for analyzing the behavior of stochastic systems.
The SDE-FPE correspondence is a powerful tool that connects the stochastic differential equations describing the
dynamics of a system with the partial differential equations describing the evolution of probability densities. For a
general SDE of the form:
\[ dy(t) = \mu(y, t)\, dt + \sigma(y, t)\, dW(t) \qquad (114) \]
the corresponding Fokker-Planck equation for the probability density p(y, t) is:
\[ \frac{\partial p(y, t)}{\partial t} = -\frac{\partial}{\partial y} \big[ \mu(y,t)\, p(y, t) \big] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \big[ \sigma^2(y,t)\, p(y, t) \big] \qquad (115) \]
By applying this correspondence to the SDEs of Brownian motion and Langevin motion, we obtain their respective
Fokker-Planck equations, which provide a comprehensive description of the statistical properties of these fundamental
stochastic processes.
Process: Brownian Motion
  Stochastic variable: x(t)
  SDE:  dx(t) = √(2D) dW(t)
  FPE:  ∂p(x,t)/∂t = D ∂²p(x,t)/∂x²

Process: Ornstein-Uhlenbeck
  Stochastic variable: v(t)
  SDE:  dv(t) = −(1/τ_B) v(t) dt + (√(2D)/τ_B) dW(t)
  FPE:  ∂p(v,t)/∂t = (1/τ_B) ∂/∂v [v p(v,t)] + (D/τ_B²) ∂²p(v,t)/∂v²

Table 1: Comparison of Brownian motion and Langevin (Ornstein-Uhlenbeck) motion. (Please note that the PDFs here are inherently conditional PDFs.)
Now that we know there is a correspondence between an SDE and an FPE, a clear picture emerges on how to go from the SDE obtained for the displacement to the FPE for the same quantity. Consider the SDE for the displacement given by eq. (97),
\[ dx(t) = \sqrt{2D}\, dW(t) \qquad (116) \]
Comparing it with the general SDE (104), we observe that
\[ \mu = 0, \qquad \sigma = \sqrt{2D} \]
so that the corresponding FPE is just the diffusion equation, \( \frac{\partial p(x,t)}{\partial t} = D \frac{\partial^2 p(x,t)}{\partial x^2} \) (117).
The velocity described in the Langevin equation is also called as the Ornstein-Uhlenbeck (OU) process. It is a
fundamental stochastic process that describes the velocity of a particle undergoing Brownian motion with friction.
It is a continuous-time Gaussian Markov process, and its statistical properties can be derived starting from the
Fokker-Planck equation (FPE) for the velocity of a Langevin particle. Let’s write the Langevin equation for velocity,
\[ dv(t) = -\frac{1}{\tau_B} v\, dt + \frac{\sqrt{2D}}{\tau_B}\, dW(t) \qquad (118) \]
Comparing this with the standard SDE (104), we can see that
\[ \mu = -\frac{v}{\tau_B}, \qquad \sigma = \frac{\sqrt{2D}}{\tau_B} \]
Therefore, the corresponding FPE is given by,
\[ \frac{\partial p(v,t)}{\partial t} = \frac{1}{\tau_B} \frac{\partial}{\partial v} \big[ v\, p(v,t) \big] + \frac{D}{\tau_B^2} \frac{\partial^2 p(v,t)}{\partial v^2} \qquad (119) \]
which is the Fokker-Planck equation governing the evolution of the velocity distribution of particles following Brownian motion.
The Fokker-Planck equation for the Ornstein-Uhlenbeck process can be solved using various methods. But here we shall solve it using the intuitive observation that the velocity is a Gaussian process, so that its distribution retains a Gaussian form at all times, with mean ⟨v(t)⟩ and variance σ_v²(t) = ⟨v(t)²⟩ − ⟨v(t)⟩²,
\[ p(v, t) = \frac{1}{\sqrt{2\pi \sigma_v^2(t)}} \exp\!\left( -\frac{\big(v - \langle v(t) \rangle\big)^2}{2 \sigma_v^2(t)} \right) \qquad (120) \]
We can find ⟨v(t)⟩ and σ_v²(t) from the Langevin equation (76) and substitute them into this ansatz to get the final result. In fact, we have already calculated the mean of the velocity from the Langevin equation in eq. (81), which is
\[ \langle v(t) \rangle = \langle v(0) \rangle\, e^{-\gamma t/m} = \langle v(0) \rangle\, e^{-t/\tau_B} \qquad (121) \]
We have also already calculated the mean-squared velocity from the Langevin equation, eq. (82),
\[ \langle v^2(t) \rangle = \langle v(0)^2 \rangle\, e^{-2t/\tau_B} + \frac{k_B T}{m} \left( 1 - e^{-2t/\tau_B} \right) \qquad (122) \]
From the last two equations, the variance of the velocity can be calculated as,
\[ \sigma_v^2(t) = \langle v^2(t) \rangle - \langle v(t) \rangle^2 \;\implies\; \sigma_v^2(t) = \sigma_{v_0}^2\, e^{-2t/\tau_B} + \frac{k_B T}{m} \left( 1 - e^{-2t/\tau_B} \right) \qquad (123) \]
where \( \sigma_{v_0}^2 = \langle v(0)^2 \rangle - \langle v(0) \rangle^2 \) is the variance of the initial velocity distribution.
Combining the results for the mean and the variance, the complete solution for the probability density function p(v, t) of the velocity is:
\[ p(v, t \,|\, v_0, 0) = \frac{1}{\sqrt{2\pi \left[ \sigma_{v_0}^2 e^{-2t/\tau_B} + \frac{k_B T}{m}\left(1 - e^{-2t/\tau_B}\right) \right]}} \exp\!\left( -\frac{\big( v - \langle v(0) \rangle\, e^{-t/\tau_B} \big)^2}{2 \left[ \sigma_{v_0}^2 e^{-2t/\tau_B} + \frac{k_B T}{m}\left(1 - e^{-2t/\tau_B}\right) \right]} \right) \qquad (124) \]
The solution looks less cumbersome if we consider a delta-function initial distribution, i.e. p(v, 0) = δ(v − v₀). In that case, the variance of the initial distribution is zero, σ_{v₀}² = 0, and the mean is ⟨v(0)⟩ = v₀. Therefore, the solution in that scenario is,
\[ p(v, t \,|\, v_0, 0) = \sqrt{\frac{m}{2\pi k_B T \left(1 - e^{-2t/\tau_B}\right)}}\; \exp\!\left( -\frac{m \big( v - v_0\, e^{-t/\tau_B} \big)^2}{2 k_B T \left(1 - e^{-2t/\tau_B}\right)} \right) \qquad (125) \]
Therefore, if the initial distribution is a delta function, p(v, 0) = δ(v − v₀), it evolves in time according to eq. (125) and eventually relaxes to a stationary distribution, p_st(v), at very long times t → ∞. What does this stationary solution look like?
As t → ∞, the process reaches a stationary state where the probability density function no longer changes with time.
In this limit,
\[ p_{st}(v) = \sqrt{\frac{m}{2\pi k_B T}}\; \exp\!\left( -\frac{m v^2}{2 k_B T} \right) \qquad (126) \]
This is simply the Maxwell-Boltzmann distribution of the velocity.
The stationary solution can also be obtained by solving the differential equation,
\[ \frac{1}{\tau_B} \frac{d}{dv} \big[ v\, p_{st}(v) \big] + \frac{D}{\tau_B^2} \frac{d^2 p_{st}(v)}{dv^2} = 0 \qquad (127) \]
Verify by solving the above differential equation that p_st(v) is indeed given by eq. (126).
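As a complementary numerical check of the time-dependent solution (125) and its stationary MB limit (126), the sketch below (Python with NumPy/SciPy; the Euler-Maruyama discretization and all parameter values are illustrative assumptions) simulates the OU SDE (118) from a delta-function initial condition and compares the sampled histograms with the analytic conditional PDF at an intermediate and a late time.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
kBT, m, gamma = 1.0, 1.0, 1.0
tauB, D = m / gamma, kBT / gamma
v0, dt, n_paths = 2.0, 5e-3, 50000

def analytic_pdf(v, t):
    """p(v, t | v0, 0) from eq. (125); tends to the MB distribution (126) as t -> infinity."""
    mean = v0 * np.exp(-t / tauB)
    var = (kBT / m) * (1 - np.exp(-2 * t / tauB))
    return norm.pdf(v, loc=mean, scale=np.sqrt(var))

v = np.full(n_paths, v0)                  # delta-function initial condition at v0
t = 0.0
for t_target in (0.5 * tauB, 5.0 * tauB):
    while t < t_target - 1e-12:
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        v += -(v / tauB) * dt + (np.sqrt(2 * D) / tauB) * dW   # OU SDE, eq. (118)
        t += dt
    hist, edges = np.histogram(v, bins=60, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    err = np.max(np.abs(hist - analytic_pdf(centers, t)))
    print(f"t = {t/tauB:.1f} tauB   max |histogram - eq.(125)| = {err:.3f}")
```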