
The Newcomb–Benford law: Scale invariance and a simple Markov
process based on it
Andrea Burgos(a) and Andrés Santos(b)
Departamento de Física, Universidad de Extremadura, 06006 Badajoz, Spain
(Received 28 January 2021; accepted 23 April 2021)
The Newcomb–Benford law, also known as the first-digit law, gives the probability distribution associated with the first digit of a dataset so that, for example, the first significant digit has a probability of 30.1% of being 1 and 4.58% of being 9. This law can be extended to the second and next significant digits. This article presents an introduction to the discovery of the law and its derivation from the scale invariance property as well as some applications and examples. Additionally, a simple model of a Markov process inspired by scale invariance is proposed. Within this model, it is proved that the probability distribution irreversibly converges to the Newcomb–Benford law, in analogy to the irreversible evolution toward equilibrium of physical systems in thermodynamics and statistical mechanics.
© 2021 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1119/10.0004957
I. INTRODUCTION

In the late 19th century, an astronomer and mathematician visits his institution's library and consults a table of logarithms to perform certain astronomical calculations. As on previous occasions, he is struck by the fact that the first pages (those corresponding to numbers that start at 1) are much more worn than the last ones (corresponding to numbers that start at 9). Intrigued, this time he decides not to overlook the matter. He closes his eyes to concentrate, sketches a few calculations on a piece of paper, and finally smiles. He has found the answer and it turns out to be enormously simple and elegant.

A little over half a century later, a physicist and electrical engineer, who is unaware of his predecessor's discovery, observes the same curious phenomenon on the pages of logarithm tables and arrives at the same conclusion. Both have understood something about a long list of records $\{r_n\}$ obtained from measurements or observations. The fraction $p_d$ of records beginning with the significant digit $d = 1, 2, \ldots, 9$ is not $p_d = 1/9$, as one might naively expect. Instead, it follows a logarithmic law. More specifically,

$$p_d = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9. \tag{1}$$

The numeric values of $p_d$ are shown in the second column of Table I. We see that the records that start with 1, 2, or 3 account for around 60% of the total, while the other six digits must settle for the remaining 40%.

Our 19th century character is Simon Newcomb (Fig. 1) and he published his discovery in a modest two-page note.1 The second character is Frank Benford (Fig. 2) and he wrote a 22-page article.2 In it, in addition to mathematically justifying Eq. (1), he showed its validity in the analysis of more than 20 000 first digits. These were taken from sources as varied as river areas, populations of American cities, physical constants, atomic and molecular weights, specific heats, numbers taken from newspapers or the Reader's Digest, postal addresses, …, and the series $n^{-1}$, $\sqrt{n}$, $n^2$, or $n!$, among others, with $n = 1, 2, \ldots, 100$.

With such overwhelming evidence, it is not surprising that Eq. (1) is usually known as Benford's law (or first-digit law), even though it was found nearly sixty years earlier by Newcomb. This is one more manifestation of the so-called Stigler's law, according to which no scientific discovery is named after the person who discovered it in the first place. In fact, as Stigler himself points out,3 the law that bears his name was actually spelled out in a similar way twenty-three years earlier by the American sociologist Robert K. Merton. In order not to fall completely into Stigler's law, many authors refer to Eq. (1) as Newcomb–Benford's law. That is the convention (by means of the acronym NBL) that we will follow in this article.

The literature on the NBL in specialized journals is vast,4 and several books on the topic are available.5,6 Papers in general or science education journals, however, are much scarcer.7–9 In this paper, apart from revisiting the NBL and illustrating it with a few examples, we construct a Markov-chain model inspired by the invariance property of the NBL under a change of scale.

The remainder of the paper is organized as follows. The connection between the NBL (and some of its generalizations) and the property of scale invariance is formulated in Sec. II. This is followed by a few examples in Sec. III. The most original part is contained in Sec. IV, where our Markov-chain model is proposed and solved. Finally, the paper is closed in Sec. V with some concluding remarks. For the interested reader, Appendices A–C contain the most technical and mathematical parts of Secs. II and IV.

II. ORIGIN OF THE LAW

When one first speaks to a friend, relative, or even a colleague about the NBL, their first reaction is usually one of skepticism. Why is the first digit not evenly distributed among the nine possible values? A simple argument shows that, if a robust distribution law exists, it cannot be the uniform distribution.

Imagine, for example, a long list of river lengths, mountain heights, and country areas. The lengths of the rivers might be in km, the heights of the mountains in m, and the areas of countries in km². They could equally be expressed in miles, feet, or acres, respectively. Will the distribution $p_d$ depend on which units we use, or on whether we mix them? It seems logical that the answer should be negative. That is, the $p_d$ distribution should be (statistically) independent of the units chosen. In other words, the $p_d$ distribution is expected to be invariant under a change of scale. The uniform distribution $p_d = 1/9$ obviously does not have that invariance property.

07 December 2023 11:31:34
851 Am. J. Phys. 89 (9), September 2021 http://aapt.org/ajp © Author(s) 2021. 851

Table I. Probabilities for the first, second, third, and fourth significant digits.

Digit d   First p_d   Second p_d^(2)   Third p_d^(3)   Fourth p_d^(4)
0         …           0.119 68         0.101 78        0.100 18
1         0.301 03    0.113 89         0.101 38        0.100 14
2         0.176 09    0.108 82         0.100 97        0.100 10
3         0.124 94    0.104 33         0.100 57        0.100 06
4         0.096 91    0.100 31         0.100 18        0.100 02
5         0.079 18    0.096 68         0.099 79        0.099 98
6         0.066 95    0.093 37         0.099 40        0.099 94
7         0.057 99    0.090 35         0.099 02        0.099 90
8         0.051 15    0.087 57         0.098 64        0.099 86
9         0.045 76    0.085 00         0.098 27        0.099 82
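The invariance argument of this section can be checked numerically. The Python sketch below is our own illustration, not code from the article. It evaluates Eq. (1) and builds two datasets:

- one whose first digits are equally represented (significands drawn uniformly in [1, 10));
- one drawn with a uniformly distributed mantissa, following the recipe described in Appendix A.

It then doubles every record and tabulates the first-digit fractions. The helper names (`nbl`, `first_digit`, `first_digit_freqs`), the seed, and the sample size are arbitrary choices.

```python
import math
import random


def nbl(d: int) -> float:
    """First-digit probability p_d of Eq. (1)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """First significant digit of a positive float, read from its scientific notation."""
    return int(f"{x:.17e}"[0])


def first_digit_freqs(records):
    """Fractions of records whose first significant digit is d = 1, ..., 9."""
    counts = [0] * 10
    for r in records:
        counts[first_digit(r)] += 1
    return [counts[d] / len(records) for d in range(1, 10)]


p_nbl = [nbl(d) for d in range(1, 10)]

rng = random.Random(2021)  # fixed seed, so the run is reproducible
N = 100_000
# Dataset A: first digits equally represented (significand uniform on [1, 10)).
uniform_set = [rng.uniform(1, 10) for _ in range(N)]
# Dataset B: mantissa uniform on [0, 1), i.e., records 10**l (the Appendix A recipe).
nbl_set = [10 ** rng.random() for _ in range(N)]

before_a = first_digit_freqs(uniform_set)
after_a = first_digit_freqs([2 * r for r in uniform_set])  # double every record of A
after_b = first_digit_freqs([2 * r for r in nbl_set])      # double every record of B

print(" d    NBL      A       2A      2B")
for d in range(1, 10):
    i = d - 1
    print(f" {d}  {p_nbl[i]:.4f}  {before_a[i]:.4f}  {after_a[i]:.4f}  {after_b[i]:.4f}")
```

Up to sampling noise, the doubled uniform dataset should give $p_1$ close to $5/9$ and $p_2 + p_3$ close to $1/9$, as argued in the next paragraph. The doubled NBL dataset should stay close to the second column of Table I.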
Suppose we start from a dataset in which all the values of the first digit are equally represented. If we multiply all the records in the dataset by 2, the records that previously started with 1, 2, 3, and 4 now start with either 2 or 3, either 4 or 5, either 6 or 7, and either 8 or 9, respectively. On the other hand, all those that started with 5, 6, 7, 8, or 9 now start with 1. All those possibilities are schematically shown in Fig. 3. Therefore, if $p_d = 1/9$ initially, then after doubling all the records $p_1 = 5/9$ and $p_2 + p_3 = p_4 + p_5 = p_6 + p_7 = p_8 + p_9 = 1/9$. The initial uniformity is thus destroyed.

Thus, the most identifying hallmark of the NBL is that it must be applied to records that have units. As Newcomb himself writes,1 "As natural numbers occur in nature, they are to be considered as the ratios of quantities." Relatively formal mathematical proofs of the NBL can be found in the literature.10–12 In Appendix A, we present a sketch of a simpler, physicist-style derivation of the law by imposing invariance under a change of scale.13

Fig. 2. Frank Benford (1883–1948).

Equation (1) can be generalized beyond the first digit. Consider the ordered string $(d_1, d_2, \ldots, d_m)$, where $d_1 \in \{1, 2, \ldots, 9\}$ and $d_i \in \{0, 1, 2, \ldots, 9\}$ if $i \geq 2$. The probability that the first $m$ digits of a record match this string is given by (see Appendix A)

$$p_{d_1, d_2, \ldots, d_m} = \log_{10}\left[1 + \left(\sum_{i=1}^{m} d_i \times 10^{m-i}\right)^{-1}\right]. \tag{2}$$

As an example, the probability that the first three digits of a record form precisely the string $(3, 1, 4)$ is $p_{3,1,4} = \log_{10}(1 + 1/314) = 0.001\,38$.

Once we have $p_{d_1, d_2, \ldots, d_m}$, we can calculate the probability $p_d^{(m)}$ that the $m$th digit is $d$, regardless of the values of the preceding $m - 1$ digits. We simply sum over all possible values of those preceding $m - 1$ digits,

$$p_d^{(m)} = \sum_{d_1=1}^{9} \sum_{d_2=0}^{9} \cdots \sum_{d_{m-1}=0}^{9} p_{d_1, d_2, \ldots, d_{m-1}, d}. \tag{3}$$

In Table I, the law for the first digit, $p_d$, is accompanied by the laws for the second, third, and fourth digits, obtained from Eqs. (2) and (3). As the digit moves to the right, the probability distribution becomes less and less disparate. By the fourth digit, every value is already within 0.0002 of the uniform value 1/10.
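Equations (2) and (3) are easy to evaluate by brute force. Note that the sum inside Eq. (2) is simply the integer written with the digits $d_1 d_2 \ldots d_m$ (314 in the example above). The Python sketch below is our own illustration, and the function names `p_string` and `p_mth_digit` are hypothetical. It computes Eq. (2) for a digit string and then sums it over the preceding digits as in Eq. (3). It should reproduce the columns of Table I and the value $p_{3,1,4} \simeq 0.001\,38$.

```python
import math
from itertools import product


def p_string(digits) -> float:
    """Eq. (2): probability that the leading digits of a record are (d1, ..., dm)."""
    m = len(digits)
    # The sum in Eq. (2) is the integer written with the digits d1 d2 ... dm.
    n = sum(d * 10 ** (m - i) for i, d in enumerate(digits, start=1))
    return math.log10(1 + 1 / n)


def p_mth_digit(m: int, d: int) -> float:
    """Eq. (3): probability that the m-th significant digit is d."""
    if m == 1:
        return p_string((d,)) if d > 0 else 0.0  # the first digit cannot be 0
    total = 0.0
    for d1 in range(1, 10):                              # d1 = 1, ..., 9
        for middle in product(range(10), repeat=m - 2):  # d2, ..., d_{m-1} = 0, ..., 9
            total += p_string((d1, *middle, d))
    return total


print(f"p_(3,1,4) = {p_string((3, 1, 4)):.5f}")
print(" d   first    second   third    fourth")
for d in range(10):
    cells = ["   -   " if (m == 1 and d == 0) else f"{p_mth_digit(m, d):.5f}" for m in range(1, 5)]
    print(f" {d}  " + "  ".join(cells))
```

The brute-force sum grows as $9 \times 10^{m-2}$ terms per digit. That is negligible up to the fourth digit shown in Table I.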
Fig. 1. Simon Newcomb (1835–1909).

In Fig. 3 we saw what happens when a dataset $\{r_n\}$ is multiplied by 2. Part of the records that started with $d = 1, 2, 3, 4$, specifically those whose first two digits are $(d, 0)$, $(d, 1)$, $(d, 2)$, $(d, 3)$, or $(d, 4)$, will start with $2d$. The rest will start with $2d + 1$. Let us call $a_d$ the fraction of records that, initially starting with $d = 1, 2, 3, 4$, start with $2d$ after doubling all the records. Thus,

$$a_d = \frac{\sum_{d_2=0}^{4} p_{d, d_2}}{p_d}, \qquad d = 1, 2, 3, 4. \tag{4}$$

If the dataset fulfills the NBL, then one has
$$a_1 = \frac{\log_{10}\frac{3}{2}}{\log_{10} 2} \simeq 0.584\,96, \qquad a_2 = \frac{\log_{10}\frac{5}{4}}{\log_{10}\frac{3}{2}} \simeq 0.550\,34, \tag{5a}$$

$$a_3 = \frac{\log_{10}\frac{7}{6}}{\log_{10}\frac{4}{3}} \simeq 0.535\,84, \qquad a_4 = \frac{\log_{10}\frac{9}{8}}{\log_{10}\frac{5}{4}} \simeq 0.527\,84. \tag{5b}$$

We will use these values later in Sec. IV.

Fig. 3. Diagram showing how the first digit changes when all the records of a dataset are doubled.

III. APPLICATIONS AND EXAMPLES

The applications and verifications of the NBL are numerous and cover topics as varied and prosaic as the study of the genome,14 atomic,15 nuclear,8,16 and particle17 physics, astrophysics,18 quantum correlations,19 toxic emissions,20 biophysics,21 medicine,22 dynamical systems,23 distinction of chaos from noise,24 statistical physics,25 scientific citations,26 tax audits,5,27 electoral28 or scientific29 frauds, gross domestic product,30 stock market,9,31 inflation data,32 climate change,33 world wide web,34 internet traffic,35 social networks,36 textbook exercises,37 image processing,38 religious activities,39 dates of birth,40 hydrology and geology,41 fragmentation processes,42 the first letters of words,43 or even COVID-19.44 Other examples can be seen in the link of Ref. 45. In this section, we will present some additional examples.

Let us start with one of the situations that Benford himself studied in his classic paper:2 city populations. We used data from the Spanish National Institute of Statistics.46 We considered the 2019 population of:

- the 165 municipalities in the province of Badajoz (plus the total population of the province of Badajoz);
- the 223 municipalities of the province of Cáceres (plus the total population of the province of Cáceres);
- the 388 municipalities of the Spanish region of Extremadura, which encompasses the provinces of Badajoz and Cáceres (plus the total populations of the provinces of Badajoz and Cáceres).

We have also considered the population (according to the 2016 census) of the 8 110 Spanish municipalities. For all these lists of records, we have analyzed the frequency of those starting with $d = 1, 2, \ldots, 9$. The results are compared in Fig. 4.

Fig. 4. Histograms showing the distribution of the first digit for (from left to right at each digit $d$) the NBL and the populations of the municipalities of the provinces of Badajoz and Cáceres, the region of Extremadura, and Spain.

There is a good general agreement between the population data and the NBL predictions. This is interesting, since it is not obvious that the distribution of the number of inhabitants of municipalities should be invariant under a change of scale.

Fig. 5. Histograms showing the distribution of the first digit for (from left to right at each digit $d$) the NBL, the distances to Earth in light-years and in parsecs from the brightest 300 stars, and the daily number of sunspots.

Let us now turn to two examples from astronomy. In the first one, we take the distance from Earth (in light-years and in parsecs) to the 300 brightest stars.47 In the second case, the data are the daily number of sunspots from 1818 to the present.48 As seen in Fig. 5, the distances between our planet and the 300 brightest stars agree reasonably well with the NBL. This holds even though the list is not excessively long (only 300 data points) and there are "local" deviations (for example, $p_6 < p_7$ in the two choices of units). This general agreement was to be expected, since the distribution of
digits in distances (which are expressed in units) is a clear example of invariance under a change of scale. However, in the case of the daily number of sunspots, quantitative (although not qualitative) differences are observed with the NBL, especially in the $d = 1$, 3, 4, and 5 cases. The series is rather long (more than 59 000 records, after excluding days without data or with zero spots). Even so, each record only has one, two, or three digits (the maximum number of sunspots was 528 and corresponded to August 26, 1870), which produces a certain bias toward records starting with 1. Moreover, the daily number of sunspots cannot be expressed in units and therefore may not be scale-invariant.

Finally, we have analyzed the prices of 1 016 items from a chain of fashion retailers49 and of 1 373 products from a chain of hypermarkets.50 The results are shown in Fig. 6. In this case, the discrepancies with the NBL are more pronounced. Although the highest frequencies occur for $d = 1$ and $d = 2$, the observed values of $p_d$ do not decrease monotonically with increasing $d$. For the fashion retailers, we have $p_4 > p_3$ and $p_9 > p_8 > p_6 > p_7$. For the prices of the chain of hypermarkets, $p_8 > p_9 > p_7 > p_6$. Prices can be expressed in euro, pound, dollar, peso, ruble, yen, …, so in principle one might think that they should satisfy the invariance under a change of scale inherent to the NBL. However, commercial and artificial pricing strategies must be superimposed on this invariance, and they generate relevant deviations with respect to the NBL.

Fig. 6. Histograms showing the distribution of the first digit for (from left to right at each digit $d$) the NBL and the prices of articles of a chain of fashion retailers and a chain of hypermarkets.

IV. A SIMPLE MODEL OF A MARKOV CHAIN BASED ON THE SCALE INVARIANCE PROPERTY OF THE NEWCOMB–BENFORD DISTRIBUTION

As already said, the NBL, Eq. (1), is invariant under a change of scale. That is, suppose we start from a dataset $\{r_n\}$ that fulfills the NBL and multiply all the records by a constant $k$ (other than a fractional power of 10). The resulting dataset $\{k r_n\}$ still fulfills the NBL.

It is tempting to conjecture that the NBL should be an attractor of this process. Suppose we started from an initial dataset $\{r_n(0)\}$ that does not fulfill the NBL, and generated new sets $\{r_n(t)\} = \{k^t r_n(0)\}$ at times $t = 1, 2, \ldots$ by multiplying each successive dataset by $k$. The conjecture would mean that the first-digit distribution $\{p_d(t)\}$ of the generated sets converges toward the NBL. However, this is not the case. Figure 7 illustrates this for $k = 2$ and an initial dataset of $10^4$ random numbers. The time-dependent distribution $\{p_d(t)\}$ exhibits a quasiperiodic oscillatory pattern around the NBL distribution without any apparent amplitude attenuation. In fact, since $2^{10} = 1024 \simeq 10^3$, the distribution nearly returns to the uniform initial one at times $t = 10, 20, 30, \ldots$. This behavior is reminiscent of the Poincaré recurrence time in mechanical systems and the associated Zermelo paradox about irreversibility.51

Fig. 7. Evolution of the first-digit distribution $p_d(t)$ (top panel) and of the ratio $p_d(t)/p_d^*$ (bottom panel, where a vertical shift $d - 1$ has been applied for better clarity), $\{p_d^*\}$ being the NBL distribution, when starting from a dataset of $10^4$ random records uniformly distributed between 0 and 1 and doubling the records at each time step. Note the overlap of some of the points in the top panel.

In the transformation $\{r_n(t)\} \to \{r_n(t+1) = 2 r_n(t)\}$, the first-digit distribution changes, according to Fig. 3, as

$$p_1(t+1) = p_5(t) + p_6(t) + p_7(t) + p_8(t) + p_9(t), \tag{6a}$$

$$p_2(t+1) = a_1 p_1(t), \qquad p_3(t+1) = (1 - a_1) p_1(t), \tag{6b}$$

$$p_4(t+1) = a_2 p_2(t), \qquad p_5(t+1) = (1 - a_2) p_2(t), \tag{6c}$$

$$p_6(t+1) = a_3 p_3(t), \qquad p_7(t+1) = (1 - a_3) p_3(t), \tag{6d}$$

$$p_8(t+1) = a_4 p_4(t), \qquad p_9(t+1) = (1 - a_4) p_4(t), \tag{6e}$$

where the fractions $a_d$ ($d = 1, 2, 3, 4$) are defined by Eq. (4). Note that, in general, the fractions $a_d$ depend on the
distributions of the first digit, $p_d(t)$, and of the first two digits, $p_{d,d_2}(t)$, of the dataset $\{r_n(t)\}$ [see Eq. (4)]. As a consequence, (i) Eqs. (6) do not form a closed set and (ii) the parameters $a_d$ depend on $t$.

At this point, we construct a simplified dynamical model in which the four unknown and time-dependent parameters $a_d$ in Eqs. (6) are replaced by fixed constants. In matrix notation,

$$\mathbf{p}(t+1) = \mathsf{W} \cdot \mathbf{p}(t), \tag{7}$$

where $\mathbf{p}(t) = (p_1(t), p_2(t), \ldots, p_9(t))^\dagger$ is a column vector ($\dagger$ denotes transposition) and

$$\mathsf{W} = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
a_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 - a_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & a_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 - a_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & a_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 - a_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a_4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 - a_4 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{8}$$

is a fixed square matrix. Equation (7) fits the canonical description of Markov chains,52 where $\mathsf{W}$ is the so-called transition matrix and $\{a_d\}$ correspond to transition probabilities. We may therefore forget that $\{p_d(t)\}$ is the first-digit distribution of the dataset $\{r_n(t)\}$. Instead, we look at the numerals $1, 2, \ldots, 9$ as labels of nine distinct internal states of a certain physical system, which follow each other in the order sketched in Fig. 3.

Note that the matrix $\mathsf{W}$ is singular, that is, not invertible (its last five columns are identical). This implies the irreversible character of the transition $\{p_d(t)\} \to \{p_d(t+1)\}$. The stationary solution $\mathbf{p}^*$ to Eq. (7) satisfies the condition $\mathbf{p}^* = \mathsf{W} \cdot \mathbf{p}^*$. Such a stationary solution will be an attractor of our Markov process if $\lim_{t\to\infty} \mathbf{p}(t) = \mathbf{p}^*$ for any initial condition $\mathbf{p}(0)$. This property, along with the exact solution of Eq. (7), is proved in Appendix B. If we choose for $a_d$ the values given by Eqs. (5), the stationary solution coincides exactly with the NBL, as could be expected. For instance, with $p_1^* = \log_{10} 2$, Eq. (6b) gives $p_2^* = a_1 p_1^* = \log_{10}\frac{3}{2}$ and $p_3^* = (1 - a_1) p_1^* = \log_{10}\frac{4}{3}$, in agreement with Eq. (1). This will be the choice made in the rest of this section.

To illustrate the irreversible evolution of $\mathbf{p}(t)$ toward $\mathbf{p}^*$, we consider two different initial conditions. First, we start from a uniform distribution, that is, $p_d(0) = 1/9$. The result is shown in Fig. 8. The evolution is oscillatory (see Appendix B for an explanation) and the oscillations are rapidly damped until the stationary solution is reached. As a second example, we take an inverted NBL initial distribution, that is, $p_d(0) = p_{10-d}^*$. Initially, then, state 9 is the most probable and state 1 is the least probable. In this second example, as shown in Fig. 9, the initial oscillations have a larger amplitude. As before, the stationary distribution is practically reached after a few iterations.

Comparison between Figs. 7 and 8 reveals the main difference between our Markov model and the non-Markovian evolution of $\{p_d(t)\}$ in a real dataset. The oscillation amplitudes are attenuated in the model but not in the original framework.

It is convenient to characterize the evolution of the probabilities $\{p_d(t)\}$ obeying the Markov process [Eq. (7)] toward the attractor $\{p_d^*\}$ by means of a single parameter. This parameter should also evolve monotonically, thus representing the irreversibility of the evolution. These properties are expected to be verified by the Kullback–Leibler divergence,53 which in our case is defined as

$$D_{\mathrm{KL}}(t) = \sum_{d=1}^{9} p_d(t) \ln \frac{p_d(t)}{p_d^*}. \tag{9}$$

This quantity represents the opposite of the Shannon entropy54 of $\{p_d(t)\}$, except that it is measured with respect to the stationary distribution $\{p_d^*\}$. In the context of our Markov-chain model, $D_{\mathrm{KL}}$ is related to information theory. On the other hand, the mathematical structure of $D_{\mathrm{KL}}$ can be extended to physical systems, thus providing a statistical-mechanical basis for the thermodynamic entropy,54,55 as exemplified below.
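A minimal sketch of the Markov-chain model is given below; it is our own illustration in plain Python, not the authors' code. It assumes the constant parameters $a_d$ of Eqs. (5), written in the equivalent form $a_d = \log_{10}[1 + 1/(2d)]/\log_{10}(1 + 1/d)$, which follows from Eq. (4) evaluated on NBL data.

- It iterates Eqs. (6a)–(6e) from the uniform and inverted initial distributions.
- It evaluates $D_{\mathrm{KL}}(t)$ of Eq. (9) along each trajectory.
- For contrast, it doubles a real dataset of $10^4$ uniform random records, as in Fig. 7.

The function names, seed, and step counts are arbitrary choices.

```python
import math
import random

P_STAR = [math.log10(1 + 1 / d) for d in range(1, 10)]  # NBL, Eq. (1): p*_1, ..., p*_9

# Eqs. (5) in the equivalent form a_d = log10(1 + 1/(2d)) / log10(1 + 1/d)
A = {d: math.log10(1 + 1 / (2 * d)) / math.log10(1 + 1 / d) for d in (1, 2, 3, 4)}


def step(p):
    """One step p(t+1) = W p(t) of Eq. (7), written out as Eqs. (6a)-(6e).

    p[0] holds the probability of state 1, ..., p[8] that of state 9.
    """
    p1, p2, p3, p4, p5, p6, p7, p8, p9 = p
    return [
        p5 + p6 + p7 + p8 + p9,         # (6a)
        A[1] * p1, (1 - A[1]) * p1,     # (6b)
        A[2] * p2, (1 - A[2]) * p2,     # (6c)
        A[3] * p3, (1 - A[3]) * p3,     # (6d)
        A[4] * p4, (1 - A[4]) * p4,     # (6e)
    ]


def kl(p):
    """Kullback-Leibler divergence of Eq. (9); states with p_d(t) = 0 contribute nothing."""
    return sum(x * math.log(x / s) for x, s in zip(p, P_STAR) if x > 0)


def evolve(p0, steps):
    """Trajectory [p(0), p(1), ..., p(steps)] of the Markov chain."""
    traj = [list(p0)]
    for _ in range(steps):
        traj.append(step(traj[-1]))
    return traj


def doubled_freqs(records, t):
    """First-digit fractions of the real dataset {2**t * r_n}, the non-Markovian case of Fig. 7."""
    counts = [0] * 10
    for r in records:
        counts[int(f"{r * 2 ** t:.17e}"[0])] += 1  # leading digit from scientific notation
    return [counts[d] / len(records) for d in range(1, 10)]


uniform0 = [1 / 9] * 9
inverted0 = P_STAR[::-1]  # p_d(0) = p*_{10-d}
for name, p0 in (("uniform", uniform0), ("inverted", inverted0)):
    D = [kl(p) for p in evolve(p0, 30)]
    print(f"{name:8s} D_KL(0) = {D[0]:.3e}  D_KL(10) = {D[10]:.3e}  D_KL(30) = {D[30]:.3e}")

rng = random.Random(1)
records = [rng.random() for _ in range(10_000)]  # uniform in [0, 1), as in Fig. 7
print("real dataset, D_KL after 30 doublings:", round(kl(doubled_freqs(records, 30)), 3))
```

For both Markov initial conditions, $D_{\mathrm{KL}}(t)$ should decrease at every step, consistent with Eq. (10) below. The doubled real dataset, whose oscillations are not damped, keeps a sizable divergence from the NBL. Writing the step out as Eqs. (6) avoids building the sparse $9 \times 9$ matrix of Eq. (8) explicitly.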
Fig. 8. Evolution of the ratio $p_d(t)/p_d^*$ (where a vertical shift $d - 1$ has been applied for better clarity), when starting from a uniform initial distribution $p_d(0) = 1/9$, according to our Markov-chain model, Eq. (7).

Fig. 9. Evolution of the ratio $p_d(t)/p_d^*$ (where a vertical shift $d - 1$ has been applied for better clarity), when starting from an inverted initial distribution $p_d(0) = p_{10-d}^*$, according to our Markov-chain model, Eq. (7).

Figure 10 shows the evolution of $D_{\mathrm{KL}}(t)$ for the same initial conditions as in Figs. 8 and 9. Both cases confirm the desired monotonic evolution of $D_{\mathrm{KL}}(t)$. Also, the asymptotic decay $D_{\mathrm{KL}}(t) \to 0$ is essentially exponential, with a rate independent of the initial probability distribution. This is proved in Appendix C, as is the monotonicity property,

$$D_{\mathrm{KL}}(t+1) \leq D_{\mathrm{KL}}(t), \tag{10}$$

where the equality cannot hold at two successive times unless $p_d(t) = p_d^*$, in which case $D_{\mathrm{KL}} = 0$.

Fig. 10. Evolution of the Kullback–Leibler divergence $D_{\mathrm{KL}}(t)$ (in logarithmic scale), starting from the uniform initial distribution $p_d(0) = 1/9$ and from the inverted initial distribution $p_d(0) = p_{10-d}^*$, according to our Markov-chain model, Eq. (7). The dashed line is proportional to $|\alpha_{2,3}|^{2t}$ (see Appendix C).

Thus, we can say that the NBL in our Markov model plays a role analogous to the equilibrium state in thermodynamics and statistical mechanics. In this analogy, the "entropy" of the out-of-equilibrium system would be (except for a constant) $S = -D_{\mathrm{KL}}$, so that $S$ increases irreversibly in the evolution toward equilibrium.

This analogy is put forward in Table II. There we compare a system described by the Markov chain [Eq. (7)] to a classical dilute gas, as an example of a well-known physical system. In both cases, a statistical description is constructed by introducing the appropriate probability distribution: the velocity distribution function (gas) and the probability of the internal state $d$ (Markov chain). While $f(\mathbf{v}, t)$ is continuous in both $\mathbf{v}$ and $t$, $p_d(t)$ is discrete in $d$ and $t$. The evolution equation for the probability distribution function is the Boltzmann equation (gas, under the molecular chaos ansatz56) and Eq. (7) (Markov chain). The equilibrium state is characterized by the Maxwell–Boltzmann distribution57 $f_{\mathrm{MB}}(\mathbf{v})$ (gas) and the NBL distribution $p_d^*$ (Markov chain).

In both classes of systems, the nonequilibrium entropy functional $S(t)$ can be defined, up to a constant, as the opposite of the Kullback–Leibler divergence53,58 of the equilibrium distribution from the time-dependent one. Boltzmann's celebrated H-theorem56,59 (gas) and the result presented in Eq. (10) (Markov chain) show that the entropy $S(t)$ never decreases, irreversibly evolving toward its equilibrium value.

Table II. Analogy between a classical dilute gas (as a prototypical physical system) and a system described by the Markov chain, Eq. (7). In the expression of the Maxwell–Boltzmann distribution $f_{\mathrm{MB}}(\mathbf{v})$, $m$ is the mass of a molecule, $T$ is the gas temperature, and $k_B$ is the Boltzmann constant. In the Boltzmann equation, $J[\mathbf{v}|f(t), f(t)]$ is the collision operator.

Feature | Dilute gas | Markov chain
Probability distribution | Velocity distribution function: $f(\mathbf{v}, t)$ | Probability of the internal state $d$: $p_d(t)$
Normalization | $\int d^3\mathbf{v}\, f(\mathbf{v}, t) = 1$ | $\sum_{d=1}^{9} p_d(t) = 1$
Evolution equation | Boltzmann equation: $\partial f(\mathbf{v}, t)/\partial t = J[\mathbf{v}|f(t), f(t)]$ | $\mathbf{p}(t+1) = \mathsf{W} \cdot \mathbf{p}(t)$
Equilibrium state | Maxwell–Boltzmann: $f_{\mathrm{MB}}(\mathbf{v}) = (m/2\pi k_B T)^{3/2} e^{-mv^2/2k_B T}$ | NBL: $p_d^* = \log_{10}(1 + d^{-1})$
Entropy functional | $S(t) = -\int d^3\mathbf{v}\, f(\mathbf{v}, t) \ln[f(\mathbf{v}, t)/f_{\mathrm{MB}}(\mathbf{v})]$ | $S(t) = -\sum_{d=1}^{9} p_d(t) \ln[p_d(t)/p_d^*]$
Irreversibility equation | $dS(t)/dt \geq 0$ | $S(t+1) - S(t) \geq 0$

V. CONCLUDING REMARKS

One of the main goals of this article was to show something that runs contrary to first intuition. The first significant digit $d$ of a dataset extracted from measurements or observations of the real world is not evenly distributed among the nine possible values. Typically, the frequency is highest for $d = 1$ and decreases as $d$ increases. The NBL, Eq. (1), gives a mathematical expression to this empirical fact. The law need not be fulfilled exactly in every dataset, however. Deviations arise from statistical fluctuations (as with populations in Fig. 4 and with distances in Fig. 5), limitations of the sample (as in the sunspot case in Fig. 5), or an artificial bias (as in the case of prices of articles in Fig. 6). Except for unavoidable statistical fluctuations, the law is expected to hold in datasets in which quantities are measured in units. In such datasets, the distribution of the first digit is independent of the units chosen, owing to its invariance under a change of scale. More generally, the NBL is satisfied when the mantissa of the logarithms of the considered data is uniformly distributed. This is why lists not directly related to physical quantities, such as Fibonacci numbers or powers of 2, also satisfy the NBL.

Inspired by the scale invariance property of the NBL, we have constructed a Markov-chain model for a nine-state system that evolves in time according to the scheme depicted in Fig. 3. The initial-value model can be
exactly solved, and the solution shows an irreversible relaxation toward the NBL probability distribution. Moreover, we have proved that the associated (relative) entropy monotonically increases, in analogy with the second law of thermodynamics in physical systems.

Until the 1970s (when scientific pocket calculators began to be used), physicists used tables of logarithms (or their application in slide rules) for small everyday scientific calculations. Those calculations are nowadays performed on pocket calculators, cellular phones, or personal computers with a wide variety of existing mathematical programs. The data manipulated in physics are extracted from "real" situations, such as experiments, models, physical constants, …. As a tribute to Newcomb and Benford and their logarithmic tables, we can therefore conclude that keyboard button 1 will show the greatest wear and tear, while button 9 will be the least used.

ACKNOWLEDGMENTS

A.S. acknowledges financial support from Grant No. FIS2016-76359-P/AEI/10.13039/501100011033 and the Junta de Extremadura (Spain) through Grant No. GR18079, both partially financed by Fondo Europeo de Desarrollo Regional funds.

APPENDIX A: DERIVATION OF THE NBL AND SOME GENERALIZATIONS

Consider a long list of records $\{r_n\}$ that, without loss of generality for the matter at hand, we will assume positive. Each record can be written in the form $r_n = x_n \times 10^{k_n}$, where $k_n$ is an integer and $x_n \in [1, 10)$ is the significand. Obviously, it is the distribution of the significand that is relevant for the NBL. The significand $x_n$ is directly related to the mantissa $\ell_n$ of the decimal logarithm of $r_n$, that is, $\log_{10} r_n = k_n + \ell_n$, where $\ell_n = \log_{10} x_n \in [0, 1)$.

Let $P_x(x)\,dx$ be the probability that the significand lies between $x$ and $x + dx$, so that the normalization condition is $\int_1^{10} dx\, P_x(x) = 1$. The probability that the first significant digit of any record is $d$ is then given by the integral,

$$p_d = \int_d^{d+1} dx\, P_x(x). \tag{A1}$$

More generally, consider an ordered string $(d_1, d_2, \ldots, d_m)$ made up of the first $m$ digits, where $d_1 \in \{1, 2, \ldots, 9\}$ and $d_i \in \{0, 1, 2, \ldots, 9\}$ if $i \geq 2$. The records whose first $m$ digits match this string are those whose significand $x$ satisfies
$$d_1 + d_2 \times 10^{-1} + \cdots + d_m \times 10^{-(m-1)} \leq x < d_1 + d_2 \times 10^{-1} + \cdots + (d_m + 1) \times 10^{-(m-1)}.$$
Consequently,

$$p_{d_1, d_2, \ldots, d_m} = \int_{\sum_{i=1}^{m} d_i 10^{-(i-1)}}^{10^{-(m-1)} + \sum_{i=1}^{m} d_i 10^{-(i-1)}} dx\, P_x(x). \tag{A2}$$

If the distribution $P_x(x)$ is actually invariant under a change of scale, then $P_x(kx) = f(k) P_x(x)$ for arbitrary $k$. Take into account the normalization condition in the form $\int_k^{10k} d(kx)\, P_x(kx) = 1$. It follows that necessarily $f(k) = k^{-1}$, that is, $P_x(kx) = k^{-1} P_x(x)$. Differentiating both sides of the equation with respect to $k$ and then taking $k = 1$, we easily obtain $x P_x'(x) = -P_x(x)$. According to Euler's theorem, this implies that $P_x(x)$ is a homogeneous function of degree $-1$, that is, $P_x(x) \propto x^{-1}$. The constant of proportionality is obtained from the normalization condition, finally yielding

$$P_x(x) = \frac{x^{-1}}{\ln 10}, \qquad 1 \leq x < 10. \tag{A3}$$

This is the unique distribution of significands that is invariant under a change of scale. From Eq. (A3), and applying Eqs. (A1) and (A2), it is straightforward to obtain the NBL, Eq. (1), and its generalization, Eq. (2). For instance, Eq. (A1) gives $p_d = (\ln 10)^{-1}\int_d^{d+1} dx\, x^{-1} = \log_{10}[(d+1)/d]$. As a consistency test of Eq. (2), it is easy to check that

$$\sum_{d_m=0}^{9} p_{d_1, d_2, \ldots, d_m} = p_{d_1, d_2, \ldots, d_{m-1}} = \log_{10}\left[1 + \left(\sum_{i=1}^{m-1} d_i \times 10^{m-1-i}\right)^{-1}\right]. \tag{A4}$$

It is interesting to note that the inverse law for the significand, Eq. (A3), implies a uniform law for the mantissa (and vice versa). Let $P_\ell(\ell)\,d\ell$ be the probability that the mantissa lies between $\ell$ and $\ell + d\ell$. Since $P_\ell(\ell)\,d\ell = P_x(x)\,dx$ and $d\ell = (x^{-1}/\ln 10)\,dx$, Eq. (A3) gives us $P_\ell(\ell) = 1$. In Newcomb's words,1 "The law of probability of the occurrence of numbers is such that all mantissæ of their logarithms are equally probable." An immediate consequence concerns a random variable $\ell$ uniformly distributed between 0 and 1: the random variable $x = 10^\ell$ fulfills the NBL. This is in fact an easy way to generate a list of random records matching the NBL.

There are deterministic series that also satisfy the NBL. Consider the series $\{r_n = a\alpha^n + b\beta^n;\ n = 1, 2, \ldots\}$, where $a$, $b$, $\alpha$, and $\beta$ are real numbers with $a > 0$ and $\alpha > |\beta| \geq 0$, and where $\log_{10}\alpha$ is irrational.12 In that case, $\log_{10} r_n \to n \log_{10}\alpha + \log_{10} a$ for $n \to \infty$, which has a uniformly distributed mantissa, so $\{r_n\}$ satisfies the NBL. That includes, for example, the series $\{2^n\}$ and $\{3^n\}$. It also includes $\{F_n\}$, where $F_n = \frac{1}{\sqrt{5}}[\varphi^n - (-\varphi^{-1})^n]$ are the Fibonacci numbers, $\varphi \equiv \frac{1}{2}(1 + \sqrt{5})$ being the golden ratio. Similarly, the series $\{n!\}$ also verifies the law.12

Another important property is that if a list $\{r_n\}$ fulfills the NBL, so does the list $\{r_n^\alpha\}$, $\alpha$ being a real number. Indeed, let $\log_{10} r_n = k_n + \ell_n$ with the mantissa $\ell_n$ uniformly distributed. Then the mantissa of $\log_{10} r_n^\alpha = \alpha(k_n + \ell_n)$ is also evenly distributed. This is directly related to the fact that the NBL is invariant not only under a change of scale but also under a change of base,11 as would be expected, given the artificial character of the decimal base. To see it, let us assume a base $b$ and build the list $\{r_n^\alpha\}$, with $\alpha = \log_{10} b$, from a list $\{r_n\}$ that fulfills the NBL. In that case, $r_n^\alpha = y_n \times b^{k_n}$, where $y_n = x_n^\alpha \in [1, b)$ is the significand of $r_n^\alpha$ in the base $b$. The probability distribution $P_y(y)$ is related to the distribution $P_x(x)$ through the equation $P_y(y)\,dy = P_x(x)\,dx$, so that Eq. (A3) leads to

$$P_y(y) = \frac{y^{-1}}{\ln b}, \qquad 1 \leq y < b. \tag{A5}$$

Therefore, the NBL in an arbitrary base $b$ takes the form,
$$p_d = \log_b\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \dots, b - 1. \tag{A6}$$

APPENDIX B: EXACT SOLUTION OF THE MARKOV-CHAIN MODEL

Note first that Eqs. (6) satisfy the normalization condition $\sum_{d=1}^{9} p_d(t+1) = \sum_{d=1}^{9} p_d(t) = 1$. Therefore, only eight of the probabilities $\{p_d;\, d = 1, 2, \dots, 9\}$ are linearly independent, so we can eliminate one of them. If we choose $p_9 = 1 - \sum_{d=1}^{8} p_d$, Eq. (6a) gives us $p_1(t+1) = 1 - p_1(t) - p_2(t) - p_3(t) - p_4(t)$. Although it is not strictly necessary, it is mathematically convenient to split the column vector $\mathbf{p}(t)$ into the subsets $\mathbf{p}_{\rm I}(t) \equiv (p_1(t), p_2(t), p_3(t), p_4(t))^\dagger$, $\mathbf{p}_{\rm II}(t) \equiv (p_5(t), p_6(t), p_7(t), p_8(t))^\dagger$, and $p_9(t)$. As a consequence, the matrix equation (7) can be decomposed into

$$\mathbf{p}_{\rm I}(t+1) = \mathbf{q} + \mathsf{A}\cdot\mathbf{p}_{\rm I}(t), \qquad \mathbf{p}_{\rm II}(t+1) = \mathsf{B}\cdot\mathbf{p}_{\rm I}(t), \tag{B1}$$

where $\mathbf{q} = (1, 0, 0, 0)^\dagger$ and

$$\mathsf{A} = \begin{pmatrix} -1 & -1 & -1 & -1 \\ a_1 & 0 & 0 & 0 \\ 1 - a_1 & 0 & 0 & 0 \\ 0 & a_2 & 0 & 0 \end{pmatrix}, \tag{B2a}$$

$$\mathsf{B} = \begin{pmatrix} 0 & 1 - a_2 & 0 & 0 \\ 0 & 0 & a_3 & 0 \\ 0 & 0 & 1 - a_3 & 0 \\ 0 & 0 & 0 & a_4 \end{pmatrix}. \tag{B2b}$$

In this way, one deals with two independent $4 \times 4$ matrices instead of the $9 \times 9$ transition matrix introduced in Eq. (8). Moreover, only the equation for the projected vector $\mathbf{p}_{\rm I}$ in Eq. (B1) needs to be solved. Once that solution is obtained, the solution for the complementary projected vector $\mathbf{p}_{\rm II}$ is straightforward.

By recursion, it is easy to check that the solution to the initial-value problem posed by Eq. (B1) is

$$\mathbf{p}_{\rm I}(t) = \sum_{n=0}^{t-1} \mathsf{A}^n\cdot\mathbf{q} + \mathsf{A}^t\cdot\mathbf{p}_{\rm I}(0) = \left(\mathsf{I} - \mathsf{A}^t\right)\cdot\mathbf{p}_{\rm I}^* + \mathsf{A}^t\cdot\mathbf{p}_{\rm I}(0), \tag{B3a}$$

$$\mathbf{p}_{\rm II}(t) = \mathsf{B}\cdot\left(\mathsf{I} - \mathsf{A}^{t-1}\right)\cdot\mathbf{p}_{\rm I}^* + \mathsf{B}\cdot\mathsf{A}^{t-1}\cdot\mathbf{p}_{\rm I}(0), \tag{B3b}$$

where $\mathsf{I}$ is the identity matrix and

$$\mathbf{p}_{\rm I}^* = \left(\mathsf{I} - \mathsf{A}\right)^{-1}\cdot\mathbf{q} = \frac{1}{3 + a_1 a_2}\begin{pmatrix} 1 \\ a_1 \\ 1 - a_1 \\ a_1 a_2 \end{pmatrix}, \tag{B4a}$$

$$\mathbf{p}_{\rm II}^* = \mathsf{B}\cdot\mathbf{p}_{\rm I}^* = \frac{1}{3 + a_1 a_2}\begin{pmatrix} a_1(1 - a_2) \\ (1 - a_1)a_3 \\ (1 - a_1)(1 - a_3) \\ a_1 a_2 a_4 \end{pmatrix} \tag{B4b}$$

is the unique stationary solution. From the normalization condition, one has $p_9^* = a_1 a_2 (1 - a_4)/(3 + a_1 a_2)$. Note that, as seen from Eqs. (B3), the initial values $\mathbf{p}_{\rm II}(0)$ do not influence the evolution of either $\mathbf{p}_{\rm I}(t)$ or $\mathbf{p}_{\rm II}(t)$.

The stationary solution will be an attractor if $\lim_{t\to\infty} \mathbf{p}_{\rm I}(t) = \mathbf{p}_{\rm I}^*$ and $\lim_{t\to\infty} \mathbf{p}_{\rm II}(t) = \mathbf{p}_{\rm II}^*$ for any initial condition $\mathbf{p}_{\rm I}(0)$. According to Eqs. (B3), this implies $\lim_{t\to\infty} \mathsf{A}^t = 0$.

To check the above condition, let us obtain the four eigenvalues $\{\alpha_i;\, i = 0, 1, 2, 3\}$ of $\mathsf{A}$. It is easy to see that the characteristic equation is $\alpha\left(a_1 a_2 + \alpha + \alpha^2 + \alpha^3\right) = 0$. Therefore, the eigenvalues are $\alpha_0 = 0$ and

$$\alpha_1 = -\frac{1}{3}\left(1 + \frac{2}{\beta} - \beta\right), \tag{B5a}$$

$$\alpha_{2,3} = -\frac{1}{3}\left(1 - \frac{1 \pm \imath\sqrt{3}}{\beta} + \frac{1 \mp \imath\sqrt{3}}{2}\,\beta\right), \tag{B5b}$$

where $\imath$ is the imaginary unit and

$$\beta = \left[\frac{3}{2}\left(\frac{7}{3} - 9 a_1 a_2 + \sqrt{9 - 42 a_1 a_2 + 81 a_1^2 a_2^2}\right)\right]^{1/3}. \tag{B6}$$

Consequently,

$$\mathsf{A}^t = \mathsf{U}\cdot\mathsf{D}^t\cdot\mathsf{U}^{-1}, \quad t = 1, 2, \dots, \tag{B7}$$

where

$$\mathsf{U} = \begin{pmatrix} 0 & \frac{\alpha_1^2}{a_1 a_2} & \frac{\alpha_2^2}{a_1 a_2} & \frac{\alpha_3^2}{a_1 a_2} \\ 0 & \frac{\alpha_1}{a_2} & \frac{\alpha_2}{a_2} & \frac{\alpha_3}{a_2} \\ 1 & \frac{(1 - a_1)\alpha_1}{a_1 a_2} & \frac{(1 - a_1)\alpha_2}{a_1 a_2} & \frac{(1 - a_1)\alpha_3}{a_1 a_2} \\ -1 & 1 & 1 & 1 \end{pmatrix}, \tag{B8}$$

$$\mathsf{D}^t = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \alpha_1^t & 0 & 0 \\ 0 & 0 & \alpha_2^t & 0 \\ 0 & 0 & 0 & \alpha_3^t \end{pmatrix}, \quad t = 1, 2, \dots. \tag{B9}$$

From Eqs. (B5), it can be verified that $|\alpha_1| < |\alpha_{2,3}| < 1$ for $0 < a_1 a_2 < 1$, so that $\lim_{t\to\infty} \alpha_i^t = 0$ for $i = 1, 2, 3$. Therefore, $\lim_{t\to\infty} \mathsf{D}^t = 0$ and hence $\lim_{t\to\infty} \mathsf{A}^t = 0$. This proves that the stationary distribution $\mathbf{p}^*$ is an attractor of the dynamics. Moreover, since both the real eigenvalue ($\alpha_1$) and the real part of the complex eigenvalues ($\alpha_{2,3}$) are negative, the evolution of each $p_d(t)$ toward $p_d^*$ is oscillatory, as can be observed in Figs. 8 and 9. Note that, in the analysis above, the eigenvalues 0 (double), $1 - a_3$, and $a_4$ of the matrix $\mathsf{B}$ are not needed.

If we choose $a_d = \frac{1}{2}$, then Eqs. (B4) provide the stationary solution $p_1^* = \frac{4}{13} \simeq 0.308$, $p_2^* = p_3^* = \frac{2}{13} \simeq 0.154$, $p_4^* = p_5^* = p_6^* = p_7^* = \frac{1}{13} \simeq 0.077$, and $p_8^* = p_9^* = \frac{1}{26} \simeq 0.038$. These values are not too different from those of the true NBL, as can be seen from Table I. On the other hand, the most natural choice is provided by Eqs. (5), in which case the stationary solution coincides exactly with the NBL.
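The closed-form results of this appendix lend themselves to a quick numerical check. The Python sketch below assumes NumPy is available and iterates the nine-state chain directly. It uses the update rules encoded in Eqs. (B1) and (B2): $p_1(t+1) = p_5(t) + \cdots + p_9(t)$, $p_{2d}(t+1) = a_d p_d(t)$, and $p_{2d+1}(t+1) = (1 - a_d) p_d(t)$.

Two choices are assumptions made for illustration:
- The branching probabilities are taken as $a_d = p_{2d}/p_d$, evaluated on the NBL values. This is the choice that makes the NBL a stationary solution.
- The chain starts from a uniform distribution.

The sketch then compares $\mathbf{p}_{\rm I}^*$ from Eq. (B4a) with the NBL, and the eigenvalues of $\mathsf{A}$ with Eqs. (B5) and (B6). It also tracks the Kullback–Leibler divergence of Eq. (9) along the trajectory. The helper names (`nbl`, `step`, `kl`) are illustrative and are not taken from the article.

```python
import math

import numpy as np


def nbl(d: int) -> float:
    """First-digit Newcomb-Benford probability log10(1 + 1/d)."""
    return math.log10(1.0 + 1.0 / d)


# Assumption: a_d = p_{2d} / p_d evaluated on the NBL, the choice for which
# the NBL itself is a stationary solution of the chain.
a = {d: nbl(2 * d) / nbl(d) for d in range(1, 5)}
p_star = np.array([nbl(d) for d in range(1, 10)])  # p_star[d-1] = p_d*


def step(p: np.ndarray) -> np.ndarray:
    """One time step of the chain; p[d-1] holds p_d(t) for d = 1..9."""
    new = np.empty(9)
    new[0] = p[4:].sum()                       # p_1(t+1) = p_5 + ... + p_9
    for d in range(1, 5):
        new[2 * d - 1] = a[d] * p[d - 1]       # p_{2d}(t+1) = a_d p_d(t)
        new[2 * d] = (1.0 - a[d]) * p[d - 1]   # p_{2d+1}(t+1) = (1 - a_d) p_d(t)
    return new


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Kullback-Leibler divergence sum_d p_d ln(p_d / q_d), taking 0 ln 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Matrix A and vector q of Eqs. (B1) and (B2a).
A = np.array([[-1.0, -1.0, -1.0, -1.0],
              [a[1], 0.0, 0.0, 0.0],
              [1.0 - a[1], 0.0, 0.0, 0.0],
              [0.0, a[2], 0.0, 0.0]])
q = np.array([1.0, 0.0, 0.0, 0.0])
pI_star = np.linalg.solve(np.eye(4) - A, q)    # Eq. (B4a)

# Closed-form eigenvalues from Eqs. (B5) and (B6).
c = a[1] * a[2]
beta = (1.5 * (7.0 / 3.0 - 9.0 * c + math.sqrt(9.0 - 42.0 * c + 81.0 * c**2))) ** (1.0 / 3.0)
alpha1 = -(1.0 + 2.0 / beta - beta) / 3.0
alpha2 = -(1.0 - (1.0 + 1j * math.sqrt(3.0)) / beta
           + (1.0 - 1j * math.sqrt(3.0)) * beta / 2.0) / 3.0

# Numerical moduli, sorted: [|alpha_0|, |alpha_1|, |alpha_2|, |alpha_3|].
moduli = sorted(np.abs(np.linalg.eigvals(A)))

# Iterate from an arbitrary (here uniform) initial distribution.
p = np.full(9, 1.0 / 9.0)
divergences = [kl(p, p_star)]
for _ in range(60):
    p = step(p)
    divergences.append(kl(p, p_star))

print(f"|alpha_1| = {moduli[1]:.4f}, |alpha_2,3| = {moduli[2]:.4f}")
# prints: |alpha_1| = 0.4261, |alpha_2,3| = 0.8692
```

With this choice of $a_d$, the computed moduli match the values $|\alpha_1| = 0.4261$ and $|\alpha_{2,3}| = 0.8692$ used in the asymptotic analysis of the Kullback–Leibler divergence. In addition, the list `divergences` never increases from one step to the next, consistent with Eq. (10). Replacing `a` with `{d: 0.5 for d in range(1, 5)}` should instead drive `p` toward the values 4/13, 2/13, 1/13, and 1/26 obtained for $a_d = \frac{1}{2}$.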
APPENDIX C: PROPERTIES OF THE KULLBACK–LEIBLER DIVERGENCE IN THE MARKOV MODEL

The Kullback–Leibler divergence is defined by Eq. (9). In order to analyze its asymptotic decay, let us consider times long enough that the deviations $\delta p_d(t) \equiv p_d(t) - p_d^*$ can be considered small. In this regime, we can expand Eq. (9) in a power series and retain the dominant terms. The result is

$$D_{\rm KL}(t) \approx \frac{1}{2}\sum_{d=1}^{9} \frac{[\delta p_d(t)]^2}{p_d^*}. \tag{C1}$$

On the other hand, for long enough times, $|\alpha_1|^t \ll |\alpha_{2,3}|^t$ (note that $|\alpha_1| = 0.4261$ and $|\alpha_{2,3}| = 0.8692$), so that, according to Eqs. (B3), $\delta p_d(t) \sim |\alpha_{2,3}|^t$. Thus,

$$D_{\rm KL}(t) \sim |\alpha_{2,3}|^{2t} = 10^{2t\log_{10}|\alpha_{2,3}|}. \tag{C2}$$

This asymptotic behavior is represented in Fig. 10.

Finally, let us prove Eq. (10). According to Eq. (9),

$$D_{\rm KL}(t+1) = p_1(t+1)\ln\frac{p_1(t+1)}{p_1^*} + \sum_{d=1}^{4}\left[p_{2d}(t+1)\ln\frac{p_{2d}(t+1)}{p_{2d}^*} + p_{2d+1}(t+1)\ln\frac{p_{2d+1}(t+1)}{p_{2d+1}^*}\right]. \tag{C3}$$

Evolution equations (6) and the stationarity condition $\mathbf{p}^* = \mathsf{W}\cdot\mathbf{p}^*$ can be rewritten as

$$p_1(t+1) = \sum_{d=5}^{9} p_d(t), \qquad p_1^* = \sum_{d=5}^{9} p_d^*, \tag{C4a}$$

$$\begin{Bmatrix} p_{2d}(t+1) \\ p_{2d+1}(t+1) \end{Bmatrix} = \begin{Bmatrix} a_d \\ 1 - a_d \end{Bmatrix} p_d(t), \quad d = 1, 2, 3, 4, \tag{C4b}$$

$$\begin{Bmatrix} p_{2d}^* \\ p_{2d+1}^* \end{Bmatrix} = \begin{Bmatrix} a_d \\ 1 - a_d \end{Bmatrix} p_d^*, \quad d = 1, 2, 3, 4. \tag{C4c}$$

Inserting Eqs. (C4) into Eq. (C3), one gets

$$D_{\rm KL}(t+1) = \sum_{d=5}^{9} p_d(t)\ln\frac{\sum_{d'=5}^{9} p_{d'}(t)}{\sum_{d'=5}^{9} p_{d'}^*} + \sum_{d=1}^{4} p_d(t)\ln\frac{p_d(t)}{p_d^*}. \tag{C5}$$

Therefore,

$$\Delta D_{\rm KL}(t+1) \equiv D_{\rm KL}(t+1) - D_{\rm KL}(t) = -\sum_{d=5}^{9} p_d(t)\ln\frac{p_d(t)\sum_{d'=5}^{9} p_{d'}^*}{p_d^*\sum_{d'=5}^{9} p_{d'}(t)}. \tag{C6}$$

Given the stationary values $\{p_d^*;\, d = 5, \dots, 9\}$, the difference $\Delta D_{\rm KL}(t+1)$ is a function of the five probabilities $\{p_d(t);\, d = 5, \dots, 9\}$. To find the maximum value of $\Delta D_{\rm KL}(t+1)$, we take the derivative

$$\frac{\partial\,\Delta D_{\rm KL}(t+1)}{\partial p_d(t)} = -\ln\frac{p_d(t)\sum_{d'=5}^{9} p_{d'}^*}{p_d^*\sum_{d'=5}^{9} p_{d'}(t)}. \tag{C7}$$

The unique solution to the extremum conditions $\partial\,\Delta D_{\rm KL}(t+1)/\partial p_d(t) = 0$ is $p_d(t) = c\,p_d^*$ ($d = 5, \dots, 9$), where $0 < c < 1/\sum_{d=5}^{9} p_d^*$ is arbitrary. In such a case, $\Delta D_{\rm KL}(t+1) = 0$. To see that this is actually a maximum value, suppose, for instance, that $p_d(t) = 0$ except if $d = d_0$ (with $d_0 = 5, \dots, 9$). In that case, it is easy to find $\Delta D_{\rm KL}(t+1) = p_{d_0}(t)\ln\left(p_{d_0}^*/\sum_{d=5}^{9} p_d^*\right) < 0$. We then conclude that $\Delta D_{\rm KL}(t+1) \leq 0$, i.e., we obtain Eq. (10), with the equality holding only if $p_d(t) = c\,p_d^*$ ($d = 5, \dots, 9$). Note that, even though $\Delta D_{\rm KL}(t+1) = 0$ if $p_d(t)/p_d^* = \text{const}$ for $d = 5, \dots, 9$, this proportionality breaks down at $t + 1$ unless $c = 1$. As a result, $\Delta D_{\rm KL}(t+2) < 0$, even though $\Delta D_{\rm KL}(t+1) = 0$, except in the stationary solution.

a)Electronic mail: anburgosr@alumnos.unex.es
b)Electronic mail: andres@unex.es, ORCID: 0000-0002-9564-5180.

1. S. Newcomb, “Note on the frequency of use of the different digits in natural numbers,” Am. J. Math. 4, 39–40 (1881).
2. F. Benford, “The law of anomalous numbers,” Proc. Am. Philos. Soc. 78, 551–572 (1938), <http://www.jstor.com/stable/984802>.
3. S. M. Stigler, “Stigler’s law of eponymy,” Trans. N. Y. Acad. Sci. 39, 147–158 (1980).
4. A. Berger, T. P. Hill, and E. Rogers, “Benford online bibliography,” <http://www.benfordonline.net>.
5. M. J. Nigrini, Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (Wiley, Hoboken, NJ, 2012).
6. A. Berger and T. P. Hill, An Introduction to Benford’s Law (Princeton U. P., Princeton, NJ, 2015); S. J. Miller, Benford’s Law: Theory and Applications (Princeton U. P., Princeton, NJ, 2015); A. E. Kossovsky, Benford’s Law: Theory, The General Law Of Relative Quantities, And Forensic Fraud Detection Applications (World Scientific, Singapore, 2015); A. E. Kossovsky, Studies in Benford’s Law: Arithmetical Tugs of War, Quantitative Partition Models, Prime Numbers, Exponential Growth Series, and Data Forensics (Independently Published, 2019); M. J. Nigrini, Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations (Wiley, Hoboken, NJ, 2020).
7. S. A. Goudsmit and W. H. Furry, “Significant figures of numbers in statistical tables,” Nature 154, 800–801 (1944); W. H. Furry and H. Hurwitz, “Distribution of numbers and distribution of significant figures,” ibid. 155, 52–53 (1945); D. S. Lemons, “On the numbers of things and the distribution of first digits,” Am. J. Phys. 54, 816–817 (1986); J. Burke and E. Kincanon, “Benford’s law and physical constants: The distribution of initial digits,” ibid. 59, 952–952 (1991); J. M. R. Parrondo, “La misteriosa ley del primer dígito,” Invest. Cienc. 315, 84–85 (2002); J. Torres, S. Fernandez, A. Gamero, and A. Sola, “How do numbers begin? (The first digit law),” Eur. J. Phys. 28, L17–L25 (2007); R. M. Fewster, “A simple explanation of Benford’s law,” Am. Stat. 63, 26–32 (2009); T. A. Mir and M. Ausloos, “Benford’s law: A ‘sleeping beauty’ sleeping in the dirty
pages of logarithmic tables,” J. Assoc. Inf. Sci. Technol. 69, 349–358 (2018); D. S. Lemons, “Thermodynamics of Benford’s first digit law,” Am. J. Phys. 87, 787–790 (2019).
8. B. Buck, A. C. Merchant, and S. M. Perez, “An illustration of Benford’s first digit law using alpha decay half lives,” Eur. J. Phys. 14, 59–63 (1993).
9. T. P. Hill, “The first digit phenomenon: A century-old observation about an unexpected pattern in many numerical tables applies to the stock market, census statistics and accounting data,” Am. Sci. 86, 358–363 (1998).
10. R. S. Pinkham, “On the distribution of first significant digits,” Ann. Math. Stat. 32, 1223–1230 (1961); B. J. Flehinger, “On the probability that a random integer has initial digit A,” Am. Math. Mon. 73, 1056–1061 (1966); R. A. Raimi, “On the distribution of first significant figures,” ibid. 76, 342–348 (1969); “The first digit problem,” 83, 521–538 (1976); P. Diaconis, “The distribution of leading digits and uniform distribution mod 1,” Ann. Probab. 5, 72–81 (1977); K. Nagasaka, “On Benford’s law,” Ann. Inst. Stat. Math. 36, 337–352 (1984); T. P. Hill, “A statistical derivation of the significant-digit law,” Stat. Sci. 10, 354–363 (1995); L. M. Leemis, B. W. Schmeiser, and D. L. Evans, “Survival distributions satisfying Benford’s law,” Am. Stat. 54, 236–241 (2000); M. Cong, C. Li, and B.-Q. Ma, “First digit law from Laplace transform,” Phys. Lett. A 383, 1836–1844 (2019); M. Cong and B.-Q. Ma, “A proof of first digit law from Laplace transform,” Chin. Phys. Lett. 36, 070201 (2019); A. Volčič, “Uniform distribution, Benford’s law and scale-invariance,” Boll. Unione Mat. Ital. 13, 539–543 (2020); A. Berger and T. P. Hill, “The mathematics of Benford’s law: a primer,” Stat. Methods Appl. (published online).
11. T. P. Hill, “Base-invariance implies Benford’s law,” Proc. Am. Math. Soc. 123, 887–895 (1995).
12. A. Berger and T. P. Hill, “A basic theory of Benford’s law,” Probab. Surv. 8, 1–126 (2011).
13. E. W. Weisstein, “Benford’s law,” <https://mathworld.wolfram.com/BenfordsLaw.html>.
14. D. C. Hoyle, M. Rattray, R. Jupp, and A. Brass, “Making sense of microarray data distributions,” Bioinformatics 18, 576–584 (2002).
15. J.-C. Pain, “Benford’s law and complex atomic spectra,” Phys. Rev. E 77, 012102 (2008).
16. D.-D. Ni and Z.-Z. Ren, “Benford’s law and half-lives of unstable nuclei,” Eur. Phys. J. A 38, 251–255 (2008); D.-D. Ni, L. Wei, and Z.-Z. Ren, “Benford’s law and β-decay half-lives,” Commun. Theor. Phys. 51, 713–716 (2009).
17. L. Shao and B.-Q. Ma, “First digit distribution of hadron full width,” Mod. Phys. Lett. A 24, 3275–3282 (2009); A. Dantuluri and S. Desai, “Do τ lepton branching fractions obey Benford’s law?” Physica A 506, 919–928 (2018).
18. M. A. Moret, V. de Senna, M. G. Pereira, and G. F. Zebende, “Newcomb-Benford law in astrophysical sources,” Int. J. Mod. Phys. C 17, 1597–1604 (2006); L. Shao and B.-Q. Ma, “Empirical mantissa distributions of pulsars,” Astropart. Phys. 33, 255–262 (2010); T. Alexopoulos and S. Leontsinis, “Benford’s law in astronomy,” J. Astrophys. Astron. 35, 639–648 (2014); T.-P. Hill and R. F. Fox, “Hubble’s law implies Benford’s law for distances to galaxies,” ibid. 37, 4 (2016); A. Shukla, A. K. Pandey, and A. Pathak, “Benford’s distribution in extrasolar world: Do the exoplanets follow Benford’s distribution?” ibid. 38, 7 (2017); J. de Jong, J. de Bruijne, and J. de Ridder, “Benford’s law in the Gaia universe,” Astron. Astrophys. 642, A205 (2020).
19. T. Chanda, T. Das, D. Sadhukhan, A. K. Pal, A. Sen(De), and U. Sen, “Statistics of leading digits leads to unification of quantum correlations,” EPL 314, 30004 (2016); A. Bera, U. Mishra, S. S. Roy, A. Biswas, A. Sen(De), and U. Sen, “Benford analysis of quantum critical phenomena: First digit provides high finite-size scaling exponent while first two and further are not much better,” Phys. Lett. A 382, 1639–1644 (2018); A. Bera, T. Das, D. Sadhukhan, S. S. Roy, A. Sen(De), and U. Sen, “Quantum discord and its allies: a review of recent progress,” Rep. Prog. Phys. 81, 024001 (2018).
20. S. de Marchi and J. Hamilton, “Assessing the accuracy of self-reported data: An evaluation of the toxics release inventory,” J. Risk Uncertainty 32, 57–76 (2006).
21. A. J. da Silva, S. Floquet, D. O. C. Santos, and R. Lima, “On the validation of the Newcomb–Benford law and the Weibull distribution in neuromuscular transmission,” Physica A 553, 124606 (2020).
22. E. Crocetti and G. Randi, “Using the Benford’s law as a first step to assess the quality of the cancer registry data,” Front. Public Health 4, 225 (2016).
23. C. R. Tolle, J. L. Budzien, and R. A. LaViolette, “Do dynamical systems follow Benford’s law?” Chaos 10, 331–336 (2000); M. A. Snyder, J. H. Curry, and A. M. Dougherty, “Stochastic aspects of one-dimensional discrete dynamical systems: Benford’s law,” Phys. Rev. E 64, 026222 (2001); A. Berger, L. A. Bunimovich, and T. P. Hill, “One-dimensional dynamical systems and Benford’s law,” Trans. Am. Math. Soc. 357, 197–219 (2004).
24. Q. Li, Z. Fu, and N. Yuan, “Beyond Benford’s law: Distinguishing noise from chaos,” PLoS One 10, e0129161 (2015).
25. L. Shao and B.-Q. Ma, “The significant digit law in statistical physics,” Physica A 389, 3109–3116 (2010).
26. J. M. Campanario and M. A. Coslado, “Benford’s law and citations, articles and impact factors of scientific journals,” Scientometrics 88, 421–432 (2011); A. D. Alves, H. H. Yanasse, and N. Y. Soma, “Benford’s law and articles of scientific journals: comparison of JCR and Scopus data,” ibid. 98, 173–184 (2014).
27. M. J. Nigrini and S. J. Miller, “Data diagnostics using second-order tests of Benford’s law,” Auditing 28, 305–324 (2009).
28. W. K. T. Cho and B. J. Gaines, “Breaking the (Benford) law. Statistical fraud detection in campaign finance,” Am. Stat. 61, 218–223 (2007); B. F. Roukema, “A first-digit anomaly in the 2009 Iranian presidential election,” J. Appl. Stat. 41, 164–199 (2014); D. Gamermann and F. L. Antunes, “Statistical analysis of Brazilian electoral campaigns via Benford’s law,” Physica A 496, 171–188 (2018).
29. A. Diekmann, “Not the first digit! Using Benford’s law to detect fraudulent scientific data,” J. Appl. Stat. 34, 321–329 (2007).
30. J. Shi, M. Ausloos, and T. Zhu, “Benford’s law first significant digit and distribution distances for testing the reliability of financial reports in developing countries,” Physica A 492, 878–888 (2018); M. Ausloos, A. Eskandary, P. Kaur, and G. Dhesi, “Evidence for gross domestic product growth time delay dependence over foreign direct investment. A time-lag dependent correlation study,” ibid. 527, 121181 (2019).
31. L. Pietronero, E. Tosatti, V. Tosatti, and A. Vespignani, “Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf,” Physica A 293, 297–304 (2001).
32. M. Miranda-Zanetti, F. Delbianco, and F. Tohme, “Tampering with inflation data: A Benford law-based analysis of national statistics in Argentina,” Physica A 525, 761–770 (2019).
33. J. Lee and M. de Carvalho, “Technological improvements or climate change? Bayesian modeling of time-varying conformance to Benford’s law,” PLoS One 14, e0213300 (2019).
34. S. N. Dorogovtsev, J. F. F. Mendes, and J. G. Oliveira, “Frequency of occurrence of numbers in the World Wide Web,” Physica A 360, 548–556 (2006).
35. L. Arshadi and A. H. Jahangir, “Benford’s law behavior of internet traffic,” J. Network Comput. Appl. 40, 194–205 (2014).
36. J. Golbeck, “Benford’s law applies to online social networks,” PLoS One 10, e0135169 (2015).
37. A. D. Slepkov, K. B. Ironside, and D. DiBattista, “Benford’s law: Textbook exercises and multiple-choice testbanks,” PLoS One 10, e0117972 (2015).
38. J.-M. Jolion, “Images and Benford’s law,” J. Math. Imaging Vis. 14, 73–81 (2001).
39. T. A. Mir, “The law of the leading digits and the world religions,” Physica A 391, 792–798 (2012); “The Benford law behavior of the religious activity data,” 408, 1–9 (2014); M. Ausloos, “Econophysics of a religious cult: The Antoinists in Belgium [1920–2000],” ibid. 391, 3190–3197 (2012).
40. M. Ausloos, C. Herteliu, and B. Ileanu, “Breakdown of Benford’s law for birth data,” Physica A 419, 736–745 (2015).
41. M. J. Nigrini and S. J. Miller, “Benford’s law applied to hydrology data—Results and relevance to other geophysical data,” Math. Geol. 39, 469–490 (2007); M. Sambridge, H. Tkalčić, and A. Jackson, “Benford’s law in the natural sciences,” Geophys. Res. Lett. 37, L22301, https://doi.org/10.1029/2010GL044830 (2010); A. Geyer and J. Martí, “Applying Benford’s law to volcanology,” Geology 40, 327–330 (2012); M. Ausloos, R. Cerqueti, and C. Lupi, “Long-range properties and data validity for hydrogeological time series: The case of the Paglia river,” Physica A 470, 39–50 (2017).
42. W. A. Kreiner, “On the Newcomb-Benford law,” Z. Naturforsch. 58a, 618–622 (2003); T. Becker, D. Burt, T. C. Corcoran, A. Greaves-Tunnell, J. R. Iafrate, J. Jing, S. J. Miller, J. D. Porfilio, R. Ronan, J. Samranvedhya, F. W. Strauch, and B. Talbut, “Benford’s law and continuous dependent random variables,” Ann. Phys. 388, 350–381 (2018).
43. X. Yan, S.-G. Yang, B. J. Kim, and P. Minnhagen, “Benford’s law and first letter of words,” Physica A 512, 305–315 (2018).
44. K.-B. Lee, S. Han, and Y. Jeong, “COVID-19, flattening the curve, and Benford’s law,” Physica A 559, 125090 (2020); A. P. Kennedy and S. C. P. Yam, “On the authenticity of COVID-19 case figures,” PLoS One 15, e0243123 (2020).
45. “Testing Benford’s law,” <https://testingbenfordslaw.com>.
46. “Instituto Nacional de Estadística,” <https://www.ine.es/index.htm>.
47. “The brightest stars,” <http://www.atlasoftheuniverse.com/stars.html>.
48. “Sunspot number,” <http://sidc.be/silso/datafiles>.
49. “Corteel,” <https://corteel.com/es/es/mujer?srule=price-high-to-low>.
50. “Hipercor,” <https://www.hipercor.es/supermercado/alimentacion/>.
51. V. S. Steckline, “Zermelo, Boltzmann, and the recurrence paradox,” Am. J. Phys. 51, 894–897 (1982).
52. Y. K. Leong, “Mechanical model for 2–state Markov chains,” Am. J. Phys. 52, 749–753 (1984).
53. S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Stat. 22, 79–86 (1951).
54. J. Machta, “Entropy, information, and computation,” Am. J. Phys. 67, 1074–1077 (1999).
55. A. Ben-Naim, Entropy Demystified: The Second Law Reduced to Plain Common Sense (World Scientific, Hackensack, NJ, 2008).
56. S. Chapman and T. G. Cowling, The Mathematical Theory of Non-Uniform Gases, 3rd ed. (Cambridge U. P., Cambridge, UK, 1970); C. Cercignani, The Boltzmann Equation and Its Applications (Springer-Verlag, New York, 1988); V. Garzó and A. Santos, Kinetic Theory of Gases in Shear Flows: Nonlinear Transport, Fundamental Theories of Physics (Springer, Netherlands, 2003); H. Karabulut, “Direct simulation for a homogeneous gas,” Am. J. Phys. 75, 62–66 (2007); A. D. Boozer, “Thermodynamic time asymmetry and the Boltzmann equation,” ibid. 83, 223–230 (2016).
57. J. Hurley, “Equilibrium solution of the Boltzmann equation,” Am. J. Phys. 33, 242–243 (1965); P. A. Mello and T. A. Brody, “A different proof of the Maxwell-Boltzmann distribution,” ibid. 40, 1239–1345 (1972); R. Lopez-Ruiz and X. Calbet, “Derivation of the Maxwellian distribution from the microcanonical ensemble,” ibid. 75, 752–753 (2007); A. Walstad, “On deriving the Maxwellian velocity distribution,” ibid. 81, 555–557 (2013); R. E. Robson, T. J. Mehrling, and J. Osterhoff, “Great moments in kinetic theory: 150 years of Maxwell’s (other) equations,” Eur. J. Phys. 38, 065103 (2017); F. Rivadulla, “Alternative derivation of the Maxwell distribution of speeds,” J. Chem. Educ. 96, 2063–2065 (2019).
58. S. Tiwary, “Time evolution of entropy, in various scenarios,” Eur. J. Phys. 41, 025101 (2020).
59. J. Pojman, “Boltzmann’s H theorem applied to simulations of polymer interchange reactions,” J. Chem. Educ. 67, 200–202 (1990); A. D. Boozer, “Boltzmann’s H-theorem and the assumption of molecular chaos,” Eur. J. Phys. 32, 1391–1403 (2011); K. Rȩbilas, “Origin of the thermodynamic time arrow demonstrated in a realistic statistical system,” Am. J. Phys. 80, 700–707 (2012).
851 Am. J. Phys. 89 (9), September 2021 http://aapt.org/ajp
