
Volume 151, number 1,2 PHYSICS LETTERS A 26 November 1990

An efficient algorithm for fast O(N*ln(N)) box counting


Xin-Jun Hou, Robert Gilmore, Gabriel B. Mindlin and Hernán G. Solari
Department of Physics and Atmospheric Science, Drexel University, Philadelphia, PA 19104, USA

Received 25 June 1990; revised manuscript received 20 September 1990; accepted for publication 21 September 1990
Communicated by D.D. Holm

A new topological ordering is defined which significantly reduces the time requirements for the fast box counting method
proposed in a recent paper by Liebovitch and Toth. Only one sorting is necessary in this algorithm.

E-mail: houxj@einstein.physics.drexel.edu

0375-9601/90/\$03.50 © 1990 Elsevier Science Publishers B.V. (North-Holland)

Calculation of the capacity dimension [1-3] of a fractal structure [3,4] is severely limited by large memory and time requirements [5-9]. In a recent paper [5], Liebovitch and Toth suggested an $O(N \ln N)$ box counting method which needs only $O(d_e N)$ memory, where $d_e$ is the embedding dimension. This method can be improved with a new topological ordering of points described in this paper.

In ref. [5], the coordinates of each point of a set embedded in a $d_e$-dimensional space are rescaled to the interval $(0, 2^k - 1)$ and expressed in binary form. The set is covered by a grid of $d_e$-dimensional boxes with edge size $2^m$ ($0 \le m < k$). With the binary representation of coordinates, the box to which a point belongs can be found by checking the most significant $k - m$ bits (the left $k - m$ bits) of each coordinate of that point. To find the number of boxes needed to cover the set, $N_B(2^m)$, the least significant $m$ bits (the right $m$ bits) of each coordinate are masked to 0's and the points are sorted into lexicographical order [10]. Then one walks down the sorted list to find the number of distinct values of masked points. This algorithm has a time complexity [11] of $O(N \ln N)$ using optimal sorting procedures (quicksort, heapsort [10]). However, the lexicographical order of points requires a new sorting for each box size; this makes the constant factor of the time complexity very large. Although ref. [5] did mention using cyclic ordering of the bits in the coordinates to eliminate multiple sorting, no details were given. The present work is an implementation of this idea, which has also been expressed elsewhere [12].

To increase efficiency, a new ordering scheme should be used such that the sorted list can be used repeatedly for different box sizes. If all points belonging to the same box can be arranged in one segment of the sorted list, regardless of the size of the box, then only one sorting is required. Once the points are sorted, one only needs to do box size dependent masking for each new box size, followed by counting the distinct values in the list.

Regular lexicographical order does not satisfy this requirement. For example, points (000, 000) and (001, 000) are in different boxes if the box size is 1 ($m = 0$), and they are both located in the same bigger box of size 2 ($m = 1$), but point (000, 111) is between these two points in a lexicographically sorted list: (000, 000) < (000, 111) < (001, 000). This makes it necessary to sort the list separately for each different box size (fig. 1a).

Instead of sorting the points into lexicographical order directly, we intercalate the coordinates of each point into a long bit string and sort the points according to the value of the intercalated bit string. We use embedding dimension $d_e = 2$ as an example to show this intercalation procedure.

Assume $(x, y)$ to be the coordinates of a point in a plane scaled to $(0, 2^k - 1)$, and

    $x = x_{k-1} x_{k-2} \ldots x_1 x_0$    (1a)

be the binary representation of $x$. Here $x_i$ ($i = 0, \ldots, k-1$) are bit values of 0 or 1, $x_{k-1}$ is the most significant bit and $x_0$ the least significant bit; and similarly for $y$,

    $y = y_{k-1} y_{k-2} \ldots y_1 y_0$.    (1b)

The intercalated bit string of $x$ and $y$ is a bit string of length $2k$:

    $x_{k-1} y_{k-1} x_{k-2} y_{k-2} \ldots x_2 y_2 x_1 y_1 x_0 y_0$.    (2)

Once the intercalated points (2) are sorted, the most significant bits determine the box which contains that point. All points in the same box are always in one segment, independent of box size (fig. 1b). To start counting, we find the number of distinct values in the list. This gives us the number of boxes of size $2^0$ required to cover the set, $N_B(2^0)$. Then two bits on the right are masked to 0's to create a list of boxes of size $2^1$, then two more bits from the right are masked for the next box size $2^2$, and so on. Each time we mask two bits, we prepare a list for counting the number of boxes required for a bigger box size. At the end we will get the number of boxes needed to cover the set for box sizes ranging from $2^0$ to $2^{k-1}$.

[Figure 1: two panels, (a) and (b), each showing the (x, y) plane from (000,000) to (111,111) divided into boxes, with the first few points in sorted order listed below the plot.]
(a) Lexicographical order of (x, y): (000,000), (000,001), (000,010), (000,011), ..., (000,111), (001,000), (001,001), ..., (111,111).
(b) Intercalated bit string and corresponding (x, y): 000000 (000,000); 000001 (000,001); 000010 (001,000); 000011 (001,001); ...; 001111 (011,011); ...; 111111 (111,111).

Fig. 1. Ordering of points in two dimensions. First few points in sorted order are shown. (a) Lexicographical ordering of points (x, y). Points in one box may not be in the same segment of the sorted list; for example, point (000, 111) is between (000, 000) and (001, 000). (b) Intercalated order of (x, y). All points in the same box are in one segment of the sorted list, independent of box size.

In a $d_e$-dimensional embedding space, data can be intercalated in a similar way. The intercalated bit string has length $d_e k$ and, after sorting, one has to mask $d_e$ bits at the right at a time.

This procedure is fast and simple. First, only shifting and masking operations are involved to intercalate the bits. This is an $O(N)$ process. Second, we can overwrite the coordinates with the intercalated bit string so that no extra memory is required.

We tested this algorithm on a Sun SPARCstation™
for the Hénon attractor [13] ($d_e = 2$) with $a = 1.4$ and $b = 0.3$. The program is written in the C language #1 and the coordinates are rescaled to 32-bit ($k = 32$) unsigned integers. The execution time #2 is shown in table 1. This algorithm using intercalation is significantly faster than the algorithm requiring multiple sortings, as is clear from table 1.

#1 The C language may be a good choice to implement this algorithm since many bitwise operations are needed and much unsigned number arithmetic is carried out.
#2 We used the internal routine clock to get the execution time in milliseconds. This includes actual CPU time and system time (I/O and waiting).

Table 1
Benchmark of the intercalation algorithm for the Hénon attractor with a = 1.4, b = 0.3. Tr: data reading and rescaling time (data are in a binary file); Ti: time needed to intercalate the coordinates of N points; Ts: sorting time (heapsort); Tc: box counting time (32 different sizes); Tt: total execution time; $D_0$: computed capacity dimension. All times are in seconds. 1K = 1024.

N        Tr       Ti       Ts       Tc       Tt       D_0
32K      0.583    0.983    2.33     5.70     9.60     1.258
128K     2.43     3.92     11.3     23.4     41.3     1.263
1024K    20.5     30.5     115      189      354      1.263

The computed values of the capacity dimension, $D_0$ [5-7], of the Hénon attractor are shown in table 1 and fig. 2. We select the fitting points based on the suggestion in ref. [5]: the counts for the big box sizes $m = k$, $k-1$ and $k-2$ are disregarded, then the fitting range is selected from $m = k-3$ down to $M$ such that $N_B(2^M) \le [\ldots]\,N_B(2^0)$, but $N_B(2^{M-1}) > [\ldots]\,N_B(2^0)$. In some cases $N_B(2^0)$, the number of the smallest boxes required to cover the set, may be less than the number of data points. Obviously, the fitting range depends on the strangeness of the set and the number of bits ($k$) in the integer representation of the coordinates.

We have developed an efficient way to order the points in an embedding space of any dimension for faster implementation of box counting algorithms. Such a method was briefly, but not fully, described in ref. [5]. This algorithm solves the memory and time problems of regular box counting methods, but does
not solve the convergence problem [9]. The source code for this algorithm is available upon request to the first author.

[Figure 2: plot with horizontal axis $\log_2(1/\epsilon)$ from 0 to 32 and vertical axis $\log_2[N_B(\epsilon)]$ from 0 to 20.]

Fig. 2. Plot of $\log_2[N_B(\epsilon)]$ versus $\log_2(1/\epsilon)$ for the Hénon map with a = 1.4, b = 0.3; 1024K points were used. (Here box size $\epsilon = 2^m/2^{32}$.)

We would like to thank M. Magasco for the discussion of other implementations of box counting methods. One of the authors (X.J.H.) wishes to thank Ms. R. Mercun and Dr. Jing Guo for discussions on the optimized coding of intercalation. Another author (H.G.S.) thanks Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) of Argentina. This work is supported in part by NSF Grant No. PHY 90-04582.

References

[1] A.N. Kolmogorov, Dokl. Akad. Nauk SSSR 119 (1958) 861.
[2] H.G. Schuster, Deterministic chaos (Physik Verlag, Weinheim, 1984).
[3] M. Barnsley, Fractals everywhere (Academic Press, New York, 1988).
[4] B.B. Mandelbrot, The fractal geometry of nature (Freeman, San Francisco, 1983).
[5] L.S. Liebovitch and T. Toth, Phys. Lett. A 141 (1989) 386.
[6] D.A. Russell, J.D. Hanson and E. Ott, Phys. Rev. Lett. 45 (1980) 1175.
[7] P. Grassberger, Phys. Lett. A 97 (1983) 224.
[8] A. Giorgilli, D. Casati, L. Sironi and L. Galgani, Phys. Lett. A 115 (1986) 202.
[9] H.S. Greenside, A. Wolf, J. Swift and T. Pignataro, Phys. Rev. A 25 (1982) 3453.
[10] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The design and analysis of computer algorithms (Addison-Wesley, Reading, MA, 1974).
[11] J.E. Hopcroft and J.D. Ullman, Introduction to automata theory, languages and computation (Addison-Wesley, Reading, MA, 1979).
[12] D. Kaplan, unpublished.
[13] M. Hénon, Commun. Math. Phys. 50 (1976) 69.