
Disjoint Sparse Table

Ron Mikhael C Surara


July 27, 2018

I Introduction
A sparse table is a data structure that answers range queries for operations in which
counting a value more than once does not affect the result, such as maximum, minimum,
and GCD. An example of such a problem is this:

You are given an array of integers 𝐴[𝑁] and 𝑄 queries. For each query, you must find
the maximum value within the segment 𝐴[𝑙, 𝑟] wherein 𝑙 ≤ 𝑟.

A common problem such as this has many solutions, such as the naïve method, which
has a runtime of 𝑂(𝑁) per query, and a segment tree, with a runtime of 𝑂(log 𝑁) per
query. If there are updates, a segment tree would very likely be a good choice, but since
there are none, each query can actually be answered in 𝑂(1) using a sparse table.
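
As a point of reference, an ordinary sparse table for range maximum could be sketched as follows. This is a minimal illustration rather than code from this article: the name MaxSparseTable and its layout are assumptions, and __builtin_clz is a GCC/Clang builtin.

```cpp
#include <algorithm>
#include <vector>

// Sketch of an ordinary sparse table for range maximum.
// table[j][i] holds max(A[i .. i + 2^j - 1]).
struct MaxSparseTable {
    std::vector<std::vector<int>> table;

    explicit MaxSparseTable(const std::vector<int>& a) {
        int n = (int)a.size();
        table.push_back(a);  // level 0: blocks of length 1
        for (int j = 1; (1 << j) <= n; j++) {
            int half = 1 << (j - 1);
            std::vector<int> row(n - (1 << j) + 1);
            for (int i = 0; i + (1 << j) <= n; i++)
                row[i] = std::max(table[j - 1][i], table[j - 1][i + half]);
            table.push_back(row);
        }
    }

    // Maximum of A[l..r], 0-based and inclusive, with l <= r.
    int query(int l, int r) const {
        int j = 31 - __builtin_clz(r - l + 1);  // floor(log2(length))
        // The two blocks may overlap; max is unaffected by the repetition.
        return std::max(table[j][l], table[j][r - (1 << j) + 1]);
    }
};
```

For the array {3, 1, 4, 1, 5}, query(1, 3) combines the blocks [1, 2] and [2, 3], which share index 2, and still returns 4.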

A sparse table uses dynamic programming to answer queries very efficiently. It also
relies on the fact that max(𝐴[𝑙, 𝑎], 𝐴[𝑏, 𝑟]) with 𝑙 ≤ 𝑏 < 𝑎 ≤ 𝑟 is the same as
max(𝐴[𝑙, 𝑘], 𝐴[𝑘 + 1, 𝑟]) for any 𝑘 with 𝑙 ≤ 𝑘 < 𝑟. This means that repeating the
segment [𝑏, 𝑎] in the query does not change the result. But what if repeating a segment
does affect the result? Take for example the following problem:

You are given an array of positive integers 𝐴[𝑁] and 𝑄 queries. For each query, find the
sum of the subarray 𝐴[𝑙, 𝑟] wherein 𝑙 ≤ 𝑟.

This time, the sum is queried instead of the maximum value in the segment.
Unfortunately, repeating a non-empty segment does affect the answer. For example,
let the segment to be queried be 𝐴[𝑙, 𝑟]. Answering it the same way as the previous
problem would give

𝑆𝑢𝑚 = (𝐴[𝑙] + 𝐴[𝑙 + 1] + ⋯ + 𝐴[𝑏] + 𝐴[𝑏 + 1] + ⋯ + 𝐴[𝑎])
    + (𝐴[𝑏] + 𝐴[𝑏 + 1] + ⋯ + 𝐴[𝑎] + ⋯ + 𝐴[𝑟 − 1] + 𝐴[𝑟])

As we can see, the segment [𝑏, 𝑎] was added twice, so the usual sparse table query
does not work here. There is a way around this problem, but it takes 𝑂(log 𝑁) per
query. Since this problem can be solved in 𝑂(1) per query using partial sums and the
inverse of addition (subtraction), or in 𝑂(log 𝑁) per query using a tree such as a
segment tree, that solution will be omitted.
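
The partial-sum approach mentioned above can be sketched as follows. The function names buildPrefix and rangeSum are illustrative assumptions, not part of this article's code.

```cpp
#include <cstddef>
#include <vector>

// prefix[i] = A[0] + ... + A[i - 1], so prefix[0] = 0.
std::vector<long long> buildPrefix(const std::vector<int>& a) {
    std::vector<long long> prefix(a.size() + 1, 0);
    for (std::size_t i = 0; i < a.size(); i++)
        prefix[i + 1] = prefix[i] + a[i];
    return prefix;
}

// Sum of A[l..r], 0-based and inclusive. Subtraction (the inverse of
// addition) removes the part of the prefix that lies before l.
long long rangeSum(const std::vector<long long>& prefix, int l, int r) {
    return prefix[r + 1] - prefix[l];
}
```

This only works because addition has an inverse; the same trick is not available for the operations discussed next.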

What if we want to answer queries in 𝑂(1) but the operation involved does not have
an inverse, for example, the product of a range of matrices or a range product
modulo 𝑀, where 𝑀 can be any positive integer? Since these operations do not have
inverses in general, prefixes and suffixes cannot be used. How can these be solved
then? Simple: using Disjoint Sparse Tables!

II Theory and Implementation


A Disjoint Sparse Table is a variant of the sparse table in which the segment to be
queried is divided into smaller disjoint segments. This means that no segment is
repeated. Like a normal sparse table, this data structure uses precomputation to
answer queries efficiently.

The idea is as follows. Let the boundaries of the query be [𝑙, 𝑟] and let 𝑚 be an integer
such that 𝑙 ≤ 𝑚 < 𝑟. For every index 𝑖 > 𝑚, we precompute

𝑑𝑠𝑡[𝑖] = 𝐴[𝑚 + 1] ⊗ 𝐴[𝑚 + 2] ⊗ ⋯ ⊗ 𝐴[𝑖]

And if 𝑖 ≤ 𝑚, then

𝑑𝑠𝑡[𝑖] = 𝐴[𝑖] ⊗ 𝐴[𝑖 + 1] ⊗ ⋯ ⊗ 𝐴[𝑚]

Here ⊗ denotes the operation to be used, which must be associative. Since 𝑙 ≤ 𝑚 < 𝑟,
the answer for the segment [𝑙, 𝑟] is just 𝑑𝑠𝑡[𝑙] ⊗ 𝑑𝑠𝑡[𝑟]. Since the values of 𝑑𝑠𝑡[ ] have
already been precomputed, answering a query takes only 𝑂(1).
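
To make the two formulas concrete, here is a small sketch that fills 𝑑𝑠𝑡[ ] around a single midpoint, using multiplication modulo a given value as the operation. The helper name buildAroundMidpoint and its parameters are hypothetical, and the sketch assumes non-negative inputs and lo ≤ m < hi.

```cpp
#include <vector>

// Hypothetical helper: fills dst[] for one midpoint m of the segment [lo, hi],
// with multiplication modulo mod as the operation. Requires lo <= m < hi.
std::vector<long long> buildAroundMidpoint(const std::vector<long long>& a,
                                           int lo, int hi, int m, long long mod) {
    std::vector<long long> dst(a.size(), 0);
    // Indices i <= m: dst[i] = a[i] * a[i + 1] * ... * a[m]
    dst[m] = a[m] % mod;
    for (int i = m - 1; i >= lo; i--)
        dst[i] = (a[i] % mod) * dst[i + 1] % mod;
    // Indices i > m: dst[i] = a[m + 1] * a[m + 2] * ... * a[i]
    dst[m + 1] = a[m + 1] % mod;
    for (int i = m + 2; i <= hi; i++)
        dst[i] = dst[i - 1] * (a[i] % mod) % mod;
    return dst;
}
```

With a = {2, 3, 4, 5, 6, 7}, m = 2 and mod = 1000, the table becomes {24, 12, 4, 5, 30, 210}, so the product over [1, 4] is dst[1] · dst[4] = 12 · 30 = 360, which matches 3 · 4 · 5 · 6.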

A Disjoint Sparse Table is a recursive data structure and is built much like a
segment tree. Initially, the segment to be precomputed is [0, 𝑁 − 1] with 𝑙 = 0,
𝑟 = 𝑁 − 1, and 𝑚 = ⌊(𝑙 + 𝑟)⁄2⌋. Then the segments [𝑙, 𝑚] and [𝑚 + 1, 𝑟] are built
recursively, one depth lower each time, until 𝑙 = 𝑟.
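
    // Assumed to be defined elsewhere: MAXN (padded array length), K (number
    // of depths), and combine(a, b), the associative operation.
    int DST[K][MAXN], arr[MAXN];

    void build(int depth, int l, int r) {
        if (l == r) return;
        int m = (l + r) / 2;
        // Left half: DST[depth][i] = arr[i] combined up to arr[m]
        DST[depth][m] = arr[m];
        for (int i = m - 1; i >= l; i--)
            DST[depth][i] = combine(arr[i], DST[depth][i + 1]);
        // Right half: DST[depth][i] = arr[m + 1] combined up to arr[i]
        // To avoid out of bounds
        if (m + 1 <= r) {
            DST[depth][m + 1] = arr[m + 1];
            for (int i = m + 2; i <= r; i++)
                DST[depth][i] = combine(DST[depth][i - 1], arr[i]);
        }
        build(depth + 1, l, m);
        build(depth + 1, m + 1, r);
    }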

Figure 1. Build Code

To answer a query in 𝑂(1), one must be able to efficiently find the depth at which
there exists a midpoint 𝑚 such that 𝑙 ≤ 𝑚 < 𝑟. To make this easy, the length of 𝑑𝑠𝑡
must be the smallest power of 2 that is at least 𝑁. The extra space can be filled with
identity elements or simply ignored.

𝐿𝑒𝑛𝑔𝑡ℎ = 2^⌈log₂ 𝑁⌉

Let 𝐿𝑒𝑛𝑔𝑡ℎ = 2^5 = 32. The segments would then be:

Depth Intervals
0 [0,31]
1 [0,15][16,31]
2 [0,7][8,15][16,23][24,31]
3 [0,3][4,7][8,11][12,15][16,19][20,23][24,27][28,31]

The midpoints of these intervals are:

Depth Midpoints
0 [15]
1 [7][23]
2 [3][11][19][27]
3 [1][5][9][13][17][21][25][29]

When converted to binary, the midpoints show an interesting pattern:

Depth Midpoints in Binary
0 [01111]
1 [00111][10111]
2 [00011][01011][10011][11011]
3 [00001][00101][01001][01101][10001][10101][11001][11101]

More specifically, look at the midpoints + 1:

Depth Midpoints + 1 in Binary
0 [𝟏0000]
1 [0𝟏000][1𝟏000]
2 [00𝟏00][01𝟏00][10𝟏00][11𝟏00]
3 [000𝟏0][001𝟏0][010𝟏0][011𝟏0][100𝟏0][101𝟏0][110𝟏0][111𝟏0]

As we can see, at a given depth one specific bit is always set, and this bit differs
from depth to depth. Also, quite interestingly, this bit is easy to calculate:

𝑏𝑖𝑡 = ⌈log₂ 𝑁⌉ − 𝑑𝑒𝑝𝑡ℎ − 1

(The minus 1 is needed since bit positions are 0-based.)

Given the bit, one can then find the required depth just as easily:

𝑑𝑒𝑝𝑡ℎ = ⌈log₂ 𝑁⌉ − 𝑏𝑖𝑡 − 1

The problem that remains is how to find this bit efficiently. Before that, we must
first explain the importance of this bit in more detail. If both 𝑙 and 𝑟 are written in
binary, this bit is the most significant set bit of 𝑙 ⊕ 𝑟, where ⊕ denotes bitwise XOR
(proof omitted). Since the most significant bit can be found in 𝑂(1), we are done!
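
The depth lookup can be sketched as a small function. The name depthFor and the parameter logLen (standing for ⌈log₂ 𝑁⌉) are assumptions made for illustration, and __builtin_clz is a GCC/Clang builtin for 32-bit values. Informally, 𝑙 and 𝑟 agree on every bit above the most significant differing bit, so they lie in the same interval at that depth, and that bit being 0 in 𝑙 and 1 in 𝑟 places them in opposite halves of it.

```cpp
// Depth at which a midpoint separates l and r (requires l != r),
// for a table padded to length 2^logLen.
int depthFor(int l, int r, int logLen) {
    // Position of the most significant set bit of l XOR r (0-based).
    int bit = 31 - __builtin_clz((unsigned)(l ^ r));
    return logLen - bit - 1;
}
```

For 𝐿𝑒𝑛𝑔𝑡ℎ = 32, the query [5, 9] gives 5 ⊕ 9 = 12 = 01100 in binary, so the bit is 3 and the depth is 5 − 3 − 1 = 1. The matching interval is [0, 15] with midpoint 7, and indeed 5 ≤ 7 < 9.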

Since the segments are halved at every level, there are 𝑂(log 𝑁) levels, and since
each level takes 𝑂(𝑁), the total memory and preprocessing time is 𝑂(𝑁 log 𝑁).
Please note that since 𝑙 ≤ 𝑚 < 𝑟, the data structure cannot directly answer queries
with 𝑙 = 𝑟. Fortunately, such queries are trivial. Please refer to the sample code for
implementation details.

    #include <vector>
    using namespace std;

    class disjointSparseTable {
    private:
        int size, levels;          // padded length, and levels = clz(size)
        vector<int> arr;           // input, padded with 0 (identity of +)
        vector<vector<int> > dst;  // dst[i][k]: entry for index i at depth k

    public:
        // Fills depth k for the segment [s, e], then recurses into both halves.
        void build(int k, int s, int e) {
            if (s == e) return;
            int m = (s + e) >> 1;
            // Left half: dst[i][k] = arr[i] + ... + arr[m]
            dst[m][k] = arr[m];
            for (int i = m - 1; i >= s; i--)
                dst[i][k] = arr[i] + dst[i + 1][k];
            // Right half: dst[i][k] = arr[m + 1] + ... + arr[i]
            if (m + 1 <= e) {
                dst[m + 1][k] = arr[m + 1];
                for (int i = m + 2; i <= e; i++)
                    dst[i][k] = dst[i - 1][k] + arr[i];
            }
            build(k + 1, s, m);
            build(k + 1, m + 1, e);
        }

        // The input must be non-empty: __builtin_clz(0) is undefined.
        disjointSparseTable(const vector<int>& in) {
            int n = (int)in.size();
            levels = __builtin_clz(n);
            size = 1 << (31 - levels);  // largest power of 2 not exceeding n
            if (n != size) {            // round up to the next power of 2
                levels--;
                size <<= 1;
            }
            arr = in;
            arr.resize(size, 0);
            // 22 columns are enough for sizes up to 2^22.
            dst.resize(size, vector<int>(22));
            build(0, 0, size - 1);
        }

        // l and r are 0-based and inclusive, with l <= r.
        int query(int l, int r) {
            if (l == r) return arr[l];
            int tmp = __builtin_clz(l ^ r);  // 31 - (most significant bit)
            int lvl = tmp - levels - 1;      // depth whose midpoint splits l, r
            return dst[l][lvl] + dst[r][lvl];
        }
    };

Figure 2. DST for Range Sum
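
Figure 2 uses addition, which has an inverse, so it does not show where the structure is really needed. As a rough sketch of the motivating case, the same idea can be adapted to products modulo an arbitrary modulus. The name ModProductDST is an assumption for illustration. The build here is iterative instead of recursive, the table is stored as dst[depth][i], inputs are assumed non-negative, and the modulus is assumed small enough (below about 3 · 10⁹) that products fit in a long long.

```cpp
#include <cstddef>
#include <vector>

// Sketch: disjoint sparse table for range products modulo mod (mod >= 1).
struct ModProductDST {
    int logLen = 0;  // table length is 2^logLen
    long long mod;
    std::vector<long long> arr;
    std::vector<std::vector<long long>> dst;  // dst[depth][i]

    ModProductDST(const std::vector<long long>& in, long long m) : mod(m) {
        while ((1u << logLen) < in.size()) logLen++;
        int len = 1 << logLen;
        arr.assign(len, 1 % mod);  // pad with the identity element
        for (std::size_t i = 0; i < in.size(); i++) arr[i] = in[i] % mod;
        dst.assign(logLen, std::vector<long long>(len));
        for (int depth = 0; depth < logLen; depth++) {
            int block = len >> depth;  // interval length at this depth
            for (int lo = 0; lo < len; lo += block) {
                int mid = lo + block / 2 - 1, hi = lo + block - 1;
                // Suffix products ending at mid.
                dst[depth][mid] = arr[mid];
                for (int i = mid - 1; i >= lo; i--)
                    dst[depth][i] = arr[i] * dst[depth][i + 1] % mod;
                // Prefix products starting at mid + 1.
                dst[depth][mid + 1] = arr[mid + 1];
                for (int i = mid + 2; i <= hi; i++)
                    dst[depth][i] = dst[depth][i - 1] * arr[i] % mod;
            }
        }
    }

    // Product of A[l..r] modulo mod, 0-based and inclusive, with l <= r.
    long long query(int l, int r) const {
        if (l == r) return arr[l];
        int bit = 31 - __builtin_clz((unsigned)(l ^ r));
        int depth = logLen - bit - 1;
        return dst[depth][l] * dst[depth][r] % mod;
    }
};
```

For the input {2, 3, 4, 5, 6, 7} with modulus 11, query(0, 5) returns 2, which is 5040 mod 11. No modular inverse is needed at any point, so the modulus does not have to be prime.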


III Sample Problems
Problems that are solvable with segment trees and have no updates are most likely
solvable with this data structure as well. To be honest, though, I have not encountered
a problem with this being the intended solution. Still, here are some sample problems:
1. CF474F – Ant Colony
2. CC SEGPROD – Product on the Segment by Modulo

