Universal-Caching Sinha22
Connection to the Caching problem: The problem of designing an online prefetcher that competes against a static benchmark can be straightforwardly reduced to the previous experts framework. For this purpose, define an instance of the experts problem with $M = \binom{N}{C}$ experts, each corresponding to a subset of $C$ files. Let the reward $r_{t,i}$ accrued by the $i$th subset at round $t$ be equal to $1$ if the $i$th expert (which corresponds to a particular subset of $C$ files) contains the file $x_t$ requested at round $t$; otherwise, $r_{t,i}$ is set to zero. A simple but computationally inefficient online caching policy can be obtained by using the HEDGE policy on the experts problem defined above. However, a major issue with this naive reduction is its computational cost, as the policy must explicitly maintain a weight for each of the exponentially many experts.

In the probability expression (5), $l_{t,i} = 1 - r_{t,i}$ is the loss of the $i$th expert at round $t$, and we have defined $w_{t-1}(i) \equiv \exp(\eta R_{t-1}(i))$, where $R_{t-1}(i) \equiv \sum_{\tau=1}^{t-1} \mathbb{1}(x_\tau = i)$ denotes the total number of times the $i$th file was requested up to time $t-1$. Both the numerator and the denominator in the probability expression (5) have exponentially many terms and are non-trivial to compute directly. A key observation made in [20] is that both the numerator and the denominator can be expressed in terms of certain elementary symmetric polynomials (ESPs), which can be efficiently evaluated. To see this, define the vectors $w_t = (w_t(i))_{i \in [N]}$ and $w_{-i,t} = (w_t(i))_{i \in [N] \setminus \{i\}}$. Let $e_k(\cdot)$ denote the ESP of order $k$, defined as follows:
$$e_k(w) = \sum_{I \subseteq [N],\, |I| = k} \;\prod_{j \in I} w_j.$$
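To make the reduction concrete, the following sketch runs HEDGE over all $\binom{N}{C}$ subset-experts. The function name `naive_hedge_cache` and the particular sampling step are our own illustration (not the paper's implementation); the point is that maintaining one weight per subset is exactly what makes this approach impractical.

```python
import math
import random
from itertools import combinations

def naive_hedge_cache(requests, N, C, eta, seed=0):
    """The naive reduction: run HEDGE with one expert per C-subset of [N].
    Maintaining comb(N, C) weights is why this policy is impractical."""
    rng = random.Random(seed)
    experts = list(combinations(range(N), C))  # one expert per cache configuration
    w = [1.0] * len(experts)
    hits = 0
    for x in requests:
        # cache the subset of an expert drawn proportionally to its weight
        S = rng.choices(experts, weights=w)[0]
        hits += x in S
        # reward r_{t,i} = 1 iff the cached subset contains the requested file
        w = [wi * math.exp(eta * (x in e)) for wi, e in zip(w, experts)]
    return hits
```

Even for modest parameters such as $N = 100$, $C = 10$, the number of experts exceeds $10^{13}$, which motivates the ESP-based reformulation that follows.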
With the above definitions in place, the probability term given in (5) can be expressed in terms of ESPs as:
$$p_t(i) = \frac{w_t(i)\, e_{C-1}(w_{-i,t})}{e_C(w_t)}. \qquad (6)$$
It is known that any ESP of order $k$ in $N$ variables can be computed efficiently in $\tilde{O}(N)$ time using FFT-based polynomial multiplication methods [26], [27].
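As a minimal sketch of this step, the following code evaluates the ESPs as the low-order coefficients of $\prod_j (1 + w_j z)$ and then computes the marginals of Eqn. (6). The function names are ours, and a simple $O(NC)$ dynamic program stands in for the FFT-based method cited above.

```python
def esp(w, k):
    """First k+1 elementary symmetric polynomials e_0..e_k of w, obtained as
    the low-order coefficients of prod_j (1 + w_j * z).  An O(N*k) dynamic
    program; the paper's FFT-based evaluation is asymptotically faster."""
    e = [1.0] + [0.0] * k
    for wj in w:
        for d in range(k, 0, -1):   # update from high degree to low
            e[d] += wj * e[d - 1]
    return e

def inclusion_probs(w, C):
    """Marginal probabilities p(i) = w_i * e_{C-1}(w_{-i}) / e_C(w), as in Eqn. (6),
    computed naively by re-evaluating the leave-one-out ESP for each i."""
    eC = esp(w, C)[C]
    return [w[i] * esp(w[:i] + w[i + 1:], C - 1)[C - 1] / eC
            for i in range(len(w))]

p = inclusion_probs([1.0, 2.0, 3.0], C=2)
# e_2(w) = 1*2 + 1*3 + 2*3 = 11 and e_1(w_{-1}) = 2 + 3 = 5, so p[0] = 5/11;
# the marginals always sum to exactly C
```

The identity $\sum_i w_i e_{C-1}(w_{-i}) = C\, e_C(w)$ guarantees that these marginals sum to $C$, which is precisely the condition needed by the sampling step described next.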
b) Sampling without replacement according to a prescribed set of inclusion probabilities: Consider the problem of efficiently sampling a subset of $C$ items without replacement from a universe of $N$ items, where the $i$th item is included in the sampled subset with a prescribed probability $p_i \in [0,1]$, $1 \le i \le N$. In other words, if the set $S$ is sampled with probability $P(S)$, then it is required that $\sum_{S: i \in S, |S| = C} P(S) = p_i, \;\forall i \in [N]$. Given that the inclusion probabilities satisfy the necessary and sufficient condition $\sum_{i=1}^{N} p_i = C$, the sampling problem can be efficiently solved using Madow's systematic sampling procedure, given below [21].

Algorithm 1 Madow's sampling
Input: Set $[N]$, size of the sampled set $C$, inclusion probabilities $p$.
Output: A random set $S$ containing $C$ elements s.t. $P(i \in S) = p_i, \forall i$.
1: Let $P_0 = 0$ and $P_i = P_{i-1} + p_i, \forall i \in [N]$
2: Sample a uniform random variable $U \in [0, 1]$.
3: $S \leftarrow \emptyset$
4: for $i \leftarrow 0$ to $C - 1$ do
5:   Select element $j$ if $P_{j-1} \le U + i < P_j$
6:   $S \leftarrow S \cup \{j\}$
7: end for
8: return $S$
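Algorithm 1 can be sketched directly in code. This is our own illustrative transcription (the function name `madow_sample` is hypothetical); note that a single uniform draw $U$ drives all $C$ selections.

```python
import random

def madow_sample(p, rng=random):
    """Madow's systematic sampling (Algorithm 1): returns a set S with
    sum(p) elements such that P(i in S) = p[i] exactly.
    Requires sum(p) to be an integer C."""
    C = round(sum(p))
    # cumulative totals P_0 = 0, P_i = P_{i-1} + p_i
    P = [0.0]
    for pi in p:
        P.append(P[-1] + pi)
    U = rng.random()            # one uniform draw shared by all C selections
    S = set()
    for i in range(C):
        # select the unique j with P_{j-1} <= U + i < P_j
        for j in range(1, len(P)):
            if P[j - 1] <= U + i < P[j]:
                S.add(j - 1)    # 0-indexed item
                break
    return S
```

A quick empirical check with $p = (0.5, 0.75, 0.75)$ confirms that every sample has exactly two elements and that each item's empirical inclusion frequency matches its prescribed $p_i$.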
Combining parts (a) and (b), the overall SAGE caching policy is summarized in Algorithm 2.

Algorithm 2 Online Caching with the SAGE framework
Input: $w(i) = 1, \forall i \in [N]$; learning rate $\eta > 0$.
Output: A subset of $C$ cached files at every round.
1: for $t = 1, 2, \ldots, T$ do
2:   $w(i) \leftarrow \exp(\eta \mathbb{1}\{x_{t-1} = i\})\, w(i), \;\forall i \in [N]$.
3:   $p(i) \leftarrow \frac{w(i)\, e_{C-1}(w_{-i})}{e_C(w)}, \;\forall i \in [N]$.  ▷ Efficient evaluation using FFT
4:   Sample a set of $C$ files, with marginal inclusion probabilities $p$ computed as above, using Madow's sampling (Algorithm 1).
5: end for
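Putting the pieces together, Algorithm 2 can be sketched as follows. This is a simplified, brute-force re-implementation for illustration only: the helper names are ours, and the ESPs are evaluated by direct enumeration rather than by the FFT method the paper prescribes.

```python
import math
import random
from itertools import combinations

def e_k(w, k):
    """Order-k elementary symmetric polynomial (direct enumeration; fine for tiny N)."""
    return sum(math.prod(c) for c in combinations(w, k))

def madow(p, rng):
    """Madow's systematic sampling (Algorithm 1)."""
    P = [0.0]
    for pi in p:
        P.append(P[-1] + pi)
    U, S = rng.random(), set()
    for i in range(round(sum(p))):
        for j in range(1, len(P)):
            if P[j - 1] <= U + i < P[j]:
                S.add(j - 1)
                break
    return S

def sage_cache(requests, N, C, eta, seed=0):
    """Algorithm 2 (sketch): online caching with the SAGE framework."""
    rng = random.Random(seed)
    w = [1.0] * N
    hits = 0
    for t, x in enumerate(requests):
        if t > 0:                       # step 2: reward the previously requested file
            w[requests[t - 1]] *= math.exp(eta)
        eC = e_k(w, C)                  # step 3: marginals of Eqn. (6)
        p = [w[i] * e_k(w[:i] + w[i + 1:], C - 1) / eC for i in range(N)]
        S = madow(p, rng)               # step 4: sample C files without replacement
        hits += x in S
    return hits
```

The design point is that steps 3 and 4 together turn the intractable HEDGE-over-subsets policy into per-round work that is polynomial in $N$, while inducing exactly the same marginal caching probabilities.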
Static regret bound for the SAGE policy: Recall that the quantity $\tilde{\pi}_1(x^T)$ denotes the hit rate achieved by the optimal offline FSP containing a single state. By tuning the learning rate $\eta$ adaptively, the HEDGE policy achieves the following data-dependent regret bound [20, Eqn. (14)]:
$$T(\tilde{\pi}_1 - \pi^{\mathrm{HEDGE}}) \le \sqrt{2C l_T^* \ln(Ne/C)} + C \ln(Ne/C), \qquad (7)$$
where $l_T^* \equiv T - T\tilde{\pi}_1(x^T)$ is the cumulative number of cache misses incurred by the optimal offline caching configuration in hindsight. From the above discussion, it is clear that the SAGE policy also achieves the regret bound (7). Since $l_T^* \le T$, Eqn. (7) trivially yields a sublinear $O(\sqrt{T})$ static regret bound for the online caching problem. Hence, our regret bound (7) improves upon the previous $O(NC^2\sqrt{T \log T})$ regret bound of the prefetcher (referred to as "P1") proposed by [1, Theorem 1, Lemma 3]. More importantly, Eqn. (7) gives what is known as a "small-loss" bound [28]. In particular, for any request sequence on which the optimal static offline policy concedes a small number of cache misses (i.e., $l_T^* \ll T$), Eqn. (7) provides a much tighter bound. We will exploit the small-loss bound in our subsequent analysis.

C. Augmenting the SAGE policy with a Markovian Prefetcher

Equation (7) gives an upper bound on the static regret for the SAGE caching policy against all offline static prefetchers, where the action of the benchmark policy does not change with time. Now consider any given FSM $\mathcal{M}$ containing $S$ states. For each state $s \in [S]$, let $x_s$ be the subsequence of the original file requests obtained by aggregating the requests made while the FSM $\mathcal{M}$ was visiting the state $s$. Upon running a separate copy of the SAGE policy for each state of the given FSM $\mathcal{M}$, we obtain the following regret bound:
$$R_T = T\big(\tilde{\pi}_S^{\mathcal{M}}(x^T) - \pi_S^{\mathrm{SAGE}}(x^T)\big) = T \sum_{s=1}^{S} \big(\tilde{\pi}_1^{\mathcal{M}}(x_s) - \pi_1^{\mathrm{SAGE}}(x_s)\big) \stackrel{(a)}{\le} \sum_{s=1}^{S} \sqrt{2C\, l_{T,s}^* \ln(Ne/C)} + CS \ln(Ne/C) \stackrel{(b)}{\le} \sqrt{2CS\, L_{T,S}^* \ln(Ne/C)} + CS \ln(Ne/C), \qquad (8)$$
where $l_{T,s}^*$ denotes the total number of cache misses incurred in state $s$ by the optimal offline single-state prefetcher, and $L_{T,S}^* \equiv \sum_{s=1}^{S} l_{T,s}^*$. In the above, inequality (a) follows from Eqn. (7) applied to each of the $S$ states of the FSM $\mathcal{M}$ separately, and inequality (b) follows from an application of Jensen's inequality. Specializing the bound (8) to a $k$th order Markov prefetcher containing $S = N^k$ many states, we obtain
$$T\big(\tilde{\mu}_k(x^T) - \pi_k^{\mathrm{SAGE}}(x^T)\big) \le \sqrt{2N^k C\, L_{T,k}^* \ln\frac{Ne}{C}} + N^k C \ln\frac{Ne}{C}, \qquad (9)$$
where $L_{T,k}^*$ denotes the minimum number of cache misses incurred by the optimal $k$th order Markovian prefetcher for the file request sequence $x^T$. Note that the cumulative cache misses $L_{T,k}^*$ could be much smaller than the horizon length $T$ for many "regular" request sequences. Hence, Theorem 2 gives a new and tighter adaptive regret bound compared to the previously known, weaker $O(\sqrt{T})$ bound given by [18, Eqn. (24)].

EXAMPLE 1: Consider a "regular" request sequence $x^T$ generated by an $m$th order Markovian FSM. By taking $k \ge m$, we can ensure that $L_{T,k}^*(x^T) = 0$ for this sequence. Hence, in this case, the first term on the RHS of the bound (9) vanishes, resulting in $O(1)$ regret.

Combining the regret bound (8) with Theorem 1, we have the following guarantee against any FSM containing $S$ many states:

Theorem 2. For any file request sequence $x^T$, the regret of the $k$th order Markovian FSM running a separate copy of the SAGE caching policy on each state, compared to an optimal offline FSP containing at most $S$ states, is upper-bounded as:
$$R_T \equiv T\big(\tilde{\pi}_S(x^T) - \pi_k^{\mathrm{SAGE}}(x^T)\big) \le T \min\Big(1 - \frac{C}{N},\ \sqrt{\frac{\ln S}{2(k+1)}}\Big) + \sqrt{2N^k C\, L_{T,k}^* \ln\frac{Ne}{C}} + N^k C \ln\frac{Ne}{C}.$$
EXAMPLE 2: Consider a file request sequence $x_Q^T$ generated by any FSM containing at most $Q$ states. The FSM need not be Markovian (cf. EXAMPLE 1). Refer to Section VIII-A of the Appendix for details on the request sequence generation. Combining Theorem 1 and Theorem 2, the expected fraction of cache misses conceded by the SAGE policy used with a $k$th order Markovian FSM is at most
$$\min\Big(1 - \frac{C}{N},\ \sqrt{\frac{\ln Q}{2(k+1)}}\Big) + \sqrt{\frac{2N^k C}{T} \ln\frac{Ne}{C}\, \min\Big(1 - \frac{C}{N},\ \sqrt{\frac{\ln Q}{2(k+1)}}\Big)} + \frac{N^k C}{T} \ln\frac{Ne}{C}, \qquad (10)$$
where we have used the fact that an optimal FSM with $Q$ many states incurs zero cache misses for the request sequence $x_Q^T$. If the value of $Q$ is known (while the structure of the FSM remains unknown), the optimal order $k^*$ of the Markovian prefetcher minimizing the upper bound in Eqn. (10) can be computed using calculus. From Eqn. (10), it also follows that for any fixed value of $Q$, the expected fraction of cache misses can be made to approach zero at the rate of $O(T^{-1/2})$ by taking $k \gg \ln Q$. See Section VIII for the numerical results.
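In place of the calculus, the upper bound in Eqn. (10) can also be minimized over $k$ by a direct numerical sweep. The sketch below (function name `miss_bound` is ours) evaluates the RHS of (10) for a small range of orders and picks the minimizer.

```python
import math

def miss_bound(k, N, C, Q, T):
    """Right-hand side of Eqn. (10): upper bound on the expected fraction of
    cache misses of SAGE with a k-th order Markovian FSM, against an
    unknown FSM with at most Q states."""
    m = min(1 - C / N, math.sqrt(math.log(Q) / (2 * (k + 1))))
    lg = math.log(N * math.e / C)
    return m + math.sqrt(2 * N**k * C * lg * m / T) + N**k * C * lg / T

# pick k* by direct search instead of calculus
N, C, Q, T = 3, 2, 50, 10_000_000
k_star = min(range(1, 13), key=lambda k: miss_bound(k, N, C, Q, T))
```

The trade-off visible in the sweep mirrors the analysis: the first term of (10) shrinks as $k$ grows, while the $N^k$ factors in the remaining terms eventually dominate.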
In the following section, we design a universal caching policy that achieves a sublinear $O((\log T)^{-1/2})$ regret bound for all file request sequences against any FSP containing unknown and arbitrarily many states $S$.
IV. A UNIVERSAL CACHING POLICY

In Theorem 2, we are free to choose the order $k$ of the Markovian FSM as a function of the (known) horizon length $T$. Since the number of states $S$ in the benchmark comparator could be arbitrarily large, it is clear that in order to achieve asymptotically zero regret (normalized w.r.t. the horizon length $T$), the order $k$ of the Markovian prefetcher needs to be increased accordingly with $T$. By setting $N^k = O(\frac{T}{\log T})$ in Theorem 2, we obtain the following bound on the regret against all FSPs having arbitrarily many states: $\frac{R_T}{T} \le O\big(\frac{1}{\sqrt{\log T}}\big)$.

Efficient implementation using Lempel-Ziv (LZ) parsing: Similar to the binary prediction problem considered in [18], we can use incremental parsing algorithms, such as Lempel-Ziv'78 [19], to adaptively increase the order of the Markovian prefetcher when the horizon length $T$ is not known a priori [1], [29]. However, instead of constructing a binary parse tree as in [18], we build an $N$-ary tree with the SAGE policy running at each node. In particular, the LZ parsing algorithm parses the $N$-ary request sequence into distinct phrases such that each phrase is the shortest phrase that has not been parsed previously. In the parse tree, each new phrase corresponds to a leaf. The parsing proceeds as follows: the LZ tree is initialized with a root node and $N$ leaves. The current tree is used to create the next phrase by following the path from the root to a leaf according to the consecutive file requests. Once a leaf node is reached, the tree is extended by making the leaf an internal node and adding its $N$ offspring to the tree, after which the parser moves back to the root. Each node of the tree now corresponds to a state of the Markovian prefetcher and runs a separate instance of the SAGE policy. A classical result, established in [30, Theorem 2], shows that the number of nodes in an $N$-ary LZ tree generated by an arbitrary sequence of length $T$ grows sublinearly as $c(T) = O(\frac{T \log N}{\log T})$. Hence, for any fixed $k$, the fraction of file requests made at nodes of depth less than $k$ vanishes asymptotically. Consequently, the expected fraction of cache hits $\pi^{LZ}$ achieved by the LZ prefetcher is asymptotically lower-bounded by that of a $k$th order Markovian FSP containing $N^k \approx c(T)$ states, up to a sublinear regret term. The following theorem makes this statement precise.

Theorem 3. For any integer $k \ge 0$, the regret of the LZ prefetcher w.r.t. an offline $k$th order Markovian prefetcher can be upper-bounded as:
$$R_T \equiv T(\tilde{\mu}_k - \pi^{LZ}) \le \delta\big(c(T), L_T^{*,LZ}\big) + k\, c(T),$$
where $c(T) \equiv O(\frac{T \log N}{\log T})$ and $\delta(B, l) \equiv \sqrt{2BC\, l \ln(Ne/C)} + CB \ln(Ne/C)$.

See Section VII-B of the Appendix for the proof.
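The $N$-ary LZ'78 parse-tree construction described above can be sketched as follows. This is a minimal illustration: children are created lazily on first visit rather than pre-allocating all $N$ offspring, and the per-node SAGE instances are omitted.

```python
def lz_parse_tree(requests):
    """Build the N-ary LZ'78 parse tree: follow the current phrase down the
    tree; on reaching a missing child, extend the tree there and restart
    the next phrase at the root.  Each node would host one SAGE instance;
    the alphabet size N is implicit in the range of the request symbols."""
    root = {}
    node = root
    nodes = 1                     # counts the root as well
    for x in requests:
        if x in node:
            node = node[x]        # phrase continues down the tree
        else:
            node[x] = {}          # phrase ends: the leaf becomes a new node
            nodes += 1
            node = root           # the next phrase starts at the root
    return root, nodes

tree, n = lz_parse_tree([0, 1, 0, 1, 1])
# complete phrases: 0 | 1 | 01  (the trailing 1 is an unfinished phrase),
# giving 3 non-root nodes plus the root
```

The sublinear node count $c(T) = O(T \log N / \log T)$ quoted from [30] is what keeps the total bookkeeping of the per-node SAGE copies manageable.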
V. CONCLUSION

In this paper, we proposed an efficient online universal caching policy that results in sublinear regret against all finite-state prefetchers containing arbitrarily many states. We presented the first data-dependent regret bound for the universal caching problem by making use of the SAGE framework [20]. In the future, it will be interesting to extend these techniques to other online learning problems to design policies with improved regret guarantees. Furthermore, designing universal algorithms that also guarantee sublinear dynamic regret and take into account the file download costs will be of interest.

VI. ACKNOWLEDGMENT

This work was partly supported by a grant from the DST-NSF India-US collaborative research initiative under the TIH at the Indian Statistical Institute at Kolkata, India.
REFERENCES

[1] P. Krishnan and J. S. Vitter, "Optimal prediction for prefetching in the worst case," SIAM Journal on Computing, vol. 27, no. 6, pp. 1617–1636, 1998.
[2] R. Bhattacharjee, S. Banerjee, and A. Sinha, "Fundamental limits on the regret of online network-caching," Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 2, Jun. 2020. [Online]. Available: https://doi.org/10.1145/3392143
[3] G. S. Paschos, A. Destounis, L. Vigneri, and G. Iosifidis, "Learning to cache with no regrets," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2019, pp. 235–243.
[4] D. Paria and A. Sinha, "LeadCache: Regret-optimal caching in networks," Advances in Neural Information Processing Systems, vol. 34, 2021.
[5] S. Mukhopadhyay and A. Sinha, "Online caching with optimal switching regret," in 2021 IEEE International Symposium on Information Theory (ISIT). IEEE, 2021, pp. 1546–1551.
[6] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
[7] G. Paschos, G. Iosifidis, G. Caire et al., "Cache optimization models and algorithms," Foundations and Trends® in Communications and Information Theory, vol. 16, no. 3–4, pp. 156–345, 2020.
[8] Y. Li, T. Si Salem, G. Neglia, and S. Ioannidis, "Online caching networks with adversarial guarantees," Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 5, no. 3, pp. 1–39, 2021.
[9] M. Herbster and M. K. Warmuth, "Tracking the best expert," Machine Learning, vol. 32, no. 2, pp. 151–178, 1998.
[10] N. Cesa-Bianchi, P. Gaillard, G. Lugosi, and G. Stoltz, "Mirror descent meets fixed share (and feels no regret)," Advances in Neural Information Processing Systems, vol. 25, 2012.
[11] L. Chen, Q. Yu, H. Lawrence, and A. Karbasi, "Minimax regret of switching-constrained online convex optimization: No phase transition," Advances in Neural Information Processing Systems, vol. 33, pp. 3477–3486, 2020.
[12] E. Hazan and C. Seshadhri, "Adaptive algorithms for online decision problems," in Electronic Colloquium on Computational Complexity (ECCC), vol. 14, no. 088, 2007.
[13] A. Daniely, A. Gonen, and S. Shalev-Shwartz, "Strongly adaptive online learning," in International Conference on Machine Learning. PMLR, 2015, pp. 1405–1411.
[14] L. Zhang, T. Yang, Z.-H. Zhou et al., "Dynamic regret of strongly adaptive methods," in International Conference on Machine Learning. PMLR, 2018, pp. 5882–5891.
[15] D. Adamskiy, W. M. Koolen, A. Chernov, and V. Vovk, "A closer look at adaptive regret," in International Conference on Algorithmic Learning Theory. Springer, 2012, pp. 290–304.
[16] O. Besbes, Y. Gur, and A. Zeevi, "Non-stationary stochastic optimization," Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
[17] A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, "Online optimization: Competing with dynamic comparators," in Artificial Intelligence and Statistics. PMLR, 2015, pp. 398–406.
[18] M. Feder, N. Merhav, and M. Gutman, "Universal prediction of individual sequences," IEEE Transactions on Information Theory, vol. 38, no. 4, pp. 1258–1270, 1992.
[19] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530–536, 1978.
[20] S. Mukhopadhyay, S. Sahoo, and A. Sinha, "k-experts - Online policies and fundamental limits," in International Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 342–365.
[21] W. G. Madow et al., "On the theory of systematic sampling, II," The Annals of Mathematical Statistics, vol. 20, no. 3, pp. 333–354, 1949.
[22] Y. Tillé, Sampling Algorithms. Springer, 2006.
[23] G. Pandurangan and W. Szpankowski, "A universal online caching algorithm based on pattern matching," Algorithmica, vol. 57, no. 1, pp. 62–73, 2010.
[24] V. Vovk, "A game of prediction with expert advice," Journal of Computer and System Sciences, vol. 56, no. 2, pp. 153–173, 1998.
[25] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
[26] A. Shpilka, "Lower bounds for small depth arithmetic and boolean circuits," 2001.
[27] V. Grolmusz, "Computing elementary symmetric polynomials with a subpolynomial number of multiplications," SIAM Journal on Computing, vol. 32, no. 6, pp. 1475–1487, 2003.
[28] T. Lykouris, K. Sridharan, and É. Tardos, "Small-loss bounds for online learning with partial information," in Conference on Learning Theory. PMLR, 2018, pp. 979–986.
[29] J. S. Vitter and P. Krishnan, "Optimal prefetching via data compression," Journal of the ACM (JACM), vol. 43, no. 5, pp. 771–793, 1996.
[30] A. Lempel and J. Ziv, "On the complexity of finite sequences," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 75–81, 1976.
APPENDIX

VII. PROOFS

A. Proof of Theorem 1

Consider an FSP $\mathcal{M}$, containing $S$ states, whose state transition function is given by $g$. For any integer $j \ge 1$, construct a new FSP $\tilde{\mathcal{M}}$ whose state at round $t$ is given by $\tilde{s}_t = (s_t, x_{t-j}^{t-1})$. In other words, the states of the FSP $\tilde{\mathcal{M}}$ are constructed by juxtaposing the states of $\mathcal{M}$ with those of the $j$th order Markov prefetcher. The transition function of $\tilde{\mathcal{M}}$ is naturally defined as $\tilde{g}((s_t, x_{t-j}^{t-1}), x_t) = (g(s_t), x_{t-j+1}^{t})$. Consequently, if both the FSPs $\mathcal{M}$ and $\tilde{\mathcal{M}}$ are fed with the same file request sequence $x$, the current state of the FSP $\mathcal{M}$ can be read off from the first component of the current state of the FSP $\tilde{\mathcal{M}}$. Let $\tilde{\mu}_{j,S}(x^T)$ be the hit rate achieved by the new FSP $\tilde{\mathcal{M}}$ for the given file request sequence. Define a sequence of random variables $(X_1, X_2, \ldots)$, which are distributed according to the empirical probability measure induced by the consecutive terms of the given file request sequence $x_1^T$. In other words, for any natural number $k$, we define the joint distribution $P\big((X_1, X_2, \ldots, X_k) = (z_1, z_2, \ldots, z_k)\big)$ to be the empirical frequency of the length-$k$ pattern $(z_1, \ldots, z_k)$ among the consecutive terms of $x_1^T$. We then have:
$$\le \mathbb{E}_{S, X^j}\, \mathrm{TV}\big(p_{S,X^j}(X_{j+1}),\, p_{X^j}(X_{j+1})\big) \stackrel{(b)}{\le} \mathbb{E}_{S,X^j} \sqrt{\frac{\ln 2}{2}\, D\big(P(X_{j+1}|S,X^j)\,\|\,P(X_{j+1}|X^j)\big)} \stackrel{(c)}{\le} \sqrt{\frac{\ln 2}{2}\, \mathbb{E}_{S,X^j}\, D\big(P(X_{j+1}|S,X^j)\,\|\,P(X_{j+1}|X^j)\big)} = \sqrt{\frac{\ln 2}{2}\big(H(X_{j+1}|X^j) - H(X_{j+1}|X^j, S)\big)}, \qquad (11)$$
where (a) follows from the sub-additivity of the $\max(\cdot)$ function, (b) follows from Pinsker's inequality, and (c) follows from the concavity of the square-root function and Jensen's inequality. Furthermore, we also have
$$\tilde{\mu}_{j,S}(x^T) - \tilde{\mu}_j(x^T) = \mathbb{E}_{S,X^j}\Big[\max_{A: |A| = C} \sum_{i \in A} p_{S,X^j}(X_{j+1} = i) - \max_{A: |A| = C} \sum_{i \in A} p_{X^j}(X_{j+1} = i)\Big]$$
VIII. EXPERIMENTS

In this section, we report some simulation results demonstrating the practical efficacy of the proposed universal caching policy. In our simulations, we use a synthetic file request sequence generated by a randomly constructed Finite State Machine. The structure and the number of states of the FSM remain hidden from the online policies that we evaluate. The details of the setup and the simulation results are discussed below.

[Figure: (a) N = 3, C = 2, Q = 50, T = 10M; x-axis: order k of the Markov predictor]