
Min-max Submodular Ranking for Multiple Agents∗


Qingyun Chen^1, Sungjin Im^1, Benjamin Moseley^2, Chenyang Xu^{3,4}, and Ruilong Zhang^5

^1 Electrical Engineering and Computer Science, University of California at Merced
^2 Tepper School of Business, Carnegie Mellon University
^3 Software Engineering Institute, East China Normal University
^4 College of Computer Science, Zhejiang University
^5 Department of Computer Science, City University of Hong Kong

qchen41@ucmerced.edu, sim3@ucmerced.edu, moseleyb@andrew.cmu.edu, xcy1995@zju.edu.cn, ruilzhang4-c@my.cityu.edu.hk

arXiv:2212.07682v1 [cs.DS] 15 Dec 2022

Abstract
In the submodular ranking (SR) problem, the input consists of a set of submodular functions defined on a ground set of elements. Choosing one element at a time, the goal is to order the elements so that, on average, the functions reach a given threshold value as early as possible. The problem is flexible enough to capture various applications in machine learning, including decision trees.
This paper considers the min-max version of SR where multiple instances share the ground set.
With the view of each instance being associated with an agent, the min-max problem is to order the
common elements to minimize the maximum objective of all agents—thus, finding a fair solution for
all agents. We give approximation algorithms for this problem and demonstrate their effectiveness
in the application of finding a decision tree for multiple agents.

∗ All authors (ordered alphabetically) contributed equally and are corresponding authors.
† This work was done when the author was a student at Zhejiang University.
‡ This work was done when the author visited Carnegie Mellon University.

1 Introduction
The submodular ranking (SR) problem was proposed by [3]. The problem includes a ground set of elements [n] and a collection of m monotone submodular^1 set functions f_1, ..., f_m defined over [n], i.e., f_j : 2^[n] → R_{≥0} for all j ∈ [m]. Each function f_j is additionally associated with a positive weight w_j ∈ R_{>0}. Given a permutation π of the ground set of elements, the cover time of a function f_j is defined to be the minimum number of elements in the prefix of π that together form a set whose function value exceeds a unit threshold^2. The goal is to find a permutation of the elements such that the weighted average cover time is minimized.
The SR problem has a natural interpretation. Suppose there are m types of clients who are char-
acterized by their utility function {fj }j∈[m] , and we have to satisfy a client sampled from a known
probability distribution {wj }j∈[m] . Without knowing her type, we need to sequentially present items
to satisfy her, corresponding to her utility fj having a value above a threshold. Thus, the goal becomes
finding the best ordering to minimize the average number of items shown until satisfying the randomly
chosen client.
In the min-max submodular ranking for multiple agents, we are in the scenario where there are k
SR instances sharing the common ground set of elements, and the goal is to find a common ordering
that minimizes the maximum objective over all instances. Using the above analogy, suppose there are k different groups of clients. We have k clients to satisfy, one sampled from each group. Then, we want to satisfy all k clients as early as possible. In other words, we want to commit to an ordering that
satisfies all groups fairly—formalized in a min-max objective.

1.1 Problem Definition


The problem of min-max submodular ranking for multiple agents (SRMA) is formally defined as follows.
An instance of this problem consists of a ground set U := [n] and a set of k agents A := {a_1, ..., a_k}. Every agent a_i, i ∈ [k], is associated with a collection of m monotone submodular set functions f_1^{(i)}, ..., f_m^{(i)}, where f_j^{(i)} : 2^[n] → [0, 1] and f_j^{(i)}(U) = 1 for all i ∈ [k], j ∈ [m]. In addition, every function f_j^{(i)} is associated with a weight w_j^{(i)} > 0 for all i ∈ [k], j ∈ [m].

Given a permutation π of the ground elements, the cover time cov(f_j^{(i)}, π) of f_j^{(i)} in π is defined as the smallest index t such that the function f_j^{(i)} has value 1 on the first t elements in the given ordering. The goal is to find a permutation of the ground elements that minimizes the maximum total weighted cover time among all agents, i.e., finding a permutation π such that max_{i∈[k]} { Σ_{j=1}^m w_j^{(i)} · cov(f_j^{(i)}, π) } is minimized. We assume that the minimum non-zero marginal value of all m · k functions is ε > 0, i.e., for any S ⊆ S′, f_j^{(i)}(S′) > f_j^{(i)}(S) implies that f_j^{(i)}(S′) − f_j^{(i)}(S) ≥ ε > 0 for all i ∈ [k], j ∈ [m]. Without loss of generality, we assume that every w_j^{(i)} ≥ 1 and let W := max_{i∈[k]} Σ_{j=1}^m w_j^{(i)} be the maximum total weight among all agents.

1.2 Applications
In addition to the aforementioned natural interpretation, we discuss two concrete applications below in
detail.

Optimal Decision Tree (ODT) with Multiple Probability Distributions. In the (non-adaptive) optimal decision tree problem with multiple probability distributions, we are given k probability distributions {p_j^{(i)}}_{j∈[m]}, i ∈ [k], over m hypotheses and a set U := [n] of binary tests. For each i ∈ [k], there is exactly one unknown hypothesis j̃(i) drawn from {p_j^{(i)}}_{j∈[m]}. The outcome of each test e ∈ U is a partition of the m hypotheses. Our goal is to find a permutation of U that minimizes the expected number of tests needed to identify all k sampled hypotheses j̃(i), i ∈ [k].

^1 A function f : 2^[n] → R is submodular if for all A, B ⊆ [n] we have f(A) + f(B) ≥ f(A ∩ B) + f(A ∪ B). The function is monotone if f(A) ≤ f(B) for all A ⊆ B.
^2 If the threshold is not unit-valued, one can easily obtain an equivalent instance by normalizing the corresponding function.
This problem generalizes the traditional optimal decision tree problem which assumes k = 1. The
problem has the following motivation. Suppose there are m possible diseases and their occurrence rate
varies depending on the demographics. A diagnosis process should typically be unified and the process
should be fair to all demographic groups. If we adopt the min-max fairness notion, the goal then
becomes to find a common diagnostic protocol that successfully diagnoses the disease to minimize the
maximum diagnosis time for any of the k groups. The groups are referred to as agents in our problem.
As observed in [19], the optimal decision tree is a special case of submodular ranking. To see this, fix agent a_i; for notational simplicity we drop a_i. Let T_j(e) be the set of hypotheses that have an outcome different from j's for test e. For each hypothesis j, we define a monotone submodular function f_j : 2^U → [0, 1] with weight p_j as follows:

    f_j(S) = (1 / (m − 1)) · | ⋃_{e∈S} T_j(e) |,

which is the fraction of hypotheses other than j that have been ruled out by a set of tests S. Then, hypothesis j̃ can be identified if and only if f_{j̃}(S) = 1.
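As a concrete illustration, the functions f_j from this reduction can be built directly from a table of test outcomes. The following is a minimal sketch; the encoding `outcomes[e][h]` (the outcome of test e under hypothesis h) and the function names are our own assumptions, not notation from the paper.

```python
def make_odt_function(j, outcomes, m):
    """Build f_j for hypothesis j: the fraction of the other m - 1
    hypotheses ruled out by a set of tests S.  Here `outcomes[e][h]`
    is the (hypothetical) outcome of test e under hypothesis h."""
    def f(S):
        ruled_out = set()
        for e in S:
            for h in range(m):
                # T_j(e): hypotheses whose outcome on test e differs from j's.
                if h != j and outcomes[e][h] != outcomes[e][j]:
                    ruled_out.add(h)
        return len(ruled_out) / (m - 1)
    return f
```

Each returned f is a coverage function (the size of a union, scaled), hence monotone and submodular, and f_j(S) = 1 exactly when every other hypothesis has been ruled out.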

Web Search Ranking with Multiple User Groups. Besides fair decision trees, our problem also captures several practical applications, as discussed in [3]. Here, we describe the application of web search ranking with multiple user groups. Users belong to k user groups and are drawn from a given distribution. We would like to display the search results sequentially from top to bottom, and we assume that each user browses the results in that order. A user is satisfied once she has found sufficiently many web pages relevant to her search, and her satisfaction time is the number of search results she has checked. The satisfaction time of a user group is defined as the total satisfaction time of all users in that group. The min-max objective ensures fairness among the different groups of users.

1.3 Our Contributions


To our knowledge, this is the first work that studies submodular ranking, or related problems such as optimal decision trees, in the presence of multiple agents (or groups). Finding an ordering that treats all agents equally is critical for fairness, because the optimal ordering can differ greatly from agent to agent.
We first consider a natural adaptation of the normalized greedy algorithm [3], which is the best algorithm for submodular ranking. We show that the adaptation is an O(k ln(1/ε))-approximation and has an Ω(√k) lower bound on its approximation ratio.
To get rid of the polynomial dependence on k, we then develop a new algorithm, which we term balanced adaptive greedy. Our new algorithm overcomes the polynomial dependence on k by carefully balancing agents in each phase. To illustrate the idea, assume for simplicity in this paragraph that all weights associated with the functions are 1. We iteratively set checkpoints: in the p-th iteration, we ensure that we satisfy all except about m/2^p functions for each agent. By balancing the progress among different agents, we obtain an O(ln(1/ε) log(min{n, m}) log k)-approximation (Theorem 3), where n is the number of elements to be ordered. When min{n, m} = o(2^k), the balanced adaptive greedy algorithm has a better approximation ratio than the natural adaptation of the normalized greedy algorithm. For the case min{n, m} = Ω(2^k), we can simply run both algorithms and pick the better solution, which yields an algorithm that is always at least as good as the natural normalized greedy algorithm.
We complement our result by showing that it is NP-hard to approximate the problem within a factor of Ω(ln(1/ε) + log k) (Theorem 4). While we show a tight Θ(log k)-approximation for generalized min-sum set cover over multiple agents, which is a special case of our problem (see Theorem 5), closing the gap in the general case is left as an open problem.

We demonstrate that our algorithm outperforms other baseline algorithms for real-world data sets.
This shows that the theory is predictive of practice. The experiments are performed for optimal decision
trees, which is perhaps the most important application of submodular ranking.

1.4 Related Works


Submodular Optimization. Submodularity commonly arises in various practical scenarios. It is best characterized by diminishing marginal gains, a phenomenon observed across disciplines. Particularly in machine learning, submodularity is useful since a considerable number of problems can be cast as submodular optimization, and (continuous extensions of) submodular functions can serve as regularizers [5]. Due to the extensive literature on submodularity, we only discuss the most relevant work here. Given a monotone submodular function f : 2^U → Z_{≥0}, choosing a set S ⊆ U of minimum cardinality such that f(S) ≥ T for some target T ≥ 0 admits an O(log T)-approximation [23], which can be achieved by iteratively choosing the element that increases the function value the most, i.e., the element with the maximum marginal gain. Alternatively, if f has range [0, 1], T = 1, and the minimum non-zero marginal increase is at least ε > 0, we can obtain an O(ln(1/ε))-approximation.
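The greedy rule just described can be sketched in a few lines. This is a minimal sketch under our own encoding: `f` is assumed to be an oracle mapping a Python set to its function value, and the function name is ours.

```python
def greedy_submodular_cover(U, f, T):
    """Iteratively add the element with the largest marginal gain until
    f(S) >= T; an O(log T)-approximation for integer-valued monotone
    submodular f, as discussed above.  Assumes f(U) >= T is reachable."""
    S = set()
    while f(S) < T:
        # Pick the element with the maximum marginal gain f(S + e) - f(S).
        e = max((x for x in U if x not in S), key=lambda x: f(S | {x}) - f(S))
        if f(S | {e}) == f(S):  # no remaining element makes progress
            break
        S.add(e)
    return S
```

For example, with f the coverage function of a set system (the size of the union of the chosen subsets), this is exactly the classical greedy set cover algorithm.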

Submodular Ranking. The submodular ranking problem was introduced by [3], who gave an elegant greedy algorithm that achieves an O(ln(1/ε))-approximation and provided an asymptotically matching lower bound. The key idea was to renormalize the submodular functions over the course of the algorithm. That is, if S is the set of elements chosen so far, we choose an element e maximizing Σ_{f∈F} w_f · (f(S ∪ {e}) − f(S)) / (1 − f(S)), where F is the set of uncovered functions. Thus, even if a function f_j is nearly covered, the algorithm still considers f_j equally by renormalizing its gain by the residual to full coverage, i.e., 1 − f_j(S). Later, [18] gave a simpler analysis of the same greedy algorithm and extended it to other settings involving metrics. Special cases of submodular ranking include min-sum set cover [14] and generalized min-sum set cover [4, 7]. For stochastic extensions of the problem, see [18, 2, 19].

Optimal Decision Tree. The optimal decision tree problem (with one distribution) has been extensively studied. The best known approximation ratio for the problem is O(log m) [16], where m is the number of hypotheses, and this is asymptotically optimal unless P = NP [9]. As discussed above, [19] showed how the optimal decision tree problem is captured by submodular ranking. For applications of optimal decision trees, see [12]. Submodular ranking only captures non-adaptive optimal decision trees; for adaptive and noisy decision trees, see [15, 19].

Fair Algorithms. Fairness is an increasingly important criterion in various machine learning appli-
cations [8, 10, 21, 17, 1], yet certain fairness conditions cannot be satisfied simultaneously [20, 11]. In
this paper, we take the min-max fairness notion, which is simple yet widely accepted. For min-max, or
max-min fairness, see [22].

2 Warm-up Algorithm: A Natural Adaptation of Normalized Greedy

The normalized greedy algorithm [3] obtains the best possible approximation ratio of O(ln(1/ε)) for the traditional (single-agent) submodular ranking problem, where ε is the minimum non-zero marginal value of the functions. The algorithm picks elements sequentially, and the final element sequence gives a permutation of all elements. Each time, the algorithm chooses an element e maximizing Σ_{f∈F} w_f · (f(S ∪ {e}) − f(S)) / (1 − f(S)), where S is the set of elements chosen so far and F is the set of uncovered functions.

For the multi-agent setting, there is a natural adaptation of this algorithm: simply view all m · k functions as one large single-agent submodular ranking instance and run normalized greedy on it. For simplicity, we refer to this adaptation as algorithm NG. The following shows that this algorithm must lose a polynomial factor of k in the approximation ratio.
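A minimal sketch of this stacked adaptation (algorithm NG) might look as follows. The flat-list encoding of the m · k functions (each an oracle from a set of elements to [0, 1], paired with its weight) is our own assumption.

```python
def normalized_greedy(n, funcs, weights, eps=1e-9):
    """Algorithm NG: run the normalized greedy of [3] on the stacked
    instance, treating all m * k functions (given here as one flat list
    with matching weights) as a single-agent SR instance."""
    S, order = set(), []
    remaining = list(range(len(funcs)))  # indices of uncovered functions
    for _ in range(n):
        def score(e):
            total = 0.0
            for idx in remaining:
                fs = funcs[idx](S)
                if fs < 1.0 - eps:  # renormalize by the residual 1 - f(S)
                    total += weights[idx] * (funcs[idx](S | {e}) - fs) / (1.0 - fs)
            return total
        e = max((x for x in range(n) if x not in S), key=score)
        S.add(e)
        order.append(e)
        remaining = [i for i in remaining if funcs[i](S) < 1.0 - eps]
    return order
```

For instance, with 0-1 singleton functions of weights 2, 1, 1 on elements 0, 1, 2, the heaviest element 0 is ordered first.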

Theorem 1. The approximation ratio of algorithm NG is Ω(√k), even when the sum Σ_j w_j^{(i)} of function weights is the same for every agent i.

Proof. Consider the following SRMA instance. There are k agents, and each agent has at most √k functions; √k is assumed to be an integer. We first describe a weighted set cover instance and explain how it is mapped to an instance of our problem. We are given a ground set of k + √k elements E = {e_1, e_2, ..., e_{k+√k}} and a collection of singleton sets corresponding to the elements. Each agent i ∈ [k − 1] has two sets, {e_i} with weight 1 + δ and {e_k} with weight √k − 1 − δ, where δ > 0 is a tiny value used for tie-breaking. The last agent k is special: she has √k sets, {e_{k+1}}, ..., {e_{k+√k}}, each with weight 1. Note that every agent has exactly the same total weight √k.

A set is "covered" when we choose the unique element in the singleton set. This yields an instance of our problem: for each set {e} with weight w, we create a distinct 0-1 valued function f with the same weight w such that f(S) = 1 if and only if e ∈ S. Clearly, all created functions are submodular with function values in {0, 1}. It is worth noting that e_k is special: all functions of the largest weight (√k − 1 − δ) get covered simultaneously when e_k is selected.

Algorithm NG selects e_k in the first step. After that, selecting any element e_i with i ∈ [k − 1] yields a gain of 1 + δ, while selecting any element e_{k+j} yields a gain of only 1, so the tie-breaking term δ makes the algorithm pick e_1, ..., e_{k−1} before e_{k+1}, ..., e_{k+√k}. Thus, the permutation π returned by algorithm NG is (e_k, e_1, ..., e_{k−1}, e_{k+1}, ..., e_{k+√k}). The objective of this ordering is at least Ω(k^{1.5}) since, until we choose e_{k+√k/2}, the last agent k has at least √k/2 functions unsatisfied. However, another permutation π′ = (e_k, e_{k+1}, ..., e_{k+√k}, e_1, ..., e_{k−1}) obtains an objective value of O(k). This implies that the approximation ratio of the algorithm is Ω(√k) and completes the proof.
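The gap in the instance from this proof can be checked numerically. The sketch below builds the singleton-function instance for k = 16 (so √k = 4) and compares the objectives of the two permutations; the encoding of agents as lists of (element, weight) pairs is our own.

```python
import math

def objective(perm, agents):
    """Max over agents of the weighted sum of cover times; each agent is a
    list of (element, weight) pairs, one per singleton 0-1 function."""
    pos = {e: t for t, e in enumerate(perm, start=1)}
    return max(sum(w * pos[e] for e, w in agent) for agent in agents)

k = 16
r = int(math.isqrt(k))  # sqrt(k) = 4
d = 0.01                # the tie-breaking delta

# Agents 1..k-1: {e_i} with weight 1 + d and {e_k} with weight sqrt(k) - 1 - d.
agents = [[(i, 1 + d), (k, r - 1 - d)] for i in range(1, k)]
# Agent k: sqrt(k) singletons {e_{k+1}}, ..., {e_{k+sqrt(k)}}, each of weight 1.
agents.append([(k + j, 1.0) for j in range(1, r + 1)])

pi = [k] + list(range(1, k)) + [k + j for j in range(1, r + 1)]   # NG's order
pi2 = [k] + [k + j for j in range(1, r + 1)] + list(range(1, k))  # alternative
```

For k = 16, objective(pi, agents) equals 74 (driven by agent k's late-covered functions, on the order of k^{1.5}) while objective(pi2, agents) stays below 24, consistent with the Ω(√k) gap.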
One can easily show that the approximation ratio of algorithm NG is O(k ln(1/ε)) by observing that the total cover time over all agents is at most k times the maximum cover time. The details can be found in the proof of the following theorem.

Theorem 2. Algorithm NG achieves a (4k ln(1/ε) + 8k)-approximation.
Proof. Consider any element permutation σ. Let F(I, σ) and F(I_∪, σ) be the objective values of instances I and I_∪ under the permutation σ, where I is an instance of SRMA and I_∪ is the stacked single-agent submodular ranking instance of I. In other words, F(I_∪, σ) = Σ_{i=1}^k Σ_{j=1}^m w_j^{(i)} · cov(f_j^{(i)}, σ) and F(I, σ) = max_{i∈[k]} { Σ_{j=1}^m w_j^{(i)} · cov(f_j^{(i)}, σ) }. Clearly, for any permutation σ, we have F(I, σ) ≤ F(I_∪, σ).

Use π to denote the permutation returned by algorithm NG. Let π′ and π″ be the optimal solutions to I and I_∪, respectively. Since π′ is a feasible solution to I_∪, we have F(I_∪, π″) ≤ F(I_∪, π′). Thus,

    F(I_∪, π″) ≤ F(I_∪, π′)
              = Σ_{i=1}^k Σ_{j=1}^m w_j^{(i)} · cov(f_j^{(i)}, π′)
              ≤ k · max_{i∈[k]} { Σ_{j=1}^m w_j^{(i)} · cov(f_j^{(i)}, π′) }        (1)
              = k · F(I, π′)

By [3], we know that π is a (4 ln(1/ε) + 8)-approximate solution to instance I_∪, i.e., F(I_∪, π) ≤ (4 ln(1/ε) + 8) · F(I_∪, π″). Since F(I, π) ≤ F(I_∪, π), we have F(I, π) ≤ (4k ln(1/ε) + 8k) · F(I, π′) by Equation (1).

3 Balanced Adaptive Greedy for SRMA


In this section, we build on the simple algorithm NG to give a new combinatorial algorithm that obtains
a logarithmic approximation. As the proof of Theorem 1 demonstrates, the shortfall of the previous

Algorithm 1 Balanced Adaptive Greedy for SRMA
Input: Ground elements [n]; agent set A; a collection of k weight vectors {w^{(i)} ∈ R_+^m | i ∈ [k]}; a collection of m · k submodular functions {f_j^{(i)} : 2^[n] → [0, 1] | i ∈ [k], j ∈ [m]}.
Output: A permutation π : [n] → [n] of the elements.
1: p ← 1; q ← 1; t ← 1; B_p ← (2/3)^p · W.
2: For each agent a_i, let R_t^{(i)} be the set of functions of agent a_i that are not satisfied earlier than time t.
3: while there exists an agent a_i with w(R_t^{(i)}) > B_p do
4:     q ← 1; A_{p,q} ← {a_i ∈ A | w(R_t^{(i)}) > B_p}.
5:     while |A_{p,q}| > 0 do
6:         A′_{p,q} ← A_{p,q}.
7:         while |A_{p,q}| ≥ (3/4) · |A′_{p,q}| do
8:             F_{π_{t−1}}(e) ← Σ_{a_i ∈ A′_{p,q}} Σ_{j∈[m]} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1})), ∀e ∈ U \ π_{t−1}.
9:             π(t) ← arg max_{e ∈ U \ π_{t−1}} F_{π_{t−1}}(e).
10:            t ← t + 1; A_{p,q} ← {a_i ∈ A | w(R_t^{(i)}) > B_p}.
11:        end while
12:        q ← q + 1; A_{p,q} ← A_{p,q−1}.
13:    end while
14:    p ← p + 1.
15: end while
16: return Permutation π.

algorithm is that it does not take into account the progress of agents in the process of covering functions.
After the first step, the last agent k has many more uncovered functions than other agents, but the
algorithm does not prioritize the elements that can cover the functions of this agent. In other words,
algorithm NG performs poorly in balancing agents’ progress, which prevents the algorithm from getting
rid of the polynomial dependence on k. Based on this crucial observation, we propose the balanced
adaptive greedy algorithm.
We first introduce some notation. Let R_t^{(i)} be the set of functions of agent a_i that are not satisfied earlier than time t. Note that R_t^{(i)} includes the functions that are satisfied exactly at time t. Let w(R_t^{(i)}) be the total weight of the functions in R_t^{(i)}. Recall that W is the maximum total function weight among all agents. Let B = ((2/3)^1 · W, (2/3)^2 · W, ...) be a sequence, where each entry is referred to as the baseline of the corresponding iteration. Let B_p be the p-th term of B, i.e., B_p = (2/3)^p · W. Roughly speaking, in the p-th iteration, we want the uncovered functions of every agent to have total weight at most B_p. Note that the size of B is Θ(log W). Given a permutation π, let π(t) be the element selected in the t-th time slot, and let π_t be the set of elements scheduled at or before time t, i.e., π_t = {π(1), ..., π(t)}.
Algorithm 1 has two loops: the outer iteration (lines 3–15) and the inner iteration (lines 7–11). Let p and q be the indices of the outer and inner iterations, respectively. At the beginning of the p-th outer iteration, we remove all the satisfied functions and the elements selected in previous iterations, obtaining a subinstance denoted by I_p. At the end of the p-th outer iteration, Algorithm 1 ensures that the total weight of unsatisfied functions of each agent is at most B_p = (2/3)^p · W. Let A_p be the set of agents whose total weight of unsatisfied functions exceeds B_p. At the end of the (p, q)-th inner iteration, Algorithm 1 guarantees that the number of agents whose total weight of unsatisfied functions is larger than B_p is at most (3/4)^q · |A_p|. Together, this implies that there are at most Θ(log W) outer iterations because B_p decreases geometrically. Similarly, there are at most Θ(log k) inner iterations within each outer iteration.
Naturally, the algorithm chooses the element that gives the highest total weighted functional increase over all agents in A_{p,q} and their corresponding submodular functions. When computing the marginal increase, each function's gain is divided by how much of it remains before full coverage.
For technical reasons, during one pass of lines 7–11, some agents may become satisfied, i.e., their total
Figure 1: An illustration of the lower bound on the optimal value. There are four agents in the figure. The area enclosed by the x-axis, the y-axis, and the curve w(R̃_t^{(i)}) is the total weighted cover time of agent a_i; the optimal value is the largest of these. The area α · W · |T*| is then completely contained in the area formed by the optimal solution.

weight of unsatisfied functions is at most Bp . Instead of dropping them immediately, we wait until
1/4-the proportion of agents is satisfied, and then drop them together. This is the role of line 6 of
Algorithm 1.
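Putting the pieces together, the control flow of Algorithm 1 can be sketched as follows. This is a simplified rendering under our own encoding (each agent is a list of (function, weight) pairs, each function an oracle from a set of elements to [0, 1]); it follows the outer/inner iteration structure but omits the explicit (p, q) bookkeeping used in the analysis.

```python
def balanced_adaptive_greedy(n, agents, eps=1e-9):
    """Sketch of Algorithm 1.  The baseline B shrinks by a 2/3 factor per
    outer iteration; inside, agents are dropped in batches once a 1/4
    fraction of the currently active ones is satisfied (mirroring the
    structure of lines 3-15 and 7-11)."""
    W = max(sum(w for _, w in agent) for agent in agents)
    S, order = set(), []

    def uncovered_weight(agent):
        return sum(w for f, w in agent if f(S) < 1.0 - eps)

    p = 1
    while any(uncovered_weight(a) > 0 for a in agents):
        B = (2.0 / 3.0) ** p * W
        active = [a for a in agents if uncovered_weight(a) > B]
        while active:
            fixed = list(active)  # agent set frozen during one inner iteration
            while len(active) >= 0.75 * len(fixed):
                candidates = [e for e in range(n) if e not in S]
                if not candidates:
                    return order  # all elements scheduled

                def score(e):  # weighted, residual-normalized marginal gain
                    total = 0.0
                    for agent in fixed:
                        for f, w in agent:
                            fs = f(S)
                            if fs < 1.0 - eps:
                                total += w * (f(S | {e}) - fs) / (1.0 - fs)
                    return total

                e = max(candidates, key=score)
                S.add(e)
                order.append(e)
                active = [a for a in fixed if uncovered_weight(a) > B]
                if not active:
                    break
        p += 1
    order += [e for e in range(n) if e not in S]  # leftovers in any order
    return order
```

On 0-1 singleton instances, the sketch interleaves elements across agents rather than exhausting one agent's functions first, which is exactly the balancing behavior the analysis exploits.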

4 Analysis
Given an arbitrary instance I of SRMA, let OPT(I) be the optimal weighted cover time of instance I. Our goal is to show Theorem 3. Recall that W = max_{i∈[k]} Σ_{j=1}^m w_j^{(i)} is the maximum total weight among all agents; we assume that all weights are integers. We will first show the upper bound depending on log W, and at the end of the analysis show how to obtain another bound depending on log n, where n is the number of elements.

Theorem 3. Algorithm 1 obtains an approximation ratio of O(ln(1/ε) log(min{n, W}) log k), where ε is the minimum non-zero marginal value among all m · k monotone submodular functions, k is the number of agents, and W is the maximum total integer weight of all functions among all agents.
We first show a lower bound on OPT(I). Let π* be an optimal permutation of the ground elements. Let R̃_t^{(i)} be the set of functions of agent a_i that are not satisfied earlier than time t in π*. Note that R̃_t^{(i)} includes the functions that are satisfied exactly at time t. Let w(R̃_t^{(i)}) be the total weight of the functions in R̃_t^{(i)}. Let T* be the set of times from t = 1 to the first time at which every agent in the optimal solution satisfies w(R̃_t^{(i)}) ≤ α · W, where α ∈ (0, 1) is a constant. Let E(T*) be the corresponding elements in the optimal solution π*, i.e., E(T*) = {π*(1), ..., π*(|T*|)}.
We have the following lower bound on the optimal solution. An example of Fact 1 can be found in Figure 1.
Fact 1. (Lower Bound of OPT) α · W · |T ∗ | ≤ OPT(I).
Recall that I_p is the subinstance of the initial instance at the beginning of the p-th iteration of Algorithm 1. Therefore, we have OPT(I_p) ≤ OPT(I) for all p ≤ ⌈log W⌉. Let π*(I_p) be the optimal permutation of instance I_p. Recall that B_p = (2/3)^p · W. Let T_p* be the set of times from t = 1 to the first time such that all agents satisfy w(R̃_t^{(i)}) ≤ (1/12) · B_p in π*(I_p). By Fact 1, we know that (1/12) · B_p · |T_p*| ≤ OPT(I_p) ≤ OPT(I). Let T_p be the set of times used by Algorithm 1 in the p-th outer iteration, i.e., T_p = {t_p, ..., t_{p+1} − 1}, where t_p is the time at the beginning of the p-th outer iteration. Let E(T_p) be the set of elements selected by Algorithm 1 in the p-th outer iteration. Note that |T_p| is the length of the p-th outer iteration. Recall that every outer iteration contains Θ(log k) inner iterations. In the remainder of this paper, we refer to the q-th inner iteration of the p-th outer iteration as the (p, q)-th iteration.
We now give a simple upper bound on the solution returned by the algorithm.

Fact 2. (Upper Bound of ALG) ALG(I) ≤ Σ_p |T_p| · B_{p−1} = Σ_p (3/2) · |T_p| · B_p.

Our analysis relies on the following key lemma (Lemma 1). Roughly speaking, Lemma 1 measures how far the algorithm falls behind the optimal solution in each outer iteration.

Lemma 1. For any p ≤ ⌈log W⌉, |T_p| ≤ O((1 + ln(1/ε)) · log k) · |T_p*|, where ε is the minimum non-zero marginal value and k is the number of agents.


In the following, we first show that Theorem 3 can be proved easily if Lemma 1 holds, and then give the proof of Lemma 1.

Proof of Theorem 3. Combining Fact 1, Fact 2, and Lemma 1, we have ALG(I) ≤ O((1 + ln(1/ε)) log k log W) · OPT(I). Formally, we have the following inequalities:

    ALG(I) ≤ Σ_p (3/2) · |T_p| · B_p                          [Due to Fact 2]
           ≤ O((1 + ln(1/ε)) · log k) · Σ_p |T_p*| · B_p      [Due to Lemma 1]
           ≤ O((1 + ln(1/ε)) · log k) · Σ_p OPT(I_p)          [Due to Fact 1]
           ≤ O((1 + ln(1/ε)) · log k) · Σ_p OPT(I)            [Due to OPT(I_p) ≤ OPT(I)]
           ≤ O((1 + ln(1/ε)) · log k · log W) · OPT(I)        [Due to p ≤ ⌈log W⌉]

To obtain the other bound of log n claimed in Theorem 3, we make a simple observation. Once the total weight of the uncovered functions of an agent drops below W/n, that agent incurs at most (W/n) · n = W cost in total afterward, as there are at most n elements to be ordered. Until this moment, only O(log n) different values of p have been considered. Thus, in the analysis, we only need to consider O(log n) different values of p rather than O(log W). This gives the desired approximation guarantee.
Now we prove Lemma 1. Let T_{p,q} be the set of times used by Algorithm 1 in the (p, q)-th iteration, and let E(T_{p,q}) be the set of elements selected by Algorithm 1 in the (p, q)-th iteration. Note that |T_{p,q}| is the length of the q-th inner iteration of the p-th outer iteration. Let T*_{p,q} be the set of times from t = 1 to the first time such that all agents in A_{p,q} satisfy w(R̃_t^{(i)}) ≤ (1/12) · B_p in π*(I_p). Observe that if, for any p ≤ ⌈log W⌉ and q ≤ ⌈log k⌉, we have

    |T_{p,q}| ≤ (1 + ln(1/ε)) · |T*_{p,q}|,                    (2)

then the proof of Lemma 1 is straightforward because T*_{p,q} ⊆ T_p* for any p ≤ ⌈log W⌉ and the number of inner iterations is Θ(log k). Thus, the major technical difficulty of our analysis is to prove Equation (2).
Let π be the permutation returned by Algorithm 1. Note that the time set [n] can be partitioned into O(log W · log k) sets. Recall that T_{p,q} is the set of times used in the (p, q)-th iteration and A_{p,q} is the set of unsatisfied agents at the beginning of the (p, q)-th iteration, i.e., A_{p,q} = {a_i ∈ A | w(R_t^{(i)}) > B_p}. For notational convenience, we define F_{π_{t−1}}(e) as follows:

    F_{π_{t−1}}(e) := Σ_{a_i ∈ A_{p,q}} Σ_{j∈[m]} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))
We present two technical lemmas that suffice to prove Equation (2).

Lemma 2. For any p ≤ ⌈log W⌉ and q ≤ ⌈log k⌉, we have

    Σ_{t ∈ T_{p,q}} F_{π_{t−1}}(e_t) ≤ (1 + ln(1/ε)) · |A_{p,q}| · B_{p−1}.

Lemma 3. For any p ≤ ⌈log W⌉ and q ≤ ⌈log k⌉, we have

    Σ_{t ∈ T_{p,q}} F_{π_{t−1}}(e_t) ≥ (2/3) · (|T_{p,q}| / |T*_{p,q}|) · |A_{p,q}| · B_p.

The proof of Lemma 2 uses a property of monotone functions (Proposition 1); its proof can be found in [3] and [18], so we only present the statement for completeness. Proving Lemma 3 uses a property that is more algorithm-specific (Proposition 2). Recall that in each step t of an iteration (p, q), Algorithm 1 greedily chooses the element e_t among all unselected elements such that F_{π_{t−1}}(e_t) is maximized. Hence the inequality F_{π_{t−1}}(e_t) ≥ F_{π_{t−1}}(e) holds for any element e ∈ U, and in particular for any element selected by the optimal solution. By an averaging argument, we can relate F_{π_{t−1}}(e) to |T*_{p,q}| and prove Lemma 3. Equation (2) then follows from Lemma 2 and Lemma 3 simply by chaining the two inequalities.
Proposition 1. (Claim 2.4 in [18]) Given an arbitrary monotone function f : 2^[n] → [0, 1] with f([n]) = 1 and sets ∅ = S_0 ⊆ S_1 ⊆ ... ⊆ S_h ⊆ [n], we have

    Σ_{i=1}^h (f(S_i) − f(S_{i−1})) / (1 − f(S_{i−1})) ≤ 1 + ln(1/δ),

where δ > 0 is such that for any A ⊆ B, f(B) − f(A) > 0 implies f(B) − f(A) ≥ δ.
Proof of Lemma 2. Consider an arbitrary pair p ≤ ⌈log W⌉ and q ≤ ⌈log k⌉. Recall that A_{p,q} is the set of unsatisfied agents, i.e., for any a_i ∈ A_{p,q}, we have w(R_t^{(i)}) > B_p at the beginning of the (p, q)-th iteration. Recall that T_{p,q} is the set of times used in iteration (p, q). Let t̄ be the last time in T_{p,q−1} if q ≥ 2; otherwise, let t̄ be the first time of T_{p,q}. Note that, during the whole q-th inner iteration, the agent set that we look at stays the same (line 6 of Algorithm 1). Then, we have

    Σ_{t ∈ T_{p,q}} F_{π_{t−1}}(e_t)
      = Σ_{t ∈ T_{p,q}} Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_t^{(i)}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e_t}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))
      ≤ Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_{t̄}^{(i)}} Σ_{t ∈ T_{p,q}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e_t}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))      [Due to R_t^{(i)} ⊆ R_{t̄}^{(i)}]
      ≤ (1 + ln(1/ε)) · Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_{t̄}^{(i)}} w_j^{(i)}      [Due to Proposition 1]
      ≤ (1 + ln(1/ε)) · Σ_{a_i ∈ A_{p,q}} B_{p−1}      [Due to the definition of t̄]
      ≤ (1 + ln(1/ε)) · |A_{p,q}| · B_{p−1}
The last piece is proving Lemma 3, which relies on the following property of the greedy algorithm.

Proposition 2. For any p ≤ ⌈log W⌉ and q ≤ ⌈log k⌉, consider an arbitrary time t ∈ T_{p,q} and let e_t be the element selected by Algorithm 1 at time t. For any e ∈ E(T*_{p,q}), we have F_{π_{t−1}}(e_t) ≥ F_{π_{t−1}}(e).

Proof of Lemma 3. By Proposition 2, we know that F_{π_{t−1}}(e_t) ≥ F_{π_{t−1}}(e) for any e ∈ E(T*_{p,q}). Thus, by an averaging argument, we have Equation (3) for any t ∈ T_{p,q}. Recall that e_t is the element selected by Algorithm 1 at time t.

    F_{π_{t−1}}(e_t) ≥ (1 / |E(T*_{p,q})|) · Σ_{e ∈ E(T*_{p,q})} F_{π_{t−1}}(e)                    (3)

Let t* be the last time in T*_{p,q}, i.e., t* is the first time such that all agents in A_{p,q} satisfy w(R̃_t^{(i)}) ≤ (1/12) · B_p in π*(I_p), where π*(I_p) is the optimal permutation of the subinstance I_p. Let t̃ be the last time in T_{p,q}. Note that, for any t ∈ T_{p,q}, there are at most (1/4) · |A_{p,q}| agents satisfying w(R_t^{(i)}) ≤ B_p. Thus, we have

    Σ_{a_i ∈ A_{p,q}} w(R_t^{(i)}) > (3/4) · |A_{p,q}| · B_p,   ∀t ∈ T_{p,q}.                    (4)

Since t̃ ∈ T_{p,q}, we know that Σ_{a_i ∈ A_{p,q}} w(R_{t̃}^{(i)}) > (3/4) · |A_{p,q}| · B_p. Thus, we have the following inequalities:

    Σ_{t ∈ T_{p,q}} F_{π_{t−1}}(e_t)
      = Σ_{t ∈ T_{p,q}} Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_t^{(i)}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e_t}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))
      ≥ Σ_{t ∈ T_{p,q}} (1 / |E(T*_{p,q})|) · Σ_{e ∈ E(T*_{p,q})} Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_t^{(i)}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ {e}) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))      [Due to Equation (3)]
      ≥ Σ_{t ∈ T_{p,q}} (1 / |E(T*_{p,q})|) · Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_t^{(i)}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ E(T*_{p,q})) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))      [Due to f_j^{(i)} being submodular]
      ≥ Σ_{t ∈ T_{p,q}} (1 / |E(T*_{p,q})|) · Σ_{a_i ∈ A_{p,q}} Σ_{j ∈ R_t^{(i)} \ R̃_{t*}^{(i)}} w_j^{(i)} · (f_j^{(i)}(π_{t−1} ∪ E(T*_{p,q})) − f_j^{(i)}(π_{t−1})) / (1 − f_j^{(i)}(π_{t−1}))      [Due to R_t^{(i)} \ R̃_{t*}^{(i)} ⊆ R_t^{(i)}]
      ≥ Σ_{t ∈ T_{p,q}} (1 / |E(T*_{p,q})|) · Σ_{a_i ∈ A_{p,q}} w(R_t^{(i)} \ R̃_{t*}^{(i)})      [Due to f_j^{(i)}(E(T*_{p,q})) = 1 for any j ∉ R̃_{t*}^{(i)}]
      ≥ Σ_{t ∈ T_{p,q}} (1 / |E(T*_{p,q})|) · Σ_{a_i ∈ A_{p,q}} (w(R_t^{(i)}) − w(R̃_{t*}^{(i)}))
      ≥ (|T_{p,q}| / |T*_{p,q}|) · ( Σ_{a_i ∈ A_{p,q}} w(R_t^{(i)}) − (1/12) · |A_{p,q}| · B_p )      [Due to the definition of t*]
      ≥ (|T_{p,q}| / |T*_{p,q}|) · ( (3/4) · |A_{p,q}| · B_p − (1/12) · |A_{p,q}| · B_p )      [Due to Equation (4)]
      = (2/3) · (|T_{p,q}| / |T*_{p,q}|) · |A_{p,q}| · B_p

Proof of Lemma 1. Combining the above lemmas, we have

    |T_p| = Σ_{q ≤ ⌈log k⌉} |T_{p,q}|
          ≤ (1 + ln(1/ε)) · ⌈log k⌉ · |T*_{p,q}|      [Due to Lemma 2 and Lemma 3]
          ≤ O((1 + ln(1/ε)) · log k) · |T_p*|         [Due to |T*_{p,q}| ≤ |T_p*|]

5 Inapproximability Results for SRMA and Tight Approximation for GMSC
In this section, we consider a special case of SRMA, which is called Generalized Min-sum Set Cover for
Multiple Agents (GMSC), and leverage it to give a lower bound on the approximation ratio of SRMA.
We first state the formal definition.

GMSC for Multiple Agents. An instance of this problem consists of a ground set $U := [n]$ and a set of $k$ agents $A := \{a_1, \ldots, a_k\}$. Every agent $a_i$, $i \in [k]$, has a collection $S^{(i)}$ of $m$ subsets defined over $U$, i.e., $S^{(i)} = \{S_1^{(i)}, \ldots, S_m^{(i)}\}$ with $S_j^{(i)} \subseteq U$ for all $j \in [m]$. Each set $S_j^{(i)}$ comes with a coverage requirement $K(S_j^{(i)}) \in \{1, 2, \ldots, |S_j^{(i)}|\}$. Given a permutation $\pi : [n] \to [n]$ of the elements in $U$, the cover time $\mathrm{cov}(S_j^{(i)}, \pi)$ of set $S_j^{(i)}$ is defined as the first time at which $K(S_j^{(i)})$ elements from $S_j^{(i)}$ have been hit in $\pi$. The goal is to find a permutation $\pi$ of the ground elements that minimizes the maximum total cover time over all agents, i.e., a permutation $\pi$ minimizing $\max_{i \in [k]} \sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi)$.
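To make the objective concrete, the following small sketch (with hypothetical helper names, not taken from the paper) evaluates the min-max total cover time of a permutation on a toy GMSC instance:

```python
def cover_time(S, K, pi):
    """First time (1-indexed) at which K elements of S have appeared in pi."""
    hit = 0
    for t, e in enumerate(pi, start=1):
        if e in S:
            hit += 1
            if hit == K:
                return t
    return float("inf")  # requirement never met

def minmax_total_cover_time(agents, pi):
    """agents: list of agents, each a list of (set, requirement) pairs."""
    return max(
        sum(cover_time(S, K, pi) for S, K in agent)
        for agent in agents
    )

# Two agents over U = {0,1,2,3}: agent 0 needs 2 of {0,1,2}; agent 1 needs 1 of {3}.
agents = [[({0, 1, 2}, 2)], [({3}, 1)]]
assert minmax_total_cover_time(agents, [3, 0, 1, 2]) == 3  # agent 0 done at t=3
assert minmax_total_cover_time(agents, [0, 1, 2, 3]) == 4  # agent 1 done at t=4
```

The asserts illustrate that the min-max objective rewards permutations that balance the agents rather than serving one agent first.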

5.1 Inapproximability Results


We first show a lower bound for GMSC. The basic idea is to show that GMSC captures the classical set cover problem as a special case. To see the reduction, think of each element e in the set cover instance as an agent in the GMSC instance and each set S as a ground element. Every agent has only one set, and all coverage requirements are 1. Then, a feasible solution to the set cover instance corresponds to a set of subsets (elements of the GMSC instance) that covers all elements (satisfies all agents of the GMSC instance). Thus, the optimal solutions of the two instances are equivalent.
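Under the assumptions of this construction, the reduction can be sketched as follows (the data layout is hypothetical, chosen only for illustration):

```python
def set_cover_to_gmsc(elements, subsets):
    """Map a set cover instance (elements E, subsets C) to a GMSC instance.

    Each element e becomes an agent holding one set V(e): the indices of the
    subsets containing e, with coverage requirement 1. The ground elements of
    the GMSC instance are the subset indices 0..len(subsets)-1.
    """
    agents = []
    for e in elements:
        V = {j for j, C in enumerate(subsets) if e in C}
        agents.append([(V, 1)])  # one set per agent, requirement 1
    return list(range(len(subsets))), agents

U, agents = set_cover_to_gmsc(["a", "b", "c"], [{"a", "b"}, {"c"}, {"b", "c"}])
assert U == [0, 1, 2]
# agent for "a" holds {0}, for "b" holds {0, 2}, for "c" holds {1, 2}
assert agents == [[({0}, 1)], [({0, 2}, 1)], [({1, 2}, 1)]]
```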

Lemma 4. For the generalized min-sum set cover for multiple agents problem, given any constant
c < 1, there is no c · ln k-approximation algorithm unless P=NP, where k is the number of agents.
Proof. To prove the claim, we present an approximation-preserving reduction from the set cover problem. Consider an arbitrary instance of set cover $(E, C)$, where $E = \{e_1, \ldots, e_p\}$ is the set of ground elements and $C = \{C_1, \ldots, C_q\}$ is a collection of subsets defined over $E$. By [13], it is NP-hard to approximate the set cover problem within a factor of $c \cdot \ln p$ for any constant $c < 1$, where $p$ is the number of ground set elements.
For any element $e_i \in E$, let $V(e_i)$ be the collection of subsets that contain $e_i$, i.e., $V(e_i) = \{C_j \in C \mid e_i \in C_j\}$. For each element $e_i \in E$, we create an agent $a_i$ in our problem. For each subset $C_j \in C$ of the set cover instance, we create a ground element $j \in U$. In total, there are $p$ agents and $q$ ground elements, i.e., $|A| = p$ and $|U| = q$. Every element $e_i$ in the set cover instance has an adjacent set $V(e_i)$, which is a collection of subsets; since every subset of the set cover instance corresponds to an element of our problem, $V(e_i)$ corresponds to a subset of $U$. Let $S^{(i)} \subseteq U$ be the subset corresponding to $V(e_i)$ in our problem. Every agent $a_i$ has only the one set $S^{(i)}$, and all sets have the same coverage requirement 1.
In such an instance of our problem, the total cover time of each agent is exactly the hitting time of her only set. Thus, the maximum total cover time among all agents is exactly the length of the shortest prefix of the element permutation that hits every agent's set. Therefore, the optimal solution to the set cover instance and the optimal solution to the constructed instance of our problem can easily be converted to each other. Moreover, both optimal solutions share the same value, which completes the proof.
Note that our problem admits the submodular ranking problem as a special case, which is $c \ln(1/\epsilon)$-hard to approximate for some constant $c > 0$ [3]. Thus, our problem has a natural lower bound of $\Omega(\ln(1/\epsilon))$. Hence, combining Lemma 4 with the lower bound for classical submodular ranking, one easily obtains an $\Omega(\ln(1/\epsilon) + \log k)$ lower bound for SRMA.
Theorem 4. The problem of min-max submodular ranking for multiple agents cannot be approximated within a factor of $c \cdot (\ln(1/\epsilon) + \log k)$ for some constant $c > 0$ unless P=NP, where $\epsilon$ is the minimum non-zero marginal value and $k$ is the number of agents.

5.2 Tight Approximation for GMSC
This subsection mainly shows the following theorem.
Theorem 5. There is a randomized algorithm that achieves an $O(\log k)$-approximation for generalized min-sum set cover for multiple agents, where $k$ is the number of agents.

We now present the $O(\log k)$-approximation algorithm claimed in Theorem 5. Our algorithm is an LP-based randomized algorithm that mainly follows [6]. In the LP relaxation, the variable $x_{e,t}$ indicates whether element $e \in U$ is scheduled at time $t \in [n]$, and the variable $y_{S_j^{(i)},t}$ indicates whether set $S_j^{(i)}$ has been covered before time $t \in [n]$. To reduce the integrality gap, we introduce an exponential number of constraints, and we show that the resulting LP can be solved by the ellipsoid algorithm. We partition the whole timeline into multiple phases based on the optimal fractional solution, and in each phase we round the variables $x_{e,t}$ to obtain an integral solution. The main difference from [6] is that we repeat the rounding algorithm $\Theta(\log k)$ times and interleave the resulting solutions over time, so as to avoid the bad events for all $k$ agents simultaneously. The objective function is a min-max and hence not linear; we linearize it with the standard binary search technique: fix an objective value $T$ and test whether the linear program has a feasible solution.
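The binary search wrapper can be sketched abstractly; here `lp_feasible` is a placeholder for an LP feasibility oracle, and monotonicity of feasibility in the budget $T$ is assumed:

```python
def min_feasible_T(lp_feasible, lo, hi):
    """Smallest integer T in [lo, hi] with lp_feasible(T) True.

    Assumes feasibility is monotone in T (a larger budget stays feasible).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if lp_feasible(mid):
            hi = mid      # mid works; keep searching below it
        else:
            lo = mid + 1  # mid fails; the answer is strictly above
    return lo

# toy monotone oracle: feasible iff T >= 7
assert min_feasible_T(lambda T: T >= 7, 0, 100) == 7
```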

\[
\begin{aligned}
& \sum_{t \in [n]} \sum_{S_j^{(i)} \in S^{(i)}} \bigl(1 - y_{S_j^{(i)},t}\bigr) \le T
&& \forall a_i \in A \\
& \sum_{e \in U} x_{e,t} = 1
&& \forall t \in [n] \\
& \sum_{t \in [n]} x_{e,t} = 1
&& \forall e \in U \\
& \sum_{e \in S_j^{(i)} \setminus B} \sum_{t' < t} x_{e,t'} \ge \bigl(K(S_j^{(i)}) - |B|\bigr)\, y_{S_j^{(i)},t}
&& \forall S_j^{(i)} \in S^{(i)},\ B \subseteq S_j^{(i)},\ t \in [n] \tag{5} \\
& x_{e,t},\ y_{S_j^{(i)},t} \in [0,1]
\end{aligned}
\]

Note that the size of the LP defined above is exponential. We first show that it can be solved in polynomial time by giving a separation oracle.
Lemma 5. There is a polynomial-time separation oracle for the LP defined above.
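The proof of Lemma 5 is deferred, but one standard way to separate over the exponential family (5) (a sketch under our reading, not necessarily the authors' exact oracle) rests on the observation that, for fixed $S_j^{(i)}$ and $t$, the tightest choice of $B$ collects exactly the elements $e$ with $\sum_{t'<t} x_{e,t'} > y_{S_j^{(i)},t}$; hence all constraints for that pair hold iff $\sum_{e \in S_j^{(i)}} \min\bigl(\sum_{t'<t} x_{e,t'},\, y_{S_j^{(i)},t}\bigr) \ge K(S_j^{(i)}) \cdot y_{S_j^{(i)},t}$:

```python
def most_violated_B(X, y, K):
    """Separation for constraint (5) at one (set, time) pair.

    X: dict element -> sum_{t'<t} x[e][t'];  y: value of y_{S,t};  K: requirement.
    Constraint family: sum_{e in S\B} X[e] >= (K - |B|) * y for every B.
    Over all B, the left side plus |B|*y is minimized at sum_e min(X[e], y),
    attained by B = {e : X[e] > y}. Returns a violating B, or None if feasible.
    """
    if sum(min(x, y) for x in X.values()) >= K * y - 1e-9:
        return None
    return {e for e, x in X.items() if x > y}

# y = 0.6, K = 2: need sum_e min(X_e, 0.6) >= 1.2
assert most_violated_B({"a": 1.0, "b": 0.7}, 0.6, 2) is None    # 0.6 + 0.6 = 1.2
assert most_violated_B({"a": 1.0, "b": 0.1}, 0.6, 2) == {"a"}   # 0.6 + 0.1 < 1.2
```

Running this check over all polynomially many (set, time) pairs yields a polynomial-time oracle despite the exponentially many subsets $B$.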

Now, we describe our algorithm. First, we solve the LP above and let $(x^*, y^*, T^*)$ be its optimal fractional solution. Our rounding algorithm then consists of $\lceil \log n \rceil$ phases. In each phase $\ell \le \lceil \log n \rceil$, we run Algorithm 2 independently $2\lceil \log k \rceil$ times; let $\pi_\ell^q$ be the $q$-th rounded solution for phase $\ell$. After $\lceil \log n \rceil$ phases, we output the concatenation $\pi = \pi_1^1 \cdots \pi_1^{2\lceil \log k \rceil} \cdots \pi_{\lceil \log n \rceil}^1 \cdots \pi_{\lceil \log n \rceil}^{2\lceil \log k \rceil}$, where the concatenation keeps only the first occurrence of each element.

Algorithm 2 LP Rounding for phase $\ell$
1: let $t_\ell = 2^\ell$.
2: let $\bar{x}_{e,\ell} \leftarrow \sum_{t' < t_\ell} x^*_{e,t'}$.
3: let $p_{e,\ell} = \min\{1, 8 \cdot \bar{x}_{e,\ell}\}$; pick each $e \in U$ independently with probability $p_{e,\ell}$.
4: let $\pi_\ell$ be an arbitrary order of the picked elements.
5: if $|\pi_\ell| > 16 \cdot 2^\ell$, then let $\pi_\ell$ be empty.
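One phase of this rounding can be sketched as follows (the data layout `x_star[e][t]` for the fractional schedule and the function name are hypothetical; only the sampling rule and the emptying step mirror Algorithm 2):

```python
import random

def round_phase(x_star, elements, ell, rng):
    """Sketch of one phase of Algorithm 2.

    x_star: dict e -> dict t -> fractional value; elements: ground set;
    ell: phase index; rng: a random.Random instance.
    Returns the (possibly emptied) list pi_ell of picked elements.
    """
    t_ell = 2 ** ell
    picked = []
    for e in elements:
        x_bar = sum(v for t, v in x_star[e].items() if t < t_ell)
        p = min(1.0, 8 * x_bar)
        if rng.random() < p:          # elements with x_bar >= 1/8 are picked surely
            picked.append(e)
    if len(picked) > 16 * t_ell:      # low-probability bad event (Lemma 8)
        return []
    return picked

rng = random.Random(0)
# x_bar for "a" is 0.2 at t_ell = 2, so p = 1 and "a" is always picked; "b" never is.
out = round_phase({"a": {0: 0.1, 1: 0.1}, "b": {0: 0.0, 1: 0.0}}, ["a", "b"], 1, rng)
assert out == ["a"]
```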

For any set $S_j^{(i)}$, let $t^*_{S_j^{(i)}}$ be the last time $t$ such that $y_{S_j^{(i)},t} \le \frac{1}{2}$. Since $1 - y_{S_j^{(i)},t} \ge \frac{1}{2}$ for all $t \le t^*_{S_j^{(i)}}$, the following Lemma 6 is immediate. It gives us a lower bound on the optimal solution.
Lemma 6. For any $i \in [k]$, $T^* \ge \frac{1}{2} \sum_{S_j^{(i)} \in S^{(i)}} t^*_{S_j^{(i)}}$.

Lemma 7. For any agent $a_i$ and set $S_j^{(i)}$ with $t^*_{S_j^{(i)}} \le t_\ell$, the probability that fewer than $K(S_j^{(i)})$ elements from $S_j^{(i)}$ are picked in phase $\ell$ is at most $e^{-9/8}$.

Proof. Consider set $S_j^{(i)}$ and let $S_g = \{ e \in S_j^{(i)} : \bar{x}_{e,\ell} \ge 1/8 \}$. For $e \in S_g$, by Step 3 of Algorithm 2, element $e$ is picked in phase $\ell$ with probability 1. Therefore, if $|S_g| \ge K(S_j^{(i)})$, the lemma holds.
Thus, we only need to consider the case $|S_g| < K(S_j^{(i)})$. Since $t^*_{S_j^{(i)}} \le t_\ell$, we know $y^*_{S_j^{(i)},t_\ell} > 1/2$. Plugging $B = S_g$ into Equation (5), we have
\[
\sum_{e \in S_j^{(i)} \setminus S_g} \bar{x}_{e,\ell}
= \sum_{e \in S_j^{(i)} \setminus S_g} \sum_{t' < t_\ell} x^*_{e,t'}
\ge \bigl(K(S_j^{(i)}) - |S_g|\bigr)\, y^*_{S_j^{(i)},t_\ell}
\ge \frac{1}{2}\bigl(K(S_j^{(i)}) - |S_g|\bigr).
\]
Therefore, the expected number of elements from $S_j^{(i)} \setminus S_g$ picked in phase $\ell$ is
\[
\sum_{e \in S_j^{(i)} \setminus S_g} 8\,\bar{x}_{e,\ell} \ge 4\bigl(K(S_j^{(i)}) - |S_g|\bigr).
\]
Since every element is picked independently, we can use the following Chernoff bound: if $X_1, \ldots, X_n$ are independent 0/1-valued random variables with $X = \sum_{i=1}^n X_i$ and $\mu = \mathbb{E}[X]$, then $\Pr[X < (1-\beta)\mu] \le \exp(-\frac{\beta^2}{2}\mu)$. Since the expected number of picked elements is at least $4(K(S_j^{(i)}) - |S_g|) \ge 4$, setting $\beta = 3/4$ and $\mu = 4$, the probability that fewer than $K(S_j^{(i)}) - |S_g|$ elements were picked is at most $e^{-9/8}$. Combining this with the event that all elements in $S_g$ are picked, Lemma 7 follows.
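The concrete constants used above can be checked numerically (a quick sanity computation, not part of the proof):

```python
import math

# Lemma 7: beta = 3/4, mu = 4  =>  exp(-beta^2 * mu / 2) = e^{-9/8}
lemma7_bound = math.exp(-((3 / 4) ** 2) * 4 / 2)
assert abs(lemma7_bound - math.exp(-9 / 8)) < 1e-12

# Lemma 8 (phase ell >= 1): beta = mu = 8 * 2**ell gives exp(-3 * 2**ell) <= e^{-6}
ell = 1
beta = mu = 8 * 2 ** ell
lemma8_bound = math.exp(-beta ** 2 / (2 * mu + 2 * beta / 3))
assert lemma8_bound <= math.exp(-6) + 1e-12

# Used later in Lemma 9: the per-phase failure probability is below e^{-1}
assert math.exp(-9 / 8) + math.exp(-6) < math.exp(-1)
```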

Lemma 8. The probability that Step 5 of Algorithm 2 empties $\pi_\ell$ is at most $e^{-6}$.

Proof. We use the following Chernoff bound: if $X_1, \ldots, X_n$ are independent 0/1-valued random variables with $X = \sum_{i=1}^n X_i$ and $\mu = \mathbb{E}[X]$, then $\Pr[X > \mu + \beta] \le \exp\bigl(-\frac{\beta^2}{2\mu + 2\beta/3}\bigr)$. Since the expected number of elements picked in $\pi_\ell$ is at most $8 \cdot 2^\ell$, setting $\beta = \mu = 8 \cdot 2^\ell$, the probability that more than $16 \cdot 2^\ell$ elements are picked is at most $\exp\bigl(-\frac{64 \cdot 2^{2\ell}}{(64/3) \cdot 2^\ell}\bigr) = \exp(-3 \cdot 2^\ell) \le e^{-6}$.

Lemma 9. For any $q \le 2\lceil \log k \rceil$, the expected total cover time of any agent under $\pi^q = \pi_1^q \cdots \pi_{\lceil \log n \rceil}^q$ is at most $512 \cdot T^*$.

Proof. For simplicity, fix $q$ and agent $a_i$, and drop $q$ and $i$ from the notation. Let $E_{S_j,\ell}$ denote the event that $S_j$ is first covered in phase $\ell$. By linearity of expectation, we can focus on a single set $S_j$. We have
\[
\mathbb{E}[\mathrm{cov}(S_j, \pi^q)] \le \sum_{\ell = \lceil \log t^*_{S_j} \rceil}^{\lceil \log n \rceil} 2 \cdot 16 \cdot 2^\ell \cdot \Pr[E_{S_j,\ell}].
\]
In each phase $\ell$, $S_j$ is not covered only if $K(S_j)$ elements from $S_j$ were not picked in $\pi_\ell^q$, or $\pi_\ell^q$ is empty. From Lemma 7 and Lemma 8, the probability that $S_j$ is not covered in any single phase is at most $e^{-9/8} + e^{-6} < e^{-1}$. Therefore, we have
\[
\Pr[E_{S_j,\ell}] \le \exp\bigl(-(\ell - \lceil \log t^*_{S_j} \rceil)\bigr).
\]
Plugging this into the inequality above, we have
\[
\mathbb{E}[\mathrm{cov}(S_j, \pi^q)] \le \sum_{\ell = \lceil \log t^*_{S_j} \rceil}^{\lceil \log n \rceil} 2 \cdot 16 \cdot 2^\ell \cdot \exp\bigl(-(\ell - \lceil \log t^*_{S_j} \rceil)\bigr) \le 256 \cdot t^*_{S_j}.
\]
Then, by linearity of expectation and Lemma 6, we get
\[
\mathbb{E}\Bigl[\sum_{j=1}^m \mathrm{cov}(S_j, \pi^q)\Bigr] = \sum_{j=1}^m \mathbb{E}[\mathrm{cov}(S_j, \pi^q)] \le \sum_{j=1}^m 256 \cdot t^*_{S_j} \le 512 \cdot T^*.
\]

Lemma 10. With probability at least $1 - \frac{1}{k}$, the total cover time of every agent is at most $O(\log k) \cdot T^*$, i.e., $\Pr[\max_{i \in [k]} \sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi) \le 1024 \log k \cdot T^*] \ge 1 - \frac{1}{k}$, where $k$ is the number of agents.

Proof. By a union bound, it suffices to show that, for every agent $a_i$, the total cover time of agent $a_i$ is at most $1024 \log k \cdot T^*$ with probability at least $1 - \frac{1}{k^2}$. From Lemma 9, for every $q$, the expected total cover time under $\pi^q$ is at most $512 \cdot T^*$. Then, by Markov's inequality, we have
\[
\Pr\Bigl[\sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi^q) \le 1024 \cdot T^*\Bigr] \ge 1/2.
\]
Recall that $\pi = \pi_1^1 \cdots \pi_1^{2\lceil \log k \rceil} \cdots \pi_{\lceil \log n \rceil}^1 \cdots \pi_{\lceil \log n \rceil}^{2\lceil \log k \rceil}$. For any agent $a_i$, the event $\sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi) > 1024 \log k \cdot T^*$ happens only if $\sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi^q) > 1024 \cdot T^*$ for all of the $2\lceil \log k \rceil$ independent roundings. Therefore, we get
\[
\Pr\Bigl[\sum_{j=1}^m \mathrm{cov}(S_j^{(i)}, \pi) \le 1024 \log k \cdot T^*\Bigr] \ge 1 - 2^{-2\lceil \log k \rceil} \ge 1 - 1/k^2.
\]

6 Experiments
This section investigates the empirical performance of our algorithms; we seek to show that the theory is predictive of practice on real data. We give experimental results for the min-max optimal decision tree over multiple agents. At a high level, there is a set of objects, each associated with an attribute vector. We can view the data as a table with the objects as rows and the attributes as columns. Each agent i has a target object oi sampled from a different distribution over the objects. Our goal is to order the columns so as to find all agents' targets as soon as possible. When column c is chosen, for each agent i, the rows whose value in column c differs from oi's are discarded. The target object oi is identified once all other objects are inconsistent with oi in at least one of the probed columns.
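The elimination process just described can be sketched as follows (illustrative code with a hypothetical table layout: rows are objects, each a tuple of attribute values):

```python
def rows_consistent_after_probes(rows, target_idx, probed_cols):
    """Indices of rows still consistent with the target after probing columns.

    A row survives if it matches the target row on every probed column;
    the target is identified once it is the only survivor.
    """
    target = rows[target_idx]
    return [
        i for i, r in enumerate(rows)
        if all(r[c] == target[c] for c in probed_cols)
    ]

rows = [(0, 1, 0), (0, 0, 1), (1, 1, 0)]
# Probing column 0 for target row 0 discards row 2; column 1 then discards row 1.
assert rows_consistent_after_probes(rows, 0, [0]) == [0, 1]
assert rows_consistent_after_probes(rows, 0, [0, 1]) == [0]
```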

Data Preparation. In the experiments, three public data sets are considered: the MFCC data set³, the PPPTS data set⁴, and the CTG data set⁵. All three are data sets from the life sciences. Each is in table format with real-valued entries. The sizes of the three data sets are 7195 × 22, 45730 × 9, and 2126 × 23 respectively, where h × n denotes h rows and n columns.
The instances are constructed as follows. We first discretize the real-valued entries so that each column has at most ten distinct values; the objects' vectors can have small variations, and we seek to make them more standardized. Let the ground set of elements U be the columns of the table, and view each row j as an object. We create a submodular function fj for each object j: for a subset S ⊆ U, fj(S) is the number of rows whose attributes differ from row j's after checking the columns in S. If fj(S) = fj(U), the object in row j can be completely distinguished from all other objects by checking the columns in S. We then normalize the functions to have range [0, 1]. Note that the functions are created essentially in the same way as in the reduction from optimal decision tree to submodular ranking discussed in Section 1. All agents have the same table, that is, the same objects, functions, and columns. We construct a different weight vector for each agent, where each entry is sampled uniformly at random from [1, 100]. In the following, we use K and M to denote the number of agents and the number of functions, respectively.
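The construction of the functions fj can be sketched directly from this description (illustrative code, not the authors' implementation; here normalization divides by the number of other rows, which assumes all rows are distinguishable by the full column set):

```python
def make_f(rows, j):
    """Normalized distinguishing function for object (row) j.

    f_j(S) = fraction of *other* rows that differ from row j in at least one
    column of S; this is a coverage function, hence monotone submodular.
    """
    others = [r for i, r in enumerate(rows) if i != j]
    if not others:
        return lambda S: 1.0
    def f(S):
        distinguished = sum(
            any(r[c] != rows[j][c] for c in S) for r in others
        )
        return distinguished / len(others)
    return f

rows = [(0, 1), (0, 0), (1, 1)]
f0 = make_f(rows, 0)
assert f0(set()) == 0.0
assert f0({0}) == 0.5      # column 0 separates row 0 from row 2 only
assert f0({0, 1}) == 1.0   # both columns together identify row 0
```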

Baseline Algorithms and Parameter Setting. We refer to our main Algorithm 1 as balanced adaptive greedy (BAG), and to the naive adaptation of the algorithm proposed by [3] as normalized greedy (NG); see Section 2 for a full description of NG. The algorithms are compared against two natural baselines. One is the random (R) algorithm, which directly outputs a random permutation of the elements. The other is the greedy (G) algorithm, which at each step selects the element that maximizes the total increment over all K · M functions. Notice that in Algorithm 1, the decreasing ratio of the sequence in BAG is set to 2/3, but in practice this ratio is flexible. In the experiments, we test the performance of BAG with decreasing ratios in {0, 0.05, 0.1, . . . , 0.95, 1} and pick the best one.
We conduct the experiments⁶ on a machine running Ubuntu 18.04 with an i7-7800X CPU and 48 GB of memory. We investigate the performance of the algorithms on the different data sets under different values of K and M; the results are averaged over four runs. The data sets give the same trends. Thus, we first show the algorithms' performance with K = M = 10 on the three data sets, and then present the results on the MFCC data set as K and M vary, in Figure 2. The results on the other two data sets appear in Appendix A.2.

Empirical Discussion. From the figures, we see that the proposed balanced adaptive greedy algorithm always obtains the best performance across all data sets and all values of K and M. Moreover, Figure 2(b) shows that as K increases, the objective value of each algorithm generally increases, implying that the instances become harder. On these harder instances, algorithm BAG has a more significant
3 https://archive.ics.uci.edu/ml/datasets/Anuran+Calls+%28MFCCs%29
4 https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure#
5 https://archive.ics.uci.edu/ml/datasets/Cardiotocography
6 The code is available at https://github.com/Chenyang-1995/Min-Max-Submodular-Ranking-for-Multiple-Agents


Figure 2: Figure 2(a) shows the results on different datasets when both the number of agents K and
the number of functions per agent M are 10. Others show the performance of algorithms on the MFCC
dataset when K and M vary. In Figure 2(b), we fix M = 10 and increase K, while in Figure 2(c), we
fix K = 10 and increase M . Finally, Figure 2(d) shows the algorithms’ performance when K and M
are set to be the same value and increase together.

improvement over the other methods. Conversely, Figure 2(c) indicates that the instances become easier as M increases, since all the curves generally trend downward. In this case, although the benefit of our balancing strategy becomes smaller, algorithm BAG still obtains the best performance.

7 Conclusion
This paper is the first to study the submodular ranking problem in the presence of multiple agents. The objective is to minimize the maximum cover time over all agents, i.e., to optimize worst-case fairness across the agents. This problem generalizes the design of optimal decision trees over multiple agents and also captures other practical applications. Observing the shortfall of the existing techniques, we introduce a new algorithm, balanced adaptive greedy. Theoretically, the algorithm is shown to have strong approximation guarantees, and the paper shows empirically that the theory is predictive of experimental performance. Balanced adaptive greedy outperforms strong baselines in the experiments, including the most natural greedy strategies.
The paper also gives a tight approximation algorithm for generalized min-sum set cover over multiple agents, a special case of our model: the upper bound shown in this paper matches the lower bound introduced. A tight approximation for the general case is left as an interesting open problem. Beyond generalized min-sum set cover, another interesting special case of our problem is the one in which the monotone submodular functions of each agent are identical. Observe that this special case also captures the problem of optimal decision tree with multiple probability distributions. Thus, improving the approximation for this particular case would also be interesting.

Acknowledgments
Chenyang Xu was supported in part by Science and Technology Innovation 2030 –“The Next Generation
of Artificial Intelligence” Major Project No.2018AAA0100900. Qingyun Chen and Sungjin Im were
supported in part by NSF CCF-1844939 and CCF-2121745. Benjamin Moseley was supported in part
by a Google Research Award, an Infor Research Award, a Carnegie Bosch Junior Faculty Chair and
NSF grants CCF-1824303, CCF-1845146, CCF-1733873 and CMMI-1938909. We thank the anonymous
reviewers for their insightful comments and suggestions.

References
[1] Jacob D. Abernethy et al. “Active Sampling for Min-Max Fairness”. In: ICML. Vol. 162. Pro-
ceedings of Machine Learning Research. PMLR, 2022, pp. 53–65.
[2] Arpit Agarwal, Sepehr Assadi, and Sanjeev Khanna. “Stochastic submodular cover with limited
adaptivity”. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algo-
rithms. SIAM. 2019, pp. 323–342.
[3] Yossi Azar and Iftah Gamzu. “Ranking with Submodular Valuations”. In: SODA. SIAM, 2011,
pp. 1070–1079.
[4] Yossi Azar, Iftah Gamzu, and Xiaoxin Yin. “Multiple intents re-ranking”. In: Proceedings of the
forty-first annual ACM symposium on Theory of computing. 2009, pp. 669–678.
[5] Francis Bach et al. “Learning with submodular functions: A convex optimization perspective”.
In: Foundations and Trends® in Machine Learning 6.2-3 (2013), pp. 145–373.
[6] Nikhil Bansal, Anupam Gupta, and Ravishankar Krishnaswamy. “A Constant Factor Approxima-
tion Algorithm for Generalized Min-Sum Set Cover”. In: Proceedings of the Twenty-First Annual
ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-
19, 2010. SIAM, 2010, pp. 1539–1545.
[7] Nikhil Bansal et al. “Improved approximations for min sum vertex cover and generalized min sum
set cover”. In: Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA).
SIAM. 2021, pp. 998–1005.
[8] Solon Barocas, Moritz Hardt, and Arvind Narayanan. “Fairness in machine learning”. In: Nips
tutorial 1 (2017), p. 2.
[9] Venkatesan T Chakaravarthy et al. “Decision trees for entity identification: Approximation al-
gorithms and hardness results”. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-
SIGART symposium on Principles of database systems. 2007, pp. 53–62.
[10] Alexandra Chouldechova and Aaron Roth. “A snapshot of the frontiers of fairness in machine
learning”. In: Communications of the ACM 63.5 (2020), pp. 82–89.
[11] Sam Corbett-Davies et al. “Algorithmic decision making and the cost of fairness”. In: Proceedings
of the 23rd acm sigkdd international conference on knowledge discovery and data mining. 2017,
pp. 797–806.
[12] Sanjoy Dasgupta. “Analysis of a greedy active learning strategy”. In: Advances in neural infor-
mation processing systems 17 (2004).
[13] Uriel Feige. “A Threshold of ln n for Approximating Set Cover”. In: J. ACM 45.4 (1998), pp. 634–
652.
[14] Uriel Feige, László Lovász, and Prasad Tetali. “Approximating min sum set cover”. In: Algorith-
mica 40.4 (2004), pp. 219–234.
[15] Daniel Golovin, Andreas Krause, and Debajyoti Ray. “Near-optimal bayesian active learning with
noisy observations”. In: Advances in Neural Information Processing Systems 23 (2010).
[16] Anupam Gupta, Viswanath Nagarajan, and R Ravi. “Approximation algorithms for optimal de-
cision trees and adaptive TSP problems”. In: Mathematics of Operations Research 42.3 (2017),
pp. 876–896.
[17] Marwa El Halabi et al. “Fairness in Streaming Submodular Maximization: Algorithms and Hard-
ness”. In: NeurIPS. 2020.
[18] Sungjin Im, Viswanath Nagarajan, and Ruben van der Zwaan. “Minimum Latency Submodular
Cover”. In: ACM Trans. Algorithms 13.1 (2016), 13:1–13:28.
[19] Su Jia, Fatemeh Navidi, R Ravi, et al. “Optimal decision tree with noisy outcomes”. In: Advances
in neural information processing systems 32 (2019).

[20] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair
determination of risk scores”. In: arXiv preprint arXiv:1609.05807 (2016).
[21] Ninareh Mehrabi et al. “A survey on bias and fairness in machine learning”. In: ACM Computing
Surveys (CSUR) 54.6 (2021), pp. 1–35.
[22] Bozidar Radunovic and Jean-Yves Le Boudec. “A unified framework for max-min and min-max
fairness with applications”. In: IEEE/ACM Transactions on networking 15.5 (2007), pp. 1073–
1083.
[23] David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge
university press, 2011.

A More Experimental Results
A.1 Data Set Descriptions
The experiments are implemented on three public data sets: the mel-frequency cepstral coefficients of anuran calls (MFCC) data set⁷, the physicochemical properties of protein tertiary structures (PPPTS) data set⁸, and the diagnostic features of cardiotocograms (CTG) data set⁹.
The MFCC data set was created from 60 audio records belonging to 4 different families, 8 genera, and 10 species. It has 7195 rows and 22 columns, where each row corresponds to one specimen (an individual frog) and each column is an attribute. The PPPTS data set contains physicochemical properties of protein tertiary structures, taken from the Critical Assessment of protein Structure Prediction (CASP), a worldwide experiment for protein structure prediction held every two years since 1994. The data set consists of 45730 rows and 9 columns. The CTG data set consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians. It has 2126 rows and 23 columns, corresponding to 2126 fetal cardiotocograms and 23 diagnostic features.

A.2 Results on Other Data Sets


In this section, we show the performance of the algorithms on the PPPTS data set (Figure 3) and the CTG data set (Figure 4); the figures show the same trends. In addition, we also investigate the empirical results of these algorithms when the objective is the average cover time of the agents rather than the maximum cover time. Even in this case, our balanced adaptive greedy method still outperforms the other algorithms on all data sets (Figures 5, 6 and 7).


Figure 3: The performance of algorithms on the PPPTS dataset when K and M vary.

7 https://archive.ics.uci.edu/ml/datasets/Anuran+Calls+%28MFCCs%29
8 https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure#
9 https://archive.ics.uci.edu/ml/datasets/Cardiotocography


Figure 4: The performance of algorithms on the CTG dataset when K and M vary.


Figure 5: The algorithms’ average cover times of agents on the MFCC dataset when K and M vary.


Figure 6: The algorithms’ average cover times of agents on the PPPTS dataset when K and M vary.


Figure 7: The algorithms’ average cover times of agents on the CTG dataset when K and M vary.
