
-- Class lecture on Thu 3/14 is canceled! But the video of the lecture will be posted on SCPD.

-- HW4 due today!

-- CS341 info session is on Thu 3/14, 6pm in Gates 415

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

http://cs246.stanford.edu

Redundancy leads to a bad user experience

Uncertainty around the information need => don't put all your eggs in one basket

How do we optimize for diversity directly?



[Example front page, Monday, January 14, 2013: France intervenes; Chuck for Defense; Argo wins big; Hagel expects fight]

[Alternative front page for the same day: France intervenes; Chuck for Defense; Argo wins big; New gun proposals]

Idea: Encode diversity as a coverage problem. Example: word cloud of the news for a single day.
We want to select articles so that most words are covered.


Q: What is being covered? A: Concepts (here: named entities)

Example concepts: France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL

Example document: "Hagel expects fight"

Q: Who is doing the covering? A: Documents



Suppose we are given a set of documents V

Each document d covers a set X_d ⊆ W of words/topics/named entities

For a set of documents A ⊆ V we define F(A) = |⋃_{d∈A} X_d|

Goal: We want to max_{A: |A| ≤ k} F(A). Note: F(A) is a set function: it maps a set of documents A to the number of words it covers


Given a universe of elements W = {w_1, ..., w_n} and sets X_1, ..., X_m ⊆ W

[Figure: sets X_1, X_2, X_3, X_4 covering elements of W]

Goal: Find k sets X_i that cover the most of W

More precisely: find k sets X_i whose union has the largest size. Bad news: this is a known NP-complete problem


Simple Heuristic: Greedy Algorithm:
Start with A_0 = {}
For i = 1 ... k:
  Take the document d that maximizes F(A_{i-1} ∪ {d})
  Let A_i = A_{i-1} ∪ {d}

Example:
Eval. F({d_1}), ..., F({d_m}), pick the best (say d_1)
Eval. F({d_1} ∪ {d_2}), ..., F({d_1} ∪ {d_m}), pick the best (say d_5)
Eval. F({d_1, d_5} ∪ {d_2}), ..., F({d_1, d_5} ∪ {d_m}), pick the best
And so on...
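A minimal Python sketch of this greedy loop for the plain coverage objective F(A) = |⋃_{d∈A} X_d|; the documents and k below are made-up illustrative inputs, not data from the lecture.

```python
def greedy_max_coverage(doc_words, k):
    """Greedily pick k documents maximizing F(A) = |union of their word sets|."""
    selected, covered = [], set()
    for _ in range(k):
        # Pick the document with the largest marginal gain F(A ∪ {d}) - F(A)
        best_doc, best_gain = None, 0
        for doc, words in doc_words.items():
            if doc in selected:
                continue
            gain = len(words - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:        # no remaining document adds new words
            break
        selected.append(best_doc)
        covered |= doc_words[best_doc]
    return selected, covered

# Toy example (made-up data): words covered by each news article
docs = {
    "d1": {"France", "Mali"},
    "d2": {"Hagel", "Pentagon"},
    "d3": {"France", "Hagel"},
    "d4": {"Argo", "Obama"},
}
print(greedy_max_coverage(docs, k=2))   # picks d1 then d2, covering 4 words
```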

Greedy produces a solution A where: F(A) ≥ (1 − 1/e)·OPT, i.e., F(A) > 0.63·OPT


[Nemhauser, Fisher, Wolsey 78]

The claim holds for functions F(·) with 2 properties:

F is monotone (adding more docs doesn't decrease coverage): if A ⊆ B then F(A) ≤ F(B), and F({}) = 0
F is submodular: adding an element to a set gives less improvement than adding it to one of its subsets


Definition: A set function F(·) is called submodular if for all A, B ⊆ W: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

[Venn diagrams: overlapping sets A and B, their union A ∪ B and intersection A ∩ B]


Diminishing returns characterization. Equivalent definition: a set function F(·) is called submodular if for all A ⊆ B and d ∉ B:

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

Gain of adding d to a small set ≥ gain of adding d to a large set

[Figure: adding X_d to the smaller set A gives a large improvement; adding X_d to the larger set B gives a small improvement]


F(·) is submodular: for A ⊆ B,

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

Gain of adding X_d to a small set ≥ gain of adding X_d to a large set

Natural example: sets X_1, ..., X_m and F(A) = |⋃_{i∈A} X_i| (the size of the covered area)

[Figure: X_d overlaps the area already covered by B more than the area covered by A]

Claim: F(A) is submodular!
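A quick numerical sanity check of this claim on made-up sets: the gain of adding X_d to a set A is never smaller than the gain of adding it to a superset B.

```python
def coverage(sets, A):
    """F(A) = size of the union of the chosen sets."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

# Made-up example sets over a small universe
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
A, B, d = {1}, {1, 2}, 3                         # A ⊆ B, d not in B
gain_A = coverage(X, A | {d}) - coverage(X, A)   # = 3 (adds c, d, e)
gain_B = coverage(X, B | {d}) - coverage(X, B)   # = 2 (c is already covered by B)
assert gain_A >= gain_B   # submodularity: F(A∪{d})-F(A) ≥ F(B∪{d})-F(B)
```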

Submodularity is the discrete analogue of concavity

[Plot: F as a function of the solution size |A|; the curve flattens as |A| grows. Adding d to B helps less than adding it to A!]

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B), for A ⊆ B

Gain of adding X_d to a small set ≥ gain of adding X_d to a large set

Marginal gain: Δ_F(d | A) = F(A ∪ {d}) − F(A)
Submodularity: F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B), for A ⊆ B
Concavity: f(a + d) − f(a) ≥ f(b + d) − f(b), for a ≤ b

[Plot: concave F(A) as a function of |A|]

Let F_1, ..., F_m be submodular and λ_1, ..., λ_m > 0; then F(A) = Σ_i λ_i·F_i(A) is also submodular

Submodularity is closed under non-negative linear combinations! This is an extremely useful fact:
Average of submodular functions is submodular: F(A) = Σ_u P(u)·F_u(A)
Multicriterion optimization: F(A) = Σ_i λ_i·F_i(A)
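A one-line check of this closure fact, written in the diminishing-returns form (for A ⊆ B, d ∉ B, and λ_i ≥ 0):

```latex
F(A \cup \{d\}) - F(A)
  = \sum_i \lambda_i \bigl( F_i(A \cup \{d\}) - F_i(A) \bigr)
  \;\ge\; \sum_i \lambda_i \bigl( F_i(B \cup \{d\}) - F_i(B) \bigr)
  = F(B \cup \{d\}) - F(B)
```

Each term satisfies the diminishing-returns inequality for F_i, and multiplying by λ_i ≥ 0 preserves it.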


Objective: pick k docs that cover most concepts

[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; example documents: "Enthusiasm for Inauguration wanes", "Inauguration weekend"]

F(A) is the number of concepts covered by A

Elements = concepts; sets = the concepts appearing in each doc. F(A) is submodular and monotone! We can use greedy to optimize F

Objective: pick k docs that cover most concepts

[Figure: the same concepts and example documents as above]

This objective penalizes redundancy and is submodular, but:
Concept importance? → we need concept weights
All-or-nothing coverage is too harsh

Document coverage function: cover_d(c) = probability that document d covers concept c
[e.g., how strongly d covers c]

[Figure: document "Enthusiasm for Inauguration wanes" covering the concepts Obama and Romney with different strengths]

Document coverage function: cover_d(c) = probability that document d covers concept c

cover_d(c) can also model how relevant concept c is for user u

Set coverage function: cover_A(c) = 1 − ∏_{d∈A} (1 − cover_d(c))
i.e., the probability that at least one document in A covers c

Objective: F(A) = Σ_c w_c · cover_A(c), where the w_c are concept weights
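A sketch of this weighted probabilistic coverage objective under the reading above (cover_A(c) as a noisy-OR of the per-document probabilities); all weights and coverage probabilities below are made up for illustration.

```python
def prob_coverage(A, cover, weights):
    """F(A) = sum_c w_c * (1 - prod_{d in A} (1 - cover[d][c]))
    i.e. weighted probability that at least one selected doc covers concept c."""
    total = 0.0
    for c, w in weights.items():
        p_not_covered = 1.0
        for d in A:
            p_not_covered *= 1.0 - cover[d].get(c, 0.0)
        total += w * (1.0 - p_not_covered)
    return total

# Made-up concept weights and document-concept coverage probabilities
weights = {"Obama": 0.5, "Argo": 0.3, "France": 0.2}
cover = {
    "d1": {"Obama": 0.9, "Argo": 0.1},
    "d2": {"Argo": 0.8},
    "d3": {"Obama": 0.7, "France": 0.6},
}
print(prob_coverage({"d1"}, cover, weights))        # 0.5*0.9 + 0.3*0.1 = 0.48
print(prob_coverage({"d1", "d2"}, cover, weights))  # adding d2 mostly helps on Argo
```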

The objective function is submodular:

Intuitive diminishing-returns property
The greedy algorithm gives a (1 − 1/e) ≈ 63% approximation, i.e., a near-optimal solution

But greedy submodular function optimization is slow!

Greedy algorithm is slow!
Marginal gain: F(A ∪ {x}) − F(A)
At each iteration we need to re-evaluate the marginal gains of all remaining documents
Runtime O(|V|·k) for selecting k documents

[Figure: bars showing the marginal gains of documents a, b, c, d, e; add the document with the highest marginal gain]

[Leskovec et al., KDD 07]

In round i: so far we have A_{i−1} = {d_1, ..., d_{i−1}}

Now we pick d_i = arg max_d F(A_{i−1} ∪ {d}) − F(A_{i−1})
The greedy algorithm maximizes the marginal benefit δ_i(d) = F(A_{i−1} ∪ {d}) − F(A_{i−1})

Observation: by submodularity, for every document d: δ_i(d) ≥ δ_j(d) for i < j, since A_{i−1} ⊆ A_{j−1}
Marginal benefits δ_i(d) only shrink as i grows!

[Figure: δ_i(d) ≥ δ_j(d); selecting document d in step i covers more words than selecting d at step j (j > i)]

[Leskovec et al., KDD 07]

Idea: use δ_i as an upper bound on δ_j (j > i)

Lazy Greedy:
Keep an ordered list of marginal benefits δ_i from the previous iteration
Re-evaluate δ_i only for the top document
Re-sort and prune

By submodularity (for A ⊆ B): F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

[Figure: marginal gains of documents a, b, c, d, e; A_1 = {a}]
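A minimal sketch of lazy greedy using a max-heap of cached (possibly stale) marginal gains; it assumes only that F is monotone submodular. The documents and objective below are made up.

```python
import heapq

def lazy_greedy(items, F, k):
    """Lazy greedy for a monotone submodular F: cached gains are upper bounds,
    so we only re-evaluate the item currently at the top of the heap."""
    A = []
    f_A = F(A)
    # heap entries: (-gain, item, iteration in which the gain was computed)
    heap = [(-(F([x]) - f_A), x, 0) for x in items]
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while heap:
            neg_gain, x, stamp = heapq.heappop(heap)
            if stamp == it:             # gain is fresh for the current A: take it
                A.append(x)
                f_A = F(A)
                break
            gain = F(A + [x]) - f_A     # stale: re-evaluate and push back
            heapq.heappush(heap, (-gain, x, it))
    return A

# Example with the coverage objective (made-up document word sets)
docs = {"d1": {"France", "Mali"}, "d2": {"Hagel"}, "d3": {"France", "Hagel", "Argo"}}
F = lambda A: len(set().union(*(docs[d] for d in A))) if A else 0
print(lazy_greedy(list(docs), F, k=2))   # picks d3 first, then d1
```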


Summary so far:
Diversity can be formulated as a set cover problem
Set cover is a submodular optimization problem
It can be (approximately) solved using the greedy algorithm
Lazy greedy gives a significant speedup

[Plot: running time in seconds (lower is better) vs. number of blogs selected (1-10) for exhaustive search (all subsets), naive greedy, and lazy greedy; lazy greedy is by far the fastest]

But what about personalization?

[Figure: a recommendation model producing personalized recommendations: Election trouble; Songs of Syria; Sandy delays]

So far we assumed the same concept weighting for all users

[Figure: one weight vector over the concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL, and the resulting recommendations: France intervenes; Chuck for Defense; Argo wins big]

But each user has different preferences over concepts

[Figure: two different weight vectors over the same concepts, one for a "politico" and one for a "movie buff"]

Assume each user has a different preference vector over concepts

Goal: learn personal concept weights from user feedback

[Figure: a user's concept weights over France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL and the recommended articles: France intervenes; Chuck for Defense; Argo wins big]

Multiplicative Weights algorithm:

Assume each concept c has weight w_c
We recommend document d and receive feedback r, say r = +1 or −1
Update the weights for every concept c ∈ X_d:
If r = +1 then w_c = w_c · β
If r = −1 then w_c = w_c / β
That is, if concept c appears in X_d and we received positive feedback r = +1 then we increase the weight w_c by multiplying it by β (β > 1); otherwise we decrease the weight (divide by β)

Normalize the weights so that Σ_c w_c = 1
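A sketch of this update rule; the value of β and the concept weights below are illustrative assumptions, since the slide leaves the exact parameterization open.

```python
def mw_update(weights, doc_concepts, r, beta=1.5):
    """Multiplicative-weights update after recommending a document.
    weights: dict concept -> w_c; doc_concepts: concepts X_d covered by the doc;
    r: feedback, +1 (liked) or -1 (disliked); beta > 1."""
    for c in doc_concepts:
        if c in weights:
            # increase w_c on positive feedback, decrease it on negative feedback
            weights[c] = weights[c] * beta if r == +1 else weights[c] / beta
    # re-normalize so the weights sum to 1
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

w = {"Obama": 0.4, "Argo": 0.3, "NFL": 0.3}      # made-up user profile
w = mw_update(w, doc_concepts={"Argo"}, r=+1)    # user liked an Argo article
print(w)   # weight on "Argo" goes up, the others shrink after normalization
```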

Steps of the algorithm:

1. Identify the items to recommend from
2. Identify concepts [what makes items redundant?]
3. Weigh concepts by general importance
4. Define the document coverage function
5. Select items using probabilistic set cover
6. Obtain feedback, update the concept weights

Q: I have 10 minutes. Which blogs should I read to be most up to date?
Q: Who are the most influential bloggers?

Want to read things before others do

[Figure: two blog-reading strategies: one detects the blue & yellow stories soon but misses the red one; the other detects all stories, but late]

[Leskovec et al., KDD 07]

Given:
Graph G(V, E)
Data on how information spreads over G: for each information cascade i we know the time T(u, i) at which the cascade hits blog u

Goal: Select a subset of nodes S that maximizes the expected reward:
max_S F(S) = Σ_i P(i)·F_i(S), subject to cost(S) < B
where F_i(S) is the expected reward for detecting cascade i and B is the budget

Reward:
Maximize the number of blogs that hear the information after we do
If S is the set of blogs we read, F(S) is the number of blogs that we beat
The reward is submodular

Cost (blog dependent):
Reading big blogs is more time consuming

[Figure: cascade i spreading over blogs; F(S) counts the blogs we beat. Reading the blue blog is better than the green blog]

[Leskovec et al., KDD 07]

Consider the following algorithm for the information cascade detection problem: greedy that ignores cost
Ignore the blog cost
Repeatedly select the blog with the highest marginal gain
Do this until the budget is exhausted

How well does this work?

[Leskovec et al., KDD 07]

Bad example:
n blogs, budget B:
Blog a: reward r, cost B
Blogs b: reward r − ε, cost 1
Greedy always prefers the more expensive blog a with reward r (and exhausts the budget); it never selects the cheaper blogs with reward r − ε
For variable costs it can fail arbitrarily badly!

Idea: What if we optimize the benefit-cost ratio?
Greedily pick the blog that maximizes the benefit-to-cost ratio:
x_i = arg max_x [F(A_{i−1} ∪ {x}) − F(A_{i−1})] / c(x)

[Leskovec et al., KDD 07]

The benefit-cost ratio can also fail arbitrarily badly! Consider: budget B,

2 blogs a and b:
Costs: c(a) = ε, c(b) = B
Only 1 information cascade, with rewards: F({a}) = 2ε, F({b}) = B

Then the benefit-cost ratios are:
F({a})/c(a) = 2 and F({b})/c(b) = 1

So we first select a, but then we can no longer afford b
We get reward 2ε instead of B. Now let ε → 0: we can get an arbitrarily bad solution!

[Leskovec et al., KDD 07]

CELF (Cost-Effective Lazy Forward-selection): a two-pass greedy algorithm:

Solution A': use benefit-cost greedy
Solution A'': use unit-cost greedy

Final solution: S = arg max(F(A'), F(A''))

How far is CELF from the (unknown) optimal solution?
Theorem: CELF is near optimal
CELF achieves a ½(1 − 1/e) factor approximation!
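A compact sketch of the two-pass idea: run a budgeted greedy with and without cost normalization and keep the better solution. This is not the paper's exact implementation: the lazy evaluations that give CELF its speed are omitted, both passes skip items that no longer fit the budget, and the blogs, costs, and objective below are made up.

```python
def budgeted_greedy(items, F, cost, B, use_ratio):
    """Greedy under budget B: rank by marginal gain (use_ratio=False)
    or by marginal gain / cost (use_ratio=True), skipping items that don't fit."""
    A, spent = [], 0.0
    candidates = set(items)
    while candidates:
        def score(x):
            gain = F(A + [x]) - F(A)
            return gain / cost[x] if use_ratio else gain
        x = max(candidates, key=score)
        candidates.remove(x)
        if spent + cost[x] <= B and F(A + [x]) > F(A):
            A.append(x)
            spent += cost[x]
    return A

def celf(items, F, cost, B):
    """CELF idea: run both greedy variants and keep the better solution."""
    A1 = budgeted_greedy(items, F, cost, B, use_ratio=True)   # benefit-cost greedy
    A2 = budgeted_greedy(items, F, cost, B, use_ratio=False)  # unit-cost greedy
    return A1 if F(A1) >= F(A2) else A2

# Toy usage (made-up data): blogs cover sets of cascades, with per-blog costs
blogs = {"b1": {1, 2}, "b2": {2, 3, 4}, "b3": {5}}
cost = {"b1": 1.0, "b2": 3.0, "b3": 1.0}
F = lambda A: len(set().union(*(blogs[b] for b in A))) if A else 0
print(celf(list(blogs), F, cost, B=2.0))   # picks the two cheap blogs within budget
```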

Back to the solution quality! The (1 − 1/e) bound for submodular functions is a worst-case bound (the worst over all possible inputs)

Data-dependent bound:
The value of the bound depends on the input data
On "easy" data, hill climbing may do better than 63%

Can we say something about the solution quality when we know the input data?

Suppose A is some solution to max F(A) s.t. |A| = k, where:

F(·) is a set function over sets of docs (e.g., the coverage function)
F(·) is monotone & submodular

Let OPT = {t_1, ..., t_k} be the optimal solution
For each document d let δ(d) = F(A ∪ {d}) − F(A)
Order the documents so that δ(1) ≥ δ(2) ≥ ...
Then: F(OPT) ≤ F(A) + Σ_{i=1}^{k} δ(i)

Note:
This is a data-dependent bound (the δ(i) depend on the input data)
The bound holds for any algorithm that produces A
For some inputs it can be very loose (worse than 63%)
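A small sketch of how one could evaluate this data-dependent bound for a given solution A (coverage objective and documents are made up):

```python
def online_bound(A, items, F, k):
    """Data-dependent bound: F(OPT) <= F(A) + sum of the k largest marginal gains
    delta(d) = F(A ∪ {d}) - F(A) over the remaining items d."""
    f_A = F(A)
    deltas = sorted((F(A + [d]) - f_A for d in items if d not in A), reverse=True)
    return f_A + sum(deltas[:k])

docs = {"d1": {"France", "Mali"}, "d2": {"Hagel"}, "d3": {"France", "Argo"}, "d4": {"NFL"}}
F = lambda A: len(set().union(*(docs[d] for d in A))) if A else 0
A = ["d1", "d3"]                                   # some k=2 solution (e.g. from greedy)
print(F(A), online_bound(A, list(docs), F, k=2))   # F(A)=3, and F(OPT) <= 3 + (1+1) = 5
```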

The online bound is much tighter!

E.g., within 13% of optimal instead of 37%

[Plot: the online bound F(A) + Σ_i δ(i) compared with the worst-case (1 − 1/e) bound, as a function of the number of selected blogs]

[Leskovec et al., KDD 07]

Heuristics perform much worse! One really needs to perform the optimization
