
-- Class lecture on Thu 3/14 is canceled! But the video of the lecture will be posted on SCPD.

-- HW4 due today!

-- CS341 info session is on Thu 3/14, 6pm in Gates 415

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

http://cs246.stanford.edu

Redundancy leads to a bad user experience

Uncertainty around the information need => don't put all your eggs in one basket

How do we optimize for diversity directly?



[Example front page, Monday, January 14, 2013: France intervenes; Chuck for Defense; Argo wins big; Hagel expects fight]

[Alternative front page for the same day: France intervenes; Chuck for Defense; Argo wins big; New gun proposals]

Idea: Encode diversity as a coverage problem. Example: word cloud of the news for a single day.
We want to select articles so that most words are covered.


Q: What is being covered? A: Concepts (here: named entities)

Example concepts: France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL

Example document: "Hagel expects fight"

Q: Who is doing the covering? A: Documents



Suppose we are given a set of documents V

Each document d covers a set X_d ⊆ W of words/topics/named entities

For a set of documents A ⊆ V we define F(A) = |⋃_{d∈A} X_d|

Goal: We want to max_{A: |A| ≤ k} F(A). Note: F(A) is a set function: it maps a set of documents A to the number of words it covers


Given a universe of elements W = {w_1, ..., w_n} and sets X_1, ..., X_m ⊆ W

[Figure: sets X_1, X_2, X_3, X_4 covering elements of W]

Goal: Find k sets X_i that cover the most of W

More precisely: find k sets X_i whose union has the largest size. Bad news: this is a known NP-complete problem


Simple Heuristic: Greedy Algorithm:
Start with A_0 = {}
For i = 1 ... k:
  Take the document d that maximizes F(A_{i-1} ∪ {d})
  Let A_i = A_{i-1} ∪ {d}

Example:
Eval. F({d_1}), ..., F({d_m}), pick the best (say d_1)
Eval. F({d_1} ∪ {d_2}), ..., F({d_1} ∪ {d_m}), pick the best (say d_5)
Eval. F({d_1, d_5} ∪ {d_2}), ..., F({d_1, d_5} ∪ {d_m}), pick the best
And so on...
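A minimal Python sketch of this greedy loop for the plain coverage objective F(A) = |⋃_{d∈A} X_d|; the documents and k below are made-up illustrative inputs, not data from the lecture.

```python
def greedy_max_coverage(doc_words, k):
    """Greedily pick k documents maximizing F(A) = |union of their word sets|."""
    selected, covered = [], set()
    for _ in range(k):
        # Pick the document with the largest marginal gain F(A ∪ {d}) - F(A)
        best_doc, best_gain = None, 0
        for doc, words in doc_words.items():
            if doc in selected:
                continue
            gain = len(words - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:        # no remaining document adds new words
            break
        selected.append(best_doc)
        covered |= doc_words[best_doc]
    return selected, covered

# Toy example (made-up data): words covered by each news article
docs = {
    "d1": {"France", "Mali"},
    "d2": {"Hagel", "Pentagon"},
    "d3": {"France", "Hagel"},
    "d4": {"Argo", "Obama"},
}
print(greedy_max_coverage(docs, k=2))   # picks d1 then d2, covering 4 words
```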

Greedy produces a solution A where: F(A) ≥ (1 − 1/e)·OPT, i.e., F(A) > 0.63·OPT


[Nemhauser, Fisher, Wolsey 78]

The claim holds for functions F(·) with 2 properties:

F is monotone (adding more docs doesn't decrease coverage): if A ⊆ B then F(A) ≤ F(B), and F({}) = 0
F is submodular: adding an element to a set gives less improvement than adding it to one of its subsets


Definition: A set function F(·) is called submodular if for all A, B ⊆ W: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

[Venn diagrams: overlapping sets A and B, their union A ∪ B and intersection A ∩ B]


Diminishing returns characterization. Equivalent definition: a set function F(·) is called submodular if for all A ⊆ B and d ∉ B:

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

Gain of adding d to a small set ≥ gain of adding d to a large set

[Figure: adding X_d to the smaller set A gives a large improvement; adding X_d to the larger set B gives a small improvement]


F(·) is submodular: for A ⊆ B,

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

Gain of adding X_d to a small set ≥ gain of adding X_d to a large set

Natural example: sets X_1, ..., X_m and F(A) = |⋃_{i∈A} X_i| (the size of the covered area)

[Figure: X_d overlaps the area already covered by B more than the area covered by A]

Claim: F(A) is submodular!
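A quick numerical sanity check of this claim on made-up sets: the gain of adding X_d to a set A is never smaller than the gain of adding it to a superset B.

```python
def coverage(sets, A):
    """F(A) = size of the union of the chosen sets."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

# Made-up example sets over a small universe
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
A, B, d = {1}, {1, 2}, 3                         # A ⊆ B, d not in B
gain_A = coverage(X, A | {d}) - coverage(X, A)   # = 3 (adds c, d, e)
gain_B = coverage(X, B | {d}) - coverage(X, B)   # = 2 (c is already covered by B)
assert gain_A >= gain_B   # submodularity: F(A∪{d})-F(A) ≥ F(B∪{d})-F(B)
```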

Submodularity is the discrete analogue of concavity

[Plot: F as a function of the solution size |A|; the curve flattens as |A| grows. Adding d to B helps less than adding it to A!]

F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B), for A ⊆ B

Gain of adding X_d to a small set ≥ gain of adding X_d to a large set

Marginal gain: Δ_F(d | A) = F(A ∪ {d}) − F(A)
Submodularity: F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B), for A ⊆ B
Concavity: f(a + d) − f(a) ≥ f(b + d) − f(b), for a ≤ b

[Plot: concave F(A) as a function of |A|]

Let F_1, ..., F_m be submodular and λ_1, ..., λ_m > 0; then F(A) = Σ_i λ_i·F_i(A) is also submodular

Submodularity is closed under non-negative linear combinations! This is an extremely useful fact:
Average of submodular functions is submodular: F(A) = Σ_u P(u)·F_u(A)
Multicriterion optimization: F(A) = Σ_i λ_i·F_i(A)
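A one-line check of this closure fact, written in the diminishing-returns form (for A ⊆ B, d ∉ B, and λ_i ≥ 0):

```latex
F(A \cup \{d\}) - F(A)
  = \sum_i \lambda_i \bigl( F_i(A \cup \{d\}) - F_i(A) \bigr)
  \;\ge\; \sum_i \lambda_i \bigl( F_i(B \cup \{d\}) - F_i(B) \bigr)
  = F(B \cup \{d\}) - F(B)
```

Each term satisfies the diminishing-returns inequality for F_i, and multiplying by λ_i ≥ 0 preserves it.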


Objective: pick k docs that cover most concepts

[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; example documents: "Enthusiasm for Inauguration wanes", "Inauguration weekend"]

F(A) is the number of concepts covered by A

Elements = concepts; sets = the concepts appearing in each doc. F(A) is submodular and monotone! We can use greedy to optimize F

Objective: pick k docs that cover most concepts

[Figure: the same concepts and example documents as above]

This objective penalizes redundancy and is submodular, but:
Concept importance? → we need concept weights
All-or-nothing coverage is too harsh

Document coverage function: cover_d(c) = probability that document d covers concept c
[e.g., how strongly d covers c]

[Figure: document "Enthusiasm for Inauguration wanes" covering the concepts Obama and Romney with different strengths]

Document coverage function: cover_d(c) = probability that document d covers concept c

cover_d(c) can also model how relevant concept c is for user u

Set coverage function: cover_A(c) = 1 − ∏_{d∈A} (1 − cover_d(c))
i.e., the probability that at least one document in A covers c

Objective: F(A) = Σ_c w_c · cover_A(c), where the w_c are concept weights
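A sketch of this weighted probabilistic coverage objective under the reading above (cover_A(c) as a noisy-OR of the per-document probabilities); all weights and coverage probabilities below are made up for illustration.

```python
def prob_coverage(A, cover, weights):
    """F(A) = sum_c w_c * (1 - prod_{d in A} (1 - cover[d][c]))
    i.e. weighted probability that at least one selected doc covers concept c."""
    total = 0.0
    for c, w in weights.items():
        p_not_covered = 1.0
        for d in A:
            p_not_covered *= 1.0 - cover[d].get(c, 0.0)
        total += w * (1.0 - p_not_covered)
    return total

# Made-up concept weights and document-concept coverage probabilities
weights = {"Obama": 0.5, "Argo": 0.3, "France": 0.2}
cover = {
    "d1": {"Obama": 0.9, "Argo": 0.1},
    "d2": {"Argo": 0.8},
    "d3": {"Obama": 0.7, "France": 0.6},
}
print(prob_coverage({"d1"}, cover, weights))        # 0.5*0.9 + 0.3*0.1 = 0.48
print(prob_coverage({"d1", "d2"}, cover, weights))  # adding d2 mostly helps on Argo
```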

The objective function is submodular:

Intuitive diminishing-returns property
The greedy algorithm gives a (1 − 1/e) ≈ 63% approximation, i.e., a near-optimal solution

But greedy submodular function optimization is slow!

Greedy algorithm is slow!
Marginal gain: F(A ∪ {x}) − F(A)
At each iteration we need to re-evaluate the marginal gains of all remaining documents
Runtime O(|V|·k) for selecting k documents

[Figure: bars showing the marginal gains of documents a, b, c, d, e; add the document with the highest marginal gain]

[Leskovec et al., KDD 07]

In round i: so far we have A_{i−1} = {d_1, ..., d_{i−1}}

Now we pick d_i = arg max_d F(A_{i−1} ∪ {d}) − F(A_{i−1})
The greedy algorithm maximizes the marginal benefit δ_i(d) = F(A_{i−1} ∪ {d}) − F(A_{i−1})

Observation: by submodularity, for every document d: δ_i(d) ≥ δ_j(d) for i < j, since A_{i−1} ⊆ A_{j−1}
Marginal benefits δ_i(d) only shrink as i grows!

[Figure: δ_i(d) ≥ δ_j(d); selecting document d in step i covers more words than selecting d at step j (j > i)]

[Leskovec et al., KDD 07]

Idea: use δ_i as an upper bound on δ_j (j > i)

Lazy Greedy:
Keep an ordered list of marginal benefits δ_i from the previous iteration
Re-evaluate δ_i only for the top document
Re-sort and prune

By submodularity (for A ⊆ B): F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)

[Figure: marginal gains of documents a, b, c, d, e; A_1 = {a}]
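A minimal sketch of lazy greedy using a max-heap of cached (possibly stale) marginal gains; it assumes only that F is monotone submodular. The documents and objective below are made up.

```python
import heapq

def lazy_greedy(items, F, k):
    """Lazy greedy for a monotone submodular F: cached gains are upper bounds,
    so we only re-evaluate the item currently at the top of the heap."""
    A = []
    f_A = F(A)
    # heap entries: (-gain, item, iteration in which the gain was computed)
    heap = [(-(F([x]) - f_A), x, 0) for x in items]
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while heap:
            neg_gain, x, stamp = heapq.heappop(heap)
            if stamp == it:             # gain is fresh for the current A: take it
                A.append(x)
                f_A = F(A)
                break
            gain = F(A + [x]) - f_A     # stale: re-evaluate and push back
            heapq.heappush(heap, (-gain, x, it))
    return A

# Example with the coverage objective (made-up document word sets)
docs = {"d1": {"France", "Mali"}, "d2": {"Hagel"}, "d3": {"France", "Hagel", "Argo"}}
F = lambda A: len(set().union(*(docs[d] for d in A))) if A else 0
print(lazy_greedy(list(docs), F, k=2))   # picks d3 first, then d1
```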


Summary so far:
Diversity can be formulated as a set cover problem
Set cover is a submodular optimization problem
It can be (approximately) solved using the greedy algorithm
Lazy greedy gives a significant speedup

[Plot: running time in seconds (lower is better) vs. number of blogs selected (1-10) for exhaustive search (all subsets), naive greedy, and lazy greedy; lazy greedy is by far the fastest]

But what about personalization?

[Figure: a recommendation model producing personalized recommendations: Election trouble; Songs of Syria; Sandy delays]

So far we assumed the same concept weighting for all users

[Figure: one weight vector over the concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL, and the resulting recommendations: France intervenes; Chuck for Defense; Argo wins big]

But each user has different preferences over concepts

[Figure: two different weight vectors over the same concepts, one for a "politico" and one for a "movie buff"]

Assume each user has a different preference vector over concepts

Goal: learn personal concept weights from user feedback

[Figure: a user's concept weights over France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL and the recommended articles: France intervenes; Chuck for Defense; Argo wins big]

Multiplicative Weights algorithm:

Assume each concept c has weight w_c
We recommend document d and receive feedback r, say r = +1 or −1
Update the weights for every concept c ∈ X_d:
If r = +1 then w_c = w_c · β
If r = −1 then w_c = w_c / β
That is, if concept c appears in X_d and we received positive feedback r = +1 then we increase the weight w_c by multiplying it by β (β > 1); otherwise we decrease the weight (divide by β)

Normalize the weights so that Σ_c w_c = 1
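A sketch of this update rule; the value of β and the concept weights below are illustrative assumptions, since the slide leaves the exact parameterization open.

```python
def mw_update(weights, doc_concepts, r, beta=1.5):
    """Multiplicative-weights update after recommending a document.
    weights: dict concept -> w_c; doc_concepts: concepts X_d covered by the doc;
    r: feedback, +1 (liked) or -1 (disliked); beta > 1."""
    for c in doc_concepts:
        if c in weights:
            # increase w_c on positive feedback, decrease it on negative feedback
            weights[c] = weights[c] * beta if r == +1 else weights[c] / beta
    # re-normalize so the weights sum to 1
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

w = {"Obama": 0.4, "Argo": 0.3, "NFL": 0.3}      # made-up user profile
w = mw_update(w, doc_concepts={"Argo"}, r=+1)    # user liked an Argo article
print(w)   # weight on "Argo" goes up, the others shrink after normalization
```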

Steps of the algorithm:

1. Identify the items to recommend from
2. Identify concepts [what makes items redundant?]
3. Weigh concepts by general importance
4. Define the document coverage function
5. Select items using probabilistic set cover
6. Obtain feedback, update the concept weights

Q: I have 10 minutes. Which blogs should I read to be most up to date?
Q: Who are the most influential bloggers?

Want to read things before others do

[Figure: two blog-reading strategies: one detects the blue & yellow stories soon but misses the red one; the other detects all stories, but late]

[Leskovec et al., KDD 07]

Given:
Graph G(V, E)
Data on how information spreads over G: for each information cascade i we know the time T(u, i) at which the cascade hits blog u

Goal: Select a subset of nodes S that maximizes the expected reward:
max_S F(S) = Σ_i P(i)·F_i(S), subject to cost(S) < B
where F_i(S) is the expected reward for detecting cascade i and B is the budget

Reward:
Maximize the number of blogs that hear the information after we do
If S is the set of blogs we read, F(S) is the number of blogs that we beat
The reward is submodular

Cost (blog dependent):
Reading big blogs is more time consuming

[Figure: cascade i spreading over blogs; F(S) counts the blogs we beat. Reading the blue blog is better than the green blog]

[Leskovec et al., KDD 07]

Consider the following algorithm for the information cascade detection problem: greedy that ignores cost
Ignore the blog cost
Repeatedly select the blog with the highest marginal gain
Do this until the budget is exhausted

How well does this work?

[Leskovec et al., KDD 07]

Bad example:
n blogs, budget B:
Blog a: reward r, cost B
Blogs b: reward r − ε, cost 1
Greedy always prefers the more expensive blog a with reward r (and exhausts the budget); it never selects the cheaper blogs with reward r − ε
For variable costs it can fail arbitrarily badly!

Idea: What if we optimize the benefit-cost ratio?
Greedily pick the blog that maximizes the benefit-to-cost ratio:
x_i = arg max_x [F(A_{i−1} ∪ {x}) − F(A_{i−1})] / c(x)

[Leskovec et al., KDD 07]

The benefit-cost ratio can also fail arbitrarily badly! Consider: budget B,

2 blogs a and b:
Costs: c(a) = ε, c(b) = B
Only 1 information cascade, with rewards: F({a}) = 2ε, F({b}) = B

Then the benefit-cost ratios are:
F({a})/c(a) = 2 and F({b})/c(b) = 1

So we first select a, but then we can no longer afford b
We get reward 2ε instead of B. Now let ε → 0: we can get an arbitrarily bad solution!

[Leskovec et al., KDD 07]

CELF (Cost-Effective Lazy Forward-selection): a two-pass greedy algorithm:

Solution A': use benefit-cost greedy
Solution A'': use unit-cost greedy

Final solution: S = arg max(F(A'), F(A''))

How far is CELF from the (unknown) optimal solution?
Theorem: CELF is near optimal
CELF achieves a ½(1 − 1/e) factor approximation!
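A compact sketch of the two-pass idea: run a budgeted greedy with and without cost normalization and keep the better solution. This is not the paper's exact implementation: the lazy evaluations that give CELF its speed are omitted, both passes skip items that no longer fit the budget, and the blogs, costs, and objective below are made up.

```python
def budgeted_greedy(items, F, cost, B, use_ratio):
    """Greedy under budget B: rank by marginal gain (use_ratio=False)
    or by marginal gain / cost (use_ratio=True), skipping items that don't fit."""
    A, spent = [], 0.0
    candidates = set(items)
    while candidates:
        def score(x):
            gain = F(A + [x]) - F(A)
            return gain / cost[x] if use_ratio else gain
        x = max(candidates, key=score)
        candidates.remove(x)
        if spent + cost[x] <= B and F(A + [x]) > F(A):
            A.append(x)
            spent += cost[x]
    return A

def celf(items, F, cost, B):
    """CELF idea: run both greedy variants and keep the better solution."""
    A1 = budgeted_greedy(items, F, cost, B, use_ratio=True)   # benefit-cost greedy
    A2 = budgeted_greedy(items, F, cost, B, use_ratio=False)  # unit-cost greedy
    return A1 if F(A1) >= F(A2) else A2

# Toy usage (made-up data): blogs cover sets of cascades, with per-blog costs
blogs = {"b1": {1, 2}, "b2": {2, 3, 4}, "b3": {5}}
cost = {"b1": 1.0, "b2": 3.0, "b3": 1.0}
F = lambda A: len(set().union(*(blogs[b] for b in A))) if A else 0
print(celf(list(blogs), F, cost, B=2.0))   # picks the two cheap blogs within budget
```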

Back to the solution quality! The (1 − 1/e) bound for submodular functions is a worst-case bound (the worst over all possible inputs)

Data-dependent bound:
The value of the bound depends on the input data
On "easy" data, hill climbing may do better than 63%

Can we say something about the solution quality when we know the input data?

Suppose A is some solution to max F(A) s.t. |A| = k, where:

F(·) is a set function over sets of docs (e.g., the coverage function)
F(·) is monotone & submodular

Let OPT = {t_1, ..., t_k} be the optimal solution
For each document d let δ(d) = F(A ∪ {d}) − F(A)
Order the documents so that δ(1) ≥ δ(2) ≥ ...
Then: F(OPT) ≤ F(A) + Σ_{i=1}^{k} δ(i)

Note:
This is a data-dependent bound (the δ(i) depend on the input data)
The bound holds for any algorithm that produces A
For some inputs it can be very loose (worse than 63%)
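A small sketch of how one could evaluate this data-dependent bound for a given solution A (coverage objective and documents are made up):

```python
def online_bound(A, items, F, k):
    """Data-dependent bound: F(OPT) <= F(A) + sum of the k largest marginal gains
    delta(d) = F(A ∪ {d}) - F(A) over the remaining items d."""
    f_A = F(A)
    deltas = sorted((F(A + [d]) - f_A for d in items if d not in A), reverse=True)
    return f_A + sum(deltas[:k])

docs = {"d1": {"France", "Mali"}, "d2": {"Hagel"}, "d3": {"France", "Argo"}, "d4": {"NFL"}}
F = lambda A: len(set().union(*(docs[d] for d in A))) if A else 0
A = ["d1", "d3"]                                   # some k=2 solution (e.g. from greedy)
print(F(A), online_bound(A, list(docs), F, k=2))   # F(A)=3, and F(OPT) <= 3 + (1+1) = 5
```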

The online bound is much tighter!

E.g., within 13% of optimal instead of 37%

[Plot: the online bound F(A) + Σ_i δ(i) compared with the worst-case (1 − 1/e) bound, as a function of the number of selected blogs]

[Leskovec et al., KDD 07]

Heuristics perform much worse! One really needs to perform the optimization
