
Information Filtering

Information Systems M

Prof. Paolo Ciaccia


http://www-db.deis.unibo.it/courses/SI-M/

Information Retrieval vs Information Filtering

- The Information Filtering (IF) problem:
  - Deliver to users only the information that is relevant to them, filtering out all irrelevant new data items (news, papers, advertisements, …)
- Although IF and IR share the common goal of providing users with relevant information, there are important differences:

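| Aspect                              | IR                                             | IF                                           |
|-------------------------------------|------------------------------------------------|----------------------------------------------|
| Goal                                | Selecting relevant items (docs) for each query | Filtering out the many irrelevant data items |
| Type of use                         | Ad-hoc use                                     | Repetitive use                               |
| Type of users                       | One-time users                                 | Long-term users                              |
| Representation of information needs | Queries                                        | User profiles                                |
| Index                               | Items                                          | User profiles                                |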

Information Filtering Sistemi Informativi M 2

Applications of IF
- IF techniques find applications in a variety of scenarios, including:
  - Automatic delivery of news/alerts
  - Online display advertising
  - Publish/subscribe systems
  - …
- Recommender systems are a specific type of IF systems that will be discussed later on


Models of IF
- Given its similarity with IR, it is no surprise that the most common approaches to IF are based on the Boolean model and the Vector Space Model (VSM)
- However, a more detailed and structured description of the user profile is now needed in order to improve the effectiveness of matching

- In the following we outline a recent approach based on the Boolean model; examples of the use of the VSM will be given in the context of recommender systems

Matching and indexing of Boolean expressions
 Reference: [WBS+09]
Scenario:
- A (profiled) user visiting a web site (also called an “assignment”)
- Many advertising campaigns managed by the site
- Both are specified using Boolean expressions (BE’s) over a multi-attribute space

Alternatively (pub/sub system):
- An incoming item (the assignment)
- Many stored user profiles (the BE’s)

In both cases, one “assignment” has to be efficiently matched against many stored BE’s.

[Figure: the assignment is submitted to an index on the BE’s, which returns the matched BE’s]


BE model
- Two types of Boolean predicates: ∈ and ∉
  - E.g.: state ∈ {CA,NY}, state ∉ {NY}
- Ranges of values are converted into ∈ and ∉ predicates
  - E.g.: age < 30 is converted into age ∈ {0,1,2} (0 = [0,9], 1 = [10,19], …)
- A BE is in either disjunctive normal form (DNF) or conjunctive normal form (CNF), e.g., in DNF:
  (state ∈ {CA,NY} & age ∈ {1,2}) | (state ∉ {NY} & gender ∈ {F})
  - & = AND; | = OR
- In the following we only discuss the DNF case

- An assignment S is a set (conjunction) of attribute-value pairs
  - E.g.: S: state = CA & gender = F
- An attribute-value pair is also called a key
  - E.g.: (state,CA) is a key
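The range conversion described in the BE model can be sketched as follows. This is a minimal illustration assuming, as in the age example, fixed-width buckets of 10 starting at 0 and a half-open range [lo, hi); the function name `range_to_buckets` is hypothetical.

```python
def range_to_buckets(lo: int, hi: int, width: int = 10) -> set[int]:
    """Return the IDs b of the buckets [b*width, (b+1)*width - 1] covering [lo, hi).

    Assumption of this sketch: lo and hi lie on bucket boundaries,
    so the conversion into an ∈ predicate is exact.
    """
    if lo % width or hi % width:
        raise ValueError("range must start and end on bucket boundaries")
    return set(range(lo // width, hi // width))


# age < 30, i.e. [0, 30)  ->  age ∈ {0, 1, 2}
print(range_to_buckets(0, 30))  # {0, 1, 2}
```

A range that does not align with the buckets (e.g. age < 25) cannot be represented exactly this way, so the sketch simply rejects it.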

BE matching
- A BE E is satisfied by an assignment S if S makes E true
  - S: state = CA & gender = F
  - E1: state ∈ {CA,NY} is satisfied
  - E2: state ∈ {CA,NY} & gender ∈ {M} is not satisfied
- Since an assignment need not specify a value for every attribute, the semantics of matching has to be refined:
  - Is (state ∈ {NY} & gender ∈ {F}) satisfied by gender = F? NO
  - Is (state ∉ {NY} & gender ∈ {F}) satisfied by gender = F? MAYBE…
- Two alternative interpretations for ∉ predicates:
  - Strong-∉ predicate: violated if no value is specified for the attribute
  - Weak-∉ predicate: satisfied if no value is specified for the attribute
- By default, ∉ predicates are weak (so the MAYBE case above is answered YES)
- The strong-∉ semantics can be enforced by writing, e.g., state ∉ {NY,NULL}, which requires a value for state to be present in the assignment

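Inverted index: basic principle
- The basic idea is to build an inverted index on BE’s that, for each key, stores the BE’s containing it
- The basic case is when BE’s are simple conjunctions of ∈ predicates

E1: A ∈ {1}
E2: A ∈ {1} & B ∈ {2} & C ∈ {3,4}
S: A = 1 & B = 2

Inverted index:
| Key   | Posting list |
|-------|--------------|
| (A,1) | E1, E2       |
| (B,2) | E2           |
| (C,3) | E2           |
| (C,4) | E2           |

The problem is that neither the intersection nor the union of the posting lists of the keys in S works here, since the correct result is E1 only (E2 also requires a value for C, which S does not specify):
- Intersection: E2 (E1 is missed and E2 is wrongly returned)
- Union: E1 and E2 (E2 is wrongly returned)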
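The failure of both set operations can be reproduced on the example with plain Python sets; the dictionary layout of the index is just an illustrative choice.

```python
# Posting lists of the example: key -> set of BE ids containing it
index = {
    ("A", 1): {"E1", "E2"},
    ("B", 2): {"E2"},
    ("C", 3): {"E2"},
    ("C", 4): {"E2"},
}
S = [("A", 1), ("B", 2)]  # keys of the assignment S: A = 1 & B = 2

lists = [index.get(key, set()) for key in S]
intersection = set.intersection(*lists)
union = set.union(*lists)
print(sorted(intersection))  # ['E2']        wrong: E2 also needs C
print(sorted(union))         # ['E1', 'E2']  wrong: E2 is included
```

Neither result equals the correct answer {E1}: whether a BE matches depends on how many of its own predicates are covered by S, which plain set operations on posting lists cannot capture.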
Inverted index: the “conjunction” case
- Entries are partitioned based on the number K of conjuncts in each BE
- The partition of the inverted index storing the BE’s with K conjuncts is called the “K-index”

BE’s (conjunctions):
| ID | BE                                      | K |
|----|-----------------------------------------|---|
| C1 | age ∈ {3} & state ∈ {NY}                | 2 |
| C2 | age ∈ {3} & gender ∈ {F}                | 2 |
| C3 | age ∈ {3} & gender ∈ {M} & state ∉ {CA} | 2 |
| C4 | state ∈ {CA} & gender ∈ {M}             | 2 |
| C5 | age ∈ {3,4}                             | 1 |
| C6 | state ∉ {CA,NY}                         | 0 |

Inverted index:
| K | Key        | Posting list           |
|---|------------|------------------------|
| 0 | (state,CA) | (C6,∉)                 |
|   | (state,NY) | (C6,∉)                 |
|   | Z          | (C6,∈)                 |
| 1 | (age,3)    | (C5,∈)                 |
|   | (age,4)    | (C5,∈)                 |
| 2 | (state,NY) | (C1,∈)                 |
|   | (age,3)    | (C1,∈), (C2,∈), (C3,∈) |
|   | (gender,F) | (C2,∈)                 |
|   | (state,CA) | (C3,∉), (C4,∈)         |
|   | (gender,M) | (C3,∈), (C4,∈)         |

- The “Z key” is used to handle the case K = 0 (notice that ∉ predicates do not contribute to the value of K)
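Building the K-indexes of the example can be sketched as follows. The predicate encoding, the function name `build_k_index`, and the representation of Z as the plain string "Z" are assumptions of this sketch; each posting-list entry is a pair (conjunction ID, is_in), where is_in = False stands for ∉.

```python
from collections import defaultdict

# Conjunctions C1..C6 of the example; assumed encoding of a predicate:
# (attribute, op, values) with op in {"in", "not_in"}
CONJUNCTIONS = {
    "C1": [("age", "in", {3}), ("state", "in", {"NY"})],
    "C2": [("age", "in", {3}), ("gender", "in", {"F"})],
    "C3": [("age", "in", {3}), ("gender", "in", {"M"}), ("state", "not_in", {"CA"})],
    "C4": [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    "C5": [("age", "in", {3, 4})],
    "C6": [("state", "not_in", {"CA", "NY"})],
}
Z = "Z"  # special key whose posting list holds every conjunction with K = 0


def build_k_index(conjunctions):
    """Return {K: {key: [(conjunction_id, is_in), ...]}}."""
    k_index = defaultdict(lambda: defaultdict(list))
    for cid, preds in conjunctions.items():
        # Only ∈ predicates contribute to K
        k = sum(1 for _, op, _ in preds if op == "in")
        for attr, op, values in preds:
            # One key (attr, v) for each value listed in the predicate
            for v in sorted(values, key=str):
                k_index[k][(attr, v)].append((cid, op == "in"))
        if k == 0:
            k_index[0][Z].append((cid, True))
    return k_index


idx = build_k_index(CONJUNCTIONS)
print(idx[2][("state", "CA")])  # [('C3', False), ('C4', True)]
print(idx[0][Z])                # [('C6', True)]
```

Because the conjunctions are inserted in ID order, every posting list comes out sorted by conjunction ID, as in the table.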


The “Conjunction algorithm”: basic ideas

- Given an assignment S with t keys, two basic conditions are used to check whether a conjunction C matches S:
  1. For a K-index with K ≤ t, a conjunction C matches S only if there are K posting lists such that each list refers to a key (A,v) in S and (C,∈) is in the posting list
  2. No key (A,v) in S has a posting list in which (C,∉) appears
- Example:
  - C1: (age ∈ {3} & gender ∈ {M}) matches S: age = 3 & gender = M & state = CA
  - C2: (age ∈ {3} & gender ∈ {M} & state ∉ {CA}) does not match S, since the posting list of the key (state,CA) includes the entry (C2,∉)

- The Conjunction algorithm iterates through the K-indexes, checking that the above conditions are satisfied
- Further, it skips K-indexes with K > t altogether, since a conjunction with more than t ∈ predicates would need more matching keys than S contains
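The two conditions can be checked for a single conjunction as sketched below, using the C1/C2 example above. The function `matches` and the (conjunction ID, is_in) entry format are hypothetical; the sketch also assumes, as in all the examples, that S gives at most one value per attribute and that K ≥ 1.

```python
def matches(cid: str, k: int, posting_lists: list[list[tuple[str, bool]]]) -> bool:
    """Check conditions 1 and 2 for conjunction `cid`, which has K = k >= 1.

    `posting_lists` are the lists, taken from the k-index, of the keys in S;
    each entry is (conjunction_id, is_in). Assumes S gives at most one value
    per attribute, so at most k of these lists can contain (cid, ∈).
    """
    # Condition 1: (cid, ∈) appears in k posting lists of keys of S
    in_count = sum(1 for plist in posting_lists if (cid, True) in plist)
    # Condition 2: (cid, ∉) appears in no posting list of a key of S
    excluded = any((cid, False) in plist for plist in posting_lists)
    return in_count == k and not excluded


# 2-index restricted to the keys of S: age = 3 & gender = M & state = CA
# C1: age ∈ {3} & gender ∈ {M};  C2: age ∈ {3} & gender ∈ {M} & state ∉ {CA}
lists = [
    [("C1", True), ("C2", True)],  # (age,3)
    [("C1", True), ("C2", True)],  # (gender,M)
    [("C2", False)],               # (state,CA)
]
print(matches("C1", 2, lists))  # True
print(matches("C2", 2, lists))  # False: (C2,∉) is in the list of (state,CA)
```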

The “Conjunction algorithm”: example
S: age = 3 & state = CA & gender = M (conjunctions C1, …, C6 as in the table of the “conjunction” case above)

- First, all the relevant posting lists (those of the keys in S, plus Z) are obtained, one K-index at a time:

| K | Key        | Posting list           |
|---|------------|------------------------|
| 0 | (state,CA) | (C6,∉)                 |
|   | Z          | (C6,∈)                 |
| 1 | (age,3)    | (C5,∈)                 |
| 2 | (age,3)    | (C1,∈), (C2,∈), (C3,∈) |
|   | (state,CA) | (C3,∉), (C4,∈)         |
|   | (gender,M) | (C3,∈), (C4,∈)         |

- For K = 2 it is recognized that neither C1 nor C2 can be satisfied by S (each appears in only one of the retrieved lists, whereas two are needed)
- Although C3 satisfies condition 1, it violates condition 2
- C4 satisfies both conditions
- The same holds for C5 (K = 1)
- C6 violates condition 2

Result: {C4, C5}
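The whole example can be reproduced with the sketch below. It is a simplified, counting-based variant written for illustration: it is not the posting-list traversal of [WBS+09], and the predicate encoding, the names `build_k_index` and `conjunction_algorithm`, and the string "Z" for the Z key are assumptions. It also assumes that S gives at most one value per attribute.

```python
from collections import defaultdict

# Conjunctions C1..C6 of the example: (attribute, op, values), op in {"in", "not_in"}
CONJUNCTIONS = {
    "C1": [("age", "in", {3}), ("state", "in", {"NY"})],
    "C2": [("age", "in", {3}), ("gender", "in", {"F"})],
    "C3": [("age", "in", {3}), ("gender", "in", {"M"}), ("state", "not_in", {"CA"})],
    "C4": [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    "C5": [("age", "in", {3, 4})],
    "C6": [("state", "not_in", {"CA", "NY"})],
}
Z = "Z"  # special key for conjunctions with K = 0


def build_k_index(conjunctions):
    """Return {K: {key: [(conjunction_id, is_in), ...]}}; only ∈ predicates count for K."""
    k_index = defaultdict(lambda: defaultdict(list))
    for cid, preds in conjunctions.items():
        k = sum(1 for _, op, _ in preds if op == "in")
        for attr, op, values in preds:
            for v in values:
                k_index[k][(attr, v)].append((cid, op == "in"))
        if k == 0:
            k_index[0][Z].append((cid, True))
    return k_index


def conjunction_algorithm(k_index, assignment):
    """Return the IDs of the conjunctions matched by `assignment` (dict attribute -> value)."""
    keys = list(assignment.items())
    t = len(keys)
    result = set()
    for k, index in k_index.items():
        if k > t:
            continue  # such conjunctions cannot match S
        lists = [index.get(key, []) for key in keys]
        if k == 0:
            lists.append(index.get(Z, []))  # Z makes K = 0 conjunctions visible
        in_count = defaultdict(int)  # number of retrieved lists containing (C,∈)
        excluded = set()             # conjunctions with a (C,∉) entry: condition 2 fails
        for plist in lists:
            for cid, is_in in plist:
                if is_in:
                    in_count[cid] += 1
                else:
                    excluded.add(cid)
        needed = max(k, 1)  # for K = 0 the Z list is the single required list
        result |= {c for c, n in in_count.items() if n >= needed and c not in excluded}
    return result


S = {"age": 3, "state": "CA", "gender": "M"}
print(sorted(conjunction_algorithm(build_k_index(CONJUNCTIONS), S)))  # ['C4', 'C5']
```

Counting keeps the sketch short; a real implementation would exploit the sorted posting lists to skip entries instead of scanning them completely.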

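The DNF case

- To process BE’s in DNF it is sufficient to observe that a BE E is satisfied by an assignment S iff at least one of its conjunctions of predicates is satisfied by S: each conjunction can thus be indexed and matched as seen so far, and E is returned if at least one of its conjunctions matches

- Example:
  (state ∈ {CA} & gender ∈ {M}) | (state ∈ {NY} & gender ∈ {F})
  is satisfied by
  S: age = 3 & state = CA & gender = M
  (its first conjunction is satisfied)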
