Information Systems M
                                     IR                                               IF
Goal                                 Selecting relevant items (docs) for each query   Filtering out the many irrelevant data items
Type of use                          Ad-hoc use                                       Repetitive use
Type of users                        One-time users                                   Long-term users
Representation of information needs  Queries                                          User profiles
Index                                Items                                            User profiles
Applications of IF
IF techniques find applications in a variety of scenarios, including:
Automatic delivery of news/alerts
Online display advertising
Publish/subscribe systems
…
Models of IF
Due to its similarity with IR, it is no surprise that the most common
approaches to IF are based on the Boolean and the Vector Space models
However, a more detailed and structured description of the user profile is
now needed in order to improve the effectiveness of matching
In the following we sketch a recent approach based on the Boolean model;
examples of use of the Vector Space Model (VSM) will be given in the
context of recommender systems
Matching and indexing of Boolean expressions
Reference: [WBS+09]
Scenario:
A (profiled) user visiting a web site (also called an “assignment”)
Many advertisement campaigns managed by the site
Both specified using Boolean expressions (BE’s) over a multi-attribute
space
Alternatively (pub/sub system):
An incoming item
Many stored user profiles
One “assignment” to be efficiently matched against many stored BE’s
[Figure: Assignment → index of BE’s → Matched BE’s]
BE model
Two types of Boolean predicates: ∈ and ∉
E.g.: state ∈ {CA,NY}, state ∉ {NY}
Ranges of values are converted into ∈ and ∉ predicates
age < 30 converted into age ∈ {0,1,2} (0 = [0,9], 1 = [10,19], …)
A BE is in either disjunctive (DNF) or conjunctive (CNF) normal form, e.g.:
(state ∈ {CA,NY} & age ∈ {1,2}) | (state ∉ {NY} & gender ∈ {F})
& = AND; | = OR
In the following we only discuss the DNF case
BE matching
A BE E is satisfied by an assignment S if S makes E true
S: state = CA & gender = F
E1: state ∈ {CA,NY} satisfied
E2: state ∈ {CA,NY} & gender ∈ {M} not satisfied
Since an assignment need not specify a value for every attribute, the
semantics of matching needs to be refined
Is (state ∈ {NY} & gender ∈ {F}) satisfied by gender = F? NO
Is (state ∉ {NY} & gender ∈ {F}) satisfied by gender = F? MAYBE…
The problem is that neither intersection nor union of posting lists works here:
- Intersection: E2
- Union: E1 and E2
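The matching semantics above can be made concrete with a small Python sketch. It is not the method of [WBS+09]: the function name `satisfies` and the flag `missing_satisfies_not_in` are hypothetical, and the flag is there only to make the "MAYBE" case explicit, since the outcome of a ∉ predicate on an unspecified attribute depends on the chosen semantics.

```python
def satisfies(conjunction, assignment, missing_satisfies_not_in):
    """Return True if `assignment` satisfies every predicate of `conjunction`.

    conjunction: list of (attribute, op, values), op in {"in", "not_in"}.
    assignment:  dict attribute -> value; attributes may be absent.
    missing_satisfies_not_in: policy for a ∉ predicate whose attribute
        is not specified by the assignment (an assumption of this sketch).
    """
    for attr, op, values in conjunction:
        if attr not in assignment:
            # An ∈ predicate on an unspecified attribute cannot hold.
            if op == "in" or not missing_satisfies_not_in:
                return False
            continue
        present = assignment[attr] in values
        if (op == "in") != present:
            return False
    return True

S = {"state": "CA", "gender": "F"}
E1 = [("state", "in", {"CA", "NY"})]
E2 = [("state", "in", {"CA", "NY"}), ("gender", "in", {"M"})]
print(satisfies(E1, S, True))  # satisfied
print(satisfies(E2, S, True))  # not satisfied

partial = {"gender": "F"}  # no value for state
Q_in = [("state", "in", {"NY"}), ("gender", "in", {"F"})]
Q_not_in = [("state", "not_in", {"NY"}), ("gender", "in", {"F"})]
print(satisfies(Q_in, partial, True))       # NO under either policy
print(satisfies(Q_not_in, partial, True))   # depends on the policy
print(satisfies(Q_not_in, partial, False))
```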
Information Filtering Sistemi Informativi M 8
Inverted index: the “conjunction” case
Entries are partitioned based on the number of conjuncts K in each BE
The partition of the inverted index storing information of BE’s with K
conjuncts is called the “K-index”
BE’s (conjunctions)
ID   BE                                          K
C1   age ∈ {3} & state ∈ {NY}                    2
C2   age ∈ {3} & gender ∈ {F}                    2
C3   age ∈ {3} & gender ∈ {M} & state ∉ {CA}     2
C4   state ∈ {CA} & gender ∈ {M}                 2
C5   age ∈ {3,4}                                 1
C6   state ∉ {CA,NY}                             0

Inverted Index
K   Key          Posting list
0   (state,CA)   (C6,∉)
    (state,NY)   (C6,∉)
    Z            (C6,∈)
1   (age,3)      (C5,∈)
    (age,4)      (C5,∈)
2   (state,NY)   (C1,∈)
    (age,3)      (C1,∈), (C2,∈), (C3,∈)
    (gender,F)   (C2,∈)
    (state,CA)   (C3,∉), (C4,∈)
    (gender,M)   (C3,∈), (C4,∈)

The “Z key” is used to handle the case K = 0 (notice that ∉ predicates do
not contribute to the value of K)
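A minimal Python sketch of how such a partitioned index could be built from the six conjunctions is given here. The data layout (a dict from K to a dict from key to posting list) and the names `CONJUNCTIONS`, `Z` and `build_index` are assumptions of this sketch; [WBS+09] may organize and order the posting lists differently.

```python
from collections import defaultdict

# The conjunctions of the example; a predicate is (attribute, op, values).
CONJUNCTIONS = {
    "C1": [("age", "in", {3}), ("state", "in", {"NY"})],
    "C2": [("age", "in", {3}), ("gender", "in", {"F"})],
    "C3": [("age", "in", {3}), ("gender", "in", {"M"}),
           ("state", "not_in", {"CA"})],
    "C4": [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    "C5": [("age", "in", {3, 4})],
    "C6": [("state", "not_in", {"CA", "NY"})],
}

Z = "Z"  # special key under which every K = 0 conjunction is posted

def build_index(conjunctions):
    """Return {K: {key: [(conjunction_id, op), ...]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for cid, preds in conjunctions.items():
        # K counts only the ∈ predicates; ∉ predicates are ignored.
        k = sum(1 for _, op, _ in preds if op == "in")
        for attr, op, values in preds:
            for value in sorted(values):
                index[k][(attr, value)].append((cid, op))
        if k == 0:
            index[k][Z].append((cid, "in"))
    return index

index = build_index(CONJUNCTIONS)
for k in sorted(index):
    for key, postings in index[k].items():
        print(k, key, postings)
```

Each (attribute, value) pair listed in a predicate gets its own key, so a predicate such as age ∈ {3,4} contributes two postings for the same conjunction.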
The “Conjunction algorithm”: example
S: age = 3 & state = CA & gender = M
First, all the relevant posting lists are obtained (one K-index at a time):

K   Key          Posting list
0   (state,CA)   (C6,∉)
    Z            (C6,∈)
1   (age,3)      (C5,∈)
2   (age,3)      (C1,∈), (C2,∈), (C3,∈)
    (state,CA)   (C3,∉), (C4,∈)
    (gender,M)   (C3,∈), (C4,∈)

For K=2 it is recognized that neither C1 nor C2 can be satisfied by S
Although C3 satisfies condition 1, it violates condition 2
C4 satisfies both conditions
The same holds for C5 (K=1)
C6 violates condition 2
Result: {C4,C5}

BE’s (conjunctions)
ID   BE                                          K
C1   age ∈ {3} & state ∈ {NY}                    2
C2   age ∈ {3} & gender ∈ {F}                    2
C3   age ∈ {3} & gender ∈ {M} & state ∉ {CA}     2
C4   state ∈ {CA} & gender ∈ {M}                 2
C5   age ∈ {3,4}                                 1
C6   state ∉ {CA,NY}                             0
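The walk-through can be reproduced with a simplified counting procedure in Python. This sketch is not the Conjunction algorithm of [WBS+09] itself, which may traverse the posting lists differently. It also reads the two conditions in the way the example suggests, which is an assumption: condition 1 is taken to mean that a conjunction occurs with ∈ in at least K of the retrieved posting lists, and condition 2 that it occurs with ∉ in none of them. The names `INDEX`, `Z` and `match_conjunctions` are hypothetical.

```python
from collections import Counter

Z = "Z"  # special key for conjunctions with K = 0

# The inverted index of the example, written out by hand.
INDEX = {
    0: {("state", "CA"): [("C6", "not_in")],
        ("state", "NY"): [("C6", "not_in")],
        Z: [("C6", "in")]},
    1: {("age", 3): [("C5", "in")],
        ("age", 4): [("C5", "in")]},
    2: {("state", "NY"): [("C1", "in")],
        ("age", 3): [("C1", "in"), ("C2", "in"), ("C3", "in")],
        ("gender", "F"): [("C2", "in")],
        ("state", "CA"): [("C3", "not_in"), ("C4", "in")],
        ("gender", "M"): [("C3", "in"), ("C4", "in")]},
}

def match_conjunctions(index, assignment):
    """Return the ids of the conjunctions satisfied by `assignment`."""
    matched = set()
    for k, k_index in index.items():
        # One key per (attribute, value) pair, plus Z in the 0-index.
        keys = list(assignment.items())
        if k == 0:
            keys.append(Z)
        in_hits = Counter()   # number of ∈ postings per conjunction
        rejected = set()      # conjunctions hit by a ∉ posting
        for key in keys:
            for cid, op in k_index.get(key, []):
                if op == "in":
                    in_hits[cid] += 1
                else:
                    rejected.add(cid)
        # Assumed condition 1: at least K ∈ hits; condition 2: no ∉ hit.
        matched.update(cid for cid, n in in_hits.items()
                       if n >= k and cid not in rejected)
    return matched

S = {"age": 3, "state": "CA", "gender": "M"}
print(sorted(match_conjunctions(INDEX, S)))  # ['C4', 'C5']
```

Under this reading, C1 and C2 fail condition 1 (one ∈ hit each, with K = 2), C3 and C6 fail condition 2, and C4 and C5 pass both, which agrees with the result {C4,C5}.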
Example:
(state ∈ {CA} & gender ∈ {M}) | (state ∈ {NY} & gender ∈ {F})
is satisfied by
S: age =3 & state = CA & gender = M
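The example can be checked with a direct evaluation in Python: a DNF expression holds as soon as one of its conjunctions holds. This is a plain evaluator and not an index-based algorithm. The names `conjunction_holds` and `dnf_holds` are hypothetical, and a ∉ predicate on an unspecified attribute is treated as satisfied, which is an assumption of this sketch.

```python
def conjunction_holds(conjunction, assignment):
    """True if every (attribute, op, values) predicate holds.

    Assumption: an attribute missing from `assignment` fails an ∈
    predicate and satisfies a ∉ predicate.
    """
    for attr, op, values in conjunction:
        hit = assignment.get(attr) in values
        if hit != (op == "in"):
            return False
    return True

def dnf_holds(dnf, assignment):
    """A DNF (list of conjunctions) holds if any conjunction holds."""
    return any(conjunction_holds(c, assignment) for c in dnf)

# (state ∈ {CA} & gender ∈ {M}) | (state ∈ {NY} & gender ∈ {F})
E = [
    [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    [("state", "in", {"NY"}), ("gender", "in", {"F"})],
]
S = {"age": 3, "state": "CA", "gender": "M"}
result = dnf_holds(E, S)
print(result)  # True: the first conjunction is satisfied
```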