Relevance and Ranking in Online Dating Systems

Fernando Diaz (1), Donald Metzler (2), Sihem Amer-Yahia (1)

(1) Yahoo! Labs
(2) USC Information Sciences Institute

July 20, 2010

1 / 48

Problem Definition

U       set of users
d_u     user description
q_u     user query
R_uv    match relevance

Task: For each user, u, rank all other users, v ∈ U − {u}, such that relevant matches occur above non-relevant candidates.


Dating System

• U: set of users
• set of candidates given u
• set of candidates u is interested in
• set of candidates also interested in u

Match-Making Systems

• dating sites
• employment sites
• community question answering
• paper reviewing
• consumer-to-consumer

Outline
• Related Work
• Ranking for Match-Making Systems
  • Relevance
  • Ranking Features
  • Ranking Function
• Methods and Materials
• Results
• Conclusion

Related Work
• Quantitative analysis of dating
  • Demographic factors influencing preferences in online dating systems [Hitsch et al. 2005].
  • Linguistic analysis of speed dating [Ranganath et al. 2009].
• Ranking for databases
  • Relevance ranking [Chaudhuri et al. 2004, Lavrenko et al. 2007]
  • Workload-based feedback [Wu et al. 2000]
• Person search
  • Expert search [Balog 2008]
  • Search in a social network [Kleinberg 2000]
  • Reviewer matching [Karimzadehgan et al. 2008]

Stable Marriage Problem
[Gale and Shapley 1962]

• input: each user ranks all other users
• output: 1-1 matching of users
• assumes single recommendation, not a ranking

Ranking for Match-Making Systems


Ranking for Match-Making Systems
Probability Ranking Principle

Task: For each querier, u, rank each candidate, v ∈ U − {u}, in decreasing order of P(R|u, v).
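The ranking task can be sketched in a few lines of Python. This is only an illustration: `rank_candidates`, the `p_rel` callable, and the toy scores are hypothetical names and values, standing in for whatever model estimates P(R|u, v).

```python
from typing import Callable, Dict, List, Tuple

def rank_candidates(u: str, users: List[str],
                    p_rel: Callable[[str, str], float]) -> List[str]:
    """Rank every v in U - {u} by decreasing estimated P(R|u, v)."""
    candidates = [v for v in users if v != u]
    return sorted(candidates, key=lambda v: p_rel(u, v), reverse=True)

# Hypothetical relevance estimates for a three-candidate example.
scores: Dict[Tuple[str, str], float] = {
    ("u", "a"): 0.2, ("u", "b"): 0.9, ("u", "c"): 0.5,
}
ranking = rank_candidates("u", ["u", "a", "b", "c"],
                          lambda u, v: scores[(u, v)])
print(ranking)
```

The querier is excluded from their own candidate list, matching the U − {u} in the task statement.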

Ranking for Match-Making Systems

• Relevance: measuring match relevance.
• Ranking features: signals used to determine match relevance.
• Ranking function: combining ranking features to model match relevance.

Ranking for Match-Making Systems
Relevance


Traditional Relevance
[Figure: user → document]

One-sided relevance in information retrieval measures the satisfaction of the user with respect to a document.

Relevance for Match-Making Systems
[Figure: user ↔ user]

Two-sided relevance measures the mutual satisfaction of two users in a match.

Issues with One-Sided Relevance in a Two-Sided Domain
Unreciprocated Contact
P@5 of the same user ranking under each notion of relevance:

             P@5
one-sided    0.60
two-sided    0.20
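A small worked example shows how one ranking can yield both figures. The ranking and the two interest sets below are hypothetical, chosen only so that three of the top five interest the user but a single candidate reciprocates.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked candidates that are in `relevant`."""
    return sum(1 for v in ranking[:k] if v in relevant) / k

ranking = ["a", "b", "c", "d", "e"]   # hypothetical top-5 shown to user u
liked_by_u = {"a", "b", "d"}          # candidates u is interested in
likes_u = {"b", "e"}                  # candidates interested in u
mutual = liked_by_u & likes_u         # two-sided relevance requires both

one_sided = precision_at_k(ranking, liked_by_u, 5)
two_sided = precision_at_k(ranking, mutual, 5)
print(one_sided, two_sided)
```

Under one-sided relevance the ranking looks good; counting only mutual interest, most of the contacts it encourages would go unreciprocated.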

Issues with One-Sided Relevance in a Two-Sided Domain
Undesired Attention

A user may be contacted by irrelevant individuals.

Two-Sided Relevance Is Very Subjective

• Human
  • Relevance is very subjective given only pre-introduction information (i.e. queries and profiles).
  • Possible with post-introduction forms, questionnaires [Ranganath et al. 2009].
• Automatic
  • Possible with longitudinal studies [Gottman et al. 2002].

editorially labeling match relevance is very, very hard.

(relevance is in the eye of the beholder.)


can we use implicit feedback signals in the production system to infer relevance?

(relevance is in the eye of the beholder.)


Post-Presentation Signals
One-Way Relevance

P(R|q, d) ∝ f(clicks)

In traditional search, one-way user engagements (e.g. clicks) can be used to predict relevance [Joachims 2002, …].

Post-Presentation Signals
Two-Way Relevance

P(R|u, v) ∝ f(number of messages)

In match-making search, two-way user engagements can be used to predict relevance.

• profile views (e.g. ‘number of times users viewed each other’s profile’)
• message exchanges (e.g. ‘number of times users exchanged messages’)
• duration of interactions (e.g. ‘period of time over which users interacted’)

Modeling Relevance with Post-Presentation Signals

1. Pseudo-labeled relevance: pick high-precision signals to infer relevance for a subset of matches.
   • ‘exchanged 100 messages’ → relevant
   • ‘no reply to message’ → non-relevant
2. Predicted relevance: model pseudo-labeled relevance as a function of unlabeled signals.
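The first step can be sketched as a pair of high-precision rules. The threshold mirrors the ‘exchanged 100 messages’ example; the field names (`messages_exchanged`, `contacted`, `replied`) and the toy matches are hypothetical, and the rules used in the production system may well differ.

```python
from typing import Optional

MESSAGE_THRESHOLD = 100  # mirrors the 'exchanged 100 messages' example

def pseudo_label(messages_exchanged: int, contacted: bool,
                 replied: bool) -> Optional[int]:
    """Return 1 (relevant), 0 (non-relevant), or None if no rule fires."""
    if messages_exchanged >= MESSAGE_THRESHOLD:
        return 1
    if contacted and not replied:
        return 0
    return None  # left unlabeled, to be scored by a predicted-relevance model

# Hypothetical interaction logs for three matches.
matches = [
    {"pair": ("u1", "v1"), "messages_exchanged": 140, "contacted": True, "replied": True},
    {"pair": ("u1", "v2"), "messages_exchanged": 1, "contacted": True, "replied": False},
    {"pair": ("u2", "v3"), "messages_exchanged": 6, "contacted": True, "replied": True},
]
labels = {m["pair"]: pseudo_label(m["messages_exchanged"], m["contacted"], m["replied"])
          for m in matches}
labeled = {pair: y for pair, y in labels.items() if y is not None}
unlabeled = [pair for pair, y in labels.items() if y is None]
print(labeled, unlabeled)
```

Only the `labeled` subset would serve as training targets in the second step; matches in `unlabeled` receive a score from the model trained on that subset.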

Modeling Relevance with Post-Presentation Signals

                  P(R|u, v)            coverage
pseudo-labeled    {0, 1}               low
predicted         f(x_post; Θ_post)    medium
predicted         f(x_pre; Θ_pre)      high

Ranking for Match-Making Systems
Ranking Features


Pre-Presentation Signals
One-Way Relevance

P(R|q, d) ∝ f(query term matches, popularity)

In traditional search, a variety of static and query-dependent features can be used to predict relevance.

Pre-Presentation Signals
Two-Way Relevance

P(R|q, d) ∝ f(query matches, income)

In match-making search, a similar variety of static and query-dependent features can be used to predict relevance.

Query and Description Attributes
Dating Dataset
scalar:       age, height, income, num children, num photos
categorical:  body type, city, country, desires more children, drinking, education, employment, ethnicity, eye color, featured profile, gender, hair color, humor style, interests, languages, living situation, marital status, new user, occupation, personality type, political bent, religion, religious activity, romantic style, sexuality, smoking, social style, star sign, state, subscription status, television viewer, zip code
text:         description

* All attributes except ‘description’ are queriable.

Querying Attributes
Profile
[Figure: querier and candidate, each with a query and a profile]

1. desired attribute value
2. importance (‘must match’, ‘nice to match’, ‘any match’)

one feature for each attribute

Ranking Features
Profile
[Figure: querier and candidate, each with a query q and a profile d]

• $\overrightarrow{d}$: features of the querier (candidate independent)
• $\overleftarrow{d}$: features of the candidate (query independent)

one feature for each attribute

Ranking Features
Match
[Figure: querier and candidate, each with a query q and a profile d]

• $\overrightarrow{q}$: candidate attributes matching query
• $\overleftarrow{q}$: querier attributes matching candidate’s query

one feature for each attribute
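A minimal sketch of the two match-feature directions, assuming a query is a mapping from attribute name to a (desired value, importance) pair and that each feature is a binary indicator. Both the encoding and the example values are assumptions; the slides do not say how importance enters the feature, so it is carried along but ignored here.

```python
from typing import Dict, Tuple

# A query maps attribute name -> (desired value, importance).
Query = Dict[str, Tuple[str, str]]
Profile = Dict[str, str]

def match_features(query: Query, profile: Profile) -> Dict[str, int]:
    """One binary feature per queried attribute: 1 if the profile has the desired value."""
    return {attr: int(profile.get(attr) == desired)
            for attr, (desired, _importance) in query.items()}

# Hypothetical queries and profiles.
querier_query: Query = {"smoking": ("no", "must match"),
                        "city": ("Geneva", "nice to match")}
candidate_query: Query = {"education": ("graduate", "nice to match")}
querier_profile: Profile = {"smoking": "no", "city": "Geneva", "education": "college"}
candidate_profile: Profile = {"smoking": "no", "city": "Lausanne", "education": "graduate"}

q_right = match_features(querier_query, candidate_profile)  # candidate vs. querier's query
q_left = match_features(candidate_query, querier_profile)   # querier vs. candidate's query
print(q_right, q_left)
```

The same function serves both directions; only the roles of query and profile are swapped.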

Ranking Features
Similarity
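[Figure: querier and candidate, with similarity δ computed between the two profiles]

scalar features:  $\delta_i = |\overrightarrow{d}_i - \overleftarrow{d}_i|$
binary features:  $\delta_i = \overrightarrow{d}_i \oplus \overleftarrow{d}_i$
text features:    $\delta_{\text{text}} = \langle t_u, t_v \rangle$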
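The three similarity features can be sketched directly. The function names are illustrative, and the text feature uses raw term counts over whitespace tokens, which is an assumption: the slides give only the inner product, not the term weighting.

```python
from collections import Counter

def scalar_delta(a: float, b: float) -> float:
    """Absolute difference between two scalar attribute values."""
    return abs(a - b)

def binary_delta(a: bool, b: bool) -> int:
    """Exclusive or: 1 when the two binary attributes disagree."""
    return int(a != b)

def text_delta(text_u: str, text_v: str) -> float:
    """Inner product of raw term-count vectors (weighting is an assumption)."""
    tu = Counter(text_u.lower().split())
    tv = Counter(text_v.lower().split())
    return float(sum(tu[w] * tv[w] for w in tu if w in tv))

# Hypothetical attribute values for a querier and a candidate.
age_delta = scalar_delta(31, 27)
featured_delta = binary_delta(True, False)
description_delta = text_delta("I love hiking and jazz", "jazz and hiking fan")
print(age_delta, featured_delta, description_delta)
```

Note that the scalar and binary features measure disagreement (larger means less similar), while the text inner product grows with overlap.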

Ranking Features
Summary

• Profile: match-independent attributes of the querier and candidate.
• Match: match-dependent attributes of the query and profile.
• Similarity: match-dependent similarity between profiles.

Ranking for Match-Making Systems
Ranking Function


Pre-Presentation Signals
Two-Way Relevance

P(R|q, d) ∝ f(profile, match, similarity)

The functional form of f will be a boosted decision tree.

Modeling Relevance
Gradient Boosted Decision Trees

$f(x_{\text{pre}}; \Theta_{\text{pre}}) = T(x; \Theta_0) + T(x; \Theta_1) + \dots + T(x; \Theta_m)$
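The additive form can be illustrated with a very small gradient boosting loop. This is a sketch under stated assumptions, not the system's implementation: it uses squared-error loss, depth-1 trees (stumps), and a fixed shrinkage rate, and the names `fit_stump`, `fit_gbdt`, `n_trees`, and `lr` are illustrative.

```python
from typing import List, Tuple

# (feature index, threshold, left value, right value)
Stump = Tuple[int, float, float, float]

def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the residuals r."""
    best: Stump = (0, float("inf"), sum(r) / len(r), 0.0)  # fallback: predict the mean
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not right:
                continue  # not a real split
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((ri - lv) ** 2 for ri in left)
                   + sum((ri - rv) ** 2 for ri in right))
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best

def predict_stump(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv

def fit_gbdt(X: List[List[float]], y: List[float],
             n_trees: int = 20, lr: float = 0.5) -> List[Stump]:
    """Each new tree fits the residuals of the current sum of trees."""
    trees: List[Stump] = []
    pred = [0.0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        j, t, lv, rv = fit_stump(X, residuals)
        s = (j, t, lr * lv, lr * rv)  # fold shrinkage into the leaf values
        trees.append(s)
        pred = [p + predict_stump(s, x) for p, x in zip(pred, X)]
    return trees

def predict(trees: List[Stump], x: List[float]) -> float:
    """f(x) = T(x; theta_0) + T(x; theta_1) + ... as a plain sum of trees."""
    return sum(predict_stump(s, x) for s in trees)

# Hypothetical one-feature training set.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
trees = fit_gbdt(X, y)
print(predict(trees, [0.0]), predict(trees, [3.0]))
```

A production system would use deeper trees and a tuned number of rounds; the point here is only that the final score is a sum of individually weak trees.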

Methods and Materials


Data

              users    matches
train         7,716    17,538
validation    3,697     7,543
test          4,836    10,636

Gathered in Fall 2009; only users with at least one match with non-zero P(R) are included.

Evaluation
$$P@k = \frac{1}{k} \sum_{v \in R_k} P(R|u, v)$$

$$AP_u = \frac{\sum_{v} \mathrm{Prec}(v)\, P(R|u, v)}{\sum_{v} P(R|u, v)}$$

$$NDCG_k = \sum_{i=1}^{k} \frac{2^{P(R|u, v_i)} - 1}{\log(i + 1)}$$

$$ERR = \sum_{i} \frac{P(R|u, v_i)}{i} \prod_{j=1}^{i-1} \left(1 - P(R|u, v_j)\right)$$

• evaluation performed for pseudo- and predicted labels with probabilistic retrieval metrics.
• statistical significance using paired, one-tailed bootstrap (user sampling proportional to probability of strongest match)
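The probabilistic metrics can be sketched as functions of a single list `p`, holding P(R|u, v_i) for the candidates in ranked order. Two points are assumptions rather than something the slide states: Prec(v) is taken to be P@k at the rank of v, and the DCG sum is divided by its value under the ideal ordering, which is the usual NDCG convention.

```python
import math
from typing import List

def p_at_k(p: List[float], k: int) -> float:
    """P@k: mean relevance probability of the top-k candidates."""
    return sum(p[:k]) / k

def average_precision(p: List[float]) -> float:
    """AP: Prec(v) weighted by P(R|u, v), normalized by total relevance mass."""
    total = sum(p)
    if total == 0:
        return 0.0
    return sum(p_at_k(p, i) * p_i for i, p_i in enumerate(p, start=1)) / total

def dcg_at_k(p: List[float], k: int) -> float:
    return sum((2 ** p_i - 1) / math.log(i + 1)
               for i, p_i in enumerate(p[:k], start=1))

def ndcg_at_k(p: List[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (sorted) ordering; an assumed normalization."""
    ideal = dcg_at_k(sorted(p, reverse=True), k)
    return dcg_at_k(p, k) / ideal if ideal > 0 else 0.0

def err(p: List[float]) -> float:
    """Expected reciprocal rank: the user stops at rank i with probability p_i."""
    score, p_continue = 0.0, 1.0
    for i, p_i in enumerate(p, start=1):
        score += p_continue * p_i / i
        p_continue *= 1.0 - p_i
    return score

# Hypothetical relevance probabilities for a three-candidate ranking.
p = [1.0, 0.0, 0.5]
print(p_at_k(p, 2), average_precision(p), ndcg_at_k(p, 3), err(p))
```

With binary values in `p`, `p_at_k` and `average_precision` reduce to their standard definitions.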

Training

• Gradient-boosted decision tree regression against P(R|u, v).
• Instance-weighting according to inverse class frequency (to address class imbalance).
• Free parameters tuned on validation set.
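Inverse class frequency weighting can be sketched as follows. The normalization N / (number of classes × class count) is one common choice and is an assumption here, as are the function name and the toy labels; the slides do not give the exact formula.

```python
from collections import Counter
from typing import List

def inverse_class_frequency_weights(labels: List[int]) -> List[float]:
    """Weight each instance by N / (num_classes * count(label))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

# Hypothetical pseudo-labels: one relevant match, three non-relevant.
labels = [1, 0, 0, 0]
weights = inverse_class_frequency_weights(labels)
print(weights)
```

With these weights each class contributes the same total weight to the training loss, so the rarer class is not swamped by the more frequent one.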

Runs

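• $\overleftarrow{d}$: query-independent profile attributes
• $\overrightarrow{q}$: one-way profile-query match
• $\delta$: profile similarity
• $\overleftrightarrow{q}$: two-way profile-query match

Combinations: $\delta\,\overleftrightarrow{q}$ and $\overleftrightarrow{d}\,\overleftrightarrow{q}$.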

Results
Match Features

run                                                AP     NDCG1  NDCG5  NDCG10  P@1    P@5    P@10   ERR
$\overleftarrow{d}$                                0.485  0.346  0.576  0.649   0.360  0.326  0.226  0.582
$\delta$                                           0.484  0.366  0.556  0.643   0.380  0.311  0.223  0.577
$\overrightarrow{q}$                               0.428  0.287  0.501  0.598   0.304  0.298  0.221  0.517
$\overleftrightarrow{q}$                           0.454  0.317  0.527  0.619   0.334  0.303  0.219  0.552
$\delta\,\overleftrightarrow{q}$                   0.497  0.380  0.575  0.659   0.395  0.318  0.227  0.595
$\overleftrightarrow{d}\,\overleftrightarrow{q}$   0.494  0.367  0.580  0.656   0.381  0.326  0.226  0.589

Predicted Relevance
Match Features Alone Ineffective

The runs using match features alone, $\overrightarrow{q}$ and $\overleftrightarrow{q}$, score lowest on every metric in the table above.

Predicted Relevance
No Statistical Difference Between Strongest Runs

(Same table as above.)

Are queries useless?

• Yes, we are better off statically ranking users subject to a few distance and gender constraints.
• No, the match representation is bad.
  • Query language or attributes may be poorly represented.
  • Users may not be comfortable enough with the query interface.

Conclusions

• Relevance is complicated and task-dependent
  • two-way
  • highly subjective
  • IR ≠ web search
• Preliminary evidence suggests queries may not be helpful.

Future Work

• Detecting implicit relevance signals for different retrieval domains.
• Designing interfaces to encourage implicit relevance feedback.
• Richer features.
• More effective query languages and interfaces.

Acknowledgments

Sergiy Matusevych, Seena Cherangara, Ramesh Gollapudi, Ramana Lokanathan

Supplemental Slides


Important Features
$\overleftarrow{d}$       $\delta\,\overleftrightarrow{q}$
featured                  text ($\delta$)
age                       distance ($\overleftarrow{q}$)
height                    age ($\delta$)
living arrangement        distance ($\overrightarrow{q}$)
subscription status       subscription status ($\delta$)
religious activity        height ($\delta$)
employment                featured ($\delta$)
num photos                new user ($\delta$)
religion                  gender ($\delta$)
interests                 num photos ($\delta$)
new user                  eye color ($\delta$)
smoking                   max age ($\overrightarrow{q}$)
activity                  religious activity ($\delta$)
more kids                 max age ($\overleftarrow{q}$)
occupation                featured ($\delta$)
