Abstract

In media search, it is often found that, for a query, content without any shared term in its title or concise description can still be highly relevant. Therefore, document retrieval based on word matching alone can miss important content. On the other hand, even content with an exact word match in its document can still be an unsatisfactory result. These observations suggest that query-document matching should be established at the semantic level rather than the term level. In this paper, we introduce recent results of applying the Deep Structured Semantic Model (DSSM) to build similarity features for document retrieval and ranking in media search. To test the new features, we perform experiments on retrieving and ranking media documents from Xbox, with and without the DSSM involved. To obtain a DSSM, we leverage Bing's large web search logs to train a seed DSSM and then tune the model using Xbox's ranker training data. We also train Xbox's rankers with and without the added DSSM similarity features. Our experimental results show that adding the DSSM similarity features significantly improves the results of document retrieval and ranking.

1. Introduction

Applying semantic matching based methods, beyond lexical matching based ones, to search problems is currently an active research direction. Latent semantic models such as latent semantic analysis (LSA) have been widely used to find the relatedness of a query and its relevant documents when term matching fails [1][4][6]. Besides LSA, probabilistic topic models such as probabilistic LSA (PLSA) [6] and Latent Dirichlet Allocation (LDA) [1] have also been proposed for semantic matching. Despite their clear motivation and solid theoretical foundations, these models suffer from the fact that they are usually trained via unsupervised learning, under objective functions that are only loosely connected to the relevance labels. Consequently, the semantic matching scores generated from these models, when used for document retrieval and ranking, have not shown performance as good as originally expected.

On the other hand, one recent semantic matching method, the Deep Structured Semantic Model (DSSM), was proposed by Huang et al. [7] in a supervised learning form that utilizes the relevance labels in model training. With the labels incorporated into its objective function, the DSSM maps a query and a relevant document into a common semantic space in which a similarity measure can be calculated, so that even when there is no shared term between the query and the document, the similarity score can still be positive. In this sense, the DSSM does better at reducing the chance that relevant documents are missed during retrieval. Furthermore, it has been reported in [7][9] that applying the DSSM to web search yields superior single-feature performance; that is, the DSSM similarity score is better than many traditional features, including TF-IDF, BM25, WTM (word translation model), LSA, PLSA, and LDA. One important reason is that the DSSM similarity score contains the label information, so that not only does high relevance without a term match receive a high similarity score, but low relevance with a term match also receives a low similarity score as a result of penalization.

Given the encouraging results reported so far, we aim to apply the DSSM to the media search area to see whether the method can also improve the quality of the retrieval and ranking results in the media domain.

The rest of this paper is organized as follows. We first introduce the formulation of the DSSM. We then introduce the letter tri-gram as the technique for dimension-reduced text representation, the parameter estimation method, the model structure, and how the model is integrated into a media search system. We draw conclusions after presenting experimental results on Xbox's media search data.

2. Model

2.1. Formulation
Let {(q_1, d_1), (q_2, d_2), …, (q_N, d_N)} denote a set of relevant query-document pairs, in which d_i is a document relevant under the query q_i. Assuming that these relevant query-document pairs are independent, the joint probability of observing them is

    ∏_i P(d_i | q_i).    (1)

Each layer of the model computes y_j = f(∑_i w_{i,j} x_i + b_j), where x_i are the inputs to the layer, w_{i,j} and b_j are the layer's weights and biases, and f is the nonlinearity. Note that the last layers for q and d are y_q and y_d, respectively. The structure is illustrated in Figure 1.
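For concreteness, the following Python sketch shows one way to realize the LTG input, the layer computation, and P(d | q) as a softmax over smoothed cosine similarities between y_q and y_d. It is our illustration, not the paper's implementation: the tanh nonlinearity, the smoothing factor gamma, the hash-based LTG indexing, and the shared tower for query and document are all assumptions; the layer sizes are taken from Figure 2 later in the paper.

```python
import numpy as np

def ltg_vector(text, dim=49248):
    """Letter-tri-gram (LTG) bag of features: pad with '#' boundary marks and
    hash each trigram into a fixed-size count vector. Hash-bucket indexing is
    our simplification of a real LTG vocabulary (and Python's salted hash()
    varies across processes; a stable hash would be used in practice)."""
    t = "#" + text.lower() + "#"
    v = np.zeros(dim)
    for i in range(len(t) - 2):
        v[hash(t[i:i + 3]) % dim] += 1.0
    return v

def forward(x, weights, biases):
    """One DSSM tower: y_j = f(sum_i w_ij x_i + b_j) per layer, tanh assumed for f."""
    y = x
    for W, b in zip(weights, biases):
        y = np.tanh(W @ y + b)
    return y

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def p_docs_given_query(y_q, y_docs, gamma=10.0):
    """P(d | q) as a softmax over smoothed similarities: P(d|q) ~ exp(gamma * R(q, d))."""
    sims = np.array([cosine(y_q, y_d) for y_d in y_docs])
    e = np.exp(gamma * (sims - sims.max()))  # max-shift for numerical stability
    return e / e.sum()

# Layer sizes follow Figure 2 later in the paper: 49248 -> 100 -> 100 -> 50.
dims = [49248, 100, 100, 50]
rng = np.random.default_rng(0)
Ws = [rng.normal(0.0, 0.01, (dims[i + 1], dims[i])) for i in range(3)]
bs = [np.zeros(dims[i + 1]) for i in range(3)]

# One shared tower here for brevity; the query and document nets may differ.
y_q = forward(ltg_vector("star wars"), Ws, bs)
y_docs = [forward(ltg_vector(t), Ws, bs) for t in
          ("star wars: a new hope", "cooking with stars")]
print(p_docs_given_query(y_q, y_docs))
```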
The model parameters Θ are updated by gradient descent:

    Θ_t = Θ_{t−1} − ε_t ∇_Θ L(Θ)|_{Θ=Θ_{t−1}},    (7)

where ε_t > 0 is the learning rate at the t-th iteration. The gradient can be written as

    ∇_Θ L = ∑_q ∑_{d′∈D_q} α_{d′} ∙ ∇_Θ Δ_{d′},    (8)

where D_q denotes the sampled irrelevant documents for query q, Δ_{d′} = R(q, d⁺) − R(q, d′), and

    α_{d′} = −γ exp(−γ Δ_{d′}) / (1 + ∑_{d″∈D_q} exp(−γ Δ_{d″})).

Hence, computing the gradient of the loss function L is reduced to computing the gradient of the similarity function R(q, d). It depends on the structures of the two neural nets that represent the two functions f_q and f_d, respectively. We adopt the same structure, as the following Figure 2 shows.

Figure 2. The structure of the two neural nets: the input x is the LTG (letter-tri-gram) vector of q or d, with 49248 dimensions, followed by two hidden layers of 100 units each; the output layer y (y_q or y_d) has 50 units.

Figure 3. The illustration of the DSSM in search and ranking: the query q and candidate documents d1, d2, ⋯ are mapped to DSSM query and document representations, which supply similarity features to the ranker.
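To make the update in (7) and (8) concrete, the sketch below performs one SGD step in which the loss gradient is assembled as the α-weighted sum of similarity-function gradients. It is a minimal illustration under assumptions: R and grad_R are hypothetical helpers standing in for the similarity function R(q, d) and its gradient with respect to Θ, and γ is the smoothing parameter.

```python
import numpy as np

def sgd_step(theta, eps_t, batch, R, grad_R, gamma=10.0):
    """One update of (7): theta_t = theta_{t-1} - eps_t * grad L.
    R(theta, q, d) and grad_R(theta, q, d) are hypothetical helpers; per (8),
    grad L reduces to an alpha-weighted sum of grad R terms.
    Each batch item is (q, d_pos, d_negs)."""
    grad = np.zeros_like(theta)
    for q, d_pos, d_negs in batch:
        # Margins Delta_j = R(q, d+) - R(q, d_j) for the sampled negatives.
        deltas = np.array([R(theta, q, d_pos) - R(theta, q, d) for d in d_negs])
        e = np.exp(-gamma * deltas)
        # |alpha_j| from (8); the 1.0 is the positive document's own term,
        # for which Delta = 0 and hence exp(-gamma * Delta) = 1.
        alphas = e / (1.0 + e.sum())
        for a_j, d_neg in zip(alphas, d_negs):
            # alpha_j * grad Delta_j = gamma * a_j * (grad R(q, d_j) - grad R(q, d+)).
            grad += gamma * a_j * (grad_R(theta, q, d_neg) - grad_R(theta, q, d_pos))
    return theta - eps_t * grad  # the update of (7)
```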
3. Experiments
We trained three models in our experiments. First, we trained a baseline ranker that does not have the DSSM similarity scores as features. Second, we trained a DSSM and used it to compute R(q, d) for each (q, d)-pair in the ranker's training data. Third, we added R(q, d) in different forms as similarity features to the existing features and trained a ranker that we call the DSSM ranker. Since a media document usually contains a title, a description, an actor list, and a genre, we calculate four similarity scores: R_title, R_description, R_actors, and R_genre. Consequently, these four scores are the similarity features added for the DSSM ranker.
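As an illustration of how the four field-level scores can enter the ranker's feature set, the sketch below computes one similarity per field and appends them to an existing feature vector; the field names and the dssm_embed helper (mapping text through the trained DSSM to its output-layer vector) are hypothetical.

```python
import numpy as np

FIELDS = ("title", "description", "actors", "genre")

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dssm_similarity_features(query, doc, dssm_embed):
    """Compute R_title, R_description, R_actors, R_genre for one (q, d)-pair.
    `doc` maps field name -> text; `dssm_embed` is a hypothetical helper that
    runs text through the trained DSSM and returns its output-layer vector."""
    y_q = dssm_embed(query)
    return [cosine(y_q, dssm_embed(doc[field])) for field in FIELDS]

# Usage: append the four scores to the baseline ranker's feature vector.
# features = baseline_features(query, doc) + dssm_similarity_features(query, doc, dssm_embed)
```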