
Understanding Short Texts∗

Zhongyuan Wang
Microsoft Research
zhy.wang@microsoft.com

Haixun Wang
Facebook Inc.
haixun@fb.com

∗ You can find more materials from our tutorial website:
http://www.wangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/

1 Intended audience

• Researchers in natural language processing, data management, knowledge engineering, text mining, and information retrieval;

• Industrial practitioners in search, ads, semantic query processing, and other knowledge-powered applications;

• Junior researchers and graduate students in text analysis who are potentially interested in large-scale knowledge representation and acquisition, machine learning, and graph algorithms.

2 Introduction

Every day, billions of short texts are produced, including search queries, ad keywords, tags, tweets, conversations in messengers, social network posts, etc. Unlike documents, short texts have unique characteristics that make them difficult to handle:

• First, short texts, especially search queries, do not always observe the syntax of a written language. This means traditional NLP techniques, such as syntactic parsing, do not always apply to short texts with good results.

• Second, short texts contain limited context. The majority of search queries contain fewer than 5 words, and tweets can have no more than 140 characters.

For these reasons, short texts give rise to a significant amount of ambiguity, which makes them extremely difficult to handle. Search engines, on the other hand, seem to be able to handle short texts (queries) quite well. This does not mean, however, that search engines have an organic understanding of short texts; rather, the semantic gap is filled by human click-through signals.

Human beings can understand short texts with ease, even though many of them are ambiguous. How does the mind get so much out of so little, especially when the input data is sparse, noisy, and ambiguous? In 2011, a Science paper titled "How to grow a mind: statistics, structure, and abstraction" (Tenenbaum et al., 2011) pointed out that "If the mind goes beyond the data given, another source of information must make up the difference." Much effort has been devoted to filling this gap.

A growing number of approaches leverage external knowledge to address the inadequate contextual information that accompanies short texts. These approaches can be classified into two categories:

• Explicit Representation Model (ERM). Explicit approaches analyze and model short texts by following the traditional natural language processing steps, including segmentation (Hua et al., 2015), labeling (sense disambiguation) (Wang et al., 2015a; Song et al., 2011; Kim et al., 2013; Hua et al., 2015; Wang et al., 2015b), and syntactic structure (dependency parsing) (Wang et al., 2014b). Take Conceptualization as an example: it maps a short text to concepts defined in a taxonomy or knowledge base. Bayesian rules (Song et al., 2011) and co-occurrence networks (Hua et al., 2015) are leveraged for concept inference, and a holistic model (Wang et al., 2015b) has been proposed that allows all available signals to fully interplay across these subtasks to better understand short texts. Based on the explicit representation, which can be treated as a vector that is both human-understandable and machine-understandable, more advanced functions such as term similarity (Li et al., 2013), short text similarity (Song et al., 2011), and short text classification (Wang et al., 2014a) have been proposed on top of vector similarity measures. These functions can in turn be used in applications such as ads and search relevance (Wang et al., 2015a), query recommendation (Wang et al., 2014a), and web table understanding (Wang et al., 2012). The most important advantage of explicit models is that their representations are easily understood by human beings; therefore, these models can be customized for different use cases. (A toy sketch of Bayesian concept inference appears after this list.)
• Implicit Representation Model (IRM). Implicit approaches, on the other hand, leverage big data and learning-based techniques (especially deep learning) to model latent semantic representations of short texts. Much of this work focuses on mapping texts into a semantic space, known as embedding, ranging from word embedding (Mikolov et al., 2013c; Mikolov et al., 2013a) and phrase embedding (Cho et al., 2014; Socher et al., 2010; Mikolov et al., 2013b; Yu and Dredze, 2015) to sentence embedding (Le and Mikolov, 2014; Palangi et al., 2015; Kiros et al., 2015). The well-known approaches use the surrounding context to predict the central word/phrase/sentence, or vice versa. Usually, implicit approaches are designed for specific scenarios, such as short text conversation (Shang et al., 2015; Sordoni et al., 2015; Vinyals and Le, 2015) and question answering (Severyn and Moschitti, 2015; Qiu and Huang, 2015), where large training data is easier to obtain. Recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and their variants are then widely used to train the models, and the encoder-decoder framework is frequently adopted with them to capture the semantics of texts. (A toy sketch of embedding-based short text similarity also appears after this list.)
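To make the explicit route concrete, the following is a minimal sketch of Bayesian conceptualization in the spirit of Song et al. (2011). The isA counts below are invented toy numbers, not data from an actual knowledge base such as Probase, and a real system would also handle segmentation and far larger vocabularies.

```python
from collections import defaultdict

# Toy (instance, concept) -> frequency counts. In Song et al. (2011) such
# counts come from a probabilistic isA knowledge base; these numbers are
# invented purely for illustration.
ISA_COUNTS = {
    ("python", "programming language"): 900,
    ("python", "snake"): 300,
    ("java", "programming language"): 950,
    ("java", "island"): 120,
    ("cobra", "snake"): 400,
}
VOCAB_SIZE = len({term for term, _ in ISA_COUNTS})

def conceptualize(terms):
    """Rank candidate concepts for a short text with a naive Bayes rule:
    P(c | t1..tn) is proportional to P(c) * product of P(ti | c),
    using add-one smoothing for unseen (term, concept) pairs."""
    concept_totals = defaultdict(int)
    for (term, concept), count in ISA_COUNTS.items():
        concept_totals[concept] += count
    grand_total = sum(concept_totals.values())

    scores = {}
    for concept, c_total in concept_totals.items():
        score = c_total / grand_total  # prior P(c)
        for term in terms:
            count = ISA_COUNTS.get((term, concept), 0)
            score *= (count + 1) / (c_total + VOCAB_SIZE)  # smoothed P(t | c)
        scores[concept] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])

# "python" and "java" are each ambiguous in isolation, but jointly the
# concept "programming language" dominates the ranking.
print(conceptualize(["python", "java"]))
```

The same joint-inference idea, letting the terms of a short text disambiguate one another, underlies the co-occurrence-network and holistic models cited above.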
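As a minimal illustration of the implicit route, the sketch below averages word vectors into a short-text vector and compares texts by cosine similarity. The embeddings here are random placeholders; in a real system they would be learned, e.g., by predicting a word from its surrounding context as in word2vec (Mikolov et al., 2013a).

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50

# Placeholder embedding table. Real vectors would come from a trained
# model (word2vec, RNN/LSTM encoders, etc.); random vectors here only
# demonstrate the plumbing, so the printed similarity is meaningless.
VOCAB = ["cheap", "flights", "to", "paris", "low", "cost", "airfare", "france"]
EMBEDDING = {word: rng.standard_normal(DIM) for word in VOCAB}

def embed(short_text):
    """Implicit representation of a short text: the average of its word
    vectors (out-of-vocabulary words are skipped)."""
    vectors = [EMBEDDING[w] for w in short_text.lower().split() if w in EMBEDDING]
    return np.mean(vectors, axis=0) if vectors else np.zeros(DIM)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Once both queries live in the same semantic space, they can be compared
# even though they share no surface words.
print(cosine(embed("cheap flights to paris"), embed("low cost airfare france")))
```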
The purpose of this tutorial is to survey recent advances on the topic of short text understanding and to discuss fundamental problems, techniques, and open issues in this vibrant area.

3 Tutorial Overview

This tutorial aims at presenting a comprehensive overview of short text understanding based on explicit semantics (knowledge graph representation, acquisition, and reasoning) and implicit semantics (embedding and deep learning). We note that no tutorial on this topic yet exists across NLP, web, IR, or database conferences, and we believe this tutorial is timely both for surveying the field and for educating application developers and aspiring researchers.

3.1 Central theme

The tutorial surveys many applications, including search engines, automatic question answering, online advertising, recommendation systems, etc., that may benefit from short text understanding.

The central theme of the tutorial is representation: in all these applications, the necessary first step is to transform an input text into a machine-interpretable representation, namely to "understand" the short text. We will go over the various techniques in knowledge acquisition, representation, and inferencing that have been proposed for text understanding, and we will describe the massive structured and semi-structured data made available in the recent decade that directly or indirectly encode human knowledge, turning knowledge representation into a computational grand challenge with feasible solutions in sight.

3.2 Tutorial outline

The following is the outline of the tutorial. The total length is about 3 hours.

• Part I. Introduction and foundations (20 min). We will introduce the challenge of short text understanding and its various applications, in order to motivate and inspire the audience about this problem area. This part will also provide a quick overview of the rest of the tutorial.

• Part II. Explicit short text understanding (80 min). We will introduce the popular knowledge base systems currently used for building explicit models. We will then introduce explicit representations such as conceptualization for segmentation, labeling, syntactic structure analysis, and applications.

• Part III. Implicit short text understanding (60 min). We will introduce the major approaches to building word, phrase, and sentence embeddings. We will then introduce how deep neural networks are built on top of these embeddings for short-text applications.

• Conclusion (10 min). We will introduce open research and application challenges.

4 Related Tutorials

Part of this tutorial (learning knowledge bases for text understanding) was presented at ACM Multimedia 2014 & 2015 under the title "Learning knowledge bases for text and multimedia" (Xie and Wang, 2014), which was the most attended tutorial at the conference (by attendee count). The tutorial "Inferencing in Information Extraction: Techniques and Applications" (Barbosa et al., 2015) at ICDE 2015 is also related. Our proposal, however, is the first to comprehensively study short text understanding.

The estimated audience size: 100.
5 Proposer bios

Zhongyuan Wang is a Researcher at Microsoft Research Asia (MSRA). He leads two projects at MSRA: Enterprise Dictionary (knowledge mining from enterprises) and Probase (knowledge mining from the Web). He received his Ph.D. in computer science from Renmin University of China; his Ph.D. thesis is "Short Text Understanding". He has published 20+ papers (including the ICDE 2015 Best Paper Award winner on short text understanding) in leading international conferences such as VLDB, ICDE, IJCAI, and CIKM. He is also a co-author of the book "Web Data Management: Concepts and Techniques", published in 2014. His research interests include knowledge bases, natural language processing, semantic networks, machine learning, and web data mining. Homepage: http://wangzhongyuan.com/en/.

Haixun Wang is a research scientist / engineering manager at Facebook. Before Facebook, he was with Google Research, working on natural language processing. He led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009, serving as Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007 and to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received his Ph.D. in computer science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He served as PC Chair of conferences such as CIKM 2012, and he is on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award at ER 2009. Homepage: http://haixun.olidu.com/.

6 Conference

This proposal is submitted to ACL 2016.

References

Denilson Barbosa, Haixun Wang, and Cong Yu. 2015. Inferencing in information extraction: Techniques and applications. In International Conference on Data Engineering (ICDE).

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.

Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. 2015. Short text understanding through lexical-semantic analysis. In International Conference on Data Engineering (ICDE).

Dongwoo Kim, Haixun Wang, and Alice Oh. 2013. Context-dependent conceptualization. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 2654–2661. AAAI Press.

Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In NIPS, pages 3276–3284.

Quoc V Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML.
Peipei Li, Haixun Wang, Kenny Zhu, Zhongyuan Wang, and Xindong Wu. 2013. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM).

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In HLT-NAACL, pages 746–751.

Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. 2015. Deep sentence embedding using the long short term memory network: Analysis and application to information retrieval. arXiv preprint arXiv:1502.06922.

Xipeng Qiu and Xuanjing Huang. 2015. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 1305–1311.

Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 373–382. ACM.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP).

Richard Socher, Christopher D Manning, and Andrew Y Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop, pages 1–9.

Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen. 2011. Short text conceptualization using a probabilistic knowledgebase. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pages 2330–2336. AAAI Press.

Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of NAACL-HLT.

Joshua B Tenenbaum, Charles Kemp, Thomas L Griffiths, and Noah D Goodman. 2011. How to grow a mind: statistics, structure, and abstraction. Science, 331(6022):1279–1285.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In ICML Deep Learning Workshop.

Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. 2012. Understanding tables on the web. In International Conference on Conceptual Modeling, October.

Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. 2014a. Concept-based short text classification and ranking. In ACM International Conference on Information and Knowledge Management (CIKM), October.

Zhongyuan Wang, Haixun Wang, and Zhirui Hu. 2014b. Head, modifier, and constraint detection in short texts. In IEEE 30th International Conference on Data Engineering (ICDE), pages 280–291. IEEE.

Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. 2015a. An inference approach to basic level of categorization. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 653–662. ACM.

Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. 2015b. Query understanding through knowledge-based conceptualization. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI).

Lexing Xie and Haixun Wang. 2014. Learning knowledge bases for text and multimedia. In Proceedings of the ACM International Conference on Multimedia, pages 1235–1236.

Mo Yu and Mark Dredze. 2015. Learning composition models for phrase embeddings. Transactions of the Association for Computational Linguistics, 3:227–242.
