You are on page 1of 54

Database Systems for Advanced

Applications 22nd International


Conference DASFAA 2017 Suzhou
China March 27 30 2017 Proceedings
Part I 1st Edition Selçuk Candan
Visit to download the full and correct content document:
https://textbookfull.com/product/database-systems-for-advanced-applications-22nd-int
ernational-conference-dasfaa-2017-suzhou-china-march-27-30-2017-proceedings-par
t-i-1st-edition-selcuk-candan/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...

Database Systems for Advanced Applications 25th


International Conference DASFAA 2020 Jeju South Korea
September 24 27 2020 Proceedings Part I Yunmook Nah

https://textbookfull.com/product/database-systems-for-advanced-
applications-25th-international-conference-dasfaa-2020-jeju-
south-korea-september-24-27-2020-proceedings-part-i-yunmook-nah/

Database Systems for Advanced Applications 25th


International Conference DASFAA 2020 Jeju South Korea
September 24 27 2020 Proceedings Part II Yunmook Nah

https://textbookfull.com/product/database-systems-for-advanced-
applications-25th-international-conference-dasfaa-2020-jeju-
south-korea-september-24-27-2020-proceedings-part-ii-yunmook-nah/

Database Systems for Advanced Applications 25th


International Conference DASFAA 2020 Jeju South Korea
September 24 27 2020 Proceedings Part III Yunmook Nah

https://textbookfull.com/product/database-systems-for-advanced-
applications-25th-international-conference-dasfaa-2020-jeju-
south-korea-september-24-27-2020-proceedings-part-iii-yunmook-
nah/

Intelligent Information and Database Systems: 9th Asian


Conference, ACIIDS 2017, Kanazawa, Japan, April 3-5,
2017, Proceedings, Part I 1st Edition Ngoc Thanh Nguyen

https://textbookfull.com/product/intelligent-information-and-
database-systems-9th-asian-conference-aciids-2017-kanazawa-japan-
april-3-5-2017-proceedings-part-i-1st-edition-ngoc-thanh-nguyen/
Database Systems for Advanced Applications DASFAA 2020
International Workshops BDMS SeCoP BDQM GDMA and AIDE
Jeju South Korea September 24 27 2020 Proceedings
Yunmook Nah
https://textbookfull.com/product/database-systems-for-advanced-
applications-dasfaa-2020-international-workshops-bdms-secop-bdqm-
gdma-and-aide-jeju-south-korea-september-24-27-2020-proceedings-
yunmook-nah/

Requirements Engineering Foundation for Software


Quality 23rd International Working Conference REFSQ
2017 Essen Germany February 27 March 2 2017 Proceedings
1st Edition Paul Grünbacher
https://textbookfull.com/product/requirements-engineering-
foundation-for-software-quality-23rd-international-working-
conference-refsq-2017-essen-germany-
february-27-march-2-2017-proceedings-1st-edition-paul-grunbacher/

Emerging Technologies for Developing Countries: First


International EAI Conference, AFRICATEK 2017,
Marrakech, Morocco, March 27-28, 2017 Proceedings 1st
Edition Fatna Belqasmi Et Al. (Eds.)
https://textbookfull.com/product/emerging-technologies-for-
developing-countries-first-international-eai-conference-
africatek-2017-marrakech-morocco-
march-27-28-2017-proceedings-1st-edition-fatna-belqasmi-et-al-
eds/
Information Security and Privacy: 22nd Australasian
Conference, ACISP 2017, Auckland, New Zealand, July
3–5, 2017, Proceedings, Part I 1st Edition Josef
Pieprzyk
https://textbookfull.com/product/information-security-and-
privacy-22nd-australasian-conference-acisp-2017-auckland-new-
zealand-july-3-5-2017-proceedings-part-i-1st-edition-josef-
pieprzyk/

Business Information Systems: 20th International


Conference, BIS 2017, Poznan, Poland, June 28–30, 2017,
Proceedings 1st Edition Witold Abramowicz (Eds.)

https://textbookfull.com/product/business-information-
systems-20th-international-conference-bis-2017-poznan-poland-
june-28-30-2017-proceedings-1st-edition-witold-abramowicz-eds/
Selçuk Candan · Lei Chen
Torben Bach Pedersen · Lijun Chang
Wen Hua (Eds.)
LNCS 10177

Database Systems
for Advanced Applications
22nd International Conference, DASFAA 2017
Suzhou, China, March 27–30, 2017
Proceedings, Part I

123
Lecture Notes in Computer Science 10177
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Selçuk Candan Lei Chen

Torben Bach Pedersen Lijun Chang


Wen Hua (Eds.)

Database Systems
for Advanced Applications
22nd International Conference, DASFAA 2017
Suzhou, China, March 27–30, 2017
Proceedings, Part I

123
Editors
Selçuk Candan Lijun Chang
Arizona State University University of New South Wales
Tempe - Phoenix, AZ Sydney, NSW
USA Australia
Lei Chen Wen Hua
Hong Kong University of Science The University of Queensland
and Technology Brisbane, QLD
Hong Kong Australia
China
Torben Bach Pedersen
Aalborg University
Aalborg
Denmark

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-319-55752-6 ISBN 978-3-319-55753-3 (eBook)
DOI 10.1007/978-3-319-55753-3

Library of Congress Control Number: 2017934640

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

It is our great pleasure to welcome you to DASFAA 2017, the 22nd edition of the
International Conference on Database Systems for Advanced Applications (DASFAA),
which was held in Suzhou, China, during March 27–30, 2017.
The long history of Suzhou City has left behind many attractive scenic spots and
historical sites with beautiful and interesting legends. The elegant classical gardens, the
old-fashioned houses and delicate bridges hanging over flowing waters in the drizzling
rain, the beautiful lakes with undulating hills in lush green, and the exquisite arts and
crafts, among many other attractions, have made Suzhou a renowned historical and
cultural city full of eternal and poetic charm. Suzhou is best known for its gardens: the
Humble Administrator’s Garden, the Lingering Garden, the Surging Wave Pavilion,
and the Master of Nets Garden. These gardens weave together the best of traditional
Chinese architecture, painting, and arts. Suzhou is also known as the “Venice of the
East.” The city is sandwiched between Taihu Lake and Grand Canal. A network of
channels, criss-crossed with hump-backed bridges, give Suzhou an image of the city on
the water.
We were delighted to offer an exciting technical program, including three keynote
talks by Divesh Srivastava (AT&T Research), Christian S. Jensen (Aalborg Univer-
sity), and Victor Chang (Xi’an Jiaotong University and Liverpool University), one
10-year best paper award presentation; a demo session with four demonstrations; two
industry sessions with nine paper presentations; three tutorial sessions; and of course a
superb set of research papers. This year, we received 300 submissions to the research
track, each of which went through a rigorous review process. Specifically, each paper
was reviewed by at least three Program Committee (PC) members, followed by a
discussion led by the PC co-chairs. Several papers went through a shepherding process.
Finally, DASFA 2017 accepted 73 full research papers, yielding an acceptance ratio of
24.3%.
Four workshops were selected by the workshop co-chairs to be held in conjunction
with DASFAA 2017. They are the 4th International Workshop on Big Data Man-
agement and Service (BDMS 2017), the Second Workshop on Big Data Quality
Management (BDQM 2017), the 4th International Workshop on Semantic Computing
and Personalization (SeCoP), and the First International Workshop on Data Manage-
ment and Mining on MOOCs (DMMOOC 2017).
The workshop papers are included in a separate volume of the proceedings also
published by Springer in its Lecture Notes in Computer Science series.
The conference received generous financial support from Soochow University. The
conference organizers also received extensive help and logistic support from the
DASFAA Steering Committee and the Conference Management Toolkit Support Team
at Microsoft.
VI Preface

We are grateful to the general chairs, Karl Aberer, EPFL, Switzerland, Peter
Scheuermann, Northwestern University, USA, and Kai Zheng, Soochow University,
China, the members of the Organizing Committee, and many volunteers, for their great
support in the conference organization. Special thanks also go to the DASFAA 2017
local Organizing Committee: Zhixu Li and Jiajie Xu, both from Soochow University,
China. Finally, we would like to take this opportunity to thank the authors who sub-
mitted their papers to the conference and the PC members and external reviewers for
their expertise and help in evaluating the submissions.

February 2017 K. Selçuk Candan


Lei Chen
Torben Bach Pedersen
Organization

General Co-chairs
Karl Aberer EPFL, Switzerland
Peter Scheuermann Northwestern University, USA
Kai Zheng Soochow University, China

Program Committee Co-chairs


Selçuk Candan Arizona State University, USA
Lei Chen HKUST, Hong Kong, SAR China
Torben Bach Pedersen Aalborg University, Denmark

Workshops Co-chairs
Zhifeng Bao RMIT, Australia
Goce Trajcevski Northwestern University, USA

Industrial/Practitioners Track Co-chairs


Nicholas Jing Yuan Microsoft, China
Georgia Koutrika HP Labs, USA

Demo Track Co-chairs


Meihui Zhang Singapore University of Technology and Design,
Singapore
Wook-Shin Han POSTECH University, South Korea

Tutorial Co-chairs
Katja Hose Aalborg University, Denmark
Huiping Cao New Mexico State University, USA

Panel Chair
Xuemin Lin University of New South Wales, Australia

Proceedings Co-chairs
Lijun Chang University of New South Wales, Australia
Wen Hua University of Queensland, Australia
VIII Organization

Publicity Co-chairs
Bin Yang Aalborg University, Denmark
Xiang Lian University of Texas-Pan American, USA
Maria Luisa Sapino University of Turin, Italy

Local Organization Co-chairs


Zhixu Li Soochow University, China
Jiajie Xu Soochow University, China

Steering Committee Liaison


Xiaofang Zhou University of Queensland, Australia

Conference Secretary
Yan Zhao Soochow University, China

Webmaster
Yang Li Soochow University, China

Program Committee
Amr El Abbadi University of California at Santa Barbara, USA
Alberto Abello UPC Barcelona, Spain
Divyakant Agrawal University of California at Santa Barbara, USA
Marco Aldinucci University of Turin, Italy
Ira Assent Aarhus University, Denmark
Rafael Berlanga Llavori University Jaume I, Spain
Francescho Bonchi ISI Foundation, Italy
Selcuk Candan Arizona State University, USA
Huiping Cao New Mexico State University, USA
Barbara Catania Università di Genova, Italy
Qun Chen Northwestern Polytechnical University, China
Reynold Cheng University of Hong Kong, SAR China
Wonik Choi Inha University, South Korea
Gao Cong Nanyang Technological University, Singapore
Bin Cui Peiking University, China
Lars Dannecker SAP, Germany
Hasan Davulcu Arizona State University, USA
Ugur Demiryurek University of Southern California, USA
Francesco Di Mauro University of Turin, Italy
Curtis Dyreson Utah State University, USA
Hakan Ferhatosmanoglu Bilkent University, Turkey
Organization IX

Elena Ferrari Università dell’Insubria, Italy


Johann Gamper Free University of Bozen-Bolzano, Italy
Hong Gao Harbin Institute of Technology, China
Yunjun Gao Zhejiang University, China
Yash Garg Arizona State University, USA
Lukasz Golab University of Waterloo, Canada
Le Gruenwald University of Oklahoma, USA
Ismail Hakki Toroslu Middle East Technical University, Turkey
Jingrui He Arizona State University, USA
Yoshiharu Ishikawa Nagoya University, Japan
Linnan Jiang HKUST, SAR China
Cheqing Jin East China Normal University, China
Alekh Jindal MIT, USA
Sungwon Jung Jung Sogang University, South Korea
Arijit Khan Nanyang Technological University, Singapore
Latifur Khan University of Texas at Dallas, USA
Deok-Hwan Kim Inha University, South Korea
Jinho Kim Kangwon National University, South Korea
Jong Wook Kim Sangmyung University, South Korea
Jung Hyun Kim Arizona State University, USA
Sang-Wook Kim Hanyang University, South Korea
Peer Kroger LMU Munich, Germany
Anne Laurent Université de Montpellier II, France
Wookey Lee Inha University, South Korea
Young-Koo Lee Kyung Hee University, South Korea
Chengkai Li University of Texas at Arlington, USA
Feifei Li University of Utah, USA
Guoliang Li Tsinghua University, China
Zhixu Li Soochow University, China
Xiang Lian Kent State University, USA
Eric Lo Chinese University of Hong Kong, SAR China
Woong-Kee Loh Gachon University, South Korea
Hua Lu Aalborg University, Denmark
Nikos Mamoulis University of Ioannina, Greece/University
of Hong Kong, SAR China
Ioana Manolescu Inria, France
Rui Meng HKUST, SAR China
Parth Nagarkar Arizona State University, USA
Yunmook Nah Dankook University, South Korea
Kjetil Norvag Norwegian University of Science and Technology,
Norway
Sarana Yi Nutanong City University of Hong Kong, SAR China
Vincent Oria New Jersey Institute of Technology, USA
Paolo Papotti Arizona State University, USA
Torben Bach Pedersen Aalborg University, Denmark
Ruggero Pensa University of Turin, Italy
X Organization

Dieter Pfoser George Mason University, USA


Evaggelia Pitoura University of Ioannina, Greece
Silvestro Poccia University of Turin, Italy
Weixiong Rao Tongji University, China
Matthias Renz George Mason University, USA
Oscar Romero UPC Barcelona, Spain
Florin Rusu University of California Merced, USA
Simonas Saltenis Aalborg University, Denmark
Maria Luisa Sapino University of Turin, Italy
Claudio Schifanella University of Turin, Italy
Cyrus Shahabi University of Southern California, USA
Jieying She HKUST, SAR China
Hengtao Shen University of Queensland, China
Yanyan Shen Shanghai Jiao Tong University, China
Alkis Simitsis HP Labs, USA
Shaoxu Song Tsinghua University, China
Yangqiu Song HKUST, SAR China
Xiaoshuai Sun University of Queensland, Australia
Letizia Tanca Politecnico di Milano, Italy
Nan Tang Qatar Computing Research Institute, Qatar
Egemen Tanin University of Melbourne, Australia
Junichi Tatemura Google, USA
Christian Thomsen Aalborg University, Denmark
Hanghang Tong Arizona State University, USA
Yongxin Tong Beihang University, China
Panos Vassiliadis University of Ioannina, Greece
Sabrina De Capitani University of Milan, Italy
Vimercati
Bin Wang Northeastern University, China
Wei Wang National University of Singapore, Singapore
Xin Wang Tianjin University, China
John Wu Lawrence Berkeley Lab, USA
Xiaokui Xiao Nanyang Technological University, Singapore
Xike Xie University of Science and Technology of China, China
Jianliang Xu Hong Kong Bapatist University, SAR China
Jeffrey Xu Yu Chinese University of Hong Kong, China
Xiaochun Yang Northeastern University, China
Bin Yao Shanghai Jiao Tong University, China
Hongzhi Yin University of Queensland, Australia
Man Lung Yiu Hong Kong Polytechnic, SAR China
Yi Yu National Institute of Informatics, Japan
Ye Yuan Northeastern University, China
Meihui Zhang Singapore University of Technology and Design,
Singapore
Wenjie Zhang University of New South Wales, Australia
Ying Zhang University of Technology Sydney, Australia
Organization XI

Zhengjie Zhang Advanced Digital Sciences Center, Singapore


Xiangmin Zhou RMIT, Australia
Yongluan Zhou University of Southern Denmark, Denmark
Lei Zhu University of Queensland, Australia
Esteban Zimanyi Université Libre de Bruxelles, Belgium
Andreas Zufle George Mason University, USA

Industry Track Program Committee


Akhil Arora Xerox Research Centre, India
Jie Bao Microsoft, China
Senjuti Basu Roy UW Tacoma, USA
Neil Zhenqiang Gong Iowa State University, USA
Defu Lian University of Electronic Science and Technology,
China
Qi Liu University of Science and Technology of China, China
Alkis Simitsis HP Labs, USA
Kostas Stefanidis University of Tampere, Finland
Lu-An Tang NEC Lab, USA
Fuzheng Zhang Microsoft, China
Hengshu Zhu Baidu, China

Demo Track Program Committee


Jinha Kim Oracle Labs, USA
Xuan Liu Baidu, China
Yanyan Shen Shanghai Jiao Tong University, China
Yongxin Tong Beihang University, China

Additional Reviewers
Jinpeng Chen Beihang University, China
Xilun Chen Arizona State University, USA
Yu Cheng Turn Inc., USA
Alexander Crosdale UC Merced, USA
Tiziano De Matteis University of Pisa, Italy
Vasilis Efthymiou University of Crete, Greece
Roberto Esposito University of Turin, Italy
Yixiang Fang The University of Hong Kong, SAR China
Christian Frey LMU Munich, Germany
Yash Garg Arizona State University, USA
Concorde Habineza George Mason University, USA
Jiafeng Hu The University of Hong Kong, SAR China
Zhiyi Huang UC Merced, USA
Zhipeng Huang The University of Hong Kong, SAR China
Shengyu Huang Arizona State University, USA
XII Organization

Petar Jovanovic UPC Barcelona, Spain


Elvis Koci Technische Universität Dresden, Germany
Georgia Koloniari University of Macedonia, Greece
Haridimos Kondylakis ICS-FORTH, Greece
Rohit Kumar
Xiaodong Li The University of Hong Kong, SAR China
Xinsheng Li Arizona State University, USA
Mao-Lin Li Arizona State University, USA
Xuan Liu Google, USA
Sicong Liu Arizona State University, USA
Yujing Ma UC Merced, USA
Sebastian Mattheis BMW Car-IT, Germany
Mirjana Mazuran Politecnico di Milano, Italy
Gabriele Mencagli University of Pisa, Italy
Faisal Munir
Sergi Nadal Universitat Politècnica de Catalunya, Spain
Silvestro Poccia University of Turin, Italy
Emanuele Rabosio Politecnico di Milano, Italy
Klaus A. Schmid Ludwig-Maximilians Universität München, Germany
Caihua Shan The University of Hong Kong, SAR China
Haiqi Sun The University of Hong Kong, SAR China
Massimo Torquati University of Turin, Italy
Martin Torres UC Merced, USA
Jovan Varga Universitat Politècnica de Catalunya, Spain
Paolo Viviani University of Turin, Italy
Hongwei Wang
Jianqiu Xu Nanjing University of Aeronautics and Astronautics,
China
Yong Xu The University of Hong Kong, SAR China
Xiaoyan Yang Advanced Digital Sciences Center, Singapore
Liming Zhan University of New South Wales, Australia
Xin Zhang UC Merced, USA
Weijie Zhao UC Merced, USA
Yuhai Zhao Northeastern University, China
Contents – Part I

Semantic Web and Knowledge Management

A General Fine-Grained Truth Discovery Approach for Crowdsourced


Data Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Yang Du, Hongli Xu, Yu-E Sun, and Liusheng Huang

Learning the Structures of Online Asynchronous Conversations . . . . . . . . . . 19


Jun Chen, Chaokun Wang, Heran Lin, Weiping Wang, Zhipeng Cai,
and Jianmin Wang

A Question Routing Technique Using Deep Neural Network


for Communities of Question Answering . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Amr Azzam, Neamat Tazi, and Ahmad Hossny

Category-Level Transfer Learning from Knowledge Base to Microblog


Stream for Accurate Event Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Weijing Huang, Tengjiao Wang, Wei Chen, and Yazhou Wang

Indexing and Distributed Systems

AngleCut: A Ring-Based Hashing Scheme for Distributed


Metadata Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Jiaxi Liu, Renxuan Wang, Xiaofeng Gao, Xiaochun Yang,
and Guihai Chen

An Efficient Bulk Loading Approach of Secondary Index in Distributed


Log-Structured Data Stores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Yanchao Zhu, Zhao Zhang, Peng Cai, Weining Qian, and Aoying Zhou

Performance Comparison of Distributed Processing of Large Volume


of Data on Top of Xen and Docker-Based Virtual Clusters . . . . . . . . . . . . . 103
Haejin Chung and Yunmook Nah

An Adaptive Data Partitioning Scheme for Accelerating Exploratory Spark


SQL Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Chenghao Guo, Zhigang Wu, Zhenying He, and X. Sean Wang

Network Embedding

Semi-Supervised Network Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


Chaozhuo Li, Zhoujun Li, Senzhang Wang, Yang Yang,
Xiaoming Zhang, and Jianshe Zhou
XIV Contents – Part I

CirE: Circular Embeddings of Knowledge Graphs . . . . . . . . . . . . . . . . . . . . 148


Zhijuan Du, Zehui Hao, Xiaofeng Meng, and Qiuyue Wang

PPNE: Property Preserving Network Embedding. . . . . . . . . . . . . . . . . . . . . 163


Chaozhuo Li, Senzhang Wang, Dejian Yang, Zhoujun Li, Yang Yang,
Xiaoming Zhang, and Jianshe Zhou

HINE: Heterogeneous Information Network Embedding. . . . . . . . . . . . . . . . 180


Yuxin Chen and Chenguang Wang

Trajectory and Time Series Data Processing

DT-KST: Distributed Top-k Similarity Query on Big Trajectory Streams . . . . 199


Zhigang Zhang, Yilin Wang, Jiali Mao, Shaojie Qiao, Cheqing Jin,
and Aoying Zhou

A Distributed Multi-level Composite Index for KNN Processing on Long


Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Xiaqing Wang, Zicheng Fang, Peng Wang, Ruiyuan Zhu, and Wei Wang

Outlier Trajectory Detection: A Trajectory Analytics Based Approach . . . . . . 231


Zhongjian Lv, Jiajie Xu, Pengpeng Zhao, Guanfeng Liu, Lei Zhao,
and Xiaofang Zhou

Clustering Time Series Utilizing a Dimension Hierarchical


Decomposition Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Qiuhong Li, Peng Wang, Yang Wang, Wei Wang, Yimin Liu, Jiaye Wu,
and Danyang Dou

Data Mining

Fast Extended One-Versus-Rest Multi-label SVM Classification Algorithm


Based on Approximate Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Zhongwei Sun, Zhongwen Guo, Xupeng Wang, Jing Liu,
and Shiyong Liu

Efficiently Discovering Most-Specific Mixed Patterns from Large


Data Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Xiaoying Wu and Dimitri Theodoratos

Max-Cosine Matching Based Neural Models for Recognizing


Textual Entailment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
Zhipeng Xie and Junfeng Hu

An Intelligent Field-Aware Factorization Machine Model . . . . . . . . . . . . . . . 309


Cairong Yan, Qinglong Zhang, Xue Zhao, and Yongfeng Huang
Contents – Part I XV

Query Processing and Optimization (I)

Beyond Skylines: Explicit Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327


Markus Endres and Timotheus Preisinger

Optimizing Window Aggregate Functions in Relational Database Systems . . . 343


Guangxuan Song, Jiansong Ma, Xiaoling Wang, Cheqing Jin,
and Yu Cao

Query Optimization on Hybrid Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 361


Anxuan Yu, Qingzhong Meng, Xuan Zhou, Binyu Shen,
and Yansong Zhang

Efficient Batch Grouping in Relational Datasets . . . . . . . . . . . . . . . . . . . . . 376


Jizhou Sun, Jianzhong Li, and Hong Gao

Text Mining

Memory-Enhanced Latent Semantic Model: Short Text Understanding for


Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Fei Hu, Xiaofei Xu, Jingyuan Wang, Zhanbo Yang, and Li Li

Supervised Intensive Topic Models for Emotion Detection over Short Text . . . . 408
Yanghui Rao, Jianhui Pang, Haoran Xie, An Liu, Tak-Lam Wong,
Qing Li, and Fu Lee Wang

Leveraging Pattern Associations for Word Embedding Models . . . . . . . . . . . 423


Qian Liu, Heyan Huang, Yang Gao, Xiaochi Wei, and Ruiying Geng

Multi-Granularity Neural Sentence Model for Measuring


Short Text Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Jiangping Huang, Shuxin Yao, Chen Lyu, and Donghong Ji

Recommendation

Leveraging Kernel Incorporated Matrix Factorization for Smartphone


Application Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Chenyang Liu, Jian Cao, and Jing He

Preference Integration in Context-Aware Recommendation . . . . . . . . . . . . . . 475


Lin Zheng and Fuxi Zhu

Jointly Modeling Heterogeneous Temporal Properties


in Location Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
Saeid Hosseini, Hongzhi Yin, Meihui Zhang, Xiaofang Zhou,
and Shazia Sadiq
XVI Contents – Part I

Location-Aware News Recommendation Using Deep Localized


Semantic Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Cheng Chen, Thomas Lukasiewicz, Xiangwu Meng, and Zhenghua Xu

Review-Based Cross-Domain Recommendation Through Joint


Tensor Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
Tianhang Song, Zhaohui Peng, Senzhang Wang, Wenjing Fu,
Xiaoguang Hong, and Philip S. Yu

Security, Privacy, Senor and Cloud

A Local-Clustering-Based Personalized Differential Privacy Framework


for User-Based Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
Yongkai Li, Shubo Liu, Jun Wang, and Mengjun Liu

Fast Multi-dimensional Range Queries on Encrypted Cloud Databases. . . . . . 559


Jialin Chi, Cheng Hong, Min Zhang, and Zhenfeng Zhang

When Differential Privacy Meets Randomized Perturbation: A Hybrid


Approach for Privacy-Preserving Recommender System. . . . . . . . . . . . . . . . 576
Xiao Liu, An Liu, Xiangliang Zhang, Zhixu Li, Guanfeng Liu, Lei Zhao,
and Xiaofang Zhou

Supporting Cost-Efficient Multi-tenant Database Services with Service


Level Objectives (SLOs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Yifeng Luo, Junshi Guo, Jiaye Zhu, Jihong Guan, and Shuigeng Zhou

Recovering Missing Values from Corrupted Spatio-Temporal Sensory Data


via Robust Low-Rank Tensor Completion . . . . . . . . . . . . . . . . . . . . . . . . . 607
Wenjie Ruan, Peipei Xu, Quan Z. Sheng, Nickolas J.G. Falkner, Xue Li,
and Wei Emma Zhang

Social Network Analytics (I)

Group-Level Influence Maximization with Budget Constraint . . . . . . . . . . . . 625


Qian Yan, Hao Huang, Yunjun Gao, Wei Lu, and Qinming He

Correlating Stressor Events for Social Network Based Adolescent


Stress Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642
Qi Li, Liang Zhao, Yuanyuan Xue, Li Jin, Mostafa Alli, and Ling Feng

Emotion Detection in Online Social Network Based


on Multi-label Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
Xiao Zhang, Wenzhong Li, and Sanglu Lu
Contents – Part I XVII

Tutorials

Urban Computing: Enabling Urban Intelligence with Big Data . . . . . . . . . . . 677


Yu Zheng

Incentive-Based Dynamic Content Management in Mobile Crowdsourcing


for Smart City Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
Sanjay K. Madria

Tutorial on Data Analytics in Multi-engine Environments . . . . . . . . . . . . . . 682


Verena Kantere and Maxim Filatov

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685


Contents – Part II

Map Matching and Spatial Keywords

HIMM: An HMM-Based Interactive Map-Matching System . . . . . . . . . . . . . 3


Xibo Zhou, Ye Ding, Haoyu Tan, Qiong Luo, and Lionel M. Ni

HyMU: A Hybrid Map Updating Framework . . . . . . . . . . . . . . . . . . . . . . . 19


Tao Wang, Jiali Mao, and Cheqing Jin

Multi-objective Spatial Keyword Query with Semantics . . . . . . . . . . . . . . . . 34


Jing Chen, Jiajie Xu, Chengfei Liu, Zhixu Li, An Liu, and Zhiming Ding

Query Processing and Optimization (II)

RSkycube: Efficient Skycube Computation by Reusing Principle . . . . . . . . . 51


Kaiqi Zhang, Hong Gao, Xixian Han, Donghua Yang, Zhipeng Cai,
and Jianzhong Li

Similarity Search Combining Query Relaxation and Diversification . . . . . . . . 65


Ruoxi Shi, Hongzhi Wang, Tao Wang, Yutai Hou, Yiwen Tang,
Jianzhong Li, and Hong Gao

An Unsupervised Approach for Low-Quality Answer Detection


in Community Question-Answering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Haocheng Wu, Zuohui Tian, Wei Wu, and Enhong Chen

Approximate OLAP on Sustained Data Streams . . . . . . . . . . . . . . . . . . . . . 102


Salman Ahmed Shaikh and Hiroyuki Kitagawa

Search and Information Retrieval

Hierarchical Semantic Representations of Online News Comments


for Emotion Tagging Using Multiple Information Sources . . . . . . . . . . . . . . 121
Chao Wang, Ying Zhang, Wei Jie, Christian Sauer, and Xiaojie Yuan

Towards a Query-Less News Search Framework on Twitter . . . . . . . . . . . . . 137


Xiaotian Hao, Ji Cheng, Jan Vosecky, and Wilfred Ng

Semantic Definition Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, Jun Yan,
and Qiuyue Wang
XX Contents – Part II

An Improved Approach for Long Tail Advertising in Sponsored Search . . . . 169


Amar Budhiraja and P. Krishna Reddy

String and Sequence Processing

Locating Longest Common Subsequences with Limited Penalty . . . . . . . . . . 187


Bin Wang, Xiaochun Yang, and Jinxu Li

Top-k String Auto-Completion with Synonyms . . . . . . . . . . . . . . . . . . . . . . 202


Pengfei Xu and Jiaheng Lu

Efficient Regular Expression Matching on Compressed Strings . . . . . . . . . . . 219


Yutong Han, Bin Wang, Xiaochun Yang, and Huaijie Zhu

Mining Top-k Distinguishing Temporal Sequential Patterns


from Event Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Lei Duan, Li Yan, Guozhu Dong, Jyrki Nummenmaa, and Hao Yang

Stream Data Processing

Soft Quorums: A High Availability Solution for Service Oriented


Stream Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Chunyao Song, Tingjian Ge, Cindy Chen, and Jie Wang

StroMAX: Partitioning-Based Scheduler for Real-Time Stream


Processing System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Jiawei Jiang, Zhipeng Zhang, Bin Cui, Yunhai Tong, and Ning Xu

Partition-Based Clustering with Sliding Windows for Data Streams . . . . . . . . 289


Jonghem Youn, Jihun Choi, Junho Shim, and Sang-goo Lee

CBP: A New Parallelization Paradigm for Massively Distributed


Stream Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
Qingsong Guo and Yongluan Zhou

Social Network Analytics (II)

Measuring and Maximizing Influence via Random Walk in Social


Activity Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Pengpeng Zhao, Yongkun Li, Hong Xie, Zhiyong Wu, Yinlong Xu,
and John C.S. Lui

Adaptive Overlapping Community Detection with Bayesian NonNegative


Matrix Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
Xiaohua Shi, Hongtao Lu, and Guanbo Jia
Contents – Part II XXI

A Unified Approach for Learning Expertise and Authority


in Digital Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
B. de La Robertie, L. Ermakova, Y. Pitarch, A. Takasu, and O. Teste

Graph and Network Data Processing

Efficient Local Clustering Coefficient Estimation in Massive Graphs . . . . . . . 371


Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, and Jeffrey Xu Yu

Efficient Processing of Growing Temporal Graphs . . . . . . . . . . . . . . . . . . . 387


Huanhuan Wu, Yunjian Zhao, James Cheng, and Da Yan

Effective k-Vertex Connected Component Detection


in Large-Scale Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Yuan Li, Yuhai Zhao, Guoren Wang, Feida Zhu, Yubao Wu,
and Shengle Shi

Spatial Databases

Efficient Landmark-Based Candidate Generation for kNN Queries


on Road Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Tenindra Abeywickrama and Muhammad Aamir Cheema

MinSum Based Optimal Location Query in Road Networks . . . . . . . . . . . . . 441


Lv Xu, Ganglin Mai, Zitong Chen, Yubao Liu, and Genan Dai

Efficiently Mining High Utility Co-location Patterns from Spatial Data Sets
with Instance-Specific Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
Lizhen Wang, Wanguo Jiang, Hongmei Chen, and Yuan Fang

Real Time Data Processing

Supporting Real-Time Analytic Queries in Big


and Fast Data Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Guangjun Wu, Xiaochun Yun, Chao Li, Shupeng Wang, Yipeng Wang,
Xiaoyu Zhang, Siyu Jia, and Guangyan Zhang

Boosting Moving Average Reversion Strategy for Online Portfolio


Selection: A Meta-learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
Xiao Lin, Min Zhang, Yongfeng Zhang, Zhaoquan Gu, Yiqun Liu,
and Shaoping Ma

Continuous Summarization over Microblog Threads . . . . . . . . . . . . . . . . . . 511


Liangjun Song, Ping Zhang, Zhifeng Bao, and Timos Sellis

Drawing Density Core-Sets from Incomplete Relational Data . . . . . . . . . . . . 527


Yongnan Liu, Jianzhong Li, and Hong Gao
XXII Contents – Part II

Big Data (Industrial)

Co-training an Improved Recurrent Neural Network with Probability


Statistic Models for Named Entity Recognition . . . . . . . . . . . . . . . . . . . . . . 545
Yueqing Sun, Lin Li, Zhongwei Xie, Qing Xie, Xin Li, and Guandong Xu

EtherQL: A Query Layer for Blockchain System. . . . . . . . . . . . . . . . . . . . . 556


Yang Li, Kai Zheng, Ying Yan, Qi Liu, and Xiaofang Zhou

Optimizing Scalar User-Defined Functions in In-Memory Column-Store


Database Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
Cheol Ryu, Sunho Lee, Kihong Kim, Kunsoo Park, Yongsik Kwon,
Sang Kyun Cha, Changbin Song, Emanuel Ziegler, and Stephan Muench

GPS-Simulated Trajectory Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581


Han Su, Wei Chen, Rong Liu, Min Nie, Bolong Zheng, Zehao Huang,
and Defu Lian

Social Networks and Graphs (Industrial)

Predicting Academic Performance via Semi-supervised Learning


with Constructed Campus Social Network . . . . . . . . . . . . . . . . . . . . . . . . . 597
Huaxiu Yao, Min Nie, Han Su, Hu Xia, and Defu Lian

Social User Profiling: A Social-Aware Topic Modeling Perspective. . . . . . . . 610


Chao Ma, Chen Zhu, Yanjie Fu, Hengshu Zhu, Guiquan Liu,
and Enhong Chen

Cost-Effective Data Partition for Distributed Stream Processing System . . . . . 623


Xiaotong Wang, Junhua Fang, Yuming Li, Rong Zhang,
and Aoying Zhou

A Graph-Based Push Service Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636


Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He

Edge Influence Computation in Dynamic Graphs . . . . . . . . . . . . . . . . . . . . 649


Yongrui Qin, Quan Z. Sheng, Simon Parkinson,
and Nickolas J.G. Falkner

Demos

DKGBuilder: An Architecture for Building a Domain Knowledge Graph


from Scratch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
Yan Fan, Chengyu Wang, Guomin Zhou, and Xiaofeng He
Contents – Part II XXIII

CLTR: Collectively Linking Temporal Records Across


Heterogeneous Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
Yanyan Zou and Kasun S. Perera

PhenomenaAssociater: Linking Multi-domain Spatio-Temporal Datasets. . . . . 672


Prathamesh Walkikar and Vandana P. Janeja

VisDM–A Data Stream Visualization Platform . . . . . . . . . . . . . . . . . . . . . . 677


Lars Melander, Kjell Orsborn, Tore Risch, and Daniel Wedlund

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681


Semantic Web and Knowledge
Management
A General Fine-Grained Truth Discovery
Approach for Crowdsourced Data Aggregation

Yang Du1(B) , Hongli Xu1 , Yu-E Sun2 , and Liusheng Huang1


1
School of Computer Science and Technology,
University of Science and Technology of China, Hefei, China
jannr@mail.ustc.edu.cn, {xuhongli,lshuang}@ustc.edu.cn
2
School of Urban Rail Transportation, Soochow University, Suzhou, China
sunye12@suda.edu.cn

Abstract. Crowdsourcing has been proven to be an efficient tool to


collect large-scale datasets. Answers provided by the crowds are often
noisy and conflicted, which makes aggregating them to infer ground
truth a critical challenge. Existing fine-grained truth discovery methods
solve this problem by exploring the correlation between source reliabil-
ity and task topics or answers. However, they can only work on lim-
ited tasks, which results in the incompatibility with Writing tasks and
Transcription tasks, along with the insufficient utilization of the global
dataset. To maintain compatibility, we consider the existence of clusters
in both tasks and sources, then propose a general fine-grained method.
The proposed approach contains two integral components: kl-means and
Pattern-based Truth Discovery (PTD). With the aid of ground truth
data, kl-means directly employs a co-clustering reliability model on the
correctness matrix to learn the patterns. Then PTD conducts the answer
aggregation by incorporating captured patterns, producing a more accu-
rate estimation. Therefore, our approach is compatible with all tasks
and can better demonstrate the correlation among tasks and sources.
Experimental results show that our method can produce a more precise
estimation than other general truth discovery methods due to its ability
to learn and utilize the patterns of both tasks and sources.

1 Introduction
In recent years, producing large-scale datasets has been vital for both research
and industry. Traditional strategies, mostly hiring motivated experts, are not
competent enough when collecting datasets like ImageNet [1] and TinyImages
[2] which contain millions of samples. On the other hand, crowdsourcing services,
like AMT (Amazon Mechanical Turk) [3] or CrowdFlower [4], offer a convenient
way to collect datasets by distributing tasks to workers (sources) all over the
world.
For its efficiency and effectiveness in producing large-scale datasets, crowd-
sourcing has become increasingly popular. As sources are not necessarily experts,
tasks are often distributed more than once, which results in the noisy and con-
flicted answers. Therefore, how to aggregate answers and infer the ground truth
has been a critical challenge in crowdsourcing.

c Springer International Publishing AG 2017
S. Candan et al. (Eds.): DASFAA 2017, Part I, LNCS 10177, pp. 3–18, 2017.
DOI: 10.1007/978-3-319-55753-3 1
4 Y. Du et al.

An intuitive solution to this challenge is majority voting, which selects the


answer with most votes as the final output. However, it overlooks the rugged
reliability of sources and shows poor performance when the number of low-
quality sources exceeds that of high-quality ones. To solve this problem, a set
of weighted voting approaches was proposed [5–7]. Regardless of the differences
in their models, their principles are assigning larger weights to sources with
higher reliability and outputting the answers with largest weights. Nevertheless,
these approaches assume that each source’s reliability is consistent with all tasks,
which is not true, as everyone has his areas of expertise.
Given the shortcomings of above approaches, current efforts try to utilize
sources’ fine-grained reliability. Some of them divide tasks into topical-level clus-
ters and estimate source’s topical reliability, like FaitCrowd [8], which employs
a topic model on task description and divides tasks into groups. Others attempt
to model source’s reliability for different answers [9,10]. However, a common
drawback of these methods is their limited compatibility with tasks.

Table 1. Crowdsourcing task templates in AMT

Task form Task description Candidate answers


task-specific non-specific task-specific non-specific
Image tagging/Moderation  
Writing  
Data collection  
Transcription  
Sentiment wizard  
Categorization wizard  

To better demonstrate the incompatibility issues of existing fine-grained


approaches, we consider the task templates in AMT and propose a rough classi-
fication for their task description and candidate answers. As Table 1 shows, we
divide each feature into two groups: task-specific or non-specific. For example,
the task description of Data Collection tasks, like “What is the height of the
Mount Everest?”, is labeled as task-specific for that each description is targeted
at a specific task. In contrast, the task description of Image Tagging tasks, like
“Is there a duck in the picture?”, which can be used for multiple images, is
labeled as non-specific. The classification process of candidate answer is similar.
Consider a typical Image Tagging task which asks workers to answer whether
they see birds in the picture. We label the candidate answers of this task as
non-specific since they can be shared among a set of Image Tagging tasks.
We find out that current fine-grained approaches are only competent for lim-
ited tasks in AMT. For instance, the approaches which divide tasks into topical-
level groups, like FaitCrowd, can only work with the tasks whose description
can reflect their topics. Therefore, those approaches can only perform well on
A General Fine-Grained Truth Discovery Approach 5

tasks with the task-specific description but are not competent for those with
non-specific description. Other approaches, which model fine-grained reliability
for different answers, limit the tasks to those whose candidate answers are non-
specific. As consequences, (1) none of these approaches can deal with the Writing
tasks or the Transcription tasks, and (2) they can only utilize a subset of the
global dataset to help infer the ground truth. Therefore, it is crucial to design a
general fine-grained truth discovery approach.
To tackle with challenges above, we consider the clusters in both tasks and
sources. It has been observed that members of the same task cluster or source
cluster share identical patterns. For instance, the pattern shared in a source
cluster is sources’ rugged reliability levels for different task clusters. More specif-
ically, sources can have different reliability levels for various task clusters, but
for a particular task cluster, sources coming from the same cluster show an equal
reliability level. The patterns of task clusters are similar to that of source clusters.
Therefore, we can first learn the patterns of task clusters and source clusters,
then derive correct answers for target tasks based on the learned patterns.
In this paper, we propose a general fine-grained truth discovery approach
to aggregate crowdsourced answers. The proposed method contains two integral
components: kl-means and Pattern-based Truth Discovery (PTD). With the aid
of standard tasks whose ground truth are known, kl-means directly employs a
co-clustering reliability model on the correctness matrix to learn the patterns
of sources and tasks simultaneously. Then PTD further uses captured patterns
to help infer the ground truth. Since these two components do not have any
additional requirements for tasks, like the task-specific description or non-specific
answers, our approach is compatible with all tasks. To the best of our knowledge,
this is the first work to propose a general fine-grained method with consideration
of the patterns of task clusters and source clusters. One important feature of our
approach is the utilization of co-clustering reliability model and standard tasks,
which allows us to learn the cluster patterns directly from the correctness matrix.
Therefore, our method can better demonstrate the relationship between tasks
and sources and produce more accurate inference.
In summary, three main contributions of this work are as follows:
(1) To the best of our knowledge, we are the first to recognize the compatibility
issues of state-of-art fine-grained truth discovery methods and propose a
general one to aggregate noisy crowdsourced answers.
(2) We employ a co-clustering reliability model to learn the patterns of task
clusters and source clusters simultaneously and use captured patterns to
derive a more accurate inference of ground truth.
(3) We empirically show that the proposed method outperforms existing general
truth discovery methods.

2 Problem Formulation
In this section, we first introduce some basic notations used in this paper, then
give a formal definition of our problem.
6 Y. Du et al.


The inputs of our method are m standard tasks T = {ti }m 1 , m target tasks
 m+m m+m ,n
T = {ti }m+1 , n sources U = {uj }1 , answers matrix A = {aij }i=1,j=1 and cor-
n

rectness matrix R = {rij }m,n


i=1,j=1 , where standard tasks are those whose ground
truth we already know, the target tasks are those whose ground truth we want
to find out.
Definition 1. All tasks are single-choice questions. For task ti , it has B candi-
date answers ci = {ci1 , ci2 · · · ciB }, and its ground truth is denoted by c∗i ∈ ci .

Definition 2. Answer element aij denotes the answer given by source uj to task
ti . aij = b, 1 ≤ b ≤ B means source uj select task ti ’s candidate answer cib as
his answer, aij = 0 means source uj has not answered task ti .

Definition 3. Correctness element rij denotes the correctness of answer given


by source uj to standard task ti . rij = 1 means source uj answers task ti correctly,
rij = 0 means his answer is incorrect, while rij = −1 means source uj has not
answered task ti .

Our goal is to derive estimation of target tasks’ ground truth E = {ei }m+mm+1 ,
k task clusters I = {Ip }k1 , l source clusters {Jq }l1 and co-clustering reliability
matrix {ηpq }k,l
p=1,q=1 .

Definition 4. Estimated ground truth ei for task ti is the most accurate answer
provided by sources.

Definition 5. Co-clustering reliability matrix η k×l is referred as the reliability


levels of l source clusters on k task clusters.

Definition 6. Sources in same source cluster share an identical pattern on task


clusters, and a source’s reliability is consistent on the same task cluster.

Based on these definitions, we can formally define our problem as: given m
  m+m
standard tasks T = {ti }m
1 , m target tasks T = {ti }m+1 , n sources U = {uj }1 ,
n

m+m ,n
answers matrix A = {aij }i=1,q=1 and correctness matrix R = {rij }m,n i=1,q=1 , our
goals are learning k task clusters I = {Ip }1 , l source clusters {Jq }1 , co-clustering
k l

reliability matrix {ηpq }k,l


p=1,q=1 and producing estimation of target tasks’ true

answers E = {ei }m+m
m+1 .

3 Co-clustering Reliability Model

The basic idea of proposed approach is to build a co-clustering reliability model


on the correctness matrix, which is based on the observation that there exist
clusters in both sources and tasks. For each source cluster, its members share
same reliability level on tasks belonging to the same task cluster, so do the task
clusters.
A General Fine-Grained Truth Discovery Approach 7

Therefore, we can divide tasks into k clusters, sources into l clusters by the
following functions:

ρ : {t1 , t2 · · · tm } → {1, 2 · · · k}
(1)
γ : {u1 , u2 · · · un } → {1, 2 · · · l},

where ρ(ti ) = r implies that task ti is in task cluster r, γ(uj ) = s implies that
source uj is in source cluster s.
Let I = {Ip }k1 denote the k task clusters and J = {Jq }l1 denote the source
clusters. A co-cluster is referred as the submatrix of correctness matrix R deter-
mined by a task cluster Ip and a source cluster Jq .
We evaluate the homogeneity of co-cluster by the sum of squared differences
between each entry and the mean of this co-cluster which was used by Hartigan
et al. [15]. Consider that the correctness matrix could be highly sparse since it
is unnecessary for sources to perform all tasks. We compute the residue of the
element rij differently based on whether source uj has answered task ti . Let
value −1 represent the unfilled items in the correctness matrix. Formally:

rij − ηpq , if rij = −1
hij = (2)
0, otherwise.

When source uj has answered task ti , the residue of rij is hij = rij − ηpq ,
where p is the cluster label of task ti , q is the cluster label of source uj , ηpq is
the mean of submatrix determined by Ip and Jq . But, when source uj has not
answered task ti , we set hij to 0, which in fact ignores the loss caused by unfilled
elements.
The objective of this co-clustering reliability model is to learn the patterns
of both tasks and sources. Hence, the objective function is to minimize the total
squared residues of all co-clusters, which is shown as follows.
  
L= Lpq = h2ij , (3)
Ip ,Jq Ip ,Jq i∈Ip ,j∈Jq

where Ip , Jq enumerates all co-clusters, Lpq represents the sum squared residues
of co-cluster determined by Ip and Jq .

4 Algorithms
The proposed approach estimates the ground truth by employing two integral
components: (1) using kl-means to learn the co-clustering reliability model from
sources performance on standard tasks, which contains the patterns of clusters,
and (2) infer the ground truth of target tasks with the help of capture patterns.

4.1 kl-means
We propose kl-means, an extended version of the k-means algorithm, to learn
co-clustering reliability of tasks and sources. Differing from previous work
Another random document with
no related content on Scribd:
⅓ cup soft shortening
1 cup brown sugar
1½ cups black molasses

Stir in ...

½ cup cold water

Sift together and stir in ...

6 cups sifted GOLD MEDAL Flour


1 tsp. salt
1 tsp. allspice
1 tsp. ginger
1 tsp. cloves
1 tsp. cinnamon

Stir in ...

2 tsp. soda dissolved in 3 tbsp. cold water

Chill dough. Roll out very thick (½″). Cut with 2½″ round cutter. Place
far apart on lightly greased baking sheet. Bake until, when touched
lightly with finger, no imprint remains.
temperature: 350° (mod. oven).
time: Bake 15 to 18 min.
amount: 2⅔ doz. fat, puffy 2½″ cookies.

FROSTED GINGIES
Follow recipe above—and frost when cool with Simple White Icing
(recipe below).

SIMPLE WHITE ICING


Blend together 1 cup sifted confectioners’ sugar, ¼ tsp. salt, ½ tsp.
vanilla, and enough milk or water to make easy to spread (about 1½
tbsp.). Part of icing may be
colored by adding a drop or two
of food coloring.

GINGERBREAD BOYS
Make holidays gayer than ever.
Follow recipe above—and mix in 1 more cup sifted GOLD MEDAL
Flour. Chill dough. Roll out very thick (½″). Grease cardboard
gingerbread boy pattern, place on the dough, and cut around it with
a sharp knife. Or use a gingerbread boy cutter. With a pancake
turner, carefully transfer gingerbread boys to lightly greased baking
sheet. Press raisins into dough for eyes, nose, mouth, and shoe and
cuff buttons. Use bits of candied cherries or red gum drops for coat
buttons; strips of citron for tie. Bake. Cool slightly, then carefully
remove from baking sheet. With white icing, make outlines for collar,
cuffs, belt, and shoes.
amount: About 12 Gingerbread Boys.

★ STONE JAR MOLASSES COOKIES


Crisp and brown ... without a bit of sugar.
Heat to boiling point ...

1 cup molasses

Remove from heat


Stir ...

½ cup shortening
1 tsp. soda

Sift together and stir in ...

2¼ cups sifted GOLD MEDAL Flour


1¾ tsp. baking powder
1 tsp. salt
1½ tsp. ginger

Chill dough. Roll out very thin (¹⁄₁₆″). Cut into desired shapes. Bake
until, when touched lightly, no imprint remains. (Over-baking gives a
bitter taste.)
temperature: 350° (mod. oven).
time: Bake 5 to 7 min.
amount: About 6 doz. 2½″ cookies.

PATTERN FOR GINGER BREAD BOY


Trace on tissue paper. Then cut pattern from cardboard. Place
greased pattern on dough. Cut around it with a sharp knife. Other
cooky patterns can be made in same way.
To make “dancing” Gingerbread boys ... bend the legs and arms into
“action” positions when you place them on baking sheet (as shown in
small figures above).
Packing cookies successfully for mailing
1 Select heavy box, lined with waxed paper. Use plenty of filler
(crushed wrapping or tissue paper, or unbuttered popcorn or
Cheerios).
2 Wrap each cooky separately ... in waxed paper. Or place cookies
back-to-back in pairs ... then wrap each pair.
3 Pad bottom of box with filler.
Fit wrapped cookies into box
closely, in layers.
4 Use filler between layers to
prevent crushing of cookies.
5 Cover with paper doily, add
card, and pad top with crushed
paper. Pack tightly so contents
will not shake around.
6 Wrap box tightly with heavy
paper and cord. Address plainly
with permanent ink ... covering
address with Scotch tape or
colorless nail polish. Mark the
box plainly: “PERISHABLE.”
Festive Christmas Cookies
★ 1 Sandbakelser
★ 2 Spritz Rosettes
★ 3 Merry Christmas Cookies, Trees, Stars, etc.
★ 4 Nurnberger
★ 5 Almond Crescents
★ 6 Lebkuchen
★ 7 Scotch Shortbread
★ 8 Berliner Kranser
★ 9 Finska Kakor
★ 10 Russian Tea Cakes

Gay shapes ... for holiday cheer.

MERRY CHRISTMAS COOKIES ( Recipe)


Soft, cushiony cookies, dark or light.
DARK DOUGH.... For animal shapes, toy shapes, and boy and girl
figures.
Mix together thoroughly ...

⅓ cup soft shortening


⅓ cup brown sugar
1 egg
⅔ cup molasses

Sift together and stir in ...

2¾ cups sifted GOLD MEDAL Flour


1 tsp. soda
1 tsp. salt
2 tsp. cinnamon
1 tsp. ginger

Chill dough. Roll out thick (¼″). Cut into desired shapes. Place 1″
apart on lightly greased baking sheet. Bake until, when touched
lightly with finger, no imprint remains. When cool, ice and decorate
as desired.
temperature: 375° (quick mod. oven).
time: Bake 8 to 10 min.
amount: About 5 doz. 2½″ cookies.

LIGHT DOUGH
For bells, stockings, stars, wreaths, etc.
Follow recipe for Dark Dough above except substitute honey for
molasses, and granulated sugar for brown. Use 1 tsp. vanilla in
place of cinnamon and ginger.

TO HANG ON CHRISTMAS TREE


Just loop a piece of green string and press ends into the dough at
the top of each cooky before baking. Bake with string-side down on
pan.

TO DECORATE
Use recipe for Decorating Icing (p. 31) (thin the icing for spreading).
For decorating ideas, see picture on preceding page. Sugar in
coarse granules for decorating is available at bakery supply houses.
STARS
Cover with white icing. Sprinkle with sky blue sugar.
WREATHS
Cut with scalloped cutter ... using smaller
cutter for center. Cover with white icing.
Sprinkle with green sugar and decorate with clusters
of berries made of red icing—leaves of green icing—
to give the realistic effect of holly wreaths.
BELLS
Outline with red icing. Make clapper of red icing. (A
favorite with children.)
STOCKINGS
Sprinkle colored sugar on toes and heels before
baking. Or mark heels and toes of baked cookies
with icing of some contrasting color.
CHRISTMAS TREES
Spread with white icing ... then sprinkle with green sugar.
Decorate with silver dragées and tiny colored candies.
TOYS
(Drum, car, jack-in-the-box, etc.):
Outline shapes with white or colored icing.
ANIMALS
(Reindeer, camel, dog, kitten, etc.): Pipe icing
on animals to give effect of bridles, blankets,
etc.
BOYS AND GIRLS
Pipe figures with an icing to give desired effects:
eyes, noses, buttons, etc.

“Old country” Christmas treasures.


LEBKUCHEN ( Recipe)
The famous old-time German Christmas Honey Cakes.
Mix together and bring to a boil ...

½ cup honey
½ cup molasses

Cool thoroughly
Stir in ...

¾ cup brown sugar


1 egg
1 tbsp. lemon juice
1 tsp. grated lemon rind

Sift together and stir in ...

2¾ cups sifted GOLD MEDAL Flour


½ tsp. soda
1 tsp. cinnamon
1 tsp. cloves
1 tsp. allspice
1 tsp. nutmeg

Mix in ...

⅓ cup cut-up citron


⅓ cup chopped nuts

Chill dough overnight. Roll small amount at a time, keeping rest


chilled. Roll out ¼″ thick and cut into oblongs 1½ × 2½″. Place one
inch apart on greased baking sheet. Bake until when touched lightly
no imprint remains. While cookies bake, make Glazing Icing (recipe
below). Brush it over cookies the minute they are out of oven. Then
quickly remove from baking sheet. Cool and store to mellow.
temperature: 400° (mod. hot oven).
time: Bake 10 to 12 min.
amount: About 6 doz. 2″ × 3″ cookies.

GLAZING ICING
Boil together 1 cup sugar and ½ cup water until first indication of a
thread appears (230°). Remove from heat. Stir in ¼ cup
confectioners’ sugar and brush hot icing thinly over cookies. (When
icing gets sugary, reheat slightly, adding a little water until clear
again.)

★ NURNBERGER
Round, light-colored honey cakes from the famed old City of Toys.
Follow recipe above—except in place of honey and molasses use
1 cup honey; and reduce spices (using ¼ tsp. cloves, ½ tsp. allspice,
and ½ tsp. nutmeg ... with 1 tsp. cinnamon).
Roll out the chilled dough ¼″ thick. Cut into 2″ rounds. Place on
greased baking sheet. With fingers, round up cookies a bit toward
center. Press in blanched almond halves around the edge like petals
of a daisy. Use a round piece of citron for each center. Bake just until
set. Immediately brush with Glazing Icing (above). Remove from
baking sheet. Cool, and store to mellow.
amount: About 6 doz. 2½″ cookies.

TO “MELLOW” COOKIES
... store in an air-tight container for a few days. Add a cut orange or apple; but fruit
molds, so change it frequently.
ZUCKER HÜTCHEN (Little Sugar Hats)
From the collection of Christmas recipes by the Kohler Woman’s Club of Kohler,
Wisconsin.
Mix together thoroughly ...
6 tbsp. soft butter
½ cup sugar
1 egg yolk

Stir in ...

2 tbsp. milk

Sift together and stir in ...

1⅜ cups sifted GOLD MEDAL Flour


½ tsp. baking powder
¼ tsp. salt

Mix in ...

¼ cup finely cut-up citron

Chill dough. Roll thin (⅛″). Cut into 2″ rounds. Heap 1 tsp. Meringue
Frosting (recipe below) in center of each round to make it look like
the crown of a hat. Place 1″ apart on greased baking sheet. Bake
until delicately browned.
temperature: 350° (mod. oven).
time: Bake 10 to 12 min.
amount: About 4 doz. 2″ cookies.

MERINGUE FROSTING
Beat 1 egg white until frothy. Beat in gradually 1½ cups sifted
confectioners’ sugar and beat until frosting holds its shape. Stir in ½
cup finely chopped blanched almonds.

Decorative favorites from lands afar.

SCOTCH SHORTBREAD
Old-time delicacy from Scotland ... crisp, thick, buttery.
Mix together thoroughly
...

1 cup soft butter


⅝ cup sugar (½ cup
plus 2 tbsp.)

Stir in ...

2½ cups sifted GOLD MEDAL Flour

Mix thoroughly with hands. Chill dough. Roll out ⅓ to ½″ thick. Cut
into fancy shapes (small leaves, ovals, squares, etc.). Flute edges if
desired by pinching between fingers as for pie crust. Place on
ungreased baking sheet. Bake. (The tops do not brown during
baking ... nor does shape of the cookies change.)
temperature: 300° (slow oven).
time: Bake 20 to 25 min.
amount: About 2 doz. 1″ × 1½″ cookies.

★ FINSKA KAKOR (Finnish Cakes)


Nut-studded butter strips from
Finland.
Mix together thoroughly ...

¾ cup soft butter


¼ cup sugar
1 tsp. almond flavoring

Stir in ...

2 cups sifted GOLD


MEDAL Flour
Mix thoroughly with hands. Chill dough. Roll out ¼″ thick. Cut into
strips 2½″ long and ¾″ wide. Brush tops lightly with 1 egg white,
slightly beaten. Sprinkle with mixture of 1 tbsp. sugar and ⅓ cup
finely chopped blanched almonds. Carefully transfer (several strips
at a time) to ungreased baking sheet. Bake just until cookies begin to
turn a very delicate golden brown.
temperature: 350° (mod. oven).
time: Bake 17 to 20 min.
amount: About 4 doz. 2½″ × ¾″ cookies.
SANDBAKELSER (Sand Tarts)
Fragile almond-flavored shells of
Swedish origin, made in copper molds
of varied designs.
Put through fine knife of food
grinder twice ...

*⅓ cup blanched almonds


*4 unblanched almonds

Mix in thoroughly ...

⅞ cup soft butter (1 cup


minus 2 tbsp.)
The ring of sleigh bells fills the air as
¾ cup sugar everyone races to church on Christmas Day
1 small egg white, in Finland.
unbeaten

Stir in ...

1¾ cups sifted GOLD MEDAL Flour

*In place of the almonds, you may use 1 tsp. vanilla flavoring and 1
tsp. almond flavoring.
Chill dough. Press dough into Sandbakels molds (or tiny fluted tart
forms) to coat inside. Place on ungreased baking sheet. Bake until
very delicately browned. Tap molds on table to loosen cookies and
turn them out of the molds.
temperature: 350° (mod. oven).
time: Bake 12 to 15 min.
amount: About 3 doz. cookies.
MOLDED COOKIES Mold ’em fast with
a fork or glass!

HOW TO MAKE MOLDED COOKIES (preliminary steps on pp. 14-


15)

1 With hands, roll dough 2 Flatten balls of dough 3 Cut pencil-thick strips
into balls or into long, with bottom of a glass ... and shape as directed
pencil-thick rolls, as dipped in flour (or with a ... as for Almond
indicated in recipe. damp cloth around it), or Crescents (p. 41) or
with a fork—crisscross. Berliner Kranser (p. 42).

DATE-OATMEAL COOKIES
Mix together thoroughly ...

¾ cup soft shortening (half butter)


1 cup brown sugar
2 eggs
3 tbsp. milk
1 tsp. vanilla

Sift together and stir in ...

2 cups sifted GOLD MEDAL Flour


¾ tsp. soda
1 tsp. salt

Stir in ...
2 cups rolled oats
1½ cups cut-up dates
¾ cup chopped nuts

Chill dough. Roll into balls size of large walnuts. Place 3″ apart on
lightly greased baking sheet. Flatten (to ¼″) with bottom of glass
dipped in flour. Bake until lightly browned.
temperature: 375° (quick mod. oven).
time: Bake 10 to 12 min.
amount: About 4 doz. 2½″ cookies.

★ PEANUT BUTTER COOKIES ( Recipe)


Perfect for the Children’s Hour.
Mix together thoroughly ...

½ cup soft shortening (half butter)


½ cup peanut butter
½ cup sugar
½ cup brown sugar
1 egg

Sift together and stir in ...

1¼ cups sifted GOLD MEDAL Flour


½ tsp. baking powder
¾ tsp. soda
¼ tsp. salt

Chill dough. Roll into balls size of large walnuts. Place 3″ apart on
lightly greased baking sheet. Flatten with fork dipped in flour ...
crisscross. Bake until set ... but not hard.
temperature: 375° (quick mod. oven).
time: Bake 10 to 12 min.
amount: About 3 doz. 2½″ cookies.

HONEY PEANUT BUTTER COOKIES


Follow recipe above—except use only ¼
cup shortening, and in place of brown sugar
use ½ cup honey.

Sprightly tea cakes for friends and family.

THUMBPRINT COOKIES Nut-rich ... the thumb dents filled with


sparkling jelly.
I’m as delighted with this quaint addition to our cooky collection, from Ken
MacKenzie, as is the collector of old glass when a friend presents her with some
early thumbprint goblets.
Mix together thoroughly ...

½ cup soft shortening (half butter)


¼ cup brown sugar
1 egg yolk
½ tsp. vanilla

Sift together and stir in ...

1 cup sifted GOLD MEDAL Flour


¼ tsp. salt

Roll into 1″ balls. Dip in slightly beaten egg whites. Roll in finely
chopped nuts (¾ cup). Place about 1″ apart on ungreased baking
sheet. Bake 5 min. Remove from oven. Quickly press thumb gently
on top of each cooky. Return to oven and bake 8 min. longer. Cool.
Place in thumbprints a bit of chopped candied fruit, sparkling jelly, or
tinted confectioners’ sugar icing.
temperature: 375° (quick mod. oven).
time: Bake 5 min., then 8 min.
amount: About 2 doz. 1½″ cookies.

★ ENGLISH TEA CAKES Tender, flavorful tidbits with a sugary


glaze.
Mix together thoroughly ...

½ cup soft shortening (half butter)


¾ cup sugar
1 egg
3 tbsp. milk

Sift together and stir in ...

1¾ cups sifted GOLD MEDAL Flour


1½ tsp. baking powder
¼ tsp. salt

Mix in ...

½ cup finely cut sliced citron


½ cup currants or raisins, cut-up

Chill dough. Roll into balls the size of walnuts. Dip tops in slightly
beaten egg white, then sugar. Place sugared-side-up 2″ apart on
ungreased baking sheet. Bake until delicately browned. The balls
flatten some in baking and become glazed.
temperature: 400° (mod. hot oven).
time: Bake 12 to 15 min.
amount: About 3 doz. 1½″ cookies.

ALMOND CRESCENTS
Richly delicate, buttery. Party favorites.
Mix together thoroughly ...

1 cup soft shortening (half butter)


⅓ cup sugar
⅔ cup ground blanched almonds

Sift together and work in ...

1⅔ cups sifted GOLD MEDAL Flour


¼ tsp. salt

Chill dough. Roll with hands pencil-thick. Cut in 2½″ lengths. Form
into crescents on ungreased baking sheet. Bake until set ... not
brown. Cool on pan. While slightly warm, carefully dip in 1 cup
confectioners’ sugar and 1 tsp. cinnamon mixed.
temperature: 325° (slow mod. oven).
time: Bake 14 to 16 min.
amount: About 5 doz. 2½″ cookies.

LEMON SNOWDROPS
Refreshing, lemony ... with snowy icing.
Follow recipe for English Tea Cakes above—except use 2 tbsp.
lemon juice and 1 tbsp. water in place of the milk. Add 2 tsp. grated
lemon rind. Omit citron and currants. Mix in ½ cup chopped nuts.
Chill dough. Roll into balls and bake. Then roll in confectioners’
sugar.

BUTTER FINGERS
Nut-flavored, rich buttery party cookies.
Follow recipe for Almond Crescents—except in place of almonds use
black walnuts or other nuts, chopped. Cut into finger lengths and
bake. While still warm, roll in confectioners’ sugar. Cool, and roll in
the sugar again.
Festive cookies for the holidays ... ideal for Christmas boxes.

You might also like