Guojun Gan
Bohan Li
Xue Li
Shuliang Wang (Eds.)
LNAI 11323

Advanced Data Mining


and Applications
14th International Conference, ADMA 2018
Nanjing, China, November 16–18, 2018
Proceedings

Lecture Notes in Artificial Intelligence 11323

Subseries of Lecture Notes in Computer Science

LNAI Series Editors


Randy Goebel
University of Alberta, Edmonton, Canada
Yuzuru Tanaka
Hokkaido University, Sapporo, Japan
Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor


Joerg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Guojun Gan, Bohan Li, Xue Li, Shuliang Wang (Eds.)


Advanced Data Mining


and Applications
14th International Conference, ADMA 2018
Nanjing, China, November 16–18, 2018
Proceedings

Editors
Guojun Gan, University of Connecticut, Storrs, CT, USA
Bohan Li, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Xue Li, The University of Queensland, Brisbane, QLD, Australia
Shuliang Wang, Beijing Institute of Technology, Beijing, China

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Artificial Intelligence
ISBN 978-3-030-05089-4 ISBN 978-3-030-05090-0 (eBook)
https://doi.org/10.1007/978-3-030-05090-0

Library of Congress Control Number: 2018962542

LNCS Sublibrary: SL7 – Artificial Intelligence

© Springer Nature Switzerland AG 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 14th International Conference on Advanced Data Mining and Applications


(ADMA) was held in Nanjing, one of the most ancient cities in China. Over the years,
ADMA has grown to become a flagship conference in the field of data mining and
applications. One of the major goals of ADMA is to bring together data mining
researchers from around the world to share their original data mining findings and
practical data mining experiences. In the era of big data and artificial intelligence (AI),
data mining is becoming an increasingly important approach for developing data-driven applications.
For ADMA 2018, we received 104 papers from 30 different countries across America,
Europe, the Middle East, and the Asia-Pacific region. Each paper was assigned to at
least three Program Committee members for review. All papers were rigorously
reviewed and received at least three reviews. In the end, 23 papers were accepted as
spotlight research papers with long presentations, and 22 were accepted as regular
research papers with short presentations. The conference program of ADMA 2018 was
also complemented by several outstanding keynotes and tutorials given by
world-renowned experts Xuemin Lin, Ekram Hossain, Guoren Wang, Yang Yu,
Jie Tang as well as an invited industry keynote talk session, delivered by several invited
industry speakers. We would like to particularly thank those speakers for contributing
their insights and visions of the future of data mining technology in this dynamic
research field, where many puzzling terms are emerging, such as blockchain, strong AI,
and common-sense mining on networks of multi-modality data.
We greatly appreciate the Program Committee members’ tremendous efforts to
complete the review reports before the deadline. We would like to thank the external
reviewers for their time and comprehensive reviews and recommendations. Their
professional work was crucial to the final paper selection and production of the
high-quality technical program for ADMA 2018.
This high-quality program would not have been possible without the expertise and
dedication of our Program Committee members. We would like to express our gratitude
to all individuals, institutions, and sponsors that supported ADMA 2018. We are
grateful to all the chairs who were actively involved in organizing this conference,
including attracting submissions, compiling all accepted papers, working with the
Springer team to produce the proceedings, and managing the website. Our special
thanks go to the publicity chair, Guojun Gan, for editing; the local organization chair,
Bohan Li, for the local arrangements that ensured the conference ran smoothly; and the
registration chair, Donghai Guan, for handling the registration process. We would like
to express our sincere thanks to Weitong (Tony) Chen for helping with the paper
submission process and the smooth running of the conference program based on his rich
experience. We would also like to thank Michael Sheng, Aixin Sun, Gao Cong, and
Wei Luo for their contributions to the conference. Furthermore, we would like to
acknowledge the support of the members of the conference Steering Committee.

Finally, we would like to thank all researchers, practitioners, and volunteer students
who contributed with their work and participated in the conference.
With the new challenges in data mining research, we hope the participants in the
conference and the readers of the proceedings will enjoy the research outcome of
ADMA 2018.

October 2018 Xue Li


Joao Gama
Bing Chen
Songcan Chen
Shuliang Wang
Xingquan (Hill) Zhu
Organization

Main Organizing Committee


Honorary Chair
Zhiqiu Huang Nanjing University of Aeronautics and Astronautics, China

General Chairs
Xue Li University of Queensland, Australia
Joao Gama University of Porto, Portugal

Acting Chair
Bing Chen Nanjing University of Aeronautics and Astronautics, China

Program Chairs
Songcan Chen Nanjing University of Aeronautics and Astronautics, China
Shuliang Wang Beijing Institute of Technology, China
Xingquan (Hill) Zhu Florida Atlantic University, USA

Demo Chairs
Zhifeng Bao RMIT, Australia
Jianqiu Xu Nanjing University of Aeronautics and Astronautics, China

Proceedings Chair
Yunlong Zhao Nanjing University of Aeronautics and Astronautics, China

Awards Committee Chair


Aixin Sun Nanyang Technological University, Singapore

Publicity Chair
Guojun Gan University of Connecticut, USA

Data Mining Competition Chair/Pacific Chair


Wei Luo Deakin University, Australia

Special Issue Chair


Daoqiang Zhang Nanjing University of Aeronautics and Astronautics, China

Sponsorship Chairs
Donghai Guan Nanjing University of Aeronautics and Astronautics, China
Xiangping Zhai Nanjing University of Aeronautics and Astronautics, China

Local Chair
Bohan Li Nanjing University of Aeronautics and Astronautics, China

Web Chair
Xin Li Nanjing University of Aeronautics and Astronautics, China

Program Committee
Bin Guo Northwestern Polytechnical University, China
Bin Yao Shanghai Jiao Tong University, China
Bin Zhao Nanjing Normal University, China
Bin Zhou National University of Defense Technology, China
Chandra Prasetyo Utomo University of Queensland, Australia
Changdong Wang Sun Yat-Sen University, China
Chuan Shi Beijing University of Posts and Telecommunications, China
Dechang Pi Nanjing University of Aeronautics and Astronautics, China
Guandong Xu University of Technology Sydney, Australia
Hongxu Chen University of Queensland, Australia
Hongzhi Wang Harbin Institute of Technology, China
Hongzhi Yin University of Queensland, Australia
Jianxin Li University of Western Australia
Jingfeng Guo Yanshan University, China
Lina Yao University of New South Wales, Australia
Luyao Liu University of Queensland, Australia
Meng Wang Xi’an Jiaotong University, China
Michael Sheng Macquarie University, Australia
Min Yao Zhejiang University, China
Pablo Moscato University of Newcastle, Australia
Nguyen Hung Griffith University, Australia
Peiquan Jin University of Science and Technology of China
Qilong Han Harbin Engineering University, China
Rui Mao Shenzhen University, China
Sen Wang Griffith University, Australia
Shuai Ma BeiHang University, China
Tong Chen University of Queensland, Australia
Sayan Unankard Maejo University, Thailand
Wei Zhang Macquarie University, Australia
Weitong Chen University of Queensland, Australia
Wenjie Ruan University of Oxford, UK
Xiaolin Qin Nanjing University of Aeronautics and Astronautics, China

Xiaoyang Tan Nanjing University of Aeronautics and Astronautics, China


Xin Wang Tianjin University, China
Xin Zhao University of Queensland, Australia
Xiu Fang Macquarie University, Australia
Yan Jia National University of Defense Technology, China
Yanhui Gu Nanjing Normal University, China
Yongxin Tong BeiHang University, China
Yue Lin Northeast Normal University, China
Yunjun Gao Zhejiang University, China
Yuwei Peng Wuhan University, China
Zongmin Ma Nanjing University of Aeronautics and Astronautics, China

Steering Committee
Jie Cao Nanjing University of Finance and Economics, China
Xue Li (Chair) University of Queensland, Australia
Shuliang Wang Beijing Institute of Technology, China
Michael Sheng University of Adelaide, Australia
Jie Tang Tsinghua University, China
Kyu-Young Whang Korea Advanced Institute of Science and Technology, South Korea
Min Yao Zhejiang University, China
Osmar Zaiane University of Alberta, Canada
Chengqi Zhang University of Technology Sydney, Australia
Shichao Zhang Guangxi Normal University, China
Contents

Data Mining Foundations

Efficiently Mining Constrained Subsequence Patterns . . . . . . . . . . . . . . . . . 3


Abdullah Albarrak, Sanad Al-Maskari, Ibrahim A. Ibrahim,
and Abdulqader M. Almars

Slice_OP: Selecting Initial Cluster Centers Using Observation Points. . . . . . . 17


Md Abdul Masud, Joshua Zhexue Huang, Ming Zhong, Xianghua Fu,
and Mohammad Sultan Mahmud

HierArchical-Grid CluStering Based on DaTA Field in Time-Series


and the Influence of the First-Order Partial Derivative Potential Value
for the ARIMA-Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Krid Jinklub and Jing Geng

Anomaly Detection with Changing Cluster Centers . . . . . . . . . . . . . . . . . . . 42


Zhang Peng and Zhou Liang

A Novel Feature Selection-Based Sequential Ensemble Learning Method


for Class Noise Detection in High-Dimensional Data . . . . . . . . . . . . . . . . . . 55
Kai Chen, Donghai Guan, Weiwei Yuan, Bohan Li,
Asad Masood Khattak, and Omar Alfandi

Possibilistic Information Retrieval Model Based on a Multi-terminology . . . . 66


Wiem Chebil, Lina F. Soualmia, and Mohamed Nazih Omri

On Improving the Prediction Accuracy of a Decision Tree


Using Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Md. Nasim Adnan, Md. Zahidul Islam, and Md. Mostofa Akbar

A Genetic Algorithm Based Technique for Outlier Detection


with Fast Convergence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Xiaodong Zhu, Ji Zhang, Zewen Hu, Hongzhou Li, Liang Chang,
Youwen Zhu, Jerry Chun-Wei Lin, and Yongrui Qin

Multivariate Synchronization Index Based on Independent Component


Analysis for SSVEP-Based BCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Yanlong Zhu, Chenglong Dai, and Dechang Pi

Big Data

Forecasting Traffic Flow in Big Cities Using Modified


Tucker Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Manish Bhanu, Shalini Priya, Sourav Kumar Dandapat,
Joydeep Chandra, and João Mendes-Moreira

A Sparse and Low-Rank Matrix Recovery Model for Saliency Detection . . . . 129
Chao Wang, Jing Li, KeXin Li, and Yi Zhuang

Instruction SDC Vulnerability Prediction Using Long Short-Term


Memory Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Yunfei Liu, Jing Li, and Yi Zhuang

Forecasting Hospital Daily Occupancy Using Patient Journey


Data - A Heuristic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Shaowen Qin and Dale Ward

An Airport Scene Delay Prediction Method Based on LSTM . . . . . . . . . . . . 160


Zhongbin Li, Haiyan Chen, Jiaming Ge, and Kunpeng Ning

DSDCS: Detection of Safe Driving via Crowd Sensing . . . . . . . . . . . . . . . . 170


Yun Du, Xin Guo, Chenyang Shi, Yifan Zhu, and Bohan Li

Power Equipment Fault Diagnosis Model Based on Deep Transfer


Learning with Balanced Distribution Adaptation . . . . . . . . . . . . . . . . . . . . . 178
Kaijie Wang and Bin Wu

Event Extraction with Deep Contextualized Word Representation


and Multi-attention Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Ruixue Ding and Zhoujun Li

A Novel Unsupervised Time Series Discord Detection Algorithm


in Aircraft Engine Gearbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Zhongyu Wang, Dechang Pi, and Ya Gao

A More Secure Spatial Decompositions Algorithm via Indefeasible Laplace


Noise in Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Xiaocui Li, Yangtao Wang, Xinyu Zhang, Ke Zhou, and Chunhua Li

Towards Geological Knowledge Discovery Using Vector-Based


Semantic Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Majigsuren Enkhsaikhan, Wei Liu, Eun-Jung Holden, and Paul Duuring

Keep Calm and Know Where to Focus: Measuring and Predicting


the Impact of Android Malware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Junyang Qiu, Wei Luo, Surya Nepal, Jun Zhang, Yang Xiang,
and Lei Pan

Fault Diagnosis for an Automatic Shell Magazine Using FDA and ELM . . . . 255
Qiangqiang Zhao, Lingfeng Tao, Maosheng Li, and Peng Hong

Anomalous Trajectory Detection Using Recurrent Neural Network . . . . . . . . 263


Li Song, Ruijia Wang, Ding Xiao, Xiaotian Han, Yanan Cai,
and Chuan Shi

Text and Multimedia Mining

Detecting Spammers with Changing Strategies via a Transfer Distance


Learning Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Hao Chen, Jun Liu, and Yanzhang Lv

Adversarial Learning for Topic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 292


Tomonari Masada and Atsuhiro Takasu

Learning Concise Relax NG Schemas Supporting Interleaving


from XML Documents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Yeting Li, Xiaoying Mou, and Haiming Chen

Deep Group Residual Convolutional CTC Networks


for Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
Kai Wang, Donghai Guan, and Bohan Li

Short Text Understanding Based on Conceptual and Semantic Enrichment . . . 329


Qiuyan Shi, Yongli Wang, Jianhong Sun, and Anmin Fu

LDA-PSTR: A Topic Modeling Method for Short Text . . . . . . . . . . . . . . . . 339


Kai Zhou and Qun Yang

Vertical and Sequential Sentiment Analysis of Micro-blog Topic . . . . . . . . . 353


Shuo Wan, Bohan Li, Anman Zhang, Kai Wang, and Xue Li

Abstractive Document Summarization via Bidirectional Decoder . . . . . . . . . . 364


Xin Wan, Chen Li, Ruijia Wang, Ding Xiao, and Chuan Shi

Miscellaneous Topics

CBPF: Leveraging Context and Content Information


for Better Recommendations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Zahra Vahidi Ferdousi, Dario Colazzo, and Elsa Negre

Discovering High Utility Change Points in Customer Transaction Data . . . . . 392


Philippe Fournier-Viger, Yimin Zhang, Jerry Chun-Wei Lin,
and Yun Sing Koh

Estimating Interactions of Functional Brain Connectivity


by Hidden Markov Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Xingjuan Li, Yu Li, and Jiangtao Cui

From Complex Network to Skeleton: mj-Modified Topology Potential


for Node Importance Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Hanning Yuan, Kanokwan Malang, Yuanyuan Lv,
and Aniwat Phaphuangwittayakul

Nodes Deployment Optimization Algorithm Based on Energy Consumption


of Underwater Wireless Sensor Networks. . . . . . . . . . . . . . . . . . . . . . . . . . 428
Min Cui, Fengtong Mei, Qiangyi Li, and Qiangnan Li

A New Graph-Partitioning Algorithm for Large-Scale Knowledge Graph . . . . 434


Jiang Zhong, Chen Wang, Qi Li, and Qing Li

SQL Injection Behavior Mining Based Deep Learning . . . . . . . . . . . . . . . . . 445


Peng Tang, Weidong Qiu, Zheng Huang, Huijuan Lian,
and Guozhen Liu

Evaluation Methods of Hierarchical Models . . . . . . . . . . . . . . . . . . . . . . . . 455


Abdulqader M. Almars, Ibrahim A. Ibrahim, Xin Zhao,
and Sanad Al-Maskari

Local Community Detection Using Greedy Algorithm with Probability . . . . . 465


Xiaoxiang Zhu and Zhengyou Xia

A Player Behavior Model for Predicting Win-Loss Outcome


in MOBA Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
Xuan Lan, Lei Duan, Wen Chen, Ruiqi Qin, Timo Nummenmaa,
and Jyrki Nummenmaa

An Improved Optimization of Link-Based Label Propagation Algorithm . . . . 489


Xiaoxiang Zhu and Zhengyou Xia

Research on Commodity Recommendation Algorithm Based on RFN . . . . . . 499


Kai Wang, Bohan Li, Shuo Wan, Anman Zhang, and Donghai Guan

Research of Personalized Recommendation System Based


on Multi-view Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
Yunfei Zi, Yeli Li, and Huayan Sun

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531


Data Mining Foundations
Efficiently Mining Constrained
Subsequence Patterns

Abdullah Albarrak1, Sanad Al-Maskari2, Ibrahim A. Ibrahim3,4,
and Abdulqader M. Almars3

1 Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
amsbarrak@imamu.edu.sa
2 Sohar University, Sohar, Oman
smaskari@soharuni.edu.om
3 University of Queensland, Brisbane, Australia
{i.ibrahim,a.almars}@uq.edu.au
4 Minia University, Minya, Egypt
i.ibrahim@minia.edu.org

Abstract. Big time series data are generated daily by various application
domains such as environment monitoring, the Internet of Things, health
care, industry, and science. Mining this massive data is very challenging
because conventional data mining algorithms are unable to scale effectively
to massive time series data. Moreover, applying a global classification
approach to highly similar and noisy data hinders classification performance.
Utilizing constrained subsequence patterns in data mining applications
therefore increases efficiency and accuracy, and can provide useful insight
into the data.
To address the above-mentioned limitations, we propose an efficient
subsequence processing technique with preference constraints. We then
introduce a sub-pattern analysis for time series data, whose objective is to
maximize interclass separability using a localization approach. Furthermore,
we use the deviation from a correlation constraint as an objective to
minimize, and we include users' preferences as an objective to maximize in
proportion to users' preferred time intervals. We experimentally validate the
efficiency and effectiveness of our proposed algorithm using real data,
demonstrating its superiority and efficiency when compared with recently
proposed correlation-based subsequence search algorithms.

1 Introduction

Time series data nowadays are continuously increasing in size and complexity.
They are being generated, gathered, and stored at an unprecedented rate, whether
for financial analysis (e.g., exchange rates, stock markets), environment
monitoring [1–4], health care [13,16], or social networks [17]. This growth easily
overwhelms data mining users when applying data mining applications to these
ever-growing time series data.

One fundamental preprocessing step in mining time series data is extracting
representative features from the raw time series [5–7]. In this paper, we focus on
extracting correlated subsequence patterns [8,10,11,15,18] as features, where
correlation is the Pearson correlation coefficient (ρ). Our choice of Pearson
correlation is based on the fact that it is the most suitable measure for meaningful
comparisons of time series, as stated in the literature [6,9,14].
In reality, it is not realistic to assume perfect separation between different
classes. In fact, in time series data, high overlap can exist due to noise, highly
similar objects, unknown factors, and limited understanding. Therefore, instead of
using the full time series sequences, we propose to use the most interesting and
discriminant subsequences, i.e., the correlated subsequence patterns. We refer to
such subsequences as sub-patterns (SPs). By identifying the most discriminant
SPs, highly overlapping classes can be discriminated without the need to store,
process, and extract features for all the data. Additionally, in the case of highly
noisy and similar data sets, using global features can degrade classification
performance. Finally, using SPs in data mining applications is much faster than
current classification approaches, making them a better candidate for mining
big data.
Interestingly, identifying SPs is not trivial; it is actually CPU- and I/O-intensive.
Hence, several works have focused on optimizing it using complex indexing
methods and pruning techniques. For instance, SKIP [10] was proposed to find
the longest subsequence of two time series having a correlation above a threshold
value θ, without any prior knowledge of the subsequence length m. Jocor [11]
focuses on finding the subsequence with the highest correlation value such that it
has a length above a threshold value ml. While the above works are successful in
extracting subsequences satisfying correlation thresholds, they fall short in
addressing correlation thresholds given as ranges (i.e., correlation between θ1 and θ2)
and users' preferences.

Fig. 1. A snippet of air quality sensor data. Large amounts of redundant, meaningless,
and irrelevant data exist that do not contribute positively to the classification process.
The two most significant sections are labeled Pattern 1 and Pattern 2.

Example 1. Figure 1 shows a snippet of a large air quality time series dataset from
multiple sensors. Because most of these time series are redundant, they do not
contribute positively to the classification process. However, patterns 1 and 2 are
the most significant sections, since they provide highly distinguishable features.

Subsequence patterns 1 and 2 in Example 1 are identified by two constraints:


1. The subsequences must exhibit a correlation within a target range, and
2. The subsequences must be as close as possible to a specific time interval.
Employing users' preferences to extract correlated subsequences opens
optimization opportunities with guarantees on the result's accuracy. Plainly, our
proposed algorithm strives to reduce CPU cost by harnessing users' preferences to
limit the search space, and by incrementally estimating subsequence correlations
from precomputed data in an efficient manner.
We summarize our contributions as follows:
– Incorporating users' preferences and correlation ranges as objectives to extract
constrained subsequence patterns from raw time series data.
– Proposing an efficient CPU-centric algorithm to find optimal constrained
subsequence patterns by utilizing: (a) users' preferences to limit, navigate, and
prune the search space, and (b) multiple copies of cumulative arrays which
summarize time series values.
– Conducting experiments on real data to evaluate our algorithm and show the
efficiency it provides when compared to a recent algorithm for subsequence
time series search.

2 Preliminaries and Definitions


We assume there are k data series x1, x2, ..., xk such that all series are of equal
length. A data series xi is a series of n consecutive values xi = {v1, v2, ..., vn}
that have an implicit ordering. For instance, xi is a time series if its ordering is
based on a timestamp domain (e.g., date and/or time). While our work is based on
time series, it can be generalized to other data series too.
A subsequence of xi is constructed from a time interval [s, e]:
xi[s, e] = {vs, vs+1, ..., ve}, where s < e and e ≤ n, as shown in Fig. 2.
Our focus in this paper is on mining correlated subsequence patterns. The
correlation is computed between a pair of synchronized subsequences xi[s, e] and
xj[s, e] constructed from time series xi and xj, respectively. Henceforth, we refer
to the pair xi[s, e] and xj[s, e] as a candidate subsequence Sc(xi, xj, s, e), or
briefly Sc when there is no need to specify which time series and which time
interval are meant.
As discussed in Example 1, the most interesting subsequence patterns must lie
within a certain correlation range. In other words, subsequences with the minimum
deviation from a target correlation range are preferable. We explain next how to
support this objective in detail (Table 1).

Table 1. Symbols and definitions

Symbol      Definition
xi          A time series
n           Length of a time series
m           Length of a subsequence
xi[s, e]    A subsequence of xi
λ           Weight of preference
tc          Target correlation range [cl, cu]
SI          Input (initial) subsequence
Sc          Candidate subsequence
Δρ_Sc       Correlation deviation of Sc
ΔE_Sc       Preference deviation of Sc
Δ_Sc        Total deviation of Sc
2.1 Correlation Deviation


The correlation deviation Δρ of a candidate subsequence Sc from a target
correlation range tc = [cl, cu] measures how far the correlation value of Sc is
from the target range. Formally:

$$
\Delta^{\rho}_{S_c} =
\begin{cases}
0 & \text{if } \rho(S_c) \in [c_l, c_u] \\
\rho(S_c) - c_l & \text{if } \rho(S_c) < c_l \\
\rho(S_c) - c_u & \text{if } \rho(S_c) > c_u
\end{cases}
\tag{1}
$$

where ρ(Sc) is a function that returns the Pearson correlation coefficient of the
subsequences in Sc. The closer Δρ_Sc is to zero, the more preferable Sc is.
The Pearson correlation coefficient of two subsequences x and y (in Sc) of length
m is computed as follows [10]:

$$
\rho(S_c) = \rho(x, y) =
\frac{m\sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \sum_{i=1}^{m} y_i}
{\sqrt{m\sum_{i=1}^{m} x_i^2 - \bigl(\sum_{i=1}^{m} x_i\bigr)^2}\,
 \sqrt{m\sum_{i=1}^{m} y_i^2 - \bigl(\sum_{i=1}^{m} y_i\bigr)^2}}
\tag{2}
$$

But how do we obtain a candidate subsequence in the first place? We explain how next.

2.2 Enumerating Subsequences


Our algorithm enumerates candidate subsequences by recursively expanding and
contracting the initial subsequence SI (x, y, s, e). Specifically, there are two oper-
ations applied on the time interval [s, e], as shown in Fig. 2.
Fig. 2. Enumerating subsequences from SI(x, y, s, e) by expanding either side, s or e
(i.e., LE, RE).

1. Expansion: expand [s, e] on either side, s or e, by δ.
For instance, [ŝ, e] is expanded on the s side by δ such that ŝ = s − δ, while
[s, ê] is expanded on the e side by δ such that ê = e + δ.
We encode these two operations as LE (left expansion) and RE (right expansion).
2. Contraction: contract [s, e] on either side, s or e, by δ.
For instance, [ŝ, e] is contracted on the s side by δ such that ŝ = s + δ, while
[s, ê] is contracted on the e side by δ such that ê = e − δ.
Similarly, we encode these two operations as LC (left contraction) and RC
(right contraction).
With these two operations, our algorithm can recursively generate all n(n+1)/2
possible combinations of candidate subsequences. To remove any approximation
and to ensure that no candidate subsequence is missed, we set δ = 1; a minimal
sketch of one enumeration step is given below.
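To make the four operations concrete, here is a minimal Java sketch of one
enumeration step with δ = 1. The Window record and the enumerate method are
our own illustrative names; the bounds checks assume 0-based indices over series
of length n.

// One enumeration step: apply LE, RE, LC, RC with δ = 1 to the interval [s, e].
record Window(int s, int e) {}

static java.util.List<Window> enumerate(Window w, int n) {
    java.util.List<Window> out = new java.util.ArrayList<>();
    if (w.s() - 1 >= 0)     out.add(new Window(w.s() - 1, w.e())); // LE: left expansion
    if (w.e() + 1 <= n - 1) out.add(new Window(w.s(), w.e() + 1)); // RE: right expansion
    if (w.s() + 1 < w.e())  out.add(new Window(w.s() + 1, w.e())); // LC: left contraction
    if (w.s() < w.e() - 1)  out.add(new Window(w.s(), w.e() - 1)); // RC: right contraction
    return out;
}

The contraction guards keep s strictly below e, so degenerate one-point windows
are never produced.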
Now, the question is: how do we judge whether one subsequence Sc is more
beneficial than another? We answer this question by quantifying the benefit of a
subsequence as a preference deviation, explained next.

2.3 Preference Deviation


Since users prefer subsequences similar to their first impression (i.e., SI), we
include users' preferences as an objective. The preference deviation is a
normalized value that indicates how far Sc is from the input subsequence SI. A
subsequence that is far from a user's preference exhibits a high deviation, and
vice versa:

$$
\Delta^{E}_{S_c} = \frac{|S_I.s - S_c.s|}{n} + \frac{|S_I.e - S_c.e|}{n}
\tag{3}
$$

where n is the length of the two time series. Note that the lower the value of
ΔE_Sc, the more beneficial Sc is.

Algorithm 1. Baseline (simplified SKIP)

1: Input: SI(x, y, s, e), preference weight λ, target correlation tc, minimum length ml, α
2: Return: Sp, best
3: Calculate S^α_x, S^α_{x²}, S^α_y, S^α_{y²}, and S^α_{xy} for the whole time series x, y
4: best = 1; Sp ← SI
5: for l = n to ml do
6:   for t = 0 to n − l do
7:     Si ← (x, y, t, t + l − 1)
8:     for z ∈ {x, y, x², y², xy} do
9:       if t = 0 then
10:        Compute Σz from S^α_z
11:      else
12:        Update Σz incrementally
13:    Compute ρ(Si)
14:    Δ_Si = λΔE_Si + (1 − λ)Δρ_Si
15:    if Δ_Si < best then
16:      best = Δ_Si; Sp ← Si
17: return Sp, best

2.4 Problem Definition


Now we are in a position to formally define the problem of extracting constrained
subsequences from raw time series data.
Definition 1. Given SI(x, y, s, e) and a target correlation range tc, find the
optimal subsequence Sp(x, y, ŝ, ê) such that Sp minimizes the correlation
deviation Δρ_Sp while considering users' preferences by minimizing the
preference deviation ΔE_Sp.
As stated in Definition 1, our objective is to find the subsequence Sp that
minimizes the overall deviation, defined as:

$$
\Delta_{S_p} = \lambda\,\Delta^{E}_{S_p} + (1 - \lambda)\,\Delta^{\rho}_{S_p}
\tag{4}
$$

The parameter λ controls the trade-off between satisfying the correlation
deviation and the preference deviation. Setting λ = 0 means users' preferences are
neglected, which is a special case of our general objective. A small sketch
combining Eqs. (1), (3), and (4) follows.
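The sketch below composes Eqs. (1), (3), and (4) into plain Java helpers; the
method names are ours. Note that Eq. (1) as printed is signed for ρ(Sc) < cl;
since the text states that deviations closer to zero are preferable, we take the
absolute value there, which we flag as our assumption.

// Eq. (1): deviation of a correlation value from the target range [cl, cu].
static double correlationDeviation(double rho, double cl, double cu) {
    if (rho >= cl && rho <= cu) return 0.0;
    return (rho < cl) ? Math.abs(rho - cl)  // abs is our assumption; Eq. (1) prints rho - cl
                      : rho - cu;
}

// Eq. (3): normalized distance of a candidate [s, e] from the input [sI, eI].
static double preferenceDeviation(int sI, int eI, int s, int e, int n) {
    return (Math.abs(sI - s) + Math.abs(eI - e)) / (double) n;
}

// Eq. (4): overall deviation, trading preference (weight λ) against correlation (1 − λ).
static double totalDeviation(double lambda, double dE, double dRho) {
    return lambda * dE + (1 - lambda) * dRho;
}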

3 Methodology
Mining for the optimal subsequence pattern is essentially a search problem. That
is, to find Sp, an algorithm has to iterate over all possible combinations of
subsequences, compute the objective in Eq. (4), and keep the subsequence with the
minimum deviation.
Iterating over all subsequences is obviously an intensive task that incurs high
computational overhead. Hence, we tackle the problem from this angle and propose
a computation-centric (Sect. 3.3) algorithm with efficient optimization techniques.

Fig. 3. Our proposed method for subsequence pattern mining.

3.1 Proposed Methods


Figure 3 illustrates our proposed method. Instead of using the full time series
data for training, which consists of irrelevant, noisy, and weakly labeled data, we
focus on extracting the most relevant, discriminant, and meaningful subsets, which
can be used to accurately represent the target class. For instance, Fig. 1 shows a
small section of training data for air quality sensors. From this data we extract
only the most interesting subsequences to be used in the training set (patterns 1
and 2). The process of identifying the most interesting and relevant subsequences
is automated and does not require expensive manual labeling. Finally, all
identified SPs are added to the SP Dictionary, a smaller subset of the full training
data set that contains discriminant, relevant, and representative sub-patterns of
the original data.

3.2 Computation of Correlation

The cost of computing the correlation in Eq. (2) increases linearly with n (i.e.,
O(n)): each of the summation components in Eq. (2) performs n summation
operations. Observing that Eq. (2) can be computed incrementally [15], [10]
proposed the α-skipping cumulative array to compute it in O(α) time, where
α ≪ n.
In the α-skipping cumulative array [10], each time series x of length n has two
cumulative arrays: the sum of values and the sum of squared values, S^α_x and
S^α_{x²}, respectively. These arrays are of length n/α. An element of such an
array is computed as follows (see Fig. 4):

$$
S^{\alpha}_{w}[j] = \sum_{i=1}^{j\alpha} w_i,
\quad j = 1, 2, \ldots, \frac{n}{\alpha},
\qquad w_i \in \{x[i],\ (x[i])^2,\ x[i]\,y[i]\}
$$

Hence, with those α-skipping cumulative arrays, the components in Eq. 2 are
computed in O(α) time.
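The following sketch illustrates the α-skipping idea as we read it from the
description above: prefix sums are stored only at every α-th position, and a
window sum is recovered from the two nearest checkpoints plus at most O(α) raw
additions at each ragged end. The names buildSkipArray and windowSum are ours,
not from [10].

// Prefix sums stored only at multiples of alpha: cum[j] = sum of w[0 .. j*alpha - 1].
static double[] buildSkipArray(double[] w, int alpha) {
    int slots = w.length / alpha;
    double[] cum = new double[slots + 1];              // cum[0] = 0 by default
    double running = 0;
    for (int i = 0; i < slots * alpha; i++) {
        running += w[i];
        if ((i + 1) % alpha == 0) cum[(i + 1) / alpha] = running;
    }
    return cum;
}

// Sum of w[s..e] via the checkpoints, with at most O(alpha) raw additions per ragged end.
static double windowSum(double[] w, double[] cum, int alpha, int s, int e) {
    int js = (s + alpha - 1) / alpha;                  // first checkpoint index >= s
    int je = (e + 1) / alpha;                          // last checkpoint index <= e + 1
    if (js > je) {                                     // window lies inside one alpha-block
        double sum = 0;
        for (int i = s; i <= e; i++) sum += w[i];
        return sum;
    }
    double sum = cum[je] - cum[js];                    // covers w[js*alpha .. je*alpha - 1]
    for (int i = s; i < js * alpha; i++) sum += w[i];  // left ragged end
    for (int i = je * alpha; i <= e; i++) sum += w[i]; // right ragged end
    return sum;
}

With such arrays built for Σx, Σx², Σy, Σy², and Σxy, every quantity in Eq. (2) is
available in O(α) time per window.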

Fig. 4. Constructing the α-skipping cumulative arrays S^α_x and S^α_{x²} for a time
series x of length m, with α = 4.

Next, we describe two algorithms which utilize the α-skipping cumulative


arrays to find Sp . Then, we show our algorithm which incorporates the preference
objective to optimize the search for Sp without sacrificing the solution quality.

3.3 Computational-Centric Algorithms


In this section we discuss three algorithms for finding the optimal solution Sp.
We start with a brute-force algorithm that serves as a baseline against which to
compare the other algorithms.
Baseline: This is a simplified version of SKIP [10], which was proposed to find
the longest correlated subsequence of multiple time series. We simplified and
adapted the algorithm to two time series, i.e., O = {x, y}.
Essentially, Baseline uses two nested for-loops to generate the combinations of
subsequences. The outer loop specifies the right-hand side of the time interval,
while the inner loop specifies the left-hand side. In each iteration, the interval
represents a candidate subsequence Si. Baseline then computes the correlation
using the cumulative arrays generated in advance for the whole time series, which
results in a complexity of only O(α).
The values computed in the previous step are then updated incrementally to
compute the correlation of the next candidate subsequences, until the inner loop
finishes. Specifically, the next candidate subsequence is generated by shifting the
interval one step to the right, which means adding one value on the right-hand
side and subtracting one value on the left.
Baseline++ (limited search space): Unlike the previous algorithm, Baseline++
limits the search space before navigating it. Hence, it searches only a small

Algorithm 2. Baseline++ (limited search space)

1: Input: SI(x, y, s, e), preference weight λ, target correlation tc, minimum length ml, α
2: Return: Sp, best
3: best = (1 − λ)Δρ_SI; Sp ← SI
4: i = SI.s; j = SI.e
5: while i ≥ 0 || j ≤ n do
6:   Si ← (i, j)
7:   if λΔE_Si > best then
8:     maxs = i; maxe = j; break
9:   i ← i − 1; j ← j + 1
10: Calculate S^α_x, S^α_{x²}, S^α_y, S^α_{y²}, and S^α_{xy} for the time interval [maxs, maxe] of x, y
11: n = maxe − maxs + 1
12: for l = n to ml do
13:   for t = maxs to n − l do
14:     Si ← (x, y, t, t + l − 1)
15:     for z ∈ {x, y, x², y², xy} do
16:       if t = maxs then
17:         Compute Σz from S^α_z
18:       else
19:         Update Σz incrementally
20:     Compute ρ(Si)
21:     Δ_Si = λΔE_Si + (1 − λ)Δρ_Si
22:     if Δ_Si < best then
23:       best = Δ_Si; Sp ← Si
24: return Sp, best

part of the time series instead of the whole series. To achieve this, it uses the
preference objective to limit the search space (lines 5–10).
Specifically, it defines two variables, maxs and maxe, as new boundaries for the
search space. Starting from SI's boundaries (line 4), it enters a while-loop (line 5)
until (1) both ends of the time series have been reached, or (2) the preference
deviation has reached its maximum possible value (lines 7–8).
There are two prominent differences between Baseline and Baseline++. First,
the generation of the cumulative arrays: the cumulative arrays in Baseline++ are
generated for a sub-interval of x and y, namely the newly found boundaries
[maxs, maxe] (line 10), not the whole interval as in Baseline.
Second, the new boundaries [maxs, maxe] in Baseline++ decrease the number of
candidate subsequences (when λ > 0); as a result, the cost of the search decreases
too. However, Baseline++'s technique for optimizing the search cost is limited by
the subsequence/time series length ratio.
Next, we introduce our algorithm, Incremental (INC), which uses the preference
deviation as an optimization technique to prune unpromising subsequences. INC
further optimizes the computation of correlation with a simple technique:

Algorithm 3. Incremental (INC)

1: Input: SI(x, y, s, e), preference weight λ
2: Return: Sp, best
3: Define and compute S^A_x, S^A_{x²}, S^A_y, S^A_{y²}, S^A_{xy} for [s, e],
4:   where A ∈ {RE, RC, LE, LC}
5: best = (1 − λ)Δρ_SI; Sp ← SI
6: Q.push(SI)
7: while Q ≠ ∅ and Δ_TH ≤ best do
8:   Sc ← Enumerate(Q.pop())
9:   for each Si ∈ Sc s.t. Si not visited do
10:    Look up A of Si, i.e., {RE, RC, LE, LC}
11:    Update S^A_x, S^A_{x²}, S^A_y, S^A_{y²}, S^A_{xy} incrementally
12:    Compute ρ(Si)
13:    Δ_Si = λΔE_Si + (1 − λ)Δρ_Si
14:    if Δ_Si < best then
15:      best = Δ_Si; Sp ← Si
16:    Δ_TH = λΔE_Si
17:    Q.push(Si)
18: return Sp, best

creating and maintaining four instances of the cumulative arrays, one for each
enumeration operation: LE, RE, LC, and RC.
Incremental (INC): Essentially, INC starts from SI and generates all possible
subsequences with the help of an auxiliary function Enumerate() and a priority
queue Q. The auxiliary function takes a subsequence as input and performs the
four enumeration operations defined in Sect. 2.2 to produce the set Sc. Then, for
each Si ∈ Sc, the corresponding cumulative array instance is updated
incrementally; the correlation is computed from this instance, and Si is pushed
into Q. Once all candidate subsequences in Sc have been processed, INC pops an
un-enumerated subsequence Si from Q, calls Enumerate() on it, and so on. At any
time, the candidate subsequences in Q are sorted ascendingly, from the closest to
SI to the furthest, based on Eq. (3).
To avoid an exhaustive search, INC abandons the search upon reaching a point
where any candidate subsequence yet to be generated has a preference deviation
higher than the best deviation found so far, or when the queue becomes empty, as
shown in Algorithm 3, line 7. A compact sketch of this control loop follows.
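As a compact illustration of this control loop, the Java sketch below pops
candidates in order of increasing preference deviation and stops as soon as λΔE
of the closest unexplored candidate exceeds the best total deviation, mirroring
line 7 of Algorithm 3. It reuses the Window, enumerate, pearson, and deviation
helpers sketched earlier, and recomputes the correlation per window rather than
reproducing the four incrementally maintained cumulative-array instances; all
identifiers are ours.

// Best-first search over candidate windows, ordered by preference deviation (Eq. (3)).
// Once λ·ΔE of the nearest unexplored window exceeds bestDev, no unseen candidate
// can have a smaller total deviation, so the search is abandoned early.
static Window incSearch(double[] x, double[] y, Window sI,
                        double lambda, double cl, double cu) {
    int n = x.length;
    java.util.PriorityQueue<Window> q = new java.util.PriorityQueue<>(
        java.util.Comparator.comparingDouble(
            (Window w) -> preferenceDeviation(sI.s(), sI.e(), w.s(), w.e(), n)));
    java.util.Set<Window> visited = new java.util.HashSet<>();
    Window best = sI;
    double bestDev = (1 - lambda)
        * correlationDeviation(pearson(x, y, sI.s(), sI.e()), cl, cu);
    q.add(sI);
    visited.add(sI);
    while (!q.isEmpty()) {
        Window cur = q.poll();
        double bound = lambda * preferenceDeviation(sI.s(), sI.e(), cur.s(), cur.e(), n);
        if (bound > bestDev) break;                    // early abandoning (Algorithm 3, line 7)
        for (Window w : enumerate(cur, n)) {
            if (!visited.add(w)) continue;             // skip already-seen windows
            double dev = totalDeviation(lambda,
                preferenceDeviation(sI.s(), sI.e(), w.s(), w.e(), n),
                correlationDeviation(pearson(x, y, w.s(), w.e()), cl, cu));
            if (dev < bestDev) { bestDev = dev; best = w; }
            q.add(w);
        }
    }
    return best;
}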

4 Experiment Setup and Results

We conducted a set of experiments on a real-world dataset to evaluate our
algorithms, using a PC with an Intel Core i7 @ 3.40 GHz and 16 GB of RAM. The
algorithms were coded in Java. We report the results next.

Fig. 5. Set of experiments on real dataset extracted from Yahoo! Finance.

4.1 Setup
Table 2 summarizes all parameters used throughout the experiments and the
dataset settings.

Table 2. Parameters of the experiments

Parameter Default Range


Preference weight (λ) 0.5 0.0–1.0
Target correlation (tc) - 0.0–1.0
Query/time series ratio 20% 10%–100%

Dataset: We experimented with a real dataset containing 61 time series. Each
time series represents the historical daily closing price of a company listed on the
US stock market from 2/1/2009 to 31/12/2013, with n = 1,258 ≈ 10^3. This
dataset was manually extracted from Yahoo! Finance and had to be merged and
processed to be suitable for experimentation. We also experimented with a
synthetically generated time series dataset using the Random Walk model [12]; it
contains a total of 10,000 time series, each of length n = 2,000.

Performance Measures: We compare INC against Baseline and Baseline++ using
the CPU cost as an indicator of efficiency. The CPU cost of an algorithm is the
number of floating-point operations (FLOPs) the algorithm has to perform to find
the optimal solution; specifically, the addition, subtraction, multiplication, and
division operations carried out to obtain the Pearson correlation.
As for effectiveness, we use the total deviation Δ defined in Eq. (4).
We averaged FLOPs and Δ over a workload of 10^3 input sequences.
Specifically, each item in the workload consists of 10 pairs of time series (chosen
randomly out of k), 10 uniformly distributed time intervals [s, e], and 10
uniformly distributed target correlation ranges [cl, cu], where 0 ≤ cl, cu ≤ 1 and
cl ≤ cu. Note that a pair of time series and a time interval construct one instance
of the input sequence SI.

4.2 Results

Preference Weight (λ): To show the impact of λ, we evaluated the algorithms on
our workload while varying λ and averaged the results. Figures 5a and c show the
average FLOPs and the average deviation, respectively.
All three algorithms achieve the same accuracy in terms of deviation; however,
they exhibit different costs. Compared to Baseline and Baseline++, INC finds the
same solution with almost half the cost when λ = 0. The reason is that INC keeps
four sets of cumulative arrays, one for each move. This enables INC to update the
cumulative arrays by adding or subtracting only one value (one floating-point
operation) from the previously computed cumulative arrays, while Baseline and
Baseline++ have to both add and subtract a value (two floating-point operations).
As the preference weight increases, the costs of Baseline++ and INC decrease
considerably while Baseline's cost stays constant. The reason is that increasing the
weight of the preference objective tightens the search space in Baseline++ and
triggers earlier abandoning of the search via the threshold Δ_TH in INC. Figure 5c
also shows the trade-off between the two conflicting objectives, which follows a
semi-bell-shaped pattern.
Target Correlation (tc): To show the relationship between tc and FLOPs, we set
λ = 0.5 and varied tc such that cl = cu. As Fig. 5d shows, Baseline exhibits
constant performance regardless of the value of tc, while Baseline++ is the most
affected by tc. The closer tc is to 0.5 (which happens to be the average correlation
among all time series in the dataset), the less work Baseline++ has to do to reach
the optimal solution. In other words, Baseline++ achieves a tighter search space
when tc is close to the input's correlation ρ(SI), since best in Algorithm 2, line 3,
is then very close to zero.

5 Conclusion

We have addressed the challenging problem of finding constrained subsequence
patterns in time series data in order to increase the efficiency and accuracy of
data mining applications. We proposed efficient subsequence processing techniques
and explained the reasoning behind their design choices. Finally, we empirically
demonstrated the efficiency of our techniques using real and synthetic datasets.

Acknowledgments. We would like to thank Lemma solutions (www.lemma.com.au)


for their help during the production of this paper.

References
1. Al-Maskari, S., Bélisle, E., Li, X., Le Digabel, S., Nawahda, A., Zhong, J.: Clas-
sification with quantification for air quality monitoring. In: Bailey, J., Khan, L.,
Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS (LNAI),
vol. 9651, pp. 578–590. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31753-3_46
2. Al-Maskari, S., Guo, W., Zhao, X.: Biologically inspired pattern recognition for
e-nose sensors. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q.Z. (eds.) ADMA 2016.
LNCS, vol. 10086, pp. 142–155. Springer International Publishing, Cham (2016).
https://doi.org/10.1007/978-3-319-49586-6_10
3. Al-Maskari, S., Ibrahim, I.A., Li, X., Abusham, E., Almars, A.: Feature extraction
for smart sensing using multi-perspectives transformation. In: Wang, J., Cong, G.,
Chen, J., Qi, J. (eds.) ADC 2018. LNCS, vol. 10837, pp. 236–248. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-92013-9_19
4. Al-Maskari, S., Li, X., Liu, Q.: An effective approach to handling noise and drift in
electronic noses. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp.
223–230. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_21
5. Fu, T.C.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1),
164–181 (2011)
6. Gavrilov, M., Anguelov, D., Indyk, P., Motwani, R.: Mining the stock market
(extended abstract): which measure is best? In: Proceedings of the Sixth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, 20–
23 August 2000, Boston, MA, USA, pp. 487–496 (2000)
7. Ghazavi, S.N., Liao, T.W.: Medical data mining by fuzzy modeling with selected
features. Artif. Intell. Med. 43(3), 195–206 (2008)
8. Ibrahim, I.A., Albarrak, A.M., Li, X.: Constrained recommendations for query
visualizations. Knowl. Inf. Syst. 51(2), 499–529 (2017)
9. Keogh, E.J., Kasetty, S.: On the need for time series data mining benchmarks:
a survey and empirical demonstration. Data Min. Knowl. Discov. 7(4), 349–371
(2003)
10. Li, Y., U, L.H., Yiu, M.L., Gong, Z.: Discovering longest-lasting correlation in
sequence databases. PVLDB 6(14), 1666–1677 (2013)
11. Mueen, A., Hamooni, H., Estrada, T.: Time series join on subsequence correla-
tion. In: 2014 IEEE International Conference on Data Mining, ICDM 2014, 14–17
December 2014, Shenzhen, China, pp. 450–459 (2014)

12. Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series
data. In: Proceedings of the ACM SIGMOD International Conference on Manage-
ment of Data, SIGMOD 2010, 6–10 June 2010, Indianapolis, Indiana, USA, pp.
171–182 (2010)
13. Raghupathi, W., Raghupathi, V.: Big data analytics in healthcare: promise and
potential. Health Inf. Sci. Syst. 2(1), 1 (2014)
14. Rakthanmanon, T., et al.: Searching and mining trillions of time series subse-
quences under dynamic time warping. In: The 18th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD 2012, 12–16 August
2012, Beijing, China, pp. 262–270 (2012)
15. Sakurai, Y., Papadimitriou, S., Faloutsos, C.: BRAID: stream mining through
group lag correlations. In: Proceedings of the ACM SIGMOD International Con-
ference on Management of Data, 14–16 June 2005, Baltimore, Maryland, USA, pp.
599–610 (2005)
16. Utomo, C., Li, X., Wang, S.: Classification based on compressive multivariate time
series. In: Cheema, M.A., Zhang, W., Chang, L. (eds.) ADC 2016. LNCS, vol. 9877,
pp. 204–214. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46922-5_16
17. Nahar, V., Al-Maskari, S., Li, X., Pang, C.: Semi-supervised learning for cyberbul-
lying detection in social networks. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014.
LNCS, vol. 8506, pp. 160–171. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_14
18. Zhu, Y., Shasha, D.: Statstream: statistical monitoring of thousands of data
streams in real time. In: Proceedings of 28th International Conference on Very
Large Data Bases, VLDB 2002, 20–23 August 2002, Hong Kong, China, pp. 358–
369 (2002)
Another random document with
no related content on Scribd:
PLATE XXX

TURTLE-HEAD.—C. glabra.

Common Dodder. Love Vine.


Cuscuta Gronovii. Convolvulus Family.

Stems.—Yellow or reddish, thread-like, twining, leafless. Flowers.—White, in


close clusters. Calyx.—Five-cleft. Corolla.—With five spreading lobes. Stamens.—
Five. Pistil.—One, with two styles.
Late in the summer we are perhaps tempted deep into some
thicket by the jasmine-scented heads of the button-bush or the
fragrant spikes of the clethra, and note for the first time the tangled
golden threads and close white flower-clusters of the dodder. If we
try to trace to their source these twisted stems, which the Creoles
know as “angels’ hair,” we discover that they are fastened to the bark
of the shrub or plant about which they are twining by means of small
suckers; but nowhere can we find any connection with the earth, all
their nourishment being extracted from the plant to which they are
adhering. Originally this curious herb sprang from the ground which
succored it until it succeeded in attaching itself to some plant; having
accomplished this it severed all connection with mother-earth by the
withering away or snapping off of the stem below.
The flax-dodder, C. Epilinum, is a very injurious plant in
European flax-fields. It has been sparingly introduced into this
country with flax-seed.

Traveller’s Joy. Virgin’s Bower.


Clematis Virginiana. Crowfoot Family.

Stem.—Climbing, somewhat woody. Leaves.—Opposite, three-divided.


Flowers.—Whitish, in clusters, unisexual. Calyx.—Of four petal-like sepals.
Corolla.—None. Stamens and Pistils.—Indefinite in number, occurring on
different plants.
In July and August this beautiful plant, covered with its white
blossoms and clambering over the shrubs which border the country
lanes, makes indeed a fitting bower for any maid or traveller who
may chance to be seeking shelter. Later in the year the seeds with
their silvery plumes give a feathery effect which is very striking.
PLATE XXXI

TRAVELLER’S JOY.—Clematis
Virginiana.

This graceful climber works its way by means of its bending or


clasping leaf-stalks. Darwin has made interesting experiments
regarding the movements of the young shoots of the Clematis. He
discovered that, “one revolved describing a broad oval, in five hours,
thirty minutes; and another in six hours, twelve minutes; they follow
the course of the sun.”

Sweet Pepperbush. White Alder.


Clethra alnifolia. Heath Family.

A shrub from three to ten feet high. Leaves.—Alternate, ovate, sharply


toothed. Flowers.—White, growing in clustered finger-like racemes. Calyx.—Of
five sepals. Corolla.—Of five oblong petals. Stamens.—Ten, protruding. Pistil.—
One, three-cleft at apex.
Nearly all our flowering shrubs are past their glory by
midsummer, when the fragrant blossoms of the sweet pepperbush
begin to exhale their perfume from the cool thickets which line the
lanes along the New England coast. There is a certain luxuriance in
the vegetation of this part of the country in August which is generally
lacking farther inland, where the fairer flowers have passed away,
and the country begins to show the effects of the long days of heat
and drought. The moisture of the air, and the peculiar character of
the soil near the sea, are responsible for the freshness and beauty of
many of the late flowers which we find in such a locality.
Clethra is the ancient Greek name for the alder, which this plant
somewhat resembles in foliage.

Thorn-apple. Jamestown Weed.


Datura Stramonium. Nightshade Family.

Stem.—Smooth and branching. Leaves.—Ovate, wavy-toothed or angled.


Flowers.—White, large and showy, on short flower-stalks from the forks of the
branching stem. Calyx.—Five-toothed. Corolla.—Funnel-form, the border five-
toothed. Stamens.—Five. Pistil.—One. Fruit.—Green, globular, prickly.
The showy white flowers of the thorn-apple are found in waste
places during the summer and autumn, a heap of rubbish forming
their usual unattractive background. The plant is a rank, ill-scented
one, which was introduced into our country from Asia. It was so
associated with civilization as to be called the “white man’s plant” by
the Indians.
Its purple-flowered relative, D. Tatula, is an emigrant from the
tropics. This genus possesses narcotic-poisonous properties.

Wild Balsam-apple.
Echinocystis lobata. Gourd Family.

Stem.—Climbing, nearly smooth, with three-forked tendrils. Leaves.—Deeply


and sharply five-lobed. Flowers.—Numerous, small, greenish-white, unisexual; the
staminate ones growing in long racemes, the pistillate ones in small clusters or
solitary. Fruit.—Fleshy, oval, green, about two inches long, clothed with weak
prickles.
This is an ornamental climber which is found bearing its flowers
and fruit at the same time. It grows in rich soil along rivers in parts
of New England, Pennsylvania, and westward; and is often cultivated
in gardens, making an effective arbor-vine. The generic name is from
two Greek words which signify hedgehog and bladder, in reference
to the prickly fruit.

White Asters.
Aster. Composite Family (p. 13).

Flower-heads.—Composed of white ray-flowers with a centre of yellow disk-


flowers.
While we have far fewer species of white than of blue or purple
asters, some of these few are so abundant in individuals as to hold
their own fairly well against their bright-hued rivals.
The slender zigzag stems, thin, coarsely toothed, heart-shaped
leaves, and white, loosely clustered flower-heads of A. corymbosus,
are noticeable along the shaded roadsides and in the open woods of
August.
Bordering the dry fields at this same season are the spreading
wand-like branches, thickly covered with the tiny flower-heads as
with snowflakes, of A. ericoides.
A. umbellatus is the tall white aster of the swamps and moist
thickets. It sometimes reaches a height of seven feet, and can be
identified by its long tapering leaves and large, flat flower-clusters.
A beautiful and abundant seaside species is A. multiflorus. Its
small flower-heads are closely crowded on the low, bushy, spreading
branches; its leaves are narrow, rigid, crowded, and somewhat hoary.
The whole effect of the plant is heath-like; it also somewhat suggests
an evergreen.

Boneset. Thoroughwort.
Eupatorium perfoliatum. Composite Family (p. 13).
Stem.—Stout and hairy, two to four feet high. Leaves.—Opposite, widely
spreading, lance-shaped, united at the base around the stem. Flower-heads.—Dull
white, small, composed entirely of tubular blossoms borne in large clusters.
To one whose childhood was passed in the country some fifty
years ago the name or sight of this plant is fraught with unpleasant
memories. The attic or wood-shed was hung with bunches of the
dried herb which served as so many grewsome warnings against wet
feet, or any over-exposure which might result in cold or malaria. A
certain Nemesis, in the shape of a nauseous draught which was
poured down the throat under the name of “boneset tea,” attended
such a catastrophe. The Indians first discovered its virtues, and
named the plant ague-weed. Possibly this is one of the few herbs
whose efficacy has not been over-rated. Dr. Millspaugh says: “It is
prominently adapted to cure a disease peculiar to the South, known
as break-bone fever (Dengue), and it is without doubt from this
property that the name boneset was derived.”

White Snakeroot.
Eupatorium ageratoides. Composite Family (p. 13).

About three feet high. Stem.—Smooth and branching. Leaves.—Opposite,
long-stalked, broadly ovate, coarsely and sharply toothed. Flower-heads.—White,
clustered, composed of tubular blossoms.
Although this species is less common than boneset, it is
frequently found blossoming in the rich Northern woods of late
summer.
PLATE XXXII

BONESET.—E. perfoliatum.

Climbing Hemp-weed.
Mikania scandens. Composite Family (p. 13).

Stem.—Twining and climbing, nearly smooth. Leaves.—Opposite, somewhat
triangular-heart-shaped, pointed, toothed at the base. Flower-heads.—Dull white
or flesh-color, composed of four tubular flowers; clustered, resembling boneset.
In late summer one often finds the thickets which line the slow
streams nearly covered with the dull white flowers of the climbing
hemp-weed. At first sight the likeness to the boneset is so marked
that the two plants are often confused, but a second glance discovers
the climbing stems and triangular leaves which clearly distinguish
this genus.
Ladies’ Tresses.
Spiranthes cernua. Orchis Family (p. 17).

Stem.—Leafy below, leafy-bracted above, six to twenty inches high. Leaves.—
Linear-lance-shaped, the lowest elongated. Flowers.—White, fragrant, the lips
wavy or crisped; growing in slender spikes.
This pretty little orchid is found in great abundance in
September and October. The botany relegates it to “wet places,” but I
have seen dry upland pastures as well as low-lying swamps profusely
flecked with its slender, fragrant spikes. The braided appearance of
these spikes would easily account for the popular name of ladies’
tresses; but we learn that the plant’s English name was formerly
“ladies’ traces,” from a fancied resemblance between its twisted
clusters and the lacings which played so important a part in the
feminine toilet. I am told that in parts of New England the country
people have christened the plant “wild hyacinth.”
The flowers of S. gracilis are very small, and grow in a much
more slender, one-sided spike than those of S. cernua. They are
found in the dry woods and along the sandy hill-sides from July
onward.
PLATE XXXIII

LADIES’ TRESSES.—S. cernua.

Green-flowered Milkweed.
Asclepias verticillata. Milkweed Family.

Stem.—Slender, very leafy to the summit. Leaves.—Very narrow, from three to


six in a whorl. Flowers.—Greenish-white, in small clusters at the summit and along
the sides of the stem. Fruit.—Two erect pods, one often stunted.
This species is one commonly found on dry uplands, especially
southward, with flowers resembling in structure those of the other
milkweeds. (Pl. .)
Groundsel Tree.
Baccharis halimifolia. Composite Family (p. 13).

A shrub from six to twelve feet high. Leaves.—Somewhat ovate and wedge-
shaped, coarsely toothed, the upper entire. Flower-heads.—Whitish or
yellowish, composed of unisexual tubular flowers, the stamens and pistils
occurring on different plants.
Some October day, as we pick our way through the salt marshes
which lie back of the beach, we may spy in the distance a thicket
which looks as though composed of such white-flowered shrubs as
belong to June. Hastening to the spot we discover that the silky-
tufted seeds of the female groundsel tree are responsible for our
surprise. The shrub is much more noticeable and effective at this
season than when—a few weeks previous—it was covered with its
small white or yellowish flower-heads.

Grass of Parnassus.
Parnassia Caroliniana. Saxifrage Family.

Stem.—Scape-like, nine inches to two feet high, with usually one small
rounded leaf clasping it below; bearing at its summit a single flower. Leaves.—
Thickish, rounded, often heart-shaped, from the root. Flower.—White or cream-
color, veiny. Calyx.—Of five slightly united sepals. Corolla.—Of five veiny petals.
True Stamens.—Five, alternate with the petals, and with clusters of sterile gland-
tipped filaments. Pistil.—One, with four stigmas.
PLATE XXXIV

GRASS OF PARNASSUS.—P. Caroliniana.

Gerarde indignantly declares that this plant has been described
by blind men, not “such as are blinde in their eyes, but in their
understandings, for if this plant be a kind of grasse then may the
Butter-burre or Colte’s-foote be reckoned for grasses—as also all
other plants whatsoever.” But if it covered Parnassus with its delicate
veiny blossoms as abundantly as it does some moist New England
meadows each autumn, the ancients may have reasoned that a plant
almost as common as grass must somehow partake of its nature. The
slender-stemmed, creamy flowers are never seen to better advantage
than when disputing with the fringed gentian the possession of some
luxurious swamp.
Pearly Everlasting.
Anaphalis margaritacea. Composite Family (p. 13).

Stem.—Erect, one or two feet high, leafy. Leaves.—Broadly linear to lance-
shaped. Flower-heads.—Composed entirely of tubular flowers with very numerous
pearly white involucral scales.
This species is common throughout our Northern woods and
pastures, blossoming in August. Thoreau writes of it in September:
“The pearly everlasting is an interesting white at present. Though the
stems and leaves are still green, it is dry and unwithering like an
artificial flower; its white, flexuous stem and branches, too, like wire
wound with cotton. Neither is there any scent to betray it. Its
amaranthine quality is instead of high color. Its very brown centre
now affects me as a fresh and original color. It monopolizes small
circles in the midst of sweet fern, perchance, on a dry hill-side.”

Fragrant Life-everlasting.
Gnaphalium polycephalum. Composite Family (p. 13).

Stem.—Erect, one to three feet high, woolly. Leaves.—Lance-shaped. Flower-
heads.—Yellowish-white, clustered at the summit of the branches, composed of
many tubular flowers.
This is the “fragrant life-everlasting,” as Thoreau calls it, of late
summer. It abounds in rocky pastures and throughout the somewhat
open woods.
Note.—Flowers so faintly tinged with color as to give a white effect in the
mass or at a distance are placed in the White section: greenish or greenish-white
flowers are also found here. The Moth Mullein (p. 152) and Bouncing Bet (p. 196)
are frequently found bearing white flowers: indeed, white varieties of flowers
which are usually colored need never surprise one.
II
YELLOW

Marsh Marigold.
Caltha palustris. Crowfoot Family.

Stem.—Hollow, furrowed. Leaves.—Rounded, somewhat kidney-shaped.
Flowers.—Golden-yellow. Calyx.—Of five to nine petal-like sepals. Corolla.—None.
Stamens.—Numerous. Pistils.—Five to ten, almost without styles.

Hark, hark! the lark at Heaven’s gate sings,
And Phœbus ’gins arise,
His steeds to water at those springs,
On chaliced flowers that lies:
And winking Mary-buds begin
To ope their golden eyes;
With everything that pretty is—
My lady sweet, arise!
Arise, arise.—Cymbeline.

We claim—and not without authority—that these “winking
Mary-buds” are identical with the gay marsh marigolds which border
our springs and gladden our wet meadows every April. There are
those who assert that the poet had in mind the garden marigold—
Calendula—but surely no cultivated flower could harmonize with the
spirit of the song as do these gleaming swamp blossoms. We will
yield to the garden if necessary—
The marigold that goes to bed with the sun
And with him rises weeping—

of the “Winter’s Tale,” but insist on retaining for that larger, lovelier
garden in which we all feel a certain sense of possession—even if we
are not taxed on real estate in any part of the country—the “golden
eyes” of the Mary-buds, and we feel strengthened in our position by
the statement in Mr. Robinson’s “Wild Garden” that the marsh
marigold is so abundant along certain English rivers as to cause the
ground to look as though paved with gold at those seasons when they
overflow their banks.
These flowers are peddled about our streets every spring under
the name of cowslips—a title to which they have no claim, and which
is the result of that reckless fashion of christening unrecognized
flowers which is so prevalent, and which is responsible for so much
confusion about their English names.
The derivation of marigold is somewhat obscure. In the “Grete
Herball” of the sixteenth century the flower is spoken of as Mary
Gowles, and by the early English poets as gold simply. As the first
part of the word might be derived from the Anglo-Saxon mere—a
marsh, it seems possible that the entire name may signify marsh-
gold, which would be an appropriate and poetic title for this shining
flower of the marshes.

Spice-bush. Benjamin-bush. Fever-bush.
Lindera Benzoin. Laurel Family.

An aromatic shrub from six to fifteen feet high. Leaves.—Oblong, pale
underneath. Flowers.—Appearing before the leaves in March or April, honey-
yellow, borne in clusters which are composed of smaller clusters, surrounded by an
involucre of four early falling scales. Fruit.—Red, berry-like, somewhat pear-
shaped.
These are among the very earliest blossoms to be found in the
moist woods of spring. During the Revolution the powdered berries
were used as a substitute for allspice; while at the time of the
Rebellion the leaves served as a substitute for tea.

Yellow Adder’s Tongue. Dog’s Tooth Violet.
Erythronium Americanum. Lily Family.

Scape.—Six to nine inches high, one-flowered. Leaves.—Two, oblong-lance-
shaped, pale green mottled with purple and white. Flower.—Rather large, pale
yellow marked with purple, nodding. Perianth.—Of six recurved or spreading
sepals. Stamens.—Six. Pistil.—One.
The white blossoms of the shad-bush gleam from the thicket,
and the sheltered hill-side is already starred with the blood-root and
anemone when we go to seek the yellow adder’s tongue. We direct
our steps toward one of those hollows in the wood which is watered
by such a clear gurgling brook as must appeal to every country-loving
heart; and there where the pale April sunlight filters through the
leafless branches, nod myriads of these lilies, each one guarded by a
pair of mottled, erect, sentinel-like leaves.

PLATE XXXV

MARSH MARIGOLD.—C. palustris.

The two English names of this plant are unsatisfactory and
inappropriate. If the marking of its leaves resembles the skin of an
adder, why name it after its tongue? And there is equally little reason
for calling a lily a violet. Mr. Burroughs has suggested two pretty and
significant names. “Fawn lily,” he thinks, would be appropriate,
because a fawn is also mottled, and because the two leaves stand up
with the alert, startled look of a fawn’s ears. The speckled foliage and
perhaps its flowering season are indicated in the title “trout-lily,”
which has a spring-like flavor not without charm. It is said that the
early settlers of Pennsylvania named the flower “yellow snowdrop,”
in memory of their own “harbinger-of-spring.”
The white adder’s tongue, E. albidum, is a species which is
usually found somewhat westward.

Celandine.
Chelidonium majus. Poppy Family.

Stem.—Brittle, with saffron-colored, acrid juice. Leaves.—Compound or
divided, toothed or cut. Flowers.—Yellow, clustered. Calyx.—Of two sepals falling
early. Corolla.—Of four petals. Stamens.—Sixteen to twenty-four. Pistil.—One,
with a two-lobed stigma. Pod.—Slender, linear.
The name of celandine must always suggest the poet who never
seemed to weary of writing in its honor:
Pansies, lilies, kingcups, daisies,
Let them live upon their praises;
Long as there’s a sun that sets,
Primroses will have their glory;
Long as there are violets,
They will have a place in story;
There’s a flower that shall be mine,
’Tis the little celandine.

And when certain yellow flowers which frequent the village roadside
are pointed out to us as those of the celandine, we feel a sense of
disappointment that the favorite theme of Wordsworth should
arouse within us so little enthusiasm. So perhaps we are rather
relieved than otherwise to realize that the botanical name of this
plant signifies greater celandine; for we remember that the poet
never failed to specify the small celandine as the object of his praise.
The small celandine is Ranunculus ficaria, one of the Crowfoot
family, and is only found in this country as an escape from gardens.
PLATE XXXVI

YELLOW ADDER’S TONGUE.—E. Americanum.

Gray tells us that the generic name, Chelidonium, from the
ancient Greek for swallow, was given “because its flowers appear
with the swallows;” but if we turn to Gerarde we read that the title
was not bestowed “because it first springeth at the coming in of the
swallowes, or dieth when they go away, for as we have saide, it may
be founde all the yeare; but because some holde opinion, that with
this herbe the dams restore sight to their young ones, when their eies
be put out.”

Celandine Poppy.
Stylophorum diphyllum. Poppy Family.
Stem.—Low, two-leaved. Stem-leaves.—Opposite, deeply incised. Root-leaves.
—Incised or divided. Flowers.—Deep yellow, large, one or more at the summit of
the stem. Calyx.—Of two hairy sepals. Corolla.—Of four petals. Stamens.—Many.
Pistil.—One, with a two to four-lobed stigma.
In April or May, somewhat south and westward, the woods are
brightened, and occasionally the hill-sides are painted yellow, by this
handsome flower. In both flower and foliage the plant suggests the
celandine.

Downy Yellow Violet.
Viola pubescens. Violet Family.

Stems.—Leafy above, erect. Leaves.—Broadly heart-shaped, toothed. Flowers.
—Yellow, veined with purple, otherwise much like those of the common blue violet.

When beechen buds begin to swell,
And woods the blue-bird’s warble know,
The yellow violet’s modest bell
Peeps from the last year’s leaves below,

sings Bryant, in his charming, but not strictly accurate poem, for the
chances are that the “beechen buds” have almost burst into foliage,
and that the “bluebird’s warble” has been heard for some time when
these pretty flowers begin to dot the woods.
PLATE XXXVII

DOWNY YELLOW VIOLET.—V. pubescens.

The lines which run:
Yet slight thy form, and low thy seat,
And earthward bent thy gentle eye,
Unapt the passing view to meet,
When loftier flowers are flaunting nigh,

would seem to apply more correctly to the round-leaved violet, V.
rotundifolia, than to the downy violet, for although its large, flat
shining leaves are somewhat conspicuous, its flowers are borne
singly on a low scape, which would be less apt to attract notice than
the tall, leafy flowering stems of the other.
Common Cinquefoil. Five Finger.
Potentilla Canadensis. Rose Family.

Stem.—Slender, prostrate, or sometimes erect. Leaves.—Divided really into
three leaflets, but apparently into five by the parting of the lateral leaflets. Flowers.
—Yellow, growing singly from the axils of the leaves. Calyx.—Deeply five-cleft, with
bracts between each tooth, thus appearing ten-cleft. Corolla.—Of five rounded
petals. Stamens.—Many. Pistils.—Many in a head.
From spring to nearly midsummer the roads are bordered and
the fields carpeted with the bright flowers of the common cinquefoil.
The passer-by unconsciously betrays his recognition of some of the
prominent features of the Rose family by often assuming that the
plant is a yellow-flowered wild strawberry. Both of the English
names refer to the pretty foliage, cinquefoil being derived from the
French cinque feuilles. The generic name, Potentilla, has reference to
the powerful medicinal properties formerly attributed to the genus.

Shrubby Cinquefoil. Five Finger.
Potentilla fruticosa. Rose Family.

Stem.—Erect, shrubby, one to four feet high. Leaves.—Divided into five to
seven narrow leaflets. Flowers.—Yellow, resembling those of the common
cinquefoil.
