
2017 13th International Conference on Computational Intelligence and Security

Document Sensitivity Classification for Data Leakage Prevention with Twitter-based Document Embedding and Query Expansion

Lap Q. Trieu, Trung-Nguyen Tran, Mai-Khiem Tran, Minh-Triet Tran


Faculty of Information Technology
University of Science, VNU-HCM
Ho Chi Minh City, Vietnam
tqlap@apcs.vn, ttnguyen@selab.hcmus.edu.vn,
tmkhiem@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn

Abstract—Document sensitivity classification is essential to prevent potential sensitive data leakage for individuals and organizations. As most existing methods use regular expressions or data fingerprinting to classify sensitive documents, they may not fully exploit the semantics and content of a document, especially with informal messages and files. This motivates the authors to propose a novel method to classify document sensitivity in real time with better semantic and content analysis. Taking advantage of deep learning in natural language processing, we use our pre-trained Twitter-based document embedding TD2V to encode a document or a text fragment into a fixed-length vector of 300 dimensions. Then we use retrieval and automatic query expansion to obtain a re-ranked list of semantically similar known documents, and determine the sensitivity score for a new document from those of the retrieved documents in this list. Experimental results show that our method can achieve a classification accuracy of more than 99.9% for 4 datasets (Snowden, Mormon, Dyncorp, TM) and 98.34% for the Enron dataset. Furthermore, our method can early predict a sensitive document from a short text fragment with an accuracy higher than 98.84%.

Keywords—Sensitive document detection, document embedding, Doc2Vec, automatic query expansion, data leakage prevention

I. INTRODUCTION

Data leakage prevention (DLP [1]) is one of the essential problems in protecting personal and organizational sensitive information from being disclosed without official consent. The number of confidential data leaks is increasing every year [2]. According to the Global Data Leakage Report of the InfoWatch Analytical Center, 1,556 confidential data leaks were registered in 2016, 3.4% more than in 2015 [3]. In particular, more than 3.1 billion personal data records were compromised, up to three times more than in 2015 [3].

With the increasing amount of data created every day, it is not an easy task for people to classify and manage documents based on their sensitivity levels. Currently, most existing methods in DLP use regular expressions or data fingerprinting [4]. Thus these methods only analyze the formal representation, i.e. the data format, not the semantic content of a document [5]. This approach is appropriate for known patterns and templates of classified documents, such as contracts or agreements, but may not efficiently help people become aware of potential privacy leaks in informal documents, such as emails or personal notes.

In recent years, natural language processing (NLP) techniques, such as N-grams [4] or Named Entity Recognition [6], have been applied for document classification in DLP. In this paper, we follow this trend to better understand and exploit document content to classify document sensitivity. Inspired by ParagraphVector [7], one of the state-of-the-art methods for document representation, we apply our Twitter-based Doc2Vec model (TD2V [8]) to vectorize an arbitrary document or text fragment. Collecting more than one million tweets (from 2010 to 2017) on Twitter, we train TD2V from 422,351 English articles with 297,298,525 tokenized words [8]. Thus, our document embedding model is expected to be general and efficient enough to represent English documents and text fragments in various domains. Then, we propose a novel method to classify the sensitivity level of a document d using retrieval and automatic query expansion (AQE [9]). From the initial rank list containing the k nearest neighbors of d in a labeled sensitivity corpus S, we use Modified Distance [8] to re-rank documents. The sensitivity label for d is determined by a majority voting scheme over the top l documents in this re-ranked list.
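To make the pipeline above concrete, the following is a minimal sketch of the retrieval-and-voting scheme, assuming a trained gensim Doc2Vec model stands in for TD2V and plain cosine similarity stands in for Modified Distance [8] (whose definition is not given on this page); corpus_vecs and corpus_labels are hypothetical names for the vectorized labeled corpus S and its sensitivity labels.

from collections import Counter
import numpy as np
from gensim.models.doc2vec import Doc2Vec

def rank(query_vec, corpus_vecs):
    # Indices of corpus vectors sorted by cosine similarity to the query.
    sims = corpus_vecs @ query_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    return np.argsort(-sims)

def classify_sensitivity(model, corpus_vecs, corpus_labels, text, k=20, l=10):
    q = model.infer_vector(text.lower().split())   # 300-dimensional document vector
    top_k = rank(q, corpus_vecs)[:k]               # initial rank list of k nearest neighbors
    # Automatic query expansion: fold the initial hits back into the query,
    # then re-rank the k candidates with the expanded query.
    expanded = (q + corpus_vecs[top_k].mean(axis=0)) / 2.0
    reranked = top_k[rank(expanded, corpus_vecs[top_k])]
    # Majority vote over the top-l re-ranked neighbors decides the label.
    votes = Counter(corpus_labels[i] for i in reranked[:l])
    return votes.most_common(1)[0][0]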
Following the work by Hart et al. [10], we gather four different datasets, namely Dyncorp, Transcendental Meditation (TM), Mormon, and Enron. We also create a fifth dataset, Snowden. Our experiment on full document classification shows that our proposed method achieves an accuracy of more than 99.9% for 4 datasets (Snowden, Mormon, Dyncorp, TM) and 98.34% for the Enron dataset. Besides, we also conduct sensitivity classification for short text segments (512 bytes, 1 KB, 2 KB, and 4 KB), and our method achieves an accuracy of more than 99.7% for 4 datasets (Snowden, Mormon, Dyncorp, TM) and 98.84% for 1 dataset (Enron).

The main contributions of our work are as follows.

• We propose a novel method to classify the sensitivity of a document or a text fragment with two phases: text fragment vectorization with our pre-trained document

Smartcam
An object-detection surveillance camera

Mai-Khiem Tran1, Phuc-Nguyen Nguyen2, Hoang-Trieu Trinh2


1 Honored Program in Computer Science, Faculty of Information Technology, University of Science, VNU-HCM
2 Advanced Program in Computer Science, Faculty of Information Technology, University of Science, VNU-HCM

1612869@student.hcmus.edu.vn
{thtrieu, npnguyen}@apcs.vn

Abstract – In this paper, we propose Smartcam, a new surveillance solution based on object-detection techniques to achieve richer descriptiveness, higher storage efficiency, and a true anti-theft feature while keeping the bill of materials low and retaining flexibility for the user. Early calculations show that, in the best case, an event recorded by our solution uses only 0.0279% of the storage space normally used by traditional solutions. An estimation shows that over 5,100,000 events, recorded over roughly 3 years, can fit continuously in 8 gigabytes of storage with our solution.
Keywords – Object detection, surveillance, security

I. INTRODUCTION

IP camera surveillance is a popular method against the ever-increasing rate of social vices. Traditional security camera solutions record events to a video file. This mechanism imposes a tradeoff between the quality of the recorded video and its duration. While quality is important to distinguish small details in the video, duration also plays a major role with long ambient events. There are ways to mitigate these problems. Using a better encoding to further compress the video without affecting its quality, and hence allow longer videos to be stored, is an option, but only if the processor supports it natively at the hardware level. Encoding video is known to be a computationally expensive process, which consumes a lot of power even on embedded processors such as mobile application processors. Another option is to increase the storage space, but this is limited on surveillance systems with fixed internal storage.

II. EXISTING SOLUTIONS

To achieve true event-based recording, there currently exists an approach that involves a central server to which the live video feeds connect for object detection, since the process is so long and memory-intensive that it suits the hardware configuration a server typically has. But having a centralized server also means that there is a dependency on the server itself, which could cause trouble in case of technical failures such as power outages or maintenance downtime; the approach also naturally incurs high network bandwidth and power usage.

Another feature that surveillance systems provide to the user is motion detection, which raises an alarm if any difference is present between consecutive frames. The feature was originally intended to combat thieves, but there exist various sources of motion that are not necessarily thieves, such as tree leaves or pets. This renders the feature nearly useless unless the user explicitly lowers the sensitivity, or he will get false alarms in the middle of the night.

III. PROPOSED METHOD FOR SMARTCAM

Based on existing solutions, we take a step further and decentralize the surveillance network model. Theoretically, if the surveillance camera itself can determine the type of object being captured, it could also describe the events without depending on other machines. The description is text-based, with an image captured right at the moment the event happens and, optionally, a short video clip if the user desires one; therefore the metadata overhead is almost zero while the content loss is almost nothing. This leads to a better bytes-per-event ratio, and therefore drastically increases the number of events that can be stored in the same storage space.
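A back-of-the-envelope check of the figures quoted in the abstract; note that the roughly 6 MB traditional-clip footprint is our inference from the quoted ratio, not a number stated on this page:

GIB = 2**30
storage_bytes = 8 * GIB            # 8 GiB of on-device storage
events = 5_100_000                 # events claimed to fit in roughly 3 years

bytes_per_event = storage_bytes / events        # ~1,684 bytes per text+image event
# If one event takes 0.0279% of the space of a traditional recording,
# the implied traditional footprint per event is about 5.8 MiB of video.
traditional_bytes = bytes_per_event / 0.000279
seconds_in_3_years = 3 * 365 * 24 * 3600
print(f"{bytes_per_event:.0f} B/event, "
      f"traditional ~ {traditional_bytes / 2**20:.1f} MiB/event, "
      f"one event every {seconds_in_3_years / events:.1f} s on average")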
Personal Diary Generation from Wearable Cameras with
Concept Augmented Image Captioning and Wide Trail Strategy
Viet-Khoa Vo-Ho, Quoc-An Luong, Duy-Tam Nguyen, Mai-Khiem Tran, Minh-Triet Tran
Software Engineering Lab, Faculty of Information Technology, University of Science, VNU-HCM
vhvkhoa@selab.hcmus.edu.vn, lqan@selab.hcmus.edu.vn, ndtam@selab.hcmus.edu.vn, tmkhiem@selab.hcmus.edu.vn, tmtriet@hcmus.edu.vn
ABSTRACT

Writing a diary is not only a hobby but also provides a personal lifelog for better analysis and understanding of a user's daily activities and events. However, in a busy society, people may not have enough time to write all their social interactions in a diary. This motivates our proposal to develop a ubiquitous system that automatically generates a daily text diary using our novel method for image captioning from photos taken periodically by wearable cameras. We propose to incorporate common visual concepts extracted from a photo to enhance the details of the image description. We also propose a wide trail beam search strategy to enhance the naturalness of text captions. Our captioning method improves the results on the MSCOCO dataset on four metrics: BLEU, METEOR, ROUGE-L, and CIDEr. Compared to the method proposed by Xu et al. and Neuraltalk by Karpathy, our model has better performance on all four metrics. We also develop smart glasses and a prototype smart workplace in which people can have their personal diary generated from photos taken by the smart glasses. Furthermore, we also apply a transformer machine translation model to translate captions into Vietnamese. The results are promising and can be useful for Vietnamese users.

CCS CONCEPTS

• Information systems → Multimedia content creation; • Human-centered computing → Ubiquitous and mobile computing systems and tools;

KEYWORDS

Lifelog processing, Image captioning, Ubiquitous system

ACM Reference Format:
Viet-Khoa Vo-Ho, Quoc-An Luong, Duy-Tam Nguyen, Mai-Khiem Tran, and Minh-Triet Tran. 2018. Personal Diary Generation from Wearable Cameras with Concept Augmented Image Captioning and Wide Trail Strategy. In The Ninth International Symposium on Information and Communication Technology (SoICT 2018), December 6–7, 2018, Danang City, Viet Nam. ACM, Da Nang, Viet Nam, 8 pages. https://doi.org/10.1145/3287921.3287955

1 INTRODUCTION

People usually write in personal diaries the events they observe in their daily activities. However, in a busy society, people may not have enough time to record all events happening around them. Furthermore, there can be some events that people are not aware of or simply ignore because such events are not considered important when they happen. Therefore, it would be useful to develop a utility that assists people in generating a personal lifelog automatically from their daily activities and observations.

Lifelog data [5] can be in various media formats. It may include audio data recorded during conversations, photos or video clips captured by wearable or regular personal cameras, or even biometric data, such as heart rate or calorie burn. Visual lifelog data is one of the most essential sources for personal diary generation as it has rich potential information and is easy to collect.

Storing visual lifelogs in raw visual data format requires a large storage capacity. Besides, it is not easy for users to express their needs when querying for certain events from a visual personal diary. This motivates our proposal to automatically generate a personal diary in text format from photos taken periodically by wearable cameras.

Based on our method first presented in [15], we further improve it and propose a novel method for image captioning with concept augmentation and a wide trail strategy. There are two main key ideas in our image captioning method. First, we encode visual concepts detected from an image and integrate them into the visual feature for the image captioning process [15]. Second, we propose the wide trail strategy, a modified beam-search strategy, which maintains the top k best word-sequence candidates to find the optimal sequence of generated words. Our new method prefers complete sentences, so the generated caption is guaranteed to be a whole sentence instead of an incomplete phrase. This is the main difference between our new method and its previous version [15].
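The wide trail strategy is described here only at a high level. The sketch below shows a generic beam search that keeps the top-k candidates at each step and, whenever any hypothesis has emitted the end-of-sentence token, returns only complete sentences, which is the completeness preference named above; the decoder step function and token ids are hypothetical stand-ins, not the authors' code.

import heapq

def wide_trail_search(step, bos, eos, k=5, max_len=20):
    # step(seq) -> list of (token_id, log_prob) pairs for the next position.
    beams = [(0.0, [bos])]          # (cumulative log-prob, token sequence)
    finished = []                   # only complete sentences land here
    for _ in range(max_len):
        candidates = [(score + logp, seq + [tok])
                      for score, seq in beams
                      for tok, logp in step(seq)]
        # Keep the k best partial sequences ("trails") at every step.
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
        finished += [b for b in beams if b[1][-1] == eos]
        beams = [b for b in beams if b[1][-1] != eos]
        if not beams:
            break
    # Prefer a complete sentence; fall back to the best partial sequence
    # only if nothing reached the end token within max_len steps.
    pool = finished or beams
    return max(pool, key=lambda c: c[0] / len(c[1]))[1]   # length-normalized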
Applied Sciences (Article)
A Smart System for Text-Lifelog Generation from
Wearable Cameras in Smart Environment Using
Concept-Augmented Image Captioning with
Modified Beam Search Strategy †
Viet-Khoa Vo-Ho, Quoc-An Luong *, Duy-Tam Nguyen, Mai-Khiem Tran and Minh-Triet Tran *
Information Technology and Software Engineering Lab, VNUHCM—University of Science, Ho Chi Minh 800010,
Vietnam; vhvkhoa@selab.hcmus.edu.vn (V.-K.V.-H.); ndtam@selab.hcmus.edu.vn (D.-T.N.);
tmkhiem@selab.hcmus.edu.vn (M.-K.T.)
* Correspondence: lqan@selab.hcmus.edu.vn (Q.-A.L.); tmtriet@hcmus.edu.vn (M.-T.T.)
Tel.: +84-983-118-326 (Q.-A.L.)
† This article is an extended research of our previous work “Personal Diary Generation from Wearable Cameras
with Concept Augmented Image Captioning and Wide Trail Strategy”, awarded as “Best Paper” in Symposium on
Information and Communication Technology conference (SoICT 2018), DaNang City, Viet Nam,
6–7 December 2018.

Received: 2 March 2019; Accepted: 29 April 2019; Published: 8 May 2019

Featured Application: Our work can be applied as an IoT system to capture important events in daily life for later storage. From wearable devices with cameras, such as smart glasses, photos of events can be taken periodically and processed into descriptions in text format. The descriptions are then stored in a database on a server and can be retrieved via another smart device such as a smartphone. This lets users easily retrieve the information they want for sharing or reminiscence. The descriptions of photos taken each day can also be gathered as a diary. Furthermore, the database is also a rich resource for analyzing user behavior.
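As a rough illustration of this store-and-retrieve loop (the schema and query below are our own assumptions, not taken from the paper), each generated caption could be stored with its timestamp and later filtered by day to assemble the diary:

import sqlite3

db = sqlite3.connect("lifelog.db")
db.execute("CREATE TABLE IF NOT EXISTS captions (taken_at TEXT, caption TEXT)")
db.execute("INSERT INTO captions VALUES (?, ?)",
           ("2018-12-06T09:15:00", "a person is giving a talk in a meeting room"))

# One day's captions in chronological order: the raw material for a diary entry.
rows = db.execute("SELECT caption FROM captions "
                  "WHERE taken_at LIKE '2018-12-06%' ORDER BY taken_at")
print(". ".join(caption for (caption,) in rows))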

Abstract: During a lifetime, a person can have many wonderful and memorable moments that he/she wants to keep. With the development of technology, people now can store a massive amount of lifelog information via images, videos or texts. Inspired by this, we develop a system to automatically generate captions from lifelog pictures taken by wearable cameras. Following up on our previous method introduced at the SoICT 2018 conference, we propose two improvements in our captioning method. We trained and tested the model on the baseline MSCOCO dataset and evaluated it on different metrics. The results show better performance compared to our previous model and to some other image captioning methods. Our system also shows effectiveness in retrieving relevant data from captions and achieves a high rank in the ImageCLEF 2018 retrieval challenge.

Keywords: lifelog processing; image captioning; IoT system

1. Introduction

People usually want to keep footage of the events that happen around them for many purposes such as reminiscence [1], retrieval [2] or verification [3]. However, it is not always convenient for them to record those events because they do not have the time or the tools at that moment. People may also miss some events because they do not consider those events important or worth keeping until later. With the development of technology, especially IoT systems, smart environments such as smart homes and smart offices can be established to give people easy access to ubiquitous services. In a



Vehicle Re-identification with Learned Representation and Spatial Verification
and Abnormality Detection with Multi-Adaptive Vehicle Detectors
for Traffic Video Analysis

Khac-Tuan Nguyen1, Trung-Hieu Hoang1, Minh-Triet Tran∗1, Trung-Nghia Le3, Ngoc-Minh Bui1, Trong-Le Do1, Viet-Khoa Vo-Ho1, Quoc-An Luong1, Mai-Khiem Tran1, Thanh-An Nguyen1, Thanh-Dat Truong1, Vinh-Tiep Nguyen2, and Minh N. Do4

1 University of Science, VNU-HCM, Vietnam
2 University of Information Technology, VNU-HCM, Vietnam
3 University of Tokyo, Japan
4 University of Illinois at Urbana-Champaign, U.S.

Abstract

Traffic flow analysis is essential for intelligent transportation systems. In this paper, we propose methods for two challenging problems in traffic flow analysis: vehicle re-identification and abnormal event detection. For the first problem, we propose to combine learned high-level features for vehicle instance representation with hand-crafted local features for spatial verification. For the second problem, we propose to use multiple adaptive vehicle detectors for anomaly proposal and use heuristic properties extracted from anomaly proposals to determine anomaly events. Experiments on the traffic flow analysis datasets from AI City Challenge 2019 show that our methods achieve mAP of 0.4008 for vehicle re-identification in Track 2, and can detect abnormal events with very high accuracy (F1 = 0.9429) in Track 3.

1. Introduction

To develop an intelligent transportation system (ITS) for a smart society, there is a practical and urgent need to analyze traffic flow to extract meaningful information for management, prediction, simulation, and planning. Various tasks in traffic video analysis are becoming popular, such as vehicle type classification [12, 34], vehicle localization [41, 10], velocity estimation [9, 11], vehicle tracking [4], car fluent recognition [13], vehicle re-identification [21, 1, 32], or abnormal event detection [31, 45].

In this paper, we focus on two challenging real-world problems presented in AI City Challenge 2019, namely vehicle re-identification and anomaly detection.

For vehicle re-identification, our proposed method has three main components. First, we employ a deep representation for each vehicle instance. Second, we extract various attributes of a vehicle instance from a photo or tracklet (an image set of a single vehicle instance) for an adaptive strategy to retrieve candidate instances/tracklets similar to a given one. Finally, we propose to use a Bag-of-Words approach with local features for spatial verification and re-ranking [26, 27].
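Spatial verification is only named on this page; a common recipe for such re-ranking, shown below as an assumption rather than the authors' exact procedure, matches local features between the query and each candidate and scores candidates by the number of RANSAC-consistent matches:

import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def inlier_count(img_a, img_b):
    # Count geometrically consistent local-feature matches between two images.
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]  # ratio test
    if len(good) < 4:                       # a homography needs at least 4 points
        return 0
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0

def rerank(query_img, candidate_imgs):
    # Candidates with more spatially consistent matches move to the front.
    return sorted(candidate_imgs,
                  key=lambda c: inlier_count(query_img, c), reverse=True)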
For anomaly detection, we aim to localize and track anomaly proposals, i.e. stalled vehicles on roads. First, stable scenes and adaptive detection strategies are exploited through day-night detection as well as dynamic scene detection. Second, we employ background modeling [45] to eliminate moving vehicles and then localize stalled vehicles. We adopt our proposed solution with multiple adaptive vehicle detectors for anomaly proposal to adapt to different contexts from traffic cameras. Finally, we propose to detect and track abnormal events across scenes based on heuristic properties extracted from anomaly proposals.
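As an illustration of the background-modeling step (a generic OpenCV MOG2 sketch under our own assumptions, not the authors' pipeline), moving vehicles are averaged out of the learned background, so a vehicle that remains detectable in the background image is a natural stalled-vehicle proposal; detect_vehicles is a hypothetical detector returning bounding boxes:

import cv2

def stalled_vehicle_proposals(video_path, detect_vehicles):
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500)
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        subtractor.apply(frame)          # update the running background model
    cap.release()
    background = subtractor.getBackgroundImage()   # moving vehicles averaged out
    # Anything a detector still finds in the background image has been static
    # for a long time: a candidate stalled vehicle.
    return detect_vehicles(background) if background is not None else []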
We also report our results on AI City Challenge 2019. In Track 2 for vehicle re-identification, we achieve 0.4008 on mAP, the 25th place out of 84 team submissions. In Track 3 for anomaly detection, we take the 8th place out of 23 team submissions with 0.61 on S3 score. We remark that our method can detect abnormal events with high accuracy (F1 = 0.9429) in Track 3.

∗ Corresponding author. Email: tmtriet@fit.hcmus.edu.vn

The remainder of this paper is organized as follows. Sec-
Eurographics Workshop on 3D Object Retrieval (2019)
S. Biasotti and G. Lavoué (Editors)

SHREC 2019 - Monocular Image Based 3D Model Retrieval

Wenhui Li1†, Anan Liu1∗†, Weizhi Nie1∗†, Dan Song1†, Yuqian Li1†, Weijie Wang1†, Shu Xiang1†, Heyu Zhou1†,
Ngoc-Minh Bui2, Yunchi Cen3, Zenian Chen3, Huy-Hoang Chung-Nguyen2, Gia-Han Diep2, Trong-Le Do2, Eugeni L. Doubrovski4, Anh-Duc Duong5, Jo M.P. Geraedts4, Haobin Guo6, Trung-Hieu Hoang2, Yichen Li7, Xing Liu9, Zishun Liu4, Duc-Tuan Luu2, Yunsheng Ma10, Vinh-Tiep Nguyen5, Jie Nie11, Tongwei Ren6, Mai-Khiem Tran2, Son-Thanh Tran-Nguyen2, Minh-Triet Tran2, The-Anh Vu-Le2, Charlie C.L. Wang8, Shijie Wang9, Gangshan Wu6, Caifei Yang9, Meng Yuan11, Hao Zhai7, Ao Zhang6, Fan Zhang3, Sicheng Zhao10
1 School of Electrical and Information Engineering, Tianjin University, China.
2 University of Science, VNU-HCM, Vietnam.
3 State Key Laboratory of Virtual Reality Technology and System, Beihang University, China.
4 Delft University of Technology, Netherlands.
5 University of Information Technology, VNU-HCM, Vietnam.
6 Nanjing University, China.
7 SuoAo Technology Center, SAEE, University of Science and Technology Beijing, China.
8 Chinese University of Hong Kong, China.
9 School of Software, Dalian University of Technology, China.
10 Department of Electrical Engineering and Computer Sciences, University of California Berkeley, USA.
11 Ocean University of China, China.

† Track organizer. ∗ Corresponding author. Email: anan0422@gmail.com and weizhinie@tju.edu.cn.

Abstract

Monocular image based 3D object retrieval is a novel and challenging research topic in the field of 3D object retrieval. Given an RGB image captured in the real world, it aims to search for relevant 3D objects in a dataset. To advance this promising research, we organize this SHREC track and build the first monocular image based 3D object retrieval benchmark by collecting 2D images from ImageNet and 3D objects from popular 3D datasets such as NTU, PSB, ModelNet40 and ShapeNet. The benchmark contains 21,000 classified 2D images and 7,690 3D objects of 21 categories. This track attracted 9 groups from 4 countries and the submission of 20 runs. To have a comprehensive comparison, 7 commonly-used retrieval performance metrics have been used to evaluate retrieval performance. We hope that this publicly available benchmark, the comparative evaluation results and the corresponding evaluation code will further enrich and boost the research of monocular image based 3D object retrieval and its applications.

Categories and Subject Descriptors (according to ACM CCS): H.3.3 [Computer Graphics]: Information Systems—Information Search and Retrieval
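The seven metrics are not enumerated on this page; as a generic illustration of how retrieval metrics are computed from a ranked result list, the sketch below evaluates nearest-neighbor precision and average precision, two measures commonly used in SHREC evaluations:

def nearest_neighbor(ranked_labels, query_label):
    # 1.0 if the top-ranked object shares the query's category, else 0.0.
    return float(ranked_labels[0] == query_label)

def average_precision(ranked_labels, query_label, total_relevant):
    # Precision accumulated at the rank of each relevant result, normalized by
    # the number of relevant objects in the whole collection; mAP is the mean
    # of this value over all queries.
    hits, ap = 0, 0.0
    for i, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            ap += hits / i
    return ap / total_relevant if total_relevant else 0.0

# Example: a query of category 3 against five retrieved objects, with four
# relevant objects existing in the whole collection.
ranked = [3, 1, 3, 3, 2]
print(nearest_neighbor(ranked, 3), average_precision(ranked, 3, total_relevant=4))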

1. Introduction

As the rapid development of 3D technologies for modeling, reconstruction, printing and so on has produced an increasing number of 3D models, 3D model retrieval becomes more and more important. Monocular image based 3D object retrieval (MI3DOR) aims to retrieve 3D objects using an RGB image captured in the real world. It helps users get access to valuable 3D models through easily available 2D images, which is significant and promising.

However, few works focus on MI3DOR, for the following two reasons: (1) the lack of related retrieval benchmarks, and (2) the gap between the two modalities, which makes the problem extremely challenging.

The fundamental challenge in cross-modal retrieval lies in the heterogeneity of the different modalities of data. In recent years, some efforts have been made to bridge the gap between different modalities and different domains, such as text-to-image retrieval and image-to-image domain adaptation. SHREC18'IBR [ARYL∗18] aims to search for relevant 3D scenes with a 2D scene image, which is also a cross-modal retrieval task. Compared with SHREC18'IBR, this track has the following different aspects: (1) Different from collecting scene images and models, we focus on individual objects, which is useful for many applications related to 3D objects. (2) We contribute a dataset with more data and more categories, which makes the retrieval task based on this dataset more convincing.

In summary, the objective of this track is to retrieve 3D objects using a 2D monocular image. Our collection is composed of 21,000

