
Advances in Multimedia Information Processing – PCM 2016: 17th Pacific Rim Conference on Multimedia, Xi'an, China, September 15–16, 2016, Proceedings, Part II, 1st Edition, Enqing Chen

Visit to download the full and correct content document:
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2016-17th-pacific-rim-conference-on-multimedia-xi-an-china-september-15-16-2016-proceedings-part-ii-1st-edition-enqing-chen/

More digital products (PDF, EPUB, MOBI) available for instant download that may interest you:

Advances in Multimedia Information Processing – PCM 2016: 17th Pacific Rim Conference on Multimedia, Xi'an, China, September 15–16, 2016, Proceedings, Part I, 1st Edition, Enqing Chen
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2016-17th-pacific-rim-conference-on-multimedia-xi-an-china-september-15-16-2016-proceedings-part-i-1st-edition-enqing-chen/

Advances in Multimedia Information Processing – PCM 2018: 19th Pacific-Rim Conference on Multimedia, Hefei, China, September 21-22, 2018, Proceedings, Part II, Richang Hong
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2018-19th-pacific-rim-conference-on-multimedia-hefei-china-september-21-22-2018-proceedings-part-ii-richang-hong/

Advances in Multimedia Information Processing – PCM 2018: 19th Pacific-Rim Conference on Multimedia, Hefei, China, September 21-22, 2018, Proceedings, Part III, Richang Hong
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2018-19th-pacific-rim-conference-on-multimedia-hefei-china-september-21-22-2018-proceedings-part-iii-richang-hong/

Advances in Multimedia Information Processing – PCM 2017: 18th Pacific-Rim Conference on Multimedia, Harbin, China, September 28-29, 2017, Revised Selected Papers, Part II, Bing Zeng
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2017-18th-pacific-rim-conference-on-multimedia-harbin-china-september-28-29-2017-revised-selected-

Neural Information Processing: 23rd International Conference, ICONIP 2016, Kyoto, Japan, October 16-21, 2016, Proceedings, Part IV, 1st Edition, Akira Hirose
https://textbookfull.com/product/neural-information-processing-23rd-international-conference-iconip-2016-kyoto-japan-october-16-21-2016-proceedings-part-iv-1st-edition-akira-hirose/

MultiMedia Modeling: 22nd International Conference, MMM 2016, Miami, FL, USA, January 4-6, 2016, Proceedings, Part I, 1st Edition, Qi Tian
https://textbookfull.com/product/multimedia-modeling-22nd-international-conference-mmm-2016-miami-fl-usa-january-4-6-2016-proceedings-part-i-1st-edition-qi-tian/

Web-Age Information Management: 17th International Conference, WAIM 2016, Nanchang, China, June 3-5, 2016, Proceedings, Part I, 1st Edition, Bin Cui
https://textbookfull.com/product/web-age-information-management-17th-international-conference-waim-2016-nanchang-china-june-3-5-2016-proceedings-part-i-1st-edition-bin-cui/

Perspectives in Business Informatics Research: 15th International Conference, BIR 2016, Prague, Czech Republic, September 15-16, 2016, Proceedings, 1st Edition, Václav Řepa
https://textbookfull.com/product/perspectives-in-business-informatics-research-15th-international-conference-bir-2016-prague-czech-republic-september-15-16-2016-proceedings-1st-edition-vaclav-repa/

Building Sustainable Health Ecosystems: 6th International Conference on Well-Being in the Information Society, WIS 2016, Tampere, Finland, September 16-18, 2016, Proceedings, 1st Edition, Hongxiu Li
https://textbookfull.com/product/building-sustainable-health-ecosystems-6th-international-conference-on-well-being-in-the-information-society-wis-2016-tampere-finland-
Enqing Chen · Yihong Gong · Yun Tie (Eds.)

LNCS 9917

Advances in Multimedia Information Processing – PCM 2016
17th Pacific-Rim Conference on Multimedia
Xi'an, China, September 15–16, 2016
Proceedings, Part II
Lecture Notes in Computer Science 9917
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Enqing Chen · Yihong Gong · Yun Tie (Eds.)

Advances in Multimedia Information Processing – PCM 2016
17th Pacific-Rim Conference on Multimedia
Xi'an, China, September 15–16, 2016
Proceedings, Part II
Editors
Enqing Chen, Zhengzhou University, Zhengzhou, China
Yihong Gong, Xi'an Jiaotong University, Xi'an, China
Yun Tie, Zhengzhou University, Zhengzhou, China

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-319-48895-0 ISBN 978-3-319-48896-7 (eBook)
DOI 10.1007/978-3-319-48896-7

Library of Congress Control Number: 2016959170

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 17th Pacific-Rim Conference on Multimedia (PCM 2016) was held in Xi’an,
China, during September 15–16, 2016, and hosted by the Xi’an Jiaotong University
(XJTU). PCM is a leading international conference for researchers and industry
practitioners to share their new ideas, original research results, and practical devel-
opment experiences from all multimedia-related areas.
It was a great honor for XJTU to host PCM 2016, one of the most longstanding
multimedia conferences, in Xi’an, China. Xi’an Jiaotong University, located in the
capital of Shaanxi province, is one of the key universities run by the Ministry of
Education, China. Recently its multimedia-related research has been attracting
increasing attention from the local and international multimedia community. For over
2000 years, Xi’an has been the center for political and economic developments and the
capital city of many Chinese dynasties, with the richest cultural and historical heritage,
including the world-famous Terracotta Warriors, Big Wild Goose Pagoda, etc. We
hope that our venue made PCM 2016 a memorable experience for all participants.
PCM 2016 featured a comprehensive program. The 202 submissions from authors
of more than ten countries included a large number of high-quality papers in multi-
media content analysis, multimedia signal processing and communications, and mul-
timedia applications and services. We thank our 28 Technical Program Committee
members who spent many hours reviewing papers and providing valuable feedback to
the authors. From the total of 202 submissions to the main conference and based on at
least three reviews per submission, the program chairs decided to accept 111 regular
papers (54 %), among which 67 were posters (33 %). This volume of the conference
proceedings contains the abstracts of two invited talks and all the regular, poster, and
special session papers.
The technical program is an important aspect but only achieves its full impact if
complemented by challenging keynotes. We are extremely pleased and grateful to have
had two exceptional keynote speakers, Wen Gao and Alex Hauptmann, accept our
invitation and present interesting ideas and insights at PCM 2016.
We are also heavily indebted to many individuals for their significant contributions.
We thank the PCM Steering Committee for their invaluable input and guidance on
crucial decisions. We wish to acknowledge and express our deepest appreciation to the
honorary chairs, Nanning Zheng and Shin'ichi Satoh; general chairs, Yihong Gong, Thomas
Plagemann, Ke Lu, and Jianping Fan; program chairs, Meng Wang, Qi Tian, Abdulmotaleb
El Saddik, and Yun Tie; organizing chairs, Jinye Peng, Xinbo Gao, Ziyu Guan, and
Yizhou Wang; publicity chairs, Xueming Qian, Xiaojiang Chen, Cheng Jin, and Xiangyang
Xue; publication chairs, Jun Wu and Enqing Chen; local arrangements chairs, Kuizi Mei
and Xuguang Lan; special session chairs, Jianbing Shen, Jialie Shen, and Jianru Xue; demo
chairs, Yugang Jiang and Jitao Sang; and finance and registration chair, Shuchan Gao. Without
their efforts and enthusiasm, PCM 2016 would not have become a reality. Moreover,
we want to thank our sponsors: Springer, Peking University, Zhengzhou University,
and Ryerson University. Finally, we wish to thank all committee members, reviewers,
session chairs, student volunteers, and supporters. Their contributions are much
appreciated.

September 2016

Meng Wang
Yun Tie
Qi Tian
Abdulmotaleb El Saddik
Yihong Gong
Thomas Plagemann
Ke Lu
Jianping Fan
Organization

Honorary Chairs
Nanning Zheng Xi’an Jiaotong University, China
Shin'ichi Satoh National Institute of Informatics, Japan

General Chairs
Yihong Gong Xi’an Jiaotong University, China
Thomas Plagemann University of Oslo, Norway
Ke Lu University of Chinese Academy of Sciences, China
Jianping Fan University of North Carolina at Charlotte, USA

Program Chairs
Meng Wang Hefei University of Technology, China
Qi Tian University of Texas at San Antonio, USA
Abdulmotaleb El Saddik University of Ottawa, Canada
Yun Tie Zhengzhou University, China

Organizing Chairs
Jinye Peng Northwest University, China
Xinbo Gao Xidian University, China
Ziyu Guan Northwest University, China
Yizhou Wang Peking University, China

Publicity Chairs
Xueming Qian Xi’an Jiaotong University, China
Xiaojiang Chen Northwest University, China
Cheng Jin Fudan University, China
Xiangyang Xue Fudan University, China

Publication Chairs
Jun Wu Northwestern Polytechnical University, China
Enqing Chen Zhengzhou University, China
Local Arrangements Chairs


Kuizi Mei Xi’an Jiaotong University, China
Xuguang Lan Xi’an Jiaotong University, China

Special Session Chairs


Jianbing Shen Beijing Institute of Technology, China
Jialie Shen Singapore Management University, Singapore
Jianru Xue Xi’an Jiaotong University, China

Demo Chairs
Yugang Jiang Fudan University, China
Jitao Sang Institute of Automation, Chinese Academy of Sciences,
China

Finance and Registration Chair


Shuchan Gao Xi’an Jiaotong University, China
Contents – Part II

A Global-Local Approach to Extracting Deformable Fashion Items from Web Images . . . 1
Lixuan Yang, Helena Rodriguez, Michel Crucianu, and Marin Ferecatu

Say Cheese: Personal Photography Layout Recommendation Using 3D Aesthetics Estimation . . . 13
Ben Zhang, Ran Ju, Tongwei Ren, and Gangshan Wu

Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints . . . 24
Yinan Li, Xiongwei Zhang, Meng Sun, Xushan Chen, and Lin Qiao

Facial Animation Based on 2D Shape Regression . . . 33
Ruibin Bai, Qiqi Hou, Jinjun Wang, and Yihong Gong

A Deep CNN with Focused Attention Objective for Integrated Object Recognition and Localization . . . 43
Xiaoyu Tao, Chenyang Xu, Yihong Gong, and Jinjun Wang

An Accurate Measurement System for Non-cooperative Spherical Target Based on Calibrated Lasers . . . 54
Hang Dong, Fei Wang, Haiwei Yang, Zhongheng Li, and Yanan Chen

Integrating Supervised Laplacian Objective with CNN for Object Recognition . . . 64
Weiwei Shi, Yihong Gong, Jinjun Wang, and Nanning Zheng

Automatic Color Image Enhancement Using Double Channels . . . 74
Na Li, Zhao Liu, Jie Lei, Mingli Song, and Jiajun Bu

Deep Ranking Model for Person Re-identification with Pairwise Similarity Comparison . . . 84
Sanping Zhou, Jinjun Wang, Qiqi Hou, and Yihong Gong

Cluster Enhanced Multi-task Learning for Face Attributes Feature Selection . . . 95
Yuchun Fang and Xiaoda Jiang

Triple-Bit Quantization with Asymmetric Distance for Nearest Neighbor Search . . . 105
Han Deng, Hongtao Xie, Wei Ma, Qiong Dai, Jianjun Chen, and Ming Lu

Creating Spectral Words for Large-Scale Hyperspectral Remote Sensing Image Retrieval . . . 116
Wenhao Geng, Jing Zhang, Li Zhuo, Jihong Liu, and Lu Chen

Rapid Vehicle Retrieval Using a Cascade of Interest Regions . . . 126
Yuanqi Su, Bonan Cuan, Xingjun Zhang, and Yuehu Liu

Towards Drug Counterfeit Detection Using Package Paperboard Classification . . . 136
Christof Kauba, Luca Debiasi, Rudolf Schraml, and Andreas Uhl

Dynamic Strategies for Flow Scheduling in Multihoming Video CDNs . . . 147
Ming Ma, Zhi Wang, Yankai Zhang, and Lifeng Sun

Homogenous Color Transfer Using Texture Retrieval and Matching . . . 159
Chang Xing, Hai Ye, Tao Yu, and Zhong Zhou

Viewpoint Estimation for Objects with Convolutional Neural Network Trained on Synthetic Images . . . 169
Yumeng Wang, Shuyang Li, Mengyao Jia, and Wei Liang

Depth Extraction from a Light Field Camera Using Weighted Median Filtering . . . 180
Changtian Sun and Gangshan Wu

Scale and Topology Preserving SIFT Feature Hashing . . . 190
Chen Kang, Li Zhu, and Xueming Qian

Hierarchical Traffic Sign Recognition . . . 200
Yanyun Qu, Siying Yang, Weiwei Wu, and Li Lin

Category Aggregation Among Region Proposals for Object Detection . . . 210
Linghui Li, Sheng Tang, Jianshe Zhou, Bin Wang, and Qi Tian

Exploiting Local Feature Fusion for Action Recognition . . . 221
Jie Miao, Xiangmin Xu, Xiaoyi Jia, Haoyu Huang, Bolun Cai, Chunmei Qing, and Xiaofen Xing

Improving Image Captioning by Concept-Based Sentence Reranking . . . 231
Xirong Li and Qin Jin

Blind Image Quality Assessment Based on Local Quantized Pattern . . . 241
Yazhong Zhang, Jinjian Wu, Xuemei Xie, and Guangming Shi

Sign Language Recognition with Multi-modal Features . . . 252
Junfu Pu, Wengang Zhou, and Houqiang Li

Heterogeneous Convolutional Neural Networks for Visual Recognition . . . 262
Xiangyang Li, Luis Herranz, and Shuqiang Jiang

Recognition Oriented Feature Hallucination for Low Resolution Face Images . . . 275
Guangheng Jia, Xiaoguang Li, Li Zhuo, and Li Liu

Learning Robust Multi-Label Hashing for Efficient Image Retrieval . . . 285
Haibao Chen, Yuyan Zhao, Lei Zhu, Guilin Chen, and Kaichuan Sun

A Second-Order Approach for Blind Motion Deblurring by Normalized l1 Regularization . . . 296
Zedong Chen, Faming Fang, Yingying Xu, and Chaomin Shen

Abnormal Event Detection and Localization by Using Sparse Coding and Reconstruction . . . 306
Jing Xue, Yao Lu, and Haohao Jiang

Real-Time Video Dehazing Based on Spatio-Temporal MRF . . . 315
Bolun Cai, Xiangmin Xu, and Dacheng Tao

Dynamic Contour Matching for Lossy Screen Content Picture Intra Coding . . . 326
Hu Yuan, Tao Pin, and Yuanchun Shi

A Novel Hard-Decision Quantization Algorithm Based on Adaptive Deadzone Offset Model . . . 335
Hongkui Wang, Haibing Yin, and Ye Shen

Comparison of Information Loss Architectures in CNNs . . . 346
Song Wu and Michael S. Lew

Fast-Gaussian SIFT for Fast and Accurate Feature Extraction . . . 355
Liu Ke, Jun Wang, and Zhixian Ye

An Overview+Detail Surveillance Video Player: Information-Based Adaptive Fast-Forward . . . 366
Lele Dong, Qing Xu, Shang Wu, Xueyan Song, Klaus Schoeffmann, and Mateu Sbert

Recurrent Double Features: Recurrent Multi-scale Deep Features and Saliency Features for Salient Object Detection . . . 376
Ziqin Wang, Peilin Jiang, Fei Wang, and Xuetao Zhang

Key Frame Extraction Based on Motion Vector . . . 387
Ziqian Qiang, Qing Xu, Shihua Sun, and Mateu Sbert

Haze Removal Technology Based on Physical Model . . . 396
Yunqian Cui and Xinguang Xiang

Robust Uyghur Text Localization in Complex Background Images . . . 406
Jianjun Chen, Yun Song, Hongtao Xie, Xi Chen, Han Deng, and Yizhi Liu

Learning Qualitative and Quantitative Image Quality Assessment . . . 417
Yudong Liang, Jinjun Wang, Ze Yang, Yihong Gong, and Nanning Zheng

An Analysis-Oriented ROI Based Coding Approach on Surveillance Video Data . . . 428
Liang Liao, Ruimin Hu, Jing Xiao, Gen Zhan, Yu Chen, and Jun Xiao

A Stepwise Frontal Face Synthesis Approach for Large Pose Non-frontal Facial Image . . . 439
Xueli Wei, Ruimin Hu, Zhen Han, Liang Chen, and Xin Ding

Nonlinear PCA Network for Image Classification . . . 449
Xiao Zhang and Youtian Du

Salient Object Detection in Video Based on Dynamic Attention Center . . . 458
Mengling Shao, Ruimin Hu, Xu Wang, Zhongyuan Wang, Jing Xiao, and Ge Gao

Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation . . . 469
Wei Han, Xiongwei Zhang, Jibin Yang, Meng Sun, and Gang Min

Automatic Extraction and Construction Algorithm of Overpass from Raster Maps . . . 479
Xincan Zhao, Yaodan Liu, and Yaping Wang

Geometric and Tongue-Mouth Relation Features for Morphology Analysis of Tongue Body . . . 490
Qing Cui, Xiaoqiang Li, Jide Li, and Yin Zhang

Perceptual Asymmetric Video Coding for 3D-HEVC . . . 498
Yongfang Wang, Kanghua Zhu, Yawen Shi, and Pamela C. Cosman

Recognition of Chinese Sign Language Based on Dynamic Features Extracted by Fast Fourier Transform . . . 508
Zhengchao Zhang, Xiankang Qin, Xiaocong Wu, Feng Wang, and Zhiyong Yuan

Enhanced Joint Trilateral Up-sampling for Super-Resolution . . . 518
Liang Yuan, Xin Jin, and Chun Yuan

Learning to Recognize Hand-Held Objects from Scratch . . . 527
Xue Li, Shuqiang Jiang, Xiong Lv, and Chengpeng Chen

Audio Bandwidth Extension Using Audio Super-Resolution . . . 540
Jiang Lin, Hu Ruimin, Wang Xiaochen, and Tu Weiping

Jointly Learning a Multi-class Discriminative Dictionary for Robust Visual Tracking . . . 550
Zhao Liu, Mingtao Pei, Chi Zhang, and Mingda Zhu

Product Image Search with Deep Attribute Mining and Re-ranking . . . 561
Xin Zhou, Yuqi Zhang, Xiuxiu Bai, Jihua Zhu, Li Zhu, and Xueming Qian

A New Rate Control Algorithm Based on Region of Interest for HEVC . . . 571
Liquan Shen, Qianqian Hu, Zhi Liu, and Ping An

Deep Learning Features Inspired Saliency Detection of 3D Images . . . 580
Qiudan Zhang, Xu Wang, Jianmin Jiang, and Lin Ma

No-Reference Quality Assessment of Camera-Captured Distortion Images . . . 590
Lijuan Tang, Leida Li, Ke Gu, Jiansheng Qian, and Jianying Zhang

GIP: Generic Image Prior for No Reference Image Quality Assessment . . . 600
Qingbo Wu, Hongliang Li, and King N. Ngan

Objective Quality Assessment of Screen Content Images by Structure Information . . . 609
Yuming Fang, Jiebin Yan, Jiaying Liu, Shiqi Wang, Qiaohong Li, and Zongming Guo

CrowdTravel: Leveraging Heterogeneous Crowdsourced Data for Scenic Spot Profiling and Recommendation . . . 617
Tong Guo, Bin Guo, Jiafan Zhang, Zhiwen Yu, and Xingshe Zhou

Context-Oriented Name-Face Association in Web Videos . . . 629
Zhineng Chen, Wei Zhang, Hongtao Xie, Bailan Feng, and Xiaoyan Gu

Social Media Profiler: Inferring Your Social Media Personality from Visual Attributes in Portrait . . . 640
Jie Nie, Lei Huang, Peng Cui, Zhen Li, Yan Yan, Zhiqiang Wei, and Wenwu Zhu

SSFS: A Space-Saliency Fingerprint Selection Framework for Crowdsourcing Based Mobile Location Recognition . . . 650
Hao Wang, Dong Zhao, Huadong Ma, and Huaiyu Xu

Multi-view Multi-object Tracking Based on Global Graph Matching Structure . . . 660
Chao Li, Shantao Ping, Hao Sheng, Jiahui Chen, and Zhang Xiong

Accelerating Large-Scale Human Action Recognition with GPU-Based Spark . . . 670
Hanli Wang, Xiaobin Zheng, and Bo Xiao

Adaptive Multi-class Correlation Filters . . . 680
Linlin Yang, Chen Chen, Hainan Wang, Baochang Zhang, and Jungong Han

Deep Neural Networks for Free-Hand Sketch Recognition . . . 689
Yuqi Zhang, Yuting Zhang, and Xueming Qian

Fusion of Thermal and Visible Imagery for Effective Detection and Tracking of Salient Objects in Videos . . . 697
Yijun Yan, Jinchang Ren, Huimin Zhao, Jiangbin Zheng, Ezrinda Mohd Zaihidee, and John Soraghan

RGB-D Camera based Human Limb Movement Recognition and Tracking in Supine Positions . . . 705
Jun Wu, Cailiang Kuang, Kai Zeng, Wenjing Qiao, Fan Zhang, Xiaobo Zhang, and Zhisheng Xu

Scene Parsing with Deep Features and Spatial Structure Learning . . . 715
Hui Yu, Yuecheng Song, Wenyu Ju, and Zhenbao Liu

Semi-supervised Learning for Human Pose Recognition with RGB-D Light-Model . . . 723
Xinbo Wang, Guoshan Zhang, Dahai Yu, and Dan Liu

Author Index . . . 739


Contents – Part I

Visual Tracking by Local Superpixel Matching with Markov Random Field . . . 1
Heng Fan, Jinhai Xiang, and Zhongmin Chen

Saliency Detection Combining Multi-layer Integration Algorithm with Background Prior and Energy Function . . . 11
Chenxing Xia and Hanling Zhang

Facial Landmark Localization by Part-Aware Deep Convolutional Network . . . 22
Keke He and Xiangyang Xue

On Combining Compressed Sensing and Sparse Representations for Object Tracking . . . 32
Hang Sun, Jing Li, Bo Du, and Dacheng Tao

Leaf Recognition Based on Binary Gabor Pattern and Extreme Learning Machine . . . 44
Huisi Wu, Jingjing Liu, Ping Li, and Zhenkun Wen

Sparse Representation Based Histogram in Color Texture Retrieval . . . 55
Cong Bai, Jia-nan Chen, Jinglin Zhang, Kidiyo Kpalma, and Joseph Ronsin

Improving Image Retrieval by Local Feature Reselection with Query Expansion . . . 65
Hanli Wang and Tianyao Sun

Sparse Subspace Clustering via Closure Subgraph Based on Directed Graph . . . 75
Yuefeng Ma and Xun Liang

Robust Lip Segmentation Based on Complexion Mixture Model . . . 85
Yangyang Hu, Hong Lu, Jinhua Cheng, Wenqiang Zhang, Fufeng Li, and Weifei Zhang

Visual BFI: An Exploratory Study for Image-Based Personality Test . . . 95
Jitao Sang, Huaiwen Zhang, and Changsheng Xu

Fast Cross-Scenario Clothing Retrieval Based on Indexing Deep Features . . . 107
Zongmin Li, Yante Li, Yongbiao Gao, and Yujie Liu

3D Point Cloud Encryption Through Chaotic Mapping . . . 119
Xin Jin, Zhaoxing Wu, Chenggen Song, Chunwei Zhang, and Xiaodong Li

Online Multi-Person Tracking Based on Metric Learning . . . 130
Changyong Yu, Min Yang, Yanmei Dong, Mingtao Pei, and Yunde Jia

A Low-Rank Tensor Decomposition Based Hyperspectral Image Compression Algorithm . . . 141
Mengfei Zhang, Bo Du, Lefei Zhang, and Xuelong Li

Moving Object Detection with ViBe and Texture Feature . . . 150
Yumin Tian, Dan Wang, Peipei Jia, and Jinhui Liu

Leveraging Composition of Object Regions for Aesthetic Assessment of Photographs . . . 160
Hong Lu, Zeping Yao, Yunhan Bai, Zhibin Zhu, Bohong Yang, Lukun Chen, and Wenqiang Zhang

Video Affective Content Analysis Based on Protagonist via Convolutional Neural Network . . . 170
Yingying Zhu, Zhengbo Jiang, Jianfeng Peng, and Sheng-hua Zhong

Texture Description Using Dual Tree Complex Wavelet Packets . . . 181
M. Liedlgruber, M. Häfner, J. Hämmerle-Uhl, and A. Uhl

Fast and Accurate Image Denoising via a Deep Convolutional-Pairs Network . . . 191
Lulu Sun, Yongbing Zhang, Wangpeng An, Jingtao Fan, Jian Zhang, Haoqian Wang, and Qionghai Dai

Traffic Sign Recognition Based on Attribute-Refinement Cascaded Convolutional Neural Networks . . . 201
Kaixuan Xie, Shiming Ge, Qiting Ye, and Zhao Luo

Building Locally Discriminative Classifier Ensemble Through Classifier Fusion Among Nearest Neighbors . . . 211
Xiang-Jun Shen, Wen-Chao Zhang, Wei Cai, Ben-Bright B. Benuw, He-Ping Song, Qian Zhu, and Zheng-Jun Zha

Retrieving Images by Multiple Samples via Fusing Deep Features . . . 221
Kecai Wu, Xueliang Liu, Jie Shao, Richang Hong, and Tao Yang

A Part-Based and Feature Fusion Method for Clothing Classification . . . 231
Pan Huo, Yunhong Wang, and Qingjie Liu

Research on Perception Sensitivity of Elevation Angle in 3D Sound Field . . . 242
Yafei Wu, Xiaochen Wang, Cheng Yang, Ge Gao, and Wei Chen

Tri-level Combination for Image Representation . . . 250
Ruiying Li, Chunjie Zhang, and Qingming Huang

Accurate Multi-view Stereopsis Fusing DAISY Descriptor and Scaled-Neighbourhood Patches . . . 260
Fei Wang and Ning An

Stereo Matching Based on CF-EM Joint Algorithm . . . 271
Baoping Li, Long Ye, Yun Tie, and Qin Zhang

Fine-Grained Vehicle Recognition in Traffic Surveillance . . . 285
Qi Wang, Zhongyuan Wang, Jing Xiao, Jun Xiao, and Wenbin Li

Transductive Classification by Robust Linear Neighborhood Propagation . . . 296
Lei Jia, Zhao Zhang, and Weiming Jiang

Discriminative Sparse Coding by Nuclear Norm-Driven Semi-Supervised Dictionary Learning . . . 306
Weiming Jiang, Zhao Zhang, Yan Zhang, and Fanzhang Li

Semantically Smoothed Refinement for Everyday Concept Indexing . . . 318
Peng Wang, Lifeng Sun, Shiqiang Yang, and Alan F. Smeaton

A Deep Two-Stream Network for Bidirectional Cross-Media Information Retrieval . . . 328
Tianyuan Yu, Liang Bai, Jinlin Guo, Zheng Yang, and Yuxiang Xie

Prototyping Methodology with Motion Estimation Algorithm . . . 338
Jinglin Zhang, Jian Shang, and Cong Bai

Automatic Image Annotation Using Adaptive Weighted Distance in Improved K Nearest Neighbors Framework . . . 345
Jiancheng Li and Chun Yuan

One-Shot-Learning Gesture Segmentation and Recognition Using Frame-Based PDV Features . . . 355
Tao Rong and Ruoyu Yang

Multi-scale Point Set Saliency Detection Based on Site Entropy Rate . . . 366
Yu Guo, Fei Wang, Pengyu Liu, Jingmin Xin, and Nanning Zheng

Facial Expression Recognition with Multi-scale Convolution Neural Network . . . 376
Jieru Wang and Chun Yuan

Deep Similarity Feature Learning for Person Re-identification . . . 386
Yanan Guo, Dapeng Tao, Jun Yu, and Yaotang Li

Object Detection Based on Scene Understanding and Enhanced Proposals . . . 397
Zhicheng Wang and Chun Yuan

Video Inpainting Based on Joint Gradient and Noise Minimization . . . 407
Yiqi Jiang, Xin Jin, and Zhiyong Wu

Head Related Transfer Function Interpolation Based on Aligning Operation . . . 418
Tingzhao Wu, Ruimin Hu, Xiaochen Wang, Li Gao, and Shanfa Ke

Adaptive Multi-window Matching Method for Depth Sensing SoC and Its VLSI Implementation . . . 428
Huimin Yao, Chenyang Ge, Liuqing Yang, Yichuan Fu, and Jianru Xue

A Cross-Domain Lifelong Learning Model for Visual Understanding . . . 438
Chunmei Qing, Zhuobin Huang, and Xiangmin Xu

On the Quantitative Analysis of Sparse RBMs . . . 449
Yanxia Zhang, Lu Yang, Binghao Meng, Hong Cheng, Yong Zhang, Qian Wang, and Jiadan Zhu

An Efficient Solution for Extrinsic Calibration of a Vision System with Simple Laser . . . 459
Ya-Nan Chen, Fei Wang, Hang Dong, Xuetao Zhang, and Haiwei Yang

A Stepped-RAM Reading and Multiplierless VLSI Architecture for Intra Prediction in HEVC . . . 469
Wei Zhou, Yue Niu, Xiaocong Lian, Xin Zhou, and Jiamin Yang

A Sea-Land Segmentation Algorithm Based on Sea Surface Analysis . . . 479
Guichi Liu, Enqing Chen, Lin Qi, Yun Tie, and Deyin Liu

Criminal Investigation Oriented Saliency Detection for Surveillance Videos . . . 487
Yu Chen, Ruimin Hu, Jing Xiao, Liang Liao, Jun Xiao, and Gen Zhan

Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos . . . 497
Shun Zhang, Yihong Gong, and Jinjun Wang

Characterizing TCP Performance for Chunk Delivery in DASH . . . 509
Wen Hu, Zhi Wang, and Lifeng Sun

Where and What to Eat: Simultaneous Restaurant and Dish Recognition from Food Image . . . 520
Huayang Wang, Weiqing Min, Xiangyang Li, and Shuqiang Jiang

A Real-Time Gesture-Based Unmanned Aerial Vehicle Control System . . . 529
Leye Wei, Xin Jin, Zhiyong Wu, and Lei Zhang

A Biologically Inspired Deep CNN Model . . . 540
Shizhou Zhang, Yihong Gong, Jinjun Wang, and Nanning Zheng

Saliency-Based Objective Quality Assessment of Tone-Mapped Images . . . 550
Yinchu Chen, Ke Li, and Bo Yan

Sparse Matrix Based Hashing for Approximate Nearest Neighbor Search . . . 559
Min Wang, Wengang Zhou, Qi Tian, and Houqiang Li

Piecewise Affine Sparse Representation via Edge Preserving Image Smoothing . . . 569
Xuan Wang, Fei Wang, and Yu Guo

Author Index . . . 577


A Global-Local Approach to Extracting
Deformable Fashion Items from Web Images

Lixuan Yang¹,²(B), Helena Rodriguez², Michel Crucianu¹, and Marin Ferecatu¹

¹ Conservatoire National des Arts et Metiers, 292 Rue Saint-Martin, 75003 Paris, France
{lixuan.yang,michel.crucianu,marin.ferecatu}@cnam.fr
² Shopedia SAS, 55 Rue La Boétie, 75008 Paris, France
{lixuan.yang,helena.rodriguez}@shopedia.fr

Abstract. In this work we propose a new framework for extracting deformable
clothing items from images by using a three-stage global-local fitting procedure.
First, a set of initial segmentation templates are generated from a handcrafted
database. Then, each template initiates an object extraction process by a global
alignment of the model, followed by a local search minimizing a measure of the
misfit with respect to the potential boundaries in the neighborhood. Finally, the
results provided by each template are aggregated, with a global fitting criterion,
to obtain the final segmentation. The method is validated on the Fashionista
database and on a new database of manually segmented images. Our method
compares favorably with the Paper Doll clothing parsing and with the recent
GrabCut on One Cut foreground extraction method. We quantitatively analyze
each component, and show examples of both successful segmentation and
difficult cases.

Keywords: Clothing extraction · Segmentation · Active contour · GrabCut

1 Introduction and Related Work

With the recent proliferation of fashion web-stores, an important goal for online
advertising systems is to propose items that truly correspond to the expectations
of the users in terms of design, manufacturing and suitability. We put forward
here a method to extract, without user supervision, clothes and other fashion
items from web images. Indeed, localizing, extracting and tracking fashion items
during web browsing is an important step in addressing the needs of professionals
of online advertising and fashion media: present the users with relevant items
from a clothing database, based on the content of the web application they are
consulting and its context of use. Users usually look for characteristics expressed
by very subjective concepts, to describe a style, a brand or a specific design. For
this reason, recent research focused in the development of detection, recognition
and search of fashion items based on visual characteristics [11].

© Springer International Publishing AG 2016
E. Chen et al. (Eds.): PCM 2016, Part II, LNCS 9917, pp. 1–12, 2016.
DOI: 10.1007/978-3-319-48896-7_1

A popular approach is to model the target items based on attribute selection and
high-level classification, for example [5] trains attribute classifiers on fine-grained
clothing styles formulating the retrieval as a classification problem, [2] extracts
low-level features in a pose-adaptive manner and learns attribute classifiers by using
conditional random fields (CRF), while [3] introduced a novel double-path deep
domain adaptation network for attribute prediction by modeling the data jointly from
unconstrained photos and the images issued from large-scale online shopping stores.
A complementary approach is to use part-based models to compensate for the lack of
pose estimation. The idea is to automatically align patches of human body parts by
using different methods, for example sparse coding as in [16] or graph parsing
technique as in [12].
Segmentation and aggregation to select cloth categories was employed either
by using bottom-up cloth parsing from labels attached to pixels [19] or by over-
segmentation and classification [8]. Deep learning was also used with success
for clothing retrieval (deep similarity learning [13], Siamese networks [18]) or to
predict fashionability [15].

Fig. 1. Our goal is to produce a precise segmentation (extraction) of the fashion items
as in (b).

Unlike the above-mentioned methods, our proposal aims to precisely segment the
object of interest from the background (foreground separation, see Fig. 1(b)), without
user interaction and without using an extensive training database. Extracting such
complex objects by simply optimizing a local pixel objective function is likely to fail
without an awareness of the object's global properties. To take this into account, we
propose a Global-Local approach based on the idea that a local search is likely to
converge to a better fit if the initial state is coherent with the expected global
appearance of the object.
Our method is validated on the Fashionista database [19]¹ and on a new database
of manually segmented images that we specifically built to test fashion objects
extraction and that we make available to the community. Our method compares
favorably with the well-known Paper Doll [19] clothing parsing and with the recent
GrabCut on One Cut [17] generic foreground extraction method. We provide examples
of successful segmentation, analyze difficult cases and also quantitatively evaluate
each component.
¹ http://vision.is.tohoku.ac.jp/~kyamagu/research/paperdoll/.

In Sect. 2 we describe our proposal, followed by a detailed presentation of each
component. After the experimental validation in Sect. 3, we conclude the paper in
Sect. 4 with a discussion of the main points and extension perspectives.

2 Our Proposal
Detecting clothes in images is a difficult problem because the objects are
deformable, have large intra-class diversity and may appear against complex
backgrounds. To extract objects under these difficult conditions and without
user intervention, methods solely relying on optimizing a local criterion (or pixel
classification based on local features) are unlikely to perform well. Some knowl-
edge about the global shape of the class of objects to be extracted is necessary to
help a local analysis converge to a correct object boundary. In this paper we use
this intuition to develop a framework that takes into account the local/global
duality to select the most likely object segmentation.
We investigate here fashion items that are worn by a person. This covers
practically most of the situations encountered by users of fashion and/or news
web sites, while making possible the use of a person detector to restrict the
search regions in the image and to serve as reference for alignment operations.
First, we prepare a set of images containing the object of interest and we
manually segment them. These initial object masks (called templates in the fol-
lowing) provide the prior knowledge used by the algorithm. Of course, a given
manual segmentation will not match exactly the object in an unknown image.
We use each segmentation (after a suitable alignment) as a template to initiate
an active contour (AC) procedure that will converge closer to the true bound-
aries of the real object in the current image. We then extract the object with
a suitable GrabCut procedure to provide the final segmentation. Thus, at the
end we have as many candidate segmentations as hand-made templates. In the
final step we choose the best of them according to a criterion that optimizes
the coherence of the proposed segmentation with the edges extracted from the
image. In the following subsections we detail each of these stages (see also Fig. 2
for an illustration).

Fig. 2. Different stages of our approach: (a) original image, (b) a template segmenta-
tion, (c) output of the person detector, (d) result after the alignment step, (e) result
after the active contour step, (f) the GrabCut band, (g) result after the GrabCut step.

To summarize, the main contributions of this paper are: we introduce a new
framework for the extraction of fashion items in web images that combines local and
global object characteristics, a framework supported by a new active contour that
optimizes the gap with respect to the global segmentation model, and by a new
measure of fit of the proposed segmentation to the real distribution of the contours.
Also, we prepare a new benchmark database and make it available to the community.

2.1 Person Detector


For clothing extraction, it is reasonable to first apply a person detector. As in
many other studies (e.g. [8,12,20]), we use the person detector with articulated
pose estimation algorithm from [21] that was extensively tested and proved to
be very robust in several other fashion-related works (see Sect. 1). It is based
on a deformable model that sees the object as a combination of parts [21]. The
detection score is defined as the fit to parts minus the parts deformation cost.
The mixture model not only encodes object structure but also captures spatial
relations between part locations and co-occurrence relations between parts. The
output of the detector is a set of parts (rectangular boxes) centered on the body
joints and oriented correctly. The boxes are used as reference points for alignment
by translation and re-scaling in several stages of our proposal (see below).
To train the person detector, we manually annotate a set of 800 images. Each
person is annotated with 14 joint points by marking the articulations and main
body parts. When the legs are covered by long dresses, the lower parts are placed
on the edges of the dress rather than on the legs. This not only improves detection
accuracy, but also hints to the location of the contours. Figure 2(c) shows the
output of the person detector on an unannotated image. Boxes usually slightly
cover the limbs and body joints.

2.2 Template Selection


As we have seen, each initial template can provide a candidate segmentation for a
new, unknown image. However, this is redundant and may unnecessarily slow down
the procedure. Since we focus on the fashion items that are worn by a person, the
number of different poses in which an object may be found is relatively small, and
many initial templates are thus quite similar. Intuitively, templates that are alike in
shape should also produce similar segmentation masks. To reduce their number, the
initial templates are clustered into similar-shape clusters by using the K-Medoid
procedure [9]. We employ 8 clusters for each object class, which is a reasonable
choice in our case because the number of person poses is not very large. Each
resulting cluster is a configuration of deformable objects that share a similarity in
pose, viewpoint and clothing shape. The dissimilarity of two object masks is defined
by the complement of the Jaccard index:
$$ d(S_1, S_2) = 1 - \frac{\mathrm{Surface}(S_1 \cap S_2)}{\mathrm{Surface}(S_1 \cup S_2)} $$
where S1 and S2 are the binary masks of two objects.
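For concreteness, the following minimal Python sketch (ours, not the authors' code) implements this dissimilarity and a basic K-Medoids grouping; the array-based mask representation and the PAM-style update rule are assumptions.

```python
# Sketch of the template-clustering step: the dissimilarity between two binary
# masks is the complement of the Jaccard index; templates are grouped into
# 8 similar-shape clusters with a basic K-Medoids (PAM-style) loop.
import numpy as np

def mask_dissimilarity(s1, s2):
    """d(S1, S2) = 1 - Surface(S1 & S2) / Surface(S1 | S2)."""
    inter = np.logical_and(s1, s2).sum()
    union = np.logical_or(s1, s2).sum()
    return 1.0 - inter / union if union > 0 else 1.0

def k_medoids(dist, k=8, n_iter=100, seed=0):
    """Cluster items given a precomputed pairwise dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)      # nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # new medoid = member with least total in-cluster distance
                within = dist[np.ix_(members, members)]
                new_medoids[c] = members[np.argmin(within.sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# masks: list of boolean template masks of identical size
# dist = np.array([[mask_dissimilarity(a, b) for b in masks] for a in masks])
# medoids, labels = k_medoids(dist, k=8)
```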

Fig. 3. Medoids of the 8 clusters of template segmentations for three classes: jeans
(top), long dress (middle) and coat (bottom).

Each cluster represents a segmentation configuration and its prototype is used in
the next stages of the procedure. However, we do not simply choose the medoid as the
prototype of the cluster, but rather the element in the cluster that is visually closest to
the corresponding box parts produced by the person detector on the unknown image.
To do so, we apply the object detector on both the unknown image and the template
image and we compare the boxes that contain the object in the template with the
corresponding ones in the unknown image by using the Euclidean distance. To
represent the content of the boxes we first considered HOG features [4] (to favor
similar shape content) but finally settled for Caffe features [7] that provide better
results. This suggests that mid-level features give better clues to identifying the
correct pose of an object compared to local pure shape features. Shape is relevant for
comparing the boundaries of two objects but less so when comparing what is inside
those boundaries.
Specifically, we use the AlexNet model in [10] within the Caffe framework [7].
The network was pre-trained on 1.2 million high-resolution images from ImageNet,
classified into 1000 classes. To fine-tune the network to our image domain, we replace
the last layer by a layer of ten outputs (the number of classes considered here) and
then retrain the network on our training database with backpropagation to fine-tune
the weights of all the layers. After the fine-tuning, the feature we employ is the vector
of responses for layer fc7 (second to last layer) obtained by forward propagation.
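As an illustration, a hedged pycaffe sketch of this feature extraction follows; the model file names are placeholders and the preprocessing is simplified relative to a full deployment pipeline.

```python
# Extract fc7 activations from the fine-tuned AlexNet and compare two boxes
# with the Euclidean distance. File names below are hypothetical.
import caffe
import numpy as np

net = caffe.Net('alexnet_finetuned_deploy.prototxt',   # placeholder paths
                'alexnet_finetuned.caffemodel', caffe.TEST)
net.blobs['data'].reshape(1, 3, 227, 227)              # AlexNet crop size

def fc7_feature(crop):
    """Forward one RGB crop (as loaded by caffe.io.load_image, float HWC in
    [0, 1]) and return its fc7 activation vector."""
    t = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    t.set_transpose('data', (2, 0, 1))                 # HWC -> CHW
    net.blobs['data'].data[0] = t.preprocess('data', crop)
    net.forward()
    return net.blobs['fc7'].data[0].copy()

# Boxes are then compared with the Euclidean distance between fc7 vectors:
# d = np.linalg.norm(fc7_feature(box_a) - fc7_feature(box_b))
```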
To illustrate this step we show in Fig. 3 the medoids (centers) of the 8 clusters
obtained for three classes of our benchmark database. We notice the diversity in
poses, scale and topology. For example, some coats are segmented into several
disjoint parts, some have openings and some jeans are covered by a vest.

2.3 Template Alignment

The output of the previous stage is a set of segmentation templates (8 in our case)
for each object class. They will be used one by one to initiate an active contour
process. But they first need to be aligned into the unknown image at the right site and
with the correct angle and scale. We propose an SVM alignment technique based on
the observation that the person detector places the boxes centered on the body joints.
Thus, the line joining adjacent boxes represents the body limbs. Since the clothing's
spatial distribution highly depends on the pose of the human body, and thus on limb
placement, we use the vector of distances from a pixel to the limbs as a feature vector
to learn a pixel-level SVM classifier that predicts whether a pixel belongs to the
object. Learning is performed on the template image and prediction on the unknown
image. Pixels predicted as positives form the mask whose envelope serves as
initialization for the active contour step. The SVM uses a Gaussian kernel with a scale
parameter σ = 1 found through experiments.
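A minimal sketch of this alignment classifier is given below, assuming limbs are available as segments joining adjacent detector boxes; the helper names are illustrative, not from the paper's code.

```python
# Per-pixel features = distances to each limb segment; an RBF (Gaussian) SVM
# trained on the template mask predicts object pixels in the unknown image.
import numpy as np
from sklearn.svm import SVC

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def limb_features(shape, limbs):
    """Feature vector for every pixel: its distance to each limb segment."""
    h, w = shape
    feats = np.empty((h * w, len(limbs)))
    i = 0
    for y in range(h):
        for x in range(w):
            feats[i] = [point_segment_distance((x, y), a, b) for a, b in limbs]
            i += 1
    return feats

# Train on the template, predict on the unknown image. With sigma = 1, the
# scikit-learn RBF parameter is gamma = 1 / (2 * sigma**2) = 0.5.
# clf = SVC(kernel='rbf', gamma=0.5)
# clf.fit(limb_features(tpl_mask.shape, tpl_limbs),
#         tpl_mask.ravel().astype(int))
# pred = clf.predict(limb_features(img_shape, img_limbs)).reshape(img_shape)
```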

2.4 Active Contour


Once the template is embedded in the image, we use it to initialize an active
contour (AC) that should converge to the boundaries of the object. The result
is highly dependent on the initial contour, but usually one of the 8 segmenta-
tion templates leads to a final contour that is quite close to the true boundary.
The AC is initialized with the aligned segmentation contour produced by the
previous step and has as input the gray-level image. We use the AC introduced
in [1] because it can segment objects whose boundaries are not necessarily well-
supported by gradient information. The AC minimizes an energy defined by
contour length, area inside the contour and a fitting term:

$$ F(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \nu \cdot \mathrm{Area}(\mathrm{in}(C)) + \lambda_1 \int_{\mathrm{in}(C)} |u(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{out}(C)} |u(x, y) - c_2|^2 \, dx \, dy \quad (1) $$

where $C$ is the current contour, and $c_1$ and $c_2$ are the average pixel gray-level
values $u(x, y)$ inside and outside the contour $C$, respectively. The curvature term is
controlled by $\mu$ and the fitting terms by $\lambda_1$ and $\lambda_2$. The averages $c_1$ and $c_2$ are usually
computed on the entire image. Because of the large variability of the background in
real images, these values can be meaningless locally. Consequently, in our case we
replace them by averages computed in a local window of size 40×40 pixels around
each contour pixel.
To reinforce the influence of the global shape of the template on the position
of the AC, we include a new term in the energy function (Eq. 1) that moderates
the tendency to converge too far away from the template:

$$ F_t(C) = \eta \int_{\mathrm{on}(C)} D_m(x, y) \, ds \quad (2) $$

where $D_m(x, y)$ is the distance between pixel $(x, y)$ and the template. By including
this term, the contour will converge to those image regions that separate best the
inside from the outside and, at the same time, are not too far away from the template
contour.
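To make the discrete computation explicit, the sketch below evaluates the energy of Eqs. (1) and (2) on a binary mask representation. It is an energy evaluation only (not a curve-evolution solver), and the single local mean is a simplification of the separate inside/outside window averages described above.

```python
# Discrete sketch of the modified Chan-Vese-style energy of Eqs. (1)-(2).
import numpy as np
from scipy import ndimage

def energy(contour_mask, u, template_mask,
           mu=1.0, nu=0.0, lam1=1.0, lam2=1.0, eta=1.0, win=40):
    inside = ndimage.binary_fill_holes(contour_mask.astype(bool))
    # Local gray-level mean in a 40x40 window: a simplified stand-in for the
    # per-contour-pixel inside/outside averages c1 and c2.
    local_mean = ndimage.uniform_filter(u.astype(float), size=win)
    fit_in = lam1 * ((u - local_mean) ** 2)[inside].sum()
    fit_out = lam2 * ((u - local_mean) ** 2)[~inside].sum()
    length = contour_mask.sum()               # crude proxy for Length(C)
    area = inside.sum()                       # Area(in(C))
    # Eq. (2): distance from each contour pixel to the aligned template
    d_m = ndimage.distance_transform_edt(~template_mask.astype(bool))
    template_term = eta * d_m[contour_mask.astype(bool)].sum()
    return mu * length + nu * area + fit_in + fit_out + template_term
```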

2.5 Segmentation

The contours obtained in the previous step suffer from two implicit problems:
(1) only the grey-level information is used by the AC process, and (2) possible
alignment errors may affect the result. To compensate for these problems, an
“exclusion band” of constant thickness is defined around the contour produced
by the previous step, then the inside region is labeled as “certain foreground”
and the outside area as “certain background”. A GrabCut algorithm [14] is then
initialized by these labels to obtain the final result. GrabCut takes into account
the global information of color in the image and will correct the alignment errors
within the limits of the defined band.
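An OpenCV sketch of this initialization follows; the band thickness is not specified in the text, so the value used here is an assumption.

```python
# GrabCut initialized from the active-contour result (Sect. 2.5): pixels well
# inside the contour are certain foreground, pixels well outside are certain
# background, and the exclusion band in between is left for GrabCut to decide.
import cv2
import numpy as np

def grabcut_with_band(image_bgr, ac_mask, band=10, iters=5):
    kernel = np.ones((band, band), np.uint8)
    inner = cv2.erode(ac_mask.astype(np.uint8), kernel)
    outer = cv2.dilate(ac_mask.astype(np.uint8), kernel)
    mask = np.full(ac_mask.shape, cv2.GC_PR_FGD, np.uint8)  # band: undecided
    mask[outer == 0] = cv2.GC_BGD                            # certain background
    mask[inner == 1] = cv2.GC_FGD                            # certain foreground
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```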

2.6 Object Selection

After obtaining the segmentation proposals initiated from each template, we need
to select a single segmentation as the final result. For this, we propose a score based
on a global measure of fit to the image:

$$ F(C) = \frac{\int_{\mathrm{on}(C)} D_e(x, y) \, ds}{\int_{\mathrm{on}(C)} ds} \quad (3) $$

where $D_e(x, y)$ is the distance from the current pixel to the closest edge detected
by [6] and $C$ is the boundary of the segmentation proposal. This score measures the
average distance from the segmentation boundary to the closest edges in the image.
A small value indicates a good fit to the image. See Table 1 for an illustration of this
step.
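The score of Eq. (3) can be sketched with a distance transform over an edge map, as below; Canny is used purely as a stand-in for the edge detector of [6].

```python
# Selection score of Eq. (3): average distance from the proposal's boundary
# pixels to the nearest image edge. Smaller values indicate a better fit.
import cv2
import numpy as np

def fit_score(gray_u8, proposal_mask):
    edges = cv2.Canny(gray_u8, 100, 200) > 0          # stand-in for [6]
    # distance from every pixel to the nearest edge pixel
    d_e = cv2.distanceTransform((~edges).astype(np.uint8), cv2.DIST_L2, 3)
    m = proposal_mask.astype(np.uint8)
    boundary = m - cv2.erode(m, np.ones((3, 3), np.uint8))
    return d_e[boundary == 1].mean()

# best_mask = min(proposals, key=lambda m: fit_score(gray_u8, m))
```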

Table 1. Segmentation selection from the results based on the 8 templates of the class,
using the corresponding fit values. The test image is given top left, with the extracted
edges shown bottom left. The best score is the smallest (outlined in boldface).

Fig. 4. Qualitative evaluation: original images and associated segmentation results.
3 Experimental Results
To assess the performance of the proposed method, we perform two sets of
experiments. In the first set, our method is compared to a recent improvement
of GrabCut [14] that is the standard approach in generic object extraction, on a
novel fashion item benchmark we built. The second set of experiments compares
our proposal to the recent PaperDoll [19] fashion item annotation method on
the Fashionista database [20].

3.1 RichPicture Database


Since, to our knowledge, at this time there is no public benchmark specifically
designed for clothing extraction from fashion images, we introduce a novel dataset
called RichPicture, consisting of 1000 images from Google.com and Bing.com. It has
100 images for each of the following fashion items: Boots, Coat, Jeans, Shirt, T-Shirt,
Short Dress, Mid Dress, Long Dress, Vest and Sweater. Each target object in each
class is manually segmented. To train the person detector (see Sect. 2.1), images are
also annotated by 14 key points. This database will be made available with the paper
and open to external contributions. We shall further extend it with new classes and
more images per class.

3.2 Comparison with GrabCut in One Cut


In this set of experiments, we compare our proposal to GrabCut in one cut [17],
a recent improvement on the well-known GrabCut [14] foreground extraction
algorithm, which is frequently used as a baseline method in the literature. GrabCut
in One Cut was shown in [17] to be more effective, less resource demanding, and
available as an open implementation. These reasons make it a good candidate as a
benchmark baseline. For the purpose of this evaluation, we split
each class of our database in 80 images for training (template selection) and 20
images for test.

Table 2. Comparison with the One Cut algorithm. The comparison measure is the
Jaccard index.

Class       Boots  Coat  Mid dress  Jeans  Shirt  T-shirt  Short dress  Long dress  Vest  Pull
Our method  0.54   0.74  0.84       0.78   0.77   0.67     0.80         0.74        0.65  0.74
One Cut     0.26   0.31  0.54       0.71   0.77   0.45     0.47         0.57        0.35  0.36

The segmentation produced by the algorithms is tested against the ground truth
obtained by manual segmentation. As performance measure we employ
truth obtained by manual segmentation. As performance measure we employ
the Jaccard index, traditionally used for segmentation evaluation, and averaged
over all the testing images of a class. To outline the object for One Cut we use
the external envelope of the relevant parts (the ones that contain parts of the
object) identified by the person detector. Table 2 shows a class by class synthesis
of the results (best results are in boldface).
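For completeness, the per-class evaluation can be sketched as follows, reusing the mask_dissimilarity helper defined earlier (one minus that value is the Jaccard index):

```python
# Mean Jaccard index of predicted masks vs. ground truth over a class's test set.
import numpy as np

def mean_jaccard(pred_masks, gt_masks):
    scores = [1.0 - mask_dissimilarity(p, g)
              for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```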
It can be seen that the proposed method performs significantly better on all
the classes except “Shirt” where the scores are equal. While both segmentation
methods are automatic (do not require interaction), these results speak in favor
of including specific knowledge into the algorithm (by the use of segmentation
templates in our case).

3.3 Comparison with Paper Doll


To our knowledge, there is no published method concerning fashion retrieval that
aims to precisely extract entire fashion items from arbitrary images. The closest
we could find is the Paper Doll framework, cited above, that in fact attributes
label scores to a set of blobs in the image. By taking the union of all the blobs
that correspond to a same clothing class, one can extract objects of that class.
The authors of Paper Doll also introduced the Fashionista database, used to
test annotation algorithms, which we use for this evaluation. Table 3 presents
the synthesis of the results of Paper Doll vs. One Cut vs. our method. The
object classes we selected for tests are those that correspond to fashion items
that are worn by persons (compatible with our method).
For our method, training and template selection are performed on the same
part of the database that Paper Doll employed for training. As seen from Table 3,
on most object classes we compare favorably to Paper Doll. For objects like
“Boots”, our method needs a more dedicated alignment process, since the object
is very small compared to the frame given by the person detector that serves
as alignment reference. For objects of the “Jeans” class, the problem also comes
from the alignment stage, because the boxes proposed by the person detector
are not very well positioned when the legs are crossed. It is necessary to increase
the number of training examples with this specific pose.

3.4 Qualitative Evaluation


We illustrate here the results of the proposed method with some examples taken
from our test database. First, Table 1 shows the final segmentation selection
The Project Gutenberg eBook of
Andersonville diary
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.

Title: Andersonville diary


escape, and list of the dead, with name, co., regiment, date
of death and no. of grave in cemetery

Author: John L. Ransom

Release date: September 10, 2023 [eBook #71609]


Most recently updated: October 27, 2023

Language: English

Original publication: Auburn N. Y: John L. Ransom, 1881

Credits: MWS, John Campbell and the Online Distributed


Proofreading Team at https://www.pgdp.net (This file was
produced from images generously made available by The
Internet Archive/American Libraries.)

*** START OF THE PROJECT GUTENBERG EBOOK


ANDERSONVILLE DIARY ***
TRANSCRIBER’S NOTE
This book has only two footnotes and they have been placed very
close to their anchors. These anchors are denoted by [A] and [B].
The Table of Contents has been created by the transcriber and is
hereby placed in the public domain.
This edition of the diary was self-published in 1881 by the author
John Ransom. It had first been printed some years earlier in a
Michigan newspaper. Many minor printer’s errors have been
corrected in this etext, and are noted at the end of the book.
Misspellings in the diary text have been left unchanged.
The ‘List of the Dead’ is printed following the diary itself and is
essentially a reprint, in a similar but different format, of the source
document held in the Library of Congress. This source list was
compiled by the efforts of Dorence Atwater and Clara Barton, and
can now be viewed online at https://www.loc.gov/item/37031864
This records the deaths of prisoners which occurred in the
fourteen months between March 1864 and April 1865. It is
organized by State, and names are listed alphabetically by first
letter only. More details can be found in the Transcriber Note at the
end of the book.
Andersonville Diary,

ESCAPE,
——AND——

LIST OF THE DEAD,


——WITH——

Name, Co., Regiment, Date of


Death
——AND——

No. of Grave in Cemetery.

JOHN L. RANSOM,
LATE FIRST SERGEANT NINTH MICH. CAV.,
AUTHOR AND PUBLISHER.

AUBURN, N. Y.

1881.
“Entered according to act of Congress, in the year 1881, by
John L. Ransom, in the office of the Librarian of
Congress, at Washington.”
D E D I C AT I O N .

TO THE

MOTHERS, WIVES AND SISTERS

OF THOSE WHOSE NAMES

ARE HEREIN RECORDED AS HAVING DIED

—IN—

ANDERSONVILLE,

THIS BOOK IS RESPECTFULLY DEDICATED

BY THE AUTHOR.
John L. Ransom.
(From a photograph taken two months
before capture.)
INTRODUCTION.

The book to which these lines form an introduction


is a peculiar one in many respects. It is a story, but it
is a true story, and written years ago with little idea
that it would ever come into this form. The writer has
been induced, only recently, by the advice of friends
and by his own feeling that such a production would
be appreciated, to present what, at the time it was
being made up, was merely a means of occupying a
mind which had to contemplate, besides, only the
horrors of a situation from which death would have
been, and was to thousands, a happy relief.
The original diary in which these writings were
made from day to day was destroyed by fire some
years after the war, but its contents had been printed
in a series of letters to the Jackson, (Mich.) Citizen,
and to the editor and publisher of that journal thanks
are now extended for the privilege of using his files
for the preparation of this work. There has been little
change in the entries in the diary, before presenting
them here. In such cases the words which suggest
themselves at the time are best—they cannot be
improved upon by substitution at a later day.
This book is essentially different from any other
that has been published concerning the “late war” or
any of its incidents. Those who have had any such
experience as the author will see its truthfulness at
once, and to all other readers it is commended as a
statement of actual things by one who experienced
them to the fullest.
The annexed list of the Andersonville dead is from
the rebel official records, is authentic, and will be
found valuable in many pension cases and
otherwise.
CONTENTS

THE CAPTURE 9
NEW YEAR’S DAY 23
PEMERTON BUILDING 34
ANDERSONVILLE 41
FROM BAD TO WORSE 65
THE RAIDERS PUT DOWN 75
AN ACCOUNT OF THE
81
HANGING
MOVED JUST IN TIME 91
HOSPITAL LIFE 97
REMOVED TO MILLEN 109
ESCAPE BUT NOT ESCAPE 120
RE-CAPTURED 127
A SUCCESSFUL ESCAPE 136
SAFE AND SOUND 154
THE FINIS 160
MICHAEL HOARE’S ESCAPE 167
REBEL TESTIMONY 172
SUMMARY 187
THE WAR’S DEAD 188
EX-PRISONERS AND
189
PENSIONERS
LIST OF THE DEAD 193
A LIST OF OFFICERS
IMPRISONED AT CAMP 289
ASYLUM
THE CAPTURE.

A REBEL RUSE TO GOBBLE UP UNION TROOPS—A COMPLETE


SURPRISE—CARELESS OFFICERS—HEROIC DEFENCE—
BEGINNING OF A LONG IMPRISONMENT.

Belle Island, Richmond, Va., Nov. 22, 1863.—I


was captured near Rogersville, East Tennessee, on
the 6th of this month, while acting as Brigade
Quarter-Master Sergt. The Brigade was divided, two
regiments twenty miles away, while Brigade Head-
Quarters with 7th Ohio and 1st Tennessee Mounted
Infantry were at Rogersville. The brigade quarter-
master had a large quantity of clothing on hand,
which we were about to issue to the brigade as soon
as possible. The rebel citizens got up a dance at one
of the public houses in the village, and invited all the
union officers. This was the evening of Nov. 5th.
Nearly all the officers attended and were away from
the command nearly all night and many were away
all night. We were encamped in a bend of the
Holston River. It was a dark rainy night and the river
rose rapidly before morning. The dance was a ruse
to get our officers away from their command. At
break of day the pickets were drove in by rebel
cavalry, and orders were immediately received from
commanding officer to get wagon train out on the
road in ten minutes. The quarter-master had been to
the dance and had not returned, consequently it
devolved upon me to see to wagon train, which I did,
and in probably ten minutes the whole seventy six
mule army wagons were in line out on the main road,
while the companies were forming into line and
getting ready for a fight. Rebels had us completely
surrounded and soon began to fire volley after volley
into our disorganized ranks. Not one officer in five
was present; Gen. commanding and staff as soon as
they realized our danger, started for the river, swam
across and got away. We had a small company of
artillery with us commanded by a lieutenant. The
lieutenant in the absence of other officers, assumed
command of the two regiments, and right gallantly
did he do service. Kept forming his men for the better
protection of his wagon train, while the rebels were
shifting around from one point to another, and all the
time sending volley after volley into our ranks. Our
men did well, and had there been plenty of officers
and ammunition, we might have gained the day. After
ten hours fighting we were obliged to surrender after
having lost in killed over a hundred, and three or four
times that number in wounded. After surrendering we
were drawn up into line, counted off and hurriedly
marched away south. By eight o’clock at night had
probably marched ten miles, and encamped until
morning. We expected that our troops would
intercept and release us, but they did not. An hour
before daylight we were up and on the march toward
Bristol, Va., that being the nearest railroad station.
We were cavalrymen, and marching on foot made us
very lame, and we could hardly hobble along. Were
very well fed on corn bread and bacon. Reached
Bristol, Va., Nov. 8th and were soon aboard of cattle
cars en-route for the rebel capital. I must here tell
how I came into possession of a very nice and large
bed spread which is doing good service even now
these cold nights. After we were captured everything
was taken away from us, blankets, overcoats, and in
many cases our boots and shoes. I had on a new
pair of boots, which by muddying them over had
escaped the rebel eyes thus far, as being a good
pair. As our blankets had been taken away from us
we suffered considerably from cold. I saw that if I
was going to remain a prisoner of war it behooved
me to get hold of a blanket. After a few hours march I
became so lame walking with my new boots on that
the rebels were compelled to put me on an old horse

You might also like