
Advances in Multimedia Information Processing – PCM 2016: 17th Pacific Rim Conference on Multimedia, Xi'an, China, September 15–16, 2016, Proceedings, Part II, 1st Edition, Enqing Chen

Visit to download the full and correct content document:
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2016-17th-pacific-rim-conference-on-multimedia-xi-an-china-september-15-16-2016-proceedings-part-ii-1st-edition-enqing-chen/

More digital products (PDF, EPUB, MOBI) available for instant download that may interest you:

Advances in Multimedia Information Processing – PCM 2016: 17th Pacific Rim Conference on Multimedia, Xi'an, China, September 15–16, 2016, Proceedings, Part I, 1st Edition, Enqing Chen
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2016-17th-pacific-rim-conference-on-multimedia-xi-an-china-september-15-16-2016-proceedings-part-i-1st-edition-enqing-chen/

Advances in Multimedia Information Processing – PCM 2018: 19th Pacific-Rim Conference on Multimedia, Hefei, China, September 21-22, 2018, Proceedings, Part II, Richang Hong
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2018-19th-pacific-rim-conference-on-multimedia-hefei-china-september-21-22-2018-proceedings-part-ii-richang-hong/

Advances in Multimedia Information Processing – PCM 2018: 19th Pacific-Rim Conference on Multimedia, Hefei, China, September 21-22, 2018, Proceedings, Part III, Richang Hong
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2018-19th-pacific-rim-conference-on-multimedia-hefei-china-september-21-22-2018-proceedings-part-iii-richang-hong/

Advances in Multimedia Information Processing – PCM 2017: 18th Pacific-Rim Conference on Multimedia, Harbin, China, September 28-29, 2017, Revised Selected Papers, Part II, Bing Zeng
https://textbookfull.com/product/advances-in-multimedia-information-processing-pcm-2017-18th-pacific-rim-conference-on-multimedia-harbin-china-september-28-29-2017-revised-selected-

Neural Information Processing: 23rd International Conference, ICONIP 2016, Kyoto, Japan, October 16-21, 2016, Proceedings, Part IV, 1st Edition, Akira Hirose
https://textbookfull.com/product/neural-information-processing-23rd-international-conference-iconip-2016-kyoto-japan-october-16-21-2016-proceedings-part-iv-1st-edition-akira-hirose/

MultiMedia Modeling: 22nd International Conference, MMM 2016, Miami, FL, USA, January 4-6, 2016, Proceedings, Part I, 1st Edition, Qi Tian
https://textbookfull.com/product/multimedia-modeling-22nd-international-conference-mmm-2016-miami-fl-usa-january-4-6-2016-proceedings-part-i-1st-edition-qi-tian/

Web-Age Information Management: 17th International Conference, WAIM 2016, Nanchang, China, June 3-5, 2016, Proceedings, Part I, 1st Edition, Bin Cui
https://textbookfull.com/product/web-age-information-management-17th-international-conference-waim-2016-nanchang-china-june-3-5-2016-proceedings-part-i-1st-edition-bin-cui/

Perspectives in Business Informatics Research: 15th International Conference, BIR 2016, Prague, Czech Republic, September 15-16, 2016, Proceedings, 1st Edition, Václav Řepa
https://textbookfull.com/product/perspectives-in-business-informatics-research-15th-international-conference-bir-2016-prague-czech-republic-september-15-16-2016-proceedings-1st-edition-vaclav-repa/

Building Sustainable Health Ecosystems: 6th International Conference on Well-Being in the Information Society, WIS 2016, Tampere, Finland, September 16-18, 2016, Proceedings, 1st Edition, Hongxiu Li
https://textbookfull.com/product/building-sustainable-health-ecosystems-6th-international-conference-on-well-being-in-the-information-society-wis-2016-tampere-finland-
Enqing Chen · Yihong Gong · Yun Tie (Eds.)

LNCS 9917

Advances in Multimedia Information Processing – PCM 2016
17th Pacific-Rim Conference on Multimedia
Xi'an, China, September 15–16, 2016
Proceedings, Part II
Lecture Notes in Computer Science 9917
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Enqing Chen · Yihong Gong · Yun Tie (Eds.)

Advances in Multimedia Information Processing – PCM 2016
17th Pacific-Rim Conference on Multimedia
Xi'an, China, September 15–16, 2016
Proceedings, Part II
Editors
Enqing Chen, Zhengzhou University, Zhengzhou, China
Yihong Gong, Xi'an Jiaotong University, Xi'an, China
Yun Tie, Zhengzhou University, Zhengzhou, China

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-319-48895-0 ISBN 978-3-319-48896-7 (eBook)
DOI 10.1007/978-3-319-48896-7

Library of Congress Control Number: 2016959170

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 17th Pacific-Rim Conference on Multimedia (PCM 2016) was held in Xi’an,
China, during September 15–16, 2016, and hosted by the Xi’an Jiaotong University
(XJTU). PCM is a leading international conference for researchers and industry
practitioners to share their new ideas, original research results, and practical devel-
opment experiences from all multimedia-related areas.
It was a great honor for XJTU to host PCM 2016, one of the most longstanding
multimedia conferences, in Xi’an, China. Xi’an Jiaotong University, located in the
capital of Shaanxi province, is one of the key universities run by the Ministry of
Education, China. Recently its multimedia-related research has been attracting
increasing attention from the local and international multimedia community. For over
2000 years, Xi’an has been the center for political and economic developments and the
capital city of many Chinese dynasties, with the richest cultural and historical heritage,
including the world-famous Terracotta Warriors, Big Wild Goose Pagoda, etc. We
hope that our venue made PCM 2016 a memorable experience for all participants.
PCM 2016 featured a comprehensive program. The 202 submissions from authors
of more than ten countries included a large number of high-quality papers in multi-
media content analysis, multimedia signal processing and communications, and mul-
timedia applications and services. We thank our 28 Technical Program Committee
members who spent many hours reviewing papers and providing valuable feedback to
the authors. From the total of 202 submissions to the main conference and based on at
least three reviews per submission, the program chairs decided to accept 111 regular
papers (54 %), among which 67 were posters (33 %). This volume of the conference
proceedings contains the abstracts of two invited talks and all the regular, poster, and
special session papers.
The technical program is an important aspect but only achieves its full impact if
complemented by challenging keynotes. We are extremely pleased and grateful to have
had two exceptional keynote speakers, Wen Gao and Alex Hauptmann, accept our
invitation and present interesting ideas and insights at PCM 2016.
We are also heavily indebted to many individuals for their significant contributions.
We thank the PCM Steering Committee for their invaluable input and guidance on
crucial decisions. We wish to acknowledge and express our deepest appreciation to the
honorary chairs, Nanning Zheng and Shin'ichi Satoh; general chairs, Yihong Gong, Thomas
Plagemann, Ke Lu, and Jianping Fan; program chairs, Meng Wang, Qi Tian, Abdulmotaleb
El Saddik, and Yun Tie; organizing chairs, Jinye Peng, Xinbo Gao, Ziyu Guan, and
Yizhou Wang; publicity chairs, Xueming Qian, Xiaojiang Chen, Cheng Jin, and Xiangyang
Xue; publication chairs, Jun Wu and Enqing Chen; local arrangements chairs, Kuizi Mei
and Xuguang Lan; special session chairs, Jianbing Shen, Jialie Shen, and Jianru Xue; demo
chairs, Yugang Jiang and Jitao Sang; and finance and registration chair, Shuchan Gao. Without
their efforts and enthusiasm, PCM 2016 would not have become a reality. Moreover,
we want to thank our sponsors: Springer, Peking University, Zhengzhou University,
and Ryerson University. Finally, we wish to thank all committee members, reviewers,
session chairs, student volunteers, and supporters. Their contributions are much
appreciated.

September 2016

Meng Wang
Yun Tie
Qi Tian
Abdulmotaleb El Saddik
Yihong Gong
Thomas Plagemann
Ke Lu
Jianping Fan
Organization

Honorary Chairs
Nanning Zheng Xi’an Jiaotong University, China
Shin'ichi Satoh National Institute of Informatics, Japan

General Chairs
Yihong Gong Xi’an Jiaotong University, China
Thomas Plagemann University of Oslo, Norway
Ke Lu University of Chinese Academy of Sciences, China
Jianping Fan University of North Carolina at Charlotte, USA

Program Chairs
Meng Wang Hefei University of Technology, China
Qi Tian University of Texas at San Antonio, USA
Abdulmotaleb El Saddik University of Ottawa, Canada
Yun Tie Zhengzhou University, China

Organizing Chairs
Jinye Peng Northwest University, China
Xinbo Gao Xidian University, China
Ziyu Guan Northwest University, China
Yizhou Wang Peking University, China

Publicity Chairs
Xueming Qian Xi’an Jiaotong University, China
Xiaojiang Chen Northwest University, China
Cheng Jin Fudan University, China
Xiangyang Xue Fudan University, China

Publication Chairs
Jun Wu Northwestern Polytechnical University, China
Enqing Chen Zhengzhou University, China
Local Arrangements Chairs


Kuizi Mei Xi’an Jiaotong University, China
Xuguang Lan Xi’an Jiaotong University, China

Special Session Chairs


Jianbing Shen Beijing Institute of Technology, China
Jialie Shen Singapore Management University, Singapore
Jianru Xue Xi’an Jiaotong University, China

Demo Chairs
Yugang Jiang Fudan University, China
Jitao Sang Institute of Automation, Chinese Academy of Sciences,
China

Finance and Registration Chair


Shuchan Gao Xi’an Jiaotong University, China
Contents – Part II

A Global-Local Approach to Extracting Deformable Fashion Items from Web Images . . . 1
Lixuan Yang, Helena Rodriguez, Michel Crucianu, and Marin Ferecatu

Say Cheese: Personal Photography Layout Recommendation Using 3D Aesthetics Estimation . . . 13
Ben Zhang, Ran Ju, Tongwei Ren, and Gangshan Wu

Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints . . . 24
Yinan Li, Xiongwei Zhang, Meng Sun, Xushan Chen, and Lin Qiao

Facial Animation Based on 2D Shape Regression . . . 33
Ruibin Bai, Qiqi Hou, Jinjun Wang, and Yihong Gong

A Deep CNN with Focused Attention Objective for Integrated Object Recognition and Localization . . . 43
Xiaoyu Tao, Chenyang Xu, Yihong Gong, and Jinjun Wang

An Accurate Measurement System for Non-cooperative Spherical Target Based on Calibrated Lasers . . . 54
Hang Dong, Fei Wang, Haiwei Yang, Zhongheng Li, and Yanan Chen

Integrating Supervised Laplacian Objective with CNN for Object Recognition . . . 64
Weiwei Shi, Yihong Gong, Jinjun Wang, and Nanning Zheng

Automatic Color Image Enhancement Using Double Channels . . . 74
Na Li, Zhao Liu, Jie Lei, Mingli Song, and Jiajun Bu

Deep Ranking Model for Person Re-identification with Pairwise Similarity Comparison . . . 84
Sanping Zhou, Jinjun Wang, Qiqi Hou, and Yihong Gong

Cluster Enhanced Multi-task Learning for Face Attributes Feature Selection . . . 95
Yuchun Fang and Xiaoda Jiang

Triple-Bit Quantization with Asymmetric Distance for Nearest Neighbor Search . . . 105
Han Deng, Hongtao Xie, Wei Ma, Qiong Dai, Jianjun Chen, and Ming Lu

Creating Spectral Words for Large-Scale Hyperspectral Remote Sensing Image Retrieval . . . 116
Wenhao Geng, Jing Zhang, Li Zhuo, Jihong Liu, and Lu Chen

Rapid Vehicle Retrieval Using a Cascade of Interest Regions . . . 126
Yuanqi Su, Bonan Cuan, Xingjun Zhang, and Yuehu Liu

Towards Drug Counterfeit Detection Using Package Paperboard Classification . . . 136
Christof Kauba, Luca Debiasi, Rudolf Schraml, and Andreas Uhl

Dynamic Strategies for Flow Scheduling in Multihoming Video CDNs . . . 147
Ming Ma, Zhi Wang, Yankai Zhang, and Lifeng Sun

Homogenous Color Transfer Using Texture Retrieval and Matching . . . 159
Chang Xing, Hai Ye, Tao Yu, and Zhong Zhou

Viewpoint Estimation for Objects with Convolutional Neural Network Trained on Synthetic Images . . . 169
Yumeng Wang, Shuyang Li, Mengyao Jia, and Wei Liang

Depth Extraction from a Light Field Camera Using Weighted Median Filtering . . . 180
Changtian Sun and Gangshan Wu

Scale and Topology Preserving SIFT Feature Hashing . . . 190
Chen Kang, Li Zhu, and Xueming Qian

Hierarchical Traffic Sign Recognition . . . 200
Yanyun Qu, Siying Yang, Weiwei Wu, and Li Lin

Category Aggregation Among Region Proposals for Object Detection . . . 210
Linghui Li, Sheng Tang, Jianshe Zhou, Bin Wang, and Qi Tian

Exploiting Local Feature Fusion for Action Recognition . . . 221
Jie Miao, Xiangmin Xu, Xiaoyi Jia, Haoyu Huang, Bolun Cai, Chunmei Qing, and Xiaofen Xing

Improving Image Captioning by Concept-Based Sentence Reranking . . . 231
Xirong Li and Qin Jin

Blind Image Quality Assessment Based on Local Quantized Pattern . . . 241
Yazhong Zhang, Jinjian Wu, Xuemei Xie, and Guangming Shi

Sign Language Recognition with Multi-modal Features . . . 252
Junfu Pu, Wengang Zhou, and Houqiang Li

Heterogeneous Convolutional Neural Networks for Visual Recognition . . . 262
Xiangyang Li, Luis Herranz, and Shuqiang Jiang

Recognition Oriented Feature Hallucination for Low Resolution Face Images . . . 275
Guangheng Jia, Xiaoguang Li, Li Zhuo, and Li Liu

Learning Robust Multi-Label Hashing for Efficient Image Retrieval . . . 285
Haibao Chen, Yuyan Zhao, Lei Zhu, Guilin Chen, and Kaichuan Sun

A Second-Order Approach for Blind Motion Deblurring by Normalized l1 Regularization . . . 296
Zedong Chen, Faming Fang, Yingying Xu, and Chaomin Shen

Abnormal Event Detection and Localization by Using Sparse Coding and Reconstruction . . . 306
Jing Xue, Yao Lu, and Haohao Jiang

Real-Time Video Dehazing Based on Spatio-Temporal MRF . . . 315
Bolun Cai, Xiangmin Xu, and Dacheng Tao

Dynamic Contour Matching for Lossy Screen Content Picture Intra Coding . . . 326
Hu Yuan, Tao Pin, and Yuanchun Shi

A Novel Hard-Decision Quantization Algorithm Based on Adaptive Deadzone Offset Model . . . 335
Hongkui Wang, Haibing Yin, and Ye Shen

Comparison of Information Loss Architectures in CNNs . . . 346
Song Wu and Michael S. Lew

Fast-Gaussian SIFT for Fast and Accurate Feature Extraction . . . 355
Liu Ke, Jun Wang, and Zhixian Ye

An Overview+Detail Surveillance Video Player: Information-Based Adaptive Fast-Forward . . . 366
Lele Dong, Qing Xu, Shang Wu, Xueyan Song, Klaus Schoeffmann, and Mateu Sbert

Recurrent Double Features: Recurrent Multi-scale Deep Features and Saliency Features for Salient Object Detection . . . 376
Ziqin Wang, Peilin Jiang, Fei Wang, and Xuetao Zhang

Key Frame Extraction Based on Motion Vector . . . 387
Ziqian Qiang, Qing Xu, Shihua Sun, and Mateu Sbert

Haze Removal Technology Based on Physical Model . . . 396
Yunqian Cui and Xinguang Xiang

Robust Uyghur Text Localization in Complex Background Images . . . 406
Jianjun Chen, Yun Song, Hongtao Xie, Xi Chen, Han Deng, and Yizhi Liu

Learning Qualitative and Quantitative Image Quality Assessment . . . 417
Yudong Liang, Jinjun Wang, Ze Yang, Yihong Gong, and Nanning Zheng

An Analysis-Oriented ROI Based Coding Approach on Surveillance Video Data . . . 428
Liang Liao, Ruimin Hu, Jing Xiao, Gen Zhan, Yu Chen, and Jun Xiao

A Stepwise Frontal Face Synthesis Approach for Large Pose Non-frontal Facial Image . . . 439
Xueli Wei, Ruimin Hu, Zhen Han, Liang Chen, and Xin Ding

Nonlinear PCA Network for Image Classification . . . 449
Xiao Zhang and Youtian Du

Salient Object Detection in Video Based on Dynamic Attention Center . . . 458
Mengling Shao, Ruimin Hu, Xu Wang, Zhongyuan Wang, Jing Xiao, and Ge Gao

Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation . . . 469
Wei Han, Xiongwei Zhang, Jibin Yang, Meng Sun, and Gang Min

Automatic Extraction and Construction Algorithm of Overpass from Raster Maps . . . 479
Xincan Zhao, Yaodan Liu, and Yaping Wang

Geometric and Tongue-Mouth Relation Features for Morphology Analysis of Tongue Body . . . 490
Qing Cui, Xiaoqiang Li, Jide Li, and Yin Zhang

Perceptual Asymmetric Video Coding for 3D-HEVC . . . 498
Yongfang Wang, Kanghua Zhu, Yawen Shi, and Pamela C. Cosman

Recognition of Chinese Sign Language Based on Dynamic Features Extracted by Fast Fourier Transform . . . 508
Zhengchao Zhang, Xiankang Qin, Xiaocong Wu, Feng Wang, and Zhiyong Yuan

Enhanced Joint Trilateral Up-sampling for Super-Resolution . . . 518
Liang Yuan, Xin Jin, and Chun Yuan

Learning to Recognize Hand-Held Objects from Scratch . . . 527
Xue Li, Shuqiang Jiang, Xiong Lv, and Chengpeng Chen

Audio Bandwidth Extension Using Audio Super-Resolution . . . 540
Jiang Lin, Hu Ruimin, Wang Xiaochen, and Tu Weiping

Jointly Learning a Multi-class Discriminative Dictionary for Robust Visual Tracking . . . 550
Zhao Liu, Mingtao Pei, Chi Zhang, and Mingda Zhu

Product Image Search with Deep Attribute Mining and Re-ranking . . . 561
Xin Zhou, Yuqi Zhang, Xiuxiu Bai, Jihua Zhu, Li Zhu, and Xueming Qian

A New Rate Control Algorithm Based on Region of Interest for HEVC . . . 571
Liquan Shen, Qianqian Hu, Zhi Liu, and Ping An

Deep Learning Features Inspired Saliency Detection of 3D Images . . . 580
Qiudan Zhang, Xu Wang, Jianmin Jiang, and Lin Ma

No-Reference Quality Assessment of Camera-Captured Distortion Images . . . 590
Lijuan Tang, Leida Li, Ke Gu, Jiansheng Qian, and Jianying Zhang

GIP: Generic Image Prior for No Reference Image Quality Assessment . . . 600
Qingbo Wu, Hongliang Li, and King N. Ngan

Objective Quality Assessment of Screen Content Images by Structure Information . . . 609
Yuming Fang, Jiebin Yan, Jiaying Liu, Shiqi Wang, Qiaohong Li, and Zongming Guo

CrowdTravel: Leveraging Heterogeneous Crowdsourced Data for Scenic Spot Profiling and Recommendation . . . 617
Tong Guo, Bin Guo, Jiafan Zhang, Zhiwen Yu, and Xingshe Zhou

Context-Oriented Name-Face Association in Web Videos . . . 629
Zhineng Chen, Wei Zhang, Hongtao Xie, Bailan Feng, and Xiaoyan Gu

Social Media Profiler: Inferring Your Social Media Personality from Visual Attributes in Portrait . . . 640
Jie Nie, Lei Huang, Peng Cui, Zhen Li, Yan Yan, Zhiqiang Wei, and Wenwu Zhu

SSFS: A Space-Saliency Fingerprint Selection Framework for Crowdsourcing Based Mobile Location Recognition . . . 650
Hao Wang, Dong Zhao, Huadong Ma, and Huaiyu Xu

Multi-view Multi-object Tracking Based on Global Graph Matching Structure . . . 660
Chao Li, Shantao Ping, Hao Sheng, Jiahui Chen, and Zhang Xiong

Accelerating Large-Scale Human Action Recognition with GPU-Based Spark . . . 670
Hanli Wang, Xiaobin Zheng, and Bo Xiao

Adaptive Multi-class Correlation Filters . . . 680
Linlin Yang, Chen Chen, Hainan Wang, Baochang Zhang, and Jungong Han

Deep Neural Networks for Free-Hand Sketch Recognition . . . 689
Yuqi Zhang, Yuting Zhang, and Xueming Qian

Fusion of Thermal and Visible Imagery for Effective Detection and Tracking of Salient Objects in Videos . . . 697
Yijun Yan, Jinchang Ren, Huimin Zhao, Jiangbin Zheng, Ezrinda Mohd Zaihidee, and John Soraghan

RGB-D Camera based Human Limb Movement Recognition and Tracking in Supine Positions . . . 705
Jun Wu, Cailiang Kuang, Kai Zeng, Wenjing Qiao, Fan Zhang, Xiaobo Zhang, and Zhisheng Xu

Scene Parsing with Deep Features and Spatial Structure Learning . . . 715
Hui Yu, Yuecheng Song, Wenyu Ju, and Zhenbao Liu

Semi-supervised Learning for Human Pose Recognition with RGB-D Light-Model . . . 723
Xinbo Wang, Guoshan Zhang, Dahai Yu, and Dan Liu

Author Index . . . 739


Contents – Part I

Visual Tracking by Local Superpixel Matching with Markov Random Field . . . 1
Heng Fan, Jinhai Xiang, and Zhongmin Chen

Saliency Detection Combining Multi-layer Integration Algorithm with Background Prior and Energy Function . . . 11
Chenxing Xia and Hanling Zhang

Facial Landmark Localization by Part-Aware Deep Convolutional Network . . . 22
Keke He and Xiangyang Xue

On Combining Compressed Sensing and Sparse Representations for Object Tracking . . . 32
Hang Sun, Jing Li, Bo Du, and Dacheng Tao

Leaf Recognition Based on Binary Gabor Pattern and Extreme Learning Machine . . . 44
Huisi Wu, Jingjing Liu, Ping Li, and Zhenkun Wen

Sparse Representation Based Histogram in Color Texture Retrieval . . . 55
Cong Bai, Jia-nan Chen, Jinglin Zhang, Kidiyo Kpalma, and Joseph Ronsin

Improving Image Retrieval by Local Feature Reselection with Query Expansion . . . 65
Hanli Wang and Tianyao Sun

Sparse Subspace Clustering via Closure Subgraph Based on Directed Graph . . . 75
Yuefeng Ma and Xun Liang

Robust Lip Segmentation Based on Complexion Mixture Model . . . 85
Yangyang Hu, Hong Lu, Jinhua Cheng, Wenqiang Zhang, Fufeng Li, and Weifei Zhang

Visual BFI: An Exploratory Study for Image-Based Personality Test . . . 95
Jitao Sang, Huaiwen Zhang, and Changsheng Xu

Fast Cross-Scenario Clothing Retrieval Based on Indexing Deep Features . . . 107
Zongmin Li, Yante Li, Yongbiao Gao, and Yujie Liu

3D Point Cloud Encryption Through Chaotic Mapping . . . 119
Xin Jin, Zhaoxing Wu, Chenggen Song, Chunwei Zhang, and Xiaodong Li

Online Multi-Person Tracking Based on Metric Learning . . . 130
Changyong Yu, Min Yang, Yanmei Dong, Mingtao Pei, and Yunde Jia

A Low-Rank Tensor Decomposition Based Hyperspectral Image Compression Algorithm . . . 141
Mengfei Zhang, Bo Du, Lefei Zhang, and Xuelong Li

Moving Object Detection with ViBe and Texture Feature . . . 150
Yumin Tian, Dan Wang, Peipei Jia, and Jinhui Liu

Leveraging Composition of Object Regions for Aesthetic Assessment of Photographs . . . 160
Hong Lu, Zeping Yao, Yunhan Bai, Zhibin Zhu, Bohong Yang, Lukun Chen, and Wenqiang Zhang

Video Affective Content Analysis Based on Protagonist via Convolutional Neural Network . . . 170
Yingying Zhu, Zhengbo Jiang, Jianfeng Peng, and Sheng-hua Zhong

Texture Description Using Dual Tree Complex Wavelet Packets . . . 181
M. Liedlgruber, M. Häfner, J. Hämmerle-Uhl, and A. Uhl

Fast and Accurate Image Denoising via a Deep Convolutional-Pairs Network . . . 191
Lulu Sun, Yongbing Zhang, Wangpeng An, Jingtao Fan, Jian Zhang, Haoqian Wang, and Qionghai Dai

Traffic Sign Recognition Based on Attribute-Refinement Cascaded Convolutional Neural Networks . . . 201
Kaixuan Xie, Shiming Ge, Qiting Ye, and Zhao Luo

Building Locally Discriminative Classifier Ensemble Through Classifier Fusion Among Nearest Neighbors . . . 211
Xiang-Jun Shen, Wen-Chao Zhang, Wei Cai, Ben-Bright B. Benuw, He-Ping Song, Qian Zhu, and Zheng-Jun Zha

Retrieving Images by Multiple Samples via Fusing Deep Features . . . 221
Kecai Wu, Xueliang Liu, Jie Shao, Richang Hong, and Tao Yang

A Part-Based and Feature Fusion Method for Clothing Classification . . . 231
Pan Huo, Yunhong Wang, and Qingjie Liu

Research on Perception Sensitivity of Elevation Angle in 3D Sound Field . . . 242
Yafei Wu, Xiaochen Wang, Cheng Yang, Ge Gao, and Wei Chen

Tri-level Combination for Image Representation . . . 250
Ruiying Li, Chunjie Zhang, and Qingming Huang

Accurate Multi-view Stereopsis Fusing DAISY Descriptor and Scaled-Neighbourhood Patches . . . 260
Fei Wang and Ning An

Stereo Matching Based on CF-EM Joint Algorithm . . . 271
Baoping Li, Long Ye, Yun Tie, and Qin Zhang

Fine-Grained Vehicle Recognition in Traffic Surveillance . . . 285
Qi Wang, Zhongyuan Wang, Jing Xiao, Jun Xiao, and Wenbin Li

Transductive Classification by Robust Linear Neighborhood Propagation . . . 296
Lei Jia, Zhao Zhang, and Weiming Jiang

Discriminative Sparse Coding by Nuclear Norm-Driven Semi-Supervised Dictionary Learning . . . 306
Weiming Jiang, Zhao Zhang, Yan Zhang, and Fanzhang Li

Semantically Smoothed Refinement for Everyday Concept Indexing . . . 318
Peng Wang, Lifeng Sun, Shiqiang Yang, and Alan F. Smeaton

A Deep Two-Stream Network for Bidirectional Cross-Media Information Retrieval . . . 328
Tianyuan Yu, Liang Bai, Jinlin Guo, Zheng Yang, and Yuxiang Xie

Prototyping Methodology with Motion Estimation Algorithm . . . 338
Jinglin Zhang, Jian Shang, and Cong Bai

Automatic Image Annotation Using Adaptive Weighted Distance in Improved K Nearest Neighbors Framework . . . 345
Jiancheng Li and Chun Yuan

One-Shot-Learning Gesture Segmentation and Recognition Using Frame-Based PDV Features . . . 355
Tao Rong and Ruoyu Yang

Multi-scale Point Set Saliency Detection Based on Site Entropy Rate . . . 366
Yu Guo, Fei Wang, Pengyu Liu, Jingmin Xin, and Nanning Zheng

Facial Expression Recognition with Multi-scale Convolution Neural Network . . . 376
Jieru Wang and Chun Yuan

Deep Similarity Feature Learning for Person Re-identification . . . 386
Yanan Guo, Dapeng Tao, Jun Yu, and Yaotang Li

Object Detection Based on Scene Understanding and Enhanced Proposals . . . 397
Zhicheng Wang and Chun Yuan

Video Inpainting Based on Joint Gradient and Noise Minimization . . . 407
Yiqi Jiang, Xin Jin, and Zhiyong Wu

Head Related Transfer Function Interpolation Based on Aligning Operation . . . 418
Tingzhao Wu, Ruimin Hu, Xiaochen Wang, Li Gao, and Shanfa Ke

Adaptive Multi-window Matching Method for Depth Sensing SoC and Its VLSI Implementation . . . 428
Huimin Yao, Chenyang Ge, Liuqing Yang, Yichuan Fu, and Jianru Xue

A Cross-Domain Lifelong Learning Model for Visual Understanding . . . 438
Chunmei Qing, Zhuobin Huang, and Xiangmin Xu

On the Quantitative Analysis of Sparse RBMs . . . 449
Yanxia Zhang, Lu Yang, Binghao Meng, Hong Cheng, Yong Zhang, Qian Wang, and Jiadan Zhu

An Efficient Solution for Extrinsic Calibration of a Vision System with Simple Laser . . . 459
Ya-Nan Chen, Fei Wang, Hang Dong, Xuetao Zhang, and Haiwei Yang

A Stepped-RAM Reading and Multiplierless VLSI Architecture for Intra Prediction in HEVC . . . 469
Wei Zhou, Yue Niu, Xiaocong Lian, Xin Zhou, and Jiamin Yang

A Sea-Land Segmentation Algorithm Based on Sea Surface Analysis . . . 479
Guichi Liu, Enqing Chen, Lin Qi, Yun Tie, and Deyin Liu

Criminal Investigation Oriented Saliency Detection for Surveillance Videos . . . 487
Yu Chen, Ruimin Hu, Jing Xiao, Liang Liao, Jun Xiao, and Gen Zhan

Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos . . . 497
Shun Zhang, Yihong Gong, and Jinjun Wang

Characterizing TCP Performance for Chunk Delivery in DASH . . . 509
Wen Hu, Zhi Wang, and Lifeng Sun

Where and What to Eat: Simultaneous Restaurant and Dish Recognition from Food Image . . . 520
Huayang Wang, Weiqing Min, Xiangyang Li, and Shuqiang Jiang

A Real-Time Gesture-Based Unmanned Aerial Vehicle Control System . . . 529
Leye Wei, Xin Jin, Zhiyong Wu, and Lei Zhang

A Biologically Inspired Deep CNN Model . . . 540
Shizhou Zhang, Yihong Gong, Jinjun Wang, and Nanning Zheng

Saliency-Based Objective Quality Assessment of Tone-Mapped Images . . . 550
Yinchu Chen, Ke Li, and Bo Yan

Sparse Matrix Based Hashing for Approximate Nearest Neighbor Search . . . 559
Min Wang, Wengang Zhou, Qi Tian, and Houqiang Li

Piecewise Affine Sparse Representation via Edge Preserving Image Smoothing . . . 569
Xuan Wang, Fei Wang, and Yu Guo

Author Index . . . 577


A Global-Local Approach to Extracting
Deformable Fashion Items from Web Images

Lixuan Yang¹,²(B), Helena Rodriguez², Michel Crucianu¹, and Marin Ferecatu¹

¹ Conservatoire National des Arts et Metiers, 292 Rue Saint-Martin, 75003 Paris, France
{lixuan.yang,michel.crucianu,marin.ferecatu}@cnam.fr
² Shopedia SAS, 55 Rue La Boétie, 75008 Paris, France
{lixuan.yang,helena.rodriguez}@shopedia.fr

Abstract. In this work we propose a new framework for extracting deformable
clothing items from images by using a three-stage global-local fitting procedure.
First, a set of initial segmentation templates are generated from a handcrafted
database. Then, each template initiates an object extraction process by a global
alignment of the model, followed by a local search minimizing a measure of the
misfit with respect to the potential boundaries in the neighborhood. Finally, the
results provided by each template are aggregated, with a global fitting criterion,
to obtain the final segmentation. The method is validated on the Fashionista
database and on a new database of manually segmented images. Our method
compares favorably with the Paper Doll clothing parsing and with the recent
GrabCut on One Cut foreground extraction method. We quantitatively analyze
each component, and show examples of both successful segmentation and
difficult cases.

Keywords: Clothing extraction · Segmentation · Active contour · GrabCut

1 Introduction and Related Work

With the recent proliferation of fashion web-stores, an important goal for online
advertising systems is to propose items that truly correspond to the expectations
of the users in terms of design, manufacturing and suitability. We put forward
here a method to extract, without user supervision, clothes and other fashion
items from web images. Indeed, localizing, extracting and tracking fashion items
during web browsing is an important step in addressing the needs of professionals
of online advertising and fashion media: present the users with relevant items
from a clothing database, based on the content of the web application they are
consulting and its context of use. Users usually look for characteristics expressed
by very subjective concepts, to describe a style, a brand or a specific design. For
this reason, recent research focused in the development of detection, recognition
and search of fashion items based on visual characteristics [11].

© Springer International Publishing AG 2016
E. Chen et al. (Eds.): PCM 2016, Part II, LNCS 9917, pp. 1–12, 2016.
DOI: 10.1007/978-3-319-48896-7_1

A popular approach is to model the target items based on attribute selection and
high-level classification, for example [5] trains attribute classifiers on fine-grained
clothing styles formulating the retrieval as a classification problem, [2] extracts
low-level features in a pose-adaptive manner and learns attribute classifiers by using
conditional random fields (CRF), while [3] introduced a novel double-path deep
domain adaptation network for attribute prediction by modeling the data jointly from
unconstrained photos and the images issued from large-scale online shopping stores.
A complementary approach is to use part-based models to compensate for the lack of
pose estimation. The idea is to automatically align patches of human body parts by
using different methods, for example sparse coding as in [16] or graph parsing
technique as in [12].
Segmentation and aggregation to select cloth categories was employed either
by using bottom-up cloth parsing from labels attached to pixels [19] or by over-
segmentation and classification [8]. Deep learning was also used with success
for clothing retrieval (deep similarity learning [13], Siamese networks [18]) or to
predict fashionability [15].

Fig. 1. Our goal is to produce a precise segmentation (extraction) of the fashion items
as in (b).

Unlike the above-mentioned methods, our proposal aims to precisely segment the
object of interest from the background (foreground separation, see Fig. 1(b)), without
user interaction and without using an extensive training database. Extracting such
complex objects by simply optimizing a local pixel objective function is likely to fail
without an awareness of the object's global properties. To take this into account, we
propose a Global-Local approach based on the idea that a local search is likely to
converge to a better fit if the initial state is coherent with the expected global
appearance of the object.
Our method is validated on the Fashionista database [19]¹ and on a new database
of manually segmented images that we specifically built to test fashion objects
extraction and that we make available to the community. Our method compares
favorably with the well-known Paper Doll [19] clothing parsing and with the recent
GrabCut on One Cut [17] generic foreground extraction method. We provide examples
of successful segmentation, analyze difficult cases and also quantitatively evaluate
each component.
¹ http://vision.is.tohoku.ac.jp/~kyamagu/research/paperdoll/.

In Sect. 2 we describe our proposal, followed by a detailed presentation of each
component. After the experimental validation in Sect. 3, we conclude the paper in
Sect. 4 with a discussion of the main points and extension perspectives.

2 Our Proposal
Detecting clothes in images is a difficult problem because the objects are
deformable, have large intra-class diversity and may appear against complex
backgrounds. To extract objects under these difficult conditions and without
user intervention, methods solely relying on optimizing a local criterion (or pixel
classification based on local features) are unlikely to perform well. Some knowl-
edge about the global shape of the class of objects to be extracted is necessary to
help a local analysis converge to a correct object boundary. In this paper we use
this intuition to develop a framework that takes into account the local/global
duality to select the most likely object segmentation.
We investigate here fashion items that are worn by a person. This covers
practically most of the situations encountered by users of fashion and/or news
web sites, while making possible the use of a person detector to restrict the
search regions in the image and to serve as reference for alignment operations.
First, we prepare a set of images containing the object of interest and we
manually segment them. These initial object masks (called templates in the fol-
lowing) provide the prior knowledge used by the algorithm. Of course, a given
manual segmentation will not match exactly the object in an unknown image.
We use each segmentation (after a suitable alignment) as a template to initiate
an active contour (AC) procedure that will converge closer to the true bound-
aries of the real object in the current image. We then extract the object with
a suitable GrabCut procedure to provide the final segmentation. Thus, at the
end we have as many candidate segmentations as hand-made templates. In the
final step we choose the best of them according to a criterion that optimizes
the coherence of the proposed segmentation with the edges extracted from the
image. In the following subsections we detail each of these stages (see also Fig. 2
for an illustration).

Fig. 2. Different stages of our approach: (a) original image, (b) a template segmenta-
tion, (c) output of the person detector, (d) result after the alignment step, (e) result
after the active contour step, (f) the GrabCut band, (g) result after the GrabCut step.

To summarize, the main contributions of this paper are: we introduce a new
framework for the extraction of fashion items in web images that combines local and
global object characteristics, a framework supported by a new active contour that
optimizes the gap with respect to the global segmentation model, and by a new
measure of fit of the proposed segmentation to the real distribution of the contours.
Also, we prepare a new benchmark database and make it available to the community.

2.1 Person Detector


For clothing extraction, it is reasonable to first apply a person detector. As in
many other studies (e.g. [8,12,20]), we use the person detector with articulated
pose estimation algorithm from [21] that was extensively tested and proved to
be very robust in several other fashion-related works (see Sect. 1). It is based
on a deformable model that sees the object as a combination of parts [21]. The
detection score is defined as the fit to parts minus the parts deformation cost.
The mixture model not only encodes object structure but also captures spatial
relations between part locations and co-occurrence relations between parts. The
output of the detector is a set of parts (rectangular boxes) centered on the body
joints and oriented correctly. The boxes are used as reference points for alignment
by translation and re-scaling in several stages of our proposal (see below).
To train the person detector, we manually annotate a set of 800 images. Each
person is annotated with 14 joint points by marking the articulations and main
body parts. When the legs are covered by long dresses, the lower parts are placed
on the edges of the dress rather than on the legs. This not only improves detection
accuracy, but also hints to the location of the contours. Figure 2(c) shows the
output of the person detector on an unannotated image. Boxes usually slightly
cover the limbs and body joints.

2.2 Template Selection


As we have seen, each initial template can provide a candidate segmentation for a
new, unknown image. However, this is redundant and may unnecessarily slow down
the procedure. Since we focus on the fashion items that are worn by a person, the
number of different poses in which an object may be found is relatively small, and
many initial templates are thus quite similar. Intuitively, templates that are alike in
shape should also produce similar segmentation masks. To reduce their number, the
initial templates are clustered into similar-shape clusters by using the K-Medoid
procedure [9]. We employ 8 clusters for each object class, which is a reasonable
choice in our case because the number of person poses is not very large. Each
resulting cluster is a configuration of deformable objects that share a similarity in
pose, viewpoint and clothing shape. The dissimilarity of two object masks is defined
by the complement of the Jaccard index:
$$ d(S_1, S_2) = 1 - \frac{\mathrm{Surface}(S_1 \cap S_2)}{\mathrm{Surface}(S_1 \cup S_2)} $$
where S1 and S2 are the binary masks of two objects.
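For concreteness, the following minimal Python sketch (ours, not the authors' code) implements this dissimilarity and a basic K-Medoids grouping; the array-based mask representation and the PAM-style update rule are assumptions.

```python
# Sketch of the template-clustering step: the dissimilarity between two binary
# masks is the complement of the Jaccard index; templates are grouped into
# 8 similar-shape clusters with a basic K-Medoids (PAM-style) loop.
import numpy as np

def mask_dissimilarity(s1, s2):
    """d(S1, S2) = 1 - Surface(S1 & S2) / Surface(S1 | S2)."""
    inter = np.logical_and(s1, s2).sum()
    union = np.logical_or(s1, s2).sum()
    return 1.0 - inter / union if union > 0 else 1.0

def k_medoids(dist, k=8, n_iter=100, seed=0):
    """Cluster items given a precomputed pairwise dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)      # nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # new medoid = member with least total in-cluster distance
                within = dist[np.ix_(members, members)]
                new_medoids[c] = members[np.argmin(within.sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# masks: list of boolean template masks of identical size
# dist = np.array([[mask_dissimilarity(a, b) for b in masks] for a in masks])
# medoids, labels = k_medoids(dist, k=8)
```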

Fig. 3. Medoids of the 8 clusters of template segmentations for three classes: jeans
(top), long dress (middle) and coat (bottom).

Each cluster represents a segmentation configuration and its prototype is used in
the next stages of the procedure. However, we do not simply choose the medoid as the
prototype of the cluster, but rather the element in the cluster that is visually closest to
the corresponding box parts produced by the person detector on the unknown image.
To do so, we apply the object detector on both the unknown image and the template
image and we compare the boxes that contain the object in the template with the
corresponding ones in the unknown image by using the Euclidean distance. To
represent the content of the boxes we first considered HOG features [4] (to favor
similar shape content) but finally settled for Caffe features [7] that provide better
results. This suggests that mid-level features give better clues to identifying the
correct pose of an object compared to local pure shape features. Shape is relevant for
comparing the boundaries of two objects but less so when comparing what is inside
those boundaries.
Specifically, we use the AlexNet model in [10] within the Caffe framework [7].
The network was pre-trained on 1.2 million high-resolution images from ImageNet,
classified into 1000 classes. To fine-tune the network to our image domain, we replace
the last layer by a layer of ten outputs (the number of classes considered here) and
then retrain the network on our training database with backpropagation to fine-tune
the weights of all the layers. After the fine-tuning, the feature we employ is the vector
of responses for layer fc7 (second to last layer) obtained by forward propagation.
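As an illustration, a hedged pycaffe sketch of this feature extraction follows; the model file names are placeholders and the preprocessing is simplified relative to a full deployment pipeline.

```python
# Extract fc7 activations from the fine-tuned AlexNet and compare two boxes
# with the Euclidean distance. File names below are hypothetical.
import caffe
import numpy as np

net = caffe.Net('alexnet_finetuned_deploy.prototxt',   # placeholder paths
                'alexnet_finetuned.caffemodel', caffe.TEST)
net.blobs['data'].reshape(1, 3, 227, 227)              # AlexNet crop size

def fc7_feature(crop):
    """Forward one RGB crop (as loaded by caffe.io.load_image, float HWC in
    [0, 1]) and return its fc7 activation vector."""
    t = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    t.set_transpose('data', (2, 0, 1))                 # HWC -> CHW
    net.blobs['data'].data[0] = t.preprocess('data', crop)
    net.forward()
    return net.blobs['fc7'].data[0].copy()

# Boxes are then compared with the Euclidean distance between fc7 vectors:
# d = np.linalg.norm(fc7_feature(box_a) - fc7_feature(box_b))
```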
To illustrate this step we show in Fig. 3 the medoids (centers) of the 8 clusters
obtained for three classes of our benchmark database. We notice the diversity in
poses, scale and topology. For example, some coats are segmented into several
disjoint parts, some have openings and some jeans are covered by a vest.

2.3 Template Alignment

The output of the previous stage is a set of segmentation templates (8 in our case)
for each object class. They will be used one by one to initiate an active contour
process. But they first need to be aligned into the unknown image at the right site and
with the correct angle and scale. We propose an SVM alignment technique based on
the observation that the person detector places the boxes centered on the body joints.
Thus, the line joining adjacent boxes represents the body limbs. Since the clothing's
spatial distribution highly depends on the pose of the human body, and thus on limb
placement, we use the vector of distances from a pixel to the limbs as a feature vector
to learn a pixel-level SVM classifier that predicts whether a pixel belongs to the
object. Learning is performed on the template image and prediction on the unknown
image. Pixels predicted as positives form the mask whose envelope serves as
initialization for the active contour step. The SVM uses a Gaussian kernel with a scale
parameter σ = 1 found through experiments.
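A minimal sketch of this alignment classifier is given below, assuming limbs are available as segments joining adjacent detector boxes; the helper names are illustrative, not from the paper's code.

```python
# Per-pixel features = distances to each limb segment; an RBF (Gaussian) SVM
# trained on the template mask predicts object pixels in the unknown image.
import numpy as np
from sklearn.svm import SVC

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def limb_features(shape, limbs):
    """Feature vector for every pixel: its distance to each limb segment."""
    h, w = shape
    feats = np.empty((h * w, len(limbs)))
    i = 0
    for y in range(h):
        for x in range(w):
            feats[i] = [point_segment_distance((x, y), a, b) for a, b in limbs]
            i += 1
    return feats

# Train on the template, predict on the unknown image. With sigma = 1, the
# scikit-learn RBF parameter is gamma = 1 / (2 * sigma**2) = 0.5.
# clf = SVC(kernel='rbf', gamma=0.5)
# clf.fit(limb_features(tpl_mask.shape, tpl_limbs),
#         tpl_mask.ravel().astype(int))
# pred = clf.predict(limb_features(img_shape, img_limbs)).reshape(img_shape)
```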

2.4 Active Contour


Once the template is embedded in the image, we use it to initialize an active
contour (AC) that should converge to the boundaries of the object. The result
is highly dependent on the initial contour, but usually one of the 8 segmenta-
tion templates leads to a final contour that is quite close to the true boundary.
The AC is initialized with the aligned segmentation contour produced by the
previous step and has as input the gray-level image. We use the AC introduced
in [1] because it can segment objects whose boundaries are not necessarily well-
supported by gradient information. The AC minimizes an energy defined by
contour length, area inside the contour and a fitting term:

$$ F(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \nu \cdot \mathrm{Area}(\mathrm{in}(C)) + \lambda_1 \int_{\mathrm{in}(C)} |u(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{out}(C)} |u(x, y) - c_2|^2 \, dx \, dy \quad (1) $$

where $C$ is the current contour, and $c_1$ and $c_2$ are the average pixel gray-level
values $u(x, y)$ inside and outside the contour $C$, respectively. The curvature term is
controlled by $\mu$ and the fitting terms by $\lambda_1$ and $\lambda_2$. The averages $c_1$ and $c_2$ are usually
computed on the entire image. Because of the large variability of the background in
real images, these values can be meaningless locally. Consequently, in our case we
replace them by averages computed in a local window of size 40×40 pixels around
each contour pixel.
To reinforce the influence of the global shape of the template on the position
of the AC, we include a new term in the energy function (Eq. 1) that moderates
the tendency to converge too far away from the template:

$$ F_t(C) = \eta \int_{\mathrm{on}(C)} D_m(x, y) \, ds \quad (2) $$

where $D_m(x, y)$ is the distance between pixel $(x, y)$ and the template. By including
this term, the contour will converge to those image regions that separate best the
inside from the outside and, at the same time, are not too far away from the template
contour.
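To make the discrete computation explicit, the sketch below evaluates the energy of Eqs. (1) and (2) on a binary mask representation. It is an energy evaluation only (not a curve-evolution solver), and the single local mean is a simplification of the separate inside/outside window averages described above.

```python
# Discrete sketch of the modified Chan-Vese-style energy of Eqs. (1)-(2).
import numpy as np
from scipy import ndimage

def energy(contour_mask, u, template_mask,
           mu=1.0, nu=0.0, lam1=1.0, lam2=1.0, eta=1.0, win=40):
    inside = ndimage.binary_fill_holes(contour_mask.astype(bool))
    # Local gray-level mean in a 40x40 window: a simplified stand-in for the
    # per-contour-pixel inside/outside averages c1 and c2.
    local_mean = ndimage.uniform_filter(u.astype(float), size=win)
    fit_in = lam1 * ((u - local_mean) ** 2)[inside].sum()
    fit_out = lam2 * ((u - local_mean) ** 2)[~inside].sum()
    length = contour_mask.sum()               # crude proxy for Length(C)
    area = inside.sum()                       # Area(in(C))
    # Eq. (2): distance from each contour pixel to the aligned template
    d_m = ndimage.distance_transform_edt(~template_mask.astype(bool))
    template_term = eta * d_m[contour_mask.astype(bool)].sum()
    return mu * length + nu * area + fit_in + fit_out + template_term
```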

2.5 Segmentation

The contours obtained in the previous step suffer from two implicit problems:
(1) only the grey-level information is used by the AC process, and (2) possible
alignment errors may affect the result. To compensate for these problems, an
“exclusion band” of constant thickness is defined around the contour produced
by the previous step, then the inside region is labeled as “certain foreground”
and the outside area as “certain background”. A GrabCut algorithm [14] is then
initialized by these labels to obtain the final result. GrabCut takes into account
the global information of color in the image and will correct the alignment errors
within the limits of the defined band.
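An OpenCV sketch of this initialization follows; the band thickness is not specified in the text, so the value used here is an assumption.

```python
# GrabCut initialized from the active-contour result (Sect. 2.5): pixels well
# inside the contour are certain foreground, pixels well outside are certain
# background, and the exclusion band in between is left for GrabCut to decide.
import cv2
import numpy as np

def grabcut_with_band(image_bgr, ac_mask, band=10, iters=5):
    kernel = np.ones((band, band), np.uint8)
    inner = cv2.erode(ac_mask.astype(np.uint8), kernel)
    outer = cv2.dilate(ac_mask.astype(np.uint8), kernel)
    mask = np.full(ac_mask.shape, cv2.GC_PR_FGD, np.uint8)  # band: undecided
    mask[outer == 0] = cv2.GC_BGD                            # certain background
    mask[inner == 1] = cv2.GC_FGD                            # certain foreground
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```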

2.6 Object Selection

After obtaining the segmentation proposals initiated from each template, we need
to select a single segmentation as the final result. For this, we propose a score based
on a global measure of fit to the image:

$$ F(C) = \frac{\int_{\mathrm{on}(C)} D_e(x, y) \, ds}{\int_{\mathrm{on}(C)} ds} \quad (3) $$

where $D_e(x, y)$ is the distance from the current pixel to the closest edge detected
by [6] and $C$ is the boundary of the segmentation proposal. This score measures the
average distance from the segmentation boundary to the closest edges in the image.
A small value indicates a good fit to the image. See Table 1 for an illustration of this
step.
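The score of Eq. (3) can be sketched with a distance transform over an edge map, as below; Canny is used purely as a stand-in for the edge detector of [6].

```python
# Selection score of Eq. (3): average distance from the proposal's boundary
# pixels to the nearest image edge. Smaller values indicate a better fit.
import cv2
import numpy as np

def fit_score(gray_u8, proposal_mask):
    edges = cv2.Canny(gray_u8, 100, 200) > 0          # stand-in for [6]
    # distance from every pixel to the nearest edge pixel
    d_e = cv2.distanceTransform((~edges).astype(np.uint8), cv2.DIST_L2, 3)
    m = proposal_mask.astype(np.uint8)
    boundary = m - cv2.erode(m, np.ones((3, 3), np.uint8))
    return d_e[boundary == 1].mean()

# best_mask = min(proposals, key=lambda m: fit_score(gray_u8, m))
```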

Table 1. Segmentation selection from the results based on the 8 templates of the class,
using the corresponding fit values. The test image is given top left, with the extracted
edges shown bottom left. The best score is the smallest (outlined in boldface).

Fig. 4. Qualitative evaluation: original images and associated segmentation results.
3 Experimental Results
To assess the performance of the proposed method, we perform two sets of
experiments. In the first set, our method is compared to a recent improvement
of GrabCut [14] that is the standard approach in generic object extraction, on a
novel fashion item benchmark we built. The second set of experiments compares
our proposal to the recent PaperDoll [19] fashion item annotation method on
the Fashionista database [20].

3.1 RichPicture Database


Since, to our knowledge, at this time there is no public benchmark specifically
designed for clothing extraction from fashion images, we introduce a novel dataset
called RichPicture, consisting of 1000 images from Google.com and Bing.com. It has
100 images for each of the following fashion items: Boots, Coat, Jeans, Shirt, T-Shirt,
Short Dress, Mid Dress, Long Dress, Vest and Sweater. Each target object in each
class is manually segmented. To train the person detector (see Sect. 2.1), images are
also annotated by 14 key points. This database will be made available with the paper
and open to external contributions. We shall further extend it with new classes and
more images per class.

3.2 Comparison with GrabCut in One Cut


In this set of experiments, we compare our proposal to GrabCut in one cut [17],
a recent improvement on the well-known GrabCut [14] foreground extraction
algorithm, which is frequently used as a baseline method in the literature. GrabCut
in One Cut was shown in [17] to be more effective, less resource demanding, and
available as an open implementation. These reasons make it a good candidate as a
benchmark baseline. For the purpose of this evaluation, we split
each class of our database in 80 images for training (template selection) and 20
images for test.

Table 2. Comparison with the One Cut algorithm. The comparison measure is the
Jaccard index.

Class       Boots  Coat  Mid dress  Jeans  Shirt  T-shirt  Short dress  Long dress  Vest  Pull
Our method  0.54   0.74  0.84       0.78   0.77   0.67     0.80         0.74        0.65  0.74
One Cut     0.26   0.31  0.54       0.71   0.77   0.45     0.47         0.57        0.35  0.36

The segmentation produced by the algorithms is tested against the ground truth
obtained by manual segmentation. As performance measure we employ
truth obtained by manual segmentation. As performance measure we employ
the Jaccard index, traditionally used for segmentation evaluation, and averaged
over all the testing images of a class. To outline the object for One Cut we use
the external envelope of the relevant parts (the ones that contain parts of the
object) identified by the person detector. Table 2 shows a class by class synthesis
of the results (best results are in boldface).
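For completeness, the per-class evaluation can be sketched as follows, reusing the mask_dissimilarity helper defined earlier (one minus that value is the Jaccard index):

```python
# Mean Jaccard index of predicted masks vs. ground truth over a class's test set.
import numpy as np

def mean_jaccard(pred_masks, gt_masks):
    scores = [1.0 - mask_dissimilarity(p, g)
              for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```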
It can be seen that the proposed method performs significantly better on all
the classes except “Shirt” where the scores are equal. While both segmentation
methods are automatic (do not require interaction), these results speak in favor
of including specific knowledge into the algorithm (by the use of segmentation
templates in our case).

3.3 Comparison with Paper Doll


To our knowledge, there is no published method concerning fashion retrieval that
aims to precisely extract entire fashion items from arbitrary images. The closest
we could find is the Paper Doll framework, cited above, that in fact attributes
label scores to a set of blobs in the image. By taking the union of all the blobs
that correspond to a same clothing class, one can extract objects of that class.
The authors of Paper Doll also introduced the Fashionista database, used to
test annotation algorithms, which we use for this evaluation. Table 3 presents
the synthesis of the results of Paper Doll vs. One Cut vs. our method. The
object classes we selected for tests are those that correspond to fashion items
that are worn by persons (compatible with our method).
For our method, training and template selection are performed on the same
part of the database that Paper Doll employed for training. As seen from Table 3,
on most object classes we compare favorably to Paper Doll. For objects like
“Boots”, our method needs a more dedicated alignment process, since the object
is very small compared to the frame given by the person detector that serves
as alignment reference. For objects of the “Jeans” class, the problem also comes
from the alignment stage, because the boxes proposed by the person detector
are not very well positioned when the legs are crossed. It is necessary to increase
the number of training examples with this specific pose.

3.4 Qualitative Evaluation


We illustrate here the results of the proposed method with some examples taken
from our test database. First, Table 1 shows the final segmentation selection
The Project Gutenberg eBook of
Andersonville diary
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.

Title: Andersonville diary


escape, and list of the dead, with name, co., regiment, date
of death and no. of grave in cemetery

Author: John L. Ransom

Release date: September 10, 2023 [eBook #71609]


Most recently updated: October 27, 2023

Language: English

Original publication: Auburn N. Y: John L. Ransom, 1881

Credits: MWS, John Campbell and the Online Distributed


Proofreading Team at https://www.pgdp.net (This file was
produced from images generously made available by The
Internet Archive/American Libraries.)

*** START OF THE PROJECT GUTENBERG EBOOK


ANDERSONVILLE DIARY ***
TRANSCRIBER’S NOTE
This book has only two footnotes and they have been placed very
close to their anchors. These anchors are denoted by [A] and [B].
The Table of Contents has been created by the transcriber and is
hereby placed in the public domain.
This edition of the diary was self-published in 1881 by the author
John Ransom. It had first been printed some years earlier in a
Michigan newspaper. Many minor printer’s errors have been
corrected in this etext, and are noted at the end of the book.
Misspellings in the diary text have been left unchanged.
The ‘List of the Dead’ is printed following the diary itself and is
essentially a reprint, in a similar but different format, of the source
document held in the Library of Congress. This source list was
compiled by the efforts of Dorence Atwater and Clara Barton, and
can now be viewed online at https://www.loc.gov/item/37031864
This records the deaths of prisoners which occurred in the
fourteen months between March 1864 and April 1865. It is
organized by State, and names are listed alphabetically by first
letter only. More details can be found in the Transcriber Note at the
end of the book.
Andersonville Diary,

ESCAPE,
——AND——

LIST OF THE DEAD,


——WITH——

Name, Co., Regiment, Date of


Death
——AND——

No. of Grave in Cemetery.

JOHN L. RANSOM,
LATE FIRST SERGEANT NINTH MICH. CAV.,
AUTHOR AND PUBLISHER.

AUBURN, N. Y.

1881.
“Entered according to act of Congress, in the year 1881, by
John L. Ransom, in the office of the Librarian of
Congress, at Washington.”
D E D I C AT I O N .

TO THE

MOTHERS, WIVES AND SISTERS

OF THOSE WHOSE NAMES

ARE HEREIN RECORDED AS HAVING DIED

—IN—

ANDERSONVILLE,

THIS BOOK IS RESPECTFULLY DEDICATED

BY THE AUTHOR.
John L. Ransom.
(From a photograph taken two months
before capture.)
INTRODUCTION.

The book to which these lines form an introduction


is a peculiar one in many respects. It is a story, but it
is a true story, and written years ago with little idea
that it would ever come into this form. The writer has
been induced, only recently, by the advice of friends
and by his own feeling that such a production would
be appreciated, to present what, at the time it was
being made up, was merely a means of occupying a
mind which had to contemplate, besides, only the
horrors of a situation from which death would have
been, and was to thousands, a happy relief.
The original diary in which these writings were
made from day to day was destroyed by fire some
years after the war, but its contents had been printed
in a series of letters to the Jackson, (Mich.) Citizen,
and to the editor and publisher of that journal thanks
are now extended for the privilege of using his files
for the preparation of this work. There has been little
change in the entries in the diary, before presenting
them here. In such cases the words which suggest
themselves at the time are best—they cannot be
improved upon by substitution at a later day.
This book is essentially different from any other
that has been published concerning the “late war” or
any of its incidents. Those who have had any such
experience as the author will see its truthfulness at
once, and to all other readers it is commended as a
statement of actual things by one who experienced
them to the fullest.
The annexed list of the Andersonville dead is from
the rebel official records, is authentic, and will be
found valuable in many pension cases and
otherwise.
CONTENTS

THE CAPTURE 9
NEW YEAR’S DAY 23
PEMERTON BUILDING 34
ANDERSONVILLE 41
FROM BAD TO WORSE 65
THE RAIDERS PUT DOWN 75
AN ACCOUNT OF THE
81
HANGING
MOVED JUST IN TIME 91
HOSPITAL LIFE 97
REMOVED TO MILLEN 109
ESCAPE BUT NOT ESCAPE 120
RE-CAPTURED 127
A SUCCESSFUL ESCAPE 136
SAFE AND SOUND 154
THE FINIS 160
MICHAEL HOARE’S ESCAPE 167
REBEL TESTIMONY 172
SUMMARY 187
THE WAR’S DEAD 188
EX-PRISONERS AND
189
PENSIONERS
LIST OF THE DEAD 193
A LIST OF OFFICERS
IMPRISONED AT CAMP 289
ASYLUM
THE CAPTURE.

A REBEL RUSE TO GOBBLE UP UNION TROOPS—A COMPLETE


SURPRISE—CARELESS OFFICERS—HEROIC DEFENCE—
BEGINNING OF A LONG IMPRISONMENT.

Belle Island, Richmond, Va., Nov. 22, 1863.—I


was captured near Rogersville, East Tennessee, on
the 6th of this month, while acting as Brigade
Quarter-Master Sergt. The Brigade was divided, two
regiments twenty miles away, while Brigade Head-
Quarters with 7th Ohio and 1st Tennessee Mounted
Infantry were at Rogersville. The brigade quarter-
master had a large quantity of clothing on hand,
which we were about to issue to the brigade as soon
as possible. The rebel citizens got up a dance at one
of the public houses in the village, and invited all the
union officers. This was the evening of Nov. 5th.
Nearly all the officers attended and were away from
the command nearly all night and many were away
all night. We were encamped in a bend of the
Holston River. It was a dark rainy night and the river
rose rapidly before morning. The dance was a ruse
to get our officers away from their command. At
break of day the pickets were drove in by rebel
cavalry, and orders were immediately received from
commanding officer to get wagon train out on the
road in ten minutes. The quarter-master had been to
the dance and had not returned, consequently it
devolved upon me to see to wagon train, which I did,
and in probably ten minutes the whole seventy six
mule army wagons were in line out on the main road,
while the companies were forming into line and
getting ready for a fight. Rebels had us completely
surrounded and soon began to fire volley after volley
into our disorganized ranks. Not one officer in five
was present; Gen. commanding and staff as soon as
they realized our danger, started for the river, swam
across and got away. We had a small company of
artillery with us commanded by a lieutenant. The
lieutenant in the absence of other officers, assumed
command of the two regiments, and right gallantly
did he do service. Kept forming his men for the better
protection of his wagon train, while the rebels were
shifting around from one point to another, and all the
time sending volley after volley into our ranks. Our
men did well, and had there been plenty of officers
and ammunition, we might have gained the day. After
ten hours fighting we were obliged to surrender after
having lost in killed over a hundred, and three or four
times that number in wounded. After surrendering we
were drawn up into line, counted off and hurriedly
marched away south. By eight o’clock at night had
probably marched ten miles, and encamped until
morning. We expected that our troops would
intercept and release us, but they did not. An hour
before daylight we were up and on the march toward
Bristol, Va., that being the nearest railroad station.
We were cavalrymen, and marching on foot made us
very lame, and we could hardly hobble along. Were
very well fed on corn bread and bacon. Reached
Bristol, Va., Nov. 8th and were soon aboard of cattle
cars en-route for the rebel capital. I must here tell
how I came into possession of a very nice and large
bed spread which is doing good service even now
these cold nights. After we were captured everything
was taken away from us, blankets, overcoats, and in
many cases our boots and shoes. I had on a new
pair of boots, which by muddying them over had
escaped the rebel eyes thus far, as being a good
pair. As our blankets had been taken away from us
we suffered considerably from cold. I saw that if I
was going to remain a prisoner of war it behooved
me to get hold of a blanket. After a few hours march I
became so lame walking with my new boots on that
the rebels were compelled to put me on an old horse

You might also like