LNCS 11165

Richang Hong · Wen-Huang Cheng · Toshihiko Yamasaki · Meng Wang · Chong-Wah Ngo (Eds.)

Advances in Multimedia Information Processing – PCM 2018
19th Pacific-Rim Conference on Multimedia
Hefei, China, September 21–22, 2018
Proceedings, Part II
Lecture Notes in Computer Science 11165
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors

Richang Hong, Hefei University of Technology, Hefei, China
Wen-Huang Cheng, National Chiao Tung University, Hsinchu, Taiwan
Toshihiko Yamasaki, University of Tokyo, Tokyo, Japan
Meng Wang, Hefei University of Technology, Hefei, China
Chong-Wah Ngo, City University of Hong Kong, Hong Kong, China
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The 19th Pacific-Rim Conference on Multimedia (PCM 2018) was held in Hefei, China, during September 21–22, 2018, and hosted by Hefei University of Technology (HFUT). PCM is a major annual international conference for multimedia researchers and practitioners across academia and industry to demonstrate their scientific achievements and industrial innovations in the field of multimedia.
It is a great honor for HFUT to host PCM 2018, one of the longest-standing multimedia conferences, in Hefei, China. Hefei University of Technology, located in the capital of Anhui Province, is one of the key universities administered by the Ministry of Education of China. Its multimedia-related research has recently attracted increasing attention from the local and international multimedia communities. Hefei, the capital city of Anhui Province, lies in the center of Anhui between the Yangtze and Huaihe rivers. Well known both as a historic site from the Three Kingdoms period and as the hometown of Lord Bao, Hefei is a city with a history of more than 2,000 years. In modern times, as an important base for science and education in China, Hefei is the first and sole Science and Technology Innovation Pilot City in China and a member city of the WTA (World Technopolis Association). We hope that PCM 2018 is a memorable experience for all participants.
PCM 2018 featured a comprehensive program. We received 422 submissions to the main conference from authors in more than ten countries. These submissions included a large number of high-quality papers in multimedia content analysis, multimedia signal processing and communications, and multimedia applications and services. We thank our Technical Program Committee of 178 members, who spent much time reviewing papers and providing valuable feedback to the authors. From the 422 submissions, the program chairs decided to accept 209 regular papers (49.5%) based on at least three reviews per submission. In addition, 30 papers were submitted to the four special sessions, of which 20 were accepted. The volumes of the conference proceedings contain all the regular and special session papers.
We are also heavily indebted to many individuals for their significant contributions. We wish to acknowledge and express our deepest appreciation to the general chairs, Meng Wang and Chong-Wah Ngo; the program chairs, Richang Hong, Wen-Huang Cheng, and Toshihiko Yamasaki; the organizing chairs, Xueliang Liu, Yun Tie, and Hanwang Zhang; the publicity chairs, Jingdong Wang, Min Xu, Wei-Ta Chu, and Yi Yu; and the special session chairs, Zhengjun Zha and Liqiang Nie. Without their efforts and enthusiasm, PCM 2018 would not have become a reality. Moreover, we want to thank our sponsors: Springer, the Anhui Association for Artificial Intelligence, the Shandong Artificial Intelligence Institute, Kuaishou Co. Ltd., and Zhongke Leinao Co. Ltd. Finally, we wish to thank all committee members, reviewers, session chairs, student volunteers, and supporters. Their contributions are much appreciated.
General Chairs
Meng Wang Hefei University of Technology, China
Chong-Wah Ngo City University of Hong Kong, Hong Kong, China
Organizing Chairs
Xueliang Liu Hefei University of Technology, China
Yun Tie Zhengzhou University, China
Hanwang Zhang Nanyang Technological University, Singapore
Publicity Chairs
Jingdong Wang Microsoft Research Asia, China
Min Xu University of Technology Sydney, Australia
Wei-Ta Chu National Chung Cheng University, Taiwan, China
Yi Yu National Institute of Informatics, Japan
Poster Papers
Single Image Super Resolution Using Local and Non-local Priors . . . . . . . . . 264
Tianyi Li, Kan Chang, Caiwang Mo, Xueyu Zhang, and Tuanfa Qin
An Asian Face Dataset and How Race Influences Face Recognition . . . . . . . 372
Zhangyang Xiong, Zhongyuan Wang, Changqing Du, Rong Zhu,
Jing Xiao, and Tao Lu
Deep Forest with Local Experts Based on ELM for Pedestrian Detection . . . . 803
Wenbo Zheng, Sisi Cao, Xin Jin, Shaocong Mo, Han Gao, Yili Qu,
Chengfeng Zheng, Sijie Long, Jia Shuai, Zefeng Xie, Wei Jiang,
Hang Du, and Yongsheng Zhu
Abstract. Images shared on the Internet are often compressed to a small size and thus exhibit JPEG artifacts. This issue is challenging for the task of low-light image enhancement, as the artifacts hidden in dark image regions can be further boosted by traditional enhancement models. We use a divide-and-conquer strategy to tackle this problem. Specifically, we decompose an input image into an illumination layer and a reflectance layer, which decouples the issues of low lightness and JPEG artifacts, so that we can deal with them separately using off-the-shelf enhancing and deblocking techniques. Qualitative and quantitative comparisons validate the effectiveness of our method.
1 Introduction
With the advance of photographic devices, people prefer images with clear content details and few artifacts. Nevertheless, this is not always the case in various real-world situations. For example, people nowadays enjoy taking photographs and sharing them on the Internet. In this process, the quality of an image can be affected by many factors. At the shooting stage, poor lighting conditions (e.g., nighttime) and amateur shooting skills (e.g., backlighting) often result in a dark visual appearance (e.g., Fig. 1(a)). At the distribution stage, images are often unintentionally compressed or resized to a smaller size by social network software such as WeChat or QQ. Many high-frequency image details are filtered out in this process, and JPEG artifacts (also called block artifacts) are therefore introduced. In this context, for these compressed low-light images, an image enhancement model is expected to be capable of both lightness enhancement and block removal [1].
However, conventional image enhancement methods [2–5] are limited for the following reasons. First, these methods are not equipped with artifact-removal ability; when off-the-shelf low-light enhancement methods are applied directly to compressed images, the JPEG artifacts can be unnecessarily amplified (e.g., Fig. 1(b)). Furthermore, JPEG artifacts usually hide in image regions of low contrast, which makes the enhancement task more challenging. For example, compared with the original clean image, the hidden JPEG artifacts are not visually obvious in the compressed images (e.g., Fig. 1(a)).
Fig. 1. (a) A visual comparison between low-light images without (first row) and with (second row) JPEG artifacts (Q = 60). We use these six compressed low-light images as our experimental data.
2 Related Works
In recent years, many low-light image enhancement methods have been proposed. They can be divided into a single-source group and a multi-source group: the former refers to methods that take a single input image for enhancement, while the latter uses multiple images as inputs.

Single-source low-light image enhancement methods can be further divided into histogram-based and Retinex-based ones. As the histogram of a low-light image is heavily skewed toward low intensities, histogram-based methods aim at reshaping the histogram distribution [2, 6]. Since an image histogram is a global descriptor that discards all spatial information, these methods are prone to producing over-enhanced [2] or under-enhanced [6] results in local regions. Retinex-based models assume that an image is composed of an illumination layer, representing the light intensity on an object, and a reflectance layer, representing the physical characteristics of the object's surface. Based on this image representation, low-light enhancement can be achieved by adjusting the illumination layer [3, 7, 8]. Differently, Guo et al. [5] propose a simplified Retinex model for low-light image enhancement: instead of performing an intrinsic decomposition, they directly estimate a piecewise-smooth map as the illumination layer. In general, the visual quality of these single-source methods' results heavily depends on a properly chosen enhancement strength.
Low-light image enhancement methods based on multiple sources can relieve the burden of choosing an enhancement parameter, as they adopt the technical roadmap of multi-source fusion. In principle, a larger dynamic range of an imaging scene can be captured with multiple images of different exposures. These images are then fused based on a multi-scale image pyramid [9] or a patch-based image composition [10]. In many cases, however, multiple sources are not available and only one low-quality input image is at hand. To address this challenge, a feasible way is to artificially produce multiple initial enhancements and then fuse them. Ying et al. [11] propose a method that simulates the camera response model and generates multiple intermediate images of different exposure levels. Different from [11], multiple initial results produced by different enhancing models are taken as the fusion sources in [12].

Although all the above methods address the problem of dark image appearance well, they are not equipped with the function of artifact removal. To the best of our knowledge, there are few works that concentrate on the enhancement of compressed low-light images. Li et al. [1] propose to decompose an image into a structure layer and a texture layer; the former is used for contrast enhancement, while the latter is used for JPEG block removal. Our method resembles the backbone of [1] but differs in its technical details. First, we use a totally different image representation model (illumination–reflectance vs. structure–texture). Second, our enhancement model is also different from the one adopted in [1] (fusion-based vs. histogram-based).
3 Proposed Method
3.1 Overall Framework
Our technical roadmap is to solve the low-lightness issue and the JPEG artifact issue separately. The proposed framework is shown in Fig. 2. We first convert the RGB input image S into the HSV space. The V channel of S is then decomposed into two components (Sect. 3.2), i.e., the illumination layer I and the reflectance layer R. We then perform low-light enhancement on the illumination layer (Sect. 3.3) and JPEG artifact removal on the reflectance layer (Sect. 3.4). Finally, the output V channel is obtained by recombining the refined layers: $V_{output} = I' \cdot R'$. By substituting the refined $V_{output}$ for the original V and converting the HSV representation back to RGB, we obtain the final output image $S_{output}$. In other words, the color information of S is kept unchanged during the whole process.
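As a concrete illustration of this pipeline, the following sketch wires the stages together. It is written in Python with OpenCV (the paper's own implementation is in MATLAB), and `decompose`, `enhance_illumination`, and `deblock_reflectance` are hypothetical placeholders standing in for Sects. 3.2–3.4; this is a sketch of the framework, not the authors' code.

```python
# Minimal sketch of the overall framework in Sect. 3.1: the V channel is
# decomposed into illumination and reflectance, each layer is refined
# separately, and the H/S channels are passed through unchanged.
import cv2
import numpy as np

def enhance_compressed_lowlight(bgr, decompose, enhance_illumination, deblock_reflectance):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v = v.astype(np.float32) / 255.0                 # V channel in [0, 1]
    illum, refl = decompose(v)                       # V = I * R          (Sect. 3.2)
    illum_ref = enhance_illumination(illum)          # low-light boost    (Sect. 3.3)
    refl_ref = deblock_reflectance(refl)             # JPEG deblocking    (Sect. 3.4)
    v_out = np.clip(illum_ref * refl_ref, 0.0, 1.0)  # V' = I' * R'
    hsv_out = cv2.merge([h, s, (v_out * 255.0).astype(np.uint8)])
    return cv2.cvtColor(hsv_out, cv2.COLOR_HSV2BGR)  # color of S unchanged
```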
Here the first term represents the data fidelity, and the remaining terms encode three priors. The first prior is constructed as:
$$E_s(I) = u_x\,\|\nabla_x I\|_2^2 + u_y\,\|\nabla_y I\|_2^2 \quad (2)$$

$$u_x = \Big(\big|\tfrac{1}{|\Omega|}\textstyle\sum_{\Omega}\nabla_x I\big|\,|\nabla_x I| + \epsilon\Big)^{-1}, \qquad u_y = \Big(\big|\tfrac{1}{|\Omega|}\textstyle\sum_{\Omega}\nabla_y I\big|\,|\nabla_y I| + \epsilon\Big)^{-1} \quad (3)$$
The minimization of $E_s$ leads to an extracted I that is consistent with the image structures of V. The second prior is constructed as:
$$E_t(R) = v_x\,\|\nabla_x R\|_2^2 + v_y\,\|\nabla_y R\|_2^2 \quad (4)$$

$$v_x = \big(|\nabla_x R| + \epsilon\big)^{-1}, \qquad v_y = \big(|\nabla_y R| + \epsilon\big)^{-1} \quad (5)$$
The minimization of $E_t$ preserves the fine details of V in the extracted R. The third prior builds on the bright-channel map B:

$$B(p) = \max_{c \in \{R,G,B\}} S_c(p) \quad (7)$$
Each element of B represents the maximum possible brightness of a pixel. Through the optimization, the obtained I is forced to be consistent with this brightness distribution.
An alternating optimization strategy is then applied to solve for the optimal I and R.
In summary, the first and third priors enforce the illumination layer to be structure-aware and illumination-aware, while the second prior $E_t$ focuses on preserving as many texture details and block artifacts as possible in the reflectance layer. The examples in Fig. 3 validate the effectiveness of the chosen decomposition model. Specifically, we observe that the decomposed R contains almost all the block artifacts.
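To make the role of these priors more tangible, here is a small sketch (reusing `np` and `cv2` from the earlier snippet) of the per-pixel weights and the bright-channel map, following the reconstructions of Eqs. 3, 5, and 7 above; the local window size for $\Omega$ is an assumption, and the alternating minimization that solves the resulting linear systems is omitted.

```python
def gradient(img, axis):
    """Forward difference along axis 0 (y) or 1 (x)."""
    return np.roll(img, -1, axis=axis) - img

def shape_weights(illum, eps=0.01, win=15):
    """u_x, u_y of Eq. 3: small where both the locally averaged gradient and the
    pointwise gradient are strong, so structural edges are kept in I."""
    gx, gy = gradient(illum, 1), gradient(illum, 0)
    mean_gx = cv2.blur(gx, (win, win))        # local average over a window Omega
    mean_gy = cv2.blur(gy, (win, win))
    u_x = 1.0 / (np.abs(mean_gx) * np.abs(gx) + eps)
    u_y = 1.0 / (np.abs(mean_gy) * np.abs(gy) + eps)
    return u_x, u_y

def texture_weights(refl, eps=0.01):
    """v_x, v_y of Eq. 5: keep fine details (and block edges) in R."""
    v_x = 1.0 / (np.abs(gradient(refl, 1)) + eps)
    v_y = 1.0 / (np.abs(gradient(refl, 0)) + eps)
    return v_x, v_y

def bright_channel(bgr):
    """B of Eq. 7: per-pixel maximum over the color channels, scaled to [0, 1]."""
    return bgr.astype(np.float32).max(axis=2) / 255.0
```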
We can simply apply global contrast enhancement models to the illumination layer. To produce the first fusion source $I_1$, we first use a nonlinear intensity mapping:

$$I_1(p) = \frac{2}{\pi}\arctan\big(\lambda I(p)\big) \quad (8)$$

$$\lambda = \frac{1 - \mathrm{mean}(I)}{\mathrm{mean}(I)} + 10 \quad (9)$$
The enhancement result of the well-known CLAHE method [2] is then taken as the second fusion source $I_2$. To guard against over- or under-enhancement, we take the original I as the third fusion source $I_3$, which plays the role of a regularizer in the fusion process.
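The three fusion sources can be generated along the lines of the sketch below, assuming an illumination layer `illum` in [0, 1]. OpenCV's CLAHE is used as the second source; the 8-bit conversion, clip limit, and tile size are implementation choices not specified in the text.

```python
def fusion_sources(illum):
    """Build the three fusion sources I1, I2, I3 described in Sect. 3.3."""
    lam = (1.0 - illum.mean()) / illum.mean() + 10.0          # Eq. 9
    i1 = (2.0 / np.pi) * np.arctan(lam * illum)               # Eq. 8
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    i2 = clahe.apply((illum * 255.0).astype(np.uint8)).astype(np.float32) / 255.0
    i3 = illum                                                # original I as regularizer
    return [i1, i2, i3]
```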
We construct pixel-level weight matrices. We first consider the brightness of $I_k$ ($k = 1, 2, 3$):

$$W_k^B(p) = \exp\!\left(-\frac{(I_k(p) - b)^2}{2\sigma^2}\right) \quad (10)$$
where b and $\sigma$ represent the mean and standard deviation of the brightness of a natural image in a broadly statistical sense; they are empirically set to 0.5 and 0.25, respectively. When a pixel intensity is far from b, the pixel is likely over- or under-exposed, and its weight should be small. Second, we consider the chromatic contrast by incorporating the H and S channels of S, where a and u are parameters used to preserve color consistency, empirically set to 2 and 1.39$\pi$, respectively. This weight emphasizes image regions of high contrast and good colors.
By combining these two weights and normalizing them, we obtain the weight of each fusion source:

$$\bar{W}_k(p) = \frac{W_k(p)}{\sum_k W_k(p)} \quad (12)$$
Finally, the enhanced illumination layer $I'$ is obtained by collapsing the fused pyramid.
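A compact sketch of this stage is given below: the brightness weight follows Eq. 10, the weights are normalized as in Eq. 12 (the chromatic-contrast weight of Eq. 11 is omitted here), and the sources are blended with a standard Laplacian-pyramid exposure-fusion scheme, which is one common way to realize the "collapsing the fused pyramid" step; the number of pyramid levels is an assumption.

```python
def brightness_weight(source, b=0.5, sigma=0.25):
    """Eq. 10: favor well-exposed pixels whose intensity is close to b."""
    return np.exp(-((source - b) ** 2) / (2.0 * sigma ** 2))

def fuse_sources(sources, levels=4):
    """Weighted Laplacian-pyramid fusion of the sources (a generic sketch)."""
    weights = [brightness_weight(s) for s in sources]
    norm = np.sum(weights, axis=0) + 1e-12
    weights = [w / norm for w in weights]                      # Eq. 12

    def gauss_pyr(img):
        pyr = [img]
        for _ in range(levels - 1):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr

    def lap_pyr(img):
        gp = gauss_pyr(img)
        lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
              for i in range(levels - 1)]
        lp.append(gp[-1])
        return lp

    fused = None
    for src, w in zip(sources, weights):
        lp, gw = lap_pyr(src.astype(np.float32)), gauss_pyr(w.astype(np.float32))
        contrib = [l * g for l, g in zip(lp, gw)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]

    out = fused[-1]
    for lvl in reversed(fused[:-1]):                           # collapse the pyramid
        out = cv2.pyrUp(out, dstsize=(lvl.shape[1], lvl.shape[0])) + lvl
    return np.clip(out, 0.0, 1.0)
```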
In this model, the first (fidelity) term restricts the refined $R'$ from deviating too far from the original R. The second term is specifically designed to eliminate the block edges introduced by JPEG compression. Considering the specific pattern of JPEG blocks, we choose a specific neighborhood system $N(p)$ for each location p: $p' \in N(p)$ refers to pixel positions along the border edges of each $8\times 8$ image patch. With this setting, the optimization process of Eq. 15 concentrates on the JPEG block boundaries and tries to preserve the original image structures. The parameter $\mu$ acts as a balancing weight between the two terms and is empirically set to 0.5.
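Since the deblocking model (Eq. 15) is not reproduced above, the following is only a rough illustration of the idea rather than an implementation of the authors' optimization: pixels on the 8×8 JPEG grid borders of R are blended toward a locally smoothed value with weight μ = 0.5, while all other pixels are left untouched.

```python
def deblock_reflectance(refl, mu=0.5, block=8):
    """Suppress block edges of R only along the 8x8 JPEG grid borders (sketch)."""
    h, w = refl.shape
    border = np.zeros((h, w), dtype=bool)
    border[::block, :] = True                 # rows on either side of each
    border[block - 1::block, :] = True        # horizontal block boundary
    border[:, ::block] = True                 # columns on either side of each
    border[:, block - 1::block] = True        # vertical block boundary
    smoothed = cv2.blur(refl, (3, 3))         # local average as smoothness target
    out = refl.copy()
    # fidelity-vs-smoothness trade-off, applied only at border pixels
    out[border] = (refl[border] + mu * smoothed[border]) / (1.0 + mu)
    return out
```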
4 Experiments
In this section, we validate our method with qualitative and quantitative comparisons.
The experimental images are shown in Fig. 1(a), with the compression quality set to Q = 60. The MATLAB code was run on a PC with 8 GB of RAM and a 2.6 GHz CPU.
As our task is to enhance low lightness and suppress JPEG artifacts, we first use two baseline models for comparison. Baseline 1 removes the artifacts first and then enhances the low lightness; Baseline 2 enhances the low lightness first and then removes the artifacts. For a fair comparison, both baselines use the methods described in Sects. 3.3 and 3.4 for lightness enhancement and artifact suppression, respectively. We also compare our method with the most closely related approach [1], which we refer to as ECCV14; its parameters are empirically set to the default values given in
[1]. Since our framework is open to the choice of image enhancement model, we evaluate two variants. The first uses the fusion-based enhancement described in Sect. 3.3 and is termed Ours-MF; the second uses simple gamma correction and is termed Ours-GC, with the enhancement parameter $\gamma$ set to $1/2.2$ following [8]. For the image decomposition stage of both Ours-GC and Ours-MF, we follow [8] and set $g_1 = 0.001$, $g_2 = 0.0001$, $g_3 = 0.25$, and $\epsilon = 0.01$.
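For completeness, the Ours-GC variant simply swaps the enhancement step of Sect. 3.3 for a global gamma correction of the illumination layer; a one-line sketch with the γ = 1/2.2 value quoted above is:

```python
def gamma_correct_illumination(illum, gamma=1.0 / 2.2):
    """Ours-GC: global gamma correction of the illumination layer I in [0, 1]."""
    return np.clip(illum, 0.0, 1.0) ** gamma
```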
We first present the visual comparison in Fig. 4. From the two examples, we make the following observations. First, image details are lost in the results of Baseline 1 and ECCV14, because the details hidden in the darkness are vulnerable to the JPEG artifact removal process and many of them are unnecessarily removed. In contrast, the artifact removal in our method is applied to the decomposed reflectance layer, which extracts the image content at various scales in advance. Second, Baseline 2 preserves the image details well but removes far fewer block artifacts than Ours-MF and Ours-GC. Because Baseline 2 first enhances the input image, the originally weak block edges hidden in the darkness are unnecessarily boosted, which makes the subsequent suppression step more difficult. Our methods do not have this problem thanks to the well-separated image layers. Third, comparing the two variants of our framework, Ours-MF achieves better results than Ours-GC in terms of lightness.
We also quantitatively evaluate all the above methods with two metrics proposed in
[13, 14]. The results are shown in Tables 1 and 2, in which the bold/underline numbers
indicate the best/second best results among all the methods, respectively. From
Table 1, we can see that Ours-MF and Ours-GC generally achieve better performance than the other three methods in terms of removing JPEG artifacts. From Table 2, Ours-MF generally has the best performance in terms of image contrast. In contrast, the performance of Ours-GC is less competitive than that of ECCV14, because the gamma correction model only imposes a global non-linear transform on the lightness layer. Based on the qualitative and quantitative comparisons, Ours-MF achieves the best performance, which validates the effectiveness of decoupling the low-lightness issue and the JPEG artifact issue from the start. In short, all the above results validate the effectiveness of each part of the proposed framework.
Table 1. Quantitative comparison based on the metric measuring the block effect [13]
Input Baseline1 Baseline2 ECCV14 Ours-MF Ours-GC
1 0.2109 0.0383 0.0296 0.0276 0.0119 0.0366
2 0.1975 0.0307 0.0216 0.0165 0.0037 0.0045
3 0.1900 0.0735 0.0472 0.0180 0.0064 0.0074
4 0.1941 0.0503 0.0407 0.0232 0.0150 0.0133
5 0.1894 0.0284 0.0171 0.0110 0.0021 0.0034
6 0.1268 0.0445 0.0195 0.0171 0.0043 0.0038
Average 0.1848 0.0443 0.0293 0.0189 0.0072 0.0115
The bold font means the best result
Table 2. Quantitative comparison based on the metric measuring the image contrast [14]
Input Baseline1 Baseline2 ECCV14 Ours-MF Ours-GC
1 7.6885 7.1351 7.0683 6.0436 5.8170 7.6696
2 6.3567 4.7280 4.0690 4.6849 3.3641 4.3921
3 7.1700 5.3822 3.8854 3.5243 3.1429 3.7367
4 6.3027 5.0812 4.8454 4.3957 4.2165 4.9111
5 6.4311 2.9499 2.9322 2.5457 3.1349 3.1649
6 7.2752 3.8666 3.9664 3.5832 4.1252 5.3496
Average 6.8707 4.8571 4.4611 4.1296 3.9668 4.8707
The underline font means the second best result
5 Conclusions
Acknowledgements. This research was supported by the National Natural Science Foundation of China (Grant Nos. 61632007, 61772171, and 61702156) and the Anhui Provincial Natural Science Foundation (Grant No. 1808085QF188).
References
1. Li, Y., Guo, F., Tan, R.T., Brown, M.S.: A contrast enhancement framework with JPEG
artifacts suppression. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014.
LNCS, vol. 8690, pp. 174–188. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
10605-2_12
2. Reza, A.M.: Realization of the contrast limited adaptive histogram equalization (CLAHE)
for real-time image enhancement. J. VLSI Signal Process. Syst. 38(1), 35–44 (2004)
3. Yue, H., Yang, J., Sun, X., Wu, F., Hou, C.: Contrast enhancement based on intrinsic image
decomposition. IEEE TIP 26(8), 3981–3994 (2017)
4. Li, M., Liu, J., Yang, W., Sun, X., Guo, Z.: Structure-revealing low-light image
enhancement via robust Retinex model. IEEE TIP 27(6), 2828–2841 (2018)
5. Guo, X., Li, Y., Ling, H.: LIME: low-light image enhancement via illumination map
estimation. IEEE TIP 26(2), 982–993 (2017)
6. Lee, C., Lee, C., Kim, C.: Contrast enhancement based on layered difference representation
of 2D histograms. IEEE TIP 22(12), 5372–5384 (2013)
7. Fu, X., Zeng, D., Huang, Y., Zhang, X., Ding, X.: A weighted variational model for
simultaneous reflectance and illumination estimation. In: Proceedings of CVPR (2016)
8. Cai, B., Xu, X., Guo, K., Jia, K., Hu, B., Tao, D.: A joint intrinsic-extrinsic prior model for
retinex. In: Proceedings of ICCV (2017)
9. Kou, F., Wei, Z., Chen, W., Wu, X., Wen, C., Li, Z.: Intelligent detail enhancement for
exposure fusion. IEEE TMM 20(2), 484–495 (2018)
10. Ma, K., Li, H., Yong, H., Wang, Z., Meng, D., Zhang, L.: Robust multi-exposure image
fusion: a structural patch decomposition approach. IEEE TIP 26(5), 2519–2532 (2017)
11. Ying, Z., Li, G., Gao, W.: A bio-inspired multi-exposure fusion framework for low-light
image enhancement. ArXiv, abs/1711.00591 (2017)
12. Fu, X., Zeng, D., Huang, Y., Liao, Y., Ding, X., Paisley, J.: A fusion-based enhancing
method for weakly illuminated images. Sig. Process. 129, 82–96 (2016)
13. Golestaneh, S.A., Chandler, D.M.: No-reference quality assessment of JPEG images via a
quality relevance map. IEEE SPL 21(2), 155–158 (2014)
14. Gu, K., Wang, S., Zhai, G.: Blind quality assessment of tone-mapped images via analysis of
information, naturalness, and structure. IEEE TMM 18(3), 432–443 (2016)
15. Jian, M., Qi, Q., Dong, J., Sun, X., Sun, Y., Lam, K.: Saliency detection using quaternionic
distance based weber local descriptor & level priors. MTAP 77(11), 14343–14360 (2018)
16. Jian, M., Lam, K., Dong, Y., Shen, L.: Visual-patch-attention-aware saliency detection.
IEEE Trans. Cybernetics 45(8), 1575–1586 (2015)
Depth Estimation from Monocular
Images Using Dilated Convolution and
Uncertainty Learning
1 Introduction
Depth estimation has been investigated for a long time because of its vital role in computer vision. Studies have shown that accurate depth information is useful for various challenging tasks, such as image segmentation [1], 3D reconstruction [2], human pose estimation [3], and contour detection [5]. Humans can effectively estimate monocular depth by using their past visual experience to structurally understand the world, and they can even transfer such knowledge to unfamiliar environments. However, monocular depth prediction remains a difficult problem for computer vision systems due to the lack of reliable depth cues.
© Springer Nature Switzerland AG 2018
R. Hong et al. (Eds.): PCM 2018, LNCS 11165, pp. 13–23, 2018. https://doi.org/10.1007/978-3-030-00767-6_2
Many studies have investigated the use of image correspondences contained in stereo image pairs [6]. In the stereo case, depth estimation can be addressed once the correspondence between points in the left and right images is established. Many studies have also explored motion-based methods [7], which first estimate the camera pose from the motion in video sequences and then recover depth via triangulation. Obtaining a sufficient number of point correspondences plays a key role in the aforementioned methods. These correspondences are often found using local feature selection and matching techniques; however, feature-based methods usually fail in the absence of texture or in the presence of occlusion. Owing to recent advancements in depth sensors, directly measuring depth has become affordable and achievable, but these sensors have their own limitations in practice. For instance, the Microsoft Kinect is widely used indoors for acquiring RGB-D images but is limited by its short measurement distance and large power consumption. Outdoors, LiDAR and relatively cheaper millimeter-wave radars are mainly used to capture depth data; however, the collected data are often sparse and noisy. Accordingly, there has always been strong interest in accurate depth estimation from a single image. Recently, CNNs [8], with their powerful feature representation capabilities, have been widely used for single-view depth estimation by learning the implicit relationship between an RGB image and its depth map. Despite the increased complexity of the task, deep-learning-based approaches [13–17] have shown significant improvements over traditional techniques [10–12] on public datasets.
In this work, we exploit CNNs to learn strong features for recovering the depth map from a single still image. Given that downsampling, upsampling, or deconvolution in a fully convolutional network may lose many cues at image boundaries in pixel-level regression tasks, we introduce dilated convolution [9] to learn multi-scale information from a single-scale image input. Long skip connections are also applied to combine abstract features with image features. To achieve more robust prediction, we further model the uncertainty and propose a novel loss function to measure such uncertainty during training without extra labels. The experimental results demonstrate that our proposed method outperforms state-of-the-art approaches on standard benchmark datasets [2, 21].
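As an illustration of the two ingredients named above, the sketch below (PyTorch; hypothetical layer sizes, not the paper's architecture) shows a dilated convolution block that enlarges the receptive field without downsampling, and an uncertainty-weighted regression loss in the common heteroscedastic-uncertainty style, where the network predicts a log-variance map alongside the depth so that no extra labels are needed; the paper's exact loss may differ.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """3x3 convolutions with increasing dilation: multi-scale context without
    losing resolution (padding equals dilation, so spatial size is preserved)."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ) for d in dilations
        ])

    def forward(self, x):
        return self.layers(x) + x        # skip connection around the block

def uncertainty_l1_loss(pred_depth, pred_log_var, target):
    """L1 regression loss attenuated by a predicted per-pixel log-variance s:
    loss = |d - d_hat| * exp(-s) + s, learned without uncertainty labels."""
    return (torch.abs(pred_depth - target) * torch.exp(-pred_log_var)
            + pred_log_var).mean()
```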
2 Related Work
Previous studies have often used probabilistic graphic model and have usually
relied on hand-crafted features. Saxena et al. [10] proposed a superpixel-based
method for inferring depth from a single still image and applied a multi-scale
MRF to incorporate local and global features. Ladicky et al. [11] introduced
semantic labels on the depth to learn a highly discriminative classifier. Karsch
et al. [12] proposed a non-parametric approach for automatically generating
depth. In this approach, the GIST features were initially extracted for the input
image and other images in the database, then several candidate depths that
3 Approach