CVPR 2022 Cover Page (Outside Front Cover) Main Conference

CVPR 2022 Conference Center Map (Inside Front Cover) Main Conference
Message from the General and Program Chairs
Welcome to the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition in New Orleans, LA. CVPR continues to be IEEE/CVF’s and PAMI-TC’s premier and flagship annual meeting on computer vision, giving researchers in our community the opportunity to present their exciting advances in computer vision, pattern recognition, machine learning, robotics, and artificial intelligence, in theory and/or practice. With invited keynote talks, oral and poster presentations, tutorials, workshops, demos and exhibitions, as well as an amiable social setting, we have an exciting program planned for this week. Moreover, this year marks the first hybrid CVPR, one that you may once again join physically for the first time since the COVID-19 pandemic began.

If you are able to join the physical conference, we very much hope you will enjoy your visit to New Orleans after such a long period of social distancing. If you are not able to join the physical conference, we invite you to attend the virtual part of CVPR 2022, which will take place online the week following the physical conference. The virtual conference will run in a fashion similar to the past three virtual CVPRs and provide full coverage of the technical papers. We hope you will enjoy it as well if you cannot travel to New Orleans. Our hearts go out to everyone who has been affected by this pandemic, directly or indirectly.

Following an accelerating multi-year trend, CVPR 2022 experienced another surge with a record number of 8161 submissions, amounting to a 15% increase from the 7093 submissions to CVPR 2021. After 3.5 months of diligent work from the 300+ area chairs and 6427 reviewers (including 1723 emergency reviewers), 2063 papers have been accepted through a rigorous review process, leading to an overall acceptance rate of 25.28%. Each paper received at least 3 reviews, followed by an author rebuttal phase and a discussion phase among ACs, working in triplet groups, and assigned reviewers. The final paper decisions were recommended by the AC triplets and approved by the program chairs. Following the tradition of CVPR, the PCs did not pre-set any acceptance cap. The resulting acceptance rate reflects the community consensus, and is very well aligned with past CVPRs.

Out of the 2063 accepted papers, 342 papers have been selected for oral presentation and the remaining 1721 for poster presentations. 34 papers have been shortlisted as best paper award candidates. The final best papers and honorable mentions will be selected from these 34 papers by an independent award committee appointed by the program chairs, which is composed of senior researchers from our community. The award committee is led by an award committee chair appointed by the program chairs, who moderates the selection process.

Best papers will be presented in a single-track award session at the beginning of the main conference, each being allocated 8-10 minutes for presentation (depending on the final number of awards that will be issued) and 2 minutes for questions. All oral papers have been allocated 5 minutes for a short oral presentation. Each oral paper has also been assigned a poster slot, to be presented together with the poster papers in a 2.5-hour poster session. To accommodate the growing number of papers and attendees, we scheduled three parallel oral sessions, as in the last physical CVPRs, and a portion of the poster sessions will partly overlap with the oral sessions.

We would like to thank everyone involved in making CVPR 2022 a success. This includes the organizing committee, the area chairs, the reviewers, authors, demo session participants, donors, exhibitors, and everyone else without whom this meeting would not be possible. We also thank Nicole Finn and C to C Events for their organization of the logistics of the conference. Last but not least, we thank all of you for attending CVPR 2022 and making it one of the top venues for computer vision research in the world. We hope that you also have some time to explore gorgeous New Orleans during the conference. Enjoy CVPR 2022 and we look forward to meeting you in person!

Program Chairs: Kristin Dana, Gang Hua, Stefan Roth, Dimitris Samaras, and Richa Singh

General Chairs: Rama Chellappa, Jiri Matas, Long Quan, and Mubarak Shah

CVPR 2022 Organizing Committee


General Chairs: Rama Chellappa, Jiri Matas, Long Quan, Mubarak Shah
Program Chairs: Kristin Dana, Gang Hua, Stefan Roth, Dimitris Samaras, Richa Singh
Workshops Chairs: Mohit Gupta, Vishal Patel, Richard Souvenir
Tutorials Chairs: Boqing Gong, Julien Mairal
Demo Exhibit Chairs: Humphrey Shi, Maria Vakalopoulou
Presentation Chairs: Brendan Morris, Zhixin Shu
Finance Chairs: Octavia Camps, Brian Price
Doctoral Consortium Chairs: Minh Hoai Nguyen, Adriana Kovashka
Website Chairs: Anton Milan, AJ Piergiovanni, Shiliang Zhang
Corporate Relations Chairs: Mei Han, Shiguang Shan, Bjorn Stenger
Diversity, Equity, & Inclusion Chairs: Noah Snavely, Shuran Song
Social Activities Chairs: Giovanni M. Farinella, Rana Hanocka
Local Arrangements Chairs: Philippos Mordohai, Jinwei Ye
Technical Chairs: Ke Ma, Maneet Singh
Publicity Chair: Kosta Derpanis
Senior PAMI-TC Ombuds: David Forsyth, Linda Shapiro
CVPR 2022 Ombuds: Kate Saenko, Noah Snavely
Publications Specialist: Eric Mortensen
Event Producer: Nicole Bumpus Finn

CVPR 2022 Area Chairs
Sathyanarayanan Aakur Pascal Fua Zuzana Kukelova Robert Pless Qi Tian
Ehsan Adeli Yasutaka Furukawa Ajay Kumar Thomas Pock YingLi Tian
Lourdes Agapito Jürgen Gall Kiriakos Kutulakos Marc Pollefeys Radu Timofte
Naveed Akhtar Chuang Gan Suha Kwak Jean Ponce Tatiana Tommasi
Alexandre Alahi Efstratios Gavves Junseok Kwon Gerard Pons-Moll Antonio Torralba
Pablo Arbelaez Andreas Geiger Shang-Hong Lai Qiang Qiu Lorenzo Torresani
Iro Armeni Bernard Ghanem Christoph Lampert Petia Radeva Anh Tran
Yannis Avrithis Spyros Gidaris Ivan Laptev Noha Radwan Tali Treibitz
R. Venkatesh Babu Shiry Ginosar Diane Larlus A.N. Rajagopalan Zhuowen Tu
Kavita Bala Ross Girshick Svetlana Lazebnik Rene Ranftl Matthew Turk
Vineeth Balasubramanian Georgia Gkioxari Laura Leal-Taixé Rajeev Ranjan Tinne Tuytelaars
Adrian Barbu Boqing Gong Kyoung Mu Lee Nalini Ratha Georgios Tzimiropoulos
Joao Barreto Venu Madhav Govindu Stefan Lee Yogesh Rawat Maria Vakalopoulou
Jonathan Barron Xianfeng Gu Victor Lempitsky James Rehg Joost van de Weijer
Serge Belongie Fatma Guney Ales Leonardis Ian Reid Jan van Gemert
Alexander Berg Xiaojie Guo Vincent Lepetit Zhou Ren Gul Varol
Margrit Betke Bumsub Ham Fuxin Li Hamid Rezatofighi Nuno Vasconcelos
Horst Bischof Bohyung Han Haoxiang Li Susanna Ricco Mayank Vatsa
Soma Biswas Mei Han Hongdong Li Emanuele Rodola Ashok Veeraraghavan
Edmond Boyer Tian Han Jing Liao Carsten Rother Olga Veksler
Yuri Boykov Emily Hand Shengcai Liao Amit K. Roy-Chowdhury Carl Vondrick
Michael Brown Tatsuya Harada Dahua Lin Michael Ryoo Chaohui Wang
Jianfei Cai Tal Hassner Stephen Lin Ryusuke Sagawa He Wang
Octavia Camps Kaiming He Zhe Lin Mathieu Salzmann Jiang Wang
Barbara Caputo Otmar Hilliges Haibin Ling Aswin Sankaranarayanan Le Wang
Ayan Chakrabarti Derek Hoiem Cheng-Lin Liu Imari Sato Liang Wang
Ishani Chakraborty Mahdi Hosseini Feng Liu Yoichi Sato Ruiping Wang
Shayok Chakraborty Han Hu Miaomiao Liu Torsten Sattler Shenlong Wang
Tat-Jen Cham Di Huang Ming-Yu Liu Harpreet Sawhney Xiaolong Wang
Chao Chen Junzhou Huang Wei Liu Bernt Schiele Xiaoyu Wang
Chen Chen Qixing Huang Yanxi Liu Konrad Schindler Xinchao Wang
Dongdong Chen Sharon Xiaolei Huang Zicheng Liu Cordelia Schmid Xinggang Wang
Mei Chen David Jacobs Huchuan Lu Nicu Sebe Yu-Xiong Wang
Minsu Cho Nathan Jacobs Le Lu Thomas Serre Zhangyang Wang
David Crandall C.V. Jawahar Zhanyu Ma Fahad Shahbaz Khan Jiajun Wu
Daniel Cremers Qiang Ji Arun Mallya Qi Shan Tianfu Wu
Angela Dai Rongrong Ji Yasuyuki Matsushita Shiguang Shan Ying Wu
Dengxin Dai Xiangyang Ji Stefano Mattoccia Ying Shan Saining Xie
Dima Damen Jiaya Jia Tao Mei Wei Shen Jiaolong Yang
Martin Danelljan Hailin Jin Deyu Meng Xiaohui Shen Ming Yang
Kostas Daniilidis Justin Johnson Dimitris Metaxas Humphrey Shi Ruigang Yang
Trevor Darrell Maya Kabkab Ajmal Mian Jianbo Shi Angela Yao
Fernando de la Torre Angjoo Kanazawa Anurag Mittal Mike Zheng Shou Jinwei Ye
Minh Do Srikrishna Karanam Philippos Mordohai Leonid Sigal Li Yi
Weisheng Dong Dimosthenis Karatzas Francesc Moreno Yi-Zhe Song Xi Yin
Sergio Escalera Rei Kawakami Brendan Morris Richard Souvenir Kuk-Jin Yoon
Irfan Essa Hiroshi Kawasaki Yadong Mu Concetto Spampinato Shaodi You
Xin Fan Margret Keuper Vittorio Murino Bjorn Stenger Jingyi Yu
Yi Fang Salman Khan Naila Murray Hao Su Junsong Yuan
Giovanni Farinella Gunhee Kim Vinay Namboodiri Waqas Sultani Lu Yuan
Ryan Farrell Seon Joo Kim Ko Nishino Chen Sun Stefanos Zafeiriou
Alireza Fathi Tae-Kyun Kim Zhenxing Niu Deqing Sun Jianming Zhang
Paolo Favaro Benjamin Kimia Bjorn Ommer Jian Sun Lei Zhang
Michael Felsberg Kris Kitani Vicente Ordonez Qianru Sun Ning Zhang
Jiashi Feng Hedvig Kjellström Matthew O'Toole Sabine Süsstrunk Shiliang Zhang
Sanja Fidler Iasonas Kokkinos Jinshan Pan Supasorn Suwajanakorn Qi Zhao
David Fleet Vladlen Koltun Maja Pantic Ping Tan Jingjing Zheng
David Forsyth Nikos Komodakis Vishal Patel Robby Tan Bolei Zhou
David Fouhey Jana Kosecka Ioannis Patras Siyu Tang Andrew Zisserman
Victor Fragoso Matej Kristan Georgios Pavlakos Wei Tang
Mario Fritz Hilde Kühne Vladimir Pavlovic Camillo Jose Taylor

CVPR 2022 Outstanding Reviewers
We are pleased to recognize the following researchers as “CVPR 2022 Outstanding Reviewers”. These reviewers contributed reviews noted as
excellent by area chairs and will be sent a $100 gift certificate in recognition of their outstanding community service.

Vida Adeli Abir Das Zhenyu Jiang Juhong Min David Stutz
Vítor Albiero Rajshekhar Das Tejan Karmali Gaurav Mittal Matthew Tancik
Rareș Ambruș Neel Dey Shyamgopal Karthik Martin R. Oswald Garvita Tiwari
Liang An Helisa Dhamo Marc A. Kastner Despoina Paschalidou Nergis Tomen
Bjoern Andres Jose Dolz Corentin Kervadec Sujoy Paul Shubham Tulsiani
Nikita Araslanov Simon Donné Pirazh Khorramshahi Adithya Pediredla Mathias Unberath
Ali Athar Daniel Duckworth Dohyung Kim Songyou Peng Sai Vemprala
Haoran Bai Victor Escorcia Sungyeon Kim Juan Perez Dor Verbin
Francisco Barranco Carlos Esteves Alexander Kirillov Ilya Petrov Christoph Vogel
Hector Basevi Michael Firman Erich Kobler Suzanne Petryk Chien-Yi Wang
Stefan Becker Anna Fruehstueck A. Sophia Koepke Silvia Pintea Xiaosen Wang
Cigdem Beyan Yonggan Fu Nikos Kolotouros Benjamin Planche Jia Wei
Goutam Bhat Guillermo Gallego Lingshun Kong Michaël Ramamonjisoa Davis Wertheimer
Bharat Bhatnagar Difei Gao Simon Kornblith Nikhila Ravi Kelvin Wong
Andreas Blattmann Isha Garg Jie Lei Ambareesh Revanur Scott Workman
Amine Bourki Ioannis Gkioulekas Hengduo Li Elisa Ricci Bartlomiej Wronski
Guillem Brasó Shubham Goel Senwei Liang Anna Rohrbach Tz-Ying Wu
Francois Bremond Benoit Guillard Yancong Lin Andres Romero Xin Xie
Andrew Brown Kamal Gupta Yonghuai Liu Jérôme Rony Haofei Xu
Angela Castillo Maciej Halber Ziyi Liu Karsten Roth Ke Yan
Menglei Chai Alexandros Haliassos Sylvain Lobry Patrick Ruhkamp Sangdoo Yun
David Chan Adam Harrison Tobias Lorenz István Sárándi Yi Zeng
Stanley Chan Chen He Andres Mafla Patrick Schramowski Chuhan Zhang
Prithvijit Chattopadhyay Tong He Upal Mahbub Katja Schwarz Ning Zhang
Richard Chen Jennifer Hobbs Massimiliano Mancini Matan Sela Xiaoqi Zhao
Julian Chibane Lukas Hoyer Wei Mao Evan Shelhamer Haitian Zheng
Sanghyuk Chun Jiabo Huang Riccardo Marin Sheng Shen Huan Zheng
Jihoon Chung Jiahui Huang Renaud Marlet Wu Shi Yongpei Zhu
Javier Civera Junhwa Hur Richard Marriott Nina Shvetsova Zhiming Zou
Rodolfo Corona Sukjun Hwang Carlo Masone Oriane Siméoni
Pasquale Coscia Gabriel Ilharco Simone Melzi Gaurang Sriramanan
Gabriela Csurka Samyak Jain Moustafa Meshry Elisavet Stathopoulou

Tuesday, June 21 (Morning) Program
Tuesday, June 21

0700–1700 Registration (Great Hall Lobby)

0700–0830 Breakfast (Halls D-E)

0800–0830 Poster Setup (Halls B2-C)

0830–0900 Opening Remarks & Paper Awards (Hall B1)

0900–1030 Oral 1.1: Award Papers (Hall B1)
Chairs: CVPR 2022 Program Chairs
Extended presentation of the CVPR 2022 Best Paper, Best Student Paper, and honorable mention paper(s).

1030–1100 Morning Break (Halls B2-C)

1000–1700 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1000–1330 Demos (Halls B2-C Demo Area)
• Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device, Hyojin Park, Alan Yessenbayev, Tushar Singhal, Navin Kumar Adhikari, Yizhe Zhang, Shubhankar Borse, Hong Cai, Nilesh Pandey, Fei Yin, Frank Mayer, Balaji Calidas, Fatih Porikli (Qualcomm AI Research)
• A Low-Cost & Real-Time Motion Capture System, Anargyros Chatzitofis, Georgios Albanis, Nikolaos Zioulis, Spyridon Thermos (Codewheel; Univ. of Thessaly)
• GeoEngine: A Platform for Production-Ready Geospatial Research, Sagar Verma, Siddharth Gupta, Hal Shin, Akash Panigrahi, Shubham Goswami, Shweta Pardeshi, Natanael Exe, Ujwal Dutta, Tanka Joshi, Nitin Bhojwani (Université Paris-Saclay, CentraleSupélec, Inria, Centre de Vision Numérique, Granular AI)
• DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides, Parmida Ghahremani, Joseph Marino, Ricardo Dodds, Saad Nadeem (Memorial Sloan Kettering Cancer Center)
• Talking Face Generation With Multilingual TTS, Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Dongho Choi, Kang-wook Kim, Youseong Lee (MINDsLab Inc.; KAIST; Seoul National Univ.)
• [Virtual] Scenic: A JAX Library for Computer Vision Research and Beyond, Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay (Google Brain & Google Research)
• [Virtual] BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster, Jason Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Yiquan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song (Intel)
• [Virtual] PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments, Richardson Menezes, Aron de Miranda, Helton Maia (Federal Univ. of Rio Grande do Norte)
• [Invited Talk] Papers and Code Aren’t Enough: Why Demos are Critical to ML Research and How to Build Them, Abubakar Abid (HuggingFace/Gradio)

1000–1230 Poster 1.1 (Halls B2-C)
Machine Learning
1. Efficient Deep Embedded Subspace Clustering, Jinyu Cai, Jicong Fan, Wenzhong Guo, Shiping Wang, Yunhe Zhang, Zhao Zhang
2. Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers, Yunhui Guo, Xudong Wang, Yubei Chen, Stella X. Yu
3. CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data, Yunhui Guo, Haoran Guo, Stella X. Yu
4. Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning, Jiexi Yan, Lei Luo, Chenghao Xu, Cheng Deng, Heng Huang
5. Active Learning for Open-Set Annotation, Kun-Peng Ning, Xun Zhao, Yu Li, Sheng-Jun Huang
6. Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training, Theodoros Tsiligkaridis, Jay Roberts
7. Robust Optimization As Data Augmentation for Large-Scale Graphs, Kezhi Kong, Guohao Li, Mucong Ding, Zuxuan Wu, Chen Zhu, Bernard Ghanem, Gavin Taylor, Tom Goldstein
8. A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty, Sihao Yu, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Zizhen Wang, Xueqi Cheng
9. The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration, Bingyuan Liu, Ismail Ben Ayed, Adrian Galdran, Jose Dolz
10. Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector, Guoliang Lin, Hanlu Chu, Hanjiang Lai
11. GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning, Rishabh Tiwari, Krishnateja Killamsetty, Rishabh Iyer, Pradeep Shenoy
12. Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning, Qingsen Yan, Dong Gong, Yuhang Liu, Anton van den Hengel, Javen Qinfeng Shi
13. A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration, Daniel Grzech, Mohammad Farid Azampour, Ben Glocker, Julia Schnabel, Nassir Navab, Bernhard Kainz, Loïc Le Folgoc
14. Learning To Learn by Jointly Optimizing Neural Architecture and Weights, Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Longhui Wei, Yueting Zhuang, Qi Tian
15. Learning To Prompt for Continual Learning, Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
16. Meta-Attention for ViT-Backed Continual Learning, Mengqi Xue, Haofei Zhang, Jie Song, Mingli Song
17. Multi-Frame Self-Supervised Depth With Transformers, Vitor Guizilini, Rareș Ambruș, Dian Chen, Sergey Zakharov, Adrien Gaidon
18. Continual Learning With Lifelong Vision Transformer, Zhen Wang, Liu Liu, Yiqun Duan, Yajing Kong, Dacheng Tao
19. Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation, Jianfeng Wang, Thomas Lukasiewicz
20. Revisiting Random Channel Pruning for Neural Network Compression, Yawei Li, Kamil Adamczewski, Wen Li, Shuhang Gu, Radu Timofte, Luc Van Gool
21. Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase, Huayi Tang, Yong Liu
22. Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning, Jongin Lim, Sangdoo Yun, Seulki Park, Jin Young Choi
23. Towards Robust and Reproducible Active Learning Using Neural Networks, Prateek Munjal, Nasir Hayat, Munawar Hayat, Jamshid Sourati, Shadab Khan
24. Non-Iterative Recovery From Nonlinear Observations Using Generative Models, Jiulong Liu, Zhaoqiang Liu
25. Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders, Minyoung Kim
26. Robust Combination of Distributed Gradients Under Adversarial Perturbations, Kwang In Kim
27. Do Learned Representations Respect Causal Relationships? Lan Wang, Vishnu Naresh Boddeti
28. How Much More Data Do I Need? Estimating Requirements for Downstream Tasks, Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law
29. Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees, Magzhan Gabidolla, Miguel Á. Carreira-Perpiñán
30. Contrastive Test-Time Adaptation, Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi
31. AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation, Paritosh Mittal, Yen-Chi Cheng, Maneesh Singh, Shubham Tulsiani
32. Selective-Supervised Contrastive Learning With Noisy Labels, Shikun Li, Xiaobo Xia, Shiming Ge, Tongliang Liu
33. RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks, Yufei Guo, Xinyi Tong, Yuanpei Chen, Liwen Zhang, Xiaode Liu, Zhe Ma, Xuhui Huang
34. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction, Saquib Sarfraz, Marios Koulakis, Constantin Seibold, Rainer Stiefelhagen
Statistical Methods
35. Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels, Yikai Wang, Xinwei Sun, Yanwei Fu
36. Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design, Xiran Fan, Chun-Hao Yang, Baba C. Vemuri
37. Learning Structured Gaussians to Approximate Deep Ensembles, Ivor J. A. Simpson, Sara Vicente, Neill D. F. Campbell
38. Out-of-Distribution Generalization With Causal Invariant Transformations, Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu
39. Split Hierarchical Variational Compression, Tom Ryder, Chen Zhang, Ning Kang, Shifeng Zhang
40. Implicit Feature Decoupling With Depthwise Quantization, Iordanis Fostiropoulos, Barry Boehm
41. Understanding Uncertainty Maps in Vision With Statistical Testing, Jurijs Nazarovs, Zhichun Huang, Songwong Tasneeyapant, Rudrasis Chakraborty, Vikas Singh
Optimization Methods
42. A Hybrid Quantum-Classical Algorithm for Robust Fitting, Anh-Dzung Doan, Michele Sasdelli, David Suter, Tat-Jun Chin
43. A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching, Paul Roetzer, Paul Swoboda, Daniel Cremers, Florian Bernard
44. FastDOG: Fast Discrete Optimization on GPU, Ahmed Abbas, Paul Swoboda
45. Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization, Vladimir Chikin, Mikhail Antiukh
46. AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks, Huu Le, Rasmus Kjær Høier, Che-Tsung Lin, Christopher Zach
47. Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm, Matteo Spallanzani, Gian Paolo Leonardi, Luca Benini
48. AME: Attention and Memory Enhancement in Hyper-Parameter Optimization, Nuo Xu, Jianlong Chang, Xing Nie, Chunlei Huo, Shiming Xiang, Chunhong Pan
49. Accelerating Neural Network Optimization Through an Automated Control Theory Lens, Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang
50. Efficient Maximal Coding Rate Reduction by Variational Forms, Christina Baek, Ziyang Wu, Kwan Ho Ryan Chan, Tianjiao Ding, Yi Ma, Benjamin D. Haeffele
51. A Unified Framework for Implicit Sinkhorn Differentiation, Marvin Eisenberger, Aysim Toker, Laura Leal-Taixé, Florian Bernard, Daniel Cremers
52. Computing Wasserstein-p Distance Between Images With Linear Cost, Yidong Chen, Chen Li, Zhonghua Lu
53. An Iterative Quantum Approach for Transformation Estimation From Point Sets, Natacha Kuete Meli, Florian Mannel, Jan Lellmann
Deep Learning Architectures & Techniques
54. BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi
55. Pooling Revisited: Your Receptive Field Is Suboptimal, Dong-Hwan Jang, Sanghyeok Chu, Joonhyuk Kim, Bohyung Han
56. Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis, Jiajing Chen, Burak Kakillioglu, Huantao Ren, Senem Velipasalar
57. Online Convolutional Re-Parameterization, Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong, Xian-Sheng Hua
58. RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality, Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding
59. DyRep: Bootstrapping Training With Dynamic Re-Parameterization, Tao Huang, Shan You, Bohan Zhang, Yuxuan Du, Fei Wang, Chen Qian, Chang Xu
60. Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free, Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang
61. Condensing CNNs With Partial Differential Equations, Anil Kag, Venkatesh Saligrama
62. Deep Equilibrium Optical Flow Estimation, Shaojie Bai, Zhengyang Geng, Yash Savani, J. Zico Kolter
63. Frame Averaging for Equivariant Shape Space Learning, Matan Atzmon, Koki Nagano, Sanja Fidler, Sameh Khamis, Yaron Lipman
64. Dual-Generator Face Reenactment, Gee-Sern Hsu, Chun-Hung Tsai, Hung-Yi Wu
65. Convolution of Convolution: Let Kernels Spatially Collaborate, Rongzhen Zhao, Jian Li, Zhenzhi Wu
66. SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention, Matthias Wödlinger, Jan Kotera, Jan Xu, Robert Sablatnig
67. RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising, Michael Schelling, Pedro Hermosilla, Timo Ropinski
68. Co-Domain Symmetry for Complex-Valued Deep Learning, Utkarsh Singhal, Yifei Xing, Stella X. Yu
69. Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention, Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang
70. Compressing Models With Few Samples: Mimicking Then Replacing, Huanyu Wang, Junjie Liu, Xin Ma, Yang Yong, Zhenhua Chai, Jianxin Wu
71. Total Variation Optimization Layers for Computer Vision, Raymond A. Yeh, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing
72. AIM: An Auto-Augmenter for Images and Meshes, Vinit Veerendraveer Singh, Chandra Kambhamettu
73. Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction, George Yiasemis, Jan-Jakob Sonke, Clarisa Sánchez, Jonas Teuwen
74. Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching, Nicolas Donati, Etienne Corman, Maks Ovsjanikov
75. Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images, Jingqi Huang, Yue Ning, Dong Nie, Linan Guan, Xiping Jia
76. Delving Into the Estimation Shift of Batch Normalization in a Network, Lei Huang, Yi Zhou, Tian Wang, Jie Luo, Xianglong Liu
77. Generalizing Interactive Backpropagating Refinement for Dense Prediction Networks, Fanqing Lin, Brian Price, Tony Martinez
78. Brain-Inspired Multilayer Perceptron With Spiking Neurons, Wenshuo Li, Hanting Chen, Jianyuan Guo, Ziyang Zhang, Yunhe Wang
79. Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique, Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
80. Revisiting Weakly Supervised Pre-Training of Visual Perception Models, Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten
81. On the Integration of Self-Attention and Convolution, Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang
82. Hire-MLP: Vision MLP via Hierarchical Rearrangement, Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
83. Stable Long-Term Recurrent Video Super-Resolution, Benjamin Naoto Chiche, Arnaud Woiselle, Joana Frontera-Pons, Jean-Luc Starck
84. Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation, Aming Wu, Cheng Deng
85. Progressive End-to-End Object Detection in Crowded Scenes, Anlin Zheng, Yuang Zhang, Xiangyu Zhang, Xiaojuan Qi, Jian Sun
86. Zero-Shot Text-Guided Object Generation With Dream Fields, Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
Recognition: Detection, Categorization, Retrieval
87. ISNet: Shape Matters for Infrared Small Target Detection, Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo
88. Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving, Yi-Nan Chen, Hang Dai, Yong Ding
89. CLRNet: Cross Layer Refinement Network for Lane Detection, Tu Zheng, Yifei Huang, Yang Liu, Wenjian Tang, Zheng Yang, Deng Cai, Xiaofei He
90. CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection, Yanan Zhang, Jiaxin Chen, Di Huang
91. Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection, Yu-Jhe Li, Jinhyung Park, Matthew O'Toole, Kris Kitani
92. Group Contextualization for Video Recognition, Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He
93. Learning Transferable Human-Object Interaction Detector With Natural Language Supervision, Suchen Wang, Yueqi Duan, Henghui Ding, Yap-Peng Tan, Kim-Hui Yap, Junsong Yuan
94. Accelerating DETR Convergence via Semantic-Aligned Matching, Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, Shijian Lu
95. Efficient Video Instance Segmentation via Tracklet Query and Proposal, Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gérard Medioni
96. Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation, Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun
97. Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection, Siyue Yu, Jimin Xiao, Bingfeng Zhang, Eng Gee Lim
98. C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation, Jinheng Xie, Jianfeng Xiang, Junliang Chen, Xianxu Hou, Xiaodong Zhao, Linlin Shen
99. Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval, Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
100. AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks, Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu
101. Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection, Jihwan Park, SeungJun Lee, Hwan Heo, Hyeong Kyu Choi, Hyunwoo J. Kim
102. A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos, Hanyu Xuan, Zhiliang Wu, Jian Yang, Yan Yan, Xavier Alameda-Pineda
103. SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization, Canjie Luo, Lianwen Jin, Jingdong Chen
104. Towards End-to-End Unified Scene Text Detection and Layout Analysis, Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis
105. Clothes-Changing Person Re-Identification With RGB Modality Only, Xinqian Gu, Hong Chang, Bingpeng Ma, Shutao Bai, Shiguang Shan, Xilin Chen

106. MonoJSG: Joint Semantic and Geometric Cost Volume for 126. Panoptic SegFormer: Delving Deeper Into Panoptic
Monocular 3D Object Detection, Qing Lian, Peiliang Li, Xiaozhi Segmentation With Transformers, Zhiqi Li, Wenhai Wang, Enze
Chen Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo,
107. Homography Loss for Monocular 3D Object Detection, Jiaqi Gu, Tong Lu
Bojian Wu, Lubin Fan, Jianqiang Huang, Shen Cao, Zhiyu Xiang, 127. Masked-Attention Mask Transformer for Universal Image
Xian-Sheng Hua Segmentation, Bowen Cheng, Ishan Misra, Alexander G.
108. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Schwing, Alexander Kirillov, Rohit Girdhar
Detection With Transformers, Xuyang Bai, Zeyu Hu, Xinge Zhu, 128. FocalClick: Towards Practical Interactive Image Segmentation,
Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi,
109. TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised Hengshuang Zhao
3D Instance Segmentation, Ruihang Chu, Xiaoqing Ye, Zhengzhe 129. High Quality Segmentation for Ultra High-Resolution Images,
Liu, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie,
110. RBGNet: Ray-Based Grouping for 3D Object Detection, Haiyang Jianlong Wu, Zhe Lin, Jiaya Jia
Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, 130. Wnet: Audio-Guided Video Object Segmentation via Wavelet-
Hongsheng Li, Bernt Schiele, Liwei Wang Based Cross-Modal Denoising Networks, Wenwen Pan, Haonan
111. Voxel Field Fusion for 3D Object Detection, Yanwei Li, Xiaojuan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli
Qi, Yukang Chen, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia Gao, Jun Yu, Fei Wu, Qi Tian
112. Learning To Detect Mobile Objects From LiDAR Scans Without 131. Recurrent Dynamic Embedding for Video Object Segmentation,
Labels, Yurong You, Katie Luo, Cheng Perng Phoo, Wei-Lun Chao, Mingxing Li, Li Hu, Zhiwei Xiong, Bang Zhang, Pan Pan, Dong Liu
Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. 132. Accelerating Video Object Segmentation With Compressed
Weinberger Video, Kai Xu, Angela Yao
113. OccAM’s Laser: Occlusion-Based Attribution Maps for 3D Object 133. Per-Clip Video Object Segmentation, Kwanyong Park, Sanghyun
Detectors on LiDAR Data, David Schinagl, Georg Krispel, Horst Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee
Possegger, Peter M. Roth, Horst Bischof 134. SWEM: Towards Real-Time Video Object Segmentation With
114. Confidence Propagation Cluster: Unleash Full Potential of Sequential Weighted Expectation-Maximization, Zhihui Lin,
Object Detectors, Yichun Shen, Wanli Jiang, Zhen Xu, Rundong Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang,
Li, Junghyun Kwon, Siyi Li Wei Liu
115. TransGeo: Transformer Is All You Need for Cross-View Image 135. Neural Recognition of Dashed Curves With Gestalt Law of
Geo-Localization, Sijie Zhu, Mubarak Shah, Chen Chen Continuity, Hanyuan Liu, Chengze Li, Xueting Liu, Tien-Tsin Wong
116. A Voxel Graph CNN for Object Classification With Event 136. CVNet: Contour Vibration Network for Building Extraction,
Cameras, Yongjian Deng, Hao Chen, Hai Liu, Youfu Li Ziqiang Xu, Chunyan Xu, Zhen Cui, Xiangwei Zheng, Jian Yang
117. OSKDet: Orientation-Sensitive Keypoint Localization for 137. A Keypoint-Based Global Association Network for Lane
Rotated Object Detection, Dongchen Lu, Dongmei Li, Yali Li, Detection, Jinsheng Wang, Yinchao Ma, Shaofei Huang, Tianrui
Shengjin Wang Hui, Fei Wang, Chen Qian, Tianzhu Zhang
118. Canonical Voting: Towards Robust Oriented Bounding Box 138. EDTER: Edge Detection With Transformer, Mengyang Pu,
Detection in 3D Scenes, Yang You, Zelin Ye, Yujing Lou, Chengkun Yaping Huang, Yuming Liu, Qingji Guan, Haibin Ling
Li, Yong-Lu Li, Lizhuang Ma, Weiming Wang, Cewu Lu 139. Fixing Malfunctional Objects With Learned Physical Simulation
Segmentation, Grouping and Shape Analysis and Functional Prediction, Yining Hong, Kaichun Mo, Li Yi,
119. Category Contrast for Unsupervised Domain Adaptation in Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum,
Visual Tasks, Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu, Chuang Gan
Ling Shao 140. Coherent Point Drift Revisited for Non-Rigid Shape Matching
120. Amodal Segmentation Through Out-of-Task and Out-of- and Registration, Aoxiang Fan, Jiayi Ma, Xin Tian, Xiaoguang Mei,
Distribution Generalization With a Bayesian Model, Yihong Sun, Wei Liu
Adam Kortylewski, Alan Yuille 141. CodedVTR: Codebook-Based Sparse Voxel Transformer With
121. GANSeg: Learning To Segment by Unsupervised Hierarchical Geometric Guidance, Tianchen Zhao, Niansong Zhang, Xuefei
Image Generation, Xingzhe He, Bastian Wandt, Helge Rhodin Ning, He Wang, Li Yi, Yu Wang
122. Segment-Fusion: Hierarchical Context Fusion for Robust 3D 142. FLOAT: Factorized Learning of Object Attributes for Improved
Semantic Segmentation, Anirud Thyagharajan, Benjamin Multi-Object Multi-Part Scene Parsing, Rishubh Singh, Pranav
Ummenhofer, Prashant Laddha, Om Ji Omer, Sreenivas Gupta, Pradeep Shenoy, Ravikiran Sarvadevabhatla
Subramoney 143. Rotationally Equivariant 3D Object Detection, Hong-Xing Yu,
123. Deep Hierarchical Semantic Segmentation, Liulei Li, Tianfei Jiajun Wu, Li Yi
Zhou, Wenguan Wang, Jianwu Li, Yi Yang 144. AUV-Net: Learning Aligned UV Maps for Texture Transfer and
124. Semantic Segmentation by Early Region Proxy, Yifan Zhang, Bo Synthesis, Zhiqin Chen, Kangxue Yin, Sanja Fidler
Pang, Cewu Lu 3D From Single Images
125. Panoptic, Instance and Semantic Relations: A Relational Context 145. Learning To Estimate Robust 3D Human Mesh From In-the-Wild
Encoder to Enhance Panoptic Segmentation, Shubhankar Borse, Crowded Scenes, Hongsuk Choi, Gyeongsik Moon, JoonKyu Park,
Hyojin Park, Hong Cai, Debasmit Das, Risheek Garrepalli, Fatih Kyoung Mu Lee
Porikli 146. Human Mesh Recovery From Multiple Shots, Georgios Pavlakos,
Jitendra Malik, Angjoo Kanazawa

Tuesday, June 21 (Morning) Program
147. HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network, JoonKyu Park, Yeonguk Oh, Gyeongsik Moon, Hongsuk Choi, Kyoung Mu Lee
148. Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing, Thiemo Alldieck, Mihai Zanfir, Cristian Sminchisescu
149. Disentangled3D: Learning a 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images, Ayush Tewari, Mallikarjun B R, Xingang Pan, Ohad Fried, Maneesh Agrawala, Christian Theobalt
150. NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations, Keyu Wu, Yifan Ye, Lingchen Yang, Hongbo Fu, Kun Zhou, Youyi Zheng
151. Topologically-Aware Deformation Fields for Single-View 3D Reconstruction, Shivam Duggal, Deepak Pathak
152. Generating Diverse 3D Reconstructions From a Single Occluded Face Image, Rahul Dey, Vishnu Naresh Boddeti
153. LOLNerf: Learn From One Look, Daniel Rebain, Mark Matthews, Kwang Moo Yi, Dmitry Lagun, Andrea Tagliasacchi
154. Learning Local Displacements for Point Cloud Completion, Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari
155. Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation, Andra Petrovai, Sergiu Nedevschi
156. Dimension Embeddings for Monocular 3D Object Detection, Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu
157. Understanding 3D Object Articulation in Internet Videos, Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey
158. P3Depth: Monocular Depth Estimation With a Piecewise Planarity Prior, Vaishakh Patil, Christos Sakaridis, Alexander Liniger, Luc Van Gool
159. Neural Face Identification in a 2D Wireframe Projection of a Manifold Object, Kehan Wang, Jia Zheng, Zihan Zhou
160. PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation, Naiyu Gao, Fei He, Jian Jia, Yanhu Shan, Haoyang Zhang, Xin Zhao, Kaiqi Huang
161. Stability-Driven Contact Reconstruction From Monocular Color Images, Zimeng Zhao, Binghui Zuo, Wei Xie, Yangang Wang
162. LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network, Zhigang Jiang, Zhongzheng Xiang, Jinhua Xu, Ming Zhao
163. Collaborative Learning for Hand and Object Reconstruction With Attention-Guided Graph Convolution, Tze Ho Elden Tse, Kwang In Kim, Ales̆ Leonardis, Hyung Jin Chang
164. RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes, Tak-Wai Hui
165. Exploring Geometric Consistency for Monocular 3D Object Detection, Qing Lian, Botao Ye, Ruijia Xu, Weilong Yao, Tong Zhang
166. Learning 3D Object Shape and Layout Without 3D Supervision, Georgia Gkioxari, Nikhila Ravi, Justin Johnson
167. Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data, Nikolay Patakin, Anna Vorontsova, Mikhail Artemyev, Anton Konushin
168. Occluded Human Mesh Recovery, Rawal Khirodkar, Shashank Tripathi, Kris Kitani
169. LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints, Junshu Tang, Zhijun Gong, Ran Yi, Yuan Xie, Lizhuang Ma
170. OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction, Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, Feng Xu
171. Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light, Yuhua Xu, Xiaoli Yang, Yushan Yu, Wei Jia, Zhaobi Chu, Yulan Guo
172. Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection, Mingtao Feng, Kendong Liu, Liang Zhang, Hongshan Yu, Yaonan Wang, Ajmal Mian
Photogrammetry and Remote Sensing
173. HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening, Wele Gedara Chaminda Bandara, Vishal M. Patel
174. Revisiting Near/Remote Sensing With Geospatial Attention, Scott Workman, M. Usman Rafique, Hunter Blanton, Nathan Jacobs
175. Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening, Gang Yang, Man Zhou, Keyu Yan, Aiping Liu, Xueyang Fu, Fan Wang
176. Mutual Information-Driven Pan-Sharpening, Man Zhou, Keyu Yan, Jie Huang, Zihe Yang, Xueyang Fu, Feng Zhao
177. Sparse and Complete Latent Organization for Geospatial Semantic Segmentation, Fengyu Yang, Chenyang Ma
178. The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions, Dominik Muhle, Lukas Koestler, Nikolaus Demmel, Florian Bernard, Daniel Cremers
179. Oriented RepPoints for Aerial Object Detection, Wentong Li, Yijie Chen, Kaixuan Hu, Jianke Zhu
180. Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction, Christina Tsalicoglou, Thomas Rösgen
181. PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images, Stefano Zorzi, Shabab Bazrafkan, Stefan Habenschuss, Friedrich Fraundorfer
182. Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites, Ngoc Long Nguyen, Jérémy Anger, Axel Davy, Pablo Arias, Gabriele Facciolo
Low-Level Vision
183. MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting, Xiaoguang Li, Qing Guo, Di Lin, Ping Li, Wei Feng, Song Wang
184. Iterative Deep Homography Estimation, Si-Yuan Cao, Jianxin Hu, Zehua Sheng, Hui-Liang Shen
185. GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors, Jingwen He, Wu Shi, Kai Chen, Lean Fu, Chao Dong
186. Deep Color Consistent Network for Low-Light Image Enhancement, Zhao Zhang, Huan Zheng, Richang Hong, Mingliang Xu, Shuicheng Yan, Meng Wang
187. LAR-SR: A Local Autoregressive Model for Image Super-Resolution, Baisong Guo, Xiaoyun Zhang, Haoning Wu, Yu Wang, Ya Zhang, Yan-Feng Wang
188. Multi-Scale Memory-Based Video Deblurring, Bo Ji, Angela Yao
189. Local Texture Estimator for Implicit Representation Function, Jaewon Lee, Kyong Hwan Jin

190. Chitransformer: Towards Reliable Stereo From Cues, Qing Su, Shihao Ji
191. BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras, Jaihyun Koh, Jangho Lee, Sungroh Yoon
192. ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior, Metin Ersin Arican, Ozgur Kara, Gustav Bredell, Ender Konukoglu
193. IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation, Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, Jie Yang
194. Learning Graph Regularisation for Guided Super-Resolution, Riccardo de Lutio, Alexander Becker, Stefano D'Aronco, Stefania Russo, Jan D. Wegner, Konrad Schindler
195. Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics, Weixi Wang, Ji Li, Hui Ji
196. Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation, Wenbo Zhao, Xianming Liu, Zhiwei Zhong, Junjun Jiang, Wei Gao, Ge Li, Xiangyang Ji
197. Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching, Kwanyoung Kim, Taesung Kwon, Jong Chul Ye
198. Unpaired Deep Image Deraining Using Dual Contrastive Learning, Xiang Chen, Jinshan Pan, Kui Jiang, Yufeng Li, Yufeng Huang, Caihua Kong, Longgang Dai, Zhentao Fan
199. Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots, Zejin Wang, Jiazheng Liu, Guoqing Li, Hua Han
200. Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition, Yang Yang, Chaoyue Wang, Risheng Liu, Lin Zhang, Xiaojie Guo, Dacheng Tao
201. VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution, Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang
202. Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space, Ryuki Yamamoto, Hidekata Hontani, Akira Imakura, Tatsuya Yokota
203. Exploring and Evaluating Image Restoration Potential in Dynamic Scenes, Cheng Zhang, Shaolin Su, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang
204. GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation, Pranjay Shyam, Kyung-Soo Kim, Kuk-Jin Yoon
205. Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method, Lai Jiang, Yifei Li, Shengxi Li, Mai Xu, Se Lei, Yichen Guo, Bo Huang
206. IDR: Self-Supervised Image Denoising via Iterative Data Refinement, Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li
207. ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo, Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang
208. Texture-Based Error Analysis for Image Super-Resolution, Salma Abdel Magid, Zudi Lin, Donglai Wei, Yulun Zhang, Jinjin Gu, Hanspeter Pfister
209. Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel, Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong
210. KNN Local Attention for Image Restoration, Hunsang Lee, Hyesong Choi, Kwanghoon Sohn, Dongbo Min
211. Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection, Ruijun Gao, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Huazhu Fu, Wei Feng, Yang Liu, Song Wang
212. Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection, Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
Behavior Analysis
213. Self-Supervised Keypoint Discovery in Behavioral Videos, Jennifer J. Sun, Serim Ryou, Roni H. Goldshmid, Brandon Weissbourd, John O. Dabiri, David J. Anderson, Ann Kennedy, Yisong Yue, Pietro Perona
214. Learning To Align Sequential Actions in the Wild, Weizhe Liu, Bugra Tekin, Huseyin Coskun, Vibhav Vineet, Pascal Fua, Marc Pollefeys
215. Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination, Soma Nonaka, Shohei Nobuhara, Ko Nishino
216. End-to-End Human-Gaze-Target Detection With Transformers, Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen
217. Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis, Albert Tseng, Jennifer J. Sun, Yisong Yue
218. MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction, Mihee Lee, Samuel S. Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic
219. Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction, Lihuan Li, Maurice Pagnucco, Yang Song
220. End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps, Ke Guo, Wenxi Liu, Jia Pan
221. Learning Affordance Grounding From Exocentric Images, Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao
Vision Applications & Systems
222. 3D Scene Painting via Semantic Image Synthesis, Jaebong Jeong, Janghun Jo, Sunghyun Cho, Jaesik Park
223. Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography, Jun Jia, Zhongpai Gao, Dandan Zhu, Xiongkuo Min, Guangtao Zhai, Xiaokang Yang
224. ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection, Lingteng Qiu, Zhangyang Xiong, Xuhao Wang, Kenkun Liu, Yihan Li, Guanying Chen, Xiaoguang Han, Shuguang Cui
225. Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches, Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, Yi-Zhe Song
226. Image Disentanglement Autoencoder for Steganography Without Embedding, Xiyao Liu, Ziping Ma, Junxing Ma, Jian Zhang, Gerald Schaefer, Hui Fang
227. Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection, Banghuai Li
228. Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning, Yuan-Fu Yang, Min Sun
229. Density-Preserving Deep Point Cloud Compression, Yun He, Xinlin Ren, Danhang Tang, Yinda Zhang, Xiangyang Xue, Yanwei Fu
230. Graph-Context Attention Networks for Size-Varied Deep Graph Matching, Zheheng Jiang, Hossein Rahmani, Plamen Angelov, Sue Black, Bryan M. Williams
231. TransWeather: Transformer-Based Restoration of Images
Degraded by Adverse Weather Conditions, Jeya Maria Jose
Valanarasu, Rajeev Yasarla, Vishal M. Patel
232. ObjectFormer for Image Manipulation Detection and
Localization, Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong
Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang
233. Sequential Voting With Relational Box Fields for Active Object
Detection, Qichen Fu, Xingyu Liu, Kris Kitani
234. Efficient Classification of Very Large Images With Tiny Objects,
Fanjie Kong, Ricardo Henao
235. Partially Does It: Towards Scene-Level FG-SBIR With Partial
Input, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Viswanatha
Reddy Gajjala, Aneeshan Sain, Tao Xiang, Yi-Zhe Song
236. Long-Term Visual Map Sparsification With Heterogeneous
GNN, Ming-Fang Chang, Yipu Zhao, Rajvi Shah, Jakob J. Engel,
Michael Kaess, Simon Lucey
237. Connecting the Complementary-View Videos: Joint Camera
Identification and Subject Association, Ruize Han, Yiyang Gan,
Jiacheng Li, Feifan Wang, Wei Feng, Song Wang
238. DiffusionCLIP: Text-Guided Diffusion Models for Robust Image
Manipulation, Gwanghyun Kim, Taesung Kwon, Jong Chul Ye
239. Aesthetic Text Logo Synthesis via Content-Aware Layout
Inferring, Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei
Xiong, Hongwen Kang, Zhouhui Lian
240. Rethinking Image Cropping: Exploring Diverse Compositions
From Global Views, Gengyun Jia, Huaibo Huang, Chaoyou Fu,
Ran He
241. Defensive Patches for Robust Recognition in the Physical World,
Jiakai Wang, Zixin Yin, Pengfei Hu, Aishan Liu, Renshuai Tao,
Haotong Qin, Xianglong Liu, Dacheng Tao
242. Semi-Supervised Video Paragraph Grounding With Contrastive
Encoder, Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo
Cao, Heng Tao Shen
243. Large-Scale Pre-Training for Person Re-Identification With
Noisy Labels, Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin
Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen
244. Meta Distribution Alignment for Generalizable Person Re-
Identification, Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng,
Wen Li, Heng Tao Shen
245. FvOR: Robust Joint Shape and Pose Optimization for Few-View
Object Reconstruction, Zhenpei Yang, Zhile Ren, Miguel Angel
Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang
246. It’s About Time: Analog Clock Reading in the Wild, Charig Yang,
Weidi Xie, Andrew Zisserman
247. Consistency Driven Sequential Transformers Attention Model
for Partially Observable Scenes, Samrudhdhi B. Rangrej, Chetan
L. Srinidhi, James J. Clark
248. SMARTADAPT: Multi-Branch Object Detection Framework for
Videos on Mobiles, Ran Xu, Fangzhou Mu, Jayoung Lee, Preeti
Mukherjee, Somali Chaterji, Saurabh Bagchi, Yin Li
249. Generating 3D Bio-Printable Patches Using Wound
Segmentation and Reconstruction To Treat Diabetic Foot
Ulcers, Han Joo Chae, Seunghwan Lee, Hyewon Son, Seungyeob
Han, Taebin Lim
250. Investigating the Impact of Multi-LiDAR Placement on Object
Detection for Autonomous Driving, Hanjiang Hu, Zuxin Liu,
Sharad Chitlangia, Akhil Agnihotri, Ding Zhao

1130–1330 Lunch (Halls D-E)


Tuesday, June 21 (Afternoon) Program
1300–1330 Poster Switch/Setup (Halls B2-C)

1330–1500 Oral 1.2.1: Segmentation, Grouping & Shape Analysis (Great Hall A-D)
Papers in this session are in Poster Session 1.2
Chairs: Yi Fang (New York Univ.), Benjamin Kimia (Brown Univ.), Chao Chen (Stony Brook Univ.)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [1330] CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation, Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2. [1335] Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers, Tsung-Wei Ke, Jyh-Jing Hwang, Yunhui Guo, Xudong Wang, Stella X. Yu
3. [1340] Rethinking Semantic Segmentation: A Prototype View, Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc Van Gool
4. [1348] Semantic-Aware Domain Generalized Segmentation, Duo Peng, Yinjie Lei, Munawar Hayat, Yulan Guo, Wen Li
5. [1353] Adaptive Early-Learning Correction for Segmentation From Noisy Annotations, Sheng Liu, Kangning Liu, Weicheng Zhu, Yiqiu Shen, Carlos Fernandez-Granda
6. [1358] Pointly-Supervised Instance Segmentation, Bowen Cheng, Omkar Parkhi, Alexander Kirillov
7. [1406] Joint Forecasting of Panoptic Segmentations With Difference Attention, Colin Graber, Cyril Jazra, Wenjie Luo, Liangyan Gui, Alexander G. Schwing
8. [1411] FocusCut: Diving Into a Focus View in Interactive Segmentation, Zheng Lin, Zheng-Peng Duan, Zhao Zhang, Chun-Le Guo, Ming-Ming Cheng
9. [1416] Human Instance Matting via Mutual Guidance and Multi-Instance Refinement, Yanan Sun, Chi-Keung Tang, Yu-Wing Tai
10. [1424] Deformable Sprites for Unsupervised Video Decomposition, Vickie Ye, Zhengqi Li, Richard Tucker, Angjoo Kanazawa, Noah Snavely
11. [1429] Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation, Wonhui Park, Dongkwon Jin, Chang-Su Kim
12. [1434] Robust and Accurate Superquadric Recovery: A Probabilistic Approach, Weixiao Liu, Yuwei Wu, Sipu Ruan, Gregory S. Chirikjian
13. [1442] Medial Spectral Coordinates for 3D Shape Analysis, Morteza Rezanejad, Mohammad Khodadad, Hamidreza Mahyar, Herve Lombaert, Michael Gruninger, Dirk Walther, Kaleem Siddiqi
14. [1447] Scribble-Supervised LiDAR Semantic Segmentation, Ozan Unal, Dengxin Dai, Luc Van Gool
15. [1452] SoftGroup for 3D Instance Segmentation on Point Clouds, Thang Vu, Kookhoi Kim, Tung M. Luu, Thanh Nguyen, Chang D. Yoo

1330–1500 Oral 1.2.2: 3D From Single Images (Great Hall B-C)
Papers in this session are in Poster Session 1.2
Chairs: Angjoo Kanazawa (Univ. of California Berkeley), Pascal Fua (EPFL)
Format (5 min. presentation; 3 min. group questions/3 papers)
16. [1330] Accurate 3D Body Shape Regression Using Metric and Semantic Attributes, Vasileios Choutas, Lea Müller, Chun-Hao P. Huang, Siyu Tang, Dimitrios Tzionas, Michael J. Black
17. [1335] JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction, Yukang Cao, Guanying Chen, Kai Han, Wenqi Yang, Kwan-Yee K. Wong
18. [1340] Tracking People by Predicting 3D Appearance, Location and Pose, Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
19. [1348] ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis, Lixin Yang, Kailin Li, Xinyu Zhan, Jun Lv, Wenqiang Xu, Jiefeng Li, Cewu Lu
20. [1353] Interacting Attention Graph for Single Image Two-Hand Reconstruction, Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, Yebin Liu
21. [1358] 3D Human Tongue Reconstruction From Single “In-the-Wild” Images, Stylianos Ploumpis, Stylianos Moschoglou, Vasileios Triantafyllou, Stefanos Zafeiriou
22. [1406] EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation, Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li
23. [1411] Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection, Zhuoling Li, Zhan Qu, Yang Zhou, Jianzhuang Liu, Haoqian Wang, Lihui Jiang
24. [1416] OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion, Yuyan Li, Yuliang Guo, Zhixin Yan, Xinyu Huang, Ye Duan, Liu Ren
25. [1424] Gated2Gated: Self-Supervised Depth Estimation From Gated Images, Amanpreet Walia, Stefanie Walz, Mario Bijelic, Fahim Mannan, Frank Julca-Aguilar, Michael Langer, Werner Ritter, Felix Heide
26. [1429] IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, Rui Zhu, Zhengqin Li, Janarbek Matai, Fatih Porikli, Manmohan Chandraker
27. [1434] Egocentric Scene Understanding via Multimodal Spatial Rectifier, Tien Do, Khiem Vuong, Hyun Soo Park
28. [1442] Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry, Gwangbin Bae, Ignas Budvytis, Roberto Cipolla
29. [1447] The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement, Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Xuaner Zhang, Jiawen Chen, Felix Heide
30. [1452] BANMo: Building Animatable 3D Neural Models From Many Casual Videos, Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

1330–1500 Oral 1.2.3: Video Analysis & Understanding (Hall B1)
Papers in this session are in Poster Session 1.2
Chairs: Nicu Sebe (Univ. of Trento), YingLi Tian (City Univ. of New York), Concetto Spampinato (Univ. of Catania)
Format (5 min. presentation; 3 min. group questions/3 papers)
31. [1330] Self-Supervised Video Transformer, Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael S. Ryoo
32. [1335] Temporally Efficient Vision Transformer for Video Instance Segmentation, Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan
33. [1340] VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation, Su Ho Han, Sukjun Hwang, Seoung Wug Oh, Yeonchool Park, Hyunwoo Kim, Min-Jung Kim, Seon Joo Kim
34. [1348] Temporal Alignment Networks for Long-Term Video, Tengda Han, Weidi Xie, Andrew Zisserman
35. [1353] Revisiting the “Video” in Video-Language Understanding, Shyamal Buch, Cristóbal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles
36. [1358] Invariant Grounding for Video Question Answering, Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua
37. [1406] P3IV: Probabilistic Procedure Planning From Instructional Videos With Weak Supervision, He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson
38. [1411] FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment, Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu
39. [1416] Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition, Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin
40. [1424] Revisiting Skeleton-Based Action Recognition, Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
41. [1429] OpenTAL: Towards Open Set Temporal Action Localization, Wentao Bao, Qi Yu, Yu Kong
42. [1434] Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition, Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao
43. [1442] TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition, Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
44. [1447] Revealing Occlusions With 4D Neural Fields, Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick
45. [1452] HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images, Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

1500–1530 Afternoon Break (Halls B2-C)

1000–1700 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1330–1700 Demos (Halls B2-C Demo Area)
• Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images, Chengyuan Xu, Boning Dong, Noah Stier, Curtis McCully, D. Andrew Howell, Pradeep Sen, Tobias Hollerer (UCSB; Las Cumbres Observatory)
• VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers, Estelle Guez Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal (Intel Labs; UCLA; Microsoft Research)
• Speech Driven Tongue Animation, Salvador Medina, Denis Tome, Carsten Stoll, Thibaut Weise, Iain Matthews (Carnegie Mellon University; Epic Games)
• Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features, Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo (Università degli Studi di Firenze; Università di Pisa)
• DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors, Sivapriya Vellaichamy, Matthew Hull, Zijie J. Wang, Nilaksh Das, ShengYun Peng, Haekyu Park, Duen Horng Chau (Georgia Institute of Technology)
• [Virtual] V-Doc: Visual Questions Answers With Documents, Yihao Ding, Zhe Huang, Runlin Wang, YanHang Zhang, Xianru Chen, Yuzhong Ma, Hyunsuk Chung, Soyeon Caren Han (The Univ. of Sydney; Fortifyedge)
• [Virtual] VisCUIT: Visual Auditor for Bias in CNN Image Classifier, Seongmin Lee, Zijie J. Wang, Judy Hoffman, Duen Horng Chau (Georgia Institute of Technology)
• [Virtual] Clustering Plotted Data by Image Segmentation, Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou (American Univ. of Beirut; VinAI Research; Hugging Face; Stanford Univ.)

1430–1700 Poster 1.2 (Halls B2-C)
Video Analysis & Understanding
46. Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning, Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang
47. UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection, Ye Liu, Siyuan Li, Yang Wu, Chang-Wen Chen, Ying Shan, Xiaohu Qie
48. Future Transformer for Long-Term Action Anticipation, Dayoung Gong, Joonseok Lee, Manjin Kim, Seong Jong Ha, Minsu Cho
49. MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei
50. Learning Pixel-Level Distinctions for Video Highlight Detection, Fanyue Wei, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan
51. DR.VIC: Decomposition and Reasoning for Video Individual Counting, Tao Han, Lei Bai, Junyu Gao, Qi Wang, Wanli Ouyang
52. Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation, Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han
53. Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline, Kailai Zhou, Yibo Wang, Tao Lv, Yunqian Li, Linsen Chen, Qiu Shen, Xun Cao

54. Video Shadow Detection via Spatio-Temporal Interpolation
Consistency Training, Xiao Lu, Yihong Cao, Sheng Liu,
Chengjiang Long, Zipei Chen, Xuanyu Zhou, Yimin Yang, Chunxia
Xiao
55. Coarse-To-Fine Feature Mining for Video Semantic
Segmentation, Guolei Sun, Yun Liu, Henghui Ding, Thomas
Probst, Luc Van Gool
56. Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-
Modal Video Similarity Evaluation, Zhaoyang Zeng, Yongsheng
Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen
57. Object-Region Video Transformers, Roei Herzig, Elad Ben-
Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna
Rohrbach, Trevor Darrell, Amir Globerson
58. Colar: Effective and Efficient Online Action Detection by
Consulting Exemplars, Le Yang, Junwei Han, Dingwen Zhang
59. SimVP: Simpler Yet Better Video Prediction, Zhangyang Gao,
Cheng Tan, Lirong Wu, Stan Z. Li
60. Imposing Consistency for Optical Flow Estimation, Jisoo Jeong,
Jamie Menjay Lin, Fatih Porikli, Nojun Kwak
61. Stand-Alone Inter-Frame Attention in Video Models, Fuchen
Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei
62. Video Swin Transformer, Ze Liu, Jia Ning, Yue Cao, Yixuan Wei,
Zheng Zhang, Stephen Lin, Han Hu
63. Bayesian Nonparametric Submodular Video Partition for
Robust Anomaly Detection, Hitesh Sapkota, Qi Yu
64. Self-Supervised Predictive Learning: A Negative-Free Method
for Sound Source Localization in Visual Scenes, Zengjie Song,
Yuxi Wang, Junsong Fan, Tieniu Tan, Zhaoxiang Zhang
65. Likert Scoring With Grade Decoupling for Long-Term Action
Assessment, Angchi Xu, Ling-An Zeng, Wei-Shi Zheng
66. Complex Video Action Reasoning via Learnable Markov Logic
Network, Yang Jin, Linchao Zhu, Yadong Mu
67. Learning From Temporal Gradient for Semi-Supervised Action
Recognition, Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She,
Zongwei Zhou, Alan Yuille, Yingwei Li
68. Semi-Supervised Video Semantic Segmentation With Inter-
Frame Feature Reconstruction, Jiafan Zhuang, Zilei Wang, Yuan
Gao
69. Weakly Supervised Temporal Action Localization via
Representative Snippet Knowledge Propagation, Linjiang
Huang, Liang Wang, Hongsheng Li
70. Joint Hand Motion and Interaction Hotspots Prediction From
Egocentric Videos, Shaowei Liu, Subarna Tripathi, Somdeb
Majumdar, Xiaolong Wang
71. Human Hands As Probes for Interactive Object Understanding,
Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta
72. LD-ConGR: A Large RGB-D Video Dataset for Long-Distance
Continuous Gesture Recognition, Dan Liu, Libo Zhang, Yanjun
Wu
73. Object-Aware Video-Language Pre-Training for Retrieval,
Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying
Shan, Xiaohu Qie, Mike Zheng Shou
74. Fast and Unsupervised Action Boundary Detection for Action
Segmentation, Zexing Du, Xue Wang, Guoqing Zhou, Qing Wang
75. Multiview Transformers for Video Recognition, Shen Yan,
Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun,
Cordelia Schmid
76. Semi-Weakly-Supervised Learning of Complex Actions From
Instructional Task Videos, Yuhan Shen, Ehsan Elhamifar
77. Progressive Attention on Multi-Level Dense Difference Maps for
Generic Event Boundary Detection, Jiaqi Tang, Zhaoyang Liu,
Chen Qian, Wayne Wu, Limin Wang
78. Comparing Correspondences: Video Prediction With
Correspondence-Wise Losses, Daniel Geng, Max Hamilton,
Andrew Owens
Image & Video Synthesis and Generation
79. Sound-Guided Semantic Image Manipulation, Seung Hyun Lee,
Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chanyoung Kim,
Jinkyu Kim, Sangpil Kim
80. Expressive Talking Head Generation With Granular Audio-Visual
Control, Borong Liang, Yan Pan, Zhizhi Guo, Hang Zhou, Zhibin
Hong, Xiaoguang Han, Junyu Han, Jingtuo Liu, Errui Ding,
Jingdong Wang
81. Depth-Aware Generative Adversarial Network for Talking Head
Video Generation, Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
82. Learning Motion-Dependent Appearance for High-Fidelity
Rendering of Dynamic Humans From a Single Camera, Jae Shin
Yoon, Duygu Ceylan, Tuanfeng Y. Wang, Jingwan Lu, Jimei Yang,
Zhixin Shu, Hyun Soo Park
83. Audio-Driven Neural Gesture Reenactment With Video Motion
Graphs, Yang Zhou, Jimei Yang, Dingzeyu Li, Jun Saito, Deepali
Aneja, Evangelos Kalogerakis
84. Portrait Eyeglasses and Shadow Removal by Leveraging 3D
Synthetic Data, Junfeng Lyu, Zhibo Wang, Feng Xu
85. Weakly Supervised High-Fidelity Clothing Model Generation,
Ruili Feng, Cheng Ma, Chengji Shen, Xin Gao, Zhenjiang Liu,
Xiaobo Li, Kairi Ou, Deli Zhao, Zheng-Jun Zha
86. TemporalUV: Capturing Loose Clothing With Temporally
Coherent UV Coordinates, You Xie, Huiqi Mao, Angela Yao, Nils
Thuerey
87. Full-Range Virtual Try-On With Recurrent Tri-Level Transform,
Han Yang, Xinrui Yu, Ziwei Liu
88. Style-Based Global Appearance Flow for Virtual Try-On, Sen He,
Yi-Zhe Song, Tao Xiang
89. Dressing in the Wild by Watching Dance Videos, Xin Dong,
Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng,
Xiang Long, Xiaodan Liang, Jianchao Yang
90. A Brand New Dance Partner: Music-Conditioned Pluralistic
Dancing Controlled by Multiple Dance Genres, Jinwoo Kim,
Heeseok Oh, Seongjean Kim, Hoseok Tong, Sanghoon Lee
91. Unpaired Cartoon Image Synthesis via Gated Cycle Mapping,
Yifang Men, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong
Xie, Xian-Sheng Hua
92. DLFormer: Discrete Latent Transformer for Video Inpainting,
Jingjing Ren, Qingqing Zheng, Yuanyuan Zhao, Xuemiao Xu, Chen
Li
93. ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame
Interpolation, Duolikun Danier, Fan Zhang, David Bull
94. Video Frame Interpolation With Transformer, Liying Lu,
Ruizheng Wu, Huaijia Lin, Jiangbo Lu, Jiaya Jia
95. Long-Term Video Frame Interpolation via Feature Propagation,
Dawit Mureja Argaw, In So Kweon
96. Many-to-Many Splatting for Efficient Video Frame
Interpolation, Ping Hu, Simon Niklaus, Stan Sclaroff, Kate
Saenko
97. Look Outside the Room: Synthesizing a Consistent Long-Term
3D Scene Video From a Single Image, Xuanchi Ren, Xiaolong
Wang

Tuesday, June 21 (Afternoon) Program
98. Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video
Super-Resolution via Cycle-Projected Mutual Learning,
Mengshun Hu, Kui Jiang, Liang Liao, Jing Xiao, Junjun Jiang,
Zheng Wang
99. Playable Environments: Video Manipulation in Space and Time,
Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin,
Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa
Ricci
100. Event-Based Video Reconstruction via Potential-Assisted
Spiking Neural Network, Lin Zhu, Xiao Wang, Yi Chang, Jianing
Li, Tiejun Huang, Yonghong Tian
101. Modular Action Concept Grounding in Semantic Video
Prediction, Wei Yu, Wenxin Chen, Songheng Yin, Steve
Easterbrook, Animesh Garg
102. Show Me What and Tell Me How: Video Synthesis via
Multimodal Conditioning, Ligong Han, Jian Ren, Hsin-Ying Lee,
Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris
Metaxas, Sergey Tulyakov
103. StyleGAN-V: A Continuous Video Generator With the Price,
Image Quality and Perks of StyleGAN2, Ivan Skorokhodov,
Sergey Tulyakov, Mohamed Elhoseiny
104. Structure-Aware Motion Transfer With Deformable Anchor
Model, Jiale Tao, Biao Wang, Borun Xu, Tiezheng Ge, Yuning
Jiang, Wen Li, Lixin Duan
105. Image Animation With Perturbed Masks, Yoav Shalev, Lior Wolf
106. Thin-Plate Spline Motion Model for Image Animation, Jian
Zhao, Hui Zhang
107. Controllable Animation of Fluid Elements in Still Images,
Aniruddha Mahapatra, Kuldeep Kulkarni
108. Watch It Move: Unsupervised Discovery of 3D Joints for Re-
Posing of Articulated Objects, Atsuhiro Noguchi, Umar Iqbal,
Jonathan Tremblay, Tatsuya Harada, Orazio Gallo
109. Geometric Structure Preserving Warp for Natural Image
Stitching, Peng Du, Jifeng Ning, Jiguang Cui, Shaoli Huang,
Xinchao Wang, Jiaxin Wang
110. Few-Shot Incremental Learning for Label-to-Image Translation,
Pei Chen, Yangkang Zhang, Zejian Li, Lingyun Sun
111. Exemplar-Based Pattern Synthesis With Implicit Periodic Field
Network, Haiwei Chen, Jiayi Liu, Weikai Chen, Shichen Liu, Yajie
Zhao
112. SIMBAR: Single Image-Based Scene Relighting for Effective
Data Augmentation for Automated Driving Vision Tasks,
Xianling Zhang, Nathan Tseng, Ameerah Syed, Rohan Bhasin,
Nikita Jaipuria
113. SoftCollage: A Differentiable Probabilistic Tree Generator for
Image Collage, Jiahao Yu, Li Chen, Mingrui Zhang, Mading Li
114. PILC: Practical Image Lossless Compression With an End-to-
End GPU Oriented Neural Framework, Ning Kang, Shanzhao
Qiu, Shifeng Zhang, Zhenguo Li, Shu-Tao Xia
115. Kubric: A Scalable Dataset Generator, Klaus Greff, Francois
Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth,
David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles
Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam
Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek
Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan,
Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela,
Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu
Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea
Tagliasacchi
3D From Single Images
116. 360MonoDepth: High-Resolution 360° Monocular Depth
Estimation, Manuel Rey-Area, Mingze Yuan, Christian Richardt
117. Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D
Reconstruction, Kalyan Vasudev Alwala, Abhinav Gupta,
Shubham Tulsiani
118. DGECN: A Depth-Guided Edge Convolutional Network for End-
to-End 6D Pose Estimation, Tuo Cao, Fei Luo, Yanping Fu,
Wenxiao Zhang, Shengjie Zheng, Chunxia Xiao
119. MonoGround: Detecting Monocular 3D Objects From the
Ground, Zequn Qin, Xi Li
120. 3D Shape Reconstruction From 2D Images With Disentangled
Attribute Flow, Xin Wen, Junsheng Zhou, Yu-Shen Liu, Hua Su,
Zhen Dong, Zhizhong Han
121. Toward Practical Monocular Indoor Depth Estimation, Cho-Ying
Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su
122. Focal Length and Object Pose Estimation via Render and
Compare, Georgy Ponimatkin, Yann Labbé, Bryan Russell,
Mathieu Aubry, Josef Sivic
123. CLIP-NeRF: Text-and-Image Driven Manipulation of Neural
Radiance Fields, Can Wang, Menglei Chai, Mingming He,
Dongdong Chen, Jing Liao
124. Registering Explicit to Implicit: Towards High-Fidelity Garment
Mesh Reconstruction From Single Images, Heming Zhu,
Lingteng Qiu, Yuda Qiu, Xiaoguang Han
125. Layered Depth Refinement With Mask Guidance, Soo Ye Kim,
Jianming Zhang, Simon Niklaus, Yifei Fan, Simon Chen, Zhe Lin,
Munchurl Kim
126. HEAT: Holistic Edge Attention Transformer for Structured
Reconstruction, Jiacheng Chen, Yiming Qian, Yasutaka Furukawa
127. BARC: Learning To Regress 3D Dog Shape From Images by
Exploiting Breed Information, Nadine Rüegg, Silvia Zuffi, Konrad
Schindler, Michael J. Black
128. Time3D: End-to-End Joint Monocular 3D Object Detection and
Tracking for Autonomous Driving, Peixuan Li, Jieyu Jin
129. What’s in Your Hands? 3D Reconstruction of Generic Objects in
Hands, Yufei Ye, Abhinav Gupta, Shubham Tulsiani
130. 3D Moments From Near-Duplicate Photos, Qianqian Wang,
Zhengqi Li, David Salesin, Noah Snavely, Brian Curless, Janne
Kontkanen
131. Neural Window Fully-Connected CRFs for Monocular Depth
Estimation, Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu,
Ping Tan
132. PUMP: Pyramidal and Uniqueness Matching Priors for
Unsupervised Learning of Local Descriptors, Jérome Revaud,
Vincent Leroy, Philippe Weinzaepfel, Boris Chidlovskii
133. CroMo: Cross-Modal Learning for Monocular Depth Estimation,
Yannick Verdié, Jifei Song, Barnabé Mas, Benjamin Busam, Aleš
Leonardis, Steven McDonagh
134. φ-SfT: Shape-From-Template With a Physics-Based
Deformation Model, Navami Kairanda, Edith Tretschk, Mohamed
Elgharib, Christian Theobalt, Vladislav Golyanik
135. Human-Aware Object Placement for Visual Environment
Reconstruction, Hongwei Yi, Chun-Hao P. Huang, Dimitrios
Tzionas, Muhammed Kocabas, Mohamed Hassan, Siyu Tang,
Justus Thies, Michael J. Black
136. AutoRF: Learning 3D Object Radiance Fields From Single View
Observations, Norman Müller, Andrea Simonelli, Lorenzo Porzi,
Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder

137. Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to
Neural Radiance Fields Translation, Shengqu Cai, Anton
Obukhov, Dengxin Dai, Luc Van Gool
138. MonoScene: Monocular 3D Semantic Scene Completion, Anh-
Quan Cao, Raoul de Charette
139. GenDR: A Generalized Differentiable Renderer, Felix Petersen,
Bastian Goldluecke, Christian Borgelt, Oliver Deussen
140. MonoDTR: Monocular 3D Object Detection With Depth-Aware
Transformer, Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su,
Winston H. Hsu
141. ROCA: Robust CAD Model Retrieval and Alignment From a
Single Image, Can Gümeli, Angela Dai, Matthias Nießner
Face & Gestures
142. HP-Capsule: Unsupervised Face Part Discovery by Hierarchical
Parsing Capsule Network, Chang Yu, Xiangyu Zhu, Xiaomei
Zhang, Zidu Wang, Zhaoxiang Zhang, Zhen Lei
143. Killing Two Birds With One Stone: Efficient and Robust Training
of Face Recognition CNNs by Partial FC, Xiang An, Jiankang
Deng, Jia Guo, Ziyong Feng, XuHan Zhu, Jing Yang, Tongliang Liu
144. Sparse Local Patch Transformer for Robust Face Alignment and
Landmarks Inherent Relation Learning, Jiahao Xia, Weiwei Qu,
Wenjian Huang, Jianguo Zhang, Xi Wang, Min Xu
145. Enhancing Face Recognition With Self-Supervised 3D
Reconstruction, Mingjie He, Jie Zhang, Shiguang Shan, Xilin Chen
146. Learning To Learn Across Diverse Data Biases in Deep Face
Recognition, Chang Liu, Xiang Yu, Yi-Hsuan Tsai, Masoud Faraki,
Ramin Moslemi, Manmohan Chandraker, Yun Fu
147. An Efficient Training Approach for Very Large Scale Face
Recognition, Kai Wang, Shuo Wang, Panpan Zhang, Zhipeng
Zhou, Zheng Zhu, Xiaobo Wang, Xiaojiang Peng, Baigui Sun, Hao
Li, Yang You
148. MogFace: Towards a Deeper Appreciation on Face Detection,
Yang Liu, Fei Wang, Jiankang Deng, Zhipeng Zhou, Baigui Sun,
Hao Li
149. Exploring Frequency Adversarial Attacks for Face Forgery
Detection, Shuai Jia, Chao Ma, Taiping Yao, Bangjie Yin,
Shouhong Ding, Xiaokang Yang
150. End-to-End Reconstruction-Classification Learning for Face
Forgery Detection, Junyi Cao, Chao Ma, Taiping Yao, Shen Chen,
Shouhong Ding, Xiaokang Yang
151. Domain Generalization via Shuffled Style Assembly for Face
Anti-Spoofing, Zhuo Wang, Zezheng Wang, Zitong Yu, Weihong
Deng, Jiahong Li, Tingting Gao, Zhongyuan Wang
152. Privacy-Preserving Online AutoML for Domain-Specific Face
Detection, Chenqian Yan, Yuge Zhang, Quanlu Zhang, Yaming
Yang, Xinyang Jiang, Yuqing Yang, Baoyuan Wang
153. Simulated Adversarial Testing of Face Recognition Models,
Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah
Adel Bargal, Alan Yuille, Stan Sclaroff
154. Decoupled Multi-Task Learning With Cyclical Self-Regulation
for Face Parsing, Qingping Zheng, Jiankang Deng, Zheng Zhu,
Ying Li, Stefanos Zafeiriou
155. Towards Semi-Supervised Deep Facial Expression Recognition
With an Adaptive Confidence Margin, Hangyu Li, Nannan Wang,
Xi Yang, Xiaoyu Wang, Xinbo Gao
156. Towards Accurate Facial Landmark Detection via Cascaded
Transformers, Hui Li, Zidong Guo, Seon-Min Rhee, Seungju Han,
Jae-Joon Han
157. PhysFormer: Facial Video-Based Physiological Measurement
With Temporal Difference Transformer, Zitong Yu, Yuming
Shen, Jingang Shi, Hengshuang Zhao, Philip H.S. Torr, Guoying
Zhao
158. GazeOnce: Real-Time Multi-Person Gaze Estimation, Mingfang
Zhang, Yunfei Liu, Feng Lu
159. Generalizing Gaze Estimation With Rotation Consistency, Yiwei
Bao, Yunfei Liu, Haofei Wang, Feng Lu
160. Face Relighting With Geometrically Consistent Shadows,
Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu
161. HairMapper: Removing Hair From Portraits Using GANs, Yiqian
Wu, Yong-Liang Yang, Xiaogang Jin
162. Learning To Restore 3D Face From In-the-Wild Degraded
Images, Zhenyu Zhang, Yanhao Ge, Ying Tai, Xiaoming Huang,
Chengjie Wang, Hao Tang, Dongjin Huang, Zhifeng Xie
Segmentation, Grouping and Shape Analysis
163. Semi-Supervised Semantic Segmentation Using Unreliable
Pseudo-Labels, Yuchao Wang, Haochen Wang, Yujun Shen,
Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le
164. Perturbed and Strict Mean Teachers for Semi-Supervised
Semantic Segmentation, Yuyuan Liu, Yu Tian, Yuanhong Chen,
Fengbei Liu, Vasileios Belagiannis, Gustavo Carneiro
165. ST++: Make Self-Training Work Better for Semi-Supervised
Semantic Segmentation, Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan
Shi, Yang Gao
166. Beyond Semantic to Instance Segmentation: Weakly-
Supervised Instance Segmentation via Semantic Knowledge
Transfer and Self-Refinement, Beomyoung Kim, YoungJoon Yoo,
Chae Eun Rhee, Junmo Kim
167. Self-Supervised Image-Specific Prototype Exploration for
Weakly Supervised Semantic Segmentation, Qi Chen, Lingxiao
Yang, Jian-Huang Lai, Xiaohua Xie
168. Regional Semantic Contrast and Aggregation for Weakly
Supervised Semantic Segmentation, Tianfei Zhou, Meijie Zhang,
Fang Zhao, Jianwu Li
169. Multi-Class Token Transformer for Weakly Supervised Semantic
Segmentation, Lian Xu, Wanli Ouyang, Mohammed Bennamoun,
Farid Boussaid, Dan Xu
170. Weakly Supervised Semantic Segmentation by Pixel-to-
Prototype Contrast, Ye Du, Zehua Fu, Qingjie Liu, Yunhong Wang
171. Threshold Matters in WSSS: Manipulating the Activation for the
Robust and Accurate Segmentation Model Against Thresholds,
Minhyun Lee, Dongseob Kim, Hyunjung Shim
172. Novel Class Discovery in Semantic Segmentation, Yuyang Zhao,
Zhun Zhong, Nicu Sebe, Gim Hee Lee
173. Pin the Memory: Learning To Generalize Semantic
Segmentation, Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min,
Kwanghoon Sohn
174. ISDNet: Integrating Shallow and Deep Networks for Efficient
Ultra-High Resolution Segmentation, Shaohua Guo, Liang Liu,
Zhenye Gan, Yabiao Wang, Wuhao Zhang, Chengjie Wang,
Guannan Jiang, Wei Zhang, Ran Yi, Lizhuang Ma, Ke Xu
175. Incremental Learning in Semantic Segmentation From Image
Labels, Fabio Cermelli, Dario Fontanel, Antonio Tavera, Marco
Ciccone, Barbara Caputo
176. Instance Segmentation With Mask-Supervised Polygonal
Boundary Transformers, Justin Lazarow, Weijian Xu, Zhuowen
Tu

177. SharpContour: A Contour-Based Boundary Refinement
Approach for Efficient and Accurate Instance Segmentation,
Chenming Zhu, Xuanye Zhang, Yanran Li, Liangdong Qiu, Kai
Han, Xiaoguang Han
178. Sparse Object-Level Supervision for Instance Segmentation
With Pixel Embeddings, Adrian Wolny, Qin Yu, Constantin Pape,
Anna Kreshuk
179. Mask Transfiner for High-Quality Instance Segmentation, Lei
Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher
Yu
180. Open-World Instance Segmentation: Exploiting Pseudo Ground
Truth From Learned Pairwise Affinity, Weiyao Wang, Matt
Feiszli, Heng Wang, Jitendra Malik, Du Tran
181. Sparse Instance Activation for Real-Time Instance
Segmentation, Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang
Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu
182. E2EC: An End-to-End Contour-Based Method for High-Quality
High-Speed Instance Segmentation, Tao Zhang, Shiqing Wei,
Shunping Ji
183. Hyperbolic Image Segmentation, Mina Ghadimi Atigh, Julian
Schoep, Erman Acar, Nanne van Noord, Pascal Mettes
184. SeeThroughNet: Resurrection of Auxiliary Loss by Preserving
Class Probability Information, Dasol Han, Jaewook Yoo, Dokwan
Oh
185. CDGNet: Class Distribution Guided Network for Human Parsing,
Kunliang Liu, Ouk Choi, Jianming Wang, Wonjun Hwang
186. CLIMS: Cross Language Image Matching for Weakly Supervised
Semantic Segmentation, Jinheng Xie, Xianxu Hou, Kai Ye, Linlin
Shen
187. Sparse Non-Local CRF, Olga Veksler, Yuri Boykov
188. Detecting Camouflaged Object in Frequency Domain, Yijie Zhong,
Bo Li, Lv Tang, Senyun Kuang, Shuang Wu, Shouhong Ding
189. Progressive Minimal Path Method With Embedded CNN, Wei
Liao
Document Analysis & Understanding
190. Open-Set Text Recognition via Character-Context Decoupling,
Chang Liu, Chun Yang, Xu-Cheng Yin
191. Neural Collaborative Graph Machines for Table Structure
Recognition, Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren
192. Revisiting Document Image Dewarping by Grid Regularization,
Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao,
Gui-Song Xia
193. Syntax-Aware Network for Handwritten Mathematical
Expression Recognition, Ye Yuan, Xiao Liu, Wondimu Dikubab,
Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai
194. Few Could Be Better Than All: Feature Sampling and Grouping
for Scene Text Detection, Jingqun Tang, Wenqing Zhang,
Hongye Liu, MingKun Yang, Bo Jiang, Guanglong Hu, Xiang Bai
195. Fourier Document Restoration for Robust Document
Dewarping and Recognition, Chuhui Xue, Zichen Tian, Fangneng
Zhan, Shijian Lu, Song Bai
196. XYLayoutLM: Towards Layout-Aware Multimodal Networks for
Visually-Rich Document Understanding, Zhangxuan Gu,
Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu,
Liqing Zhang
197. SwinTextSpotter: Scene Text Spotting via Better Synergy
Between Text Detection and Text Recognition, Mingxin Huang,
Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao
Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin
198. Towards Weakly-Supervised Text Spotting Using a Multi-Task
Transformer, Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar,
R. Manmatha, Pietro Perona
199. TableFormer: Table Structure Understanding With
Transformers, Ahmed Nassar, Nikolaos Livathinos, Maksym
Lysak, Peter Staar
200. Knowledge Mining With Scene Text for Fine-Grained
Recognition, Hao Wang, Junchao Liao, Tianheng Cheng, Zewen
Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu
201. PubTables-1M: Towards Comprehensive Table Extraction From
Unstructured Documents, Brandon Smock, Rohith Pesala, Robin
Abraham
Recognition: Detection, Categorization, Retrieval
202. Focal and Global Knowledge Distillation for Detectors,
Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan,
Danpei Zhao, Chun Yuan
203. Speed Up Object Detection on Gigapixel-Level Images With
Patch Arrangement, Jiahao Fan, Huabin Liu, Wenjie Yang, John
See, Aixin Zhang, Weiyao Lin
204. Training Object Detectors From Scratch: An Empirical Study in
the Era of Vision Transformer, Weixiang Hong, Jiangwei Lao,
Wang Ren, Jian Wang, Jingdong Chen, Wei Chu
205. Learning With Neighbor Consistency for Noisy Labels, Ahmet
Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid
206. Meta Convolutional Neural Networks for Single Domain
Generalization, Chaoqun Wan, Xu Shen, Yonggang Zhang,
Zhiheng Yin, Xinmei Tian, Feng Gao, Jianqiang Huang, Xian-
Sheng Hua
207. Dual Cross-Attention Learning for Fine-Grained Visual
Categorization and Object Re-Identification, Haowei Zhu,
Wenjing Ke, Dong Li, Ji Liu, Lu Tian, Yi Shan
208. Geometry-Aware Guided Loss for Deep Crack Recognition,
Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Jie Chen, Zun Liu,
Jianqiang Li
209. Segment, Magnify and Reiterate: Detecting Camouflaged
Objects the Hard Way, Qi Jia, Shuilian Yao, Yu Liu, Xin Fan,
Risheng Liu, Zhongxuan Luo
210. Dynamic Sparse R-CNN, Qinghang Hong, Fengming Liu, Dong Li,
Ji Liu, Lu Tian, Yi Shan
211. Deep Hybrid Models for Out-of-Distribution Detection, Senqi
Cao, Zhongfei Zhang
212. AutoLoss-GMS: Searching Generalized Margin-Based Softmax
Loss Function for Person Re-Identification, Hongyang Gu,
Jianmin Li, Guangyuan Fu, Chifong Wong, Xinghao Chen, Jun Zhu
213. Feature Erasing and Diffusion Network for Occluded Person Re-
Identification, Zhikang Wang, Feng Zhu, Shixiang Tang, Rui
Zhao, Lihuo He, Jiangning Song
214. Multi-Label Classification With Partial Annotations Using Class-
Aware Selective Loss, Emanuel Ben-Baruch, Tal Ridnik, Itamar
Friedman, Avi Ben-Cohen, Nadav Zamir, Asaf Noy, Lihi Zelnik-
Manor
215. BoxeR: Box-Attention for 2D and 3D Transformers, Duy-Kien
Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M.
Snoek
216. Multi-Label Iterated Learning for Image Classification With
Label Ambiguity, Sai Rajeswar, Pau Rodríguez, Soumye Singhal,
David Vazquez, Aaron Courville
217. Vision Transformer With Deformable Attention, Zhuofan Xia,
Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

218. MViTv2: Improved Multiscale Vision Transformers for
Classification and Detection, Yanghao Li, Chao-Yuan Wu, Haoqi
Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph
Feichtenhofer
219. Dense Learning Based Semi-Supervised Object Detection,
Binghui Chen, Pengyu Li, Xiang Chen, Biao Wang, Lei Zhang,
Xian-Sheng Hua
220. R(Det)²: Randomized Decision Routing for Object Detection,
Yali Li, Shengjin Wang
221. GlideNet: Global, Local and Intrinsic Based Dense Embedding
NETwork for Multi-Category Attributes Prediction, Kareem
Metwaly, Aerin Kim, Elliot Branson, Vishal Monga
222. Self-Supervised Equivariant Learning for Oriented Keypoint
Detection, Jongmin Lee, Byungjin Kim, Minsu Cho
223. Label Relation Graphs Enhanced Hierarchical Residual Network
for Hierarchical Multi-Granularity Classification, Jingzhou Chen,
Peng Wang, Jian Liu, Yuntao Qian
224. Object Localization Under Single Coarse Point Supervision,
Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li,
Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han
225. Rethinking Visual Geo-Localization for Large-Scale
Applications, Gabriele Berton, Carlo Masone, Barbara Caputo
226. Whose Hands Are These? Hand Detection and Hand-Body
Association in the Wild, Supreeth Narasimhaswamy, Thanh
Nguyen, Mingzhen Huang, Minh Hoai
227. Cloning Outfits From Real-World Images to 3D Characters for
Generalizable Person Re-Identification, Yanan Wang, Xuezhi
Liang, Shengcai Liao
228. Towards Unsupervised Domain Generalization, Xingxuan
Zhang, Linjun Zhou, Renzhe Xu, Peng Cui, Zheyan Shen, Haoxin
Liu
229. ViM: Out-of-Distribution With Virtual-Logit Matching, Haoqi
Wang, Zhizhong Li, Litong Feng, Wayne Zhang
230. Vision Transformer Slimming: Multi-Dimension Searching in
Continuous Optimization Space, Arnav Chavan, Zhiqiang Shen,
Zhuang Liu, Zechun Liu, Kwang-Ting Cheng, Eric P. Xing
231. Nonuniform-to-Uniform Quantization: Towards Accurate
Quantization via Generalized Straight-Through Estimation,
Zechun Liu, Kwang-Ting Cheng, Dong Huang, Eric P. Xing,
Zhiqiang Shen
Vision & Language
232. Align and Prompt: Video-and-Language Pre-Training With
Entity Prompts, Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos
Niebles, Steven C.H. Hoi
233. Language-Bridged Spatial-Temporal Interaction for Referring
Video Object Segmentation, Zihan Ding, Tianrui Hui, Junshi
Huang, Xiaoming Wei, Jizhong Han, Si Liu
234. Language As Queries for Referring Video Object Segmentation,
Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo
235. End-to-End Referring Video Object Segmentation With
Multimodal Transformers, Adam Botach, Evgenii Zheltonozhskii,
Chaim Baskin
236. Multi-Level Representation Learning With Semantic Alignment
for Referring Video Object Segmentation, Dongming Wu,
Xingping Dong, Ling Shao, Jianbing Shen
237. X-Pool: Cross-Modal Language-Video Attention for Text-Video
Retrieval, Satya Krishna Gorti, Noël Vouitsis, Junwei Ma, Keyvan
Golestan, Maksims Volkovs, Animesh Garg, Guangwei Yu
238. Video-Text Representation Learning via Differentiable Weak
Temporal Alignment, Dohwan Ko, Joonmyung Choi, Juyeon Ko,
Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
239. MAD: A Scalable Dataset for Language Grounding in Videos
From Movie Audio Descriptions, Mattia Soldan, Alejandro Pardo,
Juan León Alcázar, Fabian Caba, Chen Zhao, Silvio Giancola,
Bernard Ghanem
240. Advancing High-Resolution Video-Language Representation
With Large-Scale Video Transcriptions, Hongwei Xue, Tiankai
Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong
Fu, Baining Guo
241. Measuring Compositional Consistency for Video Question
Answering, Mona Gandhi, Mustafa Omer Gul, Eva Prakash,
Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh
Agrawala
242. SimVQA: Exploring Simulated Environments for Visual
Question Answering, Paola Cascante-Bonilla, Hui Wu, Letao
Wang, Rogerio S. Feris, Vicente Ordonez
243. Transform-Retrieve-Generate: Natural Language-Centric
Outside-Knowledge Visual Question Answering, Feng Gao, Qing
Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem
Natarajan
244. SwapMix: Diagnosing and Regularizing the Over-Reliance on
Visual Context in Visual Question Answering, Vipul Gupta,
Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan
Yuille
245. MuKEA: Multimodal Knowledge Extraction and Accumulation
for Knowledge-Based Visual Question Answering, Yang Ding,
Jing Yu, Bang Liu, Yue Hu, Mingxin Cui, Qi Wu
246. Maintaining Reasoning Consistency in Compositional Visual
Question Answering, Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu
Liu, Qi Wu
247. MLSLT: Towards Multilingual Sign Language Translation,
Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng,
Xiaofei He
248. A Simple Multi-Modality Transfer Learning Baseline for Sign
Language Translation, Yutong Chen, Fangyun Wei, Xiao Sun,
Zhirong Wu, Stephen Lin
249. C²SLR: Consistency-Enhanced Continuous Sign Language
Recognition, Ronglai Zuo, Brian Mak
250. Signing at Scale: Learning to Co-Articulate Signs for Large-
Scale Photo-Realistic Sign Language Production, Ben Saunders,
Necati Cihan Camgoz, Richard Bowden
251. Generating Diverse and Natural 3D Human Motions From Text,
Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li,
Li Cheng
252. Sub-Word Level Lip Reading With Visual Attention, K R Prajwal,
Triantafyllos Afouras, Andrew Zisserman
253. Habitat-Web: Learning Embodied Object-Search Strategies
From Human Demonstrations at Scale, Ram Ramrakhya, Eric
Undersander, Dhruv Batra, Abhishek Das
254. ViSTA: Vision and Scene Text Aggregation for Cross-Modal
Retrieval, Mengjun Cheng, Yipeng Sun, Longchao Wang,
Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo
Liu, Errui Ding, Jingdong Wang
255. Cross Modal Retrieval With Querybank Normalisation, Simion-
Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie
256. Prompt Distribution Learning, Yuning Lu, Jianzhuang Liu,
Yonggang Zhang, Yajing Liu, Xinmei Tian

257. VALHALLA: Visual Hallucination for Machine Translation, Yi Li,
Rameswar Panda, Yoon Kim, Chun-Fu (Richard) Chen, Rogerio S.
Feris, David Cox, Nuno Vasconcelos
258. VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-
and-Language Tasks, Yi-Lin Sung, Jaemin Cho, Mohit Bansal
259. Winoground: Probing Vision and Language Models for Visio-
Linguistic Compositionality, Tristan Thrush, Ryan Jiang, Max
Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace
Ross

1700–1800 Plenary 1 (Hall B1)


Chair: Rama Chellappa (Johns Hopkins Univ.)
Keynote: Learning To See the Human Way, Josh Tenenbaum
(MIT)
Abstract: Computer vision is one of the great AI success stories.
Yet we are still far from having machine systems that can
reliably and robustly see everything a human being sees in an
image or in the real world. Despite rapid advances in self-
supervised visual and multimodal representation learning, we
are also far from having systems that can learn to see as richly
as a human does, from so little data, or that can learn new visual
concepts or adapt their representations as quickly as a human
does. And even today’s remarkable generative image synthesis
systems imagine the world in a very different and
fundamentally less flexible way than human beings do. How can
we close these gaps? I will describe several core insights from
the study of human vision and visual cognitive development
that run counter to the dominant trends in today’s computer
vision and machine learning world, but that can motivate and
guide an alternative approach to building practical machine
vision systems.
Technically, this approach rests on advances in differentiable
and probabilistic programming: hybrids of neural, symbolic and
probabilistic modeling and inference that can be more robust,
more flexible and more data-efficient than purely neural
approaches to learning to see. New probabilistic programming
platforms offer to make these approaches scalable as well.
Conceptually, this approach draws on classic proposals for
understanding vision as “inverse graphics”, “analysis by
synthesis” or “inference to the best explanation”, and the notion
that at least some high-level architecture for scene
representation is built into the brain by evolution rather than
learned from experience, reflecting invariant properties of the
physical world. Learning then enables, enriches and extends
these built-in representations; it does not create them from
scratch. I will show a few examples of recent machine vision
successes based on these ideas, from our group and others. But
the hardest problems are still very open. I will highlight some
“Grand Challenge” tasks for building machines that learn to see
like people: problems that far outstrip the abilities of any
current system, and that I hope can inspire the next steps
towards progress for computer vision researchers regardless of
which approach they favor.

1800–2000 PAMI TC Meeting (Hall B1)


Wednesday, June 22 (Morning) Program
Wednesday, June 22

0730–1600 Registration (Great Hall Lobby)

0700–0830 Breakfast (Halls D-E)

0800–0830 Poster Setup (Halls B2-C)

0830–1018 Oral 2.1.1: Recognition: Detection, Categorization, Retrieval (Great Hall A-D)
Papers in this session are in Poster Session 2.1
Chairs: Jean Ponce (Inria); Emily Hand (Univ. of Nevada, Reno); Hedvig Kjellström (KTH Royal Inst. of Technology)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [0830] MixFormer: Mixing Features Across Windows and Dimensions, Qiang Chen, Qiman Wu, Jian Wang, Qinghao Hu, Tao Hu, Errui Ding, Jian Cheng, Jingdong Wang
2. [0835] Recurrent Glimpse-Based Decoder for Detection With Transformer, Zhe Chen, Jing Zhang, Dacheng Tao
3. [0840] Mobile-Former: Bridging MobileNet and Transformer, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu
4. [0848] Unsupervised Domain Generalization by Learning a Bridge Across Domains, Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio S. Feris, Leonid Karlinsky
5. [0853] SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection, Wuyang Li, Xinyu Liu, Yixuan Yuan
6. [0858] Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection, Jiaxi Wu, Jiaxin Chen, Mengzhe He, Yiru Wang, Bo Li, Bingqi Ma, Weihao Gan, Wei Wu, Yali Wang, Di Huang
7. [0906] PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction, Zeren Sun, Fumin Shen, Dan Huang, Qiong Wang, Xiangbo Shu, Yazhou Yao, Jinhui Tang
8. [0911] Few-Shot Object Detection With Fully Cross-Transformer, Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang
9. [0916] Task Discrepancy Maximization for Fine-Grained Few-Shot Classification, SuBeen Lee, WonJun Moon, Jae-Pil Heo
10. [0924] Leveraging Self-Supervision for Cross-Domain Crowd Counting, Weizhe Liu, Nikita Durasov, Pascal Fua
11. [0929] What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions, A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo
12. [0934] AdaMixer: A Fast-Converging Query-Based Object Detector, Ziteng Gao, Limin Wang, Bing Han, Sheng Guo
13. [0942] Correlation Verification for Image Retrieval, Seongwon Lee, Hongje Seong, Suhyeon Lee, Euntai Kim
14. [0947] Real-Time Object Detection for Streaming Perception, Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
15. [0952] Deep Visual Geo-Localization Benchmark, Gabriele Berton, Riccardo Mereu, Gabriele Trivigno, Carlo Masone, Gabriela Csurka, Torsten Sattler, Barbara Caputo
16. [1000] RendNet: Unified 2D/3D Recognizer With Latent Space Rendering, Ruoxi Shi, Xinyang Jiang, Caihua Shan, Yansen Wang, Dongsheng Li
17. [1005] Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion, Xiaopei Wu, Liang Peng, Honghui Yang, Liang Xie, Chenxi Huang, Chengqi Deng, Haifeng Liu, Deng Cai
18. [1010] Focal Sparse Convolutional Networks for 3D Object Detection, Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

0830–1018 Oral 2.1.2: 3D From Multi-View & Sensors (Hall B1)
Papers in this session are in Poster Session 2.1
Chairs: Daniel Cremers (TU Munich); Kostas Daniilidis (Univ. of Pennsylvania)
Format (5 min. presentation; 3 min. group questions/3 papers)
19. [0830] Point-NeRF: Point-Based Neural Radiance Fields, Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, Ulrich Neumann
20. [0835] NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction, Xiaoshuai Zhang, Sai Bi, Kalyan Sunkavalli, Hao Su, Zexiang Xu
21. [0840] Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction, Cheng Sun, Min Sun, Hwann-Tzong Chen
22. [0848] Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields, Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
23. [0853] RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs, Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, Noha Radwan
24. [0858] Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields, Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan
25. [0906] Plenoxels: Radiance Fields Without Neural Networks, Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, Angjoo Kanazawa
26. [0911] Neural 3D Scene Reconstruction With the Manhattan-World Assumption, Haoyu Guo, Sida Peng, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou
27. [0916] Neural 3D Video Synthesis From Multi-View Video, Tianye Li, Mira Slavcheva, Michael Zollhöfer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, Zhaoyang Lv
28. [0924] Learning To Solve Hard Minimal Problems, Petr Hruby, Timothy Duff, Anton Leykin, Tomas Pajdla
29. [0929] Learning a Structured Latent Space for Unsupervised Point Cloud Completion, Yingjie Cai, Kwan-Yee Lin, Chao Zhang, Qiang Wang, Xiaogang Wang, Hongsheng Li
30. [0934] Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes, Yang Li, Tatsuya Harada
31. [0942] IRON: Inverse Rendering by Optimizing Neural SDFs and Materials From Photometric Images, Kai Zhang, Fujun Luan, Zhengqi Li, Noah Snavely
32. [0947] Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation, Damien Robert, Bruno Vallet, Loic Landrieu
33. [0952] HyperDet3D: Learning a Scene-Conditioned 3D Object Detector, Yu Zheng, Yueqi Duan, Jiwen Lu, Jie Zhou, Qi Tian
34. [1000] KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos, David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi
35. [1005] SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video, Boyi Jiang, Yang Hong, Hujun Bao, Juyong Zhang
36. [1010] Ditto: Building Digital Twins of Articulated Objects From Interaction, Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

0830–1018 Oral 2.1.3: Low-Level Vision (Great Hall B-C)
Papers in this session are in Poster Session 2.1
Chairs: Minsu Cho (POSTECH); Kavita Bala (Cornell Univ.)
Format (5 min. presentation; 3 min. group questions/3 papers)
37. [0830] Bijective Mapping Network for Shadow Removal, Yurui Zhu, Jie Huang, Xueyang Fu, Feng Zhao, Qibin Sun, Zheng-Jun Zha
38. [0835] Toward Fast, Flexible, and Robust Low-Light Image Enhancement, Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, Zhongxuan Luo
39. [0840] Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements, Dongdong Chen, Julián Tachella, Mike E. Davies
40. [0848] Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution, Jie Liang, Hui Zeng, Lei Zhang
41. [0853] Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution, Xiaoqian Xu, Pengxu Wei, Weikai Chen, Yang Liu, Mingzhi Mao, Liang Lin, Guanbin Li
42. [0858] SphereSR: 360° Image Super-Resolution With Arbitrary Projection via Continuous Spherical Image Representation, Youngho Yoon, Inchul Chung, Lin Wang, Kuk-Jin Yoon
43. [0906] Learning Trajectory-Aware Transformer for Video Super-Resolution, Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
44. [0911] Discrete Cosine Transform Network for Guided Depth Map Super-Resolution, Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin, Hanspeter Pfister
45. [0916] Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations, Zhixuan Zhong, Liangyu Chai, Yang Zhou, Bailin Deng, Jia Pan, Shengfeng He
46. [0924] ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding, Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, Yan Wang
47. [0929] Restormer: Efficient Transformer for High-Resolution Image Restoration, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
48. [0934] Deep Rectangling for Image Stitching: A Learning Baseline, Lang Nie, Chunyu Lin, Kang Liao, Shuaicheng Liu, Yao Zhao
49. [0942] Parametric Scattering Networks, Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
50. [0947] Burst Image Restoration and Enhancement, Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
51. [0952] MAXIM: Multi-Axis MLP for Image Processing, Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li
52. [1000] Event-Aided Direct Sparse Odometry, Javier Hidalgo-Carrió, Guillermo Gallego, Davide Scaramuzza
53. [1005] CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation, Haisong Liu, Tao Lu, Yihui Xu, Jia Liu, Wenjie Li, Lijun Chen
54. [1010] Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection, Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, Zhongxuan Luo

1030–1100 Morning Break (Halls B2-C)

1000–1700 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1000–1230 Demos (Halls B2-C Demo Area)
• Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device, Hyojin Park, Alan Yessenbayev, Tushar Singhal, Navin Kumar Adhikari, Yizhe Zhang, Shubhankar Borse, Hong Cai, Nilesh Pandey, Fei Yin, Frank Mayer, Balaji Calidas, Fatih Porikli (Qualcomm AI Research)
• A Low-Cost & Real-Time Motion Capture System, Anargyros Chatzitofis, Georgios Albanis, Nikolaos Zioulis, Spyridon Thermos (Codewheel; Univ. of Thessaly)
• GeoEngine: A Platform for Production-Ready Geospatial Research, Sagar Verma, Siddharth Gupta, Hal Shin, Akash Panigrahi, Shubham Goswami, Shweta Pardeshi, Natanael Exe, Ujwal Dutta, Tanka Joshi, Nitin Bhojwani (Université Paris-Saclay, CentraleSupélec, Inria, Centre de Vision Numérique, Granular AI)
• DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides, Parmida Ghahremani, Joseph Marino, Ricardo Dodds, Saad Nadeem (Memorial Sloan Kettering Cancer Center)
• Talking Face Generation With Multilingual TTS, Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Dongho Choi, Kang-wook Kim, Youseong Lee (MINDsLab Inc.; KAIST; Seoul National Univ.)
• [Virtual] Scenic: A JAX Library for Computer Vision Research and Beyond, Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay (Google Brain & Google Research)
• [Virtual] BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster, Jason Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Yiquan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song (Intel)
• [Virtual] PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments, Richardson Menezes, Aron de Miranda, Helton Maia (Federal Univ. of Rio Grande do Norte)

1000–1230 Poster 2.1 (Halls B2-C)

Low-Level Vision
55. Image Dehazing Transformer With Transmission-Aware 3D Position Embedding, Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, Chongyi Li
56. Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity, Yuntong Ye, Changfeng Yu, Yi Chang, Lin Zhu, Xi-Le Zhao, Luxin Yan, Yonghong Tian
57. Towards Multi-Domain Single Image Dehazing via Test-Time Training, Huan Liu, Zijun Wu, Liangyan Li, Sadaf Salehkalaibar, Jun Chen, Keyan Wang
58. Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal, Yi Li, Yi Chang, Yan Gao, Changfeng Yu, Luxin Yan
59. Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment, Yue Cao, Zhaolin Wan, Dongwei Ren, Zifei Yan, Wangmeng Zuo
60. Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in the DCT Domain, Lina Guo, Xinjie Shi, Dailan He, Yuanyuan Wang, Rui Ma, Hongwei Qin, Yan Wang
61. Neural Compression-Based Feature Learning for Video Restoration, Cong Huang, Jiahao Li, Bin Li, Dong Liu, Yan Lu
62. Bi-Directional Object-Context Prioritization Learning for Saliency Ranking, Xin Tian, Ke Xu, Xin Yang, Lin Du, Baocai Yin, Rynson W.H. Lau
63. Pixel Screening Based Intermediate Correction for Blind Deblurring, Meina Zhang, Yingying Fang, Guoxi Ni, Tieyong Zeng
64. URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement, Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, Jianmin Jiang
65. A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution, Jianqi Ma, Zhetong Liang, Lei Zhang
66. Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction, Zhihao Hu, Guo Lu, Jinyang Guo, Shan Liu, Wei Jiang, Dong Xu
67. Task Decoupled Framework for Reference-Based Super-Resolution, Yixuan Huang, Xiaoyun Zhang, Yu Fu, Siheng Chen, Ya Zhang, Yan-Feng Wang, Dazhi He
68. Learning Semantic Associations for Mirror Detection, Huankang Guan, Jiaying Lin, Rynson W.H. Lau
69. SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches, Yu Zeng, Zhe Lin, Vishal M. Patel
70. Investigating Tradeoffs in Real-World Video Super-Resolution, Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy
71. BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment, Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy
72. Inertia-Guided Flow Completion and Style Fusion for Video Inpainting, Kaidong Zhang, Jingjing Fu, Dong Liu
73. Joint Global and Local Hierarchical Priors for Learned Image Compression, Jun-Hyuk Kim, Byeongho Heo, Jong-Seok Lee
74. Reflash Dropout in Image Super-Resolution, Xiangtao Kong, Xina Liu, Jinjin Gu, Yu Qiao, Chao Dong
75. Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond, Yi Yu, Wenhan Yang, Yap-Peng Tan, Alex C. Kot
76. Dreaming To Prune Image Deraining Networks, Weiqi Zou, Yang Wang, Xueyang Fu, Yang Cao
77. LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network, Hochang Rhee, Yeong Il Jang, Seyun Kim, Nam Ik Cho
78. Exposure Normalization and Compensation for Multiple-Exposure Correction, Jie Huang, Yajing Liu, Xueyang Fu, Man Zhou, Yang Wang, Feng Zhao, Zhiwei Xiong
79. Revisiting Temporal Alignment for Video Restoration, Kun Zhou, Wenbo Li, Liying Lu, Xiaoguang Han, Jiangbo Lu
80. Learning the Degradation Distribution for Blind Image Super-Resolution, Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, Tieniu Tan
81. LSVC: A Learning-Based Stereo Video Compression Framework, Zhenghao Chen, Guo Lu, Zhihao Hu, Shan Liu, Wei Jiang, Dong Xu
82. Learning Based Multi-Modality Image and Video Compression, Guo Lu, Tianxiong Zhong, Jing Geng, Qiang Hu, Dong Xu
83. Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World, Xin Tong, Xianghua Ying, Yongjie Shi, Ruibin Wang, Jinfa Yang
84. Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish, Yancong Lin, Ruben Wiersma, Silvia L. Pintea, Klaus Hildebrandt, Elmar Eisemann, Jan C. van Gemert
85. Stereo Depth From Events Cameras: Concentrate and Focus on the Future, Yeongwoo Nam, Mohammad Mostafavi, Kuk-Jin Yoon, Jonghyun Choi

3D From Multi-View & Sensors
86. Volumetric Bundle Adjustment for Online Photorealistic Scene Capture, Ronald Clark
87. Neural Volumetric Object Selection, Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang
88. HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture, Ziyan Wang, Giljoo Nam, Tuur Stuyck, Stephen Lombardi, Michael Zollhöfer, Jessica Hodgins, Christoph Lassner
89. NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions, Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kaiwen Guo, Minye Wu, Jingyi Yu, Lan Xu
90. BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion, Kejie Li, Yansong Tang, Victor Adrian Prisacariu, Philip H.S. Torr
91. Input-Level Inductive Biases for 3D Reconstruction, Wang Yifan, Carl Doersch, Relja Arandjelović, João Carreira, Andrew Zisserman
92. Multi-View Mesh Reconstruction With Neural Deferred Shading, Markus Worchel, Rodrigo Diaz, Weiwen Hu, Oliver Schreer, Ingo Feldmann, Peter Eisert
93. StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions, Lukas Höllein, Justin Johnson, Matthias Nießner
94. RGB-Depth Fusion GAN for Indoor Depth Completion, Haowen Wang, Mingyuan Wang, Zhengping Che, Zhiyuan Xu, Xiuquan Qiao, Mengshi Qi, Feifei Feng, Jian Tang
95. PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos, Yiming Xie, Matheus Gadelha, Fengting Yang, Xiaowei Zhou, Huaizu Jiang
96. Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations, Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lučić, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

97. ShapeFormer: Transformer-Based Shape Completion via Sparse Representation, Xingguang Yan, Liqiang Lin, Niloy J. Mitra, Dani Lischinski, Daniel Cohen-Or, Hui Huang
98. GuideFormer: Transformers for Image Guided Depth Completion, Kyeongha Rho, Jinsung Ha, Youngjung Kim
99. Improving Neural Implicit Surfaces Geometry With Patch Warping, François Darmon, Bénédicte Bascle, Jean-Clément Devaux, Pascal Monasse, Mathieu Aubry
100. Critical Regularizations for Neural Surface Reconstruction in the Wild, Jingyang Zhang, Yao Yao, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan
101. Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction, Christiane Sommer, Lu Sang, David Schubert, Daniel Cremers
102. Neural RGB-D Surface Reconstruction, Dejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, Justus Thies
103. POCO: Point Convolution for Surface Reconstruction, Alexandre Boulch, Renaud Marlet
104. Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors, Baorui Ma, Yu-Shen Liu, Zhizhong Han
105. Surface Reconstruction From Point Clouds by Learning Predictive Context Priors, Baorui Ma, Yu-Shen Liu, Matthias Zwicker, Zhizhong Han
106. IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment, Yiming Zeng, Yue Qian, Qijian Zhang, Junhui Hou, Yixuan Yuan, Ying He
107. Deterministic Point Cloud Registration via Novel Transformation Decomposition, Wen Chen, Haoang Li, Qiang Nie, Yun-Hui Liu
108. Global-Aware Registration of Less-Overlap RGB-D Scans, Che Sun, Yunde Jia, Yi Guo, Yuwei Wu
109. Finding Good Configurations of Planar Primitives in Unorganized Point Clouds, Mulin Yu, Florent Lafarge
110. Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels, Hehe Fan, Xiaojun Chang, Wanyue Zhang, Yi Cheng, Ying Sun, Mohan Kankanhalli
111. AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Wenqiang Zhang, Qian Zhang, Chang Huang, Wenyu Liu
112. WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation, Yingzhi Tang, Yue Qian, Qijian Zhang, Yiming Zeng, Junhui Hou, Xuefei Zhe

Motion & Tracking
113. Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild, Mingzhen Huang, Supreeth Narasimhaswamy, Saif Vazir, Haibin Ling, Minh Hoai
114. Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture, Buzhen Huang, Liang Pan, Yuan Yang, Jingyi Ju, Yangang Wang
115. MotionAug: Augmentation With Physical Correction for Human Motion Prediction, Takahiro Maeda, Norimichi Ukita
116. Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction, Tiezheng Ma, Yongwei Nie, Chengjiang Long, Qing Zhang, Guiqing Li
117. Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction, Chongyang Zhong, Lei Hu, Zihao Zhang, Yongjing Ye, Shihong Xia
118. Motron: Multimodal Probabilistic Human Motion Forecasting, Tim Salzmann, Marco Pavone, Markus Ryll
119. Human Trajectory Prediction With Momentary Observation, Jianhua Sun, Yuxuan Li, Liang Chai, Hao-Shu Fang, Yong-Lu Li, Cewu Lu
120. Non-Probability Sampling Network for Stochastic Human Trajectory Prediction, Inhwan Bae, Jin-Hwi Park, Hae-Gon Jeon
121. Remember Intentions: Retrospective-Memory-Based Trajectory Prediction, Chenxin Xu, Weibo Mao, Wenjun Zhang, Siheng Chen
122. GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning, Chenxin Xu, Maosen Li, Zhenyang Ni, Ya Zhang, Siheng Chen
123. Learning Pixel Trajectories With Multiscale Contrastive Random Walks, Zhangxing Bian, Allan Jabri, Alexei A. Efros, Andrew Owens
124. Adaptive Trajectory Prediction via Transferable GNN, Yi Xu, Lichen Wang, Yizhou Wang, Yun Fu
125. Neural Prior for Trajectory Estimation, Chaoyang Wang, Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey
126. M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction, Qiao Sun, Xin Huang, Junru Gu, Brian C. Williams, Hang Zhao
127. How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting, Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara
128. ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework, Chunnan Wang, Xiang Chen, Junzhe Wang, Hongzhi Wang
129. Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction, Xinshuo Weng, Boris Ivanovic, Kris Kitani, Marco Pavone
130. Convolutions for Spatial Interaction Modeling, Zhaoen Su, Chao Wang, David Bradley, Carlos Vallespi-Gonzalez, Carl Wellington, Nemanja Djuric
131. Style-ERD: Responsive and Coherent Online Motion Style Transfer, Tianxin Tao, Xiaohang Zhan, Zhongquan Chen, Michiel van de Panne
132. Neural Inertial Localization, Sachini Herath, David Caruso, Chen Liu, Yufan Chen, Yasutaka Furukawa
133. RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry, Xiya Cao, Caifa Zhou, Dandan Zeng, Yongliang Wang
134. CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism, Jiahui Lei, Kostas Daniilidis

Pose Estimation & Tracking
135. ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses, Bastian Wandt, James J. Little, Helge Rhodin
136. Projective Manifold Gradient Layer for Deep Rotation Regression, Jiayi Chen, Yingda Yin, Tolga Birdal, Baoquan Chen, Leonidas J. Guibas, He Wang
137. Multimodal Colored Point Cloud to Image Alignment, Noam Rotstein, Amit Bracha, Ron Kimmel
138. Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering, Weixuan Tang, Danping Zou
139. REGTR: End-to-End Point Cloud Correspondences With Transformers, Zi Jian Yew, Gim Hee Lee

140. Text2Pos: Text-to-Point-Cloud Cross-Modal Localization, Manuel Kolmet, Qunjie Zhou, Aljoša Ošep, Laura Leal-Taixé
141. BCOT: A Markerless High-Precision 3D Object Tracking Benchmark, Jiachen Li, Bin Wang, Shiqiang Zhu, Xin Cao, Fan Zhong, Wenxuan Chen, Te Li, Jason Gu, Xueying Qin
142. SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation, Haitao Lin, Zichang Liu, Chilam Cheang, Yanwei Fu, Guodong Guo, Xiangyang Xue
143. ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework, Ningkai Mo, Wanshui Gan, Naoto Yokoya, Shifeng Chen
144. Coupled Iterative Refinement for 6D Multi-Object Pose Estimation, Lahav Lipson, Zachary Teed, Ankit Goyal, Jia Deng
145. ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation, Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari
146. SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings, Rasmus Laurvig Haugaard, Anders Glent Buch
147. MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision, Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud
148. Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions, Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann, Vincent Lepetit
149. GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting, Yan Di, Ruida Zhang, Zhiqiang Lou, Fabian Manhardt, Xiangyang Ji, Nassir Navab, Federico Tombari
150. HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR, Yudi Dai, Yitai Lin, Chenglu Wen, Siqi Shen, Lan Xu, Jingyi Yu, Yuexin Ma, Cheng Wang
151. OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation, Dingding Cai, Janne Heikkilä, Esa Rahtu
152. FS6D: Few-Shot 6D Pose Estimation of Novel Objects, Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen
153. OnePose: One-Shot Object Pose Estimation Without CAD Models, Jiaming Sun, Zihao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou
154. OSOP: A Multi-Stage One Shot Object Pose Estimation Framework, Ivan Shugurov, Fu Li, Benjamin Busam, Slobodan Ilic
155. DiffPoseNet: Direct Differentiable Camera Pose Estimation, Chethan M. Parameshwara, Gokul Hari, Cornelia Fermüller, Nitin J. Sanket, Yiannis Aloimonos
156. Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects, Manuel Stoiber, Martin Sundermeyer, Rudolph Triebel
157. CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild, Yang You, Ruoxi Shi, Weiming Wang, Cewu Lu
158. Leveraging Equivariant Features for Absolute Pose Regression, Mohamed Adel Musallam, Vincent Gaudillière, Miguel Ortiz del Castillo, Kassem Al Ismaeil, Djamila Aouada

Transfer / Low-Shot / Long-Tail Learning
159. The Majority Can Help the Minority: Context-Rich Minority Oversampling for Long-Tailed Classification, Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, Jin Young Choi
160. Long-Tailed Recognition via Weight Balancing, Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong
161. Balanced Contrastive Learning for Long-Tailed Visual Recognition, Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang
162. Targeted Supervised Contrastive Learning for Long-Tailed Recognition, Tianhong Li, Peng Cao, Yuan Yuan, Lijie Fan, Yuzhe Yang, Rogerio S. Feris, Piotr Indyk, Dina Katabi
163. Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment, Mengke Li, Yiu-ming Cheung, Yang Lu
164. Long-Tail Recognition via Compositional Knowledge Transfer, Sarah Parisot, Pedro M. Esperança, Steven McDonagh, Tamas J. Madarasz, Yongxin Yang, Zhenguo Li
165. Nested Collaborative Learning for Long-Tailed Visual Recognition, Jun Li, Zichang Tan, Jun Wan, Zhen Lei, Guodong Guo
166. Retrieval Augmented Classification for Long-Tail Visual Recognition, Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton van den Hengel
167. Trustworthy Long-Tailed Classification, Bolian Li, Zongbo Han, Haining Li, Huazhu Fu, Changqing Zhang
168. C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection, Tong Wang, Yousong Zhu, Yingying Chen, Chaoyang Zhao, Bin Yu, Jinqiao Wang, Ming Tang
169. Equalized Focal Loss for Dense Long-Tailed Object Detection, Bo Li, Yongqiang Yao, Jingru Tan, Gang Zhang, Fengwei Yu, Jianwei Lu, Ye Luo
170. Relieving Long-Tailed Instance Segmentation via Pairwise Class Balance, Yin-Yin He, Peizhen Zhang, Xiu-Shen Wei, Xiangyu Zhang, Jian Sun
171. iFS-RCNN: An Incremental Few-Shot Instance Segmenter, Khoi Nguyen, Sinisa Todorovic
172. Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling, Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar
173. SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation, Xiaoqing Guo, Jie Liu, Tongliang Liu, Yixuan Yuan
174. Undoing the Damage of Label Shift for Cross-Domain Semantic Segmentation, Yahao Liu, Jinhong Deng, Jiale Tao, Tong Chu, Lixin Duan, Wen Li
175. Representation Compensation Networks for Continual Semantic Segmentation, Chang-Bin Zhang, Jia-Wen Xiao, Xialei Liu, Ying-Cong Chen, Ming-Ming Cheng
176. Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer, Wenjian Wang, Lijuan Duan, Yuxi Wang, Qing En, Junsong Fan, Zhaoxiang Zhang
177. Domain-Agnostic Prior for Transfer Semantic Segmentation, Xinyue Huo, Lingxi Xie, Hengtong Hu, Wengang Zhou, Houqiang Li, Qi Tian
178. Image Segmentation Using Text and Image Prompts, Timo Lüddecke, Alexander Ecker
179. PCL: Proxy-Based Contrastive Learning for Domain Generalization, Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu
180. Localized Adversarial Domain Generalization, Wei Zhu, Le Lu, Jing Xiao, Mei Han, Jiebo Luo, Adam P. Harrison
181. Compound Domain Generalization via Meta-Knowledge Encoding, Chaoqi Chen, Jiongcheng Li, Xiaoguang Han, Xiaoqing Liu, Yizhou Yu

182. Style Neophile: Constantly Seeking Novel Styles for Domain Generalization, Juwon Kang, Sohyun Lee, Namyup Kim, Suha Kwak
183. Slimmable Domain Adaptation, Rang Meng, Weijie Chen, Shicai Yang, Jie Song, Luojun Lin, Di Xie, Shiliang Pu, Xinchao Wang, Mingli Song, Yueting Zhuang
184. Exploring Domain-Invariant Parameters for Source Free Domain Adaptation, Fan Wang, Zhongyi Han, Yongshun Gong, Yilong Yin
185. Cross-Domain Few-Shot Learning With Task-Specific Adapters, Wei-Hong Li, Xialei Liu, Hakan Bilen
186. Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition, Shiyuan Huang, Jiawei Ma, Guangxing Han, Shih-Fu Chang
187. Reusing the Task-Specific Classifier as a Discriminator: Discriminator-Free Adversarial Domain Adaptation, Lin Chen, Huaian Chen, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen
188. Safe Self-Refinement for Transformer-Based Domain Adaptation, Tao Sun, Cheng Lu, Tianshuo Zhang, Haibin Ling
189. Continual Test-Time Domain Adaptation, Qin Wang, Olga Fink, Luc Van Gool, Dengxin Dai
190. Source-Free Domain Adaptation via Distribution Estimation, Ning Ding, Yixing Xu, Yehui Tang, Chao Xu, Yunhe Wang, Dacheng Tao
191. Domain Adaptation on Point Clouds via Geometry-Aware Implicits, Yuefan Shen, Yanchao Yang, Mi Yan, He Wang, Youyi Zheng, Leonidas J. Guibas
192. Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds, Zhao Jin, Yinjie Lei, Naveed Akhtar, Haifeng Li, Munawar Hayat
193. Hyperspherical Consistency Regularization, Cheng Tan, Zhangyang Gao, Lirong Wu, Siyuan Li, Stan Z. Li
194. BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning, Zhi Hou, Baosheng Yu, Dacheng Tao

Recognition: Detection, Categorization, Retrieval
195. Cascade Transformers for End-to-End Person Search, Rui Yu, Dawei Du, Rodney LaLonde, Daniel Davila, Christopher Funk, Anthony Hoogs, Brian Clipp
196. Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts, Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu
197. MPViT: Multi-Path Vision Transformer for Dense Prediction, Youngwan Lee, Jonghee Kim, Jeffrey Willette, Sung Ju Hwang
198. NFormer: Robust Person Re-Identification With Neighbor Transformer, Haochen Wang, Jiayi Shen, Yongtuo Liu, Yan Gao, Efstratios Gavves
199. Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification, Yoonki Cho, Woo Jae Kim, Seunghoon Hong, Sung-Eui Yoon
203. FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification, Qiang Zhang, Changzhou Lai, Jianan Liu, Nianchang Huang, Jungong Han
204. Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification, Shengcai Liao, Ling Shao
205. Implicit Sample Extension for Unsupervised Person Re-Identification, Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang
206. Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection, Yibo Zhou
207. Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection, Choubo Ding, Guansong Pang, Chunhua Shen
208. Fine-Grained Object Classification via Self-Supervised Pose Alignment, Xuhui Yang, Yaowei Wang, Ke Chen, Yong Xu, Yonghong Tian
209. Hyperbolic Vision Transformers: Combining Improvements in Metric Learning, Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, Ivan Oseledets
210. Non-Isotropy Regularization for Proxy-Based Deep Metric Learning, Karsten Roth, Oriol Vinyals, Zeynep Akata
211. Self-Taught Metric Learning Without Labels, Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
212. Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency, Yanan Gu, Xu Yang, Kun Wei, Cheng Deng
213. Energy-Based Latent Aligner for Incremental Learning, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Vineeth N Balasubramanian
214. Sketch3T: Test-Time Training for Zero-Shot SBIR, Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
215. The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution, Ronghan Chen, Yang Cong
216. Finding Badly Drawn Bunnies, Lan Yang, Kaiyue Pang, Honggang Zhang, Yi-Zhe Song
217. Generalized Category Discovery, Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman
218. Recall@k Surrogate Loss With Large Batches and Similarity Mixup, Yash Patel, Giorgos Tolias, Jiří Matas
219. Modeling 3D Layout for Group Re-Identification, Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, Xiaohua Xie
220. Causal Transportability for Visual Recognition, Chengzhi Mao, Kevin Xia, James Wang, Hao Wang, Junfeng Yang, Elias Bareinboim, Carl Vondrick
221. Attributable Visual Similarity Learning, Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
222. Bi-Level Alignment for Cross-Domain Crowd Counting, Shenjian Gong, Shanshan Zhang, Jian Yang, Dengxin Dai, Bernt Schiele
223. Mutual Quantization for Cross-Modal Search With Noisy Labels, Erkun Yang, Dongren Yao, Tongliang Liu, Cheng Deng
200. Temporal Complementarity-Guided Reinforcement Learning
for Image-to-Video Person Re-Identification, Wei Wu, Jiawei Liu, 224. Task Adaptive Parameter Sharing for Multi-Task Learning,
Kecheng Zheng, Qibin Sun, Zheng-Jun Zha Matthew Wallingford, Hao Li, Alessandro Achille, Avinash
201. Augmented Geometric Distillation for Data-Free Incremental Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto
Person ReID, Yichen Lu, Mei Wang, Weihong Deng 225. Simple Multi-Dataset Detection, Xingyi Zhou, Vladlen Koltun,
202. Salient-to-Broad Transition for Video Person Re-Identification, Philipp Krähenbühl
Shutao Bai, Bingpeng Ma, Hong Chang, Rui Huang, Xilin Chen 226. Cross-Domain Adaptive Teacher for Object Detection, Yu-Jhe
Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen
Wu, Zijian He, Kris Kitani, Peter Vajda

Wednesday, June 22 (Morning) Program
227. Balanced and Hierarchical Relation Learning for One-Shot
Object Detection, Hanqing Yang, Sijia Cai, Hualian Sheng, Bing
Deng, Jianqiang Huang, Xian-Sheng Hua, Yong Tang, Yu Zhang
228. Semantic-Aligned Fusion Transformer for One-Shot Object
Detection, Yizhou Zhao, Xun Guo, Yan Lu
229. MSDN: Mutually Semantic Distillation Network for Zero-Shot
Learning, Shiming Chen, Ziming Hong, Guo-Sen Xie, Wenhan
Yang, Qinmu Peng, Kai Wang, Jian Zhao, Xinge You
230. Robust Region Feature Synthesizer for Zero-Shot Object
Detection, Peiliang Huang, Junwei Han, De Cheng, Dingwen
Zhang
Image & Video Synthesis and Generation
231. Region-Aware Face Swapping, Chao Xu, Jiangning Zhang, Miao
Hua, Qian He, Zili Yi, Yong Liu
232. High-Resolution Face Swapping via Latent Semantics
Disentanglement, Yangyang Xu, Bailin Deng, Junle Wang,
Yanqing Jing, Jia Pan, Shengfeng He
233. Rethinking Deep Face Restoration, Yang Zhao, Yu-Chuan Su,
Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou
Chen, Xuhui Jia
234. Blind Face Restoration via Integrating Face Shape and
Generative Priors, Feida Zhu, Junwei Zhu, Wenqing Chu, Xinyi
Zhang, Xiaozhong Ji, Chengjie Wang, Ying Tai
235. FENeRF: Face Editing in Neural Radiance Fields, Jingxiang Sun,
Xuan Wang, Yong Zhang, Xiaoyu Li, Qi Zhang, Yebin Liu, Jue
Wang
236. TransEditor: Transformer-Based Dual-Space GAN for Highly
Controllable Facial Editing, Yanbo Xu, Yueqin Yin, Liming Jiang,
Qianyi Wu, Chengyao Zheng, Chen Change Loy, Bo Dai, Wayne
Wu
237. Pastiche Master: Exemplar-Based High-Resolution Portrait
Style Transfer, Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change
Loy
238. Self-Supervised Correlation Mining Network for Person Image
Generation, Zijian Wang, Xingqun Qi, Kun Yuan, Muyi Sun
239. Exploring Dual-Task Correlation for Pose Guided Person Image
Generation, Pengze Zhang, Lingxiao Yang, Jian-Huang Lai,
Xiaohua Xie
240. InsetGAN for Full-Body Image Generation, Anna Frühstück,
Krishna Kumar Singh, Eli Shechtman, Niloy J. Mitra, Peter Wonka,
Jingwan Lu
241. BodyGAN: General-Purpose Controllable Neural Human Body
Generation, Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai
Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou,
Xiaodan Liang, Tianxiang Zheng
242. HumanNeRF: Efficiently Generated Human Radiance Field
From Sparse Inputs, Fuqiang Zhao, Wei Yang, Jiakai Zhang, Pei
Lin, Yingliang Zhang, Jingyi Yu, Lan Xu
243. Structure-Aware Flow Generation for Human Body Reshaping,
Jianqiang Ren, Yuan Yao, Biwen Lei, Miaomiao Cui, Xuansong Xie
244. Modeling Image Composition for Complex Scene Generation,
Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, Dacheng
Tao
245. Local Attention Pyramid for Scene Image Generation, Sang-
Heon Shim, Sangeek Hyun, DaeHyun Bae, Jae-Pil Heo
246. Interactive Image Synthesis With Panoptic Layout Generation,
Bo Wang, Tao Wu, Minfeng Zhu, Peng Du
247. iPLAN: Interactive and Procedural Layout Planning, Feixiang He,
Yanlong Huang, He Wang
248. E-CIR: Event-Enhanced Continuous Intensity Recovery, Chen
Song, Qixing Huang, Chandrajit Bajaj
249. Learning Robust Image-Based Rendering on Sparse Scene
Geometry via Depth Completion, Yuqi Sun, Shili Zhou, Ri Cheng,
Weimin Tan, Bo Yan, Lang Fu
250. Neural Rays for Occlusion-Aware Image-Based Rendering, Yuan
Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian
Theobalt, Xiaowei Zhou, Wenping Wang
251. Industrial Style Transfer With Large-Scale Geometric Warping
and Content Preservation, Jinchao Yang, Fei Guo, Shuo Chen,
Jun Li, Jian Yang
252. PCA-Based Knowledge Distillation Towards Lightweight and
Content-Style Balanced Photorealistic Style Transfer Models,
Tai-Yin Chiu, Danna Gurari
253. Commonality in Natural Images Rescues GANs: Pretraining
GANs With Generic and Privacy-Free Synthetic Data, Kyungjune
Baek, Hyunjung Shim
254. Think Twice Before Detecting GAN-Generated Fake Images
From Their Spectral Domain Imprints, Chengdong Dong, Ajay
Kumar, Eryun Liu
255. Robust Invertible Image Steganography, Youmin Xu, Chong
Mou, Yujie Hu, Jingfen Xie, Jian Zhang
256. Distinguishing Unseen From Seen for Generalized Zero-Shot
Learning, Hongzu Su, Jingjing Li, Zhi Chen, Lei Zhu, Ke Lu
257. Few-Shot Font Generation by Learning Fine-Grained Local
Styles, Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong,
Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding,
Jingdong Wang
258. XMP-Font: Self-Supervised Cross-Modality Pre-Training for
Few-Shot Font Generation, Wei Liu, Fangyue Liu, Fei Ding, Qian
He, Zili Yi
259. Learning To Generate Line Drawings That Convey Geometry
and Semantics, Caroline Chan, Frédo Durand, Phillip Isola

1130–1330 Lunch (Halls D-E)

1100–1300 Doctoral Consortium (Rivergate Terrace)
(by invitation only)
Supported by:

• Bowen Cheng (Univ. of Illinois, Urbana-Champaign)


• Ioana Croitoru (Inst. of Mathematics of the Romanian Academy)
• Yuqi Ding (Louisiana State Univ.)
• Amanda Duarte (Univ. Politècnica de Catalunya)
• Kshitij Dwivedi (Goethe Univ. Frankfurt)
• Bin Fan (Northwestern Polytechnical Univ.)
• Valentin Gabeur (Universite Grenoble Alpes)
• Tejas Gokhale (Arizona State Univ.)
• Jindong Gu (Univ. of Munich)
• Felipe Gutierrez-Barragan (Univ. of Wisconsin, Madison)
• Pan He (Univ. of Florida)
• K J Joseph (Indian Inst. of Technology Hyderabad)
• Donghyun Kim (Boston Univ.)
• Dahun Kim (Korea Advanced Inst. of Science and Technology)
• Zhuang Liu (Univ. of California, Berkeley)
• Vishnu Lokhande (Univ. of Wisconsin, Madison)
• Felix Petersen (Univ. of Konstanz)
• Vipin Pillai (Univ. of Maryland, Baltimore County)
• Fabio Pizzati (Mines ParisTech)
• Dripta S. Raychaudhuri (Univ. of California, Riverside)
• N Dinesh Reddy (Carnegie Mellon Univ.)
• Liyue Shen (Stanford Univ.)
• Yapeng Tian (Univ. of Rochester)
• Xinlong Wang (The Univ. of Adelaide)
• Yida Wang (Technical Univ. of Munich)
• Jae Shin Yoon (Univ. of Minnesota)
• Jiaojiao Zhao (Univ. of Amsterdam)
• Xingyi Zhou (Univ. of Texas, Austin)

Wednesday, June 22 (Afternoon) Program
1300–1330 Poster Switch/Setup (Halls B2-C)

1330–1500 Oral 2.2.1: Transfer / Low-Shot / Long-Tail
Learning (Hall B1)
Papers in this session are in Poster Session 2.2
Chairs: Lorenzo Torresani (Dartmouth College)
Mei Chen (Microsoft)
Ehsan Adeli (Stanford Univ.)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [1330] Balanced MSE for Imbalanced Visual Regression, Jiawei
Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu
2. [1335] Transferability Metrics for Selecting Source Model
Ensembles, Andrea Agostinelli, Jasper Uijlings, Thomas Mensink,
Vittorio Ferrari
3. [1340] OoD-Bench: Quantifying and Understanding Two
Dimensions of Out-of-Distribution Generalization, Nanyang Ye,
Kaican Li, Haoyue Bai, Runpeng Yu, Lanqing Hong, Fengwei Zhou,
Zhenguo Li, Jun Zhu
4. [1348] Robust Fine-Tuning of Zero-Shot Models, Mitchell
Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon
Kornblith, Rebecca Roelofs, Raphael Gontijo Lopes, Hannaneh
Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
5. [1353] Joint Distribution Matters: Deep Brownian Distance
Covariance for Few-Shot Classification, Jiangtao Xie, Fei Long,
Jiaming Lv, Qilong Wang, Peihua Li
6. [1358] Learning To Learn and Remember Super Long Multi-
Domain Task Sequence, Zhenyi Wang, Li Shen, Tiehang Duan,
Donglin Zhan, Le Fang, Mingchen Gao
7. [1406] Learning Distinctive Margin Toward Active Domain
Adaptation, Ming Xie, Yuxi Li, Yabiao Wang, Zekun Luo, Zhenye
Gan, Zhongyi Sun, Mingmin Chi, Chengjie Wang, Pei Wang
8. [1411] DINE: Domain Adaptation From Single and Multiple Black-
Box Predictors, Jian Liang, Dapeng Hu, Jiashi Feng, Ran He
9. [1416] Source-Free Object Detection by Learning To Overlook
Domain Style, Shuaifeng Li, Mao Ye, Xiatian Zhu, Lihua Zhou, Lin
Xiong
10. [1424] Towards Principled Disentanglement for Domain
Generalization, Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian
Weller, Bernhard Schölkopf, Eric P. Xing
11. [1429] Exact Feature Distribution Matching for Arbitrary Style
Transfer and Domain Generalization, Yabin Zhang, Minghan Li,
Ruihuang Li, Kui Jia, Lei Zhang
12. [1434] Causality Inspired Representation Learning for Domain
Generalization, Fangrui Lv, Jian Liang, Shuang Li, Bin Zang, Chi
Harold Liu, Ziteng Wang, Di Liu
13. [1442] Learning What Not To Segment: A New Perspective on
Few-Shot Segmentation, Chunbo Lang, Gong Cheng, Binfei Tu,
Junwei Han
14. [1447] Towards Fewer Annotations: Active Learning via Region
Impurity and Prediction Uncertainty for Domain Adaptive
Semantic Segmentation, Binhui Xie, Longhui Yuan, Shuang Li, Chi
Harold Liu, Xinjing Cheng
15. [1452] ADeLA: Automatic Dense Labeling With Attention for
Viewpoint Shift in Semantic Segmentation, Hanxiang Ren,
Yanchao Yang, He Wang, Bokui Shen, Qingnan Fan, Youyi Zheng,
C. Karen Liu, Leonidas J. Guibas

1330–1500 Oral 2.2.2: Motion, Tracking, Registration,
Vision & X, and Theory (Great Hall B-C)
Papers in this session are in Poster Session 2.2
Chairs: Fuxin Li (Oregon State Univ.)
Hamid Rezatofighi (Monash Univ.)
Vincent Lepetit (Université de Bordeaux)
Format (5 min. presentation; 3 min. group questions/3 papers)
16. [1330] MeMOT: Multi-Object Tracking With Memory, Jiarui Cai,
Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano
Soatto
17. [1335] Unsupervised Learning of Accurate Siamese Tracking,
Qiuhong Shen, Lei Qiao, Jinyang Guo, Peixia Li, Xin Li, Bo Li,
Weitao Feng, Weihao Gan, Wei Wu, Wanli Ouyang
18. [1340] Beyond 3D Siamese Tracking: A Motion-Centric Paradigm
for 3D Single Object Tracking in Point Clouds, Chaoda Zheng, Xu
Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang
Cui, Zhen Li
19. [1348] GMFlow: Learning Optical Flow via Global Matching,
Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Dacheng
Tao
20. [1353] GridShift: A Faster Mode-Seeking Algorithm for Image
Segmentation and Object Tracking, Abhishek Kumar, Oladayo S.
Ajani, Swagatam Das, Rammohan Mallipeddi
21. [1358] SNUG: Self-Supervised Neural Dynamic Garments, Igor
Santesteban, Miguel A. Otaduy, Dan Casas
22. [1406] Weakly-Supervised Action Transition Learning for
Stochastic Human Motion Prediction, Wei Mao, Miaomiao Liu,
Mathieu Salzmann
23. [1411] Multi-Objective Diverse Human Motion Prediction With
Knowledge Distillation, Hengbo Ma, Jiachen Li, Ramtin Hosseini,
Masayoshi Tomizuka, Chiho Choi
24. [1416] Context-Aware Sequence Alignment Using 4D Skeletal
Augmentation, Taein Kwon, Bugra Tekin, Siyu Tang, Marc
Pollefeys
25. [1424] Enabling Equivariance for Arbitrary Lie Groups, Lachlan E.
MacDonald, Sameera Ramasinghe, Simon Lucey
26. [1429] RAMA: A Rapid Multicut Algorithm on GPU, Ahmed Abbas,
Paul Swoboda
27. [1434] Self-Supervised Material and Texture Representation
Learning for Remote Sensing Tasks, Peri Akiva, Matthew Purri,
Matthew Leotta
28. [1442] RCP: Recurrent Closest Point for Point Cloud, Xiaodong
Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping
Tan
29. [1447] Audio-Visual Speech Codecs: Rethinking Audio-Visual
Speech Enhancement by Re-Synthesis, Karren Yang, Dejan
Marković, Steven Krenn, Vasu Agrawal, Alexander Richard
30. [1452] Balanced Multimodal Learning via On-the-Fly Gradient
Modulation, Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang,
Di Hu

1330–1500 Oral 2.2.3: 3D from Multiview & Sensors,
Learning for Vision, Explainable Vision,
and Privacy (Great Hall A-D)
Papers in this session are in Poster Session 2.2
Chairs: Bolei Zhou (CUHK)
Marc Pollefeys (ETH Zurich; Microsoft)
Format (5 min. presentation; 3 min. group questions/3 papers)
31. [1330] Block-NeRF: Scalable Large Scene Neural View Synthesis,
Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan,
Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, Henrik
Kretzschmar
32. [1335] SceneSqueezer: Learning To Compress Scene for Camera
Relocalization, Luwei Yang, Rakesh Shrestha, Wenbo Li,
Shuaicheng Liu, Guofeng Zhang, Zhaopeng Cui, Ping Tan
33. [1340] Light Field Neural Rendering, Mohammed Suhail, Carlos
Esteves, Leonid Sigal, Ameesh Makadia
34. [1348] Extracting Triangular 3D Models, Materials, and Lighting
From Images, Jacob Munkberg, Jon Hasselgren, Tianchang Shen,
Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, Sanja Fidler
35. [1353] Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling
of SO(3), Marc Alexa
36. [1358] Stochastic Backpropagation: A Memory Efficient Strategy
for Training Video Models, Feng Cheng, Mingze Xu, Yuanjun
Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia
37. [1406] It’s All in the Teacher: Zero-Shot Quantization Brought
Closer to the Teacher, Kanghyun Choi, Hye Yoon Lee, Deokki
Hong, Joonsang Yu, Noseong Park, Youngsok Kim, Jinho Lee
38. [1411] NLX-GPT: A Model for Natural Language Explanations in
Vision and Vision-Language Tasks, Fawaz Sammani, Tanmoy
Mukherjee, Nikos Deligiannis
39. [1416] Explaining Deep Convolutional Neural Networks via Latent
Visual-Semantic Filter Attention, Yu Yang, Seungbae Kim,
Jungseock Joo
40. [1424] Parameter-Free Online Test-Time Adaptation, Malik
Boudiaf, Romain Mueller, Ismail Ben Ayed, Luca Bertinetto
41. [1429] Patch-Level Representation Learning for Self-Supervised
Vision Transformers, Sukmin Yun, Hankook Lee, Jaehyung Kim,
Jinwoo Shin
42. [1434] Deep Spectral Methods: A Surprisingly Strong Baseline for
Unsupervised Semantic Segmentation and Localization, Luke
Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
43. [1442] Mixed Differential Privacy in Computer Vision, Aditya
Golatkar, Alessandro Achille, Yu-Xiang Wang, Aaron Roth, Michael
Kearns, Stefano Soatto
44. [1447] DPGEN: Differentially Private Generative Energy-Guided
Network for Natural Image Synthesis, Jia-Wei Chen, Chia-Mu Yu,
Ching-Chia Kao, Tzai-Wei Pang, Chun-Shien Lu
45. [1452] Local Learning Matters: Rethinking Data Heterogeneity in
Federated Learning, Matias Mendieta, Taojiannan Yang, Pu
Wang, Minwoo Lee, Zhengming Ding, Chen Chen

1500–1530 Afternoon Break (Halls B2-C)

1000–1700 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1430–1700 Demos (Halls B2-C Demo Area)
• Interactive Segmentation and Visualization for Tiny Objects in
Multi-Megapixel Images, Chengyuan Xu, Boning Dong, Noah Stier,
Curtis McCully, D. Andrew Howell, Pradeep Sen, Tobias Hollerer
(UCSB; Las Cumbres Observatory)
• VL-InterpreT: An Interactive Visualization Tool for Interpreting
Vision-Language Transformers, Estelle Guez Aflalo, Meng Du,
Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
(Intel Labs; UCLA; Microsoft Research)
• Speech Driven Tongue Animation, Salvador Medina, Denis Tome,
Carsten Stoll, Thibaut Weise, Iain Matthews (Carnegie Mellon
University; Epic Games)
• Effective Conditioned and Composed Image Retrieval Combining
CLIP-Based Features, Alberto Baldrati, Marco Bertini, Tiberio
Uricchio, Alberto Del Bimbo (Università degli Studi di Firenze;
Università di Pisa)
• DetectorDetective: Investigating the Effects of Adversarial
Examples on Object Detectors, Sivapriya Vellaichamy, Matthew
Hull, Zijie J. Wang, Nilaksh Das, ShengYun Peng, Haekyu Park, Duen
Horng Chau (Georgia Institute of Technology)
• [Virtual] V-Doc: Visual Questions Answers With Documents, Yihao
Ding, Zhe Huang, Runlin Wang, YanHang Zhang, Xianru Chen,
Yuzhong Ma, Hyunsuk Chung, Soyeon Caren Han (The Univ. of
Sydney; Fortifyedge)
• [Virtual] VisCUIT: Visual Auditor for Bias in CNN Image Classifier,
Seongmin Lee, Zijie J. Wang, Judy Hoffman, Duen Horng Chau
(Georgia Institute of Technology)
• [Virtual] Clustering Plotted Data by Image Segmentation, Tarek
Naous, Srinjay Sarkar, Abubakar Abid, James Zou (American Univ. of
Beirut; VinAI Research; Hugging Face; Stanford Univ.)

1430–1700 Poster 2.2 (Halls B2-C)
3D From Multi-View & Sensors
46. AirObject: A Temporally Evolving Graph Embedding for Object
Identification, Nikhil Varma Keetha, Chen Wang, Yuheng Qiu,
Kuan Xu, Sebastian Scherer
47. Voxel Set Transformer: A Set-to-Set Approach to 3D Object
Detection From Point Clouds, Chenhang He, Ruihuang Li, Shuai
Li, Lei Zhang
48. SS3D: Sparsely-Supervised 3D Object Detection From Point
Cloud, Chuandong Liu, Chenqiang Gao, Fangcen Liu, Jiang Liu,
Deyu Meng, Xinbo Gao
49. Back to Reality: Weakly-Supervised 3D Object Detection With
Shape-Guided Label Enhancement, Xiuwei Xu, Yifan Wang, Yu
Zheng, Yongming Rao, Jie Zhou, Jiwen Lu
50. VISTA: Boosting 3D Object Detection via Dual Cross-VIew
SpaTial Attention, Shengheng Deng, Zhihao Liang, Lin Sun, Kui
Jia
51. Embracing Single Stride 3D Object Detector With Sparse
Transformer, Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong
Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
52. Point Density-Aware Voxels for LiDAR 3D Object Detection,
Jordan S. K. Hu, Tianshu Kuai, Steven L. Waslander
53. Point-to-Voxel Knowledge Distillation for LiDAR Semantic
Segmentation, Yuenan Hou, Xinge Zhu, Yuexin Ma, Chen Change
Loy, Yikang Li
54. Contrastive Boundary Learning for Point Cloud Segmentation,
Liyao Tang, Yibing Zhan, Zhe Chen, Baosheng Yu, Dacheng Tao

55. Stratified Transformer for 3D Point Cloud Segmentation, Xin
Lai, Jianhui Liu, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu,
Xiaojuan Qi, Jiaya Jia
56. No Pain, Big Gain: Classify Dynamic Point Cloud Sequences
With Static Models by Fitting Feature-Level Space-Time
Surfaces, Jia-Xing Zhong, Kaichen Zhou, Qingyong Hu, Bing
Wang, Niki Trigoni, Andrew Markham
57. Point2Seq: Detecting 3D Objects As Sequences, Yujing Xue,
Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei Zhang,
Xiaogang Wang, Xinchao Wang
58. PTTR: Relational 3D Point Cloud Object Tracking With
Transformer, Changqing Zhou, Zhipeng Luo, Yueru Luo, Tianrui
Liu, Liang Pan, Zhongang Cai, Haiyu Zhao, Shijian Lu
59. A Unified Query-Based Paradigm for Point Cloud Understanding,
Zetong Yang, Li Jiang, Yanan Sun, Bernt Schiele, Jiaya Jia
60. PointCLIP: Point Cloud Understanding by CLIP, Renrui Zhang,
Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu
Qiao, Peng Gao, Hongsheng Li
61. X-Trans2Cap: Cross-Modal Knowledge Transfer Using
Transformer for 3D Dense Captioning, Zhihao Yuan, Xu Yan,
Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li
62. MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D
Convolutions, Zhenpei Yang, Zhile Ren, Qi Shan, Qixing Huang
63. TransMVSNet: Global Context-Aware Multi-View Stereo Network
With Transformers, Yikang Ding, Wentao Yuan, Qingtian
Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, Xiao Liu
64. RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate
Multi-View Stereo, Junhua Xi, Yifei Shi, Yijie Wang, Yulan Guo,
Kai Xu
65. IterMVS: Iterative Probability Estimation for Efficient Multi-
View Stereo, Fangjinhua Wang, Silvano Galliani, Christoph
Vogel, Marc Pollefeys
66. PSMNet: Position-Aware Stereo Merging Network for Room
Layout Estimation, Haiyan Wang, Will Hutchcroft, Yuguang Li,
Zhiqiang Wan, Ivaylo Boyadzhiev, Yingli Tian, Sing Bing Kang
67. Non-Parametric Depth Distribution Modelling Based Depth
Inference for Multi-View Stereo, Jiayu Yang, Jose M. Alvarez,
Miaomiao Liu
68. Differentiable Stereopsis: Meshes From Multiple Views Using
Differentiable Rendering, Shubham Goel, Georgia Gkioxari,
Jitendra Malik
69. Rethinking Depth Estimation for Multi-View Stereo: A Unified
Representation, Rui Peng, Rongjie Wang, Zhenyu Wang, Yawen
Lai, Ronggang Wang
70. Efficient Multi-View Stereo by Iterative Dynamic Cost Volume,
Shaoqian Wang, Bo Li, Yuchao Dai
71. PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo,
Jiachen Liu, Pan Ji, Nitin Bansal, Changjiang Cai, Qingan Yan,
Xiaolei Huang, Yi Xu
72. Discrete Time Convolution for Fast Event-Based Stereo,
Kaixuan Zhang, Kaiwei Che, Jianguo Zhang, Jie Cheng, Ziyang
Zhang, Qinghai Guo, Luziwei Leng
73. Stereo Magnification With Multi-Layer Images, Taras Khakhulin,
Denis Korzhenkov, Pavel Solovev, Gleb Sterkin, Andrei-Timotei
Ardelean, Victor Lempitsky
Motion & Tracking
74. TransforMatcher: Match-to-Match Attention for Semantic
Correspondence, Seungwook Kim, Juhong Min, Minsu Cho
75. Probabilistic Warp Consistency for Weakly-Supervised
Semantic Correspondences, Prune Truong, Martin Danelljan,
Fisher Yu, Luc Van Gool
76. Locality-Aware Inter- and Intra-Video Reconstruction for Self-
Supervised Correspondence Learning, Liulei Li, Tianfei Zhou,
Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang
77. Transforming Model Prediction for Tracking, Christoph Mayer,
Martin Danelljan, Goutam Bhat, Matthieu Paul, Danda Pani
Paudel, Fisher Yu, Luc Van Gool
78. Ranking-Based Siamese Visual Tracking, Feng Tang, Qiang Ling
79. Correlation-Aware Deep Tracking, Fei Xie, Chunyu Wang,
Guangting Wang, Yue Cao, Wankou Yang, Wenjun Zeng
80. Global Tracking via Ensemble of Local Trackers, Zikun Zhou,
Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu
He
81. Global Tracking Transformers, Xingyi Zhou, Tianwei Yin, Vladlen
Koltun, Philipp Krähenbühl
82. Unified Transformer Tracker for Object Tracking, Fan Ma, Mike
Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng
Yan
83. Transformer Tracking With Cyclic Shifting Window Attention,
Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang
84. Spiking Transformers for Event-Based Single Object Tracking,
Jiqing Zhang, Bo Dong, Haiwei Zhang, Jianchuan Ding, Felix
Heide, Baocai Yin, Xin Yang
85. Adiabatic Quantum Computing for Multi Object Tracking, Jan-
Nico Zaech, Alexander Liniger, Martin Danelljan, Dengxin Dai, Luc
Van Gool
86. HiVT: Hierarchical Vector Transformer for Multi-Agent Motion
Prediction, Zikang Zhou, Luyao Ye, Jianping Wang, Kui Wu, Kejie
Lu
87. Towards Discriminative Representation: Multi-View Trajectory
Contrastive Learning for Online Multi-Object Tracking, En Yu,
Zhuoling Li, Shoudong Han
88. TrackFormer: Multi-Object Tracking With Transformers, Tim
Meinhardt, Alexander Kirillov, Laura Leal-Taixé, Christoph
Feichtenhofer
89. Learning of Global Objective for Network Flow in Multi-Object
Tracking, Shuai Li, Yu Kong, Hamid Rezatofighi
90. LMGP: Lifted Multicut Meets Geometry Projections for Multi-
Camera Multi-Object Tracking, Duy M. H. Nguyen, Roberto
Henschel, Bodo Rosenhahn, Daniel Sonntag, Paul Swoboda
91. Multi-Object Tracking Meets Moving UAV, Shuai Liu, Xin Li,
Huchuan Lu, You He
92. Visible-Thermal UAV Tracking: A Large-Scale Benchmark and
New Baseline, Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan
Lu, Xiang Ruan
93. Unsupervised Domain Adaptation for Nighttime Aerial
Tracking, Junjie Ye, Changhong Fu, Guangze Zheng, Danda Pani
Paudel, Guang Chen
94. Learning Optical Flow With Kernel Patch Attention, Ao Luo, Fan
Yang, Xin Li, Shuaicheng Liu
95. Towards Understanding Adversarial Robustness of Optical Flow
Networks, Simon Schrodi, Tonmoy Saikia, Thomas Brox
96. DIP: Deep Inverse Patchmatch for High-Resolution Optical
Flow, Zihua Zheng, Ni Nie, Zhi Ling, Pengfei Xiong, Jiangyu Liu,
Hao Wang, Jiankun Li

Computer Vision Theory
97. On the Instability of Relative Pose Estimation and RANSAC’s
Role, Hongyi Fan, Joe Kileel, Benjamin Kimia
98. Bootstrapping ViTs: Towards Liberating Vision Transformers
From Pre-Training, Haofei Zhang, Jiarui Duan, Mengqi Xue, Jie
Song, Li Sun, Mingli Song
99. Global Sensing and Measurements Reuse for Image
Compressed Sensing, Zi-En Fan, Feng Lian, Jia-Ni Quan
100. Maximum Consensus by Weighted Influences of Monotone
Boolean Functions, Erchuan Zhang, David Suter, Ruwan
Tennakoon, Tat-Jun Chin, Alireza Bab-Hadiashar, Giang Truong,
Syed Zulqarnain Gilani
101. MS2DG-Net: Progressive Correspondence Learning via Multiple
Sparse Semantics Dynamic Graph, Luanyuan Dai, Yizhang Liu,
Jiayi Ma, Lifang Wei, Taotao Lai, Changcai Yang, Riqing Chen
102. Styleformer: Transformer Based Generative Adversarial
Networks With Style Vector, Jeeseung Park, Younggeun Kim
103. Scanline Homographies for Rolling-Shutter Plane Absolute
Pose, Fang Bai, Agniva Sengupta, Adrien Bartoli

Transfer / Low-Shot / Long-Tail Learning
104. Generating Representative Samples for Few-Shot
Classification, Jingyi Xu, Hieu Le
105. Matching Feature Sets for Few-Shot Image Classification,
Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde,
Christian Gagné
106. Improving Adversarially Robust Few-Shot Image Classification
With Generalizable Representations, Junhao Dong, Yuan Wang,
Jian-Huang Lai, Xiaohua Xie
107. Sylph: A Hypernetwork Framework for Incremental Few-Shot
Object Detection, Li Yin, Juan M. Perez-Rua, Kevin J. Liang
108. Forward Compatible Few-Shot Class-Incremental Learning, Da-
Wei Zhou, Fu-Yun Wang, Han-Jia Ye, Liang Ma, Shiliang Pu, De-
Chuan Zhan
109. Constrained Few-Shot Class-Incremental Learning, Michael
Hersche, Geethan Karunaratne, Giovanni Cherubini, Luca Benini,
Abu Sebastian, Abbas Rahimi
110. Pushing the Limits of Simple Pipelines for Few-Shot Learning:
External Data and Fine-Tuning Make a Difference, Shell Xu Hu,
Da Li, Jan Stühmer, Minyoung Kim, Timothy M. Hospedales
111. EASE: Unsupervised Discriminant Subspace Learning for
Transductive Few-Shot Learning, Hao Zhu, Piotr Koniusz
112. Few-Shot Learning With Noisy Labels, Kevin J. Liang,
Samrudhdhi B. Rangrej, Vladan Petrovic, Tal Hassner
113. Ranking Distance Calibration for Cross-Domain Few-Shot
Learning, Pan Li, Shaogang Gong, Chengjie Wang, Yanwei Fu
114. Revisiting Learnable Affines for Batch Norm in Few-Shot
Transfer Learning, Moslem Yazdanpanah, Aamer Abdul Rahman,
Muawiz Chaudhary, Christian Desrosiers, Mohammad Havaei,
Eugene Belilovsky, Samira Ebrahimi Kahou
115. Attribute Surrogates Learning and Spectral Tokens Pooling in
Transformers for Few-Shot Learning, Yangji He, Weihan Liang,
Dongyang Zhao, Hong-Yu Zhou, Weifeng Ge, Yizhou Yu,
Wenqiang Zhang
116. Learning To Memorize Feature Hallucination for One-Shot
Image Generation, Yu Xie, Yanwei Fu, Ying Tai, Yun Cao, Junwei
Zhu, Chengjie Wang
117. A Closer Look at Few-Shot Image Generation, Yunqing Zhao,
Henghui Ding, Houjing Huang, Ngai-Man Cheung
118. Motion-Modulated Temporal Fragment Alignment Network for
Few-Shot Action Recognition, Jiamin Wu, Tianzhu Zhang, Zhe
Zhang, Feng Wu, Yongdong Zhang
119. Knowledge Distillation As Efficient Pre-Training: Faster
Convergence, Higher Data-Efficiency, and Better
Transferability, Ruifei He, Shuyang Sun, Jihan Yang, Song Bai,
Xiaojuan Qi
120. Transferability Estimation Using Bhattacharyya Class
Separability, Michal Pándy, Andrea Agostinelli, Jasper Uijlings,
Vittorio Ferrari, Thomas Mensink
121. Revisiting the Transferability of Supervised Pretraining: An MLP
Perspective, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui
Zhao, Donglian Qi, Wanli Ouyang
122. Task2Sim: Towards Effective Pre-Training and Transfer From
Synthetic Data, Samarth Mishra, Rameswar Panda, Cheng Perng
Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko,
Venkatesh Saligrama, Rogerio S. Feris
123. Which Model To Transfer? Finding the Needle in the Growing
Haystack, Cedric Renggli, André Susano Pinto, Luka Rimanic,
Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lučić
124. Does Robustness on ImageNet Transfer to Downstream Tasks?
Yutaro Yamada, Mayu Otani
125. What Makes Transfer Learning Work for Medical Images:
Feature Reuse & Other Factors, Christos Matsoukas, Johan
Fredin Haslum, Moein Sorkhei, Magnus Söderberg, Kevin Smith
126. OW-DETR: Open-World Detection Transformer, Akshita Gupta,
Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan,
Mubarak Shah
127. Unseen Classes at a Later Time? No Problem, Hari Chandana
Kuchibhotla, Sumitra S Malagi, Shivam Chandhok, Vineeth N
Balasubramanian
128. Continual Object Detection via Prototypical Task Correlation
Guided Gating Mechanism, Binbin Yang, Xinchi Deng, Han Shi,
Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin,
Xiaodan Liang
129. On Generalizing Beyond Domains in Cross-Domain Continual
Learning, Christian Simon, Masoud Faraki, Yi-Hsuan Tsai, Xiang
Yu, Samuel Schulter, Yumin Suh, Mehrtash Harandi, Manmohan
Chandraker
130. Online Continual Learning on a Contaminated Data Stream
With Blurry Task Boundaries, Jihwan Bang, Hyunseo Koh, Seulki
Park, Hwanjun Song, Jung-Woo Ha, Jonghyun Choi
131. DyTox: Transformers for Continual Learning With DYnamic
TOken eXpansion, Arthur Douillard, Alexandre Ramé, Guillaume
Couairon, Matthieu Cord
132. Self-Sustaining Representation Expansion for Non-Exemplar
Class-Incremental Learning, Kai Zhu, Wei Zhai, Yang Cao, Jiebo
Luo, Zheng-Jun Zha
133. En-Compactness: Self-Distillation Embedding & Contrastive
Generation for Generalized Zero-Shot Learning, Xia Kong,
Zuodong Gao, Xiaofan Li, Ming Hong, Jun Liu, Chengjie Wang,
Yuan Xie, Yanyun Qu
134. VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot
Learning, Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele,
Zeynep Akata
135. Siamese Contrastive Embedding Network for Compositional
Zero-Shot Learning, Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng,
Muli Yang

136. KG-SP: Knowledge Guided Simple Primitives for Open World
Compositional Zero-Shot Learning, Shyamgopal Karthik,
Massimiliano Mancini, Zeynep Akata
137. Non-Generative Generalized Zero-Shot Learning via Task-
Correlated Disentanglement and Controllable Samples
Synthesis, Yaogong Feng, Xiaowen Huang, Pengbo Yang, Jian Yu,
Jitao Sang
138. WALT: Watch and Learn 2D Amodal Representation From
Time-Lapse Imagery, N. Dinesh Reddy, Robert Tamburo,
Srinivasa G. Narasimhan
Recognition: Detection, Categorization, Retrieval
139. Omni-DETR: Omni-Supervised Object Detection With
Transformers, Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy
Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto
140. DESTR: Object Detection With Split Transformer, Liqiang He,
Sinisa Todorovic
141. A Dual Weighting Label Assignment Scheme for Object
Detection, Shuai Li, Chenhang He, Ruihuang Li, Lei Zhang
142. Entropy-Based Active Learning for Object Detection With
Progressive Diversity Constraint, Jiaxi Wu, Jiaxin Chen, Di Huang
143. Localization Distillation for Dense Object Detection, Zhaohui
Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng
Zuo, Qibin Hou, Ming-Ming Cheng
144. Group R-CNN for Weakly Semi-Supervised Object Detection
With Points, Shilong Zhang, Zhuoran Yu, Liyang Liu, Xinjiang
Wang, Aojun Zhou, Kai Chen
145. Overcoming Catastrophic Forgetting in Incremental Object
Detection via Elastic Response Distillation, Tao Feng, Mang
Wang, Hangjie Yuan
146. CREAM: Weakly Supervised Object Localization via Class RE-
Activation Mapping, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng,
Rui-Wei Zhao, Tao Zhang, Xuequan Lu, Shang Gao
147. One Loss for Quantization: Deep Hashing With Discrete
Wasserstein Distributional Matching, Khoa D. Doan, Peng Yang,
Ping Li
148. PSTR: End-to-End One-Step Person Search With Transformers,
Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham
Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan
149. Protecting Celebrities From DeepFake With Identity
Consistency Transformer, Xiaoyi Dong, Jianmin Bao, Dongdong
Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang
Wen, Baining Guo
150. MDAN: Multi-Level Dependent Attention Network for Visual
Emotion Analysis, Liwen Xu, Zhengtao Wang, Bin Wu, Simon Lui
151. Contextual Similarity Distillation for Asymmetric Image
Retrieval, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, Qi
Tian
152. Improving Visual Grounding With Visual-Linguistic Verification
and Iterative Reasoning, Li Yang, Yan Xu, Chunfeng Yuan, Wei
Liu, Bing Li, Weiming Hu
153. MPC: Multi-View Probabilistic Clustering, Junjie Liu, Junlong Liu,
Shaotian Yan, Rongxin Jiang, Xiang Tian, Boxuan Gu, Yaowu
Chen, Chen Shen, Jianqiang Huang
154. Text Spotting Transformers, Xiang Zhang, Yongwen Su, Subarna
Tripathi, Zhuowen Tu
155. Represent, Compare, and Learn: A Similarity-Aware Framework
for Class-Agnostic Counting, Min Shi, Hao Lu, Chen Feng,
Chengxin Liu, Zhiguo Cao
156. Reflection and Rotation Symmetry Detection via Equivariant
Learning, Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho
157. Learning To Imagine: Diversify Memory for Incremental
Learning Using Unlabeled Data, Yu-Ming Tang, Yi-Xing Peng,
Wei-Shi Zheng
158. A Simple Episodic Linear Probe Improves Visual Recognition in
the Wild, Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
159. Cross Domain Object Detection by Target-Perceived Dual
Branch Distillation, Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang,
Hanqing Li, Bo Li, Weihao Gan, Wei Wu, Yu Qiao
160. Multi-Granularity Alignment Domain Adaptation for Object
Detection, Wenzhang Zhou, Dawei Du, Libo Zhang, Tiejian Luo,
Yanjun Wu
161. Expanding Low-Density Latent Regions for Open-Set Object
Detection, Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke
Yan, Gui-Song Xia
162. Class-Incremental Learning With Strong Pre-Trained Models,
Tz-Ying Wu, Gurumurthy Swaminathan, Zhizhong Li, Avinash
Ravichandran, Nuno Vasconcelos, Rahul Bhotika, Stefano Soatto
163. ProposalCLIP: Unsupervised Open-Category Object Proposal
Generation via Exploiting CLIP Cues, Hengcan Shi, Munawar
Hayat, Yicheng Wu, Jianfei Cai
Self-, Semi-, Meta-, & Unsupervised Learning
164. Self-Supervised Models Are Continual Learners, Enrico Fini,
Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci,
Karteek Alahari, Julien Mairal
165. The Two Dimensions of Worst-Case Training and Their
Integrated Effect for Out-of-Domain Generalization, Zeyi
Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing
166. Beyond Supervised vs. Unsupervised: Representative
Benchmarking and Analysis of Image Representation Learning,
Matthew Gwilliam, Abhinav Shrivastava
167. SimMIM: A Simple Framework for Masked Image Modeling,
Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao,
Zhuliang Yao, Qi Dai, Han Hu
168. Semantic-Aware Auto-Encoders for Self-Supervised
Representation Learning, Guangrun Wang, Yansong Tang, Liang
Lin, Philip H.S. Torr
169. UNICON: Combating Label Noise Through Uniform Selection
and Contrastive Learning, Nazmul Karim, Mamshad Nayeem
Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
170. Contrastive Conditional Neural Processes, Zesheng Ye, Lina Yao
171. One-Bit Active Query With Contrastive Pairs, Yuhang Zhang,
Xiaopeng Zhang, Lingxi Xie, Jie Li, Robert C. Qiu, Hengtong Hu, Qi
Tian
172. HCSC: Hierarchical Contrastive Selective Coding, Yuanfan Guo,
Minghao Xu, Jiawen Li, Bingbing Ni, Xuanyu Zhu, Zhenbang Sun,
Yi Xu
173. Motion-Aware Contrastive Video Representation Learning via
Foreground-Background Merging, Shuangrui Ding, Maomao Li,
Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang,
Hongkai Xiong
174. Hierarchical Self-Supervised Representation Learning for Movie
Understanding, Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide
Modolo
175. Anomaly Detection via Reverse Distillation From One-Class
Embedding, Hanqiu Deng, Xingyu Li
176. Unsupervised Representation Learning for Binary Networks by
Joint Classifier Learning, Dahyun Kim, Jonghyun Choi

177. DC-SSL: Addressing Mismatched Class Distribution in Semi-
Supervised Learning, Zhen Zhao, Luping Zhou, Yue Duan, Lei
Wang, Lei Qi, Yinghuan Shi
178. Learning To Collaborate in Decentralized Learning of
Personalized Models, Shuangtong Li, Tianyi Zhou, Xinmei Tian,
Dacheng Tao
179. Highly-Efficient Incomplete Large-Scale Multi-View Clustering
With Consensus Bipartite Graph, Siwei Wang, Xinwang Liu, Li
Liu, Wenxuan Tu, Xinzhong Zhu, Jiyuan Liu, Sihang Zhou, En Zhu
180. DASO: Distribution-Aware Semantics-Oriented Pseudo-Label
for Imbalanced Semi-Supervised Learning, Youngtaek Oh,
Dong-Jin Kim, In So Kweon
181. Global Convergence of MAML and Theory-Inspired Neural
Architecture Search for Few-Shot Learning, Haoxiang Wang,
Yite Wang, Ruoyu Sun, Bo Li
182. Semi-Supervised Object Detection via Multi-Instance Alignment
With Global Class Prototypes, Aoxue Li, Peng Yuan, Zhenguo Li
183. Unbiased Teacher v2: Semi-Supervised Object Detection for
Anchor-Free and Anchor-Based Detectors, Yen-Cheng Liu, Chih-
Yao Ma, Zsolt Kira
184. Spectral Unsupervised Domain Adaptation for Visual
Recognition, Jingyi Zhang, Jiaxing Huang, Zichen Tian, Shijian Lu
185. DATA: Domain-Aware and Task-Aware Self-Supervised
Learning, Qing Chang, Junran Peng, Lingxi Xie, Jiajun Sun,
Haoran Yin, Qi Tian, Zhaoxiang Zhang
186. Dynamic Kernel Selection for Improved Generalization and
Memory Efficiency in Meta-Learning, Arnav Chavan, Rishabh
Tiwari, Udbhav Bamba, Deepak K. Gupta
187. DeepDPM: Deep Clustering With an Unknown Number of
Clusters, Meitar Ronen, Shahaf E. Finder, Oren Freifeld
188. PLAD: Learning To Infer Shape Programs With Pseudo-Labels
and Approximate Distributions, R. Kenny Jones, Homer Walke,
Daniel Ritchie
189. Robust Outlier Detection by De-Biasing VAE Likelihoods,
Kushal Chauhan, Barath Mohan U, Pradeep Shenoy, Manish
Gupta, Devarajan Sridharan
190. Image-to-Lidar Self-Supervised Distillation for Autonomous
Driving Data, Corentin Sautier, Gilles Puy, Spyros Gidaris,
Alexandre Boulch, Andrei Bursuc, Renaud Marlet
191. CrossPoint: Self-Supervised Cross-Modal Contrastive Learning
for 3D Point Cloud Understanding, Mohamed Afham, Isuru
Dissanayake, Dinithi Dissanayake, Amaya Dharmasiri, Kanchana
Thilakarathna, Ranga Rodrigo
192. Cross-Domain Correlation Distillation for Unsupervised Domain
Adaptation in Nighttime Semantic Segmentation, Huan Gao,
Jichang Guo, Guoli Wang, Qian Zhang
193. DAFormer: Improving Network Architectures and Training
Strategies for Domain-Adaptive Semantic Segmentation, Lukas
Hoyer, Dengxin Dai, Luc Van Gool
194. WildNet: Learning Domain Generalized Semantic
Segmentation From the Wild, Suhyeon Lee, Hongje Seong,
Seongwon Lee, Euntai Kim
195. UCC: Uncertainty Guided Cross-Head Co-Training for Semi-
Supervised Semantic Segmentation, Jiashuo Fan, Bin Gao, Huan
Jin, Lihui Jiang
196. Semi-Supervised Semantic Segmentation With Error
Localization Network, Donghyeon Kwon, Suha Kwak
197. Unbiased Subclass Regularization for Semi-Supervised Semantic
Segmentation, Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu
198. Integrative Few-Shot Learning for Classification and
Segmentation, Dahyun Kang, Minsu Cho
199. GANORCON: Are Generative Models Useful for Few-Shot
Segmentation? Oindrila Saha, Zezhou Cheng, Subhransu Maji
200. SphericGAN: Semi-Supervised Hyper-Spherical Generative
Adversarial Networks for Fine-Grained Image Synthesis, Tianyi
Chen, Yunfei Zhang, Xiaoyang Huo, Si Wu, Yong Xu, Hau San
Wong
201. CoordGAN: Self-Supervised Dense Correspondences Emerge
From GANs, Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno
Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
Privacy and Federated Learning
202. GradViT: Gradient Inversion of Vision Transformers, Ali
Hatamizadeh, Hongxu Yin, Holger R. Roth, Wenqi Li, Jan Kautz,
Daguang Xu, Pavlo Molchanov
203. Deep 3D-to-2D Watermarking: Embedding Messages in 3D
Meshes and Extracting Them From 2D Renderings, Innfarn Yoo,
Huiwen Chang, Xiyang Luo, Ondrej Stava, Ce Liu, Peyman
Milanfar, Feng Yang
204. CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for
Model Personalization in Federated Learning, Yiqing Shen,
Yuyin Zhou, Lequan Yu
205. APRIL: Finding the Achilles’ Heel on Privacy for Vision
Transformers, Jiahao Lu, Xi Sheryl Zhang, Tianli Zhao, Xiangyu
He, Jian Cheng
206. Rethinking Architecture Design for Tackling Data
Heterogeneity in Federated Learning, Liangqiong Qu, Yuyin
Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-
Fei, Daniel Rubin
207. Robust Federated Learning With Noisy and Heterogeneous
Clients, Xiuwen Fang, Mang Ye
208. Federated Learning With Position-Aware Neurons, Xin-Chun Li,
Yi-Chu Xu, Shaoming Song, Bingshuai Li, Yinchuan Li, Yunfeng
Shao, De-Chuan Zhan
209. Layer-Wised Model Aggregation for Personalized Federated
Learning, Xiaosong Ma, Jie Zhang, Song Guo, Wenchao Xu
210. FedCor: Correlation-Based Active Client Selection Strategy for
Heterogeneous Federated Learning, Minxue Tang, Xuefei Ning,
Yitu Wang, Jingwei Sun, Yu Wang, Hai Li, Yiran Chen
211. FedDC: Federated Learning With Non-IID Data via Local Drift
Decoupling and Correction, Liang Gao, Huazhu Fu, Li Li, Yingwen
Chen, Ming Xu, Cheng-Zhong Xu
212. Differentially Private Federated Learning With Local
Regularization and Sparsification, Anda Cheng, Peisong Wang,
Xi Sheryl Zhang, Jian Cheng
213. Auditing Privacy Defenses in Federated Learning via Generative
Gradient Leakage, Zhuohang Li, Jiaxin Zhang, Luyang Liu, Jian
Liu
214. Learn From Others and Be Yourself in Heterogeneous
Federated Learning, Wenke Huang, Mang Ye, Bo Du
215. RSCFed: Random Sampling Consensus Federated Semi-
Supervised Learning, Xiaoxiao Liang, Yiqun Lin, Huazhu Fu, Lei
Zhu, Xiaomeng Li
216. Federated Class-Incremental Learning, Jiahua Dong, Lixu Wang,
Zhen Fang, Gan Sun, Shichao Xu, Xiao Wang, Qi Zhu
217. Fine-Tuning Global Model via Data-Free Knowledge Distillation
for Non-IID Federated Learning, Lin Zhang, Li Shen, Liang Ding,
Dacheng Tao, Ling-Yu Duan

218. FedCorr: Multi-Stage Federated Learning for Label Noise
Correction, Jingyi Xu, Zihan Chen, Tony Q.S. Quek, Kai Fong
Ernest Chong
219. ResSFL: A Resistance Transfer Framework for Defending Model
Inversion Attack in Split Federated Learning, Jingtao Li, Adnan
Siraj Rakin, Xing Chen, Zhezhi He, Deliang Fan, Chaitali
Chakrabarti
Explainable Computer Vision
220. Cycle-Consistent Counterfactuals by Latent Transformations,
Saeed Khorram, Li Fuxin
221. Consistent Explanations by Contrastive Learning, Vipin Pillai,
Soroush Abbasi Koohpayegani, Ashley Ouligian, Dennis Fong,
Hamed Pirsiavash
222. Towards Better Understanding Attribution Methods, Sukrut
Rao, Moritz Böhle, Bernt Schiele
223. Proto2Proto: Can You Recognize the Car, the Way I Do? Monish
Keswani, Sriranjani Ramakrishnan, Nishant Reddy, Vineeth N
Balasubramanian
224. Do Explanations Explain? Model Knows Best, Ashkan Khakzar,
Pedram Khorsandi, Rozhin Nobahari, Nassir Navab
225. HINT: Hierarchical Neuron Concept Explainer, Andong Wang,
Wei-Ning Lee, Xiaojuan Qi
226. Deformable ProtoPNet: An Interpretable Image Classifier Using
Deformable Prototypes, Jon Donnelly, Alina Jade Barnett,
Chaofan Chen
227. What Do Navigation Agents Learn About Their Environment?
Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, Roozbeh
Mottaghi
228. A Framework for Learning Ante-Hoc Explainable Models via
Concepts, Anirban Sarkar, Deepak Vijaykeerthy, Anindya Sarkar,
Vineeth N Balasubramanian
229. Exploiting Explainable Metrics for Augmented SGD, Mahdi S.
Hosseini, Mathieu Tuli, Konstantinos N. Plataniotis
230. FAM: Visual Explanations for the Feature Representations From
Deep Convolutional Networks, Yuxi Wu, Changhuai Chen, Jun
Che, Shiliang Pu
231. Interactive Disentanglement: Learning Concepts by Interacting
With Their Prototype Representations, Wolfgang Stammer,
Marius Memmel, Patrick Schramowski, Kristian Kersting
232. B-Cos Networks: Alignment Is All We Need for Interpretability,
Moritz Böhle, Mario Fritz, Bernt Schiele
233. The Flag Median and FlagIRLS, Nathan Mankovich, Emily J.
King, Chris Peterson, Michael Kirby
Transparency, Fairness, Accountability, Privacy & Ethics in Vision
234. Learning Fair Classifiers With Partially Annotated Group Labels,
Sangwon Jung, Sanghyuk Chun, Taesup Moon
235. Estimating Structural Disparities for Face Models, Shervin
Ardeshir, Cristina Segalin, Nathan Kallus
236. Estimating Example Difficulty Using Variance of Gradients,
Chirag Agarwal, Daniel D'souza, Sara Hooker
237. Fairness-Aware Adversarial Perturbation Towards Bias
Mitigation for Deployed Deep Models, Zhibo Wang, Xiaowei
Dong, Henry Xue, Zhifei Zhang, Weifeng Chiu, Tao Wei, Kui Ren
238. Fair Contrastive Learning for Facial Attribute Classification,
Sungho Park, Jewook Lee, Pilhyeon Lee, Sunhee Hwang,
Dohyung Kim, Hyeran Byun
239. Leveraging Adversarial Examples To Quantify Membership
Information Leakage, Ganesh Del Grosso, Hamid Jalalzai, Georg
Pichler, Catuscia Palamidessi, Pablo Piantanida
240. Leveling Down in Computer Vision: Pareto Inefficiencies in Fair
Deep Classifiers, Dominik Zietlow, Michael Lohaus, Guha
Balakrishnan, Matthäus Kleindessner, Francesco Locatello,
Bernhard Schölkopf, Chris Russell
241. Deep Unlearning via Randomized Conditionally Independent
Hessians, Ronak Mehta, Sourav Pal, Vikas Singh, Sathya N. Ravi
242. Equivariance Allows Handling Multiple Nuisance Variables
When Analyzing Pooled Neuroimaging Datasets, Vishnu Suresh
Lokhande, Rudrasis Chakraborty, Sathya N. Ravi, Vikas Singh
243. A Study on the Distribution of Social Biases in Self-Supervised
Learning Visual Models, Kirill Sirotkin, Pablo Carballeira, Marcos
Escudero-Viñolo
Vision & X
244. Cross-Modal Perceptionist: Can Face Geometry Be Gleaned
From Voices? Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann
245. Learning Hierarchical Cross-Modal Association for Co-Speech
Gesture Generation, Xian Liu, Qianyi Wu, Hang Zhou, Yinghao
Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei
Zhou
246. SEEG: Semantic Energized Co-Speech Gesture Generation,
Yuanzhi Liang, Qianyu Feng, Linchao Zhu, Li Hu, Pan Pan, Yi
Yang
247. Mix and Localize: Localizing Sound Sources in Mixtures, Xixi Hu,
Ziyang Chen, Andrew Owens
248. Reading To Listen at the Cocktail Party: Multi-Modal Speech
Separation, Akam Rahimi, Triantafyllos Afouras, Andrew
Zisserman
249. IntentVizor: Towards Generic Query Guided Interactive Video
Summarization, Guande Wu, Jianzhe Lin, Claudio T. Silva
250. M3L: Language-Based Video Editing via Multi-Modal Multi-
Level Transformers, Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton,
Miguel P. Eckstein, William Yang Wang
251. Finding Fallen Objects via Asynchronous Audio-Visual
Integration, Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz,
Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum,
Josh H. McDermott, Antonio Torralba
252. Weakly Paired Associative Learning for Sound and Image
Representations via Bimodal Associative Memory, Sangmin Lee,
Hyung-Il Kim, Yong Man Ro
253. Egocentric Deep Multi-Channel Audio-Visual Active Speaker
Localization, Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu
254. Audio-Visual Generalised Zero-Shot Learning With Cross-Modal
Attention and Language, Otniel-Bogdan Mercea, Lukas Riesch,
A. Sophia Koepke, Zeynep Akata
255. It’s Time for Artistic Correspondence in Music and Video, Dídac
Surís, Carl Vondrick, Bryan Russell, Justin Salamon
256. Self-Supervised Object Detection From Audio-Visual
Correspondence, Triantafyllos Afouras, Yuki M. Asano, Francois
Fagan, Andrea Vedaldi, Florian Metze
257. More Than Words: In-the-Wild Visually-Driven Prosody for
Text-to-Speech, Michael Hassid, Michelle Tadmor Ramanovich,
Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez
258. OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real
Transfer, Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke,
Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
259. A Probabilistic Graphical Model Based on Neural-Symbolic
Reasoning for Visual Relationship Detection, Dongran Yu, Bo
Yang, Qianhao Wei, Anchen Li, Shirui Pan
1700–1800 Plenary 2 (Hall B1)
Chair: Gang Hua (Wormpex AI Research)
Keynote: Toward Integrative AI with Computer Vision,
Xuedong Huang (Microsoft Azure AI)
Abstract: The pace of innovation in AI over the past decade has
been remarkable. Thanks to big data, computing power and
modern network architecture, we are seeing a wave of continuous
breakthroughs find their way into people’s everyday lives.
In my role as a technology leader developing both the science
and engineering of AI, I see the unrealized potential of making
AI more useful in the real world for every person and organization.
While modern AI has reached human parity on a few well-defined,
narrow research benchmarks such as speech recognition,
SuperGLUE, and image captioning, a rapidly growing number
of disjointed AI tasks are needed to mimic human intelligence
in understanding the open and complex world. As each AI
task is often defined by the statistics manifested from large
amounts of task-specific data, we end up building expensive silos
without a synergistic way of knowledge sharing and transferring
among the different AI tasks.
In this keynote, I will share our progress on Integrative AI, a
multi-lingual, multi-modal approach addressing this challenge
using a holistic, semantic representation to unify various tasks
in speech, language, and vision. To apply Integrative AI to computer
vision, we have been developing a foundation model
called Florence, which introduces the concept of a semantic
layer through large-scale image and language pretraining. Our
Florence model distills visual knowledge and reasoning into an
image and text transformer to enable zero-shot and few-shot
capabilities for common computer vision tasks such as recognition,
detection, segmentation, and captioning. Through bridging
the gap between textual representation and various vision
downstream tasks, we not only achieved state-of-the-art results
on dozens of benchmarks such as ImageNet-1K zero-shot, COCO
segmentation, VQA and Kinetics-600, but also discovered novel
results of image understanding.
We are encouraged by these preliminary successes and believe
we have only scratched the surface of Integrative AI. As language
is at the core of human intelligence, I foresee that the semantic
layer will empower computer vision to go beyond visual
perception and connect pixels seamlessly to the core of human
intelligence: intent, reasoning, and decision.

1830–2030 Special Event (Mardi Gras World)


Address: 1380 Port of New Orleans Pl. (refer to map)

Thursday, June 23 (Morning) Program
Thursday, June 23

0700–0830 Breakfast (Halls D-E)

0800–1400 Registration (Great Hall Lobby)

0800–0830 Poster Setup (Halls B2-C)

0830–1018 Oral 3.1.1: Image & Video Synthesis and
Generation (I) (Hall B1)
Papers in this session are in Poster Session 3.1
Chairs: Sharon Xiaolei Huang (Pennsylvania State Univ.)
Shaodi You (Univ. of Amsterdam)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [0830] Diffusion Autoencoders: Toward a Meaningful and
Decodable Representation, Konpat Preechakul, Nattanat
Chatthee, Suttisak Wizadwongsa, Supasorn Suwajanakorn
2. [0835] Polymorphic-GAN: Generating Aligned Samples Across
Multiple Domains With Learned Morph Maps, Seung Wook Kim,
Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler
3. [0840] Polarity Sampling: Quality and Diversity Control of Pre-
Trained Generative Networks via Singular Values, Ahmed Imtiaz
Humayun, Randall Balestriero, Richard Baraniuk
4. [0848] Ensembling Off-the-Shelf Models for GAN Training, Nupur
Kumari, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
5. [0853] Marginal Contrastive Correspondence for Guided Image
Generation, Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui
Zhang, Shijian Lu, Changgong Zhang
6. [0858] GRAM: Generative Radiance Manifolds for 3D-Aware
Image Generation, Yu Deng, Jiaolong Yang, Jianfeng Xiang, Xin
Tong
7. [0906] High-Resolution Image Synthesis With Latent Diffusion
Models, Robin Rombach, Andreas Blattmann, Dominik Lorenz,
Patrick Esser, Björn Ommer
8. [0911] Vector Quantized Diffusion Model for Text-to-Image
Synthesis, Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo
Zhang, Dongdong Chen, Lu Yuan, Baining Guo
9. [0916] ManiTrans: Entity-Level Text-Guided Image Manipulation
via Token-Wise Semantic Alignment and Generation, Jianan
Wang, Guansong Lu, Hang Xu, Zhenguo Li, Chunjing Xu, Yanwei
Fu
10. [0924] Dataset Distillation by Matching Training Trajectories,
George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A.
Efros, Jun-Yan Zhu
11. [0929] Continual Predictive Learning From Videos, Geng Chen,
Wendong Zhang, Han Lu, Siyu Gao, Yunbo Wang, Mingsheng
Long, Xiaokang Yang
12. [0934] Motion-Adjustable Neural Implicit Video Representation,
Long Mai, Feng Liu
13. [0942] Splicing ViT Features for Semantic Appearance Transfer,
Narek Tumanyan, Omer Bar-Tal, Shai Bagon, Tali Dekel
14. [0947] MAT: Mask-Aware Transformer for Large Hole Image
Inpainting, Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia
15. [0952] Day-to-Night Image Synthesis for Training Nighttime
Neural ISPs, Abhijith Punnappurath, Abdullah Abuolaim,
Abdelrahman Abdelhamed, Alex Levinshtein, Michael S. Brown
16. [1000] Smooth-Swap: A Simple Enhancement for Face-Swapping
With Smoothness, Jiseob Kim, Jihoon Lee, Byoung-Tak Zhang
17. [1005] Few-Shot Head Swapping in the Wild, Changyong Shu,
Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing
Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
18. [1010] ClothFormer: Taming Video Virtual Try-On in All Module,
Jianbin Jiang, Tan Wang, He Yan, Junhui Liu

0830–1018 Oral 3.1.2: Deep Learning Architectures &
Techniques (Great Hall A-D)
Papers in this session are in Poster Session 3.1
Chairs: Saining Xie (Facebook AI Research)
Hao Su (UCSD)
Format (5 min. presentation; 3 min. group questions/3 papers)
19. [0830] A-ViT: Adaptive Tokens for Efficient Vision Transformer,
Hongxu Yin, Arash Vahdat, Jose M. Alvarez, Arun Mallya, Jan
Kautz, Pavlo Molchanov
20. [0835] MetaFormer Is Actually What You Need for Vision, Weihao
Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang,
Jiashi Feng, Shuicheng Yan
21. [0840] Reversible Vision Transformers, Karttikeya Mangalam,
Haoqi Fan, Yanghao Li, Chao-Yuan Wu, Bo Xiong, Christoph
Feichtenhofer, Jitendra Malik
22. [0848] Learned Queries for Efficient Local Attention, Moab Arar,
Ariel Shamir, Amit H. Bermano
23. [0853] Shunted Self-Attention via Multi-Scale Token
Aggregation, Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi
Feng, Xinchao Wang
24. [0858] Automatic Relation-Aware Graph Network Proliferation,
Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha,
Qingming Huang
25. [0906] β-DARTS: Beta-Decay Regularization for Differentiable
Architecture Search, Peng Ye, Baopu Li, Yikang Li, Tao Chen,
Jiayuan Fan, Wanli Ouyang
26. [0911] Distribution Consistent Neural Architecture Search, Junyi
Pan, Chong Sun, Yizhou Zhou, Ying Zhang, Chen Li
27. [0916] Training-Free Transformer Architecture Search, Qinqin
Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian,
Jie Chen, Rongrong Ji
28. [0924] TeachAugment: Data Augmentation Optimization Using
Teacher Knowledge, Teppei Suzuki
29. [0929] Knowledge Distillation via the Target-Aware Transformer,
Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang,
Xiaodan Liang, Gang Wang
30. [0934] Knowledge Distillation: A Good Teacher Is Patient and
Consistent, Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa
Markeeva, Rohan Anil, Alexander Kolesnikov
31. [0942] An Image Patch Is a Wave: Phase-Aware Vision MLP, Yehui
Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe
Wang
32. [0947] Dynamic MLP for Fine-Grained Image Classification by
Leveraging Geographical and Temporal Information, Lingfeng
Yang, Xiang Li, Renjie Song, Borui Zhao, Juntian Tao, Shihao Zhou,
Jiajun Liang, Jian Yang
33. [0952] Controllable Dynamic Multi-Task Architectures, Dripta S.
Raychaudhuri, Yumin Suh, Samuel Schulter, Xiang Yu, Masoud
Faraki, Amit K. Roy-Chowdhury, Manmohan Chandraker
34. [1000] Grounded Language-Image Pre-Training, Liunian Harold
Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li,
Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang,
Kai-Wei Chang, Jianfeng Gao
35. [1005] ZZ-Net: A Universal Rotation Equivariant Architecture for
2D Point Clouds, Georg Bökman, Fredrik Kahl, Axel Flinth
36. [1010] CADTransformer: Panoptic Symbol Spotting Transformer
for CAD Drawings, Zhiwen Fan, Tianlong Chen, Peihao Wang,
Zhangyang Wang

0830–1018 Oral 3.1.3: Human Pose Estimation &
Tracking, Localization, and Object Pose
Estimation (Great Hall B-C)
Papers in this session are in Poster Session 3.1
Chairs: Leonid Sigal (Univ. of British Columbia)
Georgios Pavlakos (UC Berkeley)
Angela Yao (National Univ. of Singapore)
Format (5 min. presentation; 3 min. group questions/3 papers)
37. [0830] Adversarial Parametric Pose Prior, Andrey Davydov,
Anastasia Remizova, Victor Constantin, Sina Honari, Mathieu
Salzmann, Pascal Fua
38. [0835] Temporal Feature Alignment and Mutual Information
Maximization for Video-Based Human Pose Estimation,
Zhenguang Liu, Runyang Feng, Haoming Chen, Shuang Wu, Yixing
Gao, Yunjun Gao, Xiang Wang
39. [0840] PoseTriplet: Co-Evolving 3D Human Pose Estimation,
Imitation, and Hallucination Under Self-Supervision, Kehong
Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang,
Michael Bi Mi, Jiashi Feng, Xinchao Wang
40. [0848] Generalizable Human Pose Triangulation, Kristijan Bartol,
David Bojanić, Tomislav Petković, Tomislav Pribanić
41. [0853] GLAMR: Global Occlusion-Aware Human Mesh Recovery
With Dynamic Cameras, Ye Yuan, Umar Iqbal, Pavlo Molchanov,
Kris Kitani, Jan Kautz
42. [0858] Bailando: 3D Dance Generation by Actor-Critic GPT With
Choreographic Memory, Li Siyao, Weijiang Yu, Tianpei Gu,
Chunze Lin, Quan Wang, Chen Qian, Chen Change Loy, Ziwei Liu
43. [0906] Contextual Instance Decoupling for Robust Multi-Person
Pose Estimation, Dongkai Wang, Shiliang Zhang
44. [0911] End-to-End Multi-Person Pose Estimation With
Transformers, Dahu Shi, Xing Wei, Liangqi Li, Ye Ren, Wenming
Tan
45. [0916] Meta Agent Teaming Active Learning for Pose Estimation,
Jia Gong, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, Jun Liu
46. [0924] Keypoint Transformer: Solving Joint Identification in
Challenging Hands and Object Interactions for Accurate 3D Pose
Estimation, Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad,
Vincent Lepetit
47. [0929] Not All Tokens Are Equal: Human-Centric Visual Analysis
via Token Clustering Transformer, Wang Zeng, Sheng Jin, Wentao
Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
48. [0934] Occlusion-Robust Face Alignment Using a Viewpoint-
Invariant Hierarchical Network Architecture, Congcong Zhu,
Xintong Wan, Shaorong Xie, Xiaoqiang Li, Yinzheng Gu
49. [0942] LASER: LAtent SpacE Rendering for 2D Visual
Localization, Zhixiang Min, Naji Khosravan, Zachary Bessinger,
Manjunath Narayana, Sing Bing Kang, Enrique Dunn, Ivaylo
Boyadzhiev
50. [0947] Learning To Detect Scene Landmarks for Camera
Localization, Tien Do, Ondrej Miksik, Joseph DeGol, Hyun Soo
Park, Sudipta N. Sinha
51. [0952] Geometric Transformer for Fast and Robust Point Cloud
Registration, Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo,
Yuxing Peng, Kai Xu
52. [1000] ARCS: Accurate Rotation and Correspondence Search,
Liangzu Peng, Manolis C. Tsakiris, René Vidal
53. [1005] FisherMatch: Semi-Supervised Rotation Regression via
Entropy-Based Filtering, Yingda Yin, Yingcheng Cai, He Wang,
Baoquan Chen
54. [1010] Uni6D: A Unified CNN Framework Without Projection
Breakdown for 6D Pose Estimation, Xiaoke Jiang, Donghai Li, Hao
Chen, Ye Zheng, Rui Zhao, Liwei Wu

1030–1100 Morning Break (Halls B2-C)

1000–1600 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1000–1200 Demos (Halls B2-C Demo Area)
• Real-Time, Accurate, and Consistent Video Semantic
Segmentation via Unsupervised Adaptation and Cross-Unit
Deployment on Mobile Device, Hyojin Park, Alan Yessenbayev,
Tushar Singhal, Navin Kumar Adhikari, Yizhe Zhang, Shubhankar
Borse, Hong Cai, Nilesh Pandey, Fei Yin, Frank Mayer, Balaji Calidas,
Fatih Porikli (Qualcomm AI Research)
• A Low-Cost & Real-Time Motion Capture System, Anargyros
Chatzitofis, Georgios Albanis, Nikolaos Zioulis, Spyridon Thermos
(Codewheel; Univ. of Thessaly)
• GeoEngine: A Platform for Production-Ready Geospatial
Research, Sagar Verma, Siddharth Gupta, Hal Shin, Akash
Panigrahi, Shubham Goswami, Shweta Pardeshi, Natanael Exe,
Ujwal Dutta, Tanka Joshi, Nitin Bhojwani (Université Paris-Saclay,
CentraleSupélec, Inria, Centre de Vision Numérique, Granular AI)
• DeepLIIF: An Online Platform for Quantification of Clinical
Pathology Slides, Parmida Ghahremani, Joseph Marino, Ricardo
Dodds, Saad Nadeem (Memorial Sloan Kettering Cancer Center)
• Talking Face Generation With Multilingual TTS, Hyoung-Kyu Song,
Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho,
Dongho Choi, Kang-wook Kim, Youseong Lee (MINDsLab Inc.;
KAIST; Seoul National Univ.)
• [Virtual] Scenic: A JAX Library for Computer Vision Research and
Beyond, Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab,
Matthias Minderer, Yi Tay (Google Brain & Google Research)
• [Virtual] BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops
to Distributed Cluster, Jason Dai, Ding Ding, Dongjie Shi,
Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song,
Yang Wang, Yiquan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina
Chen, Junwei Deng, Ge Song (Intel)
• [Virtual] PyMiceTracking: An Open-Source Toolbox for Real-Time
Behavioral Neuroscience Experiments, Richardson Menezes, Aron
de Miranda, Helton Maia (Federal Univ. of Rio Grande do Norte)

1000–1230 Poster 3.1 (Halls B2-C)
Image & Video Synthesis and Generation
55. OSSGAN: Open-Set Semi-Supervised Image Generation, Kai
Katsumata, Duc Minh Vo, Hideki Nakayama
56. Attribute Group Editing for Reliable Few-Shot Image
Generation, Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu,
Xin Jin, Dandan Tu, Qingming Huang
57. Few Shot Generative Model Adaption via Relaxed Spatial
Structural Alignment, Jiayu Xiao, Liang Li, Chaofei Wang, Zheng-
Jun Zha, Qingming Huang
58. Semantic-Shape Adaptive Feature Modulation for Semantic
Image Synthesis, Zhengyao Lv, Xiaoming Li, Zhenxing Niu, Bing
Cao, Wangmeng Zuo
59. Retrieval-Based Spatially Adaptive Normalization for Semantic
Image Synthesis, Yupeng Shi, Xiao Liu, Yuxiang Wei, Zhongqin
Wu, Wangmeng Zuo
60. Generative Flows With Invertible Attentions, Rhea Sanjay
Sukthanker, Zhiwu Huang, Suryansh Kumar, Radu Timofte, Luc
Van Gool
61. Style-Structure Disentangled Features and Normalizing Flows
for Diverse Icon Colorization, Yuan-kui Li, Yun-Hsuan Lien, Yu-
Shuen Wang
76. Spatially-Adaptive Multilayer Selection for GAN Inversion and
Editing, Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang,
Jun-Yan Zhu, Krishna Kumar Singh
77. On Aliased Resizing and Surprising Subtleties in GAN
Evaluation, Gaurav Parmar, Richard Zhang, Jun-Yan Zhu
78. Dual-Path Image Inpainting With Auxiliary GAN Inversion,
Wentao Wang, Li Niu, Jianfu Zhang, Xue Yang, Liqing Zhang
79. InOut: Diverse Image Outpainting via GAN Inversion, Yen-Chi
Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey
Tulyakov, Ming-Hsuan Yang
80. Diverse Plausible 360-Degree Image Outpainting for Efficient
3DCG Background Creation, Naofumi Akimoto, Yuhi Matsuo,
Yoshimitsu Aoki
81. Contextual Outpainting With Object-Level Contrastive
Learning, Jiacheng Li, Chang Chen, Zhiwei Xiong
82. RePaint: Inpainting Using Denoising Diffusion Probabilistic
Models, Andreas Lugmayr, Martin Danelljan, Andres Romero,
Fisher Yu, Radu Timofte, Luc Van Gool
83. Perception Prioritized Training of Diffusion Models, Jooyoung
Choi, Jungbeom Lee, Chaehun Shin, Sungwon Kim, Hyunwoo
Kim, Sungroh Yoon
84. Dynamic Dual-Output Diffusion Models, Yaniv Benny, Lior Wolf
62. SemanticStyleGAN: Learning Compositional Generative Priors 85. Generating High Fidelity Data From Low-Density Regions Using
for Controllable Image Synthesis and Editing, Yichun Shi, Xiao Diffusion Models, Vikash Sehwag, Caner Hazirbas, Albert Gordo,
Yang, Yangyue Wan, Xiaohui Shen Firat Ozgenel, Cristian Canton
63. Manifold Learning Benefits GANs, Yao Ni, Piotr Koniusz, Richard 86. Global Context With Discrete Diffusion in Vector Quantised
Hartley, Richard Nock Modelling for Image Generation, Minghui Hu, Yujie Wang, Tat-
64. DO-GAN: A Double Oracle Framework for Generative
Jen Cham, Jianfei Yang, P.N. Suganthan
Adversarial Networks, Aye Phyu Phyu Aung, Xinrun Wang, 87. Bridging Global Context Interactions for High-Fidelity Image
Runsheng Yu, Bo An, Senthilnath Jayavelu, Xiaoli Li Completion, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh
65. Improving GAN Equilibrium by Raising Spatial Awareness,
Phung
Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, 88. Autoregressive Image Generation Using Residual Quantization,
Hongdong Li, Bolei Zhou Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han
66. Feature Statistics Mixing Regularization for Generative 89. Arbitrary-Scale Image Synthesis, Evangelos Ntavelis, Mohamad
Adversarial Networks, Junho Kim, Yunjey Choi, Youngjung Uh Shahbazi, Iason Kastanis, Radu Timofte, Martin Danelljan, Luc
67. StyleSwin: Transformer-Based GAN for High-Resolution Image
Van Gool
Generation, Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, 90. Cluster-Guided Image Synthesis With Unconditional Models,
Dong Chen, Fang Wen, Yong Wang, Baining Guo Markos Georgopoulos, James Oldfield, Grigorios G. Chrysos,
68. MaskGIT: Masked Generative Image Transformer, Huiwen
Yannis Panagakis
Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman Segmentation, Grouping and Shape Analysis
69. StyTr2: Image Style Transfer With Transformers, Yingying Deng, 91. Dynamic Prototype Convolution Network for Few-Shot
Fan Tang, Weiming Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Semantic Segmentation, Jie Liu, Yanqi Bao, Guo-Sen Xie, Huan
Changsheng Xu Xiong, Jan-Jakob Sonke, Efstratios Gavves
70. Style Transformer for Image Inversion and Editing, Xueqi Hu, 92. Generalized Few-Shot Semantic Segmentation, Zhuotao Tian,
Qiusheng Huang, Zhengyi Shi, Siyuan Li, Changxin Gao, Li Sun, Xin Lai, Li Jiang, Shu Liu, Michelle Shu, Hengshuang Zhao, Jiaya
Qingli Li Jia
71. Reduce Information Loss in Transformers for Pluralistic Image 93. Learning Non-Target Knowledge for Few-Shot Semantic
Inpainting, Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Segmentation, Yuanwei Liu, Nian Liu, Qinglong Cao, Xiwen Yao,
Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu Junwei Han, Ling Shao
72. Incremental Transformer Structure Enhanced Image Inpainting 94. Decoupling Zero-Shot Semantic Segmentation, Jian Ding, Nan
With Masking Positional Encoding, Qiaole Dong, Chenjie Cao, Xue, Gui-Song Xia, Dengxin Dai
Yanwei Fu 95. Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive
73. UniCoRN: A Unified Conditional Image Repainting Network, Semantic Segmentation, Ruihuang Li, Shuai Li, Chenhang He,
Jimeng Sun, Shuchen Weng, Zheng Chang, Si Li, Boxin Shi Yabin Zhang, Xu Jia, Lei Zhang
74. High-Fidelity GAN Inversion for Image Attribute Editing, Tengfei 96. ContrastMask: Contrastive Learning To Segment Every Thing,
Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen Xuehui Wang, Kai Zhao, Ruixin Zhang, Shouhong Ding, Yan
75. HyperInverter: Improving StyleGAN Inversion via Wang, Wei Shen
Hypernetwork, Tan M. Dinh, Anh Tuan Tran, Rang Nguyen, Binh-
Son Hua
97. The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference, R. Kenny Jones, Aalia Habib, Rana Hanocka, Daniel Ritchie
98. AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation, Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi
99. APES: Articulated Part Extraction From Sprite Sheets, Zhan Xu, Matthew Fisher, Yang Zhou, Deepali Aneja, Rushikesh Dudhat, Li Yi, Evangelos Kalogerakis
100. GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation, Alberto Bailoni, Constantin Pape, Nathan Hütsch, Steffen Wolf, Thorsten Beier, Anna Kreshuk, Fred A. Hamprecht
101. CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision, Ke Zhang, Xiahai Zhuang
102. Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images, Huisi Wu, Zhaoze Wang, Youyi Song, Lin Yang, Jing Qin
103. C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image, Zhang Chen, Zhiqiang Tian, Jihua Zhu, Ce Li, Shaoyi Du
104. CRIS: CLIP-Driven Referring Image Segmentation, Zhaoqing Wang, Yu Lu, Qiang Li, Xunqiang Tao, Yandong Guo, Mingming Gong, Tongliang Liu
105. MatteFormer: Transformer-Based Image Matting via Prior-Tokens, GyuTae Park, SungJoon Son, JaeYoung Yoo, SeHo Kim, Nojun Kwak
106. Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation, Yutong Dai, Brian Price, He Zhang, Chunhua Shen
107. Pyramid Grafting Network for One-Stage High Resolution Saliency Detection, Chenxi Xie, Changqun Xia, Mingcan Ma, Zhirui Zhao, Xiaowu Chen, Jia Li
108. Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection, Yifan Wang, Wenbo Zhang, Lijun Wang, Ting Liu, Huchuan Lu
109. Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation, Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You
110. GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings, Zhaohua Zheng, Jianfang Li, Lingjie Zhu, Honghua Li, Frank Petzold, Ping Tan
111. Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport, Mahdi Saleh, Shun-Cheng Wu, Luca Cosmo, Nassir Navab, Benjamin Busam, Federico Tombari
112. CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly, Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, Hao Zhang
113. RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures, Chengjie Niu, Manyi Li, Kai Xu, Hao Zhang
114. Discovering Objects That Can Move, Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert
115. PatchFormer: An Efficient Point Transformer With Patch Attention, Cheng Zhang, Haocheng Wan, Xinyi Shen, Zizhao Wu
116. Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap, Jinke Li, Xiao He, Yang Wen, Yuan Gao, Xiaoqiang Cheng, Dan Zhang
117. SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation, Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu
118. An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation, Cheng-Kun Yang, Ji-Jia Wu, Kai-Syun Chen, Yung-Yu Chuang, Yen-Yu Lin
119. Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation, Hanyu Shi, Jiacheng Wei, Ruibo Li, Fayao Liu, Guosheng Lin
120. Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders, Mikaela Angelina Uy, Yen-Yu Chang, Minhyuk Sung, Purvi Goel, Joseph G. Lambourne, Tolga Birdal, Leonidas J. Guibas
Deep Learning Architectures & Techniques
121. Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? Jisoo Mok, Byunggook Na, Ji-Hoon Kim, Dongyoon Han, Sungroh Yoon
122. BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule, Miao Zhang, Shirui Pan, Xiaojun Chang, Steven Su, Jilin Hu, Gholamreza (Reza) Haffari, Bin Yang
123. Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search, Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang
124. Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search, Han Xiao, Ziwei Wang, Zheng Zhu, Jie Zhou, Jiwen Lu
125. GreedyNASv2: Greedier Search With a Greedy Path Filter, Tao Huang, Shan You, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu
126. Neural Architecture Search With Representation Mutual Information, Xiawu Zheng, Xiang Fei, Lei Zhang, Chenglin Wu, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Rongrong Ji
127. Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search, Pengtao Xie, Xuefeng Du
128. Knowledge Distillation With the Reused Teacher Classifier, Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen
129. Self-Distillation From the Last Mini-Batch for Consistency Regularization, Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo
130. Decoupled Knowledge Distillation, Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang
131. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs, Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding
132. A ConvNet for the 2020s, Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
133. Beyond Fixation: Dynamic Window Visual Transformer, Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang
134. Lite Vision Transformer With Enhanced Self-Attention, Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zijun Wei, Zhe Lin, Alan Yuille
135. Swin Transformer V2: Scaling Up Capacity and Resolution, Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

136. The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy, Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang
137. MulT: An End-to-End Multitask Learning Transformer, Deblina Bhattacharjee, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann
138. Towards Robust Vision Transformer, Xiaofeng Mao, Gege Qi, Yuefeng Chen, Xiaodan Li, Ranjie Duan, Shaokai Ye, Yuan He, Hui Xue
139. DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers, Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, Dacheng Tao
140. MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens, Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian
141. NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition, Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren
142. TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
143. Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation, Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan
144. Scaling Vision Transformers, Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer
145. Bridged Transformer for Vision and Point Cloud 3D Object Detection, Yikai Wang, TengQi Ye, Lele Cao, Wenbing Huang, Fuchun Sun, Fengxiang He, Dacheng Tao
146. CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo
147. TransMix: Attend To Mix for Vision Transformers, Jie-Neng Chen, Shuyang Sun, Ju He, Philip H.S. Torr, Alan Yuille, Song Bai
148. MiniViT: Compressing Vision Transformers With Weight Multiplexing, Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
149. Fine-Tuning Image Transformers Using Learnable Memory, Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson
150. Patch Slimming for Efficient Vision Transformers, Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao
151. CMT: Convolutional Neural Networks Meet Vision Transformers, Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu
152. Multimodal Token Fusion for Vision Transformers, Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang
Efficient Learning & Inference
153. CAFE: Learning To Condense Dataset by Aligning Features, Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You
154. Lite-MDETR: A Lightweight Multi-Modal Detector, Qian Lou, Yen-Chang Hsu, Burak Uzkent, Ting Hua, Yilin Shen, Hongxia Jin
155. DeeCap: Dynamic Early Exiting for Efficient Image Captioning, Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian
156. Searching the Deployable Convolution Neural Networks for GPUs, Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea
157. Active Learning by Feature Mixing, Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Gholamreza (Reza) Haffari, Anton van den Hengel, Javen Qinfeng Shi
158. When To Prune? A Policy Towards Early Structural Pruning, Maying Shen, Pavlo Molchanov, Hongxu Yin, Jose M. Alvarez
159. Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning, Jian Meng, Li Yang, Jinwoo Shin, Deliang Fan, Jae-sun Seo
160. How Well Do Sparse ImageNet Models Transfer? Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh
161. Rep-Net: Efficient On-Device Learning via Feature Reprogramming, Li Yang, Adnan Siraj Rakin, Deliang Fan
162. CHEX: CHannel EXploration for CNN Model Compression, Zejiang Hou, Minghai Qin, Fei Sun, Xiaolong Ma, Kun Yuan, Yi Xu, Yen-Kuang Chen, Rong Jin, Yuan Xie, Sun-Yuan Kung
163. HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks, Miao Yin, Yang Sui, Wanzhao Yang, Xiao Zang, Yu Gong, Bo Yuan
164. AdaViT: Adaptive Vision Transformers for Efficient Image Recognition, Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim
165. Cross-Image Relational Knowledge Distillation for Semantic Segmentation, Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang
166. Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error, Yongkweon Jeon, Chungman Lee, Eulrang Cho, Yeonju Ro
167. IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization, Yunshan Zhong, Mingbao Lin, Gongrui Nan, Jianzhuang Liu, Baochang Zhang, Yonghong Tian, Rongrong Ji
168. DECORE: Deep Compression With Reinforcement Learning, Manoj Alwani, Yang Wang, Vashisht Madhavan
169. Towards Efficient and Scalable Sharpness-Aware Minimization, Yong Liu, Siqi Mai, Xiangning Chen, Cho-Jui Hsieh, Yang You
170. AEGNN: Asynchronous Event-Based Graph Neural Networks, Simon Schaefer, Daniel Gehrig, Davide Scaramuzza
171. DiSparse: Disentangled Sparsification for Multitask Model Compression, Xinglong Sun, Ali Hassani, Zhangyang Wang, Gao Huang, Humphrey Shi
172. Multi-Modal Extreme Classification, Anshul Mittal, Kunal Dahiya, Shreya Malani, Janani Ramaswamy, Seba Kuruvilla, Jitendra Ajmera, Keng-hao Chang, Sumeet Agarwal, Purushottam Kar, Manik Varma
173. A Sampling-Based Approach for Efficient Clustering in Large Datasets, Georgios Exarchakis, Omar Oubari, Gregor Lenz
174. Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction, Hyungjin Chung, Byeongsu Sim, Jong Chul Ye
175. Learnable Lookup Table for Neural Network Quantization, Longguang Wang, Xiaoyu Dong, Yingqian Wang, Li Liu, Wei An, Yulan Guo
176. Instance-Aware Dynamic Neural Network Quantization, Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao
177. Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation, Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo

178. Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction, Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, Nilanjan Ray
179. Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation, Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma
180. PokeBNN: A Binary Pursuit of Lightweight Accuracy, Yichi Zhang, Zhiru Zhang, Lukasz Lew
181. Automated Progressive Learning for Efficient Training of Vision Transformers, Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang
182. DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos, Mathias Parger, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger
183. Channel Balancing for Accurate Quantization of Winograd Convolutions, Vladimir Chikin, Vladimir Kryzhanovskiy
184. ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching, Yan Shi, Jun-Xiong Cai, Yoli Shavit, Tai-Jiang Mu, Wensen Feng, Kai Zhang
185. Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs, Paul Wimmer, Jens Mehnert, Alexandru Condurache
186. AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation, Ting-An Chen, De-Nian Yang, Ming-Syan Chen
187. TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing, Jierun Chen, Tianlang He, Weipeng Zhuo, Li Ma, Sangtae Ha, S.-H. Gary Chan
188. SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems, Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H.T. Kung, Ziyun Li
189. TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed, Shian Du, Yihong Luo, Wei Chen, Jian Xu, Delu Zeng
Physics-Based Vision and Shape-From-X
190. DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation, Jieji Ren, Feishi Wang, Jiahao Zhang, Qian Zheng, Mingjun Ren, Boxin Shi
191. Universal Photometric Stereo Network Using Global Lighting Contexts, Satoshi Ikehata
192. Uncertainty-Aware Deep Multi-View Photometric Stereo, Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool
193. Fast Light-Weight Near-Field Photometric Stereo, Daniel Lichy, Soumyadip Sengupta, David W. Jacobs
194. Glass Segmentation Using Intensity and Spectral Polarization Cues, Haiyang Mei, Bo Dong, Wen Dong, Jiaxi Yang, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang
195. Shape From Polarization for Complex Scenes in the Wild, Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen
196. Deep Depth From Focus With Differential Focus Volume, Fengting Yang, Xiaolei Huang, Zihan Zhou
197. Optimal LED Spectral Multiplexing for NIR2RGB Translation, Lei Liu, Yuze Chen, Junchi Yan, Yinqiang Zheng
198. Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements, Yasuto Nagase, Takahiro Kushida, Kenichiro Tanaka, Takuya Funatomi, Yasuhiro Mukaigawa
199. NAN: Noise-Aware NeRFs for Burst-Denoising, Naama Pearl, Tali Treibitz, Simon Korman
200. Estimating Fine-Grained Noise Model via Contrastive Learning, Yunhao Zou, Ying Fu
201. Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders, Maksim Makarenko, Arturo Burguete-Lopez, Qizhou Wang, Fedor Getman, Silvio Giancola, Bernard Ghanem, Andrea Fratalocchi
202. MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution, Wuyuan Xie, Tengcong Huang, Miaohui Wang
203. PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images, Zhen Li, Lingli Wang, Xiang Huang, Cihui Pan, Jiaqi Yang
Visual Reasoning
204. Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors, Yun-Chun Chen, Haoda Li, Dylan Turpin, Alec Jacobson, Animesh Garg
205. Learning To Anticipate Future With Dynamic Context Removal, Xinyu Xu, Yong-Lu Li, Cewu Lu
206. Self-Supervised Spatial Reasoning on Multi-View Line Drawings, Siyuan Xiang, Anbang Yang, Yanfei Xue, Yaoqing Yang, Chen Feng
207. Contextual Debiasing for Visual Recognition With Causal Mechanisms, Ruyang Liu, Hao Liu, Ge Li, Haodi Hou, TingHao Yu, Tao Yang
3D From Multi-View & Sensors
208. Relative Pose From a Calibrated and an Uncalibrated Smartphone Image, Yaqing Ding, Daniel Barath, Jian Yang, Zuzana Kukelova
209. Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation, Guanting Dong, Yueyi Zhang, Hanlin Li, Xiaoyan Sun, Zhiwei Xiong
210. NICE-SLAM: Neural Implicit Scalable Encoding for SLAM, Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys
211. NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning, Tony Ng, Hyo Jin Kim, Vincent T. Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk, Chris Sweeney
212. ScaleNet: A Shallow Architecture for Scale Estimation, Axel Barroso-Laguna, Yurun Tian, Krystian Mikolajczyk
213. Camera Pose Estimation Using Implicit Distortion Models, Linfei Pan, Marc Pollefeys, Viktor Larsson
214. GIFS: Neural Implicit Function for General Shape Representation, Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang
215. Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds, Tianyang Li, Xin Wen, Yu-Shen Liu, Hua Su, Zhizhong Han
216. SPAMs: Structured Implicit Parametric Models, Pablo Palafox, Nikolaos Sarafianos, Tony Tung, Angela Dai
217. Deblur-NeRF: Neural Radiance Fields From Blurry Images, Li Ma, Xiaoyu Li, Jing Liao, Qi Zhang, Xuan Wang, Jue Wang, Pedro V. Sander

218. Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation, Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas J. Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser
219. Depth-Supervised NeRF: Fewer Views and Faster Training for Free, Kangle Deng, Andrew Liu, Jun-Yan Zhu, Deva Ramanan
220. Dense Depth Priors for Neural Radiance Fields From Sparse Input Views, Barbara Roessle, Jonathan T. Barron, Ben Mildenhall, Pratul P. Srinivasan, Matthias Nießner
221. EfficientNeRF - Efficient Neural Radiance Fields, Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia
222. InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering, Mijeong Kim, Seonguk Seo, Bohyung Han
223. Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs, Haithem Turki, Deva Ramanan, Mahadev Satyanarayanan
224. Urban Radiance Fields, Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Thomas Funkhouser, Vittorio Ferrari
225. Hallucinated Neural Radiance Fields in the Wild, Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, Jue Wang
226. Towards Multimodal Depth Estimation From Light Fields, Titus Leistner, Radek Mackowiak, Lynton Ardizzone, Ullrich Köthe, Carsten Rother
227. Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo, Xihao Chen, Zhiwei Xiong, Zhen Cheng, Jiayong Peng, Yueyi Zhang, Zheng-Jun Zha
228. Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching, Donghun Kang, Hyeonjoong Jang, Jungeon Lee, Chong-Min Kyung, Min H. Kim
229. Attention Concatenation Volume for Accurate and Efficient Stereo Matching, Gangwei Xu, Junda Cheng, Peng Guo, Xin Yang
230. Generalized Binary Search Network for Highly-Efficient Multi-View Stereo, Zhenxing Mi, Chang Di, Dan Xu
231. Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective, Jiawei Zhang, Xiang Wang, Xiao Bai, Chen Wang, Lei Huang, Yimin Chen, Lin Gu, Jun Zhou, Tatsuya Harada, Edwin R. Hancock
232. GraftNet: Towards Domain Generalized Stereo Matching With a Broad-Spectrum and Task-Oriented Feature, Biyang Liu, Huimin Yu, Guodong Qi
233. ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks, WeiQin Chuah, Ruwan Tennakoon, Reza Hoseinnezhad, Alireza Bab-Hadiashar, David Suter
234. ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation, Isabella Liu, Edward Yang, Jianyu Tao, Rui Chen, Xiaoshuai Zhang, Qing Ran, Zhu Liu, Hao Su
235. FoggyStereo: Stereo Matching With Fog Volume Representation, Chengtang Yao, Lidong Yu
Pose Estimation & Tracking
236. Multi-Person Extreme Motion Prediction, Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer
237. Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation, Nan Xue, Tianfu Wu, Gui-Song Xia, Liangpei Zhang
238. AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation, Mohsen Gholami, Bastian Wandt, Helge Rhodin, Rabab Ward, Z. Jane Wang
239. Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation, Lei Jin, Chenyang Xu, Xiaojuan Wang, Yabo Xiao, Yandong Guo, Xuecheng Nie, Jian Zhao
240. Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation, Zitian Wang, Xuecheng Nie, Xiaochao Qu, Yunpeng Chen, Si Liu
241. Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video, Erik Gärtner, Mykhaylo Andriluka, Hongyi Xu, Cristian Sminchisescu
242. Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization, Yu Zhan, Fenghai Li, Renliang Weng, Wongun Choi
243. Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation, Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han
244. Location-Free Human Pose Estimation, Xixia Xu, Yingguo Gao, Ke Yan, Xue Lin, Qi Zou
245. MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation, Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool
246. Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision, Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt
247. Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors, Xinyu Yi, Yuxiao Zhou, Marc Habermann, Soshi Shimada, Vladislav Golyanik, Christian Theobalt, Feng Xu
248. PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound, Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park
249. Differentiable Dynamics for Articulated 3D Human Motion Reconstruction, Erik Gärtner, Mykhaylo Andriluka, Erwin Coumans, Cristian Sminchisescu
250. COAP: Compositional Articulated Occupancy of People, Marko Mihajlovic, Shunsuke Saito, Aayush Bansal, Michael Zollhöfer, Siyu Tang
251. Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video, Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Hong-Yuan Mark Liao
252. SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration, Zhi Chen, Kun Sun, Fan Yang, Wenbing Tao
253. MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video, Jinlu Zhang, Zhigang Tu, Jianyu Yang, Yujin Chen, Junsong Yuan
254. Putting People in Their Place: Monocular Regression of 3D People in Depth, Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black
255. FLAG: Flow-Based 3D Avatar Generation From Sparse Observations, Sadegh Aliakbarian, Pashmina Cameron, Federica Bogo, Andrew Fitzgibbon, Thomas J. Cashman
256. GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping, Omid Taheri, Vasileios Choutas, Michael J. Black, Dimitrios Tzionas
257. Capturing and Inferring Dense Full-Body Human-Scene Contact, Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, Michael J. Black

258. BodyMap: Learning Full-Body Dense Correspondence Map,
Anastasia Ianina, Nikolaos Sarafianos, Yuanlu Xu, Ignacio Rocco,
Tony Tung
259. ICON: Implicit Clothed Humans Obtained From Normals,
Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. Black

1130–1330 Lunch (Halls D-E)

Thursday, June 23 (Afternoon) Program
1300–1330 Poster Switch/Setup (Halls B2-C)

1330–1500 Oral 3.2.1: Security, Transparency, Fairness, Accountability, Privacy & Ethics in Vision (Great Hall B-C)
Papers in this session are in Poster Session 3.2
Chairs: Mario Fritz (Helmholtz Center for Information Security)
Tatiana Tommasi (Politecnico di Torino)
Junzhou Huang (Univ. of Texas at Arlington)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [1330] Adversarial Texture for Fooling Person Detectors in the Physical World, Zhanhao Hu, Siyuan Huang, Xiaopei Zhu, Fuchun Sun, Bo Zhang, Xiaolin Hu
2. [1335] Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World, Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu
3. [1340] Enhancing Classifier Conservativeness and Robustness by Polynomiality, Ziqi Wang, Marco Loog
4. [1348] Backdoor Attacks on Self-Supervised Learning, Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
5. [1353] Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks, Xiangyu Qi, Tinghao Xie, Ruizhe Pan, Jifeng Zhu, Yong Yang, Kai Bu
6. [1358] Few-Shot Backdoor Defense Using Shapley Estimation, Jiyang Guan, Zhuozhuo Tu, Ran He, Dacheng Tao
7. [1406] Better Trigger Inversion Optimization in Backdoor Scanning, Guanhong Tao, Guangyu Shen, Yingqi Liu, Shengwei An, Qiuling Xu, Shiqing Ma, Pan Li, Xiangyu Zhang
8. [1411] Bandits for Structure Perturbation-Based Black-Box Attacks to Graph Neural Networks With Theoretical Guarantees, Binghui Wang, Youqi Li, Pan Zhou
9. [1416] Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching, Ozan Özdenizci, Robert Legenstein
10. [1424] LAS-AT: Adversarial Training With Learnable Attack Strategy, Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao
11. [1429] Subspace Adversarial Training, Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang
12. [1434] Pyramid Adversarial Training Improves ViT Performance, Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

1330–1500 Oral 3.2.2: Image & Video Synthesis and Generation (II); Video Analysis & Understanding (Hall B1)
Papers in this session are in Poster Session 3.2
Chairs: Seon Joo Kim (Yonsei Univ.)
Hilde Kühne (Goethe Univ. Frankfurt)
Yogesh Rawat (Univ. of Central Florida)
Format (5 min. presentation; 3 min. group questions/3 papers)
16. [1330] Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models, Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, Michal Irani
17. [1335] GAN-Supervised Dense Visual Alignment, William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei A. Efros, Eli Shechtman
18. [1340] Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator, Yuxin Kong, Canjie Luo, Weihong Ma, Qiyuan Zhu, Shenggao Zhu, Nicholas Yuan, Lianwen Jin
19. [1348] Text2Mesh: Text-Driven Neural Stylization for Meshes, Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, Rana Hanocka
20. [1353] StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation, Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, Ira Kemelmacher-Shlizerman
21. [1358] Physical Simulation Layer for Accurate 3D Modeling, Mariem Mezghanni, Théo Bodrito, Malika Boulkenafed, Maks Ovsjanikov
22. [1406] Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time, Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Jingyi Yu, Lan Xu
23. [1411] Neural Texture Extraction and Distribution for Controllable Person Image Synthesis, Yurui Ren, Xiaoqing Fan, Ge Li, Shan Liu, Thomas H. Li
24. [1416] I M Avatar: Implicit Morphable Head Avatars From Videos, Yufeng Zheng, Victoria Fernández Abrevaya, Marcel C. Bühler, Xu Chen, Michael J. Black, Otmar Hilliges
25. [1424] E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations, Jongwan Kim, DongJin Lee, Byunggook Na, Seongsik Park, Sungroh Yoon
26. [1429] RCL: Recurrent Continuous Localization for Temporal Action Detection, Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan
13. [1442] Fingerprinting Deep Neural Networks Globally via 27. [1434] Self-Supervised Predictive Convolutional Attentive Block
Universal Adversarial Perturbations, Zirui Peng, Shaofeng Li, for Anomaly Detection, Nicolae-Cătălin Ristea, Neelu Madan,
Guoxing Chen, Cheng Zhang, Haojin Zhu, Minhui Xue Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan,
14. [1447] Robust Image Forgery Detection Over Online Social Thomas B. Moeslund, Mubarak Shah
Network Shared Images, Haiwei Wu, Jiantao Zhou, Jinyu Tian, Jun 28. [1442] MeMViT: Memory-Augmented Multiscale Vision
Liu Transformer for Efficient Long-Term Video Recognition, Chao-
15. [1452] Quantifying Societal Bias Amplification in Image Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong,
Captioning, Yusuke Hirota, Yuta Nakashima, Noa Garcia Jitendra Malik, Christoph Feichtenhofer
29. [1447] TubeR: Tubelet Transformer for Video Action Detection,
Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Bing Shuai,
Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide
Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe
30. [1452] MixFormer: End-to-End Tracking With Iterative Mixed
Attention, Yutao Cui, Cheng Jiang, Limin Wang, Gangshan Wu

Thursday, June 23 (Afternoon) Program
1330–1500 Oral 3.2.3: Recognition, Learning for Vision, and Robot Vision (Great Hall A-D)
Papers in this session are in Poster Session 3.2
Chairs: Kris Kitani (Carnegie Mellon Univ.)
R. Venkatesh Babu (Indian Inst. of Science)
Noha Radwan (Google)
Format (5 min. presentation; 3 min. group questions/3 papers)
31. [1330] DN-DETR: Accelerate DETR Training by Introducing Query DeNoising, Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang
32. [1335] Proper Reuse of Image Classification Features Improves Object Detection, Cristina Vasconcelos, Vighnesh Birodkar, Vincent Dumoulin
33. [1340] Boosting 3D Object Detection by Simulating Multimodality on Point Clouds, Wu Zheng, Mingxuan Hong, Li Jiang, Chi-Wing Fu
34. [1348] TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation, Ruotong Wang, Yanqing Shen, Weiliang Zuo, Sanping Zhou, Nanning Zheng
35. [1353] Disentangling Visual Embeddings for Attributes and Objects, Nirat Saini, Khoi Pham, Abhinav Shrivastava
36. [1358] QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection, Chenhongyi Yang, Zehao Huang, Naiyan Wang
37. [1406] Unknown-Aware Object Detection: Learning What You Don’t Know From Videos in the Wild, Xuefeng Du, Xin Wang, Gabriel Gozum, Yixuan Li
38. [1411] Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks, Nicola Garau, Niccolò Bisagno, Zeno Sambugaro, Nicola Conci
39. [1416] Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective, Gowthami Somepalli, Liam Fowl, Arpit Bansal, Ping Yeh-Chiang, Yehuda Dar, Richard Baraniuk, Micah Goldblum, Tom Goldstein
40. [1424] Calibrating Deep Neural Networks by Pairwise Constraints, Jiacheng Cheng, Nuno Vasconcelos
41. [1429] Lifelong Graph Learning, Chen Wang, Yuheng Qiu, Dasong Gao, Sebastian Scherer
42. [1434] OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks, Wanyu Lin, Hao Lan, Hao Wang, Baochun Li
43. [1442] Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation, Stephen James, Kentaro Wada, Tristan Laidlow, Andrew J. Davison
44. [1447] Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches, Jin-Man Park, Ue-Hwan Kim, Seon-Hoon Lee, Jong-Hwan Kim
45. [1452] Cross-View Transformers for Real-Time Map-View Semantic Segmentation, Brady Zhou, Philipp Krähenbühl

1500–1530 Afternoon Break (Halls B2-C)

1000–1600 Exhibits (Halls B2-C)
• See Exhibits map for list of exhibitors.

1400–1600 Demos (Halls B2-C Demo Area)
• Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images, Chengyuan Xu, Boning Dong, Noah Stier, Curtis McCully, D. Andrew Howell, Pradeep Sen, Tobias Hollerer (UCSB; Las Cumbres Observatory)
• VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers, Estelle Guez Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal (Intel Labs; UCLA; Microsoft Research)
• Speech Driven Tongue Animation, Salvador Medina, Denis Tome, Carsten Stoll, Thibaut Weise, Iain Matthews (Carnegie Mellon University; Epic Games)
• Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features, Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo (Università degli Studi di Firenze; Università di Pisa)
• DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors, Sivapriya Vellaichamy, Matthew Hull, Zijie J. Wang, Nilaksh Das, ShengYun Peng, Haekyu Park, Duen Horng Chau (Georgia Institute of Technology)
• [Virtual] V-Doc: Visual Questions Answers With Documents, Yihao Ding, Zhe Huang, Runlin Wang, YanHang Zhang, Xianru Chen, Yuzhong Ma, Hyunsuk Chung, Soyeon Caren Han (The Univ. of Sydney; Fortifyedge)
• [Virtual] VisCUIT: Visual Auditor for Bias in CNN Image Classifier, Seongmin Lee, Zijie J. Wang, Judy Hoffman, Duen Horng Chau (Georgia Institute of Technology)
• [Virtual] Clustering Plotted Data by Image Segmentation, Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou (American Univ. of Beirut; VinAI Research; Hugging Face; Stanford Univ.)

1430–1700 Poster 3.2 (Halls B2-C)
Video Analysis & Understanding
46. UnweaveNet: Unweaving Activity Stories, Will Price, Carl Vondrick, Dima Damen
47. Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos, Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Chiho Choi, Behzad Dariush
48. Audio-Adaptive Activity Recognition Across Video Domains, Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek
49. Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning, Minghao Chen, Fangyun Wei, Chong Li, Deng Cai
50. Image Based Reconstruction of Liquids From 2D Surface Detections, Florian Richter, Ryan K. Orosco, Michael C. Yip
51. Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency, Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang
52. How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs, Hazel Doughty, Cees G. M. Snoek
53. Programmatic Concept Learning for Human Motion Description and Synthesis, Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
54. Learning To Recognize Procedural Activities With Distant Supervision, Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani

55. Implicit Motion Handling for Video Camouflaged Object Detection, Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zongyuan Ge
56. Dynamic Scene Graph Generation via Anticipatory Pre-Training, Yiming Li, Xiaoshan Yang, Changsheng Xu
57. Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization, Kun Xia, Le Wang, Sanping Zhou, Nanning Zheng, Wei Tang
58. OCSampler: Compressing Videos to One Clip With Single-Step Sampling, Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, Limin Wang
59. A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting, Tianshan Liu, Kin-Man Lam
60. TubeFormer-DeepLab: Video Mask Transformer, Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
61. ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization, Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava
62. A Graph Matching Perspective With Transformers on Video Instance Segmentation, Zheyun Qin, Xiankai Lu, Xiushan Nie, Yilong Yin, Jianbing Shen
63. STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction, Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao
64. Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos, Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic
65. End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection, Congcong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang
66. Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision, Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
67. Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement, Guang Yu, Siqi Wang, Zhiping Cai, Xinwang Liu, Chuanfu Xu, Chengkun Wu
68. A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information, Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis
69. Long-Short Temporal Contrastive Learning of Video Transformers, Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani
70. Scene Consistency Representation Learning for Video Scene Segmentation, Haoqian Wu, Keyu Chen, Yanan Luo, Ruizhi Qiao, Bo Ren, Haozhe Liu, Weicheng Xie, Linlin Shen
71. Unsupervised Pre-Training for Temporal Action Localization Tasks, Can Zhang, Tianyu Yang, Junwu Weng, Meng Cao, Jue Wang, Yuexian Zou
72. Contrastive Learning for Unsupervised Video Highlight Detection, Taivanbat Badamdorj, Mrigank Rochan, Yang Wang, Li Cheng
73. Deformable Video Transformer, Jue Wang, Lorenzo Torresani
74. Recurring the Transformer for Video Action Recognition, Jiewen Yang, Xingbo Dong, Liujun Liu, Chao Zhang, Jiajun Shen, Dahai Yu
Recognition: Detection, Categorization, Retrieval
75. Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation, Zongyang Ma, Guan Luo, Jin Gao, Liang Li, Yuxin Chen, Shaoru Wang, Congxuan Zhang, Weiming Hu
76. Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model, Yu Du, Fangyun Wei, Zihe Zhang, Miaojing Shi, Yue Gao, Guoqi Li
77. Sign Language Video Retrieval With Free-Form Textual Queries, Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol
78. FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback, Sonam Goenka, Zhaoheng Zheng, Ayush Jaiswal, Rakesh Chada, Yue Wu, Varsha Hedau, Pradeep Natarajan
79. Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation, Caiyuan Zheng, Hui Li, Seon-Min Rhee, Seungju Han, Jae-Joon Han, Peng Wang
80. ESCNet: Gaze Target Detection With the Understanding of 3D Scenes, Jun Bao, Buyu Liu, Jun Yu
81. Interactive Multi-Class Tiny-Object Detection, Chunggi Lee, Seonwook Park, Heon Song, Jeongun Ryu, Sanghoon Kim, Haejoon Kim, Sérgio Pereira, Donggeun Yoo
82. Weakly Supervised Rotation-Invariant Aerial Object Detection Network, Xiaoxu Feng, Xiwen Yao, Gong Cheng, Junwei Han
83. Large Loss Matters in Weakly Supervised Multi-Label Classification, Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee
84. MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning, Zhixiang Chi, Li Gu, Huan Liu, Yang Wang, Yuanhao Yu, Jin Tang
85. FreeSOLO: Learning To Segment Objects Without Annotations, Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
86. Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection, Dongli Xu, Jinhong Deng, Wen Li
87. SIOD: Single Instance Annotated per Category per Image for Object Detection, Hanjun Li, Xingjia Pan, Ke Yan, Fan Tang, Wei-Shi Zheng
88. Towards Robust Adaptive Object Detection Under Noisy Annotations, Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan
89. Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection, Liang Zhao, Limin Wang
90. Salvage of Supervision in Weakly Supervised Object Detection, Lin Sui, Chen-Lin Zhang, Jianxin Wu
91. Label, Verify, Correct: A Simple Few Shot Object Detection Method, Prannay Kaul, Weidi Xie, Andrew Zisserman
92. Background Activation Suppression for Weakly Supervised Object Localization, Pingyu Wu, Wei Zhai, Yang Cao
93. Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization, Eunji Kim, Siwon Kim, Jungbeom Lee, Hyunwoo Kim, Sungroh Yoon
94. Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery, Muli Yang, Yuehua Zhu, Jiaping Yu, Aming Wu, Cheng Deng
95. Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization, Xin Jin, Tianyu He, Kecheng Zheng, Zhiheng Yin, Xu Shen, Zhen Huang, Ruoyu Feng, Jianqiang Huang, Zhibo Chen, Xian-Sheng Hua

96. Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation, Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha
97. Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification, Zizheng Yang, Xin Jin, Kecheng Zheng, Feng Zhao
98. Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification, Mouxing Yang, Zhenyu Huang, Peng Hu, Taihao Li, Jiancheng Lv, Xi Peng
99. Towards Total Recall in Industrial Anomaly Detection, Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, Peter Gehler
100. H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection, Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang
101. Geometric and Textural Augmentation for Domain Gap Reduction, Xiao-Chang Liu, Yong-Liang Yang, Peter Hall
102. General Incremental Learning With Domain-Aware Categorical Representations, Jiangwei Xie, Shipeng Yan, Xuming He
103. DST: Dynamic Substitute Training for Data-Free Black-Box Attack, Wenxuan Wang, Xuelin Qian, Yanwei Fu, Xiangyang Xue
104. ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation, Ruibin Wang, Yibo Yang, Dacheng Tao
Self-, Semi-, Meta-, & Unsupervised Learning
105. Label Matching Semi-Supervised Object Detection, Binbin Chen, Weijie Chen, Shicai Yang, Yunyi Xuan, Jie Song, Di Xie, Shiliang Pu, Mingli Song, Yueting Zhuang
106. Multidimensional Belief Quantification for Label-Efficient Meta-Learning, Deep Shankar Pandey, Qi Yu
107. Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples, Noo-ri Kim, Jee-Hyong Lee
108. Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification, Yang Liu, Weifeng Zhang, Chao Xiang, Tu Zheng, Deng Cai, Xiaofei He
109. Class-Aware Contrastive Semi-Supervised Learning, Fan Yang, Kai Wu, Shuyi Zhang, Guannan Jiang, Yong Liu, Feng Zheng, Wei Zhang, Chengjie Wang, Long Zeng
110. Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework, Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao Huang, Jifeng Dai
111. Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo, Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon
112. Learning Where To Learn in Cross-View Self-Supervised Learning, Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Toshihiko Yamasaki
113. Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective, Yunrui Zhao, Qianqian Xu, Yangbangyan Jiang, Peisong Wen, Qingming Huang
114. SimMatch: Semi-Supervised Learning With Similarity Matching, Mingkai Zheng, Shan You, Lang Huang, Fei Wang, Chen Qian, Chang Xu
115. Active Teacher for Semi-Supervised Object Detection, Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji
116. Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection, Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixé, Jose M. Alvarez
117. Self-Supervised Learning of Object Parts for Semantic Segmentation, Adrian Ziegler, Yuki M. Asano
118. MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection, JongMok Kim, JooYoung Jang, Seunghyeon Seo, Jisoo Jeong, Jongkeun Na, Nojun Kwak
119. Scale-Equivalent Distillation for Semi-Supervised Object Detection, Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, Ping Luo
120. A Self-Supervised Descriptor for Image Copy Detection, Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, Matthijs Douze
121. Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut, Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz
122. CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification, Philip Chikontwe, Soopil Kim, Sang Hyun Park
123. Semi-Supervised Few-Shot Learning via Multi-Factor Clustering, Jie Ling, Lei Liao, Meng Yang, Jia Shuai
124. CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning, Yue Fan, Dengxin Dai, Anna Kukleva, Bernt Schiele
125. Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data, Rundong He, Zhongyi Han, Xiankai Lu, Yilong Yin
126. A Simple Data Mixing Prior for Improving Self-Supervised Learning, Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie
127. DETReg: Unsupervised Pretraining With Region Priors for Object Detection, Amir Bar, Xin Wang, Vadim Kantorov, Colorado J. Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
128. Sound and Visual Representation Learning With Multiple Pretraining Tasks, Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool
129. UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training, Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
130. Weakly Supervised Object Localization As Domain Adaption, Lei Zhu, Qi She, Qian Chen, Yunfei You, Boyu Wang, Yanye Lu
131. Debiased Learning From Naturally Imbalanced Pseudo-Labels, Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu
132. Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning, Hui Tang, Kui Jia
133. Masked Feature Prediction for Self-Supervised Visual Pre-Training, Chen Wei, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, Christoph Feichtenhofer
134. Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency, Jeany Son
135. Id-Free Person Similarity Learning, Bing Shuai, Xinyu Li, Kaustav Kundu, Joseph Tighe
136. End-to-End Semi-Supervised Learning for Video Action Detection, Akash Kumar, Yogesh Singh Rawat
137. Probabilistic Representations for Video Contrastive Learning, Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn

138. Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition, Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato
139. BEVT: BERT Pretraining of Video Transformers, Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan
140. Generative Cooperative Learning for Unsupervised Video Anomaly Detection, M. Zaigham Zaheer, Arif Mahmood, M. Haris Khan, Mattia Segu, Fisher Yu, Seung-Ik Lee
141. When Does Contrastive Visual Representation Learning Work? Elijah Cole, Xuan Yang, Kimberly Wilber, Oisin Mac Aodha, Serge Belongie
142. The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization, M. Jehanzeb Mirza, Jakub Micorek, Horst Possegger, Horst Bischof
143. What Matters for Meta-Learning Vision Regression Tasks? Ning Gao, Hanna Ziesche, Ngo Anh Vien, Michael Volpp, Gerhard Neumann
Robot Vision
144. IFOR: Iterative Flow Minimization for Robotic Object Rearrangement, Ankit Goyal, Arsalan Mousavian, Chris Paxton, Yu-Wei Chao, Brian Okorn, Jia Deng, Dieter Fox
145. TCTrack: Temporal Contexts for Aerial Tracking, Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu
146. AKB-48: A Real-World Articulated Object Knowledge Base, Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Qiaojun Yu, Yang Han, Cewu Lu
147. 3DAC: Learning Attribute Compression for Point Clouds, Guangchi Fang, Qingyong Hu, Hanyun Wang, Yiling Xu, Yulan Guo
148. Simple but Effective: CLIP Embeddings for Embodied AI, Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi
149. Multi-Robot Active Mapping via Neural Bipartite Graph Matching, Kai Ye, Siyan Dong, Qingnan Fan, He Wang, Li Yi, Fei Xia, Jue Wang, Baoquan Chen
150. Continuous Scene Representations for Embodied AI, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi
151. Interactron: Embodied Adaptive Object Detection, Klemen Kotar, Roozbeh Mottaghi
152. Online Learning of Reusable Abstract Models for Object Goal Navigation, Tommaso Campari, Leonardo Lamanna, Paolo Traverso, Luciano Serafini, Lamberto Ballan
153. RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization, Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
154. UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation, Taeyeop Lee, Byeong-Uk Lee, Inkyu Shin, Jaesung Choe, Ukcheol Shin, In So Kweon, Kuk-Jin Yoon
155. Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation, Nathaniel Merrill, Yuliang Guo, Xingxing Zuo, Xinyu Huang, Stefan Leutenegger, Xi Peng, Liu Ren, Guoquan Huang
156. Upright-Net: Learning Upright Orientation for 3D Point Cloud, Xufang Pang, Feng Li, Ning Ding, Xiaopin Zhong
Computer Vision for Social Good
157. DeepFake Disrupter: The Detector of DeepFake Is My Friend, Xueyu Wang, Jiajun Huang, Siqi Ma, Surya Nepal, Chang Xu
158. HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization, Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma
159. Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources, Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
160. Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection, Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic
Adversarial Attack & Defense
161. Transferable Sparse Adversarial Attack, Ziwen He, Wei Wang, Jing Dong, Tieniu Tan
162. Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection, Jiang Liu, Alexander Levine, Chun Pong Lau, Rama Chellappa, Soheil Feizi
163. Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability, Yifeng Xiong, Jiadong Lin, Min Zhang, John E. Hopcroft, Kun He
164. Improving Adversarial Transferability via Neuron Attribution-Based Attacks, Jianping Zhang, Weibin Wu, Jen-tse Huang, Yizhan Huang, Wenxuan Wang, Yuxin Su, Michael R. Lyu
165. Complex Backdoor Detection by Symmetric Feature Differencing, Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang
166. Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer, Shengshan Hu, Xiaogeng Liu, Yechao Zhang, Minghui Li, Leo Yu Zhang, Hai Jin, Libing Wu
167. Zero-Query Transfer Attacks on Context-Aware Object Detectors, Zikui Cai, Shantanu Rane, Alejandro E. Brito, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif
168. 360-Attack: Distortion-Aware Perturbations From Perspective-Views, Yunjian Zhang, Yanwei Liu, Jinxia Liu, Jingbo Miao, Antonios Argyriou, Liming Wang, Zhen Xu
169. Label-Only Model Inversion Attacks via Boundary Repulsion, Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
170. Merry Go Round: Rotate a Frame and Fool a DNN, Daksh Thapar, Aditya Nigam, Chetan Arora
171. Cross-Modal Transferable Adversarial Attacks From Images to Videos, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
172. BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning, Zhenting Wang, Juan Zhai, Shiqing Ma
173. Investigating Top-k White-Box and Transferable Black-Box Attack, Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, In So Kweon
174. Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution, Yan Feng, Baoyuan Wu, Yanbo Fan, Li Liu, Zhifeng Li, Shu-Tao Xia
175. Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack, Ye Liu, Yaya Cheng, Lianli Gao, Xianglong Liu, Qilong Zhang, Jingkuan Song
176. Towards Efficient Data Free Black-Box Adversarial Attack, Jie Zhang, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Lei Zhang, Chao Wu

177. Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network, Byung-Kwan Lee, Junho Kim, Yong Man Ro
178. Certified Patch Robustness via Smoothed Vision Transformers, Hadi Salman, Saachi Jain, Eric Wong, Aleksander Madry
179. Towards Practical Certifiable Patch Defense With Vision Transformer, Zhaoyu Chen, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Wenqiang Zhang
180. On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles, Qingzhao Zhang, Shengtuo Hu, Jiachen Sun, Qi Alfred Chen, Z. Morley Mao
181. 3DeformRS: Certifying Spatial Deformations on Point Clouds, Gabriel Pérez S., Juan C. Pérez, Motasem Alfarra, Silvio Giancola, Bernard Ghanem
182. Stereoscopic Universal Perturbations Across Different Architectures and Datasets, Zachary Berger, Parth Agrawal, Tian Yu Liu, Stefano Soatto, Alex Wong
183. Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations, Tianlong Chen, Peihao Wang, Zhiwen Fan, Zhangyang Wang
184. Bounded Adversarial Attack on Deep Content Features, Qiuling Xu, Guanhong Tao, Xiangyu Zhang
185. DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints, Zhendong Zhao, Xiaojun Chen, Yuexin Xuan, Ye Dong, Dakui Wang, Kaitai Liang
186. Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart, Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu
187. Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness, Giulio Lovisotto, Nicole Finnie, Mauricio Munoz, Chaithanya Kumar Mummadi, Jan Hendrik Metzen
188. Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input, Junyoung Byun, Seungju Cho, Myung-Joon Kwon, Hee-Seon Kim, Changick Kim
189. Adversarial Eigen Attack on Black-Box Models, Linjun Zhou, Peng Cui, Xingxuan Zhang, Yinan Jiang, Shiqiang Yang
190. Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond, Qibing Ren, Qingquan Bao, Runzhong Wang, Junchi Yan
191. Enhancing Adversarial Training With Second-Order Statistics of Weights, Gaojie Jin, Xinping Yi, Wei Huang, Sven Schewe, Xiaowei Huang
192. Towards Data-Free Model Stealing in a Hard Label Setting, Sunandini Sanyal, Sravanti Addepalli, R. Venkatesh Babu
193. Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients, Kaidong Li, Ziming Zhang, Cuncong Zhong, Guanghui Wang
194. DTA: Physical Camouflage Attacks Using Differentiable Transformation Network, Naufal Suryanto, Yongsu Kim, Hyoeun Kang, Harashta Tatimma Larasati, Youngyeo Yun, Thi-Thu-Huong Le, Hunmin Yang, Se-Yoon Oh, Howon Kim
195. Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity, Cheng Luo, Qinliang Lin, Weicheng Xie, Bizhu Wu, Jinheng Xie, Linlin Shen
196. Enhancing Adversarial Robustness for Deep Metric Learning, Mo Zhou, Vishal M. Patel
197. Shape-Invariant 3D Adversarial Point Clouds, Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Nenghai Yu
198. Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon, Yiqi Zhong, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji
199. Exploring Effective Data for Surrogate Training Towards Black-Box Attack, Xuxiang Sun, Gong Cheng, Hongda Li, Lei Pei, Junwei Han
200. NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models, Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, Wei Yang
201. Dual-Key Multimodal Backdoors for Visual Question Answering, Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha
202. Proactive Image Manipulation Detection, Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, Xiaoming Liu
Vision & Language
203. ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts, Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang
204. ENVEDIT: Environment Editing for Vision-and-Language Navigation, Jialu Li, Hao Tan, Mohit Bansal
205. HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation, Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
206. Less Is More: Generating Grounded Navigation Instructions From Landmarks, Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
207. Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation, Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
208. Reinforced Structured State-Evolution for Vision-Language Navigation, Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu
209. Cross-Modal Map Learning for Vision and Language Navigation, Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, Kostas Daniilidis
210. Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation, Hanqing Wang, Wei Liang, Jianbing Shen, Luc Van Gool, Wenguan Wang
211. One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones, Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su
212. Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification, Tao Liang, Guosheng Lin, Mingyang Wan, Tianrui Li, Guojun Ma, Fengmao Lv
213. Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding, Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin
214. Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding, Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang
215. Multi-View Transformer for 3D Visual Grounding, Shijia Huang, Yilun Chen, Jiaya Jia, Liwei Wang

Thursday, June 23 (Afternoon) Program
216. Multi-Modal Dynamic Graph Transformer for Visual Grounding, Sijia Chen, Baochun Li
217. Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models, Effrosyni Mavroudi, René Vidal
218. Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning, Minghang Zheng, Yanjie Huang, Qingchao Chen, Yuxin Peng, Yang Liu
219. Visual Abductive Reasoning, Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang
220. Query and Attention Augmentation for Knowledge-Based Explainable Reasoning, Yifeng Zhang, Ming Jiang, Qi Zhao
221. REX: Reasoning-Aware and Grounded Explanation, Shi Chen, Qi Zhao
222. Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation, Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
223. Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships, Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng
224. Scene Graph Expansion for Semantics-Guided Image Outpainting, Chiao-An Yang, Cheng-Yo Tan, Wan-Cyuan Fan, Cheng-Fu Yang, Meng-Lin Wu, Yu-Chiang Frank Wang
225. VisualHow: Multimodal Problem Solving, Jinhui Yang, Xianyu Chen, Ming Jiang, Shi Chen, Louis Wang, Qi Zhao
226. FLAVA: A Foundational Language and Vision Alignment Model, Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela
227. Multi-Modal Alignment Using Representation Codebook, Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi
228. Negative-Aware Attention Framework for Image-Text Matching, Kun Zhang, Zhendong Mao, Quan Wang, Yongdong Zhang
229. Vision-Language Pre-Training With Triple Contrastive Learning, Jinyu Yang, Jiali Duan, Son Tran, Yi Xu, Sampath Chanda, Liqun Chen, Belinda Zeng, Trishul Chilimbi, Junzhou Huang
230. Vision-Language Pre-Training for Boosting Scene Text Detectors, Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao
231. COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval, Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen
3D From Multi-View & Sensors
232. NeurMiPs: Neural Mixture of Planar Experts for View Synthesis, Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang
233. FWD: Real-Time Novel View Synthesis With Forward Warping and Depth, Ang Cao, Chris Rockwell, Justin Johnson
234. SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images, Tewodros Habtegebrial, Christiano Gava, Marcel Rogge, Didier Stricker, Varun Jampani
235. Fast, Accurate and Memory-Efficient Partial Permutation Synchronization, Shaohan Li, Yunpeng Shi, Gilad Lerman
236. Learning To Find Good Models in RANSAC, Daniel Barath, Luca Cavalli, Marc Pollefeys
237. Optimizing Elimination Templates by Greedy Parameter Search, Evgeniy Martyushev, Jana Vráblíková, Tomas Pajdla
238. GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision, Chiang-Heng Chien, Hongyi Fan, Ahmad Abdelfattah, Elias Tsigaridas, Stanimire Tomov, Benjamin Kimia
239. HARA: A Hierarchical Approach for Robust Rotation Averaging, Seong Hun Lee, Javier Civera
240. RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging, Heng Li, Zhaopeng Cui, Shuaicheng Liu, Ping Tan
241. A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors, Pedro Miraldo, José Pedro Iglesias
242. ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance, Dong Wei, Yi Wan, Yongjun Zhang, Xinyi Liu, Bin Zhang, Xiqi Wang
243. Self-Supervised Neural Articulated Shape and Appearance Models, Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollhöfer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, Mira Slavcheva
244. Virtual Elastic Objects, Hsiao-yu Chen, Edith Tretschk, Tuur Stuyck, Petr Kadlecek, Ladislav Kavan, Etienne Vouga, Christoph Lassner
245. Decoupling Makes Weakly Supervised Local Feature Better, Kunhong Li, Longguang Wang, Li Liu, Qing Ran, Kai Xu, Yulan Guo
246. JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints, Karl D.D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik
247. ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging, Jiancheng Yang, Udaranga Wickramasinghe, Bingbing Ni, Pascal Fua
248. DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering, Ruizhi Shao, Hongwen Zhang, He Zhang, Mingjia Chen, Yan-Pei Cao, Tao Yu, Yebin Liu
249. Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis, Tianhan Xu, Yasuhiro Fujita, Eiichi Matsumoto
250. Structured Local Radiance Fields for Human Avatar Modeling, Zerong Zheng, Han Huang, Tao Yu, Hongwen Zhang, Yandong Guo, Yebin Liu
251. High-Fidelity Human Avatars From a Single RGB Camera, Hao Zhao, Jinsong Zhang, Yu-Kun Lai, Zerong Zheng, Yingdi Xie, Yebin Liu, Kun Li
252. Forecasting Characteristic 3D Poses of Human Actions, Christian Diller, Thomas Funkhouser, Angela Dai
253. Virtual Correspondence: Humans as a Cue for Extreme-View Geometry, Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba
254. BEHAVE: Dataset and Method for Tracking Human Object Interactions, Bharat Lal Bhatnagar, Xianghui Xie, Ilya A. Petrov, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll
255. Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives, Xinke Li, Henghui Ding, Zekun Tong, Yuwei Wu, Yeow Meng Chee
256. RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation, Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano
257. NPBG++: Accelerating Neural Point-Based Graphics, Ruslan Rakhimov, Andrei-Timotei Ardelean, Victor Lempitsky, Evgeny Burnaev

258. Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows, Sheng Liu, Xiaohan Nie, Raffay Hamid
259. Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos, Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

1700–1800 Plenary 3 (Hall B1)


Chair: Kristin Dana (Rutgers Univ.)
Keynote: Understanding Visual Appearance from Micron to
Global Scale, Kavita Bala (Cornell Univ.)
Abstract: Augmented reality/mixed reality (AR/MR) is poised
to create compelling and immersive user experiences by
combining computer vision and computer graphics. In this talk, I
will describe my group’s research on these complementary
areas: graphics models for realistic visual appearance and
rendering, reconstruction of shape and materials, and visual search
and recognition for scene understanding. Further, using
recognition as a core building block, we show how visual discovery of
images at a global scale can discover visual patterns and trends
across geography and time.

Friday, June 24 (Morning) Program
Friday, June 24

0700–0830 Breakfast (Halls D-E)

0800–1400 Registration (Great Hall Lobby)

0800–0830 Poster Setup (Halls B2-C)

0830–1018 Oral 4.1.1: Representation Learning (Great Hall A-D)
Papers in this session are in Poster Session 4.1
Chairs: Jiajun Wu (Stanford Univ.)
Pablo Arbelaez (Universidad de los Andes)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [0830] Masked Autoencoders Are Scalable Vision Learners, Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
2. [0835] Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision, Kieran A. Murphy, Varun Jampani, Srikumar Ramalingam, Ameesh Makadia
3. [0840] Bayesian Invariant Risk Minimization, Yong Lin, Hanze Dong, Hao Wang, Tong Zhang
4. [0848] Crafting Better Contrastive Views for Siamese Representation Learning, Xiangyu Peng, Kai Wang, Zheng Zhu, Mang Wang, Yang You
5. [0853] Rethinking Minimal Sufficient Representation in Contrastive Learning, Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu
6. [0858] Multi-Level Feature Learning for Contrastive Multi-View Clustering, Jie Xu, Huayi Tang, Yazhou Ren, Liang Peng, Xiaofeng Zhu, Lifang He
7. [0906] Point-Level Region Contrast for Object Detection Pre-Training, Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
8. [0911] Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation, Minsoo Kang, Jaeyoo Park, Bohyung Han
9. [0916] A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration, Ramya Hebbalaguppe, Jatin Prakash, Neelabh Madan, Chetan Arora
10. [0924] SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos, Salar Hosseini Khorasgani, Yuxuan Chen, Florian Shkurti
11. [0929] OMNIVORE: A Single Model for Many Visual Modalities, Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
12. [0934] DPICT: Deep Progressive Image Compression Using Trit-Planes, Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim
13. [0942] Efficient Geometry-Aware 3D Generative Adversarial Networks, Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, Gordon Wetzstein
14. [0947] Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation, Liang Chen, Yihang Lou, Jianzhong He, Tao Bai, Minghua Deng
15. [0952] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning, Richard J. Chen, Chengkuan Chen, Yicong Li, Tiffany Y. Chen, Andrew D. Trister, Rahul G. Krishnan, Faisal Mahmood
16. [1000] Versatile Multi-Modal Pre-Training for Human-Centric Perception, Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu
17. [1005] Bridging Video-Text Retrieval With Multiple Choice Questions, Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo
18. [1010] Integrating Language Guidance Into Vision-Based Deep Metric Learning, Karsten Roth, Oriol Vinyals, Zeynep Akata

0830–1018 Oral 4.1.2: Computational Photography (Great Hall B-C)
Papers in this session are in Poster Session 4.1
Chairs: Jinwei Ye (Louisiana State Univ.)
Qi Shan (Apple)
Format (5 min. presentation; 3 min. group questions/3 papers)
19. [0830] NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images, Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron
20. [0835] DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering, Liwen Wu, Jae Yong Lee, Anand Bhattad, Yu-Xiong Wang, David Forsyth
21. [0840] HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video, Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, Ira Kemelmacher-Shlizerman
22. [0848] Neural Reflectance for Shape Recovery With Shadow Handling, Junxuan Li, Hongdong Li
23. [0853] Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video, Berthy T. Feng, Alexander C. Ogren, Chiara Daraio, Katherine L. Bouman
24. [0858] Dancing Under the Stars: Video Denoising in Starlight, Kristina Monakhova, Stephan R. Richter, Laura Waller, Vladlen Koltun
25. [0906] BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation, David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein
26. [0911] Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation, Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
27. [0916] 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image, Fangzhou Mu, Jian Wang, Yicheng Wu, Yin Li
28. [0924] BokehMe: When Neural Rendering Meets Classical Rendering, Juewen Peng, Zhiguo Cao, Xianrui Luo, Hao Lu, Ke Xian, Jianming Zhang
29. [0929] Deblurring via Stochastic Refinement, Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G. Dimakis, Peyman Milanfar
30. [0934] Learning to Deblur Using Light Field Generated and Real Defocus Images, Lingyan Ruan, Bin Chen, Jizhou Li, Miuling Lam

31. [0942] Towards Layer-Wise Image Vectorization, Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, Humphrey Shi
32. [0947] Dual-Shutter Optical Vibration Sensing, Mark Sheinin, Dorian Chan, Matthew O'Toole, Srinivasa G. Narasimhan
33. [0952] Fisher Information Guidance for Learned Time-of-Flight Imaging, Jiaqu Li, Tao Yue, Sijie Zhao, Xuemei Hu
34. [1000] Autofocus for Event Cameras, Shijie Lin, Yinqiang Zhang, Lei Yu, Bin Zhou, Xiaowei Luo, Jia Pan
35. [1005] Adaptive Gating for Single-Photon 3D Imaging, Ryan Po, Adithya Pediredla, Ioannis Gkioulekas
36. [1010] LiDAR Snowfall Simulation for Robust 3D Object Detection, Martin Hahner, Christos Sakaridis, Mario Bijelic, Felix Heide, Fisher Yu, Dengxin Dai, Luc Van Gool

0830–1018 Oral 4.1.3: Vision & Language (Hall B1)
Papers in this session are in Poster Session 4.1
Chairs: Zicheng Liu (Microsoft)
Gul Varol (Ecole des Ponts ParisTech)
Format (5 min. presentation; 3 min. group questions/3 papers)
37. [0830] MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound, Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi
38. [0835] Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer, Hao Jiang, Yadong Mu
39. [0840] Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture, Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
40. [0848] Disentangling Visual and Written Concepts in CLIP, Joanna Materzyńska, Antonio Torralba, David Bau
41. [0853] CLIP-Event: Connecting Text and Images With Event Structures, Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang
42. [0858] Robust Cross-Modal Representation Learning With Progressive Self-Distillation, Alex Andonian, Shixing Chen, Raffay Hamid
43. [0906] TubeDETR: Spatio-Temporal Video Grounding With Transformers, Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
44. [0911] 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection, Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu
45. [0916] 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds, Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu
46. [0924] Globetrotter: Connecting Languages by Connecting Images, Dídac Surís, Dave Epstein, Carl Vondrick
47. [0929] Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment, Mingyang Zhou, Licheng Yu, Amanpreet Singh, Mengjiao Wang, Zhou Yu, Ning Zhang
48. [0934] WebQA: Multihop and Multimodal QA, Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk
49. [0942] PartGlot: Learning Shape Part Segmentation From Language Reference Games, Juil Koo, Ian Huang, Panos Achlioptas, Leonidas J. Guibas, Minhyuk Sung
50. [0947] DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis, Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
51. [0952] L-Verse: Bidirectional Generation Between Image and Text, Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
52. [1000] Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation, Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
53. [1005] LaTr: Layout-Aware Transformer for Scene-Text VQA, Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha
54. [1010] Learning Program Representations for Food Images and Cooking Recipes, Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba

1030–1100 Morning Break (Halls B2-C)

1000–1230 Poster 4.1 (Halls B2-C)
Representation Learning
55. On the Importance of Asymmetry for Siamese Representation Learning, Xiao Wang, Haoqi Fan, Yuandong Tian, Daisuke Kihara, Xinlei Chen
56. Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy, Tong Zhang, Congpei Qiu, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann
57. Exploring Set Similarity for Dense Self-Supervised Representation Learning, Zhaoqing Wang, Qiang Li, Guoxin Zhang, Pengfei Wan, Wen Zheng, Nannan Wang, Mingming Gong, Tongliang Liu
58. Align Representations With Base: A New Approach to Self-Supervised Learning, Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang
59. Identifying Ambiguous Similarity Conditions via Semantic Matching, Han-Jia Ye, Yi Shi, De-Chuan Zhan
60. Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization, Wei Dong, Junsheng Wu, Yi Luo, Zongyuan Ge, Peng Wang
61. Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation, De Cheng, Tongliang Liu, Yixiong Ning, Nannan Wang, Bo Han, Gang Niu, Xinbo Gao, Masashi Sugiyama
62. Unsupervised Visual Representation Learning by Online Constrained K-Means, Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin
63. Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views, Junbo Zhang, Kaisheng Ma
64. Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework, Shu Zhang, Ran Xu, Caiming Xiong, Chetan Ramaiah
65. Robust Contrastive Learning Against Noisy Views, Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song
66. On Learning Contrastive Representations for Learning With Noisy Labels, Li Yi, Sheng Liu, Qi She, A. Ian McLeod, Boyu Wang
67. Directional Self-Supervised Learning for Heavy Image Augmentations, Yalong Bai, Yifan Yang, Wei Zhang, Tao Mei
68. Continual Learning for Visual Search With Backward Consistent Feature Embedding, Timmy S. T. Wan, Jun-Cheng Chen, Tzer-Yi Wu, Chu-Song Chen
69. Probing Representation Forgetting in Supervised and Unsupervised Continual Learning, MohammadReza Davari, Nader Asadi, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky
70. Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning, Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip H.S. Torr, Song Bai, Vincent Y. F. Tan
71. Bring Evanescent Representations to Life in Lifelong Class Incremental Learning, Marco Toldo, Mete Ozay
72. Unsupervised Learning of Debiased Representations With Pseudo-Attributes, Seonguk Seo, Joon-Young Lee, Bohyung Han
73. A Conservative Approach for Unbiased Learning on Unknown Biases, Myeongho Jeon, Daekyung Kim, Woochul Lee, Myungjoo Kang, Joonseok Lee
74. Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization, Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel
75. Co-Advise: Cross Inductive Bias Distillation, Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao
76. PIXMIX: Dreamlike Pictures Comprehensively Improve Safety Measures, Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt
77. RegionCLIP: Region-Based Language-Image Pretraining, Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao
78. Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks, Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Hongsheng Li, Xiaohua Wang, Jifeng Dai
79. Conditional Prompt Learning for Vision-Language Models, Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Scene Analysis & Understanding
80. Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? Zhenyu Wang, Yali Li, Shengjin Wang
81. Partial Class Activation Attention for Semantic Segmentation, Sun-Ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian
82. Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers, Lixiang Ru, Yibing Zhan, Baosheng Yu, Bo Du
83. Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation, Jing Li, Junsong Fan, Zhaoxiang Zhang
84. Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation, Minh Hieu Phan, The-Anh Ta, Son Lam Phung, Long Tran-Thanh, Abdesselam Bouzerdoum
85. Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation, Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu
86. L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation, Peng-Tao Jiang, Yuqi Yang, Qibin Hou, Yunchao Wei
87. Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data, Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon
88. Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation, Zhiyuan Liang, Tiancai Wang, Xiangyu Zhang, Jian Sun, Jianbing Shen
89. Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation, Jiaming Zhang, Kailun Yang, Chaoxiang Ma, Simon Reiß, Kunyu Peng, Rainer Stiefelhagen
90. MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation, Inkyu Shin, Yi-Hsuan Tsai, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Sparsh Garg, In So Kweon, Kuk-Jin Yoon
91. NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night, Xueqing Deng, Peng Wang, Xiaochen Lian, Shawn Newsam
92. Fast Point Transformer, Chunghyun Park, Yoonwoo Jeong, Minsu Cho, Jaesik Park
93. RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior, Ruibo Li, Chi Zhang, Guosheng Lin, Zhe Wang, Chunhua Shen
94. ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes, Rahul Sajnani, Adrien Poulenard, Jivitesh Jain, Radhika Dua, Leonidas J. Guibas, Srinath Sridhar
95. DisARM: Displacement Aware Relation Module for 3D Detection, Yao Duan, Chenyang Zhu, Yuqing Lan, Renjiao Yi, Xinwang Liu, Kai Xu
96. Learning Object Context for Novel-View Scene Layout Generation, Xiaotian Qiao, Gerhard P. Hancke, Rynson W.H. Lau
97. Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts, Buyu Liu, Bingbing Zhuang, Manmohan Chandraker
98. Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image, Yujiao Shi, Hongdong Li
99. Raw High-Definition Radar for Multi-Task Learning, Julien Rebut, Arthur Ouaknine, Waqas Malik, Patrick Pérez
100. Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation, Ziad Al-Halah, Santhosh Kumar Ramakrishnan, Kristen Grauman
101. UKPGAN: A General Self-Supervised Keypoint Detector, Yang You, Wenhai Liu, Yanjie Ze, Yong-Lu Li, Weiming Wang, Cewu Lu
102. Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos, Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim
Navigation & Autonomous Driving
103. Rethinking Efficient Lane Detection via Curve Modeling, Zhengyang Feng, Shaohua Guo, Xin Tan, Ke Xu, Min Wang, Lizhuang Ma
104. Exploiting Temporal Relations on Radar Perception for Autonomous Driving, Peizhao Li, Pu Wang, Karl Berntorp, Hongfu Liu
105. Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective, Yuejiang Liu, Riccardo Cadei, Jonas Schweizer, Sherwin Bahmani, Alexandre Alahi
106. BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement, Yunlong Wang, Hongyu Pan, Jun Zhu, Yu-Huan Wu, Xin Zhan, Kun Jiang, Diange Yang

107. ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning, Yuxiao Chen, Boris Ivanovic, Marco Pavone
108. Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion, Tianpei Gu, Guangyi Chen, Junlong Li, Chunze Lin, Yongming Rao, Jie Zhou, Jiwen Lu
109. Vehicle Trajectory Prediction Works, but Not Everywhere, Mohammadhossein Bahari, Saeed Saadatnejad, Ahmad Rahimi, Mohammad Shaverdikondori, Amir Hossein Shahidzadeh, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi
110. LTP: Lane-Based Trajectory Prediction for Autonomous Driving, Jingke Wang, Tengju Ye, Ziqing Gu, Junbo Chen
111. ONCE-3DLanes: Building Monocular 3D Lane Detection, Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang
112. Towards Driving-Oriented Metric for Lane Detection Models, Takami Sato, Qi Alfred Chen
113. Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes, Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong, Heeyeon Kwon, Chang-Su Kim
114. LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection, Yihan Zeng, Da Zhang, Chunwei Wang, Zhenwei Miao, Ting Liu, Xin Zhan, Dayang Hao, Chao Ma
115. DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection, Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan
116. A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu
117. Forecasting From LiDAR via Future Object Detection, Neehar Peri, Jonathon Luiten, Mengtian Li, Aljoša Ošep, Laura Leal-Taixé, Deva Ramanan
118. RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding, Xuanyu Zhou, Charles R. Qi, Yin Zhou, Dragomir Anguelov
119. Learning From All Vehicles, Dian Chen, Philipp Krähenbühl
120. Is Mapping Necessary for Realistic PointGoal Navigation? Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv Batra, Oleksandr Maksymets
121. Symmetry-Aware Neural Architecture for Embodied Visual Exploration, Shuang Liu, Takayuki Okatani
122. COOPERNAUT: End-to-End Driving With Cooperative Perception for Networked Vehicles, Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu
123. Topology Preserving Local Road Network Estimation From Single Onboard Camera Image, Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool
124. Coupling Vision and Proprioception for Navigation of Legged Robots, Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak
125. Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation, Dong Nie, Rui Lan, Ling Wang, Xiaofeng Ren
126. 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection, Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Mohammad-Ali Nikouei Mahani, Nassir Navab, Benjamin Busam, Federico Tombari
127. Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior, Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, Or Litany
128. SelfD: Self-Learning Large-Scale Driving Policies From the Web, Jimuyang Zhang, Ruizhao Zhu, Eshed Ohn-Bar
129. Towards Real-World Navigation With Deep Differentiable Planners, Shu Ishida, João F. Henriques
130. Privacy Preserving Partial Localization, Marcel Geppert, Viktor Larsson, Johannes L. Schönberger, Marc Pollefeys
131. Efficient Large-Scale Localization by Global Instance Recognition, Fei Xue, Ignas Budvytis, Daniel Olmeda Reino, Roberto Cipolla
132. CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data, Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov
Low-Level Vision
133. Bilateral Video Magnification Filter, Shoichiro Takeda, Kenta Niwa, Mariko Isogawa, Shinya Shimizu, Kazuki Okami, Yushi Aono
134. Neural Data-Dependent Transform for Learned Image Compression, Dezhao Wang, Wenhan Yang, Yueyu Hu, Jiaying Liu
135. Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence, Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding
136. Deep Generalized Unfolding Networks for Image Restoration, Chong Mou, Qian Wang, Jian Zhang
137. Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling, Takashi Isobe, Xu Jia, Xin Tao, Changlin Li, Ruihuang Li, Yongjie Shi, Jing Mu, Huchuan Lu, Yu-Wing Tai
138. XYDeblur: Divide and Conquer for Single Image Deblurring, Seo-Won Ji, Jeongmin Lee, Seung-Wook Kim, Jun-Pyo Hong, Seung-Jin Baek, Seung-Won Jung, Sung-Jea Ko
139. Abandoning the Bayer-Filter To See in the Dark, Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, Jiajun Shen
140. RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution, Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov
141. All-in-One Image Restoration for Unknown Corruption, Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, Xi Peng
142. Modeling sRGB Camera Noise With Normalizing Flows, Shayan Kousha, Ali Maleky, Michael S. Brown, Marcus A. Brubaker
143. A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift, Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, Lei Zhang
144. Video Frame Interpolation Transformer, Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang
145. The Devil Is in the Details: Window-Based Attention for Image Compression, Renjie Zou, Chunfeng Song, Zhaoxiang Zhang
146. Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction, Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool
147. RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs, Zhouxia Wang, Jiawei Zhang, Runjian Chen, Wenping Wang, Ping Luo
148. AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement, Canqian Yang, Meiguang Jin, Xu Jia, Yi Xu, Ying Chen

149. HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging, Xuanyu Zhang, Yongbing Zhang, Ruiqin Xiong, Qilin Sun, Jian Zhang
150. HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging, Xiaowan Hu, Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool
151. Learning To Zoom Inside Camera Imaging Pipeline, Chengzhou Tang, Yuqiang Yang, Bing Zeng, Ping Tan, Shuaicheng Liu
152. Towards an End-to-End Framework for Flow-Guided Video Inpainting, Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chun-Le Guo, Ming-Ming Cheng
153. Context-Aware Video Reconstruction for Rolling Shutter Cameras, Bin Fan, Yuchao Dai, Zhiyuan Zhang, Qi Liu, Mingyi He
154. CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image, Reyhaneh Neshatavar, Mohsen Yavartanoo, Sanghyun Son, Kyoung Mu Lee
155. Global Matching With Overlapping Attention for Optical Flow Estimation, Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas
156. CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow, Xiuchao Sui, Shaohua Li, Xue Geng, Yan Wu, Xinxing Xu, Yong Liu, Rick Goh, Hongyuan Zhu
157. Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen
158. Video Demoiréing With Relation-Based Temporal Consistency, Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, Xiaojuan Qi
159. Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images, Ali Maleky, Shayan Kousha, Michael S. Brown, Marcus A. Brubaker
160. Deep Constrained Least Squares for Blind Image Super-Resolution, Ziwei Luo, Haibin Huang, Lei Yu, Youwei Li, Haoqiang Fan, Shuaicheng Liu
161. Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model, Wei-Ting Chen, Zhi-Kai Huang, Cheng-Che Tsai, Hao-Hsiang Yang, Jian-Jiun Ding, Sy-Yen Kuo
162. Unsupervised Homography Estimation With Coplanarity-Aware GAN, Mingbo Hong, Yuhang Lu, Nianjin Ye, Chunyu Lin, Qijun Zhao, Shuaicheng Liu
Computational Photography
163. Attentive Fine-Grained Structured Sparsity for Image Restoration, Junghun Oh, Heewon Kim, Seungjun Nah, Cheeun Hong, Jonghyun Choi, Kyoung Mu Lee
164. Uformer: A General U-Shaped Transformer for Image Restoration, Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li
165. Bringing Old Films Back to Life, Ziyu Wan, Bo Zhang, Dongdong Chen, Jing Liao
166. Learning sRGB-to-Raw-RGB De-Rendering With Content-Aware Metadata, Seonghyeon Nam, Abhijith Punnappurath, Marcus A. Brubaker, Michael S. Brown
167. SNR-Aware Low-Light Image Enhancement, Xiaogang Xu, Ruixing Wang, Chi-Wing Fu, Jiaya Jia
168. AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network, Wooseok Lee, Sanghyun Son, Kyoung Mu Lee
169. Synthetic Aperture Imaging With Events and Frames, Wei Liao, Xiang Zhang, Lei Yu, Shijie Lin, Wen Yang, Ning Qiao
170. Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition, Junho Kim, Inwoo Hwang, Young Min Kim
171. Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion, Stepan Tulyakov, Alfredo Bochicchio, Daniel Gehrig, Stamatios Georgoulis, Yuanyou Li, Davide Scaramuzza
172. Unifying Motion Deblurring and Frame Interpolation With Events, Xiang Zhang, Lei Yu
173. EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction, Xinyu Zhou, Peiqi Duan, Yi Ma, Boxin Shi
174. Learning Adaptive Warping for Real-World Rolling Shutter Correction, Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang
175. Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature, Zhixiang Wang, Xiang Ji, Jia-Bin Huang, Shin'ichi Satoh, Xiao Zhou, Yinqiang Zheng
176. TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation, Weihua He, Kaichao You, Zhendong Qiao, Xu Jia, Ziyang Zhang, Wenhui Wang, Huchuan Lu, Yaoyuan Wang, Jianxing Liao
177. Optimizing Video Prediction via Video Frame Interpolation, Yue Wu, Qiang Wen, Qifeng Chen
178. Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets, Junyong Lee, Myeonghee Lee, Sunghyun Cho, Seungyong Lee
179. Memory-Augmented Non-Local Attention for Video Super-Resolution, Jiyang Yu, Jingen Liu, Liefeng Bo, Tao Mei
180. Optical Flow Estimation for Spiking Camera, Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang
181. Compressive Single-Photon 3D Cameras, Felipe Gutierrez-Barragan, Atul Ingle, Trevor Seets, Mohit Gupta, Andreas Velten
182. Single-Photon Structured Light, Varun Sundar, Sizhuo Ma, Aswin C. Sankaranarayanan, Mohit Gupta
183. All-Photon Polarimetric Time-of-Flight Imaging, Seung-Hwan Baek, Felix Heide
184. Holocurtains: Programming Light Curtains via Binary Holography, Dorian Chan, Srinivasa G. Narasimhan, Matthew O'Toole
Vision & Language
185. Towards Implicit Text-Guided 3D Shape Generation, Zhengzhe Liu, Yi Wang, Xiaojuan Qi, Chi-Wing Fu
186. Towards Language-Free Training for Text-to-Image Generation, Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
187. ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic, Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf
188. EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching, Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha

189. Hierarchical Modular Network for Video Captioning, Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang
190. SWINBERT: End-to-End Transformers With Sparse Attention for Video Captioning, Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
191. End-to-End Generative Pretraining for Multimodal Video Captioning, Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
192. Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning, Chia-Wen Kuo, Zsolt Kira
193. Scaling Up Vision-Language Pre-Training for Image Captioning, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
194. Comprehending and Ordering Semantics for Image Captioning, Yehao Li, Yingwei Pan, Ting Yao, Tao Mei
195. NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge, Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
196. Injecting Semantic Concepts Into End-to-End Image Captioning, Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
197. DIFNet: Boosting Visual Information Flow for Image Captioning, Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji
198. VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning, Jun Chen, Han Guo, Kai Yi, Boyang Li, Mohamed Elhoseiny
199. Show, Deconfound and Tell: Image Captioning With Causal Inference, Bing Liu, Dong Wang, Xu Yang, Yong Zhou, Rui Yao, Zhiwen Shao, Jiaqi Zhao
200. EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval, Haoyu Ma, Handong Zhao, Zhe Lin, Ajinkya Kale, Zhangyang Wang, Tong Yu, Jiuxiang Gu, Sunav Choudhary, Xiaohui Xie
201. CLIPstyler: Image Style Transfer With a Single Text Condition, Gihyun Kwon, Jong Chul Ye
202. HairCLIP: Design Your Hair by Text and Reference Image, Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
203. DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting, Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu
204. On Guiding Visual Attention With Language Specification, Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach
205. UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog, Cheng Chen, Zhenshan Tan, Qingrong Cheng, Xin Jiang, Qun Liu, Yudong Zhu, Xiaodong Gu
206. Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer, Fuxiang Wu, Liu Liu, Fusheng Hao, Fengxiang He, Jun Cheng
207. LiT: Zero-Shot Transfer With Locked-Image Text Tuning, Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer
208. GroupViT: Semantic Segmentation Emerges From Text Supervision, Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang
209. ReSTR: Convolution-Free Referring Image Segmentation Using Transformers, Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak
210. LAVT: Language-Aware Vision Transformer for Referring Image Segmentation, Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H.S. Torr
211. An Empirical Study of Training End-to-End Vision-and-Language Transformers, Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
212. Are Multimodal Transformers Robust to Missing Modality? Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng
Image & Video Synthesis and Generation
213. Text to Image Generation With Semantic-Spatial Aware GAN, Wentong Liao, Kai Hu, Michael Ying Yang, Bodo Rosenhahn
214. StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis, Zhiheng Li, Martin Renqiang Min, Kai Li, Chenliang Xu
215. Blended Diffusion for Text-Driven Editing of Natural Images, Omri Avrahami, Dani Lischinski, Ohad Fried
216. Make It Move: Controllable Image-to-Video Generation With Text Descriptions, Yaosi Hu, Chong Luo, Zhenzhong Chen
217. Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model, Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding
218. A Style-Aware Discriminator for Controllable Image Translation, Kunhee Kim, Sanghun Park, Eunyeong Jeon, Taehun Kim, Daijin Kim
219. Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint, Jiaxian Guo, Jiachen Li, Huan Fu, Mingming Gong, Kun Zhang, Dacheng Tao
220. Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks, Chanyong Jung, Gihyun Kwon, Jong Chul Ye
221. FlexIT: Towards Flexible Semantic Image Translation, Guillaume Couairon, Asya Grechka, Jakob Verbeek, Holger Schwenk, Matthieu Cord
222. Modulated Contrast for Versatile Image Synthesis, Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Rongliang Wu, Shijian Lu
223. QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation, Xueqi Hu, Xinyue Zhou, Qiusheng Huang, Zhengyi Shi, Li Sun, Qingli Li
224. Self-Supervised Dense Consistency Regularization for Image-to-Image Translation, Minsu Ko, Eunju Cha, Sungjoo Suh, Huijin Lee, Jae-Joon Han, Jinwoo Shin, Bohyung Han
225. Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation, Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich
226. InstaFormer: Instance-Aware Image-to-Image Translation With Transformer, Soohyun Kim, Jongbeom Baek, Jihye Park, Gyeongnyeon Kim, Seungryong Kim
227. Unsupervised Image-to-Image Translation With Generative Prior, Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy
228. StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning, Yi-Hua Huang, Yue He, Yu-Jie Yuan, Yu-Kun Lai, Lin Gao

229. NeRF-Editing: Geometry Editing of Neural Radiance Fields, Yu-Jie Yuan, Yang-Tian Sun, Yu-Kun Lai, Yuewen Ma, Rongfei Jia, Lin Gao
230. GeoNeRF: Generalizing NeRF With Geometry Priors, Mohammad Mahdi Johari, Yann Lepoittevin, François Fleuret
231. Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation, Jian Zhang, Yuanqing Zhang, Huan Fu, Xiaowei Zhou, Bowen Cai, Jinchi Huang, Rongfei Jia, Binqiang Zhao, Xing Tang
232. AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields, Takuhiro Kaneko
233. HDR-NeRF: High Dynamic Range Neural Radiance Fields, Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Xuan Wang, Qing Wang
234. NeRFReN: Neural Radiance Fields With Reflections, Yuan-Chen Guo, Di Kang, Linchao Bao, Yu He, Song-Hai Zhang
235. Neural Point Light Fields, Julian Ost, Issam Laradji, Alejandro Newell, Yuval Bahat, Felix Heide
236. 3D-Aware Image Synthesis via Learning Structural and Textural Representations, Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou
237. GIRAFFE HD: A High-Resolution 3D-Aware Generative Model, Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee
238. Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis, Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang
239. Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models, Ge Kan, Jinhu Lü, Tian Wang, Baochang Zhang, Aichun Zhu, Lei Huang, Guodong Guo, Hichem Snoussi
240. High-Resolution Image Harmonization via Collaborative Dual Transformations, Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang
241. Brain-Supervised Image Editing, Keith M. Davis III, Carlos de la Torre-Ortiz, Tuukka Ruotsalo
Vision & Graphics
242. De-Rendering 3D Objects in the Wild, Felix Wimbauer, Shangzhe Wu, Christian Rupprecht
243. Neural Fields As Learnable Kernels for 3D Reconstruction, Francis Williams, Zan Gojcic, Sameh Khamis, Denis Zorin, Joan Bruna, Sanja Fidler, Or Litany
244. HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing, Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, Amit Bermano
245. 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies, Weikai Chen, Cheng Lin, Weiyang Li, Bo Yang
246. Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian, Jihyun Lee, Minhyuk Sung, Hyunjin Kim, Tae-Kyun Kim
247. Deep Image-Based Illumination Harmonization, Zhongyun Bao, Chengjiang Long, Gang Fu, Daquan Liu, Yuanzhen Li, Jiaming Wu, Chunxia Xiao
248. GLASS: Geometric Latent Augmentation for Shape Spaces, Sanjeev Muralikrishnan, Siddhartha Chaudhuri, Noam Aigerman, Vladimir G. Kim, Matthew Fisher, Niloy J. Mitra
249. PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes, Yu-Ying Yeh, Zhengqin Li, Yannick Hold-Geoffroy, Rui Zhu, Zexiang Xu, Miloš Hašan, Kalyan Sunkavalli, Manmohan Chandraker
250. Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes, Ka-Hei Hui, Ruihui Li, Jingyu Hu, Chi-Wing Fu
251. Neural Mesh Simplification, Rolandos Alexandros Potamias, Stylianos Ploumpis, Stefanos Zafeiriou
252. SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters, Albert Mosella-Montoro, Javier Ruiz-Hidalgo
253. CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation, Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero, Kamal Rahimi Malekshan
254. UNIST: Unpaired Neural Implicit Shape Translation Network, Qimin Chen, Johannes Merz, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, Hao Zhang
255. CoNeRF: Controllable Neural Radiance Fields, Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, Andrea Tagliasacchi
256. Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling, Wanquan Feng, Jin Li, Hongrui Cai, Xiaonan Luo, Juyong Zhang
257. Modeling Indirect Illumination for Inverse Rendering, Yuanqing Zhang, Jiaming Sun, Xingyi He, Huan Fu, Rongfei Jia, Xiaowei Zhou
258. Neural Head Avatars From Monocular RGB Videos, Philip-William Grassal, Malte Prinzler, Titus Leistner, Carsten Rother, Matthias Nießner, Justus Thies
259. DeepCurrents: Learning Implicit Representations of Shapes With Boundaries, David Palmer, Dmitriy Smirnov, Stephanie Wang, Albert Chern, Justin Solomon

1130–1330 Lunch (Halls D-E)

Friday, June 24 (Afternoon) Program
1300–1330 Poster Switch/Setup (Halls B2-C)

1330–1500 Oral 4.2.1: Biometrics, Face & Gestures, and Medical Image Analysis (Great Hall A-D)
Papers in this session are in Poster Session 4.2
Chairs: David Crandall (Indiana Univ.); Yanxi Liu (Penn State Univ.); Mahdi Hosseini (Univ. of New Brunswick)
Format (5 min. presentation; 3 min. group questions/3 papers)
1. [1330] Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination, Yiqun Mei, Pengfei Guo, Vishal M. Patel
2. [1335] AnyFace: Free-Style Text-To-Face Synthesis and Manipulation, Jianxin Sun, Qiyao Deng, Qi Li, Muyi Sun, Min Ren, Zhenan Sun
3. [1340] General Facial Representation Learning in a Visual-Linguistic Manner, Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
4. [1348] Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection, Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang
5. [1353] Detecting Deepfakes With Self-Blended Images, Kaede Shiohara, Toshihiko Yamasaki
6. [1358] 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces, Simone Foti, Bongjin Koo, Danail Stoyanov, Matthew J. Clarkson
7. [1406] Evaluation-Oriented Knowledge Distillation for Deep Face Recognition, Yuge Huang, Jiaxiang Wu, Xingkun Xu, Shouhong Ding
8. [1411] AdaFace: Quality Adaptive Margin for Face Recognition, Minchul Kim, Anil K. Jain, Xiaoming Liu
9. [1416] Moving Window Regression: A Novel Approach to Ordinal Regression, Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
10. [1424] FaceFormer: Speech-Driven 3D Facial Animation With Transformers, Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura
11. [1429] Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos, Foivos Paraperas Papantoniou, Panagiotis P. Filntisis, Petros Maragos, Anastasios Roussos
12. [1434] Deep Decomposition for Stochastic Normal-Abnormal Transport, Peirong Liu, Yueh Lee, Stephen Aylward, Marc Niethammer
13. [1442] DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification, Hongrun Zhang, Yanda Meng, Yitian Zhao, Yihong Qiao, Xiaoyun Yang, Sarah E. Coupland, Yalin Zheng
14. [1447] Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification, Yonghang Guan, Jun Zhang, Kuan Tian, Sen Yang, Pei Dong, Jinxi Xiang, Wei Yang, Junzhou Huang, Yuyao Zhang, Xiao Han
15. [1452] Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations, Aishik Konwer, Xuan Xu, Joseph Bae, Chao Chen, Prateek Prasanna

1330–1500 Oral 4.2.2: Scene & Shape Analysis and Understanding (Hall B1)
Papers in this session are in Poster Session 4.2
Chairs: Philippos Mordohai (Stevens Inst. of Technology); Angela Dai (Technical Univ. of Munich)
Format (5 min. presentation; 3 min. group questions/3 papers)
16. [1330] VRDFormer: End-to-End Video Visual Relation Detection With Transformers, Sipeng Zheng, Shizhe Chen, Qin Jin
17. [1335] Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy
18. [1340] Visual Acoustic Matching, Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman
19. [1348] The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation, Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, Jun Xiao
20. [1353] Learning Multiple Dense Prediction Tasks From Partially Annotated Data, Wei-Hong Li, Xialei Liu, Hakan Bilen
21. [1358] PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning, Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman
22. [1406] Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture, Chenghao Zhang, Kun Tian, Bin Fan, Gaofeng Meng, Zhaoxiang Zhang, Chunhong Pan
23. [1411] FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation, Sohyun Lee, Taeyoung Son, Suha Kwak
24. [1416] Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding, Xianzheng Ma, Zhixiang Wang, Yacheng Zhan, Yinqiang Zheng, Zheng Wang, Dengxin Dai, Chia-Wen Lin
25. [1424] Equivariant Point Cloud Analysis via Learning Orientations for Message Passing, Shitong Luo, Jiahan Li, Jiaqi Guan, Yufeng Su, Chaoran Cheng, Jian Peng, Jianzhu Ma
26. [1429] Surface Representation for Point Clouds, Haoxi Ran, Jun Liu, Chengjie Wang
27. [1434] Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds, Yifan Zhang, Qingyong Hu, Guoquan Xu, Yanxin Ma, Jianwei Wan, Yulan Guo
28. [1442] 3D Common Corruptions and Data Augmentation, Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, Amir Zamir
29. [1447] INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation, Leyao Liu, Tian Zheng, Yun-Jou Lin, Kai Ni, Lu Fang
30. [1452] How Much Does Input Data Type Impact Final Face Model Accuracy? Jiahao Luo, Fahim Hasan Khan, Issei Mori, Akila de Silva, Eric Sandoval Ruezga, Minghao Liu, Alex Pang, James Davis

1330–1500 Oral 4.2.3: Datasets & Evaluation, Action & Event Recognition, and Visual Question Answering (Great Hall B-C)
Papers in this session are in Poster Session 4.2
Chairs: Ivan Laptev (INRIA Paris); Efstratios Gavves (Univ. of Amsterdam); Waqas Sultani (Information Technology Univ.)
Format (5 min. presentation; 3 min. group questions/3 papers)
31. [1330] Ego4D: Around the World in 3,000 Hours of Egocentric Video, Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolář, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
32. [1335] TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting, Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao
33. [1340] Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding, Xun Long Ng, Kian Eng Ong, Qichen Zheng, Yun Ni, Si Yong Yeo, Jun Liu
34. [1348] vCLIMB: A Novel Video Class Incremental Learning Benchmark, Andrés Villa, Kumail Alhamoud, Victor Escorcia, Fabian Caba, Juan León Alcázar, Bernard Ghanem
35. [1353] Opening Up Open World Tracking, Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé
36. [1358] Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions, Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Anima Anandkumar
37. [1406] CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters, Paul Gavrikov, Janis Keuper
38. [1411] Failure Modes of Domain Generalization Algorithms, Tigran Galstyan, Hrayr Harutyunyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan
39. [1416] A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes, Mazda Moayeri, Phillip Pope, Yogesh Balaji, Soheil Feizi
40. [1424] Grounding Answers for Visual Questions Asked by Visually Impaired People, Chongyan Chen, Samreen Anjum, Danna Gurari
41. [1429] Learning To Answer Questions in Dynamic Audio-Visual Scenarios, Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
42. [1434] Episodic Memory Question Answering, Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh
43. [1442] ScanQA: 3D Question Answering for Spatial Scene Understanding, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe
44. [1447] Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles, Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille
45. [1452] BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild, Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie

1500–1530 Afternoon Break (Halls B2-C)

1430–1700 Poster 4.2 (Halls B2-C)
Representation Learning
46. Unified Contrastive Learning in Image-Text-Label Space, Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao
47. AlignMixup: Improving Representations by Interpolating Aligned Features, Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis
48. On the Road to Online Adaptation for Semantic Image Segmentation, Riccardo Volpi, Pau De Jorge, Diane Larlus, Gabriela Csurka
49. ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation, Seunghun Lee, Wonhyeok Choi, Changjae Kim, Minwoo Choi, Sunghoon Im
50. Kernelized Few-Shot Object Detection With Efficient Integral Aggregation, Shan Zhang, Lei Wang, Naila Murray, Piotr Koniusz
51. Neural Mean Discrepancy for Efficient Out-of-Distribution Detection, Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H.T. Kung
52. A Structured Dictionary Perspective on Implicit Neural Representations, Gizem Yüce, Guillermo Ortiz-Jiménez, Beril Besbinar, Pascal Frossard
53. LARGE: Latent-Based Regression Through GAN Semantics, Yotam Nitzan, Rinon Gal, Ofir Brenner, Daniel Cohen-Or
54. Rethinking Controllable Variational Autoencoders, Huajie Shao, Yifei Yang, Haohong Lin, Longzhong Lin, Yizhuo Chen, Qinmin Yang, Han Zhao
55. Learning Canonical F-Correlation Projection for Compact Multiview Representation, Yun-Hao Yuan, Jin Li, Yun Li, Jipeng Qiang, Yi Zhu, Xiaobo Shen, Jianping Gou
56. Cross-Architecture Self-Supervised Video Representation Learning, Sheng Guo, Zihua Xiong, Yujie Zhong, Limin Wang, Xiaobo Guo, Bing Han, Weilin Huang
57. Improving Video Model Transfer With Dynamic Representation Learning, Yi Li, Nuno Vasconcelos
58. Self-Supervised Image Representation Learning With Geometric Set Consistency, Nenglun Chen, Lei Chu, Hao Pan, Yan Lu, Wenping Wang
59. HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging, Yisi Luo, Xi-Le Zhao, Deyu Meng, Tai-Xiang Jiang
60. Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling, Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, Jiwen Lu
61. DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds, Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen Gould
62. Neural Convolutional Surfaces, Luca Morreale, Noam Aigerman, Paul Guerrero, Vladimir G. Kim, Niloy J. Mitra
63. Representing 3D Shapes With Probabilistic Directed Distance Fields, Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan D. Jepson
64. H4D: Human 4D Modeling by Learning Neural Compositional Representation, Boyan Jiang, Yinda Zhang, Xingkui Wei, Xiangyang Xue, Yanwei Fu
65. Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification, Jialun Liu, Yifan Sun, Feng Zhu, Hongbin Pei, Yi Yang, Wenhui Li
66. Contrastive Regression for Domain Adaptation on Gaze Estimation, Yaoming Wang, Yangzhou Jiang, Jin Li, Bingbing Ni, Wenrui Dai, Chenglin Li, Hongkai Xiong, Teng Li
67. Forward Compatible Training for Large-Scale Embedding Retrieval Systems, Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
68. Improving Subgraph Recognition With Variational Graph Information Bottleneck, Junchi Yu, Jie Cao, Ran He
69. Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss, Pei Yan, Yihua Tan, Shengzhou Xiong, Yuan Tai, Yansheng Li
70. Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species, Changsheng Lu, Piotr Koniusz
Scene Analysis and Understanding
71. Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation, Xingning Dong, Tian Gan, Xuemeng Song, Jianlong Wu, Yuan Cheng, Liqiang Nie
72. Structured Sparse R-CNN for Direct Scene Graph Generation, Yao Teng, Limin Wang
73. PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation, Wei Li, Haiwei Zhang, Qijie Bai, Guoqing Zhao, Ning Jiang, Xiaojie Yuan
74. RU-Net: Regularized Unrolling Network for Scene Graph Generation, Xin Lin, Changxing Ding, Jing Zhang, Yibing Zhan, Dacheng Tao
75. Fine-Grained Predicates Learning for Scene Graph Generation, Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song
76. HL-Net: Heterophily Learning Network for Scene Graph Generation, Xin Lin, Changxing Ding, Yibing Zhan, Zijian Li, Dacheng Tao
77. SGTR: End-to-End Scene Graph Generation With Transformer, Rongjie Li, Songyang Zhang, Xuming He
78. Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs, Kaifeng Gao, Long Chen, Yulei Niu, Jian Shao, Jun Xiao
79. RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition, Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny
80. Spatial Commonsense Graph for Object Localisation in Partial Scenes, Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue
81. “The Pedestrian Next to the Lamppost” Adaptive Object Graphs for Better Instantaneous Mapping, Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden
82. Category-Aware Transformer Network for Better Human-Object Interaction Detection, Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng Zhong, Xu Zou
83. Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection, Yong Zhang, Yingwei Pan, Ting Yao, Rui Huang, Tao Mei, Chang-Wen Chen
84. Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection, Xian Qu, Changxing Ding, Xingao Li, Xubin Zhong, Dacheng Tao
85. Human-Object Interaction Detection via Disentangled Transformer, Desen Zhou, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, Jingdong Wang
86. MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection, Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim
87. GaTector: A Unified Framework for Gaze Object Prediction, Binglu Wang, Tao Hu, Baoshan Li, Xiaojuan Chen, Zhijie Zhang
88. STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes, Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma
89. Crowd Counting in the Frequency Domain, Weibo Shu, Jia Wan, Kay Chen Tan, Sam Kwong, Antoni B. Chan
90. Boosting Crowd Counting via Multifaceted Attention, Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Xiaopeng Hong
91. Rethinking Spatial Invariance of Convolutional Networks for Object Counting, Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann
92. Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing, Xiaoxue Chen, Tianyu Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang
93. Collaborative Transformers for Grounded Situation Recognition, Junhyeong Cho, Youngseok Yoon, Suha Kwak
Computational Photography
94. Deep Stereo Image Compression via Bi-Directional Coding, Jianjun Lei, Xiangrui Liu, Bo Peng, Dengchao Jin, Wanqing Li, Jingxiao Gu
95. RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion, Han Xu, Jiayi Ma, Jiteng Yuan, Zhuliang Le, Wei Liu
96. Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer, Fushun Zhu, Shan Zhao, Peng Wang, Hao Wang, Hua Yan, Shuaicheng Liu
97. Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels, Jiwon Kim, Kwangrok Ryoo, Junyoung Seo, Gyuseong Lee, Daehwan Kim, Hansang Cho, Seungryong Kim
98. SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization, Yucheng Hang, Bin Xia, Wenming Yang, Qingmin Liao
99. Automatic Color Image Stitching Using Quaternion Rank-1 Alignment, Jiaxue Li, Yicong Zhou
100. SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing, Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu
101. Degree-of-Linear-Polarization-Based Color Constancy, Taishi Ono, Yuhi Kondo, Legong Sun, Teppei Kurita, Yusuke Moriuchi

Friday, June 24 (Afternoon) Program
102. Point Cloud Color Constancy, Xiaoyan Xing, Yanlin Qian, Sibo
Feng, Yuhan Dong, Jiří Matas
103. Boosting View Synthesis With Residual Transfer, Xuejian Rong,
Jia-Bin Huang, Ayush Saraf, Changil Kim, Johannes Kopf
104. Deep Hyperspectral-Depth Reconstruction Using Single Color-
Dot Projection, Chunyu Li, Yusuke Monno, Masatoshi Okutomi
105. Quantization-Aware Deep Optics for Diffractive Snapshot
Hyperspectral Imaging, Lingen Li, Lizhi Wang, Weitao Song, Lei
Zhang, Zhiwei Xiong, Hua Huang
106. PIE-Net: Photometric Invariant Edge Guided Network for
Intrinsic Image Decomposition, Partha Das, Sezer Karaoglu,
Theo Gevers
107. Multimodal Material Segmentation, Yupeng Liang, Ryosuke
Wakaki, Shohei Nobuhara, Ko Nishino
108. Occlusion-Aware Cost Constructor for Light Field Depth
Estimation, Yingqian Wang, Longguang Wang, Zhengyu Liang,
Jungang Yang, Wei An, Yulan Guo
109. Learning Neural Light Fields With Ray-Space Embedding,
Benjamin Attal, Jia-Bin Huang, Michael Zollhöfer, Johannes Kopf,
Changil Kim
110. Acquiring a Dynamic Light Field Through a Single-Shot Coded
Image, Ryoya Mizuno, Keita Takahashi, Michitaka Yoshida,
Chihiro Tsutake, Toshiaki Fujii, Hajime Nagahara
111. Gravitationally Lensed Black Hole Emission Tomography, Aviad
Levis, Pratul P. Srinivasan, Andrew A. Chael, Ren Ng, Katherine L.
Bouman
112. Deep Saliency Prior for Reducing Visual Distraction, Kfir
Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E.
Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein
113. Personalized Image Aesthetics Assessment With Rich
Attributes, Yuzhe Yang, Liwu Xu, Leida Li, Nan Qie, Yaqian Li,
Peng Zhang, Yandong Guo
114. Artistic Style Discovery With Independent Components, Xin Xie,
Yi Li, Huaibo Huang, Haiyan Fu, Wanwan Wang, Yanqing Guo
Action and Event Recognition
115. Bridge-Prompt: Towards Ordinal Action Understanding in
Instructional Videos, Muheng Li, Lei Chen, Yueqi Duan, Zhilan
Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu
116. SVIP: Sequence VerIfication for Procedures in Videos, Yicheng
Qian, Weixin Luo, Dongze Lian, Xu Tang, Peilin Zhao, Shenghua
Gao
117. Set-Supervised Action Learning in Procedural Task Videos via
Pairwise Order Consistency, Zijia Lu, Ehsan Elhamifar
118. Exploring Denoised Cross-Video Contrast for Weakly-
Supervised Temporal Action Localization, Jingjing Li, Tianyu
Yang, Wei Ji, Jue Wang, Li Cheng
119. GateHUB: Gated History Unit With Background Suppression for
Online Action Detection, Junwen Chen, Gaurav Mittal, Ye Yu, Yu
Kong, Mei Chen
120. E2(GO)MOTION: Motion Augmented Event Stream for
Egocentric Action Recognition, Chiara Plizzari, Mirco
Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso,
Matteo Matteucci, Barbara Caputo
121. Hybrid Relation Guided Set Matching for Few-Shot Action
Recognition, Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian
Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang
122. Spatio-Temporal Relation Modeling for Few-Shot Action
Recognition, Anirudh Thatipelli, Sanath Narayan, Salman Khan,
Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem
123. Alignment-Uniformity Aware Representation Learning for
Zero-Shot Video Classification, Shi Pu, Kaili Zhao, Mao Zheng
124. Cross-Modal Representation Learning for Zero-Shot Action
Recognition, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Zicheng
Liu, Linjie Li
125. Cross-Modal Background Suppression for Audio-Visual Event
Localization, Yan Xia, Zhou Zhao
126. Fine-Grained Temporal Contrastive Learning for Weakly-
Supervised Temporal Action Localization, Junyu Gao, Mengyuan
Chen, Changsheng Xu
127. An Empirical Study of End-to-End Temporal Action Detection,
Xiaolong Liu, Song Bai, Xiang Bai
128. Everything at Once – Multi-Modal Fusion Transformer for Video
Retrieval, Nina Shvetsova, Brian Chen, Andrew Rouditchenko,
Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David
Harwath, James Glass, Hilde Kuehne
129. DirecFormer: A Directed Attention in Transformer Approach to
Robust Action Recognition, Thanh-Dat Truong, Quoc-Huy Bui,
Chi Nhan Duong, Han-Seok Seo, Son Lam Phung, Xin Li, Khoa Luu
130. MS-TCT: Multi-Scale Temporal ConvTransformer for Action
Detection, Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S.
Ryoo, François Brémond
131. Uncertainty-Guided Probabilistic Transformer for Complex
Action Recognition, Hongji Guo, Hanjing Wang, Qiang Ji
132. AdaFocus V2: End-to-End Training of Spatial Dynamic
Networks for Video Recognition, Yulin Wang, Yang Yue, Yuanze
Lin, Haojun Jiang, Zihang Lai, Victor Kulikov, Nikita Orlov,
Humphrey Shi, Gao Huang
133. UBoCo: Unsupervised Boundary Contrastive Learning for
Generic Event Boundary Detection, Hyolim Kang, Jinwoo Kim,
Taehyun Kim, Seon Joo Kim
134. Detector-Free Weakly Supervised Group Activity Recognition,
Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak
135. Multi-Grained Spatio-Temporal Features Perceived Network for
Event-Based Lip-Reading, Ganchao Tan, Yang Wang, Han Han,
Yang Cao, Feng Wu, Zheng-Jun Zha
136. Efficient Two-Stage Detection of Human-Object Interactions
With a Novel Unary-Pairwise Transformer, Frederic Z. Zhang,
Dylan Campbell, Stephen Gould
137. Interactiveness Field in Human-Object Interactions, Xinpeng Liu,
Yong-Lu Li, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi-Keung Tang
138. GEN-VLKT: Simplify Association and Enhance Interaction
Understanding for HOI Detection, Yue Liao, Aixi Zhang, Miao Lu,
Yongliang Wang, Xiaobo Li, Si Liu
139. Object-Relation Reasoning Graph for Action Recognition,
Yangjun Ou, Li Mi, Zhenzhong Chen
140. UBnormal: New Benchmark for Supervised Open-Set Video
Anomaly Detection, Andra Acsintoae, Andrei Florescu, Mariana-
Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor
Ionescu, Fahad Shahbaz Khan, Mubarak Shah
141. Decoupling and Recoupling Spatiotemporal Representation for
RGB-D-Based Motion Recognition, Benjia Zhou, Pichao Wang,
Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li,
Rong Jin
142. SPAct: Self-Supervised Privacy Preservation for Action
Recognition, Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah
143. Unsupervised Action Segmentation by Joint Representation
Learning and Online Clustering, Sateesh Kumar, Sanjay Haresh,
Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
144. InfoGCN: Representation Learning for Human Skeleton-Based
Action Recognition, Hyung-gun Chi, Myoung Hoon Ha,
Seunggeun Chi, Sang Wan Lee, Qixing Huang, Karthik Ramani
145. Learning Video Representations of Human Motion From
Synthetic Data, Xi Guo, Wei Wu, Dongliang Wang, Jing Su,
Haisheng Su, Weihao Gan, Jian Huang, Qin Yang
146. Learnable Irrelevant Modality Dropout for Multimodal Action
Recognition on Modality-Specific Annotated Videos, Saghir
Alfasly, Jian Lu, Chen Xu, Yuru Zou
Biometrics
147. EyePAD++: A Distillation-Based Approach for Joint Eye
Authentication and Presentation Attack Detection Using
Periocular Images, Prithviraj Dhar, Amit Kumar, Kirsten Kaplan,
Khushi Gupta, Rakesh Ranjan, Rama Chellappa
148. Gait Recognition in the Wild With Dense 3D Representations
and a Benchmark, Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao
He, Chenggang Yan, Tao Mei
149. Camera-Conditioned Stable Feature Generation for Isolated
Camera Supervised Person Re-IDentification, Chao Wu,
Wenhang Ge, Ancong Wu, Xiaobin Chang
150. Lagrange Motion Analysis and View Embeddings for Improved
Gait Recognition, Tianrui Chai, Annan Li, Shaoxiong Zhang,
Zilong Li, Yunhong Wang
151. DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover’s
Distance Improves Out-of-Distribution Face Identification, Hai
Phan, Anh Nguyen
152. Learning Second Order Local Anomaly for General Face Forgery
Detection, Jianwei Fei, Yunshu Dai, Peipeng Yu, Tianrun Shen,
Zhihua Xia, Jian Weng
153. PatchNet: A Simple Face Anti-Spoofing Framework via Fine-
Grained Patch Recognition, Chien-Yi Wang, Yu-Ding Lu, Shang-
Ta Yang, Shang-Hong Lai
154. Face2Exp: Combating Data Biases for Facial Expression
Recognition, Dan Zeng, Zhiyuan Lin, Xiao Yan, Yuting Liu, Fei
Wang, Bo Tang
155. Local-Adaptive Face Recognition via Graph-Based Meta-
Clustering and Regularized Adaptation, Wenbin Zhu, Chien-Yi
Wang, Kuan-Lun Tseng, Shang-Hong Lai, Baoyuan Wang
Face and Gestures
156. EMOCA: Emotion Driven Monocular Face Capture and
Animation, Radek Daněček, Michael J. Black, Timo Bolkart
157. Robust Egocentric Photo-Realistic Facial Expression Transfer
for Virtual Reality, Amin Jourabloo, Fernando De la Torre, Jason
Saragih, Shih-En Wei, Stephen Lombardi, Te-Li Wang, Danielle
Belko, Autumn Trimble, Hernan Badino
158. FaceVerse: A Fine-Grained and Detail-Controllable 3D Face
Morphable Model From a Hybrid Dataset, Lizhen Wang, Zhiyuan
Chen, Tao Yu, Chenguang Ma, Liang Li, Yebin Liu
159. ImFace: A Nonlinear 3D Morphable Face Model With Implicit
Neural Representations, Mingwu Zheng, Hongyu Yang, Di
Huang, Liming Chen
160. Physically-Guided Disentangled Implicit Rendering for 3D Face
Modeling, Zhenyu Zhang, Yanhao Ge, Ying Tai, Weijian Cao,
Renwang Chen, Kunlin Liu, Hao Tang, Xiaoming Huang, Chengjie
Wang, Zhifeng Xie, Dongjin Huang
161. RigNeRF: Fully Controllable Neural 3D Portraits, ShahRukh
Athar, Zexiang Xu, Kalyan Sunkavalli, Eli Shechtman, Zhixin Shu
162. HeadNeRF: A Real-Time NeRF-Based Parametric Head Model,
Yang Hong, Bo Peng, Haiyao Xiao, Ligang Liu, Juyong Zhang
163. Sparse to Dense Dynamic 3D Facial Expression Generation,
Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano
Berretti, Alberto Del Bimbo
164. Learning To Listen: Modeling Non-Deterministic Dyadic Facial
Motion, Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor
Darrell, Angjoo Kanazawa, Shiry Ginosar
165. Speech Driven Tongue Animation, Salvador Medina, Denis
Tome, Carsten Stoll, Mark Tiede, Kevin Munhall, Alexander G.
Hauptmann, Iain Matthews
166. Knowledge-Driven Self-Supervised Representation Learning for
Facial Action Unit Recognition, Yanan Chang, Shangfei Wang
167. gDNA: Towards Generative Detailed Neural Avatars, Xu Chen,
Tianjian Jiang, Jie Song, Jinlong Yang, Michael J. Black, Andreas
Geiger, Otmar Hilliges
168. GraFormer: Graph-Oriented Transformer for 3D Pose
Estimation, Weixi Zhao, Weiqiang Wang, Yunjie Tian
169. Uncertainty-Aware Adaptation for Self-Supervised 3D Human
Pose Estimation, Jogendra Nath Kundu, Siddharth Seth,
Pradyumna YM, Varun Jampani, Anirban Chakraborty, R.
Venkatesh Babu
170. Towards Diverse and Natural Scene-Aware 3D Human Motion
Synthesis, Jingbo Wang, Yu Rong, Jingyuan Liu, Sijie Yan, Dahua
Lin, Bo Dai
171. PINA: Learning a Personalized Implicit Neural Avatar From a
Single RGB-D Video Sequence, Zijian Dong, Chen Guo, Jie Song,
Xu Chen, Andreas Geiger, Otmar Hilliges
172. The Wanderings of Odysseus in 3D Scenes, Yan Zhang, Siyu
Tang
173. OSSO: Obtaining Skeletal Shape From Outside, Marilyn Keller,
Silvia Zuffi, Michael J. Black, Sergi Pujades
174. LiDARCap: Long-Range Marker-Less 3D Human Motion
Capture With LiDAR Point Clouds, Jialian Li, Jingyi Zhang,
Zhiyong Wang, Siqi Shen, Chenglu Wen, Yuexin Ma, Lan Xu,
Jingyi Yu, Cheng Wang
175. Unimodal-Concentrated Loss: Fully Adaptive Label Distribution
Learning for Ordinal Regression, Qiang Li, Jingjing Wang,
Zhaoliang Yao, Yachun Li, Pengju Yang, Jingwei Yan, Chunmao
Wang, Shiliang Pu
176. Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic
Estimation, Shuying Liu, Wenbin Wu, Jiaxian Wu, Yue Lin
177. LISA: Learning Implicit Shape and Appearance of Hands, Enric
Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris
Sweeney, Richard Newcombe, Lingni Ma
178. MobRecon: Mobile-Friendly Hand Mesh Reconstruction From
Monocular Image, Xingyu Chen, Yufeng Liu, Yajiao Dong, Xiong
Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, Xiaoyan Guo
179. Mining Multi-View Information: A Strong Self-Supervised
Framework for Depth-Based 3D Hand Pose and Mesh
Estimation, Pengfei Ren, Haifeng Sun, Jiachang Hao, Jingyu
Wang, Qi Qi, Jianxin Liao
180. Low-Resource Adaptation for Personalized Co-Speech Gesture
Generation, Chaitanya Ahuja, Dong Won Lee, Louis-Philippe
Morency
181. D-Grasp: Physically Plausible Dynamic Grasp Synthesis for
Hand-Object Interactions, Sammy Christen, Muhammed
Kocabas, Emre Aksan, Jemin Hwangbo, Jie Song, Otmar Hilliges
Medical, Biological and Cell Microscopy
182. Synthetic Generation of Face Videos With Plethysmograph
Physiology, Zhen Wang, Yunhao Ba, Pradyumna Chari, Oyku
Deniz Bozkurt, Gianna Brown, Parth Patwa, Niranjan Vaddi,
Laleh Jalilian, Achuta Kadambi
183. Contour-Hugging Heatmaps for Landmark Detection, James
McCouat, Irina Voiculescu
184. Which Images To Label for Few-Shot Medical Landmark
Detection? Quan Quan, Qingsong Yao, Jun Li, S. Kevin Zhou
185. Self-Supervised Bulk Motion Artifact Removal in Optical
Coherence Tomography Angiography, Jiaxiang Ren, Kicheon
Park, Yingtian Pan, Haibin Ling
186. Multi-Marginal Contrastive Learning for Multi-Label Subcellular
Protein Localization, Ziyi Liu, Zengmao Wang, Bo Du
187. Transformer-Empowered Multi-Scale Contextual Matching and
Aggregation for Multi-Contrast MRI Super-Resolution,
Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang,
Chenliang Xu, Jing Qin
188. Harmony: A Generic Unsupervised Approach for Disentangling
Semantic Content From Parameterized Transformations,
Mostofa Rafid Uddin, Gregory Howe, Xiangrui Zeng, Min Xu
189. Cross-Modal Clinical Graph Transformer for Ophthalmic Report
Generation, Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan,
Xiaodan Liang, Xiaojun Chang
190. BoostMIS: Boosting Medical Image Semi-Supervised Learning
With Adaptive Pseudo Labeling and Informative Active
Annotation, Wenqiao Zhang, Lei Zhu, James Hallinan, Shengyu
Zhang, Andrew Makmur, Qingpeng Cai, Beng Chin Ooi
191. Incremental Cross-View Mutual Distillation for Self-Supervised
Medical CT Synthesis, Chaowei Fang, Liang Wang, Dingwen
Zhang, Jun Xu, Yixuan Yuan, Junwei Han
192. Towards Low-Cost and Efficient Malaria Detection, Waqas
Sultani, Wajahat Nawaz, Syed Javed, Muhammad Sohail Danish,
Asma Saadia, Mohsen Ali
193. ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised
Medical Image Classification, Fengbei Liu, Yu Tian, Yuanhong
Chen, Yuyuan Liu, Vasileios Belagiannis, Gustavo Carneiro
194. Multimodal Dynamics: Dynamical Fusion for Trustworthy
Multimodal Classification, Zongbo Han, Fan Yang, Junzhou
Huang, Changqing Zhang, Jianhua Yao
195. M3T: Three-Dimensional Medical Image Classifier Using Multi-
Plane and Multi-Slice Transformer, Jinseong Jang, Dosik Hwang
196. Self-Supervised Pre-Training of Swin Transformers for 3D
Medical Image Analysis, Yucheng Tang, Dong Yang, Wenqi Li,
Holger R. Roth, Bennett Landman, Daguang Xu, Vishwesh Nath,
Ali Hatamizadeh
197. HyperSegNAS: Bridging One-Shot Neural Architecture Search
With 3D Medical Image Segmentation Using HyperNet, Cheng
Peng, Andriy Myronenko, Ali Hatamizadeh, Vishwesh Nath, Md
Mahfuzur Rahman Siddiquee, Yufan He, Daguang Xu, Rama
Chellappa, Dong Yang
198. DArch: Dental Arch Prior-Assisted 3D Tooth Instance
Segmentation With Weak Annotations, Liangdong Qiu, Chongjie
Ye, Pei Chen, Yunbi Liu, Xiaoguang Han, Shuguang Cui
199. Clean Implicit 3D Structure From Noisy 2D STEM Images,
Hannah Kniesel, Timo Ropinski, Tim Bergner, Kavitha Shaga
Devan, Clarissa Read, Paul Walther, Tobias Ritschel, Pedro
Hermosilla
200. Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces
From 3D MRI Scans With Geometric Deep Neural Networks,
Fabian Bongratz, Anne-Marie Rickmann, Sebastian Pölsterl,
Christian Wachinger
201. Aladdin: Joint Atlas Building and Diffeomorphic Registration
Learning With Pairwise Alignment, Zhipeng Ding, Marc
Niethammer
202. Learning Optimal K-Space Acquisition and Reconstruction
Using Physics-Informed Neural Networks, Wei Peng, Li Feng,
Guoying Zhao, Fang Liu
203. NODEO: A Neural Ordinary Differential Equation Based
Optimization Framework for Deformable Image Registration,
Yifan Wu, Tom Z. Jiahao, Jiancong Wang, Paul A. Yushkevich, M.
Ani Hsieh, James C. Gee
204. SMPL-A: Modeling Person-Specific Deformable Anatomy,
Hengtao Guo, Benjamin Planche, Meng Zheng, Srikrishna
Karanam, Terrence Chen, Ziyan Wu
205. DiRA: Discriminative, Restorative, and Adversarial Learning for
Self-Supervised Medical Image Analysis, Fatemeh Haghighi,
Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway,
Jianming Liang
206. Affine Medical Image Registration With Coarse-To-Fine Vision
Transformer, Tony C. W. Mok, Albert C. S. Chung
207. Topology-Preserving Shape Reconstruction and Registration via
Neural Diffeomorphic Flow, Shanlin Sun, Kun Han, Deying Kong,
Hao Tang, Xiangyi Yan, Xiaohui Xie
208. Generalizable Cross-Modality Medical Image Segmentation via
Style Augmentation and Dual Normalization, Ziqi Zhou, Lei Qi,
Xin Yang, Dong Ni, Yinghuan Shi
209. Closing the Generalization Gap of Cross-Silo Federated Medical
Image Segmentation, An Xu, Wenqi Li, Pengfei Guo, Dong Yang,
Holger R. Roth, Ali Hatamizadeh, Can Zhao, Daguang Xu, Heng
Huang, Ziyue Xu
210. FIBA: Frequency-Injection Based Backdoor Attack in Medical
Image Analysis, Yu Feng, Benteng Ma, Jing Zhang, Shanshan
Zhao, Yong Xia, Dacheng Tao
211. Surpassing the Human Accuracy: Detecting Gallbladder Cancer
From USG Images With Curriculum Learning, Soumen Basu,
Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora
212. CellTypeGraph: A New Geometric Computer Vision Benchmark,
Lorenzo Cerrone, Athul Vijayan, Tejasvinee Mody, Kay Schneitz,
Fred A. Hamprecht
213. ContIG: Self-Supervised Multimodal Contrastive Learning for
Medical Imaging With Genetics, Aiham Taleb, Matthias Kirchler,
Remo Monti, Christoph Lippert
Datasets and Evaluation
214. FERV39k: A Large-Scale Multi-Scene Dataset for Facial
Expression Recognition in Videos, Yan Wang, Yixuan Sun, Yiwen
Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge,
Wenqiang Zhang
215. Multi-Dimensional, Nuanced and Subjective – Measuring the
Perception of Facial Expressions, De'Aira Bryant, Siqi Deng,
Nashlie Sephus, Wei Xia, Pietro Perona
216. DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse
Dataset for 3D Head Alignment From a Single Image, Tetiana
Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas,
Viktoriia Sharmanska

217. OakInk: A Large-Scale Knowledge Repository for
Understanding Hand-Object Interaction, Lixin Yang, Kailin Li,
Xinyu Zhan, Fei Wu, Anran Xu, Liu Liu, Cewu Lu
218. PoseTrack21: A Dataset for Person Search, Multi-Object
Tracking and Multi-Person Pose Tracking, Andreas Döring, Di
Chen, Shanshan Zhang, Bernt Schiele, Jürgen Gall
219. Learning Modal-Invariant and Temporal-Memory for Video-
Based Visible-Infrared Person Re-Identification, Xinyu Lin,
Jinxing Li, Zeyu Ma, Huafeng Li, Shuang Li, Kaixiong Xu,
Guangming Lu, David Zhang
220. JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action,
Social Group and Activity Detection, Mahsa Ehsanpour,
Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi
221. DanceTrack: Multi-Object Tracking in Uniform Appearance and
Diverse Motion, Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan,
Song Bai, Kris Kitani, Ping Luo
222. Egocentric Prediction of Action Target in 3D, Yiming Li, Ziang
Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao,
Chen Feng
223. HOI4D: A 4D Egocentric Dataset for Category-Level Human-
Object Interaction, Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu,
Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li
Yi
224. Amodal Panoptic Segmentation, Rohit Mohan, Abhinav Valada
225. Large-Scale Video Panoptic Segmentation in the Wild: A
Benchmark, Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu
Zhang, Yunchao Wei, Yi Yang
226. YouMVOS: An Actor-Centric Multi-Shot Video Object
Segmentation Dataset, Donglai Wei, Siddhant Kharbanda,
Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav
Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh
Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song
Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister
227. The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark
for Video Inpainting, Ryan Szeto, Jason J. Corso
228. 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of
Social Media Short Videos, Vikram Gupta, Trisha Mittal, Puneet
Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera,
Debdoot Mukherjee, Dinesh Manocha
229. AxIoU: An Axiomatically Justified Measure for Video Moment
Retrieval, Riku Togashi, Mayu Otani, Yuta Nakashima, Esa
Rahtu, Janne Heikkilä, Tetsuya Sakai
230. A Large-Scale Comprehensive Dataset and Copy-Overlap
Aware Evaluation Protocol for Segment-Level Video Copy
Detection, Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei
Zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, JinXiong
Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo
Zhang, Lei Yang
231. Assembly101: A Large-Scale Multi-View Video Dataset for
Understanding Procedural Activities, Fadime Sener, Dibyadip
Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert
Wang, Angela Yao
232. Optimal Correction Cost for Object Detection Evaluation, Mayu
Otani, Riku Togashi, Yuta Nakashima, Esa Rahtu, Janne Heikkilä,
Shin'ichi Satoh
233. GrainSpace: A Large-Scale Dataset for Fine-Grained and
Domain-Adaptive Recognition of Cereal Grains, Lei Fan, Yiwen
Ding, Dongdong Fan, Donglin Di, Maurice Pagnucco, Yang Song
234. ABO: Dataset and Benchmarks for Real-World 3D Object
Understanding, Jasmine Collins, Shubham Goel, Kenan Deng,
Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas
F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu
Guillaumin, Jitendra Malik
235. Improving Segmentation of the Inferior Alveolar Nerve Through
Deep Label Propagation, Marco Cipriano, Stefano Allegretti,
Federico Bolelli, Federico Pollastri, Costantino Grana
236. ZeroWaste Dataset: Towards Deformable Object Segmentation
in Cluttered Scenes, Dina Bashkirova, Mohamed Abdelfattah,
Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky,
Berk Calli, Sarah Adel Bargal, Kate Saenko
237. DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for
Semantic Change Segmentation, Aysim Toker, Lukas
Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero,
Jingliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy
Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura
Leal-Taixé
238. Open Challenges in Deep Stereo: The Booster Dataset, Pierluigi
Zama Ramirez, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano
Mattoccia, Luigi Di Stefano
239. No-Reference Point Cloud Quality Assessment via Domain
Adaptation, Qi Yang, Yipeng Liu, Siheng Chen, Yiling Xu, Jun Sun
240. Exploring Endogenous Shift for Cross-Domain Detection: A
Large-Scale Benchmark and Perturbation Suppression
Network, Renshuai Tao, Hainan Li, Tianbo Wang, Yanlu Wei, Yifu
Ding, Bowei Jin, Hongping Zhi, Xianglong Liu, Aishan Liu
241. How Good Is Aesthetic Ability of a Fashion Model? Xingxing
Zou, Kaicheng Pang, Wen Zhang, Waikeung Wong
242. Instance-Wise Occlusion and Depth Orders in Natural Scenes,
Hyunmin Lee, Jaesik Park
243. PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose
Estimation With Photometrically Challenging Objects,
Pengyuan Wang, HyunJun Jung, Yitong Li, Siyuan Shen, Rahul
Parthasarathy Srikanth, Lorenzo Garattoni, Sven Meier, Nassir
Navab, Benjamin Busam
244. Replacing Labeled Real-Image Datasets With Auto-Generated
Contours, Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada,
Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat
Martinez-Noriega, Nakamasa Inoue, Rio Yokota
245. V2C: Visual Voice Cloning, Qi Chen, Mingkui Tan, Yuankai Qi,
Jiaqiu Zhou, Yuanqing Li, Qi Wu
246. M5Product: Self-Harmonized Contrastive Learning for E-
Commercial Multi-Modal Pretraining, Xiao Dong, Xunlin Zhan,
Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, Xiaoyong
Wei, Minlong Lu, Yaowei Wang, Xiaodan Liang
247. It Is Okay To Not Be Okay: Overcoming Emotional Bias in
Affective Image Captioning by Contrastive Data Collection,
Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov,
Mohamed Elhoseiny
248. From Representation to Reasoning: Towards Both Evidence and
Commonsense Reasoning for Video Question-Answering,
Jiangtong Li, Li Niu, Liqing Zhang
249. Point Cloud Pre-Training With Natural 3D Structures, Ryosuke
Yamada, Hirokatsu Kataoka, Naoya Chiba, Yukiyasu Domae,
Tetsuya Ogata

250. The Auto Arborist Dataset: A Large-Scale Benchmark for
Multiview Urban Forest Monitoring Under Domain Shift, Sara
Beery, Guanhang Wu, Trevor Edwards, Filip Pavetic, Bo Majewski,
Shreyasee Mukherjee, Stanley Chan, John Morgan, Vivek Rathod,
Jonathan Huang
251. AutoMine: An Unmanned Mine Dataset, Yuchen Li, Zixuan Li,
Siyu Teng, Yu Zhang, Yuhang Zhou, Yuchang Zhu, Dongpu Cao,
Bin Tian, Yunfeng Ai, Zhe Xuanyuan, Long Chen
252. SmartPortraits: Depth Powered Handheld Smartphone Dataset
of Human Portraits for State Estimation, Reconstruction and
Synthesis, Anastasiia Kornilova, Marsel Faizullin, Konstantin
Pakulev, Andrey Sadkov, Denis Kukushkin, Azat Akhmetyanov,
Timur Akhtyamov, Hekmat Taherinejad, Gonzalo Ferrer
253. BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise
Annotations, Daiqing Li, Huan Ling, Seung Wook Kim, Karsten
Kreis, Sanja Fidler, Antonio Torralba
254. Rope3D: The Roadside Perception Dataset for Autonomous
Driving and Monocular 3D Object Detection Task, Xiaoqing Ye,
Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao
Tan, Errui Ding
255. Unifying Panoptic Segmentation for Autonomous Driving,
Oliver Zendel, Matthias Schörghuber, Bernhard Rainer, Markus
Murschitz, Csaba Beleznai
256. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure
Cooperative 3D Object Detection, Haibao Yu, Yizhen Luo, Mao
Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu
Li, Xing Hu, Jirui Yuan, Zaiqing Nie
257. SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task
Domain Adaptation, Tao Sun, Mattia Segu, Janis Postels, Yuxuan
Wang, Luc Van Gool, Bernt Schiele, Federico Tombari, Fisher Yu
258. Ithaca365: Dataset and Driving Perception Under Repeated and
Challenging Weather Conditions, Carlos A. Diaz-Ruiz, Youya Xia,
Yurong You, Jose Nino, Junan Chen, Josephine Monica, Xiangyu
Chen, Katie Luo, Yan Wang, Marc Emond, Wei-Lun Chao, Bharath
Hariharan, Kilian Q. Weinberger, Mark Campbell

1700–1800 Plenary 4 (Hall B1)


Hosts: Kristin Dana (Rutgers Univ.; Steg AI)
Dimitris Samaras (Stony Brook Univ.)
Panel: Embodied Computer Vision
Participants:
• Moderator: Martial Hebert (CMU)
• Kristen Grauman (UT Austin; Meta AI)
• Nicholas Roy (Zoox; MIT)
• Michael Ryoo (Stony Brook Univ.; Google)

CVPR 2022 Gold & Silver Donors (Inside Back Cover)

CVPR 2022 Platinum Donors (Outside Back Cover)
