Ref VSR

Reference-based Video Super-Resolution
Using Multi-Camera Video Triplets
Junyong Lee, Myeonghee Lee, Sunghyun Cho, Seungyong Lee

Motivation
Multi-Camera Smartphones
The longer the focal length, the higher the resolution and the finer the detail with which the subject is captured
[Figure: Apple iPhone 13 Pro Max and Samsung Galaxy S21 Ultra]

Motivation
Reference-based Super-Resolution (RefSR)

Transfers high-quality details from a reference image to a low-resolution (LR) image
[Figure: Yang et al., CVPR ’20; Wang et al., ICCV ’21 (RefSR in a dual-camera setting)]
Patch-based global matching modules are widely used to match LR-Ref pairs
Learning texture transformer network for image super-resolution, Yang et al., CVPR 2020
Dual-camera super-resolution with aligned attention modules, Wang et al., ICCV 2021
Motivation
Reference-based Super-Resolution (RefSR)

Matching often fails in regions outside the field-of-view (FoV) overlap between the cameras
[Figure: LR↑ / Ref↓, Bicubic, and DCSR (RefSR)¹ results, inside and outside the overlap]
¹ Dual-camera super-resolution with aligned attention modules, Wang et al., ICCV 2021
Motivation
Reference-based Video Super-Resolution (RefVSR)

Extending the RefSR task to video
[Figure: LR↑ / Ref↓, Bicubic, DCSR (RefSR)¹, and Ours (RefVSR) results, inside and outside the overlap]
Motivation
Reference-based Video Super-Resolution (RefVSR)

High-fidelity textures drawn from reference frames at other time steps
[Figure: LR↑ / Ref↓, Bicubic, DCSR (RefSR)¹, and Ours (RefVSR) results, inside and outside the overlap]
Proposed RefVSR Framework

Input: low-resolution (LR: ultra-wide) and reference (Ref: wide-angle) video frames $\{I_t^{\mathrm{LR}}, I_t^{\mathrm{Ref}}\}_{t=1}^{N}$
Output: super-resolved (SR) frame sequence $\{I_t^{\mathrm{SR}}\}_{t=1}^{N}$
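[Figure: bidirectional recurrent framework. At each step ($t-1$, $t$, $t+1$), the LR frame $I_t^{\mathrm{LR}}$ and reference frame $I_t^{\mathrm{Ref}}$ are processed by a forward recurrent cell $F_f$ and a backward recurrent cell $F_b$, and an upsampling module $U$ produces the SR frame $I_t^{\mathrm{SR}}$.]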
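The input/output formulation and the bidirectional structure can be sketched as a toy in NumPy. This is not the authors' network: `toy_cell` is a hypothetical stand-in for $F_f$/$F_b$ that ignores the reference frame, the forward and backward states are simply averaged (an assumption), and nearest-neighbour repetition stands in for the learned upsampler $U$.

```python
import numpy as np

def upsample_nearest(x, scale=4):
    # Stand-in for the learned upsampler U: (C, H, W) -> (C, H*scale, W*scale).
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def toy_cell(h_prev, lr, ref):
    # Hypothetical recurrent cell: blends the propagated state with the
    # current LR frame. A real cell would also align and fuse `ref`.
    return 0.5 * h_prev + 0.5 * lr

def ref_vsr_bidirectional(lr_frames, ref_frames, cell=toy_cell, scale=4):
    n = len(lr_frames)
    forward, backward = [None] * n, [None] * n

    h = np.zeros_like(lr_frames[0])
    for t in range(n):  # forward pass (F_f): t = 1 .. N
        h = cell(h, lr_frames[t], ref_frames[t])
        forward[t] = h

    h = np.zeros_like(lr_frames[0])
    for t in reversed(range(n)):  # backward pass (F_b): t = N .. 1
        h = cell(h, lr_frames[t], ref_frames[t])
        backward[t] = h

    # Assumption: merge both directions by averaging, then upsample.
    return [upsample_nearest(0.5 * (forward[t] + backward[t]), scale)
            for t in range(n)]
```

With five all-ones 8×8 LR frames, each output is 32×32, and the first output mixes a forward state that has seen one frame with a backward state that has seen all five.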
Key Idea: Fuse and Propagate

Global matching is needed only between the LR frame and the reference frame of the current recurrent step
[Figure: LR and reference frames at steps $t-1$, $t$, $t+1$; features fused at each step are propagated forward through $F_f$ and backward through $F_b$, and $U$ produces the SR frames.]
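A quick count shows why matching only the current pair matters for long clips. The `all_pairs` baseline, in which each LR frame is globally matched against every reference frame of an $N$-frame clip, is a hypothetical comparison point; counting one matching per step also assumes the result for $I_t^{\mathrm{LR}}$ and $I_t^{\mathrm{Ref}}$ is computed once and shared by both propagation directions.

```python
def count_global_matchings(n_frames: int, scheme: str) -> int:
    """Number of LR-Ref global matching operations for an n_frames clip."""
    if scheme == "all_pairs":
        # Hypothetical baseline: every LR frame vs. every reference frame.
        return n_frames * n_frames
    if scheme == "fuse_and_propagate":
        # One matching per recurrent step (LR_t vs. Ref_t); detail from
        # earlier reference frames arrives through the propagated features.
        return n_frames
    raise ValueError(f"unknown scheme: {scheme}")

for n in (7, 50):
    print(n, count_global_matchings(n, "all_pairs"),
          count_global_matchings(n, "fuse_and_propagate"))
```

For a 50-frame clip this is 2,500 matchings versus 50: under fuse-and-propagate the matching cost grows linearly with clip length.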
Recurrent Cell
1. Inter-frame fusion
Propagated features + LR features → temporally aggregated features
2. Reference Alignment and Propagation (RAP)
Temporally aggregated features + reference features → propagating features
[Figure: recurrent cells $F_f$ and $F_b$, each applying inter-frame fusion and then RAP to the LR and reference frames of the current step]
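One recurrent step can be sketched as two functions, mirroring steps 1 and 2 of the recurrent cell. Assumptions: inter-frame fusion is modelled as channel concatenation followed by a 1×1 convolution and ReLU (the slides do not specify the layers, and any motion alignment of the propagated features is omitted), and RAP is replaced by a simple confidence-weighted blend. All function names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    # A 1x1 convolution is a per-pixel matrix multiply over channels.
    # x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)
    return np.einsum("oc,chw->ohw", weight, x)

def inter_frame_fusion(h_prop, f_lr, weight):
    # Step 1 (assumed form): concatenate propagated and LR features along
    # the channel axis, then fuse with a 1x1 conv and ReLU.
    x = np.concatenate([h_prop, f_lr], axis=0)
    return np.maximum(conv1x1(x, weight), 0.0)

def rap_placeholder(h_agg, f_ref_aligned, conf):
    # Step 2 stand-in (hypothetical): blend aggregated and aligned reference
    # features with a per-pixel confidence in [0, 1] of shape (1, H, W).
    return conf * f_ref_aligned + (1.0 - conf) * h_agg

def recurrent_cell(h_prop, f_lr, f_ref_aligned, conf, weight):
    # Returns the propagating features handed to the next time step.
    h_agg = inter_frame_fusion(h_prop, f_lr, weight)
    return rap_placeholder(h_agg, f_ref_aligned, conf)
```

With confidence 1 everywhere the cell passes the aligned reference features through; with confidence 0 it keeps the temporally aggregated features.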
Reference Alignment and Propagation

1. Reference feature alignment¹
Aligns reference features to LR features
2. Propagative temporal fusion (PTF)
Fuses temporally aggregated features and reference features based on matching confidence
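Reference feature alignment by global matching can be sketched as a cosine-similarity search. Simplifications: each spatial position's feature vector is matched on its own (1×1 patches rather than the larger patches typically used), features are passed in directly instead of being computed by the extractor $\phi$, and the function name is hypothetical. The best-matching index plays the role of the matching index $p_t$, and the best similarity that of the matching confidence $c_t$.

```python
import numpy as np

def match_and_align(feat_lr, feat_ref, eps=1e-8):
    """Global matching of every LR position against every Ref position.

    feat_lr: (C, H, W) LR features; feat_ref: (C, Hr, Wr) reference features.
    Returns aligned reference features (C, H, W), matching index (H, W),
    and matching confidence (H, W).
    """
    c, h, w = feat_lr.shape
    q = feat_lr.reshape(c, -1).T          # (H*W, C) LR feature vectors
    k = feat_ref.reshape(c, -1).T         # (Hr*Wr, C) reference feature vectors
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + eps)
    sim = qn @ kn.T                       # cosine similarity matrix (H*W, Hr*Wr)
    idx = sim.argmax(axis=1)              # matching index p_t
    conf = sim.max(axis=1)                # matching confidence c_t
    aligned = k[idx].T.reshape(c, h, w)   # gather best-matching Ref features
    return aligned, idx.reshape(h, w), conf.reshape(h, w)
```

The similarity matrix has $(H W) \times (H_r W_r)$ entries, which is why doing this global matching only once per recurrent step keeps the cost manageable.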
[Figure: RAP module. Features $\phi(I_t^{\mathrm{LR}})$ and $\phi(I_t^{\mathrm{Ref}})$ are compared in a cosine-similarity matrix, which yields a matching index $p_t$ and a matching confidence $c_t$. Reference feature alignment¹ uses $p_t$ to produce reference features $h_t^{\mathrm{Ref}}$ aligned to the LR frame. Propagative temporal fusion concatenates the temporally aggregated features $\hat{h}_t^{\{f,b\}}$ with $h_t^{\mathrm{Ref}}$ and fuses them with convolutions, guided by $c_t$ and the accumulated matching confidence $\tilde{c}_t^{\{f,b\}}$ (updated by an element-wise max with $c_t$), to output the propagating features $h_t^{\{f,b\}}$.]
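The confidence bookkeeping in propagative temporal fusion can be illustrated with a hand-crafted rule. This is an assumption-laden stand-in: the slides show a learned concatenation-and-convolution fusion guided by the confidences, whereas the sketch uses a hard per-pixel choice, taking the aligned reference feature only where the current matching confidence $c_t$ is at least the accumulated confidence propagated so far, then updating the accumulated confidence with an element-wise max. Motion between frames is ignored, and the function name is hypothetical.

```python
import numpy as np

def propagative_temporal_fusion(h_agg, h_ref, conf_t, conf_acc_prev):
    """Hard-selection stand-in for PTF (hypothetical, not the learned module).

    h_agg, h_ref: (C, H, W) temporally aggregated / aligned reference features
    conf_t: (H, W) matching confidence at the current step
    conf_acc_prev: (H, W) accumulated confidence carried from earlier steps
    """
    # Use the reference feature only where the current match is at least as
    # reliable as what has already been propagated to this position.
    take_ref = (conf_t >= conf_acc_prev)[None, :, :]
    h_out = np.where(take_ref, h_ref, h_agg)
    # Accumulated confidence never decreases along the propagation path.
    conf_acc = np.maximum(conf_acc_prev, conf_t)
    return h_out, conf_acc
```

In this toy, a position that was well matched at an earlier frame keeps a high accumulated confidence, so a poor match later (for instance once the content leaves the overlap) does not overwrite the propagated texture.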
Reference Alignment and Propagation

Matching confidence-based fusion lets only well-matched reference features be propagated
[Figure: results without and with propagative temporal fusion (PTF), inside and outside the overlap]
Training
RealMCVSR Dataset
The RealMCVSR dataset provides real-world HD video triplets concurrently recorded by the triple cameras of an Apple iPhone 12 Pro Max
(a) Ultra-wide Video (b) Wide-angle Video (c) Telephoto Video

Training
2-Stage Training Scheme Fully Utilizing Video Triplets

Pre-training stage (supervised setting)
• Inputs → 4× downsampled ultra-wide (LR) and wide (reference) videos
• Supervision → ultra-wide (HR) and wide (reference) videos
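A pre-training sample could be assembled from one triplet as follows. Assumptions: frames are NumPy arrays of shape (H, W, C), 4× average pooling stands in for whatever resizing kernel the authors use (bicubic, for example), and the function and dictionary-key names are hypothetical.

```python
import numpy as np

def downsample(frame, factor=4):
    # Average-pool an (H, W, C) frame by `factor`, cropping any remainder.
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    f = frame[: h2 * factor, : w2 * factor]
    return f.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

def make_pretraining_sample(ultra_wide, wide, factor=4):
    # Inputs are the 4x-downsampled videos; the originals become supervision.
    return {
        "lr": downsample(ultra_wide, factor),   # network input (LR)
        "ref": downsample(wide, factor),        # network input (reference)
        "hr": ultra_wide,                       # HR target for the SR output
        "ref_supervision": wide,                # wide-angle reference supervision
    }
```

Because the network sees only downsampled frames, the original ultra-wide frame can serve as a genuine HR target in this stage.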
Wide-angle videos for reference supervision
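$$\ell_{\mathrm{pre}} = \underbrace{\left\lVert I^{\mathrm{SR}}_{t,\mathrm{blur}} - I^{\mathrm{HR}}_{t,\mathrm{blur}} \right\rVert}_{\text{low-frequency term}} + \delta\left(I^{\mathrm{SR}}_{t}, I^{\mathrm{HR}}_{t}\right) + \underbrace{\ell_{\mathrm{M}\ldots}\left(I^{\mathrm{SR}}_{t}, I^{\mathrm{Wide}}_{t\in\Omega}\right)}_{\text{high-frequency term}}$$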
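The low-frequency term of $\ell_{\mathrm{pre}}$ compares blurred SR and HR frames, so it constrains colour and coarse structure while leaving fine detail to the other terms. A sketch, assuming a separable Gaussian blur with $\sigma = 2$ and an L1 norm averaged over pixels (the slides specify neither the kernel nor the norm); $\delta$ and the matching term are not implemented.

```python
import numpy as np

def gaussian_blur(img, sigma=2.0):
    # Separable Gaussian blur of an (H, W) or (H, W, C) array, reflect padding.
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    pad = [(radius, radius), (radius, radius)] + [(0, 0)] * (img.ndim - 2)
    out = np.pad(img, pad, mode="reflect")
    for axis in (0, 1):
        # "valid" convolution removes exactly the padding added on this axis.
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="valid")
    return out

def low_frequency_term(sr, hr, sigma=2.0):
    # || I^SR_{t,blur} - I^HR_{t,blur} ||, here as a mean absolute error.
    diff = gaussian_blur(sr, sigma) - gaussian_blur(hr, sigma)
    return float(np.mean(np.abs(diff)))
```

Adding a pixel-level ±1 checkerboard to the SR frame barely changes this term, while a uniform colour shift of 0.1 shows up in full; high-frequency detail is therefore left to the remaining terms.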
Training
2-Stage Training Scheme Fully Utilizing Video Triplets

Adaptation stage for real-world 4x VSR (self-supervised setting)
 Inputs  real-world ultra-wide (LR) and wide (reference) videos
 Supervision  telephoto (reference) video
Telephoto videos for reference supervision
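$$\ell_{8K} = \underbrace{\left\lVert I^{\mathrm{SR}}_{t\downarrow,\mathrm{blur}} - I^{\mathrm{UW}}_{t,\mathrm{blur}} \right\rVert}_{\text{low-frequency term}} + \underbrace{\ell_{\mathrm{M}\ldots}\left(I^{\mathrm{SR}}_{t}, I^{\mathrm{Tele}}_{t\in\Omega}\right)}_{\text{high-frequency term}}$$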
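In the adaptation stage there is no HR ultra-wide frame, so the low-frequency term compares the SR output, downsampled back to the input resolution, with the real ultra-wide input. The sketch assumes 4× average pooling for $\downarrow$, a separable Gaussian blur with $\sigma = 1$, and a mean absolute error; these choices are illustrative, not taken from the slides. For HD input (1080 × 1920), a 4× output is 4320 × 7680, i.e. 8K UHD.

```python
import numpy as np

def avg_pool(img, factor=4):
    # (H, W, C) -> (H // factor, W // factor, C) by block averaging.
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    blocks = img[: h2 * factor, : w2 * factor].reshape(h2, factor, w2, factor, c)
    return blocks.mean(axis=(1, 3))

def gaussian_blur(img, sigma=1.0):
    # Separable Gaussian blur over the two spatial axes with reflect padding.
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    pad = [(radius, radius), (radius, radius)] + [(0, 0)] * (img.ndim - 2)
    out = np.pad(img, pad, mode="reflect")
    for axis in (0, 1):
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="valid")
    return out

def low_frequency_term_8k(sr, ultra_wide, factor=4, sigma=1.0):
    # || I^SR_{t down, blur} - I^UW_{t, blur} ||, as a mean absolute error.
    sr_down = avg_pool(sr, factor)
    diff = gaussian_blur(sr_down, sigma) - gaussian_blur(ultra_wide, sigma)
    return float(np.mean(np.abs(diff)))
```

An SR frame that is just a 4× nearest-neighbour enlargement of the input scores zero here, which is why this term alone cannot drive detail synthesis; the telephoto-based high-frequency term supplies that signal.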
• Experiments
Qualitative Results
(8K ultra-wide VSR from real-world HD videos)
• Experiments
Video 1
The region inside the overlapping FoV of the ultra-wide and wide-angle frames
• Experiments
Video 1
The region outside the overlapping FoV of the ultra-wide and wide-angle frames
• Experiments
Video 2
• Experiments
Video 2
• Experiments
Video 3
• Experiments
Video 3
Results
Quantitative Results
State-of-the-art super-resolution performance
Improved fidelity balance > 49%
Thank You!
Paper | Project | More results
Analysis of reference video types
Analysis of alignment modules
Analysis of propagative temporal fusion
