
TECHNOLOGICAL WATCH, DREAM, UPHF, JANUARY 2024 1

Advances in Automatic Image Restoration and Upscaling: Enhancing Visual Data Fidelity
Cedric Derieux
Université Polytechnique Hauts-de-France, Master 1 Audiovisual Production, DREAM

Abstract: The proliferation of digital imagery in various sectors
necessitates advanced methods for improving image quality. Image
restoration and upscaling techniques have become crucial in transforming
degraded or low-resolution images into high-fidelity, high-resolution
outputs. Image upscaling is the process of increasing the resolution or
size of an image, thereby enhancing its overall visual quality. It is
widely used in various fields, including photography, graphic design, AI
art, and video production. The primary goal of image upscaling is to
improve the quality of images without the need to retake or recreate
them, saving time and resources. Today, with interfaces such as chaiNNer,
you can obtain a high-definition image from a low-definition image, using
a pre-trained upscaling model downloadable from the Internet. This paper
explores the computer science concepts enabling these techniques, with a
focus on the new learning models and architectures such as SRFormer, HAT,
FBCNN, ERSGAN, and RealESRGAN. We also contrast convolutional neural
networks (CNNs) and transformers, discussing their respective strengths
and limitations in the context of image restoration.

Index Terms: Learning Models, Image Restoration, Upscaling, Convolutional
Neural Network, Transformer-based architectures

I. THE FUNDAMENTALS OF LEARNING MODELS

Learning models are the backbone of modern artificial intelligence (AI).
They are essentially programs that have the capacity to learn from data.
Think of them as students who improve their knowledge over time with
study and experience. In computer science, these learning models adjust
their internal parameters (which are akin to the knowledge and skills a
student acquires) to make fewer errors in tasks they are designed to
perform, like recognizing objects in a picture, predicting the weather,
or, in our case, fixing and enlarging images. Deep learning, a subset of
machine learning, involves neural networks with multiple layers that
learn increasingly abstract, hierarchical features of the data, enabling
the capture of complex patterns necessary for high-quality image
reconstruction. [1]

A. PyTorch for Model Development
PyTorch, an open-source machine learning library, is a popular tool among
AI developers for its flexibility and dynamic computation graph. It
provides a platform akin to a sophisticated lab for building learning
models and conducting experiments. Valued for its dynamic nature, PyTorch
allows researchers to modify models on the fly and observe their
performance in real time. This adaptability proves particularly useful in
the iterative process of designing learning models, especially when the
best approach to achieve optimal results is uncertain. It is widely used
in applications such as computer vision and natural language processing.
[5]

II. CNN-BASED MODELS

Convolutional Neural Network (CNN) based models are a class of deep
learning models primarily used for analyzing visual data. CNNs are
designed to automatically and adaptively learn spatial hierarchies of
features from the input data, which makes them highly effective for tasks
like image classification, object detection, and image restoration. [6]
The network works using convolutional layers (Fig. 1) that focus on
low-level features like edges and textures in the early layers, then
aggregate and combine these low-level features to detect higher-level
features (like shapes or objects) in the later layers.

Fig. 1. Typical CNN architecture. [9]

A. FBCNN
The Flexible Blind Convolutional Neural Network (FBCNN) is a model
designed to restore images without any prior knowledge of the degradation
processes. It utilizes convolutional layers, which are fundamental
building blocks of CNNs, to learn from generic patterns in the data. This
enables FBCNN to adaptively restore a wide array of image corruptions.
The model demonstrates robustness in blind image restoration scenarios,
where the type and degree of image corruption are unknown. FBCNN consists
of four components:
decoupler, QF predictor, flexible controller and image reconstructor.
[10] (Fig. 2)
JPEG is one of the most widely used image compression algorithms and
formats due to its simplicity and fast encoding/decoding speeds. However,
it is a lossy compression algorithm and can introduce visible artifacts.
FBCNN is a flexible (Fig. 3) blind JPEG artifacts removal network for
real JPEG image restoration. [11] (Fig. 4)

Fig. 2. Network Architecture of FBCNN [11]

Fig. 3. Flexibility of FBCNN [11]

(a) JPEG (b) FBCNN Q=30
Fig. 4. Comparison between JPEG source image and FBCNN output

B. GAN
Generative Adversarial Networks (GANs) are a class of machine learning
frameworks invented by Ian Goodfellow and his colleagues in 2014. They
are designed to generate new data instances that resemble the training
data. GANs consist of two parts: a generator network, which produces new
data instances, and a discriminator network, which evaluates them for
authenticity. The generator improves its output based on feedback from
the discriminator, creating a dynamic adversarial process. This process
leads to the generation of high-quality data that closely mimics the
distribution of the original dataset. [13]

C. ERSGAN
The Enhanced Residual Sparse Generative Adversarial Network (ERSGAN) is a
variant of GAN that integrates residual learning and sparse
representations in a GAN framework. Residual learning helps the model to
learn complex functions by reformulating the layers as learning residual
functions with reference to the layer inputs. Sparse representations, on
the other hand, allow the model to focus on critical data points for
image restoration. This approach enhances the quality of the output for
images with intricate degradation patterns. [14]

D. RealESRGAN
RealESRGAN is an advancement of the ESRGAN framework for practical
application, specifically targeting the restoration of real-world images.
It addresses challenges such as over-smoothing and the loss of fine
details, which were prevalent in predecessor models. RealESRGAN achieves
this by generating textures that closely resemble those found in the
original high-resolution images. This results in restored images that
retain the naturalness and authenticity of the original images. (Fig. 5)
[16]

Fig. 5. Comparison between low-resolution source image and RealESRGAN
output

III. TRANSFORMER-BASED ARCHITECTURES

Transformer-based architectures are a type of deep learning model that
has revolutionized the field of natural language processing. The key
innovation of Transformer models is the self-attention mechanism, which
weights the importance of each part of the input data differently. For
example, this allows the model to consider the context of each word in a
sentence, leading to a better understanding of the input data. [18]
Unlike earlier sequence models such as recurrent neural networks (RNNs),
which process their input one element at a time, Transformers process
input sequences in parallel, making them efficient to train on modern
hardware. This is a significant advantage as it reduces the training
time.

A. SRFormer
SRFormer is a novel architecture that leverages the power of transformer
models. Transformers are known for their ability to learn contextual
relationships within data sequences, making them highly effective for
tasks involving sequential data.
In the context of SRFormer, these transformer models are used to process
image data. The architecture employs self-attention mechanisms, a key
component of transformer models, to distinguish and amplify
high-resolution features within images. This is a significant advancement
over traditional Convolutional Neural Network (CNN)-based methods, which
often struggle with predicting fine details during the upscaling process.
[22] (Fig. 6)
The core of SRFormer is the Permuted Self-Attention (PSA), a simple,
efficient, and effective attention mechanism. It allows the model to
build long-range pairwise correlations with even less computational
burden than the original Window Self-Attention (WSA) of SwinIR. SRFormer
is specifically designed
for single image super-resolution tasks, and it demonstrates superior
performance compared to state-of-the-art methods from both Transformer
and Convolutional networks. [23]

(a) Source Image (b) 4x Upscale SRFormer
Fig. 6. Comparison between low-resolution source image and SRFormer
output

B. HAT
The Hybrid Attention Transformer (HAT) is another transformer-based
architecture that makes extensive use of attention mechanisms. These
mechanisms enable the model to interpret images in a comprehensive
manner, considering the entire image as a whole rather than in isolated
segments.
This global perspective is crucial for tasks such as image restoration,
where understanding the interplay between various image segments is
necessary to accurately reconstruct or enhance image details. By paying
attention to the entire image, HAT can understand the context and
dependencies between different parts of the image, leading to more
accurate and holistic image restoration. [25] (Fig. 7)

(a) Source Image (b) 4x Upscale HAT
Fig. 7. Comparison between low-resolution source image and HAT output

IV. COMPARATIVE ANALYSIS

A. CNNs in Image Restoration
CNNs leverage convolutional filters to process image data. These filters
slide over the image to capture local patterns and textures. For image
restoration, CNNs are effective because they can learn hierarchical
feature representations and perform well in capturing spatial
hierarchies, which is beneficial for tasks like denoising and
super-resolution.

Strengths of CNNs:
- Efficient for spatial data processing.
- Locally connected layers lead to fewer parameters and faster training.
- Well-suited for capturing local dependencies and textures.

Weaknesses of CNNs:
- Limited receptive field size can hinder capturing global context
  (though this can be mitigated by deeper architectures or dilated
  convolutions).
- Sometimes struggle with restoring larger-scale structures.

B. Transformers in Image Restoration
Transformers, on the other hand, use self-attention mechanisms to weigh
the influence of different parts of the input data. For image
restoration, this means they can capture long-range dependencies and
global context more effectively than CNNs.

Strengths of Transformers:
- Excel at capturing global context and long-range dependencies within
  the image.
- Self-attention allows them to focus on relevant parts of the image
  regardless of their spatial position.
- Potentially more flexible in modeling various types of image
  degradation.

Weaknesses of Transformers:
- Typically require more computational resources and larger datasets to
  train effectively.
- Can be slower to train due to their complexity.
- May include many more parameters than CNNs, increasing the risk of
  overfitting on smaller datasets.

For image restoration, Transformers are often used when the task requires
understanding the entire image content, such as in cases of severe
degradation where local information is not sufficient to reconstruct the
original image. However, due to their computational intensity, hybrid
approaches combining CNNs and transformers are also popular, aiming to
leverage the strengths of both architectures.

C. Limitations and Challenges
These advanced image restoration methods face several challenges:
- Significant computational resources are necessary, especially during
  the training phase.
- Large, diverse datasets of high-quality images are required for
  effective training, which can be resource-intensive to compile.
- Artifacts may be introduced by the models, particularly in cases of
  severe or atypical degradation.
- Overfitting can occur when models are overly optimized for training
  data, reducing their generalizability to new images.
- Transformers, while powerful, may exhibit slower performance and higher
  resource consumption compared to CNNs.

V. TRAINING A DEEP LEARNING MODEL FOR IMAGE RESTORATION

Training a deep learning model for image restoration involves several
steps, from data preparation to model evaluation. Here is a high-level
overview of the process: [29]

Step 1: Dataset Preparation
Before training begins, a dataset consisting of degraded images and their
high-quality counterparts is required. These image pairs are used to
teach the model the mapping from a low-quality to a high-quality image.
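
As a rough illustration of how one (degraded, high-quality) training pair
for Step 1 might be produced, the sketch below downsamples a small
grayscale image by block averaging and then adds Gaussian noise. It is
written in plain Python so that it runs without dependencies; the function
names, the 4x4 sample image, the factor of 2 and the noise level are all
assumptions made for illustration and are not taken from any of the models
discussed in this paper.

```python
import random


def downsample_box(image, factor):
    """Shrink a grayscale image (list of rows) by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip each pixel to [0, 255]."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def make_training_pair(high_quality, factor=2, sigma=5.0, seed=0):
    """Return (degraded, high_quality) for one image; seeded for repeatability."""
    rng = random.Random(seed)
    degraded = add_gaussian_noise(downsample_box(high_quality, factor), sigma, rng)
    return degraded, high_quality


# Hypothetical 4x4 grayscale "high-quality" image (values in [0, 255]).
hq = [[0.0, 0.0, 255.0, 255.0],
      [0.0, 0.0, 255.0, 255.0],
      [100.0, 100.0, 50.0, 50.0],
      [100.0, 100.0, 50.0, 50.0]]

lq, target = make_training_pair(hq, factor=2, sigma=5.0, seed=0)
print(len(lq), "x", len(lq[0]), "degraded from", len(target), "x", len(target[0]))
```

A real pipeline would read image files and would typically combine several
degradations (blur, compression, noise); the fixed seed here only makes the
example repeatable.
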
- Degraded Images: These are images that have been purposefully
  downgraded through processes such as blurring, adding noise,
  compression artifacts, or downsampling.
- High-Quality Images: The dataset must also include the original or
  enhanced images that the model aims to reproduce.

Step 2: Model Architecture Selection
Selecting the right model architecture is crucial. This choice depends on
the specific task (denoising, deblurring, super-resolution) and the
available computational resources.

Step 3: Loss Function Design
The loss function measures the difference between the model's output and
the ground truth high-quality image. Common loss functions include Mean
Squared Error (MSE), Structural Similarity Index (SSIM), and perceptual
loss based on features extracted from pretrained networks.

Step 4: Model Training
During training, the model learns to restore images by adjusting its
parameters to minimize the loss function. This process involves:
- Forward Pass: The model processes the input degraded images to generate
  restored outputs.
- Loss Calculation: The loss between the restored output and the
  high-quality target image is calculated.
- Backpropagation: The model's parameters are updated in the direction
  that reduces the loss, using optimization algorithms like stochastic
  gradient descent (SGD) or Adam.

Step 5: Model Evaluation
After training, the model is evaluated on a separate validation set to
assess its generalization capabilities. Performance metrics might include
Peak Signal-to-Noise Ratio (PSNR), SSIM, and visual inspection.

VI. CONCLUSION

In conclusion, the landscape of automatic image restoration and upscaling
has witnessed significant advancements with the advent of CNN and
transformer-based models. The selection of an appropriate architecture,
whether SRFormer, HAT, FBCNN, ERSGAN, or RealESRGAN, depends on the
specific image restoration task. The creation of potent models
necessitates careful architecture choice, rigorous training, and
continuous refinement, facilitated by frameworks such as PyTorch. Despite
their potential, these methods contend with ongoing challenges, driving
continuous research to further advance the field of image restoration.

REFERENCES
[1] Discover the differences between classical and AI upscaling methods,
their benefits, and applications in enhancing image quality - 13 May 2023
[Online] Available:
https://unimatrixz.com/topics/ai-upscaler/upscaling-methods/
[2] chaiNNer Github Official Code [Online] Available:
https://github.com/chaiNNer-org/chaiNNer
[3] PyTorch documentation [Online] Available:
https://pytorch.org/docs/stable/index.html
[4] PyTorch Wikipedia [Online] Available:
https://en.wikipedia.org/wiki/PyTorch
[5] What is PyTorch? (Machine/Deep Learning) - Video by IBM Technology -
11 October 2023 [Online] Available:
https://www.youtube.com/watch?v=fJ40w_2h8kk
[6] Different types of CNN models [Online] Available:
https://iq.opengenus.org/different-types-of-cnn-models/
[7] Convolutional Neural Network: Tout ce qu’il y a à savoir - 25 June
2023 [Online] Available:
https://datascientest.com/convolutional-neural-network
[8] How to Develop Convolutional Neural Network Models for Time Series
Forecasting by Jason Brownlee on August 28, 2020 [Online] Available:
https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
[9] Convolutional neural network Wikipedia [Online] Available:
https://en.wikipedia.org/wiki/Convolutional_neural_network
[10] Towards Flexible Blind JPEG Artifacts Removal - Jiaxi Jiang, Kai
Zhang, Radu Timofte - 29 September 2021 [Online] Available:
https://arxiv.org/abs/2109.14573
[11] Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)
Github Official Code [Online] Available:
https://github.com/jiaxi-jiang/FBCNN
[12] Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021) for
Windows Github Official Code [Online] Available:
https://github.com/bycloudai/FBCNN-Windows
[13] What is a Generative Adversarial Network? by Thomas Wood [Online]
Available:
https://deepai.org/machine-learning-glossary-and-terms/generative-adversarial-network
[14] ERSGAN: A Beginner's Guide to Harnessing the Power of Advanced AI
Image Enhancement - Zaid Meccai - 24 July 2023 [Online] Available:
https://blog.segmind.com/how-to-get-started-with-the-ersgan-ai-model/
[15] Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure
Synthetic Data Github Official Code [Online] Available:
https://github.com/xinntao/Real-ESRGAN
[16] Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure
Synthetic Data [Online] Available: https://arxiv.org/abs/2107.10833
[17] What Is the Transformer Architecture and How Does It Work? [Online]
Available:
https://datagen.tech/guides/computer-vision/transformer-architecture/
[18] What is a transformer model? [Online] Available:
https://www.ibm.com/topics/transformer-model
[19] The Transformer Model by Stefania Cristina on January 6, 2023
[Online] Available:
https://machinelearningmastery.com/the-transformer-model/
[20] Transformer (machine learning model) [Online] Available:
https://en.wikipedia.org/wiki/Transformer_%28machine_learning_model%29
[21] SRFormer: Permuted Self-Attention for Single Image Super-Resolution
(ICCV2023) [Online] Available: https://github.com/HVision-NKU/SRFormer
[22] SRFormer: Efficient Yet Powerful Transformer Network for Single
Image Super Resolution - Armin Mehri, Parichehr Behjati, Dario Carpio,
Angel Domingo Sappa - 27 October 2023 [Online] Available:
https://ieeexplore.ieee.org/document/10298198
[23] SRFormer: Permuted Self-Attention for Single Image Super-Resolution
- Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou
- 17 March 2023 [Online] Available: https://arxiv.org/abs/2303.09735
[24] Activating More Pixels in Image Super-Resolution Transformer -
Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong - 19 March
2023 [Online] Available: https://arxiv.org/abs/2205.04437
[25] HAT: Hybrid Attention Transformer for Image Restoration - Xiangyu
Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou,
Chao Dong - 11 September 2023 [Online] Available:
https://arxiv.org/abs/2309.05239
[26] HAT Github [Online] Available: https://github.com/XPixelGroup/HAT
[27] A Survey of Deep Learning Approaches to Image Restoration - Jingwen
Su, Boyan Xu, Hujun Yin - 25 February 2022 [Online] Available:
https://research.manchester.ac.uk/en/publications/a-survey-of-deep-learning-approaches-to-image-restoration
[28] On-Demand Learning for Deep Image Restoration - Ruohan Gao, Kristen
Grauman - 2 August 2017 [Online] Available:
https://arxiv.org/pdf/1612.01380
[29] A Comprehensive Review of Deep Learning-Based Real-World Image
Restoration - Lujun Zhai, Yonghui Wang, Suxia Cui, Yu Zhou - 6 March 2023
[Online] Available: https://ieeexplore.ieee.org/document/10056934
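
As a supplementary illustration of the training procedure in Section V,
the sketch below walks through a forward pass, an MSE loss, a gradient
update and a PSNR evaluation on held-out data. It is a toy under stated
assumptions, not the training code of any model discussed here: the
"model" is a hypothetical two-parameter per-pixel correction y = w * x + b,
the degradation is an invented contrast reduction, the gradients are
derived by hand, and plain full-batch gradient descent stands in for SGD or
Adam. In practice PyTorch's automatic differentiation and optimizers would
take over these steps for a CNN or transformer.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for pixels in [0, max_value]."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


def degrade(pixels):
    """Hypothetical degradation: halve the contrast and add a 0.1 offset."""
    return [0.5 * p + 0.1 for p in pixels]


def train(inputs, targets, lr=0.5, steps=2000):
    """Fit y = w * x + b by full-batch gradient descent on the MSE loss."""
    w, b = 1.0, 0.0  # start from the identity mapping
    n = len(inputs)
    for _ in range(steps):
        preds = [w * x + b for x in inputs]               # forward pass
        errors = [p - t for p, t in zip(preds, targets)]  # residuals of the loss
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w                                  # parameter update
        b -= lr * grad_b
    return w, b


# Training pixels and a separate validation set (values in [0, 1]).
train_targets = [i / 10 for i in range(11)]
train_inputs = degrade(train_targets)
val_targets = [0.05 + i / 10 for i in range(10)]
val_inputs = degrade(val_targets)

w, b = train(train_inputs, train_targets)
restored = [w * x + b for x in val_inputs]
psnr_before = psnr(val_inputs, val_targets)
psnr_after = psnr(restored, val_targets)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"validation PSNR: {psnr_before:.1f} dB -> {psnr_after:.1f} dB")
```

Because the invented degradation is exactly invertible by a linear map, the
toy model can recover the targets almost perfectly; real degradations lose
information, which is why the larger architectures described above are
needed.
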
