Figure 7. (a) original image; (b) result from HAT; (c) result from bicubic interpolation; (d) result from SRCNN; (e) result from ESRGAN; (f) result from SwinIR.
require training multiple models or evaluating their performance. The drawback of this approach is that it may not capture the unique traits and nuances of vintage face photos, such as wrinkles, spots, or hairstyles. Furthermore, the pre-trained model might introduce biases or artifacts, such as altering the gender, skin tone, or facial expression, that are not present in the original photos. Our method is preferable because it compares the performance of multiple models using several metrics. In this way, we can identify the model that produces faithful, high-quality super-sampled images of vintage faces while preserving their original features and expressions. Because our method covers a broad range of models and metrics, it is more thorough and rigorous, and it can yield insights for future studies and applications in image super-resolution.

5. Experiment

Dataset and Model Setup. Our experiment commenced with the FFHQ dataset [7], a comprehensive collection of 70,000 high-resolution PNG images of human faces. Owing to its high-quality content and detailed attributes, the FFHQ dataset was ideal for training our super-resolution models.

The models we trained were cloned from repositories available on GitHub [8, 9, 10, 11]. The training process was computationally intensive and time-consuming, with each model taking approximately six days to train. Despite the resource intensity, this phase was crucial for ensuring that our models could accurately learn and replicate the high-resolution characteristics of the FFHQ images.

Model Evaluation. The primary objective of our experiment was to assess the performance of four super-resolution models: SRCNN, ESRGAN, SwinIR, and HAT [8, 9, 10, 11]. We evaluated these models with two prominent image quality metrics: the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). PSNR is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation, making it a standard measure of reconstruction quality for lossy compression codecs. SSIM, on the other hand, measures the similarity between two images: the SSIM index can be viewed as a quality measure of one image, provided the other image is regarded as of perfect quality. Higher PSNR and SSIM values indicate a better model; as Table 1 shows, HAT scores highest among all the models. The results of our experiment are presented visually in Figure 7 (a)-(f). An analysis of these results reveals distinct characteristics and performance levels for each super-resolution model.
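Both metrics can be computed directly from pixel arrays. The following NumPy sketch is illustrative only (it is not the evaluation code used in the experiment); note that `ssim_global` computes SSIM over the whole image in a single window, a simplification of the original sliding-Gaussian-window formulation:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=255.0):
    """Simplified single-window SSIM (the standard form averages a
    sliding Gaussian window over the image instead)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A perfect reconstruction yields infinite PSNR and an SSIM of 1.0; any distortion lowers both scores.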
Method    PSNR     SSIM
SRCNN     27.855   0.819
ESRGAN    28.667   0.826
SwinIR    30.098   0.860
HAT       30.535   0.866

Table 1. PSNR and SSIM results for each model (higher is better). HAT performs best.

The HAT model's output closely resembles the ground truth image. The processed image appears to have less noise, and it looks as if a filter has been applied, resulting in a higher exposure. This can be attributed to HAT's advanced learning capabilities, which are designed to suppress noise while enhancing the overall quality of upscaling. The model effectively learns the high-frequency components, producing more visually pleasing images that maintain a balance between sharpness and naturalness.

The SRCNN model [1], on the other hand, seems to have made minimal changes to the low-resolution input image. Visually, the output appears to be a higher-resolution version of the original low-resolution input, with no significant enhancement in detail or quality. However, it is worth noting that SRCNN requires the least computational power among the models we tested. It has a simpler network structure, which results in faster computation times but at the cost of more advanced super-resolution features.

The ESRGAN model shows an impressive level of detail enhancement [2]. This is due to the model's perceptual loss function, which optimizes textures and details during the upscaling process. However, this strength can also be a weakness: in its quest to add detail, ESRGAN can introduce elements that deviate from the original image's content. This is because the model uses a generative adversarial network (GAN) structure, in which the generator tries to create images that the discriminator cannot distinguish from high-resolution images. In this process, it sometimes generates high-frequency details that are not present in the ground truth image, creating objects or details that should not be there.

Based on our experimental results, we can conclude that HAT [5, 6] is the best-performing method among the tested approaches for facial image restoration. HAT consistently demonstrates superior performance in preserving crucial facial features such as eyelashes and eyebrows, resulting in more realistic and visually appealing images. In comparison, SwinIR and ESRGAN show inferior detail preservation, with ESRGAN being particularly prone to overwriting eyelashes with other color pixels. This finding highlights the effectiveness of HAT in capturing and restoring fine details in facial images, which is essential for accurate and realistic representation.

In summary, our experimental results demonstrate the superior performance of HAT over state-of-the-art methods such as SwinIR and ESRGAN in the context of facial image restoration. The ability of HAT to preserve fine details such as eyelashes and eyebrows contributes to its overall effectiveness in producing high-quality, realistic images, making it a promising approach for various applications in image processing and computer vision.

Each super-resolution model has its strengths and weaknesses, and the choice of model depends on the specific requirements of the task. In our case, HAT performed the best overall, offering a good balance between detail enhancement, noise suppression, and faithfulness to the original image.

5.1. Conclusion

Our study highlights the effectiveness of the HAT model in facial image restoration, making it a promising approach for various applications in image processing and computer vision. However, it is important to recognize that each super-resolution model has its strengths and weaknesses, and the choice of model depends on the specific requirements of the task. Future work could explore integrating the strengths of these models to develop a more robust and versatile super-resolution method for a wider range of applications.

5.2. References

[1] C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," arXiv:1501.00092 [cs], Jul. 2015. [Online]. Available: https://arxiv.org/abs/1501.00092.
[2] X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," arXiv.org, 2018. [Online]. Available: https://arxiv.org/abs/1809.00219.
[3] X. Wang, L. Xie, C. Dong, and Y. Shan, "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data," arXiv.org, Aug. 17, 2021. [Online]. Available: https://arxiv.org/abs/2107.10833 (accessed Jul. 10, 2023).
[4] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "SwinIR: Image Restoration Using Swin Transformer," arXiv.org, Aug. 23, 2021. [Online]. Available: https://arxiv.org/abs/2108.10257.
[5] X. Chen, X. Wang, J. Zhou, and D. Chen, "Activating More Pixels in Image Super-Resolution Transformer," arXiv (Cornell University), May 2022. doi: https://doi.org/10.48550/arxiv.2205.04437.
[6] X. Chen et al., "HAT: Hybrid Attention Transformer for Image Restoration," arXiv (Cornell University), Sep. 2023. doi: https://doi.org/10.48550/arxiv.2309.05239.
[7] "NVlabs/ffhq-dataset," GitHub, Apr. 09, 2021. [Online]. Available: https://github.com/NVlabs/ffhq-dataset.
[8] S. Salaria, "Using The Super-Resolution Convolutional Neural Network for Image Restoration," GitHub, Nov. 17, 2023. [Online]. Available: https://github.com/xoraus/Super-Resolution-CNN-for-Image-Restoration (accessed Dec. 06, 2023).
[9] Xintao, "xinntao/ESRGAN," GitHub, Jul. 26, 2023. [Online]. Available: https://github.com/xinntao/ESRGAN.
[10] J. Liang, "SwinIR: Image Restoration Using Swin Transformer," GitHub, Oct. 02, 2022. [Online]. Available: https://github.com/JingyunLiang/SwinIR (accessed Oct. 02, 2022).
[11] "HAT," GitHub, Dec. 06, 2023. [Online]. Available: https://github.com/XPixelGroup/HAT (accessed Dec. 06, 2023).