
IMAGE INPAINTING

An overview and a look into its future


Samuel Luis Rodríguez Maness
December 2022

1. Abstract

Image inpainting is the process of restoring or reconstructing corrupted regions of an image
using the structural information that already exists in the picture. However, when that existing
data is scarce, implausible results are often generated. Because the task is difficult to complete
accurately, researchers have devised a wide range of procedures in the hope of optimizing
results and establishing a new norm. This paper provides an overview of the current state of
image inpainting research and reports on some of the latest innovative practices, along with
their advantages and their limitations.

2. Introduction

Image inpainting, which aims to fill missing areas of pictures in a plausible way, has had many
real-world applications over the years, ranging from the restoration of film, photographs, and
paintings to the removal of advertisements and subtitles from images [1], making it a highly valuable tool.
Image inpainting has therefore become an increasingly researched topic, with various methods
available to carry out the process. Essentially, approaches fall into two divisions: traditional
methods and the newer deep learning-based methods that rely on trained networks.
Traditional strategies are further sub-divided, usually into diffusion-based and patch-based
methods. These rely on mathematics and on the chance that the unknown region resembles the
known regions, and they produce poor results when the hole is too large, such as discontinuous
texture in the hole region. Deep learning methods perform better and have become a hot spot
in computer vision in recent years, but they require large amounts of data, are complex to
implement, and are effective only when the training and testing images are similar [2].
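
To make the traditional side concrete, here is a minimal sketch of diffusion-based inpainting using OpenCV's built-in routine; the file names are placeholders and the parameter values are merely illustrative.

```python
import cv2

# Load the damaged image and a binary mask marking the hole
# (non-zero pixels = missing region). File names are placeholders.
img = cv2.imread("damaged.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Navier-Stokes inpainting diffuses information inward from the hole
# boundary; inpaintRadius sets the neighbourhood considered per pixel.
restored = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_NS)
cv2.imwrite("restored.png", restored)
```

Because the fill is propagated purely from the hole boundary, a routine like this also illustrates why large holes end up with smeared, discontinuous textures.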
Lately, investigators have proposed methods that go beyond these traditional solutions and
seek to make image inpainting far more effective, also considering the context in which it is
used. In the sections that follow, some of the most important and inventive approaches are
covered in detail, to inform the reader about advances in the field and how they can be of
benefit.
3. Novel Strategies for Image Inpainting

Using edge loss

For effective image inpainting, several researchers have found it particularly useful to exploit the
edge information in an image, using edge maps as guidance.
A notable model based on edge and structure information is the medical image inpainting model, used for
fixing distortion in medical images such as magnetic resonance imaging (MRI) scans [3]. This model obtains
the edge image with a Canny edge detector and the structure image with relative total variation (RTV)
smoothing. It then uses both edge and structure images as priors of the ground truth, which guide the
network toward reasonable results. Finally, a multi-level loss is applied to synthesize meaningful structural textures.
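
As a rough illustration of how such priors can be extracted, the sketch below uses OpenCV. Note that OpenCV has no RTV implementation, so an edge-preserving filter stands in for the structure image here; the thresholds, filter parameters, and file name are illustrative assumptions.

```python
import cv2

# Placeholder file name; any corrupted scan would do.
img = cv2.imread("mri_slice.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Edge prior: a Canny edge map, as in [3]. Thresholds are illustrative.
edge_prior = cv2.Canny(gray, 50, 150)

# Structure prior: [3] uses RTV smoothing, which OpenCV does not ship;
# an edge-preserving filter is used here as a rough stand-in.
structure_prior = cv2.edgePreservingFilter(img, flags=cv2.RECURS_FILTER,
                                           sigma_s=60, sigma_r=0.4)
```

Both priors would then be fed to the inpainting network alongside the corrupted input, so the network has explicit guidance about where edges and smooth structures should continue through the hole.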
Like the medical image inpainting model, facial image inpainting has the goal of reconstructing
faces, and it relies heavily on the visible edges of the person's face [4]. One application of this practice is
facial recognition and security, for example when a user wishes to unlock their phone with their face but
it is partially covered by a facemask.
Another study also makes use of the Canny edge operator, employing an edge loss to build a
spatial projection layer that projects information from non-hole regions into hole regions [5]. This
procedure gives the resulting inpainted image an efficient form of spatial consistency.
According to the reported experiments, the results are very positive, with more reasonable textures than
other methods. However, this approach assumes that edge and structure information is the key to
reconstructing the image; if that information cannot be extracted from another type of image, the model
could prove less effective. In particular, the spatial projections model reports limitations of its own,
such as inconsistencies when dealing with images containing large amounts of textural information.
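
To make the edge-loss idea concrete, the following is a minimal, generic sketch in PyTorch: it penalizes disagreement between Sobel edge maps of the prediction and the ground truth. The exact loss formulations used in [3] and [5] differ in detail.

```python
import torch
import torch.nn.functional as F

def edge_loss(pred, target):
    """L1 distance between Sobel edge maps of prediction and ground truth.

    A generic edge loss; the formulations in [3] and [5] differ in detail.
    pred, target: (N, 1, H, W) grayscale tensors in [0, 1].
    """
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=pred.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)

    def edges(x):
        gx = F.conv2d(x, sobel_x, padding=1)
        gy = F.conv2d(x, sobel_y, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    return F.l1_loss(edges(pred), edges(target))
```

Added to an ordinary reconstruction loss, a term like this pushes the network to continue contours through the hole rather than only matching pixel colours.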

Text-guidance

Consider how, in daily life, descriptions of pictures can guide people in restoring them
purposefully; in this strategy, the user guides the inpainting task with text inputs. We
will see how a textual description used as a reference can provide rich information for image inpainting [6].
Several models have been proposed, but they often prove insufficient, producing
discontinuities between the known and unknown areas.
One of the best-performing models, TGNet (Text-Guided Inpainting Network) [7], provides
a deeply fused text-image module. It works in two steps: first, the extraction of image and
text features; second, a mask attention module aimed at smoothing the boundaries between the
hole and complete regions.
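
The sketch below illustrates this two-step idea as a minimal PyTorch module; the layer sizes, the sigmoid-gated blending, and the module name `TextImageFusion` are illustrative assumptions, not TGNet's published architecture.

```python
import torch
import torch.nn as nn

class TextImageFusion(nn.Module):
    """Minimal sketch of the two-step idea in [7]: fuse text and image
    features, then use mask attention to smooth hole boundaries.
    All sizes are illustrative, not TGNet's actual architecture."""

    def __init__(self, img_ch=64, txt_dim=512):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, img_ch)          # step 1: align text to image channels
        self.fuse = nn.Conv2d(img_ch * 2, img_ch, 3, padding=1)
        self.attn = nn.Conv2d(img_ch + 1, 1, 3, padding=1)  # step 2: mask attention

    def forward(self, img_feat, txt_emb, mask):
        # img_feat: (N, C, H, W); txt_emb: (N, txt_dim); mask: (N, 1, H, W), 1 = hole.
        n, c, h, w = img_feat.shape
        t = self.txt_proj(txt_emb).view(n, c, 1, 1).expand(n, c, h, w)
        fused = self.fuse(torch.cat([img_feat, t], dim=1))
        # Attention weights conditioned on the mask blend the text-guided
        # features into the hole while keeping known regions mostly intact.
        a = torch.sigmoid(self.attn(torch.cat([fused, mask], dim=1)))
        return a * fused + (1 - a) * img_feat
```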
Text guidance is innovative because it can not only repair the image with great precision using
text inputs, but also change its attributes almost entirely. This gives the user freedom to modify
the hole area depending on the input they provide. For example, if the corrupted image contains
a man's face, we can change the input and obtain a face with blue eyes, or with a moustache.

This method appears to be one of the most interesting approaches, owing to the combination of
text and image modules and to the range of outputs that can be obtained depending on what the
user chooses to write. It is also worth noting that such new technology has its limits, and perhaps
not every word the user inputs will give the desired result.

Learning-based approaches improved

Even state-of-the-art models based on deep learning are subject to limitations, owing to
the complexity of some image inpainting cases. Some researchers have built on these
models and aimed to fix those limitations by incorporating changes into the learning process.
Images with a complicated foreground and background composition present a very challenging
inpainting scenario. To tackle this issue, foreground-aware image
inpainting proposes a system that explicitly incorporates knowledge of the foreground object into the
training process [8]. The model first detects the contour of the foreground and then inpaints the
missing region using the predicted contour as guidance.
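
This two-stage flow can be sketched as follows; `contour_net` and `inpaint_net` are hypothetical stand-ins for the networks trained in [8].

```python
import torch

def foreground_aware_inpaint(image, mask, contour_net, inpaint_net):
    """Two-stage flow from [8], sketched with hypothetical networks:
    contour_net predicts the completed foreground contour from the
    corrupted input; inpaint_net fills the hole guided by that contour."""
    x = torch.cat([image * (1 - mask), mask], dim=1)   # corrupted input + hole mask
    contour = contour_net(x)                           # stage 1: complete the object contour
    y = torch.cat([image * (1 - mask), mask, contour], dim=1)
    completed = inpaint_net(y)                         # stage 2: contour-guided inpainting
    # Keep known pixels from the original image; fill only the hole.
    return image * (1 - mask) + completed * mask
```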
Another notable study proposes an easier and more straightforward approach to image
inpainting through neural network hallucinations [9]. The system uses a generically pre-trained
deep neural network instead of one trained specifically for inpainting. This network produces an
inaccurate, "dreamlike" output in the hole region, called a hallucination, which is then regularized
with the Total Variation (TV) norm until a viable result is obtained. Using a network that was never
trained for the task proves much easier than training a specialized deep neural network. Neural
network hallucinations are relevant not only to image inpainting, but also to research on Natural
Language Generation (NLG) [10].
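
A minimal sketch of the TV regularization step, assuming the hallucinated content is already available as a tensor, might look as follows in PyTorch; the step count and learning rate are arbitrary choices.

```python
import torch

def tv_regularize(halluc, image, mask, steps=200, lr=0.1):
    """Refine a network 'hallucination' in the hole by minimizing the
    Total Variation (TV) norm, as in the regularization step of [9].
    halluc, image: (1, C, H, W); mask: (1, 1, H, W), 1 = hole."""
    x = (image * (1 - mask) + halluc * mask).clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Anisotropic TV norm: absolute differences between neighbours.
        tv = (x[..., :, 1:] - x[..., :, :-1]).abs().mean() \
           + (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
        tv.backward()
        opt.step()
        # Re-impose the known pixels after every step.
        with torch.no_grad():
            x.data = image * (1 - mask) + x.data * mask
    return x.detach()
```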
While all these models perform well in the tests reported for them, their usefulness is specific
to certain cases. The foreground-aware model draws a contour of the foreground object, but if
there is no clear foreground, or if there are multiple objects, it may not produce the desired
output. In the case of the pre-trained deep neural network, the outcome will be better or worse
depending on the quality of the regularizer. Consequently, there is room for improvement in
this strategy.
Context means everything

As described in this paper, there are many novel approaches to improving the image
inpainting procedure, and all seem to provide encouraging results. A very significant aspect to consider,
if we wish to make image inpainting as accurate as possible, is the situation we are dealing with:
the context in which we wish to repair an incomplete image.

For example, whether the picture is a medical scan or a portrait of a person changes things
greatly. In this field of visual computing, it is difficult to obtain a single best procedure that
encompasses all problems. To achieve the best results, we must acknowledge that, for now, we need
multiple strategies to approach different image inpainting situations.
Having described various image inpainting techniques, we can deduce when it is most
appropriate to use the models they propose. For instance, if we do not know what the corrupted
area holds and we want to fill it with specific information, we can make use of textual guidance.
Alternatively, for medical scans it is convenient to use medical image inpainting, focusing on edge
and structural information. When there is a clear background and foreground, foreground-aware
image inpainting should be effective, and if we wish to complete an image at lower quality, making
use of neural network hallucinations is very straightforward.
Knowing each model's limitations also tells us when not to use it, as in the case of image
inpainting via spatial projections, which has trouble producing good results on highly textured
images such as a rocky landscape.

4. Conclusion

The search for alternatives that work beyond the traditional image inpainting methods has
produced many feasible solutions, which makes it difficult to choose a single best model and,
hence, makes context important for knowing when to use one process or another. Still, researchers
will very likely come up with new strategies as investigations progress, providing new solutions
for the different types of image inpainting problems that we may encounter.

References

[1] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. 2000. Image
inpainting. In Proceedings of the 27th annual conference on Computer graphics and interactive
techniques (SIGGRAPH '00). ACM Press/Addison-Wesley Publishing Co., USA, 417–424.
https://doi.org/10.1145/344779.344972

[2] Xiaobo Zhang, Donghai Zhai, Tianrui Li, Yuxin Zhou, Yang Lin, "Image inpainting based on deep
learning: A review", Information Fusion, Volume 90, 2023, Pages 74-94, ISSN 1566-2535,
https://doi.org/10.1016/j.inffus.2022.08.033.

[3] Qianna Wang, Yi Chen, Nan Zhang, Yanhui Gu, “Medical image inpainting with edge and structure
priors”, Measurement, Volume 185, 2021, 110027, ISSN 0263-2241,
https://doi.org/10.1016/j.measurement.2021.110027.

[4] X. Gao, M. Nguyen and W. Q. Yan, "Face Image Inpainting Based on Generative Adversarial
Network," 2021 36th International Conference on Image and Vision Computing New Zealand
(IVCNZ), 2021, pp. 1-6, doi: 10.1109/IVCNZ54163.2021.9653347.

[5] Shruti S. Phutke, Subrahmanyam Murala, "Image inpainting via spatial projections", Pattern
Recognition, Volume 133, 2023, 109040, ISSN 0031-3203,
https://doi.org/10.1016/j.patcog.2022.109040.

[6] Ailin Li, Lei Zhao, Zhiwen Zuo, Zhizhong Wang, Wei Xing, Dongming Lu, "MIGT: Multi-modal Image
Inpainting Guided with Text", Neurocomputing, 2022, ISSN 0925-2312,
https://doi.org/10.1016/j.neucom.2022.11.074.

[7] Y. Gao and Q. Zhu, "Text-Guided Image Inpainting," 2022 IEEE 6th Information Technology and
Mechatronics Engineering Conference (ITOEC), 2022, pp. 1804-1807, doi:
10.1109/ITOEC53115.2022.9734369.

[8] W. Xiong et al., "Foreground-Aware Image Inpainting," 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), 2019, pp. 5833-5841, doi: 10.1109/CVPR.2019.00599.

[9] A. Fawzi, H. Samulowitz, D. Turaga and P. Frossard, "Image inpainting through neural networks
hallucinations," 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop
(IVMSP), 2016, pp. 1-5, doi: 10.1109/IVMSPW.2016.7528221.

[10] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea
Madotto, and Pascale Fung. 2022. Survey of Hallucination in Natural Language Generation. ACM
Comput. Surv. Just Accepted (November 2022). https://doi.org/10.1145/3571730
