
CARTOONIFYING AN

IMAGE
Submitted in partial fulfillment of the requirements
of the degree of

T. E. Computer Engineering
By

Kartik Parekh 37 192264


Shubh Shah 38 192265

Guide (s):

Ms. Varsha Nagpurkar


Assistant Professor

Department of Computer Engineering


St. Francis Institute of Technology
(Engineering College)

University of Mumbai
2020-2021

CERTIFICATE
This is to certify that the project entitled “Cartoonifying An Image” is a bonafide work of

“Kartik Parekh (37), Shubh Shah (38)” submitted to the University of Mumbai in partial

fulfillment of the requirement for the award of the degree of T.E. in Computer Engineering

(Name and sign)


Guide

(Name and sign)


Head of Department
Project Report Approval for T.E.

This project report entitled (Cartoonifying An Image) by (Kartik Parekh,


Shubh Shah) is approved for the degree of T.E. in Computer Engineering.

Examiners

1.

2.

Date:

Place: Mumbai
Declaration
I declare that this written submission represents my ideas in my own words
and where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles
of academic honesty and integrity and have not misrepresented or fabricated or
falsified any idea/data/fact/source in my submission. I understand that any
violation of the above will be cause for disciplinary action by the Institute and can
also evoke penal action from the sources which have thus not been properly cited
or from whom proper permission has not been taken when needed.

(Signature)

Kartik Parekh 37
Shubh Shah 38

Date:
Abstract
A user of a cartoon image retrieval system aims to retrieve, from a database, images that are relevant to a query image containing the same object (for example, a realistic photo or a human face). An important step in cartoon image retrieval is therefore identifying the object within the cartoon image. In this report, an efficient method for extracting objects from ordinary images is introduced. It is based on general assumptions about the colours and locations of objects in cartoon images: objects are generally drawn near the centre of the image, background colours are drawn most frequently near the edges of the image, and object colours rarely touch the edges. The processes of colour quantization, seed filling, and finding the object ghost have been used. The results of the conducted tests indicate that the system has promising efficiency for extracting both single and multiple objects lying on simple and complex backgrounds of cartoon images. We have performed qualitative and quantitative analysis and tests on many images to obtain a more accurate result.
Contents
Chapter  Contents                                  Page No.
1        INTRODUCTION                              7
         1.1 Description                           7
         1.2 Problem Formulation                   7
         1.3 Motivation                            8
         1.4 Proposed Solution                     8
         1.5 Scope of the project                  8
2        REVIEW OF LITERATURE                      9-11
3        SYSTEM ANALYSIS                           12-13
         3.1 Functional Requirements               12
         3.2 Non-Functional Requirements           12
         3.3 Specific Requirements                 12-13
         3.4 Use-Case Diagrams and description     13
4        ANALYSIS MODELING                         14-15
         4.1 Activity Diagrams / Class Diagram     14
         4.2 Functional Modeling                   15
5        DESIGN                                    16-25
         5.1 Architectural Design                  16-23
         5.2 User Interface Design                 24-25
6        IMPLEMENTATION                            26-40
         6.1 Algorithms / Methods Used             26-31
         6.2 Working of the project                32-40
7        CONCLUSION                                41
1 INTRODUCTION:

Cartoon images play essential roles in our everyday lives, especially in entertainment, education, and advertisement, and have become an increasingly active research topic in multimedia and computer graphics. Automatic cartoon object extraction is very useful in many applications; one of the most important is cartoon image retrieval, where the user of a cartoon image retrieval system aims to retrieve, from a database, images similar to the query image in terms of its character (i.e., a user has a cartoon image and wants all relevant images containing the same real-life character). Today, a number of researchers have exploited concepts from content-based image retrieval (CBIR) to search for cartoon images containing particular object(s) of interest, and several region-based retrieval methods have been proposed.
Some of the automatic methods, which discriminate the region(s) of interest from the other, less useful regions in an image, have been adapted to retrieve cartoon characters; they use partial features to recognize regions and/or aspects that are suitable for cartoon characterization or gesture recognition. Some efforts go beyond extracting central objects; others use Salient Object Detection (SOD). In this project, a simple automatic method for extracting objects from a cartoon image is proposed; it is based on the assumption that the wanted object is found within or close to the central part of the image.

1.1 Description:

The process carried out here is cartoonization, in which a given realistic image or any human face is converted with a cartoon-style filter. Any given realistic photo gets converted into a cartoon-style photo at the click of a 'Convert' button. Essentially, this filter smooths the image and masks the sharp edges in the converted output.

1.2 Problem Formulation:

Our childhood was filled with cartoons; they were an integral part of our lives at that time, and we were all very fond of cartoons and animated characters. We therefore propose a system in which you can convert any given image (even an image of yourself) into a cartoonified version of it. We have used one of the simpler machine learning algorithms (K-means clustering for colour quantization) to convert the image into a proper cartoonified version of it.
1.3 Motivation:
The following motivations led us to this work:
 The literature review above reveals that there are varied gaps in the study of converting an image into a cartoon image.
 An obvious disadvantage of smoothing is that it does not only smooth noise but also blurs important features such as edges, thus making them harder to identify.
 Linear diffusion filtering dislocates edges when moving from finer to coarser scales.
 To implement multiple bilateral filters.
 To apply multiple values to the existing parameters.

1.4 Proposed Solution:

In this project, we propose a technique wherein we import an image and then run it through multiple processing steps to reach the final cartoonified output. First, we convert the image to grayscale; then we use an edge detector to detect the sharp edges of the image. After the edges are extracted, we blur the image to give it more of a cartoon effect, and finally we apply a colour quantization process to convert the image into a cartoon-style painting. After this, our final output is ready. A minimal sketch of this pipeline is given below.
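As an illustration, a minimal sketch of this pipeline using OpenCV could look as follows; the parameter values are only indicative, and the individual steps are discussed in detail in Chapters 5 and 6.

import cv2
import numpy as np

def cartoonify(img, k=9):
    # 1. Grayscale, suppress noise, then detect the sharp edges.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 9, 5)
    # 2. Reduce the colour palette with K-means colour quantization.
    data = np.float32(img).reshape((-1, 3))
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)
    quantized = np.uint8(centers)[labels.flatten()].reshape(img.shape)
    # 3. Smooth the quantized colours, then overlay the edge mask.
    blurred = cv2.bilateralFilter(quantized, d=7, sigmaColor=200, sigmaSpace=200)
    return cv2.bitwise_and(blurred, blurred, mask=edges)

cartoon = cartoonify(cv2.imread("input.jpg"))
cv2.imwrite("cartoon.jpg", cartoon)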

1.5 Scope of the project:

The project showed that an image can be successfully converted into a cartoon-style image with the help of a colour quantization process, masking the edges, and giving it a smoothing effect; video clips were also transformed into animation clips with the help of the Python library cv2 (OpenCV). In future work, we would like to focus more on generating a well-defined HD portrait; we used a loss function but still did not reach that result. We also plan to focus more on video conversion so that we obtain HD or 4K quality video, which will be more beneficial. People around the globe like to see better quality images, movies, and other visuals, so achieving HD quality remains our main focus and a scope for extending this project.
2 REVIEW OF LITERATURE:

We have referred these three research papers for some more information on our project topic
Cartoonifying An Image.

Paper 1:

Auto-painter: cartoon image generation from sketch by using conditional GAN (cGAN)


The authors studied various problems faced by sketch artists while creating black-and-white cartoon drawings, for example colouring the sketches and mixing different colours to get a unique shade for a particular sketch. According to their research, artists had difficulty obtaining the unique or desired colour they needed after mixing two or more colours. To overcome this problem, sketch-to-image synthesis using conditional generative adversarial networks (cGAN) [3] was introduced. Later, they found that this approach faced problems and failed to give the desired output. To avoid this issue, they designed the Auto-painter model, which can automatically generate suitable colours for a sketch. Their model is based on a conditional GAN with a 'U-net' [2] structure, which allows the output image to keep both the low-level information of the sketch and the learned high-level colour information. They also added further constraints, based on the pix2pix model, to obtain finer paintings. They extended the Auto-painter with colour control so that the network could adjust the combined result to satisfy the user with various colours. The results showed that the Auto-painter could generate polished animated images from the two given datasets. In spite of the promising results, the system suffered from the difficulty of adjusting parameters, just like other deep learning models, and the combined network structure resulted in slower training.

Paper 2:

Library Cartoons: A Literature Review of Library-themed Cartoons, Caricatures, and Comics. Julia B. Chambers is an MLIS candidate at San Jose State University's School of Library and Information Science. To understand contrasting perspectives on past events, historians, political scientists, and sociologists have examined political and editorial cartoons with themes ranging from elections to economic policy to human rights. However, sparse research has been devoted to cartoons with library themes. The author of this paper examines peer-reviewed literature on the subject of library cartoons, including historical background, analysis of recurring themes, and arguments for promoting library-themed cartoons, caricatures, and comics. The author finds a large gap in the literature on this topic and concludes that information professionals would benefit from a comprehensive content analysis of library-themed cartoons to improve understanding of the significance of libraries during notable events, assess public perception of libraries, and identify trends over time.

Researchers have studied and analysed the influence and value of editorial cartoons in the United States since the beginning of the twentieth century, shortly after cartoons became a standard feature in East Coast newspapers. In a 1933 article, American art and literary critic Elizabeth Luther Cary argued that American caricature provided insight into history, revealing viewpoints or alternative attitudes that newspapers and history books have otherwise failed to record. Twenty years later, Stephen Becker (1959), author of Comic Art in America, agreed that early examples of caricature served to fill editorial gaps, in some cases acting as the only acceptable outlet for commentary too indecent or sensitive to appear in written editorials. Richard Felton Outcault's Yellow Kid editorial cartoons, published in 1896 in the New York World, are one example: "[Yellow Kid] brought something new and disturbing into American homes: the slums, and slum children, and casual violence, and slang, and the arrogance of poverty" (Becker, 1959, p. 13).

Contemporary editorial cartoons continue to serve as an acceptable format for circulating controversial viewpoints (Kuipers, 2011), often with the intent of influencing public opinion. In a study of political cartoons with presidential election themes, Edwards and Ware (2005) analysed the impact of editorial cartoons on public opinion and concluded that negative caricatures of voters contributed to public apathy toward the electoral process. Comparable conclusions about the power of comic art to influence public opinion were reported in a study by Josh Greenberg (2002), whose research suggested that cartoons may help people interpret life events. In contrast, other scholars have examined political cartoons as a reflection of public opinion rather than a provocateur of thought. The literature in this area, however, presents contradictory conclusions. Edward Holley and Norman Stevens (1969), for example, argue that cartoons are an accurate depiction of public opinion, while others point to evidence showing that cartoons do not necessarily mirror the general view nor serve as timely pictures of notable events (Gilmartin and Brunn, 1998; Meyer, Seidler, Curry, and Aveni, 1980). Studied as art forms (Robb, 2009), Zeitgeist ephemera (Holley and Stevens, 1969), primary sources (Thomas, 2004), and even problem solvers (Edwards and Ware, 2005; see also Marin-Arrese, 2008; Neuberger and Kremar, 2008), editorial cartoons have been the subject of investigation in a variety of academic disciplines. Nonetheless, insufficient research has been devoted to the subject of cartoons containing library themes. Indeed, the author of this literature review found only one study, conducted by Alireza Isfandyari-Moghaddam and Vahideh Kashi-Nahanji (2010), devoted to the content analysis of themes in a small selection of library cartoons, and that study failed to describe its selection process or the method of content analysis used. Yet library-themed cartoons exist in abundance and date back to the late 1800s. Library cartoons not only offer a wide range of commentary on librarians, library funding, and the digitization of information, but they also provide unique insight into the history of libraries in the U.S. For example, a prominent textbook used in introductory library science classes, Foundations of Library and Information Science by Richard E. Rubin (2010), presents a complimentary view of Andrew Carnegie's $56 million contribution toward the construction of thousands of libraries across America (p. 60). While Rubin notes that some people criticized Carnegie's gifts as a form of social control, there is no mention of the public's outrage over the tax burden they created, nor of the view held by some that the construction of these libraries was more about Carnegie's ego than about the public good. Yet numerous editorial cartoons mock Carnegie's philanthropy, disparage his ego, and comment on the tax burden ultimately produced by his gift of public libraries to cities around the nation. Notwithstanding the rich history of library cartoons, many research questions concerning cartoons with library themes have never been addressed in the literature. For example, what were some of the earliest library cartoons in this country? What were common themes? Have the themes changed over the years, especially since the Internet became a widespread research tool? Most importantly, does the study of library cartoons matter?

This review of scholarly literature on the subject of library cartoons identifies past areas of study, highlights some thematic trends, and argues that a comprehensive content analysis of library-themed cartoons would contribute to the field of library science in the same way that scholars in other disciplines have used editorial cartoons to enhance their understanding of notable events, explore public perception, and identify trends.

Paper 3:

An introduction to image synthesis with GAN.


In the past few years there has been tremendous growth in research on GANs (Generative Adversarial Networks). The GAN was put forward in 2014 and has since been applied in various areas of deep learning, including natural language processing (NLP). From this paper, we explored the different methods of image synthesis: the direct method, the hierarchical method, and the iterative method. The authors discuss two applications of image synthesis, "text-to-image conversion" and "image-to-image translation". In text-to-image conversion, current methods work well on pre-defined datasets in which each image contains one object, such as Caltech-UCSD Birds and Oxford-102, but the performance on complex datasets such as MS-COCO is much poorer. Some models were successful in producing realistic images of rooms from LSUN because the rooms did not contain any living things; they produced convincing room images because living things are more complicated to generate than static objects. This was a limitation of these models, and it was necessary to learn different concepts of the object. To improve the performance of GANs and enhance the output on such tasks, they suggest training one model that generates a single object and another model that learns to combine various objects according to text descriptions, for example using CapsNet.
3 SYSTEM ANALYSIS:

3.1 Functional Requirements:

Dataset: We used around 1000 images (including realistic images, real-life photos, and human faces), which were run through our system for testing and measuring accuracy. We used different kinds of images, such as light-coloured and dark-coloured images, to check whether the behaviour changes. The images taken into consideration and their corresponding outputs are shown in Fig 1. All of the images taken into consideration produced a cartoonified output.

Libraries in python:
 NumPy==1.19.5
 opencv-python==4.5.1.*
 SciPy==1.6.3
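For reproducibility, the same versions can be pinned in a requirements.txt file, for example (the OpenCV patch version is kept as a wildcard, exactly as listed above):

numpy==1.19.5
opencv-python==4.5.1.*
scipy==1.6.3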

3.2 Non-Functional Requirements: These are basically the quality constraints that the system must satisfy according to the project contract. The priority or extent to which these factors are implemented varies from one project to another. They are also called non-behavioural requirements.
 The processing of each request should be done within 10 seconds.
 The system should provide better accuracy.
 The image should be clear.
 The system should be user friendly.

3.3 Specific Requirements:

Hardware:
The hardware environment consists of the following:
CPU: Intel Pentium IV 600MHz or above
Mother Board: Intel 810 or above
Hard disk space: 20GB or more
Display: Color Monitor
Memory: 128 MB RAM
Other Devices: Keyboard, mouse
Client side:

Monitor screen: The software shall display information to the user via the monitor screen.
Mouse: The software shall interact with the movement of the mouse and the mouse buttons. The mouse shall activate areas for data input and command buttons, and select options from menus.
Keyboard: The software shall interact with the keystrokes of the keyboard.

Software:

Development Tools:
Front End: Django
Back End: Python
Operating System: Windows 10
The actual program that will perform the operations is written in Python.

3.4 Use case diagram and description:

(Fig 1)

In the use case diagram shown above, the user is given the functionality to upload any realistic image or human face; with that image, he or she can choose any filter to apply. The user then gets to view the output in cartoonified form, and finally he or she can download the output by right-clicking on the image.
4 ANALYSIS MODELING

4.1 Activity diagram:

(Fig 2)
4.2 Functional Diagram:

(Fig 3)
5 DESIGN

5.1 Architectural Design:


To create a cartoon effect we need to apply the following steps:
1. Detecting and emphasizing edges
To produce an accurate cartoon effect, as the first step, we need to understand the difference between a common digital image and a cartoon image. In the following example, you can see what both images look like.

At first glance we can clearly see two major differences.


1. The first difference is that the colors in the cartoon image are more homogeneous as compared to
the normal image.
2. The second difference is noticeable within the edges that are much sharper and more pronounced
in the cartoon.
Now that we have clarified the two main differences, our job is straightforward: we need to detect and emphasize the edges and apply a filter to reduce the colour palette of the input image. Once we achieve that goal, we will obtain a pretty cool result.
Let’s begin by importing the necessary libraries and loading the input image.
# Necessary imports
import cv2
import numpy as np
# Importing function cv2_imshow necessary for programming in Google Colab
from google.colab.patches import cv2_imshow
Now, we are going to load the image.
img = cv2.imread("Superman.jpeg")
cv2_imshow(img)
The next step is to detect the edges. For that task, we need to choose the most suitable method.
Remember, our goal is to detect clear edges. There are several edge detectors that we can pick from. Our first choice is one of the most common detectors, the Canny edge detector. Unfortunately, if we apply this detector we will not achieve the desired result: we can proceed with Canny, and yet too many details are captured. This can be changed if we play around with Canny's input parameters (the numbers 100 and 200).
edges = cv2.Canny(img, 100, 200)
cv2_imshow(edges)

Although Canny is an excellent edge detector that we can use in many cases, in our code we will use a thresholding method that gives us more satisfying results. It uses a threshold pixel value to convert a
grayscale image into a binary image. For instance, if a pixel value in the original image is above the
threshold, it will be assigned to 255. Otherwise, it will be assigned to 0 as we can see in the following
image.
However, a simple threshold may not be good if the image has different lighting conditions in different
areas. In this case, we opt to use cv2.adaptiveThreshold() function which calculates the threshold for
smaller regions of the image. In this way, we get different thresholds for different regions of the same
image. That is the reason why this function is very suitable for our goal. It will emphasize black edges
around objects in the image.
So, the first thing that we need to do is to convert the original colour image into a grayscale image. Also,
before the threshold, we want to suppress the noise from the image to reduce the number of detected
edges that are undesired. To accomplish this, we will apply a median filter, which replaces each pixel value with the median value of all the pixels in a small pixel neighbourhood. The function cv2.medianBlur() requires only two arguments: the image on which we will apply the filter and the size of the filter.
The next step is to apply the cv2.adaptiveThreshold() function. As the parameters for this function we need to define:
 max value – which will be set to 255.
 the adaptive method – either cv2.ADAPTIVE_THRESH_MEAN_C, where the threshold value is the mean of the neighbourhood area, or cv2.ADAPTIVE_THRESH_GAUSSIAN_C, where the threshold value is a weighted sum of the neighbourhood values with Gaussian weights.
 Block Size – it determines the size of the neighbourhood area.
 C – a constant which is subtracted from the calculated mean (or the weighted mean).
For better illustration, let’s compare the differences when we use a median filter, and when we do not
apply one.
# Without the median filter:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 5)

# With the median filter applied before thresholding:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_1 = cv2.medianBlur(gray, 5)
edges = cv2.adaptiveThreshold(gray_1, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 5)
As you can see, we obtain much better results when we apply the median filter. Naturally, edge detection is still not perfect. One idea that we will not explore in depth here, and that you can try on your own, is to apply morphological operations on these images; for instance, erosion can help eliminate small, tiny lines that are not part of a large edge. A short sketch of this idea follows.
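The following is only a sketch of that idea, assuming 'edges' is the output of the adaptive threshold step above (dark edges on a light background), so the image is inverted before erosion:

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Small structuring element; a larger kernel removes larger specks but can
# also break up genuine edges.
kernel = np.ones((2, 2), np.uint8)

inverted = cv2.bitwise_not(edges)                     # make the edges white
inverted = cv2.erode(inverted, kernel, iterations=1)  # drop tiny stray lines
cleaned = cv2.bitwise_not(inverted)                   # back to dark edges
cv2_imshow(cleaned)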

Morphol
ogical transformations with OpenCV in Python
2. Image filtering
Now we need to choose a filter that is suitable for converting an RGB image into a color painting or a
cartoon. There are several filters that we can use. For example, if we choose to
use cv2.medianBlur() filter we will obtain a solid result. We will manage to blur the colors of the image
so that they appear more homogeneous. On the other hand, this filter will also blur the edges and this is
something that we want to avoid.
The most suitable filter for our goal is a bilateral filter because it smooths flat regions of the image while
keeping the edges sharp.

Bilateral filter
Bilateral filter is one of the most commonly used edge-preserving and noise-reducing filters. In the
following image you can see an example of a bilateral filter in 3D when it is processing an edge area in
the image.

Similarly to the Gaussian filter, the bilateral filter replaces each pixel value with a weighted average of nearby pixel values. However, the difference between these two filters is that a bilateral filter also takes into account the variation of pixel intensities in order to preserve edges: the idea is that two pixels occupying nearby spatial locations should also have some similarity in their intensity levels for one to influence the other. To better understand this, let's have a look at the following equation:
$$BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)\, I_q$$
where
$$W_p = \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)$$
Here, the factor $\frac{1}{W_p}$ normalizes the weighted average taken over the neighbourhood $S$ of pixels $q$ around the centre pixel $p$.
Parameters $\sigma_s$ and $\sigma_r$ control the amount of filtering. $G_{\sigma_s}$ is a spatial Gaussian function that controls the influence of distant pixels, and $G_{\sigma_r}$ is a range Gaussian function that controls the influence of pixels with an intensity value different from the central pixel intensity $I_p$. So, this function makes sure that only those pixels with intensities similar to the central pixel are considered for smoothing. Therefore, it will preserve the edges, since pixels at edges have large intensity variation.
Now, to visualize this equation let’s have a look at the following image. On the left we have an input
image represented in 3D. We can see that it has one sharp edge. Then, we have a spatial weight and a
range weight function based on pixel intensity. Now, when we multiply range and spatial weights we
will get a combination of these weights. In that way the output image will still preserve the sharp edges
while flat areas will be smoothed.
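To make the weighting concrete, here is a small, self-contained NumPy sketch that applies the bilateral weights to the centre pixel of a toy patch containing a sharp edge (an illustration of the formula above, not the OpenCV implementation):

import numpy as np

def bilateral_centre(patch, sigma_s, sigma_r):
    # Filter only the centre pixel of a small grayscale patch.
    h, w = patch.shape
    cy, cx = h // 2, w // 2
    y, x = np.mgrid[0:h, 0:w]
    spatial = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma_s ** 2))
    rng = np.exp(-((patch - patch[cy, cx]) ** 2) / (2 * sigma_r ** 2))
    weights = spatial * rng                # G_sigma_s * G_sigma_r
    return np.sum(weights * patch) / np.sum(weights)

# Left columns dark (10), right column bright (200): a sharp edge.
patch = np.array([[10, 10, 200],
                  [10, 10, 200],
                  [10, 10, 200]], dtype=np.float64)

# The bright pixels get almost zero range weight, so the filtered centre
# stays close to 10 and the edge is preserved instead of being averaged away.
print(bilateral_centre(patch, sigma_s=1.0, sigma_r=30.0))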

Besides the input image, there are three arguments of the cv2.bilateralFilter() function that we need to set:


 d – Diameter of each pixel neighborhood that is used during filtering.
 sigmaColor – the standard deviation of the filter in the color space. A larger value of the
parameter means that farther colors within the pixel neighborhood will be mixed together,
resulting in larger areas of semi-equal color.
 sigmaSpace –the standard deviation of the filter in the coordinate space. A larger value of the
parameter means that farther pixels will influence each other as long as their colors are close
enough.
color = cv2.bilateralFilter(img, d=9, sigmaColor=200,sigmaSpace=200)
cv2_imshow(color)

3. Creating a cartoon effect


Our final step is to combine the previous two: we will use the cv2.bitwise_and() function to merge the edges and the colour image into a single one. A tiny illustration of how bitwise masking behaves is given after the result below.
cartoon = cv2.bitwise_and(color, color, mask=edges)
cv2_imshow(cartoon)

This is our final result, and you can see that indeed we do get something similar to a cartoon or a comic
book image. Would you agree that this looks like Superman from coloured comic books?
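For readers unfamiliar with bitwise masking, the following toy example (illustrative values only) shows what cv2.bitwise_and() does when a mask is supplied: pixels are kept unchanged wherever the mask is non-zero and zeroed out elsewhere.

import numpy as np
import cv2

# A 2x2 colour image and a mask that keeps only the top row.
pixels = np.array([[[10, 20, 30], [40, 50, 60]],
                   [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
mask = np.array([[255, 255],
                 [0, 0]], dtype=np.uint8)

kept = cv2.bitwise_and(pixels, pixels, mask=mask)
print(kept[0, 0])  # [10 20 30]  -> unchanged where the mask is non-zero
print(kept[1, 0])  # [0 0 0]     -> zeroed where the mask is 0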
4. Creating a cartoon effect using color quantization
Another interesting way to create a cartoon effect is by using the color quantization method. This method
will reduce the number of colours in the image and that will create a cartoon-like effect. We will perform
colour quantization by using the K-means clustering algorithm for displaying output with a limited
number of colours.
First, we need to define color_quantization() function.
def color_quantization(img, k):
    # Defining input data for clustering
    data = np.float32(img).reshape((-1, 3))
    # Defining criteria
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    # Applying cv2.kmeans function
    ret, label, center = cv2.kmeans(data, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    center = np.uint8(center)
    result = center[label.flatten()]
    result = result.reshape(img.shape)
    return result
Different values for K will determine the number of colours in the output picture. So, for our goal, we
will reduce the number of colours to 7. Let’s look at our results.
img_1 = color_quantization(img, 7)
cv2_imshow(img_1)
Not bad at all! Now, let’s see what we will get if we apply the median filter on this image. It will create
more homogeneous pastel-like colouring.
blurred = cv2.medianBlur(img_1, 3)
cv2_imshow(blurred)

And finally, let’s combine the image with detected edges and this blurred quantized image.
cartoon_1 = cv2.bitwise_and(blurred, blurred, mask=edges)
cv2_imshow(cartoon_1)
For better comparison let’s take a look at all our outputs.
So, there you go. You can see that our Superman looks pretty much like a cartoon superhero.
5.2 User Interface Design
Streamlit is a Python framework that lets you build web apps for data science projects very quickly. You can easily create a user interface with various widgets in a few lines of code. Furthermore, Streamlit is a great tool for deploying machine learning models to the web and adding rich visualizations of your data. Streamlit also has a powerful caching mechanism that optimizes the performance of your app. Finally, Streamlit Sharing is a service provided freely by the library's creators that lets you easily deploy and share your app with others. A minimal sketch of such an interface for this project is shown below.
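The actual interface of the project is shown in Fig 6 to Fig 9. As an illustration of how Streamlit ties into the OpenCV pipeline from Section 5.1, a minimal sketch could look like the following; the widget labels and the cartoonify helper are only indicative.

import cv2
import numpy as np
import streamlit as st

def cartoonify(img):
    # Edge mask + bilateral smoothing, as described in Section 5.1.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 9, 5)
    color = cv2.bilateralFilter(img, d=9, sigmaColor=200, sigmaSpace=200)
    return cv2.bitwise_and(color, color, mask=edges)

st.title("Cartoonify An Image")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    data = np.frombuffer(uploaded.read(), np.uint8)
    img = cv2.imdecode(data, cv2.IMREAD_COLOR)
    st.image(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), caption="Original")
    if st.button("Convert"):
        cartoon = cartoonify(img)
        st.image(cv2.cvtColor(cartoon, cv2.COLOR_BGR2RGB), caption="Cartoonified")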

(Fig 6)

(Fig 7)
(Fig 8)

(Fig 9)
6 IMPLEMENTATION

6.1 Algorithm used:

To create a cartoon effect, we need to pay attention to two things: the edges and the colour palette. Those are what make the difference between a photo and a cartoon. To adjust these two main components, there are four main steps that we will go through:

1. Load image

2. Create edge mask

3. Reduce the colour palette

4. Combine edge mask with the coloured image

Before jumping to the main steps, don’t forget to import the required libraries in your notebook,

especially cv2 and NumPy.
import cv2
import numpy as np

# required if you use Google Colab
from google.colab.patches import cv2_imshow
from google.colab import files

1. Load Image

The first main step is loading the image. Define the read_file function, which uses cv2_imshow to display the selected image in Google Colab.
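A minimal version of this helper, consistent with the description above, could be:

def read_file(filename):
    # Read the uploaded file with OpenCV and preview it in Colab.
    img = cv2.imread(filename)
    cv2_imshow(img)
    return img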

Call the created function to load the image.


uploaded = files.upload()
filename = next(iter(uploaded))
img = read_file(filename)

I chose the image below to be transformed into a cartoon.

Image by Kate Winegeart on Unsplash

2. Create Edge Mask

Commonly, a cartoon effect emphasizes the thickness of the edge in an image. We can detect the edge in

an image by using the cv2.adaptiveThreshold() function.

Overall, we can define the edge_mask function as follows:

In that function, we transform the image into grayscale. Then, we reduce the noise of the grayscale image by using cv2.medianBlur; a larger blur value means less black noise appears in the image.
And then we apply the adaptiveThreshold function and define the line size of the edge. A larger line size means thicker edges will be emphasized in the image.
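A minimal sketch of edge_mask, consistent with the description above (grayscale, median blur, then adaptive thresholding with the block size driven by line_size), could be:

def edge_mask(img, line_size, blur_value):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray_blur = cv2.medianBlur(gray, blur_value)
    # line_size is the block size of the adaptive threshold, so larger
    # values produce thicker, more pronounced edges.
    edges = cv2.adaptiveThreshold(gray_blur, 255,
                                  cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY,
                                  line_size, blur_value)
    return edges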

After defining the function, call it and see the result.


line_size = 7
blur_value = 7
edges = edge_mask(img, line_size, blur_value)
cv2_imshow(edges)

Edge Mask Detection

3. Reduce the Color Palette

The main difference between a photo and a drawing — in terms of colour — is the number of distinct

colours in each of them. A drawing has fewer colours than a photo. Therefore, we use colour

quantization to reduce the number of colours in the photo.

Color Quantization

To do colour quantization, we apply the K-Means clustering algorithm which is provided by the OpenCV

library. To make it easier in the next steps, we can define the color_quantization function as below.
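This is the same helper that was defined in Section 5.1; it is reproduced here so the call below runs as written.

def color_quantization(img, k):
    # Reshape the pixels into an Nx3 float array and run K-means clustering.
    data = np.float32(img).reshape((-1, 3))
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    ret, label, center = cv2.kmeans(data, k, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)
    center = np.uint8(center)
    result = center[label.flatten()]
    return result.reshape(img.shape)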

We can adjust the k value to determine the number of colours that we want to apply to the image.
total_color = 9
img = color_quantization(img, total_color)

In this case, I used 9 as the k value for the image. The result is shown below.

After Color Quantization

Bilateral Filter

After doing colour quantization, we can reduce the noise in the image by using a bilateral filter. It gives a slightly blurred, sharpness-reducing effect to the image.


blurred = cv2.bilateralFilter(img, d=7, sigmaColor=200,sigmaSpace=200)

There are three parameters that you can adjust based on your preferences:

 d — Diameter of each pixel neighbourhood

 sigmaColor — A larger value of the parameter means larger areas of semi-equal colour.

 sigmaSpace –A larger value of the parameter means that farther pixels will influence each

other as long as their colours are close enough.

Result of Bilateral Filter

4. Combine Edge Mask with the Colored Image

The final step is combining the edge mask that we created earlier, with the colour-processed image. To do
so, use the cv2.bitwise_and function.
cartoon = cv2.bitwise_and(blurred, blurred, mask=edges)

And there it is! We can see the “cartoon-version” of the original photo below.

Final Result

6.2 Working of the project:

Training the model:

# USAGE
# python train_mask_detector.py --dataset dataset

# import the necessary packages


from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to output face mask detector model")
args = vars(ap.parse_args())

# initialize the initial learning rate, number of epochs to train for,


# and batch size
INIT_LR = 1e-4
EPOCHS = 20
BS = 32
# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class images
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths


for imagePath in imagePaths:
    # extract the class label from the filename
    label = imagePath.split(os.path.sep)[-2]

    # load the input image (224x224) and preprocess it
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)

    # update the data and labels lists, respectively
    data.append(image)
    labels.append(label)

# convert the data and labels to NumPy arrays


data = np.array(data, dtype="float32")
labels = np.array(labels)

# perform one-hot encoding on the labels


lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.20, stratify=labels, random_state=42)

# construct the training image generator for data augmentation


aug = ImageDataGenerator(
rotation_range=20,
zoom_range=0.15,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.15,
horizontal_flip=True,
fill_mode="nearest")

# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False

# compile our model


print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])

# train the head of the network


print("[INFO] training head...")
H = model.fit(
aug.flow(trainX, trainY, batch_size=BS),
steps_per_epoch=len(trainX) // BS,
validation_data=(testX, testY),
validation_steps=len(testX) // BS,
epochs=EPOCHS)

# make predictions on the testing set


print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report


print(classification_report(testY.argmax(axis=1), predIdxs,
target_names=lb.classes_))

# serialize the model to disk
print("[INFO] saving mask detector model...")
model.save(args["model"], save_format="h5")

# plot the training loss and accuracy


N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Detecting mask via image

# USAGE
# python detect_mask_image.py --image images/pic1.jpeg

# import the necessary packages


from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import cv2
import os
def mask_image():
    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
        help="path to input image")
    ap.add_argument("-f", "--face", type=str,
        default="face_detector",
        help="path to face detector model directory")
    ap.add_argument("-m", "--model", type=str,
        default="mask_detector.model",
        help="path to trained face mask detector model")
    ap.add_argument("-c", "--confidence", type=float, default=0.5,
        help="minimum probability to filter weak detections")
    args = vars(ap.parse_args())

    # load our serialized face detector model from disk
    print("[INFO] loading face detector model...")
    prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
    weightsPath = os.path.sep.join([args["face"],
        "res10_300x300_ssd_iter_140000.caffemodel"])
    net = cv2.dnn.readNet(prototxtPath, weightsPath)

    # load the face mask detector model from disk
    print("[INFO] loading face mask detector model...")
    model = load_model(args["model"])

    # load the input image from disk, clone it, and grab the image spatial
    # dimensions
    image = cv2.imread(args["image"])
    orig = image.copy()
    (h, w) = image.shape[:2]

    # construct a blob from the image
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300),
        (104.0, 177.0, 123.0))

    # pass the blob through the network and obtain the face detections
    print("[INFO] computing face detections...")
    net.setInput(blob)
    detections = net.forward()

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the detection
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the confidence is
        # greater than the minimum confidence
        if confidence > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for
            # the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # ensure the bounding boxes fall within the dimensions of
            # the frame
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

            # extract the face ROI, convert it from BGR to RGB channel
            # ordering, resize it to 224x224, and preprocess it
            face = image[startY:endY, startX:endX]
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = img_to_array(face)
            face = preprocess_input(face)
            face = np.expand_dims(face, axis=0)

            # pass the face through the model to determine if the face
            # has a mask or not
            (mask, withoutMask) = model.predict(face)[0]

            # determine the class label and color we'll use to draw
            # the bounding box and text
            label = "Mask" if mask > withoutMask else "No Mask"
            color = (0, 255, 0) if label == "Mask" else (0, 0, 255)

            # include the probability in the label
            label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)

            # display the label and bounding box rectangle on the output
            # frame
            cv2.putText(image, label, (startX, startY - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
            cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)

    # show the output image
    cv2.imshow("Output", image)
    cv2.waitKey(0)

if __name__ == "__main__":
    mask_image()

Detecting mask via video


# USAGE
# python detect_mask_video.py

# import the necessary packages


from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2
import os

def detect_and_predict_mask(frame, faceNet, maskNet):
    # grab the dimensions of the frame and then construct a blob
    # from it
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
        (104.0, 177.0, 123.0))

    # pass the blob through the network and obtain the face detections
    faceNet.setInput(blob)
    detections = faceNet.forward()

    # initialize our list of faces, their corresponding locations,
    # and the list of predictions from our face mask network
    faces = []
    locs = []
    preds = []

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the detection
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the confidence is
        # greater than the minimum confidence
        if confidence > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for
            # the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # ensure the bounding boxes fall within the dimensions of
            # the frame
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

            # extract the face ROI, convert it from BGR to RGB channel
            # ordering, resize it to 224x224, and preprocess it
            face = frame[startY:endY, startX:endX]
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = img_to_array(face)
            face = preprocess_input(face)

            # add the face and bounding boxes to their respective
            # lists
            faces.append(face)
            locs.append((startX, startY, endX, endY))

    # only make predictions if at least one face was detected
    if len(faces) > 0:
        # for faster inference we'll make batch predictions on *all*
        # faces at the same time rather than one-by-one predictions
        # in the above `for` loop
        faces = np.array(faces, dtype="float32")
        preds = maskNet.predict(faces, batch_size=32)

    # return a 2-tuple of the face locations and their corresponding
    # predictions
    return (locs, preds)

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-f", "--face", type=str,
default="face_detector",
help="path to face detector model directory")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to trained face mask detector model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# load our serialized face detector model from disk


print("[INFO] loading face detector model...")
prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
weightsPath = os.path.sep.join([args["face"],
"res10_300x300_ssd_iter_140000.caffemodel"])
faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)

# load the face mask detector model from disk


print("[INFO] loading face mask detector model...")
maskNet = load_model(args["model"])

# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# loop over the frames from the video stream


while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # detect faces in the frame and determine if they are wearing a
    # face mask or not
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)

    # loop over the detected face locations and their corresponding
    # predictions
    for (box, pred) in zip(locs, preds):
        # unpack the bounding box and predictions
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred

        # determine the class label and color we'll use to draw
        # the bounding box and text
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)

        # include the probability in the label
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)

        # display the label and bounding box rectangle on the output
        # frame
        cv2.putText(frame, label, (startX, startY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

7. CONCLUSIONS

In the field of face recognition, the CNN has become the main method, in which the convolutional layers are combined into a single model. This combined convolutional neural network gives higher accuracy than the other algorithms and is also quite fast compared with them. Detection of a masked face showed a higher accuracy rate, and the system is capable of quickly detecting a person's face both with and without a mask, which makes systematic detection of a person more reliable than visual inspection.

One of the main reasons behind achieving this accuracy lies in MaxPooling. It provides
rudimentary translation invariance to the internal representation along with the reduction in the
number of parameters the model has to learn. This sample-based discretization process down-samples the input representation (the image) by reducing its dimensionality. The number of neurons has an optimized value of 64, which is not too high, since a much higher number of neurons and filters can lead to worse performance. The optimized filter values and pool_size help to filter
out the main portion (face) of the image to detect the existence of mask correctly without causing
over-fitting.

The system can efficiently detect partially occluded faces either with a mask or hair or hand. It
considers the occlusion degree of four regions – nose, mouth, chin and eye to differentiate
between annotated mask or face covered by hand. Therefore, a mask covering the face fully
including nose and chin will only be treated as “with mask” by the model.

The main challenges faced by the method are varying angles and lack of clarity. Indistinct moving faces in the video stream make it more difficult. However, following the trajectories of faces across several frames of the video helps to make a better decision – "with mask" or "without mask".
Appendix

(Fig 13)

(Fig 14)

(Fig 15)

(Fig 16)

References

[1] A. Das, M. W. Ansari, and R. Basak, "Covid-19 Face Mask Detection Using TensorFlow, Keras and OpenCV," added to IEEE Xplore 5 February 2021, DOI: 10.1109/INDICON49873.2020.9342585.

[2] Md. S. Islam, E. H. Moon, Md. A. Shaikat, and M. J. Alam, "A Novel Approach to Detect Face Mask using CNN," added to IEEE Xplore 18 January 2021, DOI: 10.1109/ICISS49785.2020.9315927.

[3] S. Feng, C. Shen, N. Xia, W. Song, M. Fan, and B. J. Cowling, "Rational use of face masks in the COVID-19 pandemic," Lancet Respirat. Med., vol. 8, no. 5, pp. 434-436, 2020.

[4] B. Suvarnamukhi and M. Seshashayee, "Big Data Concepts and Techniques in Data Processing," International Journal of Computer Sciences and Engineering, vol. 6, no. 10, pp. 712-714, 2018.

[5] C. Kanan and G. Cottrell, "Color-to-Grayscale: Does the Method Matter in Image Recognition?," PLoS ONE, vol. 7, no. 1, p. e29740, 2019.

Acknowledgements
We express our deep sense of gratitude to our project guide Ms. Varsha Nagpurkar for encouraging us
and guiding us throughout this project. We were able to successfully complete this project with the
help of her deep insights into the subject and constant help.

We are very much thankful to Dr. Kavita Sonawane, HOD of the Computer Department at St. Francis Institute of Technology, for providing us with the opportunity of undertaking this project, which has led to us learning so much in the domain of Machine Learning.

Last but not the least we would like to thank all our peers who greatly contributed to the completion of
this project with their constant support and help.

List of Figures

Fig. No.  Figure Caption                              Page No.
Fig 1     Use Case diagram                            13
Fig 2     Activity diagram                            14
Fig 3     Functional diagram                          15
Fig 4     CNN Architecture                            17
Fig 5     Overview of the model                       17
Fig 6     User Interface Design                       18
Fig 7     UI - Uploaded Image                         18
Fig 8     UI - Result                                 19
Fig 9     CNN Algorithm                               20
Fig 10    Conversion of RGB image to Gray image       21
Fig 11    Dataset 1 (with mask)                       32
Fig 12    Dataset 2 (without mask)                    32
Fig 13    Output (Image as input)                     33
Fig 14    Performance Parameters                      33
Fig 15    Accuracy/loss training curve plot           34
Fig 16    Accuracy with other algorithms              34

List of Abbreviations

Sr. No.  Abbreviation  Expanded form
1        CNN           Convolutional Neural Network
2        ML            Machine Learning
3        ReLU          Rectified Linear Unit
4        SVM           Support Vector Machine
5        DFD           Data Flow Diagram
6        UML           Unified Modeling Language
7        AI            Artificial Intelligence

