
Shivgram Education Society’s

SHREE RAYESHWAR INSTITUTE OF ENGINEERING & INFORMATION TECHNOLOGY


SHIRODA, GOA – 403 103

DEPARTMENT OF INFORMATION TECHNOLOGY

2022 – 2023

“Emotion based Movie Recommendation System using Deep Learning”


By
Mr. Shane Furtado
Miss. Saloni Naik
Miss. Shravani Nevagi
Mr. Vishal Bidikar

Under the Guidance of


Prof. Manjusha Sanke
Assistant Professor, SRIEIT

BACHELOR OF ENGINEERING: GOA UNIVERSITY

i
Shivgram Education Society’s
SHREE RAYESHWAR INSTITUTE OF ENGINEERING & INFORMATION TECHNOLOGY
SHIRODA, GOA – 403 103

2022 - 2023

CERTIFICATE
This is to certify that this dissertation entitled

“EMOTION BASED MOVIE RECOMMENDATION SYSTEM


USING DEEP LEARNING”
Submitted in partial fulfillment of the requirements for Bachelor’s Degree in Information
Technology of Goa University is the bonafide work of

Mr. Shane Furtado


Miss. Saloni Naik
Miss. Shravani Nevagi
Mr. Vishal Bidikar

_____________________ ______________________
Prof. MANJUSHA SANKE
INTERNAL GUIDE HEAD OF DEPARTMENT

__________________
PRINCIPAL

ii
Shivgram Education Society’s
SHREE RAYESHWAR INSTITUTE OF ENGINEERING & INFORMATION TECHNOLOGY
SHIRODA, GOA – 403 103

2022 - 2023

CERTIFICATE

The dissertation entitled,


“EMOTION BASED MOVIE
RECOMMENDATION SYSTEM USING DEEP LEARNING”
submitted by
Mr. Shane Furtado
Miss. Saloni Naik
Miss. Shravani Nevagi
Mr. Vishal Bidikar

in partial fulfillment of the requirements of the Bachelor’s Degree in Information


Technology of Goa University is evaluated and found satisfactory.

DATE: _____________ EXAMINER 1: _______________

PLACE: ___________ EXAMINER 2: _______________

iii
ACKNOWLEDGEMENT

This dissertation would not have been possible without the guidance and the help of several
individuals who in one way or another contributed and extended their valuable assistance in the
preparation and completion of this report.

We would like to express our gratitude to Shree Rayeshwar Institute of Engineering and Information
Technology for giving us the opportunity to take up this project for our final-year assessment and to
gain applied knowledge of the subject.

Our sincere gratitude goes to Prof. Manjusha Sanke for her complete support and guidance as our
project guide. We are thankful to Prof. Anthony R for his valuable instructions and for devoting his
time to explaining the project pipeline to us. We would also like to thank our fellow team members,
whose teamwork made this a memorable learning experience.

We would also like to thank our seniors, friends and parents for guiding and motivating us
throughout the completion of this project.

Lastly, we would like to extend thanks to all the teaching staff of Shree Rayeshwar Institute of
Engineering and Information Technology, classmates, our parents and friends for their constant
support and motivation.

iv
ABSTRACT

Emotions play a significant role in human perception and decision making, and recent studies on
emotion detection and recognition have attracted researchers from many different fields. A
particularly important category is the set of basic emotions defined by Paul Ekman, who found in his
research that emotional expressions are only partly culturally determined, and who identified six
basic emotions: anger, sadness, happiness, surprise, fear and disgust. This project focuses on four
emotional states that occur constantly in daily life: happiness, surprise, sadness and a neutral state.
The dissertation uses an emotion detection and recognition algorithm to detect the emotion of the
user and display a list of movies based on the detected emotion.

We present a web-based application that identifies the emotion of the user from an image captured
via a webcam and uploaded to the server, and then recommends movies online accordingly. Current
systems offer either a web application for playing movies or a system for detecting human emotion;
this work blends a movie player with emotion recognition, giving users visiting the web application a
unique experience. There are three major steps involved in the proposed system:
(1) uploading the image to the server
(2) detecting emotion from the image
(3) presenting movie recommendation list based on the emotion of the user.
The proposed system is expected to identify the emotion and show movies accordingly. The
feasibility of the system will depend on the success rate of emotion analysis and recognition.

v
TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

TITLE PAGE i
CERTIFICATE ii
CERTIFICATE iii
ACKNOWLEDGEMENT iv
ABSTRACT v
TABLE OF CONTENTS vi
LIST OF FIGURES vii
LIST OF TABLES viii
LIST OF ALGORITHMS ix
ABBREVIATIONS x

Chapter 1 INTRODUCTION ………………………………………………………(1-5)

1.1 Introduction & Background of Project ………….……………... (1)

1.2 Motivation for Research …………...………………………..…. (3)

1.3 Research Questions, Problem Statement & Objectives…………(4)


1.4 Report organization ………......................................................... (5)

Chapter 2 LITERATURE REVIEW ……………………………...………….……(6-11)

2.1 Introduction …………..................................................………... (6)


2.2 Review of Existing Literature …………..……..………………..(8)
2.3 Summary………………………………………………….........(11)

vi
Chapter 3 RESEARCH DESIGN & METHODOLOGY …………………………..(12-18)
3.1 Assumptions ………………….…………..……..…. (12)

3.2 Proposed Research Design/methodology ….….…………… (16)

Chapter 4 IMPLEMENTATION & RESULT ANALYSIS………………..… (00)

4.1 Introduction………………………………………………………
4.2 Pre-processing……………………………………………………
4.3 CNN Model Specifications………………………………………..
4.4 Training the model…………………………………………………
4.5 Testing the model………………………………………………..
4.6 User interface for face recognition…………………………….
4.7 Database……………………………………………………….
4.8 Implementation output………………………………………
Chapter 5 CONCLUSION & FUTURE SCOPE …………(00)
5.1 Conclusion………… …………………………………………(00)
5.2 Future Scope….……………………………………………..…(00)

REFERENCES ………………………………………….……………………….………..…..(00)

vii
LIST OF FIGURES
FIGURE NO. DESCRIPTION PAGE NO.
3.1.1 CNN Model 12
3.1.2 FER13 Dataset 15
3.2.1 System Block diagram 16
3.2.2 System Activity diagram 17
3.2.3 System Sequence diagram 18

viii
LIST OF TABLES
TABLE NO. DESCRIPTION PAGE NO.
2.2.1 Literature Survey Table 9

ix
LIST OF ALGORITHMS
ALGORITHM NO. DESCRIPTION PAGE NO.
4.1 Gray scale algorithm 25
4.2 Gray scale and resizing algorithm 26

x
LIST OF ABBREVIATIONS

Sr No. ABBREVIATION FULL FORM

1 ANN Artificial Neural Network


2 CNN Convolutional Neural Network
3 SVM Support Vector Machines
4 LSTM Long Short-term Memory
5 ELM Extreme Learning Machine
6 RNN Recurrent Neural Network
7 FC Fully Connected
8 IMDB Internet Movie Database
9 FER Facial Emotion/Expression Recognition
10 FERA FER and Analysis
11 FER13 Facial Expression Recognition 2013
12 JAFFE Japanese Female Facial Expression
13 EMP Emotion Music Player
14 GPU Graphics Processing Unit

xi
Chapter 1

INTRODUCTION

1.1 Introduction and project background


Electronic commerce covers all business activity conducted over the Internet and the World Wide
Web. It leads to simpler, faster and more efficient business transactions, because customers benefit
from an increasing range of, and easier access to, information, products and services. However, in
today's competitive business environment, providing value to the customer is essential for businesses
to survive. The most effective way to provide value is to know the customers and serve them as
individuals. Recommender systems have emerged as an answer to this need for personalization.

A Recommender System refers to a system that is capable of predicting the future preference of
a set of items for a user, and recommends the top items. The customer usually provides the
recommender system with data such as the characteristics of the product he is looking for, his
ratings, demographic data, etc. The recommender system applies one or several
recommendation techniques on these data and then recommends products to the customers.
However, for subjective and complex products such as movies, music or perfume, the task of
rating or describing the desired product characteristics is quite difficult for customers.

While language is crucial to human communication, in most interactions it is supplemented by
various forms of expressive information, such as facial expressions, vocal nuances, gesture and
posture. Facial expressions have been studied since ancient times, as they are one of the most
important channels of non-verbal communication. Over the course of a day, humans go through
many emotions, driven by their mental or physical circumstances. Initially, facial expressions were
studied by great philosophers and thinkers such as Aristotle and Stewart. With Darwin, the study of
facial expressions became an empirical study, and his work created great interest among
psychologists and cognitive scientists. The 20th century saw many studies relating facial expression
to emotion and inter-human communication. The general approach to automatic facial expression
analysis consists of three steps: face detection and tracking, feature extraction, and expression
classification/recognition.

1
Image recognition is one of the most significant applications of machine learning and artificial
intelligence. Basically, it is an approach for identifying and detecting a feature or an object in a
digital image. Face detection is a form of image recognition that automatically finds the face
region in input images.

Deep learning is a subarea of machine learning that has revolutionized image processing, making
previous techniques based on manual feature extraction obsolete. Image recognition is done with
trainable, multi-layer neural networks. Instead of handcrafting rules, we feed labeled images to the
network, and each layer learns and extracts different aspects of the images: lower layers extract basic
features (such as edges), while higher layers extract more complex concepts.

The general approach to automatic facial expression analysis consists of three steps:

 face detection and tracking

 feature extraction, which involves extracting meaningful or discriminative information

caused by facial expressions

 expression classification / recognition.

Although humans are filled with various emotions, modern psychology defines six basic facial
expressions, namely happiness, sadness, surprise, fear, disgust and anger, as universal emotions.
Movements of the facial muscles help to identify human emotions. The basic facial features are the
eyebrows, mouth, nose, eyes and cheeks, and these features are actively used in training the emotion
recognition model.

2
1.2 Motivation for Research
It is not always easy to pick the right movie to watch. Sometimes you are in the mood to watch
people fall in love, sometimes you want a film that reminds you of your connection with music, and
sometimes you just want to watch characters you relate to, or any other emotion-driven movie.
Moreover, as user preferences for such subjective products change constantly with their emotions, a
traditional user profile is not sufficient to understand and capture these changes. This choice cannot
always be resolved by trying to pick something based on genre alone. This is why we are creating a
movie recommendation system based on emotion, one that can capture customer preferences
according to their emotions. Emotion plays an important role in rational and intelligent behaviour;
thus, we incorporate user emotions into the recommendation process.

Our motivation in this work is to use emotion recognition techniques to generate additional inputs
for the movie recommender system's algorithm and to enhance the accuracy of the resulting movie
recommendations. A movie player should be intelligent and act according to the user's preferences,
helping users organize and play movies automatically without putting much effort into selection and
re-organization. The emotion-based movie player provides a better platform to all movie watchers
and ensures automatic movie selection and periodic updating of playlists, helping users organize and
play movies based on their moods.

3
1.3 Research questions, Problem statement and Objective

In a traditional player, the user has to manually browse and select a movie that would match his
mood and emotional experience. With the increasing advancements in the field of multimedia
and technology, various players have been developed with features like fast forward, reverse,
genre classification etc.

Although these features satisfy the user's basic requirements, the user still faces the task of manually
browsing through the playlist and selecting a movie that matches his current mood and behaviour.
Movie enthusiasts have a troublesome time creating and organising a playlist manually once they
own many movies; the individual is repeatedly burdened with browsing through the playlist
according to his mood and emotions.

Recognizing emotion from images has become one of the active research themes in image
processing and in applications based on human-computer interaction. The objective of the project is
to provide adapted and personalized movie suggestions to the user based on their current mood. With
the proposed system the user does not need to spend time searching for a movie; they just have to
click a picture of themselves and upload it. The system will use image processing and facial
detection technologies to detect the emotion of the user, and a list of related movie recommendations
will be created.

4
1.4 Report Organization

Chapter 1- Introduction

This is the current chapter; it gives an overview of the project, explains why the report is being
written, and states the major goal of the project.

Chapter 2- Literature Survey

It covers the study that was conducted, including all references, the benefits and drawbacks of each
approach, a full comparison of the accessible techniques, and the accuracy of each technique.

Chapter 3- Research design and Methodology

This section describes the technique chosen for implementation, i.e. how the project will be carried
out, as well as the criteria used to select the method.

Chapter 4- Implementation & Result Analysis

This section contains the implementation and result analysis of the project, including the code for the
different models, the results, and their comparison.

Chapter 5- Conclusion and Future work

This section contains the summary of the project that includes the purpose, different methodologies
of the project, various algorithms and their descriptions.

5
Chapter 2

LITERATURE REVIEW

2.1 Introduction
The very first effort to analyse facial expressions automatically was made in 1978 (Suwa et al., 1978).
A thorough investigation of the initial research studies on this topic can be found in (Pantic & Rothkrantz,
2000; Fasel & Luettin, 2003). The study initiated by (Tian et al., 2001) suggested identifying facial
expressions using the Facial Action Coding System (FACS). After that, several studies discussed detecting
Action Unit (AU) occurrence and intensity, including (Tian et al., 2005). The different approaches detect
distinct facial points and interpret the meaning of the expression accordingly. Three significant steps are
universal in automatic deep FER, i.e., pre-processing, deep feature learning, and deep feature classification.
Out of these three, feature extraction and classification are the two critical tasks in FER and Analysis (FERA).

Decades of scientific research have been conducted developing and evaluating methods for
automated FER. There is now an extensive literature proposing and evaluating hundreds of different
kinds of methods, leveraging techniques from multiple areas, such as signal processing, machine
learning, computer vision, and speech processing. Different methodologies and techniques may be
employed to interpret emotion, such as Bayesian networks, Gaussian mixture models, hidden Markov
models and deep neural networks.

The existing approaches in emotion recognition to classify certain emotion types can be generally
classified into three main categories: knowledge-based techniques, statistical methods, and hybrid
approaches.

 Knowledge based techniques

It utilizes domain knowledge and the semantic and syntactic characteristics of language
in order to detect certain emotion types. It is mainly classified into two categories:
dictionary-based and corpus-based approaches. Dictionary-based approaches find
opinion or emotion seed words in a dictionary and search for their synonyms and
antonyms to expand the initial list of opinions or emotions. Corpus-based approaches on

6
the other hand, start with a seed list of opinion or emotion words, and expand the
database by finding other words with context-specific characteristics in a large corpus.

 Statistical Method

These involve the use of different supervised machine learning algorithms, in which a
large set of annotated data is fed into the algorithms so that the system learns to predict
the appropriate emotion types. Machine learning algorithms provide more reasonable
classification accuracy compared to other approaches, but one of the challenges in
achieving good results is the need for a sufficiently large training set.

Some of the most commonly used machine learning algorithms include Support Vector
Machines (SVM), Naive Bayes, and Maximum Entropy. Deep learning is also widely
employed in emotion recognition. Well-known deep learning algorithms include different
architectures of Artificial Neural Network (ANN) such as the Convolutional Neural
Network (CNN), Long Short-Term Memory (LSTM), and the Extreme Learning Machine (ELM).

 Hybrid approaches

These are a combination of knowledge-based techniques and statistical methods, which


exploit complementary characteristics from both techniques. A downside of using hybrid
techniques however, is the computational complexity during the classification process.

7
2.2 Review of Existing work
Alramzana Nujum Navaz; Serhani Mohamed Adel; Sujith Samuel Mathew [1] proposed a solution
that applies and compares four deep learning models for image pre-processing, with the main
objective of improving emotion recognition accuracy. They used a systematic approach to increase
accuracy in three neural network configurations, namely AlexNet, GoogLeNet and CNN. The
maximum overall accuracy improvement in their experiments was observed for the CNN, from
23.5% to 59.4% (an increase of roughly 36 percentage points).

P. Kaviya; T. Arumugaprakash [2] proposed a group facial emotion analysis model that achieves a
final accuracy of 65% on the Facial Expression Recognition (FER)-2013 dataset and 60% on a
custom dataset.

Akriti Jaiswal; A. Krishnama Raju; Suman Deb [3] proposed a convolutional neural network (CNN)
based deep learning architecture for emotion detection from images. The performance of the
proposed method is evaluated using two datasets, the Facial Emotion Recognition Challenge
(FERC-2013) dataset and the Japanese Female Facial Expression (JAFFE) dataset. The accuracies
achieved with the proposed model are 70.14% and 98.6%, respectively.

Zeynab Rzayeva; Emin Alasgarov [4] proposed a CNN model trained on the Cohn-Kanade and
RAVDESS datasets to recognize five major facial emotions. It consists of eight convolutional layers
with additional pooling and dropout layers. The model gave good results on both datasets and
predicted the surprise and happy emotions better than the other emotions.

Renuka S. Deshmukh; Vandana Jagtap; Shilpa Paygude [5] focused on live images taken from a
webcam and developed an automatic facial emotion recognition system for stressed individuals,
assigning them music therapy so as to relieve stress. The emotions considered for the experiments
were the universally accepted happiness, sadness, surprise, fear, disgust and anger.

Noel Jaymon; Sushma Nagdeote; Aayush Yadav; Ryan Rodrigues [6] experimented with three
models, a simple CNN, an Inception model and an Xception model, which showed test accuracies of
54%, 61.42% and 65.2% respectively.

Rabie Helaly; Mohamed Ali Hajjaji; Faouzi M'Sahli; Abdellatif Mtibaa [7] used the Xception
convolutional neural network as a deep learning model for facial expression recognition. The dataset
used for training the model is FER-2013. When the developed system is trained and tested on the
GPU, the accuracy of the results is 94%.

8
Sabrina Begaj; Ali Osman Topal; Maaruf Ali [8] used three datasets: FER2013, AffectNet and
iCVMEFED. The last was selected because it offered completely raw images and was relatively
new. Data augmentation was needed since the dataset was not large enough to properly train a CNN.
The CNN performed best at detecting happy faces, and the most problematic case was confusing fear
with surprise.

Leo Pauly; Deepa Sankar [9] proposed a system that detects users' faces from a live camera stream
and combines this with gender identification and product recommendation algorithms for targeting
products at the right user.

Shlok Gilda; Husain Zafar; Chintan Soni; Kshitija Waghurdekar [10] developed an Emotion Music
Player (EMP), which recommends music based on the real-time mood of the user. EMP provides
smart mood-based music recommendation by incorporating the capabilities of emotion context
reasoning. The music player contains three modules: an Emotion Module, a Music Classification
Module and a Recommendation Module. The Emotion Module takes an image of the user's face as
input and uses deep learning algorithms to identify the user's mood with an accuracy of 90.23%.

Rabia Qayyum; Vishwesh Akre; Talha Hafeez [11] used a Convolutional Neural Network (CNN)
and a Recurrent Neural Network (RNN) to compare which deep learning technique works best for
emotion recognition. Both neural networks were trained on the FER2013 dataset from Kaggle with
seven emotion classes. In evaluation, the CNN attains an accuracy of 65% while the RNN lags
behind with an accuracy of 41%. The trained models are then applied in a music player driven by the
user's facial expressions.

Table 2.2.1 Survey table

[1]
Methods used: AlexNet, GoogLeNet, CNN
Merits: 1. Manual labeling of images (AlexNet); 2. Removal of the confusing emotion, thereby narrowing the classes (GoogLeNet); 3. Image enhancement using CNN
Demerits: Disgust and neutral emotions were never classified correctly
Dataset: Facial image data of Indian film stars
Results: AlexNet - 57.9%, GoogLeNet - 51.2%, CNN - 59.4%

[2]
Methods used: CNN
Merits: Haar filter to detect and extract face features
Demerits: Several neutral images are predicted as sad
Dataset: FER-2013
Results: 65%

[3]
Methods used: CNN
Merits: Computing time was reduced on the Japanese (JAFFE) model compared to FER-2013
Demerits: The FER dataset trains on 32,298 samples and validates on 3,589, while JAFFE trains on only 833 samples
Dataset: FER-2013 & Japanese Female Facial Expression (JAFFE)
Results: 70.14% & 98.6%

[4]
Methods used: CNN
Merits: Predicted surprise and happy emotions better in comparison to others
Demerits: -
Dataset: Cohn-Kanade & RAVDESS
Results: -

[5]
Methods used: SVM (Support Vector Machine)
Merits: The pre-processing step includes converting the facial expression into a binary image
Demerits: SVMs have good generalization performance, but their results are in general less sparse
Dataset: CK+, FERG, NVIE
Results: -

[6]
Methods used: Simple, Inception and Xception models
Merits: -
Demerits: Lack of robustness
Dataset: FER-2013
Results: 54%, 61.42% & 65.2%

[7]
Methods used: Xception CNN
Merits: The Xception model is trained on millions of images; the results confirmed that the system can be effectively implemented on a low-powered embedded system
Demerits: Vibrant environments
Dataset: FER-2013
Results: 94%

[8]
Methods used: CNN
Merits: iCVMEFED was selected as it offered completely raw images
Demerits: Data augmentation was needed; problems while detecting the fear and surprise emotions
Dataset: FER-2013, AffectNet & iCVMEFED
Results: -

[11]
Methods used: CNN & RNN
Merits: -
Demerits: -
Dataset: FER-2013
Results: CNN - 65%, RNN - 41%

2.3 Summary
The concepts and strategies utilized by various researchers to detect and classify facial emotions are
highlighted in this chapter. The ultimate goal is to create an Emotion Recognition System that can
identify and differentiate the different types of emotions.

11
Chapter 3

Research design and Methodology

3.1 Assumptions
Convolution Neural Network (CNN) architecture

CNN is a type of deep learning model for processing data that has a grid pattern, such as images,
which is inspired by the organization of animal visual cortex and designed to automatically and
adaptively learn spatial hierarchies of features, from low- to high-level patterns.

CNN is a mathematical construct that is typically composed of three types of layers (or building
blocks): convolution, pooling, and fully connected layers. The first two, convolution and pooling
layers, perform feature extraction, whereas the third, a fully connected layer, maps the extracted
features into final output, such as classification. A convolution layer plays a key role in CNN, which
is composed of a stack of mathematical operations, such as convolution, a specialized type of linear
operation. In digital images, pixel values are stored in a two-dimensional (2D) grid and a small grid
of parameters called kernel, an optimizable feature extractor, is applied at each image position, which
makes CNNs highly efficient for image processing, since a feature may occur anywhere in the image.

As one layer feeds its output into the next, the extracted features hierarchically and progressively
become more complex. The process of optimizing parameters such as the kernels is called training;
it is performed so as to minimize the difference between the outputs and the ground-truth labels
using optimization algorithms such as backpropagation with gradient descent.

12
Fig 3.1.1 CNN Model
The main building blocks of a convolutional neural network are described below:
1. Convolutional Layer

This layer is the first layer used to extract the various features from the input images. In this layer,
the mathematical operation of convolution is performed between the input image and a filter of a
particular size M*M. By sliding the filter over the input image, the dot product is taken between the
filter and the parts of the input image covered by the filter (M*M).
The output is termed the feature map, which gives us information about the image such as the
corners and edges. The convolution layer passes this result to the next layer once the convolution
operation has been applied to the input. Convolutional layers benefit CNNs greatly because they
keep the spatial relationship between the pixels intact.

2. Pooling Layer
The primary aim of this layer is to decrease the size of the convolved feature map in order to reduce
the computational cost. This is performed by decreasing the connections between layers, and the
layer operates independently on each feature map. Depending upon the method used, there are
several types of pooling operations.

In max pooling, the largest element is taken from the feature map. Average pooling calculates the
average of the elements in a predefined image section, while sum pooling computes the total sum of
the elements in the predefined section. The pooling layer usually serves as a bridge between the
convolutional layer and the FC layer.
This layer generalizes the features extracted by the convolution layer and helps the network
recognize the features independently. It also reduces the computations in the network.

3. Fully Connected Layer

The Fully Connected (FC) layer consists of weights and biases along with neurons, and is used to
connect the neurons between two different layers. These layers are usually placed before the output
layer and form the last few layers of a CNN architecture.
Here, the output of the previous layers is flattened and fed to the FC layer. The flattened vector then
passes through a few more FC layers where the mathematical operations usually take place; at this
stage, the classification process begins. Two fully connected layers are used because they generally
perform better than a single fully connected layer.

4. Activation Functions

The activation function of the last fully connected layer is frequently different from the others, and
each task requires the selection of an appropriate activation function. For multiclass classification,
the softmax function is used: it normalizes the real-valued outputs of the last fully connected layer
into target class probabilities, where each value lies between 0 and 1 and all values sum to 1.
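
As a small, hedged illustration of the softmax normalization described above (the score values below are made up), the calculation looks like this:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])               # raw outputs of the last FC layer
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax normalization
print(probs, probs.sum())                        # approx. [0.659 0.242 0.099], summing to 1.0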

5. Dropout

The Dropout layer is a mask that nullifies the contribution of some neurons to the following layer
while leaving all others unchanged. Dropout layers are critical in CNN training because they prevent
over-fitting on the training data; without them, the first batch of training data would have a
disproportionately large impact on learning.
The software provides real-time feedback on training progress and performance, and makes the
trained model easy to save and reuse. In our CNN architecture, the 48*48 input images are extracted
from the FERC-2013 dataset. The network begins with an input layer of 48 by 48, matching the input
data size, which is processed in parallel through two similar sub-models (using the functional style
of deep learning frameworks) and then concatenated for better accuracy and to capture the image
features more completely.
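
The following is a minimal sketch, not the project's exact architecture, of two similar convolutional branches processing the same 48x48 grayscale input in parallel and being concatenated before classification, written with the Keras functional API (the layer sizes and depths are assumptions):

from tensorflow.keras import layers, models, Input

def conv_branch(inputs):
    # One branch: two convolution/pooling stages followed by flattening (assumed depth)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return layers.Flatten()(x)

inputs = Input(shape=(48, 48, 1))                         # 48x48 grayscale FERC-2013 images
merged = layers.concatenate([conv_branch(inputs),         # first branch
                             conv_branch(inputs)])        # second, similar branch
merged = layers.Dense(128, activation='relu')(merged)
merged = layers.Dropout(0.5)(merged)
outputs = layers.Dense(7, activation='softmax')(merged)   # 7 emotion classes

model = models.Model(inputs, outputs)
model.summary()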

DATASET
A data set is a collection of related, discrete items of data that may be accessed individually or in
combination, or managed as a whole entity. A data set is organized into some type of data structure.
Listed below are the datasets used in our project.

FER2013 (Facial Expression Recognition 2013 Dataset)

The FER2013 (Facial Expression Recognition 2013) dataset contains images along with


categories describing the emotion of the person in it. The dataset contains 48×48 pixel grayscale
images with 7 different emotions such as Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
The dataset contains 28709 examples in the training set, 3589 examples in the public testing set,
and 3589 examples in the private test set.

14
Fig 3.1.2 FER13 Dataset

Movie Dataset: IMDB


The IMDB dataset contains 50K movie reviews for natural language processing or text analytics. It
is a dataset for binary sentiment classification containing substantially more data than previous
benchmark datasets: 25,000 highly polar movie reviews are provided for training and 25,000 for
testing. The task is to predict the number of positive and negative reviews using either classification
or deep learning algorithms.

15
3.2 Proposed Research Design/Methodology
This chapter presents the model on which the project implementation is based. Following the
literature review, the proposed methodology that we will be using in our project is as shown below.
Phase 1- Collect image uploaded by the user on the Movie recommendation system.
Phase 2- Pre-processing is done on the image after they have been retrieved. Pre-processing includes
grayscale conversion, scaling, cropping.
Phase 3- Face detection
Phase 4- Feature Extraction
Phase 5- Classification is done on processed image using Convolutional Neural Network (CNN).
Phase 6- Distinguish the image into the supported emotions.
Phase 7- Match the emotion with the Movie database to retrieve list of movies.
Phase 8- Display list of movies to the user.

Fig 3.2.1 System Block diagram

16
System Activity Diagram

Fig 3.2.2 System Activity diagram


An activity diagram visually presents a series of actions or the flow of control in a system. It
represents the flow from one activity to another.

The small filled circle represents the initial action or the start of a system activity. The start symbol
on the left indicates the model-building pipeline, which includes the following activities:
After acquisition of the selected dataset, the training set is used to train the model. The images in the
dataset contain human faces displaying various expressions. The distinguishing characteristics/
features of these expressions are extracted; the model learns from these features and then classifies
them into the respective emotions.
Once the model is trained, we run a few tests to determine whether it has been trained sufficiently,
checking for possible overfitting and underfitting. The diamond represents a decision with alternate
paths: if the training is not sufficient, the model is run through a few more training sets; otherwise,
the trained model is considered complete. This model is then used to classify the user's images.

The start symbol on the right indicates the image classification pipeline, which includes the following
activities:
When the system receives an image, it checks whether it is a live (user-uploaded) image or a test
image. A test image is part of the testing set of the dataset and hence does not require pre-processing.
If it is a user image, it is sent for pre-processing, where it is converted to grayscale and cropped to the
desired size. The facial features are extracted and sent to the model for classification. After the
emotion is recognized, the system searches the movie database for the corresponding list of movies
and displays it to the user.

17
System Sequence Diagram

Fig 3.2.3 System Sequence diagram

A sequence diagram is the most commonly used interaction diagram. An interaction diagram is used
to show the interactive behaviour of a system, and a sequence diagram simply depicts the interactions
between objects in sequential order, i.e., the order in which these interactions take place.

We can also use the terms event diagram or event scenario to refer to a sequence diagram.
Sequence diagrams describe how, and in what order, the objects in a system function. These diagrams
are widely used by business people and software developers to document and understand requirements
for new and existing systems.

Sequence Diagram Notations

 Actors – An actor in a UML diagram represents a type of role where it interacts with the
system and its objects. It is important to note here that an actor is always outside the scope of
the system we aim to model using the UML diagram.

 Lifelines – A lifeline is a named element which depicts an individual participant in a sequence


diagram. So basically, each instance in a sequence diagram is represented by a lifeline.
Lifeline elements are located at the top in a sequence diagram.

 Messages – Communication between objects is depicted using messages. The messages appear
in a sequential order on the lifeline. We represent messages using arrows.

18
 Synchronous messages – A synchronous message waits for a reply before the interaction can
move forward. The sender waits until the receiver has completed the processing of the
message. The caller continues only when it knows that the receiver has processed the previous
message i.e., it receives a reply message.

 Asynchronous Messages – An asynchronous message does not wait for a reply from the
receiver. The interaction moves forward irrespective of the receiver processing the previous
message or not. We use a lined arrow head to represent an asynchronous message.

Description:
1. Firstly, the application is opened by the user.
2. The device then gets access to the web cam.
3. The webcam captures the images of the user.
4. The device uses algorithms to detect the face and predict the mood.
5. It then requests database for dictionary of possible moods.
6. The mood is retrieved from the database.
7. The mood is displayed to the user.
8. The movie is requested from the database.
9. The playlist is generated and finally shown to the user.

19
Chapter 4

Implementation and Result Analysis

4.1 Introduction

Image processing dataset: The train-test split is used to estimate the performance of machine
learning algorithms for prediction-based applications. It is a fast and easy procedure that lets us
compare the results of our own machine learning model against expected outcomes. By default, the
test set receives 30% of the actual data and the training set the remaining 70%.
Upon analysis of the chosen dataset (FER-13):
Train set: found 28709 images belonging to 7 classes.
Test set: found 7187 images belonging to 7 classes.

Uploading dataset to drive.


1. Open the Google Drive app.
2. Tap Add.
3. Tap Upload.
4. Find and tap the files you want to upload.
5. View uploaded files in My Drive until you move them.

To access Google Drive in Google Colab


1. In your Colab notebook, run: from google.colab import drive, followed by
drive.mount('/content/gdrive').
2. Once access is granted, your Google Drive files are all available under:
/content/gdrive/My Drive/

After mounting the drive in Google Colab, we can see the dataset in the Drive folder. From here we
can extract the data for the pre-processing stage; the dataset folders are referenced by the path to the
folder in which the data is stored.
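
A minimal sketch of this step is given below; the folder paths are assumptions and depend on where the dataset was uploaded in Drive:

from google.colab import drive
from tensorflow.keras.preprocessing.image import ImageDataGenerator

drive.mount('/content/gdrive')

train_dir = '/content/gdrive/My Drive/FER2013/train'   # assumed folder layout
test_dir = '/content/gdrive/My Drive/FER2013/test'

datagen = ImageDataGenerator(rescale=1.0 / 255)

# Should report "Found 28709 images belonging to 7 classes." for the train set
train_gen = datagen.flow_from_directory(train_dir, target_size=(48, 48),
                                        color_mode='grayscale',
                                        class_mode='categorical', batch_size=64)

# Should report "Found 7187 images belonging to 7 classes." for the test set
test_gen = datagen.flow_from_directory(test_dir, target_size=(48, 48),
                                       color_mode='grayscale',
                                       class_mode='categorical', batch_size=64)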

20
4.2 PRE-PROCESSING
Image pre-processing
Image pre-processing comprises the steps taken to format images before they are used for model
training and inference. This includes, but is not limited to, resizing, orienting and colour correction.
Image pre-processing may also decrease model training time and increase model inference speed. If
input images are particularly large, reducing their size will dramatically improve model training time
without significantly reducing model performance.
Pre-processing is an essential step to clean image data before it is used in a computer vision model,
for both technical and performance reasons. Fully connected layers in convolutional neural networks,
a common architecture in computer vision, require that all images are arrays of the same size. If your
images are not the same size, your model may not perform as expected, and if you are building a
model in code using a library like TensorFlow, you are likely to encounter an error.

Image augmentation
Image augmentations are manipulations applied to images to create different versions of similar
content, in order to expose the model to a wider array of training examples. For example, randomly
altering the rotation, brightness or scale of an input image forces the model to consider what the
subject looks like in a variety of situations.
Image augmentation manipulations are a form of image pre-processing, but with a critical
difference: while pre-processing steps are applied to both the training and test sets, augmentation is
applied only to the training data. Thus, a transformation that serves as an augmentation in some
situations may be better treated as a pre-processing step in others.
Knowing the context of data collection and model inference is required to make informed pre-
processing and augmentation decisions.
It is impossible to capture images that account for every real-world scenario a model may encounter;
this is where augmentation helps. By augmenting the images, we increase the sample size of the
training data and add cases that might be hard to find in the real world, allowing the model to
generalize to a wider array of situations.
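
As a hedged example (the parameter values are illustrative, not the project's tuned settings), Keras' ImageDataGenerator can apply such random manipulations to the training set only:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation is applied to the training data only
train_augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,             # random rotation in degrees
    brightness_range=(0.8, 1.2),   # random brightness changes
    zoom_range=0.1,                # random scale changes
    horizontal_flip=True)

# The test data receives only the shared pre-processing step (rescaling), no augmentation
test_preprocessor = ImageDataGenerator(rescale=1.0 / 255)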
Grayscale
Colour changes are an example of image transformations that may be applied to all images (train and
test) or randomly altered in training only as augmentations.

21
Generally, grayscaling is a colour change applied to all images. While we may think "more signal is
always better, so we should show the model colour," we may see faster model performance when
images are grayscale.
In addition, colour is sometimes not relevant to a model. If you use grayscale, you do not need to
worry about gathering images of every colour of an object; your model will learn more general
features of the object that do not depend on colour.
Colour images are stored as red, green and blue values, whereas grayscale images are stored only as
a range from black to white. This means that, for CNNs, the model only needs to work with one
matrix per image instead of three.
Best tips: grayscale is fairly intuitive. If the problem at hand explicitly requires colour (like
distinguishing a white line from a yellow line on a road), it is not appropriate. If we are, say,
deciphering the face of a rolled die, grayscale may be a great option.

Resize
Changing the size of an image sounds trivial, but there are considerations to take into account.
Many model architectures call for square input images, but few devices capture perfectly square
images. Altering an image to be square calls for either stretching its dimensions to fit, or keeping its
aspect ratio constant and filling the newly created "dead space" with new pixels. Moreover, input
images may come in various sizes, and some may be smaller than the desired input size.
Best tips: preserving scale is not always required, filling in dead pixels with reflected image content
is often best, and down-sampling large images to smaller images is often safest.
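
A short, hedged sketch of one such option, scaling the longer side down to the target size and filling the remaining dead space with reflected pixels, is shown below (the 48-pixel target and the file name are assumptions):

import cv2

img = cv2.imread('input.jpg')                      # hypothetical input image
h, w = img.shape[:2]
scale = 48.0 / max(h, w)                           # fit the longer side to 48 pixels
resized = cv2.resize(img, (int(w * scale), int(h * scale)))

# Pad the shorter side with reflected image content to obtain a 48x48 square
pad_h = 48 - resized.shape[0]
pad_w = 48 - resized.shape[1]
square = cv2.copyMakeBorder(resized,
                            pad_h // 2, pad_h - pad_h // 2,
                            pad_w // 2, pad_w - pad_w // 2,
                            cv2.BORDER_REFLECT)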

4.3 CNN Model specifications:


Conv2D:
Conv2D is a 2D convolution layer; it creates a convolution kernel that is convolved with the layer's
input to produce a tensor of outputs.

Kernel - In image processing, a kernel is a convolution matrix or mask that can be used for blurring,
sharpening, embossing, edge detection and more, by performing a convolution between the kernel
and an image.

MaxPooling2D:
Max pooling is a sample-based discretization process. The objective is to down-sample an input
representation (an image, a hidden-layer output matrix, etc.), reducing its dimensionality and
allowing assumptions to be made about the features contained in the binned sub-regions.
22
This is done in part to help prevent over-fitting by providing an abstracted form of the representation.
It also reduces the computational cost by reducing the number of parameters to learn, and provides
basic translation invariance to the internal representation.
Max pooling is done by applying a max filter to (usually) non-overlapping sub-regions of the initial
representation.

Dropout:
The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training
time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/ (1 - rate) such that the
sum over all inputs is unchanged.

Flatten:
Flattening is used to convert all the resultant 2-Dimensional arrays from pooled feature maps into a
single long continuous linear vector. The flattened matrix is fed as input to the fully connected layer
to classify the image.

Dense:
A Dense layer is a simple layer of neurons in which each neuron receives input from all the neurons
of the previous layer, hence the name dense. Dense layers are used to classify the image based on the
output from the convolutional layers.
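
A minimal sketch combining the building blocks described above into one Sequential model is given below; the layer sizes and dropout rate are assumptions rather than the project's final configuration:

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),   # down-sample the feature maps
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),                    # randomly silence units to limit over-fitting
    layers.Flatten(),                        # 2-D feature maps -> one long vector
    layers.Dense(128, activation='relu'),    # fully connected classification layer
    layers.Dense(7, activation='softmax'),   # one probability per emotion class
])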

4.4 Training the model:


Faces are detected from the images in the FER-13 dataset, and the emotions of those faces are known.
Feature points of the face refer to 68 points across 7 regions of interest on the face. The feature points
of the FER-13 images are extracted and mapped to the corresponding emotion, and a classification
model is prepared using these values.
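
The report does not name a specific landmark library; as a hedged sketch, the commonly used dlib 68-point predictor could extract such feature points (the predictor file and image path are assumptions):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')  # assumed model file

img = cv2.imread('face.jpg')                     # hypothetical training image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for rect in detector(gray):
    shape = predictor(gray, rect)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # 'points' is flattened into a feature vector and paired with the known
    # emotion label of the image to prepare the classification model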

4.5 Testing the model:


To train the model, the train_test_split() function is used to divide the dataset into training and
testing sets; the training data is not used for testing. A training ratio of 0.90 means 90% of the dataset
will be used for training and the remainder for testing the model. The Learning Rate (LR) is a
configurable parameter used in training which determines how fast the model weights are updated.
A high LR can cause the model to converge too quickly, while a small LR may lead to more accurate
weights (up to convergence) but takes more computation time. The number of epochs is the number
of times the dataset is passed forward and backward through the NN. The dataset is divided into
batches to lower the processing time, and the number of training images in a batch is called the batch
size.
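
A hedged training sketch tying these parameters together is shown below; the arrays are random placeholders, "model" refers to a network such as the one sketched in Section 4.3, and the hyper-parameter values are illustrative:

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

X = np.random.rand(1000, 48, 48, 1).astype('float32')             # placeholder face images
y = to_categorical(np.random.randint(0, 7, 1000), num_classes=7)  # placeholder emotion labels

# 90% of the data for training, the remainder for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.90)

model.compile(optimizer=Adam(learning_rate=0.001),   # configurable learning rate (LR)
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=50,        # full passes over the training data
          batch_size=64)    # training images per batch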

4.6 User interface for face recognition


The user interface is organized around detecting the face in the image captured or uploaded by the
user. Features to be included in the front-end user interface:
1. Click photo button: the button should be present on the website and should open the webcam
of the user's computer, so that the user can click a photo showing his most natural emotion
(a minimal capture sketch is given after this list).

2. Emotion detection and recommendation: features are extracted from the detected face, the
emotion of the user is determined by analysing the extracted features, and the list of
recommended movies is displayed.

3. Login: the user should be able to log in to their personal account.

4. Upload image: the user should also be able to upload a previously taken image of themselves,
if preferred.
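
A minimal sketch of the capture step is given below; on the deployed website the browser's camera API would be used instead, so this OpenCV version is only an assumption for local testing:

import cv2

cap = cv2.VideoCapture(0)        # open the default webcam
ok, frame = cap.read()           # grab a single frame
cap.release()

if ok:
    cv2.imwrite('user_photo.jpg', frame)   # save the photo so it can be uploaded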

4.7 Database
Dataset: IMDB Dataset,
The dataset is available to download on Kaggle at
https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/download?
datasetVersionNumber=1
In the real-world, ratings are very sparse and data points are mostly collected from very popular
movies and highly engaged users. We wouldn’t want movies that were rated by a small number of
users because it’s not credible enough. Similarly, users who have rated only a handful of
movies should also not be taken into account.
So, taking all of that into account and after some trial-and-error experimentation, we will reduce the
noise by adding some filters to the final dataset.
 To qualify, a movie should have been voted on by a minimum of 10 users.

24
 To qualify, a user should have voted on a minimum of 50 movies.

The filtered dataset will then be stored in the database, in separate categories according to the 7
emotion classes. The database will be connected to the front end so as to display a list of
recommended movies matching the emotion of the user.
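
A hedged sketch of these two filters is given below; the file name and the column names (movieId, userId, rating) are assumptions about how the ratings are stored:

import pandas as pd

ratings = pd.read_csv('ratings.csv')                 # hypothetical ratings table

# Keep only movies voted on by at least 10 users
votes_per_movie = ratings.groupby('movieId')['rating'].count()
popular_movies = votes_per_movie[votes_per_movie >= 10].index
ratings = ratings[ratings['movieId'].isin(popular_movies)]

# Keep only users who have voted on at least 50 movies
ratings_per_user = ratings.groupby('userId')['rating'].count()
active_users = ratings_per_user[ratings_per_user >= 50].index
ratings = ratings[ratings['userId'].isin(active_users)]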

4.8 Implementation output:


Pre-processing:
Gray scale conversion:
4.1 Gray scale algorithm:
Step 1: Import cv2
Step 2: Use cv2.imread() function to retrieve the image.
Step 3: Use cv2.cvtColor() function and convert RGB image to gray scale (COLOR_BGR2GRAY)
Step 4: Use cv2.imshow() function to display the gray scale converted image.

Gray scale conversion code:


import cv2
from google.colab.patches import cv2_imshow   # display helper for Colab notebooks

# Read the original image and display it
img = cv2.imread('/content/002.jpg', cv2.IMREAD_UNCHANGED)
cv2_imshow(img)

# Convert the image from BGR colour to grayscale and display it
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2_imshow(gray_image)

# Release any OpenCV windows that may have been opened
cv2.destroyAllWindows()

25
Output:

Gray scale and resizing of image:


4.2 Gray scale and resizing algorithm:
Step 1: Import cv2.
Step 2: Use the cv2.imread() function to retrieve the image.
Step 3: Use the cv2.cvtColor() function to convert the RGB image to gray scale (COLOR_BGR2GRAY).
Step 4: Choose the centre point of the image.
Step 5: Use the cv2.getRotationMatrix2D function to build the transformation matrix M used for
rotating and scaling the image.
Step 6: Use cv2.warpAffine, an OpenCV transformation that takes a 2x3 transformation matrix.
Step 7: Use the cv2.imshow() function to display the resized image.

Gray scale-resizing conversion code:


import cv2
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow   # display helper for Colab notebooks

# Read the original image and display it
img = cv2.imread('/content/002.jpg', cv2.IMREAD_UNCHANGED)
cv2_imshow(img)

# Convert the image to grayscale
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Build a transformation matrix around the chosen centre point
# (rotation angle 0, scale factor 0.7) and apply it to resize the image
center = (200, 200)
M = cv2.getRotationMatrix2D(center, 0, 0.7)
resized_pic = cv2.warpAffine(gray_image, M,
                             (gray_image.shape[1], gray_image.shape[0]))

plt.figure()
cv2_imshow(resized_pic)

# Release any OpenCV windows that may have been opened
cv2.destroyAllWindows()

Output:

27
Chapter 5

CONCLUSION & FUTURE SCOPE

5.1 Conclusion
Facial expression is an important channel for human communication and can be applied in many
real-world applications. Many people like to watch movies, but they have to browse through movies
according to their mood and then select one.

This project proposes a movie recommendation system that will ease the job of the user: a web-based
application that captures the user's image via a webcam or accepts an image stored on the user's
system. The emotion is detected from the uploaded image, and a movie playlist matching that
emotion is displayed on the website.

5.2 Future Scope


The project can be further enhanced by converting the web-based project into an Android application
available for users to download from app stores such as Google Play. In the current project scope we
recommend the user a list of movies that match the emotion detected in the image; in future, a movie
player could be embedded in the system itself for playing the movies. For uploading images, the user
could then use the primary or secondary camera of a mobile phone, and support for more languages
could be added.
The project focuses on facial emotion detection; this could further be enhanced to detect the mood of
the user with the help of body language or gestures.

28
References
Texts:
[1] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing Third Edition- Pearson
Education. Inc, Prentice Hall, 2008.

Paper References:
[1] Alramzana Nujum Navaz; Serhani Mohamed Adel; Sujith Samuel Mathew “Facial Image Pre-
processing and Emotion Classification: A Deep Learning Approach”, 2019 IEEE/ACS 16th
International Conference on Computer Systems and Applications.

[2] P. Kaviya; T. Arumugaprakash “Group Facial Emotion Analysis System Using CNN”, 2021 5th
International Conference on Computing Methodologies and Communication.

[3] Akriti Jaiswal; A. Krishnama Raju; Suman Deb “Facial Emotion Detection using Deep Learning”
, 2020 International Conference for Emerging Technology.

[4] Zeynab Rzayeva; Emin Alasgarov “Facial Emotion Recognition using CNN”, 2022 IEEE
INMIC.

[5] Renuka S. Deshmukh; Vandana Jagtap; Shilpa Paygude “Facial Emotion Recognition System
through Machine Learning Approach” , 2017 International Conference on Intelligent Computing and
Control Systems.

[6] Noel Jaymon; Sushma Nagdeote; Aayush Yadav; Ryan Rodrigues “Real Time Emotion Detection
using Deep Learning” , 2021 International Conference on Advances in Electrical,Computing,
Communication and Sustainable Technologies.

[7] Rabie Helaly; Mohamed Ali Hajjaji; Faouzi M'Sahli; Abdellatif Mtibaa “Face Recognition Model
using Neural Network”, 2020 IEEE INMIC.

29
[8] Sabrina Begaj; Ali Osman Topal; Maaruf Ali “Emotion Recognition Based on Facial Expression
using CNN” , 2020 International Conference on Computing, Networking, Telecommunications &
Engineering Sciences Applications.

[9] Leo Pauly; Deepa Sankar “Product Recommendation System from Emotion Detection” , 2015
IEEE ICCICCT.

[10] Shlok Gilda; Husain Zafar; Chintan Soni; Kshitija Waghurdekar “Smart Music Player
Integrating Facial Emotion Recognition and Music Mood Recommendation” , 2017 International
Conference on Wireless Communications, Signal Processing and Networking.

[11] Rabia Qayyum; Vishwesh Akre; Talha Hafeez “Android based Emotion Detection using
Convolutions Neural Network” , 2021 International Conference on Computational Intelligence and
Knowledge Economy.
[12] J. Jayapradha; Soumya Sharma; Yash Dugar “Detection and Recognition of Human Emotion
Using Neural Network”, International Journal of Applied Engineering Research.

Reference hyperlinks:
https://www.sciencedirect.com/science/article/pii/S1877050920318019
https://www.researchgate.net/publication/360889041_Facial_Emotion_Recognition_using_Deep_Learning_A_Survey
https://www.hindawi.com/journals/wcmc/2022/2024352/
https://en.wikipedia.org/wiki/Emotion_recognition
https://youtu.be/yN7qfBhfGqs
https://towardsdatascience.com/emotion-detection-a-machine-learning-project-f7431f652b1f
https://www.geeksforgeeks.org/unified-modeling-language-uml-sequence-diagrams/
https://www.javatpoint.com/uml-sequence-diagram
https://en.wikipedia.org/wiki/Sequence_diagram
https://courses.cs.washington.edu/courses/cse403/15sp/lectures/L10.pdf

30