
Informatics College Pokhara

Artificial Intelligence

CU6051NP
Coursework 1

Submitted By:
Student Name: Ashutosh Sunar
London Met ID: 18028985
Group: L3C1
Date: 17-Jan-2021

Submitted To:
Mr. Sushil Paudel
Module Leader
Artificial Intelligence

Abstract

The application of artificial intelligence is increasing day by day. It has
become an integral part of our daily lives and a driver of rapid change in
technology and many other fields. It can also solve problems that are usually
solved by humans using their natural intelligence. This report mainly covers
the application of machine learning, a subset of artificial intelligence, to a
gender prediction program. The report explains the problem domain of gender
prediction and presents some research work on machine learning and its
applications. The Naïve Bayes algorithm is also introduced. Finally, the report
discusses how the proposed solution addresses the real-world problem.
Table of Contents
1. Introduction
1.1 Explanation of the topic or AI concepts used
1.2 Explanation of the chosen problem domain
2. Background
2.1 Research Work
2.1.1 Journal on Gender prediction methods based on first name with genderizeR
2.1.2 Predicting Gender from Name
2.2 Review and analysis of existing work in problem domain
2.2.1 Gender API
3. Solution
3.1 Explanation of the proposed solution
3.2 Explanation of the algorithms used
3.3 Pseudocode of the solution
3.4 Diagrammatical representations of the solution
4. Conclusion
4.1 Analysis of the work done
4.2 Explanation of solution addressing the real-world problem.
4.3 Further Work


Table of Figures
Figure 1: Journal of gender prediction by name
Figure 2: Predicting Gender from name
Figure 3: Gender API
Figure 4: Flowchart of solution
CU6051NP Artificial Intelligence

1. Introduction
Artificial intelligence, commonly known as AI, is the ability of a digital
computer or computer-controlled robot to perform tasks generally associated
with intelligent beings. The term is usually applied to the project of
developing systems endowed with the intellectual processes characteristic of
human beings, such as the ability to reason, discover meaning, generalize, or
learn from past experience (Copeland, 2020). The need for artificial
intelligence is increasing day by day. It has been one of the most important
factors behind rapid change in technology and in business. Spam filtering,
credit card fraud detection, recommendation systems, search engines, and scene
classification are some of the applications of artificial intelligence. It has
become intertwined with all that we do, so it is hard to imagine life without
it.

The coursework assigned to us focuses on research work on one of several AI
topics, such as Problem Solving and Heuristic Search, Adversarial Search and
Games, Natural Language Processing, Machine Learning, and Recommendation
Systems. It is an individual task in which we are required to study and
research a selected topic and develop a conceptual solution for the selected
problem. We are asked to write a report that explains the selected problem
domain and the research work done on it. We are also asked to describe the
solution, after reviewing and analysing existing work in the problem domain,
with the necessary diagrams and pseudocode.

Ashutosh Sunar 1

1.1 Explanation of the topic or AI concepts used


For this project, the topic that I have selected is ‘Gender Prediction by
Name’, which predicts gender on the basis of a given name. This is one of the
applications of machine learning, which is a subfield of artificial
intelligence. Machine learning is the study of computer algorithms that improve
automatically through experience. Two of its main types are supervised learning
and unsupervised learning. Supervised learning means training an algorithm on a
full set of labelled data, whereas unsupervised learning means training a
machine on information that is neither classified nor labelled and allowing the
algorithm to act on that information without guidance. The topic that I have
selected uses classification predictive modelling, which is a supervised
learning concept.

Classification predictive modelling is a supervised learning task in which a
class label is predicted for a given example of input data. Examples of
classification problems are: given an email, classify it as spam or not; given
a handwritten character, classify it as one of the known characters.
Classification accuracy is not a perfect evaluation metric, but it is a good
starting point for many classification tasks. There are four main types of
classification tasks: binary classification, multi-class classification,
multi-label classification, and imbalanced classification. The problem domain I
have chosen falls under binary classification. Binary classification covers
those tasks that have two class labels, for example email spam detection (spam
or not) and conversion prediction (buy or not). Popular algorithms for binary
classification are Logistic Regression, k-Nearest Neighbors, Decision Trees,
Support Vector Machine, and Naïve Bayes (Brownlee, 2020). Among these
algorithms, I have used the Naïve Bayes algorithm for the selected topic.
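
As a minimal illustration of binary classification in this problem domain, the sketch below assigns one of two class labels to a name using a single hand-picked feature, the last letter. The tiny labelled list and the helper names `last_letter`, `train_majority_rule`, and `classify` are made up for illustration; they are not the coursework's dataset or its Naïve Bayes method, and the fallback label for unseen letters is an arbitrary assumption.

```python
from collections import Counter, defaultdict

# Hypothetical labelled examples (name, class label); not a real dataset.
LABELLED_NAMES = [
    ("anna", "female"), ("maria", "female"), ("sita", "female"),
    ("john", "male"), ("ram", "male"), ("kevin", "male"),
]

def last_letter(name):
    """Feature extraction: reduce a name to its final letter."""
    return name.strip().lower()[-1]

def train_majority_rule(examples):
    """Learn, for each last letter, the most common label seen in training."""
    counts = defaultdict(Counter)
    for name, label in examples:
        counts[last_letter(name)][label] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def classify(rule, name, default="male"):
    """Binary decision: return one of the two labels for a new name.

    The default for unseen letters is an arbitrary choice for this sketch.
    """
    return rule.get(last_letter(name), default)

rule = train_majority_rule(LABELLED_NAMES)
print(classify(rule, "Gita"))   # last letter 'a' was only seen with 'female'
print(classify(rule, "Robin"))  # last letter 'n' was only seen with 'male'
```

A rule this simple is only meant to show the shape of the task: labelled examples go in, and every new input comes out with exactly one of two labels.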


1.2 Explanation of the chosen problem domain


There are numerous large companies in the world, and a larger company means a
larger amount of data. Companies often run surveys to get feedback from
customers or consumers, or for other purposes. When data is collected through
such forms, names are usually collected, but gender is often missing even when
a gender option is given, because many people do not want to disclose their
gender. In this situation, the company cannot analyse its respondents by
gender, which can be a problem for the company. The collected data is wasted if
it is incomplete or a necessary part is not available.

Gender prediction by name therefore makes it easier for any organization to
predict whether the collected names are male or female. With the help of this
system, the company or organization can simply input a name and the system will
predict the gender it falls under, either male or female.


2. Background
This part of the report covers the research work, reviews, and analysis done
for the project. The research work includes a journal article related to the
topic I chose and other material needed for development.

2.1 Research Work


2.1.1 Journal on Gender prediction methods based on first name with
genderizeR.

Figure 1: Journal of gender prediction by name

From this journal, I found that in recent years there has been increased
interest in methods for gender prediction based on first names which employ
various open data sources. These methods have applications ranging from
bibliometric studies to customizing offers for web users. I also found that
several approaches have been proposed for gender prediction based on first
names, and some of them have been used in bibliometric studies published in
prestigious scientific journals.

In short, this journal explains the importance and necessity of gender
prediction based on first names (Wais, 2016).


2.1.2 Predicting Gender from Name

Figure 2: Predicting Gender from name

From this website, I have learned that an individual’s gender is a key
predictor when developing an effective predictive model, whether in marketing,
healthcare, sports, or any other domain. Predictive modellers almost always
have to deal with missing information in this key variable. Gender is a
categorical variable with a few options, i.e., male, female, others, or
undisclosed, and with such values it is sometimes very hard to infer the
missing value. In this situation, gender prediction by name is needed.

Therefore, from this website, I learned that gender is one of the key factors
and also learned the necessity of gender prediction by name (Acharath, 2019).


2.2 Review and analysis of existing work in problem domain


2.2.1 Gender API

Figure 3: Gender API

Gender API is one of the common tools for gender prediction by name. It is
usually used by businesspeople and researchers to predict gender from people’s
first names. The main limitation of this tool is that it mostly predicts the
gender of English names. The tool provides an API and libraries that use
datasets of profiles from social networks. It can be used from PHP, jQuery,
Python, and Java. The tool provides both online and offline services (Gender
API, 2021).

From the above research, I was able to understand predictive models and their
real-world use. The research also helped me to understand the importance of
gender prediction. In addition, I found that different algorithms have been
used for gender prediction.


3. Solution
3.1 Explanation of the proposed solution
The proposed system that I chose is gender prediction by name, which is a
predictive modelling system. Before developing this system, I did some research
on the topic. I found that it is one of the applications of machine learning,
which is a subset of artificial intelligence, so I started to research machine
learning. For that, I used the lecture slides provided by our module leader as
a reference, and I also read some journals and books about machine learning.
While going through these resources, I found that machine learning refers to
computer algorithms that learn from past experience, and that two of its types
are supervised and unsupervised learning. With more research, I came to know
that my proposed system falls under supervised learning. After that, I learned
about supervised learning and how it works.

With adequate research on supervised learning, I came to know about
classification predictive modelling, a supervised learning task in which a
class label is predicted for a given example of input data. I also found that
the system I chose is a classification predictive modelling problem. While
continuing my research on classification predictive modelling, I came to know
about the Naïve Bayes algorithm, which is one of the popular algorithms used
for classification problems. Finally, I decided to use the Naïve Bayes
algorithm because it was easy to use and understand.


3.2 Explanation of the algorithms used


The algorithm that I have used is the Naïve Bayes algorithm. It is a
classification technique based on Bayes’ theorem with an assumption of
independence among predictors. In other words, a Naïve Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the
presence of any other feature. It is easy to build and particularly useful for
very large data sets. Despite its simplicity, it is known to sometimes
outperform more sophisticated classification methods. The formula for Naïve
Bayes, based on Bayes’ theorem, is as follows:

$$P(c \mid y) = \frac{P(y \mid c)\, P(c)}{P(y)}$$

Where,

P(c|y) is the posterior probability of class ‘c’ given predictor ‘y’.

P(c) is the prior probability of the class.

P(y|c) is the likelihood, which is the probability of the predictor given the
class.

P(y) is the prior probability of the predictor.

Similarly, I have used this theorem in my project as follows:

$$P(\text{gender} \mid \text{name}) = \frac{P(\text{name} \mid \text{gender})\, P(\text{gender})}{P(\text{name})}$$

The Naïve Bayes algorithm makes it easy and fast to predict the class of a test
data set. It also performs well with categorical input variables compared to
numerical variables. Therefore, the Naïve Bayes algorithm was very suitable for
my project (Ray, 2017).
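
To make the arithmetic of Bayes’ theorem concrete, the sketch below estimates P(gender | name) from raw counts. The observations in `DATA` and the function name `posterior` are invented for illustration and do not come from the coursework dataset. Because the whole name is treated as a single predictor here, the independence assumption of Naïve Bayes does not come into play; the example only shows how the prior, likelihood, and evidence combine.

```python
# Hypothetical (name, gender) observations; the counts are made up.
DATA = [
    ("alex", "male"), ("alex", "male"), ("alex", "female"),
    ("sam", "male"), ("maya", "female"), ("maya", "female"),
    ("ram", "male"), ("sita", "female"),
]

def posterior(name, gender, data):
    """Compute P(gender | name) = P(name | gender) * P(gender) / P(name).

    Assumes the name and the gender each occur at least once in the data;
    otherwise a division by zero would occur.
    """
    total = len(data)
    gender_count = sum(1 for _, g in data if g == gender)
    name_count = sum(1 for n, _ in data if n == name)
    joint_count = sum(1 for n, g in data if n == name and g == gender)

    prior = gender_count / total             # P(gender)
    likelihood = joint_count / gender_count  # P(name | gender)
    evidence = name_count / total            # P(name)
    return likelihood * prior / evidence

# 'alex' appears 3 times: twice as male and once as female.
print(round(posterior("alex", "male", DATA), 3))    # 0.667
print(round(posterior("alex", "female", DATA), 3))  # 0.333
```

With these made-up counts, P(male) = 4/8, P(alex | male) = 2/4, and P(alex) = 3/8, so the posterior is (0.5 × 0.5) / 0.375 ≈ 0.667, which matches the direct ratio of 2 male observations out of 3 for that name.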


3.3 Pseudocode of the solution


START the program

IMPORT the data from the dataset

TRAIN the model on the data using the Naïve Bayes algorithm

WHILE true

    INPUT value

    IF input value is "e" or "exit"

        END the program

    ELSE

        COMPARE the input value with the trained data

        IF the value matches

            DISPLAY the output

        ELSE

            GOTO the start of the program

        ENDIF

    ENDIF

ENDWHILE

END the program
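
The pseudocode above could be realised in Python roughly as follows. This is a sketch under stated assumptions rather than the project's actual implementation: the in-memory `TRAINING_DATA` stands in for the imported dataset, the feature set (last letter and last two letters) is an assumed choice, add-one smoothing is used so unseen features do not zero out a class, and the function names `features`, `train`, `predict`, and `run` are hypothetical. Since a Naïve Bayes classifier always returns its most probable class, the "value matches" branch is approximated here by simply skipping empty input.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data standing in for the imported dataset.
TRAINING_DATA = [
    ("anna", "female"), ("maria", "female"), ("sita", "female"), ("gita", "female"),
    ("john", "male"), ("ram", "male"), ("kevin", "male"), ("hari", "male"),
]

def features(name):
    """Assumed feature set: the last letter and the last two letters."""
    name = name.strip().lower()
    return [("last1", name[-1:]), ("last2", name[-2:])]

def train(data):
    """Count how often each class and each (feature, class) pair occurs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for name, label in data:
        class_counts[label] += 1
        for feat in features(name):
            feature_counts[label][feat] += 1
            vocabulary.add(feat)
    return class_counts, feature_counts, vocabulary

def predict(model, name):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(feature_counts[label].values()) + len(vocabulary)
        for feat in features(name):
            # Features are treated as independent given the class.
            score += math.log((feature_counts[label][feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def run(read=input, show=print):
    """Console loop: predict until the user types 'e' or 'exit'."""
    model = train(TRAINING_DATA)
    while True:
        value = read("Enter a name (e or exit to quit): ").strip()
        if value.lower() in ("e", "exit"):
            break
        if not value:
            continue  # nothing to classify; ask again
        show(f"{value}: {predict(model, value)}")

# Scripted demonstration; call run() with no arguments for interactive use.
scripted = iter(["Rita", "Robin", "exit"])
run(read=lambda prompt: next(scripted))
```

Passing `read` and `show` as parameters keeps the loop testable without a real console; in interactive use the defaults `input` and `print` apply.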


3.4 Diagrammatical representations of the solution

Figure 4: Flowchart of solution


4. Conclusion
4.1 Analysis of the work done
The topic I chose is based on the implementation of the Naïve Bayes algorithm
to classify gender by name. The report includes an explanation of the topic and
of the chosen problem domain. To complete this project, different kinds of
research work were carried out, and existing work in the problem domain was
reviewed and analysed, as discussed in this report. Along with the research
work, the report explains the proposed solution and the algorithm used, and it
presents the pseudocode and flowchart of the solution. The development of the
project was done with the help of the pseudocode and flowchart. The project is
a console-based program developed in the Python programming language using the
Visual Studio Code IDE.

The coursework helped us to learn about machine learning and its applications,
and also about supervised learning and the Naïve Bayes algorithm.


4.2 Explanation of solution addressing the real-world problem.


The task of the project is to create a console-based program for gender
prediction by name using the Naïve Bayes algorithm. This program can be useful
for larger organizations and offices that frequently collect survey forms.

Collecting survey forms produces a huge amount of data. A survey form usually
contains the name of the respondent, but it is not certain that the respondent
will provide their gender, because many people do not want to disclose it. If
the gender is not given, the data that is needed is incomplete. The collected
data is then wasted, which is a loss for the company. Gender prediction by name
makes it easier for any organization to predict whether the collected names are
male or female, so that it can make use of this data for other purposes.
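
The idea of completing survey data can be sketched as follows. Everything here is hypothetical: the survey rows, the `KNOWN` lookup that stands in for a trained classifier, and the function names `predict_gender` and `fill_missing_gender`. The sketch marks each filled value with a `predicted` flag so that predicted and self-reported genders can be told apart later, which is a design choice of this example rather than something the report specifies.

```python
# Hypothetical survey rows; None marks a respondent who left gender blank.
survey = [
    {"name": "Sita", "gender": "female"},
    {"name": "Ram", "gender": None},
    {"name": "Maya", "gender": None},
]

# Stand-in for the trained classifier: a fixed lookup, for illustration only.
KNOWN = {"ram": "male", "maya": "female"}

def predict_gender(name):
    """Placeholder predictor; a real system would call the trained model."""
    return KNOWN.get(name.lower(), "unknown")

def fill_missing_gender(rows):
    """Return copies of the rows with blank gender filled in and flagged."""
    filled = []
    for row in rows:
        new_row = dict(row, predicted=False)
        if new_row["gender"] is None:
            new_row["gender"] = predict_gender(new_row["name"])
            new_row["predicted"] = True
        filled.append(new_row)
    return filled

for row in fill_missing_gender(survey):
    print(row)
```

The original rows are left unchanged, so the organization keeps a record of what respondents actually reported.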


4.3 Further Work


Coursework 1 consists of the research and documentation parts of the chosen
topic. The report covers the research work on the chosen topic, an explanation
of the problem domain, and the pseudocode and flowchart of the solution. The
algorithm used is also explained in this report. Besides this, the further work
that remains to be done is as follows:

1. Development of the program
2. Testing of the program
3. Final documentation of the project

