CHAPTER 1
INTRODUCTION
1.1 Introduction to AI-ML
Artificial Intelligence and Machine Learning (AI-ML) is the field of building computer systems that can learn from data and perform tasks that normally require human intelligence. This report provides an overview of AI-ML, its history, key features, applications, and its relevance in modern intelligent systems.
• Before arriving at the meaning of artificial intelligence, let us first understand the meaning of intelligence.
• Intelligence: the ability to learn and solve problems. This definition is taken from Webster's Dictionary.
• The most common answer one expects is "to make computers intelligent so that they can act intelligently!", but the question is: how intelligent? How can one judge intelligence?
If computers can somehow solve real-world problems by improving on their own from past experience, they can be called "intelligent". Thus, AI systems are more generic (rather than specific), can "think", and are more flexible. Intelligence, as we know, is the ability to acquire and apply knowledge. Knowledge is the information acquired through experience. Experience is the knowledge gained through exposure (training). Summing these terms up, we get artificial intelligence as a "copy of something natural (i.e., a human being) that is capable of acquiring and applying the information it has gained through exposure."
• Intelligence is composed of:
A. Reasoning
B. Learning
C. Problem-Solving
D. Perception
E. Linguistic Intelligence
1.3 The AI Project Cycle
The AI Project Cycle consists of five stages:
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Modelling
5. Evaluation
1.3.1. What is Problem Scoping?
Identifying a problem and having a vision to solve it is called Problem Scoping. Scoping a problem is not easy, as we need a deeper understanding so that the picture becomes clearer while we are working to solve it. So, we use the 4Ws Problem Canvas to understand the problem better.
➢ What is the 4Ws Problem Canvas? The 4Ws Problem Canvas helps in identifying the key elements related to the problem.
1. Who
2. What
3. Where
4. Why
1. Who?: This block helps in analysing the people who are affected directly or indirectly by the problem. Under this, we find out who the 'Stakeholders' are (the people who face this problem and would benefit from the solution). Below are the questions that we need to discuss under this block.
3. Where?: This block helps us look into the situation in which the problem arises, its context, and the locations where it is prominent. Here is the Where Canvas:
1.3.2. What is Data Acquisition?
An AI model must be trained first using data. For example, if you want to make an artificially intelligent system which can predict the salary of any employee based on his previous salaries, you would feed the data of his previous salaries into the machine. The previous salary data here is known as the Training Data, while the next salary prediction data set is known as the Testing Data. Data features refer to the type of data you want to collect. In the above example, the data features would be salary amount, increment percentage, increment period, bonus, etc. There can be various ways to collect the data. Some of them are:
1. Surveys
2. Web Scraping
3. Sensors
4. Cameras
5. Observations
6. API (Application Program Interface)
One of the most reliable and authentic sources of information is the open-sourced websites hosted by the government. Some of the open-sourced Govt. portals are data.gov.in and india.gov.in.
1.3.3. What is Data Exploration?
To analyse the acquired data, we visualise it, which helps us to:
1. Quickly get a sense of the trends, relationships and patterns contained within the data.
2. Define strategy for which model to use at a later stage.
3. Communicate the same to others effectively.
To visualise data, we can use various types of visual representations like Bar graph, Histogram, Line
Chart, Pie Chart.
1.3.4. What is Modelling?
a) Supervised Learning: In a supervised learning model, the dataset which is fed to the machine is
labelled. A label is some information which can be used as a tag for data. For example, students get
grades according to the marks they secure in examinations. These grades are labels which categorise
the students according to their marks. There are two types of Supervised Learning models:
1. Classification: Where the data is classified according to the labels. This model works on a discrete dataset, which means the data need not be continuous.
2. Regression: Such models work on continuous data. For example, if we wish to predict our next
salary, then we would put in the data of our previous salary, any increments, etc., and would train
the model. Here, the data which has been fed to the machine is continuous.
b) Unsupervised Learning: An unsupervised learning model works on an unlabelled dataset. This means that the data fed to the machine is random. This model is used to identify relationships, patterns and trends in the data fed into it. It helps the user understand what the data is about and what major features the machine has identified in it.
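The distinction above can be sketched with scikit-learn (covered later in this report). The numbers here are hypothetical, chosen only to contrast labelled (supervised) and unlabelled (unsupervised) data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised regression: labelled, continuous data (hypothetical salaries)
years = np.array([[1], [2], [3], [4]])   # feature: years of experience
salary = np.array([30, 35, 40, 45])      # label: salary in thousands
reg = LinearRegression().fit(years, salary)
print(reg.predict([[5]]))                # predict the next salary

# Unsupervised clustering: points with no labels at all
points = np.array([[1, 1], [1, 2], [8, 8], [8, 9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)                        # groups discovered from the data alone
```

In the supervised case the model learns from labels we supplied; in the unsupervised case the grouping emerges from the data itself.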
1.3.5. What is Evaluation?
Once the modelling is done, the model must be tested on the Testing Data, and its performance is measured using parameters such as:
1. Accuracy
2. Precision
3. Recall
4. F1 Score
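These four parameters can all be computed from the counts of true/false positives and negatives. A minimal sketch with hypothetical predictions:

```python
# Hypothetical binary predictions vs. ground truth (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)            # fraction of all correct predictions
precision = tp / (tp + fp)                    # of predicted positives, how many were right
recall = tp / (tp + fn)                       # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```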
• Neural Network:
Neural networks are loosely modelled after how neurons in the human brain behave. The key advantage of neural networks is that they can extract data features automatically, without needing input from the programmer. They are a fast and efficient way to solve problems where the dataset is very large, such as with images. As seen in the figure above, larger neural networks tend to perform better with larger amounts of data, whereas traditional machine learning algorithms stop improving after a certain saturation point.
➢ Gaming agents: These are agents that are designed to play games, either against human opponents
or other agents. Examples of gaming agents include chess-playing agents and poker-playing
agents.
➢ Fraud detection agents: These are agents that are designed to detect fraudulent behavior in
financial transactions. They can analyse patterns of behaviour to identify suspicious activity and
alert authorities. Examples of fraud detection agents include those used by banks and credit card
companies.
➢ Traffic management agents: These are agents that are designed to manage traffic flow in cities.
They can monitor traffic patterns, adjust traffic lights, and reroute vehicles to minimize congestion.
Examples of traffic management agents include those used in smart cities around the world.
➢ A software agent has keystrokes, file contents, and received network packets acting as sensors, and displays on the screen, files, and sent network packets acting as actuators.
➢ A human agent has eyes, ears, and other organs which act as sensors, and hands, legs, mouth, and other body parts which act as actuators.
➢ A Robotic agent has Cameras and infrared range finders which act as sensors and various motors
act as actuators.
1.4 Applications of AI-ML
Artificial Intelligence has many practical applications across various industries and domains, including:
➢ Healthcare: AI is used for medical diagnosis, drug discovery, and predictive analysis of diseases.
➢ Finance: AI helps in credit scoring, fraud detection, and financial forecasting.
➢ Retail: AI is used for product recommendations, price optimization, and supply chain
management.
➢ Manufacturing: AI helps in quality control, predictive maintenance, and production optimization.
➢ Transportation: AI is used for autonomous vehicles, traffic prediction, and route optimization.
➢ Customer service: AI-powered chatbots are used for customer support, answering frequently
asked questions, and handling simple requests.
➢ Security: AI is used for facial recognition, intrusion detection, and cybersecurity threat analysis.
➢ Marketing: AI is used for targeted advertising, customer segmentation, and sentiment analysis.
➢ Education: AI is used for personalized learning, adaptive testing, and intelligent tutoring systems.
CHAPTER 2
PYTHON PROGRAMMING
2.1 Introduction
Python is a versatile and widely-used programming language known for its simplicity,
readability, and extensive library support. This report provides an in-depth overview of Python,
including its history, key features, applications, and its significance in modern software development.
• The ABC programming language is said to be the predecessor of Python; it was capable of exception handling and interfacing with the Amoeba operating system.
• The following programming languages influenced Python:
o ABC language
o Modula-3
2.3 Key features of Python Programming
Python offers a range of features that contribute to its popularity:
2.3.1. Readable Syntax
Python's syntax is clean and easy to read, which reduces the cost of program maintenance and enhances collaboration among developers.
2.3.2. Extensive Standard Library
Python includes a comprehensive standard library that provides modules and packages for various tasks, from file handling to web development, which simplifies and accelerates software development.
2.3.3. Cross-Platform Compatibility
Python is available on multiple platforms, including Windows, macOS, and Linux, allowing developers to create cross-platform applications with ease.
2.3.4. Large Community and Ecosystem
The Python community is large and active, fostering the creation of numerous third-party libraries and frameworks. Popular examples include NumPy for scientific computing, Django for web development, and TensorFlow for machine learning.
2.3.5. Versatility
Python is a versatile language suitable for a wide range of applications, including web development,
data analysis, artificial intelligence, scientific computing, automation, and more.
2.3.6. Data Science and Machine Learning
Python is the go-to language for data analysis and machine learning, thanks to libraries like Pandas, Matplotlib, Scikit-Learn, and TensorFlow.
2.3.7. Automation and Scripting
Python's ease of use makes it ideal for automating repetitive tasks and writing scripts for various purposes.
1. NumPy:
What is NumPy?
▪ NumPy is a Python library used for working with arrays.
▪ It also has functions for working in domain of linear algebra, Fourier transform, and matrices.
▪ NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it
freely.
▪ NumPy stands for Numerical Python.
➢ Data Types in Python
NumPy has some extra data types, and refers to data types with a one-character code, like i for integers, u for unsigned integers, etc.
Below is a list of all data types in NumPy and the characters used to represent them.
• i - integer
• b - Boolean
• u - unsigned integer
• f - float
• c - complex float
• m - time delta
• M – datetime
• O - object
• S - string
• U - Unicode string
• V - fixed chunk of memory for other type (void)
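A short sketch showing how these one-character codes appear in practice; an array's dtype attribute exposes its code via the kind field, and a dtype can also be requested at creation time:

```python
import numpy as np

# The kind field of a dtype is the one-character code from the list above
print(np.array([1, 2, 3]).dtype.kind)       # integer -> 'i'
print(np.array([1.0, 2.5]).dtype.kind)      # float -> 'f'
print(np.array([True, False]).dtype.kind)   # Boolean -> 'b'
print(np.array(['a', 'bc']).dtype.kind)     # Unicode string -> 'U'
print(np.array([1 + 2j]).dtype.kind)        # complex float -> 'c'

# A dtype can also be forced when the array is created
arr = np.array([1, 2, 3], dtype='f')
print(arr.dtype)                            # float32
```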
Methods used in NumPy
➢ Copy: The copy owns the data and any changes made to the copy will not affect original array,
and any changes made to the original array will not affect the copy.
➢ Shape: NumPy arrays have an attribute called shape that returns a tuple with each index having
the number of corresponding elements.
➢ Reshape: Reshaping means changing the shape of an array.
• The shape of an array is the number of elements in each dimension.
• By reshaping we can add or remove dimensions or change number of elements in each
dimension.
➢ Iterating Arrays:
• Iterating means going through elements one by one.
• As we deal with multi-dimensional arrays in NumPy, we can do this using basic for loop of
python.
• If we iterate on a 1-D array it will go through each element one by one.
➢ Join:
• Joining means putting the contents of two or more arrays into a single array.
• In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.
• We pass a sequence of the arrays we want to join to the concatenate() function, along with the axis. If the axis is not explicitly passed, it is taken as 0.
➢ Split:
• Splitting is the reverse operation of joining.
• Joining merges multiple arrays into one, and splitting breaks one array into multiple.
• We use array_split() for splitting arrays; we pass it the array we want to split and the number of splits.
➢ Search:
• You can search an array for a certain value and return the indexes that get a match.
• To search an array, use the where() function.
➢ Sort:
• Sorting means putting elements in an ordered sequence.
• An ordered sequence is any sequence that has an order corresponding to the elements, like numeric or alphabetical, ascending or descending.
• The NumPy ndarray object has a function called sort(), that will sort a specified array.
➢ Filter:
• Getting some elements out of an existing array and creating a new array out of them is called filtering.
Example program to demonstrate basic NumPy operations:
import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4, 5])

# Print the array
print("Original Array:")
print(arr)

# Perform some basic operations
mean_value = np.mean(arr)
sum_value = np.sum(arr)
max_value = np.max(arr)
min_value = np.min(arr)

# Print the results
print("\nMean:", mean_value)
print("Sum:", sum_value)
print("Max:", max_value)
print("Min:", min_value)

# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Print the 2D array
print("\n2D Array:")
print(matrix)

# Transpose the 2D array
transposed_matrix = np.transpose(matrix)

# Print the transposed 2D array
print("\nTransposed 2D Array:")
print(transposed_matrix)
Output:
Original Array:
[1 2 3 4 5]

Mean: 3.0
Sum: 15
Max: 5
Min: 1

2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed 2D Array:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
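The methods described above (copy, shape, reshape, join, split, search, sort and filter) can be sketched together as follows:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Copy owns its own data: changes to the copy do not affect the original
c = arr.copy()
c[0] = 99
print(arr[0])                      # still 1

# Shape and reshape
m = arr.reshape(2, 3)
print(m.shape)                     # (2, 3)

# Join two arrays, then split the result back into two halves
joined = np.concatenate((np.array([1, 2]), np.array([3, 4])))
halves = np.array_split(joined, 2)
print(joined, halves)

# Search for a value, sort an array, and filter with a boolean mask
print(np.where(arr == 4))          # indexes where the value matches
print(np.sort(np.array([3, 1, 2])))
print(arr[arr > 3])                # filtering: new array of matching elements
```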
2. Matplotlib
What is Matplotlib?
• Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and we can use it freely.
• Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
❖ Different methods in Matplotlib:
➢ Pyplot:
o Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:
o import matplotlib.pyplot as plt
o Now the pyplot package can be referred to as plt
➢ Plotting:
o The plot() function is used to draw points (markers) in a diagram.
o By default, the plot() function draws a line from point to point.
o The function takes parameters for specifying points in the diagram.
o Parameter 1 is an array containing the points on the x-axis.
o Parameter 2 is an array containing the points on the y-axis.
➢ Markers:
o You can use the keyword argument marker to emphasize each point with a specified marker.
➢ Line:
o You can use the keyword argument linestyle, or the shorter ls, to change the style of the plotted line.
➢ Labels:
o With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.
➢ Grids:
o With Pyplot, you can use the grid() function to add grid lines to the plot.
➢ Subplot:
o With the subplot() function you can draw multiple plots in one figure.
Example program to demonstrate a simple plot using the Matplotlib library:
# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()
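The marker, linestyle, label, grid and subplot features described above can be combined in one sketch. The Agg backend and savefig() are used here only so the script runs without a display; plt.show() would be used interactively:

```python
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend (headless)
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 1]

plt.subplot(1, 2, 1)                  # first of two plots in one figure
plt.plot(x, y, marker='o', ls='--')   # marker per point, dashed line style
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.grid()                            # add grid lines

plt.subplot(1, 2, 2)                  # second plot in the same figure
plt.plot(y, x)
plt.title('Second plot')

plt.savefig('styled_plot.png')        # save the figure to a file
```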
3. Pandas
➢ What is Pandas?
o Pandas is a Python library used for working with data sets.
o It has functions for analysing, cleaning, exploring, and manipulating data.
o The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
❖ Different methods in Pandas
➢ Series:
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
➢ Key/Value Objects as Series
• You can also use a key/value object, like a dictionary, when creating a Series.
➢ What is a DataFrame?
• A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.
➢ Read CSV:
❖ Read CSV Files
• A simple way to store big data sets is to use CSV files (comma-separated values files).
• CSV files contain plain text in a well-known format that can be read by everyone, including Pandas.
• In our examples we will be using a CSV file called 'data.csv'.
➢ Read JSON
• Big data sets are often stored, or extracted as JSON.
• JSON is plain text, but has the format of an object, and is well known in the world of
programming, including Pandas.
• In our examples we will be using a JSON file called 'data.json'.
➢ Analysing Data:
▪ Viewing the Data
• One of the most used methods for getting a quick overview of a DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows, starting from the top.
• There is also a tail() method for viewing the last rows of the DataFrame.
• The tail() method returns the headers and a specified number of rows, starting from the bottom.
• The DataFrame object has a method called info(), that gives you more information about the data set.
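A minimal sketch tying the Series, DataFrame and viewing methods together; the column names and values here are hypothetical:

```python
import pandas as pd

# Series from a key/value object: the keys become the index
calories = pd.Series({"day1": 420, "day2": 380, "day3": 390})
print(calories["day2"])               # look up by key

# DataFrame: a 2-dimensional table with rows and columns
df = pd.DataFrame({
    "duration": [50, 40, 45, 60, 30],
    "pulse": [110, 117, 103, 109, 100],
})

print(df.head(2))                     # first rows, from the top
print(df.tail(2))                     # last rows, from the bottom
df.info()                             # column types and non-null counts
```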
4. Scikit-learn:
➢ What is Sklearn in Python? A Python module called Scikit-learn offers a variety of supervised and unsupervised learning techniques. It is built on several technologies you may already be acquainted with, including NumPy, pandas, and Matplotlib.
➢ What is Sklearn? The scikit-learn project began as scikits.learn, a Google Summer of Code venture by the French research scientist David Cournapeau. Its name refers to the idea that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed extension to SciPy. Later, other programmers rewrote the core codebase.
❖ Different methods in Sci-Kit learn:
➢ Data Modelling Process: Dataset Loading
A collection of data is called a dataset. It has the following components:
▪ Features − The variables of data are called its features. They are also known as predictors, inputs
or attributes.
▪ Feature matrix − It is the collection of features, in case there are more than one.
▪ Feature Names − It is the list of all the names of the features.
▪ Response − It is the output variable that basically depends upon the feature variables. They are
also known as target, label or output.
Dept. Of AIML, AIET, Mijar Page | 17
Artificial Intelligence and Machine Learning
▪ Response Vector − It is used to represent the response column. Generally, we have just one response column.
▪ Target Names − They represent the possible values taken by a response vector. Scikit-learn has a few example datasets, like iris and digits for classification and the Boston house prices for regression.
➢ Data Representation
As we know, machine learning is about creating models from data. For this purpose, the computer must understand the data first. Next, we are going to discuss various ways to represent data so that it can be understood by the computer.
➢ Data as table
• The best way to represent data in Scikit-learn is in the form of tables. A table represents a 2-D grid of data where rows represent the individual elements of the dataset and columns represent the quantities related to those individual elements.
➢ Estimator API
▪ What is Estimator API
• It is one of the main APIs implemented by Scikit-learn. It provides a consistent interface for a wide range of ML applications, which is why all machine learning algorithms in Scikit-learn are implemented via the Estimator API. The object that learns from the data (fits the data) is an estimator. It can be used with any of the algorithms, like classification, regression or clustering, or even with a transformer that extracts useful features from raw data.
➢ Linear Modelling
Scikit-learn provides various linear models, including the following −
1. Linear Regression
It is one of the best statistical models that studies the relationship between a dependent variable (Y)
with a given set of independent variables (X).
2. Logistic Regression
Logistic regression, despite its name, is a classification algorithm rather than a regression algorithm. Based on a given set of independent variables, it is used to estimate a discrete value (0 or 1, yes/no, true/false).
3. Ridge Regression
Ridge regression or Tikhonov regularization is the regularization technique that performs L2
regularization. It modifies the loss function by adding the penalty equivalent to the square of the
magnitude of coefficients.
4. Bayesian Ridge Regression
Bayesian regression provides a natural mechanism to survive insufficient or poorly distributed data by formulating linear regression using probability distributions rather than point estimates.
5. LASSO
LASSO is the regularisation technique that performs L1 regularisation. It modifies the loss function by adding a penalty (shrinkage quantity) equivalent to the sum of the absolute values of the coefficients.
6. Multi-task LASSO
It allows fitting multiple regression problems jointly, enforcing the selected features to be the same for all the regression problems, also called tasks. Sklearn provides a linear model named MultiTaskLasso, trained with a mixed L1/L2-norm for regularisation, which estimates sparse coefficients for multiple regression problems jointly.
7. Elastic-Net
The Elastic-Net is a regularized regression method that linearly combines both penalties i.e. L1 and
L2 of the Lasso and Ridge regression methods. It is useful when there are multiple correlated features.
8. Multi-task Elastic-Net
It is an Elastic-Net model that allows fitting multiple regression problems jointly, enforcing the selected features to be the same for all the regression problems, also called tasks.
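A brief sketch comparing some of the linear models above on a tiny hypothetical dataset. The alpha values are arbitrary, chosen only to make the shrinkage effect of the L1/L2 penalties visible:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])            # y = 2x, a perfectly linear relationship

lin = LinearRegression().fit(X, y)        # plain least squares: coefficient ~ 2
ridge = Ridge(alpha=1.0).fit(X, y)        # L2 penalty shrinks the coefficient
lasso = Lasso(alpha=0.5).fit(X, y)        # L1 penalty shrinks (and can zero) it
print(lin.coef_, ridge.coef_, lasso.coef_)

# Logistic regression is a classifier despite its name: discrete output
X_cls = np.array([[0], [1], [2], [3]])
y_cls = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[2.5]]))               # 0 or 1, not a continuous value
```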
➢ Stochastic Gradient Descent
Here, we will learn about an optimization algorithm in Sklearn termed Stochastic Gradient Descent (SGD). Stochastic Gradient Descent (SGD) is a simple yet efficient optimization algorithm used to find the values of the parameters/coefficients of functions that minimize a cost function. In other words, it is used for discriminative learning of linear classifiers under convex loss functions such as SVM and Logistic Regression. It has been successfully applied to large-scale datasets because the update to the coefficients is performed for each training instance, rather than once at the end of the training set.
➢ SGD Classifier
The Stochastic Gradient Descent (SGD) classifier basically implements a plain SGD learning routine supporting various loss functions and penalties for classification. Scikit-learn provides the SGDClassifier module to implement SGD classification.
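A minimal SGDClassifier sketch on the iris example dataset mentioned earlier. The features are standardised first, since SGD is sensitive to feature scale; the hinge loss and L2 penalty shown are default-style choices, not the only options:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Scale the features, then fit a linear classifier by stochastic gradient descent
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="hinge", penalty="l2", random_state=0))
clf.fit(X, y)

print(clf.score(X, y))     # training accuracy
```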
➢ K-Nearest Neighbours (KNN)
This chapter will help you understand the nearest neighbour methods in Sklearn. Neighbour-based learning methods are of both types, namely supervised and unsupervised. Supervised neighbours-based learning can be used for both classification and regression predictive problems, but it is mainly used for classification predictive problems in industry.
❖ Types of algorithms
Different types of algorithms which can be used in neighbour-based methods implementation are as
follows –
• Brute Force
The brute-force computation of distances between all pairs of points in the dataset provides the most naïve neighbour search implementation. Mathematically, for N samples in D dimensions, the brute-force approach scales as O[D N²]. For small data samples this algorithm can be very useful, but it becomes infeasible as the number of samples grows. Brute-force neighbour search is enabled with the keyword algorithm = 'brute'.
• K-D Tree
One of the tree-based data structures invented to address the computational inefficiencies of the brute-force approach is the KD tree data structure. Basically, the KD tree is a binary tree structure, called a K-dimensional tree. It recursively partitions the parameter space along the data axes, dividing it into nested orthotropic regions into which the data points are filed.
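A short sketch showing that both neighbour-search strategies are selected through the algorithm keyword of KNeighborsClassifier, and that they find the same neighbours; only the search cost differs:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# The same classifier with the two search algorithms discussed above
brute = KNeighborsClassifier(n_neighbors=3, algorithm='brute').fit(X, y)
kdtree = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree').fit(X, y)

# Identical predictions: the strategies differ in speed, not in result
print(brute.predict(X[:5]))
print(kdtree.predict(X[:5]))
```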
• Boosting Methods
Boosting methods build an ensemble model in an incremental way. The main principle is to build the model incrementally by training each base model estimator sequentially. In order to build a powerful ensemble, these methods basically combine several weak learners which are sequentially trained over multiple iterations of the training data. The sklearn.ensemble module includes boosting methods such as the following.
• AdaBoost
It is one of the most successful boosting ensemble methods, whose main key is in the way it gives weights to the instances in the dataset: misclassified instances receive higher weights, so subsequent models pay more attention to the instances that earlier models got wrong.
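A minimal AdaBoost sketch on the iris dataset; n_estimators (the number of sequentially trained weak learners) is an arbitrary choice here:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# An ensemble of sequentially trained weak learners (shallow trees by default),
# each round re-weighting the instances the previous learners misclassified
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

print(model.score(X, y))   # training accuracy of the boosted ensemble
```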
Example program:
# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# store the feature and target names
feature_names = iris.feature_names
target_names = iris.target_names
print("Feature names:", feature_names)

# X and y are numpy arrays
print("\nType of X is:", type(X))

# printing the first 5 input rows
print("\nFirst 5 rows of X:\n", X[:5])
Output:
Feature names: ['sepal length (cm)','sepal width (cm)','petal length (cm)','petal width (cm)']
First 5 rows of X:
5. OpenCV
➢ What is OpenCV?
OpenCV is an open-source Python library used for computer vision in artificial intelligence, machine learning, face recognition, etc. In OpenCV, CV is an abbreviation of computer vision, which is defined as a field of study that helps computers understand the content of digital images such as photographs and videos.
➢ OpenCV Read and Save Image
▪ OpenCV Reading Images
OpenCV allows us to perform multiple operations on an image, but to do that it is necessary to read an image file as input first; then we can perform the various operations on it. OpenCV provides the following functions to read and write images: the imread() function reads an image from a file, and the imwrite() function saves an image to a specified file. The file extension defines the image format. The syntax of imwrite() is the following:
cv2.imwrite(filename, img[, params])
Parameters:
o filename − name of the file where the image is saved.
o img − image to be saved.
o params − the following parameters are currently supported:
o For JPEG, quality can be from 0 to 100. The default value is 95.
o For PNG, quality can be the compression level from 0 to 9. The default value is 1.
o For PPM, PGM, or PBM, it can be a binary format flag, 0 or 1. The default value is 1.
• BLOB extraction
Blob extraction means separating the BLOBs (objects) in a binary image. A BLOB contains a group of connected pixels. We can determine whether two pixels are connected or not by connectivity, i.e., which pixels are neighbours of each other. There are two types of connectivity: 8-connectivity and 4-connectivity. 8-connectivity is far better than 4-connectivity.
• BLOB representation
BLOB representation simply means converting the BLOB into a few representative numbers. After BLOB extraction, the next step is to classify the several BLOBs. There are two steps in the BLOB representation process: in the first step, each BLOB is denoted by several characteristics, and in the second step some matching methods are applied that compare the features of each BLOB.
• BLOB classification
Here we determine the type of BLOB, for example, whether a given BLOB is a circle or not. The question is how to decide which BLOBs are circles and which are not, based on the features described earlier. For this purpose, we generally need to make a prototype model of the object we are looking for.
➢ OpenCV Image Filters
Image filtering is the process of modifying an image by changing the shades or colours of its pixels. It is also used to increase brightness and contrast. In this tutorial, we will learn about several types of filters.
A. Bilateral Filter
OpenCV provides the bilateralFilter() function to apply a bilateral filter to an image. The bilateral filter can reduce unwanted noise very well while keeping edges sharp. The syntax of the function is given below:
cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)
▪ Face Detection: Face detection is generally considered as finding the faces (location and size) in an image, and probably extracting them to be used by the face recognition algorithm.
▪ Face Recognition: The face recognition algorithm is used for finding features that uniquely describe the face in the image. The facial image is already extracted, cropped, resized, and usually converted to grayscale.
There are various algorithms for face detection and face recognition. Here we will learn about face detection using the Haar cascade algorithm.
CHAPTER 3
SMART COMMUNICATION SYSTEM FOR DEAF AND DUMB PEOPLE
3.1 Introduction
One of the most precious gifts to a human being is the ability to see, listen, speak and respond according to the situation. But there are some unfortunate ones who are deprived of this. Making a single compact device for people with hearing and vocal impairment is a tough job. Communication between deaf-dumb and normal people has always been a challenging task. This project proposes an innovative communication system framework for deaf and dumb people in a single compact device. We provide a technique for a person to read a text, achieved by capturing an image through a camera and converting the text to speech (TTS). It provides a way for deaf people to read a text through speech-to-text (STT) conversion technology. Also, it provides a technique for dumb people using text-to-voice conversion.
The system is provided with four switches, and each switch has a different function. Dumb people can communicate their message through text, which will be read out by e-speak, and deaf people can hear others' speech as text. All these functions are implemented on a laptop.
The number of deaf and dumb people is over five percent of the population. Sign language is principally used by deaf and dumb people to communicate with each other. The main problem faced by deaf and dumb people today is communicating with those who do not understand sign language. Writing is an alternative, but it is considered a slow and inefficient way of communication. Another viable option would be to hire a professional sign language translator. In this project, we are introducing a two-way smart communication system for deaf and dumb people and normal people; the project builds a system that assists deaf and dumb people in conveying their messages to normal people.
The system consists of two main parts: the first part is for the deaf and dumb person to convey their messages to a normal person, and the second is for the normal person, who can also respond easily without learning sign language, with the help of a GUI.
3.2 Objectives of Smart Communication for deaf and dumb people
Creating smart communication solutions for deaf and mute individuals using Python involves
several objectives, each contributing to the overall goal of enhancing accessibility, inclusivity, and
independence for this community. In this detailed explanation, we will explore the key objectives in
developing smart communication tools using Python:
▪ Implementation: Use Python for mobile app development (e.g., with Kivy or Flask for web-based
apps) and integrate features like real-time chat, video calls with sign language interpretation, and
other communication tools.
❖ Integration of Natural Language Processing (NLP):
▪ Objective: Incorporate natural language processing capabilities into communication tools to
enhance the understanding of written language. This objective aims to make written
communication more accessible and efficient for deaf individuals.
▪ Implementation: Leverage Python libraries for NLP, such as NLTK or spaCy, to develop
applications that analyse and understand written text. This can be applied to chat applications,
educational tools, and other communication platforms.
❖ Accessibility in Web Development:
▪ Objective: Ensure web applications and online platforms are accessible to deaf and mute users.
This involves implementing features that accommodate various communication needs, such as
captioning, sign language interpretation, and accessible user interfaces.
▪ Implementation: Use Python frameworks like Django or Flask for web development and integrate
accessibility features, such as ARIA (Accessible Rich Internet Applications) attributes, to enhance
the usability of web applications for individuals with diverse communication abilities.
❖ User Authentication and Security:
▪ Objective: Develop secure and user-friendly authentication mechanisms for smart communication
applications. This ensures that the privacy and security of deaf and mute individuals using these
tools are safeguarded.
▪ Implementation: Implement secure user authentication using Python frameworks like Flask-
Security or Django's authentication system. Integrate encryption protocols to protect sensitive
communication data exchanged within the applications.
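As a minimal sketch of the underlying idea, assuming only the Python standard library (Flask-Security and Django's auth system already provide equivalent, production-hardened helpers, so this is illustrative rather than prescriptive): passwords are never stored directly; a random salt and a slow key-derivation function are used instead.

```python
import hashlib
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a salted hash; only (salt, digest) would be stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the hash and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return secrets.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))  # False
```

The scrypt parameters shown are common illustrative values; a real deployment would tune them and rely on the framework's own password hashers.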
❖ Community Engagement and Social Inclusion:
▪ Objective: Foster community engagement and social inclusion by developing Python-based
platforms that connect deaf and mute individuals with each other and the broader community. This
can include social networking features, forums, and collaborative tools.
▪ Implementation: Use Python frameworks to build community platforms with features like user
profiles, discussion forums, and collaborative spaces. Integrate communication tools within these
platforms to facilitate interaction and collaboration among users.
❖ Continuous Improvement and Adaptability:
▪ Objective: Strive for continuous improvement in smart communication tools by staying updated
with technological advancements. Ensure that the tools remain adaptable to evolving needs and
emerging technologies.
Dept. Of AIML, AIET, Mijar Page | 29
Artificial Intelligence and Machine Learning
▪ Implementation: Regularly update Python libraries and frameworks, adopt new machine learning
models, and incorporate user feedback to enhance the functionality and adaptability of smart
communication applications over time.
The objectives of creating smart communication solutions for deaf and mute individuals using
Python encompass a wide range of functionalities, from real-time sign language interpretation to
building accessible web applications. Collectively, these objectives aim to empower the deaf and
mute community, promoting inclusivity, accessibility, and independence through the use of
technology.
3.3 Software/Hardware Requirement
❖ Software Requirements
➢ Python (3.11.5)
➢ PyCharm (2023.1.2)
➢ Python libraries:
• opencv-python (cv2)
• NumPy
• pyttsx3
• SpeechRecognition
• os (standard library)
❖ Hardware Requirements
➢ Laptop/PC
➢ 4/8/16 GB RAM
➢ Intel i3/i5/i7 processor, 6th generation or newer
➢ Webcam and microphone
❖ Camera to capture images: An RGB image can be viewed as three images (a red-scale image, a
green-scale image and a blue-scale image) stacked on top of each other. In MATLAB, an RGB
image is basically an M×N×3 array of colour pixels, where each pixel is a triplet corresponding
to the red, green and blue components of the RGB image at a specified spatial location.
Similarly, a grayscale image can be viewed as a single-layered image.
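The same M×N×3 layout applies in NumPy, which this project uses in place of MATLAB. The hedged sketch below builds a tiny RGB array and reduces it to a single-layer grayscale image; the weights are the standard ITU-R BT.601 luminance coefficients, not values specific to this project.

```python
import numpy as np

# A 2x2 RGB image: shape (M, N, 3), one colour triplet per pixel.
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# A weighted sum over the colour axis yields a single-layer grayscale image.
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb @ weights).astype(np.uint8)

print(rgb.shape)   # (2, 2, 3)
print(gray.shape)  # (2, 2)
```

OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` performs this conversion directly (on BGR-ordered images, OpenCV's default channel order).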
❖ SpeechTexter: SpeechTexter is an online multi-language speech recognizer that helps users
dictate long documents, books, reports and blog posts with their voice, and can create text
notes/SMS/emails/tweets from speech. It supports over 60 languages; its help page is at
https://www.speechtexter.com/help. For better results, use a high-quality microphone, remove any
background noise, and speak loudly and clearly.
❖ Microphone: A microphone is used to give speech input, which is later converted into text using
SpeechTexter so that deaf users, who cannot hear, can read it easily. [3]
❖ OpenCV: OpenCV is a cross-platform, open-source library of programming functions focused
mainly on real-time computer vision. It can be used for various purposes such as face recognition,
object identification, mobile robotics, segmentation, and gesture recognition.
METHODOLOGY AND IMPLEMENTATION
The project is divided into three modules:
1. Gesture-to-Text (GTT)
2. Text-to-Speech (TTS)
3. Speech-to-Text (STT)
1. Gesture-to-Text (GTT)
This module is developed for vocally impaired people who cannot exchange their thoughts with
others. Mute people use gestures to communicate, and these gestures are largely not understandable
to hearing people. The process starts by capturing an image and cropping its useful portion. The
RGB image is converted to grayscale for better processing, blurred with a Gaussian blur function,
and passed to a threshold function to highlight the hand region. The contours are then found, along
with the angle between each pair of fingers; using the convex hull function, the fingertip points are
located. The number of angles smaller than 90 degrees gives the number of convexity defects, and
from that the finger count. According to the number of defects, the corresponding text is printed on
the display and read out by the speaker.
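The angle test described above follows from the law of cosines: each convexity defect forms a triangle between two fingertip points and the valley between them, and the angle at the valley decides whether the defect counts as a finger gap. The sketch below uses made-up coordinates, not values from the project.

```python
import math

def defect_angle(start, end, far):
    """Angle (in degrees) at the defect point 'far' between two fingertips."""
    a = math.dist(start, end)  # fingertip-to-fingertip side
    b = math.dist(start, far)
    c = math.dist(end, far)
    # Law of cosines: angle opposite side a
    return math.degrees(math.acos((b**2 + c**2 - a**2) / (2 * b * c)))

# Hypothetical fingertip/valley pixel coordinates
angle = defect_angle((100, 50), (160, 50), (130, 120))
print(angle < 90)  # True: this defect counts as a gap between two fingers
```

In the full pipeline, `start`, `end` and `far` come from `cv2.convexityDefects`, and the number of such sub-90-degree angles plus one gives the finger count.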
2. Text-to-Speech (TTS)
This module performs text-to-speech conversion for mute users who cannot speak. They type their
thoughts as text, which is then converted into a voice signal; the converted voice signal is spoken out
by the eSpeak synthesizer. After option OP1 is selected, the os and subprocess modules are imported
and the text-to-speech function is called with the text as input. Once the text is entered from the
keyboard, the eSpeak synthesizer converts it to speech. The process can also be interrupted from the
keyboard with Ctrl+C.
3. Speech-to-Text (STT)
This module is developed to assist mute users who cannot speak. To help them, a Logitech camera
is interfaced through the OpenCV tool to capture an image of written text. The captured image is
converted to text using Tesseract OCR and saved to the file out.txt; the text file is then opened and
the paragraph is split into sentences and saved. Within OCR, adaptive thresholding techniques
convert the image into a binary image, which is then transformed into character outlines. The
converted text is read out by eSpeak.
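The adaptive-thresholding idea mentioned above can be sketched without Tesseract: each pixel is compared against the mean of its local neighbourhood rather than one global cutoff, so uneven lighting does not wash out the characters. This is a simplified mean-based variant for illustration, not Tesseract's actual implementation; the sample image is made up.

```python
import numpy as np

def adaptive_threshold(img, block=3, c=0):
    """Binarise img: a pixel becomes 255 if above its local mean minus c."""
    h, w = img.shape
    pad = block // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + block, x:x + block].mean()
            out[y, x] = 255 if img[y, x] > local_mean - c else 0
    return out

# A tiny unevenly-lit image with a dark "stroke" down the middle column
img = np.array([[200, 200,  40, 200, 200],
                [180, 180,  30, 180, 180],
                [160, 160,  20, 160, 160]], dtype=np.uint8)
binary = adaptive_threshold(img)
print(np.unique(binary))  # only 0 and 255 remain
```

OpenCV offers the same operation directly as `cv2.adaptiveThreshold`, with both mean and Gaussian neighbourhood weightings.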
3.4 Source Code
# Python program to translate
# speech to text, text to speech and gestures to text
import math

import cv2
import numpy as np
import pyttsx3
import speech_recognition as sr

# Function to convert text to speech
def text_to_speech(command):
    # Initialize the engine
    engine = pyttsx3.init()
    engine.say(command)
    engine.runAndWait()

# Function to convert speech to text
def speech_to_text():
    # Initialize the recognizer
    recognizer = sr.Recognizer()
    # Use the default microphone as the audio source
    with sr.Microphone() as source:
        print("Say something:")
        # Adjust for ambient noise before listening
        recognizer.adjust_for_ambient_noise(source)
        # Listen for the user's speech
        audio = recognizer.listen(source)
    print("Transcribing...")
    try:
        # Use the Google Web Speech API to convert speech to text
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
    except sr.UnknownValueError:
        print("Google Web Speech API could not understand audio")
    except sr.RequestError as e:
        print(f"Could not request results from Google Web Speech API; {e}")

# Function to convert gestures to text (and speech)
def gesture_to_text():
    cap = cv2.VideoCapture(0)
    font = cv2.FONT_HERSHEY_SIMPLEX
    while True:
        # An error is raised if no contour of maximum area is found in the
        # window; the try/except lets the loop continue in that case
        try:
            ret, frame = cap.read()
            frame = cv2.flip(frame, 1)
            kernel = np.ones((3, 3), np.uint8)
            # Define the region of interest
            roi = frame[100:400, 100:400]
            cv2.rectangle(frame, (100, 100), (400, 400), (0, 255, 0), 0)
            hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
            # Define the range of skin colour in HSV
            lower_skin = np.array([0, 20, 70], dtype=np.uint8)
            upper_skin = np.array([20, 255, 255], dtype=np.uint8)
            # Extract the skin region and smooth it
            mask = cv2.inRange(hsv, lower_skin, upper_skin)
            mask = cv2.dilate(mask, kernel, iterations=4)
            mask = cv2.GaussianBlur(mask, (5, 5), 100)
            # Find the hand contour, its convex hull and the convexity defects
            contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
            cnt = max(contours, key=cv2.contourArea)
            hull = cv2.convexHull(cnt, returnPoints=False)
            defects = cv2.convexityDefects(cnt, hull)
            # Count the defects whose angle is less than 90 degrees; each one
            # corresponds to a gap between two extended fingers
            l = 0
            for i in range(defects.shape[0]):
                s, e, f, d = defects[i, 0]
                start = tuple(cnt[s][0])
                end = tuple(cnt[e][0])
                far = tuple(cnt[f][0])
                a = math.dist(start, end)
                b = math.dist(start, far)
                c = math.dist(end, far)
                angle = math.degrees(math.acos((b ** 2 + c ** 2 - a ** 2) / (2 * b * c)))
                if angle <= 90:
                    l += 1
            l += 1  # number of fingers = number of defects + 1
            # Map the finger count to a message (other counts map to other
            # phrases in the full project)
            if l == 5:
                cv2.putText(frame, 'i am hungry', (0, 50), font, 2, (0, 0, 255), 3, cv2.LINE_AA)
                engine = pyttsx3.init()
                engine.setProperty("rate", 120)
                engine.say("i am hungry")
                engine.runAndWait()
            else:
                cv2.putText(frame, 'reposition', (10, 50), font, 2, (0, 0, 255), 3, cv2.LINE_AA)
            # Show the windows
            cv2.imshow('mask', mask)
            cv2.imshow('frame', frame)
        except Exception:
            pass
        # Press Esc to exit
        if cv2.waitKey(25) & 0xFF == 27:
            break
    cv2.destroyAllWindows()
    cap.release()

if __name__ == "__main__":
    while True:
        com_mode = input("Select 1 for Text to Speech\n"
                         "Select 2 for Speech to Text\n"
                         "Select 3 for Gesture to Text: ")
        if com_mode == "1":
            command = input("Enter text to convert to speech: ")
            text_to_speech(command)
        elif com_mode == "2":
            print("Say something to convert to text")
            speech_to_text()
        elif com_mode == "3":
            gesture_to_text()
3.5 Outputs
Figure 3.5.1 and Figure 3.5.2 show the interactive camera display and the text “Ok” recognized
from a gesture.
Figure 3.5.3 and Figure 3.5.4 show the texts “Best of Luck” and “No” recognized from gestures.
Figure 3.5.5, Figure 3.5.6 and Figure 3.5.7 show the texts “How are you”, “I am fine” and “I am
hungry” recognized from gestures.
CHAPTER 4
SNAPSHOTS AND PHOTOGRAPHS
4.1 Coding
Figure 4.1, Figure 4.2 and Figure 4.3 show images of the practice sessions during the internship.
4.2 Team
Figure 4.4, Figure 4.5 and Figure 4.6 show our team during the internship.
CHAPTER 5
CONCLUSION