ON
MACHINE LEARNING WITH PYTHON
Session 2020-2021
CERTIFICATE
This is to certify that the seminar entitled “MACHINE LEARNING” has been
presented by JAYESH GUPTA under my guidance during the academic year
2020.
Submitted to
Ms. Anima Sharma
ACKNOWLEDGEMENT
The training opportunity I had with [ED Apply Pvt. Ltd.] was a great chance
for learning and professional development. I consider myself very fortunate
to have been given the opportunity to be a part of it. I am also grateful for
having had the chance to meet so many wonderful people and professionals who
led me through this training period.
(Name of student)
PAGE INDEX
SR. NO. TITLE
ABSTRACT
LEARNING OUTCOMES OF TRAINING
1. INTRODUCTION TO MACHINE LEARNING
1.1 WHAT IS MACHINE LEARNING?
1.2 FUTURE SCOPE OF MACHINE LEARNING
1.3 WHY PYTHON PROGRAMMING LANGUAGE?
4. DATA VISUALIZATION
4.1 CONFUSION MATRIX
4.2 ACCURACY, PRECISION AND RECALL
5. PROJECTS
5.1 WINE QUALITY ANALYSIS
5.2 FACE RECOGNITION SYSTEM
CONCLUSION
BIBLIOGRAPHY
FIGURE INDEX
FIGURE NO. FIGURE TITLE
ABSTRACT
LEARNING OUTCOMES OF TRAINING
CHAPTER 1
INTRODUCTION TO MACHINE LEARNING
In this chapter, you will learn in detail about the concepts of Python in
machine learning.
• Definition 1
Machine learning is a discipline that deals with programming the systems so
as to make them automatically learn and improve with experience.
• Definition 2
A program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
1.2.2 Robotics
Robotics is one of the fields that always gains the interest of researchers as well
as the common
We are still at an early stage in the field of Machine Learning. There are many
advancements still to be achieved in this field. One that could take Machine
Learning to the next level is Quantum Computing.
1.2.4 Job Scope of ML
The scope of Machine Learning in India, as well as in other parts of the world,
is high in comparison to other career fields when it comes to job opportunities.
According to a Gartner forecast, Artificial Intelligence was expected to create
2.3 million jobs by 2020.
Python offers concise and readable code. While complex algorithms and
versatile workflows stand behind machine learning and AI, Python’s
simplicity allows developers to write reliable systems. Developers get to put
all their effort into solving an ML problem instead of focusing on the technical
nuances of the language.
In the Developer Survey 2018 by Stack Overflow, Python was among the top
10 most popular programming languages, which ultimately means that you
can find and hire a development company with the necessary skill set to build
your AI-based project.
CHAPTER 2
TYPES OF MACHINE LEARNING
2.1 Concepts of Learning
The most commonly used types of learning are supervised and unsupervised
learning.
Regression trains on and predicts a continuous-valued response, for example
predicting real estate prices.
Common examples of supervised learning include classifying e-mails into
spam and not spam categories, labeling webpages based on their content, and
voice recognition.
When learning data contains only some indications without any description or
labels, it is up to the coder or to the algorithm to find the structure of the
underlying data, to discover hidden patterns, or to determine how to describe
the data. This kind of learning data is called unlabeled data.
Suppose that we have a number of data points, and we want to group them
into several clusters. We may not know exactly what the grouping criteria
should be. So, an unsupervised learning algorithm tries to partition the
given dataset into a certain number of groups in an optimum way.
If some learning samples are labeled and others are not, then it is
semi-supervised learning. It trains on a small amount of labeled data together
with a large amount of unlabeled data. Semi-supervised learning is applied in
cases where it is expensive to acquire a fully labeled dataset but practical to
label a small subset. For example, it often requires skilled experts to label
certain remote sensing images, and lots of field experiments to locate oil at a
particular location, while acquiring unlabeled data is relatively easy.
Here the learning data gives feedback so that the system adjusts to dynamic
conditions in order to achieve a certain objective. The system evaluates its
performance based on the feedback responses and reacts accordingly. The
best-known instances include self-driving cars and the Go-playing program
AlphaGo.
As a subfield of information technology, the objective of machine learning is
to program machines so that they will learn.
CHAPTER 3
TRAINING A MACHINE LEARNING MODEL
Defining a Problem
Importing Dataset
Pre-processing Data
Evaluating Algorithms
Improving Results
Analyzing Results
3.2 Data Pre-processing
• In the real world, we usually come across lots of raw data which is not
fit to be readily processed by machine learning algorithms.
Fig 3.1 Label and One Hot Encoding
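As a minimal sketch of the two encodings named in Fig 3.1, the pure-Python code below maps each category to an integer (label encoding) and to a 0/1 indicator vector (one-hot encoding). The helper names `label_encode` and `one_hot_encode` and the sample colour values are illustrative assumptions, not taken from the training material.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    classes = sorted(set(values))
    index = {category: i for i, category in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Map each category to a 0/1 vector that contains a single 1."""
    labels, classes = label_encode(values)
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[label] = 1
        rows.append(row)
    return rows, classes


colours = ["red", "green", "blue", "green"]  # hypothetical sample column
labels, classes = label_encode(colours)
one_hot, _ = one_hot_encode(colours)

print(classes)   # category order used by both encodings
print(labels)    # one integer per input value
print(one_hot)   # one indicator vector per input value
```

Label encoding implies an ordering between categories, which is why one-hot encoding is often preferred for categories that have no natural order.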
Simple (single-variable) linear regression is a type of ML algorithm
that models the relationship between a single independent input variable
(the feature variable) and a dependent output variable.
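To make the idea concrete, here is a minimal sketch of fitting a straight line to one feature by ordinary least squares. The function name `fit_simple_linear_regression` and the sample data are assumptions made for illustration; the report itself does not show an implementation.

```python
def fit_simple_linear_regression(xs, ys):
    """Return the least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept  # predicted output for a new input x = 5
print(slope, intercept, prediction)
```

The closed-form solution is used here because there is only one feature; with many features, a library routine would normally be the more practical choice.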
Advantages: Easy to implement and interpret. Well suited to linearly
separable datasets.
The random forest technique is simple to apply, often highly accurate, and
widely used by engineers.
3.3.4 K-Means
Data points inside a cluster are homogeneous and are heterogeneous to peer
groups.
How K-means Forms Clusters
K-means forms clusters in the steps given below:
1. K-means picks k points, known as centroids, one for each cluster.
2. Each data point forms a cluster with the closest centroid, giving k clusters.
3. The centroid of each cluster is recomputed from the existing cluster
members. Here we have new centroids.
4. As we have new centroids, repeat steps 2 and 3: find the closest new
centroid for each data point and associate the point with the corresponding
new cluster. Repeat this process until convergence occurs, that is, until the
centroids do not change.
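The K-means procedure described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the implementation used in the training: initial centroids are drawn at random from the data, distance is Euclidean, and the names `closest`, `k_means`, `seed` and `max_iter` as well as the sample points are hypothetical.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Pick k of the data points as the initial centroids (an assumed strategy).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its cluster members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid if a cluster is empty
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Because the starting centroids are random, different seeds can produce different clusterings on less clearly separated data, so running the procedure several times is a common precaution.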
Determination of Value of K
In K-means, each cluster has its own centroid. The sum of the squared
differences between the centroid and the data points within a cluster is the
sum-of-squares value for that cluster. Adding the sum-of-squares values of
all the clusters gives the total within-cluster sum of squares for the cluster
solution.
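The quantity described above can be computed directly. The sketch below assumes clusters are given as lists of coordinate tuples and uses squared Euclidean distance; the function names and the sample clusters are illustrative only.

```python
def within_cluster_sum_of_squares(cluster):
    """Sum of squared distances from each point in the cluster to its centroid."""
    centroid = [sum(coords) / len(cluster) for coords in zip(*cluster)]
    return sum(
        sum((x - m) ** 2 for x, m in zip(point, centroid))
        for point in cluster
    )


def total_within_sum_of_squares(clusters):
    """Add up the per-cluster values to score the whole cluster solution."""
    return sum(within_cluster_sum_of_squares(c) for c in clusters)


# Hypothetical cluster solution with two clusters.
clusters = [
    [(1.0, 1.0), (3.0, 1.0)],  # centroid (2, 1): contributes 1 + 1
    [(5.0, 5.0), (5.0, 9.0)],  # centroid (5, 7): contributes 4 + 4
]
total = total_within_sum_of_squares(clusters)
print(total)
```

One common way to use this value is to compute it for several candidate values of K and pick the K after which it stops dropping sharply.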
Fig 3.5 K-Means Algorithm
A case is assigned to the class that is most common among its K nearest
neighbours, as measured by a distance function. These distance functions can be
Euclidean, Manhattan, Minkowski and Hamming distance. The first three functions
are used for continuous variables and the fourth one (Hamming) for categorical
variables.
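For reference, the four distance functions named above can be written as follows. This is a small sketch using their standard definitions; the function names and sample vectors are chosen here for illustration.

```python
def euclidean(a, b):
    """Square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalisation: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def hamming(a, b):
    """Number of positions at which two categorical vectors differ."""
    return sum(x != y for x, y in zip(a, b))


a, b = (0.0, 0.0), (3.0, 4.0)  # hypothetical continuous features
d_euclidean = euclidean(a, b)
d_manhattan = manhattan(a, b)
d_minkowski = minkowski(a, b, p=2)

# Hypothetical categorical features: only the second position differs.
d_hamming = hamming(("red", "small", "round"), ("red", "large", "round"))
print(d_euclidean, d_manhattan, d_minkowski, d_hamming)
```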
If K = 1, then the case is simply assigned to the class of its nearest neighbour.
At times, choosing K turns out to be a challenge while performing KNN
modelling.
The algorithm compares the distances between points using some distance
function (usually Euclidean), then analyses those results and assigns each
point to the group of the points that are closest to it.
You can use KNN for both classification and regression problems. However, it
is more widely used in classification problems in the industry. KNN can easily
be mapped to our real lives.
You will have to note the following points before selecting KNN:
• Variables should be normalized; otherwise variables with a larger range
can bias the result.
• More work is needed at the pre-processing stage, such as outlier and
noise removal, before applying KNN.
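The following sketch combines the two ideas above: features are first rescaled to a common range, and a query point is then classified by majority vote among its K nearest neighbours. It is a minimal illustration, assuming min-max scaling and Euclidean distance; the names `min_max_normalize` and `knn_predict` and the (age, income) data are hypothetical.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Return the majority class among the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (age, income) rows; income has a far larger range than age.
train_rows = [(25, 30000), (30, 32000), (45, 90000), (50, 95000)]
train_labels = ["A", "A", "B", "B"]
query_row = (28, 31000)

# Normalize the training rows and the query together so they share one scale.
scaled = min_max_normalize(train_rows + [query_row])
train_scaled, query_scaled = scaled[:-1], scaled[-1]

prediction = knn_predict(train_scaled, train_labels, query_scaled, k=3)
print(prediction)
```

Without the scaling step, the income column would dominate the distance simply because its numbers are larger.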
Fig 3.6 K-Nearest Neighbour
CHAPTER 4
DATA VISUALIZATION
Fig 4.1 Confusion Matrix
True Positive:
Interpretation: You predicted positive and it’s true. You predicted that a
woman is pregnant and she actually is.
True Negative:
Interpretation: You predicted negative and it’s true. You predicted that a man is
not pregnant and he actually is not.
False Positive:
Interpretation: You predicted positive and it’s false. You predicted that a man
is pregnant but he actually is not.
False Negative:
Interpretation: You predicted negative and it’s false. You predicted that a
woman is not pregnant but she actually is.
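The four cells of the confusion matrix can be counted directly from paired actual and predicted labels. The sketch below assumes binary labels with 1 as the positive class; the function name `confusion_counts` and the sample labels are illustrative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct only if the actual label is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```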
4.2 Accuracy, Precision and Recall
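Consider a system that classifies tumors as benign or malignant. When the
system correctly classifies a tumor as being malignant, the prediction is
called a true positive. When the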
system incorrectly classifies a benign tumor as being malignant, the prediction
is a false positive. Similarly, a false negative is an incorrect prediction that
the tumor is benign, and a true negative is a correct prediction that a tumor is
benign. These four outcomes can be used to calculate several common
measures of classification performance, like accuracy, precision, recall and so
on.
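Accuracy is the fraction of all predictions that are correct:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
Where,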
TP is the number of true positives
TN is the number of true negatives
FP is the number of false positives
FN is the number of false negatives.
Precision is the fraction of the tumors that were predicted to be malignant that
are actually malignant. Precision is calculated with the following formula:
\[ \text{Precision} = \frac{TP}{TP + FP} \]
In this example, precision measures the fraction of tumors that were predicted
to be malignant that are actually malignant. Recall measures the fraction of
truly malignant tumors that were detected. The precision and recall measures
could reveal that a classifier with impressive accuracy actually fails to detect
most of the malignant tumors. If most tumors are benign, even a classifier that
never predicts malignancy could have high accuracy. A different classifier
with lower accuracy and higher recall might be better suited to the task, since
it will detect more of the malignant tumors. Many other performance measures
for classification can also be used.
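The point about accuracy on imbalanced data can be checked with a short calculation. The sketch below uses the standard definitions of accuracy, precision and recall; the two classifiers and their counts (100 tumors, 5 of them malignant) are invented for illustration, and returning 0.0 when a ratio is undefined is a convention assumed here.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 when nothing was predicted positive


def recall(tp, fn):
    """Fraction of truly positive cases that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 when there are no positive cases


# Hypothetical classifier A: never predicts malignancy.
a = {"tp": 0, "tn": 95, "fp": 0, "fn": 5}
# Hypothetical classifier B: flags more tumors, including some benign ones.
b = {"tp": 4, "tn": 85, "fp": 10, "fn": 1}

acc_a, rec_a = accuracy(**a), recall(a["tp"], a["fn"])
acc_b, rec_b = accuracy(**b), recall(b["tp"], b["fn"])
prec_b = precision(b["tp"], b["fp"])

print(acc_a, rec_a)          # high accuracy, but no malignant tumor is found
print(acc_b, prec_b, rec_b)  # lower accuracy, most malignant tumors are found
```

Classifier A scores higher on accuracy while detecting nothing, which is why recall is the more informative measure for this task.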
CHAPTER 5
PROJECTS
Problem Statement:
Using the red wine quality dataset, train a model with a suitable ML
algorithm that can predict the quality of a wine as fine, good or great.
Fig 5.1 Project Screenshot 1
Fig 5.3 Project Screenshot 3
Fig 5.5 Project Screenshot 5
Problem Statement
Develop a face recognition system that can detect human faces and also
count the number of faces in the frame.
Fig 5.6 Project Screenshot
CHAPTER 6
APPLICATIONS OF MACHINE LEARNING
Artificial Intelligence (AI) and Machine Learning are everywhere. Chances
are that you are using them without even being aware of it. In Machine
Learning (ML), computers, software, and devices perform tasks via cognition
similar to that of the human brain.
Typical successful applications of machine learning include programs that
decode handwritten text, face recognition, voice recognition, speech
recognition, pattern recognition, spam detection programs, weather
forecasting, stock market analysis and predictions, and so on. This chapter
discusses these applications in detail.
Siri, Google Now and Alexa are some of the common examples of virtual
personal assistants. These applications assist in finding information when
asked by voice. All that is needed is to activate them and ask questions
such as “What are my appointments for today?” or “What are the
flights from Delhi to New York?”
GPS navigation services monitor users’ locations and velocities and use
them to build a map of current traffic. This helps in preventing traffic
congestion. Machine learning in such scenarios helps to estimate the regions
where congestion is likely, based on previous records.
Video surveillance systems nowadays are powered by AI, and machine
learning is the technology that makes it possible to detect and prevent
crimes before they occur. These systems track odd and suspicious behaviour
and send alerts to human attendants, who can ultimately help avert
accidents and crimes.
Facebook continuously monitors the friends that you connect with, your
interests, workplace, or a group that you share with someone etc. Based on
continuous learning, a list of Facebook users is given as friend suggestions.
You upload a picture of you with a friend and Facebook instantly recognizes
that friend. Machine learning works at the core of Computer Vision, which is a
technique to extract useful information from images and videos. Pinterest uses
computer vision to identify objects or pins in the images and recommend
similar pins to its users.
6.7 Online Customer Support
Google and similar search engines use machine learning to improve the
search results for their users. Every time a search is executed, the algorithms at
the backend keep watch on how the users respond to the results. Depending
on the user responses, the algorithms improve the search results.
of tools that helps them compare millions of transactions and distinguish
between legal and illegal transactions taking place between buyers and
sellers.
CONCLUSION
Machine Learning is a technique of training machines to perform the activities
a human brain can do, albeit a bit faster and better than an average human
being. Today we have seen that machines can beat human champions in
games such as Chess and Go, which are considered very complex. You
have seen that machines can be trained to perform human activities in several
areas and can aid humans in living better lives. Machine Learning can be
Supervised or Unsupervised.
If you have a smaller amount of clearly labelled data for training, opt for
Supervised Learning. Unsupervised Learning is generally the better fit for
large, unlabelled data sets. If you have a huge data set easily
available, go for deep learning techniques. You have also learned about
Reinforcement Learning and Deep Reinforcement Learning. You now know
what Neural Networks are, their applications and limitations. Finally, when it
comes to the development of machine learning models of your own, you
looked at the choices of various development languages, IDEs and platforms.
The next thing you need to do is to start learning and practicing each machine
learning technique.
BIBLIOGRAPHY