
TABLE OF CONTENTS

INTRODUCTION
INTRODUCTION OF MACHINE LEARNING
DATA SCIENCE, ARTIFICIAL INTELLIGENCE
AND MACHINE LEARNING
MACHINE LEARNING PREPARATION
MACHINE LEARNING WORKING SECTION
Linear Regression
PRACTICAL LINEAR REGRESSION
APPENDIX
CONCLUSION
INTRODUCTION
Python
Summary
The details of machine learning will be discussed later, but briefly: if
a machine can learn on its own from experience, or make predictions
based on it, then the system is intelligent, or ML-enabled.
Machine learning has become an important subject for every
engineering discipline. It is essential for data analysis, classification,
and prediction. Big data, data science, and machine learning are all
associated with artificial intelligence. These days, ML techniques are
being applied in ordinary web and mobile apps so that the application
you use becomes more intelligent and learns to anticipate what you
want. An ordinary application and an ML-enabled application differ:
the ordinary one always behaves the same way, while the
ML-enabled one seems to get smarter every time you use it. And ML
does not just add intelligence to an app; for classification tasks of any
kind, including diagnosis, it has no equal. In this course, along with
building models, the mathematics behind them will be explained in
plain language.
This course is for:
Anyone interested in artificial intelligence, big data, or data mining,
ML practitioners, ML hobbyists, and ML beginners. Whoever has
heard the name machine learning and is interested in applying it can
take the course. Details are described below.
What you need to know before starting the course (* Marked
topics will be excluded from discussion):
Basic Python Programming *
Basic MATLAB Programming *
Basic JavaScript Programming *
Linear Algebra *
Pythonic Syntactic Sugar
Knowing OOP in Python can be considered a plus
point
Calculus (integral and differential)
Basic statistics such as: Mean, Mode, Median,
Variance, Co-Variance, Correlation, Standard
Deviation ...
What will be discussed in this course?
Machine learning is actually a very broad subject; it cannot be
covered completely in one course, and research keeps producing
better and better models. This course will introduce you to machine
learning, but you will have to reach the advanced level on your own.
Let's look at the topics (the full list will be updated later):
Necessary software installation
Anaconda Python Distribution Installation
Identity and installation with PyCharm IDE
Make Sublime Text 3 useful for Python
Machine learning kick start
What is machine learning?
What is the application of machine learning?
What is regression?
What is linear and polynomial regression?
Prediction with Simple Linear Regression (using
Sklearn Module)
Prediction with Simple Linear Regression (Model
from Scratch)
Machine learning kick start 2
Supervised Learning
Unsupervised Learning
Two essential precognition algorithms
Why are these two algorithms necessary?
What is Penalized Regression Method?
What is Ensemble Method?
How to select the algorithm?
The general recipe for making predictive models
Identify the problem through the dataset chain
New dissection problems
What are attributes and labels? What are
synonyms?
The dataset has to be kept in mind
Model and Cost Function
Model Representation
Cost Function
Cost Function Intuition - 1
Cost Function Intuition - 2
Overfitting - is your model performing too well on
the training data?
Parameter learning
Gradient Descent
Gradient Descent Intuition
Gradient Descent in Linear Regression
Frequently Asked Questions:
What will be the use of machine learning in my career?
Machine learning is a very wide area, ranging from artificial
intelligence to pattern recognition. Every day there is an enormous
amount of data to work with. Google, Microsoft, and other large
companies process this data through pattern recognition. That is how
Google search can be so helpful: whatever mistake you type, it
figures out the correction. Or suppose you regularly watch
programming videos on YouTube; after a few days it will start
suggesting videos you are likely to want to watch.
It does not matter which career you are in. If you are a doctor and
know a little programming, some ML, some data science, and some
NLP (Natural Language Processing) or NLU (Natural Language
Understanding), you can build an artificial brain that takes disease
symptoms and predicts the likely disease. Wherever you go, that
brain, packaged as a chatbot, can handle minor illnesses on your
behalf.
Career aside, ML is one of the most interesting areas of CS, and
everybody should know the keywords used in it.
For whom is machine learning?
It is very helpful to have a science background to learn machine
learning, because ordinary software is built through explicit
programming, but prediction problems cannot be solved by explicit
programming alone. Without at least a little mathematics there may
be problems understanding the underlying concepts. You can still
build models without much math, but optimizing a model is close to
impossible without it.
When should machine learning be used?
If you think your app needs music / video / blog post
recommendations, or your website needs a smart spam blocker, or
you want to show ads on your website based on some parameters ... etc.
What is the reason for discussing so many languages?
If a full-stack JavaScript developer wants to apply an ML method to
his web app, he would otherwise have to learn Python first. To avoid
that trouble, the same things will be shown on different platforms.
Which books will be followed?
Machine Learning in Python: Essential Techniques
for Predictive Analysis [Wiley] - Michael Bowles
Mastering Machine Learning with Scikit-Learn
[PACKT]
Data Science from Scratch [O'Reilly] - Joel Grus
Building Machine Learning System with Python
[PACKT]
Are there any TV series related to ML?
Machine learning can feel complicated and dry if you only learn the
theory. But if we also watch movies or series related to the topic, our
interest multiplies. So here is a short list.
Person Of Interest
A very enjoyable TV series based on machine learning; it is enough
by itself to make you love the subject. Its main characters are the
supremely talented programmer Harold Finch and his right-hand man
John Reese. Harold Finch builds a machine that can predict an
incident before it occurs, and their job is to prevent it.
It shows, among other things:
Natural Language Understanding (Harold
communicates with the machine in English)
Image Processing (facial recognition, object
recognition, optical character recognition ...)
Artificial Neural Networks: several frames are often
shown interconnected by lines; these are actually
connections between artificial neurons. A large part
of this course will be about ANNs.
Silicon Valley
Although the series is primarily about a talented programmer and his
data compression company, the ML application appears in the 3rd
season.
The key question for a data compression algorithm is: how much
information does a dataset actually contain? If the algorithm detects
that a specific part of the dataset is redundant, deleting it does no
harm. Leaving that part out, the compressed data will naturally be
smaller than the original. But extracting the real information is the
hard part.
Suppose your class teacher only ever says the word 'A' in class. The
collection of those 'A's is technically a dataset, but the amount of
information in it is zero, so when we compress this string of 'A's the
output file size should be close to 0 bytes, since it carries no
information at all. With a bad algorithm, however, the output file can
end up equal to, or only slightly smaller than, the input.
Prediction is one of the applications of machine learning, so by using
it in data compression we can extract information very efficiently.
But if the performance of our model is bad, the system may throw
away real information as if it were redundant.
In the 3rd season (no spoilers) it can be seen that, under certain
circumstances, Richard is supposed to drop the machine learning
system, but he says that doing so would make his compression
algorithm useless. From this we can assume that applying ML
methodology for information extraction was the main job of Middle
Out (the fictional algorithm).
Silicon Valley is a very interesting and insightful TV series.
Although it is not entirely about ML, its stories are a good way to
spend your time.
INTRODUCTION OF MACHINE
LEARNING
What is machine learning?
Before starting machine learning, let's look at some definitions from
books. In this regard Arthur Samuel said
Field of study that gives computers the ability to learn without being
explicitly programmed.
Suppose a bipedal robot learns to walk on its own, without a specific
hard-coded walking program; then we can say a learning algorithm
has been used. We could easily write an explicit program to make a
bipedal robot walk, but that walking could not be called intelligent in
any way: if an embedded system is programmed for exactly one
specific task, how can it be intelligent? If the behavior of the device
changes as its situation changes, then it can be called intelligent.
Tom Mitchell said:
A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.
The definition can be confusing at first glance, so let's unpack it with
an example.
Say I built a machine that can play chess. We can then fill in the
parameters like this:
E = the experience, say the 500 full games of chess the machine has played
T = the task, playing chess
P = the performance measure, whether the machine wins or loses
By the definition, if increasing the number of games the machine
plays (E) increases its win rate (P), then the machine really has
learned.
And that is something you simply cannot program explicitly.
DATA SCIENCE, ARTIFICIAL
INTELLIGENCE AND MACHINE
LEARNING
Data Science
Data Science is the sum of statistics, machine learning and data
visualization. The job of a Data Scientist is to find answers to some
questions through a dataset.
Artificial Intelligence
Artificial intelligence is the set of problems and problem-solving
techniques used to tackle complex tasks: computers playing cards or
chess, natural language translation, and security strategy management
all fall under AI. There is no requirement that an AI problem be
based on a real dataset; it can be purely theoretical.
Machine Learning
Machine learning is a section of artificial intelligence where
intelligence systems are created, through datasets or interactive
experiences. Machine learning technology is being used in a number
of fields, including Cybersecurity, Bioinformatics, Natural Language
Processing, Computer Vision, Robotics.
The most basic job in machine learning is classification, such as
deciding whether an email or a website comment is spam. A lot of
research is also going on in deep learning (deep networks), most
visibly in convolutional neural networks.
At present, machine learning at the industrial level is very important.
Everyone should know some machine learning methodology. Several
things of machine learning will overlap with data science, but the
main target of machine learning is to build a predictive model.
In a word:
AI is about creating intelligent machines. ML is a subfield of AI that
helps a machine learn something from data, and data science uses
learning algorithms, among other tools, to find patterns in data that
can then be used in the next application.
Data science, ML and AI will often feel like the same thing because
the differences between them are small. There is a common joke
about data science:
A data scientist knows more computer science than a statistician, and
more statistics than a computer scientist.
Types of learning algorithms
Supervised learning
First, teach the machine, then use its teaching.
Regression
Let's start with the most familiar kind of ML problem. Suppose I
have some data: the size of a house and its price.
If I plotted this dataset, I would see a graph like the one below.
[Dataset and graph: house size vs. price]
Problem:
I was asked to find out with the above dataset,
If your friend's house size is 750 square feet, then how much is the
price?
Solution:
If I could find an equation that gives the corresponding price for an
area, that is,
y = f(x)
or
Price = f(Area)
That means we have to find out what the function f() is. I will not yet
say here how to find f().
The information we get from the problem
Here we are giving our algorithm a dataset where the correct answer
is already given (from the graph).
That means we know the actual price of houses of a given size.
By feeding this data to the algorithm, we can teach
it that a house of this size costs this much.
This is called Training Data.
Now, based on this training data, we can estimate the price
of a house size that was not in the training data. For
example, if I want to know the price of 3000 sq. ft,
it is not in the dataset, but the model I created can
estimate the price of a 3000 sq. ft home based on its
previous experience.
This is a regression problem
because, from the previously seen values, we are trying to estimate a
new value. The next example will make the distinction clearer.
Classification
Now let's look at another type of dataset, where for each tumor size
it is recorded whether the tumor is malignant (deadly) or not.

Here 1 has been identified as yes and 0 as no.


According to common sense, the larger the tumor, the higher its
chance of being malignant.
But the dataset shows that some tumors are large yet not malignant,
while some small tumors are malignant.
Problem: We want to create a predictive model that can tell whether a
tumor is malignant (based on its size).
The information that is available from the dataset
This is a classification problem, because we want
to input the tumor size and get a Yes / No type
answer as output. We do not want a numeric value;
for house prices we wanted a number, and a
Yes / No reply would not have been acceptable, which
is why that was a regression problem.
The data is divided into two groups, with nothing in
between: we can tag 1 as malignant and 0 as
benign.
If it is plotted differently:
Here we are trying to say, using a single parameter
(the size of the tumor), whether it is malignant. In
reality the input is rarely a single parameter; there
may be many more inputs, e.g. age vs. tumor size.
If there are more input parameters
How do we solve it?
We can separate these two groups with a straight line, which will be
discussed later.
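To make this concrete, here is a minimal sketch (not from the book; the tumor sizes and labels are made up) showing how a classifier such as scikit-learn's LogisticRegression can learn such a separating boundary from labelled data:
# Minimal sketch with made-up data: 1 = malignant, 0 = benign
import numpy as np
from sklearn.linear_model import LogisticRegression

tumor_sizes = np.array([[1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

classifier = LogisticRegression()
classifier.fit(tumor_sizes, labels)    # learn the boundary from the examples
print(classifier.predict([[3.2]]))     # predict the class of an unseen size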
Later we will look at Unsupervised Learning as well as the necessary
Python software package installation.
Un-supervised learning
The machine learns on its own, and then uses that learning to find
structure and patterns in the data that even you do not see.
MACHINE LEARNING
PREPARATION
Details of installation of necessary packages will be in this section.
Instructions for downloading and installing Anaconda packages will
be available in the next chapter.
Python package installation
Machine learning Python package installation:
There are several python modules and libraries required for machine
learning. We will build models from scratch as well as see how
models can be made using libraries.
Anaconda package download and installation (Python 2.7)
Throughout the course we will use Python version 2.7, so the
Anaconda package with Python 2.7 is recommended.
Windows
Download for Windows 32bit (335MB)
Download for Windows 64bit (281MB)
OSX
64 Bit download
Linux
64 Bit Download(329MB)
32 Bit Download(332MB)
After downloading, go to the download directory and enter the
following command in the terminal,
bash Anaconda2-4.0.0-Linux-x86_64.sh
Spyder IDE
Open the Spyder IDE and run the code below; if it works, your
computer is ready for machine learning.

import sklearn
from sklearn.linear_model import LinearRegression
print(sklearn.__version__)
Anaconda Official Website
In the next phase we will see the regression analysis, by example.
MACHINE LEARNING PYTHON TOOLS
Pandas
Data Frame Library
Numpy
Calculation and Matrix Library (Linear Algebra)
Scikit-learn
Machine Learning Library
IPython/Jupyter Notebook
Tool to write machine learning programs easily.
Introduction to IPython / Jupyter Notebook
There are only two kinds of languages: the ones people complain
about and the ones nobody uses -- Bjarne Stroustrup
Python Library for machine learning
Python libraries will be used for machine learning:
numpy - for scientific calculation
pandas - data frame
matplotlib - for two- and three-dimensional graph
plots
scikit-learn
Machine learning algorithm
Data Pre-Processing
Predictive model building and performance testing
much more
IPython Notebook / Jupyter Notebook
Painless machine learning modeling
Jupyter Notebook was previously known as IPython Notebook.
Why is it necessary to know about IPython Notebook?

We jot everyday things down in a notebook; calling
IPython Notebook the programmer's notebook
would not be inaccurate.
Machine learning work is iterative, so IPython
Notebook is a perfect tool: you can always check
the previous part of the work and then move on to
the next.
When we share plain code, whoever receives it still
has to run it to see the output. IPython Notebook
documents, on the other hand, are shareable: each
command and its output can be bundled into a
single document.
Another great advantage of IPython Notebook is
full Markdown support. If you wish, you can write
your notes right inside the notebook in Markdown
format.
Besides Python, IPython Notebook supports other
languages such as C#, Scala and PHP, but plugins
(kernels) have to be used in those cases.
Running IPython Notebook
Open cmd, type ipython notebook, then press Enter. If that does not
work, try jupyter notebook. If neither of the two works, reinstall the
Anaconda package.
Basic Instruction
Press Enter inside a cell to start a new line
Pressing Shift + Enter will execute a cell
A little more IPython Notebook!
Inline graph plotting!

%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

x = np.array(range(10))
y = np.array(range(10))

plt.plot(x, y)
plt.show()
That is IPython Notebook in action! New packages will be
introduced later.
MACHINE LEARNING
WORKING SECTION
Before applying machine learning there are several things to know.
In this chapter we will see how to build a predictive model, starting
from the choice of learning algorithm, through a simple recipe.
A scikit-learn workflow has three parts:

Input (training data)
Model
Predicted output
The most common thing to do when using the scikit-learn library is
to create an object of the chosen estimator class. For example, for the
earlier house size and price problem:

from sklearn.linear_model import LinearRegression

# Creating the regression model
linear_regression_model = LinearRegression()
We will be working with this linear regression model with data.
Fit (Train)
# Let, house_sizes = contains all house sizes
# house_prices = contains all house prices
# Training the model
linear_regression_model.fit(house_sizes, house_prices)
Generally, every model has two functions: fit and predict.
Predict
# Predicted value
# test_house_size is not a member of the training data; it is a new value
predicted_price = linear_regression_model.predict(test_house_size)
The predicted_price variable will hold the predicted price for the
house size we asked about.
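Putting the pieces together, a minimal runnable sketch might look like the following. The house sizes and prices are made up for illustration, and note that scikit-learn expects the inputs as a 2D array (one column):
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: sizes in sq ft, prices in lac
house_sizes = np.array([[1000], [1500], [2000], [2500]])
house_prices = np.array([50, 75, 100, 125])

linear_regression_model = LinearRegression()
linear_regression_model.fit(house_sizes, house_prices)    # training

test_house_size = np.array([[3000]])                      # not in the training data
predicted_price = linear_regression_model.predict(test_house_size)
print(predicted_price)                                    # roughly 150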
What is the Machine Learning Workflow?
According to the terminology,

An orchestrated and repeatable pattern which systematically
transforms and processes information to create prediction solutions.
An orchestrated and repeatable pattern:
That means we define the problem with the same workflow every
time, and we build the solution through that workflow.
Transforms and processes information:
Before a model can be trained, the data has to be prepared. Suppose
we want a predictive model that answers yes or no. Models work
with numbers, so if the input data is numerical, the output should be
numerical too. For this reason we can replace the Yes / No labels in
the training data with 1 and 0. This is called data preprocessing.
Create prediction solutions:
Prediction is the final milestone of any machine learning project.
However, the prediction also has to meet practical requirements.
Suppose my model takes 2 days to train and more than a day to make
a prediction; if more new data arrives every day, training takes even
longer and the time to get a prediction keeps growing. Would any
sane person accept such a model? Of course not. The less time a
model needs to train and the closer to real time it can predict, the
better the algorithm and the machine learning system.
Machine Learning Workflow
The workflow starts with stating what I actually want to do. In the
case of the house size and price problem: I want to input the size of a
house and get its price as output.
Complex work becomes easy if we first ask the right question and
then look for the answer.
Data tweaks
Now, to solve the problem, or to train the model, we of course need
to know the data. A machine will only be able to distinguish good
from bad if it is trained on examples of good and bad; then, when a
new event arrives, it can match it against the training data and decide
whether it is good or bad.
Selecting the algorithm
The hardest task is selecting the algorithm. An artificial neural
network makes no sense for problems that simple linear regression
can solve.
Since the same job can be done with different models, the model that
shows the lower error is usually chosen.
To select the algorithm you must understand the problem set and the
available dataset. The house price problem is a regression problem;
if I apply a clustering (data grouping) algorithm to it, the prediction
will be very bad.
Model Training
The dataset is split into two parts. The model first has to learn from
the training data; then you take the other part (the testing dataset),
feed it to the model, and find out how precisely or incorrectly it
predicts.
In other words, I do not hand over all my data for training; I keep
some of it aside so that I can verify whether the model I built has
actually learned anything.
It is a lot like this: I want to teach the ML model I created the
multiplication table of 3. Then I'll create a dataset like this:

3 x 1 = 3
3 x 2 = 6
3 x 3 = 9
3 x 4 = 12
3 x 5 = 15
3 x 6 = 18
3 x 9 = 27
3 x 10 = 30
Now if I separate training data and testing data from it, the datasets
will look like this.
Note: there are dedicated algorithms for splitting a dataset into
training and testing data; a well-chosen split makes a big difference.
We'll look into the details later.
Training data
3 x 1 = 3
3 x 2 = 6
3 x 3 = 9
3 x 4 = 12
3 x 6 = 18
3 x 9 = 27
3 x 10 = 30
Testing data
3 x 5 = 15
Notice that I did not include 3 x 5 in the training data. That means I'll
train the model with the remaining data, everything except 3 x 5.
Then, given the inputs 3 and 5, will the model output something
close to 15? If it does, my algorithm selection, data preprocessing
and training were appropriate. If it does not, I'll have to start over
from data preprocessing, possibly with a new algorithm.
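As a sketch of the same idea (hold out 3 x 5, train on the rest, and check), assuming scikit-learn is available; in practice the split is usually done with a helper such as train_test_split rather than by hand:
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: the 3-times table with 3 x 5 held out for testing
x_train = np.array([[1], [2], [3], [4], [6], [9], [10]])
y_train = np.array([3, 6, 9, 12, 18, 27, 30])

model = LinearRegression()
model.fit(x_train, y_train)

# Testing data: does the model give something close to 15 for input 5?
print(model.predict([[5]]))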
Building a model is an entirely iterative process; it has to be repeated
and refined, and it is not right to expect a model to be perfect on the
first try.
Workflow Guidelines
Think about the last step from the very beginning
Every step depends on the previous one, like a chain: if one link is
wrong, you have to start again from the beginning.
So choose the ultimate target, the dataset at hand, and the algorithm
with care and thought.
You can always go back to a previous step
Suppose you have a multiplication dataset, but you want a model that
outputs the sum of two numbers. The dataset simply does not match
what you want, so you have to replace the multiplication dataset with
an addition dataset and then train the model again.
Data needs to be sorted out
Raw data will almost never be in the shape you have in mind; for
model training it must be preprocessed.
Data preprocessing is the most time-consuming part.
The more data, the better
In general, the more data you can feed the model, the better its
prediction accuracy. This rule rarely fails.
A good solution is one that actually fits the problem
Do not cling to bad solutions. If you do not get adequate
performance even after many attempts at solving a problem, step
back and ask:

Am I asking the right question?
Do I have the data needed to solve the problem?
Is the chosen algorithm the right one?
If you do not get satisfactory answers, it may be better not to ship a
solution at all. A model that is right 50% of the time and wrong the
other 50% only creates confusion; it does not solve the problem.
"There are no right answers to wrong questions." - Ursula K. Le
Guin
"Ask the right questions if you're going to find the right answers." -
Vanessa Redgrave
"In school, we're rewarded for having the answer, not for asking a
good question." - Richard Saul Wurman
The first step of building a machine learning model
Workflow Revision
If we want to apply the general method, the first thing to do is
actually solve a problem with it. Let's look once more at our chosen
problem.
Problem details
Predict whether a person will be affected by diabetes.
Looking at the problem details you might think we already have our
question; why do we need new questions? Isn't this the problem to
solve?!
The answer is no. It is wrong to define a machine learning problem
with just one line; the question has to be broken into small, specific
questions, and those have to be resolved.
Then what should the questions be like?

The questions should be such that, if we answer
them, we can actually build a fully functional
predictive model.
We are building a predictive model based on the
question, so we need to fix criteria; we will create
the solution based on those criteria.
This is more useful than a one-line question. The
statements in the question define where our
solution build begins, where it ends (for example, if
the model's prediction success rate is good enough,
we go straight to prediction without further model
optimization) and how we reach our goals.
So, let's look at the parts of the solution statement:
Goals
Data sources observation (scope & dataset)
Performance score and performance target
Where it will be used (context)
How to create the solution
After adding these points one by one we will get our desired
question.
Scope and Data Sources
Looking at the American Diabetes website, several factors are given
for diagnosing diabetes; these will help us identify the important
input variables in our dataset.
Age
Older adults are more likely to be affected by diabetes.
Race
African-Americans, Asian-Americans, and American Indians are
more likely to have diabetes.
Gender
Gender does not have a significant effect on diabetes.
Why are the above factors important?
Input variable filtering: the factors show that we
will focus on 'race' and need not give importance to
'gender'; even without gender as an input variable,
the prediction will not suffer.
Choosing the dataset: we have to select a dataset
that has age, race (and possibly more) among its
attributes.
For this purpose we will select the Pima Indian
Diabetes Study from the University of California
Irvine (UCI) repository, because this dataset meets
our criteria.
Changed Statement
After reviewing the scope and dataset, our changed problem
statement is:
The Pima Indian Diabetes Dataset will be used to predict who will
suffer from diabetes.
Performance score and performance target
As difficult as the problem is, we can easily see
that its solution is yes / no, i.e. a binary result
(True or False).
When we build the model we will get a performance
score, which says how well our model can predict.
There is a limit to this performance: generally
100% prediction accuracy is not achievable, but we
have to try to do well.
So we need to think about accuracy while building;
we have no guarantee of the accuracy we will get.
Nothing could be worse than 0% accuracy. If the
performance score of my model is 50%, then any
prediction it makes is a 50-50 coin toss, so we must
aim for more than 50% accuracy.
Even 50% is an extremely bad score when we are
making a prognosis. Genetics is a big factor here:
even identical twins, who are genetically the same,
can differ in whether they develop diabetes.
More than 70% accuracy is reasonably good, so we
can take this as our end point.
So, let's make some changes to our target statement.
Changed Statement
Using the Pima Indian Diabetes Dataset, predict with 70% or better
accuracy who will suffer from diabetes.
Context
Our problem is medical, so we have to bring in medical context here.
This will make the solution better.
Everyone is genetically different, so there are known
and unknown factors at play: two people with
apparently the same factors may differ, one
developing diabetes and the other not.
That is why we use the word 'likely' here; we are
never 100% sure whether diabetes will occur at all.
If we add the likelihood to our statement, the
modified statement becomes:
Changed Statement
Using the Pima Indian Diabetes Dataset, predict with 70% or better
accuracy who is likely to suffer from diabetes.
Make the solution
Our statement does not yet mention the machine learning process
itself. If we bring in the machine learning workflow, we will get a
good idea of how to create the solution.
Machine Learning Workflow:
Pima Indian data preprocessing
Data transformation (if required)
Revised statement again
Changed Statement
Using the machine learning workflow, a predictive model has to be
built from the Pima Indian data after preprocessing and the necessary
transformation.
This model then has to predict, with 70% or better accuracy, who is
likely to suffer from diabetes.
Final Statement and Questions
Which dataset do we need to use? - The Pima Indian dataset
What is the performance target? - 70%
How do we create the solution? - Build a predictive
model through data preprocessing and transformation
using the machine learning workflow, then predict
using the dataset.
The discussion shows that, starting from a short one-line question,
we dug in from several directions to answer a number of important
questions, which makes the problem much easier to solve. We will
solve the problem using this workflow in the next episodes.
Data Processing-1
"Give me six hours to chop down a tree and I will spend the first four
sharpening the ax." - Abraham Lincoln
Data Creation (Data Collection and Preprocessing) - 1
We have seen in the previous episode how to create our target
statement through the right questions.
Today we will look at the second step of machine learning: data
preparation.
Machine learning means working with data, so it should not be
surprising that collecting and processing the data is where most of
the model-building time goes.
The model is built on the collected data: no matter how good your
algorithm is, if the data is not good, your predictive model will not
be good either. That is unanimously acknowledged, so you have to
be extra careful at this step.
If the data is prepared well, building the model becomes very easy:
there is no need for repeated tuning or further data cleansing. If you
cannot prepare the data well, your model will not be good, and you
will have to come back to the data again and again during model
building.
So it is better to handle the data cleanly at the preparation stage
before getting into model building.
Let's see what we'll do in this episode.
Overview

Finding Data
Data Inspection and Data Cleaning
Data Exploration
Converting the data into tidy data
Everything is done in the Jupyter Notebook
Let's see, what is Tidy Data?
Tidy Data
A dataset that can easily be modeled, can easily be visualized, and
has a specific structure is tidy data.
Features of tidy data
Each variable forms a column
Each observation forms a row
Each type of observational unit forms a table
Getting a collected dataset into tidy form is somewhat time-consuming:
50-80% of the time spent on machine learning based projects is
used to collect, clean and organize data.
Data collection
What are the good sources of data collection?
Google
If you search Google you will of course find data, but
be a little careful: there is a lot of junk, fake and
outdated data, which may be fine for testing. For a
serious project, though, try to collect verified data.
Government database
Government databases are really good sources for
data collection, because you can find fairly varied
data there. Some government databases also have
good documentation for checking the data.
Professional or company data source
A very good source. Some professional societies
share their databases. Twitter, for example, shares
tweet data and its own analyses of tweets. Financial
data is available from free company APIs; Yahoo!
Finance, for instance, shares such datasets.
The company that you work in
The company you are in may also be a good source
of data.
University Data Repository
Some universities offer datasets free of charge, such
as the University of California Irvine. They have
their own data repositories from which you can
collect data.
Kaggle
If you work with machine learning, you cannot
avoid hearing about Kaggle. You can call it the
Codeforces of data scientists: there are constant
contests on data analysis, and it is unrivalled for
high-grade datasets.
GitHub
Yes, there is a large amount of data available on
GitHub. You can check out the Awesome Dataset
Collection.
All of the above
Sometimes a single source does not have all the data
you need, so one source alone cannot be relied on.
Then you integrate data from several sources and
turn it into tidy data.
From where will we collect the dataset of our selected problem?
Pima Indian Diabetes Data
Data files
Dataset details
As mentioned, we will collect the diabetes dataset from the UCI
Machine Learning repository.
Some features of this dataset:
Female patients at least 21 years old
768 observations (768 rows)
Each row has 10 columns
9 of the 10 columns are features, e.g. number
of pregnancies, blood pressure, glucose, insulin
level ... etc.
The remaining column is the label: whether the patient
has diabetes (True / False)
Using this dataset, we will find out the solution of the problem.
Let's take a look at some data rules before that.
Data Rule # 1
The closer the data is to what you are trying to predict, the better
This rule can be understood just by reading it; it is almost common
sense.
In our case, since we want to figure out how likely a person is to be
diagnosed with diabetes, this dataset is perfect for the job, because
one column directly records whether the person examined was
diagnosed with diabetes.
For many problems, the dataset will not contain exactly the thing you
want to predict. Then you will have to rework the dataset and shape
it so that it matches, or at least comes close to, your target variable
(the attribute being predicted, here whether the person has diabetes).
Data Rule # 2
No matter how good the dataset looks, it will never be in exactly the
format you want to work with.
So the next task after collecting data is data preprocessing, which is
what we talk about today.
Downloading the data as CSV (Comma Separated Values)
If you visit the UCI link, you can see links to two files, .data and
.names.
The values in the .data file are comma-separated, but the file
extension is not .csv; another issue is that the file itself does not say
what each value means (that is described in the separate .names file).
So, for convenience, I have uploaded .csv files in which, along with
the values, the column headers indicate what each column actually
represents.
Download the two files and place them on your PC.
csv pima dataset download (original)
csv pima dataset download (modified)
Note
Original: the diabetes column is given as 1 / 0
Modified: all 1 / 0 values replaced with TRUE / FALSE
The data exploration with the Pandas library
Did you know a bit about ipython notebook? If you do not know,
take a look from here.
Open cmd in Windows and start the notebook
with the following command:
ipython notebook
If the command does not work, try jupyter
notebook.
When your browser opens, New > Python 3 creates a new
notebook; perform the tasks shown here in it.
Import required libraries
Before starting work, we add the necessary libraries with the
code below.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Magic function of Jupyter Notebook for inline plotting
# (we do not want plots to show up in separate windows)
%matplotlib inline
Data load and review
data_frame = pd.read_csv(r'file_path')
Here we imported the pandas library as pd (using as), so to call any
pandas function we do not write pandas, we write pd.
If I had done this,
import pandas as PANDA
then to call the function I would write PANDA.read_csv('file_path').
The name read_csv makes it clear that the function's job is to read a
csv file. It converts the csv file into a Pandas dataframe, which can
then be manipulated with the Pandas library.
In read_csv('file_path') I passed, as the argument, the path on my PC
where the csv file is. You must give the path where the file is on
your PC.
data_frame.shape
Since a dataframe's data is a matrix (or 2D array), we read the shape
attribute to see the number of rows and columns.
Output: 768 rows (excluding the header) and 10 columns.
data_frame.head(number)
Calling data_frame.head(3) prints the first 3 rows of the dataframe.
data_frame.tail(number)
Calling data_frame.tail(4) prints the last 4 rows of the dataframe.
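Collected into one runnable cell, the steps above look roughly like this (the file name pima-data.csv is an assumption; use whatever path you saved the downloaded file under):
import pandas as pd

# Read the csv file into a Pandas dataframe
data_frame = pd.read_csv(r'pima-data.csv')

print(data_frame.shape)      # (rows, columns), here (768, 10)
print(data_frame.head(3))    # first 3 rows
print(data_frame.tail(4))    # last 4 rows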
Today's chapter ends here; it was the first part of data preparation. In
the next phase we will discuss the rest of data preprocessing.
Data Processing -Last Part
"Organize, do not agonize." - Nancy Pelosi
Data Preparation (Data Preprocessing) - 2
Change the dataframe
Real datasets almost always have missing data, and we have to
handle it. Maybe we will be lucky and have no missing data, but if
we do not take the necessary measures, programs can crash.
What columns should be excluded?
Columns that will not be used
Columns that exist but contain no data
If the same column appears multiple times, keep one
and delete the rest
Many times two columns may seem separate by
name but in fact be the same. For example, one
column is labeled Length and another is labeled
Size (centimeter); at first glance they look like two
different things because the labels differ. But on
closer inspection we find that the Size data is just
each Length value multiplied by 100. Finding such
duplicated columns by hand calculation is not
feasible, and it would not be efficient anyway.
These extra columns only generate noise in the
dataset. We will detect identical columns here
using 'correlation'.
What is a correlated column?
The same information in a different format: in the
example above, Length and Size are actually the
same thing, only the unit differs, so they are
correlated columns.
Columns that add little or no information, or that
only confuse the learning algorithm, are also
candidates for removal.
A little bit about linear regression
To understand the next example, we need some basics of linear
regression.
Think of the following made-up dataset:

[Graph: house size (sq ft) vs. price (lac)]
If you are asked how much a 5 sq ft house will cost, you can say
without hesitation that the answer is 25 lac.
How did you know?
It is very easy: for every 1 sq ft, the price increases by 5 lac.
If we want to turn this into a mathematical model, it looks like this:
price = size (sq ft) × 5 (lac)
Or,
y = f(x) = α × x
where y is the price, x is the size, α is 5, and the function f(x) tells
how much the price is for a given value of x.
In reality the model is not this easy; there are many complications.
Here a single coefficient α is enough, but real problems need more
coefficients (β, γ, θ, etc.), and even then the model may only come
close.
Now look at the following dataset:

[Graph: house size (sq ft) vs. price (lac), increasing irregularly]

If you are asked how much a 6 sq ft house will cost, you are in
trouble now, because the price does not increase by the same amount
for every extra square foot. You could try to work out the pattern
from the differences between successive values, but the problem is
no longer that simple.
If I am asked to write down a mathematical model for it, it is quite
inconvenient: is there a linear equation that, for inputs 1, 2, ... 5,
gives 10, 12, ... 22?
We cannot build an exact model, but we may be able to build an
approximate one, with an equation of roughly the form
y = α × x + β.
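A small sketch of what such an approximate model means in code, with made-up numbers; np.polyfit finds the α and β of the best-fitting straight line by least squares:
import numpy as np

# Hypothetical sizes (sq ft) and prices (lac) that do not grow uniformly
size = np.array([1, 2, 3, 4, 5])
price = np.array([10, 12, 15, 19, 22])

alpha, beta = np.polyfit(size, price, 1)   # fit price ≈ alpha * size + beta
print(alpha, beta)

# Approximate prediction for a 6 sq ft house
print(alpha * 6 + beta)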
Examples of correlated columns
Let's go back to the famous house price prediction problem.
Without checking the dataset columns properly, I sit down to predict
the price with the linear regression formula:
Price = α × Area (acre) + β × Size (kilo sq meter) + γ × noOfRooms
In linear regression, as we saw, every input variable is multiplied by
a coefficient, the terms are added, and the output is predicted.
Because the same column appears twice, as Area and Size, the
predicted price will never be good.
Here the two identical columns are easy to spot because I made the
example up :P. Jokes apart, when there are many columns, all with
different names and different values, some of which are really the
same quantity expressed in different units, handling them by hand is
very complicated. So here we will take the help of an important
statistic: correlation.
Pearson's correlation coefficient, or Pearson's r
Calling the corr() function of the Pandas library computes the
correlation according to the formula below. A detailed discussion of
correlation will come later; for now, just take this formula as given.
r = Σ (xᵢ − x̄)(yᵢ − ȳ) / √( Σ (xᵢ − x̄)² × Σ (yᵢ − ȳ)² )
In this formula, x is one variable and y is another (isn't that
obvious?).
We have to find the value of r. The value of r tells us how strongly
related two variables are. If r equals 1, there is effectively no
difference between the two variables; in particular, the correlation of
any variable with itself is always 1.
As further illustration, if you calculate the correlation coefficient
between the acre and square meter columns of the example above,
you will get r = 1.
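A quick check of this claim with Pandas; the columns below are invented, the same area expressed in acres and in square meters:
import pandas as pd

df = pd.DataFrame({'area_acre': [1.0, 2.0, 3.0, 4.0],
                   'area_sq_meter': [4046.86, 8093.72, 12140.58, 16187.44]})

# Pandas computes Pearson's r for every pair of columns
print(df.corr())    # every value here, on and off the diagonal, is 1.0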
Let's now see how to find out whether any data is missing in any
column of the dataset.
Finding null or empty parts of the dataset
Open the previously created notebook and enter the code below:
print(data_frame.isnull().values.any())
isnull().values.any()
isnull()
returns the dataframe again, but instead of the values, every empty
cell is replaced with True and every non-empty cell with False.
values.any()
.values turns the True/False dataframe returned by isnull() into an
array, and .any() checks whether any value in that array is True, i.e.
whether any cell is empty at all.
There is no empty data in the pima-data.csv file, so this program
prints False when the statement is run.
Deliberately delete a cell and then run
data_frame.isnull().values.any() again
Here I deleted a cell in the pima-data.csv file, loaded it with Pandas,
and ran the code again.
This time the output comes out True: there is one empty cell.
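Besides the single True/False answer, it is often useful to see how many empty cells each column has; a small sketch using standard Pandas calls on the data_frame loaded earlier:
# Count the empty cells column by column (all zeros if nothing is missing)
print(data_frame.isnull().sum())

# Overall check, as above: True if any cell at all is empty
print(data_frame.isnull().values.any())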
Creating Correlation Matrix Heatmap
We have come to know a bit about correlation and how to find out
whether a null value is hidden in the dataset. Now let's see how to
generate a correlation matrix heatmap. Before that, let's define what
a heatmap is.
Heatmap
According to Wikipedia,
A heat map (or heatmap) is a graphical representation of data where
the individual values contained in a matrix are represented as colors.
In other words, we generate a plot by replacing numerical values
with colors; that is a heatmap.
Accordingly, a correlation heatmap replaces correlation values with
colors and plots them in a graph.
Correlation Heatmap
We have seen how to calculate the correlation between two
variables.
Ask yourself: is it easier to compare a pile of floating point values,
or to compare colors? Of course colors are easier to compare.
What we have to do is take each variable and find its correlation with
every variable (including itself). To do this we lay the variables out
row-wise and column-wise.

As said earlier, the correlation of a variable with itself is always 1,
so the values along the diagonal of the table must be 1. Every other
cell holds corr_value, the correlation between one variable and
another; since the library computes these values for us, there is no
need to calculate them by hand.
The most important remaining choice is the color scheme of the
heatmap. There is nothing to worry about: the built-in colormaps of
the Matplotlib library are enough for our work. If you want to pick
your own colors, see the documentation; for now we will use the
default.
Matplotlib Heat Map Color Guide
Matplotlib will set the color according to the sequence below when
generating a heatmap.
Less Correlated to More Correlated
Blue -> Cyan -> Yellow -> Red -> Dark Red (Correlation 1)
Heatmap generating function
Let's write a function to generate the heatmap right away; the
function looks like this:
# Here size means plot size
def corr_heatmap(data_frame, size=11):
    # Getting the correlation matrix using Pandas
    correlation = data_frame.corr()
    # Dividing the figure into subplots so we can control the plot size
    fig, heatmap = plt.subplots(figsize=(size, size))
    # Plotting the correlation heatmap
    heatmap.matshow(correlation)
    # Adding xticks and yticks
    plt.xticks(range(len(correlation.columns)), correlation.columns)
    plt.yticks(range(len(correlation.columns)), correlation.columns)
    # Displaying the graph
    plt.show()
Why use subplots?
You could generate the heatmap with plt.matshow(correlation)
alone, but then you cannot control the size of the figure; with
subplots we create a figure of the desired size and draw the
customized plot on it.
What are xticks and yticks?
plt.xticks(range(len(correlation.columns)), correlation.columns)
This code says that each block is 1 unit long, the tick positions will
be 0, 1, 2 ... len(correlation.columns), and the second argument
(correlation.columns) labels each block.
The same applies to plt.yticks.
What has been done with plt.show()?
U kiddin' bro?
Plotting the heatmap with the corr_heatmap(data_frame, size) function
We went to all the trouble of writing the function, so let's use it.
With this one line we can plot the heatmap:
corr_heatmap(data_frame)
[Generated heatmap, with a closer view]
Observations
As we said, two identical variables have correlation 1. Along the
diagonal each variable's correlation with itself has been computed,
so the diagonal blocks are dark red.
But notice that the cell for skin and thickness is also 1 (dark
red).
So skin and thickness are actually the same thing; only the unit
has changed. Don't believe it?
Multiply each value of thickness by 0.0393701 and you will get the
values of skin. 1 millimeter = 0.0393701 inch.
Now can you tell which column is in which unit?
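You can verify this directly in the notebook; a one-cell sketch (the column names skin and thickness are as they appear in this dataset):
# Convert thickness (millimeters) to inches and compare with the skin column
converted = data_frame['thickness'] * 0.0393701
print((converted - data_frame['skin']).abs().max())   # should be approximately 0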
Alright, let's clean the dataset
From the work above we realized that we have two columns of the
same kind. One of the attributes of tidy data is that each column must
be unique, so we keep one of the duplicates and drop the rest from
the dataset.
I'm going to drop the skin column here; if you prefer, you can drop
thickness instead, it is entirely up to you.
# Deleting the 'skin' column entirely
del data_frame['skin']
# Checking whether the action was successful or not
data_frame.head()
We have dropped the duplicate column. The work is not finished yet:
the data still has to be molded. There is nothing to worry about, this
is the last step of data preparation. So cheers!
Data Molding
Data type adjustment
Our dataset should be in a form that works with all algorithms;
otherwise we would have to tweak the data separately for each
algorithm, which is a lot of trouble. So we will take the trouble once,
up front, so that it does not become a headache later.
Data type checking
Before changing anything, check the data types once:
data_frame.head()
You will see a sample of the dataframe and notice that all the values
are float or integer type except one boolean column.
Data type changing
We will turn True into 1 and False into 0. It can be done with the
code snippet below:
# Mapping the values
map_diabetes = {True: 1, False: 0}
# Applying the map to the data_frame
data_frame['diabetes'] = data_frame['diabetes'].map(map_diabetes)
# Look at what we have done
data_frame.head()
Congratulations!
This molded and cleaned dataset can now be used with whichever
algorithm we want.
But?
Data Rule # 3
Rare events are hard to predict with high accuracy
This is natural: a rare event means there are few such examples in
your dataset, and the fewer the examples, the worse the prediction.
But it is better not to worry about this now; get the ordinary
predictions working first, and deal with rare events later, if at all.
Some further analysis.
True / False ratio check
If we want to see what percentage of the people in this dataset have
diabetes and what percentage do not, open the notebook and write
the following code:
num_true = 0.0
num_false = 0.0
for item in data_frame['diabetes']:
    if item == True:
        num_true += 1
    else:
        num_false += 1
percent_true = (num_true / (num_true + num_false)) * 100
percent_false = (num_false / (num_true + num_false)) * 100
print("Number of True Cases: {0} ({1:2.2f}%)".format(num_true, percent_true))
print("Number of False Cases: {0} ({1:2.2f}%)".format(num_false, percent_false))
Output:
Number of True Cases: 268.0 (34.90%)
Number of False Cases: 500.0 (65.10%)
We can write the same thing in a Pythonic way in four lines:
# Pythonic way
num_true = len(data_frame.loc[data_frame['diabetes'] == True])
num_false = len(data_frame.loc[data_frame['diabetes'] == False])
print("Number of True Cases: {0} ({1:2.2f}%)".format(num_true, num_true * 100.0 / (num_true + num_false)))
print("Number of False Cases: {0} ({1:2.2f}%)".format(num_false, num_false * 100.0 / (num_true + num_false)))
Data Rule # 4
Keep data manipulation history and check regularly
Jupyter Notebook itself gives you a record of what you have done.
Also use a version control system such as Git or SVN, with hosting
like BitBucket, GitHub or GitLab.
Summary
What have we done in these two episodes?
Read the data with Pandas
Got an idea of correlation
Removed duplicate columns
Molded the data
Checked the true / false ratio
Coming up next:
Algorithm Selection
Model Training
Model Performance Testing - 1
Model Performance Testing - Last Part
Linear Regression
We have now come to the third step of machine learning: how to
determine which of the many learning algorithms will be the best
choice for us.
If you look at the work sequence again, we are at this step.
Overview of this chapter

In this phase we will discuss:

What does the learning algorithm do?
The criteria for selecting an algorithm
Using the solution statement to filter the algorithms
Which algorithm will be the best fit
Choosing the primary algorithm
We call it the primary algorithm because running a
problem through a single algorithm is rarely good
enough; we always end up chasing the best
algorithm, which means training the same dataset
with different algorithms and comparing their
performance. But to get started, one algorithm has
to be picked first, and that is what we discuss here.
What the learning algorithm does
Although it sounds obvious, we first have to understand the role of
the algorithm in the machine learning process. So, let's see.
The learning algorithm can be compared to the engine that drives the
whole machine learning process; after data preprocessing, the
important work is done by the learning algorithm.
We first split our dataset in two:
Training data (the larger portion; the testing data is excluded
from it)
Testing data (the smaller portion; it contains no rows from the
training dataset)
Now we feed the training data to the algorithm; in scikit-learn we
usually do this with the fit() function, which takes the data and
analyzes it.
A mathematical model works behind the algorithm. Through this
model, the algorithm analyzes the dataset and adjusts its internal
parameters during training. To really understand this we would need
to discuss the math, but for now we can treat it as a magical thing
and keep working. We will of course look at the mathematical
analysis, but this is not the right time: rather than risk losing interest
in the math, we will learn to drive first, and only later see how the
engine, the learning algorithm, works.
After training, with the predict() function we can predict things that
are not in the dataset. For example, to diagnose diabetes we first train
the model on the Pima Indian dataset with fit(), then pass a person's
parameters to predict(), which tells us the likelihood of that person
having diabetes.
Which algorithm should we train and predict with?
There are roughly 50 established learning algorithms, and you can
create custom algorithms by combining them. So how do we decide
which algorithm suits our purpose? That is what this section is about.
Everyone has their own preferred factors for choosing an algorithm;
once you are an expert, you will know yourself which algorithm is
best for which purpose.
For now, you can choose based on the factors below.
Algorithm Decision Factors
Learning type (Supervised or Unsupervised)
Result (Value or Yes / No type answer)
Complexity (Simple or Complex)
Basic or advanced
Combining the solution statement with the workflow process will
make it easier to choose an algorithm.
Learning type
Different algorithms have different learning processes. Let's look at
the solution statement first and then decide what kind of learning we
need.
Using the machine learning workflow, a predictive model has to be
built from the Pima Indian data after preparation and the necessary
transformation. This model then has to predict, with 70% or better
accuracy, who is likely to suffer from diabetes.
In the statement above we see that we are asked to build a predictive
model.
We know,
Prediction Model => Supervised Machine Learning
That is, we now know what learning type our selected algorithm
must have: we can drop all algorithms that use unsupervised
learning.
By doing this,
22 algorithms are dropped; 28 candidate algorithms remain in hand
So many! No problem, we can filter further, one criterion at a time.
Result type
As said earlier, we usually want one of two kinds of answer. One is a
value (regression: e.g., what the price will be for a given house size);
the other is a Yes / No type answer (classification).
The diabetes problem is a classification problem, because we want to
know whether diabetes will occur. Discretized values, for example
1-100, 101-200, 201-300, or small, medium, large, also fall under
classification.
Now let's see how many algorithms are dropped:
8 algorithms are dropped by requiring classification as the result
type; 20 remain in hand
Complexity
Since we started learning machine learning, we should avoid
complex algorithms. Make sure to apply KISS (Keep It Short and
Simple) formula.
What are the complex algorithms?
Ensemble Algorithms:
These are special algorithms, because each
Ensemble Algorithm is a collection of many
algorithms.
Very good performance
Debugging is not convenient
14 more algorithms dropped.

Basic or Enhanced?
Enhanced
Variations of the basic algorithms
Supposedly better performance than the basic ones (:P)
Extra conveniences
Complex
Basic
Easy
Easier to understand
You guessed it: since we are beginners, it is better to stay with Basic.
Three Candidate Algorithm at the end of filtering
We now have three algorithms,

Naive Bayes
Logistic Regression
Decision Tree
We will pick one of these. After a short discussion of the three, we will decide which one serves us better. These are three basic, classic machine learning algorithms; complex algorithms are essentially built using them as building blocks. Let's start with Naive Bayes.

Naive Bayes

The Naive Bayes algorithm is built by applying Bayes' Theorem. For those who have not heard of it, Bayes' Theorem is one of the fundamental theorems of probability. It is a very important theorem, and its details will be discussed later. (Mathematics again!)
The Naive Bayes algorithm determines the probability of something happening. For example, what is the probability of diabetes given high blood pressure? In this way the algorithm, based on the previous dataset, determines the probability of an event by combining the probabilities contributed by the different features / input variables.
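For reference, Bayes' theorem written for the diabetes wording above (the wording itself is just an illustration):

$$P(\text{diabetes} \mid \text{high blood pressure}) = \frac{P(\text{high blood pressure} \mid \text{diabetes}) \times P(\text{diabetes})}{P(\text{high blood pressure})}$$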

Some of its features are:


Determines the probability of an event occurring
Each feature or input variable (for our problem: no. of pregnancies, insulin, etc.) is treated as equally important
Here blood pressure is considered exactly as important as BMI (Body Mass Index), and likewise for all the variables
A small amount of data is sufficient for prediction
Logistic Regression

The name is confusing: we just learned that regression deals with continuous values, while classification deals with discrete values. So why are we discussing a regression method for classification?
Because the output of Logistic Regression is effectively 1 (e.g. 0.9999) or 0 (e.g. 0.00001).
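Under the hood this squashing is done by the sigmoid (logistic) function; a minimal sketch with made-up weighted sums of the input features:

import numpy as np

def sigmoid(z):
    # maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(9.0))    # ~0.9999  -> treated as class 1
print(sigmoid(-11.0))  # ~0.00002 -> treated as class 0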
Feature

Binary result
The input variables / features are weighted (all features may not be equally important)
Let's see the next algorithm.
Decision Tree

Its structure is similar to a binary tree (assuming you have covered data structures)
Every node is actually a decision
It needs a lot of data for the decision splits
Finally, Naive Bayes selected
Why?

It is easy to understand
It works fast (roughly 100 times faster than comparable algorithms)
Even if the data changes, the model stays stable
Debugging is comparatively easy
The biggest reason: this algorithm matches our problem perfectly, because we are trying to find a likelihood, and determining likelihoods is exactly what this algorithm does :)
Summary
Lots of learning algorithms are available
We made the selection:
Learning Type – Supervised
Result – Binary Classification
Complexity – Non-Ensemble
Basic or Enhanced – Basic
Naive Bayes selected for training, because it is
Easy, Fast and Stable
Model Training
So far we have worked through the solution statement, data collection and preprocessing, and algorithm selection. Now we will see how to train the model. According to the flowchart, we are at the following step,
Overview of this chapter

Talk about the training process
More about the scikit-learn package
Train a model on the diabetes data with the selected algorithm
Machine Learning Training
Definition,
Letting specific data teach a machine learning algorithm to create a specific prediction model.
That means: training is the process by which, using a particular dataset, a machine learning algorithm is turned into a specific prediction model.
Naturally, I cannot predict sun or rain with a model trained on diabetes data. For that we need a different, specific dataset.
Often the model also needs to be retrained.
Why do we retrain the model?
Suppose the Pima Indian dataset is updated after a few days with new data (i.e., new observations are added as new rows). We know from the characteristics of machine learning that the more data, the better; so by training the model again with the newly added data, we will get better results than before.
New data => better predictions.
From the new dataset we again keep some data for training and some for testing, so we can verify the model.
Training overview
Dataset splitting
The first thing to do in training is to split the dataset. We usually keep about 70% of the data for training and the remaining 30% for testing.
We train the algorithm by feeding it the training data. Training an algorithm really means setting the internal parameters of the algorithm for a specific dataset. When we get to the mathematical analysis, this will become clearer.
But first, what is our training goal?
Training goals and training data
We take a hypothetical dataset to understand the training goal; that is, for this part we are not using the diabetes dataset.
Suppose two input variables / features are enough to decide whether something will be a hit or a flop. The two features are X and Y. If we draw a scatter plot of Y against X,
Explanation of the scatter plot:
Here a blue dot means a hit for that combination of X and Y, and a red dot means a flop for that combination of X and Y.
We can draw a general decision boundary to separate the dots. That is basically what your trained algorithm provides. For the dataset above, your algorithm could give a decision boundary like this.
But after the boundary is drawn, a few red dots remain on the blue side and a few blue dots have ended up on the red side.
So the training is not 100% accurate. But 100% accuracy is not our target; models built to be that accurate have a high chance of overfitting. Overfitting and underfitting are important aspects of machine learning, so we will discuss them in detail later.
Using this training data, we can train the model to produce a decision boundary. For now, that is the purpose of my training data.
What is the working of testing data?
It is natural to wonder: why are we not using 100% of the data for training? Why split 70-30%? Doesn't that reduce the training data? Won't it hurt performance? What is the problem with training on all the data?
Yes, a lot of questions, and my job is to answer all of them.
Why do we not use 100% of the data in training? Why do we split 70-30%? Doesn't that reduce the training data?
These are really the same question, so let's understand it through an example.
Suppose I am teaching someone the multiplication table of 2 (assume he does not know how to multiply, he only knows how to add). What I am really trying to teach him is the logic behind the table. Now if I give him,
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
2 x 4 = 8
2 x 5 = 10
2 x 6 = 12
2 x 7 = 14
2 x 8 = 16
2 x 9 = 18
2 x 10 = 20
and he memorizes this table of 2 by repetition, then if I ask him, what is 2 x 3?, he immediately replies 2 x 3 = 6. But if, to test him, I ask something from the table of 5, he cannot reply immediately, and in some cases he cannot answer at all.
From this example I understood that I really did not teach him anything; by supplying 100% of the data I only stopped him from working out the logic. But if I did this instead,
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
2 x 5 = 10
2 x 6 = 12
2 x 7 = 14
2 x 8 = 16
2 x 10 = 20
Here two entries are missing, and I give him the responsibility of finding out what the missing values should be. Now he has to look for the logic; pure memorization will not help. Not only that, from his answers I can actually check whether he has learned the logic at all.
In the same way, with the testing data we can verify whether the model we created is really predictive. I know the answers for that data, but I did not provide it during training. If the model's predictions come close, then I have been fairly successful, because I have trained a real model. After all, if I only ask questions whose answers he has already seen, how would I know whether he can answer anything new correctly?
Input variables or feature selection
Feature Selection or Feature Engineering is a huge topic in data science. As said earlier, a dataset contains many features that do not help the prediction; excluding them makes the prediction even better. Selecting the useful variables (and constructing better ones from them) is what is called Feature Engineering.
We already did some feature engineering while cleaning the dataset: we dropped the redundant skin column after checking the correlation.
Selected features in Pima Indian Diabetes Dataset:
No of Pregnancies
Glucose Concentration
Blood Pressure
Thickness
Insulin
Body Mass Index
Diabetes Predisposition
Age
Model training using Scikit-learn
Finally, after all that theory, we get to sit down and code. Get ready, we will start training the model now.
Do you remember what comes first in training? If not, no problem:
Data splitting
With the code below we will split the data 70-30%: 70% training data, the rest testing data. Open your Jupyter Notebook and continue writing after the previous code.
from sklearn.model_selection import train_test_split

feature_column_names = ['num_preg', 'glucose_conc', 'diastolic_bp', 'thickness', 'insulin', 'bmi', 'diab_preb', 'age']

predicted_class_name = ['diabetes']

# Getting feature variable values
X = data_frame[feature_column_names].values
y = data_frame[predicted_class_name].values

# Saving 30% for testing
split_test_size = 0.30

# Splitting using scikit-learn train_test_split function
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_test_size, random_state=42)
random_state=42 means that every time the program runs, the split happens at exactly the same place, so the results are reproducible.
Is the split really 70-30? Let's check:
print("{0:0.2f}% in training set".format((len(X_train) / len(data_frame.index)) * 100))
print("{0:0.2f}% in test set".format((len(X_test) / len(data_frame.index)) * 100))
Output:
69.92% in training set
30.08% in test set
Near!
What about missing data? (0 values, not null values)
Sometimes a column contains many values that turn out to be 0 when you check, even though 0 is not physically possible there. How do we deal with it? There is a technique that replaces the 0 values with the column average so the data becomes usable. Before applying it, let's see how many of our values actually are 0.
print ("# rows in dataframe {0}". format (len (data_frame))
print ("# rows missing glucose_conc: {0}" format (len
(data_frame.loc [data_frame ['glucose_conc'] == 0])))
print ("# rows missing diastolic_bp: {0}" format (len
(data_frame.loc [data_frame ['diastolic_bp'] == 0])))
print ("# rows missing thickness: {0}" format (len (data_frame.loc
[data_frame ['thickness'] == 0])))
print ("# rows missing insulin: {0}" format (len (data_frame.loc
[data_frame ['insulin'] == 0])))
print ("# rows missing bmi: {0}" format (len (data_frame.loc
[data_frame ['bmi'] == 0])))
print ("# rows missing diab_pred: {0}" format (len (data_frame.loc
[data_frame ['diab_pred'] == 0])))
print ("# rows missing age: {0}" format (len (data_frame.loc
[data_frame ['age'] == 0])))
Output:

# rows in dataframe 768


# rows missing glucose_conc: 5
# rows missing diastolic_bp: 35
# rows missing thickness: 227
# rows missing insulin: 374
# rows missing bmi: 11
# rows missing diab_pred: 0
# rows missing age: 0
Imputation is the technique of replacing a missing value with a substituted one. Scikit-learn provides ready-made code for imputation, and we will use it.

from sklearn.preprocessing import Imputer

# Impute with mean all 0 readings
fill_0 = Imputer(missing_values=0, strategy="mean", axis=0)

X_train = fill_0.fit_transform(X_train)
X_test = fill_0.fit_transform(X_test)
Here fill_0 is itself a kind of model whose job is to replace the 0 values with a sensible value using the mean strategy.
We will train with this modified training data and test with the modified test data.
Why did I not touch y_train or y_test?
Because there is no missing data there.
Model Training
Finally, we'll train the model by calling that magic function.
from sklearn.naive_bayes import GaussianNB

# create Gaussian Naive Bayes model object and train it with the data
nb_model = GaussianNB()
nb_model.fit(X_train, y_train.ravel())
We had already decided that our algorithm would be Naive Bayes, and Gaussian Naive Bayes is one model of that algorithm. We created an empty model object and then trained it by calling the fit() function with the training values.
In the next chapter we will see how the model we built actually performs!
Model Performance Testing-1
We have finished everything from data collection to model training. What remains is to see how the model performs. The model performance chapter is a bit large, so I have divided it into two parts.
Two-episode Model Performance Testing Chapters Overview
The main subject matter
Model Evaluation Between Test Data
Result Interpretation
Result Improvement / Model Improvement /
Accuracy Improvement
Also
Confusion Matrix
Recall
Precision
AUC
ROC
Overfitting
Model Hyperparameter
Overfitting reduction
K-Fold Cross Validation or N-Fold Cross
Validation
Bias-Variance Trade off
A few tips for better performance
Step by step, we have finally come near the end.
Then let's start.
Remember,
Statistics only works with the data; we are the ones who define what is good and what is bad. And depending on how good the model is, we decide whether to use it. There is a lot of theory here; let's look at a little implementation.
How the model performs on the data it was trained on
Before starting the work we divided the data into two parts, one for training and the other for testing, and I have already talked about memorization. Now we will see what kind of prediction the model makes if we feed it the very data it was trained on.
It is a lot like the multiplication-table example:
If this is the training data:
3 x 1 = 3
3 x 2 = 6
3 x 3 = 9
then checking how well the training went by asking from the training data means asking,
3 x 1 = ?
Run the code below to try your model on its own training data (keep working in the same Jupyter Notebook as before).
# This returns an array of predicted results
prediction_from_trained_data = nb_model.predict(X_train)
Now the predicted array is assigned to the prediction_from_trained_data variable. If we wanted, we could sit down with pen and paper and compare, observation by observation, the real result in the dataset with the result predicted by our model.
Or we can use the built-in modules of the scikit-learn library to find out how many diabetes cases our model caught correctly and how many it did not.
So instead of pen and paper, open the Jupyter Notebook and start writing there,
# performance metrics library
from sklearn import metrics

# get current accuracy of the model
accuracy = metrics.accuracy_score(y_train, prediction_from_trained_data)
print("Accuracy of our naive bayes model is: {0:.4f}".format(accuracy))
In case you have forgotten:
We split the dataset into four variables, X_train, y_train, X_test, y_test,
where
X_train = input values for training [no_of_preg, insulin, glucose, ... etc.] (70% of the full dataset)
y_train = the corresponding outputs of X_train [diabetes -> yes / no] (since it corresponds to X_train, it is also 70%)
X_test = input values for testing [30% of the full dataset, none of which appears in the training data]
y_test = the corresponding outputs of the test inputs
Since we are checking accuracy on the training data, it is natural to pass the model's output for X_train (prediction_from_trained_data) and the real output of X_train (y_train) into the metrics.accuracy_score function.
Output of the previous code snippet:
Accuracy of our naive bayes model is: 0.7542
The target in our solution statement was predicted in 70 or more
Accuracy. But here we see Accuracy about 75%.
Hold on, there is nothing to celebrate yet. This accuracy score is on the training data, which means the model was trained on this data and then tested on the same data. In other words, the questions came straight from the syllabus.
Performance in testing data
Now, to see how the model performs on the test data, check that your code matches the code below,
# this returns an array of predicted results from the test data
prediction_from_test_data = nb_model.predict(X_test)

accuracy = metrics.accuracy_score(y_test, prediction_from_test_data)
print("Accuracy of our naive bayes model is: {0:0.4f}".format(accuracy))
Output
Accuracy of our naive bayes model is: 0.7000
That means when asked questions from outside the syllabus, he can still answer with 70% accuracy; that is, 70% of his answers are correct and the rest are wrong.
That is what we wanted: if we give this trained model the data of a new, untested person, then about 70% of the time its answer will be correct. If the model says the new person has diabetes, the likelihood of that being right is about 70%.
But wait
Yes, it is still not time to celebrate. After data collection, the painstaking remaining work is performance testing and making the necessary changes.
Classification Test Problems Performance Testing: Confusion
Matrix
Our problem is of the classification type, and there are specific measurements for testing its performance. One that must be mentioned is the confusion matrix; despite the name, there is nothing to be confused about. We will learn more about it while writing code.
For now, just know that the confusion matrix tells us how our model performs. Enter the code below,
print ("Confusion Matrix")
# labels for set 1=True to upper left and 0 = False to lower right
print ("{0}".format(metrics.confusion_matrix(y_test,
prediction_from_test_data, labels=[1, 0])))
Confusion Matrix
We can express the table with the numbers TP, FP, FN and TN, where:
TP = the actual output is 1 (likely diabetic) and our model also predicted 1
FP = the actual output is 0 (not likely diabetic) but our model predicted 1
FN = the actual output is 1 (likely diabetic) but our model predicted 0
TN = the actual output is 0 and our model also predicted 0
If the confusion matrix is still a little confusing, pause, think it over, and try to map the table to these definitions yourself.
Shortcut
TP = how many events happened and were detected as having happened
FP = how many events did not happen but were detected as having happened
FN = how many events happened but were not detected
TN = how many events did not happen and were not detected
Let's read the matrix again with the numbers above:
52 cases were detected as diabetes, and those 52 really are diabetic. <- TP
28 cases were detected as diabetes, but those 28 people are not actually diabetic. <- FP
33 cases were not detected as diabetes, but those 33 people actually are diabetic. <- FN
118 cases were not detected as diabetes, and those 118 really are not diabetic. <- TN
If our model were 100% accurate, what would its confusion matrix look like?
It is easy to work out: a 100% accurate model has FP = 0 and FN = 0. Then the confusion matrix would be like this,
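Since this test split has 80 actual positives and 151 actual negatives (see the classification report below), a perfect model's confusion matrix, printed with labels=[1, 0] as above, would presumably be:

[[ 80   0]
 [  0 151]]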

Confusion Matrix Review: Classification Report


Through the confusion matrix we can derive a few more statistics to judge the model's accuracy. We will see both the formulas and scikit-learn's built-in function for them.
The classification report is in fact generated from the confusion matrix data. To view the classification report, run the following statement,
print ("Classification Report")
# labels for set 1 = True to upper left and 0 = False to lower right
print ("{0}" format (metrics.classification_report (y_test,
prediction_from_test_data, labels = [1, 0])))
Report output
Classification Report
precision recall f1-score support
1 0.61 0.65 0.63 80
0 0.81 0.78 0.79 151
avg / total 0.74 0.74 0.74 231
Here we will discuss two of these numbers: Precision and Recall.
Precision formula
$$Precision = \frac{TP}{TP + FP}$$
We know that for the perfect case FP = 0, so for a 100% accurate model
$$Precision = \frac{TP}{TP + 0} = \frac{TP}{TP} = 1$$
which means the larger Precision is, the better; our target is to make it as large as possible.
Recall formula
$$Recall = \frac{TP}{TP + FN}$$
Similarly, for a 100% accurate model
$$Recall = \frac{TP}{TP + 0} = \frac{TP}{TP} = 1$$
Again, we have the goal of increasing the value of Recall as much as
possible.
Precision - 0.61 & Recall - 0.65 Not bad, but its value can be
increased further. We will make that effort.
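If you want scikit-learn to compute these two numbers directly, a minimal sketch (continuing with the y_test and prediction_from_test_data variables from earlier):

from sklearn import metrics

# precision = TP / (TP + FP), recall = TP / (TP + FN), both for the positive class (1)
print(metrics.precision_score(y_test, prediction_from_test_data))
print(metrics.recall_score(y_test, prediction_from_test_data))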
How to Improve Performance?

We can increase the model's performance through the following methods:
Adjust or tune the algorithm
Collect more data or improve the data frame
Improve the training itself
Changing the algorithm
Let's try changing the algorithm: Random Forest.
Why Random Forest? Because,
It is an ensemble algorithm (both simple and advanced)
It builds many trees on subsets of the data
The results of the trees are combined by voting, which gives good performance
We do not have to do anything extra with our data, since the preprocessing is already done. We just create the new model, train it with the training data, and test its performance with the test data.
Write down the code,
from sklearn.ensemble import RandomForestClassifier

# Create a RandomForestClassifier object
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train.ravel())
Run it and you will get an output like the one below. The training is done; now it is time for a performance test.
Output:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
Random Forest Performance Testing: Predict Training Data
The code as before,
rf_predict_train = rf_model.predict(X_train)
# get accuracy
rf_accuracy = metrics.accuracy_score(y_train, rf_predict_train)
# print accuracy
print("Accuracy: {0:.4f}".format(rf_accuracy))
Output:
Accuracy: 0.9870
Unbelievable, isn't it? The model has memorized the dataset very well. Now let's see the performance on the testing data!
Random Forest Performance Testing: Predict Testing Data
rf_predict_test = rf_model.predict(X_test)
#get accuracy
rf_accuracy_testdata = metrics.accuracy_score(y_test,
rf_predict_test)
#print accuracy
print ("Accuracy: {0:.4f}".format(rf_accuracy_testdata))
Output:
Accuracy: 0.7000
Accuracy is 70% on the testing data and 98% on the training data. That is, the model answers questions from the syllabus very well but does quite badly on questions from outside it. In other words, it cannot predict well on real-world data; it only predicts well on the dataset it has already memorized.
Our Naive Bayes model was doing better than this! What we need is good accuracy on the testing data.
Let's look at the confusion matrix and classification report of the Random Forest model. We can reuse the earlier code with minor changes:
print ("Confusion Matrix for Random Forest")
# labels for set 1 = True to upper left and 0 = False
to lower right
print ("{0}" format (metrics.confusion_matrix
(y_test, rf_predict_test, labels = [1, 0])))
print ("")
print ("Classification Report \")\
# labels for set 1 = True to upper left and 0 = False
to lower right
print ("{0}" format (metrics.classification_report
(y_test, rf_predict_test, labels = [1, 0])))
Output:
Confusion Matrix for Random Forest
[[ 43  37]
 [ 30 121]]
Classification Report
precision recall f1-score support
1 0.59 0.54 0.56 80
0 0.77 0.80 0.78 151
avg / total 0.70 0.71 0.71 231
Here both precision and recall are worse than with our previous Naive Bayes model.
Important: Overfitting
When you see a large gap between the accuracy score on the training data and on the testing data, your model has run into the most classic of machine learning problems: overfitting. Our Random Forest model suffers from overfitting. We will look at several ways of escaping overfitting's vicious cycle in the next part.
Model Performance testing: Last Part
Model performance - second and last episode
We have come to the last stage of this machine learning walkthrough. In the previous episode we saw roughly what to do when a model does not perform well, or when another model does better.
We are still at this step,
Model Performance Revision - ROC
Before understanding ROC you must know about the confusion matrix; if you do not, read the previous episode first.
To draw an ROC curve in ROC space, the FPR (False Positive Rate) goes on the X-axis and the TPR (True Positive Rate) on the Y-axis.
From the confusion matrix of a model we get one TPR/FPR pair, i.e., one point; so from many models built on the same dataset we get many points.
Joining these points gives the desired ROC curve.
The TPR of a perfect classifier is 1 and its FPR is 0.
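From a confusion matrix the two rates are computed as
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$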
ROC is best understood with an example. Let's look at a scenario:
Suppose I take a diabetes dataset with 1,000 observations and split it 80%-20%: 80% of the data is training data and 20% (200 rows) is testing data.
Of the 200 testing rows, 100 are positive (i.e., the outcome is diabetes) and 100 are negative.
I made four models, I will train four of these models and I will
perform their performance. There are four models,
Gaussian Naive Bayes Model
Logistic Regression Model
Random Forest Model
Artificial Neural Network Model
We have not covered the Artificial Neural Network yet, and that is fine; you do not need to know it for this example.
I can train each model just like in the previous episode and then compute its confusion matrix, right? So I train the four models on the 80% dataset and test their performance on the remaining data (i.e., compute each confusion matrix).
Also remember to work out, along with each model's confusion matrix, its TPR and FPR.
Gaussian Naive Bayes Model

Logistic Regression Model

Random Forest Model


Artificial Neural Network Model

We already know that the ROC curve has TPR on the Y-axis and FPR on the X-axis. So we can easily place these four coordinates in ROC space.
Coordinates
Coordinate -> Model (X = FPR, Y = TPR)
G point -> Gaussian Naive Bayes (0.28, 0.63)
L point -> Logistic Regression (0.77, 0.77)
R point -> Random Forest (0.88, 0.24)
A point -> Artificial Neural Network (0.12, 0.76)
We'll plot these points now.
import numpy as np
import matplotlib.pyplot as plt

# [fpr, tpr] for each model
naive_bayes = np.array([0.28, 0.63])
logistic = np.array([0.77, 0.77])
random_forest = np.array([0.88, 0.24])
ann = np.array([0.12, 0.76])

# plotting
plt.scatter(naive_bayes[0], naive_bayes[1], label='naive bayes', facecolors='black', edgecolors='orange', s=300)
plt.scatter(logistic[0], logistic[1], label='Logistic Regression', facecolors='orange', edgecolors='orange', s=300)
plt.scatter(random_forest[0], random_forest[1], label='Random forest', facecolors='blue', edgecolors='black', s=300)
plt.scatter(ann[0], ann[1], label='artificial neural network', facecolors='red', edgecolors='black', s=300)

plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc='lower center')
plt.show()

This example is adapted from Wikipedia; there are a few extra points compared with Wikipedia's ROC curve.
I plotted each point with a separate scatter call so that each one could be explained. If you have many models, or if you plot the same model while gradually changing its parameters, the plotted points will trace out a real ROC curve.
Here there are only 4 models, so a line plot would not be meaningful; that is why a scatter plot is used.
ROC curve explained
A 100% accurate model has FPR = 0 and TPR = 1. From the plot it is easy to see that the ANN model did best, then Naive Bayes, then Logistic Regression, and the Random Forest performed worst.
To repeat (and it bears repeating): it is not always ANN > NB > LR > RF; which model performs best depends on the type of dataset and problem. The numbers here are entirely imaginary.
The dashed line through the middle is called the line of no-discrimination. Points above the line are good; the further below it, the worse.
AUC or Area Under Curve
See the ROC curve above? The more area the ROC curve covers, the better. A 100% accurate model's AUC is 1, i.e., the area of the whole unit square.
Many questions have been raised about measuring performance by AUC alone, and at present most people prefer looking at the ROC itself. So I will not go deeper into AUC.
Overfitting
As said earlier, sometimes a model looks so good that its accuracy on the training data is about 95-99%, yet on the test data its accuracy may not even reach 40%.
The question is: why?
The reason is that the dataset we train on contains noise as well as real signal. You will never get a 100% pure dataset.
A classic example: suppose I collect a dataset of how many hours I studied, how many hours I slept, and the marks I got in each exam. I build a predictive model on this dataset and, studying it, I find that more sleep apparently means more marks. Based on that, before the next exam I just slept and did not study at all (because the model convinced me that sleeping brings marks). You can guess what the result will be.
So why does the model give such a wrong prediction? Two reasons: (1) there is not enough data, and (2) there are too few columns in the dataset (only hours studied and hours slept).
There can be many other reasons behind good marks: maybe the exam was multiple choice and the guesses went well, or the questions were simply easy, and so on. Since the model was trained without those inputs, what else can it do? Not knowing the real reasons, it simply adapts itself to the given dataset in whatever way makes the error lowest.
Training a model means reducing the error, and for each model the parameters are set, via mathematical analysis, to drive the error down. Tuning the hyperparameters reduces that error further (this is normal). But if the model reduces the error by adapting itself to the noise in the dataset, that is where the trouble starts.
We will look at overfitting in more detail in a few steps.
Reducing overfitting
What can be done to reduce overfitting is to collect more data and increase the number of meaningful columns. The purer the dataset, the better the prediction results. But what if you cannot change the dataset? It is still possible to tune the algorithm well, and we will look at one such method.
Regularization & Regularization Hyperparameter
We can control how an algorithm learns. A machine learning algorithm has a mathematical model working behind it, so the learning mechanism of that mathematical model can be controlled with certain parameters.
Suppose a model produces its output with this formula,
$$Y = ax^{3} + bx$$
We can control its learning by subtracting something from the result, creating a regularized model,
$$Y = ax^{3} + bx - \lambda \times x$$
Here the regularization hyperparameter is λ.
Note that Y's value will now be slightly lower than the earlier prediction, so the fit on the training dataset will be slightly less precise. But that is good! Why? Because the model can no longer memorize every data point: the regularization hyperparameter will not let it, and the larger λ gets, the bigger the penalty on the predicted value. So we can call this a penalized machine learning model.
Whenever the model tries to adapt itself to the dataset just to reduce the error, the λ term pulls it back with a penalty. Our job is then to tune this λ so that accuracy is good on the testing dataset, even if accuracy on the training dataset drops a little :P
Improve Accuracy via Regularization Hyperparameter Tuning in
Logistic Regression Models
The topic title became a bit long. A little earlier we learned that by tweaking the mathematical model we can reduce overfitting through regularization. The regularization hyperparameter varies from model to model. Logistic Regression is implemented in the scikit-learn library, which provides a convenient interface to its regularization hyperparameter.
Our task will be to collect prediction scores while changing the value of the regularization hyperparameter, and then keep the hyperparameter value for which the score is highest.
We have seen the theory; now let's see it in practice. Take out the notebook and write the code.
from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression(C=0.7, random_state=42)
lr_model.fit(X_train, y_train.ravel())
lr_predict_test = lr_model.predict(X_test)

# training metrics
print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, lr_predict_test)))
print("Confusion Matrix")
print(metrics.confusion_matrix(y_test, lr_predict_test, labels=[1, 0]))
Output
Accuracy: 0.7446
Confusion Matrix
[[ 44  36]
 [ 23 128]]
Classification Report
precision recall f1-score support
1 0.66 0.55 0.60 80
0 0.78 0.85 0.81 151
avg / total 0.74 0.74 0.74 231
This is the same procedure we followed for the Naive Bayes model, just with Logistic Regression. Here C is our regularization hyperparameter; we started with an arbitrary value, and next we will check the score for many different values of C.
C (Regularization Hyperparameter) Values
C_start = 0.1
C_end = 5
C_inc = 0.1

C_values, recall_scores = [], []

C_val = C_start
best_recall_score = 0
while (C_val < C_end):
    C_values.append(C_val)
    lr_model_loop = LogisticRegression(C=C_val, random_state=42)
    lr_model_loop.fit(X_train, y_train.ravel())
    lr_predict_loop_test = lr_model_loop.predict(X_test)
    recall_score = metrics.recall_score(y_test, lr_predict_loop_test)
    recall_scores.append(recall_score)
    if (recall_score > best_recall_score):
        best_recall_score = recall_score
        best_lr_predict_test = lr_predict_loop_test
    C_val = C_val + C_inc

best_score_C_val = C_values[recall_scores.index(best_recall_score)]
print("1st max value of {0:.3f} occurred at C={1:.3f}".format(best_recall_score, best_score_C_val))

%matplotlib inline
plt.plot(C_values, recall_scores, "-")
plt.xlabel("C value")
plt.ylabel("recall score")
Since C is the regularization hyperparameter and I want to see the recall score for different values of C (the higher the recall score, the better), I set C_start = 0.1 and C_end = 5 and increase C by 0.1 in each pass of the loop.
For every value of C, I check the recall on the predicted test data; whenever the recall is greater than the best seen so far, the current score is assigned to best_recall_score.
If you followed the earlier material, the code is not difficult to understand.
The two lists named C_values and recall_scores are there to store the values.
Output
The graph shows how the performance changes as the value of C increases.
The recall score is highest when C is between 2 and 3; it is lower between 0 and 1 and between 4 and 5.
Model performance with class_weight = 'balanced' and C change
Regularization is not governed by a single parameter; there can be several. A little earlier we tuned the value of C. Now we will look at the performance with another parameter, class_weight, set to 'balanced'.
Keeping class_weight = 'balanced', changing the value of C is again the main objective.
C_start = 0.1
C_end = 5
C_inc = 0.1

C_values, recall_scores = [], []

C_val = C_start
best_recall_score = 0
while (C_val < C_end):
    C_values.append(C_val)
    lr_model_loop = LogisticRegression(C=C_val, class_weight="balanced", random_state=42)
    lr_model_loop.fit(X_train, y_train.ravel())
    lr_predict_loop_test = lr_model_loop.predict(X_test)
    recall_score = metrics.recall_score(y_test, lr_predict_loop_test)
    recall_scores.append(recall_score)
    if (recall_score > best_recall_score):
        best_recall_score = recall_score
        best_lr_predict_test = lr_predict_loop_test
    C_val = C_val + C_inc

best_score_C_val = C_values[recall_scores.index(best_recall_score)]
print("1st max value of {0:.3f} occurred at C={1:.3f}".format(best_recall_score, best_score_C_val))

%matplotlib inline
plt.plot(C_values, recall_scores, "-")
plt.xlabel("C value")
plt.ylabel("recall score")
Output:

With class_weight = 'balanced' the recall score has risen above 0.73, which is exactly what we were looking for!
Confusion Matrix
Code:
from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression(class_weight="balanced", C=best_score_C_val, random_state=42)
lr_model.fit(X_train, y_train.ravel())
lr_predict_test = lr_model.predict(X_test)

# training metrics
print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, lr_predict_test)))
print(metrics.confusion_matrix(y_test, lr_predict_test, labels=[1, 0]))
print("")
print("Classification Report")
print(metrics.classification_report(y_test, lr_predict_test, labels=[1, 0]))
print(metrics.recall_score(y_test, lr_predict_test))
Output:
Accuracy: 0.7143
[[59 21]
[45 106]]
Classification Report
precision recall f1-score support
1 0.57 0.74 0.64 80
0 0.83 0.70 0.76 151
avg / total 0.74 0.71 0.72 231
0.7375
In this way we can increase the recall through regularization (and reduce overfitting).
K-Fold / N-Fold Cross-validation
Another effective technique for reducing overfitting is K-Fold cross-validation. The name sounds difficult, but the job is very simple.
Our diabetes dataset has more negative answers (no diabetes) than positive ones. K-Fold cross-validation helps to get reliable accuracy estimates when the dataset is imbalanced like this.
K-Fold and N-Fold cross-validation are the same thing; K (or N) is simply the number of folds, and it can go up to the number of observations.
What happens in K-Fold cross-validation is that the full dataset is divided into K equal-sized parts.
Then, one part at a time is held out for testing while the rest is used for training.
For example, suppose I have a dataset of 25 observations and I divide it into 5 groups, so the dataset is split into 5 parts. In the first pass I hold out the first group, send the remaining four to training, and test with the held-out group.
In the second pass I hold out the second group (it is not sent to training) and send the rest to training.
In the same way, the third, fourth and fifth passes each hold out the corresponding group and send the rest to training.
So with 5 folds we train 5 times. Since the groups here number 5 (each with 5 observations), this is called 5-Fold cross-validation.
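As a minimal sketch of the same idea with scikit-learn's generic helper (this uses cross_val_score rather than the CV-suffixed models shown next, and reuses our X_train / y_train variables):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# 5-fold cross-validation: the data is split into 5 parts and each part is held out once
scores = cross_val_score(LogisticRegression(class_weight="balanced"),
                         X_train, y_train.ravel(), cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # average performance over the 5 folds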
Model training and testing using cross-validation
Cross-validation-enabled models are built into scikit-learn: apply the CV variant of any normal model and you get a cross-validation-enabled model.
For example, LogisticRegression's cross-validation-enabled model is LogisticRegressionCV, and the same pattern holds for the rest.
Let's see its performance,
from sklearn.linear_model import LogisticRegressionCV

# set number of jobs to -1 which uses all the cores to parallelize
lr_cv_model = LogisticRegressionCV(n_jobs=-1, random_state=42, Cs=3, cv=10, refit=False, class_weight="balanced")
lr_cv_model.fit(X_train, y_train.ravel())
lr_cv_predict_test = lr_cv_model.predict(X_test)

# training metrics
print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, lr_cv_predict_test)))
print(metrics.confusion_matrix(y_test, lr_cv_predict_test, labels=[1, 0]))
print("")
print("Classification Report")
print(metrics.classification_report(y_test, lr_cv_predict_test, labels=[1, 0]))
Output:
Accuracy: 0.7100
[[55 25]
[42 109]]
Classification Report
precision recall f1-score support
1 0.57 0.69 0.62 80
0 0.81 0.72 0.76 151
avg / total 0.73 0.71 0.72 231
The 10-fold cross-validation performance is not bad at all!
Scikit-Learn Algorithm Cheat Sheet
Linear Regression Basic Discussion
In this section,

Linear Regression Basic Discussion


Linear Regression Phase II and Gradient Descent
Linear Regression: Initial Discussion
So far we have learned what the target of machine learning is, but we are still in the dark about how the mathematical model actually works. In this discussion, besides building a predictive model, we will see how such models are really created and what logic lies behind them.
Contents of today's discussion

What is Linear Regression?


Model Representation
Cost Function
Cost Function Intuition
Let's get started.
Before starting, think of the famous house-price dataset. Suppose your friend is a real-estate businessman and you are a data scientist. Knowing about you, he decided he would get some work done by you and give you something in return.
The deal is this: your friend has a dataset where the size of each house and its price are given. What you have to do is apply the analytical power of machine learning to that dataset so that, for a house size not present in the dataset, you can predict its price.
This problem actually falls under regression. How?
This problem actually falls into the regression, how?
Linear Regression
Regression:
Regression means predicting a real-valued output. The other type of prediction we did (yes/no based) is classification. Predicting values like 10, 20, 30 or 1236, 5.123, etc., means we are dealing with a regression problem.
Linear:
Linear means of the straight-line type. If we solve the problem with a straight-line model, it is a linear model.
So, Linear Regression
Linear Regression is the method of predicting a real value with a straight-line-like model. If my model predicted the value through a curved line, it would be called Polynomial Regression.
Single Variable Linear Regression
While analyzing the diabetes dataset we worked with several input values, such as the number of pregnancies, the insulin level, and so on. But the input column in your friend's dataset is just one: the size of the house.
So we classify this problem as Single Variable Linear Regression. If there were more input variables than one, say the number of rooms as well, we would call it a Multi Variable Linear Regression problem.
This is a supervised learning problem
Because we are providing data together with the right answers; that is, we feed the machine learning model some house sizes along with their prices. Learning from labeled data like this is supervised learning.
What is the meaning of prediction through linear model?
Let's understand the idea through a small dataset. Suppose the rate at which you eat out at your favorite restaurants is proportional to your income, and you work at a company where your salary increases every month (does such a company exist?).
After 5 months of work, you count how many times each month you went out to eat at KFC, BFC, Hajir Biryani, Star Kabab and so on. After counting, the dataset looks like this.
Income every month (k) -> Days out to eat
20 -> 5
30 -> 10
40 -> 15
50 -> 20
60 -> 25
As you have already made a good hand in the matplotlib library, let's
just shoot a graph.
import matplotlib.pyplot as plt
import numpy as np

beton = np.array([20, 30, 40, 50, 60])
khaoa = np.array([5, 10, 15, 20, 25])

plt.xlabel('Proti mash e income')
plt.ylabel('Khete jaoar har')
# Income vs expenses
plt.title("Ae vs Baae")
plt.plot(beton, khaoa)
plt.show()
Graph
If I ask you, how many times will you go out to eat in the sixth month?, you can answer without difficulty: 30 times (if the income keeps increasing at the same rate).
That is you predicting; but a mathematical model can be made for it:
$$Khaoa = \frac{Aye - 10k}{10k} \times 5$$
You can verify this equation against the dataset.
Here I created an equation; this is the linear model, where you give an income (Aye) as input and get back the number of times you eat out. The graph is the visual proof that this is a linear model: your dataset fits the linear model without a doubt.
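A quick sanity check of that equation against the dataset (income is in thousands, matching the beton array above):

beton = [20, 30, 40, 50, 60]   # income per month (in k)
khaoa = [5, 10, 15, 20, 25]    # times eating out

for aye, actual in zip(beton, khaoa):
    predicted = (aye - 10) / 10 * 5   # the linear model above
    print(aye, actual, predicted)      # the prediction matches the actual value for every row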
Now let's think of another scenario,
Income per month -> Times eating out per month
In this dataset your income increases in the early months, but your eating-out habit does not simply follow it: at first you could not cut down on eating out, and later you brought it under control.
A scatter of this dataset
If I now ask, how many times will you go out next month if your income is 15k? There is no linear pattern in the dataset anymore and no single equation, so you cannot predict it easily.
We can still fit a linear model to somewhat non-linear data, within limits; that is a later discussion. For now we will stick with linear patterns. We understand linear regression; now, what is model representation?
Model Representation
In plain words, model representation means fixing the notation for the analysis we are going to run on a dataset: what each symbol means and how to write it down. Why is it necessary? Because when you go on to read theoretical machine learning books, you should be able to match them with this course. Models there are written with all sorts of mathematical symbols, so we too need to know the standard notation.
We need your friend's dataset again, so here it is once more:
Home size (sq ft) (call it X) -> Home price (Tk.) (call it Y)
Suppose this dataset has 47 rows; then we write,
m = 47
X = "input" variable / feature
Y = "output" variable / "target" value
$(x, y)$ denotes one row of the dataset (any one). If I mean the 20th row specifically, I write $(x^{(20)}, y^{(20)})$.
And $(x^{(i)}, y^{(i)})$ denotes the i-th training example.
Hypothesis
Let's see a diagram,
The question is, how do we construct the hypothesis h? Since this chapter sticks to a linear model, we can choose a linear function,
$$h_\theta(x) = \theta_0 + \theta_1 x$$
which we will just write as h for short. Notice that there is only one input variable here, which is why this is also called Univariate Linear Regression.
Cost Function
We usually keep track of income and expenses, always trying to keep the expense as low as possible. It is exactly the same in machine learning: the whole effort is to minimize the cost function. Model training essentially means cost function minimization.
Before minimizing the cost function, we need to understand what the cost function actually is, and before that, one more thing.
We have chosen a function for the hypothesis. We know the value of x in it, but we do not yet know what θ0 and θ1 should be. Let's investigate what they ought to look like.
But before that, do one thing: make a scatter plot of your friend's dataset.

What we have to do is fit a straight line through this data.
The line must be chosen so that, according to our hypothesis (prediction), it fits the dataset as well as possible, i.e., the cost is lowest, or the cost function is minimized.
Before finding which values of θ fit the dataset best, let's see how the graph of h changes as the values of the parameters (θ) change.
Take one pair of values for θ0 and θ1: the graph comes out like this.
Take another pair of values: the graph comes out like this.
With these values the line almost fits. What we have to do now is shift the line a bit more. Remember y = mx + c? m is the slope, and c is the constant whose job is to move the straight line up along the Y-axis (for positive values of c).
Here θ0 is doing c's job and θ1 is doing m's job.
Now let's plot the graph again using the two parameters.
Take θ0 = 90000 and θ1 = 120; our hypothesis is then
$$h = 90000 + 120 \times x$$
For every house size, let's look at the straight line produced by our hypothesis together with the scatter plot of the dataset.

It must be very well (apparently). But see some data disappeared in


the top, that is, we can tune a bit further to get such a graph , θ 0

an d θ1
Green color line is our new hypothesis. The red line is the previous
one.

All fine, you say, but where is the cost function? We are getting to it: the hypothesis is set up, so now we look at the cost function.
The cost function will be written here as J(θ0, θ1). If our model had more parameters, we would write the cost function with θ2, θ3, ... as well. So the parameters of the cost function are exactly the parameters of the model.


The formula of the cost function is,
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(X^{(i)}) - Y^{(i)} \right)^2$$
Does it look familiar? If not, read again from the beginning. I have used the Ordinary Least Squares method as the cost function here; it is not the only possible cost function, but it is the one usually used.
What the cost function says is this: for every observation we compute the error (Error = hypothesis value - real value), square it, sum the m squared errors, and multiply the sum by $\frac{1}{2 \times m}$.

Without further ado, let's calculate the cost for the first 5 observations of the dataset.
Dataset's first five observations
Home size (sq ft) (X) -> Home price (Tk.) (Y)
Take the hypothesis $h = 90000 + 120 \times X$.


Input,
X = [2104, 1600, 2400, 1416, 3000]
Real output,
Y = [399900,329900, 369000, 232000, 539900]
Hypothesis output,
Example,
h1 = 90000 + 120 * 2104 = 342480
h2 = 90000 + 120 * 1600 = 282000
h5 = 90000 + 120 * 3000 = 450000
H = [342480, 282000, 378000, 259920, 450000]
So we understood how the model parameters enter the prediction, and we predicted the values with the hypothesis function. The real and predicted values do not match exactly, so we will calculate the cost based on the errors.
The number of observations here is m = 5. Breaking the cost function down a bit,
$$J(90000, 120) = \frac{1}{2 \times m} \times \left( e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2 \right)$$

where, for example,
$$e_1 = h(X_1) - Y_1 = 342480 - 399900 = -57420$$
In the same way we can compute the remaining errors (with Python if we like), put them into the formula, and calculate the cost:
$$e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2 = 14534002800$$
Multiplying by $\frac{1}{2 \times m}$,
$$J(90000, 120) = 1453400280$$
This is the calculated cost for $\theta_0 = 90000$, $\theta_1 = 120$.

In this way we calculate the cost for various combinations of the parameters, and whichever combination gives the lowest cost is the one we use to build the linear model and run the performance test.
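The same calculation in a few lines of Python (the numbers are exactly the ones used above):

X = [2104, 1600, 2400, 1416, 3000]
Y = [399900, 329900, 369000, 232000, 539900]

m = len(X)
H = [90000 + 120 * x for x in X]                        # hypothesis output for every house
squared_errors = [(h - y) ** 2 for h, y in zip(H, Y)]   # (prediction - real value)^2
J = sum(squared_errors) / (2 * m)                       # the cost for theta0 = 90000, theta1 = 120
print(J)   # 1453400280.0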
Frequently Asked Questions:
Why is the cost function multiplied by 1/2?
It is done only to make the later mathematical calculation (the differentiation) easier; there is nothing more to it. If you do not multiply by the half, there is no problem.
What is the meaning of Residual, MSE (Mean Square Error), OLS
(Ordinary Least Square), Loss Function, Residual Sum of Squares
(RSS)?

Residual means the difference between the original value and the value predicted by the model; in other words, it is the error for one observation.
MSE is the mean of the squared errors over all observations.
Ordinary Least Squares is the statistical estimator whose calculation we used for the cost.
Loss Function is just another name (an alias) for the Cost Function.
RSS is defined as:
$$RSS = \sum_{i=1}^{m} Residual^2$$

Why is OLS used as the cost function?
This is a very important question that is rarely asked. The main reason for using OLS as the cost function is the shape of its graph: it is parabolic (because the function is quadratic), and finding the minimum of a parabolic function is very easy.
Linear Regression Phase II and Gradient Descent
Linear Regression: Second Episode
We learned some basic knowledge of linear regression in the last
phase as well as some of the Cost Function Calculations. Today, we
will try to know the following.
Contents of today's discussion

Cost Function Intuition – 2
Graph of J(θ)
Gradient Descent Optimization
Cost Function Intuition
Hopefully by now you have a good feel for the linear model; if so, let's come back to the cost function.
What is the benefit of the cost function graph?
Our task was to minimize the cost. That is the main goal of all engineering: get results using as few resources as possible. Likewise, our main goal in machine learning is to give accurate predictions.
If we plot the cost of many candidate models, we can easily read off from the graph which parameters give the lowest error.
Forgetting everything else for a moment, think about the following dataset,
Income (X) -> Cost (Y)
10 -> 5
100 -> 50
1000 -> 500
Graph
Such a graph of this dataset,

We will use this model for prediction: $h_\theta(X) = \theta \times X$.
We will plot J(θ) for different values of θ, i.e., calculate the cost for each prediction, and then see how J(θ) behaves as θ changes.
Take θ = 0.1. The hypothesis line comes out as in plot hypo1, and the cost is
$$J(0.1) = \frac{1}{2 \times 3} \times \left(4^2 + 40^2 + 400^2\right) = 26936.0$$
Take θ = 0.2. The hypothesis line is plot hypo2, and
$$J(0.2) = \frac{1}{2 \times 3} \times \left(3^2 + 30^2 + 300^2\right) = 15151.5$$
Take θ = 0.3:
$$J(0.3) = \frac{1}{2 \times 3} \times \left(2^2 + 20^2 + 200^2\right) = 6734.0$$
Take θ = 0.4:
$$J(0.4) = \frac{1}{2 \times 3} \times \left(1^2 + 10^2 + 100^2\right) = 1683.5$$
Take θ = 0.5:
$$J(0.5) = \frac{1}{2 \times 3} \times \left(0^2 + 0^2 + 0^2\right) = 0$$
Increasing θ further, θ = 0.6:
$$J(0.6) = \frac{1}{2 \times 3} \times \left((-1)^2 + (-10)^2 + (-100)^2\right) = 1683.5$$
And θ = 0.7:
$$J(0.7) = \frac{1}{2 \times 3} \times \left((-2)^2 + (-20)^2 + (-200)^2\right) = 6734.0$$
I will not go further; we now have a J(θ) value for every value of θ, so let's make a scatter plot of them.
Cost Function Graph
J = [26936.0, 15151.5, 6734.0, 1683.5, 0, 1683.5, 6734.0]
theta = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
colors = ['blue', 'black', 'orange', 'pink', 'magenta', 'brown', 'aqua']

for i in range(len(J)):
    lbl = 'Hypothesis H = %0.1f * x' % theta[i]
    plt.scatter(theta[i], J[i], linewidth=5, color=colors[i], label=lbl)

plt.legend(loc='best')
plt.title('Cost Function Graph')
plt.xlabel('Theta')
plt.ylabel('J (theta)')
plt.show()
What do you see in the graph? The cost is lowest at θ = 0.5, which means the prediction is best when θ is 0.5. In this way, from each model's cost we can tell how good its performance is.
If our model were $h(x) = \theta_0 + \theta_1 \times x$, then $J(\theta_0, \theta_1)$ could be plotted as a surface like this,

We finally learned a lot about Cost Function. Now we will see the
Cost Function Minimization Using Gradient Descent
Gradient Descent Algorithm
Do you remember calculus? Differentiation? That is what we will use now. If you do not mind, let's look at differentiation first.
Differentiation: a method for calculating the slope of a function at a specific point
At any point, the derivative of a function is the slope of the tangent to the function at that point. Suppose y = f(x) is a function and we want to know, at the point (x1, y1), the slope of its tangent (i.e., what angle it makes with the X axis). Then we differentiate f(x) with respect to the independent variable x; the differential operator is written $\frac{dy}{dx}$.
Let's look at the picture below,

Slope
The formula for the slope is $m = \frac{\Delta y}{\Delta x}$.
The slope can take four kinds of values: non-zero positive, negative, zero, and undefined. Based on this we can classify slopes into four types:
Positive Slope
The slope that makes an acute angle with the X axis is called a positive slope. Moving along a positive slope, the value of y increases.
Negative Slope
The slope that makes an obtuse angle with the X axis is called a negative slope. Moving along a negative slope, the value of y decreases.
Zero Valued Slope
The slope that makes a 0-degree angle with the X axis is called a zero slope.
Undefined Slope
The slope that makes a 90-degree angle with the X axis is called an undefined slope.
Slopes,

Partial Derivative
Most of the time we will actually use the partial derivative. A function does not always depend on a single variable. For example, consider $z = f(x, y) = x^2 + xy + y^2$; here the variable z depends on both x and y. So if we want to track the change of z with respect to x and with respect to y separately, an ordinary derivative will not do; we take partial derivatives:
$$z = x^2 + xy + y^2$$
$$\frac{\partial z}{\partial x} = 2x + y$$
$$\frac{\partial z}{\partial y} = 2y + x$$
If the cost function has a single parameter, the ordinary derivative is enough; but if the cost function has two or more parameters, we must take partial derivatives. For now, we will try to understand gradient descent with a one-parameter cost function.
The question is, what will we do with this slope? With a little (!) bit of calculus we can actually save billions and billions of computation steps.
We will try to minimize the cost using the concepts of differentiation and slope, and the algorithm we use for that effort is Gradient Descent.
Gradient Descent
Algorithm
repeat until convergence {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_j)$$
}
Mathematical notation
Meaning -> Math -> Programming
x and y are equal -> x = y -> x == y
assign the value of y to x -> x := y -> x = y
update x -> x := x + 1 -> x = x + 1
So := means that θj's value is updated on every iteration.
Here, α is the learning rate.
Gradient Descent Intuition
What is the algorithm actually saying? We already know that training a machine learning model means setting the parameters of the model in such a way that our prediction becomes as good as possible. Let's understand, through a few graphs, what the gradient descent algorithm does.
Take our cost function J(θ1).
Now pick any value of θ1 and differentiate at that point. If the slope is positive, it means J(θ1) increases in that direction and decreases in the opposite direction, so we move opposite to the slope. The picture below makes this clear.

This time we take another point, which is to the left of the local minimum.

That means the gradient tells us in which direction the cost function decreases. This is shown here for a single parameter; it is not convenient to visualize hundreds of parameters, but in all cases the idea is exactly the same.
This update will continue until the minimum point is reached. At the minimum point the algorithm stops automatically, because there ∂J(θ1)/∂θ1 = 0, and if the gradient part is 0 then there are no more updates.
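To make the update rule concrete, here is a minimal sketch in Python of gradient descent on a made-up single-parameter cost function J(θ1) = (θ1 − 3)^2, whose minimum is obviously at θ1 = 3; it only illustrates the update θ1 := θ1 − α · dJ/dθ1 and is not the regression code we will write later.
# Toy cost function and its derivative
def J(theta1):
    return (theta1 - 3) ** 2

def dJ(theta1):
    return 2 * (theta1 - 3)

theta1 = 0.0     # starting point
alpha = 0.1      # learning rate
for i in range(50):
    theta1 = theta1 - alpha * dJ(theta1)   # the gradient descent update

print(theta1)    # very close to 3.0, the minimum of J
With a very large learning rate (say alpha greater than 1 here), the same loop overshoots and diverges instead of converging, which is exactly the behaviour we will see in the practical chapter.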


With this done, in the next phase we will look at linear regression again: multi-parameter gradient descent and batch gradient descent.
Frequently Asked Questions
What is the learning rate?
The learning rate, α, controls how fast the cost function converges to a local minimum. With a lower learning rate it takes more time (more iterations) to converge, meaning the parameters are updated many more times. Increasing the learning rate reduces the number of updates. α must be a positive number.
What is the effect of increasing or decreasing the learning rate?
Imagine you are blindfolded and placed on uneven ground, and your job is to find the lowest point. If you take big steps you may step right over the minimum point, and with very small steps it will take a long time to get there. The step size in this analogy is the learning rate.
Do we need to increase / decrease the learning rate along with the steps?
No, because the gradient descent update automatically becomes smaller as we move towards the local minimum (the gradient itself shrinks). So even if the value of α is fixed, the algorithm will converge to the minimum point.
What is the purpose of initializing θ1 (or the parameters above) at random values?
The answer to this question is long, but the main advantage of initializing the parameters at random points is the chance of finding the global minimum. The same graph can have several local minima and one global minimum. A local minimum is a point that is lower than its surrounding region of the graph, while the global minimum is the lowest point of the entire graph.
Let's go back to the blindfold example. Suppose a helicopter drops you at a point and asks you to find the lowest spot. You keep walking and end up in a local minimum. If you are dropped at that same point again and again, you will keep ending up at the same local minimum every time.
But if the helicopter drops you at a random point instead, this time you may actually reach the global minimum.

Multivariable Linear Regression


In the previous episodes we saw how to fit a linear model with a single variable. Now we will see what the analysis looks like when the problem has multiple variables / columns / features.
Multivariable Datasets
Before the start of the work, let's see the dataset,

Note
Notice that the input is no longer a single variable. We now have several features, not just x. So we need mathematical notation to separate the columns, so that we can tell which column is which. To do this we attach the column number as a subscript of x for each column. The superscript holds the row index and the subscript holds the column index.
Example: (for the first row only)
Size (feet²) = x1^(1), Number of bedrooms = x2^(1), Number of floors = x3^(1), Age of home = x4^(1), Price = y^(1)
2nd example
If we want to arrange the second-row input variables as a matrix, it would look like this. Since we are not singling out any particular column-wise variable, we put all the variables into one matrix, so we do not have to write the subscript separately.
Hopefully the matrix notation of the third and fourth rows can be understood in the same way. With the notation out of the way, we will go directly to model building.
Hypothesis
The previous hypothesis was,
hθ(x) = θ0 + θ1x
This will not work on a multivariable dataset. So what is the way out? The way is to multiply each new variable by a new parameter:
hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 …(1)

Now we can practice with different values of θ, for example,
hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 − 2x4 …(2)
There is nothing to take seriously in equation (2); it is just a made-up example.
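Just to show how such a hypothesis turns features into a prediction, here is a tiny sketch that plugs one hypothetical house (feature values invented purely for illustration) into equation (2):
# Hypothetical feature values for one house (invented for illustration)
x1, x2, x3, x4 = 2104, 3, 2, 40   # size, bedrooms, floors, age

# Hypothesis from equation (2) with the made-up parameters
h = 80 + 0.1 * x1 + 0.01 * x2 + 3 * x3 - 2 * x4
print(h)   # 80 + 210.4 + 0.03 + 6 - 80 = 216.43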
Math again
There is nothing to fear; we are only going over basic mathematical notation here. Once the notation is clear, understanding the theory of general-purpose machine learning will not be a problem, and I will be able to write in shorthand while you still follow along.
Hypothesis modification
We can see the multivariable hypothesis model in equation (1). The thing is, if we try to put it into matrix form as is, we run into a big problem: the hypothesis parameters start from θ0, but the row of variables starts from x1, so the number of model parameters is one more than the number of feature columns. For matrix addition and subtraction the dimensions must be equal, so before we can apply matrix operations we modify equation (1).
We can rewrite equation (1) as,
hθ(x) = θ0x0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 + ⋯ + θnxn …(3)
If we set x0 = 1, there is no difference between equations (1) and (3).

If we want to keep the x values in an (n + 1)-entry matrix, we will write it this way,

Similarly, if we write the theta parameters in matrix form,
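Written out in plain form (one way to state what the two matrices contain, with x0 = 1 included), they are:
x = [x0, x1, x2, …, xn]ᵀ, with x0 = 1
θ = [θ0, θ1, θ2, …, θn]ᵀ
Both are (n + 1) × 1 column matrices.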

Why Hypothesis is Modified?

Matrix Multiplication: Rule Number 1
The first condition for multiplying two matrices is that the number of columns of the first matrix must equal the number of rows of the second matrix. If we had not added x0, the dimensions of the two could never be made to match. Even so, we still have to transpose one of the matrices, which means a small rearrangement. Another way to make the dimensions equal would have been to drop θ0 altogether, but removing a parameter is not an intelligent move. If we keep it, we only need to give x0 a value.
Matrix Multiplication Example:
If you do not remember linear algebra, take this as a small refresher. In the following equation,

suppose,

and

we can write the whole thing in matrix form,

As a matter of fact,

however, one of these is a column matrix and the other is a row matrix, while the variables we are working with are both column matrices. So, to multiply, we convert one column matrix into a row matrix. This conversion is called the transpose. Transposing is very easy: the rows of the matrix are arranged as columns, or the columns are arranged as rows.
We have to transpose the theta matrix here, so that

The transpose operation is denoted by the superscript T.


Hypothesis matrix notation
Hopefully this has sunk in properly. Whether it is artificial intelligence or data science, there is hardly a single moment without linear algebra. When you learn image processing, you will again be buried in matrix-based math.
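As a quick sketch of what this notation buys us (using the same invented numbers as before, purely for illustration), the whole hypothesis collapses into a single NumPy dot product:
import numpy as np

# theta and x as column matrices, with x0 = 1 added at the top of x
theta = np.array([[80], [0.1], [0.01], [3], [-2]])   # shape (5, 1)
x = np.array([[1], [2104], [3], [2], [40]])          # shape (5, 1)

# h_theta(x) = theta^T x  -> a 1x1 matrix containing the prediction
h = theta.T.dot(x)
print(h)   # [[216.43]]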
Modified Gradient Descent
In the case of multivariable regression, the gradient descent algorithm also changes.
The previous algorithm was,
repeat until convergence {
θ0 := θ0 − (α/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))
θ1 := θ1 − (α/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) x^(i)
}
where hθ(x) = θ0 + θ1x, that is, n = 1.
The changed formula, for multiple variables (n ≥ 1), is:
repeat until convergence {
θj := θj − (α/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) xj^(i)   (simultaneously update θj for j = 0, 1, …, n)
}
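One compact way to write this update for all the parameters at once, consistent with the per-parameter rule above and with the NumPy code in the next chapter, is:
θ := θ − (α/m) · Xᵀ(Xθ − y)
where X is the m × (n + 1) input matrix (its first column is all 1s), y is the m × 1 target matrix, and θ is the (n + 1) × 1 parameter matrix.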
In the next phase we will write the code in Python.
PRACTICAL LINEAR
REGRESSION
Practical Linear Regression: Gradient Descent
In today's chapter we will learn to build a linear regression model from scratch. But before doing this you must have an idea about NumPy; if not, read the NumPy chapter in the Appendix first. Then let's start.
Dataset
To build the linear regression model I am using the dataset from the linear regression chapters of Andrew Ng's Machine Learning course, with one slight difference: I changed the dataset a little by adding the two column names in the first row. Click here to view or download the dataset.
Data visualization
The first thing we'll do is draw a scatter plot of my dataset. I'll use
the Seaborn library here. The Seaborn library is based on matplotlib.
There are many features built-in to make data visualization easy.
Seaborn installation
Run the command in the command window or terminal,
pip install seaborn
Creating a scatter plot using Seaborn
import csv
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np

# Loading Dataset
with open('ex1data1.txt') as csvfile:
    population, profit = zip(*[(float(row['population']), float(row['profit']))
                               for row in csv.DictReader(csvfile)])

# Creating DataFrame
df = pd.DataFrame()
df['Population'] = population
df['Profit'] = profit

# Plotting using Seaborn
sns.lmplot(x="Population", y="Profit", data=df, fit_reg=False, scatter_kws={'s': 45})
Plot Output
Explanation
Loading the dataset and creating the DataFrame
First I loaded the dataset, putting the input data into the population list and the output data / target / label into the profit list. Then I built a Pandas DataFrame from the two lists.
Plotting
Using the lmplot function I created the scatter plot, where I labelled the X and Y axes with x and y, and passed the generated DataFrame as data. If fit_reg were True, a regression line would appear on the graph, meaning Seaborn would fit a linear model itself. But our main task is to do this ourselves using the gradient descent algorithm. With scatter_kws = {'s': 45} I changed the size of the scatter dots.
If fit_reg = True, the plot would look like this,

Cost Calculation and Gradient Descent: Matrix Operations


So far we have looked at gradient descent and cost calculation. But before turning it into code, the theory needs to be thoroughly welded in. Being able to visualize what the code does is just as important as writing it, so let's walk through gradient descent with a little visualization.
Cost Calculation
We know the cost calculation formula,

To apply this formula we could use a loop, but we do not need to: we can do it very easily with a matrix operation in NumPy. The meaning of each notation was given in the previous chapters, but let me show it again through a simple example.
Well, this is my dataset,

where m = 3.
The linear regression formula,

Our theta has dimension 2 × 1, that is, two rows and one column. In matrix form,

Since this is single-variable linear regression, we add an extra element x0 = 1 to every row of X. In other words,
our hypothesis for every row will be X × θ.
In matrix form,

The output or target matrix,

Calculated Cost Formula in Matrix Form
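Written out (in a form consistent with the computeCost function below), the cost in matrix form is:
J(θ) = (1 / (2m)) · (Xθ − y)ᵀ(Xθ − y)
The term Xθ − y is the loss for all m rows at once, and multiplying it by its own transpose gives the sum of the squared losses.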

Homework
Write the gradient descent formula in matrix form. We will write this matrix calculation in Python. This is why a solid foundation in linear algebra is needed to follow machine learning calculations quickly. Applying machine learning algorithms is very easy for those who understand linear algebra and calculus well.
How to Apply the Cost Calculation and Gradient Descent Algorithm Using NumPy
Now we will use NumPy to compute the cost over the 97 rows of the dataset and to run gradient descent.
Cost calculation function in Python
# Here, X, y and theta are 2D NumPy arrays
def computeCost(X, y, theta):
    # Getting number of observations
    m = len(y)
    # Getting hypothesis output
    hypothesis = X.dot(theta)
    # Computing loss
    loss = hypothesis - y
    # Computing cost (sum of squared losses)
    cost = np.sum(loss ** 2)
    # Returning average cost
    return cost / (2 * m)
Store the number of observations in m
Compute the hypothesis values
Compute the loss, which is the difference between the predicted value and the original value
Compute the cost, which is the sum of the squares of the loss
Return the average cost
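As a tiny sanity check of the function above (with made-up numbers, not the real dataset, and reusing the NumPy import from earlier):
# Three observations, single feature, with the x0 = 1 column added
X_demo = np.array([[1.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 3.0]])
y_demo = np.array([[1.0], [2.0], [3.0]])
theta_demo = np.array([[0.0], [1.0]])   # a perfect fit: y = 0 + 1*x

print(computeCost(X_demo, y_demo, theta_demo))   # 0.0, since the line fits exactly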
Gradient Descent Calculation Function in Python
def gradientDescent(X, y, theta, alpha, iterations):
    cost = []
    m = len(y)
    for i in range(iterations):
        # Calculating loss
        loss = X.dot(theta) - y
        # Calculating gradient
        gradient = X.T.dot(loss)
        # Updating theta
        theta = theta - (alpha / m) * gradient
        # Recording the cost
        cost.append(computeCost(X, y, theta))
        # Printing out the cost
        print("Cost at iteration {0}: {1}".format(i, computeCost(X, y, theta)))
    return (theta, cost)

We update the parameters for a fixed number of iterations; we do not check by how much the cost decreased in any particular iteration. That is why we fixed the number of iterations. Another approach would be to keep iterating until the cost drops below a specific value, but that can be dangerous in many cases, as we will see shortly.
Creating an error calculation plot using the created functions
# Converting the loaded dataset into NumPy arrays
# Example:
# X = [[1, 10],
#      [1, 20],
#      [1, 30]]
X = np.concatenate((np.ones((len(population), 1)),
                    np.array(population).reshape(len(population), 1)), axis=1)
# Example:
# y = [[1],
#      [2],
#      [3]]
y = np.array(profit).reshape(len(profit), 1)
# Creating theta matrix, theta = [[0], [0]]
theta = np.zeros((2, 1))
# Learning rate
alpha = 0.1
# Iterations to be taken
iterations = 1500
# Updated theta and calculated cost
theta, cost = gradientDescent(X, y, theta, alpha, iterations)
Output
Cost at iteration 0: 5.4441412681185035
Cost at iteration 1: 5.409587207509947
Cost at iteration 2: 5.376267659177092
Cost at iteration 3: 5.344138517723003
Cost at iteration 4: 5.313157253503435
Cost at iteration 5: 5.2832828563299445
Cost at iteration 6: 5.254475781184327
Cost at iteration 23: 177385094.9287188
Cost at iteration 24: 9252868248.562147
Cost at iteration 25: 482653703983.4108
Cost at iteration 26: 25176474129882.656
Cost at iteration 27: 1313270455375155.5
Cost at iteration 28: 6.850360698103145e + 16
Cost at iteration 29: 3.573326537732157e + 18
Is the Gradient Descent formula not working?
Gradient descent's whole job is to minimize the cost, but look how much it has increased by iteration 29! What is the reason?
The main culprit: the learning rate
If we run the code with a lower learning rate,
# Creating theta matrix, theta = [[0], [0]]
theta = np.zeros((2, 1))
# Learning rate
alpha = 0.01
# Iterations to be taken
iterations = 1500
# Updated theta and calculated cost
theta, cost = gradientDescent(X, y, theta, alpha, iterations)
Output
Cost at iteration 0: 6.737190464870004
Cost at iteration 1: 5.9315935686049555
Cost at iteration 2: 5.901154707081388
Cost at iteration 3: 5.895228586444221
Cost at iteration 4: 5.8900949431173295
Cost at iteration 5: 5.885004158443647
Cost at iteration 6: 5.879932480491418
Cost at iteration 7: 5.874879094762575
Cost at iteration 8: 5.869843911806385
Cost at iteration 1497: 4.483434734017543
Cost at iteration 1498: 4.483411453374869
Cost at iteration 1499: 4.483388256587726
Now you can see the cost really is decreasing. That means our gradient descent algorithm is working properly: the learning rate is not overshooting, and gradient descent is coming downhill!
What was the problem?
If you read the second linear regression chapter, you will understand that because the learning rate was too high, instead of converging to the minimum point the cost just kept shooting upwards because of overshoot.
Cost vs. Iteration Graph
import matplotlib.pyplot as plt
plt.plot([i for i in range(1500)], cost, linewidth=1.9)
plt.xlabel("Iterations")
plt.ylabel('Cost')
plt.show()
If we plot cost vs. iteration, it will look like this,
Output
Multivariable
Linear Regression
So far we have seen single-variable linear regression. The work is quite similar in the multivariable case; only the number of columns increases. Even the matrix notation stays the same.
In the next phase, the gradient descent matrix notation and the model for multivariable linear regression will be published.
APPENDIX
Introduction to NumPy
Note: From now on, the code will be run on Python 3. If you have Python 2 set up, create a virtual environment with a Python 3 setup; the process of converting the earlier code to Python 3 is in progress.
NumPy installation
If you do not have NumPy, enter the following command in the command window / terminal,
pip install numpy scipy matplotlib ipython jupyter pandas sympy nose
This installs ipython, jupyter, pandas, sympy and nose alongside NumPy via pip.
And if you have Anaconda set up, then NumPy is already installed on your PC. If you have any problems, see the documentation.
We could implement gradient descent with loops over single elements, but that would not be efficient at all. Computing time is very important in machine learning, and to keep it down we must have a good handle on the NumPy library. By gradually solving problems, you will build up that same skill with NumPy.
NumPy at the beginning
NumPy is mainly used for scientific computation. Because machine learning problems work with high-dimensional arrays, we need tools that can work very fast with such arrays. NumPy is such a library: in the way we work with arrays in MATLAB, we can think of NumPy as Python's MATLAB-like interface, though there are several differences.
NumPy's documentation is enough to learn it thoroughly, but here I will discuss the important parts. Then let's start.
Array
A NumPy array is a grid of values, all of the same type, e.g. float, int64, int8, etc.
The number of dimensions of an array is called its rank. For example, a 2-dimensional NumPy array is a rank 2 array. The shape of an array is a tuple of integers that shows how many elements there are along each dimension.
Let's see the example below,
import numpy as np
a = np.array([1, 2, 3])          # Creates a rank 1 array
print(type(a))                   # Prints "<class 'numpy.ndarray'>"
print(a.shape)                   # Prints "(3,)"
print(a[0], a[1], a[2])          # Prints "1 2 3"
a[0] = 5                         # Change an element of the array
print(a)                         # Prints "[5 2 3]"
b = np.array([[1, 2, 3], [4, 5, 6]])   # Create a rank 2 array
print(b.shape)                   # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"
NumPy also has some functions with which we can create arrays of specific sizes, as sketched below.
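A few of those creation functions, as a small sketch:
import numpy as np

a = np.zeros((2, 2))          # 2x2 array of all zeros
b = np.ones((1, 2))           # 1x2 array of all ones
c = np.full((2, 2), 7)        # 2x2 array filled with the constant 7
d = np.eye(2)                 # 2x2 identity matrix
e = np.random.random((2, 2))  # 2x2 array of random values in [0, 1)

print(a)   # [[0. 0.]
           #  [0. 0.]]
print(d)   # [[1. 0.]
           #  [0. 1.]]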
See the documentation for more information about arrays.
Array Indexing
NumPy arrays can be indexed in a number of ways.
Slicing
The way we slice a Python list, NumPy arrays can also be sliced. Because an array can be multi-dimensional, for every dimension you have to specify how far you want to slice.
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it will
# modify the original array
print(a[0, 1])   # Prints "2"
b[0, 0] = 43     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "43"
You can also mix integer indexing with slice indexing. But doing so yields an array of lower rank than the original (the rank decreases by 1 for each integer index):
import numpy as np
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Two ways of accessing the data in the middle row of the array
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the original array
row_r1 = a[1, :]    # rank 1 view of the second row of a
row_r2 = a[1:2, :]  # rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[2 6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"
Integer Array Indexing
When you index into a NumPy array using slicing, the resulting array is always a subarray of the original array. Integer array indexing, by contrast, lets you construct arbitrary new arrays whose elements come from the original array.
If we need an array whose elements are in ascending order, such as 0, 1, 2, 3, it can be created with the np.arange(num) function.
Let's see an example,
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])

# Example of integer array indexing
# The new array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])   # Prints "[1 4 5]"

# Which is equivalent to this one
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))   # Prints "[1 4 5]"

# We can also write it this way [plain old indexing]
print(np.array([a[0][0], a[1][1], a[2][0]]))

# Create a sequence array using the `arange` function
sequence = np.arange(4)
print(sequence)   # Prints "[0 1 2 3]"
Indexing can be done in another way, for example
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(a)
# Prints
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]

# Create an array of indices [we can refer to this as a selector]
selector = np.array([0, 2, 0, 1])

# Selecting one element from each row using the selector indices
print(a[np.arange(4), selector])   # Prints "[ 1  6  7 11]"

# Mutate one element from each row of 'a' using the indices in selector
a[np.arange(4), selector] += 10
print(a)
# Prints
# [[11  2  3]
#  [ 4  5 16]
#  [17  8  9]
#  [10 21 12]]
Boolean Array Indexing with Boolean Expression
Through Boolean array indexing, we can select elements with
different conditions. This can be done in Pandas Library too. For
example, let's see,
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])

# Find the elements of 'a' that are bigger than 2
# This returns a numpy array of booleans of the same shape as 'a',
# where each slot of bool_idx tells whether that element of 'a' is > 2
bool_idx = (a > 2)
print(bool_idx)
# Prints
# [[False False]
#  [ True  True]
#  [ True  True]]

print(a[bool_idx])   # Prints "[3 4 5 6]"

# We can also do all of the above in a single statement
print(a[a > 2])      # Prints "[3 4 5 6]"
Indexing has been presented here only in a nutshell; see the documentation for more details.
Datatypes
When creating an array, NumPy tries to guess the datatype of the array you are creating. But sometimes you want to control the datatype explicitly, for example forcing an integer type or a larger float type, so the NumPy constructors take an optional dtype argument that overrides the guess. For example,
import numpy as np
x = np.array([1, 2])                   # Let numpy choose the datatype
print(x.dtype)                         # Prints "int32"
x = np.array([1.0, 2.0])               # Again let numpy do its magic
print(x.dtype)                         # Prints "float64"
x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"
Array Math
Basic Math
This topic is very important, because by using it we avoid explicit loops.
Remember, the mathematical operators usually work element-wise, and every operator also has an equivalent NumPy built-in function.
import numpy as np
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Element wise sum; both produce this array
# [[ 6.  8.]
#  [10. 12.]]
print(x + y)
print(np.add(x, y))

# Element wise subtraction
# [[-4. -4.]
#  [-4. -4.]]
print(x - y)
print(np.subtract(x, y))

# Element wise multiplication
# [[ 5. 12.]
#  [21. 32.]]
print(x * y)
print(np.multiply(x, y))

# Element wise division
# [[0.2        0.33333333]
#  [0.42857143 0.5       ]]
print(x / y)
print(np.divide(x, y))

# Element wise square root
# [[1.         1.41421356]
#  [1.73205081 2.        ]]
print(np.sqrt(x))
Matrix operations
We have already seen that machine learning means working with matrices, so we should know matrix manipulation through NumPy well: how to multiply element-wise, and how to multiply two matrices. Before getting to the code, let's review the dot product.
Dot Product of Vectors

The dot product of a = î + 2ĵ + 3k̂ and b = î + 2ĵ + 3k̂ will be,

In the form of matrix

Dot product of Matrices / Multiplication of Matrices


But if you think of matrix dot multiplication, it will look like this.
Now how are these matrices multiplied? Let's see,

Take C as a matrix,

If we do the dot multiplication, or matrix multiplication, of C with C,


However,
A·C cannot be done.
The condition for the dot product of two matrices is that if the first matrix has dimension m1 × n1 and the second matrix has dimension m2 × n2, then n1 must be equal to m2.
A's dimension is 3×3 and C's dimension is 2×2, so they cannot be multiplied.
Now let's see the code,
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors; both print "219"
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both print "[29 67]"
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))
Now, if we want the sum of all elements, or a column-wise or row-wise sum, NumPy's sum function is very useful.
import numpy as np
x = np.array([[1, 2], [3, 4]])
print(np.sum(x))          # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"
Broadcasting
NumPy's broadcasting mechanism is very useful when you have to work with arrays of different shapes. For example, here is how we would do the following task without NumPy's broadcasting,
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

# We will add the vector 'v' to each row of the matrix 'x',
# storing the result in the matrix 'y'
y = np.empty_like(x)   # Create an empty matrix with the same shape as 'x'

# Add the vector 'v' to each row of the matrix 'x' with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now 'y' is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)
But when the matrix x is very large, computing this with an explicit Python loop is slow. If we could stack three more copies of v row-wise, so that it matched the shape of x, then we could simply add the two arrays element-wise. This copying can be done in NumPy like this,
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

# Stacking 4 copies of 'v' on top of each other (4 along the rows, 1 along the columns)
vv = np.tile(v, (4, 1))
print(vv)
# Prints "[[1 0 1]
#          [1 0 1]
#          [1 0 1]
#          [1 0 1]]"

y = x + vv   # Adding elementwise
print(y)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
Actually, there was no need to do all this work; NumPy handles it automatically, and this is NumPy's broadcasting:
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v
print(y)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
See Numpy User Guide, Release 1.11.0 - Section 3.5.1 (General
Broadcasting Rules) for more details about broadcasting.
Broadcasting applications
import numpy as np

## Example 1
# Computing the outer product of two vectors
v = np.array([1, 2, 3])   # shape (3,)
w = np.array([4, 5])      # shape (2,)
# To compute an outer product, we first reshape 'v' to be a column vector of shape (3, 1)
# Then we can broadcast it against 'w' to produce an output of shape (3, 2),
# which is the outer product of 'v' and 'w':
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(v.reshape(3, 1) * w)

# Add a vector to each row of a matrix
x = np.array([[1, 2, 3], [4, 5, 6]])
# [[2 4 6]
#  [5 7 9]]
print(x + v)

## Example 2
# Let's add vector 'w' to each column of 'x'  [x.T == x.transpose()]
z = x.T + w
print(z)
# Prints
# [[ 5  9]
#  [ 6 10]
#  [ 7 11]]

# Now we transpose back to the original shape
print(z.T)
# Prints
# [[ 5  6  7]
#  [ 9 10 11]]
Only NumPy's basic operations have been shown here. In the next phase, we will start applying the formulas using the NumPy library.
CONCLUSION
This tutorial has introduced you to machine learning. You now know that machine learning is a technique for training machines to perform tasks of the human brain, sometimes faster and better than an average person. We have seen that machines can beat human champions in games that seem very complicated, such as Chess and Go (AlphaGo). You have seen that machines can be trained to perform human activities in many areas and can help people lead better lives.
Machine learning can be supervised or unsupervised. If you have a smaller amount of clearly labeled training data, choose supervised learning. Unsupervised learning generally gives better performance and results on larger data sets. If a large data set is easily available, go for deep learning techniques. You have also learned about reinforcement learning and deep reinforcement learning. Now you know what neural networks are, along with their applications and limitations.
