
Machine Learning: we need to feed data into the machine for it to learn.

Machine learning is the process of helping a machine learn how to make decisions logically.

It uses data and algorithms to learn, and it is a subset of AI.

Ex: Netflix and Amazon built machine learning models using tons of data in order to identify profitable opportunities and avoid risk.

The term machine learning was first coined by Arthur Samuel in the year 1959.

The first formal definition of ML was given by Tom M. Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

The need for machine learning

With the help of ML we can solve complex problems.

We can uncover patterns and trends in data.

We can improve decision-making.

We can do predictive modeling.

We can solve complex problems with a reasonable level of accuracy.

Machine Learning Definitions

Algorithm: a machine learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it. It is the logic behind an ML model.

Model: a model is the main component of ML. A model is trained using a machine learning algorithm.

An algorithm maps all the decisions that a model is supposed to take on the given input in order to produce the correct output.

Predictor variable: a feature of the data that can be used to predict the output.

Response variable: the output variable that needs to be predicted using the predictor variables.

Training data: the ML model is built using training data, which helps it identify the key trends and patterns needed to predict the output.

Testing data: after the model is trained, it must be tested to evaluate how accurately it can predict the outcome.

Locality           | Carpet Area        | No. of rooms       | House price
Predictor variable | Predictor variable | Predictor variable | Response variable
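
A minimal pandas sketch of this split; the column names and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical house-price data matching the table above
df = pd.DataFrame({
    "Locality": ["A", "B", "C"],
    "CarpetArea": [650, 900, 1200],
    "NoOfRooms": [2, 3, 4],
    "HousePrice": [50, 75, 110],
})

X = df[["Locality", "CarpetArea", "NoOfRooms"]]  # predictor variables
y = df["HousePrice"]                             # response variable
```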
Basic pipeline: Data → Train the machine → Build the model → Predict the outcome.

Or

Data → Define the objective → Prepare the data → Explore the data → Build the model → Evaluate the model → Select the best model → Predict on the test data.
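
A rough scikit-learn sketch of this pipeline, using synthetic numbers in place of a real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Toy data: carpet area vs. house price (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(500, 1500, size=(100, 1))
y = 50 * X[:, 0] + rng.normal(0, 1000, size=100)

# Prepare the data: hold out a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)      # build and train the model
y_pred = model.predict(X_test)                        # prediction on test data
print("R^2 on test data:", r2_score(y_test, y_pred))  # model evaluation
```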

Machine Learning Types:

Supervised learning: teach or train the machine using data that is well labelled.

Unsupervised learning: we train the machine using unlabeled data, without any guidance.

Reinforcement learning: a part of machine learning where an agent is placed in an environment and learns how to behave in it by observing the rewards it gets from its actions.

E.g. Alexa, self-driving cars.

[Diagram: Machine Learning branches into Supervised ML, Unsupervised ML, and Reinforcement Learning]

1. Numeric target → regression (supervised learning)

2. Categorical target → classification (supervised learning)

Unsupervised: clustering techniques.

Regression          | Classification           | Clustering
Supervised          | Supervised               | Unsupervised
Numeric format      | Categorical              | Clusters
Forecast or predict | Compute the category     | Make clusters of similar items
House price dataset | Classifying iris species | Flagging fraudulent transactions
Linear regression   | Logistic regression      | K-means
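
A short scikit-learn illustration of all three, using the built-in iris dataset (the regression target here is just one iris column, picked for convenience):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification (supervised, categorical target): predict the species
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Regression (supervised, numeric target): predict petal width from the rest
reg = LinearRegression().fit(X[:, :3], X[:, 3])

# Clustering (unsupervised, no target): group similar flowers
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(clf.predict(X[:2]), reg.predict(X[:2, :3]), km.labels_[:2])
```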
Data preprocessing is the conversion of raw data into a meaningful form.

It converts data into a useful format.

It makes our data ready for model building.

Steps involved in data preprocessing:

1. Handling missing values

2. Treating outliers

3. Scaling the dataset

4. Encoding the categorical variables

Handling the missing values: to avoid losing data.

To avoid missing out on patterns and trends.

Methods of handling missing values:

1. Mean/median/mode imputation: the mean can be used when the variable is numeric and normally distributed.

The median can be used to fill missing values when the variable is numeric and skewed.

The mode can be used when the variable is categorical.
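
For example, with pandas (column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, np.nan, 45],                # numeric, roughly normal -> mean
    "income": [30, 32, 35, np.nan],             # numeric, skewed -> median
    "city": ["Pune", np.nan, "Delhi", "Pune"],  # categorical -> mode
})

df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])
```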

2. Random sample imputation: a random value from the existing set of values is taken and used to fill the missing value. It is easy, and the variance stays the same as in the original dataset.
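
One possible pandas implementation (the column is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 45, np.nan, 38]})

# Draw one random observed value per missing cell
missing = df["age"].isna()
sample = df["age"].dropna().sample(missing.sum(), replace=True, random_state=0)
sample.index = df.index[missing]   # align the sampled values with the gaps
df.loc[missing, "age"] = sample
```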

3. Capturing NaN values with a new feature: this method is used when data is missing due to some cause; we create a new feature in the dataframe that flags the null values of that particular feature. It is easy to implement and makes it easy to identify where the missing values were.
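
A small sketch of this idea (hypothetical column name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [50, np.nan, 70, np.nan]})

# New flag feature records where the original value was missing
df["salary_missing"] = df["salary"].isna().astype(int)
df["salary"] = df["salary"].fillna(df["salary"].median())
```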

4. End of distribution: we fill the missing value with an extreme value of the feature.

Extreme value = mean ± 3 × std
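
For instance (upper tail shown; use mean - 3 × std for the lower tail):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 45, 38]})

# Extreme value = mean + 3 * std
extreme = df["age"].mean() + 3 * df["age"].std()
df["age"] = df["age"].fillna(extreme)
```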

5. Arbitrary value imputation: we fill the missing value with an arbitrary value. It is a purely judgement-based decision.

6. Frequent category imputation: when the variable is categorical, the best way to fill missing values is with the most popular class, i.e. the mode.

7. KNN imputation (k-nearest neighbours): if a value is missing at, say, the 6th position, we fill it based on its neighbours, e.g. using the 5th and 7th values.
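
scikit-learn ships this as KNNImputer; a minimal example:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0],
              [4.0, 8.0]])

# Each NaN is filled from the k nearest rows, measured on the non-missing features
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```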

8. Dropping all the NaN values: if 60% or more of a particular feature is missing, it is advised to drop that feature.
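
A possible pandas one-liner for this rule (threshold and data are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [np.nan, np.nan, np.nan, 4, np.nan],  # 80% missing
})

# Drop any column where 60% or more of the values are NaN
df = df.loc[:, df.isna().mean() < 0.6]
```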
Handling the outliers: values that differ significantly from the rest of the data are called outliers.

There are visual and mathematical ways of dealing with outliers (see the sketch after this list):

Boxplot

Scatterplot

IQR

Z-score
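
A sketch of the two mathematical rules on a toy series (the 1.5 and 3 cut-offs are the common conventions):

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])   # 95 looks like an outlier

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Z-score rule: flag values with |z| > 3
z = (s - s.mean()) / s.std()
z_outliers = s[z.abs() > 3]

print(iqr_outliers.tolist(), z_outliers.tolist())
```

Boxplots and scatterplots (e.g. s.plot(kind="box") in pandas) show the same outliers visually.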

3. Feature scaling: when features vary over different ranges, it is difficult to put them into a model.

The process of bringing all features into the same range is called scaling.

There are different methods of feature scaling:

1. Absolute max scaling: every value is divided by the maximum absolute value of the feature, converting all values into the range -1 to +1. It is prone to outliers.

Marks | Abs max scaling | Min max scaling                   | Normalisation
10    | 10/300          | (10-10)/(300-10) = 0              | (10-110)/(300-10)
20    | 20/300          | (20-10)/(300-10) = 10/290 = 0.034 | (20-110)/(300-10)
300   | 300/300         | (300-10)/(300-10) = 1             | (300-110)/(300-10)

2. Min max scaling: this method follows the formula

(x - xmin) / (xmax - xmin)

which maps values into the range 0 to 1.

3. Normalisation: (x - xmean) / (xmax - xmin)

4. Standardisation: converts each value of a feature to a z-score; suited when the data is normally distributed.

z = (x - xmean) / std

5. Robust scaling: when the dataset is skewed, this method comes into the picture.

Scaled value = (x - xmedian) / IQR

Method                 | Formula                     | Python code
1 Absolute max scaling | x / xmax                    | MaxAbsScaler()
2 Min max scaling      | (x - xmin) / (xmax - xmin)  | MinMaxScaler()
3 Normalisation        | (x - xmean) / (xmax - xmin) | normalize
4 Standardisation      | (x - xmean) / xstd          | StandardScaler()
5 Robust scaling       | (x - xmedian) / IQR         | RobustScaler()
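
A quick comparison of these scalers on the Marks column from the worked table; note that mean normalisation has no dedicated scikit-learn class, so it is done with plain NumPy here:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler, RobustScaler

X = np.array([[10.0], [20.0], [300.0]])   # the Marks column

for scaler in (MaxAbsScaler(), MinMaxScaler(), StandardScaler(), RobustScaler()):
    print(type(scaler).__name__, scaler.fit_transform(X).ravel())

# Mean normalisation by hand: (x - mean) / (max - min)
x = X.ravel()
print("normalised", (x - x.mean()) / (x.max() - x.min()))
```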
Encoding the Dataset

Encoding converts categorical columns into numerical ones.

A machine learning model may fail to incorporate categorical variables directly; that is why it is important to convert them into simple numeric codes. This process is called encoding.

There are two ways by which we do encoding:

1. Nominal encoding

2. Ordinal encoding

Nominal encoding: used when the categories are not present in any ordered form; it increases the number of features in the dataset.

If a feature has n categories (e.g. 10), it will create n additional features in the dataset.

One-hot encoding:

color  | red | yellow | green
red    | 1   | 0      | 0
red    | 1   | 0      | 0
yellow | 0   | 1      | 0
green  | 0   | 0      | 1
yellow | 0   | 1      | 0
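
In pandas this table can be produced with get_dummies:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "red", "yellow", "green", "yellow"]})

# One new column per category; 1 marks the row's category
encoded = pd.get_dummies(df["color"], dtype=int)
print(encoded)
```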
A disadvantage of one-hot encoding is that it creates additional features in the dataset, which can lead to the curse of dimensionality.

This refers to the scenario where an excessive number of features in the dataset decreases model performance.

It is difficult to design a model in a higher-dimensional space.

In a higher-dimensional space the model takes more processing time, noise and error increase, and accuracy ultimately decreases.

Ordinal encoding: uses label encoding, which encodes each category of a feature within the same column.

State       | State (encoded)
Maharashtra | 3
Delhi       | 0
Karnataka   | 2
Gujarat     | 1
TamilNadu   | 4
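
A sketch with scikit-learn's LabelEncoder, which assigns codes in sorted order and reproduces the table above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Karnataka",
                             "Gujarat", "TamilNadu"]})

# Codes follow alphabetical order: Delhi=0, Gujarat=1, Karnataka=2, ...
le = LabelEncoder()
df["state_encoded"] = le.fit_transform(df["state"])
print(df)
```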
