Unit 1 PDF

Established as per the Section 2(f) of the UGC Act, 1956
Approved by AICTE, COA and BCI, New Delhi
Machine Learning (B21EP0502)
Dept. of Electronics and Computer Engineering
Dr. Vidyasagar K N
COURSE OBJECTIVES:
This course will enable the students to:

1. Discuss the basic theory underlying machine learning.
2. Explain machine learning algorithms to solve problems of moderate
complexity for data analysis.
3. Illustrate the concept of Genetic Programming and Artificial Neural
Network.
4. Discuss the implementation of Machine learning algorithms and modules.
COURSE OUTCOMES:
After studying this course, students will be able to:

CO1: Comprehend statistical methods as basis of machine learning domain
CO2: Apply variety of learning algorithms for appropriate applications.
CO3: Implement machine learning techniques to solve problems in applicable
domains
CO4: Evaluate and compare algorithms based on different metrics and
parameters.
CO5: Design application using machine learning techniques.
CO6: Apply Dimensionality Reduction technique.
UNIT-1:
INTRODUCTION TO MACHINE LEARNING
▪ Introduction to Machine Learning
▪ Types of Machine Learning, Issues in Machine Learning
▪ Application of Machine Learning
▪ Steps in developing a Machine Learning Application
▪ Importance of Data Visualization
▪ Basics of Supervised and Unsupervised Learning
UNIT-2:
REGRESSION TECHNIQUES
▪ Linear Regression
▪ Logistic Regression
▪ Learning with Trees: Decision Trees,
▪ Constructing Decision Trees using Gini Index
▪ Classification and Regression Trees (CART)
▪ Hyperparameters tuning
▪ Loss Functions
▪ Evaluation Measures for Regression Technique
UNIT-3:
CLASSIFICATION AND CLUSTERING
• Classification: Rule-based classification, classification by Bayesian Belief
networks, Hidden Markov Models. Support Vector Machine: Maximum
Margin Linear Separators, Quadratic Programming solution to finding
maximum margin separators, Kernels for learning non-linear functions.
• Clustering: K-means Algorithms, Supervised learning after clustering,
Radial Basis functions. Dimensionality Reduction Techniques, Principal
Component Analysis
UNIT-4:
ARTIFICIAL NEURAL NETWORKS:
▪ Biological Neurons and Biological Neural Networks,

▪ Perceptron Learning
▪ Activation Functions
▪ Multilayer Perceptrons
▪ Back-propagation Neural Networks
▪ Competitive Neural Networks
TEXT BOOK
• Tom Mitchell: Introduction to Machine Learning, McGraw Hill, 2013
• Ethem Alpaydin: Introduction to Machine Learning, The MIT Press, 2014
Who is Alan Turing?
What is Machine Learning?
Machine learning is a branch of artificial intelligence (AI) and computer
science which focuses on the use of data and algorithms to imitate the
way that humans learn, gradually improving its accuracy.
Machine learning is an important component of the growing field of data science.
Through the use of statistical methods, algorithms are trained to make classifications or predictions,
uncovering key insights within data mining projects.
These insights subsequently drive decision making within applications and businesses, ideally impacting key
growth metrics.
As big data continues to expand and grow, the market demand for data scientists will increase, requiring
them to assist in the identification of the most relevant business questions and subsequently the data to
answer them.
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program
Machine Learning Real Examples
If you have used Netflix, you know that it recommends movies or shows
based on what you have watched earlier. Machine learning drives these
recommendations: it uses your viewing history to select the titles that
match your taste.
When you upload a photo to Facebook, it can recognize a person in that
photo and suggest mutual friends. ML is used for these predictions, which
are based on data such as your friend list and the photos available.
Software that shows how you will look when you get older also relies on
machine learning for its image processing.
Types of Machine Learning
SUPERVISED LEARNING
• Supervised learning is when the model is trained on a labelled dataset. A
labelled dataset is one that has both input and output parameters. In this
type of learning, both the training and validation datasets are labelled, as
shown in the figures below.
• Training the system: While training the model, data is usually split in the
ratio 80:20, i.e. 80% as training data and the rest as testing data.
• For the training data, we feed the model both the inputs and the
corresponding outputs.
• The model learns from the training data only. Different machine learning
algorithms (discussed in detail in later units) can be used to build the model.
• Learning means that the model builds some logic of its own. Once the model
is trained, it is ready to be tested.
• At testing time, the inputs come from the remaining 20% of the data, which
the model has never seen before. The model predicts a value for each input,
and we compare it with the actual output to calculate the accuracy.
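The 80:20 workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: the data set (age → purchased), the helper names `train_test_split`, `fit_threshold` and `accuracy`, and the very simple threshold "model" are all hypothetical and are not part of the course material.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train_rows):
    """Toy 'model': an age threshold halfway between the two classes."""
    oldest_non_buyer = max(age for age, label in train_rows if label == 0)
    youngest_buyer = min(age for age, label in train_rows if label == 1)
    return (oldest_non_buyer + youngest_buyer) / 2

def accuracy(predictions, actual):
    """Fraction of predictions that match the actual outputs."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)

# Hypothetical labelled data: (age, purchased), where purchased = 1 for age >= 30
data = [(age, int(age >= 30)) for age in range(18, 68)]

train, test = train_test_split(data)           # 80% training, 20% testing
threshold = fit_threshold(train)               # the model sees training data only
predictions = [int(age > threshold) for age, _ in test]
test_accuracy = accuracy(predictions, [label for _, label in test])
print(len(train), len(test), test_accuracy)
```

The model never sees the test rows while the threshold is fitted, which is what makes the reported accuracy a fair estimate.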
CLASSIFICATION
• Classification: A supervised learning task in which the output has defined
labels (discrete values).
• For example, in Figure A above, the output (Purchased) has defined labels,
i.e. 0 or 1; 1 means the customer will purchase, and 0 means that the
customer won’t purchase.
• The goal is to predict discrete values belonging to a particular class, and
the model is evaluated on its accuracy. Classification can be either binary
or multi-class.
• In binary classification, the model predicts one of two classes (0 or 1; yes
or no). In multi-class classification, the model chooses among more than
two classes. Example: Gmail classifies mail into categories such as social,
promotions, updates, and forums.
REGRESSION
• Regression: A supervised learning task in which the output is a continuous
value.
• For example, in Figure B above, the output (Wind Speed) does not take
discrete values but is continuous within a particular range.
• The goal is to predict a value as close to the actual output value as the
model can; evaluation is done by calculating the error.
• The smaller the error, the greater the accuracy of the regression model.
EXAMPLE OF SUPERVISED LEARNING ALGORITHMS
• Linear Regression
• Logistic Regression
• Nearest Neighbor
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
UNSUPERVISED LEARNING
• Unsupervised machine learning analyzes and clusters unlabeled datasets
using machine learning algorithms. These algorithms find hidden patterns
in the data without any human intervention,
• i.e., we don’t give outputs to our model. The model is trained on input
parameter values only and discovers the groups or patterns on its own.
• The data set in Figure A is mall data that contains information about the
clients who subscribe to the mall's membership. Once subscribed, each client
is given a membership card, and the mall has complete information about the
customer and his/her every purchase.
• Using this data and unsupervised learning techniques, the mall can easily
group clients based on the parameters we feed in.
The input to unsupervised learning models is as follows:
• Unstructured data: May contain noisy (meaningless) data, missing values,
or unknown data.
• Unlabeled data: Contains only values for the input parameters; there is no
target value (output). It is easier to collect than the labeled data used in
the supervised approach.
TYPES OF UNSUPERVISED LEARNING
• Clustering: Broadly, this technique groups data based on the patterns, such
as similarities or differences, that the model finds. These algorithms
process raw, unclassified data objects into groups.
• For example, in the above figure, we have not given output parameter values,
so this technique will be used to group clients based on the input
parameters provided by our data.
• Association: A rule-based ML technique that finds useful relations between
the parameters of a large data set. It is mainly used for market basket
analysis, which helps to better understand the relationships between
different products.
• For example, shopping stores use algorithms based on this technique to find
the relationship between the sales of one product and the sales of another,
based on customer behavior.
• For instance, if a customer buys milk, then he may also buy bread, eggs, or
butter. Once trained well, such models can be used to increase sales by
planning different offers.
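The milk-and-bread rule above is usually quantified with two measures, support and confidence. The sketch below is a minimal illustration, not a full association-rule miner; the five transactions and the function names `support` and `confidence` are made up for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "butter"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]

milk_support = support(transactions, {"milk"})                  # 4 of 5 baskets
rule_confidence = confidence(transactions, {"milk"}, {"bread"})  # 3 of the 4 milk baskets
print(milk_support, rule_confidence)
```

A store would typically keep only the rules whose support and confidence exceed chosen thresholds and plan offers around those.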
Some algorithms:
• K-Means Clustering
• DBSCAN – Density-Based Spatial Clustering of Applications with Noise
• BIRCH – Balanced Iterative Reducing and Clustering using Hierarchies
• Hierarchical Clustering
SEMI-SUPERVISED LEARNING
As the name suggests, this approach lies between supervised and unsupervised
techniques. We use it when only a small part of the data is labeled and the
remaining, larger portion is unlabeled. We can use unsupervised techniques to
predict labels and then feed these labels to supervised techniques. This
approach is mostly applicable to image data sets, where usually not all images
are labeled.
REINFORCEMENT LEARNING
• In this technique, the model keeps improving its performance by using
reward feedback to learn the behavior or pattern.
• These algorithms are specific to a particular problem, e.g. the Google
self-driving car, or AlphaGo, where a bot competes with humans and even with
itself to become a better and better performer at the game of Go.
• Each time we feed in data, the model learns and adds the data to its
knowledge, which serves as training data. So, the more it learns, the better
trained and hence more experienced it becomes.
REINFORCEMENT LEARNING
• The agent observes an input (the state).
• The agent performs an action by making a decision.
• After acting, the agent receives a reward and reinforces its behavior
accordingly; the model stores this information as state-action pairs.
• Temporal Difference (TD)
• Q-Learning
• Deep Adversarial Networks
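As a rough illustration of how a reward updates a stored state-action value, here is a single tabular Q-learning step. The state names, the action names, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for the example.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to q_table[state][action] and return the new value."""
    best_next = max(q_table[next_state].values())     # value of the best next action
    td_target = reward + gamma * best_next            # what the estimate should move toward
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state problem with two actions per state
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

# The agent was in s0, chose "right", received reward 0 and landed in s1
new_value = q_update(q, "s0", "right", reward=0.0, next_state="s1")
print(new_value)
```

Repeating this update over many observed (state, action, reward, next state) steps is what lets the stored state-action values improve with experience.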
Examples
TO BETTER FILTER EMAILS AS SPAM OR NOT
• Task – Classifying emails as spam or not

• Performance Measure – The fraction of emails accurately classified as
spam or not spam
• Experience – Observing you label emails as spam or not spam
A CHECKERS LEARNING PROBLEM
• Task – Playing checkers
• Performance Measure – Percent of games won against opponents
• Experience – Playing practice games against itself
HANDWRITING RECOGNITION PROBLEM
• Task – Recognizing and classifying handwritten words within images
• Performance Measure – Percent of words accurately classified
• Experience – A database of handwritten words with given classifications
A ROBOT DRIVING PROBLEM
• Task – Driving on public four-lane highways using vision sensors
• Performance Measure – Average distance traveled before an error
• Experience – A sequence of images and steering commands recorded while
observing a human driver
FRUIT PREDICTION PROBLEM
• Task – Recognizing different kinds of fruit
• Performance Measure – The variety of fruits the system can predict correctly
• Experience – Training on a large data set of fruit images
FACE RECOGNITION PROBLEM
• Task – Recognizing different types of faces
• Performance Measure – The variety of faces the system can predict correctly
• Experience – Training on a large data set of different face images
AUTOMATIC TRANSLATION OF DOCUMENTS
• Task – Translating a document from one language into another
• Performance Measure – How efficiently text is converted from one language
to the other
• Experience – Training on a large data set of text in different languages
Design a Learning System in Machine Learning
“Machine Learning enables a machine to automatically learn from data,
improve performance from experience, and predict things without being
explicitly programmed.”
When we feed training data to a machine learning algorithm, the algorithm
produces a mathematical model; with the help of this model, the machine
makes predictions and takes decisions without being explicitly programmed.
Also, the more the machine works with training data, the more experience it
gains, and the more experience it gains, the more efficient the results it
produces.
EXAMPLE:
In a driverless car, the training data fed to the algorithm covers how to drive
on a highway and on busy and narrow streets, with factors such as speed limits,
parking, and stopping at signals.
A logical and mathematical model is then created from this data, and the car
operates according to that model.
Also, the more data is fed in, the more efficient the output.
Steps for Designing a Learning System are:
STEP – 1: CHOOSING THE TRAINING EXPERIENCE:
• The first and most important task is to choose the training data or training
experience that will be fed to the machine learning algorithm.
• It is important to note that the data or experience we feed to the algorithm
has a significant impact on the success or failure of the model.
• So the training data or experience should be chosen wisely.
ATTRIBUTES OF THE TRAINING EXPERIENCE THAT IMPACT SUCCESS AND FAILURE
• The first important attribute is whether the training experience provides
direct or indirect feedback regarding the choices made.
• For example: while playing chess, the training data provides feedback such
as “if this move is chosen instead of that one, the chances of success
increase”.
• The second important attribute is the degree to which the learner controls
the sequence of training examples.
• For example: when training data is first fed to the machine, its accuracy is
very low; as it gains experience by playing again and again against itself or
an opponent, the algorithm receives feedback and controls the chess game
accordingly.
• The third important attribute is how well the training experience represents
the distribution of examples over which performance will be measured.
• For example, a machine learning algorithm gains experience by going through
a number of different cases and examples. The more examples it passes
through, the more experience it gains, and hence its performance increases.
STEP 2- CHOOSING TARGET FUNCTION
• Based on the knowledge fed to the algorithm, the learning system chooses a
target function such as NextMove, which describes which legal move should
be taken.
• For example: while playing chess, after the opponent has moved, the machine
learning algorithm determines the possible legal moves and decides which
one to take in order to succeed.
STEP 3- CHOOSING REPRESENTATION FOR TARGET FUNCTION
• Once the algorithm knows all the possible legal moves, the next step is to
choose the optimized move using some representation, e.g. linear equations,
a hierarchical graph representation, or a tabular form.
• The NextMove function then selects the target move, i.e. the one among these
moves that offers the highest success rate.
• For example: if the machine has 4 possible moves while playing chess, it
will choose the optimized move that leads to success.
STEP 4- CHOOSING FUNCTION APPROXIMATION ALGORITHM
• An optimized move cannot be chosen from the training data alone.
• The system has to work through a set of examples; from these examples it
approximates which steps should be chosen, and it then receives feedback on
those choices.
• For example: when chess training data is fed to the algorithm, the algorithm
will initially either fail or succeed; from that failure or success it
estimates which step to choose for the next move and what its success rate
is.
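To make Steps 3 and 4 concrete, the sketch below assumes the target function is represented as a linear equation over board features and is adjusted with a least-mean-squares (LMS) style update after each example. This is one common choice and is swapped in here for illustration; the slides do not prescribe it. The feature values, the training target and the learning rate are hypothetical.

```python
def predict(weights, features):
    """Linear target function: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, target, learning_rate=0.1):
    """Move each weight a small step in the direction that reduces the error."""
    error = target - predict(weights, features)
    updated = [weights[0] + learning_rate * error]            # bias term
    updated += [w + learning_rate * error * x
                for w, x in zip(weights[1:], features)]
    return updated

# Hypothetical board described by two features, with training value 1.0
weights = [0.0, 0.0, 0.0]
features = [1.0, 2.0]
target = 1.0

error_before = abs(target - predict(weights, features))
weights = lms_update(weights, features, target)
error_after = abs(target - predict(weights, features))
print(weights, error_before, error_after)
```

Each success or failure supplies a new training value, and repeated updates shrink the gap between the predicted and observed outcome.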
STEP 5- FINAL DESIGN
The final design emerges after the system has gone through a number of
examples, failures and successes, and correct and incorrect decisions, and
has learned what the next step should be.
Example: Deep Blue, IBM's chess computer, defeated the world chess champion
Garry Kasparov in a 1997 match, becoming the first computer to beat a reigning
world champion in a match under standard tournament conditions.
ISSUES IN MACHINE LEARNING
• What algorithms exist for learning general target functions from specific
training examples?
• In what settings will algorithms converge to the desired function, given
sufficient training data?
• Which algorithms perform best for which types of problems and
representations?
• How much training data is sufficient?

• What general bounds can be found to relate the confidence in learned
hypotheses to the amount of training experience and the character of the
learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process
of generalizing from examples?
• Can prior knowledge be helpful even when it is only approximately
correct?
• What is the best strategy for choosing a useful next training experience,
and how does the choice of this strategy alter the complexity of the
learning problem?
• What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the
system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?
WHY DATA PREPROCESSING?
➢ Data in the real world is dirty

✓ incomplete: lacking attribute values, lacking certain attributes of interest,
or containing only aggregate data
✓ noisy: containing errors or outliers
✓ inconsistent: containing discrepancies in codes or names
➢ No quality data, no quality mining results!
✓ Quality decisions must be based on quality data
✓ Data warehouse needs consistent integration of quality data
➢ A multi-dimensional measure of data quality:
✓ A well-accepted multi-dimensional view:
MAJOR TASKS IN DATA PREPROCESSING
• Data cleaning
• Fill in missing values, smooth noisy data, identify or remove outliers, and resolve
inconsistencies
• Data integration
• Integration of multiple databases, data cubes, files, or notes
• Data transformation
• Normalization (scaling to a specific range)
• Aggregation
• Data reduction
• Obtains reduced representation in volume but produces the same or similar analytical
results
• Data discretization: with particular importance, especially for numerical data
• Data aggregation, dimensionality reduction, data compression, generalization
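Normalization, listed above under data transformation, can be sketched with min-max scaling, which maps values linearly onto a chosen range. The function name `min_max_scale` and the sample incomes are illustrative assumptions.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that they span [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values are identical, so there is no spread to rescale
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

incomes = [10_000, 20_000, 30_000]   # hypothetical attribute values
scaled = min_max_scale(incomes)
print(scaled)
```

Scaling attributes to a common range keeps attributes with large raw values from dominating distance-based methods.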
Forms of data preprocessing
DATA CLEANING
• Data cleaning tasks
✓ Fill in missing values
✓ Identify outliers and smooth out noisy data
✓ Correct inconsistent data

MISSING DATA
• Data is not always available
✓ E.g., many tuples have no recorded value for several attributes, such as customer income
in sales data
• Missing data may be due to
✓ equipment malfunction
✓ data inconsistent with other recorded data and thus deleted
✓ data not entered due to misunderstanding
✓ certain data not considered important at the time of entry
✓ history or changes of the data not registered
• Missing data may need to be inferred

HOW TO HANDLE MISSING DATA?
• Ignore the tuple: usually done when class label is missing (assuming the task is
classification—not effective in certain cases)
• Fill in the missing value manually: tedious + infeasible?
• Use a global constant to fill in the missing value: e.g., “unknown”, a new class?!
• Use the attribute mean to fill in the missing value
• Use the attribute mean for all samples of the same class to fill in the
missing value: smarter
• Use the most probable value to fill in the missing value: inference-based such
as regression, Bayesian formula, decision tree
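Two of the strategies above, filling with the attribute mean and filling with the mean of samples from the same class, can be sketched as follows. Missing values are represented as `None`; the income figures and class labels are made up for the example.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace each missing value with the mean of all known values."""
    overall = mean(v for v in values if v is not None)
    return [overall if v is None else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace each missing value with the mean of its own class."""
    known = {}
    for value, label in zip(values, labels):
        if value is not None:
            known.setdefault(label, []).append(value)
    class_means = {label: mean(vals) for label, vals in known.items()}
    return [class_means[label] if value is None else value
            for value, label in zip(values, labels)]

# Hypothetical customer incomes (in thousands) with two missing entries
incomes = [30, None, 50, 90, None, 110]
classes = ["low", "low", "low", "high", "high", "high"]

filled_global = fill_with_mean(incomes)
filled_by_class = fill_with_class_mean(incomes, classes)
print(filled_global)
print(filled_by_class)
```

The class-wise version gives each missing entry a value typical of its own group, which is why the slide calls it the smarter option.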
NOISY DATA
Q: What is noise?
A: Random error in a measured variable.
• Incorrect attribute values may be due to
✓ faulty data collection instruments
✓ data entry problems
✓ data transmission problems
✓ technology limitation
✓ inconsistency in naming convention
• Other data problems that require data cleaning
✓ duplicate records
✓ incomplete data
✓ inconsistent data
HOW TO HANDLE NOISY DATA?
• Binning method:
✓ first sort data and partition into (equi-depth) bins
✓ then one can smooth by bin means, smooth by bin median, smooth by bin
boundaries, etc.
✓ used also for discretization
• Clustering
✓ detect and remove outliers
• Semi-automated method: combined computer and human
inspection
✓ detect suspicious values and check manually
• Regression
✓ smooth by fitting the data into regression functions
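The binning method above can be sketched as follows: sort the data, partition it into equi-depth bins, then smooth each bin by its mean or by its boundaries. The price list and function names are illustrative assumptions.

```python
from statistics import mean

def equi_depth_bins(values, n_bins):
    """Sort the values and split them into bins of (almost) equal size."""
    ordered = sorted(values)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])   # last bin takes any remainder
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]   # hypothetical data
bins = equi_depth_bins(prices, 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
print(bins)
print(by_means)
print(by_boundaries)
```

Smoothing by boundaries keeps only two distinct values per bin, which is why the same procedure also serves as a simple discretization.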
DATA VISUALIZATION
• Data visualization is an important skill to possess for anyone trying to
extract and communicate insights from data.
• Great business narratives and presentations often stem from brilliant
visualizations that convey the key ideas in a concise and aesthetic manner.
• In the field of machine learning, visualization plays a key role throughout
the entire process of analysis - to obtain relationships, observe trends and
portray the results as well.
NECESSITY OF DATA VISUALIZATION
• It is difficult for the human eye to decipher patterns from raw numbers
only.
• Sometimes, even the statistical information summarized from the data
may mislead you to wrong conclusions.
• Therefore, you should visualize the data often to understand how different
features are behaving.
DATA VISUALIZATION-RETAIL STORE SALES EXAMPLE
• Each of the branches had employed a different strategy to calculate its
discount rate, and the sales numbers were also quite different across all of
them.
• It is difficult to draw this type of insight and understand the difference
between each of the branches using raw numbers alone; therefore, we
should utilize an appropriate visualization technique to ‘look’ at the data.
FACTS AND DIMENSIONS
• Graphics and visuals, when used intelligently and innovatively, can convey
a lot more than what raw data alone can.
• Matplotlib serves the purpose of providing multiple functions to build
graphs from the data stored in your lists, arrays, etc.
There are two types of data, which are as follows:

• Facts
• Dimensions
• Facts and dimensions are different types of variables that help you
interpret data better.
• Facts are numerical data, and dimensions are metadata.
• Metadata explains the additional information associated with the factual
variable.
• Both facts and dimensions are equally important for generating actionable
insights from a given data set.
• For example, in a data set about the height of students in a class, the
height of the students would be a fact variable, whereas the gender of the
students would be a dimensional variable.
• You can use dimensions to slice data for easier analysis. In this case, the
distribution of height based on the gender of a student can be studied.
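Slicing a fact by a dimension, as in the height-by-gender example above, amounts to grouping records on the dimension and summarizing the fact within each group. The student records and the helper name `slice_fact_by_dimension` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def slice_fact_by_dimension(records, dimension, fact):
    """Group records by a dimension and return the mean of the fact per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[fact])
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical class data: height is the fact, gender is the dimension
students = [
    {"gender": "F", "height_cm": 158},
    {"gender": "M", "height_cm": 170},
    {"gender": "F", "height_cm": 166},
    {"gender": "M", "height_cm": 174},
]

mean_height = slice_fact_by_dimension(students, "gender", "height_cm")
print(mean_height)
```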
QUESTION
Consider a bank having thousands of ATMs across India. In every
transaction, the following variables are recorded:
1. Withdrawal amount
2. Account balance after withdrawal
3. Transaction charge amount
4. Customer ID
5. ATM ID
6. Date of withdrawal
Which among the following are fact variables?
DIMENSIONAL MODELLING
What are the benefits of having dimension variables apart from facts?
• Performing various types of analyses, such as sector-wise, country-wise
or funding type-wise analyses.
• Extracting specific, useful information such as the total investment made
in the automobile sector in India between 2014 and 2015.
BAR GRAPH
• Plots are used to convey different ideas.
• For example, you can use certain plots to visualize the spread of data
across two variables and other plots to gauge the frequency of a label.
• Depending on the objective of your visualization task, you can choose an
appropriate plot.
• A bar graph is helpful when you need to visualize a numeric feature (fact)
across multiple categories.
• import matplotlib.pyplot as plt: Import the plotting interface
• plt.bar(x_component, y_component): Used to draw a bar graph
• plt.show(): Explicit command required to display the plot object
• plt.xlabel(), plt.ylabel(): Specify labels for the x and y axes
• plt.title(): Add a title to the plot object.
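Put together, the calls listed above give a complete bar graph. The branch names and sales figures are invented for the example, and the `Agg` backend line is only there so the sketch also runs on a machine without a display; in an interactive session it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt

branches = ["North", "South", "East"]   # dimension (hypothetical categories)
sales = [120, 95, 140]                  # fact (hypothetical numeric values)

plt.bar(branches, sales)                # one bar per category
plt.xlabel("Branch")
plt.ylabel("Sales")
plt.title("Sales by branch")
ax = plt.gca()                          # keep a handle to the axes for inspection
plt.show()                              # displays the plot in an interactive session
```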
SCATTER PLOT
• Scatter plot, as the name suggests, displays how the variables are spread
across the range considered. It can be used to identify a relationship or
pattern between two quantitative variables and the presence of outliers
within them.
• plt.scatter(x_axis, y_axis)
• plt.scatter(x_axis, y_axis, c = color, label = labels)
• Another feature of a scatter plot allows you to use labels to further
distinguish points over another dimension variable.
SCATTER PLOT-QUESTION
Select the cases where a scatterplot would be helpful in generating insights.
1. To check whether a relationship exists between the age of a person and
their income.
2. To check whether there are any irregular entries in the data range.
3. To check whether stock prices are positively related to the profit of a
company.
4. To understand the distribution of the salaries of the employees in a
company.
LINE GRAPH AND HISTOGRAM
• A line graph is used to present continuous time-dependent data. It accurately
depicts the trend of a variable over a specified time period.
• A line chart or line plot or line graph or curve chart is a type of chart which
displays information as a series of data points called 'markers' connected by
straight line segments.
• A line graph can be helpful when you want to identify the trend of a particular
variable. Some key industries and services that rely on line graphs include
financial markets and weather forecasting.
• plt.plot(x_axis, y_axis)
• plt.yticks(rotation = number) #could do for xticks as well

• A histogram is a frequency chart that records the number of occurrences of
an entry or an element in a data set. It can be useful when you want to
understand the distribution of a given series.
• A histogram is a plot that lets you discover, and show, the underlying
frequency distribution (shape) of a set of continuous data.
• plt.hist(profit, bins = 100,edgecolor='Orange',color='cyan')
BOX PLOT
• Box plots are quite effective in summarizing the spread of a large data set
into a visual representation. They use percentiles to divide the data range.
• The percentile value gives the percentage of data points that fall below
a chosen data point when all the data points are arranged in ascending
order.
• For example, if a data point with a value of 700 has a percentile value of
99% in a data set, then it means that 99% of the values in the data set are
less than 700.
• A Box and Whisker Plot (or Box Plot) is a convenient way of visually
displaying the data distribution through their quartiles.
• The lines extending parallel from the boxes are known as the “whiskers”,
which are used to indicate variability outside the upper and lower
quartiles.
• Outliers are sometimes plotted as individual dots that are in-line with
whiskers. Box Plots can be drawn either vertically or horizontally.
• plt.boxplot([ list_1, list_2])

Box plots divide the data range into three important categories, which are as
follows:
• Median value: This is the value that divides the data range into two equal
halves, i.e., the 50th percentile.
• Interquartile range (IQR): These data points range between the 25th and
75th percentile values.
• Outliers: These are data points that differ significantly from other
observations and lie beyond the whiskers.
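The three quantities above can be computed directly. The sketch below assumes the common convention that whiskers extend 1.5 × IQR beyond the quartiles, which the slides do not state explicitly; the data values and the function name are made up.

```python
from statistics import quantiles

def box_plot_summary(values, whisker_factor=1.5):
    """Return quartiles, IQR and the points lying beyond the whiskers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr   # assumed whisker convention
    upper_fence = q3 + whisker_factor * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
summary = box_plot_summary(data)
print(summary)
```

The same numbers are what `plt.boxplot` draws: the box spans the 25th to 75th percentile, the line inside it marks the median, and isolated points mark outliers.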
CHOOSING PLOT TYPES
• Each of the plot types is good at communicating a specific type of
information. Which means, in certain situations, certain plot types are
preferred over the others.
• So, how do you select the best possible plot type in a given situation?
• To answer this question, you need to first define the objective of creating a
plot.
• A good visualization, along with the right type of graph, presents the
relationship between different variables effectively and allows you to
analyze them at a quick glance.
CHOOSING PLOT TYPES- COMPARISON
• These charts can be used when you want to
compare one set of values with other sets of
values.
• The objective is to differentiate one particular
set of values from the other sets, for example,
quarterly sales of competing phones in the
market.
• The following two types of charts are used to
show a comparison:
1. Column chart
2. Bar chart
CHOOSING PLOT TYPES- COMPOSITION
• You would need to use a composition chart to display
how the various elements make up the complete data.
• Composition charts can be static, which shows the
composition at a particular instance of time, or
dynamic, which shows the changes in the
composition over a period of time.
• Two of the popular composition charts are as follows:
• Pie/ Doughnut Chart
• Stacked Column chart
• The pie chart is by far the most common way to
represent static composition, while the stacked
column chart can be used to show the variation of
composition over a period of time.
CHOOSING PLOT TYPES- RELATIONSHIP
• A relationship chart helps in visualizing the correlation between variables.
• It can help in answering questions such as ‘Is there a correlation between the
amount spent on marketing and the sales revenue?’ and ‘How does the gross
profit vary with the change in offers?’.
• Two of the most common types of charts used to visualize relationships
between variables are as follows:
• Scatter plot
• Bubble Plot
• A scatter plot can help correlate two variables, whereas a bubble chart
adds one more dimension, i.e., the size of the bubble (usually indicative of
the frequency of occurrence of that particular data point)
CHOOSING PLOT TYPES- DISTRIBUTION
• A distribution chart tries to answer the question ‘How is the data
distributed?’.
• For example, suppose you asked everyone their age in a survey.
• Using a distribution chart will help you visualize the distribution of ages in
the data set.
• The distribution can be over a variable, or it can also be over a period of
time. Two of the most used charts for visualizing distribution are as
follows:
• Histogram
• Scatter plots
• Histograms are quite good at displaying the distribution of data over
intervals, whereas scatter plots are good at visualizing the distribution of
data over two different variables.
Supervised and Unsupervised Learning
Supervised Learning vs Unsupervised Learning
[Figure: the same set of images shown under both settings. Supervised learning
(𝑥 → 𝑦, classification) pairs each image with a label (cat, dog or bear);
unsupervised learning (𝑥 only, clustering) groups the images without labels.]
Supervised Learning Examples
• Classification (e.g. image → cat)
• Face Detection
• Language Parsing
• Structured Prediction
[Figure: each example expressed as a function of its input, e.g. cat = 𝑓(image).]
Supervised Learning – k-Nearest Neighbors
[Figure: training images labeled cat, dog and bear. With k=3, a query image
whose three nearest neighbors are labeled cat, cat, dog is classified as cat;
a second query image has the nearest neighbors bear, dog, dog.]
• How do we choose the right K?
• How do we choose the right features?
• How do we choose the right distance metric?
Answer: Just choose the one combination that works best!
BUT not on the test data.
Instead, split the training data into a “Training set” and a “Validation set”
(also called “Development set”).
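A minimal k-nearest-neighbors sketch, including the idea of choosing k on a validation set instead of the test set. The 2-D feature vectors standing in for images, the squared Euclidean distance, and the candidate values of k are all assumptions made for the example.

```python
from collections import Counter

def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; return the majority label."""
    nearest = sorted(train, key=lambda pair: squared_distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def choose_k(train, validation, candidates=(1, 3, 5)):
    """Pick the k with the best accuracy on the validation set."""
    def validation_accuracy(k):
        hits = sum(knn_predict(train, x, k) == y for x, y in validation)
        return hits / len(validation)
    return max(candidates, key=validation_accuracy)

# Hypothetical 2-D features for labeled training images
train = [
    ((0, 0), "cat"), ((0, 1), "cat"), ((1, 0), "cat"),
    ((5, 5), "dog"), ((5, 6), "dog"), ((6, 5), "dog"),
    ((10, 0), "bear"), ((10, 1), "bear"), ((11, 0), "bear"),
]
validation = [((0.5, 0.5), "cat"), ((5.5, 5.5), "dog"), ((10.5, 0.5), "bear")]

prediction = knn_predict(train, (0.4, 0.6), k=3)
best_k = choose_k(train, validation)
print(prediction, best_k)
```

The same validation loop could compare feature sets or distance metrics; only the final chosen combination is then evaluated once on the test data.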
Unsupervised Learning – k-means clustering (k=3)
1. Initially assign all images to a random cluster.
2. Compute the mean image (in feature space) for each cluster.
3. Reassign images to clusters based on similarity to cluster means.
4. Keep repeating this process until convergence.
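The four steps above can be sketched as follows. The 2-D points stand in for image features, similarity is taken to be squared Euclidean distance, and an empty cluster is re-seeded with a random point; these are assumptions for the example. An explicit starting assignment can be passed in so that the run is reproducible.

```python
import random

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, assignment=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: start from a random (or supplied) cluster assignment
    if assignment is None:
        assignment = [rng.randrange(k) for _ in points]
    else:
        assignment = list(assignment)
    means = []
    for _ in range(max_iter):
        # Step 2: compute the mean of each cluster
        means = []
        for cluster in range(k):
            members = [p for p, a in zip(points, assignment) if a == cluster]
            # Assumption: an empty cluster is re-seeded with a random point
            means.append(mean_point(members) if members else rng.choice(points))
        # Step 3: reassign each point to the closest cluster mean
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, means

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2, assignment=[0, 1, 0, 1, 0, 1])
print(labels, centers)
```

Because the result can depend on the starting assignment, a common practice is to run the procedure from several random starts and keep the best clustering.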
Unsupervised Learning – k-Means clustering
• How sensitive is this method with respect to the random assignment of
clusters?
Answer: Just choose the one combination that works best!
BUT not on the test data.
Instead, split the training data into a “Training set” and a “Validation set”
(also called “Development set”).
Supervised Learning - Classification
[Figure: labeled images (cat, dog, bear) divided into Training Data and Test Data.]
Training Data
𝑥1 = [image 1] 𝑦1 = [cat]
𝑥2 = [image 2] 𝑦2 = [dog]
𝑥3 = [image 3] 𝑦3 = [cat]
...
𝑥𝑛 = [image n] 𝑦𝑛 = [bear]
Training Data
inputs                                       targets / labels / ground truth    predictions
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$     $y_1 = 1$                          $\hat{y}_1 = 1$
$x_2 = [x_{21}\ x_{22}\ x_{23}\ x_{24}]$     $y_2 = 2$                          $\hat{y}_2 = 2$
$x_3 = [x_{31}\ x_{32}\ x_{33}\ x_{34}]$     $y_3 = 1$                          $\hat{y}_3 = 2$
...
$x_n = [x_{n1}\ x_{n2}\ x_{n3}\ x_{n4}]$     $y_n = 3$                          $\hat{y}_n = 1$
We need to find a function that maps x to y for any of them:
$\hat{y}_i = f(x_i; \theta)$
How do we "learn" the parameters $\theta$ of this function?
We choose the ones that make the following quantity small:
$\sum_{i=1}^{n} \mathrm{Cost}(\hat{y}_i, y_i)$
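As a small worked example of the summed cost, the sketch below uses the four label and prediction pairs listed in the table (rows 1, 2, 3 and n). The slides do not say which Cost function is used, so a zero-one cost (0 for a correct prediction, 1 for a wrong one) is assumed here purely for illustration.

```python
def zero_one_cost(y_hat, y):
    """Assumed cost: 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y_hat == y else 1

# Ground-truth labels and predictions for the rows shown in the table.
y_true = [1, 2, 1, 3]
y_pred = [1, 2, 2, 1]

# Sum of Cost(y_hat_i, y_i) over the listed examples.
total_cost = sum(zero_one_cost(p, t) for p, t in zip(y_pred, y_true))
print(total_cost)  # 2: rows 3 and n are misclassified
```

Learning then means searching for parameters that make this total smaller. In practice a smooth cost is usually preferred over zero-one cost, because it gives a usable signal for adjusting the parameters.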
Supervised Learning – Linear Softmax
inputs                                       labels / ground truth
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$     $y_1 = 1$
$x_2 = [x_{21}\ x_{22}\ x_{23}\ x_{24}]$     $y_2 = 2$
$x_3 = [x_{31}\ x_{32}\ x_{33}\ x_{34}]$     $y_3 = 1$
...
$x_n = [x_{n1}\ x_{n2}\ x_{n3}\ x_{n4}]$     $y_n = 3$

inputs                                       labels / ground truth    predictions
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$     $y_1 = [1\ 0\ 0]$        $\hat{y}_1 = [0.85\ 0.10\ 0.05]$
$x_2 = [x_{21}\ x_{22}\ x_{23}\ x_{24}]$     $y_2 = [0\ 1\ 0]$        $\hat{y}_2 = [0.20\ 0.70\ 0.10]$
$x_3 = [x_{31}\ x_{32}\ x_{33}\ x_{34}]$     $y_3 = [1\ 0\ 0]$        $\hat{y}_3 = [0.40\ 0.45\ 0.05]$
...
$x_n = [x_{n1}\ x_{n2}\ x_{n3}\ x_{n4}]$     $y_n = [0\ 0\ 1]$        $\hat{y}_n = [0.40\ 0.25\ 0.35]$
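The relation between the integer labels (1, 2, 3) and the vector form can be sketched as follows. The helper names `one_hot` and `predicted_class` are hypothetical; the label and prediction values are copied from the table rows, and class indices are taken to be 1-based as in the slides.

```python
def one_hot(label, num_classes):
    """Vector with a 1 in the position of `label` (1-based) and 0 elsewhere."""
    return [1 if c == label else 0 for c in range(1, num_classes + 1)]

def predicted_class(probabilities):
    """1-based index of the largest entry of a prediction vector."""
    return max(range(len(probabilities)), key=lambda c: probabilities[c]) + 1

# Integer labels for the rows listed in the table (rows 1, 2, 3 and n).
labels = [1, 2, 1, 3]
targets = [one_hot(y, 3) for y in labels]

# Prediction vectors copied from the table.
predictions = [
    [0.85, 0.10, 0.05],
    [0.20, 0.70, 0.10],
    [0.40, 0.45, 0.05],
    [0.40, 0.25, 0.35],
]
predicted = [predicted_class(p) for p in predictions]
print(targets)
print(predicted)
```

Taking the largest entry of each prediction vector gives the classes 1, 2, 2, 1, which differ from the labels in rows 3 and n.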

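$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$     $y_i = [1\ 0\ 0]$     $\hat{y}_i = [f_c\ f_d\ f_b]$
$g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c$
$g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d$
$g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b$
$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$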
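A minimal sketch of the linear softmax computation, assuming made-up weights, biases, and a 4-feature input (none of these values come from the slides). `linear_scores` computes one linear score per class (cat, dog, bear) and `softmax` turns the scores into values that sum to 1; both function names are illustrative.

```python
import math

def linear_scores(x, weights, biases):
    """One score per class: g = w1*x1 + w2*x2 + w3*x3 + w4*x4 + b."""
    return [
        sum(w * xj for w, xj in zip(w_row, x)) + b
        for w_row, b in zip(weights, biases)
    ]

def softmax(scores):
    """f_k = exp(g_k) / sum_j exp(g_j)."""
    # Subtracting the maximum score avoids overflow and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(g - shift) for g in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical input and parameters; rows are the cat, dog, and bear classes.
x_i = [1.0, 2.0, 0.5, 0.0]
weights = [
    [0.5, 0.2, -0.1, 0.3],   # w_c
    [-0.3, 0.1, 0.4, 0.0],   # w_d
    [0.1, -0.2, 0.2, 0.5],   # w_b
]
biases = [0.1, 0.0, -0.1]    # b_c, b_d, b_b

g = linear_scores(x_i, weights, biases)
y_hat = softmax(g)           # [f_c, f_d, f_b]
print(g)
print(y_hat)
```

The class with the largest score always receives the largest output, and equal scores give equal outputs of 1/3 each.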

