UNIT-I
INTRODUCTION – Learning, Types of Learning, Well defined learning problems, Designing a Learning
System, History of ML, Introduction of Machine Learning Approaches – (Artificial Neural Network,
Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian networks, Support Vector
Machine, Genetic Algorithm), Issues in Machine Learning and Data Science Vs Machine Learning;
Learning: Definition
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a human
driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
Definition
A computer program which learns from experience is called a machine learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.
History of ML
Machine learning grew out of the mathematical modeling of neural networks. A paper by
logician Walter Pitts and neuroscientist Warren McCulloch, published in 1943, attempted to
mathematically map out thought processes and decision making in human cognition.
In 1950, Alan Turing proposed the Turing Test, which became the litmus test for deeming a machine
"intelligent" or "unintelligent." To be considered "intelligent," a machine had to convince a human
being that it, the machine, was also a human being.
AI and machine learning algorithms are not new. The field of AI dates back to the 1950s. Arthur
Lee Samuel, an IBM researcher, developed one of the earliest machine learning programs, a
self-learning program for playing checkers, and coined the term machine learning. He explained his
approach in a paper published in the IBM Journal of Research and Development in 1959.
Over the decades, AI techniques have been widely used as a method of improving the performance of
underlying code. In the last few years, with the focus on distributed computing models and cheaper
compute and storage, there has been a surge of interest in AI and machine learning that has led to a
huge amount of money being invested in startup software companies.
Components of the learning process
1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the learning
process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store
data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction. Abstraction is the process of
extracting knowledge about stored data. This involves creating general concepts about the data as a
whole. The creation of knowledge involves application of known models and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.
3. Generalization
The third component of the learning process is known as generalization. The term generalization
describes the process of turning the knowledge about stored data into a form that can be utilized for
future action. These actions are to be carried out on tasks that are similar, but not identical, to those
that have been seen before. In generalization, the goal is to discover those properties of the data that
will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process. It is the process of giving feedback to the user
to measure the utility of the learned knowledge. This feedback is then utilized to effect improvements in
the whole learning process.
Applications of Machine Learning
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, and so on in digital images. A popular use case of image recognition and face
detection is automatic friend tagging suggestion. Facebook provides a feature that suggests friends to
tag: whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion
with their names, and the technology behind this is machine learning's face detection and recognition
algorithm.
2. Speech Recognition
While using Google, we get an option of "Search by voice." It comes under speech recognition, which is a
popular application of machine learning. Speech recognition is the process of converting voice
instructions into text, and it is also known as "Speech to text" or "Computer speech recognition." At
present, machine learning algorithms are widely used by various applications of speech recognition.
Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice
instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with
the shortest route and predicts the traffic conditions. It predicts whether traffic is clear,
slow-moving, or heavily congested in two ways:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
Everyone who uses Google Maps helps to make the app better. It takes information from the
user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies, such
as Amazon and Netflix, for product recommendation to the user. Whenever we search for some
product on Amazon, we start getting advertisements for the same product while surfing the internet
on the same browser, and this is because of machine learning. Google understands the user's
interests using various machine learning algorithms and suggests products accordingly.
Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc.,
and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, in which machine
learning plays a significant role. Tesla, a well-known car manufacturer, is working on
self-driving cars. It uses machine learning to train the car models to detect people and
objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important
mail arrives in our inbox marked with the important symbol, and spam emails go to our spam
box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Machine learning algorithms such as Multi-Layer Perceptron, Decision Tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the
name suggests, they help us find information using our voice instructions. These assistants can help us
in various ways just by voice instructions, such as playing music, calling someone, opening an email,
or scheduling an appointment. Machine learning algorithms are an important part of these virtual
assistants. The assistants record our voice instructions, send them to a server in the cloud,
decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways that a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money stolen in the middle of a
transaction. To detect this, a feed-forward neural network checks whether a transaction is
genuine or fraudulent. For each genuine transaction, the output is converted into
some hash values, and these values become the input for the next round. Each genuine transaction
follows a specific pattern that changes for a fraudulent transaction; the network detects this change
and so makes our online transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of
ups and downs in share prices, so a long short-term memory (LSTM) neural network is used for
the prediction of stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With this, medical technology is
growing very fast and is able to build 3D models that can predict the exact position of lesions in the
brain. It helps in finding brain tumors and other brain-related diseases easily.
11. Automatic Language Translation:
Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
because machine learning helps us by converting text into languages we know. Google's GNMT
(Google Neural Machine Translation) provides this feature. It is a neural machine translation system
that translates text into our familiar language, and this is called automatic translation.
Learning Models
Machine learning is concerned with using the right features to build the right models that
achieve the right tasks. For a given problem, the collection of all possible outcomes represents the
sample space or instance space. Learning models are divided into three categories:
Logical models: using a logical expression.
Geometric models: using the geometry of the instance space.
Probabilistic models: using probability to classify the instance space.
Grouping and Grading
The learning process starts with task T, performance measure P and training experience E, and the
objective is to find an unknown target function. The target function is the exact knowledge to be
learned from the training experience, and it is unknown.
When we want to design a learning system that follows the learning process, we need to consider a few
design choices. The design choices will be to decide the following key components:
1. Type of training experience
2. Choosing the Target Function
3. Choosing a representation for the Target Function
4. Choosing an approximation algorithm for the Target Function
5. The final Design
We will look at the checkers learning problem and apply the above design choices. For
a checkers learning problem, the three elements will be:
1. Task T: To play checkers
2. Performance measure P: Percent of games won in the tournament
3. Training experience E: A set of games played against itself
Key Terminology
Classifier: A method that receives a new input as an unlabeled instance of an observation or feature
and identifies a category or class to which it belongs. Many commonly used classifiers employ statistical
inference (probability measure) to categorize the best label for a given instance.
Confusion matrix (aka error matrix): A matrix that summarizes the performance of a classification
algorithm. It compares the predicted classification against the actual
classification in the form of false positive, true positive, false negative and true negative counts.
Accuracy (aka error rate): The rate of correct (or incorrect) predictions made by the model over
a dataset. Accuracy is usually estimated by using an independent test set that was not used at
any time during the learning process. More complex accuracy estimation techniques, such as
cross-validation and bootstrapping, are commonly used, especially with datasets containing a small
number of instances.
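The confusion matrix and accuracy definitions above can be illustrated with a minimal sketch for a
binary classifier. The label lists and the function names `confusion_counts` and `accuracy` are
hypothetical and chosen only for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive and a != positive:
            fp += 1          # predicted positive, actually negative
        elif p != positive and a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def accuracy(counts):
    """Fraction of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total


# Hypothetical labels for eight test instances (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
acc = accuracy(counts)
error_rate = 1 - acc
print(counts, acc, error_rate)
```

Accuracy and error rate always sum to one, which is why the notes treat them as two views of the same
quantity.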
Cost: A measurement of the performance (or accuracy) of a model that quantifies the deviation between
predicted and actual values (or class labels). An optimization function attempts to minimize the cost
function.
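As one possible illustration of a cost function, the sketch below uses mean squared error. The notes do
not name a specific cost function, so this choice and the sample values are assumptions.

```python
def mean_squared_error(actual, predicted):
    """Average squared deviation between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical target values and model predictions.
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]

cost = mean_squared_error(actual, predicted)
print(cost)
```

A model whose predictions match the actual values exactly would have a cost of zero, which is the
minimum an optimizer could reach for this measure.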
Cross-validation: A verification technique that evaluates the generalization ability of a model on an
independent dataset. It sets aside a dataset that is used for testing the trained model during the
training phase to check for overfitting. Cross-validation can also be used to evaluate the performance
of various prediction functions. In k-fold cross-validation, the training dataset is arbitrarily
partitioned into k mutually exclusive subsamples (or folds) of equal size. The model is trained k
times, where each iteration uses one of the k subsamples for testing (cross-validating), and the
remaining k-1 subsamples are used for training the model. The k results of cross-validation
are averaged to produce a single estimate of the accuracy.
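The k-fold procedure described above can be sketched as follows. This is a minimal illustration, not a
library implementation: the function names are hypothetical, the folds are contiguous rather than
randomly shuffled, and `train_and_score` stands for any user-supplied routine that trains on one set of
indices and returns a score on the other.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any remainder over the first folds so sizes differ by at most one.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


def cross_validate(n_samples, k, train_and_score):
    """Average the score returned by train_and_score over the k folds."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / k


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for
training k-1 times.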
Data mining: The process of knowledge discovery or pattern detection in a large dataset. The methods
involved in data mining aid in extracting the accurate data and transforming it to a known structure for
further evaluation.
Dataset: A collection of data that conform to a schema with no ordering requirements. In a
typical dataset, each column represents a feature and each row represents a member of the
dataset.
Dimension: A set of attributes that defines a property. The primary functions of dimension are filtering,
classification, and grouping.
Induction algorithm: An algorithm that uses the training dataset to generate a model that
generalizes beyond the training dataset.
Instance: An object characterized by feature vectors from which the model is either trained for
generalization or used for prediction.
Knowledge discovery: The process of abstracting knowledge from structured or unstructured sources
to serve as the basis for further exploration. Such knowledge is collectively represented as a schema
and can be condensed in the form of a model or models to which queries can be made for
statistical prediction, evaluation, and further knowledge discovery.
Model: A structure that summarizes a dataset for description or prediction. Each model can be
tuned to the specific requirements of an application. Applications in big data have large datasets with
many predictors and features that are too complex for a simple parametric model to extract
useful information. The learning process synthesizes the parameters and the structures of a
model from a given dataset.
Online Analytical Processing (OLAP): An approach for resolving multidimensional analytical queries.
Such queries index into the data with two or more attributes (or dimensions). OLAP encompasses a
broad class of business intelligence data and is usually synonymous with multidimensional OLAP
(MOLAP). OLAP engines facilitate the exploration of multidimensional data interactively from
several perspectives, thereby allowing for complex analytical and ad hoc queries with a rapid
execution time.
Types of Learning
In general, machine learning algorithms can be classified into three types.
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised learning
Supervised learning is a learning mechanism that infers the underlying relationship between the
observed data (also called input data) and a target variable (a dependent variable or label) that
is subject to prediction. The learning task uses the labeled training data (training examples) to
synthesize the model function that attempts to generalize the underlying relationship between the
feature vectors (input) and the supervisory signals (output). The feature vectors influence the
direction and magnitude of change in order to improve the overall performance of the function
model. The training data comprise observed input (feature) vectors and a desired output value
(also called the supervisory signal or class label).
Supervised learning deals with or learns with "labeled" data. This implies that some data is already
tagged with the correct answer.
Types:-
Regression
Logistic Regression
Classification
Naive Bayes Classifiers
K-NN (k nearest neighbors)
Decision Trees
Support Vector Machine
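As a small illustration of supervised learning from labeled data, the sketch below implements K-NN, one
of the classifiers listed above. The training points, labels, and the function name `knn_predict` are
hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled training data: feature vectors and their class labels.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

pred_a = knn_predict(train_X, train_y, (1.1, 1.0))
pred_b = knn_predict(train_X, train_y, (5.0, 4.8))
print(pred_a, pred_b)
```

The labels in `train_y` play the role of the supervisory signal: the classifier never sees a rule for
the classes, only examples tagged with the correct answer.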
Advantages:-
Supervised learning allows us to collect data or produce a data output from previous experience.
Helps to optimize performance criteria with the help of experience.
Supervised machine learning helps to solve various types of real-world computation problems.
Disadvantages:-
Classifying big data can be challenging.
Training for supervised learning needs a lot of computation time.
Unsupervised learning
Unsupervised learning works with data that is not labeled. Two common kinds of unsupervised learning
problem are clustering and association.
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the
most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them
according to the presence or absence of those commonalities.
o Association: Association rule learning is an unsupervised learning method used for finding
the relationships between variables in a large database. It determines the sets of items that
occur together in the dataset. Association rules make marketing strategy more effective. For
example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A
typical example of an association rule application is Market Basket Analysis.
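The bread and butter example above can be quantified with support and confidence, two standard measures
for association rules that these notes do not define. The sketch below is a minimal illustration; the
transactions and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

s = support(transactions, {"bread", "butter"})
c = confidence(transactions, {"bread"}, {"butter"})
print(s, c)
```

Here bread and butter appear together in two of four baskets, and two of the three baskets containing
bread also contain butter.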
Reinforcement Learning
o Reinforcement Learning is a feedback-based Machine learning technique in which an agent
learns to behave in an environment by performing the actions and seeing the results of actions.
For each good action, the agent gets positive feedback, and for each bad action, the agent gets
negative feedback or penalty.
o In Reinforcement Learning, the agent learns automatically using feedback without any labeled
data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn from its experience only.
o RL solves a specific type of problem where decision making is sequential, and the goal is
long-term, such as game-playing, robotics, etc.
o Value, V(s): The expected long-term return with the discount factor applied, as opposed to the
short-term reward.
o Q-value, Q(s, a): Mostly similar to the value, but it takes one additional parameter, the current
action (a).
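To make the Q-value concrete, the sketch below shows a single tabular Q-learning update. Q-learning is
one common reinforcement learning algorithm and is used here as an assumption, since the notes do not
name one; the states, actions, reward, learning rate `alpha`, and discount factor `gamma` are all
hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table `q` and return the new Q-value."""
    # Best value the agent could obtain from the next state (0.0 if unseen).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward reward + discounted future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


actions = ["left", "right"]
q = {}

# The agent takes "right" in state "s0", receives reward 1.0, and lands in "s1".
v1 = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
v2 = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(v1, v2)
```

Repeating the same rewarded experience moves the estimate closer to the reward each time, which is the
positive feedback described above expressed as arithmetic.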
Advantages
Reinforcement learning is used to solve complex problems that cannot be solved by conventional
techniques. This learning model is very similar to the way human beings learn. Hence, it can come
close to achieving perfection.
Disadvantages
Too much reinforcement learning can lead to an overload of states, which can diminish the results.
It is also not preferable for solving simple problems. The curse of dimensionality limits reinforcement
learning for real physical systems.
Artificial Neural Network
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations to find
hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden layer, which finally results in the
output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and includes a
bias. This computation is represented in the form of a transfer function.
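The weighted sum with bias described above can be sketched for a single neuron. The sigmoid activation
applied to the sum is an assumption, since the notes do not name a particular function, and the inputs
and weights are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus bias, then apply a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid squashes the weighted sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-weighted_sum))


# Hypothetical inputs and weights: 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0
out = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(out)
```

A hidden layer repeats this computation for many neurons at once, each with its own weights and bias.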
Disadvantages of ANN
Determining the network structure:
There is no particular guideline for determining the structure of artificial neural networks. The
appropriate network structure is found through experience and trial and error.
Unrecognized behavior of the network:
It is the most significant issue of ANN. When an ANN produces a solution, it does not provide insight
into why and how it was reached. This decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, as per their structure.
Therefore, their realization depends on the available equipment.
Difficulty of showing the issue to the network:
ANNs work with numerical data. Problems must be converted into numerical values before being
introduced to an ANN. The presentation mechanism chosen here directly impacts the
performance of the network, and it relies on the user's abilities.
The duration of the network is unknown:
Training stops when the error is reduced to a specific value, and this value does not guarantee
optimum results.
Clustering
Clustering is a way of grouping the data points into different clusters, each consisting of similar
data points. Objects with possible similarities remain in a group that has few or no similarities with
another group.
The clustering methods are broadly divided into hard clustering (a data point belongs to only one group)
and soft clustering (a data point can also belong to another group). Various other approaches to
clustering also exist. Below are the main clustering methods used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
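As an illustration of partitioning clustering with hard assignments, the sketch below implements a
minimal k-means loop. K-means is chosen here as an assumption, since the notes do not name a specific
algorithm; the points and starting centroids are hypothetical, and the centroids are supplied by the
caller rather than chosen at random.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Every point ends up in exactly one cluster, which is what distinguishes hard clustering from soft or
fuzzy clustering.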
The clustering technique can be used in a wide variety of tasks. Some of its most common uses are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Decision Tree Learning
Decision Tree is a supervised learning technique that can be used for both classification and regression
problems, but it is mostly preferred for solving classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome. In a decision tree, there are two types of nodes:
the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple
branches, whereas leaf nodes are the output of those decisions and do not contain any further
branches.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting
a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes,
and the sub-nodes are called its child nodes. The root node is a parent node with no parent of its own.
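The terminology above can be illustrated with a small hand-built tree. This sketch shows only how a
finished tree classifies an instance; the tree is not learned from data, and its features and values
are hypothetical.

```python
# Decision nodes are dicts holding a feature and its branches; leaf nodes are plain labels.
tree = {
    "feature": "outlook",                      # root node
    "branches": {
        "sunny": {                             # decision node (child of the root)
            "feature": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",                     # leaf node
        "rainy": "no",                         # leaf node
    },
}


def predict(node, instance):
    """Follow the branch matching the instance's feature value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["feature"]]]
    return node


label_1 = predict(tree, {"outlook": "sunny", "humidity": "normal"})
label_2 = predict(tree, {"outlook": "rainy", "humidity": "high"})
print(label_1, label_2)
```

Pruning, in this representation, would amount to replacing a decision node such as the "sunny" subtree
with a single leaf label.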
Bayesian networks
Bayesian networks are used for tasks such as detection, diagnostics, automated insight, reasoning,
time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it consists of
two parts:
o Directed acyclic graph
o Table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between
random variables. These directed links or arrows connect pairs of nodes in the graph.
A link indicates that one node directly influences the other node, and if there is no directed
link between two nodes, they are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
o If we are considering node B, which is connected with node A by a directed arrow,
then node A is called the parent of Node B.
o Node C is independent of node A.
A Bayesian network has two main components:
o Causal component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which
determines the effect of the parent on that node.
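The role of the conditional probability distribution P(Xi | Parent(Xi)) can be illustrated with a
minimal two-node network in which A is the parent of B. All probability values below are hypothetical
and chosen only for illustration.

```python
# Prior for the parent node A, and the conditional table P(B | A) for its child B.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {
    True: {True: 0.9, False: 0.1},   # P(B | A = True)
    False: {True: 0.2, False: 0.8},  # P(B | A = False)
}


def joint(a, b):
    """Joint probability P(A = a, B = b) = P(A = a) * P(B = b | A = a)."""
    return p_a[a] * p_b_given_a[a][b]


# Marginal P(B = True), obtained by summing the joint over the parent's values.
p_b_true = sum(joint(a, True) for a in (True, False))

# Diagnostic query P(A = True | B = True) by Bayes' rule.
p_a_given_b = joint(True, True) / p_b_true
print(p_b_true, p_a_given_b)
```

The joint distribution is the product of each node's distribution given its parents, so the graph and
the probability tables together fully specify the model.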
Support Vector Machine
Support Vector Machine or SVM is one of the most popular supervised learning algorithms, which is
used for classification as well as regression problems. However, it is primarily used for classification
problems in machine learning. The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we can easily put a new data
point in the correct category in the future. This best decision boundary is called a hyperplane. SVM
chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed Support Vector Machine. Consider the diagram below,
in which two different categories are classified using a decision boundary or
hyperplane:
Advantages of SVM:
Effective in high-dimensional cases
It is memory efficient, as it uses a subset of training points in the decision function called support
vectors
Different kernel functions can be specified for the decision functions, and it is possible to specify
custom kernels
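The decision boundary described in the SVM paragraph above can be sketched for the linear case. The
example only shows how an already trained hyperplane classifies new points; finding the hyperplane from
the support vectors is omitted, and the weight vector `w` and offset `b` are hypothetical values.

```python
def decision_function(w, b, x):
    """Signed score of point x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 on or above the hyperplane and class -1 below it."""
    return 1 if decision_function(w, b, x) >= 0 else -1


# Hypothetical hyperplane in two dimensions: x1 + x2 - 3 = 0.
w, b = (1.0, 1.0), -3.0

label_pos = classify(w, b, (2.0, 2.0))
label_neg = classify(w, b, (0.5, 1.0))
print(label_pos, label_neg)
```

Only the sign of the score decides the class, which is why a new data point can be placed in a category
as soon as the hyperplane is known.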
Assignment 1
Question No. 4 (Course Outcome No.: CO1, Blooms Level: Remember)
Discuss about the Historical progress of Machine Learning. What is the concept of Clustering in ML.
Question No. 5 (Course Outcome No.: CO1, Blooms Level: Remember)
Find the maximally general hypothesis and maximally specific hypothesis for the
training examples given in the table using the candidate elimination algorithm.
Given Training Example:
Sky    Temp  Humidity  Wind    Water  Forecast  Sport
Sunny  warm  Normal    Strong  warm   same      Yes
Sunny  warm  High      Strong  warm   same      Yes
Rainy  cold  High      Strong  warm   change    No
Sunny  warm  High      Strong  cool   change    Yes