Unit I
MACHINE LEARNING
1.1 Introduction
▪ Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) concerned with developing
computational theories of learning and with building learning machines.
▪ Learning is a phenomenon and a process with many manifestations. The learning process
includes gaining new symbolic knowledge and developing cognitive skills through instruction
and practice.
▪ Machine Learning Definition: A computer program is said to learn from experience E with respect to so
me class of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
▪ Application of machine learning methods to large databases is called data mining.
▪ The goal of machine learning is to build computer systems that can adapt and learn from their
experience.
▪ An algorithm is a sequence of instructions used to solve a problem on a computer; it is carried
out to transform the input to the output. For example, the addition of four numbers is carried out by
giving the four numbers as input to the algorithm; the output is the sum of all four.
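As an illustrative sketch of this idea (the function name is our own), the four-number addition algorithm can be written in Python:

```python
# Algorithm: input is four numbers, output is their sum.
def add_four(a, b, c, d):
    total = 0
    for value in (a, b, c, d):  # carry out the instructions step by step
        total += value
    return total

print(add_four(1, 2, 3, 4))  # 10
```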
Why is Machine Learning Important?
▪ Machine learning algorithms can figure out how to perform important tasks by generalizing from
examples.
▪ Machine Learning provides business insight and intelligence. Decision makers are provided with
greater insights into their organizations.
▪ Machine learning algorithms discover the relationships between the variables of a system (input,
output and hidden) from direct samples of the system.
▪ Following are some of the reasons:
1. Some tasks cannot be defined well, except by examples. For example: recognizing people.
2. Relationships and correlations can be hidden within large amounts of data. Machine learning
and data mining may be able to uncover these relationships.
3. Human designers often produce machines that do not work as well as desired in the environments
in which they are used.
How Machines Learn?
▪ Machine learning typically follows three phases:
▪ Training: A training set of examples of correct
behavior is analyzed and some representation of
the newly learnt knowledge is stored. This often
takes the form of rules.
▪ Validation: The rules are checked and, if necessary,
additional training is given. Sometimes additional
test data are used; alternatively, a human expert
may validate the rules, or some other automatic
knowledge-based component may be used. The
role of the tester is often called the opponent.
▪ Application: The rules are used in responding to
some new situation.
1.1.1 How Do Machines Learn?
▪ The machine learning process is divided into three parts: data input, abstraction and generalization.
▪ Data input: information is used for future decision making.
▪ Definition: A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
▪ This definition identifies three features:
1. Class of tasks
2. Measure of performance to be improved
3. Source of experience
What are T, P, E? How do we formulate a machine learning problem?
1. Task T: Driving on public, 4-lane highway using vision sensors.
2. Performance measure P: Average distance traveled before an error (as judged by human overseer).
3. Training experience E: A sequence of images and steering commands recorded while observing a
human driver.
1.2 Types of Machine Learning
▪ Learning is constructing or
modifying representation of
what is being experienced.
Learn means to get knowledge
of by study, experience or being
taught.
▪ Machine learning is usually
divided into three types:
Supervised, unsupervised and
reinforcement learning.
1.2.1 Supervised Learning
▪ Supervised learning is the machine learning task of inferring a function from supervised training data.
The training data consist of a set of training examples.
▪ In supervised learning the network is trained by providing it with input and matching output
patterns. These input-output pairs are usually provided by an external teacher.
▪ Human learning is based on past experiences; a computer has no experiences of its own.
▪ A supervised learning algorithm analyzes the training data and produces an inferred function, which is
called a classifier or a regression function. Fig. 1.2.2 shows the supervised learning process.
▪ Each input vector requires a
corresponding target vector.
Training Pair = (Input Vector, Target
Vector)
▪ Supervised learning denotes a
method in which some input
vectors are collected and presented
to the network.
▪ Supervised learning is further
divided into methods which use
reinforcement or error correction.
The perceptron learning algorithm
is an example of supervised
learning with reinforcement.
▪ In order to solve a given problem of supervised learning, following steps
are performed:
1. Find out the type of training examples.
2. Collect a training set.
3. Determine the input feature representation of the learned function.
4. Determine the structure of the learned function and corresponding
learning algorithm.
5. Complete the design and then run the learning algorithm on the
collected training set.
6. Evaluate the accuracy of the learned function. After parameter
adjustment and learning, the performance of the resulting function
should be measured on a test set that is separate from the training set.
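The six steps above can be sketched end-to-end with a 1-nearest-neighbour classifier written from scratch; the data and function names here are illustrative, not from the text:

```python
# Steps 3-5: feature representation is a numeric vector; the "learned
# function" of 1-NN is the stored training set itself.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, x):
    # Prediction returns the label of the closest stored training example.
    nearest = min(train, key=lambda pair: euclidean(pair[0], x))
    return nearest[1]

# Steps 1-2: collect training examples as (feature vector, class label) pairs.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]

# Step 6: accuracy is measured on a test set separate from the training set.
test = [((0.9, 1.1), "A"), ((5.1, 4.9), "B")]
correct = sum(predict_1nn(train, x) == y for x, y in test)
print(correct / len(test))  # 1.0
```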
1.2.1.1 Classification
▪ Classification predicts categorical labels (classes); prediction models
continuous-valued functions, i.e., it predicts unknown or missing values.
▪ Aim: To predict categorical class labels for new samples.
▪ Input: Training set of samples, each with a class label.
▪ Output: Classifier is based on the training set and the class labels.
▪ Prediction is similar to classification: it constructs a model and uses it to
predict unknown or missing values.
▪ Classification and prediction may need to be preceded by relevance
analysis, which attempts to identify attributes that do not contribute to the
classification or prediction process.
1.2.1.2 Regression
▪ For an input x, if the output is continuous, this is called a regression problem. For
example, based on historical information on the demand for toothpaste in your
supermarket, you are asked to predict the demand for the next month.
▪ For regression tasks, the typical accuracy metrics are Root Mean Square Error
(RMSE) and Mean Absolute Percentage Error (MAPE). These metrics measure the
distance between the predicted numeric target and the actual numeric answer.
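These two error measures can be sketched directly from their definitions; the function names and the demand figures below are illustrative:

```python
# RMSE: square root of the mean squared difference between
# predicted and actual values.
def rmse(actual, predicted):
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5

# MAPE: mean of the absolute errors expressed as percentages of the
# actual values.
def mape(actual, predicted):
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(round(rmse(actual, predicted), 2))  # 8.16
print(round(mape(actual, predicted), 2))  # 5.0
```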
Regression Line
▪ Least squares: The least squares regression line is the line that makes the sum of squared residuals as small as
possible. Linear means “straight line”.
▪ Regression line is the line which gives the best estimate of one variable from the Value of any other given variable.
▪ The regression line gives the average relationship between the two variables in Mathematical form.
▪ For two variables X and Y, there are always two lines of regression.
▪ Regression line of X on Y: Gives the best estimate of X for any specific given value of Y:
X = a + bY
▪ Regression line of Y on X: Gives the best estimate of Y for any specific given value of X:
Y = a + bX
Where
a = intercept of the line
b = slope of the line
and, for the line of Y on X, Y is the dependent variable and X the independent variable
(the roles are reversed for the line of X on Y).
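A minimal sketch of the least-squares estimates of a and b for the line of Y on X (function name and data are our own):

```python
# Least-squares estimates for the regression line Y = a + bX:
# b is the slope, a the intercept.
def least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by variance of X.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # line passes through the mean point
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
a, b = least_squares(xs, ys)
print(a, b)  # 1.0 2.0
```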
Evaluating a Regression Model
▪ Assume we want to predict a car's price using some features such as
dimensions, horsepower, engine specification, mileage etc. This is a
typical regression problem, where the target variable (price) is a
continuous numeric value.
Advantages:
▪ Training a linear regression model is usually much faster than
methods such as neural networks.
▪ Linear regression models are simple and require minimal memory
to implement.
Assessing Performance of Regression- Error Measures
▪ The training error is the mean error over the training sample.
The test error is the expected prediction error over an
independent test sample.
Comparison of supervised, unsupervised and reinforcement learning:
▪ Supervised learning requires that the target variable is well defined and that a sufficient
number of its values are given. In unsupervised learning, typically either the target variable
is unknown or it has been recorded for too small a number of cases. Reinforcement learning
is learning what to do and how to map situations to actions; the learner is not told which
actions to take.
▪ Supervised learning deals with two main tasks, regression and classification. Unsupervised
learning deals with clustering and associative rule mining problems. Reinforcement learning
deals with exploitation versus exploration, Markov decision processes, policy learning, deep
learning and value learning.
▪ The input data in supervised learning is labelled data; unsupervised learning uses unlabelled
data; in reinforcement learning the data is not predefined.
▪ Supervised learning learns by using labelled data; unsupervised learning is trained using
unlabelled data, without any guidance; reinforcement learning works by interacting with the
environment.
• From the first two examples: S2: <?, Warm, Normal, Strong, Cool, Change>
• This is inconsistent with the third example, and there are no hypotheses consistent with these three
examples. PROBLEM: we have biased the learner to consider only conjunctive hypotheses; we
require a more expressive hypothesis space.
• The obvious solution to the problem of assuring that the target concept is in the hypothesis space H
is to provide a hypothesis space capable of representing every teachable concept.
Inductive Bias – Fundamental Property of Inductive Inference:
▪ A learner that makes no a priori assumptions regarding the identity of the target concept has
no rational basis for classifying any unseen instances.
▪ Inductive Leap: A learner should be able to generalize training data using prior assumptions
in order to classify unseen instances.
▪ The generalization is known as inductive leap and our prior assumptions are the inductive
bias of the learner.
Inductive Bias Formal Definition
▪ Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary
concept defined over X, and let Dc = {<x, c(x)>} be an arbitrary set of training examples of
c.
▪ Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on the
data Dc.
▪ The inductive bias of L is any minimal set of assertions B such that for any target concept c
and corresponding training examples Dc the following formula holds:
(∀ xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
Three Learning Algorithms:
▪ ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory.
Subsequent instances are classified by looking them up in memory. If the instance is found in memory,
the stored classification is returned. Otherwise, the system refuses to classify the new instance.
Inductive Bias: No inductive bias
▪ CANDIDATE-ELIMINATION: New instances are classified only in the case where all members of the
current version space agree on the classification. Otherwise, the system refuses to classify the new
instance. Inductive Bias: the target concept can be represented in its hypothesis space.
▪ FIND-S: This algorithm, described earlier, finds the most specific hypothesis consistent with the
training examples. It then uses this hypothesis to classify all subsequent instances. Inductive Bias: The
target concept can be represented in its hypothesis space, and all instances are negative instances
unless the opposite is entailed by its other knowledge.
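FIND-S as described can be sketched as follows, using Mitchell's classic EnjoySport training examples; the representation ('?' meaning "any value") follows the hypotheses quoted earlier in this unit:

```python
# FIND-S for conjunctive hypotheses over attribute vectors.
# '?' means "any value is acceptable" for that attribute.
def find_s(examples):
    hypothesis = None  # start with the most specific hypothesis
    for attributes, label in examples:
        if label != "yes":      # FIND-S ignores negative examples
            continue
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            for i, value in enumerate(attributes):
                if hypothesis[i] != value:
                    hypothesis[i] = "?"  # generalize minimally to cover the example
    return hypothesis

# EnjoySport training examples: (Sky, AirTemp, Humidity, Wind, Water, Forecast).
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "no"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "yes"),
]
print(find_s(data))
# ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```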
1.6 Evaluation and Cross Validation
▪ Cross-validation is a technique for estimating performance by training
several machine learning models on subsets of the available input data and
evaluating them on the complementary subsets of the data.
▪ But this generic task is broken down into a number of special cases. When training is
done, the data that was removed can be used to test the performance of the
learned model on “new” data. This is the basic idea for a whole class of model
evaluation methods called cross validation.
▪ The K-fold cross validation is one way to improve over the holdout method. The
data set is divided into k subsets, and the holdout method is repeated k times.
▪ Leave-one-out cross validation is K-fold cross validation taken to its logical extreme,
with K equal to N, the number of data points in the set.
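K-fold splitting and its leave-one-out special case can be sketched as follows (the round-robin fold assignment is our own choice; other assignments are equally valid):

```python
# Split n data-point indices into k folds; each fold serves as the
# held-out test set exactly once.
def k_fold_indices(n, k):
    folds = [list(range(n))[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for j in range(n) if j not in set(test)]
        yield train, test

for train, test in k_fold_indices(6, 3):
    print(train, test)

# Leave-one-out cross validation is the special case k = n:
loo = list(k_fold_indices(4, 4))
print(len(loo), [t for _, t in loo])  # 4 [[0], [1], [2], [3]]
```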
▪ Total accuracy is simply the sum of true positives and true negatives,
divided by the total number of items, that is:
Total accuracy = (TP + TN) / (TP + TN + FP + FN)
▪ Random accuracy is defined as the sum of the products of reference
likelihood and result likelihood for each class. That is:
Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
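Both formulas can be sketched directly from the confusion-matrix counts (the counts below are illustrative):

```python
# Total accuracy: correct predictions over all items.
def total_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Random accuracy: agreement expected by chance, from the products of
# actual and predicted class likelihoods.
def random_accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    actual_true, actual_false = tp + fn, tn + fp
    predicted_true, predicted_false = tp + fp, tn + fn
    return (actual_false * predicted_false
            + actual_true * predicted_true) / (total * total)

print(total_accuracy(40, 45, 5, 10))   # 0.85
print(random_accuracy(40, 45, 5, 10))  # 0.5
```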
ROC Curve:
▪ Receiver Operating Characteristics (ROC) graphs have long been used in signal detection theory to
depict the trade-off between hit rates and false alarm rates over noisy channel.
▪ ROC curve summarizes the performance of the model at different threshold values.
▪ It is obtained by combining the confusion matrices at all threshold values. ROC curves are
typically used in binary classification to study the output of a classifier.
▪ An ROC plot plots true positive rate on the Y-axis against false positive rate on the X-axis; a single
contingency table corresponds to a single point in an ROC plot.
▪ An ROC curve is convex if the slopes are monotonically non-increasing when moving along the
curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e., two or more adjacent segments with
increasing slopes, indicates a locally worse than random ranking.
▪ True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
TPR = TP / (TP + FN)
▪ False Positive Rate (FPR) is defined as follows:
FPR = FP / (FP + TN)
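The ROC points at a few threshold values can be sketched from these definitions (the scores, labels and threshold values are illustrative):

```python
# For each threshold, classify score >= threshold as positive and
# compute the (FPR, TPR) point of the resulting confusion matrix.
def roc_points(scores, labels, thresholds):
    points = []
    for t in thresholds:
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fn = sum(s < t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        tn = sum(s < t and y == 0 for s, y in zip(scores, labels))
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

scores = [0.9, 0.8, 0.4, 0.3]  # classifier scores
labels = [1, 1, 0, 1]          # true classes
pts = roc_points(scores, labels, [0.0, 0.5, 1.0])
print([(round(f, 2), round(t, 2)) for f, t in pts])
# [(1.0, 1.0), (0.0, 0.67), (0.0, 0.0)]
```

Threshold 0.0 labels everything positive, the point (1, 1); threshold 1.0 labels everything negative, the point (0, 0), which is why every ROC curve runs between those two corners.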
1.6.2 Concept Learning
▪ Inducing general functions from specific training examples is a main issue of machine learning.
▪ Concept learning: Acquiring the definition of a general category from given sample positive and negative
training examples of the category.
▪ The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently
organized by taking advantage of a naturally occurring structure over the hypothesis space.
▪ Formal definition for concept learning: Inferring a boolean-valued function from training examples of its input
and output.
▪ An example of concept learning is learning the bird concept from given examples of birds (positive
examples) and non-birds (negative examples).
▪ The inductive learning hypothesis: Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target function well over other unobserved
examples.
▪ Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the
hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
1.6.3 Concept Learning as Search
▪ Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
▪ The goal of this search is to find the hypothesis that best fits the training
examples.