What Is Machine Learning?
There are tasks for which we do not have an algorithm that can be written by conventional
programming, but for which we do have lots of data from which useful information can be learned
(can you think of suitable examples?). With digital devices providing powerful memory and
computation capabilities, stored data can be analyzed and turned into information that we can use
to make predictions. There are usually certain patterns or regularities in the data from which we
can try to identify the underlying process that generates the data, and then use it for prediction.
We can never recover this process completely, but we can construct a good and useful
approximation. That approximation may not explain everything, but it may still account for some
part of the data. We believe that although identifying the complete process may not be possible,
we can still detect certain patterns or regularities. This is machine learning. (See the more
formal definition of ML in the Mitchell book and the introduction lecture slides.) Pattern
recognition is a very typical application area of ML (can you think of some examples?). The
application of machine learning methods to large databases is called data mining. The analogy is
that a large volume of earth and raw material is extracted from a mine, which, when processed,
yields a small amount of very precious material.
Task 1: As general self-study, read about various applications of ML on the internet under
broad categories: for example e-commerce, finance, computer vision, manufacturing, robotics,
medical diagnosis, telecommunications, science, speech recognition, bioinformatics, etc.
It must be clear that ML is NOT a database problem but a part of AI: ML systems are intelligent
systems made possible by LEARNING in a changing environment. Machine learning is
programming computers to optimize a performance criterion using example data or past
experience. We have a model defined up to some parameters, and learning is the execution of a
computer program that optimizes the parameters of the model using the training data or past
experience. The model may be predictive, to make predictions about the future; descriptive, to
gain knowledge from the data; or both.
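The idea of "learning as parameter optimization" can be sketched in a few lines of code. This is only an illustration, not part of the notes: we assume a linear model y = w*x + b with two parameters, invented training data, and the classical closed-form least-squares solution as the optimization step.

```python
# A minimal sketch of learning as parameter optimization: fit the two
# parameters (slope w, intercept b) of a linear model y = w*x + b to
# example data by minimizing squared error (closed-form least squares).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates of the model parameters.
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# "Past experience": training data generated by an underlying process y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)  # the learned approximation recovers w = 2, b = 1
```

Here the "model defined up to some parameters" is the line, and learning is nothing more than choosing w and b to fit the observed data.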
Learning paradigms
1. Supervised
2. Unsupervised
3. Reinforcement Learning
In supervised learning the data come in pairs: an input X and an output Y (often called the label
of the training data), and the task is to learn the mapping from the input to the output, whose
correct values are provided by a supervisor. The approach is to assume a model, defined with
respect to a set of parameters, that approximates this mapping. The machine learning program
optimizes the parameters such that the approximation error is minimized, that is, so that our
estimates are as close as possible to the correct values given in the training set. There are two
types of supervised learning:
1) Classification, where the labels are discrete variables (K categories in general for a
multi-class classification problem; e.g. for K = 2 it is a binary classification, and Y could be
+1/-1 or 0/1).
2) Regression, where the labels to be predicted are continuous.
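The distinction between the two tasks can be made concrete with a toy sketch (my own illustration, with invented data): the same simple predictor, here a 1-nearest-neighbour rule on 1-D inputs, does classification when the targets are discrete labels and regression when they are continuous values.

```python
# Toy illustration of the two supervised-learning tasks: classification
# predicts a discrete label, regression a continuous value. Both use a
# 1-nearest-neighbour rule on invented 1-D data.

def nearest(x, train):
    # train is a list of (input, target) pairs; return the target of the
    # training example whose input is closest to x.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Classification: targets are discrete labels (+1 / -1).
clf_data = [(0.0, -1), (1.0, -1), (3.0, +1), (4.0, +1)]
print(nearest(0.5, clf_data))   # -> -1 (a class label)

# Regression: targets are continuous values.
reg_data = [(0.0, 1.2), (1.0, 3.1), (3.0, 6.9), (4.0, 9.2)]
print(nearest(0.5, reg_data))   # -> 1.2 (a real number)
```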
In unsupervised learning there is no such supervisor: we only have input data, and no labels are
provided. One classical formulation, called density estimation in statistics, is to learn the
probability distribution according to which the data have been sampled. There are other
approaches to unsupervised learning, such as clustering, in which similar data points are grouped
and partitioned into homogeneous sets. [Note the differences between clustering and
classification.] The overall aim in unsupervised learning is to find the regularities or
associations in the input data. There is a structure to the input space such that certain patterns
occur more often than others, and we want to see what generally happens and what does not.
Feature selection and dimensionality reduction (we will study the PCA technique later in the
course) are other commonly used unsupervised learning schemes.
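Clustering can be sketched with a minimal k-means loop; this is my own illustration with invented 1-D data and a naive initialization, not a production implementation. Note that no labels appear anywhere: the grouping emerges from the data alone.

```python
# A minimal k-means clustering sketch (unsupervised: no labels are given).
# It partitions 1-D points into k homogeneous groups.

def kmeans_1d(points, k, iters=20):
    centers = points[:k]                      # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assign each point to its nearest center
            i = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 2) for c in centers))  # -> [1.0, 9.07]
```

Contrast this with classification: here the two groups are discovered, whereas a classifier would be told the group of every training point in advance.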
Task 2: You should be able to classify a given problem into the correct paradigm. As self-study,
go through the typical application examples of each category to strengthen your concepts.
The general design of a supervised learning problem is shown below (details were discussed in
class). Be very clear about the following:
What do we mean by the features, the dimensionality of the data, and the feature space?
How many features, and how many training examples, do we have in the input space of the
training data?
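The vocabulary above can be pinned down with a tiny sketch (the numbers are invented): each training example is one point in feature space, and the number of features is the dimensionality of that space.

```python
# Each row is one training example; each column is one feature.
# The feature space here is therefore 3-dimensional.

X = [
    [5.1, 3.5, 1.4],   # example 1: a point in 3-D feature space
    [4.9, 3.0, 1.3],   # example 2
    [6.3, 3.3, 4.7],   # example 3
    [5.8, 2.7, 4.1],   # example 4
]

n_examples = len(X)        # number of training examples
n_features = len(X[0])     # dimensionality of the feature space
print(n_examples, n_features)  # -> 4 3
```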
The diagram below summarizes very well the role of features in the evolution of machine
learning, from the earlier AI expert systems that were mostly rule based to what is today
popularly called deep learning. We see that it is the feature discovery process that is becoming
more and more automated.
Most real-world problems belong to the classification category, and we will therefore start our
lectures with classifiers (in this course we will study the Naïve Bayes classifier, logistic
regression, and SVM) and then move on to regression problems, most popularly handled by
ANNs. (Please note that logistic regression actually solves a classification problem; the name is
a misnomer and a common source of confusion.) The next important thing we must learn is the
distinction between linear and non-linear classifiers, and for this we need the concept of
linearly and non-linearly separable datasets. A dataset is separable by a learner if there is
some instance of that learner that correctly predicts all the data points. For a linear
classifier, the data points can be separated into two classes using a hyperplane in feature
space. In two dimensions, the decision boundary is a straight line, as shown in the examples
below:
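As a code sketch of a linear classifier in 2-D (my own illustration with invented, linearly separable data; the perceptron algorithm is chosen here because it is the simplest linear learner): the learned weights (w1, w2) and bias b define the straight-line decision boundary w1*x1 + w2*x2 + b = 0.

```python
# The perceptron: a linear classifier whose 2-D decision boundary
# w1*x1 + w2*x2 + b = 0 is a straight line.

def perceptron(data, epochs=100):
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in data:                    # y is +1 or -1
            if y * (w1 * x1 + w2 * x2 + b) <= 0:    # misclassified point
                w1 += y * x1                        # nudge the boundary toward it
                w2 += y * x2
                b += y
                errors += 1
        if errors == 0:                             # converged: all points correct
            break
    return w1, w2, b

data = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 3.0), +1), ((4.0, 2.0), +1)]
w1, w2, b = perceptron(data)
# Every training point now falls on the correct side of the line.
print(all(y * (w1 * x1 + w2 * x2 + b) > 0 for (x1, x2), y in data))  # -> True
```

On linearly separable data the perceptron is guaranteed to converge; on non-linearly separable data no such line exists, which motivates the feature-mapping idea discussed next.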
Note the important observation that a suitable feature mapping can often change non-linearly
separable (NLS) data into linearly separable (LS) data, usually by transforming it into a
higher-dimensional space. See in the 1-D example below how a feature mapping from 1-D to 2-D
makes the dataset linearly separable. This is the basis of the kernel mapping that we will study
briefly in later lectures when we discuss SVM.
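The 1-D feature-mapping idea can be sketched in code (the data points and the particular map x -> (x, x^2) are invented for illustration): points of one class sit between points of the other on the line, so no single threshold separates them, but after lifting to 2-D a horizontal line does.

```python
# NLS in 1-D: class +1 points (-2 and 2) surround the class -1 points,
# so no single threshold on x separates the classes.
data = [(-2.0, +1), (-0.5, -1), (0.0, -1), (0.5, -1), (2.0, +1)]

def phi(x):
    return (x, x * x)     # feature mapping from 1-D to 2-D

# In the new space the horizontal line x2 = 2 separates the two classes:
# every +1 point lands above it, every -1 point below it.
separable = all((phi(x)[1] > 2) == (y == +1) for x, y in data)
print(separable)  # -> True
```

This is exactly the trick kernels exploit: the decision boundary is linear in the lifted space, even though it looks non-linear back in the original space.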
Non-Linear Classifier (what is this??)