FUNDAMENTALS
Sudhakar MS
School of Electronics Engineering
Vellore Institute of Technology
MACHINE LEARNING
• A computer program that learns from experience with respect to some class of tasks.
• Teaching machines/computers to do things naturally by learning through experience.
• Science of getting computers to act without being explicitly programmed.
• Algorithms that can learn from data without relying on rule-based programming.
• Practice of using algorithms to parse data, learn from it and then make a prediction or determination.
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
A classic example of a task that requires machine learning: it is very hard to say what makes a 2.
• "Well-labeled" data means that each example is pre-tagged with the right answer.
Supervised vs. unsupervised learning
• Process
– Supervised: input and output variables are given.
– Unsupervised: only input data is given.
• Input Data
– Supervised: algorithms are trained using labeled data.
– Unsupervised: algorithms are used against data which is not labeled.
• Algorithms Used
– Supervised: support vector machine, neural network, linear and logistic regression, random forest, and classification trees.
– Unsupervised: algorithms can be divided into different categories, such as cluster algorithms, K-means, hierarchical clustering, etc.
• Computational Complexity
– Supervised: a simpler method.
– Unsupervised: computationally complex.
• Use of Data
– Supervised: uses training data to learn a link between the inputs and the outputs.
– Unsupervised: does not use output data.
• Accuracy of Results
– Supervised: highly accurate and trustworthy method.
– Unsupervised: less accurate and trustworthy method.
• Real Time Learning
– Supervised: learning takes place offline.
– Unsupervised: learning takes place in real time.
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
Supervised Learning: Classification
• Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
• Learn a function f(x) to predict y given x
– y is categorical: classification
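To make "learn a function f(x) to predict y" concrete, here is a minimal sketch of a classifier. It assumes a 1-nearest-neighbour rule, which the slides do not name, and the (tumor size, age) values and labels are hypothetical toy data, not results from any study.

```python
import math

def nearest_neighbor_predict(train, x):
    """Return the label of the training example closest to x.

    `train` is a list of (feature_tuple, label) pairs. Using the nearest
    neighbour as f(x) is an illustrative assumption, not the slides' method.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

# Hypothetical labeled examples: (tumor size, age) -> categorical y
train = [
    ((1.0, 30.0), "benign"),
    ((1.5, 35.0), "benign"),
    ((4.0, 60.0), "malignant"),
    ((4.5, 65.0), "malignant"),
]

# f(x) for a previously unseen input x
prediction = nearest_neighbor_predict(train, (4.2, 58.0))
print(prediction)
```

Because y takes values from a fixed set of categories, this is a classification task; if y were a real number, the same setup would be a regression task.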
[Figure: example input features: Tumor Size, Age, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, …]
[Figure: axes labeled Genes and Individuals. Source: Daphne Koller]
Unsupervised Learning
Performance Measure
• Accuracy – Proportion of examples for which the model produces the correct output
• Error rate / Expected loss - Proportion of examples for which the model produces the incorrect output
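The two measures above can be sketched in a few lines. The function names and the label lists are hypothetical, chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Proportion of examples for which the prediction equals the true output."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def error_rate(y_true, y_pred):
    """Proportion of examples for which the prediction is incorrect."""
    return 1.0 - accuracy(y_true, y_pred)

# Hypothetical true labels and model outputs
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

acc = accuracy(y_true, y_pred)    # 3 of 5 predictions match
err = error_rate(y_true, y_pred)  # 2 of 5 predictions differ
```

Accuracy and error rate always sum to 1, so reporting either one is sufficient.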
Model evaluation
• Data set (Design Matrix) - Collection of examples
• Test data, Test error (generalization error)
• Train data, Training error
Simple example of a Learning algorithm
• Linear Regression
• System that takes a vector as input and predicts a scalar as output
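As a rough sketch of such a system, the code below fits y ≈ w·x + b by minimising the mean squared error. It assumes plain gradient descent as the fitting procedure, which the slides do not specify; the learning rate, epoch count, and toy data are hypothetical.

```python
def predict(w, b, x):
    """Linear model: vector x in, scalar out."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fit_linear_regression(X, y, lr=0.05, epochs=2000):
    """Fit weights w and bias b by gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += 2 * residual * xi[j] / n
            grad_b += 2 * residual / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2*x1 - 1*x2 + 0.5
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [0.5, 2.5, -0.5, 1.5, 3.5]

w, b = fit_linear_regression(X, y)
print(w, b)  # should be close to [2, -1] and 0.5
```

For small problems, the same weights can also be obtained in closed form from the normal equations; gradient descent is used here only to keep the sketch dependency-free.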
Source: https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/
Generalisation & Regularisation
• The central challenge in machine learning is that we must perform well on
new, previously unseen inputs, not just those on which our model was trained.
The ability to perform well on previously unobserved inputs is called generalization.
• The generalization error (test error) should be low.
• The generalization error of a machine learning model is estimated by measuring its
performance on a test set of examples that were collected separately from the
training set.
• Regularization: we design our machine learning algorithms to perform well on a
specific task, for example by increasing or decreasing the model's capacity.
There is no best form of regularization. Instead, we must choose a form of
regularization that is well suited to the particular task we want to solve.
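One common way to decrease a model's effective capacity is an L2 penalty on the weights (weight decay, also known as ridge regression). The slides do not name a specific regularizer, so this choice is an assumption, as are the toy data and the penalty values below. The sketch uses a one-dimensional model y ≈ w·x so that the solution has a simple closed form.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimise sum((w*x - y)^2) + lam * w^2 over a scalar weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w*x on the given examples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data
train_x = [1.0, 2.0, 3.0]
train_y = [1.2, 1.9, 3.2]

# Larger lam means a stronger penalty and a smaller weight
weights = {lam: fit_ridge_1d(train_x, train_y, lam) for lam in (0.0, 1.0, 10.0)}
train_errors = {lam: mse(w, train_x, train_y) for lam, w in weights.items()}

for lam in weights:
    print(lam, weights[lam], train_errors[lam])
```

A stronger penalty shrinks the weight and raises the training error. Whether it lowers the test error depends on the task, which is why the strength of the penalty has to be chosen per problem.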
Hyperparameters and Validation Sets
• Most machine learning algorithms have several settings that we can use
to control the behavior of the learning algorithm. These settings are
called hyperparameters.
• Typically, one uses about 80% of the training data for training and 20%
for validation. Since the validation set is used to “train” the
hyperparameters, the validation set error will underestimate the
generalization error, though typically by a smaller amount than the
training error. After all hyperparameter optimization is complete, the
generalization error may be estimated using the test set.
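• Cross-validation: k-fold cross-validation splits the data set into k non-overlapping
subsets. On trial i, the i-th subset is held out for evaluation and the rest is used
for training. The error is averaged over the k trials.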
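The k-fold procedure can be sketched as follows. The function names are hypothetical, and `evaluate` is a caller-supplied callback assumed to train a model on the given training indices and return its error on the held-out indices.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k non-overlapping folds of near-equal size."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k, evaluate):
    """Average evaluate(train_idx, held_out_idx) over the k folds."""
    folds = k_fold_indices(n, k)
    errors = []
    for i, held_out in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        errors.append(evaluate(train_idx, held_out))
    return sum(errors) / k

folds = k_fold_indices(10, 3)

# Placeholder callback that only reports the held-out fold size; a real one
# would fit a model on train_idx and measure its error on held_out.
mean_size = cross_validate(10, 3, lambda train_idx, held_out: len(held_out))
```

In practice the examples are usually shuffled before splitting. To select a hyperparameter, this loop would be repeated for each candidate value, keeping the value with the lowest average error.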
Machine Learning (ML) Algorithms
• Deep Learning
• Ensemble
• Neural Networks
• Regularisation
• Rule Based
• Regression
• Bayesian
• Decision Tree
• Dimensionality Reduction
• Instance Based
• Clustering