
Applied Machine Learning

Swarna Chaudhary
Assistant Professor, CSIS
swarna.chaudhary@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
• The slides presented here are obtained from the authors of the books and from
various other contributors. I hereby acknowledge all the contributors for their
material and inputs.
• I have added and modified a few slides to suit the requirements of the course.

Session Content
• Objective of course
• Evaluation Plan
• What is Machine Learning?
• Why use Machine Learning?
• Application areas of Machine Learning
• Types of Learning
• Challenges of Machine Learning



Modular Structure
No      Title of the Module
M1      Introduction to Machine Learning
M2-M3   End-to-end Machine Learning Pipeline
M4      Linear Prediction Models
M5      Classification Models I
M6      Classification Models II
M7      Unsupervised Learning
M8      Neural Networks
M9      Deep Networks
M10     FAccT Machine Learning

Course Details
LO1: High-level conceptual understanding of the machine learning field and its applicability and limitations
LO2: Ability to frame a machine learning problem and select candidate machine learning models
LO3: Ability to design and implement an end-to-end machine learning based solution using Python and appropriate libraries, and to troubleshoot it
LO4: Ability to evaluate and interpret the results from a machine learning system
Books & References

1. Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow", O'Reilly, 2020
2. P.-N. Tan, M. Steinbach, Vipin Kumar, "Introduction to Data Mining", Pearson, 2016
3. Christoph Molnar, "Interpretable Machine Learning", 2020, https://christophm.github.io/interpretable-ml-book/
4. Pedro Domingos, "A Few Useful Things to Know About Machine Learning", Communications of the ACM, vol. 55, no. 10, pp. 78-87, October 2012



Lab Plan
Lab No. Lab Objective
1 End to end Machine learning project
2 Classification
3 Linear regression
4 Logistic Regression/ Naïve Bayesian Classification
5 Decision Tree/ Random Forest
6 Deep Neural Network
7 Convolutional Neural Network
8 Recurrent Neural Network

• Labs are not graded
• Lab recordings are available at CSIS virtual labs
• Labs will be conducted in Python
Experiential Learning Lab Material

Lab capsule available in the Virtual Lab hosted on Platify via LMS.
Platform/Tools: Python

Dataset Sources

• https://archive.ics.uci.edu/ml/index.php
• https://www.kaggle.com/datasets
• https://data.gov.in/catalogsv2



Evaluation Components
Component  Name                Type                                 Weight  Duration
EC-1a      Quiz                Contact-session based                5%      TBA
EC-1b      Assignments         Analysis + Implementation (12+13)    25%     2 weeks each
EC-2       Mid-Semester Test   Closed/Open Book                     30%     2 hrs
EC-3       Comprehensive Exam  Open Book                            40%     2 hrs


Machine Learning
• Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data.
• Such algorithms operate by building a model from inputs and using that model to make predictions or decisions, rather than following only explicitly programmed instructions.

Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
Slide credit: Pedro Domingos
What is Machine Learning?
Definition by Tom Mitchell (1998):
"A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E.”

Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers
P = the probability that the program will win the next game

Learning Problems
Example:
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

Where does ML fit in?

Slide credit: Dhruv Batra, Fei Sha


Steps of Machine Learning
1. Define the business problem
2. Gather data
3. Prepare the data
4. Choose a model
5. Train the model
6. Evaluate the model
7. Tune hyperparameters
8. Predict
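
A minimal sketch of steps 2-8 using scikit-learn (the course's lab platform); the synthetic dataset, random-forest model, and parameter grid are illustrative choices, not prescribed by the slides:

```python
# Illustrative sketch of the ML workflow; dataset and model are stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Steps 2-3: gather and prepare data (synthetic stand-in for a real dataset)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: choose a model and train it
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 6: evaluate on held-out data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 7: hyperparameter tuning via grid search
search = GridSearchCV(model, {"n_estimators": [50, 100], "max_depth": [None, 10]}, cv=5)
search.fit(X_train, y_train)

# Step 8: predict with the tuned model
y_pred = search.best_estimator_.predict(X_test)
```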
Application Domains
• Web search
• Finance
• Telecom
• Healthcare
• E-commerce
• Robotics
• Social networks
• Language Processing
Many more emerging…

Slide credit: Pedro Domingos


Application Types
– Medical diagnosis
– Credit card applications or transactions
– Worm detection in network packets
– Spam filtering in email
– Recommended articles in a newspaper
– Recommended books, movies, music, etc
– Handwritten letters
– Speech Recognition
– Personalized medicine/treatment
– Sentiment Analysis

Slide credit: Ray Mooney


Why Machine Learning?
Traditional Approach to Spam Filtering
Spam typically uses words or phrases such as “4U,” “credit card,” “free,” and
“amazing”

Solution
Write a detection algorithm for frequently appearing patterns in spam. Test and update the detection rules until they are good enough.

Challenge
The detection algorithm is likely to become a long list of complex rules that is hard to maintain.

Why Machine Learning?
Machine Learning Approach to Spam Filtering
Automatically learn phrases that are good predictors of spam by detecting unusually frequent word patterns in spam compared to "ham".

The resulting program is much shorter, easier to maintain, and most likely more accurate.
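
As a hedged illustration of this approach, the sketch below learns spam-predictive words from word frequencies with a bag-of-words model and naive Bayes; the tiny corpus, labels, and model choice are assumptions for demonstration only:

```python
# Illustrative only: learn which words predict spam from word frequencies.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free credit card 4U", "amazing free offer",
          "meeting agenda for tomorrow", "lunch at noon?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham (invented labels)

vectorizer = CountVectorizer()           # word-frequency (bag-of-words) features
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)     # learns per-word spam probabilities

print(clf.predict(vectorizer.transform(["free credit card offer"])))  # -> [1] (spam)
```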

18
BITS Pilani, Pilani Campus
A Classic Example of an ML Task

Types of Learning
• Supervised (inductive) learning
– Given: training data, desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Given: rewards from sequence of actions

Slide Credit: Eric Eaton


Supervised Learning: Regression

• Given (x1, y1), (x2, y2), ..., (xn, yn)


• Learn a function f(x) to predict y given x

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970-2020]
Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013). Slide Credit: Eric Eaton
Regression

y = f(x), where y is the output (prediction), f is the prediction function, and x is the input features

• Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
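
A minimal sketch of this train/test protocol with a linear model; the synthetic (x, y) pairs below stand in for real labeled examples:

```python
# Illustrative sketch: fit f on training data, then predict on unseen x.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * x.ravel() + rng.normal(scale=1.0, size=50)  # noisy linear target

f = LinearRegression().fit(x, y)   # training: minimize squared prediction error
x_new = np.array([[4.2]])          # testing: a never-before-seen example
print(f.predict(x_new))            # predicted y = f(x_new), roughly 3 * 4.2
```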
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
[Figure: tumor size vs. label (1 = malignant, 0 = benign); a threshold on tumor size separates "predict benign" from "predict malignant"]
Slide Credit: Eric Eaton
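
A hedged sketch of the tumor-size example above as a one-feature logistic regression classifier; the sizes and labels are invented for illustration:

```python
# Illustrative sketch: categorical y (0 = benign, 1 = malignant) from tumor size.
import numpy as np
from sklearn.linear_model import LogisticRegression

tumor_size = np.array([[1.0], [1.5], [2.0], [3.5], [4.0], [5.0]])
malignant  = np.array([0, 0, 0, 1, 1, 1])   # made-up labels

clf = LogisticRegression().fit(tumor_size, malignant)
print(clf.predict([[2.2], [4.5]]))    # expect [0 1]: benign, then malignant
print(clf.predict_proba([[2.2]]))     # class probabilities near the threshold
```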
Supervised Learning Applications
• Face detection
• Spam detection
• Weather forecasting
• Predicting housing prices based on the prevailing market price
• Stock price prediction

Supervised Learning Techniques
• Linear Regression
• Logistic Regression
• Naïve Bayes Classifiers
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks

Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering

Slide Credit: Eric Eaton
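
A minimal clustering sketch: k-means finds hidden group structure in unlabeled points; the synthetic blobs and k = 3 are illustrative assumptions:

```python
# Illustrative sketch: discover cluster structure without labels.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignment for each point
print(kmeans.cluster_centers_)   # learned cluster centers
```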


Unsupervised Learning Techniques
Clustering
• k-Means
• Hierarchical Cluster Analysis
• Expectation Maximization

Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• t-distributed Stochastic Neighbor Embedding (t-SNE)

Reinforcement Learning
• No predefined training dataset
• Feedback arrives as rewards rather than labels, setting it apart from supervised and unsupervised learning
• Allows an agent to take actions and interact with an environment so as to maximize the total reward
• Examples:
– Autonomous cars
– Game playing
– Robot in a maze

Slide Credit: Eric Eaton


Supervised, Unsupervised and Reinforcement Learning: Comparison
Types of Learning
Based on how training data is used

Challenges of Machine Learning
• Training Data
– Insufficient
– Non-representative
– Poor Quality
– Irrelevant attributes
• Model Selection
– Overfitting
– Underfitting
• Testing and Validation
– Hyperparameters



Insufficient Training Data
Consider the trade-off between algorithm development and training data capture.



Non-representative Training Data

Training data must be representative of the new cases we want to generalize to.

• A small sample size leads to sampling noise
• Missing data over-emphasizes the role of wealth on happiness
• If the sampling process is flawed, even a large sample size can lead to sampling bias

[Figure: available data vs. missing data in the wealth-happiness example]



Data Quality
Some instances have missing features
– e.g., 5% of customers did not specify their age
– Options: ignore the instances altogether, drop the feature, fill in the missing values (see the sketch after this list), or train multiple ML models

Some instances can be erroneous, noisy, or outliers
– Human- or machine-generated
– Identify, then discard or fix manually, as appropriate
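
A sketch of the "fill in the missing values" option, assuming scikit-learn's SimpleImputer; the customer ages (NaN = unspecified) are made up:

```python
# Illustrative sketch: replace missing ages with the median of observed ages.
import numpy as np
from sklearn.impute import SimpleImputer

ages = np.array([[25.0], [np.nan], [40.0], [np.nan], [31.0]])
imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(ages))  # NaNs replaced by the median age (31.0)
```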



Irrelevant Features
• Feature selection: select the most useful features to train on among the existing features
• Feature extraction: combine existing features to produce a more useful one (a sketch of both follows this list)
• Create new features by gathering new data
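
A minimal sketch contrasting feature selection and feature extraction, assuming scikit-learn and a synthetic dataset; SelectKBest and PCA are illustrative choices:

```python
# Illustrative sketch: keep the 3 most useful features vs. combine into 3 new ones.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)  # feature selection
X_extracted = PCA(n_components=3).fit_transform(X)            # feature extraction
print(X_selected.shape, X_extracted.shape)  # (200, 3) (200, 3)
```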



Overfitting vs Underfitting
• Overfitting: high performance on the training set but poor performance on new data
– e.g., a high-degree polynomial life-satisfaction model that strongly overfits the training data
– A small training set or sampling noise can lead the model to follow the noise rather than the underlying pattern in the dataset
– Solution: regularization (see the sketch below)
[Figure: high-order polynomial fit with a large slope vs. a regularized fit with a smaller slope]

• Underfitting: the model is too simple to learn the underlying structure in the data
– Select a more powerful model, with more parameters
– Feed better features to the learning algorithm
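
A minimal sketch of the regularization idea, assuming scikit-learn and synthetic data: a degree-15 polynomial fit without regularization versus the same fit with ridge regularization (the data, degree, and alpha are illustrative):

```python
# Illustrative sketch: ridge regularization shrinks the wild coefficients
# of an overfit high-degree polynomial toward a "smaller slope".
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(scale=0.2, size=20)  # noisy linear truth

overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(x, y)
regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(x, y)

# The regularized model's coefficients are typically far smaller.
print(abs(overfit.named_steps["linearregression"].coef_).max())
print(abs(regularized.named_steps["ridge"].coef_).max())
```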
Testing and Validation
• Good ML algorithms need to work well on test data
• But test data is often not accessible to the provider of the algorithm
• The common assumption is that the training data is representative of the test data
• A randomly chosen subset of the training data is held out as the validation set (aka dev set)
• Once the ML model is trained, its performance is evaluated on the validation data
• The expectation is that an ML model that works well on the validation set will also work well on unknown test data
• Typically, 20-30% of the data is randomly held out as validation data
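
A minimal sketch of this hold-out protocol, assuming scikit-learn and its bundled Iris dataset; the 25% split (within the 20-30% range above) and the model are illustrative:

```python
# Illustrative sketch: hold out a validation (dev) set and evaluate on it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 25% randomly held out

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", model.score(X_val, y_val))
```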



Cross Validation
• Reduces the bias of the validation-set selection process
• Often K is chosen as 10 (aka 10-fold cross validation)
• 10-fold cross validation involves:
– randomly selecting the validation set 10 times
– generating a model from each of the 10 resulting training sets
– evaluating each model's performance on its validation set
– averaging the performance over the validation sets
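
A sketch of 10-fold cross validation with scikit-learn's cross_val_score; the dataset and model are illustrative stand-ins:

```python
# Illustrative sketch: train/evaluate over 10 folds, then average the scores.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(scores.mean())  # average performance over the 10 validation folds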



Choice of Hyper-parameters
• Modern ML models often expose many tunable settings, known as hyperparameters
• Model performance depends on the choice of hyperparameters
• Each hyperparameter can assume a number of values (real numbers or categories)
• An exponential number of hyperparameter combinations is possible
• The best model corresponds to the best cross-validation performance over the set of hyperparameter combinations
• This search is expensive to perform
• Some empirical frameworks are available for hyperparameter optimization
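
A sketch of searching hyperparameter combinations by cross-validation performance with scikit-learn's GridSearchCV; the SVM model, grid, and dataset are assumptions for illustration:

```python
# Illustrative sketch: pick the combination with the best CV performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # 6 combinations
search = GridSearchCV(SVC(), grid, cv=10).fit(X, y)      # 10-fold CV per combination
print(search.best_params_, search.best_score_)
```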



Open source ML programming tools
Tool          Platform               Language         Algorithms or Features
Scikit-learn  Linux, macOS, Windows  Python, C, C++   Classification, regression, clustering, preprocessing, model selection, dimensionality reduction
PyTorch       Linux, macOS, Windows  Python, C++      Autograd module, Optim module, NN module
TensorFlow    Linux, macOS, Windows  Python, C++      Library for dataflow programming
Weka          Linux, macOS, Windows  Java             Data preparation, classification, regression, clustering, visualization, association rules mining
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• etc.
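
A brief sketch computing a few of these metrics with scikit-learn; the labels and predictions are invented:

```python
# Illustrative sketch: accuracy, precision, recall, and squared error.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Squared error:", mean_squared_error(y_true, y_pred))
```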

Slide credit: Pedro Domingos


Next Class Plan

• Module 2: End-to-End ML Pipeline (students are requested to check the Module 1 virtual lab Python file before the next class)
