
Applied Machine Learning

Swarna Chaudhary
Assistant Professor, CSIS
swarna.chaudhary@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
• The slides presented here are obtained from the authors of the books and from
various other contributors. I hereby acknowledge all the contributors for their
material and inputs.
• I have added and modified a few slides to suit the requirements of the course.

Session Content
• Objective of course
• Evaluation Plan
• What is Machine Learning?
• Why use Machine Learning?
• Application areas of Machine Learning
• Types of Learning
• Challenges of Machine Learning



Modular Structure
No      Title of the Module
M1      Introduction to Machine Learning
M2-M3   End-to-end Machine Learning Pipeline
M4      Linear Prediction Models
M5      Classification Models I
M6      Classification Models II
M7      Unsupervised Learning
M8      Neural Networks
M9      Deep Networks
M10     FAccT Machine Learning

Course Details
LO1: High-level conceptual understanding of the machine learning field and its applicability and limitations
LO2: Ability to frame a machine learning problem and select candidate machine learning models
LO3: Ability to design and implement an end-to-end machine learning based solution using Python and appropriate libraries, and to troubleshoot it
LO4: Ability to evaluate and interpret the results from a machine learning system
Books & References

1. Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow", O'Reilly, 2020
2. P.-N. Tan, M. Steinbach, Vipin Kumar, "Introduction to Data Mining", Pearson, 2016
3. Christoph Molnar, "Interpretable Machine Learning", 2020, https://christophm.github.io/interpretable-ml-book/
4. Pedro Domingos, "A Few Useful Things to Know About Machine Learning", Communications of the ACM, vol. 55, no. 10, pp. 78-87, October 2012



Lab Plan
Lab No. Lab Objective
1 End to end Machine learning project
2 Classification
3 Linear regression
4 Logistic Regression/ Naïve Bayesian Classification
5 Decision Tree/ Random Forest
6 Deep Neural Network
7 Convolutional Neural Network
8 Recurrent Neural Network

• Labs are not graded
• Lab recordings are available at CSIS virtual labs
• Labs will be conducted in Python
Experiential Learning Lab Material

Lab capsule available in the Virtual Lab hosted on Platify via LMS.
Platform/Tools: Python

Dataset Sources

• https://archive.ics.uci.edu/ml/index.php
• https://www.kaggle.com/datasets
• https://data.gov.in/catalogsv2



Evaluation Components
Component  Name                Type                                 Weight  Duration
EC-1a      Quiz                Contact-session based                5%      TBA
EC-1b      Assignments         Analysis + Implementation (12+13)    25%     2 weeks each
EC-2       Mid-Semester Test   Closed/Open Book                     30%     2 hrs
EC-3       Comprehensive Exam  Open Book                            40%     2 hrs


Machine Learning
• Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data.
• Such algorithms operate by building a model from inputs and using that model to make predictions or decisions, rather than following only explicitly programmed instructions.

Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
Slide credit: Pedro Domingos
What is Machine Learning?
Definition by Tom Mitchell (1998):
"A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E.”

Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers
P = the probability that the program will win the next game

Learning Problems
Example:
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

Where does ML fit in?

Slide credit: Dhruv Batra, Fei Sha


Steps of Machine Learning
1. Define the business problem
2. Gather data
3. Prepare the data
4. Choose a model
5. Train the model
6. Evaluate the model
7. Tune hyperparameters
8. Predict
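
A minimal sketch of steps 2-8 using scikit-learn (the course's lab platform); the synthetic dataset, random-forest model, and parameter grid are illustrative choices, not prescribed by the slides:

```python
# Illustrative sketch of the ML workflow; dataset and model are stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Steps 2-3: gather and prepare data (synthetic stand-in for a real dataset)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: choose a model and train it
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 6: evaluate on held-out data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 7: hyperparameter tuning via grid search
search = GridSearchCV(model, {"n_estimators": [50, 100], "max_depth": [None, 10]}, cv=5)
search.fit(X_train, y_train)

# Step 8: predict with the tuned model
y_pred = search.best_estimator_.predict(X_test)
```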
Application Domains
• Web search
• Finance
• Telecom
• Healthcare
• E-commerce
• Robotics
• Social networks
• Language Processing
Many more emerging…

Slide credit: Pedro Domingos


Application Types
– Medical diagnosis
– Credit card applications or transactions
– Worm detection in network packets
– Spam filtering in email
– Recommended articles in a newspaper
– Recommended books, movies, music, etc
– Handwritten letters
– Speech Recognition
– Personalized medicine/treatment
– Sentiment Analysis

Slide credit: Ray Mooney


Why Machine Learning?
Traditional Approach to Spam Filtering
Spam typically uses words or phrases such as “4U,” “credit card,” “free,” and
“amazing”

Solution
Write a detection algorithm for frequently appearing patterns in spam. Test and update the detection rules until they are good enough.

Challenge
The detection algorithm is likely to become a long list of complex rules that is hard to maintain.

Why Machine Learning?
Machine Learning Approach to Spam Filtering
Automatically learn phrases that are good predictors of spam by detecting unusually frequent word patterns in spam compared to "ham".

The resulting program is much shorter, easier to maintain, and most likely more accurate.
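
As a hedged illustration of this approach, the sketch below learns spam-predictive words from word frequencies with a bag-of-words model and naive Bayes; the tiny corpus, labels, and model choice are assumptions for demonstration only:

```python
# Illustrative only: learn which words predict spam from word frequencies.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free credit card 4U", "amazing free offer",
          "meeting agenda for tomorrow", "lunch at noon?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham (invented labels)

vectorizer = CountVectorizer()           # word-frequency (bag-of-words) features
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)     # learns per-word spam probabilities

print(clf.predict(vectorizer.transform(["free credit card offer"])))  # -> [1] (spam)
```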

18
BITS Pilani, Pilani Campus
A Classic Example of an ML Task

Types of Learning
• Supervised (inductive) learning
– Given: training data, desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Given: rewards from sequence of actions

Slide Credit: Eric Eaton


Supervised Learning: Regression

• Given (x1, y1), (x2, y2), ..., (xn, yn)


• Learn a function f(x) to predict y given x

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970-2020]
Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013). Slide Credit: Eric Eaton
Regression

y = f(x), where y is the output (prediction), f is the prediction function, and x is the input features

• Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
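
A minimal sketch of this train/test protocol with a linear model; the synthetic (x, y) pairs below stand in for real labeled examples:

```python
# Illustrative sketch: fit f on training data, then predict on unseen x.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * x.ravel() + rng.normal(scale=1.0, size=50)  # noisy linear target

f = LinearRegression().fit(x, y)   # training: minimize squared prediction error
x_new = np.array([[4.2]])          # testing: a never-before-seen example
print(f.predict(x_new))            # predicted y = f(x_new), roughly 3 * 4.2
```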
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
[Figure: tumor size vs. label (1 = malignant, 0 = benign); a threshold on tumor size separates "predict benign" from "predict malignant"]
Slide Credit: Eric Eaton
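
A hedged sketch of the tumor-size example above as a one-feature logistic regression classifier; the sizes and labels are invented for illustration:

```python
# Illustrative sketch: categorical y (0 = benign, 1 = malignant) from tumor size.
import numpy as np
from sklearn.linear_model import LogisticRegression

tumor_size = np.array([[1.0], [1.5], [2.0], [3.5], [4.0], [5.0]])
malignant  = np.array([0, 0, 0, 1, 1, 1])   # made-up labels

clf = LogisticRegression().fit(tumor_size, malignant)
print(clf.predict([[2.2], [4.5]]))    # expect [0 1]: benign, then malignant
print(clf.predict_proba([[2.2]]))     # class probabilities near the threshold
```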
Supervised Learning Applications
• Face detection
• Spam detection
• Weather forecasting
• Predicting housing prices based on the prevailing market price
• Stock price prediction

Supervised Learning Techniques
• Linear Regression
• Logistic Regression
• Naïve Bayes Classifiers
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks

Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering

Slide Credit: Eric Eaton
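
A minimal clustering sketch: k-means finds hidden group structure in unlabeled points; the synthetic blobs and k = 3 are illustrative assumptions:

```python
# Illustrative sketch: discover cluster structure without labels.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignment for each point
print(kmeans.cluster_centers_)   # learned cluster centers
```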


Unsupervised Learning Techniques
Clustering
• k-Means
• Hierarchical Cluster Analysis
• Expectation Maximization

Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• t-distributed Stochastic Neighbor Embedding (t-SNE)

Reinforcement Learning
• No predefined training dataset
• Feedback arrives as rewards rather than labels, setting it apart from supervised and unsupervised learning
• Allows an agent to take actions and interact with an environment so as to maximize the total reward
• Examples:
– Autonomous cars
– Game playing
– Robot in a maze

Slide Credit: Eric Eaton


Supervised, Unsupervised and Reinforcement Learning: Comparison
Types of Learning
Based on how training data is used

Challenges of Machine Learning
• Training Data
– Insufficient
– Non-representative
– Poor Quality
– Irrelevant attributes
• Model Selection
– Overfitting
– Underfitting
• Testing and Validation
– Hyperparameters



Insufficient Training Data
Consider the trade-off between algorithm development and training data capture.



Non-representative Training Data

Training data must be representative of the new cases we want to generalize to.

• A small sample size leads to sampling noise
• Missing data over-emphasizes the role of wealth on happiness
• If the sampling process is flawed, even a large sample size can lead to sampling bias

[Figure: available data vs. missing data in the wealth-happiness example]



Data Quality
Some instances have missing features
– e.g., 5% of customers did not specify their age
– Options: ignore the instances altogether, drop the feature, fill in the missing values (see the sketch after this list), or train multiple ML models

Some instances can be erroneous, noisy, or outliers
– Human- or machine-generated
– Identify, then discard or fix manually, as appropriate
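
A sketch of the "fill in the missing values" option, assuming scikit-learn's SimpleImputer; the customer ages (NaN = unspecified) are made up:

```python
# Illustrative sketch: replace missing ages with the median of observed ages.
import numpy as np
from sklearn.impute import SimpleImputer

ages = np.array([[25.0], [np.nan], [40.0], [np.nan], [31.0]])
imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(ages))  # NaNs replaced by the median age (31.0)
```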



Irrelevant Features
• Feature selection: select the most useful features to train on among the existing features
• Feature extraction: combine existing features to produce a more useful one (a sketch of both follows this list)
• Create new features by gathering new data
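
A minimal sketch contrasting feature selection and feature extraction, assuming scikit-learn and a synthetic dataset; SelectKBest and PCA are illustrative choices:

```python
# Illustrative sketch: keep the 3 most useful features vs. combine into 3 new ones.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)  # feature selection
X_extracted = PCA(n_components=3).fit_transform(X)            # feature extraction
print(X_selected.shape, X_extracted.shape)  # (200, 3) (200, 3)
```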



Overfitting vs Underfitting
• Overfitting: high performance on the training set but poor performance on new data
– e.g., a high-degree polynomial life-satisfaction model that strongly overfits the training data
– A small training set or sampling noise can lead the model to follow the noise rather than the underlying pattern in the dataset
– Solution: regularization (see the sketch below)
[Figure: high-order polynomial fit with a large slope vs. a regularized fit with a smaller slope]

• Underfitting: the model is too simple to learn the underlying structure in the data
– Select a more powerful model, with more parameters
– Feed better features to the learning algorithm
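
A minimal sketch of the regularization idea, assuming scikit-learn and synthetic data: a degree-15 polynomial fit without regularization versus the same fit with ridge regularization (the data, degree, and alpha are illustrative):

```python
# Illustrative sketch: ridge regularization shrinks the wild coefficients
# of an overfit high-degree polynomial toward a "smaller slope".
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(scale=0.2, size=20)  # noisy linear truth

overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(x, y)
regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(x, y)

# The regularized model's coefficients are typically far smaller.
print(abs(overfit.named_steps["linearregression"].coef_).max())
print(abs(regularized.named_steps["ridge"].coef_).max())
```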
Testing and Validation
• Good ML algorithms need to work well on test data
• But test data is often not accessible to the provider of the algorithm
• The common assumption is that the training data is representative of the test data
• A randomly chosen subset of the training data is held out as the validation set (aka dev set)
• Once the ML model is trained, its performance is evaluated on the validation data
• The expectation is that an ML model that works well on the validation set will also work well on unknown test data
• Typically, 20-30% of the data is randomly held out as validation data
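
A minimal sketch of this hold-out protocol, assuming scikit-learn and its bundled Iris dataset; the 25% split (within the 20-30% range above) and the model are illustrative:

```python
# Illustrative sketch: hold out a validation (dev) set and evaluate on it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 25% randomly held out

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", model.score(X_val, y_val))
```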



Cross Validation
• Reduces the bias of the validation-set selection process
• Often K is chosen as 10 (aka 10-fold cross validation)
• 10-fold cross validation involves:
– randomly selecting the validation set 10 times
– generating a model from each of the 10 resulting training sets
– evaluating each model's performance on its validation set
– averaging the performance over the validation sets
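
A sketch of 10-fold cross validation with scikit-learn's cross_val_score; the dataset and model are illustrative stand-ins:

```python
# Illustrative sketch: train/evaluate over 10 folds, then average the scores.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(scores.mean())  # average performance over the 10 validation folds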



Choice of Hyper-parameters
• Modern ML models often expose many tunable settings, known as hyperparameters
• Model performance depends on the choice of hyperparameters
• Each hyperparameter can assume a number of values (real numbers or categories)
• An exponential number of hyperparameter combinations is possible
• The best model corresponds to the best cross-validation performance over the set of hyperparameter combinations
• This search is expensive to perform
• Some empirical frameworks are available for hyperparameter optimization
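
A sketch of searching hyperparameter combinations by cross-validation performance with scikit-learn's GridSearchCV; the SVM model, grid, and dataset are assumptions for illustration:

```python
# Illustrative sketch: pick the combination with the best CV performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # 6 combinations
search = GridSearchCV(SVC(), grid, cv=10).fit(X, y)      # 10-fold CV per combination
print(search.best_params_, search.best_score_)
```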



Open source ML programming tools
Tool          Platform               Language         Algorithms or Features
Scikit-learn  Linux, macOS, Windows  Python, C, C++   Classification, regression, clustering, preprocessing, model selection, dimensionality reduction
PyTorch       Linux, macOS, Windows  Python, C++      Autograd module, Optim module, NN module
TensorFlow    Linux, macOS, Windows  Python, C++      Library for dataflow programming
Weka          Linux, macOS, Windows  Java             Data preparation, classification, regression, clustering, visualization, association rules mining
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• etc.
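
A brief sketch computing a few of these metrics with scikit-learn; the labels and predictions are invented:

```python
# Illustrative sketch: accuracy, precision, recall, and squared error.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Squared error:", mean_squared_error(y_true, y_pred))
```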

Slide credit: Pedro Domingos


Next Class Plan

• Module 2: End-to-End ML Pipeline (students are requested to check the Module 1 virtual lab Python file before the next class)
