ML 1 Lecture 1

Lecture 1
CSE 602 Machine Learning - I

What is a Machine?
• A mechanical device that performs a specific task

• It has an input and an output
• Between the I/O, there is a processing algorithm
• In AI, a machine is something that can replicate or supplement
human performance
May 15, 2024 CSE602 - Machine Learning I - Dr. Tariq Mahmood 2

What is Learning?
• When the performance P of a machine M on a specific task T
improves over time, M is said to have “learned”
• We need to have a definition of a task
• Requires the use of some algorithm to learn
• We need performance metrics to measure the performance

What is Machine Learning?
• Machine Learning (ML) is a subfield of artificial intelligence (AI)
that focuses on the development of algorithms and statistical models
that enable computer systems to improve their performance on a
specific task through learning from data

Alternate Definitions of ML
• Learning from Data: Learn patterns and make predictions or decisions
based on data - The learning process involves recognizing patterns,
trends, and relationships within data.
• Task-Specific Improvement: Improve the performance of a specific
task over time - image recognition, language translation,
recommendation systems, or playing games.
• No Explicit Programming: Learn and adapt from examples (rows of
data) without being explicitly programmed for the specific task (e.g.,
explicit programming using IF-THEN rules)
Here is the typical ML setting
• Simple tabular data
• Features/Attributes in columns
• Data types: Categorical, Numerical, Timestamps
• Data stored row-wise, and each row is atomic (one self-contained record)
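The tabular setting above can be sketched with Python's standard `csv` module. The table, its column names and the simple type-guessing rule (try a number, then an ISO-8601 date/time, otherwise categorical) are assumptions for illustration, not a general-purpose type detector.

```python
import csv
import io
from datetime import datetime

# Hypothetical table: one row per customer (an atomic record), one feature per column.
RAW = """customer_id,region,monthly_bill,signup_date
C001,North,42.5,2021-03-01
C002,South,17.0,2022-07-15
C003,North,88.9,2020-11-30
"""

def parses_as(parse, values):
    """True if every value in the column can be parsed by `parse`."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True

def guess_type(values):
    # Rough heuristic: numbers first, then ISO-8601 dates/times, otherwise categorical.
    if parses_as(float, values):
        return "numerical"
    if parses_as(datetime.fromisoformat, values):
        return "timestamp"
    return "categorical"

rows = list(csv.DictReader(io.StringIO(RAW)))
column_types = {col: guess_type([row[col] for row in rows]) for col in rows[0]}
print(column_types)
```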

What is a Machine Learning Algorithm?
• A set of rules or mathematical procedures that enable a computer
system to learn from data and make predictions or decisions; it can
also be called a model
• When we get the data for ML:
• We use some of it to train the algorithm, i.e., to make the algorithm learn
the data
• We use the remaining data to test it, i.e., to find out how well the algorithm
has learnt the data in the training step
• Note: This split into training and test data is the basic, logical way to divide
the data for ML
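A minimal sketch of this split in plain Python, assuming the rows can be shuffled freely (time-ordered data is usually split chronologically instead). The function name `split_train_test` and the 70/30 ratio in the usage are illustrative choices; scikit-learn offers `sklearn.model_selection.train_test_split` for the same job.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the last test_fraction of them for testing."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    split_at = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:split_at], shuffled[split_at:]

data = list(range(10))  # hypothetical: 10 rows, represented here by their row ids
train, test = split_train_test(data, test_fraction=0.3)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, so the test rows stay unseen during training.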
Types of Machine Learning Algorithms
• Supervised – you have a teacher (label) that guides the ML process -
the algorithm must learn to predict this label
• Unsupervised – you may or may not have a label, but the purpose is
to make sense of the data only (not to predict under any supervision)
• Reinforcement Learning – you must learn by interacting with an
environment – the learning happens when you receive feedback –
you end up doing things which produce good feedback and avoid
actions which produce bad feedback
Types of Machine Learning Algorithms
(Figure annotations: “Decrease the number of features (dimensions) of the data”, “Predict a label”, “Predict a number”)

Supervised ML
• One of the columns is the label (also called output or target), either
numerical (regression) or categorical (classification); classification
is often binary
• Definition: The algorithm is trained on a labeled dataset - The goal
is to learn the mapping between inputs and outputs
• Some Examples: Linear regression, decision trees, support vector
machines, and neural networks
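As a small supervised-regression example, the sketch below fits linear regression with one feature using the closed-form least-squares formulas. The house-size/price data is made up and exactly linear, so the learnt mapping should be price = 3 × size.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = w*x + b for a single numerical feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Hypothetical labeled rows: feature = house size (m^2), label = price (in thousands).
sizes = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]
w, b = fit_simple_linear_regression(sizes, prices)
print(w, b, w * 90 + b)  # learnt mapping applied to an unseen size of 90
```

Applying the learnt mapping to an unseen size of 90 gives a prediction of 270.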

Supervised ML

Unsupervised ML
• Data might be labeled or not labeled

• Definition: Generally works with unlabeled data, and the algorithm's
objective is to find patterns, structures, or relationships within the data
without explicit guidance - It aims to discover the inherent structure of
the data.
• Some Examples: Clustering algorithms (k-means, hierarchical
clustering), dimensionality reduction techniques (principal component
analysis), and generative models (autoencoders), but the most common is
cluster analysis
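A minimal k-means (Lloyd's algorithm) sketch in plain Python follows. The 2-D points are hypothetical (think of two measured features per vegetable), and initializing the centroids with the first k points is a simplification; library implementations typically use random or k-means++ initialization.

```python
import math

def kmeans(points, k, n_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its points, until nothing changes."""
    centroids = [tuple(p) for p in points[:k]]  # simplified initialization
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data: two measured features per item.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.2, 0.8), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Here the two well-separated groups end up in different clusters, with labels `[0, 1, 0, 1, 0, 1]`.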
Unsupervised ML
(Figure: vegetables grouped into a cluster of peppers, a cluster of eggplants and a cluster of onions)

Reinforcement Learning
• Data might be labeled or not labeled (doesn’t matter)

• Definition: A paradigm where an agent interacts with an environment and
learns to make decisions by receiving feedback in the form of rewards or
punishments based on its actions
• After each action, the agent finds itself in a new state (situation)
• The goal is to learn a policy that maximizes cumulative rewards over time
• Policy specifies the best action in each situation
• Some Examples: Q-learning, deep reinforcement learning algorithms
(Deep Q Network, Proximal Policy Optimization), and actor-critic methods
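To illustrate the state/action/reward loop and a learnt policy, here is a tabular Q-learning sketch on a hypothetical five-cell corridor: the agent starts in cell 0 and receives a reward of 1 only on reaching cell 4. The environment, the reward scheme and the values of `alpha`, `gamma` and `epsilon` are assumptions chosen for the example.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (hypothetical environment)
ACTIONS = (-1, +1)  # action 0 = move left, action 1 = move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore
            # (also explore when both actions are tied).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
# Policy: the best action in each non-goal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should be “move right” (action 1) in every non-goal cell.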
Reinforcement ML

RL – DeepMind teaching itself Parkour

• Objective:
• Learn a mapping from the features to the label (in case of SL)
• Learn a mapping from the features to the clusters (in case of USL)
• Learn a mapping from states to actions (in case of RL)
• There are many possible such mappings
• Specifically, each ML algorithm has its own mapping
• A mapping can be also called a hypothesis – because the data scientist
hypothesizes about a possible mapping for his/her data
• No one knows the best hypothesis in advance
• And the mapping basically means a statistical/mathematical
relationship between the set of features and the desired output (be it
label, clusters, or the policy)
• Each algorithm has its own mathematical relationship
• This relationship becomes possible because of inherent patterns in the
data
• Customers of a particular type are likely to default on their payments
• Customers in 4 regions will potentially respond to a marketing campaign,
and customers in some other 3 regions will not respond
• These patterns are natural – mostly based on natural human behaviors
or natural activities
• The natural agricultural life cycle allows us to predict the yield
• Customers’ natural inclinations to buy allow us to predict their response to a
marketing or promotion campaign
• A regularity in how stock prices fluctuate allows us to predict/forecast them
• The location, family and salary information have a strong correlation with the bill
payment patterns of the customers
• The patterns of economic ups and downs allow us to forecast many economic
indicators
• Which is the best mapping? This can be learnt only through:

• Experience
• Intuition
• Trial-and-Error
• Using the most popular algorithms of the day
• Data cleaning
• Statistical data analysis (before ML)
• It also depends on the data sample: the time duration of the gathered
data, its granularity (how detailed it is)
• Experience: the more ML projects executed, the better the experience
• Trial-and-Error: ML algorithms present complicated scenarios, so keep
trying different things to take control of the situation:
• Trying out different algorithmic parameters
• Trying out different features (selection techniques)
• Changing the size and duration of the data sample
• Trying out different algorithms
• Using the most famous algorithm – for example, boosting algorithms are famous
but they might backfire in live testing
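Trying out different algorithmic parameters can be sketched as a simple loop: train with several values of a hyperparameter and keep the one that scores best on held-out validation data. The example below uses a hand-written 1-D k-nearest-neighbours classifier with hypothetical data; the candidate values of `k` are arbitrary.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training rows closest to x (single feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Hypothetical (feature, label) rows; the 2.9 row is a deliberately noisy label.
train = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (2.9, "B"), (3.0, "A"),
         (6.0, "B"), (6.5, "B"), (7.0, "B"), (7.5, "B")]
validation = [(2.8, "A"), (3.2, "A"), (6.8, "B"), (1.2, "A")]

# Trial and error: try several values of k, keep the best on validation data.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # ties go to the first value tried
print(scores, best_k)
```

With one noisy training label near the decision boundary, `k=1` is fooled (75% validation accuracy) while `k=3` and `k=5` both reach 100%, so `best_k` is 3.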
• We train and keep checking the performance of the algorithm
• Initially, the results will often be unexpected and poor, but through trial and
error and experience the ML engineer keeps repeating the training, so
performance improves considerably
• You check the performance metrics (detailed in Lecture 2)
• For example, initial training accuracy might be 50% (only 50% predictive power),
but after repeated iterations it might reach 89%
• Generally, you get very good performance on the training data (>85%)
and reasonably good performance on the test data (>75%)
ML Process

DevOps
• Initially, the development of software, its testing and its deployment
(production on live systems) were isolated activities
• Hand-offs between these three teams required a lot of interaction and
caused many problems
• Developers were reluctant to spend more time explaining their code to testers
• Testers and deployers lacked the software’s programming environment
(developers had to help set it up)
• Communication gaps
• A lot of wasted time
DevOps (combines development + testing + deployment)

MLOps
(DevOps for developing ML applications)

(Figure annotation: “MLOps starts here”)

Live Testing
• The trained algorithm is first tested on the data already acquired (in-house testing)
• Then it is deployed in real time for testing (the MLOps process)
• Performance typically drops in this case, because the data patterns in the
live data are not exactly the same as those present in the training data
• This is natural and expected, so live performance deteriorates, but the ML
engineer should adapt by updating the model and needs to find a good
middle ground
Live Testing
• Middle ground: keep updating the model with new data (which keeps
accumulating), but according to some policy (not randomly)
• When to update? How to update? How much time to take for update?
• When a reasonable performance starts to be obtained in live testing data, then
we say that the model has “generalized”
• That is, the model will be now able to give a reasonable predictive
performance in real-time for a long time
• It will withstand the small variations in patterns that occur in live data
from time to time and continue to give reasonable predictive performance
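The “update with some policy” idea can be sketched as a small monitor that tracks live accuracy and signals when to retrain. The class name, the rolling-window rule and the threshold of 0.7 are hypothetical choices for illustration, not a standard MLOps policy; real update policies are usually more elaborate.

```python
from collections import deque

class RetrainPolicy:
    """Hypothetical update policy: signal a retrain when the rolling accuracy of the
    last `window` live predictions (once their true outcomes are known) drops below
    `threshold`."""

    def __init__(self, window=5, threshold=0.7):
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong

    def record(self, prediction, actual):
        self.recent.append(1 if prediction == actual else 0)

    def should_retrain(self):
        # Wait for a full window so a few early mistakes don't trigger an update.
        if len(self.recent) < self.window:
            return False
        return sum(self.recent) / self.window < self.threshold

policy = RetrainPolicy(window=5, threshold=0.7)
for pred, actual in [(1, 1), (0, 0), (1, 1), (1, 0), (0, 0)]:
    policy.record(pred, actual)
first_check = policy.should_retrain()   # 4 of the last 5 correct (0.8): keep the model
for pred, actual in [(1, 0), (0, 1)]:
    policy.record(pred, actual)
second_check = policy.should_retrain()  # 2 of the last 5 correct (0.4): retrain
print(first_check, second_check)
```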
What are Hyperparameters of ML Algos?
• Each ML algorithm is associated with a set of parameters, which are the
characteristics of that algorithm
• You manipulate these parameters to make the algorithm fit the data better
and learn its patterns
• Also known as hyperparameters
• Gradient Boosting params: sklearn.ensemble.GradientBoostingClassifier
(scikit-learn 1.4.0 documentation)
• Random Forest params: sklearn.ensemble.RandomForestClassifier
(scikit-learn 1.4.0 documentation)
What are Hyperparameters of ML Algos?
• We call them “hyper” because they are set and manipulated externally by us
• They are not the actual internal parameters that the model learns
while fitting the data
• E.g., neural networks learn the weights of the connections between
neurons (internal parameters)
• External hyperparameters (in our control): number of layers,
number of neurons, dropout rate, batch normalization, etc.
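The split between learned (internal) parameters and hyperparameters shows up even in a tiny model. In the sketch below, a line y = w·x + b is fitted by gradient descent: `w` and `b` are the internal parameters learnt from the data, while `learning_rate` and `epochs` are hyperparameters set from outside. The data and the hyperparameter values are assumptions for illustration.

```python
def fit_line_gd(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    Hyperparameters (set by us): learning_rate, epochs.
    Internal parameters (learned from data): w, b.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2 / n * sum(errors)                              # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
w, b = fit_line_gd([0, 1, 2, 3], [1, 3, 5, 7], learning_rate=0.05, epochs=2000)
print(round(w, 4), round(b, 4))
```

On this exactly linear data the fit recovers w ≈ 2 and b ≈ 1; a learning rate that is too large can instead make the updates diverge, which is why such hyperparameters need tuning.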
Forecasting

Difference between Forecasting and ML
• Forecasting: Using statistical methods and time series analysis to
make predictions about future values based on historical data patterns
- relies on mathematical models and assumes that historical trends can
be indicative of future trends.
• Machine Learning: Encompasses a broader range of techniques,
including statistical methods, but goes beyond traditional forecasting
by using algorithms that can learn from data - capable of identifying
complex patterns, relationships, and dependencies in data, allowing
them to make predictions based on learned features.
• Forecasting: Effective when dealing with relatively simple time series
data and when the underlying patterns are well-defined and consistent
(input: time, possibly along with other variables; output: a numerical
value)
• Time series forecasting is an application of regression
• Machine Learning: Handles more complex and diverse datasets - It
can adapt to non-linear (difficult) relationships, handle high-
dimensional data, and capture intricate patterns that may be challenging
for traditional forecasting methods.
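The point that time series forecasting is an application of regression can be illustrated by turning a series into lagged (feature, target) rows and fitting an ordinary regression on them. The sales figures are hypothetical, and a single lag with a least-squares line (an AR(1)-style model with intercept) is only the simplest possible choice.

```python
def make_lag_dataset(series, lag=1):
    """Turn a series into regression rows: features = previous `lag` values,
    target = current value."""
    features = [series[t - lag:t] for t in range(lag, len(series))]
    targets = [series[t] for t in range(lag, len(series))]
    return features, targets

# Hypothetical monthly sales figures.
series = [100, 110, 120, 130, 140, 150]
X, y = make_lag_dataset(series, lag=1)

# Ordinary least squares for y[t] = a * y[t-1] + c (one lag feature).
prev = [row[0] for row in X]
mean_p, mean_y = sum(prev) / len(prev), sum(y) / len(y)
numerator = sum((p - mean_p) * (v - mean_y) for p, v in zip(prev, y))
denominator = sum((p - mean_p) ** 2 for p in prev)
a = numerator / denominator
c = mean_y - a * mean_p
forecast = a * series[-1] + c  # one-step-ahead forecast for the next month
print(a, c, forecast)
```

Because the made-up series grows by exactly 10 per month, the fitted mapping is y[t] = 1.0·y[t-1] + 10, and the one-step forecast after 150 is 160.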
• Forecasting: Traditional forecasting methods require manual intervention
and tuning, making them less adaptable to changing patterns or sudden
shifts in data.
• Machine Learning: ML models can adapt to changing patterns by
continuously learning from new data, which makes them more flexible in
dynamic environments.

• Forecasting: Involves selecting relevant features manually, which
requires domain expertise and a good understanding of the
underlying processes.
• Machine Learning: ML models can automatically learn relevant features
from data (though not always), reducing the need for extensive manual
feature engineering; this is especially true for deep learning models,
which can learn hierarchical representations.

• Forecasting: Often more interpretable, as the relationships between
variables are explicitly defined.
• Machine Learning: Some ML models, especially complex ones
like deep neural networks, can be perceived as "black boxes" due to
their high dimensionality and intricate internal representations.
Interpreting the decisions made by these models can be challenging
(a general problem in ML: many models are not easily explainable).

Solving a Business Problem
• ML and analytics are always done to solve a business problem
• The ML engineer/data scientist needs to spend considerable time
understanding the business and its key stakeholders before
venturing on any ML activity
• Interviews and regular communication with business
• When ML starts, regular updates to the business need to be provided

Examples of Predictive Business Problems?

Before Getting the Data
• Hone your SQL skills

• Have a good Business understanding
• Good collaboration with IT department
• Understanding security protocols and abiding by them
• Ability to understand spaghetti database structures
• Make compromises in acquiring the data – not all might be available or
accessible
• Document the thought process
Getting the Data – Data Ingestion
• Connecting to data sources

• Define the rules for acquiring data (when to connect, how much to
ingest, where to store)
• Abide by security rules
• Make copies of databases
• Implement Version control (GitHub) for each ML activity (MLOps /
Data Engineering)
How much data is required?
• 3-5 years is good for batch data

• 1 year could also be enough for streaming data
• Typically, data engineering and data base skills are required to extract data
• Connecting with databases
• Excellent understanding of the business
• IT department may not always help you
• A high level of SQL skill, along with teamwork, may be required to extract the
relevant data
After Getting the Data
• Go with your team to your own silo to do ML (request 3-5 months to
deliver the first prototype)
• Maintain regular communication with the business: it is needed to
understand the features in the data and to get feedback on the very first
results
• Don’t get lost in your own world trying to perfect the model over and over:
too much perfection can wreck generalization (more on this in
Lecture 2)

Important
• Remember that 20% of the data may be controlling the other 80%
(the Pareto rule)
• If you can capture this 20% in the training phase, the model may
generalize well in real time (live testing)
• But a very simple model may be enough to capture this 20% (or more)
of the data; it is not necessary to use a state-of-the-art algorithm
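Whether a Pareto-like pattern exists can be checked quickly before modeling. The sketch below computes the smallest share of rows that accounts for a target share (80% by default) of a total, for example revenue per customer; the function name and the revenue figures are hypothetical.

```python
def share_of_rows_for(values, target_share=0.8):
    """Smallest fraction of rows whose (largest-first) values reach
    target_share of the total."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target_share * total:
            return count / len(ordered)
    return 1.0

# Hypothetical revenue per customer (10 customers, total 1000).
revenue = [520, 300, 50, 40, 30, 20, 15, 10, 10, 5]
print(share_of_rows_for(revenue))
```

Here 2 of the 10 customers (20%) account for at least 80% of the revenue.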

Bias-Variance Trade-Off
• Bias: underfitting, the model didn’t capture the complexity of the data
• Variance: how much performance changes on another dataset; high
variance means overfitting
• Underfitted: high bias, low variance; fails on testing, validation and live data.
• Overfitted: high variance, low bias.
• Testing and validation performance look good.
• Live testing fails: generalization is low.
• Well-fitted model: a balance between bias and variance (achieved by the algorithm)
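A toy illustration of the trade-off, under stated assumptions: the data is made up (the line y = 2x, with small noise added to the training rows only), a constant model stands in for an underfitted model, 1-nearest-neighbour stands in for an overfitted model that memorizes the training rows, and a least-squares line is the balanced model.

```python
def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: y = 2x, with +/-0.5 noise on the training rows.
train = [(1, 2.5), (2, 3.5), (3, 6.5), (4, 7.5), (5, 10.5), (6, 11.5)]
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2), (5.4, 10.8)]

# Underfitted (high bias): ignores x and always predicts the training mean.
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y  # x is deliberately ignored

# Overfitted (high variance): 1-nearest-neighbour memorizes every training row, noise included.
def overfit(x):
    return min(train, key=lambda row: abs(row[0] - x))[1]

# Balanced: a straight line fitted by least squares.
mx = sum(x for x, _ in train) / len(train)
w = sum((x - mx) * (y - mean_y) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
b = mean_y - w * mx
def balanced(x):
    return w * x + b

for name, model in [("underfit", underfit), ("overfit", overfit), ("balanced", balanced)]:
    print(f"{name:9s} train MSE={mse(model, train):.3f} test MSE={mse(model, test):.3f}")
```

On this data the constant model is poor on both sets, the 1-nearest-neighbour model has zero training error but a clearly higher test error than the line, and the line does well on both.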

External Factors Affecting ML
• Cross-Validation
• Regularization
• Feature Engineering
• Occam’s Razor
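Cross-validation, the first factor listed, can be sketched as generating k train/validation index splits so that every row is used for validation exactly once. This version uses contiguous folds without shuffling, which is an assumption made for simplicity; scikit-learn’s `sklearn.model_selection.KFold` provides the same idea with more options.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, val_idx) for k roughly equal, contiguous folds (no shuffling)."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

For 10 rows and 3 folds, the validation folds are rows 0-3, 4-6 and 7-9; a model would be trained k times and its validation scores averaged.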
