Lecture 1

CSE 602 Machine Learning - I


What is a Machine?

• A mechanical device that performs a specific task
• It has an input and an output
• Between the input and the output, there is a processing algorithm
• In AI, a machine is something that can replicate or supplement
human performance

May 15, 2024 CSE602 - Machine Learning I - Dr. Tariq Mahmood 2


What is Learning?

• When the performance P of a machine M on a specific task T
improves over time (with experience), M is said to have “learned”
• We need a clear definition of the task
• Learning requires the use of some algorithm
• We need performance metrics to measure the performance
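To make the definition concrete, here is a small sketch under made-up assumptions: task T is labelling a number in [0, 1] as 1 when it exceeds a cut-off (0.6 here) that the learner is not told, machine M simply picks the cut-off that makes the fewest mistakes on its training examples, and performance P is accuracy on 1000 held-out examples. As M receives more examples (more experience), P should rise.

```python
import random

def make_data(n, seed):
    """Hypothetical task T: label x as 1 when x > 0.6 (the learner does not know the 0.6)."""
    rng = random.Random(seed)
    return [(x, int(x > 0.6)) for x in (rng.random() for _ in range(n))]

def learn_threshold(train):
    """Machine M: choose the training value that, used as a cut-off, makes the fewest errors."""
    best_t, best_err = 0.5, len(train) + 1
    for t in sorted(x for x, _ in train):
        err = sum(int(x > t) != y for x, y in train)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(threshold, data):
    """Performance P: fraction of examples classified correctly."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)

held_out = make_data(1000, seed=1)
for n in (2, 10, 100, 1000):  # growing experience: more training examples
    t = learn_threshold(make_data(n, seed=0))
    print(f"{n:5d} examples -> accuracy {accuracy(t, held_out):.3f}")
```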

What is Machine Learning?

• Machine Learning (ML) is a subfield of artificial intelligence (AI)
that focuses on the development of algorithms and statistical models
that enable computer systems to improve their performance on a
specific task through learning from data

Alternate Definitions of ML
• Learning from Data: Learn patterns and make predictions or decisions
based on data; the learning process involves recognizing patterns,
trends, and relationships within data.
• Task-Specific Improvement: Improve performance on a specific
task over time, e.g., image recognition, language translation,
recommendation systems, or playing games.
• No Explicit Programming: Learn and adapt from examples (rows of
data) without being explicitly programmed for the specific task (e.g.,
through hand-written IF-THEN rules)
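A sketch of the contrast between explicit programming and learning from examples. The loan-style features (income, debt), the 50% rule, and the four labelled rows are hypothetical; the learned "rule" here is a 1-nearest-neighbour lookup, chosen only because it is about the simplest model that is induced entirely from rows of data.

```python
# Explicit programming: a human writes the IF-THEN rule.
def rule_based(income, debt):
    if debt > 0.5 * income:
        return "default"
    return "pays"

# Learning from examples: the behaviour is induced from labelled rows instead.
def fit_nearest_neighbour(rows):
    """Return a classifier that labels a new case like its closest training row."""
    def predict(income, debt):
        nearest = min(rows, key=lambda r: (r[0] - income) ** 2 + (r[1] - debt) ** 2)
        return nearest[2]
    return predict

examples = [  # hypothetical (income, debt, label) rows
    (50, 10, "pays"), (60, 5, "pays"), (40, 35, "default"), (30, 25, "default"),
]
learned = fit_nearest_neighbour(examples)
print(rule_based(45, 30), learned(45, 30))
```

Changing the behaviour of `rule_based` means editing code; changing the behaviour of `learned` only requires different rows of data.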
Here is the ML scene

• Simple tabular data
• Features/attributes in columns
• Data types: categorical, numerical, timestamps
• Data organized row-wise, and each row is atomic

What is a Machine Learning Algorithm?

• A set of rules or mathematical procedures that enables a computer
system to learn from data and make predictions or decisions (it can
also be called a model)
• When we get the data for ML:
• We use some of it to train the algorithm, i.e., to make the algorithm learn the
patterns in the data
• We use the remaining data to test it, i.e., to find out how well the algorithm
learned the data in the training step
• Note: this division into training and testing data is the basic way of dividing the data for ML
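A minimal sketch of the train/test division in plain Python, assuming an 80/20 split and a fixed random seed for reproducibility (both are common choices, not requirements). scikit-learn provides `sklearn.model_selection.train_test_split` for the same purpose.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into a training part and a testing part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 data rows
train_rows, test_rows = split_train_test(rows)
print(len(train_rows), len(test_rows))  # 80 20
```

For time-ordered data, a chronological cut (older rows for training, newer rows for testing) is usually preferred over random shuffling, so that the test set really lies in the "future".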
Types of Machine Learning Algorithms

• Supervised – you have a teacher (the label) that guides the ML process;
the algorithm must learn to predict this label
• Unsupervised – you may or may not have a label, but the purpose is
only to make sense of the data (not to predict a label under supervision)
• Reinforcement Learning – you learn by interacting with an
environment; learning happens when you receive feedback, so
you end up taking actions that produce good feedback and avoiding
actions that produce bad feedback
Types of Machine Learning Algorithms
[Figure: types of ML algorithms, annotated "Decrease the number of features (dimensions) of the data", "Predict a label", and "Predict a number"]

Supervised ML

• One of the columns is the label (also called output or target), either
numerical (regression) or categorical (classification);
classification is often binary
• Definition: The algorithm is trained on a labeled dataset; the goal
is to learn the mapping between inputs and outputs
• Some examples: linear regression, decision trees, support vector
machines, and neural networks
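As an illustration of learning the input-to-output mapping, the sketch below fits the simplest model in the list, linear regression with one input feature, using the closed-form least-squares solution. The five (x, y) pairs are invented; because the label is numeric, this is a regression task.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = w * x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Labelled training data: each input x comes with its numeric label y.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
w, b = fit_simple_linear_regression(xs, ys)
print(f"learned mapping: y = {w:.2f} * x + {b:.2f}; prediction for x=6: {w * 6 + b:.2f}")
```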

Supervised ML

Unsupervised ML

• Data might be labeled or not labeled
• Definition: Generally works with unlabeled data; the algorithm's
objective is to find patterns, structures, or relationships within the data
without explicit guidance, i.e., to discover the inherent structure of
the data.
• Some examples: clustering algorithms (k-means, hierarchical
clustering), dimensionality reduction techniques (principal component
analysis), and generative models (autoencoders); the most common is
cluster analysis
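A plain-Python sketch of k-means (Lloyd's algorithm), the clustering method named above. The nine 2-D points are made up to form three obvious groups, and the deterministic farthest-first initialisation is an assumption chosen for reproducibility; libraries such as scikit-learn default to k-means++ initialisation instead.

```python
import math

def farthest_first_init(points, k):
    """Deterministic start: the first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 1.5),    # group A
          (8, 8), (8.5, 9), (9, 8),      # group B
          (1, 8), (1.5, 9), (2, 8.5)]    # group C
centroids, labels = kmeans(points, k=3)
print(labels)
```

No labels are used anywhere: the groups emerge only from the distances between points.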
Unsupervised ML

[Figure: clusters of peppers, eggplants, and onions]

Reinforcement Learning

• Data might be labeled or not labeled (it doesn’t matter)
• Definition: A paradigm where an agent interacts with an environment and
learns to make decisions by receiving feedback in the form of rewards or
punishments based on its actions
• At each step, the agent is in some state (situation), takes an action, and may move to a new state
• The goal is to learn a policy that maximizes cumulative rewards over time
• A policy specifies which action to take in each situation
• Some examples: Q-learning, deep reinforcement learning algorithms
(Deep Q-Network, Proximal Policy Optimization), and actor-critic methods
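A minimal tabular Q-learning sketch, assuming a toy environment invented for illustration: a corridor of five cells where the agent starts at cell 0, can move left or right, and receives a reward of 1 only on reaching cell 4. The learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values. After training, the greedy policy read off the Q-table should be "move right" (+1) in every non-goal state.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Environment: move one cell (walls at both ends); reaching the goal pays 1 and ends the episode."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random so the untrained agent still wanders
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Move Q(s, a) towards reward + discounted value of the best next action.
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # learned action for each non-goal state
```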
Reinforcement ML

RL – DeepMind teaching itself Parkour

Here is the ML scene

• Objective:
• Learn a mapping from the features to the label (in the case of supervised learning, SL)
• Learn a mapping from the features to the clusters (in the case of unsupervised learning, USL)
• Learn a mapping from states to actions (in the case of reinforcement learning, RL)
• There are many possible such mappings
• In particular, each ML algorithm learns its own kind of mapping
• A mapping is also called a hypothesis, because the data scientist
hypothesizes a possible mapping for their data
• No one knows the best hypothesis in advance
Here is the ML scene

• The mapping is a statistical/mathematical relationship between the
set of features and the desired output (be it the label, the clusters, or
the policy)
• Each algorithm has its own mathematical relationship
• This relationship is possible because of inherent patterns in the
data, e.g.:
• Customers of a particular type are likely to default on their payments
• Customers of 4 regions will potentially respond to a marketing campaign, while
customers of 3 other regions will not respond
Here is the ML scene

• These patterns are natural, mostly based on natural human behaviors
or natural activities
• The natural agricultural life cycle allows us to predict the yield
• Customers’ natural inclination to buy allows us to predict their response to a
marketing or promotion campaign
• A regularity in how stocks fluctuate allows us to predict/forecast them
• Customers’ location, family, and salary information is strongly correlated with their
bill payment patterns
• The patterns of economic ups and downs allow us to forecast many economic
indicators
Here is the ML scene

• Which is the best mapping? This can only be found through:
• Experience
• Intuition
• Trial-and-error
• Using the currently popular algorithms
• Data cleaning
• Statistical data analysis (before ML)
• It also depends on the data sample: the time span of the gathered
data and its granularity (how detailed it is)
Here is the ML scene

• Experience: the more ML projects executed, the better the experience
• Trial-and-error: ML algorithms present complicated scenarios, so keep
trying different things to take control of the situation:
• Trying out different algorithmic parameters
• Trying out different features (feature selection techniques)
• Changing the size and duration of the data sample
• Trying out different algorithms
• Using the most popular algorithm; for example, boosting algorithms are popular,
but they might backfire in live testing
Here is the ML scene

• We train the algorithm and keep checking its performance
• Initially, the results may be unexpected and bad, but through trial and error and
experience, the ML engineer keeps repeating the learning, so
the performance improves considerably
• You check the performance metrics (detailed in Lecture 2)
• For example, initial training accuracy might be 50% (only 50% predictive power),
but after repeated iterations it might reach 89%
• Generally, you get a very good performance on the training data (>85%)
and a reasonably good one on the test data (>75%)
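A small sketch of why training accuracy is usually higher than test accuracy. It assumes a toy task whose labels contain 20% random noise and uses a 1-nearest-neighbour classifier, which memorises its training rows; it therefore scores perfectly on the training data but noticeably lower on unseen test data. The numbers it prints come from this synthetic setup, not from any real project.

```python
import random

def make_noisy_data(n, seed, noise=0.2):
    """Hypothetical task: label is 1 when x > 0.5, but a fraction `noise` of labels is flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that no model can explain
        data.append((x, y))
    return data

def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, data):
    """Fraction of rows in `data` that the model built on `train` labels correctly."""
    return sum(one_nn_predict(train, x) == y for x, y in data) / len(data)

train_rows = make_noisy_data(200, seed=0)
test_rows = make_noisy_data(200, seed=1)
print("training accuracy:", accuracy(train_rows, train_rows))
print("test accuracy:", round(accuracy(train_rows, test_rows), 3))
```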
ML Process

ML Process

ML Process

DevOps

• Initially, the development of software, its testing, and its deployment
(production on live systems) were isolated activities
• Yet these three teams had to interact a lot, which caused many
problems:
• Developers were reluctant to spend time explaining their code to testers
• Testing and deployment lacked the software's programming environment (developers had
to help with this)
• Communication gaps
• A lot of wasted time
DevOps (combines development + testing + deployment)

MLOps
(DevOps for developing ML applications)

MLOps
(DevOps for developing ML applications)

MLOps
(DevOps for developing ML applications)
Starts here

Live Testing

• The trained algorithm is first tested on the acquired data (in-house testing)
• Then it is deployed in real time for testing (MLOps process)
• Performance typically drops in this case, because the data patterns encountered
in live data are not exactly the same as those present in the training data
• This is natural and regular, so live performance deteriorates,
but the ML engineer should adapt by updating the model and needs to find
a good middle ground
Live Testing
• Middle ground: keep updating the model with new data (which keeps
accumulating), but according to some policy (not randomly)
• When to update? How to update? How much time to take for an update?
• When reasonable performance starts to be obtained on live testing data,
we say that the model has “generalized”
• That is, the model will now be able to give reasonable predictive
performance in real time for a long time
• It will withstand the small variations in patterns that occur in live testing
from time to time, and continue to give reasonable predictive performance
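One possible update policy, sketched under assumptions: keep a rolling window of the most recent live predictions whose true outcomes are already known, and flag the model for retraining when rolling accuracy falls below a threshold. The class `LiveMonitor`, the window of 100 cases, and the 0.75 threshold are hypothetical choices; scheduled retraining (for example, monthly) is an equally common alternative.

```python
from collections import deque

class LiveMonitor:
    """Hypothetical helper: rolling accuracy over recent live predictions plus an update rule."""

    def __init__(self, window=100, threshold=0.75):
        self.recent = deque(maxlen=window)  # only the last `window` outcomes are kept
        self.threshold = threshold

    def record(self, prediction, actual):
        """Store whether a live prediction turned out to be correct."""
        self.recent.append(prediction == actual)

    def rolling_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def needs_update(self):
        """Policy: retrain only when the window is full and accuracy is below the threshold."""
        full = len(self.recent) == self.recent.maxlen
        return full and self.rolling_accuracy() < self.threshold

monitor = LiveMonitor(window=100, threshold=0.75)
for _ in range(100):
    monitor.record(prediction=1, actual=1)  # live patterns still match: all correct
print(monitor.needs_update())  # False
for _ in range(50):
    monitor.record(prediction=1, actual=0)  # patterns have shifted: 50 wrong
print(monitor.rolling_accuracy(), monitor.needs_update())  # 0.5 True
```

Waiting for a full window avoids reacting to a handful of unlucky cases, which is one way of updating "with some policy" rather than randomly.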
What are Hyperparameters of ML Algos?

• Each ML algorithm is associated with a set of parameters, which are the
characteristics of that algorithm
• You manipulate these parameters to make the algorithm fit the data better
and learn the patterns
• Also known as hyperparameters
• Gradient Boosting params: sklearn.ensemble.GradientBoostingClassifier
(scikit-learn 1.4.0 documentation)
• Random Forest params: sklearn.ensemble.RandomForestClassifier
(scikit-learn 1.4.0 documentation)
What are Hyperparameters of ML Algos?

• We call them “hyper” because they can be manipulated by us
externally
• They are not the actual internal parameters that the model learns
while fitting the data
• E.g., neural networks learn the weights of the connections between
neurons (internal)
• External hyperparameters (in our control): number of layers,
number of neurons, dropout, batch normalization, etc.
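To separate the two kinds of parameters, the sketch below fits a line by gradient descent. It is an assumption-based illustration that uses linear regression rather than a neural network: `learning_rate` and `epochs` play the role of hyperparameters that we set, while `w` and `b` are the internal parameters learned from invented data generated by y = 2x + 1.

```python
def fit_by_gradient_descent(xs, ys, learning_rate=0.01, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: we choose them from outside.
    w and b are the model's internal parameters: they are learned from the data.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # invented data generated by y = 2x + 1
for lr in (0.001, 0.01, 0.05):  # trial-and-error over one hyperparameter
    w, b = fit_by_gradient_descent(xs, ys, learning_rate=lr)
    print(f"learning_rate={lr}: w={w:.3f}, b={b:.3f}")
```

With these settings the smallest learning rate has not converged after 2000 epochs (b is still around 0.8), while the larger ones recover w close to 2 and b close to 1. In scikit-learn, hyperparameters such as `n_estimators`, `learning_rate`, and `max_depth` are passed to the `GradientBoostingClassifier` constructor, while learned quantities appear after fitting as attributes ending in an underscore (for example, `feature_importances_`).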
Forecasting

Forecasting

Difference between Forecasting and ML

• Forecasting: Uses statistical methods and time series analysis to
make predictions about future values based on historical data patterns;
it relies on mathematical models and assumes that historical trends
are indicative of future trends.
• Machine Learning: Encompasses a broader range of techniques,
including statistical methods, but goes beyond traditional forecasting
by using algorithms that learn from data; these can identify
complex patterns, relationships, and dependencies in data, allowing
them to make predictions based on learned features.
Difference between Forecasting and ML

• Forecasting: Effective when dealing with relatively simple time series
data whose underlying patterns are well-defined and consistent
(the input variable is time, possibly with others; the output is a
number)
• Time series forecasting is an application of regression
• Machine Learning: Handles more complex and diverse datasets; it
can adapt to non-linear (difficult) relationships, handle high-dimensional
data, and capture intricate patterns that may be challenging
for traditional forecasting methods.
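Since time series forecasting is an application of regression, a minimal sketch is to regress the observed values on the time index and extrapolate one step ahead. The monthly sales figures are invented, and a straight-line trend is only the simplest possible forecasting model; real series usually also need seasonality and lagged values as inputs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical monthly sales; the input variable is simply the time index t = 0, 1, 2, ...
sales = [100, 104, 109, 113, 118, 121, 127, 130]
t = list(range(len(sales)))
slope, intercept = fit_line(t, sales)
next_t = len(sales)
forecast = slope * next_t + intercept
print(f"trend: {slope:.2f} per month; forecast for t={next_t}: {forecast:.1f}")
```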
Difference between Forecasting and ML

• Forecasting: Traditional methods require manual intervention and tuning, making
them less adaptable to changing patterns or sudden shifts in data.
• Machine Learning: ML models adapt to changing patterns by
continuously learning from new data, which makes them more
flexible in dynamic environments.

Difference between Forecasting and ML

• Forecasting: Involves selecting relevant features manually, which
requires domain expertise and a good understanding of the
underlying processes.
• Machine Learning: Can automatically learn relevant features from data
(but not always), reducing the need for extensive manual feature
engineering; this is especially true for deep learning models, which can learn
hierarchical representations.

Difference between Forecasting and ML

• Forecasting: Often more interpretable, as the relationships between
variables are explicitly defined.
• Machine Learning: Some ML models, especially complex ones
like deep neural networks, can be perceived as "black boxes" due to
their high dimensionality and intricate internal representations.
Interpreting the decisions made by these models can be challenging
(a general problem in ML: many models are not easily explainable).

Solving a Business Problem

• ML and analytics are always done to solve a business problem
• The ML engineer/data scientist needs to spend considerable time
understanding the business and its key stakeholders before
venturing into any ML activity
• Interviews and regular communication with the business
• Once ML starts, regular updates need to be provided to the business

Examples of Predictive Business Problems?

Before Getting the Data

• Hone your SQL skills
• Have a good business understanding
• Good collaboration with the IT department
• Understand security protocols and abide by them
• Ability to understand spaghetti (tangled, poorly structured) database structures
• Make compromises in acquiring the data; not all of it might be available or
accessible
• Document the thought process
Getting the Data – Data Ingestion

• Connecting to data sources
• Define the rules for acquiring data (when to connect, how much to
ingest, where to store)
• Abide by security rules
• Make copies of databases
• Implement version control (e.g., Git with GitHub) for each ML activity (MLOps /
Data Engineering)
How much data is required?

• 3-5 years of data is good for batch data
• 1 year could also be enough for streaming data
• Typically, data engineering and database skills are required to extract data
• Connecting with databases
• Excellent understanding of the business
• The IT department may not always help you
• A high level of SQL skill may be required, along with teamwork, to extract the
relevant data
After Getting the Data

• Go with your team to your silo to do ML (request 3-5 months to
deliver the first prototype)
• Maintain regular communication with the business; it is needed to
understand the features in the data and to get feedback on the very first
results
• Don’t get lost in your own world trying to perfect the
model; too much perfection can wreck generalization (more on this in
Lecture 2)

Important

• Remember that 20% of the data may be controlling the other 80%
(Pareto rule)
• If you can capture this 20% in the training phase, the model may
generalize well in real time (live testing)
• A very simple model may be able to capture this 20% (or more) of the
data; it is not necessary to use a state-of-the-art algorithm
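A quick, hedged way to check whether such a Pareto-like concentration exists in a dataset: sort items by a quantity of interest and find the smallest share of items that accounts for 80% of the total. The revenue figures are invented, and the 80% target is the conventional Pareto cut-off rather than a law the data must follow.

```python
def share_of_items_for(values, target=0.8):
    """Smallest fraction of items (largest first) whose values reach `target` of the total."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target * total:
            return count / len(ordered)
    return 1.0

# Hypothetical revenue of 10 customers
revenue = [520, 300, 45, 35, 30, 25, 20, 10, 10, 5]
print(share_of_items_for(revenue))  # 0.2: 2 of 10 customers bring in at least 80% of revenue
```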

Bias Variance Trade Off

Bias Variance Trade Off

• Bias: the model is too simple to capture the complexity of the data (underfitting)
• Variance: the model is too sensitive to its training sample, so performance drops on another dataset (overfitting)
• Underfitted: high bias, low variance; fails on testing, validation, and live data
• Overfitted: high variance, low bias
• Testing and validation results can look good
• Live testing fails: generalization is low
• Well-fitted model: balances bias and variance (through the choice of algorithm and its settings)
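A sketch of the trade-off using k-nearest-neighbour regression on synthetic data (a noisy sine curve; the function, noise level, and sample sizes are assumptions). With k = 1 the model memorises the training set (zero training error, high variance); with k equal to the whole training set it always predicts the overall mean (high bias); an intermediate k such as 15 should give the lowest test error of the three.

```python
import math
import random

def make_data(n, seed, noise=0.3):
    """Hypothetical data: y = sin(2*pi*x) plus Gaussian noise with standard deviation `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, noise)))
    return rows

def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error over `data` of the k-NN model built on `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

train_rows = make_data(300, seed=0)
test_rows = make_data(300, seed=1)
for k in (1, 15, len(train_rows)):  # k=1: high variance; k=all: high bias
    print(f"k={k:3d}  train MSE={mse(train_rows, train_rows, k):.3f}  "
          f"test MSE={mse(train_rows, test_rows, k):.3f}")
```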

External Factors Affecting ML

• Cross-Validation
• Regularization
• Feature Engineering
• Occam's Razor
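Of these factors, cross-validation is the most mechanical to illustrate. Below is a minimal sketch of k-fold splitting, assuming the rows are already shuffled (for time series, ordered splits are used instead): every row serves as validation data exactly once, the model is trained k times, and the k validation scores are averaged. scikit-learn's `sklearn.model_selection.KFold` provides the same functionality.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds covering row indices 0..n-1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    print("train on", len(train_idx), "rows, validate on", val_idx)
```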
