Lecture 01 Introducing ML 13102022 031101pm

LECTURER:
Humera Farooq, Ph.D.

Computer Sciences Department,
Bahria University (Karachi Campus)
COURSE ASSESSMENT
• Assignments/Research Project: 20%
• Test/Quiz: 10%
• Mid-Term: 30%
• Final Examination: 40%
Text Books and Reading Material
• Tom Mitchell, Machine Learning. McGraw-Hill.
• Christopher M. Bishop, Pattern Recognition and Machine Learning.
• Kevin Murphy, Machine Learning: A Probabilistic Perspective.
• Andriy Burkov, The Hundred-Page Machine Learning Book.
• Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, 2012.
Outline
1. ML in a Nutshell
2. Representation, Evaluation, Optimization
3. Types of Learning
4. Trade-offs in Machine Learning
Machine Learning
• Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans, or be generated by another algorithm.
• Machine learning can also be defined as the process of solving a practical problem by:
1) gathering a dataset, and
2) algorithmically building a statistical model based on that dataset.
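The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own method: the toy dataset is hypothetical (generated from y = 2x + 1), the statistical model is assumed to be a straight line y = w·x + b, and `fit_line` is a hypothetical helper that fits it by closed-form ordinary least squares.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# 1) Gather a dataset (hypothetical toy values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# 2) Algorithmically build a statistical model from it
w, b = fit_line(xs, ys)
print(w, b)        # 2.0 1.0
print(w * 5 + b)   # prediction for a new input x = 5: 11.0
```

With real, noisy data the fitted line would only approximate the points; here the data are exactly linear, so the parameters are recovered exactly.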
Machine learning can help to:
• Reduce programming time (e.g., automatic spelling correction)
• Customize and scale products (e.g., adapting language software to specific tasks)
• Complete seemingly “unprogrammable” tasks (e.g., understanding grammar and spacing)
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
  - Representation
  - Evaluation
  - Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
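A few of the evaluation measures listed above can be computed directly from true and predicted values. The sketch below is illustrative only: the label vectors are hypothetical, class `1` is assumed to be the "positive" class for precision and recall, and squared error is summed (dividing by the number of examples would give the mean squared error).

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def squared_error(y_true, y_pred):
    """Sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def entropy(labels):
    """Shannon entropy (in bits) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


y_true = [1, 0, 1, 1, 0, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 1, 0]   # hypothetical classifier output
print(accuracy(y_true, y_pred))                 # 4 of 6 correct
print(precision_recall(y_true, y_pred))         # 2 of 3 for both
print(squared_error([3.0, 5.0], [2.5, 5.5]))    # 0.5
print(entropy([0, 0, 1, 1]))                    # 1.0 bit for a 50/50 split
```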
Optimization
• Combinatorial optimization
  - E.g.: greedy search
• Convex optimization
  - E.g.: gradient descent
• Constrained optimization
  - E.g.: linear programming
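The three components fit together in even the smallest learner. In this sketch (assumptions: a hypothetical toy dataset, a fixed learning rate `lr` and number of `epochs` chosen by hand) the representation is a linear model y = w·x + b, the evaluation is the mean squared error, and the optimization is batch gradient descent, which works well here because this error is convex in w and b.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Minimise mean squared error of y = w*x + b with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Take a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # hypothetical data from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

If `lr` is too large the updates overshoot and diverge; for this dataset values below roughly 0.15 converge.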
Types of Learning
• Supervised (inductive) learning
  - Training data includes desired outputs
• Unsupervised learning
  - Training data does not include desired outputs
• Semi-supervised learning
  - Training data includes a few desired outputs
• Reinforcement learning
  - Rewards from a sequence of actions
Types of Algorithms
Figure: A Taxonomy of Machine Learning Techniques: Highlight on Important Approaches
• Supervised
  - Linear: Linear Regression, Logistic Regression, Perceptron
  - Nonlinear, single models (from easy to hard to interpret): Decision Trees, Rule Learning, Naïve Bayes, k-Nearest Neighbours, Multi-Layer Perceptron, SVM
  - Nonlinear, combined models: Bagging, Boosting, Random Forests
• Unsupervised: K-Means, EM, Self-Organizing Maps
“Best” Machine Learning Algorithm
• Bad news: no algorithm is the best,
i.e., no machine learning algorithm will perform well on every task / dataset
• Good news: all of them are the best,
i.e., each machine learning algorithm will perform well on some task / dataset
• “No free lunch” theorem
Wolpert (1996): all algorithms perform equally when averaged over all possible problems
Trade-offs in Machine Learning
• Accuracy vs. interpretability
• Bias vs. variance
• Complexity vs. scalability
  - Some models, or the algorithms for computing them, may not scale to large datasets
• Domain knowledge vs. data-driven
• More data vs. better algorithm

Preparing Data
• Machine learning algorithms require data!
• Preprocessing is often necessary to transform the data prior to applying a learning algorithm:
  - Sampling: selecting a subset of observations
  - Feature extraction: selecting or constructing input variables
  - Normalization (standardization, scaling, binarization)
  - Handling missing data
• The preprocessing needed may depend on the algorithm
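The normalization and missing-data steps above can be sketched on a single feature column. Assumptions, labeled loudly: missing values are encoded as `None`, mean imputation is used (one simple choice among many), the binarization threshold is arbitrary, and the `ages` column is hypothetical.

```python
import statistics


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]


def min_max_scale(column):
    """Rescale values to [0, 1]; assumes the column is not constant."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def binarize(column, threshold):
    """Map values above the threshold to 1 and all others to 0."""
    return [1 if v > threshold else 0 for v in column]


ages = [20.0, None, 40.0, 60.0]          # hypothetical feature with a missing value
filled = impute_mean(ages)
print(filled)                             # [20.0, 40.0, 40.0, 60.0]
print(min_max_scale(filled))              # [0.0, 0.5, 0.5, 1.0]
print(standardize(filled))                # mean 0, standard deviation 1
print(binarize(filled, threshold=30.0))   # [0, 1, 1, 1]
```

In practice the mean, minimum, maximum, and standard deviation should be computed on the training data only and then reused on new data.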

Supervised Learning
• The learning algorithm receives a set of inputs along with the corresponding correct outputs to train a model.
Training data (labeled data) → Model → Prediction
• Discrete prediction: the output is qualitative (categorical). For example:
  - whether the value of stock Z will have increased or decreased one year from now
  - whether a credit card transaction is fraudulent or authentic
• Quantitative prediction: the output Y is a continuous / numerical / ordered value. For example:
  - the value of stock Z one year from now
  - a person’s income based on demographic factors
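A discrete (categorical) prediction can be illustrated with k-nearest neighbours, one of the supervised methods in the taxonomy. This is a sketch under stated assumptions: the two features ([amount, hour of day]) and all labeled examples are hypothetical, distance is Euclidean, and `k = 3` is an arbitrary choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a discrete label for x by majority vote of its k nearest neighbours."""
    # Sort labeled examples by Euclidean distance to the query point
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    top_k_labels = [label for _, label in by_distance[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


# Hypothetical labeled data: [transaction amount, hour of day] -> label
train_X = [[20, 14], [35, 10], [15, 16], [900, 3], [750, 2], [820, 4]]
train_y = ["authentic", "authentic", "authentic", "fraud", "fraud", "fraud"]

print(knn_predict(train_X, train_y, [30, 12]))   # authentic
print(knn_predict(train_X, train_y, [800, 3]))   # fraud
```

Because the features are on very different scales, real use would normally standardize them first; `math.dist` requires Python 3.8 or later.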
Unsupervised Learning
• The learning algorithm receives unlabeled raw data to train a model and to find patterns in the data.
Training data (unlabeled data) → Model → Clustering
• Dimensionality reduction: create new features from the original inputs that retain important information. For example:
  - represent a document as a small set of topics instead of as a large collection of words
• Cluster analysis: partition data into subsets that share common characteristics. For example:
  - group similar patients in a medical database
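Cluster analysis can be sketched with k-means, listed under unsupervised methods in the taxonomy. Assumptions: the patient records ([age, resting heart rate]) are hypothetical, k = 2, the initial centroids are picked by hand, and a fixed number of iterations replaces a convergence test.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


# Hypothetical unlabeled patient records: [age, resting heart rate]
patients = [[25, 60], [27, 62], [30, 65], [65, 85], [70, 88], [68, 90]]
centroids, clusters = k_means(patients, initial_centroids=[patients[0], patients[3]])
print(clusters)    # a younger group and an older group
print(centroids)
```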
Semisupervised Learning
• The learning algorithm receives labeled and unlabeled raw data to train a model. The main objective is to efficiently accommodate the unlabeled data.
Training data (labeled + unlabeled data) → Model → Data modeling and augmentation
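The slide does not name a specific semi-supervised method; self-training is one simple option and is used here purely as an illustration. Under these assumptions (one-dimensional hypothetical data, a nearest-centroid classifier, a fixed number of rounds), the model pseudo-labels the unlabeled points and is refit on labeled plus pseudo-labeled data.

```python
def nearest_centroid_fit(xs, ys):
    """Fit a 1-D nearest-centroid model: the mean value of each class."""
    centroids = {}
    for label in sorted(set(ys)):
        values = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    """Self-training: pseudo-label the unlabeled data, then refit on all of it."""
    model = nearest_centroid_fit(labeled_x, labeled_y)
    for _ in range(rounds):
        pseudo_labels = [nearest_centroid_predict(model, x) for x in unlabeled_x]
        model = nearest_centroid_fit(list(labeled_x) + list(unlabeled_x),
                                     list(labeled_y) + pseudo_labels)
    return model


labeled_x, labeled_y = [1.0, 9.0], ["low", "high"]   # hypothetical: only two labels
unlabeled_x = [2.0, 3.0, 4.0, 7.0, 8.0]              # plus some unlabeled points

print(nearest_centroid_fit(labeled_x, labeled_y))     # {'high': 9.0, 'low': 1.0}
print(self_train(labeled_x, labeled_y, unlabeled_x))  # {'high': 8.0, 'low': 2.5}
```

Self-training can reinforce its own early mistakes, so in practice only confidently pseudo-labeled points are usually added.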
Reinforcement Learning
• Reinforcement learning is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features.
• The machine can execute actions in every state.
• Different actions bring rewards and could also move the machine to another state of the environment.
• The goal of a reinforcement learning algorithm is to learn a policy.
• A policy is a function f (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward.
https://techvidvan.com/tutorials/reinforcement-learning/
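One standard way to learn such a policy is tabular Q-learning, sketched here under loud assumptions: the five-state corridor environment, its reward of 1 for reaching the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are all hypothetical. Note that Q-learning maximizes the expected discounted reward, a criterion closely related to, but not identical with, the expected average reward mentioned above.

```python
import random

# Hypothetical environment: a corridor of states 0..4. Reaching state 4 ends the
# episode with reward 1; every other move gives reward 0.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted value of the best next action
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The learned policy f(state): choose the action with the highest Q-value
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # expected: move right (+1) in every non-terminal state
```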
Key Issues in Machine Learning
• Modeling
  - How to formulate application problems as machine learning problems? How to represent the data?
  - Learning protocols (where are the data and labels coming from?)
• Representation
  - What functions should we learn (hypothesis spaces)?
  - How to map raw input to an instance space?
  - Is there any rigorous way to find these? Any general approach?
• Algorithms
  - What are good algorithms?
  - How do we define success?
  - Generalization vs. overfitting
  - The computational problem
Machine Learning as a Process
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
- Normalization, transformation, missing values, outliers
- Data splitting, feature engineering, estimating performance, evaluation and model selection
- Study the model’s accuracy:
  - Does it work better than the naïve approach or the previous system?
  - Do the results make sense in the context of the problem?
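Data splitting and the "better than the naïve approach" check can be sketched together. Assumptions: the 12-example dataset is hypothetical, 25% of examples are held out, the naïve approach is taken to be "always predict the most frequent training label", and `threshold_model` stands in for a real trained model.

```python
import random
from collections import Counter


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle example indices and hold out a fraction of them as a test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_y):
    """Naive approach: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def threshold_model(x):
    """Hypothetical stand-in for a trained model: predicts 1 for inputs above 7."""
    return 1 if x > 7 else 0


# Hypothetical data: label 1 only for x > 7, so the classes are imbalanced
X = list(range(12))
y = [1 if x > 7 else 0 for x in X]
X_train, X_test, y_train, y_test = train_test_split(X, y)

baseline_label = majority_baseline(y_train)
baseline_acc = accuracy(y_test, [baseline_label] * len(y_test))
model_acc = accuracy(y_test, [threshold_model(x) for x in X_test])
print(f"baseline: {baseline_acc:.2f}, model: {model_acc:.2f}")
```

On imbalanced data the majority baseline can already look accurate, which is why a model should be judged against it rather than in isolation.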
A Complete Learning Process
Data Types
• Qualitative / Categorical Data
  - Nominal data: the opposite of ordinal data; it has no order (e.g., gender, hair colour).
  - Ordinal data: has some order (e.g., Low, Medium, High or First, Second, Third).
  - Binary data: contains two values (0 and 1, or yes and no).
• Quantitative / Numerical Data
  - Discrete data: the opposite of continuous data; it takes distinct, countable values (e.g., the number of students in a class).
  - Continuous data: numerical variables that can take any value within a range (e.g., height or temperature).
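Before most algorithms can use categorical data it has to be turned into numbers, and the encoding should respect the data type. This sketch assumes common conventions rather than anything prescribed in the lecture: one-hot encoding for nominal data, integer ranks for ordinal data, and 0/1 for binary data; the records and attribute names are hypothetical.

```python
# Hypothetical records with one nominal, one ordinal and one binary attribute
records = [
    {"hair_colour": "black", "size": "Low", "smoker": "yes"},
    {"hair_colour": "brown", "size": "High", "smoker": "no"},
    {"hair_colour": "black", "size": "Medium", "smoker": "no"},
]

NOMINAL_VALUES = sorted({r["hair_colour"] for r in records})  # ['black', 'brown']
ORDINAL_ORDER = {"Low": 0, "Medium": 1, "High": 2}           # preserves Low < Medium < High
BINARY = {"no": 0, "yes": 1}


def encode(record):
    # Nominal: one-hot, because the categories have no order
    one_hot = [1 if record["hair_colour"] == v else 0 for v in NOMINAL_VALUES]
    # Ordinal: integer rank that keeps the order of the categories
    rank = ORDINAL_ORDER[record["size"]]
    # Binary: a single 0/1 value
    flag = BINARY[record["smoker"]]
    return one_hot + [rank, flag]


for r in records:
    print(encode(r))   # [1, 0, 0, 1], then [0, 1, 2, 0], then [1, 0, 1, 0]
```

Giving nominal categories integer codes instead of one-hot vectors would suggest an order (e.g., black < brown) that does not exist.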

Tabulated Data
• Each column is considered a feature, adding one dimension to the data.
• Each column represents different information, and columns may be correlated with one another.
• Collectively, all the columns are known as the features (the feature space), and their number is the total dimension of the data.
ML for Images and Videos
• Unstructured data: this data is usually composed of everything else, including texts, images, videos, speech/audio, and time series.
• A key attribute of the image data type is the presence of spatial features/relationships within images that need to be understood to extract insightful information from them.
• Each greyscale image is 2D data that can be represented as a matrix.
• The video data type consists of videos in different formats. A distinguishing factor of this data type is that the relationships between different frames in the video, with respect to the positions and movements of objects/people, etc., need to be taken into account to better obtain information from the videos.
https://ailephant.com/tag/convolutional-neural-network/
ML for Audio / Time Series
• This type of data is a sequence of ordered data points, each having a timestamp.
• The most salient feature of this data is the relationship between the different data points, such as periodic patterns, seasonal behaviors, and so on.
• For example, if you consider the temperature recorded in a city over the last year and look at the changes over time, you can easily identify that winter months are colder and summer months are hotter.
• This type of insight is basic but can only be observed if you look at the data points together with their timestamps. Figure 2 shows an example visualization of time series data.
ML for Heterogeneous Data
Multimodal learning: fusion of different features / modalities
Notation: Scalars, Vectors
• A scalar is a simple numerical value, like 15 or -3.25.
• Variables or constants that take scalar values are denoted by an italic letter, like x or a.
• A vector is an ordered list of scalar values, called attributes. We denote a vector as a bold character, for example, x or w. Vectors can be visualized as arrows that point in some direction, as well as points in a multi-dimensional space.
• Figure 1 illustrates three two-dimensional vectors, a = [2, 3], b = [-2, 5], and c = [1, 0].
Notation: Sets
Operations on Sets
A derived set creation operator looks like this: $S' \leftarrow \{x^2 \mid x \in S, x > 3\}$.
This notation means that we create a new set S’ by putting into it x squared such that x is in S, and x is greater than 3.
The cardinality operator |S| returns the number of elements in set S.
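Python's set comprehensions mirror this set-builder notation closely. A small sketch, with the contents of S chosen arbitrarily for illustration:

```python
S = {1, 2, 3, 4, 5}                  # hypothetical set S

# S' <- { x^2 | x in S, x > 3 }
S_prime = {x ** 2 for x in S if x > 3}

print(sorted(S_prime))   # [16, 25]
print(len(S))            # cardinality |S| = 5
print(len(S_prime))      # cardinality |S'| = 2
```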

Notation
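The summation over a collection $X = \{x_1, x_2, \ldots, x_n\}$ or over the attributes of a vector is denoted as $\sum_{i=1}^{n} x_i \overset{\text{def}}{=} x_1 + x_2 + \ldots + x_n$, where the notation $\overset{\text{def}}{=}$ means “is defined as”.
Capital Pi Notation
A notation analogous to capital sigma is the capital pi notation. It denotes a product of elements in a collection or of the attributes of a vector: $\prod_{i=1}^{n} x_i \overset{\text{def}}{=} x_1 \cdot x_2 \cdot \ldots \cdot x_n$,
where a·b means a multiplied by b. Writing ab has the same meaning.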
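In code, capital sigma corresponds to summing a collection and capital pi to multiplying its elements. A minimal sketch on a hypothetical collection (`math.prod` requires Python 3.8 or later):

```python
import math

x = [2, 3, 4]            # hypothetical collection x_1, x_2, x_3

sigma = sum(x)           # x_1 + x_2 + x_3
pi = math.prod(x)        # x_1 * x_2 * x_3

print(sigma, pi)         # 9 24
```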
Operations on Vectors
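The sum of two vectors x + z is defined as the vector $[x^{(1)} + z^{(1)}, x^{(2)} + z^{(2)}, \ldots, x^{(m)} + z^{(m)}]$.
The difference of two vectors x - z is defined as the vector $[x^{(1)} - z^{(1)}, x^{(2)} - z^{(2)}, \ldots, x^{(m)} - z^{(m)}]$.
A vector multiplied by a scalar is a vector: $c\,x = [c\,x^{(1)}, c\,x^{(2)}, \ldots, c\,x^{(m)}]$.
A dot-product of two vectors is a scalar: $x \cdot z = \sum_{i=1}^{m} x^{(i)} z^{(i)}$.
The two vectors must be of the same dimensionality. Otherwise, the dot-product is undefined.
The multiplication of a matrix W by a vector x gives another vector as a result.

Operations on Vectors
When vectors participate in operations on matrices, a vector is by default represented as a matrix with one column.
When the vector is on the right of the matrix, it remains a column vector. We can only multiply a matrix by a vector if the vector has the same number of rows as the number of columns in the matrix. Let our matrix and vector be
$W = \begin{bmatrix} w^{(1,1)} & w^{(1,2)} & w^{(1,3)} \\ w^{(2,1)} & w^{(2,2)} & w^{(2,3)} \end{bmatrix}, \quad x = \begin{bmatrix} x^{(1)} \\ x^{(2)} \\ x^{(3)} \end{bmatrix}$.
Then Wx is a two-dimensional vector defined as
$Wx = \begin{bmatrix} w^{(1,1)} x^{(1)} + w^{(1,2)} x^{(2)} + w^{(1,3)} x^{(3)} \\ w^{(2,1)} x^{(1)} + w^{(2,2)} x^{(2)} + w^{(2,3)} x^{(3)} \end{bmatrix}$.
When the vector is on the left side of the matrix in the multiplication, it has to be transposed before we multiply it by the matrix. The transpose of the vector x, denoted $x^\top$, makes a row vector out of a column vector. Let’s say
$x = \begin{bmatrix} x^{(1)} \\ x^{(2)} \end{bmatrix}$, then $x^\top = \begin{bmatrix} x^{(1)} & x^{(2)} \end{bmatrix}$.
The multiplication of the vector x by the matrix W is given by
$x^\top W = \begin{bmatrix} w^{(1,1)} x^{(1)} + w^{(2,1)} x^{(2)} & w^{(1,2)} x^{(1)} + w^{(2,2)} x^{(2)} & w^{(1,3)} x^{(1)} + w^{(2,3)} x^{(2)} \end{bmatrix}$.
We can only multiply a vector by a matrix if the vector has the same number of dimensions as the number of rows in the matrix.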
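The vector and matrix operations above can be written with plain Python lists, which makes the dimensionality rules explicit. This is an illustrative sketch: the helper names are hypothetical, W is an arbitrary 2×3 matrix stored as a list of rows, and a real project would normally use a numerical library instead.

```python
def add(x, z):
    return [a + b for a, b in zip(x, z)]


def subtract(x, z):
    return [a - b for a, b in zip(x, z)]


def scale(c, x):
    return [c * a for a in x]


def dot(x, z):
    if len(x) != len(z):
        raise ValueError("dot product is undefined for vectors of different dimensionality")
    return sum(a * b for a, b in zip(x, z))


def mat_vec(W, x):
    """W x: x is a column vector; len(x) must equal the number of columns of W."""
    return [dot(row, x) for row in W]


def vec_mat(x, W):
    """x^T W: x is a row vector; len(x) must equal the number of rows of W."""
    if len(x) != len(W):
        raise ValueError("vector length must equal the number of rows of the matrix")
    return [dot(x, column) for column in zip(*W)]


a, b, c = [2, 3], [-2, 5], [1, 0]   # the vectors from Figure 1
W = [[1, 2, 3],
     [4, 5, 6]]                      # a hypothetical 2x3 matrix

print(add(a, b))             # [0, 8]
print(subtract(a, b))        # [4, -2]
print(scale(2, c))           # [2, 0]
print(dot(a, b))             # 2*(-2) + 3*5 = 11
print(mat_vec(W, [1, 0, 2])) # two-dimensional result: [7, 16]
print(vec_mat([1, 2], W))    # three-dimensional result: [9, 12, 15]
```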
Derivative and Gradient
A derivative f’ of a function f is a function or a value that describes how fast f grows (or decreases).
If the derivative is a constant value, like 5 or -3, then the function grows (or decreases) constantly at any point x of its domain.
If the derivative f’ is a function, then the function f can grow at a different pace in different regions of its domain.
If the derivative f’ is positive at some point x, then the function f grows at this point.
If the derivative of f is negative at some x, then the function decreases at this point.
A derivative of zero at x means that the function’s slope at x is horizontal.
The process of finding a derivative is called differentiation.
Gradient is the generalization of derivative for functions that take several inputs (or one input in the form of a vector or some other complex structure). A gradient of a function is a vector of partial derivatives.
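Derivatives and gradients can be approximated numerically, which is a handy way to check the sign rules above. This sketch uses central finite differences with an arbitrarily chosen step `h`; the example functions f and g are hypothetical.

```python
def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient(f, point, h=1e-6):
    """Vector of partial derivatives, each estimated by nudging one input."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        forward[i] += h
        backward = list(point)
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    return x ** 2                 # exact derivative: f'(x) = 2x


def g(v):
    return v[0] ** 2 + 3 * v[1]   # exact partial derivatives: 2*v[0] and 3


print(derivative(f, 3.0))         # about 6: f grows at x = 3
print(derivative(f, -1.0))        # about -2: f decreases at x = -1
print(derivative(f, 0.0))         # 0: horizontal slope at x = 0
print(gradient(g, [1.0, 2.0]))    # about [2, 3]
```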
A Few Other Concepts
A random variable, usually written as an italic capital letter, like X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables: discrete and continuous.
The probability distribution of a discrete random variable is described by a list of probabilities associated with each of its possible values.
A continuous random variable takes an infinite number of possible values in some interval. Examples include height, weight, and time.
Max and Arg Max: the max operator returns the highest value of a function over a set, and the arg max operator returns the element(s) of the set that maximize the function.
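The difference between max and arg max is easy to see in code. In this sketch the function f and the set A are hypothetical; note that Python's `max(A, key=f)` returns only the first maximizing element, whereas arg max in general denotes the set of all maximizers, which the last line computes explicitly.

```python
def f(a):
    return -(a - 2) ** 2 + 5      # hypothetical function, peaks at a = 2


A = [0, 1, 2, 3, 4]               # hypothetical set of candidate inputs

max_value = max(f(a) for a in A)                 # max over A of f(a)
arg_max = max(A, key=f)                          # first element achieving it
all_maximizers = [a for a in A if f(a) == max_value]

print(max_value, arg_max)         # 5 2
print(all_maximizers)             # [2]
```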
Summary
Learning?
Applications of Machine Learning
Representation, Evaluation, Optimization
Types of Learning
Trade-offs in Machine Learning
