
LECTURER:

Humera Farooq, Ph.D.


Computer Sciences Department,
Bahria University (Karachi Campus)
COURSE ASSESSMENT

 Assignments / Research Project: 20%

 Test / Quiz: 10%

 Mid-Term: 30%

 Final Examination: 40%
Text Books and Reading Material
 Machine Learning, Tom Mitchell, McGraw-Hill.
 Pattern Recognition and Machine Learning,
Christopher M. Bishop
 Machine Learning: a Probabilistic Perspective,
Kevin Murphy
 The Hundred-Page Machine Learning Book, Andriy Burkov
 Peter Flach, Machine Learning: The art and
science of algorithms that make sense of data.
Cambridge University Press, 2012.
Outline

1. ML in a Nutshell
2. Representation, Evaluation, Optimization
3. Types of Learning
4. Trade-offs in Machine Learning
Machine Learning

 Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans, or be generated by another algorithm.
 Machine learning can also be defined as the process of solving a practical problem by
1) Gathering a dataset
2) Algorithmically building a statistical model based on that dataset.
Machine Learning can work like:

 Reduce programming time (e.g. spelling correction)

 Customize and scale products (specific tasks, e.g. language software)

 Complete seemingly “unprogrammable” tasks (e.g. understanding grammar and spacing through ML)
ML in a Nutshell

 Tens of thousands of machine learning algorithms
 Hundreds of new ones every year
 Every machine learning algorithm has three
components:
 Representation
 Evaluation
 Optimization
Representation

 Decision trees
 Sets of rules / Logic programs
 Instances
 Graphical models (Bayes/Markov nets)
 Neural networks
 Support vector machines
Evaluation

 Accuracy
 Precision and recall
 Squared error
 Likelihood
 Posterior probability
 Cost / Utility
 Margin
 Entropy
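A few of the evaluation measures listed above can be sketched in plain Python. This is only an illustration: the function names and the toy label lists are assumptions made for this example, and the squared error is shown in its averaged form (mean squared error).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Which measure is appropriate depends on the task: accuracy, precision and recall apply to discrete predictions, while squared error applies to numerical ones.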
Optimization

 Combinatorial optimization
 E.g.: Greedy search
 Convex optimization
 E.g.: Gradient descent
 Constrained optimization
 E.g.: Linear programming
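The three components can be seen together in a very small example. The sketch below assumes a one-variable linear model as the representation, mean squared error as the evaluation, and plain gradient descent as the optimization; the data, learning rate and step count are made-up values for illustration.

```python
def predict(w, b, x):
    """Representation: a linear model y = w * x + b."""
    return w * x + b


def mean_squared_error(w, b, xs, ys):
    """Evaluation: average squared error of the model on the data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Optimization: gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
final_error = mean_squared_error(w, b, xs, ys)
```

Swapping any one component (for instance a decision tree as the representation, or accuracy as the evaluation) gives a different learning algorithm.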
Types of Learning

 Supervised (inductive) learning


 Training data includes desired outputs
 Unsupervised learning
 Training data does not include desired outputs
 Semi-supervised learning
 Training data includes a few desired outputs
 Reinforcement learning
 Rewards from sequence of actions
Types of Algorithms
A Taxonomy of Machine Learning Techniques:
Highlight on Important Approaches

Supervised
  Linear: Linear Regression, Logistic Regression, Perceptron
  Nonlinear
    Single (ordered from easy to interpret to hard to interpret): Decision Trees, Rule Learning, Naïve Bayes, k-Nearest Neighbours, Multi-Layer Perceptron, SVM
    Combined: Bagging, Boosting, Random Forests

Unsupervised: K-Means, EM, Self-Organizing Maps
“Best” Machine Learning Algorithm
• Bad news: no algorithm is the best
i.e. no machine learning algorithm will perform well on every task / dataset

• Good news: all of them are the best
i.e. each machine learning algorithm will perform well on some task / dataset

• “No free lunch” theorem
Wolpert (1996): all algorithms perform equally well when averaged over all possible problems
Trade-offs in Machine Learning
• Accuracy vs. interpretability

• Bias vs. variance

• Complexity vs. scalability
Some models, or the algorithms for computing them, may not scale to large datasets

• Domain knowledge vs. data-driven

• More data vs. better algorithm


Preparing Data
• Machine learning algorithms require data!
• Preprocessing is often necessary to transform the data prior to applying a learning algorithm:
Sampling: selecting a subset of observations

Feature extraction: selecting or deriving input variables

Normalization (standardization, scaling, binarization)

Handling missing data

• The preprocessing required may depend on the algorithm
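The preprocessing steps above can be sketched with the Python standard library. The helper names, the toy column `raw` and the choice of mean imputation for missing values are assumptions made for this illustration, not a prescribed recipe.

```python
import random
import statistics


def impute_mean(values):
    """Handle missing data: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Standardization: zero mean, unit standard deviation (non-constant input)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Scaling: map the values linearly onto [0, 1] (non-constant input)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, threshold):
    """Binarization: 1 if the value exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


# Toy feature column with two missing entries (invented for illustration).
raw = [2.0, None, 4.0, 6.0, None, 8.0]

filled = impute_mean(raw)
scaled = min_max_scale(filled)
z_scores = standardize(filled)
flags = binarize(filled, threshold=5.0)

# Sampling: select a subset of observations, with a fixed seed for repeatability.
sample = random.Random(0).sample(filled, k=3)
```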


Supervised Learning

 The learning algorithm receives a set of inputs along with the corresponding correct outputs to train a model

Training Data (Labeled Data) → Model → Prediction

Discrete Prediction
the output is qualitative (categorical). For example:
• whether the value of stock Z will have increased or decreased one year from now
• whether a credit card transaction is fraudulent or authentic

Quantitative Prediction
the output Y is a continuous / numerical / ordered value. For example:
• the value of stock Z one year from now
• a person’s income based on demographic factors
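The difference between discrete and quantitative prediction can be illustrated with one simple supervised method. The sketch below uses k-nearest neighbours as an arbitrary choice; the labeled data (transaction amounts, years of experience, incomes) is entirely hypothetical.

```python
from collections import Counter


def k_nearest(train, x, k):
    """Return the k labeled examples whose input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_classify(train, x, k=3):
    """Discrete prediction: majority label among the k nearest neighbours."""
    labels = [label for _, label in k_nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Quantitative prediction: mean target of the k nearest neighbours."""
    targets = [target for _, target in k_nearest(train, x, k)]
    return sum(targets) / len(targets)


# Hypothetical labeled data: (transaction amount, label).
transactions = [
    (5.0, "authentic"), (8.0, "authentic"), (12.0, "authentic"),
    (900.0, "fraudulent"), (950.0, "fraudulent"), (1000.0, "fraudulent"),
]

# Hypothetical labeled data: (years of experience, income in thousands).
incomes = [
    (1.0, 30.0), (2.0, 32.0), (3.0, 34.0),
    (10.0, 60.0), (11.0, 62.0), (12.0, 64.0),
]

label = knn_classify(transactions, 920.0)
income = knn_regress(incomes, 2.0)
```

In both cases the training data carries the desired output; only the type of that output differs.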
Unsupervised Learning

 The learning algorithm receives unlabeled raw data to train a model and to find patterns in the data

Training Data (Unlabeled Data) → Model → Clustering

Dimensionality reduction
create new features from original inputs that retain important information. For example:
• represent a document as a small set of topics instead of as a large collection of words

Cluster analysis
partition data into subsets that share common characteristics. For example:
• group similar patients in a medical database
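Cluster analysis can be sketched with k-means, one of several possible clustering methods. The points and the initial centroids below are made up for illustration, and the initial centroids are passed in explicitly so that the run is deterministic.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # keep an empty cluster where it was
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids


# Toy unlabeled data with two visible groups (invented for illustration).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
```

No desired outputs are given; the grouping is found from the structure of the inputs alone.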
Semisupervised Learning

 The learning algorithm receives labeled and unlabeled raw data to train a model. The main objective is to make efficient use of the unlabeled data

Training Data (Labeled + Unlabeled Data) → Model → Data Modeling and Augmentation
Reinforcement Learning
 Reinforcement learning is a subfield of machine learning where the machine “lives” in an
environment and is capable of perceiving the state of that environment as a vector of features.
 The machine can execute actions in every state.
 Different actions bring rewards and could also move the machine to another state of the
environment.
 The goal of a reinforcement learning algorithm is to learn a policy.
 A policy is a function f (similar to the model in supervised learning) that takes the feature
vector of a state as input and outputs an optimal action to execute in that state. The action is
optimal if it maximizes the expected average reward.

https://techvidvan.com/tutorials/reinforcement-learning/
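A policy of the kind described above can be sketched as a function from a state's feature vector to an action. The table `expected_reward`, its states and its action names are hypothetical values chosen for illustration; in practice a reinforcement learning algorithm has to learn such estimates from experience.

```python
# Hypothetical estimates of the expected reward of each action in each state.
# A state is represented by its feature vector, written here as a tuple.
expected_reward = {
    (0, 0): {"left": 0.1, "right": 0.7},
    (0, 1): {"left": 0.4, "right": 0.2},
}


def policy(state):
    """Return the action with the highest estimated expected reward in this state."""
    actions = expected_reward[state]
    return max(actions, key=actions.get)


chosen = [policy(state) for state in [(0, 0), (0, 1)]]
```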
Key Issues in Machine Learning
 Modeling
 How to formulate application problems as machine learning problems? How to
represent the data?
 Learning protocols (where are the data and labels coming from?)

 Representation
 What functions should we learn (hypothesis spaces) ?
 How to map raw input to an instance space?
 Any rigorous way to find these? Any general approach?

 Algorithms
 What are good algorithms?
 How do we define success?
 Generalization vs. overfitting
 The computational problem
Machine Learning as a Process

- Define measurable and quantifiable goals
- Use this stage to learn about the problem

- Normalization
- Transformation
- Missing Values
- Outliers

- Data Splitting
- Feature Engineering
- Estimating Performance
- Evaluation and Model Selection

- Study model accuracy
- Work better than the naïve approach or previous system
- Do the results make sense in the context of the problem
A Complete Learning Process
Data Types
 Qualitative / Categorical Data

 Nominal data: categories with no inherent order, in contrast to ordinal data (e.g. gender, hair colour).
 Ordinal data: categories that have some order to them, e.g. (Low, Medium, High) or (First, Second, Third).
 Binary data: data that takes only two values, e.g. (0 and 1) or (yes and no).

 Quantitative / Numerical Data

 Discrete data: numeric values that are countable and distinct, in contrast to continuous data.

 Continuous data: numeric values that can take any value within an interval, so there are infinitely many possible values.
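One practical consequence of these data types is how they are turned into numbers before learning. The sketch below shows two common encodings as an assumption of this example rather than something the slides prescribe: a one-hot vector for nominal values (no order implied) and an integer rank for ordinal values (order preserved). The category lists are hypothetical.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector; no order between categories is implied."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def ordinal_encode(value, ordered_levels):
    """Encode an ordinal value as its rank in the ordered list of levels."""
    return ordered_levels.index(value)


hair_colours = ["black", "brown", "blonde"]  # nominal: order is arbitrary
levels = ["Low", "Medium", "High"]           # ordinal: order is meaningful

brown_vector = one_hot("brown", hair_colours)
high_rank = ordinal_encode("High", levels)
```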

Tabulated Data

Each column is considered a feature and adds one dimension to the data

Each column represents different information (an example of correlation)

Collectively, all columns are known as the features / feature space, and their number is the total dimension of the data
ML for Images and Videos
Unstructured Data: This data is usually composed of everything else, including texts, images, videos, speech/audio, time series

A key attribute of the image data type is the presence of spatial features/relationships within images that need to be understood to extract insightful information from the images.

Each greyscale image is 2D data that can be represented as a matrix.

The video-based data type consists of videos in different formats.

A distinguishing factor of this data type is that the relationships between different frames in the video, with respect to positions, movements of objects/people, etc., need to be taken into account to better obtain information from the videos.

https://ailephant.com/tag/convolutional-neural-network/
ML for Audio / Time Series
 This type of data has a sequence of ordered data points each having a timestamp.
 The most salient feature in this data is the relationship between the different data points such as periodic
patterns, seasonal behaviors, and so on.
 For example, if you consider the temperature recorded in a city over the last year and look at the changes over time,
you can easily identify that winter months are colder and summer months are hotter.
 This type of insight is basic but can only be observed if you look at the data points with their timestamps. Figure
2 shows an example visualization of time series data.
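The temperature example can be sketched by grouping readings by the month of their timestamp. The readings below are invented for illustration; the point is only that the seasonal pattern becomes visible once the timestamps are used.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (timestamp, temperature in degrees Celsius) readings.
readings = [
    (date(2023, 1, 5), 14.0),
    (date(2023, 1, 20), 16.0),
    (date(2023, 7, 3), 33.0),
    (date(2023, 7, 18), 35.0),
]


def monthly_mean(readings):
    """Average the temperature per calendar month, using each reading's timestamp."""
    by_month = defaultdict(list)
    for day, temperature in readings:
        by_month[day.month].append(temperature)
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


means = monthly_mean(readings)
```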
ML for Heterogeneous Data

Multimodal Learning / Fusion of different features / modalities
Notation: Scalars, Vectors
 A scalar is a simple numerical value, like 15 or -3.25.

 Variables or constants that take scalar values are denoted by an italic letter, like x or a.

 A vector is an ordered list of scalar values, called attributes. We denote a vector as a bold character, for example, x or w.
Vectors can be visualized as arrows that point in some direction, as well as points in a multi-dimensional space.

 Illustrations of three two-dimensional vectors, a = [2, 3], b = [-2, 5], and c = [1, 0], are given in Fig. 1.

Fig. 1: Three two-dimensional vectors, a = [2, 3], b = [-2, 5], and c = [1, 0]
Notation: Sets

Operations on Sets
A derived set creation operator looks like this: $S' \leftarrow \{x^2 \mid x \in S, x > 3\}$.

This notation means that we create a new set $S'$ by putting into it $x$ squared such that $x$ is in $S$ and $x$ is greater than 3.

The cardinality operator $|S|$ returns the number of elements in set $S$.
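The set-builder notation corresponds closely to a Python set comprehension. The set `S` below is an arbitrary example chosen for illustration.

```python
# An example set, chosen arbitrarily.
S = {1, 2, 3, 4, 5, 6}

# Derived set: x squared for every x in S with x greater than 3.
S_prime = {x ** 2 for x in S if x > 3}

# Cardinality: the number of elements in each set.
cardinality_S = len(S)
cardinality_S_prime = len(S_prime)
```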


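Capital Sigma Notation
The summation over a collection of values $x_1, x_2, \ldots, x_n$, or over the attributes of a vector, is denoted as
$$\sum_{i=1}^{n} x_i \stackrel{\text{def}}{=} x_1 + x_2 + \ldots + x_n,$$
where the notation $\stackrel{\text{def}}{=}$ means “is defined as”.

Capital Pi Notation
A notation analogous to capital sigma is the capital pi notation. It denotes a product of elements in a collection
or attributes of a vector:
$$\prod_{i=1}^{n} x_i \stackrel{\text{def}}{=} x_1 \cdot x_2 \cdot \ldots \cdot x_n,$$
where $a \cdot b$ means $a$ multiplied by $b$. Written as $ab$, it has the same meaning.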
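The sum and product over a collection map directly onto two standard-library calls. The list `x` is an arbitrary example.

```python
import math

# An example collection of values.
x = [2, 3, 4]

# Capital sigma: the sum of all elements, 2 + 3 + 4.
total = sum(x)

# Capital pi: the product of all elements, 2 * 3 * 4.
product = math.prod(x)
```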
Operations on Vectors

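The sum of two vectors $\mathbf{x} + \mathbf{z}$ is defined as the vector $[x_1 + z_1, x_2 + z_2, \ldots, x_m + z_m]$.

The difference of two vectors $\mathbf{x} - \mathbf{z}$ is defined as the vector $[x_1 - z_1, x_2 - z_2, \ldots, x_m - z_m]$.

A vector multiplied by a scalar is a vector: $\mathbf{x}c = [cx_1, cx_2, \ldots, cx_m]$.

A dot-product of two vectors is a scalar: $\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{m} w_i x_i$.

The two vectors must be of the same dimensionality. Otherwise, the dot-product is undefined.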
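These vector operations can be sketched with plain Python lists. The function names are chosen for this example, and the vectors a and b are the ones used in the earlier illustration of two-dimensional vectors.

```python
def check_same_dimensionality(x, z):
    """Raise an error if the two vectors differ in dimensionality."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same dimensionality")


def add(x, z):
    """Sum of two vectors, attribute by attribute."""
    check_same_dimensionality(x, z)
    return [xi + zi for xi, zi in zip(x, z)]


def subtract(x, z):
    """Difference of two vectors, attribute by attribute."""
    check_same_dimensionality(x, z)
    return [xi - zi for xi, zi in zip(x, z)]


def scale(x, c):
    """A vector multiplied by a scalar is a vector."""
    return [c * xi for xi in x]


def dot(w, x):
    """The dot-product of two vectors is a scalar."""
    check_same_dimensionality(w, x)
    return sum(wi * xi for wi, xi in zip(w, x))


a = [2, 3]
b = [-2, 5]

a_plus_b = add(a, b)
a_minus_b = subtract(a, b)
twice_a = scale(a, 2)
a_dot_b = dot(a, b)
```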

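The multiplication of a matrix $\mathbf{W}$ by a vector $\mathbf{x}$ gives another vector as a result.


Operations on Vectors
When vectors participate in operations on matrices, a vector is by default represented as a matrix with one column.
When the vector is on the right of the matrix, it remains a column vector. We can only multiply a matrix by a vector if
the vector has the same number of rows as the number of columns in the matrix. Let $\mathbf{W}$ be a matrix with two rows and $n$ columns, with entries $w_{ij}$, and let our vector be
$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

Then $\mathbf{W}\mathbf{x}$ is a two-dimensional vector defined as
$$\mathbf{W}\mathbf{x} = \begin{bmatrix} w_{11}x_1 + w_{12}x_2 + \ldots + w_{1n}x_n \\ w_{21}x_1 + w_{22}x_2 + \ldots + w_{2n}x_n \end{bmatrix}.$$

When the vector is on the left side of the matrix in the multiplication, then it has to be transposed before we multiply
it by the matrix. The transpose of the vector $\mathbf{x}$, denoted as $\mathbf{x}^\top$, makes a row vector out of a column vector. Let’s
say
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$

then
$$\mathbf{x}^\top = [x_1, x_2].$$

The multiplication of the vector $\mathbf{x}$ by the matrix $\mathbf{W}$ is given by
$$\mathbf{x}^\top\mathbf{W} = [w_{11}x_1 + w_{21}x_2, \; w_{12}x_1 + w_{22}x_2, \; \ldots, \; w_{1n}x_1 + w_{2n}x_2].$$

We can only multiply a vector by a matrix if the vector has the same number of dimensions as the number of rows
in the matrix.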
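Both multiplication orders can be sketched with nested lists. The matrix `W` with two rows and three columns, and the two example vectors, are hypothetical values chosen so that the dimension rules stated above can be checked.

```python
def mat_vec(W, x):
    """Compute Wx: x needs as many entries as W has columns."""
    if any(len(row) != len(x) for row in W):
        raise ValueError("vector length must equal the number of columns of W")
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vec_mat(x, W):
    """Compute x^T W: x needs as many entries as W has rows."""
    if len(x) != len(W):
        raise ValueError("vector length must equal the number of rows of W")
    n_cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(n_cols)]


# A hypothetical matrix with two rows and three columns.
W = [
    [1, 2, 3],
    [4, 5, 6],
]

Wx = mat_vec(W, [1, 0, 2])   # a three-dimensional vector on the right
xTW = vec_mat([1, 2], W)     # a two-dimensional vector on the left
```

The result of `mat_vec` has one entry per row of `W`, while the result of `vec_mat` has one entry per column.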
Derivative and Gradient
A derivative f’ of a function f is a function or a value that describes how fast f grows (or decreases).

If the derivative is a constant value, like 5 or -3, then the function grows (or decreases) at a constant rate at every
point x of its domain.

If the derivative f’ is a function, then the function f can grow at a different pace in different regions of its domain.

If the derivative f’ is positive at some point x, then the function f grows at this point.

If the derivative of f is negative at some x, then the function decreases at this point.

A derivative of zero at x means that the function’s slope at x is horizontal.

The process of finding a derivative is called differentiation.

Gradient is the generalization of derivative for functions that take several inputs (or one input in
the form of a vector or some other complex structure). A gradient of a function is a vector of
partial derivatives.
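These statements can be checked numerically. The sketch below approximates derivatives with central finite differences, which is an approximation technique chosen for this example and not differentiation itself; the functions `square` and `g` and the step size `h` are assumptions.

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient(f, point, h=1e-6):
    """Approximate the gradient of f: one partial derivative per input."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def square(x):
    """f(x) = x^2, whose derivative is 2x."""
    return x ** 2


def g(v):
    """g(v) = v[0]^2 + 3 * v[1], whose gradient is [2 * v[0], 3]."""
    return v[0] ** 2 + 3 * v[1]


slope_at_3 = derivative(square, 3.0)        # positive: the function grows here
slope_at_minus_2 = derivative(square, -2.0)  # negative: the function decreases here
slope_at_0 = derivative(square, 0.0)         # zero: the slope is horizontal
grad_g = gradient(g, [1.0, 2.0])
```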
Few other concepts

A random variable, usually written as an italic capital letter, like X, is a variable whose possible values are
numerical outcomes of a random phenomenon. There are two types of random variables: discrete and
continuous.

The probability distribution of a discrete random variable is described by a list of probabilities associated
with each of its possible values.

A continuous random variable takes an infinite number of possible values in some interval. Examples include
height, weight, and time.

Max and arg max: given a set $A$ and a function $f$, the operator $\max_{a \in A} f(a)$ returns the highest value of $f(a)$ over all elements of $A$, and

$\arg\max_{a \in A} f(a)$ returns the element(s) of $A$ that maximize $f(a)$.
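The difference between the two operators is easy to see in code. The set `A` and the function `f` are arbitrary choices for this example.

```python
def f(a):
    """An example function with its highest value at a = 3."""
    return -(a - 3) ** 2 + 10


A = {0, 1, 2, 3, 4, 5}

# Max: the highest value that f takes over the elements of A.
max_value = max(f(a) for a in A)

# Arg max: the element(s) of A at which that highest value is reached.
arg_max = sorted(a for a in A if f(a) == max_value)
```

Here max returns a value of the function, while arg max returns elements of the set.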
Summary

Learning?
Applications of Machine Learning
Representation, Evaluation, Optimization
Types of Learning
Trade-offs in Machine Learning
