INTRODUCTION
Fall 2025 MC460303 Machine Learning
B1:
Marsland, Stephen. Machine Learning: An Algorithmic Perspective, 2nd Edition. CRC Press, 2015.
Credits
1. B1
2. https://en.wikipedia.org/wiki/Curse_of_dimensionality
3. Keogh, Eamonn, and Abdullah Mueen. "Curse of Dimensionality." Encyclopedia of Machine Learning. Springer US, 2011. 257-258.
4. https://en.wikipedia.org/wiki/Precision_and_recall
5. http://scott.fortmann-roe.com/docs/BiasVariance.html
6. https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
7. http://homepages.cae.wisc.edu/~ece539/project/s01/qi.ppt
8. https://medium.com/@UdacityINDIA/difference-between-machine-learning-deep-learning-and-artificial-intelligence-e9073d43a4c3
Assignment
Read:
B1: Ch. 1, Ch. 2
Problems:
AI vs. Machine Learning vs. Deep Learning
Credit: https://www.geospatialworld.net/blogs/difference-between-ai%EF%BB%BF-machine-learning-and-deep-learning/
Soft Computing vs. Hard Computing
Soft computing is the use of approximate calculations to provide
imprecise but usable solutions to complex computational problems.
Neural Networks
Genetic Algorithms
Fuzzy Logic
In contrast, Hard computing techniques consist of algorithms that find
provably correct and optimal solutions to problems.
Soft Computing vs. Hard Computing, point by point:
1. Soft computing is tolerant of imprecision, uncertainty, partial truth, and approximation. Hard computing needs an exactly stated analytic model.
2. Soft computing relies on formal logic and probabilistic reasoning. Hard computing relies on binary logic and crisp systems.
3. Soft computing has the features of approximation and dispositionality. Hard computing has the features of exactitude (precision) and categoricity.
4. Soft computing is stochastic in nature. Hard computing is deterministic in nature.
5. Soft computing works on ambiguous and noisy data. Hard computing works on exact data.
6. Soft computing can perform parallel computations. Hard computing performs sequential computations.
7. Soft computing produces approximate results. Hard computing produces precise results.
8. Soft computing can evolve its own programs. Hard computing requires programs to be written.
9. Soft computing incorporates randomness. Hard computing is settled (deterministic).
10. Soft computing uses multivalued logic. Hard computing uses two-valued logic.
Credit: https://www.geeksforgeeks.org/difference-between-soft-computing-and-hard-computing/
Machine Learning
Computer algorithms that allow computer programs to automatically
improve through experience…
…without explicit programming for the problem at hand.
Types of Machine Learning
Supervised learning
Unsupervised learning
Reinforcement learning
Evolutionary learning
Credit: https://learncuriously.wordpress.com/2018/12/22/machine-learning-vs-human-learning-part-1/
A Supervised Learning Example
Apples vs. Oranges
Helpful instruments and attributes!
Camera
Color, Shape
Weighing machine
Freshness: pH sensor, Moisture sensor, and Gas sensor
(http://www.ijsce.org/wp-content/uploads/papers/v8i3/C3146078318.pdf)
Texture profile analysis: using visible and near infrared hyperspectral
imaging
https://pubmed.ncbi.nlm.nih.gov/24128497/
…and more?
Features compared (Apple vs. Orange):
Redness (from color), [0,1], 1 → completely red: apple is more towards red; orange is more towards orange/yellow.
Roughness (from texture), [0,1], 0 → completely smooth: apple has a smooth surface; orange has a rough surface.
Moisture, [0,1], 1 → 100% water: apple has a lower % of water; orange has a higher % of water.
A simple threshold rule:
If redness > .85
And if roughness < .1
And if moisture < .4
Then output “Apple”
Else output “Orange”
The same rule with named threshold parameters:
If redness > 𝛼 (𝛼 = .85)
And if roughness < 𝛽 (𝛽 = .1)
And if moisture < 𝛾 (𝛾 = .4)
Then output “Apple”
Else output “Orange”
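A minimal sketch of this rule in Python (the feature scales and thresholds are the ones on the slide; the function name and structure are our own):

```python
# Threshold rule from the slide: three feature tests decide Apple vs. Orange.
ALPHA, BETA, GAMMA = 0.85, 0.10, 0.40   # redness, roughness, moisture thresholds

def classify(redness, roughness, moisture):
    if redness > ALPHA and roughness < BETA and moisture < GAMMA:
        return "Apple"
    return "Orange"

print(classify(0.9, 0.08, 0.35))   # -> Apple
```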
Consider the following sample:
Redness = .9, Roughness = .08, Moisture = .35, Fruit type = Apple
With 𝛼 = .85, 𝛽 = .1, 𝛾 = .4 the rule outputs “Apple”: correctly classified, no update needed.
Consider the following sample:
Redness = .91, Roughness = .15, Moisture = .31, Fruit type = Apple ✘
The roughness test fails (.15 is not < 𝛽 = .1), so we need to update the threshold parameter 𝛽 = .1 and push it towards .15.
Δ𝛽 ← a fraction of (𝛽expected − 𝛽existing)
Δ𝛽 ← 𝜇 (𝛽expected − 𝛽existing), where 𝜇 = .3 is the learning rate
𝛽new ← 𝛽existing + 𝜇 (𝛽expected − 𝛽existing) = .1 + .3 (.15 − .1) = .115
Consider the following sample:
Redness = .7, Roughness = .11, Moisture = .33, Fruit type = Apple ✘
The redness test fails (.7 is not > 𝛼 = .85), so we need to update the threshold parameter 𝛼 = .85 and push it towards .7.
Δ𝛼 ← 𝜇 (𝛼expected − 𝛼existing)
𝛼new ← 𝛼existing + 𝜇 (𝛼expected − 𝛼existing) = .85 + .3 (.7 − .85) = .805
…and so on.
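A hedged sketch of this update step (𝜇 and the numbers are from the slide; the helper function is our own naming):

```python
# Move a threshold a fraction mu of the way towards the value that would
# have classified the misclassified sample correctly.
MU = 0.3   # learning rate (hyperparameter)

def update(existing, expected, mu=MU):
    return existing + mu * (expected - existing)

print(update(0.10, 0.15))   # beta:  0.115
print(update(0.85, 0.70))   # alpha: 0.805
```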
Model = Algorithm + Trained Parameters
Features: redness, roughness, moisture
Algorithm: the threshold rule
Parameters: hyperparameter 𝜇; model parameters 𝛼, 𝛽, 𝛾
Training, Testing
The algorithm varies across different machine learning techniques
For example:
𝑓(𝛼, 𝛽, 𝛾) = (sin 𝛼 + cos(𝛽/2)) / 𝑒^𝛾
If 𝑓(𝛼, 𝛽, 𝛾) ≥ .3
Then output “Apple”
Else output “Orange”
The above algorithm may not make sense, but it is an algorithm nonetheless.
Key Terms
Features
Algorithm
Parameter
Hyperparameter
Model parameter
Model
Training, Testing
Supervised learning
Learning algorithm
Classification Problem
x1 and x2 are called “features”
Class is the output
Unsupervised learning
Correct responses are not provided, but instead the algorithm tries to
identify similarities between the inputs so that inputs that have something in
common are categorized together.
An Unsupervised Learning Example
Good apples
Very fresh, the best of the lot
Average apples
Averagely fresh
Below average
About to go bad, slightly damaged
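The slides do not name an algorithm for this grouping; below is a minimal sketch, assuming we cluster apples on a single made-up "freshness" score with a bare-bones k-means loop:

```python
import numpy as np

# Hypothetical freshness scores in [0, 1] for a batch of apples.
freshness = np.array([0.95, 0.9, 0.88, 0.6, 0.55, 0.5, 0.2, 0.15, 0.1])

# Plain k-means on one feature: assign each apple to the nearest of k centres,
# then move each centre to the mean of its members.
k = 3
rng = np.random.default_rng(0)
centres = freshness[rng.choice(len(freshness), size=k, replace=False)]
for _ in range(20):
    labels = np.argmin(np.abs(freshness[:, None] - centres[None, :]), axis=1)
    centres = np.array([freshness[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])

print(sorted(centres))   # roughly one centre per quality group
```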
Reinforcement Learning
Somewhere between supervised and unsupervised learning.
The algorithm gets a “reward score” for each prediction (action), but is not told how to improve that score.
It has to explore and try out different possibilities until it maximizes
the reward score.
An Example
The robot mouse has no idea of the room layout (obstructions and paths).
The robot mouse only knows how to recognize the block with cheese: the end goal.
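The slides do not specify an algorithm for this; below is a minimal Q-learning sketch in that spirit (the corridor world, reward, and parameter values are our own illustration, not from the slides):

```python
import random

# States 0..4 form a corridor; the cheese sits at state 4.
# Actions: 0 = left, 1 = right. The mouse learns by trial and error.
N_STATES, CHEESE = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]       # Q[state][action]
alpha, gamma, epsilon = 0.5, 0.9, 0.2           # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != CHEESE:
        # Explore sometimes, otherwise exploit the best known action.
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s_next == CHEESE else 0.0     # reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([q.index(max(q)) for q in Q])              # learned policy: mostly "right" (1)
```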
Evolutionary Learning
Evolution as a search problem
Competing animals and “Survival of the fittest”
“Fittest” animals
◼ Live longer
◼ Stronger
◼ More attractive
Hence, they get more mates and produce more and “healthier” offspring.
Nature is biased towards “fitter” animals for sexual reproduction.
This is the basis for Genetic Algorithms.
A child inherits chromosomes from its parents.
Survival of the fittest means a child can be better than its parents.
Genetic Algorithm (GA)
Modelling a problem as a GA
A method for representing solutions as chromosomes (or strings of characters)
A way to calculate the fitness of a solution
One generation
A selection method to choose parents
A way to generate offspring by breeding the parents
Select, Produce, Repeat!
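A hedged skeleton of one GA generation in Python, maximizing a toy fitness function (the bit-string encoding, fitness, and parameter choices are our own illustration):

```python
import random

# Toy setup: a solution is a 16-bit string; fitness = number of 1s ("OneMax").
L, POP, GENS, P_MUT = 16, 30, 40, 0.05
fitness = lambda chrom: sum(chrom)

def select(pop):
    """Tournament selection: pick the fitter of two random individuals."""
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def breed(p1, p2):
    """Single-point crossover followed by bit-flip mutation."""
    cut = random.randrange(1, L)
    child = p1[:cut] + p2[cut:]
    return [1 - g if random.random() < P_MUT else g for g in child]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):                       # select, produce, repeat!
    pop = [breed(select(pop), select(pop)) for _ in range(POP)]

print(max(fitness(c) for c in pop))         # approaches 16
```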
Regression Problem
What is the value of y when x = 0.44?
Function Approximation
[Figure: the same data shown four ways: points plotted in 2D; joined with straight lines (goes through all data points, but the spike looks out of place); plotted using y = 3 sin(5x); approximated using a cubic function]
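A small sketch of this kind of function approximation with NumPy (the noisy samples of y = 3 sin(5x) and the choice of a cubic are our own illustration of the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = 3 * np.sin(5 * x) + rng.normal(0, 0.2, x.size)   # noisy samples of the target

coeffs = np.polyfit(x, y, deg=3)                      # fit a cubic polynomial
estimate = np.polyval(coeffs, 0.44)                   # predict y at x = 0.44
print(estimate, 3 * np.sin(5 * 0.44))                 # compare with the true value
```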
Classification Problem
Decision Boundaries
Machine Learning Process
1. Data Collection and Preparation
#Samples, error-free, etc.
2. Feature Selection
#features, which features to select, etc.
3. Algorithm Choice/Model Selection
Which algorithm to select
4. Parameters
Algorithms can be parametrized
5. Training
Fitting the model to the training dataset so that it generalizes, i.e., can predict outputs for unseen data
6. Evaluation
How well the trained model fares on unseen data
Input
Input is a vector of n dimensions
x = {𝑥0 , 𝑥1 , … , 𝑥𝑛−1 }
Sometimes, it is represented as a column vector.
Curse of Dimensionality
As dimensionality increases, the volume of the space increases so fast that the
available data become sparse.
In order to obtain a statistically sound and reliable result, the amount of data
needed to support the result often grows exponentially with the dimensionality.
[Example data, 10 points: (X, Y) = (4, 9), (8, 4), (8, 5), (9, 9), (1, 7), (6, 8), (10, 8), (10, 10), (10, 7), (1, 3)]
Organizing and searching data often relies on detecting areas where
objects form groups with similar properties
In high dimensional data, however, all objects appear to be sparse and
dissimilar in many ways, which prevents common data organization
strategies from being efficient.
Having higher dimension is not necessarily a good thing for machine
learning algorithms.
The same can hold for very few dimensions as well.
Generally:
Results improve as the number of dimensions increases,
then reach their best at some “ideal #dimensions”,
and then start deteriorating as the dimensionality grows further.
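A small sketch of why high-dimensional data "looks sparse and dissimilar": for points drawn uniformly in a unit hypercube, the gap between the nearest and farthest pairwise distances shrinks as the dimension grows (the dimensions and sample size here are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # fixed number of samples
for d in (1, 2, 10, 100, 500):            # increasing dimensionality
    X = rng.uniform(size=(n, d))
    # pairwise Euclidean distances between all distinct points
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(n, k=1)]
    print(d, round(dist.min() / dist.max(), 3))   # ratio creeps towards 1
```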
Training, Testing, & Validation Sets
(Supervised Learning)
We “train” the algorithm on a certain data set called “Training Set”
The algorithm generalizes or adapts itself, based on the training dataset, to predict target labels of yet-to-be-seen data.
Once trained, the algorithm is evaluated on “Testing Set” for how well
it can predict the unseen data.
Every data set is assumed to contain data points produced by some underlying pattern or generating function, say F.
Most of the points can be generated nicely by F.
Some cannot! They are noise.
[Figure: a generating function curve passing through the well-behaved points, with a few noise points lying off it]
Overfitting
[Figure: prediction snapshots at two different points of training, in the Redness vs. Roughness plane: a just-rightly trained model vs. an over-trained model that predicts the noise too (overfitting)]
Classification logic: if a point touches the thick blue line, it is an apple, else it is not an apple.
[Figure: testing using the just-rightly trained model vs. testing using the over-trained model]
We want to stop training before overfitting happens
We need to evaluate how well the training algorithm is generalizing
(predicting) an unseen data set.
The training data set cannot be used here: it won't detect overfitting.
The test data set cannot be used either: it is saved for the final evaluation.
We use a “Validation set”: a third data set, different from both the training set and the test set.
◼ The process is called cross-validation.
Typical train:validation:test splits are
50:25:25 (if you have plenty of data)
60:20:20 (if you don’t have plenty of data)
Multi-fold Cross Validation
[Figure: the data is split into folds; several candidate models (Model 1, Model 2, …) are trained and validated on different fold combinations]
The model with the lowest validation error is selected, OR the average error across folds is considered.
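A minimal sketch of a train/validation/test split and a simple k-fold index generator (NumPy only; the 60:20:20 default and the function names are our own illustration, and model fitting itself is omitted):

```python
import numpy as np

def split(X, y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the data and cut it into train / validation / test parts."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train = int(fractions[0] * len(X))
    n_val = int(fractions[1] * len(X))
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold
```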
Testing Output Results
Confusion Matrix: a k × k matrix, where k = #target labels.
The (i, j)-th entry is the number of samples whose true label is i but which the algorithm labelled as j.
In the example matrix on the slide, 𝐶3 has the most misclassifications, i.e., two.
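A tiny sketch of how such a matrix is filled in (the label lists are made up):

```python
import numpy as np

true_labels = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]   # hypothetical true labels
predicted   = [0, 1, 1, 1, 2, 0, 2, 1, 0, 2]   # hypothetical model outputs

k = 3
cm = np.zeros((k, k), dtype=int)
for t, p in zip(true_labels, predicted):
    cm[t, p] += 1            # row = true label i, column = predicted label j
print(cm)                     # diagonal entries are correct classifications
```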
Accuracy
Assume a binary classification (Class I and Class II)
Consider results from Class I perspective:
True Positive (TP): An observation correctly classified into Class I
False Positive (FP): An observation incorrectly classified into Class I
True Negative (TN): An observation correctly classified into the other class
(i.e., Class II)
False Negative (FN): An observation incorrectly classified into the other class
(i.e., Class II)
Confusion Matrix
Precision and Recall alone are not sufficient (why?)
Precision ∝ 1 / Recall …to an extent
F1 score = harmonic mean of precision and recall
F-measure: 𝐹𝛽 = (1 + 𝛽²) 𝑃𝑅 / (𝛽² 𝑃 + 𝑅)
𝛽 > 1 favors recall
𝛽 < 1 favors precision
𝛽 = 1 gives equal importance to precision and recall
𝐹𝛽=1 or 𝐹1 = 2𝑃𝑅 / (𝑃 + 𝑅)
Harmonic mean is more conservative than arithmetic mean
◼ Closer to the smaller of the two numbers
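A short sketch computing these measures from binary-classification counts (the example counts are made up):

```python
# Hypothetical counts for a binary classifier (Class I = "positive").
TP, FP, TN, FN = 45, 5, 40, 10

accuracy  = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)            # of everything called positive, how much is right
recall    = TP / (TP + FN)            # of all true positives, how much was found
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean
print(accuracy, precision, recall, f1)
```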
Evaluating more than two classes
Accuracy is defined as:
Accuracy = #Correct Predictions / #Total Predictions = (#TP + #TN) / (#TP + #FP + #TN + #FN)
(Incorrectly given in the textbook, Eq. 2.2.)
Sensitivity (true positive rate) = #Correct Positive Examples / #Total Positive Examples
Specificity (true negative rate) = #Correct Negative Examples / #Total Negative Examples
Precision = #Correct Positive Examples / #Total Classified as Positive
Recall = #Correct Positive Examples / #Total Positive Examples
Incorrectly given in the textbook: the text-based definitions of Precision and Recall are swapped.
Receiver Operator Characteristic (ROC) Curve
[Figure: ROC space, with the ideal classifier at the top-left corner, chance-level classifiers (a 50-50 ratio) along the diagonal, and an anti-classifier below it]
True positive rate = #TP / #positives
False positive rate = #FP / #negatives
Area under ROC curve
[Figure: ROC curves from (0,0) to (1,1), with points obtained from multi-fold cross validation of a classifier]
A larger area under the ROC curve means a better classifier.
The further away you are from the diagonal, the better it is.
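A small sketch of how the TPR/FPR points and the area under the curve can be computed from classifier scores (NumPy only; the example scores and labels are made up, and the area is accumulated with a simple trapezoid rule):

```python
import numpy as np

# Hypothetical classifier scores (higher = more "positive") and true labels.
scores = np.array([0.95, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])

tpr, fpr = [0.0], [0.0]
for t in np.sort(scores)[::-1]:                  # sweep the decision threshold
    pred = scores >= t
    tpr.append((pred & (labels == 1)).sum() / (labels == 1).sum())
    fpr.append((pred & (labels == 0)).sum() / (labels == 0).sum())

auc = 0.0
for i in range(1, len(fpr)):                      # trapezoid rule over the curve
    auc += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
print(auc)
```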
Matthews Correlation Coefficient
The accuracy measures defined so far work best when the number of positive samples is about the same as the number of negative samples in the dataset.
Otherwise, a more appropriate measure is the Matthews Correlation Coefficient:
MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
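A quick sketch of that formula, reusing the hypothetical counts from the earlier sketch:

```python
import math

TP, FP, TN, FN = 45, 5, 40, 10   # hypothetical counts, as above
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)
print(mcc)
```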
Case of Multi-Class Classification
The above measures can be calculated individually for each class X, treating X as the positive class and clubbing all the other classes together as the negative class.
Maximum a Posteriori (MAP) Hypothesis
What is the most likely class given the training data?
Let 𝑋𝑗 = {𝑋𝑗1 , 𝑋𝑗2 , … , 𝑋𝑗𝑛 } be an input vector. Then, we are interested in
class 𝐶𝑥 such that
𝑃(𝐶𝑥 | 𝑋𝑗) > 𝑃(𝐶𝑦 | 𝑋𝑗) ∀ 𝑦 ≠ 𝑥
But how do we estimate 𝑃(𝐶𝑥 | 𝑋𝑗)?
Say, 𝑃(𝐶1 | 𝑥 = 5) = ? (From the histogram on the slide: 6/10.)
In general,
𝑃(𝐶𝑥 | 𝑋𝑗) = #Examples in bin 𝑋𝑗 of class 𝐶𝑥 / #Examples in bin 𝑋𝑗
#samples = 10000, #dimensions = 1, #bins/dimension = 14, #bins = 14
𝐸[#samples per bin] = 10000 / 14 ≈ 714
#samples = 10000, #dimensions = 2, #bins/dimension = 14, #bins = 14² = 196
𝐸[#samples per bin] = 10000 / 14² ≈ 51
#samples = 10000, #dimensions = 3, #bins/dimension = 14, #bins = 14³ = 2744
𝐸[#samples per bin] = 10000 / 14³ ≈ 3.6
…
#samples = 10000, #dimensions = 5, #bins/dimension = 14, #bins = 14⁵ = 537824
𝐸[#samples per bin] = 10000 / 14⁵ ≈ 0.02
For, 𝑋𝑗 = {𝑋𝑗1 , 𝑋𝑗2 , … , 𝑋𝑗𝑛 }, as n (#dimensions) increases, #samples in
each bin of histogram shrinks
Becomes almost 1 for each bin (curse of dimensionality)
Severe lack of statistical confidence
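The same shrinkage, computed directly with the slides' setup of 10,000 samples and 14 bins per dimension:

```python
n_samples, bins_per_dim = 10_000, 14
for n_dims in (1, 2, 3, 5, 10):
    expected = n_samples / bins_per_dim ** n_dims   # expected samples per bin
    print(n_dims, f"{expected:.4g}")
```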
Using Bayes’ Rule
𝑃(𝐶𝑖 | 𝑋𝑘) = 𝑃(𝑋𝑘 | 𝐶𝑖) 𝑃(𝐶𝑖) / 𝑃(𝑋𝑘)
Where:
𝑃(𝐶𝑖) can be estimated easily.
How to estimate 𝑃(𝑋𝑘 | 𝐶𝑖)?
Say, 𝑃(𝑥 = 5 | 𝐶2) = ? (From the histogram on the slide: 4/50.)
For X = 𝑋𝑗,
𝑃(𝑋𝑗 | 𝐶𝑖) = #Examples in bin 𝑋𝑗 of class 𝐶𝑖 / Total #examples of class 𝐶𝑖
For, 𝑋𝑗 = {𝑋𝑗1 , 𝑋𝑗2 , … , 𝑋𝑗𝑛 }, as n (#dimensions) increases, #samples in
each bin of histogram shrinks
Becomes almost 1 for each bin (curse of dimensionality)
Simplifying assumption: “With respect to classification, elements of the feature vector are mutually conditionally independent.”
𝑃(𝑋𝑗 | 𝐶𝑖) = 𝑃(𝑋𝑗1 | 𝐶𝑖) × 𝑃(𝑋𝑗2 | 𝐶𝑖) × ⋯ × 𝑃(𝑋𝑗𝑛 | 𝐶𝑖)
Hence, the rule for the naïve Bayes' classifier is to select the class 𝐶𝑖 for which the following is maximum:
𝑃(𝐶𝑖) ∏𝑘 𝑃(𝑋𝑗𝑘 | 𝐶𝑖)
List of activities you have been doing for the last few days (table on the slide).
Suppose a deadline is looming, but it is not urgent. Further, there is no ongoing party and
you are feeling lazy. Based on naïve Bayes’ classifier, what will you do?
Input 𝑋𝑗 = {Deadline = Near, Party = No, Lazy = Yes}
We pick the class 𝐶𝑖 that maximizes 𝑃(𝐶𝑖) ∏𝑘 𝑃(𝑋𝑗𝑘 | 𝐶𝑖).
Let us consider the class “Party”:
𝑃(Party) = 5/10
𝑃(Deadline = Near | Party) = 2/5
𝑃(Party = No | Party) = 0/5
𝑃(Lazy = Yes | Party) = 3/5
𝑃(Party) × 𝑃(Deadline = Near | Party) × 𝑃(Party = No | Party) × 𝑃(Lazy = Yes | Party) = (5/10) × (2/5) × 0 × (3/5) = 0
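A hedged sketch of a categorical naïve Bayes' classifier built from counts. The tiny activity log below is invented for illustration; only the query {Deadline = Near, Party = No, Lazy = Yes} is taken from the slide, so the printed answer need not match the slide's table:

```python
from collections import Counter, defaultdict

# Hypothetical activity log: (Deadline, Party, Lazy) -> Activity.
data = [
    (("Urgent", "Yes", "Yes"), "Party"), (("Urgent", "No", "Yes"), "Study"),
    (("Near",   "Yes", "Yes"), "Party"), (("None",  "Yes", "No"),  "Party"),
    (("None",   "No",  "Yes"), "Pub"),   (("None",  "No",  "No"),  "Watch TV"),
    (("Near",   "No",  "No"),  "Study"), (("Near",  "No",  "Yes"), "Watch TV"),
    (("Near",   "Yes", "No"),  "Party"), (("Urgent", "No", "No"),  "Study"),
]

class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)      # (class, feature index) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(label, i)][value] += 1

def posterior_score(features, label):
    """P(C) * prod_k P(x_k | C), estimated from the counts above."""
    score = class_counts[label] / len(data)
    for i, value in enumerate(features):
        score *= feature_counts[(label, i)][value] / class_counts[label]
    return score

query = ("Near", "No", "Yes")              # the slide's input X_j
print(max(class_counts, key=lambda c: posterior_score(query, c)))
```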
Basic Statistics
Average measures: Mean, Median, Mode, Variance
Mean: arithmetic average.
Median: the middle value (sort and find)
Mode: most frequent value
Variance: measures how spread out values are
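A quick illustration with Python's standard library (the sample values are made up):

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up sample
print(statistics.mean(x))                # arithmetic average      -> 5
print(statistics.median(x))              # middle value            -> 4.5
print(statistics.mode(x))                # most frequent value     -> 4
print(statistics.pvariance(x))           # population variance (divides by N) -> 4
```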
Variance
var(𝑥𝑖) = 𝜎²(𝑥𝑖) = 𝐸[(𝑥𝑖 − 𝜇)²] = (1/𝑁) ∑ (𝑥𝑖 − 𝜇)², summing over i = 1, …, N
Where:
𝑥𝑖 is a random variable sampled N times (𝑥1, 𝑥2, …, 𝑥𝑁)
𝜇 denotes the mean of 𝑥𝑖
𝜎 is known as the standard deviation (the square root of the variance)
𝐸[(𝑥𝑖 − 𝜇)²] is the expectation of the squared deviation of the random variable from its mean
Covariance: measures dependency of two (random) variables
cov(𝑥𝑖, 𝑦𝑖) = 𝐸[(𝑥𝑖 − 𝜇)(𝑦𝑖 − 𝜈)]
Where 𝜇 is the mean of 𝑥𝑖 and 𝜈 is the mean of 𝑦𝑖
Zero value: the two (𝑥𝑖 and 𝑦𝑖) are unrelated
Positive value: both increase/decrease at the same time
Negative value: when one increases, the other decreases, and vice versa
cov(𝑥𝑖, 𝑥𝑖) = 𝜎²(𝑥𝑖) = var(𝑥𝑖)
For multiple variables, the covariance matrix contains the covariance between all pairs of variables:
Σ𝑖𝑗 = cov(𝑥𝑖, 𝑥𝑗) = 𝐸[(𝑥𝑖 − 𝜇𝑖)(𝑥𝑗 − 𝜇𝑗)]
Where 𝑥𝑖 is a column vector describing the elements of the 𝑖th variable, and 𝜇𝑖 is their mean.
It is a square and symmetric matrix.
Matrix form: Σ = 𝐸[(𝐱 − 𝝁)(𝐱 − 𝝁)ᵀ]
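A small check with NumPy (random data; note that np.cov expects one row per variable by default, and bias=True divides by N so the diagonal matches np.var):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)   # y depends on x -> positive covariance

data = np.vstack([x, y])                       # one row per variable
print(np.cov(data, bias=True))                 # 2 x 2, square and symmetric
print(np.var(x))                               # matches the (0, 0) entry
```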
Gaussian or Normal Distribution
It is a probability distribution defined as (in one dimension):
𝑝(𝑥) = (1 / √(2𝜋𝜎²)) exp(−(𝑥 − 𝜇)² / (2𝜎²))
Where 𝜇 is the mean and 𝜎 the standard deviation.
[Figure: the bell curve]
For higher dimensions, it is defined as
𝑝(𝐱) = (1 / ((2𝜋)^(𝑛/2) |Σ|^(1/2))) exp(−½ (𝐱 − 𝝁)ᵀ Σ⁻¹ (𝐱 − 𝝁))
Where Σ is the 𝑛 × 𝑛 covariance matrix.
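A brief sketch evaluating both densities with NumPy (the example mean and covariance values are arbitrary):

```python
import numpy as np

def gaussian_1d(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def gaussian_nd(x, mu, cov):
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

print(gaussian_1d(0.0, mu=0.0, sigma=1.0))            # peak of the standard bell curve
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5], [0.5, 2.0]])              # arbitrary 2 x 2 covariance
print(gaussian_nd(np.array([0.2, -0.1]), mu, cov))
```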