0% found this document useful (0 votes)

45 views84 pages

Introduction à l'apprentissage automatique

L'apprentissage automatique est une méthode d'analyse des données qui utilise des algorithmes pour construire des modèles analytiques à partir de données, permettant aux ordinateurs de découvrir des informations sans programmation explicite. Il se divise en apprentissage supervisé, où les algorithmes sont formés sur des exemples étiquetés, et apprentissage non supervisé, qui traite des données sans étiquettes. Les performances des modèles sont évaluées à l'aide de métriques telles que la précision, le rappel, la précision et le score F1.

Uploaded by

sarabenamar27

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

45 views84 pages

Introduction à l'apprentissage automatique

Uploaded by

sarabenamar27

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Qu'est-ce que

l'apprentissage
automatique ?
Machine Learning

● Avant de nous plonger dans les réseaux

neuronaux, Tensorflow, Keras API, etc., il
est bon de comprendre quelques idées
fondamentales concernant l'apprentissage
automatique..
● Dans cette section, nous allons aborder
quelques théories et concepts importants
concernant l'apprentissage automatique.
Machine Learning

● Aperçu de la section :
 Qu'est-ce que l'apprentissage automatique ?
 Différence entre apprentissage supervisé et
apprentissage non supervisé
 Processus d'apprentissage supervisé
 Évaluation des performances
 Sur-adaptation
Qu'est-ce que l'apprentissage automatique ?

● L'apprentissage automatique est une méthode

d'analyse des données qui automatise la
construction de modèles analytiques.
● Grâce à des algorithmes qui apprennent de
manière itérative à partir de données,
l'apprentissage automatique permet aux
ordinateurs de trouver des informations cachées
sans être explicitement programmés pour les
chercher.
A quoi sert-il ?
 Moteurs de recommandation
 Détection des fraudes.
 Segmentation des clients
 Résultats de recherche sur le Web.
 Analyse des sentiments dans le texte

 Annonces en temps réel sur les pages Web  Taux de désabonnement

 Reconnaissance des formes et des

 Evaluation du crédit.
images.

 Prévision des pannes d'équipement.  Filtrage du courrier électronique

indésirable.
 Nouveaux modèles de tarification.

 Détection des intrusions dans les réseaux.

Machine Learning

● Il existe différents types d'apprentissage

automatique sur lesquels nous nous
concentrerons au cours des prochaines
sections du cours :
• Apprentissage supervisé
• Apprentissage non supervisé
Machine Learning

● Let’s begin by learning about one of the most

common machine learning tasks- Supervised
Learning!
Supervised Learning
Supervised Learning

● Supervised learning algorithms are trained

using labeled examples, such as an input
where the desired output is known.
● For example, a segment of text could have a
category label, such as:
○ Spam vs. Legitimate Email
○ Positive vs. Negative Movie Review
Supervised Learning

● The network receives a set of inputs along

with the corresponding correct outputs, and
the algorithm learns by comparing its actual
output with correct outputs to find errors.
● It then modifies the model accordingly.
Supervised Learning

● Supervised learning is commonly used in

applications where historical data predicts
likely future events.
Machine Learning Process

Test
Data

Model
Data Data Model Model
Training &
Acquisition Cleaning Testing Deployment
Building
Overfitting and
Underfitting
Machine Learning

● Overfitting
○ The model fits too much to the noise from
the data.
○ This often results in low error on training
sets but high error on test/validation sets.
Machine Learning

Good Model

X
Machine Learning

● Overfitting

X
Machine Learning

● Underfitting
○ Model does not capture the underlying
trend of the data and does not fit the data
well enough.
○ Low variance but high bias.
○ Underfitting is often a result of an
excessively simple model.
Machine Learning

Underfitting

X
Machine Learning

● This data was easy to visualize, but how can

we see underfitting and overfitting when
dealing with multi dimensional data sets?
● First let’s imagine we trained a model and then
measured its error over training time.
Machine Learning

● Good Model

Error

Training
Time
Machine Learning

● Good Model

Error

Epochs
Machine Learning

● Bad Model

Error

Epochs
Machine Learning

● When thinking about overfitting and

underfitting we want to keep in mind the
relationship of model performance on the
training set versus the test/validation set.
Machine Learning

● Let’s imagine we split our data into a training

set and a test set
Machine Learning

● We first see performance on the training set

Error

Epochs
Machine Learning

● Next we check performance on the test set

Error

Epochs
Machine Learning

● Ideally the model would perform well on both,

with similar behavior.

Error

Epochs
Machine Learning

● But what happens if we overfit on the training

data? That means we would perform poorly on
new test data!

Error

Epochs
Machine Learning

● But what happens if we overfit on the training

data? That means we would perform poorly on
new test data!

Error

Epochs
Machine Learning

● This is a good indication of training too much

on the training data, you should look for the
point to cut off training time!

Error

Epochs
Machine Learning

● We’ll check on this idea again when we

actually begin creating models!
● For now just be aware of this possible issue!
Evaluating
Performance
CLASSIFICATION
Model Evaluation

● The key classification metrics we need to

understand are:
○ Accuracy
○ Recall
○ Precision
○ F1-Score
Model Evaluation

● Typically in any classification task your

model can only achieve two results:
○ Either your model was correct in its
prediction.
○ Or your model was incorrect in its
prediction.
Model Evaluation

● Fortunately incorrect vs correct expands to

situations where you have multiple classes.
● For the purposes of explaining the metrics,
let’s imagine a binary classification
situation, where we only have two
available classes.
Model Evaluation

● In our example, we will attempt to predict

if an image is a dog or a cat.
● Since this is supervised learning, we will
first fit/train a model on training data,
then test the model on testing data.
● Once we have the model’s predictions
from the X_test data, we compare it to the
true y values (the correct labels).
Model Evaluation

TRAINED
MODEL
Model Evaluation

TRAINED
Test Image
from X_test
MODEL
Model Evaluation

TRAINED
Test Image
from X_test
MODEL

DOG
Correct Label
from y_test
Model Evaluation

TRAINED DOG
Test Image
from X_test
MODEL
Prediction on
Test Image
DOG
Correct Label
from y_test
Model Evaluation

TRAINED DOG
Test Image
from X_test
MODEL
Prediction on
Test Image
DOG
Correct Label DOG == DOG ?
from y_test
Compare Prediction to Correct Label
Model Evaluation

TRAINED CAT
Test Image
from X_test
MODEL
Prediction on
Test Image
DOG
Correct Label DOG == CAT ?
from y_test
Compare Prediction to Correct Label
Model Evaluation

● We repeat this process for all the images in

our X test data.
● At the end we will have a count of correct
matches and a count of incorrect matches.
● The key realization we need to make, is
that in the real world, not all incorrect or
correct matches hold equal value!
Model Evaluation

● Also in the real world, a single metric won’t

tell the complete story!
● To understand all of this, let’s bring back
the 4 metrics we mentioned and see how
they are calculated.
● We could organize our predicted values
compared to the real values in a confusion
matrix.
Model Evaluation

● Accuracy
○ Accuracy in classification problems is
the number of correct predictions
made by the model divided by the total
number of predictions.
Model Evaluation

● Accuracy
○ For example, if the X_test set was 100
images and our model correctly
predicted 80 images, then we have
80/100.
○ 0.8 or 80% accuracy.
Model Evaluation

● Accuracy
○ Accuracy is useful when target classes
are well balanced
○ In our example, we would have roughly
the same amount of cat images as we
have dog images.
Model Evaluation

● Accuracy
○ Accuracy is not a good choice with
unbalanced classes!
○ Imagine we had 99 images of dogs and 1
image of a cat.
○ If our model was simply a line that
always predicted dog we would get 99%
accuracy!
Model Evaluation

● Accuracy
○ Imagine we had 99 images of dogs and 1
image of a cat.
○ If our model was simply a line that
always predicted dog we would get 99%
accuracy!
○ In this situation we’ll want to
understand recall and precision
Model Evaluation

● Recall
○ Ability of a model to find all the relevant
cases within a dataset.
○ The precise definition of recall is the
number of true positives divided by the
number of true positives plus the
number of false negatives.
Model Evaluation

● Precision
○ Ability of a classification model to
identify only the relevant data points.
○ Precision is defined as the number of
true positives divided by the number of
true positives plus the number of false
positives.
Model Evaluation

● Recall and Precision

○ Often you have a trade-off between
Recall and Precision.
○ While recall expresses the ability to find
all relevant instances in a dataset,
precision expresses the proportion of
the data points our model says was
relevant actually were relevant.
Model Evaluation

● F1-Score
○ In cases where we want to find an
optimal blend of precision and recall we
can combine the two metrics using
what is called the F1 score.
Model Evaluation

● F1-Score
○ The F1 score is the harmonic mean of
precision and recall taking both metrics
into account in the following equation:
Model Evaluation

● F1-Score
○ We use the harmonic mean instead of a
simple average because it punishes
extreme values.
○ A classifier with a precision of 1.0 and a
recall of 0.0 has a simple average of 0.5
but an F1 score of 0.
Model Evaluation

● We can also view all correctly classified

versus incorrectly classified images in the
form of a confusion matrix.
Confusion Matrix

Machine Math &

Learning
Statistics
DS

Software Research

Domain
Knowledge
Model Evaluation

● The main point to remember with the

Machine Math &
confusion matrix andLearning
the various
Statistics
calculated metrics is that DS
they are all
fundamentally ways of comparing
Software Research the
predicted values versus the true values.
● What constitutes “good”Domain
metrics, will
really depend on the specific situation!
Knowledge
Model Evaluation

● Let’s think back on this idea of:

Machine Math &
○ What is a good enough Learning accuracy?
Statistics
● This all depends on theDS context of the
situation! Software Research

● Did you create a model to predict

presence of a disease? Domain
● Is the disease presence
Knowledge well balanced in
the general population? (Probably not!)
Model Evaluation

● Often models are used as quick

Machine Math &
diagnostic tests to have Learningbefore having a
Statistics
more invasive test (e.g.DS getting urine test
before getting a biopsy)
Software Research

● We also need to consider what is at

stake! Domain
Knowledge
Model Evaluation

● Often we have a precision/recall trade

Machine Math &
off, We need to decide if the
Learning model will
Statistics
should focus on fixing DSFalse Positives vs.
False Negatives. Software Research

● In disease diagnosis, it is probably better

to go in the directionDomain
of False positives,
so we make sure we correctly classify as
Knowledge
many cases of disease as possible!
Model Evaluation

● All of this is to say, machine learning is

Machine Math &
not performed in a “vacuum”,
Learning but instead
Statistics
a collaborative processDS where we should
consult with expertsSoftware
in the domain (e.g.
Research

medical doctors)
Domain
Knowledge
Evaluating
Performance
REGRESSION
Evaluating Regression

● Let’s take a moment now to discuss

evaluating Regression Models
● Regression is a task when a model
attempts to predict continuous values
(unlike categorical values, which is
classification)
Evaluating Regression

● You may have heard of some evaluation

metrics like accuracy or recall.
● These sort of metrics aren’t useful for
regression problems, we need metrics
designed for continuous values!
Evaluating Regression

● For example, attempting to predict the

price of a house given its features is a
regression task.
● Attempting to predict the country a
house is in given its features would be a
classification task.
Evaluating Regression

● Let’s discuss some of the most common

evaluation metrics for regression:
○ Mean Absolute Error
○ Mean Squared Error
○ Root Mean Square Error
Evaluating Regression

● Mean Absolute Error (MAE)

○ This is the mean of the absolute
value of errors.
○ Easy to understand
Evaluating Regression

● MAE won’t punish large errors however.

Evaluating Regression

● MAE won’t punish large errors however.

Evaluating Regression

● We want our error metrics to account

for these!
Evaluating Regression

● Mean Squared Error (MSE)

○ This is the mean of the squared
errors.
○ Larger errors are noted more than
with MAE, making MSE more
popular.
Evaluating Regression

● Root Mean Square Error (RMSE)

○ This is the root of the mean of the
squared errors.
○ Most popular (has same units as y)
Machine Learning

● Most common question from students:

○ “Is this value of RMSE good?”
● Context is everything!
● A RMSE of $10 is fantastic for predicting
the price of a house, but horrible for
predicting the price of a candy bar!
Machine Learning

● Compare your error metric to the average

value of the label in your data set to try to
get an intuition of its overall performance.
● Domain knowledge also plays an
important role here!
Machine Learning

● Context of importance is also necessary to

consider.
● We may create a model to predict how
much medication to give, in which case
small fluctuations in RMSE may actually be
very significant.
Unsupervised Learning
Machine Learning

● We’ve covered supervised learning, where

the label was known due to historical
labeled data.
● But what happens when we don’t have
historical labels?
Machine Learning

● There are certain tasks that fall under

unsupervised learning:
○ Clustering
○ Anomaly Detection
○ Dimensionality Reduction
Machine Learning

● Clustering
○ Grouping together unlabeled data
points into categories/clusters
○ Data points are assigned to a cluster
based on similarity
Machine Learning

● Anomaly Detection
○ Attempts to detect outliers in a
dataset
○ For example, fraudulent transactions
on a credit card.
Machine Learning

● Dimensionality Reduction
○ Data processing techniques that
reduces the number of features in a
data set, either for compression, or to
better understand underlying trends
within a data set.
Machine Learning

● Unsupervised Learning
○ It’s important to note, these are
situations where we don’t have the
correct answer for historical data!
○ Which means evaluation is much
harder and more nuanced!
Unsupervised Process

Test
Data

Model
Data Data Training & Transformation Model
Acquisition Cleaning Building Deployment

Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
42 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
109 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
109 pages
04 Machine Learning Overview
No ratings yet
04 Machine Learning Overview
109 pages
Machine - Learning - Unit - 1
No ratings yet
Machine - Learning - Unit - 1
70 pages
Classification
No ratings yet
Classification
22 pages
Ai Unit 5
No ratings yet
Ai Unit 5
13 pages
Machine Learning Course Syllabus Overview
No ratings yet
Machine Learning Course Syllabus Overview
118 pages
Chapter 7 - LAST
No ratings yet
Chapter 7 - LAST
29 pages
Challenges in Machine Learning Explained
No ratings yet
Challenges in Machine Learning Explained
32 pages
Supervised Learning: Classification Basics
No ratings yet
Supervised Learning: Classification Basics
63 pages
CH 6
No ratings yet
CH 6
24 pages
Machine Learning Classification Guide
No ratings yet
Machine Learning Classification Guide
28 pages
AI Evaluation
No ratings yet
AI Evaluation
18 pages
NLP Classifier Evaluation Metrics Guide
No ratings yet
NLP Classifier Evaluation Metrics Guide
146 pages
Types of Machine Learning Models Explained
No ratings yet
Types of Machine Learning Models Explained
48 pages
Machine Learning Challenges and Solutions
No ratings yet
Machine Learning Challenges and Solutions
32 pages
Generative vs Discriminative Classifiers
No ratings yet
Generative vs Discriminative Classifiers
53 pages
Supervised Machine Learning Explained
No ratings yet
Supervised Machine Learning Explained
122 pages
Unit 3
No ratings yet
Unit 3
27 pages
Machine Learning Basics and kNN Guide
No ratings yet
Machine Learning Basics and kNN Guide
60 pages
Lecture 20 - Evaluation Metrics
No ratings yet
Lecture 20 - Evaluation Metrics
27 pages
Binary Classifier Training and Evaluation
No ratings yet
Binary Classifier Training and Evaluation
151 pages
ML Unit 3
No ratings yet
ML Unit 3
127 pages
Machine Learning Classification Models
No ratings yet
Machine Learning Classification Models
24 pages
ML Challenges and Metrics
No ratings yet
ML Challenges and Metrics
19 pages
Evaluating Data Mining Model Performance
No ratings yet
Evaluating Data Mining Model Performance
30 pages
Lecture 2.2 Example Data Preparation Feature Engineering
No ratings yet
Lecture 2.2 Example Data Preparation Feature Engineering
25 pages
Supervised Learning Explained: Key Concepts
No ratings yet
Supervised Learning Explained: Key Concepts
38 pages
ML Unit IV
No ratings yet
ML Unit IV
70 pages
Chapter 2 Supervised Learning - p1-2
No ratings yet
Chapter 2 Supervised Learning - p1-2
45 pages
Machine Learning Note
No ratings yet
Machine Learning Note
40 pages
FML - KNN
No ratings yet
FML - KNN
64 pages
Machine Learning Concepts and Techniques
No ratings yet
Machine Learning Concepts and Techniques
68 pages
Session 1 Evaluation Model
No ratings yet
Session 1 Evaluation Model
58 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
134 pages
2-Training and Testing Models, Evaluation Metrics-01-07-2023
No ratings yet
2-Training and Testing Models, Evaluation Metrics-01-07-2023
23 pages
GR 10 - Final Evaluation
No ratings yet
GR 10 - Final Evaluation
45 pages
3ML.02.MainConcepts Evaluation
No ratings yet
3ML.02.MainConcepts Evaluation
35 pages
Machine Learning Model Building Steps
No ratings yet
Machine Learning Model Building Steps
35 pages
Module 2 - ML
No ratings yet
Module 2 - ML
53 pages
CH-5 ML
No ratings yet
CH-5 ML
36 pages
Evaluation of Predictive Models Final
No ratings yet
Evaluation of Predictive Models Final
6 pages
How to Evaluate Machine Learning Models
No ratings yet
How to Evaluate Machine Learning Models
14 pages
Lec10 Intro ML
No ratings yet
Lec10 Intro ML
93 pages
Modelling and Evaluation
No ratings yet
Modelling and Evaluation
36 pages
Machine Learning Performance Evaluation
No ratings yet
Machine Learning Performance Evaluation
68 pages
05 - Machine Learning
No ratings yet
05 - Machine Learning
31 pages
Unit3 Evaluating Models
No ratings yet
Unit3 Evaluating Models
10 pages
Data Mining Classification and Prediction
No ratings yet
Data Mining Classification and Prediction
17 pages
Unit-1 ML
No ratings yet
Unit-1 ML
19 pages
6.data Mining - Classification
No ratings yet
6.data Mining - Classification
37 pages
Chapter 01 Introduction To Machine Learning
No ratings yet
Chapter 01 Introduction To Machine Learning
59 pages
2.8 MNIST + Classification
No ratings yet
2.8 MNIST + Classification
24 pages
Supervised Machine Learning
No ratings yet
Supervised Machine Learning
25 pages
Classification vs Regression in ML
No ratings yet
Classification vs Regression in ML
15 pages
DTS 101 Lecture 2
No ratings yet
DTS 101 Lecture 2
30 pages
Unit Iii
No ratings yet
Unit Iii
67 pages
Machine Learning Basics: Supervised Learning
No ratings yet
Machine Learning Basics: Supervised Learning
6 pages
1 Python Intro To Linear Regression LT
No ratings yet
1 Python Intro To Linear Regression LT
13 pages
K Nearest Neighbors and Distance Metrics
No ratings yet
K Nearest Neighbors and Distance Metrics
13 pages
Region Growing A New Approach
No ratings yet
Region Growing A New Approach
23 pages
Is Naive Bayes A Good Classifier For Document Clas
No ratings yet
Is Naive Bayes A Good Classifier For Document Clas
11 pages
New One-to-One Data Encryption Method
No ratings yet
New One-to-One Data Encryption Method
8 pages
F-Pure Threshold of Schubert Cycles
No ratings yet
F-Pure Threshold of Schubert Cycles
12 pages
Image Processing and Machine Vision
No ratings yet
Image Processing and Machine Vision
10 pages
Data Analytics III I Good Notes
No ratings yet
Data Analytics III I Good Notes
86 pages
TCS Literature Review Extended
No ratings yet
TCS Literature Review Extended
5 pages
Research Paper Documents
No ratings yet
Research Paper Documents
4 pages
AI Overview: Types, Evolution, and Agents
No ratings yet
AI Overview: Types, Evolution, and Agents
12 pages
Fits and Risks: Chatgpt For Tourism: Applications, Bene
No ratings yet
Fits and Risks: Chatgpt For Tourism: Applications, Bene
14 pages
Swimmer Pose Estimation With SAM and YOLOv8
No ratings yet
Swimmer Pose Estimation With SAM and YOLOv8
25 pages
Question and Answer For Unit 1,2,3,8
No ratings yet
Question and Answer For Unit 1,2,3,8
35 pages
Transparency in Assistive Robotics
No ratings yet
Transparency in Assistive Robotics
12 pages
I - The Ultimate Ai Book For Product Managers
No ratings yet
I - The Ultimate Ai Book For Product Managers
76 pages
Experiment No: 8 TITTLE: Write A Program To Solve 8 Puzzle Problems. Exercise
No ratings yet
Experiment No: 8 TITTLE: Write A Program To Solve 8 Puzzle Problems. Exercise
3 pages
Unit-2 MLT
No ratings yet
Unit-2 MLT
84 pages
23 Cyber Security and Data Privacy Law in Pakistan Protecting Information and Privacy in The Digital Age
No ratings yet
23 Cyber Security and Data Privacy Law in Pakistan Protecting Information and Privacy in The Digital Age
11 pages
AI and NationalCompetitiveness
No ratings yet
AI and NationalCompetitiveness
21 pages
Gender Dynamics in Ex-Machina
No ratings yet
Gender Dynamics in Ex-Machina
2 pages
Running Generative AI On Intel AI Laptops
No ratings yet
Running Generative AI On Intel AI Laptops
10 pages
AI Workshop Report - Compressed
No ratings yet
AI Workshop Report - Compressed
6 pages
Madras Institute of Technology Anna University: Assignment-1 Case Study of Entrepreneurs
No ratings yet
Madras Institute of Technology Anna University: Assignment-1 Case Study of Entrepreneurs
14 pages
Make Your First 1 Million Dollars With Ai
No ratings yet
Make Your First 1 Million Dollars With Ai
128 pages
Chapter 8 Argumentative Essays
No ratings yet
Chapter 8 Argumentative Essays
44 pages
AI Lab Manual: Semester VI Experiments
No ratings yet
AI Lab Manual: Semester VI Experiments
54 pages
Milestone 7 Task (Peer Feedback)
No ratings yet
Milestone 7 Task (Peer Feedback)
3 pages
Passive Trading
25% (4)
Passive Trading
20 pages
H&M Fashion AI and Forecasting and Risk
No ratings yet
H&M Fashion AI and Forecasting and Risk
6 pages
HR Trends: A Decade of Change
No ratings yet
HR Trends: A Decade of Change
10 pages
MSFT Fy20q4 10K
No ratings yet
MSFT Fy20q4 10K
126 pages
Societies 15 00006
No ratings yet
Societies 15 00006
28 pages
PhD Aspirations in AI at IIT Ropar
100% (1)
PhD Aspirations in AI at IIT Ropar
2 pages
AI Insights from John Oliver's Show
No ratings yet
AI Insights from John Oliver's Show
1 page
Ie Generative Ai Deloitte Consulting
No ratings yet
Ie Generative Ai Deloitte Consulting
3 pages