
08/11/2021

Artificial Intelligence for Natural Language Processing (NLP)

Dr. Eng. Wael Ouarda


Assistant Professor, CRNS, Higher Education Ministry, Tunisia

Centre de Recherche en Numérique de Sfax, Route de Tunis km 10, Sakiet Ezzit, 3021 Sfax, Tunisia


About me
■ Assistant Professor at the Digital Research Center of Sfax (CRNS), Tunisia
■ Head of the Brain4ICT team at the CRNS
■ Postdoctoral Researcher at the National School of Engineering of Sfax, University of Sfax:
● PRF Project 2017 – 2020: Multimodal Biometric Platform for Fighting Terrorism in Tunisia;
● PAQ Collabora Project 2019 – 2021: Identification of Radicalized Profiles of Young Tunisians on Social Networks;
● PRF Project 2019 – 2021: Artificial Intelligence for the identification of facial dysmorphism in Tunisian newborns;
● PRF Project 2020 – 2021: Dysmorphic face analysis for metabolic syndrome.
■ Trainer on Machine Learning and Deep Learning:
● University of Manouba (1);
● DGET (ISETN) (2);
● University of Sousse (3);
● University of Monastir (3);
● University of Sfax (2);
● Spring and Summer Schools (2).
■ Lead Auditor ISO 9001:2015;
■ Lead Project Implementer ISO 21500:2012;
■ Past Regional Coordinator of the Sfax Smart City Living Lab (SSCLL);
■ Past IEEE Tunisia Section General Secretary (2018 – 2020).




Brain4ICT’s Overview

Brain-like Architectures for Information and Communication Technology


Title: Brain4ICT: Tools & Applications for Smart City

Vision: Contribute to extending the CRNS into a leading center of excellence in AI


Research Topics:
■ Optimization (bio-inspired algorithms), Learning (ML & DL) and Reasoning (Fuzzy Logic);
■ Computer Vision;
■ Signal Processing;
■ Natural Language Processing;
■ Business Intelligence.


Outline
Part I – Machine Learning tools for NLP
1. Artificial Intelligence (AI): from perception to reasoning
2. How to design and use a Machine Learning (ML) for NLP?
3. Machine Learning Techniques: A brief Review & Comparison
4. Neural Network: Theory and Application
5. Naïve Bayes: Theory and Application
6. Support Vector Machines (SVM): Theory and Application
7. How to select the appropriate Machine Learning?
8. How to evaluate a Machine Learning Performance?
Part II – Natural Language Processing (NLP) tools
9. Machine Learning (ML) for NLP?
10. Libraries & Frameworks
11. Cleaning Process
12. Word Embedding
13. Features Selection & Features Transformation
14. NLP Applications: Clustering & Classification Tasks


Outline
Part III – Deep Learning tools for NLP
15. Convolutional Neural Network
16. Long Short Term Memory
17. CNN-LSTM for NLP
18. Transformers vs. BERT & Attention in NLP
Part IV – Chatbots
19. Natural Language Understanding
20. Natural Language Generation
21. Chatbot from Scratch
22. Chatbot with Frameworks


1. Artificial Intelligence (AI): from perception to reasoning

Each capability of natural intelligence has an artificial counterpart:

■ Perception (living beings) → Image Processing;
■ Optimization (living beings) → Bio-Inspired Optimization;
■ Learning (babies, animals, etc.) → Machine Learning;
■ Reasoning (humans) → Fuzzy Logic.


2. How to design and use a Machine Learning (ML)?

Training process: Database → Preprocessing (data cleaning) → Features (representation, selection, engineering) → Task (classification, data mining, sentiment analysis, topic modeling) → Model.

Testing process: an unknown sample X goes through the same preprocessing, the model scores it against each class, e.g. F(X | X = "I") = P1 and F(X | X = "II") = P2, and the predicted class is the one with max(P1, P2).
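To make this flow concrete, here is a minimal scikit-learn sketch of the full train/test cycle, assuming a toy sentiment task; the example texts, labels and pipeline choices are illustrative, not from the slides:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labelled data (illustrative)
train_texts = ["great movie", "awful film", "loved it", "terrible acting"]
train_labels = ["positive", "negative", "positive", "negative"]

# Training process: preprocessing + feature representation + classifier -> model
model = Pipeline([
    ("features", TfidfVectorizer(stop_words="english")),  # cleaning + representation
    ("classifier", MultinomialNB()),                      # probability-based learner
])
model.fit(train_texts, train_labels)

# Testing process: the unknown sample goes through the same preprocessing,
# the model returns one probability per class, and we keep the max
probs = model.predict_proba(["what a great film"])[0]
print(dict(zip(model.classes_, probs)))
print(model.predict(["what a great film"]))  # class with max probability
```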

2. How to design and use a Machine Learning (ML)?

Why preprocessing?

1. Text: removal of:
■ special characters;
■ stopwords;
plus spell checking, translation, etc.
2. Image/Video: removal of:
■ noise;
■ blur, etc.
3. Data mining:
■ Missing values (not-available data) → replace them by zero, max, min, average, median, etc. For example, a column [0, -1, 2, NaN, 3] becomes [0, -1, 2, 0, 3] when each NaN is replaced by zero;
■ Mixed values (columns of different types):
● categorical, object, string, etc. → encode the data;
● numerical → keep as-is.
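A minimal pandas sketch of these cleaning steps, assuming a toy table with a missing value and a categorical column (column names are illustrative):

```python
import pandas as pd

# Toy data: one numerical column with a missing value, one categorical column
df = pd.DataFrame({
    "feature_1": [0, -1, 2, None, 3],
    "category": ["A", "A", "B", "A", "B"],
})

# Missing values: replace NaN by zero (max, min, mean or median work the same way)
df["feature_1"] = df["feature_1"].fillna(0)
# e.g. df["feature_1"].fillna(df["feature_1"].mean()) for the average

# Mixed values: encode a categorical/object/string column into numbers
df["category"] = df["category"].astype("category").cat.codes

print(df)
```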


2. How to design and use a Machine Learning (ML)?

Why data representation? To transform an object into a numerical vector that will be used for learning.

1. Text: word embedding:
■ TF-IDF (statistical approach based on bags of words);
■ Word2Vec (learning approach based on a neural network and bags of words) (Google);
■ FastText (learning approach based on a neural network and bags of words) (Facebook).
2. Image/Video:
■ hand-crafted approaches: Gabor, wavelets, Local Binary Patterns;
■ Deep Learning approaches: CNN and neural-network autoencoders (AE).
3. Data mining (CSV files coming from databases):
■ feature selection based on correlation analysis;
■ feature transformation:
● non-linear transformation: autoencoder (Deep Learning);
● linear transformation: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA).
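A minimal sketch of the text case, assuming scikit-learn for TF-IDF and the gensim library for Word2Vec; the two-sentence corpus is illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# Statistical approach: TF-IDF turns each document into a sparse numerical vector
tfidf = TfidfVectorizer()
doc_vectors = tfidf.fit_transform(corpus)
print(doc_vectors.shape)  # (2 documents, vocabulary size)

# Learning-based approach: Word2Vec trains a small neural network
# and produces one dense vector per word
tokens = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokens, vector_size=50, min_count=1, epochs=10)
print(w2v.wv["cat"].shape)  # (50,)
```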

2. How to design and use a Machine Learning (ML)?

Why data mining? Consider a toy sentiment dataset:

X    Y    Z    Sentiment
1    1   -2    Negative
1    0   -1    Negative
-1  -1   -1    Negative
0    1    1    Positive
1   -1   -1    Negative
1    2    1    Positive
0    0    1    Negative
2   -1   -1    Positive
1    2    1    Negative
0    1    0    Positive
0    1    2    Positive

Three families of classifiers can learn such a mapping (a similarity-based sketch follows below):

1. Similarity based:
■ fast, but less robust;
■ used when the features are pertinent (like DNA).
2. Probability based:
■ suited to huge and categorical data;
■ assumption: independence between the variables.
3. Boundary-decision based:
■ Support Vector Machines (SVM):
● small data, numerical data;
● two cases: linear separability and non-linear separability (SVM with a kernel);
■ Neural Network (NN):
● huge data, numerical data;
● the best architecture for learning has to be chosen.
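As an illustration of the similarity-based family, a minimal 1-nearest-neighbour sketch on the toy table above, assuming Euclidean distance:

```python
import numpy as np

# Toy dataset from the table above: features (X, Y, Z) and sentiment labels
data = np.array([
    [1, 1, -2], [1, 0, -1], [-1, -1, -1], [0, 1, 1], [1, -1, -1],
    [1, 2, 1], [0, 0, 1], [2, -1, -1], [1, 2, 1], [0, 1, 0], [0, 1, 2],
])
labels = ["Negative", "Negative", "Negative", "Positive", "Negative",
          "Positive", "Negative", "Positive", "Negative", "Positive", "Positive"]

# Similarity-based classification: label a new sample like its nearest neighbour
sample = np.array([0, 1, 1])
distances = np.linalg.norm(data - sample, axis=1)  # Euclidean distance
print(labels[int(np.argmin(distances))])           # -> "Positive"
```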


3. Machine Learning Techniques: A Brief Review & Comparison

By paradigm, Machine Learning divides into Supervised Learning, Unsupervised Learning and Reinforcement Learning. By decision principle, the techniques compared here are:

■ Similarity based: Euclidean distance, cosine distance;
■ Probability based: Naïve Bayes;
■ Boundary-decision based:
● Support Vector Machines;
● Neural Networks: single hidden layer, Multi-Layer Perceptron (MLP), Autoencoder;
■ Deep Learning: CNN, RNN.

4. Neural Network: Theory and Application



Example architecture: 3×5×5×2 (three inputs, two hidden layers of five neurons each, two outputs).

[x, y, z]: input vector
W1: weight matrix of the input layer (3×5)
W2: weight matrix of the hidden layer (5×5)
W3: weight matrix of the output layer (5×2)
F: activation function
C1: class output 1
C2: class output 2

Each layer is a matrix product followed by the activation F; an (N, M) matrix times an (M, P) matrix gives an (N, P) matrix. Classifying one 1×3 input vector therefore chains:

1×3 input · W1 (3×5) → 1×5 · W2 (5×5) → 1×5 · W3 (5×2) → 1×2 output

with

W1 = [w11 … w15; w21 … w25; w31 … w35]
W2 = [w'11 … w'15; … ; w'51 … w'55]
W3 = [w''11 w''12; … ; w''51 w''52]

The final 1×2 output [P1, P2] holds one score per class, and the model predicts the class with max(P1, P2).
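A minimal NumPy sketch of this matrix view for the 3×5×5×2 architecture, assuming random weight initialization and the logistic activation used in the worked example below; input values and the seed are illustrative:

```python
import numpy as np

def F(x):
    # Logistic activation applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
x = np.array([[0.2, -1.0, 0.5]])      # [x, y, z] as a 1x3 row vector

W1 = rng.normal(size=(3, 5))          # input layer weights
W2 = rng.normal(size=(5, 5))          # hidden layer weights
W3 = rng.normal(size=(5, 2))          # output layer weights

h1 = F(x @ W1)        # (1,3) @ (3,5) -> (1,5)
h2 = F(h1 @ W2)       # (1,5) @ (5,5) -> (1,5)
out = F(h2 @ W3)      # (1,5) @ (5,2) -> (1,2) = [P1, P2]

print(out)                 # class scores [P1, P2]
print(out.argmax(axis=1))  # predicted class: max(P1, P2)
```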


F is a nonlinear activation function; it inserts a non-linear representation into the neural network.

Worked example. Train vector = [2, -1]; train label = 1; the activation is the logistic function

$$\mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}$$

The network has two inputs, two hidden layers of two neurons each, and one output neuron (2×2×2×1).

Step 1: Weights' Initialization

The weights are initialized as follows (rows = source neuron, columns = target neuron):

W1 = [0.5  -1; 1.5  -2]   (input → hidden layer 1)
W2 = [1  -1; 3  -4]       (hidden layer 1 → hidden layer 2)
W3 = [1; -3]              (hidden layer 2 → output)

Step 2: Forward Pass

Each neuron outputs the logistic of the weighted sum of its inputs:

■ Hidden layer 1, neuron 1: logistic(0.5 · 2 + 1.5 · (-1)) = logistic(-0.5) = 0.378
■ Hidden layer 1, neuron 2: logistic((-1) · 2 + (-2) · (-1)) = logistic(0) = 0.5
■ Hidden layer 2, neuron 1: logistic(1 · 0.378 + 3 · 0.5) = logistic(1.878) = 0.876
■ Hidden layer 2, neuron 2: logistic((-1) · 0.378 + (-4) · 0.5) = logistic(-2.378) = 0.085
■ Output: logistic(1 · 0.876 + (-3) · 0.085) = logistic(0.621) = 0.648

Step 3: Backward Pass

The error is propagated backwards. At the output neuron, Δ is the label minus the prediction; at each hidden neuron, Δ is the logistic derivative, output · (1 - output), times the weighted sum of the Δs of the next layer:

■ Output: Δ = 1 - 0.648 = 0.352
■ Hidden layer 2, neuron 1: Δ = 0.876 · (1 - 0.876) · (1 · 0.352) = 0.041
■ Hidden layer 2, neuron 2: Δ = 0.085 · (1 - 0.085) · ((-3) · 0.352) = -0.082
■ Hidden layer 1, neuron 1: Δ = 0.378 · (1 - 0.378) · [1 · 0.041 + (-1) · (-0.082)] = 0.029
■ Hidden layer 1, neuron 2: Δ = 0.5 · (1 - 0.5) · [3 · 0.041 + (-4) · (-0.082)] = 0.113

Step 4: Weights' Update

With learning rate α = 0.1, every weight is updated as

new weight = old weight + α · (value of the source neuron) · Δ(target neuron)

Input → hidden layer 1:
■ 0.5 → 0.5 + 0.1 · 2 · 0.029 = 0.506
■ -1 → -1 + 0.1 · 2 · 0.113 = -0.977
■ 1.5 → 1.5 + 0.1 · (-1) · 0.029 = 1.497
■ -2 → -2 + 0.1 · (-1) · 0.113 = -2.011

Hidden layer 1 → hidden layer 2:
■ 1 → 1 + 0.1 · 0.378 · 0.041 = 1.002
■ -1 → -1 + 0.1 · 0.378 · (-0.082) = -1.003
■ 3 → 3 + 0.1 · 0.5 · 0.041 = 3.002
■ -4 → -4 + 0.1 · 0.5 · (-0.082) = -4.004

Hidden layer 2 → output:
■ 1 → 1 + 0.1 · 0.876 · 0.352 = 1.031
■ -3 → -3 + 0.1 · 0.085 · 0.352 = -2.997

Step 5: Repeat Steps 2 to 4 for every train vector in the train database; one full pass over the database is one epoch. A NumPy sketch of the whole example follows below.
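To check the arithmetic, here is a minimal NumPy sketch of this exact 2×2×2×1 example; variable names are illustrative, and the printed values match the slide figures up to rounding:

```python
import numpy as np

def logistic(x):
    """Logistic activation, 1 / (1 + e^-x), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([2.0, -1.0])   # train vector
label = 1.0                 # train label
alpha = 0.1                 # learning rate

# Step 1: initial weights (rows = source neuron, columns = target neuron)
W1 = np.array([[0.5, -1.0], [1.5, -2.0]])   # input -> hidden layer 1
W2 = np.array([[1.0, -1.0], [3.0, -4.0]])   # hidden layer 1 -> hidden layer 2
W3 = np.array([[1.0], [-3.0]])              # hidden layer 2 -> output

# Step 2: forward pass
h1 = logistic(x @ W1)     # ~ [0.378, 0.5]
h2 = logistic(h1 @ W2)    # ~ [0.87, 0.085]
out = logistic(h2 @ W3)   # ~ [0.65]

# Step 3: backward pass
d_out = label - out                     # ~ 0.352
d_h2 = h2 * (1 - h2) * (W3 @ d_out)     # ~ [0.04, -0.08]
d_h1 = h1 * (1 - h1) * (W2 @ d_h2)      # ~ [0.03, 0.11]

# Step 4: weight update, w += alpha * source activation * delta of target
W1 += alpha * np.outer(x, d_h1)
W2 += alpha * np.outer(h1, d_h2)
W3 += alpha * np.outer(h2, d_out)

print(np.round(W1, 3))   # ~ [[0.506, -0.978], [1.497, -2.011]]
print(np.round(W2, 3))   # ~ [[1.001, -1.003], [3.002, -4.004]]
print(np.round(W3, 3))   # ~ [[1.031], [-2.997]]
```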

5. Naïve Bayes: Theory and Application

Learning. Training set (10 cars; attributes Color, Type, Origin; class Stolen):

Color    Type     Origin    Stolen
Red      Sport    Domestic  Yes
Red      Sport    Domestic  No
Red      Sport    Domestic  Yes
Yellow   Sport    Domestic  No
Yellow   Sport    Imported  Yes
Yellow   Classic  Imported  No
Yellow   Classic  Imported  Yes
Yellow   Classic  Domestic  No
Red      Classic  Imported  No
Red      Sport    Imported  Yes

Class priors: P(Yes) = 5/10, P(No) = 5/10.

Color:  P(Red|Yes) = 3/5, P(Yellow|Yes) = 2/5; P(Red|No) = 2/5, P(Yellow|No) = 3/5
Type:   P(Sport|Yes) = 4/5, P(Classic|Yes) = 1/5; P(Sport|No) = 2/5, P(Classic|No) = 3/5
Origin: P(Domestic|Yes) = 2/5, P(Imported|Yes) = 3/5; P(Domestic|No) = 3/5, P(Imported|No) = 2/5


Testing. Classify the sample X = <Red, Classic, Domestic>:

P(X|Yes) · P(Yes) = P(Red|Yes) · P(Classic|Yes) · P(Domestic|Yes) · P(Yes) = (3/5) · (1/5) · (2/5) · (5/10) = 0.024

P(X|No) · P(No) = P(Red|No) · P(Classic|No) · P(Domestic|No) · P(No) = (2/5) · (3/5) · (3/5) · (5/10) = 0.072

Since 0.072 > 0.024, X is classified as No: the car is predicted not stolen.
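A minimal Python sketch reproducing this computation from the counts above; the dictionary layout is illustrative:

```python
from fractions import Fraction as F

# Class priors and per-attribute conditional probabilities (slide counts)
prior = {"Yes": F(5, 10), "No": F(5, 10)}
cond = {
    "Yes": {"Red": F(3, 5), "Classic": F(1, 5), "Domestic": F(2, 5)},
    "No":  {"Red": F(2, 5), "Classic": F(3, 5), "Domestic": F(3, 5)},
}

x = ["Red", "Classic", "Domestic"]

# Naive Bayes score: P(class) times the product of P(attribute value | class)
scores = {}
for c in prior:
    s = prior[c]
    for value in x:
        s *= cond[c][value]
    scores[c] = s

print({c: float(s) for c, s in scores.items()})  # {'Yes': 0.024, 'No': 0.072}
print(max(scores, key=scores.get))               # 'No' -> predicted not stolen
```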

6. Support Vector Machines: Theory and Application

Basic idea: in the feature space, find the decision boundary SV: A·X + B that maximizes the margin distance M1 + M2, where M1 and M2 are the distances from the boundary to the closest samples of Class A and Class B.


Two cases arise:

■ Case 1, linear separation: the decision boundary is linear, SV: A·X + B;
■ Case 2, non-linear separation: the decision boundary is SV: F(X), where F is a non-linear function.


When the classes (A, B, C, D) are not linearly separable in the original feature space, a kernel maps the samples into a new feature space in which they become linearly separable.

[Figure: examples of kernel functions]


Case 1: Linear Separation

1. Take K = N - 1 candidate support vectors, where N is the number of samples (here K = 3).
2. Initialize the 3 linear support vectors:
● D1 = A1·X + B1
● D2 = A2·X + B2
● D3 = A3·X + B3
3. Compute the accuracy of each support vector: D1 (75%), D2 (100%), D3 (100%).
4. Threshold at Acc > 85%: D2 and D3 remain.
5. Compare the margin distances M2 and M3 and keep the support vector with the largest margin.

Case 2: Non-Linear Separation

1. Choose a kernel function, e.g. F(x) = x².
2. Transform the vectors with F: squaring each coordinate of the points (0, 1), (-1, 1), (0, -1), … so that the classes become linearly separable.
3. Apply the linear-separation procedure of Case 1: initialize K = N - 1 support vectors D1, D2, D3; compute their accuracies (D1: 75%, D2: 100%, D3: 100%); threshold at Acc > 85%; keep the remaining support vector with the largest margin. A scikit-learn sketch of both cases follows below.
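A minimal scikit-learn sketch of the two cases, assuming an illustrative toy dataset; the RBF kernel stands in for the slide's F(x) = x²-style transformation:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: class A around (2, 2), class B around (-2, -2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Case 1: linear separation
linear_svm = SVC(kernel="linear").fit(X, y)

# Case 2: non-linear separation via a kernel (RBF here; a polynomial
# kernel would play the role of F(x) = x^2 from the slide)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print(linear_svm.predict([[1.5, 1.8]]))  # -> [0] (class A side)
print(rbf_svm.n_support_)                # number of support vectors per class
```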


8. How to evaluate the Machine Learning Performance?

Recall and Precision (computed from the confusion matrix) are combined into two summary scores:

$$F_1\text{-score} = 2 \cdot \frac{\mathit{Recall} \cdot \mathit{Precision}}{\mathit{Recall} + \mathit{Precision}}$$

$$g = \sqrt{\mathit{Recall} \cdot \mathit{Precision}}$$
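A minimal scikit-learn sketch computing these metrics on illustrative label vectors:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative ground truth and predictions for a binary task
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0]

precision = precision_score(y_true, y_pred)   # 0.8
recall = recall_score(y_true, y_pred)         # 0.8

print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1_score(y_true, y_pred))  # 2 * P * R / (P + R)
print("g:", (precision * recall) ** 0.5)      # geometric mean of P and R
```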

