
Course: Artificial Intelligence


Lecture 1: Introduction to AI

Pr. Fatima Ezzahraa BEN BOUAZZA


fbenbouazza@um6ss.ma
Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Artificial Intelligence

Ø What is Artificial Intelligence (AI)?


Ø Types of AI
Ø Symbolic vs. Machine Learning
What is Artificial Intelligence (AI)?
AI: a branch of Computer Science
Its aim is to endow machines with capabilities similar to human intelligence
→ solve problems, perform tasks, and exhibit other cognitive capabilities

AI Types

Weak AI (or Narrow AI)
§ A machine that concentrates on one precise task
§ e.g., recognizing a sound or an image, playing a game against a human
§ Works within a fixed and predetermined context
§ Does not work outside the task it is intended for

Strong AI
§ A machine endowed with consciousness, emotion, innovation capabilities, imagination and creativity
§ Can perform any task a human being can do

General AI
§ A machine capable of applying intelligence to any problem rather than to a specific problem

Current systems → Weak AI
With very few exceptions that can (kind of) approach Strong AI

Examples of Weak AI

§ Cognitive capabilities
Vision: image recognition
Hearing: speech recognition
Natural language processing: language translation, etc.

§Medical Applications
Cancer detection from Histopathologic images
Prediction of Glycaemia for a patient with diabetes

§Games
Chess: Deep Blue (IBM) beats world champion Garry Kasparov in 1997
Go: AlphaGo (software by DeepMind) beats world champion Ke Jie in 2017.

§ Voice assistants: Siri, Alexa, Google Assistant, Cortana, Bixby, etc.


§Autonomous Cars
Impressive Successes
2 Types of (weak) AI

§ Symbolic AI
Based on symbolic representations and the manipulation of symbols
Logic, reasoning, knowledge representation (expert systems)
Good reasoning capabilities, based on rules, etc.
§ Machine Learning AI
Mathematical model (function) with adjustable parameters
Parameters learned (adjusted) automatically from data to make the AI model as effective as possible
The AI becomes better through automatic learning, without (or with minimal) human intervention
Can handle any data type

Artificial Intelligence: Symbolic vs. Machine Learning

Symbolic
Good reasoning capabilities, based on rules
Good explainability
Based on symbolic representations

Machine Learning
Good learning capabilities
Most of the time, no explainability
Based on any type of data

Machine Learning has dominated AI in recent years


What explains the current boom in Machine Learning AI?
§ High-performance machines at more affordable costs
Very high computing power
Very high storage capacity
§Access to very large datasets
§Various Sensors, Internet of Things
§Advent of powerful AI algorithms
Deep Neural Networks (Deep Learning)
§ Availability of open-source AI software: TensorFlow, PyTorch, scikit-learn, Keras, etc.
§Large-Scale Deployment

Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Machine Learning: definition
Two definitions:
Machine Learning is concerned with the design, the analysis, and the application of algorithms that allow computers to learn.
ü A computer learns if it improves its performance at some task with experience (i.e., by collecting data).
Machine Learning is concerned with the design, the analysis, and the application of algorithms to extract a model of a system from the sole observation (or the simulation) of this system in some situations (i.e., by collecting data).
ü A model can be any relationship between the variables used to describe the system.

Two main goals: make predictions and better understand the system

Machine learning: when?

• Learning is useful when:
Ø Human expertise does not exist (e.g., navigating on Mars)
Ø Humans are unable to explain their expertise (e.g., speech recognition)
Ø The solution changes over time (e.g., routing on a computer network)
Ø The solution needs to be adapted to particular cases (e.g., user biometrics)

Example:
It is easier to write a program that learns to play checkers or backgammon well by self-play than to convert the expertise of a master player into a program.
Applications: recommendation systems

• Netflix Prize: predict how much someone is going to love a movie based on their movie preferences
• Data: over 100 million ratings that over 480,000 users gave to nearly 18,000 movies
• Reward: $1,000,000 for a 10% improvement with respect to Netflix's system in 2006 (two teams succeeded in 2009)

http://www.netflixprize.com

Applications: robotics
Machine learning is a core technology within robotics:
• To better sense the environment (computer vision, sound
recognition…)
• To decide on which sequence of actions to take to perform a given
task (reinforcement learning, imitation learning…)

Applications: credit risk analysis

• Data:

• Logical rules automatically learned from data:

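As an illustration of such automatically learned rules, here is a minimal sketch using a decision tree from scikit-learn; the applicant attributes and data below are hypothetical, invented for illustration only:

```python
# Sketch: learn logical if/then rules from (hypothetical) credit data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy applicant table: [income (k$/year), years employed, number of debts]
X = [[55, 6, 0], [20, 1, 3], [70, 10, 1], [15, 0, 4], [40, 3, 2], [90, 12, 0]]
y = ["approve", "reject", "approve", "reject", "reject", "approve"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Print the learned rules as human-readable if/then statements
print(export_text(tree, feature_names=["income", "years_employed", "debts"]))
```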

Some applications
Computer-aided cytology

Problem
• Early diagnosis of disease based on cytological tests
• Improve detection of rare abnormal cells among tens of thousands of normal cells

Approach
• Pathologists collect a database of normal and abnormal cells
• Automatically learn a cell classification model
• Scan whole-slide digital images (>= 40,000 x 40,000 pixels) and rank the most suspicious ones for expert review

Other applications
Machine learning has a wide spectrum of applications
including:
• Retail: Market basket analysis, Customer
relationship management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Power systems: monitoring, control
• Medicine: Medical diagnosis
• Telecommunications: Quality of service optimization,
routing
• Bioinformatics: Motifs, alignment, network inference
• Web mining: Search engines
...
Related fields

Ø Artificial Intelligence: smart algorithms
Ø Statistics: inference from a sample
Ø Computer Science: efficient algorithms and complex models
Ø Systems and control: analysis, modeling, and control of dynamical systems
Ø Data mining/data science: searching through large volumes of data

One part of the data mining process:

Problem definition → Data generation → Raw data → Preprocessing → Preprocessed data → Machine learning → Hypothesis → Validation → Knowledge/Predictive model

Each step generates many questions:
Ø Data generation: data types, sample size, online/offline...
Ø Preprocessing: normalization, missing values, feature selection/extraction...
Ø Machine learning: hypothesis, choice of learning paradigm/algorithm...
Ø Hypothesis validation: cross-validation, model deployment...
Glossary
• Data = a table (dataset, database, sample)
• Variables (attributes, features) = measurements made on objects (the columns of the table)

VAR 1 VAR 2 VAR 3 VAR 4 VAR 5 VAR 6 VAR 7 VAR 8 VAR 9 VAR 10 VAR 11 ...
Object 1 0 1 2 0 1 1 2 1 0 2 0 ...
Object 2 2 1 2 0 1 1 0 2 1 0 2 ...
Object 3 0 0 1 0 1 1 2 0 2 1 2 ...
Object 4 1 1 2 2 0 0 0 1 2 1 1 ...
Object 5 0 1 0 2 1 0 2 1 1 0 1 ...
Object 6 0 1 2 1 1 1 1 1 1 1 1 ...
Object 7 2 1 0 1 1 2 2 2 1 1 1 ...
Object 8 2 2 1 0 0 0 1 1 1 1 2 ...
Object 9 1 1 0 1 0 0 0 0 1 2 1 ...
Object 10 1 2 2 0 1 0 1 2 1 0 1 ...
... ... ... ... ... ... ... ... ... ... ... ... ...

• Objects (samples, observations, individuals, examples, patterns) = the rows of the table
• Dimension = number of variables
• Size = number of objects

Ø Objects: samples, patients, documents, images...
Ø Variables: genes, proteins, words, pixels...
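In code, these terms map directly onto a data table; a minimal sketch assuming pandas is available, using the first rows and variables of the table above:

```python
# Sketch: a dataset as a table of objects (rows) x variables (columns).
import pandas as pd

data = pd.DataFrame(
    [[0, 1, 2, 0], [2, 1, 2, 0], [0, 0, 1, 0]],   # first values of the table
    index=["Object 1", "Object 2", "Object 3"],
    columns=["VAR 1", "VAR 2", "VAR 3", "VAR 4"],
)

size, dimension = data.shape  # size = #objects, dimension = #variables
print(size, dimension)        # 3 4
```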

Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
ü Introduction
• Model selection, cross-validation, overfitting
Ø Unsupervised Learning

Supervised learning

Inputs                         Output
X1     X2     X3   X4     Y
-0.61  -0.43  Y    0.51   Healthy
-2.3   -1.2   N    -0.21  Disease
0.33   -0.16  N    0.3    Healthy
0.23   -0.87  Y    0.09   Disease
-0.69  0.65   N    0.58   Healthy
0.61   0.92   Y    0.02   Disease

Learning sample → (supervised learning) → model, hypothesis

Ø Goal: from the database (learning sample), find a function f of the inputs that approximates the output as well as possible
Ø Formally: given a learning sample LS = {(x_i, y_i) | i = 1, ..., N}, find a function f: X → Y that minimizes the expected prediction error on new (x, y) pairs
Ø Symbolic output ⇒ classification, numerical output ⇒ regression
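As a minimal sketch of this setting, assuming scikit-learn is available (the data mimics the numeric columns of the table above; the classifier choice is illustrative):

```python
# Sketch: learn a hypothesis h from a learning sample and predict a new case.
from sklearn.linear_model import LogisticRegression

# Inputs X (numeric attributes only) and output Y, mimicking the table above
X = [[-0.61, -0.43, 0.51], [-2.3, -1.2, -0.21], [0.33, -0.16, 0.3],
     [0.23, -0.87, 0.09], [-0.69, 0.65, 0.58], [0.61, 0.92, 0.02]]
y = ["Healthy", "Disease", "Healthy", "Disease", "Healthy", "Disease"]

h = LogisticRegression().fit(X, y)     # learning: fit f on the learning sample
print(h.predict([[0.1, 0.2, 0.3]]))    # prediction for a new object
```

Since the output here is symbolic, this is a classification problem; replacing the output column with numbers (and the model with a regressor) would make it a regression problem.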

Two main goals


Ø Predictive:
Make predictions for a new object described by its attributes
X1 X2 X3 X4 Y
-0.71 -0.27 T -0.72 Healthy
-2.3 -1.2 F -0.92 Disease
0.42 0.26 F -0.06 Healthy
0.84 -0.78 T -0.3 Disease
-0.55 -0.63 F -0.02 Healthy
0.07 0.24 T 0.4 Disease
0.75 0.49 F -0.88 ?

Ø Informative:
Help to understand the relationship between the inputs and the
output

Find the most relevant inputs

Example of applications

Ø Biomedical domain: medical diagnosis, differentiation of


diseases, prediction of the response to a treatment...

Gene expression, Metabolite concentrations...

X1 X2 ... X4 Y
-0.1 0.02 ... 0.01 Healthy
-2.3 -1.2 ... 0.88 Disease
Patients 0 0.65 ... -0.69 Healthy
0.71 0.85 ... -0.03 Disease
-0.18 0.14 ... 0.84 Healthy
-0.64 0.15 ... 0.03 Disease


Example of applications
Ø Perceptual tasks: handwritten character recognition,
speech recognition...
Inputs:
→ a grey intensity in [0, 255] for each pixel
→ each image is represented by a vector of pixel intensities
→ e.g., 32x32 = 1024 dimensions

Output:
10 discrete values, Y = {0, 1, 2, ..., 9}
Example of applications

Ø Time series prediction: predicting electricity load, network usage, stock market prices...

Outline

Ø Introduction
Ø Supervised Learning
• Introduction
ü Model selection, cross-validation, overfitting
Ø Unsupervised learning

Illustrative problem

Ø Medical diagnosis from two measurements (e.g., weight and temperature)

X1    X2    Y
0.93  0.9   Healthy
0.44  0.85  Disease
0.53  0.31  Healthy
0.19  0.28  Disease
...   ...   ...
0.57  0.09  Disease
0.12  0.47  Healthy

[figure: the learning sample plotted in the (X1, X2) unit square, one colour per class]

Ø Goal: find a model that classifies new cases, for which X1 and X2 are known, as well as possible

Learning algorithm
A learning algorithm is defined by:
Ø a family of candidate models (=hypothesis space H)
Ø a quality measure for a model
Ø an optimization strategy
It takes as input a learning sample and outputs a function h in H of maximum quality.

[figure: a model obtained by supervised learning, drawn as a decision boundary in the (X1, X2) plane]
Linear model

h(X1, X2) = Disease if w0 + w1*X1 + w2*X2 > 0, Normal otherwise

[figure: a straight-line decision boundary in the (X1, X2) plane]

§ Learning phase: from the learning sample, find the best values for w0, w1 and w2
§ Many alternatives exist even for this simple model (LDA, perceptron, SVM...); one of them is sketched below
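A minimal sketch of the learning phase, assuming scikit-learn's Perceptron (one of the alternatives named above); the data points are illustrative:

```python
# Sketch: learn w0, w1, w2 of the linear model from a (toy) learning sample.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]])
y = np.array([0, 1, 0, 1])  # 0 = Normal, 1 = Disease

clf = Perceptron().fit(X, y)
w0, (w1, w2) = clf.intercept_[0], clf.coef_[0]   # the learned weights
print(w0, w1, w2)  # h predicts Disease wherever w0 + w1*X1 + w2*X2 > 0
```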

Quadratic model

h(X1, X2) = Disease if w0 + w1*X1 + w2*X2 + w3*X1^2 + w4*X2^2 > 0, Normal otherwise

[figure: a curved decision boundary in the (X1, X2) plane]

Ø Learning phase: from the learning sample, find the best values for w0, w1, w2, w3 and w4
Ø Many alternatives exist even for this simple model (LDA, perceptron, SVM...)
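One way to obtain a quadratic model is to add the squared inputs as extra features and reuse a linear learner; a sketch assuming scikit-learn (note that PolynomialFeatures with degree=2 also adds the cross-term X1*X2, slightly extending the model above):

```python
# Sketch: quadratic decision boundary via polynomial feature expansion.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Perceptron

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]]
y = [0, 1, 0, 1]  # 0 = Normal, 1 = Disease

# degree=2 appends X1^2, X2^2 (and X1*X2) before the linear learning phase
quad = make_pipeline(PolynomialFeatures(degree=2), Perceptron()).fit(X, y)
print(quad.predict([[0.5, 0.5]]))
```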
Artificial neural network

h(X1, X2) = Disease if some very complex function of X1, X2 > 0, Normal otherwise

[figure: a highly irregular decision boundary in the (X1, X2) plane]

Ø Learning phase: from the learning sample, find the numerous parameters of the very complex function
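A sketch of such a "very complex function", assuming scikit-learn's MLPClassifier (the architecture and data are illustrative):

```python
# Sketch: a small neural network with many learned parameters.
from sklearn.neural_network import MLPClassifier

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]]
y = [0, 1, 0, 1]  # 0 = Normal, 1 = Disease

# Two hidden layers of 16 units each: the "numerous parameters" of the text
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
print(net.predict([[0.5, 0.5]]))
```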

Which model is the best?

[figure: the three fitted decision boundaries on the learning sample]
linear: LS error = 3.4%   quadratic: LS error = 1.0%   neural net: LS error = 0%
Which model is the best?

linear: LS error = 3.4%   quadratic: LS error = 1.0%   neural net: LS error = 0%

Ø Why not choose the model that minimises the error rate on the learning sample (also called the re-substitution error)?
Ø How well are you going to predict future data drawn from the same distribution? (the generalisation error)

The test set method

1. Randomly choose 30% of the data to be in a test sample
2. The remainder is a learning sample
3. Learn the model from the learning sample
4. Estimate its future performance on the test sample
(a code sketch of this procedure follows below)
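A sketch of these four steps, assuming scikit-learn (the data and model are illustrative):

```python
# Sketch: the test set method with a 30% / 70% split.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28],
     [0.57, 0.09], [0.12, 0.47]]
y = [0, 1, 0, 1, 1, 0]

# Steps 1-2: random 30% test sample, the remainder is the learning sample
X_ls, X_ts, y_ls, y_ts = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
model = LogisticRegression().fit(X_ls, y_ls)    # Step 3: learn on the LS
print("TS accuracy:", model.score(X_ts, y_ts))  # Step 4: estimate on the TS
```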
Which model is the best?

linear: LS error = 3.4%, TS error = 3.5%
quadratic: LS error = 1.0%, TS error = 1.5%
neural net: LS error = 0%, TS error = 3.5%

Ø We say that the neural network overfits the data
Ø Overfitting occurs when the learning algorithm starts fitting noise (by contrast, the linear model underfits the data)

k-fold Cross Validation

Ø Randomly partition the dataset into k subsets (for example, 10)

[diagram: the dataset divided into k blocks; each block in turn plays the role of the test set TS]

Ø For each subset:
§ learn the model on the objects that are not in the subset
§ compute the error rate on the points in the subset
Ø Report the mean error rate over the k subsets
Ø When k = the number of objects ⇒ leave-one-out cross-validation
(a code sketch follows below)
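A sketch of k-fold cross-validation with k = 10, assuming scikit-learn (the dataset is synthetic, for illustration):

```python
# Sketch: estimate the generalisation error by 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y, cv=10)  # k = 10 folds
print("mean CV error:", 1 - scores.mean())
# cv=len(X) would give leave-one-out cross-validation
```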
CV-based complexity control

[figure: error versus model complexity; the LS error decreases steadily as complexity grows, while the CV error is U-shaped: high in the under-fitting region, minimal at the optimal complexity, and rising again in the over-fitting region]

Complexity

Ø Controlling complexity is called regularization or smoothing
Ø Complexity can be controlled in several ways:
§ The size of the hypothesis space: number of candidate models, range of the parameters...
§ The performance criterion: learning set performance versus parameter range, e.g., minimize Err(LS) + λ·C(model) (see the sketch after this list)
§ The optimization algorithm: number of iterations, nature of the optimization problem (one global optimum versus several local optima)...
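A sketch of the Err(LS) + λ·C(model) idea, assuming scikit-learn's Ridge regression, where the alpha parameter plays the role of λ (the data is synthetic):

```python
# Sketch: regularization — larger λ (alpha) shrinks the model's parameters.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + 0.1 * rng.normal(size=50)   # only the first input truly matters

for alpha in [0.01, 1.0, 100.0]:          # alpha ~ λ in Err(LS) + λ·C(model)
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(model.coef_).sum())  # total weight shrinks as λ grows
```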
Comparison of learning algorithms
Three main criteria:
Ø Accuracy:
ü Measured by the generalization error (estimated by CV)
Ø Efficiency:
ü Computing times and scalability for learning and testing
Ø Interpretability:
ü Comprehension brought by the model about the input-output
relationship
Unfortunately, there is usually a trade-off between these
criteria


Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Unsupervised learning

• Unsupervised learning tries to find regularities in the data without any guidance about inputs and outputs
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
-0.27 -0.15 -0.14 0.91 -0.17 0.26 -0.48 -0.1 -0.53 -0.65 0.23 0.22 0.98 0.57 0.02 -0.55 -0.32 0.28 -0.33
-2.3 -1.2 -4.5 -0.01 -0.83 0.66 0.55 0.27 -0.65 0.39 -1.3 -0.2 -3.5 0.4 0.21 -0.87 0.64 0.6 -0.29
0.41 0.77 -0.44 0 0.03 -0.82 0.17 0.54 -0.04 0.6 0.41 0.66 -0.27 -0.86 -0.92 0 0.48 0.74 0.49
0.28 -0.71 -0.82 0.27 -0.21 -0.9 0.61 -0.57 0.44 0.21 0.97 -0.27 0.74 0.2 -0.16 0.7 0.79 0.59 -0.33
-0.28 0.48 0.79 -0.14 0.8 0.28 0.75 0.26 0.3 -0.78 -0.72 0.94 -0.78 0.48 0.26 0.83 -0.88 -0.59 0.71
0.01 0.36 0.03 0.03 0.59 -0.5 0.4 -0.88 -0.53 0.95 0.15 0.31 0.06 0.37 0.66 -0.34 0.79 -0.12 0.49
-0.53 -0.8 -0.64 -0.93 -0.51 0.28 0.25 0.01 -0.94 0.96 0.25 -0.12 0.27 -0.72 -0.77 -0.31 0.44 0.58 -0.86
0.04 0.94 -0.92 -0.38 -0.07 0.98 0.1 0.19 -0.57 -0.69 -0.23 0.05 0.13 -0.28 0.98 -0.08 -0.3 -0.84 0.47
-0.88 -0.73 -0.4 0.58 0.24 0.08 -0.2 0.42 -0.61 -0.13 -0.47 -0.36 -0.37 0.95 -0.31 0.25 0.55 0.52 -0.66
-0.56 0.97 -0.93 0.91 0.36 -0.14 -0.9 0.65 0.41 -0.12 0.35 0.21 0.22 0.73 0.68 -0.65 -0.4 0.91 -0.64

• Are there interesting groups of variables or samples? Outliers? What are the dependencies between variables?

Unsupervised learning methods

• Many families of tasks/problems exist, among which:
Ø Clustering: try to find natural groups of samples/variables (a k-means sketch follows this list)
e.g., k-means, hierarchical clustering
Ø Dimensionality reduction: project the data from a high-dimensional space down to a small number of dimensions
e.g., principal/independent component analysis, MDS
Ø Density estimation: determine the distribution of data within the input space
e.g., Bayesian networks, mixture models
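A clustering sketch with k-means, assuming scikit-learn (the two groups are synthetic):

```python
# Sketch: k-means finds natural groups of samples without any labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two natural groups of objects in a 2-variable space
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # the group assigned to each object
print(km.cluster_centers_)  # the centre of each group
```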
Clustering

• Clustering rows: grouping objects that are similar across variables
• Clustering columns: grouping variables that are similar across samples
• Bi-clustering/two-way clustering: grouping objects that are similar across a subset of variables (a bi-cluster pairs a cluster of objects with a cluster of variables)

Principal Component Analysis

• An exploratory technique used to reduce the dimensionality of the data set to a smaller space (2D, 3D)

A1    A2    A3    A4    A5    A6    A7    A8    A9    A10       PC1   PC2
0.25  0.93  0.04  -0.78 -0.53 0.57  0.19  0.29  0.37  -0.22  →  0.36  0.1
-2.3  -1.2  -4.5  -0.51 -0.76 0.07  0.81  0.95  0.99  0.26   →  -2.3  -1.2
-0.29 -1    0.73  -0.33 0.52  0.13  0.13  0.53  -0.5  -0.48  →  0.27  -0.89
-0.16 -0.17 -0.26 0.32  -0.08 -0.38 -0.48 0.99  -0.95 0.34   →  -0.19 0.7
0.07  -0.87 0.39  0.5   -0.63 -0.53 0.79  0.88  0.74  -0.14  →  -0.77 -0.7
0.61  0.15  0.68  -0.94 0.5   0.06  -0.56 0.49  0     -0.77  →  -0.65 -0.99

• Transforms a large number of variables into a smaller number of uncorrelated variables called principal components (PCs)
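A PCA sketch, assuming scikit-learn: ten variables (A1...A10) are projected onto two uncorrelated principal components, as in the table above (the data here is random, for illustration):

```python
# Sketch: reduce 10 variables to 2 principal components (PC1, PC2).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))          # 6 objects, 10 variables (A1..A10)

pca = PCA(n_components=2)
pcs = pca.fit_transform(X)            # 6 objects, 2 components (PC1, PC2)
print(pcs.shape)                      # (6, 2)
print(pca.explained_variance_ratio_)  # variance captured by each PC
```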
Books
• Reference book for the course:
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie et al., Springer, 2001 (second edition in 2009)
Freely downloadable here: http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Other textbooks:
• Machine Learning. Tom Mitchell, McGraw Hill, 1997
• Pattern Classification (2nd edition). R. Duda, P. Hart, D. Stork, Wiley Interscience, 2000
• Pattern Recognition and Machine Learning (Information Science and Statistics). C.M. Bishop, Springer, 2004
• Introduction to Machine Learning. Ethem Alpaydin, MIT Press, 2004
• Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Peter Flach, Cambridge University Press, 2012
• Machine Learning: A Probabilistic Perspective. Kevin P. Murphy, MIT Press, 2012

Books
More advanced/specific topics:
• Kernel Methods for Pattern Analysis. J. Shawe-Taylor and N. Cristianini, Cambridge University Press, 2004
• Reinforcement Learning: An Introduction. R.S. Sutton and A.G. Barto, MIT Press, 1998
• Neuro-Dynamic Programming. D.P. Bertsekas and J.N. Tsitsiklis, Athena Scientific, 1996
• Semi-Supervised Learning. Chapelle et al., MIT Press, 2006
• Predicting Structured Data. G. Bakir et al., MIT Press, 2007
• Deep Learning. Goodfellow, Bengio, Courville, MIT Press, 2016
Generic ML toolboxes
• scikit-learn
scikit-learn.org
Open source machine learning toolbox (in Python)
• WEKA
http://www.cs.waikato.ac.nz/ml/weka/
Open source machine learning toolbox (in Java)
• Many R and Matlab packages
http://www.kyb.mpg.de/bs/people/spider/
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html


Journals

• Journal of Machine Learning Research
• Machine Learning
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Journal of Artificial Intelligence Research
• Neural Computation
• Annals of Statistics
• IEEE Transactions on Neural Networks
• Data Mining and Knowledge Discovery
• ...
Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Uncertainty in Artificial Intelligence (UAI)
• International Joint Conference on Artificial Intelligence (IJCAI)
• International Conference on Artificial Neural Networks
(ICANN)
• Computational Learning Theory (COLT)
• Knowledge Discovery and Data mining (KDD)
• ...
