
Course: Artificial Intelligence


Lecture 1: Introduction to AI

Pr. Fatima Ezzahraa BEN BOUAZZA


fbenbouazza@um6ss.ma
Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Artificial Intelligence

Ø What is Artificial Intelligence (AI)?


Ø Types of AI
Ø Symbolic vs. Machine Learning
What is Artificial Intelligence (AI)?
AI: a branch of Computer Science
Its aim is to endow machines with capabilities similar to human intelligence
→ solve problems, perform tasks, and exhibit other cognitive capabilities

AI Types

Weak AI (or Narrow AI)
§ A machine that concentrates on one precise task
§ e.g., recognizing a sound or an image, playing a game against a human
§ Works within a fixed and predetermined context
§ Does not work outside the task it is intended for

Strong AI
§ A machine endowed with consciousness, emotion, innovation capabilities, imagination and creativity
§ Can perform any task a human being can do

General AI
§ A machine capable of applying intelligence to any problem rather than to a specific problem

Current systems → Weak AI
With very few exceptions that can (kind of) approach Strong AI

Examples of Weak AI

§ Cognitive capabilities
Vision: image recognition
Hearing: speech recognition
Natural language processing: language translation, etc.

§Medical Applications
Cancer detection from Histopathologic images
Prediction of Glycaemia for a patient with diabetes

§Games
Chess: Deep Blue (IBM) beats world champion Garry Kasparov in 1997
Go: AlphaGo (software by DeepMind) beats world champion Ke Jie in 2017.

§ Voice assistants: Siri, Alexa, Google Assistant, Cortana, Bixby, etc.


§Autonomous Cars
Impressive Successes
2 Types of (weak) AI

§ Symbolic AI
Based on symbolic representations and the manipulation of symbols
Logic, reasoning, knowledge representation (expert systems)
Good reasoning capabilities, based on rules, etc.
§ Machine Learning AI
Mathematical model (function) with adjustable parameters
Parameters learned (adjusted) automatically from data to make the AI model as effective as possible
The AI becomes better through automatic learning, without (or with minimal) human intervention
Can handle any data type

Artificial Intelligence: Symbolic vs. Machine Learning

Symbolic
Good reasoning capabilities, based on rules
Good explainability
Based on symbolic representations

Machine Learning
Good learning capabilities
Most of the time, no explainability
Based on any type of data

Machine Learning has dominated AI in recent years


What explains the current boom in Machine Learning AI?
§ High-performance machines at more affordable costs
Very high computing power
Very high storage capacity
§Access to very large datasets
§Various Sensors, Internet of Things
§Advent of powerful AI algorithms
Deep Neural Networks (Deep Learning)
§ Availability of open-source AI software: TensorFlow, PyTorch, scikit-learn, Keras, etc.
§Large-Scale Deployment

Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Machine Learning: definition
Two definitions:
Machine Learning is concerned with the design, the analysis, and the application of algorithms that allow computers to learn.
ü A computer learns if it improves its performance at some task with experience (i.e., by collecting data).
Machine Learning is concerned with the design, the analysis, and the application of algorithms to extract a model of a system from the sole observation (or the simulation) of this system in some situations (i.e., by collecting data).
ü A model can be any relationship between the variables used to describe the system.

Two main goals: make predictions and better understand the system

Machine learning: when?

• Learning is useful when:
Ø Human expertise does not exist (e.g., navigating on Mars)
Ø Humans are unable to explain their expertise (e.g., speech recognition)
Ø The solution changes over time (e.g., routing on a computer network)
Ø The solution needs to be adapted to particular cases (e.g., user biometrics)

Example:
It is easier to write a program that learns to play checkers or backgammon well by self-play than to convert the expertise of a master player into a program.
Applications: recommendation systems

• Netflix Prize: predict how much someone is going to love a movie based on their movie preferences
• Data: over 100 million ratings that over 480,000 users gave to nearly 18,000 movies
• Reward: $1,000,000 for a 10% improvement with respect to Netflix's system in 2006 (two teams succeeded in 2009)

http://www.netflixprize.com

Applications: robotics
Machine learning is a core technology within robotics:
• To better sense the environment (computer vision, sound
recognition…)
• To decide on which sequence of actions to take to perform a given
task (reinforcement learning, imitation learning…)

Applications: credit risk analysis

• Data:

• Logical rules automatically learned from data:

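As an illustration of such automatically learned rules, here is a minimal sketch using a decision tree from scikit-learn; the applicant attributes and data below are hypothetical, invented for illustration only:

```python
# Sketch: learn logical if/then rules from (hypothetical) credit data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy applicant table: [income (k$/year), years employed, number of debts]
X = [[55, 6, 0], [20, 1, 3], [70, 10, 1], [15, 0, 4], [40, 3, 2], [90, 12, 0]]
y = ["approve", "reject", "approve", "reject", "reject", "approve"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Print the learned rules as human-readable if/then statements
print(export_text(tree, feature_names=["income", "years_employed", "debts"]))
```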

Some applications
Computer-aided cytology

Problem
• Early diagnosis of disease based on cytological tests
• Improve detection of rare abnormal cells among tens of thousands of normal cells

Approach
• Pathologists collect a database of normal and abnormal cells
• Automatically learn a cell classification model
• Scan whole-slide digital images (>= 40,000 x 40,000 pixels) and rank the most suspicious ones for expert review

Other applications
Machine learning has a wide spectrum of applications
including:
• Retail: Market basket analysis, Customer
relationship management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Power systems: monitoring, control
• Medicine: Medical diagnosis
• Telecommunications: Quality of service optimization,
routing
• Bioinformatics: Motifs, alignment, network inference
• Web mining: Search engines
...
Related fields

Ø Artificial Intelligence: smart algorithms
Ø Statistics: inference from a sample
Ø Computer Science: efficient algorithms and complex models
Ø Systems and control: analysis, modeling, and control of dynamical systems
Ø Data mining/data science: searching through large volumes of data

One part of the data mining process:

Problem definition → Data generation → Raw data → Preprocessing → Preprocessed data → Machine learning → Hypothesis → Validation → Knowledge/Predictive model

Each step generates many questions:
Ø Data generation: data types, sample size, online/offline...
Ø Preprocessing: normalization, missing values, feature selection/extraction...
Ø Machine learning: hypothesis, choice of learning paradigm/algorithm...
Ø Hypothesis validation: cross-validation, model deployment...
Glossary
• Data = a table (dataset, database, sample)
• Variables (attributes, features) = measurements made on objects (the columns of the table)

VAR 1 VAR 2 VAR 3 VAR 4 VAR 5 VAR 6 VAR 7 VAR 8 VAR 9 VAR 10 VAR 11 ...
Object 1 0 1 2 0 1 1 2 1 0 2 0 ...
Object 2 2 1 2 0 1 1 0 2 1 0 2 ...
Object 3 0 0 1 0 1 1 2 0 2 1 2 ...
Object 4 1 1 2 2 0 0 0 1 2 1 1 ...
Object 5 0 1 0 2 1 0 2 1 1 0 1 ...
Object 6 0 1 2 1 1 1 1 1 1 1 1 ...
Object 7 2 1 0 1 1 2 2 2 1 1 1 ...
Object 8 2 2 1 0 0 0 1 1 1 1 2 ...
Object 9 1 1 0 1 0 0 0 0 1 2 1 ...
Object 10 1 2 2 0 1 0 1 2 1 0 1 ...
... ... ... ... ... ... ... ... ... ... ... ... ...

• Objects (samples, observations, individuals, examples, patterns) = the rows of the table
• Dimension = number of variables
• Size = number of objects

Ø Objects: samples, patients, documents, images...
Ø Variables: genes, proteins, words, pixels...
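In code, these terms map directly onto a data table; a minimal sketch assuming pandas is available, using the first rows and variables of the table above:

```python
# Sketch: a dataset as a table of objects (rows) x variables (columns).
import pandas as pd

data = pd.DataFrame(
    [[0, 1, 2, 0], [2, 1, 2, 0], [0, 0, 1, 0]],   # first values of the table
    index=["Object 1", "Object 2", "Object 3"],
    columns=["VAR 1", "VAR 2", "VAR 3", "VAR 4"],
)

size, dimension = data.shape  # size = #objects, dimension = #variables
print(size, dimension)        # 3 4
```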

Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
ü Introduction
• Model selection, cross-validation, overfitting
Ø Unsupervised Learning

Supervised learning

Inputs                         Output
X1     X2     X3   X4     Y
-0.61  -0.43  Y    0.51   Healthy
-2.3   -1.2   N    -0.21  Disease
0.33   -0.16  N    0.3    Healthy
0.23   -0.87  Y    0.09   Disease
-0.69  0.65   N    0.58   Healthy
0.61   0.92   Y    0.02   Disease

Learning sample → (supervised learning) → model, hypothesis

Ø Goal: from the database (learning sample), find a function f of the inputs that approximates the output as well as possible
Ø Formally: given a learning sample LS = {(x_i, y_i) | i = 1, ..., N}, find a function f: X → Y that minimizes the expected prediction error on new (x, y) pairs
Ø Symbolic output ⇒ classification, numerical output ⇒ regression
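As a minimal sketch of this setting, assuming scikit-learn is available (the data mimics the numeric columns of the table above; the classifier choice is illustrative):

```python
# Sketch: learn a hypothesis h from a learning sample and predict a new case.
from sklearn.linear_model import LogisticRegression

# Inputs X (numeric attributes only) and output Y, mimicking the table above
X = [[-0.61, -0.43, 0.51], [-2.3, -1.2, -0.21], [0.33, -0.16, 0.3],
     [0.23, -0.87, 0.09], [-0.69, 0.65, 0.58], [0.61, 0.92, 0.02]]
y = ["Healthy", "Disease", "Healthy", "Disease", "Healthy", "Disease"]

h = LogisticRegression().fit(X, y)     # learning: fit f on the learning sample
print(h.predict([[0.1, 0.2, 0.3]]))    # prediction for a new object
```

Since the output here is symbolic, this is a classification problem; replacing the output column with numbers (and the model with a regressor) would make it a regression problem.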

Two main goals


Ø Predictive:
Make predictions for a new object described by its attributes
X1 X2 X3 X4 Y
-0.71 -0.27 T -0.72 Healthy
-2.3 -1.2 F -0.92 Disease
0.42 0.26 F -0.06 Healthy
0.84 -0.78 T -0.3 Disease
-0.55 -0.63 F -0.02 Healthy
0.07 0.24 T 0.4 Disease
0.75 0.49 F -0.88 ?

Ø Informative:
Help to understand the relationship between the inputs and the
output

Find the most relevant inputs

Example of applications

Ø Biomedical domain: medical diagnosis, differentiation of


diseases, prediction of the response to a treatment...

Gene expression, Metabolite concentrations...

X1 X2 ... X4 Y
-0.1 0.02 ... 0.01 Healthy
-2.3 -1.2 ... 0.88 Disease
Patients 0 0.65 ... -0.69 Healthy
0.71 0.85 ... -0.03 Disease
-0.18 0.14 ... 0.84 Healthy
-0.64 0.15 ... 0.03 Disease


Example of applications
Ø Perceptual tasks: handwritten character recognition,
speech recognition...
Inputs:
→ a grey intensity in [0, 255] for each pixel
→ each image is represented by a vector of pixel intensities
→ e.g., 32x32 = 1024 dimensions

Output:
10 discrete values, Y = {0, 1, 2, ..., 9}
Example of applications

Ø Time series prediction: predicting electricity load, network usage, stock market prices...

Outline

Ø Introduction
Ø Supervised Learning
• Introduction
ü Model selection, cross-validation, overfitting
Ø Unsupervised learning

Illustrative problem

Ø Medical diagnosis from two measurements (e.g., weight and temperature)

X1    X2    Y
0.93  0.9   Healthy
0.44  0.85  Disease
0.53  0.31  Healthy
0.19  0.28  Disease
...   ...   ...
0.57  0.09  Disease
0.12  0.47  Healthy

[figure: the learning sample plotted in the (X1, X2) unit square, one colour per class]

Ø Goal: find a model that classifies new cases, for which X1 and X2 are known, as well as possible

Learning algorithm
A learning algorithm is defined by:
Ø a family of candidate models (=hypothesis space H)
Ø a quality measure for a model
Ø an optimization strategy
It takes as input a learning sample and outputs a function h in H of maximum quality.

[figure: a model obtained by supervised learning, drawn as a decision boundary in the (X1, X2) plane]
Linear model

h(X1, X2) = Disease if w0 + w1*X1 + w2*X2 > 0, Normal otherwise

[figure: a straight-line decision boundary in the (X1, X2) plane]

§ Learning phase: from the learning sample, find the best values for w0, w1 and w2
§ Many alternatives exist even for this simple model (LDA, perceptron, SVM...); one of them is sketched below
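A minimal sketch of the learning phase, assuming scikit-learn's Perceptron (one of the alternatives named above); the data points are illustrative:

```python
# Sketch: learn w0, w1, w2 of the linear model from a (toy) learning sample.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]])
y = np.array([0, 1, 0, 1])  # 0 = Normal, 1 = Disease

clf = Perceptron().fit(X, y)
w0, (w1, w2) = clf.intercept_[0], clf.coef_[0]   # the learned weights
print(w0, w1, w2)  # h predicts Disease wherever w0 + w1*X1 + w2*X2 > 0
```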

Quadratic model

h(X1, X2) = Disease if w0 + w1*X1 + w2*X2 + w3*X1^2 + w4*X2^2 > 0, Normal otherwise

[figure: a curved decision boundary in the (X1, X2) plane]

Ø Learning phase: from the learning sample, find the best values for w0, w1, w2, w3 and w4
Ø Many alternatives exist even for this simple model (LDA, perceptron, SVM...)
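One way to obtain a quadratic model is to add the squared inputs as extra features and reuse a linear learner; a sketch assuming scikit-learn (note that PolynomialFeatures with degree=2 also adds the cross-term X1*X2, slightly extending the model above):

```python
# Sketch: quadratic decision boundary via polynomial feature expansion.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Perceptron

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]]
y = [0, 1, 0, 1]  # 0 = Normal, 1 = Disease

# degree=2 appends X1^2, X2^2 (and X1*X2) before the linear learning phase
quad = make_pipeline(PolynomialFeatures(degree=2), Perceptron()).fit(X, y)
print(quad.predict([[0.5, 0.5]]))
```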
Artificial neural network

h(X1, X2) = Disease if some very complex function of X1, X2 > 0, Normal otherwise

[figure: a highly irregular decision boundary in the (X1, X2) plane]

Ø Learning phase: from the learning sample, find the numerous parameters of the very complex function
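A sketch of such a "very complex function", assuming scikit-learn's MLPClassifier (the architecture and data are illustrative):

```python
# Sketch: a small neural network with many learned parameters.
from sklearn.neural_network import MLPClassifier

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28]]
y = [0, 1, 0, 1]  # 0 = Normal, 1 = Disease

# Two hidden layers of 16 units each: the "numerous parameters" of the text
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
print(net.predict([[0.5, 0.5]]))
```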

Which model is the best?

[figure: the three fitted decision boundaries on the learning sample]
linear: LS error = 3.4%   quadratic: LS error = 1.0%   neural net: LS error = 0%
Which model is the best?

linear: LS error = 3.4%   quadratic: LS error = 1.0%   neural net: LS error = 0%

Ø Why not choose the model that minimises the error rate on the learning sample (also called the re-substitution error)?
Ø How well are you going to predict future data drawn from the same distribution? (the generalisation error)

The test set method

1. Randomly choose 30% of the data to be in a test sample
2. The remainder is a learning sample
3. Learn the model from the learning sample
4. Estimate its future performance on the test sample
(a code sketch of this procedure follows below)
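A sketch of these four steps, assuming scikit-learn (the data and model are illustrative):

```python
# Sketch: the test set method with a 30% / 70% split.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = [[0.93, 0.9], [0.44, 0.85], [0.53, 0.31], [0.19, 0.28],
     [0.57, 0.09], [0.12, 0.47]]
y = [0, 1, 0, 1, 1, 0]

# Steps 1-2: random 30% test sample, the remainder is the learning sample
X_ls, X_ts, y_ls, y_ts = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
model = LogisticRegression().fit(X_ls, y_ls)    # Step 3: learn on the LS
print("TS accuracy:", model.score(X_ts, y_ts))  # Step 4: estimate on the TS
```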
Which model is the best?

linear: LS error = 3.4%, TS error = 3.5%
quadratic: LS error = 1.0%, TS error = 1.5%
neural net: LS error = 0%, TS error = 3.5%

Ø We say that the neural network overfits the data
Ø Overfitting occurs when the learning algorithm starts fitting noise (by contrast, the linear model underfits the data)

k-fold Cross Validation

Ø Randomly partition the dataset into k subsets (for example, 10)

[diagram: the dataset divided into k blocks; each block in turn plays the role of the test set TS]

Ø For each subset:
§ learn the model on the objects that are not in the subset
§ compute the error rate on the points in the subset
Ø Report the mean error rate over the k subsets
Ø When k = the number of objects ⇒ leave-one-out cross-validation
(a code sketch follows below)
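A sketch of k-fold cross-validation with k = 10, assuming scikit-learn (the dataset is synthetic, for illustration):

```python
# Sketch: estimate the generalisation error by 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y, cv=10)  # k = 10 folds
print("mean CV error:", 1 - scores.mean())
# cv=len(X) would give leave-one-out cross-validation
```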
CV-based complexity control

[figure: error versus model complexity; the LS error decreases steadily as complexity grows, while the CV error is U-shaped: high in the under-fitting region, minimal at the optimal complexity, and rising again in the over-fitting region]

Complexity

Ø Controlling complexity is called regularization or smoothing
Ø Complexity can be controlled in several ways:
§ The size of the hypothesis space: number of candidate models, range of the parameters...
§ The performance criterion: learning set performance versus parameter range, e.g., minimize Err(LS) + λ·C(model) (see the sketch after this list)
§ The optimization algorithm: number of iterations, nature of the optimization problem (one global optimum versus several local optima)...
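A sketch of the Err(LS) + λ·C(model) idea, assuming scikit-learn's Ridge regression, where the alpha parameter plays the role of λ (the data is synthetic):

```python
# Sketch: regularization — larger λ (alpha) shrinks the model's parameters.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + 0.1 * rng.normal(size=50)   # only the first input truly matters

for alpha in [0.01, 1.0, 100.0]:          # alpha ~ λ in Err(LS) + λ·C(model)
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(model.coef_).sum())  # total weight shrinks as λ grows
```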
Comparison of learning algorithms
Three main criteria:
Ø Accuracy:
ü Measured by the generalization error (estimated by CV)
Ø Efficiency:
ü Computing times and scalability for learning and testing
Ø Interpretability:
ü Comprehension brought by the model about the input-output
relationship
Unfortunately, there is usually a trade-off between these
criteria


Outline

Ø Introduction to AI
Ø Machine Learning
Ø Supervised Learning
Ø Unsupervised Learning

Unsupervised learning

• Unsupervised learning tries to find regularities in the data without any guidance about inputs and outputs
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
-0.27 -0.15 -0.14 0.91 -0.17 0.26 -0.48 -0.1 -0.53 -0.65 0.23 0.22 0.98 0.57 0.02 -0.55 -0.32 0.28 -0.33
-2.3 -1.2 -4.5 -0.01 -0.83 0.66 0.55 0.27 -0.65 0.39 -1.3 -0.2 -3.5 0.4 0.21 -0.87 0.64 0.6 -0.29
0.41 0.77 -0.44 0 0.03 -0.82 0.17 0.54 -0.04 0.6 0.41 0.66 -0.27 -0.86 -0.92 0 0.48 0.74 0.49
0.28 -0.71 -0.82 0.27 -0.21 -0.9 0.61 -0.57 0.44 0.21 0.97 -0.27 0.74 0.2 -0.16 0.7 0.79 0.59 -0.33
-0.28 0.48 0.79 -0.14 0.8 0.28 0.75 0.26 0.3 -0.78 -0.72 0.94 -0.78 0.48 0.26 0.83 -0.88 -0.59 0.71
0.01 0.36 0.03 0.03 0.59 -0.5 0.4 -0.88 -0.53 0.95 0.15 0.31 0.06 0.37 0.66 -0.34 0.79 -0.12 0.49
-0.53 -0.8 -0.64 -0.93 -0.51 0.28 0.25 0.01 -0.94 0.96 0.25 -0.12 0.27 -0.72 -0.77 -0.31 0.44 0.58 -0.86
0.04 0.94 -0.92 -0.38 -0.07 0.98 0.1 0.19 -0.57 -0.69 -0.23 0.05 0.13 -0.28 0.98 -0.08 -0.3 -0.84 0.47
-0.88 -0.73 -0.4 0.58 0.24 0.08 -0.2 0.42 -0.61 -0.13 -0.47 -0.36 -0.37 0.95 -0.31 0.25 0.55 0.52 -0.66
-0.56 0.97 -0.93 0.91 0.36 -0.14 -0.9 0.65 0.41 -0.12 0.35 0.21 0.22 0.73 0.68 -0.65 -0.4 0.91 -0.64

• Are there interesting groups of variables or samples? Outliers? What are the dependencies between variables?

Unsupervised learning methods

• Many families of tasks/problems exist, among which:
Ø Clustering: try to find natural groups of samples/variables (a k-means sketch follows this list)
e.g., k-means, hierarchical clustering
Ø Dimensionality reduction: project the data from a high-dimensional space down to a small number of dimensions
e.g., principal/independent component analysis, MDS
Ø Density estimation: determine the distribution of data within the input space
e.g., Bayesian networks, mixture models
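A clustering sketch with k-means, assuming scikit-learn (the two groups are synthetic):

```python
# Sketch: k-means finds natural groups of samples without any labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two natural groups of objects in a 2-variable space
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # the group assigned to each object
print(km.cluster_centers_)  # the centre of each group
```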
Clustering

• Clustering rows: grouping objects that are similar across variables
• Clustering columns: grouping variables that are similar across samples
• Bi-clustering/two-way clustering: grouping objects that are similar across a subset of variables (a bi-cluster pairs a cluster of objects with a cluster of variables)

Principal Component Analysis

• An exploratory technique used to reduce the dimensionality of the data set to a smaller space (2D, 3D)

A1    A2    A3    A4    A5    A6    A7    A8    A9    A10       PC1   PC2
0.25  0.93  0.04  -0.78 -0.53 0.57  0.19  0.29  0.37  -0.22  →  0.36  0.1
-2.3  -1.2  -4.5  -0.51 -0.76 0.07  0.81  0.95  0.99  0.26   →  -2.3  -1.2
-0.29 -1    0.73  -0.33 0.52  0.13  0.13  0.53  -0.5  -0.48  →  0.27  -0.89
-0.16 -0.17 -0.26 0.32  -0.08 -0.38 -0.48 0.99  -0.95 0.34   →  -0.19 0.7
0.07  -0.87 0.39  0.5   -0.63 -0.53 0.79  0.88  0.74  -0.14  →  -0.77 -0.7
0.61  0.15  0.68  -0.94 0.5   0.06  -0.56 0.49  0     -0.77  →  -0.65 -0.99

• Transforms a large number of variables into a smaller number of uncorrelated variables called principal components (PCs)
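A PCA sketch, assuming scikit-learn: ten variables (A1...A10) are projected onto two uncorrelated principal components, as in the table above (the data here is random, for illustration):

```python
# Sketch: reduce 10 variables to 2 principal components (PC1, PC2).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))          # 6 objects, 10 variables (A1..A10)

pca = PCA(n_components=2)
pcs = pca.fit_transform(X)            # 6 objects, 2 components (PC1, PC2)
print(pcs.shape)                      # (6, 2)
print(pca.explained_variance_ratio_)  # variance captured by each PC
```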
Books
• Reference book for the course:
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie et al., Springer, 2001 (second edition in 2009)
Freely downloadable here: http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Other textbooks:
• Machine Learning. Tom Mitchell, McGraw Hill, 1997
• Pattern Classification (2nd edition). R. Duda, P. Hart, D. Stork, Wiley Interscience, 2000
• Pattern Recognition and Machine Learning (Information Science and Statistics). C.M. Bishop, Springer, 2004
• Introduction to Machine Learning. Ethem Alpaydin, MIT Press, 2004
• Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Peter Flach, Cambridge University Press, 2012
• Machine Learning: A Probabilistic Perspective. Kevin P. Murphy, MIT Press, 2012

Books
More advanced/specific topics:
• Kernel Methods for Pattern Analysis. J. Shawe-Taylor and N. Cristianini, Cambridge University Press, 2004
• Reinforcement Learning: An Introduction. R.S. Sutton and A.G. Barto, MIT Press, 1998
• Neuro-Dynamic Programming. D.P. Bertsekas and J.N. Tsitsiklis, Athena Scientific, 1996
• Semi-Supervised Learning. Chapelle et al., MIT Press, 2006
• Predicting Structured Data. G. Bakir et al., MIT Press, 2007
• Deep Learning. Goodfellow, Bengio, Courville, MIT Press, 2016
Generic ML toolboxes
• scikit-learn
scikit-learn.org
Open source machine learning toolbox (in Python)
• WEKA
http://www.cs.waikato.ac.nz/ml/weka/
Open source machine learning toolbox (in Java)
• Many R and Matlab packages
http://www.kyb.mpg.de/bs/people/spider/
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html


Journals

• Journal of Machine Learning Research
• Machine Learning
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Journal of Artificial Intelligence Research
• Neural Computation
• Annals of Statistics
• IEEE Transactions on Neural Networks
• Data Mining and Knowledge Discovery
• ...
Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Uncertainty in Artificial Intelligence (UAI)
• International Joint Conference on Artificial Intelligence (IJCAI)
• International Conference on Artificial Neural Networks
(ICANN)
• Computational Learning Theory (COLT)
• Knowledge Discovery and Data mining (KDD)
• ...
