• Full Professor
Dipartimento di Elettronica, Informazione e Bioingegneria
(DEIB), Politecnico di Milano, Italy
Email: manuel.roveri@polimi.it
Web: http://roveri.faculty.polimi.it
• Research interests: TinyML, IoT and edge computing, privacy-
preserving machine and deep learning
• Lecturer of «Computing Infrastructures» and «Hardware Architecture for Embedded and Edge AI»
• Associate Editor of IEEE Trans. on Artificial Intelligence, Neural Networks, IEEE Trans. on Emerging Topics in Computational Intelligence, and IEEE Trans. on Neural Networks and Learning Systems
• Chair of the IEEE CIS Technical Activities strategic planning
committee and IEEE CIS Neural Network Technical Committee
• Co-Founder of DHIRIA, a Spin-Off of Politecnico di Milano
2
Summary
• Introduction to AI projects
• AI models and prediction
• The basics of learning
• Neural Networks
• Evaluating AI models
• Technologies for AI
• AI and Cloud
• Hardware accelerators
• AI and IoT
• Challenges and opportunities
3
Artificial Intelligence
Machine Learning
Deep Learning
4
5
[Figure: nested fields (Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning) annotated with a timeline of methods: Bayesian inference, k-nearest neighbour, neural networks, backpropagation, SVM, deep learning]
5
6
[Figure: timeline of AI methods (Bayesian inference, k-nearest neighbour, neural networks, backpropagation, SVM, deep learning) set against early computing hardware (Univac)]
6
7
[Figure: evolution of computation and memory across hardware generations (Eniac, Univac, transistor, integrated circuits, microprocessors), set against the computational needs and memory requirements of AI methods (Bayesian inference, k-nearest neighbour, neural networks, backpropagation, SVM, deep learning)]
7
8
We live in the era of information abundance
Virtual Sensors
Real Sensors
8
The 4-layer AI model
Solutions
Capabilities
Methodologies
Technologies
9
The 4-layer AI model
Solutions
10
The 4-layer AI model
Capabilities
11
The 4-layer AI model
Methodologies
12
The 4-layer AI model
Technologies
13
The 4-layer AI model
Solutions
Capabilities
Methodologies
Technologies
14
From the 4-layer model to the design of AI projects
AI Engineer
15
From the 4-layer AI model
to the design of AI projects
Solutions
Capabilities AI Engineer
Methodologies
Technologies
16
Identify the class of solutions
Solutions
Capabilities AI Engineer
Methodologies
Technologies
17
Identify Capabilities and Methodologies
Solutions
Capabilities AI Engineer
Methodologies
Technologies
18
Identify Capabilities and Methodologies
Solutions
Capabilities AI Engineer
Methodologies
Technologies
19
Identify Capabilities and Methodologies
Solutions
Capabilities AI Engineer
Methodologies
Technologies
20
Identify the Technologies
Solutions
Capabilities AI Engineer
Methodologies
Technologies
21
Identify the Technologies
Solutions
Capabilities AI Engineer
Methodologies
Technologies
22
Design of an AI project
Solutions
Capabilities AI Engineer
Methodologies
Technologies
Model
23
The training set
Solutions
Capabilities AI Engineer
Methodologies
Technologies
Training
Data
24
The training set
Solutions
Capabilities AI Engineer
Methodologies
Technologies
Training
TRAINING
Data
25
What after the training?
Training
TRAINING
Data
26
What after the training? Validation
Training
TRAINING
Data
VALIDATION
Perturbed regression problem -> BIAS
[Figure: two plots of the dataset ZN, the target ȳ, and a fitted model f(θ, x) over x ∈ [−2, 2]; a too-simple model exhibits a systematic deviation (bias) from the data]
27
What after the training? Validation
Training
TRAINING
Data
VALIDATION
28
What after the training? Validation
Training
TRAINING
Data
VALIDATION
29
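The training/validation loop sketched above can be made concrete with a minimal, purely illustrative example (the data-generating function, noise level, and model degrees below are assumptions, not taken from the slides): fit two models on a training split and compare their errors on a held-out validation split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data-generating process y = g(x) + eta (illustrative choice of g)
x = rng.uniform(-2, 2, 200)
y = np.sin(1.5 * x) + rng.normal(0, 0.1, x.size)

# Split ZN into disjoint training and validation sets
idx = rng.permutation(x.size)
tr, va = idx[:150], idx[150:]

def fit_and_validate(degree):
    """Train a polynomial model on the training split, score it on validation."""
    theta = np.polyfit(x[tr], y[tr], degree)
    residuals = y[va] - np.polyval(theta, x[va])
    return float(np.mean(residuals ** 2))  # validation MSE

mse_biased = fit_and_validate(1)  # too-simple model -> large bias
mse_good = fit_and_validate(5)    # richer model tracks g(x) more closely
```

The validation error of the over-simple model stays large regardless of how much training data is provided, which is exactly the bias effect the perturbed-regression figure illustrates.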
Ready to go!
Prediction
Input Data
30
Ready to go … but never-ending learning
Prediction
Input Data
Training Data
31
Ready to go … but never-ending learning
Prediction
Input Data
TRAINING/
ADAPTATION
Training Data
32
Ready to go … but never-ending learning
Prediction
Input Data
TRAINING/
ADAPTATION
Training Data
Solutions
Capabilities
Methodologies
Technologies
AI Engineer
33
34
Models and prediction
Application
Reference
concept
Detection
trigger
35
Models and prediction
Application
Reference
concept
Detection
trigger
Lecture 5
36
Models and prediction: Regression
Regression
37
Models and prediction: Regression
Regression
[Plot: Interest (y) vs. Age (x): scattered data points]
38
Models and prediction: Regression
Regression
[Plot: Interest vs. Age: the training couples collected in ZN]
39
Models and prediction: Regression
Regression
[Plot: Interest vs. Age: model f(θ, x) fitted to the data]
40
Models and prediction: Regression
Regression
[Plot: Interest vs. Age: fitted model f(θ, x) used to predict the output ȳ for a new input]
41
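As a sketch of the regression task on the slides (predicting Interest from Age), the snippet below fits the linear model family f(θ, x) = θ1·x + θ0 by least squares; the (Age, Interest) values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical (Age, Interest) couples, mimicking the slides' scatter plot
age = np.array([20, 25, 30, 35, 40, 45, 50, 55, 60], dtype=float)
interest = 0.8 * age + 5.0  # noiseless illustrative relation

# Fit f(theta, x) = theta1 * x + theta0 by least squares
theta1, theta0 = np.polyfit(age, interest, 1)

# Predict the output for a previously unseen input
y_new = theta1 * 33 + theta0
```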
Models and prediction: Classification
Classification
42
Models and prediction: Classification
Classification
Income
Age
43
Models and prediction: Classification
Classification
Income
Age
44
Models and prediction: Classification
Classification
Income
Age
45
Models and prediction: Classification
Classification
Income
46
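The classification task above (two classes in the Income vs. Age plane) can be sketched with a nearest-centroid classifier; this is one simple choice among many, and the labeled points below are hypothetical.

```python
import numpy as np

# Hypothetical labeled points in the (Age, Income) plane, as in the slides
X = np.array([[25, 20], [30, 25], [28, 22],        # class 0
              [50, 80], [55, 90], [60, 85]], float)  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Nearest-centroid classifier: one centroid per class
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(point):
    """Assign the class of the nearest centroid (Euclidean distance)."""
    d = np.linalg.norm(centroids - np.asarray(point, float), axis=1)
    return int(np.argmin(d))
```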
Models and prediction: Clustering
Clustering
47
Models and prediction: Clustering
Clustering
48
Models and prediction: Clustering
Clustering
[Plot: clusters with an OUTLIER highlighted]
49
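Clustering works without labels. A minimal k-means sketch (synthetic two-group data, k = 2; all numbers are illustrative assumptions) shows the alternation between assigning points to the nearest center and updating the centers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups of unlabeled points (unsupervised setting)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(4, 0.3, (20, 2))])

# Minimal k-means (k = 2): alternate assignment and centroid update
centers = X[[0, -1]].copy()  # seed one center in each group
for _ in range(10):
    # Assign each point to its nearest center
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    # Move each center to the mean of its assigned points
    centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
```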
Models and prediction: Prediction
Prediction
50
Models and prediction: Prediction
Prediction
Prediction Model
51
Models and prediction: Prediction
Prediction
Prediction Model
52
Models and prediction: Prediction
Prediction
Prediction Model
53
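A prediction model forecasts the next value of a sequence from its past. The sketch below (hypothetical AR(1) data, coefficient estimated by least squares) is one simple instance of such a model:

```python
import numpy as np

# Hypothetical time series following y(k) = 0.9 * y(k-1) + small noise
rng = np.random.default_rng(2)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = 0.9 * y[k - 1] + rng.normal(0, 0.05)

# Estimate the AR(1) coefficient by least squares and predict one step ahead
a = float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))
y_next = a * y[-1]
```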
Models and prediction: Change Detection
Change Detection
54
Models and prediction: Change Detection
Change Detection
55
Models and prediction: Change Detection
Change Detection
Change Detection
56
Models and prediction: Change Detection
Change Detection
Change Detection
Application
Alarm
Analytics
57
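Change detection monitors a data stream and raises an alarm when its statistical behaviour drifts. One classic trigger is the CUSUM test, sketched below on a synthetic stream whose mean shifts at a known point (drift and threshold values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data stream whose mean shifts from 0 to 2 at sample 100 (the "change")
stream = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])

def cusum(xs, drift=0.5, threshold=8.0):
    """One-sided CUSUM: accumulate positive deviations, alarm at threshold."""
    s = 0.0
    for k, v in enumerate(xs):
        s = max(0.0, s + v - drift)
        if s > threshold:
            return k  # index at which the change is detected
    return None

alarm = cusum(stream)
```

In the slides' architecture, this alarm is what triggers the adaptation phase (retraining on post-change data).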
Models and prediction: Adaptation
Adaptation
Application
Reference
concept
Detection
trigger
58
Models and prediction: Adaptation
Adaptation
Application
Reference
concept
Detection
trigger
59
The basics of Learning
60
A "toy" example
Pass
Pass
NO Pass
Pass
61
Data-processing and applications
P
Data generating
process
Application
Model
of the
system
62
Learning the system model

x(k + 1) = f(x(k), u(k)) + η(k)    (46)
y(k) = h(x(k), u(k)) + d(k)        (47)

where x ∈ R^n is the state vector, y ∈ R^l is the output vector, and u ∈ R^m is the input vector, which may consist of some controlled inputs as well as some uncontrolled inputs which, however, can be measured; η is the i.i.d. random variable describing the uncertainty affecting the state vector; d is an independent and identically distributed (i.i.d.) random variable describing the noise affecting the output vector. The functions f and h are, in general, non-linear and, unlike the models presented in (26)-(27), they are assumed unknown.

The state equation (46) models the evolution of the state variables over time with respect to the inputs and states, while the output equation (47) models the relationship among the output, the state and the input variables. The discrete-time model presented above is quite general and allows the modeling of a wide range of applications; it can be specialized to the cases where P can be described within a regression framework, where the output variables coincide with the state variables (input-output description), and the general case where the process requires a state-space representation.

Regression models. When P does not have internal states (i.e., the system has no dynamics), the output variables depend only on the input variables at the same time and, hence, (47) can be rewritten as

y(k) = h(u(k)) + d(k)    (48)

If the relationship between y(k) and u(k) is linear, the system model simplifies to

y(k) = D u(k) + d(k)    (49)

where D is an l × m matrix.

Input-output models. When the dependency on past outputs and inputs is characterized by a finite time lag, the system can be described as

y(k) = h(y(k − 1), y(k − 2), ..., y(k − k_y), u(k), u(k − 1), ..., u(k − k_u)) + d(k)    (50)

In the linear case and for the MIMO scenario, the system model assumes the general canonical form [31]:

A(z) y(k) = Σ_{i=1..m} (B_i(z) / F_i(z)) u_i(k) + (C(z) / D(z)) d(k)    (51)

where z is the time-shift operator, A(z), B_i(z), C(z), D(z) and F_i(z) represent the z-transform functions, and u_i is the i-th input. From the canonical form we can specify the linear input-output models widely used in system identification, e.g., the AR, ARX and OE models. If a priori information about the nature of the system is available, it can be exploited to build an effective model: once the system has been identified with a suitable model, the bias component of the residual error vanishes and the residual satisfies the i.i.d. hypothesis, which is useful for a subsequent statistical change detection phase.

When the system can be expressed as a linear autoregressive (AR) model, Eq. (51) simplifies to a linear relationship between the output variable y(k) at time k and its previous values; for instance, in the case of a scalar output of order k_y, the system can be expressed as

A(z) y(k) = d(k)    (52)
63
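A minimal numerical illustration of a state-space model of the form (46)-(47): the sketch below simulates a hypothetical scalar linear system (the coefficients and the constant input are illustrative assumptions, not taken from the slides) and lets the output settle to its steady state.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative scalar instance of (46)-(47):
#   x(k+1) = 0.8*x(k) + u(k) + eta(k),   y(k) = 0.5*x(k) + d(k)
def simulate(n_steps, noise=0.0):
    x, ys = 0.0, []
    for _ in range(n_steps):
        u = 1.0  # constant controlled input (assumption)
        ys.append(0.5 * x + noise * rng.normal())  # output equation (47)
        x = 0.8 * x + u + noise * rng.normal()     # state equation (46)
    return np.array(ys)

# Noiseless run: the state converges to u/(1-0.8) = 5, so y -> 0.5*5 = 2.5
y_clean = simulate(50)
```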
Supervised Learning: Statistical framework
Regression Classification
64
y = g(x) + η    (3.7)
65
Designing a classifier
Feature 2
Feature 1
66
Some issues we need to focus on
67
Non linear regression
[Plot: noisy observations (xi, yi) of a non-linear function of x]
68
Non-linear regression: statistical framework

Let ZN = {(x1, y1), ..., (xN, yN)} be the set composed of N (input-output) couples. The goal of machine learning is to build the simplest approximating model able to explain past data ZN and future instances that will be provided by the data generating process.

Consider the situation where the process generating the data (system model) is ruled by

y = g(x) + η    (3.7)

where η is a noise term modeling the existing uncertainty affecting the unknown non-linear function g(x), if any. Once the generic data instance xi is available, (3.7) provides value yi = g(xi) + ηi, ηi being a realization of the random variable η. In practical cases, the system for which we aim to create a model, by receiving input xi, provides value yi. We comment that both inputs and outputs are quantities measurable through sensors. The ultimate goal of learning is to build an approximation of g(x) based on the information present in dataset ZN through the model family

f(θ, x)    (3.8)

parameterized in the parameter vector θ ∈ Θ ⊂ R^p. Selection of a suitable family of models f(θ, x) can be driven by some a priori information available about the system model. If data are likely to be generated by a linear model -or a linear model suffices- then this type of model should be considered. In this case, learning relies on the vast results provided by system identification theory, e.g., see [130].

The goal of learning

Which ("wrong") model should be used to describe the data? Which is the relationship between the optimal parameter configuration, constrained by the selected model family, and the one configured on a limited data set? Since the estimated parameter vector is a realization of a random variable centered on the optimal one, the model we obtain from the available data can be seen as a perturbed model induced by perturbations affecting the parameter vector. Which are then the effects of this perturbation on the performance of the model? This section aims at addressing the above aspects.
69

The outcome of the learning procedure is the parameter configuration θ̂ and, hence, the model f(θ̂, x), whose quality/accuracy must be assessed. If the accuracy performance is not met, and margin for improvement exists, we have to select a new model family and reiterate the learning process; for instance, the difference -residual- between the reconstructed value f(θ̂, x) and the measured one can be inspected. Denoting by ĝ(x) = f(θo, x) the best model reachable within the family, the structural risk V̄(θ̂) associated with model f(θ̂, x), i.e., the performance of the obtained model, can be decomposed into inherent, approximation and estimation risks.
70

Inherent, approximation and estimation risks
the obtained risksas
model,
Optimal Model
Selected Model
Best Reachable Model
Model Space
Target Space
71
72
Approximation and estimation risks
Optimal Model
Approx.
Error
Selected Model
Best Reachable Model
Estimation
Error
Model Space
Target Space
72
What about Neural Networks?
73
Modelling space and time
74
Neural computation
75
Neural computation
Activation function
• Heaviside
• Sigmoidal
• Linear
76
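The neural computation above (weighted sum of the inputs plus a bias, passed through an activation function) can be sketched directly; the weights and inputs below are arbitrary illustrative values.

```python
import numpy as np

# A single neuron: weighted sum of inputs plus bias, then an activation
def neuron(x, w, b, activation):
    z = np.dot(w, x) + b
    if activation == "heaviside":
        return 1.0 if z >= 0 else 0.0
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-z))
    return float(z)  # linear activation

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])  # here z = 0.5*1 - 0.25*2 + 0 = 0
out_h = neuron(x, w, 0.0, "heaviside")  # 1.0 (z >= 0)
out_s = neuron(x, w, 0.0, "sigmoid")    # 0.5 (sigmoid at z = 0)
out_l = neuron(x, w, 0.0, "linear")     # 0.0
```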
Multi-layer Neural Networks
78
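A multi-layer network stacks such neurons: each layer applies a weighted sum and an activation to the previous layer's outputs. A forward pass through a tiny hypothetical network (all weights are illustrative) looks like this:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer feed-forward network: sigmoid hidden layer, linear output."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden-layer activations
    return W2 @ h + b2                         # linear output layer

# Tiny hypothetical network: 2 inputs -> 3 hidden units -> 1 output
W1 = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0, 1.0, 1.0]])
b2 = np.zeros(1)

# At x = 0 every hidden pre-activation is 0, each sigmoid gives 0.5,
# so the output is 0.5 + 0.5 + 0.5 = 1.5
out = mlp_forward(np.array([0.0, 0.0]), W1, b1, W2, b2)
```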
Universal approximation theorem
79
80
Approximation and estimation risks
Optimal Model
Approx.
Error
Selected Model
Best Reachable Model
Estimation
Error
Model Space
Target Space
80
81
Approximation and estimation risks
Optimal Model
Selected Model
Estimation
Error
NN Model Space
Approx. Error = 0
Target Space
81
Quality assessment of the solution
«How good is your ‘good’?»
82
Two examples: how good is my good solution?

Example 1 (rows: output class, columns: target class; 150 samples):
      1            2            3            precision
1     50 (33.3%)   0 (0.0%)     0 (0.0%)     100%  (error 0.0%)
2     0 (0.0%)     47 (31.3%)   0 (0.0%)     100%  (error 0.0%)
3     0 (0.0%)     3 (2.0%)     50 (33.3%)   94.3% (error 5.7%)

Example 2 (75 samples):
      1            2            3            precision
1     23 (30.7%)   1 (1.3%)     0 (0.0%)     95.8% (error 4.2%)
2     0 (0.0%)     25 (33.3%)   2 (2.7%)     92.6% (error 7.4%)
3     0 (0.0%)     1 (1.3%)     23 (30.7%)   95.8% (error 4.2%)
83
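From a confusion matrix, summary metrics follow directly. Using the counts shown in the slide's two examples, the overall accuracy is the diagonal (correct predictions) over the total:

```python
import numpy as np

# The two confusion matrices from the slide (rows: output class, cols: target)
cm1 = np.array([[50, 0, 0], [0, 47, 0], [0, 3, 50]])
cm2 = np.array([[23, 1, 0], [0, 25, 2], [0, 1, 23]])

def accuracy(cm):
    """Overall accuracy: correct predictions (diagonal) over all samples."""
    return np.trace(cm) / cm.sum()

acc1 = accuracy(cm1)  # 147/150 = 0.98
acc2 = accuracy(cm2)  # 71/75
```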
84
Assessing the performance
§ Apparent Error Rate (AER), or resubstitution: the whole set ZN is used both to infer the model and to estimate its error.
§ Sample Partitioning (SP): SD and SE are obtained by
randomly splitting ZN in two disjoint subsets. SD is used to
estimate the model and SE to estimate its accuracy.
§ Leave-One-Out (LOO): SE contains one pattern in ZN, and SD contains the remaining n − 1 patterns. The procedure is iterated n times by holding out each pattern in ZN, and the resulting n estimates are averaged.
84
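The LOO procedure can be sketched end-to-end; to keep it tiny, the "model" below is just the sample mean of the training patterns (a deliberately trivial illustrative choice):

```python
import numpy as np

# Hypothetical 1-D dataset ZN and a trivial model family (fit a constant)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def loo_error(data):
    """Leave-One-Out: hold out each pattern, fit on the rest, average the errors."""
    errs = []
    for i in range(data.size):
        train = np.delete(data, i)
        model = train.mean()          # "training" = estimating the mean
        errs.append((data[i] - model) ** 2)
    return float(np.mean(errs))

err = loo_error(x)  # averages the 5 held-out squared errors
```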
85
Assessing the performance (2)
§ w-fold Cross-validation (wCV): ZN is randomly split into w disjoint subsets of equal size. For each subset, the remaining w − 1 subsets are merged to form SD and the reserved subset is used as SE. The resulting w estimates are averaged. This procedure can be iterated and the results averaged when w ≪ n in order to reduce the random resampling variance. This estimate is a generalization of LOO.
85
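The same trivial constant model makes the w-fold scheme concrete (synthetic data and w = 5 are illustrative assumptions): split into w folds, train on w − 1, test on the held-out fold, average.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(10.0, 1.0, 40)  # hypothetical dataset ZN

def wfold_cv(xs, w=5):
    """w-fold CV: w disjoint folds; train on w-1 folds, test on the held-out one."""
    folds = np.array_split(rng.permutation(xs), w)
    errs = []
    for i in range(w):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(w) if j != i])
        model = train.mean()  # again the trivial constant model
        errs.append(np.mean((test - model) ** 2))
    return float(np.mean(errs))

cv_err = wfold_cv(data)  # close to the data variance for this trivial model
```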
What about the technology for AI?
86
"Artificial Intelligence deals with the development of hardware-and-software systems endowed with human-like capabilities, able to autonomously pursue a given goal and to make decisions that, until that moment, were usually assigned to humans"
87
«… hardware-and-software systems endowed with human-like capabilities …»
AI SOFTWARE
AI HARDWARE
88
«… hardware-and-software systems endowed with human-like capabilities …»
AI SOFTWARE
(Application)
AI SOFTWARE
(Framework/Platform/Tool)
AI HARDWARE
89
An IT perspective for the AI
AI SOFTWARE SOFTWARE
(Application) (Application)
AI SOFTWARE SOFTWARE
(Framework/Platform/Tool) (Environment)
AI HARDWARE HARDWARE
90
An IT perspective for the AI
91
An IT perspective for the AI
92
AI and Technology
50x – 100x
Computing
Systems
Deep Blue defeated
Kasparov (1996)
0.1x – 0.5x
PCs
Intelligent Object Embedded
Devices
Internet of
0.01x–0.05x
Things
93
AI Hardware
AI HARDWARE
Edge
Computing
Systems
PC
Embedded
PCs
Embedded
Devices
Internet of
Things
94
AI Hardware: from the datacenter to ML-as-a-service
Machine-Learning
Deep-Learning
AI APPLICATION as-a-Service
AI PLATFORM
ML and DL Solutions
95
Why Machine Learning in the Cloud?
Solutions
Cloud Platforms
97
Machine Learning Infrastructure as a Service
Design ease
Flexibility
98
Machine Learning Platform as a Service
Design ease
Amazon SageMaker, Azure ML Service, Google Cloud ML Engine, IBM Watson
Flexibility
Provide pre-configured environments used by AI experts to train, tune and host models
99
ML software as a service
Amazon AI Services, Google Cloud AI, Microsoft
Cognitive Services and IBM Watson
Design ease
Ready-to-use Solutions
Pre-defined actions/activities
Cloud Platforms
Flexibility
100
AI and off-the-shelf technological solutions
Design ease
User
Ready-to-use Solutions
Pre-defined actions/activities
Flexibility
Developer
101
102
A real world example: image classification
102
103
A real world example: image classification
Solutions
Capabilities AI Engineer
Methodologies
Technologies
103
104
A real world example: image classification on AWS
Solutions
Capabilities AI Engineer
Methodologies
• ML Solutions as a service: «Rekognition»
• ML Platforms as a service: «SageMaker»
• ML Infrastructures as a service: «EC2»
Training
TRAINING
Data
104
The evolution of computation and memory
[Timeline: growth of computation and memory (Univac era onward) set against the seasons of AI: pioneering works in the field of AI, statistical methods, backprop, SVM, neural networks and deep learning, with their computational needs and memory requirements]
105
Deep Learning: brain-inspired architecture
106
Deep representation of knowledge
107
108
Increasing the complexity of deep learning models
Complexity doubles
every 3.5 months
108
109
Increasing the complexity of deep learning models
Complexity doubles
every 3.5 months
109
Generations and seasons:
the evolution of computation and memory
[Timeline: AI seasons (pioneering works in the field of AI, statistical methods, backprop) set against the growth of computation and memory (Univac era onward)]
110
Generations and seasons:
the evolution of computation and memory
[Timeline: AI seasons (pioneering works in the field of AI, statistical methods, backprop) set against the growth of computation and memory (Univac era onward)]
111
Enabling accelerator operations
Edge
Computing
Systems
PC
Embedded
PCs
Embedded
Devices
Internet of
Things
112
113
Graphical Processing Units (GPU)
• Data-parallel computations: the same program is executed on many data elements in parallel
• Scientific code must be mapped onto matrix operations
• High-level languages (such as CUDA and OpenCL) target the GPU directly
• Up to 1000x faster than a CPU on data-parallel workloads
113
114
Tensor Processing Unit (TPU)
• Custom-built integrated circuit developed specifically for machine learning and tailored for TensorFlow
• Powering Google data centers since 2015, alongside CPUs and GPUs
• A tensor is an n-dimensional array: the basic unit of operation in TensorFlow
114
115
Field-Programmable Gate Array (FPGA)
• Array of logic gates that can be programmed (“configured”) in the field, i.e., by the user of the device as
opposed to the people who designed it
• Array of carefully designed and interconnected digital subcircuits that efficiently implement common
functions offering very high levels of flexibility. The digital subcircuits are called configurable logic
blocks (CLBs)
115
116
CPU, GPU, TPU and FPGA: an AI comparison
Advantages and disadvantages per platform:

CPU
+ Easy to program and supports any programming framework
+ Fast design-space exploration and application deployment
− Most suited for simple models that do not take long to train and for small models with small training sets

GPU
+ Ideal for applications in which data need to be processed in parallel, like the pixels of images or videos
− Programmed in languages like CUDA and OpenCL, and therefore provides limited flexibility compared to CPUs

TPU
+ Very fast at performing dense vector and matrix computations; specialized in running TensorFlow programs very fast
− Limited to applications and models based on TensorFlow
− Lower flexibility compared to CPUs and GPUs

FPGA
+ Higher performance, lower cost and lower power consumption compared to other options like CPUs and GPUs
− Programmed using OpenCL and High-Level Synthesis (HLS)
− Limited flexibility compared to other platforms
116
Machine-Learning-as-a-service: pros and cons
117
From Cloud to IoT and Edge
Computing: new platforms for ML/DL
118
IoT, PC Embedded and Edge Computing
AI APPLICATION
AI HARDWARE
Internet-of- PC Edge
Things Embedded Computing
119
IoT, PC Embedded and Edge Computing
AI APPLICATION
AI HARDWARE
Internet-of- PC Edge
Things Embedded Computing
120
IoT, PC Embedded and Edge Computing
ü Increase autonomy
ü Reduce decision-making latency
ü Reduce transmission bandwidth
AI APPLICATION ü Increase energy-efficiency
ü Security and Privacy
AI PLATFORM: MACHINE AND DEEP LEARNING PLATFORMS FOR IoT AND EDGE
AI HARDWARE
Internet-of- PC Edge
Things Embedded Computing
121
122
Intelligent Internet-of-Things and Cyber-Physical Systems
[Diagram: Intelligent IoT and Cyber-Physical Systems. The cyber domain and the physical domain are joined by intelligent cognitive processing of sensing data and intelligent mechanisms for actuation and control, supporting adaptation, self-awareness, reliability, self-diagnosis and self-healing]
122
123
Intelligent Internet-of-Things
[Diagram: Intelligent Internet-of-Things. Self-awareness mechanisms to diagnose faults and self-healing mechanisms to repair faults (which mechanisms? which HW?). At the application/service layer, intelligent mechanisms detect changes in the users' behaviour; the HW/sensors layer offers remote controllability; the environment layer provides energy harvesting and smart energy management]
123
124
[Diagrams: IoT units, each combining sensors and communication (Comm) modules]
125
The last mile …
126
FINDING ONE FACE IN A MILLION
A new benchmark test shows that even Google's facial recognition algorithm is far from perfect

Helen of Troy may have had the face that launched a thousand ships, but even the best facial recognition algorithms might have had trouble finding her in a crowd of a million strangers. The first public benchmark test based on 1 million faces has shown how facial recognition algorithms from Google and other research groups around the world still fall well short of perfection.

Facial recognition algorithms that had previously performed with more than 95 percent accuracy on a popular benchmark test involving 13,000 faces saw significant drops in accuracy when taking on the new MegaFace Challenge. "MegaFace's key idea is that algorithms should be evaluated at large scale," says Ira Kemelmacher-Shlizerman, an assistant professor of computer science at the University of Washington, in Seattle, and the project's principal investigator. "And we make a number of discoveries that are only possible when evaluating at scale."

The huge drops in accuracy when scanning a million faces matter because facial recognition algorithms inevitably face such challenges in the real world. People increasingly trust these algorithms to correctly identify them in security verification scenarios, and law enforcement may also rely on facial recognition to pick suspects out of the hundreds of thousands of faces captured on surveillance cameras.

The most popular benchmark until now has been the Labeled Faces in the Wild (LFW) test created in 2007. LFW is saturated—that is, there are many, many algorithms that perform above 95 percent on LFW, Kemelmacher-Shlizerman says. "This gives the impression that face recognition is solved and working perfectly."

With that in mind, University of Washington researchers raised the bar by creating the MegaFace Challenge using 1 million Flickr images of 690,000 unique faces that are publicly available under a Creative Commons license. The MegaFace Challenge forces facial recognition algorithms to do verification and identification, two separate but related tasks. Verification involves trying to correctly determine whether two faces presented to the facial recognition algorithm belong to the same person. Identification involves trying to find a matching photo of the same person among a million "distractor" faces. Initial results on algorithms developed by Google and four other research groups were presented at the IEEE Conference on Computer Vision and Pattern Recognition on 30 June. (One of MegaFace's developers also heads a computer vision team at Google's Seattle office.)

The results presented were a mix of the intriguing and the expected. Nobody was surprised that the algorithms' performances suffered as the number of distractor faces increased. And the fact that algorithms had trouble identifying the same person at different ages was a known problem. However, the results also showed that algorithms trained on relatively small data sets can compete with those trained on very large ones, such as Google's FaceNet, which was trained on more than 500 million photos of 10 million people. For example, the FaceN algorithm also performed well on certain tasks. Nevertheless, FaceNet has so far performed the best overall. It delivered the most consistent performance across all testing.
130
131
Thank you for your attention!
132