Carlos-Automl School PDF
Carlos Soares
csoares@fe.up.pt
• ... government/science
– e.g. tax fraud detection, catastrophe management and
environmental monitoring
http://www.gta.ufrj.br/ensino/eel879/trabalhos_vf_2010_2/lemos/introducao.html
– induce model from training data
   1    0.7    327.2   0   A
   2   -0.6   1234.2   1   B
   3    ...     ...   ...  ...
can’t resist it: his slides look so much better than mine!
Lo, Kao, Ho, Lee, Chui, Cheung, "OLAP on sequence data", SIGMOD, 2008
Zaki. Editorial: Online, Interactive and Anytime Data Mining. SIGKDD Explorations.
2002;3(2)
PlanLearn workshop @ ECAI 2012 (http://datamining.liacs.nl/planlearn.html)
carlos soares @ automated ml @ ICMC 2018
metadata: ... often distributed
• use “local” data
– more relevant
– … but possibly not sufficient for a reliable model
• or higher-level data
– more data
– … but possibly less relevant
• use domain information
Nozari and Soares, Meta-Learning to Choose the Level of Analysis in Nested Data:
A Case Study on Error Detection in Foreign Trade Statistics, Proc. of the
International Joint Conference on Neural Networks (IJCNN), 2015
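To make the local vs. higher-level trade-off concrete, here is a minimal sketch that, for one group of a nested dataset, compares the holdout error of a model fitted only on that group's data with one fitted on the data of all groups. This is not the meta-learning approach of Nozari and Soares; it is a hypothetical baseline that assumes a trivial mean predictor, squared error and a dict-of-lists data format. The name `choose_level` is illustrative.

```python
from statistics import mean


def mse(pred, ys):
    """Mean squared error of a constant prediction `pred` on targets `ys`."""
    return mean((y - pred) ** 2 for y in ys)


def choose_level(groups, target_group, n_test=2):
    """Return 'local' or 'higher' for `target_group` (hypothetical helper).

    groups: dict mapping group name -> list of target values.
    The last `n_test` values of `target_group` are held out for testing.
    """
    values = groups[target_group]
    train_local, test = values[:-n_test], values[-n_test:]
    # Higher-level data: the local training part plus every other group.
    train_all = train_local + [
        v for g, vs in groups.items() if g != target_group for v in vs
    ]
    # Fit the (trivial) mean predictor at each level and compare holdout errors.
    err_local = mse(mean(train_local), test)
    err_higher = mse(mean(train_all), test)
    return "local" if err_local <= err_higher else "higher"
```

A group whose own data is consistent but far from the rest tends to get "local"; a group with few, noisy values tends to get "higher", reflecting "more relevant" vs. "more data".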
– … and models
Gama, Kosina. Learning about the learning process. In: Proc. of the 10th
Int. Conf. on Advances in Intelligent Data Analysis. Springer-Verlag;
2011:162-172.
– evolving models
Gama, Knowledge Discovery from Data Streams (2010), Chapman & Hall/CRC
Press
   1    0.7    327.2   0   5   -1
   2   -0.6   1234.2   1   4    1
   3    ...     ...   ...  ...  ...
– characterization: 2399, 49, 1, 0.65, …
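Characterization maps a whole dataset to a fixed-length vector of meta-features, like the one above. A minimal sketch, assuming the dataset is a list of rows with the target in the last column. The four meta-features chosen here (number of examples, number of attributes, number of distinct target values, proportion of the majority target value) are illustrative and are not claimed to be the ones behind the vector on the slide.

```python
from collections import Counter


def characterize(rows):
    """Map a dataset (list of rows, target in last column) to a meta-feature vector."""
    n_examples = len(rows)
    n_attributes = len(rows[0]) - 1          # every column except the target
    targets = [row[-1] for row in rows]
    counts = Counter(targets)
    n_classes = len(counts)                  # distinct target values
    majority = counts.most_common(1)[0][1] / n_examples
    return [n_examples, n_attributes, n_classes, majority]
```

Each dataset becomes one meta-example; stacking these vectors for many datasets gives the meta-data.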
– e.g. max. correlation, computed over the examples
   i   x_{i,1}   …   decision (Target)
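A correlation-based meta-feature is computed over the examples, one value per pair of numeric attributes; the pairwise values can be kept as they are (non-aggregated) or summarized, e.g. by the maximum or the mean of their absolute values. A minimal sketch, assuming attributes are given column-wise as lists of numbers with no constant column (Pearson correlation is undefined for those); the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_metafeatures(columns):
    """Pairwise attribute correlations, raw and aggregated."""
    pairwise = [pearson(a, b) for a, b in combinations(columns, 2)]
    abs_values = [abs(r) for r in pairwise]
    return {
        "non_aggregated": pairwise,                        # one value per pair
        "max_abs": max(abs_values),                        # max. correlation
        "abs_mean": sum(abs_values) / len(abs_values),     # absolute mean
    }
```

The non-aggregated version keeps more information but its length depends on the number of attributes, which is why aggregated versions are usually needed to compare datasets of different sizes.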
– Absolute mean correlation between numeric features: Attr; Examples, Col; Correlation; Descriptive Statistic
– Correlation between numeric features: Attr; Examples, Col; Correlation; Non-aggregated
– Average degree of dataset characterization graph: Examples, Set; Degree; Descriptive Statistic
– Jensen-Shannon distance between dataset and bootstrap: Examples, Set; Bootstrap, Set; Jensen-Shannon distance; Descriptive Statistic
– Decision stump landmarker: Predictions, Target; Examples, Col; Accuracy; Non-aggregated
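Two of the meta-features listed above can be sketched in a few lines: the Jensen-Shannon distance between a dataset and a bootstrap sample of it, and a decision stump landmarker. This is a minimal sketch, not the implementation used in the original work. It assumes numeric attributes stored column-wise, compares per-attribute histograms with a hypothetical default of 10 equal-width bins, averages the distances over attributes, and measures the stump's accuracy on the training data rather than by cross-validation. All function names are illustrative.

```python
import random
from collections import Counter
from math import log2, sqrt


def histogram(values, lo, hi, bins):
    """Relative frequencies of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0          # guard against a constant attribute
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability terms add 0
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))  # clamp rounding noise


def bootstrap_js_metafeature(columns, bins=10, seed=0):
    """Mean over attributes of the JS distance between dataset and one bootstrap."""
    rng = random.Random(seed)
    n = len(columns[0])
    idx = [rng.randrange(n) for _ in range(n)]   # resample examples with replacement
    distances = []
    for col in columns:
        lo, hi = min(col), max(col)
        boot = [col[i] for i in idx]
        distances.append(js_distance(histogram(col, lo, hi, bins),
                                     histogram(boot, lo, hi, bins)))
    return sum(distances) / len(distances)


def stump_accuracy(columns, target):
    """Training accuracy of the best single-attribute, single-threshold split."""
    n = len(target)
    best = 0.0
    for col in columns:
        for t in sorted(set(col)):
            left = [y for x, y in zip(col, target) if x <= t]
            right = [y for x, y in zip(col, target) if x > t]
            # each side predicts its majority class
            correct = sum(Counter(s).most_common(1)[0][1] for s in (left, right) if s)
            best = max(best, correct / n)
    return best
```

The JS-based value is aggregated (a mean over attributes), while the landmarker returns a single accuracy per dataset, in line with the Descriptive Statistic / Non-aggregated distinction above.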
[Figure: critical difference (CD) diagrams of average ranks (1 to 5) for C5.0, SVM and RF, two panels each, comparing Default, Traditional Set, Systematic Set, Syst.ReliefF and Syst.CFS]
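CD diagrams plot the average rank of each method over a collection of datasets; methods whose average ranks differ by less than the critical difference are not significantly different. A minimal sketch, assuming the Nemenyi post-hoc test, which is the usual basis for CD diagrams but is not named on the slides, with CD = q_α · sqrt(k(k+1) / (6N)) for k methods and N datasets. The default q_α = 2.728 is the α = 0.05 value for k = 5, matching the five methods per panel; the function names are illustrative.

```python
from math import sqrt


def average_ranks(scores, higher_is_better=True):
    """Average rank per method over datasets (rank 1 = best; ties share the mean rank).

    scores: one row per dataset, one score per method.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=higher_is_better)
        ranks = [0.0] * k
        pos = 0
        while pos < k:
            end = pos
            # extend the group over methods tied with the one at `pos`
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1      # mean of ranks pos+1 .. end+1
            for j in order[pos:end + 1]:
                ranks[j] = shared
            pos = end + 1
        totals = [t + r for t, r in zip(totals, ranks)]
    return [t / len(scores) for t in totals]


def nemenyi_cd(k, n_datasets, q_alpha=2.728):
    """Critical difference of the Nemenyi test (q_alpha = 2.728: alpha 0.05, k = 5)."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n_datasets))
```

For example, with 5 methods compared on 30 datasets the CD is about 1.11 ranks, so two methods need average ranks more than 1.11 apart to be drawn as significantly different.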
environments
• MetaStream: a metalearning approach … characteristics to base-…
[Figure: NMSE SVM − NMSE Random Forests; y-axis ticks from −0.6 to 0.4]
[Figure: MetaStream diagram with training and test phases: base-data (x, y_b), feature extractor, meta-examples (x, y_m) forming the meta-data, learning algorithm, meta-model, predicted best algorithm]
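In MetaStream the meta-level task is to predict, for upcoming data, which base-learner will perform best. A minimal sketch of how one meta-example (x, y_m) could be built from a stream window. It assumes a regression stream, the two base-learners of the plot (SVM and Random Forests) with their predictions on the evaluation window already available, and three illustrative window statistics as meta-features. The function names and the feature choice are hypothetical, not MetaStream's actual characterization; the label follows the sign of NMSE SVM − NMSE Random Forests.

```python
from statistics import mean, pstdev


def nmse(y_true, y_pred):
    """Normalized MSE: squared error over squared deviation from the mean of y_true
    (equivalently, MSE divided by the population variance of y_true)."""
    ybar = mean(y_true)
    num = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    den = sum((y - ybar) ** 2 for y in y_true)
    return num / den


def meta_example(train_window, y_eval, preds_svm, preds_rf):
    """One meta-example: x characterizes the training window, y_m names the
    base-learner with the lower NMSE on the evaluation window (ties -> RF)."""
    x = [mean(train_window), pstdev(train_window), len(train_window)]
    diff = nmse(y_eval, preds_svm) - nmse(y_eval, preds_rf)  # NMSE SVM - NMSE RF
    y_m = "SVM" if diff < 0 else "RF"
    return x, y_m
```

Meta-examples built this way, window after window, form the meta-data from which a learning algorithm induces the meta-model; at test time the meta-model receives only x and predicts the best algorithm.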
Meta-features issues: base level
http://www.springer.com/computer/artificial/book/978-3-540-73262-4