Non-Linear Regression and Input Variable Importance in Machine Learning with CART
and Random Forests
José Mira
Statistical Laboratory
May 26, 2023
• What are data mining and Machine Learning?
– Data analysis
– Model development
• Objectives
– Understanding a complex, a priori “chaotic” process from the natural or social sciences
• Description of trends and regularities
• Pattern identification
– Prediction (could be “black box”)
– Quantification of uncertainty
– josemanuel.mira@upm.es
• Introduction
• CART
• Random Forests
• Sensitivity Analysis
• A reminder of Bayesian Stats and Bayesian
CART
• Examples
Essential (“delicate”) issues
• Outliers
– Errors
– Correct data: rare events can be more relevant/interesting than “regular” ones
• Missing Values
• Do the model hypotheses hold for our data? (especially for parametric models)
Two cultures:
L. Breiman (2001), “Statistical Modeling: The Two Cultures”, Statistical Science, 16(3), 199-231.
[Diagram: the data-modeling culture relates X to Y through an assumed stochastic model (linear regression, logistic regression, …); the algorithmic-modeling culture treats the mechanism linking X to Y as unknown and approximates it with algorithms (decision trees, SVM, …).]
– Model validation: predictive accuracy
Examples
• Modelling of natural-science processes
– Meteorology
– Astronomy
– Structural engineering
– Nuclear engineering
– Complementary to mechanistic approaches: quantification of uncertainty
– Complex computer simulators
• Energy:
– Demand
– Prices
– Patterns
• Customer loyalty
• Credit scoring, credit cards
• Sales
• Telecom or banking
– Client behavior:
• Early identification of future client needs ….
Examples
• Service industry
• Fraud detection
Characteristics
– 100 variables
– One million individuals
Definitions
Data mining
“If you torture the data long enough, Nature will always confess”
Data types
• Spatial
• Time-dependent
• Documentary
• Multimedia
• WWW
• There is more than one good model
• G. E. P. Box: “All models are wrong, but some are useful”
• Tradeoff between simplicity (parsimony) and accuracy
• Dimensionality
Tools (I)–(III)
Steps
• Define objective:
– E.g. Classification of customers in a bank
• Choose the type of model:
– E.g., classification trees
• Choose algorithm:
– CART, BART, CHAID, Random Forest….
• Choose software: R, Python, Matlab, SPSS, SAS….
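For illustration, a minimal R sketch of these four steps on the iris data used later in these notes (the calls assume the rpart package; every setting shown is illustrative):

library(rpart)                                   # CART implementation in R

# Step 1, objective: classify flowers by species
# Step 2, model type: classification tree
# Step 3, algorithm: CART (as implemented by rpart)
# Step 4, software: R
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)                                       # text description of the fitted tree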
CART-Based Models
The basics of tree models
• Variable of interest:
– Qualitative/categorical → classification trees
– Continuous/frequencies → regression trees
• Prediction
– Difference between estimation and prediction
Algorithms
– CART
– CHAID
– BART
– DYNATREE
Software
– R
– MATLAB
– SPSS
– GUIDE
– SAS
– Python
Ensembles of trees
– Random Forests
– Bagging
– Boosting
Main Features
1) Categorical output
– Entropy reduction: $H = -\sum_{i=1}^{Q} p_i \log(p_i)$
2) Quantitative output
– F-test
– Variance reduction
Entropy as a measure of uncertainty
Stirling’s formula: $\log(N!) \approx N \log(N) - N$
$W = -\sum_{i=1}^{Q} p_i \log(p_i)$
Entropy: the basics
With four labelled balls there are 4! possible orderings. But now let us suppose we consider the ordering B1N1B2N2 to be the same as B2N1B1N2 or B1N2B2N1, i.e., what matters is the COLOUR of the ball in a given position (first, second, third or fourth), not WHICH of the two red ones or WHICH of the two black ones it is. The number of distinguishable colour patterns is then 4!/(2!·2!) = 6.
Gini index as a measure of uncertainty
$G = 1 - \sum_{i=1}^{Q} p_i^2$
• Similar to entropy
• $\sum_i p_i^2$ is the probability of two independent extractions being equal, so G is the probability that they differ
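Both impurity measures are one-line functions; a small sketch (the function names are ours):

# H = -sum p_i log(p_i); zero proportions are skipped so 0*log(0) counts as 0
entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
# G = 1 - sum p_i^2
gini <- function(p) 1 - sum(p^2)

p <- c(0.5, 0.5)   # maximally uncertain two-class node
entropy(p)         # log(2) = 0.693...
gini(p)            # 0.5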
Gini index
We have a proportion p1 of red balls and 1 − p1 of black ones.
Four possibilities for two independent extractions: RR, RB, BR and BB, with probabilities p1², p1(1 − p1), (1 − p1)p1 and (1 − p1)². Hence G = 1 − p1² − (1 − p1)² = 2p1(1 − p1).
Stopping criteria
1) Quantitative outputs
– MSE
2) Categorical outputs
– NEV (non-explained variability)
– Deviance or RSS
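In R’s rpart these criteria surface as control parameters that stop the growth of the tree; a hedged sketch (the particular values are illustrative, not taken from the slides):

library(rpart)

ctrl <- rpart.control(minsplit = 20,   # do not try to split nodes with fewer than 20 cases
                      cp = 0.01,       # keep a split only if it improves the fit by at least cp
                      maxdepth = 5)    # limit the depth of the tree
fit <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)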
Example: Iris (flower) classification
Categorical variable (species):
• Iris setosa
• Iris versicolor
• Iris virginica
OBJECTIVE: finding a classification criterion for new flowers
Iris data (X1: sepal length, X2: sepal width, X3: petal length, X4: petal width; 50 flowers per species):

               Group   X1    X2    X3    X4
1. Setosa
   1             1     5.1   3.5   1.4   0.2
   2             1     4.9   3.0   1.4   0.2
   3             1     4.7   3.2   1.3   0.2
   4             1     4.6   3.1   1.5   0.2
   5             1     5.0   3.6   1.4   0.2
   :             :      :     :     :     :
   50            1     5.0   3.3   1.4   0.2
2. Versicolor
   1             2     7.0   3.2   4.7   1.4
   2             2     6.4   3.2   4.5   1.5
   3             2     6.9   3.1   4.9   1.5
   4             2     5.5   2.3   4.0   1.3
   5             2     6.5   2.8   4.6   1.5
   :             :      :     :     :     :
   50            2     5.7   2.8   4.1   1.3
3. Virginica
   1             3     6.3   3.3   6.0   2.5
   2             3     5.8   2.7   5.1   1.9
   3             3     7.1   3.0   5.9   2.1
   4             3     6.3   2.9   5.6   1.8
   5             3     6.5   3.0   5.8   2.2
   :             :      :     :     :     :
   50            3     5.9   3.0   5.1   1.8
Fitted classification tree (SPECIES):

Node 0 (root node): Iris-setosa 33.33% (50), Iris-versicolor 33.33% (50), Iris-virginica 33.33% (50); total 150 (100.00%)

Split on petal length (improvement = 0.3333):
  Node 1 (petal length <= 2.45): Iris-setosa 100.00% (50), Iris-versicolor 0.00% (0), Iris-virginica 0.00% (0); total 50 (33.33%)
  Node 2 (petal length > 2.45): Iris-setosa 0.00% (0), Iris-versicolor 50.00% (50), Iris-virginica 50.00% (50); total 100 (66.67%)

Split of Node 2 on petal width (improvement = 0.2598):
  Node 3 (petal width <= 1.75): Iris-versicolor 90.74% (49), Iris-virginica 9.26% (5); total 54 (36.00%)
  Node 4 (petal width > 1.75): Iris-versicolor 2.17% (1), Iris-virginica 97.83% (45); total 46 (30.67%)

Leaf nodes: 1, 3 and 4. The split points (2.45, 1.75) and the variables selected (petal length, petal width) are chosen by the algorithm.
A new flower (prediction): what species is it?
• Petal width = 3.83
• Sepal length = 2.36
• Sepal width = 0.32
The fitted tree above is applied to these measurements to predict the species.
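A hedged R sketch of this prediction step (the slides’ tree was produced with SPSS; on the iris data rpart recovers the same two splits, and its surrogate splits take care of the petal length, which is not given for the new flower):

library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# the new flower from the slide; petal length is missing (NA)
new_flower <- data.frame(Sepal.Length = 2.36, Sepal.Width = 0.32,
                         Petal.Length = NA_real_, Petal.Width = 3.83)
predict(fit, new_flower, type = "class")   # routed through surrogate splits where needed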
Ensembles of trees
Bagging
Random Forest
Ensembles of trees: Bagging and Random Forests
• Bagging
• Random Forests
Ensembles of trees result from resampling of the original data.
More stable and better predictors than standard CART.
Ensembles of trees: Bagging
• Bootstrap (resampling)
Random Forests: a sophistication of bagging
• For each split, random selection of a subset of inputs
• Better statistical properties, especially in terms of variance
• More difficult to interpret than standard (single-tree) CART
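A minimal sketch contrasting the two ensembles with the randomForest package: bagging is the special case in which all inputs are candidates at every split, while a random forest tries only a random subset of size mtry:

library(randomForest)
set.seed(1)

p <- ncol(iris) - 1                                        # number of input variables (4)

bag <- randomForest(Species ~ ., data = iris, mtry = p)    # bagging: all p inputs at each split
rf  <- randomForest(Species ~ ., data = iris, mtry = 2)    # random forest: 2 random inputs per split

print(bag)                                                 # both print out-of-bag (OOB) error estimates
print(rf)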
Definition of sensitivity analysis
• Local
• Global
Regression-based sensitivity indices
• Regression coefficients:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$
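A hedged sketch of the regression-based approach in R: fit a linear model on standardized data and read the coefficients as sensitivity measures (the simulated data are ours):

set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- 2 * x1 + 0.5 * x2 + rnorm(n, sd = 0.1)    # x3 is inactive

# standardizing inputs and output makes the coefficients comparable
d <- data.frame(scale(cbind(y, x1, x2, x3)))
coef(lm(y ~ x1 + x2 + x3, data = d))            # standardized regression coefficients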
Regression-based sensitivity indices
• Global SA
• Based on the decomposition of the (output) variability into its different (input) sources
• Noisy or non-noisy context
– No noise term for deterministic simulation computer codes
– There is noise in real data
Concept of interaction
• The effect of a given input on the output depends on the value of the other input(s)
– Z = X + Y + 4XY
Concept of Interaction (III)
– Z = 3X² + 4Y³ (non-linear, but without interaction)
– Z = X + Y + 4XY (with interaction)
• Formulation of interaction in terms of derivatives: there is interaction when $\partial^2 Z / \partial X \partial Y \neq 0$ (0 for the first example, 4 for the second)
Interaction and non-linearity
• Newton’s law:
– F = ma
– A rigid body is subjected to a force F1 at time t1
– The same body is subjected to a force F2 at time t2
– The same body is subjected to F1 and F2 at time t3
– The acceleration at t3 is the sum of the accelerations at t1 and t2: the law is linear, with no interaction between the forces
ANOVA-based SA
Two-factor case (grand and marginal means):
$\bar{y}_{..} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}}{IJ}, \qquad \bar{y}_{i.} = \frac{\sum_{j=1}^{J} y_{ij}}{J}, \qquad \bar{y}_{.j} = \frac{\sum_{i=1}^{I} y_{ij}}{I}$
ANOVA-based SA
Three-factor case:
$\bar{y}_{...} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} y_{ijk}}{IJK}, \qquad \bar{y}_{i..} = \frac{\sum_{j=1}^{J}\sum_{k=1}^{K} y_{ijk}}{JK}$
$\bar{y}_{.j.} = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K} y_{ijk}}{IK}, \qquad \bar{y}_{..k} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ijk}}{IJ}$
ANOVA-based SA
$(\alpha\beta)_{ij} = y_{ij} - \mu - \alpha_i - \beta_j = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$
ANOVA-based SA
$(\alpha\beta)_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}$
$(\alpha\gamma)_{ik} = \bar{y}_{i.k} - \bar{y}_{i..} - \bar{y}_{..k} + \bar{y}_{...}$
$(\beta\gamma)_{jk} = \bar{y}_{.jk} - \bar{y}_{.j.} - \bar{y}_{..k} + \bar{y}_{...}$
ANOVA-based SA
• Continuous formulation
• Infinite number of i and infinite number of j:
$\alpha_i = \int_{x_2} y(x_{1i}, x_2)\, dx_2, \qquad \beta_j = \int_{x_1} y(x_1, x_{2j})\, dx_1, \qquad \mu = \int_{x_1}\int_{x_2} y(x_1, x_2)\, dx_1\, dx_2$
ANOVA-based SA
Sums of squares decomposition (two-factor case):
$\sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar{y}_{..})^2 = J\sum_{i=1}^{I}(\bar{y}_{i.}-\bar{y}_{..})^2 + I\sum_{j=1}^{J}(\bar{y}_{.j}-\bar{y}_{..})^2 + \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})^2$
or, equivalently,
$= J\sum_{i=1}^{I}\alpha_i^2 + I\sum_{j=1}^{J}\beta_j^2 + \sum_{i=1}^{I}\sum_{j=1}^{J}(\alpha\beta)_{ij}^2$
Sums of squares decomposition
• Continuous two-factor case:
$V_T = V_E(\alpha) + V_E(\beta) + V_E(\alpha\beta)$
ANOVA-based sensitivity indices
$D_1 = \frac{V_E(\alpha)}{V_T}; \qquad D_2 = \frac{V_E(\beta)}{V_T}; \qquad D'_1 = \frac{V_E(\alpha) + V_E(\alpha\beta)}{V_T}$
($D_1$ and $D_2$ are main-effect indices; $D'_1$ is a total-effect index that also credits $x_1$ with its share of the interaction.)
Kriging model-based SA
Input variable importance with RF (II)
Input variable importance with RF (III)
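A hedged sketch of how these importance measures are obtained with the randomForest package (the object name arbol.rf mirrors the slides; the simulated regression data are ours):

library(randomForest)
set.seed(1)

x <- data.frame(X1b = runif(500, -2.5, 2.5),
                X2b = runif(500, -2.5, 2.5),
                X3b = runif(500, -2.5, 2.5))
y <- x$X1b + x$X2b + x$X3b

arbol.rf <- randomForest(x, y, importance = TRUE)   # importance = TRUE enables the permutation measure
importance(arbol.rf)                                # %IncMSE and IncNodePurity columns
varImpPlot(arbol.rf)                                # graphical display of both measures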
Bayesian Statistics
• Conceptual differences
• Definition of probability
Bayesian Statistics
• Practical differences
• Incorporation of prior information by means of the prior distribution
• Bayes’ theorem:
$p(\theta \mid X) \propto \pi(\theta)\, l(X \mid \theta)$
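A tiny numerical illustration of the theorem: a grid approximation of the posterior for a binomial probability (prior, data and grid are all our illustrative choices):

theta <- seq(0.001, 0.999, length.out = 999)        # grid over the parameter
prior <- dbeta(theta, 2, 2)                         # pi(theta): a mildly informative prior
lik   <- dbinom(7, size = 10, prob = theta)         # l(X | theta): 7 successes in 10 trials
post  <- prior * lik                                # unnormalised posterior
post  <- post / sum(post * (theta[2] - theta[1]))   # normalise so it integrates to 1

theta[which.max(post)]                              # posterior mode: (2+7-1)/(4+10-2) = 2/3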
Advantages of Bayesian Statistics
• Bayes’ Blog:
– https://markpsite.wordpress.com
• Doing Bayesian Data Analysis:
– http://doindbayesiandatanalysis.blogspot.com
• Count Bayesie
– https://www.countbayesie.com
Non-informative prior distributions
Stan
JAGS (a successor to WinBUGS)
• Implementation of MCMC
• Models have to be “programmed” by the user (not the MCMC algorithms)
BART model
• Sobol indices
• Computed on a simplified surrogate model built as a sum of trees
• Similar to the kriging sensitivity methodology
• Within the Bayesian framework
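A hedged sketch of fitting such a sum-of-trees surrogate in R (we assume the dbarts package and its bart() interface; the data-generating function mirrors the second simulation example below):

library(dbarts)
set.seed(1)

x <- matrix(runif(500 * 3, -2.5, 2.5), ncol = 3)
y <- x[, 1] + x[, 2] + x[, 3] + 3 * x[, 1] * x[, 2]

fit <- bart(x, y, keeptrees = TRUE)   # posterior draws of the sum-of-trees model
# Sobol-type indices can then be estimated from posterior predictions of the surrogate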
Simulation examples
First example: $y = x_1 + x_2 + x_3$ (x4 is an inactive input)
Input grids (R output):
> X1
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
> X2
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
> X3
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
> X4
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
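A hedged reconstruction of this experiment (the exact grid is ours; with 10 levels per input, a full factorial reproduces the 9 degrees of freedom per main effect seen in the next table):

# full factorial grid, each input treated as a factor with 10 levels
g <- expand.grid(x1 = seq(-2.5, 2.5, length.out = 10),
                 x2 = seq(-2.5, 2.5, length.out = 10),
                 x3 = seq(-2.5, 2.5, length.out = 10))
g$y <- g$x1 + g$x2 + g$x3   # purely additive model, no noise

# ANOVA decomposition of the output variability
fit <- aov(y ~ factor(x1) * factor(x2) * factor(x3), data = g)
summary(fit)                # main effects carry all the Sum Sq; interactions are exactly 0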
Results, first simulation example (I)
Response: y
              Df  Sum Sq  Mean Sq
x1b            9    8250   916.67
x2b            9    8250   916.67
x3b            9    8250   916.67
x1b:x2b       81       0     0.00
x1b:x3b       81       0     0.00
x2b:x3b       81       0     0.00
x1b:x2b:x3b  729       0     0.00
Residuals      0       0
(With 0 residual degrees of freedom no F values or p-values are reported; all the variability is explained by the three main effects, and the interactions contribute nothing.)
Results, first simulation example (II)
Random Forests:
> importance(arbol.rf)
     %IncMSE IncNodePurity
X1b   122.51       7080.11
X2b   125.36       7122.51
X3b   128.94       7218.21
(The three active inputs receive essentially equal importance.)
Second simulation example
$y = x_1 + x_2 + x_3 + 3\,x_1 x_2$
> X1
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
> X2
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
> X3
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5
Results, second simulation example (I)
Response: y
              Df  Sum Sq  Mean Sq
x1b            9    8250    916.7
x2b            9    8250    916.7
x3b            9    8250    916.7
x1b:x2b       81  612562   7562.5
x1b:x3b       81       0      0.0
x2b:x3b       81       0      0.0
x1b:x2b:x3b  729       0      0.0
Residuals      0       0
(The x1:x2 interaction now dominates the variability.)
Results, second simulation example (II)
> importance(arbol.rf)
     %IncMSE IncNodePurity
X1b    73.38     218499.14
X2b    74.25     203480.14
X3b   -21.46      19781.42
(x1 and x2, which also interact, dominate; the measured importance of x3 is far smaller.)
References (I)
– Azzalini, A., and Scarpa, B. (2012), “Data Analysis and Data Mining”. Oxford University Press.
– Breiman, L. (2001), “Random Forests”, Machine Learning, 45, 5-32.
– Breiman, L. (2001), “Statistical Modeling: The Two Cultures”, Statistical Science, 16(3), 199-231.
– Chipman, H., George, E., and McCulloch, R. (2010), “BART: Bayesian Additive Regression Trees”, The Annals of Applied Statistics, 4(1), 266-298.
– Gramacy, R., and Taddy, M. (2013), “Variable selection and sensitivity analysis via dynamic trees with application to computer code performance testing”, The Annals of Applied Statistics, 7(1), 51-80.
– Grömping, U. (2009), “Variable importance assessment in regression: linear regression versus random forest”, The American Statistician, 63(4).
– Hastie, T., Tibshirani, R., and Friedman, J. (2008), “The Elements of Statistical Learning”. Springer.
– James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), “An Introduction to Statistical Learning with Applications in R”. Springer.
– Matignon, R. (2007), “Data Mining Using SAS Enterprise Miner”. Wiley.
– Oakley, J., and O’Hagan, A. (2004), “Probabilistic sensitivity analysis of complex models: a Bayesian approach”, Journal of the Royal Statistical Society, Series B, 66(3), 751-769.
– Taddy, M., Gramacy, R., and Polson, N. (2011), “Dynamic trees for learning and design”, JASA, 106(493), 109-123.
References (II)
– Verikas, A., Gelzinis, A., and Bacauskiene, M. (2011), “Mining data with random forests: a survey and results of new tests”, Pattern Recognition, 44, 330-349.
– Raschka, S. (2016), “Python Machine Learning”. Packt Publishing.
– Mueller, J., and Massaron, L. (2016), “Machine Learning for Dummies”. Wiley.
– Lewis, N. D. (2016), “Deep Learning Made Easy with R: A Gentle Introduction for Data Science”. Auscov.
– Murphy, K. (2012), “Machine Learning: A Probabilistic Perspective”. MIT Press.
– Theodoridis, S., and Koutroumbas, K. (2009), “Pattern Recognition”. Academic Press.
– Theodoridis, S. (2015), “Machine Learning: A Bayesian and Optimization Perspective”. Academic Press.
– LeCun, Y., Bengio, Y., and Hinton, G. (2015), “Deep learning”, Nature, 521, 436-444.
References (III)
– Torgo, L. (2017), “Data Mining with R: Learning with Case Studies”. CRC Press.
– Lantz, B. (2019), “Machine Learning with R”. Packt.
– Ahrazem, I., Mira, J., and González, C. (2019), “Multi-Output Conditional Inference Trees Applied to the Electricity Market: Variable Importance Analysis”. Energies, 12(6), 1097; https://doi.org/10.3390/en12061097
– Ahrazem, I., Forte, J., Mira, J., and González, C. (2020), “Variable Importance Analysis in Imbalanced Datasets: A New Approach”. DOI: 10.1109/ACCESS.2020.3008416
R vs Python et al