
Machine Learning for Chemical Engineers

CHE F315

Ajaya Kumar Pani


Department of Chemical Engineering
BITS Pilani, Pilani Campus
Lecture-9
02-02-2024
Data Preprocessing
Revision

Data preprocessing

Raw data → Missing value → Outlier → Normalization → Dimensionality reduction

Dimensionality reduction
• Variable selection
• Variable transformation

3 February 2024 4

Data preprocessing

Missing value
Deletion
Replacement
– Mean replacement
– Interpolation replacement
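
The deletion and replacement options above can be sketched in MATLAB as follows. This is a minimal illustration, not course material: the vector `x` and its values are made up, and the built-in functions `rmmissing` and `fillmissing` are one possible way to do each step.

```matlab
% Hypothetical measurement vector with one missing value (NaN)
x = [1; 2; NaN; 4; 8];

% Deletion: drop the samples that contain a missing value
x_deleted = rmmissing(x);

% Mean replacement: fill the gap with the mean of the observed values
x_mean = fillmissing(x,'constant',mean(x,'omitnan'));

% Interpolation replacement: fill the gap by linear interpolation
% between the neighbouring observed values
x_interp = fillmissing(x,'linear');
```

Here the two replacement strategies give different values for the third sample (3.75 from the mean, 3 from interpolation), which is why the choice matters for trending process data.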


Data preprocessing

Outlier detection and removal


Univariate
– 3σ rule
– Hampel identifier
– Quartile-based identifier and boxplots

Multivariate
– Mahalanobis distance
– Minimum covariance determinant (MCD) estimator
– Minimum volume ellipsoid (MVE) estimator
– Smallest half volume
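
The three univariate rules listed above can be tried with MATLAB's `isoutlier` function. The sketch below is only an illustration on made-up data; it assumes that the `'mean'`, `'median'` and `'quartiles'` methods of `isoutlier` are acceptable stand-ins for the 3σ rule, the Hampel identifier and the quartile-based identifier, respectively.

```matlab
% Hypothetical measurements: nine values near 10 and one gross error (50)
x = [10; 11; 9; 10; 12; 10; 11; 50; 10; 9];

out_3sigma = isoutlier(x,'mean');       % more than 3 standard deviations from the mean
out_hampel = isoutlier(x,'median');     % more than 3 scaled MAD from the median
out_quart  = isoutlier(x,'quartiles');  % more than 1.5*IQR outside the quartiles

n_3sigma   = nnz(out_3sigma);   % number of points flagged by the 3-sigma rule
idx_hampel = find(out_hampel);  % indices flagged by the median/MAD rule
idx_quart  = find(out_quart);   % indices flagged by the quartile rule
```

In this small sample the gross error inflates the mean and standard deviation so much that the 3σ rule flags nothing, while the median-based and quartile-based rules both flag the eighth sample. This masking effect is the usual argument for preferring the robust identifiers.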


MATLAB Application
AI, Data Science, and Statistics → Statistics and Machine Learning Toolbox →
Cluster Analysis and Anomaly Detection:
Isolation forest, Robust Random Cut Forest, Local Outlier Factor, One-Class
Support Vector Machine (SVM), Mahalanobis Distance, Incremental Robust
Random Cut Forest, Incremental One-Class Support Vector Machine (SVM)

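load data.mat    % open data.mat; assumes it provides the input matrix X
% The input has 1000 samples and 10 variables
[N,D] = size(X);    % N = number of samples, D = number of variables
md_classical = pdist2(X,mean(X),"mahalanobis");    % classical Mahalanobis distance to the sample mean
[SIG, MU, md_robust] = robustcov(X);    % robust covariance, robust mean, robust distances
threshold = sqrt(chi2inv(1-0.05,D));    % 95% chi-square cutoff, on the distance scale
plot(1:N,md_classical,'r+:',1:N,md_robust,'b*:');
yline(threshold)    % samples above this line are candidate outliers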
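
For reference, the quantities computed above can be written out as follows. The symbols (x_i for a sample, μ and Σ for the mean and covariance estimates, D for the number of variables) are notation assumed here, not taken from the slides.

```latex
% Mahalanobis distance of sample x_i from the centre \mu
d_M(x_i) = \sqrt{(x_i - \mu)^{\top} \Sigma^{-1} (x_i - \mu)}

% If the data are multivariate normal, the squared distance is
% (approximately) chi-square distributed with D degrees of freedom
d_M^2(x_i) \sim \chi^2_D

% Flag x_i as a candidate outlier at the 5% level when
d_M(x_i) > \sqrt{\chi^2_{D,\,0.95}}
```

The classical version uses the sample mean and covariance for μ and Σ; the robust version replaces them with the estimates returned by `robustcov`, which are less affected by the outliers themselves.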


MATLAB Application
plot(md_classical)


MATLAB Application
plot(md_robust)


MATLAB Application
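plot(1:1000,md_classical,1:1000,md_robust);    % both distances, default line styles
plot(1:1000,md_classical,'r+:',1:1000,md_robust,'b*:');    % same data with explicit markers (redraws the axes)
yline(threshold)    % 95% chi-square cutoff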


Data preprocessing

Data transformation/normalization/scaling
Min-max
z-score
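
A minimal sketch of the two scalings, applied column by column to a made-up data matrix. The matrix `X` is hypothetical; the manual formulas are shown next to the built-in `normalize` function, which is assumed to be available (it is part of base MATLAB in recent releases).

```matlab
% Hypothetical data: 3 samples (rows), 2 variables (columns)
X = [1 10; 2 20; 3 30];

% Min-max scaling: each column is mapped to the range [0, 1]
X_minmax = (X - min(X)) ./ (max(X) - min(X));

% z-score scaling: each column gets zero mean and unit standard deviation
X_zscore = (X - mean(X)) ./ std(X);

% The same results using the built-in function
X_minmax_builtin = normalize(X,'range');
X_zscore_builtin = normalize(X,'zscore');
```

Both columns end up on the same scale after either transformation, even though the second variable is ten times larger in raw units.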


Data preprocessing

Dimensionality reduction
variable/feature selection
Filter methods
– Mutual information
– Correlation
– Chi square test
– ANOVA
Wrapper methods
– Forward selection
– Backward selection
Embedded methods
– LASSO
– Elastic net
– Ridge regression
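
As an illustration of an embedded method from the list above, the sketch below uses the `lasso` function from the Statistics and Machine Learning Toolbox on synthetic data. The data, the true coefficients and the penalty value `Lambda = 0.1` are all assumptions chosen for the example; in practice the penalty would normally be chosen by cross-validation.

```matlab
rng(0);                        % fix the random seed for reproducibility

% Synthetic data: 100 samples, 5 candidate variables,
% but the response depends only on variables 1 and 3
X = randn(100,5);
y = 3*X(:,1) - 2*X(:,3) + 0.1*randn(100,1);

% Fit a LASSO model with a fixed (assumed) penalty
B = lasso(X,y,'Lambda',0.1);

% Variables with non-zero coefficients are the selected features
selected = find(B ~= 0)';
```

The L1 penalty drives the coefficients of uninformative variables exactly to zero, so selection happens as part of model fitting rather than as a separate step.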


Data preprocessing
Variable transformation/feature extraction

Linear
• Principal Component Analysis (PCA)
• Factor Analysis
• Linear Discriminant Analysis
• Singular Value Decomposition
• Independent component analysis

Non-linear
• Kernel PCA
• Kernel ICA
• Multi-Dimensional Scaling
• Isometric Mapping (Isomap)
• Non-negative matrix factorization
• Random forest
• Autoencoder

Statistics

Probability
Joint probability
Conditional probability
Bayes' rule
Probability distribution
Continuous
Discrete
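
The first three items above are linked by the following standard identities, written here for two events A and B (the event names are notation assumed for this summary):

```latex
% Joint probability expressed through conditional probabilities
P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)

% Conditional probability (defined when P(B) > 0)
P(A \mid B) = \frac{P(A \cap B)}{P(B)}

% Bayes' rule, obtained by combining the two lines above
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```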


MATLAB Application
AI, Data Science, and Statistics → Statistics and Machine Learning Toolbox →
Dimensionality Reduction and Feature Extraction:

[COEFF, SCORE, LATENT, TSQUARED, EXPLAINED] = pca(X)


[COEFF, LATENT] = pcacov(V)
ppca – probabilistic PCA
Factor analysis → Factor analysis
Discriminant analysis → DISCR=fitcdiscr(X,Y)
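
A short sketch of how the `pca` outputs listed above might be used to reduce dimensionality. The data matrix is made up (three variables, two of them perfectly collinear), and the 95% explained-variance cutoff is an assumed rule of thumb, not a value from the slides.

```matlab
% Hypothetical data: 50 samples, 3 variables; the second is a multiple of the first
t = (1:50)';
X = [t, 2*t, sin(t)];

% Standardize first so that no variable dominates because of its units
Z = zscore(X);

% coeff: loadings, score: projected data, latent: component variances,
% explained: percentage of total variance per component
[coeff,score,latent,tsquared,explained] = pca(Z);

% Keep the smallest number of components explaining at least 95% of the variance
k = find(cumsum(explained) >= 95,1);
X_reduced = score(:,1:k);
```

Because two of the three standardized variables carry the same information, two principal components are enough to pass the cutoff here.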
