CHE F315
Data preprocessing
Dimensionality reduction
• Variable selection
• Variable transformation
Normalization
3 February 2024
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
Data preprocessing
Missing value
Deletion
Replacement
– Mean replacement
– Interpolation replacement
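The three options listed for missing values can be sketched in MATLAB as follows. This is a minimal illustration, not the course's own script: the vector `x` is made-up data, and `rmmissing` and `fillmissing` are used as one possible way to perform deletion and replacement.

```matlab
% Hypothetical measurement vector with two missing readings (NaN)
x = [2.0; 4.0; NaN; 8.0; NaN; 12.0];

% Deletion: drop the samples that contain a missing value
x_deleted = rmmissing(x);

% Mean replacement: fill each gap with the mean of the observed values
x_mean = fillmissing(x,'constant',mean(x,'omitnan'));

% Interpolation replacement: fill each gap by linear interpolation
% between the neighbouring observed values
x_interp = fillmissing(x,'linear');

% Columns: original, mean-filled, interpolated
disp([x x_mean x_interp])
```

Mean replacement keeps the column mean unchanged but ignores the ordering of the samples, whereas interpolation assumes neighbouring samples are related (for example, consecutive time points).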
Data preprocessing
Multivariate outlier detection
– Mahalanobis distance
– Minimum covariance determinant (MCD) estimator
– Minimum volume ellipsoid (MVE) estimator
– Smallest half volume
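As a rough illustration of the first item, the Mahalanobis distance of a sample from the data mean can be computed directly from the sample mean and covariance. The sketch below uses synthetic data (an assumption, not the course data set) and checks the hand calculation against the toolbox function `mahal`, which returns the squared distance.

```matlab
rng(0)                                        % reproducible synthetic data
X = randn(200,3) * [2 0 0; 1 1 0; 0 0.5 1];   % three correlated variables

mu = mean(X);        % 1-by-D sample mean
S  = cov(X);         % D-by-D sample covariance
Xc = X - mu;         % centred data

% d_i = sqrt( (x_i - mu) * inv(S) * (x_i - mu)' ), evaluated for every row
md_manual = sqrt(sum((Xc / S) .* Xc, 2));

% Toolbox equivalent: mahal returns the squared Mahalanobis distance
md_toolbox = sqrt(mahal(X,X));

max_diff = max(abs(md_manual - md_toolbox));
fprintf('Largest difference between the two calculations: %.2e\n', max_diff)
```

Because the classical mean and covariance are themselves affected by outliers, the MCD and MVE estimators replace `mu` and `S` with robust estimates before the same distance is evaluated.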
MATLAB Application
Documentation: AI, Data Science, and Statistics > Statistics and Machine Learning Toolbox > Cluster Analysis and Anomaly Detection
• Isolation Forest
• Robust Random Cut Forest
• Local Outlier Factor
• One-Class Support Vector Machine (SVM)
• Mahalanobis Distance
• Incremental Robust Random Cut Forest
• Incremental One-Class Support Vector Machine (SVM)
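Open data.mat. The input X has 1000 samples (rows) and 10 variables (columns).
load data.mat   % assumed to define the matrix X
[N,D] = size(X);
% Classical Mahalanobis distance of each sample from the sample mean
md_classical = pdist2(X,mean(X),"mahalanobis");
% Robust covariance, robust mean, and robust distances
[SIG,MU,md_robust] = robustcov(X);
% Cutoff: square root of the 95% chi-square quantile with D degrees of freedom
threshold = sqrt(chi2inv(1-0.05,D));
% Classical distances in red, robust distances in blue, cutoff as a horizontal line
plot(1:N,md_classical,'r+:',1:N,md_robust,'b*:');
yline(threshold)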
MATLAB Application
plot(md_classical)
MATLAB Application
plot(md_robust)
MATLAB Application
plot(1:1000,md_classical,1:1000,md_robust);
plot(1:1000,md_classical,'r+:',1:1000,md_robust,'b*:');
yline(threshold)
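Beyond plotting, the same distances can be turned into outlier flags by comparing them with the threshold. The following is a sketch under stated assumptions: synthetic data stands in for data.mat, the 20 shifted rows are planted purely for illustration, and the variable names `is_out_classical`, `is_out_robust`, and `X_clean` are not from the slides.

```matlab
rng(1)                                   % reproducible synthetic data
X = randn(1000,10);                      % stand-in for the course data set
X(1:20,:) = X(1:20,:) + 8;               % plant 20 obvious outliers (hypothetical)
[N,D] = size(X);

md_classical = pdist2(X,mean(X),"mahalanobis");
[~,~,md_robust] = robustcov(X);
threshold = sqrt(chi2inv(1-0.05,D));

% Logical flags: true where the distance exceeds the cutoff
is_out_classical = md_classical(:) > threshold;
is_out_robust    = md_robust(:) > threshold;

fprintf('Classical: %d flagged, robust: %d flagged, out of %d samples\n', ...
    nnz(is_out_classical), nnz(is_out_robust), N);

% Keep only the samples that the robust distance does not flag
X_clean = X(~is_out_robust,:);
```

With a 5% significance level, some ordinary samples are expected to exceed the cutoff even in clean data, so the flags are candidates for inspection rather than a final verdict.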
Data preprocessing
Data transformation/normalization/scaling
Min-max
z-score
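Both scalings operate column by column: min-max maps each variable to the range [0, 1] via (x - min)/(max - min), and z-score gives each variable zero mean and unit standard deviation via (x - mean)/std. A minimal sketch with made-up data, compared against the built-in `normalize` function:

```matlab
% Hypothetical data: 5 samples (rows), 2 variables (columns) on different scales
X = [1 100; 2 300; 3 500; 4 700; 5 900];

% Min-max scaling: each column is mapped to [0, 1]
X_minmax = (X - min(X)) ./ (max(X) - min(X));

% z-score scaling: zero mean and unit standard deviation per column
X_z = (X - mean(X)) ./ std(X);

% Built-in equivalents
X_minmax_builtin = normalize(X,'range');
X_z_builtin      = normalize(X,'zscore');

disp(X_minmax)
disp(X_z)
```

When a model is later applied to new data, the minimum, maximum, mean, and standard deviation computed on the training data are normally reused rather than recomputed.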
Data preprocessing
Dimensionality reduction
variable/feature selection
Filter methods
– Mutual information
– Correlation
– Chi-square test
– ANOVA
Wrapper methods
– Forward selection
– Backward selection
Embedded methods
– LASSO
– Elastic net
– Ridge regression
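One representative of each family can be sketched on synthetic data where the relevant variables are known in advance. Everything below is illustrative: the data, the choice of correlation as the filter, the sum-of-squared-errors criterion for forward selection, and the 5-fold cross-validation settings are all assumptions rather than the course's prescribed setup.

```matlab
rng(0)                                       % reproducible synthetic data
N = 200;
X = randn(N,6);                              % six candidate variables
y = 3*X(:,1) - 2*X(:,4) + 0.1*randn(N,1);    % only variables 1 and 4 matter

% Filter method: rank variables by absolute correlation with the response
r = abs(corr(X,y));
[~,filter_rank] = sort(r,'descend');

% Wrapper method: forward selection with a linear model,
% scored by the sum of squared errors on held-out folds
addOnes = @(A) [ones(size(A,1),1) A];
sse = @(Xtr,ytr,Xte,yte) sum((yte - addOnes(Xte)*(addOnes(Xtr)\ytr)).^2);
wrapper_selected = find(sequentialfs(sse,X,y,'cv',5));

% Embedded method: LASSO with 5-fold cross-validation; variables with
% nonzero coefficients at the one-standard-error lambda are kept
[B,FitInfo] = lasso(X,y,'CV',5);
lasso_selected = find(B(:,FitInfo.Index1SE) ~= 0)';

disp(filter_rank(1:2)')
disp(wrapper_selected)
disp(lasso_selected)
```

Filter methods score each variable without fitting the final model, wrapper methods refit a model for every candidate subset, and embedded methods select variables as a by-product of fitting a single regularized model.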
Data preprocessing
Variable transformation/feature extraction
Linear
• Principal Component Analysis (PCA)
• Factor Analysis
• Linear Discriminant Analysis
• Singular Value Decomposition
• Independent component analysis
Non-linear
• Kernel PCA
• Kernel ICA
• Multi-Dimensional Scaling
• Isometric Mapping (Isomap)
• Non-negative matrix factorization
• Random forest
• Autoencoder
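As an example of a linear feature-extraction method, PCA can be sketched with the toolbox function `pca`. The data here is synthetic (five measured variables driven by two hidden factors), and the 95% cumulative-variance rule for choosing the number of components is one common convention, assumed for illustration.

```matlab
rng(0)                                   % reproducible synthetic data
N = 300;
t = randn(N,2);                          % two hidden factors (hypothetical)
W = [1 0; 0.8 0.2; 0 1; 0.1 0.9; 0.5 0.5];
X = t*W' + 0.05*randn(N,5);              % five correlated measured variables

Xz = zscore(X);                          % scale variables before PCA

% coeff: loadings, score: projected data, explained: % variance per component
[coeff,score,latent,~,explained] = pca(Xz);

% Keep the smallest number of components explaining at least 95% of the variance
k = find(cumsum(explained) >= 95,1);
X_reduced = score(:,1:k);

fprintf('Components kept: %d of %d\n', k, size(X,2));
```

The columns of `X_reduced` are uncorrelated linear combinations of the original variables and can be used in place of `X` in later modelling steps.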
Statistics
Probability
Joint probability
Conditional probability
Bayes' rule
Probability distribution
Continuous
Discrete
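The link between joint probability, conditional probability, and Bayes' rule, P(A|B) = P(B|A) P(A) / P(B), can be worked through numerically. The scenario and all three input probabilities below are hypothetical numbers chosen only to illustrate the arithmetic.

```matlab
% Hypothetical numbers for a process-fault alarm
p_fault             = 0.01;   % prior P(fault)
p_alarm_given_fault = 0.95;   % P(alarm | fault)
p_alarm_given_ok    = 0.05;   % P(alarm | no fault)

% Joint probabilities: P(alarm, fault) and P(alarm, no fault)
p_alarm_and_fault = p_alarm_given_fault * p_fault;
p_alarm_and_ok    = p_alarm_given_ok * (1 - p_fault);

% Total probability of an alarm
p_alarm = p_alarm_and_fault + p_alarm_and_ok;

% Bayes' rule: P(fault | alarm)
p_fault_given_alarm = p_alarm_and_fault / p_alarm;

fprintf('P(fault | alarm) = %.3f\n', p_fault_given_alarm)   % prints 0.161
```

Even with a sensitive alarm, the posterior probability of a fault stays low here because faults are rare to begin with, which is the usual reason for computing the posterior rather than reading P(alarm | fault) directly.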
MATLAB Application
Documentation: AI, Data Science, and Statistics > Statistics and Machine Learning Toolbox > Dimensionality Reduction and Feature Extraction