Lecture 6
CHE F315
Recap
Feature selection
Measures of feature relevance
Probability
Probability distribution
26 January 2024 4
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
Feature selection
Retains the most relevant variables from the original dataset
Filter methods
– Mutual information
– Correlation
– Chi-square test
– ANOVA
Wrapper methods
– Forward selection
– Backward selection
Embedded methods
– LASSO
– Elastic net
– Ridge regression
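As an illustration of a filter method from the list above, the following is a minimal sketch of correlation-based ranking. The function name rank_features_by_correlation, the synthetic data, and the choice to keep two features are assumptions made for this example, not part of the lecture.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices sorted by |Pearson r| with y, and the r values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each feature column
    yc = y - y.mean()                # centre the target
    # Pearson r for column j: sum(x_j * y) / (||x_j|| * ||y||) on centred data
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    order = np.argsort(-np.abs(r))   # most relevant feature first
    return order, r

# Synthetic data (illustrative only): y is driven mainly by column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

order, r = rank_features_by_correlation(X, y)
top_k = order[:2]                    # keep the two highest-scoring features
print("ranking:", order, "scores:", np.round(r, 3))
```

A filter of this kind scores each variable independently of any model, which is what distinguishes it from the wrapper and embedded methods listed above.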
Feature extraction
Determine a smaller set of new variables, each a combination of the input variables, that retains as much of the information in the input variables as possible
Linear
• Principal Component Analysis (PCA)
• Factor Analysis
• Linear Discriminant Analysis
• Singular Value Decomposition
• Independent Component Analysis
Non-linear
• Kernel PCA
• Kernel ICA
• Multi-Dimensional Scaling
• Isometric Mapping (Isomap)
• Non-negative matrix factorization
• Random forest
• Autoencoder
PCA Algorithm
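Se = λe ⇒ (S – λI)e = 0
S: square matrix (the covariance or correlation matrix of the data X)
λ: eigenvalue
e: eigenvector
A non-zero eigenvector e exists if and only if det(S – λI) = 0; solving this equation gives the eigenvalues λ
Compute P = Xe to obtain the principal components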
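The steps on this slide can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the data are mean-centred before forming S (not stated on the slide), S is taken as the covariance matrix, and the function name pca and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Return scores P, all eigenvalues (descending), and the retained eigenvectors."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # assumption: centre each variable
    S = np.cov(Xc, rowvar=False)            # square, symmetric covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)    # solves S e = lambda e for symmetric S
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    E = eigvecs[:, :n_components]           # retained eigenvectors as columns
    P = Xc @ E                              # principal components, P = X e
    return P, eigvals, E

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(42)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing

P, eigvals, E = pca(X, n_components=2)
explained = eigvals / eigvals.sum()         # fraction of variance per component
print(np.round(explained, 3))
```

Each column of P has sample variance equal to the corresponding eigenvalue, which is why the eigenvalues are used to rank the components.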
Example
Find the eigenvalues of
A = | 2  12 |
    | 1   5 |
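One way to work this example, reading the matrix as rows (2, 12) and (1, 5), is to expand det(A – λI) = 0 into a quadratic in λ. The sketch below is not taken from the lecture's own solution; it uses the standard 2×2 result that the characteristic polynomial is λ² – (trace)λ + det.

```python
import math

# A = [[2, 12],
#      [1,  5]]
a, b, c, d = 2.0, 12.0, 1.0, 5.0

trace = a + d            # 2 + 5 = 7
det = a * d - b * c      # 10 - 12 = -2

# det(A - lambda*I) = (2 - lambda)(5 - lambda) - 12
#                   = lambda^2 - 7*lambda - 2 = 0
disc = trace ** 2 - 4.0 * det             # 49 + 8 = 57
lam1 = (trace + math.sqrt(disc)) / 2.0    # (7 + sqrt(57)) / 2
lam2 = (trace - math.sqrt(disc)) / 2.0    # (7 - sqrt(57)) / 2

print(round(lam1, 4), round(lam2, 4))     # 7.2749 -0.2749
```

As a check, the eigenvalues sum to the trace (7) and multiply to the determinant (–2).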
Covariance Matrix:
– Variables must be in the same units
– Emphasizes the variables with the most variance
– Mean eigenvalue ≠ 1.0 (in general)
Correlation Matrix:
– Variables are standardized (mean 0.0, SD 1.0)
– Variables can be in different units
– All variables have the same impact on the analysis
– Mean eigenvalue = 1.0
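The contrast above can be checked numerically. The sketch below uses synthetic columns on deliberately different scales (the "temperature, pressure, mole fraction" labels are hypothetical) and compares the eigenvalues of the covariance and correlation matrices. The mean eigenvalue equals trace/p, and a correlation matrix has ones on its diagonal, so its mean eigenvalue is 1.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic, independent columns on very different scales (illustrative only).
X = np.column_stack([
    rng.normal(350.0, 5.0, 500),      # e.g. a temperature-like variable
    rng.normal(1.0e5, 2.0e3, 500),    # e.g. a pressure-like variable
    rng.normal(0.5, 0.05, 500),       # e.g. a mole-fraction-like variable
])

S = np.cov(X, rowvar=False)           # covariance matrix (units matter)
R = np.corrcoef(X, rowvar=False)      # correlation matrix (standardized)

eig_S = np.linalg.eigvalsh(S)         # eigenvalues of a symmetric matrix
eig_R = np.linalg.eigvalsh(R)

mean_eig_S = eig_S.mean()             # equals trace(S) / p, generally not 1
mean_eig_R = eig_R.mean()             # equals trace(R) / p = 1

# Share of total variance carried by the largest eigenvalue.
share_S = eig_S.max() / eig_S.sum()
share_R = eig_R.max() / eig_R.sum()

print("mean eigenvalue:", mean_eig_S, mean_eig_R)
print("largest share:", share_S, share_R)
```

With the covariance matrix, the large-variance column dominates the first component almost entirely; with the correlation matrix, the three variables contribute on an equal footing.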