
Machine Learning for Chemical Engineers
CHE F315

Ajaya Kumar Pani
Department of Chemical Engineering, BITS Pilani, Pilani Campus

Lecture 6
25-01-2024

Data Preprocessing

Recap

Feature selection
Measures of feature relevance
Probability
Probability distribution


Feature selection

Retains the most relevant variables from the original dataset.

Filter methods
– Mutual information
– Correlation
– Chi-square test
– ANOVA
Wrapper methods
– Forward selection
– Backward selection
Embedded methods
– LASSO
– Elastic net
– Ridge regression


Feature extraction

Determine a smaller set of new variables, each being a combination of the input variables, containing the same information as the input variables.

Linear
• Principal Component Analysis (PCA)
• Factor Analysis
• Linear Discriminant Analysis
• Singular Value Decomposition
• Independent Component Analysis

Non-linear
• Kernel PCA
• Kernel ICA
• Multi-Dimensional Scaling
• Isometric Mapping (Isomap)
• Non-negative Matrix Factorization
• Random Forest
• Autoencoder

Principal component analysis

A statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into linearly uncorrelated variables.

Applications
• Image processing (face recognition, image analysis)
• Gene expression analysis
• Data visualization
• Data compression
• Outlier/noise detection
• Dimensionality reduction (feature extraction)
• Process monitoring and fault detection


Principal component analysis

All principal components (PCs) start at the origin of the coordinate axes.
The first PC is the direction of maximum variance from the origin.
Subsequent PCs are orthogonal to the first PC and describe the maximum residual variance.

Let x ∈ R^m denote a sample vector of m sensors. Assuming that there are N samples for each sensor, a data matrix X ∈ R^(N×m) is composed, with each row representing a sample. The matrix X can be decomposed into a score matrix T and a loading matrix P, X = T P^T, by either the NIPALS or the singular value decomposition (SVD) algorithm.
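A minimal NumPy sketch of this decomposition via SVD (the data matrix is illustrative and assumed to be already mean-centered):

    import numpy as np

    # X: N x m mean-centered data matrix, rows are samples
    X = np.array([[ 0.5, -1.2,  0.7],
                  [-0.3,  0.8, -0.5],
                  [-0.2,  0.4, -0.2]])

    # Economy-size SVD: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    P = Vt.T              # loading matrix (columns are loadings)
    T = X @ P             # score matrix, equal to U @ diag(s)

    print(np.allclose(X, T @ P.T))   # True: X is recovered exactly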

PCA Algorithm

Form the data matrix X containing your data (N×m).
Calculate the covariance matrix S based on X.
Determine the eigenvectors and eigenvalues.
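A sketch of these three steps in NumPy (random data for illustration; np.cov with rowvar=False treats rows as samples):

    import numpy as np

    X = np.random.rand(100, 4)             # N = 100 samples, m = 4 variables

    S = np.cov(X, rowvar=False)            # m x m covariance matrix

    eigvals, eigvecs = np.linalg.eigh(S)   # eigh, since S is symmetric
    # eigh returns eigenvalues in ascending order; reverse them for PCA
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]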


PCA Algorithm

Se = λe  ⇒  (S − λI)e = 0
S: square matrix (covariance/correlation)
λ: eigenvalue
e: eigenvector

A non-zero eigenvector e of S exists if and only if det(S − λI) = 0.
Solve P = Xe to calculate the principal components.


Example

Find the eigenvalues of
A = | 2  −12 |
    | 1   −5 |


From m original variables x1, x2, ..., xm, produce m new variables p1, p2, ..., pm (the principal components):

p1 = a11 x1 + a12 x2 + ... + a1m xm
p2 = a21 x1 + a22 x2 + ... + a2m xm
...
pm = am1 x1 + am2 x2 + ... + amm xm

such that:
• the pk's are uncorrelated (orthogonal)
• p1 explains as much as possible of the original variance in the data set
• p2 explains as much as possible of the remaining variance
• etc.
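These properties are easy to verify numerically; a small sketch (with illustrative correlated data) showing that the scores are uncorrelated and that the explained variance decreases from p1 onwards:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated data
    Xc = X - X.mean(axis=0)                                  # mean-center

    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    eigvecs = eigvecs[:, ::-1]                               # descending order
    P = Xc @ eigvecs                                         # scores p1...pm

    print(np.round(np.cov(P, rowvar=False), 6))  # ~diagonal: PCs uncorrelated
    print(P.var(axis=0))                         # decreasing explained variance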


{a11, a12, ..., a1m} is the 1st eigenvector of the correlation/covariance matrix, and its entries are the coefficients of the 1st principal component.

{a21, a22, ..., a2m} is the 2nd eigenvector of the correlation/covariance matrix, and its entries are the coefficients of the 2nd principal component.

...

{ak1, ak2, ..., akm} is the kth eigenvector of the correlation/covariance matrix, and its entries are the coefficients of the kth principal component.


Covariance Matrix:
– Variables must be in the same units
– Emphasizes variables with the most variance
– Mean eigenvalue ≠ 1.0

Correlation Matrix:
– Variables are standardized (mean 0.0, SD 1.0)
– Variables can be in different units
– All variables have the same impact on the analysis
– Mean eigenvalue = 1.0
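The contrast is easy to see numerically; a quick sketch with illustrative mixed-unit data (variable names and values are hypothetical):

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.column_stack([rng.normal(300, 50, size=100),    # temperature, K
                         rng.normal(1.2, 0.1, size=100)])  # pressure, bar

    S = np.cov(X, rowvar=False)        # dominated by the high-variance column
    R = np.corrcoef(X, rowvar=False)   # unit-free, ones on the diagonal

    print(np.linalg.eigvalsh(S))   # one huge, one tiny eigenvalue
    print(np.linalg.eigvalsh(R))   # mean eigenvalue = 1.0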


Step by step procedure

Get the input data matrix.
Mean-center and scale each value.
Determine the correlation matrix.
Get the eigenvectors and corresponding eigenvalues.
Arrange the eigenvectors in order of decreasing eigenvalues.
Consider only those eigenvectors for which the cumulative eigenvalue is equal to or more than 90% of the total eigenvalue.
These eigenvectors (principal components) are our modified set of variables.
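A minimal end-to-end sketch of this procedure in NumPy (random data for illustration; the 90% cut-off is the one stated above):

    import numpy as np

    X = np.random.rand(50, 5)                    # input data matrix

    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # mean-center and scale
    R = np.corrcoef(Z, rowvar=False)             # correlation matrix

    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # keep eigenvectors whose cumulative eigenvalue reaches 90% of the total
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.90) + 1
    scores = Z @ eigvecs[:, :k]                  # the modified set of variables
    print(k, scores.shape)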


FinalData = RowFeatureVector × RowZeroMeanData

RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

RowZeroMeanData is the mean-adjusted data, transposed, i.e., the data items are in each column, with each row holding a separate dimension.
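In NumPy terms, a sketch of this layout (illustrative data; the names follow the slide):

    import numpy as np

    X = np.random.rand(10, 3)                     # data items in rows
    Xc = X - X.mean(axis=0)                       # mean-adjusted data

    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    eigvecs = eigvecs[:, ::-1]                    # most significant first

    RowFeatureVector = eigvecs.T                  # eigenvectors in the rows
    RowZeroMeanData = Xc.T                        # each row holds one dimension

    FinalData = RowFeatureVector @ RowZeroMeanData  # m x N transformed data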
