
Machine Learning for Chemical Engineers

CHE F315

Ajaya Kumar Pani


Department of Chemical Engineering
BITS Pilani, Pilani Campus
Lecture-4
19-01-2024
Data Preprocessing

Recap

Missing value
Outlier detection - univariate methods
Descriptive statistics
Univariate
Multivariate

26 January 2024 4
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers

Data preprocessing
Multivariate outlier detection

Pani, A. K., & Mohanta, H. K. (2016). Online monitoring of cement clinker quality using multivariate statistics and
Takagi-Sugeno fuzzy-inference technique. Control Engineering Practice, 57, 1-17.

Data preprocessing

Multivariate outlier detection


Mahalanobis distance

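If the underlying distribution is multivariate normal, the squared
Mahalanobis distances approximately follow a chi-square distribution
with p degrees of freedom, where p is the number of variables. It is
therefore common to use the 0.975 quantile of that distribution,
$\chi^2_{p;0.975}$, as the cut-off value.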
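
The cut-off rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it assumes NumPy and SciPy are available, uses the classical sample mean and covariance, and takes the squared distance as d² = (x − mean)ᵀ S⁻¹ (x − mean). The function name `mahalanobis_outliers`, the synthetic data, and the planted point are all hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, quantile=0.975):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square cut-off."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]                          # number of variables
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)           # classical sample covariance (p x p)
    diff = X - mean
    # d_i^2 = (x_i - mean)^T S^{-1} (x_i - mean), evaluated for every row i
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    cutoff = chi2.ppf(quantile, df=p)       # 0.975 quantile with p degrees of freedom
    return d2, cutoff, d2 > cutoff


# Synthetic two-variable data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X[0] = [3.0, -3.0]                          # planted point that breaks the correlation pattern

d2, cutoff, is_outlier = mahalanobis_outliers(X)
print(f"cut-off = {cutoff:.3f}, flagged rows = {np.flatnonzero(is_outlier)}")
```

The planted point is not extreme in either variable taken alone, which is why a univariate check could miss it while the multivariate distance does not.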

Multivariate trimming (MVT)


Minimum covariance determinant (MCD) estimator
Minimum volume ellipsoid (MVE) estimator
Smallest half volume
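
Because the classical mean and covariance are themselves distorted by outliers, the robust estimators listed above re-estimate them from a cleaner subset of the data. As a rough sketch of the multivariate trimming idea only: the loop below repeatedly drops the points with the largest Mahalanobis distances and recomputes the mean and covariance from the rest. The trimming fraction, the convergence test, and all function names are assumptions made for illustration; published variants of MVT differ in these details.

```python
import numpy as np


def squared_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of every row of X from the given mean."""
    diff = X - mean
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def multivariate_trimming(X, trim_fraction=0.1, max_iter=50, tol=1e-8):
    """Iteratively re-estimate mean and covariance from the closest points."""
    X = np.asarray(X, dtype=float)
    n_keep = int(np.ceil((1.0 - trim_fraction) * len(X)))
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)   # classical starting point
    for _ in range(max_iter):
        d2 = squared_mahalanobis(X, mean, cov)
        keep = np.argsort(d2)[:n_keep]                    # indices of the n_keep closest rows
        new_mean = X[keep].mean(axis=0)
        new_cov = np.cov(X[keep], rowvar=False)
        converged = (np.allclose(new_mean, mean, atol=tol)
                     and np.allclose(new_cov, cov, atol=tol))
        mean, cov = new_mean, new_cov
        if converged:
            break
    return mean, cov


# Synthetic data: 100 regular points plus 5 clustered outliers (assumed for illustration)
rng = np.random.default_rng(1)
inliers = rng.normal(size=(100, 2))
outliers = 10.0 + 0.1 * rng.normal(size=(5, 2))
X = np.vstack([inliers, outliers])

robust_mean, robust_cov = multivariate_trimming(X)
robust_d2 = squared_mahalanobis(X, robust_mean, robust_cov)
print("robust mean:", robust_mean)
print("smallest robust distance among planted outliers:", robust_d2[-5:].min())
```

The distances computed from the trimmed estimates can then be compared with the same chi-square cut-off used for the classical Mahalanobis distance.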


Useful References
https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Chiang, L. H., Pell, R. J., & Seasholtz, M. B. (2003). Exploring process data with the use of robust outlier
detection algorithms. Journal of Process Control, 13(5), 437-449.


Data transformation (scaling)

Min-max
z-score
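
Both scalings are applied column by column. The sketch below assumes NumPy and a small made-up data matrix whose columns stand for a temperature and a pressure; the function names are hypothetical. Min-max scaling maps each variable to [0, 1] via (x − min) / (max − min), and z-score scaling centres each variable and divides by its standard deviation, (x − mean) / std.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column to the range [0, 1]. A constant column would divide by zero."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)


def z_score_scale(X):
    """Centre each column at zero and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical process data: column 0 = temperature (K), column 1 = pressure (bar)
X = np.array([[300.0, 1.0],
              [350.0, 1.5],
              [400.0, 2.0],
              [450.0, 3.5]])

X_minmax = min_max_scale(X)
X_z = z_score_scale(X)
print(X_minmax)
print(X_z)
```

When the data are split into training and test sets, the minimum, maximum, mean and standard deviation are normally computed on the training set only and reused for the test set.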


Dimensionality reduction

Rallo, R., Ferre-Gine, J., Arenas, A., & Giralt, F. (2002). Neural virtual sensor for the inferential
prediction of product quality from process variables. Computers & Chemical Engineering, 26(12), 1735-1754.


Dimensionality reduction
As the number of dimensions increases, the time and computational
complexity increase
• Variable (feature) selection
Reduces dataset size by removing irrelevant variables
• Variable (feature) extraction (transformation)
Replaces the original variables with a smaller set of new variables derived from them
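
The slide does not name a specific extraction method. As one common example, chosen here purely for illustration, principal component analysis builds new variables as linear combinations of the original ones. The sketch assumes NumPy, computes the components from the singular value decomposition of the mean-centred data, and uses hypothetical names and synthetic data in which the third variable is almost the sum of the first two.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred data onto its first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                              # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt are the component directions
    explained = s**2 / np.sum(s**2)                      # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Synthetic data: three measured variables, one nearly redundant (assumed for illustration)
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + x2 + 0.01 * rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

scores, explained = pca_scores(X, n_components=2)
print("shape of extracted features:", scores.shape)
print("variance explained by two components:", explained.sum())
```

Unlike variable selection, every extracted feature here still depends on all three original variables, so all of them must continue to be measured.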


Dimensionality reduction

Variable (feature) selection


Filter based
– Stepwise forward selection
– Stepwise backward elimination

Wrapper based
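
Stepwise forward selection, listed above, starts from an empty set and adds one variable at a time. A minimal sketch, assuming NumPy, a linear least-squares model as the scoring model, and a fixed number of variables as the stopping rule (all three are assumptions for illustration; in practice the stopping rule is often a validation error or a significance test). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, n_select):
    """Greedily add the variable that lowers the residual sum of squares the most."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        # Score every candidate together with the variables already chosen
        scores = {j: rss(X[:, selected + [j]], y) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: only columns 0 and 2 influence y (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=100)

selected = forward_selection(X, y, n_select=2)
print("selected column indices:", selected)
```

Stepwise backward elimination follows the same pattern in reverse: it starts from all variables and removes, at each step, the one whose removal increases the error the least.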


ACROSS
2. Mean and mode are examples of ______________ of univariate data.
4. Noisy data is (normal/abnormal) data.
7. The branch of statistics that is used for summarizing data is called ______ statistics.
10. Kurtosis characterizes the __________ of data.
12. The assumption of testing of data is called a ___________.
13. Raw facts are called _______.
14. Data wrangling refers to making data suitable for processing. (Yes/No)
15. Pairplot is used to visualize univariate data. (Yes/No)

DOWN
1. The average squared distance of the data from its mean is called ____________.
3. The characteristics of Big Data are volume, velocity and __________________.
5. A dataset of two variables is called __________________ data.
6. Visualization helps in presentation of data. (Yes/No)
8. Normalized covariance is called ________________.
9. Processed data is ________________.
11. Incorrect rejection of a true hypothesis is called _____________ error.