Lecture 4
CHE F315
Recap
Missing values
Outlier detection: univariate methods
Descriptive statistics
Univariate
Multivariate
26 January 2024 4
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
Data preprocessing
Multivariate outlier detection
Pani, A. K., & Mohanta, H. K. (2016). Online monitoring of cement clinker quality using multivariate statistics and
Takagi-Sugeno fuzzy-inference technique. Control Engineering Practice, 57, 1-17.
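The slide gives no formula, but the Mahalanobis distance (see the machinelearningplus reference under Useful References) is one common multivariate outlier test: d_i^2 = (x_i - x̄)^T S^-1 (x_i - x̄), where x̄ is the sample mean vector and S the sample covariance matrix. The following is a minimal NumPy sketch. It assumes roughly multivariate-normal data, a non-singular S, and a chi-square cutoff at the 97.5% level. The helper names and synthetic data are hypothetical, and this is not necessarily the method used by Pani & Mohanta (2016). The appended point is not extreme in either variable alone (|z| < 2), so a univariate test would miss it. It does break the correlation between the variables.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                 # sample mean vector
    S = np.cov(X, rowvar=False)         # sample covariance matrix (p x p)
    S_inv = np.linalg.inv(S)            # assumes S is non-singular
    D = X - mu                          # mean-centred observations
    # d_i^2 = (x_i - mu)^T S^-1 (x_i - mu), computed for every row at once
    return np.einsum("ij,jk,ik->i", D, S_inv, D)

def flag_outliers(X, cutoff):
    """Return squared distances and a boolean mask of rows beyond the cutoff."""
    d2 = mahalanobis_sq(X)
    return d2, d2 > cutoff

# Synthetic example: two strongly correlated variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])
# Each coordinate is unremarkable on its own, but the point breaks the correlation
X = np.vstack([X, [1.5, -1.5]])

# For p = 2 the chi-square(2) quantile has the closed form -2 ln(1 - q);
# for general p, scipy.stats.chi2.ppf(q, df=p) gives the corresponding cutoff.
cutoff = -2.0 * np.log(1.0 - 0.975)
d2, is_outlier = flag_outliers(X, cutoff)
print(bool(is_outlier[-1]))  # True
```

The classical mean and covariance are themselves pulled toward the outliers they are meant to detect (masking). Robust estimators, as in Chiang et al. (2003), are designed to reduce this effect.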
Useful References
https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Chiang, L. H., Pell, R. J., & Seasholtz, M. B. (2003). Exploring process data with the use of robust outlier detection algorithms. Journal of Process Control, 13(5), 437-449.
Data normalization
Min-max scaling
z-score standardization
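Min-max scaling maps each variable to [0, 1] with x' = (x - min)/(max - min). z-score standardization uses z = (x - mean)/s. The following is a minimal NumPy sketch. It assumes column-wise scaling, no constant columns, and the sample standard deviation (ddof=1). The temperature/pressure values are made up for illustration.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # assumes no constant column, i.e. x_max > x_min for every column
    return (X - x_min) / (x_max - x_min)

def z_score(X):
    """Centre each column to zero mean and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # sample standard deviation (n - 1 denominator)
    return (X - mu) / sigma

# Hypothetical process data: temperature (K) and pressure (bar)
X = np.array([[350.0, 1.2],
              [360.0, 1.5],
              [370.0, 1.8],
              [380.0, 2.1]])
print(min_max_scale(X))
print(z_score(X))
```

In practice, compute the min, max, mean, and standard deviation from the training data only. Reuse those values to scale new data, so that test data do not influence the scaling.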
Dimensionality reduction
Dimensionality reduction
As the number of dimensions (variables) increases, computation time and complexity increase. Two approaches:
• Variable (feature) selection
Reduces dataset size by removing irrelevant variables
• Variable (feature) extraction (transformation)
Transforms the original variables into a smaller set of new variables
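A small synthetic example shows the difference between the two approaches. Selection keeps a subset of the original columns. Extraction builds new variables from all of them. The variance filter used for selection and PCA (computed with an SVD) used for extraction are illustrative choices assumed here, not methods prescribed by the slide. The data and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = rng.normal(size=n)
# Hypothetical data: x2 is nearly a scaled copy of x1, x3 is almost constant
X = np.column_stack([t,
                     2.0 * t + 0.05 * rng.normal(size=n),
                     5.0 + 1e-3 * rng.normal(size=n)])

def select_by_variance(X, min_var):
    """Feature SELECTION: keep original columns whose variance exceeds min_var."""
    keep = X.var(axis=0) > min_var
    return X[:, keep], keep

def pca_extract(X, k):
    """Feature EXTRACTION: project centred data onto its first k principal directions."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T              # new variables (principal component scores)
    explained = s**2 / np.sum(s**2)     # fraction of total variance per component
    return scores, explained

X_sel, kept = select_by_variance(X, min_var=1e-4)
Z, ratio = pca_extract(X, k=1)
print(kept)       # which original columns survive selection
print(Z.shape)    # one new variable replaces three
print(ratio[0])   # near 1 because x1 and x2 are almost collinear
```

PCA is scale-dependent. Variables measured in different units are therefore usually z-scored before extraction.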
Dimensionality reduction
Wrapper-based feature selection
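A wrapper method scores each candidate subset of variables by training and evaluating the model itself. The following sketch shows greedy forward selection. It assumes an ordinary least-squares model and 5-fold cross-validated RMSE as the score. The model, the score, the stopping rule, and the helper names are illustrative assumptions. Backward elimination is the analogous strategy: it starts from all variables and removes one at a time.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5, seed=0):
    """K-fold cross-validated RMSE of an ordinary least-squares model with intercept."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    sq_err = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        A_te = np.column_stack([np.ones(len(f)), X[f]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        sq_err.append((y[f] - A_te @ coef) ** 2)
    return np.sqrt(np.mean(np.concatenate(sq_err)))

def forward_selection(X, y, max_features=None):
    """Greedy wrapper: add the variable that most lowers CV error; stop when none helps."""
    p = X.shape[1]
    max_features = p if max_features is None else max_features
    selected, best_score = [], np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(p) if j not in selected]
        # Refit and score the model once for every candidate subset
        scores = {j: cv_rmse(X[:, selected + [j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_score:
            break  # no remaining variable improves the CV score
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
# Hypothetical target that depends only on columns 0 and 3
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=120)
sel, score = forward_selection(X, y)
print(sel, score)
```

Every candidate subset requires refitting the model (here once per fold). Wrappers therefore become much more expensive than filter-type selection as the number of variables grows.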
ACROSS
2. Mean and mode are examples of ______________ of univariate data.
4. Noisy data is (normal/abnormal) data.
7. The branch of statistics that is used for summarizing data is called ______ statistics.
10. Kurtosis characterizes the __________ of data.
12. An assumption about data that is tested is called a ___________.
13. Raw facts are called _______.
14. Data wrangling refers to making data suitable for processing. (Yes/No)
15. Pairplot is used to visualize univariate data. (Yes/No)
DOWN
1. The average squared distance of data from its mean is called ____________.
3. The characteristics of Big Data are volume, velocity and __________________.
5. A dataset of two variables is called __________________ data.
6. Visualization helps in the presentation of data. (Yes/No)
8. Normalized covariance is called ________________.
9. Processed data is ________________.
11. Incorrect rejection of a true hypothesis is called a _____________ error.