Machine Learning for Chemical Engineers

CHE F315

Ajaya Kumar Pani


Department of Chemical Engineering
BITS Pilani, Pilani Campus
Lecture-5
24-01-2024
Data Preprocessing

Recap

Multivariate data
Euclidean and Mahalanobis distance
Multivariate outlier detection
Data transformation
Dimensionality reduction
Feature selection
Feature extraction (transformation)

26 January 2024
BITS Pilani, Pilani Campus

Feature selection

Supervised learning
– Predictor and response variables
– Each predictor variable is expected to contribute to the value of the response variable
– When this contribution is very little, the variable is weakly relevant (irrelevant)

Unsupervised learning
– A matrix of unlabeled data
– Similarity of data samples is evaluated based on the values of the variables
– If a variable does not contribute to deciding the similarity or dissimilarity of samples, then that variable is weakly relevant


Feature selection
Measures of feature relevance:
– Mutual information
– Correlation based similarity
– Distance-based similarity
A typical feature selection process consists of four steps:
– Generation of possible subsets
– Subset evaluation
– Stop searching based on some stopping criterion
– Validation of the result
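
The first three steps above can be sketched as a greedy forward search. This is only an illustration under stated assumptions: the slides do not name a specific subset-evaluation criterion, so the sketch uses a correlation-based merit (high feature-response correlation, low feature-feature correlation), and the data and the names `x1`, `x1_redundant` and `x_noise` are made up. Validation of the result would need held-out data and is not shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def merit(subset, features, response):
    """Correlation-based merit (an assumed criterion, not from the slides).

    Rewards correlation with the response and penalises redundancy
    (correlation between the features inside the subset).
    """
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], response)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, response):
    """Greedy forward search over feature subsets."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        # Step 1: generate candidate subsets by adding one unused feature.
        candidates = [selected + [f] for f in remaining]
        # Step 2: evaluate every candidate subset.
        scored = [(merit(c, features, response), c) for c in candidates]
        top_score, top_subset = max(scored, key=lambda s: s[0])
        # Step 3: stop when no candidate improves on the current merit.
        if top_score <= best:
            break
        selected, best = top_subset, top_score
        remaining = [f for f in remaining if f not in selected]
    return selected, best


# Hypothetical data: one informative variable, a near-copy of it, and noise.
features = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x1_redundant": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9],
    "x_noise": [1, -1, 1, -1, 1, -1, 1, -1],
}
response = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

selected, score = forward_select(features, response)
print(selected, round(score, 3))
```

With this toy data the weakly relevant `x_noise` variable is left out, because adding it lowers the merit of any subset that already contains an informative variable.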

Probability

Why Probability in ML
Designing machines that learn from observed data
Uncertainty in learning from data
Observed data can be consistent with many models, so which model is appropriate, given the data, is uncertain
Predictions about future data and the future consequences of actions are uncertain
Many aspects of learning and intelligence crucially depend on the careful probabilistic representation of uncertainty
The probabilistic framework describes how to represent and manipulate uncertainty about models and predictions
Bayesian interpretation → use of probability to quantify uncertainty

Review of basics

p(A) – probability that event A is true
p(A) = 1 → A will definitely happen
p(A) = 0 → A will definitely not happen
p(Ā) = 1 − p(A)
Joint probability p(A,B)
Conditional probability p(A|B)


Example

A plant produces a few defective products. In a lot of 1000 products, 25 are observed to be defective. If two products are selected at random for testing, without replacement, calculate the probability that both are defective.
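
One way to work this example is the product rule, p(A,B) = p(A) p(B|A), where A is "the first product is defective" and B is "the second product is defective". The sketch below uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

lot_size = 1000
defective = 25

# p(A): the first product drawn is defective.
p_first = Fraction(defective, lot_size)

# p(B|A): the second is defective given the first was defective.
# Sampling is without replacement, so one defective item and one
# item overall have already left the lot.
p_second_given_first = Fraction(defective - 1, lot_size - 1)

# Product rule: p(A,B) = p(A) * p(B|A)
p_both = p_first * p_second_given_first

print(p_both)         # 1/1665
print(float(p_both))  # approximately 0.0006
```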


Probability distribution

A mathematical model that relates the value of the variable to the probability of occurrence of that value in a population
Continuous distribution → the variable being measured is expressed on a continuous scale
Discrete distribution → the measured parameter can only take certain values (such as integers)


Example

The metal layer thickness on silicon wafers in a CVD process is normally distributed with mean 0.2508 and standard deviation 0.0005. The specification is 0.2500 ± 0.0015. What fraction of the wafers produced conforms to the specification?
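
A minimal sketch of the calculation, assuming the specification is the two-sided interval from 0.2485 to 0.2515: the conforming fraction is the normal probability between the two limits, P(LSL ≤ x ≤ USL) = Φ((USL − μ)/σ) − Φ((LSL − μ)/σ). The code uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Thickness distribution from the problem statement.
thickness = NormalDist(mu=0.2508, sigma=0.0005)

# Two-sided specification limits: 0.2500 +/- 0.0015.
lower_spec = 0.2500 - 0.0015
upper_spec = 0.2500 + 0.0015

# Fraction conforming = probability mass between the two limits.
fraction_conforming = thickness.cdf(upper_spec) - thickness.cdf(lower_spec)

print(round(fraction_conforming, 4))  # 0.9192
```

The upper limit lies only 1.4 standard deviations above the mean, while the lower limit is 4.6 standard deviations below it, so almost all nonconforming wafers are too thick.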


Probability distribution

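Central limit theorem

If $x_1, x_2, \ldots, x_n$ are independent random variables with means $\mu_i$ and variances $\sigma_i^2$, and $y = x_1 + x_2 + \cdots + x_n$, then the distribution of

$$\frac{y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

approaches the $N(0,1)$ distribution as $n$ approaches infinity.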
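
The theorem can be checked numerically with a small simulation. The sketch below is an illustration under assumed settings, not part of the lecture: each term is drawn from Uniform(0, 1) (mean 1/2, variance 1/12), and the number of terms, the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)    # arbitrary seed, for repeatability only

n_terms = 30      # n in the theorem (assumed value)
n_trials = 5000   # number of simulated sums (assumed value)
mu_i = 0.5        # mean of Uniform(0, 1)
var_i = 1.0 / 12  # variance of Uniform(0, 1)


def standardized_sum():
    """Draw y = x_1 + ... + x_n and standardize it as in the theorem."""
    y = sum(random.random() for _ in range(n_terms))
    return (y - n_terms * mu_i) / sqrt(n_terms * var_i)


z = [standardized_sum() for _ in range(n_trials)]

z_mean = fmean(z)   # should be close to 0
z_std = pstdev(z)   # should be close to 1
# For N(0,1), about 68.3 % of values lie within one standard deviation.
within_one_sigma = sum(abs(v) <= 1.0 for v in z) / n_trials

print(z_mean, z_std, within_one_sigma)
```

Although each individual term is uniformly distributed, the standardized sum already behaves much like a standard normal variable for a moderate number of terms.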


References

Montgomery, D. C. (2019). Introduction to Statistical Quality Control. John Wiley & Sons.
Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553), 452-459.
