
Seminar on Machine Learning in Chemical Engineering
Submitted By : Tushar Goel
2018chb1061
OUTLINE
1. Brief Introduction to Machine Learning and its Importance
2. Machine Learning Algorithms
3. Applications of Machine Learning in Chemical Engineering and Industries
4. Conclusion
What is Machine Learning?

 Machine learning (ML) is a mathematical prediction tool that learns automatically from
data and, based on what it has learned, makes predictions for unseen data.

 The only input this technique requires is data: given meaningful data, one can obtain
the desired results through ML.

 For example, to predict the price of a house from its basic features, such as the number
of bedrooms, bathrooms and floors, we need a sufficient amount of data in which each
house's features and price are listed.
Why Study Machine Learning?

 Machine learning is the technology of the future. Since its inception, ML has helped
humans solve problems that were previously considered unsolvable and achieve what once
seemed unimaginable, such as self-driving vehicles and speech recognition.

 ML is not restricted to any one field; it has wide-ranging applications. ML has also
found applications in chemical engineering, which has opened new doors of opportunity
for chemical engineers.

 Therefore, it is important for chemical engineers to have at least a basic knowledge
of ML, so that they do not miss an opportunity where their problem could be solved by
applying ML.
Classifications of Machine Learning
Machine learning algorithms can be classified on various grounds, but here we will consider
only two:

1. On the basis of type of data or dataset

2. On the basis of the output variable type


On the basis of type of data
In this classification, machine learning algorithms can be divided into the following two
categories.

I. Supervised Algorithms: In supervised algorithms, the data includes both the feature variables and
the corresponding labels (target variables). For instance, in a gender detection problem we are given a
dataset with two feature variables, height and weight, and an output variable, gender (male or female).
Based on this data, we have to predict the gender for unseen data points.

II. Unsupervised Algorithms: Unsupervised algorithms, on the other hand, have only feature variables and
no labels. For instance, consider the same gender detection problem, but this time the data has only the
two feature variables (height and weight) and no corresponding label (gender).
On the basis of output variable type
In this classification, machine learning algorithms can be divided into the following two
categories.

I. Regression Algorithms: In regression, the output variable is a real, continuous value. For example, in
predicting a house price from certain features of the house (such as the number of bedrooms and
bathrooms), the price is a real, continuous number.

II. Classification Algorithms: In classification problems, on the other hand, the output variable can
assume only a fixed number of values, which are called classes. An example is the same gender detection
problem, since the gender variable can take only two values (male or female).
Machine Learning Algorithms
Only the most basic and frequently used machine learning algorithms are discussed here:

01 Linear and Non-Linear Regression
02 Support Vector Machines
03 Neural Networks
Linear Regression
 In this algorithm, our objective is to find an approximate linear function that maps the
features (input variables) to the output variable.

 This function is a linear combination of the feature variables, each multiplied by a weight,
plus a bias term.

 The weights and the bias are the parameters of the model, which must be determined in order
to specify the function completely.
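
The idea of determining the weights and the bias from data can be illustrated with a minimal sketch. It assumes gradient descent on the mean squared error as the fitting procedure, which is one common choice and is not stated on the slides. The function names and the toy house data are made up for illustration.

```python
def predict(weights, bias, features):
    """Linear model: bias plus the weighted sum of the feature values."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_linear_regression(X, y, learning_rate=0.02, epochs=20000):
    """Estimate weights and bias by gradient descent on the mean squared error.

    learning_rate and epochs are illustrative values tuned for the toy data below.
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        # Gradient of the mean squared error w.r.t. each weight and the bias.
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Made-up toy data: features are (bedrooms, bathrooms), generated from
# price = 3 * bedrooms + 2 * bathrooms + 1 (arbitrary units).
toy_X = [[1, 1], [2, 1], [3, 2], [4, 2], [2, 2], [4, 3]]
toy_y = [6, 9, 14, 17, 11, 19]

weights, bias = fit_linear_regression(toy_X, toy_y)
```

On this noise-free toy data the fitted parameters should approach the values used to generate it; with real data the fit is only approximate.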
Non-Linear Regression
 As in linear regression, a non-linear function, such as a polynomial, can be fitted to
(trained on) the data if a linear function is not able to produce reasonable predictions
with low error.

 This method of non-linear fitting is called non-linear regression.

 For example, if a polynomial function of degree 2 is to be fitted and we have only two feature
variables ($X_1$ and $X_2$), the model is:

$$\tilde{Y} = b + W_1 X_1 + W_2 X_2 + W_3 X_1^2 + W_4 X_2^2 + W_5 X_1 X_2$$
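
The degree-2 model with two feature variables can be sketched in code as a feature expansion followed by a weighted sum. The function names below are hypothetical. Because the expanded model is still linear in the weights and the bias, these parameters could in principle be fitted with the same procedure as in linear regression.

```python
def degree2_features(x1, x2):
    """Expand two features into the five terms of a degree-2 polynomial."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def predict_degree2(bias, weights, x1, x2):
    """Evaluate b + W1*X1 + W2*X2 + W3*X1^2 + W4*X2^2 + W5*X1*X2."""
    terms = degree2_features(x1, x2)
    return bias + sum(w * t for w, t in zip(weights, terms))
```

For instance, `predict_degree2(1, [1, 1, 1, 1, 1], 2, 3)` adds the bias to the five terms 2, 3, 4, 9 and 6.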


Support Vector Machines (SVMs)
 The SVM is a classification algorithm. In every classification algorithm,
the goal is to find a decision boundary that separates the classes well,
but this decision boundary can take infinitely many orientations in
space as we rotate it about different axes (see Fig).

 The support vector machine solves this problem by finding the
optimal decision boundary among all these candidate solutions.
Instead of a thin, line-like decision boundary, it finds a thick decision
boundary (a thick line for 2D data, a thick hyperplane in higher
dimensions) with a margin on both sides.

 The objective of the algorithm is to maximize the width of this
margin, which yields an optimal hyperplane (as there is only one
hyperplane with the maximum margin width).
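
For linearly separable data, the margin-maximization objective is commonly written as the optimization problem below. The notation is introduced here and is not taken from the slides: $w$ and $b$ are the weights and bias of the hyperplane, $x_i$ are the training points, and $y_i \in \{-1, +1\}$ are their class labels.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

In this formulation the margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ is the same as maximizing the margin.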
Neural Networks
 Neural networks are part of deep learning, which is itself a subset of machine
learning. Deep learning is a more advanced form of ML that is mostly
used when a very large dataset is available.

 Neural networks are very similar to the machine learning models discussed above;
the difference lies in their structure. In those models there is an input layer,
which consists of the feature variables, and an output layer, which holds
the final predictions of the output variable. Neural networks have, in
addition to the input and output layers, one more type of layer between them;
these are called hidden layers (see Fig).

 Each circle in a hidden layer is called a node. Each node, just like the output
variable, makes a prediction based on the values of the previous layer, and the
values in the nodes of the current layer are in turn used to compute the values
of the nodes in the following layer. In this way the process of prediction
continues until it reaches the output layer.
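
The layer-by-layer prediction described above can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the sigmoid activation in the hidden layers and the plain weighted sum in the output layer are choices made here, not details given on the slides, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weights, biases, activation):
    """Compute one layer: each node takes a weighted sum of the previous layer."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def forward(features, hidden_layers, output_layer):
    """Propagate the features through the hidden layers to the output layer.

    hidden_layers is a list of (weights, biases) pairs, one per hidden layer;
    weights holds one row of incoming weights per node.
    """
    values = features
    for weights, biases in hidden_layers:
        values = layer_output(values, weights, biases, sigmoid)
    out_weights, out_biases = output_layer
    # Assumed here: the output layer is a plain weighted sum (no activation).
    return layer_output(values, out_weights, out_biases, lambda z: z)
```

With all hidden weights and biases set to zero, every hidden node outputs `sigmoid(0) = 0.5` regardless of the input, which makes a convenient sanity check.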
Machine Learning in Chemical Engineering and
Industrial Applications
I will discuss three ML applications in chemical-engineering-related industries, which will
help readers understand how ML is implemented and provide insight into solving real-world
problems.

i. Support Vector Machines for Quality Monitoring in a Plastic Injection Molding Process
ii. Dissolved Gas Analysis
iii. Big Data in Oil and Gas Pipelines
Support Vector Machines for Quality Monitoring in a Plastic Injection Molding Process
Background of the Problem
 Injection molding is an industrial process used for manufacturing plastic objects such as chairs and
buckets. The major problem in this manufacturing process is that the product quality keeps fluctuating.

 A large number of process parameters, such as temperature, pressure and flow rate, control the
quality of the molded parts, and even a slight variation in one of these parameters results
in a considerable variation in quality.

 Currently, product quality is monitored mainly through statistical analysis of the related data
(the parameter data and the corresponding part-quality data). The parameters are monitored by an
operator, who manually adjusts their set points based on intuition.

 This trial-and-error procedure leads to inconsistent product quality. Thus, a more reliable
method is required that provides better control over product quality.
Machine Learning Solution
 A more sophisticated solution can be achieved with the Support Vector Machine (SVM) model,
which can analyze the data better and predict defects in the product from the process
parameters. This in turn helps in determining appropriate set points for those parameters.

 The model's input features are six process parameters measured by various sensors:
cycle time, metering time, injection time, barrel temperature before nozzle, cushion, and injection
velocity.

 The output of the model is a measure of product quality, represented by six quality
variables: streaks, stains, burn marks, edges, unfilled parts and warpage. These have been found
to be representative of the range of defects that have occurred so far in the plant.

 This makes it a multiclass classification problem, which can be solved with SVMs with
very high classification accuracy.
Dissolved Gas Analysis
Background of the Problem
 Dissolved Gas Analysis (DGA) is a technique for assessing the condition of electrical
transformers, which are employed in various industries.

 During normal operation, all transformers generate some amount of gases, such as hydrogen,
methane, ethane, ethylene, acetylene, carbon monoxide, and carbon dioxide, which dissolve
in the transformer oil.

 When a transformer becomes faulty, these gas concentrations increase compared to those of
a normally operating transformer. DGA involves measuring these gas concentrations and
analyzing them in order to detect anomalies in the transformer's functioning.

 Although DGA has been in use for several decades, there is no unique and universally
accepted method of analyzing and interpreting the measured gas concentrations.
Machine Learning Solution
This problem of anomaly detection in transformers can be solved through a classification approach as
well as a regression approach. In both formulations the input features are the same, namely the seven
gas concentrations, but the output variable is different.

1) Classification: In the classification formulation, it is assumed that the transformer is either in
"normal" condition or in "faulty" condition. For the normal case, the label y is assigned the value "1",
while in the faulty case it is assigned "0". The labelling convention is that if the DGA measurement
was taken at least five years prior to failure, the corresponding label is normal; otherwise it is
faulty.

2) Regression: In the regression formulation, in addition to the time stamp of each DGA measurement,
we also have information about the time of failure. This information is used to obtain more
informative labels y ∈ [0,1], where y = 0 means “bound to fail”, y = 1 means “should not fail in
the foreseeable future”, and values between those two extremes quantify the risk of failure.
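
The two labelling schemes can be sketched as follows. The five-year threshold for the classification label comes from the convention stated above. The linear ramp used for the regression label is only an assumption made here for illustration, since the slides do not say how the intermediate values are computed. The function names are hypothetical, and transformers that never failed are not handled.

```python
from datetime import date

NORMAL, FAULTY = 1, 0
YEARS_THRESHOLD = 5.0  # from the slides: "at least five years prior to failure"


def years_between(earlier, later):
    """Approximate number of years between two dates."""
    return (later - earlier).days / 365.25


def classification_label(measured_on, failed_on):
    """Return 1 (normal) if measured at least five years before failure, else 0."""
    if years_between(measured_on, failed_on) >= YEARS_THRESHOLD:
        return NORMAL
    return FAULTY


def risk_label(measured_on, failed_on, horizon_years=5.0):
    """Hypothetical regression label in [0, 1].

    Assumption (not from the slides): the label rises linearly from 0 at the
    time of failure to 1 at horizon_years before failure, and is clamped.
    """
    t = years_between(measured_on, failed_on)
    return min(1.0, max(0.0, t / horizon_years))
```

A measurement taken two years before failure would be labelled faulty in the classification scheme and would receive an intermediate value under the assumed ramp.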
Big Data in Oil and Gas Pipelines
Background of the Problem
 The natural gas and crude oil produced in industry are transported through long metallic pipes. Owing
to extreme weather conditions, these pipes are prone to many types of defects, such as corrosion,
cracks and dents.
 Therefore, it is very important to devise an efficient and effective method to locate these
defects and to identify the defect type (corrosion, crack, dent, etc.).
 One such method, widely employed in industry, uses Magnetic Flux Leakage (MFL) signals. In this
method, a magnetic field is produced and passed through the pipeline. Some of the magnetic flux
leaks out and is detected by the MFL sensors, which are installed along the length of the pipeline
at equal distances.
 When a defect is encountered along the pipe, the magnetic flux leakage increases, so the MFL
sensors record a high amplitude at that location, which indicates the presence of a defect.
 Also, different types of defects will have certain characteristic patterns in the graph plotted between the
Magnetic Flux Leakage components (, and ) and the length of the pipe. Thus, studying these graphs can
help in identifying the type of defect present.
Machine Learning Solution
 A further problem is estimating the defect depth, which is quite difficult to do manually.

 The MFL signal increases in magnitude as the defect depth increases, but this relationship is
very complex and cannot be determined analytically.

 Therefore, machine learning models are used to capture this relationship. The MFL sensor
recordings serve as the input features for the model. As the number of sensors is very large,
the dimensionality of the features is first reduced through dimensionality reduction techniques.

 The output variable is the defect depth, which is a real number.

 Thus, we have the features and the output variable needed for training the model.
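
The dimensionality reduction step could look like the following minimal sketch. It assumes principal component analysis (PCA), which is one common technique; the slides do not say which method was used. It keeps only the first principal component, found by power iteration. The function names and the toy sensor readings are made up for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, direction) of the first principal component of rows.

    rows is a list of equally long feature vectors (at least two rows).
    Assumes the uniform start vector is not orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Reduce one feature vector to a single number along the direction."""
    return sum((row[j] - mean[j]) * direction[j] for j in range(len(row)))


# Made-up readings from three sensors that rise and fall together.
toy_readings = [[t, 2.0 * t, 2.0 * t] for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
mean, direction = first_principal_component(toy_readings)
reduced = [project(r, mean, direction) for r in toy_readings]
```

Each row of sensor readings is reduced to one number in `reduced`, which could then serve as the input feature of a regression model for the defect depth. In practice more than one component would usually be kept.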
Conclusion
 We have explored a few applications of machine learning in chemical-engineering-related
industries. These applications have substantially reduced human effort, saved time and
money, and made feasible what was previously considered impossible or impractical.

 These applications of ML in chemical engineering are merely a beginning. There is a lot more
to come, provided that we chemical engineers, instead of holding on to conventional thinking,
start to incorporate machine learning into our curriculum.

 The integration of chemical engineering and machine learning is a revolutionary step towards
reaching targets once thought unattainable and solving previously unsolvable problems in
chemical engineering.
