
Unibertsitate Masterra

Konputazio Ingeniaritza eta Sistema Adimentsuak


                                             

Konputazio Zientziak eta Adimen Artifiziala Saila – Departamento de Ciencias de la Computación e Inteligencia Artificial

Master's Thesis

GANs & Friction Stir Welding

by
Ali Hamza
Supervisor(s)

Ramón Moreno
LORTEK

Josune Gallego, Carmen Hernández


Konputazio Zientziak eta Adimen Artifiziala saila
Informatika Fakultatea

KZAA / CCIA

September 2021
Abstract

Generative adversarial networks (GANs), a promising area in the field of deep learning, have made it possible to generate realistic images of human faces, change the style of an image, and generate voice and text. The generation of realistic images using GANs is an interesting research area and is the main concern of this thesis. Over the past years, much work has gone into generating images from scratch, but little into generating images together with their ground truth. This work focuses on using GAN methods to generate images and their ground truth simultaneously. Several new methods have been tried to achieve this goal, and the results so far are good.

On the other hand, we have worked on the detection of defects in Friction Stir Welding.

Keywords: Machine Learning, supervised/unsupervised learning, GANs, Friction Stir Welding, DL, data augmentation.

Contents

Abstract

List of Figures

1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Main contributions and structure of the thesis

2 Basic concepts
2.1 Deep Learning (DL)
2.1.1 Convolutional Neural Networks (CNN)
2.1.2 Convolution Layer
2.1.3 Transposed Convolutional Layer
2.1.4 Ground Truth
2.2 Generative Adversarial Neural Networks
2.2.1 Impact and some applications of GANs

3 Our GAN
3.1 The Data
3.2 How to define and train the generator model
3.3 How to define and train the discriminator model
3.4 Loss Functions and Optimizers
3.5 Training
3.6 Results
3.6.1 No Convergence
3.6.2 Mode Collapse
3.6.3 Non-informative loss
3.7 Solving problems and results with high resolution

4 Investigating Generating Images and GT
4.1 Fourth channel (GT as the fourth channel)
4.2 Joint Images
4.3 Two generators
4.4 Similarity Loss

5 Data Analysis on Industrial Application on FSW
5.1 Background
5.2 Problem to solve
5.3 Solution
5.4 The Application
5.4.1 Optional Application

6 Conclusions and future work
6.1 Conclusions
6.2 Future work

Bibliography
List of Figures

2.1 AI vs. ML vs. DL.
2.2 Explanation of the convolutional operation.
2.3 Transposed convolutional layer.
2.4 Image and the GT.
2.5 GAN Architecture.
2.6 Image to Image Translation examples.
3.1 Four Real Images.
3.2 Generator Model.
3.3 Discriminator Model.
3.4 Project working pipeline.
3.5 GAN architecture with noise.
3.6 This figure shows the training at epoch 10. As we can see, the model has learned the color and a bit of the structure of the real images.
3.7 This figure shows the training at epoch 60. The model has learned the structure of the real images.
3.8 This figure shows the training at epoch 200. Generated images are better.
3.9 This figure shows the training at epoch 500. The model has learned to create defective images.
3.10 This figure shows the training at epoch 700. There is no major improvement.
4.1 Our training input with four channels.
4.2 Images generated by the four-channel model.
4.3 Two images on one canvas.
4.4 Joint images.
4.5 Joint GAN framework. G_i, G_t, D_i, and D_t are the generators and discriminators for the image and the ground truth. X_r^* are the real examples and X_g^* are the generated samples.
4.6 Four generated images.
5.1 Basic principles of "Friction Stir Welding".
5.2 Robotic FSW Welding and Aluminum Extrusion Welding Facility at LORTEK.
5.3 Poor quality part with excess burr resulting from excessive penetration of the tool in the penetration phase.
5.4 Good quality part resulting from appropriate penetration of the tool.
5.5 Evolution of the axial force monitored during the penetration and welding phases of a poor quality part (shown in Figure 5.3).
5.6 Evolution of the axial force monitored during the penetration and welding phases of a good quality part (shown in Figure 5.4).
5.7 Bad welding on the left side vs good welding on the right side. The green line is the force we apply to the material and the red line is the real force.
5.8 Good welds vs Bad weld.
5.9 The Application detecting bad welds.
5.10 Our Plot application.


Acronyms

AI Artificial Intelligence.

CNN Convolutional Neural Network.

CSV Comma Separated Value file.

DL Deep Learning.

FSW Friction Stir Welding.

GAN Generative Adversarial Network.

GT Ground Truth.

ICA Independent Component Analysis.

ML Machine Learning.

MLP Multi-Layer Perceptron.

NN Neural Network.

UL Unsupervised Learning.

Chapter 1

Introduction

1.1 Motivation

The motivation for this project comes from the recent arrival of Generative Adversarial Networks (GANs) and the innovative possibilities they are opening in many different fields. The GAN architecture arose in 2014 and, since then, it has been highlighted as a potential alternative for data augmentation and missing-data problems, amongst others, due to its capability to generate realistic data instances.

1.2 Objectives

This project, created in partnership with Lortek, has as its first objective the creation of a GAN model that generates new images. This GAN model will be used in many projects for data augmentation. The second objective is to identify bad weldings given Comma Separated Value (CSV) files. It should be noted that this second task is completely different from the work on GANs.

Knowing the objectives, we subdivide the work into smaller tasks:

• Task One Part 1: we are going to work with GANs to create synthesized images and, once the first milestone is achieved, we will try to improve the quality of the images to make them look as real as possible.

• Task One Part 2: this time, in addition to creating the image, we will also create the Ground Truth (GT) at the same time, with some modifications to the model.

• Task Two: we will work on creating an application that is able to tell whether two pieces have been welded correctly or not.

1.3 Main contributions and structure of the thesis

The work has been developed in a preliminary sense, with the aim of describing the creation of images using GANs. If we succeed in creating them, we will go ahead and create the ground truth and the image simultaneously. After passing this stage, we shall analyse FSW data to detect whether or not the piece has been welded correctly.

The current work has three main contributions, and there are three months to complete the thesis, so the work is divided in the following way:

In the first month, we are going to focus on studying and reviewing the previous work that has been done and on creating our own GAN model to generate new images.

In the second month, after succeeding in the previously mentioned work, we will execute the idea of creating the image and the ground truth together.

In the third month, analysis skills will come into action, since it is necessary to analyse the FSW data and create a classifier to recognize whether or not the piece has been welded correctly. Finally, we summarize the general and particular conclusions of the developed work, and we present a future research project.

If we have trouble training and generating images with GANs, we will spend our second month on that and skip the simultaneous creation of the images and the Ground Truth (GT). The GANs will be used for data augmentation, and the FSW application will be installed on the corresponding machine to stop welding in case failures are detected.
Chapter 2

Basic concepts

We will introduce some basic concepts needed to understand the thesis. These concepts are mainly focused on Deep Learning (DL) and are just a basic introduction to the topics that we will reference throughout the document. Some of the methods and strategies used will be further explained alongside the project.

2.1 Deep Learning (DL)

Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial Intelligence. Artificial Intelligence is a general term that refers to techniques that enable computers to mimic human behavior. Machine Learning represents the set of algorithms, trained on data, that make all of this possible.


Figure 2.1: AI vs. ML vs. DL.

Deep Learning, on the other hand, is just a type of Machine Learning, inspired by the structure of the human brain. Deep learning algorithms attempt to draw similar conclusions as humans would by continually analyzing data with a given logical structure.

2.1.1 Convolutional Neural Networks (CNN)

A Convolutional Neural Network (CNN) is a deep learning algorithm that can take an image as input, assign importance to various aspects or objects in the image, and differentiate one from the other. The architecture of a CNN is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons learn something from the images and, layer by layer, they become able to identify an object in the image. CNNs are mostly used to work with images.

Normally, images are too big to work with each pixel individually, so the role of the CNN is to reduce the images into a form that is easier to process and work with. The good thing about CNNs is that the relevant information is preserved, which makes them very powerful. There are various types of convolutional layers but, in this work, we are going to focus on two of them, ConvolutionND and ConvolutionNDTranspose.

2.1.2 Convolution Layer

The convolution layer on an input of size i × i is defined by the padding (p) and the stride (s), where p is the number of zeros padded around the original input, increasing the size to (i + 2p) × (i + 2p) (see Figure 2.2), and s is the amount by which the kernel is shifted when sliding across the input image.

Figure 2.2: Explanation of the convolutional operation.

This operation will be very important in our work, as we will use it to reduce the dimensionality of the input space.
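To make this arithmetic concrete, here is a minimal sketch in PyTorch (the thesis mentions both Keras and PyTorch; the layer sizes here are illustrative, not the exact configuration used in our models):

```python
import torch
import torch.nn as nn

# Convolution arithmetic from above: for an i x i input with padding p,
# stride s and kernel size k, the output side is floor((i + 2p - k)/s) + 1.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 64, 64)   # one RGB image of size 64 x 64
y = conv(x)
print(y.shape)                  # torch.Size([1, 8, 32, 32]): (64 + 2 - 3)//2 + 1 = 32
```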

2.1.3 Transposed Convolutional Layer

A transposed layer is usually used for upsampling, that is, to generate an output feature map that has a spatial dimension greater than that of the input feature map. In our case, we are going to use the transposed layer in our generator to generate images from a small input, which is noise. We can explain the transposed layer as a four-step process, like Aqeel Anwar in [1] (see Figure 2.3; a code sketch follows the list).

1. Calculate the new parameters z and p′.

2. Between each row and column of the input, insert z zeros. This increases the size of the input to (2i − 1) × (2i − 1).

3. Pad the modified input image with p′ zeros.

4. Carry out a standard convolution on the image generated in step 3 with a stride length of 1.
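As a complement to the four steps above, here is a minimal PyTorch sketch of the equivalent operation (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

# A stride-2 transposed convolution roughly doubles the spatial size of a
# feature map; kernel size 4 with padding 1 gives exactly 2x, a common
# DCGAN-style choice.
up = nn.ConvTranspose2d(in_channels=256, out_channels=128,
                        kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 256, 4, 4)   # small feature map, e.g. reshaped from a dense layer
y = up(x)
print(y.shape)                  # torch.Size([1, 128, 8, 8])
```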
Figure 2.3: Transposed convolutional layer.

2.1.4 Ground Truth

Ground truth refers to the actual nature of the problem that is the target of a machine learning model, reflected by the relevant datasets associated with the use case in question. Supervised machine learning models are trained on labeled data that are considered "ground truth", so that the model can identify patterns that predict those labels on new data.

Nearly all the time, you can safely treat it the same as the label, and in this problem we are going to use the Ground Truth (GT) as the label. As we can see in Figure 2.4, the left image represents the welding image, and the image on the right side is the ground truth, where the white part represents the defect.

Figure 2.4: Image and the GT.

2.2 Generative Adversarial Neural Networks

GANs are an approach to generative modeling using deep learning methods, such as convolutional neural networks. Generative modeling is an unsupervised learning task in machine learning: it involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate new examples that plausibly could have been drawn from the original dataset. According to S. Goyal [7], GANs are a specific way of training a generative model by framing the problem as supervised learning with two sub-models: the generator model, which is trained to generate new examples, and the discriminator model, which tries to classify examples as either real or fake. Both models are trained together in an adversarial, zero-sum game until the generator manages to fool the discriminator about half the time, meaning that the generator model is generating plausible examples. The generator model generates images from random noise, where the random noise is an input sampled from a uniform or normal distribution that is fed into the generator to produce an image. The generator output, which is fake images, and the real images from the training set are fed into the discriminator, which learns how to differentiate fake images from real images. See the basic GAN architecture in Figure 2.5.

The objective function of the complete network is the following:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{2.1}
\]

where:

• p_data represents the distribution of the real data.

• p_z represents the distribution of the noise (usually a Gaussian distribution) from which we can generate a fake image.

• x and z represent the samples from each corresponding space.

• E_x and E_z represent the expected log-likelihood over the outputs for real and generated images, respectively.

• The D function outputs a real number between 0 and 1, representing the probability that the data is real (1) or fake (0). The G function, on the other hand, outputs a generated sample or instance.

Figure 2.5: GAN Architecture.

2.2.1 Impact and some applications of GANs

Artificial Intelligence (AI) is continuously increasing its capabilities. In June 2014, GANs started a revolution in deep learning. The rise of GANs was inevitable and, soon, they became a global phenomenon in Artificial Intelligence. AI is now capable of creating new data after learning from unlabeled input data; thus, it has become a "creative intelligence".

Some applications:

• Predicting the next frame in a video: you train a GAN on video sequences and let it predict what would occur next.

• Image to Image Translation: generate an image from another image. For example, given the labels of a street scene (on the left of Figure 2.6), you can generate a realistic photo with a GAN; given a simple drawing of a handbag (on the right), you get a realistic image of a handbag.

Figure 2.6: Image to Image Translation examples.

• Text to Image Generation: just ask your GAN what you want to see and get a realistic photo of the target.

• Sometimes GANs can be dangerous: for example, they can create fake images that seem real, and this fact can have serious ethical, social, and political consequences. There are people working to create other AI systems that can identify whether images are real or fake.
Chapter 3

Our GAN

3.1 The Data

Figure 3.1: Four Real Images.


As we can see in Figure 3.1, we have several welding images that have been taken with a special camera, covering just the welded area of the piece. In addition, as we can see in Figure 2.4, a professional has been in charge of labeling the area where the welding has not been done correctly; this label will be useful when we want to create the welding images and their ground truth at the same time.

3.2 How to define and train the generator model

The generator model is responsible for creating new fake images from random noise. The random noise is an arbitrary vector of Gaussian-distributed values. The noise is in itself meaningless but, by providing it to the generator model during training, the generator assigns meaning to the points, so that by the end of the training the noise represents a compressed representation of the output space. If there is no overfitting in our model, we will be able to create new images by providing new random Gaussian noise.

The job of the generator model is to transform the Gaussian-distributed numbers into 2D image values. The structure of the generator's neural network can be arbitrary, allowing you to use a multilayer perceptron, a convolutional neural network, or any other structure, as long as the dimensions of the input and output match the dimensions of the latent space and the real data. In our case, we use a Dense layer as the first hidden layer, with enough nodes to represent a low-resolution version of the output image. The number of nodes depends on the image, so you have to experiment with different numbers of nodes. The activations from these nodes can then be reshaped into something similar to an image, in our case (N, 4, 4, 256). After the first layer, there are many ways to do the upsampling, sometimes called deconvolution. We are going to use the Conv2DTranspose layer, configured with a stride of (2 × 2), which will quadruple the area of the input feature maps. After each Conv2DTranspose layer, we add a LeakyReLU layer with alpha equal to 0.2. These two layers can be repeated to reach the desired resolution. The output layer of the model is a Conv2D with 3 filters, one for each channel. This layer has a padding of (1, 1) because we do not want to change the size of the image generated in the previous layers. A tanh activation is used to ensure output values in the desired range of [−1, 1], a current best practice.

Input: N-element vector of Gaussian random numbers

Output: RGB image with pixel values in [−1, 1]

Figure 3.2: Generator Model.
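A minimal sketch of such a generator, written in PyTorch for illustration (the channel counts and the 32 × 32 output size are assumptions for the example, not the exact model used in the thesis):

```python
import torch
import torch.nn as nn

latent_dim = 100

# Dense layer to a 4x4x256 feature map, three stride-2 transposed
# convolutions with LeakyReLU(0.2), and a final 3-filter convolution
# with tanh, as described above.
generator = nn.Sequential(
    nn.Linear(latent_dim, 4 * 4 * 256),
    nn.LeakyReLU(0.2),
    nn.Unflatten(1, (256, 4, 4)),                          # low-resolution "image"
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 4x4   -> 8x8
    nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 8x8   -> 16x16
    nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),    # 16x16 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 3, 3, stride=1, padding=1),              # 3 channels, size unchanged
    nn.Tanh(),                                             # outputs in [-1, 1]
)

z = torch.randn(16, latent_dim)    # batch of 16 latent vectors
print(generator(z).shape)          # torch.Size([16, 3, 32, 32])
```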



3.3 How to define and train the discriminator model

This model takes a sample as an input and outputs a classification prediction as to whether the sample is real or fake.

Input: RGB image

Output: Binary classification (real or fake)

Figure 3.3: Discriminator Model.

The discriminator model has a normal convolutional layer followed by four convolutional layers using a stride of 2 × 2 to downsample the input. The model is trained to minimize the binary cross-entropy loss function. We have tried LeakyReLU and ReLU and the results were similar. We also use Dropout, and the Adam version of stochastic gradient descent with a learning rate of 0.0002 and a momentum of 0.5, as recommended in [16].

Basically, we could train this model with real examples with class labels of one and randomly generated samples with class labels of zero, and use this model for binary classification. After a few epochs, this model is able to classify correctly, because deep learning models are good at classification.
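A minimal PyTorch sketch of such a discriminator (assuming 64 × 64 RGB inputs; the channel counts are illustrative):

```python
import torch
import torch.nn as nn

# One ordinary convolution followed by four stride-2 convolutions that
# downsample the input, with LeakyReLU and Dropout, ending in a single
# sigmoid unit that outputs the probability of the input being real.
discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=1, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1),    # 64 -> 32
    nn.LeakyReLU(0.2),
    nn.Dropout(0.3),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 32 -> 16
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 16 -> 8
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 256, 4, stride=2, padding=1),  # 8 -> 4
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 1),
    nn.Sigmoid(),
)

x = torch.randn(16, 3, 64, 64)
print(discriminator(x).shape)      # torch.Size([16, 1])
```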

3.4 Loss Functions and Optimizers

Having the generator model and the discriminator model, we can specify how they learn through the loss functions and optimizers. There are many loss functions but, as mentioned before, we are going to use the binary cross-entropy loss function because our target values are {0, 1} (fake, real). The loss function is evaluated first, and the models are then updated accordingly. Cross-entropy calculates a score that summarizes the average difference between the actual and predicted probability distributions for the positive class; the score is minimized, and a perfect cross-entropy is 0. Our model defines the real label as 1 and the fake label as 0. These labels are used when calculating the losses of the Discriminator and the Generator, and this is also the convention used in the original GAN paper. As an optimizer, we are going to use Adam with a learning rate of 0.0002 and beta1 = 0.5. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.
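In code, the setup described above amounts to a few lines (a sketch, assuming the `generator` and `discriminator` modules from the previous sections):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()            # binary cross-entropy

real_label, fake_label = 1.0, 0.0   # convention from the original GAN paper

# One Adam optimizer per model, lr = 0.0002 and beta1 = 0.5.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
```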

3.5 Training

We update the parameters of the discriminator and the generator at each training iteration. As is generally done for all neural networks, the training process consists of two loops: one over the training epochs and one over the batches within each epoch. We construct separate mini-batches for real and fake images, and we also adjust the Generator's objective function to maximize log D(G(z)). The goal of training the discriminator is to maximize the probability of correctly classifying a given input as real or fake. Practically, we want to maximize log(D(x)) + log(1 − D(G(z))). Due to the separate mini-batches, this is done in two steps.

First, we pass a batch of real samples through the Discriminator, calculate the loss (log(D(x))), and then calculate the gradients in a backward pass. Secondly, we construct a batch of fake samples with the current generator, forward pass this batch through the Discriminator, calculate the loss (log(1 − D(G(z)))), and accumulate the gradients with another backward pass. With the gradients accumulated from both the all-real and all-fake batches, we call a step of the Discriminator's optimizer. Next, we train the Generator by maximizing log D(G(z)) to generate better fakes. We accomplish this by classifying the Generator output from the previous step with the Discriminator, computing the Generator's loss using the real labels as GT, computing the Generator's gradients in a backward pass, and finally updating G's parameters with an optimizer step. It should be noted that the generator model is only concerned with the discriminator's performance on fake samples. These two steps are repeated as many times as the user defines in the epochs variable; one epoch is when the entire dataset is passed forward and backward through the neural network once. The number of batches within an epoch is defined by how many times the batch size divides into the training dataset.
When we train the model, we do not do it one image at a time; we choose n images and train with all of them at the same time so that training is faster. This stack of n images is called the batch size. In most cases, the batch size is limited by the memory size of the GPU.
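The two-step procedure described above can be sketched as follows (assuming the models, criterion, optimizers, and `latent_dim` from the previous sketches, and a `dataloader` yielding batches of real images scaled to [−1, 1]):

```python
num_epochs = 500                      # user-defined

for epoch in range(num_epochs):
    for real in dataloader:
        b = real.size(0)
        ones = torch.full((b, 1), 1.0)    # real labels
        zeros = torch.full((b, 1), 0.0)   # fake labels

        # (1) Update D: maximize log D(x) + log(1 - D(G(z))),
        # accumulated over separate real and fake mini-batches.
        opt_d.zero_grad()
        loss_real = criterion(discriminator(real), ones)
        loss_real.backward()
        fake = generator(torch.randn(b, latent_dim))
        loss_fake = criterion(discriminator(fake.detach()), zeros)
        loss_fake.backward()
        opt_d.step()

        # (2) Update G: maximize log D(G(z)) by using the real labels
        # as targets for the fake batch.
        opt_g.zero_grad()
        loss_g = criterion(discriminator(fake), ones)
        loss_g.backward()
        opt_g.step()
```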

For the image selection, the dataset is shuffled before each epoch and we take the batches in order. For training the model, we have to repeatedly retrieve samples of real images and samples of generated images and update the model for a fixed number of iterations. After some iterations, the model learns to discriminate between real and fake images very well, because neural networks are very good discriminators.

We update the discriminator separately for real and fake batches so that we can track the accuracy of the model. Once we have both models, we instantiate them, as it is good practice. Following the DCGAN paper [16], all model weights are randomly initialized from a Normal distribution with mean = 0 and stdev = 0.02. It should be noted that both models use batch normalization. Batch normalization seeks to optimize the model by replacing the "complicated interaction between all of the weights of all of the layers" [Goodfellow, 2017] used to calculate mean and variance features with single mean and variance parameters [Goodfellow, 2017]. Batch normalization was seen by the authors of DCGAN as one of the key reasons for the model's success; it is used in both G and D and is regarded as essential to the model [Radford et al., 2015; Salimans et al., 2016; Qi, 2017].
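A sketch of that initialization (the conv values follow [16]; the batch-norm values follow common DCGAN implementations and are an assumption here):

```python
import torch.nn as nn

def weights_init(m):
    # Conv and transposed-conv weights ~ N(0, 0.02), as specified in [16].
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    # BatchNorm scale ~ N(1, 0.02), bias 0 (common DCGAN practice).
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

generator.apply(weights_init)
discriminator.apply(weights_init)
```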

3.6 Results

Figure 3.4: Project working pipeline.

As we can see in Figure 3.4 (a), the model first learns the color; later, in Figure 3.4 (b), it learns the structure of the real images; and finally, in Figure 3.4 (c) and Figure 3.4 (d), the generated images improve considerably. The results are good and the images are close to the real data, so the first step is done; however, the generated images have a low resolution, as can be seen in Figure 3.4, so in the next step we are going to try to upscale the generated images.

One may think that changing the layers in the generator and in the discriminator to make them compatible with a different image size would be enough but, in practice, we observed that this is not the case. The discriminator learns too fast, creating stability problems. We tried to stabilize it by playing with the learning rate, but the result wasn't satisfactory. GAN training presents several challenges, which are currently subjects of research; among them, the most common problems are:

• No convergence

• Mode collapse

• Non-informative loss



3.6.1 No Convergence

The generator and the discriminator cannot reach an equilibrium: the loss functions of the generator and the discriminator begin to oscillate without achieving long-term stability. Although it is common in GANs for the loss functions to oscillate at the beginning, the objective is that stability is achieved as the training progresses. When this does not occur, the generator keeps producing samples, but their quality does not improve.

3.6.2 Mode Collapse

This occurs when the generator produces similar samples even though the inputs have very different characteristics. The generator finds that a small set of samples fools the discriminator and then is not able to produce others. In these cases, the gradient of the loss function remains stagnant at a value close to 0.

3.6.3 Non-informative loss

Although it seems natural to think that the lower the loss of the generator, the higher the quality of the samples it produces, this is not so immediate. The loss of the generator must be compared with that of the discriminator, which is constantly improving, so it is not easy to evaluate the improvement of the model: the generator could be producing increasingly higher-quality samples even as its loss function increases.

3.7 Solving problems and results with high resolution

To solve the problems encountered during the previous training, we tried a few well-known strategies for training GANs, such as adding noise or spectral normalization, suggested in [8], [11]. A few of them helped to stabilize the model and get images with higher resolution. Some of the tricks used:
CHAPTER 3. OUR GAN 24

Adding Noise: as supported by Padala [10], making the training of the discriminator more difficult is beneficial for the overall stability, so we added noise to both the real images and the fake images.

Figure 3.5: GAN architecture with noise.

Spectral Normalization: many papers show that spectral normalization [11], [8], a particular kind of normalization applied to the convolution kernels, can help the stability of the training. In our case, we used it on each convolution layer.
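Both tricks take only a few lines in PyTorch; this is a sketch (the noise level 0.05 is illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Gaussian noise added to every batch (real and fake) before the
# discriminator sees it, to make its task harder [10].
def add_noise(images, std=0.05):
    return images + torch.randn_like(images) * std

# Spectral normalization [11] applied to a convolution layer; in our
# setting, every convolution of the discriminator would be wrapped this way.
sn_conv = spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1))
```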

It should be noted that the resolution of our real images was 256 × 256 and, for our models, all images had to be the same size. This problem was solved easily with Keras, PyTorch, or OpenCV, because in the data-loader function we can rescale the input images. After all these changes and improvements, we could generate really good images; see Figures 3.6, 3.7, 3.8, 3.9 and 3.10.

Figure 3.6: This figure shows the training at epoch 10. As we can see, the model has learned the color and a bit of the structure of the real images.

Figure 3.7: This figure shows the training at epoch 60. The model has learned the structure of the real images.

Figure 3.8: This figure shows the training at epoch 200. Generated images
are better.

Figure 3.9: This figure shows the training at epoch 500. The model has
learned to create defective images.

Figure 3.10: This figure shows the training at epoch 700. There is no major improvement.

The samples generated in this part were good enough to be valid and usable for other tasks. Moreover, we have been able to upscale our images by making some changes: in the first part the generated images were good, but after the upscaling they were better. At the end of the training, the generator model is saved to a file. This model can be loaded and used to generate new random but plausible samples resembling the original dataset. Since the images are generated from random noise, the generated images are completely random; that is, they may or may not contain a defect.
Chapter 4

Investigating Generating Images and GT

So, we have created images with GANs; now we are going to try to generate two images at the same time. The motivation behind this idea is that, when we work with deep learning models to detect a property in an image, we have to label that image and generate a labeled image pointing out the property. For this job, a professional has to label the image, and it can be expensive work. To avoid this work, we are going to generate two images with our GAN: the image containing the welding and also the labeled image.

So, in this chapter, we are going to try to generate the image and its labeled image from the same noise, and for that we have tried three different ways to handle this problem.

• Fourth channel (GT as the fourth channel).
• Joint images (image and Ground Truth (GT) on the same canvas).
• Two generators.


4.1 Fourth channel (GT as the fourth channel)

Here we used the model we created in the previous part and made some changes. In the pre-processing, we had two folders, one for the images and the other for the ground truth. We read both of them and used Python's numpy to join the ground-truth image to the real image as a fourth channel. Normally, images have 3 channels (RGB), where each channel represents one color; sometimes images have 4 channels, where the fourth channel represents the alpha value. In this part, we used the ground-truth image as the fourth channel of the image: we read the ground-truth image, converted it to a binary image, and used the concatenate function to join it. See Figure 4.1. The reason why we believe this method can work is that the color the model generates matches the original images, so we could say that each channel learns correctly, and we believe that the fourth channel will learn in the same way.

Once we had our four-channel images, we had to change our models to make
them compatible with the four-channel images.

Figure 4.1: Our training input with four channels.
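A minimal sketch of this pre-processing with numpy and OpenCV (the file names are illustrative):

```python
import numpy as np
import cv2

# Load the welding image (RGB) and its ground truth (grayscale).
image = cv2.cvtColor(cv2.imread("images/weld_001.png"), cv2.COLOR_BGR2RGB)
gt = cv2.imread("ground_truth/weld_001.png", cv2.IMREAD_GRAYSCALE)

# Binarize the ground truth and attach it as the fourth channel.
_, gt_bin = cv2.threshold(gt, 127, 255, cv2.THRESH_BINARY)
four_channel = np.concatenate([image, gt_bin[..., None]], axis=-1)
print(four_channel.shape)   # (H, W, 4)
```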



Results

Figure 4.2: Images generated by the four-channel model.

As we can see in Figure 4.2, the images generated by our fourth-channel model are not good: they contain noise, and the model hasn't learned the relation between the image and the ground truth.

4.2 Joint Images

In this section, we used a single canvas to which we added the two images: on the first half, the original image and, on the other half, the image that represents the ground truth. See Figure 4.3. The reason we thought this method would work is that we know that layers learn certain characteristics, and we believed that some layers would learn to create the ground-truth part correctly. Here we didn't have to change the model.

Figure 4.3: Two images on one canvas.
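A sketch of how the canvas is built (reusing the `image` and `gt_bin` arrays from the previous sketch; the GT is replicated to 3 channels so both halves share the same depth):

```python
import numpy as np

# Place the welding image and its ground truth side by side on one canvas.
canvas = np.concatenate([image, np.repeat(gt_bin[..., None], 3, axis=-1)], axis=1)
print(canvas.shape)         # (H, 2 * W, 3)
```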

Results

Figure 4.4: Joint images.

As we can see in Figure 4.4, we get good images but, in some of them, the ground truth has no relation to the image. This was expected; we could have done more experiments, like changing the color of the ground truth but, as already mentioned, we don't have enough time to spend on it.

4.3 Two generators

This model comes from an idea by Beckham and Pal (2017), who employed two generators and two discriminators within a GAN to generate terrain. Their method uses one generator to map the noise to heightmaps and a second generator to map from heightmaps to textures. Likewise, two discriminators are used, D(x) and D(x, y): D(x) computes the probability that x is a real heightmap, while D(x, y) is the probability that (x, y) is a real heightmap/texture pair. We are going to try something similar.

In this model, a single z is drawn from the latent space and is the input of both generators: G_i for the image and G_t for the ground-truth image. The discriminators D_i and D_t try to differentiate between the generated images and the real examples. In addition, to check whether the ground truth corresponds to the image, we use the cosine similarity function to obtain a similarity loss. See Figure 4.5.

Figure 4.5: Joint GAN framework. G_i, G_t, D_i, and D_t are the generators and discriminators for the image and the ground truth. X_r^* are the real examples and X_g^* are the generated samples.

4.4 Similarity Loss

The similarity loss function controls how the embeddings of the two domains relate to each other. The discriminators D_i and D_t output embedding vectors emb_i and emb_t. The similarity loss we use is an l2 loss on the angle between the embedding vectors:

\[
\mathrm{SimLoss}_\theta(emb_i, emb_t) = \left( \theta - \arccos\left( \frac{emb_i \cdot emb_t}{\|emb_i\|_2 \, \|emb_t\|_2} \right) \right)^2 \tag{4.1}
\]

The target angle differential θ is 0 for embeddings that should be the same
and π for embeddings that should be different.
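A minimal PyTorch sketch of Equation 4.1 (the clamp is a numerical-safety detail added here, not part of the thesis formulation):

```python
import torch
import torch.nn.functional as F

def sim_loss(emb_i, emb_t, theta=0.0):
    # Angle between the two embedding vectors via their cosine similarity.
    cos = F.cosine_similarity(emb_i, emb_t, dim=-1)
    angle = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # l2 penalty on the deviation from the target angle theta
    # (0 for pairs that should match, pi for pairs that should not).
    return ((theta - angle) ** 2).mean()
```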

Results

Figure 4.6: Four generated images.

As we can see in Figure 4.6, the generated images are quite good, and the Ground Truth (GT) is related to and matches the generated images.
Chapter 5

Data Analysis on Industrial Application on FSW

5.1 Background

Friction Stir Welding (FSW) is a contact welding process that uses the
heat generated by friction to fuse two different pieces. The advantage of this
technique is that no consumables are required in the process. Finished welded
parts have improved aesthetics compared with other welding methods.

Friction stir welding uses a special tool that rotates at high speed over the seams that need to be welded together; as the tool rotates over the metal, heat is generated between them. This heat causes the metals to become plastic and fuse into one another. FSW robotic welding cells are one of the facilities used for this type of manufacturing. During the FSW welding process, the robotic arm is controlled by a force control in which two main stages are distinguished:

1. Penetration stage.

2. Translation-welding stage.


Figure 5.1: Basic principles of "Friction Stir Welding".

Figure 5.2: Robotic FSW Welding and Aluminum Extrusion Welding Facility
at LORTEK.

To achieve good quality welded joints, the penetration phase must be carried out in a controlled manner to ensure correct heating and penetration of the tool into the weld joint before the translation-welding stage. Due to various factors, excessive penetration of the tool can sometimes occur, resulting in excessive burr and unacceptable parts of poor quality (see Figure 5.3). On the other hand, if the penetration is carried out properly, good heating and penetration of the tool are achieved, so that pieces of good quality are produced. See Figure 5.4.

Figure 5.3: Poor quality part with excess burr resulting from excessive penetration of the tool in the penetration phase.

Figure 5.4: Good quality part resulting from appropriate penetration of the
tool.

5.2 Problem to solve

We consider the possibility of detecting adequate penetration by analyzing the evolution of the axial force monitored during the penetration phase (or other data and parameters). Figure 5.5 shows the evolution in the case of a poor quality part, while Figure 5.6 shows the evolution for a good quality part. These graphs are obtained using a data viewer developed as part of the control system of the robotic FSW welding installation. This control system is also prepared to record various parameters of the FSW welding process in each of the welds carried out, generating a CSV file where the data is stored.

Figure 5.5: Evolution of the axial force monitored during the penetration and
welding phases of a poor quality part (shown in Figure 5.3).

Figure 5.6: Evolution of the axial force monitored during the penetration and
welding phases of a good quality part (shown in Figure 5.4).

Therefore, the possibility is raised of developing some type of FSW welding data-analysis system that allows evaluating the quality of the welded joints. In this case, the aim would be to develop a system that, through automated analysis of the CSV file, is capable of indicating whether the penetration phase has been appropriate and a good quality part has been achieved. To distinguish between good and bad parts, the following data could be analyzed within the penetration phase:

• Time elapsed between tool-part contact (ForceFeedback: INT = 200 N) and the change in force command (ForceTarget: INT from 4500 N to 5750 N).

• Position in Z (RPosZ: INT) at the instant of the change in force command (ForceTarget: INT from 4500 N to 5750 N).

• Slope of the (ForceFeedback: INT) curve at the instant of the change in force command (ForceTarget: INT from 4500 N to 5750 N).

• Others ...

5.3 Solution

We start by reading the first minute of the CSV and, through several plots, we see which variables influence the decision of whether a piece is good. The first thing to see is which values inform us about the failures. After examining the graphs of all of them, we observed that the force exerted on the defective parts exceeds, several times, the force that we theoretically want to apply.

Figure 5.7: Bad welding on the left side vs good welding on the right side.
The green line is the force we apply to the material and the red line is the
real force.

With this information, we are able to know whether the piece has been welded correctly or not, but the detection is made after the fact; that is, we had to read the entire CSV. What matters is to stop the process right in the penetration phase, because we believe that at that point it is already possible to know that a defective part is going to be produced.

To solve this problem, instead of reading the entire one-minute CSV, we read only the penetration phase, that is, only the part in which the applied force is less than 4500 N, and we stop just when the machine is told to exert a force of 5750 N, since that means that the movement has already started. Once we had the data we wanted to study, we saw that several factors can influence the detection of defective parts. To carry out these studies, cubic splines and polynomial fits have been made.
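A sketch of the penetration-phase extraction with pandas (the file and column names, such as `ForceTarget`, are illustrative; the real CSV uses the robot controller's variable names):

```python
import pandas as pd

df = pd.read_csv("weld_log.csv")

# Keep only the rows recorded before the force command rises to 5750 N,
# i.e. the penetration phase.
penetration = df[df["ForceTarget"] < 5750]
```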

Z-axis: it has been seen that the defective pieces, in the penetration phase, are perforated deeper than the correctly welded pieces. This parameter will be the most important one when detecting whether a part is defective.

Time/Force: through splines, we have been able to fit cubic functions to the actual force and time. Using the spline function, we obtained 4 different cubic functions, and we have used these functions to evaluate new welds. To fit the functions, we used the implementation in scipy.interpolate [15], passing the time as X and the force as Y.
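A sketch of such a fit with scipy (assuming the `penetration` frame from the previous sketch and illustrative column names; `t_new`/`f_new` stand in for the time and force of a new weld to be evaluated):

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = penetration["Time"].to_numpy()           # X: time
f = penetration["ForceFeedback"].to_numpy()  # Y: measured axial force

spline = CubicSpline(t, f)                   # cubic spline fitted to a good weld

# Evaluate a new weld against the fitted curve.
t_new, f_new = t, f                          # placeholders for a new weld's data
f_expected = spline(t_new)
deviation = np.abs(f_new - f_expected).max()
```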

Figure 5.8: Good welds vs Bad weld.

We used splines because we don't have enough data to apply something like Random Forest, XGBoost, or Deep Learning and, as we can appreciate in Figure 5.8, the force curves of the bad and the good welds can be differentiated, which is why we thought that splines are a good approach for this problem.

Once we have the functions and we know how far the tool has to penetrate on the z-axis, each time the penetration process finishes, we pass that data through our functions and check the z-axis value to make a decision. With all this data and knowledge, we have created an application in Qt with C++ that allows us to know if a failure has occurred right at the moment of penetration. The application is fast and is 100% effective on the tested CSVs.

5.4 The Application

Figure 5.9: The Application detecting bad welds.

As can be appreciated, the application shows through an image whether the weldment is defective or good. In addition to saying whether the weldment is defective, it also tells what kind of error the weldment contains.

5.4.1 Optional Application

When we were reading our CSVs, we realized that we needed an application to create plots, so we created a simple application that allows us to read a CSV and create plots with any variables.

Figure 5.10: Our Plot application.

As we can see in Figure 5.10, our application also allows us to select a range to zoom in.
Chapter 6

Conclusions and future work

6.1 Conclusions

A variety of improvements could be made, but overall we are happy with the
obtained results. We have learned a lot in this project about important and
complex concepts like DL, GANs, and FSW.

We have been able to observe that training a GAN is not an easy task, and we have also seen how the slightest change, such as wanting to generate images at a higher resolution, can make the model stop working. Simultaneously generating an image and its GT is interesting because the result can later be used for many other tasks, and we can also use the GT as the label. For this, we tried three different techniques, obtaining results in which the images do look quite similar to the real ones. Each training run lasts about 8-24 hours, so we have not been able to do many tests; with more time we could get better results. Once the models have been created and trained, they can be used to generate as many images as needed.

It is known that DL models work better when we have a large amount of data. The problem is that sometimes companies want to identify defective pieces but have not produced many defective pieces to share with us, so we used this GAN model to create synthetic images with defects and used the combined dataset to achieve good identification.


In the last section, we worked on a real industrial problem, in which we had the parameters of friction welding saved in a CSV for each piece. Through analysis, we found out which are the most important variables and, through splines, we fitted them into multiple functions. These functions helped us to see whether a welded piece has been welded correctly or not. We have also created two applications: the first one, given a CSV, tells us whether the part is correct or not, and the second one helps us create plots to visualize the data and help the operator.

In this problem, we have only worked with one type of friction welding problem but, in the future, we would like to work on others. Both applications are currently being used on the welding machine and are useful in detecting errors.

6.2 Future work

The work presented in this thesis can be further expanded in many different
ways.

The resolution of the images generated by the GANs could be improved, as currently only two sizes have been tested, (64 × 64) and (128 × 128). We could even generalize it to different sizes, since each project has images of a specific size. Another thing that could be done is to use the input of the GANs to control the output, i.e., to be able to control the position and size of the defect as well as the type of weld. From the beginning, we wanted to have control over the output images, but that can only be achieved when we have a lot of images. In our case, we are creating GANs to augment the data, and then we could work on, e.g., a variational autoencoder to control the output.

As we have been able to observe in Figure 4.6, the images generated by the models in which we generate the images with their GT are not very sharp; we could try to play with some parameters to obtain better images.

On the other hand, in the friction welding section, we could detect more types of defects, since in this thesis we have only worked with the defects that occur in the penetration phase.
Bibliography

[1] Aqeel Anwar. What is Transposed Convolutional Layer? link, April 2021.

[2] Jason Brownlee. How to Code the GAN Training Algorithm and Loss Functions. link, July 2019.

[3] Conda. Getting started with Conda — Conda 4.10.3.post38+0b1312ce documentation. link, 2017.

[4] Md Shahid (dshahid380). Convolutional Neural Network. link, February 2019.

[5] Twi Global. Refill Friction Stir Spot Welding. link, September 2020.

[6] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial networks. Commun. ACM, 63(11):139–144, 2020.

[7] Shweta Goyal. GANs - A brief introduction to Generative Adversarial Networks. link, December 2019.

[8] Zinan Lin, Vyas Sekar, and Giulia C. Fanti. Why spectral normalization stabilizes GANs: Analysis and improvements. CoRR, abs/2009.02773, 2020.

[9] Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal. Towards faster and stabilized GAN training for high-fidelity few-shot image synthesis. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[10] Padala Manisha, Debojit Das, and Sujit Gujar. Effect of input noise dimension in GANs. CoRR, abs/2004.06882, 2020.

[11] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. CoRR, abs/1802.05957, 2018.

[12] Artem Oppermann. What is Deep Learning and How does it work? link, August 2020.

[13] Isuru Pamuditha. Installing Pytorch with GPU Support (CUDA) in Ubuntu 18.04 — Complete Guide. link, May 2021.

[14] Marco Pasini. 10 Lessons I Learned Training GANs for a Year. link, July 2019.

[15] Qingkai Kong, Timmy Siauw, and Alexandre Bayen. Cubic spline interpolation. link. Accessed: 2021-11-04.

[16] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In Yoshua Bengio and Yann LeCun, editors, 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

[17] Joseph Rocca. Understanding Generative Adversarial Networks (GANs). link, March 2021.

[18] Sumit Saha. A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way. link, December 2018.

[19] Jean Vitor. How to load Pytorch models with OpenCV. link, October 2020.

[20] Kashyap Vyas. Friction Welding: Process, Types, its Advantages. link, August 2019.
