
Understanding Deep Learning: DNN, RNN, LSTM, CNN and R-CNN
medium.com/@sprhlabs/understanding-deep-learning-dnn-rnn-lstm-cnn-and-r-cnn-6602ed94dbff

March 21, 2019


Deep Learning for Public Safety


It’s an unavoidable truth that violent crime and murder are increasing around the world
at an alarming rate; in America, for example, the murder rate is 17% higher than it was
five years ago. About 73% of US murders are committed with guns, a proportion that has
increased in recent years.¹ World leaders are trying to clamp down on this situation with
the help of their law enforcement systems. Despite their efforts, things sometimes spiral
out of control because action cannot be taken quickly enough. In such cases, we in the
tech industry can take an approach to ensuring public safety using Deep Learning.

This can be demonstrated through a simple model in which we look at an active shooter
and how an object detection system identifies a weapon, tracks the criminal and deploys
a depth-sensing localized drone to de-escalate with pepper spray and then, if needed,
escalates force by dropping down 3 feet to the ground and deploying an electric shock
weapon.

This figure shows how a simple model developed using deep learning can be used to
ensure public safety.

To build this model, we have to use Machine Learning. You may be wondering what
Machine Learning and Deep Learning actually are; most people simply enjoy the benefits
of technology, but few are aware of these terms or how they work. Here we will give you
a concise, lucid idea of what they mean.

What is Machine Learning?


Machine Learning is a subset of Artificial Intelligence, and Deep Learning is an
important part of its broader family, which includes deep neural networks, deep belief
networks, and recurrent neural networks.² Deep Learning has three fundamental neural
network architectures that perform well on different types of data: FFNNs, RNNs, and
CNNs.

Deep Neural Networks (DNNs)


Deep Neural Networks (DNNs) are typically Feed Forward Networks
(FFNNs), in which data flows from the input layer to the output layer without going
backward³; the links between layers point only in the forward direction and never
touch a node twice.

The outputs are obtained by supervised learning: the network is trained on labeled
datasets that encode ‘what we want’, and the weights are adjusted through
backpropagation. Think of going to a restaurant where the chef tells you the ingredients
of your meal. An FFNN works the same way: you taste those specific ingredients while
eating, but as soon as the meal is over you forget what you have eaten. If the chef serves
you a meal with the same ingredients again, you can’t recognize them; you have to start
from scratch because you have no memory of the last time. The human brain, of course,
doesn’t work like that.
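To make this concrete, here is a minimal sketch of a feed-forward network trained with backpropagation. PyTorch and the layer sizes are our own illustrative assumptions, not something the article prescribes.

```python
import torch
import torch.nn as nn

# A minimal feed-forward network: input -> hidden -> output.
# Data flows strictly forward; the network keeps no state between calls.
ffnn = nn.Sequential(
    nn.Linear(784, 128),  # e.g. a flattened 28x28 image (hypothetical sizes)
    nn.ReLU(),
    nn.Linear(128, 10),   # e.g. 10 output classes
)

x = torch.randn(1, 784)   # one "meal" of ingredients
logits = ffnn(x)          # one forward pass, no memory of previous inputs

# Supervised learning: compare the output with "what we want" and backpropagate.
target = torch.tensor([3])
loss = nn.CrossEntropyLoss()(logits, target)
loss.backward()           # gradients flow backward only during training
```

Feeding the same input again produces the same kind of isolated forward pass; nothing about the previous input is remembered.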

Recurrent Neural Network (RNN)


A Recurrent Neural Network (RNN) addresses this issue: it is essentially an FFNN with a
time twist. This network isn’t stateless; it has connections between passes,
connections through time. RNNs are a class of artificial neural network in which
connections between nodes form a directed graph along a sequence, with links feeding
back from a layer to earlier layers. This lets information flow back into previous parts
of the network, so each step depends on past events and information can persist.

In this way, RNNs can use their internal state (memory) to process sequences of
inputs, which makes them applicable to tasks such as unsegmented, connected
handwriting recognition or speech recognition. They operate not only on the current
input but also on related information from the past, so the order in which you feed and
train the network matters: feeding it ‘chicken’ then ‘egg’ may give a different output
than ‘egg’ then ‘chicken’. RNNs also suffer from the vanishing (or exploding) gradient /
long-term dependency problem, where information rapidly gets lost over time. Strictly
speaking, it is the weights that are affected when gradients shrink toward zero or blow
up to enormous values, not the neurons; but since the weights are what store the
information from the past, the previous state stops being informative.
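As a rough sketch in PyTorch (the ‘chicken’ and ‘egg’ vectors below are hypothetical stand-ins for real word embeddings), the same RNN gives different outputs when the same inputs arrive in a different order, because the hidden state carries the past forward:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# Two toy "word" vectors standing in for 'chicken' and 'egg'.
chicken = torch.randn(1, 1, 8)
egg = torch.randn(1, 1, 8)

# Feed 'chicken' then 'egg' ...
out_a, h_a = rnn(torch.cat([chicken, egg], dim=1))
# ... versus 'egg' then 'chicken'.
out_b, h_b = rnn(torch.cat([egg, chicken], dim=1))

# The final hidden states differ: the order of the sequence matters.
print(torch.allclose(h_a, h_b))  # False
```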

Long Short Term Memory (LSTM)


Thankfully, breakthroughs like Long Short Term Memory (LSTM) don’t have this
problem. LSTMs are a special kind of RNN capable of learning long-term
dependencies, which makes the network good at remembering things that happened in
the past and finding patterns across time so that its next guesses make sense. LSTMs
broke records in Machine Translation, Language Modeling and Multilingual Language
Processing.
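In most frameworks, swapping the plain RNN cell for an LSTM is a one-line change. A minimal PyTorch sketch, with sizes that are purely illustrative:

```python
import torch
import torch.nn as nn

# An LSTM keeps both a hidden state and a cell state; its gates decide what to
# remember and what to forget, which is what tames the vanishing gradient.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)

sequence = torch.randn(1, 50, 8)        # a batch of one 50-step sequence
output, (hidden, cell) = lstm(sequence)

print(output.shape)   # torch.Size([1, 50, 16]) - one output per time step
print(hidden.shape)   # torch.Size([2, 1, 16]) - final hidden state per layer
```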

Convolutional Neural Network (CNN)


Next comes the Convolutional Neural Network (CNN, or ConvNet), a class of
deep neural networks most commonly applied to analyzing visual imagery. Other
applications include video understanding, speech recognition and natural language
processing. LSTMs combined with Convolutional Neural Networks (CNNs) have also
improved automatic image captioning, like the captions seen on Facebook. So while an
RNN helps us process sequential data and predict the next step, a CNN helps us analyze
visuals.
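For completeness, here is a minimal convolutional stack in PyTorch (the channel counts, image size and class count are illustrative assumptions): convolution and pooling layers extract visual features, and a final linear layer classifies them.

```python
import torch
import torch.nn as nn

# A small ConvNet for 3-channel images, e.g. 32x32 inputs (hypothetical sizes).
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn local visual features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # 10 image classes
)

images = torch.randn(4, 3, 32, 32)   # a batch of four images
print(cnn(images).shape)             # torch.Size([4, 10])
```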

RNN or CNN: Which One is Better?


RNNs operate over sequences of vectors: sequences in the input, the output, or, in the
most general case, both. CNNs, by contrast, not only have a more constrained
Application Programming Interface (API) but also a fixed number of
computational steps. Even so, CNNs are in practice the more powerful of the two right
now, mostly because RNNs suffer from vanishing and exploding gradients (beyond about
three layers, performance may drop), whereas CNNs can be stacked into very deep
models, an approach that has proven quite effective.

But CNNs are not flawless either. A typical CNN can tell the type of an object but can’t
specify its location. A plain CNN can regress only one object at a time, so when multiple
objects appear in the same visual field, the bounding box regression does not work well
due to interference. For example, a CNN can detect the bird shown in the model below,
but if there are two birds of different species in the same visual field, it can’t detect
both.

An R-CNN (the R stands for region, for object detection) forces the CNN to focus on a
single region at a time, improving the chance that a specific object dominates a given
region. The regions are first found by a selective search algorithm, then resized to a
uniform size before being fed into the CNN for classification and bounding box
regression. This helps the network pin down one object at a time, as the sketch below
illustrates.
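The R-CNN idea in sketch form, in Python; `propose_regions` (a selective-search-style stand-in) and `classifier` are hypothetical placeholders we introduce for illustration, not a real library API:

```python
import torch
import torch.nn.functional as F

def rcnn_style_detection(image, propose_regions, classifier, size=224):
    """Sketch of the classic R-CNN pipeline on an NCHW image tensor.

    propose_regions: hypothetical selective-search-style function returning
                     (x, y, w, h) boxes; classifier: a CNN scoring each crop.
    """
    detections = []
    for (x, y, w, h) in propose_regions(image):
        crop = image[:, :, y:y + h, x:x + w]              # focus on one region
        crop = F.interpolate(crop, size=(size, size))     # resize to a fixed size
        scores = classifier(crop)                         # classify the region
        detections.append(((x, y, w, h), scores.argmax(dim=1)))
    return detections
```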

Are there techniques that go one step further and locate the exact pixels of each object
instead of just bounding boxes? Yes, there are. Image segmentation is what Kaiming He
and a team of researchers, including Girshick, explored at Facebook AI using an
architecture known as Mask R-CNN, which matches our intuition rather well.
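Pretrained Mask R-CNN models are readily available; below is a minimal inference sketch using torchvision’s implementation. Note the stock model is trained on COCO’s 80 everyday classes, so detecting weapons as in the scenario above would require fine-tuning on a custom dataset, which we assume here but do not show.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a Mask R-CNN with a ResNet-50 FPN backbone, pretrained on COCO.
model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)            # stand-in for a real camera frame
with torch.no_grad():
    prediction = model([image])[0]         # list of images in, list of dicts out

# Each detection comes with a box, a label, a score and a per-pixel mask.
keep = prediction["scores"] > 0.7
print(prediction["boxes"][keep].shape)     # (N, 4) bounding boxes
print(prediction["masks"][keep].shape)     # (N, 1, 480, 640) per-pixel masks
```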

How Is Our Designed Model Going to Work?


In the previously mentioned model, we use a region-based CNN that performs as a
Mask R-CNN. It can identify object outlines at the pixel level by adding a branch to
Faster R-CNN that outputs a binary mask saying whether or not a given pixel is part of
an object (such as a gun). This helps with semantic and instance segmentation and with
eliminating background movement. Our approach uses Augmented Reality to sense
space, depth, dimensions and angle, like a localized GPS, which may help us detect the
body pose of a shooter and predict what may happen next by analyzing previous data.
The drone provides mobility, discovery and close-proximity response to save lives
immediately.

We found the iPhone’s A12 Bionic chip to be a great decentralized neural network
engine at the edge: the latest iPhone XS Max packs 6.9 billion transistors, a 6-core CPU
and an 8-core Neural Engine on the Bionic SoC, and can perform 5 trillion operations
per second, which is suitable for machine learning and AR depth sensing.

References:

1. “US violent crime and murder down after two years of increases, FBI data shows”,
The Guardian, 24 September 2018.

2. The definition “without being explicitly programmed” is often attributed to Arthur
Samuel, who coined the term “machine learning” in 1959, but the phrase is not found
verbatim in this publication and may be a paraphrase that appeared later. Cf.
“Paraphrasing Arthur Samuel (1959), the question is: How can computers learn to solve
problems without being explicitly programmed?” in Koza, John R.; Bennett, Forrest H.;
Andre, David; Keane, Martin A. (1996). Automated Design of Both the Topology and
Sizing of Analog Electrical Circuits Using Genetic Programming. Artificial Intelligence
in Design ’96. Springer, Dordrecht. pp. 151–170.

3. Hof, Robert D. “Is Artificial Intelligence Finally Coming into Its Own?”. MIT
Technology Review. Retrieved 2018-07-10.
