
Complete Glossary of Keras Neural Network Layers (with Code)

Avi Arora July 8, 2021


Introduction
Deep learning isn’t easy to get into. Since the field is relatively new and still developing, beginners often find it hard to find all the information they need in one place, especially about all the different types of neural network layers that exist. I know this because I went through that phase myself.

So, to help out the future aspirants, I decided to take it upon myself to make a glossary of all
the neural network layers, along with the procedure of their instantiation in Keras.

So, if you need any information about what neural network layers are all about or how they
work, feel free to go through this article! Let’s start.

What Are Neural Network Layers?


Layers are the basic building blocks of an artificial neural network. Each layer consists of a
specific number of neurons and has a specific purpose. From a broader perspective, there are
three basic types of neural network layers:

Input layer
Output layer
Hidden layer

The first layer that takes in the inputs to the neural network is referred to as the input layer
and the last layer that produces the results for a given input is called the output layer. Every
layer in between is referred to as a hidden layer since the user cannot and does not have to
interact directly with it.

Each layer has a specific set of parameters associated with it:

Weights
Activation function
Bias

All these parameters contribute towards the output of each node, and subsequently the inputs to the next layer. Here’s a graphical representation of how this works:

*Note: the unit step function is used only for demonstration purposes. Any activation function of choice
could be used here.
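To make this concrete, here is a minimal NumPy sketch (an illustration, not Keras code) of what a single node computes: a weighted sum of its inputs plus a bias, passed through an activation function. The unit step is used purely to mirror the note above, and the input, weight, and bias values are arbitrary.

import numpy as np

def unit_step(z):
    # 1 if the weighted sum is non-negative, 0 otherwise
    return np.where(z >= 0, 1.0, 0.0)

x = np.array([0.5, -1.2, 3.0])   # inputs coming from the previous layer
w = np.array([0.4, 0.1, -0.6])   # weights of this node
b = 0.2                          # bias of this node

output = unit_step(np.dot(w, x) + b)  # activation(w . x + b)
print(output)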

Now that we have a brief overview of what the layers are, let’s dive into the specifics of each
layer and its functions.

Core Layers
These are the most important neural network layers that will be used in almost all the
networks, regardless of what the application is. You can think of them as the basic building
blocks of every deep learning project.

I. Input Layer

The input layer is responsible for taking in the input to the neural network. It consists of
several nodes and passes on its output to the hidden layers in front of it. The architecture of
an input layer is quite straightforward since it doesn’t have any weights associated with it.

Here’s how simply you can make an input layer for a model.

import tensorflow as tf
from tensorflow.keras import Input, Model

x = Input(shape=(32,))
y = tf.square(x)  # This op will be treated like a layer
model = Model(x, y)

II. Dense Layer

The dense layer is the most common type of layer used in a neural network. Its basic
characteristic is that all neurons in a dense layer receive inputs from all the neurons present
in the previous layer, hence the name dense. Sometimes, it’s also referred to as the fully
connected (FC) layer.

The layer accepts all kinds of data and can be used with a variety of activation functions. However, the most common choices are ReLU and Leaky ReLU.

Here’s how it can be instantiated:

from keras.models import Sequential
from keras.layers import Activation, Dense

model = Sequential()
layer_1 = Dense(16, input_shape=(8,))
model.add(layer_1)

III. Embedding Layer

The embedding layer is a vital part of the neural network when you’re dealing with text. To
understand what the embedding layer does, you need to know word embeddings first.
Briefly, it’s a technique where words with similar meanings get similar vector representations.

The embedding layer can be thought of as an improvement over the bag-of-words models often used in machine learning. Word embeddings are learned from text data, and once learned, they can be reused in different neural networks. Let’s see how this layer works.

The layer takes in integer-encoded input as a 1D array per document, with each word represented by a unique integer. The output from the layer, however, is a 2D array per document, where every word has a corresponding embedding vector. Note that if your text isn’t integer-encoded yet, you can use the Tokenizer API from Keras.

There are 3 arguments that you need to specify:

input_dim: the size of the input vocabulary, i.e., the number of unique words.
output_dim: the size of the vector used to represent each embedded word.
input_length: the length of the individual input documents.

Also, it’s not necessary to use the embedding layer along with the rest of your deep learning network; you can use it independently to learn embeddings for a given vocabulary. And if you’d rather skip that step for common words, a great option is transfer learning, using one of the pre-trained embedding models available.

Let’s see how the Keras embedding layer can be instantiated:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(1000, 64, input_length=10))

# Random integer input: 32 documents, each a sequence of 10 word indices
input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array.shape)

>>> (32, 10, 64)

IV. Masking Layer

Sometimes during the processing of data, it’s common to have samples of varying lengths. To ensure consistency within the data samples, a mechanism known as padding is used: padding assigns dummy values where data is missing. However, we need to make sure that the padded values don’t affect our calculations, since they’re just dummy values.

So, what masking essentially does is let the model know which specific values are missing, so that it can skip them during its calculations. A masking layer can also be used without padding: even when no padding is applied, masking helps the model know which specific data points are missing.

Let’s take a look at how a masking layer can be used:

import numpy as np
import tensorflow as tf

samples, timesteps, features = 32, 10, 8
inputs = np.random.random([samples, timesteps, features]).astype(np.float32)

# Zero out timesteps 3 and 5 in every sample
inputs[:, 3, :] = 0.
inputs[:, 5, :] = 0.

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Masking(mask_value=0.,
                                  input_shape=(timesteps, features)))
model.add(tf.keras.layers.LSTM(32))

For this specific case, timesteps 3 and 5 will be skipped in the LSTM calculation, as the quick check below confirms.
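If you want to verify this, you can ask a Masking layer for the boolean mask it computes on the same inputs array (an optional check continuing the snippet above):

masking_layer = tf.keras.layers.Masking(mask_value=0.)
mask = masking_layer.compute_mask(inputs)
print(mask[0])
# tf.Tensor([ True  True  True False  True False  True  True  True  True], shape=(10,), dtype=bool)

Masked timesteps show up as False and are ignored by mask-aware layers such as LSTM.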

V. Lambda Layer

If you’re familiar with Python, chances are you know what lambda functions are – they are used to transform an input value into a certain output value using a specific function. It could be as simple as multiplying the input by two.

Lambda layers follow the same intuition: whatever input is passed to them, they simply apply a function to it and return the output. Below is an example of how they’re instantiated:

model.add(Lambda(lambda x: x ** 2))

That’s how easy it is to introduce a Lambda layer into your neural network! However, it’s pretty unlikely that you’ll want a function as short as the one above. Most of the time, functions are written separately to ensure code readability. Let’s see another example where a slightly longer function is used to transform data in the Lambda layer.

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

model.add(Lambda(antirectifier))

There you go! Just like that, you can add any function you require into the lambda layer. It
could be as complex as you require, or as simple as a mere line. As long as it fulfills your
requirements, you’re good to go.

Pooling Layers
Pooling is a very important concept whenever we talk about Convolutional Neural Networks, more commonly known as CNNs. The pooling layer helps to down-sample the feature
maps. This is achieved by effectively summarizing the features in patches. Not only does it
solve the problem of CNNs being very sensitive to feature locations in the input map, but it
also drastically reduces the computational resources required to learn the parameters.

There are two major categories of pooling used in practical neural networks:

Average Pooling: down-sample using the average value in each patch.
Max Pooling: down-sample using the maximum value in each patch.

An important thing to note here is that there are no learnable parameters in the pooling
layer, and it merely serves the purpose of reducing the dimensions of the tensor.

Depending on the tensor dimensions, Keras provides a number of different pooling classes. For the sake of example, we’ll be considering max-pooling in 2D. However, feel free to dive in further here.

This is how the MaxPooling2D layer can be initialized:

import tensorflow as tf

# Example input: a single 4x4 image with one channel (the values are arbitrary)
input_image = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), input_shape=(4, 4, 1)))
model.compile('adam', 'mean_squared_error')

model.predict(input_image, steps=1)

You can set the input_image variable according to your use case and change the parameters
as you require, and the pooling layer is all set to roll.
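For comparison, here’s a minimal sketch of the average-pooling counterpart, AveragePooling2D, applied to an assumed 4x4 single-channel example image:

import numpy as np
import tensorflow as tf

# An assumed 4x4 single-channel image, down-sampled with 2x2 average patches
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))

print(avg_pool(x).shape)  # (1, 2, 2, 1): each output value is the mean of one 2x2 patch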

Convolutional Layer
Convolutional layers are the basic building blocks of CNNs. The task of a convolutional layer
is nothing but applying filters onto the input passed onto it. Once the output of the layer is
calculated, an activation function is applied, and the results are passed on to the successive
layer.

The output produced by applying a filter to the input image is referred to as a feature map. Applying multiple filters produces multiple feature maps, each detecting different features present in the input image.

*Figure: a single filter (yellow) and the resulting convolved feature it produces.*

The filters used in convolutional layers come in many varieties; they could be as simple as line detectors. The main aim of a convolutional layer, however, is to learn the filter parameters within the context of the problem we’re dealing with.

Now, Keras provides several different classes for convolutional layers depending upon the requirements and the dimensions of the input tensors. The details can be found here in the official docs.

Let’s initialize a two-dimensional convolutional layer as an example and see how things work
out.

import tensorflow as tf

# The inputs are 28x28 RGB images with `channels_last` and the batch
# size is 4.
input_shape = (4, 28, 28, 3)

x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(2, 3, activation='relu', input_shape=input_shape[1:])(x)

print(y.shape)
>>> (4, 26, 26, 2)

That’s a very basic example of how you can use convolutional layers, but when you study them in-depth, you’ll find there are actually a lot of parameters that you can play around with and set according to your requirements, as the variation below illustrates.
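For instance, two commonly tuned parameters are padding and strides. Here’s a hedged variation of the example above, with values chosen purely for demonstration:

import tensorflow as tf

# Same 28x28 RGB inputs as before, but with 'same' padding and a stride of 2
x = tf.random.normal((4, 28, 28, 3))
y = tf.keras.layers.Conv2D(2, 3, strides=2, padding='same', activation='relu')(x)

print(y.shape)  # (4, 14, 14, 2): 'same' padding keeps 28x28, and the stride of 2 halves it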

Preprocessing Layers
Preprocessing is the first step that comes after collecting the data and before feeding it to the model. From cleaning the data thoroughly to applying any data transformations required, preprocessing takes care of everything. In a nutshell, it turns the raw, unstructured data into useful information that can be processed by the neural network.

Since preprocessing is done on raw data, it differs from application to application. So, there are different preprocessing layers you’d need for different purposes. Let’s move on and see what layers Keras provides:

I. Text Preprocessing

This consists of the TextVectorization layer, which is very helpful when you have raw text data available. The layer takes in samples as strings and preprocesses them in the following way to finally convert them into a form the model can comprehend:

1. Standardizing the sample (lowercasing)
2. Splitting into substrings (breaking down into words)
3. Recombining into tokens (n-grams)
4. Indexing the tokens
5. Transforming the sample using the index

That’s all. Let’s see a practical example that instantiates a TextVectorization layer using a small list of words:

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

vocab_data = ["earth", "wind", "and", "fire"]
max_features = 5000  # Maximum vocabulary size (an arbitrary value for this example)
max_len = 4  # Sequence length to pad the outputs to.

# Create the layer, passing the vocab directly. You can also pass the
# vocabulary arg a path to a file containing one vocabulary word per
# line.
vectorize_layer = TextVectorization(
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len,
    vocabulary=vocab_data)

# Because we've passed the vocabulary directly, we don't need to adapt
# the layer - the vocabulary is already set. The vocabulary contains the
# padding token ('') and OOV token ('[UNK]') as well as the passed tokens.
vectorize_layer.get_vocabulary()
>>> ['', '[UNK]', 'earth', 'wind', 'and', 'fire']
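To see the layer in action, you can pass it a small batch of raw strings (a continuation of the snippet above; words outside the vocabulary, like "moon", map to the OOV index 1, and shorter sequences are padded with 0):

print(vectorize_layer([["earth wind fire"], ["fire and earth moon"]]))
# tf.Tensor(
# [[2 3 5 0]
#  [5 4 2 1]], shape=(2, 4), dtype=int64)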

II. Numerical Preprocessing

When you have numerical data at hand, the task isn’t complex. Since the data is already in a
form that can be fed to the network, all you have to do is normalize the data so there are no
unusually high or low values affecting the model training. This is achieved by the
preprocessing normalization layer.

You can simply feed the layer your raw data, and it will normalize it to a mean of 0 and a standard deviation of 1. Let’s see an example of how to set the layer up in Keras.

import numpy as np
from tensorflow.keras.layers.experimental.preprocessing import Normalization

input_data = np.array([[1.], [2.], [3.]], np.float32)

layer = Normalization(mean=3., variance=2.)
layer(input_data)

>>> <tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[-1.4142135 ],
       [-0.70710677],
       [ 0.        ]], dtype=float32)>

As shown above, you can pass in your required mean and variance values when instantiating
the normalization layer.

Keras provides another useful preprocessing layer for numerical data called the Discretization layer. If you have continuous data but your project requires a discrete set of values, the Discretization layer can be quite helpful; feel free to dive in here to know more about it.
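Here’s a minimal sketch of the Discretization layer, using bin boundaries chosen just for illustration (note that the exact argument name may differ slightly between Keras versions):

import numpy as np
import tensorflow as tf

# Bucket continuous values into discrete bin indices using assumed boundaries
layer = tf.keras.layers.experimental.preprocessing.Discretization(
    bin_boundaries=[0., 1., 2.])

print(layer(np.array([[-1.5, 1.0, 3.4, 0.5]], dtype=np.float32)))
# Each value is replaced by the index of the bin it falls into, e.g. [[0 2 3 1]]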

III. Categorical Preprocessing

Just like text data, categorical data cannot be used directly by the neural network; we need to preprocess and encode it in some way before passing it on to the training phase. Keras provides a built-in CategoryEncoding layer for this very purpose.

The class structure is as follows:

tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    num_tokens=None, output_mode="multi_hot", sparse=False, **kwargs
)

The output_mode parameter lets you decide what type of encoding you want to use. There
are three types of encodings that you can choose from, namely:

one-hot encoding
multi-hot encoding
count encoding

Here’s an example that uses one-hot encoding:

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    num_tokens=4, output_mode="one_hot")
layer([3, 2, 0, 1])

>>> <tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[0., 0., 0., 1.],
       [0., 0., 1., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.]], dtype=float32)>

IV. Image Preprocessing

Last but not least, preprocessing images is another very important topic to cover, especially if
you’re interested in using CNNs for object detection. Keras offers three different image-
preprocessing classes that you can use to transform your images as you like:

1. Resizing layer: used to resize the input image tensors to a target height and width, which also lets you change the aspect ratio of the image.
2. Rescaling layer: used to rescale the pixel values of the image to a new range (for example, from [0, 255] to [0, 1]) without changing the image dimensions.
3. CenterCrop layer: used to crop the central portion of the image to the dimensions of your requirement.

Note: Except for the Rescaling layer, the input and output tensors should be 4D.
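Below is a minimal combined sketch of all three layers applied to an assumed batch of four 256x256 RGB images; the sizes and the scale factor are arbitrary choices for illustration:

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import (
    Resizing, Rescaling, CenterCrop)

# An assumed batch of 4 RGB images, 256x256, with pixel values in [0, 255]
images = tf.random.uniform((4, 256, 256, 3), maxval=255)

resized = Resizing(128, 128)(images)      # resize to 128x128
rescaled = Rescaling(1. / 255)(resized)   # scale pixel values into [0, 1]
cropped = CenterCrop(96, 96)(rescaled)    # keep the central 96x96 region

print(resized.shape, rescaled.shape, cropped.shape)
# (4, 128, 128, 3) (4, 128, 128, 3) (4, 96, 96, 3)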

Normalization Layers

In the context of deep learning, normalization is a process that helps prepare the data before it’s used for training a neural network. Most of the time, the data collected comes from real-life scenarios and therefore has a very wide spread. This causes unnecessary noise and inconsistencies in model performance, since a small set of extreme values can have a huge effect on the model.

To deal with this, normalization puts the numerical values on a uniform scale. The process is designed so that no information is lost and the relative differences between values are preserved.

There are two normalization layers present in Keras that we’ll be taking a look at.

I. Batch Normalization Layer

Batch normalization is the most used normalization technique, especially in the case of
CNNs. It’s used when the neural network is trained in the form of mini-batches. This
essentially means that instead of passing a single training example in the network and then
backpropagating the error, we pass multiple examples in the form of batches.

Batch normalization is applied to the activations of the neurons across each mini-batch to bring their distribution closer to a Gaussian, i.e., a mean close to 0 and a standard deviation close to 1. There are some additional advantages of using batch normalization, such as:

Improves the training time of the network
Has a regularization effect overall
Reduces the effect of weight initialization

Here’s the class structure of batch normalization:

tf.keras.layers.BatchNormalization(
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer="zeros",
    gamma_initializer="ones",
    moving_mean_initializer="zeros",
    moving_variance_initializer="ones",
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)

You can add a batch normalization layer with only a single line of code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization

model = Sequential()
model.add(BatchNormalization())
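In practice, the layer is often placed between a convolution (or dense) layer and its activation, though other placements are also used. Here’s a hedged sketch of that arrangement, with the layer sizes picked only for illustration:

import tensorflow as tf

# Batch normalization inserted between a convolution and its activation
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, input_shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
])
model.summary()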

II. Layer Normalization Layer

Layer normalization follows pretty much the opposite approach to batch normalization. In fact, it was designed to address the issues experienced with batch normalization, such as its dependence on the batch size.

In layer normalization, the statistics are computed independently of the batch: each instance is normalized on its own, across all of its channels. This way, whatever the batch size is, the results stay the same. The underlying idea, however, is the same: to make the distribution as close to Gaussian as possible.

While this technique doesn’t work as well with CNNs as batch normalization does, it has some benefits associated with it:

Works much better with RNNs (Recurrent Neural Networks)
Is not affected by the batch size

The layer normalization class in Keras is as follows:

tf.keras.layers.LayerNormalization(
    axis=-1,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer="zeros",
    gamma_initializer="ones",
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)

Just like batch normalization, it’s very convenient to add a layer normalization layer into
your network:

model_lay = tf.keras.models.Sequential([
tf.keras.layers.LayerNormalization(axis=3 , center=True , scale=True)
])
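To see the effect on actual data, here’s a small sketch adapted to a 2D input, where normalization is applied over the feature axis so that each sample is normalized on its own:

import numpy as np
import tensorflow as tf

# Five samples with two features each; every row is normalized independently
data = np.arange(10, dtype=np.float32).reshape(5, 2) * 10
layer = tf.keras.layers.LayerNormalization(axis=1)

print(layer(data))
# Each row ends up with (approximately) zero mean and unit variance,
# e.g. [0., 10.] becomes roughly [-1., 1.]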

Regularization Layers

Dropout Layer

Dropout is probably the most used regularization method in Deep Learning, mainly due to its
impressive results and easy interpretation. If you’re not familiar with how dropout works,
here’s a brief introduction to give you an overview.

The dropout layer randomly sets some input units to zero during the training phase, in order to keep the model from getting too complex, hence ensuring generalization. There is a parameter, rate, which decides what fraction of the units is set to zero; it is essentially the probability of a unit being dropped.

Note that since rate is the probability of a unit being set to zero, it should be between 0 and 1.

Here’s an instance of making a dropout layer:

import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
layer = tf.keras.layers.Dropout(.2, input_shape=(2,))
data = np.arange(10).reshape(5, 2).astype(np.float32)
print(data)
>>> [[0. 1.]
 [2. 3.]
 [4. 5.]
 [6. 7.]
 [8. 9.]]

outputs = layer(data, training=True)
print(outputs)
>>> tf.Tensor(
[[ 0.    1.25]
 [ 2.5   3.75]
 [ 5.    6.25]
 [ 7.5   8.75]
 [10.    0.  ]], shape=(5, 2), dtype=float32)

As you can see, with rate set to 0.2, roughly 20% of the input units were dropped (here, the 9 in the last row was set to 0), and the remaining values were scaled up by 1/(1 - rate) = 1.25 so that the expected sum stays the same.
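Also worth noting: dropout is only active while training (i.e., when training=True is passed, or during model.fit()). At inference time the layer is a no-op, which you can check by reusing the layer and data from above:

# With training=False (the default when calling the layer directly),
# no units are dropped and the inputs pass through unchanged
print(layer(data, training=False))
# tf.Tensor(
# [[0. 1.]
#  [2. 3.]
#  [4. 5.]
#  [6. 7.]
#  [8. 9.]], shape=(5, 2), dtype=float32)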

Reshaping Layers
Sometimes when you’re developing a neural network, the tensors don’t have the shape you need. In such a scenario, you can reshape whatever you have. Keras provides a comprehensive list of reshaping layers that you can use in different scenarios.

In this tutorial, we’ll be covering the following reshaping layers since they’re used the most:

I. Reshape Layer

The Reshape layer lets you change the shape of an arbitrary input into your desired target shape. If it’s the first layer in a model, you also need to provide the input shape as a parameter. The output shape is (batch_size,) + target_shape.

Here’s a brief example of how this layer can be used:

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Reshape((3, 4), input_shape=(12,)))
model.output_shape

>>> (None, 3, 4)

As you can see, we have converted a shape of (12,) into our desired shape of (3, 4). The first value is None because the batch size isn’t fixed at model-definition time.

II. Flatten Layer

The purpose of this layer is fairly straightforward: whatever input it gets, it simply flattens it, regardless of the batch size. Let’s try passing a multi-dimensional input to the Flatten layer and see how the layer flattens it:

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(64, 3, 3, input_shape=(3, 32, 32)))
model.add(tf.keras.layers.Flatten())
model.output_shape

>>> (None, 640)

As you can see, the Flatten layer has taken the convolution’s output and flattened it down to (None, 640). If you’re wondering what None refers to, it’s the batch size.

Summary
Deep neural networks are composed of layers, and the data gets processed all the way from the input layer to the output layer, passing through a lot of learnable parameters on its way. A variety of layers are involved in this process, each affecting the data in its own way.

Throughout the article, we’ve taken a detailed look at the most important layers you need to know about if you’re going to train your own neural networks. We have gone through the description of each layer and how it modifies its inputs and passes the outputs on to the successive layers. Moreover, we have seen instantiation examples for each layer in Keras.

So, make sure you go through the article in detail and if you want to explore further, don’t
hesitate to pay a visit to the official docs of Keras.

Avi Arora
Avi is a Computer Science student at the Georgia Institute of Technology
pursuing a Masters in Machine Learning. He is a software engineer working at
Capital One, and the co-founder of the company Octtone. His company creates
software products in the Health & Wellness space.

