
Make a Simple Neural Network Using NumPy

In this article, we will discuss how to make a simple neural network using NumPy.

Import Libraries

First, we will import all the packages that we will need: numpy, h5py (for loading the
dataset stored in an H5 file), and matplotlib (for plotting).
import numpy as np
import matplotlib.pyplot as plt
import h5py

Data Preparation

The data is available in “.h5” format and contains training and test sets of images labeled as cat or
non-cat. The dataset is available in the GitHub repo for download. Load the dataset using the following
function:

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_x = np.array(train_dataset["train_set_x"][:])
    train_y = np.array(train_dataset["train_set_y"][:])

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_x = np.array(test_dataset["test_set_x"][:])
    test_y = np.array(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])

    train_y = train_y.reshape((1, train_y.shape[0]))
    test_y = test_y.reshape((1, test_y.shape[0]))

    return train_x, train_y, test_x, test_y, classes

We can analyze the data by looking at their shapes.

train_x, train_y, test_x, test_y, classes = load_dataset()

print("Train X shape: " + str(train_x.shape))
print("Train Y shape: " + str(train_y.shape))
print("Test X shape: " + str(test_x.shape))
print("Test Y shape: " + str(test_y.shape))

We have 209 training images, where each image is square (height = 64 px, width = 64 px) with
3 channels (RGB). Similarly, we have 50 test images of the same dimensions.

Let us visualize the image. You can change the index to see different images.
# change index for another image
index = 2
plt.imshow(train_x[index])

Data Preprocessing: The common data preprocessing for image data involves:

1. Figure out the dimensions and shapes of the data (m_train, m_test, num_px, …); see the sketch after this list

2. Reshape the datasets such that each example is now a vector of size (height * width * channel, 1)

3. “Standardize” the data
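
A minimal sketch of step 1, using the arrays returned by load_dataset() above (the values in the comments match the 209/50/64 figures mentioned earlier):

m_train = train_x.shape[0]   # number of training examples (209)
m_test = test_x.shape[0]     # number of test examples (50)
num_px = train_x.shape[1]    # height/width of each square image (64)

print("m_train: " + str(m_train))
print("m_test: " + str(m_test))
print("num_px: " + str(num_px))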

First, we need to flatten the images. This can be done by reshaping each image of shape (height,
width, channel) into a NumPy array of shape (height * width * channel, 1).

train_x = train_x.reshape(train_x.shape[0], -1).T
test_x = test_x.reshape(test_x.shape[0], -1).T

print("Train X shape: " + str(train_x.shape))
print("Train Y shape: " + str(train_y.shape))
print("Test X shape: " + str(test_x.shape))
print("Test Y shape: " + str(test_y.shape))

Standardize the data: A common preprocessing step in machine learning is to center and
standardize the dataset. For the given picture dataset, this can be done by dividing every row of the
dataset by 255 (the maximum value of a pixel channel).

train_x = train_x/255.
test_x = test_x/255.

Now we will build a simple neural network model that can correctly classify pictures as cat or non-cat.

Neural Network Model

We will build a Neural Network as shown in the following figure.


Key steps: The main steps for building a Neural Network are:

1. Define the model structure (like the number of input features, the number of outputs, etc.)

2. Initialize the model’s parameters (weight and bias)

3. Loop:

• Calculate current loss (forward propagation)

• Calculate current gradient (backward propagation)

• Update parameters (gradient descent)

Activation Function

The sigmoid activation function is given by

σ(z) = 1 / (1 + e^(−z))

The sigmoid activation function can be calculated using np.exp().

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
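
As a quick sanity check, sigmoid(0) should equal 0.5, and the function should saturate toward 0 and 1 for large negative and positive inputs:

print(sigmoid(np.array([-10., 0., 10.])))
# [4.5398e-05 5.0000e-01 9.9995e-01] (approximately)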

Initializing Parameters
We need to initialize the parameters w (weight) and b (bias). In the following example, w is
initialized as a vector of small random numbers using np.random.randn() (scaled by 0.01), while b is
initialized to zero.

def initialize_parameters(dim):
    w = np.random.randn(dim, 1) * 0.01
    b = 0
    return w, b

Forward and Back Propagation

Once the parameters are initialized, we can do the “forward” and “backward” propagation steps for
learning the parameters.

• A set of input features X is given.

• We calculate the activation: A = σ(wᵀX + b).

• We compute the cost: J = −(1/m) · Σ [Y · log(A) + (1 − Y) · log(1 − A)].

• Finally, we calculate the gradients (back propagation): dw = (1/m) · X(A − Y)ᵀ and db = (1/m) · Σ(A − Y).

def propagate(w, b, X, Y):
    m = X.shape[1]

    # calculate activation function
    A = sigmoid(np.dot(w.T, X) + b)

    # find the cost
    cost = (-1/m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    cost = np.squeeze(cost)

    # find the gradients (back propagation)
    dw = (1/m) * np.dot(X, (A - Y).T)
    db = (1/m) * np.sum(A - Y)

    grads = {"dw": dw,
             "db": db}
    return grads, cost

Optimization

After initializing the parameters, computing the cost function, and calculating the gradients, we can
now update the parameters using gradient descent: w := w − α · dw and b := b − α · db, where α is the
learning rate.

def gradient_descent(w, b, X, Y, iterations, learning_rate):
    costs = []
    for i in range(iterations):
        grads, cost = propagate(w, b, X, Y)

        # update parameters
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]

        costs.append(cost)
        if i % 500 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}
    return params, costs

Prediction

Using the learned parameters w and b, we can predict the labels for the train or test examples. For
prediction, we first calculate the activation A = σ(wᵀX + b), as before.

Then we convert the output (prediction) into 0 (if A <= 0.5) or 1 (if A > 0.5) and store it in y_pred.

def predict(w, b, X):
    # number of examples
    m = X.shape[1]
    y_pred = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        y_pred[0, i] = 1 if A[0, i] > 0.5 else 0

    return y_pred
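
As an aside, the loop inside predict() could be replaced by a single vectorized comparison, which is more idiomatic NumPy and produces the same result:

y_pred = (A > 0.5).astype(float)   # 1.0 where A > 0.5, else 0.0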

Final Model

We can put together all the building blocks in the right order to make a neural network model.

def model(train_x, train_y, test_x, test_y, iterations, learning_rate):
    w, b = initialize_parameters(train_x.shape[0])
    parameters, costs = gradient_descent(w, b, train_x, train_y, iterations, learning_rate)

    w = parameters["w"]
    b = parameters["b"]

    # predict
    train_pred_y = predict(w, b, train_x)
    test_pred_y = predict(w, b, test_x)

    print("Train Acc: {} %".format(100 - np.mean(np.abs(train_pred_y - train_y)) * 100))
    print("Test Acc: {} %".format(100 - np.mean(np.abs(test_pred_y - test_y)) * 100))

    return costs

We can use the following code to train and predict on the image dataset using the model built
above. We will use a learning_rate of 0.005 and train the model for 2000 iterations.

costs = model(train_x, train_y, test_x, test_y, iterations = 2000, learning_rate = 0.005)

Training accuracy is around 99%, which means that our model is working and fits the training data
well. Test accuracy is around 70%. Given the simple model and the small dataset, we can consider
this a good result. Finally, we can plot the cost and see how the model learned its parameters.

plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations')
plt.title("Learning rate = 0.005")
plt.show()

We can see the cost decreasing with each iteration, which shows that the parameters are being learned.
Convolutional Neural Networks
What is a CNN?

A Convolutional Neural Network is a type of neural network that is used mainly in image processing
applications. CNNs are also applied to sequential data such as audio, time series, and NLP.
Convolution is one of the main building blocks of a CNN. The term convolution refers to the
mathematical combination of two functions to produce a third function: it merges two sets of
information.

We won’t go over a lot of theory here. There’s plenty of fantastic material available online for this.

Types of CNN operations

CNNs are majorly used for applications surrounding images, audio, videos, text, and time-series
modelling. There are 3 types of convolution operations.

• 1D convolution — majorly used where the input is sequential, such as text or audio.

• 2D convolution — majorly used where the input is an image.

• 3D convolution — majorly used in 3D medical imaging or detecting events in videos. This is outside the scope of this blog post. We will only focus on the first two.

1D Convolution for 1D Input

The filter slides along a single dimension to produce an output. The following diagrams are taken
from this Stackoverflow answer.
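
To make the sliding concrete, here is a minimal NumPy sketch of a 1D convolution (strictly speaking, the cross-correlation that deep learning libraries actually compute); the signal and kernel values are made up for illustration:

def conv1d_manual(signal, kernel, stride=1):
    k = len(kernel)
    # each output element is the dot product of the kernel
    # with the window of the signal it currently covers
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(0, len(signal) - k + 1, stride)])

signal = np.array([1., 2., 3., 4., 5.])
kernel = np.array([1., 0., -1.])
print(conv1d_manual(signal, kernel))   # [-2. -2. -2.]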
1D Convolution for 2D Input

2D Convolution for 2D Input

Check out this Stackoverflow answer for more information on different types of CNN operations.

A Few Key Terminologies

The terminologies are explained for 2D convolutions and 2D inputs, i.e. images, because I could not
find relevant visualizations for 1D convolutions. All the visualizations are taken from here.

Convolution Operation

To calculate the output dimension after a convolution operation, we can use the following formula:

output size = floor((W − K + 2P) / S) + 1

where W is the input size, K is the kernel/filter size, P is the padding, and S is the stride.
The kernel/filter slides over the input signal as shown below. You can see the filter (the green
square) sliding over our input (the blue square), and the sum of the convolution goes into
the feature map (the red square).
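
Here is a small helper (a sketch, with the symbols named as in the formula above) that evaluates the output dimension:

def conv_output_size(w, k, p=0, s=1):
    # floor((W - K + 2P) / S) + 1
    return (w - k + 2 * p) // s + 1

print(conv_output_size(10, 3))   # 8, matching the Conv1d examples later in this post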

Filter/Kernel

A convolution is performed on an input image using filters. The output of convolution is known as
a feature map.
In CNN terminology, the 3×3 matrix is called a ‘filter‘ or ‘kernel’ or ‘feature detector’ and the
matrix formed by sliding the filter over the image and computing the dot product is called the
‘Convolved Feature’ or ‘Activation Map’ or the ‘Feature Map‘. It is important to note that filters
act as feature detectors from the original input image.

more filters = more feature maps = more features.

A filter is nothing but a matrix of numbers.
Stride

Stride specifies how much we move the convolution filter at each step.
We can have bigger strides if we want less overlap between the receptive fields. This also makes
the resulting feature map smaller since we are skipping over potential locations. The following
figure demonstrates a stride of 2. Note that the feature map got smaller.
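
Checking a stride of 2 against the formula, using the conv_output_size helper sketched above (a 7×7 input is assumed purely for illustration):

print(conv_output_size(7, 3, p=0, s=2))   # 3, so a 7x7 input shrinks to a 3x3 feature map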

Padding

We see that the size of the feature map is smaller than the input, because the convolution filter
needs to be contained in the input. If we want to maintain the same dimensionality, we can
use padding to surround the input with zeros. With padding we retain more information from the
borders and also preserve the size of the image.
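
A quick PyTorch check (the tensor values are arbitrary) that a 3×3 kernel with padding=1 preserves the spatial size:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)                       # batch, channels, height, width
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # zero-padding of 1 on each side
print(conv(x).shape)                              # torch.Size([1, 1, 5, 5])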

Pooling

We apply pooling to reduce dimensionality.

• Pooling reduces the size of the input and makes the feature dimension smaller.

• Because of the lower spatial size, the number of parameters in the network is reduced.
This helps in combating overfitting.

• Pooling makes the network robust to distortions in the image because we take the
aggregate (max, sum, average, etc.) of the pixel values in a neighborhood.
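
A minimal max-pooling sketch in PyTorch (the tensor values are arbitrary): a 2×2 window with stride 2 halves each spatial dimension:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)
pool = nn.MaxPool2d(kernel_size=2, stride=2)   # take the max of each 2x2 patch
print(pool(x).shape)                           # torch.Size([1, 1, 2, 2])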
Python code

Import Library
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

Input Data

To start with, we define a few input tensors which we will use throughout this blog post.

input_1d is a 1-dimensional float tensor. input_2d is a 2-dimensional float tensor. input_2d_img is a
3-dimensional float tensor which represents an image.

input_1d = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float)

input_2d = torch.tensor([[1, 2, 3, 4, 5],
                         [6, 7, 8, 9, 10]], dtype=torch.float)

input_2d_img = torch.tensor([[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
                             [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
                             [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]], dtype=torch.float)
###################### OUTPUT ######################
Input 1D:
input_1d.shape: torch.Size([10])
input_1d:
tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
====================================================================
Input 2D:
input_2d.shape: torch.Size([2, 5])
input_2d:
tensor([[ 1., 2., 3., 4., 5.],
        [ 6., 7., 8., 9., 10.]])
====================================================================
input_2d_img:
input_2d_img.shape: torch.Size([3, 3, 10])
input_2d_img:
tensor([[[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]],

        [[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]],

        [[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]]])

1D Convolution

nn.Conv1d() applies 1D convolution over the input. nn.Conv1d() expects the input to be of the
shape [batch_size, input_channels, signal_length]. You can check out the complete list of
parameters in the official PyTorch Docs. The required parameters are —

• in_channels (python:int) — Number of channels in the input signal. This should be equal to the number of channels in the input tensor.

• out_channels (python:int) — Number of channels produced by the convolution.

• kernel_size (python:int or tuple) — Size of the convolving kernel.

Conv1d — Input 1d

The input is a 1D signal which consists of 10 numbers. We will convert this into a tensor of size [1,
1, 10].
input_1d = input_1d.unsqueeze(0).unsqueeze(0)
input_1d.shape
###################### OUTPUT ######################
torch.Size([1, 1, 10])

CNN Output with out_channels=1, kernel_size=3 and stride=1 .

cnn1d_1 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=1)

print("cnn1d_1: \n")
print(cnn1d_1(input_1d).shape, "\n")
print(cnn1d_1(input_1d))

###################### OUTPUT ######################
cnn1d_1:
torch.Size([1, 1, 8])
tensor([[[-1.2353, -1.4051, -1.5749, -1.7447, -1.9145, -2.0843, -2.2541, -2.4239]]],
       grad_fn=<SqueezeBackward1>)

CNN Output with out_channels=1, kernel_size=3 and stride=2 .

cnn1d_2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=2)

print("cnn1d_2: \n")
print(cnn1d_2(input_1d).shape, "\n")
print(cnn1d_2(input_1d))

###################### OUTPUT ######################
cnn1d_2:
torch.Size([1, 1, 4])
tensor([[[0.5107, 0.3528, 0.1948, 0.0368]]], grad_fn=<SqueezeBackward1>)

CNN Output with out_channels=1, kernel_size=2 and stride=1 .

cnn1d_3 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1)

print("cnn1d_3: \n")
print(cnn1d_3(input_1d).shape, "\n")
print(cnn1d_3(input_1d))

###################### OUTPUT ######################
cnn1d_3:
torch.Size([1, 1, 9])
tensor([[[0.0978, 0.2221, 0.3465, 0.4708, 0.5952, 0.7196, 0.8439, 0.9683, 1.0926]]],
       grad_fn=<SqueezeBackward1>)

CNN Output with out_channels=5, kernel_size=3 and stride=1.

cnn1d_4 = nn.Conv1d(in_channels=1, out_channels=5, kernel_size=3, stride=1)

print("cnn1d_4: \n")
print(cnn1d_4(input_1d).shape, "\n")
print(cnn1d_4(input_1d))

###################### OUTPUT ######################
cnn1d_4:
torch.Size([1, 5, 8])
tensor([[[-1.8410e+00, -2.8884e+00, -3.9358e+00, -4.9832e+00, -6.0307e+00, -7.0781e+00, -8.1255e+00, -9.1729e+00],
         [-4.6073e-02, -3.4436e-02, -2.2799e-02, -1.1162e-02,  4.7439e-04,  1.2111e-02,  2.3748e-02,  3.5385e-02],
         [-1.5541e+00, -1.8505e+00, -2.1469e+00, -2.4433e+00, -2.7397e+00, -3.0361e+00, -3.3325e+00, -3.6289e+00],
         [ 6.6593e-01,  1.2362e+00,  1.8066e+00,  2.3769e+00,  2.9472e+00,  3.5175e+00,  4.0878e+00,  4.6581e+00],
         [ 2.0414e-01,  6.0421e-01,  1.0043e+00,  1.4044e+00,  1.8044e+00,  2.2045e+00,  2.6046e+00,  3.0046e+00]]],
       grad_fn=<SqueezeBackward1>)

Conv1d — Input 2d

To apply 1D convolution on a 2D input signal, we can do the following. First, we define our input
tensor of size [1, 2, 5], where batch_size = 1, input_channels = 2, and signal_length = 5.
input_2d = input_2d.unsqueeze(0)
input_2d.shape
###################### OUTPUT ######################
torch.Size([1, 2, 5])

CNN Output with in_channels=2, out_channels=1, kernel_size=3, stride=1 .


cnn1d_5 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=3, stride=1)

print("cnn1d_5: \n")
print(cnn1d_5(input_2d).shape, "\n")
print(cnn1d_5(input_2d))

###################### OUTPUT ######################
cnn1d_5:
torch.Size([1, 1, 3])
tensor([[[-6.6836, -7.6893, -8.6950]]], grad_fn=<SqueezeBackward1>)

CNN Output with in_channels=2, out_channels=1, kernel_size=3, stride=2 .

cnn1d_6 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=3, stride=2)

print("cnn1d_6: \n")
print(cnn1d_6(input_2d).shape, "\n")
print(cnn1d_6(input_2d))

###################### OUTPUT ######################
cnn1d_6:
torch.Size([1, 1, 2])
tensor([[[-3.4744, -3.7142]]], grad_fn=<SqueezeBackward1>)

CNN Output with in_channels=2, out_channels=1, kernel_size=2, stride=1 .

cnn1d_7 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=2, stride=1)

print("cnn1d_7: \n")
print(cnn1d_7(input_2d).shape, "\n")
print(cnn1d_7(input_2d))

###################### OUTPUT ######################
cnn1d_7:
torch.Size([1, 1, 4])
tensor([[[0.5619, 0.6910, 0.8201, 0.9492]]], grad_fn=<SqueezeBackward1>)

CNN Output with in_channels=2, out_channels=5, kernel_size=3, stride=1 .

cnn1d_8 = nn.Conv1d(in_channels=2, out_channels=5, kernel_size=3, stride=1)

print("cnn1d_8: \n")
print(cnn1d_8(input_2d).shape, "\n")
print(cnn1d_8(input_2d))

###################### OUTPUT ######################
cnn1d_8:
torch.Size([1, 5, 3])
tensor([[[ 1.5024, 2.4199, 3.3373],
[ 0.2980, -0.0873, -0.4727],
[ 1.5443, 1.7086, 1.8729],
[ 2.6177, 3.2974, 3.9772],
[-2.5145, -2.2906, -2.0668]]], grad_fn=<SqueezeBackward1>)

2D Convolution

nn.Conv2d() applies 2D convolution over the input. nn.Conv2d() expects the input to be of the
shape [batch_size, input_channels, input_height, input_width]. You can check out the complete
list of parameters in the official PyTorch Docs. The required parameters are —

• in_channels (python:int) — Number of channels in the 2D input, e.g. an image.

• out_channels (python:int) — Number of channels produced by the convolution.

• kernel_size (python:int or tuple) — Size of the convolving kernel

Conv2d — Input 2d

To apply 2D convolution on a 2D input signal (e.g. images), we can do the following. First, we
define our input tensor of size [1, 3, 3, 10], where batch_size = 1, input_channels = 3,
input_height = 3, and input_width = 10.
input_2d_img = input_2d_img.unsqueeze(0)
input_2d_img.shape
###################### OUTPUT ######################
torch.Size([1, 3, 3, 10])

CNN Output with in_channels=3, out_channels=1, kernel_size=3, stride=1 .


cnn2d_1 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1)

print("cnn2d_1: \n")
print(cnn2d_1(input_2d_img).shape, "\n")
print(cnn2d_1(input_2d_img))

###################### OUTPUT ######################
cnn2d_1:
torch.Size([1, 1, 1, 8])
tensor([[[[-1.0716, -1.5742, -2.0768, -2.5793, -3.0819, -3.5844, -4.0870, -4.5896]]]],
       grad_fn=<MkldnnConvolutionBackward>)

CNN Output with in_channels=3, out_channels=1, kernel_size=3, stride=2 .

cnn2d_2 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=2)

print("cnn2d_2: \n")
print(cnn2d_2(input_2d_img).shape, "\n")
print(cnn2d_2(input_2d_img))

###################### OUTPUT ######################
cnn2d_2:
torch.Size([1, 1, 1, 4])
tensor([[[[-0.7407, -1.2801, -1.8195, -2.3590]]]],
       grad_fn=<MkldnnConvolutionBackward>)

CNN Output with in_channels=3, out_channels=1, kernel_size=2, stride=1 .

cnn2d_3 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=2, stride=1)

print("cnn2d_3: \n")
print(cnn2d_3(input_2d_img).shape, "\n")
print(cnn2d_3(input_2d_img))

###################### OUTPUT ######################
cnn2d_3:
torch.Size([1, 1, 2, 9])
tensor([[[[-0.8046, -1.5066, -2.2086, -2.9107, -3.6127, -4.3147, -5.0167, -5.7188, -6.4208],
          [-0.8046, -1.5066, -2.2086, -2.9107, -3.6127, -4.3147, -5.0167, -5.7188, -6.4208]]]],
       grad_fn=<MkldnnConvolutionBackward>)

CNN Output with in_channels=3, out_channels=5, kernel_size=3, stride=1 .

cnn2d_4 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, stride=1)

print("cnn2d_4: \n")
print(cnn2d_4(input_2d_img).shape, "\n")
print(cnn2d_4(input_2d_img))

###################### OUTPUT ######################
cnn2d_4:
torch.Size([1, 5, 1, 8])
tensor([[[[-2.0868e+00, -2.7669e+00, -3.4470e+00, -4.1271e+00, -4.8072e+00, -5.4873e+00, -6.1673e+00, -6.8474e+00]],

        [[-4.5052e-01, -5.5917e-01, -6.6783e-01, -7.7648e-01, -8.8514e-01, -9.9380e-01, -1.1025e+00, -1.2111e+00]],

        [[ 6.6228e-01,  8.3826e-01,  1.0142e+00,  1.1902e+00,  1.3662e+00,  1.5422e+00,  1.7181e+00,  1.8941e+00]],

        [[-5.4425e-01, -1.2149e+00, -1.8855e+00, -2.5561e+00, -3.2267e+00, -3.8973e+00, -4.5679e+00, -5.2385e+00]],

        [[ 2.0564e-01,  1.6357e-01,  1.2150e-01,  7.9434e-02,  3.7365e-02, -4.7036e-03, -4.6773e-02, -8.8842e-02]]]],
       grad_fn=<MkldnnConvolutionBackward>)
