
Vision Network: Ventral Pathway

Introduction:
The vision network of the human brain contains two pathways, dorsal and ventral. The dorsal pathway processes spatial information (e.g., motion detection), while object recognition happens in the IT (inferotemporal) visual cortex, reached through the ventral pathway (V1 -> V2 -> V4 -> IT).

Hubel and Wiesel's experiments showed that basic edge detection happens in the V1 cortex and rudimentary shapes are identified in the V4 cortex. The IT cortex identifies the object and the emotional content of the image. Hence it is responsible for the reply, "Oh, this is Onam - King Bali Chakravarthy," when the image given in the assignment is shown.

Coding a neural network that processes images in a similar way


Vision Network:
CNNs (convolutional neural networks) are deep learning models inspired by the visual cortex of the brain. The convolution kernels, once trained, perform functions similar to those of the V1, V2, and V4 cortexes.
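As a toy illustration of this analogy, the sketch below shows how a single hand-crafted convolution kernel responds strongly wherever a vertical edge appears, roughly the role attributed to V1 simple cells. The image, kernel, and sizes here are illustrative assumptions, not values from the trained network.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted vertical-edge kernel (a trained kernel would look noisier).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)  # strongest in the output columns straddling the edge
```

A trained first-layer kernel plays the same role, except that its weights are learned from data rather than written by hand.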

In the neural network trained for this purpose, the kernels in the first layer activate when (soft) edges are detected. The next layer processes the information from the activated kernels, and each kernel activates when the particular rudimentary shape specific to it is identified; this layer functions like the V4 cortex. The third layer is a dense layer (preceded by a layer that flattens the output of the second layer) that encodes the object. To show that the encoder works, a decoder network is trained that maps the encoded representation back into an image with properties similar to the input image. The entire training process was done on the single image provided, so this network might not perform well on other kinds of data, but this is a very popular deep learning architecture called an "autoencoder" (when the encoder and decoder are combined). This architecture is widely used in many applications and generalizes well when more data is provided for training.
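The encoder-decoder idea above can be sketched in a minimal, dense-only form. This is an illustrative toy, not the assignment's actual network: the real encoder has convolutional layers before the flatten and dense encoding, the latent size of 16 is an assumption, and the random untrained weights here merely show how the shapes flow from image to latent code and back.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 32      # toy grayscale image size (assumption)
LATENT = 16     # dimensionality of the encoded space (assumption)

# Encoder: flatten the image and project it to a small latent code.
enc_w = rng.normal(0.0, 0.1, size=(H * W, LATENT))
# Decoder: project the code back to image space and reshape.
dec_w = rng.normal(0.0, 0.1, size=(LATENT, H * W))

def encode(img):
    return np.tanh(img.reshape(-1) @ enc_w)

def decode(code):
    return (code @ dec_w).reshape(H, W)

img = rng.random((H, W))
code = encode(img)
recon = decode(code)

print(code.shape, recon.shape)  # (16,) (32, 32)
```

Training an autoencoder then amounts to adjusting these weights so that `recon` matches `img` as closely as possible, forcing the small latent code to keep the most important information about the image.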

The Network:

Layer 3: similar to the IT cortex.


The following reconstruction of the image shows that the encoder works:

The right image is the output from the decoder after training was completed. Though the color gradient is very different, the object and the leaves behind it are reconstructed to a good extent when shapes are considered for evaluation. The color of the outline also matches the input to a recognizable level. Hence the output of the encoder contains some information about the objects in the image. When the dimensionality of the encoded space is increased, the reconstruction captures more of the image's details inside the shapes.
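One simple way to quantify reconstruction quality is the mean squared error (MSE) between input and output. The toy arrays below are placeholders, but they show why a reconstruction that preserves shapes while shifting the color gradient uniformly, as above, still scores a nonzero pixel error, which is why shape-based evaluation and pixel-based evaluation can disagree.

```python
import numpy as np

def reconstruction_mse(original, reconstructed):
    """Mean squared pixel error between two images of the same shape."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))

original = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
perfect = original.copy()
shifted = original + 0.5   # same shapes, uniformly shifted brightness

print(reconstruction_mse(original, perfect))  # 0.0
print(reconstruction_mse(original, shifted))  # 0.25
```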

First layer: similar to the V1 cortex


The first layer detects the edges in the image. The kernels in this layer are 6x6 filters that activate when the corresponding edge is present in the input image. Some of the kernels are shown here:

The kernels contain some edge structure, but there is also a lot of noise. These kernels become cleaner when multiple images are used to train the network, since the noise is caused by overfitting the data, i.e., the kernels become very specific to the given image.
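To display a learned kernel as an image like the ones above, its weights, which can be negative, are typically rescaled to the [0, 1] range. A minimal sketch, where a random 6x6 kernel stands in for a trained one:

```python
import numpy as np

def kernel_to_image(kernel):
    """Rescale a kernel's weights to [0, 1] so it can be shown as an image."""
    k = np.asarray(kernel, dtype=float)
    lo, hi = k.min(), k.max()
    if hi == lo:                     # constant kernel -> mid-gray
        return np.full_like(k, 0.5)
    return (k - lo) / (hi - lo)

rng = np.random.default_rng(1)
kernel = rng.normal(size=(6, 6))     # placeholder for a trained 6x6 kernel
img = kernel_to_image(kernel)

print(img.shape, round(img.min(), 2), round(img.max(), 2))  # (6, 6) 0.0 1.0
```

The most negative weight maps to black and the most positive to white, so the visible light/dark pattern directly reflects which input pixels excite or suppress the kernel.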

Second layer: Detects basic shapes


Some images of the kernels in the second layer:

- A Y-like shape
- A Z-like shape
- An inverted F-like shape
- A +-like shape
- A 4-like shape
- Horizontal and vertical lines along 1, 5 and 0, 4, forming a box-like shape
These kernels contain information about some very basic shapes. They are also very noisy, and they are not sharp images of those shapes; the pixel intensity varies along the shape. Still, this gives an idea that some kernels activate when a particular shape is observed.

(All the kernels of the network can be seen in this Google Colab notebook:
https://colab.research.google.com/drive/1ZuPJtkT5iUAe43lykVlLjup8cKS8ONB-?usp=sharing)
