Introduction:
The visual network of the human brain contains two pathways, dorsal and
ventral. While the dorsal pathway processes spatial information (e.g.,
motion detection), object detection happens in the IT visual cortex at the
end of the ventral pathway (V1 -> V2 -> V4 -> IT).
The Hubel and Wiesel experiments showed that basic edge detection
happens in the V1 cortex and rudimentary shapes are identified in the V4
cortex. The IT cortex identifies the object and the emotional content of
the image. Hence it is responsible for the reply, "Oh, this is Onam - King
Bali Chakravarthy," when the image given in the assignment is shown.
In the neural network trained for this purpose, the kernels in the first
layer activate when (soft) edges are detected. The next layer processes
the information from the activated kernels, each kernel activating when
the particular rudimentary shape specific to it is identified; it
functions similarly to the V4 cortex. The third layer is a dense layer
(preceded by a layer that flattens the output of the second layer) that
encodes the object. To show the functioning of the encoder, a decoder
network is trained that decodes the encoded representation back into an
image with properties similar to the input image. The entire training
process was done on the single image provided, so this network might not
perform well on other kinds of data, but the combination of encoder and
decoder is a very popular deep learning architecture called an
"autoencoder". This architecture is widely used for many applications and
generalizes well when more data is provided for training.
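The write-up does not give the exact layer sizes, so the following is only a minimal PyTorch sketch of the architecture described above: two convolutional layers (edge-like and shape-like features), a flatten step, a dense bottleneck, and a mirrored decoder, overfit on a single image. All dimensions and names are illustrative assumptions, not the notebook's actual code.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Two conv layers, a dense bottleneck, and a mirrored decoder.
    Layer sizes are illustrative -- the assignment does not specify them."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder: first conv ~ V1-like edge detectors, second ~ V4-like shapes
        self.encoder_conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
        )
        self.to_latent = nn.Linear(32 * 16 * 16, latent_dim)  # dense encoding
        # Decoder: mirror of the encoder
        self.from_latent = nn.Linear(latent_dim, 32 * 16 * 16)
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.to_latent(self.encoder_conv(x).flatten(1))
        h = self.from_latent(z).view(-1, 32, 16, 16)
        return self.decoder_conv(h)

# Overfit the single provided image, as described in the text
model = ConvAutoencoder(latent_dim=64)
image = torch.rand(1, 3, 64, 64)  # stand-in for the assignment image
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(image), image)
    loss.backward()
    optimizer.step()
```

Increasing `latent_dim` here corresponds to widening the encoded space, which is what the text below suggests would improve reconstruction of fine details.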
The Network:
The right image is the output from the decoder after training was
completed. Though the color gradient is very different, the object and
the leaves behind it are reconstructed to a good extent when shapes are
considered for evaluation. The color of the outline also matches the
input to a recognizable level. Hence the output of the encoder contains
some information about the objects in the image. If the dimensionality of
the encoded space were increased, the details inside the shapes would be
reconstructed better.
The kernels contain some edge structure, but there is also a lot of
noise. These kernels become somewhat cleaner when multiple images are
used to train the network, as the noise comes from overfitting the data,
i.e., the kernels become very specific to the given image.
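The linked notebook presumably plots the kernels; a generic way to inspect first-layer kernels of a PyTorch model with matplotlib (the `first_conv` layer here is a stand-in assumption, not the trained network) is:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Stand-in first layer; in practice this would be the trained network's first Conv2d.
first_conv = nn.Conv2d(3, 16, kernel_size=3)

# Pull out the kernel weights and rescale them to [0, 1] for display.
kernels = first_conv.weight.detach()  # shape: (16, 3, 3, 3)
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for ax, kernel in zip(axes.flat, kernels):
    ax.imshow(kernel.permute(1, 2, 0))  # (H, W, C) layout for imshow
    ax.axis("off")
fig.suptitle("First-layer kernels")
plt.show()
```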
(All the kernels of the network can be seen in this Google Colab notebook:
https://colab.research.google.com/drive/1ZuPJtkT5iUAe43lykVlLjup8cKS8ONB-?usp=sharing)