1/20/23, 12:05 PM Attention Mechanism for Image Processing - Coding Ninjas CodeStudio

Attention Mechanism for Image Processing

Soham Medewar

Last Updated: May 13, 2022

Table of Contents

1. Introduction
2. Image captioning using Attention Mechanism
3. Working of Bahdanau Attention Model
4. Image Captioning
4.1. Importing libraries
4.2. Loading Data
4.3. Data Preprocessing
4.4. Model Making
4.5. Training Model
4.6. Training Model
4.7. Greedy Search and BLEU Evaluation

https://www.codingninjas.com/codestudio/library/attention-mechanism-for-image-processing 1/16

Introduction
Humans have a sophisticated cognitive skill called attention. When people receive information, they can choose to concentrate on the most relevant part of it and disregard the rest.

Attention is the term for this power of self-selection. A neural network's attention mechanism likewise allows it to focus on a subset of its inputs when selecting features.

For Deep Learning practitioners, the attention mechanism has become a go-to approach. It was originally developed for Neural Machine Translation with Seq2Seq models, but today we'll look at how it's used in Image Captioning.

Instead of compressing a complete image into a single static representation, the attention technique dynamically brings important regions to the forefront when they are needed. This is especially crucial when an image has a lot of clutter.

Image captioning using Attention Mechanism
A classic encoder-decoder image captioning system encodes the image into a hidden state with a pre-trained Convolutional Neural Network. An LSTM then decodes this hidden state and generates a caption.

At each step of the sequence, the RNN takes its own output from the previous step together with the new sequence element as input. This gives RNNs a kind of memory, which can help captions become more useful and contextual.

However, because RNNs are computationally expensive to train and evaluate, this memory is typically limited to a few elements. By picking the most relevant elements from an input image, attention models can help solve this challenge.

With an attention method, the image is first divided into n parts, and we compute an image representation for each part. When the RNN generates a new word, the attention mechanism focuses on the relevant region of the image, so the decoder only uses certain sections of the image.

Working of Bahdanau Attention Model
Bahdanau attention, also known as additive attention, scores each encoder output against the current decoder state with a small feed-forward network and turns the scores into weights with a softmax. Attending to all source-side positions for every target word, as global attention does, is computationally expensive. To compensate for this shortcoming, local attention focuses on only a small subset of the encoder's hidden states per target word.

Local attention locates an alignment point, calculates the attention weights in a window to the left and right of that point, and then computes the context vector as the weighted sum over the window. The main benefit of local attention is that it lowers the cost of calculating the attention mechanism.

In practice, local attention uses a prediction function to forecast the source position to align with at the current decoding step, and then moves through the context window, considering only the words within it.
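The windowing idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy scores, and the Gaussian weighting with a standard deviation of half the window half-width are choices made for this example (they follow one common formulation of local attention), not code from the model built later in this article. Because of the Gaussian factor, the weights are not renormalized and need not sum to 1.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def local_attention_weights(scores, center, half_width):
    """Weights are non-zero only inside [center - half_width, center + half_width]."""
    n = len(scores)
    lo = max(0, center - half_width)
    hi = min(n, center + half_width + 1)
    window = softmax(scores[lo:hi])          # normalize inside the window only
    sigma = half_width / 2                   # assumed Gaussian width
    weights = [0.0] * n
    for offset, w in enumerate(window):
        pos = lo + offset
        # Favor positions close to the alignment point.
        weights[pos] = w * math.exp(-((pos - center) ** 2) / (2 * sigma ** 2))
    return weights

def context_vector(weights, states):
    """Weighted sum of the encoder states."""
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]

# Hypothetical alignment scores for six source positions.
scores = [0.1, 2.0, 0.3, 1.5, 0.2, 0.9]
states = [[float(i), 1.0] for i in range(6)]
weights = local_attention_weights(scores, center=3, half_width=1)
context = context_vector(weights, states)
```

Position 1 has the highest raw score, but it lies outside the window around position 3, so it receives zero weight. Only three of the six positions take part in the softmax and in the weighted sum.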

Now, let us implement a model that will help us understand the attention mechanism in image captioning.

Image Captioning
I will be implementing the model using the Flickr 8k dataset. The dataset has 8000 different images, and each image has five different captions.

Importing libraries

import os, re, string, sys, time, warnings
import pickle
from pickle import load
from collections import Counter

import numpy as np
import pandas as pd
from numpy import array
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm
from nltk.translate.bleu_score import sentence_bleu
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

import tensorflow as tf
# All Keras imports go through tensorflow.keras (TensorFlow 2.x) so that a
# standalone keras install is not mixed in.
from tensorflow import keras
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LSTM, Embedding, Dropout, add
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

warnings.filterwarnings("ignore")

Loading Data
I am using Google Colab for training the model, and I am loading the dataset directly from Google Drive.

image_path = "/content/drive/MyDrive/Datasets/archive/Images"
dir_Flickr_text = "/content/drive/MyDrive/Datasets/archive/captions.txt"
jpgs = os.listdir(image_path)

print("Total Images in Dataset = {}".format(len(jpgs)))

Total Images in Dataset = 8101

Data Preprocessing
Firstly, we will make a dataframe of image names and their associated captions.

# Read the whole captions file into memory.
with open(dir_Flickr_text, 'r') as file:
    text = file.read()

datatxt = []
for i, line in enumerate(text.split('\n')):
    if i == 0:
        continue  # skip the header row
    col = line.split('\t')
    col = col[0].split(',')
    w = col[0].split("#")
    if len(col) < 2:
        continue  # skip blank or malformed lines
    # Note: col[1] keeps only the caption text up to its first comma.
    datatxt.append(w + [col[1].lower()])

data = pd.DataFrame(datatxt, columns=["filename", "caption"])
# Exclude rows that carry this bad file name.
data = data[data.filename != '2258277193_586949ec62.jpg.1']
uni_filenames = np.unique(data.filename.values)
data.head()

Next, we will visualize some images with their respective 5 captions.

npc = 5
npx = 224
count = 1

fig = plt.figure(figsize=(10, 20))
for fn in uni_filenames[10:14]:
    fname = image_path + '/' + fn
    captions = list(data["caption"].loc[data["filename"] == fn].values)
    # load_img takes the size through the target_size keyword as (height, width).
    image_load = load_img(fname, target_size=(npx, npx))
    # Left column: the image.
    axs = fig.add_subplot(npc, 2, count, xticks=[], yticks=[])
    axs.imshow(image_load)
    count += 1

    # Right column: its captions, one per row.
    axs = fig.add_subplot(npc, 2, count)
    plt.axis('off')
    axs.plot()
    axs.set_xlim(0, 1)
    axs.set_ylim(0, len(captions))
    for j, caption in enumerate(captions):
        axs.text(0, j, caption, fontsize=20)
    count += 1
plt.show()

Let us see the size of the current vocabulary.

vocabulary = []
for txt in data.caption.values:
    vocabulary.extend(txt.split())
print('Vocabulary is of size: %d' % len(set(vocabulary)))

Vocabulary is of size: 8182

Now we will do some text cleaning on the captions, i.e., removing punctuation, removing single characters, and removing numerical values.

def punctuation_removal(text_original):
    # str.translate needs a table built by str.maketrans; passing
    # string.punctuation directly leaves the text unchanged.
    table = str.maketrans('', '', string.punctuation)
    return text_original.translate(table)

def single_character_removal(text):
    # Keep only words longer than one character.
    return " ".join(word for word in text.split() if len(word) > 1)

def number_removal(text):
    # Keep only purely alphabetic words.
    return " ".join(word for word in text.split() if word.isalpha())

def text_cleaner(text_original):
    text = punctuation_removal(text_original)
    text = single_character_removal(text)
    text = number_removal(text)
    return text

# Clean every caption in the dataframe.
data["caption"] = data["caption"].apply(text_cleaner)
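To see what the three cleaning steps do to a single caption, here is a small self-contained sketch. The sample caption and the `clean_caption` helper are hypothetical and only combine the steps described above into one function. The last line also illustrates a detail worth knowing: `str.translate` expects a translation table, so passing `string.punctuation` to it directly leaves the text unchanged, whereas a table built with `str.maketrans` removes the punctuation.

```python
import string

def clean_caption(caption):
    """Lowercase, strip punctuation, drop one-letter words and non-alphabetic words."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.lower().translate(table).split()
    return " ".join(w for w in words if len(w) > 1 and w.isalpha())

# Hypothetical caption, not taken from the dataset.
sample = "A dog, aged 3, runs across the field ."
cleaned = clean_caption(sample)

# Passing the punctuation string itself is a no-op for ordinary text.
unchanged = sample.translate(string.punctuation)
```

Here `cleaned` is "dog aged runs across the field": the article "A" and the number "3" are gone, and "dog," survives as "dog" because the comma is stripped before the alphabetic check.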

Let's check the size of the vocabulary after cleaning the captions.

clean = []
for txt in data.caption.values:
    clean.extend(txt.split())
print('Clean Vocabulary Size: %d' % len(set(clean)))

Clean Vocabulary Size: 8182

Next, we save all of the captions and image paths in two separate lists so that we can use the list of paths to load all of the images at once. We also add '<start>' and '<end>' tags to each caption so that the model can understand where each caption begins and ends.

PATH = "/content/drive/MyDrive/Datasets/archive/Images/"
total_captions = []
for cp in data["caption"].astype(str):
    cp = '<start> ' + cp + ' <end>'
    total_captions.append(cp)

total_captions[:10]

img_vectors = []
for annotations in data["filename"]:
    image_paths = PATH + annotations
    img_vectors.append(image_paths)

img_vectors[:10]

Now we will check the sizes of the caption and image path lists.

print(f"len(img_vectors) : {len(img_vectors)}")
print(f"len(total_captions) : {len(total_captions)}")

len(img_vectors) : 40455
len(total_captions) : 40455

We'll keep just 40000 of each so that the data divides evenly into batches, i.e. 625 batches if the batch size is 64. To do this, we create a function that shuffles the dataset and restricts it to 40000 captions and image paths.

def data_limiter(nums, tc, imv):
    # Shuffle captions and image paths together, then keep the first `nums`.
    training_captions, image_vector = shuffle(tc, imv, random_state=1)
    training_captions = training_captions[:nums]
    image_vector = image_vector[:nums]
    return training_captions, image_vector

train_captions, img_name_vector = data_limiter(40000, total_captions, img_vectors)
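The batch arithmetic behind the choice of 40000 can be checked directly. The numbers below come from this article (40455 caption/image pairs, 40000 kept, a batch size of 64, and the 80/20 train/validation split used in the training section); the variable names are only for this illustration.

```python
BATCH_SIZE = 64
total_pairs = 40455   # caption/image pairs before limiting
kept_pairs = 40000    # pairs kept by data_limiter

# Full dataset: the last batch would be incomplete.
batches_all, remainder_all = divmod(total_pairs, BATCH_SIZE)

# Limited dataset: divides evenly.
batches, remainder = divmod(kept_pairs, BATCH_SIZE)

# 80/20 split: one fifth is held out for validation.
val_pairs = kept_pairs // 5
train_pairs = kept_pairs - val_pairs
steps_per_epoch, train_remainder = divmod(train_pairs, BATCH_SIZE)

print(batches_all, remainder_all)        # 632 7
print(batches, remainder)                # 625 0
print(steps_per_epoch, train_remainder)  # 500 0
```

With all 40455 pairs the final batch would hold only 7 samples, while 40000 pairs give exactly 625 full batches, and the 32000 training pairs left after the split give exactly 500 full batches per epoch.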

Model Making
Let's use VGG16 to define the image feature extraction model. It's important to note that we don't need to classify the images here; all we need to do is extract an image feature vector. As a result, the classification head, including the softmax layer, is removed from the model. Before feeding the images into the model, we must preprocess them all to the same size, 224×224.

def load_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (224, 224))
    image = preprocess_input(image)
    return image, path

# include_top=False drops the fully connected classification layers.
image_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)

image_features_extract_model.summary()

Next, let's map each image name to the function that loads the image:

encode_train = sorted(set(img_name_vector))
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(
    load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(64)

We extract the features and save them in .npy files, one per image, so that they can later be loaded and passed through the encoder. NPY files include all of the information needed to reconstruct an array on any machine, including its dtype and shape.

%%time
for img, path in tqdm(image_dataset):
    batch_features = image_features_extract_model(img)
    # Flatten the spatial grid: (batch, 7, 7, 512) -> (batch, 49, 512).
    batch_features = tf.reshape(batch_features,
                                (batch_features.shape[0], -1, batch_features.shape[3]))

    for bf, p in zip(batch_features, path):
        path_of_feature = p.numpy().decode("utf-8")
        # np.save appends the .npy extension to the image path.
        np.save(path_of_feature, bf.numpy())
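It helps to see where the feature shape comes from, because the attention mechanism later treats every spatial cell of the VGG16 output as one image region. The sketch below assumes the standard VGG16 layout of five 2×2 max-pooling stages and 512 channels in the last convolutional block; the helper names are made up for this illustration.

```python
def vgg16_feature_grid(image_size=224, num_pool_layers=5, channels=512):
    """Spatial size of the last VGG16 feature map for a square input."""
    side = image_size
    for _ in range(num_pool_layers):
        side //= 2  # each 2x2 max-pool with stride 2 halves the side
    return side, side, channels

def flatten_grid(grid):
    """Turn an (H, W, C) nested list into an (H*W, C) list of region vectors."""
    return [cell for row in grid for cell in row]

height, width, depth = vgg16_feature_grid()
num_regions = height * width

# Tiny 2x3 grid with 2 "channels" per cell to show the flattening order.
tiny_grid = [[[r, c] for c in range(3)] for r in range(2)]
tiny_regions = flatten_grid(tiny_grid)
```

For a 224×224 input this gives a 7×7×512 map, i.e. 49 regions of 512 values each, which matches the values attention_features_shape = 49 and features_shape = 512 used in the training parameters.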

Now, we will tokenize the captions and build a vocabulary limited to the 5000 most frequent words in the data. Words that are not in the vocabulary will be marked as <unk>.

topk = 5000
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=topk,
                                                  oov_token="<unk>",
                                                  filters='!"#$%&()*+.,-/:;=?@[\]^_`{|}~ ')

tokenizer.fit_on_texts(train_captions)
# Reserve index 0 for padding.
tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'

train_seqs = tokenizer.texts_to_sequences(train_captions)
# Pad every sequence at the end to the length of the longest caption.
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding='post')

train_captions[:3]

train_seqs[:3]
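What the tokenizer and the padding step do can be shown with a simplified stand-in written in plain Python. This is not the Keras implementation: `build_vocab` and `encode` are hypothetical helpers, the two captions are toy data, and the exact index assignment of the real Tokenizer may differ. The sketch only demonstrates the three ideas used above: keeping the most frequent words, mapping everything else to <unk>, and padding at the end with <pad> (index 0).

```python
from collections import Counter

def build_vocab(captions, top_k):
    """Index the top_k most frequent words; 0 is <pad>, 1 is <unk>."""
    counts = Counter(word for cap in captions for word in cap.split())
    word_index = {"<pad>": 0, "<unk>": 1}
    for word, _ in counts.most_common(top_k):
        word_index[word] = len(word_index)
    return word_index

def encode(captions, word_index):
    """Map words to ids and post-pad every sequence to the longest one."""
    unk = word_index["<unk>"]
    seqs = [[word_index.get(w, unk) for w in cap.split()] for cap in captions]
    max_len = max(len(s) for s in seqs)
    return [s + [word_index["<pad>"]] * (max_len - len(s)) for s in seqs]

# Toy captions, not taken from the dataset.
captions = ["<start> brown dog runs <end>",
            "<start> dog in field of grass <end>"]
word_index = build_vocab(captions, top_k=3)
padded = encode(captions, word_index)
```

With top_k=3 only "<start>", "dog" and "<end>" (each seen twice) make it into the vocabulary, so the first caption becomes [2, 1, 3, 1, 4, 0, 0]: the rare words turn into 1 (<unk>) and the two trailing zeros are padding.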

Let us see the maximum and minimum length of the captions.

def max_sz(tensor):
    return max(len(t) for t in tensor)
max_length = max_sz(train_seqs)

def min_sz(tensor):
    return min(len(t) for t in tensor)
min_length = min_sz(train_seqs)

print('Max Length of any caption : Min Length of any caption = ' + str(max_length) + " : " + str(min_length))

Max Length of any caption : Min Length of any caption = 31 : 2

Training Model
Now we will split the data into training and validation sets using train_test_split.

img_name_train, img_name_val, cap_train, cap_val = train_test_split(
    img_name_vector, cap_vector, test_size=0.2, random_state=0)

Defining the training parameters

BATCH_SIZE = 64
BUFFER_SIZE = 1000
embedding_dim = 256
units = 512
vocab_size = len(tokenizer.word_index) + 1
num_steps = len(img_name_train) // BATCH_SIZE
# VGG16 features have shape (49, 512): 49 image regions with 512 channels each.
features_shape = 512
attention_features_shape = 49

Next, let's create a tf.data dataset to use for training our model.

def map_function(img_name, cap):
    # Load the cached VGG16 features saved next to each image.
    tensor_img = np.load(img_name.decode('utf-8') + '.npy')
    return tensor_img, cap

dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))

dataset = dataset.map(lambda item1, item2: tf.numpy_function(
    map_function, [item1, item2], [tf.float32, tf.int32]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

Let us define the encoder-decoder model with attention.

class VGG16_Encoder(tf.keras.Model):
    # The image features are already extracted and cached, so the encoder
    # only projects them to embedding_dim with a fully connected layer.
    def __init__(self, embedding_dim):
        super(VGG16_Encoder, self).__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim)
        self.dropout = tf.keras.layers.Dropout(0.5, noise_shape=None, seed=None)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x

Defining RNN

def rnn_type(units):
    # Helper that picks a recurrent layer depending on GPU availability.
    if tf.test.is_gpu_available():
        return tf.compat.v1.keras.layers.CuDNNLSTM(units,
                                                   return_state=True,
                                                   return_sequences=True,
                                                   recurrent_initializer='glorot_uniform')
    else:
        return tf.keras.layers.GRU(units,
                                   return_state=True,
                                   return_sequences=True,
                                   recurrent_activation='sigmoid',
                                   recurrent_initializer='glorot_uniform')

Defining RNN Decoder with Bahdanau Attention.

class Rnn_Local_Decoder(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(Rnn_Local_Decoder, self).__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.dropout = tf.keras.layers.Dropout(0.5, noise_shape=None, seed=None)
        self.batchnormalization = tf.keras.layers.BatchNormalization(
            axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
            beta_initializer='zeros', gamma_initializer='ones',
            moving_mean_initializer='zeros', moving_variance_initializer='ones',
            beta_regularizer=None, gamma_regularizer=None,
            beta_constraint=None, gamma_constraint=None)

        self.fc2 = tf.keras.layers.Dense(vocab_size)

        # Attention Mechanism
        self.Uattn = tf.keras.layers.Dense(units)
        self.Wattn = tf.keras.layers.Dense(units)
        self.Vattn = tf.keras.layers.Dense(1)

    def call(self, x, features, hidden):
        # features: (batch, 49, embedding_dim); hidden: (batch, units)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # score: (batch, 49, 1), one additive-attention score per image region
        score = self.Vattn(tf.nn.tanh(self.Uattn(features) + self.Wattn(hidden_with_time_axis)))
        # Normalize the scores over the 49 regions.
        attention_weights = tf.nn.softmax(score, axis=1)
        # context_vector: (batch, embedding_dim), the weighted sum of the regions
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        x = self.embedding(x)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        output, state = self.gru(x)
        x = self.fc1(output)
        x = tf.reshape(x, (-1, x.shape[2]))
        x = self.dropout(x)
        x = self.batchnormalization(x)
        x = self.fc2(x)
        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))

encoder = VGG16_Encoder(embedding_dim)
decoder = Rnn_Local_Decoder(embedding_dim, units, vocab_size)
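The attention part of the decoder computes, for every image region f_i and the decoder state h, a score v · tanh(U f_i + W h), applies a softmax over the regions, and sums the regions weighted by the result. The following plain-Python sketch reproduces that computation on toy numbers. It is a simplified illustration: the matrices U, W and the vector v are hand-picked rather than learned, and the bias terms that the Dense layers add are left out.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def additive_attention(features, hidden, U, W, v):
    """score_i = v . tanh(U f_i + W h); returns (weights, context vector)."""
    wh = matvec(W, hidden)
    scores = []
    for f in features:
        uf = matvec(U, f)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, uf, wh)))
    weights = softmax(scores)  # one weight per image region
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Toy setup: three regions with two features each (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
U = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
v = [1.0, 1.0]
weights, context = additive_attention(features, hidden, U, W, v)
```

The weights sum to 1 and the third region receives the largest share, so the context vector is pulled towards it. In the model above the same computation runs over 49 regions for every word that is generated.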

Defining optimizer and loss function.

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
    # Padding positions (token id 0) must not contribute to the loss.
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_mean(loss_)
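The effect of the mask can be illustrated with a small plain-Python version of the same idea. It is a sketch under simplifying assumptions: it works on probabilities instead of logits, handles a single time step, and uses made-up numbers. As in the function above, masked positions are set to zero and the mean is still taken over all entries.

```python
import math

PAD_ID = 0  # padding token id, as in the tokenizer setup

def masked_cross_entropy(targets, probs):
    """Mean negative log-likelihood with padded targets zeroed out."""
    losses = []
    for target, p in zip(targets, probs):
        nll = -math.log(p[target])
        losses.append(0.0 if target == PAD_ID else nll)
    return sum(losses) / len(losses)

# One time step for a batch of three captions; the second one is already padded.
targets = [2, 0, 1]
probs = [[0.1, 0.2, 0.7],
         [0.9, 0.05, 0.05],
         [0.25, 0.5, 0.25]]
loss = masked_cross_entropy(targets, probs)
```

Only the first and third caption contribute, so the loss is (−ln 0.7 − ln 0.5) / 3. Without the mask, the model would also be rewarded for predicting padding after a caption has ended.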


Training Model
Let's go on to define the training step. We use a technique known as Teacher Forcing, which involves passing the ground-truth target word to the decoder as the next input instead of the decoder's own prediction. This strategy helps the model learn the correct sequence, and its statistical properties, quickly.

loss_plot = []

@tf.function
def train_step(img_tensor, target):
    loss = 0
    hidden = decoder.reset_state(batch_size=target.shape[0])
    # Every caption starts with the <start> token.
    dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * target.shape[0], 1)

    with tf.GradientTape() as tape:
        features = encoder(img_tensor)
        for i in range(1, target.shape[1]):
            predictions, hidden, _ = decoder(dec_input, features, hidden)
            loss += loss_function(target[:, i], predictions)
            # Teacher forcing: feed the true word as the next input.
            dec_input = tf.expand_dims(target[:, i], 1)

    total_loss = (loss / int(target.shape[1]))

    trainable_variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, trainable_variables)
    optimizer.apply_gradients(zip(gradients, trainable_variables))

    return loss, total_loss
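The difference between teacher forcing and letting the decoder feed on its own output can be made concrete with a toy example. Both helper functions and the deliberately bad `always_the` predictor are hypothetical and exist only for this illustration.

```python
def teacher_forcing_pairs(target):
    """(decoder input, expected output) pairs when the true words are fed in."""
    return list(zip(target[:-1], target[1:]))

def free_running_inputs(target, predict_next):
    """Decoder inputs when each step is fed the model's previous prediction."""
    inputs = [target[0]]
    for _ in range(len(target) - 2):
        inputs.append(predict_next(inputs[-1]))
    return inputs

def always_the(previous_word):
    """Stand-in for an untrained model that always predicts the same word."""
    return "the"

target = ["<start>", "brown", "dog", "<end>"]
pairs = teacher_forcing_pairs(target)
forced_inputs = [inp for inp, _ in pairs]
free_inputs = free_running_inputs(target, always_the)
```

With teacher forcing the decoder sees "<start>", "brown", "dog" and is asked to predict "brown", "dog", "<end>". In free-running mode the untrained model would see "<start>", "the", "the", so an early mistake contaminates every later step. This is why teacher forcing tends to make training converge faster.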

Training the model.

EPOCHS = 20
for epoch in range(0, EPOCHS):
    start = time.time()
    total_loss = 0
    for (batch, (img_tensor, target)) in enumerate(dataset):
        batch_loss, t_loss = train_step(img_tensor, target)
        total_loss += t_loss
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(
                epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))
    # Store the average loss of the epoch for plotting later.
    loss_plot.append(total_loss / num_steps)
    print('Epoch {} Loss {:.6f}'.format(epoch + 1, total_loss / num_steps))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Greedy Search and BLEU Evaluation
Defining a greedy method of generating captions.

def evaluate(image):
    # One row of 49 attention weights per generated word.
    ap = np.zeros((max_length, attention_features_shape))
    hdn = decoder.reset_state(batch_size=1)
    ti = tf.expand_dims(load_image(image)[0], 0)
    itv = image_features_extract_model(ti)
    itv = tf.reshape(itv, (itv.shape[0], -1, itv.shape[3]))
    ftrs = encoder(itv)
    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []
    for i in range(max_length):
        predictions, hdn, attention_weights = decoder(dec_input, ftrs, hdn)
        ap[i] = tf.reshape(attention_weights, (-1, )).numpy()
        # Greedy choice: take the most probable word at every step.
        predicted_id = tf.argmax(predictions[0]).numpy()
        result.append(tokenizer.index_word[predicted_id])

        if tokenizer.index_word[predicted_id] == '<end>':
            return result, ap

        dec_input = tf.expand_dims([predicted_id], 0)

    ap = ap[:len(result), :]
    return result, ap
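Stripped of the TensorFlow details, greedy search is a short loop: ask the model for scores, take the best-scoring word, feed it back in, and stop at the end token or at the length limit. The sketch below uses a hypothetical lookup table in place of a trained decoder; `greedy_decode` and `TRANSITIONS` are names invented for this example.

```python
def greedy_decode(step_fn, start_token, end_token, max_length):
    """Pick the highest-scoring next token at every step."""
    tokens = []
    current = start_token
    for _ in range(max_length):
        scores = step_fn(current)              # dict: token -> score
        current = max(scores, key=scores.get)  # greedy choice
        tokens.append(current)
        if current == end_token:
            break
    return tokens

# Toy "model": next-word scores that depend only on the previous word.
TRANSITIONS = {
    "<start>": {"brown": 0.6, "dog": 0.3, "<end>": 0.1},
    "brown": {"dog": 0.7, "field": 0.2, "<end>": 0.1},
    "dog": {"<end>": 0.5, "field": 0.4, "brown": 0.1},
}

caption = greedy_decode(TRANSITIONS.__getitem__, "<start>", "<end>", max_length=10)
truncated = greedy_decode(TRANSITIONS.__getitem__, "<start>", "<end>", max_length=2)
```

Greedy search never revisits a choice, so it is fast but can miss a caption whose first word looks slightly worse and whose continuation is much better. Beam search is the usual alternative when that matters.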

Plotting attention maps for each generated word.

def plot_attention(image, result, attention_plot):
    ti = np.array(Image.open(image))
    f = plt.figure(figsize=(10, 10))
    lr = len(result)
    # A square grid large enough to hold one subplot per generated word.
    grid = max(int(np.ceil(np.sqrt(lr))), 1)
    for l in range(lr):
        # VGG16 yields 49 attention weights per word, i.e. a 7x7 map.
        temp_att = np.resize(attention_plot[l], (7, 7))
        ax = f.add_subplot(grid, grid, l + 1)
        ax.set_title(result[l])
        img = ax.imshow(ti)
        ax.imshow(temp_att, cmap='gray', alpha=0.6, extent=img.get_extent())

    plt.tight_layout()
    plt.show()

Constructing captions for the images.

r = np.random.randint(0, len(img_name_val))
photo = img_name_val[r]
start = time.time()
real_caption = ' '.join([tokenizer.index_word[i] for i in cap_val[r] if i not in [0]])
result, attention_plot = evaluate(photo)

# remove <start> and <end> from the real caption
first = real_caption.split(' ', 1)[1]
real_caption = first.rsplit(' ', 1)[0]

# drop unknown words without mutating the list while iterating over it
result = [w for w in result if w != "<unk>"]

# remove <end> from result
result_join = ' '.join(result)
result_final = result_join.rsplit(' ', 1)[0]

# sentence_bleu expects a list of tokenized references and a tokenized candidate
reference = [real_caption.split()]
candidate = result_final.split()

print('Real Caption:', real_caption)
print('Prediction Caption:', result_final)

plot_attention(photo, result, attention_plot)
print(f"time took to Predict: {round(time.time()-start)} sec")

Image.open(img_name_val[r])

Real Caption: brown dog in field
Prediction Caption: the brown dog is standing in dry grass field
time took to Predict: 2 sec
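The code above prepares a reference and a candidate but stops short of computing a score. To show what a BLEU-style evaluation measures, here is a minimal sketch of unigram BLEU (clipped unigram precision multiplied by the brevity penalty) applied to the two captions printed above. The `bleu1` function is written for this illustration only; NLTK's sentence_bleu, which the article imports, combines 1- to 4-gram precisions by default and will therefore give a different number.

```python
import math
from collections import Counter

def bleu1(reference, candidate):
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # A candidate word only counts as often as it appears in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(candidate)
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * precision

reference = "brown dog in field".split()
candidate = "the brown dog is standing in dry grass field".split()
score = bleu1(reference, candidate)
print(round(score, 3))  # 0.444
```

Four of the nine predicted words ("brown", "dog", "in", "field") occur in the reference, and the prediction is longer than the reference, so no brevity penalty applies and the score is 4/9. The prediction is arguably a good caption, which shows why BLEU against a single short reference should be read with care.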

FAQs
1. What are Attention Models?
Attention models, also known as attention mechanisms, are neural network input processing techniques that allow the network to focus on specific parts of a complex input, one at a time, until the entire input has been processed.

2. What are attention layers?
Attention layers are inspired by human attention, but computationally they are just a weighted mean reduction. The query, the values, and the keys are all fed into the attention layer. In many uses the keys and values are the same tensor, and in self-attention the query comes from that same input as well.

3. What is the self-attention model?
The self-attention mechanism, in layman's terms, allows the inputs to interact with one another ("self") and determine which of them they should pay more attention to ("attention"). The outputs are aggregates of these interactions and attention scores.
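The "weighted mean" view of an attention layer and the idea of self-attention can both be shown in a few lines. The sketch below assumes scaled dot-product scoring, which is one common choice, and leaves out the learned projections that real layers apply to queries, keys, and values. All names and numbers are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Return the weighted mean of the values and the weights used."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights

def self_attention(inputs):
    """Every input acts as a query over all inputs (keys = values = inputs)."""
    return [attention(x, inputs, inputs)[0] for x in inputs]

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = self_attention(inputs)
_, first_weights = attention(inputs[0], inputs, inputs)
```

Each output row is a mixture of all input rows, with the mixing weights decided by how similar the rows are to the query row.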

Key Takeaways
In this article, we have discussed the following topics:
Attention mechanism
Working of Bahdanau attention model
Implementation of image captioning model
Happy Coding!
