Attention Mechanism for Image Processing
Table of Contents
1. Introduction
3. Working of Bahdanau Attention Model
4. Image Captioning
Image Captioning using Attention Mechanism

The encoder-decoder image captioning system encodes the image into a hidden state with a pre-trained Convolutional Neural Network. An LSTM then decodes this hidden state to generate a caption.
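At a high level, the pipeline has just these two stages. Here is a minimal sketch of that shape (the layer sizes and dummy tensors are illustrative, not the article's actual model, which is built step by step below):

import tensorflow as tf

# encoder: a pre-trained CNN compresses the image into a single feature vector
cnn = tf.keras.applications.VGG16(include_top=False, pooling='avg', weights='imagenet')
img = tf.zeros((1, 224, 224, 3))            # stand-in for a preprocessed image
img_vec = cnn(img)                          # shape (1, 512)

# decoder: an LSTM seeded with that vector emits the caption one word at a time
units = 256
h0 = tf.keras.layers.Dense(units)(img_vec)  # project the image vector to the LSTM size
lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
words = tf.zeros((1, 5, 128))               # stand-in for embedded caption words
out, h, c = lstm(words, initial_state=[h0, tf.zeros_like(h0)])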
Working of Bahdanau Attention Model

Bahdanau, or local, attention focuses on only a few source positions. Global attention is computationally expensive because it attends to every source-side word for every target word. To compensate for this shortcoming, local attention chooses to focus on only a small subset of the encoder's hidden states per target word.
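Concretely, Bahdanau attention scores each encoder state h_i against the decoder state s with score = v^T tanh(W s + U h_i), then softmax-normalizes the scores into weights and takes a weighted sum. A minimal TensorFlow sketch of that scoring step (the class and layer names here are illustrative, not the article's model):

import tensorflow as tf

class AdditiveAttention(tf.keras.layers.Layer):
    # Bahdanau-style additive attention: score = v^T tanh(W s + U h)
    def __init__(self, units):
        super().__init__()
        self.W = tf.keras.layers.Dense(units)  # projects the decoder state s
        self.U = tf.keras.layers.Dense(units)  # projects the encoder states h_i
        self.v = tf.keras.layers.Dense(1)      # reduces each position to a scalar score

    def call(self, features, hidden):
        # features: (batch, num_positions, depth); hidden: (batch, units)
        hidden = tf.expand_dims(hidden, 1)                          # broadcast over positions
        score = self.v(tf.nn.tanh(self.W(hidden) + self.U(features)))
        weights = tf.nn.softmax(score, axis=1)                      # one weight per position
        context = tf.reduce_sum(weights * features, axis=1)         # weighted sum of features
        return context, weights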
Image Captioning

I will be implementing the model using the Flickr8k dataset. The dataset has 8,000 different images, and each image has five different captions.
Importing libraries

import numpy as np
import pandas as pd
import string
from numpy import array
from PIL import Image
from pickle import load
import pickle
import matplotlib.pyplot as plt
from collections import Counter
import sys, time, os, warnings
warnings.filterwarnings("ignore")
from tqdm import tqdm
import re
import keras
from nltk.translate.bleu_score import sentence_bleu
import tensorflow as tf
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.image import load_img  # needed below to display images
from tensorflow.keras.utils import to_categorical, plot_model
from keras.models import Model
from keras.layers import Input, Dense, BatchNormalization, LSTM, Embedding, Dropout
from keras.layers.merge import add
from keras.applications.vgg16 import VGG16, preprocess_input
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
Loading Data

I am using Google Colab for training the model, and I am loading the dataset directly from Google Drive.

image_path = "/content/drive/MyDrive/Datasets/archive/Images"
dir_Flickr_text = "/content/drive/MyDrive/Datasets/archive/captions.txt"
jpgs = os.listdir(image_path)
Data Preprocessing

First, we will make a dataframe of image names associated with their captions.

file = open(dir_Flickr_text, 'r')
text = file.read()
file.close()

datatxt = []
i = 0
for line in text.split('\n'):
    try:
        col = line.split('\t')
        col = col[0].split(',')
        w = col[0].split("#")
        if i == 0:   # skip the header row of captions.txt
            i += 1
            continue
        i += 1
        datatxt.append(w + [col[1].lower()])
    except:
        continue

data = pd.DataFrame(datatxt, columns=["filename", "caption"])
data = data[data.filename != '2258277193_586949ec62.jpg.1']
uni_filenames = np.unique(data.filename.values)

data.head()
Next, we will display a few images along with their respective 5 captions.
npc = 5
npx = 224
t_sz = (npx, npx, 3)
count = 1

Figure = plt.figure(figsize=(10, 20))
for i in uni_filenames[10:14]:
    fname = image_path + '/' + i
    captions = list(data["caption"].loc[data["filename"] == i].values)
    image_load = load_img(fname, target_size=t_sz)
    axs = Figure.add_subplot(npc, 2, count, xticks=[], yticks=[])
    axs.imshow(image_load)
    count += 1
    axs = Figure.add_subplot(npc, 2, count)
    plt.axis('off')
    axs.plot()
    axs.set_xlim(0, 1)
    axs.set_ylim(0, len(captions))
    for j, caption in enumerate(captions):  # j, so the outer loop variable i is not clobbered
        axs.text(0, j, caption, fontsize=20)
    count += 1
plt.show()
vocabulary = []
for txt in data.caption.values:
    vocabulary.extend(txt.split())
print('Vocabulary is of size: %d' % len(set(vocabulary)))
def punctuation_removal(text_original):
    # strip punctuation characters; this helper's definition was lost in the
    # source listing, so a standard string.punctuation implementation is assumed
    return text_original.translate(str.maketrans('', '', string.punctuation))

def single_character_removal(text):
    tlmt1 = ""
    for word in text.split():
        if len(word) > 1:
            tlmt1 += " " + word
    return tlmt1

def number_removal(text):
    tnn = ""
    for word in text.split():
        if word.isalpha():
            tnn += " " + word
    return tnn

def text_cleaner(text_original):
    text = punctuation_removal(text_original)
    text = single_character_removal(text)
    text = number_removal(text)
    return text

clean = []
for txt in data.caption.values:
    clean.extend(text_cleaner(txt).split())  # apply the cleaner so the count reflects the cleaned vocabulary
print('Clean Vocabulary Size: %d' % len(set(clean)))
PATH = "/content/drive/MyDrive/Datasets/archive/Images/"

total_captions = []
for cp in data["caption"].astype(str):
    cp = '<start> ' + cp + ' <end>'
    total_captions.append(cp)
total_captions[:10]

img_vectors = []
for annotations in data["filename"]:
    image_paths = PATH + annotations
    img_vectors.append(image_paths)
img_vectors[:10]

print(f"len(img_vectors) : {len(img_vectors)}")
print(f"len(total_captions) : {len(total_captions)}")

len(img_vectors) : 40455
len(total_captions) : 40455

def data_limiter(nums, tc, imv):
    training_captions, image_vector = shuffle(tc, imv, random_state=1)
    training_captions = training_captions[:nums]
    image_vector = image_vector[:nums]
    return training_captions, image_vector

train_captions, img_name_vector = data_limiter(40000, total_captions, img_vectors)
Model Making

Let's use VGG16 to define the image feature extraction model. It's important to note that we don't need to classify the images here; all we need to do is extract an image vector. As a result, the softmax layer is removed from the model. Before feeding the photos into the model, we must preprocess them all to the same size, 224×224.
def load_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (224, 224))
    image = preprocess_input(image)
    return image, path

image_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output

image_features_extract_model = tf.keras.Model(new_input, hidden_layer)
image_features_extract_model.summary()
encode_train = sorted(set(img_name_vector))
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(load_image,
    num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(64)

%%time
for img, path in tqdm(image_dataset):
    batch_features = image_features_extract_model(img)
    batch_features = tf.reshape(batch_features,
                                (batch_features.shape[0], -1, batch_features.shape[3]))
    for bf, p in zip(batch_features, path):
        path_of_feature = p.numpy().decode("utf-8")
        np.save(path_of_feature, bf.numpy())
Now we will tokenize the captions and build a vocabulary of the 5,000 most frequent words in the data. Words that are not in the vocabulary will be marked as <unk>.

topk = 5000
tkn = tf.keras.preprocessing.text.Tokenizer(num_words=topk,
                                            oov_token="<unk>",
                                            filters='!"#$%&()*+.,-/:;=?@[\]^_`{|}~ ')
tkn.fit_on_texts(train_captions)
train_seqs = tkn.texts_to_sequences(train_captions)

tkn.word_index['<pad>'] = 0
tkn.index_word[0] = '<pad>'

train_seqs = tkn.texts_to_sequences(train_captions)
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding='post')

train_captions[:3]
train_seqs[:3]

def max_sz(tensor):
    return max(len(t) for t in tensor)
mx_l = max_sz(train_seqs)

def min_sz(tensor):
    return min(len(t) for t in tensor)
min_l = min_sz(train_seqs)
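To see what the tokenizer is doing, here is a quick sanity check (the sample caption is hypothetical; any caption string works):

sample = '<start> a dog runs through the grass <end>'
seq = tkn.texts_to_sequences([sample])[0]      # words mapped to integer ids
print(seq)
print([tkn.index_word[i] for i in seq])        # ids mapped back; rare words become <unk>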
Training Model

Now we will split the data using train_test_split.

img_name_train, img_name_val, cap_train, cap_val = train_test_split(
    img_name_vector, cap_vector, test_size=0.2, random_state=0)
Next, let's create a tf.data dataset to use for training our model.

def map_function(img_name, cap):
    tensor_img = np.load(img_name.decode('utf-8') + '.npy')
    return tensor_img, cap

BATCH_SIZE = 64     # these two hyperparameters are not defined in the source
BUFFER_SIZE = 1000  # listing, so typical values are assumed here

dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))
# load the saved .npy features through map_function; this step is implied by the
# definition above but was missing from the source listing
dataset = dataset.map(lambda item1, item2: tf.numpy_function(
        map_function, [item1, item2], [tf.float32, tf.int32]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
class VGG16_Encoder(tf.keras.Model):
    # passes the extracted image features through a fully connected layer
    def __init__(self, embedding_dim):
        super(VGG16_Encoder, self).__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim)
        self.dropout = tf.keras.layers.Dropout(0.5, noise_shape=None, seed=None)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x
Defining RNN

def rnn_type(units):
    # use the CuDNN-accelerated LSTM when a GPU is available, otherwise a GRU
    if tf.test.is_gpu_available():
        return tf.compat.v1.keras.layers.CuDNNLSTM(units,
                                                   return_state=True,
                                                   return_sequences=True,
                                                   recurrent_initializer='glorot_uniform')
    else:
        return tf.keras.layers.GRU(units,
                                   return_state=True,
                                   return_sequences=True,
                                   recurrent_activation='sigmoid',
                                   recurrent_initializer='glorot_uniform')
Defining Decoder

Next, we define the decoder, which applies Bahdanau-style additive attention over the image features. Only the attention layers, the output layer, and reset_state survive in the source listing; the rest of the class is reconstructed as the standard Bahdanau decoder those layers imply.

class Rnn_Local_Decoder(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(Rnn_Local_Decoder, self).__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)

        # Attention Mechanism
        self.Uattn = tf.keras.layers.Dense(units)
        self.Wattn = tf.keras.layers.Dense(units)
        self.Vattn = tf.keras.layers.Dense(1)

    def call(self, x, features, hidden):
        # additive attention: score = Vattn(tanh(Uattn(features) + Wattn(hidden)))
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = self.Vattn(tf.nn.tanh(
            self.Uattn(features) + self.Wattn(hidden_with_time_axis)))
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)

        # feed the previous word together with the attention context to the GRU
        x = self.embedding(x)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        output, state = self.gru(x)
        x = self.fc1(output)
        x = tf.reshape(x, (-1, x.shape[2]))
        x = self.fc2(x)
        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))
embedding_dim = 256     # these three values are assumed (typical choices);
units = 512             # their definitions were lost in the source listing
vocab_size = topk + 1

encoder = VGG16_Encoder(embedding_dim)
decoder = Rnn_Local_Decoder(embedding_dim, units, vocab_size)

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')
Training Loop
loss_plot = []

def loss_function(real, pred):
    # mask out <pad> positions so they do not contribute to the loss
    # (this helper was lost in the source; a standard masked loss is assumed)
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    loss_ *= tf.cast(mask, dtype=loss_.dtype)
    return tf.reduce_mean(loss_)

@tf.function
def train_step(img_tensor, target):
    loss = 0
    hidden = decoder.reset_state(batch_size=target.shape[0])
    dec_input = tf.expand_dims([tkn.word_index['<start>']] * BATCH_SIZE, 1)
    # the body below was cut off in the source; a standard teacher-forcing step
    # consistent with the calls made in the training loop is assumed
    with tf.GradientTape() as tape:
        features = encoder(img_tensor)
        for i in range(1, target.shape[1]):
            predictions, hidden, _ = decoder(dec_input, features, hidden)
            loss += loss_function(target[:, i], predictions)
            dec_input = tf.expand_dims(target[:, i], 1)  # teacher forcing
    total_loss = loss / int(target.shape[1])
    trainable_variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, trainable_variables)
    optimizer.apply_gradients(zip(gradients, trainable_variables))
    return loss, total_loss
EPOCHS = 20
num_steps = len(img_name_train) // BATCH_SIZE  # assumed; not defined in the source listing

for epoch in range(0, EPOCHS):
    start = time.time()
    total_loss = 0
    for (batch, (img_tensor, target)) in enumerate(dataset):
        batch_loss, t_loss = train_step(img_tensor, target)
        total_loss += t_loss
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(
                epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))
    loss_plot.append(total_loss / num_steps)
    print('Epoch {} Loss {:.6f}'.format(epoch + 1, total_loss / num_steps))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
Once the model is trained, we can generate a caption for a new image with greedy decoding. The opening of evaluate() was cut off in the source; it is reconstructed here around the surviving decoding loop, so treat the first few lines as assumptions.

def evaluate(image):
    attention_features_shape = 49   # 7x7 VGG16 feature map, flattened (assumed)
    max_length = mx_l               # longest training caption (assumed)
    ap = np.zeros((max_length, attention_features_shape))
    hdn = decoder.reset_state(batch_size=1)

    # extract and encode the image features, exactly as during training
    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val,
                                (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))
    ftrs = encoder(img_tensor_val)

    dec_input = tf.expand_dims([tkn.word_index['<start>']], 0)
    result = []

    for i in range(max_length):
        predictions, hdn, attention_weights = decoder(dec_input, ftrs, hdn)
        ap[i] = tf.reshape(attention_weights, (-1, )).numpy()
        predicted_id = tf.argmax(predictions[0]).numpy()
        result.append(tkn.index_word[predicted_id])
        if tkn.index_word[predicted_id] == '<end>':
            return result, ap
        dec_input = tf.expand_dims([predicted_id], 0)

    ap = ap[:len(result), :]
    return result, ap
Now let's pick a random validation image and compare the real caption with the prediction.

r = np.random.randint(0, len(img_name_val))
photo = img_name_val[r]
start = time.time()
real_caption = ' '.join([tkn.index_word[i] for i in cap_val[r] if i not in [0]])
result, attention_plot = evaluate(photo)

real_appn = []
real_appn.append(real_caption.split())
reference = real_appn
candidate = result   # the predicted caption tokens

print('Real Caption:', real_caption)
Image.open(img_name_val[r])
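sentence_bleu was imported at the top but the scoring line itself did not survive the extract; a sketch of how the reference/candidate pair above would typically be scored:

from nltk.translate.bleu_score import sentence_bleu

# reference: a list of tokenized reference captions; candidate: predicted tokens
score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(f"BLEU-4 score * 100: {score * 100:.2f}")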
FAQs

1. What are Attention Models?
Attention models, also known as attention mechanisms, are input-processing techniques for neural networks that allow the network to focus on specific parts of a complex input, one at a time, until the entire dataset is categorized.
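As a toy illustration of the idea, attention boils down to softmax-normalized relevance scores used to weight the input (the numbers below are made up):

import numpy as np

scores = np.array([2.0, 0.5, 0.1])                         # relevance of three input regions
weights = np.exp(scores) / np.exp(scores).sum()            # softmax: weights sum to 1
regions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # toy feature vectors
context = weights @ regions                                # attention-weighted summary
print(weights.round(3), context.round(3))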
Takeaways

In this article, we have discussed the following topics:

Attention mechanism
Working of Bahdanau attention model
Implementation of an image captioning model

Want to learn more about Machine Learning? Here is an excellent course that can guide you in learning.

Happy Coding!