
Second Progress Report

On

Text Summarization

Submitted in partial fulfillment of the requirements for the Degree of

Bachelor of Technology
(Computer Science and Engineering)

Submitted By

Raghav Vohra (01096302719)


Piyush Garg (02296302719)
Deepanshu Aggarwal (02596302719)

Under the Supervision of

Dr. Neeti Sangwan

Department of Computer Science and Engineering


Maharaja Surajmal Institute of Technology
Janakpuri, New Delhi.
2019-2023
SCOPE AND PURPOSE

To develop a Deep Learning Model using the Transformer architecture for Abstractive Text Summarization

PROGRESS:

OBJECTIVES ACHIEVED:

1. We implemented Multi-Headed Attention and the point-wise Feed-Forward Network. The inputs are split into multiple heads, and after processing, the outputs of all the heads are concatenated (a brief sketch follows this list).
2. We built the fundamental units of the encoder and the decoder, and expanded them into 4 encoder and 4 decoder layers.
3. We stacked all the intermediate layers in a custom Model class.
4. We applied a custom learning rate scheduler during training, which helps the model converge faster.
5. We trained the model with the Sparse Categorical Cross-Entropy loss and the Adam optimizer.
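
As a rough illustration of point 1, the sketch below shows how the query, key, and value projections are split into heads and concatenated again after attention. This is a minimal TensorFlow sketch, not our exact implementation; the sizes d_model = 128 and num_heads = 8 and the helper names are illustrative assumptions.

import tensorflow as tf

# Illustrative sizes; the report does not fix d_model or the number of heads.
d_model, num_heads = 128, 8
depth = d_model // num_heads

wq = tf.keras.layers.Dense(d_model)  # query projection
wk = tf.keras.layers.Dense(d_model)  # key projection
wv = tf.keras.layers.Dense(d_model)  # value projection

def split_heads(x, batch_size):
    # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
    x = tf.reshape(x, (batch_size, -1, num_heads, depth))
    return tf.transpose(x, perm=[0, 2, 1, 3])

def multi_head_attention(x):
    batch_size = tf.shape(x)[0]
    q = split_heads(wq(x), batch_size)
    k = split_heads(wk(x), batch_size)
    v = split_heads(wv(x), batch_size)
    # scaled dot-product attention runs on every head in parallel
    scores = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(tf.cast(depth, tf.float32))
    heads = tf.matmul(tf.nn.softmax(scores, axis=-1), v)  # (batch, num_heads, seq_len, depth)
    heads = tf.transpose(heads, perm=[0, 2, 1, 3])        # (batch, seq_len, num_heads, depth)
    return tf.reshape(heads, (batch_size, -1, d_model))   # concatenate the head outputs

# Toy usage: a batch of 2 "sentences" of 10 embedded tokens.
out = multi_head_attention(tf.random.normal((2, 10, d_model)))  # shape (2, 10, 128)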

ADDITIONAL WORK:

PROPOSED ARCHITECTURE

Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35]. Here, the
encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence of continuous representations z
= (z1, ..., zn). Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time.
At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input
when generating the next. The Transformer follows this overall architecture using stacked self-attention and point-
wise, fully connected layers for both the encoder and decoder.
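
This stacked structure can be sketched for the encoder side as follows: one encoder layer combining self-attention with a point-wise feed-forward sub-layer, repeated 4 times as in the objectives above. The sketch is a simplified outline, not our exact model: the sizes (d_model = 128, num_heads = 8, dff = 512) are illustrative, the built-in tf.keras.layers.MultiHeadAttention layer is used for brevity, and embeddings, positional encoding, masking, dropout, and the decoder's cross-attention over the encoder output are omitted.

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model=128, num_heads=8, dff=512):
        super().__init__()
        # self-attention sub-layer
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                      key_dim=d_model // num_heads)
        # point-wise feed-forward sub-layer
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation='relu'),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x):
        attn = self.mha(x, x)               # self-attention over the input sequence
        x = self.norm1(x + attn)            # residual connection + layer normalization
        return self.norm2(x + self.ffn(x))  # point-wise feed-forward with residual

# Four stacked encoder layers, matching the layer count mentioned in the objectives.
encoder = tf.keras.Sequential([EncoderLayer() for _ in range(4)])
encoded = encoder(tf.random.normal((2, 10, 128)))  # shape (2, 10, 128)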
Fig. 1. Architecture of the model

Encoder and Decoder Blocks

1. The first step in calculating self-attention is to create three vectors from each of the encoder's input vectors (in this case, the embedding of each word). So for each word, we create a Query vector, a Key vector, and a Value vector. These vectors are created by multiplying the embedding by three matrices that we trained during the training process.
2. The second step in calculating self-attention is to calculate a score. Say we're calculating the self-attention for the first word of the sentence, e.g. "Thinking". We need to score each word of the input sentence against this word. The score determines how much focus to place on other parts of the input sentence as we encode a word at a certain position.
3. The third and fourth steps are to divide the scores by 8 (the square root of the key dimension used in the paper, 64; this leads to more stable gradients, and other values are possible, but this is the default) and then to pass the result through a softmax operation. Softmax normalizes the scores so that they are all positive and add up to 1 (see the sketch after this list).
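
The three numbered steps above make up the scaled dot-product attention computation. The sketch below walks through them in TensorFlow; the toy usage (one sentence of 5 tokens with key dimension 64, so the scores are divided by sqrt(64) = 8) is illustrative only.

import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # step 2: score every position against every other position
    scores = tf.matmul(q, k, transpose_b=True)          # (..., seq_q, seq_k)
    # step 3: divide by sqrt(d_k) for more stable gradients (sqrt(64) = 8 here)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled = scores / tf.math.sqrt(d_k)
    # step 4: softmax so the attention weights are positive and sum to 1
    weights = tf.nn.softmax(scaled, axis=-1)
    # weighted sum of the value vectors
    return tf.matmul(weights, v), weights

# Toy usage: one sentence of 5 tokens with d_k = 64.
q = k = v = tf.random.normal((1, 5, 64))
output, attn_weights = scaled_dot_product_attention(q, k, v)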
Custom Learning Rate
The Transformer paper also suggests training with a custom learning rate scheduler, which helps the model converge faster.
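
A minimal sketch of this scheduler, following the warm-up formula from the paper, lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)), and wired into the Adam optimizer and the Sparse Categorical Cross-Entropy loss from the objectives, is shown below. The values d_model = 128 and warmup_steps = 4000 are illustrative assumptions, not taken from this report.

import tensorflow as tf

class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model=128, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps ** -1.5)

# Schedule plugged into Adam, with the loss used for training.
optimizer = tf.keras.optimizers.Adam(CustomSchedule(), beta_1=0.9, beta_2=0.98, epsilon=1e-9)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)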
Fig. 2. Encoder-decoder block

EXPECTED RESULTS:

The expected results of our project are:

• The algorithm should take less computation time and fewer resources than other approaches.
• It should produce more accurate summaries than other approaches.

REFERENCES:

1. Transformer for NMT (TensorFlow tutorial): https://www.tensorflow.org/tutorials/text/transformer

2. TensorFlow API documentation: https://www.tensorflow.org/api_docs

3. Vaswani et al., "Attention Is All You Need": https://arxiv.org/abs/1706.03762

4. How attention works in encoder-decoder recurrent neural networks: https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/

5. Understanding the attention mechanism in natural language processing: https://medium.com/analytics-vidhya/https-medium-com-understanding-attention-mechanism-natural-language-processing-9744ab6aed6a

