
Fast-Track Semester 2022

Technical Answers to Real-World Problems


Digital Assignment 4
Prof. Rajakumar K

Informative Text Summarizer Using NLP and DL

Team Members:
Saumitra Pathak (19BCE2411)
Shivam Bansal (19BCE0930)
Arkaraj Ghosh (19BCE24218)
Debalay Dasgupta (19BCE2423)
Pratyay Piyush (19BCE2364)
Implementation
We worked with three kinds of input: educational videos on YouTube, research papers,
and news stories. For preprocessing we applied tokenization, vectorization, and
stop-word removal.
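The preprocessing steps named above can be sketched in a few lines of plain Python. This
is only an illustration, not the project's actual code: the stop-word list is a tiny
hypothetical one, the function names are made up for this example, and a simple
term-frequency count stands in for whatever vectorizer was really used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Tokenize the text, then filter out stop words."""
    return remove_stop_words(tokenize(text))


def vectorize(tokens, vocabulary):
    """Map tokens to a term-frequency vector ordered by the given vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    tokens = preprocess("The model is trained on the news stories.")
    print(tokens)
    print(vectorize(tokens, ["model", "news", "video"]))
```

In practice a library tokenizer and a fuller stop-word list would likely replace these
hand-written helpers.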

The T5 abstractive transformer model works best with input text about the length of a
news story, so news stories are passed to it directly.
YouTube video transcripts are retrieved with the YouTube Transcript API.
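A minimal sketch of the transcript step, assuming the youtube-transcript-api Python
package in a release that still exposes the static get_transcript call (newer releases
changed this interface). The helper names join_segments and fetch_transcript_text are
hypothetical, and the fetch itself needs network access.

```python
def join_segments(segments):
    """Concatenate transcript segments (dicts with a "text" key) into one string."""
    parts = (seg["text"].replace("\n", " ").strip() for seg in segments)
    return " ".join(p for p in parts if p)


def fetch_transcript_text(video_id):
    """Fetch the transcript for a video ID and return it as plain text.

    Assumption: a youtube-transcript-api release where get_transcript returns
    a list of dicts with "text", "start" and "duration" keys.
    """
    # Imported lazily so the rest of the module works without the package.
    from youtube_transcript_api import YouTubeTranscriptApi

    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return join_segments(segments)
```

Keeping the joining logic separate from the network call makes it possible to check it
on hand-written segments without contacting YouTube.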

The T5 component uses the Transformers library and a pre-trained summarization
pipeline. Before being passed to T5, YouTube transcripts and research papers are first
condensed into an extractive summary with Sumy, an extractive summarization library
that includes LexRank, TextRank, Luhn, and LSA.
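The two-stage flow described here could look roughly like the sketch below. Several
details are assumptions rather than facts from this report: the t5-small checkpoint,
the choice of LexRank alone (the report averages several Sumy methods), the sentence
and length limits, and all function names. It also assumes a Transformers release that
ships the summarization pipeline and that the NLTK tokenizer data Sumy relies on is
installed.

```python
def extractive_sumy(text, sentence_count=10):
    """Pick the top-ranked sentences with Sumy's LexRank summarizer."""
    # Lazy imports: sumy is a third-party package.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lex_rank import LexRankSummarizer

    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    sentences = LexRankSummarizer()(parser.document, sentence_count)
    return " ".join(str(sentence) for sentence in sentences)


def abstractive_t5(text, max_length=150, min_length=30):
    """Summarize text with a pre-trained T5 summarization pipeline."""
    from transformers import pipeline

    # "t5-small" is an assumed checkpoint; the report does not name one.
    summarizer = pipeline("summarization", model="t5-small")
    result = summarizer(
        text, max_length=max_length, min_length=min_length, truncation=True
    )
    return result[0]["summary_text"]


def hybrid_summary(text, is_long, extract=extractive_sumy, abstract=abstractive_t5):
    """Condense long inputs extractively first, then summarize abstractively."""
    condensed = extract(text) if is_long else text
    return abstract(condensed)
```

Passing the two stages in as callables keeps the routing logic easy to check with
stand-in functions, without downloading a model.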

Experimental Comparison

The ROUGE metric is used for comparative analysis between the proposed hybrid model
and the seq2seq RNN baseline model.

ROUGE-1 and ROUGE-2 are calculated from unigram and bigram overlaps respectively,
while ROUGE-L is based on the longest common subsequence.
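To make the definitions concrete, here is a small plain-Python sketch of ROUGE-N and
ROUGE-L as F1 scores over whitespace-separated tokens. It is a simplified illustration
under stated assumptions (no stemming, a single reference, sentence-level LCS), not the
scoring package used for the reported numbers, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, cand_total, ref_total):
    """Combine precision and recall into an F1 score, guarding against zeros."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c, r = candidate.split(), reference.split()
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

For the example pair, five of six unigrams and three of five bigrams match, and the
longest common subsequence has five tokens.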

The scores below are based on 250 lit papers, 10 video transcripts, and 20 research
articles.

Method                                            ROUGE-1 F1   ROUGE-2 F1   ROUGE-L F1

Avg of SUMY extractive summary methods            0.2834       0.1132       0.2629

Hybrid model (avg of SUMY extractive summary      0.2541       0.1066       0.2354
methods + T5 transformer model)
