Team Members:
Saumitra Pathak (19BCE2411)
Shivam Bansal (19BCE0930)
Arkaraj Ghosh (19BCE24218)
Debalay Dasgupta (19BCE2423)
Pratyay Piyush (19BCE2364)
Implementation
We worked with three kinds of data: educational videos on YouTube, research papers,
and news articles. For preprocessing we used tokenization, vectorization, stop-word
removal, and word clouds.
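The preprocessing steps named above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the stop-word list, the regex tokenizer, and the bag-of-words term counts are assumptions chosen for clarity, and a library such as NLTK or scikit-learn would normally supply these pieces.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the report does not say which list was used.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one term-count vector per document (bag of words)."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    raw = ["The model summarizes the news.", "A transcript of the lecture video."]
    cleaned = [remove_stop_words(tokenize(d)) for d in raw]
    vocab, vectors = vectorize(cleaned)
    print(vocab)    # ['lecture', 'model', 'news', 'summarizes', 'transcript', 'video']
    print(vectors)  # [[0, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 1]]
```

The same per-token counts (a `Counter` over the cleaned tokens) are what a word-cloud generator would typically take as its frequency input.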
The abstractive T5 transformer model works best on inputs of roughly news-article
length, so news articles are fed to the model directly without further processing.
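The report does not say how longer inputs such as video transcripts or research papers are handled. One common approach, assumed here, is to split them into consecutive word chunks and summarize each chunk separately, while news-length text goes through as a single input. The sketch below does only the input preparation; `MAX_WORDS` is a hypothetical budget, and the `"summarize: "` prefix is the task prefix T5 uses for summarization.

```python
# Hypothetical word budget per chunk. T5 checkpoints are commonly used with a
# 512-token input limit; a word budget below that leaves room for subword splits.
MAX_WORDS = 350


def prepare_t5_inputs(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return T5 inputs: one for short texts, several consecutive chunks for long ones.

    Every input carries the "summarize: " task prefix that T5 expects for summarization.
    """
    words = text.split()
    if len(words) <= max_words:
        # News-article-length input: pass it through as a single example.
        return ["summarize: " + " ".join(words)]
    chunks = []
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        chunks.append("summarize: " + chunk)
    return chunks


if __name__ == "__main__":
    print(prepare_t5_inputs("a b c d e f g", max_words=3))
    # ['summarize: a b c', 'summarize: d e f', 'summarize: g']
```

The resulting strings would then be tokenized and passed to a T5 model (for example through the Hugging Face `transformers` library); that step is omitted here.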
Transcripts of the YouTube videos are retrieved with the YouTube Transcript API.
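A transcript arrives as a list of timed caption segments rather than one block of text. The helper below is a minimal sketch that joins those segments into a single string for summarization. It assumes each segment is a dict with a `"text"` key, which is the shape returned by the `youtube-transcript-api` package's static `get_transcript` call in older releases; removing bracketed annotations such as `[Music]` is a hypothetical cleanup step, not something the report describes.

```python
import re


def transcript_to_text(segments: list[dict]) -> str:
    """Join transcript segments into one cleaned string.

    Each segment is assumed to be a dict with a "text" key
    (plus "start" and "duration", which are ignored here).
    """
    pieces = []
    for seg in segments:
        # Hypothetical cleanup: drop caption annotations such as "[Music]" or "[Applause]".
        text = re.sub(r"\[[^\]]*\]", " ", seg["text"])
        pieces.append(text.replace("\n", " "))
    # Collapse runs of whitespace left behind by the substitutions above.
    return " ".join(" ".join(pieces).split())


# Usage sketch (needs network access and `pip install youtube-transcript-api`;
# older static API shown, "VIDEO_ID" is a placeholder):
# from youtube_transcript_api import YouTubeTranscriptApi
# segments = YouTubeTranscriptApi.get_transcript("VIDEO_ID")
# text = transcript_to_text(segments)
```

Newer releases of the package replace the static call with an instance `fetch` method; its result can be converted to the same list-of-dicts form, but the installed version's documentation should be checked.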
Experimental Comparison
The ROUGE metric is used to compare the proposed hybrid model with a seq2seq RNN
baseline model.
ROUGE-1 and ROUGE-2 are computed from unigram and bigram overlaps, respectively,
while ROUGE-L is based on the longest common subsequence.
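To make the three scores concrete, the sketch below computes ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence) precision, recall, and F1 for one candidate summary against one reference. It is a simplified illustration that assumes lowercase whitespace tokenization with no stemming, so its numbers will not exactly match packaged implementations such as Google's `rouge-score`.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(match: int, cand_total: int, ref_total: int) -> dict:
    """Precision, recall, and F1 from a match count and the two totals."""
    precision = match / max(cand_total, 1)
    recall = match / max(ref_total, 1)
    f1 = 2 * precision * recall / (precision + recall) if match else 0.0
    return {"p": precision, "r": recall, "f": f1}


def rouge_n(candidate: str, reference: str, n: int) -> dict:
    """ROUGE-N: overlap of n-grams; ROUGE-1 uses unigrams, ROUGE-2 bigrams."""
    c = ngrams(candidate.lower().split(), n)
    r = ngrams(reference.lower().split(), n)
    overlap = sum((c & r).values())  # each n-gram counted at most min(count_c, count_r) times
    return _prf(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict:
    """ROUGE-L: scores based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return _prf(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    cand, ref = "the cat sat on the mat", "the cat is on the mat"
    print(rouge_n(cand, ref, 1)["f"])  # 5 of 6 unigrams match
    print(rouge_n(cand, ref, 2)["f"])  # 3 of 5 bigrams match
    print(rouge_l(cand, ref)["f"])     # LCS "the cat on the mat" has 5 tokens
```

For the example pair, ROUGE-1 and ROUGE-L F1 are both 5/6 and ROUGE-2 F1 is 0.6. Averaging such scores over a test set for each model gives the kind of side-by-side comparison described above.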