
Abstractive Text Summarization

using Hybrid Methods


Project Guide: Dr. S Jaya Nirmala

Project Team Members:
● Gautam Daga (106118029)
● Subhradeep Saha (106118095)
● Yash Shah (106118107)
Block Diagram
Test Cases
Module: Preprocessor
● TC_01_01
  Input: { article: "This is a sentence.This is second sentence. ...", abstract: "Summarized Doc" }
  Expected output: array of sentences ["This is a sentence", "This is second sentence", ...]
● TC_01_02
  Input: { article: "Sentence 1.Sentence 2. Sentence 3. ..." }
  Expected output: ["Sentence 1", "Sentence 2", "Sentence 3", ...]

Module: Abstractor
● TC_02_01
  Input: ["Sentence 1", "Sentence 2", "Sentence 3", ...]
  Expected output: K concise versions of each sentence: [["Concise Sentence 1 1", "Concise Sentence 1 2", ...], ["Concise Sentence 2 1", ...], ...]

Module: Extractor
● TC_03_01
  Input: [["Concise Sentence 1 1", "Concise Sentence 1 2", ...], ["Concise Sentence 2 1", ...], ...]
  Expected output: ["Concise sentence 1 x", "Concise sentence 2 x", ..., "Concise Sentence n' x"], combined to form the final summary

Module: Integrated (all modules)
● TC_0x_01
  Input: { article: "This is a sentence.This is second sentence. ..." }
  Expected output: final summary with n' sentences: ["Concise sentence 1 x", ..., "Concise Sentence n' x"]
Evaluation Metrics
● Rouge-N: measures matching n-grams between the generated and reference text.

Rouge Recall = \frac{\text{number of overlapping } n\text{-grams}}{\text{total } n\text{-grams in the reference summary}}

Rouge Precision = \frac{\text{number of overlapping } n\text{-grams}}{\text{total } n\text{-grams in the generated summary}}

Rouge F1-Score = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
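As an illustration of these formulas, here is a small from-scratch sketch in Python; in practice one would use an established package (e.g. the rouge-score library), and whitespace tokenization here is a simplifying assumption.

# From-scratch Rouge-N sketch; clipped counts follow the standard definition.
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated: str, reference: str, n: int = 1):
    gen = ngram_counts(generated.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((gen & ref).values())            # clipped matching n-grams
    recall = overlap / max(sum(ref.values()), 1)   # matches / reference n-grams
    precision = overlap / max(sum(gen.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
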
Evaluation Metrics (contd.)
● Rouge-L: measures the longest common subsequence (LCS) between our model output and the reference.

Recall = \frac{LCS(X, Y)}{m}

Precision = \frac{LCS(X, Y)}{n}

F1-Score = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

where X is the reference of length m and Y is the model output of length n.

We consider the F1-scores for evaluating the model.
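Under the same assumptions (whitespace tokens, illustrative rather than production code), Rouge-L follows from the classic dynamic-programming LCS:

# Rouge-L sketch: DP longest-common-subsequence over whitespace tokens.
def lcs_length(a, b):
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y \
                else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(generated: str, reference: str):
    gen, ref = generated.split(), reference.split()
    lcs = lcs_length(gen, ref)
    recall = lcs / max(len(ref), 1)       # LCS / m (reference length)
    precision = lcs / max(len(gen), 1)    # LCS / n (output length)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
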


Comparison Analysis (Dummy)

Model                                                   Rouge-1   Rouge-2   Rouge-L
One2one-top1 w/o RoBERTa                                   -         -         -
Comp-ctrl w/o RoBERTa                                      -         -         -
One2one-top1 with RoBERTa                                  -         -         -
Comp-ctrl with RoBERTa                                     -         -         -
Modified One2one-top1 with RoBERTa and grammar check       -         -         -
Modified Comp-ctrl with RoBERTa and grammar check          -         -         -


Dataset Format
• CNN/DailyMail
• DUC 2002

Format of an individual (JSON) item in the pre-processed CNN/DailyMail dataset:


{
"id": String,
"article": [ String ] , // The main article input
"abstract": [ String ], // Summarized text(abstract)
"unique_three_gram_novelty": Float,
"three_gram_novelty": Float,
"unique_two_gram_novelty": Float,
"two_gram_novelty": Float,
"extractive_fragment_density": Float,
"extractive_fragment_coverage": Float,
"similar_source_indices_lebanoff": [ [ Int ] ],
"avg_fusion_ratio": Float,
"extracted": [ Int ],
"score": [ Float ]
}
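As an illustration, the snippet below reads one such item and sanity-checks that all fields are present; the file name train.jsonl and the one-object-per-line (JSON Lines) layout are assumptions about the pre-processing output, not a documented part of the dataset.

# Hypothetical loader for one pre-processed CNN/DailyMail item.
# "train.jsonl" and the JSON Lines layout are assumptions.
import json

EXPECTED_KEYS = {
    "id", "article", "abstract",
    "unique_three_gram_novelty", "three_gram_novelty",
    "unique_two_gram_novelty", "two_gram_novelty",
    "extractive_fragment_density", "extractive_fragment_coverage",
    "similar_source_indices_lebanoff", "avg_fusion_ratio",
    "extracted", "score",
}

with open("train.jsonl", encoding="utf-8") as f:
    item = json.loads(f.readline())

missing = EXPECTED_KEYS - item.keys()
assert not missing, f"item {item.get('id')} is missing fields: {missing}"
print(item["id"], len(item["article"]), "article sentences,",
      len(item["abstract"]), "abstract sentences")
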
Screenshots
References
[1] Hou Pong Chan, Irwin King, A condense-then-select strategy for text summarization, Knowledge-Based Systems, Volume 227, 2021, 107235, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2021.107235.

[2] Y. Wu, B. Hu, Learning to extract coherent summary via deep reinforcement learning, in:
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th
innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on
Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February
2-7, 2018, 2018, pp. 5602–5609.

[3] R. Nallapati, F. Zhai, B. Zhou, Summarunner: A recurrent neural network based sequence model
for extractive summarization of documents, in: Proceedings of the Thirty-First AAAI Conference on
Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA., 2017, pp. 3075–3081.
References (contd.)
[4] E. Sharma, L. Huang, Z. Hu, L. Wang, An entity-driven framework for abstractive summarization,
in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods
in Natural Language Processing and the 9th International Joint Conference on Natural Language
Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Association for
Computational Linguistics, 2019, pp. 3278– 3289. doi:10.18653/v1/D19-1323.

[5] A. See, P. J. Liu, C. D. Manning, Get to the point: Summarization with pointer-generator
networks, in: Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, 2017, pp.
1073–1083.

[6] J. Xu, G. Durrett, Neural extractive text summarization with syntactic compression, in:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the
9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association
for Computational Linguistics, Hong Kong, China, 2019, pp. 3283–3294.
References (contd.)
[7] Chin-Yew Lin, "ROUGE: A Package for Automatic Evaluation of Summaries," in: Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, Barcelona, Spain, 2004.

[8] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers
for language understanding, in: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), 2019.

[9] A. Cohan, F. Dernoncourt, D. S. Kim, T. Bui, S. Kim, W. Chang, N. Goharian, A discourse-aware attention model for abstractive summarization of long documents, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), 2018.
