The major goal of this work is to help users get a single, concise, cohesive, and coherent answer to
complex (non-factoid) questions. The problem this research aims to solve is captured in the passages
quoted below from Cortes et al. (2022). The guiding question is: how can we provide users with such an
answer? This work first seeks to improve the automatic text summarization (ATS) architecture and its
results, and then to bring the improved ATS into the non-factoid question-answering system (QAS)
architecture so that it generates a single, concise, cohesive, and coherent answer to users' complex
questions. The aim and objectives of this study, together with a proposed methodology, follow the quoted
passages.
“The limitation of the analyzed studies is related to how they provide the answer. The majority of studies
focus on selecting a few passages from different documents and ranking them according to their
usefulness to answer a question. However, it is common for non-factoid questions to have several
restrictions, narrowing the search space down to a specific answer. For instance, for the question “How
should I treat measles in a 12-year-old boy?” the ideal passage to be used as an answer should cover
“treatment”, “measles”, “12-year-old” and “boy”, which is very unlikely and there may not be a ready-
made passage in the knowledge base containing all the information needed. In this case, the ideal system
must search for different information pieces in different documents and merge them to compose a single
answer. However, this is challenging and still an open research problem.”
“Automatic generation of answers based on multiple passages is a critical issue for developing full end-
to-end non-factoid question-answer systems. The problem emerges from the fact that the automatic
generation of coherent and cohesive text – especially for long passages – is still an open research question
(Bau et al., 2020). Broadly speaking, coherence and cohesion refer to how a text is organized so that it
can hold together. In a coherent answer, concepts are connected meaningfully and logically by using
grammatical and lexical cohesive devices.”
“The future directions in the non-factoid QA should concern methods that generate natural language and
use several information sources to compose complex answers instead of using a simple extracted
sentence”
Cortes, E. G., Woloszyn, V., Barone, D., Möller, S., & Vieira, R. (2022). A systematic review of
question answering systems for non-factoid questions. Journal of Intelligent Information Systems,
58(3), 453–480. https://doi.org/10.1007/s10844-021-00655-8
https://drive.google.com/file/d/1v_iuRQk6AJgw1txmZq4QQevBmr6GM2mm/view?usp=sharing
To innovate in the field of deep learning-based text analysis by constructing and thoroughly assessing
hybrid deep learning systems for document summarization and non-factoid question answering. Through
the creation of unique deep learning architectures and proofs of concept, this research aims to enhance
the conciseness, functionality, and accuracy of automated systems in interpreting multiple documents and
answering non-factoid questions.
The rapid growth of textual content on the internet and the increasing demand for efficient and accurate
information retrieval systems have spurred interest in non-factoid question-answering (QA) systems.
These systems aim to provide comprehensive, contextually accurate, and concise responses to complex
questions, synthesizing information from a variety of sources (Nazari & Mahdavi, 2019; Widyassari et
al., 2022). Despite advancements in natural language processing and question-answering techniques,
crafting coherent and comprehensive answers to non-factoid questions remains an open research
challenge (Breja & Jain, 2022; Cortes et al., 2022).
Existing non-factoid QA models tend to focus primarily on the selection and ranking of passages based
on their relevance to the query. This focus often fails to address the inherent
complexities of non-factoid questions, which frequently require the integration of information from
diverse sources (Cortes et al., 2022). Moreover, existing automatic text summarization (ATS) approaches
have achieved only limited success in generating answers from multiple passages due to the presence of
redundant information and a lack of effective techniques to integrate various information snippets
cohesively (Sharma & Sharma, 2022; Wang et al., 2022; Yadav et al., 2022).
Within the scope of ATS, abstractive summarization of long documents remains a relatively
underexplored area when compared to its counterpart involving shorter documents. This disparity
becomes more pronounced in cases where very long documents, such as scientific articles, need to be
summarized (Alomari et al., 2022). Recent advancements in massive pre-trained language models
(PTLMs) and the availability of new long-document datasets have begun to steer research focus towards
summarizing lengthy documents containing thousands of words (Alomari et al., 2022).
Furthermore, the abstractive type of ATS, which necessitates a semantic understanding of the text and
real-world knowledge, often produces more meaningful summaries akin to those crafted by humans
(Suleiman & Awajan, 2020; Wazery et al., 2022). However, it also carries higher computational costs and
tends to generate summaries that contain factual errors and redundant information and that lose
significant information from the source (Alomari et al., 2022).
In response to these challenges, the present work proposes an innovative approach at the intersection of
non-factoid QA systems and ATS. Instead of the traditional answer extraction and answer ranking in non-
factoid QA systems, this approach employs ATS to provide the final answer to the user. The intended
result is a comprehensive, coherent, and non-redundant summary that effectively answers the user's
question. This is achieved by designing a novel deep learning-based framework that integrates extractive,
abstractive, and compressive-based methods (Lewis et al., 2020; See et al., 2017). Such an approach not
only addresses a significant gap in the current research landscape but also signifies a promising avenue
towards addressing the complexities of non-factoid question answering.
Objectives
Objective 1: To design and implement a hybrid deep learning system for efficient document
summarization.
Objective 2: To assess the performance and accuracy of the developed hybrid deep learning document
summarization system through suitable evaluation techniques.
Objective 3: To design and develop a novel deep learning-based system for non-factoid question
answering, culminating in the creation of a comprehensive proof of concept.
Objective 4: To systematically evaluate the functionality, accuracy, and efficiency of the developed non-
factoid question-answering system using the developed proof of concept.
Sample Methodology Breakdown
Objective 1: To design and implement a hybrid deep learning system for efficient document
summarization
Methodology 1
- The extractive part will be implemented using BERT, which will be fine-tuned to
assign relevance scores to sentences in the document. The sentences with the
highest scores will be selected as informative.
- A dense layer placed on top of BERT will output a relevance score for each
sentence in the input document.
- Fine-tune the model on the dataset, where the target is to assign higher
scores to sentences that are part of the human-written summary.
- Use the model to score the sentences in a new document. Select the top-
scoring sentences as the extractive summary.
- The abstractive part will involve using GPT, which will generate a new,
abstractive summary from the informative sentences selected by BERT.
- Load a pre-trained GPT model using the HuggingFace Transformers
library.
- Fine-tune the GPT model on the extractive summaries. The target is the
human-written summary.
- The compressive part will involve using Longformer, which can handle long
sequences. Longformer will be used to condense the selected sentences without
losing key information.
- Fine-tune the Longformer model on the dataset, with the target being the
human-written summary.
- The entire process forms a pipeline, with each step relying on the output from the
previous step.
- Implement both unit tests and functional tests to ensure that individual
components of the architecture are working correctly.
- Data collection
- The CNN / Daily Mail dataset will be used, as it is a standard benchmark for
document summarization tasks.
- Data preprocessing
- The BERT tokenizer will be used to tokenize the text for the extractive part and
the GPT tokenizer for the abstractive part. Special tokens such as start, end, and
pad tokens will be handled accordingly. All sequences will be brought to a fixed
length: the length of the longest sequence in the dataset or the model's maximum
input length, whichever is smaller. Shorter sequences will be filled with padding
tokens until they reach that length, and longer sequences will be truncated.
- Use the BERT tokenizer for tokenizing the text for the extractive part.
- Handle special tokens like start, end, and pad tokens appropriately for
each of the models.
- For each of the models, pad all sequences to a fixed length - the
maximum length allowed by the model (BERT, GPT, Longformer have
different sequence length limits).
- The Adam optimizer will be used with an initial learning rate of 0.001. After a
few epochs, if the validation loss does not decrease, the learning rate will be
reduced by a factor of 10 to refine the optimization process.
- If the validation loss does not decrease after a few epochs, reduce the
learning rate by a factor of 10.
- Hyperparameter Tuning
- A grid search or random search strategy will be used to explore the performance
of different hyperparameters. The learning rate, batch size, and the number of
training epochs will be given particular attention.
- Give particular attention to the learning rate, batch size, and the number
of training epochs.
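As an illustration of three steps in Methodology 1 (selecting the top-scoring sentences, bringing token sequences to a fixed length, and cutting the learning rate by a factor of 10 when the validation loss stalls), a minimal pure-Python sketch follows. It is not the proposed implementation. The function names, the patience of 3 epochs standing in for "a few epochs", and the pad id of 0 are assumptions made for the example, and in practice the sentence scores would come from the fine-tuned BERT scorer.

```python
from typing import List, Sequence


def select_top_k(sentences: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k highest-scoring sentences, kept in document order."""
    if len(sentences) != len(scores):
        raise ValueError("one score per sentence is required")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank sentence indices by score, highest first (ties keep the earlier sentence).
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = sorted(ranked[:k])  # restore the original sentence order
    return [sentences[i] for i in chosen]


def pad_or_truncate(token_ids: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Bring a token-id sequence to exactly max_len items (pad_id=0 is an assumption)."""
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


def next_learning_rate(lr: float, val_losses: Sequence[float],
                       patience: int = 3, factor: float = 0.1) -> float:
    """Multiply lr by `factor` if the last `patience` epochs set no new best loss."""
    if len(val_losses) <= patience:
        return lr  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return lr * factor if recent_best >= best_before else lr


sentences = [
    "Measles is a viral infection.",
    "It spreads through the air.",
    "Treatment is mainly supportive.",
]
print(select_top_k(sentences, [0.2, 0.1, 0.9], k=2))
print(pad_or_truncate([101, 7592, 102], max_len=5))
print(next_learning_rate(0.001, [1.0, 0.8, 0.81, 0.82, 0.80]))
```

Keeping the selected sentences in document order is a design choice for this sketch: it gives the abstractive stage an input that still reads as running text.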
Objective 2: To assess the performance and accuracy of the developed hybrid deep learning document
summarization system through suitable evaluation techniques
Methodology 2
- Quantitative Evaluation
- ROUGE Score Calculation: We will use the rouge package in Python to calculate
the ROUGE scores. These will include ROUGE-1 (unigram overlap), ROUGE-2
(bigram overlap), and ROUGE-L (longest common subsequence). This will give us
a statistical measure of how well the model performed in terms of precision,
recall, and F1 score.
- Import the Rouge class from the rouge package in Python (from rouge
import Rouge).
- Calculate the average ROUGE scores over all summaries in the dataset for
a comprehensive measure of the model's performance.
- BLEU Score Calculation: Besides ROUGE, we can also compute BLEU
(Bilingual Evaluation Understudy) scores using the NLTK (Natural Language
Toolkit) library in Python. BLEU evaluates a generated sentence against a
reference sentence, so it can serve as an additional quantitative measure for
our summarization task.
- Calculate the average BLEU score over all summaries in the dataset.
- Qualitative Evaluation
- Ask each evaluator to read the summary and the original text, then rate
the summary based on the provided criteria.
- Inter-Annotator Agreement: Finally, to ensure the validity of human evaluations,
we will calculate the inter-annotator agreement among the human evaluators
using metrics such as Fleiss's kappa. This step is necessary to ensure that the
scoring isn't subjective and has a consensus.
- NLTK's agreement module can be used for this (from nltk import agreement;
ratingtask = agreement.AnnotationTask(data=your_data); print('Multi-rater
kappa:', ratingtask.multi_kappa())). Note that NLTK has no function named
fleiss_kappa; multi_kappa() computes the Davies and Fleiss multi-coder kappa.
The 'your_data' is a list of lists, each inner list containing the evaluator
ID, item ID, and score.
- If the kappa score is high (close to 1), it means there's strong agreement
between the evaluators, which validates the human evaluation results.
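To make the metrics in Methodology 2 concrete, the sketch below computes simplified ROUGE-N and ROUGE-L F1 scores and Fleiss's kappa from first principles, using only the Python standard library. It is an illustration under stated assumptions, not a replacement for the libraries named above: tokenization is a lowercase whitespace split, no stemming is applied, ROUGE-L is reported as a plain F1, and all function names are hypothetical. For Fleiss's kappa, the input is assumed to be a count table in which counts[i][j] is the number of evaluators who assigned item i to rating category j, with the same number of evaluators for every item.

```python
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, cand_total: int, ref_total: int) -> float:
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over clipped n-gram matches (n=1 for ROUGE-1, n=2 for ROUGE-2)."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips the counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 based on the longest common subsequence (LCS) of the two token lists."""
    a, b = candidate.lower().split(), reference.lower().split()
    prev = [0] * (len(b) + 1)  # one row of the LCS dynamic-programming table
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return _f1(prev[-1], len(a), len(b))


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss's kappa for a table of per-item category counts."""
    n_items, n_raters, n_cats = len(counts), sum(counts[0]), len(counts[0])
    # Mean observed agreement across items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance, from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # every rating fell into a single category
    return (p_bar - p_e) / (1 - p_e)


print(rouge_n_f1("the cat sat on the mat", "the cat is on the mat", n=1))
print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
print(fleiss_kappa([[3, 0], [0, 3]]))
```

Averaging rouge_n_f1 and rouge_l_f1 over all generated and reference summary pairs gives the dataset-level figures described above.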
Objective 3: To design and develop a novel deep learning-based system for non-factoid question
answering, culminating in the creation of a comprehensive proof of concept.
Methodology 3
- The Question-Answering (QA) system will be designed with three core modules:
a question processing module, a document retrieval module (Dense Passage
Retrieval - DPR), and a summarization module (our hybrid summarization model
from Objective 1).
- Design specifics
- Question Processing Module: This is the initial phase where we receive the user's
question. We will use the BERT model from the HuggingFace Transformers
library to encode the user's question into a meaningful vector representation.
- Document Retrieval Module: We'll use the Dense Passage Retrieval (DPR)
model for this step. This model will use the output from the question processing
module (the vector representation of the user's question) to retrieve the most
relevant documents from the given dataset.
- Summarization Module: Here, we will input the retrieved documents into our
pre-trained hybrid summarization model from Objective 1. The summarization
model will generate a concise and accurate summary of the information in the
documents, which directly answers the user's question.
- We'll use Python for implementation because of its extensive libraries for NLP
and ML tasks.
- The HuggingFace Transformers library will be used for the DPR module, and
PyTorch for our summarization module.
- Question Processing Module: The role of this module is to convert the user's
question into a suitable format for the document retrieval module.
- Document Retrieval Module: This module will take the processed question and
find the most relevant documents from our corpus.
- Feed the processed question from the BERT model into the DPR.
- Summarization Module: This module will take the retrieved documents and
generate a summary that directly answers the user's question.
- Feed the processed documents into the model and retrieve the generated
summaries.
- Training of Question-Answering Model
- Data Collection
- Data Preprocessing
- Model Training
- Use the AdamW optimizer for training the DPR model. This variant of the Adam
optimizer includes weight decay, which can help prevent overfitting.
- Monitor the model's validation loss and adjust the learning rate using PyTorch's
ReduceLROnPlateau scheduler if the loss plateaus.
- Hyperparameter Tuning
- Validation
- Evaluate the performance of the model on a separate validation set from the MS
MARCO dataset.
- Use Precision@K, Recall@K, and F1-Score for the document retrieval model
evaluation. These metrics measure the quality of the documents retrieved in
terms of relevance.
- Use ROUGE scores for the summarization model evaluation. This metric
measures the quality of the summaries generated in terms of similarity with the
reference summaries.
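The three-module design in Methodology 3 can be pictured with the following minimal sketch, which wires a question encoder, a dense retriever, and a summarizer together. It assumes that retrieval ranks passages by the inner product between the question vector and each passage vector, which is how DPR scores passages. Everything else is a stand-in chosen for the example: the function names are hypothetical, the encoders and the summarizer are passed in as plain callables, and the toy bag-of-words encoder only makes the sketch runnable without a trained model.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(question_vec: Vector, passage_vecs: Sequence[Vector],
             k: int) -> List[Tuple[int, float]]:
    """Return (passage index, score) for the k highest inner-product scores."""
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep corpus order
    return scored[:k]


def answer_question(question: str,
                    passages: Sequence[str],
                    encode_question: Callable[[str], Vector],
                    encode_passage: Callable[[str], Vector],
                    summarize: Callable[[str, List[str]], str],
                    k: int = 3) -> str:
    """Question processing -> document retrieval -> summarization."""
    q_vec = encode_question(question)
    p_vecs = [encode_passage(p) for p in passages]
    top = retrieve(q_vec, p_vecs, k)
    return summarize(question, [passages[i] for i, _ in top])


# Toy stand-ins (assumptions for illustration only, not trained models).
VOCAB = ["measles", "treatment", "fever", "football"]


def toy_encode(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_summarize(question: str, docs: List[str]) -> str:
    return " ".join(docs)


corpus = ["measles treatment includes rest", "football scores today", "fever and measles"]
print(answer_question("measles treatment", corpus, toy_encode, toy_encode, toy_summarize, k=2))
```

Passing the encoders and the summarizer as arguments keeps the modules independently replaceable, so the hybrid summarization model from Objective 1 could be swapped in without touching the retrieval code.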
Objective 4: To systematically evaluate the functionality, accuracy, and efficiency of the developed non-
factoid question-answering system using the developed proof of concept.
Methodology 4
- Evaluation of Question-Answering System
- Quantitative Evaluation
- Use the same dataset (e.g., MS MARCO) to evaluate all models under
consideration. This ensures a fair comparison.
- Evaluate the Document Retrieval Module: Use Precision@K, Recall@K, and F1-
Score to evaluate the performance of document retrieval. These metrics measure
the relevance and comprehensiveness of the retrieved documents.
- Qualitative Evaluation
- Use Python to code the infrastructure for the qualitative evaluation. Use an
interface that allows evaluators to easily rate the answers generated by each
model according to the criteria.
- Recruit a small panel of human evaluators to perform this task. They will rate
the responses based on clarity, relevance, and correctness.
- Compare the performance of our model with the benchmark models based on
both quantitative and qualitative evaluation results.
- Use Python's Matplotlib or Seaborn libraries to visualize the results for easier
comparison and interpretation.
- Analyze areas where our model outperforms or underperforms compared to the
benchmarks. Identify potential areas for improvement.
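For the retrieval part of the comparison, Precision@K, Recall@K, and F1 can be computed per query and then averaged per system, as in the hedged sketch below. The data layout is an assumption made for the example: `runs` maps a system name to its ranked document IDs per query, and `qrels` maps each query to its set of relevant document IDs. The function names and the toy IDs are hypothetical, and ranked lists are assumed to contain no duplicate IDs.

```python
from typing import Dict, List, Sequence, Set


def precision_recall_f1_at_k(retrieved: Sequence[str], relevant: Set[str],
                             k: int) -> Dict[str, float]:
    """Score the top-k retrieved IDs of one query against its relevant set."""
    top_k = list(retrieved)[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def compare_systems(runs: Dict[str, Dict[str, List[str]]],
                    qrels: Dict[str, Set[str]],
                    k: int) -> Dict[str, Dict[str, float]]:
    """Average the per-query metrics of each system over the same query set."""
    table = {}
    for system, ranking_by_query in runs.items():
        rows = [
            precision_recall_f1_at_k(ranking_by_query.get(query, []), relevant, k)
            for query, relevant in qrels.items()
        ]
        table[system] = {
            metric: sum(row[metric] for row in rows) / len(rows)
            for metric in ("precision", "recall", "f1")
        }
    return table


qrels = {"q1": {"a", "b"}, "q2": {"c"}}
runs = {
    "ours": {"q1": ["a", "b", "x"], "q2": ["c", "y", "z"]},
    "baseline": {"q1": ["x", "a", "y"], "q2": ["y", "z", "w"]},
}
for name, scores in compare_systems(runs, qrels, k=2).items():
    print(name, scores)
```

Scoring every system against the same `qrels` is what makes the comparison fair; the resulting table can be passed directly to Matplotlib or Seaborn for plotting.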
- Exploring deep learning methods for multi-document summarization to answer users' complex
questions.
- Objective 1
- Objective 2
- Objective 3
- The goal here is to infuse our summarization architecture from objective 1 into the QAS
architecture.
- Objective 4