
Synopsis

On

External Knowledge Sources For Answer Quality Prediction And


Sub-Topic Detection

For

School of Computer Science and Engineering

(Vellore Institute of Technology - VIT Bhopal)

MASTER OF COMPUTER APPLICATION


(2022-2024)

Submitted By:

Introduction
On the modern Internet, Community Question Answering (CQA) forums and community social networks
are among the fastest-growing areas and account for a large share of Internet traffic. In this
thesis, we propose novel approaches for Answer Quality Prediction in CQA forums and for Sub-Topic
Detection in a community social network such as Twitter. Answer Quality Prediction aims to predict
the quality of an answer posted in response to a forum question. For this task we propose the
"Deep Feature Fusion Network" (DFFN), which combines the advantages of hand-crafted features and
deep learning based systems. Given a question-answer pair along with its metadata, a DFFN
architecture learns features using a deep neural network, independently computes hand-crafted
features leveraging various external knowledge sources, and then combines the two using a fully
connected neural network trained to predict the quality of the given answer.
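
The fusion step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the single hidden layer, the ReLU and sigmoid activations, and the fixed (untrained) weights are all assumptions made for illustration. It only shows how a learned feature vector and a hand-crafted feature vector could be concatenated and passed through fully connected layers to produce one quality score.

```python
import math
from typing import List


def dense(inputs: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + bias[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def fuse_and_score(
    learned: List[float],        # features from the deep network (assumed given)
    hand_crafted: List[float],   # features from external knowledge sources (assumed given)
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[List[float]],
    b_out: List[float],
) -> float:
    """Concatenate both feature vectors and map them to a score in (0, 1)."""
    fused = learned + hand_crafted                      # feature fusion by concatenation
    hidden = relu(dense(fused, w_hidden, b_hidden))     # hypothetical hidden layer
    return sigmoid(dense(hidden, w_out, b_out)[0])      # hypothetical quality score


if __name__ == "__main__":
    # Illustrative, untrained weights; in practice these would be learned.
    w_hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, -2.0]]
    score = fuse_and_score([1.0, 2.0], [0.5], w_hidden, [0.0, 0.0], [[1.0], [0.0]], [0.0])
    print(f"predicted quality score: {score:.3f}")
```

In a trained model the weights would come from optimisation on labelled question-answer pairs; here they are fixed by hand only to keep the sketch self-contained.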

Motivation and Problem Description


The time invested in preparing an answer and the intention behind answering both affect the
quality of answers. As a result, for a given question posed by a user, the quality of responses
often varies widely, ranging from highly precise and detailed answers given by authentic users
(users who are genuinely interested, write correct answers and have decent knowledge about the
question) to highly imprecise or incomprehensible one-word or single-line answers posted by
spammy and other non-serious users. To alleviate this problem, Community Question Answering
forums often include feedback mechanisms such as votes and ratings for rating the quality of
answers and users, which could also be used as signals for ranking the answers for a given
question. In this thesis, we propose a novel approach to address the issue of Answer Quality
Prediction in Community Question Answering forums. The problem of Answer Quality Prediction in
Community Question Answering can be defined as follows: given a question Q and its set of
community answers, rate the answers according to their quality.
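
The problem definition above can be read as "score each answer, then order by score". The following sketch illustrates that reading only. The names `rank_answers` and `toy_overlap_score` are hypothetical, and the word-overlap scorer is a deliberately naive placeholder for whatever quality model is actually used.

```python
from typing import Callable, List, Tuple


def rank_answers(
    question: str,
    answers: List[str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Return (answer, score) pairs ordered from highest to lowest score."""
    scored = [(answer, score(question, answer)) for answer in answers]
    # sorted() is stable, so answers with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_overlap_score(question: str, answer: str) -> float:
    """Placeholder scorer: fraction of question words that also occur in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


if __name__ == "__main__":
    question = "how to renew visa in qatar"
    answers = [
        "ok",
        "you can renew the visa at the immigration office in qatar",
        "no idea",
    ]
    for answer, value in rank_answers(question, answers, toy_overlap_score):
        print(f"{value:.2f}  {answer}")
```

Any function with the same `(question, answer) -> float` signature, for example a trained classifier's confidence, could be passed in place of the toy scorer.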

Methodology

Answer Quality Prediction in Community Question Answering


Community Question Answering (CQA) forums provide a platform for interaction with experts and
serve as a popular and effective means of information seeking on the Web. These forums attract
a large number of users seeking and providing answers to a variety of questions on various
subjects. A CQA forum presents information in the form of a question and the public responses
to it, written in natural language. Over time, such forums have become rich repositories of
knowledge encoded in the form of user-generated questions and answers. Users can get precise
and compact answers to their questions in these forums instead of a list of relevant documents
from a search engine.

Sub-Topic Detection
In recent years, the Web has become not only a common source of information but also a crucial
player in the evolution of events. Twitter is a microblogging social media platform where users
express, in the form of tweets, their opinions on various issues, events and entities such as
entertainment, industry, science, celebrities, social events, vehicles, phones, sports, local
festivals, epidemics, and economic, political and cultural matters. Even before the news media
or product feedback forums become aware of emerging topics or issues, Twitter can surface newly
developing and trending topics from across the world and report them to the web community in
real time. The vast usage of Twitter has created both the opportunity and the necessity of
managing the online reputation of those entities in real time. Online Reputation Management
consists of monitoring and handling the opinions of Internet users on celebrities, companies
and products, and is already a fundamental tool in corporate communication.

Summary
We present a novel approach, "Deep Feature Fusion Networks" (DFFN), an end-to-end
differentiable approach which integrates hand-crafted features into CNN and BLNA models to
improve Answer Quality Prediction. DFFN enriches the feature representations learned through
CNN and BLNA by introducing additional hand-crafted similarity features computed using external
resources such as Wikipedia, the anchor text of the Google Cross-Lingual Dictionary, and
clickthrough data. As a result, we show that DFFN achieves state-of-the-art performance on the
standard SemEval-2015 and SemEval-2016 benchmark datasets and performs better than baseline
approaches which individually employ either hand-crafted features or deep learning based
techniques.

Conclusions and Future Work


We have leveraged the advantages of hand-crafted features, deep learning methods and external
knowledge sources by feeding hand-crafted features, computed from both the data and external
knowledge sources, into the deep learning model. Deep Feature Fusion Networks (DFFN) enrich
the feature representations learned through CNN and BLNA by introducing additional hand-crafted
similarity features computed using external resources such as Wikipedia, the anchor text of the
Google Cross-Lingual Dictionary, and clickthrough data, in addition to the features computed
from the data. As a result, we show that DFFN achieves state-of-the-art performance on the
standard SemEval-2015 and SemEval-2016 benchmark datasets and performs better than baseline
approaches which individually employ either hand-crafted features or deep learning based
techniques. As future work, it would be interesting to combine CNN, BLNA and hand-crafted
features together in a single Deep Feature Fusion Network and to examine the learned features
in depth. We also proposed a novel approach for the problem of sub-topic detection from tweets.
We designed ways of semantically representing tweets using various external knowledge bases,
and combined these representations with relevant similarity metrics to define features for the
sub-topic detection classifier.
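
As a rough illustration of the idea of pairing a knowledge-base representation with similarity metrics, the sketch below maps tweet words to concepts and computes two similarity features for a pair of tweets. Everything in it is assumed for illustration: the tiny `TOY_CONCEPTS` table stands in for a real external knowledge base, and Jaccard and cosine similarity are example metrics, not necessarily the ones used in the thesis.

```python
import math
from collections import Counter
from typing import List

# Hypothetical stand-in for an external knowledge base (word -> concept).
TOY_CONCEPTS = {
    "iphone": "Apple_Inc",
    "ipad": "Apple_Inc",
    "battery": "Electric_battery",
    "charger": "Electric_battery",
}


def to_concepts(tweet: str) -> Counter:
    """Represent a tweet as a bag of knowledge-base concepts."""
    tokens = tweet.lower().split()
    return Counter(TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS)


def jaccard(a: Counter, b: Counter) -> float:
    """Overlap of the two concept sets, ignoring counts."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of the two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(tweet_a: str, tweet_b: str) -> List[float]:
    """Feature vector for a tweet pair, suitable as input to a classifier."""
    ca, cb = to_concepts(tweet_a), to_concepts(tweet_b)
    return [jaccard(ca, cb), cosine(ca, cb)]


if __name__ == "__main__":
    print(pair_features("iphone battery drains fast", "new ipad charger battery"))
```

The two tweets in the usage example share no surface words except "battery", yet their concept representations overlap, which is the kind of signal a purely lexical comparison would miss.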

