
A Graph-based Document Ranking Model

Final Year Project Proposal

Session: 2021 – 2025

Submitted by:
Muhammad Muneeb Iftikhar 2021-CS-648
Muhammad Awais Nasir 2021-CS-627
Hira Ahmad 2020-R/2021-CS-709

Supervised by:
Dr. Irfan Yousaf

Department of Computer Science, New Campus


University of Engineering and Technology
Lahore, Pakistan

Contents

List of Figures
List of Tables
1 Chapter 1: Proposal Synopsis
  1.1 Abstract
  1.2 Introduction
  1.3 Problem Statement
  1.4 Objectives
  1.5 Features/Scope
  1.6 Related Work
  1.7 Proposed Methodology/System
  1.8 Tools and Technologies
    1.8.1 Software Technologies
    1.8.2 Hardware Technologies
  1.9 Team Members Individual Tasks/Work Division
  1.10 Data Gathering Approach
    1.10.1 Source of Data
    1.10.2 Data Collection Process
  1.11 Timeline/Gantt chart
References

List of Figures
1 Proposed Methodology
2 Gantt chart

List of Tables
1 Summary of Projects
2 Work Division

Proposal Synopsis

1 Chapter 1: Proposal Synopsis


1.1 Abstract
This project proposes a document ranking framework that uses graphs and
machine learning to understand user queries and website content. Unlike traditional
search engines, it accepts in-depth user descriptions and goes beyond keyword matching.
By analyzing relationships between terms, the system ranks websites by semantic
similarity to the user's query, providing more relevant and insightful answers to complex
questions.

1.2 Introduction
Nowadays it can be hard to find relevant information online because there is simply so
much out there. Search engines help us navigate this information overload, but they
don't always understand exactly what we're looking for. The goal of this project is to
develop a new search tool that uses graphs and machine learning to find the best answers
to a wide range of questions.
Imagine you have a difficult question and need an in-depth answer. The system will
work like a personal research assistant: you ask a long, detailed question, and it will
extract the key points and search the web for relevant websites. But it won't just match
websites on similar terms; it will be much smarter. It will use graph structures to
understand what your request actually means and what is on each website. This way
you get answers that are truly specific to your needs, not just websites that happen to
contain the right keywords.
This project is not about retrieving just any information from the Internet. It is about
surfacing the websites that best match your detailed question, so that the answer you
get is exactly what you are looking for. The final interface will be easy to use and will
show which websites best answer your question, making it much easier to find the right
answers to your most difficult and detailed questions on the web.

1.3 Problem Statement


Traditional approaches to document ranking struggle to understand the semantic
relationships in text. They often miss the context and relative importance of documents
with respect to a user's query, resulting in imprecise online search results. To overcome
these limitations, graph-based models have emerged, providing a more sophisticated
approach to ranking documents. They offer a smart and innovative way to model the
relationships between documents, helping people get what they need quickly.

1.4 Objectives
The major aim of this graph-based document ranking model is to improve both the
efficiency and the accuracy of document retrieval and ranking. Our model aims to:
• Allow personalized search results tailored to the user's specific queries, enhancing
both the search experience and user satisfaction.
• Enable recommendation systems to efficiently identify and suggest relevant
documents according to user preferences, budgets, and requirements.


• Enhance knowledge discovery in scientific research by making documentation
retrievable and searches effective, helping researchers find relevant data.

1.5 Features/Scope
Scope:
• Development of a document ranking model based on semantic graphs.
• Use advanced ranking algorithms to make search results more accurate and relevant.
• Present the list of results to users using a simple and user-friendly format.
• Evaluate and test the system performance.
• Provide comprehensive documentation of methodology and results.
Features:
• Effectively extracts keywords from user queries.
• Collects data from a variety of sources, including search engines and web scraping.
• Cleans and structures collected data for analysis.
• Ranks documents using advanced algorithms based on graph theory, machine
learning, and natural language processing.
• Provides an intuitive interface for query input and result display.
• Designed to handle large-scale document processing (the open web) efficiently.
• Aims to improve the relevance and accuracy of search results using state-of-the-art
techniques.
• Ensures scalability, efficiency, and robustness of the system.


1.6 Related Work


Analysing similarities between legal court documents using natural language
processing approaches based on transformers
This paper discusses the use of advanced artificial intelligence (AI) techniques,
especially natural language processing (NLP), to assess the similarity of legal
documents in the Brazilian legal system. It uses transformer-based models such as
BERT, GPT-2, and RoBERTa trained on a large corpus of Brazilian Portuguese.
These models were fine-tuned specifically for the legal domain, which involves
sensitive data. By computing vector representations of each document and grouping
them by similarity, the study investigates how well these models capture similarities
between legal texts. The findings suggest that transformer-based models, in
particular RoBERTa, perform better than traditional NLP techniques on this
task [1].
Aspect-based Document Similarity for Research Papers
This paper presents a method to enhance traditional document similarity measures
by assessing specific aspects of similarity between documents. Instead of just saying
whether two documents are similar or not, the method looks at in which respects
their content is similar. The researchers tested different models such as RoBERTa
and SciBERT on research papers to see which worked best. They found that SciBERT
performed well and suggest that this approach could help improve recommender
systems for research papers in the future. They share their datasets, code, and
trained models for others to use [2].
A new simple and effective measure for bag-of-word inter-document similarity
measurement
This paper addresses measuring the similarity between two documents using the
bag-of-words (BoW) representation, which is commonly used in text mining. It
highlights the limitations of the traditional term-weighting methods used to improve
cosine similarity. Instead, the paper proposes a simple alternative that does not rely
on explicit term weighting. This new measure uses a more nuanced probabilistic
approach to compare documents based on their shared vocabulary, producing good
results for binary BoW representations and competitive results for term-frequency-
based BoW representations [3].
Similar Cases Recommendation using Legal Knowledge Graphs
This paper discusses the construction of legal knowledge graphs from Indian court
cases and legal documents, with a particular focus on intellectual property rights
issues. The goal is to recommend similar court cases. To do this, the authors use a
graph neural network (GNN) model and incorporate relevant features obtained from
topic models and domain experts. These features help improve prediction accuracy
for similar cases. The paper also discusses how large language models (LLMs) can
be enhanced with knowledge from these legal knowledge graphs before being applied
to downstream tasks [4].
Detecting Semantic Similarity Of Documents Using Natural Language Processing
This paper explores various approaches to understanding and measuring semantic
similarity between documents. It looks at how embeddings, which capture meaning
and vocabulary in text, can be used for this purpose. The researchers developed a
new model based on the Universal Sentence Encoder to measure similarity between
texts. They tested these methods on a dataset and found that their new model
performed better than other methods. Finally, they built an application for finding
similarities between documents using their new model [5].
A Keyphrase Graph-Based Method for Document Similarity Measurement
This paper describes a method for measuring the similarity of two documents. It
aims to provide a richer understanding of the concepts and relationships in texts
by integrating knowledge from large knowledge bases such as DBpedia and Wikipedia,
proposing graph-based semantic models that consider the meaning of a text and its
structure rather than relying solely on term statistics. The method analyzes
similarity by comparing the semantic information represented by the keyphrase
graphs of two documents, and performs comparably to state-of-the-art methods on
standard datasets [6].
A Personalized Graph-Based Document Ranking Model Using a Semantic User
Profile
This paper presents a personalized document ranking model that uses an extended
graph-based distance measure and a semantic user profile derived from the ODP web
ontology. The ranking model combines minimum common supergraph (MCS) and
maximum common subgraph (mcs) measures to align the user's interests with
relevant documents, and extends this approach by including shared concepts and
their links. Comparative results show the effectiveness of the model against Yahoo
search results [7].
Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT
This work focuses on semantic textual similarity (STS) in the Japanese clinical
domain. Two Japanese clinical datasets (case reports and electronic medical records)
were used with Bidirectional Encoder Representations from Transformers (BERT)
models to handle STS tasks in a language other than English. The results show that
both the general and the clinical Japanese BERT models performed well, with the
general model demonstrating superior performance, possibly due to its larger
pre-training corpus compared with the clinical Japanese BERT [8].
Efficient Graph-based Document Similarity
This paper presents an efficient approach to document similarity based on graph
representations that incorporate relational knowledge from knowledge graphs. In
contrast to traditional vocabulary-based approaches, graph-based models handle
differences in language and vocabulary effectively. Experimental results show that
this method outperforms comparable measures in relating document similarity to
human judgments, even for small documents, while being computationally efficient
compared to other graph-based methods [9].
Research on document similarity calculation and detection based on deep
learning
This work introduces a new approach to document similarity calculation and
detection using deep learning. It uses efficient subtree matching to derive document
feature sequences and extract keyword frequencies. By using deep learning to
classify documents, the method aims to make similarity detection more effective.
The empirical findings show low computational error and reliable results,
demonstrating the effectiveness and feasibility of the proposed method [10].


Table 1: Summary of Projects

Each project is summarized along six dimensions: graph-based ranking, semantic
user profiles, semantic recovery, comparative effectiveness against different search
engines (SEs), machine learning (ML) for ranking, and reported accuracy.

A Personalized Graph-Based Document Ranking Model
- Graph-based ranking: incorporates advanced graph-based ranking methods.
- Semantic user profiles: incorporates user preferences and behavior to personalize search results.
- Semantic recovery: derives semantic user profiles from a web ontology.
- Comparative effectiveness: shows the model's performance compared to various SEs, i.e. Yahoo.
- ML for ranking: utilizes machine learning algorithms to improve document ranking.
- Accuracy: Precision at 10 (P@10) is 42.80%; Precision at 20 (P@20) is 36.60%.

Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT
- Graph-based ranking: does not utilize graph-based ranking.
- Semantic user profiles: does not incorporate a semantic user profile.
- Semantic recovery: does not incorporate semantic recovery.
- Comparative effectiveness: does not compare effectiveness against search results of SEs.
- ML for ranking: utilizes deep learning for document ranking.
- Accuracy: CR results: 79%; EMR results: 83%.

Research on Document Similarity Calculation and Detection Based on Deep Learning
- Graph-based ranking: does not utilize graph-based ranking.
- Semantic user profiles: does not incorporate a semantic user profile.
- Semantic recovery: does not incorporate semantic recovery.
- Comparative effectiveness: does not compare effectiveness against search results of SEs.
- ML for ranking: utilizes deep learning for document ranking.
- Accuracy: keyword representation model; sentence structure (42%-96%), word order (78%-90%), word form.

Efficient Graph-based Document Similarity
- Graph-based ranking: utilizes graph representations for document similarity.
- Semantic user profiles: does not incorporate a semantic user profile.
- Semantic recovery: does not incorporate semantic recovery through shared concepts.
- Comparative effectiveness: does not compare effectiveness against search results of SEs.
- ML for ranking: utilizes deep learning for document ranking.
- Accuracy: STS-12: 0.673; STS-15: 0.707.

Detecting Semantic Similarity Of Documents Using Natural Language Processing
- Graph-based ranking: incorporates embeddings and similarity measures, but not specifically graph-based ranking.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: explores methods to capture both meaning and words in text, but does not specifically mention semantic recovery.
- Comparative effectiveness: compares the performance of the new model against other methods on a dataset, not against SEs.
- ML for ranking: develops a new model based on the Universal Sentence Encoder to measure similarity between texts.
- Accuracy: Pearson correlation 0.7722; Spearman correlation 0.6898; Kendall's Tau correlation 0.5167.

Aspect-based Document Similarity for Research Papers
- Graph-based ranking: utilizes traditional document similarity measures, not specifically graph-based.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: does not specifically mention semantic recovery.
- Comparative effectiveness: compares different models such as RoBERTa and SciBERT, not search engines.
- ML for ranking: tests different machine learning models such as RoBERTa and SciBERT.
- Accuracy: achieves high F1 scores on the ACL Anthology and CORD-19 datasets.

A Keyphrase Graph-Based Method for Document Similarity Measurement
- Graph-based ranking: utilizes a keyphrase graph-based method for document similarity measurement.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: integrates semantic information from large databases such as DBpedia and Wikipedia to enhance understanding of concepts and relationships within text.
- Comparative effectiveness: does not compare effectiveness against different search engines.
- ML for ranking: does not specifically mention machine learning for ranking.
- Accuracy: achieves the highest Pearson correlation coefficient of 0.81.

Analysing Similarities between Legal Court Documents Using NLP Approaches Based on Transformers
- Graph-based ranking: utilizes transformer-based NLP approaches for similarity analysis between legal documents, not graph-based ranking.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: does not incorporate semantic recovery.
- Comparative effectiveness: does not compare effectiveness against different search engines.
- ML for ranking: does not specifically mention machine learning for ranking.
- Accuracy: achieves a TF-IDF accuracy of 0.964 and a Word2Vec CBOW ptBR accuracy of 0.999.

Similar Cases Recommendation Using Legal Knowledge Graphs
- Graph-based ranking: utilizes graph neural network (GNN) models for recommendation using legal knowledge graphs.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: does not incorporate semantic recovery.
- Comparative effectiveness: does not compare effectiveness against different search engines.
- ML for ranking: does not specifically mention machine learning for ranking.
- Accuracy: achieves a citation prediction accuracy of 0.725 and a case similarity accuracy of 0.556.

A New Simple and Effective Measure for Bag-of-Word Inter-Document Similarity Measurement
- Graph-based ranking: does not utilize graph-based ranking.
- Semantic user profiles: does not incorporate semantic user profiles.
- Semantic recovery: does not incorporate semantic recovery.
- Comparative effectiveness: does not compare effectiveness against different search engines.
- ML for ranking: does not specifically mention machine learning for ranking.
- Accuracy: achieves 95.81 ±0.23 accuracy for the term-frequency-based BoW representation and 95.10 ±0.22 accuracy for the binary BoW representation.


1.7 Proposed Methodology/System

Figure 1: Proposed Methodology

1. Query Processing:
Receive long and detailed queries from the user. Extract keywords from the query
using natural language processing (NLP) techniques such as tokenization, stop-word
removal, and stemming.
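The query-processing step above can be sketched in Python. The stop-word list and the suffix-stripping stemmer below are deliberately minimal placeholders; a real implementation would more likely use NLTK's stop-word corpus and Porter stemmer, as the NLP techniques named above suggest.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"how", "do", "the", "a", "an", "of", "to", "for", "and", "in"}

def simple_stem(word: str) -> str:
    """Very rough suffix stripping (illustrative stand-in for Porter stemming)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_keywords(query: str) -> list[str]:
    tokens = re.findall(r"[a-z]+", query.lower())      # tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [simple_stem(t) for t in kept]              # stemming

print(extract_keywords("How do graph-based models rank long documents?"))
# → ['graph', 'bas', 'model', 'rank', 'long', 'document']
```

The crude stemmer over-strips some words ("based" becomes "bas"), which is exactly the kind of artifact a proper Porter stemmer avoids.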
2. Data Collection:
Query multiple search engines (SE1, SE2, SE3) using the extracted keywords. Col-
lect URLs and relevant metadata (title, snippet, etc.) of search results from each
search engine.
3. Web Scraping:
Visit each URL obtained from the search engines. Scrape the content of the web
pages using web scraping techniques such as BeautifulSoup or Scrapy. Extract text,
images, and other relevant information from the web pages.
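Fetching each URL would use a library such as requests, with BeautifulSoup or Scrapy for parsing, as noted above. As a self-contained sketch of the parsing half, the standard-library HTMLParser below extracts visible text while skipping script and style blocks; the sample HTML is illustrative.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

sample = ("<html><head><style>p{color:red}</style></head>"
          "<body><h1>Title</h1><p>Body text.</p>"
          "<script>var x = 1;</script></body></html>")
print(page_text(sample))  # → Title Body text.
```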
4. Data Preprocessing:
Clean the scraped data by removing HTML tags, boilerplate content, and other
noise. Normalize the text data by converting it to lowercase and removing punctua-
tion. Tokenize the text into words or phrases. Perform lemmatization or stemming
to reduce words to their base form.
5. Graph Construction:
Represent the preprocessed data as a graph structure. Nodes in the graph repre-
sent web pages or documents. Edges between nodes represent relationships such as
hyperlinks, co-occurrence of keywords, or semantic similarity.
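A minimal sketch of the graph-construction step: nodes are document ids, and an edge is added when the cosine similarity of two raw term-frequency vectors crosses a threshold. The threshold value and toy documents are illustrative assumptions; hyperlink or keyword co-occurrence edges, as described above, would work the same way.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_graph(docs: dict[str, str], threshold: float = 0.2) -> dict:
    """Return a dict of weighted edges {(doc_u, doc_v): similarity}."""
    vectors = {d: Counter(text.lower().split()) for d, text in docs.items()}
    edges = {}
    for u, v in combinations(docs, 2):
        weight = cosine(vectors[u], vectors[v])
        if weight > threshold:  # keep only sufficiently similar pairs
            edges[(u, v)] = round(weight, 3)
    return edges

docs = {
    "d1": "graph based document ranking",
    "d2": "ranking documents with graph models",
    "d3": "recipes for chocolate cake",
}
print(build_graph(docs))  # → {('d1', 'd2'): 0.447}
```

Note that "document" and "documents" do not match here; running the preprocessing step's stemming first would merge them.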
6. Ranking Algorithm:
Develop a ranking algorithm based on graph theory, machine learning, or deep learn-
ing techniques. Incorporate features such as node centrality, edge weights, document
relevance, and user query relevance. Train the ranking model using labeled data if
available, or unsupervised learning techniques otherwise.
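One concrete instance of node-centrality-based ranking is PageRank over the document graph. The plain-Python sketch below uses conventional damping and iteration defaults rather than project-specific choices, and assumes every node has at least one outgoing edge.

```python
def pagerank(adj: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iterative PageRank over an adjacency list (node -> outgoing links)."""
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        rank = {
            u: (1 - damping) / n
            + damping * sum(rank[v] / len(adj[v]) for v in nodes if u in adj[v])
            for u in nodes
        }
    return rank

# Toy document graph: d1 is linked to by both d2 and d3, so it ranks highest.
links = {"d1": ["d2"], "d2": ["d1", "d3"], "d3": ["d1"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # → d1
```

In the full system, centrality scores like these would be combined with edge weights and query-relevance features rather than used alone.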
7. Ranking and Result Generation:
Apply the ranking algorithm to the constructed graph to rank the web pages/documents.
Generate a ranked list of web pages/documents based on their relevance to the user
query. Select the top ranked results for display to the user.


8. User Interface:
Design and develop a user friendly interface for query input and result display. Pro-
vide features such as filtering, sorting, and pagination for result navigation. Ensure
responsiveness and accessibility across different devices and screen sizes.
9. Testing and Evaluation:
Conduct thorough testing of the search engine system, including unit testing, in-
tegration testing, and system testing. Evaluate the performance of the system in
terms of search accuracy, relevance, speed, and user satisfaction. Gather feedback
from users through surveys, interviews, or usability studies.
10. Optimization:
Identify areas for optimization based on testing and evaluation results. Optimize
the system for performance, scalability, and resource efficiency.

1.8 Tools and Technologies


1.8.1 Software Technologies
• Django/Flask
• Python
• Draw.io

1.8.2 Hardware Technologies


• 8GB of RAM
• 64-bit Windows 10 Operating System
• Intel(R) Core(TM) i5-3320M CPU @2.60GHz

1.9 Team Members Individual Tasks/Work Division

Table 2: Work Division

Muhammad Muneeb Iftikhar: Data Collection, Frontend, Backend
Muhammad Awais Nasir: Data Collection, Frontend, Backend
Hira Ahmad: Data Collection, Frontend, Documentation

1.10 Data Gathering Approach


1.10.1 Source of Data
The success of our graph-based document ranking model and search engine heavily relies
on the availability and quality of data. In order to ensure comprehensive and diverse
coverage, we will employ multiple sources for data collection:
Search Engine Results Pages (SERPs):
We will utilize popular search engines such as Google, Bing, and Yahoo to retrieve
initial search results relevant to user queries. These search engines serve as primary
sources of web page URLs and metadata.


Publicly Available Datasets:


We will explore existing datasets that contain web page information, such as the
Common Crawl dataset or academic datasets from research publications. These
datasets provide a wealth of web page data, including text content, HTML structure,
and metadata.
Custom Web Scraping:
In addition to search engine results and existing datasets, we will employ custom
web scraping techniques to extract data directly from targeted websites. This ap-
proach allows us to gather specific information from sources that are not covered by
traditional search engines or public datasets.

1.10.2 Data Collection Process


Our data collection process will involve the following steps:
Query Submission:
Upon receiving a user query, we will submit the query to multiple search engines
simultaneously to retrieve initial search results.
Result Parsing:
We will parse the search engine results pages (SERPs) to extract URLs, titles, snip-
pets, and other relevant metadata for each search result.
Web Page Crawling:
We will crawl the URLs obtained from the SERPs to visit the corresponding web
pages and retrieve their content. This step involves fetching the HTML content of
each web page and extracting textual information, images, and other relevant data.
Data Filtering and Preprocessing:
The collected data will undergo filtering and preprocessing to remove noise, such
as advertisements, boilerplate content, and irrelevant information. We will also
normalize the text data by converting it to lowercase, removing punctuation, and
tokenizing it into words or phrases.
Storage and Organization:
Processed data will be stored in a structured format, such as a database or file
system, for easy access and retrieval during subsequent stages of the project.
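As an illustrative sketch of the storage step, processed pages could be written to a small SQLite table. The schema and column names here are hypothetical; a file-backed database (or another store) would replace the in-memory one in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory for the sketch; use a file path in practice
conn.execute(
    """CREATE TABLE pages (
           url   TEXT PRIMARY KEY,  -- page URL from the SERP
           title TEXT,              -- parsed page title
           body  TEXT               -- cleaned, preprocessed text
       )"""
)
conn.execute(
    "INSERT INTO pages VALUES (?, ?, ?)",
    ("https://example.com/", "Example Domain", "cleaned page text"),
)
conn.commit()

row = conn.execute(
    "SELECT title FROM pages WHERE url = ?", ("https://example.com/",)
).fetchone()
print(row[0])  # → Example Domain
```

Parameterized queries (the `?` placeholders) keep scraped text from being interpreted as SQL, which matters when storing arbitrary web content.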


1.11 Timeline/Gantt chart

Figure 2: Gantt chart


References
[1] Analysing similarities between legal court documents using natural language processing
approaches based on transformers. https://arxiv.org/pdf/2204.07182.pdf. [Accessed
28-03-2024].
[2] Aspect-based Document Similarity for Research Papers. https://arxiv.org/pdf/2010.
06395.pdf. [Accessed 28-03-2024].
[3] A new simple and effective measure for bag-of-word inter-document similarity
measurement. https://arxiv.org/pdf/1902.03402.pdf. [Accessed 28-03-2024].
[4] Similar Cases Recommendation using Legal Knowledge Graphs. https://arxiv.org/
pdf/2107.04771.pdf. [Accessed 28-03-2024].
[5] Detecting Semantic Similarity Of Documents Using Natural Language Processing.
https://www.sciencedirect.com/science/article/pii/S1877050921011716. [Accessed
28-03-2024].
[6] A Keyphrase Graph-Based Method for Document Similarity Measurement. https:
//eds.p.ebscohost.com/. [Accessed 28-03-2024].
[7] Mariam Daoud, Lynda Tamine, and Mohand Boughanem. A personalized graph-
based document ranking model using a semantic user profile. In User Modeling,
Adaptation, and Personalization: 18th International Conference, UMAP 2010, Big
Island, HI, USA, June 20-24, 2010. Proceedings 18, pages 171–182. Springer, 2010.


[8] Faith Wavinya Mutinda, Shuntaro Yada, Shoko Wakamiya, and Eiji Aramaki. Se-
mantic textual similarity in japanese clinical domain texts using bert. Methods of
Information in Medicine, 60(S 01):e56–e64, 2021.
[9] Christian Paul, Achim Rettinger, Aditya Mogadala, Craig A Knoblock, and Pedro
Szekely. Efficient graph-based document similarity. In The Semantic Web. Latest
Advances and New Domains: 13th International Conference, ESWC 2016, Herak-
lion, Crete, Greece, May 29–June 2, 2016, Proceedings 13, pages 334–349. Springer,
2016.
[10] Cui Xing, Yan Yang, and Jian Luo. Research on document similarity calculation and
detection based on deep learning. In Journal of Physics: Conference Series, volume
1757, page 012007. IOP Publishing, 2021.
