
WORD PREDICTION USING

TRIE DATA STRUCTURE

J COMPONENT PROJECT REPORT

Fall 2021-2022

Submitted by

Drashti Patel (19BCE0602)

Devanshi Choudhary (19BCE2614)

Prince Kumar (19BCI0002)

in partial fulfilment for the award of the


degree of

B. Tech

in

Computer Science and Engineering

Vellore-632014, Tamil Nadu, India

School of Computer Science and Engineering


December 2021
Title: Word prediction using Trie data structure

Abstract

In the present century, life moves quickly and saving time matters to everyone, so we rely on
modern technology to save time and to perform tasks efficiently. In this regard, a word predictor
is a small contribution that increases efficiency.

Word predictors are used in messaging apps such as WhatsApp, web search engines, word processors,
and command-line interpreters, among other things. The original goal of word prediction
software was to assist people with physical limitations in increasing their typing speed and
reducing the number of keystrokes required to produce a word or sentence. With this in mind,
we created our own word prediction software based on the Trie data structure, which
improves the user's productivity by at least 10%. Word completion, often
known as autocomplete, is a function that guesses the rest of a word as the user types it. In most
graphical user interfaces, users can accept a prediction by pressing the tab key, or press the
down arrow key to choose another word from the list of predicted words.

Word predictors are useful in a variety of situations, including texting and search engines.
We used the Trie data structure to build our word prediction algorithm. Our application
uses a pre-stored collection of words to predict the word the user intends to type.

Keywords

Trie data structure, auto-complete, word prediction.

Introduction
When autocomplete correctly guesses the word a user intends to type after only a few letters have
been entered into a text input field, it speeds up human-computer interaction. It works best in
domains with a limited number of possible words (such as command-line interpreters), when
some words are much more common than others (such as email addresses), or when writing
structured and predictable text (as in source code editors). Many autocomplete algorithms
that are based on artificial intelligence and machine learning, and powered by natural
language processing, learn new words after the user has written them a few times and can suggest
alternatives based on the learned habits of the individual user, leading to quite a
personalized prediction system for every user.
Autocomplete, or word completion, works as follows: when the writer types the first letter
or letters of a word, the program predicts one or more possible words as choices. If the intended
word is in the list, the writer can select it, for instance by using the number keys.
If the desired word is not predicted, the writer must type the next letter of the word. At this
stage, the offered words are updated so that they all begin with the letters typed so far.
When the desired word appears, it is selected and inserted into the text.
In another form of word prediction, the words most likely to follow the one just written are
predicted, based on recently used word pairs. Word prediction uses language modelling,
which estimates, within a given vocabulary, the words that are most likely to occur. Basic word
prediction on augmentative and alternative communication (AAC) devices is typically combined
with a recency model, in which words that the AAC user uses more often are more likely to be
predicted. Word prediction software frequently allows users to enter their own words into the
prediction dictionary, or can "learn" words that the user has already written.

Literature survey

Authors and year / Title / Methodology / Merits

1. Authors and year: Sakib Haque∗, Aakash Bansal∗, Lingfei Wu† and Collin McMillan∗ (∗Dept. of Computer Science, University of Notre Dame, Notre Dame, IN, USA), 7 Jan 2021
Title: Action Word Prediction for Neural Source Code Summarization
Methodology:
• attendgru: This is the only approach that does not explicitly include the AST, so this baseline is not applicable to the challenge experiments.
• ast-attendgru: This baseline represents approaches that flatten the AST into a sequence, then use a seq2seq-like approach to create the summary.
• ast-attendgru-fc: This baseline extends code summarization tools with "file context", defined as all the other functions in the same file as the function being summarized.
• graph2seq: This approach uses a graph neural network (GNN) to model the AST. Its paper focuses on code generation, but suggests that the GNN encoder can be used for code summarization as well.
• code2seq: This approach pursues path-based encoding solutions that, in short, randomly select 100-200 paths in the AST and use an attention mechanism to attend to different paths for different words. It is used as a representative of path-based solutions.
Merits: The paper treats action word prediction as a complementary problem to source code summarization. The authors argue that action word prediction is a key component of code summarization, because 1) high quality summaries tend to use an action word as the first word in the summary, and 2) the first word of a prediction tends to have a high impact on the subsequent predictions from a model. A key point is that source code summarization tools very often need to predict the action word anyway, so a special emphasis on this problem is justified by that word's importance.

2. Authors and year: Monica Jordan, Guilherme N. Nogueira Neto, Alceu Brito Jr, Percy Nohama, 17 February 2020
Title: Virtual keyboard with the prediction of words for children with cerebral palsy
Methodology:
• Once a sequence of words is started, the system indicates a path with the states most likely to occur in sequential positions, because a word may belong to more than one grammatical class.
• Grammatical features of the word are loaded during typing time. The probability of classes occupying the next position is set according to their sequence position.
• However, because the partial deletion of a word may occur, the grammatical features might have to be retrieved. If the last word entered is entirely deleted, the information from previous words will be required. This retrieval requires storing the possible paths from that sequence position onwards.
Merits: The paper presents a word prediction strategy based on Markov chains and an inverted and an observational matrix, which can be employed in schools in a developing country whose computers have low computational power. Both the click savings and the accuracy in predicting the words to be entered in a text were considered satisfactory. The whole system based on the prediction technique is available on the web, and users have unrestricted and free access to it. The linguistic limitation observed with the use of a corpus is addressed by opening the source code.

3. Authors and year: Muhammad Hassan∗, Muhammad Saeed∗, Ali Nawaz∗, Kamran Ahsan†, Sehar Jabeen∗, Farhan Ahmed Siddiqui∗, Khawar Islam, 2nd July 2018
Title: Effective Word Prediction in Urdu Language Using Stochastic Model
Methodology:
• The methodology is straightforward: intelligent word suggestions are built using artificial intelligence, so that as the user types a character, a list of suggestions to complete the word is made available.
• When the user has typed a complete word, the next most probable word is offered to follow it, saving more time than before, with 99% correct suggestions.
• The study ended with an optimized version of the algorithm that produces the required output with the best utilization of resources, with artificial intelligence working behind the screen to give the user of the front-end program a more professional experience with the application.
Merits: Pioneering research is presented for an auto suggestor and predictor of words (Get Your Word Before Your Typing to Increase The Typing Speed). (N-1) logic has been used together with artificial intelligence, different algorithms, and searching approaches to increase the typing speed by providing better auto suggestions and predictions. Although the research and experiments did not contain a huge amount of data, the outcomes met state-of-the-art work. The quality and quantity of the word dictionary were found to have a significant impact on the predicted list of suggestions and its precision.

4. Authors and year: Abdul Saboor, Mihaly Benda, Felix Gembler, and Ivan Volosyak, 16 May 2019
Title: Word Prediction Support Model for SSVEP-Based BCI Web Speller
Methodology:
• The computational linguistic model called n-gram was used for word prediction based on the text database.
• The n-gram model consists of contiguous sequences of n items from a given text. Thus, an n-gram is a set of co-occurring words within a given window, and n-grams can be classified as uni-gram (size 1), bi-gram (size 2), tri-gram (size 3), etc.
• The number of n-grams $g_s$ for a given sentence $s$ can be calculated as $g_s = w - (m - 1)$, where $m$ is the gram number ($m = 1$ for uni-gram, $m = 2$ for bi-gram, and so on) and $w$ is the number of words in the sentence.
• The n-gram model predicts $x_i$ based on $x_{i-(n-1)}, \ldots, x_{i-1}$. In terms of probability, this is $P(x_i \mid x_{i-(n-1)}, \ldots, x_{i-1})$.
Merits: The proposed word prediction support model can significantly increase the ITR and accuracy of the SSVEP-based BCI web speller. The CSS-based animation can provide better stimulation and straightforward configuration options. During the execution of the speller, AJAX calls were made to the Java Servlet and Java Beans, which provided the glossary from the database in JSON format. These calls proved to produce word predictions in time. Therefore, the online speller proposed in this study can be extended to a web-based email interface, or to writing a blog, both of which require a user to type text within the web browser.

5. Authors and year: Rongxiang Weng, Shujian Huang, Zaixiang Zheng, Xinyu Dai and Jiajun Chen, 5 Aug 2017
Title: Neural Machine Translation with Word Predictions
Methodology:
• A word prediction mechanism is proposed to control the values of the initial state. The intuition is that since the initial state is responsible for the translation of the whole target sentence, it should at least contain information about each word in the target sentence.
• Word Predictions for Decoder's Hidden States: A similar intuition is applied to the decoder. Because the hidden states of the decoder are responsible for the translation of target words, they should be able to predict the target words as well. The only difference is that the already generated words are removed from the prediction task. So each hidden state in the decoder is required to p…
Merits: The encoder-decoder architecture provides a general paradigm for learning machine translation from the source language to the target language. However, due to the large number of parameters and the relatively small training data set, the end-to-end learning of an NMT model may not be able to learn the best solution. The authors argue that at least part of the problem is caused by the long error backpropagation pipeline of the recurrent structures over multiple time steps, which provides no direct control of the information carried by the hidden states in both the encoder and decoder.

6. Authors and year: Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloe Kiddon, Daniel Ramage, 28 Feb 2019
Title: Federated learning for mobile keyboard prediction
Methodology:
• The next-word prediction model uses a variant of the Long Short-Term Memory (LSTM) recurrent neural network called the Coupled Input and Forget Gate (CIFG). As with Gated Recurrent Units, the CIFG uses a single gate to control both the input and recurrent cell self-connections, reducing the number of parameters per cell by 25%. For timestep $t$, the input gate $i_t$ and forget gate $f_t$ have the relation $f_t = 1 - i_t$.
• The Federated Averaging algorithm is used on the server to combine client updates and produce a new global model. At training round $t$, a global model $w_t$ is sent to a subset $K$ of client devices. In the special case of $t = 0$, client devices start from the same global model, which has either been randomly initialized or pre-trained on proxy data. Each of the clients participating in a given round has a local dataset consisting of $n_k$ examples, where $k$ is an index of participating clients. $n_k$ varies from device to device.
Merits: A CIFG language model trained from scratch using federated learning can outperform an identical server-trained CIFG model and a baseline n-gram model on the keyboard next-word prediction task. To the authors' knowledge, this represents one of the first applications of federated language modeling in a commercial setting. Federated learning offers security and privacy advantages for users by training across a population of highly distributed computing devices while simultaneously improving language model quality.

7. Authors and year: Sourabh Ambulgekar, Sanket Malewadikar, Raju Garande and Dr. Bharti Joshi, 09 August 2021
Title: Next Words Prediction Using Recurrent Neural Networks
Methodology:
• Data pre-processing is used to create a flexible model that helps users find the next word, while learning the user's vocabulary in a fast and effective manner. The user provides 40 letters, which are passed to an LSTM neural network that predicts the next N letters.
• The cell state is similar to a conveyor belt. It runs straight down the whole chain, with only some minor linear interactions, so it is easy for data to flow along it unaltered. The LSTM can remove or add data to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let data through. They are made of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means "let nothing through", while a value of one means "let everything through". An LSTM has three of these gates to protect and control the cell state.
Merits: The authors train and try algorithms to find the one that best fits this task, and look to an LSTM for good accuracy, since the task is quite complex: it requires predicting the text the user is about to write. They created a 3D vector layer for input and a 2D vector layer for output, fed these through an LSTM layer having 128 hidden layers, and reached an accuracy of around 56% in 5 epochs.

8. Authors and year: Gend Lal Prajapati, Rekha Saha, 31 May 2019
Title: Relevance and enhanced entropy based Dempster Shafer approach for next word prediction using language model
Methodology:
• In a language model, the prediction of the next word is sometimes based on prediction reports from different document evidence. The words predicted by different evidence are denoted as w1, w2 and w3 in the frame of discernment (FOD), X = {w1, w2, w3}. A total of five documents predict the next word independently, and each prediction is characterized as a body of evidence (BOE). These BOEs are denoted as m1, m2, m3, m4 and m5 respectively.
• The set of next probable words from the various documents, called candidates, is given, and a decision about the target word is to be taken. According to most of the evidence, w1 is the next likely word. However, in the second BOE, w1 is assigned zero mass value and w2 is assigned the highest mass value, 0.90. This conflicts strongly with the rest of the BOEs, which creates confusion and prevents a decision on the next probable word.
• The algorithms used are DS Combination Evidence, which combines the evidence using Dempster's combination rule, and algorithm 2, REEDS, which predicts the next word from a given set of probable words with their mass values.
Merits: A novel method for multi-document data fusion, REEDS, is proposed. It predicts the next word given a group of words with their mass values from the various documents of the language model as the evidence. This is a hybrid methodology that integrates the relevance of the various evidence with EBE and Dempster's combination rule. The altered credibility is decided based on three factors. Firstly, UM is calculated using EBE.

9. Authors and year: Petra van Alphen, Susanne Brouwer, Nina Davids, Emma Dijkstra and Paula Fikkert, 4 Jun 2021
Title: Word Recognition and Word Prediction in Preschoolers With (a Suspicion of) a Developmental Language Disorder: Evidence From Eye Tracking
Methodology:
• Two groups of monolingual Dutch children took part in this study: a group of 68 toddlers with a suspicion of developmental language disorder (DLD) and a group of 36 toddlers with typical development (TD). For simplicity, the former is referred to as the DLD group and the latter as the TD group.
• Dutch sentences were devised containing a verb that predicted which of the two objects displayed on the screen would be named. Each predicting verb and target noun was used only once in the experiment. Two different sentence constructions were used, in which the target noun predicted by the verb was either the subject or the object of the sentence.
Merits: This study elaborated on previous studies by testing a clinically relevant sample of children with DLD at a young age on both online word recognition and prediction. Such data aid a better understanding of early problems in language processing, which may help to develop useful diagnostic tools and treatments for children with DLD. The results revealed that the groups demonstrated different online language processing trajectories. Most importantly, although children with DLD had similar online word recognition skills, they seem to have different online prediction abilities.

10. Authors and year: Henrique X. Goulart, Mauro D. L. Tosi, Daniel Soares-Gonçalves, Rodrigo F. Maia and Guilherme Wachs-Lopes, 2 Mar 2018
Title: Hybrid Model For Word Prediction Using Naive Bayes and Latent Information
Methodology:
• This paper proposes a hybrid word prediction model that performs inferences based on Naive Bayes and Latent Semantic Analysis (LSA) theories.
• The co-occurrence patterns that represent the Naive Bayes network can be stored in a set of graphs $G = (g_0, g_1, \ldots, g_{d-1})$, where nodes represent words and edges represent the number of times that each pair of words co-occurred in the same textual level. Each graph $g_i$ with $0 \le i \le n - 1$ represents the different connections of the words that appear in the text with $i$ words between them.
• Latent Semantic Analysis consists of building the relationship table. Firstly, it is necessary to choose the textual level to be used in the LSA. In this paper, only sentences with more than 4 non-stop-words are used. This measure was established to gather only the most semantically relevant phrases.
Merits: A hybrid model was developed to complete sentences using Naive Bayes and LSA models. The parameters $\lambda_1, \ldots, \lambda_{n-1}$ are proposed to optimize the inferences of Naive Bayes, and another parameter $\alpha$ to enhance their integration with the LSA. For every $\lambda$, experiments result in similar final values, which is strong evidence that the optimization technique is working. Furthermore, results show that the most relevant history information for the suggestion is the previous and posterior neighboring words.
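
The n-gram count formula quoted in entry 4, $g_s = w - (m - 1)$, and the conditional probability $P(x_i \mid x_{i-1})$ for the bi-gram case can be illustrated with a short Python sketch. This is only an illustration of the idea as summarized in the survey, not code from the surveyed paper; the function names `ngrams`, `bigram_model` and `predict_next` and the toy corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def ngrams(sentence: str, m: int) -> List[Tuple[str, ...]]:
    """Return the contiguous m-word sequences of a sentence."""
    words = sentence.split()
    return [tuple(words[i:i + m]) for i in range(len(words) - m + 1)]


def bigram_model(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for each word, which words follow it (the m = 2 case)."""
    following: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for first, second in ngrams(sentence, 2):
            following[first][second] += 1
    return following


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Most frequent follower of `word`, i.e. the argmax of P(x_i | x_{i-1})."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


# Toy corpus (an assumption for the example, not data from the paper).
sentence = "the cat sat on the mat"
w = len(sentence.split())
for m in (1, 2, 3):
    # The number of n-grams matches g_s = w - (m - 1).
    assert len(ngrams(sentence, m)) == w - (m - 1)

model = bigram_model(["the cat sat", "the cat ran", "the dog sat"])
print(predict_next(model, "the"))
```

Unlike the trie used in this project, which completes the word currently being typed, a model of this kind suggests the word that follows a completed one.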
Methodology

To understand the implementation of this project, we must first understand the Trie data
structure on which it is built.

The term "Trie" comes from the word "retrieval". It is a sorted, tree-based data structure for storing a
collection of strings. Each node has as many child pointers as there are characters in the alphabet. To
look up a word in the dictionary, the trie follows the word's prefix letter by letter. Because we assume
that strings are made up of the letters 'a' to 'z', each node of the trie has at most 26 child pointers.

The digital tree, or prefix tree, is another name for Trie. The key with which a node is associated
is determined by its position in the Trie.

Now let us see some basic properties of the trie data structure:
• The root node of the trie is always an empty (null) node.
• The child nodes of each node are sorted alphabetically.
• As noted earlier, each node can have a maximum of 26 child nodes.
• Every node apart from the root node stores one letter of the alphabet.
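
The properties above can be pictured with a small Python sketch of a single node. This is a minimal illustration under the 'a' to 'z' assumption stated in the text, not the project's own code (which is in JavaScript and is not reproduced in this report); the names `TrieNode`, `child_index` and `ALPHABET_SIZE` are hypothetical.

```python
from typing import List, Optional

ALPHABET_SIZE = 26  # lowercase 'a'..'z', as assumed in the text


class TrieNode:
    """One trie node: a slot per letter plus an end-of-word flag."""

    def __init__(self) -> None:
        # children[i] is the child for letter chr(ord('a') + i), or None.
        self.children: List[Optional["TrieNode"]] = [None] * ALPHABET_SIZE
        self.end_of_word = False


def child_index(letter: str) -> int:
    """Map 'a'..'z' to 0..25; the slot order keeps children alphabetical."""
    index = ord(letter) - ord("a")
    if not 0 <= index < ALPHABET_SIZE:
        raise ValueError(f"unsupported character: {letter!r}")
    return index


root = TrieNode()                              # the root stores no letter
root.children[child_index("h")] = TrieNode()   # first letter of "help"
```

With a fixed array, the letter is implied by the slot position, so children are always in alphabetical order and a node never has more than 26 of them.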
A Trie has the same three basic operations as any other tree-based data structure: insertion of a node,
searching for a node, and deletion of a node. Searching for a node is at the heart of our
project. However, the search operation is easier to understand once the insertion process is clear.
For this reason, this section covers both insertion and searching of a node.

The first step is to create a new node in the trie. It is important to understand the following
points before we begin the implementation:

● Every letter of the input key (word) is inserted as a separate Trie node.
The children of a node point to the next level of Trie nodes.
● Each character of the key acts as an index into the array of child nodes.
● If the current node already has a reference to the current letter, set the current node to that
referenced node. Otherwise, create a new node, set its letter to the
current letter, and set the current node to this new node.
● The length of the key determines the depth of the trie.

Given below is the pseudo code for insertion of a node

A node in this data structure has two main components.
• map<character, TrieNode>: establishes the parent-child relationship.
• Boolean endOfWord: marks the end of a string when a sequence of
characters is added to the trie.
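
The insertion steps and the two node components described above can be sketched in Python as follows. This is a hedged illustration rather than the project's implementation: the dictionary `children` stands in for `map<character, TrieNode>`, `end_of_word` for `endOfWord`, and lowercasing the input is an assumption made so that "Hell" and "hell" share one path.

```python
from typing import Dict


class TrieNode:
    """A node with the two components named above."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # map<character, TrieNode>
        self.end_of_word = False                   # Boolean endOfWord


def insert(root: TrieNode, word: str) -> None:
    """Add `word` to the trie, one node per letter."""
    node = root
    for letter in word.lower():  # assumption: matching is case-insensitive
        if letter not in node.children:
            # No reference to this letter yet: create a new node for it.
            node.children[letter] = TrieNode()
        # Move down to the node that represents this letter.
        node = node.children[letter]
    node.end_of_word = True  # mark the end of the inserted string


root = TrieNode()
for word in ("Hell", "Help"):
    insert(root, word)
```

After these two insertions the path h, e, l is shared, and the second "l" and the "p" hang off the same node as siblings.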
Next, let us look at searching for a node, which is very similar to the insert operation. The
implementation of the search operation is shown below.
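
A minimal Python sketch of the search operation, under the same assumptions as before (dictionary-based nodes, case-insensitive matching), might look like this. The helper `build` is hypothetical and exists only to make the example self-contained; it is not the project's code.

```python
from typing import Dict, Iterable


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.end_of_word = False


def build(words: Iterable[str]) -> TrieNode:
    """Hypothetical helper: build a trie from an iterable of words."""
    root = TrieNode()
    for word in words:
        node = root
        for letter in word.lower():
            node = node.children.setdefault(letter, TrieNode())
        node.end_of_word = True
    return root


def search(root: TrieNode, word: str) -> bool:
    """Return True only if `word` was inserted as a complete word."""
    node = root
    for letter in word.lower():
        node = node.children.get(letter)
        if node is None:
            return False  # the path breaks: the word is not stored
    return node.end_of_word  # a bare prefix such as "hel" is not a word


root = build(["Hell", "Hello", "Help"])
print(search(root, "Help"), search(root, "Hel"), search(root, "Helpi"))
# prints: True False False
```

The loop is the same walk as in insertion; the only difference is that a missing child ends the search instead of creating a node.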

Now let us look at what actually happens when we give input to this trie-based autocomplete
program. Suppose that "Hell", "Hello", "Help", "Helps" and "Hellish" are the five
words in our dictionary file that start with "He".

When we type "he", the program suggests all of these words, but when we type "helping", the
search terminates at "p", because that node has no child for the next letter "i" in the trie that stores these words.
Shown below is how the program tries to predict the words, using the part of the trie that
stores these words.
Next, let us see how the word "Help" is searched.

"Help" is processed by starting from the root node and comparing one letter at each node while moving down.
The traversal can be viewed as H->E->L->P, and the search terminates when the end of the
string is reached.

If the string "Helpi" is given as input, the search terminates at "p". The program gives no
result and asks whether we wish to add this word to the dictionary, so the user has
the ability to add new words to the pre-existing dictionary.
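
The behaviour described in this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the project's trie.js: the class `Trie` and its methods `insert` and `suggest` are hypothetical names, input is lowercased, and the "add this word" prompt is represented by a plain call to `insert`.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.end_of_word = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for letter in word.lower():
            node = node.children.setdefault(letter, TrieNode())
        node.end_of_word = True

    def suggest(self, prefix: str) -> List[str]:
        """Return every stored word starting with `prefix`, alphabetically."""
        prefix = prefix.lower()
        node = self.root
        for letter in prefix:          # walk down, e.g. h -> e -> l -> p
            node = node.children.get(letter)
            if node is None:
                return []              # search stops: nothing to suggest
        results: List[str] = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node: TrieNode, path: str, results: List[str]) -> None:
        if node.end_of_word:
            results.append(path)
        for letter in sorted(node.children):   # alphabetical child order
            self._collect(node.children[letter], path + letter, results)


trie = Trie()
for word in ("Hell", "Hello", "Help", "Helps", "Hellish"):
    trie.insert(word)

print(trie.suggest("he"))      # all five words
print(trie.suggest("helpi"))   # [] -> the program would offer to add the word
trie.insert("helping")         # the user accepts
print(trie.suggest("helpi"))   # ['helping']
```

Typing "he" reaches the node for "e" and collects everything beneath it, while "helpi" fails at the node for "p" until the new word has been added.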
Implementation

This is the JavaScript file "trie.js", which searches for words using the trie data
structure.

This file holds the word dictionary that the trie.js program uses to predict words. The
dictionary has over 50,000 words.
Generating test cases using http://generatortestcase.herokuapp.com/

This is the basic interface of our project

The snap below shows all the words predicted when the user gives "app" as the input.
The user can click on the word they require.
When the user gives "bac" as input

When the user gives "wa" as input

Our previous model was a simple C++ program with no graphical user interface; the console
output was the only mode of interaction between user and program.

This is the output when autocomplete was selected and "app" was given as the input.
If a word that is not present in the dictionary was typed, the system asked whether the user
wished to add the word to the dictionary.
Conclusion

Word predictors have applications in messaging applications such as WhatsApp, web search engines,
word processors, command-line interpreters, etc. The original purpose of word prediction software
was to help people with physical disabilities increase their typing speed [1] and
decrease the number of keystrokes needed to complete a word or a sentence. On this
front, we developed our own word predictor using the Trie data structure, which
increases the efficiency of the user.

A trie is also better suited to predicting words than a hash table, another popular technique used
for word lookup. We can insert and find strings in O(L) time, where L is
the length of a single word, independent of the number of words stored. A hash table also needs O(L)
work to hash a key, but it cannot list the words that share a prefix without examining every stored key.

A trie also has practical advantages over hashing because of the way it is implemented. We do not need
to compute any hash function, and no collision handling is required (as in open addressing and separate
chaining). Another advantage of a Trie is that we can easily print all words in alphabetical order, which
is not easily possible with hashing.
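
The difference between the two structures for prefix queries can be seen in a small Python sketch. It is an illustration only, not a benchmark or the project's code: a Python `set` stands in for the hash table, the trie is stored as nested dictionaries, and the `"$"` end-of-word marker and the function names are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def suggest_with_set(words: Set[str], prefix: str) -> List[str]:
    """Hash-based storage: every stored word must be checked, then sorted."""
    return sorted(word for word in words if word.startswith(prefix))


def build_trie(words: Iterable[str]) -> Dict:
    """Nested-dict trie; the key "$" marks the end of a word (an assumption)."""
    root: Dict = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root


def suggest_with_trie(root: Dict, prefix: str) -> List[str]:
    """Trie storage: walk the prefix once, then visit only matching words."""
    node = root
    for letter in prefix:
        if letter not in node:
            return []
        node = node[letter]
    results: List[str] = []
    stack = [(node, prefix)]
    while stack:
        current, path = stack.pop()
        if "$" in current:
            results.append(path)
        # Push children in reverse order so they are popped alphabetically.
        for letter in sorted((k for k in current if k != "$"), reverse=True):
            stack.append((current[letter], path + letter))
    return results


words = {"hell", "hello", "help", "helps", "hellish", "bat", "ball"}
trie = build_trie(words)
print(suggest_with_set(words, "hel"))
print(suggest_with_trie(trie, "hel"))
```

Both calls return the same alphabetical list, but the set version looks at all seven words and sorts the matches, while the trie version never touches "bat" or "ball" and produces the order directly from the traversal.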

The graph comparing time complexities shows that the trie is an apt data structure for
any word prediction or autocomplete system.
References

[1] Haque, S., Bansal, A., Wu, L. and McMillan, C., 2021, March. Action word prediction for
neural source code summarization. In 2021 IEEE International Conference on Software
Analysis, Evolution and Reengineering (SANER) (pp. 330-341). IEEE.

[2] Jordan, M., Neto, G.N.N., Brito Jr, A. and Nohama, P., 2020. Virtual keyboard with the
prediction of words for children with cerebral palsy. Computer methods and programs in
biomedicine, 192, p.105402.

[3] Siddiqui, M.F. and Hassan, M., 2018. Effective word prediction in Urdu language using
stochastic model. Sukkur IBA Journal of Computing and Mathematical Sciences, 2(2), pp.38-46.

[4] Saboor, A., Benda, M., Gembler, F. and Volosyak, I., 2019, June. Word Prediction Support
Model for SSVEP-Based BCI Web Speller. In International Work-Conference on Artificial
Neural Networks (pp. 430-441). Springer, Cham.

[5] Weng, R., Huang, S., Zheng, Z., Dai, X. and Chen, J., 2017. Neural machine translation with
word predictions. arXiv preprint arXiv:1708.01771.

[6] Hard, A., Rao, K., Mathews, R., Ramaswamy, S., Beaufays, F., Augenstein, S., Eichner, H.,
Kiddon, C. and Ramage, D., 2018. Federated learning for mobile keyboard prediction.
arXiv preprint arXiv:1811.03604.

[7] Ambulgekar, S., Malewadikar, S., Garande, R. and Joshi, B., 2021. Next Words Prediction
Using Recurrent Neural Networks. In ITM Web of Conferences (Vol. 40, p. 03034). EDP
Sciences.

[8] Prajapati, G.L. and Saha, R., 2019. REEDS: Relevance and enhanced entropy based
Dempster Shafer approach for next word prediction using language model. Journal of
Computational Science, 35, pp.1-11.

[9] van Alphen, P., Brouwer, S., Davids, N., Dijkstra, E. and Fikkert, P., 2021. Word
Recognition and Word Prediction in Preschoolers With (a Suspicion of) a Developmental
Language Disorder: Evidence From Eye Tracking. Journal of Speech, Language, and Hearing
Research, 64(6), pp.2005-2021.

[10] Goulart, H.X., Tosi, M.D., Gonçalves, D.S., Maia, R.F. and Wachs-Lopes, G.A., 2018.
Hybrid model for word prediction using Naive Bayes and latent information. arXiv preprint
arXiv:1803.00985.

Report Title: Project Report

Report Link: https://www.check-plagiarism.com/plag-report/523233a7af052c0c285fafb718fa09b2f54721639064344
(Use this link to send report to anyone)

Report Generated Date: 09 December, 2021

Total Words: 690

Total Characters: 4091

Keywords/Total Words Ratio: 0%

Excluded URL: No
Unique: 93%

Matched: 7%

Match Urls:
0: https://www.semanticscholar.org/paper/Recurrent-Networks-%3A-Learning-Algorithms-%E2%88%97-Doya/f7c18586199997d8521a51047b636eedcc83be60
1: https://www.imdb.com/title/tt0640956/trivia
2: https://docs.microsoft.com/en-us/ef/ef6/fundamentals/configuring/config-file
3: https://dictionary.cambridge.org/dictionary/english/summarized
4: https://www.powerthesaurus.org/different/synonyms/
Keywords Density

One Word          | 2 Words                  | 3 Words
word 11.44%       | word prediction 2.93%    | action word prediction 0.88%
predict 4.99%     | code summarization 1.76% | code summarization tools 0.59%
prediction 3.52%  | action word 1.47%        | present century live 0.29%
words 3.23%       | arrow key 0.59%          | baseline applicable challenge 0.29%
code 2.64%        | desired word 0.59%       | function summarized graph2seq 0.29%

Plagiarism Report
By check-plagiarism.com

Sentence
Authors and year: Neto, Alceu Brito Jr, Percy Nohama, 17 February 2020
Methodology:
· Grammatical features of the word are loaded during typing time. The probability of classes occupying the next
position is set according to their sequence position.
· However, because the partial deletion of a word may occur, the grammatical features may have to be retrieved again.
This retrieval requires storing the possible paths from that sequence position onwards.
Merits: Both the number of clicks saved and the accuracy in predicting the desired words to be entered in a text were
considered satisfactory. The whole system supporting the prediction technique is available on the web, and users have
unrestricted and free access to it. The linguistic limitation observed with the use of a corpus is solved by opening
the ASCII text file.

3. Authors and year: Muhammad Hassan∗, Muhammad Saeed∗, Ali Nawaz∗, Kamran Ahsan†, Sehar Jabeen∗, Farhan Ahmed
Siddiqui∗, Khawar Islam, 2nd July 2018
Title: Effective Word Prediction in Urdu Language Using Stochastic Model
Methodology:
· The methodology is clear and straightforward: intelligent word suggestions are built using artificial intelligence,
so that as the user types a character, a list of suggestions to complete the word becomes available to the user.
· The study ended up with an optimized version of the algorithm that fulfils the need for output with the best
utilization, with artificial intelligence also working behind the screen to make the suggestions even better, so that
the user of the front-end program gets a more professional experience with the application.
Merits: A pioneering piece of research on an auto-suggestor and predictor of words (Get Your Word Before Your Typing
to Increase The Typing Speed). (N-1) logic has been used together with artificial intelligence, different algorithms
and searching approaches to increase the typing speed by providing better auto-suggestion. Although the research and
experiments did not contain a huge amount of data, the outcomes met state-of-the-art work. It has been noticed that
the quality and quantity of the words dictionary had a significant impact on predicting the list of suggestions and on
its precision.

4. Authors and year: Abdul Saboor, Mihaly Benda, Felix Gembler and Ivan Volosyak, 16 May 2019
Title: Word Prediction Support Model for SSVEP-Based BCI Web Speller
Methodology:
· The computational linguistic model called n-gram was used for word prediction based on the text database.
· The n-gram model consists of a contiguous sequence of n items from a given text. Thus, an n-gram is a set of
co-occurring words within a given window, and n-grams can be classified as uni-gram (size 1), bi-gram (size 2),
tri-gram (size 3), etc.
· In equation (2) of the paper, m is the gram number (m = 1 for uni-gram, m = 2 for bi-gram, and so on) and w is the
number of words in a sentence.
· The n-gram model predicts $x_i$ based on $x_{i-(n-1)}, \ldots, x_{i-1}$ (equation (3) of the paper). In terms of
probability, this is $P(x_i \mid x_{i-(n-1)}, \ldots, x_{i-1})$.
Merits: The proposed word prediction support model can significantly increase the ITR and accuracy of the SSVEP-based
BCI web speller. The CSS-based animation can provide better stimulation and straightforward configuration options.
During the execution of the speller, AJAX calls were made to the Java Servlet and Java Beans, which provided the
glossary from the database in JSON format. These calls proved able to produce word predictions in time. Therefore, the
online speller proposed in this study can be extended to a web-based email interface or to writing a blog, both of
which require a user to type text within the web browser.

Authors and year: Dai and Jiajun Chen, 5 Aug 2017
should at least contain information of each word in the target sentence.
• Word Predictions for Decoder's Hidden States: A similar intuition is applied to the decoder. Because the hidden
states of the decoder are responsible for the translation of target words, they should be able to predict the target
words as well. However, due to the large number of parameters and the relatively small training data set, the
end-to-end learning of an NMT model may not be able to learn the best solution. The authors argue that at least part
of the problem is caused by the long error backpropagation pipeline of the recurrent structures over multiple time
steps, which provides no direct control of the information carried by the hidden states in both the encoder and
decoder.
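
As a small worked illustration of the n-gram formulation summarized for the SSVEP speller paper above: with n = 2, the conditional probability is commonly estimated from corpus counts. The count notation C(·) and the toy sentence "good morning good night good morning all" are assumptions introduced here for illustration; they are not taken from the surveyed paper.

```latex
P(x_i \mid x_{i-1}) \approx \frac{C(x_{i-1}\,x_i)}{C(x_{i-1})},
\qquad
P(\text{morning} \mid \text{good}) \approx \frac{C(\text{good morning})}{C(\text{good})} = \frac{2}{3}
```

In the toy sentence, "good" occurs three times as a first word of a pair and is followed by "morning" twice, so "morning" would be ranked ahead of "night" after "good".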

Report Title: Project Report 2

Report Link: https://www.check-plagiarism.com/plag-report/5232352c3ac91979efac3794563a4d43ce5dd1639064823
(Use this link to send report to anyone)

Report Generated Date: 09 December, 2021

Total Words: 759

Total Characters: 4461

Keywords/Total Words Ratio: 0%


Excluded URL: No

Unique: 98%

Matched: 2%

Match Urls:
0: https://medium.com/double-pointer/system-design-interview-autocomplete-type-ahead-system-for-a-search-box-1ac968f9f121
1: https://www.mckinsey.com/~/media/McKinsey/Industries/Public%20and%20Social%20Sector/Our%20Insights/The%20age%20of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.pdf
Keywords Density

One Word          | 2 Words                       | 3 Words
word 4.87%        | word prediction 1.54%         | prediction support model 0.51%
gram 3.33%        | decoder 1.03%                 | ssvep based bci 0.51%
predict 3.08%     | artificial intelligence 0.77% | neto alceu brito 0.26%
prediction 2.05%  | gram size 0.77%               | gram equal number 0.26%
model 1.79%       | hidden states 0.77%           | ajax calls call 0.26%

CSE3001 Software Engineering

REVIEW 1 DOCUMENT

WORD PREDICTION
USING TRIE DATA
STRUCTURE
PREPARED BY

Drashti Patel [19BCE0602]


Devanshi Chaudhary [19BCE2614]
Prince Kumar[19BCI0002]
1. ABSTRACT:

In today's busy world, time is scarce, and technology is being developed every day to increase efficiency.
A word predictor is a small step in that direction that can increase our efficiency many times over.
Word predictors have applications in areas such as texting and search engines. To develop our word
predictor program, we used the Trie data structure. Our program uses a stored file of words to predict the
words the user may be thinking of, which saves typing effort.

PROCESS MODEL IDENTIFIER - EVOLUTIONARY MODEL

We prefer this model over other process models because, in this project, we want to learn the needs and
requirements of the user at every step and make the software more efficient, time-saving and user-friendly.

2. INTRODUCTION:

Autocomplete speeds up human-computer interactions when it correctly predicts the word a user intends
to enter after only a few characters have been typed into a text input field. It works best in domains with a
limited number of possible words (such as in command line interpreters), when some words are much
more common (such as when addressing an email), or writing structured and predictable text (as in
source code editors). Many autocomplete algorithms learn new words after the user has written them a
few times, and can suggest alternatives based on the learned habits of the individual user.

3. BASIC PRINCIPLE BEHIND WORKING OF AUTOCOMPLETE

Autocomplete, or word completion, works as follows: when the writer types the first letter or letters of a word,
the program predicts one or more possible words as choices. If the intended word is included in the list,
the writer can select it, for example by using the number keys. If the word that the user wants is not
predicted, the writer must enter the next letter of the word. The word choices are then updated so that
the words offered begin with the letters entered so far. When the word that the user wants appears, it is
selected and inserted into the text.
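
The prefix-narrowing behaviour described above can be sketched with a trie. This is a minimal illustration under stated assumptions, not the project's actual code: the names `TrieNode`, `Trie`, `insert` and `complete` are hypothetical, and words are assumed to be plain character strings.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; the report does not show its implementation.
struct TrieNode {
    // Ordered map, so completions come out in alphabetical order.
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWord = false;  // true if a dictionary word ends at this node
};

class Trie {
public:
    // Add one word, creating nodes for letters that are not present yet.
    void insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->isWord = true;
    }

    // Return every stored word that begins with `prefix`.
    std::vector<std::string> complete(const std::string& prefix) const {
        const TrieNode* node = &root_;
        for (char c : prefix) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};  // no word has this prefix
            node = it->second.get();
        }
        std::vector<std::string> out;
        std::string buffer = prefix;
        collect(node, buffer, out);
        return out;
    }

private:
    // Depth-first walk below `node`, recording each complete word.
    static void collect(const TrieNode* node, std::string& buffer,
                        std::vector<std::string>& out) {
        if (node->isWord) out.push_back(buffer);
        for (const auto& entry : node->children) {
            buffer.push_back(entry.first);
            collect(entry.second.get(), buffer, out);
            buffer.pop_back();
        }
    }

    TrieNode root_;
};
```

Typing one more letter corresponds to calling `complete` again with the longer prefix, which walks one level deeper and returns a smaller candidate list.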

In another form of word prediction, the words most likely to follow the one just written are predicted,
based on recently used word pairs. Word prediction uses language modeling, in which the words most
likely to occur within a set vocabulary are calculated. Along with language modeling, basic
word prediction on AAC devices is often coupled with a recency model, where words that are used
more frequently by the AAC user are more likely to be predicted. Word prediction software often
also allows the user to enter their own words into the word prediction dictionaries, either directly or
by "learning" words that have been written.
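
Prediction from word pairs, as described above, can be sketched by counting which words follow which. The class `BigramModel` and its methods are hypothetical names used only for illustration; the sketch ranks followers by how often the pair was seen and does not model recency.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts how often each word follows another.
class BigramModel {
public:
    // Read whitespace-separated words and count every adjacent pair.
    void train(const std::string& text) {
        std::istringstream in(text);
        std::string prev, word;
        if (!(in >> prev)) return;  // empty text: nothing to learn
        while (in >> word) {
            ++counts_[prev][word];
            prev = word;
        }
    }

    // Up to `limit` followers of `prev`, most frequent first.
    // Ties keep alphabetical order because the inner map is sorted
    // and stable_sort preserves that order for equal counts.
    std::vector<std::string> predictNext(const std::string& prev,
                                         std::size_t limit) const {
        auto it = counts_.find(prev);
        if (it == counts_.end()) return {};
        std::vector<std::pair<std::string, int>> ranked(it->second.begin(),
                                                        it->second.end());
        std::stable_sort(ranked.begin(), ranked.end(),
                         [](const std::pair<std::string, int>& a,
                            const std::pair<std::string, int>& b) {
                             return a.second > b.second;
                         });
        std::vector<std::string> out;
        for (std::size_t i = 0; i < ranked.size() && i < limit; ++i) {
            out.push_back(ranked[i].first);
        }
        return out;
    }

private:
    std::map<std::string, std::map<std::string, int>> counts_;
};
```

"Learning" words the user has written would then amount to calling `train` on newly typed text, so that later predictions reflect the user's own habits.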

4. PROCESS MODEL IDENTIFIER - EVOLUTIONARY MODEL


We prefer this model over other process models because, in this project, we want to learn the needs and
requirements of the user at every step and make the software more efficient, time-saving and user-friendly.

ALGORITHM USED : Trie, with a training data set and a prediction algorithm.

TOOLS USED : draw.io and Visual Paradigm Online.


Software Requirements
Specification

for

WORD PREDICTION
USING TRIE DATA STRUCTURE

Version 1.0

Prepared by

Drashti Patel [19BCE0602][LEADER]


Research works, paper readings, project idea with base paper.

Devanshi Chaudhary [19BCE2614]


Documentation, gantt chart, and necessary diagrams.

Prince Kumar[19BCI0002]
Code implementation and testing.

Table of Contents
Table of Contents
Revision History
1. Introduction
    1.1 Purpose
    1.2 Document Conventions
    1.3 Intended Audience and Reading Suggestions
    1.4 Product Scope
    1.5 References
2. Overall Description
    2.1 Product Perspective
    2.2 Product Functions
    2.3 User Classes and Characteristics
    2.4 Operating Environment
    2.5 Design and Implementation Constraints
    2.6 User Documentation
    2.7 Assumptions and Dependencies
3. External Interface Requirements
    3.1 User Interfaces
    3.2 Hardware Interfaces
    3.3 Software Interfaces
    3.4 Communications Interfaces
4. System Features
    4.1 System Feature 1
    4.2 System Feature 2 (and so on)
5. Other Nonfunctional Requirements
    5.1 Performance Requirements
    5.2 Safety Requirements
    5.3 Security Requirements
    5.4 Software Quality Attributes
    5.5 Business Rules
6. Other Requirements
Appendix A: Glossary
Appendix B: Analysis Models
Appendix C: To Be Determined List

Revision History
Name                                       | Date     | Reason For Changes | Version
Word prediction using trie data structure  | 08-09-21 | NA                 | Version 1
1. Introduction

1.1 Purpose
This document specifies the requirements of software that predicts a word based on the letters typed
so far. We will use these requirements to guide the implementation of our idea, and they tell us
which resources we need in order to implement word prediction. For example, before
starting work on the project we need knowledge of a programming language, so that
becomes one of our prerequisites, along with several others.

While working on the project we will need many resources, which we can prepare ahead of
time. This prevents a last-minute rush and the risk of missing the deadline. For
example, we will need a dictionary file throughout the coding phase, so we can have
the dictionary file ready before coding begins. Even after the project is completed, we
must make sure that new words are added to the dictionary file, so we have to keep
updating it.

Word prediction allows users to complete a word based on the input they have given so far. They can
also add a new word if it does not already exist in the dictionary.

Autocomplete speeds up human-computer interactions when it correctly predicts the word a
user intends to enter after only a few characters have been typed into a text input field. It
works best in domains with a limited number of possible words (such as in command line
interpreters), when some words are much more common (such as when addressing an email),
or when writing structured and predictable text (as in source code editors).
Many autocomplete algorithms learn new words after the user has written them a few times, and can
suggest alternatives based on the learned habits of the individual user.

1.2 Document Conventions


This document follows the standard IEEE format for the SRS document.

1.3 Intended Audience and Reading Suggestions


Nowadays, people in every field need to type on computers, mobile phones or other electronic
devices. In a hurry, they may also make spelling mistakes. A word predictor can therefore
help working professionals and students alike: it makes their work easier, reduces
errors and saves time.

Readers can gain the necessary background from the references listed in section 1.5 and should then
follow the order of this SRS document for a better understanding of the project.

1.4 Product Scope


Word predictors have applications in messaging applications like WhatsApp, web search engines, word
processors, command line interpreters, etc. The original purpose of word prediction software
was to help people with physical disabilities increase their typing speed and
decrease the number of keystrokes needed to complete a word or a sentence. To this end,
we developed our own word predictor program using the trie data structure, which
increases the efficiency of the user by at least 10%.
Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a
user is typing. In graphical user interfaces, users can typically press the tab key to accept a
suggestion or the down arrow key to accept one of several.

1.5 References
[1] Trie data structure
https://www.javatpoint.com/trie-data-structure

[2] Auto-complete using trie
https://www.geeksforgeeks.org/auto-complete-feature-using-trie/

[3] An improved Bayesian TRIE based model for SMS text normalization
https://www.researchgate.net/publication/343441896_An_improved_Bayesian_TRIE_based_model_for_SMS_text_normalization

2. Overall Description

2.1 Product Perspective


In today's busy world, time is scarce, and technology is being developed every day to increase
efficiency. A word predictor is a small step in that direction that can increase our efficiency many
times over.
Word predictors have applications in areas such as texting and search engines. To develop our
word predictor program, we used the Trie data structure. Our program uses a stored file of words
to predict the words the user may be thinking of, which saves typing effort.

2.2 Product Functions


Autocomplete, or word completion, works as follows: when the writer types the first letter or letters of
a word, the program predicts one or more possible words as choices. If the intended word
is included in the list, the writer can select it, for example by using the number keys.
If the word that the user wants is not predicted, the writer must enter the next letter of the word. The
word choices are then updated so that the words offered begin with the letters entered so far.
When the word that the user wants appears, it is selected and
inserted into the text.

In another form of word prediction, the words most likely to follow the one just written are predicted,
based on recently used word pairs. Word prediction uses language modeling, in which the words most
likely to occur within a set vocabulary are calculated. Along with language modeling, basic
word prediction on AAC devices is often coupled with a recency model, where words that are used
more frequently by the AAC user are more likely to be predicted. Word prediction software often
also allows the user to enter their own words into the word prediction dictionaries, either directly
or by "learning" words that have been written.

2.3 User Classes and Characteristics


The user class details will be updated in the next version.

The objectives of this project are:

To understand the dynamic tree data structure used in developing the program.
To understand the ‘trie’ data structure used in the program.

To construct a robust and efficient algorithm for the program that is easy to modify and can
later be used as a module in a larger software system.

To develop a real-time program that is efficient, processes input quickly and has an industrial
application.

2.4 Operating Environment


The basic requirements are listed below:
➢ Any recent operating system, such as Windows 10, with sufficient memory.
➢ A C++ compiler for running the code, for example through the Code::Blocks IDE.
➢ An external dictionary file, which the program accesses throughout its
execution.
2.5 Design and Implementation Constraints
➢ Hardware limitations - timing requirements: the program should be executed and input should be
given before the program halts; memory requirements: a temporary buffer of the input
characters is kept to predict the word, for which memory will be required.
➢ Interfaces - the execution window of the program acts as the interface between user and system:
the user provides a word as input, and the system predicts the word and gives it as output.
➢ Specific technologies and tools - software is required to execute the word prediction program.
➢ Databases to be used - a trained data set, in the form of a dictionary, is used to predict
the word.
➢ Parallel operations - the user is asked for the next operation to be performed, and the trained data
set is checked after every input word.
➢ Language requirements - the user should know English, because input words are given in
English.
➢ Programming standards - the user is expected to execute the developed program and to
give input in an acceptable format.

2.6 User Documentation


Will be updated in further versions.

2.7 Assumptions and Dependencies


1. The user has basic knowledge of computers.
2. The user has sufficient knowledge of English, since the system interface will be in
English.
3. The system is able to load the program and the trained data set.
4. The system has a working display to show the output.

Gantt chart:
Pert Chart:

WBS:
Timeline Chart:

3. External Interface Requirements

3.1 User Interfaces
Users can run the software on any electronic device, such as a laptop, mobile phone or desktop computer.
The device's input method is used to give input to the software; when the user presses Enter, the
software shows on the screen the words that match the letters entered so far.

3.2 Hardware Interfaces
The laptop or mobile device that the user works on is the required hardware interface.
3.3 Software Interfaces
Our program uses an external dictionary file, which is accessed throughout the program.
In the file, words are stored in title case without any space between two words. Each capital
letter marks the beginning of a new word, and the next capital letter marks the end of that word.

3.4 Communications Interfaces
Will be updated in future versions.
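
The dictionary-file layout described under Software Interfaces (title-case words with no separators) can be read with a short splitting routine. This is a sketch under assumptions: the function name `splitTitleCase` is hypothetical, the file contents are assumed to be already loaded into a string, and non-letter characters such as newlines are assumed to be ignorable.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split a run of title-case words, e.g. "AppleBananaCat",
// into {"Apple", "Banana", "Cat"}.
std::vector<std::string> splitTitleCase(const std::string& data) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : data) {
        unsigned char c = static_cast<unsigned char>(ch);
        // A capital letter closes the word collected so far.
        if (std::isupper(c) && !current.empty()) {
            words.push_back(current);
            current.clear();
        }
        // Keep letters only; newlines or stray symbols are skipped.
        if (std::isalpha(c)) current.push_back(ch);
    }
    if (!current.empty()) words.push_back(current);  // last word in the file
    return words;
}
```

Each word returned this way could then be inserted into the trie when the program starts.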

4. System Features
4.1 System Feature 1: our basic text predictor.
4.1.1 Description and Priority
Autocomplete or word completion works so that when the writer writes the first
letter or letters of a word, the program predicts one or more possible words as
choices. If the word the writer intends to write is included in the list, they can select it, for
example by using the number keys. If the word that the user wants is not predicted,
the writer must enter the next letter of the word. At this time, the word choices are
altered so that the words provided begin with the same letters as those that have been
typed. When the word that the user wants appears, it is selected, and the word is
inserted into the text.
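
The narrowing behaviour described above can be illustrated with a small sketch. It uses a plain linear scan over a word list rather than the trie, purely to show how the candidate list shrinks as more letters are typed; `candidates` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the dictionary words that start with the typed prefix.
std::vector<std::string> candidates(const std::vector<std::string> &dictionary,
                                    const std::string &prefix)
{
    std::vector<std::string> matches;
    for (const std::string &word : dictionary)
    {
        // compare() == 0 means the first prefix.size() characters are equal.
        if (word.size() >= prefix.size() &&
            word.compare(0, prefix.size(), prefix) == 0)
            matches.push_back(word);
    }
    return matches;
}
```

With the words apple, apply, apt and banana, the prefix "ap" leaves three candidates, "app" leaves two, and "apple" leaves one.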
4.1.2 Stimulus/Response Sequences
This section will be updated in future versions.
4.1.3 Functional Requirements
Our program is the main functional requirement. It will be written in a
programming language such as C or C++.

The system shall constantly access the dictionary file. The user will give input in
the form of a word (partial or full). Words matching the letters typed so far will be
displayed on the screen.

If there are no matching words, the system will ask whether the user wants to add the
word to the dictionary.
After every word prediction cycle completes, the system asks whether the user wants to
enter more words. If yes, another cycle starts; otherwise the system halts.
5. Other Nonfunctional Requirements

5.1 Performance Requirements

The system should be interactive to users.


• The interface is simple and easy to use.
• System is user friendly, self-explanatory and also it is provided with a help guide.
• This system can be used by everyone.
• Speed: The system should be made as fast as possible to reduce response time.
• Throughput: The throughput should be as high as possible. We should be able to attain
maximum output in minimum time.
• Resource Utilization: Resources are modified according to user requirements.

5.2. Safety Requirements

➢ The system has to be secure from attacks.


➢ The system should be tough and not prone to breakdowns and in case of breakdown
should be stabilized soon.
➢ The administrators maintain the system as per the maintenance contract.

5.3. Security Requirements

➢ The system shall not try to use the essential data of users for any justified or unjustified
reasons.

5.4. Software Quality Attributes

Reliability:

• The system cannot be relied upon completely but we have to try to attain maximum
reliability.
• Reliability will also be higher since we try to attain maximum accuracy.
• Maintain proper and updated dictionary files to improve reliability.

Accuracy

• The information provided in the dictionary files and by the user should be correct.
• Minimize the errors.
• All operations will be done correctly to increase the level of accuracy.
5.5 Business Rules
This field is not necessary for this project.

6. Other Requirements

Appendix A: Glossary
Trie data structure: a tree-based data structure in which each node represents one character, so that
words sharing a prefix share a path from the root.

Appendix B: Analysis Models

1. DATA FLOW DIAGRAM

A data flow diagram (DFD) maps out the flow of information for any process
or system. It uses defined symbols like rectangles, circles and arrows, plus
short text labels, to show data inputs, outputs, storage points and the routes
between each destination.
In our project, the user enters a string. If the string is a prefix of any word in our
dictionary file, the program returns the list of matching words. If not, it asks
the user whether they want to add the word. In either case, the user is then asked whether
they want to autocomplete again or quit.

2. ENTITY RELATION DIAGRAM

Entity Relationship Diagrams illustrate the logical structure of databases.


ERDs show entities in a database and relationships between tables within that
database. It is essential to have ER-Diagrams if you want to create a good
database design. The diagrams help focus on how the database actually works.
In our project, user is one entity with user ID as primary key. Interface is
another entity with input given by user and matching words that dictionary
sends as two instances. Dictionary file is the third entity with stored words as
one key and user added words as another.
3. CLASS DIAGRAM

The class diagram is the main building block of object-oriented modelling. It is


used for general conceptual modelling of the structure of the application, and for
detailed modelling translating the models into programming code.
In our project, we have these classes. Attributes and functions of each classes are
clearly mentioned.

4. USE CASE DIAGRAM

A use case diagram at its simplest is a representation of a user's interaction with


the system that shows the relationship between the user and the different use cases
in which the user is involved.
In our project, the user has to interact with the program at each step for the program to
proceed. At first, the program asks whether the user wants to autocomplete a string or quit. If
they choose autocomplete, the program asks for a string. If the string matches
any word in the dictionary file, the program shows the list of matching words. If not, it asks
the user whether they want to add the string to the dictionary. If the user presses y, it adds the string;
if they press n, it does not. After each prediction cycle, the program again
asks whether the user wants to autocomplete or quit.
The lower-level modules will mostly conduct database updates and modifications. The
other modules will mainly be reading data from the database. To do this, the modules
will connect to the database using a common connection file containing the details to
make the connection.
Information transfer between each of the pages/classes will mainly be the constant
information stored such as words. The pages themselves will be sending information
to themselves via POST and GET calls so there will not be data flow between nodes
necessarily.

5. SEQUENCE DIAGRAM

A sequence diagram shows object interactions arranged in time sequence. It


depicts the objects and classes involved in the scenario and the sequence of
messages exchanged between the objects needed to carry out the functionality of
the scenario.
In our project, strings are the objects. The user enters a string, which is checked against the
dictionary file for matching strings. The list of matching strings is then
displayed. If there is no match, the user can add the string to the file.

6. COLLABORATION DIAGRAM
A collaboration diagram, also known as a communication diagram, is an
illustration of the relationships and interactions among software objects in the
Unified Modelling Language (UML). These diagrams can be used to portray the
dynamic behaviour of a particular use case and define the role of each object.
In our project, there are many instances where the user decides what should be
done with objects. The user enters a string, which is checked against the dictionary file for
matching strings. The list of matching strings is then displayed. If there is no
match, the user can add the string to the file.

Appendix C: To Be Determined List


Will be updated in further versions.
Review 2

REVIEW 2
DOCUMENT
for

WORD PREDICTION
USING
TRIE DATA STRUCTURE

Prepared by

Drashti Patel [19BCE0602]

Devanshi Choudhary [19BCE2614]

Prince Kumar[19BCI0002]

Date : 01 NOV 2021


CONTENTS

1. INTRODUCTION
1.1. PURPOSE OF THIS DOCUMENT
1.2. SCOPE OF THE DEVELOPMENT PROJECT
1.3. DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
1.4. REFERENCES
1.5. OVERVIEW OF THE PROJECT

2. SYSTEM ARCHITECTURE DESCRIPTION
2.1. OVERVIEW OF MODULES / COMPONENTS
2.2. STRUCTURE AND RELATIONSHIPS
2.3. USER INTERFACE ISSUES

3. DETAILED DESCRIPTION OF COMPONENTS
3.1. COMPONENT TEMPLATE DESCRIPTION
3.2. X COMPONENT (OR CLASS OR FUNCTION ...)

4. REUSE AND RELATIONSHIPS TO OTHER PRODUCTS

5. DESIGN DECISION AND TRADEOFF

6. PSEUDOCODE FOR COMPONENTS

7. APPENDICES (IF ANY)
7.1. SDS COMPONENT TEMPLATE


The Software Design Specification Outline

1. Introduction

1.1 Purpose of this document

The objective of this software design specification (SDS) is to ensure that the final
outputted software product meets the requirements of the end customer, i.e., functions
as expected, is reliable, is easy to use, does not demand inordinate efforts to train staff
in its use, etc. Specifically, the software design specification is a description of the
software components and sub-systems to be provided as part of the product.

1.2 Scope of the development project

Word predictors have applications in messaging applications like WhatsApp, web search
engines, word processors, command-line interpreters, etc. The original purpose of word
prediction software was to help people with physical disabilities increase their typing speed,
as well as to help them decrease the number of keystrokes needed to complete a
word or a sentence. To this end, we developed our own word predictor program
using the trie data structure, which increases the efficiency of the user by at least 10%.
Autocomplete, or word completion, is a feature in which an application predicts the rest of a
word a user is typing. In graphical user interfaces, users can typically press the tab key to
accept a suggestion or the down arrow key to select one of several.

1.3 Definitions, acronyms, and abbreviations

y= yes
n= no
str= string
dict= dictionary
PK=primary key

1.4 References

Not applicable

1.5 Overview of document

In the system architecture description, we show structures and state diagrams to
help the reader understand the charts. In the detailed description of components, we
describe each component used in our project. In the reuse and relationships to other
products section, we describe the modules we have reused from previous
products and how they function in our project. In the design decisions and tradeoffs
section, we explain the decisions that will help the reader understand
the design that our team is using, and we capture good ideas that were abandoned, along with
the reasons.

2. System architecture description


2.1 Overview of modules / components

There will be 4 main components to the system:

• User
- Makes decisions
- Provides the keyword
- Adds new words to the database
- Quits
• User Interface
- Displays output from the system
• Auto Complete Algorithm
- Predicts the complete word from the keyword provided by the user
- Connects to the data set for words
• Dictionary Data Set
- Provides data to the auto complete algorithm
- Stores new words added to the database

2.2 Structure and relationships

1. DATA FLOW DIAGRAM

A data flow diagram (DFD) maps out the flow of information for any process
or system. It uses defined symbols like rectangles, circles and arrows, plus
short text labels, to show data inputs, outputs, storage points and the routes
between each destination.
In our project, the user enters a string. If the string is a prefix of any word in our
dictionary file, the program returns the list of matching words. If not, it asks
the user whether they want to add the word. In either case, the user is then asked whether
they want to autocomplete again or quit.

2. ENTITY RELATION DIAGRAM

Entity Relationship Diagrams illustrate the logical structure of databases.
ERDs show entities in a database and relationships between tables within that
database. It is essential to have ER diagrams if you want to create a good
database design. The diagrams help focus on how the database actually works.
In our project, the user is one entity with user ID as primary key. The interface is
another entity, with the input given by the user and the matching words that the dictionary
sends as two instances. The dictionary file is the third entity, with stored words as
one key and user-added words as another.

3. CLASS DIAGRAM

The class diagram is the main building block of object-oriented modelling. It is


used for general conceptual modelling of the structure of the application, and for
detailed modelling translating the models into programming code.
In our project, we have these classes. Attributes and functions of each classes are
clearly mentioned.

4. USE CASE DIAGRAM

A use case diagram at its simplest is a representation of a user's interaction with
the system that shows the relationship between the user and the different use cases
in which the user is involved.
In our project, the user has to interact with the program at each step for the program to
proceed. At first, the program asks whether the user wants to autocomplete a string or quit. If
they choose autocomplete, the program asks for a string. If the string matches
any word in the dictionary file, the program shows the list of matching words. If not, it asks
the user whether they want to add the string to the dictionary. If the user presses y, it adds the string;
if they press n, it does not. After each prediction cycle, the program again
asks whether the user wants to autocomplete or quit.
The lower-level modules will mostly conduct database updates and modifications. The
other modules will mainly be reading data from the database. To do this, the modules
will connect to the database using a common connection file containing the details to
make the connection.
Information transfer between each of the pages/classes will mainly be the constant
information stored such as words. The pages themselves will be sending information
to themselves via POST and GET calls so there will not be data flow between nodes
necessarily.

5. SEQUENCE DIAGRAM

A sequence diagram shows object interactions arranged in time sequence. It


depicts the objects and classes involved in the scenario and the sequence of
messages exchanged between the objects needed to carry out the functionality of
the scenario.
In our project, strings are the objects. The user enters a string, which is checked against the
dictionary file for matching strings. The list of matching strings is then
displayed. If there is no match, the user can add the string to the file.


6. COLLABORATION DIAGRAM

A collaboration diagram, also known as a communication diagram, is an


illustration of the relationships and interactions among software objects in the
Unified Modelling Language (UML). These diagrams can be used to portray the
dynamic behaviour of a particular use case and define the role of each object.
In our project, there are many instances where the user decides what should be
done with objects. The user enters a string, which is checked against the dictionary file for
matching strings. The list of matching strings is then displayed. If there is no
match, the user can add the string to the file.


2.3 User interface issues

The user interface will be focused on simplicity throughout the system. The same
overall format will be used for all users to maintain consistency. Since the interface
will be simple, there is not much need to adjust it for different users’ abilities.
One issue will be showing the available dictionary to the client in an effective way. This
means figuring out the best way to present words etc. Another issue will be how to
show the added word information effectively. This can probably be done by
displaying a page with information from the database about a particular word in a
document-style format.

3. Detailed description of components

Identification is where the component is located.


Type is the type of component.
Purpose is the reason behind the component.
Function is the functions provided by the component.
Subordinates are the components this component depends on
Interfaces explains the layout and usage of the component.
Resources are the parts required to make the component do its job.
Processing is what processes take place during the functions of the component.
Data is the critical data to the component.

3.1 Component template description

Identification: Users, located in the dictionary
Type: Component
Purpose: To autocomplete the keywords.
Function: Allows the users to enter a keyword and view the result after
autocomplete.
Subordinates: Choose input, enter keywords, submit keywords and quit.
Dependencies: This component depends on the subordinate modules and the
user interface.
Interfaces: Users can choose either autocomplete or quit.
To autocomplete, the users have to enter the keywords and submit
them to the user interface.
Resources: This requires only the database and a search for the word in
the database.
Processing: The trie algorithm matches the keyword given by the user and
displays the result on the user’s screen.
Data: The users’ keywords. If no result is found after auto-completion,
the keyword is added to the dictionary.


3.2 X Component (or Class or Function ...)

Identification: Interface, located in the dictionary
Type: Component
Purpose: Provide functions for users using the site.
Function: Add word
Parse word
Search word
Auto-complete
Subordinates: Display the result and quit.
Dependencies: This component depends on the users and the dictionary file.
Interfaces: It allows the users the
Resources: It requires the dictionary file and trie algorithm.
Processing: First, the user chooses whether to autocomplete
or quit. If autocomplete is chosen, the user enters the keyword and the
interface displays the result.
Data: N/P

4.0 Reuse and relationships to other products

Word Predictor uses a trie to predict words based on what the user has entered so far. In
computer science, a trie, also called a digital tree or prefix tree, is a kind of search
tree: an ordered tree data structure used to store a dynamic set or associative array
where the keys are usually strings. We are using a pre-defined dictionary file, which
anyone can access from GitHub. It is required to implement the project. It is a .txt file
in which all the words are already stored, and we can access it to search for words, compare
them with the string entered by the user, and add new words to it.
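
To make the prefix-tree idea concrete, here is a minimal sketch of a trie keyed on characters. It is not the project's implementation (the report's own Node class keeps its children in a vector); `SketchNode`, `SketchTrie` and their member names are hypothetical, and `std::map` is used only to keep the example short.

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative node: one map entry per outgoing character.
struct SketchNode
{
    bool isWord = false;  // true if a stored word ends at this node
    std::map<char, std::unique_ptr<SketchNode>> children;
};

class SketchTrie
{
public:
    // Walks the word character by character, creating missing nodes.
    void insert(const std::string &word)
    {
        SketchNode *node = &root;
        for (char ch : word)
        {
            std::unique_ptr<SketchNode> &next = node->children[ch];
            if (!next)
                next.reset(new SketchNode());
            node = next.get();
        }
        node->isWord = true;
    }

    // True only if the exact word was inserted.
    bool contains(const std::string &word) const
    {
        const SketchNode *node = walk(word);
        return node != nullptr && node->isWord;
    }

    // True if at least one stored word starts with the prefix.
    bool hasPrefix(const std::string &prefix) const
    {
        return walk(prefix) != nullptr;
    }

private:
    // Follows the characters of text from the root; nullptr if the path breaks.
    const SketchNode *walk(const std::string &text) const
    {
        const SketchNode *node = &root;
        for (char ch : text)
        {
            auto it = node->children.find(ch);
            if (it == node->children.end())
                return nullptr;
            node = it->second.get();
        }
        return node;
    }

    SketchNode root;
};
```

After inserting "apple" and "app", `contains("appl")` is false while `hasPrefix("appl")` is true, which is the distinction autocomplete relies on.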

5.0 Design decisions and tradeoffs

We are trying to keep the system simple. The idea is to have a lot of functionality, but not at the
expense of having a usable system. We are replacing existing systems that do not work well
and are too complicated to be effective. We are focusing our efforts on creating a system that
does the important functions well. In this project, the motivation was to re-create everything from
scratch in such a way that the interface is simple, the system is highly efficient and all the modules
complement each other.

6.0 Pseudocode for components

1. Start.
2. Input the choice (autocomplete or quit).
3. If the choice is autocomplete, continue; otherwise stop.
4. Enter the keyword.
5. The auto-complete algorithm predicts one or more possible words as
choices.
6. If results are found, display them on the screen.
7. If no result is found, ask the user whether to add the word to the dictionary file.
8. Return to step 2; stop when the user chooses quit.
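
One prediction cycle from the pseudocode can be sketched as a single function. This is an illustration under stated assumptions, not the project's code: a sorted `std::set` stands in for the trie, `predictionCycle` is a hypothetical name, and the `addIfMissing` flag stands in for the user's y/n answer.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the dictionary words that start with keyword. If there are none
// and addIfMissing is true, the keyword is inserted into the dictionary.
std::vector<std::string> predictionCycle(std::set<std::string> &dictionary,
                                         const std::string &keyword,
                                         bool addIfMissing)
{
    std::vector<std::string> matches;
    // lower_bound jumps to the first word >= keyword; in sorted order, all
    // words sharing the prefix follow contiguously from there.
    for (auto it = dictionary.lower_bound(keyword);
         it != dictionary.end() &&
         it->compare(0, keyword.size(), keyword) == 0;
         ++it)
    {
        matches.push_back(*it);
    }
    if (matches.empty() && addIfMissing)
        dictionary.insert(keyword);
    return matches;
}
```

A caller would loop over this function, printing the returned words or a "no suggestions" message, until the user chooses to quit.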


7.0 Appendices (if any)

The lower level modules will mostly conduct database updates and modifications. The
other modules will mainly be reading data from the database. To do this, the modules
will connect to the database using a common connection file containing the details to
make the connection.
Information transfer between each of the pages/classes will mainly be the constant
information stored such as words. The pages themselves will be sending information
to themselves via POST and GET calls so there will not be data flow between nodes
necessarily

SDS component template

Auto complete algorithm Component

Trie Algorithm

Identification: Trie algorithm, located in the dictionary
Type: Component
Purpose: To make auto-completion efficient and fast.
Function: To search for the user's keyword in the dictionary
file in an ordered fashion.
Subordinates: User input and dataset.
Dependencies: This class can be used by many other components of the
system, such as Auto-completion.
Interfaces: When the writer writes the first letter or letters of a word, the
program predicts one or more possible words as choices. If the
word the writer intends to write is included in the list, it can be selected.
Resources: This requires only the dictionary file and a connection to the
dictionary file.
Processing:
Loading the dictionary file from hard disk into memory.
Adding a word to the trie.
Implementing the Node class for the trie.
Searching for a word in the trie.
Implementing autocomplete.

Data: N/P


Dataset Component

Identification: Dataset, located in the dictionary
Type: Component
Purpose: It will be effective for users with physical impairments, helping to
increase typing rate and reduce spelling errors.
Function: Allows the interface to search, add to and manage the dictionary file.
Subordinates: Entered keyword and manage words.
Dependencies: Auto-complete algorithm
Interfaces: It allows the user to enter their own words into the word
prediction dictionaries, either directly or by "learning" words that
have been written.
Resources: It requires the dictionary file and trie algorithm.
Processing: It will be effective for users with physical impairments, helping to
increase typing rate and reduce spelling errors.

Data: N/P


IMPLEMENTATION:

CODE:

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cstdlib>
#include <cstring>
using namespace std;
class Node
{
public:
Node()
{
mContent = ' ';
mMarker = false;
}
~Node() {}
char content()
{
return mContent;
}
void setContent(char c)
{
mContent = c;
}
bool wordMarker()
{
return mMarker;
}
void setWordMarker()
{
mMarker = true;
}
Node *findChild(char c);
void appendChild(Node *child)
{
mChildren.push_back(child);
}
vector<Node *> children()
{
return mChildren;
}

private:
char mContent;
bool mMarker;


vector<Node *> mChildren;


};
Node *Node::findChild(char c)
{
for (int i = 0; i < mChildren.size(); i++)
{
Node *tmp = mChildren.at(i);
if (tmp->content() == c)
{
return tmp;
}
}
return NULL;
}
class Trie
{
public:
Trie();
~Trie();
void addWord(string s);
bool searchWord(string s);
bool autoComplete(string s, vector<string> &);
void parseTree(Node *current, char *s, vector<string> &, bool &loop);

private:
Node *root;
};
Trie::Trie()
{
root = new Node();
}
Trie::~Trie()
{
// Free memory
}
void Trie::addWord(string s)
{
Node *current = root;
if (s.length() == 0)
{
current->setWordMarker();
return;
}
for (int i = 0; i < s.length(); i++)
{
Node *child = current->findChild(s[i]);
if (child != NULL)
{
current = child;
}


else
{
Node *tmp = new Node();
tmp->setContent(s[i]);
current->appendChild(tmp);
current = tmp;
}
if (i == s.length() - 1)
current->setWordMarker();
}
}
bool Trie::searchWord(string s)
{
Node *current = root;
while (current != NULL)
{
for (int i = 0; i < s.length(); i++)
{
Node *tmp = current->findChild(s[i]);
if (tmp == NULL)
return false;
current = tmp;
}
if (current->wordMarker())
return true;
else
return false;
}
return false;
}
bool Trie::autoComplete(std::string s, std::vector<string> &res)
{
Node *current = root;
for (int i = 0; i < s.length(); i++)
{
Node *tmp = current->findChild(s[i]);
if (tmp == NULL)
return false;
current = tmp;
}
char c[100];
strcpy(c, s.c_str());
bool loop = true;
parseTree(current, c, res, loop);
return true;
}
void Trie::parseTree(Node *current, char *s, std::vector<string> &res, bool
&loop)
{
char k[100] = {0};


char a[2] = {0};


if (loop)
{
if (current != NULL)
{
if (current->wordMarker() == true)
{
res.push_back(s);
if (res.size() > 15)
loop = false;
}
vector<Node *> child = current->children();
for (int i = 0; i < child.size() && loop; i++)
{
strcpy(k, s);
a[0] = child[i]->content();
a[1] = '\0';
strcat(k, a);
if (loop)
parseTree(child[i], k, res, loop);
}
}
}
}
bool loadDictionary(Trie *trie, string filename)
{
ifstream words;
ifstream input;
words.open(filename.c_str());
if (!words.is_open())
{
cout << "Dictionary file Not Open" << endl;
return false;
}
string s;
// Loop on the extraction itself so a failed read at end of file is not added as a word.
while (words >> s)
{
trie->addWord(s);
}
return true;
}
int main()
{
system("color 1E");
Trie *trie = new Trie();
int mode;
cout << "Loading dictionary" << endl;
loadDictionary(trie, "words.txt");
while (1)


{
cout << endl
<< endl;
cout << "Interactive mode,press " << endl;
cout << "1: Auto Complete Feature" << endl;
cout << "2: Quit" << endl
<< endl;
cin >> mode;
switch (mode)
{
case 1: //Auto complete
{
string s;
char addNew;
cin >> s;
transform(s.begin(), s.end(), s.begin(), ::tolower);
vector<string> autoCompleteList;
trie->autoComplete(s, autoCompleteList);
if (autoCompleteList.size() == 0)
{
cout << "No suggestions" << endl;
cout << "Want to add this to the dictionary?(y/n): ";
cin >> addNew;
if (addNew == 'y' || addNew == 'Y')
{
trie->addWord(s);
cout << "Word " << s << " added to the dictionary." << endl;
}
else
cout << "Word " << s << " was not added to the dictionary" << endl;
}
else
{
cout << "Autocomplete reply :" << endl;
for (int i = 0; i < autoCompleteList.size(); i++)
{
cout << "\t \t " << autoCompleteList[i] << endl;
}
}
}
continue;
case 2:
delete trie;
return 0;
default:
continue;
}
}
}
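
The Trie destructor in the listing is left as a "Free memory" comment, so the nodes allocated with new are never released. One possible way to release them is a recursive post-order delete, sketched below on a simplified stand-in for the Node class. `OwnedNode` and `destroySubtree` are hypothetical names, and the returned count exists only to make the behaviour easy to check.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the listing's Node class: content plus raw child pointers.
struct OwnedNode
{
    char content = ' ';
    std::vector<OwnedNode *> children;
};

// Deletes node and everything below it; returns how many nodes were freed.
std::size_t destroySubtree(OwnedNode *node)
{
    if (node == nullptr)
        return 0;
    std::size_t freed = 1;
    // Free the children first so no pointer is used after its node is deleted.
    for (OwnedNode *child : node->children)
        freed += destroySubtree(child);
    delete node;
    return freed;
}
```

The recursion depth equals the length of the longest stored word, so it stays small for a dictionary of ordinary words. A destructor could call such a function on the root.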


SCREENSHOT OF IMPLEMENTATION

The program provides the user with 2 options to choose from:

1. Auto complete feature: if the user chooses this option, they can
access the word-predicting mode.
2. Quit: upon choosing this option, the user exits the interface or the home
screen.

The first interface will look as shown below:

The second interface is the page for the word predicting program
When the user chooses option 1 in the 1st interface they will land on this page
The program asks for the string. And displays matching words after computing it
based on the algorithm with the help of the dictionary that already has many words
saved in it.
The image below shows the possible words that are generated by the program when
an input “app” is fed to the program as the input.

First the program asks if user wants to go for ‘1. Autocomplete’ or wants to ‘2. Quit’
the program.
If user chooses ‘1’, he’ll get this output:


The third interface is the page for the adding a new word to the dictionary

If the input string does not have any matches in the dictionary file, then the program
lands on this particular interface and asks the user if they want to add the input string
to the dictionary.

The user is asked if they want to add the string into the dictionary file. If users press
‘y’, it asks for full word that’s to be added.

If user presses ‘n’, the program again asks if they want to ‘1. Autocomplete’ or ‘2.
Quit’ i.e. It will land back to the home page.

Shown below is the interface where the user wants or does not want to add the string
“awaqw” to the dictionary.

The program asks for the string. And displays matching words. If the input string does
not have any matches in the dictionary file, it shows this:


The user is asked if they want to add the string into the dictionary file. If users will
press ‘y’, it asks for full word that’s to be added. If user presses ‘n’, the program again
asks if they want to ‘1. Autocomplete’ or ‘2. Quit’.

REVIEW 3
for

WORD PREDICTION
USING
TRIE DATA STRUCTURE
Prepared by

Drashti Patel[19BCE0602]
Devanshi Choudhary [19BCE2614]
Prince Kumar[19BCI0002]

Date : 22 Nov 2021


1. Drashti: research works, paper readings, project idea with base paper,
cyclomatic testing.

2. Devanshi: Documentation, Gantt chart, and necessary diagrams, functional
and automated testing.

3. Prince: code implementations, configuration management.

CODE:

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cstdlib>
#include <cstring>
using namespace std;
class Node
{
public:
Node()
{
mContent = ' ';
mMarker = false;
}
~Node() {}
char content()
{
return mContent;
}
void setContent(char c)
{
mContent = c;
}
bool wordMarker()
{
return mMarker;
}
void setWordMarker()
{
mMarker = true;
}
Node *findChild(char c);
void appendChild(Node *child)
{
mChildren.push_back(child);
}
vector<Node *> children()
{
return mChildren;
}

private:
char mContent;
bool mMarker;
vector<Node *> mChildren;
};
Node *Node::findChild(char c)
{
for (int i = 0; i < mChildren.size(); i++)
{
Node *tmp = mChildren.at(i);
if (tmp->content() == c)
{
return tmp;
}
}
return NULL;
}
class Trie
{
public:
Trie();
~Trie();
void addWord(string s);
bool searchWord(string s);
bool autoComplete(string s, vector<string> &);
void parseTree(Node *current, char *s, vector<string> &, bool &loop);

private:
Node *root;
};
Trie::Trie()
{

root = new Node();


}
Trie::~Trie()
{
// Free memory
}
void Trie::addWord(string s)
{
Node *current = root;
if (s.length() == 0)
{
current->setWordMarker();
return;
}
for (int i = 0; i < s.length(); i++)
{
Node *child = current->findChild(s[i]);
if (child != NULL)
{
current = child;
}
else
{
Node *tmp = new Node();
tmp->setContent(s[i]);
current->appendChild(tmp);
current = tmp;
}
if (i == s.length() - 1)
current->setWordMarker();
}
}
bool Trie::searchWord(string s)
{
Node *current = root;

while (current != NULL)


{
for (int i = 0; i < s.length(); i++)
{
Node *tmp = current->findChild(s[i]);
if (tmp == NULL) return false;
current = tmp;
}

if (current->wordMarker())
return true;
else
return false;
}
return false;
}
bool Trie::autoComplete(std::string s, std::vector<string> &res)
{
Node *current = root;
for (int i = 0; i < s.length(); i++)
{
Node *tmp = current->findChild(s[i]);
if (tmp == NULL)
return false;
current = tmp;
}
char c[100];
strcpy(c, s.c_str());
bool loop = true;
parseTree(current, c, res, loop);
return true;
}
void Trie::parseTree(Node *current, char *s, std::vector<string> &res, bool &loop)
{
char k[100] = {0};
char a[2] = {0};
if (loop)
{
if (current != NULL)
{

if (current->wordMarker() == true)
{
res.push_back(s);
if (res.size() > 15)
loop = false;
}
vector<Node *> child = current->children();
for (int i = 0; i < child.size() && loop; i++)
{
strcpy(k, s);
a[0] = child[i]->content();
a[1] = '\0';
strcat(k, a);
if (loop)
parseTree(child[i], k, res, loop);
}
}
}
}
bool loadDictionary(Trie *trie, string filename)
{
    ifstream words;
    words.open(filename.c_str());
    if (!words.is_open())
    {
        cout << "Dictionary file Not Open" << endl;
        return false;
    }
    // Read until extraction fails; testing eof() before reading would
    // process one extra, empty token after the last word.
    string s;
    while (words >> s)
    {
        trie->addWord(s);
    }
    return true;
}
int main()
{
    system("color 1E"); // Windows-only: sets the console colours

    Trie *trie = new Trie();
    int mode;

    cout << "Loading dictionary" << endl;
    loadDictionary(trie, "words.txt");
    while (1)
    {
        cout << endl;
        cout << "Interactive mode, press " << endl;
        cout << "1: Auto Complete Feature" << endl;
        cout << "2: Quit" << endl;

        if (!(cin >> mode))
        {
            // Non-numeric or closed input: quit instead of looping forever.
            delete trie;
            return 0;
        }

        switch (mode)
        {
        case 1: // Auto complete
        {
            string s;
            char addNew;
            cin >> s;
            transform(s.begin(), s.end(), s.begin(), ::tolower);
            vector<string> autoCompleteList;
            trie->autoComplete(s, autoCompleteList);
            if (autoCompleteList.size() == 0)
            {
                cout << "No suggestions" << endl;
                cout << "Want to add this to the dictionary?(y/n): ";
                cin >> addNew;
                if (addNew == 'y' || addNew == 'Y')
                {
                    trie->addWord(s);
                    cout << "Word " << s << " added to the dictionary." << endl;
                }
                else
                    cout << "Word " << s << " was not added to the dictionary" << endl;
            }
            else
            {
                cout << "Autocomplete reply :" << endl;
                for (size_t i = 0; i < autoCompleteList.size(); i++)
                {
                    cout << "\t \t " << autoCompleteList[i] << endl;
                }
            }
        }
            continue;
        case 2:
            delete trie;
            return 0;
        default:
            continue;
        }
    }
}
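
The prototype above passes prefixes around in fixed 100-character buffers (strcpy/strcat). As a minimal sketch of the same idea (walk to the prefix node, then collect words depth-first), the version below uses std::string and std::unique_ptr instead. The names SketchNode, SketchTrie, complete and loadWords are hypothetical and chosen only for this illustration, and the default limit of 16 results is an assumption based on the listing stopping once more than 15 words are collected. Children are kept in a std::map, so results come back in alphabetical order, which may differ from the order the prototype produces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not the project's Node class).
struct SketchNode {
    bool isWord = false;                                   // end-of-word marker
    std::map<char, std::unique_ptr<SketchNode>> children;  // sorted by character
};

// Hypothetical trie for this sketch; nodes are freed automatically.
class SketchTrie {
public:
    // Insert a word, creating any missing nodes along its path.
    void addWord(const std::string &word) {
        SketchNode *current = &root_;
        for (char ch : word) {
            std::unique_ptr<SketchNode> &next = current->children[ch];
            if (!next) next.reset(new SketchNode());
            current = next.get();
        }
        current->isWord = true;
    }

    // Return up to `limit` stored words starting with `prefix`,
    // in alphabetical order. An unknown prefix yields an empty list.
    std::vector<std::string> complete(const std::string &prefix,
                                      std::size_t limit = 16) const {
        assert(limit > 0);
        std::vector<std::string> results;
        const SketchNode *current = &root_;
        for (char ch : prefix) {
            auto it = current->children.find(ch);
            if (it == current->children.end()) return results;
            current = it->second.get();
        }
        std::string buffer = prefix;
        collect(current, buffer, limit, results);
        return results;
    }

private:
    // Depth-first walk: append a character, recurse, then remove it again.
    static void collect(const SketchNode *node, std::string &buffer,
                        std::size_t limit, std::vector<std::string> &out) {
        if (out.size() >= limit) return;
        if (node->isWord) out.push_back(buffer);
        for (const auto &entry : node->children) {
            if (out.size() >= limit) return;
            buffer.push_back(entry.first);
            collect(entry.second.get(), buffer, limit, out);
            buffer.pop_back();
        }
    }

    SketchNode root_;
};

// Read whitespace-separated words from any input stream (a file or a
// std::istringstream) and return how many were added.
std::size_t loadWords(std::istream &in, SketchTrie &trie) {
    std::size_t count = 0;
    std::string word;
    while (in >> word) {  // stops cleanly at end of input
        trie.addWord(word);
        ++count;
    }
    return count;
}
```

With the words app, apple, apply and banana loaded, complete("app") would return app, apple and apply, while complete("vell") would return an empty list.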
IMPLEMENTATION SCREEN SHOTS:

This is the JavaScript file “trie.js”, which searches for words using the trie data structure.

This file contains the word dictionary that the trie.js program uses to predict words.
This is the basic interface of our project.

The screenshot below shows all the words predicted when the user gives “app” as the input.

The user can click on the word they require.


Results of previous prototype
WHEN INPUT WAS “vell”

WHEN INPUT WORD IS NOT PRESENT IN THE DICTIONARY


GRAPH OBTAINED:

TESTING (AUTOMATED TESTING)

Name of the tool:

Selenium

Features of the tool:

Selenium is a portable framework for testing web applications. Selenium
provides a playback tool for authoring functional tests without the need to learn
a test scripting language (Selenium IDE). The tests can then run against most
modern web browsers. Selenium runs on Windows, Linux, and macOS.

Selenium test scripts can be written in programming languages like Java, C#,
Python, Ruby, PHP, Perl and JavaScript. Selenium offers record and playback
features with its browser add-on Selenium IDE. The powerful Selenium WebDriver
helps you create more complex and advanced automation scripts.

Objective of the tool:

Selenium automates browser-based web applications, allowing an agile tester to
automate repeated test scripts so they can come up with more critical test
scenarios. The Selenium framework speeds up the test execution process and
improves testing performance as a whole.

Test cases generated using http://generatortestcase.herokuapp.com/

Test report:
Graph :
