Welcome to Scribd!

Chatbot: Computational Linguistics II

Uploaded by

0% found this document useful (0 votes)

8 views4 pages

The document describes the interim report of a rule-based Hindi chatbot being designed by Jashn Arora and Tanish Lad. Their approach involves: 1) Greeting the user and asking for a query related to a domain like sports. 2) Using lemmatized data from the domain to find the most similar sentence to the user's query using cosine similarity. 3) Continuing the interaction by repeating steps 2 until the user exits. They have code for lemmatization and calculating cosine similarity between sentences. Going forward, they plan to select a domain, add greetings, and improve accuracy. Challenges include lemmatizing user queries and improving similarity methods.

Original Description:

Original Title

Interim Report

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

8 views4 pages

Chatbot: Computational Linguistics II

Uploaded by

17druva M

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 4

Search inside document

Computational Linguistics II

Chatbot
Jashn Arora - 2018114006
Tanish Lad - 2018114005

Interim Report

Problem Statement
To Design a Rule-Based Chatbot in Hindi which will interact with the user
and answer the user’s queries.

Our Approach
After looking through some of the Previous Works on the topic, we have
decided to follow this procedure:

● Upon starting, the Chatbot will first greet the user. This will be done
by Brute-Force Approach.
● The Chatbot will ask the user for a query related to our domain (yet to
be decided). The user will enter his query.
● We would have already scrapped the data related to our domain, and
that would be our dataset. The data will be parsed and each word in
every sentence would be present in it’s lemmatized form.
● We would then parse and lemmatize each query of the user.
● We would find the most similar sentence from our dataset
corresponding to the query using Cosine Similarity Formula (this may
be changed if we find a better rule based approach for finding
similarity between two sentences). The most similar sentence would
be our output for the given query of the user.
● This process will continue until the user signals to exit using a closing
command (yet to be decided).

What we have done till this interim submission

● Python code to lemmatize the parsed data. We tested this on sample
data and it works successfully. We took some sample sentences,
chunked them and gave it as the input to our code. The code will
extract the root of each word from the chunked data and store it in an
array. The root of each word is present in the chunked output. The
code will simply extract root using simple python functions for text
manipulation like split and strip.

Sample Output:

The first sentence corresponds to the actual sentence, and the

second sentence corresponds to the lemmatized form for each word
of the sentence.
● Python code to find the cosine similarity between two sentences.

We would first remove the stop words from both the input sentences.
Then, using cosine rule from trigonometry, cosine similarity between
the sentence vectors will be found. The value would always be
greater than or equal to 0 and less than or equal to 1.

Upon running our code, the cosine similarity was found to be 0.6324555320336759
which is good because the sentences are quite similar.

What we plan to do till the end

● Selecting a suitable domain like any sport or any topic that has
sufficient information available on the web in hindi to scrape.
● Coding a brute force logic to greet the user
● Removing Stop Words from the Sentences
● Successfully simulate a rule based chatbot with a good accuracy.

Challenges we might face

● Finding a way to lemmatize the user query as our current method

requires the query to be sent to a Hindi Online Chunker and receive
the chunked output so as to lemmatize it. We may change our
approach to lemmatize the query if we find a better suitable approach
to do it.

● If we find that the cosine similarity formula is not very accurate for
some sentences, we would try to find a better rule based approach,
and if we do find a better method, we might shift to that method.

Contributions of each person

We both read both the papers together. We both coded all the codes together on a single
laptop.

References
[1] S
viatlana Höhn.A
data-driven model of explanations for a chatbot
that helps to practice conversation in a foreign language (2017)

[2] Ananthakrishnan Ramanathan, Durgesh D Rao. A

Lightweight
Stemmer for Hindi (2003)

[3] Geeks for Geeks.

https://www.geeksforgeeks.org/python-measure-similarity-between-
two-sentences-using-cosine-similarity/

Terminology and Glossary For Carding. (CARDINGTOOLS)
Document7 pages
Terminology and Glossary For Carding. (CARDINGTOOLS)
Dan
No ratings yet
What Is Chatscript? Natural Language Processing
Document6 pages
What Is Chatscript? Natural Language Processing
Ankit Sharma
No ratings yet
An O (K Log N) Algorithm For Prefix Based Ranked Autocomplete
Document14 pages
An O (K Log N) Algorithm For Prefix Based Ranked Autocomplete
Deden Ramadhan
No ratings yet
Sentiment Analysis From H El Reviews: Data Mining For Business Intelligence
Document13 pages
Sentiment Analysis From H El Reviews: Data Mining For Business Intelligence
Aniket Sujay
No ratings yet
65 SC Tae1 A3
Document3 pages
65 SC Tae1 A3
Mr Unknown
No ratings yet
Training Millions of Personalized Dialogue Agents
Document5 pages
Training Millions of Personalized Dialogue Agents
Fun 1
No ratings yet
Chapter 15 - MINING MEANING FROM TEXT
Document20 pages
Chapter 15 - MINING MEANING FROM TEXT
Simer Fibers
No ratings yet
Aiproject Report
Document11 pages
Aiproject Report
ashutosh
No ratings yet
NLP For ML - Spam Classifier
Document14 pages
NLP For ML - Spam Classifier
Thomas West
No ratings yet
Chap 7.1 Sequence Analysis Using FFN
Document47 pages
Chap 7.1 Sequence Analysis Using FFN
HRITWIK GHOSH
No ratings yet
An Efficient Method For High
Document6 pages
An Efficient Method For High
phaniteja
No ratings yet
Text Summerizer Synopsis-1
Document6 pages
Text Summerizer Synopsis-1
Madhura Joshi
No ratings yet
Javascript Strings Homework
Document8 pages
Javascript Strings Homework
h680z3gv
100% (1)
NLP Unit 1
Document34 pages
NLP Unit 1
hellrider22
100% (1)
Semantic Search - Reader View
Document21 pages
Semantic Search - Reader View
Santa Cruz
No ratings yet
AI-Unit 5
Document31 pages
AI-Unit 5
jt5qx6nggt
No ratings yet
Dynamic Chat Bot
Document4 pages
Dynamic Chat Bot
Lakshay Malhotra
No ratings yet
Creating A Rule-Based Chatbot
Document14 pages
Creating A Rule-Based Chatbot
ursar
No ratings yet
Natural Language Processing
Document15 pages
Natural Language Processing
ridham
No ratings yet
Finding Similar Items
Document85 pages
Finding Similar Items
Mallika Reddy
No ratings yet
6-Bert T5 GPT
Document31 pages
6-Bert T5 GPT
Mariem El Mechry
No ratings yet
Problem Statement:: Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), Neural
Document4 pages
Problem Statement:: Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), Neural
Govind Messi
No ratings yet
AI-Unit 5
Document32 pages
AI-Unit 5
Pranya Aneja
No ratings yet
Artificial Intelligence Project: Chatterbot
Document8 pages
Artificial Intelligence Project: Chatterbot
Eshan Goel
No ratings yet
Recurrent Neural Networks Tutorial, Part 2
Document16 pages
Recurrent Neural Networks Tutorial, Part 2
hoja
No ratings yet
Slides
Document26 pages
Slides
coderyami18
No ratings yet
Disha.M 22blc1376 Toc
Document15 pages
Disha.M 22blc1376 Toc
dishamanjappa
No ratings yet
Top 30 NLP Interview Questions and Answers: 1. What Do You Understand by Natural Language Processing?
Document18 pages
Top 30 NLP Interview Questions and Answers: 1. What Do You Understand by Natural Language Processing?
03sri03
No ratings yet
Presenation
Document5 pages
Presenation
Kinza Waqas
No ratings yet
REVIEW ON SONG RECOMMENDING CHATBOT Ijariie18855
Document3 pages
REVIEW ON SONG RECOMMENDING CHATBOT Ijariie18855
Manikandan
No ratings yet
Comprehensive Guide Attention Mechanism Deep Learning
Document17 pages
Comprehensive Guide Attention Mechanism Deep Learning
Vishal Ashok Palled
No ratings yet
IMDB Movie Review Analysis
Document9 pages
IMDB Movie Review Analysis
adarsh gupta
No ratings yet
Problem Solving Approach L N D C S E
Document36 pages
Problem Solving Approach L N D C S E
Yekanthavasan
No ratings yet
PSA 5 Final
Document36 pages
PSA 5 Final
Yekanthavasan
No ratings yet
Mini Project Report
Document26 pages
Mini Project Report
Akash Dahad
No ratings yet
Tamil Textual Image Reader
Document4 pages
Tamil Textual Image Reader
KogulVimal
No ratings yet
Text Semantic Similarity
Document17 pages
Text Semantic Similarity
Saathvik R Khatavkar
No ratings yet
Conversion of Sign Language To Text: For Dumb and Deaf
Document26 pages
Conversion of Sign Language To Text: For Dumb and Deaf
Arbaz Hashmi
No ratings yet
Vits
Document37 pages
Vits
Gobi
No ratings yet
Experiment 9: Aim: Theory
Document4 pages
Experiment 9: Aim: Theory
Varun Vora
No ratings yet
Eightfold Ai
Document10 pages
Eightfold Ai
Shashi sidhwani
No ratings yet
SI 413, Unit 4: Scanning and Parsing: 1 Syntax & Semantics
Document12 pages
SI 413, Unit 4: Scanning and Parsing: 1 Syntax & Semantics
Aditya Roy karmakar
No ratings yet
Compiler - Lexical Analysis
Document17 pages
Compiler - Lexical Analysis
trupti.kodinariya9810
No ratings yet
1 - Pos Chunker - IISTE Research Paper
Document6 pages
1 - Pos Chunker - IISTE Research Paper
iiste
No ratings yet
Unsupervised Text Summarization Using Sentence Embeddings
Document18 pages
Unsupervised Text Summarization Using Sentence Embeddings
pradeep_dhote9
No ratings yet
Describing Syntax and Semantics (Part 2) : Concepts of Programming Languages
Document31 pages
Describing Syntax and Semantics (Part 2) : Concepts of Programming Languages
Sameeha Abdullah
No ratings yet
Paper 2.edited
Document6 pages
Paper 2.edited
Prajakta Khedkar
No ratings yet
NLP Detail 3
Document5 pages
NLP Detail 3
shahzad sultan
No ratings yet
Sentiment Analysis Using Supervised Machine Learning Ijariie13051
Document7 pages
Sentiment Analysis Using Supervised Machine Learning Ijariie13051
ntsuandih
No ratings yet
Thesis On String Matching
Document5 pages
Thesis On String Matching
angiejorgensensaltlakecity
100% (2)
Data Types: Today's Agenda
Document7 pages
Data Types: Today's Agenda
nfjnzjkngjsr
No ratings yet
Cs224n 2020 Lecture08 NMT
Document77 pages
Cs224n 2020 Lecture08 NMT
Tuan Minh Nguyen Hoang
No ratings yet
Chatbot
Document3 pages
Chatbot
shilpa
No ratings yet
Concurrent Context Free Framework For Conceptual Similarity Problem Using Reverse Dictionary
Document4 pages
Concurrent Context Free Framework For Conceptual Similarity Problem Using Reverse Dictionary
Editor IJRITCC
No ratings yet
Final Paper
Document5 pages
Final Paper
Samir Fuddi
No ratings yet
Efficient Algorithm For Auto Correction Using N-Gram Indexing
Document5 pages
Efficient Algorithm For Auto Correction Using N-Gram Indexing
Sandeep Kumar Das
No ratings yet
Math - Random in V8
Document18 pages
Math - Random in V8
Barrel Roll
No ratings yet
ADA Lect10
Document12 pages
ADA Lect10
H
No ratings yet
Sentiment Analysis Using Machine Learning Classifiers
Document41 pages
Sentiment Analysis Using Machine Learning Classifiers
souptik.sc
No ratings yet
Basics of Chat GPT: How to utilize this powerful tool to enhance your life!
From Everand
Basics of Chat GPT: How to utilize this powerful tool to enhance your life!
Adam Larsen
No ratings yet
Mastering TensorFlow 2.x: Implement Powerful Neural Nets across Structured, Unstructured datasets and Time Series Data
From Everand
Mastering TensorFlow 2.x: Implement Powerful Neural Nets across Structured, Unstructured datasets and Time Series Data
Rajdeep Dua
No ratings yet
ENERGY AND ENVIRONMENT - 18ME751 Module 4 - Dr. Shashikant
Document31 pages
ENERGY AND ENVIRONMENT - 18ME751 Module 4 - Dr. Shashikant
17druva M
No ratings yet
Chatbot Using A Knowladge in Datbase
Document6 pages
Chatbot Using A Knowladge in Datbase
17druva M
No ratings yet
Building An Expert Recommender Chatbot
Document5 pages
Building An Expert Recommender Chatbot
17druva M
No ratings yet
Survey On The Design and Development of Indian Language Chatbots
Document7 pages
Survey On The Design and Development of Indian Language Chatbots
17druva M
No ratings yet
Chatbot For Generation of Subtitles in Regional Languages: Shreyas B, Aravinth P, Kripakaran P
Document4 pages
Chatbot For Generation of Subtitles in Regional Languages: Shreyas B, Aravinth P, Kripakaran P
17druva M
No ratings yet
Comparison of Wavelet Performances in Surface Roughness Recognition
Document6 pages
Comparison of Wavelet Performances in Surface Roughness Recognition
purushothaman sinivasan
No ratings yet
Technical Seminar Topic:: 40 GB Wi-Fi
Document25 pages
Technical Seminar Topic:: 40 GB Wi-Fi
Sumeer singh
No ratings yet
WLAN Products and Solution Pre-Sales Training
Document57 pages
WLAN Products and Solution Pre-Sales Training
NOC U Energy Corp.
No ratings yet
Cascade Control
Document10 pages
Cascade Control
Manoj Rajagopalan
No ratings yet
10MCA17 UNIX Programs (MCA SEM 2, VTU)
Document54 pages
10MCA17 UNIX Programs (MCA SEM 2, VTU)
Abhilash H M
No ratings yet
O232E - 05p1 - Protection Blocks - 161104
Document23 pages
O232E - 05p1 - Protection Blocks - 161104
RK K
No ratings yet
FPGA Based Acoustic Modem For Underwater Communication
Document4 pages
FPGA Based Acoustic Modem For Underwater Communication
Editor IJRITCC
No ratings yet
Lesson Plan - PYTHON PROGRAMMING
Document3 pages
Lesson Plan - PYTHON PROGRAMMING
reshu.tyagi
No ratings yet
KChief 600
Document4 pages
KChief 600
Captive Mahbub
No ratings yet
SMS Solution
Document4 pages
SMS Solution
ed
No ratings yet
A R P Senior Secondary School, Halduchaur Half Yearly Examination (2022-23) Subject: Computer TIME: 2:30 HRS Class: Viii Max. Marks 60
Document2 pages
A R P Senior Secondary School, Halduchaur Half Yearly Examination (2022-23) Subject: Computer TIME: 2:30 HRS Class: Viii Max. Marks 60
Ajay bisht
No ratings yet
KM Security White Paper
Document32 pages
KM Security White Paper
datajerzy
No ratings yet
Journal of Consumer Research, Inc.: The University of Chicago Press
Document7 pages
Journal of Consumer Research, Inc.: The University of Chicago Press
Neri David Martinez
No ratings yet
Shirish - Kumar - Software - Engineer New
Document1 page
Shirish - Kumar - Software - Engineer New
Abhishek Kumar
No ratings yet
7.BLD Example Solves PDF
Document10 pages
7.BLD Example Solves PDF
Subbaraju Gv
No ratings yet
TSKHAY1
Document9 pages
TSKHAY1
kate cc
No ratings yet
It Application Tools in Business: Getting Started
Document33 pages
It Application Tools in Business: Getting Started
Khristian Joshua G. Jurado
No ratings yet
Log
Document221 pages
Log
Zainuddin Satria Manggala
No ratings yet
Embedded System Architectures: Dr. Shubhajit Roy Chowdhury
Document29 pages
Embedded System Architectures: Dr. Shubhajit Roy Chowdhury
vinitgarg1993
No ratings yet
Ch.1 - Introduction: Study Guide To Accompany Operating Systems Concepts 9
Document7 pages
Ch.1 - Introduction: Study Guide To Accompany Operating Systems Concepts 9
John Fredy Redondo Morelo
No ratings yet
Entrust and You New Version
Document40 pages
Entrust and You New Version
Ahmad Shahin
No ratings yet
520CMD01 CS en
Document4 pages
520CMD01 CS en
DJ Thang
No ratings yet
Buk98150-55 - 2 MOSFET
Document9 pages
Buk98150-55 - 2 MOSFET
Olga Plohotnichenko
No ratings yet
Turning Centers Machine Code Glossary
Document3 pages
Turning Centers Machine Code Glossary
Laura B
No ratings yet
Lecture 7
Document25 pages
Lecture 7
Jayaraj Joshi
No ratings yet
LAS 205 - Syllabus - Fall 2022
Document3 pages
LAS 205 - Syllabus - Fall 2022
Ziad Kassab
No ratings yet
Final Exam QP - Microprocessor and Microcontroller - Sep - 2023
Document5 pages
Final Exam QP - Microprocessor and Microcontroller - Sep - 2023
sagilen91mariapah
No ratings yet
National Exams (2015 - 2021) Language
Document15 pages
National Exams (2015 - 2021) Language
Nasr-edineOuahani
No ratings yet
0026 DO Q-Control User Manual
Document32 pages
0026 DO Q-Control User Manual
Jerzy Dziewiczkiewicz
No ratings yet