INTRODUCTION
Many chatbots have been developed, offering a multitude of services through a wide range
of methods. A chatbot is a conversational agent, and with the advance of artificial
intelligence and machine learning, chatbots are becoming more and more popular.
A chatbot is one way to retrieve information automatically and quickly. Interaction
between humans and computers through speech or text is becoming increasingly common, and
people expect talking to a machine to feel similar to talking to a human being.
The user enters natural-language speech or text, and the program returns the most suitable
response as text or speech. Many chat systems that enable communication in natural language
are available worldwide.
These chat systems fall into two main types:
Human-Human Dialog System - No machine learning is involved in a Human-Human Dialog
System; it works as an intermediary between two humans and does not require natural
language processing abilities. WhatsApp and Skype are among the most popular
Human-Human Dialog chat systems worldwide.
Human-Computer Dialog System - The other type of chat system is the Human-Computer
Dialog System, also called a chatbot. A chatbot is a computer program that simulates
conversation with humans in natural language. It can interact with people on any
platform, such as a mobile, web or desktop application. A human being can converse
with only one person at a time, whereas a chatbot can communicate with hundreds or
thousands of people simultaneously. It responds regardless of how many people are
interacting with it and what time of the day or night it is. However, because it
requires natural language processing ability, developing a chatbot is an arduous task.
Chatbots are used in many fields, including education, travel, real estate, internet
services, gaming, e-commerce, hospitality, health, call centers, media, finance,
insurance, banking, business, customer service and shopping.
1.2 SYSTEM SPECIFICATION
Mother Board : HP
RAM : 2GB
1.2.3 SOFTWARE DESCRIPTION
PYTHON
Python has become a staple in data science, allowing data analysts and other
professionals to use the language to conduct complex statistical calculations, create data
visualizations, build machine learning algorithms, manipulate and analyze data, and
complete other data-related tasks. Python can build a wide range of different data
visualizations, like line and bar graphs, pie charts, histograms, and 3D plots. Python also
has a number of libraries that enable coders to write programs for data analysis and
machine learning more quickly and efficiently, like TensorFlow and Keras.
Python can even be used by relative beginners to automate simple tasks on the
computer, such as renaming files, finding and downloading online content, or sending emails
or texts at desired intervals. Python is often used to develop the back end of a website or
application, the parts that a user doesn't see. Python's role in web development can include
sending data to and from servers, processing data and communicating with databases, URL
routing, and ensuring security.
PYTHON: Advantages
Python supports automatic garbage collection. It can be easily integrated with C, C++, COM,
ActiveX, CORBA, and Java.
Python comes under the OSI approved open-source license. This makes
it free to use and distribute. You can download the source code, modify it and even distribute
your version of Python. This is useful for organizations that want to modify some specific
behaviour and use their version for development.
Python's standard library is large and covers most of the functions needed for common
tasks, so you often don't have to depend on external libraries. When you do, the Python
package manager (pip) makes it easy to install other packages from the Python Package
Index (PyPI), which hosts over 200,000 packages.
Python is a very productive language. Because of its simplicity, developers can
focus on solving the problem. They don't need to spend much time understanding
the syntax or behaviour of the programming language. You write less code and get more
done.
Library used:
NLTK (Natural Language Toolkit) - NLTK is a leading platform for building Python
programs to work with human language data. It provides easy-to-use interfaces to over 50
corpora and lexical resources such as WordNet, along with a suite of text processing libraries
for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers
for industrial-strength NLP libraries, and an active discussion forum.
TensorFlow - TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math library, and is also
used for machine learning applications such as neural networks. It is used for both research
and production at Google.
NumPy - NumPy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
Keras - Keras is a deep learning API written in Python, running on top of the machine
learning platform TensorFlow. It was developed with a focus on enabling fast
experimentation. Being able to go from idea to result as fast as possible is key to doing good
research.
BACK END: Python Runtime
Alternative implementations of Python are re-implementations of the language that do not
depend on (or necessarily interact with) the CPython runtime core. Many of them reuse a
large part of the standard library implementation. Most of these projects have not yet
achieved full language compliance. However, many of them have goals and features, or run in
certain environments, that make them interesting in their own right.
In CPython, the compiler reads the Python source code and verifies that it is
well-formed, i.e. it checks the syntax. If it encounters an error, it immediately halts
the translation and shows an error message. If there is no error, the compiler translates
the source into an equivalent form in an intermediate language called "bytecode". The
bytecode is then passed to the Python Virtual Machine (PVM), which is the Python
interpreter loop. The PVM executes the bytecode instructions one at a time. If an error
occurs during execution, the program is halted with an error message.
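The compile-then-execute flow described above can be observed from within Python itself.
The following is a minimal sketch, assuming the CPython implementation; the snippet being
compiled and the file name "<example>" are made up for the demonstration.

```python
import dis

# A made-up source snippet; any valid Python text would do.
source = "total = 2 + 3\n"

# Step 1: the compiler checks the syntax and produces a code object (bytecode).
code_obj = compile(source, "<example>", "exec")

# Step 2: the interpreter loop (the "PVM") executes that bytecode.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # prints 5

# The bytecode instructions can be listed for inspection.
dis.dis(code_obj)

# A syntax error is reported at compile time, before anything runs.
syntax_error_message = None
try:
    compile("total = = 3\n", "<example>", "exec")
except SyntaxError as err:
    syntax_error_message = err.msg
    print("compile failed:", syntax_error_message)
```

The exact instructions printed by dis.dis differ between Python versions, so they are not
reproduced here.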
Environment variable
PYTHONPATH
This variable sets the search path for user-defined modules so that they can be imported
directly into a Python program. It augments the default search path for Python modules.
PYTHONPATH holds a string naming one or more directories, separated by the platform's
path separator, which Python adds to the sys.path directory list. The primary use
of this variable is to allow users to import modules that are not made installable yet.
PYTHONHOME
This variable changes the default location of the standard Python libraries. By
default, Python searches for its libraries in prefix/lib/pythonversion and
exec_prefix/lib/pythonversion. Here prefix and exec_prefix are installation-dependent
directories, and both of them default to /usr/local.
PYTHONSTARTUP
If this variable names a readable file, Python executes the commands in that file before
it displays the first prompt in interactive mode. On Unix the file is conventionally
named .pythonrc.py. A start-up file can, for example, adjust sys.path.
PYTHONINSPECT
If this variable is set to a non-empty string, Python enters interactive mode after the
script finishes. It is equivalent to using the -i command-line option, in which case the
PYTHONSTARTUP file is not read. Python code can also set this variable through os.environ
to force inspect mode on program termination.
PYTHONCASEOK
If this environment variable is set, Python ignores case in import statements. It only
has an effect on Windows and macOS.
PYTHONVERBOSE
If this variable is set to a non-empty string, Python prints a message every time a
module is initialized, showing the file or built-in module from which it has been loaded.
It also reports module cleanup at the termination of the Python program. It is equivalent
to using the -v command-line option.
The above variables are the primary environment variables that are frequently used.
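As an illustration of how PYTHONPATH affects imports, the sketch below starts a second
interpreter with the variable set. The module name greeting_demo and the temporary
directory are hypothetical and exist only for this demonstration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical directory standing in for a folder of user-defined modules.
with tempfile.TemporaryDirectory() as module_dir:
    # Create a tiny module there so the import below has something to find.
    with open(os.path.join(module_dir, "greeting_demo.py"), "w") as handle:
        handle.write("MESSAGE = 'hello from PYTHONPATH'\n")

    # Copy the current environment and point PYTHONPATH at the directory.
    env = dict(os.environ)
    env["PYTHONPATH"] = module_dir

    # Start a fresh interpreter; it reads PYTHONPATH while building sys.path.
    result = subprocess.run(
        [sys.executable, "-c", "import greeting_demo; print(greeting_demo.MESSAGE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    child_output = result.stdout.strip()

print(child_output)  # prints: hello from PYTHONPATH
```

Without the PYTHONPATH entry, the child interpreter would not find the module, because
the temporary directory is not on its default search path.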
In interactive mode, you start the Python interpreter and type commands for it to
execute one at a time. To enter the interactive mode of Python, use the command
below:
C:> python # Windows/DOS
Using command-line:
In this method, you call the Python interpreter and pass it the Python file
to run.
Using an IDE
Many IDEs and editors are available, such as VS Code, Sublime Text and
PyCharm. This project uses PyCharm, a dedicated Python Integrated
Development Environment (IDE) providing a wide range of essential tools for Python
developers, tightly integrated to create a convenient environment for productive Python, web,
and data science development.
2. SYSTEM STUDY
Students currently have to visit the college in person to get answers to their queries
from the college help desk.
This process consumes a lot of time as well as money, especially when the college is
miles away from home.
The existing system is a manual system in which every action is carried out in person.
It takes more time to complete, is prone to delays, and in some cases a student's
requirement may be overlooked. The existing system is also difficult to maintain, and it
offers no way for a system to interact with people directly.
2.2 PROPOSED SYSTEM
A chatbot system is a software program that interacts with users in natural language.
The purpose of a chatbot system is to simulate a conversation so human-like that the
person believes they are talking with a human.
Chatbots seem to hold tremendous promise for providing users with quick and convenient
support that responds specifically to their questions. The most frequent motivation for
chatbot users is considered to be productivity, while other motives are entertainment,
social factors, and contact with novelty. To balance these motivations, a chatbot
should be built in a way that acts as a tool, a toy, and a friend at the same time.
With the chatbot, students simply type their query to the bot. No specific format is
required. The system uses built-in natural language processing to answer their queries.
3. SYSTEM DESIGN AND DEVELOPMENT
The input design defines how messages are entered into the chatbot. The concept of a
loop within a conversation is intangible. We can speak about it, but it's not always easy
to define what's actually happening, and it's even harder to visualize. Designing a
conversation has many similarities to designing an interface; it's a matter of finding
where they overlap and extending the concept.
At the most basic level, a loop could consist of a user bouncing from one card to the next,
and back again. This can be visualized with each circle representing a card, a single piece
of content within the bot.
Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and
to ensure that correct information is obtained from the computerized system. It is achieved
by creating user-friendly data-entry screens that can handle large volumes of data. The goal
of input design is to make data entry easier and free from errors.
Input design must capture all the data that the system needs, without introducing any
errors. Input errors are greatly reduced when the user enters data directly through the
following controls:
Entry box - The entry box is used to enter queries or messages. The user can enter
multi-line text as input.
Send button - The Send button sends the user's messages and queries to the chatbot. At
this stage the message sent by the user is looked up in the JSON file.
Scrollbar - When the chat is long, the user needs to move between the top and the end of
the conversation. The scrollbar lets the user move the chat up and down easily.
3.2 OUTPUT DESIGN
Output design is a very important concept in a computerized system: without reliable
output, users may feel the entire system is unnecessary and avoid using it. Proper
output design is important in any system and facilitates effective decision-making.
A quality output is one that meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to users and
to other systems through outputs. Output design determines how the information is to be
displayed for immediate need and also as hard-copy output. Output is the most important and
direct source of information for the user. Efficient and intelligent output design improves
the system's relationship with the user and supports decision-making.
Designing computer output should proceed in an organized, well-thought-out manner;
the right output must be developed while ensuring that each output element is designed so
that people can use the system easily and effectively.
When analysts design computer output, they should identify the specific output that is
needed to meet the requirements, select methods for presenting the information, and create
documents, reports, or other formats that contain the information produced by the system.
Output design is an ongoing activity almost from the beginning of the project, and
follows the principles of form design. An effective, well-defined output design improves the
relationship between the system and the user, thus facilitating decision-making.
Chatlog
The chatlog is the area that shows the answers to the queries asked by the user. The
chatbot reads the data sets in the JSON file and returns the data relevant to the user's
question.
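The lookup described above could be sketched as follows. The helper name get_response, its
fallback behaviour and the two-intent data set are assumptions made for illustration; the
appendix calls a getResponse function but does not show its body. The shape of the
predicted list follows what predict_class returns in the appendix.

```python
import json
import random

# A tiny stand-in for intents.json (assumed structure: tag, patterns, responses).
INTENTS_JSON = """
{"intents": [
  {"tag": "hours",
   "patterns": ["what are your hours"],
   "responses": ["The college premises are open from 7:00 am to 5:00 pm."]},
  {"tag": "invalid",
   "patterns": [""],
   "responses": ["Sorry, can't understand you", "Not sure I understand"]}
]}
"""


def get_response(predicted, intents, fallback_tag="invalid"):
    """Return one response for the top predicted intent (hypothetical helper)."""
    # Fall back to the "invalid" intent when nothing was predicted.
    tag = predicted[0]["intent"] if predicted else fallback_tag
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            # Several responses may be stored per tag; pick one at random.
            return random.choice(intent["responses"])
    return "Sorry, can't understand you"


intents = json.loads(INTENTS_JSON)
reply = get_response([{"intent": "hours", "probability": "0.93"}], intents)
print("Bot:", reply)
```

Picking a random response per tag keeps repeated answers from sounding identical, at the
cost of making the output non-deterministic when a tag has several responses.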
3.3 DATABASE DESIGN
HDF5 - The h5py package is a Pythonic interface to the HDF5 binary data format.
HDF5 lets you store huge amounts of numerical data, and easily manipulate that data from
NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they
were real NumPy arrays.
Data integration
In a database, information from several files is coordinated, accessed and operated
upon as though it were in a single file. Logically, the information is centralized;
physically, the data may be located on different devices connected through data
communication facilities.
Data integrity
Data integrity means storing each piece of data in one place only and defining how each
application accesses it. This approach results in more consistent information, one update
being sufficient to achieve a new record status for all applications which use it.
Conceptual design
The next step is to form a concise description of the data requirements using a
high-level data model. This description is independent of storage requirements. This step
involves identifying the entities involved in the system and the relationships between
them. Entities and relationships are depicted in the form of a diagram called the
Entity-Relationship Diagram.
3.4 SYSTEM DEVELOPMENT
A) User
B) Chatbot
C) Information
A) USER
Text to machine: The user asks the computer to run a command by giving input as text.
Command execution: Based on the command received from the user, the system executes
the command if it is available, e.g. greetings.
Machine to text: Once a command is received, the application replies in text, which
makes the user's experience with the system more interactive.
B) CHATBOT
The machine has embedded knowledge that lets it identify sentences and decide for itself
which response answers a question.
The user can chat with the bot as if enquiring about college details.
C) INFORMATION
The bot can answer any question asked by the user that is related to the
college.
It searches for and returns the relevant college information.
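How the chatbot module turns a typed sentence into a decision can be sketched roughly as
below. This is an illustration under stated assumptions, not the project's code: a regular
expression stands in for the NLTK tokenizer and lemmatizer, the vocabulary and the
probabilities are made up, and only the 0.25 threshold mirrors the value used in the
appendix.

```python
import re

# Hypothetical vocabulary; in the project it is built from intents.json.
VOCABULARY = ["course", "fee", "hello", "hour", "library"]


def tokenize(sentence):
    """Lower-case the sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def bag_of_words(sentence, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
    words = set(tokenize(sentence))
    return [1 if term in words else 0 for term in vocabulary]


def filter_predictions(probabilities, classes, threshold=0.25):
    """Keep classes whose probability exceeds the threshold, strongest first."""
    kept = [(classes[i], p) for i, p in enumerate(probabilities) if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


vector = bag_of_words("Hello, what is the fee?", VOCABULARY)
print(vector)  # [0, 1, 1, 0, 0]

# Made-up probabilities standing in for the output of model.predict().
ranked = filter_predictions([0.10, 0.60, 0.30], ["courses", "fee", "greetings"])
print(ranked)  # [('fee', 0.6), ('greetings', 0.3)]
```

Because this sketch does no lemmatization, a word such as "fees" would not match the
vocabulary entry "fee"; reducing words to a base form before matching is what the
lemmatizer is for.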
4. SYSTEM TESTING AND IMPLEMENTATION
UNIT TESTING
The software units in the system are modules and routines that are assembled and
integrated to perform a specific function. As part of unit testing, we executed each
module independently. This makes it possible to detect errors in coding and logic that
are contained within each of the three modules. This testing includes entering data, that
is, filling in forms, and checking that each value matches the expected type and is
entered into the database. The various controls are tested to ensure that each performs
its action as required.
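A unit test of a single routine might look like the sketch below, which uses Python's
standard unittest module. The function clean_message is a hypothetical helper modelled on
the way the send routine in the appendix strips the entry-box text and ignores empty
messages; it is not part of the project's code.

```python
import unittest


def clean_message(raw_text):
    """Hypothetical helper: trim whitespace; empty messages become None."""
    text = raw_text.strip()
    return text if text else None


class CleanMessageTest(unittest.TestCase):
    def test_strips_surrounding_whitespace(self):
        self.assertEqual(clean_message("  fees?  \n"), "fees?")

    def test_blank_input_is_rejected(self):
        self.assertIsNone(clean_message("   \n"))


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanMessageTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Keeping such helpers free of GUI calls lets them be tested in isolation, before the
modules are combined for integration testing.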
INTEGRATION TESTING
Data can be lost across an interface, one module can have an adverse effect on
another, and sub-functions, when combined, may not produce the desired major function.
Integration testing is a systematic technique for uncovering errors associated with the
interfaces between modules. The objective is to take unit-tested modules and build a
program structure. All the modules are combined and tested as a whole. Here the admin
module, employee module and student module options are integrated and tested. This testing
provides assurance that the application is a well-integrated functional unit with a smooth
transition of data.
4.2 SYSTEM IMPLEMENTATION
Implementation is the stage of the project at which the theoretical design is turned
into a working system. It can therefore be considered the most critical stage in achieving a
successful new system and in giving users confidence that the new system will work and
be effective. The implementation stage involves careful planning, investigation of the
existing system and its constraints on implementation, design of methods to achieve
changeover, and evaluation of changeover methods.
Implementation Procedures
5. CONCLUSION
The main objectives of the project were to develop an algorithm that identifies answers
to user-submitted questions, to develop a database where all the related data is stored,
and to develop a web interface. The web interface had two parts, one for ordinary users
and one for the administrator. Background research was carried out, which included an
overview of the conversation procedure and of relevant existing chatbots. A database was
developed that stores information about questions, answers, keywords, logs and feedback
messages.
An evaluation was carried out using data collected at the college. After feedback was
received from the first deployment, extra requirements were introduced and
implemented. The more a person interacts with a chatbot, the more trends and patterns the
system identifies from the information it receives. This data can then be used to
determine user preferences and tastes, which is a long-term selling point for making a home
smarter.
FUTURE ENHANCEMENT
Set up voice terminals for the chatbot. While a voice interface may be optional, chatbots
have been in the enterprise long enough for developers and experts to begin identifying
which elements of chatbots are mainstay requirements.
NLP development, human-like conversational flexibility and 24/7 service are crucial to
maintaining chatbots' longevity in enterprise settings. Chatbots are AI systems and, looking
ahead, they need to keep up with AI trends, such as automated machine learning, easy system
integration and developing intelligence.
Add face detection and face recognition to the chatbot. The R&D centers of various
organizations are teaching chatbots to behave as humans do.
As chatbots acquire human skills, user satisfaction is expected to increase. Chatbots
are becoming more human-like, improving results while benefiting customers.
BIBLIOGRAPHY
BOOK REFERENCES
WEBSITE REFERENCES:
1. https://docs.python.org/3/
2. https://github.com/parulnith/Building-a-Simple-Chatbot-in-Python-using-NLTK
3. https://www.edureka.co/blog/how-to-make-a-chatbot-in-python/
4. https://keras.io/
5. https://www.nltk.org/
6. https://www.tensorflow.org/tutorials
7. https://numpy.org/
8. https://chatterbot.readthedocs.io/en/stable/
9. https://stackoverflow.com/
10. https://www.pythonanywhere.com/
APPENDICES
B. SAMPLE CODE
Chat_GUI.py
import pickle
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
def clean_up_sentence(sentence):
    # tokenize the pattern - splitting words into array
    sentence_words = nltk.word_tokenize(sentence)
    # lemmatize every word - reducing to base form
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words
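def bag_of_words(sentence, words, show_details=True):
    # tokenize and lemmatize the input sentence
    sentence_words = clean_up_sentence(sentence)
    # bag of words - one 0/1 slot per vocabulary word
    bag = [0] * len(words)
    for s in sentence_words:
        for i, word in enumerate(words):
            if word == s:
                # assign 1 if current word is in the vocabulary position
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % word)
    return np.array(bag)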
def predict_class(sentence):
    # filter below threshold predictions
    p = bag_of_words(sentence, words, show_details=False)
    res = model.predict(np.array([p]))[0]
    ERROR_THRESHOLD = 0.25
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]
    # sorting strength probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
    return return_list
from tkinter import *
def send():
    msg = EntryBox.get("1.0", 'end-1c').strip()
    EntryBox.delete("0.0", END)
    if msg != '':
        ChatBox.config(state=NORMAL)
        ChatBox.insert(END, "You: " + msg + '\n\n')
        ChatBox.config(foreground="#446665", font=("Verdana", 12))
        ints = predict_class(msg)
        res = getResponse(ints, intents)
        ChatBox.config(state=DISABLED)
        ChatBox.yview(END)
root = Tk()
root.title("Chatbot")
root.geometry("400x500")
root.resizable(width=FALSE, height=FALSE)
ChatBox.config(state=DISABLED)
# Create Button to send message
SendButton = Button(root, font=("Verdana", 12, 'bold'), text="Send", width="12",
                    height=5, bd=0, bg="#f9a602", activebackground="#3c9d9b",
                    fg='#000000', command=send)
root.mainloop()
Train_chat.py
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
import random
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
words=[]
classes = []
documents = []
ignore_letters = ['!', '?', ',', '.']
intents_file = open('intents.json').read()
intents = json.loads(intents_file)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
# create the training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # lemmatize each word - create base word, in attempt to represent related words
    pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
    # create our bag of words array with 1, if word match found in current pattern
    for word in words:
        bag.append(1 if word in pattern_words else 0)
    # output is a '0' for each tag and '1' for current tag (for each pattern)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
# dtype=object because each row pairs two lists of different lengths
training = np.array(training, dtype=object)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:, 0])
train_y = list(training[:, 1])
print("Training data created")
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd
# output layer contains number of neurons equal to number of intents to predict
# output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives
# good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print("model created")
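The listing above ends the network with a softmax layer and trains it with categorical
cross-entropy. As a worked illustration of those two formulas only, the sketch below
computes them in plain Python; the raw scores are made up, and the one-hot target has the
same shape as output_row in the listing.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Loss for one example: minus the log-probability of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t)


# Made-up raw scores for three intents, and a one-hot row like output_row.
probabilities = softmax([2.0, 1.0, 0.1])
target = [1, 0, 0]
loss = categorical_crossentropy(target, probabilities)

print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]
print(round(loss, 3))  # 0.417
```

The loss shrinks towards zero as the probability assigned to the correct intent
approaches 1, which is what training tries to achieve.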
intents.json
{"intents": [
    {"tag": "greetings",
    },
    {"tag": "name",
     "responses": ["You can call me Snitch", "I'm Snitch", "I'm Snitch your virtual assistant"]
    },
    {"tag": "patterns",
     "patterns": ["How you could help me?", "What you can do?", "What help you provide?", "How you can be helpful?", "What support is offered"],
     "responses": ["I can guide you through \n1. Courses\n2. Course duration\n3. Fees\n4. Location\n5. Semester\n6. Sem Duration\n7. Admission Requirement\n8. Classes\n9. Exams\n10. Hours\n11. Facilities\n12. Fun Activities"]
    },
    {"tag": "courses",
     "patterns": ["what courses are available", "how many courses are there in this college", "courses"],
    },
    {"tag": "courseDuration",
     "responses": ["For B.E, B.Tech it will take 4 years,\n\n For M.E, M.Tech, MBA, MCA it will take 2 years"]
    },
    {"tag": "Location",
    },
    {"tag": "semesters",
     "patterns": ["how many semesters are there in a year", "how many semesters one should study in a year", "how many semesters per year", "semesters"],
    },
    {"tag": "semDuration",
     "patterns": ["how many months are there in a semester", "how long will be a single semester", "sem duration"],
    },
    {"tag": "studentRequirements",
     "responses": ["Eligibility \n1. B.E - 10+2 with 50% + TNEA\n\n2. M.E - Graduation with 50% + GATE\n\n3. B.Tech - 10+2 with 50% + TNEA\n\n4. M.Tech - Graduation with 50% + GATE\n\n5. MBA - Graduation with 50% + TANCET\n\n6. MCA - Graduation with 50% + TANCET"]
    },
    {"tag": "classes",
     "patterns": ["how many classes will be there in a day", "how long are the classes?", "classes"],
     "responses": ["There may be four or five classes per day. Each class will be of 1 hour and 30 minutes."]
    },
    {"tag": "exams",
     "responses": ["There are assignments which carry more weight than your written exams. The assignments have deadlines which you should not exceed if you want to get better marks."]
    },
    {"tag": "hours",
     "patterns": ["what are your hours", "when are you guys open", "what your hours of operation"],
     "responses": ["You can message us here at any hour. But our college premises will be open from 7:00 am to 5:00 pm only."]
    },
    {"tag": "funActivities",
     "patterns": ["will there be any extra curriculum activities?", "does the college conducts any fun program"],
     "responses": ["Yes, of course. Our college not only provides excellent education but also encourages students to take part in different curriculum activities. The college conducts yearly programs like Sports meet, Carnival, Holi festival, etc. \n Also our college has basketball court, badminton court, table tennis, chess, carrom board and many more refreshment zones.\n There are many intra college meets and inter college meets."]
    },
    {"tag": "facilities",
     "patterns": ["what facilities are provided by the college?", "what are the facilities of college for students", "what are the college infrastructures "],
     "responses": ["With excellent education facilities, our college provides various other facilities like 24 hours internet, library, classes with AC, discussion room, canteen, parking space, and student service for any student queries."]
    },
    {"tag": "fee",
     "responses": ["Fees Structure per year, \n\n1. B.E - 60000\n\n2. M.E - 50000\n\n3. B.Tech - 60000\n\n4. M.Tech - 35000\n\n5. MBA - 50000\n\n6. MCA - 41000"]
    },
    {"tag": "goodbye",
    },
    {"tag": "invalid",
     "patterns": ["", "gvsd", "asbhk"],
     "responses": ["Sorry, can't understand you", "Please give me more info", "Not sure I understand"]
    },
    {"tag": "thanks",
     "patterns": ["Thanks", "Thank you", "That's helpful", "Awesome, thanks", "Thanks for helping me"],
]}
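Since every intent is expected to carry a tag, a list of patterns and a list of responses,
a small check of the file before training can catch incomplete entries. The sketch below is
one possible way to do this; the function name, the required-key list and the sample data
are assumptions made for illustration.

```python
import json

# Keys each intent is assumed to need before it can be used for training.
REQUIRED_KEYS = ("tag", "patterns", "responses")


def find_incomplete_intents(data):
    """Return (tag, missing_keys) pairs for intents lacking a required key."""
    problems = []
    for intent in data.get("intents", []):
        missing = [key for key in REQUIRED_KEYS if key not in intent]
        if missing:
            problems.append((intent.get("tag", "<no tag>"), missing))
    return problems


# Made-up sample: the second intent is deliberately missing its responses.
sample = json.loads("""
{"intents": [
  {"tag": "hours",
   "patterns": ["what are your hours"],
   "responses": ["Open from 7:00 am to 5:00 pm."]},
  {"tag": "thanks",
   "patterns": ["Thanks", "Thank you"]}
]}
""")

problems = find_incomplete_intents(sample)
print(problems)  # [('thanks', ['responses'])]
```

Note that json.loads itself rejects trailing commas and unclosed brackets, so a file must
first be syntactically valid JSON before a structural check like this can run.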
C. SCREENSHOTS
F.3: TRAINING DATA CREATED
D. REPORTS
F.5: CHATBOT RESPONSE - 2
E.
F.7: CHATBOT RESPONSE - 4
F.8: CHATBOT RESPONSE - 5
F.9: CHATBOT RESPONSE - 6