INTRODUCTION
1.1. Overview
A college is an educational institution that typically offers undergraduate programs leading to
bachelor's degrees, as well as graduate programs leading to master's degrees and doctoral
degrees. Colleges may be either public or private, and they may offer a wide range of
academic disciplines, such as arts, sciences, humanities, engineering, and business. In many
countries, including India, colleges are an integral part of the higher education system and
are often considered to be the next step after completing high school. College education
provides students with an opportunity to deepen their knowledge in a specific field of study,
develop critical thinking skills, and gain practical experience through internships or research
projects. In addition to academic programs, colleges may offer a range of extracurricular
activities, such as sports teams, clubs, and organizations, to provide students with a well-
rounded educational experience. Colleges may also offer various support services, such as
academic advising, career counselling, and financial aid, to help students succeed in their
studies and prepare for their future careers.
Rule-based chatbots are able to hold basic conversations based on “if/then” logic. These
chatbots do not understand context or intents. Human agents map out conversations via a
flowchart, anticipating what a customer might ask, and program how the chatbot should
respond. We use logical next steps and clear call-to-action buttons to build rule-based
chatbot conversations. Companies design rule-based chatbots to answer simple questions
and often hand web visitors over to a live agent to continue the conversation. They are not
designed to learn and become smarter over time. We can build a rule-based chatbot with
very simple or more complicated rules. They cannot, however, answer any questions outside
of the defined rules. Rule-based chatbots do not learn through interactions and only work
within the scenarios for which they are trained.
There are many existing machine learning algorithms used in chatbots, including but not
limited to:
1. Rule-based chatbots - These chatbots are based on a set of predefined rules that are
used to determine the response to a user's input.
2. Naive Bayes - This algorithm is used for text classification, including sentiment
analysis, spam filtering, and topic classification.
3. Support Vector Machines (SVM) - This algorithm is commonly used in text
classification tasks and can classify data into two or more categories.
4. Decision Trees - This algorithm is used to make decisions based on a series of
questions and answers that lead to a conclusion.
5. Random Forest - This algorithm is a collection of decision trees that work together to
make a prediction.
6. Neural Networks - This algorithm is inspired by the structure of the human brain and
is used for tasks such as image recognition, natural language processing, and speech
recognition.
7. Recurrent Neural Networks (RNN) - This type of neural network is used for tasks
where the input data is a sequence, such as natural language processing.
8. Long Short-Term Memory (LSTM) - This is a type of RNN that is used for tasks
where the input sequence is long and requires the model to remember information
from earlier in the sequence.
9. Transformer - This is a neural network architecture that has gained popularity for
natural language processing tasks, such as language translation and text
summarization.
These are just a few examples of machine learning algorithms used in chatbots. The choice of
algorithm will depend on the specific requirements of the chatbot and the type of data it will
be processing.
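To make one of these algorithms concrete, the sketch below trains a minimal multinomial Naive Bayes intent classifier in plain Python. The intents, training phrases, and Laplace smoothing shown here are illustrative assumptions, not part of the CollegeBot dataset:

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative training set: phrases labelled with a chatbot intent.
TRAIN = [
    ("what are the college timings", "timings"),
    ("when does college open", "timings"),
    ("how do i apply for admission", "admission"),
    ("what is the admission procedure", "admission"),
    ("when is the exam", "exam"),
    ("show me the exam timetable", "exam"),
]

def train_nb(data):
    """Count word frequencies per intent for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in data:
        intent_counts[intent] += 1
        for w in text.split():
            word_counts[intent][w] += 1
            vocab.add(w)
    return word_counts, intent_counts, vocab

def predict(text, word_counts, intent_counts, vocab):
    """Pick the intent with the highest log-probability (Laplace smoothing)."""
    total = sum(intent_counts.values())
    best, best_score = None, float("-inf")
    for intent in intent_counts:
        score = math.log(intent_counts[intent] / total)   # class prior
        n = sum(word_counts[intent].values())
        for w in text.split():
            score += math.log((word_counts[intent][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = intent, score
    return best

wc, ic, vocab = train_nb(TRAIN)
print(predict("exam timetable please", wc, ic, vocab))   # -> exam
```

A production chatbot would use a library implementation over a much larger corpus, but the decision rule is the same.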
Disadvantages
While machine-learning-based chatbots have numerous advantages, they also have some
disadvantages. Here are a few of them:
1. Data Dependency: Machine-learning-based chatbots require a lot of training data
to learn and understand users' queries. They can be less effective if there isn't
enough data or if the quality of the data is poor.
2. Limited Scope: These chatbots have a limited scope and can only answer questions
that they are trained for. If a user asks a question that the chatbot hasn't been trained
for, it may not be able to provide a useful response.
3. Lack of Personality: Machine learning chatbots are often perceived as impersonal
because they don't have the ability to convey emotions or respond in a conversational
manner. This can make the interaction feel robotic and unsatisfying for some users.
4. Requires Frequent Updates: Machine learning algorithms require frequent updates
to stay relevant. Chatbots that are not updated frequently may provide outdated
responses or incorrect information.
5. Security Risks: As with any system that uses data, machine learning chatbots are
vulnerable to security risks. If a chatbot is not properly secured, it can be hacked and
sensitive information can be stolen.
Proposed System
The proposed system of "CollegeBot: An AI-based College Chatbot" includes the following
components:
AI chatbots are more complex bots built on Natural Language Processing (NLP) and
Machine Learning (ML) algorithms. Unlike rule-based chatbots, AI-powered chatbots learn
as they go: they can answer a user with non-predefined responses, and ML helps them learn
from each interaction with the user and remember the user's preferences. Human agents
train such chatbots to decipher free-form conversations based on "intents" and "entities."
Entities identify a subject: people, places, or things. Intents are a bit harder to grasp; an
intent captures what the user is trying to accomplish with a query.
Training: The system first collects a dataset related to college and pre-processes it using
techniques such as stemming, lemmatization, removal of stop words, and tokenization. Then,
it uses feature extraction techniques such as bag of words (BoW) and term frequency–inverse
document frequency (TF-IDF) to represent the text data in a vectorised form. The system
then trains the model using Natural Language Processing (NLP) techniques.
Collecting dataset related to College: This step involves gathering relevant data
related to college from various sources within the college.
Pre-processing: The collected data is pre-processed to make it suitable for analysis.
This includes techniques such as stemming, lemmatization, removal of stop words,
and tokenization.
Feature Extraction: The pre-processed data is then converted into a numerical
format that can be used for analysis. This is done through techniques such as bag of
words (BoW) and term frequency–inverse document frequency (TF-IDF).
Training: The extracted features are used to train the chatbot using natural language
processing (NLP) techniques.
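The feature-extraction step can be made concrete with a small plain-Python sketch that computes bag-of-words counts and TF-IDF weights. Real systems would typically use a library such as scikit-learn; the three-document corpus here is an invented example:

```python
import math
from collections import Counter

corpus = [
    "college admission process",
    "college exam timetable",
    "admission fee details",
]

docs = [doc.split() for doc in corpus]
vocab = sorted({w for d in docs for w in d})

# Bag of words: one raw-count vector per document.
bow = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency.
def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)          # documents containing the term
    idf = math.log(len(docs) / df)
    return tf * idf

tfidf = [[round(tf_idf(w, d), 3) for w in vocab] for d in docs]

print(vocab)
print(bow[0])     # counts for "college admission process"
```

Note how a word unique to one document ("process") gets a higher TF-IDF weight than a word shared across documents ("college"), which is exactly why TF-IDF is preferred when distinctive terms matter.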
Testing: In the testing phase, Students/Staff/Parents/Others register and log in to the
chatbot. They can input their queries in English, and the system uses NLP techniques such
as intent recognition, entity recognition, dependency parsing, and response generation to
provide relevant and accurate answers. The system gives responses in text as well as
text-to-speech format.
Overall, the proposed system aims to provide an efficient and user-friendly platform for
Students/Staff/Parents/Others to get information related to college using AI and NLP
techniques.
Inputting queries: Students/Staff/Parents/Others would input their queries related to
college in English.
Intent recognition: The chatbot would need to identify the intent of the user's
query, such as whether they are looking for information on the college.
Entity recognition: The chatbot would also need to recognize any relevant entities in
the user's query, such as specific class timings, college enquiries, or the exam
timetable.
Dependency parsing: This step involves analysing the relationships between the
words in the user's query to better understand the meaning behind the query.
Generating response: Finally, the chatbot would use the information gathered from
the intent recognition, entity recognition, and dependency parsing steps to generate a
response to the user’s query.
Text and Speech Response: The response would be given in both text and speech
format to make it more accessible for the Students/Staff/Parents/Others. The chatbot
responds to the user’s query with both text and speech.
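The query-handling steps above can be illustrated with a deliberately simplified, rule-style sketch of intent and entity recognition. CollegeBot itself uses trained NLP models; the keyword tables and canned responses below are illustrative assumptions:

```python
# Hypothetical keyword tables standing in for trained intent/entity models.
INTENTS = {
    "ask_timetable": {"timetable", "schedule", "timings"},
    "ask_admission": {"admission", "apply", "enrol"},
}
ENTITIES = {"exam": "EVENT", "library": "PLACE", "principal": "PERSON"}

def recognize(query):
    """Return (intent, entities) for a query via simple keyword overlap."""
    words = set(query.lower().split())
    intent = next((name for name, kws in INTENTS.items() if words & kws), "unknown")
    entities = {w: ENTITIES[w] for w in words if w in ENTITIES}
    return intent, entities

def respond(query):
    intent, entities = recognize(query)
    if intent == "ask_timetable" and "exam" in entities:
        return "The exam timetable is published on the college notice board."
    if intent == "ask_admission":
        return "Admissions open in June; please register online."
    return "Sorry, I did not understand. Could you rephrase?"

print(respond("When is the exam timetable released"))
```

A trained model replaces the keyword overlap with learned classification, but the intent-then-entity-then-response flow is the same.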
Advantages
The proposed system of "CollegeBot: An AI based College Chatbot" has several advantages,
such as:
Quick and Convenient Access: Students/Staff/Parents/Others can easily access the
system from anywhere, at any time using a smartphone, laptop or tablet, making it a
quick and convenient way for them to get help and support.
Personalized Assistance: The chatbot provides personalized assistance to
Students/Staff/Parents/Others, based on their specific queries, allowing them to
receive customized guidance and support.
Instant Response: The system provides an instant response to
Students/Staff/Parents/Others, helping them to save time and make more informed
decisions.
24/7 Availability: The chatbot is available 24/7, meaning that
Students/Staff/Parents/Others can access help and support at any time, without having
to wait for office hours.
Cost-effective: The system is cost-effective, as it requires minimal human resources
to operate and can handle multiple queries simultaneously.
Increased Productivity: By providing Students/Staff/Parents/Others with the
information and support they need, the chatbot can help to increase their
productivity.
Improved Accuracy: The use of natural language processing (NLP) and machine
learning algorithms can help to improve the accuracy of the system, ensuring that
Students/Staff/Parents/Others receive reliable and trustworthy information.
CHAPTER
SYSTEM DESIGN
System Architecture
[Figure: System architecture diagram. A Web Admin trains CollegeBot through the
CollegeBot Web App. Students/Staff/Parents/Other users log in and submit an input query;
CollegeBot response prediction applies intent recognition, entity recognition, and
dependency parsing, and returns the CollegeBot response as text and speech.]
System Flow
The system flow of "CollegeBot: An AI based College Chatbot" can be described as follows:
1. Admin collects the dataset related to college and performs pre-processing tasks such
as stemming, lemmatization, removal of stop words, and tokenization.
2. Admin performs feature extraction using two techniques: bag of words (BoW) and
term frequency–inverse document frequency (TF-IDF).
3. Admin trains the model using natural language processing (NLP).
4. Users register and login to the system.
5. Users input their queries in English.
6. The system predicts the intent of the query using NLP.
7. The system recognizes entities such as staff names, the college exam schedule,
college admission enquiries, courses conducted, etc. from the query.
8. The system performs dependency parsing to understand the grammatical structure of
the query.
9. The system generates a response to the query based on the intent, recognized entities,
and grammatical structure using NLP.
10. The system gives the response to the user in text format as well as text to speech
format.
11. The user can ask more queries or logout from the system.
System Implementation
The proposed system of "CollegeBot: An AI based College Chatbot" can be developed using
Python Flask, NLP packages, and a MySQL database. The system architecture can include
the following components:
1. Admin Module: This module includes the functionality of collecting the dataset
related to college, pre-processing it using stemming, lemmatization, removal of stop
words, and tokenization techniques. Then, the feature extraction is done using two
techniques - bag of words (BoW) and term frequency-inverse document frequency
(TF-IDF). Finally, the NLP model is trained using the extracted features.
2. User Module: This module supports two types of users – Student/Staff/Parents/Other
users and system administrators. Users can register and log in to the system. They can
input their queries in English, which are analysed by the NLP model to predict the
intent, recognize entities, and generate responses using dependency parsing. The
responses can be in text as well as text-to-speech format. System administrators can
access the dataset, train the model, and monitor the system's performance.
3. NLP Packages: The NLP packages used in the system can include NLTK, spaCy, and
TextBlob. These packages can provide functionalities like tokenization, stemming,
lemmatization, removal of stop words, entity recognition, dependency parsing, and
sentiment analysis.
4. MySQL Database: The MySQL database can be used to store the dataset, user
information, and system logs. The dataset can be stored in a separate table, and the
user information can be stored in a separate table with appropriate attributes. The
system logs can be stored in a separate table to monitor the system's performance.
5. Flask Framework: The Flask framework can be used to develop the web application.
It can provide functionalities like routing, request handling, session management, and
user authentication.
The overall architecture can be designed with the Flask framework as the front end, NLP
packages as the processing engine, and MySQL database as the data storage. This system can
provide an efficient and user-friendly solution for users to get their queries resolved related to
college.
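A minimal sketch of how the Flask front end could wire these components together is shown below. The route names, the in-memory user store, and the predict_response helper are illustrative assumptions; the real system would query the trained NLP model and the MySQL tables:

```python
from flask import Flask, request, jsonify, session

app = Flask(__name__)
app.secret_key = "dev-only-secret"   # session support for login

USERS = {}                           # stand-in for the MySQL user table

def predict_response(query):
    """Placeholder for the trained NLP intent -> response pipeline."""
    return "Admissions open in June." if "admission" in query.lower() else "Please rephrase."

@app.route("/register", methods=["POST"])
def register():
    data = request.get_json()
    USERS[data["email"]] = data["password"]
    return jsonify({"status": "registered"})

@app.route("/login", methods=["POST"])
def login():
    data = request.get_json()
    if USERS.get(data["email"]) == data["password"]:
        session["user"] = data["email"]
        return jsonify({"status": "ok"})
    return jsonify({"status": "invalid"}), 401

@app.route("/query", methods=["POST"])
def query():
    q = request.get_json()["query"]
    return jsonify({"response": predict_response(q)})

# To serve locally:
# app.run(debug=True)
```

Flask's routing and session handling cover the registration, login, and query endpoints; storing passwords in plain text as above is for brevity only and would be replaced by hashed credentials in MySQL.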
System Description
CollegeBot is an AI-based chatbot designed for users to provide college information and
support. The system consists of two main components: the admin panel and the user-facing
chatbot.
The admin panel is responsible for collecting and pre-processing the college data required
for training the chatbot. The admin panel collects the relevant college dataset and pre-
processes the data through techniques such as stemming, lemmatization, removal of stop
words, and tokenization. It then extracts features from the data using two techniques: bag of
words (BoW) and term frequency–inverse document frequency (TF-IDF). After feature
extraction, the data is trained using natural language processing (NLP) techniques. The user-
facing chatbot is designed to handle user queries and provide relevant information. It uses
NLP techniques to predict user intent, recognize entities, and perform dependency parsing to
generate appropriate responses. The chatbot can communicate in text as well as text-to-
speech format for enhanced accessibility. The chatbot also provides registration and login
functionality to personalize the user experience. Overall, CollegeBot aims to provide users
with access to college information and support through an easy-to-use, conversational
interface. The system allows users to obtain accurate and timely information that can help
them make informed decisions about college.
Modules Description
1. CollegeBot Web App
The design and development of the CollegeBot web app involve integrating different
technologies and tools to create a seamless user experience for users. The front end, back end,
and database work together to provide an efficient and reliable platform for users to get
answers to their queries related to college.
1.1. Front End: The front end of the CollegeBot web app was implemented using HTML,
CSS, and JavaScript. The user interface allows student/staff/parents/other users to register,
log in, and input queries related to college. The chatbot response is displayed as text and also
as text-to-speech output.
1.2. Back End: The back end of the CollegeBot web app was implemented using Python
Flask. Flask is a web framework that allows developers to create web applications using
Python. The Flask app receives user queries from the front end and passes them to the NLP
module for prediction. The NLP module then generates the response, which is sent back to
the front end for display.
1.3. Database: The database used in the CollegeBot web app is MySQL. The database stores
user information such as name, email address, and password for registration and login
purposes. It also stores the chatbot training dataset and the output of the NLP module for each
user query.
2. CollegeBot Chat Window
The chat window of CollegeBot is the main interface where users can interact with the
chatbot. It is designed using HTML, CSS, and JavaScript and integrated with the backend
developed using Python Flask and MySQL.
The following are the modules and their descriptions used for the development of the
CollegeBot chat window:
2.1. HTML/CSS: The user interface is developed using HTML and CSS. The HTML file
includes the basic structure of the chat window, such as chat area, user input area, and send
button. The CSS file is used to style the chat window, such as color, font, and layout.
2.2. JavaScript: The chat window is interactive and dynamic, which is achieved by using
JavaScript. The JavaScript file handles the user input and sends it to the backend for
processing. It also receives the response from the backend and displays it on the chat
window.
2.3. Python Flask: The backend of the CollegeBot chat window is developed using Python
Flask. The Flask app receives the user input from the JavaScript file and processes it using
the NLP algorithm. The response is generated by the Flask app and sent back to the
JavaScript file.
2.4. MySQL: The chatbot's database is developed using MySQL. It stores the user's login
credentials, user input, and chatbot responses. The MySQL database is integrated with the
Python Flask app to retrieve and store data.
Overall, the chat window of CollegeBot is an interactive and user-friendly interface that
allows users to interact with the chatbot and get answers to their college-related queries.
3. End User Interface
CollegeBot is an AI-based chatbot designed to assist users with their college queries. The
chatbot has two interfaces: one for the admin and another for the users.
3.1. Admin Interface: The admin interface consists of modules for collecting, pre-
processing, and training the chatbot with data related to college. The admin can perform the
following tasks:
Collect dataset related to college
Pre-process the data by performing stemming, lemmatization, removal of stop
words, and tokenization
Feature extraction using two techniques: bag of words (BoW) and term
frequency–inverse document frequency (TF-IDF)
Train the chatbot using natural language processing techniques
3.2. User Interface: The user interface consists of modules for registering, logging in,
inputting queries in English, and receiving responses from the chatbot. The users can perform
the following tasks:
Register with the chatbot by providing their name, email address, and
password
Log in to their account
Input queries related to college in English
Receive responses from the chatbot that are generated using natural language
processing techniques
The chatbot responds in text and text-to-speech formats, making it accessible
to users who are not proficient in reading and writing.
Both the admin and user interfaces are designed using Python Flask and MySQL. The chat
window interface is designed using HTML, CSS, and JavaScript.
4. CollegeBot Training
CollegeBot, being an AI-based college chatbot, requires extensive training in natural
language processing (NLP) techniques. The following are the sub-modules involved in
training the CollegeBot chatbot:
[Figure: Training pipeline – College.csv is imported through the web UI, used for
training and testing, and stored.]
Data Exploration: The module performs an initial exploration of the dataset to understand
the characteristics of the data. This includes calculating basic statistics such as mean,
median, mode, and standard deviation for each column of the dataset, and visualizing the
distribution of data using histograms, scatterplots, and box plots.
[Figure: Data exploration flow – read, visualize, import via the web UI, and store.]
4.3. Pre-processing
Pre-processing is an important step in the development of any NLP application, including
chatbots. The pre-processing module of CollegeBot includes the following sub-modules:
Tokenization: This module is responsible for breaking down the text input into smaller units
such as words, phrases, or sentences, known as tokens. In CollegeBot, tokenization is done
using Python's nltk package.
Stopword removal: This module removes common words in the English language that do
not add much meaning to the text, such as "the", "and", "in", etc. This helps reduce the
dimensionality of the data and speed up the processing. In CollegeBot, stopword removal is
also done using nltk.
Stemming and Lemmatization: This module reduces words to their root form or lemma,
which helps in reducing the number of unique words in the data and grouping together words
with similar meanings. In CollegeBot, stemming and lemmatization are performed using
nltk.
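The three pre-processing sub-modules can be sketched in plain Python as follows. CollegeBot uses nltk's tokenizers, stopword lists, and stemmers; the tiny stopword set and suffix-stripping stemmer below are simplified stand-ins for illustration:

```python
import re

STOPWORDS = {"the", "is", "in", "and", "a", "of", "for", "what", "are"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Very crude suffix stripping, standing in for nltk's PorterStemmer."""
    for suffix in ("ing", "ions", "ion", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

query = "What are the admission timings for the college?"
tokens = tokenize(query)
cleaned = remove_stopwords(tokens)
stems = [stem(t) for t in cleaned]
print(cleaned)   # ['admission', 'timings', 'college']
print(stems)     # ['admiss', 'timing', 'college']
```

Note how stopword removal shrinks the query to its content words before stemming collapses inflected forms, which is what reduces the dimensionality of the later feature matrices.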
[Figure: Pre-processing pipeline – the college dataset passes through tokenization and
stemming before storage.]
Feature extraction in DL with the context of words is also essential. The technique used for
this purpose is word2vec, a neural-network-based algorithm. Equation 5 below shows how
word2vec manages the word context with the help of probability measures. D represents the
set of word–context pairs, and (w, c) is a word–context pair drawn from the large set D:

arg max_θ Π_{(w,c) ∈ D} p(c | w; θ)    (Eq. 5)

The multi-word context is also a variant of word2vec, as shown in Equation 6: a
variable-length context c_1, ..., c_k is scored by treating its words as conditionally
independent given the word w:

p(c_1, ..., c_k | w; θ) = Π_{i=1}^{k} p(c_i | w; θ)    (Eq. 6)
4.4.2. Bag of Words (BoW)
This is a commonly used technique for feature extraction. In this approach, each word in the
dataset is treated as a feature, and a matrix is created to represent the frequency of each word
in each document. This matrix is used as input to machine learning models.
Word Embedding Layer
Embedding is the representation of words as real numbers. Many machine learning and DL
algorithms cannot process data in raw (text) form and can only accept numerical values as
input for learning. Word embedding converts text into numbers: it extracts relevant features
from the textual data and structures them as real-valued vectors, using a word-mapping
dictionary to convert each term (word) to a real-valued vector. There are two main problems
with traditional machine learning feature engineering techniques: one is the sparse vectors
used for data representation, and the other is that they largely ignore the meaning of words.
In embedding vectors, similar words are represented by nearly equal real-valued numbers.
For example, the terms love and affection will be near each other in the embedding space.
[Figure: Feature extraction – the preprocessed dataset is transformed with bag of words
and TF-IDF, visualized as a word cloud, and stored.]
4.5. Classification and Training
Training the CollegeBot model: Once the features have been extracted, the machine learning
model is trained on the training set, typically using supervised learning with architectures
such as LSTM.
A classifier model is designed that can be trained on the corpus with respect to the target
variable, i.e. the tag from the corpus.
The encoder units help to "understand" the input sequence ("Are you free tomorrow?"), and
the decoder decodes the "thought vector" to generate the output sequence ("Yes, what's
up?"). The thought vector can be seen as a neural representation of the input sequence,
which only the decoder can look inside to produce the output sequence.
Accuracy: It measures the percentage of correctly predicted labels out of all labels. It can be
calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: It is the ratio of correctly predicted positive instances to the total predicted
positive instances. It can be calculated as:
Precision = TP / (TP + FP)
Recall: It is the ratio of correctly predicted positive instances to the total actual positive
instances. It can be calculated as:
Recall = TP / (TP + FN)
F1-Score: It is the harmonic mean of precision and recall. It can be calculated as:
F1-Score = 2 * ((Precision * Recall) / (Precision + Recall))
These performance metrics can be used to measure the accuracy of CollegeBot in recognizing
user intents and providing appropriate responses.
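These formulas translate directly into code. The sketch below evaluates them on an invented confusion-matrix count (TP = 40, TN = 45, FP = 5, FN = 10), used purely for illustration:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (p * r) / (p + r)

tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn))        # 0.85
print(precision(tp, fp))               # 0.888...
print(recall(tp, fn))                  # 0.8
print(round(f1_score(tp, fp, fn), 3))  # 0.842
```

Because F1 is a harmonic mean, it sits below the arithmetic mean of precision and recall and penalizes a chatbot that is strong on one metric but weak on the other.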
Algorithm
Step 1: Data Extraction and Preprocessing
The dataset hails from the college – it contains college information.
Parse each .yml file:
1. Concatenate two or more sentences if the answer has two or more of them.
2. Remove unwanted data types which are produced while parsing the data.
3. Append <START> and <END> tags to all the answers.
4. Create a Tokenizer and load the whole vocabulary (questions + answers) into it.
The three arrays required by the model are encoder_input_data, decoder_input_data and
decoder_output_data:
encoder_input_data: Tokenize the questions and pad them to the maximum length.
decoder_input_data: Tokenize the answers and pad them to the maximum length.
decoder_output_data: Tokenize the answers and remove the first element (the <START> tag
added earlier) from all the tokenized answers.
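The tokenisation and padding of Step 1 can be sketched in plain Python. In the real pipeline Keras's Tokenizer and pad_sequences do this work; the two question/answer pairs here are invented for illustration:

```python
questions = ["what are the college timings", "when is the exam"]
answers = ["<START> nine to four <END>", "<START> exam is in may <END>"]

# Build a word -> integer index (0 is reserved for padding).
vocab = sorted({w for s in questions + answers for w in s.split()})
word_index = {w: i + 1 for i, w in enumerate(vocab)}

def tokenize(sentence):
    return [word_index[w] for w in sentence.split()]

def pad(seq, maxlen):
    """Post-pad with zeros, like pad_sequences(padding='post')."""
    return seq + [0] * (maxlen - len(seq))

max_q = max(len(q.split()) for q in questions)
encoder_input_data = [pad(tokenize(q), max_q) for q in questions]

max_a = max(len(a.split()) for a in answers)
decoder_input_data = [pad(tokenize(a), max_a) for a in answers]
# decoder_output_data drops the leading <START> token from each answer,
# so it is the decoder input shifted one step to the left.
decoder_output_data = [pad(tokenize(a)[1:], max_a) for a in answers]

print(encoder_input_data)
print(decoder_output_data)
```

The one-step shift between decoder input and decoder output is what lets the model learn to predict the next token at every position.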
Step 2: Defining the Encoder-Decoder Model
The model will have Embedding, LSTM and Dense layers. The basic configuration is
as follows:
1. 2 Input layers: one for encoder_input_data and another for decoder_input_data.
2. Embedding layer: for converting token vectors to fixed-size dense vectors. (Note:
don't forget the mask_zero=True argument here.)
3. LSTM layer: provides access to Long Short-Term Memory cells.
Working:
The encoder_input_data comes in through the Embedding layer (encoder_embedding).
The output of the Embedding layer goes to the LSTM cell, which produces 2 state
vectors (h and c, the encoder_states).
These states are set in the LSTM cell of the decoder.
The decoder_input_data comes in through the Embedding layer.
The embeddings go into the LSTM cell (which holds the encoder states) to produce the
output sequences.
Long Short Term Memory (LSTM):
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network
capable of learning order dependence in sequence prediction problems.
This is a behaviour required in complex problem domains like machine translation,
speech recognition, and more.
The success of LSTMs lies in their claim to be one of the first approaches to overcome
the technical problems and deliver on the promise of recurrent neural networks.
An LSTM network is comprised of different memory blocks called cells.
There are two states that are being transferred to the next cell; the cell state and the
hidden state.
The memory blocks are responsible for remembering things, and manipulations to this
memory are done through three major mechanisms, called gates.
The key to LSTMs is the cell state, often drawn as the horizontal line running through the
top of LSTM diagrams.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain,
with only some minor linear interactions. It’s very easy for information to just flow
along it unchanged.
The LSTM does have the ability to remove or add information to the cell state,
carefully regulated by structures called gates.
Gates are a way to optionally let information through. They are composed of a sigmoid
neural net layer and a pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of
each component should be let through. A value of zero means “let nothing through,”
while a value of one means “let everything through!”
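The gate behaviour described above can be seen numerically: a sigmoid squashes any activation into (0, 1), and pointwise multiplication by that value scales how much of each cell-state component passes through. The activation and cell-state vectors below are invented for illustration:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Gate activations: strongly negative closes the gate, strongly positive opens it.
gate_logits = [-6.0, 0.0, 6.0]
gate = [sigmoid(x) for x in gate_logits]

cell_state = [2.0, 2.0, 2.0]
gated = [g * c for g, c in zip(gate, cell_state)]   # pointwise multiplication

print([round(g, 3) for g in gate])    # [0.002, 0.5, 0.998]
print([round(v, 3) for v in gated])   # [0.005, 1.0, 1.995]
```

The first component is almost entirely blocked ("let nothing through"), the last passes nearly unchanged ("let everything through"), and the middle is halved.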
Step 4: Training the Model
We train the model for 150 epochs with the RMSprop optimizer and the
categorical_crossentropy loss function.
Model training accuracy = 0.96, i.e., 96%.
Step 5: Defining Inference Models
Encoder inference model: Takes the question as input and outputs LSTM states (h
and c).
Decoder inference model: Takes in 2 inputs, one being the LSTM states (the output of the
encoder model), the second being the answer input sequences (the ones not having the
<START> tag). It outputs the answers for the question which we fed to the encoder model,
together with its state values.
Step 6: Talking with our Chatbot
First, we define a method str_to_tokens which converts string questions to integer tokens
with padding.
1. First, we take a question as input and predict the state values using enc_model.
2. We set the state values in the decoder's LSTM.
3. Then, we generate a sequence which contains the <START> element.
4. We input this sequence into the dec_model.
5. We replace the <START> element with the element which was predicted by the
dec_model and update the state values.
6. We carry out the above steps iteratively till we hit the <END> tag or the maximum
answer length.
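The decoding loop above can be sketched without a trained network by substituting a hypothetical next-token table for the dec_model prediction. The table, tokens, and resulting answer are invented; a real model would predict the next token from the LSTM states:

```python
START, END = "<START>", "<END>"
MAX_LEN = 10

# Hypothetical stand-in for dec_model: maps the last token to the next one.
NEXT_TOKEN = {
    START: "classes",
    "classes": "begin",
    "begin": "at",
    "at": "nine",
    "nine": END,
}

def decode():
    """Greedy decoding: feed back each predicted token until <END> or MAX_LEN."""
    answer, token = [], START
    while len(answer) < MAX_LEN:
        token = NEXT_TOKEN[token]        # one "prediction" step
        if token == END:
            break
        answer.append(token)
    return " ".join(answer)

print(decode())   # classes begin at nine
```

The MAX_LEN cap matters because a model that never emits <END> would otherwise loop forever.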
Feasibility Study
A feasibility study is an important process that evaluates the potential success of a proposed
system. Here is a feasibility study of "CollegeBot: An AI based College Chatbot" developed
with Python Flask, NLP packages, and MySQL:
Technical Feasibility:
The proposed system is technically feasible, as it uses commonly available
technologies such as Python Flask and MySQL.
Python Flask provides a flexible framework for developing web applications and
chatbots, while MySQL is a reliable and popular database management system.
NLP packages such as NLTK, spaCy, and Scikit-learn are widely used and have
extensive documentation and community support, making the development process
smoother.
The use of AI techniques such as bag of words and TF-IDF for feature extraction,
along with NLP for language processing, adds complexity to the system but also
enhances its accuracy and performance.
Operational Feasibility:
The proposed system is operationally feasible, as it can be easily accessed by
users via a web or mobile interface.
The system requires internet connectivity and a device capable of running a web
browser, which may be a challenge in remote areas with poor connectivity or
limited access to technology.
The system can be used by users of all ages and education levels, provided they
have basic English language proficiency.
The system can provide support and information to users on a 24/7 basis, allowing
them to access relevant information at their convenience.
Economic Feasibility:
The proposed system is economically feasible, as it does not require significant
hardware or software investment.
The use of open-source technologies such as Python Flask and NLP packages
reduces development costs, while the availability of college datasets and resources
online makes it easier to gather relevant data.
However, the system will require ongoing maintenance and support, as well as
regular updates to keep it relevant and accurate.
Legal and Ethical Feasibility:
The proposed system is legally and ethically feasible, as it does not violate any
laws or ethical standards.
The system will require adherence to data privacy and security standards, such as
encrypting user data and complying with data protection laws.
The system should also be designed to avoid bias or discrimination based on
factors such as race, gender, or socio-economic status.
Overall, the feasibility study suggests that "CollegeBot: An AI based College Chatbot"
developed with Python Flask, NLP packages, and MySQL is a viable solution that can provide
users with valuable support and information related to college.
Software Testing
Software testing is the process of evaluating a software system or application to ensure that it
meets its requirements and functions as intended. In the case of CollegeBot, testing is
essential to ensure that the chatbot functions correctly and provides accurate responses to user
queries.
The following are some of the types of testing that can be performed on CollegeBot:
1. Functional Testing: This type of testing checks whether the chatbot functions as
intended and performs its specified functions. It involves testing the chatbot's user
interface, input validation, database interaction, and other functional requirements.
2. Usability Testing: This type of testing focuses on the chatbot's ease of use and user
experience. It involves testing how easily users can interact with the chatbot, how
easily they can access the information they need, and how the chatbot responds to
their queries.
3. Performance Testing: This type of testing checks the chatbot's performance under
different load conditions. It involves testing the chatbot's response time, scalability,
and resource utilization under varying levels of user activity.
4. Security Testing: This type of testing checks whether the chatbot is secure from
potential security threats. It involves testing the chatbot's authentication and
authorization mechanisms, data encryption, and other security measures.
5. Compatibility Testing: This type of testing checks the chatbot's compatibility with
different hardware and software environments. It involves testing the chatbot's
compatibility with different web browsers, operating systems, and other software
components.
In the case of CollegeBot, it is essential to perform comprehensive testing to ensure that the
chatbot is functioning correctly and providing accurate responses to user queries. By
performing various types of testing, developers can identify and fix any bugs, errors, or issues
that may arise during the testing process.
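The shape of such a functional test can be sketched in a few lines of Python. The `get_response` function below is a hypothetical stand-in for the chatbot's reply logic (the real CollegeBot routes queries through Flask and the trained NLP model); only the test structure is the point here.

```python
def get_response(query: str) -> str:
    """Toy rule-based responder used only to illustrate the test shape."""
    q = query.lower()
    if "admission" in q:
        return "Admissions open in June. See the admissions office for forms."
    if "timetable" in q:
        return "Timetables are published on the college notice board."
    return "Sorry, I can only answer college-related questions."

def test_functional():
    # Intent is recognised and a relevant, non-empty response is produced.
    assert "Admissions" in get_response("When does admission start?")
    assert "Timetables" in get_response("Where can I find the timetable?")
    # Out-of-scope queries get the fallback message.
    assert get_response("What is the weather?").startswith("Sorry")

test_functional()
print("all functional tests passed")
```

In practice these assertions would run against the Flask endpoint itself rather than a local function, but the pass/fail criteria are the same.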
Test Cases
1. Test Case ID: TRN001
Input: A dataset of 1000 college-related sentences
Expected Result: The dataset is successfully imported into the system for training.
2. Test Case ID: TRN002
Input: Pre-processed dataset with stemming, lemmatization, stop word removal and
tokenization applied
Expected Result: The pre-processing step should successfully produce a clean and consistent
dataset for training.
3. Test Case ID: TRN003
Input: Bag of words technique applied for feature extraction
Expected Result: The bag of words technique should successfully create a feature matrix
from the pre-processed dataset.
4. Test Case ID: TRN004
Input: Term frequency-inverse document frequency (TF-IDF) technique applied for feature
extraction
Expected Result: The TF-IDF technique should successfully create a feature matrix from the
pre-processed dataset.
5. Test Case ID: TRN005
Input: The chatbot is trained using the feature matrix and NLP techniques
Expected Result: The chatbot successfully learns to recognize user intent, entities and
dependencies, and generate appropriate responses.
6. Test Case ID: TST001
Input: A set of 20 test queries related to college in English
Expected Result: The chatbot successfully recognizes the intent, entities and dependencies of
the test queries and generates appropriate responses.
7. Test Case ID: TST002
Input: A set of 20 test queries related to college in other languages (e.g. Spanish, French,
Chinese)
Expected Result: The chatbot should be able to recognize the language of the input and
respond with an appropriate message informing the user that the chatbot is only able to
process queries in English.
8. Test Case ID: TST003
Input: A set of 20 test queries related to non-college topics
Expected Result: The chatbot should recognize that the queries are not related to college and
generate an appropriate response, such as suggesting the user to search for information using
a search engine.
9. Test Case ID: TST004
Input: Testing the text-to-speech functionality by inputting a query and checking if the
chatbot generates a speech output
Expected Result: The chatbot should successfully generate a speech output in response to the
user's query.
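Test cases TRN002-TRN004 above cover pre-processing, Bag of Words, and TF-IDF. The stdlib-only sketch below shows what those steps compute; the corpus and the stop-word list are illustrative, not the project's actual dataset, and a real system would use a library vectorizer rather than this hand-rolled version.

```python
from collections import Counter
import math

STOP_WORDS = {"the", "is", "a", "what", "are"}  # illustrative subset

def preprocess(sentence):
    # Lowercase, tokenize on whitespace, drop stop words (TRN002).
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

corpus = [
    "What are the courses offered",
    "What is the admission procedure",
    "What are the college timings",
]
docs = [preprocess(s) for s in corpus]
vocab = sorted({w for d in docs for w in d})

# Bag of Words: raw term counts per document (TRN003).
bow = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency (TRN004).
def tf_idf(doc, word):
    tf = Counter(doc)[word] / len(doc)
    df = sum(1 for d in docs if word in d)
    return tf * math.log(len(docs) / df)

tfidf = [[tf_idf(d, w) for w in vocab] for d in docs]
print(len(bow), len(bow[0]))  # number of documents x vocabulary size
```

Each row of `bow` or `tfidf` is the feature vector for one sentence, which is exactly the "feature matrix" the training test cases expect.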
Test Report
Test Title: Test Report for CollegeBot: An AI based College Chatbot
Introduction
CollegeBot is an AI-based chatbot that provides assistance to users by answering their
queries related to college. The chatbot is developed using Python Flask NLP Packages and
MySQL. This test report provides an overview of the testing process, the objectives of
testing, the test environment, the test results, and the conclusions drawn from the testing.
Test Objective
The objective of this testing is to verify the functionality and performance of the CollegeBot
chatbot. The testing will ensure that the chatbot is capable of correctly processing user
queries and providing accurate and relevant responses.
Test Scope
The testing will cover the following aspects of the CollegeBot chatbot:
User registration and login functionality
User query processing and response generation
Accuracy and relevance of chatbot responses
Test Environment
The following environment was used for testing:
Operating System: Windows 10
Python Version: 3.9.6
Flask Version: 2.0.1
MySQL Version: 8.0.26
Test Result
The testing process was carried out using a combination of manual and automated testing
techniques. The following test cases were executed, and the results were recorded:
TC ID | Input                                      | Expected Result                                                     | Result
TC001 | User registration with valid credentials   | User account created successfully                                   | PASS
TC002 | User registration with invalid credentials | Error message displayed                                             | PASS
TC003 | User login with valid credentials          | Login successful                                                    | PASS
TC004 | User login with invalid credentials        | Error message displayed                                             | PASS
TC005 | User query related to college admission    | Relevant response generated                                         | PASS
TC006 | User query related to timetable            | Relevant response generated                                         | PASS
TC007 | User query related to results              | Relevant response generated                                         | PASS
TC008 | User query related to courses offered      | Relevant response generated                                         | PASS
TC009 | User query with misspelled words           | Chatbot suggests corrected spelling and generates relevant response | PASS
TC010 | User query with incorrect grammar          | Chatbot suggests corrected grammar and generates relevant response  | PASS
CHAPTER 7
SYSTEM SPECIFICATION
7.1 Hardware specification
Processors: Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59 GHz (1
socket, 2 cores, 2 threads per core), 8 GB of DRAM
Disk space: 320 GB
Operating systems: Windows® 10, macOS*, and Linux*
7.2 Software specification
Server Side : Python 3.7.4 (64-bit or 32-bit)
Client Side : jQuery, HTML, CSS, Bootstrap
Framework : Flask 1.1.1
Back end : MySQL 5
Server : WampServer 2i
OS : Windows 10 (64-bit) or Ubuntu 18.04 LTS "Bionic Beaver"
ML Packages : Pandas, scikit-learn, NumPy
SOFTWARE DESCRIPTION
8.1. Python 3.7.4
Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. It was created by Guido van Rossum in the late 1980s and first
released in 1991. Python's source code is available under the GPL-compatible Python
Software Foundation License. The following packages from the Python ecosystem are
used in CollegeBot.
Pandas is mainly used for data analysis and the manipulation of tabular data in
DataFrames. Pandas can import data from various file formats such as comma-separated
values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel, and it
supports data manipulation operations such as merging, reshaping, and selecting, as well
as data cleaning and data wrangling. The development of pandas brought to Python many
of the DataFrame features established in the R programming language. The pandas library
is built on top of NumPy, which is oriented toward efficient work with arrays rather
than DataFrames.
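A short example of the merging operation described above; the frames and column names here are illustrative, not taken from CollegeBot's actual schema.

```python
import pandas as pd

# Two small illustrative DataFrames.
courses = pd.DataFrame({"course_id": [1, 2, 3],
                        "name": ["B.Sc CS", "B.Com", "B.A English"]})
fees = pd.DataFrame({"course_id": [1, 2],
                     "fee": [45000, 30000]})

# merge() behaves like an SQL join; how="left" keeps every course row,
# filling fees that have no match with NaN.
merged = courses.merge(fees, on="course_id", how="left")
print(merged.shape)  # (3, 3)
```

Here "B.A English" has no fee record, so its `fee` cell is NaN, which is how pandas represents missing data after a left join.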
NumPy
NumPy, which stands for Numerical Python, is a library consisting of multidimensional array
objects and a collection of routines for processing those arrays. Using NumPy, mathematical
and logical operations on arrays can be performed.
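For instance, arithmetic on NumPy arrays applies element-wise, and a smaller array is broadcast across a larger one:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])

# Broadcasting: b is stretched across each row of a.
print(a + b)          # [[11 22] [13 24]]
print(a.mean())       # 2.5
print((a > 2).sum())  # 2 elements are greater than 2
```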
Scikit-learn (formerly scikits.learn, also known as sklearn) is a free software machine
learning library for the Python programming language. It features various classification,
regression and clustering algorithms including support-vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy.
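A minimal classification sketch with scikit-learn. The 2-D features and intent labels below are hand-made for illustration; in CollegeBot the features would be the BoW/TF-IDF vectors and the labels the query intents.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy intent data: two "greeting" points near the origin,
# two "admission" points near (1, 1).
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y = ["greeting", "greeting", "admission", "admission"]

# 1-nearest-neighbour: each new point takes the label of its closest example.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[0.05, 0.1], [0.95, 1.0]]))  # ['greeting' 'admission']
```

The same `fit`/`predict` interface is shared by the other scikit-learn estimators mentioned above (SVMs, random forests, gradient boosting, k-means), which is what makes swapping algorithms straightforward.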
NLTK
NLTK is a leading platform for building Python programs to work with human language
data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as
WordNet, along with a suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP
libraries, and an active discussion forum.
NLTK (Natural Language Toolkit) Library is a suite that contains libraries and programs for
statistical language processing. It is one of the most powerful NLP libraries, which contains
packages to make machines understand human language and reply to it with an appropriate
response.
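Stemming, one of the pre-processing steps this report applies, is available directly from NLTK and needs no corpus downloads:

```python
from nltk.stem import PorterStemmer

# Porter stemming reduces inflected forms to a common stem.
stemmer = PorterStemmer()
words = ["studies", "running", "colleges", "applied"]
print([stemmer.stem(w) for w in words])  # e.g. 'running' -> 'run'
```

Note that a Porter stem is not always a dictionary word ("studies" becomes "studi"); the goal is only that related forms map to the same token before feature extraction.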
WordCloud
A word cloud (also called a tag cloud or weighted list) is a visual representation of
text data, in which the importance of each word is shown through font size or colour.
Python's wordcloud library makes it easy to build one in minutes: it renders the most
frequently used words in a large font and the least used words in a small font. This
helps to get a quick feel for your text data, especially when working on problems based
on natural language processing.
8.2. MySQL
MySQL is a relational database management system based on the Structured Query
Language (SQL), the standard language for accessing and managing records in a database.
MySQL is open-source and free software under the GNU license, and it is supported by
Oracle Corporation. CollegeBot uses MySQL to manage its database and manipulate data
through SQL queries: inserting records, updating records, deleting records, selecting
records, creating tables, dropping tables, and so on.
MySQL is currently the most popular database management system software for
managing relational databases. It is fast, scalable, and easy to use in comparison with
Microsoft SQL Server and Oracle Database, and it is commonly used in conjunction with
server-side scripts (often PHP) for creating powerful and dynamic web-based enterprise
applications. It was originally developed by MySQL AB, a Swedish company, and is
written in C and C++. The official pronunciation of MySQL is "My Ess Que Ell", not
"My Sequel", although either is commonly heard. Many small and large companies use
MySQL. It supports many operating systems, such as Windows, Linux, and macOS, with
bindings for languages including C, C++, and Java.
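The record operations listed above all follow Python's DB-API pattern of parameterized queries. The sketch below uses the stdlib's sqlite3 purely so it runs without a MySQL server; with `mysql.connector`, as in CollegeBot's source code, the calls are the same except that placeholders are written `%s` instead of `?`.

```python
import sqlite3

# In-memory database standing in for the project's MySQL backend.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cc_data (id INTEGER, input TEXT, output TEXT)")

# Parameterized insert: values are passed separately, never concatenated
# into the SQL string, which prevents SQL injection.
cur.execute("INSERT INTO cc_data (id, input, output) VALUES (?, ?, ?)",
            (1, "admission dates", "Admissions open in June."))
conn.commit()

cur.execute("SELECT output FROM cc_data WHERE input = ?", ("admission dates",))
print(cur.fetchone()[0])  # Admissions open in June.
```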
8.3. WampServer
WampServer is a Windows web development environment. It allows you to create web
applications with Apache2, PHP, and a MySQL database; the bundled phpMyAdmin lets
you easily manage your databases. With an intuitive interface and numerous features,
WampServer is a popular choice among developers. The software is free to use and
requires no payment or subscription.
8.4. Bootstrap 4
Bootstrap is a free and open-source tool collection for creating responsive websites and web
applications. It is the most popular HTML, CSS, and JavaScript framework for developing
responsive, mobile-first websites.
It solves many long-standing problems, one of which is cross-browser compatibility:
websites built with Bootstrap render consistently across browsers (IE, Firefox, and
Chrome) and across screen sizes (desktops, tablets, phablets, and phones). Bootstrap
was created by Mark Otto and Jacob Thornton of Twitter and was later released as an
open-source project.
Easy to use: Anybody with just basic knowledge of HTML and CSS can start using
Bootstrap
Responsive features: Bootstrap's responsive CSS adjusts to phones, tablets, and desktops
Mobile-first approach: In Bootstrap, mobile-first styles are part of the core framework
Browser compatibility: Bootstrap 4 is compatible with all modern browsers (Chrome,
Firefox, Internet Explorer 10+, Edge, Safari, and Opera)
8.5. Flask
Flask is a web framework: it provides you with tools, libraries, and
technologies that allow you to build a web application. This web application can be some
web pages, a blog, a wiki or go as big as a web-based calendar application or a commercial
website.
Flask is often referred to as a micro framework. It aims to keep the core of an application
simple yet extensible. Flask has no built-in abstraction layer for database handling,
nor does it provide form validation out of the box; instead, it relies on extensions to
add such functionality to the application. Although Flask is rather young compared to
most Python frameworks, it holds great promise and has already gained popularity among
Python web developers. Let's take a closer look at Flask, the so-called "micro"
framework for Python.
Flask was designed to be easy to use and extend. The idea behind Flask is to build a solid
foundation for web applications of different complexity. From then on you are free to plug in
any extensions you think you need. Also you are free to build your own modules. Flask is
great for all kinds of projects. It's especially good for prototyping.
Flask belongs to the category of micro-frameworks: frameworks with little to no
dependence on external libraries. This has pros and cons. The pros are that the
framework is light, and there are few dependencies to update and watch for security
bugs; the cons are that you will sometimes have to do more work yourself, or grow the
list of dependencies by adding plugins. In the case of Flask, its dependencies are:
WSGI - The Web Server Gateway Interface (WSGI) has been adopted as a standard for
Python web application development. It is a specification for a universal interface
between the web server and the web application.
Werkzeug - A WSGI toolkit that implements request and response objects and other
utility functions, making it possible to build a web framework on top of it. The Flask
framework uses Werkzeug as one of its bases.
Jinja2 - A popular templating engine for Python. A web templating system combines
a template with a data source to render dynamic web pages.
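A minimal Flask application, exercised with Flask's built-in test client so no server needs to run. The `/ask` route and its placeholder reply are illustrative; the real CollegeBot route would invoke the NLP pipeline instead.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Placeholder reply; CollegeBot would run intent recognition here.
    query = request.get_json().get("query", "")
    return jsonify({"reply": f"You asked: {query}"})

# The test client issues requests directly against the WSGI app.
with app.test_client() as client:
    resp = client.post("/ask", json={"query": "admission dates"})
    reply = resp.get_json()["reply"]
print(reply)  # You asked: admission dates
```

This is also the natural harness for the functional tests described in the testing chapter: each test case becomes one `client.post` call plus assertions on the JSON reply.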
Conclusion
In conclusion, CollegeBot, an AI-based College Chatbot developed with Python Flask NLP
Packages and MySQL, provides a solution for users to have access to college information and
solutions through an intuitive and interactive chat interface. The chatbot makes use of Natural
Language Processing (NLP) techniques such as stemming, lemmatization, removal of stop
words, and tokenization to pre-process the collected college data. It uses two feature
extraction techniques, Bag of Words (BoW) and Term Frequency-Inverse Document
Frequency (TF-IDF), to generate vector representations of the pre-processed data. The
chatbot is trained using LSTM and NLP to predict user intent, recognize entities, and
generate responses to user queries. It uses machine learning algorithms such as LSTM
to classify user queries. Performance analysis metrics such as confusion matrix,
accuracy, precision, recall, and F1-score were used to evaluate the effectiveness of the
chatbot. The results showed that the chatbot was able to accurately predict user intents and
generate appropriate responses. Thus, CollegeBot provides a user-friendly and accessible
platform for users to access college information, advice, and solutions through a chat
interface. The implementation of NLP and machine learning techniques allows for efficient
and accurate responses to user queries, improving the user experience and overall
effectiveness of the chatbot.
Future Enhancement
There are several future enhancements that can be made to "CollegeBot: An AI based
College Chatbot" developed with Python Flask NLP Packages and MySQL:
Multilingual support: Currently, the chatbot only supports queries in English.
However, it can be enhanced to support multiple languages, making it accessible to
users who speak different languages.
Image and video recognition: In addition to text-based queries, the chatbot can be
enhanced to recognize and respond to queries containing images or videos related to
college.
Personalization: The chatbot can be enhanced to personalize responses based on the
user’s previous queries and preferences.
Voice-based interactions: The chatbot can be enhanced to support voice-based
interactions, allowing users to interact with the chatbot using speech instead of text.
Integration with smart devices: The chatbot can be integrated with smart devices such
as Google Home or Amazon Echo, making it more accessible and convenient for
users to use.
Overall, these enhancements can help make "CollegeBot: An AI based College Chatbot"
more efficient, user-friendly, and accessible to users worldwide.
References
1. Zhang, Y., & Wallace, B. (2017). A sensitivity analysis of (and practitioners’ guide
to) convolutional neural networks for sentence classification. arXiv preprint
arXiv:1510.03820.
2. Goyal, P., Gupta, R., & Goyal, L. M. (2020). A review of chatbot and natural
language processing. International Journal of Advanced Research in Computer
Science, 11(4), 69-75.
3. Rashid, S. M., Abdullah, A. H., & Ahmed, M. A. (2019). Development of a chatbot
using natural language processing for customer service. International Journal of
Computer Science and Information Security (IJCSIS), 17(5), 167.
4. Lowe, R., & Pow, N. (2017). The rise of the conversational interface: A new kid on
the block. Computer, 50(8), 58-63.
5. Rajabi, A., Asgarian, A., & Ebrahimi, M. (2018). A comparative study of machine
learning algorithms for automated response selection in chatbot systems. In
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity,
Sentiment and Social Media Analysis (pp. 45-52).
6. Singh, A., & Sharma, M. (2020). AI Chatbot: A review of literature. In 2020 2nd
International Conference on Innovative Mechanisms for Industry Applications
(ICIMIA) (pp. 23-28). IEEE.
7. Saini, V., & Singh, S. (2019). A review on chatbots in customer service industry. In
2019 6th International Conference on Computing for Sustainable Global
Development (INDIACom) (pp. 313-317). IEEE.
8. Hernandez-Mendez, A., Perez-Meana, H., & Sucar, L. E. (2018). Natural language
processing and chatbots: A survey of current research and future possibilities. Journal
of Computing and Information Technology, 26(1), 1-18.
9. Debnath, B., Chakraborty, D., & Mandal, S. K. (2019). Chatbot for e-learning: A
review. In Proceedings of the 2nd International Conference on Inventive Research in
Computing Applications (pp. 186-190). IEEE.
10. Gao, W., & Huang, H. (2019). An intelligent chatbot system for online customer
service. In Proceedings of the 2019 2nd International Conference on Education and
Multimedia Technology (pp. 208-211). ACM.
11. Sarker, S., & Rana, S. (2020). AI based chatbot for customer service: A review. In
2020 IEEE Region 10 Symposium (TENSYMP) (pp. 1774-1778). IEEE.
12. Muduli, S., & Sharma, S. (2021). Implementation of a conversational chatbot system
for e-commerce. In Intelligent Computing, Information and Control Systems (pp. 753-
760). Springer.
13. Ahmad, M., Kamal, A., & Shahzad, W. (2019). A review of chatbots in customer
service. In 2019 3rd International Conference on Computing, Mathematics and
Engineering Technologies (iCoMET) (pp. 1-6). IEEE.
14. H. Jin and H. Kim, "Developing a Chatbot Service Model for Customer Support," in
International Journal of Human-Computer Interaction, vol. 36, no. 12, pp. 1188-1195,
2020.
15. J. R. Lloyd and C. A. Boyd, "The Application of Chatbots in Learning Environments:
A Review of Recent Research," in Journal of Educational Technology Development
and Exchange, vol. 13, no. 1, pp. 1-14, 2020.
16. S. Srinivasan and S. Gunasekaran, "Survey on Chatbot Development and Its
Applications," in Journal of Computer Science, vol. 16, no. 11, pp. 1398-1411, 2020.
17. M. H. Hashim, A. Alhamid, M. Aljahdali and A. Albaham, "Chatbot technology for
customer service: a systematic literature review," in International Journal of
Advanced Computer Science and Applications, vol. 10, no. 6, pp. 305-312, 2019.
18. P. L. Poon and K. D. Chau, "Designing and Implementing a Chatbot for Customer
Service," in International Journal of Innovation and Technology Management, vol.
16, no. 5, pp. 1-18, 2019.
19. Y. Liu, L. Wang and X. Liu, "Designing and Developing a Chatbot for Customer
Service," in Proceedings of the 2019 International Conference on Computer Science
and Artificial Intelligence, pp. 209-213, 2019.
20. Y. Zhao, X. Zhao, Y. Zhang and C. Liu, "A survey on chatbot design techniques," in
Journal of Network and Computer Applications, vol. 153, pp. 102-117, 2020.
21. A. Singh and A. Rani, "A Comprehensive Study on Chatbots: History, Taxonomy,
Technologies, and Future Directions," in Journal of Ambient Intelligence and
Humanized Computing, vol. 11, no. 6, pp. 2561-2595, 2020.
22. R. J. Passonneau and J. Li, "The benefits and drawbacks of chatbots in customer
service," in Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), pp. 5982-5991, 2019.
23. A. Kapoor and S. Sood, "A Survey of Chatbot Implementation Techniques," in
Proceedings of the 2020 International Conference on Smart Technologies in
Computing, Communications and Electrical Engineering (ICSTCEE), pp. 206-210,
2020.
24. Y. He, Q. Liu and Y. Yang, "A Survey of Chatbot Design Techniques in Speech
Interaction," in Proceedings of the 2020 IEEE 17th International Conference on
Networking, Sensing and Control (ICNSC), pp. 1-5, 2020.
25. S. S. Shrivastava and S. K. Sharma, "A Survey on Recent Trends in Chatbot
Development and Implementation," in Proceedings of the 2020 International
Conference on Inventive Computation Technologies (ICICT), pp. 190-196, 2020.
Web References
1. IBM Watson Assistant - https://www.ibm.com/cloud/watson-assistant/
2. Dialogflow - https://cloud.google.com/dialogflow
3. Botpress - https://botpress.com/
4. Microsoft Bot Framework - https://dev.botframework.com/
5. Rasa - https://rasa.com/
6. Pandorabots - https://www.pandorabots.com/
7. Tars - https://www.tars.com/
8. SnatchBot - https://www.snatchbot.me/
9. ManyChat - https://manychat.com/
10. BotStar - https://www.botstar.com/
Book References
1. Python Crash Course: A Hands-On, Project-Based Introduction to Programming by
Eric Matthes
2. Flask Web Development: Developing Web Applications with Python by Miguel
Grinberg
3. Learning MySQL: Get a Handle on Your Data by Seyed M.M. (Saied) Tahaghoghi
and Hugh E. Williams
4. Natural Language Processing with Python: Analyzing Text with the Natural Language
Toolkit by Steven Bird, Ewan Klein, and Edward Loper
5. Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin
6. Applied Natural Language Processing with Python: Implementing Machine Learning
and Deep Learning Algorithms for Natural Language Processing by Taweh Beysolow
II
7. Practical Natural Language Processing: A Comprehensive Guide to Building Real-
World NLP Systems by Sowmya Vajjala, Bodhisattwa Majumder, and Anuj Gupta
8. Deep Learning for Natural Language Processing: Creating Neural Networks with
Python by Palash Goyal, Sumit Pandey, and Karan Jain
9. Building Chatbots with Python: Using Natural Language Processing and Machine
Learning by Sumit Raj
10. Practical Bot Development: Designing and Building Bots with Node.js and Microsoft
Bot Framework by Szymon Rozga and Rahul Rai
Screenshots
Source code
Packages
from flask import Flask, render_template, Response, redirect, request, session, abort, url_for
import os
import base64
from datetime import datetime
import mysql.connector
import gensim
from gensim.parsing.preprocessing import remove_stopwords, STOPWORDS
from gensim.parsing.porter import PorterStemmer
import spacy
nlp = spacy.load('en')  # spaCy 2.x shortcut; in spaCy 3.x use spacy.load('en_core_web_sm')
Admin
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="",
    charset="utf8",
    database="chatbot_hospital"
)
app = Flask(__name__)
##session key
app.secret_key = 'abcdef'
def login_admin():
    cnt = 0
    act = ""
    msg = ""
    if request.method == 'POST':
        username1 = request.form['uname']
        password1 = request.form['pass']
        mycursor = mydb.cursor()
        # NOTE: passwords are compared in plain text here; hashing is recommended.
        mycursor.execute("SELECT count(*) FROM admin WHERE username=%s AND password=%s",
                         (username1, password1))
        myresult = mycursor.fetchone()[0]
        if myresult > 0:
            session['username'] = username1
            return redirect(url_for('admin'))
        else:
            msg = "Login failed!"
    return render_template('login_admin.html', msg=msg, act=act)
Add Data
def view_data():
    msg = request.args.get("msg")
    act = request.args.get("act")
    url = ""
    mycursor = mydb.cursor()
    mycursor.execute("SELECT * FROM cc_data")
    data = mycursor.fetchall()
    if request.method == 'POST':
        input1 = request.form['input']
        output = request.form['output']
        link = request.form['link']
        # Append a link to the answer only when one was supplied.
        if not link:
            url = ""
        else:
            url = ' <a href="' + link + '" target="_blank">Click Here</a>'
        output += url
        mycursor.execute("SELECT max(id)+1 FROM cc_data")
        maxid = mycursor.fetchone()[0]
        if maxid is None:
            maxid = 1
        sql = "INSERT INTO cc_data(id, input, output) VALUES (%s, %s, %s)"
        val = (maxid, input1, output)
        mycursor.execute(sql, val)
        mydb.commit()
        print(mycursor.rowcount, "record added")
        return redirect(url_for('view_data', msg='success'))
    if act == "del":
        did = request.args.get("did")
        mycursor.execute("DELETE FROM cc_data WHERE id=%s", (did,))
        mydb.commit()
        return redirect(url_for('view_data'))
    return render_template('view_data.html', msg=msg, act=act, data=data)
Chatbot
def bot():
    msg = ""
    output = ""
    uname = ""
    mm = ""
    s = ""
    xn = 0
    if 'username' in session:
        uname = session['username']
    cnt = 0
    mycursor = mydb.cursor()
    mycursor.execute("SELECT * FROM cc_register WHERE uname=%s", (uname,))
    value = mycursor.fetchone()
admin table:
Field      Type
username   varchar(20)
password   varchar(20)

cc_data table:
Field      Type
id         int(11)
input      varchar(200)
output     text
Collaboration Diagram
Deployment Diagram
ER – Diagram