INTRODUCTION
1.1. Overview
A college is an educational institution that typically offers undergraduate programs leading to
bachelor's degrees, as well as graduate programs leading to master's degrees and doctoral
degrees. Colleges may be either public or private, and they may offer a wide range of
academic disciplines, such as arts, sciences, humanities, engineering, and business. In many
countries, including India, colleges are an integral part of the higher education system and
are often considered to be the next step after completing high school. College education
provides students with an opportunity to deepen their knowledge in a specific field of study,
develop critical thinking skills, and gain practical experience through internships or research
projects. In addition to academic programs, colleges may offer a range of extracurricular
activities, such as sports teams, clubs, and organizations, to provide students with a well-
rounded educational experience. Colleges may also offer various support services, such as
academic advising, career counselling, and financial aid, to help students succeed in their
studies and prepare for their future careers.
Rule-based chatbots are able to hold basic conversations based on “if/then” logic. These
chatbots do not understand context or intents. Human agents map out conversations via a
flowchart, anticipating what a customer might ask, and program how the chatbot should
respond. We use logical next steps and clear call-to-action buttons to build rule-based
chatbot conversations. Companies design rule-based chatbots to answer simple questions
and often hand web visitors over to a live agent to continue the conversation. They are not
designed to learn and become smarter over time. We can build a rule-based chatbot with
very simple or more complicated rules. They cannot, however, answer any questions outside
of the defined rules. Rule-based chatbots do not learn through interactions and only work
within the scenarios for which they are trained.
There are many existing machine learning algorithms used in chatbots, including but not
limited to:
1. Rule-based chatbots - These chatbots are based on a set of predefined rules that are
used to determine the response to a user's input.
2. Naive Bayes - This algorithm is used for text classification, including sentiment
analysis, spam filtering, and topic classification.
3. Support Vector Machines (SVM) - This algorithm is commonly used in text
classification tasks and can classify data into two or more categories.
4. Decision Trees - This algorithm is used to make decisions based on a series of
questions and answers that lead to a conclusion.
5. Random Forest - This algorithm is a collection of decision trees that work together to
make a prediction.
6. Neural Networks - This algorithm is inspired by the structure of the human brain and
is used for tasks such as image recognition, natural language processing, and speech
recognition.
7. Recurrent Neural Networks (RNN) - This type of neural network is used for tasks
where the input data is a sequence, such as natural language processing.
8. Long Short-Term Memory (LSTM) - This is a type of RNN that is used for tasks
where the input sequence is long and requires the model to remember information
from earlier in the sequence.
9. Transformer - This is a neural network architecture that has gained popularity for
natural language processing tasks, such as language translation and text
summarization.
These are just a few examples of machine learning algorithms used in chatbots. The choice of
algorithm will depend on the specific requirements of the chatbot and the type of data it will
be processing.
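To make one of these algorithms concrete, the sketch below trains a minimal multinomial Naive Bayes intent classifier in plain Python. The intents, training phrases, and Laplace smoothing shown here are illustrative assumptions, not part of the CollegeBot dataset:

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative training set: phrases labelled with a chatbot intent.
TRAIN = [
    ("what are the college timings", "timings"),
    ("when does college open", "timings"),
    ("how do i apply for admission", "admission"),
    ("what is the admission procedure", "admission"),
    ("when is the exam", "exam"),
    ("show me the exam timetable", "exam"),
]

def train_nb(data):
    """Count word frequencies per intent for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in data:
        intent_counts[intent] += 1
        for w in text.split():
            word_counts[intent][w] += 1
            vocab.add(w)
    return word_counts, intent_counts, vocab

def predict(text, word_counts, intent_counts, vocab):
    """Pick the intent with the highest log-probability (Laplace smoothing)."""
    total = sum(intent_counts.values())
    best, best_score = None, float("-inf")
    for intent in intent_counts:
        score = math.log(intent_counts[intent] / total)   # class prior
        n = sum(word_counts[intent].values())
        for w in text.split():
            score += math.log((word_counts[intent][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = intent, score
    return best

wc, ic, vocab = train_nb(TRAIN)
print(predict("exam timetable please", wc, ic, vocab))   # -> exam
```

A production chatbot would use a library implementation over a much larger corpus, but the decision rule is the same.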
Disadvantages
While machine-learning-based chatbots have numerous advantages, they also have some
disadvantages. Here are a few of them:
1. Data Dependency: Machine-learning-based chatbots require a lot of training data
to learn and understand users' queries. They can be less effective if there isn't
enough data or if the quality of the data is poor.
2. Limited Scope: These chatbots have a limited scope and can only answer questions
that they are trained for. If a user asks a question that the chatbot hasn't been trained
for, it may not be able to provide a useful response.
3. Lack of Personality: Machine learning chatbots are often perceived as impersonal
because they don't have the ability to convey emotions or respond in a conversational
manner. This can make the interaction feel robotic and unsatisfying for some users.
4. Requires Frequent Updates: Machine learning algorithms require frequent updates
to stay relevant. Chatbots that are not updated frequently may provide outdated
responses or incorrect information.
5. Security Risks: As with any system that uses data, machine learning chatbots are
vulnerable to security risks. If a chatbot is not properly secured, it can be hacked and
sensitive information can be stolen.
Proposed System
The proposed system of "CollegeBot: An AI-based College Chatbot" includes the following
components:
AI chatbots are more complex bots built on Natural Language Processing (NLP) and
Machine Learning (ML) algorithms. Unlike rule-based chatbots, AI-powered chatbots learn
as they go: they can answer a user with non-predefined responses, and ML helps them learn
from each interaction with the user and remember the user's preferences. Human agents
train such chatbots to decipher free-form conversations based on "intents" and "entities."
Entities identify a subject: people, places, or things. Intents are a bit harder to grasp; an
intent captures what the user is trying to accomplish with a query.
Training: The system first collects a dataset related to college and pre-processes it using
techniques such as stemming, lemmatization, removal of stop words, and tokenization. Then,
it uses feature extraction techniques such as bag of words (BoW) and term frequency–inverse
document frequency (TF-IDF) to represent the text data in a vectorised form. The system
then trains the model using Natural Language Processing (NLP) techniques.
Collecting dataset related to College: This step involves gathering relevant data
related to college from various sources within the college.
Pre-processing: The collected data is pre-processed to make it suitable for analysis.
This includes techniques such as stemming, lemmatization, removal of stop words,
and tokenization.
Feature Extraction: The pre-processed data is then converted into a numerical
format that can be used for analysis. This is done through techniques such as bag of
words (BoW) and term frequency–inverse document frequency (TF-IDF).
Training: The extracted features are used to train the chatbot using natural language
processing (NLP) techniques.
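The feature-extraction step can be made concrete with a small plain-Python sketch that computes bag-of-words counts and TF-IDF weights. Real systems would typically use a library such as scikit-learn; the three-document corpus here is an invented example:

```python
import math
from collections import Counter

corpus = [
    "college admission process",
    "college exam timetable",
    "admission fee details",
]

docs = [doc.split() for doc in corpus]
vocab = sorted({w for d in docs for w in d})

# Bag of words: one raw-count vector per document.
bow = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency.
def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)          # documents containing the term
    idf = math.log(len(docs) / df)
    return tf * idf

tfidf = [[round(tf_idf(w, d), 3) for w in vocab] for d in docs]

print(vocab)
print(bow[0])     # counts for "college admission process"
```

Note how a word unique to one document ("process") gets a higher TF-IDF weight than a word shared across documents ("college"), which is exactly why TF-IDF is preferred when distinctive terms matter.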
Testing: In the testing phase, Students/Staff/Parents/Others register and log in to the
chatbot. They can input their queries in English, and the system uses NLP techniques such
as intent recognition, entity recognition, dependency parsing, and response generation to
provide relevant and accurate answers. The system gives responses in text as well as
text-to-speech format.
Overall, the proposed system aims to provide an efficient and user-friendly platform for
Students/Staff/Parents/Others to get information related to college using AI and NLP
techniques.
Inputting queries: Students/Staff/Parents/Others would input their queries related to
college in English.
Intent recognition: The chatbot would need to identify the intent of the user's
query, such as whether they are looking for information on the college.
Entity recognition: The chatbot would also need to recognize any relevant entities in
the user's query, such as specific class timings, college enquiries, or the exam
timetable.
Dependency parsing: This step involves analysing the relationships between the
words in the user's query to better understand the meaning behind the query.
Generating response: Finally, the chatbot would use the information gathered from
the intent recognition, entity recognition, and dependency parsing steps to generate a
response to the user’s query.
Text and Speech Response: The response would be given in both text and speech
format to make it more accessible for the Students/Staff/Parents/Others. The chatbot
responds to the user’s query with both text and speech.
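The query-handling steps above can be illustrated with a deliberately simplified, rule-style sketch of intent and entity recognition. CollegeBot itself uses trained NLP models; the keyword tables and canned responses below are illustrative assumptions:

```python
# Hypothetical keyword tables standing in for trained intent/entity models.
INTENTS = {
    "ask_timetable": {"timetable", "schedule", "timings"},
    "ask_admission": {"admission", "apply", "enrol"},
}
ENTITIES = {"exam": "EVENT", "library": "PLACE", "principal": "PERSON"}

def recognize(query):
    """Return (intent, entities) for a query via simple keyword overlap."""
    words = set(query.lower().split())
    intent = next((name for name, kws in INTENTS.items() if words & kws), "unknown")
    entities = {w: ENTITIES[w] for w in words if w in ENTITIES}
    return intent, entities

def respond(query):
    intent, entities = recognize(query)
    if intent == "ask_timetable" and "exam" in entities:
        return "The exam timetable is published on the college notice board."
    if intent == "ask_admission":
        return "Admissions open in June; please register online."
    return "Sorry, I did not understand. Could you rephrase?"

print(respond("When is the exam timetable released"))
```

A trained model replaces the keyword overlap with learned classification, but the intent-then-entity-then-response flow is the same.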
Advantages
The proposed system of "CollegeBot: An AI based College Chatbot" has several advantages,
such as:
Quick and Convenient Access: Students/Staff/Parents/Others can easily access the
system from anywhere, at any time using a smartphone, laptop or tablet, making it a
quick and convenient way for them to get help and support.
Personalized Assistance: The chatbot provides personalized assistance to
Students/Staff/Parents/Others, based on their specific queries, allowing them to
receive customized guidance and support.
Instant Response: The system provides an instant response to
Students/Staff/Parents/Others, helping them to save time and make more informed
decisions.
24/7 Availability: The chatbot is available 24/7, meaning that
Students/Staff/Parents/Others can access help and support at any time, without having
to wait for office hours.
Cost-effective: The system is cost-effective, as it requires minimal human resources
to operate and can handle multiple queries simultaneously.
Increased Productivity: By providing Students/Staff/Parents/Others with the
information and support they need, the chatbot can help to increase their
productivity.
Improved Accuracy: The use of natural language processing (NLP) and machine
learning algorithms can help to improve the accuracy of the system, ensuring that
Students/Staff/Parents/Others receive reliable and trustworthy information.
CHAPTER
SYSTEM DESIGN
System Architecture
[Figure: System architecture diagram. A Web Admin trains CollegeBot through the
CollegeBot Web App. Students/Staff/Parents/Other users log in and submit an input query;
CollegeBot response prediction applies intent recognition, entity recognition, and
dependency parsing, and returns the CollegeBot response as text and speech.]
System Flow
The system flow of "CollegeBot: An AI based College Chatbot" can be described as follows:
1. Admin collects the dataset related to college and performs pre-processing tasks such
as stemming, lemmatization, removal of stop words, and tokenization.
2. Admin performs feature extraction using two techniques: bag of words (BoW) and
term frequency–inverse document frequency (TF-IDF).
3. Admin trains the model using natural language processing (NLP).
4. Users register and login to the system.
5. Users input their queries in English.
6. The system predicts the intent of the query using NLP.
7. The system recognizes entities such as staff names, the college exam schedule,
college admission enquiries, courses conducted, etc. from the query.
8. The system performs dependency parsing to understand the grammatical structure of
the query.
9. The system generates a response to the query based on the intent, recognized entities,
and grammatical structure using NLP.
10. The system gives the response to the user in text format as well as text to speech
format.
11. The user can ask more queries or logout from the system.
System Implementation
The proposed system of "CollegeBot: An AI based College Chatbot" can be developed using
Python Flask, NLP packages, and a MySQL database. The system architecture can include
the following components:
1. Admin Module: This module includes the functionality of collecting the dataset
related to college, pre-processing it using stemming, lemmatization, removal of stop
words, and tokenization techniques. Then, the feature extraction is done using two
techniques - bag of words (BoW) and term frequency-inverse document frequency
(TF-IDF). Finally, the NLP model is trained using the extracted features.
2. User Module: This module supports two types of users – Student/Staff/Parents/Other
users and system administrators. Users can register and log in to the system. They can
input their queries in English, which are analysed by the NLP model to predict the
intent, recognize entities, and generate responses using dependency parsing. The
responses can be in text as well as text-to-speech format. System administrators can
access the dataset, train the model, and monitor the system's performance.
3. NLP Packages: The NLP packages used in the system can include NLTK, spaCy, and
TextBlob. These packages can provide functionalities like tokenization, stemming,
lemmatization, removal of stop words, entity recognition, dependency parsing, and
sentiment analysis.
4. MySQL Database: The MySQL database can be used to store the dataset, user
information, and system logs. The dataset can be stored in a separate table, and the
user information can be stored in a separate table with appropriate attributes. The
system logs can be stored in a separate table to monitor the system's performance.
5. Flask Framework: The Flask framework can be used to develop the web application.
It can provide functionalities like routing, request handling, session management, and
user authentication.
The overall architecture can be designed with the Flask framework as the front end, NLP
packages as the processing engine, and MySQL database as the data storage. This system can
provide an efficient and user-friendly solution for users to get their queries resolved related to
college.
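A minimal sketch of how the Flask front end could wire these components together is shown below. The route names, the in-memory user store, and the predict_response helper are illustrative assumptions; the real system would query the trained NLP model and the MySQL tables:

```python
from flask import Flask, request, jsonify, session

app = Flask(__name__)
app.secret_key = "dev-only-secret"   # session support for login

USERS = {}                           # stand-in for the MySQL user table

def predict_response(query):
    """Placeholder for the trained NLP intent -> response pipeline."""
    return "Admissions open in June." if "admission" in query.lower() else "Please rephrase."

@app.route("/register", methods=["POST"])
def register():
    data = request.get_json()
    USERS[data["email"]] = data["password"]
    return jsonify({"status": "registered"})

@app.route("/login", methods=["POST"])
def login():
    data = request.get_json()
    if USERS.get(data["email"]) == data["password"]:
        session["user"] = data["email"]
        return jsonify({"status": "ok"})
    return jsonify({"status": "invalid"}), 401

@app.route("/query", methods=["POST"])
def query():
    q = request.get_json()["query"]
    return jsonify({"response": predict_response(q)})

# To serve locally:
# app.run(debug=True)
```

Flask's routing and session handling cover the registration, login, and query endpoints; storing passwords in plain text as above is for brevity only and would be replaced by hashed credentials in MySQL.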
System Description
CollegeBot is an AI-based chatbot designed for users to provide college information and
support. The system consists of two main components: the admin panel and the user-facing
chatbot.
The admin panel is responsible for collecting and pre-processing the college data required
for training the chatbot. The admin panel collects the relevant college dataset and pre-
processes the data through techniques such as stemming, lemmatization, removal of stop
words, and tokenization. It then extracts features from the data using two techniques: bag of
words (BoW) and term frequency–inverse document frequency (TF-IDF). After feature
extraction, the data is trained using natural language processing (NLP) techniques. The user-
facing chatbot is designed to handle user queries and provide relevant information. It uses
NLP techniques to predict user intent, recognize entities, and perform dependency parsing to
generate appropriate responses. The chatbot can communicate in text as well as text-to-
speech format for enhanced accessibility. The chatbot also provides registration and login
functionality to personalize the user experience. Overall, CollegeBot aims to provide users
with access to college information and support through an easy-to-use, conversational
interface. The system allows users to obtain accurate and timely information that can help
them make informed decisions about college.
Modules Description
1. CollegeBot Web App
The design and development of the CollegeBot web app involve integrating different
technologies and tools to create a seamless user experience for users. The front end, back end,
and database work together to provide an efficient and reliable platform for users to get
answers to their queries related to college.
1.1. Front End: The front end of the CollegeBot web app was implemented using HTML,
CSS, and JavaScript. The user interface allows student/staff/parents/other users to register,
log in, and input queries related to college. The chatbot response is displayed as text and also
as text-to-speech output.
1.2. Back End: The back end of the CollegeBot web app was implemented using Python
Flask. Flask is a web framework that allows developers to create web applications using
Python. The Flask app receives user queries from the front end and passes them to the NLP
module for prediction. The NLP module then generates the response, which is sent back to
the front end for display.
1.3. Database: The database used in the CollegeBot web app is MySQL. The database stores
user information such as name, email address, and password for registration and login
purposes. It also stores the chatbot training dataset and the output of the NLP module for each
user query.
2. CollegeBot Chat Window
The chat window of CollegeBot is the main interface where users can interact with the
chatbot. It is designed using HTML, CSS, and JavaScript and integrated with the backend
developed using Python Flask and MySQL.
The following are the modules and their descriptions used for the development of the
CollegeBot chat window:
2.1. HTML/CSS: The user interface is developed using HTML and CSS. The HTML file
includes the basic structure of the chat window, such as chat area, user input area, and send
button. The CSS file is used to style the chat window, such as color, font, and layout.
2.2. JavaScript: The chat window is interactive and dynamic, which is achieved by using
JavaScript. The JavaScript file handles the user input and sends it to the backend for
processing. It also receives the response from the backend and displays it on the chat
window.
2.3. Python Flask: The backend of the CollegeBot chat window is developed using Python
Flask. The Flask app receives the user input from the JavaScript file and processes it using
the NLP algorithm. The response is generated by the Flask app and sent back to the
JavaScript file.
2.4. MySQL: The chatbot's database is developed using MySQL. It stores the user's login
credentials, user input, and chatbot responses. The MySQL database is integrated with the
Python Flask app to retrieve and store data.
Overall, the chat window of CollegeBot is an interactive and user-friendly interface that
allows users to interact with the chatbot and get answers to their college-related queries.
3. End User Interface
CollegeBot is an AI-based chatbot designed to assist users with their college queries. The
chatbot has two interfaces: one for the admin and another for the users.
3.1. Admin Interface: The admin interface consists of modules for collecting, pre-
processing, and training the chatbot with data related to college. The admin can perform the
following tasks:
Collect dataset related to college
Pre-process the data by performing stemming, lemmatization, removal of stop
words, and tokenization
Feature extraction using two techniques: bag of words (BoW) and term
frequency–inverse document frequency (TF-IDF)
Train the chatbot using natural language processing techniques
3.2. User Interface: The user interface consists of modules for registering, logging in,
inputting queries in English, and receiving responses from the chatbot. The users can perform
the following tasks:
Register with the chatbot by providing their name, email address, and
password
Log in to their account
Input queries related to college in English
Receive responses from the chatbot that are generated using natural language
processing techniques
The chatbot responds in text and text-to-speech formats, making it accessible
to users who are not proficient in reading and writing.
Both the admin and user interfaces are designed using Python Flask and MySQL. The chat
window interface is designed using HTML, CSS, and JavaScript.
4. CollegeBot Training
CollegeBot, being an AI-based college chatbot, requires extensive training in natural
language processing (NLP) techniques. The following are the sub-modules involved in
training the CollegeBot chatbot:
[Figure: Training pipeline – College.csv is imported through the web UI, used for
training and testing, and stored.]
Data Exploration: The module performs an initial exploration of the dataset to understand
the characteristics of the data. This includes calculating basic statistics such as mean,
median, mode, and standard deviation for each column of the dataset, and visualizing the
distribution of data using histograms, scatterplots, and box plots.
[Figure: Data exploration flow – read, visualize, import via the web UI, and store.]
4.3. Pre-processing
Pre-processing is an important step in the development of any NLP application, including
chatbots. The pre-processing module of CollegeBot includes the following sub-modules:
Tokenization: This module is responsible for breaking down the text input into smaller units
such as words, phrases, or sentences, known as tokens. In CollegeBot, tokenization is done
using Python's nltk package.
Stopword removal: This module removes common words in the English language that do
not add much meaning to the text, such as "the", "and", "in", etc. This helps reduce the
dimensionality of the data and speed up the processing. In CollegeBot, stopword removal is
also done using nltk.
Stemming and Lemmatization: This module reduces words to their root form or lemma,
which helps in reducing the number of unique words in the data and grouping together words
with similar meanings. In CollegeBot, stemming and lemmatization are performed using
nltk.
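The three pre-processing sub-modules can be sketched in plain Python as follows. CollegeBot uses nltk's tokenizers, stopword lists, and stemmers; the tiny stopword set and suffix-stripping stemmer below are simplified stand-ins for illustration:

```python
import re

STOPWORDS = {"the", "is", "in", "and", "a", "of", "for", "what", "are"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Very crude suffix stripping, standing in for nltk's PorterStemmer."""
    for suffix in ("ing", "ions", "ion", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

query = "What are the admission timings for the college?"
tokens = tokenize(query)
cleaned = remove_stopwords(tokens)
stems = [stem(t) for t in cleaned]
print(cleaned)   # ['admission', 'timings', 'college']
print(stems)     # ['admiss', 'timing', 'college']
```

Note how stopword removal shrinks the query to its content words before stemming collapses inflected forms, which is what reduces the dimensionality of the later feature matrices.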
[Figure: Pre-processing pipeline – the college dataset passes through tokenization and
stemming before storage.]
Feature extraction in DL with the context of words is also essential. The technique used for
this purpose is word2vec, a neural-network-based algorithm. Equation 5 below shows how
word2vec manages the word context with the help of probability measures. D represents the
set of word–context pairs, and (w, c) is a word–context pair drawn from the large set D:

arg max_θ Π_{(w,c) ∈ D} p(c | w; θ)    (Eq. 5)

The multi-word context is also a variant of word2vec, as shown in Equation 6: a
variable-length context c_1, ..., c_k is scored by treating its words as conditionally
independent given the word w:

p(c_1, ..., c_k | w; θ) = Π_{i=1}^{k} p(c_i | w; θ)    (Eq. 6)
4.4.2. Bag of Words (BoW)
This is a commonly used technique for feature extraction. In this approach, each word in the
dataset is treated as a feature, and a matrix is created to represent the frequency of each word
in each document. This matrix is used as input to machine learning models.
Word Embedding Layer
Embedding is the representation of words as real numbers. Many machine learning and DL
algorithms cannot process data in raw (text) form and can only accept numerical values as
input for learning. Word embedding converts text into numbers: it extracts relevant features
from the textual data and structures them as real-valued vectors, using a word-mapping
dictionary to convert each term (word) to a real-valued vector. There are two main problems
with traditional machine learning feature engineering techniques: one is the sparse vectors
used for data representation, and the other is that they largely ignore the meaning of words.
In embedding vectors, similar words are represented by nearly equal real-valued numbers.
For example, the terms love and affection will be near each other in the embedding space.
[Figure: Feature extraction – the preprocessed dataset is transformed with bag of words
and TF-IDF, visualized as a word cloud, and stored.]
4.5. Classification and Training
Training the CollegeBot model: Once the features have been extracted, the machine learning
model is trained on the training set, typically using supervised learning with architectures
such as LSTM.
A classifier model is designed that can be trained on the corpus with respect to the target
variable, i.e. the tag from the corpus.
The encoder units help to "understand" the input sequence ("Are you free tomorrow?"), and
the decoder decodes the "thought vector" to generate the output sequence ("Yes, what's
up?"). The thought vector can be seen as a neural representation of the input sequence,
which only the decoder can look inside to produce the output sequence.
Accuracy: It measures the percentage of correctly predicted labels out of all labels. It can be
calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: It is the ratio of correctly predicted positive instances to the total predicted
positive instances. It can be calculated as:
Precision = TP / (TP + FP)
Recall: It is the ratio of correctly predicted positive instances to the total actual positive
instances. It can be calculated as:
Recall = TP / (TP + FN)
F1-Score: It is the harmonic mean of precision and recall. It can be calculated as:
F1-Score = 2 * ((Precision * Recall) / (Precision + Recall))
These performance metrics can be used to measure the accuracy of CollegeBot in recognizing
user intents and providing appropriate responses.
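These formulas translate directly into code. The sketch below evaluates them on an invented confusion-matrix count (TP = 40, TN = 45, FP = 5, FN = 10), used purely for illustration:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (p * r) / (p + r)

tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn))        # 0.85
print(precision(tp, fp))               # 0.888...
print(recall(tp, fn))                  # 0.8
print(round(f1_score(tp, fp, fn), 3))  # 0.842
```

Because F1 is a harmonic mean, it sits below the arithmetic mean of precision and recall and penalizes a chatbot that is strong on one metric but weak on the other.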
Algorithm
Step 1: Data Extraction and Preprocessing
The dataset hails from the college – it contains college information.
Parse each .yml file:
1. Concatenate two or more sentences if the answer has two or more of them.
2. Remove unwanted data types which are produced while parsing the data.
3. Append <START> and <END> tags to all the answers.
4. Create a Tokenizer and load the whole vocabulary (questions + answers) into it.
The three arrays required by the model are encoder_input_data, decoder_input_data and
decoder_output_data:
encoder_input_data: Tokenize the questions and pad them to the maximum length.
decoder_input_data: Tokenize the answers and pad them to the maximum length.
decoder_output_data: Tokenize the answers and remove the first element (the <START> tag
added earlier) from all the tokenized answers.
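The tokenisation and padding of Step 1 can be sketched in plain Python. In the real pipeline Keras's Tokenizer and pad_sequences do this work; the two question/answer pairs here are invented for illustration:

```python
questions = ["what are the college timings", "when is the exam"]
answers = ["<START> nine to four <END>", "<START> exam is in may <END>"]

# Build a word -> integer index (0 is reserved for padding).
vocab = sorted({w for s in questions + answers for w in s.split()})
word_index = {w: i + 1 for i, w in enumerate(vocab)}

def tokenize(sentence):
    return [word_index[w] for w in sentence.split()]

def pad(seq, maxlen):
    """Post-pad with zeros, like pad_sequences(padding='post')."""
    return seq + [0] * (maxlen - len(seq))

max_q = max(len(q.split()) for q in questions)
encoder_input_data = [pad(tokenize(q), max_q) for q in questions]

max_a = max(len(a.split()) for a in answers)
decoder_input_data = [pad(tokenize(a), max_a) for a in answers]
# decoder_output_data drops the leading <START> token from each answer,
# so it is the decoder input shifted one step to the left.
decoder_output_data = [pad(tokenize(a)[1:], max_a) for a in answers]

print(encoder_input_data)
print(decoder_output_data)
```

The one-step shift between decoder input and decoder output is what lets the model learn to predict the next token at every position.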
Step 2: Defining the Encoder-Decoder Model
The model will have Embedding, LSTM and Dense layers. The basic configuration is
as follows:
1. 2 Input layers: one for encoder_input_data and another for decoder_input_data.
2. Embedding layer: for converting token vectors to fixed-size dense vectors. (Note:
don't forget the mask_zero=True argument here.)
3. LSTM layer: provides access to Long Short-Term Memory cells.
Working:
The encoder_input_data comes in through the Embedding layer (encoder_embedding).
The output of the Embedding layer goes to the LSTM cell, which produces 2 state
vectors (h and c, the encoder_states).
These states are set in the LSTM cell of the decoder.
The decoder_input_data comes in through the Embedding layer.
The embeddings go into the LSTM cell (which holds the encoder states) to produce the
output sequences.
Long Short Term Memory (LSTM):
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network
capable of learning order dependence in sequence prediction problems.
This is a behaviour required in complex problem domains like machine translation,
speech recognition, and more.
The success of LSTMs lies in their claim to be one of the first approaches to overcome
the technical problems and deliver on the promise of recurrent neural networks.
An LSTM network is comprised of different memory blocks called cells.
There are two states that are being transferred to the next cell; the cell state and the
hidden state.
The memory blocks are responsible for remembering things, and manipulations to this
memory are done through three major mechanisms, called gates.
The key to LSTMs is the cell state, often drawn as the horizontal line running through the
top of LSTM diagrams.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain,
with only some minor linear interactions. It’s very easy for information to just flow
along it unchanged.
The LSTM does have the ability to remove or add information to the cell state,
carefully regulated by structures called gates.
Gates are a way to optionally let information through. They are composed of a sigmoid
neural net layer and a pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of
each component should be let through. A value of zero means “let nothing through,”
while a value of one means “let everything through!”
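The gate behaviour described above can be seen numerically: a sigmoid squashes any activation into (0, 1), and pointwise multiplication by that value scales how much of each cell-state component passes through. The activation and cell-state vectors below are invented for illustration:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Gate activations: strongly negative closes the gate, strongly positive opens it.
gate_logits = [-6.0, 0.0, 6.0]
gate = [sigmoid(x) for x in gate_logits]

cell_state = [2.0, 2.0, 2.0]
gated = [g * c for g, c in zip(gate, cell_state)]   # pointwise multiplication

print([round(g, 3) for g in gate])    # [0.002, 0.5, 0.998]
print([round(v, 3) for v in gated])   # [0.005, 1.0, 1.995]
```

The first component is almost entirely blocked ("let nothing through"), the last passes nearly unchanged ("let everything through"), and the middle is halved.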
Step 4: Training the Model
We train the model for 150 epochs with the RMSprop optimizer and the
categorical_crossentropy loss function.
Model training accuracy = 0.96, i.e., 96%.
Step 5: Defining Inference Models
Encoder inference model: Takes the question as input and outputs LSTM states (h
and c).
Decoder inference model: Takes in 2 inputs, one being the LSTM states (the output of the
encoder model), the second being the answer input sequences (the ones not having the
<START> tag). It outputs the answers for the question which we fed to the encoder model,
together with its state values.
Step 6: Talking with our Chatbot
First, we define a method str_to_tokens which converts string questions to integer tokens
with padding.
1. First, we take a question as input and predict the state values using enc_model.
2. We set the state values in the decoder's LSTM.
3. Then, we generate a sequence which contains the <START> element.
4. We input this sequence into the dec_model.
5. We replace the <START> element with the element which was predicted by the
dec_model and update the state values.
6. We carry out the above steps iteratively till we hit the <END> tag or the maximum
answer length.
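The decoding loop above can be sketched without a trained network by substituting a hypothetical next-token table for the dec_model prediction. The table, tokens, and resulting answer are invented; a real model would predict the next token from the LSTM states:

```python
START, END = "<START>", "<END>"
MAX_LEN = 10

# Hypothetical stand-in for dec_model: maps the last token to the next one.
NEXT_TOKEN = {
    START: "classes",
    "classes": "begin",
    "begin": "at",
    "at": "nine",
    "nine": END,
}

def decode():
    """Greedy decoding: feed back each predicted token until <END> or MAX_LEN."""
    answer, token = [], START
    while len(answer) < MAX_LEN:
        token = NEXT_TOKEN[token]        # one "prediction" step
        if token == END:
            break
        answer.append(token)
    return " ".join(answer)

print(decode())   # classes begin at nine
```

The MAX_LEN cap matters because a model that never emits <END> would otherwise loop forever.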
Feasibility Study
A feasibility study is an important process that evaluates the potential success of a proposed
system. Here is a feasibility study of "CollegeBot: An AI based College Chatbot" developed
with Python Flask, NLP packages, and MySQL:
Technical Feasibility:
The proposed system is technically feasible, as it uses commonly available
technologies such as Python Flask and MySQL.
Python Flask provides a flexible framework for developing web applications and
chatbots, while MySQL is a reliable and popular database management system.
NLP packages such as NLTK, spaCy, and Scikit-learn are widely used and have
extensive documentation and community support, making the development process
smoother.
The use of AI techniques such as bag of words and TF-IDF for feature extraction,
along with NLP for language processing, adds complexity to the system but also
enhances its accuracy and performance.
Operational Feasibility:
The proposed system is operationally feasible, as it can be easily accessed by
users via a web or mobile interface.
The system requires internet connectivity and a device capable of running a web
browser, which may be a challenge in remote areas with poor connectivity or
limited access to technology.
The system can be used by users of all ages and education levels, provided they
have basic English language proficiency.
The system can provide support and information to users on a 24/7 basis, allowing
them to access relevant information at their convenience.
Economic Feasibility:
The proposed system is economically feasible, as it does not require significant
hardware or software investment.
The use of open-source technologies such as Python Flask and NLP packages
reduces development costs, while the availability of college datasets and resources
online makes it easier to gather relevant data.
However, the system will require ongoing maintenance and support, as well as
regular updates to keep it relevant and accurate.
Legal and Ethical Feasibility:
The proposed system is legally and ethically feasible, as it does not violate any
laws or ethical standards.
The system will require adherence to data privacy and security standards, such as
encrypting user data and complying with data protection laws.
The system should also be designed to avoid bias or discrimination based on
factors such as race, gender, or socio-economic status.
Overall, the feasibility study suggests that "CollegeBot: An AI based College Chatbot"
developed with Python Flask, NLP packages, and MySQL is a viable solution that can provide
users with valuable support and information related to college.
Software Testing
Software testing is the process of evaluating a software system or application to ensure that it
meets its requirements and functions as intended. In the case of CollegeBot, testing is
essential to ensure that the chatbot functions correctly and provides accurate responses to user
queries.
The following are some of the types of testing that can be performed on CollegeBot:
1. Functional Testing: This type of testing checks whether the chatbot functions as
intended and performs its specified functions. It involves testing the chatbot's user
interface, input validation, database interaction, and other functional requirements.
2. Usability Testing: This type of testing focuses on the chatbot's ease of use and user
experience. It involves testing how easily users can interact with the chatbot, how
easily they can access the information they need, and how the chatbot responds to
their queries.
3. Performance Testing: This type of testing checks the chatbot's performance under
different load conditions. It involves testing the chatbot's response time, scalability,
and resource utilization under varying levels of user activity.
4. Security Testing: This type of testing checks whether the chatbot is secure from
potential security threats. It involves testing the chatbot's authentication and
authorization mechanisms, data encryption, and other security measures.
5. Compatibility Testing: This type of testing checks the chatbot's compatibility with
different hardware and software environments. It involves testing the chatbot's
compatibility with different web browsers, operating systems, and other software
components.
In the case of CollegeBot, it is essential to perform comprehensive testing to ensure that the
chatbot is functioning correctly and providing accurate responses to user queries. By
performing various types of testing, developers can identify and fix any bugs, errors, or issues
that may arise during the testing process.
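The shape of such a functional test can be sketched in a few lines of Python. The `get_response` function below is a hypothetical stand-in for the chatbot's reply logic (the real CollegeBot routes queries through Flask and the trained NLP model); only the test structure is the point here.

```python
def get_response(query: str) -> str:
    """Toy rule-based responder used only to illustrate the test shape."""
    q = query.lower()
    if "admission" in q:
        return "Admissions open in June. See the admissions office for forms."
    if "timetable" in q:
        return "Timetables are published on the college notice board."
    return "Sorry, I can only answer college-related questions."

def test_functional():
    # Intent is recognised and a relevant, non-empty response is produced.
    assert "Admissions" in get_response("When does admission start?")
    assert "Timetables" in get_response("Where can I find the timetable?")
    # Out-of-scope queries get the fallback message.
    assert get_response("What is the weather?").startswith("Sorry")

test_functional()
print("all functional tests passed")
```

In practice these assertions would run against the Flask endpoint itself rather than a local function, but the pass/fail criteria are the same.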
Test Cases
1. Test Case ID: TRN001
Input: A dataset of 1000 college-related sentences
Expected Result: The dataset is successfully imported into the system for training.
2. Test Case ID: TRN002
Input: Pre-processed dataset with stemming, lemmatization, stop word removal and
tokenization applied
Expected Result: The pre-processing step should successfully produce a clean and consistent
dataset for training.
3. Test Case ID: TRN003
Input: Bag of words technique applied for feature extraction
Expected Result: The bag of words technique should successfully create a feature matrix
from the pre-processed dataset.
4. Test Case ID: TRN004
Input: Term frequency-inverse document frequency (TF-IDF) technique applied for feature
extraction
Expected Result: The TF-IDF technique should successfully create a feature matrix from the
pre-processed dataset.
5. Test Case ID: TRN005
Input: The chatbot is trained using the feature matrix and NLP techniques
Expected Result: The chatbot successfully learns to recognize user intent, entities and
dependencies, and generate appropriate responses.
6. Test Case ID: TST001
Input: A set of 20 test queries related to college in English
Expected Result: The chatbot successfully recognizes the intent, entities and dependencies of
the test queries and generates appropriate responses.
7. Test Case ID: TST002
Input: A set of 20 test queries related to college in other languages (e.g. Spanish, French,
Chinese)
Expected Result: The chatbot should be able to recognize the language of the input and
respond with an appropriate message informing the user that the chatbot is only able to
process queries in English.
8. Test Case ID: TST003
Input: A set of 20 test queries related to non-college topics
Expected Result: The chatbot should recognize that the queries are not related to college and
generate an appropriate response, such as suggesting the user to search for information using
a search engine.
9. Test Case ID: TST004
Input: Testing the text-to-speech functionality by inputting a query and checking if the
chatbot generates a speech output
Expected Result: The chatbot should successfully generate a speech output in response to the
user's query.
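Test cases TRN002-TRN004 above cover pre-processing, Bag of Words, and TF-IDF. The stdlib-only sketch below shows what those steps compute; the corpus and the stop-word list are illustrative, not the project's actual dataset, and a real system would use a library vectorizer rather than this hand-rolled version.

```python
from collections import Counter
import math

STOP_WORDS = {"the", "is", "a", "what", "are"}  # illustrative subset

def preprocess(sentence):
    # Lowercase, tokenize on whitespace, drop stop words (TRN002).
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

corpus = [
    "What are the courses offered",
    "What is the admission procedure",
    "What are the college timings",
]
docs = [preprocess(s) for s in corpus]
vocab = sorted({w for d in docs for w in d})

# Bag of Words: raw term counts per document (TRN003).
bow = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency (TRN004).
def tf_idf(doc, word):
    tf = Counter(doc)[word] / len(doc)
    df = sum(1 for d in docs if word in d)
    return tf * math.log(len(docs) / df)

tfidf = [[tf_idf(d, w) for w in vocab] for d in docs]
print(len(bow), len(bow[0]))  # number of documents x vocabulary size
```

Each row of `bow` or `tfidf` is the feature vector for one sentence, which is exactly the "feature matrix" the training test cases expect.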
Test Report
Test Title: Test Report for CollegeBot: An AI based College Chatbot
Introduction
CollegeBot is an AI-based chatbot that provides assistance to users by answering their
queries related to college. The chatbot is developed using Python Flask NLP Packages and
MySQL. This test report provides an overview of the testing process, the objectives of
testing, the test environment, the test results, and the conclusions drawn from the testing.
Test Objective
The objective of this testing is to verify the functionality and performance of the CollegeBot
chatbot. The testing will ensure that the chatbot is capable of correctly processing user
queries and providing accurate and relevant responses.
Test Scope
The testing will cover the following aspects of the CollegeBot chatbot:
User registration and login functionality
User query processing and response generation
Accuracy and relevance of chatbot responses
Test Environment
The following environment was used for testing:
Operating System: Windows 10
Python Version: 3.9.6
Flask Version: 2.0.1
MySQL Version: 8.0.26
Test Result
The testing process was carried out using a combination of manual and automated testing
techniques. The following test cases were executed, and the results were recorded:
TC ID | Input                                      | Expected Result                                                     | Result
TC001 | User registration with valid credentials   | User account created successfully                                   | PASS
TC002 | User registration with invalid credentials | Error message displayed                                             | PASS
TC003 | User login with valid credentials          | Login successful                                                    | PASS
TC004 | User login with invalid credentials        | Error message displayed                                             | PASS
TC005 | User query related to college admission    | Relevant response generated                                         | PASS
TC006 | User query related to timetable            | Relevant response generated                                         | PASS
TC007 | User query related to results              | Relevant response generated                                         | PASS
TC008 | User query related to courses offered      | Relevant response generated                                         | PASS
TC009 | User query with misspelled words           | Chatbot suggests corrected spelling and generates relevant response | PASS
TC010 | User query with incorrect grammar          | Chatbot suggests corrected grammar and generates relevant response  | PASS
CHAPTER 7
SYSTEM SPECIFICATION
7.1 Hardware specification
Processors: Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59 GHz (1
socket, 2 cores, 2 threads per core), 8 GB of DRAM
Disk space: 320 GB
Operating systems: Windows® 10, macOS*, and Linux*
7.2 Software specification
Server Side : Python 3.7.4 (64-bit or 32-bit)
Client Side : jQuery, HTML, CSS, Bootstrap
Framework : Flask 1.1.1
Back end : MySQL 5
Server : WampServer 2i
OS : Windows 10 (64-bit) or Ubuntu 18.04 LTS "Bionic Beaver"
ML Packages : Pandas, scikit-learn, NumPy
SOFTWARE DESCRIPTION
8.1. Python 3.7.4
Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. It was created by Guido van Rossum in the late 1980s and first
released in 1991. Python's source code is available under the GPL-compatible Python
Software Foundation License. The following packages from the Python ecosystem are
used in CollegeBot.
Pandas is mainly used for data analysis and the manipulation of tabular data in
DataFrames. Pandas can import data from various file formats such as comma-separated
values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel, and it
supports data manipulation operations such as merging, reshaping, and selecting, as well
as data cleaning and data wrangling. The development of pandas brought to Python many
of the DataFrame features established in the R programming language. The pandas library
is built on top of NumPy, which is oriented toward efficient work with arrays rather
than DataFrames.
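A short example of the merging operation described above; the frames and column names here are illustrative, not taken from CollegeBot's actual schema.

```python
import pandas as pd

# Two small illustrative DataFrames.
courses = pd.DataFrame({"course_id": [1, 2, 3],
                        "name": ["B.Sc CS", "B.Com", "B.A English"]})
fees = pd.DataFrame({"course_id": [1, 2],
                     "fee": [45000, 30000]})

# merge() behaves like an SQL join; how="left" keeps every course row,
# filling fees that have no match with NaN.
merged = courses.merge(fees, on="course_id", how="left")
print(merged.shape)  # (3, 3)
```

Here "B.A English" has no fee record, so its `fee` cell is NaN, which is how pandas represents missing data after a left join.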
NumPy
NumPy, which stands for Numerical Python, is a library consisting of multidimensional array
objects and a collection of routines for processing those arrays. Using NumPy, mathematical
and logical operations on arrays can be performed.
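For instance, arithmetic on NumPy arrays applies element-wise, and a smaller array is broadcast across a larger one:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])

# Broadcasting: b is stretched across each row of a.
print(a + b)          # [[11 22] [13 24]]
print(a.mean())       # 2.5
print((a > 2).sum())  # 2 elements are greater than 2
```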
Scikit-learn (formerly scikits.learn, also known as sklearn) is a free software machine
learning library for the Python programming language. It features various classification,
regression and clustering algorithms including support-vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy.
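A minimal classification sketch with scikit-learn. The 2-D features and intent labels below are hand-made for illustration; in CollegeBot the features would be the BoW/TF-IDF vectors and the labels the query intents.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy intent data: two "greeting" points near the origin,
# two "admission" points near (1, 1).
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y = ["greeting", "greeting", "admission", "admission"]

# 1-nearest-neighbour: each new point takes the label of its closest example.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[0.05, 0.1], [0.95, 1.0]]))  # ['greeting' 'admission']
```

The same `fit`/`predict` interface is shared by the other scikit-learn estimators mentioned above (SVMs, random forests, gradient boosting, k-means), which is what makes swapping algorithms straightforward.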
NLTK
NLTK is a leading platform for building Python programs to work with human language
data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as
WordNet, along with a suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP
libraries, and an active discussion forum.
NLTK (Natural Language Toolkit) Library is a suite that contains libraries and programs for
statistical language processing. It is one of the most powerful NLP libraries, which contains
packages to make machines understand human language and reply to it with an appropriate
response.
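Stemming, one of the pre-processing steps this report applies, is available directly from NLTK and needs no corpus downloads:

```python
from nltk.stem import PorterStemmer

# Porter stemming reduces inflected forms to a common stem.
stemmer = PorterStemmer()
words = ["studies", "running", "colleges", "applied"]
print([stemmer.stem(w) for w in words])  # e.g. 'running' -> 'run'
```

Note that a Porter stem is not always a dictionary word ("studies" becomes "studi"); the goal is only that related forms map to the same token before feature extraction.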
WordCloud
A word cloud (also called a tag cloud or weighted list) is a visual representation of
text data, in which the importance of each word is shown through font size or colour.
Python's wordcloud library makes it easy to build one in minutes: it renders the most
frequently used words in a large font and the least used words in a small font. This
helps to get a quick feel for your text data, especially when working on problems based
on natural language processing.
8.2. MySQL
MySQL is a relational database management system based on the Structured Query
Language (SQL), the standard language for accessing and managing records in a database.
MySQL is open-source and free software under the GNU license, and it is supported by
Oracle Corporation. CollegeBot uses MySQL to manage its database and manipulate data
through SQL queries: inserting records, updating records, deleting records, selecting
records, creating tables, dropping tables, and so on.
MySQL is currently the most popular database management system software for
managing relational databases. It is fast, scalable, and easy to use in comparison with
Microsoft SQL Server and Oracle Database, and it is commonly used in conjunction with
server-side scripts (often PHP) for creating powerful and dynamic web-based enterprise
applications. It was originally developed by MySQL AB, a Swedish company, and is
written in C and C++. The official pronunciation of MySQL is "My Ess Que Ell", not
"My Sequel", although either is commonly heard. Many small and large companies use
MySQL. It supports many operating systems, such as Windows, Linux, and macOS, with
bindings for languages including C, C++, and Java.
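The record operations listed above all follow Python's DB-API pattern of parameterized queries. The sketch below uses the stdlib's sqlite3 purely so it runs without a MySQL server; with `mysql.connector`, as in CollegeBot's source code, the calls are the same except that placeholders are written `%s` instead of `?`.

```python
import sqlite3

# In-memory database standing in for the project's MySQL backend.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cc_data (id INTEGER, input TEXT, output TEXT)")

# Parameterized insert: values are passed separately, never concatenated
# into the SQL string, which prevents SQL injection.
cur.execute("INSERT INTO cc_data (id, input, output) VALUES (?, ?, ?)",
            (1, "admission dates", "Admissions open in June."))
conn.commit()

cur.execute("SELECT output FROM cc_data WHERE input = ?", ("admission dates",))
print(cur.fetchone()[0])  # Admissions open in June.
```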
8.3. WampServer
WampServer is a Windows web development environment. It allows you to create web
applications with Apache2, PHP, and a MySQL database; the bundled phpMyAdmin lets
you easily manage your databases. With an intuitive interface and numerous features,
WampServer is a popular choice among developers. The software is free to use and
requires no payment or subscription.
8.4. Bootstrap 4
Bootstrap is a free and open-source tool collection for creating responsive websites and web
applications. It is the most popular HTML, CSS, and JavaScript framework for developing
responsive, mobile-first websites.
It solves many long-standing problems, one of which is cross-browser compatibility:
websites built with Bootstrap render consistently across browsers (IE, Firefox, and
Chrome) and across screen sizes (desktops, tablets, phablets, and phones). Bootstrap
was created by Mark Otto and Jacob Thornton of Twitter and was later released as an
open-source project.
Easy to use: Anybody with just basic knowledge of HTML and CSS can start using
Bootstrap
Responsive features: Bootstrap's responsive CSS adjusts to phones, tablets, and desktops
Mobile-first approach: In Bootstrap, mobile-first styles are part of the core framework
Browser compatibility: Bootstrap 4 is compatible with all modern browsers (Chrome,
Firefox, Internet Explorer 10+, Edge, Safari, and Opera)
8.5. Flask
Flask is a web framework: it provides you with tools, libraries, and
technologies that allow you to build a web application. This web application can be some
web pages, a blog, a wiki or go as big as a web-based calendar application or a commercial
website.
Flask is often referred to as a micro framework. It aims to keep the core of an application
simple yet extensible. Flask has no built-in abstraction layer for database handling,
nor does it provide form validation out of the box; instead, it relies on extensions to
add such functionality to the application. Although Flask is rather young compared to
most Python frameworks, it holds great promise and has already gained popularity among
Python web developers. Let's take a closer look at Flask, the so-called "micro"
framework for Python.
Flask was designed to be easy to use and extend. The idea behind Flask is to build a solid
foundation for web applications of different complexity. From then on you are free to plug in
any extensions you think you need. Also you are free to build your own modules. Flask is
great for all kinds of projects. It's especially good for prototyping.
Flask belongs to the category of micro-frameworks: frameworks with little to no
dependence on external libraries. This has pros and cons. The pros are that the
framework is light, and there are few dependencies to update and watch for security
bugs; the cons are that you will sometimes have to do more work yourself, or grow the
list of dependencies by adding plugins. In the case of Flask, its dependencies are:
WSGI - The Web Server Gateway Interface (WSGI) has been adopted as a standard for
Python web application development. It is a specification for a universal interface
between the web server and the web application.
Werkzeug - A WSGI toolkit that implements request and response objects and other
utility functions, making it possible to build a web framework on top of it. The Flask
framework uses Werkzeug as one of its bases.
Jinja2 - A popular templating engine for Python. A web templating system combines
a template with a data source to render dynamic web pages.
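A minimal Flask application, exercised with Flask's built-in test client so no server needs to run. The `/ask` route and its placeholder reply are illustrative; the real CollegeBot route would invoke the NLP pipeline instead.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Placeholder reply; CollegeBot would run intent recognition here.
    query = request.get_json().get("query", "")
    return jsonify({"reply": f"You asked: {query}"})

# The test client issues requests directly against the WSGI app.
with app.test_client() as client:
    resp = client.post("/ask", json={"query": "admission dates"})
    reply = resp.get_json()["reply"]
print(reply)  # You asked: admission dates
```

This is also the natural harness for the functional tests described in the testing chapter: each test case becomes one `client.post` call plus assertions on the JSON reply.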
Conclusion
In conclusion, CollegeBot, an AI-based College Chatbot developed with Python Flask NLP
Packages and MySQL, provides a solution for users to have access to college information and
solutions through an intuitive and interactive chat interface. The chatbot makes use of Natural
Language Processing (NLP) techniques such as stemming, lemmatization, removal of stop
words, and tokenization to pre-process the collected college data. It uses two feature
extraction techniques, Bag of Words (BoW) and Term Frequency-Inverse Document
Frequency (TF-IDF), to generate vector representations of the pre-processed data. The
chatbot is trained using LSTM and NLP to predict user intent, recognize entities, and
generate responses to user queries. It uses machine learning algorithms such as LSTM
to classify user queries. Performance analysis metrics such as confusion matrix,
accuracy, precision, recall, and F1-score were used to evaluate the effectiveness of the
chatbot. The results showed that the chatbot was able to accurately predict user intents and
generate appropriate responses. Thus, CollegeBot provides a user-friendly and accessible
platform for users to access college information, advice, and solutions through a chat
interface. The implementation of NLP and machine learning techniques allows for efficient
and accurate responses to user queries, improving the user experience and overall
effectiveness of the chatbot.
Future Enhancement
There are several future enhancements that can be made to "CollegeBot: An AI based
College Chatbot" developed with Python Flask NLP Packages and MySQL:
Multilingual support: Currently, the chatbot only supports queries in English.
However, it can be enhanced to support multiple languages, making it accessible to
users who speak different languages.
Image and video recognition: In addition to text-based queries, the chatbot can be
enhanced to recognize and respond to queries containing images or videos related to
college.
Personalization: The chatbot can be enhanced to personalize responses based on the
user’s previous queries and preferences.
Voice-based interactions: The chatbot can be enhanced to support voice-based
interactions, allowing users to interact with the chatbot using speech instead of text.
Integration with smart devices: The chatbot can be integrated with smart devices such
as Google Home or Amazon Echo, making it more accessible and convenient for
users to use.
Overall, these enhancements can help make "CollegeBot: An AI based College Chatbot"
more efficient, user-friendly, and accessible to users worldwide.
References
1. Zhang, Y., & Wallace, B. (2017). A sensitivity analysis of (and practitioners’ guide
to) convolutional neural networks for sentence classification. arXiv preprint
arXiv:1510.03820.
2. Goyal, P., Gupta, R., & Goyal, L. M. (2020). A review of chatbot and natural
language processing. International Journal of Advanced Research in Computer
Science, 11(4), 69-75.
3. Rashid, S. M., Abdullah, A. H., & Ahmed, M. A. (2019). Development of a chatbot
using natural language processing for customer service. International Journal of
Computer Science and Information Security (IJCSIS), 17(5), 167.
4. Lowe, R., & Pow, N. (2017). The rise of the conversational interface: A new kid on
the block. Computer, 50(8), 58-63.
5. Rajabi, A., Asgarian, A., & Ebrahimi, M. (2018). A comparative study of machine
learning algorithms for automated response selection in chatbot systems. In
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity,
Sentiment and Social Media Analysis (pp. 45-52).
6. Singh, A., & Sharma, M. (2020). AI Chatbot: A review of literature. In 2020 2nd
International Conference on Innovative Mechanisms for Industry Applications
(ICIMIA) (pp. 23-28). IEEE.
7. Saini, V., & Singh, S. (2019). A review on chatbots in customer service industry. In
2019 6th International Conference on Computing for Sustainable Global
Development (INDIACom) (pp. 313-317). IEEE.
8. Hernandez-Mendez, A., Perez-Meana, H., & Sucar, L. E. (2018). Natural language
processing and chatbots: A survey of current research and future possibilities. Journal
of Computing and Information Technology, 26(1), 1-18.
9. Debnath, B., Chakraborty, D., & Mandal, S. K. (2019). Chatbot for e-learning: A
review. In Proceedings of the 2nd International Conference on Inventive Research in
Computing Applications (pp. 186-190). IEEE.
10. Gao, W., & Huang, H. (2019). An intelligent chatbot system for online customer
service. In Proceedings of the 2019 2nd International Conference on Education and
Multimedia Technology (pp. 208-211). ACM.
11. Sarker, S., & Rana, S. (2020). AI based chatbot for customer service: A review. In
2020 IEEE Region 10 Symposium (TENSYMP) (pp. 1774-1778). IEEE.
12. Muduli, S., & Sharma, S. (2021). Implementation of a conversational chatbot system
for e-commerce. In Intelligent Computing, Information and Control Systems (pp. 753-
760). Springer.
13. Ahmad, M., Kamal, A., & Shahzad, W. (2019). A review of chatbots in customer
service. In 2019 3rd International Conference on Computing, Mathematics and
Engineering Technologies (iCoMET) (pp. 1-6). IEEE.
14. H. Jin and H. Kim, "Developing a Chatbot Service Model for Customer Support," in
International Journal of Human-Computer Interaction, vol. 36, no. 12, pp. 1188-1195,
2020.
15. J. R. Lloyd and C. A. Boyd, "The Application of Chatbots in Learning Environments:
A Review of Recent Research," in Journal of Educational Technology Development
and Exchange, vol. 13, no. 1, pp. 1-14, 2020.
16. S. Srinivasan and S. Gunasekaran, "Survey on Chatbot Development and Its
Applications," in Journal of Computer Science, vol. 16, no. 11, pp. 1398-1411, 2020.
17. M. H. Hashim, A. Alhamid, M. Aljahdali and A. Albaham, "Chatbot technology for
customer service: a systematic literature review," in International Journal of
Advanced Computer Science and Applications, vol. 10, no. 6, pp. 305-312, 2019.
18. P. L. Poon and K. D. Chau, "Designing and Implementing a Chatbot for Customer
Service," in International Journal of Innovation and Technology Management, vol.
16, no. 5, pp. 1-18, 2019.
19. Y. Liu, L. Wang and X. Liu, "Designing and Developing a Chatbot for Customer
Service," in Proceedings of the 2019 International Conference on Computer Science
and Artificial Intelligence, pp. 209-213, 2019.
20. Y. Zhao, X. Zhao, Y. Zhang and C. Liu, "A survey on chatbot design techniques," in
Journal of Network and Computer Applications, vol. 153, pp. 102-117, 2020.
21. A. Singh and A. Rani, "A Comprehensive Study on Chatbots: History, Taxonomy,
Technologies, and Future Directions," in Journal of Ambient Intelligence and
Humanized Computing, vol. 11, no. 6, pp. 2561-2595, 2020.
22. R. J. Passonneau and J. Li, "The benefits and drawbacks of chatbots in customer
service," in Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), pp. 5982-5991, 2019.
23. A. Kapoor and S. Sood, "A Survey of Chatbot Implementation Techniques," in
Proceedings of the 2020 International Conference on Smart Technologies in
Computing, Communications and Electrical Engineering (ICSTCEE), pp. 206-210,
2020.
24. Y. He, Q. Liu and Y. Yang, "A Survey of Chatbot Design Techniques in Speech
Interaction," in Proceedings of the 2020 IEEE 17th International Conference on
Networking, Sensing and Control (ICNSC), pp. 1-5, 2020.
25. S. S. Shrivastava and S. K. Sharma, "A Survey on Recent Trends in Chatbot
Development and Implementation," in Proceedings of the 2020 International
Conference on Inventive Computation Technologies (ICICT), pp. 190-196, 2020.
Web References
1. IBM Watson Assistant - https://www.ibm.com/cloud/watson-assistant/
2. Dialogflow - https://cloud.google.com/dialogflow
3. Botpress - https://botpress.com/
4. Microsoft Bot Framework - https://dev.botframework.com/
5. Rasa - https://rasa.com/
6. Pandorabots - https://www.pandorabots.com/
7. Tars - https://www.tars.com/
8. SnatchBot - https://www.snatchbot.me/
9. ManyChat - https://manychat.com/
10. BotStar - https://www.botstar.com/
Book References
1. Python Crash Course: A Hands-On, Project-Based Introduction to Programming by
Eric Matthes
2. Flask Web Development: Developing Web Applications with Python by Miguel
Grinberg
3. Learning MySQL: Get a Handle on Your Data by Seyed M.M. (Saied) Tahaghoghi
and Hugh E. Williams
4. Natural Language Processing with Python: Analyzing Text with the Natural Language
Toolkit by Steven Bird, Ewan Klein, and Edward Loper
5. Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin
6. Applied Natural Language Processing with Python: Implementing Machine Learning
and Deep Learning Algorithms for Natural Language Processing by Taweh Beysolow
II
7. Practical Natural Language Processing: A Comprehensive Guide to Building Real-
World NLP Systems by Sowmya Vajjala, Bodhisattwa Majumder, and Anuj Gupta
8. Deep Learning for Natural Language Processing: Creating Neural Networks with
Python by Palash Goyal, Sumit Pandey, and Karan Jain
9. Building Chatbots with Python: Using Natural Language Processing and Machine
Learning by Sumit Raj
10. Practical Bot Development: Designing and Building Bots with Node.js and Microsoft
Bot Framework by Szymon Rozga and Rahul Rai
Screenshots
Source code
Packages
from flask import Flask, render_template, Response, redirect, request, session, abort, url_for
import os
import base64
from datetime import datetime
import mysql.connector
import gensim
from gensim.parsing.preprocessing import remove_stopwords, STOPWORDS
from gensim.parsing.porter import PorterStemmer
import spacy
nlp = spacy.load('en')  # spaCy 2.x shortcut; in spaCy 3.x use spacy.load('en_core_web_sm')
Admin
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="",
    charset="utf8",
    database="chatbot_hospital"
)
app = Flask(__name__)
##session key
app.secret_key = 'abcdef'
def login_admin():
    cnt = 0
    act = ""
    msg = ""
    if request.method == 'POST':
        username1 = request.form['uname']
        password1 = request.form['pass']
        mycursor = mydb.cursor()
        # NOTE: passwords are compared in plain text here; hashing is recommended.
        mycursor.execute("SELECT count(*) FROM admin WHERE username=%s AND password=%s",
                         (username1, password1))
        myresult = mycursor.fetchone()[0]
        if myresult > 0:
            session['username'] = username1
            return redirect(url_for('admin'))
        else:
            msg = "Login failed!"
    return render_template('login_admin.html', msg=msg, act=act)
Add Data
def view_data():
    msg = request.args.get("msg")
    act = request.args.get("act")
    url = ""
    mycursor = mydb.cursor()
    mycursor.execute("SELECT * FROM cc_data")
    data = mycursor.fetchall()
    if request.method == 'POST':
        input1 = request.form['input']
        output = request.form['output']
        link = request.form['link']
        # Append a link to the answer only when one was supplied.
        if not link:
            url = ""
        else:
            url = ' <a href="' + link + '" target="_blank">Click Here</a>'
        output += url
        mycursor.execute("SELECT max(id)+1 FROM cc_data")
        maxid = mycursor.fetchone()[0]
        if maxid is None:
            maxid = 1
        sql = "INSERT INTO cc_data(id, input, output) VALUES (%s, %s, %s)"
        val = (maxid, input1, output)
        mycursor.execute(sql, val)
        mydb.commit()
        print(mycursor.rowcount, "record added")
        return redirect(url_for('view_data', msg='success'))
    if act == "del":
        did = request.args.get("did")
        mycursor.execute("DELETE FROM cc_data WHERE id=%s", (did,))
        mydb.commit()
        return redirect(url_for('view_data'))
    return render_template('view_data.html', msg=msg, act=act, data=data)
Chatbot
def bot():
    msg = ""
    output = ""
    uname = ""
    mm = ""
    s = ""
    xn = 0
    if 'username' in session:
        uname = session['username']
    cnt = 0
    mycursor = mydb.cursor()
    mycursor.execute("SELECT * FROM cc_register WHERE uname=%s", (uname,))
    value = mycursor.fetchone()
admin table:
Field      Type
username   varchar(20)
password   varchar(20)

cc_data table:
Field      Type
id         int(11)
input      varchar(200)
output     text
Collaboration Diagram
Deployment Diagram
ER – Diagram