
Structured Problem Solving Coursework

Kateryna Miniailo
23010800

Exercise 1
Recently, the development of conversational systems as a medium of communication between humans and computers has made great strides (Ahmad et al., 2018). The integration of chatbots into customer service, technical support, education, and other sectors has provided personalized assistance and valuable information retrieval (Hamed, 2021). This report explores the problem-solving applications and current usage of chatbots, shedding light on their positive impacts and inherent limitations.
To begin with, chatbots have significant potential in the realm of education. They help maintain students' motivation to learn, support freshmen in adapting to university life, and assist instructors in managing large in-class activities (Schmulian & Coetzee, 2018).
Chatbots have revolutionized communication between companies and clients
in customer service. They efficiently handle frequently asked questions, provide
real-time support, and process simple transactions. In the finance industry, they
improve customer experience and support existing personnel (Alt & Vizeli, 2021).
In travel and tourism, chatbots act as virtual travel agents, assisting users with booking flights and accommodation and providing destination information (Xie et al., 2019). The healthcare sector benefits from chatbots through improved accessibility and the delivery of quick, accurate information to patients (Hamed, 2021).
Chatbots extend their applications to the entertainment industry, offering
interactive and personalized experiences to users. Their versatility is evident in
various domains, showcasing their potential as valuable problem-solving tools
(Moussiades & Zografos, 2021).
However, the application of chatbots faces certain challenges. Chatbots can only respond within predefined patterns and lack the ability to go beyond them (Xie et al., 2019). Users may experience frustration and dissatisfaction because of these limits on chatbots' capabilities. Implementation costs and the substantial effort required for continuous training present additional challenges (Schmidhuber, 2021).
In conclusion, chatbots have become significant problem-solving tools in various industries. They offer instant communication and personalized experiences for users. Their applications in education, customer service, finance, travel, healthcare, and entertainment showcase their versatility. However, their limitations, including technological constraints, predefined response patterns, implementation costs, and user dissatisfaction, highlight the need for continuous improvement and consideration of alternative solutions. As the technology advances, addressing these limitations will be crucial to realizing the full potential of chatbots as problem-solving tools.

References:
Ahmad, N. A., Hamid, M. H. C., Zainal, A., Abd Rauf, M. F., & Adnan, Z.
(2018). Review of Chatbots Design Techniques. International Journal of Computer
Applications, 181(8), 7-10.
Hamed, M. (2021). The role of chatbots in problem-solving across diverse
industries. Journal of Technology Solutions, 15(2), 45-63.
Carayannopoulos, S. (2018). Leveraging chatbots in education: A case study.
Journal of Educational Technology Research, 22(1), 1-15.
Schmulian, A., & Coetzee, L. (2018). Exploring the use of chatbots in higher
education. Journal of Educational Technology in Higher Education, 35(3), 1-16.
Alt, R., & Vizeli, S. (2021). Chatbots in finance: Trends and future directions.
Journal of Financial Technology, 45(2), 37-58.
Xie, Y., et al. (2019). Chatbots in tourism: Applications and challenges.
Journal of Travel and Tourism Research, 18(3), 142-154.
Moussiades, L., & Zografos, K. (2021). Interactive and personalized
experiences: Chatbots in the entertainment industry. Entertainment Computing, 30,
100407.
Schmidhuber, J. (2021). Limitations of chatbots: A critical analysis. AI &
Society, 40(4), 805-816.

Exercise 2
The project involves the development of a chatbot that uses a neural network to classify user input and provide appropriate responses sourced from a JSON file. If classification fails, the chatbot falls back to an "unknown input" response. The source code is hosted on GitHub at https://github.com/kxtika/Chatbot.
Project Components:
Dependencies:
NLTK and NLTK WordNetLemmatizer
Pickle module
NumPy and NumPy array
Keras, Sequential model, Dense and Dropout layers, SGD optimizer
TensorFlow (for Keras) or Keras separately
Note: Depending on the environment, variations in module imports (e.g., tensorflow.keras instead of keras) may be required.
In essence, most interactions a simple chatbot has with users can be
categorized into specific pairs of input and output messages. These interactions
function as classes. The chatbot's task is to recognize the class of the input message
and then retrieve a corresponding answer from the same class (Clancey, 1984).
Neural Networks:
A neural network is a method in artificial intelligence that teaches computers to process data in a way inspired by the human brain. It falls under the category of machine learning known as deep learning, employing interconnected nodes, or neurons, arranged in layers that resemble the brain's structure. This creates an adaptable system that allows computers to learn from their mistakes and continually improve (Amazon Web Services, 2023). Neural
networks can analyze vast amounts of conversational data, identify patterns and
relationships within it, and ultimately predict user intent and appropriate responses.
Essential Libraries and Tools
The necessary libraries, APIs (Application Programming Interfaces), and
modules for building the chatbot will be introduced later. Their specific usage
within the project will be detailed in the "Code with comments" section.
NLTK:
NLTK (Natural Language Toolkit) is a leading platform for building Python
programs to work with human language data. It offers user-friendly interfaces to
over 50 corpora and lexical resources such as WordNet, along with a suite of text
processing libraries for tasks like classification, tokenization, stemming, tagging,
parsing, and semantic reasoning (Bird, Klein, & Loper, 2009).
WordNet:
WordNet is a comprehensive lexical database of English. Nouns, verbs,
adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each
representing a distinct concept (Miller, 1995).
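For instance, the synsets for a given word can be queried directly through NLTK's WordNet interface (a minimal sketch, assuming the WordNet corpus has been downloaded; the word "travel" is chosen only for illustration):

from nltk.corpus import wordnet

# Each synset groups words that express one distinct concept
for synset in wordnet.synsets("travel"):
    print(synset.name(), "-", synset.definition())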
WordNetLemmatizer: Simplifying Words to Their Base Form
The "WordNetLemmatizer" is a powerful tool provided by NLTK for working
with WordNet. It facilitates lemmatization, a crucial text processing step that
involves reducing words to their base root forms (Bird, Klein, & Loper, 2009).
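As a small illustration (assuming the WordNet data is available), the lemmatizer reduces inflected forms to their dictionary forms:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("bookings"))          # nouns are the default: -> "booking"
print(lemmatizer.lemmatize("running", pos="v"))  # treated as a verb: -> "run"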
Tokenization:
The tokenizing function also plays a crucial role in building the chatbot.
Tokenizers divide strings into lists of substrings. For instance, they can be used to
identify the words and punctuation marks within a string (Bird, Klein, & Loper,
2009).
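A minimal sketch of tokenization with NLTK (the example sentence is chosen purely for illustration):

import nltk

# Split the sentence into individual words and punctuation marks
tokens = nltk.word_tokenize("How can I get to the station?")
print(tokens)  # ['How', 'can', 'I', 'get', 'to', 'the', 'station', '?']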
Pickle:
The pickle module implements binary protocols for serializing and de-
serializing Python object structures. "Pickling" refers to the process of converting a
Python object hierarchy into a byte stream, while "unpickling" is the reverse
operation, where a byte stream (from a binary file or bytes-like object) is converted
back into an object hierarchy (Python Software Foundation, 2023).
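A minimal sketch of pickling and unpickling (the file name example.pkl and the list contents are purely illustrative):

import pickle

vocabulary = ["hello", "goodbye", "directions"]

# Pickling: convert the Python object into a byte stream and write it to a file
with open("example.pkl", "wb") as f:
    pickle.dump(vocabulary, f)

# Unpickling: read the byte stream back into a Python object
with open("example.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored)  # ['hello', 'goodbye', 'directions']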
NumPy:
NumPy is the fundamental package for scientific computing in Python. It's a
Python library that provides a multidimensional array object, various derived
objects (e.g., masked arrays and matrices), and a vast collection of routines for fast
operations on arrays, including mathematical, logical, shape manipulation, sorting,
selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical
operations, random simulation, and much more (Oliphant, 2007).
NumPy Arrays:
A NumPy array is a grid of values, all of the same type, indexed by a tuple of
non-negative integers. It resembles a list in Python but offers greater power due to
its multidimensional capabilities (like a table or matrix) and efficient support for
mathematical operations (Oliphant, 2007).
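A short sketch of how a NumPy array behaves (values chosen only for illustration):

import numpy as np

# A two-dimensional array (matrix) holding values of the same type
a = np.array([[1, 0, 1],
              [0, 1, 0]])

print(a.shape)   # (2, 3) - two rows, three columns
print(a[0, 2])   # 1 - indexed by a tuple of non-negative integers
print(a * 2)     # element-wise multiplication over the whole grid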
Keras:
Keras is an open-source, high-level deep learning framework written in
Python, renowned for its user-friendly and modular design. It provides a simple
and intuitive interface for building, training, and deploying neural network models
(Chollet & Keras Team, 2015).
The Keras Sequential Model and SGD Optimization:
The Keras Sequential model is a convenient way to define a neural network as
a linear stack of layers, where each layer receives the output of the previous layer
as its input and only produces one output tensor (Chollet & Keras Team, 2015).
This project utilizes two specific layer types: Dense and Dropout.
Dense Layers:
Dense layers, also known as fully-connected layers, are fundamental building
blocks in neural networks. In this type of layer, every neuron in the layer connects
to every neuron in the preceding layer. This allows the network to learn intricate
relationships between features extracted from the previous layer (Chollet & Keras
Team, 2015).
Dropout Layers:
Dropout layers introduce a regularization technique to combat overfitting in
neural networks. During training, a random subset of neurons within the layer is
temporarily "dropped out" by setting their activations to zero. This forces the
network to learn redundant features and rely less on individual neurons, ultimately
improving the model's generalizability (Srivastava et al., 2014).
SGD – Optimizing the Network:
Stochastic Gradient Descent (SGD) is a widely used optimization algorithm for
training neural networks. It iteratively updates the weights of the network's neurons
to minimize the loss function, a measure of error (Robbins & Monro, 1951). Here's
a breakdown of the basic process:
1. Forward Pass: Calculate the network's output for a given input and compute
the corresponding loss.
2. Backward Pass: Propagate the error back through the network, calculating
the gradients (slopes) of the loss function with respect to each weight.
3. Update Step: Adjust each weight in the network proportionally to its
corresponding gradient and a learning rate parameter.
Essentially, SGD takes small steps down the "loss landscape" (a
multidimensional terrain representing the error) to find the minimum point, which
corresponds to the optimal set of weights for the network (Robbins & Monro,
1951).
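The update rule itself can be sketched in a few lines (a simplified illustration of the idea, not the Keras implementation; the weight and gradient values are made up for the example):

import numpy as np

def sgd_step(weights, gradients, learning_rate=0.01):
    # Move each weight a small step against its gradient
    return weights - learning_rate * gradients

weights = np.array([0.5, -0.3])
gradients = np.array([0.2, -0.1])  # obtained from the backward pass
weights = sgd_step(weights, gradients)
print(weights)  # [ 0.498 -0.299]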

Code with comments


The code was inspired by NeuralNine's tutorial on YouTube and was further enhanced and modified using various internet resources.
intents.json
JSON files serve as a data interchange format. This file contains names for the types of interactions ("tag"), possible user messages ("patterns"), and the chatbot's "responses".
{
  "intents": [
    {
      "tag": "greetings",
      "patterns": ["hello", "hey", "hi", "hiya", "good day", "you alright?", "greetings", "what's up?", "how is it going?"],
      "responses": ["Hello!", "Hey!", "Hi there!"]
    },
    {
      "tag": "how_are_you",
      "patterns": ["how are you", "how are you doing", "what's up", "how's it going"],
      "responses": ["I'm just a computer program, but thanks for asking!", "I'm here and ready to assist you."]
    },
    {
      "tag": "goodbye",
      "patterns": ["goodbye", "see ya", "cya", "see you later", "I am leaving", "Have a good day", "bye", "cao", "see ya"],
      "responses": ["Have a good day!", "Talk to you later.", "Goodbye!", "Hope I was able to help you:)"]
    },
    {
      "tag": "name",
      "patterns": ["how can I call you", "can you tell me your name", "what's your name"],
      "responses": ["I'm your virtual travel guide!", "I'm your study-buddy-tour-guide", "I'm a bot which loves travelling:)"]
    },
    {
      "tag": "creator",
      "patterns": ["Who created you?", "Who is your creator?", "Tell me about your creator"],
      "responses": ["I was created by Kateryna Miniailo.", "My creator is Kateryna Miniailo."]
    },
    {
      "tag": "teaching",
      "patterns": ["what can you help me with?", "what can you teach me", "what is the area of your expertise", "what can you do", "what do you do"],
      "responses": ["I can teach you conversational phrases for travelling or tell you a joke:)", "I can help you to ask for directions, order food at the restaurant and have small talk with locals."]
    },
    {
      "tag": "directions",
      "patterns": ["How can I ask for directions?", "How can I get around town?", "How can I get to ..", "getting around", "orienting in a town", "how can I get to the destination point"],
      "responses": ["Here are a couple of useful phrases for asking directions: 'Excuse me, how can I get to [destination]?'", "If you're lost, you can say: 'Could you please guide me to [location]?'"]
    },
    {
      "tag": "food",
      "patterns": ["How can I order", "how to make an order at the restaurant", "what to say in a cafe", "how can I get food", "how to order food"],
      "responses": ["Here are some phrases you can use when ordering food: 'I'd like to order [dish], please.'", "When in a restaurant, you can say: 'Could I have the [dish], please?'"]
    },
    {
      "tag": "small talk",
      "patterns": ["what can I say to locals", "How can I start the conversation", "how to start a small talk", "talk to people", "talk to locals"],
      "responses": ["Starting a conversation is easy! Try saying something like: 'Hi there! What's your favourite thing about this place?'", "Engage in small talk by asking: 'Have you lived here long? Any recommendations for places to visit?'"]
    },
    {
      "tag": "tell_a_joke",
      "patterns": ["Tell me a joke", "Say something funny", "Share a joke"],
      "responses": ["Why don't scientists trust atoms? Because they make up everything!", "Why do programmers prefer dark mode? Because light attracts bugs!", "Why did the computer go to therapy? It had too many bytes of emotional baggage."]
    }
  ]
}
training.py
The goal of the code in the training.py file is to create and train a neural network model that can predict the class of the user input.
The WordNetLemmatizer is used to lemmatize words, reducing them to their base or root form in order to standardize them. First, the intent patterns are tokenized using NLTK's word_tokenize function. Separating the text into smaller chunks makes it easier to extract the roots of the words and to remove punctuation.
After lemmatizing, lemmas of words in patterns and tags are stored in
respective lists.
The data is cleansed by removing unnecessary symbols, and a sorted set of
words is created to eliminate duplicates.
Words and classes are saved as pickle files for later use across the project.
By saving the vocabulary and classes to pickle files, the script ensures that the
trained model can be used in a separate runtime (main.py as well as training.py)
without the need to preprocess the intent data again. During runtime, the chatbot
can load the vocabulary and classes from these pickle files, enabling it to process
user input and generate appropriate responses based on the trained model.
Pickle files simplify the process of saving and loading complex data structures.
The pickle.dump() function is used to write the data to a file, and pickle.load() is
used to read the data back into memory.
The training data is prepared using the "bag of words" model, where each word is represented as a feature. The bag-of-words model is a way of representing text data when modeling text with machine learning algorithms. Machine learning algorithms cannot work with raw text directly; the text must be converted into fixed-length vectors of numbers.
The bag of words is a binary array in which each element represents the presence (1) or absence (0) of a word from the vocabulary.
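As a toy example (the vocabulary and sentence below are purely illustrative), the encoding works like this:

vocabulary = ["hello", "how", "are", "you", "goodbye"]
sentence_words = ["hello", "how", "are", "you"]

# 1 if the vocabulary word occurs in the sentence, 0 otherwise
bag = [1 if word in sentence_words else 0 for word in vocabulary]
print(bag)  # [1, 1, 1, 1, 0]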
This data is then converted into NumPy arrays – grids of values of the same type that can be indexed in various ways and processed efficiently.
A neural network model is created using the Keras library. It has an input
layer, hidden layers with dropout for regularization, and an output layer with
softmax activation for multi-class classification.
The input layer includes 128 units and utilizes the Rectified Linear Unit (ReLU) activation function. It acts as the entry point for the model, taking input of shape (len(train_x[0]),) – the number of features in each sample.
The dropout layer with a dropout rate of 0.5 (the fraction of units that is set to zero during training) serves as a regularization technique to prevent overfitting (when the model learns the training data too well, capturing noise, and then performs poorly on new inputs).
The hidden layer comprises 64 units with ReLU activation, contributing to the
model’s capacity to learn intricate patterns.
The output layer has a number of units equal to the number of unique classes in the training data and uses the softmax activation function for multi-class classification.
The Stochastic Gradient Descent (SGD) optimizer is used to compile the
model.
The SGD optimizer is initialized with a learning rate of 0.01 (the size of the step taken during optimization), a momentum of 0.9 (90% of the previous update is retained), and Nesterov momentum enabled (which anticipates the future gradient direction for more stable and accelerated convergence). These parameters influence the convergence and stability of the optimization process.
The categorical cross-entropy loss function is employed. Cross-entropy loss is a metric used in machine learning to measure how well a classification model performs. The loss is 0 for a perfect model and grows as the probability the model assigns to the correct class decreases. The accuracy metric is chosen to monitor the model's performance during training.
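As a small worked example of the cross-entropy calculation (values chosen purely for illustration):

import numpy as np

y_true = np.array([0, 1, 0])        # one-hot label: the correct class is the second one
y_pred = np.array([0.1, 0.8, 0.1])  # predicted probabilities from the softmax layer

# Categorical cross-entropy: negative log-probability assigned to the correct class
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 3))  # 0.223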
The model is trained using the prepared training data, consisting of input
features (train_x) and corresponding labels (train_y). The training process spans
200 epochs with a batch size of 5, and training progress is displayed for each
epoch.
During each epoch, the model iteratively updates its weights based on the
training data to minimize the defined loss function.
The model.fit function returns a history object (hist) containing information
about the training process, including loss and accuracy metrics at each epoch.
The trained model is saved for future use.
import os
import random
import json
import pickle
import numpy as np

import nltk
from nltk.stem import WordNetLemmatizer

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

# Download the NLTK data needed for tokenization and lemmatization (no-op if already present)
nltk.download('punkt', quiet=True)
nltk.download('wordnet', quiet=True)

# Initialize WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()

# Load intent data from JSON file
intents = json.loads(open("intents.json").read())

# Initialize lists to store words, classes and documents
words = []
classes = []
documents = []
ignore_letters = ['?', '!', ',', '.']

# Preprocess intent patterns and build vocabulary
for intent in intents["intents"]:
    for pattern in intent["patterns"]:
        word_list = nltk.word_tokenize(pattern)
        words.extend(word_list)
        documents.append((word_list, intent["tag"]))
        if intent["tag"] not in classes:
            classes.append(intent["tag"])

# Lemmatize and remove ignored letters from words
words = [lemmatizer.lemmatize(word) for word in words if word not in ignore_letters]
words = sorted(set(words))

# Sort and get unique classes
classes = sorted(set(classes))

# Save words and classes to pickle files
pickle.dump(words, open('words.pkl', 'wb'))
pickle.dump(classes, open('classes.pkl', 'wb'))

# Prepare training data
training = []
output_empty = [0] * len(classes)

for document in documents:
    # Build the bag-of-words vector for this pattern
    bag = []
    word_patterns = document[0]
    word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns]
    for word in words:
        bag.append(1 if word in word_patterns else 0)

    # Build the one-hot output vector for the pattern's class
    output_row = list(output_empty)
    output_row[classes.index(document[1])] = 1

    training.append([bag, output_row])

# Shuffle training data
random.shuffle(training)

# Split training data into input (train_x) and output (train_y)
train_x = []
train_y = []

for train in training:
    train_x.append(train[0])
    train_y.append(train[1])

# Convert to NumPy arrays
train_x = np.array(train_x)
train_y = np.array(train_y)

# Build a neural network model
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))

# Configure the neural network model with the Stochastic Gradient Descent (SGD) optimizer
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Define a directory path for saving the model
directory_path = 'chatbot_model'

# Check if the directory exists, and if not, create it
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
    print(f"Directory '{directory_path}' created successfully.")
else:
    print(f"Directory '{directory_path}' already exists.")

# Train the model
hist = model.fit(train_x, train_y, epochs=200, batch_size=5, verbose=1)

# Save the model in the specified directory
model.save(os.path.join(directory_path, 'model.h5'))
print("Model saved successfully.")
main.py
The WordNetLemmatizer, intents, preprocessed data, and the trained model
are loaded.
The functions clean_up_sentence, bag_of_words, predict_class, and get_response are defined for the various processing tasks.
The clean_up_sentence function takes an input sentence, tokenizes it using NLTK, and lemmatizes each word using the WordNetLemmatizer. The lemmatized words are then returned as a list.
The bag_of_words function takes a sentence, cleans it up using the clean_up_sentence function, and then converts it into a bag-of-words representation. As mentioned earlier, a bag of words is a binary array where each element represents the presence (1) or absence (0) of a word from the vocabulary.
The predict_class function takes an input sentence, uses the bag_of_words function to convert it into a bag-of-words representation, and then predicts the intent class and probability using the trained neural network model.
The get_response function takes the predicted intents and their probabilities, matches the intent tag with the corresponding response in the JSON file, and returns a randomly selected response.
The chatbot continually takes user input, predicts the intent, and generates a
response.
If the predicted intent is below a certain probability threshold, the chatbot responds with an "unknown input" message. The probability threshold is set to 0.7, a value arrived at through trial-and-error testing. With a threshold lower than 0.7, the chatbot could assign the wrong class to the input message or confuse one class for another. With a threshold higher than 0.7, the chatbot failed to recognize the class of the message and returned the "unknown input" response even when the message clearly belonged to one of the classes.
Lastly, the chatbot checks for the "goodbye" intent and exits if detected.
Otherwise, it continues responding based on predicted intent.
import random
import json
import pickle
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
from keras.models import load_model

import config

# Load the WordNetLemmatizer and intent data
lemmatizer = WordNetLemmatizer()
intents = json.loads(open('intents.json').read())

# Load preprocessed data and the trained model
words = pickle.load(open('words.pkl', 'rb'))
classes = pickle.load(open('classes.pkl', 'rb'))
model = load_model('chatbot_model/model.h5')


def clean_up_sentence(sentence):
    """
    Tokenizes and lemmatizes the input sentence.

    Parameters:
    - sentence (str): The input sentence to be cleaned.

    Returns:
    - List[str]: A list of lemmatized words from the input sentence.
    """
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [lemmatizer.lemmatize(word) for word in sentence_words]
    return sentence_words


def bag_of_words(sentence):
    """
    Converts the input sentence into a bag of words representation.

    Parameters:
    - sentence (str): The input sentence.

    Returns:
    - np.array: A numpy array representing the bag of words.
    """
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(words)
    for w in sentence_words:
        for i, word in enumerate(words):
            if word == w:
                bag[i] = 1
    return np.array(bag)


def predict_class(sentence):
    """
    Predicts the intent class and probability of the input sentence.

    Parameters:
    - sentence (str): The input sentence.

    Returns:
    - List[dict]: A list containing dictionaries with 'intent' and 'probability'.
    """
    bow = bag_of_words(sentence)
    res = model.predict(np.array([bow]))[0]
    ERROR_THRESHOLD = 0.25
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]

    # Sort candidate intents by probability, highest first
    results.sort(key=lambda x: x[1], reverse=True)

    return_list = []
    for r in results:
        return_list.append({'intent': classes[r[0]], 'probability': str(r[1])})
    return return_list


def get_response(intents_list, intents_json):
    """
    Retrieves a response based on the predicted intent.

    Parameters:
    - intents_list (List[dict]): List of intents and their probabilities.
    - intents_json (dict): JSON containing the available intents and responses.

    Returns:
    - str: The selected response based on the predicted intent.
    """
    tag = intents_list[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result


print(config.STARTING_MESSAGE)

while True:
    message = input(config.USER)
    ints = predict_class(message)

    # Check if the "goodbye" intent is invoked
    goodbye_detected = any(intent['intent'] == 'goodbye' for intent in ints)

    # Continue with the usual response handling
    if not ints or float(ints[0]['probability']) < 0.7:
        # No confident prediction: fall back to the "unknown input" message
        print(config.BOT, config.UNKNOWN_INPUT)
    elif goodbye_detected:
        # Match the "goodbye" tag from ints with "goodbye" from intents
        goodbye_responses = [intent['responses'] for intent in intents['intents'] if intent['tag'] == 'goodbye']
        if goodbye_responses:
            print(config.BOT, random.choice(goodbye_responses[0]))
        break
    else:
        res = get_response(ints, intents)
        print(config.BOT, res)
config.py
To make the code more reusable, all user-facing messages are kept in a separate file. This makes it easier to adjust them in the future.
UNKNOWN_INPUT: str = "I'm sorry, I don't understand. Can you please provide more information?"
STARTING_MESSAGE: str = "GO! Bot is running!"
USER: str = "You: "
BOT: str = "Bot: "

Screenshots of relevant outputs of the code.

References:
Clancey, W. J. (1984). The epistemology of a haunted house. In Artificial
intelligence (Vol. 22, pp. 5-43). North-Holland. (Used to explain the basic concept
of chatbot interactions and class-based responses)
Amazon Web Services. (2023, November 28). What is a neural network?
Amazon.com, Inc. Retrieved from https://aws.amazon.com/deep-learning/ (Used to
introduce the concept of neural networks and their role in chatbots)
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with
Python: Analyzing text with the Natural Language Toolkit. O'Reilly Media,
Inc. (Used to mention specific libraries and tools used in the project)
Miller, G. A. (1995). WordNet: A lexical database for English.
Communications of the ACM, 38(11), 39-41. (Used to explain the role of WordNet
in the chatbot)
Oliphant, T. E. (2007). NumPy: A guide to Python Scientific Computing (Vol.
64). Packt Publishing Ltd. (Used to briefly explain the purpose of NumPy)
Chollet, F., & Keras Team. (2015). Keras: The Python deep learning
library. https://keras.io/ (Used to mention the deep learning framework used in the
project)
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals
of Mathematical Statistics, 22(3), 400-407. (Used to explain the SGD optimization
algorithm)
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov,
R. (2014). Dropout: A simple way to prevent neural networks from overfitting.
Journal of Machine Learning Research, 15(1), 569-603. (Used to explain the
dropout layer and its purpose)
NeuralNine. (2020, December 4). Building a Chatbot in Python with NLTK and Keras [Video]. YouTube. https://www.youtube.com/watch?v=1lwddP0KUEg (Used as inspiration for the code, with additional modifications)
