
CM3060: Natural Language Processing - Coursework: Text classification

(Student No: 200644857)


Report Content:
1. Introduction, Target Domain Issue, Objectives
2. Dataset
Reason of Choice
Dataset MetaData
3. Evaluation Methodologies
4. Import Library Dependencies
5. Simple Dataset EDA
6. Dataset Pre-processing
Drop columns that are not needed
Reduce Noise
Lowercasing
Remove Punctuation
Tokenization
Remove Stopwords
Stemming
Lemmatization
7. Understand Dataset, Model Weightage (sentiment distribution)
8. Sentiment Analysis
Naive Bayes
Model Performance (Evaluation)
Model Performance (Visualization)
Decision Tree
Model Performance (Evaluation)
Model Performance (Visualization)
Logistic Regression
Model Performance (Evaluation)
Model Performance (Visualization)
Random forest regressor
Model Performance (Evaluation)
Model Performance (Visualization)
9. Model Comparison: Observations
Score Evaluation
Metrics Comparison (Visualization)
10. Reflective Evaluation & Conclusion
11. Contributions to the selected domain-specific area & potential scope of project (transferability)
12. Conclusion, Personal Reflection- what I think I could have done better?
13. References

Introduction: Target Domain Issue & Project objectives


For my module coursework, I chose to develop a sentiment analysis of airline customer reviews, specifically those of British Airways.
Sentiment analysis is the process of determining the tone or viewpoint (sentiment) expressed in a piece of text or conversation. Automating this process gives insight into different points of
view on, and features of, the subject under review. Here, it tells us more about customer experiences of service, in-flight experience, punctuality, general satisfaction, and so on.
The project's uniqueness, ambition, and creativity stem from applying sentiment analysis specifically to airline reviews. While sentiment analysis has been researched extensively, applying it to airline
reviews presents new problems and offers insight into customer feeling in the airline sector.
All reference code is fully commented and makes explicit where any third-party functionality ends and my own code begins.
In short, my project is centred on creating a text classifier for sentiment analysis of British Airways reviews. The goal is to deliver a solution that accurately categorises the sentiments expressed in these reviews,
providing insight into customer experiences and opinions.
Dataset & Reason of Choice
This dataset was obtained from Kaggle.com, where a user (Athani Nikhil) shared it publicly. I chose it for how comprehensive and detailed it is, including a large number of varied
and unique customer reviews. According to its description, it contains information on every British Airways flight reviewed on Skytrax from October 9, 2011 until February 16, 2023, covering approximately
3472 flights, including the origin and destination airports (Route), customer comments, class category, and recommendations.
The CSV therefore offers a wide range of reviews and is regularly updated. Such variability should benefit our analysis. I did come across a few other similar airline review datasets, but none
of them were as robust as this one. Based on the information I gathered while comparing them, I ultimately opted for the dataset below because it offered better features and more flexibility than the
others.
Dataset Meta Data
Name of Dataset: British Airlines Reviews
Owner (& Information): Athani Nikhil
Year: British Airline reviews from 2011 - 2023
Description: dataset that includes information on every flight taken by British Airways (Skytrax) from October 9, 2011 to February 16, 2023. The dataset includes information on about 3472 flights, including the
origin and destination airports (Route), customer feedback, class type they travelled in and the recommendations.
License Type: Public, Open Source
File name: 'BAreviews.csv'
Source: https://www.kaggle.com/datasets/athaninikhil/british-airlines-reviews
Alternate preview of CSV: https://drive.google.com/file/d/1YiotT_C3rkTSZI82Hs8PXWcglamdwk1V/view?usp=sharing
Evaluation
To assess how well our models predict sentiment, we will analyse several metrics (accuracy, precision, recall, and F1-score) that describe the classifier's performance.
Additionally, we will present a table with detailed information on each model's performance. We will also visualise the performance metrics in a clear and readable way.
In addition, I will analyse the confusion matrix to better understand how well the model predicts sentiment. By comparing the predicted sentiment with the actual sentiment in the test data, I will be able to identify
patterns of accurate and inaccurate classifications. To make the confusion matrix easier to interpret, I will create a visual representation using Plotly. This will help me identify any trends or inconsistencies more easily.
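As a rough illustration of how the metrics named above relate to confusion-matrix counts, the sketch below computes per-class precision, recall, and F1, plus the macro average. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this report, and the function names are my own.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def macro_average(per_class_scores):
    """Unweighted mean of a per-class metric (the 'macro avg' idea)."""
    return sum(per_class_scores) / len(per_class_scores)


# Hypothetical confusion matrix (rows = true label, columns = predicted label):
#                 pred Negative  pred Positive
# true Negative         30             20
# true Positive         10            140
neg = precision_recall_f1(tp=30, fp=10, fn=20)
pos = precision_recall_f1(tp=140, fp=20, fn=10)
accuracy = (30 + 140) / 200

print("Negative (P, R, F1):", [round(v, 3) for v in neg])
print("Positive (P, R, F1):", [round(v, 3) for v in pos])
print("Accuracy:", accuracy)
print("Macro F1:", round(macro_average([neg[2], pos[2]]), 3))
```

Because the macro average weights both classes equally, it drops noticeably when the minority class is predicted poorly, even if overall accuracy stays high.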
Import Necessary Libraries
In [1]: import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package punkt to
[nltk_data] /Users/prasannapalaniappan/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data] /Users/prasannapalaniappan/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data] /Users/prasannapalaniappan/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data] /Users/prasannapalaniappan/nltk_data...
[nltk_data] Package omw-1.4 is already up-to-date!
Out[1]: True

In [2]: # Data manipulation and analysis


import pandas as pd # For data manipulation and analysis
import numpy as np # For numerical computations

# Regular expressions and math


import re # For regular expressions
import math # For mathematical operations

# Data visualization
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.io as pio

# Natural Language Processing (NLP) libraries


from nltk import classify # For classification tasks
from nltk import NaiveBayesClassifier # Naive Bayes Text Classifier
from nltk.corpus import stopwords # Stopword corpus
from nltk.tokenize import word_tokenize # Word tokenizer
from nltk.stem import SnowballStemmer # Word stemmer
from nltk.stem import WordNetLemmatizer # Word lemmatizer

# Machine learning libraries


from sklearn.model_selection import train_test_split, StratifiedShuffleSplit, GridSearchCV # For model evaluation and selection
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer # Text feature extraction

from sklearn.pipeline import Pipeline # For creating pipelines


from sklearn.naive_bayes import MultinomialNB # Naive Bayes classifier
from sklearn.linear_model import LogisticRegression # Logistic regression classifier
from sklearn.tree import DecisionTreeClassifier # Decision tree classifier
from sklearn.svm import LinearSVC # Linear Support Vector Classifier (SVC)
from sklearn.ensemble import RandomForestClassifier # Random Forest classifier

# Evaluation metrics
from sklearn import metrics
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
# Disable warnings
import warnings
warnings.filterwarnings('ignore')

Simple Dataset EDA


In [3]: df = pd.read_csv('BAreviews.csv') # read the 'BAreviews.csv' dataset into 'df'
df = df.iloc[:, 1:] # drop the ID column

print("Number of Reviews :", df.shape[0]) #dataset size


df.head(5)

Number of Reviews : 3473


Out[3]: Review Header Review Date Review Class Type Sentiment
0 "crew were really nice" 16th February 2023 ✅ Trip Verified | This was my first time flyin... Economy Class Positive
1 "Lots of cancellations and delays" 15th February 2023 ✅ Trip Verified | Lots of cancellations and d... Economy Class Negative
2 "Overall, very happy with BA" 7th February 2023 ✅ Trip Verified | BA 242 on the 6/2/23. Boardi... Economy Class Positive
3 "the best airline I've flown with" 6th February 2023 ✅ Trip Verified | Not only my first flight in... Economy Class Positive
4 "so determined to help" 4th February 2023 ✅ Trip Verified | My husband and myself were ... Economy Class Positive

In [4]: print("")
print("Dataset Shape =>", df.shape) #get dataset shape

print("")
print("Dataset Info")
print("-"*18)
print(df.info()) #get dataset info

print("")
print("")
print("Dataset Summary")
print("-"*18)
print(df.describe()) #get dataset summary
Dataset Shape => (3473, 5)

Dataset Info
------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3473 entries, 0 to 3472
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Review Header 3473 non-null object
1 Review Date 3473 non-null object
2 Review 3473 non-null object
3 Class Type 3472 non-null object
4 Sentiment 3473 non-null object
dtypes: object(5)
memory usage: 135.8+ KB
None

Dataset Summary
------------------
Review Header Review Date \
count 3473 3473
unique 2464 1669
top British Airways customer review 19th January 2015
freq 956 26

Review Class Type \


count 3473 3472
unique 3465 4
top British Airways from Tampa to Gatwick on Boein... Economy Class
freq 2 2789

Sentiment
count 3473
unique 2
top Positive
freq 2746

In [5]: #check for duplicate values


print("")
print("Duplicate Values => ",df.duplicated().sum())

print("")
print("Missing Values")
print("-"*18)
print(df.isna().sum())

fig = px.imshow(df.isnull(), width= 800, height= 300)


fig.show()
Duplicate Values => 6

Missing Values
------------------
Review Header 0
Review Date 0
Review 0
Class Type 1
Sentiment 0
dtype: int64

[Figure: heatmap of df.isnull() (missing values); x-axis: Review Header, Review Date, Review, Class Type, Sentiment; y-axis: row index]

Dataset Pre-Processing
1) Drop columns that are not needed
In [6]: '''
We do not need the 'Review Date' and 'Class Type' columns
for our sentiment analysis, so we drop them.
'''
newdf = df.drop(["Review Date", "Class Type"], axis=1)
newdf.head(5)

Out[6]: Review Header Review Sentiment


0 "crew were really nice" ✅ Trip Verified | This was my first time flyin... Positive
1 "Lots of cancellations and delays" ✅ Trip Verified | Lots of cancellations and d... Negative
2 "Overall, very happy with BA" ✅ Trip Verified | BA 242 on the 6/2/23. Boardi... Positive
3 "the best airline I've flown with" ✅ Trip Verified | Not only my first flight in... Positive
4 "so determined to help" ✅ Trip Verified | My husband and myself were ... Positive

3) Reduce Noise
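In [7]: '''
Remove the "Trip Verified" marker
from the 'Review' column for noise removal and standardization.
'''
# Caveat: str.lstrip() treats its argument as a set of characters, not a prefix,
# so it also strips a leading 'T' (row 0 below reads "his was ..." rather than "This was ...").
newdf['Review'] = newdf['Review'].str.lstrip(' ✅ Trip Verified | ')
newdf.head(5)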

Out[7]: Review Header Review Sentiment


0 "crew were really nice" his was my first time flying with BA & I was p... Positive
1 "Lots of cancellations and delays" Lots of cancellations and delays and no one ap... Negative
2 "Overall, very happy with BA" BA 242 on the 6/2/23. Boarding was delayed due... Positive
3 "the best airline I've flown with" Not only my first flight in 17 years, but also... Positive
4 "so determined to help" My husband and myself were flying to Madrid on... Positive
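The `lstrip` call in the cell above removes any leading characters that occur in its argument, which is why the first review loses the "T" of "This". A minimal sketch of a prefix-based alternative follows, using plain strings rather than the DataFrame. The regular expression, the helper names, and the sample sentence are my own assumptions for illustration, not part of the original notebook.

```python
import re

# Character-set stripping, as str.lstrip does: every leading character found
# in the argument is removed, regardless of the order the characters appear in.
def strip_chars(text):
    return text.lstrip(" ✅ Trip Verified | ")

# Prefix stripping: remove the marker only when it appears verbatim at the start.
MARKER = re.compile(r"^\s*✅?\s*Trip Verified\s*\|\s*")

def strip_marker(text):
    return MARKER.sub("", text)

sample = "✅ Trip Verified | This was my first time flying with BA"
print(strip_chars(sample))   # his was my first time flying with BA
print(strip_marker(sample))  # This was my first time flying with BA
```

In pandas, the same pattern could presumably be applied with `Series.str.replace(pattern, '', regex=True)`. The scores reported later in this notebook were produced with the original `lstrip` call, so they would need to be re-run after such a change.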

4) Lowercasing
In [8]: '''
Lowercasing:
Normalises the data so that words compare equal regardless of case.
This keeps the input consistent for the models and reduces vocabulary size,
which improves efficiency and helps capture the underlying semantics.
'''
newdf['Review Header'] = newdf['Review Header'].str.lower()
newdf['Review'] = newdf['Review'].str.lower()

newdf.head(5)

Out[8]: Review Header Review Sentiment


0 "crew were really nice" his was my first time flying with ba & i was p... Positive
1 "lots of cancellations and delays" lots of cancellations and delays and no one ap... Negative
2 "overall, very happy with ba" ba 242 on the 6/2/23. boarding was delayed due... Positive
3 "the best airline i've flown with" not only my first flight in 17 years, but also... Positive
4 "so determined to help" my husband and myself were flying to madrid on... Positive

5) Remove Punctuation
In [9]: '''
Remove Punctuation:
For data normalisation. Remove punctuation from the review text.
Applied to both columns: Review Header, Review
'''
def removeP(review):
    punct = r'[^\w\s]'  # anything that is not a word character or whitespace
    review = re.sub(punct, '', review)
    return review

newdf['Review Header'] = newdf['Review Header'].apply(removeP)
newdf['Review'] = newdf['Review'].apply(removeP)

newdf.head(5)

Out[9]: Review Header Review Sentiment


0 crew were really nice his was my first time flying with ba i was pl... Positive
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive
4 so determined to help my husband and myself were flying to madrid on... Positive
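Deleting punctuation outright merges the pieces on either side of it, which is why "6/2/23" appears as "6223" and "i've" as "ive" in the output above. The sketch below compares that behaviour with one possible variant that substitutes a space instead. The variant and its function names are assumptions for illustration, and it has its own trade-off (for example "i've" becomes "i ve").

```python
import re

def remove_punct_joined(text):
    """Behaviour of the cell above: delete punctuation outright."""
    return re.sub(r"[^\w\s]", "", text)

def remove_punct_spaced(text):
    """Variant: replace punctuation with a space, then collapse whitespace."""
    spaced = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", spaced).strip()

sample = "ba 242 on the 6/2/23. boarding was delayed"
print(remove_punct_joined(sample))  # ba 242 on the 6223 boarding was delayed
print(remove_punct_spaced(sample))  # ba 242 on the 6 2 23 boarding was delayed
```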

6) Tokenization
In [10]: '''
Tokenize:
The tokenizer function splits the review text into
individual words (tokens).
Each token is then converted to lowercase for consistency and
easier word comparisons.
Applying the tokenizer function to the "Review" column
creates a new "Tokenized Review" column with the tokenized text.
'''
def tokenizer(review):
    tokens = word_tokenize(review)
    return [token.lower() for token in tokens]

# Caveat: this reads from df (the raw frame), not newdf, so the tokens below still
# contain the "Trip Verified" marker and punctuation. Tokenizing newdf['Review']
# instead would use the cleaned text.
newdf['Tokenized Review'] = df['Review'].apply(tokenizer)

newdf.head(5)

Out[10]: Review Header Review Sentiment Tokenized Review


0 crew were really nice his was my first time flying with ba i was pl... Positive [✅ , trip, verified, |, this, was, my, first, t...
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative [✅ , trip, verified, |, lots, of, cancellations...
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive [✅ , trip, verified, |, ba, 242, on, the, 6/2/2...
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive [✅ , trip, verified, |, not, only, my, first, f...
4 so determined to help my husband and myself were flying to madrid on... Positive [✅ , trip, verified, |, my, husband, and, mysel...

6) Remove Stopwords
In [11]: '''
Stopwords are very common words that carry little sentiment on their own.
We remove them from the 'Tokenized Review' column.
The NLTK library provides the set of English stopwords.
My removeSW function removes stopwords from each tokenized review,
returning a new filtered list of tokens.
'''
stopWords = set(stopwords.words('english'))

def removeSW(tok_review):
    new_tok_review = [word for word in tok_review if word.lower() not in stopWords]
    return new_tok_review

newdf['Tokenized Review'] = newdf['Tokenized Review'].apply(removeSW)

newdf.head(5)

Out[11]: Review Header Review Sentiment Tokenized Review


0 crew were really nice his was my first time flying with ba i was pl... Positive [✅ , trip, verified, |, first, time, flying, ba...
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative [✅ , trip, verified, |, lots, cancellations, de...
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive [✅ , trip, verified, |, ba, 242, 6/2/23, ., boa...
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive [✅ , trip, verified, |, first, flight, 17, year...
4 so determined to help my husband and myself were flying to madrid on... Positive [✅ , trip, verified, |, husband, flying, madrid...
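One point worth keeping in mind for sentiment work is that general-purpose stopword lists usually include negations; to my understanding NLTK's English list contains words such as "not" and "no". The sketch below shows one way to keep them. It uses a tiny hand-written stopword set as a stand-in for `stopwords.words('english')`, and the `keep` parameter and sample tokens are hypothetical.

```python
# Tiny hypothetical stopword list standing in for stopwords.words('english').
STOPWORDS = {"the", "was", "and", "a", "is", "not", "no", "i", "to", "with"}
NEGATIONS = frozenset({"not", "no", "never", "nor"})

def remove_stopwords(tokens, keep=frozenset()):
    """Drop stopwords, except any that are listed in `keep`."""
    return [t for t in tokens
            if t.lower() not in STOPWORDS or t.lower() in keep]

tokens = ["the", "crew", "was", "not", "helpful"]
print(remove_stopwords(tokens))                  # ['crew', 'helpful']
print(remove_stopwords(tokens, keep=NEGATIONS))  # ['crew', 'not', 'helpful']
```

Without the exception, "not helpful" and "helpful" reduce to the same token list, which removes exactly the signal a sentiment classifier needs.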

7) Stemming
In [12]: '''
Stemming is applied to the 'Tokenized Review' column
in order to reduce words to their basic or root form.
The NLTK library's SnowballStemmer is used to stem English words.
The stemmer function stems each token in the tokenized review,
adding a new column called "Stemmed Review" to the DataFrame.
'''
stemming = SnowballStemmer("english")

def stemmer(tok_review):
    return [stemming.stem(i) for i in tok_review]

newdf['Stemmed Review'] = newdf['Tokenized Review'].apply(stemmer)

newdf.head(5)

Out[12]: Review Header Review Sentiment Tokenized Review Stemmed Review


0 crew were really nice his was my first time flying with ba i was pl... Positive [✅ , trip, verified, |, first, time, flying, ba... [✅ , trip, verifi, |, first, time, fli, ba, &, ...
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative [✅ , trip, verified, |, lots, cancellations, de... [✅ , trip, verifi, |, lot, cancel, delay, one, ...
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive [✅ , trip, verified, |, ba, 242, 6/2/23, ., boa... [✅ , trip, verifi, |, ba, 242, 6/2/23, ., board...
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive [✅ , trip, verified, |, first, flight, 17, year... [✅ , trip, verifi, |, first, flight, 17, year, ...
4 so determined to help my husband and myself were flying to madrid on... Positive [✅ , trip, verified, |, husband, flying, madrid... [✅ , trip, verifi, |, husband, fli, madrid, 3rd...

In [13]: # Stemming distorts words too heavily (e.g. 'flying' -> 'fli'), so drop the stemmed column


newdf = newdf.drop('Stemmed Review', axis=1)
newdf.head(5)
Out[13]: Review Header Review Sentiment Tokenized Review
0 crew were really nice his was my first time flying with ba i was pl... Positive [✅ , trip, verified, |, first, time, flying, ba...
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative [✅ , trip, verified, |, lots, cancellations, de...
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive [✅ , trip, verified, |, ba, 242, 6/2/23, ., boa...
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive [✅ , trip, verified, |, first, flight, 17, year...
4 so determined to help my husband and myself were flying to madrid on... Positive [✅ , trip, verified, |, husband, flying, madrid...

8) Lemmatization
In [14]: """
Instead, we lemmatize our tokenized reviews in order
to reduce words to their dictionary (base) form.
The NLTK library's WordNetLemmatizer is used for lemmatization.
The lemmatizer function lemmatizes each token in the tokenized review,
adding a new column called "Lemmatized Review" to the DataFrame.
"""
lemm = WordNetLemmatizer()

def lemmatizer(tok_review):
    return [lemm.lemmatize(j) for j in tok_review]

newdf['Lemmatized Review'] = newdf['Tokenized Review'].apply(lemmatizer)

newdf.head(5)

Out[14]: Review Header Review Sentiment Tokenized Review Lemmatized Review


0 crew were really nice his was my first time flying with ba i was pl... Positive [✅ , trip, verified, |, first, time, flying, ba... [✅ , trip, verified, |, first, time, flying, ba...
1 lots of cancellations and delays lots of cancellations and delays and no one ap... Negative [✅ , trip, verified, |, lots, cancellations, de... [✅ , trip, verified, |, lot, cancellation, dela...
2 overall very happy with ba ba 242 on the 6223 boarding was delayed due to... Positive [✅ , trip, verified, |, ba, 242, 6/2/23, ., boa... [✅ , trip, verified, |, ba, 242, 6/2/23, ., boa...
3 the best airline ive flown with not only my first flight in 17 years but also ... Positive [✅ , trip, verified, |, first, flight, 17, year... [✅ , trip, verified, |, first, flight, 17, year...
4 so determined to help my husband and myself were flying to madrid on... Positive [✅ , trip, verified, |, husband, flying, madrid... [✅ , trip, verified, |, husband, flying, madrid...

Understand Dataset, Model Weightage (sentiment distribution)


We will next examine the sentiment distribution in the 'Sentiment' column of our DataFrame and find the most prevalent sentiment.
Following that, we will calculate the percentage of each sentiment and visualise the sentiment counts and percentages. This gives useful insight into the sentiment composition of our dataset, allowing us to analyse
and understand the sentiment distribution for further investigation. We will weigh our model performance against this distribution.
In [15]: # Count the occurrences of each sentiment
sentiment_counts = newdf['Sentiment'].value_counts()

# Get the major sentiment


print("")
major_sentiment = sentiment_counts.idxmax()
print("Major Sentiment => ", major_sentiment)
# Calculate percentage
print("")
sentiment_percentages = sentiment_counts / len(newdf) * 100
print("Sentiment Percentages => ")
print(sentiment_percentages)

# Bar plot for sentiment counts


fig_counts = go.Figure(
data=[go.Bar(x=sentiment_counts.index, y=sentiment_counts.values)],
layout=go.Layout(
title="Sentiment Counts",
xaxis=dict(title="Sentiment"),
yaxis=dict(title="Count")
)
)
fig_counts.update_layout(width=800, height=500)
fig_counts.show()

# Bar plot for percentages


fig_percentages = go.Figure(
data=[go.Bar(x=sentiment_percentages.index, y=sentiment_percentages.values)],
layout=go.Layout(
title="Sentiment Percentages",
xaxis=dict(title="Sentiment"),
yaxis=dict(title="Percentage")
)
)
fig_percentages.update_layout(width=800, height=500)
fig_percentages.show()

Major Sentiment => Positive

Sentiment Percentages =>


Positive 79.067089
Negative 20.932911
Name: Sentiment, dtype: float64
[Figure: "Sentiment Counts" bar chart; x-axis: Sentiment (Positive, Negative); y-axis: Count]
[Figure: "Sentiment Percentages" bar chart; x-axis: Sentiment (Positive, Negative); y-axis: Percentage]
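One way to weigh model performance against the sentiment distribution is to compare each model with a majority-class baseline, meaning a classifier that always predicts the most common label. The sketch below computes that baseline from the label counts reported in the dataset summary (2746 Positive out of 3473 rows); the function name is my own.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)

# Label counts from the dataset summary: 2746 Positive of 3473 reviews.
labels = ["Positive"] * 2746 + ["Negative"] * (3473 - 2746)
print(round(majority_baseline_accuracy(labels), 4))  # 0.7907
```

Any model whose accuracy sits close to this figure is adding little beyond predicting "Positive" every time, which is why per-class recall and macro-averaged scores are worth reading alongside accuracy.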

Sentiment Analysis
Method 1: Multinomial Naive Bayes
About:
The Multinomial Naive Bayes model is a common choice for text classification problems because it handles discrete features
such as word frequencies well. It applies Bayes' theorem under the assumption of feature independence to calculate conditional probabilities and classify
sentiment, for example in customer reviews.

Reason for choice:


Because of its simplicity, efficiency, and effectiveness in processing text data, I chose Multinomial Naive Bayes for sentiment analysis on
this dataset of British Airways customer reviews. Because it builds conditional probabilities from word frequencies, the method is
well suited to identifying sentiment in reviews. It is interpretable, which lets us understand how much particular words contribute to the predicted sentiment.
Furthermore, Multinomial Naive Bayes is fairly robust to noise and works with a variety of text pre-processing approaches, making it a dependable choice for extracting
insights from large volumes of customer feedback.
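To make the idea of conditional probabilities built from word frequencies concrete, here is a minimal from-scratch sketch of Multinomial Naive Bayes with Laplace (add-one) smoothing. It is an illustration under simplifying assumptions, not the scikit-learn implementation used in this report; the toy documents and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class word log-likelihoods."""
    vocab = {w for doc in docs for w in doc}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                       for w in vocab}
    return log_prior, log_like, vocab

def predict(doc, log_prior, log_like, vocab):
    """Pick the class with the highest log posterior (words outside vocab are ignored)."""
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in doc if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical toy corpus of tokenized reviews.
docs = [["great", "crew", "friendly"], ["friendly", "service", "great"],
        ["delayed", "flight", "rude"], ["rude", "crew", "delayed"]]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = train_multinomial_nb(docs, labels)
print(predict(["great", "friendly", "flight"], *model))  # Positive
print(predict(["rude", "delayed", "service"], *model))   # Negative
```

Summing log-likelihoods word by word is where the feature-independence assumption enters: each word contributes to the score without regard to its neighbours.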

In [16]: #Vectorize the text data using CountVectorizer


vectorizer = CountVectorizer()
X = vectorizer.fit_transform(newdf['Review'])
y = newdf['Sentiment']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X,
y,
test_size=0.2,
random_state=42)
# Create and train the NB classifier
nbclf = MultinomialNB()
nbclf.fit(X_train, y_train)

y_pred = nbclf.predict(X_test) # Predict the labels

#----------------------------------------- Evaluation -------------------------------------------#

# Calculate accuracy,precision, recall, and F1-score


accuracy = accuracy_score(y_test, y_pred)
score_report = classification_report(y_test, y_pred, zero_division=1, output_dict=True)
metrics_NB = {
"Accuracy": accuracy,
"Precision": score_report["macro avg"]["precision"],
"Recall": score_report["macro avg"]["recall"],
"F1-Score": score_report["macro avg"]["f1-score"]
}

#Print the scores and classification report


print("")
print("Accuracy => ", accuracy.round(2))
print("")
metrics_NB = pd.DataFrame(metrics_NB, index=["Naive Bayes:"])
print(metrics_NB)
print("")
print('Performance Report =>')
print(classification_report(y_test, y_pred, zero_division=1))

# Plotting accuracy, precision, recall, and F1-Score


fig = go.Figure(data=[
go.Bar(x=metrics_NB.columns,
y=metrics_NB.iloc[0],
marker_color='lightskyblue')
])

fig.update_layout(
title="Naive Bayes Performance Metrics",
xaxis_title="Metrics",
yaxis_title="Value",
showlegend=False
)

fig.update_layout(width=800, height=500)
fig.show()

# Compute the confusion matrix


cm = confusion_matrix(y_test, y_pred)

# Create a list of strings for the text annotation


text = [[str(y) for y in x] for x in cm]

# Create a heatmap of the confusion matrix with numbers using Plotly


fig = go.Figure(data=go.Heatmap(z=cm,
colorscale='purpor',
text=text,
hoverinfo='text'))
fig.update_layout(
title="Confusion Matrix - Sentiment Analysis",
xaxis=dict(title="Predicted Labels"),
yaxis=dict(title="True Labels"),
)
fig.update_layout(width=800, height=500)
fig.show()

Accuracy => 0.85

              Accuracy  Precision    Recall  F1-Score
Naive Bayes:  0.847482   0.802097  0.765798  0.780963

Performance Report =>
              precision    recall  f1-score   support

    Negative       0.73      0.61      0.66       170
    Positive       0.88      0.93      0.90       525

    accuracy                           0.85       695
   macro avg       0.80      0.77      0.78       695
weighted avg       0.84      0.85      0.84       695

[Figure: "Naive Bayes Performance Metrics" bar chart; x-axis: Metrics (Accuracy, Precision, Recall, F1-Score); y-axis: Value]
[Figure: "Confusion Matrix - Sentiment Analysis" heatmap; x-axis: Predicted Labels; y-axis: True Labels]

Method 2: Decision Tree classifier


About:
Based on the features in the dataset, the decision tree model constructs a tree-like model of decisions and their possible outcomes. Each
internal node corresponds to a test on a feature, whereas each leaf node corresponds to a class label. The model splits the data recursively on the most
informative features, resulting in a hierarchical structure that is easy to interpret.

Reason for choice:


Due to its interpretability and ability to capture complicated patterns in text data, the Decision Tree classifier is a reasonable candidate for
sentiment analysis. It classifies sentiment by building a tree-like structure from features extracted from the text. The
model's transparency provides insight into the key features and rules that contribute to sentiment classification, while approaches such as pruning and ensemble
methods help improve generalisation and reduce overfitting.
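The notion of a "most informative" feature can be sketched with Gini impurity, which I understand to be the default split criterion of scikit-learn's DecisionTreeClassifier. The example below scores a split on whether a word is present in a review. This is a simplification (the real tree splits on thresholds over numeric feature values), and the toy data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(docs, labels, word):
    """Weighted Gini impurity after splitting on whether `word` appears."""
    left = [y for d, y in zip(docs, labels) if word in d]
    right = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy reviews represented as sets of words.
docs = [{"great", "crew"}, {"great", "food"},
        {"delayed", "crew"}, {"delayed", "rude"}]
labels = ["Positive", "Positive", "Negative", "Negative"]

print(gini(labels))                          # 0.5 before any split
print(split_impurity(docs, labels, "great")) # 0.0, a perfect split
print(split_impurity(docs, labels, "crew"))  # 0.5, no improvement
```

A word like "great" that separates the classes cleanly lowers the impurity to zero, while "crew", which appears in both classes, leaves it unchanged, so the tree would prefer to split on the former.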

In [17]: # Split the data: training set and testing set


X_train, X_test, y_train, y_test = train_test_split(newdf["Review"],
newdf["Sentiment"],
test_size=0.2,
random_state=42)

In [18]: '''
I will use a pipeline to train my Decision Tree classifier.
We will use CountVectorizer and TfidfTransformer to preprocess the text data and make it suitable for the classifier.
After fitting the pipeline to the training data, we will predict the sentiment of the test data and then evaluate performance.
'''
# Employ and train the pipeline for Decision Tree classifier
dt_pipeline = Pipeline([("vect", CountVectorizer()),
("tfidf", TfidfTransformer()),
("clf_decisionTree", DecisionTreeClassifier())])

dt_pipeline.fit(X_train, y_train)

# Predict the sentiment using Decision Tree classifier


decision_tree = dt_pipeline.predict(X_test)

#----------------------------------------- Evaluation -------------------------------------------#

# Calculate evaluation metrics
dt_accuracy = accuracy_score(y_test, decision_tree)
dt_scores = classification_report(y_test,
decision_tree,
zero_division=0,
output_dict=True)
dt_metrics = {
"Accuracy": dt_accuracy,
"Precision": dt_scores["macro avg"]["precision"],
"Recall": dt_scores["macro avg"]["recall"],
"F1-Score": dt_scores["macro avg"]["f1-score"]
}

#Print the scores and report


print("")
print("Accuracy =>", dt_accuracy.round(2))
print("")
dt_metrics = pd.DataFrame(dt_metrics, index=["Decision Tree:"])
print(dt_metrics)
print("")
print('Performance Report =>')
print(classification_report(y_test, decision_tree, zero_division=1))

# Plotting scores
fig2 = go.Figure(data=[
go.Bar(x=dt_metrics.columns,
y=dt_metrics.iloc[0],
marker_color='lightskyblue')
])

fig2.update_layout(
title="Decision Tree Performance Metrics",
xaxis_title="Metrics",
yaxis_title="Value",
showlegend=False
)
fig2.update_layout(width=800, height=500)
fig2.show()

# Compute the confusion matrix and draw heatmap


dtcm = confusion_matrix(y_test, decision_tree)
text = [[str(y) for y in x] for x in dtcm]
fig = go.Figure(data=go.Heatmap(z=dtcm,
colorscale='purpor',
text=text,
hoverinfo='text'))
fig.update_layout(
title="Decision Tree: Confusion Matrix",
xaxis=dict(title="Predicted Labels"),
yaxis=dict(title="True Labels"),
)
fig.update_layout(width=800, height=500)
fig.show()

Accuracy => 0.76

                Accuracy  Precision    Recall  F1-Score
Decision Tree:  0.758273   0.672331  0.670952  0.671631

Performance Report =>
              precision    recall  f1-score   support

    Negative       0.51      0.50      0.50       170
    Positive       0.84      0.84      0.84       525

    accuracy                           0.76       695
   macro avg       0.67      0.67      0.67       695
weighted avg       0.76      0.76      0.76       695

[Figure: "Decision Tree Performance Metrics" bar chart; x-axis: Metrics (Accuracy, Precision, Recall, F1-Score); y-axis: Value]
[Figure: "Decision Tree: Confusion Matrix" heatmap; x-axis: Predicted Labels; y-axis: True Labels]
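The Decision Tree pipeline above passes raw word counts through TfidfTransformer before classification. As a rough sketch of what that re-weighting does, the code below computes smoothed TF-IDF with L2 row normalisation, which to my understanding mirrors the transformer's default settings (`smooth_idf=True`, `norm='l2'`). The toy documents and the function name are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF with L2 row normalisation over tokenized documents."""
    n = len(docs)
    vocab = sorted({w for d in docs for w in d})
    doc_freq = Counter(w for d in docs for w in set(d))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + doc_freq[w])) + 1 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

# Hypothetical tokenized reviews.
docs = [["flight", "delayed", "flight"], ["flight", "great"], ["crew", "great"]]
vocab, rows = tfidf(docs)
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

In the third document, "crew" (seen in one document) receives a higher weight than "great" (seen in two), which is the intended effect: words that are rarer across the corpus count for more.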

Method 3: Logistic Regression classifier


About:
Logistic Regression is a common and simple approach for sentiment analysis, which involves categorising text as positive, negative, or neutral (in this dataset, only positive or negative). It
can predict the sentiment of new text by converting language into numerical features and training a logistic regression model on labelled
data, offering simplicity and interpretability. More sophisticated models, on the other hand, may be preferred for harder sentiment analysis tasks that
require capturing nuanced linguistic patterns and context.

Reason for choice:


I picked logistic regression for my sentiment analysis of British Airways customer reviews because it is a straightforward and easy-to-understand
approach. As a newcomer to NLP, I find that logistic regression lets me understand and explain how the input features relate to sentiment
predictions. Furthermore, logistic regression is computationally efficient, which matters in real-life situations involving a large number of customer
reviews. Given the simplicity of this project's sentiment analysis task, logistic regression provides a viable way of predicting sentiment in my
customer reviews.
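To show how input features link to predictions in logistic regression, the sketch below trains a tiny model with batch gradient descent on word-count features. It is a simplified illustration: it omits the regularisation that scikit-learn's LogisticRegression applies by default, and the two feature words, the data, and the hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Features: counts of the hypothetical words ["great", "delayed"]; label 1 = Positive.
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]

w, b = train_logreg(X, y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("P(Positive | one 'great'):", round(sigmoid(w[0] + b), 3))
print("P(Positive | one 'delayed'):", round(sigmoid(w[1] + b), 3))
```

The sign and size of each learned weight can be read directly: a positive weight for "great" pushes the prediction towards Positive, and a negative weight for "delayed" pushes it towards Negative, which is the interpretability referred to above.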

In [19]: from sklearn.pipeline import Pipeline

# Build and train the pipeline: token counts -> TF-IDF weights -> Logistic Regression
lr_pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf_logistic", LogisticRegression())
])
lr_pipeline.fit(X_train, y_train)

# Predict the sentiment of the test reviews
logistic_regression = lr_pipeline.predict(X_test)

#----------------------------------------- Evaluation -------------------------------------------#
lr_accuracy = accuracy_score(y_test, logistic_regression)
lr_scores = classification_report(y_test, logistic_regression, zero_division=0, output_dict=True)

# Store the accuracy and the macro-averaged metrics in a dictionary
lr_metrics = {
    "Accuracy": lr_accuracy,
    "Precision": lr_scores["macro avg"]["precision"],
    "Recall": lr_scores["macro avg"]["recall"],
    "F1-Score": lr_scores["macro avg"]["f1-score"]
}

# Print the scores and the full report
print("")
print("Accuracy =>", lr_accuracy.round(2))
print("")
lr_metrics = pd.DataFrame(lr_metrics, index=["Logistic Regression:"])
print(lr_metrics)
print("")
print('Performance Report =>')
print(classification_report(y_test, logistic_regression, zero_division=0))

# Plot the scores as a bar chart
fig3 = go.Figure(data=[
    go.Bar(x=lr_metrics.columns,
           y=lr_metrics.iloc[0],
           marker_color='lightskyblue')
])
fig3.update_layout(
    title="Logistic Regression Performance Metrics",
    xaxis_title="Metrics",
    yaxis_title="Value",
    showlegend=False,
    width=800,
    height=500
)
fig3.show()

# Compute the confusion matrix and draw it as a heatmap
lr_cm = confusion_matrix(y_test, logistic_regression)
text = [[str(count) for count in row] for row in lr_cm]
fig = go.Figure(data=go.Heatmap(z=lr_cm,
                                colorscale='purpor',
                                text=text,
                                hoverinfo='text'))
fig.update_layout(
    title="Logistic Regression: Confusion Matrix",
    xaxis=dict(title="Predicted Labels"),
    yaxis=dict(title="True Labels"),
    width=800,
    height=500
)
fig.show()
Accuracy => 0.83

                      Accuracy  Precision    Recall  F1-Score
Logistic Regression:  0.827338   0.828044  0.672913   0.70365

Performance Report =>
              precision    recall  f1-score   support

    Negative       0.83      0.37      0.51       170
    Positive       0.83      0.98      0.90       525

    accuracy                           0.83       695
   macro avg       0.83      0.67      0.70       695
weighted avg       0.83      0.83      0.80       695

[Figure: Logistic Regression Performance Metrics. Bar chart of Value (y-axis) against Metrics (x-axis): Accuracy, Precision, Recall, F1-Score.]

[Figure: Logistic Regression: Confusion Matrix. Heatmap of True Labels (y-axis) against Predicted Labels (x-axis).]

Method 4: Random Forest classifier

About:
This ensemble learning algorithm, consisting of multiple decision trees, can handle both categorical and numerical features,
making it suitable for extracting valuable insights from text data. With its ability to handle high-dimensional feature spaces and mitigate overfitting, the Random
Forest classifier is a promising choice for accurately predicting sentiment labels in British Airways customer reviews.

Reason for choice:
The Random Forest classifier (RandomForestClassifier in the code below) was chosen for my sentiment analysis because of its capacity to manage complicated
feature interactions while limiting overfitting. Because the target is a categorical label (Negative or Positive), the classifier variant is used rather than the
regressor. Random Forest's ensemble of decision trees captures a wide range of language patterns and contextual information, which is critical for
sentiment analysis. Furthermore, compared with an individual decision tree, Random Forest is less prone to overfitting, which helps performance on unseen
data. By using the strengths of the Random Forest classifier, I can efficiently find sentiment trends and patterns in British Airways customer reviews, delivering
useful insights for further research and decision-making.
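The ensemble idea described above can be sketched as a simple majority vote over the predictions of individual trees. The votes below are hypothetical and only illustrate the principle. Note also that this is a simplification: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
from collections import Counter

# Hypothetical label predictions from five trees for three reviews
# (illustrative only, not output of the model in this report).
tree_votes = [
    ["Positive", "Negative", "Positive"],
    ["Positive", "Negative", "Negative"],
    ["Negative", "Negative", "Positive"],
    ["Positive", "Positive", "Positive"],
    ["Positive", "Negative", "Positive"],
]


def majority_vote(votes_per_tree):
    """Return, for each sample, the label predicted by most trees."""
    predictions = []
    for sample_votes in zip(*votes_per_tree):
        most_common_label, _count = Counter(sample_votes).most_common(1)[0]
        predictions.append(most_common_label)
    return predictions


ensemble_prediction = majority_vote(tree_votes)
print(ensemble_prediction)
```

A single tree that misclassifies a review (for example the third tree on the first review) is outvoted by the others, which is the intuition behind the ensemble being less prone to overfitting than one tree.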

In [20]: # Build and train the pipeline: token counts -> TF-IDF weights -> Random Forest classifier
rf_pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf_randomForest", RandomForestClassifier())
])
rf_pipeline.fit(X_train, y_train)

# Predict the sentiment of the test reviews
random_forest = rf_pipeline.predict(X_test)

#----------------------------------------- Evaluation -------------------------------------------#
rf_accuracy = accuracy_score(y_test, random_forest)
rf_scores = classification_report(y_test, random_forest, zero_division=0, output_dict=True)

# Store the accuracy and the macro-averaged metrics in a dictionary
rf_metrics = {
    "Accuracy": rf_accuracy,
    "Precision": rf_scores["macro avg"]["precision"],
    "Recall": rf_scores["macro avg"]["recall"],
    "F1-Score": rf_scores["macro avg"]["f1-score"]
}

# Print the scores and the full report
print("")
print("Accuracy =>", rf_accuracy.round(2))
print("")
rf_metrics = pd.DataFrame(rf_metrics, index=["Random Forest:"])
print(rf_metrics)
print("")
print('Performance Report =>')
print(classification_report(y_test, random_forest, zero_division=0))

# Plot the scores as a bar chart
fig4 = go.Figure(data=[
    go.Bar(x=rf_metrics.columns,
           y=rf_metrics.iloc[0],
           marker_color='lightskyblue')
])
fig4.update_layout(
    title="Random Forest Performance Metrics",
    xaxis_title="Metrics",
    yaxis_title="Value",
    showlegend=False,
    width=800,
    height=500
)
fig4.show()

# Compute the confusion matrix and draw it as a heatmap
rf_cm = confusion_matrix(y_test, random_forest)
text = [[str(count) for count in row] for row in rf_cm]
fig = go.Figure(data=go.Heatmap(z=rf_cm,
                                colorscale='purpor',
                                text=text,
                                hoverinfo='text'))
fig.update_layout(
    title="Random Forest: Confusion Matrix",
    xaxis=dict(title="Predicted Labels"),
    yaxis=dict(title="True Labels"),
    width=800,
    height=500
)
fig.show()
Accuracy => 0.79

                Accuracy  Precision    Recall  F1-Score
Random Forest:  0.789928   0.842481  0.576555  0.573655

Performance Report =>
              precision    recall  f1-score   support

    Negative       0.90      0.16      0.27       170
    Positive       0.78      0.99      0.88       525

    accuracy                           0.79       695
   macro avg       0.84      0.58      0.57       695
weighted avg       0.81      0.79      0.73       695

[Figure: Random Forest Performance Metrics. Bar chart of Value (y-axis) against Metrics (x-axis): Accuracy, Precision, Recall, F1-Score.]

[Figure: Random Forest: Confusion Matrix. Heatmap of True Labels (y-axis) against Predicted Labels (x-axis).]

Model Comparison: Observations


Nett Score Evaluation
In [21]: # Give each model's metrics a one-row DataFrame labelled with the model name.
# rf_metrics and lr_metrics are already one-row DataFrames from the cells above,
# so re-wrapping them with the same index label leaves them unchanged.
metrics_NB = pd.DataFrame(metrics_NB, index=["Naive Bayes:"])
rf_metrics = pd.DataFrame(rf_metrics, index=["Random Forest:"])
lr_metrics = pd.DataFrame(lr_metrics, index=["Logistic Regression:"])
dt_metrics = pd.DataFrame(dt_metrics, index=["Decision Tree:"])

# Concatenate the four one-row DataFrames into a single comparison table
combined_metrics = pd.concat([metrics_NB, rf_metrics, lr_metrics, dt_metrics])
combined_metrics = combined_metrics.round(2)
combined_metrics
Out[21]:
                      Accuracy  Precision  Recall  F1-Score
Naive Bayes:              0.85       0.80    0.77      0.78
Random Forest:            0.79       0.84    0.58      0.57
Logistic Regression:      0.83       0.83    0.67      0.70
Decision Tree:            0.76       0.67    0.67      0.67

Metrics Score Comparison (Visualization)


In [22]: # Reset the index so the classifier names become a regular column
combined_metrics.reset_index(inplace=True)

# Rename that column from "index" to "Classifier"
combined_metrics.rename(columns={"index": "Classifier"}, inplace=True)

# Melt to long format: one row per (Classifier, Metric) pair
combined_metrics_melted = pd.melt(combined_metrics,
                                  id_vars="Classifier",
                                  var_name="Metric",
                                  value_name="Value")

# Grouped bar chart: one group per classifier, one bar per metric
fig = px.bar(combined_metrics_melted,
             x="Classifier",
             y="Value",
             color="Metric",
             barmode="group")

fig.update_layout(title="Comparison of Metrics",
                  xaxis_title="Classifier",
                  yaxis_title="Value",
                  width=800,
                  height=500)
fig.show()
[Figure: Comparison of Metrics. Grouped bar chart of Value (y-axis) by Classifier (x-axis: Naive Bayes, Random Forest, Logistic Regression, Decision Tree), with one bar per Metric (Accuracy, Precision, Recall, F1-Score).]

In [23]: fig = go.Figure()

# Add one line per metric column (the first column holds the classifier names)
for column in combined_metrics.columns[1:]:
    fig.add_trace(go.Scatter(x=combined_metrics['Classifier'],
                             y=combined_metrics[column],
                             mode='lines',
                             name=column))

fig.update_layout(
    title='Comparison of Metrics',
    xaxis_title='Classifier',
    yaxis_title='Value',
    width=800, height=500
)
[Figure: Comparison of Metrics. Line chart of Value (y-axis) by Classifier (x-axis: Naive Bayes, Random Forest, Logistic Regression, Decision Tree), with one line per metric (Accuracy, Precision, Recall, F1-Score).]

Reflective Evaluation & Conclusion


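From the observations above, Naive Bayes is the best-performing model on accuracy, recall, and F1-score; only its precision (0.80) is lower than that of Random Forest (0.84) and Logistic Regression (0.83). The precision, recall, and F1-score figures are macro averages, i.e. the unweighted mean of the Negative and Positive class scores.
Accuracy Score => 0.85 => the model correctly classified 85% of the test reviews.
Precision => 0.80 => averaged over both classes, 80% of the model's predictions for a class are correct.
Recall => 0.77 => averaged over both classes, the model recovers 77% of the reviews that truly belong to a class.
F1-Score => 0.78 => mean of the per-class F1-scores, each of which is the harmonic mean of precision and recall. This score is a fair representation of the model's overall performance.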
Performance Report =>
              Accuracy  Precision    Recall  F1-Score
Naive Bayes:  0.847482   0.802097  0.765798  0.780963

              precision    recall  f1-score   support

    Negative       0.73      0.61      0.66       170
    Positive       0.88      0.93      0.90       525

    accuracy                           0.85       695
   macro avg       0.80      0.77      0.78       695
weighted avg       0.84      0.85      0.84       695

Negative => the model detects negative sentiment fairly well (precision 0.73, recall 0.61), but there is room for improvement.
Positive => the model has excellent precision, recall, and F1-score (0.88, 0.93, 0.90), showing strong performance in recognising positive cases.
Overall, this model performs well, with strong accuracy, precision, recall, and F1-score. It outperforms the other models evaluated in this study on accuracy, recall, and F1-score, demonstrating its effectiveness in sentiment classification.
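The reports above show lower recall for the Negative class (170 test reviews) than for the Positive class (525) in every model whose report is shown. The sketch below puts the accuracy figures in context by computing the accuracy of always predicting the majority class, together with the "balanced" class-weight heuristic, n_samples / (n_classes * class_count). The class counts come from the test-set support column; the training split's distribution is not shown in this report, so treating it as similar is an assumption.

```python
# Class supports from the test-set reports above. Assumption: the training
# split has a similar class distribution (not shown in the report).
support = {"Negative": 170, "Positive": 525}
total = sum(support.values())

# Accuracy of a trivial classifier that always predicts the majority class.
majority_label = max(support, key=support.get)
baseline_accuracy = support[majority_label] / total

# "Balanced" weighting heuristic: n_samples / (n_classes * class_count).
balanced_weights = {
    label: total / (len(support) * count) for label, count in support.items()
}

print(f"Majority baseline ({majority_label}): {baseline_accuracy:.3f}")
for label, weight in balanced_weights.items():
    print(f"weight[{label}] = {weight:.3f}")
```

The baseline comes out at about 0.76, so accuracy alone flatters models that mostly predict Positive. One option that could be tried, and that was not part of this study, is passing class_weight="balanced" to LogisticRegression or RandomForestClassifier so that errors on Negative reviews count for more during training.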
Contributions to the selected domain-specific area & potential scope of project (transferability)
The transferability of airline review sentiment analysis extends beyond industry-specific applications to many geographical locations. The approaches and models used here could be adapted to analyse
reviews of other airlines or from other places throughout the world. This portability enables not only British Airways but also other organisations to gain worldwide insights into customer sentiment and preferences, allowing them to
tailor their products and strategies to individual regions. Furthermore, the portability of sentiment analysis models can encourage partnerships and information exchange across sectors and geographies, leading to
advances in sentiment analysis methodologies and the development of more robust and accurate models.
The tools and methodology created for analysing sentiment in airline reviews may be extended and applied to a wide range of areas, including hospitality, e-commerce, healthcare, and others. In the hospitality industry,
sentiment analysis may assist hotels and resorts in gauging guest satisfaction, identifying areas for improvement, and tailoring services to meet visitor expectations. In e-commerce, analysing sentiment in customer reviews may give
significant insights into product quality, consumer preferences, and brand perception, assisting in product development and marketing strategies. Similarly, in healthcare, sentiment analysis
may help medical institutions evaluate patient feedback, improve patient care, and increase overall satisfaction. Because sentiment analysis methodologies are versatile, firms in these various areas may leverage
consumer sentiment to make data-driven decisions that drive success and improve customer experiences.
Conclusion, Personal Reflection- what I think I could have done better?
Working on this project has been a really delightful experience for me, and it has given me vital insights into how relatively little data can have an enormous impact when combined with simple NLP approaches. It
seemed like a marriage between two giants: the might of data and the capabilities of NLP. This project has certainly opened my eyes to the limitless potential in this subject and undoubtedly piqued my curiosity.
While I am pleased with the progress made, I recognise that certain areas of the project could have been handled more efficiently. I faced particular difficulties with dataset pre-processing, which I recognise as
a critical aspect affecting the project's outcomes, but I gave it my best effort nonetheless. I'm excited to return and improve on this following my midterm submissions. I should also look at other options for
models rather than only the standard popular ones. There may be better-performing models for this case study, and it is only right that I explore them.
The opportunity to further pursue my interest in the field of NLP was made possible by the professors' decision to give us this coursework, and I would like to thank them for that. Even more so, I'm interested to see what
the second half of the module has in store for me.
References
1. Divyansh (2020) “Airline Review Data Preprocessing - pt. 2 (NLP),” Kaggle [Preprint]. Available at: https://www.kaggle.com/code/divyansh22/airline-review-data-preprocessing-pt-2-nlp.
2. Artefact (2022b) Using NLP to extract insights from your customers’ reviews. Available at: https://www.artefact.com/blog/using-nlp-to-extract-quick-and-valuable-insights-from-your-customers-reviews/.
3. Bernardes, V. (2023) “How to analyze customer reviews with NLP: a case study,” Blog | Imaginary Cloud [Preprint]. Available at: https://www.imaginarycloud.com/blog/how-to-analyze-customer-reviews-with-nlp-case-study/.
4. Black_Raven (2021) “Using NLP machine learning models to analyse product reviews,” Medium, 12 December. Available at: https://medium.com/analytics-vidhya/analysing-product-reviews-using-nlp-machine-learning-models-29f2819a72b.
5. Bassig, M. (2022) “Take Action on Online Reviews with NLP,” ReviewTrackers, 30 June. Available at: https://www.reviewtrackers.com/blog/nlp-reviews/.
6. Bhatt, T. (2021) “Restaurant Review Analysis using NLP,” International Journal for Research in Applied Science and Engineering Technology, 9(VII), pp. 1099–1104. Available at: https://doi.org/10.22214/ijraset.2021.36540.
