Post Graduate Program in Management
Batch: PGPM 2023-25
Name of the course: Business Forecasting
Term: IV
Submitted by
Group No: 03
Members:
NAME ROLL NO.
SHRADDHA PANGAM 2301131
TAPASWEE GANDETI 2301140
DEVARALA GOWTHAM TEJA 2301152
KOPPOLU SIDHARDHA 2301153
CHIRADIP LAHIRI 2301258
DEBASHRUTI BHATTACHARJEE 2301259
Introduction:
This report analyses the provided airline dataset to understand the factors that influence customer
satisfaction. The dataset contains the following variables: satisfaction level, gender, customer type, age,
type of travel, class, flight distance, seat comfort, food and drink, gate location, in-flight Wi-Fi service,
in-flight entertainment, online support, ease of online booking, on-board service, legroom service, baggage
handling, check-in service, cleanliness, online boarding, departure delay, and arrival delay.
Descriptive statistics outputs, Insights
Correlation Matrix:
The correlation matrix shows the relationships between various factors related to airline experiences. Some key
insights include:
• Strong positive correlations: Flight distance and departure/arrival time convenience have strong positive
correlations, suggesting that longer flights are often associated with more convenient departure/arrival times.
• Moderate positive correlations: Seat comfort, food and drink, gate location, and inflight entertainment
have moderate positive correlations with overall satisfaction.
• Negative correlations: Departure/arrival delays have strong negative correlations with customer
satisfaction, indicating that delays are a major factor in overall experience.
• Weak correlations: Age and online booking have weak correlations with other factors, suggesting that these
factors may not have a significant impact on overall satisfaction.
Descriptive Analysis
Distance vs satisfaction level
Interpretation: Dissatisfied passengers tend to have shorter flight distances than satisfied passengers. This
suggests that longer flights may contribute to higher satisfaction levels. Airlines can use this information to
improve services, enhance the flying experience, and boost overall passenger satisfaction.
Gender-wise analysis:
Interpretation: The counts of female and male passengers are the same, yet male satisfaction levels are not
higher than female satisfaction levels. This could be attributed to several factors, chiefly individual
preferences and expectations: men and women may prioritize different aspects of their flying experience, such
as legroom, food quality, or service, leading to varying satisfaction levels.
Age Wise Satisfaction level
Interpretation: The majority of dissatisfied passengers are younger (aged 45 and below), a group that tends to
focus on entertainment and Wi-Fi services.
Type of travel vs satisfaction analysis
Interpretation: Both types of travel (Business and Personal) have a higher count of dissatisfied individuals
than satisfied individuals. Businesses and travel providers should improve customer experiences to enhance
overall satisfaction.
Class-wise analysis
Interpretation: The Eco class has a slightly lower passenger count, but its passengers are predominantly
dissatisfied, with very few satisfied ones. Improvement efforts should therefore focus on Economy class.
Customer Type wise analysis:
Interpretation: The Disloyal Customer bar shows a higher proportion of dissatisfaction compared to
satisfaction. In contrast, the Loyal Customer bar has significantly more satisfied individuals than dissatisfied
ones. Focus should be on the loyal customers who are dissatisfied.
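The group-wise comparisons above (by gender, travel type, class, and customer type) rest on counts and shares of satisfied versus dissatisfied passengers within each group. A minimal sketch of such a cross-tabulation is given below. The toy data frame and its category labels are made up for illustration; only the column names Customer.Type and satisfaction are taken from the report's R code.

```r
# Toy data standing in for the airline dataset (values are made up).
toy <- data.frame(
  Customer.Type = c("Loyal Customer", "Loyal Customer", "Loyal Customer",
                    "disloyal Customer", "disloyal Customer", "Loyal Customer"),
  satisfaction  = c("satisfied", "satisfied", "dissatisfied",
                    "dissatisfied", "dissatisfied", "satisfied")
)

# Counts of satisfied / dissatisfied passengers within each customer type.
counts <- table(toy$Customer.Type, toy$satisfaction)

# Row-wise proportions: each row sums to 1, so groups of different
# sizes can be compared directly.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Comparing row-wise shares rather than raw counts avoids mistaking a large group for a dissatisfied one.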
Analysis of Amenities:
• Inflight Wi-Fi service: Passengers with higher Wi-Fi service ratings (especially rating 4) are more
satisfied. Improving inflight Wi-Fi services can enhance overall satisfaction.
• Inflight entertainment: Satisfied passengers tend to give higher inflight entertainment ratings (closer to
4). Those who are dissatisfied have lower ratings (around 3). Outliers above the upper whisker indicate
extreme ratings.
• Legroom service: Satisfied passengers tend to give higher legroom service ratings (closer to 4). Those who
are dissatisfied have lower ratings (around 3).
• Cleanliness: The count of dissatisfied responses decreases as cleanliness ratings improve. Most
dissatisfied passengers gave ratings between 2 and 4. Regular inspections, thorough cleaning protocols, and
attention to detail can make a significant difference.
• Food and Drinks: The number of dissatisfied responses decreases as food and drink ratings improve. Most
dissatisfied passengers gave ratings between 2 and 4. Introducing new food and beverages could help.
• Seat comfort: The majority are dissatisfied. Airlines should improve seat design, cushioning, legroom, and
overall comfort to enhance passenger satisfaction, and regularly collect feedback from passengers to identify
specific areas for improvement.
Digital services:
• Online Support: A significant number of customers are dissatisfied, especially at lower ratings. The
highest satisfaction is observed at a rating of 4, while the lowest is at 0. Improve online support to increase
satisfaction level.
• Ease of online Booking: Most customers are satisfied with the ease of online booking, and higher ratings
receive more satisfied responses. As the ease of online booking rating increases, the number of dissatisfied
customers decreases, and the number of satisfied customers increases.
• Online Boarding: Most customers are satisfied with the online boarding process, and higher ratings receive
more satisfied responses. As the online boarding rating increases, the number of dissatisfied customers
decreases, and the number of satisfied customers increases.
Various Operations analysis:
• On-board service: A significant number of customers are dissatisfied, especially at lower ratings. The
highest satisfaction is observed at a rating of 4, while the lowest is at 0. Improve on-board service to
increase satisfaction level.
• Ease of online Booking: Most customers are satisfied with the ease of online booking, and higher ratings
receive more satisfied responses. As the ease of online booking rating increases, the number of dissatisfied
customers decreases, and the number of satisfied customers increases.
• Online Boarding: Most customers are satisfied with the online boarding process, and higher ratings receive
more satisfied responses. As the online boarding rating increases, the number of dissatisfied customers
decreases, and the number of satisfied customers increases.
Logistic regression model(s) description and outputs
Logistic regression models were constructed to predict customer satisfaction from predictors such as flight
distance, seat comfort, online boarding, and on-board service. The models classify customers as either
"satisfied" or "dissatisfied" based on their experiences during the flight.
a) Satisfaction measured according to Flight distance and seat comfort
R Code:
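library(caret)   # createDataPartition(), confusionMatrix()
library(dplyr)
# Load the airline dataset
data <- read.csv('C:/Users/user/Downloads/Airline-data_1.csv')
# Encode the target as a binary factor: 1 = satisfied, 0 = dissatisfied
data$satisfaction <- ifelse(data$satisfaction == 'satisfied', 1, 0)
data$satisfaction <- as.factor(data$satisfaction)
# Convert the categorical columns to factors
data$Gender <- as.factor(data$Gender)
data$Customer.Type <- as.factor(data$Customer.Type)
data$Type.of.Travel <- as.factor(data$Type.of.Travel)
data$Class <- as.factor(data$Class)
# Stratified 70/30 train/test split
set.seed(123)
trainIndex <- createDataPartition(data$satisfaction, p = 0.7, list = FALSE)
train_data <- data[trainIndex, ]
test_data <- data[-trainIndex, ]
# Fit the logistic regression on the training set
model <- glm(satisfaction ~ Flight.Distance + Seat.comfort, data = train_data, family = binomial)
summary(model)
# Predict probabilities on the test set and classify with a 0.5 cutoff
predictions <- predict(model, test_data, type = "response")
predictions <- ifelse(predictions > 0.5, 1, 0)
# Compare predicted and actual classes (same factor levels on both sides)
confusionMatrix(factor(predictions, levels = levels(test_data$satisfaction)), test_data$satisfaction)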
Output:
b) Satisfaction measured according to Online boarding and On board service:
R Code:
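library(caret)   # createDataPartition(), confusionMatrix()
library(dplyr)
# Load the airline dataset
data <- read.csv('C:/Users/user/Downloads/Airline-data_1.csv')
# Encode the target as a binary factor: 1 = satisfied, 0 = dissatisfied
data$satisfaction <- ifelse(data$satisfaction == 'satisfied', 1, 0)
data$satisfaction <- as.factor(data$satisfaction)
# Convert the categorical columns to factors
data$Gender <- as.factor(data$Gender)
data$Customer.Type <- as.factor(data$Customer.Type)
data$Type.of.Travel <- as.factor(data$Type.of.Travel)
data$Class <- as.factor(data$Class)
# Stratified 70/30 train/test split
set.seed(123)
trainIndex <- createDataPartition(data$satisfaction, p = 0.7, list = FALSE)
train_data <- data[trainIndex, ]
test_data <- data[-trainIndex, ]
# Fit the logistic regression on the training set
# (uses satisfaction and train_data, the objects actually defined above)
model <- glm(satisfaction ~ Online.boarding + On.board.service, data = train_data, family = binomial)
summary(model)
# Predict probabilities on the test set and classify with a 0.5 cutoff
predictions <- predict(model, test_data, type = "response")
predictions <- ifelse(predictions > 0.5, 1, 0)
# Compare predicted and actual classes (same factor levels on both sides)
confusionMatrix(factor(predictions, levels = levels(test_data$satisfaction)), test_data$satisfaction)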
Output:
Discussion of the logistic regression model(s)
The performance of each model is summarized by its precision, recall and F1 score.
a)Satisfaction measured according to Flight distance and seat comfort
Precision: 0.7202
Recall: 0.7778
F1-score: 0.7479
The fitted model indicates that the higher the seat comfort rating, the more likely customers are to be
satisfied. The effect of flight distance is statistically significant, but it has only a small negative impact on
satisfaction. This suggests that longer flights may lead to a slight decrease in satisfaction, possibly due to
fatigue, discomfort, or other factors related to longer travel times.
b)Satisfaction measured according to Online boarding and On board service
Precision: 0.7226
Recall: 0.7768
F1-score: 0.7486
Of the customers the model predicted as satisfied, 72.26% were actually satisfied (precision), so about 28% of
the predicted-satisfied customers are false positives. The recall of 77.68% means that the model captures most
of the customers who are actually satisfied, suggesting that on-board service and online boarding are strong
predictors of satisfaction. The F1 score, the harmonic mean of precision and recall, captures the overall
balance of the model.
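The three scores discussed above follow directly from confusion-matrix counts. The sketch below shows the arithmetic for the "satisfied" class. The helper name classification_metrics and the counts passed to it are hypothetical and are not taken from the report's outputs.

```r
# Precision, recall and F1 for the "satisfied" class from confusion-matrix counts.
# tp: satisfied customers predicted as satisfied
# fp: dissatisfied customers predicted as satisfied
# fn: satisfied customers predicted as dissatisfied
classification_metrics <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

# Hypothetical counts, chosen only to illustrate the arithmetic.
m <- classification_metrics(tp = 80, fp = 20, fn = 40)
print(round(m, 4))

# F1 recomputed from the precision and recall reported for model (a).
f1_a <- 2 * 0.7202 * 0.7778 / (0.7202 + 0.7778)
print(round(f1_a, 4))
```

Applying the F1 formula to the precision and recall reported for model (a), 0.7202 and 0.7778, gives 0.7479, which agrees with the reported F1-score.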
Discussion of the impact of cutoff values on the confusion matrix outputs
Logistic regression was used to predict customer satisfaction based on Flight Distance. In logistic regression,
the cutoff value determines how predicted probabilities are converted to binary classifications. The default
cutoff is 0.5: probabilities above 0.5 are classified as 1 (satisfied) and the rest as 0 (dissatisfied). In this
discussion, sensitivity and specificity are defined with class 1 (satisfied) as the positive class, whereas the
caret outputs in this report treat class 0 as the positive class.
Lowering the cutoff to 0.4 makes the model more likely to predict an instance as satisfied (class 1). This
increases sensitivity, because more satisfied customers are correctly identified, but at the cost of more false
positives: more dissatisfied customers are incorrectly predicted as satisfied, so specificity decreases. Raising
the cutoff to 0.6 has the opposite effect. The model detects dissatisfied customers better, increasing
specificity, but it misclassifies more satisfied customers as dissatisfied, decreasing sensitivity.
In a business context, if a company prioritizes correctly identifying satisfied customers, for example for
targeted promotions or retention, lowering the cutoff value is appropriate. If identifying dissatisfied
customers is more important, so that the company can understand their pain points and serve them better, then
raising the cutoff might help, even at the risk of more false negatives (satisfied customers misclassified as
dissatisfied).
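The trade-off described above can be illustrated with a small simulation. This is a sketch on simulated data, not on the airline dataset: the variable seat_comfort, the coefficients, and the helper rates_at are assumptions made for the example. Sensitivity and specificity are computed with class 1 (satisfied) as the positive class.

```r
set.seed(123)

# Simulated stand-in for the airline data (not the real dataset):
# one rating-like predictor that raises the odds of satisfaction.
n <- 2000
seat_comfort <- sample(0:5, n, replace = TRUE)
p_true <- plogis(-2 + 0.8 * seat_comfort)
satisfaction <- rbinom(n, size = 1, prob = p_true)
sim <- data.frame(satisfaction, seat_comfort)

model <- glm(satisfaction ~ seat_comfort, data = sim, family = binomial)
prob <- predict(model, type = "response")

# Sensitivity and specificity with class 1 (satisfied) as the positive class.
rates_at <- function(cutoff) {
  pred <- ifelse(prob > cutoff, 1, 0)
  c(cutoff      = cutoff,
    sensitivity = mean(pred[sim$satisfaction == 1] == 1),
    specificity = mean(pred[sim$satisfaction == 0] == 0))
}

# One row per cutoff value.
results <- t(sapply(c(0.4, 0.5, 0.6), rates_at))
print(round(results, 3))
```

Because a higher cutoff can only move predictions from class 1 to class 0, sensitivity cannot rise and specificity cannot fall as the cutoff increases, whatever the data.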
Insights
Using One Predictor:
● The dependent variable is ‘satisfaction’ and the independent variable is ‘Flight Distance’.
● The coefficient of Flight.Distance is negative: longer flight distances are associated with a slight
decrease in the likelihood of passenger satisfaction.
● The p-values show that both the intercept and the Flight.Distance coefficient are statistically significant.
● The relatively small change in deviance suggests that while Flight.Distance is statistically significant,
it might not be the only or most important factor influencing satisfaction. Other variables should be
considered to improve the model.
Using Two Predictors:
● The dependent variable is ‘satisfaction’ and the independent variables are ‘Flight Distance’ and ‘Seat
Comfort’.
● Including Seat Comfort as an additional predictor improves the model’s performance and helps it
discriminate better between satisfied and dissatisfied customers.
● The coefficient of seat comfort is positive and that of flight distance is negative, i.e. longer flight
distances are associated with a slight decrease in satisfaction, while seat comfort has a significant
positive impact on passenger satisfaction.
● Adding seat comfort to the model significantly improves its fit, as indicated by the reduction in
deviance and AIC.
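The deviance and AIC comparison mentioned above can be sketched as follows. The data are simulated, not the airline dataset, and the variable names and coefficients are assumptions chosen so that seat comfort has a strong positive effect and flight distance a weak negative one.

```r
set.seed(123)

# Simulated stand-in data (not the real dataset).
n <- 2000
flight_distance <- round(runif(n, 100, 5000))
seat_comfort <- sample(0:5, n, replace = TRUE)
p_true <- plogis(-1.5 - 0.0001 * flight_distance + 0.7 * seat_comfort)
satisfaction <- rbinom(n, size = 1, prob = p_true)
sim <- data.frame(satisfaction, flight_distance, seat_comfort)

# Nested models: one predictor, then two predictors.
m1 <- glm(satisfaction ~ flight_distance, data = sim, family = binomial)
m2 <- glm(satisfaction ~ flight_distance + seat_comfort, data = sim, family = binomial)

# Lower residual deviance and lower AIC both point to a better fit.
comparison <- data.frame(
  model    = c("one predictor", "two predictors"),
  deviance = c(deviance(m1), deviance(m2)),
  AIC      = c(AIC(m1), AIC(m2))
)
print(comparison)

# Likelihood-ratio test for the added predictor (valid for nested models).
print(anova(m1, m2, test = "Chisq"))
```

AIC penalizes the extra parameter, so a lower AIC for the two-predictor model indicates that the improvement in fit outweighs the added complexity.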
Confusion matrices (rows: Prediction, columns: Reference)

One predictor:
            Reference
Prediction      0      1
         0    138    268
         1  11583  13908

Two predictors:
            Reference
Prediction      0      1
         0   5929   4799
         1   5792   9377

Online Boarding and On Board Service:
            Reference
Prediction      0      1
         0  11265   4766
         1   6332  16483

Statistics

Statistic                 One predictor      Two predictors     Online Boarding and On Board Service
Accuracy                  0.5424             0.591              0.7143
95% CI                    (0.5363, 0.5485)   (0.585, 0.597)     (0.7098, 0.7188)
No Information Rate       0.5474             0.5474             0.547
P-Value [Acc > NIR]       0.9483             < 2.2e-16          < 2.2e-16
Kappa                     -0.0078            0.1686             0.4191
Mcnemar's Test P-Value    <2e-16             < 2.2e-16          < 2.2e-16
Sensitivity               0.011774           0.5058             0.6402
Specificity               0.981095           0.6615             0.7757
Pos Pred Value            0.339901           0.5527             0.7027
Neg Pred Value            0.545604           0.6182             0.7225
Prevalence                0.452601           0.4526             0.4530
Detection Rate            0.005329           0.2289             0.2900
Detection Prevalence      0.015677           0.4143             0.4127
Balanced Accuracy         0.496434           0.5837             0.7079
'Positive' Class          0                  0                  0
Recommendations
• Improve seat comfort (cushioning, legroom, adjustable seats, etc.) across all classes, particularly in
Economy class, where dissatisfaction is higher.
• Make the online boarding process more user-friendly and efficient. Streamline check-in procedures.
• Provide reliable Wi-Fi and a broad range of entertainment options.
• Reduce delays by improving operational efficiency, optimizing scheduling, and communicating proactively
with passengers.
• Strengthen loyalty programs to retain and reward loyal customers.
R Code :
# Install the required packages (only needed once)
install.packages(c("readr","dplyr","stringr", "lubridate","tidyr", "readxl", "corrplot", "openxlsx"))
library(readr)
library(readxl)
library(dplyr)
library(corrplot)
library(openxlsx)
# Read the cleaned dataset from the path stored in file_path
file_path <- "C:/Users/devar/OneDrive/Documents/BSF Airline Project/cleaned_data.csv"
airline_data <- read.csv(file_path)
head(airline_data)
# Descriptive statistics for every column
summary_stats <- summary(airline_data)
print(summary_stats)
summary_df <- as.data.frame(summary_stats)
# Save the summary to a separate file so the input data is not overwritten
# (the file name summary_stats.csv is an assumption)
write_csv(summary_df, file.path(dirname(file_path), "summary_stats.csv"))
# Correlation matrix of the numeric columns
numeric_columns <- airline_data %>%
  dplyr::select(where(is.numeric))
correlation_matrix <- cor(numeric_columns, use = "complete.obs")
print(correlation_matrix)
# Plot the correlation matrix with a title
corrplot(correlation_matrix,
         method = "circle",
         tl.cex = 0.7,
         cl.cex = 0.7,
         tl.col = "black",
         cl.ratio = 0.2,
         title = "Correlation Matrix of Airline Data", # Title of the plot
         cex.main = 1.2,
         mar = c(0, 0, 5, 0),
         pch.cex = 0.8)