Ryan Airlines - Plag Check

**Title:** Exploratory Data Analysis and Logistic Regression for Customer Satisfaction in the Airline Industry

**Introduction:**

In the fiercely competitive airline industry, understanding what factors drive customer satisfaction is
paramount for airlines to stay ahead in the market. Customer satisfaction not only impacts passenger
loyalty but can significantly influence a company's reputation and bottom line. To gain insights into
this critical aspect of the airline business, we embark on a data analysis project using a dataset
named "Ryanair_data."

The primary objective of this project is to explore the factors that influence customer satisfaction
with a focus on Ryanair, a well-known low-cost airline. To achieve this, the project employs a
combination of exploratory data analysis (EDA) and logistic regression techniques.

**Exploratory Data Analysis (EDA):**

The EDA phase of the project aims to get a comprehensive understanding of the dataset. Key aspects
of the EDA include:

1. **Variable Distribution:**

The exploratory data analysis (EDA) section of the code provides an overview of the dataset and
the distribution of variables. Here are the key observations related to variable distribution:

- The dataset contains 103,904 rows and 25 columns.

- Data preprocessing is performed to handle missing values. There are 310 null values in
"Arrival.Delay.in.Minutes" and 1 null value in "Flight.Distance." Since there is no reliable way to fill
them and the number of rows with missing values is small compared to the total dataset, these 311
rows are dropped.

- The code explores five character columns to check for issues where 'nan' values might be present
as actual text instead of being treated as missing values. Fortunately, this problem is not present in
this dataset.

- The code also checks for duplication in the 'id' column, ensuring that the same information is not
repeated for several rows. It confirms that there is no duplication in the 'id' column.
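
The preprocessing checks described above can be sketched in a few lines. This is a minimal illustration in Python with pandas, not the project's own code: the tiny DataFrame is invented, the column names are taken from the report, and "Gender" stands in for the five character columns, which the report does not name.

```python
import pandas as pd

# Invented toy rows; only the column names come from the report.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Flight.Distance": [460.0, 235.0, None, 1142.0, 562.0],
    "Arrival.Delay.in.Minutes": [18.0, None, 0.0, 9.0, 0.0],
})

# Count missing values per column.
null_counts = df.isna().sum()

# Drop the rows that have a missing value in either column.
cleaned = df.dropna(subset=["Arrival.Delay.in.Minutes", "Flight.Distance"])

# Look for the literal text "nan" in character columns.
# The column list is an assumption for this sketch.
text_cols = ["Gender"]
nan_as_text = {
    col: int(cleaned[col].dropna().astype(str).str.strip().str.lower().eq("nan").sum())
    for col in text_cols
}

# Count repeated values in the 'id' column.
duplicate_ids = int(cleaned["id"].duplicated().sum())

print(null_counts)
print(len(cleaned), nan_as_text, duplicate_ids)
```

On the real dataset, the same calls would be expected to report the 310 and 1 null values mentioned above, leave 103,904 - 311 = 103,593 rows after dropping, and return zero for both the text check and the duplicate check.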

2. **Correlation Analysis:**

The code proceeds to perform correlation analysis between variables using a heatmap. Before
constructing the heatmap, the character columns are converted into binary or ordinal variables, and
unnecessary columns are dropped. Here are the key findings from the correlation analysis:

- The correlation heatmap displays the relationships between different variables. It highlights
variables that are highly correlated with each other, such as "Cleanliness" with "Food and Drink,"
"Cleanliness" with "Seat Comfort," and so on.

- The target variable, "Satisfaction," is assessed for its correlation with other variables. It is found
that there is almost no linear correlation between satisfaction and "Gate Location" and "Gender."

- Variables such as "Online Boarding," "Inflight Wi-Fi Service," and "Type of Travel" show a fair
correlation with satisfaction, indicating that these factors might impact satisfaction positively.

- Surprisingly, variables like "Time Convenient" and "Food & Drink" are negatively correlated with
satisfaction, meaning that higher ratings for these aspects tend to accompany lower overall
satisfaction in this dataset. A correlation alone does not show that one causes the other.
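
The encoding and correlation steps above can be sketched as follows. This is a hedged illustration rather than the project's code: the rows are invented, and the category labels, the numeric codes, and the column spellings ("Online.Boarding," "Gate.Location," "satisfaction") are assumptions chosen for the example.

```python
import pandas as pd

# Invented toy rows; labels and column spellings are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Class": ["Eco", "Business", "Eco Plus", "Business", "Eco", "Business"],
    "Online.Boarding": [2, 5, 3, 4, 1, 5],
    "Gate.Location": [3, 3, 2, 4, 3, 2],
    "satisfaction": [
        "neutral or dissatisfied", "satisfied", "neutral or dissatisfied",
        "satisfied", "neutral or dissatisfied", "satisfied",
    ],
})

encoded = df.copy()

# Binary encoding for two-level character columns.
encoded["Gender"] = encoded["Gender"].map({"Female": 0, "Male": 1})
encoded["satisfaction"] = encoded["satisfaction"].map(
    {"neutral or dissatisfied": 0, "satisfied": 1}
)

# Ordinal encoding for a column whose levels have a natural order.
encoded["Class"] = encoded["Class"].map({"Eco": 0, "Eco Plus": 1, "Business": 2})

# Pairwise Pearson correlation matrix (the pandas default method).
corr = encoded.corr()

# Correlation of each variable with the target, strongest positive first.
target_corr = corr["satisfaction"].drop("satisfaction").sort_values(ascending=False)

print(corr.round(2))
print(target_corr.round(2))
```

The `corr` matrix is what a heatmap function such as `seaborn.heatmap` would take as input. The sign of a coefficient for an encoded column depends on which category received the higher code, so a result for a variable like "Type of Travel" should be read against the chosen encoding.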

The EDA section in the code provides valuable insights into the dataset's variable distribution and
correlation patterns, which are crucial for understanding the relationships between different factors
and their potential influence on customer satisfaction.