IST 387 Final Project

Section M003
Christina Demes
Histograms of Numerical Values in airData

This histogram is relatively bell-shaped, which reveals that the majority of customers are middle-aged (in their 30s, 40s, and 50s).

This histogram is severely right-skewed: nearly all of the values are piled up at the low end with a long tail toward larger delays, indicating that the vast majority of departing flights are delayed by only 0 to 50 minutes.
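The shape of a delay histogram can also be checked numerically. The sketch below is only an illustration in Python: the `delays` values are made up (they are not taken from airData), and `bin_counts` and `skewness` are hypothetical helper names. It counts values per 50-minute bin and computes a moment-based skewness, where a positive result means the long tail points toward larger values.

```python
from collections import Counter
from statistics import mean, pstdev

def bin_counts(values, width):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    return Counter(int(v // width) * width for v in values)

def skewness(values):
    """Moment-based skewness: positive when the long tail points toward larger values."""
    m, s = mean(values), pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

# Hypothetical delays in minutes (illustration only, not airData)
delays = [0, 0, 0, 2, 5, 5, 8, 10, 12, 15, 20, 25, 30, 45, 60, 120, 240]

counts = bin_counts(delays, 50)
print(sorted(counts.items()))   # bin start -> number of flights in that bin
print(skewness(delays) > 0)     # True means the tail points toward larger delays
```

With data like this, almost every flight lands in the first bin while a few long delays stretch the tail, which is the pattern described for the delay histograms.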
Histograms of Numerical Values in airData

Like the histogram of departure_delay_in_minutes, this graph has its values piled up at the low end with a long tail toward larger delays (right-skewed). This reveals that the vast majority of arriving flights are delayed by only 0 to 25 minutes.

This histogram is not skewed to either side, nor is it a normal distribution. Most flight durations fall into two ranges: 50 to 150 minutes and 250 to 300 minutes.
Histograms of Numerical Values in airData

This histogram shows that the majority of flights cover a distance of 2000 to 2500 miles.

This histogram is slightly left-skewed: most scores sit at the high end of the scale with a tail toward the lower scores, indicating that the majority of customers have a positive experience on their flight and would recommend the airline to someone.
Tables of Categorical Data in airData

This table lists the destination cities and the number of flights that completed trips to each one, based on the information in airData. According to the table, the most flights were headed to Chicago, IL.
Tables of Categorical Data in airData

This table lists the origin cities of each flight and the number of flights that took off from each one.
Based on this table, the most flights took off from Atlanta, GA.
Tables of Categorical Data in airData
This table reveals that there are more female customers than male customers in the airData data frame.

This table reveals that business travel is the most common type of travel and that personal travel is the second most common.

This table reveals that only about 12.25% of the flights in airData were cancelled.
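Tables like these come down to counting how often each category appears and, for the cancellation table, turning a count into a percentage. The Python sketch below illustrates the idea. The `destinations` and `cancelled` lists are invented for the example, and `frequency_table` and `percent` are hypothetical helper names.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()

def percent(values, target):
    """Percentage of entries equal to target."""
    return 100 * sum(1 for v in values if v == target) / len(values)

# Hypothetical records (illustration only, not airData)
destinations = ["Chicago, IL", "Atlanta, GA", "Chicago, IL", "Tampa, FL",
                "Chicago, IL", "Atlanta, GA", "Philadelphia, PA", "Chicago, IL"]
cancelled = ["No", "No", "Yes", "No", "No", "No", "No", "No"]

table = frequency_table(destinations)
print(table[0])                     # the most common destination and its count
print(percent(cancelled, "Yes"))    # share of cancelled flights, in percent
```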
Boxplots

This boxplot evaluates the relationship between type_of_travel and likelihood_to_recommend. Several outliers exist in the business travel and mileage tickets groups. Putting those aside, we can tell that people who were traveling on business or had mileage tickets were more likely to recommend the airline than the people who were traveling for personal reasons.

This boxplot evaluates the relationship between gender and likelihood_to_recommend. The medians and upper quartile values for each group are almost identical, suggesting that gender has little effect on likelihood to recommend. The female group has a lower first-quartile value than the male group, meaning that their likelihood_to_recommend scores are more spread out than the males' scores.

This boxplot evaluates the relationship between flight cancellation and likelihood_to_recommend. When people’s flights were not cancelled, there was a greater range of likelihoods to recommend than for those whose flights were cancelled. People were more likely to recommend the airline when their flight was not cancelled than they were when it was cancelled.
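The comparisons above rest on the numbers a boxplot draws: the minimum, first quartile, median, third quartile, and maximum of each group. The Python sketch below computes them for two groups. The scores are invented for illustration, `five_number_summary` is a hypothetical helper name, and the "inclusive" quartile method is only one of several conventions, so the quartiles may differ slightly from what a plotting tool shows.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Minimum, quartiles, and maximum: the values a boxplot is drawn from."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

# Hypothetical likelihood_to_recommend scores per group (illustration only)
groups = {
    "Personal Travel": [2, 3, 4, 5, 6, 7, 8],
    "Business travel": [6, 7, 8, 8, 9, 9, 10],
}

summaries = {name: five_number_summary(scores) for name, scores in groups.items()}
for name, summary in summaries.items():
    iqr = summary["q3"] - summary["q1"]   # height of the box
    print(name, summary, "IQR:", iqr)
```

Comparing the medians shows which group tends to score higher, and comparing the interquartile ranges shows which group's scores are more spread out.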
Detractors Map
This map plots the likelihood_to_recommend for each destination city. Because it uses the data from the detractors dataset, it only includes likelihood values that are below 7. The key indicates that the darker the blue dot is, the less likely passengers were to recommend the airline. From this map, we can see that flights headed to Philadelphia had the lowest likelihood of being recommended.
Promoters Map
This map plots the likelihood_to_recommend for each origin city. It uses data from the promoters dataset, so it only includes likelihood values that are 8 and above. The key indicates that the lighter the blue dot is, the more likely passengers were to recommend the airline. From this map, we can see that there are several origin cities with high likelihood_to_recommend values. Tampa, FL is one example.
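The data behind both maps can be sketched as a filter followed by a per-city average. The Python example below is an assumption about how such a summary could be built, not the code used for the maps: the rows are invented, `mean_score_by_city` is a hypothetical helper, and only the cutoffs (below 7 for detractors, 8 and above for promoters) come from the slides.

```python
from collections import defaultdict
from statistics import mean

def mean_score_by_city(rows, keep):
    """Average the scores that pass the keep() filter, grouped by city."""
    by_city = defaultdict(list)
    for city, score in rows:
        if keep(score):
            by_city[city].append(score)
    return {city: mean(scores) for city, scores in by_city.items()}

# Hypothetical (city, likelihood_to_recommend) rows, illustration only
rows = [("Philadelphia, PA", 2), ("Philadelphia, PA", 4),
        ("Chicago, IL", 6), ("Chicago, IL", 9),
        ("Tampa, FL", 9), ("Tampa, FL", 10),
        ("Atlanta, GA", 5), ("Atlanta, GA", 8)]

detractors = mean_score_by_city(rows, lambda s: s < 7)    # cutoff from the slides
promoters = mean_score_by_city(rows, lambda s: s >= 8)    # cutoff from the slides

print(min(detractors, key=detractors.get))   # city with the lowest detractor average
print(max(promoters, key=promoters.get))     # city with the highest promoter average
```

Note that with these cutoffs a score of exactly 7 falls into neither group.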
Detractors Word Cloud

In word clouds, the size of a word reflects how frequently it appears in the data set. The most commonly used words in the comments coming from the detractors data set were: flight, southeast, and luggage. Outside of these, there are words like “terrible” and “worst” that carry a negative connotation. It makes sense that these words were commonly used in these comments because these customers would not recommend the airline based on their experience.
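The word sizes in a cloud are driven by simple word counts. The Python sketch below shows one way to get those counts. The comments and the stop-word list are invented for illustration, and `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical stop words to ignore (a real list would be much longer)
STOP_WORDS = {"the", "was", "and", "my", "ever", "again"}

def word_frequencies(comments):
    """Lowercase the comments, split them into words, and count the non-stop words."""
    words = []
    for comment in comments:
        words.extend(re.findall(r"[a-z']+", comment.lower()))
    return Counter(w for w in words if w not in STOP_WORDS)

# Hypothetical detractor comments (illustration only)
comments = [
    "The flight was delayed and my luggage was lost",
    "Worst flight ever, terrible service",
    "Southeast lost my luggage again",
]

frequencies = word_frequencies(comments)
print(frequencies.most_common(3))   # the words that would be drawn largest
```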
Promoters Word Cloud

The most commonly used words in the promoters comments were: flight, southeast, good, service, and time. Outside of these, there are a number of words with positive connotations in the cloud. This makes sense because the people writing these comments would recommend the airline to family and friends based on their experience.
Matched Words in Detractors Text
Positive Words
Negative Words

To determine which words have positive and negative connotations, we read in the positive-words and negative-words texts to use as dictionaries. By doing this, we are able to match the words contained in the comments of the detractors data frame to words in the positive and negative dictionaries. The tables above list the matched words used in the comments, with the frequency of each word shown below it. Looking at these tables, it is clear that more negative words than positive words were used in the detractors comments. This makes sense because, based on their experience, they would not recommend the airline, so it must have been a negative one.
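The matching step described above can be sketched as keeping only the comment words that appear in a dictionary and counting them. In this Python illustration the two small word sets stand in for the positive-words and negative-words files, the comments are invented, and `match_words` is a hypothetical helper name.

```python
import re
from collections import Counter

# Tiny stand-ins for the positive-words and negative-words dictionaries
POSITIVE = {"good", "great", "friendly"}
NEGATIVE = {"delayed", "lost", "terrible", "worst"}

def match_words(comments, dictionary):
    """Count every comment word that appears in the given dictionary."""
    matched = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word in dictionary:
                matched[word] += 1
    return matched

# Hypothetical detractor comments (illustration only)
comments = [
    "The flight was delayed and my luggage was lost",
    "Terrible service, worst flight, delayed again",
    "Crew was friendly but the flight was delayed",
]

matched_positive = match_words(comments, POSITIVE)
matched_negative = match_words(comments, NEGATIVE)
print(sum(matched_positive.values()), sum(matched_negative.values()))
```

Comparing the two totals gives the same kind of positive-versus-negative comparison that the tables support.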
Matched Words in Promoters Text
Negative Words
Positive Words

As we did with the comments in the detractors data frame, we matched the words within the comments of the promoters data frame to the positive and negative dictionaries. Just glancing at these tables, you can see that more positive words than negative words were used in the promoters comments. Again, this is logical because their experience must have been positive if they would recommend the airline to others.
Linear Model

The R-squared is a diagnostic measure of how much of the variability in the dependent variable our model explains. The p-value is the probability of seeing results at least as extreme as ours if the predictors actually had no relationship with likelihood_to_recommend, so a small p-value suggests that the relationship is not due to chance. The significance of the predictors is indicated by asterisks, so for this data set the most significant predictor is type of travel.

By analyzing the coefficient of determination, we see that the adjusted R-squared value is equal to 0.4218. The closer this value is to 1, the better the model is at explaining variability in the dependent variable. Therefore, this model is fairly weak at predicting likelihood_to_recommend based on the predictors we gave it, since it explains only about 42% of the variability.
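To make the R-squared values more concrete, the Python sketch below fits a one-predictor least-squares line to a handful of made-up points and computes R-squared as 1 - SS_res / SS_tot, then applies the usual adjustment for sample size n and number of predictors p. It is not the model from this project: the data and the function names are invented for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def r_squared(x, y):
    """Share of the variability in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    my = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for using p predictors on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
print(round(r2, 4), round(adjusted_r_squared(r2, n=len(x), p=1), 4))
```

The adjusted value is never larger than the plain R-squared, which is why it is the safer number to report when a model has several predictors.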
Recommendation for the Managers
One recommendation I have for the airline to improve its likelihood_to_recommend values is to study the matchedNdetract table. By doing so, the managers will be able to analyze the large number of comments they receive and find the major issues people have with the airline. In the table on Slide 13, for example, we can see that one of the most frequently used words in the comments from the detractors was “delayed.” From this, we can infer that flight delays are a major reason why a customer wouldn’t recommend the airline to family or friends. My recommendation for the airline is to reduce the number of delays or, if a flight is delayed, to offer the customers some sort of voucher or other compensation to ease their frustration.
