Approach
The main objective here is to use the available flight operational data and data mining techniques to construct an analytical model. The analytical model is used to predict flight delay based on some of the flight attributes, which are discussed in a later section of this paper. Additional models will be created to determine the most likely cause of a flight delay and to predict the approximate duration of the delay.

IV. RESEARCH ATTRIBUTES

In this paper, we extract some attributes that affect flight delay and formulate them as an input vector x in the proposed model, as shown in Table 1.

An outline of the model developed to predict delays of individual flights is shown in Fig. 1. The model consists of two main parts: the training process and the prediction process. The training process starts with data collection. Historical flight data and data on the corresponding airlines and airports are collected and joined using the scheduled departure time and airport as the join keys. In the preprocessing step, missing data are estimated and the data are normalized. The training set is then ready and is used to train the predictive model with sampling techniques. Data for the prediction process are collected and preprocessed in the same way as the training set and then fed into the model trained with the training data. Finally, the model assigns each data point a label.
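The training-side steps described above (joining on keys, estimating missing data, normalizing) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in this work: the field names, the sample records, the use of the column mean to estimate missing values and the use of min-max scaling for normalization are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical join keys, following the text: scheduled departure time and airport.
JOIN_KEYS = ("scheduled_departure", "airport")


def join_on_keys(flights, extra, keys=JOIN_KEYS):
    """Left-join each flight record with the record in `extra` sharing its keys."""
    index = {tuple(row[k] for k in keys): row for row in extra}
    return [{**index.get(tuple(f[k] for k in keys), {}), **f} for f in flights]


def impute_mean(rows, column):
    """Estimate missing (None) values in `column` with the column mean (assumed method)."""
    known = [r[column] for r in rows if r.get(column) is not None]
    fill = mean(known)
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]


def min_max_normalize(rows, column):
    """Rescale `column` to the range [0, 1] (assumed normalization method)."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]


# Made-up sample records, for illustration only.
flights = [
    {"scheduled_departure": "2015-01-01 08:00", "airport": "JFK",
     "distance": 2475, "departure_delay": 12},
    {"scheduled_departure": "2015-01-01 09:30", "airport": "ORD",
     "distance": 733, "departure_delay": None},
    {"scheduled_departure": "2015-01-01 10:15", "airport": "JFK",
     "distance": 1089, "departure_delay": 0},
]
airport_records = [
    {"scheduled_departure": "2015-01-01 08:00", "airport": "JFK", "origin_latitude": 40.64},
    {"scheduled_departure": "2015-01-01 09:30", "airport": "ORD", "origin_latitude": 41.98},
    {"scheduled_departure": "2015-01-01 10:15", "airport": "JFK", "origin_latitude": 40.64},
]

prepared = join_on_keys(flights, airport_records)
prepared = impute_mean(prepared, "departure_delay")
prepared = min_max_normalize(prepared, "distance")
```

Data for the prediction process would pass through the same three functions before being fed to the trained model, so that both sets share one preprocessing path.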
Airlines Considered              UA, AA, US, F9, B6, OO, AS, NK, WN, DL, EV,
                                 HA, MQ, VX

Flight On-time Performance       Scheduled Departure, Departure Delay,
data (Input)                     Scheduled Time, Elapsed Time, Air Time,
                                 Distance, Scheduled Arrival, Arrival Delay,
                                 Previous Arrival Delay, Previous Departure
                                 Delay

Selected Features                Scheduled Departure, Departure Delay,
                                 Scheduled Time, Elapsed Time, Air Time,
                                 Distance, Scheduled Arrival, Arrival Delay,
                                 Previous Arrival Delay, Previous Departure
                                 Delay, Month, Day of Month, Airline Name,
                                 Origin Latitude, Origin Longitude,
                                 Destination Latitude, Destination Longitude

Classification (Output)          1 - indicates occurrence of delay
                                 0 - indicates absence of delay

Regression (Output)              Numerical value (Score) of the flight delay
                                 prediction

Table 1. Feature Study

Fig 1: Summary of the model developed

The proposed system consists of 2 phases:

Phase A: Data and Pre-processing
Phase B: Predictive Model

A) Data and Pre-Processing

i. Data Collection

To train and test the models, we used a publicly available dataset for domestic air traffic from Kaggle. The original source of our dataset is the online Bureau of Transportation Statistics database. Datasets of various airports, airlines and flights were merged using join keys, and the resulting dataset was the final dataset on which the models were applied. The dataset covers the years 2004-2019 and consists of well over 3 million examples, with features categorized as follows:

B) Predictive Model

The model consists of 2 stages:
Fig 4: R2 score for SVM model – Arrival Delay

Fig 3 and Fig 4 show the results of the SVM model; the R2 scores achieved for departure delay and arrival delay are 0.18 and 0.25 respectively.

In Fig 6, we show the R2 score for both the training and testing datasets using the SVM model. The blue dots represent training results and the green dots represent testing results.
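For reference, the R2 score quoted above, and the mean squared error (MSE) reported alongside it in the result tables, can be computed from actual and predicted delays as in the following minimal sketch. It uses the standard definitions of both metrics; the delay values are made up for illustration and are not taken from the dataset.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up delays in minutes, for illustration only.
actual_delay = [10.0, 0.0, 25.0, 5.0]
predicted_delay = [8.0, 2.0, 20.0, 10.0]

mse = mean_squared_error(actual_delay, predicted_delay)
r2 = r2_score(actual_delay, predicted_delay)

# A model that always predicts the mean actual delay scores R2 = 0.
baseline_r2 = r2_score(actual_delay, [10.0] * len(actual_delay))
```

An R2 score of 1 corresponds to perfect predictions and a score of 0 to a model no better than always predicting the mean delay, which is the scale on which the scores of the three regressors should be read.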
DEPARTURE DELAY
                        TRAINING              TESTING
REGRESSOR           R2 score    MSE       R2 score    MSE
SVM                 0.18        31.63     0.18        32.00
DECISION TREE       1.0         0.0       0.32        28.98
STACKING            0.86        12.64     0.05        34.19

Table 3: Departure Delay scores

ARRIVAL DELAY
                        TRAINING              TESTING
REGRESSOR           R2 score    MSE       R2 score    MSE
SVM                 0.33        32.17     0.34        13.99
DECISION TREE       1.0         0.0       0.09        36.27
STACKING            0.88        12.7      0.20        33.87

Table 4: Arrival Delay scores

Table 3 and Table 4 summarize the R2 score and MSE obtained with the three algorithms (SVM, Decision Tree and Stacking) for departure and arrival delays.

VII. FUTURE SCOPE AND CONCLUSION

Given a set of attributes (feature values) for an aircraft travelling from a specific origin to a destination, the SVM model predicts the departure delay and arrival delay, and hence whether the flight will arrive on time or be delayed, with R2 scores of 0.18 (departure delay) and 0.33 (arrival delay). An R2 score close to 1 would demonstrate an efficient model, so these scores indicate that the model captures only part of the variation in delay. It nevertheless meets our requirement of estimating the delay for any given aircraft from its parameters alone.

The future scope of this work involves the application of more advanced and novel pre-processing techniques, and of hybrid machine learning and deep learning models tuned with grid search, to achieve higher model performance.

VIII. ACKNOWLEDGEMENT

This paper and the research behind it would not have been possible without the exceptional support of our project guide and supervisor, Mr. Abhay Patil. His enthusiasm, knowledge and exacting attention to detail have been an inspiration and kept the work on track from the beginning. We would also like to thank our friends and family, who supported us and offered deep insight into the study.