
ACRP Problem Statement 18-03-14 Recommended Funding Amount: --

Using Data Mining to Predict the Risk of Flight Delays at U.S. Airports – A Delay
Prediction Toolkit

ACRP Staff Comments

--

TRB Aviation Committees Comments

AIRFIELD AND AIRSPACE CAPACITY AND DELAY COMMITTEE: The committee has a neutral view of
the problem statement. While this could be a valuable tool for airports and airlines to plan for anticipated
delays, there does not seem to be any distinction between flight-level delay prediction and airport-level delay
prediction. There are extensive studies in both areas.

Association Committee Comments

--

Review Panel Comments

Not recommended. The proposed research is not a viable research topic for the ACRP program.

AOC Disposition

There was no discussion. No funds were allocated.


ACRP Problem Number 18-03-14

1. Problem Statement Title


Using Data Mining to Predict the Risk of Flight Delays at U.S. Airports – A Delay Prediction
Toolkit
2. Background
The Federal Aviation Administration (FAA) defines the National Airspace System (NAS) as “the
network of United States airspace to include air navigation facilities, equipment, services,
airports or landing areas, aeronautical charts, information/services, rules, regulations, procedures,
technical information, manpower, and material” (FAA, 2012). NAS is one of the largest and
most complex stochastic systems in the world, and its complexity makes incidents in the system
very difficult to predict.
Flight delay is one of the most serious issues in NAS. Increasing air travel demand over the
years has pushed U.S. airports to their capacity limits, which increases the risk of flight delays.
Airlines for America estimated that in 2014 delays cost U.S. airlines $81.18 per minute, 2.7
percent more than in 2013, resulting in a total of $9.15 billion in direct aircraft operating costs
(Airlines for America, 2014). Understanding and mitigating flight delays in NAS is a major,
long-term objective of the FAA. Predicting delay incidents and identifying the factors that drive
them will allow airport executives to make critical decisions that mitigate the risk and increase
the percentage of on-time flights. The open question is how to accurately predict the risk of
delay in real time at each airport, given the large amount of data available in NAS.
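Taken together, the two Airlines for America figures above imply a rough scale for total delay. The sketch below assumes, as a simplification not stated in the source, that the $9.15 billion total equals the per-minute cost multiplied by total delay minutes. It also backs out the implied 2013 per-minute cost from the 2.7 percent increase.

```python
# Figures quoted from Airlines for America (2014) in the text above.
COST_PER_MINUTE_2014 = 81.18      # USD per minute of delay
TOTAL_DIRECT_COST_2014 = 9.15e9   # USD, direct aircraft operating costs
GROWTH_OVER_2013 = 0.027          # 2.7 percent increase over 2013

# Assumption: total cost = cost per minute x total delay minutes.
implied_delay_minutes = TOTAL_DIRECT_COST_2014 / COST_PER_MINUTE_2014

# Undo the 2.7 percent increase to recover the 2013 per-minute cost.
implied_cost_per_minute_2013 = COST_PER_MINUTE_2014 / (1 + GROWTH_OVER_2013)

print(f"Implied delay minutes in 2014: {implied_delay_minutes / 1e6:.1f} million")
print(f"Implied 2013 cost per minute: ${implied_cost_per_minute_2013:.2f}")
```

Under that assumption, 2014 delays amount to roughly 112.7 million minutes, and the 2013 cost comes out near $79.05 per minute.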
Existing prediction models of flight delays mainly use traditional statistical methods and do not
capture the complexity of NAS. These models do not provide practical solutions for airports in
predicting and handling delay situations, because they are built at a small scale with numerous
assumptions made to meet the requirements of statistical tests.
The FAA’s Aviation System Performance Metrics (ASPM) online access system collects data on
flights to and from 77 ASPM airports every fifteen minutes. In addition, the ASPM database
provides data on many aspects of airport operations, including city pair delay data, airport
efficiency data, airport capacity data, airport weather data, and airport traffic data. These
characteristics make ASPM a large dataset with high volume, variety, and velocity. To analyze
this dataset and predict the risk of flight delays in real time, data mining methods are needed.
Data mining is the use of machine learning algorithms to detect hidden patterns in large and noisy
data. Because it relies far less on the distributional assumptions that statistical tests require, it
is better suited than traditional statistical methods to data of this scale and variety.
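Before any algorithm can be trained, the 15-minute records have to be turned into labeled examples. The sketch below shows one way to do this under stated assumptions. The record fields (`scheduled_arrivals`, `arrival_rate`, `delayed_arrivals`, `ifr`), the 25 percent delay-share threshold, and the helper names are all hypothetical and do not reflect the actual ASPM schema. Each example pairs one period's conditions with the following period's delay outcome, so a model trained on these examples predicts ahead rather than describing the present.

```python
from dataclasses import dataclass


@dataclass
class QuarterHourRecord:
    """One airport's activity in a 15-minute period (hypothetical fields,
    not the actual ASPM schema)."""
    airport: str
    scheduled_arrivals: int
    arrival_rate: int      # arrivals the airport can accept in the period
    delayed_arrivals: int
    ifr: bool              # True when instrument flight rules are in effect


# Assumed cut-off: a period counts as high-risk when more than
# 25 percent of its scheduled arrivals were delayed.
DELAY_SHARE_THRESHOLD = 0.25


def features_of(rec: QuarterHourRecord) -> dict[str, bool]:
    """Conditions known for the current period."""
    return {
        "over_capacity": rec.scheduled_arrivals > rec.arrival_rate,
        "ifr": rec.ifr,
    }


def is_high_risk(rec: QuarterHourRecord) -> bool:
    """Outcome label: share of delayed arrivals above the threshold."""
    if rec.scheduled_arrivals == 0:
        return False
    return rec.delayed_arrivals / rec.scheduled_arrivals > DELAY_SHARE_THRESHOLD


def make_examples(records: list[QuarterHourRecord]) -> list[tuple[dict[str, bool], bool]]:
    """Pair each period's features with the following period's label.

    Assumes `records` belong to one airport and are sorted by time.
    """
    return [
        (features_of(current), is_high_risk(following))
        for current, following in zip(records, records[1:])
    ]


history = [
    QuarterHourRecord("ATL", 30, 25, 12, True),
    QuarterHourRecord("ATL", 20, 25, 2, False),
    QuarterHourRecord("ATL", 28, 25, 10, True),
]
examples = make_examples(history)
```

Three consecutive periods yield two examples. The last record contributes only a label because no later period exists to pair it with.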
I have completed several research projects showing that data mining can be used effectively to
predict the risk of flight delays in NAS at the airport level (please see section 7 – Related
Research). Using a sample of the ASPM data, I have built several data mining models that can
accurately predict the risk of flight delay at a U.S. airport based on factors such as city pair
delays, airport efficiency, airport capacity, and airport traffic. The same algorithms can be used
to build an effective delay prediction model for U.S. airports. Such a model can be fed with the
continuous ASPM data stream to deliver delay predictions in real time. Airport executives can use
this information to prepare for and handle delay situations. In addition, information about the
factors influencing delays allows airport executives to develop effective strategies to mitigate
the risk of delays.
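Real-time use of this kind amounts to scoring each new 15-minute record as it arrives and flagging periods whose predicted risk crosses an alert level. The sketch below assumes a trained model is available as a function that returns a probability. `monitor`, `toy_risk`, its weights, and the 0.7 alert threshold are hypothetical placeholders, not a validated model.

```python
from typing import Callable, Iterable, Iterator


def monitor(records: Iterable[dict],
            predict_risk: Callable[[dict], float],
            alert_threshold: float = 0.7) -> Iterator[tuple[str, float]]:
    """Score each incoming record and yield (airport, risk) for every
    record whose predicted delay risk reaches the alert threshold."""
    for rec in records:
        risk = predict_risk(rec)
        if risk >= alert_threshold:
            yield rec["airport"], risk


def toy_risk(rec: dict) -> float:
    """Hypothetical stand-in for a trained model; the weights are made up."""
    load = rec["scheduled_arrivals"] / rec["arrival_rate"]
    return min(1.0, 0.5 * load + (0.3 if rec["ifr"] else 0.0))


feed = [
    {"airport": "ATL", "scheduled_arrivals": 30, "arrival_rate": 25, "ifr": True},
    {"airport": "MCO", "scheduled_arrivals": 10, "arrival_rate": 20, "ifr": False},
]
alerts = list(monitor(feed, toy_risk))
```

`monitor` is a generator, so it can consume a finite list, as shown here, or an unbounded feed that delivers new records every 15 minutes.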
3. Objective
The purpose of this research is to develop and test data mining models for predicting flight delays
using the ASPM database. Advanced algorithms such as decision trees, neural networks, support
vector machines, and Bayesian inference will be used to develop the prediction models. The models
will be tested and validated at several U.S. airports of different types and sizes. A delay
prediction toolkit consisting of the data mining models, algorithms, and software will be developed.
Finally, a toolkit user guide will be produced to provide airports with descriptions of the data
mining methods, datasets, and data mining process, along with guidelines on how to deploy and use
the models for delay prediction and how to interpret the prediction results.
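Of the algorithms listed, Bayesian inference is the easiest to sketch without external libraries. The block below implements one common form of it, a categorical naive Bayes classifier with add-one (Laplace) smoothing. It illustrates the technique and is not the project's model. The function names, factor names, and six training examples are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class frequencies and per-class feature-value frequencies.

    `examples` is a list of (features, label) pairs, where features maps
    a factor name to a categorical value.
    """
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # label -> factor -> value -> count
    factor_values = defaultdict(set)                          # factor -> observed values
    for features, label in examples:
        for factor, value in features.items():
            value_counts[label][factor][value] += 1
            factor_values[factor].add(value)
    return class_counts, value_counts, factor_values


def posterior(model, features):
    """Return P(label | features) for every label seen in training."""
    class_counts, value_counts, factor_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for factor, value in features.items():
            k = len(factor_values[factor])
            count = value_counts[label][factor][value]
            # Add-one smoothing keeps unseen values from zeroing out the product.
            score += math.log((count + 1) / (n_label + k))
        log_scores[label] = score
    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


training = [
    ({"weather": "IFR", "load": "over"}, "delay"),
    ({"weather": "IFR", "load": "under"}, "delay"),
    ({"weather": "VFR", "load": "over"}, "delay"),
    ({"weather": "VFR", "load": "under"}, "on_time"),
    ({"weather": "VFR", "load": "under"}, "on_time"),
    ({"weather": "IFR", "load": "under"}, "on_time"),
]
model = train_naive_bayes(training)
probs = posterior(model, {"weather": "IFR", "load": "over"})
```

For an IFR period with demand over capacity, the toy model gives a delay probability of about 0.82. Naive Bayes assumes that the factors are conditionally independent given the outcome. That assumption rarely holds for interacting NAS factors, which is one reason to compare this method against decision trees, neural networks, and support vector machines.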
4. Proposed Tasks
The following tasks are recommended for this project:
• Comprehensive review of literature on flight delay prediction projects, data mining
methods, and big data analytics
• Collection of the ASPM data
• Development of data mining models using various machine learning algorithms
• Validation and evaluation of the prediction models
• Development of the toolkit that consists of the models, algorithms, and software
• Development of the toolkit user guide that provides detailed instructions and guidelines
on how to use and deploy the models at the airport level and how to interpret the
prediction results
• Survey of airport executives on the usability and ease of use of the prediction models
• Teleconferences with the project panel
• Interim report submission
• Final report submission along with algorithms and data mining models
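The validation task needs care on one point. ASPM records are time-ordered, so a random train/test split would let a model learn from periods that come after the periods it is tested on. The sketch below assumes each record carries a sortable `period_start` field (a hypothetical name). It holds out the latest periods for testing and computes standard classification metrics on held-out predictions. The helper names are illustrative.

```python
def chronological_split(records: list[dict], train_fraction: float = 0.8):
    """Train on the earliest periods and test on the latest ones."""
    ordered = sorted(records, key=lambda r: r["period_start"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def evaluate(y_true: list[str], y_pred: list[str], positive: str = "delay") -> dict[str, float]:
    """Accuracy, precision, and recall for the positive (delay) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


records = [{"period_start": t} for t in range(10)]
train, test = chronological_split(records)
metrics = evaluate(
    ["delay", "delay", "on_time", "on_time", "delay"],
    ["delay", "on_time", "on_time", "delay", "delay"],
)
```

Delayed periods may be a minority of the data. In that case, a model that always predicts on-time could still score high on accuracy, so precision and recall on the delay class are the more informative measures.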
5. Estimated Funding
This project is estimated to require $400,000.00, which includes travel, data collection, software,
and model deployment expenses.
6. Estimated Research Duration
The project is estimated to require 18 months to complete.
7. Related Research
1. Truong, D. (2013). Prediction of Flight Delays Using Data Mining and Bayesian Inference
Methods (funded project). Embry-Riddle Aeronautical University. (Principal Investigator)
2. Truong, D., Friend, M., & Cheng, H. (2017). Applications of Business Analytics in Predicting
Flight On-time Performance. Transportation Journal (in press).
3. Truong, D. (2016). Developing Airline Segmentation Based on the On-time Performance.
International Journal of Aeronautical Science & Aerospace Research, 3(5), 131-140.
4. Truong, D. (2016). Using Causal Data Mining to Predict the Risk of Delays in Passenger
Transportation. Transportation Research Part B: Methodological (under review).
5. Truong, D., & Chen, H. (2015). Prediction Modeling for a Complex and Dynamic System.
INFORMS Conference on Business Analytics & Operations Research, March 30 – April 1,
Huntington Beach, CA.
8. Process Used to Develop Problem Statement
This problem statement was developed based on the proposer’s research expertise and experience
in data mining, big data analytics, and flight delays, as well as on airport needs and expert
opinions. The proposer teaches a doctoral-level data mining course and has completed several
research projects on using data mining to predict the risk of flight delays.
9. Person Submitting Problem Statement and Date
Dothang Truong, Ph.D., CSCP
Associate Professor of Doctoral Studies
Embry-Riddle Aeronautical University
Daytona Beach, FL 32114
Date Submitted: 03/20/2017
