
Data Analysis of Monthly Australian Domestic Flights Between 2005 and 2018

Ridwan Saeful Rohman

2018/08/13

1. Introduction

This analysis was created for non-commercial use only. Its purpose is to show readers how air traffic varies over the year, so that they can plan a vacation or business trip wisely.

2. Research Question

A. Which month of the year has the busiest flights?
B. Which month of the year has the most empty seats at the end of the year?
C. Which airport route has the busiest flights at the end of the year?
D. Which airport route has the busiest flights at the start of the year?

3. Preparation

In [1]:
library(data.table)
library(plotly)
library(stringr)
library(tibble)
library(dplyr)
library(readr)
library(ggplot2)
library(tidyr)
library(xml2)

#Set working directory

setwd("D:\\ShinKeshin\\Australian Air Traffic")

#Load airport data

airport <- read_csv('domcitypairsweb.csv')


airport$City1 <- airport$City1 %>% str_to_lower() %>% str_to_title()
airport$City2 <- airport$City2 %>% str_to_lower() %>% str_to_title()

#Filter airport data by year, between year 2005 and 2018

airport <- airport %>% filter(Year>=2005) %>% filter(Year<2019)

head(airport)

Loading required package: ggplot2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

last_plot

The following object is masked from 'package:stats':

filter

The following object is masked from 'package:graphics':

layout

Attaching package: 'dplyr'

The following objects are masked from 'package:data.table':

between, first, last

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

Parsed with column specification:


cols(
City1 = col_character(),
City2 = col_character(),
Month = col_integer(),
Passenger_Trips = col_integer(),
Aircraft_Trips = col_integer(),
Passenger_Load_Factor = col_double(),
`Distance_GC_(km)` = col_integer(),
RPKs = col_integer(),
ASKs = col_integer(),
Seats = col_integer(),
Year = col_integer(),
Month_num = col_integer()
)

City1 City2 Month Passenger_Trips Aircraft_Trips Passenger_Load_Factor Distance_GC_(km) RPKs ASKs Seats

Albury Sydney 38353 9569 362 70.6 452 4325188 6128668 13559

Albury Sydney 38384 10416 398 69.6 452 4708032 6764632 14966

Albury Sydney 38412 12371 444 67.0 452 5591692 8347536 18468

Albury Sydney 38443 11939 434 65.9 452 5396428 8185268 18109

Albury Sydney 38473 11876 455 66.7 452 5367952 8041984 17792

Albury Sydney 38504 11184 442 61.2 452 5055168 8265272 18286
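
The Month column is printed as integers such as 38353 rather than as dates. These values look like Excel serial day numbers. That is an assumption, but 38353 corresponds to 1 January 2005 under that convention, which agrees with Year = 2005 and Month_num = 1 in the first row. A minimal sketch of the conversion, using the origin commonly applied to Excel serial dates:

```r
# Month values copied from the head(airport) output above
month_serial <- c(38353, 38384, 38412, 38443, 38473, 38504)

# Assumption: the integers are Excel serial day numbers (origin 1899-12-30)
month_date <- as.Date(month_serial, origin = "1899-12-30")

# Show year and month only
format(month_date, "%Y-%m")
```

If the assumption holds, the six values map to January through June 2005, matching the Month_num values 1 to 6.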

In [2]:
#Load city data

city <- fread('worldcitiespop.csv',data.table = F)


city.australia <- city %>% filter(Country == 'au') %>% select(City,Latitude,Longitude)
names(city.australia)[1] <- 'City'
city.australia$City <- city.australia$City %>% str_to_lower() %>% str_to_title()

head(city.australia)

City Latitude Longitude

Abbotsford -33.85000 151.1333

Abbotsford -37.80000 145.0000

Abbotsham -41.21667 146.1833

Abbotts -26.31667 118.3833

Abercorn -25.15000 151.0333

Abercrombie -33.95000 149.3167
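
The output above lists Abbotsford twice with different coordinates, so City is not a unique key in this table. Because the later steps merge on the city name, an ambiguous name can match more than one row. The sketch below shows one way to list such names; `city_sample` is a hypothetical stand-in built from the first rows printed above, not the full data set.

```r
# First rows of city.australia, copied from the output above
city_sample <- data.frame(
  City = c("Abbotsford", "Abbotsford", "Abbotsham", "Abbotts"),
  Latitude = c(-33.85, -37.80, -41.21667, -26.31667),
  Longitude = c(151.1333, 145.0000, 146.1833, 118.3833)
)

# Count how many rows share each city name
name_counts <- table(city_sample$City)

# Names that occur more than once are ambiguous merge keys
dup_names <- names(name_counts[name_counts > 1])
dup_names
```

Running the same check on the full `city.australia` table would show which airport cities need to be disambiguated before merging.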

5. Data Component/ Structure

In [3]:
str(airport)

Classes 'tbl_df', 'tbl' and 'data.frame': 10291 obs. of 12 variables:


$ City1 : chr "Albury" "Albury" "Albury" "Albury" ...
$ City2 : chr "Sydney" "Sydney" "Sydney" "Sydney" ...
$ Month : int 38353 38384 38412 38443 38473 38504 38534 38565 38596 38626 ...
$ Passenger_Trips : int 9569 10416 12371 11939 11876 11184 13090 13582 13847 12427 ...
$ Aircraft_Trips : int 362 398 444 434 455 442 461 471 458 462 ...
$ Passenger_Load_Factor: num 70.6 69.6 67 65.9 66.7 61.2 68.8 69 73.2 65.2 ...
$ Distance_GC_(km) : int 452 452 452 452 452 452 452 452 452 452 ...
$ RPKs : int 4325188 4708032 5591692 5396428 5367952 5055168 5916680 6139064 6258844 5617004 ...
$ ASKs : int 6128668 6764632 8347536 8185268 8041984 8265272 8598396 8894004 8549580 8611956 ...
$ Seats : int 13559 14966 18468 18109 17792 18286 19023 19677 18915 19053 ...
$ Year : int 2005 2005 2005 2005 2005 2005 2005 2005 2005 2005 ...
$ Month_num : int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "spec")=List of 2
..$ cols :List of 12
.. ..$ City1 : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ City2 : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Month : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Passenger_Trips : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Aircraft_Trips : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Passenger_Load_Factor: list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Distance_GC_(km) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ RPKs : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ ASKs : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Seats : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Year : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Month_num : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"

The most granular unit in the airport data set is a monthly summary of flights between two Australian cities.
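
Several columns appear to be derived from one another. The relations below are inferred from the printed numbers rather than stated by the data source: RPKs (conventionally, revenue passenger kilometres) match Passenger_Trips times distance, ASKs (available seat kilometres) match Seats times distance, and Passenger_Load_Factor matches their ratio as a percentage. A quick check on the first Albury-Sydney row:

```r
# First Albury-Sydney row from the head(airport) output
passenger_trips <- 9569
seats <- 13559
distance_km <- 452

# Inferred relations (assumptions, checked against the printed values)
rpk <- passenger_trips * distance_km       # printed RPKs: 4325188
ask <- seats * distance_km                 # printed ASKs: 6128668
load_factor <- round(100 * rpk / ask, 1)   # printed load factor: 70.6

c(rpk = rpk, ask = ask, load_factor = load_factor)
```

This also means that Seats minus Passenger_Trips gives the number of empty seats for a route in a given month.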

6. Data Wrangling

In [4]:
airport <- merge(airport,city.australia,by.x = 'City1',by.y = 'City')
names(airport)[13] <- 'City1.Latitude'
names(airport)[14] <- 'City1.Longitude'

airport <- merge(airport,city.australia,by.x = 'City2',by.y = 'City')


names(airport)[15] <- 'City2.Latitude'
names(airport)[16] <- 'City2.Longitude'

head(airport)

City2 City1 Month Passenger_Trips Aircraft_Trips Passenger_Load_Factor Distance_GC_(km) RPKs ASKs Seats

Alice Springs Adelaide 40483 0 0 0.0 1316 0 0 0

Alice Springs Adelaide 40452 0 0 0.0 1316 0 0 0

Alice Springs Adelaide 41671 0 0 0.0 1316 0 0 0

Alice Springs Adelaide 40391 0 0 0.0 1316 0 0 0

Alice Springs Adelaide 40330 0 0 0.0 1316 0 0 0

Alice Springs Adelaide 42156 9846 94 67.5 1316 12957336 19184648 14578
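
Renaming columns by position, as in `names(airport)[13]`, depends on the column order staying the same. One alternative is to rename the coordinate columns before each merge. The sketch below uses toy data; `routes`, `cities` and the helper `prefixed_coords` are hypothetical names, and the Albury coordinates are illustrative values rather than figures from the data set.

```r
# Toy stand-ins for the notebook's data frames
routes <- data.frame(City1 = "Albury", City2 = "Sydney", Passenger_Trips = 9569)
cities <- data.frame(
  City = c("Albury", "Sydney"),
  Latitude = c(-36.08, -33.86148),   # Albury value is illustrative only
  Longitude = c(146.92, 151.2055)    # Albury value is illustrative only
)

# Hypothetical helper: rename City/Latitude/Longitude using the key as prefix.
# Assumes `cities` has exactly these three columns, in this order.
prefixed_coords <- function(cities, key) {
  out <- cities
  names(out) <- c(key, paste0(key, ".Latitude"), paste0(key, ".Longitude"))
  out
}

# Merge once per endpoint; the coordinate columns arrive already named
routes <- merge(routes, prefixed_coords(cities, "City1"), by = "City1")
routes <- merge(routes, prefixed_coords(cities, "City2"), by = "City2")
names(routes)
```

With this approach the coordinate columns keep their intended names even if columns are added to or removed from the airport table.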

7. Analysis

7.1 Data Visualization

In [5]:
airport <- airport %>% mutate(id = rownames(airport))
airport.1 <- airport %>%
select(-contains("Latitude"), -contains("Longitude"))
airport.1 <- airport.1 %>%
gather('City1', 'City2', key = "Airport.type", value = "City")
airport.1$Airport.type <- airport.1$Airport.type %>% str_replace(pattern = "City1", replacement = "Departure")
airport.1$Airport.type <- airport.1$Airport.type %>% str_replace(pattern = "City2", replacement = "Arrive")
airport.1 <- merge(airport.1, city.australia, by.x = "City", by.y = "City")
head(airport.1)

City Month Passenger_Trips Aircraft_Trips Passenger_Load_Factor Distance_GC_(km) RPKs ASKs Seats Year

Adelaide 40483 0 0 0.0 1316 0 0 0 2010

Adelaide 40452 0 0 0.0 1316 0 0 0 2010

Adelaide 41671 0 0 0.0 1316 0 0 0 2014

Adelaide 40391 0 0 0.0 1316 0 0 0 2010

Adelaide 40330 0 0 0.0 1316 0 0 0 2010

Adelaide 42156 9846 94 67.5 1316 12957336 19184648 14578 2015

In [6]:
au.map <- map_data('world') %>% filter(region == "Australia") %>% fortify()
head(au.map)

long lat group order region subregion

123.5945 -12.42568 133 7115 Australia Ashmore and Cartier Islands

123.5952 -12.43594 133 7116 Australia Ashmore and Cartier Islands

123.5732 -12.43418 133 7117 Australia Ashmore and Cartier Islands

123.5725 -12.42393 133 7118 Australia Ashmore and Cartier Islands

123.5945 -12.42568 133 7119 Australia Ashmore and Cartier Islands

158.8788 -54.70976 139 7267 Australia Macquarie Island

7.1.1 Visualization of flight route

In [23]:

map_aus_plot <- ggplot() +


geom_map(data=au.map, map=au.map,
aes(x=long, y=lat, group=group, map_id=region),
fill="white", colour="black") +
ylim(-43, -10) +
xlim(110, 160) +
geom_text(data = airport.1, aes(x = Longitude, y = Latitude,label=City),hjust=0, vjust=0) +
geom_line(data = airport.1, aes(x = Longitude, y = Latitude, group = id), colour = "red", alpha = .1) +
geom_point(data = airport.1, aes(x = Longitude, y = Latitude))+
labs(title = "Australian Domestic Aircraft Routes")

map_aus_plot

Warning message:
"Ignoring unknown aesthetics: x, y"

7.1.2 Bar plot of Air Traffic Amount by Year

In [14]:
plot.year <- airport.1 %>%
ggplot(aes(x = Year, fill = City)) +
geom_bar() +
labs(title = "Airport Traffic Amount by City from 2005 to 2018")

ggplotly(plot.year)

In [9]:
ls()

'airport' 'airport.1' 'au.map' 'city' 'city.australia' 'map_aus_plot' 'myCars' 'plot.year'

In [30]:

airport.2 <- airport.1 %>%
  mutate(empty_seat = Seats - Passenger_Trips) %>%
  filter(Month_num == 12) %>%
  select(id, City, Year, Latitude, Longitude, empty_seat) %>%
  group_by(id, City, Year, Latitude, Longitude) %>%
  summarise(empty_seat = sum(empty_seat))
head(airport.2)

id City Year Latitude Longitude empty_seat

10003 Perth 2015 -41.57151 147.1713 27894

10003 Perth 2015 -31.95224 115.8614 27894

10003 Sydney 2015 -33.86148 151.2055 27894

10004 Perth 2015 -41.57151 147.1713 27894

10004 Perth 2015 -31.95224 115.8614 27894

10004 Sydney 2015 -33.86148 151.2055 27894

In [32]:

map_aus_plot <- ggplot() +


geom_map(data=au.map, map=au.map,
aes(x=long, y=lat, group=group, map_id=region),
fill="white", colour="black") +
ylim(-43, -10) +
xlim(110, 160) +
geom_text(data = airport.2, aes(x = Longitude, y = Latitude,label=City),hjust=0, vjust=0) +
geom_line(data = airport.2, aes(x = Longitude, y = Latitude, group = id), color = 'red', alpha = .1) +
geom_point(data = airport.2, aes(x = Longitude, y = Latitude)) +
labs(title = "Australian Domestic Aircraft Routes")

Warning message:
"Ignoring unknown aesthetics: x, y"
In [35]:
ggplotly(map_aus_plot)

Error in mutate_impl(.data, dots): Column `id` can't be modified because it's a grouping variable
Traceback:

1. ggplotly(map_aus_plot)
2. ggplotly.ggplot(map_aus_plot)
3. gg2list(p, width = width, height = height, tooltip = tooltip,
. dynamicTicks = dynamicTicks, layerData = layerData, originalData = originalData,
. source = source, ...)
4. Map(function(x, y) {
. if (is.null(y[["group"]]))
. return(x)
. dplyr::group_by_(x, y[["group"]])
. }, return_dat, mappingFormulas)
5. mapply(FUN = f, ..., SIMPLIFY = FALSE)
6. (function (x, y)
. {
. if (is.null(y[["group"]]))
. return(x)
. dplyr::group_by_(x, y[["group"]])
. })(dots[[1L]][[3L]], dots[[2L]][[3L]])
7. dplyr::group_by_(x, y[["group"]])
8. group_by_.data.frame(x, y[["group"]])
9. group_by(.data, !!!dots, add = add)
10. group_by.data.frame(.data, !!!dots, add = add)
11. group_by_prepare(.data, ..., add = add)
12. add_computed_columns(.data, new_groups)
13. mutate(.data, !!!mutate_vars)
14. mutate.tbl_df(.data, !!!mutate_vars)
15. mutate_impl(.data, dots)
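
The error message suggests that airport.2 is still a grouped data frame when it reaches ggplotly(): summarise() drops only the last grouping level, so id remains a grouping variable. A possible fix, not confirmed against the author's environment, is to call ungroup() after summarise(). The sketch below shows the behaviour on toy data with illustrative values.

```r
library(dplyr)

# Toy data: two rows per id, values are illustrative only
toy <- data.frame(
  id = c("1", "1", "2", "2"),
  City = c("Perth", "Perth", "Sydney", "Sydney"),
  empty_seat = c(10, 20, 30, 40)
)

# summarise() removes only the last grouping variable (City)
grouped <- toy %>%
  group_by(id, City) %>%
  summarise(empty_seat = sum(empty_seat))

group_vars(grouped)    # still grouped by "id"

# ungroup() returns a plain tibble with no grouping variables
ungrouped <- grouped %>% ungroup()
group_vars(ungrouped)  # character(0)
```

Applied to the notebook, this would mean appending `%>% ungroup()` to the pipeline that builds airport.2 before rebuilding the plot.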

In [34]:
map_aus_plot
