01/11/1023
remove(list=ls())
library(fpp3)
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
View(global_economy)
PART 1
a) Plot the GDP per capita over time for 3 countries of your choice. Which countries did you
choose? AND How has GDP per capita changed over time for these 3 countries?
ANSWER/COMMENT: I selected three countries: the United States, Canada and Australia. GDP per
capita has increased consistently over time in all three, which indicates an upward trend.
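# Country codes of the three chosen countries (United States, Canada, Australia)
my_choice <- c("USA", "CAN", "AUS")
global_economy %>%
  filter(Code %in% my_choice) %>%
  as_tsibble(key = Code, index = Year) %>%
  # GDP / Population gives GDP per capita
  ggplot(aes(x = Year, y = GDP / Population, color = Code)) +
  geom_line() +
  labs(
    title = "GDP Over Time",
    y = "GDP",
    x = "Year"
  )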
[Figure: "GDP Over Time"; line plot of GDP per capita (y-axis labelled GDP) against Year, coloured by Code: AUS, CAN, USA.]
b) Out of ALL the countries in the dataset, which one had the highest GDP per capita in
2017? Filter and mutate the data as needed.
highest_gdp
## # A tsibble: 1 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Popul~1 GdpPe~2
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxembourg LUX 2017 6.24e10 2.30 111. 194. 230. 599449 104103.
## # ... with abbreviated variable names 1: Population, 2: GdpPerPoppulation
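The code that built highest_gdp is not shown above. A minimal sketch of one way to obtain it, assuming the per-capita column is created with mutate() under the name GdpPerPoppulation that appears in the printed output (spelling kept as printed):

```r
library(fpp3)

# Keep 2017 only, compute GDP per capita, then keep the row with the maximum.
# The column name GdpPerPoppulation is copied from the printed output; the
# steps themselves are an assumption about how highest_gdp was produced.
highest_gdp <- global_economy %>%
  filter(Year == 2017) %>%
  mutate(GdpPerPoppulation = GDP / Population) %>%
  filter(GdpPerPoppulation == max(GdpPerPoppulation, na.rm = TRUE))

highest_gdp
```

Rows with a missing GDP or Population give NA in the comparison, and filter() drops them, so only countries with data for 2017 can be returned.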
PART 2
Use the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 –
February 2005).
a) Plot Volume using autoplot, gg_subseries, gg_season to look at the effect of changing
seasonality over time. What do you observe with the seasonality?
ANSWER/COMMENT: The Canadian gas data show an upward trend and strong seasonality, with
output lower in summer and higher in winter. The seasonal swings grew markedly between 1975
and 1990, when the gap between summer and winter output widened.
canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Gas production of canada by monthly",
       subtitle = "autoplot()",
       y = " bcm") +
  theme_replace() +
  geom_line(col = "black")
[Figure: autoplot() line plot of Volume (bcm) against Month [1M].]
canadian_gas %>%
  gg_subseries(Volume) +
  labs(title = "Gas production of canada by monthly",
       subtitle = "gg_subseries()",
       y = " bcm")
[Figure: "Gas production of canada by monthly", gg_subseries(); one panel per month (Jan to Dec), Volume in bcm.]
canadian_gas %>%
  gg_season(Volume) +
  labs(title = "Monthly Gas Production of Canada",
       subtitle = "gg_season()",
       y = "bcm")
[Figure: "Monthly Gas Production of Canada", gg_season(); Volume in bcm by Month (Jan to Dec), one line per year.]
b) Do an STL decomposition of the data. You will need to choose a seasonal window to allow
for the changing shape of the seasonal component.
canadian_gas %>%
  model(
    STL(Volume ~ trend(window = 21) +
          season(window = 13),
        robust = TRUE)) %>%
  components() %>%
  autoplot() +
  labs(title = "decomposition of canadian gas production")
[Figure: "decomposition of canadian gas production"; STL panels for Volume = trend + season_year + remainder, each plotted against Month.]
c) How does seasonal SHAPE change over time? Plot season_year using gg_season().
ANSWER/COMMENT: The seasonal shape of gas production changes over time. It is almost flat at
the start in 1960 and becomes more pronounced by about 1975, which indicates a slow increase
in seasonal variation.
canadian_gas %>%
  gg_season(Volume) +
  labs(title = "Monthly Gas Production of Canada",
       subtitle = "gg_season()",
       y = "bcm")
[Figure: "Monthly Gas Production of Canada", gg_season(); Volume in bcm by Month (Jan to Dec), one line per year.]
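Part c) asks for season_year, the seasonal component of the STL fit, whereas the gg_season() call in this part plots Volume. A sketch of plotting the component itself, assuming the same STL specification as in part b) (the object name gas_stl and the plot title are illustrative, not from the original):

```r
library(fpp3)

# Refit the STL decomposition from part b) and extract its components.
gas_stl <- canadian_gas %>%
  model(STL(Volume ~ trend(window = 21) + season(window = 13), robust = TRUE)) %>%
  components()

# Seasonal plot of the seasonal component only, one line per year.
gas_stl %>%
  gg_season(season_year) +
  labs(title = "Seasonal component of Canadian gas production",
       y = "season_year")
```

Because the trend is removed, changes in the shape of the lines from year to year reflect changes in seasonality alone.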
ad_plot
[Figure: "seasonally adjusted series"; Volume against Month, with colour legend: original data, Seasonally Adjusted data, Trend Component.]
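The code behind ad_plot is not shown. One possible way to draw such a plot, sketched under the assumption that it overlays the original series, the season_adjust column and the trend column returned by components() for the STL model of part b) (the names gas_components and ad_plot_sketch are hypothetical):

```r
library(fpp3)

# STL components: Volume, trend, season_year, remainder, season_adjust.
gas_components <- canadian_gas %>%
  model(STL(Volume ~ trend(window = 21) + season(window = 13), robust = TRUE)) %>%
  components()

# Overlay the three series; the legend labels are copied from the figure.
ad_plot_sketch <- gas_components %>%
  as_tsibble() %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Volume, colour = "original data")) +
  geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted data")) +
  geom_line(aes(y = trend, colour = "Trend Component")) +
  labs(title = "seasonally adjusted series", y = "Volume")

ad_plot_sketch
```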
PART 3
Aus Retail Time Series. We will use the aus_retail dataset. Using the code below, get a series (it is
chosen randomly with the sample() function):
set.seed(1234567)
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry Serie~1 Month Turno~2
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Apr 85.1
## 2 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 May 85.1
## 3 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Jun 82.8
## 4 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Jul 82.1
## 5 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Aug 81.8
## 6 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Sep 84.6
## # ... with abbreviated variable names 1: ‘Series ID‘, 2: Turnover
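Only set.seed() is shown before head(myseries), so the step that selects the series is missing. A sketch of the usual pattern for drawing one series at random, given as an assumption about how myseries was created; whether it reproduces the Victoria series printed above depends on the R version and its random number generator:

```r
library(fpp3)

set.seed(1234567)
# Draw one Series ID at random and keep only the rows of that series.
# This selection step is an assumption; the report shows only set.seed().
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

head(myseries)
```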
# remove NAs in the series with the line below:
myseries = myseries %>% filter(!is.na(`Series ID`))
nrow(myseries)
## [1] 441
a) Run a linear regression of Turnover on its trend. (Hint: use the TSLM() and trend() functions.)
report(my_model)
## Series: Turnover
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -125.471 -50.951 -9.889 48.598 242.364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.529 6.121 -3.844 0.000139 ***
## trend() 1.921 0.024 80.057 < 2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 64.17 on 439 degrees of freedom
## Multiple R-squared: 0.9359, Adjusted R-squared: 0.9357
## F-statistic: 6409 on 1 and 439 DF, p-value: < 2.22e-16
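The definition of my_model is not shown. A minimal sketch of a fit consistent with the report above, assuming the model is left unnamed inside model() and that myseries is the Victoria series printed by head(myseries) (selected here by the visible start of its truncated Industry name):

```r
library(fpp3)

# Select the series shown in the head(myseries) output by State and by the
# visible (truncated) start of the Industry name.
myseries <- aus_retail %>%
  filter(State == "Victoria",
         startsWith(Industry, "Cafes, restaurants and takeaway food servic"))

# Regress Turnover on a linear time trend.
my_model <- myseries %>%
  model(TSLM(Turnover ~ trend()))

report(my_model)
```

The trend() coefficient is the estimated change in Turnover per month, since the index of the series is monthly.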
b) Forecast for the next 3 years. What are the forecast values for the next 3 years? Are they monthly values?
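ANSWER/COMMENT: The forecasts are monthly values: 36 of them, from 2019 Jan onward. The point
forecast is about 826 in 2019 Jan and rises by roughly 1.9 each month, in line with the trend()
coefficient of 1.921.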
## # A fable: 36 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jan N(826, 4155) 826.
## 2 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Feb N(828, 4155) 828.
## 3 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Mar N(830, 4155) 830.
## 4 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Apr N(832, 4155) 832.
## 5 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 May N(833, 4156) 833.
## 6 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jun N(835, 4156) 835.
## 7 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jul N(837, 4156) 837.
## 8 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Aug N(839, 4156) 839.
## 9 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Sep N(841, 4157) 841.
## 10 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Oct N(843, 4157) 843.
## # ... with 26 more rows
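The call that produced forecast_result is not shown. A sketch of how a 36-row fable like the one above could be obtained, assuming the trend-only TSLM from part a) and a horizon given as "3 years" (h = 36 would be equivalent for monthly data):

```r
library(fpp3)

# Same series and model as in part a); the selection by truncated Industry
# name is an assumption based on the head(myseries) output.
myseries <- aus_retail %>%
  filter(State == "Victoria",
         startsWith(Industry, "Cafes, restaurants and takeaway food servic"))

my_model <- myseries %>%
  model(TSLM(Turnover ~ trend()))

# Forecast three years ahead; the monthly index yields 36 monthly forecasts.
forecast_result <- my_model %>%
  forecast(h = "3 years")

forecast_result
```

Each row holds a forecast distribution in the Turnover column and its point forecast in .mean.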
myseries %>%
  autoplot(Turnover) +
  autolayer(forecast_result, series = "Turnover") +
  labs(title = "TS plot with Forecast", x = "Time", y = "Turnover")
[Figure: "TS plot with Forecast"; Turnover with the forecast and its 80% and 95% prediction-interval levels.]
d) Get the residuals. Do they satisfy the requirements for white noise error terms? Hint:
gg_tsresiduals()
ANSWER/COMMENT: The first panel is the residual plot for the damped model. In the ACF panel,
spikes that exceed the 5% significance bounds indicate remaining autocorrelation, so the residuals
do not fully satisfy the requirements for white noise.
[Figure: "Residual diagnostics" from gg_tsresiduals(); panels: innovation residuals over time, ACF against lag [1M], and histogram of .resid (count).]
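The answer refers to a damped model whose code is not shown. As an illustration only, the same diagnostics can be sketched for the trend-only TSLM from part a); the Ljung-Box test and the choice of lag = 24 (two seasonal periods for monthly data) are assumptions added here, not part of the original answer:

```r
library(fpp3)

# Same series and model as in part a); the selection by truncated Industry
# name is an assumption based on the head(myseries) output.
myseries <- aus_retail %>%
  filter(State == "Victoria",
         startsWith(Industry, "Cafes, restaurants and takeaway food servic"))

my_model <- myseries %>%
  model(TSLM(Turnover ~ trend()))

# Time plot, ACF and histogram of the innovation residuals.
my_model %>% gg_tsresiduals()

# Ljung-Box test on the innovation residuals: a small p-value suggests the
# residuals are autocorrelated and therefore not white noise.
lb <- augment(my_model) %>%
  features(.innov, ljung_box, lag = 24)

lb
```

For white noise, the ACF spikes should mostly stay inside the dashed bounds and the Ljung-Box p-value should not be small.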