
EXAM1 - Muhibbul Arman Mannan

01/11/2023

INSTRUCTIONS: INCLUDE CODES AS WELL AS ANSWERS/COMMENTS IN THE R MARKDOWN
UNDER THE CODE CHUNK. DON’T DELETE THE ** AND PLACE YOUR ANSWERS BETWEEN
THEM.
Example: ANSWER/COMMENT: THIS IS THE ANSWER

remove(list=ls())
library(fpp3)

## Warning: package ’fpp3’ was built under R version 4.2.3

## -- Attaching packages ---------------------------------------------- fpp3 0.5 --
## v tibble 3.2.1 v tsibble 1.1.3
## v dplyr 1.1.1 v tsibbledata 0.4.1
## v tidyr 1.3.0 v feasts 0.3.1
## v lubridate 1.9.2 v fable 0.3.3
## v ggplot2 3.4.2 v fabletools 0.3.3

## Warning: package ’tibble’ was built under R version 4.2.3

## Warning: package ’dplyr’ was built under R version 4.2.3

## Warning: package ’tidyr’ was built under R version 4.2.2

## Warning: package ’lubridate’ was built under R version 4.2.2

## Warning: package ’ggplot2’ was built under R version 4.2.3

## Warning: package ’tsibble’ was built under R version 4.2.3

## Warning: package ’tsibbledata’ was built under R version 4.2.3

## Warning: package ’feasts’ was built under R version 4.2.3

## Warning: package ’fabletools’ was built under R version 4.2.3

## Warning: package ’fable’ was built under R version 4.2.3

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()

library(tsibble) # do not forget to load the library, even for data.

We will be using the GDP information in the global_economy dataset.

View(global_economy)

PART 1

a) Plot the GDP per capita over time for 3 countries of your choice. Which countries did you
choose? AND How has GDP per capita changed over time for these 3 countries?

ANSWER/COMMENT: I selected three countries: the United States, Canada and
Australia. GDP per capita has increased consistently over time in all three,
which indicates an upward trend.

# ISO codes of the three chosen countries
my_choice <- c("USA", "CAN", "AUS")

global_economy %>%
  filter(Code %in% my_choice) %>%
  # GDP / Population gives GDP per capita
  ggplot(aes(x = Year, y = GDP / Population, color = Code)) +
  geom_line() +
  labs(
    title = "GDP per capita over time",
    y = "GDP per capita",
    x = "Year"
  )

[Figure: line plot of GDP / Population against Year (axis ticks 1960, 1980, 2000; y-axis 0 to 60000), one line each for AUS, CAN and USA (legend: Code)]

b) Out of ALL the countries in the dataset, which one had the highest GDP per capita in
2017? Filter and mutate the data as needed.

ANSWER/COMMENT: Luxembourg had the highest GDP per capita in 2017, at about
104,103 (GDP / Population).

highest_gdp <- global_economy %>%
  filter(Year == 2017) %>%
  mutate(GdpPerPoppulation = GDP / Population) %>%
  arrange(desc(GdpPerPoppulation)) %>%
  head(1)

highest_gdp

## # A tsibble: 1 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Popul~1 GdpPe~2
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxembourg LUX 2017 6.24e10 2.30 111. 194. 230. 599449 104103.
## # ... with abbreviated variable names 1: Population, 2: GdpPerPoppulation

PART 2.

Use the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 –
February 2005).

a) Plot Volume using autoplot, gg_subseries, gg_season to look at the effect of changing
seasonality over time. What do you observe with the seasonality?

ANSWER/COMMENT: The Canadian gas data shows an upward trend and strong
seasonality, with output lower in summer and higher in winter. The seasonal
swings grew markedly between about 1975 and 1990, as the gap between summer
and winter output widened.

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Monthly gas production of Canada",
       subtitle = "autoplot()",
       y = "bcm")

[Figure: autoplot() time plot of monthly Canadian gas production; x-axis: Month [1M], ticks from 1960 Jan to 2000 Jan; y-axis: bcm]

canadian_gas %>%
  gg_subseries(Volume) +
  labs(title = "Monthly gas production of Canada",
       subtitle = "gg_subseries()",
       y = "bcm")

[Figure: gg_subseries() plot of monthly Canadian gas production; one panel per month (Jan to Dec), each showing 1960 to 2000; x-axis: Month; y-axis: bcm]

canadian_gas %>%
  gg_season(Volume) +
  labs(title = "Monthly Gas Production of Canada",
       subtitle = "gg_season()",
       y = "bcm")

[Figure: "Monthly Gas Production of Canada", gg_season(); x-axis: Month (Jan to Dec); y-axis: bcm; one line per year, coloured by year (legend ticks 1969, 1979, 1989, 1999)]
b) Do an STL decomposition of the data. You will need to choose a seasonal window to allow
for the changing shape of the seasonal component

canadian_gas %>%
  model(
    STL(Volume ~ trend(window = 21) +
          season(window = 13),
        robust = TRUE)) %>%
  components() %>%
  autoplot() +
  labs(title = "decomposition of canadian gas production")

[Figure: "decomposition of canadian gas production", STL decomposition with Volume = trend + season_year + remainder; panels: Volume, trend, season_year, remainder; x-axis: Month, ticks from 1960 Jan to 2000 Jan]

c) How does seasonal SHAPE change over time? Plot season_year using gg_season().

ANSWER/COMMENT: The seasonal shape of gas production changes over time. It is
nearly flat at the start in 1960 and becomes more pronounced from about 1975,
so the seasonal pattern strengthens slowly over the sample.

canadian_gas %>%
gg_season(Volume)+
labs(title = "Monthly Gas Production of Canada",
subtitle = "gg_season()",
y = "bcm")

[Figure: "Monthly Gas Production of Canada", gg_season(); x-axis: Month (Jan to Dec); y-axis: bcm; one line per year, coloured by year (legend ticks 1969, 1979, 1989, 1999)]
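The chunk above applies gg_season() to Volume, while the question asks for the season_year component. A minimal sketch of that plot, assuming the same STL specification as in part (b) (the window values are the choices made there; the object names stl_components and season_plot are hypothetical):

```r
library(fpp3)

# Same STL specification as in part (b); window values are the earlier choices
stl_components <- canadian_gas %>%
  model(STL(Volume ~ trend(window = 21) + season(window = 13),
            robust = TRUE)) %>%
  components()

# Plot only the estimated seasonal component, one line per year
season_plot <- stl_components %>%
  gg_season(season_year) +
  labs(title = "Seasonal component of Canadian gas production",
       subtitle = "gg_season() on season_year",
       y = "bcm")

season_plot
```

Because the trend has been removed, the lines for different years overlap around zero, which makes changes in the seasonal shape easier to compare than in the plot of Volume.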

d) Can you produce a plausible seasonally adjusted series?

ANSWER/COMMENT: INSERT ANSWER HERE

ad_plot <- canadian_gas %>%
  model(
    STL(Volume ~ trend(window = 21) +
          season(window = 13),
        robust = TRUE)
  ) %>%
  components() %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Volume, colour = "original data")) +
  geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted data")) +
  geom_line(aes(y = trend, colour = "Trend Component")) +
  labs(title = "seasonally adjusted series")

ad_plot

[Figure: "seasonally adjusted series"; lines: original data, Seasonally Adjusted data, Trend Component (legend: colour); x-axis: Month, ticks from 1960 Jan to 2000 Jan; y-axis: Volume]

PART 3

Aus Retail Time Series. We will use the aus_retail dataset. Using the code below, get a series (the
sample() function picks one at random):

set.seed(1234567)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

head(myseries)

## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry Serie~1 Month Turno~2
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Apr 85.1
## 2 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 May 85.1
## 3 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Jun 82.8
## 4 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Jul 82.1
## 5 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Aug 81.8
## 6 Victoria Cafes, restaurants and takeaway food servic~ A33494~ 1982 Sep 84.6
## # ... with abbreviated variable names 1: ‘Series ID‘, 2: Turnover

# remove NAs from the series:
myseries <- myseries %>% filter(!is.na(`Series ID`))
nrow(myseries)

## [1] 441

# rename the column `Series ID` to MyRandomSeries
rename(myseries, MyRandomSeries = `Series ID`)

## # A tsibble: 441 x 5 [1M]
## # Key: State, Industry [1]
## State Industry MyRan~1 Month Turno~2
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Apr 85.1
## 2 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 May 85.1
## 3 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Jun 82.8
## 4 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Jul 82.1
## 5 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Aug 81.8
## 6 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Sep 84.6
## 7 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Oct 91.7
## 8 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Nov 97.7
## 9 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1982 Dec 109.
## 10 Victoria Cafes, restaurants and takeaway food servi~ A33494~ 1983 Jan 94.6
## # ... with 431 more rows, and abbreviated variable names 1: MyRandomSeries,
## # 2: Turnover

a) Run a linear regression of Turnover on its trend. (Hint: use the TSLM() and trend() functions.)

my_model <- myseries %>%
  model(TSLM(Turnover ~ trend()))

report(my_model)

## Series: Turnover
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -125.471 -50.951 -9.889 48.598 242.364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.529 6.121 -3.844 0.000139 ***
## trend() 1.921 0.024 80.057 < 2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 64.17 on 439 degrees of freedom
## Multiple R-squared: 0.9359, Adjusted R-squared: 0.9357
## F-statistic: 6409 on 1 and 439 DF, p-value: < 2.22e-16

b) Forecast for the next 3 years. What are the values for the next 3 years? Are they monthly values?

ANSWER/COMMENT: INSERT ANSWER HERE

forecast_result <- forecast(my_model, h = 36)
forecast_result

## # A fable: 36 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jan N(826, 4155) 826.
## 2 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Feb N(828, 4155) 828.
## 3 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Mar N(830, 4155) 830.
## 4 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Apr N(832, 4155) 832.
## 5 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 May N(833, 4156) 833.
## 6 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jun N(835, 4156) 835.
## 7 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Jul N(837, 4156) 837.
## 8 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Aug N(839, 4156) 839.
## 9 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Sep N(841, 4157) 841.
## 10 Victoria Cafes, restaurants and takeaway ~ TSLM(~ 2019 Oct N(843, 4157) 843.
## # ... with 26 more rows
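The point forecasts above can be checked by hand from the regression output. The sketch below assumes that trend() counts observations as 1, 2, ..., 441, so the first forecast month is time index 442. It uses the rounded coefficients printed by report(), so the results agree with the .mean column only approximately. The variable names are hypothetical.

```r
# Rounded coefficients copied from the report() output in part (a)
intercept <- -23.529
slope <- 1.921

n_obs <- 441  # number of monthly observations in myseries
h <- 36       # 3 years of monthly forecasts

# Assumed time index of each forecast month: 442, 443, ..., 477
trend_index <- n_obs + seq_len(h)

# Linear-trend point forecasts: intercept + slope * t
point_forecast <- intercept + slope * trend_index

# First few hand-computed values, to compare with the .mean column
head(round(point_forecast, 1))
```

Each step adds the slope (about 1.92) to the previous value, which is why the forecasts rise by roughly 2 per month and form 36 monthly values rather than 3 annual ones.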

c) Autoplot the forecast with original data

myseries %>%
autoplot(Turnover) +
autolayer(forecast_result, series = "Turnover") +
labs(title = "TS plot with Forecast", x = "Time", y = "Turnover")

## Warning in distributional::geom_hilo_ribbon(intvl_mapping, data =
## dplyr::anti_join(interval_data, : Ignoring unknown parameters: ‘series‘

## Warning in distributional::geom_hilo_linerange(intvl_mapping, data =
## dplyr::semi_join(interval_data, : Ignoring unknown parameters: ‘series‘

## Warning in geom_line(mapping = mapping, data = dplyr::anti_join(object, :
## Ignoring unknown parameters: ‘series‘

## Warning in ggplot2::geom_point(mapping = mapping, data =
## dplyr::semi_join(object, : Ignoring unknown parameters: ‘series‘

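[Figure: "TS plot with Forecast"; historical Turnover followed by the forecasts with 80% and 95% prediction intervals (legend: level); x-axis: Time, ticks from 1990 Jan to 2020 Jan; y-axis: Turnover]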
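The warnings above arise because autolayer() passes the unknown series argument on to the underlying geoms, which ignore it. One way to avoid them is to call autoplot() on the fable and supply the historical data as its second argument. This is a sketch, not the only option; it rebuilds myseries and the forecast so that it runs on its own, and the name forecast_plot is hypothetical.

```r
library(fpp3)

# Recreate the series and forecast from the earlier chunks
set.seed(1234567)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

forecast_result <- myseries %>%
  model(TSLM(Turnover ~ trend())) %>%
  forecast(h = 36)

# autoplot() on a fable draws the forecasts; passing the original
# tsibble adds the historical observations to the same plot
forecast_plot <- forecast_result %>%
  autoplot(myseries) +
  labs(title = "TS plot with Forecast", x = "Time", y = "Turnover")

forecast_plot
```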

d) Get the residuals, does it satisfy requirements for white noise error terms? Hint:
gg_tsresiduals()

ANSWER/COMMENT: The first panel shows the residuals of the damped model, and
they do not look like white noise. The ACF plot also shows spikes beyond the
5% significance bounds, which is evidence against white-noise error terms.

residual_diagnostic_model <- myseries %>%
  model(MAdM = ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
residual_diagnostic_model %>% gg_tsresiduals() + labs(title = "Residual diagnostics")

[Figure: "Residual diagnostics" from gg_tsresiduals(); panels: innovation residuals against Month (ticks from 1990 Jan to 2020 Jan), ACF of the residuals against lag [1M] (lags 6 to 24), histogram of .resid (y-axis: count)]
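The diagnostics above come from an ETS model, while the question refers to the linear regression fitted in part (a). A minimal sketch of the same check for that TSLM model follows, with a Ljung-Box test added as a numerical complement to the ACF plot. The choice of lag = 24 is an assumption (two seasonal periods of monthly data), and the names resid_plot and lb_test are hypothetical.

```r
library(fpp3)

# Recreate the series and the linear-trend model from part (a)
set.seed(1234567)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

my_model <- myseries %>%
  model(TSLM(Turnover ~ trend()))

# Time plot, ACF and histogram of the regression residuals
resid_plot <- my_model %>% gg_tsresiduals()
resid_plot

# Ljung-Box test on the innovation residuals; lag = 24 is an assumed choice
lb_test <- augment(my_model) %>%
  features(.innov, ljung_box, lag = 24)
lb_test
```

A small lb_pvalue would indicate autocorrelation remaining in the residuals, meaning they are not consistent with white noise.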
