
TIME SERIES ANALYSIS

data("AirPassengers") # load the built-in monthly airline passenger data (1949-1960)

mydata<-AirPassengers # work on a copy of the dataset

str(AirPassengers) # inspect the structure: a monthly time series of 144 observations

head(AirPassengers) # view the first few observations

datasets::AirPassengers # another way to refer to the same built-in dataset
1. Using the ts() function (how to convert data into a time series)

ts(mydata, frequency = 12, start=c(1949,1)) # frequency = 12 because the data are monthly, starting in January 1949; this call also checks that the data are suitable for a time series

attributes(mydata) # examine the attributes of the data; the class confirms it is time-series ("ts") data

or class(mydata) # also shows that the data are of class "ts"
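The attributes of the converted series can also be checked directly with base R helper functions (a minimal sketch; nothing here is specific to this dataset):

frequency(mydata) # 12 observations per year, i.e. monthly data
start(mydata) # first observation: January 1949
end(mydata) # last observation: December 1960
cycle(mydata) # month position (1-12) of every observation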

2. plot(mydata) # trends in the data

There is an increasing trend, but the series is also seasonal: within each year the values rise and fall. The series is therefore non-stationary, so a log transformation is needed. The seasonal span is smaller in the early years and grows over time.

3. Log Transformation

mydata<-log(mydata) # log-transform to stabilise the variance

After the transformation the span (seasonal swing) is roughly constant across years.
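To see the effect of the transformation, the original and log-transformed series can be plotted one above the other (a minimal sketch using only base R graphics):

par(mfrow=c(2,1)) # stack two plots vertically
plot(AirPassengers, main="Original series") # the seasonal swing grows with the level
plot(log(AirPassengers), main="Log-transformed series") # the swing is roughly constant
par(mfrow=c(1,1)) # reset the plotting layout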


4. Decomposition of Additive Time Series

decomp<-decompose(mydata) # split the series into trend, seasonal and random components

decomp$figure # estimated seasonal indices, one per month

plot(decomp$figure, type="b", xlab="month", ylab="seasonality index", col="red", las=2) # seasonal pattern across months

The seasonal trends become more apparent: June, July, August and September carry the highest passenger traffic, with July and August showing the highest seasonality index (about 0.2, i.e. roughly 20% above the average level).

5. plot(decomp) # used to identify the prominent characteristics of the time series

The following features can be seen (a sketch for inspecting each component separately follows the list):

- Seasonality
- An increasing trend
- Random variation within the seasons
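Each component returned by decompose() can also be inspected on its own (a minimal sketch; decomp is the object created above):

plot(decomp$trend, main="Trend component") # smoothed long-run movement
plot(decomp$seasonal, main="Seasonal component") # repeating yearly pattern
plot(decomp$random, main="Random component") # what remains after removing trend and seasonality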
TIME SERIES FORECASTING WITH ARIMA (AUTOREGRESSIVE INTEGRATED MOVING AVERAGE)

install.packages("forecast") # install the forecast package (only needed once)

library(forecast)

model1<-auto.arima(mydata) # automatically select the best-fitting ARIMA model

model1 # print the fitted model

ARIMA(0,1,1)(0,1,1)[12]

In this notation:

0 is p, the AR (autoregressive) order
1 is d, the degree of differencing
1 is q, the MA (moving average) order

The second triple (0,1,1)[12] gives the corresponding seasonal orders (P, D, Q) with a seasonal period of 12 months.

AIC=-483.4   AICc=-483.21   BIC=-474.77 (these criteria help in choosing the best time-series model; a comparison sketch follows below)

AIC - Akaike information criterion
AICc - AIC with a small-sample correction
BIC - Bayesian information criterion
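These criteria can be used to compare candidate models fitted by hand with forecast::Arima() against the auto.arima() choice (a minimal sketch; the alternative order in fit2 is only an illustrative assumption, not a recommendation):

library(forecast)
fit1 <- Arima(mydata, order=c(0,1,1), seasonal=c(0,1,1)) # the model auto.arima selected, specified explicitly
fit2 <- Arima(mydata, order=c(1,1,0), seasonal=c(0,1,1)) # an alternative candidate, for comparison only
AIC(fit1); AIC(fit2) # lower values indicate a better trade-off of fit and complexity
BIC(fit1); BIC(fit2)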

attributes(model1) # list the components stored in the fitted model

model1$coef # estimated model coefficients
ACF and PACF Plots

acf(model1$residuals, main="Correlogram") # the residual autocorrelations stay within the dotted significance limits

pacf(model1$residuals, main="Partial Correlogram") # one spike touches the lower boundary; to verify this, use the Ljung-Box test

Box.test(model1$residuals, lag=20, type="Ljung-Box")

data: model1$residuals
X-squared = 17.688, df = 20, p-value = 0.6079

As the p-value is greater than 0.05, there is little evidence of non-zero autocorrelation in the residuals.
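Recent versions of the forecast package also bundle these residual diagnostics, including the Ljung-Box test, into a single call (a minimal sketch):

checkresiduals(model1) # residual time plot, ACF, histogram and Ljung-Box test in one step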

Residual Plots

hist(model1$residuals, col="pink", xlab="error", main="Histogram of residuals", freq=FALSE)


lines(density(model1$residuals)) # the density curve looks approximately normal

forecast1<-forecast(model1, h=48) # forecast the next 48 months

library(ggplot2)

autoplot(forecast1) # plot the series with the forecast for the next 48 months highlighted
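Because the model was fitted to the log-transformed series, the forecasts are on the log scale; a simple back-transformation with exp() (ignoring any bias correction for the mean) maps them back to passenger counts:

exp(forecast1$mean) # point forecasts on the original passenger scale
exp(forecast1$lower) # lower prediction-interval bounds on the original scale
exp(forecast1$upper) # upper prediction-interval bounds on the original scale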


Accuracy of Model1

accuracy(model1)

             ME            RMSE        MAE         MPE         MAPE       MASE       ACF1
Training set 0.0005730622  0.03504883  0.02626034  0.01098898  0.4752815  0.2169522  0.01443892
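accuracy(model1) reports training-set errors only. A rough out-of-sample check can be made by refitting on an earlier window and comparing forecasts with the held-back observations (a minimal sketch; the 1958 split point is an arbitrary assumption):

library(forecast)
train <- window(mydata, end=c(1958,12)) # keep 1949-1958 for fitting
test <- window(mydata, start=c(1959,1)) # hold back 1959-1960 as a test set
model_train <- auto.arima(train) # refit on the training window only
fc <- forecast(model_train, h=length(test)) # forecast the held-back 24 months
accuracy(fc, test) # training and test errors side by side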
