
TIME SERIES

Project Submitted to

The Controller of Examinations,

St Berchmans College (Autonomous), Changanassery,

in partial fulfillment of the requirements for the award of the Bachelor's Degree in Mathematics

Submitted By:

ALEN SUNNY

Reg. No: 12100003

Under the guidance of

MS. DEVI N

Assistant Professor

Department of Mathematics

St. Berchmans College (Autonomous), Changanassery

DEPARTMENT OF MATHEMATICS
ST. BERCHMANS COLLEGE, CHANGANASSERY
2021-2024
CERTIFICATE

This is to certify that Mr. ALEN SUNNY has undergone the Bachelor of Science in Mathematics

course at St. Berchmans College, Changanassery, during the period 2021-2024 and has undertaken

the dissertation under the guidance of Ms. Devi N, Assistant Professor, Department of

Mathematics, St. Berchmans College (Autonomous), Changanassery. He is permitted to submit

this dissertation to the Controller of Examinations of the College.

Changanassery

19/02/2024                                Fr. John J Chavara

Assistant Professor and Head

Department of Mathematics
CERTIFICATE

This is to certify that this project entitled TIME SERIES is a record of bonafide project work

done by ALEN SUNNY (Reg. No: 12100003) under my guidance and supervision, in partial

fulfilment of the requirements for the award of the Bachelor of Science Degree in Mathematics, and that

this project has not been previously submitted for the award of any Degree, Diploma, Fellowship,

Title or Recognition.

MS. DEVI N                                Fr. John J Chavara

Assistant Professor Assistant Professor and Head

Department of Mathematics Department of Mathematics

Changanassery

19 / 02 / 2024
DECLARATION

I, ALEN SUNNY (Reg. No: 12100003, 2021-2024), do hereby declare that the

dissertation entitled TIME SERIES is a bonafide record of project work done by me under

the guidance and supervision of Ms. DEVI N , Assistant Professor, Department of Mathematics,

St Berchmans College (Autonomous), Changanassery, and that this dissertation or any part

thereof has not previously formed the basis for the award of any degree, diploma, associateship,

fellowship or any other similar title of any University or Institution.

Changanassery ALEN SUNNY

19 / 02 / 2024
ACKNOWLEDGEMENT

First and foremost, praises and thanks to the Almighty God for guiding me to complete this project

successfully.

I take this opportunity to express my profound gratitude and deep regards to my guide

Ms. DEVI N, Assistant Professor, Department of Mathematics, for her exemplary guidance,

monitoring and constant encouragement throughout the course of this project. The blessing, help

and guidance given by her from time to time shall carry me a long way in the journey of life on which

I am about to embark.

I would like to thank Fr. John J Chavara, Assistant Professor and Head of the Department of

Mathematics for his kind hearted support.

I may add that I am indebted to my family for their valuable encouragement. I express my sincere

thanks to all my colleagues and friends for their help and suggestions to improve this report.

Finally, my thanks go to all the people who have supported me to complete the project work

directly or indirectly.

ALEN SUNNY
Contents

1 Basic Concepts of Time Series 7

1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Long Term Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.1 Secular Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.2 Cyclical Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Short-term Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Seasonal Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.2 Irregular Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Estimation of Trend 11
2.1 Method for Measurement of Secular Trend . . . . . . . . . . . . . . . . . . . . . . . 11

2.1.1 Freehand curve Method (Graphical Method) . . . . . . . . . . . . . . . . . 12

2.1.2 Method of selected points: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.3 Method of semi- average: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.4 Method of Least Squares: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2 Methods for Measurement of seasonal variations . . . . . . . . . . . . . . . . . . . 19

2.2.1 Method of Simple Average . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.2 Ratio To Trend Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2.3 Ratio to Moving Average Method . . . . . . . . . . . . . . . . . . . . . . . 22

2.3 Method of Cyclic Variation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4 Measurement of Irregular Component: . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Identifying Time series Models 26

3.1 Component of Time Series: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1.1 Additive Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.1.2 Multiplicative Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2 Forecasting related models of Time Series: . . . . . . . . . . . . . . . . . . . . . . 27

3.2.1 Exponential smoothing methods . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.2 Single – Equation Regression Models: . . . . . . . . . . . . . . . . . . . . . 28

3.2.3 ARIMA Models: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 FITTING 34

4.1 Fitting ARIMA models : The Box-Jenkins approach . . . . . . . . . . . . . . . . . 34

4.1.1 Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.3 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2 Application For Time Series: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.2.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
INTRODUCTION

Time series analysis serves as a powerful tool in understanding and deciphering patterns within

sequential data. In the first chapter, we delve into the fundamental definition of time series and

explore its various manifestations, unraveling the intricate variations that make it a captivating

field of study.

Moving forward, the second chapter focuses on the pivotal methods employed for measuring sec-

ular trends and deciphering seasonal variations within time series data. This chapter acts as a

crucial foundation for comprehending the nuances of temporal patterns inherent in the datasets

under scrutiny.

The third chapter takes a comprehensive approach, exploring the landscape of time series mod-

els. From additive to multiplicative models, and delving into the intricacies of Autoregressive

(AR), Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA), and more,

this section equips the reader with a diverse toolkit for analyzing and interpreting time-dependent

phenomena.

Our journey culminates in the fourth chapter, where the Box-Jenkins approach takes center stage.

This chapter unfolds the methodology behind fitting time series models, providing a systematic

and robust framework for practitioners to apply in real-world scenarios.

PRELIMINARIES

What is a Time Series? A time series refers to the values of a variable recorded in chronological

order over successive periods of time. Equivalently, it is a set of data collected and arranged in

order of time. The analysis of a time series means separating out the different components which

influence the values of the series.

Examples

• Weather Data

• Rainfall Measurements

• Stock Prices, etc.

Mean

It is the average of the given numbers, calculated by dividing the sum of the given numbers by

the total number of numbers.

Mean = Σx / n

where Σx is the sum of all the observations and n is the total number of observations.

Covariance

Covariance is a measure of how much two random variables change together. It provides insight

into the direction of the relationship between two variables: whether they tend to increase or

decrease together.

Cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)

• Xᵢ and Yᵢ are the individual data points,

• X̄ and Ȳ are the means of X and Y, respectively,

• n is the number of data points.

Method of least squares It is the process of finding the best-fitting curve or line of best fit for

a set of data points by minimizing the sum of the squares of the offsets (residuals) of the points

from the curve.

m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)

b = (Σy − m Σx) / n

y = mx + b

This method is used to fit a straight line of the form y = mx + b, where y and x are variables, m

is the slope, and b is the y-intercept.

Pearson correlation coefficient It is a number between −1 and 1 that measures the strength

and direction of the relationship between two variables.

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

Chapter 1

Basic Concepts of Time Series

1.1 Definition

Assume that the series X_t runs throughout time, that is (X_t), t = 0, ±1, ±2, . . ., but is only observed

at times t = 1, . . . , n, so we observe (X_1, . . . , X_n). Theoretical properties refer to the underlying

process (X_t), t ∈ Z.

The notations X_t and X(t) are interchangeable.

The theory of time series is based on the assumption of 'second-order stationarity' (a second-order

stationary time series has a constant mean, a constant variance and an autocovariance that does not

change with time).

The variations in the time series can be divided into two parts: long term variations

and short term variations.

1.2 Long Term Variation

Long-term variations refer to patterns or changes in a phenomenon that occur over extended peri-

ods of time, typically spanning years, decades, or even centuries. These variations are characterized

by their slow and gradual nature.

Long-term variations are divided into two :

1. Secular Trend

2. Cyclical Variation

1.2.1 Secular Trend

Secular trend shows the definite and basic tendency of the statistical data with the passage of

time. The data often shows a consistent upward or downward direction.

example

1. Changes in productivity

2. Increase in rate of capital

3. Growth of population

1.2.2 Cyclical Variation

Cyclical variations refer to recurring, periodic fluctuations that extend beyond a year. These

variations are often linked to economic or business cycles. Series relating to prices, production,

demand, etc. undergo cyclical variation.

examples

1. Stock Prices

2. Medical Data

3. Population growth etc . . .

1.3 Short-term Variations

Short-term variations refer to changes that occur over relatively brief periods of time, typically

ranging from seconds to months.

It is divided into two :

1. Seasonal Variations

2. Irregular Variations

1.3.1 Seasonal Variations

Seasonal Variations are those variations which occur with some degree of regularity within a

specific period of one year or shorter. These variations are associated with recurring events,

climate changes, holidays, etc.

examples

• In summers the sale of ice-cream increases

• During the Diwali season, the sale of crackers increases

1.3.2 Irregular Variations

Irregular variations, also known as random variations, are caused by unusual, unexpected and

accidental events like natural calamities. They are unpredictable, non-patterned fluctuations.

examples

1. Crime Rates

2. Transportation Trends

3. Energy Consumption etc. . .

1.4 Objective

To understand the underlying structure of a time series, represented by a sequence of observations,

by breaking it down into its components, and to fit a mathematical model and proceed to forecast

the future.

Chapter 2

Estimation of Trend

As we know, a time series consists of data arranged chronologically. In forecasting (an application of

time series) it is important to analyse the characteristic movement of the variations in the given time

series. The following methods serve as tools for this analysis:

2.1 Method for Measurement of Secular Trend

• Freehand curve Method (Graphical Method)

• Method of selected points

• Method of semi-average

• Method of Least Squares

2.1.1 Freehand curve Method (Graphical Method)

This is the simple method of studying trend. In this method the given time series data are plotted

on graph paper by taking time on X-axis and the other variable on Y-axis. The graph obtained will

be irregular as it would include short-run oscillations. We may observe the up and down movement

of the curve, and if a smooth freehand curve is drawn passing approximately through all the points of the curve

previously drawn, it would eliminate the short-run oscillations (seasonal, cyclical and irregular

variations) and show the long-period general tendency of the data. However, it is very difficult to

draw a freehand smooth curve and different persons are likely to draw different curves from the

same data. The following points must be kept in mind in drawing a freehand smooth curve:

1. The curve is smooth

2. The numbers of points above the line or curve are equal to the points below it.

3. The sum of vertical deviations of the points above the smoothed line is equal to the sum

of the vertical deviations of the points below the line. In this way the positive deviations

will cancel the negative deviations. These deviations are the effects of seasonal, cyclical and

irregular variations and by this process they are eliminated.

4. The sum of the squares of the vertical deviations from the trend line curve is minimum.

Example

The table below shows the sales data for nine years:

Year 1990 1991 1992 1993 1994 1995 1996 1997 1998
Sales (in Lakhs ) 65 95 115 63 120 100 150 135 172

If we draw a graph taking the year on the x-axis and sales on the y-axis, it will be irregular as shown

below. Drawing a freehand curve passing approximately through all these points will represent the

trend line (trend line = blue line).

Merits

1. It is a simple method of estimating trend which requires no mathematical calculations.

2. It is a flexible method as compared to rigid mathematical trends and therefore a better

representative of the trend of the data.

3. This method can be used even if the trend is not linear.

4. If the observations are relatively stable, the trend can easily be approximated by this method.

5. Being a non-mathematical method, it can be applied even by a common man.

Demerits

1. It is a subjective method. The values of trend obtained by different statisticians would be

different and hence not reliable.

2. Predictions made on the basis of this method are of little value.

2.1.2 Method of selected points:

In this method, two points considered to be the most representative or normal are joined by a

straight line to get the secular trend. This, again, is a subjective method since different persons may

have different opinions regarding the representative points. Further, only a linear trend can be

determined by this method.

2.1.3 Method of semi- average:

In this method, as the name itself suggests, semi-averages are calculated to find out the trend values.

By semi-averages is meant the averages of the two halves of a series. The given series is thus

divided into two equal parts (halves) and the arithmetic mean of the values of each

part (half) is calculated. The computed means are termed semi-averages. Each semi-average is

paired with the centre of the time period of its part. The two pairs are then plotted on a graph paper

and the points are joined by a straight line to get the trend. It should be noted that if the data are

for an even number of years, they can easily be divided into two halves. But if they are for an odd number of

years, we leave out the middle year of the time series and the two halves constitute the periods on each

side of the middle year.

Merits

1. It is a simple method of measuring trend.

2. It is an objective method because anyone applying it to the given data would get identical

trend values.

Demerits

1. This method can give only a linear trend of the data, irrespective of whether it exists or not.

2. This is only a crude method of measuring trend, since we do not know whether the effects

of the other components are completely eliminated or not.

Example

Fit a trend line by the method of semi averages for the given data

Year 2000 2001 2002 2003 2004 2005 2006


Production 105 115 120 100 110 125 135

Solution

Since the number of years is odd (seven), we will leave out the middle year's production value and

obtain the averages of the first three years and the last three years.

Year   Production   Semi-average

2000      105
2001      115       (105 + 115 + 120)/3 = 113.33
2002      120
2003      100       (left out)
2004      110
2005      125       (110 + 125 + 135)/3 = 123.33
2006      135

Figure 2.1: Trend line
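The semi-average computation in the solution above can be sketched in Python (illustrative code using the production figures from the worked solution):

```python
# Sketch of the semi-average method for an odd number of years: leave out
# the middle year and average each half, as in the example above.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
prod  = [105, 115, 120, 100, 110, 125, 135]

mid = len(prod) // 2                       # index of the left-out middle year
first_half  = prod[:mid]                   # [105, 115, 120]
second_half = prod[mid + 1:]               # [110, 125, 135]

avg1 = sum(first_half) / len(first_half)   # semi-average of the first half
avg2 = sum(second_half) / len(second_half) # semi-average of the second half
print(round(avg1, 2), round(avg2, 2))      # 113.33 123.33
```

Plotting the two semi-averages against the centres of their halves (2001 and 2005) and joining them gives the trend line of Figure 2.1.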

2.1.4 Method of Least Squares:

This is one of the most popular methods of fitting a mathematical trend. The fitted trend is

termed as the best in the sense that the sum of squares of deviations of observations, from it, is

minimized. The method of least squares may be used either to fit a linear trend or a nonlinear

trend (parabolic or exponential).

Fitting of Linear Trend:

Given the data (y_t, t) for n periods, where t denotes the time period such as year, month, day, etc.,

we have to find the values of the two constants, 'a' and 'b', of the linear trend equation:

y_t = a + bt

The value of 'a' is merely the y-intercept, or the height of the line above the origin; that is,

when t = 0, y_t = a. The other constant 'b' represents the slope of the trend line. When b is positive

the slope is upward, and when b is negative the slope is downward. This line is termed the
line of best fit because it is so fitted that the total of the squared deviations of the given data from

the line is minimum. The total is calculated by squaring the difference between the trend value

and the actual value of the variable; thus the term "least squares" is attached to this method.

Using the least squares method, the normal equations for obtaining the values of a and b are:

Σy_t = na + b Σt

Σt·y_t = a Σt + b Σt²

Let X = t − A, where A denotes the year of origin, chosen such that ΣX = 0. The above equations can

then be written as

ΣY = na + b ΣX

ΣXY = a ΣX + b ΣX²

Since ΣX = 0, i.e. the deviations from the actual mean sum to zero, we can write

a = ΣY / n ;  b = ΣXY / ΣX²

Merits

1. Given the mathematical form of the trend to be fitted, the least squares method is an

objective method.

2. Unlike the moving average method, it is possible to compute trend values for all the periods

and predict the value for a period lying outside the observed data

3. The results of the method of least squares are most satisfactory because the fitted trend

satisfies the two most important properties, i.e. (1) Σ(y₀ − y_t) = 0 and (2) Σ(y₀ − y_t)² is

minimum. Here y₀ denotes the observed values and y_t denotes the calculated trend values.

The first property implies that the position of the fitted trend equation is such that the sum of the

deviations of observations above and below it is equal to zero. The second property implies that the

sum of squares of deviations of the observations about the trend equation is minimum.

Demerits

1. As compared with the moving average method, it is a cumbersome method.

2. It is not flexible like the moving average method. If some observations are added, then the

entire calculation has to be done once again.

3. It can predict or estimate values only in the immediate future or past.

4. This method cannot be used to fit growth curves, the pattern followed by most economic

and business time series.

Example

Given below are the data relating to the production of sugarcane in a district.

Fit a straight-line trend by the method of least squares and tabulate the trend values.

Year 2000 2001 2002 2003 2004 2005 2006


Prod. of Sugarcane 40 45 46 42 47 50 46

Table 2.1: Production of Sugarcane Over the Years

Solution

Computation of trend values by the method of least squares (odd number of years):

a = ΣY/n = 316/7 = 45.143 ;  b = ΣXY/ΣX² = 29/28 = 1.036

Year (x)   Production of Sugarcane (Y)   X = x − 2003   X²   XY    Trend Values (y_t)
2000          40                            −3           9   −120      42.04
2001          45                            −2           4    −90      43.07
2002          46                            −1           1    −46      44.11
2003          42                             0           0      0      45.14
2004          47                             1           1     47      46.18
2005          50                             2           4    100      47.21
2006          46                             3           9    138      48.25
n = 7      ΣY = 316                       ΣX = 0     ΣX² = 28  ΣXY = 29  Σy_t = 316

Therefore, the required equation of the straight-line trend is given by

Y = a + bX

Y = 45.143 + 1.036(x − 2003)

The trend values can be obtained by

When x = 2000, y_t = 45.143 + 1.036(2000 − 2003) = 42.035

When x = 2001, y_t = 45.143 + 1.036(2001 − 2003) = 43.071
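The shortcut formulas a = ΣY/n and b = ΣXY/ΣX² from the worked example can be sketched in Python (illustrative code reproducing the sugarcane fit):

```python
# Sketch of the least-squares trend fit: shift the origin to 2003 so that
# the deviations X sum to zero, then apply a = ΣY/n and b = ΣXY/ΣX².
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
Y = [40, 45, 46, 42, 47, 50, 46]

A = 2003                      # year of origin, chosen so that ΣX = 0
X = [t - A for t in years]    # [-3, -2, -1, 0, 1, 2, 3]

n = len(Y)
a = sum(Y) / n                                                 # 316 / 7
b = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)   # 29 / 28

trend = [a + b * x for x in X]
print(round(a, 3), round(b, 3))     # 45.143 1.036
print([round(t, 2) for t in trend])
```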

2.2 Methods for Measurement of seasonal variations

2.2.1 Method of Simple Average

This is the easiest and simplest method of studying seasonal variations. It is used

when the time series variable consists of only the seasonal and random components. The effect

of taking the average of data corresponding to the same period (say, the first quarter of each year) is to

eliminate the effect of the random component; thus the resulting averages consist of only the seasonal

component. These averages are then converted into seasonal indices.

It involves the following steps:

If figures are given on a monthly basis:

1. Arrange the raw data month-wise for each year.

2. Find the sum of all the figures relating to a month, i.e. add all the January values for

all the years. Repeat the process for all the months.

3. Find the average of the monthly figures, i.e. divide each monthly total by the number of years.

4. Obtain the average of the monthly averages by dividing the sum of the averages by 12.

5. Taking the average of the monthly averages as 100, find the percentage of each monthly average.

For the January average (X₁) this percentage would be: (Monthly Average (for Jan) /

Grand Average) × 100

Merits and Demerits

This is the simplest method of measuring seasonal variations. However, it is based on the

unrealistic assumption that the trend and cyclical variations are absent from the data.

Example

Calculate the seasonal indices for the monthly sales of a product using the method of simple averages.

Months
Years
Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec
2001 15 41 25 31 29 47 41 19 35 38 40 30
2002 20 21 27 19 17 25 29 31 35 39 30 44
2003 18 16 20 28 24 25 30 34 30 38 37 39

Solution

Computation of seasonal indices by method of simple averages.

S.I. for Jan = (Monthly Average (for Jan) / Grand Average) × 100

Grand Average = 355.67/12 ≈ 29.64

S.I. for Jan = (17.67/29.64) × 100 ≈ 59.6

S.I. for Feb = (26/29.64) × 100 ≈ 87.7

Similarly, the other seasonal index values can be obtained.

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
59.62 87.77 80.99 87.77 78.74 109.12 112.49 94.49 112.49 129.36 120.36 126.89

Table 2.2: Seasonal Indices (S.I.) by month
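The steps above can be sketched in Python (illustrative code using the table of monthly sales from the example):

```python
# Sketch of the simple-average method: average each month over the three
# years, then express each monthly average as a percentage of the grand
# average of all twelve monthly averages.
data = {
    2001: [15, 41, 25, 31, 29, 47, 41, 19, 35, 38, 40, 30],
    2002: [20, 21, 27, 19, 17, 25, 29, 31, 35, 39, 30, 44],
    2003: [18, 16, 20, 28, 24, 25, 30, 34, 30, 38, 37, 39],
}

monthly_avg = [sum(data[y][m] for y in data) / len(data) for m in range(12)]
grand_avg = sum(monthly_avg) / 12
seasonal_index = [100 * m / grand_avg for m in monthly_avg]

print(round(grand_avg, 2))            # about 29.64
print(round(seasonal_index[0], 1))    # January: about 59.6
```

By construction the twelve indices average to exactly 100, which is a useful check on the arithmetic.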

2.2.2 Ratio To Trend Method

This method is used when the cyclical variations are absent from the data, i.e. the time series

variable Y consists of the trend, seasonal and random components. Using symbols, we can write

Y = T·S·R

Various steps for the computation:

1. Obtain the trend values for each month or quarter, etc, by the method of least squares

2. Divide the original values by the corresponding trend values. This eliminates the trend

from the data.

3. To get figures in percentages, the quotients are multiplied by 100.

Thus, we have:

(Y/T) × 100 = (T·S·R/T) × 100 = S·R × 100

Merits and Demerits

It is an objective method of measuring seasonal variations. However, it is quite complicated and

does not work if cyclical variations are present.

2.2.3 Ratio to Moving Average Method

The ratio to moving average is the most commonly used method of measuring seasonal variations.

This method assumes the presence of all the four components of a time series. Various steps in

the computation are as follows:

1. Compute the moving averages with period equal to the period of seasonal variations. This

would eliminate the seasonal components and minimize the effect of random component.

The resulting moving averages would consist of trend, cyclical and random components.

2. The original values for each month are divided by the respective moving average figures and

the ratio is expressed as a percentage, i.e. S·R″ = Y / M.A. = TCSR / TCR′, where R′ and

R″ denote the changed random components.

3. Finally, the random component R” is eliminated by the method of simple averages

Merits and Demerits

This method assumes that all the four components of a time series are present and, therefore,

widely used for measuring seasonal variations. However, the seasonal variations are not completely

eliminated if the cycles of these variations are not of a regular nature. Further, some information is

always lost at the ends of the time series.

Example

Calculate the seasonal indices by the ratio to moving average method from the following data:

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter


2005 68 62 61 63
2006 65 58 66 61
2007 68 63 63 67

Solution

Year  Quarter  Given (Y)  4-Qtr Total  Centered Total  Centered Moving Avg  % of Moving Avg
2005    I         68
        II        62
        III       61         254           505             63.125              96.63
        IV        63         251           498             62.250             101.20
2006    I         65         247           499             62.375             104.21
        II        58         252           502             62.750              92.43
        III       66         250           503             62.875             104.97
        IV        61         253           511             63.875              95.50
2007    I         68         258           513             64.125             106.04
        II        63         255           516             64.500              97.67
        III       63         261
        IV        67

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter


2005 - - 96.63 101.20
2006 104.21 92.43 104.97 95.50
2007 106.04 97.67 - -
Total 210.25 190.10 201.60 196.70
Average 105.125 95.05 100.80 98.35
Seasonal Index 105.30 95.21 100.97 98.52

Table 2.3: Seasonal Index Table

Arithmetic average of the quarterly averages = 399.33/4 = 99.83

By expressing each quarterly average as percentage of 99.83 , we will obtain seasonal indices.

Seasonal index of 1st Quarter = (105.125/99.83) × 100 = 105.30

Seasonal index of 2nd Quarter = (95.05/99.83) × 100 = 95.21

Seasonal index of 3rd Quarter = (100.80/99.83) × 100 = 100.97

Seasonal index of 4th Quarter = (98.35/99.83) × 100 = 98.52
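The whole ratio-to-moving-average calculation above can be sketched in Python (illustrative code using the quarterly data of the example):

```python
# Sketch of the ratio-to-moving-average method: centred 4-quarter moving
# averages, ratios to them in percent, quarterly means of the ratios, then
# a rescaling so that the four seasonal indices average to 100.
y = [68, 62, 61, 63,  65, 58, 66, 61,  68, 63, 63, 67]  # 2005-2007 quarterly

# centred moving average: mean of two adjacent 4-quarter averages
cma = {}
for t in range(2, len(y) - 2):
    first  = sum(y[t - 2:t + 2]) / 4
    second = sum(y[t - 1:t + 3]) / 4
    cma[t] = (first + second) / 2

ratios = {t: 100 * y[t] / cma[t] for t in cma}   # % of moving average

# average the ratios quarter by quarter, then rescale to mean 100
quarter_avg = []
for q in range(4):
    vals = [r for t, r in ratios.items() if t % 4 == q]
    quarter_avg.append(sum(vals) / len(vals))
scale = 100 / (sum(quarter_avg) / 4)
indices = [round(v * scale, 2) for v in quarter_avg]
print(indices)   # [105.3, 95.21, 100.97, 98.52], matching Table 2.3
```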

2.3 Method of Cyclic Variation:

Cyclic variation exists in the data when the tendency of the data increases and decreases over a given

period, but the time period is not fixed for cyclic variation.

For the measurement of cyclic variation, first calculate the seasonal and trend components, then remove

the seasonal, trend and irregular components. The irregular component is just like an error term

which cannot be directly eliminated. To eliminate the irregular component, the moving average

method is used. This elimination of the irregular component is known as smoothing of the

irregular component.

Steps for the computation of cyclic variation:

1. First estimate the trend (T) and seasonal (S) values of the given time series.

2. Divide the time series values (Y) by the trend (T) and seasonal (S) estimates to get the cyclic (C)

and random (R) components:

Y / (T·S) = T·C·S·R / (T·S) = C·R

3. Now eliminate the random component from the result of step 2 by using a moving average of 3 or 5

periods, obtaining the cyclic component.

2.4 Measurement of Irregular Component:

The irregular component is the last component, also known as the error term of the time series. An error

term cannot be eliminated fully from any time series because it arises from natural forces.

There are no methods available to measure this component exactly, but it can be reduced

somewhat by averaging the indices. In a multiplicative model of the time series, it can be

removed by dividing the series by all the other components; in an additive model,

it can be removed by subtracting the other components from the series.

Chapter 3

Identifying Time series Models

In the previous chapters we have seen what a time series is, its variables, and so on. Now we are

going to look at time series models.

Modelling of time series can be classified into two:

1. Components of Time Series

2. Forecasting

3.1 Component of Time Series:

Component-wise we can classify time series models into two:

• Additive Model

• Multiplicative Model

3.1.1 Additive Model

A data model in which the effects of individual factors are differentiated and added together to

model the data.

Let Y = original observation, T = trend component , S = seasonal component, C= cyclical

component , and I = irregular component

Assume that the value Y of a composite series is the sum of the four components. That is,

Y = T + S + C + I

3.1.2 Multiplicative Model

This model assumes that as the data increase, so does the seasonal pattern. Most time series plots

exhibit such a pattern. In this model, the trend, seasonal, cyclical and irregular components are

multiplied.

Let Y = original observation, T = trend component , S = seasonal component, C= cyclical

component , and I = irregular component

It is assumed that the value Y of a composite series is the product of the four components. That

is,

Y = T·S·C·I
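The difference between the two models can be shown with a toy sketch (the component values below are invented for illustration):

```python
# Toy contrast of the two models: the same four components combine by
# multiplication in one model and by addition in the other.
T, S, C, I = 120.0, 0.9, 1.05, 1.01    # multiplicative: S, C, I vary around 1
Y_mult = T * S * C * I                  # Y = T·S·C·I

T, S, C, I = 120.0, -12.0, 6.0, 1.2    # additive: S, C, I in the units of Y
Y_add = T + S + C + I                   # Y = T + S + C + I

print(round(Y_mult, 2), Y_add)          # 114.53 115.2
```

Note the convention this illustrates: in the multiplicative model the seasonal, cyclical and irregular components are ratios around 1, while in the additive model they are deviations measured in the units of Y itself.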

3.2 Forecasting related models of Time Series:

For forecasting, there are mainly four approaches based on time series data.

1. Exponential smoothing methods

2. Single-equation regression models

3. Simultaneous-equation regression models

4. Autoregressive integrated moving average (ARIMA) models

3.2.1 Exponential smoothing methods

Exponential smoothing is a time series method for forecasting univariate time series data. Time

series methods work on the principle that a prediction is a weighted linear sum of past observations

or lags. The exponential smoothing method works by assigning exponentially decreasing weights

to past observations, so the weight assigned to each demand observation decreases exponentially

with its age.

The model assumes that the future will be somewhat the same as the recent past. The only pattern

that exponential smoothing learns from demand history is its level: the average value around

which the demand varies over time.

Exponential smoothing is generally used to make forecasts of time series data based on prior

assumptions by the user, such as seasonality or systematic trends.
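The level-only (simple) form described above can be sketched in a few lines (illustrative code; the demand figures are invented):

```python
# Minimal sketch of simple exponential smoothing: each smoothed value is
# alpha times the latest observation plus (1 - alpha) times the previous
# smoothed value, so older observations get exponentially smaller weights.
def exponential_smoothing(series, alpha):
    smoothed = [series[0]]                 # initialise with the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [100, 110, 105, 115, 120]         # hypothetical monthly demand
print(exponential_smoothing(demand, 0.5))
# [100, 105.0, 105.0, 110.0, 115.0]
```

The last smoothed value serves as the forecast for the next period; a larger alpha makes the forecast react faster to recent demand.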

3.2.2 Single – Equation Regression Models:

Single-equation regression models in time series analysis involve using a single equation to

explain and predict the behaviour of a dependent variable. Typically, these models express the

dependent variable as a linear function of one or more independent variables, including time.

The most basic form is the autoregressive (AR) model, where the current value of the variable

depends on its past values. Another common model is the moving average (MA) model, where the

current value is expressed as a linear combination of past error terms. ARIMA models can also

be considered single-equation regression models.

ACF (Auto Correlation Function):

The autocorrelation function takes into consideration all the past observations, irrespective of their

effect on the future or present time period. It calculates the correlation between the time periods

t and t − k, including all the lags or intervals between them. The correlation is always calculated

using the Pearson correlation formula:

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

PACF (Partial Correlation Function):

The PACF determines the partial correlation between time periods t and t − k. It does not take into

consideration all the time lags between t and t − k.

e.g.: today's stock price may depend on the stock price 3 days prior without taking into

consideration yesterday's closing price. Therefore we consider only the time lags having a direct

impact on the future time period, neglecting the insignificant time lags in between the two time

slots t and t − k.
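The sample ACF described above can be sketched directly from its definition (illustrative code with a made-up series):

```python
# Sketch of the sample autocorrelation at lag k: the correlation between
# the series and itself shifted back by k periods, normalised by the
# total sum of squared deviations.
def acf(series, k):
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return num / var

x = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4]   # a series with a repeating rise and fall
print(acf(x, 0))   # 1.0 by definition
print(acf(x, 1))   # positive: neighbouring values move together
```

Plotting acf(x, k) against k for k = 0, 1, 2, ... gives the correlogram used in the identification stage of model fitting.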

3.2.3 ARIMA Models:

The model which is a forecasting algorithm based on the assumption that previous values carry

inherent information and can be used to predict future values. In order to understand ARIMA,

we first have to separate it into its foundational constituents

1. AR

2. I

3. MA

The ARIMA model takes in three parameters:

1. p is the order of the AR term

2. q is the order of the MA term

3. d is the order of differencing

Autoregressive (AR) and Moving Average (MA)

AR (Auto-Regressive) Model

The AR model depends only on past values to estimate future values. The generalized form of the AR model is

AR(p): x_t = α + Σ_{i=1}^{p} β_i x_{t−i} + ε_t

The value p determines how many past values are taken into account for the prediction. The higher the order of the model, the more past values are used.

Consider a milk distribution company that produces milk every month across the country. We want to decide how much milk to produce in the current month, given the milk produced over the last twelve months.

We begin by calculating the PACF values of all 12 lags with respect to the current month. Only the lags whose PACF exceeds a significance threshold are retained in the model.

e.g.: the PACF values at lags 1, 2, ..., 12 show the direct effect of the milk production at each lag on the current month. If two values lie above the significance threshold, the model is termed AR(2).

(The AR model can simply be thought of as the linear combination of p past values.)
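To illustrate this idea (a sketch with hypothetical coefficients 0.5 and 0.3), an AR(2) series can be simulated and its coefficients recovered by regressing the series on its first two lags:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + eps_t
n = 2000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]

# Least-squares regression of x_t on its first two lags recovers
# estimates close to the true coefficients (0.5, 0.3)
X = np.column_stack([x[1:-1], x[:-2]])   # lag-1 and lag-2 columns
beta, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
print(np.round(beta, 2))
```

The recovered estimates land close to the true values 0.5 and 0.3, which is precisely the sense in which the AR model is a linear combination of p past values.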

MA (Moving Average) Model

The moving-average model depends on past forecast errors to make predictions. The generalised form of the MA(q) model is

y_t = ε_t + a_1 ε_{t−1} + a_2 ε_{t−2} + ... + a_q ε_{t−q}

Consider cake distribution at a birthday function. Suppose a person asks you to bring pastries to the party. Every year you misjudge the number of invitees and end up bringing more or fewer cakes than required. The difference between the actual and expected quantity is the error. To avoid this error in the current year, we apply the moving average model to the time series and calculate the number of pastries needed based on the past collective errors. Next, we calculate the ACF values of all the lags in the time series. Only the lags whose ACF exceeds a significance threshold are retained in the model.

e.g.: the ACF values at lags 1, 2, ..., 12 show the total error at each lag on the current month's pastry count, taking all the in-between lags into account. If two values lie above the significance threshold, the model is termed MA(2).

( The MA model can simply be thought of as the linear combination of q past forecast errors.)
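The cut-off property used above can be checked against the theoretical ACF of a small MA model. As a sketch, for an MA(1) process written in the standard form y_t = ε_t + a ε_{t−1} (hypothetical coefficient a = 0.5):

```python
def ma1_acf(a, k):
    # Theoretical ACF of the MA(1) process y_t = eps_t + a * eps_{t-1}:
    # rho_0 = 1, rho_1 = a / (1 + a^2), and zero at every lag beyond 1,
    # which is exactly the "negligible after lag q" identification rule.
    if k == 0:
        return 1.0
    if k == 1:
        return a / (1 + a * a)
    return 0.0

print([ma1_acf(0.5, k) for k in range(4)])   # [1.0, 0.4, 0.0, 0.0]
```

The abrupt drop to zero after lag 1 is what the ACF-based identification of q looks for in the sample autocorrelogram.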

ARMA (Auto Regressive Moving Average) Model:

This model is a combination of the AR and MA models. In this model, the impact of previous lags along with the residuals is considered for forecasting the future values of the time series. Here β represents the coefficients of the AR model and α represents the coefficients of the MA model:

Y_t = ε_t + β_1 y_{t−1} + α_1 ε_{t−1} + β_2 y_{t−2} + α_2 ε_{t−2} + ... + β_k y_{t−k} + α_k ε_{t−k}

Integrated ( I ) :

1. The integrated part refers to differencing the time series data to make it stationary.

2. Stationarity means that the statistical properties of the time series, such as mean and vari-

ance, remain constant over time.

3. The differencing parameter d represents the number of times differencing is needed to achieve stationarity.
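A minimal sketch of this point, using a hypothetical trending series: one round of differencing (d = 1) removes a linear trend and leaves a series with constant mean.

```python
import numpy as np

# A series with a deterministic linear trend is non-stationary in mean;
# first differencing replaces each value with x_t - x_{t-1}.
trend = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
diff1 = np.diff(trend)
print(diff1.tolist())   # [2.0, 2.0, 2.0, 2.0]
```

If one difference is not enough (for example, a quadratic trend), differencing the differenced series corresponds to d = 2.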

ARIMA (Auto-Regressive Integrated Moving Average) Model:

The ARIMA model is quite similar to the ARMA model except that it includes one more component, Integrated (I), i.e. differencing, which stands for the I in ARIMA. In short, the ARIMA model combines the number of differences applied to make the series stationary with the previous lags and the residual errors in order to forecast future values.

Chapter 4

FITTING

One of the main approaches is the Box-Jenkins method.

4.1 Fitting ARIMA models : The Box-Jenkins approach

The Box-Jenkins procedure is concerned with fitting an ARIMA model to data. It has three parts:

• Identification

• Estimation

• Verification

The data may require pre-processing to make it stationary. To achieve stationarity we may do any

of the following:

• Look at the time series

• Re-scale it (for instance, by a logarithmic or exponential transform)

• Remove deterministic components

• Difference it until stationary. In practice d = 1 or 2 should be sufficient.

4.1.1 Identification

For the moment we will assume that our series is stationary. The initial model identification is

carried out by estimating the sample autocorrelations and partial autocorrelations and comparing

the resulting sample autocorrelograms and partial autocorrelograms with the theoretical ACF and

PACF derived already.

• An MA(q) process has negligible ACF after the qth term

• An AR(p) process has negligible PACF after the pth term

As we have noted, very approximately, both the sample ACF and PACF have standard deviation of around 1/√T, where T is the length of the series. A rule of thumb is that ACF and PACF values are negligible when they lie within ±2/√T. An ARMA(p, q) process has kth order sample ACF and PACF decaying geometrically for k > max(p, q).

4.1.2 Estimation

Estimation: ARMA processes

Now we consider an ARMA(p, q) process. If we assume a parametric model for the white noise (here, Gaussian white noise), we can use maximum likelihood.

We rely on the prediction error decomposition. That is, X_1, ..., X_n have joint density

f(X_1, ..., X_n) = f(X_1) ∏_{t=2}^{n} f(X_t | X_1, ..., X_{t−1})

Suppose the conditional distribution of X_t given X_1, ..., X_{t−1} is normal with mean X̄_t and variance P_{t−1}, and suppose that X_1 ∼ N(X̄_1, P_0). Then for the log likelihood we obtain

−2 log L = Σ_{t=1}^{n} [ log(2π) + log P_{t−1} + (X_t − X̄_t)² / P_{t−1} ]

Here X̄_t and P_{t−1} are functions of the parameters α_1, ..., α_p, β_1, ..., β_q, and so maximum likelihood estimators can be found (numerically) by minimising −2 log L with respect to these parameters. The matrix of second derivatives of −2 log L, evaluated at the maximum likelihood estimate, is the observed information matrix, and its inverse is an approximation to the covariance matrix of the estimators. Hence we can obtain approximate standard errors for the parameters from this matrix.

Estimation: AR processes

For the AR(p) process

X_t = Σ_{i=1}^{p} α_i X_{t−i} + ε_t    (4.1.1)

we have the Yule-Walker equations

ρ_k = Σ_{i=1}^{p} α_i ρ_{|i−k|}, for k > 0    (4.1.2)

We fit the parameters α_1, ..., α_p by solving

r_k = Σ_{i=1}^{p} α_i r_{|i−k|}, k = 1, ..., p    (4.1.3)

These are p equations for the p unknowns α_1, ..., α_p which, as before, can be solved using a Levinson-Durbin recursion. The Levinson-Durbin recursion gives the residual variance

σ̂_p² = (1/n) Σ_{t=p+1}^{n} ( X_t − Σ_{j=1}^{p} α̂_j X_{t−j} )²    (4.1.4)

This can be used to select the appropriate order p. Define an approximate log likelihood by

−2 log L = n log σ̂_p²    (4.1.5)

Then this can be used for likelihood ratio tests.
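As a sketch of equations (4.1.3) for p = 2, with hypothetical sample autocorrelations r_1 = 0.6 and r_2 = 0.4, the system can be solved directly (a full implementation would use the Levinson-Durbin recursion):

```python
import numpy as np

# Hypothetical sample autocorrelations for an AR(2) fit
r1, r2 = 0.6, 0.4

# The system r_k = sum_i alpha_i r_{|i-k|} for k = 1, 2 in matrix form;
# the coefficient matrix is Toeplitz with entries r_{|i-k|}
R = np.array([[1.0, r1],
              [r1, 1.0]])
alpha = np.linalg.solve(R, np.array([r1, r2]))
print([round(v, 4) for v in alpha])   # [0.5625, 0.0625]
```

The second coefficient being small relative to the first already hints, via the PACF interpretation, that an AR(1) model might suffice for this series.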

4.1.3 Verification

The third step is to check whether the model fits the data. Two main techniques for model verification are

• Overfitting: add extra parameters to the model and use likelihood ratio or t tests to check

that they are not significant.

• Residual analysis: calculate residuals from the fitted model and plot their acf, pacf, ‘spectral

density estimates’, etc, to check that they are consistent with white noise.

White noise : white noise refers to a random signal with a constant mean and constant variance.

It is a type of stochastic process where each data point is independent and identically distributed

(i.i.d.), meaning that there is no correlation between successive observations.

Characteristics of white noise

• Constant Mean

• Constant Variance

• Independence

• Randomness

Tests for white noise

The Box-Pierce test is based on the statistic

Q_m = T Σ_{k=1}^{m} r_k²    (4.1.6)

where r_k is the kth sample autocorrelation coefficient of the residual series, and p + q < m ≪ T. It is called a 'portmanteau test' because it is based on an all-inclusive statistic. If the model is correct then Q_m ∼ χ²_{m−p−q} approximately.

In fact, r_k has variance (T − k)/(T(T + 2)), so an improved test is the Box-Ljung procedure, which replaces Q_m by

Q̄_m = T(T + 2) Σ_{k=1}^{m} (T − k)^{−1} r_k²    (4.1.7)

where again Q̄_m ∼ χ²_{m−p−q} approximately.
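A sketch of the Ljung-Box statistic computed from first principles follows (the trending input series is hypothetical; in practice `acorr_ljungbox` from `statsmodels.stats.diagnostic` performs this test):

```python
import numpy as np

def ljung_box_q(x, m):
    # Ljung-Box statistic: Q = T(T+2) * sum_{k=1}^m r_k^2 / (T - k),
    # where r_k is the kth sample autocorrelation of the series.
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    q = 0.0
    for k in range(1, m + 1):
        r_k = np.sum(d[k:] * d[:-k]) / denom
        q += r_k * r_k / (T - k)
    return T * (T + 2) * q

# A strongly trending (hence autocorrelated) series gives a large Q,
# far above the chi-square critical value, so white noise is rejected
print(ljung_box_q(np.arange(50.0), 5) > 11.07)   # True (chi2 at 0.95, 5 df)
```

For residuals from a correctly specified model, Q should instead be comparable to the χ² critical value, and the white-noise hypothesis is not rejected.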

4.2 Applications of Time Series

4.2.1 Forecasting

The following produces a price forecast for Bitcoin, using data obtained from Yahoo Finance via the Python programming language.

pip install yfinance
pip install statsmodels
pip install pmdarima
pip install matplotlib

import warnings
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from pmdarima.arima import auto_arima
from sklearn.metrics import mean_squared_error, mean_absolute_error
warnings.filterwarnings('ignore')

# Download the full BTC-USD price history from Yahoo Finance
df = yf.download('BTC-USD')

df

plt.plot(df.index, df['Adj Close'])
plt.show()

# Train/test split: first 90% for training, last 10% for testing
to_row = int(len(df) * 0.9)
training_data = list(df[0:to_row]['Adj Close'])
testing_data = list(df[to_row:]['Adj Close'])

# Plot the train and test portions of the series
plt.figure(figsize=(10, 6))
plt.grid(True)
plt.xlabel('Dates')
plt.ylabel('Closing Prices')
plt.plot(df[0:to_row]['Adj Close'], 'green', label='Train data')
plt.plot(df[to_row:]['Adj Close'], 'blue', label='Test data')
plt.legend()

model_predictions = []
n_test_obser = len(testing_data)

Figure 4.1: Train/test split of the BTC-USD closing price

arima_model = auto_arima(testing_data, suppress_warnings=True, seasonal=False)
p, d, q = arima_model.order
print(f'Best ARIMA model parameters: p={p}, d={d}, q={q}')

for i in range(n_test_obser):
    model = ARIMA(training_data, order=(0, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()
    print(output)
    yhat = output[0]

# Use auto_arima to find the best ARIMA parameters
model = auto_arima(testing_data, suppress_warnings=True, m=5, trace=True,
                   error_action='ignore', seasonal=False)
order = model.get_params()['order']

# Print the found ARIMA order (p, d, q)
print('ARIMA Order (p, d, q):', order)

# Fit ARIMA model using the found parameters
arima_model = ARIMA(training_data, order=order)
arima_result = arima_model.fit()

# Display the summary of the ARIMA model
print(arima_result.summary())

# Walk-forward forecasting: refit on the growing history, forecast one
# step ahead, then append the actual observation to the training data
for i in range(n_test_obser):
    model = ARIMA(training_data, order=(0, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    model_predictions.append(yhat)
    actual_test_value = testing_data[i]
    training_data.append(actual_test_value)
print(model_fit.summary())

plt.figure(figsize=(11, 7))
plt.grid(True)

data_range = df[to_row:].index
plt.plot(data_range, model_predictions, color='blue', marker='o',
         linestyle='dashed', label='BTC Predicted Price')
plt.plot(data_range, testing_data, color='red', label='BTC Actual Price')
plt.title('Bitcoin Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
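The sklearn metrics imported at the start (mean_squared_error, mean_absolute_error) can quantify forecast accuracy. As a sketch with hypothetical actual and predicted prices, the mean squared error and mean absolute percentage error follow directly from numpy:

```python
import numpy as np

# Hypothetical actual vs. predicted prices, for illustration only
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 101.0, 103.0, 104.0])

mse = float(np.mean((actual - predicted) ** 2))
mape = float(np.mean(np.abs((actual - predicted) / actual))) * 100

print(mse)              # 1.75
print(round(mape, 2))   # 1.23
```

Comparing such error measures across candidate orders (p, d, q) gives an objective way to choose between fitted ARIMA models.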

CONCLUSION

In the introductory chapter, the fundamental concepts of time series were explored. A clear

definition of time series was provided, emphasizing its sequential nature and the inherent patterns

within. The discussion delved into the distinctions between long-term and short-term variations,

illustrating these concepts through practical examples. By understanding the classifications of

variations within time series, readers gained insight into the complexity and diversity of temporal

data. This groundwork set the stage for an in-depth exploration of techniques to analyse and

model time-dependent phenomena. The second chapter focused on estimating trends in time se-

ries data, employing various methodologies for measuring secular trends and seasonal variations.

The analysis included methods such as the Method of Selected Points, Least Squares, Ratio to

Trend, and Moving Average Method. The exploration extended to cyclic and irregular varia-

tions, with concrete examples elucidating the application of each approach. By comprehensively

evaluating different trend estimation methods, the chapter provided a robust foundation for sub-

sequent modeling and forecasting endeavours. The third chapter delved into the crucial task of

identifying time series models, employing a two-fold classification based on components and fore-

casting. The first classification distinguished between additive and multiplicative models, offering

a nuanced understanding of their respective applications. Subsequently, the discussion expanded

to forecasting-related models, encompassing exponential smoothing methods, single-equation re-

gression models, simultaneous-equation regression models, and Autoregressive Integrated Moving

Average (ARIMA) models. The incorporation of graph representations enhanced the conceptual

clarity, facilitating a more intuitive grasp of the modeling approaches presented. The final chap-

ter culminated in the application of the ARIMA model, as outlined in Chapter three , through

the Box-Jenkins approach. The process of model fitting was intricately detailed, emphasizing the

systematic steps involved. To exemplify the practical utility of the developed models, a real-world

case study involving Bitcoin data was undertaken. The application side showcased the forecasting

capabilities of the ARIMA model, demonstrating its adaptability to dynamic and volatile datasets.

The chapter provided a conclusive demonstration of the theoretical concepts discussed throughout

the project, offering valuable insights into the applicative potential of time series modeling tech-

niques.

BIBLIOGRAPHY

Books

1. Damodar N. Gujarati, Dawn C. Porter, Sangeetha Gunasekar, Basic Econometrics

2. Dennis J. Sweeney, Thomas Arthur Williams, David R. Anderson, Jeffrey D. Camm, James J. Cochran, Statistics for Business & Economics

3. A. C. Harvey, Time Series Models

Websites

1. data.world

2. finance.yahoo.com

3. towardsdatascience.com
